DYNAMICS OF MARKETS Econophysics and Finance
Standard texts and research in economics and finance ignore the fact that there is no evidence from the analysis of real, unmassaged market data to support the notion of Adam Smith's stabilizing Invisible Hand. The neo-classical equilibrium model forms the theoretical basis for the positions of the US Treasury, the World Bank, the IMF, and the European Union, all of which accept and apply it as their credo. As is taught and practised today, that equilibrium model provides the theoretical underpinning for globalization with the expectation to achieve the best of all possible worlds via the deregulation of all markets. In stark contrast, this text introduces a new empirically based model of financial market dynamics that explains volatility and prices options correctly and makes clear the instability of financial markets. The emphasis is on understanding how real markets behave, not how they hypothetically "should" behave. This text is written for physics and engineering graduate students and finance specialists, but will also serve as a valuable resource for those with less of a mathematics background. Although much of the text is mathematical, the logical structure guides the reader through the main line of thought. The reader is not only led to the frontiers, to the main unsolved challenges in economic theory, but will also receive a general understanding of the main ideas of econophysics.

Joe McCauley, Professor of Physics at the University of Houston since 1974, wrote his dissertation on vortices in superfluids with Lars Onsager at Yale. His early postgraduate work focused on statistical physics, critical phenomena, and vortex dynamics. His main field of interest became nonlinear dynamics, with many papers on computability, symbolic dynamics, nonintegrability, and complexity, including two Cambridge books on nonlinear dynamics. He has lectured widely in Scandinavia and Germany, and has contributed significantly to the theory of flow through porous media, Newtonian relativity and cosmology, and the analysis of galaxy statistics. Since 1999, his focus has shifted to econophysics, and he has been invited to present many conference lectures in Europe, the Americas, and Asia. His main contribution is a new empirically based model of financial markets. An avid long distance hiker, he lives part of the time in a high alpine village in Austria with his German wife and two young sons, where he tends a two square meter patch of arugula and onions, and reads Henning Mankell mysteries in Norwegian.

The author is very grateful to the Austrian National Bank for permission to use the 1000 Schilling banknote as cover piece, and also to Schrödinger's daughter, Ruth Braunizer, and the Physics Library at the University of Vienna for permission to use Erwin Schrödinger's photo, which appears on the banknote.
DYNAMICS OF MARKETS
Econophysics and Finance
JOSEPH L. McCAULEY
University of Houston
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521824477
© Joseph L. McCauley 2004
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2004
This digitally printed version 2007
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
McCauley, Joseph L.
Dynamics of markets : econophysics and finance / Joseph McCauley.
p. cm.
Includes bibliographical references and index.
ISBN 0 521 82447 8
1. Finance – Mathematical models. 2. Finance – Statistical methods. 3. Business mathematics. 4. Markets – Mathematical models. 5. Statistical physics. I. Title.
HG106.M4 2004
332′.01′5195 – dc22
2003060538
ISBN 978-0-521-82447-7 hardback
ISBN 978-0-521-03628-3 paperback
The publisher has used its best endeavors to ensure that the URLs for external websites referred to in this publication are correct and active at the time of going to press. However, the publisher has no responsibility for the websites and can make no guarantee that a site will remain live or that the content is or will remain appropriate.
Mainly for my stimulating partner Cornelia, who worked very hard and effectively helping me to improve this text, but also for our youngest son, Finn.
Contents
Preface
1 The moving target
1.1 Invariance principles and laws of nature
1.2 Humanly invented law can always be violated
1.3 Where are we headed?
2 Neo-classical economic theory
2.1 Why study "optimizing behavior"?
2.2 Dissecting neo-classical economic theory (microeconomics)
2.3 The myth of equilibrium via perfect information
2.4 How many green jackets does a consumer want?
2.5 Macroeconomic lawlessness
2.6 When utility doesn't exist
2.7 Global perspectives in economics
2.8 Local perspectives in physics
3 Probability and stochastic processes
3.1 Elementary rules of probability theory
3.2 The empirical distribution
3.3 Some properties of probability distributions
3.4 Some theoretical distributions
3.5 Laws of large numbers
3.6 Stochastic processes
3.7 Correlations and stationary processes
4 Scaling the ivory tower of finance
4.1 Prolog
4.2 Horse trading by a fancy name
4.3 Liquidity, and several shaky ideas of "true value"
4.4 The Gambler's Ruin
4.5 The Modigliani–Miller argument
4.6 From Gaussian returns to fat tails
4.7 The best tractable approximation to liquid market dynamics
4.8 "Temporary price equilibria" and other wrong ideas of "equilibrium" in economics and finance
4.9 Searching for Adam Smith's Invisible Hand
4.10 Black's "equilibrium": dreams of "springs" in the market
4.11 Macroeconomics: lawless phenomena?
4.12 No universal scaling exponents either!
4.13 Fluctuations, fat tails, and diversification
5 Standard betting procedures in portfolio selection theory
5.1 Introduction
5.2 Risk and return
5.3 Diversification and correlations
5.4 The CAPM portfolio selection strategy
5.5 The efficient market hypothesis
5.6 Hedging with options
5.7 Stock shares as options on a firm's assets
5.8 The Black–Scholes model
5.9 The CAPM option pricing strategy
5.10 Backward-time diffusion: solving the Black–Scholes pde
5.11 We can learn from Enron
6 Dynamics of financial markets, volatility, and option pricing
6.1 An empirical model of option pricing
6.2 Dynamics and volatility of returns
6.3 Option pricing via stretched exponentials
Appendix A. The first Kolmogorov equation
7 Thermodynamic analogies vs instability of markets
7.1 Liquidity and approximately reversible trading
7.2 Replicating self-financing hedges
7.3 Why thermodynamic analogies fail
7.4 Entropy and instability of financial markets
7.5 The challenge: to find at least one stable market
Appendix B. Stationary vs nonstationary random forces
8 Scaling, correlations, and cascades in finance and turbulence
8.1 Fractal vs self-affine scaling
8.2 Persistence and antipersistence
8.3 Martingales and the efficient market hypothesis
8.4 Energy dissipation in fluid turbulence
8.5 Multiaffine scaling in turbulence models
8.6 Levy distributions
8.7 Recent analyses of financial data
Appendix C. Continuous time Markov processes
9 What is complexity?
9.1 Patterns hidden in statistics
9.2 Computable numbers and functions
9.3 Algorithmic complexity
9.4 Automata
9.5 Chaos vs randomness vs complexity
9.6 Complexity at the border of chaos
9.7 Replication and mutation
9.8 Why not econobiology?
9.9 Note added April 8, 2003
References
Index
Preface
This book emphasizes what standard texts and research in economics and finance ignore: that there is as yet no evidence from the analysis of real, unmassaged market data to support the notion of Adam Smith's stabilizing Invisible Hand. There is no empirical evidence for stable equilibrium, for a stabilizing hand to provide self-regulation of unregulated markets. This is in stark contrast with the standard model taught in typical economics texts (Mankiw, 2000; Barro, 1997), which forms the basis for the positions of the US Treasury, the European Union, the World Bank, and the IMF, who take the standard theory as their credo (Stiglitz, 2002). Our central thrust is to introduce a new empirically based model of financial market dynamics that prices options correctly and also makes clear the instability of financial markets. Our emphasis is on understanding how markets really behave, not how they hypothetically "should" behave as predicted by completely unrealistic models. By analyzing financial market data we will develop a new model of the dynamics of market returns with nontrivial volatility. The model allows us to value options in agreement with traders' prices. The concentration is on financial markets because that is where one finds the very best data for a careful empirical analysis. We will also suggest how to analyze other economic price data to find evidence for or against Adam Smith's Invisible Hand. That is, we will explain that the idea of the Invisible Hand is falsifiable. That method is described at the end of Sections 4.9 and 7.5. Standard economic theory and standard finance theory have entirely different origins and show very little, if any, theoretical overlap. The former, with no empirical basis for its postulates, is based on the idea of equilibrium, whereas finance theory is motivated by, and deals from the start with, empirical data and modeling via nonequilibrium stochastic dynamics. However, mathematicians teach standard finance theory as if it were merely a subset of the abstract theory of stochastic processes (Neftci, 2000). There, lognormal pricing of assets combined with "implied volatility" is taken as the standard model.
The "no-arbitrage" condition is regarded as the foundation of modern finance theory and is sometimes even confused with the idea of Adam Smith's Invisible Hand (Nakamura, 2000). Instead of following the finance theorists and beginning with mathematical theorems about "no-arbitrage," we will use the empirically observed market distribution to deduce a new dynamical model. We do not need the idea of "implied volatility" that is required when using the lognormal distribution, because we will deduce the empirical volatility from the observed market distribution. And if a market perfectly satisfies a no-arbitrage condition, so be it; and if not, so be it as well. We ask what markets are doing empirically, not what they would do were they to follow our wishes expressed as mathematically convenient model assumptions. In other words, we present a physicist's approach to economics and finance, one that is completely uncolored by any belief in the ideology of neo-classical economic theory or by pretty mathematical theorems about Martingales. One strength of our empirically based approach is that it exposes neo-classical expectations of stability as falsified, and therefore as a false basis for advising the world in financial matters. But before we enter the realm of economics and finance, we first need to emphasize the difference between socio-economic phenomena and natural phenomena (physics, chemistry, cell biology) by bringing to light the underlying basis for the discovery of mathematical laws of nature. The reader finds this presented in Chapter 1 where we follow Wigner and discuss invariance principles as the fundamental building blocks necessary for the discovery of physical law. Taking the next step, we review the globally dominant economic theory critically. This constitutes Chapter 2. We show that the neo-classical microeconomic theory is falsified by agents' choices. We then scrutinize briefly the advanced and very impressive mathematical work by Sonnenschein (1973a, b), Radner (1968), and Kirman (1989) in neo-classical economics. Our discussion emphasizes Sonnenschein's inadequately advertised result that shows that there is no macroeconomic theory of markets based on utility maximization (Keen, 2001). The calculations made by Radner and Kirman show that equilibrium cannot be located by agents, and that liquidity/money and therefore financial markets cannot appear in the neo-classical theory. Next, in Chapter 3, we introduce probability and stochastic processes from a physicist's standpoint, presenting Fokker–Planck equations and Green functions for diffusive processes parallel to Ito calculus. Green functions are later used to formulate market dynamics and option pricing. With these tools in hand we proceed to Chapter 4 where we introduce and discuss the standard notions of finance theory, including the Nobel Prize winning Modigliani–Miller argument, which says that the amount of debt doesn't matter. The most important topic in this chapter is the analysis of the instability and lack
of equilibrium of financial markets, based on the example provided by the standard lognormal pricing model. We bring to light the reigning confusion in economics over the notion of equilibrium, and then go on to present an entirely new interpretation of Black's idea of value. We also explain why an assumption of microscopic randomness cannot, in and of itself, lead to universality of macroscopic economic rules. Chapter 5 presents standard portfolio selection theory, including a detailed analysis of the capital asset pricing model (CAPM) and an introduction to option pricing based on empirical averages. Synthetic options are also defined. We present and discuss the last part of the very beautiful Black–Scholes paper that explains how one can understand bondholders (debt owners) as the owners of a firm, while stockholders merely have options on the company's assets. Finally, for the first time in the literature, we show why Black and Scholes were wrong in claiming in their original path-finding 1973 paper that the CAPM and the delta hedge yield the same option price partial differential equation. We show how to solve the Black–Scholes equation easily by using the Green function, and then end the chapter by discussing Enron, an example where the ratio of debt to equity did matter. The main contribution of this book to finance theory is our (Gunaratne and McCauley) empirically based theory of market dynamics, volatility and option pricing. This forms the core of Chapter 6, where the exponential distribution plays the key role. The main idea is that an (x, t)-dependent diffusion coefficient is required to generate the empirical returns distribution. This automatically explains why volatility is a random variable but one that is perfectly correlated with returns x. This model is not merely an incremental improvement on any existing model, but is completely new and constitutes a major improvement on Black–Scholes theory. Nonuniqueness in extracting stochastic dynamics from empirical data is faced and discussed. We also show that the "risk neutral" option pricing partial differential equation is simply the backward Kolmogorov equation corresponding to the Fokker–Planck equation describing the data. That is, all information required for option pricing is included in the Green function of the market Fokker–Planck equation. Finally, we show how to price options using stretched exponential densities. In Chapter 7 we discuss liquidity, reversible trading, and replicating, self-financing hedges. Then follows a thermodynamic analogy that leads us back to a topic introduced in Chapter 4, the instability of financial markets. We explain in this chapter why empirically valid thermodynamic analogies cannot be achieved in economic modeling, and suggest an empirical test to determine whether any market can be found that shows evidence for Adam Smith's stabilizing Invisible Hand. In Chapter 8, after introducing affine scaling, we discuss the efficient market hypothesis (EMH) in light of fractional Brownian motion, using Ito calculus to formulate the latter. We use Kolmogorov's 1962 lognormal model of turbulence to
show how one can analyze the question: do financial data show evidence for an information cascade? In concluding, we discuss Levy distributions and then discuss the results of financial data analyses by five different groups of econophysicists. We end the book with a survey of various ideas of complexity in Chapter 9. The chapter is based on ideas from nonlinear dynamics and computability theory. We cover qualitatively and only very briefly the difficult unanswered question whether biology might eventually provide a working mathematical model for economic behavior. For those readers who are not trained in advanced mathematics but want an overview of our econophysics viewpoint in financial market theory, here is a recommended "survival guide": the nonmathematical reader should try to follow the line of argument in Chapters 1, 2, 4, 5, 7, and 9 by ignoring most of the equations. Selectively reading those chapters may provide a reasonable understanding of the main issues in this field. For a deeper, more critical understanding the reader can't avoid the introduction to stochastic calculus given in Chapter 3. For those with adequate mathematical background, interested only in the bare bones of finance theory, Chapters 3–6 are recommended. Those chapters, which form the core of finance theory, can be read independently of the rest of the book and can be supplemented with the discussions of scaling, correlations and fair games in Chapter 8 if the reader is interested in a deeper understanding of the basic ideas of econophysics. Chapters 6, 7 and 8 are based on the mathematics of stochastic processes developed in Chapter 3 and cannot be understood without that basis. Chapter 9 discusses complexity qualitatively from the perspective of Turing's idea of computability and von Neumann's consequent ideas of automata and, like Chapters 1 and 2, does not depend at all on Chapter 3. Although Chapter 9 contains no equations, it relies on very advanced ideas from computability theory and nonlinear dynamics. I teach most of the content of Chapters 2–8 at a comfortable pace in a one-semester course for second year graduate students in physics at the University of Houston. As homework one can either assign the students to work through the derivations, assign a project, or both. A project might involve working through a theoretical paper like the one by Kirman, or analyzing economic data on agricultural commodities (Roehner, 2001). The goal in the latter case is to find nonfinancial economic data that are good enough to permit unambiguous conclusions to be drawn. The main idea is to plot histograms for different times to try to learn the time evolution of price statistics. As useful background for a graduate course using this book, students should preferably already have had courses in statistical mechanics, classical mechanics or nonlinear dynamics (primarily for Chapter 2), and mathematical methods. Prior background in economic theory was neither required nor seen as useful, but the students
are advised to read Bodie and Merton's introductory level finance text to learn the main terminology in that field. I'm very grateful to my friend and colleague Gemunu Gunaratne, without whom there would be no Chapter 6 and no new model of market dynamics and option pricing. That work was done together during 2001 and 2002, partly while I was teaching econophysics during two fall semesters and also via email while I was in Austria. Gemunu's original unpublished work on the discovery of the empirical distribution and consequent option pricing is presented with slight variation in Section 6.1.2. My contribution to that section is the discovery that γ and ν must blow up at expiration in order to reproduce the correct forward-time initial condition at expiration of the option. Gemunu's pioneering empirical work was done around 1990 while working for a year at Tradelink Corp. Next, I am enormously indebted to my life-partner, hiking companion and wife, former newspaper editor Cornelia Küffner, for critically reading this Preface and all chapters, and suggesting vast improvements in the presentation. Cornelia followed the logic of my arguments, made comments and asked me penetrating and crucial questions, and my answers to her questions are by and large written into the text, making the presentation much more complete. To the extent that the text succeeds in getting the ideas across to the reader, you have her to thank. My editor, Simon Capelin, has always been supportive and encouraging since we first made contact with each other around 1990. Simon, in the best tradition of English respect and tolerance for nonmainstream ideas, encouraged the development of this book, last but not least over a lively and very pleasant dinner together in Messina in December, 2001, where we celebrated Gene Stanley's 60th birthday. Larry Pinsky, Physics Department Chairman at the University of Houston, has been totally supportive of my work in econophysics, has financed my travel to many conferences and also has created, with the aid of the local econophysics/complexity group, a new econophysics option in the graduate program at our university. I have benefited greatly from discussions, support, and also criticism from many colleagues, especially my good friend and colleague Yi-Cheng Zhang, who drew me into this new field by asking me first to write book reviews and then articles for the econophysics web site www.unifr.ch/econophysics. I'm also very much indebted to Gene Stanley, who has made Physica A the primary econophysics journal, and has thereby encouraged work in this new field. I've learned from Doyne Farmer, Harry Thomas (who made me realize that I had to learn Ito calculus), Cris Moore, Johannes Skjeltorp, Joseph Hrgovcic, Kevin Bassler, George Reiter, Michel Dacorogna, Joachim Peinke, Paul Ormerod, Giovanni Dosi, Lei-Han Tang, Giulio Bottazzi, Angelo Secchi, and an anonymous former Enron employee (Chapter 5). Last but far from least, my old friend Arne Skjeltorp, the father of the theoretical economist Johannes Skjeltorp, has long been a strong source of support and encouragement for my work and life.
I end the Preface by explaining why Erwin Schrödinger's face decorates the cover of this book. Schrödinger was the first physicist to inspire others, with his Cambridge (1944) book What is Life?, to apply the methods of physics to a science beyond physics. He encouraged physicists to study the chromosome molecules/fibers that carry the "code-script." In fact, Schrödinger's phrase "code-script" is the origin of the phrase "genetic code." He attributed the discrete jumps called mutations to quantum jumps in chemical bonding. He also suggested that the stability of rules of heredity, in the absence of a large N limit that would be necessary for any macroscopic biological laws, must be due to the stability of the chromosome molecules (which he called linear "aperiodic crystals") formed via chemical bonding à la Heitler–London theory. He asserted that the code-script carries the complete set of instructions and mechanism required to generate any organism via cellular replication, and this is, as he had guessed without using the term, where the "complexity" lies. In fact, What is Life? was written parallel to (and independent of) Turing's and von Neumann's development of our first ideas of complexity. Now, the study of complexity includes economics and finance. As in Schrödinger's day, a new fertile research frontier has opened up.
Joe McCauley
Ehrwald (Tirol)
April 9, 2003
1 The moving target
1.1 Invariance principles and laws of nature The world is complicated and physics has made it appear relatively simple. Everything that we study in physics is reduced to a mathematical law of nature. At very small distances nature is governed by relativistic quantum field theory. At very large distances, for phenomena where both light speed and gravity matter, we have general relativity. In between, where neither atomic scale phenomena nor light speed matter, we have Newtonian mechanics. We have a law to understand and explain everything, at least qualitatively, except phenomena involving decisions made by minds. Our success in discovering that nature behaves mathematically has led to what a famous economist has described as “the Tarzan complex,” meaning that physicists are bold enough to break into fields beyond the natural sciences, beyond the safe realm of mathematical laws of nature. Where did our interest in economics and finance come from? From my own perspective, it started with the explosion of interest in nonlinear dynamics and chaos in the 1980s. Many years of work in that field formed the perspective put forth in this book. It even colors the way that I look at stochastic dynamics. From our experience in nonlinear dynamics we know that our simple looking local equations of motion can generate chaotic and even computationally complex solutions. In the latter case the digitized dynamical system is the computer and the digitized initial condition is the program. With the corresponding explosion of interest in “complexity,” both in dynamical systems theory and statistical physics, physicists are attempting to compete with economists in understanding and explaining economic phenomena, both theoretically and computationally. Econophysics – is it only a new word, a new fad? Will it persist, or is it just a desperate attempt by fundless physicists to go into business, to work where the “real money” is found? We will try to demonstrate in this text that econophysicists can indeed contribute to economic thinking, both critically and creatively. First, it is important
to have a clear picture of just how and why theoretical physics differs from economic theorizing. Eugene Wigner, one of the greatest physicists of the twentieth century and the acknowledged expert in symmetry principles, thought most clearly about these matters. He asked himself: why are we able to discover mathematical laws of nature at all? An historic example points to the answer. In order to combat the prevailing Aristotelian ideas, Galileo Galilei proposed an experiment to show that relative motion doesn't matter. Motivated by the Copernican idea, his aim was to explain why, if the earth moves, we don't feel the motion. His proposed experiment: drop a ball from the mast of a uniformly moving ship on a smooth sea. It will, he asserted, fall parallel to the mast just as if the ship were at rest. Galileo's starting point for discovering physics was therefore the principle of relativity. Galileo's famous thought experiment would have made no sense were the earth not a local inertial frame for times on the order of seconds or minutes.1 Nor would it have made sense if initial conditions like absolute position and absolute time mattered. The known mathematical laws of nature, the laws of physics, do not change on any time scale that we can observe. Nature obeys inviolable mathematical laws only because those laws are grounded in local invariance principles, local invariance with respect to frames moving at constant velocity (principle of relativity), local translational invariance, local rotational invariance and local time-translational invariance. These local invariances are the same whether we discuss Newtonian mechanics, general relativity or quantum mechanics. Were it not for these underlying invariance principles it would have been impossible to discover mathematical laws of nature in the first place (Wigner, 1967). Why is this? Because the local invariances form the theoretical basis for repeatable identical experiments whose results can be reproduced by different observers independently of where and at what time the observations are made, and independently of the state of relative motion of the observational machinery. In physics, therefore, we do not have merely models of the behavior of matter. Instead, we know mathematical laws of nature that cannot be violated intentionally. They are beyond the possibility of human invention, intervention, or convention, as Alan Turing, the father of modern computability theory, said of arithmetic in his famous paper proving that there are far more numbers that can be defined to "exist" mathematically than there are algorithms available to compute them.2
1 There exist in the universe only local inertial frames, those locally in free fall in the net gravitational field of other bodies, there are no global inertial frames as Mach and Newton assumed. See Barbour (1989) for a fascinating and detailed account of the history of mechanics.
2 The set of numbers that can be defined by continued fractions is uncountable and fills up the continuum. The set of algorithms available to generate initial conditions ("seeds") for continued fraction expansions is, in contrast, countable.
How are laws of nature discovered? As we well know, they are only established by repeatable identical (to within some decimal precision) experiments or observations. In physics and astronomy all predictions must in practice be falsifiable, otherwise we do not regard a model or theory as scientific. A falsifiable theory or model is one with few enough parameters and definite enough predictions (preferably of some new phenomenon) that it can be tested observationally and, if wrong, can be proven wrong. The cosmological principle (CP) may be an example of a model that is not falsifiable.3 A nonfalsifiable hypothesis may belong to the realm of philosophy or religion, but not to science. But we face more in life than can be classified as science, religion or philosophy: there is also medicine, which is not a completely scientific field, especially in everyday diagnosis. Most of our own daily decisions must be made on the basis of experience, bad information and instinct without adequate or even any scientific basis. For a discussion of an alternative to Galilean reasoning in the social field and medical diagnosis, see Carlo Ginzburg's (1992) essay on Clues in Clues, Myths, and the Historical Method, where he argues that the methods of Sherlock Holmes and art history are more fruitful in the social field than scientific rigor. But then this writer does not belong to the school of thought that believes that everything can be mathematized. Indeed, not everything can be. As von Neumann wrote, a simple system is one that is easier to describe mathematically than it is to build (the solar system, deterministic chaos, for example). In contrast, a complex system is easier to make than it is to describe completely mathematically (an embryo, for example). See Berlin (1998) for a nonmathematical discussion of the idea that there may be social problems that are not solvable.
3 The CP assumes that the universe is uniform at large enough distances, but out to the present limit of 170 Mpc h⁻¹ we see nothing but clusters of clusters of galaxies, with no crossover to homogeneity indicated by reliable data analyses.
1.2 Humanly invented law can always be violated
Anyone who has taken both physics and economics classes knows that these subjects are completely different in nature, notwithstanding the economists' failed attempt to make economics look like an exercise in calculus, or the finance theorists' failed attempt to portray financial markets as a subset of the theory of stochastic processes obeying the Martingale representation theorem. In economics, in contrast with physics, there exist no known inviolable mathematical laws of "motion"/behavior. Instead, economic law is either legislated law, dictatorial edict, contract, or in tribal societies the rule of tradition. Economic "law," like any legislated law or social contract, can always be violated by willful people and groups. In addition, the idea of falsification via observation has not yet taken root. Instead, an internal logic system called neo-classical economic theory was invented via postulation and dominates academic economics. That theory is not derived from empirical data. The good news, from our standpoint, is that some specific predictions of the theory are falsifiable. In fact, there is so far no evidence at all for the validity of the theory from any real market data. The bad news is that this is the standard theory taught in economics textbooks, where there are many "graphs" but few if any that can be obtained from or justified by unmassaged, real market data. In his very readable book Intermediate Microeconomics, Hal Varian (1999), who was a dynamical systems theorist before he was an economist, writes that much of (neo-classical) economics (theory) is based on two principles.
The optimization principle. People try to choose the best patterns of consumption they can afford.
The equilibrium principle. Prices adjust until the amount that people demand of something is equal to the amount that is supplied.
Both of these principles sound like common sense, and we will see that they turn out to be more akin to common sense than to science. They have been postulated as describing markets, but lack the required empirical underpinning. Because the laws of physics, or better said the known laws of nature, are based on local invariance principles, they are independent of initial conditions like absolute time, absolute position in the universe, and absolute orientation. We cannot say the same about markets: socio-economic behavior is not necessarily universal but may vary from country to country. Mexico is not like China, which in turn is not like the USA, which in turn is not like Germany. Many econophysicists, in agreement with economists, would like to ignore the details and hope that a single universal “law of motion” governs markets, but this idea remains only a hope, not a reality. There are no known socio-economic invariances to support that hope. The best we can reasonably hope for in economic theory is a model that captures and reproduces the essentials of historical data for specific markets during some epoch. We can try to describe mathematically what has happened in the past, but there is no guarantee that the future will be the same. Insurance companies provide an example. There, historic statistics are used with success in making money under normally expected circumstances, but occasionally there comes a “surprise” whose risk was not estimated correctly based on past statistics, and the companies consequently lose a lot of money through paying claims. Econophysicists aim to be at least as successful in the modeling of financial markets, following Markowitz, Osborne, Mandelbrot, Sharpe, Black, Scholes, and Merton, who were the pioneers of finance theory. The insurance industry, like econophysics, uses historic statistics
and mathematics to try to estimate the probability of extreme events, but the method of this text differs significantly from their methods. Some people will remain unconvinced that there is a practical difference between economics and the hardest unsolved problems in physics. One might object: we can’t solve the Navier–Stokes equations for turbulence because of the butterfly effect or the computational complexity of the solutions of those equations, so what’s the difference with economics? Economics cannot be fairly compared with turbulence. In fluid mechanics we know the equations of motion based on Galilean invariance principles. In turbulence theory we cannot predict the weather. However, we understand the weather physically and can describe it qualitatively and reliably based on the equations of thermo-hydrodynamics. We understand very well the physics of formation and motion of hurricanes and tornadoes even if we cannot predict when and where they will hit. In economics, in contrast, we do not know any universal laws of markets that could be used to explain even qualitatively correctly the phenomena of economic growth, bubbles, recessions, depressions, the lopsided distribution of wealth, the collapse of Marxism, and so on. We cannot use mathematics systematically to explain why Argentina, Brazil, Mexico, Russia, and Thailand collapsed financially after following the advice of neo-classical economics and deregulating, opening up their markets to external investment and control. We cannot use the standard economic theory to explain mathematically why Enron and WCom and the others collapsed. Such extreme events are ruled out from the start by assuming equilibrium in neo-classical economic theory, and also in the standard theory of financial markets and option prices based on expectations of small fluctuations. Econophysics is not like academic economics. We are not trying to make incremental improvements in theory, as Yi-Cheng Zhang has so poetically put it, we are trying instead to replace the standard models with something completely new. Econophysics began in this spirit in 1958 with M. F. M. Osborne’s discovery of Gaussian stock market returns, Benoit Mandelbrot’s emphasis on distributions with fat tails, and then Osborne’s empirically based criticism of neo-classical economics theory in 1977, where he suggested an alternative formulation of supply and demand behavior. Primarily, though, world events and new research opportunities drew many physicists into finance. As Philip Mirowski (2002) emphasizes in his book Machine Dreams, the advent of physicists working in large numbers in finance coincided with the reduction in physics funding after the collapse of the USSR. What Mirowski does not emphasize is that it also coincides, with a time lag of roughly a decade, with the advent of the Black–Scholes theory of option pricing and the simultaneous start of large-scale options trading in Chicago, the advent of deregulation as a dominant government philosophy in the 1980s and beyond, and
in the 1990s the collapse of the USSR and the explosion of computing technology with the collection of high-frequency finance data. All of these developments opened the door to the globalization of capital and led to a demand on modeling and data analysis in finance that many physicists have found fascinating and lucrative, especially since the standard theories (neo-classical in economics, Black–Scholes in finance) do not describe markets correctly.
1.3 Where are we headed?
Economic phenomena provide us with data. Data are analyzed by economists in a subfield called econometrics (the division of theory and data analysis in the economics profession is Bourbakian). The main tool used in the past in econometrics was regression analysis, which so far has not led to any significant insight into economic phenomena. Regression analysis cannot be used to isolate cause and effect and therefore does not lead to new qualitative understanding. Worse, sometimes data analyses and model-based theoretical expectations are mixed together in a way that makes the resulting analysis useless. An example of regression analysis is the "derivation" of the Phillips curve (Ormerod, 1994), purporting to show the functional relationship between inflation and employment (see Figure 1.1). To obtain that curve a straight line is drawn through a big scatter of points that don't suggest that any curve at all should be drawn through them (see the graphs in McCandless (1991) for some examples).
Figure 1.1. The data points represent the inflation rate I vs the unemployment rate U. The straight line is an example from econometrics of the misapplication of regression analysis, because no curve can describe the data.
Econometrics, regression analysis, does not lead to isolation of cause and effect. Studying correlations is not the same as understanding how and why certain phenomena have occurred. International and governmental banks (like the Federal Reserve) use many-parameter econometric models to try to make economic forecasts. Were these models applied to something simpler, namely the stock market, you would lose money by placing bets based on those predictions (Bass, 1991). In other words, the models are too complicated and based on too few good ideas and too many unknown parameters to be very useful. The falsification of a many-parameter econometric model would require extremely accurate data, and even then the model cannot be falsified if it has too many unknown or badly known parameters. So far, neither econophysicists nor alternative economists (non neo-classical economists) have come up with models that are adequate to dislodge neo-classical economic theory from its role as king of the textbooks. An aim of this book is to make it clear to the reader that neo-classical theory, beloved of pure mathematicians, is a bad place to start in order to make new models of economic behavior. This includes the neo-classical idea of Nash equilibria in game theory. In order to avoid reinventing a square wheel, it would be good for econophysicists to gain an overview of what's been done in economic theory since World War II, since the advent of both game
theory (which von Neumann abandoned in economics) and automata (which he believed to be a fruitful path, but has so far borne no fruit). The main reason for the popularity with physicists of analyzing stock, bond, and foreign exchange markets is that those markets provide very accurate "high frequency data," meaning data on a time scale from seconds upward. Markets outside finance do not provide data of comparable accuracy. Finance is therefore the best empirical testing ground for new behavioral models. Interesting alternative work in modeling, paying attention to limitations on our ability to gather, process, and interpret information, is carried out in several schools of economics in northern Italy and elsewhere, but so far the Black–Scholes option pricing model is the only falsifiable and partly successful model within the economic sciences. But "Why," asked a former student of economics, "do physicists believe that they are more qualified than economists to explain economic phenomena?4 And if physicists, then why not also mathematicians, chemists, and biologists, all other natural scientists as well?"
4 Maybe many citizens of Third World countries would say that econophysicists could not do worse, and might even do better.
I responded that mathematicians do work in economics, but they tend to be postulatory and to ignore real data. Then I talked with a colleague and came up with a better answer: chemists and biologists are trained to concentrate on details. Physicists are trained to see the connections between seemingly different
phenomena, to try to get a glimpse of the big picture and to present the simplest possible mathematical description of a phenomenon that includes as many links as are necessary, but not more. With that in mind, let’s get to work and sift through the evidence from one econophysicist’s viewpoint. Most of this book is primarily about that part of economics called finance, because that is where comparisons with empirical data lead to the clearest conclusions.
2 Neo-classical economic theory
2.1 Why study "optimizing behavior"?
We live in a time of widespread belief in an economic model, a model that emphasizes deregulated markets with the reduction and avoidance of government intervention in socio-economic problems. This belief gained ground explosively after the collapse of the competing extreme ideology, communism. After many decades of rigorous attempts at central planning, communism has been thoroughly discredited in our age. The winning side now advances globalization via rapid privatization and deregulation of markets.1
1 The vast middle ground represented by the regulation of free markets, along with the idea that markets do not necessarily provide the best solution to all social problems, is not taught by "Pareto efficiency" in the standard neo-classical model.
The dominant theoretical economic underpinning for this ideology is provided by neo-classical equilibrium theory, also called optimizing behavior, and is taught in standard economics texts. Therefore it is necessary to know what the model's assumptions are and to understand how its predictions compare empirically with real, unmassaged data. We will see, among other things, that although the model is used to advise governments, businesses, and international lending agencies on financial matters, the neo-classical model relies on presumptions of stability and equilibrium in a way that completely excludes the possibility of discussing money/capital and financial markets! It is even more strange that the standard equilibrium model completely excludes the profit motive as well in describing markets: the accumulation of capital is not allowed within the confines of that model, and, because of the severe nature of the assumptions required to guarantee equilibrium, cannot be included perturbatively either. This will all be discussed below. Economists distinguish between classical and neo-classical economic ideas. Classical theory began with Adam Smith, and neo-classical theory began with
Walras, Pareto, I. Fisher and others. Adam Smith (2000) observed society qualitatively and invented the notion of an Invisible Hand that hypothetically should match supply to demand in free markets. When politicians, businessmen, and economists assert that "I believe in the law of supply and demand" they implicitly assume that Smith's Invisible Hand is in firm control of the market. Mathematically formulated, the Invisible Hand represents the implicit assumption that a stable equilibrium point determines market dynamics, whatever those dynamics may be. This philosophy has led to an elevated notion of the role of markets in our society. Exactly how the Invisible Hand should accomplish the self-regulation of free markets and avoid social chaos is something that economists have not been able to explain satisfactorily. Adam Smith was not completely against the idea of government intervention and noted that it is sometimes necessary. He did not assert that free markets are always the best solution to all socio-economic problems. Smith lived in a Calvinist society and also wrote a book about morals. He assumed that economic agents (consumers, producers, traders, bankers, CEOs, accountants) would exercise self-restraint in order that markets would not be dominated by greed and criminality. He believed that people would regulate themselves, that self-discipline would prevent foolishness and greed from playing the dominant role in the market. This is quite different from the standard belief, which elevates self-interest and deregulation to the level of guiding principles. Varian (1999), in his text Intermediate Microeconomics, shows via a rent control example how to use neo-classical reasoning to "prove" mathematically that free-market solutions are best, that any other solution is less efficient. This is the theory that students of economics are most often taught. We therefore present and discuss it critically in the next sections. Supra-governmental organizations like the World Bank and the International Monetary Fund (IMF) rely on the neo-classical equilibrium model in formulating guidelines for extending loans (Stiglitz, 2002). After you understand this chapter you will be in a better position to understand what ideas lie underneath whenever one of those organizations announces that a country is in violation of its rules.
2.2 Dissecting neo-classical economic theory (microeconomics)
In economic theory we speak of "agents." In neo-classical theory agents consist of consumers and producers. Let x = (x1, . . ., xn), where xk denotes the quantity of asset k held or desired by a consumer. x1 may be the number of VW Golfs, x2 the number of Phillips TV sets, x3 the number of ice cream cones, etc. These are demanded by a consumer at prices given by p = (p1, . . ., pn). Neo-classical theory describes the behavior of a so-called "rational agent." By "rational
agent" the neo-classicals mean the following. Each consumer is assumed to perform "optimizing behavior." By this is meant that the consumer's implicit mental calculations are assumed equivalent to maximizing a utility function U(x) that is supposed to describe his or her ordering of preferences for these assets, limited only by his or her budget constraint M, where
M = Σ_{k=1}^{n} p_k x_k = p · x    (2.1)
Here, for example, M equals five TV sets, each demanded at price 230 Euros, plus three VW Golfs, each wanted at 17 000 Euros, and other items. In other words, M is the sum of the number of each item wanted by the consumer times the price he or she is willing to pay for it. That is, complex calculations and educated guesses that might require extensive information gathering, processing and interpretation capability by an agent are vastly oversimplified in this theory and are replaced instead by maximizing a simple utility function in the standard theory. A functional form of the utility U(x) cannot be deduced empirically, but U is assumed to be a concave function of x in order to model the expectation of "decreasing returns" (see Arthur (1994) for examples and models of increasing returns and feedback effects in markets). By decreasing returns we mean that we are willing to pay less for n Ford Mondeos than we are for n − 1, less for n − 2 than for n − 1, and so on. An example of such a utility is U(x) = ln x (see Figure 2.1).
Figure 2.1. Utility vs quantity x demanded for decreasing returns.
But what about producers? Optimizing behavior on the part of a producer means that the producer maximizes profits subject to his or her budget constraint. We intentionally leave out savings because there is no demand for liquidity (money as cash) in this theory. The only role played here by money is as a bookkeeping device. This is explained below.
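As a concrete check on the bookkeeping in equation (2.1), here is a minimal Python sketch; the basket, with its item names and prices, simply restates the hypothetical example above and is illustrative only.

```python
# Budget constraint M = sum over k of p_k * x_k, equation (2.1).
# Hypothetical basket taken from the example in the text above.
quantities = {"TV set": 5, "VW Golf": 3}           # x_k: units demanded
prices = {"TV set": 230.0, "VW Golf": 17_000.0}    # p_k: euros per unit

M = sum(prices[item] * quantities[item] for item in quantities)
print(f"M = {M:.2f} euros")   # 5*230 + 3*17000 = 52150.00
```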
Figure 2.2. Neo-classical demand curve, downward sloping for the case of decreasing returns.
Each consumer is supposed to maximize his or her own utility function while each producer is assumed to maximize his or her profit. As consumers we therefore maximize utility U(x) subject to the budget constraint (2.1),
dU − p · dx/λ = 0    (2.2)
where 1/λ is a Lagrange multiplier. We can just as well take p/λ as price p since λ changes only the price scale. This yields the following result for a consumer's demand curve, describing algebraically what the consumer is willing to pay for more and more of the same item,
p = ∇U(x) = f(x)    (2.3)
with slope p of the bidder's price decreasing toward zero as x goes to infinity, as with U(x) = ln x and p = 1/x, for example (see Figure 2.2). Equation (2.3) is a key prediction of neo-classical economic theory because it turns out to be falsifiable. Some agents buy while others sell, so we must invent a corresponding supply schedule. Let p = g(x) denote the asking price of assets x supplied. Common sense suggests that the asking price should increase as the quantity x supplied increases (because increasing price will induce suppliers to increase production), so that neo-classical supply curves slope upward. The missing piece, so far, is that market clearing is assumed: everyone who wants to trade finds someone on the opposite side and matches up with him or her. The market clearing price is the equilibrium price, the price where total demand equals total supply. There is no dissatisfaction in such a world, dissatisfaction being quantified as excess demand, which vanishes.
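To illustrate how (2.2)–(2.3) are used in the one-good case, the following short sketch (our illustration; the logarithmic utility is just the example mentioned above) computes the demand curve p = ∇U(x) symbolically and confirms that the bid price 1/x falls toward zero as the quantity grows.

```python
import sympy as sp

x = sp.symbols("x", positive=True)
U = sp.log(x)          # concave utility U(x) = ln x: decreasing returns
p = sp.diff(U, x)      # demand curve p = dU/dx = f(x), the one-good case of (2.3)
print(p)               # 1/x

# bid price at a few quantities: it decreases toward zero as x grows
for q in (1, 2, 10, 100):
    print(q, float(p.subs(x, q)))   # 1.0, 0.5, 0.1, 0.01
```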
Figure 2.3. Neo-classical predictions for demand and supply curves p = f (x) and p = g(x) respectively. The intersection determines the idea of neo-classical equilibrium, but such equilibria are typically ruled out by the dynamics.
But even an idealized market will not start from an equilibrium point, because arbitrary initial bid and ask prices will not coincide. How, in principle, can an idealized market of utility maximizers clear itself dynamically? That is, how can a nonequilibrium market evolve toward equilibrium? To perform "optimizing behavior" the agents must know each other's demand and supply schedules (or else submit them to a central planning authority)2 and then agree to adjust their prices to produce clearing.
2 Mirowski (2002) points out that socialists were earlier interested in the theory because, if the Invisible Hand worked purely mechanically, then it would mean that the market should be amenable to central planning. The idea was to simulate the free market via mechanized optimal planning rules that mimic a perfect market, and thereby beat the performance of real markets.
In this hypothetical picture everyone who wants to trade does so successfully, and this defines the equilibrium price (market clearing price), the point where the supply and demand curves p = g(x) and p = f(x) intersect (Figure 2.3). There are several severe problems with this picture, and here is one: Kenneth Arrow has pointed out that supply and demand schedules for the infinite future must be presented and read by every agent (or a central market maker). Each agent must know at the initial time precisely what he or she wants for the rest of his or her life, and must allocate his or her budget accordingly. Otherwise, dissatisfaction leading to new further trades (nonequilibrium) could occur later. In neo-classical theory, no trades are made at any nonequilibrium price. Agents must exchange information, adjust their prices until equilibrium is reached, and then goods are exchanged. The vanishing of excess demand, the condition for equilibrium, can be formulated as follows: let xD = D(p) denote the quantity demanded, the demand function. Formally, this should be the inverse of p = f(x) if the inverse of f exists. Also,
let xS = S(p) (the inverse of p = g(x), if this inverse exists) denote the quantity supplied. In equilibrium we would have vanishing excess demand
xD − xS = D(p) − S(p) = 0    (2.4)
The equilibrium price, if one or more exists, solves this set of n simultaneous nonlinear equations. The excess demand is simply
ε(p) = D(p) − S(p)    (2.5)
and fails to vanish away from equilibrium. Market efficiency e can be defined as
e(p) = min(S/D, D/S)    (2.6)
so that e = 1 in equilibrium. Note that, more generally, efficiency e must depend on both bid and ask prices if the spread between them is large. Market clearing is equivalent to assuming 100% efficiency. One may rightly have doubts that 100% efficiency is possible in any process that depends on the gathering, exchange and understanding of information, the production and distribution of goods and services, and other human behavior. This leads to the question whether market equilibrium can provide a good zeroth-order approximation to any real market. A good zeroth-order approximation is one where a real market can then be described accurately perturbatively, by including corrections to equilibrium as higher order effects. That is, the equilibrium point must be stable. A quick glance at any standard economics text (see, for example, Mankiw (2000) or Varian (1999)) will show that equilibrium is assumed both to exist and to be stable. The assumption of a stable equilibrium point is equivalent to assuming the existence of Adam Smith's Invisible Hand. The assumption of uniqueness, of a single global equilibrium, is equivalent to assuming the universality of the action of the Invisible Hand independently of initial conditions. Here, equilibrium would have to be an attractive fixed point with infinite basin of attraction in price space. Arrow (Arrow and Hurwicz, 1958) and other major contributors to neo-classical economic theory went on to formulate "General Equilibrium Theory" using
dp/dt = ε(p)    (2.7)
and discovered the mathematical conditions that guarantee a unique, stable equilibrium (again, no trades are made in the theory so long as dp/dt ≠ 0). The equation simply assumes that prices do not change in equilibrium (where excess demand vanishes), that they increase if excess demand is positive, and decrease if excess demand is negative. The conditions discovered by Arrow and others are that all
agents must have perfect foresight for the infinite future (all orders for the future are placed at the initial time, although delivery may occur later as scheduled), and every agent conforms to exactly the same view of the future (the market, which is “complete,” is equivalent to the perfect cloning of a single agent as “utility computer” that can receive all the required economic data, process them, and price all his future demands in a very short time). Here is an example: at time t = 0 you plan your entire future, ordering a car on one future date, committing to pay for your children’s education on another date, buying your vacation house on another date, placing all future orders for daily groceries, drugs, long-distance charges and gasoline supplies, and heart treatment as well. All demands for your lifetime are planned and ordered in preference. In other words, your and your family’s entire future is decided completely at time zero. These assumptions were seen as necessary in order to construct a theory where one could prove rigorous mathematical theorems. Theorem proving about totally unrealistic markets became more important than the empirics of real markets in this picture. Savings, cash, and financial markets are irrelevant here because no agent needs to set aside cash for an uncertain future. How life should work for real agents with inadequate or uncertain lifelong budget constraints is not and can not be discussed within the model. In the neo-classical model it is possible to adjust demand schedules somewhat, as new information becomes available, but not to abandon a preplanned schedule entirely. The predictions of the neo-classical model of an economic agent have proven very appealing to mathematicians, international bankers, and politicians. For example, in the ideal neo-classical world, free of government regulations that hypothetically promote only inefficiency, there is no unemployment. Let L denote the labor supply. With dL/dt = ε(L), in equilibrium ε(L) = 0 so that everyone who wants to work has a job. This illustrates what is meant by maximum efficiency: no resource goes unused. Whether every possible resource (land as community meadow, or public walking path, for example) ought to be monetized and used economically is taken for granted, is not questioned in the model, leading to the belief that everything should be priced and traded (see elsewhere the formal idea of Arrow–Debreu prices, a neoclassical notion that foreshadowed in spirit the idea of derivatives). Again, this is a purely postulated abstract theory with no empirical basis, in contrast with real markets made up of qualitatively different kinds of agents with real desires and severe limitations on the availability of information and the ability to sort and correctly interpret information. In the remainder of this chapter we discuss scientific criticism of the neo-classical program from both theoretical and empirical viewpoints, starting with theoretical limitations on optimizing behavior discovered by three outstanding neo-classical theorists.
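Before turning to those criticisms, it may help to see what the price-adjustment rule (2.7) looks like numerically. The following sketch is an illustration only, not taken from the text: the one-good linear demand and supply schedules, the parameter values, and the Euler integration step are arbitrary assumptions.

```python
import numpy as np

# Hypothetical one-good market: excess demand eps(p) = D(p) - S(p)
# with assumed linear schedules D(p) = a - b*p and S(p) = c + d*p.
a, b, c, d = 10.0, 1.0, 2.0, 1.0           # illustrative parameters
eps = lambda p: (a - b * p) - (c + d * p)   # excess demand
p_star = (a - c) / (b + d)                  # equilibrium price where eps(p*) = 0

# Euler integration of dp/dt = eps(p), the price-adjustment rule (2.7)
p, dt = 1.0, 0.01
for _ in range(2000):
    p += eps(p) * dt

print(f"equilibrium price p* = {p_star:.3f}, integrated price = {p:.3f}")
```

For this linear toy case the equilibrium is a globally attracting fixed point, which is exactly the stability property that, as the following sections argue, cannot be taken for granted once more than a few goods are involved.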
2.3 The myth of equilibrium via perfect information

In real markets, supply and demand determine nonequilibrium prices. There are bid prices by prospective buyers and ask prices by prospective sellers, so by "price" we mean here the price at which the last trade occurred. This is not a clear definition for a slow-moving, illiquid market like housing, but is well-enough defined for trades of Intel, Dell, or a currency like the Euro, for example. The simplest case for continuous time trading, an idealization of limited validity, would be an equation of the form

dp/dt = D(p, t) − S(p, t) = ε(p, t)
(2.8)
where pk is the price of an item like a computer or a cup of coffee, D is the demand at price p, S is the corresponding supply, and the vector field ε is the excess demand. Phase space is just the n-dimensional p-space, and is flat with no metric (the ps in (2.8) are always Cartesian (McCauley, 1997a)). More generally, we could assume that d p/dt = f (ε( p, t)), where f is any vector field with the same qualitative properties as the excess demand. Whatever the choice, we must be satisfied with studying topological classes of excess demand functions, because the excess demand function cannot be uniquely specified by the theory. Given a model, equilibrium is determined by vanishing excess demand, by ε = 0. Stability of equilibrium, when equilibria exist at all, is determined by the behavior of solutions displaced slightly from an equilibrium point. Note that dynamics requires only that we specify x = D(p), not p = f (x), and likewise for the supply schedule. The empirical and theoretical importance of this fact will become apparent below. We must also specify a supply function x = S( p). If we assume that the production time is long on the time scale for trading then we can take the production function to be constant, the “initial endowment,” S( p) ≈ x0 , which is just the total supply at the initial time t0 . This is normally assumed in papers on neo-classical equilibrium theory. In this picture agents simply trade what is available at time t = 0, there is no new production (pure barter economy). With demand assumed slaved to price in the form x = D( p), the phase space is the n-dimensional space of the prices p. That phase space is flat means that global parallelization of flows is possible for integrable systems. The n-component ordinary differential equation (2.8) is then analyzed qualitatively in phase space by standard methods. In general there are n − 1 time-independent locally conserved quantities, but we can use the budget constraint to show that one of these conservation laws is global: if we form the scalar product of p with excess demand ε then
applying the budget constraint to both D and S yields

p̃ε(p) = 0
(2.9)
The underlying reason for this constraint, called Walras's Law, is that capital and capital accumulation are not allowed in neo-classical theory: neo-classical models assume a pure barter economy, so that the cost of the goods demanded can only equal the cost of the goods offered for sale. This condition means simply that the motion in the n-dimensional price space is confined to the surface of an (n − 1)-dimensional sphere. Therefore, the motion is at most (n − 1)-dimensional. What the motion looks like on this hypersphere for n > 3 is a question that cannot be answered a priori without specifying a definite class of models. Hyperspheres in dimensions n = 3 and 7 are flat with torsion, which is nonintuitive (Nakahara, 1990). Given a model of excess demand we can start by analyzing the number and character of equilibria and their stability. Beyond that, one can ask whether the motion is integrable. Typically, the motion for n > 3 is nonintegrable and may be chaotic or even complex, depending upon the topological class of model considered. As an example of how easy it is to violate the expectation of stable equilibrium within the confines of optimizing behavior, we present next the details of H. Scarf's model (Scarf, 1960). In that model consider three agents with three assets. The model is defined by assuming, for agent number 1, the individual utility

U_1(x) = min(x_1, x_2)
(2.10)
and an initial endowment for agent number 1 x0 = (1, 0, 0)
(2.11)
The utilities and endowments of the other two agents are cyclic permutations on the above. Agent k has one item of asset k to sell and none of the other two assets. Recall that in neo-classical theory the excess demand equation (2.8) is interpreted only as a price-adjustment process, with no trades taking place away from equilibrium. If equilibrium is reached then the trading can only be cyclic with each agent selling his asset and buying one asset from one of the other two agents: either agent 1 sells to agent 2 who sells to agent 3 who sells to agent 1, or else agent 1 sells to agent 3 who sells to agent 2 who sells to agent 1. Nothing else is possible at equilibrium. Remember that if equilibrium is not reached then, in this picture, no trades occur. Also, the budget constraint, which is agent k’s income from selling his single unit of asset k if the market clears (he or she has no other source of income other than
from what he or she sells), is

M = p̃x_0 = p_k
(2.12)
Because cyclic trading of a single asset is required, one can anticipate that equilibrium can be possible only if p1 = p2 = p3 . In order to prove this, we need the idea of “indifference curves.” The idea of indifference curves in utility theory, discussed by I. Fisher (Mirowski, 1989), may have arisen in analogy with either thermodynamics or potential theory. Indifference surfaces are defined in the following way. Let U (x1 , . . . , xn ) = C = constant. If the implicit function theorem is satisfied then we can solve to find one of the xs, say xi , as a function of the other n − 1 xs and C. If we hold all xs in the argument of f constant but one, say x j , then we get an “indifference curve” xi = f (x j , C)
(2.13)
We can move along this curve without changing the utility U for our "rational preferences." This idea will be applied in an example below. The indifference curves for agent 1 are as follows. Note first that if x_2 > x_1 then x_1 = C whereas if x_2 < x_1 then x_2 = C. Graphing these results yields as indifference curves x_2 = f(x_1) = x_1. Note also that p_3 is constant. Substituting the indifference curves into the budget constraint yields the demand vector components for agent 1 as

x_1 = M/(p_1 + p_2) = D_1(p)
x_2 = M/(p_1 + p_2) = D_2(p)
x_3 = 0
(2.14)
The excess demand for agent 1 is therefore given by

ε_11 = p_1/(p_1 + p_2) − 1 = −p_2/(p_1 + p_2)
ε_12 = p_1/(p_1 + p_2)
ε_13 = 0
(2.15)
where εi j is the jth component of agent i’s excess demand vector. We obtain the excess demands for agents 2 and 3 by cyclic permutation of indices. The kth component of total excess demand for asset k is given by summing over agents εk = ε1k + ε2k + ε3k
(2.16)
so that

ε_1 = −p_2/(p_1 + p_2) + p_3/(p_1 + p_3)
ε_2 = −p_3/(p_2 + p_3) + p_1/(p_1 + p_2)
ε_3 = −p_1/(p_3 + p_1) + p_2/(p_2 + p_3)
(2.17)
The excess demand has a symmetry that reminds us of rotations on the sphere. In equilibrium ε = 0 so that p1 = p2 = p3
(2.18)
is the only equilibrium point. It is easy to see that there is a second global conservation law

p_1 p_2 p_3 = C
(2.19)

following from

ε_1 p_2 p_3 + ε_2 p_1 p_3 + ε_3 p_1 p_2 = 0
(2.20)
With two global conservation laws the motion on the 3-sphere is globally integrable, chaotic motion is impossible (McCauley, 1997a). It is now easy to see that there are initial data on the 3-sphere from which equilibrium cannot be reached. For example, let ( p10 , p20 , p30 ) = (1, 1, 1)
(2.21)
so that

p_1² + p_2² + p_3² = 3
(2.22a)
Then with p10 p20 p30 = 1 equilibrium occurs but for other initial data the plane is not tangent to the sphere at equilibrium and equilibrium cannot be reached. The equilibrium point is an unstable focus enclosed by a stable limit cycle. In general, the market oscillates and cannot reach equilibrium. For four or more assets it is easy to write down models of excess demand for which the motion is chaotic (Saari, 1995). The neo-classical theorist Roy Radner (1968) arrived at a much stronger criticism of the neo-classical theory from within. Suppose that agents have slightly different information initially. Then equilibrium is not computable. That is, the information demands made on agents are so great that they cannot locate equilibrium. In other words, maximum computational complexity enters when we deviate even slightly from the idealized case. It is significant that if agents cannot find an equilibrium
point, then they cannot agree on a price that will clear the market. This is one step closer to the truth: real markets are not approximated by the neo-classical equilibrium model. Radner also points out that liquidity demand, the demand for cash as savings, for example, arises from two basic sources. First, in a certain but still neo-classical world liquidity demand would arise because agents cannot compute equilibrium, cannot locate it. Second, the demand for liquidity arises from uncertainty about the future. The notion that liquidity reflects uncertainty will appear when we study the dynamics of financial markets. In neo-classical equilibrium theory, perfect information about the infinite future is required and assumed. In reality, information acquired at one time is incomplete and tends to become degraded as time goes on. Entropy change plays no role in neoclassical economic theory in spite of the fact that, given a probability distribution reflecting the uncertainty of events in a system (the market), the Gibbs entropy describes both the accumulation and degradation of information. Neo-classical theory makes extreme demands on the ability of agents to gather and process information but, as Fischer Black wrote, it is extremely difficult in practice to know what is noise and what is information (we will discuss Black’s 1986 paper “Noise” in Chapter 4). For example, when one reads the financial news one usually only reads someone else’s opinion, or assertions based on assumptions that the future will be more or less like the past. Most of the time, what we think is information is probably more like noise or misinformation. This point of view is closer to finance theory, which does not use neo-classical economics as a starting point. Another important point is that information should not be confused with knowledge (Dosi, 2001). The symbol string “saht” (based on at least a 26 letter alphabet a–z) has four digits of information, but without a rule to interpret it the string has no meaning, no knowledge content. In English we can give meaning to the combinations “hast,” “hats,” and “shat.” Information theory is based on the entropy of all possible strings that one can make from a given number of symbols, that number being 4! = 24 in this example, but “information” in standard economics and finance theory does not make use of entropy. Neo-classical economic theory assumes 100% efficiency (perfect matching a buyer to every seller, and vice versa), but typical markets outside the financial ones3 are highly illiquid and inefficient (housing, automobiles, floorlamps, carpets, etc.) where it is typically relatively hard to match buyers to sellers. Were it easy to match buyers to sellers, then advertising and inventory would be largely superfluous. Seen from this standpoint, one might conclude that advertising may distort markets instead of making them more efficient. Again, it would be important to 3
Financial markets are far from 100% efficient, excess demand does not vanish due to outstanding limit orders.
distinguish advertising as formal “information” from knowledge of empirical facts. In financial markets, which are usually very liquid (with a large volume of buy and sell executions per second), the neo-classical economic ideas of equilibrium and stability have proven completely useless in the face of the available empirical data. 2.4 How many green jackets does a consumer want? An empirically based criticism of neo-classical theory was provided in 1973 by M. F. M. Osborne, whom we can regard as the first econophysicist. According to the standard textbook argument, utility maximization for the case of diminishing returns predicts price as a function of demand, p = f (x), as a downward-sloping curve (Figure 2.2). Is there empirical evidence for this prediction? Osborne tried without success to find empirical evidence for the textbook supply and demand curves (Figure 2.3), whose intersection would determine equilibrium. This was an implicit challenge to the notion that markets are in or near equilibrium. In the spirit of Osborne’s toy model of a market for red dresses, we now provide a Gedankenexperiment to illustrate how the neo-classical prediction fails for individual agents. Suppose that I’m in the market for a green jacket. My neo-classical demand curve would then predict that I, as consumer, would have the following qualitative behavior, for example: I would want/bid to buy one green jacket for $50, two for $42.50 each, three for $31.99 each, and so on (and this hypothetical demand curve would be continuous!). Clearly, no consumer thinks this way. This is a way of illustrating Osborne’s point, that the curve p = f (x) does not exist empirically for individual agents. What exist instead, Osborne argues, are the functions x = D( p) and x = S( p), which are exactly the functions required for excess demand dynamics (2.8). Osborne notes that these functions are not invertible, implying that utility can not explain real markets. One can understand the lack of invertibility by modeling my demand for a green jacket correctly. Suppose that I want one jacket and am willing to pay a maximum of $50. In that case I will take any (suitable) green jacket for $50 or less, so that my demand function is a step function x = ϑ($50 − p), as shown in Figure 2.4. The step function ϑ is zero if p > $50, unity if p ≤ $50. Rarely, if ever, is a consumer in the market for two green jackets, and one is almost never interested in buying for three or more at one time. Nevertheless, the step function can be used to include these rare cases. This argument is quite general: Osborne points out that limit bid/ask orders in the stock market are also step functions (one can see this graphically in delayed time on the web site 3DStockCharts.com). Limit orders and the step demand function for green jackets provide examples of the
Figure 2.4. Empirical demand functions are step functions.
falsification of the neo-classical prediction that individual agents have downwardsloping demand curves p = f (x). With or without equilibrium, the utility-based prediction is wrong. Optimizing behavior does not describe even to zeroth order how individual agents order their preferences. Alternatives like wanting one or two of several qualitatively different jackets can also be described by step functions just as limit orders for different stocks are described by different step functions. The limit order that is executed first wins, and the other orders are then cancelled unless there is enough cash for more than one order.
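To illustrate Osborne's point in code, here is a small sketch (an illustration, not from the text) of the step-function demand x = ϑ($50 − p) and of what happens when many such step functions are added up; the number of agents and the spread of their reservation prices are arbitrary assumptions.

```python
import numpy as np

def individual_demand(p, limit=50.0):
    """Step-function demand: one jacket at any price at or below the limit price."""
    return 1.0 if p <= limit else 0.0

# Aggregate demand from many agents, each with a single reservation price.
rng = np.random.default_rng(0)
limits = rng.uniform(20.0, 80.0, size=1000)   # assumed spread of reservation prices

def aggregate_demand(p):
    return sum(1.0 for lim in limits if p <= lim)

for p in (10, 30, 50, 70, 90):
    print(p, individual_demand(p), aggregate_demand(p))
```

The individual curve is a step, not a smooth downward-sloping schedule; whether the aggregate of many steps resembles any particular macro-demand curve depends entirely on the assumed distribution of reservation prices, which is the point taken up in the next section.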
2.5 Macroeconomic lawlessness One might raise the following question: suppose that we take many step functions x = D( p) for many agents and combine them. Do we get approximately a smooth curve that we can invert to find a relation p = f (x) that agrees qualitatively with the downward-sloping neo-classical prediction? In agreement with Osborne’s attempts, apparently not empirically: the economist Paul Ormerod has pointed out that the only known downward-sloping macroscopic demand curve is provided by the example of cornflakes sales in British supermarkets. What about theory? If we assume neo-classical individual demand functions and then aggregate them, do we arrive at a downward-sloping macro-demand curve? According to H. Sonnenschein (1973a, b) the answer is no, that no definite demand curve is predicted by aggregation (Kirman, 1989), the resulting curve can be anything, including no curve at all. In other words, nothing definite is predicted. This
means that there exists no macroeconomic theory that is grounded in microeconomic theory. What is worse, there is no empirical evidence for the downward-sloping demand curves presented in typical neo-classical texts on macroeconomics, like the relatively readable one by N. G. Mankiw (2000). This means that there is no microeconomic basis for either Keynesian economics or Monetarism, both of which make empirically illegitimate assumptions about equilibrium. For example, in Keynesian theory (Modigliani, 2001) it is taught that there is an aggregate output equilibrium where the labor market is "stuck" at less than full employment but prices do not drop as a consequence. Keynes tried to explain this via an equilibrium model that went beyond the bounds of neo-classical reasoning. The neo-classicals led by J. R. Hicks later revised theoretical thinking to try to include neo-Keynesianism in the assumption of vanishing total excess demand for all goods and money, but Radner has explained why money cannot be included meaningfully in the neo-classical model. A better way to understand Keynes' original idea is to assume that the market is not in equilibrium

dp/dt = ε_1(p, L) ≠ 0
(2.22b)
with

dL/dt = ε_2(p, L) ≠ 0
(2.22c)
where p is the price vector of commodities and financial markets. But a deterministic model will not work: financial markets (which are typically highly liquid) are described by stochastic dynamics. Of interest would be to model the Keynesian liquidity trap (see Ackerlof, 1984) without assuming expected utility maximization. There, one models markets where liquidity dries up. If one wants to model nonequilibrium states that persist for long times, then maybe spin glass/neural network models would be interesting. John Maynard Keynes advanced a far more realistic picture of markets than do monetarists by arguing that capitalism is not a stable, self-regulating system capable of perpetual prosperity. Instead, he saw markets as inherently unstable, occasionally in need of a fix by the government. We emphasize that by neglecting entropy the neo-classical equilibrium model ignores the second law prohibition against the construction of an economic perpetuum mobile. The idea of a market as a frictionless, 100% efficient machine (utility computer) that runs perpetually is an illegal idea from the standpoint of statistical physics. Markets require mechanical acts like production, consumption, and information gathering and processing, and certainly cannot evade or supplant the second law of thermodynamics simply by
postulating utility maximization. We will discuss both market instability and entropy in Chapter 7. Keynes’ difficulty in explaining his new and important idea was that while he recognized the need for the idea of nonequilibrium markets in reality, his neo-classical education mired him in the sticky mud of equilibrium ideas. Also, his neo-classical contemporaries seemed unable to understand any economic explanation that could not be cast into the straitjacket of an equilibrium description. This can be compared with the resistance of the Church in medieval times to any description of motion that was not Aristotelian, although Keynes certainly was no Galileo. Monetarism (including supply-side economics) and Keynesian theory are examples of one-parameter models (that have become ideologies, politically) because they are attempts to use equilibrium arguments to describe the behavior and regulation of a complex system by controlling a single parameter, like the money supply or the level of government spending while ignoring everything else. The advice provided by both approximations was found to be useful by governments during certain specific eras (otherwise they would not have become widely believed), but the one-parameter advice has failed outside those eras4 because economic systems are complex and nonuniversal, filled with surprises, instead of simple and universal. In monetarism one controls the money supply, in Keynesianism the level of government spending, while in the supply-side belief tax reductions dominate the thinking. The monetarist notion is that a steady increase in the money supply leads to a steady rate of economic growth, but we know (from the Lorenz model, for example) that constant forcing does not necessarily lead to a steady state and can easily yield chaos instead. The Lorenz model is a dynamical system in a threedimensional phase space where constant forcing (constant control parameters) can lead to either equilibrium, limit cycles, or chaotic motion depending on the size of the parameters. In the extreme wing of free-market belief, monetarism at the Chicago School, for example, the main ideology is uniform: when a problem exists, then the advice is to deregulate, the belief being that government regulations create only economic inefficiency. These arguments have theoretical grounding in the neo-classical idea of Pareto efficiency based on utility maximization (see Varian for the definition of Pareto efficiency in welfare economics). They presume, as Arrow has made clear, perfect foresight and perfect conformity on the part of all agents. The effect of regulations is treated negatively in the model because in equilibrium regulations would 4
Keynesianism was popular in the USA until the oil embargo of the 1970s. Monetarism and related “supply-side economics” gained ascendancy in official circles with the elections of Reagan and Thatcher, although it was the Carter appointed Federal Reserve Bank Chairman Paul Volcker whose lending policies ended runaway inflation in the 1980s in the USA. During the 1990s even center-left politicians like Clinton, Blair and Schr¨oder became apostles of deregulated markets.
reduce the model’s efficiency. In real markets, where equilibrium is apparently impossible, deregulation seems instead to lead to extreme imbalances. Marxism and other earlier competing economic theories of the nineteenth and early twentieth centuries also assumed stable equilibria of various kinds. In Marxism, for example, evolution toward a certain future is historically guaranteed. This assumption is equivalent mathematically to assuming a stable fixed point, a simple attractor for some undetermined mapping of society. Society was supposed somehow to iterate itself toward this inevitable state of equilibrium with no possible choice of any other behavior, a silly assumption based on wishful thinking. But one of Karl Marx’s positive contributions was to remind us that the neo-classical model ignores the profit motive completely: in a pure barter economy the accumulation of capital is impossible, but capitalists are driven to some extent by the desire to accumulate capital. Marx reconnected economic theory to Adam Smith’s original idea of the profit motive. Evidence for stability and equilibrium in unregulated markets is largely if not entirely anecdotal, more akin to weak circumstantial evidence in legal circles than to scientific evidence. Convincing, reproducible empirical evidence for the Invisible Hand has never been presented by economists. Markets whose statistics are wellenough defined to admit description by falsifiable stochastic models (financial markets) are unstable (see Chapters 4 and 7). It would be an interesting challenge to find at least one example of a real, economically significant market where excess demand actually vanishes and remains zero or close to zero to within observational error, where only small fluctuations occur about a definite state of equilibrium. A flea market, for example, is an example where equilibrium is never reached. Some trades are executed but at the end of the day most of the items put up for sale are carried home again because most ask prices were not met, or there was inadequate demand for most items. Selling a few watches from a table covered with watches is not an example of equilibrium or near equilibrium. The same goes for filling a fraction of the outstanding limit orders in the stock market. We now summarize the evidence from the above sections against the notion that equilibrium exists, as is assumed explicitly by the intersecting neo-classical supply–demand curves shown in Figure 2.3. Scarf’s model shows how easy it is to violate stability of equilibrium with a simple model. Sonnenschein explained that neo-classical supply–demand curves cannot be expected macroeconomically, even if they would exist microeconomically. Osborne explained very clearly why neoclassical supply–demand curves do not exist microeconomically in real markets. Radner showed that with even slight uncertainty, hypothetical optimizing agents cannot locate the equilibrium point assumed in Figure 2.3, even in a nearly ideal, toy neo-classical economy. And yet, intersecting neo-classical supply–demand curves remain the foundation of nearly every standard economics textbook.
2.6 When utility doesn’t exist We show next that when production is taken into account a utility function generally does not exist. Instead of free choice of U (x), a path-dependent utility functional is determined by the dynamics (McCauley, 2000). Above we have assumed that x = D( p). We now relax this assumption and assume that demand is generated by a production function s x˙ = s(x, v, t)
(2.23)
where v denotes a set of unknown control functions. We next assume a discounted utility functional (the price of money is discounted at the rate e^{−bt}, for example)

A = ∫ e^{−bt} u(x, v, t) dt
(2.24)

where u(x, v, t) is the undiscounted "utility rate" (see Intrilligator (1971); see also Caratheodory (1999) and Courant and Hilbert (1953)). We maximize the utility functional A with respect to the set of instruments v, but subject to the constraint (2.23) (this is Mayer's problem in the calculus of variation), yielding

δA = ∫ dt δ(e^{−bt}(u + p̃(s(x, v, t) − ẋ))) = 0
(2.25)

where the p_i are the Lagrange multipliers. The extremum conditions are

H(x, p′, t) = max_v (u(x, v, t) + p̃′ s(x, v, t))
(2.26)
∂u/∂v_i + p_k ∂s_k/∂v_i = 0
(2.27)
(sum over repeated index k) which yields “the positive feedback form” v = f (x, p, t)
(2.28)
Substituting (2.28) into (2.26) yields

H(x, p′, t) = max_v (u(x, v, t) + p̃′ s(x, v, t))
ṗ′ = bp′ − ∇_x H
ẋ = ∇_{p′} H = S(x, p′, t)
(2.29)
where, with v = f (x, p , t) determining the maximum in (2.29), we have S(x, p , t) = s(x, f (x, p , t), t). The integral A in (2.24) is just the Action and the discounted utility rate is the Lagrangian.
When utility doesn’t exist
27
To see that we can study a Hamiltonian system in 2n-dimensional phase space, we use the discounted utility rate w(x, v, t) = e^{−bt}u(x, v, t) with p = e^{−bt}p′ to find

h(x, p, t) = max_v (w(x, v, t) + p̃ s(x, v, t))
ṗ_i = −∂h/∂x_i
ẋ_i = ∂h/∂p_i = S_i(x, p, t)
(2.30)
which is a Hamiltonian system. Whether or not (2.23) with the vs held constant is driven–dissipative the system (2.30) is phase–volume preserving, and h is generally time dependent. Since the Hamiltonian h generally depends on time it isn’t conserved, but integrability occurs if there are n global commuting conservation laws (McCauley, 1997a). These conservation laws typically do not commute with the Hamiltonian h(x, p), and are generally time-dependent. The integrability condition due to n commuting global conservation laws can be written as p = ∇U (x)
(2.31)
where, for bounded motion, the utility U(x) is multivalued (turning points of the motion in phase space make U multivalued). U is just the reduced action given by (2.32) below, which is a path-independent functional when integrability (2.31) is satisfied, and so the action A is also given in this case by

A = ∫ p̃ dx
(2.32)

In this picture a utility function cannot be chosen by the agent but is determined instead by the dynamics. When satisfied, the integrability condition (2.31) eliminates chaotic motion (and complexity) from consideration because there is a global, differentiable canonical transformation to a coordinate system where the motion is free particle motion described by n commuting constant speed translations on a flat manifold imbedded in the 2n-dimensional phase space. Conservation laws correspond, as usual, to continuous symmetries of the Hamiltonian dynamical system. In the economics literature p is called the "shadow price" but the condition (2.32) is just the neo-classical condition for price. The equilibria that fall out of optimization-control problems in the 2n-dimensional phase space of the Hamiltonian system (2.30) are not attractors. The equilibria are either elliptic or hyperbolic points (sources and sinks in phase space are impossible in a Hamiltonian system). It would be necessary to choose an initial condition to lie precisely on a stable asymptote of a hyperbolic point in order to
have stability. Let us assume that, in reality, prices and quantities are bounded. For arbitrary initial data bounded motion guarantees that there is eternal oscillation with no approach to equilibrium. The generic case is that the motion in phase space is nonintegrable, in which case it is typically chaotic. In this case the neo-classical condition (2.31) does not exist and both the action A = wdt (2.33) and the reduced action (2.32) are path-dependent functionals, in agreement with Mirowski (1989). In this case p = f (x) does not exist. The reason why (2.31) can’t hold when a Hamiltonian system is nonintegrable was discussed qualitatively by Einstein in his explanation why Bohr–Sommerfeld quantization cannot be applied either to the helium atom (three-body problem) or to a statistical mechanical system (mixing system). The main point is that chaotic dynamics, which is more common than simple dynamics, makes it impossible to construct a utility function. 2.7 Global perspectives in economics Neo-classical equilibrium assumptions dominate politico-economic thinking in the western world and form the underlying support for the idea of globalization via deregulation and privatization/commercialization in our age of fast communication, intensive advertising, and rapid, high volume trading in international markets. That unregulated markets are optimally efficient and lead to the best possible world is widely believed. Challenging the idea is akin to challenging a religion, because the belief in unregulated markets has come to reign only weakly opposed in the west since the mid 1980s. However, in contrast with the monotony of a potentially uniform landscape of shopping malls, the diversity of western Europe still provides us with an example of a middle path, that of regulated markets. For example, unlimited land development is not allowed in western European countries, in spite of the idea that such regulation makes the market for land less efficient and denies developers’ demands to exploit all possible resources freely in order to make a profit. Simultaneously, western European societies seem less economically unstable and have better quality of life for a higher percentage of the population than the USA in some important aspects: there is less poverty, better education, and better healthcare for more people, at least within the old west European countries if not within the expanding European Union.5 The advocates of completely 5
Both the European Union and the US Treasury Department reinforce neo-classical IMF rules globally (see Stiglitz, 2002).
deregulated markets assume, for example, that it is better for Germany to produce some Mercedes in Birmingham (USA), where labor is cheaper, than to produce all of them in Stuttgart where it is relatively expensive because the standard and cost of living are much higher in Baden-W¨urtemburg than in Alabama. The opposite of globalization via deregulation is advocated by Jacobs (1995), who provides examples from certain small Japanese and other cities to argue that wealth is created when cities replace imports by their own production. This is quite different than the idea of a US-owned factory or Wal-Mart over the border in Mexico, or a BMW plant in Shenyang. See also Mirowski (1989), Osborne (1977), Ormerod (1994) and Keen (2001) for thoughtful, well-written discussions of basic flaws in neo-classical thinking. 2.8 Local perspectives in physics The foundations of mathematical laws of nature (physics) are the local invariance principles of the relativity principle, translational, rotational and time-translational invariance. These are local invariance principles corresponding to local conservation laws that transcend the validity of Newton’s three laws and hold in quantum theory as well. These local invariance principles are the basis for the possibility of repeated identical experiments and observations independent of absolute time, absolute position and absolute motion. Without these invariance principles inviolable mathematical laws of nature could not have been discovered. Of global invariance principles, both nonlinear dynamics and general relativity have taught us that we have little to say: global invariance principles require in addition the validity of integrability conditions (global integrability of the differential equations describing local symmetries) that usually are not satisfied in realistic cases. Translational and rotational invariance do not hold globally in general relativity because matter causes space-time curvature, and those invariances are properties of empty, flat spaces. The tangent space to any differentiable manifold is an example of such a space, but that is a local idea. Mach’s Principle is based on the error of assuming that invariance principles are global, not local, and thereby replaces the relativity principle with a position of relativism (McCauley, 2001). What has this to do with economics? Differential geometry and nonlinear dynamics provide us with a different perspective than does training in statistical physics. We learn that nonlinear systems of equations like d p/dt = ε( p) generally have only local solutions, that the integrability conditions for global solutions are usually not met. Even if global solutions exist they may be noncomputable in a way that prevents their implementation in practice. These limitations on solving ideal problems in mathematics lead us to doubt that the idea of a universal solution for all
socio-economic problems, taken for granted in neo-classical theory and represented in practice by globalization via deregulation and external ownership, can be good advice. Some diversity of markets would seem to provide better insurance against large-scale financial disaster than does the uniformity of present-day globalization, just as genetic diversity in a population provides more insurance against a disastrous disease than does a monoculture.
3 Probability and stochastic processes
3.1 Elementary rules of probability theory

It is possible to begin a discussion of probability from different starting points. But because, in the end, comparison with real empirical data1 is the only test of a theory or model, we adopt only one, the empirical definition of probability based upon the law of large numbers. Given an event with possible outcomes A1, A2, . . . , AN, the probability for Ak is pk ≈ nk/N, where N is the number of repeated identical experiments or observations and nk is the number of times that the event Ak is observed to occur. We point out in the next section that the empirical definition of probability agrees with the formal measure theoretic definition. For equally probable events p = 1/N. For mutually exclusive events (Gnedenko, 1967; Gnedenko and Khinchin, 1962) A and B probabilities add, P(A or B) = P(A) + P(B). For example, the probability that a coin lands heads plus the probability that it does not land heads adds to unity (total probability is normalized to unity in this text). For a complete (i.e. exhaustive) set of mutually exclusive alternatives {Ak}, we have Σ_k P(Ak) = 1. For example, in die tossing, if pk is the probability for the number k to show, where 1 ≤ k ≤ 6, then p1 + p2 + p3 + p4 + p5 + p6 = 1. For a fair die tossed fairly, pk = 1/6. For statistically independent events A and B the probabilities multiply, P(A and B) = P(A)P(B). For example, for two successive fair tosses of a fair coin (p = 1/2) the probability to get two heads is p² = (1/2)². Statistical independence is often mislabeled "randomness" but statistical independence occurs in deterministic chaos where there is no randomness, but merely the pseudo-random generation of numbers completely deterministically, meaning via an algorithm. We can use what we have developed so far to calculate a simple formula for the occurrence of at least one desired outcome in many events. For this, we need the probability that the event does not occur. Suppose that p is the probability that
Computer simulations certainly do not qualify either as empirical data or as substitutes for empirical data.
event A occurs. Then the probability that the event A does not occur is q = 1 − p. The probability to get at least one occurrence of A in n repeated identical trials is 1 − q^n. As an example, the probability to get at least one "6" in n tosses of a fair (where p = 1/6) die is 1 − (5/6)^n. The breakeven point is given by 1/2 = (5/6)^n, or n ≈ 4 is required to break even. One can make money by getting many people to bet that a "6" won't occur in four (or more) tosses of a die so long as one does not suffer the Gambler's Ruin (so long as an unlikely run against the odds doesn't break your gambling budget). That is, we should consider not only the expected outcome of an event or process, we must also look at the fluctuations. What are the odds that at least two people in one room have the same birthday? We leave it to the reader to show that the breakeven point for the birthday game requires n = 23 people (Weaver, 1982). The method of calculation is the same as in the paragraph above.
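A short calculation (an illustration, not from the text) makes both breakeven points explicit; it simply scans n until the probability of at least one success exceeds 1/2.

```python
from math import prod

# At least one "6" in n tosses of a fair die: P = 1 - (5/6)**n
n = 1
while 1 - (5 / 6) ** n < 0.5:
    n += 1
print("die breakeven:", n)            # 4 tosses

# Birthday problem: P(at least two of n people share a birthday)
def p_shared(n, days=365):
    return 1 - prod((days - i) / days for i in range(n))

n = 1
while p_shared(n) < 0.5:
    n += 1
print("birthday breakeven:", n)       # 23 people
```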
3.2 The empirical distribution

Consider a collection of n one-dimensional data points arranged linearly, x_1, x_2, . . . , x_n. Let P(x) denote the probability that a point lies to the left of x on the x-axis. The empirical probability distribution is then

P(x) = Σ_{i=1}^{k} θ(x − x_i)/n
(3.1)
where x_k is the nearest point to the left of x, x_k ≤ x, and θ(x) = 1 if 0 ≤ x, 0 otherwise. Note that P(−∞) = 0 and P(∞) = 1. The function P(x) is nondecreasing, defines a staircase of a finite number of steps, is constant between any two data points, and is discontinuous at each data point. P(x) satisfies all of the formal conditions required to define a probability measure mathematically. Theoretical measures like the Cantor function define a probability distribution on a staircase of infinitely many steps, a so-called devil's staircase. We can also write down the probability density f(x), where dP(x) = f(x)dx,

f(x) = Σ_{i=1}^{n} δ(x − x_i)/n
(3.2)
We can compute averages using the empirical distribution. For example,

⟨x⟩ = ∫_{−∞}^{∞} x dP(x) = (1/n) Σ_{i=1}^{n} x_i
(3.3)
and

⟨x²⟩ = ∫_{−∞}^{∞} x² dP(x) = (1/n) Σ_{i=1}^{n} x_i²
(3.4)
The mean square fluctuation is defined by

⟨Δx²⟩ = ⟨(x − ⟨x⟩)²⟩ = ⟨x²⟩ − ⟨x⟩²
(3.5)
The root mean square fluctuation, the square root of (3.5), is an indication of the usefulness of the average (3.3) for characterizing the data. The data are accurately characterized by the mean if and only if

⟨Δx²⟩^{1/2} ≪ |⟨x⟩|
(3.6)
and even then only for a sequence of many identical repeated experiments or approximately identical repeated observations. Statistics generally have no useful predictive power for a single experiment or observation, and can at best be relied on to describe accurately the average of many repeated trials.

3.3 Some properties of probability distributions

An important idea is that of the characteristic function of a distribution, defined by the Fourier transform

⟨e^{ikx}⟩ = ∫ dP(x) e^{ikx}
(3.7)

Expanding the exponential in power series we obtain the expansion in terms of the moments of the distribution

⟨e^{ikx}⟩ = Σ_{m=0}^{∞} (ik)^m ⟨x^m⟩/m!
(3.8)

showing that the distribution is characterized by all of its moments (with some exceptions), and not just by the average and mean square fluctuation. For an empirical distribution the characteristic function has the form

⟨e^{ikx}⟩ = Σ_{j=1}^{n} e^{ikx_j}/n
(3.9)
34
Probability and stochastic processes
smooth with continuous derivatives of many orders, dP(x) = f (x)dx, so that the density f (x) is at least once differentiable. Smooth distributions are useful if they can be used to approximate observed histograms accurately. In the smooth case, transformations of the variable x are important. Consider a transformation of variable y = h(x) with inverse x = q(y). The new distribution of y has density dx f˜ (y) = f (x) dy
(3.10)
f (x) = e−x
(3.11)
For example, if 2
/2σ 2
where x = ln( p/ p0 ) and y = ( p − p0 )/ p0 , then y = h(x) = ex − 1 so that f˜ (y) =
1 −(ln(1+y))2 /2σ 2 e 1+y
(3.12)
The probability density transforms f (x) like a scalar density, and the probability distribution P(x) transforms like a scalar (i.e. like an ordinary function), ˜ P(y) = P(x)
(3.13)
Whenever a distribution is invariant under the transformation y = h(x) then P(y) = P(x)
(3.14)
That is, the functional form of the distribution doesn’t change under the transformation. As an example, if we replace p and p0 by λp and λp0 , a scale transformation, then neither an arbitrary density f (x) nor its corresponding distribution P(x) is invariant. In general, even if f (x) is invariant then P(x) is not, unless both dx and the limits of integration in x P(x) =
f (x)dx
(3.15)
−∞
are invariant. The distinction between scalars, scalar densities, and invariants is stressed here, because even books on relativity often write “invariant” when they should have written “scalar” (Hammermesh, 1962; McCauley, 2001). Next, we discuss some model distributions that have appeared in the finance literature and also will be later used in this text.
Some theoretical distributions
35
3.4 Some theoretical distributions The Gaussian and lognormal distributions (related by a coordinate transformation) form the basis for standard finance theory. The exponential distribution forms the basis for our empirical approach in Chapters 6 and 7. Stretched exponentials are also used to price options in Chapter 6. We therefore discuss some properties of all four distributions next. The dynamics and volatility of exponential distributions are presented in Chapter 6 as well. Levy distributions are discussed in Chapter 8 but are not needed in this text. 3.4.1 Gaussian and lognormal distributions The Gaussian distribution defined by the density 1 2 2 e−(x−x) /2σ f (x) = √ 2σ
(3.16)
with mean square fluctuation x 2 = σ 2
(3.17)
plays a special role in probability theory because it arises as a limit distribution from the law of large numbers, and also forms the basis for the theory of stochastic processes in continuous time. If we take x = ln p then g( p)d p = f (x)dx defines the density g( p), which is lognormal in the variable p. The lognormal distribution was first applied in finance by Osborne in 1958 (Cootner, 1964), and then was used later by Black, Scholes and Merton in 1973. 3.4.2 The exponential distribution The asymmetric exponential distribution2 was discovered in an analysis of financial data by Gunaratne in 1990 in intraday trading of bonds and foreign exchange. The exponential distribution was observed earlier in hard turbulence by Gunaratne (see Castaing et al. 1989). The asymmetric exponential density is defined by ⎧γ γ (x−␦) ⎪ x <␦ ⎨ e 2 (3.18) f (x) = ν ⎪ ⎩ e−ν(x−␦) x > ␦ 2 where ␦, γ , and ν are the parameters that define the distribution and may depend on time. Different possible normalizations of the distribution are possible. The 2
Known in the literature as the Laplace distribution.
36
Probability and stochastic processes
one chosen above is not the one required to conserve probability in a stochastic dynamical description. That normalization is introduced in Chapter 6. Moments of this distribution are easy to calculate in closed form. For example, ∞ x+ =
x f (x)dx = ␦ + ␦
1 ν
(3.19)
1 γ
(3.20)
is the mean of the distribution for x > ␦ while ␦ x− =
x f (x)dx = ␦ − −∞
defines the mean for that part with x < ␦. The mean of the entire distribution is given by x = ␦ +
(γ − ν) γν
(3.21)
The analogous expressions for the mean square are x 2 + =
2 ␦ + 2 + ␦2 2 ν ν
(3.22)
x 2 − =
2 ␦ − 2 + ␦2 2 γ γ
(3.23)
and
Hence the variances for the distinct regions are given by 1 ν2 1 σ−2 = 2 γ
σ+2 =
(3.24)
and for the whole by σ2 =
γ 2 + ν2 γ 2ν 2
(3.25)
We can estimate the probability of large events. The probability for at least one event x > σ is given (for x > ␦) by ν P(x > σ ) = 2
∞ σ
1 e−ν(x−␦) dx = e−ν(σ −␦) 2
(3.26)
A distribution with “fat tails” is one where the density obeys f (x) ≈ x −µ for large x. Fat-tailed distributions lead to predictions of higher probabilities for large values of
Some theoretical distributions
37
x than do Gaussians. Suppose that x = ln(p(t + t)/p(t)). If the probability density f is Gaussian in returns x then we have a lognormal distribution, with a prediction of a correspondingly small probability for “large events” (large price differences over a time interval t). If, however, the returns distribution is exponential then we have fat tails in the variable y = p(t + t)/ p(t) with density g(y) = f (x)dx/dy, ⎧γ −γ ␦ γ −1 ␦ ⎪ ⎨ e y , y<e 2 (3.27) g(y, t) = ν ⎪ ⎩ e−γ ␦ y −ν−1 , y > e␦ 2 with scaling exponents γ − 1 and ν + 1. The exponential distribution plays a special role in the theory of financial data for small to moderate returns. In that case we will find that the ␦, γ , and µ all depend on the time lag t. That is, the distribution that describes financial data is not a stationary one but depends on time. More generally, any price distribution that is asymptotically fat in the price, g( p) ≈ p −µ , is asymptotically exponential in returns, f (x) ≈ e−µx . 3.4.3 Stretched exponential distributions Stretched exponential distributions have been used to fit financial data, which are far from lognormal, especially regarding the observations of “fat tails.” The density of the stretched exponential is given by α Ae−(ν(x−␦)) , x > ␦ (3.28) f (x, t) = α Ae(γ (x−␦)) , x < ␦ dx = ν −1 z 1/α−1 dz
(3.29)
We can easily evaluate all averages of the form ∞ z + = A n
α
(ν(x − ␦))nα e−(ν(x−␦)) dx
(3.30)
␦
where n is an integer. Therefore we can reproduce analogs of the calculations for the exponential distribution. For example, A=
1 γν γ + ν Γ (1/α)
(3.31)
where Γ (ζ ) is the Gamma function, and x+ = ␦ −
1 Γ (2/α) ν Γ (1/α)
(3.32)
38
Probability and stochastic processes
Calculating the mean square fluctuation is equally simple, so we leave it as an exercise for the reader. Option price predictions for stretched exponential distributions can be calculated nearly in closed form, as we show in Chapter 6.
3.5 Laws of large numbers 3.5.1 The law of large numbers We address next the question what can be expected in the case of a large number of independently distributed events by deriving Tschebychev’s inequality. Consider empirical observations where xk occurs a fraction pk times with k = 1, . . . , m. Then n m 1 x = xdP(x) = xj = pk x k (3.33) n j=1 k=1 We concentrate now on x as a random variable where n 1 xk n k=1
x=
(3.34)
The mean square fluctuation in x is σx2 = x 2 =
m
pk (xk − x)2
(3.35)
k=1
Note that σx2 ≥
pk (xk − x)2 ≥ α 2
|xk −x|>α
pk = P(|x − x| > α) (3.36)
|xk −x|>α
so that P(|x − x| > α) ≤
σx2 α2
(3.37)
This is called Tschebychev’s inequality. Next we obtain an upper bound on the mean square fluctuation. From x − x =
n 1 (x j − x) n j=1
(3.38)
we obtain (x − x)2 =
n n 1 1 2 x) (x − + (x j − x)(xk − x) j n 2 j=1 n 2 j =k
(3.39)
Laws of large numbers
so that
39
2
n n σ j max 1 1 σx2 = 2 (x j − x)2 2 σ j2 ≤ n j=1 n j=1 n
(3.40)
where σ j2 = (x j − x j )2
(3.41)
The latter must be calculated from the empirical distribution P j (x j ) of the random variable j; note that the n different distributions P j may, but need not, be the same. The law of large numbers follows from combining (3.37) with (3.40) to obtain 2
σ j max P(|x − x| > α) ≤ (3.42) nα 2 Note that if the n random variables are distributed identically with mean square fluctuation σ 2 then we obtain from (3.40) that σx2 =
σ2 n
(3.43)
which suggests that expected uncertainty can be reduced by studying the sum x of n independent variables instead of the individual variables xk . We have discussed the weak law of large numbers, which suggests that deviations from the mean of x are on the order of 1/n. The strong version of the law of large numbers, to be discussed next, describes the distribution P(x) of fluctuations in x about its mean in the ideal but empirically unrealistic limit where n goes to infinity. That limit is widely quoted as justifying many conclusions that do not follow empirically, in finance and elsewhere. We will see that the problem is not just correlations, that the strong limit can easily lead to wrong conclusions about the long time behavior of stochastic processes.
3.5.2 The central limit theorem We showed earlier that a probability distribution P(x) may be characterized by its moments via the characteristic function Φ(k), which we introduced earlier. The Fourier transform of a Gaussian is again a Gaussian, 1
φ(k) = √ 2σ
∞
dxeikx e−(x−x) /2σ = eikx e−k 2
2
2
σ 2 /2
(3.44)
−∞
We now show that the Gaussian plays a special role in a certain ideal limit. Consider N independent random variables xk , which may or may not be identically distributed.
40
Probability and stochastic processes
Each has finite variance σk . That is, the individual distributions Pk (xk ) need not be the same. All that matters is statistical independence. We can formulate the problem in either of two ways. We may ask directly what is the distribution P(x) of the variable n 1 xk x=√ n k=1
(3.45)
where we can assume that each xk has been constructed to have vanishing mean. The characteristic function is
∞ n n √ ikxk /√n eikx dP(x) = eikx = eikxk / n = e (3.46) Φ(k) = k=1
−∞
k=1
where statistical independence was used in the last step. Writing n
Φ(k) = e = ikx
√ ikxk / n
e
n
= ek=1
√ Ak (k/ n)
(3.47)
k=1
where
√ √ Ak (k/ n) = ln eikxk / n
we can expand to obtain √ Ak (k/ n) = Ak (0) + k 2 A
k (0)/2n + k 3 O(n −1/2 )/n + · · · where
A
k (0) = xk2
(3.48)
(3.49)
(3.50)
If, as n goes to infinity, we could neglect terms of order k 3 and higher in the exponent of Φ(k) then we would obtain the Gaussian limit
eikx = e
√ Ak (k/ n)
≈ e−k
2
σx2 /2
(3.51)
where σx is the variance of the cumulative variable x. An equivalent way to derive the same result is to start with the convolution of the individual distributions subject to the constraint (3.45) √ xk / n (3.52) P(x) = . . . dP1 (x1 ) . . . dPn (xn )␦ x − Using the Fourier transform representation of the delta function yields Φ(k) =
N i=1
√ φi (k/ n)
(3.53)
Stochastic processes
41
where φk is the characteristic function of Pk , and provides another way to derive the central limit theorem (CLT). A nice example that shows the limitations of the CLT is provided by JeanPhillipe Bouchaud and Marc Potters (2000) who consider the asymmetric exponential density f 1 (x) = θ(x)αe−αx
(3.54)
Using (3.54) in (3.52) yields the density f (x, N ) = θ(x)α
x N −1 e−αx (N − 1)!
(3.55)
Clearly, this distribution is never Gaussian for either arbitrary or large values of x. What, then, does the central limit theorem describe? If we locate the value of x for which f (x, N ) is largest, the most probable value of x, and approximate ln f (x, N ) by a Taylor expansion to second order about that point, then we obtain a Gaussian approximation to f . Since the most probable and mean values approximate each other for large N, we see that the CLT asymptotically describes small fluctuations about the mean. However, the CLT does not describe the distribution of very small or very large values of x correctly for any value of N. In this book we generally will not appeal to the central limit theorem in data analysis because it does not provide a good approximation for a large range of values of x, neither in finance nor in fluid turbulence. It is possible to go beyond the CLT and develop formulae for both “large deviations” and “extreme values,” but we do not use those results in this text and refer the reader to the literature instead (Frisch and Sornette, 1997).
3.6 Stochastic processes We now give a physicist’s perspective on stochastic processes (Wax, 1954), with emphasis on some processes that appear in the finance literature. Two references for stochastic calculus are the hard but readable book on financial mathematics by Baxter and Rennie (1995) and the much less readable, but stimulating, book by Steele (2000). Stochastic differential equations are covered in detail by Arnold (1992), which is required reading for serious students of the subject. Arnold also discusses dynamic stability, which the other books ignore. By a random variable B we mean a variable that is described by a probability distribution P(B). Whether a “random variable” evolves deterministically in time via deterministic chaotic differential equations (where in fact nothing in the time evolution is random!) or randomly (via stochastical differential equations) is not implied in this definition: chaotic differential equations generate pseudo-random
42
Probability and stochastic processes
time series x(t) and corresponding probability distributions perfectly deterministically. In the rest of this text we are not concerned with deterministic dynamical systems because they are not indicated by financial market data. Deterministic dynamics is smooth at the smallest time scales, equivalent via a local coordinate transformation to constant velocity motion, over short time intervals. Random processes, in contrast, have nondeterministic jumps at even the shortest time scales, as in the stock market over one tick (where t ≈ 1 s). Hence, in this text we concern ourselves with the methods of the theory of stochastic processes, but treat here only the ideal case of continuous time processes because the discrete case is much harder to handle analytically. By a stochastic or random process x(t) we mean that instead of determinism we have, for example, a one-parameter collection of random variables x(t) and a difference or differential equation where a jump of size x over a time interval t is defined by a probability distribution. An example is provided by a “stochastic differential equation” dx(t, t) = R(x, t)dt + b(x, t)dB
(3.56)
where dB is a Gaussian independent and identically distributed random variable with null mean and variance equal to dt 1/2 . The calculus of stochastic processes was formulated by Ito and Stratonovich. We use Ito calculus below, for convenience. The standard finance literature is based on Ito calculus.
3.6.1 Stochastic differential equations and stochastic integration Consider the stochastic differential equation (sde) dx(t) = Rdt + bdB(t)
(3.57)
where R and b may depend on both x and t. Although this equation is written superficially in the form of a Pfaff differential equation, it is not Pfaffian: “dx” and “dB” are not Leibnitz–Newton differentials but are “stochastic differentials,” as defined by Wiener, Levy, Doob, Stratonovich and Ito. The rules for manipulating stochastic differentials are not the same as the rules for manipulating ordinary differentials: “dB” is not a differential in the usual sense but is itself defined by a probability distribution where B(t) is a continuous but everywhere nondifferentiable curve. Such curves have been discussed by Weierstrass, Levy and Wiener, and by Mandelbrot. As the simplest example R and b are constants, and the global (meaning valid for all t and t) solution of (3.57) is x = x(t + t) − x(t) = Rt + bB(t)
(3.58)
Stochastic processes
43
Here, B is an identically and independently distributed Gaussian random variable with null average B = 0 B = t 2
(3.59) 2H
(3.60)
and B(t)B(t ) = 0,
t = t
(3.61)
but with H = 1/2. Exactly why H = 1/2 is required for the assumption of statistical independence is explained in Chapter 8 in the presentation of fractional Brownian motion. In the case of infinitesimal changes, where paths B(t) are continuous but everywhere nondifferentiable, we have dB = 0 dB2 = dt
(3.62)
which defines a Wiener process. In finance theory we will generally use the variable x(t) = ln( p(t)/ p(0)) x = ln( p(t + t)/ p(t))
(3.63)
representing returns on investment from time 0 to time t, where p is the price of the underlying asset. The purpose of the short example that follows is to motivate the study of Ito calculus. For constant R and constant b = σ , the meaning of (3.58) is that the left-hand side of x(t) − x(0) − Rt = B (3.64) b has the same Gaussian distribution as does B. On the other hand prices are described (with b = σ ) by p(t + t) = p(t)e Rt+σ B
(3.65)
and are lognormally distributed (see Figure 3.1). Equation (3.65) is an example of “multiplicative noise” whereas (3.64) is an example of “additive noise.” Note that the average/expected return is x = Rt
(3.66)
whereas the average/expected price is p(t + t) = p(t)e Rt eσ B
(3.67)
44
Probability and stochastic processes (a) 1200
800
400
1970
1980
1990
Figure 3.1(a). UK FTA index, 1963–92. From Baxter and Rennie (1995), fig. 3.1. (b) 1200
800
400
10
20
30
Figure 3.1(b). Exponential Brownian motion dp = Rpdt + σ pdB with constant R and σ . From Baxter and Rennie (1995), fig. 3.6.
The Gaussian average in (3.67) is easy to calculate and is given by p(t + t) = p(t)e Rt eσ
2
t/2
(3.68)
Now for the hard part that shows the need for a stochastic calculus in order to avoid mistakes. Naively, from (3.57) one would expect that d p = p0 ex dx = p Rdt + pbdB
(3.69)
but this is wrong. Instead, we must keep all terms up to O(dt), so that with p = p0 ex = g(x) we have d p ≈ g˙ dt + g dx + g
dx2 /2
(3.70)
d p = Rpdt + b2 pdB2 /2 + pbdB
(3.71)
which yields
Stochastic processes
45
We will show next, quite generally, how in the solution p = p(R + σ 2 /2)t + pσ • B
(3.72)
of the sde, the “integral” of the stochastic term dB2 in (3.72) is equal to the deterministic term t with probability one. The “Ito product” represented by the dot third term of (3.72) is not a multiplication but instead is defined below by a “stochastic integral.” The proof (known in finance texts as Ito’s lemma) is an application of the law of large numbers. At one point in his informative text on options and derivatives Hull (1997) makes the mistake of treating the Ito product as an ordinary one by asserting that (3.72) implies that p/ p is Gaussian distributed. He assumes that one can divide both sides of (3.72) by p to obtain p/ p = (R + σ 2 /2)t + σ B
(3.73)
But this is wrong: we know from Section 3.3 that if x is Gaussian, then p/ p cannot be Gaussian too. To get onto the right path, consider any analytic function G(x) of the random variable x where dx = R(x, t)dt + b(x, t)dB(t)
(3.74)
˙ + G dx + G
dx2 /2 dG = Gdt
(3.75)
˙ + RG )dt + b2 G
dB2 /2 + bG dB dG = (G
(3.76)
Then with
we obtain, to O(dt), that
which is a stochastic differential form in both dB and dB2 , and is called Ito’s lemma. Next, we integrate over a small but finite time interval t to obtain the stochastic integral equation t+t
G =
˙ (G(x(s), s) + R(x(s), s)G (x(s), s))ds
t t+t
+
b(x(s), s)G
(x(s), s)dB(s)2 /2 + bG • B
(3.77)
t
where the “dot” in the last term is defined by a stochastic integral, the Ito product, below. Note that all three terms generally depend on the path CB defined by the function B(t), the Brownian trajectory.
46
Probability and stochastic processes
First we do the stochastic integral of dB2 for the case where the integrand is constant, independent of (x(t), t). By a stochastic integral we mean N ␦Bk2 (3.78) dB2 ≈ k=1
In formal Brownian motion theory N goes to infinity. There, the functions B(t) are continuous but almost everywhere nondifferentiable and have infinite length, a fractal phenomenon, but in market theory N is the number of trades preceding the value x(t), the number of ticks in the stock market starting from t = 0, for example, when the price p(t) was observed to be registered. Actually, there is never a single price but rather bid/ask prices with a spread, so that we must assume a very liquid market where bid/ask spreads ␦p are very small compared with either price, as in normal trading in the stock market with a highly traded stock. In (3.77) t should be large compared with a tick time interval ␦t ≈ 1 s. We treat here only the abstract mathematical case where N goes to infinity. Next we study X = dB 2 where xk = ␦Bk . By the law of large numbers we have N 2 2 2 ␦Bk − ␦Bk = N (␦B 4 − ␦B 2 2 ) (3.79) σ X2 = (X − X )2 = k=1
where ␦Bk2 = σk2 = σ 2 = ␦t
(3.80)
and t = N ␦τ . Since the Wiener process B(t) is Gaussian distributed, we obtain ␦B 4 = 3␦t 2
(3.81)
σ X2 ≈ 2N ␦t 2 = 2t 2 /N
(3.82)
and the final result
In mathematics ␦t goes to zero but in the market we need t ␦t (but still small compared with the trading time scale, which can be as small as 1 s). We consider here the continuous time case uncritically for the time being. In the abstract mathematical case where N becomes infinite we obtain the stochastic integral equation G = G(t + t) − G(t) t+t ˙ (G(x(s), s) + R(x(s), s)G (x(s), s) + b2 (x(s), s)G
(x(s), s)/2)ds = t
+ bG • B
(3.83)
Stochastic processes
47
where the Ito product is defined by t+t
G • B =
G (x(s), s)dB(s)
t
≈
N
G (xk−1 , tk )(B(tk ) − B(tk−1 )) =
k=1
N
G (xk−1 , tk )␦Bk
(3.84)
k=1
The next point is crucial in Ito calculus: equation (3.84) means that G (xk−1 , tk )␦Bk = 0 because xk−1 is determined by ␦Bk−1 , which is uncorrelated with ␦Bk . However, unless we can find a transformation to the simple form (3.64) we are faced with solving the stochastic integral equation t+t
t+t
R(x(s), s)ds +
x = t
b(x(s), s)dB(s)
(3.85)
t
When a Lipshitz condition is satisfied by both R and b then we can use the method of repeated approximations to solve this stochastic integral equation (see Arnold, 1992). The noise dB is always Gaussian which means that dx is always locally Gaussian (locally, the motion is always Gaussian), but the global displacement ∆x is nonGaussian due to fluctuations included in the Ito product unless b is independent of x. To illustrate this, consider the simple sde d p = pdB
(3.86)
whose solution, we already know, is p(t + t) = p(t)e−t/2+B(t)
(3.87)
First, note that the distribution of p is nonGaussian. The stochastic integral equation that leads to this solution via summation of an infinite series of stochastic terms is t+t
p(t + t) = p(t) +
p(s)dB(s)
(3.88)
t
Solving by iteration (Picard’s method works in the stochastic case if both R and b satisfy Lipshitz conditions!) yields t+t
s ( p(s) +
p(t + t) = p(t) + t
p(w)dB(w))dB(s) = · · · t
(3.89)
48
Probability and stochastic processes
and we see that we meet stochastic integrals like B(s)dB(s)
(3.90)
and worse. Therefore, even in the simplest case we need the full apparatus of stochastic calculus. Integrals like (3.90) can be evaluated via Ito’s lemma. For example, let dx = dB so that x = B, and then take g(x) = x 2 . It follows directly from Ito’s lemma that t+t
1 B(s)dB(s) = (B 2 (t + t) − B 2 (t) − t) 2
(3.91)
t
We leave it as an exercise to the reader to use Ito’s lemma to derive results for other stochastic integrals, and to use them to solve (3.88) by iteration. We now present a few other scattered but useful results on stochastic integration. Using dB = 0 and dB2 = dt we obtain easily that f (s)dB(s) = 0 2
=
f (s)dB(s)
f 2 (s)ds
(3.92)
From the latter we see that if the sde has the special form dx = R(x, t)dt + b(t)dB(t)
(3.93)
then the mean square fluctuation is given by t+t
(x) = 2
b2 (s)ds
(3.94)
t
Also, there is an integration by parts formula (see Doob in Wax, 1954) that holds when f (t) is continuously differentiable, t
t f (s)dB(s) =
t0
f (s)B|tt0
−
(B) f (s)ds
(3.95)
t0
where B = B(s) − B(0). In the finance literature σ 2 = x 2 − x2 is called “volatility.” For small enough t we have σ 2 ≈ D(x, t)t, so that we will have to distinguish between local and
Stochastic processes
49
global volatility. By “global volatility” we mean the variance squared at large times t. 3.6.2 Markov processes A Markov process (Wax, 1954; Stratonovich, 1963; Gnedenko, 1967) defines a special class of stochastic process. Given a random process, the probability to obtain a state (x, t) with accuracy dx and (x , t ) with accuracy dx is denoted by the two-point density f (x, t; x , t )dxdx , with normalization (3.96a) f (x, t; x , t )dxdx = 1 For statistically independent events the two-point density factors, f (x, t; x , t ) = f (x, t) f (x , t )
(3.96b)
In such a process history in the form of the trajectory or any information about the trajectory does not matter, only the last event x(t) at time t matters. The next simplest case is that of a Markov process, f (x, t; x0 , t0 ) = g(x, t| x0 , t0 ) f (x0 , t0 )
(3.97)
Here, the probability density to go from x0 at time t0 to x at time t is equal to the initial probability density f (x0 , t0 ) times the transition probability g, which is the conditional probability to get (x, t), given (x0 , t0 ). Because f is normalized at all times, so must be the transition probability, (3.98) g(x, t | x , t0 )dx = 1 and integrating over all possible initial conditions in (3.97) yields the Smoluchowski equation, or Markov equation, (3.99) f (x, t) = g(x, t | x0 , t0 ) f (x0 , t0 ) dx0 The transition probability must also satisfy the Smoluchowski equation, g(x, t + t | x0 , t0 ) = g(x, t + t | z, t)g(z, t | x0 , t0 ) dz
(3.100)
We use the symbol g to denote the transition probability because this quantity will be the Green function of the Fokker–Planck equation obtained in the diffusion approximation below. A Markov process has no memory at long times. Equations (3.99) and (3.100) imply that distant history is irrelevant, that all that matters is the state at the initial
50
Probability and stochastic processes
time, not what happened before. The implication is that, as in statistically independent processes, there are no patterns of events in Markov processes that permit one to deduce the past from the present. We will show next how to derive a diffusion approximation for Markov processes by using an sde. Suppose we want to calculate the time rate of change of the conditional average of a dynamical variable A(x) ∞ Ax0 ,t0 = A(y)g( y, t | x0 , t0 )dy (3.101) −∞
The conditional average applies to the case where we know that we started at the point x0 at time t0 . When this information is not available then we must use a distribution f (x, t) satisfying (3.99) with specified initial condition f (x, t0 ). We can use this idea to derive the Fokker–Planck equation as follows. We can write the derivative ∞ dAx0 ,t0 ∂ g( y, t | x0 , t0 ) = dy (3.102) A(y) dt ∂t −∞
as the limit of 1 t
∞ dy A(y) [g( y, t + t | x0 , t0 ) − g( y, t | x0 , t0 )]
(3.103)
−∞
Using the Markov condition (3.100) this can be rewritten as ∞ 1 dy A(y) [g( y, t + t | x0 , t0 ) − g(y, t | x0 , t0 )] t −∞ 1 dy A(y) dz(g( y, t + t | z, t)g( z, t | x0 , t0 ) − g(y, t | x0 , t0 )) = t (3.104) Assuming that the test function A(y) is analytic, we expand about y = z but need only terms up to second order because the diffusion approximation requires that x n /t vanishes as t vanishes for 3 ≤ n. From the stochastic difference equation (local solution for small t) x ≈ R(x, t)t + D(x, t)B (3.105) we obtain the first and second moments as conditional averages x ≈ Rt x 2 ≈ D(x, t)t
(3.106)
Stochastic processes
51
and therefore ∞ A(y) −∞
∞
=
∂ g( y, t| x0 , t0 ) dy ∂t dzg( z, t| x0 , t0 )(A (z)R(z, t) + A
(z)D(z, t)/2)
(3.107)
−∞
Integrating twice by parts and assuming that g vanishes fast enough at the boundaries, we obtain
∞
∂ ∂ g( y, t| x0 , t0 ) + (Rg( y, t| x0 , t0 )) ∂t ∂y −∞ 1 ∂2 (Dg( y, t| x , t )) dy = 0 − 0 0 2 ∂ y2 A(y)
(3.108)
Since the choice of test function A(y) is arbitrary, we obtain the Fokker–Planck equation ∂ 1 ∂2 ∂ g( x, t| x0 , t0 ) = − (R(x, t)g( x, t| x0 , t0 )) + (D(x, t)g( x, t| x0 , t0 )) ∂t ∂x 2 ∂x2 (3.109) which is a forward-time diffusion equation satisfying the initial condition g( x, t0 | x0 , t0 ) = ␦(x − x0 )
(3.110)
The Fokker–Planck equation describes the Markov process as convection/drift combined with diffusion, whenever the diffusion approximation is possible (see Appendix C for an alternative derivation). Whenever the initial state is specified instead by a distribution f (x, t0 ) then f (x, t) satisfies the Fokker–Planck equation and initial value problem with the solution ∞ g( x, t | x0 , t0 ) f (x0 , t0 )dx0 (3.111) f (x, t) = −∞
This is all that one needs in order to understand the Black–Scholes equation, which is a backward-time diffusion equation with a specified forward-time initial condition. Note that the Fokker–Planck equation expresses local conservation of probability. We can write ∂j ∂f =− ∂t ∂x
(3.112)
52
Probability and stochastic processes
where the probability current density is j(x, t) = R f (x, t) −
1 ∂ (D f (x, t)) 2 ∂x
(3.113)
Global probability conservation ∞ f (x, t)dx = 1
(3.114)
−∞
requires d dt
f dx =
∞ ∂f dx = − j| =0 ∂t −∞
(3.115)
Equilibrium solutions (which exist only if both R and D are time independent) satisfy j(x, t) = R f (x, t) −
1 ∂ (D f (x, t)) = 0 2 ∂x
(3.116)
and are given by f (x) =
C 2 e D(x)
R(x) D(x) dx
(3.117)
with C a constant. The general stationary state, in contrast, follows from integrating (again, only if R and D are t-independent) the first-order equation 1 ∂ (D(x) f (x)) = J = constant = 0 2 ∂x
j = R(x) f (x) − and is given by C 2 e f (x) = D(x)
R(x) D(x) dx
J 2 e + D(x)
R(x) D(x) dx
e−2
R(x) D(x) dx
dx
(3.118)
(3.119)
We now give an example of a stochastic process that occurs as an approximation in the finance literature, the Gaussian process with sde dx = Rdt + σ dB
(3.120)
and with R and σ constants. In this special case both B and x are Gaussian. Writing y = x − Rt we get, with g(x, t) = G(y, t), σ 2 ∂2G ∂G = ∂t 2 ∂ y2
(3.121)
∂ g σ 2 ∂2g ∂g = −R + ∂t ∂x 2 ∂x2
(3.122)
so that the Green function of
Stochastic processes
53
corresponding to (3.120) is given by 1 x 2 e−( 4σ t ) g( x, t| x0 , t0 ) = √ 4σ t
(3.123)
where x = x − x0 − Rt and t = t − t0 . This distribution forms the basis for textbook finance theory and was first suggested in 1958 by Osborne as a description of stock price returns, where x = ln( p(t)/ p(0)). A second way to calculate the conditional average (3.114) is as follows. With ∂g ∂ j(z, t) dA = A(z) dz = − A(z) dz (3.124a) dt ∂t ∂z where j is the current density (3.118), integrating by parts we get 1 dA (3.124b) = R A + D A
dt 2 This is a very useful result. For example, we can use it to calculate the moments of a distribution: substituting A = x n with x(t) = ln p(t)/ p(t0 ), x = ln p(t + t)/ p(t), we obtain n(n − 1) d x n = nRx n−1 + Dx n−2 (3.125) dt 2 These equations predict the same time dependence for moments independently of which solution of the Fokker–Planck equation we use, because the different solutions are represented by different choices of initial conditions in (3.109). For n = 2 we have
t+t d 2 x = 2Rx + D = 2 R(t) R(s)ds + D (3.126) dt t
where ∞ D =
D(z, t)g( z, t|x, t0 )dz
(3.127)
−∞
Neglecting terms O(t) in (3.126) and integrating yields the small t approximation t+t
x ≈ 2
∞ ds
t
D(z, s)g( z, t|x, t)dz
(3.128)
−∞
Financial data indicate Brownian-like average volatility σ 2 ≈ t 2H
(3.129)
54
Probability and stochastic processes
with H = O(1/2) after relatively short times t > 10 min, but show nontrivial local volatility D(x, t) as well. The easiest approximation, that of Gaussian returns, has constant local volatility D(x, t) and therefore cannot describe the data. We show in Chapter 6 that intraday trading is well-approximated by an asymmetric exponential distribution with nontrivial local volatility. Financial data indicate that strong initial pair correlations die out relatively quickly on a time scale of 10 min of trading time, after which the easiest approximation is to assume a Brownian-like variance. Markov processes can also be used to describe pair correlations. The formulation of mean square fluctuations above is essential for describing the “volatility” of financial markets in Chapter 6. We end the section with the following observation. Gaussian returns in the Black– Scholes model are generated by the sde dx = (r − σ 2 /2)dt + σ dB
(3.130)
where σ is constant. The corresponding Fokker–Planck equation is ∂f σ 2 ∂2 f ∂f = −(r − σ 2 /2) + ∂t ∂x 2 ∂x2 Therefore, a lognormal price distribution is described by d p = r pdt + σ pdB
(3.131)
(3.132)
and the lognormal distribution is the Green function for ∂g ∂ σ 2 ∂2 2 = −r ( pg) + ( p g) ∂t ∂p 2 ∂ p2
(3.133)
where g( p, t)d p = f (x, t)dx, or f (x, t) = pg( p, t) with x = ln p. A word on coordinate transformations is needed at this point. Beginning with an Ito equation for p, the transformation x = h( p, t) yields an Ito equation for x. Each Ito equation has a corresponding Fokker–Planck equation. If g( p, t) solves the Fokker–Planck equation in p, then the solution to the Fokker–Planck equation in x is given by f (x, t) = g(m(x, t), t)dm(x, t)/dx where m(x, t) = p is the inverse of x = h(p, t). This is because the solutions of Fokker–Planck equations transform like scalar densities. With x = ln p, for example, we get g( p, t) = f (ln( p/ p0 ), t)/ p. This transformation is important for Chapters 5 and 6. 3.6.3 Wiener integrals We can express the transition probability and more general solutions of Markov processes as Wiener integrals (Kac, 1959), also called path integrals (Feynman and
Stochastic processes
55
Hibbs, 1965) because they were discovered independently by Feynman, motivated by a conjecture in quantum theory made by Dirac. Beginning with the sde dx = σ (x, t) • dB(t)
(3.134)
where the drift (assuming R is x-independent) has been subtracted out, t+t
R(s)ds → x
x −
(3.135)
t
note that the conditional probability is given by 1 g(x, t |x0 , t0 ) = ␦(x − σ • B) = 2
dkeikx e−ikσ •B (3.136)
The average is over very small Gaussian noise increments Bk . For the general solution where the probability density obeys the initial condition f (x, 0) = f 0 (x), we can work backward formally from f (x, t) = g(x, t | z, 0) f 0 (z)dz = dz f 0 (z)␦(z − x + σ • B) (3.137) to obtain f (x, t) = f 0 (x − σ • B)
(3.138)
To illustrate the calculation of averages over Gaussian noise B, consider the simple case where σ (t) in the Ito product is independent of x. Writing the Ito product as a finite sum over small increments Bk we have 1 g(x, t | x0 , t0 ) = 2
ikx
dke
n
d∆B j e−B j /2␦t e−ikσ j ␦tB j 2
(3.139)
j=1
where t = n␦t. Using the fact that the Fourier transform of a Gaussian is also a Gaussian −Bk2 /2␦t 2 −ikσk Bk e = e−␦t(kσk ) (3.140) dBk e √ 2␦t we obtain g(x, t | x0 , t0 ) =
1 2
dkeikx e−k
2
σ˜ 2 /2
1 2 2 =√ e−x /2σ˜ 2 σ˜ 2
(3.141)
56
Probability and stochastic processes
where t+t
σ˜ 2 =
σ 2 (s)ds
(3.142)
t
We have therefore derived the Green function for the diffusion equation (3.121) with variance σ (t) by averaging over Gaussian noise. The integral over the Bk in (3.139) is the simplest example of a Wiener integral. Note that the sde (3.134) is equivalent to the simplest diffusion equation (3.121) with constant variance in the time variable τ where dτ = σ 2 dt. For the case where σ depends on position x(t) we need an additional averaging over all possible paths connecting the end points (x, x0 ). This is introduced systematically as follows. First, we write ␦(x − x0 − σ • B) =
∞ n
dxi−1 ␦(xi − xi−1 − σi−1 Bi−1 )
(3.143)
i=2 −∞
where x = xn and x0 are fixed end points. From this we have g(x, t | x0 , t0 ) = ␦(x − σ • B) 1 = dxi−1 dkeikxi e−ikσi •Bi (2)n−2 i
(3.144)
where xi = xi − xi−1 . Doing the Gaussian average gives us g(x, t | x0 , t0 ) =
∞ n i=2 −∞
1
e dxi−1 2σ 2 (xi−1 , ti−1 )␦t
−
(xi −xi−1 )2 2σ 2 (xi−1 ,ti )␦t
(3.145)
for large n (n eventually goes to infinity), and where t = n␦t. A diffusion coeffi√ cient D(x, t) = σ 2 that is linear in x/ t yields the exponential distribution (see Chapter 6 for details). Note that the propagators in (3.145) are the transition probabilities derived for the local solution of the sde (3.134). The approximate solution for very small time intervals ␦t is ␦x ≈ σ (x, t)B
(3.146)
␦x ≈ B σ (x, t)
(3.147)
so that
Stochastic processes
57
is Gaussian with mean square fluctuation ␦t: g0 (␦x, ␦t) ≈
1
e−(␦x) /2σ 2
2σ 2 (x, t)␦t
2
(x,t)␦t
(3.148)
Using (3.148), we can rewrite (3.145) as g(x, t | x0 , t0 ) =
n−1
∞ dxi g0 (xi , ti | xi−1 , ti−1 )
(3.149)
i=1 −∞
where g0 is the local approximation to the global propagator g. Gaussian approximations are often made in the literature by mathematical default, because Wiener integrals are generally very hard to evaluate otherwise. Often, it is easier to solve the pde directly to find the Green function than to evaluate the above expression. However, the Wiener integral provides us with a nice qualitative way of understanding the Green function. Monte Carlo provides one numerical method of evaluating a Wiener integral. In statistical physics and quantum field theory the renormalization group method is used, but accurate results are generally only possible very near a critical point (second-order phase transition).
3.6.4 Local vs global volatility In Chapter 6 we will need the distinction between local and global volatility. We now present that idea. Just as in the case of deterministic ordinary differential equations (odes), we can distinguish between global and local solutions. Whenever (3.74) has a solution valid for all (t, t) then that is the global solution. Examples of global solutions of sdes are given by (3.58) and (3.65). This is analogous to the class of integrable ordinary differential equations. Whenever we make an approximation valid only for small enough t, ˙ + G
/2)t + G (x(t), t)B ˙ + G
/2)t + G • B ≈ (G G ≈ (G
(3.150)
then we have an example of a local solution. Equation (3.73) above is an example of a local solution, valid only for small enough p. Again, these distinctions are essential but are not made in Hull’s 1997 text on options. We apply this idea next to volatility, where the global volatility is defined by σ 2 = x 2 − x2 . If we start with the sde (3.151) dx = Rdt + D(x, t)dB(t)
58
Probability and stochastic processes
where x(t) = ln( p(t)/ p(t0 )) is the return for prices at two finitely separated time intervals t and t0 , then we can calculate the volatility for small enough t from the conditional average t+t
t+t ∞
D(x(s), s) ds =
x 2 ≈ t
D(z, s)g(z, s | x, t)dzds t
(3.152)
−∞
This can be obtained from equation (3.125) for the second moment. For small enough t we can approximate g ≈ ␦(z − x) to obtain t+t
x ≈
D(x, s)ds ≈ D(x, t)t
2
(3.153)
t
which is the expected result: the local volatility is just the diffusion coefficient. Note that (3.153) is just what we would have obtained by iterating the stochastic integral equation (3.85) one time and then truncating the series. The global volatility for arbitrary t is given by ⎞2 ⎛ t+t R(x(s), s)ds ⎠ σ 2 = x 2 − x2 = ⎝ t
t+t
t+t ∞
D(z, s)g( z, s| x, t)dzds −
+ t
−∞
2 R(x(s), s)ds
(3.154)
t
which does not necessarily go like t for large t, depending on the model under consideration. For the asymptotically stationary process known as the Smoluchowski–Uhlenbeck–Ornstein process (see Sections 3.7 and 4.9), for example, σ 2 goes like t for small t but approaches a constant as t becomes large. But financial data are not stationary, as we will see in Chapter 6. As the first step toward understanding that assertion, let us next define a stationary process.
3.7 Correlations and stationary processes The purpose of this section is to provide an introduction to what is largely ignored in this book: correlations. The reason for the neglect is that liquid markets (stock, bond, foreign exchange) are very hard to beat, meaning that to a good zeroth approximation there are no long-time correlations that can be exploited for profit. Markov processes, as we will show in Chapter 6, provide a good zeroth-order approximation to very liquid markets.
Correlations and stationary processes
59
Financial data indicate that strong initial pair correlations die out relatively quickly on a time scale of 10 min of trading. After that the average volatility obeys σ 2 ≈ t H with H = O(1/2), as is discussed by Mantegna and Stanley (2000). We therefore need a description of correlations. A stationary process for n random variables is defined by a time-translation invariant probability distribution P(x1 , . . . , xn ; t1 , . . . , tn ) = P(x1 , . . . , xn ; t1 + t, . . . , tn + t) (3.155) A distribution of n−m variables is obtained by integrating over m variables. It follows that the one-point distribution P(x) is time invariant (time independent), dP/dt = 0, so that the averages x and ∆x 2 = σ 2 are constants independent of t. Wiener processes and other nonsteady forms of diffusion are therefore not stationary. In general, a Markov process cannot be stationary at short times, and can only be stationary at large times if equilibrium or some other steady state is reached. The S–U–O process provides an example of an asymptotically stationary Markov process, where after the exponential decay of initial pair correlations statistical equilibrium is reached. Applied to the velocity of a Brownian particle, that process describes the approach to equipartition and the Maxwellian velocity distribution (Wax, 1954). To calculate two-point correlations we need the two-point distribution P(x1 , x2 ; t1 , t2 ). The time-translated value is given by the power series in t1 and t2 of P(x1 , x2 ; t1 + t, t2 + t) = T P(x1 , x2 ; t1 , t2 )
(3.156)
T = et∂/∂t1 +∆t∂/∂t2
(3.157)
where
is the time-translation operator. From the power series expansion of T in (3.149), we see that time-translational invariance requires that the distribution P satisfies the first-order partial differential equation ∂P ∂P + =0 ∂t1 ∂t2
(3.158)
Using the nineteenth-century mathematics of Jacobi, this equation has characteristic curves defined by the simplest globally integrable dynamical system dt1 = dt2
(3.159)
t1 − t2 = constant
(3.160)
with the solution
60
Probability and stochastic processes
which means that P depends only on the difference t1 − t2 . It follows that P(x1 , x2 ; t1 , t2 ) = P(x1 , x2 ; t1 − t2 ). It also follows that the one-point probability P(x, t) = P(x) is time independent. Statistical equilibrium and steady states are examples of stationary processes. Consider next a stochastic process x(t) where x1 = x(t), x2 = x(t + t). The pair correlation function is defined by x(t)x(t + t) = x1 x2 dP(x1 , x2 ; t) (3.161) where x = x − x. Since x is not square integrable in t for unbounded times but only fluctuates about its average value of 0, we can form a Fourier transform (in reality, Fourier series, because empirical data and simulation results are always discrete) in a window of finite width 2T ∞ A(ω, T )eiωt dω x(t) = −∞
1 A(ω, T ) = 2
T
x(t)e−iωt dt
(3.162a, b)
−T
If the stochastic system is ergodic (Yaglom and Yaglom, 1962) then averages over x can be replaced by time averages yielding 1 x(t)x(t + t) = T
T
∞ x(t)x(t + t)dt =
G(ω)eiωt dω
(3.163)
−∞
−T
where the spectral density is given by |A(ω, T )2 | (3.164) T for large T , and where the mean square fluctuation is given by the time-independent result G(ω) = 2
1 σ 2 = x(t)2 = T
T
∞ x(t)2 dt =
−T
G(ω)dω
(3.165)
−∞
Clearly, the Wiener process is not stationary, it has no spectral density and has instead a mean square fluctuation σ 2 that grows as t 1/2 . We will discuss nonstationary processes and their importance for economics and finance in Chapters 4, 6 and 7.
Correlations and stationary processes
61
A stationary one-point probability density P(x) is time independent, dP(x)/dt = 0, and can only describe a stochastic process that either is in a steady state or equilibrium. Financial time series are not stationary. Market distributions are neither in equilibrium nor stationary, they are diffusive with effectively unbounded x so that statistical equilibrium is impossible, as is discussed in Chapters 4 and 7. As we noted above, financial data are pair-correlated over times on the order of 10 min, after which one obtains nonstationary Brownian-like behavior σ 2 ≈ ct 2H
(3.166)
with H = O(1/2) to within observational error. We will see in Chapter 8 that the case where H = 1/2, called fractional Brownian motion, implies long-time correlations. Even if H = 1/2, this does not imply that higher-order correlation functions in returns show statistical independence. Equation (3.166) tells us nothing about three-point or higher-order correlation functions, for example. Consider next some well-studied model spectral densites. White noise is defined heuristically in the physics literature by the Langevin equation dB = ξ (t)dt
(3.167a)
with the formally time-translationally invariant autocorrelation function ξ (t)ξ (t ) = ␦(t − t )
(3.167b)
We have stressed earlier in this chapter that B(t1 )B(t2 ) = 0 if t1 and t2 do not overlap, but this correlation function does not vanish for the case of overlap (Stratonovich, 1963). From (3.161) it follows that the spectral density of white noise is constant, G(ω) =
1 2
(3.168)
so that the variance of white noise is infinite. For the Wiener process we obtain t+t
B(t) = 2
t+t
dwξ (s)ξ (w) = t
ds t
(3.169a)
t
which is correct, and we see that the stochastic “derivative” dB/dt of a Wiener process B(t) defines white noise, corresponding to the usual Langevin equations used in statistical physics. The model autocorrelation function R(t) = σ 2 e−|t|/τ
(3.169b)
62
Probability and stochastic processes
with spectral density G(ω) =
2σ 2 τ 1 + (ωτ )2
(3.170)
approximates white noise (lack of correlations) at low frequencies (and also the S–U–O process) whereas for a continuous time process with unbounded random variable x the high-frequency approximation G(ω) ∼ ω−2 characterizes correlated motion over the finite time scale τ . It is often claimed in the literature that “1/ f 2 noise” characterizes a random process but this apparently is not true without further assumptions, like a discrete time process on a circle (amounting to the assumption of periodic boundary conditions on x).3 Stretched exponential autocorrelation functions may have finite spectral densities, whereas power-law correlations decay very slowly, R(t) ≈ (t/τ )η−1
(3.171)
with a finite cutoff at short times, and (with 0 ≤ ν ≤ 1) are infinitely long-ranged (see Kubo et al., 1978, for an example from laminar hydrodynamics). Long-ranged correlations based on spectral densities G(ω) ∼ ω−η with 0 < η < 2, with a lowfrequency cutoff that breaks scale invariance, have been modeled in papers on SOC (self-organized criticality). A zeroth-order qualitative picture of financial markets arising from empirical data (see Dacorogna et al., 2001) is that the short-time correlated behavior of returns is described by G(ω) ∼ ω−2 . However, it is not white noise but rather a nonstationary stochastic process with average volatility σ 2 ≈ O(t) (with no spectral density) that describes the longer-time behavior of the data. As we pointed out above, Brownian behavior of the average volatility does not imply Gaussian returns but is consistent with many other models with nontrivial local volatility, like the exponential distribution that we will discuss in detail in Chapter 6. As we stated at the beginning of the section, financial returns are, to a good lowest-order approximation, Markovian, corresponding to the difficulty of beating the market. A possible correction to this picture is a form of weak but long-time correlation called fractional Brownian motion, as is discussed in Chapter 8. 3
Maybe this is the case assumed in Mantegna and Stanley (2000), where 1/ f 2 noise is mentioned.
4 Scaling the ivory tower of finance
4.1 Prolog In this chapter, whose title is borrowed from Farmer (1999), we discuss basic ideas from finance: the time value of money, arbitrage, several different ideas of value, as well as the Modigliani–Miller theorem, which is a cornerstone of classical finance theory. We then turn our attention to several ideas from econophysics: fattailed distributions, market instability, and universality. We criticize the economists’ application of the word “equilibrium” to processes that vary rapidly with time and are far from dynamic equilibrium, where supply and demand certainly do not balance. New points of view are presented in the two sections on Adam Smith’s Invisible Hand and Fischer Black’s notion of “equilibrium.” First we will start with elementary mathematics, but eventually will draw heavily on the introduction to probability and stochastic processes presented in Chapter 3.
4.2 Horse trading by a fancy name The basic idea of horse trading is to buy a nag cheaply and unload it on someone else for a profit. One can horse trade in financial markets too, where it is given the fancy name “arbitrage.” Arbitrage sounds more respectable, especially to academics and bankers.1 The idea of arbitrage is simple. If the Euro sells for $1.10 in Frankfurt and $1.09 in New York, then traders should tend to short the Euro in Frankfurt and simultaneously buy it in New York to repay the borrowed Euros, assuming that transaction costs and taxes are less than the total gain (taxes and transaction costs are ignored to zeroth order in theoretical finance arguments). 1
For a lively description of the bond market in the time of the early days of derivatives, deregulation, and computerization on Wall Street, see Liar’s Poker by the ex-bond salesman Lewis (1989).
63
64
Scaling the ivory tower of finance
A basic assumption in standard finance theory is that arbitrage opportunities should quickly disappear as arbitrage is performed by traders (Bodie and Merton, 1998). This leads to the so-called no-arbitrage argument, or “law of one price.” The idea is that arbitrage occurs on a very short time scale, and on longer time scales equivalent assets will then tend to have the same ask price (or bid price) in different markets (assuming markets with similar tax structure, transaction costs, etc.) via the action of traders taking advantage of arbitrage opportunities on the shorter time scales. Finance theorists like to say that a market with no-arbitrage opportunities is an “efficient market.” Arbitrage can be performed in principle on any asset, like the same stock in different markets, for example, and on different but equivalent stocks in the same market. Without further information, however, using this argument to try to decide whether to short AMD and buy INTC because the former has a larger P/E (price to earnings ratio) than the latter is problematic because the two companies are not equivalent. The big question therefore is: can we determine that an asset is overpriced or underpriced? In other words: do assets have an observable intrinsic or fundamental value, or any identifiable, observable “value” other than market price? But before we get into this question let us briefly revisit the arbitrage issue. The no-arbitrage condition is called an “equilibrium” condition in finance texts and papers. In Nakamura (2000) the no-arbitrage condition is confused with Adam Smith’s Invisible Hand. This is all in error. Consider two markets with different prices for the same asset. Via arbitrage the price can be lowered in one market and raised in the other, but even if the prices are the same in both markets a positive excess demand will cause the price to increase continually. Therefore, the absence of arbitrage opportunities does not imply either equilibrium or stability. 4.3 Liquidity, and several shaky ideas of “true value” A Euro today does not necessarily cost the same in today’s dollars as a Euro a year from now, even if neither currency would fluctuate due to noise. For example, in a deflation money gains in relative value but in an inflation loses in relative value. If the annual interest rate minus inflation is a positive number r, then a Euro promised to be paid next year is worth e−r t to the recipient today. In finance texts this is called “the time value of money.” For n discrete time intervals t with interest rate r for each interval, we have p(tn ) = p(t0 )(1 + r t)n . This is also called “discounting.” The time value of money is determined by the ratio of two prices at two different times. But now consider an asset other than money. Is there an underlying “true value” of the asset, one fundamental price at one time t? It turns out that the true value of an asset is not a uniquely defined idea: there are at least five different
Liquidity and several shaky ideas of “true value”
65
definitions of “value” in finance theory. The first refers to book value. The second uses the replacement price of a firm (less taxes owed, debt and other transaction costs). These first two definitions are loved by market fundamentalists and can sometimes be useful, but we don’t discuss them further in what follows. That is not because they are not worth using, but rather because it is rare that market prices for companies with good future prospects would fall so low. Instead, we will concentrate on the standard ideas of value from finance theory. Third is the old idea of dividends and returns discounted infinitely into the future for a financial asset like a stock or bond and which we will discuss next. The fourth idea of valuation due to Modigliani and Miller is discussed in Section 4.5 below. The idea of dividends and returns discounted infinitely into the future for a financial asset is very shaky, because it makes impossible information demands on our knowledge of future dividends and returns. That is, it is impossible to apply with any reasonable degree of accuracy. Here’s the formal definition: starting with the total return given by the gain Rt due to price increase with no dividend paid in a time interval t, and using the small returns approximation, we have x = ln p(t)/ p(t0 ) ≈ p/ p
(4.1)
p(t + t) ≈ p(t)(1 + Rt)
(4.2)
or
But paying a dividend d at the end of a quarter (t = 1 quarter) reduces the stock price, so that for the nth quarter pn = pn−1 (1 + Rn ) − dn
(4.3)
If we solve this by iteration for the implied fair value of the stock at time t0 then we obtain ∞ dn (4.4) p(t0 ) = 1 + Rn k=1 whose convergence assumes that pn goes to zero as n goes to infinity. This reflects the assumption that the stock is only worth its dividends, a questionable assumption at best. Robert Shiller (1999) uses this formal definition of value in his theoretical discussion of the market efficiency in the context of rational vs irrational behavior of agents, in spite of the fact that equation (4.4) can’t be tested observationally and therefore is not even falsifiable. In finance, as in physics, we must avoid using ideas that are merely “defined to exist” mathematically. The ideas should be effectively realizable in practice or else they don’t belong in a theory. Equation (4.4) also conflicts with the Modigliani–Miller idea that dividends don’t matter, which we present in Section 4.5 below.
66
Scaling the ivory tower of finance
The idea of a market price means that buyers are available for sellers, and vice versa, albeit not necessarily at exactly the prices demanded or offered. This leads us to the very important idea of liquidity. An example of an extremely illiquid market is provided by a crash, where there are mainly sellers and few, if any, buyers. When we refer to “market price” we are making an implicit assumption of adequate liquidity. A liquid market is one with many rapidly executed trades in both directions, where consequently bid/ask spreads are small compared with price. This allows us to define “price,” meaning “market price,” unambiguously. We can in this case take “market price” as the price at which the last trade was executed. Examples of liquid markets are stocks, bonds, and foreign exchange of currencies like the Euro, Dollar and Yen, so long as large buy/sell orders are avoided, and so long as there is no market crash. In a liquid market a trade can be approximately reversed over very small time intervals (on the order of seconds in finance) with only very small losses. The idea of liquidity is that the size of the order is small enough that it does not affect the other existing limit orders. An illiquid market is one with large bid/ask spreads, like housing, carpets, or cars, where trades occur far less frequently and with much lower volume than in financial markets. As we’ve pointed out in Chapter 2, neo-classical equilibrium arguments can’t be trusted because they try to assign relative prices to assets via a theory that ignores liquidity completely. Even with the aid of modern options pricing theory the theoretical pricing of nonliquid assets is highly problematic, but we leave that subject for the next two chapters. Also, for many natural assets like clean air and water, a nice hiking path or a mountain meadow, the subjective idea of value cannot be reliably quantified. Finance theorists assume the contrary and believe that everything has a price that reflects “the market,” even if liquidity is nearly nonexistent. An example of a nonliquid asset (taken from Enron) is gas stored for months in the ground.2 The neo-classical assumption that everything has its price, or should have a price (as in the Arrow–Debreu Theory of Value), is not an assumption that we make here because there is no empirical or convincing theoretical basis for it. More to the point, we will emphasize that the theoretical attempt to define a fair price noncircularly for an asset is problematic even for well-defined financial assets like firms, and for very liquid assets like stocks, bonds, and foreign exchange transactions. The successful trader George Soros, who bet heavily against the British Pound and won big, asserts that the market is always wrong. He tries to explain what he means by this in his book The Alchemy of Finance (1994) but, like a baseball batter trying to explain how to hit the ball, Soros is better at winning than at understanding 2
Gas traded daily on the spot market is a liquid asset. Gas stored in the ground but not traded has no underlying market statistics that can be used for option pricing. Instead, finance theorists use a formal Martingale approach based on “synthetic probabilities” in order to assign “prices” to nonliquid assets. That procedure is shaky precisely because it lacks empirical support.
The Gambler’s Ruin
67
how he wins. The neo-classical approach to finance theory is to say instead that “the market knows best,” that the market price p(t) (or market bid/ask prices) is the fair price, the “true value” of the asset at time t. That is the content of the efficient market hypothesis, which we will refer to from now on as the EMH. We can regard this hypothesis as the fifth definition of true value of an asset. It assumes that the only information provided by the market about the value of an asset is its current market price and that no other information is available. But how can the market “know best” if no other information is available? Or, even worse, if it consists mainly of noise as described by a Markov process? The idea that “the market knows best” is a neo-classical assumption based on the implicit belief that an invisible hand stabilizes the market and always swings it toward equilibrium. We will return to the EMH in earnest in Chapter 7 after a preliminary discussion in Chapter 5. The easy to read text by Bodie and Merton (1998) is a well-written undergraduate introduction to basic ideas in finance. Bernstein (1992) presents an interesting history of finance, if from an implicit neo-classical viewpoint. Eichengren (1996) presents a history of the international monetary system. 4.4 The Gambler’s Ruin Consider any game with two players (you and the stock market, for example). Let d denote a gambler’s stake, and D is the house’s stake. If borrowing is not possible then d + D = C = constant is the total amount of capital. Let Rd denote the probability that the gambler goes broke, in other words the probability that d = 0 so that D = C. Assume a fair game; for example, each player bets on the outcome of the toss of a fair coin. Then 1 1 (4.5) Rd = Rd+1 + Rd−1 2 2 with boundary conditions R0 = 1 (ruin is certain) and RC = 0 (ruin is impossible). To solve (4.5), assume that Rd is linear in d. The solution is d D =1− C C Note first that the expected gain for either player is zero, Rd =
G = −d Rd + D(1 − Rd ) = 0
(4.6)
(4.7)
representing a fair game on the average: for many identical repetitions of the same game, the net expected gain for either the player or the bank vanishes, meaning that sometimes the bank must also go broke in a hypothetically unlimited number of repetitions of the game. In other words, in infinitely many repeated games the idea of a fair game would re-emerge: neither the bank nor the opponent would lose
68
Scaling the ivory tower of finance
money on balance. However, in finitely many games the house, or bank, with much greater capital has the advantage, the player with much less capital is much more likely to go broke. Therefore if you play a fair game many times and start with capital d < D you should expect to lose to the bank, or to the market, because in this case Rd >1/2. An interesting side lesson taught by this example that we do not discuss here is that, with limited capital, if you “must” make a gain “or else,” then it is better to place a single bet of all your capital on one game, even though the odds are that you will lose. By placing a single large bet instead of many small bets you improve your odds (Billingsley, 1983). But what does a brokerage house have to do with a casino? The answer is: quite a lot. Actually, a brokerage house can be understood as a full service casino (Lewis, 1989; Millman, 1995). Not only will they place your bets; they will lend you the money to bet with, on margin, up to 50%. However, there is an important distinction between gambling in a casino and gambling in a financial market. In the former the probabilities are fixed: no matter how many people bet on red, if the roulette wheel turns up black they all lose. In the market, the probability that you win increases with the number of people making the same bet as you. If you buy a stock and many other people buy the same stock then the price is driven upward. You win if you sell before the others get out of the market. That is, in order to win you must (as Keynes pointed out) guess correctly what other people are going to do before they do it. This would require having better than average information about the economic prospects of a particular business, and also the health of the economic sector as a whole. Successful traders like Soros and Buffet are examples of agents with much better than average information and knowledge. 4.5 The Modigliani–Miller argument We define the “capital structure” of a publicly held company as the division of financial obligations into stocks and bonds. The estimated value3 of a firm is given by p = B + S, where B is the total debt and S is the equity, also called market capitalization. Defined as B + S, market value p is measurable because we can find out what is B, and S = ps Ns , where Ns is the number of shares of stock outstanding at price ps . For shares of a publicly traded firm like INTC, one can look up both Ns and ps on any discount broker’s website. The Modigliani–Miller (M & M, meaning Franco Modigliani and Merton Miller) theorem asserts that capital structure doesn’t matter, that the firm’s market value p (what the firm would presumably sell for on the open market, were it for sale) is independent of the ratio B/S. Liquidity of the market is taken for granted in this discussion in spite of the fact that huge, 3
One might compare this with the idea of “loan value,” the value estimated by a bank for the purpose of lending money.
The Modigliani–Miller argument
69
global companies like Exxon and GMC rarely change hands: the capital required for taking them over is typically too large. Prior to the Modigliani and Miller (1958) theorem it had been merely assumed without proof that the market value p of a firm must depend on the fraction of a firm’s debt vs its equity, B/S. In contrast with that viewpoint, the M & M theorem seems intuitively correct if we apply it to the special case of buying a house or car: how much one would have to pay for either today is roughly independent of how much one pays initially as down payment (this is analogous to S) and how much one borrows to finance the rest (which is analogous to B). From this simple perspective the correctness of the M & M argument seems obvious. Let us now reproduce M & M’s “proof” of their famous theorem. Their “proof” is based on the idea of comparing “cash flows” of equivalent firms. M & M neglect taxes and transaction fees and assumed a very liquid market, one where everyone can borrow at the same risk-free interest rate. In order to present their argument we can start with a simple extrapolation of the future based on the local approximation ignoring noise p ≈ r pt
(4.8)
where p(t) should be the price of the firm at time t. This equation assumes the usual exponential growth in price for a risk-free asset like a money market account where r is fixed. Take the expected return r to be the market capitalization rate, the expected growth rate in value of the firm via earnings (the cash flow), so that p denotes earnings over a time interval t. In this picture p represents the value of a firm today based on the market’s expectations of its future earnings p at a later time t + t. To arrive at the M & M argument we concentrate on p ≈ p /r
(4.9)
where p is to be understood as today’s estimate of the firm’s net financial worth based on p = E and r , the expected profit and expected rate of increase in value of the firm over one unit of time, one quarter of a year. If we take t = 1 quarter in what follows, then E denotes expected quarterly earnings. With these assumptions, the “cash flow” relation E = pr yields that the estimated fair price of the firm today would be p = E/r
(4.10)
where r is the expected rate of profit/quarter and E is the expected quarterly earnings. Of course, in reality we have to know E at time t + t and p at time t and then r can be estimated. Neither E nor r can be known in advance and must either be estimated from historic data (assuming that the future will be like the past) or else
70
Scaling the ivory tower of finance
guessed on the basis of new information. In the relationship p = B + S, in contrast, B and S are always observable at time t. B is the amount of money raised by the firm for its daily operations by issuing bonds and S is the market capitalization, the amount of money raised by issuing shares of stock. Here comes the main point: M & M want us to assume that estimating E/r at time t is how the market arrives at the observable quantities B and S. To say the least, this is a very questionable proposition. In M & M’s way of thinking if the estimated price E/r differs from the market price p = B + S then there is an arbitrage opportunity. M & M assume that there is no arbitrage possible, so that the estimated price E/r and the known value B + S must be the same. Typical of neo-classical economists, M & M mislabel the equality B + S = E/r as “market equilibrium,” although the equality has nothing to do with equilibrium, because in equilibrium nothing can change with time. In setting B + S = p = E/r , M & M make an implicit assumption that the market collectively “computes” p by estimating E/r , although E/r cannot be known in advance. That is, an implicit, unstated model of agents’ collective behavior is assumed without empirical evidence. The assumption is characteristic of neoclassical thinking.4 One could try to assert that the distribution of prices, which is in reality mainly noise (and is completely neglected in M & M), reflects all agents’ attempts to compute E/r , but it is doubtful that this is what agents really do, or that the noise can be interpreted as any definite form of computation. In reality, agents do not seem to behave like rational bookkeepers who try to get all available information in numerical bits. Instead of bookkeepers and calculators, one can more accurately speak of agents, who speculate about many factors like the “mood” of the market, the general economic climate of the day triggered by the latest news on unemployment figures, etc., and about how other agents will interpret that data. One also should not undervalue personal reasons like financial constraints, or any irrational look into the crystal ball. The entire problem of agents’ psychology and herd behavior is swept under the rug with the simple assumptions made by M & M, or by assuming optimizing behavior. Of course, speculation is a form of gambling: in speculating one places a bet that the future will develop in a certain way and not in alternative ways. Strategies can be used in casino gambling as well, as in black jack and poker. In the book The Predictors (Bass, 1991) we learn how the use of a small computer hidden in the shoe and operated with the foot leads to strategies in roulette as well. This aside was necessary because when we can agree that agents behave less like rational computers and more like gamblers, then M & M have ignored something 4
The market would have to behave trivially like a primitive computer that does only simple arithmetic, and that with data that are not known in advance. Contrast this with the complexity of intellectual processes described in Hadamard (1945).
The Modigliani–Miller argument
71
important: the risk factor, and risk requires the inclusion of noise5 as well as possible changes in the “risk free” interest rate which are not perfectly predictable and are subject to political tactics by the Federal Reserve Bank. Next, we follow M & M to show that dividend policy does not affect net shareholders’ wealth in a perfect market, where there are no taxes and transaction fees. The market price of a share of stock is just ps = S/Ns . Actually, it is ps and Ns that are observable and S that must be calculated from this equation. Whether or not the firm pays dividends to shareholders is irrelevant: paying dividends would reduce S, thereby reducing ps to ps = (S − ␦S)/Ns . This is no different in effect than paying interest due quarterly on a bond. Paying a dividend is equivalent to paying no dividend but instead diluting the market by issuing more shares to the same shareholders (the firm could pay dividends in shares), so that ps = S/(Ns + ␦Ns ) = (S − ␦S)/Ns . In either case, or with no dividends at all, the net wealth of shareholders is the same: dividend policy affects share price but not shareholders’ wealth. Note that we do not get ps = 0 if we set dividends equal to zero, in contrast with (4.4). Here is a difficulty with the picture we have just presented: liquidity has been ignored. Suppose that the market for firms is not liquid, because most firms are not traded often or in volume. Also, the idea of characterizing a firm or asset by a single price doesn’t make sense in practice unless bid/ask spreads are small compared with both bid and ask prices. Estimating fair price p independently of the market in order to compare with the market price B + S and find arbitrage opportunities is not as simple as it may seem (see Bose (1999) for an application of equation (4.10) to try to determine if stocks and bonds are mispriced relative to each other). In order to do arbitrage you would have to have an independent way of making a reliable estimate of future earnings E based also on an assumption what is the rate r during the next quarter. Then, even if you use this guesswork to calculate a “fair price” that differs from the present market price and place your bet on it by buying a put or call, there is no guarantee that the market will eventually go along with your sentiment within your prescribed time frame. For example, if you determine that a stock is overpriced then you can buy a put, but if the stock continues to climb in price then you’ll have to meet the margin calls, so the gamblers’ ruin may break your bank account before the stock price falls enough to exercise the put. This is qualitatively what happened to the hedge fund Long Term Capital Management (LTCM), whose collapse in 1998 was a danger to the global financial system (Dunbar, 2000). Remember, there are no springs in the market, only unbounded diffusion of stock prices with nothing to pull them back to your notion of “fair value.” 5
Ignoring noise is the same as ignoring risk, the risk is in price fluctuations. Also, as F. Black pointed out, “noise traders” provide liquidity in the market.
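The bookkeeping in the dividend argument above is easy to check with made-up numbers. The following minimal Python sketch (all quantities are assumptions, not data) confirms that a cash dividend and an equivalent share dilution leave net shareholder wealth unchanged:

# Hypothetical numbers: a firm with market capitalization S and Ns shares.
S = 1.0e9      # market capitalization (dollars), assumed
Ns = 1.0e7     # number of shares outstanding, assumed
dS = 5.0e7     # total dividend paid out (dollars), assumed

ps0 = S / Ns   # share price before the payout

# Case 1: cash dividend. S drops by dS, the price drops, cash goes to shareholders.
ps_cash = (S - dS) / Ns
wealth_cash = Ns * ps_cash + dS           # share value plus cash received

# Case 2: "dividend" paid by diluting with new shares of equal total value.
dNs = dS / ps_cash                        # new shares so that S/(Ns + dNs) = (S - dS)/Ns
ps_dilute = S / (Ns + dNs)
wealth_dilute = (Ns + dNs) * ps_dilute    # all shares still held by the same shareholders

print("price before:          %.2f" % ps0)
print("price, cash dividend:  %.2f  net wealth: %.3e" % (ps_cash, wealth_cash))
print("price, share dividend: %.2f  net wealth: %.3e" % (ps_dilute, wealth_dilute))
# Both net wealth figures equal the original Ns*ps0 = S: dividend policy moves the
# share price but not shareholders' wealth.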
To summarize, the M & M argument that p = B + S is independent of B/S makes sense in some cases,6 but the assumption that most agents compute what they can't know, namely E/r to determine a fair price p, does not hold water. The impossibility of using then-existing finance theory to make falsifiable predictions led Black via the Capital Asset Pricing Model (CAPM) to discover a falsifiable model of options pricing, which (as he pointed out) can be used to value corporate liabilities. We will study the CAPM in the next chapter.

4.6 From Gaussian returns to fat tails

The first useful quantitative description of stock market returns was proposed by the physicist turned finance theorist M. F. M. Osborne (1964), who plotted histograms based on Wall Street Journal data in order to try to deduce the empirical distribution. This is equivalent to assuming that asset returns do a random walk. Louis Bachelier had assumed much earlier without an adequate empirical analysis that asset prices do a random walk (Gaussian distribution of prices). By assuming that prices are Gaussian distributed, negative prices appeared in the model. Osborne, who was apparently unaware of Bachelier's work, argued based on "Fechner's law" that one needs an additive variable. The returns variable x = ln(p(t + ∆t)/p(t)) is additive.7 Assuming statistically independent events plus a returns-independent variance then yields a Gaussian distribution of returns, meaning that prices are lognormally distributed.

Suppose that we know the price p of an asset at time t, and suppose that the asset price obeys a Markov process. This is equivalent to assuming that the probability distribution of prices at a later time t + ∆t is fixed by p(t) alone, or by the distribution of p(t) alone. In this case knowledge of the history of earlier prices before time t adds nothing to our ability to calculate the probability of future prices. The same would be true were the system described by deterministic differential equations. However, deterministic differential equations are always time reversible: one can recover the past history by integrating backward from a single point on a trajectory. In contrast, the dynamics of a Markov process is diffusive, so that the past cannot be recovered from knowledge of the present. From the standpoint of Markov processes Bachelier's model is given by the model stochastic differential equation (sde)

dp = Rdt + σdB    (4.11)
6 For a very nice example of how a too small ratio S/B can matter, see pp. 188–190 in Dunbar (2000). Also, the entire subject of Value at Risk (VaR) is about maintaining a high enough ratio of equity to debt to stay out of trouble while trading.
7 Without noise, x = ln p(t + ∆t)/p(t) would give p(t + ∆t)/p(t) = e^x, so that x = R∆t would be the return during time interval ∆t on a risk-free asset.
with constant variance σ (linear price growth) and predicts a qualitatively wrong formula for returns x, because with zero noise the return x should be linear in t, corresponding to interest paid on a savings account. Osborne's model is described by

dp = Rpdt + σpdB    (4.12)

with nonconstant price diffusion coefficient8 (σp)² and predicts a qualitatively correct result (at least until winter, 2000) for the expected price (exponential growth)

p(t + ∆t) = p(t)e^(R∆t)e^(σ∆B(t))    (4.13)
corresponding to an sde dx = Rdt + σdB for returns (linear returns growth). The Black–Scholes (B–S) option pricing theory, which is presented in all detail in the next chapter and assumes the lognormal pricing model, was published in 1973 just as options markets began an explosive growth. However, we now know from empirical data that returns are not Gaussian, that empirical financial distributions have "fat tails" characterized by scaling exponents (Dacorogna, 2000). We have in mind here the data from about 1990–2002. Prior to 1990 computerization was not extensive enough for the accumulation of adequate data for analysis on time scales from seconds to hours. Extreme events are fluctuations with x ≫ σ, where σ is the standard deviation. With a Gaussian distribution of returns x, extreme events are extremely unlikely. In contrast, in financial markets extreme events occur too frequently to be ignored while assessing risk. The empirical market distributions have "fat tails," meaning that the probability for a large fluctuation is not exponentially small, f(x, t) ≈ exp(−x²/2σ²t) as in the Osborne–Black–Scholes theory of asset prices, but rather is given by a power law for large p or x, g(p, t) ≈ p^(−α) or f(x, t) ≈ x^(−α), shown in Figure 4.1 as proposed by Mandelbrot. Power-law distributions were first used in economics by Pareto and were advanced much later by Mandelbrot. Mandelbrot's contribution to finance (Mandelbrot, 1964) comes in two parts. In the same era when Osborne discovered that stock prices seem to be lognormally distributed, Mandelbrot produced evidence from cotton prices that the empirical distribution of returns has fat tails. A fat-tailed price density goes as

g(p) ≈ p^(−α−1)    (4.14)

for large enough p, whereas a fat-tailed distribution of returns would have a density

f(x) ≈ x^(−α−1)    (4.15)
8 The diffusion coefficient d(p, t) times ∆t equals the mean square fluctuation in p, starting from knowledge of a specific initial price p(t). In other words, ⟨∆p²⟩ = d(p, t)∆t.
Figure 4.1. Histogram of USD/DM hourly returns, and Gaussian returns (dashed line). Figure courtesy of Michel Dacorogna.
with x = ln(p(t + ∆t)/p(t)). A distribution that has fat price tails is not fat tailed in returns. Note that an exponential distribution is always fat tailed in the sense of equation (4.14), but not in the sense of equation (4.15). A relation (4.15) of the form ln f ≈ −α ln x is an example of a scaling law, f(λx) = λ^(−1−α) f(x). In order to produce evidence for a scaling law one needs three decades or more on a log–log plot. Even two and one-half decades can lead to spurious claims of scaling because too many functions look like straight lines locally (but not globally) in log–log plots. In what follows we will denote the tail index by µ = 1 + α. Mandelbrot also discovered that the standard deviation of cotton prices is not well defined and is even subject to sudden jumps. He concluded that the correct model is one with infinite standard deviation and introduced the Levy distributions, which are fat tailed but with the restriction that 1 < α < 2, so that 2 < µ < 3. For α = 2 the Levy distribution is Gaussian. And for α > 2 the fat tails have finite variance, for α < 2 the variance is infinite and for α < 1 the tails are so fat that even the mean is infinite (we discuss Levy distributions in detail in Chapter 8). Later empirical analyses showed, in contrast, that the variance of asset returns is well defined. In other words, the Levy distribution does not describe asset returns.9 Financial returns densities f(x, t) seem to be exponential (like (4.14)) for small and moderate x with large exponents that vary with time, but cross over to fat-tailed returns (4.15) with µ ≈ 3.5 to 7.5 for extreme events. The observed tail exponents are apparently not universal and may be time dependent.
Truncated Levy distributions have been used to analyze finance market data and are discussed in Chapter 8.
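To give a feel for what measuring a tail exponent involves, and why several decades of data are needed before a power law can be claimed, here is a small Python sketch that applies a Hill-type estimator to synthetic Pareto-tailed data. The sample size and the true exponent are assumptions, not market values.

import numpy as np

rng = np.random.default_rng(0)

alpha_true = 3.0   # assumed tail exponent of the synthetic data (density exponent mu = 1 + alpha)
n = 200_000        # assumed sample size
# synthetic "returns" with survival probability P(X > x) ~ x**(-alpha_true)
x = rng.pareto(alpha_true, size=n) + 1.0

def hill_estimate(sample, k):
    """Hill-type estimate of alpha from the k largest order statistics."""
    tail = np.sort(sample)[-k:]
    return k / np.sum(np.log(tail / tail[0]))

for k in (100, 1_000, 10_000):
    print("k = %6d   estimated alpha = %.2f" % (k, hill_estimate(x, k)))
# The estimate fluctuates with the (arbitrary) choice of k even for ideal power-law data;
# for real returns, whose extreme tails span only a couple of decades, the uncertainty is
# larger still, which is why apparent straight lines on log-log plots can be spurious.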
4.7 The best tractable approximation to liquid market dynamics

If we assume that prices are determined by supply and demand then the simplest model is

dp/dt = ε(p, t)    (4.16)
where ε is excess demand. With the assumption that asset prices in liquid markets are random, we also have d p = r ( p, t)dt + d( p, t)dB(t)
(4.17)
where B(t) is a Wiener process. This means that excess demand d p/dt is approximated by drift r plus noise d( p, t)dB/dt. We adhere to this interpretation in all that follows. The motivation for this approximation is that financial asset prices appear to be random, completely unpredictable, even on the shortest trading time scale on the order of a second: given the price of the last trade, one doesn’t know if the next trade will be up or down, or by how much. In contrast, deterministic chaotic systems (4.16) are pseudo-random at long times but cannot be distinguished from nonchaotic systems at the shortest times, where the local conservation laws can be used to transform the flow (McCauley, 1997a) to constant speed motion in a special coordinate system (local integrability). Chaotic maps with no underlying flow could in principle be used to describe markets pseudo-randomly, but so far no convincing empirical evidence has been produced for positive Liapunov exponents.10 We therefore stick with the random model (4.17) in this text, as the best tractable approximation to market dynamics. Neo-classical theorists give a different interpretation to (4.17). They assume that it describes a sequence of “temporary price equilibria.” The reason for this is that they insist on picturing “price” in the market as the clearing price, as if the market would be in equilibrium. This is a bad picture: limit book orders prevent the market from approaching any equilibrium. Black actually adopted the neo-classical interpretation of his theory although this is both wrong and unnecessary. The only dynamically correct definition of equilibrium is that, in (4.16), d p/dt = 0, which is to say that the total excess demand for an asset vanishes, ε( p) = 0. In any market, so long as limit orders remain unfilled, this requirement is not satisfied and the market is not in equilibrium. With this in mind we next survey the various wrong ideas of equilibrium propagated in the economics and finance literature. 10
Unfortunately, no one has looked for Liapunov exponents at relatively short times, which is the only limit where they would make sense (McCauley, 1993).
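To make the drift-plus-noise reading of (4.17) concrete, here is a minimal Euler–Maruyama sketch in Python. It takes the special case r(p, t) = rp with noise amplitude σp (the lognormal model discussed below); the parameter values are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Euler-Maruyama discretization of dp = r(p,t) dt + (noise amplitude) dB(t),
# here with r(p,t) = r*p and noise amplitude sigma*p (illustrative assumption).
r, sigma = 0.05, 0.2          # assumed drift and volatility, annualized
dt = 1.0 / 252.0              # one trading day
n_steps = 252

p = np.empty(n_steps + 1)
p[0] = 100.0                  # assumed initial price
for i in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt))              # Wiener increment
    p[i + 1] = p[i] + r * p[i] * dt + sigma * p[i] * dB

# Given the last price, the sign of the next move is close to a coin toss:
up_fraction = np.mean(np.diff(p) > 0)
print("final price %.2f, fraction of up-moves %.2f" % (p[-1], up_fraction))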
4.8 "Temporary price equilibria" and other wrong ideas of "equilibrium" in economics and finance

There are at least six wrong definitions of equilibrium in the economics and finance literature. There is (1) the idea of equilibrium fluctuations about a drift in price. Then (2) the related notion that market averages describe equilibrium quantities (Fama, 1970). Then there is the assumption (3), widespread in the literature, that the capital asset pricing model (CAPM) describes equilibrium prices (Sharpe, 1964). Again, this definition fails because (as we see in the next chapter) the parameters in the CAPM vary with time. (4) Black (1989) claimed that "equilibrium dynamics" are described by the Black–Scholes equation. This is equivalent to assuming that the market is in "equilibrium" when prices fluctuate according to the sde defining the lognormal distribution, a nonequilibrium distribution. (5) Absence of arbitrage opportunities is thought to define an "equilibrium" (Bodie and Merton, 1998). Finally, there is the idea (6) that the market and stochastic models of the market define sequences of "temporary price equilibria" (Föllmer, 1995). We can dispense rapidly with definition (1): it would require a constant variance, but the variance is approximately linear in the time for financial data. Another way to say it is that definitions (1) and (2) would require a stationary process, but financial data are not stationary. Definitions (3), (4) and (5) are discussed in Chapters 5, 6, and 7. Definition (2) will be analyzed in Chapter 7. We now proceed to deconstruct definition (6).

The clearest discussion of "temporary price equilibria" is provided by Hans Föllmer (1995). In this picture excess demand can vanish but prices are still fluctuating. Föllmer expresses the notion by trying to define an "equilibrium" price for a sequence of time intervals (very short investment/speculation periods ∆t), but the price so-defined is not constant in time and is therefore not an equilibrium price. He begins by stating that an equilibrium price would be defined by vanishing total excess demand, ε(p) = 0. He then claims that the condition defines a sequence of "temporary price equilibria," even though the time scale for a "shock" from one "equilibrium" to another would be on the order of a second: the "shock" is nothing but the change in price due to the execution of a new buy or sell order. Föllmer's choice of language sets the stage for encouraging the reader to believe that market prices are, by definition, "equilibrium" prices. In line with this expectation, he next invents a hypothetical excess demand for agent i over the time interval [t, t + ∆t] that is logarithmic in the price,

εi(p) = αi ln(pi(t)/p(t)) + ∆xi(t, ∆t),    (4.18)
where pi(t) is the price that agent i would be willing to pay for the asset during speculation period ∆t. The factor ∆xi(t, ∆t) is a "liquidity demand": agent i will not buy the stock unless he already sees a certain amount of demand for the stock in
the market. This is a nice idea: the agent looks at the number of limit orders that are the same as his and requires that there should be a certain minimum number before he also places a limit order. By setting the so-defined total excess demand ε(p) (obtained by summing (4.18) over all agents) equal to zero, one obtains the corresponding equilibrium price of the asset

ln p(t) = Σi (αi ln pi(t) + ∆xi(t, ∆t)) / Σi αi    (4.19)
In the model pi is chosen as follows: the traders have no sense where the market is going so that they simply take as their "reference price" pi(t) the last price demanded in (4.18) at time t − ∆t,

pi(t) = p(t − ∆t)    (4.20)
This yields

ln p(t) = Σi (αi ln p(t − ∆t) + ∆xi(t, ∆t)) / Σi αi = ln p(t − ∆t) + ∆x(t, ∆t)    (4.21)
If we assume next that the liquidity demand ∆x(t, ∆t), which equals the log of the "equilibrium" price increments, executes Brownian motion then we obtain a contradiction: the excess demand (4.18), which is logarithmic in the price p and was assumed to vanish, does not agree with the total excess demand defined by the right-hand side of (4.17), which does not vanish, because with ∆x = (R − σ²/2)∆t + σ∆B we have dp/dt = rp + σp dB/dt = ε(p) ≠ 0. The price p(t) so-defined is not an equilibrium price because the resulting lognormal price distribution depends on the time.
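The contradiction can also be seen numerically. The following Python sketch iterates (4.21) with a Brownian liquidity demand (drift and volatility are assumed values) and then evaluates dp/dt: it is just noise, and it does not vanish, so the "temporary equilibria" are never equilibria.

import numpy as np

rng = np.random.default_rng(2)

R, sigma = 0.0, 0.2           # assumed drift and volatility of the liquidity demand
dt = 1.0 / 252.0
n = 252

lnp = np.empty(n + 1)
lnp[0] = np.log(100.0)        # assumed starting price
for i in range(n):
    dx = (R - 0.5 * sigma**2) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    lnp[i + 1] = lnp[i] + dx                  # equation (4.21): a random walk in ln p

p = np.exp(lnp)
excess_demand = np.diff(p) / dt               # dp/dt, the total excess demand of (4.17)

print("mean |dp/dt|: %.2f (noise, not zero)" % np.mean(np.abs(excess_demand)))
print("steps with dp/dt exactly zero: %d of %d" % (np.sum(excess_demand == 0.0), n))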
4.9 Searching for Adam Smith's Invisible Hand

The idea of Adam Smith's Invisible Hand is to assume that markets are described by stable equilibria. In this discussion, as we pointed out in Section 4.7 above, by equilibrium we will always require that the total excess demand for an asset vanishes on the average. Correspondingly, the average asset price is constant. The latter will be seen below to be a necessary but not sufficient condition for equilibrium. We will define dynamic equilibrium and also statistical equilibrium, and then ask if the stochastic models that reproduce empirical market statistics yield either equilibrium, or any stability property. In this context we will also define two different ideas of stability.
Adam Smith lived in the heyday of the success of simple Newtonian mechanical models, well before statistical physics was developed. He had the dynamic idea of the approach to equilibrium as an example for his theorizing. As an illustration we can consider a block sliding freely on the floor that eventually comes to rest due to friction. The idea of statistical equilibrium was not introduced into physics until the time of Maxwell, Kelvin, and Boltzmann in the latter half of the nineteenth century. We need now to generalize this standard dynamic notion of equilibrium to include stochastic differential equations. Concerning the conditions for reaching equilibrium, L. Arnold (1992) shows how to develop some fine-grained ideas of stability, in analogy with those from dynamical systems theory for deterministic differential equations. Given an sde

dx = R(x, t)dt + D(x, t)dB(t)    (4.22)

dynamic equilibria x = X, where dx = 0 for all t > t0, can be found only for nonconstant drift and volatility satisfying both R(X, t) = 0, D(X, t) = 0 for all forward times t. Given an equilibrium point X, one can then investigate local stability: does the noisy dynamical system leave the motion near equilibrium, or drive it far away? One sees from this standpoint that it would be impossible to give a precise definition of the neo-classical economists' vague notion of "sequences of temporary price equilibria." The notion is impossible, because, for example, the sde that the neo-classicals typically assume

dz = √D dB(t)    (4.23)

with z = x − Rt, and with R and D constants, has no equilibria at all. What they want to imagine instead is that were dB = 0 then we would have ∆z = 0, describing their so-called "temporary price equilibria" p(t + ∆t) = p(t). The noise dB instead interrupts and completely prevents this "temporary equilibrium" and yields a new point p(t + ∆t) ≠ p(t) in the path of the Wiener process. The economists' description amounts to trying to imagine a Wiener process (ordinary Brownian motion) as a sequence of equilibrium points, which is completely misleading. Such nonsense evolved out of the refusal, in the face of far-from-equilibrium market data, to give up the postulated, nonempirical notions of equilibria and stability of markets. We can compare this state of denial with the position taken by Aristotelians in the face of Galileo's mathematical description of empirical observations of how the simplest mechanical systems behave (Galilei, 2001). The stochastic dynamical systems required to model financial markets generally do not have stable equilibria of the dynamical sort discussed above. We therefore turn to statistical physics for a more widely applicable idea of equilibrium, the idea of statistical equilibrium. In this case we will see that the vanishing of excess demand on the average is a necessary but not sufficient condition for equilibrium.
As Boltzmann and Gibbs have taught us, entropy measures disorder. Lower entropy means more order, higher entropy means less order. The idea is that disorder is more probable than order, so low entropy corresponds to less probable states. Statistical equilibrium is the notion of maximum disorder under a given set of constraints. Given any probability distribution we can write down the formula for the Gibbs entropy of the distribution. Therefore, a very general coarse-grained approach to the idea of stability in the theory of stochastic processes would be to study the entropy

S(t) = −∫_(−∞)^(+∞) f(x, t) ln f(x, t) dx    (4.24)
of the returns distribution P(x, t) with density f (x, t) = dP/dx. If the entropy increases toward a constant limit, independent of time t, and remains there then the system will have reached statistical equilibrium, a state of maximum disorder. The idea is qualitatively quite simple: if you toss n coins onto the floor then it’s more likely that they’ll land with a distribution of heads and tails about half and half (maximum disorder) rather than all heads (or tails) up (maximum order). Let W denote the number of ways to get m heads and n − m tails with n coins. The former state is much more probable because there are many different ways to achieve it, W = n!/(n/2)!(n/2)! where n! = n(n − 1)(n − 2) . . . (2)(1). In the latter case there is only one way to get all heads showing, W = 1. Using Boltzmann’s formula for entropy S = ln W , then the disordered state has entropy S on the order of n ln 2 while the ordered state has S = ln 1 = 0. One can say the same about children and their clothing: in the absence of effective rules of order the clothing will be scattered all over the floor (higher entropy). But then mother arrives and arranges everything neatly in the shelves, attaining lower entropy. “Mama” is analogous to a macroscopic version of Maxwell’s famous Demon. That entropy approaches a maximum, the condition for statistical equilibrium, requires that f approaches a limiting distribution f 0 (x) that is time independent as t increases. Such a density is called an equilibrium density. If, on the other hand, the entropy increases without bound, as in diffusion with no bounds on returns as in the sde (4.23), then the stochastic process is unstable in the sense that there is no statistical equilibrium at long but finite times. The approach to a finite maximum entropy defines statistical equilibrium. Instead of using the entropy directly, we could as well discuss our coarse-grained idea of equilibrium and stability in terms of the probability distribution, which determines the entropy. The stability condition is that the moments of the distribution are bounded, and become time independent at large times. This is usually the same
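The coin-tossing arithmetic is easy to verify numerically; here is a minimal Python sketch (the number of coins is an arbitrary even choice):

from math import comb, log

n = 100                          # number of coins (an arbitrary even choice)
W_half = comb(n, n // 2)         # ways to land half heads, half tails
W_all_heads = 1                  # only one way to land all heads

print("S(half and half) = ln W = %.1f   (compare n ln 2 = %.1f)" % (log(W_half), n * log(2)))
print("S(all heads)     = ln W = %.1f" % log(W_all_heads))
# The disordered half-and-half state has entropy of order n ln 2; the ordered
# all-heads state has entropy zero.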
as requiring that f approaches a t-independent limit f0. Next, we look at two very simple but enlightening examples. The pair correlation function

R(t) = σ²e^(−2βt)    (4.25)

arises from the Smoluchowski–Uhlenbeck–Ornstein (S–U–O) process (Wax, 1954; Kubo et al., 1978)

dv = −βvdt + d(v, t)dB(t)    (4.26)

with the diffusion coefficient given by d = β⟨v²⟩ = constant. In statistical physics, v is the velocity of a Brownian particle and the Fokker–Planck equation for this model describes the approach of an initially nonequilibrium velocity distribution to the Maxwellian one as time increases. The relaxation time for establishing equilibrium τ = 1/2β is the time required for correlations (4.25) to decay significantly, or for the entropy to reach a constant value. If we could model market data so simply, with v representing the price p, then the restoring force −βp with β > 0 would provide us with a simple model of Adam Smith's stabilizing Invisible Hand. Summarizing, the probability distribution defined by the sde (4.26) satisfies the condition for statistical equilibrium by approaching a time-independent Gaussian distribution at large times (see Uhlenbeck and Ornstein, in Wax (1954) for details). When v is the velocity of a Brownian particle, then the limiting Gaussian is the Maxwell distribution, and so statistical equilibrium corresponds to thermodynamic equilibrium in that case. In the sde (4.26), v is unbounded but there is a restoring force (friction, with β > 0) acting on the velocity. But the reader should not assume that the presence of a restoring force alone in (4.22) guarantees a stabilizing Invisible Hand, as the following example shows. That stability is not guaranteed by a restoring force alone is shown by the example of the lognormal price model, where

dp = rpdt + σpdB    (4.27)
If we restrict to the case where r < 0 then we have exactly the same restoring force (linear friction) as in the S–U–O sde (4.26), but the p-dependent diffusion coefficient d(p) = (σp)² destabilizes the motion! We can see this as follows. The sde (4.27) describes the lognormal model of prices (Gaussian returns), with Fokker–Planck equation

∂g/∂t = −r ∂(pg)/∂p + (σ²/2) ∂²(p²g)/∂p²    (4.28)
We can easily calculate the moments of g to obtain

⟨p^n⟩ = Ce^(n(r + σ²(n−1)/2)∆t)    (4.29)
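A quick Monte Carlo check of (4.29) for the second moment is easy to sketch in Python, using the exact solution of (4.27) and assumed parameter values (note that the drift is taken negative):

import numpy as np

rng = np.random.default_rng(3)

r, sigma = -0.05, 0.4        # assumed: negative drift (a "restoring force") and volatility
p0 = 1.0
n_paths = 200_000

# Exact solution of (4.27): p(t) = p0 * exp((r - sigma**2/2) * t + sigma * B(t))
for t in (1.0, 4.0, 8.0):
    B = rng.normal(0.0, np.sqrt(t), size=n_paths)
    p = p0 * np.exp((r - 0.5 * sigma**2) * t + sigma * B)
    m2_formula = p0**2 * np.exp(2.0 * (r + sigma**2 / 2.0) * t)   # n = 2 in (4.29), C = p0**2
    print("t = %4.1f   <p^2> Monte Carlo %.3f   formula %.3f" % (t, np.mean(p**2), m2_formula))
# Even with r < 0 the second moment keeps growing with t: no approach to statistical
# equilibrium, in contrast with the S-U-O process (4.26).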
We see that even if r < 0 the moments do not approach constants. There is no approach to statistical equilibrium in this model (a necessary condition for statistical equilibrium is that there is no time dependence of the moments). Another way to say it is that g(p, t) does not approach a finite time-independent limit g(p) as t goes to infinity, but vanishes instead because prices p are unbounded: information about the "particle's" position simply diffuses away because the density g spreads without limit as t increases. The equilibrium solution of the "lognormal" Fokker–Planck equation (4.28) expressed in returns x = ln p/p0 is given by

f(x) = Ce^(−2rx/σ²)    (4.30)
The time-dependent lognormal distribution, the Green function of the Fokker–Planck equation (4.28), does not approach the limit (4.30) as t goes to infinity. Negative returns r = −k < 0 in (4.27) and (4.28) are equivalent to a Brownian particle in a quadratic potential U(p) = kp²/2, but the p-dependent diffusion coefficient delocalizes the particle. This is nonintuitive: if quarks had such a diffusion coefficient due to zero point fluctuations, then they could effectively unbind. The only way to get statistical equilibrium from (4.24) would be by imposing price controls p1 ≤ p ≤ p2. Mathematically, this is represented by reflecting walls at the two end points. In that case, the most general solution of the Fokker–Planck equation is given by the equilibrium solution (4.30) plus terms that die exponentially as t goes to infinity (Stratonovich, 1963). The Fokker–Planck operator that generates the eigenfunctions has a discrete spectrum for a particle in a box, and the lowest eigenvalue vanishes. It is the vanishing of the lowest eigenvalue that yields equilibrium asymptotically. When the prices are unbounded, then the lowest eigenvalue still vanishes but the spectrum is continuous, and equilibrium does not follow. The main point is that the mere mathematical existence of a statistical equilibrium solution of the Fokker–Planck equation (4.28) does not guarantee that time-dependent solutions of that equation will converge to that statistical equilibrium as time goes to infinity. We emphasize: it is not the restoring force alone in (4.26) that yields statistical equilibrium, a constant diffusion coefficient d = σ² in the random force σdB/dt is also simultaneously required. It is precisely the lack of the latter condition, that d(p) = (σp)² is nonconstant, that leads to instability (delocalization) in (4.27). Note also that the equilibrium solution (4.30) has "fat tails" in p,

g(p) = f(x)dx/dp = C/p^(1+2R/σ²)    (4.31)
whereas the lognormal distribution has no fat tails in any limit. This fat-tailed equilibrium density has nothing whatsoever to do with the fat tails observed in empirical data, however, because the empirical density is not stationary. We can advance the main point another way. The S–U–O sde (4.26) has a variance that goes as t 1/2 at short times, but approaches a constant at large times and defines a stationary process in that limit (Maxwellian equilibrium). The Osborne sde (4.27), in contrast, does not define a stationary process at any time, large or small, as is shown by the moments (4.29) above. The dynamical model (4.27) is the basis for the Black–Scholes model of option pricing. Note that the S–U–O sde (4.26) has no equilibria in the fine-grained sense, but nevertheless the density f (x, t) approaches statistical equilibrium. The idea of dynamic stability is of interest in stochastic optimization and control, which has been applied in theoretical economics and finance and yields stochastic generalizations of Hamilton’s equations. Agents who want to make money do not want stability, they want big returns. Big returns occur when agents collectively bid up the price of assets (positive excess demand) as in the US stock bubble of the 1990s. In this case agents contribute to market instability via positive feedback effects. But big returns cannot go on forever without meeting limits that are not accounted for in equations (4.22). There is no complexity in (4.22), no “surprises” fall out of this equation as time goes on because the complexity is hidden in part in R, which may change discontinuously reflecting big changes in agents’ collective sentiment. Typical estimates of future returns R based on past history oversimplify the problem to the point of ignoring all complexity (see Arthur, 1995). It is possible to construct simple agent-based models of buy–sell decision making that are complex in the sense that the only way to know the future is to compute the model and see how the trading develops. The future can not be known in advance because we do not know whether an agent will use his or her particular market strategy to buy or sell at a given point in time. One can use history, the statistics of the market up to the present to say what the average returns were, but there is no reliable equation that tells us what R will be in the future. This is a way of admitting that the market is complex, an aspect that is not built into any of our stochastic models. We also do not take feedback, meaning how agents influence each other in a bubble or crash, into account in this text. It is extremely difficult to estimate returns R accurately using the empirical distribution of returns unless one simply assumes R to be constant and then restricts oneself to analyzing interday trading. We end this section with a challenge to economists and econophysicists (see also Section 7.4): find a market whose statistics are good enough to study the time evolution of the price distribution and produce convincing evidence for stationarity. Let us recall: approximate dynamic equilibria with supply nearly balancing demand do not occur in real markets due to outstanding limit orders, represented
mathematically in stochastic differential equations as noise. By the process of elimination, the only possible effect of the Invisible Hand, if it exists, would then be to produce statistical equilibrium in markets. Given any market, statistical equilibrium requires that the asset price distribution is stationary. We show in Chapters 6 and 7 that financial markets are not stationary: financial markets are described by an eternally diffusing returns distribution with no equilibrium limit. The author's expectation is therefore that no real empirical market distribution is stationary. If the search for the time evolution of the price distribution cannot provide evidence for stationarity, then the Invisible Hand will have been falsified and all standard economics texts will have to be rewritten. The author expects that this will eventually be the case.

4.10 Black's "equilibrium": dreams of "springs" in the market

In the short paper "Noise," Fischer Black (1986) discusses three topics: price, value, and noise.11 He states that price is random and observable whereas value is random and unobservable. He asserts boldly that because of noise price deviates from value but always returns to value (he introduced the phrase "noise traders" in this paper). He regards price and value as roughly the same if price is within twice value. There is only one problem: he never defines what he means by "value." We can reconstruct what Black may have had in mind. He apparently believed the neo-classical economists' ideas of "equilibrium," which he called "beautiful." We can only guess what he thought, but the following argument would explain his claims about price and value. The market, as Osborne taught us, consists of unfilled limit book orders that are step functions. One can see these step functions evolving in time on the website 3DCharts.com, and one can consult Nasdaq level 2 for detailed numerical information. If we would assume that market equilibria exist and are stable, as neo-classical economics teaches, then every limit book would have a daily clearing price, namely, the equilibrium price, where total supply exactly matches total demand. Were the clearing price to exist, then it could be taken to define "value." This is our guess as to what Black must have meant, and if he didn't mean it then it will do anyway! Were the equilibrium stable in the sense of stochastic differential equations as we discussed above, then price would always tend to return to value no matter how far price would deviate from value, but value would be empirically unobservable because it would be the solution of many simultaneous equations of the form ε(p) = 0. One could know value in that case only if one could solve the equations in a reasonable amount of time on a
We recommend the short paper “Noise” by Fischer Black, who wrote and thought very clearly. He died too early to receive the Nobel Prize along with Myron Scholes and Robert Merton. See especially the entertaining NOVA video The Trillion Dollar Bet, http://www.pbs.org/wgbh/nova/stockmarket/.
computer, or on many PCs linked together in parallel. There is only one problem with this pretty picture, namely, that systems of stochastic and ordinary differential equations d p/dt = ε( p) may not have equilibria (ε( p) may vanish nowhere, as in the empirically based market model of Chapter 6), and even if equilibria would exist they would typically be unstable. Black’s error was in believing neo-classical economic theory, which is very misleading when compared with reality. A theme of this book is that there are no “springs” in the market, nothing to cause a market to tend toward an equilibrium state. Another way to say it is that there is no statistical evidence that Adam Smith’s Invisible Hand works at all. The dramatically failed hedge fund Long Term Capital Management (LTCM) assumed that deviations from Black–Scholes option pricing would always return to historic market averages (Dunbar, 2000). Initially, they made a lot of money for several years during the mid 1990s by betting on small-fluctuation “mispricing.” LTCM had two Nobel Prize winning neo-classical economists on its staff, Merton and Scholes. They assumed implicitly that equilibrium and stability exist in the market. And that in spite of the fact that the sde used by them to price options (lognormal model of asset prices) has only an unstable equilibrium point at p = 0 (see Chapter 6) and does not even lead to statistical equilibrium at long times. Finally, LTCM suffered the Gambler’s Ruin during a long time-interval large deviation. For a very interesting story of how, in contrast, a group of physicists who do not believe in equilibrium and stability placed bets in the market during the 1990s and are still in business, see The Predictors (Bass, 1991). In order to make his idea of value precise, Black would have needed to deduce from financial market data a model where there is a special stochastic orbit that attracts other nearby orbits (an orbit with a negative Liapunov exponent). The special stochastic orbit could then have been identified as randomly fluctuating “value.” Such an orbit would by necessity be a noisy attracting limit cycle and would represent the action of the Invisible Hand. Value defined in this way has nothing to do with equilibrium, and were fluctuating value so-defined to exist, it would be observable. We return briefly to the idea of fair price mentioned in Section 4.4 above. Black and Scholes (B–S) produced a falsifiable model that predicts a fair option price (the price of a put or call) at time t based on the observed stock price p at time t. The model is falsifiable because it depends only on a few observable parameters. The model therefore provides a basis for arbitrage: if one finds “mispricing” in the form of option prices that violate B–S, then a bet can be placed that the deviation from the B–S prediction will disappear, that the market will eliminate these “inefficiencies” via arbitrage. That is, B–S assumes that the market is efficient in the sense of the EMH in the long run but not in the short run. They were in part right: LTCM placed bets on deviations from historic behavior that grew in magnitude instead of
decaying over a relatively long time interval. As the spread widened they continued to place more bets, assuming that returns would spring back to historic values on a relatively short time scale. That is how they suffered the Gamblers' Ruin. According to traders around 1990, the B–S model worked well for option pricing before the mid 1980s. In our era it can only be applied by introducing a financial engineering fudge called implied volatility, which we discuss in Chapter 5.

4.11 Macroeconomics: lawless phenomena?

Samuelson has written that the laws of economics are probabilistic in nature, meaning that we can at best predict probabilities for future events and not the events themselves. There would be nothing wrong with this claim, indeed it would be of interest, were there any known statistical laws of economics in the first place. So far, there are not even any empirically or even qualitatively correct models of economic behavior beyond the stochastic dynamical models of financial markets. Many economists believed that neo-classical microeconomic theory would provide the basis for macroeconomic theory. Unfortunately, some physicists write as if macroscopic law could arise from total microscopic lawlessness. Here, a misapplication of the law of large numbers, usually in the form of the central limit theorem, lies beneath the misconception. By randomness we mean dynamically that no algorithm exists to tell us the next state of a system, given the previous state or states. Randomness, as we use the idea in physics, is described by underlying local lawfulness, as in a stochastic process where the time evolution of the governing probability density is deterministic. It is possible to imagine total lawlessness, but we cannot derive any useful information about such a system. In particular, even the central limit theorem cannot be used to derive a Gaussian without the assumption of a microscopic invariance in the form of step sizes and probabilities for the underlying discrete random walk. If one makes other microscopic assumptions about step sizes and corresponding probabilities, then one gets an exponential distribution as in Figure 4.2 (Gunaratne and McCauley, 2003), a Levy distribution or some other distribution (we will discuss Levy and other distributions in Chapter 8). There is no universality independent of the microscopic assumptions in these cases: different local laws of time-evolution of probability lead to entirely different probability distributions. This is in contrast with the emphasis in the nice paper by Hughes et al. (1981) where walks are classified as either Gaussian or Levy with infinite variance. In this characterization large deviations from the mean are ignored, as we pointed out in our discussion of the central limit theorem in Chapter 3. The assumption of Hughes et al. is that the central limit theorem is the primary factor determining the probability distribution after infinitely many steps in a random walk, but we know that this is not true for finite
Figure 4.2. Exponential distribution generated via computer for displacement-dependent step probabilities, corresponding quantitatively in the continuum limit to a position- and time-dependent diffusion coefficient D(x, t) = b²(1 + u) for u > 0 and D(x, t) = b²(1 − u) for u < 0, where u = x/(bt^(1/2)). In this simulation there is no drift, R = 0.
walks: the exponential distribution, for example, is never approximately Gaussian excepting for very small fluctuations near the mean, as we explained in Chapter 3. Several examples from physics illustrate the point. The ideal gas law provides one example. One obtains the ideal gas law from Newton’s first two laws via averaging. Without Newton’s local laws there is no “average” behavior. Another example is the Gibbs distribution. Without Hamiltonian dynamics (and model Hamiltonians) Gibbs distributions do not occur. Statistical physics is not, as Norbert Wiener wrote, more general than microscopic physics. There are sometimes thermodynamic analogies in other fields, but it was a misconception of the World War II era to imagine that there could be a thermodynamics of information applicable to economics (see Mirowski, 2002). For example, there is a thermodynamic formalism of one-dimensional chaotic maps precisely because there is a well-defined Boltzmann entropy of symbol sequences. The symbol sequences are defined by a tree, which itself is obtained from the backward iteration of the underlying map. Again, there is only local lawfulness (the iterated map) underlying the statistical mechanical analogy (McCauley, 1993).
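Referring back to Figure 4.2: a walk of this kind is easy to sketch numerically. The following Python fragment is not the Gunaratne–McCauley simulation itself, only an illustrative Euler–Maruyama run with assumed parameters, using the caption's diffusion coefficient D(x, t) = b²(1 ± u) with u = x/(bt^(1/2)) and zero drift.

import numpy as np

rng = np.random.default_rng(4)

b = 1.0
n_paths, n_steps, dt = 50_000, 256, 1.0
x = np.zeros(n_paths)
t = 0.0

for _ in range(n_steps):
    t += dt
    u = x / (b * np.sqrt(t))
    D = b**2 * (1.0 + np.abs(u))      # D = b^2(1+u) for u > 0, b^2(1-u) for u < 0
    x = x + np.sqrt(D * dt) * rng.normal(size=n_paths)

# A Gaussian would have excess kurtosis 0; this variable-diffusion walk typically comes
# out leptokurtic, consistent with the exponential-looking histogram of Figure 4.2.
z = (x - x.mean()) / x.std()
print("excess kurtosis of x after %d steps: %.2f" % (n_steps, np.mean(z**4) - 3.0))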
4.12 No universal scaling exponents either!

We do not expect that the scaling exponents that occur in economics are universal. This expectation goes against the grain of much of the rest of the econophysics
movement. We expect diversity rather than universality because there is no evidence that macroscopic behavior is governed approximately by mathematical equations that are at a critical point. In critical phenomena in equilibrium and near-equilibrium statistical physics, there is the idea of universality represented by scaling laws with critical exponents that are the same for all systems in the same universality class. Universality classes are defined by systems with the same symmetry and dimension. A similar universality appears at bifurcations describing the transition from regular to chaotic motion in driven–dissipative deterministic dynamical systems far from equilibrium. In the chaotic regime there is at least one positive Liapunov exponent, and no scaling exponent-universality for the fractals that occur there, only a weaker topological universality defined by symbolic dynamics (Gunaratne, 1990b). Selforganized criticality represents an attempt to extend the universality and scaling of the critical point to many-body systems far from equilibrium, but so far there is no precise definition of universality classes for that idea, nor for “complex adaptable systems.” A multiaffine or multifractal spectrum of scaling exponents is inadequate to pin down a universality class for a far-from-equilibrium system, much less for one or two exponents (McCauley, 1997b, c). It is an empirically unwarranted extrapolation to believe that financial time series are in any sense “critical,” are at the borderline of chaos. This has never been demonstrated and we will not assume that exponents of the distributions that we study are universal. That is, we extrapolate the reality of nonequilibrium nonlinear dynamics to expectations for the stochastic regime, so that exponents of statistical distributions for stochastic processes are expected to be local and nonuniversal, characteristic of a particular market under observation. We also do not assume that complexity (an ill-defined idea, except in computer science) can arise from mere randomness. Instead, we will try to take the data as they are, without any data massage,12 and ask whether they can teach us anything. Finance is sometimes compared with (soft) fluid turbulence, and certain formal analogies do exist. Both lognormal and exponential distributions appear in turbulence and finance, albeit for different reasons. An argument was also made in the physics literature in favor of self-organized criticality (SOC) as the explanation for fluid turbulence. One researcher’s unfulfilled expectation of universal scaling exponents describing the inertial range of turbulent flows with many different boundary and initial conditions was stated as follows: “A system driven by some conserved or quasi-conserved quantity uniformly at a large scale, but able to dissipate it only to microscopic fluctuations, may have fluctuations at all intermediate length scales . . . The canonical case of SOC is turbulence. . . .” This would be an attempt 12
In Chapter 6, for example, we analyze raw financial data and reject any and all of the statisticians’ tricks of filtering or truncating the data. We want to know what the market says, not what a statistician imagines it should say.
to describe turbulence in open flows could we replace the word "fluctuations" with the phrase "a hierarchy of eddies where the eddy cascade is generated by successive dynamical instabilities." In SOC (as in any critical system) all Liapunov exponents should vanish, whereas the rapid mixing characteristic of turbulence requires at least one positive Liapunov exponent (mixing is relatively rapid even in low Reynolds number vortex cascades, where R = 15–20). The dissipation range of fluid turbulence in open flows suggests a Liapunov exponent of order ln 2. In the case of turbulence, a spectrum of multiaffine scaling exponents is provided by the velocity structure functions (see Chapter 8). Only a few of these exponents can be measured experimentally, and one does not yet have log–log plots of at least three decades for that case. If at least one positive Liapunov exponent is required, for mixing, then the multiaffine scaling exponents cannot represent criticality and cannot be universal. There is no reason to expect universal scaling exponents in turbulence (McCauley, 1997b, c), and even less reason to expect them in finance.

4.13 Fluctuations, fat tails, and diversification

Assets are risky because they fluctuate in price. Even if the market remains liquid enough that bid/ask spreads are small, there is no guarantee that tomorrow you can sell shares bought today without taking a loss. The believers in the efficient market hypothesis (EMH) cannot argue that stocks may be over- or under-valued because, according to their picture, the market is always right, the market price is the fair price (but according to the successful trader Soros, the market is always wrong). Fat tails mean that big price swings occur with appreciable probability. Big price swings mean that an appreciable fraction of agents in the market are trading at extreme prices. If you could buy at the low end and sell at the high end then you would make money, but this would amount to outguessing the market, a task that the EMH believers declare to be systematically impossible. The most current statement of the EMH is that there are no patterns/correlations in the market that can be exploited for profit. Traders like Soros and Buffet who make big gains or big losses usually do not diversify. They tend to put all their eggs in one basket, taking on extreme risk. An example is provided by Soros' enormous winning bet against the Bank of England by shorting the Pound. Those who diversify spread the risk or transfer it, but the cost is a smaller expected return. In the next chapter we cover the standard theory of the relation of risk to expected return.

A privately held company stands to win all the rewards from growth, if there is growth, but holds all the risk as well. Going public and selling shares of stock reduces risk. The potential rewards are transferred to the stockholders, who take on the risk as well. If there are no bonds or bank loans outstanding then the stockholders
have all of the risk. They take on this risk because they believe that a company will grow, or because there is a stock bubble and they are simply part of the herd. Again, in the EMH picture, bubbles do not occur, every price is a “fair price.” And if you believe that, then I have a car that I’m willing to sell to you. The EMH leads to the conclusion that throwing darts at the stock listings in The Wall Street Journal (Malkiel, 1996) is as effective a way of picking stocks as any other. A monkey could as well throw the darts and pick a winning portfolio, in this picture. The basis in the EMH for the analogy with darts is that if you know only the present price or price history of a collection of stocks, then this is equivalent to maximum ignorance, or no useful information about future prices. Therefore, you may as well throw darts (or make any other arbitrary choice) to choose your portfolio because no systematic choice based on prices alone can be successful. Several years ago The Wall Street Journal had a contest that pitted dart throwers against amateurs and investment advisors for a period of several weeks. Very often the former two beat the professional investment advisors. Buffet, a very successful stock-picker, challenges the EMH conclusion. He asserts that the EMH is equivalent to assuming that all players on a hockey team have the same talent, the same chance to shoot a goal. From his perspective as one who beats the market consistently, he regards the believers in the EMH as orangutans. The difficulty in trying to beat the market is that if all you do is to compare stock prices, then you’re primarily looking at the noise. The EMH is approximately correct in this respect. But then Buffet does not look only at prices. The empirical market distribution of returns is observed to peak at the current expected return, calculated from initial investment time to present time t, but the current expected return is hard to extract accurately from empirical data and also presents us with a very lively moving target: it can change from day to day and can also exhibit big swings.
5 Standard betting procedures in portfolio selection theory
5.1 Introduction

Of course, everyone would like to know how to pick winning stocks but there is no such mathematical theory, nor is a guaranteed qualitative method of success available to us.1 Given one risky asset, how much should one then bet on it? According to the Gambler's Ruin we should bet the whole amount if winning is essential for survival. If, however, one has a time horizon beyond the immediate present then maybe the amount gambled should be less than the amount required for survival in the long run. Given two or more risky assets, we can ask Harry Markowitz's question, which is more precise: can we choose the fractions invested in each in such a way as to minimize the risk, which is defined by the standard deviation of the expected return? This is the beginning of the analysis of the question of risk vs reward via diversification. The reader is forewarned that this chapter is written on the assumption that the future will be statistically like the past, that the historic statistical price distributions of financial markets are adequate to predict future expectations like option prices. This assumption will break down during a liquidity crunch, and also after the occurrence of surprises that change market psychology permanently.

5.2 Risk and return

A so-called risk-free asset is one with a fixed interest rate, like a CD, money market account or a treasury bill. Barring financial disaster, you are certain to get your money back, plus interest. A risky asset is one that fluctuates in price, one where retrieving the capital cannot be guaranteed, especially over the long run. In all that follows we work with returns x = ln(p(t)/p(0)) instead of prices p.
According to Warren Buffet, more or less: pick a stock that has good earnings prospects. Don’t be afraid to buy when the market is low. Do be afraid to buy when the market is high. This advice goes against that inferred from the EMH.
Averages

R = ⟨x⟩ = ⟨ln(p(t)/p(0))⟩    (5.1)
are understood always to be taken with respect to the empirical distribution unless we specify that we are calculating for a particular model distribution in order to make a point. The empirical distribution is not an equilibrium one because its moments change with time without approaching any constant limit. Finance texts written from the standpoint of neo-classical economics assume "equilibrium," but statistical equilibrium would require time independence of the empirical distribution, and this is not found in financial markets. In particular, the Gaussian model of returns so beloved of economists is an example of a nonequilibrium distribution.

Consider first a single risky asset with expected return R1 combined with a risk-free asset with known return R0. Let f denote the fraction invested in the risky asset. The fluctuating return of the portfolio is given by x = fx1 + (1 − f)R0 and so the expected return of the portfolio is

R = fR1 + (1 − f)R0 = R0 + f∆R    (5.2)
where ∆R = R1 − R0. The portfolio standard deviation, or root mean square fluctuation, is given as

σ = fσ1    (5.3)

where

σ1 = ⟨(x − R1)²⟩^(1/2)    (5.4)

is the standard deviation of the risky asset. We can therefore write

R = R0 + (∆R/σ1)σ    (5.5)
which we will generalize later to include many uncorrelated and also correlated assets. In this simplest case the relation between return and risk is linear (Figure 5.1): the return is linear in the portfolio standard deviation. The greater the expected return the greater the risk. If there is no chance of return then a trader or investor will not place the bet corresponding to buying the risky asset. Based on the Gambler’s Ruin, we argued in Chapter 2 that “buy and hold” is a better strategy than trading often. However, one can lose all one’s money in a single throw of the dice (for example, had one held only Enron). We now show that the law of large numbers can be used to reduce risk in a portfolio of n risky assets. The Strategy of Bold Play and the Strategy of Diversification provide different answers to different questions.
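As a minimal numerical sketch of (5.2)–(5.5) in Python (all numbers are assumed for illustration), sliding f from 0 to 1 traces out the straight line of Figure 5.1:

import numpy as np

R0 = 0.03                   # assumed risk-free rate
R1, sigma1 = 0.08, 0.20     # assumed expected return and standard deviation of the risky asset

for f in np.linspace(0.0, 1.0, 5):
    R = R0 + f * (R1 - R0)          # equation (5.2)
    sigma = f * sigma1              # equation (5.3)
    # the linear risk-return relation (5.5): R = R0 + (dR/sigma1) * sigma
    assert abs(R - (R0 + (R1 - R0) / sigma1 * sigma)) < 1e-12
    print("f = %.2f   sigma = %.3f   R = %.3f" % (f, sigma, R))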
Figure 5.1. Return R vs “risk” / standard deviation σ for a portfolio made up of one risky asset and one risk-free asset.
5.3 Diversification and correlations

Consider next n uncorrelated assets; the xk are all assumed to be distributed statistically independently. The expected return is given by

R = Σk fk Rk    (5.6)
and the mean square fluctuation by

σ² = ⟨(Σk fk xk − R)²⟩ = Σk fk²σk²    (5.7)
where f k is the fraction of the total budget that is bet on asset k. As a special case consider a portfolio constructed by dart throwing (a favorite theme in Malkiel (1996), mentioned qualitatively in Chapter 4): f k = 1/n
(5.8)
Let σ1 denote the largest of the σk. Then

σ ≤ σ1/√n    (5.9)
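A quick Monte Carlo check of (5.9) in Python, with assumed statistics for the individual assets (Gaussian draws are used only for convenience; statistical independence is what matters):

import numpy as np

rng = np.random.default_rng(5)

n = 25                                  # number of uncorrelated assets, assumed
sigmas = rng.uniform(0.10, 0.30, n)     # assumed individual standard deviations
R_k = rng.uniform(0.02, 0.10, n)        # assumed individual expected returns

n_samples = 200_000
x = R_k + sigmas * rng.normal(size=(n_samples, n))    # independent returns x_k
portfolio = x.mean(axis=1)                            # equal weights f_k = 1/n, equation (5.8)

print("portfolio sigma:       %.4f" % portfolio.std())
print("bound sigma1/sqrt(n):  %.4f" % (sigmas.max() / np.sqrt(n)))
# The equal-weight portfolio's standard deviation falls below the bound (5.9).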
This shows how risk could be reduced by diversification with a statistically independent choice of assets. But statistically independent assets are hard to find. For example, automobile and auto supply stocks are correlated within the sector, computer chips and networking stocks are correlated with each other, and there are also correlations across different sectors due to general business and political conditions. Consider a portfolio of two assets with historically expected return given by R = f R1 + (1 − f )R2 = R2 + f (R1 − R2 )
(5.10)
Figure 5.2. The efficient portfolio, showing the minimum risk portfolio as the left-most point on the curve.
and risk-squared by

σ² = f²σ1² + (1 − f)²σ2² + 2f(1 − f)σ12    (5.11)

where

σ12 = ⟨(x1 − R1)(x2 − R2)⟩    (5.12)

describes the correlation between the two assets. Eliminating f via

f = (R − R2)/(R1 − R2)    (5.13)

and solving

σ² = ((R − R2)/(R1 − R2))²σ1² + (1 − (R − R2)/(R1 − R2))²σ2² + 2((R − R2)/(R1 − R2))(1 − (R − R2)/(R1 − R2))σ12    (5.14)
for reward R as a function of risk σ yields a parabola opening along the σ-axis, which is shown in Figure 5.2. Now, given any choice for f we can combine the risky portfolio (as fraction w) with a risk-free asset to obtain

RT = (1 − w)R0 + wR = R0 + w∆R    (5.15)

With σT = wσ we therefore have

RT = R0 + (σT/σ)∆R    (5.16)
The fraction w = σT/σ describes the level of risk that the agent is willing to tolerate. The choice w = 0 corresponds to no risk at all, RT = R0, and w = 1 corresponds to maximum risk, RT = R1. Next, let us return to equations (5.14)–(5.16). There is a minimum risk portfolio that we can locate by using (5.14) and solving

dσ²/dR = 0    (5.17)

Instead, because R is proportional to f, we can solve

dσ²/df = 0    (5.18)

to obtain

f = (σ2² − σ12)/(σ1² + σ2² − 2σ12)    (5.19)
Here, as a simple example to prepare the reader for the more important case, risk is minimized independently of expected return. Next, we derive the so-called "tangency portfolio," also called the "efficient portfolio" (Bodie and Merton, 1998). We can minimize risk with a given expected return as constraint, which is mathematically the same as maximizing the expected return for a given fixed level σ of risk. This leads to the so-called efficient and tangency portfolios. First, we redefine the reference interest rate to be the risk-free rate. The return relative to R0 is

∆R = R − R0 = f1∆R1 + f2∆R2    (5.20)

where ∆Rk = Rk − R0 and where we have used the constraint f1 + f2 = 1. The mean square fluctuation of the portfolio is

σ² = ⟨x²⟩ = f1²σ1² + f2²σ2² + 2f1f2σ12    (5.21)
Keep in mind that the five quantities ∆Rk, σk² and σ12 are to be calculated from empirical data and are fixed in all that follows. Next, we minimize the mean square fluctuation subject to the constraint that the expected return (5.20) is fixed. In other words we minimize the quantity

H = σ² + λ(∆R − f1∆R1 − f2∆R2)    (5.22)

with respect to the fs, where λ is the Lagrange multiplier. This yields

∂H/∂f1 = 2f1σ1² + 2f2σ12 − λ∆R1 = 0    (5.23)
and likewise for f_2. Using the second equation to eliminate the Lagrange multiplier λ yields

λ = \frac{2 f_2 σ_2^2 + 2 f_1 σ_{12}}{∆R_2}    (5.24)

and so we obtain

2 f_1 σ_1^2 + 2 f_2 σ_{12} − \frac{∆R_1}{∆R_2} \left( 2 f_2 σ_2^2 + 2 f_1 σ_{12} \right) = 0    (5.25)

Combining this with the second corresponding equation (obtained by permuting indices in (5.25)) we can solve for f_1 and f_2. Using the constraint f_2 = 1 − f_1 yields

f_1 = \frac{σ_2^2 ∆R_1 − σ_{12} ∆R_2}{(σ_1^2 − σ_{12}) ∆R_2 + (σ_2^2 − σ_{12}) ∆R_1}    (5.26)
and likewise for f_2. This pair (f_1, f_2), so-calculated, defines the efficient portfolio of two risky assets. In what follows we denote the expected return and mean square fluctuation of this portfolio by R_e and σ_{ee}. If we combine the efficient portfolio as fraction w of a total investment including the risk-free asset, then we obtain the so-called tangent portfolio

R_T = R_0 + w ∆R_e    (5.27)

where ∆R_e = R_e − R_0 and w is the fraction invested in the efficient portfolio, the risky asset. With σ_T = w σ_e we have

R_T = R_0 + \frac{σ_T}{σ_e} ∆R_e    (5.28)
The result is shown as Figure 5.3. Tobin’s separation theorem (Bodie and Merton, 1998), based on the tangency portfolio (another Nobel Prize in economics), corresponds to the trivial fact that nothing determines w other than the agent’s psychological risk tolerance, or the investor’s preference: the value of w is given by free choice. Clearly, a younger person far from retirement may sensibly choose a much larger value for w than an older person who must live off the investment. Unless, of course, the older person is in dire straits and must act boldly or else face the financial music. But it can also go otherwise: in the late 1990s older people with safe retirement finances gambled by following the fad of momentum trading via home computer.
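As a purely illustrative sketch, the two-asset efficient portfolio (5.26) and the tangency line (5.27) can be computed in a few lines; all of the numbers below are hypothetical, not market estimates, and NumPy is assumed:

```python
import numpy as np

# Sketch of eqs. (5.26)-(5.28) with invented inputs.
R0 = 0.03                        # risk-free rate
R1, R2 = 0.08, 0.12              # expected returns of the two risky assets
s1, s2, s12 = 0.15, 0.25, 0.015  # sigma_1, sigma_2 and covariance sigma_12

dR1, dR2 = R1 - R0, R2 - R0
f1 = (s2**2 * dR1 - s12 * dR2) / ((s1**2 - s12) * dR2 + (s2**2 - s12) * dR1)  # eq. (5.26)
f2 = 1.0 - f1
Re = f1 * R1 + f2 * R2                                                 # efficient-portfolio return
sigma_e = np.sqrt(f1**2 * s1**2 + f2**2 * s2**2 + 2 * f1 * f2 * s12)   # eq. (5.21)

w = 0.5                          # risk tolerance, chosen freely (Tobin separation)
RT = R0 + w * (Re - R0)          # eq. (5.27); the corresponding risk is sigma_T = w*sigma_e
print(f1, f2, Re, sigma_e, RT)
```

The free choice of w in the last step is exactly the content of the separation theorem: the data fix (f_1, f_2), but nothing in the calculation fixes w.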
Figure 5.3. The tangency portfolio.
5.4 The CAPM portfolio selection strategy The Capital Asset Pricing Model (CAPM) is very general: it assumes no particular distribution of returns and is consistent with any distribution with finite first and second moments. Therefore, in this section, we generally assume the empirical distribution of returns. The CAPM (Varian, 1992) is not, as is often claimed (Sharpe, 1964), an equilibrium model because the distribution of returns is not an equilibrium distribution. Some economists and finance theorists have mistakenly adopted and propagated the strange notion that random motion of returns defines “equilibrium.” However, this disagrees with the requirement that in equilibrium no averages of any moment of the distribution can change with time. Random motion in the market is due to trading and the excess demand of unfilled limit orders prevents equilibrium at all or almost all times. Apparently, what many economists mean by “equilibrium” is more akin to assuming the EMH (efficient market hypothesis) or absence of arbitrage opportunities, which have nothing to do with vanishing excess demand in the market (see Chapters 4, 7, and 8 for details). The only dynamically consistent definition of equilibrium is vanishing excess demand: if p denotes the price of an asset then excess demand ε( p, t) is defined by d p/dt = ε( p, t) including the case where the right-hand side is drift plus noise, as in stochastic dynamical models of the market. Bodie and Merton (1998) claim that vanishing excess demand is necessary for the CAPM, but we will see below that no such assumption comes into play during the derivation and would even cause all returns to vanish in the model.
The CAPM can be stated in the following way. Let R_0 denote the risk-free interest rate,

x_k = \ln(p_k(t + ∆t)/p_k(t))    (5.29)

is the fluctuating return on asset k where p_k(t) is the price of the kth asset at time t. The total return x on the portfolio of n assets relative to the risk-free rate is given by

x − R_0 = \sum_{i=0}^{n} f_i (x_i − R_0)    (5.30)
where f_k is the fraction of the total budget that is bet on asset k. The CAPM minimizes the mean square fluctuation

σ^2 = \sum_{i,j} f_i f_j \langle (x_i − R_0)(x_j − R_0) \rangle = \sum_{i,j} f_i f_j σ_{ij}    (5.31)

subject to the constraints of fixed expected return R,

R − R_0 = \langle x − R_0 \rangle = \sum_i f_i \langle x_i − R_0 \rangle = \sum_i f_i (R_i − R_0)    (5.32)

and fixed normalization

\sum_{i=0}^{n} f_i = 1    (5.33)
where σ_{ij} is the correlation matrix

σ_{ij} = \langle (x_i − R_0)(x_j − R_0) \rangle    (5.34)

Following Varian, we solve

\sum_i σ_{ki} f_i = σ_{ke} = σ_{ee} (R_k − R_0)/∆R_e    (5.35)

for the f s, where ∆R_e = R_e − R_0 and R_e is the expected return of the "efficient portfolio," the portfolio constructed from f s that satisfy the condition (5.35). The expected return on asset k can be written as

∆R_k = \frac{σ_{ke}}{σ_{ee}} ∆R_e = β_k ∆R_e    (5.36)

where σ_{ee} is the mean square fluctuation of the efficient portfolio, σ_{ke} is the correlation matrix element between the kth asset and the efficient portfolio, and β_k ∆R_e is the "risk premium" for asset k. Beta is interpreted as follows: β = 1 means the portfolio moves with the efficient portfolio, β < 0 indicates anticorrelation, and β > 1 means that the swings in the
portfolio are greater than those of the efficient one. Small β indicates independent portfolios but β = 0 doesn't guarantee full statistical independence. Greater β also implies greater risk; to obtain a higher expected return you have to take on more risk. In the finance literature β = 1 is interpreted as reflecting moves with the market as a whole, but we will analyze and criticize this assumption below (in rating mutual funds, as on morningstar.com, it is usually assumed that β = 1 corresponds to the market, or to a stock index). Contradicting the prediction of CAPM, studies show that portfolios with the highest βs usually yield lower returns historically than those with the lowest βs (Black, Jensen and Scholes, 1972). This indicates that agents do not minimize risk as is assumed by the CAPM.

In formulating and deriving the CAPM above, nothing is assumed either about diversification or how to choose a winning portfolio. CAPM only advises us how to try to minimize the fluctuations in any arbitrarily chosen portfolio of n assets. The a priori chosen portfolio may or may not be well diversified relative to the market as a whole. It is allowed in the theory to consist entirely of a basket of losers. However, the qualitative conclusion that we can draw from the final result is that we should avoid a basket of losers by choosing assets that are anti-correlated with each other. In other words, although diversification is not necessarily or explicitly a sine qua non, we are advised by the outcome of the calculation to diversify in order to reduce risk. On the other hand, we are also taught that in order to expect large gains we should take on more risk. In other words, diversification is only one of two mutually exclusive messages gleaned from CAPM. In the model negative f represents a short position, and positive f represents a long position. Large beta implies both greater risk and larger expected return. Without larger expected return a trader will not likely place a bet to take on more risk. Negative returns R can and do occur systematically in market downturns, and in other bad bets.

In the finance literature the efficient portfolio is identified as the market as a whole. This is an untested assumption: without the required empirical analysis, there is no reason to believe that the entire Nasdaq or NY Exchange reflects the particular asset mix of an efficient portfolio, as if "the market" would behave as a CAPM risk-minimizing computer. Also, we will show in the next chapter that option pricing does not follow the CAPM strategy of risk minimization but instead reflects a different strategy. In general, all that CAPM does is: assume that n assets are chosen by any method or arbitrariness whatsoever. Given those n assets, CAPM shows how to minimize risk with return held fixed. The identification of the efficient portfolio as the market confuses together two separate definitions of efficiency: (1) the CAPM idea of an arbitrarily chosen portfolio with an asset mix that minimizes the risk, and (2) the EMH. The latter has nothing at all to do with portfolio selection.
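In practice β_k of eq. (5.36) is just a ratio of an empirical covariance to an empirical variance. The following sketch uses simulated return series in place of real data, with the second series standing in for the "efficient portfolio" (in applications an index is usually substituted); only NumPy is assumed:

```python
import numpy as np

# Sketch: estimating beta_k = sigma_ke / sigma_ee (eq. (5.36)) from return series.
rng = np.random.default_rng(1)
x_e = rng.normal(0.0005, 0.01, size=2000)             # proxy "efficient portfolio" returns
x_k = 0.8 * x_e + rng.normal(0.0, 0.005, size=2000)   # asset returns, partly correlated

sigma_ke = np.cov(x_k, x_e, ddof=1)[0, 1]
sigma_ee = np.var(x_e, ddof=1)
beta_k = sigma_ke / sigma_ee                          # eq. (5.36)
print(beta_k)                                         # close to 0.8 by construction
```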
Finance theorists distinguish systematic or market risk from diversifiable risk. The latter can be reduced, for example, via CAPM, whereas we have no control over the former. The discussion that follows is an econophysics treatment of that subject. Let us think of a vector f with entries (f_1, . . . , f_n) and a matrix Σ with elements σ_{kl}. The scalar product of f with Σf is the mean square fluctuation

σ^2 = \tilde{f} Σ f    (5.37)

If next we define a transformation U

w = U f,   Λ = U Σ \tilde{U}    (5.38)

that diagonalizes Σ then we obtain

σ^2 = \sum_{k=1}^{n} w_k^2 Λ_k^2    (5.39)
For many assets n in a well-diversified portfolio, studying the largest eigenvalue Λ_1 of the correlation matrix has shown that that eigenvalue represents the market as a whole, and that clusters of eigenvalues represent sectors of the market like transportation, paper, autos, computers, etc. Here, we have ordered eigenvalues so that Λ_1 ≥ Λ_2 ≥ . . . ≥ Λ_n. In equation (5.39)

σ^2 = w_1^2 Λ_1^2 + \sum_{k=2}^{n} w_k^2 Λ_k^2    (5.40)

the first term represents so-called "nondiversifiable risk," risk due to the market as a whole, while the second term (the sum from 2 to n) represents risk that can be reduced by diversification. If we could assume that a vector component has the order of magnitude w_k = O(1/n) then we would arrive at the estimate

σ^2 ≈ w_1^2 Λ_1^2 + \frac{\langle Λ_k^2 \rangle}{n}    (5.41)

which indicates that n must be very large in order effectively to get rid of diversifiable risk. Let us consider a portfolio of two assets, for example a bond (asset #1) and the corresponding European call option (asset #2). For any two assets the solution for the CAPM portfolio can be written in the form

f_1/f_2 = (σ_{12} ∆R_2 − σ_{22} ∆R_1)/(σ_{12} ∆R_1 − σ_{11} ∆R_2)    (5.42)
Actually there are three assets in this model because a fraction f 0 can be invested in a risk-free asset, or may be borrowed in which case f 0 < 0. With only two assets,
data analysis indicates that the largest eigenvalue of Λ apparently still represents the market as a whole, more or less (Laloux et al., 1999; Plerou et al., 1999). This means simply that the market tends to drag the assets up or down with it. 5.5 The efficient market hypothesis The idea of the EMH is based on the fact that it is very difficult in practice to beat the market. Mathematically, this is formulated as a fair-game condition. The idea of a fair game is one where the expected gain/loss is zero, meaning that one expects to lose as much as one gains during many trades. Since x(t) generally does not define a fair game, the drift-free variable z(t) where z(t + t) = x(t + t) − Rt and z = x(t) − Rt can be chosen instead. The fair-game condition is that z = 0, or z(t + t) = z(t). So long as market returns x(t) can be described approximately as a Markov process then there are no systematically repeated patterns in the market that can be exploited to obtain gains much greater than R. This is the original interpretation of the EMH. However, with consideration of the CAPM this idea was later modified: above average expected gains require greater risk, meaning larger β. Earlier empirical studies2 suggest that smaller β values yield smaller returns than do intermediate values, but the same studies show that the CAPM is not quite correct in describing market behavior: larger returns also were awarded historically to assets with intermediate values of β than to the largest values of β. The studies were made for mutual funds from 1970 to 1990 for periods of ten years and for quarterly and monthly returns. Physicists estimate β differently than do economists, so it would be of interest to redo the analyses. In particular, it would be of interest to analyze data from the 1990s, since the collection of high-frequency data began. Finance theorists distinguish three forms of the EMH (Skjeltorp, 1996). Weak form: it’s impossible to develop trading rules to beat market averages based on empirical price statistics. Semi-strong form: it’s impossible to obtain abnormal returns based on the use of any publicly available information. Strong form: it’s impossible to beat the market consistently by using any information, including insider information. Warren Buffet criticized the CAPM and has ridiculed the EMH. According to Buffet, regarding all agents as equal in ability (the so-called “representative agent” of latter-day neo-classical economic theory) is like regarding all players on an ice-hockey team as equal to the team’s star. This amounts to a criticism of the strong form of the EMH and seems well taken. On the other hand, it’s very hard 2
See the figures on pages 253, 261, and 268 of Malkiel (1996) and his chapter 10 references. Malkiel assumes that the EMH implies a random walk, but this is only a sufficient, not necessary, condition (see Chapter 8 in this book).
to beat the market, meaning there is some truth in the weak form of the EMH. It should help if you have the resources in experience, money, and information channels and financial perceptiveness of a Warren Buffet, George Soros or Peter Lynch. A famous trader was recently convicted and sentenced to pay a large fine for insider trading. Persistent beating of the market via insider information violates the strong form. The strong form EMH believers’ response is that Buffet, Soros and Lynch merely represent fluctuations in the tails of a statistically independent market distribution, examples of unlikely runs of luck. A more realistic viewpoint is that most of us are looking at noise (useless information, in agreement with the weak form) and that only relatively few agents have useful information that can be applied to extract unusual profits from the market. The physicist-run Prediction Company is an example of a company that has apparently extracted unusual profits from the market for over a decade. In contrast, economist-run companies like LTCM and Enron have gone belly-up. Being a physicist certainly doesn’t guarantee success (most of us are far from rich, and are terrible traders), but if you are going to look for correlations in (market or any other) data then being a physicist might help. 5.6 Hedging with options Futures and options are examples of “derivatives”: an option is a contract that gives you the right but not the obligation to buy or sell an asset at a pre-selected price. The pre-selected price is called the strike price, K , and the deadline for exercising the option is called the expiration time T . An option to buy a financial asset is a call, an option to sell the asset is a put. A so-called “American option” can be exercised on or before its expiration time. A so-called “European option” can be exercised only at the strike time. These are only names having nothing to do with geography. Background for the next chapter can be found in Bodie and Merton (1998), and in Hull (1997). Some familiarity with options is necessary in order to follow the text. For example, the reader should learn how to read and understand Figure 5.4. There are two basic questions that we address in the next chapter: how to price options in a liquid market, and the closely related question of how to choose strategies for trading them. We can begin the discussion of the first question here. We will later find that pricing an option is not independent of the chosen strategy, however. That means that the pricing defined below is based implicitly on a yet to be stated strategy. We assume a “frictionless” liquid market by ignoring all transaction fees, dividends, and taxes. We discuss only the so-called “European option” because it has mathematically the simplest forward-time initial condition, but has nothing geographic to do with Europe.
Figure 5.4. Table of option prices from the February 4, 1993, Financial Times. From Wilmott, Howison, and DeWynne (1995), fig. 1.1.
Consider first a call. We want to know the value C of the call at a time t < T . C will depend on ( p(t), K , T − t) where p(t) is the observed price at time t. In what follows p(t) is assumed known. At t = T we know that C = max[ p(T ) − K , 0] = ( p(T ) − K )ϑ( p(T ) − K )
(5.43)
where p(T ) is the price of the asset at expiration. Likewise, a put at exercise time T has the value P = max[K − p(T ), 0] = (K − p(T ))ϑ(K − p(T ))
(5.44)
The main question is: what are the expected values of C and P at an earlier time t < T? We assume that the option values are simply the expected values of (5.43) and (5.44) calculated from the empirical distribution of returns (Gunaratne, 1990a). That is, the final price p(T), unknown at time t < T, must be averaged over by the empirical distribution with density f(x, T − t) and then discounted over time interval ∆t = T − t at some rate r_d. This yields the predictions

C(p, K, T − t) = e^{−r_d(T−t)} \langle (p(T) − K) ϑ(p(T) − K) \rangle = e^{−r_d(T−t)} \int_0^{∞} (p(T) − K) ϑ(p(T) − K) f(x, ∆t) dx    (5.45)

for the call, where in the integrand x = \ln(p(T)/p(t)) with p = p(t) fixed, and

P(p, K, T − t) = e^{−r_d(T−t)} \langle (K − p(T)) ϑ(K − p(T)) \rangle = e^{−r_d(T−t)} \int_0^{∞} (K − p(T)) ϑ(K − p(T)) f(x, T − t) dx    (5.46)
for the put. Note that the expected rate of return R = \langle \ln(p(t + ∆t)/p(t)) \rangle/∆t for the stock will generally appear in these predictions. Exactly how we will choose R and the discount rate r_d is discussed in Section 6.2.4. We will refer to equations (5.45) and (5.46) as "expected option price" valuation. We will show below and also in Section 6.2.4 that predicting option prices is not unique. Note that

C − P = e^{−r_d(T−t)} (\langle p(T) \rangle − K) = V − e^{−r_d(T−t)} K    (5.47)

where V is the expected asset price \langle p(T) \rangle at expiration, discounted back to time t at interest rate r_d where r_0 ≤ r_d. The identity

C + e^{−r_d(T−t)} K = P + V    (5.48)
is called put–call parity, and provides a starting point for discussing so-called synthetic options. That is, we show how to simulate puts and calls by holding some combination of an asset and money market. Suppose first that we finance the trading by holding an amount of money M_0 = e^{−r_d(T−t)} K in a risk-free fund like a money market, so that r_d = r_0 where r_0 is the risk-free interest rate, and also invest in one call. The value of the portfolio is

Π = C + e^{−r_0(T−t)} K    (5.49)
This result synthesizes a portfolio of exactly the same value made up of one put and one share of stock (or one bond) Π=V+P
(5.50)
and vice versa. Furthermore, a call can be synthesized by buying a share of stock (taking on risk) plus a put (buying risky insurance)3 C = P + V − e−r0 (T −t) K
(5.51)
while borrowing an amount M0 (so-called risk-free leverage). In all of the above discussion we are assuming that fluctuations in asset and option prices are small, otherwise we cannot expect mean values to be applicable. In other words, we must expect the predictions above to fail in a market crash when liquidity dries up. Option pricing via calculation of expectation values can only work during normal trading when there is adequate liquidity. LTCM failed because they continued to place “normal” bets against the market while the market was going against them massively (Dunbar, 2000).
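At expiration, where the discount factors equal one, put–call parity (5.48) is a pure payoff identity and can be checked mechanically. The short sketch below does exactly that; the strike and terminal prices are arbitrary test values, and only NumPy is assumed:

```python
import numpy as np

# Sketch of put-call parity (5.48) at expiration: a call plus K in the money market
# has the same payoff as a put plus the asset, for every terminal price p(T).
K = 100.0
pT = np.linspace(50.0, 150.0, 11)
call = np.maximum(pT - K, 0.0)            # eq. (5.43)
put  = np.maximum(K - pT, 0.0)            # eq. (5.44)
print(np.allclose(call + K, put + pT))    # True: the two synthetic portfolios coincide
```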
5.7 Stock shares as options on a firm’s assets We reproduce in part here an argument from the original paper by Black and Scholes (1973) that starts with the same formula as the Modigliani–Miller argument, p = B + S where p is the current market estimate of the value of a firm, B is debt owed to bondholders and S is the current net value of all shares of stock outstanding. Black and Scholes noticed that their option pricing formula can be applied to this valuation p = B + S of a firm. This may sound far-fetched at first sight, but the main point to keep in mind in what follows is that bondholders have first call on the firm’s assets, unless the bondholders can be paid in full the shareholders get nothing. The net shareholder value at time t is given by S = Ns ps where Ns is the number of shares of stock outstanding at price ps . To keep the mathematics simple we 3
This form of insurance is risky because it is not guaranteed to pay off, in comparison with the usual case of life, medical, or car insurance.
assume in what follows that no new shares are issued and that all bonds were issued at a single time t_0 and are scheduled to be repaid with all dividends owed at a single time T (this is a mathematical simplification akin to the assumption of a European option). Assume also that the stock pays no dividend. With N_s constant the dynamics of equity S is the same as the dynamics of stock price p_s. Effectively, the bondholders have first call on the firm's assets. At time T the amount owed by the firm to the bondholders is B'(T) = B(T) + D, where B(T) is the amount borrowed at time t_0 and D is the total interest owed on the bonds. Note that the quantity B'(T) is mathematically analogous to the strike price K in the last section on options: the stock share is worth something if p(T) > B'(T), but is otherwise worthless. At expiration of the bonds, the shareholders' equity, the value of all shares, is then

S(T) = max(p(T) − B'(T), 0)    (5.52)
Therefore, at time t < T we can identify the expected value of the equity as

S(p, B'(T), T − t) = e^{−r_d(T−t)} \langle max(p(T) − B'(T), 0) \rangle    (5.53)
showing that the net value of the stock shares S can be viewed formally for t < T as an option on the firm’s assets. Black and Scholes first pointed this out. This is a very beautiful argument that shows, in contrast with advertisements by brokerage houses like “Own a Piece of America,” a shareholder does not own anything but an option on future equity so long as there is corporate debt outstanding. And an option is a very risky piece of paper, especially in comparison with a money market account. Of course, we have formally treated the bondholder debt as if it would be paid at a definite time T , which is not realistic, but this is only an unimportant detail that can be corrected by a much more complicated mathematical formulation. That is, we have treated shareholder equity as a European option, mathematically the simplest kind of option. The idea of a stock as an option on a company’s assets is theoretically appealing: a stockholder owns no physical asset, no buildings, no equipment, etc., at t < T (all debt is paid hypothetically at time T ), and will own real assets like plant, machinery, etc., at t > T if and only if there is anything left over after the bondholders have been paid in full. The B–S explanation of shareholder value reminds us superficially of the idea of book or replacement value mentioned in Section 4.3, which is based on the idea that the value of a stock share is determined by the value of a firm’s net real and financial assets after all debt obligations have been subtracted. However, in a bubble the equity S can be inflated, and S is anyway generally much larger than book or replacement value in a typical market. That S can be inflated is in qualitative agreement with M & M, that shares are bought based on future expectations of equity
growth S. In this formal picture we only know the dynamics of p(t) through the dynamics of B and S. The valuation of a firm on the basis of p = B + S is not supported by trading the firm itself, because even in a liquid equity market Exxon, Intel, and other companies do not change hands very often. Thinking of p = B + S, we see that if the firm’s bonds and shares are liquid in daily trading, then that is as close to the notion of liquidity of the firm as one can get.
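The payoff (5.52) behind this picture is the same hockey-stick function as a call payoff, which a two-line sketch makes concrete; the firm values and debt level below are invented for illustration:

```python
# Sketch of eq. (5.52): at the bonds' repayment date T the shareholders' equity is a
# call-option payoff on the firm's value p(T), with "strike" B'(T) = debt plus interest.
def equity_at_T(firm_value_T, debt_plus_interest_T):
    return max(firm_value_T - debt_plus_interest_T, 0.0)

print(equity_at_T(120.0, 100.0))   # 20: bondholders paid in full, shareholders keep the rest
print(equity_at_T(90.0, 100.0))    # 0: the firm's assets go to the bondholders
```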
5.8 The Black–Scholes model To obtain the Black–Scholes (B–S) prediction for option prices we simply replace the empirical distribution by a Gaussian distribution of returns in (5.45) and (5.46). In terms of price p we then have the sde d p = µpdt + σ1 pdB
(5.54)
for the underlying asset (stock, bond or foreign exchange, for example) with R and σ both constant. The corresponding prediction for a call on that asset is

C(K, p, t) = e^{−r_d ∆t} \langle (p(T) − K) ϑ(p(T) − K) \rangle = e^{−r_d ∆t} \int_{\ln(K/p)}^{∞} (p(T) − K) f_g(x, t) dx    (5.55)

where x = \ln(p(T)/p) and p is the observed asset price at time t. The corresponding put price is

P(K, p, t) = e^{−r_d ∆t} \langle (K − p(T)) ϑ(K − p(T)) \rangle = e^{−r_d ∆t} \int_{0}^{\ln(K/p)} (K − p(T)) f_g(x, t) dx    (5.56)
In these two formulae f_g(x, t) is the Gaussian returns density with mean

\langle x \rangle = R ∆t    (5.57)

where

R = µ − σ^2/2    (5.58)
is the expected rate of return on the asset. σ is the variance of the asset return, t = T − t is the time to expiration (T is the strike time). There are three parameters in these equations, rd , µ, and σ . To obtain the prediction of the B–S model one sets rd = µ = r0 , where r0 is the risk-free rate of interest. The motivation for this assumption is discussed immediately below. The B–S model is therefore based on
two observable parameters, the risk-free interest rate r_0 and the variance σ of the return on the underlying asset. The Black–Scholes model can be derived in all detail from a special portfolio called the delta hedge (Black and Scholes, 1973). Let w(p, t) denote the option price. Consider a portfolio short one call option and long ∆ shares of stock. "Long" means that the asset is purchased, "short" means that it is sold. If we choose ∆ = w' = ∂w/∂p then the portfolio is instantaneously risk free. To see this, we calculate the portfolio's value at time t

Π = −w + ∆p    (5.59)
Using the Gaussian returns model (5.54) we obtain the portfolio's rate of return (after using dB^2 = dt)

\frac{dΠ}{Π dt} = \frac{−dw + ∆ dp}{Π dt} = −\left( \dot{w} dt + w' dp + \frac{1}{2} w'' σ_1^2 p^2 dt − ∆ dp \right) \Big/ Π dt    (5.60)
Here, we have held the fraction ∆ of shares constant during dt because this is what the hypothetical trader must do. If we choose ∆ = w' then the portfolio has a deterministic rate of return dΠ/Πdt = r. In this special case, called the delta hedge portfolio, we obtain

\frac{dΠ}{Π dt} = −\left( \dot{w} + \frac{1}{2} w'' σ_1^2 p^2 \right) \Big/ (−w + w' p) = r    (5.61)

where the portfolio return r does not fluctuate randomly to O(dt) and must be determined or chosen. In principle r may depend on (p, t). The cancellation of the random term w' dp in the numerator of (5.61) means that the portfolio is instantaneously risk free: the mean square fluctuation of the rate of return dΠ/Πdt vanishes to O(dt),

\left\langle \left( \frac{dΠ}{Π dt} − r \right)^2 \right\rangle = 0    (5.62)

but not to higher order. This is easy to see. With w(p, t) deterministic the finite change ∆Π = −∆w + w' ∆p fluctuates over a finite time interval due to ∆p. This makes the real portfolio risky because continuous time portfolio rebalancing over infinitesimal time intervals dt is impossible in reality. The delta hedge portfolio is therefore not globally risk free like a CD where the mean square fluctuation vanishes for all finite times ∆t. To maintain the portfolio balance (5.59) as the observed asset price p changes while t increases toward expiration, the instantaneously risk-free portfolio must continually be updated. This is because p changes and both w and w' change with t and p. Updating
the portfolio frequently is called "dynamic rebalancing." Therefore the portfolio is risky over finite time intervals ∆t, which makes sense: trading stocks and options, in any combination, is a very risky business, as any trader can tell you. The standard assumption among finance theorists is that r = r_0 is the risk-free rate of interest. Setting r = r_0 means that one assumes that the hedge portfolio is perfectly equivalent to a money market deposit, which is wrong. Note, however, that (5.62) holds for any value of r. The theory does not pick out a special value for the interest rate r of the hedge portfolio. We defer further discussion of this point until the end of the chapter. Finally, from r = dΠ/Πdt in (5.61) we obtain the famous B–S partial differential equation (pde)

r w = \dot{w} + r p w' + \frac{1}{2} σ_1^2 p^2 w''    (5.63)
a backward-in-time diffusion equation that revolutionized finance. The initial condition is specified at a forward time, the strike time T , and the equation diffuses backward in time from the initial condition to predict the option price w( p, t) corresponding to the observed asset price p at time t. For a call, for example, the initial condition at expiration is given by (5.43). In their very beautifully written original 1973 paper, Black and Scholes produced two separate proofs of the pde (5.63), one from the delta hedge and the other via CAPM. Black (1989) has explained that the CAPM provided his original motivation to derive an option pricing theory. We will show next that CAPM does not lead to (5.63) but instead assumes a different risk-reduction strategy, so that the original B–S paper contains an error. Black, Scholes and Merton were not the first to derive option pricing equations, they were the first to derive an option pricing pde using only observable quantities. Long before their famous discovery, Black was an undergraduate physics student, Scholes was an economist with a lifelong interest in the stock market, and Merton was a racing car enthusiast/mechanic who played the stock market as a student. Interesting people, all three!
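The cancellation behind (5.60)–(5.62) is easy to see numerically. In the sketch below w(p) = p^2 is only a stand-in for a smooth option price surface at fixed t, the parameters are invented, and NumPy is assumed; the point is that holding ∆ = w' removes the O(√dt) noise from the portfolio change, so only O(dt) terms survive:

```python
import numpy as np

# Sketch: one Euler step of the delta hedge, eqs. (5.54) and (5.60).
rng = np.random.default_rng(2)
p0, mu, sigma1, dt = 100.0, 0.05, 0.2, 1e-4

w  = lambda p: p**2          # toy stand-in for w(p, t) at fixed t
dw = lambda p: 2 * p         # Delta = w'

dB = rng.normal(0.0, np.sqrt(dt), size=100_000)
dp = mu * p0 * dt + sigma1 * p0 * dB        # price increments of eq. (5.54)
dW = w(p0 + dp) - w(p0)                     # change of the option-like position alone
dPi = -dW + dw(p0) * dp                     # delta-hedged portfolio change, eq. (5.60)

print(np.std(dW), np.std(dPi))   # roughly 40 vs 0.06: the O(sqrt(dt)) risk has cancelled
```

The residual fluctuation of dPi is the higher-order term discussed above: it vanishes to O(dt) but not over finite rebalancing intervals.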
5.9 The CAPM option pricing strategy

In what follows we consider the CAPM for two assets, a stock or bond with rate of return R_1, and a corresponding option with rate of return R_2. Assuming lognormal asset pricing (5.54) the average return on the option is given by the sde for w as

dw = \left( \dot{w} + R_1 p w' + \frac{1}{2} σ_1^2 p^2 w'' \right) dt + p w' σ_1 dB    (5.64)
where we have used dB^2 = dt. This yields an instantaneous rate of return on the option

x_2 = \frac{dw}{w dt} = \frac{\dot{w}}{w} + R_1 \frac{p w'}{w} + \frac{1}{2} σ_1^2 p^2 \frac{w''}{w} + σ_1 \frac{p w'}{w} \frac{dB}{dt}    (5.65)
where dB/dt is white noise. From CAPM we have

R_2 = R_0 + β_2 ∆R_e    (5.66)

for the average return. The average return on the stock is given from CAPM by

R_1 = R_0 + β_1 ∆R_e    (5.67)
and the instantaneous return rate of the stock is x_1 = dp/pdt = R_1 + σ_1 dB/dt. According to the original Nobel Prize-winning 1973 Black–Scholes paper we should be able to prove that

β_2 = \frac{p w'}{w} β_1    (5.68)
Were this the case then we would get a cancellation of the two beta terms in (5.69) below:

R_2 = R_0 + β_2 ∆R_e = \frac{\dot{w}}{w} + R_1 \frac{p w'}{w} + \frac{1}{2} σ_1^2 p^2 \frac{w''}{w} = \frac{\dot{w}}{w} + R_0 \frac{p w'}{w} + \frac{p w'}{w} β_1 ∆R_e + \frac{1}{2} σ_1^2 p^2 \frac{w''}{w}    (5.69)
leaving us with risk-free rate of return R_0 and the B–S option pricing pde (5.63). We show next that this result would only follow from a circular argument and is wrong: the two beta terms do not cancel each other. From the sde (5.64) for w the fluctuating option price change over a finite time interval ∆t is given by the stochastic integral equation

∆w = \int_t^{t+∆t} \left( \dot{w} + w' R_1 p + \frac{1}{2} w'' σ_1^2 p^2 \right) dt + σ_1 (w' p) • ∆B    (5.70a)
where the dot in the last term denotes the Ito product. In what follows we assume sufficiently small time intervals ∆t to make the small returns approximation whereby \ln(w(t + ∆t)/w(t)) ≈ ∆w/w and \ln(p(t + ∆t)/p(t)) ≈ ∆p/p. In the small returns approximation (local solution of (5.70a))

∆w ≈ \left( \dot{w} + w' R_1 p + \frac{1}{2} w'' σ_1^2 p^2 \right) ∆t + σ_1 w' p ∆B    (5.70b)
We can use this to calculate the fluctuating option return x_2 ≈ ∆w/w∆t at short times. With x_1 ≈ ∆p/p∆t denoting the short time approximation to the asset return, we obtain

x_2 − R_0 ≈ \frac{1}{w} \left( \dot{w} + \frac{σ_1^2 p^2 w''}{2} + R_0 p w' − R_0 w \right) + \frac{p w'}{w} (x_1 − R_1)    (5.71)

Taking the average would yield (5.68) if we were to assume that the B–S pde (5.63) holds, but we are trying to derive (5.63), not assume it. Therefore, taking the average yields

β_2 ≈ \frac{1}{w ∆R_e} \left( \dot{w} + \frac{σ_1^2 p^2 w''}{2} + R_0 p w' − R_0 w \right) + β_1 \frac{p w'}{w}    (5.72)

which is true but does not reduce to (5.68), in contrast with the claim made by Black and Scholes. Equation (5.68) is in fact impossible to derive without making a circular argument. Within the context of CAPM one certainly cannot use (5.68) in (5.69). To see that we cannot assume (5.68) just calculate the ratio invested f_2/f_1 by our hypothetical CAPM risk-minimizing agent. Here, we need the correlation matrix for Gaussian returns only to leading order in ∆t:
σ_{11} ≈ σ_1^2 ∆t    (5.73)

σ_{12} ≈ \frac{p w'}{w} σ_{11}    (5.74)

and

σ_{22} ≈ \left( \frac{p w'}{w} \right)^2 σ_{11}    (5.75)
The variance of the portfolio vanishes to lowest order as with the delta hedge, but it is also easy to show that to leading order in ∆t

f_1 ∝ (β_1 p w'/w − β_2) p w'/w    (5.76)

and

f_2 ∝ (β_2 − β_1 p w'/w)    (5.77)
so that it is impossible that the B–S assumption (5.68) could be satisfied. Note that the ratio f 1 / f 2 is exactly the same as for the delta hedge. That CAPM is not an equilibrium model is exhibited explicitly by the time dependence of the terms in (5.73)–(5.77). The CAPM does not predict either the same option pricing equation as does the delta hedge. Furthermore, if traders actually use the delta hedge in option pricing
then this means that agents do not trade in a way that minimizes the mean square fluctuation à la CAPM. The CAPM and the delta hedge do not try to reduce risk in exactly the same way. In the delta hedge the main fluctuating terms are removed directly from the portfolio return, thereby lowering the expected return. In CAPM, nothing is subtracted from the return in forming the portfolio and the idea there is not only diversification but also increased expected return through increased risk. This is illustrated explicitly by the fact that the expected return on the CAPM portfolio is not the risk-free return, but is instead proportional to the factor set equal to zero by Black and Scholes, shown above as equation (5.24). With R_{capm} = R_0 + ∆R_{capm} we have

∆R_{capm} = \frac{β_1 p w'/w − β_2}{p w'/w − 1} ∆R_e    (5.78)

Note also that beta for the CAPM hedge is given by

β_{capm} = \frac{β_1 p w'/w − β_2}{p w'/w − 1}    (5.79)
The notion of increased expected return via increased risk is not present in the delta hedge strategy, which tries to eliminate risk completely. In other words, the delta hedge and CAPM attempt to minimize risk in two different ways: the delta hedge attempts to eliminate risk altogether whereas in CAPM one acknowledges that higher risk is required for higher expected return. We see now that the way that options are priced is strategy dependent, which is closer to the idea that psychology plays a role in trading. The CAPM option pricing equation depends on the expected returns for both stock and option,

R_2 w = \dot{w} + p w' R_1 + \frac{1}{2} σ_1^2 p^2 w''    (5.80)
and so differs from the original Black–Scholes equation (5.63) of the delta hedge strategy. There is no such thing as a universal option pricing equation independent of the chosen strategy, even if that strategy is reflected in this era by the market. Economics is not like physics (nonthinking nature), but depends on human choices and expectations.
5.10 Backward-time diffusion: solving the Black–Scholes pde Next, we show that it is very simple to use the Green function method from physics to solve the Black–Scholes partial differential equation, which is a simple, linear backward-in-time diffusion equation.
Consider the simplest diffusion equation

\frac{∂f}{∂t} = D \frac{∂^2 f}{∂x^2}    (5.81)

with D > 0 a constant. Solutions exist only forward in time, the time evolution operator

U(t) = e^{t D ∂^2/∂x^2}    (5.82)

has no inverse. The solutions

f(x, t) = U(t) f(x, 0) = f(x, 0) + t D \frac{∂^2 f(x, 0)}{∂x^2} + · · · + \frac{(t D)^n}{n!} \frac{∂^{2n} f(x, 0)}{∂x^{2n}} + · · ·    (5.83)

form a semi-group. The infinite series (5.83) is equivalent to the integral operator

f(x, t) = \int_{−∞}^{∞} g(x, t | z, 0) f(z, 0) dz    (5.84)
where g is the Green function of (5.81). That there is no inverse of (5.82) corresponds to the nonexistence of the integral (5.84) if t is negative. Consider next the diffusion equation (Sneddon, 1957)

\frac{∂f}{∂t} = −D \frac{∂^2 f}{∂x^2}    (5.85)

It follows that solutions exist only backward in time, with t starting at t_0 and decreasing. The Green function for (5.85) is given by

g(x, t | x_0, t_0) = \frac{1}{\sqrt{4πD(t_0 − t)}} e^{−\frac{(x−x_0)^2}{4D(t_0 − t)}}    (5.86)
With arbitrary initial data f(x, t_0) specified forward in time, the solution of (5.85) is for t ≤ t_0 given by

f(x, t) = \int_{−∞}^{∞} g(x, t | z, t_0) f(z, t_0) dz    (5.87)
We can rewrite the equations as forward in time by making the transformation ∆t = t_0 − t so that (5.85) and (5.86) become

\frac{∂f}{∂∆t} = D \frac{∂^2 f}{∂x^2}    (5.88)
and

g(x, ∆t | x_0, t_0) = \frac{1}{\sqrt{4πD∆t}} e^{−\frac{(x−x_0)^2}{4D∆t}}    (5.89)
with ∆t increasing as t decreases. This is all that we need to know about backward-in-time diffusion equations, which appear both in option pricing and in stochastic models of the eddy-energy cascade in fluid turbulence. Finance texts go through a lot of rigmarole to solve the B–S pde, but that is because finance theorists ignore Green functions. They also concentrate on p instead of on x, which is a mistake. Starting with the B–S pde (5.63) and transforming to returns x we obtain

r u = \dot{u} + r' u' + \frac{1}{2} σ_1^2 u''    (5.90)
where u(x, t) = p w(p, t) (because u dx = w dp) and r' = r − σ_1^2/2. We next make the simple transformation u = v e^{rt} so that

0 = \dot{v} + r' v' + \frac{1}{2} σ_1^2 v''    (5.91)
The Green function for this equation is the Gaussian

g(x − r'(T − t), T − t) = \frac{1}{σ_1 \sqrt{2π(T − t)}} e^{−\frac{(x − r'(T−t))^2}{2σ_1^2 (T−t)}}    (5.92)
and the forward-time initial condition for a call at time T is

v(x, T) = e^{−rT} (p e^x − K),   x > 0
v(x, T) = 0,   x < 0    (5.93)
so that the call has the value

C(K, p, T − t) = e^{−r(T−t)} p \int_{\ln K/p}^{∞} g(x − r'∆t, T − t) e^x dx − e^{−r(T−t)} K \int_{\ln K/p}^{∞} g(x − r'∆t, T − t) dx    (5.94)
The reader can write down the corresponding formula for a put. Here’s the main point: this result is exactly the same as equation (5.55) if we choose rd = r and µ = r in (5.55). In the delta hedge nothing is assumed about the underlying asset’s expected rate of return µ; instead, we obtain the prediction that the discount rate rd should be the
same as the expected rate of return r of the hedge portfolio. Finance theorists treat this as a mathematical theorem. A physicist, in contrast, sees this as a falsifiable condition that must be tested empirically. The main point is, without extra assumptions (5.55) and (5.56) implicitly reflect a different hedging strategy than the delta hedge. This is fine: theoretical option pricing is not universal independent of the choice of strategy, and one can easily cook up explicit strategies where r_d, µ and r don't all coincide. The trick, therefore, is to use empirical asset and option price data to try to find out which strategy the market is following in a given era. If we can use the empirical distribution to price options in agreement with the market then, implicitly (if not effectively) we will have isolated the dominant strategy, if there is a dominant strategy. In that case we have to pay attention to what traders actually do, something that no finance theory text discusses. Finance theory texts also do not use the empirical distribution to predict option prices. Instead, they prove a lot of formal mathematical theorems about Martingales, arbitrage over infinitesimal time intervals, and the like, as if theoretical finance would be merely a subset of the theory of stochastic processes. A trader cannot learn anything new or useful about making and losing money by reading a text on financial mathematics.

By completing the square in the exponent of the first integral in (5.94) and then transforming variables in both integrals, we can transform equation (5.94) into the standard textbook form (Hull, 1997), convenient for numerical calculation:

C(K, p, T − t) = p N(d_1) − K e^{−r ∆t} N(d_2)    (5.95)

where

N(d) = \frac{1}{\sqrt{2π}} \int_{−∞}^{d} e^{−y^2/2} dy    (5.96)

with

d_1 = \frac{\ln(p/K) + (r + σ_1^2/2) ∆t}{σ_1 \sqrt{∆t}}    (5.97)

and

d_2 = \frac{\ln(p/K) + (r − σ_1^2/2) ∆t}{σ_1 \sqrt{∆t}}    (5.98)
Finally, to complete the picture, Black and Scholes, following the theorists Modigliani and Miller, assumed the no-arbitrage condition. Because the portfolio is instantaneously risk free they chose r = r0 .
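For numerical work the standard form (5.95)–(5.98) is a one-liner; the sketch below makes the conventional (and, as argued above, debatable) choice r = r_0, uses arbitrary test parameters, and assumes NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the textbook call price (5.95)-(5.98) with r taken as the risk-free rate.
def bs_call(p, K, dt, r, sigma1):
    d1 = (np.log(p / K) + (r + 0.5 * sigma1**2) * dt) / (sigma1 * np.sqrt(dt))   # eq. (5.97)
    d2 = (np.log(p / K) + (r - 0.5 * sigma1**2) * dt) / (sigma1 * np.sqrt(dt))   # eq. (5.98)
    return p * norm.cdf(d1) - K * np.exp(-r * dt) * norm.cdf(d2)                 # eq. (5.95)

print(bs_call(p=100.0, K=105.0, dt=0.25, r=0.03, sigma1=0.2))
```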
[Plot: probability density (logarithmic scale) versus USD/DEM hourly returns (%).]
Figure 5.5. Gaussian (dashed line) vs empirical distribution of returns x showing fat tails (courtesy of Michel Dacorogna). Note that the Gaussian distribution intersects but does not coincide with the empirical distribution for any finite range of x. (This figure is the same as Figure 4.1.)
The Fokker–Planck equation for the Gaussian returns model is

\dot{f} = −(µ − σ^2/2) f' + \frac{σ^2}{2} f''    (5.99)

Note that with the choice µ = r this equation and the transformed B–S pde (5.91) form a forward- and backward-time Kolmogorov pair of diffusion equations (see Gnedenko (1967) or Appendix A). Each equation has exactly the same Green function. This is why expected price option pricing based on (5.45) and (5.46) with f = f_g and r_d = µ = r agrees exactly with the prediction (5.95) and the corresponding put equation for the delta hedge. The more general correspondence between backward-time option pricing pdes and market Fokker–Planck equations for arbitrary diffusion coefficients D(x, t) is discussed in Chapter 6.

What about the comparison of the model with real trading prices? Consider a call option as an example. If at the present time t we find that p > K then the call is said to be "in the money," and is "out of the money" if p < K. How do the predictions of the model compare with observed option prices? The B–S model prices "in the money" options too high and "out of the money" options too low. The reason for this is that the observed distribution has fat tails, and also is not approximately Gaussian for small to moderate returns (see Figure 5.5). We know from our discussion of the central limit theorem in Chapter 3 that a distribution is at best approximated asymptotically as Gaussian only for small fluctuations near the mean. Such a restricted approximation cannot be used to describe the market and consequently cannot be used to price options correctly.
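The agreement just stated can be checked numerically: averaging the call payoff over a Gaussian returns density with r_d = µ = r and discounting reproduces the closed form (5.95) to integration accuracy. The sketch below uses arbitrary test parameters and assumes NumPy and SciPy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Sketch: "expected price" valuation with a Gaussian density vs the closed form (5.95).
p, K, dt, r, s = 100.0, 105.0, 0.25, 0.03, 0.2
R = r - 0.5 * s**2                            # mean return, eq. (5.58) with mu = r

fg = lambda x: np.exp(-(x - R * dt)**2 / (2 * s**2 * dt)) / np.sqrt(2 * np.pi * s**2 * dt)
integral, _ = quad(lambda x: (p * np.exp(x) - K) * fg(x), np.log(K / p), np.inf)
expected_price = np.exp(-r * dt) * integral   # eqs. (5.45)/(5.55) with f = f_g

d1 = (np.log(p / K) + (r + 0.5 * s**2) * dt) / (s * np.sqrt(dt))
d2 = d1 - s * np.sqrt(dt)
closed_form = p * norm.cdf(d1) - K * np.exp(-r * dt) * norm.cdf(d2)

print(expected_price, closed_form)            # the two agree
```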
[Plot: implied volatility (roughly 0.09 to 0.15) versus strike price (70 to 110).]
Figure 5.6. Volatility smile, suggesting that the correct underlying diffusion coefficient D(x, t) is not independent of x. (This figure is the same as Figure 6.4.)
The error resulting from the approximation of the empirical distribution by the Gaussian is compensated for in “financial engineering” by the following fudge: plug the observed option price into equations (5.55) and (5.56) and then calculate (numerically) the “implied volatility” σ . The implied volatility is not constant but depends on the strike price K and exhibits “volatility smile,” as in Figure 5.6. What this really means is that the returns sde dx = Rdt + σ dB
(5.100)
with σ = constant independent of x cannot possibly describe real markets: the local volatility σ 2 = D must depend on (x, t). The local volatility D(x, t) is deduced from the empirical returns distribution and used to price options correctly in Chapter 6. In financial engineering where “stochastic volatility models” are used, the volatility fluctuates randomly, but statistically independently of x. This is a bad approximation because fluctuation of volatility can be nothing other than a transformed version of fluctuation in x. That is, volatility is perfectly correlated with x. We will discover under what circumstances, in the next chapter for processes with nontrivial local volatility D(x, t), the resulting delta hedge option pricing partial differential equation can approximately reproduce the predictions (5.45) and (5.46) above.
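The "implied volatility" fudge itself is a simple root-finding exercise: invert the Gaussian-model price (5.95) for σ, strike by strike. In the sketch below the "market" quotes are invented numbers, and NumPy and SciPy are assumed; with real quotes the strike dependence of the result is the smile of Figure 5.6:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(p, K, dt, r, sigma):
    d1 = (np.log(p / K) + (r + 0.5 * sigma**2) * dt) / (sigma * np.sqrt(dt))
    d2 = d1 - sigma * np.sqrt(dt)
    return p * norm.cdf(d1) - K * np.exp(-r * dt) * norm.cdf(d2)

def implied_vol(market_price, p, K, dt, r):
    # Solve bs_call(sigma) = market_price for sigma by bracketing.
    return brentq(lambda s: bs_call(p, K, dt, r, s) - market_price, 1e-4, 5.0)

for K, price in [(90.0, 12.5), (100.0, 5.3), (110.0, 2.1)]:        # hypothetical quotes
    print(K, implied_vol(price, p=100.0, K=K, dt=0.25, r=0.03))    # sigma varies with K
```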
5.11 We can learn from Enron Enron (Bryce and Ivins, 2002) started by owning real assets in the form of gas pipelines, but became a so-called New Economy company during the 1990s based on the belief that derivatives trading, not assets, paves the way to great wealth acquired fast. This was during the era of widespread belief in reliable applicability of mathematical modeling of derivatives, and “equilibrium” markets, before the collapse of LTCM. At the time of its collapse Enron was building the largest derivatives trading floor in the world. Compared with other market players, Enron’s VaR (Value at Risk (Jorion, 1997)) and trading-risk analytics were “advanced,” but were certainly not “fool-proof.” Enron’s VaR model was a modified Heath–Jarrow–Morton model utilizing numerous inputs (other than the standard price/volatility/position) including correlations between individual “curves” as well as clustered regional correlations, factor loadings (statistically calculated potential stress scenarios for the forward price curves), “jump factors” for power price spikes, etc. Component VaR was employed to identify VaR contributors and mitigators, and Extreme Value Theory4 was used to measure potential fat tail events. However, about 90% of the employees in “Risk Management” and virtually all of the traders could not list, let alone explain the inputs into Enron’s VaR model.5 A severe weakness is that Enron tried to price derivatives in nonliquid markets. This means that inadequate market returns or price histograms were used to try to price derivatives and assess risk. VaR requires good statistics for the estimation of the likelihood of extreme events, and so with an inadequate histogram the probability of an extreme event cannot be meaningfully estimated. Enron even wanted to price options for gas stored in the ground, an illiquid market for which price statistics could only be invented.6 The full implication of these words will be made apparent by the analysis of Chapters 6 and 7 below. Some information about Enron’s derivatives trading was reported in the article http://www.nytimes.com/2002/12/12/business/12ENER.html?pagewanted=1. But how could Enron “manufacture” paper profits, without corresponding cash flow, for so long and remain undetected? The main accounting trick that allowed Enron to report false profits, driving up the price of its stock and providing enormous rewards to its deal makers, is “mark to market” accounting. Under that method,
4 See Sornette (1998) and Dacorogna et al. (2001) for definitions of Extreme Value Theory. A correct determination of the exponent α in equation (4.15) is an example of an application of Extreme Value Theory. In other words, Extreme Value Theory is a method of determining the exponent that describes the large events in a fat-tailed distribution.
5 The information in this paragraph was provided by a former Enron risk management researcher who prefers to remain anonymous.
6 Private conversation with Enron modelers in 2000.
119
future projected profits over a long time interval are allowed to be declared as current profit even though no real profit has been made, even though there is no positive cash flow. In other words, firms are allowed to announce to shareholders that profits have been made when no profit exists. Enron’s globally respected accounting firm helped by signing off on the auditing reports, in spite of the fact that the auditing provided so little real information about Enron’s financial status. At the same time, major investment houses that also profited from investment banking deals with Enron touted the stock. Another misleading use of mark to market accounting is as follows: like many big businesses (Intel, GE, . . .) Enron owned stock in dot.com outfits that later collapsed in and after winter, 2000, after never having shown a profit. When the stock of one such company, Rhythms NetConnections, went up significantly, Enron declared a corresponding profit on its books without having sold the stock. When the stock price later plummeted Enron simply hid the loss by transferring the holding into one of its spinoff companies. Within that spinoff, Enron’s supposed “hedge” against the risk was its own stock. The use of mark to market accounting as a way of inflating profit sheets surely should be outlawed,7 but such regulations fly in the face of the widespread belief in the infallibility of “the market mechanism.” Shareholders should be made fully aware of all derivatives positions held by a firm. This would be an example of the useful and reasonable regulation of free markets. Ordinary taxpayers in the USA are not permitted to declare as profits or losses unrealized stock price changes. As Black and Scholes made clear, a stock is not an asset, it is merely an option on an asset. Real assets (money in the bank, plant and equipment, etc.), not unexercised options, should be the basis for deciding profits/losses and taxation. In addition, accounting rules should be changed to make it extremely difficult for a firm to hide its potential losses on bets placed on other firms: all holdings should be declared in quarterly reports in a way that makes clear what are real assets and what are risky bets. Let us now revisit the Modigliani–Miller theorem. Recall that it teaches that to a first approximation in the valuation of a business p = B + S the ratio B/S of debt to equity doesn’t matter. However, Enron provides us with examples where the amount of debt does matter. If a company books profits through buying another company, but those earnings gains are not enough to pay off the loan, then debt certainly matters. With personal debt, debt to equity matters since one can go bankrupt by taking on too much debt. The entire M & M discussion is based on the small returns approximation E = p ≈ pt, but this fails for big changes in 7
It would be a good idea to mark liquid derivatives positions to market to show investors the level of risk. Illiquid derivatives positions cannot be marked to market in any empirically meaningful way, however.
120
Standard betting procedures in portfolio selection theory
p. The discussion is therefore incomplete and cannot be extrapolated to extreme cases where bankruptcy is possible. So the ratio B/S in p = B + S does matter in reality, meaning that something important is hidden in the future expectations E and ignored within the M & M theorem. Enron made a name for itself in electricity derivatives after successfully lobbying for the deregulation of the California market. The manipulations that were successfully made by options traders in those markets are now well documented. Of course, one can ask: why should consumers want deregulated electricity or water markets anyway? Deregulation lowered telephone costs, both in the USA and western Europe, but electricity and water are very different. Far from being an information technology, both require the expensive transport of energy over long distances, where dissipation during transport plays a big role in the cost. So far, in deregulated electricity and water markets, there is no evidence that the lowering of consumer costs outweighs the risk of having firms play games trying to make big wins by trading options on those services. The negative effects on consumers in California and Buenos Aires do not argue in favor of deregulation of electricity and water. Adam Smith and his contemporaries believed without proof that there must be laws of economics that regulate supply and demand analogous to the way that the laws of mechanics govern the motion of a ball. Maybe Smith did not anticipate that an unregulated financial market can develop big price swings where supply and demand cannot come close to matching each other.
6 Dynamics of financial markets, volatility, and option pricing
6.1 An empirical model of option pricing 6.1.1 Introduction We begin with the empirical distribution of intraday asset returns and show how to use that distribution to price options empirically in agreement with traders’ prices in closed algebraic form. In Section 6.2 we formulate the theory of volatility of fat-tailed distributions and then show how to use stochastic dynamics with the empirical distribution to deduce a returns- and time-diffusion coefficient. That is, we solve the inverse problem: given the empirical returns distribution, we construct the dynamics by inferring the local volatility function that generates the distribution. We begin by asking which variable should be used to describe the variation of the underlying asset price p. Suppose p changes from p(t) to p(t + t) = p + p in the time interval from t to t + t. Price p can of course be measured in different units (e.g., ticks, Euros, Yen or Dollars), but we want our equation to be independent of the units of measure, a point that has been ignored in many other recent data analyses. For example, the variable p is additive but is units dependent. The obvious way to achieve independence of units is to study p/ p, but this variable is not additive. This is a serious setback for a theoretical analysis. A variable that is both additive and units independent is x = ln( p(t)/ p(t0 )), in agreement with Osborne (1964), who reasoned from Fechner’s Law. In this notation x = ln( p(t + t)/ p(t)). One cannot discover the correct exponents µ for very large deviations (so-called “extreme values”) of the empirical distribution without studying the distribution of logarithmic returns x. The basic assumption in formulating our model is that the returns variable x(t) is approximately described as a Markov process. The simplest approximation is a Gaussian distribution of returns represented by the stochastic differential
equation (sde) dx = Rdt + σ dB
(6.1)
where dB denotes the usual Wiener process with \langle dB \rangle = 0 and \langle dB^2 \rangle = dt, but with R and σ constants, yielding lognormal prices as first proposed by Osborne. The assumption of a Markov process is an approximation; it may not be strictly true because it assumes a Hurst exponent H = 1/2 for the mean square fluctuation whereas we know from empirical data only that the average volatility σ^2 behaves as

σ^2 = \langle (x − \langle x \rangle)^2 \rangle ≈ c t^{2H}    (6.2)
with c a constant and H = O(1/2) after roughly t >10–15 min in trading (Mantegna and Stanley, 2000). With H = 1/2 there would be fractional Brownian motion (Feder, 1988), with long time correlations that could in principle be exploited for profit, as we will show in Chapter 8. The assumption that H ≈ 1/2 is equivalent to the assumption that it is very hard to beat the market, which is approximately true. Such a market consists of pure noise plus hard to estimate drift, the expected return R on the asset. We assume a continuous time description for mathematical convenience, although this is also obviously a source of error that must be corrected at some point in the future: the shortest time scale in finance is on the order of one second, and so the use of Ito’s lemma may lead to errors that we have not yet detected. With that warning in mind, we go on with continuous time dynamics. The main assumption of the Black–Scholes (1973) model is that the successive returns x follow a continuous time random walk (6.1) with constant mean and constant standard deviation. In terms of price this is represented by the simple sde d p = µpdt + σ pdB
(6.3)
The lognormal price distribution g(p, t) solves the corresponding Fokker–Planck equation

\dot{g}(p, t) = −µ (p g(p, t))' + \frac{σ^2}{2} (p^2 g(p, t))''    (6.4)

If we transform variables to returns x = \ln(p(t)/p(t_0)), then

f_0(x, t) = p g(p, t) = N((x − Rt)^2/2σ^2 t)    (6.5)
is the Gaussian density of returns x, with N the standard notation for a normal distribution with mean

\langle x \rangle = Rt = (µ − σ^2/2) t    (6.6)

and diffusion constant D = σ^2.
The empirical distribution of returns is, however, not approximately Gaussian.1 We denote the empirical density by f (x, t). As we showed in Chapter 5, European options may be priced as follows. At expiration a call is worth C = ( pT −K )ϑ( pT −K )
(6.7)
where ϑ is the usual step function. We want to know the call price C at time t < T. Discounting money from expiration back to time t at rate r_d, and writing x = \ln(p_T/p), where p_T is the unknown asset price at time T and p is the observed price at time t, we simply average (6.7) over the p_T using the empirical returns distribution to get the prediction

C(K, p, t) = e^{−r_d ∆t} \langle (p_T − K) ϑ(p_T − K) \rangle = e^{−r_d ∆t} \int_{\ln(K/p)}^{∞} (p_T − K) f(x, ∆t) dx    (6.8)

and r_d is the discount rate. In (6.8), ∆t = T − t is the time to expiration. Likewise, the value of a put at time t < T is

P(K, p, t) = e^{−r_d ∆t} \langle (K − p_T) ϑ(K − p_T) \rangle = e^{−r_d ∆t} \int_{0}^{\ln(K/p)} (K − p_T) f(x, ∆t) dx    (6.9)
The Black–Scholes approximation is given by replacing the empirical density f by the normal density N = f 0 in (6.8) and (6.9). We will refer to the predictions (6.8) and (6.9) as the “expected price” option valuation. The reason for this terminology is that predicting option prices is not unique. We discuss the nonuniqueness further in Section 6.2.4 in the context of a standard strategy called risk-neutral or risk-free option pricing. In the face of nonuniqueness, one can gain direction only by finding out what the traders assume, which is the same as asking, “What is the market doing?” This is not the same as asking what the standard finance texts are teaching. What does the comparison with data on put and call prices predicted by (6.8) and (6.9) teach us? We know that out-of-the-money options generally trade at a higher price than in Black–Scholes theory. That lack of agreement is “fudged” in financial engineering by introducing the so-called “implied volatility”: the diffusion coefficient D = σ 2 is treated illegally as a function of the strike price K (Hull, 1997). 1
The empirical distribution becomes closer to a Gaussian in the central part only at times on the order of several months. At earlier times, for example from 1 to 30 days, the Gaussian approximation is wrong for both small and large returns.
124
Dynamics of financial markets, volatility, and option pricing
The fudge suggests to us that the assumption of a constant diffusion coefficient D in equation (6.1) is wrong. In other words, a model sde for returns √ (6.10a) dx = Rdt + DdB with diffusion coefficient D independent of (x, t) cannot possibly reproduce either the correct returns distribution or the correct option pricing. The corresponding price equation is (6.10b) d p = µpdt + d( p, t) pdB where d( p, t) = D(x, t) and R = µ − D(x, t)/2 are not constants. We will show in Section 6.1.3 how to approximate the empirical distribution of returns simply, and then will deduce an explicit expression for the diffusion coefficient D(x, t) describing that distribution dynamically in Section 6.2.3 (McCauley and Gunaratne, 2003a, b). We begin the next section with one assumption, and then from the historical data for US Bonds and for two currencies we show that the distribution of returns x is much closer to exponential than to Gaussian. After presenting some useful formulae based on the exponential distribution, we then calculate option prices in closed algebraic form in terms of the two undetermined parameters in the model. We show how those two parameters can be estimated from data and discuss some important consequences of the new model. We finally compare the theoretically predicted option prices with actual market prices. In Section 6.2 below we formulate a general theory of fluctuating volatility of returns, and also a stochastic dynamics with nontrivial volatility describing the new model. Throughout the next section the option prices given by formulae refer to European options.
6.1.2 The empirical distribution The observations discussed above indicate that we should analyze the observed distribution of returns x and see if we can model it. The frequencies of returns for US Bonds and some currencies are shown in Figures 6.1, 6.2, and 6.3. It is clear from the histograms, at least for short times t, that the logarithm of the price ratio p(t)/ p(0), x, is distributed very close to an exponential that is generally asymmetric. We’ll describe some properties of the exponential distribution here and then use it to price options below. The tails of the exponential distribution fall off much more slowly than those of normal distributions, so that large fluctuations in returns are much more likely. Consequently, the price of out-of-the-money options will be larger than that given by the Black–Scholes theory.
An empirical model of option pricing
125
100
f (x, t)
10−1
10−2
10−3 −0.01
−0.005
0 0.005 x(t) = log (p(t)/p)
0.01
Figure 6.1. The histogram for the distribution of relative price increments for US Bonds for a period of 600 days. The horizontal axis is the variable x = ln ( p(t + t)/p(t)), and the vertical axis is the logarithm of the frequency of its occurrence (t = 4 h). The piecewise linearity of the plot implies that the distribution of returns x is exponential.
10−0
f(x, t)
10−1
10−2
10−3 −5
0 x(t) = log (p(t)/p)
5 × 10−3
Figure 6.2. The histogram for the relative price increments of Japanese Yen for a period of 100 days with t = 1 h.
126
Dynamics of financial markets, volatility, and option pricing 100
f (x, t)
10−1
10−2
10−3 −5
0 x(t) = log (p(t)/p)
5
Figure 6.3. The histogram for the relative price increments for the Deutsche Mark for a period of 100 days with t = 0.5 h.
Suppose that the price of an asset moves from p0 to p(t) in time t. Then we assume that the variable x = ln( p(t)/ p0 ) is distributed with density γ (x−␦) , x <␦ Ae (6.11) f (x, t) = −ν(x−␦) , x >␦ Be Here, ␦, γ and ν are the parameters that define the distribution. Normalization of the probability to unity yields B A + =1 γ ν
(6.12)
The choice of normalization coefficients A and B is not unique. For example, one could take A = B, or one could as well take A = γ /2 and B = ν/2. Instead, for reasons of local conservation of probability explained in Section 6.2 below, we choose the normalization A B = 2 (6.13) 2 ν γ With this choice we obtain γ2 γ +ν (6.14) ν2 B= γ +ν and probability will be conserved in the model dynamics introduced in Section 6.2. A=
An empirical model of option pricing
127
Note that the density of the variable y = p(t)/ p0 has fat tails in price p, Ae−γ ␦ y γ −1 , y < e␦ g(y, t) = (6.15a) Be−γ ␦ y −ν−1 , y > e␦ where g(y, t) = f (x, t)dx/dy. The exponential distribution describes only intraday trading for small to moderate returns x. The empirical distribution has fat tails for very large absolute values of x. The extension to include fat tails in returns x is presented in Section 6.3 below. Typically, a large amount of data is needed to get a definitive form for the histograms as in Figures 6.1–6.3. With smaller amounts of data it is generally impossible to guess the correct form of the distribution. Before proceeding let us describe a scheme to deduce that the distribution is exponential as opposed to normal or truncated symmetric Levy. The method is basically a comparison of mean and standard deviation for different regions of the distribution. We define ∞ 1 B x+ = x f (x, t)dx = ␦+ (6.16) ν ν ␦
to be the mean of the distribution for x > ␦ ␦ x− = −∞
1 A ␦− x f (x, t)dx = γ γ
(6.17)
as the mean for that part with x < ␦. The mean of the entire distribution is x = ␦
(6.18)
The analogous expressions for the mean square fluctuation are easily calculated. The variance σ 2 for the whole is given by σ 2 = 2(γ ν)−1
(6.19)
With t = 0.5 − 4 h, γ and ν are on the order of 500 for the time scales t of data analyzed here. Hence the quantities γ and ν can be calculated from a given set of data. The average of x is generally small and should not be used for comparisons, but one can check if the relationships between the quantities are valid for the given distribution. Their validity will give us confidence in the assumed exponential distribution. The two relationships that can be checked are σ 2 = σ+2 + σ−2 and σ+ + σ− = x+ + x− . Our histograms do not include extreme values of x where f decays like a power of x (Dacorogna et al., 2001), and we also do not discuss results from trading on time scales t greater than one day.
128
Dynamics of financial markets, volatility, and option pricing
Assuming that the average volatility obeys σ 2 = (x − x)2 = ct 2H
(6.20)
where H = O(1/2) and c is a constant, we see that the fat-tailed price exponents in (6.11) decrease with increasing time, γ = 1/b t H
(6.21)
ν = 1/bt H
(6.22)
and
where b and b are constants. In our data analysis we find that the exponential distribution spreads consistent with 2H = O(1), but whether 2H ≈ 1, 0.9, or 1.1, we cannot determine with any reliable degree of accuracy. We will next see that the divergence of γ and ν as t vanishes is absolutely necessary for correct option pricing near the strike time. In addition, only the choice H = 1/2 is consistent with our assumption in Section 6.2 of a Markovian approximation to the dynamics of very liquid markets. The exponential distribution will be shown to be Markovian in Section 6.2.4. 6.1.3 Pricing options using the empirical distribution Our starting point for option pricing is the assumption that the call prices are given by averaging over the final option price max( pT − K , 0), where x = ln pT / p, with the exponential density C(K , p, t) = e−rd t ( pT − K )ϑ( pT − K ) =e
−rd t
∞ ( pex − K ) f (x, t)dx
(6.23)
ln(K / p)
but with money discounted at rate rd from expiration time T back to observation time t. Puts are given by P(K , p, t) = e−rd t (K − pT )ϑ(K − pT ) =e
−rd t
ln(K / p)
(K − pex ) f (x, t)dx
(6.24)
−∞
where f (x, t) is the empirical density of returns, which we approximate next as exponential. Here, p0 is the observed asset price at time t and the strike occurs at time T , where t = T − t.
An empirical model of option pricing
129
In order to determine ␦ empirically we will impose the traders’ assumption that the average stock price increases at the rate of cost of carry rd (meaning the risk-free interest rate r0 plus a few percentage points), where p(t) = p0 e
µ dt
= p0 erd t
(6.25a)
is to be calculated from the exponential distribution. The relationship between µ, µ , and ␦ is presented in Section 6.2.4 below. For the exponential density of returns we find that the call price of a strike K at time T is given for x K = ln(K / p) < ␦ by C(K , p, t)erd t =
pe Rt γ 2 (ν − 1) + ν 2 (γ + 1) (γ + ν) (γ + 1)(ν − 1) Kγ K −Rt γ + e −K (γ + 1)(γ + ν) p
(6.26)
where p0 is the asset price at time t, and A and ␦ are given by (6.14) and (6.25b). For x K > ␦ the call price is given by C(K , p, t)erd t =
ν K γ +νν−1
K −Rt e p
−ν (6.27)
Observe that, unlike in the standard Black–Scholes theory, these expressions and their derivatives can be calculated explicitly. The corresponding put prices are given by rd t
P(K , p, t)e
Kγ = (γ + ν)(γ + 1)
K −Rt e p
γ (6.28)
for x K < ␦ and by per Rt γ 2 (ν − 1) + ν 2 (γ + 1) (γ + ν) (γ + 1)(ν − 1) K Rt −ν Kν e + (ν + γ )(ν − 1) p
P(K , p, t)erd t = K −
(6.29)
for x K > ␦. Note that the backward-time initial condition at expiration t = T, C = max( p − K , 0) = ( p − K )ϑ( p − K ), is reproduced by these solutions as γ and ν go to infinity, and likewise for the put. To see how this works, just use this limit with the density of returns (6.15) in (6.23) and (6.24). We see that f (x, t) peaks sharply at x = ␦ and is approximately zero elsewhere as t approaches T. A standard
130
Dynamics of financial markets, volatility, and option pricing
largest-term approximation (see Watson’s lemma in Bender and Orszag, 1978) in (6.23) yields ␦ rd ∆t ␦ ␦ ≈ ( p0 e − K )ϑ( p0 e − K ) p− (x, t)dx Ce K
+ ( p0 e␦ − K )ϑ( p0 e␦ − K )
∞ p+ (x, t)dx ␦
␦
␦
= ( p0 e − K )ϑ( p0 e − K ) ≈ ( p0 − K )ϑ( p0 − K )
(6.30)
as ␦ vanishes. For x K > ␦ we get C = 0 whereas for x K < ␦ we retrieve C = ( p − K ), as required. Therefore, our pricing model recovers the initial condition for calls at strike time T , and likewise for the puts. We show next how ␦(t) is to be chosen. We calculate the average rate of gain rd in (6.25a) from the exponential distribution to obtain γ ν + (ν − γ ) 1 1 rd = µ (t)dt = ␦ + ln (6.25b) t t (γ + 1)(ν − 1) We assume that rd is the cost of carry, i.e. rd exceeds the risk-free interest rate r0 by a few percentage points. This is the choice made by traders. We will say more about ␦ in the dynamical model of Section 6.2. All that remains empirically is to estimate the two parameters γ and ν from data (we do not attempt to determine b, b and H empirically here). We outline a scheme that is useful when the parameters vary in time. We assume that the options close to the money are priced correctly, i.e. according to the correct frequency of occurrence. Then by using a least squares fit we can determine the parameters γ and ν. We typically use six option prices to determine the parameters, and find the root mean square (r.m.s.) deviation is generally very small; i.e. at least for the options close to the money, the expressions (6.26)–(6.29) give consistent results (see Figure 6.4). Note that, when fitting, we use the call prices for the strikes above the future and put prices for those below. These are the most often traded options, and hence are more likely to be traded at the “correct” price. Table 6.1 shows a comparison of the results with actual prices. The option prices shown are for the contract US89U whose expiration day was August 18, 1989 (the date at which this analysis2 was performed). The second column shows the endof-day prices for options (C and P denote calls and puts respectively), on May 3, 1989 with 107 days to expiration. The third column gives the equivalent annualized implied volatilities assuming Black–Scholes theory. The values of γ and ν are 2
The data analysis was performed by Gemunu Gunaratne (1990a) while working at Tradelink Corp.
An empirical model of option pricing
131
0.15
Implied volatility
0.14
0.13
0.12
0.11
0.1
0.09 70
80
90 Strike price
100
110
Figure 6.4. The implied volatilities of options compared with those using equations (6.26)–(6.29) (broken line). This plot is made in the spirit of “financial engineering.” The time evolution of γ and ν is described by equations (6.21) and (6.22), and a fine-grained description of volatility is presented in the text.
estimated to be 10.96 and 16.76 using prices of three options on either side of the futures price 89.92. The r.m.s. deviation for the fractional difference is 0.0027, suggesting a good fit for six points. Column 4 shows the prices of options predicted by equations (6.26)–(6.29). We have taken into account the fact that options trade in discrete ticks, and have chosen the tick price by the number larger than the actual price. We have added a price of 0.5 ticks as the transaction cost. The fifth column gives the actual implied volatilities from the Black–Scholes formulae. Columns 2 and 4, as well as columns 3 and 5, are almost identical, confirming that the options are indeed priced according to the proper frequency of occurrence in the entire range. The model above contains a flaw: option prices can blow up and then go negative at extremely large times t where ν ≤ 1 (the integrals (6.23) and (6.24) diverge for ν = 1). But since the annual value of ν is roughly 10, the order of magnitude of the time required for divergence is about 100 years. This is irrelevant for trading. More explicitly, ν = 540 for 1 h, 180 for a day (assuming 9 trading hours per day) and 10 for a year, so that we can estimate roughly that b ≈ 1/540 h1/2 . We now exhibit the dynamics of the exponential distribution. Assuming Markovian dynamics (stochastic differential equations) requires H = 1/2. The dynamics of exponential returns leads inescapably to a dynamic theory of local
132
Dynamics of financial markets, volatility, and option pricing
Table 6.1. Comparison of an actual price distribution of options with the results given by (6.26)–(6.29). See the text for details. The good agreement of columns 2 and 4, as well as columns 3 and 5, confirms that the options are indeed priced according to the distribution of relative price increments Strike price and type
Option price
Implied volatility
Computed option price
Computed implied volatilities
76 P 78 P 80 P 82 P 84 P 86 P 88 P 90 P 92 P 94 C 96 C 98 C 100 C 102 C 104 C
0.047 0.063 0.110 0.172 0.313 0.594 1.078 1.852 3.000 0.469 0.219 0.109 0.047 0.016 0.016
0.150 0.136 0.128 0.116 0.109 0.104 0.100 0.095 0.093 0.093 0.094 0.098 0.100 0.098 0.109
0.031 0.047 0.093 0.172 0.297 0.594 1.078 1.859 2.984 0.469 0.219 0.109 0.063 0.031 0.016
0.139 0.129 0.128 0.117 0.108 0.104 0.100 0.096 0.093 0.093 0.094 0.098 0.104 0.106 0.109
volatility D(x, t), in contrast with the standard theory where D is treated as a constant. 6.2 Dynamics and volatility of returns 6.2.1 Introduction In this section we will generalize stochastic market dynamics to include exponential and other distributions of returns that are volatile and therefore are far from Gaussian. We will see that the exponentially distributed returns density f (x, t) cannot be reached perturbatively by starting with a Gaussian returns density, because the required perturbation is singular. We will solve an inverse problem to discover the diffusion coefficient D(x, t) that is required to describe the exponential distribution, with global volatility σ 2 ∼ ct at long times, from a Fokker–Planck equation. After introducing the exponential model, which describes intraday empirical returns excepting extreme values3 of x, we will also extend the diffusion coefficient D(x, t) to include the fat tails that describe extreme events in x. For extensive 3
Traders, many of whom operate by the seats of their pants using limited information, apparently do not usually worry about extreme events when pricing options on time scales less than a day. This is indicated by the fact that the exponential distribution, which is fat tailed in price p but not in returns x, prices options correctly.
Dynamics and volatility of returns
133
empirical studies of distributions of returns we refer the reader to the book by Dacorogna et al. (2001). Most other empirical and theoretical studies, in contrast, have used price increments but that variable cannot be used conveniently to describe the underlying market dynamics model. Some of the more recent data analyses by econophysicists are discussed in Chapter 8.
6.2.2 Local vs global volatility The general theory of volatility of fat-tailed returns distributions with H = 1/2 can be formulated as follows. Beginning with a stochastic differential equation (6.31) dx = (µ − D(x, t)/2)dt + D(x, t)dB(t) where B(t) is a Wiener process, dB = 0, dB 2 = dt, and x = ln( p(t)/ p0 ) where p0 = p(t0 ). In what follows let R(x, t) = µ − D(x, t)/2. The solution of (6.31) is given by iterating the stochastic integral equation t+t
R(x(s), t)ds + (D(x, t))1/2 • B
x =
(6.32)
t
√ Iteration is possible whenever both R and D satisfy a Lipshitz condition. The last term in (6.32) is the Ito product defined by the stochastic integral t+t
b • B =
b(x(s), s)dB(s)
(6.33)
t
Forming x 2 from (6.32) and averaging, we obtain the conditional average ⎞2 t+t ⎛ t+t 2 x = ⎝ R(x(s), s)ds ⎠ + D(x(s), s)ds t
t
t
t
⎞2 t+t ∞ ⎛ t+t ⎝ ⎠ + = R(x(s), s)ds D(z, s)g(z, s|x, t)dzds (6.34) −∞
where g satisfies the Fokker–Planck equation 1 (6.35) g˙ = −(Rg) + (Dg) 2 corresponding to the sde (6.31) and is the transition probability density, the Green function of the Fokker–Planck equation. Next, we will discuss the volatility of the underlying stochastic process (6.31).
134
Dynamics of financial markets, volatility, and option pricing
For very small time intervals t = s − t the conditional probability g is approximated by its initial condition, the Dirac delta function ␦(z − x), so that we obtain the result t+t
x 2 ≈
D(x(t), s)ds ≈ D(x(t), t)t
(6.36)
t
which is necessary for the validity of the Fokker–Planck equation as t vanishes. Note that we would have obtained exactly the same result by first iterating the stochastic integral equation (6.32) one time, truncating the result, and then averaging. In general the average or global volatility is given by ⎞2 ⎛ t+t R(x(s), s)ds ⎠ σ 2 = x 2 − x2 = ⎝ t t+t ∞
+ t
⎛ t+t
D(z, s)g(z, s|x, t)dzds − ⎝
−∞
⎞ 2 R(x(s), s)ds ⎠
(6.37)
t
Again, at very short times t we obtain from the delta function initial condition approximately that σ 2 ≈ D(x(t), t)t
(6.38)
so that it is reasonable to call D(x, t) the local volatility. Our use of the phrase local volatility should not be confused with any different use of the same phrase in the financial engineering literature. In particular, we make no use at all of the idea of “implied volatility.” The t dependence of the average volatility at long times is model dependent and the underlying stochastic process is nonstationary. Our empirically based exponential returns model obeys the empirically motivated condition σ 2 = x 2 − x2 ∝ t
(6.39)
at large times t. In this section, we have shown how to formulate the dynamic theory of volatility for very liquid markets. The formulation is in the spirit of the Black–Scholes approach but goes far beyond it in freeing us from reliance on the Gaussian returns model as a starting point for analysis. From our perspective the Gaussian model is merely one of many simple possibilities and is not relied on as a zeroth-order approximation to the correct theory, which starts instead at zeroth order with exponential returns.
Dynamics and volatility of returns
135
6.2.3 Dynamics of the exponential distribution In our statistical theory of returns of very liquid assets (stock, bond or foreign exchange, for example) we begin with the stochastic differential equation (6.40) dx = R(x, t)dt + D(x, t)dB(t) The corresponding Fokker–Planck equation, describing local conservation of probability, is 1 f˙ = −(R f ) + (D f ) 2 with probability current density
(6.41)
1 j = R f − (D f ) (6.42) 2 We also assume (6.22) with H = 1/2 because otherwise there is no Fokker–Planck equation. The exponential density (6.11) is discontinuous at x = ␦. The solutions below lead to the conclusion that R(x, t) is continuous across the discontinuity, and that D(x, t) is discontinuous at x = ␦. In order to satisfy conservation of probability at the discontinuity at x = ␦ it is not enough to match the current densities on both sides of the jump. Instead, we have to use the more general condition ⎛ ␦ ⎞ ∞ 1 d ⎝ ˙ ⎠ f − (x, t)dx + f + (x, t)dx = (R − ␦) f − (D f ) |␦ = 0 (6.43) dt 2 −∞
␦
The extra term arises from the fact that the limits of integration ␦ depend on the time. In differentiating the product Df while using f (x, t) = ϑ(x − ␦) f + + ϑ(␦ − x) f −
(6.44)
which is the same as (6.11), and D(x, t) = ϑ(x − ␦)D+ + ϑ(␦ − x)D−
(6.45)
we obtain a delta function at x = ␦. The delta function has vanishing coefficient if we choose D+ f + = D − f −
(6.46)
at x = ␦. Note that we do not assume the normalization (6.14) here. The condition (6.46), along with (6.12), determines the normalization coefficients A and B once we know both pieces of the function D at x = ␦. In addition, there is the extra
136
Dynamics of financial markets, volatility, and option pricing
condition on ␦, ˙ f |␦ = 0 (R − ␦)
(6.47)
With these two conditions satisfied, it is an easy calculation to show that equation (3.124b) for calculating averages of dynamical variables also holds. We next solve the inverse problem: given the exponential distribution (6.11) with (6.12) and (6.46), we will use the Fokker–Planck equation to determine the diffusion coefficient D(x, t) that generates the distribution dynamically. In order to simplify solving the inverse problem, we assume that D(x, t) is linear in ν(x − ␦) for x > ␦, and linear in γ (␦ − x) for x < ␦. The main question is whether the two pieces of D(␦, t) are constants or depend on t. In answering this question we will face a nonuniqueness in determining the local volatility D(x, t) and the functions γ and ν. That nonuniqueness could only be resolved if the data would be accurate enough to measure the t-dependence of both the local and global volatility accurately at very long times, times where γ and ν are not necessarily large compared with unity. However, for the time scales of interest, both for describing the returns data and for pricing options, the time scales are short enough that the limit where γ , ν 1 holds to good accuracy. In this limit, all three solutions to be presented below cannot be distinguished from each other empirically, and yield the same option pricing predictions. The nonuniqueness will be discussed further below. To begin, we assume that d+ (1 + ν(x − ␦)), x > ␦ (6.48) D(x, t) = d− (1 + γ (␦ − x)), x < ␦ where the coefficients d+ , d− may or may not depend on t. Using the exponential density (6.11) and the diffusion coefficient (6.48) in the Fokker–Planck equation (6.41), and assuming first that R(x, t) = R(t) is independent of x, we obtain from equating coefficients of powers of (x − ␦) that d+ 3 ν 2 d− γ˙ = − γ 3 2
ν˙ = −
(6.49)
and also the equation R = d␦/dt. Assuming that d+ = b2 = constant, d− = b2 = constant (thereby enforcing the normalization (6.14)) and integrating (6.49), we obtain √ ν = 1/b t − t0 (6.50) √ γ = 1/b t − t0
Dynamics and volatility of returns
The diffusion coefficient then has the form 2 b (1 + ν(x − ␦)), D(x, t) = b2 (1 − γ (x − ␦)),
x >␦ x <␦
137
(6.51)
This is the solution that we used to price options in Section 6.1.3 and was derived by Gunaratne and McCauley (2003) by using a “Galilean invariance” argument. Unfortunately, this solution cannot be brought into exact agreement with risk-neutral option pricing by any parameter choice, as we will show in the next section by deriving the pde that can be used to price options “locally risk free.” Therefore, we present two other solutions, where we use the x-dependent drift coefficient R(x, t) = µ(t) − D(x, t)/2 in (6.41), so that both µ and D are discontinuous across the jump because R can be taken to be continuous there. We therefore next solve the inverse problem for the Fokker–Planck equation 1 f˙ = −((µ(t) − D(x, t)/2) f ) + (D(x, t) f ) 2 where the corresponding price sde is √ d p = pµ(t)dt + p DdB
(6.52)
(6.53)
Substituting (6.11) and (6.48) into the Fokker–Planck equation (6.52) and equating coefficients of powers of x − ␦, we obtain d+ 2 ν (ν − 1) 2 d− γ˙ = − γ 2 (γ + 1) 2 ν˙ = −
(6.54a)
and ˙ ˙ = B + 1 d+ ν B (µ+ − ␦)B ν 2 ˙ A ˙ = − − 1 d− γ A (µ− − ␦)A γ 2
(6.54b)
Combined with differentiating (6.12), (6.54b) can be used to show that (6.47) is satisfied nontrivially, so that d␦/dt is not overdetermined. Either of the equations (6.54b) can be used to determine ␦, where the two functions µ± (t) are to be determined by imposing the cost of carry condition (6.25b) on ␦. So far, no assumption has been made about the form of A and B. There are two possibilities. If we assume (6.51), so that the normalization (6.14) holds, then we obtain that 1 b2 1 + ln 1 − = − (t − t0 ) (6.55) ν ν 2
138
Dynamics of financial markets, volatility, and option pricing
and also get an analogous equation for γ . When γ , ν 1, then to good accuracy we recover (6.50), and we again have the first solution presented above. This solution would permit an equilibrium, with drift subtracted, as γ , ν approach unity, but at times so ridiculously large (on the order of 100 years) as to be uninteresting for typical trading. The second possibility is that (6.49) and (6.50) hold. In this case we have ⎧ 2 ν ⎪ (1 + ν(x − ␦)), x > ␦ ⎨b ν−1 (6.56) D(x, t) = γ ⎪ ⎩ b2 (1 − γ (x − ␦)), x < ␦ γ +1 but the normalization is not given by (6.14). However, for γ , ν 1, which is the only case of practical interest, we again approximately recover the first solution presented above, with the normalization given approximately by (6.14), so that options are priced approximately the same by all three different solutions, to within good accuracy. In reality, there is an infinity of possible solutions because there is nothing in the theory to determine the functions d± (t). In practice, it would be necessary to measure the diffusion coefficient and thereby determine d± , γ , and ν from the data. Then, we could use the measured functions d± (t) to predict γ (t) and ν(t) via (6.49) and (6.54) and compare those results with measured values. That one meets nonuniqueness in trying to deduce dynamical equations from empirical data is well known from deterministic nonlinear dynamics, more specifically in chaos theory where a generating partition (McCauley, 1993) exists, so it is not a surprise to meet nonuniqueness here as well. The problem in the deterministic case is that to know the dynamics with fairly high precision one must first know the data to very high precision, which is generally impossible. The predictions of distinct chaotic maps like the logistic and circle maps cannot be distinguished from each other in fits to fluid dynamics data at the transition to turbulence (see Ashvin Chhabra et al., 1988). A seemingly simple method for the extraction of deterministic dynamics from data by adding noise was proposed by Rudolf Friedrichs et al. (2000), but the problems of nonuniqueness due to limited precision of the data are not faced in that interesting paper. An attempt was made by Christian Renner et al. (2001) to extract µ and D directly from market data, and we will discuss that interesting case in Chapter 8. In contrast with the theory of Gaussian returns, where D(x, t) = constant, the local volatility (6.51) is piecewise-linear in x. Local volatility, like returns, is exponentially distributed with density h(D) = f (x)dx/dD, but yields the usual Brownian-like mean square fluctuation σ 2 ≈ ct on the average on all time scales of practical interest. But from the standpoint of Gaussian returns the volatility (6.51)
Dynamics and volatility of returns
139
must be seen as a singular perturbation: a Gaussian would follow if we could ignore the term in D(x, t) that is proportional to x − ␦, but the exponential distribution doesn’t reduce to a Gaussian even for small values of x − ␦! There is one limitation on our predictions. Our exponential solution of the Fokker–Planck equation using either of the diffusion coefficients written down above assumes the initial condition x = 0 with x = ln p(t)/ p0 , starting from an initial price p0 = p(t0 ). Note that the density peaks (discontinuously), and the diffusion coefficient is a minimum (discontinuously), at a special price P = p0 e␦ corresponding to x = ␦. We have not studied the time evolution for more general initial conditions where x(t0 ) = 0. That case cannot be solved analytically in closed form, so far as we know. One could instead try to calculate the Green function for an arbitrary initial condition x numerically via the Wiener integral. In the Black–Scholes model there are only two free parameters, the constants µ and σ . The model was easily falsified, because for no choice of those two constants can one fit the data for either the market distribution or option prices correctly. In the exponential model there are three constants µ, b, and b . For option pricing, the parameter µ(t) is determined by the condition (6.25b) with r the cost of carry. Only the product bb is determined by measuring the variance σ , so that one parameter is left free by this procedure. Instead of using the mean square fluctuation (6.18) to fix bb , we can use the right and left variances σ+ and σ− to fix b and b separately. Therefore, there are no undetermined parameters in our option pricing model. We will show next that the delta hedge strategy, when based on a nontrivial local volatility D(x, t), is still instantaneously “risk free,” just as in the case of the Osborne–Black–Scholes–Merton model of Gaussian returns, where D = constant. We will also see that solutions of the Fokker–Planck equation (6.52) are necessary for risk-neutral option pricing.
6.2.4 The delta hedge strategy with volatility Given the diffusion coefficient D(x, t) that reproduces the empirical distribution of returns f (x, t), we can price options “risk neutrally” by using the delta hedge. The delta hedge portfolio has the value Π = −w + w p
(6.57)
where w( p, t) is the option price. The instantaneous return on the portfolio is dΠ −dw + w d p = Πdt (−w + pw )dt
(6.58)
140
Dynamics of financial markets, volatility, and option pricing
We can formulate the delta hedge in terms of the returns variable x. Transforming to returns x = ln p/ p0 , the delta hedge portfolio has the value Π = −u + u
(6.59)
where u(x, t)/ p = w( p, t) is the price of the option. If we use the sde (6.31) for x(t), then the portfolio’s instantaneous return is (by Ito calculus) given by dΠ −(u˙ − u D/2) − u D/2 = Πdt (−u + u )
(6.60)
and is deterministic, because the stochastic terms O(dx) have cancelled. Setting r = dΠ/Π dt we obtain the equation of motion for the average or expected option price u(x, t) as r u = u˙ + (r − D/2)u +
D u 2
(6.61)
With the simple transformation t
u=e
T
r (s)ds
v
(6.62)
equation (6.61) becomes D v (6.63) 2 Note as an aside that if the Fokker–Planck equation does not exist due to the nonvanishing of higher moments, in which case the master equation must be used, then the option pricing pde (6.61) also does not exist for exactly the same reason. The pde (6.63) is the same as the backward-time equation, or Kolmogorov equation,4 corresponding to the Fokker–Planck equation (6.52) for the market density of returns f if we choose µ = r in the latter. With the choice µ = r , both pdes have exactly the same Green function so that no information is provided by solving the option pricing pde (6.61) that is not already contained in the solution f of the Fokker–Planck equation (6.52). Therefore, in order to bring the “expected price,” option pricing formulae (6.8) and (6.9) into agreement with the delta hedge, we see that it would be necessary to choose µ = rd = r in (6.8) and (6.9) in order to make those predictions risk neutral. We must still discuss how we would then choose r , which is left undetermined by the delta hedge condition. Let r denote any rate of expected portfolio return (r may be constant or may depend on t). Calculation of the mean square fluctuation of the quantity 0 = v˙ + (r − D/2)v +
4
See Appendix A or Gnedenko (1967) for a derivation of the backward-time Kolmogorov equation.
Dynamics and volatility of returns
141
(dΠ/Πdt − r ) shows that the hedge is risk free to O(dt), whether or not D(x, t) is constant or variable, and whether or not the portfolio return r is chosen to be the risk-free rate of interest. Practical examples of so-called risk-free rates of interest r0 are provided by the rates of interest for the money market, bank deposits, CDs, or US Treasury Bills, for example. So we are left with the important question: what is the right choice of r in option pricing? An application of the no-arbitrage argument would lead to the choice r = r0 . Finance theorists treat the formal no-arbitrage argument as holy (Baxter and Rennie, 1995), but physicists know that every proposition about the market must be tested and retested. We do not want to fall into the unscientific position of saying that “the theory is right but the market is imperfect.” We must therefore pay close attention to the traders’ practices because traders are the closest analog of experimenters that we can find in finance5 (they reflect the market). The no-arbitrage argument assumes that the portfolio is kept globally risk free via dynamic rebalancing. The delta hedge portfolio is instantaneously risk free, but has finite risk over finite time intervals t unless continuous time updating/rebalancing is accomplished to within observational error. However, one cannot update too often (this is, needless to say, quite expensive owing to trading fees), and this introduces errors that in turn produce risk. This risk is recognized by traders, who do not use the risk-free interest rate for rd in (6.8) and (6.9) (where rd determines µ (t) and therefore µ), but use instead an expected asset return rd that exceeds r0 by a few percentage points (amounting to the cost of carry). The reason for this choice is also theoretically clear: why bother to construct a hedge that must be dynamically balanced, very frequently updated, merely to get the same rate of return r0 that a money market account or CD would provide? This choice also agrees with historic stock data, which shows that from 1900 to 2000 a stock index or bonds would have provided a better investment than a bank savings account.6 Risk-free combinations of stocks and options only exist in finance theory textbooks, but not in real markets. Every hedge is risky, as the catastrophic history of the hedge fund Long Term Capital Management so vividly illustrates. Also, were the no-arbitrage argument true then agents from 1900 to 2000 would have sold stocks and bonds, and bid up the risk-free interest rate so that stocks, bonds and bank deposits would all have yielded the same rate of gain. We now present some details of the delta hedge solution. Because we have ␦˙ = r − D(␦, t)/2 5 6
(6.64)
Fischer Black remained skeptical of the no-arbitrage argument (Dunbar, 2000). In our present era since the beginning of the collapse of the bubble and under the current neo-conservative regime in Washington, it would be pretty risky to assume positive stock returns over time intervals on the order of a few years.
142
Dynamics of financial markets, volatility, and option pricing
with
D(␦, t) ≈
b2 , b2 ,
x >␦ x <␦
(6.65)
we must take r (t) (and also µ(t)) to be discontinuous at ␦ as well. The value of r is then fixed by the condition (6.25b) for the cost of carry rd , but with the choice µ = r in the formula, the solution for a call with ln(K / p) < ␦, for example, will then have the form ␦
C(K , p, t) = e−r− t
( pex − K ) f − (x, t)dx ln(K / p)
−r+ t
∞ ( pex − K ) f + (x, t)dx
+e
(6.66)
␦
where t = T − t, and so differs from our “intuited” formulae (6.23) and (6.24) by having two separate discounting factors for the two separate regions divided by the singular point x = ␦. Note, finally, that because the singular point P = p0 e␦ of the price distribution evolves deterministically, we could depart from the usual no-arbitrage argument to assert that we should identify ␦ = r0 t, where r0 is the risk-free interest rate. This would fix the cost of carry rd in (6.25b) completely theoretically, with the extra percentage points above the risk-free interest rate being determined by the logarithmic term on the right-hand side. The weakness in this argument is that it requires µ > 0 and ␦ > 0, meaning that expected asset returns are always positive, which is not necessarily the case. Extreme returns, large values of x where the empirical density obeys f (x, t) ≈ x −µ , cannot be fit by using the exponential model. We show next how to modify (6.48) to include fat tails in x perturbatively.
6.2.5 Scaling exponents and extreme events The exponential density (6.11) rewritten in terms of the variable y = p/ p(0) f˜ (y, t) = f (lny, t)/y
(6.15b)
exhibits fat-tail scaling with time-dependent tail price exponents γ − 1 and ν + 1. These tail exponents become smaller as t increases. However, trying to rewrite the dynamics in terms of p or p rather than x would lead to excessively complicated equations, in contrast with the simplicity of the theory above written in terms of the returns variable x. From our standpoint the scaling itself is neither useful nor
Dynamics and volatility of returns
143
F(u)
10 0
10 −1
10 −2 −5
0 u
5
Figure 6.5. The exponential distribution F(u) = f (x, t) develops fat tails in returns 1/2 2 ) ) is included in the diffusion coefx when a quadratic term O(((x − Rt)/t √ ficient D(x, t). Here, u = (x − Rt)/ t.
important in applications like option pricing, nor is it helpful in understanding the underlying dynamics. In fact, concentrating on scaling would have sidetracked us from looking in the right direction for the solution. We know that for extreme values of x the empirical density is not exponential but has fat tails (see Figure 6.5). This can be accounted for in our model above by including a term (x − δ)2 /t in the diffusion coefficient, for example D(x, t) ≈ b2 (1 + ν(x − ␦) + ε(ν(x − ␦))2 ),
x >␦
(6.67)
and likewise for x < ␦. The parameter ε is to be determined by the observed returns tail exponent µ, so that (6.67) does not introduce a new undetermined parameter into the otherwise exponential model. With f ≈ x −µ for large x, µ is nonuniversal and 4 ≤ µ ≤ 7 is observed. Option pricing during normal markets, empirically seen, apparently does not require the consideration of fat tails in x because we have fit the observed option prices accurately by taking ε = 0. However, the refinement based on (6.67) is required for using the exponential model to do Value at Risk (VaR), but in that case numerical solutions of the Fokker–Planck equation are required. But what about option pricing during market crashes, where the expected return is locally large and negative over short time intervals? We might think that we could include fluctuations in price somewhat better by using the Fokker–Planck equation for u based on the Ito equation for du, which is easy enough to write down, but this sde depends on the derivatives of u. Also, it is subject to the same (so far unstated)
144
Dynamics of financial markets, volatility, and option pricing
liquidity assumptions as the Ito equation for dx. The liquidity bath assumption is discussed in Chapter 7. In other words, it is not enough to treat large fluctuations via a Markov process; the required underlying liquidity must also be there. Otherwise the “heat bath” that is necessary for the validity of stochastic differential equations is not provided by the market.
6.2.6 Interpolating singular volatility We can interpolate from exponential to Gaussian returns with the following volatility, b2 (1 + ν(x − ␦))2−α , x > ␦ (6.68) D(x, t) = b2 (1 − γ (x − ␦))2−α , x < ␦ where 1 ≤ α ≤ 2 is constant. We do not know which probability density solves the local probability conservation equation (6.41) to lowest order with this diffusion coefficient, except that it is not a simple stretched exponential of the form −(ν(x−␦))α , x >␦ Be (6.69) f (x, t) = −(γ (δ−x))α , x <␦ Ae However, whatever is the probability density for (6.68) it interpolates between exponential and Gaussian returns, with one proviso. In order for this claim to make sense we would have to retrieve ∞ D+ = b
(ν(x − ␦))2−α f (x, t)dx = b2 n
2
(6.70)
␦
where n is independent of t, otherwise this could lead to fractional Brownian motion, violating our assumption of a Markov process.
6.3 Option pricing via stretched exponentials Although we do not understand the dynamics of the stretched exponential density (6.69) we can still use it to price options, if the need should arise empirically. First, using the integration variable z = (ν(x − ␦))α
(6.71)
dx = ν −1 z 1/α−1 dz
(6.72)
and correspondingly
Appendix A. The first Kolmogorov equation
145
we can easily evaluate all averages of the form n z +=A
∞
α
(ν(x − ␦))nα e−(ν(x−␦)) dx
(6.73)
␦
where n is an integer. We next estimate the prefactors A and B from normalization, but without any dynamics. For example, A=
1 γν γ + ν Γ (1/α)
(6.74)
where Γ (ζ ) is the Gamma function, and x+ = ␦ −
1 Γ (2/α) ν Γ (1/α)
(6.75)
Calculating the mean square fluctuation is equally simple, but without an underlying dynamics we cannot assert a priori that H = 1/2 when 1 < α < 2, although we suspect that it is true. Option pricing for α = 1 leads to integrals that must be evaluated numerically. For example, the price of a call with x K > ␦ is ⎞ ⎛ ∞ A −1 1/α C(K , p, t) = e−rd t ⎝eν␦ p eν z z 1/α−1 e−z dz − K Γ (1/α, z K )⎠ (6.76) ν zK
where z K = (ν(x K − ␦))α
(6.77)
and Γ (1/α, z K ) is the incomplete Gamma function. The average and mean square fluctuation are also easy to calculate. Retrieving initial data at the strike time follows as before via Watson’s lemma. Summarizing this chapter, we can say that it is possible to deduce market dynamics from empirical returns and to price options in agreement with traders by using the empirical distribution of returns. We have faced nonuniqueness in the deduction of prices and have shown that it doesn’t matter over all time scales of practical interest. Our specific prediction for the diffusion coefficient should be tested directly empirically, but that task is nontrivial. Appendix A. The first Kolmogorov equation We will show now that the Green function g for the Fokker–Planck equation, or second Kolmogorov equation, also satisfies a backward-time diffusion equation called the first Kolmogorov equation. We begin with the transition probability for
146
Dynamics of financial markets, volatility, and option pricing
a Markov process
g(x, t|x0 , t0 − t0 ) =
g(x, t|z, t0 )g(z, t0 |x0 , t0 − t0 )dz
(A1)
but with t0 > 0. Consider the identity g(x, t|x0 , t0 − t0 ) − g(x, t|x0 , t0 ) = (g(x, t|z, t0 ) − g(x, t|x0 , t0 ))g(z, t|x0 , t0 − t0 )dz
(A2)
Using the Taylor expansion ∂ g(x, t|x0 , t0 ) ∂ x0 2 ∂ g(x, t|x0 , t0 ) 1 + (z − x0 )2 + ··· 2 ∂ x02
g(x, t|z, t0 ) = g(x, t|x0 , t0 ) + (z − x0 )
(A3)
we obtain g(x, t|x0 , t0 − t0 ) − g(x, t|x0 , t0 ) t0 ∂g (z − x0 ) = g(z, t|x0 , t0 − ∆t0 )dz ∂ x0 t0 1 (z − x0 )2 ∂2g g(z, t|x0 , t0 − ∆t0 )dz + · · · + 2 t0 2 ∂ x0
(A4)
Assuming, as with the Fokker–Planck equation, that all higher moments vanish faster than t0 , we obtain the backward-time diffusion equation 0=
∂ g(x, t|x0 , t0 ) ∂g 1 ∂2g + R(x0 , t0 ) + D(x0 , t0 ) 2 ∂t0 ∂ x0 2 ∂ x0
(A5)
The same Green function satisfies the Fokker–Planck equation (6.41), because both were derived from (A1) by making the same approximations. With R = r − D/2, (A5) and the option pricing pde (6.56) coincide.
7 Thermodynamic analogies vs instability of markets
7.1 Liquidity and approximately reversible trading The question of whether a thermodynamic analogy with economics is possible goes back at least to von Neumann. We will attempt to interpret a standard hedging strategy by using thermodynamic ideas (McCauley, 2003a). The example provided by a standard hedging strategy illustrates why thermodynamic analogies fail in trying to describe economic behavior. We will see that normal trading based on the replicating, self-financing hedging strategy (Baxter and Rennie, 1995) provides us with a partial analogy with thermodynamics, where market liquidity of rapidly and frequently traded assets plays the role of the heat bath and the absence of arbitrage possibilities would have to be analogous to thermal equilibrium in order for the analogy to work. We use statistical physics to explain why the condition for “no-arbitrage” fails as an equilibrium analogy. In looking for an analogy with thermodynamics we will concentrate on three things, using as observable variables the prices and quantities held of financial assets: an empirically meaningful definition of reversibility, an analog of the heat bath, and the appearance of entropy as the measure of market disorder. We define an approximately reversible trade as one where you can reverse your buy or sell order over a very short time interval (on the order of a few seconds or ticks) with only very small percentage losses, in analogy with approximately reversible processes in laboratory experiments in thermodynamics. All that follows assumes that approximately reversible trading is possible although reversible trading is certainly the exception when orders are of large enough size. The notion of a risk-free hedge implicitly assumes adequate liquidity from the start. Several assumptions are necessary in order to formulate the analogy (see also Farmer, 1994). One is that transaction costs are negligible (no friction). Another is that the “liquidity bath” is large enough that borrowing the money, selling the call
147
148
Thermodynamic analogies vs instability of markets
and buying the stock are possible approximately instantaneously, meaning during a few ticks in the market, without affecting the price of either the stock or call, or the interest rate r. That is, the desired margin purchase is assumed to be possible approximately reversibly in real time through your discount broker on your Mac or PC. This will not be possible if the number of shares involved is too large, or if the market crashes. The assumption of “no market impact” (meaning adequate liquidity) during trading is an approximation that is limited to very small trades in a heavily-traded market and is easily violated when, for example, Deutsche Bank takes a very large position in Mexican Pesos or Swedish Crowns. Or as when Salomon unwound its derivatives positions in 1998 and left Long Term Capital Management holding the bag. Next, we introduce the hedging strategy. We will formulate the thermodynamic analogy in Section 7.3.
7.2 Replicating self-financing hedges In Section 6.2 we started with the delta hedge and derived the option pricing partial differential equation (pde). Next we observe that one can start with the replicating, self-financing hedging strategy and derive both the delta hedge and the option pricing pde. Approximately reversible trading is implicitly assumed in both cases. The option pricing partial differential equation is not restricted to the standard Black–Scholes equation when nontrivial volatility is assumed, as we know, but produces option pricing in agreement with the empirical distribution for the correct description of volatility in a Fokker–Planck description of fluctuations. Consider a dynamic hedging strategy (φ, ψ) defined as follows. Suppose you short a European call at price C( p, K , T − t), where K is the strike price and T the expiration time. To cover your bet that the underlying stock price will drop, you simultaneously buy φ shares of the stock at price p by borrowing ψ Euros from the broker (the stock is bought on margin, for example). In general, the strategy consists of holding φ( p, t) shares of stock at price p, a risky asset, and ψ( p, t) shares of a money market fund at initial price m = 1 Euro/share, a riskless asset (with fixed interest rate r) at all times t ≤ T during the bet, where T is the strike time. At the initial time t0 the call is worth C( p0 , t0 ) = φ0 p0 + ψ0 m 0
(7.1)
where m 0 = 1 Euro. This is the initial condition, and the idea is to replicate this balance at all later times t ≤ T without injecting any new money into the portfolio. Assuming that (φ, ψ) are twice differentiable functions (which would be needed
Replicating self-financing hedges
149
for a thermodynamics analogy), the portfolio is self-financing if, during dt, dφp + dψm = 0
(7.2)
dC = φd p + ψdm
(7.3)
so that
where dm = r mdt. In (7.3), dp is a stochastic variable, and p(t + dt) and C(t + dt) are unknown and random at time t when p(t) and C( p, t) are observed. Viewing C as a function of (p, m), equation (7.3) tells us that φ=
∂C ∂p
(7.4)
Note that this is the delta hedge condition. Next, we want the portfolio in addition to be “replicating,” meaning that the functional relationship C( p, t) = φ( p, t) p + ψ( p, t)m
(7.5)
holds for all later (p, t) up to expiration, and p is the known price at time t (for a stock purchase, we can take p to be the ask price). Equation (7.5) expresses the idea that holding the stock plus money market in the combination (φ, ψ) is equivalent to holding the call. The strategy (φ, ψ), if it can be constructed, defines a “synthetic call”: the call at price C is synthesized by holding a certain number φ > 0 shares of stock and ψ < 0 of money market at each instant t and price p(t). These conditions, combined with Ito’s lemma, predict the option pricing equation and therefore the price C of the call. An analogous argument can be made to construct synthetic puts, where covering the bet made by selling the put means shorting φ shares of the stock and holding ψ dollars in the money market. Starting with the stochastic differential equation (sde) for the stock price d p = R p pdt + σ ( p, t) pdB
(7.6a)
where B(t) defines a Wiener process, with dB = 0 and dB2 = dt, and using Ito’s lemma we obtain the stochastic differential equation dC = (C˙ + σ 2 p 2 C /2)dt + C d p
(7.7)
We use the empirically less reliable variable p here instead of returns x in order that the reader can better compare this presentation with discussions in the standard financial mathematics literature. Continuing, from (7.3) and Π = −ψm, because of the Legendre transform property, we have the sde dC = φd p − dΠ = φd p − r Πdt = φd p − r (−C + φd p)dt
(7.8)
150
Thermodynamic analogies vs instability of markets
Equating coefficients of d p and dt in (7.7) and (7.8) we obtain φ=
∂C ∂p
(7.9)
and also the option pricing partial differential equation (pde) ∂C σ 2 ( p, t) p 2 ∂ 2 C ∂C + + rC = −r p ∂t 2 ∂ p2 ∂p
(7.10)
where r = dΠ/Πdt. With ( p, t)-dependent volatility σ 2 ( p, t), the pde (7.10) is not restricted to Black–Scholes/Gaussian returns. The reader might wonder why he or she should bet at all if he or she is going to cover the bet with an expected gain/loss of zero. First, agents who for business reasons must take a long position in a foreign currency may want to hedge that bet. Second, a company like LTCM will try to find “mispricings” in bond interest rates and bet against them, expecting the market to return to “historic values.” Assuming that the B–S theory could be used to price options correctly, when options were “underpriced” you could go long on a call, when “overpriced” then you could go short a call. This is an oversimplification, but conveys the essence of the operation. Recalling our discussion of the M & M theorem in Chapter 4, LTCM used leveraging with the idea of a self-financing portfolio to approach the point where equity/debt is nearly zero. Some details are described in Dunbar (2000). A brokerage house, in contrast, will ideally try to hedge its positions so that it takes on no risk. For example, it will try to match calls that are sold to calls that are bought and will make its money from the brokerage fees that are neglected in our discussion. In that case leverage plays no role. The idea of leveraging bets, especially in the heyday of corporate takeovers based on the issuance of junk bonds, was encouraged by the M & M theorem. Belief in the M & M theorem encourages taking on debt quite generally. Corporate managers who fail to take on “enough” debt are seen as lax in their duty to shareholders.1 7.3 Why thermodynamic analogies fail Equations (7.2)–(7.5) define a Legendre transform. We can use this to analyze whether a formal thermodynamic analogy is possible, even though the market distribution of returns is not in equilibrium and even though our variables (C, p) are stochastic ones. If we would try to think of p and m as analogs of chemical potentials, then C in equation (7.1) is like a Gibbs potential (because (φ, ψ) are analogous to extensive 1
See Miller (1988), written in the junk bond heyday. For a description of junk bond financing and the explosion of derivatives in the 1980s, see Lewis (1989).
Why thermodynamic analogies fail
151
variables) and (7.2) is a constraint. One could just as well take p and m as analogous to any pair of intensive thermodynamic variables, like pressure and temperature. The interesting parts of the analogy are, first, that the assumption of adequate liquidity is analogous to the heat bath, and absence of arbitrage possibilities is expected to be analogous (but certainly not equivalent) to thermal equilibrium, where there are no correlations: one can not get something for nothing out of the heat bath because of the second law. Likewise, arbitrage is impossible systematically in the absence of correlations. In finance theory no arbitrage is called an “equilibrium” condition. We will now try to make that analogy precise and will explain precisely where and why it fails. First, some equilibrium statistical physics. In a system in thermal equilibrium with no external forces, there is spatial homogeneity and isotropy. The temperature T, the average kinetic energy, is in any case the same throughout the system independent of particle position. The same time-dependent energy fluctuations may be observed at any point in the system over long time intervals. Taking E = v 2 /2, the kinetic energy fluctuations can be described by a stochastic process derived from the S–U–O process (see Chapter 4) by using Ito calculus, √ (7.6b) dE = (−2β E + σ 2 /2)dt + σ 2EdB(t) with σ 2 = 2βkT , where k is Boltzmann’s constant. It’s easy to show that this process is asymptotically stationary for βt >>> 1, with equilibrium density f eq (E) =
1 e−E/kT √ Z E
(7.6c)
where Z is the normalization integral, the one-particle partition function (we consider a gas of noninteracting particles). If, in addition, there is a potential energy U (X ), where X is particle position then the equilibrium and nonequilibrium densities are not translationally invariant, but depend on location X. This is trivial statistical physics, but we can use it to understand what no arbitrage means physically, or geometrically. Now for the finance part. First, we can see easily that the no-arbitrage condition does not guarantee market equilibrium, which is defined by vanishing total excess demand for an asset. Consider two spatially separated markets with two different price distributions for the same asset. If enough traders go long in one market and short in the other, then the market price distributions can be brought into agreement. However, if there is positive excess demand for the asset then the average price of the asset will continue increasing with time, so that there is no equilibrium. The excess demand ε( p, t) is defined by d p/dt = ε( p, t) and is given by the right-hand side of the sde (7.6a) as drift plus noise. So, markets that are not in equilibrium can satisfy the no-arbitrage condition.
152
Thermodynamic analogies vs instability of markets
Now, in order to understand the geometric meaning of the no-arbitrage condition, consider a spatial distribution of markets with different price distributions at each location, i.e. gold has different prices in New York, Tokyo, Johannesburg, Frankfurt, London, and Moscow. That is, the price distribution g( p, X, t) depends on both market location X and time t. It is now easy to formulate the no-arbitrage condition in the language of statistical physics. In a physical system in thermal equilibrium the average kinetic energy is constant throughout the system, and is independent of location. The energy fluctuations at each point in the system obey a stationary process. The no-arbitrage condition merely requires spatial homogeneity and isotropy of the price distribution (to within transaction, shipping and customs fees, and taxes). That is, “no arbitrage” is equivalent to rotational invariance of the price distribution on the globe (the Earth), or to two-dimensional translational invariance locally in any tangent plane to the globe (Boston vs New York, for example). But the financial market price distribution is not stationary. We explain this key assertion in the next section. So, market equilibrium is not achieved merely by the lack of arbitrage opportunities. A collection of markets with arbitrage possibilities can be formulated via a master equation where the distribution of prices is not spatially homogeneous, but varies (in average price, for example) from market to market. Note also the following. In physics, we define the empirical temperature t of an equilibrium system with energy E and volume V (one could equally well use any other pair of extensive variables), where for any of n mentally constructed subsystems of the equilibrium system we have t = f (E 1 , V1 ) = . . . f (E n , Vn )
(7.11)
This condition, applied to systems in thermal contact with each other, reflects the historic origin of the need for an extra, nonmechanical variable called temperature. In thermodynamics (Callen, 1985), instead of temperature, one can as well take any other intensive variable, for example, pressure or chemical potential. The economic analog of equilibrium would then be the absence of arbitrage possibilities, that there is only one price of an asset p = f (φ1 , ψ1 ) = . . . f (φn , ψn )
(7.12)
This is a neo-classical condition that would follow from utility maximization. Starting from neo-classical economic theory, Smith and Foley (2002) have proposed a thermodynamic interpretation of p = f (z) based on utility maximization. In their discussion a quantity labeled as entropy is formally defined in terms of utility, but the quantity so-defined cannot represent disorder/uncertainty because there is no liquidity, no analog of the heat bath, in neo-classical equilibrium theory. The ground for this assertion is as follows. Kirman has pointed out, following Radner’s 1968 proof of noncomputability of neo-classical equilibria under slight uncertainty,
Entropy and instability of financial markets
153
that demand for money (liquidity demand) does not appear in neo-classical theory, where the future is completely determined. Kirman (1989) speculates that liquidity demand arises from uncertainty. This seems to be a reasonable speculation. The bounded rationality model of Bak et al. (1999) attempts to define the absolute value of money and is motivated by the fact that a standard neo-classical economy is a pure barter economy, where price p is merely a label2 as we have stressed in Chapter 2. The absence of entropy representing disorder in neo-classical equilibrium theory can be contrasted with thermodynamics in the following way: for assets in a market let us define economic efficiency as D S , (7.13) e = min S D where S and D are net supply and net demand for some asset in that market. In neo-classical equilibrium the efficiency is 100%, e = 1, whereas the second law of thermodynamics via the heat bath prevents 100% efficiency in any thermodynamic machine. That is, the neo-classical market equilibrium condition e = 1 is not a thermodynamic efficiency, unless we would be able to interpret it as the zero (Kelvin) temperature result of an unknown thermodynamic theory (100% efficiency of a machine is thermodynamically possible only at zero absolute temperature). In nature or in the laboratory, superfluids flow with negligible friction below the lambda temperature, and with zero friction at zero Kelvin, at speeds below the critical velocity for creating a vortex ring or vortex pair. In stark contrast, neo-classical economists assume the unphysical equivalent of a hypothetical economy made up of Maxwellian demonish-like agents who can systematically cheat the second law perfectly.
7.4 Entropy and instability of financial markets A severe problem with our attempted analogy is that entropy plays no role in the “thermodynamic formalism” represented by (7.2) and (7.5). According to Mirowski (2002), von Neumann suggested that entropy might be found in liquidity. If f (x, t) is the empirical returns density then the Gibbs entropy (the entropy of the asset in the liquidity bath) is ∞ S(t) = −
f (x, t) ln f (x, t)dx
(7.14)
−∞ 2
In a standard neo-classical economy there is no capital accumulation, no financial market, and no production of goods either. There is merely exchange of preexisting goods.
154
Thermodynamic analogies vs instability of markets
but, again, equilibrium is impossible because this entropy is always increasing. The entropy S(t) can never reach a maximum because f, which is approximately exponential in returns x, spreads without limit. The same can be said of the Gaussian approximation to the returns distribution. From the standpoint of dynamics two separate conditions prevent entropy maximization: the time-dependent diffusion coefficient D(x, t), and the lack of finite upper and lower bounds on returns x. If we would make the empirically wrong approximation of assuming Gaussian returns, where volatility D(x, t) is a constant, then the lack of bounds on returns x still prevents statistical equilibrium. Even with a t-independent volatility D(x) and expected stock rate of return R(x), the solution of the Fokker–Planck equation describing statistical equilibrium, P(x) =
C 2 e D(x)
R(x) D(x) dx
(7.15)
would be possible after a long enough time only if a Brownian “particle” with position x(t) = ln( p(t)/ p(0)) would be confined by two reflecting walls (the current density vanishes at a reflecting wall), pmin ≤ p ≤ pmax , for example, by price controls. This is not what is taught in standard economics texts (see, for example Varian (1992)). To make the contrast between real markets and equilibrium statistical physics sharper, we remind the reader that a Brownian particle in equilibrium in a heat bath has a velocity distribution that is Maxwellian (Wax, 1954). The sde that describes the approach of a nonequilibrium distribution of particle velocities to statistical equilibrium is the Smoluchowski–Ornstein–Uhlenbeck sde for the particle’s velocity. The distribution of positions x is also generated by a Fokker–Planck equation, but subject to boundary conditions that confine the particle to a finite volume V so that the equilibrium distribution of positions x is constant. That case describes timeindependent fluctuations about statistical equilibrium. Another way to say it is that the S–U–O process is stationary at long times, but the empirical market distribution is nonstationary. The main point is as follows. As we have illustrated by using the lognormal pricing model in Chapter 4, the mere existence of an equilibrium density is not enough: one must be able to prove that the predicted equilibrium can be reached dynamically. Here’s the essential difference between market dynamics and near-equilibrium dynamics. In the S–U–O process √ (7.16) dv = −βvdt + DdB √ the force DdB/dt is stationary because D is constant. The thermal equilibrium distribution of velocities v then requires that D = βkT, which represents the fluctuation–dissipation theorem (Kubo et al., 1978). A fluctuation–dissipation
Entropy and instability of financial markets
155
theorem is possible only near equilibrium, which is to say for asymptotically stationary processes v(t). In the fluctuation-dissipation theorem, the linear friction coefficient β is derived from the equilibrium fluctuations. In finance, in contrast, we have (7.17) d p = µpdt + d( p, t)dB √ Because d( p, t) depends on p, the random force d( p, t)dB/dt is nonstationary (see Appendix B), the stochastic process p(t) is far from equilibrium, and there is no analog of a fluctuation–dissipation theorem to relate even a negative rate of return µ < 0 to the diffusion coefficient d( p, t). Another way to say it is that an irreversible thermodynamics a` la Onsager (Kubo et al., 1978) is impossible for nonstationary forces. We have pointed out in Chapter 4 that there are at least six separate notions of “equilibrium” in economics and finance, five of which are wrong. Here, we discuss a definition of “equilibrium” that appears in discussions of the EMH: Eugene Fama (1970) misidentified market averages as describing “market equilibrium,” in spite of the fact that those averages are time dependent. The only dynamically acceptable definition of equilibrium is that price p is constant, d p/dt = 0, respecting the real equilibrium requirement of vanishing excess demand. In stochastic theory this is generalized (as in statistical physics and thermodynamics) to mean that all average values are time independent, so that p = constant and, furthermore, all moments of the price (or returns) distribution are time independent. This would correspond to a state of statistical equilibrium where prices would fluctuate about constant average values (with vanishing excess demand on the average), but this state has never been observed in data obtained from real markets, nor is it predicted by any model that describes real markets empirically correctly. In contrast, neo-classical economists have propagated the misleading notion of “temporary price equilibria,” which we have shown in Chapter 4 to be self-contradictory: in that definition there is an artificially and arbitrarily defined “excess demand” that is made to vanish, whereas the actual excess demand ε( p) defined correctly by d p/dt = ε( p) above does not vanish. The notion of temporary price equilibria violates the conditions for statistical equilibrium as well, and cannot sensibly be seen as an idea of local thermodynamic equilibrium because of the short time scales (on the order of a second) for “shocks.” The idea that markets may provide an example of critical phenomena is popular in statistical physics, but we see no evidence for an analogy of markets with phase transitions. We suggest instead the analogy of heat bath/energy with liquidity/ money. The definition of a heat bath is a system that is large enough and with large enough heat capacity (like a lake, for example) that adding or removing small quantities of energy from the bath do not affect the temperature significantly.
156
Thermodynamic analogies vs instability of markets
The analogy of a heat bath with finance is that large trades violate the liquidity assumption, as, for example, when Citi-Bank takes a large position in Reals, just as taking too much energy out of the system’s environment violates the assumption that the heat bath remains approximately in equilibrium in thermodynamics. The possibility of arbitrage would correspond to a lower entropy (Zhang, 1999), reflecting correlations in the market dynamics. This would require history dependence in the returns distribution whereas the no-arbitrage condition, which is guaranteed by the “efficient market hypothesis” (EMH) is satisfied by either statistically independent or Markovian returns. Our empirically based model of volatility of returns and option pricing is based on the assumption of a Markov process with unbounded returns. Larger entropy means greater ignorance, more disorder, but entropy has been ignored in the economics literature. The emphasis in economic theory has been placed on the nonempirically based idealizations of perfect foresight, instant information transfer and equilibrium.3 The idea of synthetic options, based on equation (7.5) and discussed in Chapter 5, led to so-called “portfolio insurance.” Portfolio insurance implicitly makes the assumption of approximately reversible trading, that agents would always be there to take the other side of a desired trade at approximately the price wanted. In October, 1987, the New York market crashed, the liquidity dried up. Many people who had believed that they were insured, without thinking carefully enough about the implicit assumption of liquidity, lost money (Jacobs, 1999). The idea of portfolio insurance was based on an excessive belief in the mathematics of approximately reversible trading combined with the expectation that the market will go up, on the average (R > 0), but ignoring the (unknown) time scales over which downturns and recoveries may occur. Through the requirement of keeping the hedge balanced, the strategy of a self-financing, replicating hedge can require an agent to buy on the way up and sell on the way down. This is not usually a prescription for success and also produces further destabilization of an already inherently unstable market. Another famous example of misplaced trust in neo-classical economic beliefs is the case of LTCM,4 where it was assumed that prices would always return to historic averages, in spite of the absence of stability in (7.6a). LTCM threw good money after bad, continuing to bet that interest rate spreads would return to historically expected values until the Gambler’s Ruin ended the game. Enron, which eventually went bankrupt, also operated with the belief that unregulated free markets are stable. 3
4
The theory of asymmetric information (Ackerlof, 1984; Stiglitz and Weiss, 1992) does better by pointing in the direction of imperfect, one-sided information, but is still based on the assumptions of optimization and equilibria. With the Modigliani–Miller argument of Chapter 4 in mind, where they assumed that the ratio of equity to debt doesn’t matter, see pp. 188–190 in Dunbar (2000) for an example where the debt to equity ratio did matter. LTCM tried to use self-replicating, self-financing hedges as a replacement for equity, and operated (off balance sheet) with an equity to debt ratio “S/B” 1. Consequently, they went bankrupt when the market unexpectedly turned against them.
Appendix B. Stationary vs nonstationary random forces
157
In contrast, the entropy (7.14) of the market is always increasing, never reaching a maximum, and is consistent with very large fluctuations that have unknown and completely unpredictable relaxation times. 7.5 The challenge: to find at least one stable market Globalization, meaning privatization and deregulation on a global scale, is stimulated by fast, large-scale money movement due to the advent of networking with second by second trading of financial assets. Globalization is a completely uncontrolled economic experiment whose outcome cannot be correctly predicted on the basis of either past history (statistics) or neo-classical economic theory. With the fate of LTCM, Enron, WCom, Mexico, Russia, Thailand, Brazil, and Argentina as examples of some of the ill consequences of rapid deregulation (see Luoma (2002) for a description of the consequences of deregulation of water supplies), we should be severely skeptical of the optimistic claims made by its promoters.5 Instead, the enthusiasts for globalization have the obligation to convince us that stability is really a property of deregulated markets. But standard economic theory has it wrong: one cannot have both completely unregulated markets and stability at the same time; the two conditions are apparently incompatible. Statistical equilbrium of financial markets is impossible with a diffusion coefficient D(x, t) that depends on x and t. Can one find examples of stability in nonfinancial markets? One could search in the following way: pick any market with data adequate for determining the time development of the empirical price distribution (we can study the time development in finance because we have high-frequency data over the past 12 or so years). With a stationary process the global volatility (variance) approaches a constant as initial correlations die out. Equilibrium markets would not be volatile. If the distribution is stationary or approaches stationarity, then the Invisible Hand stabilizes the market. This test will work for liquid markets. For very illiquid markets the available statistics may be so bad that it may be impossible to discover the time development of the empirical distribution, but in that case no hypothesis can be tested reliably. Appendix B. Stationary vs nonstationary random forces It is important to know in both finance and physics when a random force is stationary and when it is not. Otherwise, dynamics far from equilibrium may be confused with dynamics near equilibrium. We include this appendix with the aim of eliminating 5
See Stiglitz (2002) for a qualitative discussion of many examples of the instability of rapidly deregulated markets in the Third World; see Friedman (2000) for an uncritical cheerleader’s account of globalization written implicitly from a neo-classical perspective.
158
Thermodynamic analogies vs instability of markets
some confusion that has been written into the literature. Toward that end, let us first ask: when is a random force Gaussian, white, and stationary? To make matters worse, white noise ξ (t) is called Gaussian and stationary, but since ξ = dB/dt has a variance that blows up like (B/t)2 = 1/t as t vanishes, in what sense is white noise Gaussian? With the sde (7.17) written in Langevin fashion, the usual language of statistical physics, (B1) d p/dt = r ( p, t) + d( p, t)dB(t)/dt √ √ the random force is defined by ζ (t) = d( p, t)dB(t)/dt = d( p, t)ξ (t). The term ξ (t) is very badly behaved: mathematically, it exists nowhere pointwise. In spite of this, it is called “Gaussian, stationary, and white”. Let us analyze this in detail, because it will help us to see that the random force ζ (t) in (B1) is not stationary even if a variable diffusion coefficient d(p) is t-independent. Consider a general random process ξ (t) that is not restricted to be Gaussian, white, or stationary. We will return to the special case of white noise after arriving at some standard results. Given a sequence of r events (ξ (t1 ), . . . , ξ (tr )), the probability density for that sequence of events is f (x1 , . . . , xr ; t1 , . . . , tr ), with characteristic function given by Θ(k1 , . . . , kr ; t1 , . . . , tr ) ∞ = f (x1 , . . . , xr ; t1 , . . . tr )ei(k1 x1 +···+kr xr ) dx1 . . . dxr
(B2)
−∞
Expanding the exponential in power series, we get the expansion of the characteristic function in terms of the moments of the density f. Exponentiating that infinite series, we then obtain the cumulant expansion Θ(k1 , . . . , kr ; t1 , . . . , tr ) = eΨ (k1 ,...,kr ;t1 ,...,tr )
(B3)
where Ψ (k1 , . . . , kr ; t1 , . . . , tr ) =
∞
(ik1 )s1 (ikr )sr s1 ··· x1 . . . xrsr c s1 ! sr ! s1 ,...,sr =1
(B4)
and the subscript “c” stands for “cumulant” or “connected.” The first two cumulants are given by the correlation functions x(t1 )c = x(t1 ) x(t1 )x(t2 )c = x(t1 )x(t2 ) − x(t1 )x(t2 ) = (x(t1 ) − x(t1 ))(x(t2 ) − x(t2 ))
(B5)
Appendix B. Stationary vs nonstationary random forces
159
The density f (x, t) is Gaussian if all cumulants vanish for n > 2 : K n = 0 if n > 2. For a stationary Gaussian process we then have x(t1 )c = x(t1 ) = K 1 (t1 ) = constant x(t1 )x(t2 )c = (x(t1 ) − x(t1 ))(x(t2 ) − x(t2 )) = K 2 (t1 − t2 )
(B6)
If, in addition, the stationary process is white noise, then the spectral density is constant because K 2 (t1 − t2 ) = ξ (t1 )ξ (t2 ) = K ␦(t1 − t2 )
(B7)
with K = constant. Since the mean K 1 is constant we can take it to be zero. Using (3.162a) ∞ ξ (t) =
A(ω, T )eiωt dω
(B8)
−∞
we get from (3.164) that G(ω) = 2
A(ω, T )2 T
= constant
(B9)
so that (3.165) then yields σ = ξ (t)2 =
∞ G(ω)dω = ∞
2
(B10)
−∞
in agreement with defining white noise by “taking the derivative of a Wiener process.” But with infinite variance, in what sense can white noise be called a Gaussian process? The answer is that ξ itself is not Gaussian, but the expansion coefficients A(ω) in (B8) are taken to be Gaussian distributed, each with variance given by the constant spectral density (Wax, 1954). If we write the Langevin equation (B1) in the form d p/dt = −γ ( p) + ζ ( p, t) (B11) √ with random force given by ζ (t, p) = d( p)ξ (t) and with the drift term given by r ( p, t) = −γ ( p) < 0, representing dissipation with t-independent drift and diffusion coefficients, then the following assertions can be found on pp. 65–68 of the stimulating text by Kubo et al. (1978) on nonequilibrium statistical physics: (i) the random force ζ (t, p) is Gaussian and white, (ii) equilibrium exists and the equilibrium distribution can be written in terms of a potential U ( p). Point (ii), we know, is correct but the assumption that ζ (t, p) is Gaussian is wrong
160
Thermodynamic analogies vs instability of markets
(for example, if d( p) = p 2 then ζ (t, p) is lognormal, not Gaussian). Also, the assumption that ζ (t, p) is white presumes stationarity, which does not hold whenever d(p) depends on p. However, by “white” some writers may mean only that ζ (t, p) is delta-correlated, even if the spectral density doesn’t exist (for exam√ ple, ζ (t, p) = p2 ξ (t) is delta-correlated but has no spectral density because the process is nonstationary). In Kubo et al., reading between the lines, there seems to be an implicit assumption that stochastic processes of the form (B11) are always near equilibrium merely because one can find an equilibrium solution to the Fokker– Planck equation. We know that this is not true, for example, for the case where d( p) = p 2 . That is, the fact that a variable diffusion coefficient d( p) can delocalize a particle, no matter what the form of γ ( p) when p is unbounded to the right, was not noticed.
8 Scaling, correlations, and cascades in finance and turbulence
We will discuss next a subject that has preoccupied statistical physicists for over two decades but has been largely ignored in this book so far: scaling (McCauley, 2003b). We did not use scaling in order to discover the dynamics of the financial market distribution. That distribution scales but the scaling wasn’t needed to construct the dynamics. We will also discuss correlations. The usefulness of Markov processes in market dynamics reflects the fact that the market is hard to beat. Correlations would necessarily form the basis of any serious attempt to beat the market. There is an interesting long-time correlation that obeys self-affine scaling: fractional Brownian motion. We begin with the essential difference between self-similar and self-affine scaling.
8.1 Fractal vs self-affine scaling Self-similar scaling is illustrated by the following examples. Consider the pair correlation function for some distribution of matter in a volume V with fluctuating density ρ(x) 1 ρ(r )ρ(r + r) (8.1) C(r ) = V r In an isotropic system we have C(r ) = C(r )
(8.2)
and for an isotropic fractal the scaling law C(br ) = b−α C(r )
(8.3)
C(r ) ≈ r −α
(8.4)
holds. Taking b = 1/r yields
161
162
Scaling, correlations, and cascades in finance and turbulence
As in fractal growth phenomena, e.g. DLA (diffusion limited aggregation), let N(L) denote the number of particles inside a sphere of radius L , 0 ≤ r ≤ L, in a system with dimension d. Then L C(r )d d r ≈ r d−α = r D2
N (L) ≈
(8.5)
0
where D2 = d − σ is called the correlation dimension. Here, we have isotropic scaling with a single scaling parameter D2 . This kind of scaling describes phenomena both at critical points in equilibrium statistical physics, and also in dynamical systems theory far from equilibrium, therefore describing phenomena both at and beyond transitions to chaos. However, universality classes for scaling exponents have only been defined unambiguously at a critical point (the borderline of chaos is a critical point). In general, fractal dimensions are not universal in a chaotic dynamical system. In that case there is at best the weaker topologic universality of symbol sequences whenever a generating partition exists (McCauley, 1993). A generating partition is a natural partitioning of phase space by a deterministic dynamical system. The generating partition characterizes the dynamical system and provides a finite precision picture of the fractal whenever the attractor or repeller is fractal. The generalization to multifractals goes via the generating functions introduced by Halsey et al. (1987) based on generating partitions of chaotic dynamical systems, which implicitly satisfy a required infimum rule. The idea of multiaffine scaling is a different idea and is sometimes confused with multifractal scaling (McCauley, 2002). Generating partitions are required for the efficient definition of coarse-graining fractals and multifractals, with the sole exception of the information dimension, leading systematically to finite precision descriptions of fractal geometry. Generating partitions are not required for self- or multi-self-affine scaling. Notions of fractals are neither central nor useful in what follows and so are not discussed explicitly here. Self-affine scaling (Barabasi and Stanley, 1995) is defined by a relation that looks superficially the same as equation (8.3), namely, by a functional relationship of the form F(x) = b−H F(bx)
(8.6)
but where the vertical and horizontal axes F(x) and x are rescaled by different parameters, b−H and b, respectively. When applied to stochastic processes we expect only statistical self-similarity, or self-similarity of averages. H is called the Hurst exponent (Feder, 1988). An example from analysis is provided by the everywhere-continuous but nowhere-differentiable Weierstrass–Mandelbrot
Persistence and antipersistence
163
function F(t) =
∞ 1 − cos(bn t) −∞
bnα
(8.7)
It’s easy to show that F(t) = b−α F(bt), so that F(t) obeys self-affine scaling with H = α. Another example is provided by ordinary Brownian motion x 2 = t
(8.8)
with Hurst exponent H = 1/2.
8.2 Persistence and antipersistence Consider a time series x(t) that is in some yet-to-be-defined sense statistically selfaffine, i.e. x(bt) ≈ b H x(t)
(8.9)
Mandelbrot (1968) introduced a second scaling exponent J, the Joseph exponent, to describe persistence/antipersistence of correlations. The exponent J is defined via rescaled range analysis (R/S analysis). See Feder (1988) for discussions of both R/S analysis and persistence/antipersistence. For statistical independence J = 1/2. So J > 1/2 implies persistence of correlations while J < 1/2 implies antipersistence of correlations. The exponents J and H need not be the same but are sometimes confused with each other. As an example of persistence and antipersistence consider next “fractional Brownian motion” where x 2 = ct 2H
(8.10)
with Hurst exponent 0 < H < 1. Note that H = 1/2 includes, but is not restricted to, ordinary Brownian motion: there may be distributions with second moments behaving like (8.8) but showing correlations in higher moments. We will show that the case where H = 1/2 implies correlations extending over arbitrarily long times for two successive time intervals of equal size. We begin by asking the following question: what is the correct dynamics underlying (8.10) whenever H = 1/2? Proceeding via trial and error, we can try to construct the Ito product, or stochastic integral equation, x = t H −1/2 • B
(8.11a)
164
Scaling, correlations, and cascades in finance and turbulence
where B(t) is a Wiener process, dB = 0, dB 2 = dt, and the Ito product is defined by the stochastic integral t+t
b(x, t) • B =
b(x(s), s)dB(s)
(8.12)
t
With an x(t)-dependence this integral depends on the path C B followed by the Wiener process B(t). From Chapter 3 we know that averages of integrals of this form with b independent of x are given by the path-independent results b • B = 0, t+t
(b • B) = 2
b2 (s)ds
(8.13)
t
Therefore, for the case of arbitrary H we have (t
H −1/2
t+t
• B) =
(s − t)
2
2H −1
t
t+t (s − t)2H ds = = t 2H /2H 2H t
(8.14) Mandelbrot invented this process and called x(t) = B H (t) “fractional Brownian noise,” but instead of (8.11) tried to write a formula for x(t) with limits of integration going from minus infinity to t and got divergent integrals as a result (he did not use Ito calculus). In (8.11) above there is no such problem. For H = 1/2 the underlying dynamics of the process is defined irreducibly by the stochastic integral equation t+t
x(t) =
(s − t) H −1/2 dB(s)
(8.15)
t
So defined, the statistical properties of the increments x depend only on t and not on t (see the calculation of the transition probability below). To see that the resulting xs are not statistically independent for all possible nonoverlapping time intervals unless H = 1/2, we next calculate the autocorrelation function for the special case of two equal-sized adjacent time intervals (t − t, t + t) C(−t, t) =
x(−t)x(t) x 2 (t)
(8.16)
Persistence and antipersistence
165
as follows: x(−t)x(t) 1 = (x(t) + x(−t))2 − x 2 (t) − x 2 (−t) 2 1 = x 2 (2t) − x 2 (t) 2 1 = c(2t)2H − ct 2H 2
(8.17a)
so that C(−t, t) = 22H −1 − 1
(8.18)
With H > 1/2 we have C(−t, t) > 0 and persistence, whereas with H < 1/2 we find C(−t, t) < 0 and antipersistence. The time interval t may be either small or large. This explains why it was necessary to assume that H = 1/2 for Markov processes with trajectories {x(t)} defined by stochastic differential equations in Chapter 3. For the case of fractional Brownian motion, J = H . The exponent H is named after Hurst who studied the levels of the Nile statistically. One can also derive an expression for the correlation function for two overlapping time intervals t2 > t1 , where t1 lies within t2 . Above we used 2ab = (a + b)2 − a 2 − b2 . Now, we use 2ab = a 2 + b2 − (a − b)2 along with x(t2 ) − x(t1 ) = x(t2 − t1 ), which holds only if the interval t1 is contained within the interval t2 . This yields the well-known result x(t2 )x(t1 ) 1 = x 2 (t2 ) + x 2 (t1 ) − (x(t2 ) − x(t1 ))2 2 1 = (x 2 (t2 ) + x 2 (t1 ) − x 2 (t2 − t1 )) 2 1 = c t22H + t12H − |t2 − t1 |2H 2
(8.17b)
Note that this expression does not vanish for H = 1/2, yielding correlations of the Wiener process for overlapping time intervals. Finally, using the method of locally Gaussian Green functions (Wiener integral) of Section 3.6.3 it is an easy calculation to show that driftless fractional Brownian motion (fBm) (8.11a) is Gaussian distributed (x−x )2 1 e− ct 2H g(x, x ; t) = √ 2ct 2H
(8.11b)
166
Scaling, correlations, and cascades in finance and turbulence
Note that the statistical properties of the displacements x = x − x are independent of t. The process is nonstationary but has instead what Stratonovich (1967) calls “stationary increments,” meaning that g depends only on t and not on t. This result can be used to show that driftless fBm satisfies the conditions stated above for a Martingale, although in the formal mathematics literature fBm is not called a Martingale.
8.3 Martingales and the efficient market hypothesis The notion of the efficient market evolved as follows. First, it was stated that one cannot, on the average, expect gains higher than those from the market as a whole. Here, we speak of the possibility of beating the average gain rate R = ln p(t + t)/ p(t) of the market calculated over some past time interval. With the advent of the CAPM the efficient market hypothesis (EMH) was revised to assert that higher expected returns require higher risk. Economists, who are taught to imagine perfect markets and instantaneous, complete information1 like to state that an efficient market is one where all available information is reflected in current price. They believe that the market “fairly” evaluates an asset at all times. Our perspective is different. Coming from physics, we expect that one starts with imperfect knowledge of the market, and that that “information” is degraded as time goes on, represented by the market entropy increase, unless new knowledge is received. Again, one must take care that by “information” in economics both traders and theorists generally mean specific knowledge, and not the entropy-based information of Shannon’s information theory. Mandelbrot (1966) introduced the idea of a Martingale in finance theory as a way to formulate the EMH (the historic paper on the EMH was written by Fama (1970)). The idea can be written as follows: the random process z(t) describes a fair game if z(t + t)Φ = z(t)
(8.19)
where the average is to be calculated based on specific “information” Φ. In finance Φ might be the recent history of the market time series x(t), for example, which mainly consists of noise. The idea is that if we have a fair game then history should be irrelevant in predicting the future price; all that should matter is what happened in the last instant t (the initial condition in (8.19) is z(t)). How can we apply this to finance? If, in (8.19), we write z(t + t) = x(t + t) − Rt 1
(8.20a)
Actually, knowledge, not information, is the correct term. According to Shannon, “information” contains noise and is described by entropy. See also Dosi (2001). In neo-classical economics theory the information content is zero because there is no noise, no ambiguity or uncertainty.
Martingales and the efficient market hypothesis
167
then z(t) is a Martingale. In this case the idea of the Martingale implies via (8.19) that the expected return at time t + t is just the observed return x(t) (the initial condition) x(t + t) ≈ x(t) + Rt
(8.21)
at time t plus the expected gain Rt. This leads to the claim that you cannot beat the expected return. With the advent of CAPM, this was later revised to say that you cannot beat the Sharpe ratio.2 One way to get a fair game is to assume a Markov process, whose local solution is x(t + t) − Rt ≈ x(t) + D(x, t)B (8.22) If R is independent of x then a fair game is given by any driftless stochastic pro√ cess, z(t + t) = x(t + t) − Rt, so that dz = DdB. A Martingale requires, in addition, the technical condition |z(t)| < ∞ for all finite times t. It is easy to show via a direct average of the absolute value of z(t) using the expo√ nential distribution that the driftless exponential distribution, dz = D(x, t)dB with D given by (6.40), where R is x-independent, defines a Martingale. Likewise, for the case of D(x, t) defined by (6.40), where R(x, t) = µ − D(x, t)/2, we can define the fair game by the stochastic variable t+t
z(t + t) = x(t + t) −
R(x(s), s)ds
(8.20b)
t
and then use the exponential distribution to show that z(t) satisfies the technical condition for a Martingale (Baxter and Rennie, 1995). As an example of how the fair game condition on z(t) does not guarantee lack of correlations in other combinations of variables, consider next fractional Brownian motion with drift R t+t
x(t) = Rt +
(s − t) H −1/2 dB(s)
(8.23)
t
If we define z = x − Rt then we have a fair game, independent of the value of H. However, if we study the stochastic variable y(t) = z(−t)(z(t)) then this is no longer true. So, relying on a Martingale for a single variable z(t) does not guarantee the absence of exploitable correlations if that variable is nonMarkovian. This leads to the furthest point in the evolution of the interpretation of the EMH, namely, that there are no patterns in the market that can be exploited systematically for profit (strictly seen, this requires H = 1/2 as a necessary condition). 2
The Sharpe ratio is based on the parameter β in the CAPM.
168
Scaling, correlations, and cascades in finance and turbulence
The use of Martingale systems in gambling is not new. In the mid eighteenth century, Casanova (1997) played the system with his lover’s money in partnership with his, in an attempt to improve her financial holdings. She was a nun and wanted enough money to escape from the convent on Murano. Casanova lived during that time in the Palazzo Malpiero near Campo S. Samuele. In that era Venice had over a hundred casini. A painting by Guardi of a very popular casino of that time, Il Ridotto (today a theater), hangs in Ca’ Rezzonico. Another painting of gamblers hangs in the gallery Querini-Stampalia. The players are depicted wearing typical Venetian Carnival masks. In that time, masks were required to be worn in the casini. Casanova played the Martingale system and lost everything but went on to found the national lottery in France (Gerhard-Sharp et al., 1998), showing that it can be better to be lucky than to be smart.3 In 1979 Harrison and Kreps showed mathematically that the replicating portfolio of a stock and an option is a Martingale. Today, the Martingale system is forbidden in most casini/casinos, but is generally presented by theorists as the foundation of finance theory (Baxter and Rennie, 1995). The financial success of The Prediction Company, founded and run by a small collective of physicists who are experts in nonlinear dynamics (and who never believed that markets are near equilibrium), rests on having found a weak signal, never published and never understood qualitatively (so they didn’t tell us which signal), that could be exploited. However, trying to exploit a weak signal can easily lead to the Gambler’s Ruin through a run of market moves against you, so that agents with small bank accounts cannot very well take advantage of it. In practice, it’s difficult for small traders to argue against the EMH (we don’t include big traders like Soros, Buffet, or The Prediction Company here), because financial markets are well approximated by Markov processes over long time intervals. There are only two ways that a trader might try to exploit market inefficiency: via strong correlations over time scales much less than 10 min (the time scale for bond trading is on the order of a second), or very weak correlations that may persist over very long times. A time signal with a Joseph exponent J > 1/2 would be sufficient, as in fractional Brownian motion. Summarizing, initial correlations in financial data are observed to decay on a time scale on the order of 10 min. To the extent that J = 1/2, weaker very long-ranged time correlations exist and, in principle, may be exploited for profit. The EMH requires J = 1/2 but this condition is only necessary, not sufficient: there can be other correlations that could be exploited for profit if the process is nonMarkovian. However, that would not necessarily be easy because the correlations could be so weak that the data could still be well approximated as Markovian. For example, 3
Casanova was arrested by the Venetian Republic in 1755 for Freemasonry and by the Inquisition for godlessness, but escaped from the New Prison and then took a gondola to Treviso. From there he fled to Munich, and later went on to Paris. A true European, he knew no national boundaries.
Energy dissipation in fluid turbulence
169
with H = J and ␦H = H − 1/2 we obtain σ 2 = t H = t 1/2 e␦H ln t ≈ t 1/2 (1 + ␦H ln t + · · ·)
(8.24)
With ␦H small, it is easy to approximate weakly correlated time series to zeroth order by statistical independence over relatively long time scales t. The size of ␦H in the average volatility can be taken as a rough measure of the inefficiency of the market. Zhang (1999) has discussed market inefficiency in terms of entropy. If one could determine the complete distribution that generates fractional Brownian motion then it would be easy to write down the Gibbs entropy, providing a unified approach to the problem. In neo-classical economic theory an efficient market is one where all trades occur in equilibrium. That expectation led writers like Fama to claim that the Martingale describes “equilibrium” markets, thereby wrongly identifying timedependent averages as “equilibrium” values. The next step, realizing that infinite foresight and perfect information are impossible, was to assume that present prices reflect all available information about the market. This leads to modeling the market as pure noise, as a Markov process, for example. In such a market there is no sequential information at all, there is only entropy production. A market made up only of noise is a market in agreement with the EMH. Arbitrage is impossible systematically in a market consisting of pure noise. This is the complete opposite of the neo-classical notion of perfect information (zero entropy), and one cannot reach the former perturbatively by starting from the latter viewpoint. Rather, financial markets show that the neo-classical emperor wears no clothing at all. Lognormal and exponential distributions occur in discussions of both financial markets and fluid turbulence. The exponential distribution was observed in turbulence (Castaing et al., 1989) before it was discovered in finance. An information cascade has been suggested for finance in analogy with the vortex cascade in fluid turbulence. Therefore, we present next a qualitative description of the eddy cascade in three-dimensional turbulence. 8.4 Energy dissipation in fluid turbulence We begin with the continuum formulation of the flow of an incompressible fluid past an obstacle whose characteristic size is L, and where the fluid velocity vector v(x, t) is uniform, v = U = constant, far in front of the obstacle (see McCauley (1991) and references therein). The obstacle generates a boundary layer and therefore a wake. With no obstacle in the flow a shear layer between two streams flowing asymptotically at different velocities (water from a faucet through air, for example) can replace the effect of the obstacle in generating the boundary layer (in this case the shear layer) instability. Boundary conditions are therefore essential from the
170
Scaling, correlations, and cascades in finance and turbulence
beginning in order to understand where turbulence comes from. Mathematically, the no-stick boundary condition, v = 0 on the obstacle’s surface, reflects the effect of viscosity and generates the boundary layer. Without a boundary or shear layer there is no turbulence, only Galilean invariance. With constant fluid density (incompressible flow) the Navier–Stokes equations are ∂v + v˜ ∇v = −∇ P + ν∇ 2 v ∂t ˜ =0 ∇v
(8.25)
where ν is the kinematic viscosity and has the units of a diffusion coefficient, and P is the pressure divided by the density, and we use the notation of matrix algebra with row and column vectors here. The competition between the nonlinear term and dissipation is characterized by Re, the Reynolds number Re =
U 2 /L UL O(˜v ∇v) = = 2 2 O(ν∇ v) νU/L ν
(8.26)
From ν ≈ O(1/Re) for large Re follows boundary-layer theory and for very large Re (where nonlinearity wins) turbulence, whereas the opposite limit (where dissipation wins) includes Stokes flow and the theory of laminar wakes. The dimensionless form 1 2 ∂v ∇ v + v˜ ∇ v = −∇ P + ∂t Re ∇˜ v = 0
(8.27)
of the Navier–Stokes equation, Reynolds number scaling, follows from rescaling the variables, x = L x , v = U v , and t = t L/U . Instabilities correspond to the formation of eddies. Eddies, or vortices, are ubiquitous in fluid flows and are generated by the flow past any obstacle even at relatively low Reynolds numbers. Sharp edges create vortices immediately, as for example the edge of a paddle moving through the water. Vorticity ω =∇ ×v
(8.28)
is generated by the no-slip boundary condition, and vortices (vortex lines and rings) correspond to concentrated vorticity along lines ending on surfaces, or closed loops, with vorticity-free flow circulating about the lines in both cases. By Liouville’s theorem in mechanics vorticity is continuous, so that the instabilities form via vortex stretching. This is easily seen in Figure 8.1 where a droplet of ink falls into a pool of water yielding a laminar cascade starting with Re ≈ 15. One big vortex ring formed from the droplet was unstable and cascaded into four to six
Energy dissipation in fluid turbulence
Figure 8.1. Ink drop experiment showing the vortex cascade in a low Reynolds number instability, with tree order five and incomplete. A droplet of ink was ejected from a medicine dropper; the Reynolds number for the initial unstable large vortex ring is about 15–20, and the cascade ends with a complete binary tree and the Reynolds number on the order of unity. Note that the smaller rings are connected to the larger ones by vortex sheets. Photo courtesy of Arne Skjeltorp.
171
172
Scaling, correlations, and cascades in finance and turbulence
other smaller vortex rings (all connected by visible vortex sheets), and so on until finally the cascade ends with the generation of many pairs of small vortex rings at the Kolmogorov length scale. The Kolmogorov scale is simply the scale where dissipation wins over nonlinearity. The different generations of vortices define a tree, with all vortices of the same size occupying the branches in a single generation, as Figure 8.1 indicates. In fully developed turbulence the order of the incomplete tree predicted by fitting the β-model to data is of order eight, although the apparent order of the tree at the Kolmogorov length scale is a complete binary one (the dissipation range of fluid turbulence can be fit with a binomial distribution). The vorticity transport equation follows from (8.25) and (8.28) and is given by ∂ω + v˜ ∇ω = ω∇v ˜ + ν∇ 2 ω ∂t
(8.29)
The vortex stretching term is the first term on the right-hand side and provides the mechanism for the cascade of energy from larger to smaller scales (in 3D), from larger to more and smaller vortices until a small scale L K (the Kolmogorov length scale) is reached where the effective Reynolds number is unity. At this smallest scale dissipation wins and kills the instability. By dimensional analysis, L K = L Re−3/4 . The starting point for the phenomenology of the energy-eddy cascade in soft turbulence (open flows past an obstacle, or shear flows, for example) is the relation between vorticity and energy dissipation in the fluid, ∂v 2 3 1 (8.30) d x = −ν ω2 d3 x = −L 3 (∇ × v)2 2 ∂t One would like to understand fluid turbulence as chaotic vorticity in space-time, but so far the problem is too difficult mathematically to solve. Worse, the vortex cascade has not been understood mathematically by replacing the Navier–Stokes equations by a simpler model. Financial data are much easier to obtain than good data representing fluid turbulence, and the Burgers equation is a lot easier to analyze than the Navier–Stokes equations. We do not understand the Navier–Stokes equations mathematically. Nor do we understand how to make good, physical models of the eddy cascade. Whenever one cannot solve a problem in physics, one tries scaling. The expectation of multiaffine scaling in turbulence arose from the following observation combined with the idea of the eddy cascade. If we make the change of variable x = x /λ, v = v /λh , and t = t λh−1 , where h is a scaling index, then the Navier–Stokes equations are scale invariant (i.e., independent of λ and h) with Re = Re . That is, we expect to find scaling in the form ␦v ≈ v(x + L) − v(x) ≈ L h , where ␦v is the velocity difference across an eddy of size L (Frisch, 1995). This
Multiaffine scaling in turbulence models
173
led to the study of the so-called velocity structure functions4 . |␦v| p ≈ L hp ≈ L ζ p
(8.31)
defining the expectation of multiaffine scaling with a spectrum ζ p of Hurst exponents as the moment order p varies. 8.5 Multiaffine scaling in turbulence models Consider a probability distribution P(x, L). If there is self-affine scaling then the moments of the distribution may not scale with a single Hurst exponent H, but rather with a discrete spectrum of Hurst exponents (Barabasi and Stanley, 1995) ζn x n (bL) ≈ bζn x n (L)
(8.32)
x n ≈ L ζn
(8.33)
yielding (with bL = 1)
In hydrodynamics quantities like the velocity structure functions are expected to exhibit multiaffine scaling |␦v|n ≈ L ζn
(8.34)
The physical idea behind (8.34) is that one measures the velocity difference ␦v across a distance L in a turbulent flow. It is easy to construct Fokker–Planck models that show multiaffine scaling (Renner et al., 2000), but is much harder to relate stochastic models to vortex dynamics without hand waving. Ignoring the difficulties of the very interesting physics of the vortex cascade, consider a Markov process satisfying the Fokker– Planck equation with conditional probability density g(x, L; x0 , L 0 ) where x = ␦v is the velocity difference across an eddy of size L, and where energy is injected by creating a single big eddy of size L 0 initially. We will see that backward diffusion in L is required, and that the diffusion is from larger to smaller L in three dimensions (but is the opposite in two dimensions). In this stochastic model there are no vortices and no cascade, only diffusion with drift. One could make a stochastic model of the cascade by starting with a master equation for discrete changes in L, with branching, analogous to the β-model (McCauley, 1993), therefore leading to fractal or multifractal scaling because definite length scales and the relation between them would be built into the problem from the start. The β-model starts with self-similar fractals, but here we deal with self-affine scaling instead. And, with L as analog of 4
In their finance text, Dacorogna et al. (2001) label these exponents as “drift exponents” but they are not characterized by drift in the sense of convection.
174
Scaling, correlations, and cascades in finance and turbulence
time, we must have the equivalent of backward-time diffusion, diffusion from large to small length scales. Using a Fokker–Planck approach, the moments of the distribution P(x, L) obey n(n − 1) d n x = nRx n−1 + Dx n−2 dL 2
(8.35)
Consider simply the model defined by R(x, L) = βx/L D(x, L) = γ x 2 /L
(8.36)
Note in particular that with the variable t = lnL we would retrieve the lognormal model. Here, we obtain from the transformed lognormal model multiaffine scaling ςn L n (8.37) x (L) ≈ L0 with the Hurst exponent spectrum given by ζn = nβ +
n(n − 1) γ 2
(8.38)
Now, it is exactly the sign of γ that tells us whether we have forward or backward diffusion in L: if γ < 0 (“negative diffusion coefficient” forward in “time” L) then the diffusion is from large to small “times” L, as we can anticipate by writing L = L 0 − L. Therefore, whether diffusion is forward or backward in “time” L can be determined empirically by extracting γ from the data. In practice, velocity structure functions for n > 3 are notoriously difficult to measure but measuring ζ2 is enough for determining the direction of the cascade. The same would hold for an information cascade in finance, were there multiaffine scaling. In the K62 lognormal model (Frisch, 1995), the scaling exponents are predicted to be given by ζn =
n(n − 3) n −µ 3 18
(8.39)
yielding γ = −µ/9 R = 1/3 + µ/18
(8.40)
so that the vortex instability (whose details are approximated here only too crudely by diffusion) goes from larger to smaller and more vortices. This means that
Multiaffine scaling in turbulence models
information is lost as L decreases so that the entropy S(L) = − g ln gdx
175
(8.41)
increases as L decreases. In contrast with the K61 model, the drift and diffusion coefficients extracted from data analysis by Renner et al. (2000) are R(v, L) = γ (L)v − κ(L)v 2 + ε(L)v 3
(8.42)
D(v, L) = −α(L) + ␦(L)v − β(L)v 2
(8.43)
and
for backward-in-L diffusion, and do not lead to scaling. Emily Ching (Ching, 1996; Stolovotsky and Ching, 1999), whose theoretical work forms the basis for Tang’s financial data analysis, produced a related analysis.5 The original K41 model, in contrast, is based on time-reversible dynamics precisely because in that model γ = 0 and n ζn = (8.44) 3 (the same scaling was derived in the same era by Onsager and Heisenberg from different standpoints). In this case the equation of motion for the probability density f (x, L), the equation describing local probability conservation ∂ ∂f = − (R f ) (8.45) ∂L ∂x rewritten as the quasi-homogeneous first-order partial differential equation ∂f ∂R ∂f +R =−f ∂L ∂x ∂x has the characteristic equation dL =
dx R(x)
(8.46)
(8.47)
defining the (time-reversible!) deterministic dynamical system dx = R(x) dL With R(x) = 3x we integrate (using γ = 1/3) to obtain x(L) = constant L 1/3 5
(8.48)
(8.49)
In Ching’s 1996 paper there is a misidentification of an equilibrium distribution as a more general steady-state distribution.
176
Scaling, correlations, and cascades in finance and turbulence
which was first proposed statistically by Kolmogorov in 1941. Equation (8.47) simply represents the assumption of scale invariance with homogeneity (the eddies are space filling), ␦v n (L) ≈ L n/3
(8.50)
σ 2 = ␦v 2 = L 2/3
(8.51)
so that
In contrast with our dynamic modeling, this scaling law was derived by Kolmogorov by assuming that the probability density f (x, L) is scale invariant, that ␦v(bL) ␦v(L) = f (8.52) f (␦v, L) = f σ (L) σ (bL) where b is an arbitrary scale factor. This is consistent with our equations (8.48) and (8.49). This completes our discussion of the K41 model.
8.6 Levy distributions We now discuss the essential difference between the aggregation equation √
xk / n (8.53) f (x) = · · · dx1 f 1 (x1 ) . . . dxn f n (xn )␦ x − describing the density f of the distribution P(x) of the random variable n 1 xk x=√ n k=1
(8.54)
where the xk are independently distributed random variables, and the propagator for a Markov process (8.55a) g(x, t) = dx1 . . . dxn−1 g(x1 − x0 , t1 ) . . . g(x − xn−1 , tn−1 ) where x = x − x0 , tk = tk − tk−1 , and t = t − t0 . The latter describes dynamics and is the probability to obtain a specific displacement x during a specific time interval t starting from a definite initial condition x0 . The first equation (8.53), in contrast, describes a sequence of independent observations where the terms in √ ( n)x = xk do not necessarily add up to form a linked Brownian motion path even if the xk are additive, as in xk = ln( pk / p0 ) and was introduced in Chapter 3 in our discussion of the central limit theorem.
Levy distributions
177
If we substitute for each density f k in (8.53) a Green function g(xk − xk−1 , tk ), then we obtain f (x, t) n xk = dx1 . . . dxn g(x1 − x0 , t1 ) . . . g(xn − xn−1 , tn−1 )␦ x − k=1
(8.55b) The effect of the delta function constraint in the integrand is simply to yield f (x, t) = g(x, t), and thereby it reproduces the propagator equation (8.55a) exactly. In this case the aggregation of any number of identical densities g reproduces exactly the same density g. That is the propagator property. We have applied the dynamics behind this idea to financial markets. An example of the application of (8.53), in contrast, would be the use of the Gaussian returns model to show that for n statistically independent assets a portfolio of n > 1 assets has a smaller variance than does a portfolio of fewer assets. In this case the returns are not generated by a single model sde but with different parameters for different assets. Mandelbrot (1964), in contrast, thought the aggregation equation (8.53) to be advantageous in economics, where data are typically inaccurate and may arise from many different underlying causes, as in the growth populations of cities or the number and sizes of firms. He therefore asked which distributions have the property (called “stable” by him) that, with the n different densities f k in (8.53) replaced by exactly the same density f, we obtain the same functional form under aggregation, but with different parameters α, where α stands for a collection (α1 , . . . , αm ) of m parameters including the time variables tk : √
˜f (x, α) = . . . dx1 f (x1 , α1 ) . . . . dxn f (xn , αn )␦ x − xk / n (8.56) Here, the connection between the aggregate and basic densities is to be given by self-affine scaling f˜(x) = C f (λx)
(8.57)
As an example, the convolution of any number of Gaussians is again Gaussian, with different mean and standard deviation than the individual Gaussians under the integral sign. Levy had already answered the more general question, and the required distributions are called Levy distributions. Levy distributions have the fattest tails (the smallest tail exponents), making them of potential interest for economics and finance. However, in contrast with Mandelbrot’s motivation stated above, the Levy distribution does have a well-defined underlying stochastic dynamics, namely, the Levy flight (Hughes et al., 1981).
178
Scaling, correlations, and cascades in finance and turbulence
Denoting the Fourier transform by φ(k), f (x) = φ(k)eikxdk dk the use of equation (8.56) in the convolution (8.51) yields ˜f (x) = dkΦ(k)eikx = dkφ n (k)eikx
(8.58)
(8.59)
so that the scaling condition (8.56) gives φ n (k) = Cφ(k/λ)/λ
(8.60)
The most general solution was found by Levy and Khintchine (Gnedenko, 1967) to be ⎧ k tan(α/2) ⎪ α ⎪ ], α = 1 ⎨ iµk − γ |k| [1 − iβ |k| ln φ(k) = 2k ln |k| ⎪ ⎪ ⎩ iµk − γ |k| [1 + iβ ], α = 1 (8.61) |k| Denote the Levy densities by L α (x, t). The parameter β controls asymmetry, 0 < α ≤ 2, and only three cases are known in closed form: α = 1 describes the Cauchy distribution, 1 1 (8.62) L 1 (x, t) = t 1 + x 2 /t 2 α = 1/2 is Levy–Smirnov, and α = 2 is Gaussian. For 0 < α < 2 the variance is infinite. The exponent α describes very fat tails. For x large in magnitude and α < 2 we have µAα± (8.63) L α (x) ≈ |x|1+α so that the tail exponent is µ = 1 + α. The Levy distribution was applied to financial data analysis by two econophysics pioneers, Rosario Mantegna and Gene Stanley (2000). There is a scaling law for both the density and also the peak of the density at different time intervals that is controlled by the tail exponent α. For the symmetric densities ∞ 1 α dkeikx−γ k t (8.64) L α (x, t) = −∞
so that
L α (x, t) = t −1/α L α x/t 1/α , 1
(8.65)
Recent analyses of financial data
179
A data collapse is predicted with rescaled variable z = x/t 1/α . The probability density for zero return, a return to the origin after time t, is given by L α (0, t) = L α (0, 1)/t 1/α
(8.66)
Mantegna and Stanley have used the S&P 500 index to find α ≈ 1.4. Their estimate for the tail exponent is then µ = 2.4. This also yields H = 1/α ≈ 0.71 for the Hurst exponent. In this case J = 0 owing to the Levy requirement of statistical independence in x.
8.7 Recent analyses of financial data

In order to compare our analysis of financial data with analyses by other econophysicists, we first review the results from Chapter 6. We found in Chapter 6 that the intraday distribution of foreign exchange and bond prices is distributed exponentially,

f(x, t) = B e^{−ν(x−δ)},  x > δ;   A e^{γ(x−δ)},  x < δ    (8.67)

where x = ln p(t + Δt)/p(t), γ, ν = O(1/√Δt) and δ = RΔt, with R the expected return, excepting extreme returns (x very large in magnitude). We assumed a stochastic differential equation

dx = R(x, t) dt + √D(x, t) dB(t)    (8.68)

where B(t) is a Wiener process. The corresponding Fokker–Planck equation is

∂f/∂t = −∂(R(x, t) f)/∂x + (1/2) ∂²(D(x, t) f)/∂x²    (8.69)
Given the exponential density f, we then solved the inverse problem to determine the diffusion coefficient, which we found to be linear in the variable u = (x − δ)/√Δt:

D(x, t) ≈ b²(1 + ν(x − δ)),  x > δ;   b′²(1 − γ(x − δ)),  x < δ    (8.70)

where

ν ≈ 1/(b√Δt),   γ ≈ 1/(b′√Δt)    (8.71)

and b and b′ are constants. We have shown that this model, with only fat tails in the price ratio p(t)/p_0, prices options in agreement with the valuations used by traders. That is, our model of returns prices options correctly.
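As a rough illustration of the model defined by (8.68)–(8.71) — a hypothetical Monte Carlo sketch, not the calculation of Chapter 6, with made-up values for b, b′, R and the time step — one can integrate the sde by the Euler–Maruyama method and inspect the histogram of simulated returns, whose logarithm should fall off roughly linearly on both sides of δ:

# Euler-Maruyama integration of dx = R dt + sqrt(D(x, t)) dB with the
# piecewise-linear diffusion coefficient (8.70).  All parameter values are
# illustrative only.
import numpy as np

rng = np.random.default_rng(0)

R = 0.0              # expected return; delta = R * t stays at zero here
b = 0.02             # scale constant for x > delta
b_prime = 0.02       # scale constant for x < delta
T = 1.0              # horizon in arbitrary units
n_steps = 250
dt = T / n_steps
nu = 1.0 / (b * np.sqrt(T))            # eq. (8.71) evaluated at the horizon
gamma = 1.0 / (b_prime * np.sqrt(T))

def D(x, delta):
    # eq. (8.70): linear in (x - delta), different slopes on the two sides
    return np.where(x > delta,
                    b ** 2 * (1.0 + nu * (x - delta)),
                    b_prime ** 2 * (1.0 - gamma * (x - delta)))

n_paths = 100_000
x = np.zeros(n_paths)
for step in range(n_steps):
    delta = R * (step + 1) * dt
    dB = np.sqrt(dt) * rng.standard_normal(n_paths)
    x = x + R * dt + np.sqrt(D(x, delta)) * dB

# The log-histogram of simulated returns should look roughly tent-shaped
hist, edges = np.histogram(x, bins=60, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
for c, h in zip(centers[::6], hist[::6]):
    if h > 0:
        print(f"x = {c:+.3f}   log f = {np.log(h):+.2f}")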
Figure 8.2. Data collapse for the S&P 500 for logarithm of probability density vs price difference δp. From Mantegna and Stanley (2000), fig. 9.4.
Fat tails in returns x, which are apparently unnecessary for option pricing but are necessary for doing VaR, are generated by including a perturbation in (ν(x − δ))²:

D(x, t) ≈ b²(1 + ν(x − δ) + ε(ν(x − δ))²),   x > δ    (8.72)
and similarly for x < δ. We now survey and compare the results of other data analyses in econophysics. Our parameter ε is determined by the empirically observed tail exponent μ, f(x, t) ≈ x^{−μ}, defined in Chapter 4, which is both nonuniversal and time dependent (see Dacorogna et al. (2001) for details). Mantegna and Stanley (M–S) have analyzed financial data extensively and have fit the data for a range of different time scales by using truncated Levy distributions (TLDs). Their fit with a TLD presumes statistical independence in price increments δp = p(t + Δt) − p(t), not in returns x = ln p(t + Δt)/p(t). M–S reported a data collapse of the distribution of price differences with α = 1.4. Their predicted tail exponent in δp is μ = 1 + α = 2.4. The exponent α was estimated from the scaling of the peak of the distribution, not the tails, and their data collapse shows considerable noise in the tails (see Figure 8.2). Here, H = 1/α = 0.71 ≠ J = 0. In this case self-affine scaling with statistical independence is reported.
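Since the comparisons that follow turn on measured tail exponents, it is worth recalling how such an exponent is usually estimated from the largest observations. The sketch below (my own, applied to synthetic Pareto data rather than to any of the data sets discussed here) uses the standard Hill estimator; with f(x) ≈ x^{−μ} the complementary distribution falls off as x^{−(μ−1)}, and the Hill statistic estimates that survival exponent.

# Hill estimator for a power-law tail: with f(x) ~ x^(-mu) the survival function
# falls off as x^(-(mu-1)).  Synthetic Pareto data with a known exponent are used
# so the estimate can be checked; this is not the procedure of any cited study.
import numpy as np

rng = np.random.default_rng(1)

a_true = 3.0                                       # survival exponent, so mu = 4.0
sample = rng.pareto(a_true, size=100_000) + 1.0    # classical Pareto on [1, inf)

def hill_estimate(data, k):
    """Hill estimate of the survival exponent from the k largest observations."""
    order = np.sort(data)[::-1]
    return k / np.sum(np.log(order[:k] / order[k]))

for k in (200, 1000, 5000):
    a_hat = hill_estimate(sample, k)
    print(f"k = {k:5d}   a_hat = {a_hat:.3f}   mu_hat = {1.0 + a_hat:.3f}")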
Johannes Skjeltorp (1996), using Mandelbrot's R/S analysis where H = J, has found a Joseph exponent J ≈ 0.6 for the Norwegian stock index by analyzing logarithmic returns x, not price increments δp. In this analysis we have a report of self-affine scaling with persistence in x. It is clear that this analysis is qualitatively in disagreement with that of M–S (where there are no correlations and J = 0) in spite of the nearness of values of H ≈ 0.6–0.7 in both cases, but using different variables. An interesting empirically based stochastic analysis of both financial and soft turbulence data was presented by Christian Renner, Joachim Peinke, and Rudolf Friedrichs (2001), who assumed a Markov process in price increments δp instead of returns x. They, in contrast with M–S, found no data collapse of the distribution of price differences but reported instead evidence at small price intervals for a Fokker–Planck description. The drift and diffusion coefficients found there via direct analysis of first and second moments for price increments y = δp,

R(y, t) = −γ(t) y,   D(y, t) = α(t) + β(t) y²    (8.73)
do not agree with the drift and diffusion coefficients in our model of returns, even in the limit of approximating returns x by price increments y. Also, the predicted local volatility differs from ours. Like the fat tails in the M–S analysis, the data from which their formulae for drift and diffusion were extracted are extremely noisy (large error bars) for larger price increments. In fact, the corresponding probability density has no fat tails at all: it is asymptotically lognormal for large y. With the exponential density we found the diffusion coefficient to be logarithmic in price. We approximated the region near the peak of f(x, t) simply by a discontinuity. Renner et al., and also M–S, treat the region near the origin more carefully and observe that the density tends to round off relatively smoothly there. Presumably, the quadratic diffusion coefficient of Renner et al. describes the region very near the peak that we have treated as discontinuous, but is invalid for larger returns where the density is exponential in x. Presumably, their negative drift term is valid only very near the peak as well. The claim in Renner et al. (see their equation (27)) that they should be able to derive asymptotically a fat-tailed scaling exponent from their distribution is based on the assumption that their distribution approaches an equilibrium one as time goes to infinity. First, let us rewrite the Fokker–Planck equation as

∂f(y, t)/∂t = −∂j(y, t)/∂y    (8.74)
where

j(y, t) = R(y, t) f(y, t) − (1/2) ∂[D(y, t) f(y, t)]/∂y    (8.75)
Whenever R and D are time independent (which is not the case in (8.73)) we can set the left-hand side of (8.75) equal to zero to obtain the equilibrium density

f(y)_equil = (C/D(y)) e^{2∫ R(y)/D(y) dy}    (8.76)
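A small numerical sketch of what (8.76) produces (my own illustration; the constants below are invented): for time-independent coefficients of the form (8.73), R(y) = −γ_0 y and D(y) = α_0 + β_0 y², the exponent integral can be done by simple quadrature and the resulting equilibrium density falls off as a power of |y| at large |y|.

# Evaluate the equilibrium density (8.76), f_equil(y) = (C/D(y)) exp(2 * Int R/D dy),
# for illustrative time-independent coefficients of the form (8.73).
# gamma0, alpha0, beta0 are made-up constants.
import numpy as np

gamma0, alpha0, beta0 = 1.5, 0.5, 1.0

def R(y):
    return -gamma0 * y

def D(y):
    return alpha0 + beta0 * y ** 2

y = np.linspace(-50.0, 50.0, 200_001)
dy = y[1] - y[0]

exponent = 2.0 * np.cumsum(R(y) / D(y)) * dy      # additive constant absorbed in C
f = np.exp(exponent - exponent.max()) / D(y)
f /= np.sum(f) * dy                               # normalize to unit area

for yy in (5.0, 10.0, 20.0, 40.0):
    i = np.searchsorted(y, yy)
    print(f"y = {yy:5.1f}   f_equil = {f[i]:.3e}")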
The equilibrium distribution of the stochastic equation of Renner et al., were α, β, and γ t-independent, would have fat tails f(y) ≈ y^{−γ/β} for large y. However, one can show from the moment equations generated by (8.73) that the higher moments are unbounded as t increases, so that statistical equilibrium is not attained at any time. That is, their time-dependent solution does not approach (8.76) as t goes to infinity. Since their initially nonequilibrium distribution cannot approach statistical equilibrium, there are no fat tails predicted by it. This contrasts with the conclusion of Didier Sornette (2001) based on uncontrolled approximations to solutions of the model. In fact, at large t the model distribution based on (8.73), with time transformation dτ = β(t)dt, is lognormal in δp. Renner et al. (2001) also reported an information cascade from large to small time scales, requiring backward-time diffusion in t, but the evidence for this effect is not convincing. Given the Fokker–Planck equation forward in time, there is always the corresponding Kolmogorov equation backward in time describing the same data. However, this has nothing to do with an information cascade. Lei-Han Tang (2000) found that very high-frequency data for only one time interval t ≈ 1 s on the distribution of price differences could be fit by an equilibrium distribution. He also used the method of extracting empirical drift and diffusion coefficients for small price increments (in qualitative agreement with Renner et al., and with correspondingly very noisy data for the larger price increments),

R(y) = −ry,   D(y) = Q(y² + a²)^{1/2}    (8.77)
where y = δp is the price increment, but then did the equivalent of assuming that one can set the probability current density j(y, t) equal to zero (equilibrium assumption) in a Fokker–Planck description. Again, it was assumed that both R and D are t-independent and this assumption did not lead to problems fitting the data on the single time scale used by Tang. One could use Tang's solution as the initial condition and ask how it evolves with time via the Fokker–Planck equation (8.75). The resulting distribution disagrees with
Renner et al.: Tang's sde yields the S–U–O process for small y, but has a diffusion coefficient characteristic of an exponential distribution in price increments for large y. Also, Tang worked within the limit (trading times less than 10 min) where initial correlations still matter, where Markov approximations may or may not be valid. In addition, Lisa Borland (2002) has fit the data for some stock prices using a Tsallis distribution. The dynamics assumes a form of "stochastic feedback" where the local volatility depends on the distribution of stock prices. The model is dynamically more complicated than ours, and is based on assumptions about a thermodynamic analogy that we find unconvincing. The difference between her model and ours can be tested by measuring the diffusion coefficient directly, at least up to moderate values of the logarithmic return x. As we have shown in Chapter 6, the empirical distribution, or any model of the empirical distribution, can be used to price options. This provides an extra test on any empirically based model. Given the empirical returns distribution or any model of it described by a probability density f(x, t), calls are priced as

C(K, p, t) = e^{−r_d t} ⟨(p_T − K) ϑ(p_T − K)⟩ = e^{−r_d t} ∫_{ln(K/p_0)}^{∞} (p e^x − K) f(x, t) dx    (8.78)
where K is the strike price and t is the time to expiration. The meaning of (8.78) is simple: x = ln pT / p, where p is the observed asset price at time t and pT is the unknown asset price at expiration time T. One simply averages over pT using the empirical density, and then discounts money back to time t at rate b. A corresponding equation predicts the prices of puts. Any proposed distribution or model can therefore be tested further by using it to predict prices of puts and calls, and then comparing with option prices used by traders. Another test is to use a model to do VaR. A more direct test is to measure the diffusion coefficient D(x, t) directly. This will require a direct measurement of conditional moments in terms of logarithmic returns x rather than in terms of price increments. Michel Dacorogna and his associates at the former Olsen & Associates (Dacorogna et al., 2001; Blum and Dacorogna, 2003), all acknowledged experts in foreign exchange statistics, have studied the distribution of logarithmic returns x and found no data collapse via self-affine scaling. They found instead that the distribution changes with time in a nonscaling way, excepting extreme returns where the (nonuniversal) tail exponents are typically µ ≈ 3.5 to 7.5 in magnitude, beyond the Levy range where 2 ≤ µ ≤ 3. It is clear that further and more difficult data analyses are required in order to resolve the differences discussed here.
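To make the option-pricing test (8.78) concrete, here is a short numerical sketch (mine; the interest rate, time to expiration, and density parameters are invented, and the exponential density is written in the simplified symmetric form with A = B fixed by continuity and normalization). Any other empirical or model density f(x, t) can be substituted for the function below.

# Price a European call with (8.78): discount the average of (p e^x - K)+ over a
# returns density f(x, t).  Here f is the two-sided exponential (8.67) with A = B
# fixed by continuity and normalization; all parameter values are invented.
import numpy as np

def exponential_density(x, t, b=0.02, R=0.0):
    delta = R * t
    nu = 1.0 / (b * np.sqrt(t))
    gamma = 1.0 / (b * np.sqrt(t))
    A = gamma * nu / (gamma + nu)      # continuity at x = delta plus normalization
    return np.where(x > delta,
                    A * np.exp(-nu * (x - delta)),
                    A * np.exp(gamma * (x - delta)))

def call_price(p, K, t, r_d=0.03):
    x = np.linspace(np.log(K / p), np.log(K / p) + 5.0, 200_001)
    dx = x[1] - x[0]
    payoff = p * np.exp(x) - K
    return np.exp(-r_d * t) * np.sum(payoff * exponential_density(x, t)) * dx

p0 = 100.0
for K in (90.0, 100.0, 110.0):
    print(f"strike K = {K:6.1f}   call value = {call_price(p0, K, t=0.25):8.3f}")

A corresponding integral over x < ln(K/p) with payoff K − p e^x prices the puts.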
Appendix C. Continuous time Markov processes

We offer here an alternative derivation (Stratonovich, 1963) of the Fokker–Planck equation that also shows how one can treat more general Markov processes that violate the Fokker–Planck assumptions. Beginning with the Markov equation (3.111)

f(x, t) = ∫_{−∞}^{∞} g(x, t | x_0, t_0) f(x_0, t_0) dx_0    (C1)
and the characteristic function of the Green function/transition probability,

Θ(k, x_0, t) = ⟨e^{ik(x−x_0)}⟩ = ∫ e^{ik(x−x_0)} g(x, x_0; t) dx    (C2)

we can rewrite (C1) as

f(x, t) = (1/2π) ∫∫ e^{−ik(x−x_0)} Θ(k, x; t) f(x) dk dx_0    (C3)

where f(x) = f(x, t_0). Expanding the characteristic function in a power series to obtain the moments,

Θ(k, x_0, t) = 1 + Σ_{s=1}^{∞} ((ik)^s/s!) ⟨(x − x_0)^s⟩    (C4)

we then arrive after a few manipulations at

f(x, t) = f(x, t_0) + Σ_{s=1}^{∞} (1/s!) (−∂/∂x)^s [⟨(x − x_0)^s⟩_0 f(x, t)]    (C5)
where ⟨…⟩_0 denotes that the average is now over x_0, not x. If we assume that the quantities

K_n(x, t) = ⟨(x − x_0)^n⟩_0/Δt,   Δt → 0    (C6)

are all well defined, then we obtain

∂f(x, t)/∂t = Σ_{s=1}^{∞} (1/s!) (−∂/∂x)^s [K_s(x, t) f(x, t)]    (C7)
We obtain the Fokker–Planck equation from (C7) if we then assume that the first two moments vanish like Δt, while all higher moments vanish faster than Δt (K_1 and K_2 are finite, but K_n = 0 for n > 2).
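The conditional moments (C6) also suggest the direct empirical test advocated in Section 8.7: estimate K_1 and K_2 from a time series by binning the increments on the current value of x. The sketch below (my own) does this for data simulated from a known Ornstein–Uhlenbeck-type sde, so that the estimates can be compared with the exact drift and diffusion coefficients; applied to market returns, the same procedure would measure R(x, t) and D(x, t) directly.

# Estimate the Kramers-Moyal coefficients K_1 and K_2 of (C6) from a time series
# by conditional averaging of increments, binned on the current value x.
# The series is simulated from a known sde so the estimates can be checked.
import numpy as np

rng = np.random.default_rng(2)

a, D0, dt, n = 1.0, 0.4, 0.001, 1_000_000      # dx = -a x dt + sqrt(D0) dB
x = np.empty(n)
x[0] = 0.0
kicks = np.sqrt(D0 * dt) * rng.standard_normal(n - 1)
for i in range(1, n):
    x[i] = x[i - 1] - a * x[i - 1] * dt + kicks[i - 1]

increments = np.diff(x)
bins = np.linspace(-1.5, 1.5, 31)
idx = np.digitize(x[:-1], bins)

# Outer bins are sparsely populated, so those estimates are noisier
print("  x_bin     K1_est    K1_exact    K2_est   K2_exact")
for b in (8, 12, 16, 20, 24):
    mask = idx == b
    x_mid = 0.5 * (bins[b - 1] + bins[b])
    k1 = increments[mask].mean() / dt
    k2 = (increments[mask] ** 2).mean() / dt
    print(f"{x_mid:+7.2f}   {k1:+8.3f}   {-a * x_mid:+8.3f}   {k2:7.3f}   {D0:7.3f}")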
9 What is complexity?
Economists teach that markets can be described by equilibrium. Econophysicists teach that markets are very far from equilibrium and are dynamically complex. In our analysis of financial market data in Chapters 6 and 7 we showed that equilibrium is never a good approximation, that market equilibrium does not and cannot occur, but we did not use any idea of complexity in describing financial markets dynamically. Where, then, does complexity enter the picture if all we have needed so far is simple nonstationary stochastic dynamics? The complexity of financial markets is hidden in part in the missing theory of the expected return R, which we treated in Chapter 6 as piecewise constant (constant within a trading day), but we neither extracted R empirically nor modeled it theoretically. Imagine trying to write down an equation of motion for R. It is easy to construct simple deterministic and/or stochastic models, and all of them are wrong empirically. In order to get the time development of R right, you have to describe the collective sentiment of the market within a given time frame, and that collective sentiment can change suddenly due to political and/or business news.1 The other part of complexity of financial markets is that the empirical distribution is not fixed once and for all by any law of nature. Rather, it is also subject to change with agents’ collective behavior, but the time scale for the entire distribution to change its functional form can be much greater than the time scale for changes in the expected return. The only empirical method for estimating the expected return is to assume that the future will be like the past, which ignores complexity altogether. Here, clearly, we are not referring to the ever-present diffusion that broadens a given distribution but about a sudden change, for example, as from Gaussian to exponential returns, or from exponential to some other distribution. This sort of change cannot be anticipated or described by a simple stochastic theory of the sort employed in this text. 1
To get some feeling for the level of complication that one meets in trying to model the expected return in any realistic way, we recommend the paper by Arthur (1995).
Even though we have not squared off against complexity in this text, we certainly would agree that market growth is likely to be understood as complex, but then what exactly do we mean by “complex”? Does the word have definite dynamical meaning? How does complexity differ from chaos? How does it differ from randomness? Can scaling describe complexity? Because the word “complexity” is often used without having been clearly defined, the aim of this final chapter is to try to delineate what is complex from what is not. Some confusion arises from the absence of a physically or biologically motivated definition of complexity and degrees of complexity. The only clear, systematic definitions of complexity that have been used so far in physics, biology, and nonlinear dynamics are definitions that were either taken from, or are dependent on, computer theory. The first idea of complexity to arise historically was that of the highest degree, equivalent to a Turing machine. Ideas of degrees of complexity, like how to describe the different levels of difficulty of computations or how to distinguish different levels of complexity of formal languages generated by automata, came later. 9.1 Patterns hidden in statistics We begin the discussion with binary strings, and discuss below and in Section 9.3 how we can regard them as representing either numbers or one-dimensional patterns. A definite pattern in finance data would violate the EMH and could in principle be exploited to make unusual profits in trading. We could search for patterns in economic data as follows: suppose that we know market data to three-decimal accuracy, for example, after rescaling all prices p by the highest price so that 0 ≤ p ≤ 1. This would allow us to construct three separate coarse-grainings: empirical histograms based on 10 bins, 100 bins, and 1000 bins. Of course, because the last digit obtained empirically is the least trustworthy, we should expect the finest coarse-graining to be the least reliable one. In the 10 coarse-graining each bin is labeled by one digit (0 through 9), while in the 1000 coarse-graining each bin is labeled by a triplet of digits (000 through 999). An example of a pattern would be to record the time sequence of visitation of the bins by the market in a given coarsegraining. That observation would produce a sequence of digits, called a symbol sequence. The question for market analysis is whether a pattern systematically nearly repeats itself. Mathematically well-defined symbolic dynamics is a signature of deterministic chaos, or of a deterministic dynamical system at the transition to chaos. First, we present some elementary number theory as the necessary background. We can restrict ourselves to numbers between zero and unity because, with those
numbers expressed as digit expansions (in binary, or ternary, or …) all possible one-dimensional patterns that can be defined to exist abstractly exist there. Likewise, all possible two-dimensional patterns arise as digit expansions of pairs of numbers representing points in the unit square, and so on. Note that by "pattern" we do not imply a periodic sequence; nonperiodic sequences are included. We can use any integer base of arithmetic to perform calculations and construct histograms. In base μ we use the digits ε_k = 0, 1, 2, …, μ − 1 to represent any integer x as x = Σ_k ε_k μ^k. In base 10 the digit 9 is represented by 9, whereas in base two the digit 9 is represented by 1001.0, and in base three 9 is represented by 100.0. Likewise, a number between zero and one is represented by x = Σ_k ε_k μ^{−k}. We will mainly use binary expansions (μ = 2) of numbers in the unit interval in what follows, because all possible binary strings/patterns are included in that case. From the standpoint of arithmetic we could as well use ternary, or any other base. Finite-length binary strings like 0.1001101 (meaning 0.100110100000000 … with the infinite string of 0s omitted) represent rational numbers that can be written as a finite sum of powers of 2^{−n}, like 9/16 = 1/2 + 1/2^4. Periodic strings of infinite length represent rational numbers that are not a finite sum of powers of 2^{−n}, like the number 1/3 = 0.010101010101 …, and vice versa. Nonperiodic digit strings of infinite length represent irrational numbers, and vice versa (Niven, 1956). For example, √2 − 1 = 0.0110101000001001 …. This irrational number can be computed to as high a digital accuracy as one pleases by the standard school-boy/girl algorithm. We also know that every number in the unit interval can be formally represented by a continued fraction expansion. However, to use a continued fraction expansion to generate a particular number, we must first know the initial condition or "seed." As a simple example, one can solve for the square root of any integer easily via a continued fraction formulation: with √3 = 1 + x, so that 0 < x < 1, we have the continued fraction x = 2/(2 + x). In this formula the digit 2 in the denominator is the seed (initial condition) that allows us to iterate the continued fraction, x = 2/(2 + 2/(2 + ···)), and thereby to construct a series of rational approximations whereby we can compute x = √3 − 1 to any desired degree of decimal accuracy. Turing (1936) proved via an application of Cantor's diagonal argument (Hopkin and Moss, 1976) that for almost all numbers that can be defined to "exist" abstractly in the mathematical continuum there is no seed: almost all numbers (with measure one) that can be defined to exist in the mathematical continuum are both irrational and not computable via any possible algorithm. The measure zero set of irrational numbers that have an initial condition for a continued fraction expansion was called computable by Turing. Another way to say it is that Turing proved that the set of all algorithms is countable, and is in one-to-one correspondence with the
integers. This takes us to the original idea of maximum computational complexity at the level of the Turing machine.

9.2 Computable numbers and functions

Alan Turing mechanized the idea of computation classically by defining the Turing machine. A Turing machine can in principle be used to compute any computable number or function (Turing, 1936). We can recursively construct a computable number or function, digit by digit, using only integers in an algorithm. The algorithm can be used to generate as many digits as one wants, within the limits set only by computer time. Examples are the continued fraction expansion for √2 and the grade-school algorithm for √2. An example of recursion is the logistic map x_n = Dx_{n−1}(1 − x_{n−1}) with control parameter D. Recursion alone doesn't guarantee computability: if the initial condition x_0 is noncomputable, or if D is noncomputable, then so are all of the iterates x_n for n > 0. If, however, we choose as initial condition a computable number like x_0 = √2 − 1, and a computable control parameter like D = 4, then by expressing both the initial condition and the map using binary expansions x_n = 0.ε_1(n) … ε_N(n) …, where D = 4 = 100 in binary, the logistic map defines a simple automaton/machine from which each point of the orbit x_0, x_1, …, x_n, … can be calculated to as many decimals as one wants, always within the limits set by computation time (McCauley, 1993, 1997a). Information is lost only if one truncates or rounds off an iterate, but such mistakes are unnecessary (in grade school, such mistakes are penalized by bad grades, whereas scientific journals during the past 25 years have typically rewarded them). We have just described an example of an exact, computable chaotic trajectory calculated with controlled precision. A noncomputable number or function is a number or function that cannot be algorithmically generated digit by digit. No one can give an example of a noncomputable number, although such numbers "fill up" the continuum (are of measure one). If we could construct market statistics by a deterministic model or game, then market statistics would be algorithmically generated. This would not necessarily mean that the model or game is complex. But what is the criterion for complexity? Let us survey next a popular attempt to define complexity.

9.3 Algorithmic complexity

The idea of algorithmic complexity seems both simple and appealing. Consider a binary string/pattern of length n. The definition of the algorithmic complexity of the string is the length K_n of the shortest computer program that can
generate the string. The algorithm is the computer program. To keep the discussion focused, let us assume that machine language is used on a binary computer. The longest program of interest is: to write the digits one after the other, in which case K_n = n. The typical sort of example given in popular papers on algorithmic information theory is that 101010101010 should be less complex than a nonperiodic string like 100100011001, for example, but both strings are equally simple, and many longer finite strings are also simple. For example, seen as binary fractions, 0.1010 = 5/8 whereas 0.1001 = 9/16. Every finite binary string can be understood as either a binary fraction or an integer (101.0 = 5 and 10001.0 = 17, for example). Instead of writing the string explicitly, we can state the rule for any string of finite length as follows: write the binary expansion of the integer or divide two integers in binary. All rational numbers between zero and unity are specified by an algorithm that states: divide integer P by integer Q. These algorithms can differ in length because P and Q can require different numbers of bits than do P′ and Q′. For large Q (or for large P and large Q) the length of the program can become arbitrarily long, on the order of the number of bits required to specify Q. But what about infinite-length nonperiodic strings? One can prove that almost all numbers (in the sense of measure one), written as digit expansions in any integer basis of arithmetic, are "random," meaning for one thing that there exists no algorithm by which they can be computed digit by digit (Martin-Löf, 1966). Such digit strings are sometimes called algorithmically complex. But this case is not at all about the complexity of algorithms. It is instead about the case where no algorithm exists, the singular case where nothing can be computed. Many authors notwithstanding, this case is uninteresting for science, which requires falsifiable propositions. A falsifiable proposition is one that, among other things, can be stated in finite terms and then tested to within the precision possible in real measurements. We can summarize by saying that many periodic binary sequences are simple, and that some nonperiodic strings are also simple because the required algorithm is short, like computing √2. From this perspective, nonperiodic computable sequences that are constructed from irreducibly very long algorithms are supposed to be more complex, and these sequences can be approximated by rational sequences of long period. Unfortunately, this definition still does not give us any "feeling" for, or insight into, what complexity really means physically, economically, or biologically. Also, the shortest algorithm that generates a given sequence may not be the one that nature (or the market) uses. For example, one can generate pictures of mountain landscapes via simple algorithms for self-affine fractals, but those algorithms are not derived from physics or geology, and in addition provide no insight whatsoever into how mountains actually are formed.
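The "divide integer P by integer Q in binary" rule mentioned above takes only a few lines to state as a program; this sketch (my own) generates the binary digits by long division, and the examples reproduce the strings quoted in the text:

# Binary expansion of P/Q (0 < P < Q), digit by digit, by long division.
# The "program" is these few lines plus the bits of P and Q themselves.
def binary_digits(P, Q, n_digits):
    digits, remainder = [], P
    for _ in range(n_digits):
        remainder *= 2
        digits.append(remainder // Q)
        remainder %= Q
    return "".join(str(d) for d in digits)

print("5/8  = 0." + binary_digits(5, 8, 8))      # terminating: 0.1010...
print("9/16 = 0." + binary_digits(9, 16, 8))     # terminating: 0.1001...
print("1/3  = 0." + binary_digits(1, 3, 12))     # periodic:    0.010101...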
What about the idea of complexity from both simple seeds and simple algorithms? The logistic map is not complex but generates chaotic orbits from simple binary initial conditions, like x0 = 1/8. That is, the chaos is “manufactured” from simplicity (1/8 = 0.001) by a very simple algorithm. Likewise, we know that there are one-dimensional cellular automata that are equivalent to a Turing machine (Wolfram, 1983, 1984). However, the simpler the machine, the more complicated the program. There is apparently no way to get complexity from simple dynamics plus a simple initial condition. 9.4 Automata Can every mathematics problem that is properly defined be solved? Motivated by this challenging question posed by Hilbert, Turing (1936) mechanized the idea of computation and generalized the notion of typing onto a ribbon of unlimited length to define precisely the idea of a universal computer, or Turing machine. The machine is capable of computing any computable number or function and is a formal abstraction of a real, finite computer. A Turing machine has unlimited memory. By proving that almost all numbers that can be defined to exist are noncomputable, Turing proved that there exist mathematical questions that can be formulated but not definitively answered. For example, one can construct computer programs that do not terminate in finite time to yield a definite answer, representing formally undecidable questions. von Neumann (1970a) formalized the idea of abstract mechanical systems, called automata, that can be used to compute. This led to a more useful and graphic idea of abstract computers with different degrees of computational capability. A so-called “universal computer” or universal automaton is any abstract mechanical system that can be proven to be equivalent to a Turing machine. The emphasis here is on the word mechanical, in the sense of classical mechanical: there is no randomness in the machine itself, although we can imagine the use of random programs in a deterministic machine. One can generate a random program by hooking a computer up to radioactive decays or radio noise, for example. In thinking of a computer as an automaton, the automaton is the dynamical system and the program is the initial condition. A universal binary computer accepts all possible binary programs. Here, in contrast, is an example of a very simple automaton, one that is far from universal: it accepts only two different programs and can compute only very limited results. Starting with the binary alphabet {a,b} and the rule R whereby a is replaced by ab and b by ba, we can generate the nonperiodic sequence a, ab, abba, abbabaab, abbabaabbaababba, . . .. The finite automaton in Figure 9.1 computes the Thue–Morse sequence in the following way. Consider the
Figure 9.1. The two-state automaton that generates the Thue–Morse sequence.
sequence of programs 0, 1, 10, 11, 100, 101, 110, 111, 1000, …, to be run sequentially. Before running each separate program, we agree to reset the machine in the state a. The result of all computations is recorded as the combined sequence of outputs for each input, yielding the Thue–Morse sequence: abbabaabbaababba …. Note that the machine simply counts the number of 1s in each program mod 2, and that the separate programs are the integers 0, 1, 2, 3, …, written in base 2. Addition can be performed on a finite automaton, but multiplication, which requires increasing the precision (increasing the number of bits held in the registers and output) rapidly during the calculation, requires an automaton of unlimited size (Hopkin and Moss, 1976). Likewise, deterministic chaos requires increasing the precision within which the initial condition is specified at a rate determined by the largest Liapunov exponent λ. For an iterated map x_n = f(x_{n−1}) with λ = ln 2, for example, we must increase the number of bits specified in the initial condition x_0 (written as a binary string) at the rate of one bit per iteration of the map. As an example, if we choose x_0 = 1/8 for the logistic map x_n = 4x_{n−1}(1 − x_{n−1}) and write all numbers in binary (4 = 100, for example), then we obtain the orbit x_0 = 0.001, x_1 = 0.0111, x_2 = 0.111111, x_3 = 0.0000111111, …. The effect of the Liapunov exponent in D = 4 = e^{2λ} = 100 is to shift the third bit of the simple product x_{n−1}(1 − x_{n−1}) into the first bit of x_n, and also tells us the rate at which we must expect to increase the precision of our calculation per iteration in order to avoid making a mistake that eventually will be propagated into an error in the first bit. This orbit is chaotic but it is neither random (it is pseudo-random) nor is it complex: the required algorithm is simple. The level of machine complexity required for computing deterministic chaos here is simply the level of complexity required for multiplication, plus adequate memory for storing digit strings that grow in length at the rate N_n ≈ 2^n N_0, where N_0 is the number of bits in the initial condition (McCauley, 1993). How do we know when we have a complex pattern, or when we have complex dynamics? In the absence of a physically motivated definition of degrees of complexity, we can only fall back on definitions of levels of complexity in computer science, like NP-completeness (Hopcroft and Ullman, 1979). There is also
the Chomsky hierarchy for formal language recognition, which starts with a very simple automaton for the recognition of simple inputs, and ends with a Turing machine for arbitrary recursive languages (Feynman, 1996). Next, we distinguish chaos from randomness and from complexity, but we will see that there is some overlap between chaos and complexity. This distinction is necessary because complexity is sometimes confused with randomness in the literature. 9.5 Chaos vs randomness vs complexity Ideas of computational complexity have arisen within physics both from the standpoint of nonlinear dynamics2 and from statistical physics.3 A deterministic dynamical system cannot generate truly random numbers. Deterministic chaos, which we will simply call chaos, is pseudo-randomness of bounded trajectories generated via positive Liapunov exponents. The origin of pseudo-randomness always lies in an algorithm. In deterministic chaos the algorithm is discovered by digitizing the underlying dynamical system and initial conditions in an integer base of arithmetic. This is not at all the same as truncating power series solutions of differential equations for computation and then using floating point arithmetic. In contrast, randomness, for example white noise or a Wiener process, is not algorithmically generated in a stochastic differential equation.4 Complexity is not explained by either deterministic chaos or by randomness, but is a phenomenon that is distinct from either. Deterministic dynamics generating chaotic behavior is approximated by easily predictable regular behavior over very short time scales, whereas random behavior is always unpredictable at even the shortest observable time scales. The same can be said of complexity generated by a deterministic dynamical system: over short enough time scales all deterministic systems, including chaotic and complex ones, are trivially predictable. Stochastic processes, in contrast, are unpredictable even over the shortest time scales. Scaling is sometimes claimed to describe complexity, but scaling is an idea of simplicity: scaling is the notion that phenomena at shorter length scales look statistically the same, when magnified and rescaled, as do phenomena at larger length scales. In other words: no surprises occur as we look at smaller and smaller length scales. In this sense, the Mandelbrot set is an example of simplicity. So is the 2 3 4
See Fredkin and Toffoli (1982) for computation with billiard balls. Idealized models of neural networks are based on the Hopfield model (Hopfield, 1994; Hopfield and Tank, 1986). This should not be confused with the fact that computer simulations of stochastic processes are by design algorithmic and always are merely pseudo-random. Simulations should not be confused with real experiments and observations.
invariant set of the logistic map in the chaotic regime, where a generating partition that asymptotically obeys multifractal scaling has been discovered. Where, then, does complexity occur in deterministic dynamics? Edward Fredkin and Tomasso Toffoli showed in 1982 that billiard balls with reflectors (a chaotic system) can be used to compute reversibly, demonstrating that a Newtonian system is capable of behavior equivalent to a Turing machine. The difficulty in trying to use this machine in practice stems from the fact that the system is also chaotic: positive Liapunov exponents magnify small errors very rapidly. In fact, billiard balls have been proven by Ya. G. Sinai to be mixing, giving us an example of a Newtonian system that is rigorously statistical mechanical. In 1993 Moore constructed simple deterministic maps that are equivalent to Turing machines.5 In these systems there are no scaling laws, no symbolic dynamics, no way of inferring the future in advance, even statistically. Instead of scaling laws that tell us how the system behaves at different length scales, there may be surprises at all scales. In such a system, the only way to know the future is to choose an initial condition, compute the trajectory and see what falls out. Given the initial condition, even the statistics generated by a complex system cannot be known in advance. In contrast, the statistics generated by a chaotic dynamical system with a generating partition6 can be completely understood and classified according to classes of initial conditions. Likewise, there is no mystery in principle about which statistical distribution is generated by typical stochastic differential equations. However, the element of complexity can perhaps be combined with stochastic dynamics as well. Complexity within the chaotic regime is unstable due to positive Liapunov exponents, making the systems unreliable for building machines. Therefore, we have the current emphasis in the literature on the appearance of complexity at the transition to chaos. In that case there may be infinitely many positive Liapunov exponents representing unstable equilibria (as in a period-doubling sequence), but the emphasis is on a nonperiodic invariant set with vanishing Liapunov exponents. For the logistic map, for example, that set is a zero-measure Cantor-like set.
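The exact, computable chaotic trajectory described in Sections 9.2 and 9.4 is easy to exhibit. The sketch below (my own, not the author's program) iterates x_n = 4x_{n−1}(1 − x_{n−1}) in exact rational arithmetic from x_0 = 1/8; the orbit agrees with the binary iterates quoted earlier, and the number of bits needed to store x_n roughly doubles at each step, N_n ≈ 2^n N_0, which is the storage price of the positive Liapunov exponent λ = ln 2.

# Iterate the logistic map x_n = 4 x_{n-1}(1 - x_{n-1}) in exact rational
# arithmetic from x_0 = 1/8.  The orbit is computable with controlled precision,
# but the stored digit strings roughly double in length at each iteration.
from fractions import Fraction

x = Fraction(1, 8)
for n in range(8):
    bits = x.denominator.bit_length()
    print(f"n = {n}   x_n = {x}   denominator bits = {bits}")
    x = 4 * x * (1 - x)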
9.6 Complexity at the border of chaos In statistical physics universal scaling exponents arise at order–disorder transitions. For example, the transition from normal, viscous flow to superfluid flow is characterized by scaling exponents that belong to the same universality class as those 5 6
See Siegelmann (1995) for a connection with the Hopfield model. A generating partition is a natural, unique coarse-graining of phase space generated by the dynamical system. For chaotic one-dimensional maps, the generating partition, if it exists, is discovered via backward iteration of the (always multivalued) map.
for other physical systems with the same symmetry and dimension, like the planar Heisenberg ferromagnet on a three-dimensional lattice. The scaling exponents describing the vanishing of the order parameter at the critical point, the divergence of the susceptibility, and the behavior of other singular thermodynamic quantities, are called critical exponents. A related form of scaling exponent universality has also been discovered for dynamical systems at the transition to chaos where the systems under consideration are far from thermal equilibrium (Feigenbaum, 1988a, b). For example, every map in the universality class of iterated maps defined by the logistic map generates the same scaling exponents at the transition to chaos. The same is true for the circle map universality class. This kind of universality is formally analogous to universal scaling that occurs at a second-order phase transition in equilibrium statistical physics. It is known that limited computational capability can appear in deterministic dynamical systems at the borderline of chaos, where universal classes of scaling exponents also occur. At the transition to chaos the logistic map defines an automaton that can be programmed to do simple arithmetic (Crutchfield and Young, 1990). It is also known that the sandpile model, at criticality, has nontrivial computational capability (Moore and Nilssen, 1999). Both of these systems produce scaling laws and are examples of computational capability arising at the borderline of chaos, although the scaling exponents do not characterize the computational capability generated by the dynamics. Moore showed that simple-looking one- and two-dimensional maps can generate Turing machine behavior, and speculated that the Liapunov exponents vanish asymptotically as the number of iterations goes to infinity, which would represent the borderline of chaos (Moore, 1990, 1991; Koiran and Moore, 1999). There is interest within statistical physics in self-organized criticality (SOC), which is the idea of a far-from equilibrium system where the control parameter is not tuned but instead dynamically adjusts itself to the borderline of chaos (Bak et al., 1987, 1988). The approach to a critical point can be modeled simply (Melby et al., 2000). The logistic map, for example, could adjust to criticality without external tuning if the control parameter would obey a law of motion Dm = Dc − a m (Dc − Dm−1 ) with −1 < a < 1 and m = 1, 2, . . . , for example, where Dc is the critical value. One can also try to model self-adjustment of the control parameter via feedback from the map. However, identifying real physical dynamical systems with self-organized behavior seems nontrivial, in spite of claims that such systems should be ubiquitous in nature. Certain scaling laws have been presented in the literature as signaling evidence for SOC, but a few scaling laws are not an adequate empirical prescription: scaling alone does not tell us that we are at a critical point, and we cannot expect critical exponents to be universal except at a critical point. Earthquakes, turbulence, and
economics have been suggested as examples of SOC, but fluid turbulence, as we have discussed in Chapter 4, does not seem to be an example of SOC. 9.7 Replication and mutation We will now concentrate on the idea of “surprises,” which Moore (1990) has proposed as the essence of complexity. Surprises are also the nature of changes in market sentiment that lead to booms and busts. But first, some thoughts that point in the direction of surprises from computer theory and biology. From the standpoint of our perspective from physics, complex systems can do unusual things. One of those is self-replication, an idea that is foreign to a physicist but not to a biologist (Purves et al., 2000). von Neumann (1970a), who invented the first example of an abstract self-replicating automaton, also offered the following rough definition of complexity: a system is simple when it is easier to describe mathematically than to build (chaos in the solar system, for example). A system is called complex if it is easier to build or produce it than to describe it mathematically, as in the case of DNA leading to an embryo. von Neumann’s original model of a self-replicating automaton with 32 states was simplified to a two-state system by McCullough and Pitts (Minsky, 1967). The model was later generalized to finite temperatures by Hopfield (1994) and became the basis for simple neural network models in statistical physics. Both bacteria and viruses can replicate themselves under the right conditions, but we cannot know in advance the entirely new form that a virulent bacterium might take after mutation. There, we do not have the probabilities for different possible forms for the bacterium, as in the tosses of a die. We have instead the possibility of an entirely new form, something unexpected, occurring via mutation during the time evolution of the dynamics. The result of fertilizing an egg with a sperm is another example of complexity. The essence of complexity is unpredictability in the form of “surprises” during the time evolution of the underlying dynamics. Scaling, attractors, and symbolic dynamics cannot be used to characterize complexity. From the standpoint of surprises as opposed to cataloging probabilities for a set of known, mutually exclusive alternatives, we can also see scientific progress as an example of “mutations” that may represent an underlying complex dynamical process: one cannot know in advance which new scientific discoveries will appear, nor what new technologies and also economies they may give birth to. But one thing is sure: the dominant neo-classical idea of “equilibrium” is useless for attempting to describe economic growth, and is not even in the same ballpark as economic growth that is complex. There are nonmainstream economists who study both automata and games (Dosi, 2001). Game theory, particularly the use of Nash equilibria, is used primarily by mainstream economic theorists (Gibbons, 1992) and has had very strong influence
on the legal profession at high levels of operation (Posner, 2000). Nash equilibria have been identified as neo-classical, which partly explains the popularity of that idea (Mirowski, 2002). In econophysics, following the inventive economist Brian Arthur, the minority game has been extensively studied, with many interesting mathematical results. von Neumann first introduced the idea of game theory into economics, but later abandoned game theory as “the answer” in favor of studying automata. A survey of the use of game theory and automata in economics (but not in econophysics) can be found in Mirowski (2002). Poundstone (1992) describes many different games and the corresponding attempts to use games to describe social phenomena. Econophysics has also contributed recently to game theory, and many references can be found on the website www.unifr.ch/econophysics. Mirowski, in his last chapter of Machine Dreams, suggests that perhaps it is possible to discover an automaton that generates a particular set of market data. More complex markets would then be able to simulate the automata of simpler ones. That research program assumes that a market is approximately equivalent to a nonuniversal computer with a fixed set of rules and fixed program (one can simulate anything on a universal computer). One can surely generate any given set of market statistics by an automaton, but nonuniquely: the work on generating partitions for chaotic systems teaches us that there is no way to pin down a specific deterministic dynamical system from statistics alone, because statistics are not unique in deterministic dynamics. That is, one may well construct an ad hoc automaton that will reproduce the data, but the automaton so-chosen will tell us nothing whatsoever about the economic dynamics underlying the data. Again, this would be analogous to using simple rules for self-affine fractals (mentioned in Chapter 8) to generate landscape pictures. Another example of nonuniqueness is that one can vary the initial conditions for the binary tent map and thereby generate any histogram that can be constructed. All possible probability distributions are generated by the tent map on its generating partition. The same is true of the logistic map with D = 4, and of a whole host of topologically equivalent maps. We expect that, given the empirical market distribution analyzed in Chapter 6, there are in principle many different agent-based trading models that could be used to reproduce those statistics. Unfortunately, we cannot offer any hope here that such nonuniqueness can be overcome, because complex systems lack generating partitions and it is the generating partition, not the statistics, that characterizes the dynamics. 9.8 Why not econobiology? Economics and markets, like all humanly invented phenomena, involve competition and are a consequence of biology; but can we exploit this observation to any benefit? Can biology provide a mathematical model for aggregate market behavior (Magnasco, 2002), or is mental behavior like economics too far removed from the
immediate consequences of genetics, which is based on the invariance of genes and the genetic code? Approximate macro-invariants have been searched for by economists interested in modeling growth, but without the discovery of any satisfying results (Dosi, 2001). Standard economic theory emphasizes optimization whereas biological systems are apparently redundant rather than optimally efficient (von Neumann, 1970b).7 This pits the idea of efficiency/performance against reliability, as we now illustrate. A racing motor, a sequential digital computer, or a thoroughbred horse are examples of finely tuned, highly organized machines. One small problem, one wire disconnected in a motor’s ignition system, and the whole system fails. Such a system is very efficient but failure-prone. A typical biological system, in contrast, is very redundant and inefficient but has invaluable advantages. It can lose some parts, a few synapses, an arm, an eye, or some teeth, and still may function at some reduced and even acceptable level of performance, depending on circumstances. Or, in some cases, the system may even survive and function on a sophisticated level like bacteria that are extremely adaptable to disasters like nuclear fallout. A one-legged runner is of little use, but an accountant or theorist or writer can perform his work with no legs, both in principle and in practice. The loss of a few synapses does not destroy the brain, but the loss of a few wires incapacitates a PC, Mac, or a sequential mainframe computer. Of interest in this context is von Neumann’s paper on the synthesis of reliable organisms from unreliable components. Biological systems are redundant, regenerative, and have error-correcting ability. Summarizing, in the biological realm the ability to correct errors is essential for survival, and the acquisition of perfect information by living beings is impossible (see Leff and Rex (1990) for a collection of discussions of the physical limitations on the acquisition of information-as-knowledge). In economic theory we do not even have a systematic theory of correcting misinformation about markets. Instead, economics texts still feed students the standard neo-classical equilibrium line of perfect information acquisition and Pareto efficiency.8 In the name of control and efficiency, humanly invented organizations like firms, government and the military create hierarchies. In the extreme case of a pure topdown hierarchy, where information and decisions flow only in one direction, downward into increasingly many branches on the organizational tree, a mistake is never corrected. Since organizations are rarely error-free, a top-down hierarchy with little or no upward feedback, one where the supposedly “higher-level automata” tend not to recognize (either ignore or do not permit) messages sent from below, can easily lead to disaster. In other words, error-correction and redundance may be 7 8
For a systematic discussion of the ideas used in von Neumann’s paper, see the text by Brown and Vranesic (2000). Imperfect information is discussed neo-classically, using expected utility, in the theory called “asymmetric information” by Stiglitz and Weiss (1992) and by Ackerlof (1984).
important for survival. Examples of dangerous efficiency in our age of terrorism are the concentration of a very large fraction of the USA’s refining capacity along the Houston Ship Channel, the concentration of financial markets in New York, and the concentration of government in a few buildings in Washington, D.C. Maybe there are lessons for econophysicists in biology, and maybe an econphysicist will some day succeed in constructing a meaningful econobiology, but what are the current lessons from biology, exactly? We do not yet have biologically inspired models of economic growth/decay that exhibit predictive power. We do not even have a falsifiable macroscopic model of biological evolution. We understand evolution via mutations at the molecular level, but we have no falsifiable mathematical description of evolution at the macro-level over long time scales. So what is left? Can we do anything to improve the prospects for an econobiology or econobiophysics? The important question is not whether we can build simple mathematical models of complexity; this has already been done by Moore for mechanical models and by Dawkins and Kauffmann for “biological” models, which are in reality also simple mechanical models. The question is whether we can define and model degrees of complexity in any empirically falsifiable way instead of just building mathematical models that remind us of some aspects of biology, like regeneration. Cell biology provides us with plenty of examples of complexity (Alberts et al., 2002), but so far neither physicists nor biologists have produced corresponding mathematical models.9 There is a yawning gap between the simple models known as complex adaptable systems on the one hand, and the mass of unmathematized real facts about complex processes in cells on the other. Simple-looking models are very often useful in physics, for example the Ising model, but the Ising model has been used to describe measurable properties of real physical systems, like critical exponents. The message of this book is that physicists can contribute decisively to understanding economics by bringing the Galilean method into that field, following the example set by the first econophysicist, Osborne. That method has been the basis for 400 years of unequaled growth of scientific knowledge. The method is also the basis for our construction of the empirically based model of financial markets presented in Chapter 6. The method of physics – skepticism about ad hoc postulates like utility maximization combined with the demand for empirically based, falsifiable models – is the thread that runs throughout this book. We as physicists can collect and analyze reliable empirical data whatever the origin, whether from physics, economics, biology, or elsewhere, unprejudiced by special beliefs and 9
Ivar Giæver, who won a Nobel Prize in physics, much later "retired," and then began research in biophysics, recommends that physicists learn the text by Alberts et al. He asserts that "either they are right or we are right and if we are right then we should add some mathematics to the biology texts." (Comment made during a lecture, 1999 Geilo NATO-ASI.)
models, always asking: “How can we understand the data? Can the measurements teach us anything?” If we stick to the method of physics,10 and avoid models that are completely divorced from empirical data (from reality), then the answer suggested by the history of physics and microbiology indicates that we should be able to add some clarity to the field of economics.11 But I suggest that we should not wait for biology to appear as a guide. There is so far no reliable theory or estimate of economic growth because we have no approximately correct, empirically grounded theory of macroeconomic behavior. I suggest that econophysicists should stay close to real market data. Because of the lack of socio-economic laws of nature and because of the nonuniqueness in explaining statistical data via dynamical models, well-known in deterministic chaos and illustrated for stochastic dynamics in Chapter 6, we have a far more difficult problem than in the natural sciences. The difficulty is made greater because nonfinancial economic data are generally much more sparse and less reliable than are financial data. But as the example of our empirically based model of financial market dynamics encourages, we still should try to add some more useful equations to macroeconomics texts. We should try to replace the standard arguments about “sticky prices” and “elasticity of demand” that are at best poor, hand waving equilibrium-bound substitutes for reality, with empirically based dynamical models with the hope that the models can eventually be falsified. Such an approach might free neo-classical economists from the illusion of stable equilibria in market data. Having now arrived at the frontier of new research fields, can’t we do better in giving advice for future research? The answer is no. This last chapter is more like the last data point on a graph, and as Feynman has reminded us, the last data point on a graph is unreliable, otherwise it wouldn’t be the last data point. Or, more poetically: “The book has not yet been written that doesn’t need explanation.”12 9.9 Note added April 8, 2003 Newton wrote that he felt like a boy on the seashore, playing and diverting himself now and then by finding a smoother pebble or prettier shell, while the great ocean 10
11
12
Also followed by Mendel in his discovery of the laws of genetics. Mendel studied and then taught physics in Vienna. See Olby (1985) and Bowler (1989). However, if one asks scientists “What did Mendel study in Vienna?” the most likely answers are (a) “peas” or (b) “theology.” Neo-classical economists have insisted on ignoring empirics and instead have concentrated on a model (utility maximization) that is mathematically so simple that they could prove rigorous theorems about it. Imagine where we would stand today had theoretical physicists in any era behaved similarly; for example, had we waited for the assumptions of quantum electrodynamics or equilibrium statistical mechanics to be proven mathematically rigorously. “Men den som har det fulle og rette skjønn, vil se at den bok som enn˚a trenges, til forklaring, er større enn den som her er skrevet,” Kongespeilet (Brøgger, 2000). The sayings of this book are from the era following the Viking Sagas.
of truth lay undiscovered before him. We know that he was right, because we have stood on Newton’s shoulders and have begun to see into and across the depths of the ocean of truth, from the solar system to the atomic nucleus to DNA and the amazing genetic code.13 But in socio-economic phenomena there is no time-invariant ocean of truth, analogous to laws of nature, waiting to be discovered. Rather, markets merely reflect what we are doing economically, and the apparent rules of market behavior, whatever they may appear to be temporarily, can change rapidly with time.

The reason that physicists should study markets is to find out what we’re doing: to take the discussion and prediction of economic behavior out of the hands of the ideologues and place them on an empirical basis, and so to eliminate the confusion, and therefore the power, of ideology. This appears to be a task no less bold and challenging in its dimensions than when, in the seventeenth century, the scientific revolution largely eliminated priests and astrologers from policy-making and thereby ended the witch trials in western Europe (Trevor-Roper, 1967).

With the coercive and destructive power of militant religion and other ideology in mind, I offer the following definitions for the reader’s consideration. A neo-classical economist is one who believes in the stability and equilibrium of unregulated markets, and that deregulation and expansion of markets lead toward the best of all possible worlds (the Pareto optimum). A neo-liberal is one who advocates globalization based on neo-classical ideology. A neo-conservative14 is a mutation of a neo-liberal: he has a modern techno-army, and also the will and desire to use it, in order to try to create and enforce his global illusion of the best of all possible worlds.

13 See Bennett (1982) and Lipton (1995) for the behavior of DNA and the genetic code as computers.
14 See www.newamericancentury.org for the Statement of Principles and program of the neo-conservatives, who advocate playing “defect” (in the language of game theory) and the use of military force as foreign policy.
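The following is a minimal numerical sketch of the nonuniqueness referred to earlier in this chapter; it is an illustration of the general point, not a model from this book, and its parameter choices (the sample size, the coefficient phi) are arbitrary assumptions. Written in Python (assuming only NumPy is available), it compares independent Gaussian “returns” with a strongly autocorrelated AR(1) process whose stationary one-point distribution is also the standard normal: the two processes share the same histogram and low-order moments, yet their dynamics are entirely different, so a one-point distribution extracted from data cannot, by itself, single out the dynamics that produced it.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Model A: independent standard-normal "returns" (no memory whatsoever).
a = rng.standard_normal(n)

# Model B: an AR(1) chain, x[t] = phi*x[t-1] + sqrt(1 - phi**2)*noise,
# started in its stationary state, so its one-point distribution is
# also exactly N(0, 1) even though successive values are correlated.
phi = 0.8  # illustrative choice; any |phi| < 1 would do
b = np.empty(n)
b[0] = rng.standard_normal()
for t in range(1, n):
    b[t] = phi * b[t - 1] + np.sqrt(1.0 - phi**2) * rng.standard_normal()

def lag1_autocorrelation(x):
    y = x - x.mean()
    return float(y[:-1] @ y[1:] / (y @ y))

# Identical one-point statistics ...
print("variance A, B:", round(float(a.var()), 3), round(float(b.var()), 3))   # both ~ 1.0
print("kurtosis A, B:",
      round(float(((a - a.mean())**4).mean() / a.var()**2), 2),
      round(float(((b - b.mean())**4).mean() / b.var()**2), 2))                # both ~ 3.0
# ... but completely different dynamics.
print("lag-1 autocorrelation A, B:",
      round(lag1_autocorrelation(a), 2), round(lag1_autocorrelation(b), 2))    # ~ 0.0 vs ~ 0.8

The value phi = 0.8 is not special: any coefficient of magnitude less than one yields the same N(0, 1) marginal with a different correlation time, which is precisely the nonuniqueness at issue when one tries to infer dynamics from statistical data alone.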
References
Ackerlof, G. A. 1984. An Economic Theorist’s Book of Tales. Cambridge: Cambridge University Press. Alberts, B. et al. 2002. Molecular Biology of the Cell. New York: Garland Publishing. Arnold, L. 1992. Stochastic Differential Equations. Malabar, FL: Krieger. Arrow, K. J. and Hurwicz, L. 1958. Econometrica 26, 522. Arthur, W. B. 1994. Increasing Returns and Path Dependence in the Economy. Ann Arbor: University of Michigan Press. 1995. Complexity in economic and financial markets. Complexity, number 1. Bak, P., Tang, C., and Wiesenfeld, K. 1987. Phys. Rev. Lett. 59, 381. 1988. Phys. Rev. A38, 364. Bak, P., Nørrelykke, S. F., and Shubik, M. 1999. The dynamics of money. Phys. Rev. E60(3), 2528–2532. Barabasi, A.-L. and Stanley, H. E. 1995. Fractal Concepts in Surface Growth. Cambridge: Cambridge University Press. Barbour, J. 1989. Absolute or Relative Motion? Cambridge: Cambridge University Press. Barro, R. J. 1997. Macroeconomics. Cambridge, MA: MIT Press. Bass, T. A. 1991. The Predictors. New York: Holt. Baxter, M. and Rennie, A. 1995. Financial Calculus. Cambridge: Cambridge University Press. Bender, C. M. and Orszag, S. A. 1978. Advanced Mathematical Methods for Scientists and Engineers. New York: McGraw-Hill. Bennett, C. H. 1982. Int. J. Theor. Phys. 21, 905. Berlin, I. 1998. The Crooked Timber of Humanity. Princeton: Princeton University Press. Bernstein, P. L. 1992. Capital Ideas: The Improbable Origins of Modern Wall Street. New York: The Free Press. Billingsley, P. 1983. American Scientist 71, 392. Black, F. 1986. J. Finance 3, 529. 1989. J. Portfolio Management 4,1. Black, F., Jensen, M. C., and Scholes, M. 1972. In Studies in the Theory of Capital Markets, ed. M. C. Jensen. New York: Praeger. Black, F. and Scholes, M. 1973. J. Political Economy 81, 637. Blum, P. and Dacorogna, M. 2003 (February). Risk Magazine 16 (2), 63. Bodie, Z. and Merton, R. C. 1998. Finance. Saddle River, NJ: Prentice-Hall. Borland, L. 2002. Phys. Rev. Lett. 89, 9.
Bose, R. 1999 (Spring). The Federal Reserve Board Valuation Model. Brown Economic Review. Bouchaud, J.-P. and Potters, M. 2000. Theory of Financial Risks. Cambridge: Cambridge University Press. Bowler, P. J. 1989. The Mendellian Revolution. Baltimore: Johns Hopkins Press. Brown, S. and Vranesic, Z. 2000. Fundamentals of Digital Logic with VHDL Design. Boston: McGraw-Hill. Bryce, R. and Ivins, M. 2002. Pipe Dreams: Greed, Ego, and the Death of Enron. Public Affairs Press. Callen, H. B. 1985. Thermodynamics. New York: Wiley. Caratheodory, C. 1989. Calculus of Variations. New York: Chelsea. Casanova, G. 1997. History of my Life, trans. W. R. Trask. Baltimore: Johns-Hopkins. Castaing, B., Gunaratne, G. H., Heslot, F., Kadanoff, L., Libchaber, A., Thomae, S., Wu, X.-Z., Zaleski, S., and Zanetti, G. 1989. J. Fluid Mech. 204, 1. Chhabra, A., Jensen, R. V., and Sreenivasan, K. R. 1988. Phys. Rev. A40, 4593. Ching, E. S. C. 1996. Phys. Rev. E53, 5899. Cootner, P. 1964. The Random Character of Stock Market Prices. Cambridge, MA: MIT Press. Courant, R. and Hilbert, D. 1953. Methods of Mathematical Physics, vol. II. New York: Interscience. Crutchfield, J. P. and Young, K. 1990. In Complexity, Entropy and the Physics of Information, ed. W. Zurek. Reading: Addison-Wesley. Dacorogna, M. et al. 2001. An Introduction to High Frequency Finance. New York: Academic Press. Dosi, G. 2001. Innovation, Organization and Economic Dynamics: Selected Essays. Cheltenham: Elgar. Dunbar, N. 2000. Inventing Money, Long-Term Capital Management and the Search for Risk-Free Profits. New York: Wiley. Eichengren, B. 1996. Globalizing Capital: A History of the International Monetary System. Princeton: Princeton University Press. Fama, E. 1970 (May). J. Finance, 383. Farmer, J. D. 1994. Market force, ecology, and evolution (preprint of the original version). 1999 (November/December). Can physicists scale the ivory tower of finance? In Computing in Science and Engineering, 26. Feder, J. 1988. Fractals. New York: Plenum. Feigenbaum, M. J. 1988a. Nonlinearity 1, 577. 1988b. J. Stat. Phys. 52, 527. Feynman, R. P. 1996. Feynman Lectures on Computation. Reading, MA: Addison-Wesley. Feynman, R. P. and Hibbs, A. R. 1965. Quantum Mechanics and Path Integrals. New York: McGraw-Hill. F¨ollmer, H. 1995. In Mathematical Models in Finance, eds. Howison, Kelly, and Wilmott. London: Chapman and Hall. Fredkin, E. and Toffoli, T. 1982. Int. J. Theor. Phys. 21, 219. Friedman, T. L. 2000. The Lexus and the Olive Tree: Misunderstanding Globalization. New York: Anchor. Friedrichs, R., Siegert, S., Peinke, J., L¨uck, St., Siefert, S., Lindemann, M., Raethjen, J., Deuschl, G., and Pfister, G. 2000. Phys. Lett. A271, 217. Frisch, U. 1995. Turbulence. Cambridge: Cambridge University Press. Frisch, U. and Sornette, D. 1997. J. de Physique I 7, 1155.
Galilei, G. 2001. Dialogue Concerning the Two Chief World Systems, trans. S. Drake. New York: Modern Library Series. Gerhard-Sharp, L. et al. 1998. Polyglott. APA Guide Venedig. Berlin und M¨unchen: Langenscheidt KG. Gibbons, R. C. 1992. Game Theory for Applied Economists. Princeton: Princeton University Press. Ginzburg, C. 1992. Clues, Myths and the Historical Method. New York: Johns Hopkins. Gnedenko, B. V. 1967. The Theory of Probability, trans. B. D. Seckler. New York: Chelsea. Gnedenko, B. V. and Khinchin, A. Ya. 1962. An Elementary Introduction to the Theory of Probability. New York: Dover. Gunaratne, G. 1990a. An alternative model for option pricing, unpublished Trade Link Corp. internal paper. 1990b. In Universality Beyond the Onset of Chaos, ed. D. Campbell. New York: AIP. Gunaratne, G. and McCauley, J. L. 2003. A theory for fluctuations in stock prices and valuation of their options (preprint). Hadamard, J. 1945. The Psychology of Invention in the Mathematical Field. New York: Dover. Halsey, T. H. et al. 1987. Phys. Rev. A33, 114. Hamermesh, M. 1962. Group Theory. Reading, MA: Addison-Wesley. Harrison, M. and Kreps, D. J. 1979. Economic Theory 20, 381. Harrison, M. and Pliska, S. 1981. Stoch. Proc. and Their Applicat. 11, 215. Hopkin, D. and Moss, B. 1976. Automata. New York: North-Holland. Hopcraft, J. E. and Ullman, J. D. 1979. Introduction To Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley. Hopfield, J. J. 1994 (February). Physics Today, 40. Hopfield, J. J. and Tank, D. W. 1986 (August). Science 233, 625. Hull, J. 1997. Options, Futures, and Other Derivatives. Saddle River: Prentice-Hall. Hughes, B. D., Schlessinger, M. F., and Montroll, E. 1981. Proc. Nat. Acad. Sci. USA 78, 3287. Intrilligator, M. D. 1971. Mathematical Optimization and Economic Theory. Engelwood Cliffs: Prentice-Hall. Jacobs, B. I. 1999. Capital Ideas and Market Realities: Option Replication, Investor Behavior, and Stock Market Crashes. London: Blackwell. Jacobs, J. 1995. Cities and the Wealth of Nations. New York: Vintage. Jorion, P. 1997. Value at Risk: The New Benchmark for Controlling Derivatives Risk. New York: McGraw-Hill. Kac, M. 1959. Probability and Related Topics in Physical Sciences. New York: Interscience. Keen, S. 2001. Debunking Economics: the Naked Emperor of the Social Sciences. Zed Books. Kirman, A. 1989. The Economic Journal 99, 126. Kongespeilet, 2000. [Konungs skuggsj´a Norsk], oversett fra islandsk av A. W. Brøgger. Oslo: De norske bokklubbene. Koiran, P. and Moore, C. 1999. Closed-form analytic maps in one and two dimensions can simulate universal Turing Machines. In Theoretical Computer Science, Special Issue on Real Numbers, 217. Kubo, R., Toda, M., and Hashitsume, N. 1978. Statistical Physics II: Nonequilibrium Statistical Mechanics. Berlin: Springer-Verlag. Laloux, L., Cizeau, P., Bouchaud, J.-P., and Potters, M. 1999. Phys. Rev. Lett. 83, 1467.
Leff, H. S. and Rex, A. F. 1990. Maxwell’s Demon, Entropy, Information, Computing. Princeton: Princeton University Press. Lewis, M. 1989. Liar’s Poker. New York: Penguin. Lipton, R. J. 1995. Science 268, 542. Luoma, J. R. 2002 (December). Water for Profit in Mother Jones, 34. Magnasco, M. O. 2002. The Evolution of Evolutionary Engines. In Complexity from Microscopic to Macroscopic Scales: Coherence and Large Deviations, eds. A. T. Skjeltorp and T. Viscek. Dordrecht: Kluwer. Malkiel, B. 1996. A Random Walk Down Wall Street, 6th edition. New York: Norton. Mandelbrot, B. 1964. In The Random Character of Stock Market Prices, ed. P. Cootner. Cambridge, MA: MIT. 1966. J. Business 39, 242. 1968. SIAM Rev. 10 (2), 422. Mankiw, N. G. 2000. Principles of Macroeconomics. Mason, Ohio: South-Western College Publishing. Mantegna, R. and Stanley, H. E. 2000. An Introduction to Econophysics. Cambridge: Cambridge University Press. Martin-L¨of, P. 1966. Inf. Control 9, 602. McCandless Jr., G. T. 1991. Macroeconomic Theory. Englewood Cliffs: Prentice-Hall. McCauley, J. L. 1991. In Spontaneous Formation of Space-Time Structures and Criticality, eds. T. Riste and D. Sherrington. Dordrecht: Kluwer. 1993. Chaos, Dynamics and Fractals: an Algorithmic Approach to Deterministic Chaos. Cambridge: Cambridge University Press. 1997a. Classical Mechanics: Flows, Transformations, Integrability and Chaos. Cambridge: Cambridge University Press. 1997b. Physica A237, 387. 1997c. Discrete Dynamical Systems in Nature and Society 1, 17. 2000. Physica A285, 506. 2001. Physica Scripta 63, 15. 2002. Physica A309, 183. 2003a. Physica A329, 199. 2003b. Physica A329, 213. McCauley, J. L. and Gunaratne, G. H. 2003a. Physica A329, 170. 2003b. Physica A329, 178. Melby, P., Kaidel, J., Weber, N., and H¨ubler, A. 2000. Phys. Rev. Lett. 84, 5991. Miller, M. H. 1988. J. Econ. Perspectives 2(4), 99. Millman, G. J. 1995. The Vandals’ Crown. The Free Press. Minsky, M. L. 1967. Computation: Finite and Infinite Machines. New York: Prentice-Hall. Mirowski, P. 1989. More Heat than Light. Economics as Social Physics, Physics as Nature’s Economics. Cambridge: Cambridge University Press. 2002. Machine Dreams. Cambridge: Cambridge University Press. Modigliani, F. 2001. Adventures of an Economist. New York: Texere. Modigliani, F. and Miller, M. 1958. The American Econ. Rev. XLVIII, 3, 261. Moore, C. 1990. Phys. Rev. Lett. 64, 2354. 1991. Nonlinearity 4, 199 & 727. Moore, C. and Nilsson, M. 1999. J. Stat. Phys. 96, 205. Nakahara, M. 1990. Geometry, Topology and Physics. Bristol: IOP. Nakamura, L. I. 2000 (July/August). Economics and the New Economy: the Invisible Hand meets creative destruction. In Federal Reserve Bank of Philadelphia Business Review, 15.
Neftci, S. N. 2000. Mathematics of Financial Derivatives. New York: Academic Press. Niven, I. 1956. Irrational Numbers. Carus Mathematics Monogram Number 11, Mathematics Association of America. Olby, R. 1985. Origins of Mendelism. Chicago: University of Chicago. Ormerod, P. 1994. The Death of Economics. London: Faber & Faber. Osborne, M. F. M. 1964. In The Random Character of Stock Market Prices, ed. P. Cootner. Cambridge, MA: MIT. 1977. The Stock Market and Finance from a Physicist’s Viewpoint. Minneapolis: Crossgar. Plerou, V., Gopikrishnan, P., Rosenow, B., Nunes, L., Amaral, L., and Stanley, H. E. 1999. Phys. Rev. Lett. 83, 1471. Posner, E. A. 2000. Law and Social Norms. New York: Harvard University Press. Poundstone, W. 1992. Prisoner’s Dilemma. New York: Anchor. Purves, W. K. et al. 2000. Life: The Science of Biology. New York: Freeman. Radner, R. 1968. Econometrica 36, 31. Renner, C., Peinke, J., and Friedrich R. 2000 J. Fl. Mech. 433, 383. 2001. Physica A298, 49. Roehner, B. M. 2001. Hidden Collective Factors in Speculative Trading: A Study in Analytical Economics. New York: Springer-Verlag. Saari, D. 1995. Notices of the AMS 42, 222. Scarf, H. 1960. Int. Econ. Rev. 1, 157. Schr¨odinger, E. 1944. What is Life? Cambridge: Cambridge University Press. Sharpe, W. F. 1964. J. Finance XIX, 425. Shiller, R. J. 1999. Market Volatility. Cambridge, MA: MIT. Siegelmann, H. T. 1995. Science 268, 545. Skjeltorp, J. A. 1996. Fractal Scaling Behaviour in the Norwegian Stock Market, Masters thesis, Norwegian School of Management. Smith, A. 2000. The Wealth of Nations. New York: Modern Library. Smith, E. and Foley, D. K. 2002. Is utility theory so different from thermodynamics? Preprint. Sneddon, I. N. 1957. Elements of Partial Differential Equations. New York: McGraw-Hill. Sonnenschein, H. 1973a. Econometrica 40, 569. 1973b. J. Economic Theory 6, 345. Sornette, D. 1998. Physica A256, 251. 2001. Physica A290, 211. Soros, G. 1994. The Alchemy of Finance: Reading the Mind of the Market. New York: Wiley. Steele, J. M. 2000. Stochastic Calculus and Financial Applications. New York: Springer-Verlag. Stiglitz, J. E. 2002. Globalization and its Discontents. New York: Norton. Stiglitz, J. E. and Weiss, A. 1992. Oxford Economic Papers 44(2), 694. Stolovitsky, G. and Ching, E. S. C. 1999. Phys. Lett. A255, 11. Stratonovich, R. L. 1963. Topics in the Theory of Random Noise, vol. I, trans. R. A. Silverman. New York: Gordon & Breach. 1967. Topics in the Theory of Random Noise, vol. II, trans. R. A. Silverman. New York: Gordon & Breach. Tang, L.-H. 2000. Workshop on Econophysics and Finance (Heifei, China). Trevor-Roper, H. R. 1967. The Crisis of the Seventeenth Century; Religion, the Reformation, and Social Change. New York: Harper & Row. Turing, A. M. 1936. Proc. London Math. Soc. (2) 42, 230.
Varian, H. R. 1992. Microeconomics Analysis. New York: Norton. 1999. Intermediate Economics. New York: Norton. von Neumann, J. 1970a. Essays on Cellular Automata, ed. A. W. Burks. Urbana: University of Illinois. 1970b. Probabilistic logic and the synthesis of reliable elements from unreliable components. In Essays on Cellular Automata, ed. A. W. Burks. Urbana: University of Illinois. Wax, N. 1954. Selected Papers on Noise and Stochastic Processes. New York: Dover. Weaver, W. 1982. Lady Luck. New York: Dover. Wigner, E. P. 1967. Symmetries and Reflections. Bloomington: University of Indiana. Wilmott, P., Howison, S. D., and DeWynne, J. 1995. The Mathematics of Financial Derivatives: A Student Introduction. Cambridge: Cambridge University Press. Wolfram, S. 1983. Los Alamos Science 9, 2. 1984. Physica 10D, 1. Yaglom, A. M. and Yaglom, I. M. 1962. An Introduction to the Theory of Stationary Random Functions, translated and edited by Richard A. Silverman. Englewood Cliffs, NJ: Prentice-Hall. Zhang, Y.-C. 1999. Physica A269, 30.
Index
accounting, mark to market 118 algorithmic complexity 188 arbitrage 64, 141, 152 assets risk-free 96 uncorrelated 93 asymmetric information, theory of 156 automata 190 self-replicating 195 Black, Fischer 83 Black–Scholes model 107, 109–112 backward-in-time diffusion equation 109, 112, 140 bounded rationality 153 Buffet, Warren 89, 101 call price (see also put price) 104, 123, 129 capital asset pricing model (CAPM) 97, 109–112 capital structure 68 cascade eddy 169, 173 cell biology 198 central limit theorem (CLT) 39, 41 chaos 192, 193 communist ideology 9 complexity 185 computer science, levels of complexity in 191 conservation laws 16, 27 integrability condition 27 criticality, self-organized 87 delta hedge strategy 139 demand, excess 13 deregulation of markets 9, 29, 30, 157 distribution characteristic function of 33 cost of carry 130 empirical 32, 92, 115, 123, 124, 185 exponential 35, 86, 124, 135: perturbation 130–132; stretched 37 fat-tailed 36, 73, 81, 143, 178: nonuniversality 143
invariant 34 lognormal 35, 113 diversification 91 efficiency 14, 153 efficient market hypothesis 101 a fair game 101, 166 empirical data 124, 199 Enron 118, 156 entropy 79, 153 equilibrium 9, 13, 19, 27, 59, 64, 70, 76, 78, 81, 97, 155, 169 computational complexity 19 general theory of 14 stable 14, 78 statistical 79, 153–154, 157: entropy 79, 153; maximum disorder 79 temporary price 76 European Union 28 exponentials 35, 124, 135 perturbation within 130–132 stretched 37, 144 extreme events 142 Farmer, J. D. 63 falsifiable theory 3, 109, 139, 189 finance data 125, 186 financial engineering 117 implied volatility 123 financial markets, complexity of 185 fluctuation–dissipation theorem 155 fluid turbulence 169 instabilities 170 velocity structure functions 173 vortices 170 Fokker–Planck equation 51 fractal growth phenomena correlation dimension 162 fractional Brownian motion 163 nonstationary process 166
Gambler’s Ruin 67 game, fair 67, 101, 166 game theory 196 Gaussian distribution 35 process 52 globalization 9, 29, 30, 157 Green function 49, 113, 133, 140 hedges 102, 139 replicating self-financing 148 Hurst exponent 122, 162, 173 information, degradation of 166 initial endowment 16 instability 80, 153, 157 integrability conditions 29 local symmetries 29 International Monetary Fund (IMF) 10, 28 invariance principles global 29 local 2, 152 invariants 34 Ito calculus 42, 45 Kirman, A. 152 Keynesian theory 23 Kolmogorov equation 140, 145 law of large numbers 38 law of one price 64 lawlessness 3 Levy distributions 176, 180 aggregation equation 176 Liapunov exponents 75, 87, 88 liquidity (see money) 20, 143–144, 147, 153–154, 155 uncertainty 20, 153 local vs global law 28 Long Term Capital Management (LTCM) 71, 150, 156 macroeconomics microscopic lawlessness 85 Malkiel, B. darts 89, 93 Mandelbrot, B. 73 market capitalization 68 clearing 12 complexity 82 conditional average of a 50 data: as a restoring force 80; and destabilization 80 efficiency 14, 64, 153 liquid 102 patterns in the 167 price 66: bid/ask spreads 66 stability 80, 157 stationary processes in a 58, 82, 157 steady state of a 59 Markov processes 49, 121 Martingale 166 Marxism 25
mathematical laws of nature 2 Modigliani–Miller theorem 68, 105, 119, 150 monetarism 24 money (see liquidity) 20, 153, 155 bounded rationality 153 neo-classical model 9 noise 43, 61 noncomputable numbers 187 options 102, 106, 128 American 102 European 102 expiration time 102 synthetic 105, 149 Ormerod, Paul 22 Osborne, M. F. M. 21, 72, 121 phase space nonintegrable motion 28 portfolio beta, use of within a 98 delta hedge 108 dynamic rebalancing of a 109 efficient 95, 98 fluctuating return of 92 insurance 105, 156 minimum risk 95 tangency 95 transformations within a 100 Prediction Company, The 168 price noise 83 value 83 probability conservation of 51 lognormal density 35 measure 32 pseudo-random time series 41 random variable 41 scalar density 34 transformations of density 34 transition 49: Green function 49 profit motive 25 put price (see also call price) 104, 123, 129 put–call parity 105 Radner, Roy 19, 152 rational agents 10 redundancy 197 reversible trade 147 risk 91 free 108, 130–132 neutral option pricing 139 nondiversifiable 100 premium 98 sandpile model 194 scaling 183 exponents 88, 142 Joseph exponent 163
K41 model 175 K62 lognormal model 174 law 74, 161 multiaffine 172 persistence/antipersistence (see also fractional Brownian motion) 163 R/S analysis 163 self-affine 162, 177 self-similar: pair correlation function 161 Scarf’s model 17 Sharpe ratio 167 Smith’s Invisible Hand 10, 14, 77, 80 Smoluchowski–Uhlenbeck–Ornstein (S–U–O) process 80, 151, 154 stationary force within the 154 Sonnenschein, H. 22 Soros, George 66 stochastic calculus 42 differential equation 42, 133: diffusion coefficient 58; global solutions 57; local solutions 57; local volatility 58 integral equation 46 Ito product 45 nonstationary forces 155 processes 41, 42: Green function 57, 113, 133; pair correlation function 60, 164; spectral density 60; and white noise 61 volatility 49, 53, 128, 134: nonuniqueness in determining local 136
stationary process 52, 76, 154, 157 strike price 102 supply and demand curves 12, 21 symbol sequence 186 symmetry 2 thermodynamic analogy 147, 150 efficiency 153 time value of money 64 Tobin’s separation theorem 96 transformations 34 traders 123, 141 Turing machine 190 universal computer 190 universality 86 utility function 11, 26 Hamiltonian system 27 Lagrangian utility rate 26 utility theory indifference curves 18 Value at Risk 118, 143–144 volatility smile 117 Walras’s Law 17 Wiener integrals 54 process 43 Wigner, Eugene 2 World Bank 10