ADVANCED TEXTS IN ECONOMETRICS

Editors: Manuel Arellano, Guido Imbens, Grayham E. Mizon, Adrian Pagan, Mark Watson
Advisory Editor: C. W. J. Granger
Other Advanced Texts in Econometrics

ARCH: Selected Readings. Edited by Robert F. Engle
Bayesian Inference in Dynamic Econometric Models. By Luc Bauwens, Michel Lubrano, and Jean-François Richard
Co-integration, Error Correction, and the Econometric Analysis of Non-Stationary Data. By Anindya Banerjee, Juan J. Dolado, John W. Galbraith, and David Hendry
Dynamic Econometrics. By David F. Hendry
Finite Sample Econometrics. By Aman Ullah
Generalized Method of Moments. By Alastair R. Hall
Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. By Søren Johansen
Long-Run Economic Relationships: Readings in Cointegration. Edited by Robert F. Engle and Clive W. J. Granger
Micro-Econometrics for Policy, Program and Treatment Effects. By Myoung-jae Lee
Modelling Economic Series: Readings in Econometric Methodology. Edited by Clive W. J. Granger
Modelling Non-Linear Economic Relationships. By Clive W. J. Granger and Timo Teräsvirta
Modelling Seasonality. Edited by S. Hylleberg
Non-Stationary Time Series Analysis and Cointegration. Edited by Colin P. Hargreaves
Panel Data Econometrics. By Manuel Arellano
Periodic Time Series Models. By Philip Hans Franses and Richard Paap
Periodicity and Stochastic Trends in Economic Time Series. By Philip Hans Franses
Readings in Unobserved Components Models. Edited by Andrew C. Harvey and Tommaso Proietti
Stochastic Limit Theory: An Introduction for Econometricians. By James Davidson
Stochastic Volatility. Edited by Neil Shephard
Testing Exogeneity. Edited by Neil R. Ericsson and John S. Irons
The Cointegrated VAR Model. By Katarina Juselius
The Econometrics of Macroeconomic Modelling. By Gunnar Bårdsen, Øyvind Eitrheim, Eilev S. Jansen and Ragnar Nymoen
Time Series with Long Memory. Edited by Peter M. Robinson
Time-Series-Based Econometrics: Unit Roots and Co-integrations. By Michio Hatanaka
Workbook on Cointegration. By Peter Reinhard Hansen and Søren Johansen
Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle Edited by Tim Bollerslev, Jeffrey R. Russell, and Mark W. Watson
Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in Oxford, New York, Auckland, Cape Town, Dar es Salaam, Hong Kong, Karachi, Kuala Lumpur, Madrid, Melbourne, Mexico City, Nairobi, New Delhi, Shanghai, Taipei, and Toronto, with offices in Argentina, Austria, Brazil, Chile, Czech Republic, France, Greece, Guatemala, Hungary, Italy, Japan, Poland, Portugal, Singapore, South Korea, Switzerland, Thailand, Turkey, Ukraine, and Vietnam.

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

Published in the United States by Oxford University Press Inc., New York

© Oxford University Press 2010

The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First published 2010

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer.

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging-in-Publication Data
Volatility and time series econometrics : essays in honor of Robert F. Engle / edited by Mark W. Watson, Tim Bollerslev, and Jeffrey R. Russell.
p. cm.—(Advanced texts in econometrics)
ISBN 978-0-19-954949-8 (hbk.)
1. Econometrics. 2. Time-series analysis. I. Engle, R. F. (Robert F.) II. Watson, Mark W. III. Bollerslev, Tim, 1958– IV. Russell, Jeffrey R.
HB139.V65 2009   330.01'51955—dc22   2009041065

Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by CPI Antony Rowe, Chippenham, Wiltshire

ISBN 978-0-19-954949-8

1 3 5 7 9 10 8 6 4 2
Contents

Introduction

1 A History of Econometrics at the University of California, San Diego: A Personal Viewpoint
Clive W. J. Granger
  1 Introduction
  2 The Founding Years: 1974–1984
  3 The Middle Years: 1985–1993
  4 The Changing Years: 1994–2003
  5 Graduate students
  6 Visitors
  7 Wives
  8 The Econometrics Research Project
  9 The UCSD Economics Department
  10 The way the world of econometrics has changed
  11 Visitors and students

2 The Long Run Shift-Share: Modeling the Sources of Metropolitan Sectoral Fluctuations
N. Edward Coulson
  1 Introduction
  2 A general model and some specializations
  3 Data and evidence
  4 Summary and conclusions

3 The Evolution of National and Regional Factors in US Housing Construction
James H. Stock and Mark W. Watson
  1 Introduction
  2 The state building permits data set
  3 The DFM-SV model
  4 Empirical results
  5 Discussion and conclusions

4 Modeling UK Inflation Uncertainty, 1958–2006
Gianna Boero, Jeremy Smith, and Kenneth F. Wallis
  1 Introduction
  2 UK inflation and the policy environment
  3 Re-estimating the original ARCH model
  4 The nonstationary behavior of UK inflation
  5 Measures of inflation forecast uncertainty
  6 Uncertainty and the level of inflation
  7 Conclusion

5 Macroeconomics and ARCH
James D. Hamilton
  1 Introduction
  2 GARCH and inference about the mean
  3 Application 1: Measuring market expectations of what the Federal Reserve is going to do next
  4 Application 2: Using the Taylor Rule to summarize changes in Federal Reserve policy
  5 Conclusions

6 Macroeconomic Volatility and Stock Market Volatility, World-Wide
Francis X. Diebold and Kamil Yilmaz
  1 Introduction
  2 Data
  3 Empirical results
  4 Variations and extensions
  5 Concluding remark

7 Measuring Downside Risk – Realized Semivariance
Ole E. Barndorff-Nielsen, Silja Kinnebrock, and Neil Shephard
  1 Introduction
  2 Econometric theory
  3 More empirical work
  4 Additional remarks
  5 Conclusions

8 Glossary to ARCH (GARCH)
Tim Bollerslev

9 An Automatic Test of Super Exogeneity
David F. Hendry and Carlos Santos
  1 Introduction
  2 Detectable shifts
  3 Super exogeneity in a regression context
  4 Impulse saturation
  5 Null rejection frequency of the impulse-based test
  6 Potency at stage 1
  7 Super-exogeneity failure
  8 Co-breaking based tests
  9 Simulating the potencies of the automatic super-exogeneity test
  10 Testing super exogeneity in UK money demand
  11 Conclusion

10 Generalized Forecast Errors, a Change of Measure, and Forecast Optimality
Andrew J. Patton and Allan Timmermann
  1 Introduction
  2 Testable implications under general loss functions
  3 Properties under a change of measure
  4 Numerical example and an application to US inflation
  5 Conclusion

11 Multivariate Autocontours for Specification Testing in Multivariate GARCH Models
Gloria González-Rivera and Emre Yoldas
  1 Introduction
  2 Testing methodology
  3 Monte Carlo simulations
  4 Empirical applications
  5 Concluding remarks

12 Modeling Autoregressive Conditional Skewness and Kurtosis with Multi-Quantile CAViaR
Halbert White, Tae-Hwan Kim, and Simone Manganelli
  1 Introduction
  2 The MQ-CAViaR process and model
  3 MQ-CAViaR estimation: Consistency and asymptotic normality
  4 Consistent covariance matrix estimation
  5 Quantile-based measures of conditional skewness and kurtosis
  6 Application and simulation
  7 Conclusion

13 Volatility Regimes and Global Equity Returns
Luis Catão and Allan Timmermann
  1 Econometric methodology
  2 Data
  3 Global stock return dynamics
  4 Variance decompositions
  5 Economic interpretation: Oil, money, and tech shocks
  6 Implications for global portfolio allocation
  7 Conclusion

14 A Multifactor, Nonlinear, Continuous-Time Model of Interest Rate Volatility
Jacob Boudoukh, Christopher Downing, Matthew Richardson, Richard Stanton, and Robert F. Whitelaw
  1 Introduction
  2 The stochastic behavior of interest rates: Some evidence
  3 Estimation of a continuous-time multifactor diffusion process
  4 A generalized Longstaff and Schwartz (1992) model
  5 Conclusion

15 Estimating the Implied Risk-Neutral Density for the US Market Portfolio
Stephen Figlewski
  1 Introduction
  2 Review of the literature
  3 Extracting the risk-neutral density from options prices, in theory
  4 Extracting a risk-neutral density from options market prices, in practice
  5 Adding tails to the risk-neutral density
  6 Estimating the risk-neutral density for the S&P 500 from S&P 500 index options
  7 Concluding comments

16 A New Model for Limit Order Book Dynamics
Jeffrey R. Russell and Taejin Kim
  1 Introduction
  2 The model
  3 Model estimation
  4 Data
  5 Results
  6 Conclusions

Bibliography

Index
Introduction

On June 20–21, 2009 a large group of Rob Engle's students, colleagues, friends, and close family members met in San Diego to celebrate his extraordinary career. This book contains 16 chapters written to honor Rob for that occasion.

Rob's career spans several areas of economics, econometrics and finance. His Cornell Ph.D. thesis focused on temporal aggregation and dynamic macroeconometric models. As an assistant professor at MIT he began working in urban economics. In his long career at UCSD he continued his empirical work in macroeconomics and urban economics, and branched out into energy economics and finance, an interest that eventually led him to NYU's Stern School of Business. His interest in applied problems and his original way of looking at them led Rob to develop econometric methods that have fundamentally changed empirical analysis in economics and finance. Along the way, Rob worked closely with scores of graduate students, fundamentally changing their lives for the better.

We have organized the contributions in the book to highlight many of the themes in Rob's career. Appropriately, the book begins with Clive Granger's history of econometrics at UCSD, tracing Clive's arrival at UCSD and how he recruited a young Rob Engle to join him to build what ultimately became the dominant econometrics group of the late twentieth century. For those of us who were part of it (and, in one way or another, that includes nearly every practicing econometrician of the time), this is an extraordinary story.

The next two contributions focus on urban economics and housing. Ed Coulson investigates the sources of metropolitan fluctuations in sectoral employment by studying various restrictions on VAR representations of stochastic processes describing national, local, and industry employment. Jim Stock and Mark Watson investigate sources of volatility changes in residential construction using 40 years of state building permit data and a dynamic factor model with stochastic volatility.

Of course, Rob's most famous contribution to econometrics is the ARCH model, and the next five contributions focus on time-varying volatility. The empirical application in Rob's original ARCH paper was to UK inflation uncertainty, and Gianna Boero, Jeremy Smith and Ken Wallis test the external validity of Rob's conclusion by extending his 1958–77 sample through 2006. The ARCH class of models has subsequently found most widespread use in applications with financial data. However, Jim Hamilton shows that macroeconomists, primarily interested in inference about the conditional mean rather than the conditional variance, still need to think about possible ARCH effects in the data. Further exploring the link between macroeconomics and finance, Frank Diebold and Kamil Yilmaz examine the cross-sectional relationship between stock market returns
and volatility and a host of macroeconomic fundamentals. The chapter by Ole Barndorff-Nielsen, Silja Kinnebrock and Neil Shephard shows how the standard ARCH modeling framework may be enriched through the use of high-frequency intraday data and a new so-called realized semivariance measure for downside risk. Finally, Tim Bollerslev provides a glossary for the large number of models (and acronyms) that followed Rob's original ARCH formulation.

The next four chapters study various aspects of dynamic specification and forecasting that have interested Rob. David Hendry and Carlos Santos propose a test for "super exogeneity", a concept originally developed by Rob, David, and Jean-François Richard. Andrew Patton and Allan Timmermann discuss properties of optimal forecasts under general loss functions, and propose an interesting change of measure under which minimum mean square error forecast properties can be recovered. Gloria González-Rivera and Emre Yoldas develop a new set of specification tests for multivariate dynamic models based on the concept of autocontours. On comparing the fit of different multivariate ARCH models for a set of portfolio returns, they find that Rob's DCC model provides the best specification. This section is rounded out by Hal White, Tae-Hwan Kim, and Simone Manganelli who extend the CAViaR model for conditional quantiles that was originally proposed by Rob and Simone to simultaneously model multiple quantiles.

The final four chapters take up topics in finance. Luis Catão and Allan Timmermann study to what extent equity market volatility can be attributed to global, country-specific, and sector-specific shocks. Jacob Boudoukh, Christopher Downing, Matthew Richardson, and Richard Stanton explore the relationship between volatility and the term structure of interest rates. The continuous-time model developed in that chapter is quite general, but some of the ideas and empirical results are naturally related to Rob's original ARCH-M paper on time-varying risk premia in the term structure. The concept of risk-neutral distributions figures prominently in asset pricing finance as a way of valuing future risky payoffs and characterizing preferences toward risk, as exemplified in Rob's work with Josh Rosenberg. In his contribution to the volume, Stephen Figlewski provides an easy-to-follow step-by-step procedure for the construction of well-behaved empirical risk-neutral distributions. Rob has also been a leader in developing models to analyze intraday, high-frequency transactions data in financial markets. The last chapter by Taejin Kim and Jeffrey Russell proposes a new model for the minute-by-minute adjustments to the limit order book.

We thank the conference sponsors Duke University, the Journal of Applied Econometrics, Princeton University, the University of Chicago, and the University of California, San Diego. We thank all of the authors for their original contributions to this volume. More importantly, on behalf of the economics profession we thank Rob for his fundamental contributions to our field.

Finally, at the end of the June 21st dinner, Rob was presented with a bronze oak tree with 77 leaves. Inscribed on each leaf was the name and thesis title of one of Rob's students. So most importantly, on behalf of all Rob's past, present and future students we say simply "Thanks for growing us."

Tim Bollerslev
Jeffrey R. Russell
Mark W. Watson
1
A History of Econometrics at the University of California, San Diego: A Personal Viewpoint
Clive W. J. Granger
1. Introduction

It is difficult to decide when a history should start or finish, but as this account is based on my own recollections, I decided to start in 1974. This was when I arrived at the University of California, San Diego (UCSD) with a full-time position, although I had been a visitor for six months a couple of years earlier. The account will end in 2003 when both Rob Engle and I officially retired from UCSD. Of course, history never really ends and it will be up to later participants in the program to add further essays in the future. The account has been divided into three convenient periods: 1974–1984, the founding years; 1985–1993, the middle years; and 1994–2003, the changing years.
2. The Founding Years: 1974–1984

I arrived at UCSD in the summer of 1974 having spent 22 years at the University of Nottingham in England (apart from a year as a post-doc at Princeton in 1959–1960), starting there as an undergraduate and finishing as a Full Professor. At the time of my arrival the teaching of econometrics was largely done by John Hooper who was well known but not actively engaged in research. For my first year I was accompanied by Paul Newbold from Nottingham so that we could finish off our book on forecasting economic time series. We were surprised by how much time we had to work at UCSD compared to England as our teaching was easy, marking and student help was
provided by graduate students, lunch was brief, and there were no lengthy tea or coffee breaks during the day. The head of the department was Dan Orr who had been my best man when Patricia and I were married in Princeton's chapel in 1960, so we automatically had some good friends in the department.

During the first year I found myself on an outside committee chaired by Arnold Zellner of Chicago, which was organizing a large conference on seasonal adjustment. The committee met in Washington, DC to make a variety of decisions. Also on the committee was Rob Engle, then at MIT. After the meeting he asked me if I knew of a department looking for a time series econometrician, to which I replied that we were. He came out for a visit and both sides liked each other. Rob joined the department in the fall of 1975.

I had met Rob a couple of years earlier in a fortunate manner. Marc Nerlove at Chicago and I had been asked to select the speakers for three sessions at the forthcoming Econometrics World Congress. As might be expected we had many good applications, many from well-known people. However, we decided to dedicate one of our sessions just to young promising and (at that time) unpublished authors. Amongst the three we chose were Rob as well as Chris Sims, which suggests that long run forecasting is quite possible! It produced a good session at the congress.

A couple of years later, Hal White came as a visitor from the University of Rochester during our spring quarter, 1979. He soon found that he quite liked the department but greatly liked the beaches and weather. He joined us permanently in 1980, completing the initial group. By the end of this period, in 1984, all three of us were Fellows of the Econometric Society: Rob in 1982, Hal in 1983, and I had been one since 1972, which made us a small but distinguished group on the West Coast.

We did our research not only alone but also jointly when we found appropriate topics. We would be on the committee of each other's graduate students and also attend the almost weekly econometrics seminar, which was usually given by a visitor. Early in this period we started the "Tuesday's Econometricians Lunch" at a local restaurant, initially with just Rob, Hal, and myself and the occasional visitor. The topics could be far ranging, from football to going through one of our papers almost line by line. Some of our visitors so liked the idea that they adopted it, particularly Nuffield College, Oxford and Monash University in Melbourne. As our numbers grew, we stopped going out but instead met in the department for a "brown bag" luncheon. Some of the more famous ideas that came out of the group first saw the light in these meetings, as well as some other ideas that did not survive.

Two developments in this period are worth special attention as they produced two Nobel Prizes: Autoregressive Conditional Heteroskedasticity (ARCH) for Rob and Cointegration for me. I had written a paper on forecasting white noise, which was quite controversial. It suggested that functions of an uncorrelated series could be autocorrelated. When Bob Hall visited from Stanford to give a macro seminar, which Rob and I both attended, he had several equations with residuals having no serial correlations. I suggested to Rob that the squares of these residuals might not be white noise, but he did not agree.
He was still connected electronically to MIT so he called up the same data that Hall had used, performed the identical regressions, obtained the residuals, squared them, and found quite strong autocorrelations. Later, on a visit to the London School of Economics, he considered what model would produce this behavior and found the ARCH class, which has been such an enormous success. It is interesting to note that this example and also that used in Rob's first paper in the area were from macroeconomics, whereas the vast majority of its applications have been in finance. From the start I decided not to do research in ARCH and to leave the field to Rob as it was clear that it would be heavily involved with financial data, which was an area I had decided to leave, at least most of the time, a couple of decades before. Autoregressive Conditional Heteroskedasticity has become a huge research and application success mostly within the finance area, and many of our Ph.D. students chose to work in this area.

Cointegration arose from my interest in the "balance" of econometric models where if one side of the equation contained a strong feature, such as a trend, then necessarily the other side must also do so. I was particularly concerned about the error-correction model being possibly out of balance. I had a disagreement with David Hendry, of Oxford, who said he thought it was possible to add two I(1) series and get an I(0) process. I said he was wrong, but my attempt at a proof found cointegration and showed that he was correct. I did publish a couple of small papers on cointegration but in trying to get something into Econometrica I was told that they would need a discussion of testing, estimation, and an application. Rob said that he would be happy to provide these, and become a co-author of a paper that eventually became a "Citation Classic."

In this first period Rob produced 41 papers, five of which appeared in Econometrica, concerning spectral analysis and particularly spectral regression, regional economics, electrical residential load forecasting, various testing questions, exogeneity, forecasting inflation, and ARCH (in 1982). The exogeneity work has David Hendry as a co-author and links together my causality ideas and the statistical assumptions underlying estimations. Details can be found in his CV on his website.

In this period Hal produced one book, two edited volumes, and 14 papers, five of which appeared in Econometrica. Amongst the topics of interest to him were a variety of testing and asymptotic questions, maximum likelihood estimation of mis-specified dynamic models, and mis-specified nonlinear regression models.

My own contributions in this period were three books and 86 papers¹ concerning forecasting, transformed variables, temporal and spatial data, causality, seasonality, nonlinear time series, electric load curve pricing and forecasting, and the invertibility of time series. The innovation that was published in this period that had the greatest impact was fractional integration, or long memory processes.

¹ In counting papers I have excluded notes, comments, and book reviews.
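The property described in the ARCH anecdote above (a series that is serially uncorrelated while its squares are strongly autocorrelated) is easy to reproduce by simulation. The sketch below is purely illustrative; it is not the Hall data or any code connected to the chapter, and the ARCH(1) parameter values are arbitrary assumptions.

```python
import numpy as np

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

rng = np.random.default_rng(0)
T, omega, alpha = 5000, 1.0, 0.5          # illustrative ARCH(1) parameters
u = np.zeros(T)
h = np.full(T, omega / (1.0 - alpha))     # start at the unconditional variance
for t in range(1, T):
    h[t] = omega + alpha * u[t - 1] ** 2          # conditional variance
    u[t] = np.sqrt(h[t]) * rng.standard_normal()  # serially uncorrelated shock

print(f"lag-1 autocorrelation of u:    {sample_autocorr(u, 1):+.3f}")     # close to zero
print(f"lag-1 autocorrelation of u**2: {sample_autocorr(u**2, 1):+.3f}")  # clearly positive
```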
3. The Middle Years: 1985–1993

In this period the econometrics group was steadily productive, had many excellent visitors as discussed below, and also built the reputation of the graduate program substantially (also discussed in more detail later). This was a period of consolidation and growth in maturity. Towards the end of the period the original three of us were joined by Jim Hamilton who works in time series and macroeconometrics and had previously been a visitor here, as had Hal and I.
In this period Rob produced one book and 40 articles on topics including: Kalman filters, ARCH-M, cointegration and error-correction, meteor showers or heat waves, with an application to volatility in the foreign exchange market, modeling peak electricity demand, implied ARCH models for option prices, seasonal cointegration, testing superexogeneity in variance, common features and trends.

In this period Hal produced one book and 29 papers. Some of the papers considered neural networks, interval forecasting, trends in energy consumption, and testing for neglected nonlinearity in time series models. He also had several papers attempting to win the "least comprehensible title" competition. Examples are "Efficient Instrumental Variables Estimation of Systems of Implicit Heterogeneous Nonlinear Dynamic Equations With Nonspherical Errors" and "Universal Approximation Using Feedforward Networks With Non-Sigmoid Hidden Layer Activation Functions." He is well known for the robust standard errors now known as "White's Standard Error."

In his short couple of years with the department Jim produced an enormous and highly successful textbook on "Time Series" and also an article in the American Economic Review as well as his important work on switching regime models.

My own contributions in this period were three books and 60 articles. The topics include aggregation with common factors, cointegration, causality testing and recent developments, models that generate trends, nonlinear models, chaos, gold and silver prices, multicointegration, nonlinear transformations of integrated series, treasury bill curves and cointegration, and positively related processes.

One active area of research in this period concerned electricity prices and was conducted within a small consulting company called QUERI directed by Rob, Ramu Ramanathan, and myself. The advantages of the work were that we were encouraged to publish and a couple of graduate students obtained their degrees on the topics and took jobs with the electricity production industry. We were involved in an interesting real-time forecasting project of hourly electricity demand in a particular region of the Northwest. Using a very simple dynamic model we beat several other consulting groups who used rather complicated and sophisticated methods. The following year we also won and were not allowed to enter in the third year because the organizers wanted a different method to win. We submitted a paper to a leading journal about our experiences but it was initially rejected because the editor said it was not surprising that forecasts provided by Rob and myself won a competition.

In this eight-year period the group produced six books and 130 papers, often highly innovative and progressive.
4. The Changing Years: 1994–2003

In the previous period both Hal and Rob had received very tempting offers from other universities but fortunately had been persuaded to stay at UCSD. However, in this third period the inevitable changes started to occur when, in 1999, Rob accepted a professorship at the Stern School at New York University (NYU), although he did not officially retire from UCSD until 2003. This period started with a high note as two new econometricians joined us: Graham Elliott and Allan Timmermann. Both immediately showed considerable quality and
enthusiasm. Allan is best known for his financial forecasting models and Graham for unit root inference. For the first few years we had six econometricians at UCSD and the lunches, seminars, and other activities all continued. However, towards the end of the period the newer members had to take charge as Rob had left, Hal was often involved in consulting projects, and I was running out of energy. I finish this account in 2003 because that is the year that Rob and I both officially retired from UCSD and then a few months later we heard that we had won the Nobel Prize. Whether or not there is any causality involved will have to be tested by later retirements. Of course, the econometrics program is continuing and remains very active with Jim, Allan, Graham, and the more recent additions of Yixiao Sun and Ivana Komunjer.

In this period, while at UCSD Rob published one book and 16 articles, and in the five years at Stern he had one book and 10 articles. These included work on international transmission of volatility, forecasts of electricity loads, and autoregressive conditional duration.

Hal was very productive in the period with one book and 40 articles. The topics included the dangers of data mining (with Allan) and reality checks for data snooping, testing for stationarity, ergodicity, and for co-movement between nonlinear discrete-time Markov processes.

Jim published one book and 14 papers of which one was in Econometrica and one in the American Economic Review. He also became a Fellow of the Econometric Society. His research topics included testing Markov switching models, asking "What do leading indicators lead?", measuring the liquidity effect, the daily market for federal funds, what is an oil shock, and the predictability of the yield spread.

Allan published 30 papers on topics including implied volatility dynamics and predictive densities, nonlinear dynamics of UK stock returns, structural breaks and stock prices, moments of Markov switching models, data snooping, reform of the pension systems in Europe, and mutual fund performance in the UK.

Graham published 12 papers, three of which appeared in Econometrica. The topics included near nonstationary processes, testing unit roots, cointegration testing and estimation, monetary policy, and exchange rates.

I published two books and 65 papers. The books were about deforestation in the Amazon region of Brazil, and on modeling and evaluation. I was elected Corresponding Fellow of the British Academy and a Distinguished Fellow of the American Economic Association. Rob, Hal, and I all became Fellows of the American Academy of Arts and Sciences.

In all, the econometricians at UCSD produced five books and 187 papers in this period. We received two awards for best paper of the year from the International Journal of Forecasting (one by Hal and one by myself).

The period ended on a high note as Rob and I each received the Nobel Prize in Economics for 2003. The awards were presented in Stockholm in an exciting and memorable ceremony before many of our family members, colleagues, and friends. Rob's award was for ARCH and mine was for Cointegration, although causality was also mentioned. Although not explicitly stated I do believe that we partly won the awards for helping to develop the econometrics group at San Diego in 30 years from being inconsequential and unranked to a major group of substantial importance. A couple of rankings produced by the journal Econometric Theory had UCSD ranked in the top three departments in
the world. A later ranking, considering the productivity of students after obtaining their doctorates, ranked our students second, which I think is excellent. It suggests that we produce some serious academic econometricians.
5. Graduate students

On my arrival at San Diego I was told that I had a graduate student and that he was a rather unusual one. He was an Augustinian monk named Augustine. He was based at the Escorial in Spain and the story of how he found himself at UCSD was rather complicated. His religious order in Spain ran a college, not exclusively religious, and wanted someone to teach econometrics. They thought he was good at mathematics so they sent him to the United States to learn first statistics and then econometrics. Why he chose us was not clear but I was happy that he had passed the preliminary examination satisfactorily. However, I was surprised that he had decided to study stock market prices as a Ph.D. topic. After a successful first year he was called back by his order to start teaching and so did not finish his degree in San Diego. Later, he rose to a very high position in the college and always retained a very positive outlook, was cheerful, monkish, and delightful.

We have attracted some excellent graduate students who have built very successful careers such as Mark Watson, Tim Bollerslev, and Norm Swanson, but to mention just a few is unfair to all the others, many of whom have been terrific. Unfortunately the department has not kept careful record of all our students and so the lists that are included with this paper are of those who worked in some way with Rob or published with the other faculty members. I am sure that many of our excellent students have been left off the list and I apologize for this.

From the very beginning we had good students, some very good students, and in later years several excellent students. Many have built successful academic careers and have become well-known academics. As well as the steady flow from the United States, we have had memorable students from Spain, Canada, Australia, England, New Zealand, Taiwan, China, Korea, Hong Kong, Japan, Mexico, Italy, and Lebanon. Although some stayed in the US, most returned to their home countries which makes our international travel interesting these days. The quality and quantity of our graduate students certainly increased the standing of the department and made working here a pleasure.
6. Visitors

The location of the campus near the beaches of San Diego and cliffs of Torrey Pines, the usually great weather, and the steadily improving group of econometricians, quickly attracted the notice of possible visitors to the department, especially econometricians. Over the years we have enjoyed visits from many of the very best econometricians in the world such as David Hendry, Søren Johansen, and Timo Teräsvirta. There was a period when all three were here together with Katarina Juselius, James MacKinnon, and Tony Hall, which produced some exceptionally interesting discussions, socializing, and tennis. Over the years we have received and hopefully welcomed an incredible group of visitors
(a list is attached). I apologize to anyone who is left off but no official list was maintained and my memory is clearly fallible. The visitors added a great deal of depth, breadth, and activity to the econometrics group, as well as further improving our social life.

To illustrate the impact that the UCSD econometrics group had it is worth looking at the article "What Has Mattered to Economics Since 1970" by E. Han Kim, Adair Morse, and Luigi Zingales published in the Journal of Economic Perspectives, volume 20 number 4, Fall 2006, pages 189–202. The most cited article, with 4,318 citations, is Hal White's 1980 piece titled "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica volume 48, pages 817–838. The fourth most cited article is by Rob Engle and myself in 1987 on "Cointegration and Error-Correction: Representation, Estimation, and Testing," which appeared in Econometrica volume 55, pages 251–276, with 3,432 citations. The 10th most cited article is also by Rob Engle in 1982 on "Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation," which appeared in Econometrica volume 50, pages 987–1007, with 2,013 citations. Thus the UCSD group registered three of the top ten most cited papers with a total of nearly 10,000 citations between them. Also in the top 10 was a paper by one of our visitors, Søren Johansen, in 1988 "Statistical Analysis of Cointegration Vectors" from the Journal of Economic Dynamics and Control volume 12, pages 231–254. It is worth noting that this article lists the most cited articles throughout economics and not just econometrics.

Appearing at number 24 is Tim Bollerslev, a UCSD graduate, with his paper on GARCH from the Journal of Econometrics volume 31, pages 307–327, with 1,314 citations. Hal White appears again at 49th place with his 1982 paper on "Maximum Likelihood Estimates of Mis-Specified Models," Econometrica volume 50, pages 1–25. Finally, in 72nd place is Jim Hamilton with his 1989 paper on "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica volume 57, pages 357–384, in which he introduced his well-known regime switching model. Our final two mentions in the published list involve our own graduates. At 92 is T. Bollerslev, R. Chou, and K. Kroner on "ARCH Modeling in Finance" from the Journal of Econometrics, volume 52, 1,792 citations; and at number 99 are Rob Engle and B.S. Yoo on "Forecasting and Testing in Cointegrated Systems" also from the Journal of Econometrics, volume 35, with 613 citations.

To illustrate how highly ranked these papers are it is worth noting that further down, at numbers 131, 132, and 133 are three very well-known Econometrica papers by Jim Durbin (1970) on "Testing for Serial Correlation in Least Squares Regressions," by Trevor Breusch and Adrian Pagan (1979) on a "Simple Test for Heteroskedasticity and Random Coefficient Variation," and by Roger Koenker and Gilbert Bassett (1978) on "Regression Quantiles." Most publications in economics get very few citations so the papers mentioned here have been exceptionally successful. There are a few concepts in our field that can be considered as "post-citation." Examples are the Durbin–Watson statistic and Student's t-test which are frequently mentioned but very rarely cited. "Granger Causality" seems to fall into this category now and we should expect that ARCH eventually will as well.
7. Wives

Virtually all good researchers are backed up by a patient and caring wife and it would be wrong to write this history without mentioning our wives. The original three, Marianne, Patricia, and Kim were later joined by Marjorie and then by Solange. All have made substantial contributions to our success.
8. The Econometrics Research Project

In 1989 the UCSD administration decided to reward the publishing success of the econometrics group by forming an official UCSD Research Project for Econometrics. It was designed to encourage our faculty to seek research grants that could use the project as a home office with little overhead. It was also charged with being helpful to our visitors. In 1992 we were fortunate to have Mike Bacci join us after service in the US Navy. He keeps the project in good shape and has been extremely helpful to our many visitors, both senior and students, particularly those from overseas.
9. The UCSD Economics Department

The department has provided a supportive environment that has allowed us to grow both in size and quality. It has matured a great deal itself, now containing excellent scholars in several fields. There have been close research links with the econometricians and other faculty members, often leading to publications including Ramu Ramanathan (with Rob and myself), Max Stinchcombe (with Hal), Mark Machina (with me), and Valerie Ramey (with myself).

Although many of us are most excited by doing research and derive a great deal of pleasure from being involved in successful research projects, we are actually paid to teach both undergraduates and graduates. Over the years the number of students involved grew substantially. This produced larger class sizes and consequently changes in the methods of teaching. These developments allowed some of us to develop new approaches towards getting our messages across to classes who seem to be declining in levels of interest. The graduate students, acting as teaching assistants (TAs), were essential in helping overcome any problems. However, the UCSD faculty continued to give lectures and be available for discussions.
10. The way the world of econometrics has changed

When the UCSD econometrics group was starting, in the mid- to late-1970s, the field of econometrics was still dominated by large, simultaneous models with little dynamics, often built using short lengths of annual or, at best, quarterly data. The problems of how
to specify, estimate, test, and identify such models were essential ones, but very difficult, and some excellent econometrics was done on these topics. When data are insufficient in quantity one always replaces them, or expands them, by using theory. Evaluation of these models was difficult but forecasting comparisons were used.

The advent of faster computers and more frequent data, even in macro but particularly in finance, brought more attention to time series methods, such as those developed by Box and Jenkins. Forecast comparisons usually found that the new, simpler and dynamic models outperformed the old models. The fact that some of the classical techniques, such as linear regressions, could perform badly when series are I(1) also turned researchers' attention to the new methods. Some very famous university economics groups moved very reluctantly away from the classical areas and the research of the new groups, such as at UCSD, was not always well received.

A sign of this can be seen in the development of standard econometric textbooks. The early versions, available in the 1970s and well into the 1980s and sometimes beyond, would make virtually no mention of time series methods, apart from a possible brief mention of an AR(1) model or a linear trend. Today many textbooks cover almost nothing but time series methods with considerable attention being paid to ARCH, cointegration, fractional integration, nonlinear models including neural networks, White robust standard errors, regime switching models, and causality, all of which were developed at UCSD. I think that it can be claimed the work at UCSD has had a major impact. It will be a difficult task to keep this level of activity going.

Throughout the years discussed above the major institution concerned with econometrics was the Econometric Society and it was influential through its journal, Econometrica, started in 1933. It is acknowledged to be one of the most prestigious within the field of economics. Several of us have published in Econometrica, particularly Rob and Hal, and four of us are Fellows of the society. However, there have been remarkably few contacts between the organization of the society and the UCSD group. Rob was a member of the council for two three-year terms and I was a member for one term, but we were not asked to be active. Rob was an associate editor of Econometrica for the years 1975–1981 and that is the total of the contacts! We were asked to be on the boards of many other journals but somehow those who run the Econometric Society never warmed to what was being achieved here.
11. Visitors and students

Much of the strength of the UCSD econometrics group came from the quality of our students and visitors. Unfortunately no comprehensive lists were kept of our students or visitors, so to make appropriate lists we have taken two approaches. In list "A" are all the students for whom Rob Engle, as one of their examiners, signed the thesis, up to the year 2003. There are 60 names on list "A." For the rest of us we have just listed graduate students that have published with us up to the year 2003 or so. These lists give 31 students for Granger, 25 for White, eight for Hamilton, 10 for Timmermann, and one for Elliott, giving a total of 75 in all, although several students appear on more than one list. There is, of course, a great deal of overlap between the Engle list and these other lists.
There are 44 names on the visitors list and this is a very distinguished group. What follows is a partial list of distinguished econometricians who visited UCSD:

Lykke Andersen, Allan Anderson, Badi Baltagi, Alvaro Escribano, Philip Hans Franses, Ron Gallant, Peter Bossaerts, Peter Boswijk, James Davidson, Jesus Gonzalo, Niels Haldrup, Tony Hall, David Hendry, Kurt Hornik, Svend Hylleberg, Joao Issler, Eilev Jansen, Michael Jansson, Søren Johansen, Katarina Juselius, Jan Kiviet, Erich Kole, Asger Lunde, Helmut Lütkepohl, Essie Maasoumi, J. Magnus, John MacDonald, James MacKinnon, Graham Mizon, Ulrich Müller, Paul Newbold, Dirk Ormoneit, Rachida Ouysse, Gary Phillips, Ser-Huang Poon, Jeff Racine, Barbara Rossi, Pierre Siklos, Norm Swanson, Timo Teräsvirta, Dag Tjøstheim, Dick van Dijk, Herman van Dijk, Andrew Weiss, Minxian Yang
A. List of students for whom Rob Engle signed the Ph.D. thesis as an examiner

Richard Anderson, Heather Anderson, Yoshihisa Baba, Tim Bollerslev, Michael Brennan, Kathy Bradbury, Scott Brown, Sharim Chaudhury, Ray Chou, Mustafa Chowdhury, Riccardo Colacito, Ed Coulson, Zhuanxin Ding, Ian Domowitz, Alfonso Dufour, Alvaro Escribano, Ying-Feng (Tiffany) Gau, Isamu Ginama, Gloria González-Rivera, Jesus Gonzalo, Peter Hansen, Andreas Heinen, Che-Hsiung (Ted) Hong, Owen Irvine, Isao Ishida, Joao Issler, Oscar Jorda, Sharon Kozicki, Dennis Kraft, Sandra Krieger, Kenneth Kroner, Joe Lange, Gary Lee, Han Shik Lee, Wen-Ling Lin, Henry Lin, Simone Manganelli, Juri Marcucci, Robert Marshall, Allen Mitchem, Frank Monforte, Walter Nicholson, Victor Ng, Jaesun Noh, Andrew Patton, Lucio Picci, Gonzalo Rangel, Russell Robins, Joshua Rosenberg, Jeffrey Russell, Dean Schiffman, Kevin Sheppard, Aaron Smith, Gary Stern, Zheng Sun, Raul Susmel, Farshid Vahid, Artem Voronov, Mark Watson, Jeff Wooldridge, Byungsam (Sam) Yoo, Allan Zebede
B. List of UCSD students who published with Granger

Lykke Andersen, Heather Anderson, Melinda Deutsch, Zhuanxin Ding, Luigi Ermini, Raffaella Giacomini, Jesus Gonzalo, Jeff Hallman, B.-N. Huang, Tomoo Inoue, Yongil Jeon, Roselyn Joyeux, Mark Kamstra, Dennis Kraft, Chung-Ming Kuan, H.-S. Lee, T.-H. Lee, C.-F. Lin, J.-L. Lin, Matthew Mattson, Allan Mitchem, Norm Morin, Andrew Patton, Russell Robins, Chor-Yiu Sin, Scot Spear, Norman R. Swanson, Farshid Vahid-Araghi, Mark Watson, Sam Yoo
C. List of UCSD graduate students who published with Elliott

Elena Pesavento
D. List of UCSD graduate students who published with White

Stephen C. Bagley, Xiaohong Chen, C.-S. James Chu, Valentina Corradi, Ian Domowitz, Raffaella Giacomini, Silvia Gonçalves, Christian Haefke, Yong-Miao Hong, Mark Kamstra, Pauline Kennedy, Tae-Hwan Kim, Robert Kosowski, Chung-Ming Kuan, T.-H. Lee, Robert Lieli, Matthew Mattson, Teo Perez-Amara, Mark Plutowski, Shinichi Sakata, Chor-Yiu Sin, Liangjun Su, Ryan Sullivan, Norman R. Swanson, Jeff Wooldridge
E. List of UCSD graduate students who published with Hamilton

Michael C. Davis, Ana Maria Herrera, Oscar Jorda, Dong Heon Kim, Gang Lin, Josefa Monteagudo, Gabriel Perez-Quiros, Raul Susmel
F. List of UCSD graduate students who published with Timmermann

Marco Aiolfi, Massimo Guidolin, Robert Kosowski, Asger Lunde, David Miles, Andrew Patton, Bradley Paye, Gabriel Perez-Quiros, Davide Pettenuzzo, Ryan Sullivan
2
The Long Run Shift-Share: Modeling the Sources of Metropolitan Sectoral Fluctuations
N. Edward Coulson
1. Introduction

In this tribute to the career of Robert Engle, attention should be given to an aspect of his early career that is not universally recognized, that of urban and regional economist. As related in his interview with Diebold (2003), upon arriving at Massachusetts Institute of Technology (MIT) Engle was asked by Franklin Fisher and Jerome Rothenberg to collaborate on the construction of a multi-equation structural model of the Massachusetts economy, and this led to a number of publications at the outset of his career. His involvement with the field did not end there. An examination of Engle's curriculum vitae reveals that of his first 13 publications, seven were in the field of urban economics, and there are many more publications in that area through the early 1990s. Perhaps of equal interest is the fact that many of his contributions to "pure" econometrics used urban and regional data to illustrate the methods associated with those contributions. Two prominent examples are his paper on the parameter variation across the frequency domain (Engle, 1978a), and Engle and Watson (1981) which introduced the Dynamic Multiple-Indicator Multiple-Cause (DYMIMIC) model by treating
Acknowledgments: My thanks go to Mark Watson, an anonymous referee, and participants at presentations at the 2006 Regional Science Association and the Federal Reserve Banks of New York and St. Louis for helpful comments.
the decomposition of metropolitan wage rates. As he notes in the interview with Diebold, "there is wonderful data in urban economics that provides a great place for econometric analysis. In urban economics we have time series by local areas, and wonderful cross sections . . . ".

One of the natural links between urban economics and time series econometrics is the examination of aggregate urban fluctuations. Because of data availability, such analysis focuses on the determination of metropolitan area employment and labor earnings, and, again because of the data, sectoral level data are often employed in the analysis. This is helpful and appropriate, because both journalistic and academic explanations of the differences in cyclical movements of aggregate urban employment often center on differences in sectoral composition across metropolitan areas. On that account the focus turns to the sources of fluctuations in metropolitan industry sectors. For example, Brown, Coulson and Engle (1991), following Brown (1986), ask the basic question of whether or not metropolitan industry sectors are cointegrated (Engle and Granger, 1987) with national industry counterparts, and Brown, Coulson and Engle (1992) ask, in the context of constructing export base multipliers, under what circumstances metropolitan sectoral employment is cointegrated with aggregate metropolitan employment.

In what follows, I build on the methods of the above papers and others and propose to delineate the sources of sectoral fluctuations in metropolitan economies. This delineation has four steps. First, a general "city-industry" vector autoregression (VAR) is constructed, which accounts for both short and long run fluctuations at a number of different levels of aggregation. Second, a large number of "traditional" models of regional economics (including the two cointegration analyses of the preceding paragraph) are shown to be reductions of this general VAR, although a by-product of the analysis is that it is not likely that all of these reductions can be applied simultaneously. Both of these steps occur in the next section. In Section 3 the restrictions implied by the traditional models are tested using data from 10 sectors and five cities. None is found to be universally applicable, though some do less violence to the data than others. Given these results, the fourth step of estimating the complete VARs (for each city industry) is undertaken under four different assumptions. The overall result is that the traditional models are unsatisfactory because they neglect the role of local supply shocks, although this neglect does more damage in "short run" models than in those that invoke cointegration.
2. A general model and some specializations

The goal of this analysis is to estimate the sources of sectoral fluctuations in a metropolitan area – for example, the Los Angeles manufacturing sector. Such sources can be conveniently catalogued as arising from four different levels: national (aggregate US), industrial (US manufacturing), metropolitan (aggregate Los Angeles), and sources that are idiosyncratic to the particular metropolitan sector. Consider, then, the following VAR, which for simplicity is restricted to first order autoregressive processes (an assumption relaxed in its empirical implementation):

$$
\begin{pmatrix} \Delta n \\ \Delta i \\ \Delta m \\ \Delta e \end{pmatrix}_{t}
= \begin{pmatrix} k_1 \\ k_2 \\ k_3 \\ k_4 \end{pmatrix}
+ A_1 \begin{pmatrix} \Delta n \\ \Delta i \\ \Delta m \\ \Delta e \end{pmatrix}_{t-1}
+ A_0 \begin{pmatrix} n \\ i \\ m \\ e \end{pmatrix}_{t-1}
+ \begin{pmatrix} u_n \\ u_i \\ u_m \\ u_e \end{pmatrix}_{t},
\qquad
\operatorname{cov}\begin{pmatrix} u_n \\ u_i \\ u_m \\ u_e \end{pmatrix} = \Omega
\tag{1}
$$

where
n_t = log of aggregate national employment at time t,
i_t = log of national employment in the same specified industry at time t,
m_t = log of aggregate local employment at time t,
e_t = log of local employment in a specified industry at time t,
and the k_i are intercept terms.

Consideration of this issue has been one of the primary concerns of empirical regional economics over the past half century.¹ Over that period of time, a number of models have been developed that in essence impose extra structure on the parameters of (1). In the extreme, such simplifications become shift-share decompositions, requiring no estimation at all. The exact form of the shift-share decomposition varies somewhat. Our baseline version of this decomposition seems to originate in Emmerson, Ramanathan and Ramm (1975):

$$
\Delta e_t = \Delta n_t + (\Delta i_t - \Delta n_t) + (\Delta m_t - \Delta n_t) + (\Delta e_t - \Delta m_t - \Delta i_t + \Delta n_t).
\tag{2}
$$
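As a concrete illustration of decomposition (2), the sketch below computes the four components from a single year of growth rates. The numbers are hypothetical (they are not the chapter's data) and simply mirror the Boston finance example discussed in the text.

```python
# Hypothetical one-period log growth rates (not the chapter's data)
d_n = 0.05   # national aggregate employment growth
d_i = 0.10   # national industry growth (say, finance)
d_m = 0.03   # metropolitan aggregate growth (say, Boston)
d_e = 0.09   # metropolitan industry growth (Boston finance)

components = {
    "national":    d_n,                     # national component
    "industry":    d_i - d_n,               # industry component
    "total share": d_m - d_n,               # local aggregate (total share) component
    "local shift": d_e - d_m - d_i + d_n,   # change in the location quotient
}
for name, value in components.items():
    print(f"{name:12s} {value:+.3f}")

# By construction the four pieces add back to local industry growth:
assert abs(sum(components.values()) - d_e) < 1e-12
```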
Growth in a local industry is decomposed into four parts. The first component, the national component, estimates the impact of national employment movements on local employment movements. If, say, national employment grows at 5% in a year, then, other things equal, the local industry – say the finance sector in Boston – is also expected to grow at the same 5% rate. The second component, the industry component, is the deviation of the national industry growth rate from that of the nation as a whole. Thus if the national finance sector grew at a rate of 10%, then the Boston finance sector should, other things equal, also be expected to grow at that same rate, with national and industry factors each responsible for half of that growth. Similarly, the third component is dubbed by Dunn (1960) the "total share component", and is the deviation of the overall metropolitan growth rate from the national growth rate; obviously this is the contribution of local aggregate growth to local sector growth. The fourth component is the change in the industry's share of employment at the metropolitan level relative to its share at the national level. It is the percentage change in the familiar location quotient and is interpretable as the outcome of local supply shocks to local employment growth (given that the total share components net out local demand factors and the industry component presumably nets out technology shocks that are common to all locations).

¹ It should be noted at the outset that such a model can only be used to assess the sources of fluctuations of e_t, and not the other three series, all of which include e_t in their aggregations. A finding that e was a source of fluctuations of n, m, or i would seem to be vacuous without consideration of the impact of other industries or locations. For an analysis of the reverse question, industry and regional impacts on national fluctuations, see e.g. Horvath and Verbrugge (1996), Altonji and Ham (1990), Norrbin and Schlagenhauf (1988). At the metropolitan level the role of sectoral fluctuations in determining aggregate metropolitan employment is discussed in Coulson (1999) and Carlino, DeFina and Sill (2001).

How can the shift-share model be used to inform the specification of the VAR (1)? There are effectively two general approaches which are not mutually exclusive, though for the purposes of this paper they will be. One is to view (2) as an orthogonalization; that is, each of the components is assumed to be uncorrelated with the others, and therefore capable of separate study. How this has happened in the historical literature will be addressed below, but for the moment note that in the context of the VAR, the implications of this (Coulson, 1993) are that we should premultiply both sides of (1) by the orthogonalization matrix W, where:

$$
W = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 \\
-1 & 0 & 1 & 0 \\
1 & -1 & -1 & 1
\end{pmatrix}
\tag{3}
$$

and we have

$$
W \begin{pmatrix} \Delta n \\ \Delta i \\ \Delta m \\ \Delta e \end{pmatrix}_{t}
= W k + W A_1 \begin{pmatrix} \Delta n \\ \Delta i \\ \Delta m \\ \Delta e \end{pmatrix}_{t-1}
+ W A_0 \begin{pmatrix} n \\ i \\ m \\ e \end{pmatrix}_{t-1}
+ \begin{pmatrix} e_n \\ e_i \\ e_m \\ e_e \end{pmatrix}_{t}
\tag{4}
$$

where k is the vector representation of the intercept terms,

$$
u = W^{-1} e
\tag{5}
$$

and the components of e are orthogonal. Thus we can write

$$
\Omega = W^{-1} D W^{-1\prime}.
\tag{6}
$$
The orthogonalization of the VAR is much the same as occurs in ordinary VARs, in that the orthogonalization matrix is triangular; however, given the nature of the homogeneity restrictions, the model is an overidentified structural (B-form) VAR (Coulson, 1993; Lutkepohl, 2005). The reasonableness of the structure, which is equivalent to testing the overidentifying restrictions, is also a test of the reasonableness of separately analyzing the components of the shift-share decomposition, as is typically the case, even today. As it happens, models and modes of regional analysis that view shift-share through this lens very often make implicit (and sometimes explicit) assumptions on the nature of the long run behavior of the components, that is to say, on the form of the matrix A0 . As is well known, the rank of A0 is critical to the time series representation of the vector of variables. If this rank is zero the variables are all integrated of order 1 (at least) and are not cointegrated; it happens that this is the explicit assumption of many previous models, as noted in Brown, Coulson and Engle (1991). It is for this reason that
2 A general model and some specializations
17
shift-share is regarded as a short run model. Long run considerations, as manifested in A0 , are non-existent If the rank is positive but less than the full rank of four, there is cointegration among the variables. If the rank is full, then the variables are ostensibly stationary – that is, integrated of order zero (I(0)). It will be demonstrated later that this last possibility will not trouble us much, and so if A0 = 0 then the proper question is how many cointegrating vectors exist within the system? Let the four components of the data vector be notated as xt . The essence of cointegration is that the number of variables in x is greater than the number of integrated processes characterizing their evolution, therefore the components of x are tied together in the long run. This is delivered by the fact that while each of the x variables is I(1), certain linear combinations are I(0). If those combinations are notated as β x we can write: A0 = αβ
(7)
where β is the kxr matrix of the r cointegrating vectors, and α is the kxr matrix of adjustment speeds.2 As is well known, α and β are not separately identified (since for any nonsingular 4 × 4 matrix F, the two vectors α∗ = αF and β ∗ = F−1 β would be observationally equivalent to α and β). The usual procedure is to specifiy restrictions on β, which are usually zero or normalization restrictions. To anticipate the implications of the long run shift-share model, we suppose that in our system of four variables we have three cointegrating vectors. A0 would therefore have rank = 3 and therefore 15 free parameters. The matrix of adjustment speeds, α, is typically freely estimated, and therefore uses up 4×3 = 12 parameters, leaving β with three free parameters. Typically, then, the cointegrating vectors would be given, without loss of generality, as: ⎞ ⎛ β1 1 0 0 ⎟ ⎜ (8) β = ⎝β2 0 1 0⎠ . β3 0 0 1 This is, of course, where the shift-share decomposition comes in. The second strand of models that deal with shift-share analysis have used the decomposition to identify, and overidentify, the matrix β. Accumulate and slightly rearrange the decomposition (2) to obtain the identity: (et − nt ) = (it − nt ) + (mt − nt ) + (et − mt − it + nt ).
(9)
The idea is that now each of the parenthetical terms represents a cointegrating vector; that while each of the data series is I(1), the differences displayed represent I(0) objects. Equally obvious is the fact that if any three of the parenthetic terms are I(0) the fourth one is as well, and so one can, indeed must, be omitted from the rows of β. In the standard formulation (8), this long run shift-share model would impose the further restrictions 2 Note that we could write the levels term as α(β x ). The parenthetic part is known as the error t correction term, and is a measure of the distance of the x vector from its long run equilibrium. The α term is then described as the speed of adjustment and it, as the name suggests, is a measure of how fast the error correction term goes to its equilibrium value of zero.
18
The long run shift-share
β1 = β2 = β3 = −1. But clearly we could implement the alternative formulation: ⎛
−1
1
0
⎜ 1 β = ⎝−1 0 1 −1 −1
⎞ 0 ⎟ 0⎠ 1
(10)
which implies that the industry component, the total share and the location quotient are all I(0). This form of β is attractive in that it is simply the last three rows of W. It should be noted that the existence of three cointegrating regressions in four variables implies that the entire system is driven in the long run by one shock. Given the implicit assumptions on causality that are inherent in the W matrix, that shock is the one to national employment. This seems somewhat implausible, so (as in the short run model) we can consider other parameterizations to this model as alternatives to the long run shift-share, models that assume some cointegration, but not “full cointegration” as implied by the long run shift-share model. To summarize: the short run shift-share model implies (a) that rank(A0 ) = 0 so that the model is one of changes; and (b) an orthogonalization of those data series that involves homogeneity restrictions. The long run shift-share implies (a) rank(A0 ) = 3, and (b) similar homogeneity restrictions on the cointegrating matrix β. We can now survey the historical development of the shift-share model as a series of restrictions on the above delineated types. It should not be assumed that the authors who are cited as developing these various models necessarily found evidence in their favor, only that they developed and used them for testing or forecasting purposes.
2.1. Dunn (1960): The total share model Dunn (1960) views the shift-share model as a model of total regional employment rather than local sectoral employment. He proposes the following decomposition:3 Δm = Δn + (Δm − Δn)
(11)
With m-n as the share of the region in the national economy, the second term is, naturally enough, the shift in that share. Hence, the name. Because this needs to be distinguished from industry based shift-share, these are actually dubbed the “total” shift and share. Given the language in Dunn (1960), this model is viewed as one in which, other things equal, the region should grow at the same rate as the nation as a whole. Dunn clearly views the model as one of the short run, hence we would view the decomposition as a reduction of the orthogonalization scheme above, specifically W31 = −1 and W32 = 0. The total share model does not operate at the level of the industry (either local or national). This is not at all the same thing as assuming that industry effects are nonexistent, merely that they are not part of the assumed structure. Thus the W matrix 3 Actually, Dunn (1960), and much of the literature that follows, frames shift and share in terms of numbers of jobs gained or lost. Thus they would premultiply both sides of (2) by mt−1 (or later by et−1 ). In the interest of simplicity this modification is ignored.
2 A general model and some specializations
19
is written as: ⎛
1 ⎜β ⎜ 21 W =⎜ ⎝ −1 β41
0 1 0
0 0 1
β42
β43
⎞ 0 0⎟ ⎟ ⎟. 0⎠ 1
Also, there is no cointegration between m and n, and thus the total share is I(1). In the first order model above this implies that the share is a random walk.
2.2. Carlino and Mills (1993): Long run constant total share (stochastic convergence) In direct contrast to Dunn’s implicit assumption that the total share is a random walk, Carlino and Mills (1993) test for the proposition that the total share m-n is I(0).4 Thus the share held by a particular region is constant in the long run. This is taken as evidence of stochastic convergence, the idea being that deviations from this share will not persist. The long run constant total share model is therefore manifested as the restriction rank (β) = 1, as there is only one long run restriction, and this row will be of the form (−1 0 1 0); that is, neither et nor it is expected to be part of the long run model.
2.3. H. Brown (1969): Sectoral shift-share Brown (1969) introduced the three part shift-share model, which shifted focus from the total regional share to the industry share: Δe = Δn + (Δi − Δn) + (Δe − Δi)
(12)
with attention focusing on the behavior of the final term, the regional shift, which is easily seen to be the change in the industry share (e/i) held by the region. The fact that these three terms were regarded as separately analyzed series is an implicit assumption that the decomposition is in fact an orthogonal one (Coulson, 1993). Noting that m plays no role in this decomposition the W−1 matrix is of the form: ⎛ ⎞ 1 0 0 0 ⎜ −1 1 0 0⎟ ⎜ ⎟ W =⎜ (13) ⎟. ⎝w31 w32 1 0⎠ 0
−1
0
1
Once the three part decomposition is developed, the assumption of orthogonality becomes explicit, as modeling of the shift component e/i is now the focus of the research program. Not only that, but the short run assumption also becomes operational. In an attempt to frame shift-share as a forecasting tool, Brown (1969) postulated that the region’s industry share was a random walk. This implies not only the orthogonalization suggested in (12) but also that there is no cointegration between e and i. 4 Though
perhaps with a structural break.
20
The long run shift-share
2.4. S. Brown, Coulson and Engle (1991): Constant share This is the natural counterpart to the martingale share model, implying that the orthogonalization in (12) is appropriate at the long run rather than short run time horizon, that is that there is a single cointegrating vector in β, and that the row is of the form (0 1 0 −1). Brown, Coulson and Engle (1991) frame this model as a set of regional production functions with a fixed-in-location factor of production, technology shocks are national, and migration across regions equilibrates the share of production for each region in the long run as a function of the region’s share in the fixed factor.
2.5. Sasaki (1963): The base multiplier model One of the workhorse models of regional economics is the base multiplier model, which implies a relationship between (a set of) local industry employments (the basic sectors) and aggregate regional employment. This is not a relationship between e and m, per se, but between the sum of a subset of e’s and m and was first placed in a regression context by Sasaki (1963). Nevertheless, while a regression of m on e would yield a biased estimate of the multiplier (as the intercept term), if the base multiplier theory holds there should still be a unit elastic relationship between each sectoral employment and total regional employment in the short run.
2.6. Brown, Coulson and Engle (1992): Long run base multiplier If the employment series are integrated, then Brown, Coulson and Engle (1992) demonstrate that the base multiplier model implies that e and m will be cointegrated, regardless of whether e is not part of the basic sectors, if certain other conditions hold, to be discussed shortly. Thus there will be a row of β that can be rendered as (0 1 0 −1). It is of interest to note that the combination of restrictions implied in models 2, 4 and 6 yield the long run shift-share model. The three rows of β discussed in those models are linearly independent and are equivalent to the matrix in equation (10). As a further interesting demonstration of this note that: (et − mt ) = (nt − mt ) + (et − it ) + (it − nt ). Thus the three models together imply that national industries are cointegrated with the national aggregate. This seems implausible on its face, as it would imply that technology shocks are identical across industries. Thus, one of the three long run models must be wrong. Our preliminary candidate is model (6), the long run base multiplier. The “certain other conditions” alluded to above have to do with the cointegration of the various sectors at the local level. Basically, if e is a basic sector, then it must be cointegrated with other basic sectors, again implying something like common technology shocks. If e is a local sector, it must be cointegrated with other local sectors, which presumably implies common demand shocks. At the local level this is slightly more plausible; nevertheless, the long run shift-share model does require a lot of the data. To round out the model descriptions we reiterate the full models described in the beginning:
3 Data and evidence
21
2.7. Coulson: The four part shift-share model The three part decomposition/orthogonalization (model 5) is unsuitable particularly because it is difficult to interpret the components of the decomposition in a coherent manner. If e and i are employments in an export-oriented industry then a change in the share plausibly represents supply shocks to that region-industry (at least relative to the national industry); but if there are regional demand shocks for the local output, then the shift term will conflate them with the supply shocks. As noted, the four part decomposition originated by Emmerson, Ramanathan and Ramm (1975) overcomes this problem by re-introducing the total shift and re-orthogonalizing (as particularly noted by Berzeg (1978)) with W as described in (3) above. Thus this four part model is basically a pair of hypotheses: (a) that there is no cointegration among the four variables in levels, and thus that the VAR should be specified in changes; (b) that the matrix W describes the appropriate orthogonalization.
2.8. Long run shift-share The long run counterpart to Coulson (1993) is the long run shift-share, as previously discussed.5 There are three maintained hypotheses, that (a) the data are integrated; (b) there are three cointegrating vectors among the four variables; (c) that (10) describes the cointegrating relationships.
3. Data and evidence Data on full-time employment are drawn from the website of the US Bureau of Labor Statistics (www.bls.gov). Data are drawn from five different Metropolitan Statistical Areas (MSAs): Philadelphia, Dallas, Atlanta, Chicago, and Los Angeles. These example MSAs were chosen more or less at random, to represent a diversity of regions and economic bases. Not every industry is available in every MSA, so for purposes of comparability, we use the broad industry aggregates (“supersectors”) of the North American Industry Classification System (NAICS), which are listed in the Tables. Comparable data are drawn from the US employment page for aggregate and supersector employment. The data are monthly and range from January 1990 through August 2006. The start date reflects the availability of consistently measured city-industry data.6 Our first task is to determine the integratedness of the series in question. All of the models above implicitly assume that the series are indeed integrated. Table 2.1 presents augmented Dickey–Fuller tests for each of the series. The Dickey–Fuller test is a test 5 There are several other variants on the above models, but these are omitted from the present survey. Brown (1969) argues that the shift itself is a random walk, which would indicate that the employment series are I(2). Test results (not included) indicate no evidence of this. Theil and Ghosh (1980) model the decomposition in effect as a two-way ANOVA model, where the interaction term, i.e. the location quotient, plays no role. 6 The conversion of BLS industry classifications from Standard Industrial Classification (SIC) to NAICS occurred in 1997. The BLS could only reliably backcast the MSA industry-level data to 1990, and neither has it recreated SIC data after this change. The lack of long run information in these time series may cause the lack of specificity in the VAR results.
22 Table 2.1.
The long run shift-share Unit root tests US
Total Construction Durable Manufacturing Nondurable Manufacturing Trade, Transportation and Utilities Finance Information Professional and Business Services Education and Health Services Leisure and Hospitality Services Other Services Government
Philadelphia Dallas Atlanta Chicago
Los Angeles
−1.88 −3.79∗ −1.60 −1.35 −1.55
−2.69 −3.20 −2.81 −2.14 −3.42
−3.92∗ −3.37 −1.53 −2.51 −3.34
−2.86 −3.01 −0.59 −0.78 −2.51
−3.01 −1.86 −2.75 −1.60 −2.91
−3.74∗ −3.48∗ −3.49∗ −2.06 −2.91
−3.45∗ −1.39 −2.10
−2.52 −0.23 −3.19
−2.66 0.10 −3.16
−1.05 0.61 −2.17
−2.97 −0.57 −2.66
−2.09 −1.65 −2.16
−2.14
−2.07
−2.39
−1.46
−3.22
−3.34
−1.00
−2.40
−1.55
−1.58
−1.97
−2.24
−0.79 −1.69
−3.22 −1.81
−0.55 −2.47
−1.82 −2.97
−2.61 −0.59
−2.27 −3.04
The table entries are the t-values from an Augmented Dickey–Fuller test for unit roots in the indicated series. Asterisks indicate a test-statistic with a prob-value between 1 and 5% for rejecting the null hypothesis that a unit root exists, against the I(0) alternative. The Dickey–Fuller regressions contain an intercept, a time trend, and lags of the first difference as selected by the Schwarz information criterion.
for stationarity, regressing the change in a variable on the lagged level (i.e. a univariate version of the final equation in the VAR equation (1)). Rejection of the null hypothesis indicates that the series in question is I(0). As can be seen, the test-statistics are almost invariably above (i.e. closer to zero than) the 5% critical value. Of the 72 series listed in Table 2.1, four have test-values less than the 5% critical value, about what would be expected if the null were universally true and the tests were independent (which of course they are not). The general conclusion is therefore that the series are indeed integrated. This paves the way for Table 2.2, which tests the extent to which the four series are cointegrated with each other. The unrestricted VAR (equation (1)) with four lags is estimated using each city-industry and the three more aggregated series that correspond to it.7 Trace tests (Johansen, 1995) are then performed sequentially to reject or fail to reject whether the rank of the matrix A0 is zero, one, or two. That is, zero rank is tested, 7 Equation (1) contains intercept terms. The VARs are estimated under the assumption that part of this intercept is “inside” the cointegrating relation and part is “outside” (which are not separately identified). The first part then corresponds (under the homogeneity assumption at least) with the proportionalities which exist across different levels of aggregation, and the second is congruent with the assumption that the employment levels have deterministic trends. The models are estimated using Eviews, which identifies the first part by assuming that the “inside” part is zero during estimation, and then regressing the error correction term on an intercept. The difference between that intercept and the estimated constant becomes the trend term.
3 Data and evidence Table 2.2.
23
Trace tests of the long run shift-share Philadelphia Dallas Atlanta Chicago
Construction Durable Manufacturing Nondurable Manufacturing Trade, Transportation and Utilities Finance, Insurance and Real Estate Information Services Professional and Business Services Education and Health Services Leisure and Hospitality Services Other Services Government
3 1 3 2 1 1 2 3 2 1 3
3 2 2 1 1 1 2 2 2 1 3
3 3 1 0 2 1 2 2 2 2 2
3 3 2 1 1 1 1 3 1 1 1
Los Angeles 2 2 2 3 2 3 2 3 3 3 3
The table entries are the number of cointegrating vectors in a four equation system consisting of the logs of national employment, total city employment, total industry employment and city-industry employment for the indicated row and column. Sequential trace tests were employed to reject (or fail to reject) ranks in the A0 matrix of zero, one, and two at 5% critical values.
and if rejected, a rank of one is tested, and if that is rejected a rank of two is tested. Given that a rank of four is only possible if the data are I(0), testing ceases if a rank of two is rejected in favor of a rank of three. Recall that the long run shift-share hypothesis is that the rank of the matrix is three. Five points can be made from Table 2.2: 1. There is cointegration of some kind in almost all of the VARs. Only one combination, that associated with Atlanta’s Trade, Transport and Utilities sector, failed to reject the null hypothesis rank (A0 ) = 0, and the prob-value of that test was 6.5%. 2. At the other extreme, there is only a modest amount of empirical support for the long run shift-share model, in that relatively few of the VARs exhibit three cointegrating vectors. This is to be expected given the discussion above. 3. Nevertheless, there are patterns to the number of cointegrating vectors. More cointegration (and more evidence of the long run shift-share model) are observable in the construction and Government sectors and in the Los Angeles MSA. Other industries (information services, finance, other services) and cities exhibit much less cointegration. 4. A question of importance is the extent to which the results from point 3 are influenced by the results from Table 2.1. For instance, Los Angeles has more cointegration than other cities, but it is also one of the two cities where the unit root null was rejected for its aggregate employment. On the other hand, Dallas’ aggregate For comparability purposes, it was desirable that the number of lags (four) in the VAR be the same across the different models. Four lags is something of a compromise; one lag is clearly too short to provide for the dynamics in the data, but using 12 (as might be suggested by the use of monthly data) seems, according to a random inspection of information criteria, like overparameterizing the model.
24
The long run shift-share employment also was not found to have a unit root, and its VARs exhibit considerably less cointegration than those of Los Angeles. Similarly, aggregate employment in the US finance sector also appeared to be I(0), and yet across the MSAs, the finance sector’s VARs exhibit much less cointegration than the construction sector, or indeed any sector. The bottom line is that very little about Table 2.3 could have been inferred a priori from the results in Table 2.1. 5. As the predominant finding is that these VARs exhibit one or two cointegrating relationships, it would be prudent to use the bivariate cointegrating models 2, 4, and 6 to seek a better understanding. Tables 2.3 and 2.4 pursue this course.
The first row of Table 2.3 examines the cointegration (or lack thereof) between aggregate city employment and national employment. The table entries are the trace test-statistic for cointegration in the indicated two-variable VAR. An asterisk indicates rejection at the 5% level of the null hypothesis that there is no cointegration between aggregate city employment and aggregate US employment. The nonrejection is taken as evidence in favor of Dunn’s total share model (Model 1) and this is the case for Dallas and Chicago. For the other three cities, the trace test indicates that there is cointegration between city and national employment. This is partial evidence in favor of the stochastic convergence, long run constant share model of Carlino and Mills (Model 2), but that model also requires that the cointegrating vector have unit coefficients. Thus, for those entries that reject the null hypothesis, a notation of = 1 indicates that 1 is in the 95% confidence interval for the un-normalized coefficient in the cointegrating regression. This result is congruent with model 2. An indication of = 1 indicates otherwise. The Carlino–Mills Model appears to hold for Los Angeles, but not for Atlanta and Philadelphia. Rows in Table 2.3 beyond the first are analogous results for city-industry employment and national counterparts. Lack of cointegration (no asterisk) is taken as evidence in favor of Model 3, H. Brown’s presentation of the shift-share model, whereas cointegration with unit coefficients is evidence for Model 4, S. Brown, Coulson and Engle’s constant industry share model. What conclusions can be drawn from these rows of Table 2.3? The industry level results have a broad range of interpretations. At one extreme is the Education and Health Services sector, in which all five city employments are cointegrated with national employment, and four of those are statistically indistinguishable from the constant share model. An interpretation of this is of course that permanent shocks, i.e. productivity shocks, occur at the national level and percolate immediately to each local industry, and local productivity shocks are unimportant. At the other extreme is the information sector, where no local sector is cointegrated with the aggregate. An interpretation of this result is that productivity shocks are completely local; there is no national trend to tie local sectors to the broader. Although a few industries display results that are to an extent like those of the information sector (e.g. nondurable manufacturing), the most common outcome is a mixture of noncointegration and cointegration with a nonunit coefficient. For example, Professional and Business Services exhibits two cities with no cointegration and three with nonunit cointegration. Aside from the difficulties of interpreting nonunit cointegration (as a partial adoption of national technology shocks?) the variety of responses makes it supremely difficult to draw general conclusions. Returning to the aggregate level we see that only Los Angeles fails to reject the homogeneity requirement for the constant long run share model, whereas Dallas and
Table 2.3.
Trace tests of the constant share model Philadelphia
Total Construction Durable Manufacturing Nondurable Manufacturing Trade, Transportation and Utilities Finance, Insurance and Real Estate Information Services Professional and Business Services Education and Health Services Leisure and Hospitality Services Other Services Government
24.39∗ 57.24∗ 7.27 13.29 29.95∗ 10.61 2.87 11.22 35.34∗ 35.43∗ 19.31∗ 27.15∗
= 1 = 1 = 1
= 1 =1 =1 = 1
Dallas 14.73 18.40∗ 8.51 5.87 21.28∗ 6.39 11.98 39.92∗ 19.69∗ 62.67∗ 32.32∗ 25.51∗
= 1 = 1 = 1 =1 = 1 = 1 = 1
Atlanta 23.1∗ 9 = 1 13.58 9.98 5.01 22.10∗ = 1 11.34 12.81 15.74∗ = 1 19.78∗ = 1 61.21∗ = 1 12.81 20.25∗ = 1
Chicago 13.76 87.28∗ 15.53∗ 22.69∗ 12.31 18.39∗ 7.85 26.85∗ 20.87∗ 120.41∗ 33.56∗ 12.00
= 1 = 1 = 1 = 1 = 1 =1 = 1 = 1
Los Angeles 42.12∗ 25.20∗ 30.88∗ 7.79 13.45 23.07∗ 12.90 14.23 28.66∗ 13.43 24.31∗ 8.57
=1 = 1 = 1 = 1
=1 = 1
With the exception of the first row, the table entries are the trace test-statistic for cointegration between the indicated city-industry and its national counterpart. An asterisk indicates significance (i.e. cointegration) at the 5% level. For each significant result, the notation =1 indicates that the cointegration coefficient contains 1 in its 95% confidence interval, =1 indicating the contrary. The first row is the corresponding statistic for aggregate employment.
Table 2.4.
Trace tests of the multiplier model Philadelphia
Construction Durable Manufacturing Nondurable Manufacturing Trade, Transportation and Utilities Finance, Insurance and Real Estate Information Services Professional and Business Services Education and Health Services Leisure and Hospitality Services Other Services Government
30.54∗ = 1 5.06 4.29 11.84 6.24 5.83 6.09 11.66 4.39 2.73 31.83∗ = 1
Dallas 66.18∗ = 1 7.62 1 36.94∗ = 7.64 14.66 2.91 1 41.45∗ = 3.66 11.92 10.29 13.15
Atlanta
Chicago
31.19∗ = 1 7.75 4.17 8.23 10.09 6.85 30.44∗ = 1 8.42 9.16 25.3∗ = 1 7.55
43.39∗ 9.87 20.69∗ 33.90∗ 12.77 2.76 25.37∗ 15.08 14.12 3.51 58.80∗
= 1 = 1 = 1 = 1
= 1
Los Angeles 5.44 14.49 9.80 39.20∗ = 1 45.5∗ = 1 41.02∗ = 1 48.02∗ = 1 10.16 19.13∗ = 1 8.56 14.92
The table entries are the trace test-statistic for cointegration between the indicated city-industry and its regional aggregate. An asterisk indicates significance (i.e. cointegration) at the 5% level. For each significant result, the notation =1 indicates that the normalized cointegration coefficient contains 1 in its 95% confidence interval, =1 indicating the contrary.
3 Data and evidence
27
Chicago, at the other extreme, are not cointegrated at all with national employment. Again the absence of similar results across cities makes generalities impossible. But even so, puzzles arise. For example, Los Angeles conforms to the long run total share model even though none of the component industries do. Table 2.4 provides tests of the Brown, Coulson, Engle model of long run base multipliers. The Table entries are notated as before. There is very little evidence of cointegration (aside from Los Angeles, and the construction and business service sectors) and almost no evidence of unit responses (only two cases). This, as noted, is to be expected. The model of Brown, Coulson and Engle (1991), for example, assumes that permanent components of employment series are due to productivity shocks, it is quite natural for there to be cointegration between local and national sectors in the same industry. It would be quite another matter for different industries in the same city to have such a correspondence. As Brown, Coulson and Engle (1992) note, it is possible for a single local industry series to be cointegrated with its metropolitan aggregate. The example discussed there concerned a single basic sector, which could be cointegrated with metropolitan employment if it was cointegrated with the other basic sectors. Such a scenario, as noted, is quite unlikely, as the productivity shocks are unlikely to be the same across industries. What is perhaps more likely is a second example (only indirectly discussed in Brown, Coulson and Engle (1992)) where a single local sector can be cointegrated with the aggregate if it is cointegrated with other local-serving industries. This is more plausible only in the sense that local-serving industries are largely in the service sector, and the dominant form of permanent shocks is perhaps more likely to be local demand shocks, and therefore common across sectors. By this reasoning it is perhaps sensible that the cointegration that does occur is in two sectors that are plausibly local-serving: construction and business services. Obviously, given the mixture of results, neither the long run shift-share nor the short run shift-share fully describe the fluctuations of regional economies. In order to say more, the VAR itself must actually be estimated. We will perform four VARs for each city-industry: • (A) The short run shift-share: The model is estimated in differences, and the orthogonalization (3) is imposed. • (B) The short run VAR: The model is estimated in differences and only a causal ordering implied by W is imposed (i.e. without the homogeneity restrictions). • (C) The intermediate model: Cointegration is assumed, with the number of cointegrating vectors as indicated by Table 2.2. Statistically, this is the “correct” model. • (D) The long run shift-share: Three cointegrating relations are assumed and the homogeneity restrictions are added.8 The VARs are estimated using four lags as before, and compared using the 24-month forecast error variance decomposition. The results for the six sampled MSAs are in Tables 2.5 through 2.9. The variation of results across cities and industries is large. 8 A fifth model was estimated, which provided for three cointegrating relations, but without imposing the homogeneity restriction of the long run shift-share. As might be expected, the results were intermediate between those of Models C and D.
Table 2.5. Sector
C DM NM TU F IS PS ES LS OS G
Philadelphia VARs Model A Short run shift-share
Model B Short run without shift-share restrictions
Model C Long run without shift-share restrictions
Model D Long run shift-share
n
i
m
e
n
i
m
e
n
i
m
e
n
i
m
e
17.6 39.8 28.9 24.2 28.8 30.7 26.4 26.5 22.1 24.6 28.7
31.5 21.0 22.9 23.9 25.5 25.7 24.3 18.9 23.5 26.5 23.7
17.7 20.1 21.8 26.7 22.0 21.4 26.4 36.8 25.0 24.1 26.9
33.2 19.0 26.4 25.2 23.7 22.2 22.9 17.8 29.4 24.8 20.7
11.7 13.0 3.5 13.3 3.8 4.6 1.4 1.6 0.6 2.4 0.7
35.9 9.5 3.6 29.1 26.6 1.9 0.7 0.7 2.2 1.1 2.9
9.2 26.6 35.5 10.8 8.0 57.3 58.0 56.1 47.4 31.7 23.9
63.3 50.9 57.4 75.0 61.6 36.2 39.9 41.6 49.8 64.8 72.4
11.0 21.2 1.5 2.9 8.8 17.3 15.2 4.8 5.3 8.1 4.3
8.1 61.3 13.5 23.9 22.0 50.8 33.1 2.3 7.5 5.1 3.7
27.6 9.8 28.1 72.4 20.3 2.2 6.4 36.4 39.7 8.1 32.2
53.4 7.8 56.9 13.2 48.8 29.7 45.4 56.6 47.5 78.6 59.7
10.6 60.3 1.4 30.8 15.2 31.8 24.9 15.3 4.8 1.5 10.1
9.9 20.9 9.4 8.8 9.2 37.2 15.0 0.5 10.4 1.5 3.4
20.3 1.7 20.5 48.2 22.1 4.2 16.3 36.0 20.4 7.1 29.7
59.3 17.1 68.8 12.2 53.5 26.8 43.8 48.2 64.4 89.9 56.7
The table entries are the percentage of the 24-month forecast error variance of local employment that can be ascribed to the indicated shock. Sector abbreviations given in the first column correspond to sectors listed in the first columns of Tables 2.1–2.4.
Table 2.6. Sector
C DM NM TU F IS PS ES LS OS G
Dallas VARs Model A Short run shift-share
Model B Short run without shift-share restrictions
Model C Long run without shift-share restrictions
Model D Long run shift-share
n
i
m
e
n
i
m
e
n
i
m
e
n
i
m
e
53.0 72.3 54.4 60.5 69.5 63.8 9.7 12.2 64.5 66.1 52.9
20.6 19.6 31.1 7.6 24.1 18.5 68.9 48.3 14.9 18.9 22.3
5.7 3.2 1.5 15.5 2.3 7.7 9.8 7.3 6.7 5.0 10.3
20.7 4.9 13.0 16.4 4.1 10.0 11.6 32.2 13.9 10.0 14.5
8.5 22.7 8.0 3.7 6.7 21.3 5.3 1.2 1.0 3.5 0.7
1.5 6.8 4.3 2.4 6.8 4.5 4.0 1.6 1.3 1.2 1.7
31.7 10.8 52.5 54.9 22.0 4.8 59.8 55.4 36.2 20.0 14.3
62.4 59.7 35.3 39.0 64.5 69.4 30.9 41.9 61.4 75.3 83.4
63.1 56.3 73.6 49.5 32.1 59.0 22.8 1.2 45.5 38.4 19.6
1.2 33.9 12.4 3.4 19.5 6.9 59.3 0.8 7.5 4.4 8.2
20.2 0.7 9.6 44.5 2.5 1.3 9.7 10.4 18.4 7.6 24.2
15.5 9.1 4.3 2.6 45.9 32.8 8.2 87.6 28.6 49.6 48.1
42.7 58.0 75.4 55.7 26.2 58.0 40.5 1.3 34.5 37.7 12.5
9.9 25.8 8.1 4.9 17.4 2.8 28.7 13.8 12.0 8.0 7.4
21.9 2.7 9.8 28.2 23.1 1.4 17.6 32.3 16.9 6.7 22.4
25.5 13.6 6.7 11.2 33.2 37.8 13.2 52.5 36.6 47.5 57.7
The table entries are the percentage of the 24-month forecast error variance of local employment that can be ascribed to the indicated shock.
Table 2.7. Sector
C DM NM TU F IS PS ES LS OS G
Atlanta VARs Model A Short run shift-share
Model B Short run without shift-share restrictions
Model C Long run without shift-share restrictions
Model D Long run shift-share
n
i
m
e
n
i
m
e
n
i
m
e
n
i
m
e
27.7 27.1 25.4 19.1 27.9 35.7 25.9 22.8 22.0 26.8 30.9
25.9 25.5 27.2 30.3 24.8 25.2 24.8 22.5 25.6 23.9 19.2
22.7 22.5 22.6 17.8 23.9 17.7 27.5 28.4 25.2 25.0 29.0
23.8 24.8 24.8 32.8 23.4 21.4 21.8 26.3 27.2 24.3 27.9
6.7 5.1 2.1 4.3 2.5 8.7 3.9 1.9 1.6 3.7 23.7
3.7 4.0 3.0 2.4 5.7 10.7 4.6 0.9 1.2 0.3 27.7
39.2 14.5 22.5 55.7 29.2 14.2 65.7 35.8 45.0 6.1 20.7
50.4 76.4 72.3 37.5 62.6 66.4 25.8 61.4 52.2 89.9 77.3
59.1 30.8 15.5 4.3 39.3 31.1 50.0 6.5 22.0 28.4 10.2
5.1 24.8 1.6 2.4 6.3 38.4 18.7 5.3 17.6 1.1 18.6
7.3 13.1 33.0 55.7 3.5 22.2 14.6 14.9 22.8 2.6 19.2
28.6 31.2 49.9 37.5 50.9 8.3 16.7 73.3 37.7 67.9 51.9
27.0 44.4 49.9 57.0 33.5 40.1 46.8 0.9 27.1 16.9 1.4
14.3 5.4 9.4 2.7 4.8 34.5 16.5 19.5 17.4 3.2 14.0
29.2 3.2 9.6 20.4 10.3 5.6 4.5 28.8 17.5 13.3 30.2
29.5 47.1 31.2 19.9 51.4 19.8 32.1 50.8 38.0 66.6 54.4
The table entries are the percentage of the 24-month forecast error variance of local employment that can be ascribed to the indicated shock.
Table 2.8. Sector
C DM NM TU F IS PS ES LS OS G
Chicago VARs Model A Short run shift-share
Model B Short run without shift-share restrictions
Model C Long run without shift-share restrictions
Model D Long run shift-share
n
i
m
e
n
i
m
e
n
i
m
e
n
i
m
e
45.1 36.4 35.0 17.7 28.0 28.4 21.3 36.3 13.5 25.0 30.2
8.8 24.9 23.4 30.0 24.3 23.3 29.6 13.6 31.5 25.3 19.4
38.9 19.6 22.1 18.4 22.8 28.6 18.3 33.9 14.3 24.0 31.7
16.8 19.1 19.6 34.0 25.0 33.5 30.9 16.1 40.7 25.7 18.6
30.0 9.7 4.9 2.6 2.0 22.9 3.4 1.0 1.0 2.0 0.6
14.8 11.3 4.8 2.0 1.9 21.8 2.4 3.6 1.2 1.5 1.1
38.4 32.8 47.8 51.7 30.1 21.8 56.7 45.7 43.1 29.9 29.9
62.3 46.3 42.6 43.7 66.1 77.1 37.6 49.6 54.7 66.6 68.5
15.6 25.0 21.1 47.6 3.4 24.7 18.7 10.0 23.8 19.6 2.7
2.4 17.4 63.3 1.4 2.2 30.2 45.5 42.8 10.2 15.1 5.2
33.0 2.2 4.8 27.8 40.4 2.2 16.9 14.4 34.4 27.9 47.2
49.0 55.4 10.8 23.1 54.0 42.8 18.9 32.8 31.6 37.4 44.9
16.2 17.4 15.7 46.4 7.5 10.1 51.4 4.1 21.0 35.0 6.0
4.6 2.2 48.9 1.3 1.3 23.9 17.0 17.4 10.8 4.5 6.2
31.0 55.4 8.2 32.2 6.7 9.5 13.9 20.7 33.9 22.1 41.1
48.2 57.2 27.2 20.1 84.6 56.5 17.7 57.8 34.3 38.3 46.6
The table entries are the percentage of the 24-month forecast error variance of local employment that can be ascribed to the indicated shock.
Table 2.9. Sector
C DM NM TU F IS PS ES LS OS G
Los Angeles VARs Model A Short run shift-share
Model B Short run without shift-share restrictions
Model C Long run without shift-share restrictions
Model D Long run shift-share
n
i
m
e
n
i
m
e
n
i
m
e
n
i
m
e
32.0 36.6 5.5 22.2 25.1 26.9 27.4 21.8 25.2 27.1 25.8
21.9 23.1 5.4 27.3 25.3 19.2 24.2 30.5 24.5 23.6 25.5
26.6 22.3 51.2 22.5 26.3 33.1 27.3 21.2 24.9 29.4 24.2
19.5 18.0 33.6 28.0 23.4 20.8 21.1 26.4 25.3 20.0 24.5
3.5 8.1 23.2 0.8 2.4 4.7 3.5 0.1 0.3 0.3 2.0
2.8 6.6 27.9 2.2 3.1 2.2 2.4 0.6 1.9 2.3 2.1
48.2 45.6 15.4 66.9 40.2 23.1 54.4 30.0 53.1 58.0 19.6
45.5 39.7 37.9 30.0 54.3 70.0 39.7 69.3 44.7 39.3 76.3
20.7 17.6 31.0 30.9 12.9 22.7 35.0 0.8 9.8 3.9 0.4
17.7 43.3 12.5 0.2 5.5 22.6 6.0 16.1 0.4 0.4 1.1
5.9 5.9 1.5 58.5 13.6 7.5 8.3 8.5 9.2 7.9 17.9
55.6 33.3 55.0 10.4 68.0 47.3 50.6 74.6 80.5 87.8 80.6
26.1 29.3 23.5 24.1 6.0 6.0 33.0 4.6 16.7 4.0 6.5
6.7 46.8 24.2 2.2 20.4 7.3 6.0 9.9 19.4 2.3 5.1
4.9 3.5 3.9 55.4 14.4 4.4 10.0 8.5 22.7 26.4 15.0
62.4 20.5 48.4 18.3 59.2 82.3 51.1 77.1 41.2 67.3 73.3
The table entries are the percentage of the 24-month forecast error variance of local employment that can be ascribed to the indicated shock.
4 Summary and conclusions
33
The following stylized conclusions might be drawn, although every one of these has exceptions. A good starting point is the comparison of Models A and B, both of which assume a lack of cointegration, but which differ in whether they impose the shift-share orthogonalization (A) or not (B). Not shown in the tables is that the overidentifying restrictions that impose the normalization are universally rejected. The shift-share model is not appropriate; in comparing models A and B we see that the statistically preferred Model B assigns far more explanatory power, on average, to the local industry shock, and (less regularly, the aggregate metro shock) than Model A. This is natural; what the shift-share model does is force local movements to follow movements in broader aggregates in a one-for-one manner, thus ignoring the role of local supply shocks. This can be contradictory to the actual movements of local industries, and thus not imposing the short run constraints would seem to be preferable. Another way of looking at this is to note that in the first step of the variance decomposition in Model A, all four shocks are given equal weight (as per the structure of the matrix W), and the force of this persists even to the 24-step horizon. When the statistically preferred number of cointegrating vectors are assumed to exist (Model C), the results are generally closer to the results in Model B than to those in Model A. Generally, though, Model C does assign more explanatory power to national and national-industry shocks than does B. This is to be expected given the previous bivariate results of Tables 2.3 and 2.4. Note that bivariate cointegration was far more common in the relationship between local industry and national industry than between local industry and the aggregate local economy. Thus, we would expect that when cointegration is allowed into the system, the impact of the nation and national industry would increase. By and large (but by no means universally) this result is confirmed. As we move from Model C to Model D, recall that two modeling changes are made. First, the number of cointegrating vectors is forced to be three. This would not be expected to make much of a difference in the results, as the extra cointegrating coefficient would presumably be close to zero. The imposition of unit coefficients (especially when they would otherwise be zero) is therefore presumably of more importance. Note first of all (test statistics not shown) that these unitary restrictions are universally rejected by the data at any conventional level of significance. Second, although there are strong differences in the results, these results do not appear to have any systematic pattern. In particular, the share of the forecast error variance that is absorbed by the idiosyncratic shock does not show systematic rise or fall when the long run shift-share restrictions are imposed. Thus, the imposition of the long run shift-share might be particularly dangerous, as there is little indication of in which direction the bias from the model runs.
4. Summary and conclusions A natural intersection of urban economics and time series econometrics is in the examination of urban fluctuations. In this chapter, the work of Robert Engle at this intersection is carried forward. The traditional models of metropolitan sectoral fluctuations investigated by Engle and others are shown to be special cases of a general four-dimensional VAR.
34
The long run shift-share
Many of the restrictions that the traditional models embody are shown to be largely rejected by the data in favor of models with greater parameterization. This would seem to be due, at least in the short run, to the fact that the traditional models try to track local sectoral fluctuations by using broader aggregates. This implicitly minimizes the role of local productivity shocks, which, according to the variance decomposition, turn out to be quite important. In the long run there is some connection between local sectoral movements and broader aggregates via cointegrating relationships, but the relationship is not homogenous, and the imposition of shift-share type restrictions is not recommended even in the long run.
3
The Evolution of National and Regional Factors in US Housing Construction James H. Stock and Mark W. Watson
1. Introduction This chapter uses a dynamic factor model with time-varying volatility to study the dynamics of quarterly data on state-level building permits for new residential units from 1969–2007. In doing so, we draw on two traditions in empirical economics, both started by Rob Engle. The first tradition is the use of dynamic factor models to understand regional economic fluctuations. Engle and Watson (1981) estimated a dynamic factor model of sectoral wages in the Los Angeles area, with a single common factor designed to capture common regional movements in wages, and Engle, Lilien, and Watson (1985) estimated a related model applied to housing prices in San Diego. These papers, along with Engle (1978b) and Engle and Watson (1983), also showed how the Kalman filter could be used to obtain maximum likelihood estimates of the parameters of dynamic factor models in the time domain. The second tradition is modeling the time-varying volatility of economic time series, starting with the seminal work on ARCH of Engle (1982). That work, and the extraordinary literature that followed, demonstrated how time series models can be used to estimate time-varying variances, and how changes in those variances in turn can be linked to economic variables. The dynamics of the US housing construction industry are of particular interest for both historical and contemporary reasons. From an historical perspective the issuance of building permits for new residential units has been strongly procyclical, moving closely Acknowledgments: This research was funded in part by NSF grant SBR-0617811. We thank Dong Beong Choi and the Survey Research Center at Princeton University for their help on this project and Jeff Russell and a referee for comments on an earlier draft. Data and replication files are available at http://www.princeton.edu/∼mwatson
35
36
The evolution of national and regional factors in US housing construction
0.6 0.4 0.2 0.0 –0.2 –0.4 –0.6 –0.8
1965
1970
1975
1970
1975
1980
1985
1990
1995
2000
2005
2010
3
2
1
0
–1
–2
–3
1965
1980
1985
1990
1995
2000
2005
2010
Fig. 3.1. Four-quarter growth rate of GDP (dark line) and total US building permits in decimal units (upper panel) and in units of standard deviations (lower panel) with overall growth in GDP but with much greater volatility. Figure 3.1 plots fourquarter growth rates of GDP and aggregate US building permits from 1960–2007. Like GDP growth and other macroeconomic aggregates, building permits were much more volatile in the first half of the sample period (1960–1985) than in the second half of the period (1986–2007). In fact, the median decline in the volatility of building permits is substantially greater than for other major macroeconomic aggregates. From a contemporary perspective, building permits have declined sharply recently, falling by approximately 30% nationally between 2006 and the end of our sample in 2007, and the contraction in housing construction is a key real side-effect of the decline in housing prices and the
1 Introduction
37
2.5 2.0 1.5
Northeast Southeast Northcentral Southwest West
1.0 0.5 0.0 –0.5 –1.0 –1.5 1975
1980
1985
1990
1995
2000
2005
2010
Fig. 3.2. Deviation of regional 30-year fixed mortgage rates from the national median, 1976–2007 (units are decimal points at an annual rate). Data Source: Freddie Mac Primary Mortgage Market Survey
turbulence in financial markets during late 2007 into 2008. Because building permit data are available by state, there is potentially useful information beyond that contained in the national aggregate plotted in Figure 3.1, but we are unaware of any systematic empirical analysis of state-level building permit data. In this chapter, we build on Engle’s work and examine the coevolution of state-level building permits for residential units. Our broad aim is to provide new findings concerning the link between housing construction, as measured by building permits,1 and the decline in US macroeconomic volatility from the mid-1980s through the end of our sample in 2007, often called the Great Moderation. One hypothesis about the source of the Great Moderation in US economic activity is that developments in mortgage markets, such as the elimination of interest rate ceilings and the bundling of mortgages to diversify the risk of holding a mortgage, led to wider and less cyclically sensitive availability of housing credit. As can be seen in Figure 3.2, prior to the mid-1980s there were substantial regional differences in mortgage rates across the US; however, after approximately 1987 these differences disappeared, suggesting that what had been regional mortgage markets become a single national mortgage market. According to this hypothesis, these changes in financial markets reduced the cyclicality of mortgage credit, which in turn moderated the volatility of housing construction and thus of overall employment. This chapter undertakes two specific tasks. The first task is to provide a new data set on state-level monthly building permits and to provide descriptive statistics about 1 Somerville (2001), Goodman (1986) and Coulson (1999) discuss various aspects of the links between housing permits, starts, and completions.
38
The evolution of national and regional factors in US housing construction
these data. This data set was put into electronic form from paper records provided by the US Bureau of the Census. These data allow us to characterize both the comovements (spatial correlation) of permits across states and changes in volatility of state permits from 1969 to the present. The second task is to characterize the changes over time in the volatility of building permits with an eye towards the Great Moderation. If financial market developments were an important source of the Great Moderation, one would expect that the volatility of building permits would exhibit a similar pattern across states, and especially that any common or national component of building permits would exhibit a decline in volatility consistent with the patterns documented in the literature on the Great Moderation. Said differently, finding a lack of a substantial common component in building permits along with substantial state-by-state differences in the evolution of volatility would suggest that national-level changes in housing markets, such as the secondary mortgage market, were not an important determinant of housing market volatility. The model we use to characterize the common and idiosyncratic aspects of changes in state-level volatility is the dynamic factor model introduced by Geweke (1977), modified to allow for stochastic volatility in the factors and the idiosyncratic disturbances; we refer to this as the DFM-SV model. The filtered estimates of the state variables implied by the DFM-SV model can be computed by Markov Chain Monte Carlo (MCMC). The DFMSV model is a multivariate extension of the univariate unobserved components-stochastic volatility model in Stock and Watson (2007a). In the DFM-SV model, state-level building permits are a function of a single national factor and one of five regional factors, plus a state-specific component. Thus specification of the DFM-SV model requires determining which states belong in which region. One approach would be to adopt the Department of Commerce’s definition of US regions; however, that grouping of states was made for administrative reasons and, although the groupings involved some economic considerations, those considerations are now out of date. We therefore follow Abraham, Goetzmann, and Wachter (1994) (AGW) and Crone (2005) by estimating the regional composition using k-means cluster analysis. Our analysis differs from these previous analyses in three main respects. First, we are interested in state building permits, whereas AGW studied metropolitan housing prices, and Crone was interested in aggregate state-wide economic activity (measured by state coincident indexes from Crone and Clayton-Matthews (2005)). Second, we estimate the clusters after extracting a single national factor, whereas AGW estimated clusters using percentage changes in metropolitan housing price indexes and Crone estimated clusters using business cycle components of the state-level data, where in both cases a national factor was not extracted. Third, we examine the stability of these clusters before and after 1987. The outline of the chapter is as follows. The state-level building permits data set is described in Section 2, along with initial descriptive statistics. The DFM-SV model is introduced in Section 3. Section 4 contains the empirical results, and Section 5 concludes.
2. The state building permits data set This section first describes the state housing start data set, then presents some summary statistics and time series plots.
2 The state building permits data set
39
2.1. The data The underlying raw data are monthly observations on residential housing units authorized by building permits by state, from 1969:1–2008:1. The data were obtained from the US Department of Commerce, Bureau of the Census, and are reported in the monthly news release “New Residential Construction (Building Permits, Housing Starts, and Housing Completions).” Data from 1988–present are available from Bureau of the Census in electronic form.2 Data prior to 1988 are available in hard copy, which we obtained from the Bureau of the Census. These data were converted into electronic form by the Survey Research Center at Princeton University. For the purpose of the building permits survey, a housing unit is defined as a new housing unit intended for occupancy and maintained by occupants, thereby excluding hotels, motels, group residential structures like college dorms, nursing homes, etc. Mobile homes typically do not require a building permit so they are not counted as authorized units. Housing permit data are collected by a mail survey of selected permit-issuing places (municipalities, counties, etc.), where the sample of places includes all the largest permitting places and a random sample of the less active permitting places. In addition, in states with few permitting places, all permitting places are included in the sample. Currently the universe is approximately 20,000 permitting places, of which 9,000 are sampled, and the survey results are used to estimate total monthly state permits. The universe of permitting places has increased over time, from 13,000 at the beginning of the sample to 20,000 since 1974.3 Precision of the survey estimates vary from state to state, depending on coverage. As of January 2008, eight states have 100% coverage of permitting places so for these states there is no sampling error. In an additional 34 states, the sampling standard error in January 2008 was less than 5%. The states with the greatest sampling standard error are Missouri (17%), Wyoming (17%), Ohio (13%), and Nebraska (12%). In some locations, housing construction does not require a permit, and any construction occurring in such a location is outside the universe of the survey. Currently more than 98% of the US population resides in permit-issuing areas. In some states, however, the fraction of the population residing in a permit-issuing area is substantially less; the states with the lowest percentages of population living within a permit-requiring area are Arkansas (60%), Mississippi (65%), and Alabama (68%). In January 2008, Arkansas had 100% of permitting places in the survey so there was no survey sampling error; however, the survey universe only covered 60% of Arkansas residents.4 The series analyzed in this chapter is total residential housing units authorized by building permits, which is the sum of authorized units in single-family and multiplefamily dwellings, where each apartment or town house within a multi-unit dwelling is counted as a distinct unit. 2 Monthly releases of building permits data and related documentation are provided at the Census Bureau Website, http://www.census.gov/const/www/newresconstindex.html 3 The number of permit-issuing places in the universe sampled by date are: 1967–1971, 13,000; 1972– 1977, 14,000; 1978–1983, 16,000; 1984–1993, 17,000; 1994–2003, 19,000; 2004–present, 20,000. 
4 Additional information about the survey and the design is available at http://www.census.gov/const/ www/newresconstdoc.html#reliabilitybp and http://www.census.gov/const/www/C40/sample.html
40
The evolution of national and regional factors in US housing construction 32 28
Ohio Louisiana Kansas New Jersey Vermont
Thousands
24 20 16 12 8 4 0
1970
1975
1980
1985
1990
1995
2000
2005
2010
2005
2010
28 Ohio Louisiana Kansas New Jersey Vermont
24
Thousands
20 16 12 8 4 0
1970
1975
1980
1985
1990
1995
2000
Fig. 3.3. Quarterly building permits data for five representative states. Upper panel: not seasonally adjusted. Lower panel: seasonally adjusted using Census X12
The raw data are seasonally unadjusted and exhibit pronounced seasonality. Data for each state was seasonally adjusted using the X12 program available from the Bureau of the Census. Quarterly sums of the monthly data served as the basis for our analysis. The quarterly data are from 1969:I through 2007:IV.5
2.2. Summary statistics and plots Quarterly data for five representative states, Ohio, Louisiana, Kansas, New Jersey, and Vermont are plotted in Figure 3.3 (upper panel). Three features are evident in these plots. First, there is not a clear long run overall trend in the number of permits issued, 5 The
raw data are available at http://www.princeton.edu/∼mwatson
2 The state building permits data set
41
and for these states the number of permits issued in 2007 is not substantially different from the number issued in 1970. Second, the raw data are strongly seasonal, but the seasonality differs across states. Not surprisingly, the states with harsher winters (Ohio and Vermont) have stronger seasonal components than those with more moderate winters (Louisiana). Third, there is considerable volatility in these series over the several-year horizon (building permits are strongly procyclical).
The lower panel of Figure 3.3 presents the seasonally adjusted quarterly building permits data for the same five states. The comovements among these series can be seen more clearly in these seasonally adjusted data than in the nonseasonally adjusted data. For example, these states (except Vermont) exhibited a sharp slowdown in building activity in the early 1980s and a steady growth in permits through the 1990s.
Summary statistics for the seasonally adjusted building permits data for all 50 states are given in Table 3.1. The average quarterly number of building permits (first numeric column) differs by an order of magnitude across states. The average growth rate of building permits (second numeric column) is typically small in absolute value, and is negative for many states, especially in the northeast. The third and fourth numeric columns report the standard deviation of the four-quarter growth in building permits, defined as:
$$\Delta_4 y_{it} = y_{it} - y_{it-4}, \qquad (1)$$
where $y_{it} = \ln(BP_{it})$, and $BP_{it}$ denotes the number of building permits in state i and time t. These standard deviations reveal first the great volatility in permits in all states, and second the marked decline in volatility in most states between the first and second half of the sample. In most states, the standard deviation fell by approximately one-half (variances fell by 75%) between the two subsamples.
The final three columns of Table 3.1 examine the persistence of building permits by reporting a 95% confidence interval, constructed by inverting the ADF $t_\mu$ statistic (columns 5 and 6) and, in the final column, the DF-GLS$_\mu$ t statistic, both computed using four lags in the quarterly data. The confidence intervals indicate that the largest AR root is near one, and all but three of the confidence intervals contain a unit root. The DF-GLS$_\mu$ statistics paint a somewhat different picture, with 25 of the 50 statistics rejecting a unit root at the 5% significance level. Such differences are not uncommon using unit root statistics, however. Taken together, we interpret these confidence intervals and DF-GLS$_\mu$ statistics as consistent with the observation suggested by Figure 3.3 that the series are highly persistent and plausibly can be modeled as containing a unit root.
For the rest of the chapter we therefore focus on the growth rate of building permits, either the quarterly growth rate or (for comparability to the literature on the Great Moderation) on the four-quarter growth rate $\Delta_4 y_{it}$ defined in (1). The four-quarter growth rates of building permits for each of the 50 states are plotted in Figure 3.4. Also shown (solid lines) are the median, 25%, and 75% percentiles of growth rates across states, computed quarter by quarter. The median growth rate captures the common features of the five states evident in Figure 3.3, including the sharp fall in permits (negative growth) in the early 1980s, the steady rise through the 1990s (small fluctuations around a positive average growth rate), and the sharp decline in permits at the end of the sample. This said, there is considerable dispersion of state-level growth
Table 3.1. Seasonally adjusted state building permits: summary statistics

State   Avg. quarterly   Avg. annual    SD of Δ4y     SD of Δ4y     95% CI for largest AR root   DF-GLSμ
        permits          growth rate    1970–1987     1988–2007     Lower      Upper             statistic
CT      3463             −0.030         0.29          0.23          0.92       1.02              −0.42
MA      5962             −0.022         0.33          0.20          0.90       1.02              −1.42
MD      7780             −0.013         0.34          0.18          0.84       1.01              −2.42∗
ME      1313              0.022         0.34          0.20          0.88       1.02              −1.30
NH      1657              0.002         0.39          0.26          0.86       1.01              −2.06∗
NJ      8247             −0.010         0.33          0.28          0.83       1.01              −2.15∗
NY      11147            −0.002         0.33          0.17          0.90       1.02              −1.33
PA      10329            −0.005         0.29          0.15          0.86       1.01              −2.72∗∗
RI      953              −0.020         0.45          0.23          0.90       1.02              −2.04∗
CA      43210            −0.015         0.38          0.21          0.87       1.01              −2.70∗∗
ID      2047              0.061         0.45          0.21          0.90       1.02              −0.12
IN      7482             −0.004         0.35          0.15          0.89       1.02              −2.02∗
MI      11138            −0.028         0.38          0.19          0.91       1.02              −0.87
NV      5745              0.044         0.48          0.33          0.88       1.02              −0.73
OH      11619            −0.016         0.35          0.14          0.88       1.02              −1.37
OR      5259              0.009         0.38          0.22          0.86       1.01              −2.55∗
SD      863               0.032         0.46          0.30          0.90       1.02              −1.31
WA      9828              0.006         0.31          0.16          0.87       1.01              −2.76∗∗
WI      6918              0.002         0.31          0.15          0.90       1.02              −2.11∗
IA      2837             −0.001         0.38          0.19          0.93       1.02              −1.95∗
IL      12256            −0.011         0.45          0.16          0.88       1.02              −1.49
KA      2989              0.002         0.41          0.22          0.82       1.00              −3.30∗∗
MN      6937             −0.013         0.32          0.17          0.87       1.02              −1.95∗
MO      5870             −0.005         0.36          0.18          0.83       1.01              −2.17∗
ND      762               0.011         0.44          0.31          0.88       1.02              −2.18∗
NE      2025              0.004         0.37          0.22          0.89       1.02              −2.28∗
DE      1274              0.000         0.44          0.18          0.89       1.02              −1.94
FL      39213            −0.005         0.45          0.22          0.36       0.91              −4.61∗∗
GA      15586             0.016         0.36          0.17          0.90       1.02              −1.83
HA      2021             −0.018         0.39          0.36          0.90       1.02              −0.72
KY      3846              0.000         0.40          0.19          0.82       1.01              −2.50∗
MS      2449              0.019         0.42          0.21          0.87       1.01              −2.71∗∗
NC      13623             0.032         0.33          0.14          0.87       1.01              −1.04
SC      6520              0.025         0.31          0.16          0.90       1.02              −1.34
TN      7687              0.013         0.42          0.17          0.81       1.00              −2.88∗∗
VA      12674            −0.001         0.35          0.18          0.79       0.97              −3.52∗∗
VT      704               0.014         0.36          0.25          0.89       1.02              −0.77
WV      749               0.016         0.52          0.20          0.93       1.02              −1.05
AK      703               0.006         0.62          0.35          0.89       1.02              −1.97∗
AL      4612              0.015         0.40          0.17          0.85       1.01              −2.99∗∗
AR      2405              0.015         0.40          0.22          0.85       1.01              −2.19∗
AZ      12274             0.016         0.46          0.22          0.86       1.01              −1.92
CO      8725              0.007         0.44          0.22          0.83       1.01              −2.70∗∗
LA      4461              0.005         0.42          0.20          0.87       1.02              −2.23∗
MT      690               0.027         0.46          0.29          0.92       1.02              −1.64
NM      2481              0.036         0.45          0.21          0.36       0.94              −1.25
OK      3637              0.000         0.46          0.21          0.88       1.02              −2.06∗
TX      30950             0.017         0.37          0.16          0.91       1.02              −1.89
UT      3891              0.032         0.40          0.21          0.86       1.01              −1.45
WY      548               0.045         0.42          0.31          0.92       1.02              −0.75
The units for the first numeric column are units permitted per quarter. The units for columns 2–4 are decimal annual growth rates. The 95% confidence interval for the largest autoregressive root in column 5 is computed by inverting the ADFμ t-statistic, computed using four lags. The final column reports the DF-GLSμ t-statistic, also computed using four lags. The DF-GLSμ t-statistic rejects the unit root at the: ∗ 5% or ∗∗ 1% significance level. The full quarterly data set spans 1969Q1–2007Q4.
rates around the median, especially in the mid-1980s. Also clearly visible in Figure 3.4 is the greater volatility of the four-quarter growth rate of building permits in the first part of the sample than in the second.
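The unit-root statistics reported in Table 3.1 can be reproduced in outline with standard tools. The following is a minimal sketch, not the authors' code: it assumes `log_bp` is a pandas Series of log quarterly building permits for a single state, uses four lags as in the table, and relies on statsmodels for the ADF statistic and on the optional `arch` package for the DF-GLS statistic.

```python
# Sketch of the ADF and DF-GLS statistics of Table 3.1 (constant-only case, four lags).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def unit_root_stats(log_bp: pd.Series, lags: int = 4) -> dict:
    """ADF and (if `arch` is installed) DF-GLS t-statistics with a fixed lag length."""
    y = log_bp.dropna()
    adf_t, adf_p, *_ = adfuller(y, maxlag=lags, regression="c", autolag=None)
    out = {"ADF_t": adf_t, "ADF_pvalue": adf_p}
    try:
        from arch.unitroot import DFGLS          # optional dependency
        out["DFGLS_t"] = DFGLS(y, lags=lags, trend="c").stat
    except ImportError:                          # arch not installed
        out["DFGLS_t"] = np.nan
    return out
```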
2.3. Rolling standard deviations and correlations
Figure 3.4 shows a decline in volatility in the state-level building permit data and also substantial comovements across states. Here we provide initial, model-free measurements of these two features.
Volatility. Rolling standard deviations of the four-quarter growth rate of building permits for the 50 states (that is, the standard deviation of Δ4 yit), computed using a centered 21-quarter window, are plotted in Figure 3.5; as in Figure 3.4, the dark lines are the median, 25%, and 75% percentiles. The median standard deviation clearly shows a sharp, almost discrete decline in state-level volatility that occurred in approximately 1984–1985, essentially the same date that has been identified as a break date for the Great Moderation. After 1985, however, the median volatility continued to decrease to a low of approximately 0.15 (decimal units for annual growth rates), although a sharp increase is evident at the end of the sample when it returned to the levels of the late 1980s (approximately 0.2). The magnitude of the overall decline in volatility is remarkable, from approximately 0.4 during the 1970s and 1980s to less than 0.2 on average during the 1990s and 2000s.
Spatial correlation. There are, of course, many statistics available for summarizing the comovements of two series, including cross correlations and spectral measures such
as coherence. In this application, a natural starting point is the correlation between the four-quarter growth rates of two state series, computed over a rolling window to allow for time variation. With a small number of series it is possible to display the N(N − 1)/2 pairs of cross-correlations, but this is not practical when N = 50. We therefore draw on the spatial correlation literature for a single summary time series that summarizes the possibly time-varying comovements among these 50 series. Specifically, we use a measure based on Moran's I, applied to a centered 21-quarter rolling window.6 Specifically, the modified Moran's I used here is:
$$\tilde{I}_t = \frac{\sum_{i=1}^{N}\sum_{j=1}^{i-1} \widehat{\mathrm{cov}}(\Delta_4 y_{it}, \Delta_4 y_{jt})\,/\,[N(N-1)/2]}{\sum_{i=1}^{N} \widehat{\mathrm{var}}(\Delta_4 y_{it})\,/\,N} \qquad (2)$$
where $\widehat{\mathrm{cov}}(\Delta_4 y_{it}, \Delta_4 y_{jt}) = \frac{1}{21}\sum_{s=t-10}^{t+10}(\Delta_4 y_{is} - \overline{\Delta_4 y}_{it})(\Delta_4 y_{js} - \overline{\Delta_4 y}_{jt})$, $\widehat{\mathrm{var}}(\Delta_4 y_{it}) = \frac{1}{21}\sum_{s=t-10}^{t+10}(\Delta_4 y_{is} - \overline{\Delta_4 y}_{it})^2$, $\overline{\Delta_4 y}_{it} = \frac{1}{21}\sum_{s=t-10}^{t+10}\Delta_4 y_{is}$, and N = 50.
Fig. 3.4. Four-quarter growth rate of building permits for all 50 states. The dotted lines are the state-level time series; the median, 25%, and 75% percentiles of the 50 growth rates (quarter by quarter) are in solid lines
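Equation (2) is straightforward to compute from a panel of state series. The sketch below is illustrative rather than the authors' code; it assumes `bp` is a quarterly pandas DataFrame with one column per state, forms the four-quarter growth rates of equation (1), and evaluates the average pairwise covariance relative to the average variance over a centered 21-quarter window. (The degrees-of-freedom convention used by `np.cov` differs from the 1/21 normalization in the text, but it cancels in the ratio.)

```python
# Sketch of the rolling spatial-correlation measure in equation (2).
import numpy as np
import pandas as pd

def modified_moran(bp: pd.DataFrame, window: int = 21) -> pd.Series:
    d4y = np.log(bp).diff(4)                          # four-quarter growth, eq. (1)
    half = window // 2
    stats = {}
    for t in range(half, len(d4y) - half):
        X = d4y.iloc[t - half:t + half + 1].dropna(axis=1)   # centered window
        if X.shape[1] < 2:
            continue
        C = np.cov(X.values, rowvar=False)            # covariance matrix over the window
        iu = np.triu_indices(C.shape[0], k=1)
        avg_cov = C[iu].mean()                        # average pairwise covariance
        avg_var = np.diag(C).mean()                   # average own variance
        stats[d4y.index[t]] = avg_cov / avg_var
    return pd.Series(stats, name="I_tilde")
```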
The time series $\tilde{I}_t$ is plotted in Figure 3.6. For the first half of the sample, the spatial correlation was relatively large, approximately 0.5. Since 1985, however, the spatial correlation has been substantially smaller, often less than 0.2 except in the early 1990s and in the very recent collapse of the housing market. Aside from these two periods of national decline in housing construction, the spatial correlation in state building permits seems to have fallen at approximately the same time as did their volatility.
6 Moran's I is a weighted spatial correlation measure. Here we are interested in comovement over time across states.
Fig. 3.5. Rolling standard deviation (centered 21-quarter window) of the four-quarter growth rate of building permits for all 50 states (decimal values). The dotted lines are the state-level rolling standard deviations; the median, 25%, and 75% percentiles of the 50 rolling standard deviations (quarter by quarter) are in solid lines
Fig. 3.6. Rolling average spatial correlation in the four-quarter growth of building permits across states as measured by the modified Moran's I statistic $\tilde{I}_t$
3. The DFM-SV model
This section lays out the dynamic factor model with stochastic volatility (DFM-SV) model, discusses the estimation of its parameters and the computation of the filtered estimates of the state variables, and describes the algorithm for grouping states into regions.
3.1. The dynamic factor model with stochastic volatility
We examine the possibility that state-level building permits have a national component, a regional component, and an idiosyncratic component. Specifically, we model log building permits ($y_{it}$) as following the dynamic factor model,
$$y_{it} = \alpha_i + \lambda_i F_t + \sum_{j=1}^{N_R} \gamma_{ij} R_{jt} + e_{it} \qquad (3)$$
where the national factor $F_t$ and the $N_R$ regional factors $R_{jt}$ follow random walks and the idiosyncratic disturbance $e_{it}$ follows an AR(1):
$$F_t = F_{t-1} + \eta_t \qquad (4)$$
$$R_{jt} = R_{jt-1} + \upsilon_{jt} \qquad (5)$$
$$e_{it} = \rho_i e_{it-1} + \varepsilon_{it}. \qquad (6)$$
The disturbances $\eta_t$, $\upsilon_{jt}$, and $\varepsilon_{it}$ are independently distributed and have stochastic volatility:
$$\eta_t = \sigma_{\eta,t}\,\zeta_{\eta,t} \qquad (7)$$
$$\upsilon_{jt} = \sigma_{\upsilon_j,t}\,\zeta_{\upsilon_j,t} \qquad (8)$$
$$\varepsilon_{it} = \sigma_{\varepsilon_i,t}\,\zeta_{\varepsilon_i,t} \qquad (9)$$
$$\ln\sigma^2_{\eta,t} = \ln\sigma^2_{\eta,t-1} + \nu_{\eta,t} \qquad (10)$$
$$\ln\sigma^2_{\upsilon_j,t} = \ln\sigma^2_{\upsilon_j,t-1} + \nu_{\upsilon_j,t} \qquad (11)$$
$$\ln\sigma^2_{\varepsilon_i,t} = \ln\sigma^2_{\varepsilon_i,t-1} + \nu_{\varepsilon_i,t} \qquad (12)$$
where $\zeta_t = (\zeta_{\eta,t}, \zeta_{\upsilon_1,t}, \ldots, \zeta_{\upsilon_{N_R},t}, \zeta_{\varepsilon_1,t}, \ldots, \zeta_{\varepsilon_N,t})'$ is i.i.d. $N(0, I_{1+N_R+N})$, $\nu_t = (\nu_{\eta,t}, \nu_{\upsilon_1,t}, \ldots, \nu_{\upsilon_{N_R},t}, \nu_{\varepsilon_1,t}, \ldots, \nu_{\varepsilon_N,t})'$ is i.i.d. $N(0, \phi I_{1+N_R+N})$, $\zeta_t$ and $\nu_t$ are independently distributed, and $\phi$ is a scalar parameter.
The factors are identified by restrictions on the factor loadings. The national factor enters all equations, so $\{\lambda_i\}$ is unrestricted. The regional factors are restricted to load only on those variables in a region, so $\gamma_{ij}$ is nonzero if state i is in region j and is zero otherwise; the grouping of states into regions is described below. The scale of the factors is normalized by setting $\lambda'\lambda/N = 1$ and $\gamma_j'\gamma_j/N_{R,j} = 1$, where $\lambda = (\lambda_1, \ldots, \lambda_N)'$, $\gamma_j = (\gamma_{1j}, \ldots, \gamma_{Nj})'$, and $N_{R,j}$ is the number of states in region j. The parameters of the model consist of $\{\alpha_i, \lambda_i, \gamma_{ij}, \rho_i, \phi\}$.7
In the next subsection we discuss estimation of the parameters and states conditional on the grouping of states into regions. We then discuss the regional groupings.
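Equations (3)–(12) define a complete data-generating process, so the model is easy to simulate, which is a useful check on intuition about its state-space structure. The following is a minimal simulation sketch, not the authors' estimation code; the sample size, loadings, region assignment, and the value of φ used here are illustrative.

```python
# Simulate the DFM-SV model of equations (3)-(12) for an artificial panel of states.
import numpy as np

rng = np.random.default_rng(0)
T, N, NR, phi = 156, 50, 5, 0.04
region = rng.integers(0, NR, size=N)           # which regional factor loads on state i
lam = rng.normal(1.0, 0.3, size=N)             # national factor loadings, lambda_i
gam = rng.normal(1.0, 0.5, size=N)             # regional factor loadings, gamma_ij
rho = rng.uniform(0.0, 0.9, size=N)            # idiosyncratic AR(1) coefficients

def sv_path(length, scale, start=-3.0):
    """Log variance follows a random walk, eqs (10)-(12)."""
    return start + np.cumsum(np.sqrt(scale) * rng.standard_normal(length))

ln_s2_F = sv_path(T, phi)
ln_s2_R = np.column_stack([sv_path(T, phi) for _ in range(NR)])
ln_s2_e = np.column_stack([sv_path(T, phi) for _ in range(N)])

F = np.cumsum(np.exp(ln_s2_F / 2) * rng.standard_normal(T))                 # eq (4)
R = np.cumsum(np.exp(ln_s2_R / 2) * rng.standard_normal((T, NR)), axis=0)   # eq (5)
e = np.zeros((T, N))
for t in range(1, T):                                                       # eq (6)
    e[t] = rho * e[t - 1] + np.exp(ln_s2_e[t] / 2) * rng.standard_normal(N)

alpha = rng.normal(8.0, 1.0, size=N)
y = alpha + lam * F[:, None] + gam * R[:, region] + e                       # eq (3)
```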
3.2. Estimation and filtering
Estimation of fixed model coefficients. Estimation was carried out using a two-step process. In the first step, the parameters $\{\alpha_i, \lambda_i, \gamma_{ij}, \rho_i\}$, $i = 1, \ldots, 50$ were estimated by Gaussian maximum likelihood in a model in which the values of $\sigma^2_\eta$, $\sigma^2_{\upsilon_j}$, and $\sigma^2_{\varepsilon_i}$ are allowed to break midway through the sample (1987:IV). The pre- and post-break values of the variances are modeled as unknown constants. This approximation greatly simplifies the likelihood by eliminating the need to integrate out the stochastic volatility. The likelihood is maximized using the EM algorithm described in Engle and Watson (1983). The scale parameter $\phi$ (defined below equation (12)) was set equal to 0.04, a value that we have used previously for univariate models (Stock and Watson, 2007a).
Filtering. Conditioning on the values of $\{\alpha_i, \lambda_i, \gamma_{ij}, \rho_i, \phi\}$, smoothed estimates of the factors and variances $E(F_t, R_{jt}, \sigma^2_{\eta,t}, \sigma^2_{\upsilon_j,t}, \sigma^2_{\varepsilon_i,t} \mid \{y_{i\tau}\}_{i=1,\tau=1}^{50,T})$ were computed using Gibbs sampling. Draws of $(\{F_t, R_{jt}\}_{j=1,t=1}^{N_R,T} \mid \{y_{it}\}_{i=1,t=1}^{50,T}, \{\sigma^2_{\eta,t}, \sigma^2_{\upsilon_j,t}, \sigma^2_{\varepsilon_i,t}\}_{j=1,i=1,t=1}^{N_R,50,T})$ were generated from the relevant multivariate normal density using the algorithm in Carter and Kohn (1994). Draws of $(\{\sigma^2_{\eta,t}, \sigma^2_{\upsilon_j,t}, \sigma^2_{\varepsilon_i,t}\}_{j=1,i=1,t=1}^{N_R,50,T} \mid \{y_{it}\}_{i=1,t=1}^{50,T}, \{F_t, R_{jt}\}_{j=1,t=1}^{N_R,T})$ were obtained using a normal mixture approximation for the distribution of the logarithm of the $\chi^2_1$ random variable ($\ln(\zeta^2)$) and data augmentation as described in Shephard (1994) and Kim, Shephard and Chib (1998) (we used a bivariate normal mixture approximation). The smoothed estimates and their standard deviations were approximated by sample averages from 20,000 Gibbs draws (after discarding 1,000 initial draws). Repeating the simulations using another set of 20,000 independent draws resulted in estimates essentially indistinguishable from the estimates obtained from the first set of draws.
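For readers unfamiliar with the Carter and Kohn (1994) step, the sketch below illustrates the forward-filter, backward-sampling logic on the simplest possible case, a scalar local-level model with known variances. It is only an illustration: the chapter's sampler applies the same recursion to the full vector of national and regional factors, and the variance draws use the Kim, Shephard and Chib (1998) mixture approximation, which is not shown here.

```python
# Forward-filter, backward-sampling for y_t = f_t + e_t, f_t = f_{t-1} + eta_t.
import numpy as np

def ffbs_local_level(y, var_e, var_eta, rng):
    T = len(y)
    a = np.zeros(T)                              # filtered means  E(f_t | y_1..t)
    p = np.zeros(T)                              # filtered variances
    a_prev, p_prev = 0.0, 1e6                    # diffuse initialization
    for t in range(T):                           # Kalman filter forward pass
        p_pred = p_prev + var_eta                # one-step-ahead state variance
        k = p_pred / (p_pred + var_e)            # Kalman gain
        a[t] = a_prev + k * (y[t] - a_prev)
        p[t] = (1.0 - k) * p_pred
        a_prev, p_prev = a[t], p[t]
    f = np.zeros(T)                              # backward sampling pass
    f[-1] = rng.normal(a[-1], np.sqrt(p[-1]))
    for t in range(T - 2, -1, -1):
        g = p[t] / (p[t] + var_eta)
        mean = a[t] + g * (f[t + 1] - a[t])
        var = (1.0 - g) * p[t]
        f[t] = rng.normal(mean, np.sqrt(var))
    return f
```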
3.3. Estimation of housing market regions
In the DFM-SV model, regional variation is independent of national variation, and any regional comovements would be most noticeable after removing the national factor $F_t$.
7 The model (3)–(6) has tightly parameterized dynamics. We also experimented with more loosely parameterized models that allow leads and lags of the factors to enter (3) and allow the factors to follow more general AR processes. The key empirical conclusions reported below were generally unaffected by these changes.
Accordingly, the housing market regions were estimated after removing a single common component associated with the national factor. Our method follows Abraham, Goetzmann, and Wachter (1994) and Crone (2005) by using k-means cluster analysis, except that we apply the k-means procedure after subtracting the contribution of the national factor. Specifically, the first step in estimating the regions used the single-factor model,
$$y_{it} = \alpha_i + \lambda_i F_t + u_{it} \qquad (13)$$
$$F_t = F_{t-1} + \eta_t \qquad (14)$$
$$u_{it} = \rho_{i1} u_{it-1} + \rho_{i2} u_{it-2} + \varepsilon_{it}, \qquad (15)$$
where $(\eta_t, \varepsilon_{1t}, \ldots, \varepsilon_{Nt})$ are independent, normally distributed variables with mean zero and constant variances. Note that in this specification, $u_{it}$ consists of the contribution of the regional factors as well as the idiosyncratic term, see (3). The model (13)–(15) was estimated by maximum likelihood, using as starting values least-squares estimates of the coefficients using the first principal component as an estimator of $F_t$ (Stock and Watson, 2002a). After subtracting out the common component, this produced the residual $\hat{u}_{it} = y_{it} - \hat{\alpha}_i - \hat{\lambda}_i \hat{F}_t$.
The k-means method was then used to estimate the constituents of the clusters. In general, let $\{X_i\}$, $i = 1, \ldots, N$ be T-dimensional vectors and let $\mu_j$ be the mean vector of $X_i$ if i is in cluster j. The k-means method solves
$$\min_{\{\mu_j, S_j\}} \sum_{j=1}^{k} \sum_{i \in S_j} (X_i - \mu_j)'(X_i - \mu_j) \qquad (16)$$
where $S_j$ is the set of indexes contained in cluster j. That is, the k-means method is the least-squares solution to the problem of assigning entity i with data vector $X_i$ to group j.8
8 In the context of the DFM under consideration, the model-consistent objective function would be to assign states to regions so as to maximize the likelihood of the DFM. This is numerically infeasible, however, as each choice of index sets would require estimation of the DFM parameters.
We implemented the k-means cluster method using four-quarter changes in $\hat{u}_{it}$, that is, with $X_i = (\Delta_4\hat{u}_{i5}, \ldots, \Delta_4\hat{u}_{iT})'$. In principle, (16) should be minimized over all possible index sets $S_j$. With 50 states and more than two clusters, however, this is computationally infeasible. We therefore used the following algorithm:
(i) An initial set of k clusters is assigned at random; call this $S^0$.
(ii) The cluster sample means were computed for the grouping $S^0$, yielding the k-vector of means, $\hat{\mu}^0$.
(iii) The distance from each $X_i$ is computed to each element of $\hat{\mu}^0$ and each state i is reassigned to the cluster with the closest mean; call this grouping $S^1$.
(iv) The k cluster means $\hat{\mu}^1$ are computed for the grouping $S^1$, and steps (iii) and (iv) are repeated until there are no switches or until the number of iterations reaches 100.
This algorithm was repeated for multiple random starting values.
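A minimal implementation of steps (i)–(iv), with multiple random starting values, might look as follows. This is a sketch rather than the authors' code; it assumes `u_hat` is a T × 50 array of the de-factored residuals û_it, one column per state, and the number of starting values is illustrative.

```python
# k-means on the four-quarter changes of the de-factored residuals, objective (16).
import numpy as np

def kmeans_regions(u_hat, k=5, n_starts=1000, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = (u_hat[4:] - u_hat[:-4]).T           # one row per state: (Delta4 u_i5,...,Delta4 u_iT)
    n = X.shape[0]
    best_obj, best_labels = np.inf, None
    for _ in range(n_starts):
        labels = rng.integers(0, k, size=n)               # step (i): random initial clusters
        for _ in range(max_iter):
            means = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j)
                               else X[rng.integers(n)]    # reseed an empty cluster
                               for j in range(k)])        # step (ii): cluster means
            d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)                # step (iii): reassign to closest mean
            if np.array_equal(new_labels, labels):        # step (iv): stop when no switches
                break
            labels = new_labels
        obj = d2[np.arange(n), labels].sum()              # value of objective (16)
        if obj < best_obj:
            best_obj, best_labels = obj, labels
    return best_labels, best_obj
```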
We undertook an initial cluster analysis to estimate the number of regions, in which the foregoing algorithm was used with 20,000 random starting values. Moving from two to three clusters reduced the value of the minimized objective function (16) by approximately 10%, as did moving from three to four clusters. The improvements from four to five, and from five to six, were less, and for six clusters the number of states was as few as five in one of the clusters. Absent a statistical theory for estimating the number of clusters, and lacking a persuasive reason for choosing six clusters, we therefore chose k = 5. We then estimated the composition of these five regions using 400,000 random starting values. We found that even after 200,000 starting values there were some improvements in the objective function; however, those improvements were very small and the switches of states in regions involved were few. We then re-estimated the regions for the 1970–1987 and 1988–2007 subsamples, using 200,000 additional random starting values and using the full-sample regional estimates as an additional starting value.
4. Empirical results
4.1. Housing market regions
The resulting estimated regions for the full sample and subsamples are tabulated in Table 3.2 and are shown in Figure 3.7 (full sample), Figure 3.8 (1970–1987), and Figure 3.9 (1988–2007). Perhaps the most striking feature of the full-sample estimates shown in Figure 3.7 is the extent to which the cluster algorithm, which did not impose contiguity, created largely contiguous regions, that is, regions in a traditional sense. Other than Vermont, the Northeast states comprise Region 1, and the Southeast states comprise Region 4,
Fig. 3.7. Estimated housing market regions, 1970–2007
Table 3.2. Estimated composition of housing market regions

State   1970–2007   1970–1987   1988–2007
CT      1           1           1
MA      1           1           1
MD      1           1           2
ME      1           2           1
NH      1           1           1
NJ      1           1           1
NY      1           1           1
PA      1           1           1
RI      1           1           1
CA      2           2           2
ID      2           3           5
IN      2           2           3
MI      2           2           2
NV      2           5           2
OH      2           2           3
OR      2           2           2
SD      2           3           3
WA      2           3           2
WI      2           2           3
IA      3           3           3
IL      3           2           3
KA      3           3           4
MN      3           2           3
MO      3           2           4
ND      3           3           4
NE      3           3           3
DE      4           4           2
FL      4           4           2
GA      4           4           4
HA      4           1           3
KY      4           4           2
MS      4           4           4
NC      4           4           4
SC      4           4           4
TN      4           4           4
VA      4           4           1
VT      4           4           1
WV      4           4           4
AK      5           5           5
AL      5           4           5
AR      5           4           5
AZ      5           5           4
CO      5           5           5
LA      5           5           4
MT      5           3           5
NM      5           5           5
OK      5           5           5
TX      5           5           5
UT      5           3           5
WY      5           5           4
Estimated using k-means cluster analysis after eliminating the effect of the national factor as described in Section 3.3.
excluding Alabama and including Vermont. Region 3 is the Upper Midwest, without South Dakota, and Region 5 consists of the Rocky Mountain and South Central states, plus Alabama, Arkansas and Louisiana. The only region which is geographically dispersed is Region 2, which consists of the entire West Coast but also South Dakota, and the rust belt states. Figures 3.8 and 3.9 indicate that the general location of the regions was stable between the two subsamples, especially the New England and Rocky Mountain/South Central regions. Housing in Florida, Washington, and Nevada evidently behaved more like California in the second sample than in the first, in which they were in other clusters. It is difficult to assess the statistical significance of these changes, and without the guidance of formal tests we are left to our own judgment about whether the groupings appear to be stable. The fact that the objective function is essentially unaltered by some changes in the groupings suggests that there is considerable statistical uncertainty associated with the regional definitions, which in turn suggests that one would expect a fair amount of region switching in a subsample analysis even if the true (unknown population) regions
Fig. 3.8. Estimated housing market regions, 1970–1987
were stably defined. We therefore proceed using five regions with composition that is kept constant over the full sample.
4.2. Results for split-sample estimates of the dynamic factor model
Before estimating the DFM-SV model, we report results from estimation of the dynamic factor model with split-sample estimates of the disturbance variances. This model is given by (3)–(6), where $\eta_t$, $\upsilon_{jt}$, and $\varepsilon_{it}$ are i.i.d. normal. The purpose of this estimation is to
Fig. 3.9. Estimated housing market regions, 1988–2007
examine the stability of the factor loading coefficients and the disturbance variances over the two split subsamples, 1969–1987 and 1988–2007. Accordingly, two sets of estimates were computed. First, the unrestricted split-sample estimates were produced by estimating the model separately by maximum likelihood on the two subsamples, 1969–1987 and 1988–2007. Second, restricted split-sample estimates were computed, where the factor loading coefficients λ and γ and the idiosyncratic autoregressive coefficients ρ were restricted to be constant over the entire sample period, and the variances $\{\sigma^2_\eta, \sigma^2_{\upsilon_j}, \sigma^2_{\varepsilon_i}\}$ were allowed to change between the two subsamples. This restricted split model has the effect of holding the coefficients of the mean dynamics constant but allows for changes in the variances and the relative importance of the factors and idiosyncratic components.
The MLEs for the restricted split-sample model are reported in Table 3.3. The factor loadings are normalized so that $\lambda'\lambda/N = 1$ and $\gamma_j'\gamma_j/N_{R,j} = 1$. The loadings on the national factor are all positive and, for 44 states, are between 0.6 and 1.4. The states with the smallest loadings of the national factor are Hawaii (0.16), Wyoming (0.51), Rhode Island (0.55), and Alaska (0.57). There is considerably more spread on the loadings of the regional factors, and in fact four states have negative regional factor loadings: West Virginia (−1.3), South Carolina (−0.66), Georgia (−0.65), and Mississippi (−0.39). All the states with negative loadings are in Region 4, which suggests either a lack of homogeneity within that region or some intra-region flows in economic activity as these four states see declines in activity associated with gains in Florida and Virginia. The idiosyncratic disturbances exhibit considerable persistence, with a median AR(1) coefficient of 0.71.
The restricted split estimates allow only the disturbance variances to change between samples, and the results in Table 3.3 and Table 3.4 (which presents the restricted split-sample estimates of the standard deviations of the factor innovations) show that nearly all these disturbance variances fall, and none increase. The average change in the idiosyncratic disturbance innovation standard deviation is −0.07, the same as the change in the standard deviation of the national factor innovation. The change in the innovation standard deviations of the regional factors is less, typically −0.03.
Table 3.5 provides a decomposition of the variance of four-quarter growth in building permits, $\Delta_4 y_{it}$, between the two samples. Each column contains two estimates for the column entry, the first from the unrestricted split model and the second from the restricted split model. The first block of columns reports the fraction of the variance of $\Delta_4 y_{it}$ explained by the national factor, regional factor, and idiosyncratic term for the first subsample, and the second block reports these statistics for the second subsample. The final block provides a decomposition of the change in variance of $\Delta_4 y_{it}$ between the two subsamples, attributable to changes in the contributions of the national factor, regional factor, and idiosyncratic term.
Five features of this table are noteworthy. For now, consider the results based on the restricted model (the second of each pair of entries in Table 3.5).
First, in both samples most of the variance in Δ4 yit is attributable to the idiosyncratic component, followed by a substantial contribution of the national factor, followed by a small contribution of the regional factor. For example, in the first sample, the mean partial R2 attributable to the national factor is 36%, to the regional factor is 10%, and to the state idiosyncratic disturbance is 54%. There is, however, considerable heterogeneity
Table 3.3. Maximum likelihood estimates, restricted split-sample estimation

State   Region   λ       γ        ρ        σε (69–87)   σε (88–07)
CT      1        0.90    1.38     −0.04    0.09         0.07
MA      1        0.91    1.21     0.47     0.15         0.06
MD      1        0.78    0.70     0.79     0.13         0.11
ME      1        1.00    0.67     0.86     0.20         0.09
NH      1        1.16    1.04     0.78     0.23         0.11
NJ      1        1.08    1.13     0.64     0.12         0.10
NY      1        0.86    0.55     0.83     0.18         0.10
PA      1        0.74    0.60     0.76     0.13         0.07
RI      1        0.55    1.30     0.48     0.26         0.12
CA      2        1.02    0.45     0.97     0.12         0.08
ID      2        1.07    0.53     0.91     0.28         0.10
IN      2        1.02    1.01     0.42     0.13         0.08
MI      2        1.23    1.89     0.92     0.11         0.06
NV      2        1.31    0.11     0.84     0.22         0.19
OH      2        1.11    0.93     0.89     0.11         0.05
OR      2        0.69    1.10     0.84     0.16         0.13
SD      2        1.14    0.70     0.63     0.25         0.22
WA      2        0.68    0.68     0.79     0.13         0.10
WI      2        0.99    1.38     0.07     0.07         0.05
IA      3        1.23    1.58     −0.17    0.11         0.08
IL      3        1.55    1.03     0.90     0.14         0.06
KA      3        0.83    0.42     0.55     0.22         0.12
MN      3        1.24    0.73     0.91     0.17         0.08
MO      3        1.01    0.26     0.77     0.15         0.09
ND      3        1.02    1.32     0.63     0.25         0.23
NE      3        1.12    0.96     0.37     0.16         0.15
DE      4        1.09    1.00     0.68     0.29         0.11
FL      4        0.83    0.95     0.93     0.13         0.07
GA      4        1.21    −0.65    0.94     0.10         0.07
HA      4        0.16    1.11     0.71     0.32         0.26
KY      4        1.04    0.23     0.50     0.23         0.10
MS      4        0.92    −0.39    0.70     0.22         0.13
NC      4        1.07    0.24     0.91     0.15         0.06
SC      4        0.79    −0.66    0.83     0.12         0.08
TN      4        1.18    0.24     0.70     0.14         0.07
VA      4        1.13    1.56     −0.18    0.07         0.04
VT      4        0.90    1.88     0.71     0.30         0.15
WV      4        0.93    −1.30    0.29     0.39         0.12
AK      5        0.57    1.42     0.69     0.36         0.24
AL      5        1.02    0.37     0.40     0.20         0.10
AR      5        1.09    0.18     0.33     0.15         0.13
AZ      5        1.41    0.39     0.68     0.18         0.09
CO      5        1.10    0.86     0.83     0.11         0.11
LA      5        0.90    1.34     0.06     0.11         0.11
MT      5        1.10    0.88     0.79     0.33         0.17
NM      5        1.00    0.38     0.63     0.23         0.12
OK      5        0.72    1.51     0.42     0.15         0.12
TX      5        0.76    1.29     0.96     0.08         0.06
UT      5        0.89    0.59     0.90     0.17         0.10
WY      5        0.51    1.39     0.83     0.31         0.19
Estimates are restricted split-sample MLEs of the dynamic factor model in Section 3.3, with innovation variances that are constant over each sample but differ between samples.
behind these averages, for example in the first period the partial R2 attributable to the national factor ranges from 0% to 67%. The states with 5% or less of the variance explained by the national factor in both periods are Hawaii, Wyoming, and Alaska. The states with 45% or more of the variance explained by the national factor in both periods are Georgia, Wisconsin, Illinois, Arizona, Ohio, Tennessee, Virginia, and North Carolina. Second, the importance of the national factor to state-level fluctuations falls from the first sample to the second: the median partial R2 in the first period is 0.37 and in the second period is 0.23. The contribution of the regional factor is approximately unchanged, and the contribution of the state-specific disturbance increases for most states. Third, all states experienced a reduction in the volatility of Δ4 yit , and for most states that reduction was large. The variance reductions ranged from 35% (Hawaii) to 88% (West Virginia), with a median reduction of 72%. This reduction in variance is, on average, attributable equally to a reduction in the volatility of the contribution of the national factor and a reduction in the volatility of the idiosyncratic disturbance; on average, the regional factor makes only a small contribution to the reduction in volatility.
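The partial R² figures discussed above follow from the variance decomposition implied by the model: because the factors and the idiosyncratic term are mutually independent, the variance of Δ4 y_it is the sum of the three contributions. The following is a minimal sketch, not the authors' code; it assumes estimated factor series F (length T), regional factors R (T × NR), idiosyncratic terms e (T × N), loadings lam and gam, and the region assignment are available.

```python
# Share of Var(Delta4 y_it) attributable to the national factor, regional factor,
# and idiosyncratic term, as in the partial R^2 columns of Table 3.5.
import numpy as np

def variance_shares(F, R, e, lam, gam, region):
    def d4(x):
        return x[4:] - x[:-4]                     # four-quarter change
    vF = d4(F).var()
    vR = d4(R).var(axis=0)
    ve = d4(e).var(axis=0)
    contrib_F = lam ** 2 * vF                     # national-factor contribution
    contrib_R = gam ** 2 * vR[region]             # regional-factor contribution
    total = contrib_F + contrib_R + ve            # Var(Delta4 y) under independence
    return np.column_stack([contrib_F, contrib_R, ve]) / total[:, None]
```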
Table 3.4. Restricted split-sample estimates of the standard deviation of factor shocks for the national and regional factors

                   1969–1987   1988–2007   Change
National Factor    0.12        0.05        −0.07
Region 1           0.06        0.05        −0.01
Region 2           0.06        0.03        −0.03
Region 3           0.09        0.03        −0.06
Region 4           0.03        0.03        0.00
Region 5           0.07        0.04        −0.03
CT MA MD ME NH NJ NY PA RI CA ID IN MI NV OH OR SD WA WI IA IL KA MN MO ND NE DE FL GA
1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4
0.28 0.30 0.29 0.39 0.40 0.28 0.30 0.30 0.42 0.34 0.53 0.34 0.43 0.45 0.36 0.36 0.46 0.31 0.31 0.40 0.52 0.40 0.34 0.40 0.58 0.43 0.47 0.32 0.35
0.29 0.35 0.30 0.45 0.51 0.35 0.40 0.30 0.45 0.35 0.59 0.34 0.42 0.51 0.35 0.36 0.50 0.30 0.31 0.44 0.50 0.41 0.45 0.37 0.54 0.40 0.56 0.33 0.35
σ
0.63 0.61 0.58 0.35 0.58 0.71 0.47 0.43 0.23 0.69 0.15 0.59 0.66 0.30 0.60 0.23 0.18 0.34 0.65 0.42 0.68 0.30 0.49 0.76 0.29 0.43 0.20 0.30 0.63
0.53 0.39 0.37 0.28 0.29 0.55 0.26 0.34 0.08 0.49 0.18 0.51 0.47 0.38 0.56 0.21 0.30 0.30 0.59 0.44 0.55 0.23 0.42 0.43 0.20 0.45 0.21 0.36 0.67
R2 − F
0.03 0.07 0.02 0.00 0.01 0.01 0.08 0.01 0.00 0.04 0.11 0.10 0.13 0.01 0.11 0.26 0.11 0.08 0.21 0.41 0.17 0.05 0.04 0.07 0.20 0.21 0.06 0.09 0.03
0.27 0.15 0.07 0.03 0.05 0.13 0.02 0.05 0.10 0.02 0.01 0.13 0.30 0.00 0.10 0.14 0.03 0.08 0.31 0.43 0.15 0.03 0.09 0.02 0.20 0.20 0.01 0.02 0.01
R2 − R
1969–1987
0.34 0.33 0.40 0.65 0.40 0.28 0.45 0.55 0.77 0.27 0.74 0.31 0.21 0.69 0.29 0.51 0.72 0.58 0.14 0.17 0.15 0.65 0.47 0.18 0.51 0.36 0.74 0.60 0.34
0.20 0.45 0.56 0.70 0.66 0.32 0.71 0.61 0.81 0.49 0.81 0.36 0.23 0.62 0.34 0.65 0.67 0.63 0.10 0.14 0.30 0.74 0.49 0.56 0.60 0.35 0.78 0.61 0.32
R2 − e 0.22 0.18 0.18 0.24 0.24 0.24 0.24 0.14 0.23 0.19 0.22 0.17 0.18 0.29 0.13 0.21 0.32 0.16 0.15 0.20 0.16 0.20 0.22 0.16 0.34 0.22 0.24 0.17 0.19
0.20 0.18 0.22 0.21 0.25 0.24 0.21 0.16 0.24 0.20 0.23 0.17 0.21 0.37 0.16 0.26 0.40 0.21 0.15 0.20 0.20 0.22 0.20 0.19 0.41 0.26 0.23 0.16 0.19
σ 0.46 0.52 0.21 0.28 0.34 0.33 0.26 0.40 0.23 0.30 0.23 0.46 0.55 0.18 0.51 0.12 0.12 0.04 0.41 0.32 0.70 0.21 0.56 0.44 0.08 0.14 0.24 0.33 0.34
0.23 0.27 0.13 0.25 0.23 0.23 0.18 0.23 0.06 0.29 0.24 0.40 0.37 0.14 0.51 0.08 0.09 0.12 0.46 0.40 0.62 0.16 0.41 0.32 0.07 0.21 0.25 0.27 0.45
R2 − F 0.15 0.15 0.23 0.29 0.01 0.03 0.13 0.16 0.22 0.01 0.01 0.06 0.33 0.00 0.10 0.00 0.03 0.00 0.08 0.00 0.13 0.01 0.04 0.05 0.00 0.03 0.01 0.01 0.44
0.51 0.45 0.10 0.11 0.17 0.24 0.07 0.15 0.30 0.02 0.02 0.13 0.31 0.00 0.12 0.07 0.01 0.04 0.31 0.27 0.12 0.02 0.06 0.01 0.05 0.06 0.08 0.13 0.05
R2 − R
1988–2007
0.39 0.33 0.56 0.43 0.65 0.64 0.60 0.44 0.54 0.69 0.76 0.48 0.13 0.82 0.39 0.88 0.85 0.96 0.51 0.68 0.17 0.78 0.40 0.51 0.92 0.84 0.76 0.66 0.22
0.27 0.28 0.77 0.64 0.60 0.54 0.75 0.62 0.64 0.69 0.74 0.47 0.32 0.86 0.37 0.86 0.90 0.84 0.24 0.33 0.26 0.83 0.53 0.67 0.88 0.73 0.67 0.59 0.50
R2 − e −0.38 −0.63 −0.60 −0.63 −0.64 −0.30 −0.38 −0.78 −0.69 −0.68 −0.83 −0.76 −0.82 −0.60 −0.87 −0.64 −0.53 −0.75 −0.75 −0.74 −0.91 −0.75 −0.59 −0.83 −0.64 −0.73 −0.73 −0.72 −0.69
−0.55 −0.72 −0.45 −0.79 −0.75 −0.54 −0.72 −0.72 −0.71 −0.68 −0.85 −0.75 −0.76 −0.47 −0.79 −0.46 −0.37 −0.52 −0.75 −0.79 −0.83 −0.73 −0.80 −0.74 −0.43 −0.58 −0.84 −0.75 −0.71
Total
−0.34 −0.41 −0.49 −0.25 −0.46 −0.47 −0.31 −0.35 −0.16 −0.60 −0.11 −0.48 −0.56 −0.23 −0.53 −0.19 −0.12 −0.33 −0.55 −0.34 −0.62 −0.25 −0.26 −0.68 −0.27 −0.39 −0.14 −0.21 −0.52
F −0.43 −0.32 −0.30 −0.22 −0.24 −0.44 −0.21 −0.28 −0.07 −0.39 −0.15 −0.41 −0.38 −0.31 −0.45 −0.17 −0.24 −0.24 −0.48 −0.35 −0.45 −0.18 −0.34 −0.35 −0.16 −0.37 −0.17 −0.29 −0.54 0.06 −0.01 0.07 0.10 −0.01 0.01 0.00 0.02 0.07 −0.04 −0.11 −0.09 −0.07 −0.01 −0.10 −0.26 −0.09 −0.08 −0.19 −0.41 −0.16 −0.05 −0.02 −0.06 −0.20 −0.20 −0.06 −0.09 0.11
−0.05 −0.03 −0.01 0.00 −0.01 −0.02 0.00 −0.01 −0.02 −0.02 −0.01 −0.10 −0.22 0.00 −0.08 −0.11 −0.02 −0.06 −0.23 −0.37 −0.13 −0.03 −0.07 −0.01 −0.17 −0.17 0.00 0.01 0.00
R
−0.10 −0.21 −0.17 −0.49 −0.17 0.17 −0.08 −0.46 −0.60 −0.05 −0.61 −0.19 −0.19 −0.36 −0.24 −0.19 −0.32 −0.34 −0.01 0.00 −0.13 −0.46 −0.31 −0.09 −0.18 −0.14 −0.53 −0.42 −0.28
−0.08 −0.37 −0.14 −0.56 −0.51 −0.07 −0.51 −0.44 −0.63 −0.27 −0.69 −0.24 −0.15 −0.16 −0.26 −0.19 −0.11 −0.22 −0.04 −0.07 −0.25 −0.51 −0.39 −0.38 −0.09 −0.04 −0.67 −0.46 −0.18 (cont.)
e
Decomposition of (Var69−87 − Var88−07 )/Var88−07
Variance decompositions for four-quarter growth in state building permits (Δ4 yit ) based on unrestricted and restricted split-sample estimation of the dynamic factor model, 1969–1987 and 1988–2007
Table 3.5.
(Continued )
Table 3.5.
1969–1987
σ 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5
0.52 0.42 0.37 0.33 0.27 0.34 0.28 0.55 0.59 0.67 0.39 0.35 0.43 0.37 0.36 0.60 0.49 0.42 0.31 0.38 0.58
0.56 0.44 0.44 0.38 0.29 0.37 0.30 0.58 0.62 0.67 0.39 0.35 0.46 0.35 0.33 0.66 0.46 0.36 0.30 0.40 0.61
0.00 0.18 0.13 0.27 0.33 0.59 0.60 0.07 0.00 0.01 0.26 0.41 0.50 0.61 0.44 0.05 0.30 0.31 0.38 0.28 0.00
0.00 0.32 0.25 0.45 0.43 0.57 0.81 0.14 0.13 0.04 0.39 0.56 0.53 0.57 0.44 0.16 0.27 0.23 0.37 0.29 0.04
R2 − R 0.12 0.16 0.61 0.21 0.13 0.17 0.10 0.00 0.04 0.08 0.04 0.07 0.05 0.23 0.35 0.07 0.04 0.35 0.35 0.05 0.12
0.01 0.00 0.00 0.00 0.02 0.00 0.08 0.03 0.01 0.09 0.02 0.01 0.01 0.12 0.33 0.04 0.01 0.36 0.37 0.04 0.10
R2 − e 0.88 0.66 0.26 0.52 0.54 0.23 0.30 0.93 0.96 0.90 0.70 0.52 0.45 0.16 0.21 0.87 0.66 0.34 0.27 0.67 0.88
0.98 0.68 0.75 0.55 0.56 0.43 0.11 0.83 0.86 0.87 0.59 0.44 0.46 0.31 0.23 0.81 0.72 0.41 0.26 0.67 0.86
σ 0.45 0.19 0.23 0.13 0.17 0.16 0.18 0.34 0.26 0.37 0.16 0.20 0.21 0.23 0.18 0.31 0.19 0.20 0.16 0.20 0.25
0.45 0.19 0.25 0.17 0.18 0.18 0.16 0.29 0.22 0.43 0.19 0.22 0.21 0.25 0.20 0.33 0.23 0.22 0.17 0.21 0.37
R2 − F 0.01 0.51 0.16 0.50 0.21 0.43 0.45 0.20 0.20 0.10 0.09 0.16 0.68 0.06 0.03 0.04 0.32 0.13 0.13 0.11 0.05
0.00 0.31 0.14 0.45 0.21 0.49 0.51 0.10 0.20 0.02 0.33 0.26 0.49 0.22 0.21 0.12 0.21 0.12 0.22 0.20 0.02
R2 − R 0.00 0.02 0.12 0.16 0.22 0.01 0.02 0.07 0.07 0.06 0.18 0.04 0.02 0.18 0.41 0.18 0.02 0.19 0.49 0.19 0.31
0.02 0.01 0.01 0.01 0.05 0.01 0.35 0.16 0.14 0.06 0.02 0.00 0.02 0.06 0.23 0.04 0.01 0.25 0.31 0.04 0.08
R2 − e 0.99 0.47 0.72 0.34 0.57 0.57 0.53 0.73 0.73 0.85 0.73 0.80 0.30 0.76 0.56 0.78 0.66 0.68 0.38 0.69 0.64
0.97 0.68 0.85 0.54 0.74 0.51 0.13 0.74 0.65 0.92 0.65 0.73 0.49 0.72 0.56 0.84 0.78 0.64 0.47 0.76 0.90
Total
−0.24 −0.80 −0.61 −0.85 −0.62 −0.77 −0.57 −0.61 −0.81 −0.71 −0.82 −0.66 −0.76 −0.60 −0.73 −0.73 −0.85 −0.78 −0.73 −0.72 −0.81
−0.35 −0.80 −0.67 −0.81 −0.61 −0.78 −0.70 −0.74 −0.88 −0.59 −0.77 −0.60 −0.79 −0.50 −0.60 −0.75 −0.75 −0.61 −0.68 −0.72 −0.64
F 0.01 −0.08 −0.07 −0.19 −0.25 −0.50 −0.40 0.01 0.04 0.01 −0.24 −0.36 −0.34 −0.59 −0.43 −0.04 −0.25 −0.28 −0.35 −0.24 0.01
R 0.00 −0.26 −0.20 −0.36 −0.34 −0.46 −0.65 −0.11 −0.10 −0.03 −0.32 −0.45 −0.43 −0.46 −0.35 −0.13 −0.22 −0.19 −0.30 −0.23 −0.03
−0.12 −0.15 −0.57 −0.18 −0.05 −0.17 −0.09 0.02 −0.03 −0.07 −0.01 −0.05 −0.04 −0.16 −0.24 −0.02 −0.03 −0.31 −0.22 0.00 −0.06
0.00 0.00 0.00 0.00 0.01 0.00 0.03 0.01 0.00 −0.07 −0.01 0.00 −0.01 −0.09 −0.25 −0.03 −0.01 −0.26 −0.27 −0.03 −0.07
e −0.13 −0.57 0.02 −0.47 −0.32 −0.10 −0.07 −0.64 −0.82 −0.65 −0.57 −0.25 −0.38 0.15 −0.06 −0.66 −0.56 −0.18 −0.16 −0.48 −0.76
−0.36 −0.54 −0.47 −0.45 −0.27 −0.32 −0.07 −0.64 −0.78 −0.49 −0.45 −0.14 −0.36 0.05 0.00 −0.60 −0.52 −0.17 −0.11 −0.46 −0.54
Mean
0.40 0.42 0.38 0.36 0.12 0.10 0.49 0.54 0.22 0.23 0.28 0.25 0.11 0.12 0.61 0.63
−0.69 −0.68
−0.30 −0.29
−0.09 −0.06
−0.30 −0.33
0.10 0.25 0.50 0.75 0.90
0.29 0.33 0.38 0.45 0.55
−0.83 −0.78 −0.73 −0.62 −0.57
−0.56 −0.46 −0.31 −0.19 −0.04
−0.24 −0.23 −0.16 −0.08 −0.06 −0.02 −0.01 0.00 0.02 0.00
−0.64 −0.48 −0.25 −0.13 −0.05
0.30 0.35 0.40 0.50 0.58
0.05 0.23 0.35 0.59 0.65
0.13 0.23 0.37 0.49 0.56
0.01 0.04 0.08 0.17 0.26
0.00 0.01 0.04 0.13 0.30
0.18 0.29 0.47 0.67 0.87
0.23 0.35 0.56 0.71 0.81
0.16 0.17 0.20 0.24 0.31
0.16 0.19 0.21 0.25 0.37
0.05 0.13 0.23 0.43 0.51
0.07 0.14 0.23 0.33 0.46
0.00 0.01 0.06 0.18 0.29
0.01 0.02 0.06 0.16 0.31
0.33 0.47 0.64 0.76 0.85
0.28 0.51 0.65 0.77 0.86
−0.81 −0.77 −0.72 −0.60 −0.47
−0.46 −0.39 −0.30 −0.19 −0.11
−0.63 −0.51 −0.36 −0.14 −0.07
The first entry in each cell is computed using the unrestricted split-sample estimates of the dynamic factor model; the second entry is computed using restricted split-sample estimates for which the factor loadings and idiosyncratic autoregressive coefficients are restricted to equal their full-sample values. The first numeric column is the region of the state. The next block of columns contains the standard deviation of Δ4 yit over 1969–1987 and the fraction of the variance attributable to the national factor F , the regional factor R, and the idiosyncratic disturbance e. The next block provides the same statistics for 1988–2007. The final block decomposes the relative change in the variance from the first to the second period as the sum of changes in the contribution of F , R, and e; for each state, the sum of the final three columns equals the Total column up to rounding.
The evolution of national and regional factors in US housing construction
HA KY MS NC SC TN VA VT WV AK AL AR AZ CO LA MT NM OK TX UT WY
R2 − F
Decomposition of (Var69−87 − Var88−07 )/Var88−07
1988–2007
Fourth, the summary statistics based on the restricted and unrestricted split-sample estimation results are similar. For example, the median estimated R² explained by the national factor in the first period (numeric column 3) is 0.38 using the unrestricted estimates and 0.36 using the restricted estimates. Similarly, the median fractional change in the variance between the first and the second sample attributed to a reduction in the contribution of the national factor (numeric column 11) is 0.31 for the unrestricted estimates and 0.30 for the restricted estimates. These comparisons indicate that little is lost, at least on average, by modeling the factor loadings and autoregressive coefficients as constant across the two samples and allowing only the variances to change.9 Moreover, inspection of Table 3.5 reveals that the foregoing conclusions based on the restricted estimates also follow from the unrestricted estimates.
9 The restricted and unrestricted split-sample log-likelihoods differ by 280 points, with 194 additional parameters in the unrestricted model. However, it would be heroic to rely on a chi-squared asymptotic distribution of the likelihood ratio statistic for inference with this many parameters.
4.3. Results for the DFM-SV model
We now turn to the results based on the DFM-SV model. As discussed in Section 3.2, the parameters λ, γ, and ρ are fixed at the full-sample MLEs, and the filtered estimates of the factors and their time-varying variances were computed numerically.
National and regional factors. The four-quarter growth of the estimated national factor from the DFM-SV model, $\Delta_4\hat{F}_t$, is plotted in Figure 3.10 along with three other measures of national movements in building permits: the first principal component of the 50 series $\Delta_4 y_{1t}, \ldots, \Delta_4 y_{50t}$; the average state four-quarter growth rate, $\frac{1}{50}\sum_{i=1}^{50}\Delta_4 y_{it}$; and the four-quarter growth rate of total national building permits, $\ln(BP_t/BP_{t-4})$, where $BP_t = \sum_{i=1}^{50} BP_{it}$. The first principal component is an estimator of the four-quarter growth rate of the national factor in a single-factor model (Stock and Watson, 2002a), as is the average of the state-level four-quarter growth rates under the assumption that the average population factor loading for the national factor is nonzero (Forni and Reichlin, 1998). The fourth series plotted, the four-quarter growth rate of national aggregate building permits, does not have an interpretation as an estimate of the factor in a single-factor version of the DFM specified in logarithms because the factor model is specified in logarithms at the state level. As is clear from Figure 3.10, the three estimates of the factor (the DFM-SV estimate, the first principal component, and the average of the state-level growth rates) yield very nearly the same estimated four-quarter growth of the national factor. These in turn are close to the growth rate of national building permits; however, there are some discrepancies between the national permits and the estimates of the national factor, particularly in 1974, 1990, and 2007. Like national building permits and consistent with the split-sample analysis, the four-quarter growth rate of the national factor shows a marked reduction in volatility after 1985.
Figure 3.11 presents the four-quarter growth rates of the national and five regional factors, along with ±1 standard deviation bands, where the standard deviation bands represent filtering uncertainty but not parameter estimation uncertainty (as discussed in Section 3.2). The regional factors show substantial variations across regions, for example the housing slowdown in the mid-1980s in the South Central (Region 5) and the slowdown
Fig. 3.10. Comparison of DFM-SV filtered estimate of the national factor (solid line) to the first principal component of the 50 state series, total US building permits, and the average of the state-level building permit growth rates, all computed using four-quarter growth rates
in the late-1980s in the Northeast (Region 1) are both visible in the regional factors, and these slowdowns do not appear in other regions. Figure 3.12 takes a closer look at the pattern of volatility in the national and regional factors by reporting the estimated instantaneous standard deviation of the factor innovations. The estimated volatility of the national factor falls markedly over the middle of the sample, as does the volatility for Region 3 (the Upper Midwest). However the pattern of volatility changes for regions other than 3 is more complicated; in fact, there is evidence of a volatility peak in the 1980s in regions 1, 2, 4, and 5. This suggests that the DFM-SV model attributes the common aspect of the decline in volatility of state building permits over the sample to a decline in the volatility of the national factor. Figure 3.13 uses the DFM-SV estimates to compute statistics analogous to those from the split-sample analysis of Section 4.2, specifically, state-by-state instantaneous estimates of the standard deviation of the innovation to the idiosyncratic disturbance and the partial R2 attributable to the national and regional factors and to the idiosyncratic disturbance. The conclusions are consistent with those reached by the examination of the split-sample results in Table 3.5. Specifically, for a typical state the fraction of the state-level variance of Δ4 yit explained by the national factor has declined over time, the fraction attributable to the idiosyncratic disturbance has increased, and the fraction attributable to the regional factor has remained approximately constant. In addition, the volatility of the idiosyncratic disturbance has decreased over time.
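Two of the simple national-factor estimates compared in Figure 3.10, the cross-state average and the first principal component of the four-quarter growth rates, are easy to construct. The sketch below is illustrative, not the authors' code; it assumes `bp` is a quarterly pandas DataFrame of state building permits, and the sign and scale of the principal component are normalized only loosely.

```python
# Simple national-factor estimates: average growth rate and first principal component.
import numpy as np
import pandas as pd

d4y = np.log(bp).diff(4).dropna()              # four-quarter growth rates, eq. (1)
avg_growth = d4y.mean(axis=1)                  # cross-state average growth rate

Z = (d4y - d4y.mean()).values                  # demean each state series
_, _, vt = np.linalg.svd(Z, full_matrices=False)
pc1 = pd.Series(Z @ vt[0], index=d4y.index)    # first principal component score
pc1 *= np.sign(np.corrcoef(pc1, avg_growth)[0, 1])   # resolve the arbitrary sign
```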
Fig. 3.11. Four-quarter decimal growth of the filtered estimates of the national factor (first panel) and the five regional factors from the DFM-SV model, and ±1 standard deviation bands (dotted lines)
Fig. 3.12. DFM-SV estimates of the instantaneous standard deviations of the innovations to the national and regional factors, with ±1 standard deviation bands (dotted lines)
Fig. 3.13. DFM-SV estimates of the evolution of the state-level factor model: the standard deviation of the idiosyncratic innovation (upper left) and the partial R² from the national factor (upper right), the regional factor (lower left), and the idiosyncratic term (lower right). Shown are the 10%, 25%, 50%, 75%, and 90% percentiles across states, evaluated quarter by quarter
This said, the patterns in Figure 3.13 suggest some nuances that the split-sample analysis masks. Notably, the idiosyncratic standard deviation declines at a nearly constant rate over this period, and does not appear to be well characterized as having a single break. The volatility of the regional factor does not appear to be constant, and instead increases substantially for many states in the 1980s. Also, the importance of the national factor has fluctuated over time: it was greatest during the recessions of the late 70s/early 80s, but in the early 1970s the contribution of the national factor was essentially the same as in 2007. For this partial R² associated with the national factor, the pattern that emerges is less one of a sharp break than of a slow evolution.
5. Discussion and conclusions
The empirical results in Section 4 suggest five main findings that bear on the issues, laid out in the introduction, about the relationship between state-level volatility in housing construction and the Great Moderation in overall US economic activity.
First, there has been a large reduction in the volatility of state-level housing construction, with the state-level variance of the four-quarter growth in building permits falling by between 35% and 88% from the period 1970–1987 to the period 1988–2007, with a median decline of 72%.
Second, according to the estimates from the state building permit DFM-SV model, there was a substantial decline in the volatility of the national factor, and this decline occurred sharply in the mid-1980s. On average, this reduction in the volatility of the national factor accounted for one-half of the reduction in the variance of four-quarter growth in state building permits.
Third, there is evidence of regional organization of housing markets and, intriguingly, the cluster analytic methods we used to estimate the composition of the regions resulted in five conventionally identifiable regions – the Northeast, Southeast, Upper Midwest, Rockies, and West Coast – even though no constraints were imposed requiring the estimated regions to be contiguous. The regional factors, however, explain only a modest amount of state-level fluctuations in building permits, and the regional factors show no systematic decline in volatility; if anything, they exhibit a peak in volatility in the mid-1980s.
Fourth, there has been a steady decline in the volatility of the idiosyncratic component of state building permits over the period 1970–2007. The smooth pattern of this decline is different than that for macroeconomic aggregates or for the national factor, which exhibit striking declines in volatility in the mid-1980s.
Taken together, these findings are consistent with the view, outlined in the introduction, that the development of financial markets played an important role in the Great Moderation: less cyclically sensitive access to credit coincided with a decline in the volatility of the national factor in building permits, which in turn led to declines in the volatility of state housing construction. The timing of the decline in the volatility of the national housing factor coincides with the harmonization of mortgage rates across regions in Figure 3.2, the mid-1980s. We emphasize that the evidence here is reduced-form, and the moderation of the national factor presumably reflects many influences, including moderation in the volatility of income. Sorting out these multiple influences would require augmenting the state building permits data set developed here with other data, such as state-level incomes.
4
Modeling UK Inflation Uncertainty, 1958–2006
Gianna Boero, Jeremy Smith, and Kenneth F. Wallis
1. Introduction
Introducing the autoregressive conditional heteroskedastic (ARCH) process in his celebrated article in Econometrica in July 1982, Robert Engle observed that the ARCH regression model “has a variety of characteristics which make it attractive for econometric applications” (p. 989). He noted in particular that “econometric forecasters have found that their ability to predict the future varies from one period to another”, citing the recognition by McNees (1979, p. 52) that “the inherent uncertainty or randomness associated with different forecast periods seems to vary widely over time”, and McNees’s finding that “the ‘large’ and ‘small’ errors tend to cluster together” (p. 49). McNees had examined the track record of the quarterly macroeconomic forecasts published by five forecasting groups in the United States over the 1970s. He found that, for inflation, the median one-year-ahead forecast persistently underpredicted the annual inflation rate from mid-1972 to mid-1975, with the absolute forecast error exceeding four percentage points for five successive quarters in this period; outside this period forecast errors were more moderate, and changed sign from time to time, though serial correlation remained.
Engle’s article presented an application of the ARCH regression model to inflation in the United Kingdom over the period 1958–1977, which included the inflationary explosion of 1974–1975, the magnitude of which had likewise been unanticipated by UK forecasters (Wallis, 1989). In both countries this “Great Inflation” is now seen as an exceptional episode, and the transition to the “Great Moderation” has been much studied in recent years. How this has interacted with developments in the analysis of inflation volatility and the treatment of inflation forecast uncertainty is the subject of this chapter.
The quarter-century since the publication of ARCH has seen widespread application in macroeconomics of the basic model and its various extensions – GARCH, GARCH-M, EGARCH . . . – not to mention the proliferation of applications in finance of these and related models under the heading of stochastic volatility, the precursors of which predate
ARCH (Shephard, 2008). There has also been substantial development in the measurement and reporting of inflation forecast uncertainty (Wallis, 2008). Since 1996 the National Institute of Economic and Social Research (NIESR) and the Bank of England have published not only point forecasts but also density forecasts of UK inflation, the latter in the form of the famous fan chart. Simultaneously in 1996 the Bank initiated its Survey of External Forecasters, analogous to the long-running US Survey of Professional Forecasters; based on the responses it publishes quarterly survey average density forecasts of inflation in its Inflation Report. Finally the last quarter-century has seen substantial development of the econometrics of structural breaks and regime switches, perhaps driven by and certainly relevant to the macroeconomic experience of the period. These methods have been applied in a range of models to document the decline in persistence and volatility of key macroeconomic aggregates in the United States, where the main break is usually located in the early 1980s. Interpretation has been less straightforward, however, especially with respect to inflation, as “it has proved hard to reach agreement on what monetary regimes were in place in the US and indeed whether there was ever any change at all (except briefly at the start of the 1980s with the experiment in the control of bank reserves)” (Meenagh, Minford, Nowell, Sofat and Srinivasan, 2009). Although the corresponding UK literature is smaller in volume, it has the advantage that the various changes in policy towards inflation are well documented, which Meenagh et al. and other authors have been able to exploit. Using models in this way accords with the earlier view of Nerlove (1965), while studying econometric models of the UK economy, that model building, in addition to the traditional purposes of forecasting and policy analysis, can be described as a way of writing economic history. The modeling approach and the traditional approach to economic history each have limitations, but a judicious blend of the two can be beneficial. At the same time there can be tensions between the ex post and ex ante uses of the model, as discussed below. The rest of this chapter is organized as follows. Section 2 contains a brief review of UK inflationary experience and the associated policy environment(s), 1958–2006, in the light of the literature alluded to in the previous paragraph. Section 3 returns to Engle’s original ARCH regression model, and examines its behavior over the extended period. Section 4 turns to a fuller investigation of the nature of the nonstationarity of inflation, preferring a model with structural breaks, stationary within subperiods. Section 5 considers a range of measures of inflation forecast uncertainty, from these models and other UK sources. Section 6 considers the association between uncertainty and the level of inflation, first mooted in Milton Friedman’s Nobel lecture. Section 7 concludes.
2. UK inflation and the policy environment Measures of inflation based on the Retail Prices Index (RPI) are plotted in Figure 4.1, using quarterly data, 1958–2006. We believe that this is the price index used by Engle (1982a), although the internationally more standard term, “consumer price index”, is used in his text; in common with most time-series econometricians, he defined inflation as the first difference of the log of the quarterly index. In 1975 mortgage interest payments were introduced into the RPI to represent owner-occupiers’ housing costs, replacing a rental equivalent approach, and a variant index excluding mortgage interest payments
Fig. 4.1(a). UK RPI inflation 1958:1–2006:4 (percentage points of annual inflation), Δ1 pt
Fig. 4.1(b). UK RPI inflation 1958:1–2006:4 (percentage points of annual inflation), Δ4 pt
(RPIX) also came into use. This became the explicit target of the inflation targeting policy initiated in October 1992, as it removed a component of the all-items RPI that reflected movements in the policy instrument. In December 2003 the official target was changed to the Harmonised Index of Consumer Prices, constructed on principles harmonized across member countries of the European Union and promptly relabeled CPI in the UK, while the all-items RPI continues in use in a range of indexation applications, including index-linked gilts. Neither of these indices, nor their variants, is ever revised after first publication. For policy purposes, and hence also in public discussion and practical forecasting, inflation is defined in terms of the annual percentage increase in the relevant index. We denote the “econometric” and “policy” measures of inflation, respectively, as Δ1 pt and Δ4 pt , where Δi = 1 − Li with lag operator L, and p is the log of the quarterly
index. The former, annualized (by multiplying by four), is shown in the upper panel of Figure 4.1; the latter in the lower panel. It is seen that annual differencing removes the mild seasonality in the quarterly RPI, which is evident in the first-differenced series, and also much reduces short-term volatility. Episodes of distinctly different inflationary experience are apparent in Figure 4.1, and their identification in the context of different modeling exercises and their association with different approaches to macroeconomic policy have been studied in the UK literature mentioned above. Haldane and Quah (1999) consider the Phillips curve from the start of the original Phillips sample, 1861, to 1998. For the post-war period, with a specification in terms of price inflation (unlike the original Phillips curve specification in terms of wage inflation), they find distinctly different “curves” pre- and post-1980: at first the curve is “practically vertical; after 1980, the Phillips curve is practically horizontal” (p. 266). Benati (2004), however, questions Haldane and Quah’s use of frequency-domain procedures that focus on periodicities between five and eight years, and argues for a more “standard” business-cycle range of six quarters to eight years. With this alternative approach he obtains a further division of each episode, identifying “a period of extreme instability (the 1970s), a period of remarkable stability (the post-1992 period), and two periods ‘in-between’ (the Bretton Woods era and the period between 1980 and 1992)” (p. 711). This division is consistent with his prior univariate analysis of RPI inflation, 1947:1–2003:2, which finds three breaks in the intercept, coefficients and innovation variance of a simple autoregression, with estimated dates 1972:3, 1981:2 and 1992:2 (although the date of the second break is much less precisely determined than the other two dates). Nelson and Nikolov (2004) and Meenagh et al. (2009) consider a wide range of “real-time” policy statements and pronouncements to document the vicissitudes of UK macroeconomic policymaking since the late 1950s. Until 1997, when the Bank of England gained operational independence, monetary policy, like fiscal policy, was in the hands of elected politicians, and their speeches and articles are a rich research resource. This evidence, together with their simulation of an estimated New Keynesian model of aggregate demand and inflation behavior, leads Nelson and Nikolov to conclude that “monetary policy neglect”, namely the failure in the 1960s and 1970s to recognize the primacy of monetary policy in controlling inflation, is important in understanding the inflation of that period. Study of a yet wider range of policymaker statements leads Nelson (2009) to conclude that the current inflation targeting regime is the result not of changed policymaker objectives, but rather of an “overhaul of doctrine”, in particular a changed view of the transmission mechanism, with the divide between the “old” and “modern” eras falling in 1979. Meenagh et al. (2009) provide a finer division of policy episodes, identifying five subperiods: the Bretton Woods fixed exchange rate system, up to 1970:4; the incomes policy regime, 1971:1–1978:4; the money targeting regime, 1979:1–1985:4; exchange rate targeting, 1986:1–1992:3; and inflation targeting, since 1992:4. 
They follow their narrative analysis with statistical tests in a three-variable VAR model, finding general support for the existence of the breaks, although the estimated break dates are all later than those suggested by the narrative analysis. These reflect lags in the effect of policy on inflation and growth outcomes and, when policy regimes change, “there may well be a lag before agents’ behaviour changes; this lag will be the longer when the regime change is not clearly communicated or its effects are not clearly understood” (p. 980). Meenagh et al. suggest that this applies to the last two changes: the switch to exchange rate
66
Modeling UK inflation uncertainty, 1958–2006
targeting in 1986, with a period of “shadowing the Deutsche Mark” preceding formal membership of the Exchange Rate Mechanism of the European Monetary System, was deliberately kept unannounced by the Treasury, while in 1992 inflation targeting was unfamiliar, with very little experience from other countries to draw on. Independent evidence on responses to later changes to the detail of the inflation targeting arrangements is presented in Section 5. None of the research discussed above is cast in the framework of a regime switching model, of which a wide variety is available in the econometric literature. The brief account of five policy episodes in the previous paragraph makes it clear that there was no switching from one regime to another and back again; at each break point the old policy was replaced by something new. Likewise no regime switching models feature in the analysis presented below.
3. Re-estimating the original ARCH model

The original ARCH regression model for UK inflation is (Engle, 1982a, pp. 1001–2):

Δ1 pt = β0 + β1 Δ1 pt−1 + β2 Δ1 pt−4 + β3 Δ1 pt−5 + β4 (pt−1 − wt−1) + εt,   (1)

εt | ψt−1 ∼ N(0, ht),   ht = α0 + α1 (0.4 ε²t−1 + 0.3 ε²t−2 + 0.2 ε²t−3 + 0.1 ε²t−4),   (2)
where p is the log of quarterly RPI and ψt−1 is the information set available at time t − 1. The wage variable used by Engle (in logs) in the real wage “error correction” term, namely an index of manual wage rates, was subsequently discontinued, and for consistency in all our re-estimations we use the average earnings index, also used by Haldane and Quah (1999). For the initial sample period, 1958:1–1977:2, we are able to reproduce Engle’s qualitative findings, with small differences in the quantitative details due to these minor variations. In particular, with respect to the h-process, our maximum likelihood estimate of α0 is, like his, not significantly different from zero, whereas our estimate of α1 , at 0.897, is slightly smaller than his (0.955). The turbulence of the period is illustrated in Figure 4.2, which plots the square root of the estimates of ht over the sample period: these are the standard errors of one-quarter-ahead forecasts of annual inflation based on the model. The width of an interval forecast with nominal 50% coverage (the interquartile range) varies from a minimum of 2.75 percentage points to a maximum of 14 percentage points of annual inflation. Engle concludes that “this example illustrates the usefulness of the ARCH model . . . for obtaining more realistic forecast variances”, although these were not subject to test in an out-of-sample exercise. Re-estimation over the extended sample period 1958:1–2006:4 produces the results shown in Table 4.1. These retain the main features of the original model – significant autoregressive coefficients, insignificant α0 , estimated α1 close to 1 – except for the estimate of the error correction coefficient, β4 , which is virtually zero. Forward recursive estimation shows that this coefficient maintains its significance from the initial sample to samples ending in the mid-1980s, but then loses its significance as more recent observations are added to the sample. Figure 4.3(a) shows the conditional standard error of annualized inflation over the fully extended period. The revised estimates are seen to extend the peaks in the original sample period shown in Figure 4.2; there is then a
Fig. 4.2. Conditional standard errors, 1958:1–1977:2, Δ1 pt
further peak around the 1979–1981 recession, after which the conditional standard error calms down. Practical forecasters familiar with the track record of inflation projections over the past decade may be surprised by forecast standard errors as high as two percentage points of annual inflation shown in Figure 4.3(a). Their normal practice, however, is to work with an inflation measure defined as the percentage increase in prices on a year earlier, Δ4 p, whereas Δ1 p is used in Engle's model and our various re-estimates of it. The latter series exhibits more short-term volatility, as seen in Figure 4.1. Replacing Δ1 p in the original ARCH regression model given above by Δ4 p and re-estimating over the extended sample gives the conditional standard error series shown in Figure 4.3(b). This has the same profile as the original specification, but reflects a much lower overall level of uncertainty surrounding the more popular measure of inflation.

Table 4.1. Estimation of the original ARCH model over 1958:1–2006:4
Δ1 pt = β0 + β1 Δ1 pt−1 + β2 Δ1 pt−4 + β3 Δ1 pt−5 + β4 (pt−1 − wt−1) + εt,
εt | ψt−1 ∼ N(0, ht),   ht = α0 + α1 (0.4 ε²t−1 + 0.3 ε²t−2 + 0.2 ε²t−3 + 0.1 ε²t−4)

Coeff.           Estimate    Std Error   z statistic   p value
β0               0.014       0.0097      1.44          0.150
β1               0.391       0.0852      4.59          0.000
β2               0.659       0.0504      13.07         0.000
β3               −0.337      0.0646      −5.22         0.000
β4               0.002       0.0062      0.39          0.696
α0               0.0002      8E–05       2.99          0.003
α1               1.009       0.1564      6.45          0.000

Log likelihood   398.9       Akaike info criterion   −4.00
                             Schwarz criterion       −3.88
                             Hannan-Quinn criterion  −3.95
Fig. 4.3(a). Conditional standard errors, 1958:1–2006:4, Δ1 pt

Fig. 4.3(b). Conditional standard errors, 1958:1–2006:4, Δ4 pt
Over the last decade the time series plotted in Figures 4.1 and 4.3 have a more homoskedastic, rather than heteroskedastic appearance, despite the significance of the estimate of α1 over the full sample including this period. As a final re-estimation exercise on the original ARCH model, with Δ1 p, we undertake backward recursive estimation. We begin with the sample period 1992:4–2006:4, the inflation targeting period, despite reservations about a learning period having been required before the full benefits of the new policy became apparent. We then consider sample periods starting earlier, one quarter at a time, until the complete sample period 1958:1–2006:4 is reached. Equivalently, we could begin with full sample estimation then sequentially remove the earliest observation. Either way, the resulting estimates of the coefficient α1 and the p values of the LM test (Engle, 1982a, Section 8) are plotted in Figure 4.4 against the starting date of the sample; the end date is 2006:4 throughout. There is seen to be a clear change around 1980. To
Fig. 4.4. Backward recursive estimates of α1 and the LM p-value, Δ1 pt (horizontal axis: sample start date)
exhibit significant conditional heteroskedasticity it is necessary to include periods earlier than this in the sample; samples starting after 1980 offer no support for the existence of ARCH in this model. Similar results are obtained when the model is rewritten in terms of Δ4 p, except that the sample has to start in 1990 or later for the significant ARCH effect to have disappeared. These findings prompt more general questions about nonstationarity.
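The backward recursion just described is easy to reproduce in outline. The sketch below is a minimal illustration, not the authors' code: it assumes a numpy array infl holding the quarterly Δ1 p observations with an accompanying list of dates, re-fits a simple autoregression (a simplification of model (1)) by OLS as the sample start date moves forward with a fixed end date, and records the p-value of Engle's LM test for ARCH in the residuals using statsmodels.

```python
# Minimal sketch; `infl` (quarterly Delta_1 p_t) and `dates` are assumed inputs.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_arch

def backward_recursive_lm(infl, dates, min_obs=40, nlags=4):
    """For each candidate start date, fit an AR model by OLS on the sample
    running to the fixed end date and store the ARCH LM p-value."""
    results = []
    T = len(infl)
    for start in range(T - min_obs):
        y = infl[start + 4:]
        X = sm.add_constant(np.column_stack([infl[start + 3:-1],   # lag 1
                                             infl[start:-4]]))     # lag 4
        resid = sm.OLS(y, X).fit().resid
        lm_pval = het_arch(resid, nlags)[1]   # second element is the LM p-value
        results.append((dates[start], lm_pval))
    return results
```

Plotting the stored p-values against the sample start date gives a picture of the same kind as Figure 4.4.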
4. The nonstationary behavior of UK inflation

We undertake a fuller investigation of the nature of the nonstationarity of inflation, in the light of the coexistence in the literature of conflicting approaches. For example, Garratt, Lee, Pesaran and Shin (2003; 2006, Ch. 9) present an eight-equation conditional vector error correction model of the UK economy, estimated over 1965:1–1999:4, in which RPI inflation, Δ1 p, is treated as an I(1) variable. This leads them to express the target in their monetary policy experiment as a desired constant reduction in the rate of inflation from that observed in the previous period, which does not correspond to the inflation target that is the current focus of policy in the UK, nor anywhere else. In contrast, Castle and Hendry (2008) present error correction equations for inflation (GDP deflator) for use in forecast comparisons, with the same sample starting date as Garratt et al., assuming that "the price level is I(1), but subject to structural breaks which give the impression that the series is I(2)". Standard unit root tests without structural breaks reveal some of the sources of potential ambiguity. Tests are performed recursively, beginning with a sample of 40 observations, 1958:1–1967:4, then extending the sample quarter-by-quarter to 2006:4. Results for the augmented Dickey–Fuller (ADF) test are representative of those obtained across various other tests. For the quarterly inflation series Δ1 p, the results presented in
Fig. 4.5(a). Recursive ADF tests for Δ1 p, with 5% and 10% critical values: Constant only

Fig. 4.5(b). Recursive ADF tests for Δ1 p, with 5% and 10% critical values: Constant and seasonal dummies
Figure 4.5 demonstrate sensitivity to the treatment of seasonality. The upper panel gives the ADF statistic with the inclusion of a constant term, and shows that over the 1970s and 1980s the null hypothesis of I(1) inflation would not be rejected. The addition of quarterly dummy variables, however, gives the results shown in the lower panel, which lead to the clear rejection of the unit root hypothesis as soon as the end-point of the sample gets clear of the 1975 peak in inflation, and thereafter. Such constant additive seasonality can alternatively be removed by annual differencing, which also reduces shortterm volatility, as noted above in the discussion of Figure 4.1. For the Δ4 p series, in the corresponding figure (not shown) the ADF statistic lies in the unit root nonrejection region over the whole period. Backward recursive estimation of the ADF test for the Δ4 p
series, however, shows that the unit root hypothesis would be rejected in samples with start dates in 1990 or later. These results represent a simple example of the impact of a deterministic component, and different ways of dealing with it, on inference about unit roots, and the sensitivity of such inference to the choice of sample period.

The impact of structural breaks on inference about unit roots over the full data period is assessed using the procedures of Zivot and Andrews (1992), allowing for an estimated break in mean under the alternative hypothesis. Once this is done, the ADF statistic, relative to Zivot and Andrews's critical values, implies rejection of the unit root hypothesis in all three cases: Δ1 p, with and without seasonal dummy variables, and Δ4 p. These results motivate further investigation of structural change, in models that are stationary within subperiods. We apply the testing procedure developed by Andrews (1993), which treats the break dates as unknown. Confidence intervals for the estimated break dates are calculated by the method proposed by Bai (1997). For the Δ1 p series, in an autoregressive model with seasonal dummy variables, namely

Δ1 pt = β0 + β1 Δ1 pt−1 + β2 Δ1 pt−4 + Σ_{j=1}^{3} γj Qjt + εt,   (3)
we find three significant breaks in β0, but none in the remaining coefficients, at the following dates (95% confidence intervals in parentheses): 1972:3 (1970:3–1974:3), 1980:2 (1979:2–1981:2), 1990:4 (1987:4–1993:4).
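To make the unknown-break-date logic concrete, the following sketch searches over candidate break dates for a shift in the intercept of an autoregression like (3) and records the largest Chow-type F statistic, in the spirit of Andrews's (1993) sup-F test. It is a hedged illustration rather than the procedure actually used here: the data construction is assumed, the trimming fraction is arbitrary, and the sup-F statistic must be compared with Andrews's nonstandard critical values rather than the usual F tables.

```python
# Minimal sketch; y is the dependent variable and X the regressor matrix
# (constant in the first column) for a model such as (3), both numpy arrays.
import numpy as np

def sup_f_intercept_break(y, X, trim=0.15):
    """Search over interior break dates for a shift in the intercept only and
    return the largest F statistic together with the implied break index."""
    T, k = X.shape
    ssr0 = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    best_f, best_t = -np.inf, None
    for t_b in range(int(trim * T), int((1 - trim) * T)):
        dummy = (np.arange(T) >= t_b).astype(float)   # post-break intercept shift
        Xb = np.column_stack([X, dummy])
        ssr1 = np.sum((y - Xb @ np.linalg.lstsq(Xb, y, rcond=None)[0]) ** 2)
        f_stat = (ssr0 - ssr1) / (ssr1 / (T - k - 1))
        if f_stat > best_f:
            best_f, best_t = f_stat, t_b
    return best_f, best_t
```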
These are similar dates to those of the more general breaks identified by Benati (2004), noted above, although in our case it is the date of the second break that is most precisely estimated. Likewise our three break dates are close to the dates of the first three breaks estimated in the three-variable VAR of Meenagh et al. (2009, Table 1). We have no counterpart to their fourth break, in 1993:4, associated with the introduction of inflation targeting a year earlier, although this date is the upper limit of the 95% confidence interval for our third break, which is the least precisely determined of the three.

The resulting equation with shifts in β0 shows evidence of ARCH over the whole period, but results given in the final paragraph of Section 3 about its time dependence suggest separate testing in each of the four subperiods defined by the three break dates. In none of the subperiods is there evidence of ARCH. As an alternative representation of heteroskedasticity we consider breaks in the error variance. Following Sensier and van Dijk (2004) we again locate three significant breaks, at similar dates, namely 1974:2, 1981:3, and 1990:2. Estimates of the full model are presented in Table 4.2, and the implied subperiod means and standard deviations of inflation are shown as horizontal lines in Figures 4.1(a) and 4.3(a), respectively.

For the Δ4 p series seasonal dummy variables are not required, but a moving average error is included, and the autoregression is slightly revised, giving the model

Δ4 pt = β0 + β1 Δ4 pt−1 + β2 Δ4 pt−2 + εt + θ εt−4.   (4)
Table 4.2. Estimation of the "breaks model", 1958:1–2006:4.
Δ1 pt = β0 + β1 Δ1 pt−1 + β2 Δ1 pt−4 + Σ_{j=1}^{3} γj Qjt + δ1 D72:3 + δ2 D80:2 + δ3 D90:4 + εt,
εt | ψt−1 ∼ N(0, ht),   ht = α0 + α1 D74:2 + α2 D81:3 + α3 D90:2

Coeff.           Estimate    Std Error   z statistic   p value
β0               0.024       0.007       3.62          0.000
γ1               −0.016      0.006       −2.73         0.006
γ2               0.030       0.006       4.92          0.000
γ3               −0.038      0.007       −5.30         0.000
β1               0.405       0.070       5.77          0.000
β2               0.138       0.074       1.88          0.061
δ1               0.047       0.012       3.96          0.000
δ2               −0.038      0.011       −3.50         0.001
δ3               −0.015      0.005       −2.87         0.004
α0               0.001       0.000       6.37          0.000
α1               0.003       0.001       2.67          0.008
α2               −0.003      0.001       −3.05         0.002
α3               −0.001      0.000       −5.68         0.000

Log likelihood   449.5       Akaike info criterion   −4.45
                             Schwarz criterion       −4.24
                             Hannan-Quinn criterion  −4.37
Again we find three significant breaks in β0, the first and third of which are accompanied by shifts in β1, the dates being as follows: 1975:3 (1974:2–1976:4), 1981:4 (1981:2–1982:2), 1988:3 (1987:2–1989:4).
As in the quarterly difference series, ARCH effects persist over the whole period, but there are no ARCH effects in any of the subperiods defined by these shifts in mean. With the same motivation as above we also find three significant breaks in variance in this case, namely 1974:2, 1980:2, and 1990:2, the first and last dates exactly coinciding with those estimated for the Δ1 p series. This again provides an alternative representation of the observed heteroskedasticity, and the corresponding subperiod means and standard deviations are shown in Figures 4.1(b) and 4.3(b), respectively. (Note that regression residuals sum to zero over the full sample period, but not in each individual subperiod, because some coefficients do not vary between subperiods. Hence the plotted values in Figure 4.1 do not coincide with the subperiod means of the inflation data.) The ARCH regression model and the alternative autoregressive model with intercept breaks in mean and variance are non-nested, and can be compared via an information criterion that takes account of the difference in the number of estimated parameters in each model. We find that the three measures in popular use, namely Akaike’s information
criterion, the Hannan-Quinn criterion and the Schwarz criterion, unambiguously select the breaks model, for both Δ1 p and Δ4 p versions. A final note on outliers is perhaps in order, as several empirical researchers identify inflation outliers associated with the increase in Value Added Tax in 1979:3 and the introduction of Poll Tax in 1990:2, and deal with them accordingly. We simply report that none of the modeling exercises presented in this section is sensitive to changes in the treatment of these observations.
5. Measures of inflation forecast uncertainty

Publication of the UK Government's short-term economic forecasts began on a regular basis in 1968. The 1975 Industry Act introduced a requirement for the Treasury to publish two forecasts each year, and to report their margins of error. The latter requirement was first met in December 1976, with the publication of a table of the mean absolute error (MAE) over the past 10 years' forecasts of several variables, compiled in the early part of that period from internal, unpublished forecasts. Subsequently it became standard practice to include a column of MAEs in the forecast table – users could then easily form a forecast interval around the given point forecast, if they so wished – although in the 1980s and 1990s these were often accompanied by a warning that they had been computed over a period when the UK economy was more volatile than expected in the future. This publication practice continues to the present day.

We consider the RPI inflation forecasts described as "fourth quarter to fourth quarter" forecasts, published each year in late November – early December in Treasury documents with various titles over the years – Economic Progress Report, Autumn Statement, Financial Statement and Budget Report, now Pre-Budget Report. For comparability with other measures reported as standard errors or standard deviations we multiply the reported forecast MAEs, which are rounded to the nearest quarter percentage point, by 1.253 (= √(π/2)), as Melliss and Whittaker's (2000) review of Treasury forecasts found that "the evidence supports the hypothesis that errors were normally distributed". The resulting series is presented in Figure 4.6(a). The series ends in 2003, RPI having been replaced by CPI in the 2004 forecast; no MAE for CPI inflation forecasts has yet appeared. The peak of 5 percentage points occurs in 1979, when the point forecast for annual inflation was 14%; on this occasion, following the new Conservative government's policy changes, the accompanying text expressed the view that the published forecast MAEs were "likely to understate the true margins of error".

For comparative purposes over the same period we also plot comparable forecast standard errors for the two models estimated in Sections 3 and 4 – the ARCH model and the breaks model. In common with the practice of the Treasury and other forecasters we use the annual inflation (Δ4 pt) versions of these models. Similarly we regard the "year-ahead" forecast as a five-quarter-ahead forecast, as when forecasting the fourth quarter next year we first have to "nowcast" the fourth quarter this year, given that only third-quarter information is available when the forecast is constructed. The forecast standard errors take account of the estimated autoregressions in projecting five quarters ahead, but this is an "in-sample" or ex post calculation that assumes knowledge of the full-sample estimates at all intermediate points including, for the breaks
model, the dates of the breaks; the contribution of parameter estimation error is also neglected. It is seen that the ARCH model’s forecast standard error shows a much more exaggerated peak than that of Treasury forecasts in 1979, and is more volatile over the first half of the period shown, whereas the breaks model’s forecast standard error is by definition constant over subperiods. Of course, in real-time ex ante forecasting the downward shift in forecast standard error could only be recognized with a lag, as discussed below. From 1996 two additional lines appear in Figure 4.6(a), following developments noted in the Introduction. As late as 1994 the Treasury could assert that “it is the only major forecasting institution regularly to publish alongside its forecasts the average errors from past forecasts” (HM Treasury, 1994, p. 11), but in 1996 density forecasts of inflation appeared on the scene. We consider the Bank of England’s forecasts published around
Fig. 4.6(a). Measures of uncertainty, year-ahead forecasts, 1976–2006 (series shown: Treasury, ARCH, breaks model, MPC, SEF)

Fig. 4.6(b). Measures of disagreement, year-ahead forecasts, 1986–2006 (series shown: HMT compilation, SEF)
the same time as the Treasury forecasts, namely those appearing in the November issue of the quarterly Inflation Report. From the Bank’s spreadsheets that underlie the fan charts of quarterly forecasts, originally up to two years ahead (nine quarters), later extended to three years, we take the uncertainty measure (standard deviation) of the fivequarter-ahead inflation forecast. This is labeled MPC in Figure 4.6(a), because the Bank’s Monetary Policy Committee, once it was established, in 1997, assumed responsibility for the forecast. In 1996 the Bank of England also initiated its quarterly Survey of External Forecasters, at first concerned only with inflation, later including other variables. The quarterly Inflation Report includes a summary of the results of the latest survey, conducted approximately three weeks before publication. The survey asks for both point forecasts and density forecasts, reported as histograms, and from the individual responses Boero, Smith and Wallis (2008) construct measures of uncertainty and disagreement. Questions 1 and 2 of each quarterly survey concern forecasts for the last quarter of the current year and the following year, respectively, and for comparable year-ahead forecasts we take the responses to question 2 in the November surveys. For these forecasts our SEF average individual uncertainty measure is plotted in Figure 4.6(a). The general appearance of Figure 4.6(a) has few surprises for the careful reader of the preceding sections. The period shown divides into two subperiods, the first with high and variable levels of forecast uncertainty, the second with low and stable levels of forecast uncertainty, where the different estimates lie within a relatively small range. The recent fall in the Treasury forecast standard error may be overdramatized by rounding, whereas the fall in SEF uncertainty is associated by Boero, Smith and Wallis (2008) with the 1997 granting of operational independence to the Bank of England to pursue a monetary policy of inflation targeting. Their quarterly series show a reduction in uncertainty until the May 1999 Survey of External Forecasters, after which the general level is approximately constant. This reduction in uncertainty about future inflation is attributed to the increasing confidence in, and credibility of, the new monetary policy arrangements. The forecast evaluation question, how reliable are these forecasts, applies to measures of uncertainty just as it does to measures of location, or point forecasts. Wallis (2004) presents an evaluation of the current-quarter and year-ahead density forecasts of inflation published by the MPC and NIESR. He finds that both overstated forecast uncertainty, with more inflation outcomes falling in the central area of the forecast densities, and fewer in the tails, than the densities had led one to expect. Current estimates of uncertainty are based on past forecast errors, and both groups had gone back too far into the past, into a different monetary policy regime with different inflation experience. Over 1997–2002 the MPC’s year-ahead point forecast errors have mean zero and standard deviation 0.42, and the fan chart standard deviation gets closest to this, at 0.48, only at the end (2002:4) of the period considered. Mitchell (2005), for the NIESR forecasts, asks whether the overestimation of uncertainty could have been detected, in real time, had forecasters been alert to the possibility of a break in the variance. 
Statistical tests can detect breaks only with a lag, and in a forecast context we must also wait to observe the outcome before having information relevant to the possibility of a break in uncertainty at the forecast origin. In a “pseudo real time” recursive experiment it is concluded that tests such as those used in Section 4 could have detected at the end of 1996 that a break in year-ahead forecast uncertainty had occurred in 1993:4. This
is exactly the date of the most recent break identified by Meenagh et al. (2009), and Mitchell's estimate is that it would not have been recognized by statistical testing until three years later; in the meantime forecasters might have been able to make judgmental adjustments. As an aside we discuss a recent inflation point forecast evaluation study in which the same issue arises. Groen, Kapetanios and Price (2009) compare the inflation forecasts published in the Bank of England's Inflation Report with those available in pseudo real time from a suite of statistical forecasting models. All of the latter are subject to possible breaks in mean, so following a breaks test, the identified break dates are used to demean the series prior to model estimation, then the statistical forecasts are the remeaned projections from the models. It is found that in no case does a statistical model outperform the published forecasts. The authors attribute the Bank forecasters' success to their ability to apply judgment in anticipating the important break, namely the change of regime in 1997:3 following Bank independence. As in Mitchell's study, the ex ante recursively estimated shift is not detected until three years later. For Treasury forecasts, which started earlier, we can compare the ex ante uncertainty measures in Figure 4.6(a) with the forecast root mean squared errors of year-ahead inflation forecasts reported by Melliss and Whittaker (2000). Over subperiods, dated by forecast origin, these ex post measures are: 1979–1984, 2.3%; 1985–1992, 1.7%; 1993–1996, 0.8%. These are below, often substantially so, the values plotted in Figure 4.6(a), with the exception of the 1990 and 1992 forecasts, again illustrating the difficulty of projecting from past to future in times of change.

In the absence of direct measures of uncertainty it is often suggested that a measure of disagreement among several competing point forecasts may serve as a useful proxy. How useful such a proxy might be can be checked when both measures are available, and there is a literature based on the US Survey of Professional Forecasters that investigates this question, going back to Zarnowitz and Lambros (1987). However, recent research on the SPF data that brings the sample up to date and studies the robustness of previous findings to the choice of measures finds little support for the proposition that disagreement is a useful proxy for uncertainty (Rich and Tracy, 2006, for example). In the present context we provide a visual illustration of this lack of support by plotting in Figure 4.6(b) two measures of disagreement based on year-ahead point forecasts of UK inflation. Although the series are relatively short, we use the same scales in panels (a) and (b) of Figure 4.6 to make the comparison as direct as possible and the lack of a relation as clear as possible. The first series is based on the Treasury publication Forecasts for the UK Economy, monthly since October 1986, which is a summary of published material from a wide range of forecasting organizations. Forecasts for several variables are compiled, and their averages and ranges are also tabulated. We calculate and plot the sample standard deviation of year-ahead inflation forecasts in the November issue of the publication. The shorter series is our corresponding disagreement measure from the Bank of England Survey of External Forecasters (Boero, Smith and Wallis, 2008).
Other than a slight downward drift, neither series shows any systematic pattern of variation, nor any correlation of interest with the uncertainty measures. We attribute the lower standard deviation in the SEF to the Bank’s care in selecting a well-informed sample, whereas the Treasury publication is all-encompassing.
6. Uncertainty and the level of inflation

The suggestion by Friedman (1977) that the level and uncertainty of inflation are positively correlated has spawned a large literature, both theoretical and empirical. Simple evidence of such an association is provided by our breaks model where, using Benati's (2004) characterization of the four subperiods as a period of high inflation and inflation variability, a period of low inflation and inflation variability, and two "in-between" periods, we note that the high and low periods for both measures coincide. Compare the horizontal lines in Figures 4.1(a) and 4.3(a) for the Δ1 p model, and in Figures 4.1(b) and 4.3(b) for the Δ4 p model. For the unconditional subperiod means and standard deviations of inflation over a shorter period (1965–2003), the data of Meenagh et al. (2009, Table 2) show a stronger association: when their five policy subperiods are ranked by mean inflation and by inflation standard deviation, the ranks exactly coincide.

Of course, the empirical literature contains analyses of much greater sophistication although, perhaps surprisingly, they are not subjected to tests of structural stability. Two leading examples in the empirical literature, on which we draw, are the articles by Baillie, Chung and Tieslau (1996) and Grier and Perry (2000), in which various extensions of the GARCH-in-mean (GARCH-M) model are developed in order to formalize and further investigate Friedman's proposition. The first authors analyze inflation in 10 countries, the second authors analyze inflation and GDP growth in the US, including subsample analyses. Of particular relevance for the present purpose is the inclusion of the conditional variance (or standard deviation) in the inflation equation and, simultaneously, lagged inflation in the conditional variance equation. Then, with a GARCH representation of conditional heteroskedasticity, the model is:

Δ1 pt = β0 + β1 Δ1 pt−1 + β2 Δ1 pt−4 + Σ_{j=1}^{3} γj Qjt + δ1 √ht + εt   (5)

ht = α0 + α1 ε²t−1 + α2 ht−1 + δ2 Δ1 pt−1.   (6)
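A minimal sketch of how a model like (5)–(6) can be estimated by Gaussian maximum likelihood is given below. It is an illustration under stated assumptions, not the authors' code: infl is assumed to hold Δ1 pt, X the constant, lags and seasonal dummies of the mean equation, and the initialization of the variance recursion is an arbitrary choice; standard GARCH software with in-mean effects and exogenous variance regressors could be used instead where available.

```python
# Minimal sketch of the GARCH(1,1)-M model (5)-(6); `infl` and `X` are assumed inputs.
import numpy as np
from scipy.optimize import minimize

def garch_m_negloglik(theta, y, X):
    """theta = [beta_1..beta_k, delta1, alpha0, alpha1, alpha2, delta2]."""
    k = X.shape[1]
    beta = theta[:k]
    delta1, alpha0, alpha1, alpha2, delta2 = theta[k:]
    T = len(y)
    h = np.empty(T)
    eps = np.empty(T)
    h[0] = np.var(y)                                   # crude start for the recursion
    eps[0] = y[0] - X[0] @ beta - delta1 * np.sqrt(h[0])
    nll = 0.0
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + alpha2 * h[t - 1] + delta2 * y[t - 1]
        h[t] = max(h[t], 1e-10)                        # keep the conditional variance positive
        eps[t] = y[t] - X[t] @ beta - delta1 * np.sqrt(h[t])
        nll += 0.5 * (np.log(h[t]) + eps[t] ** 2 / h[t])
    return nll

# e.g. theta0 from an OLS fit for beta plus small positive variance parameters:
# res = minimize(garch_m_negloglik, theta0, args=(infl, X), method="Nelder-Mead")
```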
Full-sample estimation results show positive feedback effects between the conditional mean and the conditional variance, with a highly significant coefficient on lagged inflation in the variance equation (δ2 ), and a marginally significant coefficient (p value 0.063) on the conditional standard deviation in the mean equation (δ1 ); all other coefficients are highly significant. However, the model is not invariant over subperiods. If we simply split the sample at 1980, then the estimate of δ2 retains its significance while the GARCH-M effect drops out from equation (5), which may be associated with the insignificant estimates of α1 and α2 in equation (6). All of these statements apply to each half-sample; however, further division reveals the fragility of the significance of δ2 . As a final test we return to the breaks model of Section 4 and add the conditional standard deviation in mean and lagged inflation in variance effects. Equivalently, we allow the separate intercept terms in equations (5) and (6), β0 and α0 , to shift at the dates estimated in Section 4; the coefficients α1 and α2 are pre-tested and set to zero. This model dominates the originally estimated model (5)–(6) on the three standard information criteria, yet has completely insignificant estimates of δ1 and δ2 . More elaborate models are not able to take us much beyond Friedman’s simple association between the first and second moments of inflation, as reflected in the shifts of our preferred model.
7. Conclusion

Robert Engle's concept of autoregressive conditional heteroskedasticity was a major breakthrough in the analysis of time series with time-varying volatility, recognized by the joint award of the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel in 2003. "The ARCH model and its extensions, developed mainly by Engle and his students, proved especially useful for modelling the volatility of asset returns, and the resulting volatility forecasts can be used to price financial derivatives and to assess changes over time in the risk of holding financial assets. Today, measures and forecasts of volatility are a core component of financial econometrics, and the ARCH model and its descendants are the workhorse tools for modelling volatility" (Stock and Watson, 2007b, p. 657). His initial application was in macroeconometrics, however, and reflected his location in the United Kingdom at the time. This chapter returns to his study of UK inflation in the light of the well-documented changes in economic policy from his original sample period to the present time.

Investigation of the stability of the ARCH regression model of UK inflation shows that little support for the existence of the ARCH effect would be obtained in a sample period starting later than 1980; data from the earlier period of "monetary policy neglect" (Nelson and Nikolov, 2004) are necessary to support Engle's formulation. Fuller investigation of the nature of the nonstationarity of inflation finds that a simple autoregressive model with structural breaks in mean and variance, constant within subperiods (and with no unit roots), provides a preferred representation of the observed heteroskedasticity from an economic historian's point of view. As noted at the outset, however, the ARCH model has a strong forecasting motivation, and forecasters using the breaks model need to anticipate future breaks. Nevertheless, the shifts also provide a simple characterization of the association between the level and uncertainty of inflation suggested by Friedman (1977), which more elaborate models of possible feedbacks are unable to improve upon.

The United Kingdom can claim several firsts in the measurement and public discussion of the uncertainty surrounding economic forecasts by official agencies, and we present a range of measures of inflation forecast uncertainty, from the models considered here and from other UK sources. The few available evaluations of their accuracy indicate that the well-known problems of projecting from past to future in times of change apply equally well to measures of uncertainty as to point forecasts. Although the chapter re-emphasizes the importance of testing the structural stability of econometric relationships, it also acknowledges the difficulty of dealing with instability in a forecast context, for both the levels of variables of interest and, receiving more attention nowadays, their uncertainty.
5
Macroeconomics and ARCH
James D. Hamilton
1. Introduction

One of the most influential econometric papers of the last generation was Engle's (1982a) introduction of autoregressive conditional heteroskedasticity (ARCH) as a tool for describing how the conditional variance of a time series evolves over time. The ISI Web of Science lists over 2,000 academic studies that have cited this article, and simply reciting the acronyms for the various extensions of Engle's theme involves a not insignificant commitment of paper (see Table 5.1, or the more detailed glossary in Chapter 8).

The vast majority of empirical applications of ARCH models have studied financial time series such as stock prices, interest rates, or exchange rates (see Bollerslev, Chou and Kroner, 1992). To be sure, there have also been a number of interesting applications of ARCH to macroeconomic questions. Pelloni and Polasek (2003) analyzed the macroeconomic effects of sectoral shocks within a VAR-GARCH framework. Lee, Ni, and Ratti (1995) noted that the conditional volatility of oil prices, as captured by a GARCH model, seems to matter for the magnitude of the effect on GDP of a given movement in oil prices, and Elder and Serletis (2006) use a vector autoregression with GARCH-in-mean elements to describe the direct consequences of oil-price volatility for GDP. Grier and Perry (2000) and Fountas and Karanasos (2007) use such models to conclude that inflation and output volatility also can depress real GDP growth, while Servén (2003) studied the effects of uncertainty on investment spending, and Shields et al. (2005) analyzed the response of uncertainty to macroeconomic shocks.

However, despite these interesting applications, studying volatility has traditionally been a much lower priority for macroeconomists than for researchers in financial markets because the former's interest is primarily in describing the first moments. There seems to be an assumption among many macroeconomists that, if your primary interest is in the first moment, ARCH has little relevance apart from possible GARCH-M effects. The purpose of this chapter is to suggest that even if our primary interest is in estimating the conditional mean, having a correct description of the conditional variance can still be quite important, for two reasons. First, hypothesis tests about the mean in a
model in which the variance is mis-specified will be invalid. Second, by incorporating the observed features of the heteroskedasticity into the estimation of the conditional mean, substantially more efficient estimates of the conditional mean can be obtained. Section 2 develops the theoretical basis for these claims, illustrating the potential magnitude of the problem with a small Monte Carlo study and explaining why the popular White (1980) or Newey–West (Newey and West, 1987) corrections may not fully correct for the inference problems introduced by ARCH. The subsequent sections illustrate the practical relevance of these concerns using two examples from the macroeconomics literature. The first application concerns measures of what the market expects the US Federal Reserve's next move to be, and the second explores the extent to which US monetary policy today is following a fundamentally different rule from that observed 30 years ago.

I recognize that it may require more than these limited examples to persuade macroeconomists to pay more attention to ARCH. Another thing I learned from Rob Engle is that, in addition to coming up with a great idea, it doesn't hurt if you also have a catchy acronym that people can use to describe what you're talking about. After all, where would we be today if we all had to pronounce "autoregressive conditional heteroskedasticity" every time we wanted to discuss these issues? However, Table 5.1 reveals that the acronyms one might logically use for "Macroeconomics and ARCH" seem already to be taken. "MARCH", for example, is already used (twice), as is "ARCH-M".

Table 5.1. How many ways can you spell "ARCH"? (A partial lexicography)

AARCH      Augmented ARCH                      Bera, Higgins and Lee (1992)
APARCH     Asymmetric power ARCH               Ding, Engle, and Granger (1993)
ARCH-M     ARCH in mean                        Engle, Lilien and Robins (1987)
FIGARCH    Fractionally integrated GARCH       Baillie, Bollerslev, Mikkelsen (1996)
GARCH      Generalized ARCH                    Bollerslev (1986)
GARCH-t    Student's t GARCH                   Bollerslev (1987)
GJR-ARCH   Glosten-Jagannathan-Runkle ARCH     Glosten, Jagannathan, and Runkle (1993)
EGARCH     Exponential generalized ARCH        Nelson (1991)
HGARCH     Hentschel GARCH                     Hentschel (1995)
IGARCH     Integrated GARCH                    Bollerslev and Engle (1986)
MARCH      Modified ARCH                       Friedman, Laibson, and Minsky (1989)
MARCH      Multiplicative ARCH                 Milhøj (1987)
NARCH      Nonlinear ARCH                      Higgins and Bera (1992)
PNP-ARCH   Partially Nonparametric ARCH        Engle and Ng (1993)
QARCH      Quadratic ARCH                      Sentana (1995)
QTARCH     Qualitative Threshold ARCH          Gourieroux and Monfort (1992)
SPARCH     Semiparametric ARCH                 Engle and González-Rivera (1991)
STARCH     Structural ARCH                     Harvey, Ruiz, and Sentana (1992)
SWARCH     Switching ARCH                      Hamilton and Susmel (1994)
TARCH      Threshold ARCH                      Zakoian (1994)
VGARCH     Vector GARCH                        Bollerslev, Engle, and Wooldridge (1988)
Fortunately, Engle and Manganelli (2004) have shown us that it’s also OK to mix upper- and lower-case letters, picking and choosing handy vowels or consonants so as to come up with something catchy, as in “CAViaR” (Conditional Autoregressive Value at Risk). In that spirit, I propose to designate “Macroeconomics and ARCH” as “McARCH.” Maybe not a new product so much as new packaging. Herewith, then, discussion of the relevance of McARCH.
2. GARCH and inference about the mean

We can illustrate some of the issues with the following simple model:

yt = β0 + β1 yt−1 + ut   (1)

ut = √ht vt   (2)

ht = κ + α u²t−1 + δ ht−1   for t = 1, 2, . . . , T   (3)

with h0 = κ/(1 − α − δ) and vt ∼ i.i.d. N(0, 1).
Bollerslev (1986, pp. 312–313) showed that if

3α² + 2αδ + δ² < 1,   (4)

then the noncentral unconditional second and fourth moments of ut exist and are given by

μ2 = E(u²t) = κ / (1 − α − δ)   (5)

μ4 = E(u⁴t) = 3κ² (1 + α + δ) / [(1 − α − δ)(1 − δ² − 2αδ − 3α²)].   (6)
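As a quick numerical check of condition (4) and the moment formulas (5)–(6), the snippet below evaluates them for an arbitrary, hypothetical parameter choice (κ = 2, α = 0.2, δ = 0.7); these values are illustrative only and are not taken from the chapter.

```python
# Hypothetical parameter values, chosen only to illustrate formulas (4)-(6).
kappa, alpha, delta = 2.0, 0.2, 0.7

fourth_moment_condition = 3 * alpha**2 + 2 * alpha * delta + delta**2   # condition (4)
mu2 = kappa / (1 - alpha - delta)                                       # equation (5)
mu4 = (3 * kappa**2 * (1 + alpha + delta)
       / ((1 - alpha - delta) * (1 - delta**2 - 2 * alpha * delta - 3 * alpha**2)))  # (6)

print(fourth_moment_condition < 1)   # True: the fourth moment exists for these values
print(mu2, mu4)
```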
Consider the consequences if the mean parameters β0 and β1 are estimated by ordinary least squares,

β̂ = (Σ xt x′t)⁻¹ Σ xt yt,   β = (β0, β1)′,   xt = (1, yt−1)′,

where all summations are for t = 1, . . . , T. Suppose further that inference is based on the usual OLS formula for the variance, with no correction for heteroskedasticity:

V̂ = s² (Σ xt x′t)⁻¹   (7)

s² = (T − 2)⁻¹ Σ û²t,   ût = yt − x′t β̂.
Consider first the consequences of this inference when the fourth-moment condition (4) is satisfied. For simplicity of exposition, consider the case when the true value of β = 0. Then from the standard consistency results (e.g., Lee and Hansen, 1994; Lumsdaine, 1996) we see that

T V̂ = s² (T⁻¹ Σ xt x′t)⁻¹ →p E(u²t) [ 1, E(yt−1); E(yt−1), E(y²t−1) ]⁻¹ = [ μ2, 0; 0, 1 ].   (8)

In other words, the OLS formulas will lead us to act as if √T β̂1 is approximately N(0, 1) if the true value of β1 is zero. But notice

√T (β̂ − β) = (T⁻¹ Σ xt x′t)⁻¹ T^(−1/2) Σ xt ut.   (9)
Under the null hypothesis, the term inside the second summation, xt ut, is a martingale difference sequence with variance

E(u²t xt x′t) = [ E(u²t), E(u²t ut−1); E(ut−1 u²t), E(u²t u²t−1) ].

When the (2,2) element of this matrix is finite, it then follows from the Central Limit Theorem (e.g., Hamilton, 1994, p. 173) that

T^(−1/2) Σ yt−1 ut →L N(0, E(u²t u²t−1)).   (10)
To calculate the value of this variance, recall (e.g., Hamilton, 1994, p. 666) that the GARCH(1,1) structure for ut implies an ARMA(1,1) structure for u²t:

u²t = κ + (δ + α) u²t−1 + ωt − δ ωt−1

for ωt a white noise process. It follows from the first order autocovariance for an ARMA(1,1) process (e.g., Box and Jenkins, 1976, p. 76) that

E(u²t u²t−1) = E[(u²t − μ2)(u²t−1 − μ2)] + μ2² = ρ(μ4 − μ2²) + μ2²   (11)

for

ρ = [1 − (α + δ)δ] α / [1 + δ² − 2(α + δ)δ].   (12)
Substituting (11), (10) and (8) into (9),

√T β̂1 →L N(0, V11),

V11 = [ρ μ4 + (1 − ρ) μ2²] / μ2² = ρ [3(1 + α + δ)(1 − α − δ) / (1 − δ² − 2αδ − 3α²)] + (1 − ρ),

with the last equality following from (5) and (6).

Notice that V11 ≥ 1, with equality if and only if α = 0. Thus OLS treats √T β̂1 as approximately N(0, 1), whereas the true asymptotic distribution is Normal with a variance bigger than unity, meaning that the OLS t-test will systematically reject more often than it should. The probability of rejecting the null hypothesis that β1 = 0 (even though the null hypothesis is true) gets bigger and bigger as the parameters get closer to the region at which the fourth moment becomes infinite, at which point the asymptotic rejection probability becomes unity. Figure 5.1 plots the rejection probability as a function of α and δ. If these parameters are in the range typically found in estimates of GARCH processes, an OLS t-test with no correction for heteroskedasticity would spuriously reject with arbitrarily high probability for a sufficiently large sample. The good news is that the rate of divergence is pretty slow – it may take a lot of observations before the accumulated excess kurtosis overwhelms the other factors. I simulated 10,000 samples from the above Gaussian GARCH process for samples of size
Fig. 5.1. Asymptotic rejection probability for OLS t-test that autoregressive coefficient is zero as a function of GARCH(1,1) parameters α and δ
Note: Null hypothesis is actually true and test has nominal size of 5%
T = 100, 200, 1,000, and 10,000 (and 1,000 samples of size 100,000), where the true values were specified as follows: β0 = β1 = 0, κ = 2, α = 0.35, δ = 0.6. The solid line in Figure 5.2 plots the fraction of samples for which an OLS t-test of β1 = 0 exceeds two in absolute value. Thinking we're only rejecting a true null hypothesis 5% of the time, we would in fact do so 15% of the time in a sample of size T = 100 and 33% of the time when T = 1,000. As one might imagine, for a given sample size, the OLS t-statistic is more poorly behaved if the true innovations vt in (2) are Student's t with 5 degrees of freedom (the dashed line in Figure 5.2) rather than Normal.

What happens if instead of the OLS formula (7) for the variance of β̂ we use White's (1980) heteroskedasticity-consistent estimate,

Ṽ = (Σ xt x′t)⁻¹ (Σ û²t xt x′t) (Σ xt x′t)⁻¹ ?   (13)

Fig. 5.2. Fraction of samples in which OLS t-test leads to rejection of the null hypothesis that autoregressive coefficient is zero as a function of the sample size for regression with Gaussian errors (solid line) and Student's t-errors (dashed line)
Note: Null hypothesis is actually true and test has nominal size of 5%
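The Monte Carlo experiment described above is straightforward to replicate in outline. The sketch below, a simplified illustration rather than the chapter's actual code, simulates the GARCH(1,1) process (1)–(3) with β0 = β1 = 0, κ = 2, α = 0.35, δ = 0.6, regresses yt on a constant and yt−1 by OLS, and records how often the conventional t-test rejects β1 = 0 at the nominal 5% level; the number of replications is kept small here purely for speed.

```python
# Minimal sketch of the Monte Carlo behind Figure 5.2 (Gaussian innovations only).
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, kappa, alpha, delta = 0.0, 0.0, 2.0, 0.35, 0.6

def simulate(T):
    """Simulate y_t = beta0 + beta1*y_{t-1} + u_t with GARCH(1,1) errors (1)-(3)."""
    h_prev = kappa / (1 - alpha - delta)
    u_prev, y_prev = 0.0, 0.0
    y = np.empty(T)
    for t in range(T):
        h = kappa + alpha * u_prev**2 + delta * h_prev if t > 0 else h_prev
        u = np.sqrt(h) * rng.standard_normal()
        y[t] = beta0 + beta1 * y_prev + u
        h_prev, u_prev, y_prev = h, u, y[t]
    return y

def ols_t_reject(y):
    """OLS of y_t on (1, y_{t-1}); reject if |t| > 2 using the OLS standard error."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    yy = y[1:]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ yy
    resid = yy - X @ b
    s2 = resid @ resid / (len(yy) - 2)
    return abs(b[1] / np.sqrt(s2 * XtX_inv[1, 1])) > 2

for T in (100, 200, 1000):
    reject_rate = np.mean([ols_t_reject(simulate(T)) for _ in range(500)])
    print(T, reject_rate)   # typically well above the nominal 5%
```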
Fig. 5.3. Average value of √T times estimated standard error of estimated autoregressive coefficient as a function of the sample size for White standard error (solid line) and OLS standard error (dashed line)

ARCH is not a special case of the class of heteroskedasticity for which Ṽ is intended to be robust, and indeed, unlike typical cases, T Ṽ is not a consistent estimate of a given matrix:

T Ṽ = (T⁻¹ Σ xt x′t)⁻¹ (T⁻¹ Σ û²t xt x′t) (T⁻¹ Σ xt x′t)⁻¹.

The first and last matrices will converge as before,

T⁻¹ Σ xt x′t →p [ 1, 0; 0, μ2 ],

but T⁻¹ Σ û²t xt x′t will diverge if the fourth moment μ4 is infinite. Figure 5.3 plots the simulated value for the square root of the lower-right element of T Ṽ for the Gaussian simulations above. However, this growth in the estimated variance of √T β̂1 is exactly right, given the growth of the actual variance of √T β̂1 implied by the GARCH specification. And a t-test based on (13) seems to perform reasonably well for all sample sizes (see the second row of Table 5.2). The small-sample size distortion for the White test is a little worse for Student's t compared with Normal errors, though still acceptable.

Table 5.2 also explores the consequences of using the Newey–West (Newey and West, 1987) generalization of the White formula to allow for serial correlation, using a lag window of q = 5:

Ṽ* = (Σ_{t=1}^{T} xt x′t)⁻¹ [ Σ_{t=1}^{T} û²t xt x′t + Σ_{υ=1}^{q} (1 − υ/(q + 1)) Σ_{t=υ+1}^{T} ût ût−υ (xt x′t−υ + xt−υ x′t) ] (Σ_{t=1}^{T} xt x′t)⁻¹.
Table 5.2. Fraction of samples for which indicated hypothesis is rejected by test of nominal size 0.05

Errors Normally distributed
H0                                Test based on                T = 100   T = 200   T = 1000
β1 = 0 (H0 is true)               OLS standard error           0.152     0.200     0.327
β1 = 0 (H0 is true)               White standard error         0.072     0.063     0.054
β1 = 0 (H0 is true)               Newey–West standard error    0.119     0.092     0.062
εt homoskedastic (H0 is false)    White T R²                   0.570     0.874     1.000
εt homoskedastic (H0 is false)    Engle T R²                   0.692     0.958     1.000

Errors Student's t with 5 degrees of freedom
H0                                Test based on                T = 100   T = 200   T = 1000
β1 = 0 (H0 is true)               OLS standard error           0.174     0.229     0.389
β1 = 0 (H0 is true)               White standard error         0.081     0.070     0.065
β1 = 0 (H0 is true)               Newey–West standard error    0.137     0.106     0.079
εt homoskedastic (H0 is false)    White T R²                   0.427     0.691     0.991
εt homoskedastic (H0 is false)    Engle T R²                   0.536     0.822     0.998
These results (reported in the third row of the two panels of Table 5.2) illustrate one potential pitfall of relying too much on "robust" statistics to solve the small-sample problems, in that it has more serious size distortions than does the simple White statistic for all specifications investigated. Another reason one might not want to assume that White or Newey–West standard errors can solve all the problems is that these formulas only correct the standard error for β̂, but are still using the OLS estimate itself, which from Figure 5.3 was seen not to be √T convergent. By contrast, even if the fourth moment does not exist, maximum likelihood estimation as an alternative to OLS is still √T convergent. Hence the relative efficiency gains of MLE relative to OLS become infinite as the sample size grows for typical values of GARCH parameters. Engle (1982a, p. 999) observed that it is also possible to have an infinite relative efficiency gain for some parameter values even with exogenous explanatory variables and ARCH as opposed to GARCH errors. Results here are also related to the well-known result that ARCH will render inaccurate traditional tests for serial correlation in the mean. That fact has previously been noted, for example, by Milhøj (1985, 1987), Diebold (1988), Stambaugh (1993), and Bollerslev and Mikkelsen (1996). However, none of the above seems to have commented on the fact (though it is implied by the formulas they use) that the test size goes to unity as the fourth moment approaches infinity, or noted the implications as here for OLS regression.

Finally, I observe that just checking for a difference between the OLS and the White standard errors will sometimes not be sufficient to detect these problems. The difference
between V̂ and Ṽ will be governed by the size of

Σ (s² − û²t) xt x′t.

White (1980) suggested a formal test of whether this magnitude is sufficiently small on the basis of an OLS regression of û²t on the vector ψt consisting of the unique elements of xt x′t. In the present case, ψt = (1, yt−1, y²t−1)′. White showed that, under the null hypothesis that the OLS standard errors are correct, T R² from a regression of û²t on ψt would have a χ²(2) distribution. The next-to-last row of each panel of Table 5.2 reports the fraction of samples for which this test would (correctly) reject the null hypothesis. It would miss about half the time in a sample as small as 100 observations but is more reliable for larger sample sizes.

Alternatively, one can look at Engle's (1982a, 1982b) analogous test for the null of homoskedasticity against the alternative of qth-order ARCH by looking at T R² from a regression of û²t on (1, û²t−1, û²t−2, . . . , û²t−q)′, which asymptotically has a χ²(q) distribution under the null. The last rows in Table 5.2 report the rejection frequency for this test using q = 3 lags. Not surprisingly, as this test is designed specifically for the ARCH class of alternatives whereas the White test is not, this test has a little more power. Its advantage over the White test for homoskedasticity is presumably greater in many macro applications in which xt includes a number of variables and their lags, in which case the vector ψt can become unwieldy, whereas the Engle test remains a simple χ²(q) regardless of the size of xt.

The philosophy of McARCH, then, is quite simple. The Engle T R² diagnostic should be calculated routinely in any macroeconomic analysis. If a violation of homoskedasticity is found, one should compare the OLS estimates with maximum likelihood to make sure that the inference is robust. The following sections illustrate the potential importance of doing so with two examples from applied macroeconomics.
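Both diagnostics are easy to compute from OLS residuals. The sketch below is a minimal, self-contained illustration of the two T R² statistics just described (not code from the chapter); resid is assumed to be a numpy array of OLS residuals and y_lag the lagged dependent variable that enters ψt.

```python
# Minimal sketch of the White and Engle TR^2 diagnostics described above.
import numpy as np
from scipy import stats

def tr2_test(dep, regressors, df):
    """Auxiliary regression of dep on regressors (constant already included);
    returns the T*R^2 statistic and its chi-square(df) p-value."""
    b = np.linalg.lstsq(regressors, dep, rcond=None)[0]
    fitted = regressors @ b
    r2 = 1 - np.sum((dep - fitted) ** 2) / np.sum((dep - dep.mean()) ** 2)
    tr2 = len(dep) * r2
    return tr2, 1 - stats.chi2.cdf(tr2, df)

def white_test(resid, y_lag):
    """Regress squared residuals on psi_t = (1, y_{t-1}, y_{t-1}^2); TR^2 ~ chi2(2)."""
    psi = np.column_stack([np.ones(len(y_lag)), y_lag, y_lag ** 2])
    return tr2_test(resid ** 2, psi, df=2)

def engle_arch_test(resid, q=3):
    """Regress squared residuals on their own q lags; TR^2 ~ chi2(q)."""
    u2 = resid ** 2
    lags = [u2[q - j - 1:len(u2) - j - 1] for j in range(q)]
    Z = np.column_stack([np.ones(len(u2) - q)] + lags)
    return tr2_test(u2[q:], Z, df=q)
```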
3. Application 1: Measuring market expectations of what the Federal Reserve is going to do next

My first example is adapted from Hamilton (2009). The Fed funds rate is a market-determined interest rate at which banks lend reserves to one another overnight. This interest rate is extremely sensitive to the supply of reserves created by the Fed, and in recent years monetary policy has been implemented in terms of a clearly announced target for the Fed funds rate that the Fed intends to achieve. A critical factor that determines how Fed actions affect the economy is expectations by the public as to what the Fed is going to do next, as discussed, for example, in my (Hamilton, 2009) paper. One natural place to look for an indication of what those expectations might be is the Fed funds futures market.

Let t = 1, 2, . . . , T index monthly observations. In the empirical results reported here, t = 1 corresponds to October 1988 and the last observation (T = 213) is June 2006. For each month, we're interested in what the market expects for the average effective Fed funds rate over that month, denoted rt. For the empirical estimates reported in this
section, rt is measured in basis points, so that for example rt = 525 corresponds to an annual interest rate of 5.25%. On any business day, one can enter into a futures contract through the Chicago Board of Trade whose settlement is based on what the value of rt+j actually turns out to be for some future month. The terms of a j-month-ahead contract traded on the last day of month t can be translated¹ into an interest rate ft(j) such that, if rt+j turns out to be less than ft(j), then the seller of the contract has to compensate the buyer a certain amount (specifically, $41.67 on a standard contract) for every basis point by which ft(j) exceeds rt+j. If ft(j) < rt+j, the buyer pays the seller. As ft(j) is known as of the end of month t but rt+j will not be known until the end of month t + j, the buyer of the contract is basically making a bet that rt+j will be less than ft(j). If the marginal market participant were risk neutral, it would be the case that

ft(j) = Et(rt+j)   (14)
where $E_t(\cdot)$ denotes the mathematical expectation on the basis of any information publicly available as of the last day of month t. If (14) holds, we could just look at the value of $f_t^{(j)}$ to infer what market participants expect the Federal Reserve to do in the coming months. However, previous investigators such as Sack (2004) and Piazzesi and Swanson (2008) have concluded that (14) does not hold. The simplest way to investigate this claim is to construct the forecast error implied by the one-month-ahead contract,

$u_t^{(1)} = r_t - f_{t-1}^{(1)}$

and test whether this error indeed has mean zero, as it should if (14) were correct. For contracts at longer horizons j > 1, one can look at the monthly change in contract terms,

$u_t^{(j)} = f_t^{(j-1)} - f_{t-1}^{(j)}.$

If (14) holds, then $u_t^{(j)}$ would also be a martingale difference sequence:

$u_t^{(j)} = E_t(r_{t+j-1}) - E_{t-1}(r_{t+j-1}).$

One simple test is then to perform the regression

$u_t^{(j)} = \mu^{(j)} + \varepsilon_t^{(j)}$

and test the null hypothesis that $\mu^{(j)} = 0$; this is of course just the usual t-test for a sample mean. Table 5.3 reports the results of this test using one-, two-, and three-month-ahead futures contracts. For the historical sample, the one-month-ahead futures contract $f_t^{(1)}$ overestimated the value of $r_{t+1}$ by an average of 2.66 basis points and $f_t^{(j)}$ overestimated the value of $f_{t+1}^{(j-1)}$ by almost 4 basis points. One interpretation is that there is a risk premium built into these contracts. Another possibility is that the market participants failed to recognize fully the chronic decline in interest rates over this period.

1 Specifically, if $P_t$ is the price of the contract agreed to by the buyer and seller on day t, then $f_t = 100 \times (100 - P_t)$.
Table 5.3. OLS estimates of bias in monthly fed funds futures forecast errors

Dependent variable ($u_t^{(j)}$)   Estimated mean ($\hat{\mu}^{(j)}$)   Standard error   OLS p value   ARCH(4) LM p value   Log likelihood
j = 1 month     −2.66   0.75   0.001   0.006   −812.61
j = 2 months    −3.17   1.06   0.003   0.204   −884.70
j = 3 months    −3.74   1.27   0.003   0.001   −922.80
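As a small illustration of the calculations summarized in Table 5.3, here is a hedged sketch (Python/NumPy; the function names, and the assumption that the monthly arrays are aligned so that element t of each array refers to month t, are mine) of forming the one-month-ahead forecast error $u_t^{(1)} = r_t - f_{t-1}^{(1)}$ from futures prices via $f_t = 100 \times (100 - P_t)$ and computing the usual t-test for a zero mean.

```python
import numpy as np

def fed_funds_forecast_errors(prices_1m, funds_rate_bp):
    """One-month-ahead forecast errors u_t = r_t - f_{t-1}, in basis points.

    prices_1m[t]     : price of the one-month-ahead contract at the end of month t
    funds_rate_bp[t] : realized average effective Fed funds rate in month t (basis points)
    """
    f = 100.0 * (100.0 - np.asarray(prices_1m, dtype=float))  # implied futures rate
    r = np.asarray(funds_rate_bp, dtype=float)
    return r[1:] - f[:-1]                                      # u_t = r_t - f_{t-1}

def mean_t_test(u):
    """Sample mean, its standard error, and the t statistic for a zero mean."""
    u = np.asarray(u, dtype=float)
    mu_hat = u.mean()
    se = u.std(ddof=1) / np.sqrt(len(u))
    return mu_hat, se, mu_hat / se
```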
Before putting too much credence in such interpretations, however, recall that the theory (14) implies that $u_t^{(j)}$ should be a martingale difference sequence but makes no claims about predictability of its variance. Figure 5.4 reveals that each of the series $u_t^{(j)}$ exhibits some clustering of volatility and a significant decline in variability over time, in addition to occasional very large outliers. Engle's $TR^2$ test for omitted fourth-order ARCH finds very strong evidence of conditional heteroskedasticity at least for $u_t^{(1)}$ and $u_t^{(3)}$; see Table 5.3. Hence if we are interested in a more accurate estimate of the bias
Fig. 5.4. Plots of one-month-ahead forecast errors ($u_t^{(j)}$) as a function of month t based on j = one-, two-, or three-month-ahead futures contracts (panels: one month, two month, three month; monthly observations, 1988–2006)
and statistical test of its significance, we might want to model these features of the data. Hamilton (2009) calculated maximum likelihood estimates for parameters of the following EGARCH specification (with (j) superscripts on all variables and parameters suppressed for ease of readability):

$u_t = \mu + \sqrt{h_t}\,\varepsilon_t$   (15)

$\log h_t - \gamma' z_t = \alpha(|\varepsilon_{t-1}| - k_2) + \delta(\log h_{t-1} - \gamma' z_{t-1})$   (16)

$z_t = (1, t/1{,}000)'$

$k_2 = E|\varepsilon_t| = \dfrac{2\sqrt{\nu}\,\Gamma[(\nu+1)/2]}{(\nu-1)\sqrt{\pi}\,\Gamma(\nu/2)}$

for $\varepsilon_t$ a Student's t variable with ν degrees of freedom and $\Gamma(\cdot)$ the gamma function: $\Gamma(s) = \int_0^\infty x^{s-1} e^{-x}\,dx$.

The log likelihood is then found from

$\sum_{t=1}^{T} \log f(u_t \mid U_{t-1}; \theta)$   (17)

$f(u_t \mid U_{t-1}, \theta) = (k_1/\sqrt{h_t})\,[1 + (\varepsilon_t^2/\nu)]^{-(\nu+1)/2}$

$k_1 = \Gamma[(\nu+1)/2]/[\Gamma(\nu/2)\sqrt{\nu\pi}].$

Given numerical values for the parameter vector $\theta = (\mu, \gamma', \alpha, \delta, \nu)'$ and observed data $U_T = (u_1, u_2, \ldots, u_T)$, we can then begin the iteration (16) for t = 1 by setting $h_1 = \exp(\gamma' z_0)$. Plugging this into (15) gives us a value for $\varepsilon_1$, which from (16) gives us the number for $h_2$. Iterating in this fashion gives the sequence $\{h_t, \varepsilon_t\}_{t=1}^{T}$ from which the log likelihood (17) can be evaluated for the specified numerical value of θ. One then tries another guess for θ in order to numerically maximize the likelihood function. Asymptotic standard errors can be obtained from numerical second derivatives of the log likelihood as in Hamilton (1994, equation [5.8.3]).

Maximum likelihood parameter estimates are reported in Table 5.4. Adding these features provides an overwhelming improvement in fit, with a likelihood ratio test statistic well in excess of 100 when adding just four parameters to a simple Gaussian specification with constant variance. The very low estimated degrees of freedom results from the big outliers in the data, and both the serial dependence (δ) and trend parameter (γ₂) for the variance are extremely significant. A very remarkable result is that the estimates for the mean of the forecast error μ actually switch signs, shrink by an order of magnitude, and become far from statistically significant. Evidently the sample means of $u_t^{(j)}$ are more influenced by negative outliers and observations early in the sample than they should be.
Table 5.4. Maximum likelihood estimates (asymptotic standard errors in parentheses) for EGARCH model of Fed funds futures forecast errors

Horizon (j)                          $u_t^{(1)}$       $u_t^{(2)}$       $u_t^{(3)}$
Mean (μ)                             0.12 (0.24)       0.43 (0.34)       0.27 (0.67)
Log average variance (γ₁)            5.73 (0.42)       6.47 (0.51)       7.01 (0.54)
Trend in variance (γ₂)               −22.7 (3.1)       −23.6 (3.3)       −17.1 (3.8)
$|u_{t-1}|$ (α)                      0.18 (0.07)       0.15 (0.07)       0.30 (0.12)
$\log h_{t-1}$ (δ)                   0.63 (0.16)       0.74 (0.22)       0.84 (0.11)
Student's t degrees of freedom (ν)   2.1 (0.4)         2.2 (0.4)         4.1 (1.2)
Log likelihood                       −731.08           −793.38           −860.16
Note that for this example, the problem is not adequately addressed by simply replacing OLS standard errors with White standard errors: when the regressors consist only of a constant term, the two would be identical. Moreover, whenever, as here, there is an affirmative objective of obtaining accurate estimates of a parameter (the possible risk premium incorporated in these prices), as opposed solely to testing a hypothesis, the concern is with the quality of the coefficient estimate itself rather than the correct size of a hypothesis test.
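For readers who want to see the mechanics of the recursion described above, here is a rough Python sketch of evaluating the log likelihood (17) for the EGARCH-t specification (15)–(16). The parameter ordering, the function name, and the convention used for $z_t$ at t = 0 are my own illustrative assumptions; maximizing this function numerically (for example, by handing its negative to a general-purpose optimizer) is what produces estimates of the kind reported in Table 5.4.

```python
import numpy as np
from math import lgamma, log, pi, sqrt

def egarch_t_loglik(theta, u):
    """Log likelihood (17) for the EGARCH model (15)-(16) with Student's t errors.

    theta = (mu, gamma1, gamma2, alpha, delta, nu); u is the series of forecast errors.
    A sketch of the recursion described in the text, not a production implementation.
    """
    mu, g1, g2, alpha, delta, nu = theta
    u = np.asarray(u, dtype=float)
    T = len(u)

    # k1 and k2 as defined in the text
    log_k1 = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    k2 = 2 * sqrt(nu) * np.exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / ((nu - 1) * sqrt(pi))

    def gz(t):                         # gamma' z_t with z_t = (1, t/1000)'
        return g1 + g2 * t / 1000.0

    loglik = 0.0
    log_h = gz(0)                      # start the recursion with h_1 = exp(gamma' z_0)
    for tau in range(1, T + 1):        # tau indexes observations 1, ..., T
        eps = (u[tau - 1] - mu) / np.exp(0.5 * log_h)
        loglik += log_k1 - 0.5 * log_h - ((nu + 1) / 2) * np.log1p(eps ** 2 / nu)
        # (16): log h_{tau+1} - gamma' z_{tau+1} = alpha(|eps_tau| - k2) + delta(log h_tau - gamma' z_tau)
        log_h = gz(tau + 1) + alpha * (abs(eps) - k2) + delta * (log_h - gz(tau))
    return loglik
```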
4. Application 2: Using the Taylor Rule to summarize changes in Federal Reserve policy

One of the most influential papers for both macroeconomic research and policy over the last decade has been John Taylor's (1993) proposal of a simple rule that the central bank should follow in setting an interest rate like the Fed funds rate $r_t$. Taylor's proposal called for the Fed to raise the interest rate by an amount governed by a parameter $\psi_1$ when the observed inflation rate $\pi_t$ is higher than it wishes (so as to bring inflation back down), and to raise the interest rate by an amount governed by $\psi_2$ when $y_t$, the gap between real GDP and its potential value, is positive:

$r_t = \psi_0 + \psi_1 \pi_t + \psi_2 y_t$

In this equation, the value of $\psi_0$ reflects factors such as the Fed's long-run inflation target and the equilibrium real interest rate. There are a variety of ways such an expression has been formulated in practice, such as "forward-looking" specifications, in which the Fed is responding to what it expects to happen next to inflation and output, and "backward-looking" specifications, in which lags are included to capture expectations formation and adjustment dynamics.

A number of studies have looked at the way that the coefficients in such a relation may have changed over time, including Judd and Rudebusch (1998), Clarida, Galí and Gertler (2000), Jalil (2004), and Boivin and Giannoni (2006). Of particular interest has been the claim that the coefficient on inflation $\psi_1$ has increased relative to the 1970s, and that this increased willingness on the part of the Fed to fight inflation has been a factor helping to make the US economy become more stable. In this chapter, I will explore the
variant investigated by Judd and Rudebusch, whose reduced-form representation is

$\Delta r_t = \gamma_0 + \gamma_1 \pi_t + \gamma_2 y_t + \gamma_3 y_{t-1} + \gamma_4 r_{t-1} + \gamma_5 \Delta r_{t-1} + v_t.$   (18)

Here t = 1, 2, ..., T now will index quarterly data, with t = 1 in my sample corresponding to 1956:Q1 and T = 205 corresponding to 2007:Q1. The value of $r_t$ for a given quarter is the average of the three monthly series for the effective Fed funds rate, with $\Delta r_t = r_t - r_{t-1}$, and for empirical results here is reported as percent rather than basis points, e.g., $r_t = 5.25$ when the average Fed funds rate over the three months of the quarter is 5.25%. Inflation $\pi_t$ is measured as 100 times the difference between the natural logarithm of the implicit GDP deflator for quarter t and the natural logarithm of its value for the corresponding quarter of the preceding year, with data taken from Bureau of Economic Analysis Table 1.1.9. As in Judd and Rudebusch, the output gap $y_t$ was calculated as

$y_t = \dfrac{100\,(Y_t - Y_t^*)}{Y_t^*}$

for $Y_t$ the level of real GDP (in billions of chained 2000 dollars, from BEA Table 1.1.6) and $Y_t^*$ the series for potential GDP from the Congressional Budget Office (obtained from the St. Louis FRED database). Judd and Rudebusch focused on certain rearrangements of the parameters in (18), though here I will simply report results in terms of the reduced-form estimates themselves. The term $v_t$ in (18) is the regression error.

Table 5.5 presents results from OLS estimation of (18) using the full sample of data. Of particular interest are $\gamma_1$ and $\gamma_2$, the contemporary responses to inflation and output, respectively. Table 5.6 then re-estimates the relation, allowing for separate coefficients since 1979:Q3, when Paul Volcker became Chair of the Federal Reserve. The OLS results reproduce the findings of the many researchers noted above that monetary policy seems to have responded much more vigorously to disturbances since 1979, with the inflation coefficient $\gamma_1$ increasing by 0.26 and the output coefficient $\gamma_2$ increasing by 0.64. However, the White standard errors for the coefficients on $d_t \pi_t$ and $d_t y_t$ are almost twice as large as the OLS standard errors, and suggest that the increased response to inflation is in fact not statistically significant and the increased response to output is measured very imprecisely.
Table 5.5. Fixed-coefficient Taylor Rule as estimated from full sample OLS regression

Regressor                         Coefficient     Std error (OLS)    Std error (White)
Constant                          0.06            0.13               0.18
$\pi_t$                           0.13            0.04               0.06
$y_t$                             0.37            0.07               0.11
$y_{t-1}$                         −0.27           0.07               0.10
$r_{t-1}$                         −0.08           0.03               0.03
$\Delta r_{t-1}$                  0.14            0.07               0.15
$TR^2$ for ARCH(4) (p value)      23.94 (0.000)
Log likelihood                    −252.26
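The OLS-versus-White comparison in Tables 5.5 and 5.6 can be reproduced with a few lines of linear algebra. The sketch below (Python/NumPy; names are illustrative, and it assumes the regressor matrix for (18) has already been assembled from the inflation, output gap, and lagged interest rate series described above) returns the coefficient estimates together with both conventional and heteroskedasticity-consistent standard errors.

```python
import numpy as np

def ols_with_white_se(y, X):
    """OLS estimates with conventional and White (1980) standard errors.

    y : (T,) dependent variable, e.g. the change in the Fed funds rate
    X : (T, k) regressor matrix, e.g. [1, pi_t, y_t, y_{t-1}, r_{t-1}, dr_{t-1}]
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    s2 = u @ u / (T - k)
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))                  # conventional OLS
    meat = (X * (u ** 2)[:, None]).T @ X                     # sum of u_t^2 x_t x_t'
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))    # heteroskedasticity-consistent
    return beta, se_ols, se_white
```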
Table 5.6. Taylor Rule with separate pre- and post-Volcker parameters as estimated by OLS regression ($d_t$ = 1 for t > 1979:Q2)

Regressor                         Coefficient     Std error (OLS)    Std error (White)
constant                          0.37            0.19               0.19
$\pi_t$                           0.17            0.07               0.04
$y_t$                             0.18            0.08               0.07
$y_{t-1}$                         −0.07           0.08               0.07
$r_{t-1}$                         −0.21           0.07               0.06
$\Delta r_{t-1}$                  0.42            0.11               0.13
$d_t$                             −0.50           0.24               0.30
$d_t \pi_t$                       0.26            0.09               0.16
$d_t y_t$                         0.64            0.14               0.24
$d_t y_{t-1}$                     −0.55           0.14               0.21
$d_t r_{t-1}$                     0.05            0.08               0.08
$d_t \Delta r_{t-1}$              −0.53           0.13               0.24
$TR^2$ for ARCH(4) (p value)      45.45 (0.000)
Log likelihood                    −226.80
Moreover, Engle's LM test for the null of Gaussian errors with no heteroskedasticity against the alternative of fourth-order ARCH leads to overwhelming rejection of the null hypothesis.2 All of which suggests that, if we are indeed interested in measuring the magnitudes by which these coefficients have changed, it is preferable to adjust not just the standard errors but the parameter estimates themselves in light of the dramatic ARCH displayed in the data. I therefore estimated the following GARCH-t generalization of (18):

$y_t = x_t' \beta + v_t$

$v_t = \sqrt{h_t}\,\varepsilon_t$

$h_t = \kappa + \tilde{h}_t$   (19)

$\tilde{h}_t = \alpha(v_{t-1}^2 - \kappa) + \delta \tilde{h}_{t-1}$

with $\varepsilon_t$ a Student's t random variable with ν degrees of freedom. Iteration on (19) is initialized with $\tilde{h}_1 = 0$. The log likelihood is then evaluated exactly as in (17). Maximum likelihood estimates are reported in Table 5.7. Once again generalizing a homoskedastic Gaussian specification is overwhelmingly favored by the data, with a comparison of the specifications in Tables 5.6 and 5.7 producing a likelihood ratio $\chi^2(4)$ statistic of 183.34. The degrees of freedom for the Student's t distribution are only 2.29, and the implied GARCH process is highly persistent ($\hat{\alpha} + \hat{\delta} = 0.82$). Of particular interest is the fact that the changes in the Fed's response to inflation and output are now considerably smaller than suggested by the OLS

2 Siklos and Wohar (2005) also make this point.
Table 5.7. Taylor Rule with separate pre- and post-Volcker parameters as estimated by GARCH-t maximum likelihood ($d_t$ = 1 for t > 1979:Q2)

Regressor                         Coefficient     Asymptotic std error
constant                          0.13            0.08
$\pi_t$                           0.06            0.03
$y_t$                             0.14            0.03
$y_{t-1}$                         −0.12           0.03
$r_{t-1}$                         −0.07           0.03
$\Delta r_{t-1}$                  0.47            0.09
$d_t$                             −0.03           0.12
$d_t \pi_t$                       0.09            0.04
$d_t y_t$                         0.05            0.07
$d_t y_{t-1}$                     0.02            0.07
$d_t r_{t-1}$                     −0.01           0.03
$d_t \Delta r_{t-1}$              −0.01           0.11
GARCH parameters
constant                          0.015           0.010
α                                 0.11            0.05
δ                                 0.71            0.07
ν                                 2.29            0.48
Log likelihood                    −135.13
estimates. The change in γ₁ is now estimated to be only 0.09 and the change in γ₂ has dropped to 0.05 and no longer appears to be statistically significant.

Figure 5.5 offers some insight into what produces these results. The top panel illustrates the tendency for interest rates to exhibit much more volatility at some times than others, with the 1979:Q2–1982:Q3 episode particularly dramatic. The bottom panel plots observations on the pairs $(y_t, \Delta r_t)$ in the second half of the sample. The apparent positive slope in that scatter plot is strongly influenced by the observations in the 1979–1982 period. If one allowed the possibility of serial dependence in the squared residuals, one would give less weight to the 1979–1982 observations, resulting in a flatter slope estimate over 1979–2007 relative to OLS.

This is not to attempt to overturn the conclusion of earlier researchers that there has been a change in Fed policy in the direction of a more active policy. A comparison of the changing-parameter specification of Table 5.7 with a fixed-parameter GARCH specification produces a $\chi^2(4)$ likelihood ratio statistic of 18.22, which is statistically significant with a p value of 0.001. Nevertheless, the magnitude of this change appears to be substantially smaller than one would infer on the basis of OLS estimates of the parameters.

Nor is this discussion meant to displace the large and thoughtful literature on possible changes in the Taylor Rule, which has raised a number of other substantive issues not explored here. These include whether one wants to use real-time or subsequent revised data (Orphanides, 2001), the distinction between the "backward-looking" Taylor Rule
explored here and "forward-looking" specifications (Clarida, Galí, and Gertler, 2000), and continuous evolution of parameters rather than a sudden break (Jalil, 2004; Boivin, 2006). The simple exercise undertaken nevertheless does in my mind establish the potential importance for macroeconomists to check for the presence of ARCH even when their primary interest is in the conditional mean.

Fig. 5.5. Change in Fed funds rate for the full sample (1956:Q2–2007:Q1), and scatter plot for later subsample (1979:Q2–2007:Q1) of change in Fed funds rate against deviation of GDP from potential
5. Conclusions

The reader may note that both of the examples I have used to illustrate the potential relevance of McARCH use the Fed funds rate as the dependent variable. This is not entirely an accident. Although Kilian and Gonçalves (2004) concluded that most macro series exhibit some ARCH, the Fed funds rate may be the macro series for which one is most likely to observe wild outliers and persistent volatility clustering, regardless of the data frequency or subsample. It is nevertheless, as the examples used here illustrate, a series that features very importantly for some of the most fundamental questions in macroeconomics.
The rather dramatic changes in inference that resulted from accounting for outliers and ARCH in these examples presumably would not be repeated for every macroeconomic relation estimated. However, routinely checking something like a $TR^2$ statistic, or the difference between OLS and White standard errors, seems a relatively costless and potentially quite beneficial habit. And the assumption by many practitioners that we can avoid all these problems simply by always relying on the White standard errors may not represent best possible practice.
6

Macroeconomic Volatility and Stock Market Volatility, World-Wide

Francis X. Diebold and Kamil Yilmaz
1. Introduction

The financial econometrics literature has been strikingly successful at measuring, modeling, and forecasting time-varying return volatility, contributing to improved asset pricing, portfolio management, and risk management, as surveyed for example in Andersen, Bollerslev, Christoffersen and Diebold (2006a, 2006b). Much of the financial econometrics of volatility is of course due to Rob Engle, starting with the classic contribution of Engle (1982a).

Interestingly, the subsequent financial econometric volatility literature, although massive, is largely silent on the links between asset return volatility and its underlying determinants. Instead, one typically proceeds in reduced-form fashion, modeling and forecasting volatility but not modeling or forecasting the effects of fundamental macroeconomic developments.1 In particular, the links between asset market volatility and fundamental

Acknowledgments: We gratefully dedicate this paper to Rob Engle on the occasion of his 65th birthday. The research was supported by the Guggenheim Foundation, the Humboldt Foundation, and the National Science Foundation. For outstanding research assistance we thank Chiara Scotti and Georg Strasser. For helpful comments we thank the Editor and Referee, as well as Joe Davis, Aureo DePaula, Jonathan Wright, and participants at the Penn Econometrics Lunch, the Econometric Society 2008 Winter Meetings in New Orleans, and the Engle Festschrift Conference.

1 The strongly positive volatility-volume correlation has received attention, as in Clark (1973), Tauchen and Pitts (1983), and many others, but that begs the question of what drives volume, which again remains largely unanswered.
volatility remain largely unstudied; effectively, asset market volatility is modeled in isolation of fundamental volatility.2

Ironically, although fundamental volatility at business cycle frequencies has been studied recently, as for example in Ramey and Ramey (1995) and several of the papers collected in Pinto and Aizenman (2005), that literature is largely macroeconomic, focusing primarily on the link between fundamental volatility and subsequent real growth.3 Hence the links between fundamental volatility and asset market volatility again remain largely unstudied; fundamental volatility is modeled in isolation of asset market volatility.

Here we focus on stock market volatility. The general failure to link macroeconomic fundamentals to asset return volatility certainly holds true for the case of stock returns. There are few studies attempting to link underlying macroeconomic fundamentals to stock return volatility, and the studies that do exist have been largely unsuccessful. For example, in a classic and well-known contribution using monthly data from 1857 to 1987, Schwert (1989) attempts to link stock market volatility to real and nominal macroeconomic volatility, economic activity, financial leverage, and stock trading activity. He finds very little. Similarly and more recently, using sophisticated regime-switching econometric methods for linking return volatility and fundamental volatility, Calvet, Fisher and Thompson (2006) also find very little. The only robust finding seems to be that the stage of the business cycle affects stock market volatility; in particular, stock market volatility is higher in recessions, as found by and echoed in Schwert (1989) and Hamilton and Lin (1996), among others.

In this chapter we provide an empirical investigation of the links between fundamental volatility and stock market volatility. Our exploration is motivated by financial economic theory, which suggests that the volatility of real activity should be related to stock market volatility, as in Shiller (1981) and Hansen and Jagannathan (1991).4 In addition, and crucially, our empirical approach exploits cross-sectional variation in fundamental and stock market volatilities to uncover links that would likely be lost in a pure time series analysis.

This chapter is part of a nascent literature that explores the links between macroeconomic fundamentals and stock market volatility. Engle and Rangel (2008) is a prominent example. Engle and Rangel propose a spline-GARCH model to isolate low-frequency volatility, and they use the model to explore the links between macroeconomic fundamentals and low-frequency volatility.5 Engle, Ghysels and Sohn (2006) is another interesting example, blending the spline-GARCH approach with the mixed data sampling (MIDAS) approach of Ghysels, Santa-Clara, and Valkanov (2005). The above-mentioned Engle

2 By "fundamental volatility," we mean the volatility of underlying real economic fundamentals. From the vantage point of a single equity, this would typically correspond to the volatility of real earnings or dividends. From the vantage point of the entire stock market, it would typically correspond to the volatility of real GDP or consumption.

3 Another strand of macroeconomic literature, including for example Levine (1997), focuses on the link between fundamental volatility and financial market development. Hence, although related, it too misses the mark for our purposes.
4 Hansen and Jagannathan provide an inequality between the "Sharpe ratios" for the equity market and the real fundamental and hence implicitly link equity volatility and fundamental volatility, other things equal.

5 Earlier drafts of our paper were completed contemporaneously with and independently of Engle and Rangel.
et al. macro-volatility literature, however, focuses primarily on dynamics, whereas in this chapter we focus primarily on the cross-section, as we now describe.
2. Data

Our goal is to elucidate the relationship, if any, between real fundamental volatility and real stock market volatility in a broad cross-section of countries. To do so, we ask whether time-averaged fundamental volatility appears linked to time-averaged stock market volatility. We now describe our data construction methods in some detail; a more detailed description, along with a complete catalog of the underlying data and sources, appears in the Appendix.
2.1. Fundamental and stock market volatilities

First consider the measurement of fundamental volatility. We use data on real GDP and real personal consumption expenditures (PCE) for many countries. The major source for both variables is the World Development Indicators (WDI) of the World Bank. We measure fundamental volatility in two ways. First, we calculate it as the standard deviation of GDP (or consumption) growth, which is a measure of unconditional fundamental volatility. Alternatively, following Schwert (1989), we use residuals from an AR(3) model fit to GDP or consumption growth. This is a measure of conditional fundamental volatility, or put differently, a measure of the volatility of innovations to fundamentals.6

Now consider stock market volatility. We parallel our above-discussed approach to fundamental volatility, using the major stock index series from the IMF's International Financial Statistics (IFS). Stock indices are not available for some countries and periods. For those countries we obtain data from alternative sources, among which are Datastream, the Standard and Poors Emerging Markets Database, and the World Federation of Exchanges. Finally, using consumer price index data from the IFS, we convert to real stock returns. We measure real stock market volatility in identical fashion to fundamental volatility, calculating both unconditional and conditional versions. Interestingly, the AR(3) coefficients are statistically significant for a few developing countries, which have small and illiquid stock markets.7

6 The latter volatility measure is more relevant for our purposes, so we focus on it for the remainder of this chapter. The empirical results are qualitatively unchanged, however, when we use the former measure.

7 Again, however, we focus on the conditional version for the remainder of this chapter.
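To fix ideas, the following minimal sketch (Python/NumPy; the function name and the degrees-of-freedom convention are mine) computes the conditional volatility measure described above: the standard deviation of residuals from an AR(3) fitted to a growth series.

```python
import numpy as np

def ar_residual_volatility(growth, p=3):
    """Volatility of innovations to fundamentals: std. dev. of AR(p) residuals."""
    g = np.asarray(growth, dtype=float)
    y = g[p:]
    X = np.column_stack([np.ones(len(y))] + [g[p - j:-j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid.std(ddof=1)   # conditional (innovation) volatility of the growth series
```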
2.2. On the choice of sample period

Our empirical analysis requires data on four time series for each country: real GDP, real consumption expenditures, stock market returns and consumer price inflation. In terms of data availability, countries fall into three groups. The first group is composed
of mostly industrial countries, with data series available for all four variables from the 1960s onward. The second group of countries is composed mostly of developing countries. In many developing countries, stock markets became an important means of raising capital only in the 1990s; indeed, only a few of the developing countries had active stock markets before the mid-1980s. Hence the second group has shorter available data series, especially for stock returns.

One could of course deal with the problems of the second group simply by discarding it, relying only on the cross-section of industrialized countries. Doing so, however, would radically reduce cross-sectional variation, producing potentially severe reductions in statistical efficiency. Hence we use all countries in the first and second groups, but we start our sample in 1983, reducing the underlying interval used to calculate volatilities to 20 years.

The third group of countries is composed mostly of the transition economies and some African and Asian developing countries, for which stock markets became operational only in the 1990s. As a result, we can include these countries only if we construct volatilities using roughly a 10-year interval of underlying data. Switching from a 20-year to a 10-year interval, the number of countries in the sample increases from around 40 to around 70 (which is good), but using a 10-year interval produces much noisier volatility estimates (which is bad). We feel that, on balance, the bad outweighs the good, so we exclude the third group of countries from our basic analysis, which is based on underlying annual data. However, and as we will discuss, we are able to base some of our analyses on underlying quarterly data, and in those cases we include some of the third group of countries.

In closing this subsection, we note that, quite apart from the fact that data limitations preclude use of pre-1980s data, use of such data would probably be undesirable even if it were available. In particular, the growing literature on the "Great Moderation" – decreased variation of output around trend in industrialized countries, starting in the early 1980s – suggests the appropriateness of starting our sample in the early 1980s, so we take 1983–2002 as our benchmark sample.8 Estimating fundamental volatility using both pre- and post-1983 data would mix observations from the high and low fundamental volatility eras, potentially producing distorted inference.
3. Empirical results

Having described our data and choice of benchmark sample, we now proceed with the empirical analysis, exploring the relationship between stock market volatility and fundamental volatility in a broad cross-section covering approximately 40 countries.

8 On the "Great Moderation" in developed countries, see Kim and Nelson (1999a), McConnell and Perez-Quiros (2000) and Stock and Watson (2002b). Evidence for fundamental volatility moderation in developing countries also exists, although it is more mixed. For example, Montiel and Serven (2006) report a decline in GDP growth volatility from roughly 4% in the 1970s and 1980s to roughly 3% in the 1990s. On the other hand, Kose, Prasad, and Terrones (2006) find that developing countries experience increases in consumption volatility following financial liberalization, and many developing economies have indeed liberalized in recent years.
3.1. Distributions of volatilities in the cross-section

We begin in Figure 6.1 by showing kernel density estimates of the cross-country distributions of fundamental volatility and stock return volatility. The densities indicate wide dispersion in volatilities across countries. Moreover, the distributions tend to be right-skewed, as developing countries often have unusually high volatility. The log transformation largely reduces the right skewness; hence we work with log volatilities from this point onward.9
3.2. The basic relationship

We present our core result in Figure 6.2, which indicates a clear positive relationship between stock return and GDP volatilities, as summarized by the scatterplot of stock market volatility against GDP volatility, together with a fitted nonparametric regression curve.10 The fitted curve, moreover, appears nearly linear. (A fitted linear regression gives a slope coefficient of 0.38 with a robust t-statistic of 4.70, and an adjusted R² of 0.26.) When we swap consumption for GDP, the positive relationship remains, as shown in Figure 6.3, although it appears less linear. In any event, the positive cross-sectional relationship between stock market volatility and fundamental volatility contrasts with Schwert's (1989) earlier-mentioned disappointing results for the US time series.
3.3. Controlling for the level of initial GDP

Inspection of the country acronyms in Figures 6.2 and 6.3 reveals that both stock market and fundamental volatilities are higher in developing (or newly industrializing) countries. Conversely, industrial countries cluster toward low stock market and fundamental volatility. This dependence of volatility on stage of development echoes the findings of Koren and Tenreyro (2007) and has obvious implications for the interpretation of our results. In particular, is it a development story, or is there more? That is, is the apparent positive dependence between stock market volatility and fundamental volatility due to common positive dependence of fundamental and stock market volatilities on a third variable, stage of development, or would the relationship exist even after controlling for stage of development?

To explore this, we follow a two-step procedure. In the first step, we regress all variables on initial GDP per capita, to remove stage-of-development effects (as proxied by initial GDP). In the second step, we regress residual stock market volatility on residual fundamental volatility. In Figures 6.4–6.6 we display the first-step regressions, which are of independent interest, providing a precise quantitative summary of the dependence of all variables (stock market volatility, GDP volatility and consumption volatility) on initial GDP per capita. The dependence is clearly negative, particularly if we discount the distortions to the basic relationships caused by India and Pakistan, which have very low

9 The approximate log-normality of volatility in the cross-section parallels the approximate unconditional log-normality documented in the time series by Andersen, Bollerslev, Diebold and Ebens (2001).

10 We use the LOWESS locally weighted regression procedure of Cleveland (1979).
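A stripped-down version of the two-step procedure just described might look as follows. For transparency the sketch uses simple linear first- and second-step regressions rather than the LOWESS fits used in the chapter, and all array names (per-country log volatilities and log 1983 GDP per capita) are hypothetical.

```python
import numpy as np

def residualize(y, x):
    """Residuals from a regression of y on a constant and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def second_step_slope(log_rv, log_fv, log_gdp0):
    """Step 1: strip initial-GDP effects from both volatilities.
    Step 2: regress residual stock market volatility on residual fundamental volatility."""
    rv_res = residualize(log_rv, log_gdp0)
    fv_res = residualize(log_fv, log_gdp0)
    X = np.column_stack([np.ones(len(fv_res)), fv_res])
    beta, *_ = np.linalg.lstsq(X, rv_res, rcond=None)
    return beta[1]   # slope after controlling for stage of development
```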
Fig. 6.1. Kernel density estimates, volatilities and fundamentals, 1983–2002 (panels: real stock return volatility, real GDP growth volatility, and real PCE growth volatility, each in levels and in logs, with density on the vertical axis)
Note: We plot kernel density estimates of real stock return volatility (using data for 43 countries), real GDP growth volatility (45 countries), and real consumption growth volatility (41 countries), in both levels and logs. All volatilities are standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002. For comparison we also include plots of bestfitting normal densities (dashed).
Fig. 6.2. Real stock return volatility and real GDP growth volatility, 1983–2002
Note: We show a scatterplot of real stock return volatility against real GDP growth volatility, with a nonparametric regression fit superimposed, for 43 countries. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002.
Fig. 6.3. Real stock return volatility and real PCE growth volatility, 1983–2002
Note: We show a scatterplot of real stock return volatility against real consumption growth volatility, with a nonparametric regression fit superimposed, for 39 countries. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002.
Fig. 6.4. Real stock return volatility and initial real GDP per capita, 1983–2002
Note: We show a scatterplot of real stock return volatility against initial (1983) real GDP per capita, with a nonparametric regression fit superimposed, for 43 countries. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002.
Fig. 6.5. Real GDP growth volatility and initial GDP per capita, 1983–2002
Note: We show a scatterplot of real GDP growth volatility against initial (1983) real GDP per capita, with a nonparametric regression fit superimposed, for 45 countries. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002. The number of countries is two more than in Figure 2 because we include Uruguay and Denmark here, whereas we had to exclude them from Figure 2 due to missing stock return data.
Fig. 6.6. Real PCE growth volatility and initial GDP per capita, 1983–2002
Note: We show a scatterplot of real consumption growth volatility against initial (1983) real GDP per capita, with a nonparametric regression fit superimposed, for 41 countries. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002. The number of countries is two more than in Figure 3 because we include Uruguay and Denmark here, whereas we had to exclude them from Figure 3 due to missing stock return data.
initial GDP per capita, yet relatively low stock market, and especially fundamental, volatility. We display second-step results for the GDP fundamental in Figure 6.7. The fitted curve is basically flat for low levels of GDP volatility, but it clearly becomes positive as GDP volatility increases. A positive relationship also continues to obtain when we switch to the consumption fundamental, as shown in Figure 6.8. Indeed the relationship between stock market volatility and consumption volatility would be stronger after controlling for initial GDP if we were to drop a single and obvious outlier (Philippines), which distorts the fitted curve at low levels of fundamental volatility, as Figure 6.8 makes clear.
4. Variations and extensions

Thus far we have studied stock market and fundamental volatility using underlying annual data, 1983–2002. Here we extend our analysis in two directions. First, we incorporate higher frequency data when possible (quarterly for GDP and monthly, aggregated to quarterly, for stock returns). Second, we use the higher frequency data in a panel-data framework to analyze the direction of causality between stock market and fundamental volatility.
Fig. 6.7. Real stock return volatility and real GDP growth volatility, 1983–2002, controlling for initial GDP per capita

Note: We show a scatterplot of real stock return volatility against real GDP growth volatility with a nonparametric regression fit superimposed, for 43 countries, controlling for the effects of initial GDP per capita via separate first-stage nonparametric regressions of each variable on 1983 GDP per capita. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002.
Fig. 6.8. Real stock return volatility and real PCE growth volatility, 1983–2002, controlling for initial GDP per capita

Note: We show a scatterplot of real stock return volatility against real consumption growth volatility with a nonparametric regression fit superimposed, for 39 countries, controlling for the effects of initial GDP per capita via separate first-stage nonparametric regressions of each variable on 1983 GDP per capita. All volatilities are log standard deviations of residuals from AR(3) models fitted to annual data, 1983–2002.
Fig. 6.9. Real stock return volatility and real GDP growth volatility, 1999.1–2003.3
Note: We show a scatterplot of real stock return volatility against real GDP growth volatility, with a nonparametric regression fit superimposed, for 40 countries. All volatilities are log standard deviations of residuals from AR(4) models fitted to quarterly data, 1999.1–2003.3.
4.1. Cross-sectional analysis based on underlying quarterly data

As noted earlier, the quality of developing-country data starts to improve in the 1980s. In addition, the quantity improves, with greater availability and reliability of quarterly GDP data. We now use that quarterly data 1984.1 to 2003.3, constructing and examining volatilities over four five-year spans: 1984.1–1988.4, 1989.1–1993.4, 1994.1–1998.4, and 1999.1–2003.3. The number of countries increases considerably as we move through the four periods.

Hence let us begin with the fourth period, 1999.1–2003.3. We show in Figure 6.9 the fitted regression of stock market volatility on GDP volatility. The relationship is still positive; indeed it appears much stronger than the one discussed earlier, based on annual data 1983–2002 and shown in Figure 6.2. Perhaps this is because the developing-country GDP data have become less noisy in recent times.

Now let us consider the other periods. We obtained qualitatively identical results when repeating the analysis of Figure 6.9 for each of the three earlier periods: stock market volatility is robustly and positively linked to fundamental volatility. To summarize those results compactly, we show in Figure 6.10 the regression fitted to all the data, so that, for example, a country with data available for all four periods has four data points in the figure. The positive relationship between stock market and fundamental volatility is clear.11

11 Two outliers on the left (corresponding to Spain in the first two windows) distort the fitted curve and should be discounted.
Fig. 6.10. Real stock return volatility and real GDP growth volatility, 1984.1–2003.3

Note: We show a scatterplot of real stock return volatility against real GDP growth volatility, with a nonparametric regression fit superimposed, for 43 countries. All volatilities are log standard deviations of residuals from AR(4) models fitted to quarterly data over four consecutive five-year windows (1984.1–1988.4, 1989.1–1993.4, 1994.1–1998.4, 1999.1–2003.3).
4.2. Panel analysis of causal direction

Thus far we have intentionally and exclusively emphasized the cross-sectional relationship between stock market and fundamental volatility, and we found that the two are positively related. However, economics suggests not only correlation between fundamentals and stock prices, and hence from fundamental volatility to stock market volatility, but also (Granger) causation.12 Hence in this subsection we continue to exploit the rich dispersion in the cross-section, but we no longer average out the time dimension; instead, we incorporate it explicitly via a panel analysis. Moreover, we focus on a particular panel analysis that highlights the value of incorporating cross-sectional information relative to a pure time series analysis.

In particular, we follow Schwert's (1989) two-step approach to obtain estimates of time-varying quarterly stock market and GDP volatilities, country-by-country, and then we test causal hypotheses in a panel framework that facilitates pooling of the cross-country data. Briefly, Schwert's approach proceeds as follows. In the first step, we fit autoregressions to stock market returns and GDP, and we take absolute values of the associated residuals, which are effectively (crude) quarterly realized volatilities of stock market and fundamental innovations, in the jargon of Andersen, Bollerslev, Diebold and Ebens (2001).

12 There may of course also be bi-directional causality (feedback).
In the second stage, we transform away from realized volatilities and toward conditional volatilities by fitting autoregressions to those realized volatilities, and keeping the fitted values. We repeat this for each of the 46 countries. We analyze the resulting 46 pairs of stock market and fundamental volatilities in two ways. The first follows Schwert and exploits only time series variation, estimating a separate VAR model for each country and testing causality. The results, which are not reported here, mirror Schwert’s, failing to identify causality in either direction in the vast majority of countries. The second approach exploits cross-sectional variation along with time series variation. We simply pool the data across countries, allowing for fixed effects. First we estimate a fixed-effects model with GDP volatility depending on three lags of itself and three lags of stock market volatility, which we use to test the hypothesis that stock market volatility does not Granger cause GDP volatility. Next we estimate a fixed-effects model with stock market volatility depending on three lags of itself and three lags of GDP volatility, which we use to test the hypothesis that GDP volatility does not Granger cause stock market volatility. We report the results in Table 6.1, using quarterly real stock market volatility and real GDP growth volatility for the panel of 46 countries, 1961.1–2003.3. We test noncausality from fundamental volatility (FV) to return volatility (RV), and vice versa, and we present F-statistics and corresponding p values for both hypotheses. We do this for 30 sample windows, with the ending date fixed at 2003.3 and the starting date varying from 1961.1, 1962.1, . . . , 1990.1. There is no evidence against the hypothesis that stock market volatility does not Granger cause GDP volatility; that is, it appears that stock market volatility does not cause GDP volatility. In sharp contrast, the hypothesis that GDP volatility does not Granger cause stock market volatility is overwhelmingly rejected: evidently GDP volatility does cause stock market volatility. The intriguing result of one-way causality from fundamental volatility to stock return volatility deserves additional study, as the forward-looking equity market might be expected to predict macro fundamentals, rather than the other way around. Of course here we focus on predicting fundamental and return volatilities, rather than fundamentals or returns themselves. There are subtleties of volatility measurement as well. For example, we do not use implied stock return volatilities, which might be expected to be more forward-looking.13
5. Concluding remark This chapter is part of a broader movement focusing on the macro-finance interface. Much recent work focuses on high-frequency data, and some of that work focuses on the high-frequency relationships among returns, return volatilities and fundamentals (e.g., Andersen, Bollerslev, Diebold and Vega, 2003, 2007). Here, in contrast, we focus on international cross-sections obtained by averaging over time. Hence this chapter can be interpreted not only as advocating more exploration of the fundamental volatility/return 13 Implied
volatilities are generally not available.
110
Macroeconomic volatility and stock market volatility, world-wide Table 6.1. Granger causality analysis of stock market volatility and fundamental volatility Beginning Year 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
RV not ⇒ FV
FV not ⇒ RV
F-stat.
p value
F-stat.
p value
1.16 1.18 1.11 1.14 1.07 1.06 1.01 1.00 0.98 0.96 0.89 0.78 0.62 0.84 0.83 0.83 0.95 0.88 0.73 0.74 0.49 0.47 0.59 0.71 0.83 1.07 1.29 1.29 1.21 1.23
0.3264 0.3174 0.3498 0.3356 0.3696 0.3746 0.4007 0.4061 0.4171 0.4282 0.4689 0.5380 0.6482 0.4996 0.5059 0.5059 0.4339 0.4750 0.5714 0.5646 0.7431 0.7578 0.6699 0.5850 0.5059 0.3697 0.2716 0.2716 0.3044 0.2959
4.14 4.09 4.21 4.39 4.33 4.33 4.48 4.44 4.38 4.14 3.86 4.16 4.06 4.40 3.90 3.89 3.93 4.11 4.02 4.52 4.67 4.77 5.15 5.39 5.58 5.59 5.76 4.84 3.86 3.42
0.0024 0.0026 0.0021 0.0015 0.0017 0.0017 0.0013 0.0014 0.0016 0.0024 0.0039 0.0023 0.0027 0.0015 0.0036 0.0037 0.0035 0.0025 0.0030 0.0012 0.0009 0.0008 0.0004 0.0003 0.0002 0.0002 0.0001 0.0007 0.0039 0.0085
We assess the direction of causal linkages between quarterly real stock market volatility and real GDP growth volatility for the panel of 46 countries, 1961.1 to 2003.3. We test noncausality from fundamental volatility (FV) to return volatility (RV), and vice versa, and we present F-statistics and corresponding p values for both hypotheses. We do this for 30 sample windows, with the ending date fixed at 2003.3 and the starting date varying from 1961.1, 1962.1, . . . , 1990.1.
volatility interface, but also in particular as a call for more exploration of volatility at medium (e.g., business cycle) frequencies. In that regard it is to the stock market as, for example, Diebold, Rudebusch and Aruoba (2006) is to the bond market and Evans and Lyons (2007) is to the foreign exchange market.
Appendix
111
Appendix Here we provide details of data sources, country coverage, sample ranges, and transformations applied. We discuss underlying annual data first, followed by quarterly data.
Annual data We use four “raw” data series per country: real GDP, real private consumption expenditures (PCE), a broad stock market index, and the CPI. We use those series to compute annual real stock returns, real GDP growth, real consumption growth, and corresponding volatilities. The data set includes a total of 71 countries and spans a maximum of 42 years, 1960–2002. For many countries, however, consumption and especially stock market data are available only for a shorter period, reducing the number of countries with data available. We obtain annual stock market data from several sources, including International Financial Statistics (IFS), the OECD, Standard and Poor’s Emerging Market Data Base (EMDB), Global Insight (accessed via WRDS), Global Financial Data, Datastream, the World Federation of Exchanges, and various stock exchange websites. Details appear in Table 6.A1, which lists the countries for which stock market index data are available at least for the 20-year period from 1983–2002. With stock prices in hand, we calculate nominal returns as it = ln(pt /pt−1 ). We then calculate annual consumer price index (CPI) inflation, πt , using the monthly IFS database 1960–2002, and finally we calculate real stock returns as rt = (1 + it )/ (1 + πt ) − 1. We obtain annual real GDP data from the World Bank World Development Indicators database (WDI). For most countries, WDI covers the full 1960–2002 period. Exceptions are Canada (data start in 1965), Germany (data start in 1971), Israel (data end in 2000), Saudi Arabia (data end in 2001), and Turkey (data start in 1968). We obtain Taiwan real GDP from the Taiwan National Statistics website. We complete the real GDP growth rate series for Canada (1961–1965), Germany (1961–1971), Israel (2001–2002) and Saudi Arabia (2002) using IFS data on nominal growth and CPI inflation. We calculate real GDP growth rates as GDPt / GDPt−1 − 1. We obtain real personal consumption expenditures data using the household and personal final consumption expenditure from the World Bank’s WDI database. We recover missing data from the IFS and Global Insight (through WRDS); see Table 6.A2 for details. We calculate real consumption growth rates as Ct /Ct−1 − 1.
Quarterly data The quarterly analysis reported in the text is based on 46 countries. Most, but not all, of those countries are also included in the annual analysis. For stock markets, we construct quarterly returns using the monthly data detailed in Table 6.A3, and we deflate to real terms using quarterly CPI data constructed using the same underlying monthly CPI on which annual real stock market returns are based. For real GDP in most countries, we use the IFS volume index. Exceptions are Brazil (real GDP volume index, Brazilian Institute of Geography and Statistics website), Hong
112
Macroeconomic volatility and stock market volatility, world-wide
Kong (GDP in constant prices, Census and Statistics Department website), Singapore (GDP in constant prices, Ministry of Trade and Industry, Department of Statistics website), and Taiwan (GDP in constant prices, Taiwan National Statistics website). Table 6.A4 summarizes the availability of the monthly stock index series and quarterly GDP series for each country in our sample. Table 6.A1.
Annual stock market data
Country
Period covered
Database/Source
Acronyms
Argentina
1966–2002
ARG
Australia Austria
1961–2002 1961–2002
Brazil Canada Chile Colombia Finland France Germany Greece Hong Kong, China India Indonesia Ireland Israel Italy Jamaica Japan Jordan Korea Luxembourg
1980–2002 1961–2002 1974–2002 1961–2002 1961–2002 1961–2002 1970–2002 1975–2002 1965–2002 1961–2002 1977–2002 1961–2002 1961–2002 1961–2002 1969–2002 1961–2002 1978–2002 1972–2002 1970–2002
Malaysia Mexico Morocco Netherlands New Zealand Norway
1980–2002 1972–2002 1980–2002 1961–2002 1961–2002 1961–2002
Pakistan
1961–2002
1966–1989 Buenos Aires SE(1) General Index 1988–2002 Buenos Aires SE Merval Index IFS(2) 1961–1998 IFS 1999–2002 Vienna SE WBI index Bovespa SE IFS IFS IFS IFS IFS IFS Athens SE General Weighted Index Hang Seng Index IFS EMDB–JSE Composite(3) IFS IFS IFS IFS IFS Amman SE General Weighted Index IFS 1980–1998 IFS 1999–2002 SE–LuxX General Index KLSE Composite Price & Quotations Index EMDB–Upline Securities IFS IFS 1961–2000 IFS 2001–2002 OECD–CLI industrials 1961–1975 IFS 1976–2002 EMDB–KSE 100
AUS AUT BRA CAN CHL COL FIN FRA GER GRC HKG IND IDN IRL ISR ITA JAM JPN JOR KOR LUX MYS MEX MOR NLD NZL NOR PAK (cont.)
Appendix Table 6.A1.
113 (Continued )
Country
Period covered
Database/Source
Acronyms
Peru Philippines Singapore
1981–2002 1961–2002 1966–2002
PER PHL SGP
South Africa Spain Sweden Switzerland Taiwan Thailand Trinidad and Tobago United Kingdom
1961–2002 1961–2002 1961–2002 1961–2002 1967–2002 1975–2002 1981–2002 1961–2002
United States Venezuela, Rep. Bol. Zimbabwe
1961–2002 1961–2002 1975–2002
Lima SE IFS 1966–1979 Strait Times Old Index 1980–2002 Strait Times New Index IFS IFS IFS OECD–UBS 100 index TSE Weighted Stock Index SET Index EMDB–TTSE index 1961–1998 IFS, industrial share index 1999–2002 OECD, industrial share index IFS IFS EMDB–ZSE Industrial
SAF SPA SWE SWI TAI THA TTB UK
USA VEN ZBW
(1) (2)
SE denotes Stock Exchange. IFS denotes IMF’s International Financial Statistics. IFS does not provide the name of the stock market index. (3) EMDB denotes Standard & Poors’ Emerging Market Data Base.
Table 6.A2. Country Argentina Australia Austria Brazil Canada Chile Colombia Denmark Finland
Annual Consumption Data Database
Country
1960–2001 IFS(1) 2002 WRDS(2) 1958–2000 WDI(3) , 2001–2002 WRDS 1959–2002 WDI, 2002 WRDS 1959–2001 WDI, 2002 WRDS 1960–1964 IFS; 1965–2000 WDI, 2002 WRDS 1960–2001 WDI, 2002 WRDS 1960–2001 WDI, 2002 WRDS 1959–2001 WDI, 2002 IFS 1959–2001 WDI, 2002 WRDS
Malaysia
1960–2002 WDI
Morocco
1960–2001 WDI, 2002 WRDS
Mexico
1959–2001 WDI, 2002 WRDS
Netherlands
1959–2001 WDI, 2002 WRDS
New Zealand
Pakistan
1958–2000 WDI, 2001–2002 IFS 1958–2000 WDI, 2001–2002 WRDS 1960–2002 WDI
Peru Philippines
1959–2001 WDI, 2002 WRDS 1960–2001 WDI, 2002 WRDS
Norway
Database
(cont.)
114
Macroeconomic volatility and stock market volatility, world-wide
Table 6.A2.
(Continued )
Country France Germany Greece Hong Kong, China India Indonesia Ireland
Database
Country
1959–2001 WDI, 2002 WRDS 1960–1970 IFS, 1971–2001 WDI, 2002 WRDS 1958–2000 WDI, 2001–2002 WRDS 1959–2001 WDI, 2002 IFS
Singapore
1960–2002 WDI
South Africa
1960–2002 WDI
Spain
1959–2001 WDI, 2002 WRDS 1960–2002 WDI
Switzerland
1959–2001 WDI, 2002 WRDS 1959–2001 WDI, 2002 WRDS 1959–2001 WDI, 2002 WRDS 1964–2002 National Statistics Office 1960–2002 WDI
Italy
1960–2000 WDI, 2001–2002 WRDS 1960–2000 WDI, 2001–2002 WRDS 1959–2001 WDI, 2002 IFS
Jamaica
1959–2001 WDI, 2002 IFS
Japan
1959–2001 WDI, 2002 WRDS 1960–2002 WDI
Israel
Korea
Database
Sweden
Taiwan Thailand
United Kingdom 1959–2001 WRDS United States 1958–2000 WRDS Uruguay 1960–2001 WRDS Zimbabwe 1965–2002
WDI, 2002 WDI, 2001–2002 WDI, 2002 WDI
(1)
IFS denotes IMF’s International Financial Statistics. Data taken from the Global Insight (formerly DRI) database which is available through Wharton Research Data Service (WRDS). (3) WDI denotes World Development Indicators. (2)
Table 6.A3. Acronym
Monthly Stock Index Data Country
Definition
Period covered
Source
ARG
Argentina
1983:01–2003:12
GFD(1)
AUS
Australia
1958:01–2003:12
IFS(2)
AUT
Austria
1957:01–2003:12
IFS
BEL BRA CAN CHL COL
Belgium Brazil Canada Chile Colombia
Buenos Aires Old (1967–1988) Merval Index (1989–2003) 19362. . . ZF. . . , Share Prices: Ordinaries 12262. . . ZF. . . , Share Prices 12462. . . ZF. . . 22362. . . ZF. . . 15662. . . ZF. . . 22862. . . ZF. . . 23362. . . ZF. . .
1957:01–2003:12 1980:01–2003:12 1957:01–2003:11 1974:01–2003:10 1959:01–2003:12
IFS IFS IFS IFS IFS (cont.)
Appendix
115
Table 6.A3.
(Continued )
Acronym
Country
CZE DEN FIN FRA GER GRC HKG HUN IDN IRL ISR ITA JPN JOR KOR LAT MYS MEX NLD NZL NOR PER PHL PRT SGP SLV SAF SPA SWE SWI TAI THA TUR UKI USA (1) (2) (3) (4)
Definition
Czech Republic Denmark Finland France Germany Greece Hong Kong Hungary Indonesia
PX50 Index 12862A..ZF. . . 17262. . . ZF. . . 13262. . . ZF. . . 13462. . . ZF. . . Athens General Index Hang Seng Index BSE BUX Index Jakarta SE Composite Index Ireland 17862. . . ZF. . . (May 1972 missing) Israel 43662. . . ZF. . . Italy 13662. . . ZF. . . Japan 15862. . . ZF. . . Jordan ASE Index S. Korea KOSPI Index Latyia 94162. . . ZF. . . Malaysia KLSE composite Mexico IPC index Netherlands 13862. . . ZF. . . New Zealand 19662. . . ZF. . . Norway 14262. . . ZF. . . (Sep 1997 missing) Peru Lima SE Index Philippines 56662. . . ZF. . . Portugal PSI General Index Singapore Old+New Strait Times Index Slovakia SAX Index South Africa 19962. . . ZF. . . Spain 18462. . . ZF. . . Sweden 14462. . . ZF. . . Switzerland 14662. . . ZF. . . Taiwan SE Capitalization Weighted Index Thailand SET Index Turkey ISE National-100 Index United Kingdom FTSE 100 Index United States 11162 ZF
GFD denotes Global Financial Data. IFS denotes IMF’s International Financial Statistics. EMDB denotes Standard & Poors’ Emerging Market Data Base. WRDS denotes Wharton Research Data Services.
Period covered
Source
1994:01–2003:12 EMDB(3) 1967:01–2003:12 IFS 1957:01–2003:12 IFS 1957:01–2003:11 IFS 1970:01–2003:12 IFS 1980:01–2003:09 GFD 1980:01–2003:05 GFD 1992:01–2003:12 EMDB 1983:03–2003:12 GFD 1957:01–2003:11
IFS
1957:01–2003:11 IFS 1957:01–2003:12 IFS 1957:01–2003:11 IFS 1986:01–2003:02 EMDB 1975:01–2003:12 GFD 1996:04–2003:12 IFS 1980:01–2003:12 GFD 1972:01–2003:12 GFD 1957:01–2003:11 IFS 1961:01–2003:09 IFS 1957:01–2003:12 IFS 1981:12–2003:12 1957:01–2003:11 1987:12–2003:12 1966:01–2003:11
GFD IFS EMDB GFD
1996:01–2003:12 EMDB 1960:01–2003:10 IFS 1961:01–2003:12 IFS 1996:06–2003:12 IFS 1989:01–2003:12 IFS 1967:01–2003:12 GFD 1980:01–2003:12 1986:12–2003:12 1957:12–2003:11 1957:01–2003:12
GFD GFD WRDS(4) IFS
116
Macroeconomic volatility and stock market volatility, world-wide
Table 6.A4. Acronym
ARG AUS AUT BEL BRA CAN CHL COL CZE DEN FIN FRA GER GRC HKG HUN IDN IRL ISR ITA JPN JOR KOR LAT MYS MEX NLD NZL NOR PER PHL PRT SGP SLV SAF SPA SWE SWI TAI THA TUR UKI USA
Availability of monthly stock returns and quarterly GDP series Country
Argentina Australia Austria Belgium Brazil Canada Chile Colombia Czech Republic Denmark Finland France Germany Greece Hong Kong Hungary Indonesia Ireland Israel Italy Japan Jordan S. Korea Latvia Malaysia Mexico Netherlands New Zealand Norway Peru Philippines Portugal Singapore Slovakia South Africa Spain Sweden Switzerland Taiwan Thailand Turkey United Kingdom United States
1984. I–1988.IV 1989.I–1993.IV 1994.I–1998.IV 1999.I–2003.IV Stock index
GDP
Stock index
GDP
Stock index
GDP
Stock index
GDP
7
Measuring Downside Risk – Realized Semivariance Ole E. Barndorff-Nielsen, Silja Kinnebrock, and Neil Shephard
“It was understood that risk relates to an unfortunate event occurring, so for an investment this corresponds to a low, or even negative, return. Thus getting returns in the lower tail of the return distribution constitutes this ‘downside risk.’ However, it is not easy to get a simple measure of this risk.” Quoted from Granger (2008).
1. Introduction A number of economists have wanted to measure downside risk, the risk of prices falling, just using information based on negative returns – a prominent recent example is by Ang, Chen, and Xing (2006). This has been operationalized by quantities such as semivariance, value at risk and expected shortfall, which are typically estimated using daily returns. In this chapter we introduce a new measure of the variation of asset prices based on high frequency data. It is called realized semivariance (RS). We derive its limiting properties, relating it to quadratic variation and, in particular, negative jumps. Further, we show it has some useful properties in empirical work, enriching the standard ARCH models
Acknowledgments: The ARCH models fitted in this chapter were computed using G@RCH 5.0, the package of Laurent and Peters (2002). Throughout, programming was carried out using the Ox language of Doornik (2001) within the OxMetrics 5.0 environment. We are very grateful for the help of Asger Lunde in preparing some of the data we used in this analysis and advice on various issues. We also would like to thank Rama Cont, Anthony Ledford and Andrew Patton for helpful suggestions at various points. The referee and an editor, Tim Bollerslev, made a number of useful suggestions. This chapter was first widely circulated on 21 January, 2008.
117
118
Measuring downside risk – realized semivariance
pioneered by Rob Engle over the last 25 years and building on the recent econometric literature on realized volatility. Realized semivariance extends the influential work of, for example, Andersen, Bollerslev, Diebold, and Labys (2001) and Barndorff-Nielsen and Shephard (2002), on formalizing so-called realized variances (RV), which links these commonly used statistics to the quadratic variation process. Realized semivariance measures the variation of asset price falls. At a technical level it can be regarded as a continuation of the work of Barndorff-Nielsen and Shephard (2004) and Barndorff-Nielsen and Shephard (2006), who showed it is possible to go inside the quadratic variation process and separate out components of the variation of prices into that due to jumps and that due to the continuous evolution. This work has prompted papers by, for example, Andersen, Bollerslev, and Diebold (2007), Huang and Tauchen (2005) and Lee and Mykland (2008) on the importance of this decomposition empirically in economics. Surveys of this kind of thinking are provided by Andersen, Bollerslev, and Diebold (2009) and Barndorff-Nielsen and Shephard (2007), while a detailed discussion of the relevant probability theory is given in Jacod (2007). Let us start with statistics and results which are well known. Realized variance estimates the ex post variance of log asset prices Y over a fixed time period. We will suppose that this period is 0 to 1. In our applied work it can be thought of as any individual day of interest. Then RV is defined as RV =
n
Ytj − Ytj−1
2
j=1
where 0 = t0 < t1 < . . . < tn = 1 are the times at which (trade or quote) prices are available. For arbitrage free-markets, Y must follow a semimartingale. This estimator converges as we have more and more data in that interval to the quadratic variation at time one, [Y ]1 = p − lim n→∞
n
Ytj − Ytj−1
2
,
j=1
(e.g. Protter, 2004, pp. 66–77) for any sequence of deterministic partitions 0 = t0 < t1 < . . . < tn = 1 with supj {tj+1 − tj } → 0 for n → ∞. This limiting operation is often referred to as “in-fill asymptotics” in statistics and econometrics.1 One of the initially strange things about realized variance is that it solely uses squares of the data, whereas the research of, for example, Black (1976), Nelson (1991), Glosten, Jagannathan, and Runkle (1993) and Engle and Ng (1993) has indicated the importance of falls in prices as a driver of conditional variance. The reason for this is clear, as the high-frequency data become dense, the extra information in the sign of the data can fall to zero for some models – see also the work of Nelson (1992). The most elegant framework 1 When there are market frictions it is possible to correct this statistic for their effect using the two-scale estimator of Zhang, Mykland, and A¨ıt-Sahalia (2005), the realized kernel of Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008) or the pre-averaging based statistic of Jacod, Li, Mykland, Podolskij, and Vetter (2007).
1 Introduction
119
in which to see this is where Y is a Brownian semimartingale t t as ds + σs dWs , t ≥ 0, Yt = 0
0
where a is a locally bounded predictable drift process and σ is a c`adl` ag volatility process – all adapted to some common filtration Ft , implying the model can allow for classic leverage effects. For such a process t σs2 ds, [Y ]t = 0
and so d[Y ]t = σt2 dt, which means for a Brownian semimartingale the quadratic variation (QV) process tells us everything we can know about the ex post variation of Y and so RV is a highly interesting statistic. The signs of the returns are irrelevant in the limit – this is true whether there is leverage or not. If there are jumps in the process there are additional things to learn than just the QV process. Let t t as ds + σs dWs + J t , Yt = 0
0
where J is a pure jump process. Then, writing jumps in Y as ΔYt = Yt − Yt− , t 2 [Y ]t = σs2 ds + (ΔYs ) , 0
s≤t
and so QV aggregates two sources of risk. Even when we employ bipower variation (Barndorff-Nielsen and Shephard, 2004 and Barndorff-Nielsen and Shephard, 20062 ), t 2 which allows us2 to estimate 0 σs ds robustly to jumps, this still leaves us with estimates of s≤t (ΔJs ) . This tells us nothing about the asymmetric behavior of the jumps – which is important if we wish to understand downside risk. In this chapter we introduce the downside realized semivariance (RS − ) tj ≤1
RS
−
=
Ytj − Ytj−1
2
1Ytj
− Ytj−1 ≤0 ,
j=1
where 1y is the indicator function taking the value 1 if the argument y is true. We will study the behavior of this statistic under in-fill asymptotics. In particular we will see that 1 2 − p 1 σs2 ds + (ΔYs ) 1ΔYs ≤0 , RS → 2 0 s≤1
2 Threshold-based decompositions have also been suggested in the literature, examples of this include Mancini (2001), Jacod (2007) and Lee and Mykland (2008).
120
Measuring downside risk – realized semivariance
under in-fill asymptotics. Hence RS − provides a new source of information, one which focuses on squared negative jumps.3 Of course the corresponding upside realized semivariance tj ≤1
RS + =
Ytj − Ytj−1
2
1Ytj
− Ytj−1 ≥0
j=1
1 → 2 p
1
σs2 ds + 0
2
(ΔYs ) 1ΔYs ≥0 ,
s≤1
may be of particular interest to investors who have short positions in the market (hence a fall in price can lead to a positive return and hence is desirable), such as hedge funds. Of course, RV = RS − + RS + . Semivariances, or more generally measures of variation below a threshold (target semivariance) have a long history in finance. The first references are probably Markowitz (1959), Mao (1970b), Mao (1970a), Hogan and Warren (1972) and Hogan and Warren (1974). Examples include the work of Fishburn (1977) and Lewis (1990). Sortino ratios (which are an extension of Sharpe ratios and were introduced by Sortino and van der Meer, 1991), and the so-called post-modern portfolio theory by, for example, Rom and Ferguson (1993), has attracted attention. Sortino and Satchell (2001) look at recent developments and provide a review, whereas Pedersen and Satchell (2002) look at the economic theory of this measure of risk. Our innovation is to bring high-frequency analysis to bear on this measure of risk. The empirical essence of daily downside realized semivariance can be gleaned from Figure 7.1, which shows an analysis of trades on General Electric (GE) carried out on the New York Stock Exchange4 from 1995 to 2005 (giving us 2,616 days of data). In graph (a) we show the path of the trades drawn in trading time on a particular randomly chosen day in 2004, to illustrate the amount of daily trading which is going on in this asset. Notice by 2004 the tick size has fallen to one cent. Graph (b) shows the open to close returns, measured on the log-scale and multiplied by 100, which indicates some moderation in the volatility during the last and first piece of the sample period. The corresponding daily realized volatility (the square root of the realized variance) is plotted in graph (c), based upon returns calculated every 15 trades. The Andersen, Bollerslev, Diebold, and Labys (2000) variance signature plot is shown in graph (d), to assess the impact of noise on the calculation of realized volatility. It suggests statistics computed on returns calculated every 15 trades should not be too sensitive to noise for GE. Graph (e) shows the same but focusing on daily RS − and RS + . Throughout, the statistics are computed using returns calculated every 15 trades. The 3 This type of statistic relates to the work of Babsiria and Zakoian (2001) who built separate ARCHtype conditional variance models of daily returns using positive and negative daily returns. It also resonates with the empirical results in a recent paper by Chen and Ghysels (2007) on news impact curves estimated through semiparametric MIDAS regressions. 4 These data are taken from the TAQ database, managed through WRDS. Although information on trades is available from all the different exchanges in the US, we solely study trades which are made at the exchange in New York.
1 Introduction
121
(a): Trading prices in a day in 2004
(b): Daily log returns (open to close), times 100
35.3
10
35.2
0
0
1000
2000
3000
1996
(c): Daily realized volatility, every 15 trades 10.0
1998
2000
2002
2004
(d): ABDL variance signature plot 6
7.5 5.0
4
2.5 1996
1998
2000
2002
0
2004
5
10
15
20
25
30
(f): ACF: components of realized variance
(e): Component variance signature plot 3
RS + RS − Realized variance
0.50 2 0.25 0
5
10
15
20
25
30
0
10
20
30
40
50
60
Fig. 7.1. Analysis of trades on General Electric carried out on the NYSE from 1995 to 2005. (a) Path of the trades drawn in trading time on a random day in 2004. (b) Daily open to close returns ri , measured √ on the log-scale and multiplied by 100. The corresponding daily realized volatility ( RVi ) is plotted in graph (c), based upon returns calculated every 15 trades. (d) Variance signature plot in trade time to assess the impact of noise on the calculation of realized variance (RV ). (e) Same thing, but for the realized semivariances (RSi+ and RSi− ). (f) Correlogram for RSi+ , RVi and RSi− average value of these two statistics are pretty close to one another on average over this sample period. This component signature plot is in the spirit of the analysis pioneered by Andersen, Bollerslev, Diebold, and Labys (2001) in their analysis of realized variance. Graph (f) shows the correlogram for the realized semivariances and the realized variance and suggests the downside realized semivariance has much more dependence in it than RS + . Some summary statistics for these data are available in Table 7.2, which will be discussed in some detail in Section 3. In the realized volatility literature, authors have typically worked out the impact of using realized volatilities on volatility forecasting using regressions of future realized variance on lagged realized variance and various other explanatory variables.5 Engle and Gallo (2006) prefer a different route, which is to add lagged realized quantities as variance regressors in Engle (2002a) and Bollerslev (1986) GARCH-type models of daily 5 Leading references include Andersen, Bollerslev, Diebold, and Labys (2001) and Andersen, Bollerslev, and Meddahi (2004).
122
Measuring downside risk – realized semivariance
returns – the reason for their preference is that it is aimed at a key quantity, a predictive model of future returns, and is more robust to the heteroskedasticity inherent in the data. Typically when Engle generalizes to allow for leverage he uses the Glosten, Jagannathan, and Runkle (1993) (GJR) extension. This is the method we follow here. Throughout we will use the subscript i to denote discrete time. We model daily open to close returns6 {ri ; i = 1, 2, . . . , T } as E (ri |Gi−1 ) = μ, 2
hi = Var (ri |Gi−1 ) = ω + α (ri−1 − μ) + βhi−1 2
+ δ (ri−1 − μ) Iri−1 −μ<0 + γzi−1 , and then use a standard Gaussian quasi-likelihood to make inference on the parameters, e.g. Bollerslev and Wooldridge (1992). Here zi−1 are the lagged daily realized regressors and Gi−1 is the information set generated by discrete time daily statistics available to forecast ri at time i − 1. Table 7.1 shows the fit of the GE trade data from 1995 to 2005. It indicates the lagged RS − beating out of the GARCH model (δ = 0) and the lagged RV. Both realized terms yield large likelihood improvements over a standard daily returns-based GARCH. Importantly there is a vast shortening in the information-gathering period needed to condition on, with the GARCH memory parameter β dropping from 0.953 to around 0.7. This makes fitting these realized-based models much easier in practice, allowing their use on relatively short time series of data. When the comparison with the GJR model is made, which allows for traditional leverage effects, the results are more subtle, with the RS − significantly reducing the importance of the traditional leverage effect while the high-frequency data still has an important impact on improving the fit of the model. In this case the RS − and RV play similar roles, with RS − no longer dominating the impact of the RV in the model. The rest of this chapter has the following structure. In Section 2 we will discuss the theory of realized semivariances, deriving a central limit theory under some mild assumptions. In Section 3 we will deepen the empirical work reported here, looking at a variety of stocks and also both trade and quote data. In Section 4 we will discuss various extensions and areas of possible future work.
2. Econometric theory 2.1. The model and background We start this section by repeating some of the theoretical story from Section 1. 6 We have no high frequency data to try to estimate the variation of the prices over night and so do not attempt to do this here. Of course, it would be possible to build a joint model of open to close and close to open returns, conditional on the past daily data and the high frequency realized terms but we have not carried this out here. An alternative would be to model open to open or close to close prices given past data of the same type and the realized quantities. This is quite a standard technique in the literature, but not one we follow here.
Table 7.1.
ARCH-type models and lagged realized semivariance and variance GARCH
Lagged RS − Lagged RV ARCH GARCH
0.685 (2.78) −0.114 (−1.26) 0.040 (2.23) 0.711 (7.79)
GJR
0.499 (2.86)
0.036 (2.068) 0.691 (7.071)
0.046 (2.56) 0.953 (51.9)
0.228 (3.30) 0.040 (2.11) 0.711 (9.24)
GJR Log-likelihood
−4527.3
−4527.9
−4577.6
−4533.5
0.371 (0.91) 0.037 (0.18) 0.017 (0.74) 0.710 (7.28) 0.055 (1.05) −4526.2
0.441 (2.74)
0.021 (1.27) 0.713 (7.65) 0.048 (1.51) −4526.2
0.016 (1.67) 0.955 (58.0) 0.052 (2.86) −4562.2
0.223 (2.68) 0.002 (0.12) 0.708 (7.49) 0.091 (2.27) −4526.9
Gaussian quasi-likelihood fit of GARCH and GJR models fitted to daily open to close returns on General Electric share prices, from 1995 to 2005. We allow lagged daily realized variance (RV) and realized semivariance (RS) to appear in the conditional variance. They are computed using every 15th trade. T -statistics, based on robust standard errors, are reported in small font and in brackets.
124
Measuring downside risk – realized semivariance Consider a Brownian semimartingale Y given as t t as ds + σs dWs , Yt = 0
(1)
0
where a is a locally bounded predictable drift process and σ is a c`adl` ag volatility process. For such a process t σs2 ds, [Y ]t = 0
σt2 dt,
and so d[Y ]t = which means that when there are no jumps the QV process tells us everything we can know about the ex post variation of Y . When there are jumps this is no longer true, in particular let t t as ds + σs dWs + J t , (2) Yt = 0
0
where J is a pure jump process. Then t 2 [Y ]t = σs2 ds + (ΔJs ) , 0
s≤t
2
and d[Y ]t = σt2 dt + (ΔYt ) . Even when we employ devices like realized bipower variation (Barndorff-Nielsen and Shephard, 2004 and Barndorff-Nielsen and Shephard, 2006) BP V = μ−2 1
t p [1,1] Yt − Yt Yt − Y → {Y } = σs2 ds, tj−2 j j−1 j−1 t
tj ≤t
0
j=2
μ1 = E |U | , U ∼ N (0, 1), t we are able to estimate 0 σs2 ds robustly to jumps, but this still leaves us with estimates 2 of (ΔJs ) . This tells us nothing about the asymmetric behavior of the jumps. s≤t
2.2. Realized semivariances The empirical analysis we carry out throughout this chapter is based in trading time, so data arrive into our database at irregular points in time. However, these irregularly spaced observations can be thought of as being equally spaced observations on a new time-changed process, in the same stochastic class, as argued by, for example, Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008). Thus there is no loss in initially considering equally spaced returns yi = Y ni − Y i−1 , n
i = 1, 2, . . . , n.
We study the functional
yi2 1{yi ≥0}
i=1
yi2 1{yi ≤0}
nt
V (Y, n) =
.
(3)
2 Econometric theory
125
The main results then come from an application of some limit theory of Kinnebrock and Podolskij (2008) for bipower variation. This work can be seen as an important generalization of Barndorff-Nielsen, Graversen, Jacod, and Shephard (2006) who studied bipower-type statistics of the form n √ 1 √ g( nyi )h( nyi−1 ), n i=2
when g and h were assumed to be even functions. Kinnebrock and Podolskij (2008) give the extension to the uneven case, which is essential here.7 Proposition 1 Suppose (1) holds, then nt
t y2 1 p 1 1 2 i {yi ≥0} → σs ds . yi2 1{yi ≤0} 1 2 0 i=1
Proof Trivial application of Theorem 1 in Kinnebrock and Podolskij (2008). Corollary 1 Suppose
Yt =
t
as ds +
t
σs dWs + J t ,
0
0
holds, where J is a finite activity jump process then t nt 2 y2 1 p 1 (ΔYs ) 1{ΔYs ≥0} 1 2 i {yi ≥0} . → σ ds + 2 yi2 1{yi ≤0} 1 2 0 s (ΔYs ) 1{ΔYs ≤0} i=1
s≤t
Remark 1 The above means that nt y2 1 p 2 2 {yi ≥0} i (1, −1) → (ΔYs ) 1{ΔYs ≥0} − (ΔYs ) 1{ΔYs ≤0} , 2 yi 1{yi ≤0} i=1
s≤t
the difference in the squared jumps. Hence this statistic allows us direct econometric evidence on the importance of the sign of jumps. Of course, by combining with bipower variation nt 2 y2 1 1 BP V p (ΔYs ) 1{ΔYs ≥0} i {yi ≥0} − , → 2 yi2 1{yi ≤0} 2 BP V (ΔYs ) 1{ΔYs ≤0} i=1
s≤t
we can straightforwardly estimate the QV of just positive or negative jumps. In order to derive a central limit theory we need to make two assumptions on the volatility process. (H1) If there were no jumps in the volatility then it would be sufficient to employ t t t ∗ ∗ σt = σ0 + as ds + σs dWs + vs∗ dWs∗ . (4) 0
0
0
7 It is also useful in developing the theory for realized autocovariance under a Brownian semimartingale, which is important in the theory of realized kernels developed by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008).
126
Measuring downside risk – realized semivariance
adl` ag processes, with a∗ also being predictable and locally Here a∗ , σ ∗ , v ∗ are adapted c` ∗ bounded. W is a Brownian motion independent of W . (H2) σt2 > 0 everywhere. The assumption (H1)’ is rather general from an econometric viewpoint as it allows for flexible leverage effects, multifactor volatility effects, jumps, nonstationarities, intraday effects, etc. Indeed we do not know of a continuous time continuous sample path volatility model used in financial economics that is outside this class. Kinnebrock and Podolskij (2008) also allow jumps in the volatility under the usual (in this context) conditions introduced by Barndorff-Nielsen, Graversen, Jacod, Podolskij, and Shephard (2006) and discussed by, for example, Barndorff-Nielsen, Graversen, Jacod, and Shephard (2006) but we will not detail this here. The assumption (H2) is also important, it rules out the situation where the diffusive component disappears. Proposition 2 Suppose (1), (H1) and (H2) holds, then ⎧ ⎞ ⎛ 1 ⎞⎫ ⎛ 2 ⎪ ⎪ t nt yi 1{yi ≥0} 2 ⎨ ⎬ √ ⎟ ⎜ 1 ⎟ Dst ⎜ 2 2 n σs ds ⎝ 2 ⎠ → Vt ⎝yi 1{yi ≤0} ⎠ − ⎪ ⎪ 0 ⎩ i=1 ⎭ |yi | |yi−1 | μ21 where
t
Vt =
t
αs (1) ds +
αs (2) dWs +
0
0
t
αs (3) dWs ,
0
⎛ ⎞ 1 1 αs (1) = √ {2as σs + σs σs∗ } ⎝−1⎠ , 2π 0 ⎛ ⎞ 1 2 2⎝ ⎠ αs (2) = √ σs −1 , 2π 0 ⎛
As =
5 4 ⎜ σs4 ⎝− 14 μ21
− 14 5 4 μ21
μ21
⎞
⎟ μ21 ⎠, 1 + 2μ21 − 3μ41
αs (3) αs (3) = As − αs (2) αs (2) , where αs (3) is a 2 × 2 matrix. Here W is independent of (W, W ∗ ), the Brownian motion which appears in the Brownian semimartingale (1) and (H1). Proof Given in the Appendix. Remark 2 When we look at nt
RV = (1, 1)
i=1
yi2 1{yi ≥0} yi2 1{yi ≤0}
,
2 Econometric theory
127
then we produce the well-known result $ % t t √ Dst n RV − σs2 ds → 2σs2 dWs 0
0
|=
which appears in Jacod (1994) and Barndorff-Nielsen and Shephard (2002). W then Remark 3 Assume a, σ ⎧ ⎫ ⎬ t nt
√ ⎨ yi2 1{yi ≥0} 1 1 − n σs2 ds 2 1 ⎭ ⎩ 2 y 1 0 i {yi ≤0} i=1 Dst
→ MN
1 √ 2π
t
{2as σs +
σs σs∗ } ds
0
t 1 1 5 −1 4 , σ ds . −1 4 0 s −1 5
If there is no drift and the volatility of volatility was small then the mean of this mixed Gaussian distribution is zero and we could use this limit result to construct confidence intervals on these quantities. When the drift is not zero we cannot use this result as we do not have a method for estimating the bias, which is a scaled version of t 1 √ {2as σs + σs σs∗ } ds. n 0 Of course in practice this bias will be small. The asymptotic variance of nt
yi2 1{yi ≥0} (1, −1) yi2 1{yi ≤0} i=1 is
3 t n 0
σs4 ds, but obviously not mixed Gaussian.
Remark 4 When the a, σ is independent of W assumption fails, we do not know how to construct confidence intervals even if the drift is zero. This is because in the limit ⎧ ⎫ ⎬ nt
√ ⎨ yi2 1{yi ≥0} 1 t 2 1 − n σ ds 1 ⎭ ⎩ 2 0 s yi2 1{yi ≤0} i=1 depends upon W . All we know is that the asymptotic variance is again 1 t 4 5 −1 σ ds . −1 5 4n 0 s Notice, throughout the asymptotic variance of RS − is 5 t 4 σ ds 4n 0 s so it is less than that of the RV (of course it estimates a different quantity). It also means the asymptotic variance of RS + − RS − is 3 t 4 σ ds. n 0 s
128
Measuring downside risk – realized semivariance
Remark 5 We can look at the measure of the variation of negative jumps through ⎛ ⎞ nt
nt
√ 1 Dst n ⎝2 yi2 1{yi ≤0} − 2 |yi | |yi−1 |⎠ → Vt μ1 i=1 i=1 where
t
Vt =
αs (1)ds + 0
t
αs (2)dWs + 0
t
αs (3)dWs ,
0
1 αs (1) = −2 √ {2as σs + σs σs∗ } , 2π 2 αs (2) = −2 √ σs2 , 2π
−2 As = σs4 μ−4 1 + 2μ1 − 2 , αs (3)αs (3) = As − αs (2)αs (2) . We note that −2 μ−4 1 + 2μ1 − 2 3.6089,
which is quite high (the corresponding term is about 0.6 when we look at the difference betweem realized variance and bipower variation). Without the assumption that the drift is zero and no leverage, it is difficult to see how to use this distribution as the basis of a test.
3. More empirical work 3.1. More on GE trade data For the GE trade data, Table 7.2 reports basic summary statistics for squared open to close daily returns, realized variance and downside realized semivariance. Much of this is Table 7.2.
Summary information for daily statistics for GE trade data
Variable Mean S.D. ri ri2 RVi RSi+ RSi− BP Vi BP DVi
0.01 2.34 2.61 1.33 1.28 2.24 0.16
Correlation matrix
1.53 1.00 5.42 0.06 1.00 3.05 0.03 0.61 1.00 2.03 0.20 0.61 0.94 1.00 1.28 −0.22 0.47 0.86 0.66 2.40 0.00 0.54 0.95 0.84 0.46 −0.61 −0.10 −0.08 −0.34
1.00 0.93 1.00 0.34 −0.01
ACF1 ACF20 −0.01 0.17 0.52 0.31 0.65 0.64 1.00 0.06
0.00 0.07 0.26 0.15 0.37 0.34 0.03
Summary statistics for daily GE data computed using trade data. ri denotes daily open to close returns, RVi is the realized variance, RSi are the realized semivariances, and BP Vi is the daily realized bipower variation. BPDV will be defined on the next page.
3 More empirical work
129
Table 7.3. GE trade data: regression of returns on lagged realized semivariance and returns Coefficient Constant ri−1 RS − i−1 BPDVi−1 log L
0.009 −0.012 −4,802.2
t-value 0.03 0.01
Coefficient −0.061 −0.001 0.054
t-value
Coefficient
t-value
−1.43 −0.06 2.28
−0.067 0.016 0.046 0.109 −4,798.8
−1.56 0.67 1.85 1.26
−4,799.6
− Regression of returns ri on lagged realized semivariance RSi−1 and returns ri−1 for daily returns based on the GE trade database.
familiar, with the average level of squared returns and realized variance being roughly the same, whereas the mean of the downside realized semivariance is around one-half that of the realized variance. The most interesting results are that the RS − statistic has a correlation with RV of around 0.86 and that it is negatively correlated with daily returns. The former correlation is modest for an additional volatility measure and indicates that it may have additional information not in the RV statistic. The latter result shows that large daily semivariances are associated with contemporaneous downward moves in the asset price – which is not surprising of course. The serial correlations in the daily statistics are also presented in Table 7.2. They show the RV statistic has some predictability through time, but that the autocorrelation in the RS − is much higher. Together with the negative correlation between returns and contemporaneous RS − (which is consistent for a number of different assets), this suggests one should be able to modestly predict returns using past RS − . − for the GE trade data. Table 7.3 shows the regression fit of ri on ri−1 and RSi−1 The t-statistic on lagged RS − is just significant and positive. Hence a small amount of the variation in the high-frequency falls of price in the previous day is associated with rises in future asset prices – presumably because the high-frequency falls increase the − for other series are risk premium. The corresponding t-statistics for the impact of RSi−1 given in Table 7.6, they show a similar weak pattern. The RS − statistic has a similar dynamic pattern to the bipower variation statistic.8 The mean and standard deviation of the RS − statistic is slightly higher than half the realized BPV one. The difference estimator BP DVi = RSi− − 0.5BP Vi , which estimates the squared negative jumps, is highly negatively correlated with returns but not very correlated with other measures of volatility. Interestingly this estimator is slightly autocorrelated, but at each of the first 10 lags this correlation is positive, which means it has some forecasting potential. 8 This is computed using not one but two lags, which reduces the impact of market microstructure, as shown by Andersen, Bollerslev, and Diebold (2007).
130
Measuring downside risk – realized semivariance
Table 7.4.
Summary information for daily statistics for other trade data
Mean S.D.
Correlation matrix DIS 1.00 0.04 1.00 −0.00 0.53 1.00 0.19 0.55 0.94 1.00 −0.18 0.46 0.95 0.81 −0.00 0.53 0.98 0.93 −0.46 0.13 0.52 0.25 AXP
−0.02 ri 3.03 ri2 3.98 RVi RSi+ 1.97 RSi− 2.01 3.33 BP Vi 0.35 BP DVi
1.74 6.52 4.69 2.32 2.60 3.97 1.03
ri ri2 RVi RSi+ RSi− BP Vi BP DVi
0.01 3.47 3.65 1.83 1.82 3.09 0.27
1.86 7.75 4.57 2.62 2.30 3.74 0.90
1.00 −0.00 1.00 −0.01 0.56 1.00 0.22 0.52 0.93 −0.28 0.53 0.91 −0.04 0.52 0.94 −0.63 0.27 0.37 IBM
ri ri2 RVi RSi+ RSi− BP Vi BP DVi
0.01 3.02 2.94 1.50 1.44 2.62 0.13
1.73 7.25 3.03 1.81 1.43 2.60 0.49
1.00 0.04 0.03 0.24 −0.24 0.00 −0.71
1.00 0.72 0.83 0.10
1.00 0.93 0.72
1.00 0.92 0.62
1.00 0.43
1.00 0.28
1.00 0.55 1.00 0.54 0.94 1.00 0.48 0.91 0.74 1.00 0.51 0.96 0.86 0.93 1.00 0.05 0.13 −0.11 0.44 0.10
ACF1
ACF20
1.00
−0.00 0.15 0.69 0.66 0.57 0.69 0.05
0.00 0.08 0.35 0.35 0.30 0.37 0.04
1.00
0.01 0.15 0.64 0.48 0.64 0.69 0.20
0.01 0.09 0.37 0.27 0.36 0.39 0.11
1.00
−0.05 0.13 0.65 0.50 0.65 0.70 0.04
0.01 0.04 0.34 0.26 0.34 0.38 −0.01
Summary statistics for various daily data computed using trade data. ri denotes daily open to close returns, RVi is the realized variance, RSi is the realized semivariance, and BP Vi is the daily realized bipower variation. BP DVi is the realized bipower downward variation statistic.
3.2. Other trade data Results in Table 7.4 show that broadly the same results hold for a number of frequently traded assets – American Express (AXP), Walt Disney (DIS) and IBM. Table 7.5 shows the log-likelihood improvements9 by including RV and RS − statistics into the GARCH and GJR models based on trades. The conclusion is clear for GARCH models. By including RS − statistics in the model there is little need to include a traditional leverage effect. Typically it is only necessary to include RS − in the information set, adding RV plays only a modest role. For GJR models, the RV statistic becomes more important and is sometimes slightly more effective than the RS − statistic. 9 Of course the log-likelihoods for the ARCH-type models are Gaussian quasi-likelihoods and so the standard distributional theory for likelihood ratios does not apply directly. Instead one can think of the model fit through a criterion like BIC.
4 Additional remarks
131
Table 7.5. Trades: logL improvements by including lagged RS − and RV in conditional variance Lagged variables
GARCH model
GJR model
AXP
DIS
GE
IBM
AXP
DIS
GE
IBM
RV, RS − & BPV 59.9 RV & BPV 53.2 59.9 RS − & BPV BPV 46.2 59.8 RV & RS − RV 53.0 59.6 RS − None 0.00
66.5 63.7 65.7 57.5 66.3 63.5 65.6 0.00
50.5 44.7 48.7 44.6 49.5 43.2 48.7 0.00
64.8 54.6 62.6 43.9 60.7 51.5 60.6 0.00
47.7 45.4 47.6 40.0 47.5 45.1 47.1 0.00
57.2 56.9 53.2 50.0 56.9 56.7 52.4 0.00
36.7 36.0 36.4 35.8 35.4 34.7 35.4 0.00
45.7 44.6 42.5 34.5 42.4 41.9 41.7 0.00
Improvements in the Gaussian quasi-likelihood by including lagged realized quantities in the conditional variance over standard GARCH and GJR models. Fit of GARCH and GJR models for daily open to close returns on four share prices, from 1995 to 2005. We allow lagged daily realized variance (RV), realized semivariance (RS − ), realized bipower variation (BPV) to appear in the conditional variance. They are computed using every 15th trade.
3.3. Quote data We have carried out the same analysis based on quote data, looking solely at the series for offers to buy placed on the New York Stock Exchange. The results are given in Tables 7.6 and 7.7. The results are in line with the previous trade data. The RS − statistic is somewhat less effective for quote data, but the changes are marginal.
4. Additional remarks 4.1. Bipower variation We can build on the work of Barndorff-Nielsen and Shephard (2004), Barndorff-Nielsen and Shephard (2006), Andersen, Bollerslev, and Diebold (2007) and Huang and Tauchen − , controlling Table 7.6. t-statistics for ri on RSi−1 for lagged returns
Trades Quotes
AIX
DIS
GE
IBM
−0.615 0.059
3.79 5.30
2.28 2.33
0.953 1.72
The t-statistics on realized semivariance calculated by regressing daily returns ri on lagged daily returns and lagged daily semi− ). This is carried out for a variety of stock prices variances (RSi−1 using trade and quote data. The RS statistics are computed using every 15th high-frequency data point.
132
Measuring downside risk – realized semivariance Table 7.7. Quotes: logL improvements by including lagged RS and RV in conditional variance Lagged variables
RV & RS − RV RS − None
GARCH model
GJR model
AXP
DIS
GE
IBM
AXP
DIS
GE
IBM
50.1 45.0 49.5 0.0
53.9 53.6 50.7 0.0
45.0 43.3 44.5 0.0
53.8 43.9 53.7 0.0
39.7 39.1 38.0 0.0
48.0 46.3 39.4 0.0
31.7 31.6 29.1 0.0
31.5 31.3 30.0 0.0
Quote data: Improvements in the Gaussian quasi-likelihood by including lagged realized quantities in the conditional variance. Fit of GARCH and GJR models for daily open to close returns on four share prices, from 1995 to 2005. We allow lagged daily realized variance (RV) and realized semivariance (RS) to appear in the conditional variance. They are computed using every 15th trade.
(2005) by defining tj ≤1
BP DV =
Ytj − Ytj−1
2
1Ytj −
j=1 p
→
tj ≤1 1 −2 Ytj − Ytj−1 Ytj−1 − Ytj−2 Ytj−1 ≤0 − μ1 2 j=2
2
(ΔYs ) IΔYs ≤0 ,
s≤t
the realized bipower downward variation statistic (upward versions are likewise trivial to define). This seems a novel way of thinking about jumps – we do not know of any 2 literature that has identified s≤t (ΔYs ) IΔYs before. It is tempting to try to carry out jump tests based upon it to test for the presence of downward jumps against a null of no jumps at all. However, the theory developed in Section 2 suggests that this is going to be hard to implement based solely on in-fill asymptotics without stronger assumptions than we usually like to make due to the presence of the drift term in the limiting result and the nonmixed Gaussian limit theory (we could do testing if we assumed the drift was zero and there is no leverage term). Of course, it would not stop us from testing things based on the time series dynamics of the process – see the work of Corradi and Distaso (2006). Further, a time series of such objects can be used to assess the factors that drive downward jumps, by simply building a time series model for it, conditioning on explanatory variables. An alternative to this approach is to use higher order power variation statistics (e.g. Barndorff-Nielsen and Shephard, 2004 and Jacod, 2007), tj ≤1
Ytj − Ytj−1 r 1Yt − j j=1
Ytj−1 ≤0
p
→
r
|ΔYs | IΔYs ≤0 ,
r > 2,
s≤t
as n → ∞. The difficulty with using these high order statistics is that they will be more sensitive to noise than the BPDV estimator.
Appendix
133
4.2. Effect of noise Suppose instead of seeing Y we see X = Y + U, and think of U as noise. Let us focus entirely on n
x2i 1{xi ≤0} =
i=1
n
yi2 1{yi ≤−ui } +
i=1
n i=1
n
u2i 1{yi ≤−ui } + 2
i=1
yi2 1{ui ≤0}
+
n
u2i 1{ui ≤0}
i=1
n
yi ui 1{yi ≤−ui }
i=1
+2
n
yi ui 1{ui ≤0} .
i=1
If we use the framework of Zhou (1996), where U is white noise, uncorrelated with Y , with E(U ) = 0 and Var(U ) = ω 2 then it is immediately apparent that the noise will totally dominate this statistic in the limit as n → ∞. Pre-averaging based statistics of Jacod, Li, Mykland, Podolskij, and Vetter (2007) could be used here to reduce the impact of noise on the statistic.
5. Conclusions This chapter has introduced a new measure of variation called downside “realized semivariance.” It is determined solely by high-frequency downward moves in asset prices. We have seen it is possible to carry out an asymptotic analysis of this statistic and see that its limit is effects only by downward jumps. We have assessed the effectiveness of this new measure using it as a conditioning variable for a GARCH model of daily open to close returns. Throughout, for nonleveragebased GARCH models, downside realized semivariance is more informative than the usual realized variance statistic. When a leverage term is introduced it is hard to tell the difference. Various extensions to this work were suggested. The conclusions that downward jumps seem to be associated with increases in future volatility is interesting for it is at odds with nearly all continuous time parametric stochastic volatility models. It could only hold, except for very contrived models, if the volatility process also has jumps in it and these jumps are correlated with the jumps in the price process. This is because it is not possible to correlate a Brownian motion process with a jump process. This observation points us towards models of the type, for example, introduced by Barndorff-Nielsen and Shephard (2001). It would suggest the possibilities of empirically rejecting the entire class of stochastic volatility models built solely from Brownian motions. This seems worthy of some more study.
Appendix: Proof of Proposition 2 Consider the framework of Theorem 2 in Kinnebrock and Podolskij (2008) and choose ⎞ ⎛ 2 ⎞ ⎛ ⎞ ⎛ x 1{x≥0} 1 0 0 g1 (x) h (x) = ⎝0 1 0 ⎠ g (x) = ⎝g2 (x)⎠ = ⎝ x2 1{x≤0} ⎠ |x| 0 0 |x| g3 (x)
134
Measuring downside risk – realized semivariance
Assume that X is a Brownian semimartingale, conditions (H1) and (H2) are satisfied and note that g is continuously differentiable and so their theory applies directly. Due to the particular choice of h we obtain the stable convergence ⎧ ⎛ 1 ⎞⎫ ⎪ ⎪ t t t t 2 ⎨ ⎬ √ ⎜1⎟ 2 n V (Y, n)t − σs ds ⎝ 2 ⎠ → αs (1)ds + αs (2)dWs + αs (3)dWs , (5) ⎪ ⎪ 0 0 0 0 ⎩ ⎭ μ21 where W is a one-dimensional Brownian motion defined on an extension of the filtered probability space and independent of the σ-field F . Using the notation ρσ (g) = E {g(σU )} ,
U ∼ N (0, 1)
ρ(1) U ∼ N (0, 1) σ (g) = E {U g(σU )} , % $ 1 (1,1) Ws dWs , & ρσ (g) = E g(σW1 ) 0
the α(1), α(2) and α(3) are defined by ∂gj ∂gj ∗ (11) ρσs (hjj ) + as ρσs ρσs (hjj ) αs (1)j = σs ρ˜σs ∂x ∂x αs (2)j = ρ(1) σs (gj ) ρσs (hjj )
αs (3) αs (3) = As − αs (2) αs (2)
and the elements of the 3 × 3 matrix process A is given by ) ρσ (hjj hj j ) + ρσ (gj ) ρσ h g = ρ (g g Aj,j σ j j j s s s s s jj ρσs (hj j ) + ρσs (gj ) ρσs gj hj j ρσs (hjj ) − 3ρσs (gj ) ρσs (gj ) ρσs (hjj ) ρσs (hj j ) . Then we obtain the result using the following Lemma. Lemma 1 Let U be standard normally distrubuted. Then ( ' ( ' 2 1 E 1{U ≥0} U = √ , E 1{U ≥0} U 3 = √ , 2π 2π ( ' 2 E 1{U ≤0} U 3 = − √ , 2π
' ( 1 E 1{U ≤0} U = − √ . 2π
Proof Let f be the density of the standard normal distribution. 2 ∞ ∞ 1 x xdx f (x) xdx = √ exp − 2 2π 0 0 2 ∞ 1 x =√ − exp − 2 2π 0 1 =√ . 2π
Appendix
135
Using partial integration we obtain 2 ∞ ∞ 1 x xdx f (x) xdx = √ exp − 2 2π 0 0 2 ∞ 1 1 2 x x exp − =√ 2 2π 2 0 2 ∞ x 1 1 2 x − exp − x dx −√ 2 2 2π 0 2 ∞ 1 x x3 dx = √ exp − 2 2 2π 0 1 ∞ 3 x f (x) dx. = 2 0 Thus 0
∞
2 x3 f (x) dx = √ . 2π
Obviously, it holds
0
−∞ 0
−∞
∞
f (x) xdx = −
f (x) xdx, 0
∞
x3 f (x) dx = −
x3 f (x) dx.
0
This completes the proof of the Lemma. Using the lemma we can calculate the moments ρσs (g1 ) = ρσs (g2 ) =
1 2 σ , 2 s
ρσs (h1,1 ) = ρσs (h2,2 ) = 1, ρσs (h3,3 ) = ρσs (g3 ) = μ1 σs , 2 2 σs = −ρ(1) ρ(1) σs (g1 ) = √ σs (g2 ) , 2π 1 3 σ μ3 , 2
ρσs (g3 h3,3 ) = μ21 σs2 , ρσs g32 = ρσs h23 = μ21 , ρσs (g1 h3,3 ) = ρσs (g2 h3,3 ) =
We note that μ3 = 2μ1 . Further 2 ∂g1 ∂g2 = √ σs = −ρσs , ρσs ∂x ∂x 2π
136
Measuring downside risk – realized semivariance ρ(1) σs
∂g1 ∂x
= ρ(1) σs
∂g2 ∂x
= σs ,
3 2 2 ρσs (g1 ) = ρσs (g2 ) = σs4 , 2 σs ∂g1 ∂g2 11 11 √ = . ρ˜σs = −˜ ρσs ∂x ∂x 2π The last statement follows from 1 ∂g1 ∂g1 =E (σs W1 ) ρ˜σs Wu dWu ∂x ∂x 0 1 Wu dWu = 2E σs W1 1{W1 ≥0} 0
= 2E σs W1 1{W1 ≥0} = σs E
W13 − W1 1{W1 ≥0}
'
σs =√ . 2π
1 2 1 W − 2 1 2 (
8
Glossary to ARCH (GARCH) Tim Bollerslev
Rob Engle’s seminal Nobel Prize winning 1982 Econometrica article on the AutoRegressive Conditional Heteroskedastic (ARCH) class of models spurred a virtual “arms race” into the development of new and better procedures for modeling and forecasting timevarying financial market volatility. Some of the most influential of these early papers were collected in Engle (1995). Numerous surveys of the burgeoning ARCH literature also exist; e.g., Andersen and Bollerslev (1998), Andersen, Bollerslev, Christoffersen and Diebold (2006a), Bauwens, Laurent and Rombouts (2006), Bera and Higgins (1993), Bollerslev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994), Degiannakis and Xekalaki (2004), Diebold (2004), Diebold and Lopez (1995), Engle (2001, 2004), Engle and Patton (2001), Pagan (1996), Palm (1996), and Shephard (1996). Moreover, ARCH models have now become standard textbook material in econometrics and finance as exemplified by, e.g., Alexander (2001, 2008), Brooks (2002), Campbell, Lo and MacKinlay (1997), Chan (2002), Christoffersen (2003), Enders (2004), Franses and van Dijk (2000), Gourieroux and Jasiak (2001), Hamilton (1994), Mills (1993), Poon (2005), Singleton (2006), Stock and Watson (2007), Tsay (2002), and Taylor (2004). So, why another survey type chapter? Even a cursory glance at the many reviews and textbook treatments cited above reveals a perplexing “alphabet-soup” of acronyms and abbreviations used to describe the plethora of models and procedures that have been developed over the years. Hence, as a complement to these more traditional surveys, I have tried to provide an alternative and easy-to-use encyclopedic-type reference guide to the long list of ARCH acronyms. Comparing the length of this list to the list of general Acronyms in Time Series Analysis (ATSA) compiled by Granger (1983) further underscores the scope of the research efforts and new developments that have occurred in the area following the introduction of the basic linear ARCH model in Engle (1982a). Acknowledgments: I would like to acknowledge the financial support provided by a grant from the NSF to the NBER and CREATES funded by the Danish National Research Foundation. I would also like to thank Frank Diebold, Xin Huang, Andrew Patton, Neil Shephard and Natalia Sizova for valuable comments and suggestions. Of course, I am solely to blame for any errors or omissions.
137
138
Glossary to ARCH (GARCH)
My definition of what constitutes an ARCH acronym is, of course, somewhat arbitrary and subjective. In addition to the obvious cases of association of acronyms with specific parametric models, I have also included descriptions of some association of abbreviations with more general procedures and ideas that figure especially prominently in the ARCH literature. With a few exceptions, I have restricted the list of acronyms to those that have appeared in already published studies. Following Granger (1983), I have purposely not included the names of specific computer programs or procedures as these are often of limited availability and may also be sold commercially. Even though I have tried my best to be as comprehensive and inclusive as possible, I have almost surely omitted some abbreviations. To everyone responsible for an acronym that I have inadvertently left out, please accept my apology. Lastly, let me make it clear that the mere compilation of this list does not mean that I endorse the practice of associating each and every ARCH formulation with its own unique acronym. In fact, the sheer length of this list arguably suggests that the use of special names and abbreviations originally intended for easily telling different ARCH models apart might have reached a point of diminishing returns to scale. AARCH (Augmented ARCH) The AARCH model of Bera, Higgins and Lee (1992) extends the linear ARCH(q) model (see ARCH) to allow the conditional variance to depend on cross-products of the lagged innovations. Defining the q × 1 vector et−1 ≡ {εt−1 , εt−2 , . . . , εt−q }, the AARCH(q) model may be expressed as:
σt2 = ω + et−1 Aet−1 ,
where A denotes a q × q symmetric positive definite matrix. If A is diagonal, the model reduces to the standard linear ARCH(q) model. The Generalized AARCH, or GAARCH model is obtained by including lagged conditional variances on the right-handside of the equation. The slightly more general GQARCH representation was proposed independently by Sentana (1995) (see GQARCH). ACD (Autoregressive Conditional Duration) The ACD model of Engle and Russell (1998) was developed to describe dynamic dependencies in the durations between randomly occurring events. The model has found especially wide use in the analysis of highfrequency financial data and times between trades or quotes. Let xi ≡ ti − ti−1 denote the time interval between the ith and the (i-1)th event. The popular ACD(1,1) model then parameterizes the expected durations, ψi = E (xi |xi−1 , xi−2 , . . .), analogous to the conditional variance in the GARCH(1,1) model (see GARCH),
ψi = ω + αxi−1 + βψi−1 .
Higher order ACD(p,q) models are defined in a similar manner. Quasi Maximum Likelihood Estimates (see QMLE) of the parameters in the ACD(p,q) model may be obtained by applying standard GARCH(p,q) estimation procedures to yi ≡ x1i /2 , with the conditional mean fixed at zero (see also ACH and MEM). ACH1 (Autoregressive Conditional Hazard) The ACH model of Hamilton and Jord´ a (2002) is designed to capture dynamic dependencies in hazard rates, or the probability for the occurrence of specific events. The basic ACH(p,q) model without any updating of
Glossary to ARCH (GARCH)
139
the expected hazard rates between events is asymptotically equivalent to the ACD(p,q) model for the times between events (see ACD). ACH2
(Adaptive Conditional Heteroskedasticity) In parallel to the idea of allowing for time-varying variances in a sequence of normal distributions underlying the basic ARCH model (see ARCH), it is possible to allow the scale parameter in a sequence of Stable Paretian distributions to change over time. The ACH formulation for the scale parameter, ct , first proposed by McCulloch (1985) postulates that the temporal variation may be described by an exponentially weighted moving average (see EWMA) of the form, ct = α|εt−1 | + (1 − α)ct−1 .
Many other more complicated Stable GARCH formulations have subsequently been proposed and analyzed in the literature (see SGARCH). ACM (Autoregressive Conditional Multinomial) The ACM model of Engle and Russell (2005) involves an ARMA-type representation for discrete-valued multinomial data, in which the conditional transition probabilities between the different values are guaranteed to lie between zero and one and sum to unity. The ACM and ACD models (see ACD) may be combined in modeling high-frequency financial price series and other irregularly spaced discrete data.
(Asymmetric Dynamic Conditional Correlations) The ADCC GARCH model of Cappiello, Engle and Sheppard (2006) extends the DCC model (see DCC) to allow for asymmetries in the time-varying conditional correlations based on a GJR threshold-type formulation (see GJR).
ADCC
AGARCH1
(Asymmetric GARCH) The AGARCH model was introduced by Engle (1990) to allow for asymmetric effects of negative and positive innovations (see also EGARCH, GJR, NAGARCH, and VGARCH1 ). The AGARCH(1,1) model is defined by: 2 σt2 = ω + αε2t−1 + γεt−1 + βσt− 1,
where negative values of γ implies that positive shocks will result in smaller increases in future volatility than negative shocks of the same absolute magnitude. The model may alternatively be expressed as: 2 σt2 = ω + α(εt−1 + γ )2 + βσt− 1,
for which ω > 0, α ≥ 0 and β ≥ 0 readily ensures that the conditional variance is positive almost surely. AGARCH2
(Absolute value GARCH) See TS-GARCH.
(Artificial Neural Network ARCH) Donaldson and Kamstra (1997) term the GJR model (see GJR) augmented with a logistic function, as commonly used in Neural Networks, the ANN-ARCH model.
ANN-ARCH
(Asymmetric Nonlinear Smooth Transition GARCH) The ANSTGARCH(1,1) model of Nam, Pyun and Arize (2002) postulates that
ANST-GARCH
2 2 2 σt2 = ω + αε2t−1 + βi σt− 1 + [κ + δεt−1 + ρσt−1 ]F (εt−1 , γ ),
140
Glossary to ARCH (GARCH)
where F (·) denotes a smooth transition function. The model simplifies to the STGARCH(1,1) model of Gonz´ alez-Rivera (1998) for κ = ρ = 0 (see ST-GARCH) and the standard GARCH(1,1) model for κ = δ = ρ = 0 (see GARCH). APARCH (Asymmetric Power ARCH) The APARCH, or APGARCH, model of Ding, Engle and Granger (1993) nests several of the most popular univariate parameterizations. In particular, the APGARCH(p,q) model,
σtδ = ω +
q
αi (|εt−i | − γi εt−i )δ +
i=1
p i=1
δ βi σt−i ,
reduces to the standard linear GARCH(p,q) model for δ = 2 and γi = 0, the TSGARCH(p,q) model for δ = 1 and γi = 0, the NGARCH(p,q) model for γi = 0, the GJR-GARCH model for δ = 2 and 0 ≤ γi ≤ 1, the TGARCH(p,q) model for δ = 1 and 0 ≤ γi ≤ 1, while the log-GARCH(p,q) model is obtained as the limiting case of the model for δ → 0 and γi = 0 (see GARCH, TS-GARCH, NGARCH, GJR, TGARCH and log-GARCH). (AutoRegressive Conditional Density) The ARCD class of models proposed by Hansen (1994) extends the basic ARCH class of models to allow for conditional dependencies beyond the mean and variance by postulating a specific non-normal distribution for the standardized innovations zt ≡ εt σt−1 , explicitly parameterizing the shape parameters of this distribution as a function of lagged information. Most empirical applications of the ARCD model have relied on the standardized skewed Student-t distribution (see also GARCH-t and GED-GARCH). Specific examples of ARCD models include the GARCH with Skewness, or GARCHS, model of Harvey and Siddique (1999), in which the skewness is allowed to be time-varying. In particular, for the GARCHS(1,1,1) model, ARCD
st = γ0 + γ1 zt3 + γ2 st−1 ,
where st ≡ Et−1 (zt3 ). Similarly, the GARCH with Skewness and Kurtosis, or GARCHSK, model of Le´on, Rubio and Serna (2005), parameterizes the conditional kurtosis as: kt = δ0 + δ1 zt4 + δ2 kt−1 ,
where kt ≡ Et−1 (zt4 ). ARCH (AutoRegressive Conditional Heteroskedastcity) The ARCH model was originally developed by Engle (1982a) to describe UK inflationary uncertainty. However, the ARCH class of models has subsequently found especially wide use in characterizing time-varying financial market volatility. The ARCH regression model for yt first analyzed in Engle (1982a) is defined by:
yt |Ft−1 ∼ N (xt β, σt2 ),
where Ft−1 refers to the information set available at time t − 1, and the conditional variance, σt2 = f (εt−1 , εt−2 , . . . , εt−p ; θ),
is an explicit function of the p lagged innovations, εt ≡ yt − xt β . Using a standard prediction error decomposition-type argument, the log-likelihood function for the ARCH
Glossary to ARCH (GARCH)
141
model may be expressed as: Log L(yT , yt−1 , . . . , y1 ; β, θ ) = −
T 2
log(2π ) −
T 1
2
t=1
[log(σt2 ) + (yt − xt β )σt−2 ].
Even though analytical expressions for the Maximum Likelihood Estimates (see also QMLE) are not available in closed form, numerical procedures may readily be used to maximize the function. The q th-order linear ARCH(q) model suggested by Engle (1982a) provides a particularly convenient and natural parameterizarion for capturing the tendency for large (small) variances to be followed by other large (small) variances, σt2 = ω +
q i=1
αi ε2t−i ,
where for the conditional variance to be non-negative and the model well defined ω has to be positive and all of the αi s non-negative. Most of the early empirical applications of ARCH models, including Engle (1982a), were based on the linear ARCH(q) model with the additional constraint that the αi s decline linearly with the lag, σt2 = ω + α
q i=1
(q + 1 − i)ε2t−i ,
in turn requiring the estimation of only a single α parameter irrespective of the value of q . More generally, any nontrivial measurable function of the time t − 1 information set, σt2 , such that εt = σt zt ,
where zt is a sequence of independent random variables with mean zero and unit variance, is now commonly referred to as an ARCH model. ARCH-Filters ARCH and GARCH models may alternatively be given a nonparametric interpretation as discrete-time filters designed to extract information about some underlying, possibly latent continuous-time, stochastic volatility process. Issues related to the design of consistent and asymptotically optimal ARCH-Filters have been studied extensively by Nelson (1992, 1996a) and Nelson and Foster (1994). For instance, the asymptotically efficient filter (in a mean-square-error sense for increasingly finer sample observations) for the instantaneous volatility in the GARCH diffusion model (see GARCH Diffusion) is given by the discrete-time GARCH(1,1) model (see also ARCH-Smoothers).
(ARCH Nonstationary Nonlinear Heteroskedasticity) The ARCH-NNH model of Han and Park (2008) includes a nonlinear function of a near or exact unit root process, xt , in the conditional variance of the ARCH(1) model,
ARCH-NNH
σt2 = αε2t−1 + f (xt ).
The model is designed to capture slowly decaying stochastic long run volatility dependencies (see also CGARCH1 , FIGARCH, IGARCH). (ARCH-in-Mean) The ARCH-M model was first introduced by Engle, Lilien and Robins (1987) for modeling risk-return tradeoffs in the term structure of US interest
ARCH-M
142
Glossary to ARCH (GARCH)
rates. The model extends the ARCH regression model in Engle (1982a) (see ARCH) by allowing the conditional mean to depend directly on the conditional variance, yt |Ft−1 ∼ N (xt β + δσt2 , σt2 ).
This breaks the block-diagonality between the parameters in the conditional mean and the parameters in the conditional variance, so that the two sets of parameters must be estimated jointly to achieve asymptotic efficiency. Nonlinear functions of the conditional variance may be included in the conditional mean in a similar fashion. The final preferred model estimated in Engle, Lilien and Robins (1987) parameterizes the conditional mean
as a function of log σt2 . Multivariate extensions of the ARCH-M model were first analyzed and estimated by Bollerslev, Engle and Wooldridge (1988) (see also MGARCH1 ). (ARCH Stochastic Mean) The ARCH-SM acronym was coined by Lee and Taniguchi (2005) to distinguish ARCH models in which εt ≡ yt − Et−1 (yt ) = yt − E (yt ) (see ARCH). ARCH-SM
ARCH-Smoothers, first developed by Nelson (1996b) and Foster and Nelson (1996), extend the ARCH and GARCH models and corresponding ARCH-Filters based solely on past observations (see ARCH-Filters) to allow for the use of both current and future observations in the estimation of the latent volatility.
ARCH-Smoothers
ATGARCH (Asymmetric Threshold GARCH) The ATGARCH(1,1) model of Crouhy and Rockinger (1997) combines and extends the TS-GARCH(1,1) and GJR(1,1) models (see TS-GARCH and GJR) by allowing the threshold used in characterizing the asymmetric response to differ from zero,
σt = ω + α|εt−1 |I (εt−1 ≥ γ ) + δ|εt−1 |I (εt−1 < γ ) + βσt−1 .
Higher order ATGARCH(p,q) models may be defined analogously (see also AGARCH and TGARCH). (Augmented GARCH) The Aug-GARCH model developed by Duan (1997) nests most of the popular univariate parameterizations, including the standard linear GARCH model, the Multiplicative GARCH model, the Exponential GARCH model, the GJR-GARCH model, the Threshold GARCH model, the Nonlinear GARCH model, the Taylor–Schwert GARCH model, and the VGARCH model (see GARCH, MGARCH2 , EGARCH, GJR, TGARCH, NGARCH, TS-GARCH and VGARCH1 ). The Aug-GARCH(1,1) model may be expressed as: Aug-GARCH
σt2 = |λϕt − λ + 1|I (λ = 0) + exp(ϕt − 1)I (λ = 0),
where ϕt = ω + α1 |zt−1 − κ|δ ϕt−1 + α2 max(0, κ − zt−1 )δ ϕt−1 + α3 (|zt−1 − κ|δ − 1)/δ + α4 (max(0, κ − zt−1 )δ − 1)/δ + βϕt−1 ,
and zt ≡ εt σt−1 denotes the corresponding standardized innovations. The basic GARCH(1,1) model is obtained by fixing λ = 1, κ = 0, δ = 2 and α2 = α3 = α4 = 0,
Glossary to ARCH (GARCH)
143
whereas the EGARCH model corresponds to λ = 0, κ = 0, δ = 1 and α1 = α2 = 0 (see also HGARCH). AVGARCH
(Absolute Value GARCH) See TS-GARCH.
β -ARCH (Beta ARCH) The β -ARCH(1) model of Gu´ egan and Diebolt (1994) allows the conditional variance to depend asymmetrically on positive and negative lagged innovations, ·β ˙ (εt−1 > 0) + γI (εt−1 < 0)]ε2t− σt2 = ω + [αI 1,
where I (·) denotes the indicator function. For α = γ and β = 1 the model reduces to the standard linear ARCH(1) model. More general β -ARCH(q) and β -GARCH(p,q) models may be defined in a similar fashion (see also GJR, TGARCH, and VGARCH1 ). (Baba, Engle, Kraft and Kroner) The BEKK acronym refers to a specific parameteriztion of the multivariate GARCH model (see MGARCH1 ) developed in Engle and Kroner (1995). The simplest BEKK representation for the N × N conditional covariance matrix Ωt takes the form:
BEKK
Ωt = C C + A εt−1 εt−1 A + B Ωt−1 B,
where C denotes an upper triangular N × N matrix, and A and B are both unrestricted N ×N matrices. This quadratic representation automatically guarantees that Ωt is positive definite. The reference to Y. Baba and D. Kraft in the acronym stems from an earlier unpublished four-authored working paper. BGARCH
(Bivariate GARCH) See MGARCH1 .
CARR (Conditional AutoRegressive Range) The CARR(p,q) model proposed by Chou (2005) postulates a GARCH(p,q) structure (see GARCH) for the dynamic dependencies in time series of high–low asset prices over some fixed time interval. The model is essentially analogous to the ACD model (see ACD) for the times between randomly occurring events (see also REGARCH).
(Conditional Autoregressive Value at Risk) The CAViaR model of Engle and Manganelli (2004) specifies the evolution of a particular conditional quantile of a time series, say ft where Pt−1 (yt ≤ ft ) = p for some pre-specified fixed level p, as an autoregressive process. The indirect GARCH(1,1) model parameterizes the conditional quantiles as:
CAViaR
2 2 ft = ω + αyt− 1 + βft−1
1/ 2
.
This formulation would be correctly specified if the underlying process for yt follows a GARCH(1,1) model with i.i.d. standardized innovations (see GARCH). Alternative models allowing for asymmetries may be specified in a similar manner. The CAViaR model was explicitly developed for predicting quantiles in financial asset return distributions, or so-called Value-at-Risk. CCC (Constant Conditional Correlations) The N × N conditional covariance matrix for the N × 1 vector process εt , say Ωt , may always be decomposed as:
Ωt = Rt Dt Rt ,
144
Glossary to ARCH (GARCH)
where Rt denotes the N × N matrix of conditional correlations with typical element ρijt =
Covt−1 (εit , εjt ) , Vart−1 (εit )1/2 Vart−1 (εjt )1/2
and Dt denotes the N × N diagonal matrix with typical element Var t−1 (εit ). The CCC GARCH model of Bollerslev (1990) assumes that the conditional correlations are constant ρijt = ρij , so that the temporal variation in Ωt is determined solely by the time-varying conditional variances for each of the elements in εt . This assumption greatly simplifies the inference, requiring only the nonlinear estimation of N univariate GARCH models, whereas Rt = R may be estimated by the sample correlations of the corresponding standardized residuals. Moreover, as long as each of the conditional variances are positive, the CCC model guarantees that the resulting conditional covariance matrices are positive definite (see also DCC and MGARCH1 ). Censored-GARCH
See Tobit-GARCH.
CGARCH1
(Component GARCH) The component GARCH model of Engle and Lee (1999) was designed to better account for long run volatility dependencies. Rewriting the GARCH(1,1) model as:
2 2 σt2 − σ 2 = α ε2t−1 − σ 2 + β σt− , 1 −σ
where σ 2 ≡ ω/(1 − α − β ) refers to the unconditional variance, the CGARCH model is obtained by relaxing the assumption of a constant σ 2 . Specifically,
2 2 2 σt2 − ζt2 = α ε2t−1 − ζt− 1 + β σt−1 − ζt−1 ,
with the corresponding long run variance parameterized by the separate equation,
2 2 2 ζt2 = ω + ρζt− 1 + ϕ εt−1 − σt−1 .
Substituting this expression for ζt2 into the former equation, the CGARCH model may alternatively be expressed as a restricted GARCH(2,2) model (see also FIGARCH). CGARCH2
(Composite GARCH) The CGARCH model of den Hertog (1994) represents
ε2t as the sum of a latent permanent random walk component and another latent AR(1)
component. (Continuous GARCH) The continuous-time COGARCH(1,1) model proposed by Kl¨ uppelberg, Lindner and Maller (2004) may be expressed as,
COGARCH
dy (t) = σ (t)dL(t),
and σ 2 (t) = [σ 2 (0) + ω
t
0
exp(x(s))ds] exp(−x(t− )),
where x(t) = −t log β −
log[1 + α exp(− log β )ΔL(s)2 ].
0<s≤t
The model is obtained by backward solution of the difference equation defining the discrete-time GARCH(1,1) model (see GARCH), replacing the standardized innovations by the increments to the L´evy process, L(t). In contrast to the GARCH diffusion model
Glossary to ARCH (GARCH)
145
of Nelson (1990b) (see GARCH Diffusion), which involves two independent Brownian motions, the COGARCH model is driven by a single innovation process. Higher order COGARC(p,q) processes have been developed by Brockwell, Chadraa and Lindner (2006) (see also ECOGARCH). Any joint distribution function may be expressed in terms of its marginal distribution functions and a copula function linking these. The class of copula GARCH models builds on this idea in the formulation of multivariate GARCH models (see MGARCH1 ) by linking univariate GARCH models through a sequence of possibly timevarying conditional copulas. For further discussion of estimation and inference in copula GARCH models, see, e.g., Jondeau and Rockinger (2006) and Patton (2006a) (see also CCC and DCC). Copula GARCH
(Correlated ARCH) The bivariate CorrARCH model of Christodoulakis and Satchell (2002) parameterizes the time-varying conditional correlations as a distributed lag of the product of the standardized innovations from univariate GARCH models for each of the two series. A Fisher transform is used to ensure that the resulting correlations always lie between −1 and 1 (see also CCC, DCC and MGARCH1 ).
CorrARCH
DAGARCH (Dynamic Asymmetric GARCH) The DAGARCH model of Caporin and McAleer (2006) extends the GJR-GARCH model (see GJR) to allow for multiple thresholds and time-varying asymmetric effects (see also AGARCH, ATGARCH and TGARCH). DCC (Dynamic Conditional Correlations) The multivariate DCC-GARCH model of Engle (2002a) extends the CCC model (see CCC) by allowing the conditional correlations to be time-varying. To facilitate the analysis of large dimensional systems, the basic DCC model postulates that the temporal variation in the conditional correlations may be described by exponential smoothing (see EWMA) so that
ρijt =
qijt , 1/ 2 1/ 2 qiit qjjt
where qijt = (1 − λ)εit−1 εjt−1 + λqijt−1 ,
and εt denotes the N × 1 vector innovation process. A closely related formulation was proposed independently by Tse and Tsui (2002), who refer to their approach as a Varying Conditional Correlation, or VCC-MGARCH model (see also ADCC, CorrARCH, FDCC and MGARCH1 ). (diagonal GARCH) The diag MGARCH model refers to the simplification of the vech GARCH model (see vech GARCH) in which each of the elements in the conditional covariance matrix depends on its own past values and the products of the corresponding elements in the innovation vector only. The model is conveniently expressed in terms of Hadamard products, or matrix element-by-element multiplication. In particular, for the diag MGARCH(1,1) model,
diag MGARCH
Ωt = C o + Ao εt−1 εt−1 + B o Ωt−1 .
146
Glossary to ARCH (GARCH)
It follows (see Attanasio, 1991) that if each of the three N × N matrices C o , Ao and B o are positive definite, the conditional covariance matrix will also be positive definite (see also MGARCH1 ). DTARCH (Double Threshold ARCH) The DTARCH model of Li and Li (1996) allows the parameters in both the conditional mean and the conditional variance to change across regimes, with the m different regimes determined by a set of threshold parameters for some lag k ≥ 1 of the observed yt process, say rj−1 < yt−k ≤ rj , where −∞ = r0 < r1 < . . . < rm = ∞ (see also TGARCH). DVEC-GARCH
(Diagonal VECtorized GARCH) See diag MGARCH.
(Exponential Continuous GARCH) The continuous-time ECOGARCH model developed by Haug and Czado (2007) extends the L´evy driven COGARCH model of Kl¨ uppelberg, Lindner and Maller (2004) (see COGARCH) to allow for different impact of positive and negative jump innovations, or so-called leverage effects. The model may be seen as a continuous-time analog of the discrete-time EGARCH model (see also EGARCH, GJR and TGARCH).
ECOGARCH
(Exponential GARCH) The EGARCH model was developed by Nelson (1991). The model explicitly allows for asymmetries in the relationship between return and volatility (see also GJR and TGARCH). In particular, let zt ≡ εt σt−1 denote the standardized innovations. The EGARCH (1,1) model may then be expressed as:
EGARCH
2 log(σt2 ) = ω + α(|zt−1 | − E (|zt−1 |)) + γzt−1 + β log(σt− 1 ).
For γ < 0 negative shocks will obviously have a bigger impact on future volatility than positive shocks of the same magnitude. This effect, which is typically observed empirically with equity index returns, is often referred to as a “leverage effect,” although it is now widely agreed that the apparent asymmetry has little to do with actual financial leverage. By parameterizing the logarithm of the conditional variance as opposed to the conditional variance, the EGARCH model also avoids complications from having to ensure that the process remains positive. This is especially useful when conditioning on other explanatory variables. Meanwhile, the logarithmic transformation complicates the construction of unbiased forecasts for the level of future variances (see also GARCH and log-GARCH). (Extreme Value Theory GARCH) The EVT-GARCH approach pioneered by McNeil and Frey (2000), relies on extreme value theory for i.i.d. random variables and corresponding generalized Pareto distributions for more accurately characterizing the tails of the distributions of the standardized innovations from GARCH models. This idea may be used in the calculation of low-probability quantile, or Value-at-Risk, type predictions (see also CAViaR, GARCH-t and GED-GARCH).
EVT-GARCH
EWMA (Exponentially Weighted Moving Average) EWMA variance measures are defined
by the recursion, 2 σt2 = (1 − λ)ε2t−1 + λσt− 1.
EWMA may be seen as a special case of the GARCH(1,1), or IGARCH(1, 1), model in which ω ≡ 0, α ≡ 1 − λ and β ≡ λ (see GARCH and IGARCH). EWMA covariance measures are readily defined in a similar manner. The EWMA approach to variance
Glossary to ARCH (GARCH)
147
estimation was popularized by RiskMetrics, advocating the use of λ = 0.94 with daily financial returns. F-ARCH (Factor ARCH) The multivariate factor ARCH model developed by Diebold and Nerlove (1989) (see also Latent GARCH) and the factor GARCH model of Engle, Ng and Rothschild (1990) assumes that the temporal variation in the N × N conditional covariance matrix for a set of N returns can be described by univariate GARCH models for smaller set of K < N portfolios,
Ωt = Ω +
K k=1
2 λk λk σkt ,
2 refer to the time invariant N × 1 vector of factor loadings and time where λk and σkt t conditional variance for the kth factor, respectively. More specifically, the F-GARCH (1,1) model may be expressed as:
Ωt = Ω + λλ [βw Ωt−1 w + α(w εt−1 )2 ]
where w denotes an N × 1 vector, and α and β are both scalar parameters (see also OGARCH and MGARCH1 ). (Flexible Coefficient GARCH) The FCGARCH model of Medeiros and Veiga (2009) defines the conditional variance as a linear combination of standard GARCH-type models, with the weights assigned to each model determined by a set of logistic functions. The model nests several alternative smooth transition and asymmetric GARCH models as special limiting cases, including the DTARCH, GJR, STGARCH, TGARCH, and VSGARCH models.
FCGARCH
FDCC (Flexible Dynamic Conditional Correlations) The FDCC-GARCH model of Billio, Caporin and Gobbo (2006) generalizes the basic DCC model (see DCC) to allow for different dynamic dependencies in the time-varying conditional correlations (see also ADCC). FGARCH
(Factor GARCH) See F-ARCH.
FIAPARCH (Fractionally Integrated Power ARCH) The FIAPARCH (p,d,q) model of Tse (1998) combines the FIGARCH (p,d,q) and the APARCH (p,q) models in parameterizing σtδ as a fractionally integrated distributed lag of (|εt | − γεt )δ (see FIGARCH and APARCH). FIEGARCH (Fractionally Integrated EGARCH) The FIEGARCH model of Bollerslev and Mikkelsen (1996) imposes a fractional unit root in the autoregressive polynomial in the ARMA representation of the EGARCH model (see EGARCH). In particular, the FIEGARCH (1,d,1) model may be conveniently expressed as: (1 − βL)(1 − L)d log(σt2 ) = ω + α(|zt−1 | − E (|zt−1 |)) + γzt−1 .
For 0 < d < 1 this representation implies fractional integrated slowly decaying hyperbolic
dependencies in log σt2 (see also FIGARCH, HYGARCH and LMGARCH). FIGARCH (Fractionally Integrated GARCH) The FIGARCH model proposed by Baillie, Bollerslev and Mikkelsen (1996) relies on an ARFIMA-type representation to better capture the long run dynamic dependencies in the conditional variance. The model may
148
Glossary to ARCH (GARCH)
be seen as natural extension of the IGARCH model (see IGARCH), allowing for fractional orders of integration in the autoregressive polynomial in the corresponding ARMA representation, ϕ(L)(1 − L)d ε2t = ω + (1 − β (L))νt ,
where vt ≡ ε2t − σt2 , 0 < d < 1, and the roots of ϕ(z ) = 0 and β (z ) = 1 are all outside the unit circle. For values of 0 < d < 1/2 the model implies an eventual slow hyperbolic decay in the autocorrelations for σt2 (see also FIEGARCH, HYGARCH and LMGARCH). FIREGARCH
(Fractionally Integrated Range EGARCH) See REGARCH.
(Flexible GARCH) The multivariate Flex-GARCH model of Ledoit, Santa-Clara and Wolf (2003) is designed to reduce the computational burden involved in the estimation of multivariate diagonal MGARCH model (see diag MGARCH). This is accomplished by estimating a set of bivariate MGARCH models for each of the N (N +1)/2 possible different pairwise combinations of the N variables, and then subsequently “paste” together the parameter estimates subject to the constraint that the resulting parameter matrices for the full N -dimensional MGARCH model guarantee positive semidefinite conditional covariance matrices. FLEX-GARCH
GAARCH
(Generalized Augmented ARCH) See AARCH.
(Generalized AutoRegressive Conditional Heteroskedasticity) The GARCH (p,q) model of Bollerslev (1986) includes p lags of the conditional variance in the linear ARCH(q) (see ARCH) conditional variance equation,
GARCH
σt2 = ω +
q i=1
αi ε2t−i +
p i=1
2 βi σt−i .
Conditions on the parameters to ensure that the GARCH(p,q) conditional variance is always positive are given in Nelson and Cao (1992). The GARCH(p,q) model may alternatively be represented as an ARMA(max{p, q}, p) model for the squared innovation: ε2t = ω +
max{p,q}
p
i=1
i=1
(αi + βi )ε2t−i −
βi νt−i ,
where νt ≡ ε2t − σt2 , so that by definition Et−1 (vt ) = 0. The relatively simple GARCH(1,1) model, 2 σt2 = ω + αε2t−1 + βσt− 1,
often provides a good fit in empirical applications. This particular parameterization was also proposed independently by Taylor (1986). The GARCH(1,1) model is well defined and the conditional variance positive almost surely provided that ω > 0, α ≥ 0 and β ≥ 0. The GARCH(1,1) model may alternatively be expressed as an ARCH(∞) model, σt2 = ω (1 − β )−1 + α
∞ i=1
β i−1 ε2t−i ,
provided that β < 1. If α + β < 1 the model is covariance stationary and the unconditional variance equals σ 2 ≡ ω/(1 − α − β ). Multiperiod conditional variance forecasts from the
Glossary to ARCH (GARCH)
149
GARCH(1,1) model may readily be calculated as: σt2+h|t = σ 2 + (α + β )h−1 (σt2+1 − σ 2 ),
where h ≥ 2 denotes the horizon of the forecast. GARCH-Δ
(GARCH Delta) See GARCH-Γ .
GARCH Diffusion
The continuous-time GARCH diffusion model is defined by: dy (t) = σ (t)dW1 (t),
and dσ 2 (t) = (ω − θσ 2 (t))dt +
√
2ασ 2 (t)dW2 (t),
where the two Wiener processes, W1 (t) and W2 (t), that drive the observable y (t) process and the instantaneous latent volatility process, σ 2 (t), are assumed to be independent. As shown by Nelson (1990b), the sequence of GARCH(1,1) models defined over discrete time intervals of length 1/n, 2 2 σt,n = (ω/n) + (α/n1/2 )ε2t−1/n,n + (1 − α/n1/2 − θ/n)σt− 1/n,n ,
where εt,n ≡ y (t) − y (t − 1/n), converges weakly to a GARCH diffusion model for n → ∞ (see also COGARCH and ARCH-Filters). (GARCH Exponential AutoRegression) The GARCH-EAR model of LeBaron (1992) allows the first order serial correlation of the underlying process to depend directly on the conditional variance,
GARCH-EAR
yt = ϕ0 + [ϕ1 + ϕ2 exp −σt2 /ϕ3 ]yt−1 + εt .
For ϕ2 = 0 the model reduces to a standard AR(1) model, but for ϕ2 > 0 and ϕ3 > 0 the magnitude of the serial correlation in the mean will be a decreasing function of the conditional variance (see also ARCH-M). GARCH-Γ (GARCH Gamma) The gamma of an option is defined as the second derivative of the option price with respect to the price of the underlying asset. Options gamma play an important role in hedging volatility risk embedded in options positions. GARCH-Γ refers to the gamma obtained under the assumption that the return on the underlying asset follows a GARCH process. Engle and Rosenberg (1995) find that GARCH-Γ s are typically much higher than conventional Black–Scholes gammas. Meanwhile, GARCH-Δs, or the first derivative of the option price with respect to the price of the underlying asset, tend to be fairly close to their Black–Scholes counterparts. GARCH-M GARCHS
(GARCH in Mean) See ARCH-M.
(GARCH with Skewness) See ARCD.
GARCHSK
(GARCH with Skewness and Kurtosis) See ARCD.
(GARCH t-distribution) ARCH models are typically estimated by maximum likelihood under the assumption that the errors are conditionally normally distributed (see ARCH). However, in many empirical applications the standardized residuals, εˆt σˆt−1 , appear to have fatter tails than the normal distribution. The GARCH-t model of
GARCH-t
150
Glossary to ARCH (GARCH)
Bollerslev (1987) relaxes the assumption of conditional normality by instead assuming that the standardized innovations follow a standardized Student t-distribution. The corresponding log Likelihood function may be expressed as: LogL(θ) =
−1 ν+1 ν log Γ Γ ((ν − 2)σt2 )−1/2 (1 + (ν − 2)−1 σt−2 ε2t )−(ν +1)/2 , 2 2 t=1 T
where ν > 2 denotes the degrees of freedom to be estimated along with the parameters in the conditional variance equation (see also GED-GARCH, QMLE and SPARCH). GARCH-X1
The multivariate GARCH-X model of Lee (1994) includes the error correction term from a cointegrating-type relationship for the underlying vector process yt ∼ I (1), say zt−1 = b yt−1 ∼ I (0), as an explanatory variable in the conditional covariance matrix (see also MGARCH1 ). GARCH-X2
The GARCH-X model proposed by Brenner, Harjes and Kroner (1996) for modeling short-term interest rates includes the lagged interest rate raised to some power, γ say δrt− 1 , as an explanatory variable in the GARCH conditional variance equation (see GARCH). The GARCHX model proposed by Hwang and Satchell (2005) for modeling aggregate stock market return volatility includes a measure of the lagged cross-sectional return variation as an explanatory variable in the GARCH conditional variance equation (see GARCH).
GARCHX
Maheu and McCurdy (2004) refer to the standard GARCH model (see GARCH) augmented with occasional Poisson distributed “jumps” or large moves, where the timevarying jump intensity is determined by a separate autoregressive process, as a GARJI model.
GARJI
GDCC (Generalized Dynamic Conditional Correlations) The multivariate GDCCGARCH model of Cappiello, Engle and Sheppard (2006) utilizes a more flexible BEKK-type parameterization (see BEKK) for the dynamic conditional correlations (see DCC). Combining the ADCC (see ADCC) and the GDCC models results in an AGDCC model (see also FDCC).
(Generalized Error Distribution GARCH) The GED-GARCH model of Nelson (1991) replaces the assumption of conditionally normal errors traditionally used in the estimation of ARCH models with the assumption that the standardized innovations follow a generalized error distribution, or what is also sometimes referred to as an exponential power distribution (see also GARCH-t).
GED-GARCH
GJR (Glosten, Jagannathan and Runkle GARCH) The GJR-GARCH, or just GJR, model of Glosten, Jagannathan and Runkle (1993) allows the conditional variance to respond differently to the past negative and positive innovations. The GJR(1,1) model may be expressed as: 2 σt2 = ω + αε2t−1 + γε2t−1 I (εt−1 < 0) + βσt− 1,
where I (·) denotes the indicator function. The model is also sometimes referred to as a Sign-GARCH model. The GJR formulation is closely related to the Threshold GARCH, or TGARCH, model proposed independently by Zako¨ıan (1994) (see TGARCH), and
Glossary to ARCH (GARCH)
151
the Asymmetric GARCH, or AGARCH, model of Engle (1990) (see AGARCH). When estimating the GJR model with equity index returns, γ is typically found to be positive, so that the volatility increases proportionally more following negative than positive shocks. This asymmetry is sometimes referred to in the literature as a “leverage effect,” although it is now widely agreed that it has little to do with actual financial leverage (see also EGARCH). (Generalized Orthogonal GARCH) The multivariate GO-GARCH model of van der Weide (2002) assumes that the temporal variation in the N × N conditional covariance matrix may be expressed in terms of N conditionally uncorrelated components,
GO-GARCH
Ωt = XDt X ,
where X denotes a N × N matrix, and Dt is diagonal with the conditional variances for each of the components along the diagonal. This formulation permits estimation by a relatively easy-to-implement two-step procedure (see also F-ARCH, GO-GARCH and MGARCH1 ). GQARCH (Generalized Quadratic ARCH) The GQARCH(p,q) model of Sentana (1995) is defined by:
σt2 = ω +
q i=1
ψi εt−i +
q i=1
αi ε2t−i + 2
q q
αij εt−i εt−j +
i=1 j =i+1
q i=1
2 βi σt−i .
The model simplifies to the linear GARCH(p,q) model if all of the ψi s and the αij s are equal to zero. Defining the q × 1 vector et−1 ≡ {εt−1 , εt−2 , . . . , εt−q }, the model may alternatively be expressed as: σt2 = ω + Ψ et−1 + et−1 A et−1 +
q i=1
2 βi σt−i ,
where Ψ denotes the q×1 vector of ψi coefficients and A refers to the q×q symmetric matrix of αi and αij coefficients. Conditions on the parameters for the conditional variance to be positive almost surely and the model well defined are given in Sentana (1995) (see also AARCH). GQTARCH
(Generalized Qualitative Threshold ARCH) See QTARCH.
(Generalized Regime-Switching GARCH) The RGS-GARCH model proposed by Gray (1996) allows the parameters in the GARCH model to depend upon an unobservable latent state variable governed by a first order Markov process. By aggregating the conditional variances over all of the possible states at each point in time, the model is formulated in such a way that it breaks the path-dependence, which complicates the estimation of the SWARCH model of Cai (1994) and Hamilton and Susmel (1994) (see SWARCH).
GRS-GARCH
HARCH (Heterogeneous ARCH) The HARCH(n) model of M¨ uller, Dacorogna, Dav´e, Olsen, Puctet and von Weizs¨acker (1997) parameterizes the conditional variance as a function of the square of the sum of lagged innovations, or the squared lagged returns,
152
Glossary to ARCH (GARCH)
over different horizons, σt2 = ω +
n
⎛
γi ⎝
i=1
i
⎞2
εt−j ⎠ .
j =1
The model is motivated as arising from the interaction of traders with different investment horizons. The HARCH model may be interpreted as a restricted QARCH model (see GQARCH). HESTON GARCH
See SQR-GARCH.
HGARCH (Hentschel GARCH) The HGARCH model of Hentschel (1995) is based on a Box-Cox transform of the conditional standard deviation. It is explicitly designed to nest some of the most popular univariate parameterizations. The HGARCH(1,1) model may be expressed as:
ν
−1 −1 δ σtδ = ω + αδσt− 1 εt−1 σt−1 − κ − γ (εt−1 σt−1 − κ)
δ + βσt− 1.
The model obviously reduces to the standard linear GARCH(1,1) model for δ = 2, ν = 2, κ = 0 and γ = 0, but it also nests the APARCH, AGARCH1 , EGARCH, GJR, NGARCH, TGARCH, and TS-GARCH models as special cases (see also Aug-GARCH). HYGARCH (Hyperbolic GARCH) The HYGARCH model proposed by Davidson (2004) nests the GARCH, IGARCH and FIGARCH models (see GARCH, IGARCH and FIGARCH). The model is defined in terms of the ARCH(∞) representation (see also LARCH),
σt2 = ω +
∞ i=1
αi ε2t−1 ≡ ω + 1 −
δ (L) (1 + α((1 − L)d − 1)) ε2t−1 . β (L)
The standard GARCH and FIGARCH models correspond to α = 0, and α = 1 and 0 < d < 1, respectively. For d = 1 the HYGARCH model reduces to a standard GARCH or an IGARCH model depending upon whether α < 1 or α = 1. (Integrated GARCH) Estimates of the standard linear GARCH (p,q) model (see GARCH) often results in the sum of the estimated αi and βi coefficients being close to unity. Rewriting the GARCH(p,q) model as an ARMA (max {p,q},p) model for the squared innovations, IGARCH
(1 − α(L) − β (L))ε2t = ω + (1 − β (L))νt
where νt ≡ ε2t − σt2 , and α(L) and β (L) denote appropriately defined lag polynomials, the IGARCH model of Engle and Bollerslev (1986) imposes an exact unit root in the corresponding autoregressive polynomial, (1 −α(L) −β (L)) = ϕ(L)(1 − L), so that the model may be written as: ϕ(L)(1 − L)ε2t = ω + (1 − β (L))νt .
Even though the IGARCH model is not covariance stationary, it is still strictly stationary with a well-defined nondegenerate limiting distribution; see Nelson (1990a). Also, as shown by Lee and Hansen (1994) and Lumsdaine (1996), standard inference procedures
Glossary to ARCH (GARCH)
153
may be applied in testing the hypothesis of a unit root, or α(1) + β (1) = 1 (see also FIGARCH). (Implied Volatility) Implied volatility refers to the volatility that would equate the theoretical price of an option according to some valuation model, typically Black–Scholes, to that of the actual market price of the option.
IV
LARCH
(Linear ARCH) The ARCH (∞) representation, σt2 = ω +
∞ i=1
αi ε2t−1 ,
is sometimes referred to as a LARCH model. This representation was first used by Robinson (1991) in the derivation of general tests for conditional heteroskedasticity. Latent GARCH Models formulated in terms of latent variables that adhere to GARCH structures are sometimes referred to as latent GARCH, or unobserved GARCH, models. A leading example is the N -dimensional factor ARCH model of Diebold and Nerlove (1989), εt = λft + ηt , where λ and ηt denote N × 1 vectors of factor loadings and i.i.d. innovations, respectively, and the conditional variance of ft is determined by an ARCH model in lagged squared values of the latent factor (see also F-ARCH). Models in which the innovations are subject to censoring is another example (see Tobit-GARCH). In contrast to standard ARCH and GARCH models, for which the likelihood functions are readily available through a prediction error decomposition-type argument (see ARCH), the likelihood functions for latent GARCH models are generally not available in closed form. General estimation and inference procedures for latent GARCH models based on Markov Chain Monte Carlo methods have been developed by Fiorentini, Sentana and Shephard (2004) (see also SV). Level-GARCH The Level-GARCH model proposed by Brenner, Harjes and Kroner (1996) for modeling the conditional variance of short-term interest rates postulates that 2γ σt2 = ψt2 rt− 1,
where ψt follows a GARCH(1,1) structure, 2 ψt2 = ω + αε2t−1 + βψt− 1.
For γ = 0 the model obviously reduces to a standard GARCH(1,1) model. The LevelGARCH model is also sometimes referred to as the Time-Varying Parameter Level, or TVP-Level, model (see also GARCH and GARCH-X2 ). LGARCH1
(Leverage GARCH) The GJR model is sometimes referred to as a LGARCH model (see GJR). LGARCH2
(Linear GARCH) The standard GARCH(p,q) model (see GARCH) in which the conditional variance is a linear function of p own lags and q lagged squared innovations is sometimes referred to as a LGARCH model. LMGARCH
(Long Memory GARCH) The LMGARCH(p,d,q) model is defined by, σt2 = ω + [β (L)ϕ(L)−1 (1 − L)−d − 1]νt ,
154
Glossary to ARCH (GARCH)
where νt ≡ ε2t − σt2 , and 0 < d < 0.5. Provided that the fourth order moment exists, the resulting process for ε2t is covariance stationary and exhibits long memory. For further discussion and comparisons with the FIGARCH model see Conrad and Karanasos (2006) (see also FIGARCH and HYGARCH). (logarithmic GARCH) The log-GARCH(p,q) model, which was suggested independently in slightly different forms by Geweke (1986), Pantula (1986) and Milhøj (1987), parameterizes the logarithmic conditional variance as a function of the lagged logarithmic variances and the lagged logarithmic squared innovations,
log-GARCH
2 log σt2 = ω + αi log ε2t−i + βi log σt−i . q
p
i=1
i=1
The model may alternatively be expressed as: σt2 = exp(ω )
q )
i=1
ε2t−i
p αi ) i=1
2 σt−i
βi
.
In light of this alternative representation, the model is also sometimes referred to as a Multiplicative GARCH, or MGARCH, model. (Moving Average Conditional Heteroskedastic) The MACH(p) class of models proposed by Yang and Bewley (1995) is formally defined by the condition:
MACH
Et σt2+i = E σt2+i
i > p,
so that the effect of a shock to the conditional variance lasts for at most p periods. More specifically, the Linear MACH(1), or L-MACH(1), model is defined by σt2 = ω + α(εt−1 /σt−1 )2 . Higher order L-MACH(p) models, Exponential MACH(p), or EMACH(p), models, Quadratic MACH(p), or Q-MACH(p), models, may be defined in a similar manner (see also EGARCH and GQARCH). The standard linear ARCH(1) model, σt2 = ω + αε2t−1 , is not a MACH(1) process. MAR-ARCH
(Mixture AutoRegressive ARCH) See MGARCH3 .
MARCH1
(Modified ARCH) Friedman, Laibson and Minsky (1989) denote the class of GARCH(1,1) models in which the conditional variance depends nonlinearly on the lagged squared innovations as Modified ARCH models,
2 σt2 = ω + αF ε2t−1 + βσt− 1,
where F (·) denotes a positive valued function. In their estimation of the model Friedman, Laibson and Minsky (1989) use the function F (x) = sin(θx) · I (θx < π/2) + 1 · I (θx ≥ π/2) (see also NGARCH). MARCH2
(Multiplicative ARCH) See MGARCH2 .
Matrix EGARCH The multivariate matrix exponential GARCH model of Kawakatsu (2006) (see also EGARCH and MGARCH1 ) specifies the second moment dynamics in terms of the matrix logarithm of the conditional covariance matrix. More specifically, let ht = vech(log Ωt ) denote the N (N + 1)/2 × 1 vector of unique elements in log Ωt , where the logarithm of a matrix is defined by the inverse of the power series expansion used
Glossary to ARCH (GARCH)
155
in defining the matrix exponential. A simple multivariate matrix EGARCH extension of the univariate EGARCH(1,1) model may then be expressed as: ht = Ω + A(|εt−1 | − E (|εt−1 |)) + Γ εt−1 + Bht−1 ,
for appropriately dimensioned matrices Ω , A, Γ and B . By parameterizing only the unique elements of the logarithmic conditional covariance matrix, the matrix EGARCH model automatically guarantees that Ωt ≡ exp(ht ) is positive definite. (Mixture of Distributions Hypothesis) The MDH first developed by Clark (1973) postulates that financial returns over nontrivial time intervals, say one day, represent the accumulated effect of numerous within period, or intraday, news arrivals and corresponding price changes. The MDH coupled with the assumption of serially correlated news arrivals is often used to rationalize the apparent volatility clustering, or ARCH effects, in asset returns. More advanced versions of the MDH, relating the time-deformation to various financial market activity variables, such as the number of trades, the cumulative trading volume or the number of quotes, have been developed and explored empirically by Tauchen and Pitts (1983) and Andersen (1996) among many others.
MDH
MEM (Multiplicative Error Model) The Multiplicative Error class of Models (MEM) was proposed by Engle (2002b) as a general framework for modeling non-negative valued time series. The MEM may be expressed as,
xt = μt ηt ,
where xt ≥ 0 denotes the time series of interest, μt refers to its conditional mean, and ηt is a non-negative i.i.d. process with unit mean. The conditional mean is natural parameterized as, μt = ω +
q i=1
αi xt−i +
p
βi μt−i ,
i=1
where conditions on the parameters for μt to be positive follow from the corresponding conditions for the GARCH(p,q) model (see GARCH). Defining xt ≡ ε2t and μt ≡ σt2 , the MEM class of models encompasses all ARCH and GARCH models, and specific formulations are readily estimated by the corresponding software for GARCH models. The ACD model for durations may also be interpreted as a MEM (see ACD). MGARCH1
(Multivariate GARCH) Multivariate GARCH models were first analyzed and estimated empirically by Bollerslev, Engle and Wooldridge (1988). The unrestricted linear MGARCH(p,q) model is defined by: vech(Ωt ) = Ω +
q t=1
Ai vech(εt−i εt−i ) +
p
Bi vech(Ωt−i ),
i=1
where vech(·) denotes the operator that stacks the lower triangular portion of a symmetric N × N matrix into an N (N + 1)/2 × 1 vector of the corresponding unique elements, and the Ai and Bi matrices are all of compatible dimension N (N +1)/2 ×N (N +1)/2. This vectorized representation is also sometimes referred to as a VECH GARCH model. The general vech representation does not guarantee that the resulting conditional covariance matrices Ωt are positive definite. Also, the model involves a total of N (N +1)/2+(p + q )(N 4 +2N 3 + N 2 )/4 parameters, which becomes prohibitively expensive from a practical computational point
156
Glossary to ARCH (GARCH)
of view for anything but the bivariate case, or N = 2. Much of the research on multivariate GARCH models has been concerned with the development of alternative, more parsimonious, yet empirically realistic, representations, that easily ensure the conditional covariance matrices are positive definite. The trivariate vech MGARCH(1,1) model estimated in Bollerslev, Engle and Wooldridge (1988) assumes that the A1 and B1 matrices are both diagonal, so that each element in Ωt depends exclusively on its own lagged value and the product of the corresponding shocks. This diagonal simplification, resulting in “only” (1 + p + q )(N 2 + N )/2 parameters to be estimated, is often denoted as a diag MGARCH model (see also diag MGARCH). MGARCH2
(Multiplicative GARCH) Slightly different versions of the univariate Multiplicative GARCH model were proposed independently by Geweke (1986), Pantula (1986) and Milhøj (1987). The model is more commonly referred to as the log-GARCH model (see log-GARCH). MGARCH3
(Mixture GARCH) The MAR-ARCH model of Wong and Li (2001) and the MGARCH model Zhang, Li and Yuen (2006) postulates that the time t conditional variance is given by a time-invariant mixture of different GARCH models (see also GRSGARCH, NM-GARCH and SWARCH).
MS-GARCH
(Markov Switching GARCH) See SWARCH.
(MultiVariate GARCH) The MV-GARCH, MGARCH and VGARCH acronyms are used interchangeably (see MGARCH1 ).
MV-GARCH
NAGARCH (Nonlinear Asymmetric GARCH) The NAGARCH(1,1) model of Engle and Ng (1993) is defined by: −1 2 2 σt2 = ω + α(εt−1 σt− 1 + γ ) + βσt−1 .
Higher order NAGARCH(p,q) models may be defined similarly (see also AGARCH1 and VGARCH1 ). NGARCH (Nonlinear GARCH) The NGARCH(p,q) model proposed by Higgins and Bera
(1992) parameterizes the conditional standard deviation raised to the power δ as a function of the lagged conditional standard deviations and the lagged absolute innovations raised to the same power, σtδ = ω +
q i=1
αi |εt−i |δ +
p i=1
δ βi σt−i .
This formulation obviously reduces to the standard GARCH(p,q) model for δ = 2 (see GARCH). The NGARCH model is also sometimes referred to as a Power ARCH or Power GARCH model, or PARCH or PGARCH model. A slightly different version of the NGARCH model was originally estimated by Engle and Bollerslev (1986), 2 σt2 = ω + α|εt−1 |δ + βσt− 1.
Glossary to ARCH (GARCH)
157
With most financial rates of returns, the estimates for δ are found to be less than two, although not always significantly so (see also APARCH and TS-GARCH). (NonLinear GARCH) The NL-GARCH acronym is sometimes used to describe all parameterizations different from the benchmark linear GARCH(p,q) representation (see GARCH).
NL-GARCH
NM-GARCH (Normal Mixture GARCH) The NM-GARCH model postulates that the distribution of the standardized innovations εt σt−1 is determined by a mixture of two or more normal distributions. The statistical properties of the NM-GARCH(1,1) model have been studied extensively by Alexander and Lazar (2006) (see also GARCH-t, GEDGARCH and SWARCH). OGARCH (Orthogonal GARCH) The multivariate OGARCH model assumes that the N × 1 vector process εt may be represented as εt = Γ ft , where the columns of the N × m
matrix Γ are mutually orthogonal, and the m elements in the m × 1 ft vector process are conditionally uncorrelated with GARCH conditional variances. Consequently, the conditional covariance matrix for εt may be expressed as: Ωt = Γ Dt Γ ,
where Dt denotes the m × m diagonal matrix with the conditional factor variances along the diagonal. Estimation and inference in the OGARCH model are discussed in detail in Alexander (2001, 2008). The OGARCH model is also sometimes referred to as a principal component MGARCH model. The approach is related to but formally different from the PC-GARCH model of Burns (2005) (see also F-ARCH, GO-GARCH, MGARCH1 and PC-GARCH). PARCH
(Power ARCH) See NGARCH.
PC-GARCH (Principal Component GARCH) The multivariate PC-GARCH model of Burns (2005) is based on the estimation of univariate GARCH models to the principal components, defined by the covariance matrix for the standardized residuals from a first stage estimation of univariate GARCH models for each of the individual series (see also OGARCH). PGARCH1
(Periodic GARCH) The PGARCH model of Bollerslev and Ghysels (1996) was designed to account for periodic dependencies in the conditional variance by allowing the parameters of the model to vary over the cycle. In particular, the PGARCH(1,1) model is defined by: 2 σt2 = ωs(t) + αs(t) ε2t−1 + βs(t) σt− 1,
where s(t) refers to the stage of the periodic cycle at time t, and ωs(t) , αs(t) and βs(t) denote the different GARCH(1,1) parameter values for s(t) = 1, 2, . . . , P . PGARCH2
(Power GARCH) See NGARCH.
(Partially NonParametric ARCH) The PNP-ARCH model estimated by Engle and Ng (1993) allows the conditional variance to be a partially linear function of
PNP-ARCH
158
Glossary to ARCH (GARCH)
the lagged innovations and the lagged conditional variance, 2 σt2 = ω + βσt− 1 +
m
θi (εt−1 − i · σ )I (εt−1 < i · σ ),
i=−m
where σ denotes the unconditional standard deviation of the process, and m is an integer. The PNP-ARCH model was used by Engle and Ng (1993) in the construction of so-called news impact curves, reflecting how the conditional variance responds to different sized shocks (see also GJR and TGARCH). QARCH
(Quadratic ARCH) See GQARCH.
QMLE (Quasi Maximum Likelihood Estimation) ARCH models are typically estimated under the assumption of conditional normality (see ARCH). Even if the assumption of conditional normality is violated (see also GARCH-t, GED-GARCH and SPARCH), the parameter estimates generally remain consistent and asymptotically normally distributed, as long as the first two conditional moments of the model are correctly specified; i.e, Et−1 (εt ) = 0 and Et−1 (ε2t ) = σt2 . A robust covariance matrix for the resulting QMLE parameter estimates may be obtained by post- and pre-multiplying the matrix of outer products of the gradients with an estimate of Fisher’s Information Matrix. A relatively simple-to-compute expression for this matrix involving only first derivatives was derived in Bollerslev and Wooldridge (1992). The corresponding robust standard errors are sometimes referred to in the ARCH literature as Bollerslev–Wooldridge standard errors.
(Qualitative Threshold ARCH) The QTARCH(q) model of Gourieroux and Monfort (1992) assumes that the conditional variance may be represented by a sum of step functions:
QTARCH
σt2 = ω +
q J
αij Ij (εt−i ),
i=1 j =1
where the Ij (·) function partitions the real line into J sub-intervals, so that Ij (εt−i ) equals unity if εt−i falls in the jth sub-interval and zero otherwise. The Generalized QTARCH, or GQTARCH(p,q), model is readily defined by including p lagged conditional variances on the right-hand-side of the equation. REGARCH (Range EGARCH) The REGARCH model of Brandt and Jones (2006) postulates an EGARCH-type formulation for the conditional mean of the demeaned standardized logarithmic range. The FIREGARCH model allows for long-memory dependencies (see EGARCH and FIEGARCH). RGARCH1
(Randomized GARCH) The RGARCH(r,p,q) model of Nowicka-Zagrajek and Weron (2001) replaces the intercept in the standard GARCH(p,q) model with a sum of r positive i.i.d. stable random variables, ηt−i , i = 1, 2, . . . , r , σt2 =
r i=1
ci ηt−i +
q i=1
αi ε2t−i +
p i=1
2 βi σt−i ,
where ci ≥ 0. RGARCH2 (Robust GARCH) The robust GARCH model of Park (2002) is designed to minimize the impact of outliers by parameterizing the conditional variance as a
Glossary to ARCH (GARCH)
159
TS-GARCH model (see TS-GARCH) with the parameters estimated by least absolute deviations, or LAD. RGARCH3
(Root GARCH) The multivariate RGARCH model (see also MGARCH1 and Stdev-ARCH) of Gallant and Tauchen (1998) is formulated in terms of the lower triangular N × N matrix Rt , where by definition, Ωt = Rt Rt .
By parameterizing Rt instead of Ωt , the RGARCH formulation automatically guarantees that the resulting conditional covariance matrices are positive definite. However, the formulation complicates the inclusion of asymmetries or “leverage effects” in the conditional covariance matrix. RS-GARCH
(Regime Switching GARCH) See SWARCH.
(Realized Volatility) The term realized volatility, or realized variation, is commonly used in the ARCH literature to denote ex post variation measures defined by the summation of within period squared or absolute returns over some nontrivial time interval. A rapidly growing recent literature has been concerned with the use of such measures and the development of new and refined procedures in light of various data complications. Many new empirical insights afforded by the use of daily realized volatility measures constructed from high-frequency intraday returns have also recently been reported in the literature; see, e.g., the review in Andersen, Bollerslev and Diebold (2009).
RV
SARV
(Stochastic AutoRegressive Volatility) See SV.
(Stable GARCH) Let εt ≡ zt ct , where zt is independent and identically distributed over time as a standard Stable Paretian distribution. The Stable GARCH model for εt of Liu and Brorsen (1995) is then defined by:
SGARCH
λ λ cλ t = ω + α|εt−1 | + βct−1 .
The SGARCH model nests the ACH model (see ACH2 ) of McCulloch (1985) as a special case for λ = 1, ω = 0 and β = 1 − α (see also GARCH-t, GED-GARCH and NGARCH). (Simplified GARCH) The simplified multivariate GARCH (see MGARCH1 ) approach of Harris, Stoja and Tucker (2007) infers the conditional covariances through the estimation of auxiliary univariate GARCH models for the linear combinations in the identity,
S-GARCH
Covt−1 (εit , εjt ) = (1/4) · [V art−1 (εit + εjt ) + V art−1 (εit − εjt )].
Nothing guarantees that the resulting N × N conditional covariance matrix is positive definite (see also CCC and Flex-GARCH). Sign-GARCH
See GJR.
SPARCH (SemiParametric ARCH) To allow for non-normal standardized residuals, as commonly found in the estimation of ARCH models (see also GARCH-t, GED-GARCH and QMLE), Engle and Gonz´ alez-Rivera (1991) suggest estimating the distribution of εˆt σˆ t−1 through nonparametric density estimation techniques. Although Engle and
160
Glossary to ARCH (GARCH)
Gonz´ alez-Rivera (1991) do not explicitly use the name SPARCH, the approach has subsequently been referred to as such by several other authors in the literature. Spline-GARCH The Spline-GARCH model of Engle and Rangel (2008) specifies the conditional variance of εt as the product of a standardized unit GARCH(1,1) model, 2 σt2 = (1 − α − β )ω + α(ε2t−1 /τt ) + βσt− 1,
and a deterministic component represented by an exponential spline function of time, τt = c · exp[ω0 t + ω1 ((t − t0 )+ )2 + ω2 ((t − t1 )+ )2 + . . . + ωk ((t − tk−1 )+ )2 ],
where (t − ti )+ is equal to (t − ti ) for t > ti and 0 otherwise, and 0 = t0 < t1 < . . . < tk = T defines a partition of the full sample into k equally spaced time intervals. Other exogenous explanatory variables may also be included in the equation for τt . The Spline GARCH model was explicitly designed to investigate macroeconomic causes of slowly moving, or low-frequency volatility components (see also CGARCH1 ). (Square-Root GARCH) The discrete-time SQR-GARCH model of Heston and Nandi (2000),
SQR-GARCH
−1 2 2 σt2 = ω + α(εt−1 σt− 1 − γσt−1 ) + βσt−1 ,
is closely related to the VGARCH model of Engle and Ng (1993) (see VGARCH1 ). In contrast to the standard GARCH(1,1) model, the SQR-GARCH formulation allows for closed form option pricing under reasonable auxiliary assumptions. When defined over increasingly finer sampling intervals, the SQR-GARCH model converges weakly to the continuous-time affine, or square-root, diffusion analyzed by Heston (1993), dσ 2 (t) = κ(θ − σ 2 (t))dt + νσ (t)dW (t).
The SQR-GARCH model is also sometimes referred to as the Heston GARCH or the Heston–Nandi GARCH model (see also GARCH diffusion). STARCH (Structural ARCH) An unobserved component, or “structural,” time series model in which one or more of the disturbances follow an ARCH model was dubbed a STARCH model by Harvey, Ruiz and Sentana (1992).
(Standard deviation ARCH) The Stdev-ARCH(q) model first estimated by Schwert (1990) takes the form,
Stdev-ARCH
σt2 = (ω +
q
αi |εt−i |)2 .
i=1
This formulation obviously ensures that the conditional variance is positive. However, the nonlinearity complicates the construction of forecasts from the model (see also AARCH). (Smooth Transition GARCH) The ST-GARCH(1,1) model of Gonz´ alezRivera (1998) allows the impact of the past squared innovations to depend upon both the sign and the magnitude of εt−1 through a smooth transition function,
STGARCH
2 σt2 = ω + αε2t−1 + δε2t−i F (εt−1 , γ ) + βi σt− 1,
where F (εt−1 , γ ) = (1 + exp(γεt−1 ))−1 ,
Glossary to ARCH (GARCH)
161
so that the value of the function is bounded between 0 and 1 (see also ANST-GARCH, GJR and TGARCH). The Structural GARCH approach named by Rigobon (2002) relies on a multivariate GARCH model for the innovations in an otherwise unidentified structural VAR to identify the parameters through time-varying conditional heteroskedasticity. Closely related ideas and models have been explored by Sentana and Fiorentini (2001) among others.
Structural GARCH
GARCH models in which the standardized innovations, zt = εt σt−1 , are assumed to be i.i.d. through time are referred to as strong GARCH models (see also Weak GARCH).
Strong GARCH
(Stochastic Volatility) The term stochastic volatility, or SV model, refers to formulations in which σt2 is specified as a nonmeasurable, or stochastic, function of the observable information set. To facilitate estimation and inference via linear state-space representations, discrete-time SV models are often formulated in terms of time series models for
log σt2 , as exemplified by the simple SARV(1) model, SV
2 log σt2 = μ + ϕ log σt− 1 + σu ut ,
where ut is i.i.d. with mean zero and variance one. Meanwhile, the SV approach has proven especially useful in the formulation of empirically realistic continuous-time volatility models of the form, dy (t) = μ(t)dt + σ (t)dW (t),
where μ(t) denotes the drift, W (t) refers to standard Brownian Motion, and the diffusive volatility coefficient σ (t) is determined by a separate stochastic process (see also GARCH Diffusion). SVJ (Stochastic Volatility Jump) The SVJ acronym is commonly used to describe continuous-time stochastic volatility models in which the sample paths may be discontinuous, or exhibit jumps (see also SV and GARJI). SWARCH (regime SWitching ARCH) The SWARCH model proposed independently by Cai (1994) and Hamilton and Susmel (1994) extends the standard linear ARCH(q) model (see ARCH) by allowing the intercept, ωs(t) , and/or the magnitude of the squared innovations, ε2t−i /s(t − i), entering the conditional variance equation to depend upon some latent state variable, s(t), with the transition between the different states governed by a Markov chain. Regime switching GARCH models were first developed by Gray (1996) (see GRS-GARCH). Different variants of these models are also sometimes referred to in the literature as Markov Switching GARCH, or MS-GARCH, Regime Switching GARCH, or RS-GARCH, or Mixture GARCH, or MGARCH, models. TGARCH (Threshold GARCH) The TGARCH(p,q) model proposed by Zako¨ıan (1994) extends the TS-GARCH(p,q) model (see TS-GARCH) to allow the conditional standard deviation to depend upon the sign of the lagged innovations. In particular, the TGARCH(1,1) model may be expressed as:
σt = ω + α|εt−1 | + γ|εt−1 |I (εt−1 < 0) + βσt−1 .
162
Glossary to ARCH (GARCH)
The TGARCH model is also sometimes referred to as the ZARCH, or ZGARCH, model. The basic idea behind the model is closely related to that of the GJRGARCH model developed independently by Glosten, Jagannathan and Runkle (1993) (see GJR). t-GARCH
(t-distributed GARCH) See GARCH-t.
The Tobit-GARCH model, first proposed by Kodres (1993) for analyzing futures prices, extends the standard GARCH model (see GARCH) to allow for the possibility of censored observations on the εt s, or the underlying yt s. More general formulations allowing for multiperiod censoring and related inference procedures have been developed by Lee (1999), Morgan and Trevor (1999) and Wei (2002).
Tobit-GARCH
(Taylor–Schwert GARCH) The TS-GARCH(p,q) model of Taylor (1986) and Schwert (1989) parameterizes the conditional standard deviation as a distributed lag of the absolute innovations and the lagged conditional standard deviations,
TS-GARCH
σt = ω +
q
αi |εt−i | +
i=1
p
βi σt−i .
i=1
This formulation mitigates the influence of large, in an absolute sense, observations relative to the traditional GARCH(p,q) model (see GARCH). The TS-GARCH model is also sometimes referred to as an Absolute Value GARCH, or AVGARCH, model, or simply an AGARCH model. It is a special case of the more general Power GARCH, or NGARCH, formulation (see NGARCH). TVP-Level
(Time-Varying Parameter Level) See Level-GARCH.
UGARCH
(Univariate GARCH) See GARCH.
Unobserved GARCH
See Latent GARCH.
Variance Targeting The use of variance targeting in GARCH models was first suggested by Engle and Mezrich (1996). To illustrate, consider the GARCH(1,1) model (see GARCH), 2 σt2 = (1 − α − β )σ 2 + αεt−1 + βσt− 1,
where σ 2 = ω (1 − α − β )−1 . Fixing σ 2 at some pre-set value ensures that the long run variance forecasts from the model converge to σ 2 . Variance targeting has proven especially useful in multivariate GARCH modeling (see MGARCH1 ). VCC
(Varying Conditional Correlations) See DCC.
vech GARCH
(vectorized GARCH) See MGARCH1 .
VGARCH1
Following Engle and Ng (1993), the VGARCH(1,1) model refers to the parameterization, −1 2 2 σt2 = ω + α(εt−1 σt− 1 + γ ) + βσt−1 ,
Glossary to ARCH (GARCH)
163
in which the impact of the innovations for the conditional variance is symmetric and centered at −γσt−1 . Higher order VGARCH(p,q) models may be defined in a similar manner (see also AGARCH1 and NAGARCH). VGARCH2 (Vector GARCH) The VGARCH, MGARCH and MV-GARCH acronyms are used interchangeably (see MGARCH1 ).
(Volatility Switching GARCH) The VSGARCH(1,1) model of Fornari and Mele (1996) directly mirrors the GJR model (see GJR),
VSGARCH
2 2 σt2 = ω + αε2t−1 + γ ε2t−1 /σt− 1 I (εt−1 < 0) + βσt−1 ,
except that the asymmetric impact of the lagged squared negative innovations is scaled by the corresponding lagged conditional variance. The weak GARCH class of models, or concept, was first developed by as the linear Drost and Nijman (1993). In the weak GARCH class of models σt2 is defined * + projection of ε2t on the space spanned by 1, εt−1 , εt−2 , . . . , ε2t−1 , ε2t−2 , . . . as opposed to the
conditional expectation of ε2t , or Et−1 ε2t (see also ARCH and GARCH). In contrast to the standard GARCH(p,q) class of models, which is not formally closed under temporal aggregation, the sum of successive observations from a weak GARCH(p,q) model remains a weak GARCH(p’, q’) model, albeit with different orders p’ and q’. Similarly, as shown by Nijman and Sentana (1996) the unrestricted multivariate linear weak MGARCH(p,q) model (see MGARCH1 ) defined in terms of linear projections as opposed to conditional expectations is closed under contemporaneous aggregation, or portfolio formation (see also Strong GARCH). Weak GARCH
ZARCH
(Zakoian ARCH) See TGARCH.
9
An Automatic Test of Super Exogeneity David F. Hendry and Carlos Santos
1. Introduction It is a real pleasure to contribute to a volume in honor of Rob Engle, who has greatly advanced our understanding of exogeneity, and has published with the first author on that topic. At the time of writing Engle, Hendry and Richard (1983) (which has accrued 750 citations and counting), or even Engle and Hendry (1993), we could not have imagined that an approach based on handling more variables than observations would have been possible, let alone lead to an automatic test as we explain below. Rob has, of course, also contributed hugely to many other aspects of econometrics, not least the modeling of volatility (with over 5,000 citations starting from Engle, 1982a) and nonstationarity (where his famous paper on cointegration, Engle and Granger, 1987, has garnered an astonishing 8,500 cites according to Google Scholar): DFH still remembers discussing cointegration endlessly while running round Florence in a mad dash to see all the sights during a day visit in 1983, while we were both attending the Econometric Society Meeting in Pisa. The hallmarks of Rob’s publications are inventiveness, clarity, and succinctness such that his research is filled with ideas that are beautifully explained despite the often complex mathematics lying behind – setting a high standard for others to emulate. He is also one of the driving forces for the rapid progress in our discipline, and we wish him continuing high productivity into the future. Acknowledgments: Financial support from the ESRC under Research Grant RES-062-23-0061, and from the Funda¸c˜ ao para a Ciˆencia e a Tecnologia (Lisboa), is gratefully acknowledged by the first and second authors, respectively. We are indebted to Jennifer L. Castle, Jurgen A. Doornik, Ilyan Georgiev, Søren Johansen, Bent Nielsen, Mark W. Watson and participants at the Festschrift in honor of Robert F. Engle for helpful comments on an earlier draft, and to Jurgen and J. James Reade for providing some of the results based on Autometrics.
164
1 Introduction
165
In all areas of policy that involve regime shifts, or structural breaks in conditioning variables, the invariance of the parameters of conditional models under changes in the distributions of conditioning variables is of paramount importance, and was called super exogeneity by Engle et al. (1983). Even in models without contemporaneous conditioning variables, such as vector equilibrium systems (EqCMs), invariance under such shifts is equally relevant. Tests for super exogeneity have been proposed by Engle et al. (1983), Hendry (1988), Favero and Hendry (1992), Engle and Hendry (1993), Psaradakis and Sola (1996), Jansen and Ter¨ asvirta (1996) and Krolzig and Toro (2002), inter alia: Ericsson and Irons (1994) overview the literature at the time of their publication. Favero and Hendry (1992), building on Hendry (1988), considered the impact of nonconstant marginal processes on conditional models, and concluded that location shifts (changes in unconditional means of nonintegrated, I(0), variables) were essential for detecting violations attributable to the Lucas (1976) critique. Engle and Hendry (1993) examined the impact on a conditional model of changes in the moments of the conditioning variables, using a linear approximation: tests for super exogeneity were constructed by replacing the unobservable changing moments by proxies based on models of the processes generating the conditioning variables, including models based on ARCH processes (see Engle, 1982a), thereby allowing for nonconstant error variances to capture changes in regimes. However, Psaradakis and Sola (1996) claim that such tests have relatively low power for rejecting the Lucas critique. Jansen and Ter¨ asvirta (1996) propose self-exciting threshold models for testing constancy in conditional models as well as super exogeneity. Krolzig and Toro (2002) developed super-exogeneity tests using a reduced-rank technique for co-breaking based on the presence of common deterministic shifts, and demonstrated that their proposal dominated existing tests (on co-breaking in general, see Clements and Hendry, 1999, and Hendry and Massmann, 2007). We propose a new addition to this set of possible tests, show that its rejection frequency under the null is close to the nominal significance level in static settings, and examine its rejection frequencies when super exogeneity does not hold. The ability to detect outliers and shifts in a model using the dummy saturation techniques proposed by Hendry, Johansen and Santos (2008) opens the door to this new class of automatically computable super-exogeneity tests. Their approach is to saturate the marginal model (or system) with impulse indicators (namely, include an impulse for every observation, but entered in feasible subsets), and retain all significant outcomes. They derive the probability under the null of falsely retaining impulses for a location-scale i.i.d. process, and obtain the distribution of the estimated mean and variance after saturation. Johansen and Nielsen (2009) extend that analysis to dynamic regression models, which may have unit roots. Building on the ability to detect shifts in marginal models, we consider testing the relevance of all their significant impulses in conditional models. As we show below, such a test has the correct rejection frequency under the null of super exogeneity of the conditioning variables for the parameters of the conditional model, for a range of null-rejection frequencies in the marginal-model saturation tests. 
Moreover, our proposed test can detect failures of super exogeneity when there are location shifts in the marginal models. Finally, it can be computed automatically – that is without explicit user intervention, as occurs with (say) tests for residual autocorrelation – once the desired nominal sizes of the marginal saturation and conditional super-exogeneity tests have been specified.
166
An automatic test of super exogeneity
Six conditions need to be satisfied for a valid and reliable automatic test of super exogeneity. First, the test should not require ex ante knowledge by the investigator of the timing, signs, or magnitudes of any breaks in the marginal processes of the conditioning variables. The test proposed here uses impulse-saturation techniques applied to the marginal equations to determine these aspects. Second, the correct data generation process for the marginal variables should not need to be known for the test to have the desired rejection frequency under the null of super exogeneity. That condition is satisfied here for the impulse-saturation stage in the marginal models when there are no explosive roots in any of the variables, by developing congruent models using an automatic variant of general-to-specific modeling (see Hendry, 2009, for a recent discussion of congruence). Third, the test should not reject when super exogeneity holds yet there are shifts in the marginal models, which would lead to many impulses being retained for testing in the conditional model. We show this requirement is satisfied as well. Fourth, the conditional model should not have to be over-identified under the alternative of a failure of super exogeneity, as needed for tests in the class proposed by (say) Revankar and Hartley (1973). Fifth, the test must have power against a large class of potential failures of super exogeneity in the conditional model when there are location shifts in some of the marginal processes. Below, we establish the noncentrality parameter of the proposed test in a canonical case. Finally, the test should be computable without additional user intervention, as holds for both the impulse-saturation stage and the proposed superexogeneity test. The results here are based partly on the PcGets program (see Hendry and Krolzig, 2001) and partly on the more recent Autometrics algorithm in PcGive (see Doornik, 2009, 2007b), which extends general-to-specific modeling to settings with more variables than observations (see Hendry and Krolzig, 2005, and Doornik, 2007a). The structure of the chapter is as follows. Section 2 reconsiders which shifts in vector autoregressions (VARs) are relatively detectable, and derives the implications for testing for breaks in conditional representations. Section 3 considers super exogeneity in a regression context to elucidate its testable hypotheses, and discusses how super exogeneity can fail. Section 4 describes the impulse-saturation tests in Hendry et al. (2008) and Johansen and Nielsen (2009), and considers how to extend these to test super exogeneity. Section 5 provides analytic and Monte Carlo evidence on the null rejection frequencies of that procedure. Section 6 considers the power of the first stage to determine location shifts in marginal processes. Section 7 analyzes a failure of weak exogeneity under a nonconstant marginal process. Section 8 notes a co-breaking saturation-based test which builds on Krolzig and Toro (2002) and Hendry and Massmann (2007). Section 9 investigates the powers of the proposed automatic test in Monte Carlo experiments for a bivariate data generation process based on Section 7. Section 10 tests super exogeneity in the much-studied example of UK money demand; and Section 11 concludes.
2. Detectable shifts Consider the n-dimensional I(0) VAR(1) data generation process (DGP) of {xt } over t = 1, . . . , T : xt = φ + Πxt−1 + νt
where
νt ∼ INn [0, Ων ]
(1)
2 Detectable shifts
167
so Π has all its eigenvalues less than unity in absolute value, with unconditional expectation E [xt ]: −1
E [xt ] = (In − Π)
φ=ϕ
(2)
hence: xt − ϕ = Π (xt−1 − ϕ) + νt .
(3)
At time T1 , however, (φ : Π) changes to (φ∗ : Π∗ ), so for h ≥ 1 the data are generated by: xT1 +h = φ∗ + Π∗ xT1 +h−1 + νT1 +h
(4)
where Π∗ still has all its eigenvalues less than unity in absolute value. Such a shift generates considerable nonstationarity in the distribution of {xT1 +h } for many periods afterwards since: E [xT1 +h ] = ϕ∗ − (Π∗ ) (ϕ∗ − ϕ) = ϕ∗h h
where ϕ∗ = (In − Π∗ )
−1
−−−→ h→∞
ϕ∗
φ∗ , so that, from (4):
xT1 +h − ϕ∗ = Π∗ (xT1 +h−1 − ϕ∗ ) + νT1 +h .
(5)
Clements and Hendry (1994), Hendry and Doornik (1997), and Hendry (2000) show that changes in ϕ are easy to detect, whereas those in φ and Π are not when ϕ is unchanged. This delimits the class of structural breaks and regime changes that any test for super exogeneity can reasonably detect. To see the problem, consider the one-step forecast errors from T1 + 1 onwards using: ,T1 +h|T1 +h−1 = ϕ + Π (xT1 +h−1 − ϕ) x , T1 +h|T1 +h−1 = xT1 +h − x ,T1 +h|T1 +h−1 where: which would be ν , T1 +h|T1 +h−1 = (ϕ∗ − ϕ) + Π∗ (xT1 +h−1 − ϕ∗ ) − Π (xT1 +h−1 − ϕ) + ν T1 +h . ν
(6)
Finite-sample biases in estimators and estimation uncertainty are neglected here as negligible relative to the sizes of the effects we seek to highlight. Unconditionally, therefore, using (2): (
' , T1 +h|T1 +h−1 = (In − Π∗ ) (ϕ∗ − ϕ) + (Π∗ − Π) ϕ∗h−1 − ϕ . (7) E ν Consequently, E[, ν T1 +h|T1 +h−1 ] = 0 when ϕ∗ = ϕ, however large the changes in Π or φ. Detectability also depends indirectly on the magnitudes of shifts relative to Ων , as there are data variance shifts following unmodeled breaks, but such shifts are hard to detect when ϕ∗ = ϕ until long after the break has occurred, as the next section illustrates.
2.1. Simulation outcomes To illustrate, let n = 2, and for the baseline case (a): 0.7 0.2 1 π 1 = , φ= Π= π 2 −0.2 0.6 1
(8)
168
An automatic test of super exogeneity
where Π has eigenvalues of 0.65 ± 0.19i with modulus 0.68, and for |ρ| < 1: 1 ρ 1 0.5 2 = (0.01) Ων = σ 2 ρ 1 0.5 1 so the error standard deviations are 1% for xt interpreted as logs, with: ϕ=
1 − 0.7 0.2
−1 3.75 1 . = 0.625 1
−0.2 1 − 0.6
(9)
At time T1 , Π and φ change to Π∗ and φ∗ leading to case (b): 0.5 −0.2 2.0 ∗ ∗ , φ = Π = 0.1 0.5 −0.0625
(10)
where the eigenvalues of Π∗ are 0.5 ± 0.14i with modulus 0.27. The coefficients in Π are shifted at T1 = 0.75T = 75 by −20σ, −40σ, +30σ and +10σ, so the standardized impulse responses are radically altered between Π and Π∗ . Moreover, the shifts to the intercepts are 100σ or larger when a residual of ±3σ would be an outlier. Figure 9.1 shows the data outcomes on a randomly selected experiment in the first column, with the Chow test rejection frequencies on 1,000 replications in the second (we will discuss the third below): • for the baseline DGP in (8); • for the changed DGP in (10); • for the intercept-shifted DGP in (11) below; • for the intercept-shifted DGP in (11) below, changed for just one period. The data over 1 to T1 are the same in the four cases, and although the DGPs differ over T1 + 1 to T = 100 in (a) and (b), it is hard to tell their data apart. The changes in φ in (b) are vastly larger than any likely shifts in real-world economies. Nevertheless, the rejection frequencies on the Chow test are under 13% at a 1% nominal significance. However, keeping Π constant in (8), and changing only φ by ±5σ to φ∗∗ yields case (c): 1.05 0.7 0.2 ∗∗ (11) Π= , φ = 0.95 −0.2 0.6 which leads to massive forecast failure. Indeed, changing the DGP in (11) for just one period is quite sufficient to reveal the shift almost 100% of the time as seen in (d). The explanation for such dramatic differences between the second and third rows – where the former had every parameter greatly changed and the latter only had a small shift in the intercept – is that ϕ is unchanged from (a) to (b) at: 1 − 0.5 ϕ∗ = −0.1
0.2 1 − 0.5
−1
2.0 −0.0625
=
3.75 0.625
=ϕ
(12)
2 Detectable shifts
169
Baseline data sample 2.5 (a) 0.0 –2.5
x1,a
0
50
Baseline null rejection frequencies 1.0 (a) Chow test at 0.01
Baseline conditional model 1.0 (a) Chow test at 0.01
0.5
0.5
x2,a
60
100
Changing all VAR(1) parameters 2.5 (b) 0.0
1.0
80
100
Changing all VAR(1) parameters (b)
Chow test at 0.01
x1,b
0
50
x1,c
100
Conditional model (b)
Chow test at 0.01
x2,b
60
100
Changing intercepts only 5 (c)
1.0
80
0.5
0.5
–2.5
60
1.0
x2,c
0
80
100
Changing intercepts only Chow test at 0.01
(c)
60 1.0
80
100
Conditional model Chow test at 0.01
(c)
0.5
0.5
–5 0
50
100
One−period shift 5 (d)
x1,d
60 1.0
x2,d
80
100
One−period shift
1.0
Chow test at 0.01
(d) 0.5
0 0
Fig. 9.1.
60
50
100
80
100
Conditional model
(d)
Chow test at 0.01
0.5
60
80
60
100
80
100
Data graphs and constancy test rejection frequencies
whereas in (c): ∗∗
ϕ
1 − 0.7 = 0.2
−0.2 1 − 0.6
−1 1.05 3.8125 = 0.95 0.46875
(13)
inducing shifts of a little over 6σ and 16σ in the locations of x1,t and x2,t , respectively, relative to the in-sample E [xt ]. Case (d) may seem the most surprising – it is far easier to detect a one-period intercept shift of 5σ than when radically changing every parameter in the system for a quarter of the sample, but where the long run mean is unchanged: indeed the rejection frequency is essentially 100% versus less than 15%. The entailed impacts on conditional models of such shifts in the marginal distributions are considered in the next section.
2.2. Detectability in conditional models In the bivariate case (a) of section 2.1, let xt = (yt : zt ) to match the notation below, then: E [yt | zt , xt−1 ] = φ1 + π 1 xt−1 + ρ (zt − φ2 − π 2 xt−1 )
= ϕ1 + ρ (zt − ϕ2 ) + (π 1 − ρπ 2 ) (xt−1 − ϕ)
170
An automatic test of super exogeneity
as:
φ1 φ2
=
ϕ1 − π 1 ϕ ϕ2 − π 2 ϕ
.
After the shift in case (b), so t > T1 :
E [yt | zt , xt−1 ] = (φ∗1 − ρφ∗2 ) + ρzt + (π ∗1 − ρπ ∗2 ) xt−1
= ϕ1 + ρ (zt − ϕ2 ) + (π ∗1 − ρπ ∗2 ) (xt−1 − ϕ)
(14)
and hence the conditional model is constant only if: π 1 − π ∗1 = ρ (π 2 − π ∗2 ) ,
(15)
which is strongly violated by the numerical values used here: 0.2 −0.15 = = ρ (π 2 − π ∗2 ) . π 1 − π ∗1 = 0.4 0.05 Nevertheless, as the shift in (14) depends on changes in the coefficients of zero-mean variables, detectability will be low. In case (c) when t >> T1 :
E [yt | zt , xt−1 ] = ϕ∗1 + ρ (zt − ϕ∗2 ) + (π 1 − ρπ 2 ) (xt−1 − ϕ∗ ) ϕ∗2
∗
(16)
ϕ∗1
and E [xt−1 ] = ϕ so there is a location shift of − ϕ1 . The third where E [zt ] = column of graphs in Figure 9.1 confirms that the outcomes in the four cases above carry over to conditional models, irrespective of exogeneity: cases (a) and (b) are closely similar and low, yet rejection is essentially 100% in cases (c) and (d). Notice that there is no shift at all in (14) when (15) holds, however large the changes to the VAR. Consequently, we focus the super-exogeneity test to have power for location shifts in the marginal distributions, which thereby “contaminate” the conditional model.
2.2.1. Moving window estimation One approach which could detect that breaks of type (b) had occurred is the use of moving estimation windows, as a purely post-break sample (last T − T1 + 1 observations) would certainly deliver the second-regime parameters. Sufficient observations must have accrued in the second regime (and no other shifts occurred): see e.g., Castle, Fawcett and Hendry (2009). If impulse response analysis is to play a substantive role in policy advice, it would seem advisable to check on a relatively small final-period subsample that the estimated parameters and error variances have not changed.
3. Super exogeneity in a regression context Consider the sequentially factorized DGP of the n-dimensional I(0) vector process {xt }: T ) t=1
Dx (xt | Xt−1 , θ) =
T ) t=1
Dy|z (yt | zt , Xt−1 , φ1 ) Dz (zt | Xt−1 , φ2 )
(17)
3 Super exogeneity in a regression context
171
where xt = (yt : zt ), Xt−1 = (X0 x1 . . . xt−1 ) for initial conditions X0 , and φ =
φ1 : φ2 ∈ Φ with φ = f (θ) ∈ Rk . The parameters φ1 ∈ Φ1 and φ2 ∈ Φ2 of the {yt } and {zt } processes need to be variation free, so that Φ = Φ1 × Φ2 , if zt is to be weakly exogenous for the parameters of interest ψ = h (φ1 ) in the conditional model. However, such a variation-free condition by itself does not rule out the possibility that φ1 may change if φ2 is changed. Super exogeneity augments weak exogeneity with parameter invariance in the conditional model such that: ∂φ1 = 0 ∀φ2 ∈ C φ2 ∂φ2
(18)
where C φ2 is a class of interventions changing the marginal process parameters φ2 , so (18) requires no cross-links between the parameters of the conditional and marginal processes. No DGPs can be invariant for all possible changes, hence the limitation to C φ2 , the “coverage” of which will vary with the problem under analysis. When Dx (·) is the multivariate normal, we can express (17) as the unconditional model: μ1,t σ11,t σ 12,t yt ∼ INn , (19) μ2,t zt σ 12,t Σ22,t where E [yt ] = μ1,t and E [zt ] = μ2,t are usually functions of Xt−1 . To define the parameters of interest, we let the economic theory formulation entail: μ1,t = μ + β μ2,t + η xt−1
(20)
where β is the primary parameter of interest. The Lucas (1976) critique explicitly considers a model where expectations (the latent decision variables given by μ2,t ) are incorrectly modeled by the outcomes zt . From (19) and (20):
E [yt | zt , xt−1 ] = μ1,t + σ 12,t Σ−1 22,t zt − μ2,t + η xt−1 = μ + γ1,t + γ 2,t zt + η xt−1
(21)
2 where γ 2,t = σ 12,t Σ−1 22,t and γ1,t = (β−γ 2,t ) μ2,t . The conditional variance is ωt = σ11,t − γ 2,t σ 21,t . Thus, the parameters of the conditional and marginal densities, respectively, are:
φ1,t = μ : γ1,t : γ 2,t : η : ωt2 and φ2,t = μ2,t : Σ22,t .
When (21) is specified as a constant-parameter regression model over t = 1, . . . , T : yt = μ + β zt + η xt−1 + t where t ∼ IN[0, ω 2 ]
(22)
four conditions are required for zt to be super exogenous for (μ, β, η, ω 2 ) (see Engle and Hendry, 1993): (i) γ 2,t = γ 2 is constant ∀t; (ii) β = γ 2 ; (iii) φ1,t is invariant to C φ2 ∀t; (iv) ωt2 = ω 2 ∀t.
172
An automatic test of super exogeneity
Condition (i) requires that σ 12,t Σ−1 22,t is constant over time, which could occur because the σij happened not to change over the sample, or because the two components move in tandem through being connected by σ 12,t = γ 2 Σ22,t . Condition (ii) then entails that zt is weakly exogenous for a constant β. Together, (i) + (ii) entail the key result that γ1,t = 0 in (21), so the conditional expectation does not depend on μ2,t . Next, (iii) requires the absence of links between the conditional and marginal parameters. Finally, a fully constant regression also requires (iv), so ωt2 = σ11,t − β Σ22,t β = ω 2 is constant ∀t, with the observed variation in σ11,t derived from changes in Σ22,t . However, nonconstancy in ωt2 can be due to factors other than a failure of super exogeneity, so is only tested below as a requirement for congruency. Each of these conditions can be valid or invalid separately: for example, β t = γ 2,t is possible when (i) is false, and vice versa. When conditions (i)–(iv) are satisfied:
in which case zt Consequently:
(23) E [yt |zt , xt−1 ] = μ + β zt + η xt−1
is super exogenous for μ, β, η, ω 2 in this conditional model. σ 12,t = β Σ22,t ∀t
(24)
where condition (24) requires that the means in (20) are interrelated by the same parameter β as the covariances σ 12,t are with the variances Σ22,t . Under those conditions, the joint density is: μ + β μ2,t + η xt−1 ω 2 + β Σ22,t β β Σ22,t yt |xt−1 ∼ INn , (25) μ2,t zt Σ22,t β Σ22,t so the conditional-marginal factorization is: μ + β zt + η xt−1 yt |zt , xt−1 ω2 ∼ INn , μ2,t zt | xt−1 0
0 Σ22,t
.
(26)
Consequently, under super exogeneity, the parameters (μ2,t , Σ22,t ) can change in the marginal model: ' ( (27) zt | xt−1 ∼ INn−1 μ2,t , Σ22,t without altering the parameters of (22). Deterministic-shift co-breaking will then occur
in (25), as 1 : β xt does not depend on μ2,t : see §8. Conversely, if zt is not super exogenous for β, then changes in (27) should affect (22) through γ1,t = (β − γ 2,t ) μ2,t , as we now discuss.
3.1. Failures of super exogeneity Super exogeneity may fail for any of three reasons, corresponding to (i)–(iii) above: (a) the regression coefficient γ 2 is not constant when β is; (b) zt is not weakly exogenous for β; (c) β is not invariant to changes in C φ2 .
4 Impulse saturation
173
When zt is not super exogenous for β, and μ2,t is nonconstant, then (21) holds as: yt = μ + (β − γ 2,t ) μ2,t + γ 2,t zt + η xt−1 + et
(28)
We model μ2,t using lagged values of xt and impulses to approximate the sequential factorization in (17): zt = μ2,t + v2,t = π 0 +
s
Γj xt−j + dt + v2,t
(29)
j=1
where v2,t ∼ INn−1 [0, Σ22,t ] is the error on the marginal model and dt denotes a shift at t. Section 2 established that the detectable breaks in (29) are location shifts, so the next section considers impulse saturation applied to the marginal process, then derives the distribution under the null of no breaks in §5, and the behavior under the alternative in §6. Section 7 proposes the test for super exogeneity based on including the significant impulses from such marginal-model analyses in conditional equations like (28).
4. Impulse saturation The crucial recent development for our approach . for nonconstancy by - is that of testing adding a complete set of impulse indicators 1{t} , t = 1, . . . , T to a marginal model, where 1{t} = 1 for observation t, and zero otherwise: see Hendry et al. (2008) and Johansen and Nielsen (2009). Using a modified general-to-specific procedure, those authors analytically establish the null distribution of the estimator of regression parameters after adding T impulse indicators when the sample size is T . A two-step process is investigated, where half the indicators are added, and all significant indicators recorded, then the other half examined, and finally the two retained sets of indicators are combined. The average retention rate of impulse indicators under the null is αT when the significance level of an individual test is set at α. Moreover, Hendry et al. (2008) show that other splits, such as using three splits of size T /3, or unequal splits do not affect the retention rate under the null, or the simulation-based distributions. Importantly, Johansen and Nielsen (2009) both generalize the analysis to dynamic models (possibly with unit roots) and establish that for small α (e.g., α ≤ 0.01), the inefficiency of conducting impulse saturation is very small despite testing T indicators: intuitively, retained impulses correspond to omitting individual observations, so only αT data points are “lost”. This procedure is applied to the marginal models for the conditioning variables, and the associated significant dummies in the marginal processes are recorded. Specifically, after the first stage when m impulse indicators are retained, a marginal model like (29) has been extended to: zt = π 0 +
s j=1
Γj xt−j +
m i=1
∗ τ i,α1 1{t=ti } + v2,t
(30)
174
An automatic test of super exogeneity
(a)
(b) Y
2
Fitted
scaled residuals
10 0
5
0 –2 0
20
Fig. 9.2.
40
60
80
100
0
20
40
60
80
100
Absence of outliers despite a break
where the coefficients of the significant impulses are denoted τ i,α1 to emphasize their dependence on the significance level α1 used in testing the marginal model. Equation (30) is selected to be congruent. Second, those impulses that are retained are tested as an added variable set in the conditional model. There is an important difference between outlier detection, which does just that, and impulse saturation, which will detect outliers but may also reveal other shifts that are hidden by being “picked up” incorrectly by other variables. Figure 9.2(a) illustrates a mean shift near the mid-sample, where a regression on a constant is fitted. Panel (b) shows that no outliers, as defined by |, ui,t | > 2σii (say), are detected (for an alternative approach, see S´anchez and Pe˜ na, 2003). By way of comparison, Figure 9.3 shows impulse Dummies included in model
15
Final model: actual and f itted
Final model: dummies selected
Block 1
10 5 0 0
50
0
100
50
100
15
0
50
100
1.0
Block 2
10 0.5
5 0 0
50
100
0
50
100
0
50
100
0
50
100
0
50
100
15
Final
10 5 0 0
Fig. 9.3.
50
100
Impulse saturation in action
5 Null rejection frequency of the impulse-based test
175
saturation for the same data, where the columns show the outcomes for the first half, second half, then combined, respectively, and the rows show the impulses included at that stage, their plot against the data, and the impulses retained. Overall, 20 impulses are significant, spanning the break (the many first-half impulses retained are due to trying to make the skewness diagnostic insignificant). In fact, Autometrics uses a more sophisticated algorithm, which outperforms the split-half procedure in simulation experiments (see Doornik, 2009, for details). The second stage is to add the m retained impulses to the conditional model, yielding: yt = μ + β zt + η xt−1 +
m
δi,α2 1{t=ti } + t
(31)
i=1
and conduct an F-test for the significance of (δ1,α2 . . . δm,α2 ) at level α2 . Under the null of super exogeneity, the F-test of the joint significance of the m impulse indicators in the conditional model should have an approximate F-distribution and thereby allow an appropriately sized test: Section 5 derives the null distribution and presents Monte Carlo evidence on its small-sample relevance. Under the alternative, the test will have power in a variety of situations discussed in Section 7 below. Such a test can be automated, bringing super exogeneity into the purview of hypotheses about a model that can be as easily tested as (say) residual autocorrelation. Intuitively, if super exogeneity is invalid, so β = σ 12,t Ω−1 22,t in (28), then the impact on the conditional model of the largest values of the μ2,t should be the easiest to detect, noting that the significant impulses in (30) capture the outliers or breaks not accounted for by the regressor variables used. The null rejection frequency of this F-test of super exogeneity in the conditional model should not depend on the significance level, α1 , used for each individual test in the marginal model. However, too large a value of α1 will lead to an F-test with large degrees of freedom; too small will lead to few, or even no, impulses being retained from the marginal models. Monte Carlo evidence presented in Section 5.1 supports that contention. For example, with four conditioning variables and T = 100, then under the null, α1 = 0.01 would yield four impulses in general, whereas α1 = 0.025 would deliver 10. Otherwise, the main consideration for choosing α1 is to allow power against reasonable alternatives to super exogeneity. A variant of the test in (31), which builds on Hendry and Santos (2005) and has different power characteristics, is to combine the m impulses detected in (30) into an index (see Hendry and Santos, 2007).
5. Null rejection frequency of the impulse-based test Reconsider the earlier sequentially factorized DGP in (19), where under the null of super exogeneity, from (23): yt = μ + β zt + η xt−1 + t
(32)
so although the {zt } process is nonconstant, the linear relation between yt and zt in (32) is constant.
176
An automatic test of super exogeneity
Let Sα1 denote the dates of the significant impulses {1{ti } } retained in the model for the marginal process (30) where: (33) tτ,i,ti > cα1 when cα1 is the critical value for significance level α1 . In the model (32) for yt |z-t , xt−1., conditioning on zt implies taking the v2,t s as fixed, so stacking the impulses in 1{ti } in the vector 1t : E [yt | zt | xt−1 ] = μ + β zt + η xt−1 + δ 1t
(34)
where δ = 0 under the null. Given a significance level α2 , a subset of the indicators {1t } will be retained in the conditional econometric model, given that they were retained in the marginal when: (35) tδ,j > cα2 . Thus, when (33) occurs, the probability of retaining any indicator in the conditional is: (36) P tδ,j > cα2 | tτ,i,ti > cα1 = P tδ,j > cα2 = α2 as (33) holds, which only depends on the significance level cα2 used on the conditional model and not on α1 . If (33) does not occur, no impulses are retained, then P(|tδ,j | > cα2 ) = 0, so the super-exogeneity test will under-reject under the null.
5.1. Monte Carlo evidence on the null rejection frequency The Monte Carlo experiments estimate the empirical null rejection frequencies of the super-exogeneity test for a variety of settings, sample sizes, and nominal significance levels, and check if there is any dependence of these on the nominal significance levels for impulse retention in the marginal process. If there is dependence, then searching for the relevant dates at which shifts might have occurred in the marginal would affect testing for associated shifts in the conditional. In the following subsections, super exogeneity is the null, and we consider three settings for the marginal process: where there are no breaks in §5.1.1; a mean shift in §5.1.2; and a variance change in §5.1.3. Because the “size” of a test statistic has a definition that is only precise for a similar test, and the word is ambiguous in many settings (such as sample size), we use the term “gauge” to denote the empirical null rejection frequency of the test procedure. As Autometrics selection seeks a congruent model, irrelevant variables with |t| < cα can sometimes be retained, and gauge correctly reflects their presence, whereas “size” would not (e.g., Hoover and Perez, 1999, report “size” for significant irrelevant variables only). The general form of DGP is the bivariate system: −1 μ + βξ(t) μzt + η xt−1 yt σ22 σ11 + β 2 θ(t) βθ(t) | xt−1 ∼ IN2 , σ22 zt ξ(t) μzt βθ(t) θ(t) (37)
5 Null rejection frequency of the impulse-based test
177
where ξ(t) = 1 + ξ1{t>T1 } and θ(t) = 1 + θ1{t>T2 } , so throughout: γ2,t =
βσ22 θ(t) σ12,t = = β = γ2 σ22,t σ22 θ(t)
−1 2 ωt2 = σ11,t − σ12,t σ22,t = σ11 + β 2 σ22 θ(t) −
2 2 θ(t) β 2 σ22
σ22 θ(t)
(38)
= σ11 = ω 2
(39)
and hence from (37):
E [yt | zt , xt−1 ] = μ + βξ(t) μzt + η xt−1 + γ2 zt − ξ(t) μzt = μ + βzt + η xt−1 .
(40)
Three cases of interest are ξ = θ = 0, ξ = 0, and θ = 0 in each of which super exogeneity holds, but for different forms of change in the marginal process. In all cases, β = 2 = γ2 and ω 2 = 1, which are the constant and invariant parameters of interest, with σ22 = 5. Any changes in the marginal process occur at time T1 = 0.8T . The impulse saturation uses a partition of T /2 with M = 10,000 replications. Sample sizes of T = (50, 100, 200, 300) are investigated, and we examine all combinations of four significance levels for both α1 (for testing impulses in the marginal) and α2 (testing in the conditional) equal to (0.1, 0.05, 0.025, 0.01).
5.1.1. Constant marginal The baseline DGP is (37) with ξ = θ = 0, μzt = 1 and η = 0. Thus, the parameters of the conditional model yt |zt are φ1 = μ; γ2 ; ω 2 = (0; 2; 1) and the parameters of the marginal are φ2,t = (μ2,t ; σ22,t ) = (1; 5). The conditional representation is: δi 1ti + t (41) yt = βzt + i∈Sα1
and testing super exogeneity is based on the F-test of the null δ = 0 in (41). The first column in Figure 9.4 reports the test’s gauges where α1 is the nominal significance level used for the t-tests on each individual indicator in the marginal model (horizontal axis), and α2 is the significance level for the F-test on the set of retained dummies in the conditional (vertical axis). Unconditional rejection frequencies are recorded throughout. The marginal tests should not use too low a probability of retaining impulses, or else the conditional must automatically have a zero null rejection frequency. For example, at T = 50 and α1 = 0.01, about one impulse per two trials will be retained, so half the time, no impulses will be retained; on the other half of the trials, about α2 will be retained, so roughly 0.5α2 will be found overall, as simulation confirms. The simulated gauges and nominal null rejection frequencies are close so long as α1 T > 3. Then, there is no distortion in the number of retained dummies in the conditional. However, constant marginal processes are the “worst-case”: the next two sections consider mean and variance changes where many more impulses are retained, so there are fewer cases of no impulses detected to enter in the conditional.
178 α2 0.100 0.075 0.050 0.025 0.000 0.100
An automatic test of super exogeneity T = 300
0.050
0.025
α2 0.100 (a) 0.075 0.050 0.025 0.000 2 0.010 α1→
(b)
0.050
0.025
0.010
0.100 0.075 0.050 0.025 0.000
2
T = 100
0.100 0.075 0.050 0.025 0.000 0.100
0.025
0.010
0.100 0.075 0.050 0.025 0.000
2
T = 50
0.100 0.075 0.050 0.025 0.000 0.100
Fig. 9.4.
0.025
100
2
10
100
0.100 0.075 0.050 0.025 0.000
2
0.010 α1→
0.100 0.075 0.050 0.025 0.000
2
10
5
θ→ 10
5 T = 100
10
5
10
T = 50
T = 50
(d)
0.050
10
T = 300
T = 200
0.100 0.075 0.050 0.025 0.000
T = 100
(c)
0.050
10
α2 0.100 0.075 0.050 0.025 0.000 ξ→ 100 2
T = 200
T = 200
0.100 0.075 0.050 0.025 0.000 0.100
T = 300
ξ→ 100
0.100 0.075 0.050 0.025 0.000
2
5
θ→
10
Gauges of F-tests in the conditional as α1 , ξ or θ vary in the marginal
5.1.2. Changes in the mean of zt The second DGP is given by (37), where ξ = 2, 10, 100 with θ = 0, μzt = 1 and η = 0. Super exogeneity holds irrespective of the level shift in the marginal; however, it is important to check that spurious rejection is not induced by breaks in marginal processes. The variance–covariance matrix is constant, but could be allowed to change as well, provided the values matched the conditions for super exogeneity as in §5.1.3. The second column of graphs in Figure 9.4 reports the test’s gauges where the horizontal axis now corresponds to the three values of ξ, using α1 = 2.5% throughout. Despite large changes in ξ, when T > 100, the gauges are close to the nominal significance levels. Importantly, the test does not spuriously reject the null, but now is slightly undersized at T = 50 for small shifts, as again sometimes no impulses are retained for small shifts.
5.1.3. Changes in the variance of zt The third DGP is given by (37), where θ = 2, 5, 10 with ξ = 0, μzt = 1 and η = 0, so φ1,t is again invariant to changes in φ2,t induced by changes in σ22,t . The impulse-saturation test has the power to detect variance shifts in the marginal, so, like the previous case,
6 Potency at stage 1
179
more than αT impulses should be retained on average, depending on the magnitude of the marginal variance change (see §6.2). The third column of graphs in Figure 9.4 reports the test’s gauges as before. Again, the vertical axis reports α2 , the nominal significance level for the F-test on the set of retained impulses in the conditional, but now the horizontal axis corresponds to the three values of θ, using α1 = 2.5% throughout. The F-test has gauge close to the nominal for T > 100, even when the variance of the marginal process changes markedly, but the test is again slightly undersized at T = 50 for small shifts. As in §5.1.2, the test is not “confused” by variance changes in the marginal to falsely imply a failure of super exogeneity even though the null holds. Overall, the proposed test has appropriate empirical null rejection frequencies for both constant and changing marginal processes, so we now turn to its ability to detect failures of exogeneity. Being a selection procedure, test rejection no longer corresponds to the conventional notion of “power”, so we use the term “potency” to denote the average non-null rejection frequency of the test. This test involves a two-stage process: first detect shifts in the marginal, then use those to detect shifts in the conditional. The properties of the first stage have been considered in Santos and Hendry (2006), so we only note them here, partly to establish notation for the second stage considered in §7.
6. Potency at stage 1 We consider the potency at stage 1 for a mean shift then a variance change, both at time T1 .
6.1. Detecting a mean shift in the marginal Marginal models in their simplest form are: ∗ τi,j,α1 1{ti } + v2,j,t zj,t =
(42)
i∈Sα1
when the marginal process is (43): zj,t = λj 1{t>T1 } + v2,j,t
(43)
where H1 : λj = 0 ∀j holds. The potency to retain each impulse in (42) depends on the probability of rejecting the null for the associated estimated τi,j,α1 : ∗ τ,i,j,α1 = λj + v2,j,t . i
The properties of tests on such impulse indicators are discussed in Hendry and Santos (2005). Let ψλ,α1 denote the noncentrality, then as V[, τi,j,α1 ] = σ22,j : ' ( λj τ,i,j,α E tτi,j,α =0 (ψλ,α1 ) = E 1 √ = ψλ,α1 . (44) σ σ ,22,j 22,j
180
An automatic test of super exogeneity
When v2,j,t is normal, the potency could be computed directly from the t-distribution: as most outliers will have been removed, normality should be a reasonable approximation. However, the denominator approximation requires most other shifts to have been detected. We compute the potency functions using an approximation to t2τi,j,α1 =0 by χ2 with one degree of freedom:
2
2 . t2τi,j,α1 =0 ψλ,α χ2 ψλ,α 1 a 1 / pp 1
(45)
Relating that noncentral χ2 distribution to a central χ2 using (see e.g., Hendry, 1995):
2 hχ2m (0) χ21 ψλ,α 1
(46)
where: h=
2 2 1 + 2ψλ,α 1 + ψλ,α 1 1 . and m = 2 1 + ψλ,α h 1
(47)
2 Then the potency function of the χ21 (ψλ,α ) test in (45) is approximated by: 1
1 0
2 ( ' 2 > c > cα1 |H1 P χ21 ψλ,α |H P t2τi,j,α1 =0 ψλ,α α 1 1 1 1 ( ' P χ2m (0) > h−1 cα1 .
(48)
For noninteger values of m, a weighted average of the neighboring integer values is 2 used. For example, when ψλ,α = 16 and cα1 = 3.84, then h 1.94 and m = 8.76 1 (taking the nearest integer values as 8 and 9 with weights 0.24 and 0.76), which yields P[t2τi,j,α1 =0 (16) > 3.84] 0.99, as against the exact t-distribution outcome of 0.975.
√ 2 When λj = d σ22,j so ψλ,α = d2 , then pλ = P[t2τ,i,j,α d2 > cα1 ] rises from 0.17, 1 1 through 0.50 to 0.86 as d is 1, 2, 3 at cα1 = 3.84, so the potency is low at d = 1 (the t-distribution outcome for d = 1 is 0.16), but has risen markedly even by d = 3. In practice, Autometrics selects impulses within contiguous blocks with approximately these probabilities, but has somewhat lower probabilities for scattered impulses. For example for the two DGPs: D1 : y1,t = d (IT −19 + · · · + IT ) + ut ,
ut ∼ IN(0, 1)
D3 : y3,t = d (I1 + I6 + I11 + · · · ) + ut ,
ut ∼ IN(0, 1)
where the model is just a constant and T dummies for T = 100. While both have 20 relevant indicators, the potency per impulse differs as shown in Table 9.1. There is a close match between analytic power and potency in D1, and both rise rapidly with d, the standardized shift. D3 poses greater detection difficulties as all subsamples are alike (by construction); the split-half algorithm performs poorly on such experiments relative to Autometrics. Modifying the experiment to an intermediate case of (say) five breaks of length 4 delivers potency similar to D1. Importantly, breaks at the start or end of the sample are no more difficult to detect. Thus, we use (48) as the approximation for the first-stage potency.
7 Super-exogeneity failure
181
Table 9.1. Impulse saturation in Autometrics at 1% nominal size, T = 100, M = 1000 d=0 D1 gauge % potency % analytic power % D3 gauge % potency %
d=1 d=2
d=3 d=4
d=5
1.5 — —
1.2 4.6 6.1
0.9 25.6 26.9
0.3 52.6 65.9
0.7 86.3 93.7
1.1 99.0 99.7
1.5 —
1.0 3.5
0.4 7.9
0.3 24.2
1.0 67.1
0.8 90.2
6.2. Detecting a variance shift in the marginal Consider a setting where the variance shift θ > 1 occurs when T1 > T /2 so that: √ zt = 1 + 1{t
(49)
The maximum feasible potency would be from detecting and √ entering the set of k = T − T1 + 1 impulses 1{t≥T1 } , each of which would then equal θ1{t≥T1 } vt , to be judged against a baseline variance of σv2 : √ θ1{t≥T1 } vt tτt = , σv 2 2 = θ. Approximating by hχ2m (0) as in (48), for ψθ,α = so t2τt has a noncentrality of ψθ,α 1 1 (2; 5; 10) potency will be about (25%, 60%, 90%), respectively, at α1 = 0.05. Thus, only large changes in variances will be detected. Viewing the potencies at stage 1 as the probability pλ of retaining a relevant impulse from the marginal model, then approximately pλ k ≤ k relevant impulses will be retained for testing in the conditional model, attentuating the noncentrality (denoted ϕδ,α1 ) of the F-test of δ = 0 in (41) relative to known shifts. Further, retention of irrelevant impulses – corresponding to nonbreak-related shocks in the marginal process – will also lower potency relative to knowing the shifts. For the F-test of δ = 0, this increases its degrees of freedom, but that should only induce a small potency reduction for small α1 . For a given noncentrality ϕδ,α1 , however, that effect also differs depending on the magnitudes and lengths of the shifts in the marginal, as fewer irrelevant impulses will be retained when (e.g.) there is a large, short shift.
7. Super-exogeneity failure In this section, we derive the outcome for a super-exogeneity failure due to a weak exogeneity violation when the marginal process is nonconstant, and obtain the noncentrality and approximate potency of the test when there is a single location shift in the marginal. Figure 9.1 showed high constancy-test rejection frequencies for both that setting and
182
An automatic test of super exogeneity
even a single impulse. Section 9 reports the simulation outcomes. As seen in §3.1, many causes of failure are possible, including shifts in variances in marginal processes and any cross-links between conditional and marginal parameters, but location shifts due to changes in policy rules are a central scenario. The potency at the second stage conditional on the saturation approach locating all, and only, the relevant impulses corresponding to shifts in the marginal, is easily calculated, but will only be accurate for large magnitude breaks, parameterized below by λ. For smaller values of λ, fewer impulses will be detected in the marginal. Moreover, although the null rejection frequency of the test in the conditional does not depend on α1 once α1 T > 3, the potency will, suggesting that a relatively nonstringent α1 should be used. However, that will lead to retaining some “spurious” impulses in the marginal, albeit fewer than α1 T1 because shifts lower the remaining null rejection frequency (see, e.g., Table 9.1). We use the formulation in §3 for a normally distributed n × 1 vector xt = (yt : zt ) generated by (19), with E [yt |zt ] given by (21), where γ = Σ−1 22 σ 12 , η = 0 and conditional variance ω 2 = σ11 − σ 12 Σ−1 22 σ 12 . The parameter of interest is β in (20), so: yt = μ + β zt + (γ − β) zt − μ2,t + t
= μ + γ zt + (β − γ) μ2,t + t
(50)
( ' where yt −E[yt |zt ] = t ∼ IN 0, σ2 , so E[t |zt ] = 0. However, E[yt |zt ] = β zt when β = γ, violating weak exogeneity, so (50) will change as μ2,t shifts. Such a conditional model is an example of the Lucas (1976) critique where the agents’ behavioral rule depends on E[zt ] as in (20), whereas the econometric equation conditions on zt . To complete the system, the break in the marginal process for {zt }, which induces the violation in super exogeneity, is parameterized as: zt = μ2,t + v2,t = λ1{t>T1 } + v2,t .
(51)
In practice, there could be multiple breaks in different marginal processes at different times, which may affect one or more zt s, but little additional insight is gleaned over the one-off break in (51), which is sufficiently general as the proposed test is an F-test on all retained impulses, so does not assume any specific break form at either stage. The advantage of using the explicit alternative in (51) is that approximate analytic calculations are feasible. As §2 showed that the key shifts are in the long run mean, we use the Frisch and Waugh (1933) theorem to partial out means, but with a slight abuse of notation, do not alter it. Combining (50) with (51) and letting δ = (β − γ) λ, the DGP becomes:
yt = μ + γ zt + (β − γ) λ1{t>T1 } + t = μ + γ zt + δ1{t>T1 } + t . Testing for the impulse dummies in the marginal model yields: ∗ zti = τ, i,α1 1{i} + v2,t i i∈Sα1
(52)
(53)
7 Super-exogeneity failure
183
∗ = 0 ∀i ∈ Sα1 where Sα1 denotes the set of impulses τ, i,α1 = λ1{ti >T1 } + v2,ti where v2,t i defined by:
t2τi,j,α1 =0 > cα1 .
(54)
Stacking significant impulses from (54) in ιt , and adding these to (50), yields the test regression: yt = κ0 + κ1 zt + κ2 ιt + et
(55)
The main difficulty in formalizing the analysis is that ιt varies between draws in both its length and its contents. As the test is an F-test for an i.i.d. DGP, the particular relevant and irrelevant impulses retained should not matter, merely their total numbers from the first stage. Consequently, we distinguish: (a) the length of the break, T r, (b) the number of relevant retained elements in the index, which on average will be pλ T r, where pλ is the probability of retaining any given relevant impulse from §6.1, and (c) the total number of retained impulses in the model, T s, usually including some irrelevant ones, where on average s = (pλ r + α1 ), which determines the average degrees of freedom of the test. The F-test will have T s numerator degrees of freedom and T (1 − s) − n denominator s -test of: (allowing for the constant). The potency of the FTT (1−s)−n H0 : κ2 = 0
(56)
in (55) depends on the strengths of the super-exogeneity violations, (βi − γi ); the magnitudes of the breaks, λi , both directly and through their detectability, pλ , in the marginal models, in turn dependent on α1 ; the sample size T ; the relative number of periods r affected by the break; the number of irrelevant impulses retained, and on α2 . The properties are checked by simulation below, and could be contrasted with the optimal, but generally infeasible, test based on adding the index 1{t>T1 } , instead of the impulses ιt , equivalent to a Chow (1960) test (see Salkever, 1976). A formal derivation, could either include pλ T r impulses, akin to a mis-specification analysis, or model ιt in (55) as containing all T r relevant impulses, each with probability pλ > 0. The impact of irrelevant retained impulses is merely to reduce the number of available observations, so lowers potency slightly, and can otherwise be neglected. Taking the second route, namely the fixed length κ2 , the full-sample representations are: DGP : y = Zγ + δI∗T r 1T r + Model : y = Zκ1 + J∗T r κ2 + e
(57)
Exogenous : Z = I∗T r 1T r λ + V2 where: I∗T r
0T (1−r),T r ; J∗T r = pλ I∗T r = IT r
(58)
184
An automatic test of super exogeneity
so 1T r is T r × 1 with T r elements of unity etc. To relate the DGP to the model, add and subtract δJ∗T r 1T r , noting that omitted impulses are orthogonal to the included, so ∗ (I∗T r − J∗T r ) = K∗T r with J∗ T r KT r = 0: y = Zγ + J∗T r (δ1T r ) + δK∗T r 1T r + .
(59)
Combinations involving J∗T r also have probability pλ , as it is only the initial chance of selection that matters and, conditional on that, thereafter occurs with certainty. Then, using (59) and letting I∗ T r Z = ZT r : −1 &1 − γ γ Z Z Z y Z J∗T r κ − = ∗ ∗ & 2 − δ1T r J∗ δ1T r κ J∗ T r Z JT r J T r T ry G−1 −G−1 ZT r = −1 −ZT r G−1 (pλ IT r ) + ZT r G−1 ZT r δ (1 − pλ ) Z I∗T r 1T r + Z pλ T r G−1 ZT r 1T r = δ (1 − pλ ) −ZT r G−1 ZT r 1T r G−1 (Z − pλ ZT r T r ) (60) + T r − ZT r G−1 (Z − pλ ZT r T r ) where G = (Z Z − pλ ZT r ZT r ). Since E [ZT r ] = 1T r λ and λ1T r 1T r = T rλ, approximating by: ( ' −1 (−1 ' = (1 − pλ ) rλλ + (1 − pλ r) Σ22 (61) E T G−1 E T −1 G then:
E
&1 κ &2 κ
γ δ1T r
− rδ (1 − pλ )
−fλ
λ fλ 1T r
=
γ∗ δ ∗ 1T r
where:
' (−1 fλ = E T −1 G λ.
(62)
As expected, the bias term vanishes when pλ = 1. Also, using the same approximations, the reported covariance matrix is (which will differ from the correct covariance matrix based on the distribution in (60)): G−1 −G−1 λ1T r &1 κ σe2
−1 (63) Cov T −1T r λ G−1 &2 κ T pλ IT r + λ G−1 λ1T r 1T r where, evaluated at γ ∗ and δ ∗ :
2 2 σe2 σ2 + δ 2 (1 − pλ ) r2 1 + fλ Σ22 fλ + 2fλ λ + r (1 − pλ ) (fλ λ) .
(64)
7 Super-exogeneity failure
185
In the special case that pλ = 1, consistent estimates of γ result with T −1 G = (1 − r) Σ22 and σe2 = σ2 . As:
−1 −1 &2 & 2 T p−1 λ1T r 1T r κ κ λ IT r + λ G Ts , FT (1−s)−n (κ2 = 0) = (T (1 − s) − n) T s& σe2 using: −1
1T r (IT r + x1T r 1T r )
1T r =
Tr 1 + T rx
then:
−1 T −1 pλ 1T r IT r + T −1 pλ λ G−1 λ1T r 1T r 1T r =
rpλ 1 + rpλ λ G−1 λ
(65)
s -test is: so an approximate explicit expression for the noncentrality of the FTT (1−s)−n
(T (1 − s) − n) pλ r (δ ∗ )
. T sσe2 1 + rpλ λ G−1 λ 2
ϕ2s,F
(66)
All the factors affecting the potency of the automatic test are clear in (66). The important selection mistake is missing relevant impulses: when pλ < 1 in (60), then σe2 > σ2 , so ϕ2s falls rapidly with pλ . Consequently, a relatively loose first-stage significance level seems sensible, e.g., 2.5%. The potency is not monotonic in s as the degrees of freedom of the F-test alter: a given value of δ achieved by a larger s will have lower potency than that from a smaller s > r. For numerical calculations, we allow on average that α1 T random extra impulses
and s (ϕ2s,F ) by a χ2T s ϕ2s for pλ rT = T q relevant are retained, so approximate FTT (1−s)−n ( ' ϕ2s = T qϕ2s,F , where P χ2T s (0) > cα2 = α2 using:
( ' ( ' (67) P χ2T s ϕ2s > cα2 |H1 P χ2m (0) > h−1 cα2 with: h=
T s + ϕ2s T s + 2ϕ2s . and m = 2 T s + ϕs h
(68)
Some insight can be gleaned into the potency properties of the test when n = 2. In that case, G = (1 − pλ ) rλ2 + (1 − pλ r) σ22 , and approximately for small α1 : T (1 − r) rpλ (δ ∗ ) T (1 − r) (β − γ) σ22 −→ 2 −1 2 σe (1 + rpλ G λ ) λ→∞ σ2 2
ϕ2s
2
2
(69)
where the last expression shows the outcome for large λ so pλ → 1. Then (69) reflects the 2 violation of weak exogeneity, (β − γ) , the signal–noise ratio, σ22 /σ2 , the loss from longer 2 break lengths (1 − r) , and the sample size, T . The optimal value of the noncentrality, 2 ϕr , for a known break date and form – so the single variable 1{t>T1 } is added – is: 2
ϕ2r =
σ2
T rδ 2 T σ22 (β − γ) −→ . −1 2 λ→∞ σ2 1 + rσ22 λ
(70)
186
An automatic test of super exogeneity
Despite the nature of adding T r separate impulses when nothing is known about the existence or timing of a failure of super exogeneity, so ϕ2s < ϕ2r , their powers converge rapidly as the break magnitude λ grows, when r is not too large. The numerical evaluations of (69) in Table 9.4 below are reasonably accurate.
8. Co-breaking based tests A key assumption underlying the above test is that impulse-saturation tests to detect breaks and outliers were not applied to the conditional model. In many situations, investigators will have done precisely that, potentially vitiating the ability of a direct super-exogeneity test to detect failures. Conversely, one can utilize such results for a deterministic co-breaking test of super exogeneity. Again considering the simplest case for exposition, add impulses to the conditional model, such that after saturation: yt = μ0 + β zt +
s
κj 1{tj } + νt
(71)
j=1
At the same time, if Sα1 denotes the significant dummies in the marginal model: zt = τ 0 + τ j 1{tj } + ut
(72)
j∈Sα1
then the test tries to ascertain whether the timing of the impulses in (71) and (72) overlaps. For example, a perfect match would be strong evidence against super exogeneity, corresponding to the result above that the significance of the marginal-model impulses in the conditional model rejects super exogeneity.
9. Simulating the potencies of the automatic super-exogeneity test We undertook simulation analyses using the bivariate relationship in Section 5.1 for violations of super exogeneity due to a failure of weak exogeneity under nonconstancy in: βμ2,t 21 10 yt ∼ IN2 , (73) zt μ2,t 10 5 so γ = 2 and ω 2 = 1, but β = γ, with a level shift at T1 in the marginal: μ2,t = λ1{t>T1 } so μ1,t = βλ1{t>T1 } .
(74)
√ We vary: d = λ/ σ22 over the values 1, 2, 2.5, 3 and 4; β over 0.75, 1, 1.5 and 1.75, reducing the extent of departure from weak exogeneity; two sample sizes (T = 100 and T = 300) which have varying break points, T1 ; and the significance levels α1 and α2
9 Simulating the potencies of the automatic super-exogeneity test
187
Table 9.2. Potencies of the F-test for a level shift at T1 = 250, T = 300, α1 = α2 = 0.05 d:β
0.75
1.0
1.5
1.75
1.0 2.0 2.5 3.0 4.0
0.191 0.972 1.000 1.000 1.000
0.153 0.936 0.993 1.000 1.000
0.078 0.529 0.917 0.998 1.000
0.054 0.150 0.339 0.653 0.967
in the marginal and conditional. A partition of T /2 was always used for the impulse saturation in the marginal model, and M = 10, 000 replications. Table 9.2 reports the empirical null rejection frequencies of the F-test when T = 300 is used with 5% significance levels in both the marginal and conditional models, for a level shift at T1 = 250, so k = 50 and r = 1/6. The potency of the test increases with the increase in β − γ, as expected, and increases with the magnitude of the level shift d. Even moderate violations of the null are detectable for level shifts of 2.5σ or larger. Table 9.3 shows the impact of reducing T − T1 to 25 cet. par. The potency is never smaller for the shorter break, so the degrees of freedom of the F-test are important, especially at intermediate potencies. Table 9.3. Potencies of the F-test for a level shift at T1 = 275, T = 300, α1 = α2 = 0.05 d:β
0.75
1.0
1.5
1.75
1.0 2.0 2.5 3.0 4.0
0.377 1.000 1.000 1.000 1.000
0.274 0.997 1.000 1.000 1.000
0.097 0.803 0.990 1.000 1.000
0.060 0.238 0.504 0.797 0.984
Using more stringent significance levels of α1 = α2 = 2.5% naturally leads to a less potent test than the 5% in Table 9.2, although the detection probabilities still rise rapidly with the break magnitude, and even relatively mild departures from weak exogeneity are detected at the break magnitude of d = 4. The italic numbers in parentheses report the numerical evaluation of the analytic potency from (69) as a typical example, p∗ , and the response surface in (75) checks its explanatory ability. The coefficient of log (p∗ ) is not significantly different from unity and the intercept is insignificant. log (, p) = 0.96 log (p∗ ) − 0.015 (0.04)
(0.06)
Fhet (2, 15) = 1.77 Freset (1, 17) = 2.81 R2 = 0.975 σ ,p = 0.21 χ2nd (2) = 6.4∗
(75)
188
An automatic test of super exogeneity Table 9.4. Potencies of the F-test for a level shift at T1 = 250, T = 300, α1 = α2 = 0.025 d:β 1.0 2.0 2.5 3.0 4.0
0.75 0.081 0.717 0.977 1.000 1.000
1.0
(0.087) (0.932) (1.000) (1.000) (1.000)
0.065 0.612 0.953 0.999 1.000
1.5
(0.060) (0.918) (1.000) (1.000) (1.000)
0.035 0.220 0.616 0.953 1.000
1.75
(0.031) (0.234) (0.615) (0.922) (1.000)
0.026 0.062 0.143 0.372 0.908
(0.027) (0.067) (0.107) (0.203) (0.627)
Here, R2 is the squared multiple correlation (when including a constant), σ ,p is the residual standard deviation, coefficient standard errors are shown in parentheses, the diagnostic tests are of the form Fj (k, T − l) which denotes an approximate F-test against the alternative hypothesis j for: heteroskedasticity (Fhet : see White, 1980); the RESET test (Freset : see Ramsey, 1969); and χ2nd (2) is a chi-square test for normality (see Doornik and Hansen, 2008); below we also present k th -order serial correlation (Far : see Godfrey, 1978); k th -order autoregressive conditional heteroskedasticity (Farch : see Engle, 1982a); FChow for parameter constancy over k periods (see Chow, 1960); and SC is the Schwarz criterion (see Schwarz, 1978); ∗ and ∗∗ denote significant at 5% and 1%, respectively. Figure 9.5 records the response surface fitted and actual values; their cross-plot; the residuals scaled by σ ,; and their histogram and density with N[0,1] for comparison. 0
0
−1
−1
−2
−2 ^ log( p) Fitted
−3
0
5
^ × Fitted log (p)
−3
10
15
20
−3.5
−3
−2.5
−2
−1.5
−1
−0.5
0
Density
3 ^ ûi /σ
2
^ f (ûi / σ) N(0,1)
0.75
1 0.50 0 0.25
−1 0
Fig. 9.5.
5
10
15
20
−3
−2
Response surface outcomes for equation (75)
−1
0
1
2
3
4
9 Simulating the potencies of the automatic super-exogeneity test
189
Table 9.5. Potencies of the F-test for a level shift at T1 = 80, T = 100, α1 = α2 = 0.025 d/β
0.75
1.0
1.5
1.75
1 2 2.5 3 4
0.027 0.114 0.392 0.757 0.996
0.027 0.098 0.349 0.715 0.994
0.026 0.054 0.159 0.434 0.949
0.022 0.034 0.055 0.112 0.418
Table 9.6. Potencies of the F-test for a level shift at T1 = 70, T = 100, α1 = α2 = 0.025 d:β
0.75
1.0
1.5
1.75
2.5 3.0 4.0
0.260 0.708 0.997
0.245 0.680 0.995
0.174 0.486 0.967
0.118 0.221 0.576
We now turn to the effect of sample size on potency. Table 9.5 reports the results for significance levels of 2.5% in both marginal and conditional models when T = 100 and T1 = 80. The test still has reasonable potency for moderate violations of weak exogeneity when breaks are at least 3σ, although there is a loss of potency with the reduction in sample size. The trade off between length of break and potency remains as shown in Table 9.6 for T − T1 = 30, beginning at observation 71: small breaks have negligible potency. However, the potency is higher at the larger breaks despite smaller weak exogeneity violations, so the impacts of the various determinants are nonmonotonic, as anticipated from (66).
9.1. Optimal infeasible impulse-based F-test The optimal infeasible impulse-based F-test with a known break location in the marginal process is computable in simulations. The tables below use α2 = 2.5% for testing in the conditional. The empirical rejection frequencies approximate maximum achievable power for this type of test. When T = 100, and the break is a mean shift starting at T1 = 80, the correct 20 impulse indicators are always included in the conditional model. Table 9.7 reports for the failure of super exogeneity. Relative to the optimal infeasible test, the automatic test based on saturation of the marginal naturally loses considerable potency for breaks of small magnitudes. Table 9.8 shows that for a failure of super exogeneity, even when β = 1.75, the optimal test power increases with k for breaks of d = 1 and 2. Thus, the optimal test exhibits power increasing with break length unlike (69).
190
An automatic test of super exogeneity Table 9.7. Powers of an F-test for a level shift at T1 = 0.8T = 80 with known break location and form
Table 9.8. and form d : T − T1 1.0 2.0
d:β
0.75
1.0
1.5
1.75
1.0 2.0 2.5 3.0 4.0
1.000 1.000 1.000 1.000 1.000
0.994 1.000 1.000 1.000 1.000
0.404 0.930 0.973 0.985 0.988
0.083 0.247 0.326 0.380 0.432
Super-exogeneity failures at T1 when T = 100 with known break location 45
40
30
20
15
10
0.572 0.942
0.563 0.938
0.515 0.920
0.423 0.880
0.348 0.828
0.259 0.720
5 0.073 0.484
10. Testing super exogeneity in UK money demand We next test super exogeneity in a model of transactions demand for money in the UK using a sample of quarterly observations over 1964(3) to 1989(2), defined by: • • • •
M nominal M1 X real total final expenditure (TFE) at 1985 prices P TFE deflator Rn net interest rate on retail sight deposits: three-month local authority interest rate minus own rate.
We use the model in Hendry and Doornik (1994) (also see Hendry, 1979; Hendry and Ericsson, 1991; Boswijk, 1992; Hendry and Mizon, 1993; and Boswijk and Doornik, 2004), and express the variables as a vector autoregressive system. Previous cointegration analyses showed two long run relationships, but confirmed the long run weak exogeneity of {xt , Δpt , Rn,t } in that four-variable system. The theoretical basis is a model that links demand for real money, m − p (lower case denoting logs) to (log) income x (transactions motive) and inflation Δpt , with the interest rate Rn measuring the opportunity cost of holding money. The data series terminate in 1989(2) because a sequence of large building societies converted to banks thereafter, greatly altering M1 measures as their deposits were previously classified outside M1. Commencing from the conditional model of m − p on {xt , Δpt , Rn,t } with two lags of all variables, constant and trend, undertaking selection with impulse saturation on that
10 Testing super exogeneity in UK money demand
191
equation using Autometrics at α2 = 1% yields: (m − p)t = 0.11 xt − 0.85 Δpt − 0.44 Rn,t + 0.60 (m − p)t−1 + 0.30 (m − p)t−2 (0.01)
(0.11)
(0.08)
(0.07)
(0.07)
− 0.27 Rn,t−1 − 3.5 I69(2) + 4.3 I71(1) + 3.9 I73(2) + 4.2 I74(4) (0.10)
(1.1)
(1.1)
(1.1)
(1.1)
− 2.8 I83(3)
(76)
(1.1)
Far(5, 84) = 1.90   Farch(4, 81) = 0.57   Fhet(22, 66) = 0.35   Freset(1, 91) = 0.08
σ̂(m−p) = 0.010   χ2nd(2) = 0.76   FChow:81(4)(30, 59) = 1.0   SC(11) = −5.93

The legend is described in §9. The coefficients of the impulses are multiplied by 100 (so are percentage shifts for (m − p)t, xt and Δpt). Despite a large number of previous studies of UK M1, (76) has a major new result: the puzzle of why transactions demand did not depend on the contemporaneous expenditure for which it was held is resolved by finding that it does – once impulse saturation is able to remove the contaminating perturbations. Moreover, the PcGive unit-root test is −12.79∗∗, strongly rejecting an absence of cointegration, and the derived long-run expenditure elasticity is 1.02 (0.003), so the match with economic theory has been made much closer. Almost all the impulses have historical interpretations: decimalization began in 1969(2) and was completed in 1971(1); 1973(2) saw the introduction of VAT; 1974(4) was the heart of the first Oil crisis; but 1983(3) is unclear.

Next, we selected the significant impulses in congruent marginal models for {xt, Δpt, Rn,t} with two lags of every variable, constant and trend, finding:

xt = 1.24 + 0.89 xt−1 − 0.14 Rn,t−2 + 0.0007 t + 2.9 I68(1) + 3.6 I72(4)
     (0.32)  (0.03)      (0.03)       (0.0002)    (1.0)        (1.0)
   + 4.5 I73(1) + 5.7 I79(2)                                                   (77)
     (1.0)        (1.0)
Far(5, 91) = 1.50   Farch(4, 88) = 1.67   Fhet(13, 82) = 1.26   Freset(1, 95) = 0.001
σ̂x = 0.010   χ2nd(2) = 0.05

Δpt = − 1.9 + 0.43 Δpt−1 + 0.21 xt−1 − 0.03 (m − p)t−1 − 0.0012 t
       (0.29)  (0.07)       (0.03)      (0.01)           (0.0002)
     − 3.1 I73(2) + 2.5 I74(2)                                                 (78)
       (0.68)       (0.65)

Far(5, 92) = 0.10   Farch(4, 89) = 0.84   Fhet(16, 80) = 0.83   Freset(1, 96) = 6.5∗
σ̂Δp = 0.0064   χ2nd(2) = 0.22
Rn,t = 0.99 Rn,t−1 + 3.9 I73(3) + 3.5 I76(4) − 3.6 I77(1) − 3.4 I77(2)         (79)
       (0.01)        (1.2)        (1.2)        (1.2)        (1.2)
Far(5, 94) = 1.08   Farch(4, 91) = 1.53   Fhet(6, 92) = 1.85   Freset(1, 98) = 3.08
σ̂Rn = 0.012   χ2nd(2) = 0.09

Only one mis-specification test is significant at even the 5% level across these three equations, so we judge these marginal models to be congruent. The impulses were selected using α1 = 1%, as although the sample size is only T = 104, many impulses were already known to matter from the economic turbulence of the 1970s and 1980s in the UK, and indeed 10 are retained across these three models; surprisingly, the three-day week loss of output in December 1973 did not show up in (77).

Next, we tested the significance of the 10 retained impulses from (77), (78) and (79) in the same unrestricted conditional model of (m − p)t as used for selecting (76), but without impulse saturation. This yielded FSE(10, 81) = 1.28, so the new test does not reject: the model with impulses had SC(22) = −5.11, whereas the unrestricted model without any impulses had SC(12) = −5.41, both much poorer than (76). The one impulse in common between marginal and conditional models is I73(2), which entered the equation for Δpt. However, it does so positively in both equations, even though Δpt enters (76) negatively.

Finally, we repeated the super-exogeneity impulse-saturation based test at α1 = 2.5%, which now led to 37 impulses being retained across the three marginal models, and a test statistic of FSE(37, 54) = 1.67∗ that just rejects at 5%, which may be partly due to the small remaining degrees of freedom as SC(49) = −4.5, so the conditional model without any impulses has a substantially smaller value of SC. Moreover, the only one of the impulses in (76) selected in any of these marginal models was again I73(2). Thus, we find minimal evidence against the hypothesis that {xt, Δpt, Rn,t} are super exogenous for the parameters of the conditional model for (m − p)t in (76).

Not rejecting the null of super exogeneity implies that agents did not alter their demand for money behavior despite quite large changes in the processes generating their conditioning variables. In particular, agents could not have been forming expectations based on the marginal models for any of the three variables. This might be because their near unpredictability led to the use of robust forecasting devices of the general forms discussed by Favero and Hendry (1992) and Hendry and Ericsson (1991):

x̂t+1 = xt;   Δp̂t+1 = Δpt;   R̂n,t+1 = Rn,t.
If so, the apparent conditioning variables are actually the basis for robust one-step ahead forecasting devices used in the face of unanticipated structural breaks, as in Hendry (2006). Consequently, the nonrejection of super exogeneity makes sense, and does not contradict an underlying theory of forward-looking money demand behavior.
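Before concluding, it may help to sketch how the automatic FSE test applied above to (76)–(79) can be mechanized. The Python fragment below is only an illustration: the simple outlier rule stands in for Autometrics impulse saturation of the marginal models, and the function names are hypothetical rather than part of any existing package.

```python
import numpy as np
from scipy import stats

def ols_residuals(y, X):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

def retained_impulses(y_marg, X_marg, crit=2.57):
    """Stage 1 (crude stand-in for impulse saturation): keep an impulse for every
    observation whose standardized residual from the marginal model exceeds crit."""
    e = ols_residuals(y_marg, X_marg)
    std = e / e.std(ddof=X_marg.shape[1])
    return np.flatnonzero(np.abs(std) > crit)

def f_se_test(y_cond, X_cond, impulse_index):
    """Stage 2: F-test of the retained impulses when added to the conditional model."""
    T = len(y_cond)
    D = np.zeros((T, len(impulse_index)))
    D[impulse_index, np.arange(len(impulse_index))] = 1.0
    rss = lambda X: np.sum(ols_residuals(y_cond, X) ** 2)
    X_u = np.column_stack([X_cond, D])
    q, df = D.shape[1], T - X_u.shape[1]
    F = ((rss(X_cond) - rss(X_u)) / q) / (rss(X_u) / df)
    return F, stats.f.sf(F, q, df)
```

In the money demand application, the first stage would be run separately on the marginal models for xt, Δpt and Rn,t, and the union of the retained impulse dates would then be tested jointly in the conditional (m − p)t equation.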
11. Conclusion

An automatically computable test for super exogeneity based on selecting shifts in the marginal process by impulse saturation to test for related shifts in the conditional has
been proposed. The test has the correct null rejection frequency in constant conditional models when the nominal test size, α1 , is not too small in the marginal (e.g. 2.5%) even at small sample sizes, for a variety of marginal processes, both constant and with breaks. The approximate rejection-frequency function was derived analytically for regression models, and helps explain the simulation outcomes. These confirm that the test can detect failures of super exogeneity when weak exogeneity fails and the marginal processes change. Although only a single break was considered in detail, the general nature of the test makes it applicable when there are multiple breaks in the marginal processes, perhaps at different times. A test rejection outcome indicates a dependence between the conditional model parameters and those of the marginals, warning about potential mistakes from using the conditional model to predict the outcomes of policy changes that alter the marginal processes by location shifts, which is a common policy scenario. The empirical application to UK M1 delivered new results in a much-studied illustration, and confirmed the feasibility of the test. The status of super exogeneity was not completely clear cut, but suggested, at most, a small degree of dependence between the parameters. Although all the derivations and Monte Carlo experiments here have been for static regression equations and specific location shifts, the principles are general, and should apply to dynamic equations (although with more approximate null rejection frequencies), to conditional systems, and to nonstationary settings: these are the focus of our present research.
10
Generalized Forecast Errors, a Change of Measure, and Forecast Optimality Andrew J. Patton and Allan Timmermann
1. Introduction

In a world with constant volatility, concerns about the possibility of asymmetric or nonquadratic loss functions in economic forecasting would (almost) vanish: Granger (1969) showed that in such an environment optimal forecasts will generally equal the conditional mean of the variable of interest, plus a simple constant (an optimal bias term). However, the pioneering and pervasive work of Rob Engle provides overwhelming evidence of time-varying volatility in many macroeconomic and financial time series.1 In a world with time-varying volatility, asymmetric loss has important implications for forecasting, see Christoffersen and Diebold (1997), Granger (1999) and Patton and Timmermann (2007a). The traditional assumption of a quadratic and symmetric loss function underlying most of the work on testing forecast optimality is increasingly coming under critical scrutiny, and evaluation of forecast efficiency under asymmetric loss functions has

Acknowledgments: The authors would like to thank seminar participants at the Festschrift Conference in Honor of Robert F. Engle in San Diego, June 2007, and Graham Elliott, Raffaella Giacomini, Clive Granger, Oliver Linton, Mark Machina, Francisco Penaranda, Kevin Sheppard, Mark Watson, Hal White, Stanley Zin and an anonymous referee for useful comments. All remaining deficiencies are the responsibility of the authors. The second author acknowledges support from CREATES, funded by the Danish National Research Foundation.
1 See, amongst many others, Engle (1982a, 2004b), Bollerslev (1986), Engle et al. (1990), the special issue of the Journal of Econometrics edited by Engle and Rothschild (1992), as well as surveys by Bollerslev et al. (1994) and Andersen et al. (2006a).
recently gained considerable attention in the applied econometrics literature.2 Progress has also been made on establishing theoretical properties of optimal forecasts for particular families of loss functions (Christoffersen and Diebold, 1997; Elliott et al., 2005, 2008; Patton and Timmermann, 2007b). However, although some results have been derived for certain classes of loss functions, a more complete set of results has not been established. This chapter fills this lacuna in the literature by deriving properties of an optimal forecast that hold for general classes of loss functions and general data-generating processes. Working out these properties under general loss is important as none of the standard properties established in the linear-quadratic framework survives to a more general setting in the presence of conditional heteroskedasticity, cf. Patton and Timmermann (2007a). Irrespective of the loss function and data-generating process, a generalized orthogonality principle must, however, hold provided information is efficiently embedded in the forecast. Implications of this principle will, of course, vary significantly with assumptions about the loss function and data-generating process (DGP). Our results suggest two approaches: transforming the forecast error for a given loss function, or transforming the density under which the forecast error is being evaluated. The first approach provides tests that generalize the widely used Mincer–Zarnowitz (Mincer and Zarnowitz, 1969) regressions, established under mean squared error (MSE) loss, to hold for arbitrary loss functions. We propose a seemingly unrelated regression (SUR)-based method for testing multiple forecast horizons simultaneously, which may yield power improvements when forecasts for multiple horizons are available. This is relevant for survey data such as those provided by the Survey of Professional Forecasters (Philadelphia Federal Reserve) or Consensus Economics as well as for individual forecasts such as those reported by the IMF in the World Economic Outlook. Our second approach introduces a new line of analysis based on a transformation from the usual probability measure to an “MSE-loss probability measure”. Under this new measure, optimal forecasts, from any loss function, are unbiased and forecast errors are serially uncorrelated, in spite of the fact that these properties generally fail to hold under the physical (or “objective”) measure. This transformation has its roots in asset pricing and “risk neutral” probabilities, see Harrison and Kreps (1979) for example, but to our knowledge has not previously been considered in the context of forecasting. Relative to existing work, our contributions are as follows. Using the first line of research, we establish population properties for the so-called generalized forecast error, which is similar to the score function known from estimation problems. These results build on, extend and formalize results in Granger (1999) as well as in our earlier work (Patton and Timmermann, 2007a,b) and apply to quite general classes of loss functions and data-generating processes. Patton and Timmermann (2007b) establish testable implications of simple forecast errors (defined as the outcome minus the predicted value) under forecast optimality, whereas Patton and Timmermann (2007a) consider the generalized forecast errors but only for more specialized cases such as linex loss with normally distributed innovations. Unlike Elliott et al. 
(2005), we do not deal with the issue of identification and estimation of the parameters of the forecaster's loss function. The density forecasting results are, to our knowledge, new in the context of the forecast evaluation literature.

The outline of this chapter is as follows. Section 2 establishes properties of optimal forecasts under general known loss functions. Section 3 contains the change of measure result, and Section 4 presents empirical illustrations of the results. Section 5 concludes. An appendix contains technical details and proofs.

2 See, for example, Christoffersen and Diebold (1996), Pesaran and Skouras (2001), Christoffersen and Jacobs (2004) and Granger and Machina (2006).
2. Testable implications under general loss functions

Suppose that a decision maker is interested in forecasting some univariate time series, Y ≡ {Yt; t = 1, 2, ...}, h steps ahead given information at time t, Ft. We assume that Xt = [Yt, Z̃t′]′, where Z̃t is an (m × 1) vector of predictor variables used by the decision maker, and X ≡ {Xt : Ω → R^{m+1}, m ∈ N, t = 1, 2, ...} is a stochastic process on a complete probability space (Ω, F, P), where Ω = R^{(m+1)∞} ≡ ×_{t=1}^{∞} R^{m+1}, F = B^{(m+1)∞} ≡ B(R^{(m+1)∞}), the Borel σ-field generated by R^{(m+1)∞}, and Ft is the σ-field generated by {Xt−k; k ≥ 0}. Yt is thus adapted to the information set available at time t.3 We will denote a generic sub-vector of Z̃t as Zt, and denote the conditional distribution of Yt+h given Ft as Ft+h,t, i.e. Yt+h | Ft ∼ Ft+h,t, and the conditional density, if it exists, as ft+h,t. Point forecasts conditional on Ft are denoted by Ŷt+h,t and belong to Y, a compact subset of R, and forecast errors are given by et+h,t = Yt+h − Ŷt+h,t.4 In general the objective of the forecast is to minimize the expected value of some loss function, L(Yt+h, Ŷt+h,t), which is a mapping from realizations and forecasts to the real line, L : R × Y → R. That is, in general

Ŷ∗t+h,t ≡ arg min_{ŷ∈Y} Et[L(Yt+h, ŷ)].   (1)

Et[.] is shorthand notation for E[.|Ft], the conditional expectation given Ft. We also define the conditional variance, Vt = E[(Y − E[Y |Ft])² | Ft], and the unconditional equivalents, E[.] and V(.). The general decision problem underlying a forecast is to maximize the expected value of some utility function, U(Yt+h, A(Ŷt+h,t)), that depends on the outcome of Yt+h as well as on the decision maker's actions, A, which in general depend on the full distribution forecast of Yt+h, Ft+h,t. Here we assume that A depends only on the forecast Ŷt+h,t and we write this as A(Ŷt+h,t). Granger and Machina (2006) show that under certain conditions on the utility function there exists a unique point forecast which leads to the same decision as if a full distribution forecast had been available.

3 The assumption that Yt is adapted to Ft rules out the direct application of the results in this chapter to, e.g., volatility forecast evaluation. In such a scenario the object of interest, the conditional variance, is not adapted to Ft. Using imperfect proxies for the object of interest in forecast optimality tests can cause difficulties, as pointed out by Hansen and Lunde (2006) and further studied in Patton (2006b).
4 We focus on point forecasts below, and leave the interesting extension to interval and density forecasting for future research.
2.1. Properties under general loss functions

Under general loss the first order condition for the optimal forecast is5

Et[∂L(Yt+h, Ŷ∗t+h,t)/∂Ŷt+h,t] = ∫ ∂L(y, Ŷ∗t+h,t)/∂Ŷt+h,t dFt+h,t(y) = 0.   (2)

This condition can be rewritten using what Granger (1999) refers to as the (optimal) generalized forecast error, ψ∗t+h,t ≡ ∂L(Yt+h, Ŷ∗t+h,t)/∂Ŷt+h,t,6 so that equation (2) simplifies to

Et[ψ∗t+h,t] = ∫ ψ∗t+h,t dFt+h,t(y) = 0.   (3)

Under a broad set of conditions ψ∗t+h,t is therefore a martingale difference sequence with respect to the information set used to compute the forecast, Ft. The generalized forecast error is closely related to the "generalized residual" often used in the analysis of discrete, censored or grouped variables, see Gourieroux et al. (1987) and Chesher and Irish (1987) for example. Both the generalized forecast error and the generalized residual are based on first order (or "score") conditions.

We next turn our attention to proving properties of the generalized forecast error analogous to those for the standard case. We will sometimes, though not generally, make use of the following assumption on the DGP for Xt ≡ [Yt, Z̃t′]′:
Assumption D1: {Xt} is a strictly stationary stochastic process.

Note that we do not assume that Xt is continuously distributed and so the results below may apply to forecasts of discrete random variables, such as direction-of-change forecasts or default forecasts. The following properties of the loss function are assumed at various points of the analysis, but not all will be required everywhere.

Assumption L1: The loss function is (at least) once differentiable with respect to its second argument, except on a set of Ft+h,t-measure zero, for all t and h.
Assumption L2: Et[L(Yt+h, ŷ)] < ∞ for some ŷ ∈ Y and all t, almost surely.
Assumption L2': An interior optimum of the problem min_{ŷ∈Y} ∫ L(y, ŷ) dFt+h,t(y) exists for all t and h.
Assumption L3: |Et[∂L(Yt+h, ŷ)/∂ŷ]| < ∞ for some ŷ ∈ Y and all t, almost surely.

Assumption L2 simply ensures that the conditional expected loss from a forecast is finite, for some finite forecast. Assumptions L1 and L2' allow us to use the first order condition of the minimization problem to study the optimal forecast. One set of sufficient conditions for Assumption L2' to hold are Assumption L2 and:

Assumption L4: The loss function is a nonmonotonic, convex function solely of the forecast error.

We do not require that L is everywhere differentiable with respect to its second argument, nor do we need to assume a unique optimum (though this is obtained if we impose Assumption L4, with the convexity of the loss function being strict). Assumption L3 is required to interchange expectation and differentiation: ∂Et[L(Yt+h, ŷ)]/∂ŷ = Et[∂L(Yt+h, ŷ)/∂ŷ]. The bounds on the integral on the left-hand-side of this expression are unaffected by the choice of ŷ, and so two of the terms in Leibnitz's rule drop out, meaning we need only assume that the term on the right-hand-side is finite.

The following proposition establishes properties of the generalized forecast error, ψ∗t+h,t:

Proposition 1
1. Let assumptions L1, L2' and L3 hold. Then the generalized forecast error, ψ∗t+h,t, has conditional (and unconditional) mean zero.
2. Let assumptions L1, L2' and L3 hold. Then the generalized forecast error from an optimal h-step forecast made at time t exhibits zero correlation with any function of any element of the time t information set, Ft, for which second moments exist. In particular, the generalized forecast error will exhibit zero serial correlation for lags greater than (h − 1).7
3. Let assumptions D1 and L2 hold. Then the unconditional expected loss of an optimal forecast error is a nondecreasing function of the forecast horizon.

5 This result relies on the ability to interchange the expectation and differentiation operators. Assumptions L1–L3 given below are sufficient conditions for this to hold.
6 Granger (1999) considers loss functions that have the forecast error as an argument, and so defines the generalized forecast error as ψ∗t+h,t ≡ ∂L(et+h,t)/∂et+h,t. In both definitions, ψ∗t+h,t can be viewed as the marginal loss associated with a particular prediction, Ŷt+h,t.
All proofs are given in the appendix. The above result is useful when the loss function is known, since ψ∗t+h,t can then be calculated directly and employed in generalized efficiency tests that project ψ∗t+h,t on period-t instruments. For example, the martingale difference property of ψ∗t+h,t can be tested by testing α = β = 0 for all Zt ∈ Ft in the following regression:

ψt+h,t = α + β′Zt + ut+h.   (4)
The above simple test will not generally be consistent against all departures from forecast optimality. A consistent test of forecast optimality based on the generalized forecast errors could be constructed using the methods of Bierens (1990), de Jong (1996) and Bierens and Ploberger (1997). Tests based on generalized forecast errors obtained from a model with estimated parameters can also be conducted, using the methods in West (1996, 2006).

If the same forecaster reported forecasts for multiple horizons we can conduct a joint test of forecast optimality across all horizons. This can be done without requiring that the forecaster's loss function is the same across all horizons, i.e., we allow the one-step ahead forecasting problem to involve a different loss function to the two-step ahead forecasting problem, even for the same forecaster. A joint test of optimality across all horizons may be conducted as

(ψt+1,t, ψt+2,t, . . . , ψt+H,t)′ = A + B Zt + ut,H   (5)

and then testing H0: A = B = 0 vs. Ha: A ≠ 0 ∪ B ≠ 0. More concretely, one possibility is to estimate a SUR system for the generalized forecast errors:

(ψt+1,t, ψt+2,t, . . . , ψt+H,t)′ = A + B1 (ψt,t−1, ψt,t−2, . . . , ψt,t−H)′ + . . . + BJ (ψt−J+1,t−J, ψt−J+1,t−J−1, . . . , ψt−J+1,t−J−H+1)′ + ut,H,   (6)

and then test H0: A = B = 0 vs. Ha: A ≠ 0 ∪ B ≠ 0.

7 Optimal h-step forecast errors under MSE loss are MA processes of order no greater than h − 1. In a nonlinear framework an MA process need not completely describe the dependence properties of the generalized forecast error. However, the autocorrelation function of the generalized forecast error will match some MA(h − 1) process.
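A minimal sketch of how the single-horizon version of this test might be computed is given below (Python; the helper name and the homoskedastic F-test are simplifying assumptions, and in practice robust or SUR-based versions as in (5)–(6) would typically be preferred).

```python
import numpy as np
from scipy import stats

def generalized_efficiency_test(psi, Z):
    """Test alpha = beta = 0 in  psi_t = alpha + beta' Z_t + u_t,
    where psi is the generalized forecast error and Z holds time-t instruments."""
    psi = np.asarray(psi, dtype=float)
    X = np.column_stack([np.ones(len(psi)), np.asarray(Z, dtype=float)])
    b = np.linalg.lstsq(X, psi, rcond=None)[0]
    resid = psi - X @ b
    rss_u = resid @ resid
    rss_r = psi @ psi                 # restricted model: all coefficients zero
    q, df = X.shape[1], len(psi) - X.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df)
    return F, stats.f.sf(F, q, df)

# Example instruments Z: lagged generalized errors, or any variables known
# to the forecaster when the forecast was made.
```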
2.2. Properties under MSE Loss

In the special case of a squared error loss function,

L(Yt+h, Ŷt+h,t) = θ (Yt+h − Ŷt+h,t)²,  θ > 0,   (7)

optimal forecasts can be shown to have the standard properties, using the results from Proposition 1. For reference we list these below:

Corollary 1 Let the loss function be L(Yt+h, Ŷt+h,t) = θh (Yt+h − Ŷt+h,t)², θh > 0 for all h, and assume that Et[Y²t+h] < ∞ for all t and h almost surely. Then
1. The optimal forecast of Yt+h is Et[Yt+h] for all forecast horizons h;
2. The forecast error associated with the optimal forecast has conditional (and unconditional) mean zero;
3. The h-step forecast error associated with the optimal forecast exhibits zero serial covariance beyond lag (h − 1);
Moreover, if we further assume that Y is covariance stationary, we obtain:
4. The unconditional variance of the forecast error associated with the optimal forecast is a nondecreasing function of the forecast horizon.

This corollary shows that the standard properties of optimal forecasts are generated by the assumption of mean squared error loss alone; in particular, assumptions on the DGP (beyond covariance stationarity and finite first and second moments) are not required. Properties such as these have been extensively tested in empirical studies of optimality of predictions or rationality of forecasts, e.g. by testing that the intercept
is zero (α = 0) and the slope is unity (β = 1) in the Mincer–Zarnowitz (Mincer and Zarnowitz, 1969) regression Yt+h = α + β Yˆt+h,t + εt+h
(8)
or equivalently in a regression of forecast errors on current instruments, et+h,t = α + β Zt + ut+h .
(9)
Elliott, Komunjer and Timmermann (2008) show that the estimates of β will be biased when the loss function used to generate the forecasts is of the asymmetric squared loss variety. Moreover, the bias in that case depends on the correlation between the absolute forecast error and the instruments used in the test. It is possible to show that under general (non-MSE) loss the properties of the optimal forecast error listed in Corollary 1 can all be violated; see Patton and Timmermann (2007a) for an example using a regime switching model and the “linex” loss function of Varian (1974).
3. Properties under a change of measure

In the previous section we showed that by changing our object of analysis from the forecast error to the "generalized forecast error" we can obtain the usual properties of unbiasedness and zero serial correlation. As an alternative approach, we next consider instead changing the probability measure used to compute the properties of the forecast error. This analysis is akin to the use of risk-neutral densities in asset pricing, cf. Harrison and Kreps (1979). In asset pricing one may scale the objective (or physical) probabilities by the stochastic discount factor (or the discounted ratio of marginal utilities) to obtain a risk-neutral probability measure and then apply risk-neutral pricing methods. Here we will scale the objective probability measure by the ratio of the marginal loss, ∂L/∂ŷ, to the forecast error, and then show that under the new probability measure the standard properties hold; i.e., under the new measure, (Yt+h − Ŷt+h,t, Ft) is a martingale difference sequence when Ŷt+h,t = Ŷ∗t+h,t, where Ŷ∗t+h,t is defined in equation (1). We call the new measure the "MSE-loss probability measure". The resulting method thus suggests an alternative means of evaluating forecasts made using general loss functions.

Note that the conditional distribution of the forecast error, Fet+h,t, given Ft and any forecast ŷ ∈ Y, satisfies

Fet+h,t(e; ŷ) = Ft+h,t(ŷ + e),   (10)

for all (e, Ŷt+h,t) ∈ R × Y, where Ft+h,t is the conditional distribution of Yt+h given Ft. To facilitate the change of measure, we make use of the following assumption:

Assumption L5: ∂L(y, ŷ)/∂ŷ ≤ (≥) 0 for y ≥ (≤) ŷ.

Assumption L5 simply imposes that the loss function is nondecreasing as the forecast moves further away (in either direction) from the true value, which is a reasonable assumption. It is common to impose that L(ŷ, ŷ) = 0, i.e., the loss from a perfect forecast is zero, but this is obviously just a normalization and is not required here.
The sign of (y − ŷ)−1 ∂L(y, ŷ)/∂ŷ is negative under assumption L5, and in defining the MSE-loss probability measure we need to further assume that it is bounded and nonzero:

Assumption L6: 0 < −Et[(Yt+h − ŷ)−1 ∂L(Yt+h, ŷ)/∂ŷ] < ∞ for all ŷ ∈ Y and all t, almost surely.

Definition 1 Let assumptions L5 and L6 hold and let

Λ(e, ŷ) ≡ −(1/e) · ∂L(y, ŷ)/∂ŷ |y=ŷ+e.   (11)

Then the "MSE-loss probability measure", dF̃et+h,t(·|ŷ), is defined by

dF̃et+h,t(e; ŷ) = Λ(e, ŷ) · dFet+h,t(e; ŷ) / Et[Λ(Yt+h − ŷ, ŷ)].   (12)

By construction the MSE-loss probability measure F̃(·|ŷ) is absolutely continuous with respect to the usual probability measure, F(·|ŷ) (that is, F̃(·|ŷ) << F(·|ŷ)). The function

Λ̃t+h,t(e, ŷ) ≡ Λ(e, ŷ) / Et[Λ(Yt+h − ŷ, ŷ)]   (13)

is the Radon–Nikodým derivative dF̃et+h,t(·|ŷ)/dFet+h,t(·|ŷ). If we let u = e−1, then Assumption L6 requires that ∂L(y, ŷ)/∂ŷ|y=ŷ+1/u = O(u−1). Note that Λ(e, ŷ) is well defined at e = 0 for some common loss functions. For example,

MSE:      lim_{e→0} Λ(e, ŷ) = 2,
Linex:    lim_{e→0} Λ(e, ŷ) = a²,
PropMSE:  lim_{e→0} Λ(e, ŷ) = 2/ŷ²,
where the Linex and PropMSE loss functions are defined as L(y, ŷ) = exp{ae} − ae − 1 and L(y, ŷ) = (y/ŷ − 1)², respectively. For mean absolute error loss, L(y, ŷ) = |e|, the limits from both directions diverge, meaning that there is no MSE-loss density under MAE in general. However, if the variable of interest is conditionally symmetrically distributed at all points in time, then the optimal forecast under MAE coincides with the optimal forecast under MSE, as the conditional mean is equal to the conditional median, and so the appropriate Radon–Nikodým derivative is equal to one.

We now show that under the MSE-loss probability measure the optimal h-step ahead forecast errors exhibit the properties that we would expect from optimal forecasts under MSE loss:

Proposition 2
1. Let assumptions L1, L5 and L6 hold. Then the "MSE-loss probability measure", F̃et+h,t(·|ŷ), defined in equation (12) is a proper probability distribution function for all ŷ ∈ Y.
2. If we further let assumption L2' hold, then the optimal forecast error, e∗t+h,t = Yt+h − Ŷ∗t+h,t, has conditional mean zero under the MSE-loss probability measure F̃et+h,t(·|Ŷ∗t+h,t).
3. The optimal forecast error is serially uncorrelated under the MSE-loss probability measure, F̃et+h,t(·|Ŷ∗t+h,t), for all lags greater than h − 1.
4. Ṽ[e∗t+h,t], the variance of e∗t+h,t under F̃et+h,t evaluated at Ŷ∗t+h,t, is a nondecreasing function of the forecast horizon.
Notice that e∗t+h,t is a martingale difference sequence, with respect to Ft, under F̃t+h,t. Furthermore, although the MSE-loss probability measure operates on forecast errors, the result holds for general loss functions having Yt+h and Ŷ∗t+h,t as separate arguments.

It is worth emphasizing that the MSE-loss probability measure is a conditional distribution, and so obtaining an estimate of it from data is not as simple as it would be if it was an unconditional distribution. If we assume that the density fet+h,t exists then it is possible, under some conditions, to obtain a consistent estimate of fet+h,t via seminonparametric density estimation, see Gallant and Nychka (1987). If L is known then Λ is, of course, also known.8 With consistent estimates of fet+h,t and Λ it is simple to construct an estimator of f̃et+h,t. In recent work, Chernov and Mueller (2007) specify a flexible parametric model for f̃t and Λt in order to estimate the underlying objective conditional density, f, of forecasters from a variety of macroeconomic surveys. From this density estimate, they are then able to both "bias-correct" the individual forecasts, and compute combination forecasts.
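To see the change of measure at work numerically, the following sketch (Python; an illustration using the scaled linex loss introduced in Section 4 and a normal objective density on a grid, not part of the original chapter) reweights the objective error density by Λ as in (12) and checks that the optimally biased forecast error has mean zero under the resulting MSE-loss density, as stated in Proposition 2.

```python
import numpy as np

a, h = 3.0, 1.0                      # linex parameter and conditional variance
e = np.linspace(-8, 8, 4001)         # grid for the forecast error
# Objective density of the optimal error: e* ~ N(-(a/2) h, h)
f_obj = np.exp(-(e + 0.5 * a * h) ** 2 / (2 * h)) / np.sqrt(2 * np.pi * h)

# Lambda(e, yhat) = -(1/e) dL/dyhat; for the (2/a^2)-scaled linex, dL/dyhat = (2/a)(1 - exp(a e))
with np.errstate(divide="ignore", invalid="ignore"):
    lam = -(2.0 / (a * e)) * (1.0 - np.exp(a * e))
lam[np.isclose(e, 0.0)] = 2.0        # limit value at e = 0 for the scaled linex loss

de = e[1] - e[0]
f_mse = lam * f_obj / np.sum(lam * f_obj * de)   # eq. (12): reweight and renormalize

print("mean under objective density:", np.sum(e * f_obj * de))   # approx -(a/2) h
print("mean under MSE-loss density:", np.sum(e * f_mse * de))    # approx 0
```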
4. Numerical example and an application to US inflation

To illustrate how the MSE-loss error density differs from the objective error density, consider the following simple example. Consider the following AR(1)-GARCH(1,1) data generating process:

Yt = φ0 + φ1 Yt−1 + εt
εt = ht^{1/2} νt                                         (14)
ht = ω + β ht−1 + α ε²t−1
νt | Ft−1 ∼ N(0, 1).

Next, consider the simple and analytically tractable "linex" loss function of Varian (1974), scaled by 2/a²:

L(y, ŷ; a) = (2/a²) (exp{a(y − ŷ)} − a(y − ŷ) − 1).      (15)

The scaling term 2/a² does not affect the optimal forecast, but ensures that this function limits to the MSE loss function as a → 0. When a > 0, under-predictions (y > ŷ, or e > 0) carry an approximately exponential penalty, whereas over-predictions (y < ŷ, or e < 0) carry an approximately linear penalty. When a < 0 the penalty for over-predictions is approximately exponential whereas the penalty for under-predictions is approximately linear. In Figure 10.1 we present the linex loss function for a = 3.
8 If L is unknown, a nonparametric estimate of Λ may be obtained via sieve estimation methods, for example, see Andrews (1991) or Chen and Shen (1998).
Fig. 10.1. MSE and Linex loss functions for a range of forecast errors (MSE loss and Linex loss with a = 3)
Under linex loss, the optimal one-step-ahead forecast and the associated forecast error are (see Varian, 1974; Zellner, 1986; and Christoffersen and Diebold, 1997)

Ŷ∗t = Et−1[Yt] + (a/2) Vt−1[Yt]
e∗t = −(a/2) Vt−1[Yt] + εt                               (16)
    = −(a/2) ht + ht^{1/2} νt
so e∗t | Ft−1 ∼ N(−(a/2) ht, ht)

and so we see that the process for the conditional mean (an AR(1) process above) does not affect the properties of the optimal forecast error. Notice that the forecast error follows an ARCH-in-mean process of the type analyzed by Engle, Lilien and Robins (1987). The generalized forecast error for this example is as follows, and has a log-normal distribution when suitably centered and standardized:

ψt ≡ ∂L(Yt, Ŷt)/∂ŷ = (2/a) (1 − exp{a(Yt − Ŷt)})         (17)
so 1 − (a/2) ψt | Ft−1 ∼ log N(a(μt − Ŷt), a² ht)
and 1 − (a/2) ψ∗t | Ft−1 ∼ log N(−(a²/2) ht, a² ht).
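These closed forms are easy to verify by simulation. The sketch below (Python; the GARCH parameter values are the stylized ones used later in this section, and the code is illustrative rather than the authors' original) simulates the AR(1)-GARCH(1,1) process, forms the linex-optimal forecast, and confirms that the forecast error has a negative mean of about −(a/2)E[ht] while the generalized forecast error is approximately mean zero.

```python
import numpy as np

rng = np.random.default_rng(1)
T, a = 200_000, 3.0
phi0, phi1 = 0.0, 0.5
omega, alpha, beta = 0.02, 0.05, 0.93       # unconditional variance = 1

y = np.zeros(T)
h = np.full(T, omega / (1 - alpha - beta))
eps = np.zeros(T)
for t in range(1, T):
    h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    eps[t] = np.sqrt(h[t]) * rng.normal()
    y[t] = phi0 + phi1 * y[t - 1] + eps[t]

# Linex-optimal one-step forecast (eq. 16) and generalized forecast error (eq. 17)
mu = phi0 + phi1 * np.roll(y, 1)            # conditional mean E_{t-1}[Y_t]
y_hat = mu + 0.5 * a * h                    # optimal forecast under linex loss
e_star = y - y_hat
psi_star = (2.0 / a) * (1.0 - np.exp(a * e_star))

print("mean forecast error:", e_star[1:].mean())                # approx -(a/2) E[h_t] = -1.5
print("mean generalized error:", psi_star[1:].mean())           # near 0 (heavy-tailed, so noisy)
```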
Fig. 10.2. Objective and "MSE-loss" error densities for a GARCH process under Linex loss, for various values of the predicted conditional variance (panels for ĥt = 0.54, 0.73, 1.00 (mean), 1.11, 1.43 and 2.45; each panel plots the objective density and the MSE-loss density against the forecast error)
For the numerical example, we chose values of the predicted variance, ht , to correspond to the mean and the 0.01, 0.25, 0.75, 0.9 and 0.99 percentiles of the unconditional distribution of ht when the GARCH parameters are set to (ω, α, β) = (0.02, 0.05, 0.93), which are empirically reasonable. A plot of the objective and the MSE-loss densities is given in Figure 10.2. In all cases we see that the MSE-loss density is shifted to the right of the objective density, in order to remove the (optimal) negative bias that is present under the objective probability distribution due to the high cost associated with positive forecast errors. The way this probability mass is shifted depends on the level of predicted volatility, and Figure 10.2 reveals a variety of shapes for the MSE-loss density. When volatility is low
(ht = 0.54 or 0.73), the MSE-loss density remains approximately bell-shaped, and is a simple shift of location (with a minor increase in spread) so that the mean of this density is zero. When volatility is average to moderately high (ht = 1.00 or 1.11), the MSE-loss density becomes a more rounded bell shape and remains unimodal. When volatility is high, the MSE-loss density becomes bimodal: it is approximately “flat-topped” for the ht = 1.43 case (though actually bimodal) and clearly bimodal for the ht = 2.45 case. The bimodality arises from the interaction of the three components that affect the shape of the MSE-loss density: the derivative of the loss function, the shape of the objective density, and the inverse of the forecast error. We also see that the MSE-loss density is symmetric in this example. This is not a general result: a symmetric objective density (such as in this example) combined with an asymmetric loss function will generally lead to an asymmetric MSE-loss density. It is the particular combination of the normal objective density with the linex loss function that leads to the symmetric MSE-loss function observed here. A symmetric but non-normal conditional density for νt , such as a mixture of normals, can be shown to lead to an asymmetric MSE-loss density.
4.1. Application to US inflation

In this section we apply the methods of this chapter to inflation forecasting, which was the application in Rob Engle's original ARCH paper, Engle (1982a). We use monthly CPI inflation for the US, Δ log(CPIt), over the period January 1982 to December 2006. This happens to be the period starting with the publication of the original ARCH paper, and also coincides with the period after the change in the Federal Reserve's monetary policy during the "monetarist experiment" from 1979–1982, which is widely believed to have led to a break in the inflation dynamics and volatility of many macroeconomic time series. We use a simple AR(4) model for the conditional mean, and a GARCH(1,1) model for the conditional variance.9 Assuming normality for the standardized residuals from this model, we can then obtain both the MSE-optimal forecast (simply the conditional mean) and the Linex-optimal forecast, where we set the linex shape parameter equal to three, as in the previous section.10

The data and forecasts are presented in Figure 10.3. In the upper panel we plot both the realized inflation (in percent per month) and the estimated conditional mean, which is labeled the "MSE forecast" in the lower panel. The lower panel reveals that the linex forecast is always greater than the MSE forecast, by an amount that grows in periods with high variance (as shown in the middle panel), with the average difference being 0.087%, or 1.04% per year. With average realized inflation at 3.06% per year in this sample period, the linex forecast (optimal) bias is substantial.
9 The Engle (1982a) LM test for ARCH in the residuals from the AR(4) model rejected the null of homoskedasticity, at the 0.05 level, for all lags up to 12.
10 The Jarque–Bera (1987) test for the normality of the standardized residuals actually rejects the assumption of normality here. The estimated skewness of these residuals is near zero, but the kurtosis is 4.38, which is far enough from 3 for this test to reject normality. We nevertheless proceed under the assumption of normality.
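A sketch of the two-step estimation and forecast construction used in this application is given below (Python; `inflation` is assumed to be the monthly CPI inflation series in percent, and the AR(4)-by-OLS followed by GARCH(1,1)-by-quasi-likelihood recipe is a simplification of the estimation actually performed).

```python
import numpy as np
from scipy.optimize import minimize

def fit_ar4_garch11_and_forecast(inflation, a=3.0):
    y = np.asarray(inflation, dtype=float)
    # Step 1: AR(4) conditional mean by OLS
    X = np.column_stack([np.ones(len(y) - 4)] + [y[4 - j:len(y) - j] for j in range(1, 5)])
    yy = y[4:]
    b = np.linalg.lstsq(X, yy, rcond=None)[0]
    mu = X @ b
    e = yy - mu

    # Step 2: GARCH(1,1) on the AR(4) residuals by Gaussian quasi-likelihood
    def negloglik(params):
        omega, alpha, beta = params
        if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
            return 1e10
        h = np.empty_like(e)
        h[0] = e.var()
        for t in range(1, len(e)):
            h[t] = omega + alpha * e[t - 1] ** 2 + beta * h[t - 1]
        return 0.5 * np.sum(np.log(h) + e ** 2 / h)

    res = minimize(negloglik, x0=[0.1 * e.var(), 0.05, 0.9], method="Nelder-Mead")
    omega, alpha, beta = res.x
    h = np.empty_like(e)
    h[0] = e.var()
    for t in range(1, len(e)):
        h[t] = omega + alpha * e[t - 1] ** 2 + beta * h[t - 1]

    mse_forecast = mu                       # optimal under squared-error loss
    linex_forecast = mu + 0.5 * a * h       # optimal under linex loss, eq. (16)
    return mse_forecast, linex_forecast, h
```

Both the conditional mean and the conditional variance are built only from lagged information, so the two forecast series are one-step-ahead (in-sample) forecasts given the estimated parameters.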
Fig. 10.3. Monthly CPI inflation in the US over the period January 1982 to December 2006, along with the estimated conditional mean, conditional standard deviation, and the linex-optimal forecast

To emphasize the importance of the loss function in considering forecast optimality, we illustrate two simple tests of optimality for each of the two forecasts.11 The first looks for bias in the forecast, whereas the second looks for bias and first order autocorrelation in the forecast errors. The results for the MSE and Linex forecasts are presented below, with Newey–West (Newey and West, 1987) t-statistics presented in parentheses below the parameter estimates. The "p value" below reports the p value associated with the test of the null of forecast optimality, either zero bias, or zero bias and zero autocorrelation.

11 Formal testing of forecast optimality would use a pseudo-out-of-sample period for analysis, separate from the period used for estimation.
e^MSE_t = −0.002 + ut,                          p value = 0.902
          (−0.123)

e^MSE_t = −0.002 + 0.003 e^MSE_{t−1} + ut,      p value = 0.992        (18)
          (−0.124)  (0.050)

e^Linex_t = −0.087 + ut,                        p value = 0.000
           (−6.175)

e^Linex_t = −0.085 + 0.021 e^Linex_{t−1} + ut,  p value = 0.000
           (−6.482)  (0.327)
As expected, the MSE-optimal forecast passes these tests. The Linex-optimal forecast fails both of these tests, primarily due to the positive bias in the linex forecasts. This is, of course, also expected, as the linex forecasts are constructed for a situation where the costs of under-predicting are much greater than those of over-predicting, see Figure 10.1. Thus, the linex forecast is not constructed to be optimal under MSE loss, which is what the above two tests examine.

Next we consider testing for optimality under linex loss, using the generalized forecast error for that loss function and the methods discussed in Section 2. The formula for the generalized forecast error under linex loss is given in equation (17), and from that we construct ψ^MSE_t and ψ^Linex_t using the MSE forecast and the Linex forecast. We ran the same tests as above, but now using the generalized forecast error rather than the usual forecast error, and obtained the following results:

ψ^MSE_t = −0.210 + ut,                            p value = 0.000
          (−3.985)

ψ^MSE_t = −0.214 − 0.019 ψ^MSE_{t−1} + ut,        p value = 0.000        (19)
          (−3.737)  (−0.342)

ψ^Linex_t = −0.010 + ut,                          p value = 0.798
           (−0.256)

ψ^Linex_t = −0.010 − 0.031 ψ^Linex_{t−1} + ut,    p value = 0.849
           (−0.263)  (−0.550)
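The regressions in (18) and (19) are ordinary least squares with Newey–West standard errors; a compact sketch of how such a test can be coded is given below (Python; the function name, the lag truncation, and the use of a chi-squared Wald statistic for the joint null are illustrative assumptions).

```python
import numpy as np
from scipy import stats

def newey_west_test(y, X, lags=4):
    """OLS of y on X with Newey-West (HAC) standard errors; returns coefficients,
    t-statistics, and a chi-squared p-value for the null that all coefficients are zero."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u = y - X @ b
    S = (X * u[:, None]).T @ (X * u[:, None]) / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)                       # Bartlett kernel weight
        G = (X[j:] * u[j:, None]).T @ (X[:-j] * u[:-j, None]) / T
        S += w * (G + G.T)
    V = T * XtX_inv @ S @ XtX_inv                        # HAC covariance of b
    tstats = b / np.sqrt(np.diag(V))
    wald = b @ np.linalg.inv(V) @ b
    return b, tstats, stats.chi2.sf(wald, k)

# bias test:        newey_west_test(psi, np.ones((len(psi), 1)))
# bias + AR(1):     newey_west_test(psi[1:], np.column_stack([np.ones(len(psi) - 1), psi[:-1]]))
```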
Using the test of optimality based on linex loss (with parameter equal to three), we find that the MSE forecasts are strongly rejected, whereas the linex forecasts are not. The contrast between this conclusion and the conclusion from the tests based on the usual forecast errors provides a clear illustration of the importance of matching the loss function used in forecast evaluation with that used in forecast construction. Failure to accurately account for the forecaster's objectives through the loss function can clearly lead to false rejections of forecast optimality.

Finally, we present the estimated objective and MSE-loss densities associated with these forecasts. We nonparametrically estimated the objective density of the standardized residuals, ν̂t ≡ (yt − μ̂t)/√ĥt, where μ̂t is the conditional mean and √ĥt is the conditional standard deviation, using a Gaussian kernel with bandwidth set to 0.9 × (V̂[ν̂t])^{1/2} × T^{−1/5}, where T = 300 is the sample size.
Fig. 10.4. Estimated objective and "MSE-loss" error densities for US inflation, for various values of the predicted conditional variance (panels for ĥt = 0.030, 0.038, 0.057, 0.066, 0.085 and 0.157; each panel plots the objective density and the MSE-loss density against the forecast error)

From this, we can then compute an estimate of the conditional (objective) density of the forecast errors:

f̂(e | ĥt) = (1/√ĥt) f̂ν((e + (a/2) ĥt)/√ĥt).   (20)

The MSE-loss density is estimated as:

f̃(e | ĥt) = [(2/(ae))(1 − exp{ae}) / Ê[(2/(aet))(1 − exp{aet}) | ĥt]] f̂(e | ĥt),   (21)

where Ê[(2/(aet))(1 − exp{aet}) | ĥt] ≡ (1/T) Σ_{i=1}^{T} 2(1 − exp{a(√ĥt ν̂i − (a/2)ĥt)}) / (a(√ĥt ν̂i − (a/2)ĥt)),   (22)
and thus uses both the nonparametric estimate of the objective density, and a data-based estimate of the normalization constant. The estimated objective and MSE-loss densities are presented in Figure 10.4, using the same method of choosing values for the predicted variance: we use values that correspond to the mean and the 0.01, 0.25, 0.75, 0.9 and 0.99 percentiles of the sample distribution ˆ t from our model. As in the simulation example in the previous section, we see of h that the objective density is centered to the left of zero, and that the centering point moves further from zero as the variance increases. A small ‘bump’ in the right tail of the objective density estimate is amplified in the MSE-loss estimate, particularly as the volatility increases, and the MSE-loss density is approximately centered on zero. The “bump” in the right tail of both of these densities disappears if we impose that the standardized residuals are truly normally distributed; in that case the objective density is, of course, Gaussian, and the resulting MSE-loss density is unimodal across these ˆ t. values of h
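A sketch of this two-step estimator is given below (Python; illustrative only, with a Gaussian kernel for f̂ν and the data-based normalization constant of (22); the evaluation grid is assumed not to contain e = 0).

```python
import numpy as np

def mse_loss_density(e_grid, h_t, nu_hat, a=3.0):
    """Sketch of the estimator in (20)-(22): objective and MSE-loss error densities
    at the points in e_grid for a given predicted variance h_t, from standardized
    residuals nu_hat and linex parameter a."""
    e_grid = np.asarray(e_grid, dtype=float)
    nu_hat = np.asarray(nu_hat, dtype=float)
    T = len(nu_hat)
    bw = 0.9 * nu_hat.std() * T ** (-0.2)               # Silverman-type bandwidth

    def f_nu(z):                                        # Gaussian kernel density of nu_hat
        u = (z[:, None] - nu_hat[None, :]) / bw
        return np.exp(-0.5 * u ** 2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))

    # Eq. (20): density of e = sqrt(h) * nu - (a/2) * h, by change of variables
    f_obj = f_nu((e_grid + 0.5 * a * h_t) / np.sqrt(h_t)) / np.sqrt(h_t)

    # Eq. (22): data-based estimate of the normalization constant
    e_sim = np.sqrt(h_t) * nu_hat - 0.5 * a * h_t
    norm_const = np.mean(2.0 * (1.0 - np.exp(a * e_sim)) / (a * e_sim))

    # Eq. (21): reweight the objective density to obtain the MSE-loss density
    weight = 2.0 * (1.0 - np.exp(a * e_grid)) / (a * e_grid)
    return f_obj, weight / norm_const * f_obj
```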
5. Conclusion This chapter derives properties of an optimal forecast that hold for general classes of loss functions in the presence of conditional heteroskedasticity. Studying these properties is important, given the overwhelming evidence for conditional heteroskedasticity that has accumulated since the publication of Engle’s seminal (1982a) ARCH paper. We show that irrespective of the loss function and data generating process, a generalized orthogonality principle must hold provided information is efficiently embedded in the forecast. We suggest that this orthogonality principle leads to two primary implications: (1) a transformation of the forecast error, the “generalized forecast error”, must be uncorrelated with elements of the information set available to the forecaster, and (2) a transformation of the density of the forecast errors, labeled the “MSE-loss” density, must exist which gives forecasts that are optimal under non-MSE loss the same properties as those that are optimal under MSE loss. The first approach to testing forecast optimality has its roots in the widely used Mincer–Zarnowitz (1969) regression, whereas the second approach is based on a transformation from the usual probability measure to an “MSE-loss probability measure”. This transformation has its roots in asset pricing and “risk neutral” probabilities, but to our knowledge has not previously been considered in the context of forecasting. Implementing the first approach empirically is relatively straightforward, although it may require estimation of the parameters of the loss function if these are unknown (Elliott et al., 2005); implementing the second approach will require thinking about forecast (sub-)optimality in a different way, which may yield new insights into forecaster behavior.
Appendix Proof of Proposition 1 1. Assumptions L1 and L2’ allow us to analyze the first order condition for the optimal forecast, and assumption L3 permits the exchange of differentiation and expectation in the first order condition, giving us,
∗ , by the optimality of Yˆt+h,t
⎤ ⎡ ∗ ∂L Yt+h , Yˆt+h,t ' ∗ ( ⎦ = 0. Et ψt+h,t = Et ⎣ ∂ Yˆt+h,t
1 0 ∗ = 0 follows from the law of iterated expectations. E ψt+h,t ∗ = To prove point 2, as (Yt , Yt−1 , . . .) ∈ Ft by assumption we know that ψt+h−j,t−j ∗ ˆ ∂L(Yt+h−j , Yt+h−j,t−j )/∂ yˆ is an element of Ft for all j ≥ h. Assumptions L1 and L2’ again allow us to analyze the first order condition for the optimal forecast, and assumption L3 permits the exchange of differentiation and expectation in the first order condition. We thus have ⎤ ⎡ ∗ ∂L Yt+h , Yˆt+h,t ( ' ∗ Ft ⎦ = 0, E ψt+h,t |Ft = E ⎣ ∂ Yˆ ∗ which implies E[ψt+h,t · φ(Zt )] = 0 for all Zt ∈ Ft and all functions φ for which this ∗ is uncorrelated with any function of any element of Ft . This moment exists. Thus, ψt+h,t ∗ ∗ ∗ · ψt+h−j,t−j ] = 0, for all j ≥ h, and so ψt+h,t is uncorrelated with implies that E[ψt+h,t ∗ ψt+h−j,t−j . To prove point 3, note that assumption (D1) of strict stationarity for {Xt } yields the ∗ ∗ ) as Yˆt+h,t is a time-invariant function of Z˜t . Thus for strict stationarity of (Yt+h , Yˆt+h,t all h and j we have 11 0 0 11 0 0 ∗ ∗ = E Et−j L Yt+h−j , Yˆt+h−j,t−j E Et L Yt+h , Yˆt+h,t
and so the unconditional expected loss only depends on the forecast horizon, h, and not ∗ we on the period when the forecast was made, t. By the optimality of the forecast Yˆt+h,t also have, ∀j ≥ 0, 0 1 0 1 ∗ ∗ ≥ Et L Yt+h , Yˆt+h,t Et L Yt+h , Yˆt+h,t−j 0 1 0 1 ∗ ∗ E L Yt+h , Yˆt+h,t−j ≥ E L Yt+h , Yˆt+h,t 1 0 1 0 ∗ ∗ ≥ E L Yt+h , Yˆt+h,t E L Yt+h+j , Yˆt+h+j,t where the second line follows using the law of iterated expectations and the third line follows from strict stationarity. Hence the unconditional expected loss is a nondecreasing function of the forecast horizon. Proof of Corollary 1 This proof follows directly from the proof of Proposition 1 above, when one observes the relation between the forecast error and the generalized forecast ∗ ∗ , for the mean squared loss case: e∗t+h,t = − 2θ1h ψt+h,t , and noting that the error, ψt+h,t MSE loss function satisfies assumptions L1, L3 and L4, which implies a unique interior optimum. ˜ To prove Proposition 2 we prove the following lemma, for the “L-loss probability measure”, which nests the MSE-loss probability measure as a special case. We will require the following generalization of assumption L6:
Assumption L6': Given two loss functions, L and L̃, 0 < Et[(∂L(Yt+h, ŷ)/∂ŷ) / (∂L̃(Yt+h, ŷ)/∂ŷ)] < ∞ for all ŷ ∈ Y almost surely.
˜∗ ˜ be two loss functions, and let Yˆ ∗ Lemma 1 Let L and L t+h,t and Yt+h,t be the optimal ˜ respectively. forecasts of Yt+h at time t under L and L, ˜ Then the “L-loss ˜ 1. Let assumptions L1, L5 and L6’ hold for L and L. probability measure”, F˜et+h,t , defined below is a proper probability distribution function for all yˆ ∈ Y. dF˜et+h,t (e; yˆ) =
Λ (e, yˆ) · dFet+h,t (e; yˆ) Et [Λ (Yt+h − yˆ, yˆ)]
where Λ (e, yˆ) ≡
∂L (y, yˆ)/∂ yˆ|y=ˆy+e ψ (ˆ y + e, yˆ) ≡ . ˜ ˜ (y, yˆ)/∂ yˆ ψ (ˆ y + e, yˆ) ∂L y=ˆ y +e
˜ 2. If we further let assumption L2’ hold, then the generalized forecast error under L ∗ ∗ ∗ ˜ ˆ ˜ ˆ ˆ evaluated at Yt+h,t , ψ(Yt+h , Yt+h,t ) = ∂ L(Yt+h , Yt+h,t )/∂ yˆ, has conditional mean ˜ zero under the L-loss probability measure. ˜ evaluated at Yˆ ∗ , is serially uncorrelated 3. The generalized forecast error under L, t+h,t ˜ under the L-loss probability measure for all lags greater than h − 1. ˜ t+h , Yˆ ∗ ) under F˜e (·; yˆ), is a nonde˜ L(Y ˜ t+h , Yˆ ∗ )], the expectation of L(Y 4. E[ t+h,t t+h,t ∗ creasing function of the forecast horizon when evaluated at yˆ = Yˆt+h,t . Proof of Lemma 1 We first need to show that dF˜et+h ≥ 0 for all possible values of e, and that dF˜et+h ,t (u; yˆ)du = 1. By assumption L5 we have Λ(e, yˆ) > 0 for all e where Λ(e, yˆ) exists. Thus Λ · dFet+h ,t is non-negative, and Et [Λ] is positive (and finite by assumption L6’), so dF˜et+h,t (e; Yˆt+h,t ) ≥ 0, if dFet+h,t (e; Yˆt+h,t ) ≥ 0. By the construction of dF˜et+h,t it is clear that it integrates to 1. ∗ To prove part 2, note that, from the optimality of Yˆt+h,t under L, 0 1 ∗ ∗ ∗ ∗ ∗ ˜t ψ˜ Yt+h , Yˆt+h,t ∝ ψ˜ Yˆt+h,t Λ e, Yˆt+h,t · dFet+h,t e; Yˆt+h,t + e, Yˆt+h,t E =
∗ ∗ ∗ ψ Yˆt+h,t · dFet+h,t e; Yˆt+h,t + e, Yˆt+h,t
= 0. ˜ t+h , Yˆ ∗ ) is also zero by the law of iterated The unconditional mean of ψ(Y t+h,t expectations.
212
Generalized forecast errors
˜ t+h , ˜ t+h , Yˆ ∗ )] = 0, from part 2, we need only show that E[ ˜ ψ(Y ˜ ψ(Y Part 3: As E[ t+h,t ∗ ˜ t+h+j , Yˆ ∗ Yˆt+h,t ) · ψ(Y t+h+j,t+j )] = 0 for j ≥ h. Again, by part 2, 0 1 ˜t ψ˜ Yt+h , Yˆ ∗ · ψ˜ Yt+h+j , Yˆ ∗ E t+h,t
t+h+j,t+j
0 0 11 ∗ ˜t+j ψ˜ Yt+h+j , Yˆ ∗ ˜t ψ˜ Yt+h , Yˆt+h,t ·E for j ≥ h =E t+h+j,t+j = 0. ˜ t+h+j , Yˆ ∗ ˜ t+h , Yˆ ∗ ) · ψ(Y ˜ ψ(Y E[ t+h,t t+h+j,t+j )] = 0 follows by the law of iterated expectations. ˜ t+h , Yˆ ∗ )] = 0 is the first order condition of ˜ For part 4 note that Et [ψ(Y t+h,t ˜ t+h , yˆ)], so E ˜t [L(Y ˜ t+h , Yˆ ∗ )] ≤ E ˜t [L(Y ˜ t+h , Yˆ ∗ ˜t [L(Y min E t+h,t t+h,t−j )] ∀j ≥ 0, and so yˆ
˜ L(Y ˜ t+h , Yˆ ∗ ˜ ˜ ˆ∗ ˜ L(Y ˜ t+h , Yˆ ∗ )] ≤ E[ E[ t+h,t t+h,t−j )] = E[L(Yt+h+j , Yt+h+j,t )] by the law of iterated expectations and the assumption of strict stationarity. Note that the assumption of strict ∗ ∗ and the change of measure, Λ˜t+h,t (e, Yˆt+h,t ), stationarity for {Xt } suffices here as Yˆt+h,t ˜ are time-invariant functions of Zt . ˜ yˆ) = (y − yˆ)2 Proof of Proposition 2 Follows from the proof of Lemma 1 setting L(y, and noting that assumption L6 satisfies L6’ for this loss function.
11
Multivariate Autocontours for Specification Testing in Multivariate GARCH Models Gloria Gonz´ alez-Rivera and Emre Yoldas
1. Introduction Even though there is an extensive literature on specification tests for univariate time series models, the development of new tests for multivariate models has been very slow. As an example, in the ARCH literature we have numerous univariate specifications for which we routinely scrutinize the standardized residuals for possible neglected dependence and deviation from the assumed conditional density. However, for multivariate GARCH models we rarely test for the assumed multivariate density and for cross-dependence in the residuals. Given the inherent difficulty of estimating multivariate GARCH models, the issue of dynamic mis-specification at the system level – as important as it may be – seems to be secondary. Though univariate specification tests can be performed in each equation of the system, these tests are not independent from each other, and an evaluation of the system will demand adjustments in the size of any joint test that combines the results of the equation-by-equation univariate tests. Bauwens, Laurent, and Rombouts (2006) survey the latest developments in multivariate GARCH models and they also acknowledge the need for further research on multivariate diagnostic tests. There are some portmanteau statistics for neglected multivariate conditional heteroskedasticity as in Ling and Li (1997), Tse and Tsui (1999), and Duchesne and Lalancette (2003). Some of these tests have unknown asymptotic distributions when applied to the generalized GARCH residuals. Tse (2002) proposes another type of mis-specification test that is based on regressions of the standardized residuals on some explanatory variables. In that Acknowledgments: We are grateful to Tim Bollerslev and an anonymous referee for helpful comments that significantly improved the presentation of the chapter.
213
214
Multivariate autocontours for specification testing
case, the usual ordinary least squares (OLS) asymptotics do not apply, but it is possible to construct some statistics that are asymptotically chi-squared distributed under the null of no dynamic mis-specification. None of these tests are concerned with the specification of the multivariate density. However, the knowledge of the density functional form is of paramount importance for density forecast evaluation, which is needed to assess the overall adequacy of the model. Recently, Bai and Chen (2008) adopted the empirical process-based testing approach of Bai (2003), which is developed in the univariate framework, to multivariate models. They use single-indexed empirical processes to make computation feasible, but this causes loss of full consistency. Kalliovirta (2007) also takes an empirical process-based approach and proposes several test statistics for checking dynamic mis-specification and density functional form. We propose a new battery of tests for dynamic specification and density functional form in multivariate time series models. We focus on the most popular models for which all the time dependence is confined to the first and second moments of the multivariate process. Multivariate dynamics in moments further than the second are difficult to find in the data and, to our knowledge, there are only a few attempts in the literature restricted to upmost bivariate systems. Our approach is not based on empirical processes, so we do not require probability integral transformations as opposed to the above mentioned studies testing for density specification. This makes dealing with parameter uncertainty relatively less challenging on theoretical grounds. When parameter estimation is required, we will adopt a quasi-maximum likelihood procedure as opposed to strict maximum likelihood, which assumes the knowledge of the true multivariate density. If the true density were known, it would be possible to construct tests for dynamic mis-specification based on the martingale difference property of the score under the null. However, if the density function is unknown, a quasi-maximum likelihood estimator is the most desirable to avoid the inconsistency of the estimator that we would have obtained under a potentially false density function. The lack of consistency may also jeopardize the asymptotic distribution of the tests. Our approach is less demanding than any score-type testing in the sense that once quasi-maximum likelihood estimates are in place, we can proceed to test different proposals on the functional form of the conditional multivariate density function. The proposed tests are based on the concept of “autocontour” introduced by Gonz´ alez-Rivera, Senyuz, and Yoldas (2007) for univariate processes. Our methodology is applicable to a wide range of models including linear and nonlinear VAR specifications with multivariate GARCH disturbances. The variable of interest is the vector of general1/2 ized innovations εt = (ε1t , ε2t , . . . , εkt ) in a model yt = μt (θ01 ) + Ht (θ02 )εt , where yt is a k × 1 vector of variables with conditional mean vector μt and conditional covariance matrix Ht . Under the null hypothesis of correct dynamic specification the vector εt must be i.i.d. with a certain parametric multivariate probability density function f (.). Thus, if we consider the joint distribution of two vectors εt and εt−l , then under the null we have f (εt , εt−l ) = f (εt )f (εt−l ). 
The basic idea of the proposed tests is to calculate the percentage of observations contained within the probability autocontour planes corresponding to the assumed multivariate density of the vector of independent innovations, i.e. f (εt )f (εt−l ), and to statistically compare it to the population percentage. We develop a battery of t-tests based on a single autocontour and also more powerful chi-squared tests based on multiple autocontours, which have standard asymptotic distributions. Without parameter uncertainty the test statistics are all distribution free, but under parameter
2 Testing methodology
215
uncertainty there are nuisance parameters affecting the asymptotic distributions. We show that a simple bootstrap procedure overcomes this problem and yields the correct size even for moderate sample sizes. We also investigate the power properties of the test statistics in finite samples. As the null is a joint hypothesis, the rejection of the null begs the question of what is at fault. Thus, it is desirable to separate i.i.d-ness from density function. In the spirit of goodness-of-fit tests, we also propose an additional test that focuses on the multivariate density functional form of the vector of innovations. Following a similar approach, we construct the probability contours corresponding to the hypothesized multivariate density, f (εt ), and compare the sample percentage of observations falling within the contour to the population percentage. The goodness-of-fit tests are also constructed as t-statistics and chi-squared statistics with standard distributions. The organization of this chapter is as follows. In Section 2, we describe the battery of tests, which follow from Gonz´ alez-Rivera, Senyuz, Yoldas (2007), and the construction of the multivariate contours and autocontours. In Section 3, we offer some Monte Carlo simulation to assess the size and power of the tests in finite samples. In Section 4, we apply the tests to the generalized residuals of GARCH models with hypothesized multivariate Normal and multivariate Student-t innovations fitted to excess returns on five size portfolios. In Section 5, we conclude.
2. Testing methodology

2.1. Test statistics

Let yt = (y1t, . . . , ykt)′ and suppose that yt evolves according to the following process

yt = μt(θ01) + Ht^{1/2}(θ02) εt,    t = 1, . . . , T,    (1)
where μt(·) and Ht^{1/2}(·) are both measurable with respect to the time t − 1 sigma field, ℱt−1, Ht(·) is positive definite, and {εt} is an i.i.d. vector process with zero mean and identity covariance matrix. The conditional mean vector, μt(·), and the conditional covariance matrix, Ht(·), are fully parameterized by the parameter vector θ0 = (θ01′, θ02′)′, which for now we assume to be known; later on we will relax this assumption to account for parameter uncertainty. If all the dependence is contained in the first and second conditional moments of the process yt, then the null hypothesis of interest to test for model mis-specification is H0: εt is i.i.d. with density f(·). The alternative hypothesis is the negation of the null. Though we wish to capture all the dynamic dependence of yt through the modeling of the conditional mean and conditional covariance matrix, there may be another degree of dependence that is built into the assumed multivariate density, f(·). In fact, once we move beyond the assumption of multivariate normality, for instance when we assume a multivariate Student-t distribution, the components of the vector εt are dependent among themselves and this information is only contained within the functional form of the density. This is why,
among other reasons, it is of interest to incorporate the assumed density function in the null hypothesis.
Let us consider the joint distribution of two k × 1 vectors εt and εt−l, l = 1, . . . , L < ∞. Define a 2k × 1 vector ηt = (εt′, εt−l′)′ and let ψ(·) denote the associated density function. Under the null hypothesis of i.i.d. and correct probability density function, we can write ψ(ηt) = f(εt)f(εt−l). Then, under the null, we define the α-autocontour, Cl,α, as the set of vectors (εt′, εt−l′)′ that results from slicing the multivariate density, ψ(·), at a certain value to guarantee that the set contains α% of observations, that is,

Cl,α = { S(ηt) ⊂ R^{2k} : ∫_{h1}^{g1} · · · ∫_{h2k}^{g2k} ψ(ηt) dη1t . . . dη2k,t ≤ α },    (2)
where the limits of integration are determined by the density functional form so that the shape of the probability contours is preserved under integration, e.g. when the assumed density is normal, then the autocontours are 2k-spheres (a circle when k = 1). We construct an indicator process defined as

It^{l,α} = 1 if ηt ∉ Cl,α, and It^{l,α} = 0 otherwise.    (3)

The process {It^{l,α}} forms the building block of the proposed test statistics. Let pα ≡ 1 − α. As the indicator is a Bernoulli random variable, its mean and variance are given by E[It^{l,α}] = pα and Var(It^{l,α}) = pα(1 − pα). Although {εt} is an i.i.d. process, {It^{l,α}} exhibits some linear dependence because It^{l,α} and It−l^{l,α} share common information contained in εt−l. Hence, the autocovariance function of {It^{l,α}} is given by

γh^α = P(It^{l,α} = 1, It−h^{l,α} = 1) − pα²  if h = l,  and γh^α = 0 otherwise.

Proposition 1. Define p̂α^l = (T − l)^{−1} Σ_{t=1}^{T−l} It^{l,α}. Under the null hypothesis,

tl,α = √(T − l) (p̂α^l − pα) / σl,α →d N(0, 1),    (4)

where σl,α² = pα(1 − pα) + 2γl^α.
Proof. See González-Rivera, Senyuz, and Yoldas (2007) for all mathematical proofs.

Now let us consider a finite number of contours, (α1, . . . , αn), jointly. Let pα = (pα1, . . . , pαn)′, where pαi = 1 − αi, and define p̂αi^l = (T − l)^{−1} Σ_{t=1}^{T−l} It^{l,αi} for i = 1, . . . , n. We then collect all the p̂αi^l's in an n × 1 vector, p̂α^l = (p̂1, . . . , p̂n)′.

Proposition 2. Under the null hypothesis,

√(T − l) (p̂α^l − pα) →d N(0, Ξ),

where the elements of Ξ are ξij = min(pαi, pαj) − pαi pαj + Cov(It^{l,αi}, It−l^{l,αj}) + Cov(It^{l,αj}, It−l^{l,αi}).

Then, it directly follows that

J_n^l = (T − l)(p̂α^l − pα)′ Ξ^{−1} (p̂α^l − pα) →d χ²(n).    (5)
A complementary test to those described above can be constructed in the spirit of goodness-of-fit. Suppose that we consider only the vector εt and we wish to test in the direction of density functional form. We construct the probability contour sets Cα corresponding to the probability density function that is assumed under the null hypothesis. The set is given by

Cα = { S(εt) ⊂ R^k : ∫_{h1}^{g1} · · · ∫_{hk}^{gk} f(εt) dε1t . . . dεkt ≤ α }.    (6)
Then, as before, we construct an indicator process as follows

It^α = 1 if εt ∉ Cα, and It^α = 0 otherwise,    (7)

for which the mean and variance are E[It^α] = 1 − α and Var(It^α) = α(1 − α), respectively. The main difference between the sets Cl,α and Cα is that the latter does not explicitly consider the time-independence assumed under the null and, therefore, the following tests based on Cα will be less powerful against departures from independence. There is also a difference in the properties of the indicator process. Now, the indicator is also an i.i.d. process, and the analogous tests to those of Propositions 1 and 2 will have a simpler asymptotic distribution.
Let pα = 1 − α and define an estimator of pα as p̃α = T^{−1} Σ_{t=1}^{T} It^α. Under the null hypothesis the distribution of the analogue test statistic to that of Proposition 1 is

tα = √T (p̃α − pα) / √(pα(1 − pα)) →d N(0, 1).

If, as in Proposition 2, we now jointly consider a finite number of contours and define the vectors pα = (pα1, . . . , pαn)′ and p̃α = (p̃α1, . . . , p̃αn)′, where pαi = 1 − αi and p̃αi = T^{−1} Σ_{t=1}^{T} It^{αi}, then √T (p̃α − pα) →d N(0, Ξ), where the elements of Ξ simplify to ξij = min(pαi, pαj) − pαi pαj, and it follows that

J̃n = T (p̃α − pα)′ Ξ^{−1} (p̃α − pα) →d χ²(n).

Note that to make these tests operational we replace the covariance terms by their sample counterparts. Furthermore, the asymptotic normality results established above still hold under parameter uncertainty, as shown by González-Rivera, Senyuz, and Yoldas (2007). However, one needs to deal with nuisance parameters in the asymptotic covariance matrices to make the statistics operational. They suggest using a parametric bootstrap procedure, which imposes all restrictions of the null hypothesis, to estimate the asymptotic covariance matrices under parameter uncertainty. Specifically, after the model is estimated, bootstrap samples are generated by using the estimated model as the data generating process, with innovation vectors drawn from the hypothesized parametric distribution. Their Monte Carlo simulations indicate that this approach provides satisfactory results. Hence, in this chapter we take the same approach in our applications.
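To illustrate how the parametric bootstrap is used, the following minimal sketch (our own illustration, not the authors' code; the function names and the simple location-scale design are hypothetical) re-estimates the model on each simulated sample and recomputes the vector of hit percentages, whose bootstrap covariance then replaces the unknown asymptotic covariance matrix. It assumes multivariate normal innovations under the null.

import numpy as np

def hit_percentages(eps, thresholds, lag):
    """Sample proportions of (eps_t, eps_{t-l}) pairs falling outside each autocontour.
    eps: (T, k) residuals; thresholds: autocontour cutoffs d_alpha (e.g. chi-squared quantiles)."""
    x = np.sum(eps[lag:] ** 2, axis=1) + np.sum(eps[:-lag] ** 2, axis=1)
    return np.array([np.mean(x > d) for d in thresholds])

def bootstrap_covariance(mu_hat, H_chol, thresholds, lag, T, n_boot=500, seed=None):
    """Parametric bootstrap: simulate from the estimated model with N(0, I_k) innovations,
    re-estimate location and scale, and recompute the hit percentages each time."""
    rng = np.random.default_rng(seed)
    k = len(mu_hat)
    stats = np.empty((n_boot, len(thresholds)))
    for b in range(n_boot):
        eps = rng.standard_normal((T, k))
        y = mu_hat + eps @ H_chol.T                        # simulated data under the null
        mu_b = y.mean(axis=0)                              # re-estimated location ...
        H_b = np.linalg.cholesky(np.cov(y, rowvar=False))  # ... and Cholesky scale factor
        eps_b = np.linalg.solve(H_b, (y - mu_b).T).T       # standardized residuals
        stats[b] = hit_percentages(eps_b, thresholds, lag)
    return np.cov(stats, rowvar=False)                     # bootstrap estimate of Var(p_hat)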
2.2. Multivariate contours and autocontours

2.2.1. Multivariate normal distribution

In this case the density function is f(εt) = (2π)^{−k/2} exp(−0.5 εt′εt). Let f̄α denote the value of the density such that the corresponding probability contour contains α% of the observations. Then the equation describing this contour is

qα = εt′εt ≡ ε²1t + ε²2t + · · · + ε²kt,

where qα = −2 ln(f̄α × (2π)^{k/2}). Hence, the Cα contour set is defined as follows

Cα = { S(εt) ⊂ R^k : ∫_{−g1}^{g1} · · · ∫_{−gk}^{gk} (2π)^{−k/2} exp(−0.5 εt′εt) dε1t . . . dεkt ≤ α },

where g1 = √qλ, gi = √(qλ − Σ_{j=1}^{i−1} ε²jt) for i = 2, . . . , k, and λ ≤ α. We need to determine the mapping qα in order to construct the indicator process. Let xt = εt′εt; then xt ∼ χ²(k) and we have qα ≡ inf{q : Fxt(q) ≥ α}, where Fxt is the cumulative distribution function of a chi-squared random variable with k degrees of freedom. As a result, the indicator series is obtained as follows

It^α = 1 if εt′εt > qα, and It^α = 0 otherwise.

To construct the autocontour Cl,α, we consider the joint distribution of εt and εt−l. Let ηt = (εt′, εt−l′)′; then the density of interest is given by ψ(ηt) = (2π)^{−k} exp(−0.5 ηt′ηt). Hence, the autocontour equation is given by

dα = ηt′ηt ≡ η²1t + · · · + η²2k,t,

where dα = −2 ln(ψ̄α × (2π)^k). Following the same arguments as above, the corresponding indicator process is

It^{l,α} = 1 if ηt′ηt > dα, and It^{l,α} = 0 otherwise,

where dα ≡ inf{d : Fxt(d) ≥ α}, xt = ηt′ηt, and Fxt is the cumulative distribution function of a chi-squared random variable with 2k degrees of freedom.
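Under multivariate normality everything needed for the tests reduces to chi-squared quantiles and squared norms, so a single autocontour statistic can be computed in a few lines. The sketch below is ours (names are illustrative) and treats the parameters as known, so no bootstrap correction is applied.

import numpy as np
from scipy import stats

def autocontour_t_stat(eps, alpha, lag):
    """t-statistic of Proposition 1 for one autocontour under multivariate normality.
    eps: (T, k) array of innovations that are i.i.d. N(0, I_k) under the null."""
    T, k = eps.shape
    d_alpha = stats.chi2.ppf(alpha, df=2 * k)             # slice of the 2k-dimensional density
    x = np.sum(eps[lag:] ** 2, axis=1) + np.sum(eps[:-lag] ** 2, axis=1)
    I = (x > d_alpha).astype(float)                        # indicator I_t^{l,alpha}
    p_alpha = 1.0 - alpha
    p_hat = I.mean()
    gamma_l = np.mean(I[lag:] * I[:-lag]) - p_hat ** 2     # sample autocovariance at lag l
    var = p_alpha * (1 - p_alpha) + 2 * gamma_l
    return np.sqrt(T - lag) * (p_hat - p_alpha) / np.sqrt(var)

# toy illustration under the null: the statistic is approximately N(0, 1)
rng = np.random.default_rng(0)
print(autocontour_t_stat(rng.standard_normal((1000, 2)), alpha=0.90, lag=1))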
2.2.2. Student-t distribution

The multivariate density function is

f(εt) = G(k, v) [1 + εt′εt/(v − 2)]^{−(k+v)/2},

where G(k, v) = Γ[(v + k)/2] / {[π(v − 2)]^{0.5k} Γ(v/2)}. Then the equation for the α-probability contour is qα = 1 + εt′εt/(v − 2), where qα = [f̄α/G(k, v)]^{(k+v)/2}. As a result, the Cα contour set is defined as

Cα = { S(εt) ⊂ R^k : ∫_{−g1}^{g1} · · · ∫_{−gk}^{gk} G(k, v)[1 + εt′εt/(v − 2)]^{−(k+v)/2} dε1t . . . dεkt ≤ α },
where g1 = √((qλ − 1)(v − 2)), gi = √((qλ − 1)(v − 2) − Σ_{j=1}^{i−1} ε²jt) for i = 2, . . . , k, and λ ≤ α. Now let xt = 1 + εt′εt/(v − 2); then xt ≡ 1 + (k/v)wt, where wt has an F-distribution with (k, v) degrees of freedom. Consequently, we have qα ≡ inf{q : Fwt[v(q − 1)/k] ≥ α}. Then the indicator series is defined as

It^α = 1 if 1 + εt′εt/(v − 2) > qα, and It^α = 0 otherwise.

To construct the autocontour Cl,α, we consider the joint distribution of εt and εt−l under the null hypothesis, which is

ψ(εt, εt−l) = G(k, v)² [(1 + εt′εt/(v − 2))(1 + εt−l′εt−l/(v − 2))]^{−(k+v)/2}.

Then, the equation for the α-probability autocontour is given by

dα = 1 + (εt′εt + εt−l′εt−l)/(v − 2) + (εt′εt)(εt−l′εt−l)/(v − 2)².

Let xt = 1 + (εt′εt + εt−l′εt−l)/(v − 2) + (εt′εt)(εt−l′εt−l)/(v − 2)²; then we have xt = 1 + (k/v) × [(w1t + w2t) + (k/v)(w1t w2t)], where w1t and w2t are independent random variables, each with an F-distribution with (k, v) degrees of freedom. Similar to the previous case, we have dα ≡ inf{d : Fxt(d) ≥ α}, but we do not have readily available results for the quantiles of xt as before. A plausible solution is to use Monte Carlo simulation to approximate the quantiles of interest, as we already know that xt is a specific function of two independent F-distributed random variables.
As an illustration, we provide sample contour and autocontour plots under Normal and Student-t (with v = 5) distributions in Figure 11.1. Due to the graphical constraints imposed by high dimensionality, we consider k = 2 and k = 1 for Cα and Cl,α, respectively. Note that while Cα and Cl,α are of identical shape under normality, as the product of two independent normal densities yields a bivariate normal density, this is not the case under the Student-t distribution.
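The Monte Carlo approximation of the Student-t thresholds suggested above can be implemented directly, since xt is a known function of two independent F variates. A minimal sketch (ours; the simulation size is an arbitrary choice):

import numpy as np
from scipy import stats

def student_t_autocontour_threshold(alpha, k, v, n_sim=500_000, seed=0):
    """Approximate d_alpha = inf{d : F_x(d) >= alpha}, where
    x = 1 + (k/v) * [(w1 + w2) + (k/v) * w1 * w2] and w1, w2 ~ F(k, v) independently."""
    rng = np.random.default_rng(seed)
    w1 = stats.f.rvs(k, v, size=n_sim, random_state=rng)
    w2 = stats.f.rvs(k, v, size=n_sim, random_state=rng)
    x = 1.0 + (k / v) * ((w1 + w2) + (k / v) * w1 * w2)
    return np.quantile(x, alpha)

# e.g. the 90% autocontour threshold for a bivariate Student-t with v = 5
print(student_t_autocontour_threshold(0.90, k=2, v=5))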
3. Monte Carlo simulations

We investigate the size and power properties of the proposed tests in finite samples by Monte Carlo simulations for two cases: when the parameters of the model are known and when they are unknown and need to be estimated.
3.1. Size simulations

For the size experiments we consider two alternative distributions for the innovation process: a multivariate Normal, εt ∼ i.i.d. N(0, Ik), and a multivariate Student-t with 5 degrees of freedom, εt ∼ i.i.d. t(0, Ik, 5). Under parameter uncertainty, we consider a simple multivariate location-scale model: yt = μ + H^{1/2}εt, where we set μ = 0 and H = Ik. We consider both distributions under parameter uncertainty and apply the tests to the estimated standardized residual vector, ε̂t = Ĥ^{−1/2}(yt − μ̂), where we obtain Ĥ^{1/2} by
Fig. 11.1. Contour and autocontour plots under Normal and Student-t distributions. (Top panels: Cα under bivariate Normal and Student-t distributions, α ∈ {0.5, 0.7, 0.9, 0.99}, plotted in the (ε1t, ε2t) plane; bottom panels: Cl,α under bivariate Normal and Student-t distributions, α ∈ {0.5, 0.7, 0.9, 0.99}, plotted in the (ε1t, ε1,t−1) plane.)
using the Cholesky decomposition.1 The asymptotic variance of the tests is obtained by the simple parametric bootstrap procedure outlined above (see Section 2.1). The number of Monte Carlo replications is equal to 1,000, and the number of bootstrap replications is set to 500. We consider 13 autocontours (n = 13) with coverage levels (%): 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, and 99, spanning the entire density function.2 We start with a sample size of 250 and consider increments of 250 up to 2,000 observations. In all experiments, the nominal size is 5%.

1 Alternative decompositions can be used to calculate the square-root matrix. We conjecture that the choice of the decomposition technique is not critical for the application of our tests.
2 Our choice of the contour coverage levels is motivated by the need to cover the entire range of the density, from the tails to the very center, as we do not have a theoretical result indicating the optimal choice of the number of contours to guide our practice. The flexibility of our approach permits considering different types of coverage levels depending on the purpose of application, e.g. concentrating on tails for risk models. Note also that the Monte Carlo results presented below provide guidance as to how far one can go in the tails and the center of the density without losing precision in finite samples. Additional Monte Carlo simulations, not reported here to save space, also indicate that the size and power results are robust to the number of contours as long as the range considered is identical, i.e. a finer grid does not change the results.
3 Monte Carlo simulations Table 11.1(a). 1 J13
T
2 J13
221 Size of the Jnl -statistics 3 J13
4 J13
5 J13
Panel a: Normal (k = 2) 250 500 1000 2000
11.3 6.5 6.8 6.4
11.3 6.0 5.0 5.1
11.6 5.8 6.2 5.7
12.7 9.2 6.3 5.3
11.8 8.4 7.1 5.6
Table 11.1(b). T
1 J13
11.5 6.9 5.5 5.3
8.8 11.8 5.9 8.0 5.3 4.9 4.1 4.8
8.1 7.5 8.1 5.7
250 500 1000 2000
10.5 7.7 5.9 8.0
3 J13
4 J13
5 J13
10.5 7.5 7.2 7.2
11.0 5.8 5.2 5.8
10.5 5.9 5.1 5.5
12.3 7.0 5.4 6.4
9.4 6.2 6.0 6.4
Panel b: Student-t (k = 5)
14.0 12.9 7.6 8.3 6.0 6.4 3.4 6.5
10.4 7.3 5.9 6.9
11.7 6.6 4.8 4.8
12.3 7.3 6.6 5.7
10.3 7.9 5.7 5.5
11.6 8.1 7.8 5.4
Size of the Jnl -statistics under parameter uncertainty
2 J13
3 J13
4 J13
5 J13
Panel a: Normal (k = 2) 250 500 1000 2000
2 J13
Panel b: Student-t (k = 2)
Panel a: Normal (k = 5) 250 500 1000 2000
1 J13
6.1 7.3 5.9 5.8 5.8 8.0 5.4 7.7 Panel a: Normal 9.3 6.9 6.1 8.0
7.7 6.3 7.1 7.4
7.5 7.3 7.3 6.4 (k = 5) 9.2 6.9 5.5 6.8
1 J13
2 J13
3 J13
4 J13
5 J13
Panel b: Student-t (k = 2) 6.9 7.4 6.6 4.8
6.8 6.4 7.8 7.5 6.7 8.3 8.5 6.9 8.8 6.2 7.6 7.6 Panel b: Student-t
8.1 7.6 5.5 7.1
7.1 6.8 6.4 7.0
7.3 5.5 5.7 6.5
6.3 6.0 6.8 7.3
6.5 8.0 8.3 6.4 (k = 5) 7.2 6.9 7.5 6.3
6.0 8.1 7.6 7.0
6.3 6.4 6.6 7.9
In Tables 11.1(a) and 11.1(b) we present the simulated size results for the Jnl -statistics. We consider a system of two equations (k = 2) and a system of five equations (k = 5). For a small sample of 250 observations, the Jnl -statistics are oversized for both densities and both systems. However, under parameter uncertainty, the bootstrap procedure seems to correct to some extent the oversize behavior. For samples of 1,000 and more observations, the simulated size is within an acceptable range of values. There are no major differences between the results for the small versus the large systems of equations indicating that the dimensionality of the system is not an issue for the implementation of these tests. In Tables 11.2(a) and 11.2(b) we show the simulated size for the J˜n -statistics, which should be understood primarily as goodness-of-fit tests as they do not explicitly take into account the independence of the innovations over time. The sizes reported in Table 11.2(a)
Table 11.2(a). Size of the J̃n-statistics (n = 13)

               Normal               Student-t
T            k = 2    k = 5       k = 2    k = 5
250           5.7      6.3         4.3      6.6
500           4.9      5.3         3.1      5.1
1000          5.7      5.7         5.6      5.3
2000          5.6      6.2         4.9      5.6

Table 11.2(b). Size of the J̃n-statistics (n = 13) under parameter uncertainty

               Normal               Student-t
T            k = 2    k = 5       k = 2    k = 5
250           6.9      9.1         7.3      6.8
500           7.0      6.1         6.8      6.7
1000          6.7      5.5         6.7      5.6
2000          6.4      7.4         6.8      5.7
are very good, though those in Table 11.2(b) tend to be slightly larger than 5% mainly for small samples. However, when we consider the tests with individual contours (see Table 11.3 below), the size distortion tends to disappear. For the t-tests, which are based on individual contours, the simulated sizes are very good. In Table 11.3, we report these results for the case of parameter uncertainty. The major size distortions occur for small samples at the extreme contour t13 (99% coverage), but this is not very surprising as we do not expect enough variation in the indicator series for small samples.
3.2. Power simulations

We investigate the power of the tests by generating data from a system with two equations that follows three different stochastic processes. We maintain the null hypothesis as yt = μ + H^{1/2}εt, where εt ∼ i.i.d. N(0, Ik), and consider the following DGPs:

DGP 1: yt = μ + H^{1/2}εt, where εt ∼ i.i.d. t(0, I2, 5), μ = 0, and H = I2. In this case, we maintain the independence hypothesis and analyze departures from the hypothesized density function by generating i.i.d. observations from a multivariate Student-t distribution with 5 degrees of freedom.

DGP 2: yt = Ayt−1 + H^{1/2}εt, where εt ∼ i.i.d. N(0, I2), a11 = 0.7, a12 = 0.1, a21 = 0.03, a22 = 0.85, and H = I2. In this case, we maintain the same density function as that of the null hypothesis and analyze departures from the independence assumption by considering a linear VAR(1).
Table 11.3.
Size of the t-statistics under parameter uncertainty
T
t2
t1
t3
t4
t5
t6
t7
t8
t9
t10
t11
t12
t13
(k = 2) 4.9 5.2 5.1 5.3 5.7 5.7 6.2 4.8
4.6 5.2 4.6 5.9
6.0 5.1 5.9 4.3
4.8 4.7 7.6 6.4
2.0 6.4 3.7 4.9
(k = 5) 5.8 5.5 6.4 6.5 3.3 4.6 6.0 4.6
5.1 4.3 5.3 5.5
6.1 6.3 6.0 5.5
6.7 6.0 4.7 4.4
2.1 6.3 3.9 6.5
250 500 1000 2000
5.0 4.3 4.7 5.4
4.6 4.2 4.2 3.9
5.2 5.3 5.2 5.1
5.1 5.4 5.8 4.0
Panel a: Normal 6.5 6.7 5.7 4.1 4.6 4.5 5.4 5.5 5.2 5.0 5.3 5.3
250 500 1000 2000
4.5 4.1 3.8 4.5
6.2 4.8 5.3 5.3
5.3 5.8 5.7 5.0
5.0 4.8 5.3 5.0
Panel b: Normal 4.5 5.2 5.3 6.0 5.6 5.3 4.9 5.2 3.8 4.6 4.1 5.4
250 500 1000 2000
4.5 4.5 4.3 5.7
5.1 6.1 5.9 5.0
5.3 5.9 6.4 5.2
4.9 4.8 5.8 5.4
4.9 4.5 5.7 5.5
6.0 4.2 5.5 4.7
4.5 4.2 5.9 5.5
5.4 5.3 5.8 5.0
5.7 6.1 5.5 4.9
4.3 5.9 6.0 5.2
8.7 4.9 6.3 4.8
4.6 4.9 5.5 5.2
Panel 5.8 4.9 4.7 5.2
d: Student-t (k = 5) 6.0 7.6 6.7 7.0 6.6 5.8 7.1 7.7 6.2 5.8 5.3 5.2 5.0 5.3 4.4 5.3
6.6 6.5 6.0 6.1
5.8 5.4 5.2 5.0
4.1 5.0 4.7 5.1
8.4 5.9 3.7 3.8
Panel c: Student-t (k = 2)
250 500 1000 2000
4.5 4.6 3.4 5.1
5.5 5.4 4.2 5.6
4.8 6.4 4.9 5.3
4.8 4.9 6.6 5.4
4.6 5.3 6.4 5.9
DGP 3: yt = Ht^{1/2}εt, εt ∼ i.i.d. N(0, I2), with Ht = C + A′yt−1 y′t−1 A + G′Ht−1 G and parameter values A = 0.1^{1/2} × I2, G = 0.85^{1/2} × I2, and C = V − A′VA − G′VG, where V is the unconditional covariance matrix with v11 = v22 = 1 and v12 = 0.5. In this case, we analyze departures from both independence and density functional form by generating data from a system with multivariate conditional heteroskedasticity.
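As an illustration of how data under the alternative can be generated, the following sketch (ours, with hypothetical names) simulates DGP 3; with scalar matrices A and G the recursion for Ht only involves the scalars a² = 0.1 and g² = 0.85 and the variance-targeting intercept.

import numpy as np

def simulate_dgp3(T, a=np.sqrt(0.1), g=np.sqrt(0.85), seed=0):
    """Simulate y_t = H_t^{1/2} eps_t with H_t = C + a^2 y_{t-1} y_{t-1}' + g^2 H_{t-1},
    where A = a*I_2, G = g*I_2 and C is set by variance targeting."""
    rng = np.random.default_rng(seed)
    V = np.array([[1.0, 0.5], [0.5, 1.0]])                # unconditional covariance matrix
    C = V - a**2 * V - g**2 * V                           # C = V - A'VA - G'VG for scalar A, G
    H = V.copy()
    y = np.zeros((T, 2))
    y_prev = np.zeros(2)
    for t in range(T):
        H = C + a**2 * np.outer(y_prev, y_prev) + g**2 * H
        L = np.linalg.cholesky(H)                         # square root of H_t
        y[t] = L @ rng.standard_normal(2)
        y_prev = y[t]
    return y

y = simulate_dgp3(1000)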
In Table 11.4 we report the power of the Jnl -statistic. The test is the most powerful to detect departures from density functional form (DGP 1) as the rejection rates are almost 100% even in small samples. For departures from independence, the test has better power to detect dependence in the conditional mean (DGP 2) than in the conditional variance (DGP 3). As expected, in the case of the VAR(1) model (DGP 2), the power decreases as l becomes larger indicating first order linear dependence. The power is also very good (69%) for small samples of 250 observations. In the case of the GARCH model (DGP 3), the rejection rate reaches 60% for sample sizes of 500 observations and above. As expected, in Table 11.5 we observe that the goodness-of-fit test, J˜n , has the largest power for DGP 1 and it is not very powerful for DGP 2. It has reasonable power against DGP 3 mainly for samples of 1,000 observations and above. We find a similar message in Table 11.6 when we analyze the power of the t-statistics. The tests are the most powerful to detect DGP 1, the least powerful to detect DGP 2,
Table 11.4. Power of the J_n^l-statistics under parameter uncertainty
1 J13
T 250 500 1000 2000
98.6 100.0 100.0 100.0
2 J13
4 J13
5 J13
98.2 100.0 100.0 100.0
Panel a: DGP 1 98.6 100.0 100.0 100.0
97.8 100.0 100.0 100.0
98.3 100.0 100.0 100.0
Panel b: DGP 2 26.6 38.1 58.0 83.7
19.3 27.9 39.2 59.8
16.5 20.4 28.9 40.6
31.9 61.4 86.9 98.9
31.9 60.3 86.7 99.2
250 500 1000 2000
68.9 93.6 99.9 100.0
40.2 60.0 84.8 99.4
250 500 1000 2000
35.5 62.8 90.5 99.4
36.0 61.6 88.8 99.6
3 J13
Panel c: DGP 3 32.9 60.5 88.1 99.7
and acceptable power against DGP 3 for samples of 1,000 observations and above. There is a substantial drop in power for the t11 test (90% contour) for the cases of DGP 1 and DGP 3. This behavior is similar to that encountered in the univariate tests of Gonz´ alezRivera, Senyuz, and Yoldas (2007). This is a result due to the specific density under the null. In the case of DGP 1, for some contour coverage levels the normal density and the Student-t are very similar. Hence it is very difficult for any test to discriminate the null from the alternative with respect to the coverage level of those contour planes. A similar argument applies to DGP 3 as well, as the GARCH structure in the conditional covariance matrix is associated with a non-normal unconditional density.
4. Empirical applications

In this section we apply the proposed testing methodology to the generalized residuals of multivariate GARCH models fitted to US stock return data. Our data set consists of

Table 11.5. Power of the J̃n-statistics (n = 13) under parameter uncertainty

T         DGP 1     DGP 2     DGP 3
250        99.1      12.4      19.7
500       100.0      12.1      44.5
1000      100.0      12.9      70.2
2000      100.0      14.2      94.7
Table 11.6.
Power of the t-statistics under parameter uncertainty
T
t2
250 500 1000 2000 250 500 1000 2000 250 500 1000 2000
t1
t3
t4
t5
23.1 55.3 76.6 91.8 96.1 32.3 80.6 95.3 99.5 100.0 49.7 97.4 99.9 100.0 100.0 75.4 99.9 100.0 100.0 100.0 3.3 3.6 5.1 4.4
4.7 5.6 6.4 6.7
t6
t7
t8
t9
t10
t11
t12
t13
Panel a: DGP 1 97.7 98.0 96.6 89.9 59.6 8.5 33.7 85.2 100.0 100.0 100.0 99.4 85.6 8.6 57.8 98.5 100.0 100.0 100.0 100.0 98.9 14.0 78.7 100.0 100.0 100.0 100.0 100.0 100.0 16.2 94.9 100.0
Panel 8.4 11.2 11.1 12.4 7.6 11.5 12.8 11.5 8.4 11.2 13.5 14.0 9.2 10.8 13.3 15.3
Panel 5.6 7.2 10.7 12.8 15.3 17.6 7.2 11.9 17.7 25.5 33.4 38.3 8.1 20.5 31.4 46.3 58.6 64.3 13.5 35.3 56.8 77.7 86.7 91.5
b: DGP 2 13.4 11.0 11.8 11.0 11.7 11.9 14.6 11.6
7.3 8.9 9.6 9.5
6.7 7.0 7.1 8.7
9.7 11.6 3.5 7.2 10.9 13.1 7.9 11.9 13.2 8.7 12.3 14.0
c: DGP 3 18.5 18.7 14.6 8.3 41.5 41.1 32.6 15.6 68.7 67.1 59.1 32.1 92.8 91.8 85.4 54.7
6.3 9.0 17.0 5.3 20.0 48.0 8.6 34.8 70.4 9.5 60.0 93.5
daily excess returns on five size portfolios, i.e. portfolios sorted with respect to market capitalization in an increasing order.3 The sample period runs from January 2, 1996 to December 29, 2006, providing a total of 2,770 observations. A plot of the data is provided in Figure 11.2. As we are working with daily data, we assume a constant conditional mean vector.
In terms of the multivariate GARCH specifications, we consider two popular alternatives: the BEKK model of Engle and Kroner (1995) and the DCC model of Engle (2002a). Define ut = yt − μ, where μ is the constant conditional mean vector. Then the BEKK(1, 1, K) specification for the conditional covariance matrix, Ht ≡ E[ut u′t | ℱt−1], is given by

Ht = C′C + Σ_{j=1}^{K} A′j ut−1 u′t−1 Aj + Σ_{j=1}^{K} G′j Ht−1 Gj.    (9)
In our applications we set K = 1 and use the scalar version of the model due to parsimony considerations, where A = αIk, G = βIk, and α and β are scalars. We also use variance targeting to facilitate estimation, i.e. we set C′C = V − A′VA − G′VG, where V = E[ut u′t], e.g. Ding and Engle (2001).
In the DCC specification, conditional variances and conditional correlations are modeled separately. Specifically, consider the following decomposition of the conditional covariance matrix: Ht = Dt Rt Dt, where Dt = diag{h11,t^{1/2}, . . . , hkk,t^{1/2}}, and each element of Dt is modeled as an individual GARCH process. In our applications, we consider the

3 Data is obtained from Kenneth French's website: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french We are grateful to him for making this data publicly available.
Fig. 11.2. Daily excess returns on five size portfolios (1/2/1996–12/29/2006) (From the smallest quintile portfolio to the largest quintile portfolio)
standard GARCH(1,1) process:

hii,t = ωi + αi u²i,t−1 + βi hii,t−1,    i = 1, . . . , k.

Now define zt = Dt^{−1} ut; then Rt = diag{Qt}^{−1/2} Qt diag{Qt}^{−1/2}, where

Qt = (1 − α − β)Q̄ + α zt−1 z′t−1 + β Qt−1,    (10)

and Q̄ = E[zt z′t].
Under both BEKK and DCC specifications, we consider two alternative distributional assumptions that are most commonly used in empirical applications involving multivariate GARCH models: multivariate Normal and multivariate Student-t distributions. Under multivariate normality, the sample log-likelihood function, up to a constant, is given by
LT(θ) = −(1/2) Σ_{t=1}^{T} ln[det(Ht)] − (1/2) Σ_{t=1}^{T} u′t Ht^{−1} ut.    (11)
In the case of the DCC model, a two-step estimation procedure is applicable under normality as one can write the total likelihood as the sum of two parts, where the former depends on the individual GARCH parameters and the latter on the correlation parameters. Under this estimation strategy, consistency is still guaranteed to hold. For further details on two-step estimation in the DCC model, the interested reader is referred to Engle (2002a), and Engle and Sheppard (2001). Under the assumption of multivariate Studentt distribution, we do not need to estimate the model with the corresponding likelihood as the estimates obtained under normality are consistent due to quasi-maximum likelihood interpretation. Therefore, we obtain the standardized residual vectors under normality and then simply test the Student-t assumption on these residuals.4 One remaining issue in the case of Student-t distribution is the choice of the degrees of freedom. We follow Pesaran and Zaffaroni (2008) and obtain estimates of the degrees of freedom parameters for all series separately and then consider an average of the individual estimates for the distributional specification in the multivariate model. The results are summarized in Figures 11.3 through 11.6 and Table 11.7. From the figures we observe that under both GARCH specifications, the Jnl -statistics are highly statistically significant when multivariate normality is the maintained distributional assumption. The Jnl -statistics of the BEKK model are larger than those obtained under the DCC specification. Furthermore, there is an obvious pattern in the behavior of the statistics as a function of the lag order, especially under the BEKK specification. This indicates that the rejection is partly due to remaining dependence in the model residuals. When we switch to the multivariate Student-t distribution with 11 degrees of freedom,5 the Jnl -statistics go down substantially under both multivariate GARCH specifications. Hence, we can argue that the distributional assumption plays a greater role in the rejection of both models under normality. The Jnl -statistics are barely significant 4 Note that in the specification of the multivariate Student-t distribution (see Section 2), the covariance matrix is already scaled to be an identity matrix, thus no re-scaling of residuals is necessary to implement the test, e.g. Harvey, Ruiz and Sentana (1992). 5 This value is obtained by averaging individual degrees of freedom estimates obtained from individual GARCH models under Student-t density.
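For concreteness, the following sketch (ours, not the authors' estimation code; names are illustrative) shows how the DCC conditional covariance matrices in (10) and the Gaussian quasi log-likelihood in (11) can be assembled once the univariate GARCH variances are available, using the sample covariance of the standardized residuals as the correlation target.

import numpy as np

def dcc_covariances(u, h, a, b):
    """Build H_t = D_t R_t D_t for the DCC model.
    u: (T, k) residuals; h: (T, k) univariate GARCH variances; a, b: DCC parameters."""
    T, k = u.shape
    D = np.sqrt(h)
    z = u / D                                              # standardized residuals z_t
    Q_bar = np.cov(z, rowvar=False, bias=True)             # correlation targeting
    Q = Q_bar.copy()
    H = np.empty((T, k, k))
    for t in range(T):
        if t > 0:
            Q = (1 - a - b) * Q_bar + a * np.outer(z[t - 1], z[t - 1]) + b * Q
        q = np.sqrt(np.diag(Q))
        R = Q / np.outer(q, q)                             # R_t = diag(Q_t)^(-1/2) Q_t diag(Q_t)^(-1/2)
        H[t] = np.outer(D[t], D[t]) * R                    # H_t = D_t R_t D_t
    return H

def gaussian_quasi_loglik(u, H):
    """Quasi log-likelihood of equation (11), up to a constant."""
    ll = 0.0
    for t in range(u.shape[0]):
        sign, logdet = np.linalg.slogdet(H[t])
        ll += -0.5 * logdet - 0.5 * u[t] @ np.linalg.solve(H[t], u[t])
    return ll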
Fig. 11.3. J13^l-statistics of BEKK model under multivariate Normal distribution, plotted against the lag l
Fig. 11.4. J13^l-statistics of DCC model under multivariate Normal distribution, plotted against the lag l
Fig. 11.5. J13^l-statistics of BEKK model under multivariate Student-t distribution, plotted against the lag l
Fig. 11.6. J13^l-statistics of DCC model under multivariate Student-t distribution, plotted against the lag l
at 5% level for only a few lag values under the DCC specification coupled with multivariate Student-t distribution. However, under the BEKK specification, Jnl -statistics are significant at early lags, even at 1% level. Table 11.7 reports individual t-statistics and the J˜n -statistics. Both types of test statistics indicate that normality is very strongly rejected under both GARCH specifications. Similar to the case of Jnl -statistics, the results dramatically change when the distributional assumption is altered to multivariate Student-t. The DCC model produces better results with respect to both types of test statistics, but the chi-squared test in particular strongly supports the DCC specification compared to the BEKK model. Combining the information from all test statistics we can
Table 11.7.
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 J˜13
Individual t and J˜13 -statistics for estimated GARCH models
BEKK Normal
DCC Normal
BEKK Student-t
DCC Student-t
−1.85 −8.52 −9.97 −9.37 −10.34 −11.54 −9.28 −6.85 −2.74 0.24 5.39 8.23 12.18
−2.17 −10.18 −12.26 −11.22 −11.81 −10.95 −10.03 −7.19 −5.70 −1.52 2.17 5.58 12.50
2.78 −0.31 1.00 0.84 2.47 1.13 0.09 0.25 0.92 0.66 0.08 1.00 1.26
2.30 −0.38 −0.64 −0.10 0.18 0.95 0.50 0.59 −0.32 −0.89 −3.51 −1.30 0.74
351.47
388.54
30.07
24.35
conclude that multivariate normality is a bad assumption to make regardless of the multivariate GARCH specification. Furthermore, the DCC model with multivariate Student-t distribution does a good job in terms of capturing dependence and producing a reasonable fit with respect to density functional form.
5. Concluding remarks

Motivated by the relative scarcity of tests for dynamic specification and density functional form in multivariate time series models, we proposed a new battery of tests based on the concept of "autocontour" introduced by González-Rivera, Senyuz, and Yoldas (2007) for univariate processes. We developed t-tests based on a single autocontour and also more powerful chi-squared tests based on multiple autocontours, which have standard asymptotic distributions. We also developed a second type of chi-squared test statistic, which is informative as a goodness-of-fit test when combined with the first type of chi-squared test. Monte Carlo simulations indicate that the tests have good size and power against dynamic mis-specification and deviations from the hypothesized density. We applied our methodology to multivariate GARCH models and showed that the DCC specification of Engle (2002a) coupled with a multivariate Student-t distribution provides a fine model for multivariate time dependence in a relatively large system of stock returns.
12
Modeling Autoregressive Conditional Skewness and Kurtosis with Multi-Quantile CAViaR Halbert White, Tae-Hwan Kim, and Simone Manganelli
1. Introduction It is widely recognized that the use of higher moments, such as skewness and kurtosis, can be important for improving the performance of various financial models. Responding to this recognition, researchers and practitioners have started to incorporate these higher moments into their models, mostly using the conventional measures, e.g. the sample skewness and/or the sample kurtosis. Models of conditional counterparts of the sample skewness and the sample kurtosis, based on extensions of the GARCH model, have also been developed and used; see, for example, Leon, Rubio, and Serna (2004). Nevertheless, Kim and White (2004) point out that because standard measures of skewness and kurtosis are essentially based on averages, they can be sensitive to one or a few outliers – a regular feature of financial returns data – making their reliability doubtful. To deal with this, Kim and White (2004) propose the use of more stable and robust measures of skewness and kurtosis, based on quantiles rather than averages. Nevertheless, Kim and White (2004) only discuss unconditional skewness and kurtosis measures. In this chapter, we extend the approach of Kim and White (2004) by proposing conditional quantile-based skewness and kurtosis measures. For this, we extend Engle and Manganelli’s (2004) univariate CAViaR model to a multi-quantile version, MQ-CAViaR. This allows for both a general vector autoregressive structure in the conditional quantiles and the presence of exogenous variables. We then use the MQ-CAViaR model to specify conditional versions of the more robust skewness and kurtosis measures discussed in Kim and White (2004). The chapter is organized as follows. In Section 2, we develop the MQ-CAViaR data generating process (DGP). In Section 3, we propose a quasi-maximum likelihood
estimator for the MQ-CAViaR process and prove its consistency and asymptotic normality. In Section 4, we show how to consistently estimate the asymptotic variance-covariance matrix of the MQ-CAViaR estimator. Section 5 specifies conditional quantile-based measures of skewness and kurtosis based on MQ-CAViaR estimates. Section 6 contains an empirical application of our methods to the S&P 500 index. We also report results of a simulation experiment designed to examine the finite sample behavior of our estimator. Section 7 contains a summary and concluding remarks. Mathematical proofs are gathered into an Appendix.
2. The MQ-CAViaR process and model We consider data generated as a realization of the following stochastic process. Assumption 1 The sequence {(Yt , Xt ) : t = 0, ±1, ±2, . . . , } is a stationary and ergodic stochastic process on the complete probability space (Ω, F , P0 ), where Yt is a scalar and Xt is a countably dimensioned vector whose first element is one. Let Ft−1 be the σ-algebra generated by Z t−1 ≡ {Xt , (Yt−1 , Xt−1 ), . . .}, i.e. Ft−1 ≡ σ(Z t−1 ). We let Ft (y) ≡ P0 [Yt < y|Ft−1 ] define the cumulative distribution function (CDF) of Yt conditional on Ft−1 . Let 0 < θ1 < . . . < θp < 1. For j = 1, . . . , p, the θj th quantile of Yt conditional on ∗ , is Ft−1 , denoted qj,t ∗ qj,t ≡ inf{y : Ft (y) = θj },
(1)
and if Ft is strictly increasing, ∗ qj,t = Ft−1 (θj ). ∗ can be represented as Alternatively, qj,t ∗ qj,t ∗ ] |Ft−1 ] = θj , dFt (y) = E[1[Yt ≤qj,t
(2)
−∞
where dFt is the Lebesgue-Stieltjes differential for Yt conditional on Ft−1 , corresponding to Ft . ∗ ,j = Our objective is to jointly estimate the conditional quantile functions qj,t ∗ ∗ ∗ 1, 2, . . . , p. For this we write qt ≡ (q1,t , . . . , qp,t ) and impose additional appropriate structure. First, we ensure that the conditional distribution of Yt is everywhere continuous, ∗ . We let ft denote with positive density at each conditional quantile of interest, qj,t the conditional probability density function (PDF) corresponding to Ft . In stating our next condition (and where helpful elsewhere), we make explicit the dependence of the conditional CDF Ft on ω by writing Ft (ω, y) in place of Ft (y). Realized values of the ∗ (ω). Similarly, we write ft (ω, y) in conditional quantiles are correspondingly denoted qj,t place of ft (y). After ensuring this continuity, we impose specific structure on the quantiles of interest. Assumption 2 (i) Yt is continuously distributed such that for each t and each ω ∈ Ω, Ft (ω, ·) and ft (ω, ·) are continuous on R; (ii) For given 0 < θ1 < . . . < θp < 1 and
∗ ∗ } as defined above, suppose: (a) For each t and j = 1, . . . , p, ft (ω, qj,t (ω)) > 0; (b) {qj,t For given finite integers k and m, there exist a stationary ergodic sequence of random ∗ ∗ , . . . , βj,k ) and k × 1 vectors {Ψt }, with Ψt measurable−Ft−1 , and real vectors βj∗ ≡ (βj,1 ∗ ∗ ∗ γjτ ≡ (γjτ 1 , . . . , γjτ p ) such that for all t and j = 1, . . . , p, ∗ qj,t = Ψt βj∗ +
m
∗ ∗ qt−τ γjτ .
(3)
τ =1
The structure of (3) is a multi-quantile version of the CAViaR process introduced by ∗ Engle and Manganelli (2004). When γjτ i = 0 for i = j, we have the standard CAViaR process. Thus, we call processes satisfying our structure “Multi-Quantile CAViaR” (MQCAViaR) processes. For MQ-CAViaR, the number of relevant lags can differ across the ∗ conditional quantiles; this is reflected in the possibility that for given j, elements of γjτ may be zero for values of τ greater than some given integer. For notational simplicity, we do not represent m as depending on j. Nevertheless, by convention, for no τ ≤ m do ∗ equal to the zero vector for all j. we have γjτ The finitely dimensioned random vectors Ψt may contain lagged values of Yt , as well as measurable functions of Xt and lagged Xt and Yt . In particular, Ψt may contain Stinchcombe and White’s (1998) GCR transformations, as discussed in White (2006). For a particular quantile, say θj , the coefficients to be estimated are βj∗ and γj∗ ≡ ∗ ∗ ) . Let αj∗ ≡ (βj∗ , γj∗ ), and write α∗ = (α1∗ , . . . , αp∗ ) , an " × 1 vector, where (γj1 , . . . , γjm " ≡ p(k + mp). We will call α∗ the “MQ-CAViaR coefficient vector.” We estimate α∗ using a correctly specified model of the MQ-CAViaR process. First, we specify our model. Assumption 3 Let A be a compact subset of R . (i) The sequence of functions {qt : Ω × A → Rp } is such that for each t and each α ∈ A, qt (·, α) is measurable–Ft−1 ; for each t and each ω ∈ Ω, qt (ω, ·) is continuous on A; and for each t and j = 1, . . . , p, qj,t (·, α) = Ψt βj +
m
qt−τ (·, α) γjτ .
τ =1
Next, we impose correct specification and an identification condition. Assumption 4(i.a) delivers correct specification by ensuring that the MQ-CAViaR coefficient vector α∗ belongs to the parameter space, A. This ensures that α∗ optimizes the estimation objective function asymptotically. Assumption 4(i.b) delivers identification by ensuring that α∗ is the only such optimizer. In stating the identification condition, we define δj,t (α, α∗ ) ≡ qj,t (·, α) − qj,t (·, α∗ ) and use the norm ||α|| ≡ maxi=1,..., |αi |. Assumption 4 (i)(a) There exists α∗ ∈ A such that for all t qt (·, α∗ ) = qt∗ ;
(4)
(b) There exists a nonempty set J ⊆ {1, . . . , p} such that for each > 0 there exists δ > 0 such that for all α ∈ A with ||α − α∗ || > , P [∪j∈J {|δj,t (α, α∗ )| > δ }] > 0. Among other things, this identification condition ensures that there is sufficient variation in the shape of the conditional distribution to support estimation of a sufficient
234
Multi-Quantile CAViaR
number (#J) of variation-free conditional quantiles. In particular, distributions that depend on a given finite number of parameters, say k, will generally be able to support k variation-free quantiles. For example, the quantiles of the N (μ, 1) distribution all depend on μ alone, so there is only one “degree of freedom” for the quantile variation. Similarly the quantiles of scaled and shifted t-distributions depend on three parameters (location, scale, and kurtosis), so there are only three “degrees of freedom” for the quantile variation.
3. MQ-CAViaR estimation: Consistency and asymptotic normality We estimate α∗ by the method of quasi-maximum likelihood. Specifically, we construct a quasi-maximum likelihood estimator (QMLE) α ˆ T as the solution to the following optimization problem: ⎫ ⎧ p T ⎨ ⎬ ρθj (Yt − qj,t (·, α)) , min S¯T (α) ≡ T −1 (5) α∈A ⎭ ⎩ t=1
j=1
where ρθ (e) = eψθ (e) is the standard “check function,” defined using the usual quantile step function, ψθ (e) = θ − 1[e≤0] . We thus view ⎧ ⎫ p ⎨ ⎬ ρθj (Yt − qj,t (·, α)) St (α) ≡ − ⎩ ⎭ j=1
as the quasi log-likelihood for observation t. In particular, St (α) is the log-likelihood of a vector of p independent asymmetric double exponential random variables (see White, 1994, ch. 5.3; Kim and White, 2003; Komunjer, 2005). Because Yt − qj,t (·, α∗ ), j = 1, . . . , p need not actually have this distribution, the method is quasi maximum likelihood. We can establish the consistency of α ˆ T by applying results of White (1994). For this we impose the following moment and domination conditions. In stating this next condition and where convenient elsewhere, we exploit stationarity to omit explicit reference to all values of t. Assumption 5 (i) E|Yt | < ∞; (ii) let D0,t ≡ maxj=1,...,p supα∈A |qj,t (·, α)|, t = 1, 2, . . .. Then E(D0,t ) < ∞. We now have conditions sufficient to establish the consistency of α ˆT . Theorem 1 Suppose that Assumptions 1, 2(i,ii), 3(i), 4(i), and 5(i,ii) hold. Then a.s. α ˆ T → α∗ . αT − α∗ ). We use a method Next, we establish the asymptotic normality of T 1/2 (ˆ originally proposed by Huber (1967) and later extended by Weiss (1991). We first sketch the method before providing formal conditions and results.
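In code, the criterion in (5) is simply a sum of check-function losses over the p quantiles. The sketch below is ours; the function that maps a parameter vector into the quantile paths is left abstract and must be supplied by the user.

import numpy as np

def check_loss(e, theta):
    """Standard quantile check function rho_theta(e) = e * (theta - 1{e <= 0})."""
    return e * (theta - (e <= 0).astype(float))

def mq_caviar_objective(alpha, y, thetas, quantile_paths):
    """Sample objective S_T(alpha) of equation (5).
    quantile_paths(alpha, y) must return a (T, p) array of q_{j,t}(alpha)."""
    q = quantile_paths(alpha, y)
    total = 0.0
    for j, theta in enumerate(thetas):
        total += np.sum(check_loss(y - q[:, j], theta))
    return total / len(y)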
ˆ T satisfies the asymptotic Huber’s method applies to our estimator α ˆ T , provided that α first order conditions ⎫ ⎧ p T ⎨ ⎬ ∇qj,t (·, α ˆ T )ψθj (Yt − qj,t (·, α ˆ T )) = op (T 1/2 ), (6) T −1 ⎭ ⎩ t=1
j=1
where ∇qj,t (·, α) is the " × 1 gradient vector with elements (∂/∂αi )qj,t (·, α), i = 1, . . . , ", ˆ T )) is a generalized residual. Our first task is thus to ensure that (6) and ψθj (Yt − qj,t (·, α holds. Next, we define λ(α) ≡
p
E[∇qj,t (·, α)ψθj (Yt − qj,t (·, α))].
j=1
With λ continuously differentiable at α∗ interior to A, we can apply the mean value theorem to obtain λ(α) = λ(α∗ ) + Q0 (α − α∗ ),
(7)
α(i) ), where α ¯ (i) is a mean where Q0 is an " × " matrix with (1 × ") rows Q0,i = ∇ λ(¯ value (different for each i) lying on the segment connecting α and α∗ , i = 1, . . . , ". It is straightforward to show that correct specification ensures that λ(α∗ ) is zero. We will also show that p
Q0 = −Q∗ + O(α − α∗ ),
(8)
where Q∗ ≡ j=1 E[fj,t (0)∇qj,t (·, α∗ )∇ qj,t (·, α∗ )] with fj,t (0) the value at zero of the density fj,t of εj,t ≡ Yt − qj,t (·, α∗ ), conditional on Ft−1 . Combining (7) and (8) and putting λ(α∗ ) = 0, we obtain λ(α) = −Q∗ (α − α∗ ) + O(α − α∗ 2 ).
(9)
The next step is to show that T 1/2 λ(ˆ αT ) + HT = op (1),
(10)
p T ∗ where HT ≡ T −1/2 t=1 ηt∗ , with ηt∗ ≡ j=1 ∇qj,t (·, α )ψθj (εj,t ). Equations (9) and (10) then yield the following asymptotic representation of our estimator α ˆT : αT − α∗ ) = Q∗−1 T −1/2 T 1/2 (ˆ
T
ηt∗ + op (1).
(11)
t=1
As we impose conditions sufficient to ensure that {ηt∗ , Ft } is a martingale difference sequence (MDS), a suitable central limit theorem (e.g., theorem 5.24 in White, 2001) applies to (11) to yield the desired asymptotic normality of α ˆT : αT − α∗ )→N (0, Q∗−1 V ∗ Q∗−1 ), T 1/2 (ˆ d
where V ∗ ≡ E(ηt∗ ηt∗ ).
(12)
We now strengthen the conditions above to ensure that each step of the above argument is valid. Assumption 2 (iii) (a) There exists a finite positive constant f0 such that for each t, each ω ∈ Ω, and each y ∈ R, ft (ω, y) ≤ f0 < ∞; (b) There exists a finite positive constant L0 such that for each t, each ω ∈ Ω, and each y1 , y2 ∈ R, |ft (ω, y1 )−ft (ω, y2 )| ≤ L0 |y1 − y2 |. Next we impose sufficient differentiability of qt with respect to α. Assumption 3 (ii) For each t and each ω ∈ Ω, qt (ω, ·) is continuously differentiable on A; (iii) For each t and each ω ∈ Ω, qt (ω, ·) is twice continuously differentiable on A. To exploit the mean value theorem, we require that α∗ belongs to the interior of A, int(A). Assumption 4 (ii) α∗ ∈ int(A). Next, we place domination conditions on the derivatives of qt . Assumption 5 (iii) Let D1,t ≡ maxj=1,...,p maxi=1,..., supα∈A |(∂/∂αi )qj,t (·, α)|, t = 2 ) < ∞; (iv) Let D2,t ≡ maxj=1,...,p 1, 2, . . .. Then (a) E(D1,t ) < ∞; (b) E(D1,t 2 maxi=1,..., maxh=1,..., supα∈A |(∂ /∂αi ∂αh )qj,t (·, α)|, t = 1, 2, . . .. Then (a) E(D2,t ) < 2 ) < ∞. ∞; (b) E(D2,t p Assumption 6 (i) Q∗ ≡ j=1 E[fj,t (0)∇qj,t (·, α∗ )∇ qj,t (·, α∗ )] is positive definite; (ii) V ∗ ≡ E(ηt∗ ηt∗ ) is positive definite. Assumptions 3(ii) and 5(iii.a) are additional assumptions helping to ensure that (6) holds. Further imposing Assumptions 2(iii), 3(iii.a), 4(ii), and 5(iv.a) suffices to ensure that (9) holds. The additional regularity provided by Assumptions 5(iii.b), 5(iv.b), and 6(i) ensures that (10) holds. Assumptions 5(iii.b) and 6(ii) help ensure the availability of the MDS central limit theorem. We now have conditions sufficient to ensure asymptotic normality of our MQ-CAViaR estimator. Formally, we have Theorem 2 Suppose that Assumptions 1–6 hold. Then αT − α∗ )→N (0, I). V ∗−1/2 Q∗ T 1/2 (ˆ d
Theorem 2 shows that our QML estimator α ˆ T is asymptotically normal with asympˆ T is totic covariance matrix Q∗−1 V ∗ Q∗−1 . There is, however, no guarantee that α asymptotically efficient. There is now a considerable literature investigating efficient estimation in quantile models; see, for example, Newey and Powell (1990), Otsu (2003), Komunjer and Vuong (2006, 2007a, 2007b). So far, this literature has only considered single quantile models. It is not obvious how the results for single quantile models extend to multi-quantile models such as ours. Nevertheless, Komunjer and Vuong (2007a) show that the class of QML estimators is not large enough to include an efficient estimator, and that the class of M -estimators, which strictly includes the QMLE class, yields an estimator that attains the efficiency bound. Specifically, they show that replacing the usual quantile check function ρθj appearing in (5) with ρ∗θj (Yt − qj,t (·, α)) = (θ − 1[Yt −qj,t (·,α)≤0] )(Ft (Yt ) − Ft (qj,t (·, α)))
will deliver an asymptotically efficient quantile estimator under the single quantile restriction. We conjecture that replacing ρθj with ρ∗θj in (5) will improve estimator efficiency. We leave the study of the asymptotically efficient multi-quantile estimator for future work.
4. Consistent covariance matrix estimation To test restrictions on α∗ or to obtain confidence intervals, we require a consistent estimator of the asymptotic covariance matrix C ∗ ≡ Q∗−1 V ∗ Q∗−1 . First, we provide a ˆ T for Q∗ . It follows consistent estimator VˆT for V ∗ ; then we give a consistent estimator Q −1 ˆ ˆ −1 ∗ ˆ ˆ that CT ≡ QT VT QT is a consistent estimator p for C . Recall that V ∗ ≡ E(ηt∗ ηt∗ ), with ηt∗ ≡ j=1 ∇qj,t (·, α∗ )ψθj (εj,t ). A straightforward plug-in estimator of V ∗ is VˆT ≡ T −1
T
ηˆt ηˆt ,
with
t=1
ηˆt ≡
p
∇qj,t (·, α ˆ T )ψθj (ˆ εj,t )
j=1
εˆj,t ≡ Yt − qj,t (·, α ˆ T ). We already have conditions sufficient to deliver the consistency of VˆT for V ∗ . Formally, we have p Theorem 3 Suppose that Assumptions 1–6 hold. Then VˆT →V ∗ .
Next, we provide a consistent estimator of Q∗ ≡
p
E[fj,t (0)∇qj,t (·, α∗ )∇ qj,t (·, α∗ )].
j=1
cT for a We follow Powell’s (1984) suggestion of estimating fj,t (0) with 1[−ˆcT ≤ˆεj,t ≤ˆcT ] /2ˆ suitably chosen sequence {ˆ cT }. This is also the approach taken in Kim and White (2003) and Engle and Manganelli (2004). Accordingly, our proposed estimator is ˆ T = (2ˆ cT T )−1 Q
p T
1[−ˆcT ≤ˆεj,t ≤ˆcT ] ∇qj,t (·, α ˆ T )∇ qj,t (·, α ˆ T ).
t=1 j=1
To establish consistency, we strengthen the domination condition on ∇qj,t and impose conditions on {ˆ cT }. 3 Assumption 5 (iii.c) E(D1,t ) < ∞.
Assumption 7 {ˆ cT } is a stochastic sequence and {cT } is a nonstochastic sequence such p 1/2 ). that (i) cˆT /cT → 1; (ii) cT = o(1); and (iii) c−1 T = o(T p ˆT → Q∗ . Theorem 4 Suppose that Assumptions 1–7 hold. Then Q
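A sketch of the plug-in estimators V̂T and Q̂T (ours, for illustration; the arrays of fitted quantiles and of their gradients are assumed to have been computed elsewhere, and the bandwidth ĉT is taken as given):

import numpy as np

def covariance_estimators(y, q, grad_q, thetas, c_hat):
    """Plug-in estimators of V* and Q*.
    q: (T, p) fitted quantiles; grad_q: (T, p, ell) gradients of q_{j,t} w.r.t. alpha."""
    T, p = q.shape
    eps = y[:, None] - q                                    # generalized residuals eps_{j,t}
    psi = np.asarray(thetas)[None, :] - (eps <= 0).astype(float)
    eta = np.einsum('tjl,tj->tl', grad_q, psi)              # eta_t = sum_j grad q_{j,t} * psi
    V_hat = eta.T @ eta / T
    kern = (np.abs(eps) <= c_hat).astype(float)             # Powell-type kernel 1{-c <= eps <= c}
    Q_hat = np.einsum('tj,tjl,tjm->lm', kern, grad_q, grad_q) / (2 * c_hat * T)
    return V_hat, Q_hat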
5. Quantile-based measures of conditional skewness and kurtosis Moments of asset returns of order higher than two are important because these permit a recognition of the multi-dimensional nature of the concept of risk. Such higher order moments have thus proved useful for asset pricing, portfolio construction, and risk assessment. See, for example, Hwang and Satchell (1999) and Harvey and Siddique (2000). Higher order moments that have received particular attention are skewness and kurtosis, which involve moments of order three and four, respectively. Indeed, it is widely held as a “stylized fact” that the distribution of stock returns exhibits both left skewness and excess kurtosis (fat tails); there is a large amount of empirical evidence to this effect. Recently, Kim and White (2004) have challenged this stylized fact and the conventional way of measuring skewness and kurtosis. As moments, skewness and kurtosis are computed using averages, specifically, averages of third and fourth powers of standardized random variables. Kim and White (2004) point out that averages are sensitive to outliers, and that taking third or fourth powers greatly enhances the influence of any outliers that may be present. Moreover, asset returns are particularly prone to containing outliers, as the result of crashes or rallies. According to Kim and White’s simulation study, even a single outlier of a size comparable to the sharp drop in stock returns caused by the 1987 stock market crash can generate dramatic irregularities in the behavior of the traditional moment-based measures of skewness and kurtosis. Kim and White (2004) propose using more robust measures instead, based on sample quantiles. For example, Bowley’s (1920) coefficient of skewness is given by
SK2 = (q3∗ + q1∗ − 2q2∗) / (q3∗ − q1∗),
where q1∗ = F −1 (0.25), q2∗ = F −1 (0.5), and q3∗ = F −1 (0.75), where F (y) ≡ P0 [Yt < y] is the unconditional CDF of Yt . Similarly, Crow and Siddiqui’s (1967) coefficient of kurtosis is given by
KR4 = (q4∗ − q0∗) / (q3∗ − q1∗) − 2.91,
where q0∗ = F −1 (0.025) and q4∗ = F −1 (0.975). (The notations SK2 and KR4 correspond to those of Kim and White, 2004.) A limitation of these measures is that they are based on unconditional sample quantiles. Thus, in measuring skewness or kurtosis, these can neither incorporate useful information contained in relevant exogenous variables nor exploit the dynamic evolution of quantiles over time. To avoid these limitations, we propose constructing measures ∗ in place of the unconof conditional skewness and kurtosis using conditional quantiles qj,t ditional quantiles qj∗ . In particular, the conditional Bowley coefficient of skewness and
the conditional Crow and Siddiqui coefficient of kurtosis are given by

CSK2 = (q∗3,t + q∗1,t − 2q∗2,t) / (q∗3,t − q∗1,t),    CKR4 = (q∗4,t − q∗0,t) / (q∗3,t − q∗1,t) − 2.91.
Another quantile-based kurtosis measure discussed in Kim and White (2004) is Moors’s (1988) coefficient of kurtosis, which involves computing six quantiles. Because our approach requires joint estimation of all relevant quantiles, and, in our model, each quantile depends not only on its own lags, but also possibly on the lags of other quantiles, the number of parameters to be estimated can be quite large. Moreover, if the θj s are too close to each other, then the corresponding quantiles may be highly correlated, which can result in an analog of multicollinearity. For these reasons, in what follows we focus only on SK2 and KR4 , as these require jointly estimating at most five quantiles.
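Given the five estimated conditional quantile series, the conditional measures are simple transformations; a minimal sketch (ours, with illustrative names):

import numpy as np

def conditional_skew_kurt(q025, q25, q50, q75, q975):
    """Conditional Bowley skewness CSK2 and Crow-Siddiqui kurtosis CKR4
    from conditional quantile series (each a length-T array)."""
    iqr = q75 - q25
    csk2 = (q75 + q25 - 2.0 * q50) / iqr
    ckr4 = (q975 - q025) / iqr - 2.91
    return csk2, ckr4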
6. Application and simulation

6.1. Time-varying skewness and kurtosis for the S&P 500

In this section we obtain estimates of time-varying skewness and kurtosis for the S&P 500 index daily returns. Figure 12.1 plots the S&P 500 daily returns series used for estimation. The sample ranges from January 1, 1999 to September 28, 2007, for a total of 2,280 observations.
Fig. 12.1. S&P 500 daily returns: January 1, 1999–September 30, 2007
Table 12.1. S&P 500 index: estimation results for the LRS model

        β1       β2       β3       β4       β5       β6       β7       β8       β9
      0.01     0.05     0.94    −0.04     0.01     0.01     3.25     0.00     0.00
     (0.18)   (0.19)   (0.04)   (0.15)   (0.01)   (0.02)   (0.04)   (0.00)   (0.00)

Standard errors are in parentheses.
First, we estimate time-varying skewness and kurtosis using the GARCH-type model of Leon, Rubio, and Serna (2004), the LRS model for short. Letting rt denote the return for day t, we estimate the following specification of their model:

rt = ht^{1/2} ηt
ht = β1 + β2 r²t−1 + β3 ht−1
st = β4 + β5 η³t−1 + β6 st−1
kt = β7 + β8 η⁴t−1 + β9 kt−1,
where we assume that Et−1(ηt) = 0, Et−1(ηt²) = 1, Et−1(ηt³) = st, and Et−1(ηt⁴) = kt, where Et−1 denotes the conditional expectation given rt−1, rt−2, . . . The likelihood is constructed using a Gram-Charlier series expansion of the normal density function for ηt, truncated at the fourth moment. We refer the interested reader to Leon, Rubio, and Serna (2004) for technical details. The model is estimated via (quasi-)maximum likelihood. As starting values for the optimization, we use estimates of β1, β2, and β3 from the standard GARCH model. We set initial values of β4 and β7 equal to the unconditional skewness and kurtosis values of the GARCH residuals. The remaining coefficients are initialized at zero. The point estimates for the model parameters are given in Table 12.1. Figures 12.2 and 12.3 display the time series plots for st and kt, respectively.
Next, we estimate the MQ-CAViaR model. Given the expressions for CSK2 and CKR4, we require five quantiles, i.e. those for θj = 0.025, 0.25, 0.5, 0.75, and 0.975. We thus estimate an MQ-CAViaR model for the following DGP:

q∗0.025,t = β∗11 + β∗12 |rt−1| + q∗′t−1 γ∗1
q∗0.25,t = β∗21 + β∗22 |rt−1| + q∗′t−1 γ∗2
. . .
q∗0.975,t = β∗51 + β∗52 |rt−1| + q∗′t−1 γ∗5,

where q∗t−1 ≡ (q∗0.025,t−1, q∗0.25,t−1, q∗0.5,t−1, q∗0.75,t−1, q∗0.975,t−1)′ and γ∗j ≡ (γ∗j1, γ∗j2, γ∗j3, γ∗j4, γ∗j5)′, j = 1, . . . , 5. Hence, the coefficient vector α∗ consists of all the coefficients β∗jk and γ∗jk, as above.
Estimating the full model is not trivial. We discuss this briefly before presenting the estimation results. We perform the computations in a step-wise fashion as follows. In
Fig. 12.2. S&P 500: estimated conditional skewness, LRS model
Fig. 12.3. S&P 500: estimated conditional kurtosis, LRS model
the first step, we estimate the MQ-CAViaR model containing just the 2.5% and 25% quantiles. The starting values for optimization are the individual CAViaR estimates, and we initialize the remaining parameters at zero. We repeat this estimation procedure for the MQ-CAViaR model containing the 75% and 97.5% quantiles. In the second step, we
Table 12.2. θj 0.025 0.25 0.50 0.75 0.975
S&P 500 index: estimation results for the MQ-CAViaR model
βj1
βj2
γj1
γj2
γj3
γj4
γj5
−0.04 (0.05) 0.001 (0.02) 0.10 (0.02) 0.03 (0.31) 0.03 (0.06)
−0.11 (0.07) −0.01 (0.05) 0.04 (0.04) −0.01 (0.05) 0.24 (0.07)
0.93 (0.12) 0 (0.06) 0.03 (0.04) 0 (0.70) 0 (0.16)
0.02 (0.10) 0.99 (0.03) 0 (0.02) 0 (0.80) 0 (0.16)
0 (0.29) 0 (0.04) −0.32 (0.02) 0 (2.33) 0 (0.33)
0 (0.93) 0 (0.63) 0 (0.52) 0.04 (0.84) 0.03 (0.99)
0 (0.30) 0 (0.20) −0.02 (0.17) 0.29 (0.34) 0.89 (0.29)
Standard errors are in parentheses.
use the estimated parameters of the first step as starting values for the optimization of the MQ-CAViaR model containing the 2.5%, 25%, 75%, and 97.5% quantiles, initializing the remaining parameters at zero. Third and finally, we use the estimates from the second step as starting values for the full MQ-CAViaR model optimization containing all five quantiles of interest, again setting to zero the remaining parameters. The likelihood function appears quite flat around the optimum, making the optimization procedure sensitive to the choice of initial conditions. In particular, choosing a different combination of quantile couples in the first step of our estimation procedure tends to produce different parameter estimates for the full MQ-CAViaR model. Nevertheless, the likelihood values are similar, and there are no substantial differences in the dynamic behavior of the individual quantiles associated with these different estimates. Table 12.2 presents our MQ-CAViaR estimation results. In calculating the standard errors, we have set the bandwidth to 1. Results are slightly sensitive to the choice of the bandwidth, with standard errors increasing for lower values of the bandwidth. We observe that there is interaction across quantile processes. This is particularly evident for the 75% quantile: the autoregressive coefficient associated with the lagged 75% quantile is only 0.04, whereas that associated with the lagged 97.5% quantile is 0.29. This implies that the autoregressive process of the 75% quantile is mostly driven by the lagged 97.5% quantile, although this is not statistically significant at the usual significance level. Figure 12.4 displays plots of the five individual quantiles for the time period under consideration. ∗ ∗ , . . . , q0.975,t to calculate Next, we use the estimates of the individual quantiles q0.025,t the robust skewness and kurtosis measures CSK2 and CKR4 . The resulting time series plots are shown in Figures 12.5 and 12.6, respectively. We observe that the LRS model estimates of both skewness and kurtosis do not vary much and are dwarfed by those for the end of February 2007. The market was doing well until February 27, when the S&P 500 index dropped by 3.5%, as the market worried about global economic growth. (The sub-prime mortgage fiasco was still not yet public knowledge.) Interestingly, this is not a particularly large negative return (there are larger
Fig. 12.4. S&P 500 conditional quantiles: January 1, 1999–September 30, 2007
Fig. 12.5. S&P 500: estimated conditional skewness, MQ-CAViaR model
negative returns in our sample between 2000 and 2001), but this one occurred in a period of relatively low volatility. Our more robust MQ-CAViaR measures show more plausible variability and confirm that the February 2007 market correction was indeed a case of large negative
Fig. 12.6. S&P 500: estimated conditional kurtosis, MQ-CAViaR model
conditional skewness and high conditional kurtosis. This episode appears to affect the LRS model estimates substantially over the entire sample, raising doubts about the reliability of LRS estimates in general, consistent with the findings of Sakata and White (1998).
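As a concrete illustration of how such quantile-based measures can be computed from the fitted quantile paths, the following minimal sketch (in Python) builds conditional skewness and kurtosis series from five estimated quantiles. It assumes Bowley-type and Crow–Siddiqui-type ratios; the exact CSK and CKR definitions used in this chapter are the ones given in the earlier sections, and the function name and inputs below are purely illustrative.

    import numpy as np

    def robust_skew_kurt(q025, q25, q50, q75, q975):
        # Quantile-based conditional skewness and kurtosis paths, computed date by date.
        # Bowley-type skewness and Crow-Siddiqui-type kurtosis are assumed here; see the
        # chapter's earlier sections for the exact CSK/CKR definitions actually used.
        q025, q25, q50, q75, q975 = map(np.asarray, (q025, q25, q50, q75, q975))
        iqr = q75 - q25
        skew = (q75 + q25 - 2.0 * q50) / iqr
        kurt = (q975 - q025) / iqr
        return skew, kurt

Applied to the five MQ-CAViaR quantile paths, these ratios respond only through the estimated quantiles, which is what makes them insensitive to isolated extreme returns.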
6.2. Simulation

In this section we provide some Monte Carlo evidence illustrating the finite sample behavior of our methods. We consider the same MQ-CAViaR process estimated in the previous subsection,

\[
\begin{aligned}
q^*_{0.025,t} &= \beta^*_{11} + \beta^*_{12}\,|r_{t-1}| + q^{*\prime}_{t-1}\gamma^*_1, \\
q^*_{0.25,t}  &= \beta^*_{21} + \beta^*_{22}\,|r_{t-1}| + q^{*\prime}_{t-1}\gamma^*_2, \\
&\;\,\vdots \\
q^*_{0.975,t} &= \beta^*_{51} + \beta^*_{52}\,|r_{t-1}| + q^{*\prime}_{t-1}\gamma^*_5.
\end{aligned}
\tag{13}
\]
For the simulation exercise, we set the true coefficients equal to the estimates reported in Table 12.2. Using these values, we generate the above MQ-CAViaR process 100 times, and each time we estimate all the coefficients, using the procedure described in the previous subsection. Data were generated as follows. We initialize the quantiles q*_{θj,t}, j = 1, . . . , 5, at t = 1 using the empirical quantiles of the first 100 observations of our S&P 500 data. Given quantiles for time t, we generate a random variable rt compatible with these using the following procedure. First, we draw a random variable Ut, uniform over the interval
Table 12.3. Means of point estimates through 100 replications (T = 1,000)

         True parameters
θj       βj1            βj2            γj1           γj2           γj3            γj4           γj5
0.025    −0.05 (0.08)   −0.10 (0.02)   0.93 (0.14)   0.04 (0.34)   0.00 (0.00)    0.00 (0.01)   0.00 (0.00)
0.25     −0.05 (0.40)   −0.01 (0.05)   0.04 (0.17)   0.81 (0.47)   0.00 (0.01)    0.00 (0.00)   0.00 (0.00)
0.50     −0.08 (0.15)   0.02 (0.06)    0.00 (0.01)   0.00 (0.00)   −0.06 (0.81)   0.00 (0.00)   0.00 (0.01)
0.75     0.20 (0.42)    0.05 (0.11)    0.00 (0.02)   0.00 (0.02)   0.00 (0.00)    0.38 (0.63)   0.13 (0.19)
0.975    0.06 (0.16)    0.22 (0.03)    0.00 (0.00)   0.00 (0.01)   0.00 (0.00)    0.10 (0.56)   0.87 (0.16)

Standard errors are in parentheses.
[0,1]. Next, we find θj such that θj−1 < Ut < θj. This determines the quantile range within which the random variable to be generated should fall. Finally, we generate the desired random variable rt by drawing it from a uniform distribution within the interval [q*_{θj−1,t}, q*_{θj,t}]. The procedure can be represented as follows:

\[
r_t = \sum_{j=1}^{p+1} I(\theta_{j-1} < U_t < \theta_j)\,\bigl[q^*_{\theta_{j-1},t} + (q^*_{\theta_j,t} - q^*_{\theta_{j-1},t})\,V_t\bigr],
\]

where Ut and Vt are i.i.d. U(0,1), θ0 = 0, θp+1 = 1, q*_{θ0,t} = q*_{θ1,t} − 0.05, and q*_{θp+1,t} = q*_{θp,t} + 0.05. It is easy to check that the random variable rt has the desired quantiles by construction. Further, it does not matter that the distribution within the quantiles is uniform, as that distribution has essentially no impact on the resulting parameter estimates. Using these values of rt and q*_t, we apply (13) to generate the conditional quantiles for the next period. The process iterates until t = T. Once we have a full sample, we perform the estimation procedure described in the previous subsection.

Tables 12.3 and 12.4 provide the sample means and standard deviations over 100 replications of each coefficient estimate for two different sample sizes, T = 1,000 and T = 2,280 (the sample size of the S&P 500 data), respectively. The mean estimates are fairly close to the values in Table 12.2, showing that the available sample sizes are sufficient to recover the true DGP parameters. (To obtain standard error estimates for the means, divide the reported standard deviations by 10.)

A potentially interesting experiment that one might consider is to generate data from the LRS process and see how the MQ-CAViaR model performs in revealing the underlying patterns of conditional skewness and kurtosis. Nevertheless, we leave this aside here, as the LRS model depends on four distributional shape parameters, whereas we require five variation-free quantiles for the present exercise. As noted in Section 2, the MQ-CAViaR model will generally not satisfy the identification condition in such circumstances.
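A minimal sketch of this drawing scheme, assuming Python and illustrative helper names (draw_return, pad), is given below; it implements the displayed formula for a single date, with the 0.05 padding applied to the outer bands.

    import numpy as np

    def draw_return(q_star, thetas, u, v, pad=0.05):
        # q_star: the p conditional quantiles q*_{theta_1,t} <= ... <= q*_{theta_p,t}
        # thetas: the p quantile levels, e.g. [0.025, 0.25, 0.50, 0.75, 0.975]
        # u, v:   independent U(0,1) draws (U_t selects the band, V_t places r_t inside it)
        edges = np.concatenate(([q_star[0] - pad], q_star, [q_star[-1] + pad]))
        probs = np.concatenate(([0.0], thetas, [1.0]))
        j = np.searchsorted(probs, u)              # band with theta_{j-1} < U_t < theta_j
        lo, hi = edges[j - 1], edges[j]
        return lo + (hi - lo) * v                  # uniform draw within that quantile band

    rng = np.random.default_rng(0)
    thetas = np.array([0.025, 0.25, 0.50, 0.75, 0.975])
    q_t = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])    # illustrative conditional quantiles, not estimates
    r_t = draw_return(q_t, thetas, rng.uniform(), rng.uniform())

In the full simulation, each generated rt is then fed back into recursion (13) to produce the next period's quantiles.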
Table 12.4. Means of point estimates through 100 replications (T = 2,280)

         True parameters
θj       βj1            βj2            γj1           γj2           γj3            γj4           γj5
0.025    −0.04 (0.03)   −0.10 (0.01)   0.93 (0.07)   0.03 (0.21)   0.00 (0.00)    0.00 (0.00)   0.00 (0.00)
0.25     −0.04 (0.18)   −0.01 (0.02)   0.03 (0.12)   0.88 (0.38)   0.00 (0.00)    0.00 (0.00)   0.00 (0.00)
0.50     −0.01 (0.11)   0.02 (0.04)    0.00 (0.00)   0.00 (0.01)   −0.03 (0.75)   0.00 (0.00)   0.00 (0.02)
0.75     0.09 (0.21)    0.01 (0.07)    0.00 (0.00)   0.00 (0.01)   0.00 (0.00)    0.33 (0.58)   0.19 (0.18)
0.975    0.05 (0.13)    0.24 (0.02)    0.00 (0.00)   0.00 (0.03)   0.00 (0.00)    0.18 (0.69)   0.83 (0.22)

Standard errors are in parentheses.
7. Conclusion

In this chapter, we generalize Engle and Manganelli's (2004) single-quantile CAViaR process to its multi-quantile version. This allows for (i) joint modeling of multiple quantiles; (ii) dynamic interactions between quantiles; and (iii) the use of exogenous variables. We apply our MQ-CAViaR process to define conditional versions of existing unconditional quantile-based measures of skewness and kurtosis. Because of their use of quantiles, these measures may be much less sensitive than standard moment-based methods to the adverse impact of the outliers that regularly appear in financial market data. An empirical analysis of the S&P 500 index demonstrates the use and utility of our new methods.
Appendix

Proof of Theorem 1
We verify the conditions of corollary 5.11 of White (1994), which delivers α̂T → α*, where

\[
\hat{\alpha}_T \equiv \arg\max_{\alpha \in A} \; T^{-1} \sum_{t=1}^{T} \varphi_t(Y_t, q_t(\cdot, \alpha)),
\]

and ϕt(Yt, qt(·, α)) ≡ −Σ_{j=1}^{p} ρθj(Yt − qj,t(·, α)). Assumption 1 ensures White's Assumption 2.1. Assumption 3(i) ensures White's Assumption 5.1. Our choice of ρθj satisfies White's Assumption 5.4. To verify White's Assumption 3.1, it suffices that ϕt(Yt, qt(·, α)) is dominated on A by an integrable function (ensuring White's Assumption 3.1(a,b))
and that for each α in A, {ϕt(Yt, qt(·, α))} is stationary and ergodic (ensuring White's Assumption 3.1(c), the strong uniform law of large numbers (ULLN)). Stationarity and ergodicity are ensured by Assumptions 1 and 3(i). To show domination, we write

\[
\begin{aligned}
|\varphi_t(Y_t, q_t(\cdot,\alpha))|
&\le \sum_{j=1}^{p} \bigl|\rho_{\theta_j}(Y_t - q_{j,t}(\cdot,\alpha))\bigr| \\
&= \sum_{j=1}^{p} \bigl|(Y_t - q_{j,t}(\cdot,\alpha))\bigl(\theta_j - 1_{[Y_t - q_{j,t}(\cdot,\alpha) \le 0]}\bigr)\bigr| \\
&\le 2 \sum_{j=1}^{p} \bigl(|Y_t| + |q_{j,t}(\cdot,\alpha)|\bigr) \\
&\le 2p\,\bigl(|Y_t| + |D_{0,t}|\bigr),
\end{aligned}
\]

so that

\[
\sup_{\alpha \in A} |\varphi_t(Y_t, q_t(\cdot,\alpha))| \le 2p\,\bigl(|Y_t| + |D_{0,t}|\bigr).
\]
Thus, 2p(|Yt| + |D0,t|) dominates |ϕt(Yt, qt(·, α))| and has finite expectation by Assumption 5(i,ii). It remains to verify White's Assumption 3.2; here this is the condition that α* is the unique maximizer of E(ϕt(Yt, qt(·, α))). Given Assumptions 2(ii.b) and 4(i), it follows by an argument directly parallel to that in the proof of White (1994, corollary 5.11) that for all α ∈ A, E(ϕt(Yt, qt(·, α))) ≤ E(ϕt(Yt, qt(·, α*))). Thus, it suffices to show that the above inequality is strict for α ≠ α*. Letting Δ(α) ≡ Σ_{j=1}^{p} E(Δj,t(α)) with Δj,t(α) ≡ ρθj(Yt − qj,t(·, α)) − ρθj(Yt − qj,t(·, α*)), it suffices to show that for each ε > 0, Δ(α) > 0 for all α ∈ A such that ||α − α*|| > ε. Pick ε > 0 and α ∈ A such that ||α − α*|| > ε. With δj,t(α, α*) ≡ qt(θj, α) − qt(θj, α*), by Assumption 4(i.b) there exist J ⊆ {1, . . . , p} and δε > 0 such that P[∪_{j∈J} {|δj,t(α, α*)| > δε}] > 0. For this δε and all j, some algebra and Assumption 2(ii.a) ensure that

\[
\begin{aligned}
E(\Delta_{j,t}(\alpha))
&= E\Bigl[\int_{0}^{\delta_{j,t}(\alpha,\alpha^*)} \bigl(\delta_{j,t}(\alpha,\alpha^*) - s\bigr)\, f_{j,t}(s)\, ds\Bigr] \\
&\ge E\Bigl[\tfrac{1}{2}\,\delta_\varepsilon^2\, 1_{[|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon]} + \tfrac{1}{2}\,\delta_{j,t}(\alpha,\alpha^*)^2\, 1_{[|\delta_{j,t}(\alpha,\alpha^*)| \le \delta_\varepsilon]}\Bigr] \\
&\ge \tfrac{1}{2}\,\delta_\varepsilon^2\, E\bigl[1_{[|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon]}\bigr].
\end{aligned}
\]
The first inequality above comes from the fact that Assumption 2(ii.a) implies that for any δε > 0 sufficiently small, we have fj,t(s) > δε for |s| < δε. Thus,

\[
\begin{aligned}
\Delta(\alpha) \equiv \sum_{j=1}^{p} E(\Delta_{j,t}(\alpha))
&\ge \frac{1}{2}\,\delta_\varepsilon^2 \sum_{j=1}^{p} E\bigl[1_{[|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon]}\bigr] \\
&= \frac{1}{2}\,\delta_\varepsilon^2 \sum_{j=1}^{p} P\bigl[|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon\bigr]
\;\ge\; \frac{1}{2}\,\delta_\varepsilon^2 \sum_{j \in J} P\bigl[|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon\bigr] \\
&\ge \frac{1}{2}\,\delta_\varepsilon^2\, P\bigl[\cup_{j \in J}\{|\delta_{j,t}(\alpha,\alpha^*)| > \delta_\varepsilon\}\bigr] \\
&> 0,
\end{aligned}
\]

where the final inequality follows from Assumption 4(i.b). As ε > 0 and α are arbitrary, the result follows.

Proof of Theorem 2
As outlined in the text, we first prove

\[
T^{-1/2} \sum_{t=1}^{T} \sum_{j=1}^{p} \nabla q_{j,t}(\cdot, \hat{\alpha}_T)\, \psi_{\theta_j}\bigl(Y_t - q_{j,t}(\cdot, \hat{\alpha}_T)\bigr) = o_p(1). \tag{14}
\]
The existence of ∇qj,t is ensured by Assumption 3(ii). Let ei be the ℓ × 1 unit vector with ith element equal to one and the rest zero, and let

\[
G_i(c) \equiv T^{-1/2} \sum_{t=1}^{T} \sum_{j=1}^{p} \rho_{\theta_j}\bigl(Y_t - q_{j,t}(\cdot, \hat{\alpha}_T + c e_i)\bigr),
\]

for any real number c. Then by the definition of α̂T, Gi(c) is minimized at c = 0. Let Hi(c) be the derivative of Gi(c) with respect to c from the right. Then

\[
H_i(c) = -\,T^{-1/2} \sum_{t=1}^{T} \sum_{j=1}^{p} \nabla_i q_{j,t}(\cdot, \hat{\alpha}_T + c e_i)\, \psi_{\theta_j}\bigl(Y_t - q_{j,t}(\cdot, \hat{\alpha}_T + c e_i)\bigr),
\]

where ∇i qj,t(·, α̂T + cei) is the ith element of ∇qj,t(·, α̂T + cei). Using the facts that (i) Hi(c) is nondecreasing in c and (ii) for any ε > 0, Hi(−ε) ≤ 0 and Hi(ε) ≥ 0, we have

\[
|H_i(0)| \le H_i(\varepsilon) - H_i(-\varepsilon)
\le T^{-1/2} \sum_{t=1}^{T} \sum_{j=1}^{p} |\nabla_i q_{j,t}(\cdot, \hat{\alpha}_T)|\, 1_{[Y_t - q_{j,t}(\cdot, \hat{\alpha}_T) = 0]}
\le T^{-1/2} \max_{1 \le t \le T} D_{1,t} \sum_{t=1}^{T} \sum_{j=1}^{p} 1_{[Y_t - q_{j,t}(\cdot, \hat{\alpha}_T) = 0]},
\]

where the last inequality follows by the domination condition imposed in Assumption 5(iii.a). Because D1,t is stationary, T^{−1/2} max_{1≤t≤T} D1,t = op(1). The second term is bounded in probability: Σ_{t=1}^{T} Σ_{j=1}^{p} 1[Yt − qj,t(·, α̂T) = 0] = Op(1) given Assumption
2(i,ii.a) (see Koenker and Bassett, 1978, for details). Since Hi(0) is the ith element of T^{−1/2} Σ_{t=1}^{T} Σ_{j=1}^{p} ∇qj,t(·, α̂T) ψθj(Yt − qj,t(·, α̂T)), the claim in (14) is proved.

Next, for each α ∈ A, Assumptions 3(ii) and 5(iii.a) ensure the existence and finiteness of the ℓ × 1 vector

\[
\lambda(\alpha) \equiv \sum_{j=1}^{p} E\bigl[\nabla q_{j,t}(\cdot,\alpha)\, \psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\alpha))\bigr]
= \sum_{j=1}^{p} E\Bigl[\nabla q_{j,t}(\cdot,\alpha) \int_{\delta_{j,t}(\alpha,\alpha^*)}^{0} f_{j,t}(s)\, ds\Bigr],
\]

where δj,t(α, α*) ≡ qj,t(·, α) − qj,t(·, α*) and fj,t(s) = (d/ds)Ft(s + qj,t(·, α*)) represents the conditional density of εj,t ≡ Yt − qj,t(·, α*) with respect to the Lebesgue measure. The differentiability and domination conditions provided by Assumptions 3(iii) and 5(iv.a) ensure (e.g., by Bartle, corollary 5.9) the continuous differentiability of λ on A, with

\[
\nabla \lambda(\alpha) = \sum_{j=1}^{p} E\Bigl[\nabla \Bigl\{ \nabla' q_{j,t}(\cdot,\alpha) \int_{\delta_{j,t}(\alpha,\alpha^*)}^{0} f_{j,t}(s)\, ds \Bigr\}\Bigr].
\]

As α* is interior to A by Assumption 4(ii), the mean value theorem applies to each element of λ to yield

\[
\lambda(\alpha) = \lambda(\alpha^*) + Q_0\,(\alpha - \alpha^*), \tag{15}
\]

for α in a convex compact neighborhood of α*, where Q0 is an ℓ × ℓ matrix with (1 × ℓ) rows Qi(ᾱ(i)) = ∇′λi(ᾱ(i)), where ᾱ(i) is a mean value (different for each i) lying on the segment connecting α and α*, i = 1, . . . , ℓ. The chain rule and an application of the Leibniz rule to ∫_{δj,t(α,α*)}^{0} fj,t(s) ds then give Qi(α) = Ai(α) − Bi(α), where

\[
A_i(\alpha) \equiv \sum_{j=1}^{p} E\Bigl[\nabla_i \nabla' q_{j,t}(\cdot,\alpha) \int_{\delta_{j,t}(\alpha,\alpha^*)}^{0} f_{j,t}(s)\, ds\Bigr],
\qquad
B_i(\alpha) \equiv \sum_{j=1}^{p} E\bigl[f_{j,t}(\delta_{j,t}(\alpha,\alpha^*))\, \nabla_i q_{j,t}(\cdot,\alpha)\, \nabla' q_{j,t}(\cdot,\alpha)\bigr].
\]

Assumption 2(iii) and the other domination conditions (those of Assumption 5) then ensure that

\[
A_i(\bar{\alpha}^{(i)}) = O(\|\alpha - \alpha^*\|), \qquad
B_i(\bar{\alpha}^{(i)}) = Q_i^* + O(\|\alpha - \alpha^*\|),
\]

where Q*i ≡ Σ_{j=1}^{p} E[fj,t(0) ∇i qj,t(·, α*) ∇′qj,t(·, α*)]. Letting Q* ≡ Σ_{j=1}^{p} E[fj,t(0) ∇qj,t(·, α*) ∇′qj,t(·, α*)], we obtain

\[
Q_0 = -Q^* + O(\|\alpha - \alpha^*\|). \tag{16}
\]
p
E[∇qj,t (·, α∗ )ψθj (Yt − qj,t (·, α∗ ))]
j=1
=
p
E(E[∇qj,t (·, α∗ )ψθj (Yt − qj,t (·, α∗ ))|Ft−1 ])
j=1
=
p
E(∇qj,t (·, α∗ )E[ψθj (Yt − qj,t (·, α∗ ))|Ft−1 ])
j=1
=
p
E(∇qj,t (·, α∗ )E[ψθj (εj,t )|Ft−1 ])
j=1
= 0, ∗ ∗ ] |Ft−1 ] = 0, by definition of q as E[ψθj (εj,t )|Ft−1 ] = θj − E[1[Yt ≤qj,t j,t , j = 1, . . . , p (see ∗ (2)). Combining λ(α ) = 0 with (15) and (16), we obtain
λ(α) = −Q∗ (α − α∗ ) + O(||α − α∗ ||2 ).
(17)
The next step is to show that

\[
T^{1/2}\lambda(\hat{\alpha}_T) + H_T = o_p(1), \tag{18}
\]

where HT ≡ T^{−1/2} Σ_{t=1}^{T} η*t, with η*t ≡ ηt(α*) and ηt(α) ≡ Σ_{j=1}^{p} ∇qj,t(·, α) ψθj(Yt − qj,t(·, α)). Let ut(α, d) ≡ sup_{τ: ||τ−α||≤d} ||ηt(τ) − ηt(α)||. By the results of Huber (1967) and Weiss (1991), to prove (18) it suffices to show the following: (i) there exist a > 0 and d0 > 0 such that ||λ(α)|| ≥ a||α − α*|| for ||α − α*|| ≤ d0; (ii) there exist b > 0, d0 > 0, and d ≥ 0 such that E[ut(α, d)] ≤ bd for ||α − α*|| + d ≤ d0; and (iii) there exist c > 0, d0 > 0, and d ≥ 0 such that E[ut(α, d)²] ≤ cd for ||α − α*|| + d ≤ d0.

The condition that Q* is positive definite in Assumption 6(i) is sufficient for (i). For (ii), we have that for given (small) d > 0,

\[
\begin{aligned}
u_t(\alpha, d)
&\le \sum_{j=1}^{p} \sup_{\{\tau: \|\tau-\alpha\| \le d\}} \bigl\|\nabla q_{j,t}(\cdot,\tau)\,\psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\tau)) - \nabla q_{j,t}(\cdot,\alpha)\,\psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\alpha))\bigr\| \\
&\le \sum_{j=1}^{p} \sup_{\{\tau: \|\tau-\alpha\| \le d\}} \bigl\|\psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\tau))\bigr\|
\; \sup_{\{\tau: \|\tau-\alpha\| \le d\}} \bigl\|\nabla q_{j,t}(\cdot,\tau) - \nabla q_{j,t}(\cdot,\alpha)\bigr\| \\
&\quad + \sum_{j=1}^{p} \sup_{\{\tau: \|\tau-\alpha\| \le d\}} \bigl\|\psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\alpha)) - \psi_{\theta_j}(Y_t - q_{j,t}(\cdot,\tau))\bigr\|
\; \bigl\|\nabla q_{j,t}(\cdot,\alpha)\bigr\| \\
&\le p\,D_{2,t}\,d + D_{1,t} \sum_{j=1}^{p} 1_{\bigl[\,|Y_t - q_{j,t}(\cdot,\alpha)| \,<\, \sup_{\{\tau: \|\tau-\alpha\| \le d\}} |q_{j,t}(\cdot,\tau) - q_{j,t}(\cdot,\alpha)|\,\bigr]}, \tag{19}
\end{aligned}
\]
using the following: (i) ||ψθj(Yt − qj,t(·, τ))|| ≤ 1; (ii) ||ψθj(Yt − qj,t(·, α)) − ψθj(Yt − qj,t(·, τ))|| ≤ 1[|Yt − qj,t(·,α)| < |qj,t(·,τ) − qj,t(·,α)|]; and (iii) the mean value theorem applied to ∇qj,t(·, τ) and qj,t(·, α). Hence, we have E[ut(α, d)] ≤ pC0 d + 2pC1 f0 d for some constants C0 and C1, given Assumptions 2(iii.a), 5(iii.a), and 5(iv.a). Hence, (ii) holds for b = pC0 + 2pC1 f0 and d0 = 2d. The last condition (iii) can be similarly verified by applying the cr-inequality to (19) with d < 1 (so that d² < d) and using Assumptions 2(iii.a), 5(iii.b), and 5(iv.b). Thus, (18) is verified.

Combining (17) and (18) thus yields

\[
Q^*\, T^{1/2}(\hat{\alpha}_T - \alpha^*) = T^{-1/2} \sum_{t=1}^{T} \eta^*_t + o_p(1).
\]

But {η*t, Ft} is a stationary ergodic martingale difference sequence (MDS). In particular, η*t is measurable-Ft, and E(η*t | Ft−1) = E(Σ_{j=1}^{p} ∇qj,t(·, α*) ψθj(εj,t) | Ft−1) = Σ_{j=1}^{p} ∇qj,t(·, α*) E(ψθj(εj,t) | Ft−1) = 0, as E[ψθj(εj,t) | Ft−1] = 0 for all j = 1, . . . , p. Assumption 5(iii.b) ensures that V* ≡ E(η*t η*t′) is finite. The MDS central limit theorem (e.g., theorem 5.24 of White, 2001) applies, provided V* is positive definite (as ensured by Assumption 6(ii)) and T^{−1} Σ_{t=1}^{T} η*t η*t′ = V* + op(1), which is ensured by the ergodic theorem. The standard argument now gives

\[
V^{*-1/2}\, Q^*\, T^{1/2}(\hat{\alpha}_T - \alpha^*) \;\xrightarrow{\;d\;}\; N(0, I),
\]
which completes the proof.

Proof of Theorem 3
We have

\[
\hat{V}_T - V^* = \Bigl(T^{-1}\sum_{t=1}^{T} \hat{\eta}_t\hat{\eta}_t' - T^{-1}\sum_{t=1}^{T} \eta^*_t\eta^{*\prime}_t\Bigr)
+ \Bigl(T^{-1}\sum_{t=1}^{T} \eta^*_t\eta^{*\prime}_t - E[\eta^*_t\eta^{*\prime}_t]\Bigr),
\]

where η̂t ≡ Σ_{j=1}^{p} ∇q̂j,t ψ̂j,t and η*t ≡ Σ_{j=1}^{p} ∇q*j,t ψ*j,t, with ∇q̂j,t ≡ ∇qj,t(·, α̂T), ψ̂j,t ≡ ψθj(Yt − qj,t(·, α̂T)), ∇q*j,t ≡ ∇qj,t(·, α*), and ψ*j,t ≡ ψθj(Yt − qj,t(·, α*)). Assumptions 1 and 2(i,ii) ensure that {η*t η*t′} is a stationary ergodic sequence. Assumptions 3(i,ii), 4(i.a), and 5(iii) ensure that E[η*t η*t′] < ∞. It follows by the ergodic theorem that T^{−1} Σ_{t=1}^{T} η*t η*t′ − E[η*t η*t′] = op(1). Thus, it suffices to prove T^{−1} Σ_{t=1}^{T} η̂t η̂t′ − T^{−1} Σ_{t=1}^{T} η*t η*t′ = op(1).
The (h, i) element of T^{−1} Σ_{t=1}^{T} η̂t η̂t′ − T^{−1} Σ_{t=1}^{T} η*t η*t′ is

\[
T^{-1} \sum_{t=1}^{T} \sum_{j=1}^{p} \sum_{k=1}^{p}
\bigl\{\hat{\psi}_{j,t}\hat{\psi}_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}
- \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}\bigr\}.
\]

Thus, it will suffice to show that for each (h, i) and (j, k) we have

\[
T^{-1} \sum_{t=1}^{T}
\bigl\{\hat{\psi}_{j,t}\hat{\psi}_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}
- \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}\bigr\} = o_p(1).
\]

By the triangle inequality,

\[
\Bigl| T^{-1} \sum_{t=1}^{T}
\bigl\{\hat{\psi}_{j,t}\hat{\psi}_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}
- \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}\bigr\} \Bigr| \le A_T + B_T,
\]

where

\[
\begin{aligned}
A_T &= T^{-1} \sum_{t=1}^{T}
\bigl|\hat{\psi}_{j,t}\hat{\psi}_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}
- \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}\bigr|, \\
B_T &= T^{-1} \sum_{t=1}^{T}
\bigl|\psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}
- \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}\bigr|.
\end{aligned}
\]

We now show that AT = op(1) and BT = op(1), delivering the desired result. For AT, the triangle inequality gives AT ≤ A1T + A2T + A3T, where

\[
\begin{aligned}
A_{1T} &= T^{-1} \sum_{t=1}^{T} \theta_j \bigl|1_{[\varepsilon_{j,t}\le 0]} - 1_{[\hat{\varepsilon}_{j,t}\le 0]}\bigr|\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}\bigr|, \\
A_{2T} &= T^{-1} \sum_{t=1}^{T} \theta_k \bigl|1_{[\varepsilon_{k,t}\le 0]} - 1_{[\hat{\varepsilon}_{k,t}\le 0]}\bigr|\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}\bigr|, \\
A_{3T} &= T^{-1} \sum_{t=1}^{T} \bigl|1_{[\varepsilon_{j,t}\le 0]}1_{[\varepsilon_{k,t}\le 0]} - 1_{[\hat{\varepsilon}_{j,t}\le 0]}1_{[\hat{\varepsilon}_{k,t}\le 0]}\bigr|\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{k,t}\bigr|.
\end{aligned}
\]

Theorem 2, ensured by Assumptions 1–6, implies that T^{1/2}||α̂T − α*|| = Op(1). This, together with Assumptions 2(iii,iv) and 5(iii.b), enables us to apply the same techniques used in Kim and White (2003) to show A1T = op(1), A2T = op(1), and A3T = op(1), implying AT = op(1).

It remains to show BT = op(1). By the triangle inequality, BT ≤ B1T + B2T,
where

\[
\begin{aligned}
B_{1T} &= \Bigl|T^{-1} \sum_{t=1}^{T} \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}
- E\bigl[\psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}\bigr]\Bigr|, \\
B_{2T} &= \Bigl|T^{-1} \sum_{t=1}^{T} \psi^*_{j,t}\psi^*_{k,t}\,\nabla_h \hat{q}_{j,t}\,\nabla_i \hat{q}_{k,t}
- E\bigl[\psi^*_{j,t}\psi^*_{k,t}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{k,t}\bigr]\Bigr|.
\end{aligned}
\]

Assumptions 1, 2(i,ii), 3(i,ii), 4(i.a), and 5(iii) ensure that the ergodic theorem applies to {ψ*j,t ψ*k,t ∇h q*j,t ∇i q*k,t}, so B1T = op(1). Next, Assumptions 1, 3(i,ii), and 5(iii) ensure that the stationary ergodic ULLN applies to {ψ*j,t ψ*k,t ∇h qj,t(·, α) ∇i qk,t(·, α)}. This and the result of Theorem 1 (α̂T − α* = op(1)) ensure that B2T = op(1) by, e.g., White (1994, corollary 3.8), and the proof is complete.

Proof of Theorem 4
We begin by sketching the proof. We first define

\[
Q_T \equiv (2c_T T)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{p} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\, \nabla q^*_{j,t}\,\nabla' q^*_{j,t},
\]

and then we will show the following:

\[
Q^* - E(Q_T) \xrightarrow{\;p\;} 0, \tag{20}
\]
\[
E(Q_T) - Q_T \xrightarrow{\;p\;} 0, \tag{21}
\]
\[
Q_T - \hat{Q}_T \xrightarrow{\;p\;} 0. \tag{22}
\]

Combining the results above will deliver the desired outcome: Q̂T − Q* →p 0. For (20), one can show, by applying the mean value theorem to Fj,t(cT) − Fj,t(−cT), where Fj,t(c) ≡ ∫ 1{s≤c} fj,t(s) ds, that
\[
E(Q_T) = T^{-1} \sum_{t=1}^{T} \sum_{j=1}^{p} E\bigl[f_{j,t}(\xi_{j,T})\,\nabla q^*_{j,t}\,\nabla' q^*_{j,t}\bigr]
= \sum_{j=1}^{p} E\bigl[f_{j,t}(\xi_{j,T})\,\nabla q^*_{j,t}\,\nabla' q^*_{j,t}\bigr],
\]

where ξj,T is a mean value lying between −cT and cT, and the second equality follows by stationarity. Therefore, the (h, i) element of |E(QT) − Q*| satisfies

\[
\begin{aligned}
\Bigl|\sum_{j=1}^{p} E\bigl[\bigl(f_{j,t}(\xi_{j,T}) - f_{j,t}(0)\bigr)\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t}\bigr]\Bigr|
&\le \sum_{j=1}^{p} E\bigl[\,|f_{j,t}(\xi_{j,T}) - f_{j,t}(0)|\,|\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t}|\,\bigr] \\
&\le \sum_{j=1}^{p} L_0\, E\bigl[\,|\xi_{j,T}|\,|\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t}|\,\bigr] \\
&\le p\,L_0\,c_T\, E[D_{1,t}^2],
\end{aligned}
\]
which converges to zero as cT → 0. The second inequality follows by Assumption 2(iii.b), and the last inequality follows by Assumption 5(iii.b). Therefore, we have the result in (20). To show (21), it suffices simply to apply a LLN for double arrays, e.g., theorem 2 in Andrews (1988).

Finally, for (22), we consider the (h, i) element of |Q̂T − QT|, which is given by

\[
\begin{aligned}
&\Bigl| \frac{1}{2\hat{c}_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} 1_{[-\hat{c}_T \le \hat{\varepsilon}_{j,t} \le \hat{c}_T]}\,\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}
- \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t} \Bigr| \\
&\quad= \frac{c_T}{\hat{c}_T}\Bigl| \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} \bigl(1_{[-\hat{c}_T \le \hat{\varepsilon}_{j,t} \le \hat{c}_T]} - 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\bigr)\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t} \\
&\qquad\qquad + \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\bigl(\nabla_h\hat{q}_{j,t} - \nabla_h q^*_{j,t}\bigr)\nabla_i\hat{q}_{j,t} \\
&\qquad\qquad + \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\nabla_h q^*_{j,t}\bigl(\nabla_i\hat{q}_{j,t} - \nabla_i q^*_{j,t}\bigr) \\
&\qquad\qquad + \Bigl(1 - \frac{\hat{c}_T}{c_T}\Bigr)\frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t} \Bigr| \\
&\quad\le \frac{c_T}{\hat{c}_T}\Bigl[A_{1T} + A_{2T} + A_{3T} + \Bigl(1 - \frac{\hat{c}_T}{c_T}\Bigr)A_{4T}\Bigr],
\end{aligned}
\]

where

\[
\begin{aligned}
A_{1T} &\equiv \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p}
\bigl|1_{[-\hat{c}_T \le \hat{\varepsilon}_{j,t} \le \hat{c}_T]} - 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\bigr| \times \bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr|, \\
A_{2T} &\equiv \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p}
1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\bigl|\nabla_h\hat{q}_{j,t} - \nabla_h q^*_{j,t}\bigr| \times \bigl|\nabla_i\hat{q}_{j,t}\bigr|, \\
A_{3T} &\equiv \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p}
1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\bigl|\nabla_h q^*_{j,t}\bigr| \times \bigl|\nabla_i\hat{q}_{j,t} - \nabla_i q^*_{j,t}\bigr|, \\
A_{4T} &\equiv \frac{1}{2c_T T}\sum_{t=1}^{T}\sum_{j=1}^{p}
1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\bigl|\nabla_h q^*_{j,t}\,\nabla_i q^*_{j,t}\bigr|.
\end{aligned}
\]

It will suffice to show that A1T = op(1), A2T = op(1), A3T = op(1), and A4T = Op(1). Then, because ĉT/cT →p 1, we obtain the desired result: Q̂T − Q* →p 0.
We first show A1T = op(1). It will suffice to show that, for each j,

\[
\frac{1}{2c_T T}\sum_{t=1}^{T}
\bigl|1_{[-\hat{c}_T \le \hat{\varepsilon}_{j,t} \le \hat{c}_T]} - 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\bigr| \times \bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr| = o_p(1).
\]

Let ᾱT lie between α̂T and α*, and put dj,t,T ≡ ||∇qj,t(·, ᾱT)|| × ||α̂T − α*|| + |ĉT − cT|. Then

\[
(2c_T T)^{-1}\sum_{t=1}^{T}
\bigl|1_{[-\hat{c}_T \le \hat{\varepsilon}_{j,t} \le \hat{c}_T]} - 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\bigr| \times \bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr| \le U_T + V_T,
\]

where

\[
U_T \equiv (2c_T T)^{-1}\sum_{t=1}^{T} 1_{[\,|\varepsilon_{j,t} - c_T| < d_{j,t,T}\,]}\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr|,
\qquad
V_T \equiv (2c_T T)^{-1}\sum_{t=1}^{T} 1_{[\,|\varepsilon_{j,t} + c_T| < d_{j,t,T}\,]}\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr|.
\]

It will suffice to show that UT →p 0 and VT →p 0. Let η > 0 and let z be an arbitrary positive number. Then, using reasoning similar to that of Kim and White (2003, lemma 5), one can show that for any η > 0,

\[
\begin{aligned}
P(U_T > \eta)
&\le P\Bigl((2c_T T)^{-1}\sum_{t=1}^{T} 1_{[\,|\varepsilon_{j,t} - c_T| < (\|\nabla q_{j,t}(\cdot,\alpha_0)\|+1)\,z\,c_T\,]}\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr| > \eta\Bigr) \\
&\le \frac{z f_0}{\eta T}\sum_{t=1}^{T} E\bigl\{(\|\nabla q_{j,t}(\cdot,\bar{\alpha}_T)\|+1)\,\bigl|\nabla_h\hat{q}_{j,t}\,\nabla_i\hat{q}_{j,t}\bigr|\bigr\} \\
&\le z f_0\bigl\{E|D_{1,t}^3| + E|D_{1,t}^2|\bigr\}/\eta,
\end{aligned}
\]

where the second inequality is due to the Markov inequality and Assumption 2(iii.a), and the third is due to Assumption 5(iii.c). As z can be chosen arbitrarily small and the remaining terms are finite by assumption, we have UT →p 0. The same argument is used to show VT →p 0. Hence, A1T = op(1) is proved.

Next, we show A2T = op(1). For this, it suffices to show that, for each j,

\[
A_{2T,j} \equiv \frac{1}{2c_T T}\sum_{t=1}^{T} 1_{[-c_T \le \varepsilon_{j,t} \le c_T]}\,\bigl|\nabla_h\hat{q}_{j,t} - \nabla_h q^*_{j,t}\bigr| \times \bigl|\nabla_i\hat{q}_{j,t}\bigr| = o_p(1).
\]

Note that

\[
\begin{aligned}
A_{2T,j}
&\le \frac{1}{2c_T T}\sum_{t=1}^{T} \bigl|\nabla_h\hat{q}_{j,t} - \nabla_h q^*_{j,t}\bigr| \times \bigl|\nabla_i\hat{q}_{j,t}\bigr| \\
&\le \frac{1}{2c_T T}\sum_{t=1}^{T} \bigl\|\nabla^2_h q_{j,t}(\cdot,\tilde{\alpha})\bigr\| \times \|\hat{\alpha}_T - \alpha^*\| \times \bigl|\nabla_i\hat{q}_{j,t}\bigr| \\
&\le \frac{1}{2c_T T}\,\|\hat{\alpha}_T - \alpha^*\| \sum_{t=1}^{T} D_{2,t}\,D_{1,t} \\
&= \frac{1}{2c_T T^{1/2}}\; T^{1/2}\|\hat{\alpha}_T - \alpha^*\|\; \frac{1}{T}\sum_{t=1}^{T} D_{2,t}\,D_{1,t},
\end{aligned}
\]

where α̃ is between α̂T and α*, and ∇²h qj,t(·, α̃) is the first derivative of ∇h qj,t with respect to α, evaluated at α̃. The last expression above is op(1) because (i) T^{1/2}||α̂T − α*|| = Op(1) by Theorem 2, (ii) T^{−1} Σ_{t=1}^{T} D2,t D1,t = Op(1) by the ergodic theorem, and (iii) 1/(cT T^{1/2}) = op(1) by Assumption 7(iii). Hence, A2T = op(1). The other claims, A3T = op(1) and A4T = Op(1), can be proven analogously and more easily; hence, they are omitted. Therefore, we finally have QT − Q̂T →p 0, which, together with (20) and (21), implies that Q̂T − Q* →p 0. The proof is complete.
13

Volatility Regimes and Global Equity Returns

Luis Catão and Allan Timmermann
The stock market burst of the early 2000s and widespread perception of tighter international co-movements in stock prices over the past boom and burst cycle have renewed interest in patterns of equity market volatility and their sources. Three important questions arise in this connection: first, does market volatility display well-characterized temporary switches that can nevertheless be quite persistent? Second, to what extent is such volatility accounted for by global, country- or sector-specific factors, and how do these factor contributions evolve across distinct volatility states (if any)? Third, what implications follow for international risk diversification?

Each of these questions has been addressed in distinct literatures that have built on and been shaped by Rob Engle's pioneering work on volatility modeling. Taking off from Engle (1982a), a first body of literature has looked at the question of whether stock return volatility is time-varying using a variety of econometric models capable of gauging rich asset pricing dynamics, which have been applied to broad stock market indices (see Bollerslev et al., 1994, and Campbell et al., 1997 for comprehensive surveys). It is typically found that stock return volatility has been strongly time-varying, with evidence showing US stock market volatility to have risen in the run-up to the 1987 crash, then dropped to unusually low levels through 1996/97 before rising markedly since, although some controversy remains as to whether stock return volatility has been trendless (Schwert, 1989) or U-shaped over longer horizons (Eichengreen and Tong, 2004). Although the above studies do not decompose such time-varying stock return volatility into its country-, sector-, and firm-specific components, other researchers have used international firm-level data to try to measure the relative importance of these factors.

Acknowledgments: We thank Steve Figlewski, seminar participants at the Festschrift conference in honor of Rob Engle, at the European econometric society meetings, IMF, and Monash University, as well as many other colleagues on both sides of the Atlantic for many helpful comments on earlier drafts. The usual caveats apply. The second author acknowledges support from CREATES, funded by the Danish National Research Foundation.
The employed econometric apparatus in the earlier strand of this literature has generally been much simpler, consisting of cross-sectional regressions of firms' stock returns on a set of country and industry dummies for each period. As these dummies are orthogonal in each cross-section, and their estimated coefficients represent the excess return associated with belonging to a given sector and country relative to a global average (the regression's intercept), the contribution of each factor can then be computed in two ways: either by the time series variance of the coefficients estimated in the successive cross-sectional regressions over fixed or rolling time windows of arbitrarily specified lengths, or by the average absolute sum of the coefficients on the sector and country dummies over the chosen window. On this basis, it has been concluded that the country factor typically explains most of the cross-sectional variation in stock returns, with sector- or industry-specific factors accounting for less than 10% on average (Heston and Rouwenhorst, 1994; Beckers et al., 1992; Griffin and Karolyi, 1998), albeit rising in the more recent period (Brooks and Catão, 2000; Brooks and del Negro, 2002; Cavaglia et al., 2000; L'Her et al., 2002).

Underlying this approach is thus the assumption that factors driving country- and industry-affiliation effects have very limited dynamics, being either constant or changing only very gradually over time. Although more recent work has overcome some of these limitations by using an arbitrage pricing theory (APT) model where APT factors are extracted from the covariance matrix of returns and re-estimated over fixed intervals (Bekaert, Hodrick, and Zhang, 2005), or by using a GARCH framework (Baele and Inghelbrecht, 2005), this strand of the literature has continued to rely on linear factor specifications.

In light of evidence that country factors have been consistently important in driving stock returns, a third strand of the literature has focused on the issue of how they correlate over time and, hence, what scope there is for international equity risk diversification arising from the covariance patterns of equity returns across the various national markets. Unsurprisingly in view of the evidence of time-varying correlations in many financial return series (Engle, 2002a), it has generally been found that such covariances display considerable time variation (King, Sentana, and Wadhwani, 1994; Engle, Ito, and Lin, 1994; Longin and Solnik, 1995; Bekaert and Harvey, 1995; Karolyi and Stultz, 1996). Further, it has also been found that informational proximity and common institutional factors play a role (Portes and Rey, 2005), as do a variety of macroeconomic measures (Engle and Rangel, 2008), particularly at lower frequencies. Although Portes and Rey (2005) use disaggregated data on equity flows to test the informational gravity view, the bulk of this literature on time-varying national market correlations has typically been obtained using broad stock indices. Among other things, this does not allow one to distinguish how much of these correlations are due to "pure" country-specific factors or differences in the sector composition across the various national market indices – an issue that is better addressed with firm-level data and consistent sector classification across countries.
By the same token, the important question of how risk diversification possibilities evolve as the various country and industry factors move into distinct (and not necessarily coincident) volatility regimes is also overlooked in this literature. Against this background, the contribution of this chapter is twofold. First, we develop a dynamically flexible econometric framework that is capable of addressing the above questions about patterns and sources of international equity market volatility. We do so without imposing unwarranted restrictions featured in previous work, including the assumption of a single volatility regime, that the contribution of sector- or
country-specific factors cannot discretely change across regimes, or by making use of arbitrarily specified rolling windows that are well known since Frisch (1933) to be capable of inducing spurious dynamics in the data. There are clear reasons why relaxing these assumptions is important. National policies that influence country risk may display non-gradual changes that have been deemed one culprit for the time-varying nature of stock return volatility (Eichengreen and Tong, 2004) and are also a well-known source of nonlinearities in macroeconomic and financial data (Engel and Hamilton, 1990; Driffill and Sola, 1994). By the same token, widely studied supply shocks such as changes in oil prices are known to have potentially large and discrete effects on equity market volatility, as is the emergence of new technologies. Both are thus potentially capable of radically changing the industry-specific dynamics of stock returns and generating significant differences in the persistence of high versus low volatility regimes, which typically cannot be accounted for by linear models and/or GARCH-type specifications. All this underscores the need for greater flexibility in modeling the factor dynamics driving stock returns.

The approach we propose consists of two steps. In the first step we form "pure" country and "pure" industry (or sector) portfolios from a large cross-section of firms. Such a country–industry decomposition yields an important benefit relative to the practice of measuring international correlations using broad national indices in that it permits disentangling the extent to which a given variation in country X's stock index is due to country X's specific (institutional or policy related) factor or, instead, due to say an information technology (IT) shock that affects the country disproportionally simply because of a large weight of the IT sector in that country. No less importantly, this standard procedure of forming portfolios is instrumental for achieving the dimensionality reduction required in the application of richly parameterized models such as ours to large unbalanced panels. By summarizing the relevant firm-level information into a much smaller and hence manageable number of time series, we can then model the dynamics of returns on the various country and sector portfolios in a possibly nonlinear fashion in a second stage, allowing for regime switches in volatility processes. As shown below, once country-, industry- and global factors are each allowed to be in a different volatility regime at any given point in time, this will permit the characterization of a broader array of diversification possibilities than those considered in previous studies.

The second contribution of the chapter lies in applying this methodology to a uniquely long firm-level data set so as to shed light on the substantive questions pertaining to the distinct strands of the literature referred to above. Our sample spans 13 countries over nearly 30 years, compared with at most 15–20 years or so of data in previous studies.1 As it accounts for around 80% of advanced countries' stock market capitalization towards the end of the period and between 56 and 73% of world stock market capitalization over 1973–2002, our data set is thus broadly representative of global stock market developments. We use this data to answer the following questions.
1 The two studies using the longest time series that we are aware of are Brooks and del Negro (2002) and Bekaert, Hodrick, and Zhang (2005). The sample coverage is 1985:1 to 2002:2 in the former, and 1980:1 to 2003:12 in the latter. Thus, none of these studies incorporates in their estimates the effects of the large shifts in stock market volatility and relative factor contribution following the oil shocks and monetary disturbances of the 1970s, which we document. We return to this issue below.

First, does the "stylized fact" that country factors overwhelmingly dominate sectoral-affiliation effects hold uniformly or change only very slowly/rapidly over time? Second, what is the strength of the
various individual country and sectoral return correlations within the distinct volatility states (if more than one)? In particular, do we observe tighter equity return correlations within certain groups even after allowing for distinct volatility states and distinct sectoral compositions of the various national indices, consistent with informational gravity models of equity holdings? Finally, what are the implications for international portfolio diversification?

The main results are as follows. First, we find strong evidence of nonlinear dynamic dependencies in both sector and country portfolios, indicating that the dynamic "mixtures of normals" model underlying the Markov switching approach is superior to the single state model; we corroborate this evidence through a variety of tests on model residuals as well as by comparing our model's smoothed probability estimates with nonparametric volatility measures spanning our entire sample. Second, we use this purportedly more accurate gauge provided by our model to estimate that the country factor explains about 50% of market volatility over the entire period on average, as opposed to 16% accounted for by the sector- or industry-specific factor. Thus, while this average contribution of the industry factor is substantially lower than that of the country factor, it is well above that estimated in earlier studies (less than 10%). No less importantly, these relative factor contributions are shown to vary widely across volatility states. The sectoral factor contribution typically rises sharply during major industry-specific shocks (such as the oil shocks of the early and late 1970s and mid-1980s, and the IT boom and bust more recently), the direct counterpart of which is a marked drop in the country factor contribution down to the 30–35% range.

Third, we provide a new set of measures of international portfolio correlations. As these are model-implied estimates calculated over the various portfolio pairs and conditional upon the entire time series information up to that point, they are not marred by the biases affecting unconditional estimates discussed in Forbes and Rigobon (2001), nor affected by potential biases arising from relying on a small number of observations from a particular volatility state. We find that such correlations vary markedly across states and, in particular, that when both the global and industry factors are in the high volatility state, correlations between country portfolios typically become tighter than correlations across industry portfolios. A key implication is that the sharp rise in country portfolio correlations during high global volatility states undermines the benefits of cross-border diversification during those periods. This effect is further compounded by the finding that such correlations are generally tighter across certain groups of countries (such as Anglo-Saxon countries and some European markets), thus lending support to an information gravity view of cross-border equity flows à la Portes and Rey (2005). Thus, our findings highlight a potentially important connection between global stock market volatility and both the levels and the geographic distribution of international equity flows – an issue which, to our knowledge, is yet to be explored in the literature on the determinants of international capital flows.

The remainder of the chapter is structured as follows. Section 1 lays out the econometric methodology, whereas Section 2 discusses the data.
The empirical characterization of the single and joint dynamics of country and industry portfolios and of the global factor is provided in Section 3. Section 4 presents variance decomposition results on the relative contribution of each factor to overall stock return volatility. Section 5 provides an economic interpretation of our model characterization of the volatility states, linking it to the existing literature on the determinants of stock market volatility. Section 6 examines
the within-state portfolio correlations and examines the respective implications for global risk diversification. Section 7 concludes.
1. Econometric methodology

1.1. Constructing "pure" country and industry portfolios

Panels of individual stock returns are typically highly unbalanced due to the fact that some firms die whereas others are "born" at some point within any reasonably long time series. To deal with this problem without having to resort to potentially distorting procedures that drop the observations of both newly born and dead firms to balance the panel and make estimation feasible, we present an approach that does not entail losing the information contained in the time series dynamics of individual country or industry stock return series, nor in the whole cross-sectional dimension of the data. Specifically, we propose a two-stage approach where, in the first stage, we follow Heston and Rouwenhorst (1994) and extract the industry and country returns for a given time period through cross-sectional regressions in which each firm's stock return is defined as:

\[
R_{ijkt} = \alpha_t + \beta_{jt} + \gamma_{kt} + \varepsilon_{it}, \tag{1}
\]

where Rijkt stands for the return at time t of the ith firm in the jth industry and the kth country, αt is a global factor common to all firms, βjt is an "excess" return owing to the firm's belonging to industry j, γkt is an "excess" return associated with the firm's location in country k, and εit is an idiosyncratic firm-specific factor.2 This factor structure has been a work-horse in much of the literature on equity market volatility and co-movements, both among studies using firm-level data as well as among those using aggregate country indices (see, e.g., Forbes and Chinn, 2004).3

What has differed among recent studies is whether country and industry factor loads are assumed to be fixed, cross-sectionally varying, time-varying, or both. Although there are advantages of letting the loads vary both across firms and over time as in Bekaert, Hodrick, and Zhang (2005), this choice needs to be traded off against the benefits of modeling the dynamics of factor loadings as a regime-switching process, for letting β and γ vary both cross-sectionally and over time for each firm would be infeasible (even for reasonably long time series such as ours) given the already large number of parameters to be estimated with only time-varying loadings, as discussed below. Clearly, fixing the cross-sectional factor loads has the drawback that individual firms may differ in their degree of exposure to the global factor. However, this cost appears to be less consequential in the present context as we rely on this load homogeneity assumption only to construct country and sector portfolios consisting of hundreds of firms, so that the effect of idiosyncratic factor loadings is largely washed out in the aggregate.4 Further, one other major advantage of doing so – besides that of making the subsequent regime-switching estimation feasible – is to facilitate comparability between our results and those from a large body of the literature, which also uses decomposition schemes based on firm-level homogeneity of factor loads. This allows us to isolate the contribution of our approach relative to earlier studies.

2 We have not subtracted a risk-free rate from returns in equation (1). As the model measures returns relative to the benchmark of the average world portfolio, this means that αt incorporates time-variations in the risk-free world interest rate. As all returns are measured in US dollars, it is thus natural to think of αt as capturing fluctuations in the three-month US T-bill rate (an oft-used proxy for the world risk-free interest rate). If instead we were to measure returns in the various local currencies relative to the respective country "risk-free" rate, variations in the risk-free rates across different countries would be absorbed in γkt.

3 Besides its popularity, there are three particular reasons why we retain this three-factor specification. First, we are interested in country and industry effects in their own right and we know that a model with both country and industry factors systematically dominates one with country- or with industry factors alone (see Bekaert, Hodrick, and Zhang, 2005 for a comparison of models with distinct factors). Second, adding a "small cap" factor or a similar proxy related to firm size has been shown to have negligible effects on country–industry decompositions for a similar sample of international firms (Brooks and Catão, 2000). Third, although regional factors are deemed to be important – in particular for Europe (cf. Baele and Inghelbrecht, 2005) – augmenting the model with various regional dummies would be infeasible due to degrees of freedom limitations arising from the richly parameterized nature of our regime-switching specification in the second stage estimation. As time goes by and longer time series data become available, however, augmenting the model with regional factors could become a feasible extension to this model.

4 This is clear from the evidence presented in Brooks and del Negro (2002) who, after allowing for distinct firm-specific loadings, obtain very similar inferences about the relative factor contributions as those yielded when the homogeneity of factor loadings is imposed. As elsewhere in this literature, their results are based on the assumption of a single volatility state. Another important trade-off of allowing for firm-specific factor loadings (which are estimated by a maximum likelihood algorithm as in their study) is that of having to balance the panel, as the algorithm cannot handle missing observations. As discussed above, by eliminating new firm entry and exit from the sample, this panel-balancing procedure is an additional source of potential estimation biases. More recently, Bekaert, Hodrick and Zhang (2005) compared the performance of several models, including the Heston and Rouwenhorst (1994) model with fixed cross-sectional and time series loads over shorter sub-periods. Although Bekaert, Hodrick and Zhang (2005) find that an APT model incorporating both global and local factors fits the covariance structure of stock returns best, their results are restricted to the linear (one-state) version of the model. No less importantly, their study also finds that differences in the measurement of country versus industry contributions between other (one-state) models and the Heston and Rouwenhorst model are mainly due to the time dimension of the sample, rather than to the assumption of unit beta loadings across firms.
Generalizing to J industries and K countries, equation (1) can be rewritten as:

\[
R_{ijkt} = \alpha_t + \sum_{j=1}^{J} e_{ij\beta}\,\beta_{jt} + \sum_{k=1}^{K} e_{ik\gamma}\,\gamma_{kt} + \varepsilon_{it}, \tag{2}
\]

where eijβ is a dummy variable defined as 1 for the ith firm's industry and zero otherwise, whereas eikγ is a dummy defined as 1 for the ith firm's country and zero otherwise. As each firm can only belong to one industry and one country at a time, the various industry dummies in (2) will be orthogonal to each other within the cross-section. Likewise, the various country dummies will also be orthogonal to each other. We can rewrite (2) more succinctly by defining the excess return vectors as:

\[
\boldsymbol{\beta}_t = \begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \vdots \\ \beta_{Jt} \end{pmatrix},
\qquad
\boldsymbol{\gamma}_t = \begin{pmatrix} \gamma_{1t} \\ \gamma_{2t} \\ \vdots \\ \gamma_{Kt} \end{pmatrix},
\]

so that:

\[
R_{ijkt} = \alpha_t + e_{i\beta}'\,\boldsymbol{\beta}_t + e_{i\gamma}'\,\boldsymbol{\gamma}_t + \varepsilon_{it}. \tag{3}
\]
where eiβ is a J × 1 vector of zeros with a one in the ith firm’s industry, whereas eiγ is a K × 1 vector of zeros with a one in the ith firm’s country. As equation (3) cannot be estimated as it stands because of perfect multicollinearity (as every company belongs to both an industry and a country whereas the industry and country effects can only be measured relative to a benchmark), we follow the literature by imposing the restriction that the weighted sum of industry and country effects equals zero at every point in time; so, the industry and country effects are estimated as deviations from a common benchmark, the return on the global factor captured by the intercept α. Subject to these zero sum restrictions, equation (3) can be estimated using weighted least squares, with each stock return being weighted by its beginning-of-period share xi of the global stock market capitalization (computed as a sum of the market capitalization of all the N firms comprising the cross-section). An advantage of constructing country and industry portfolios this way is that the number of firms in each cross-section can vary and yet the panel of portfolios of country- and sector- or industry-specific excess returns is balanced. This procedure therefore effectively summarizes the relevant information from the original unbalanced panel.
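A minimal sketch of one such cross-sectional regression is given below (in Python); it imposes the two weighted zero-sum restrictions through the standard equality-constrained least squares system. All names (country_industry_effects, the column layout, and the inputs) are illustrative and not the authors' code.

    import numpy as np
    import pandas as pd

    def country_industry_effects(ret, country, industry, w):
        # One cross-section of the weighted dummy regression (1)-(3): returns the global
        # factor alpha_t and the industry (beta) and country (gamma) excess returns,
        # subject to the weighted zero-sum restrictions on each block of effects.
        ind = pd.get_dummies(pd.Series(industry)).to_numpy(dtype=float)   # N x J dummies
        cty = pd.get_dummies(pd.Series(country)).to_numpy(dtype=float)    # N x K dummies
        y = np.asarray(ret, dtype=float)
        w = np.asarray(w, dtype=float)                                    # market-cap weights

        X = np.column_stack([np.ones(len(y)), ind, cty])                  # [alpha | beta | gamma]
        J, K = ind.shape[1], cty.shape[1]

        # weighted zero-sum restrictions: sum_j w_j beta_j = 0 and sum_k w_k gamma_k = 0
        R = np.zeros((2, 1 + J + K))
        R[0, 1:1 + J] = w @ ind
        R[1, 1 + J:] = w @ cty

        XtW = X.T * w                                                     # X' W
        kkt = np.block([[XtW @ X, R.T], [R, np.zeros((2, 2))]])
        rhs = np.concatenate([XtW @ y, np.zeros(2)])
        sol = np.linalg.solve(kkt, rhs)
        alpha, beta, gamma = sol[0], sol[1:1 + J], sol[1 + J:1 + J + K]
        return alpha, beta, gamma

Stacking the estimated betas and gammas over the successive cross-sections delivers the balanced panel of "pure" industry and country portfolio returns used in the second stage.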
1.2. Modeling stock return dynamics

Whereas the earlier literature has not attempted to link the individual industry (βt) and country components (γt) over time, we will allow for such dependencies in these components in a flexible manner that does not impose linearity or serial independence a priori. In doing so, we follow the large empirical literature that has documented the presence of persistent regimes in a variety of financial time series (Ang and Bekaert, 2002; Engel and Hamilton, 1990; Driffill and Sola, 1994; Hamilton, 1988; Kim and Nelson, 1999b; Perez-Quiros and Timmermann, 2000). Typically these studies capture periods of high and low volatility in univariate series or in pairs of series (e.g. Ang and Bekaert, 2002; Perez-Quiros and Timmermann, 2000). In what follows we extend this approach to multi-country/multi-sector portfolios.

Let sαt, sβjt, sγkt be separate state variables driving returns on the global, industry, and country portfolios, respectively. We show in the empirical section that the data justify this assumption. If, furthermore, these state variables are industry- and country-specific, we can write returns on the global, industry and country portfolios as:

\[
\begin{aligned}
\alpha_t &= \mu_{\alpha s_{\alpha t}} + \sigma_{\alpha s_{\alpha t}}\,\varepsilon_{\alpha t}, \\
\beta_{jt} &= \mu_{\beta_j s_{\beta_j t}} + \sigma_{\beta_j s_{\beta_j t}}\,\varepsilon_{\beta_j t}, \\
\gamma_{kt} &= \mu_{\gamma_k s_{\gamma_k t}} + \sigma_{\gamma_k s_{\gamma_k t}}\,\varepsilon_{\gamma_k t}.
\end{aligned} \tag{4}
\]

Suppose, for example, that there are two states for the global return process, so sαt = 1 or sαt = 2. Then the mean of the global return component in any given period, t, is either μα1 or μα2, whereas its volatility is either σα1 or σα2. Similarly, if the jth industry state variable can take two values, sβjt = 1 or sβjt = 2, then the jth industry's mean return at time t is either μβj1 or μβj2 whereas its volatility is either σβj1 or σβj2.
How the state processes alternate between states is obviously important. We follow conventional practice and assume constant state transition probabilities for the global return process as well as for the individual country and industry return processes:

\[
\begin{aligned}
\Pr(S_{\alpha t} = s_\alpha \mid S_{\alpha t-1} = s_\alpha) &= p_{\alpha s_\alpha s_\alpha}, \\
\Pr(S_{\beta_j t} = s_{\beta_j} \mid S_{\beta_j t-1} = s_{\beta_j}) &= p_{\beta_j s_{\beta_j} s_{\beta_j}}, \\
\Pr(S_{\gamma_k t} = s_{\gamma_k} \mid S_{\gamma_k t-1} = s_{\gamma_k}) &= p_{\gamma_k s_{\gamma_k} s_{\gamma_k}}.
\end{aligned} \tag{5}
\]

Here pα11 is the probability that the global return process remains in state 1 if it is already in this state, pβj11 is the probability that the jth industry state variable remains in state 1, and so forth. This means that the regimes are generated by a discrete-state homogeneous Markov chain. We will be interested in studying the state probabilities implied by our models given the current information set, Γt, which comprises all information up to time t, i.e., πsαt = Pr(Sαt = sα | Γt), πsβjt = Pr(Sβjt = sβj | Γt), πsγkt = Pr(Sγkt = sγk | Γt). As we shall see in the empirical section, the time series of these probabilities extracted from the data provide information about high and low volatility states.

Finally, we assume that the innovation terms εαt, εβjt, and εγkt are normally distributed. This implies that the return process will be a mixture of normal random variables, the resulting distribution of which is capable of accommodating features such as skews and fat tails that are frequently found in financial data, c.f. Timmermann (2000). Under this model, the return on the ith company in industry j and country k is affected by separate global, industry and country regimes plus an idiosyncratic error term:

\[
R_{ijkt} = \mu_{\alpha s_{\alpha t}} + \mu_{\beta_j s_{\beta_j t}} + \mu_{\gamma_k s_{\gamma_k t}}
+ \sigma_{\alpha s_{\alpha t}}\,\varepsilon_{\alpha t} + \sigma_{\beta_j s_{\beta_j t}}\,\varepsilon_{\beta_j t} + \sigma_{\gamma_k s_{\gamma_k t}}\,\varepsilon_{\gamma_k t} + \varepsilon_{it}. \tag{6}
\]
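To make the mechanics of (4)-(6) concrete, the sketch below simulates a single firm's return from independent two-state global, industry, and country regimes. The parameter values and the helper names are illustrative only, not estimates from the chapter.

    import numpy as np

    def simulate_chain(T, p_stay, rng):
        # two-state Markov chain (states 0/1); p_stay[s] is the probability of remaining in state s
        s = np.zeros(T, dtype=int)
        for t in range(1, T):
            s[t] = s[t - 1] if rng.uniform() < p_stay[s[t - 1]] else 1 - s[t - 1]
        return s

    def simulate_firm_return(T, mu, sig, p_stay, rng):
        # R_ijkt as in (6): sum of global, industry, and country regime components
        # plus an idiosyncratic firm-specific error
        total = np.zeros(T)
        for f in ("alpha", "beta", "gamma"):
            s = simulate_chain(T, p_stay[f], rng)
            total += mu[f][s] + sig[f][s] * rng.standard_normal(T)
        return total + 0.01 * rng.standard_normal(T)

    rng = np.random.default_rng(1)
    mu = {f: np.array([0.01, -0.02]) for f in ("alpha", "beta", "gamma")}     # illustrative means
    sig = {f: np.array([0.03, 0.08]) for f in ("alpha", "beta", "gamma")}     # low/high volatility
    p_stay = {f: np.array([0.95, 0.90]) for f in ("alpha", "beta", "gamma")}  # persistent regimes
    r = simulate_firm_return(240, mu, sig, p_stay, rng)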
It is possible, however, that the state variable driving the industry and country returns shares an important common component across industries and country returns. This could be induced, for example, by an oil shock, to the extent that the latter tends to have a large differential effect across industries and a far more homogeneous effect across countries. Similarly, one can think of a number of common shocks of political origins, for instance a war or a large-scale terrorist attack, that spread mainly along country lines as opposed to industry lines. If so, a more efficient way to gain information about the underlying state variable is to estimate a multivariate regime-switching model jointly for several portfolios. To account for the possibility that a common state factor is driving the individual industry returns on the one hand and the individual country returns on the other hand, we consider the following model:

\[
\begin{aligned}
\alpha_t &= \mu_{\alpha s_{\alpha t}} + \varepsilon_{\alpha s_{\alpha t}}, \\
\boldsymbol{\beta}_t &= \boldsymbol{\mu}_{\beta s_{\beta t}} + \boldsymbol{\varepsilon}_{\beta s_{\beta t}}, \\
\boldsymbol{\gamma}_t &= \boldsymbol{\mu}_{\gamma s_{\gamma t}} + \boldsymbol{\varepsilon}_{\gamma s_{\gamma t}},
\end{aligned} \tag{7}
\]

where μαsαt is the scalar global mean return in state sαt, μβsβt is a J-vector of industry means in state sβt, and μγsγt is a K-vector of country means in state sγt. Furthermore, the innovations to returns are assumed to be Gaussian with zero mean and state-specific
variances: εαsαt ∼ (0, σ²αsαt), εβsβt ∼ (0, Ωβsβt), εγsγt ∼ (0, Ωγsγt), where σ²αsαt is the scalar variance of the global return in state sαt, Ωβsβt is the J × J variance–covariance matrix of industry returns in state sβt, and Ωγsγt is the K × K variance–covariance matrix of country returns in state sγt. State transitions for this common factor case are still assumed to be time-invariant:

\[
\begin{aligned}
\Pr(S_{\alpha t} = s_\alpha \mid S_{\alpha t-1} = s_\alpha) &= p_{\alpha s_\alpha s_\alpha}, \\
\Pr(S_{\beta t} = s_\beta \mid S_{\beta t-1} = s_\beta) &= p_{\beta s_\beta s_\beta}, \\
\Pr(S_{\gamma t} = s_\gamma \mid S_{\gamma t-1} = s_\gamma) &= p_{\gamma s_\gamma s_\gamma}.
\end{aligned} \tag{8}
\]

The regime-switching model is fully specified by the state transitions (8), the return equations (3) and (7), and the assumed "mixture of normals" density. However, estimation of the model is complicated by the fact that the state variable is unobserved or latent. We deal with this by obtaining maximum likelihood estimates based on the EM algorithm (see Hamilton, 1994, for details).

A major advantage of our common nonlinear factor approach is that it allows us to extract volatility estimates of portfolio strategies involving an arbitrary number of countries or industries in addition to the global component. As discussed in Solnik and Roulet (2000), the standard way to capture time-variation in market volatility and correlations is to use a fixed-length rolling window of, say, 36 or 60 months of returns data and estimate cross-correlations for pairs of countries. This approach has three major disadvantages compared to ours. One is that of not relying on the full data sample, likely leading to imprecise estimates of volatilities and correlations, which typically require relatively large data samples for precise estimation. Second, by construction, as they represent moving averages of volatilities, rolling window estimates cannot capture relatively short-lived volatility bursts that may be important for investment risk. Third, rolling window estimates provide unconditional estimates of volatilities and correlations and do not exploit any dynamic structures in the covariance of portfolio returns, other than indirectly, as the parameter estimates get updated over time. In contrast, the proposed regime-switching framework can capture richer dynamics: whereas the mean and variance of returns are constant within each state, the state probabilities vary over time either gradually (if the filtered state probabilities change slowly) or rapidly (if the filtered state probabilities move more suddenly).
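For a single portfolio with two states, the filtering recursion that underlies the likelihood (and hence the maximum likelihood/EM estimation) can be sketched as follows. The function name and argument layout are illustrative; this is a minimal sketch, not the authors' code.

    import numpy as np

    def hamilton_filter(y, mu, sigma, P):
        # Filtered state probabilities Pr(S_t = j | data up to t) and the log likelihood
        # for a two-state Gaussian regime-switching model with transition matrix P,
        # where P[i, j] = Pr(S_t = j | S_{t-1} = i).
        T = len(y)
        filt = np.zeros((T, 2))
        loglik = 0.0
        pi = np.array([P[1, 0], P[0, 1]])           # stationary distribution of the chain
        pi = pi / pi.sum()
        for t in range(T):
            pred = pi @ P if t > 0 else pi          # one-step-ahead state probabilities
            dens = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
            joint = pred * dens                     # prior state weight times normal density
            lik = joint.sum()
            loglik += np.log(lik)
            pi = joint / lik                        # Bayes update: filtered probabilities
            filt[t] = pi
        return filt, loglik

Maximizing the returned log likelihood over (mu, sigma, P), or iterating EM updates built from these filtered (and smoothed) probabilities, yields the kind of state probability paths discussed above; the multivariate case simply replaces the scalar normal density with a multivariate Gaussian with state-specific covariance matrices.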
2. Data

The data cover monthly total returns and market capitalizations for up to 3,951 firms in developed stock markets over the period February 1973 to February 2002.5

5 Monthly total returns are computed in local currency using data from Datastream/Primark. The return calculation assumes immediate reinvestment of dividends. These local currency returns are converted to US dollars using end-of-month spot exchange rates. The beginning-of-month stock market capitalizations are converted into US dollars using the beginning-of-month dollar price of one unit of local currency. Expressing all returns and market cap data in US dollars implicitly reflects the perspective of a currency unhedged equity investor whose objective is to maximize US dollar returns. To the extent that changes in equity returns overwhelm those associated with currency fluctuations, expressing returns and market caps in the distinct national currencies should not change the thrust of the results, as earlier work using a subset of this data has found (Brooks and Catão, 2000). This is consistent with what other researchers have also found using other data sets (Heston and Rouwenhorst, 1994; Griffin and Karolyi, 1998; Griffin and Stultz, 2001). One possible reason lies in exchange rate hedging used by many large firms in developed countries with extensive international operations, which comprise a sizeable portion of those data sets. Developing country firms were excluded from the sample altogether because none of those reported in the Datastream/Primark data set had sufficiently long series, entailing too short a time span for the respective country portfolio to be included in the estimation of the Markov switching regressions.
coverage spans Australia, Belgium, Canada, Denmark, France, Germany, Ireland, Italy, Japan, Netherlands, Switzerland, the United Kingdom, and the United States. Although data are available for other advanced stock markets (notably Austria, Norway, and Sweden) from the late 1970s/early 1980s, this would entail a shorter estimation period and attendant degrees of freedom constraints would turn the estimation infeasible. The exclusion of emerging markets in particular from our sample does not seem capable of altering the main results. Recent work that includes both mature and emerging markets in (value weighted) regressions for the post-1985 period finds that trends in the relative contribution of country and industry factors are basically the same regardless of whether one includes or excludes the emerging market subsample (Brooks and Cat˜ ao, 2000; Brooks and del Negro, 2002). Firms in these 13 countries are then grouped into one of 11 FTSE industry sectors: resources, basic industries, general industries, cyclical consumer goods, noncyclical consumer goods, cyclical services, noncyclical services, utilities, information technology, financials and others. Although some recent papers argue in favor of a finer industry classification, the level of aggregation used here is sufficient not only because it follows the traditional industry breakdown used by portfolio managers and much of the academic literature, but also because it clearly distinguishes new industries that appear to have distinct time series dynamics of stock returns (such as information technology).6 A desirable feature of these data is that they are a realistic and unbiased representation of the global stock market. As of December 1999, the total capitalization of the sample comes to $26.3 trillion or 80% of stock market capitalization in advanced countries as measured by the IFC yearbook and 73% of the world market capitalization (i.e. including developing countries). Coverage deteriorates somewhat towards the beginning of the sample but because the data comprise the largest and internationally most actively traded firms in key markets such as the United States, Japan, and the United Kingdom throughout, the sample can be deemed as quite representative from the viewpoint of a global investor. It should be noted, however, that the deterioration in coverage reflects two deficiencies of the data set. First, it is subject to survivorship bias, meaning that only firms surviving over the full sample period are covered. Although this bias no doubt affects average rates of return, it does not seem to be too consequential for the analysis returns and market caps in the distinct national currencies should not change the thrust of the results, as earlier work using a subset of this data has found (Brooks and Cat˜ ao, 2000). This is consistent with what other researchers have also found using other data sets (Heston and Rouwenhorst, 1994; Griffin and Karolyi, 1998; Griffin and Stultz, 2001). One possible reason lies in exchange rate hedging used by many large firms in developed countries with extensive international operations, which comprise a sizeable portion of those data sets. Developing country firms were excluded from the sample altogether because none of those reported in the Datastream/Primark data set had sufficiently long series, entailing too short a time span for the respective country portfolio to be included in the estimation of the Markov switching regressions. 
6 Although Griffin and Karolyi (1998) note that a finer industry disaggregation may yield a more accurate measure of industry effects, their main result, the far greater dominance of country-specific effects, hardly changes with the move to a finer industry breakdown. An alternative breakdown that groups IT with media and telecoms under a broader TMT group has been considered in Brooks and Catão (2000) and Brooks and del Negro (2002), who show that the dynamics of the broader TMT group are largely dominated by the IT sector from the 1990s. In the FTSE industry classification used in this chapter, media is grouped under cyclical services and telecommunications under noncyclical services.
Although this bias no doubt affects average rates of return, it does not seem too consequential for the analysis of relative factor contributions to market volatility, which is the central concern of this chapter. This can be gauged from the very small differences between the results obtained by applying the Heston–Rouwenhorst decomposition scheme to a subsample that includes dead firms for the post-1986 period (when such a list is available) and the results from our data, which do not include such firms.7 The second deficiency of the data is that they include only post-merger companies, dropping the companies that enter the merger. The most likely effect of this is to bias the estimates in favor of finding more pronounced global industry effects in the more recent years of the sample; but as the problem applies only to a few firms, it is likely to have a very limited effect on the estimates. On the positive side, our sample stretches over a much longer time period than those of the studies referred to above. This is a crucial advantage for precise estimation of regime-switching processes: as we shall see, most regimes tend to be quite persistent, so identifying them requires a time series as long as the one considered here. No country is represented by fewer than 28 firms on average (Ireland and Denmark), and in the case of large economies such as the US and Japan coverage rises from a minimum of 377 firms at the beginning of the sample (February 1973) to close to 1,000 firms towards its end. This reasonably large time series and cross-sectional dimension of the data probably eliminates any significant distortion in the econometric results arising from the deficiencies mentioned above.
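The currency conversion described in footnote 5 is mechanical and can be illustrated with a short sketch. The function names and example numbers below are hypothetical; the sketch simply assumes a monthly local-currency total return and beginning- and end-of-month dollar prices of one unit of local currency, as in the footnote.

```python
import numpy as np

def usd_total_return(r_local, fx_begin, fx_end):
    """Convert a local-currency total return into a US dollar return.

    r_local  : monthly local-currency total return (dividends reinvested), e.g. 0.02
    fx_begin : beginning-of-month USD price of one unit of local currency
    fx_end   : end-of-month USD price of one unit of local currency
    """
    return (1.0 + r_local) * (fx_end / fx_begin) - 1.0

def usd_market_cap(cap_local, fx_begin):
    """Convert a beginning-of-month market capitalization into US dollars."""
    return cap_local * fx_begin

# example: a 2% local return combined with a 1% depreciation of the local currency
print(usd_total_return(0.02, fx_begin=1.00, fx_end=0.99))  # roughly 0.0098
```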
3. Global stock return dynamics

Table 13.1 presents some summary statistics for the distribution of the country, industry, and world portfolio returns. All country and industry portfolio returns are measured in excess of the world portfolio, so the mean returns on these portfolios are close to zero on average.8 Standard deviations average 4.89% per month for the country portfolios and 2.96% for the industry portfolios, confirming the finding in the literature that, on average, country factors matter more than industry factors for explaining variation in stock returns. Country portfolios tend to be slightly more positively skewed than the industry portfolios whereas, interestingly, returns on the global portfolio are not skewed. There is also strong evidence of excess kurtosis in most of the portfolios. Accordingly, Jarque–Bera tests for normality reject the null of normally distributed returns for all portfolios except Switzerland and Japan.9 This is the type of situation in which mixtures of normals may better capture the underlying return distribution.
7 For a subsample including dead firms, Brooks and del Negro (2002) obtain the following figures for the capitalization-weighted time series variance of country effects (also gauged by the same parameter γkt in the Heston–Rouwenhorst decomposition scheme): 18.47 over 1986:3 to 1990:2; 21.08 over 1990:3 to 1994:2; and 9.12 between 1994:3 and 1998:2. Using the same 4-year fixed windows and a similar group of mature markets but without including dead stocks, our respective estimates are 19.21, 22.10, and 8.80. The differences are thus small and have no discernible effect on trends.
8 The only reason the averages are not exactly equal to zero is that we are reporting arithmetic averages, whereas the world portfolio is based on capitalization-weighted returns.
9 Section 3.3 below reports the results of the normality tests for our fitted model, which show that residuals become broadly normal once conditioned on the regime moments, thus providing strong support for the proposed regime-switching approach.
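For readers who wish to reproduce the kind of statistics reported in Table 13.1, the following sketch computes the sample moments and the Jarque–Bera statistic for a return series. It is a generic implementation of the standard formulas, not the authors' own code, and the simulated input is purely illustrative.

```python
import numpy as np

def summary_stats(r):
    """Mean, standard deviation, skewness, excess kurtosis and the
    Jarque-Bera statistic for a vector of monthly returns (in percent)."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    m, s = r.mean(), r.std(ddof=0)
    z = (r - m) / s
    skew = np.mean(z**3)
    exkurt = np.mean(z**4) - 3.0
    jb = n / 6.0 * (skew**2 + exkurt**2 / 4.0)  # approx. chi2(2); 5% critical value 5.99
    return m, s, skew, exkurt, jb

# illustration with simulated Gaussian returns: skew and excess kurtosis near zero
rng = np.random.default_rng(0)
print(summary_stats(rng.normal(0.0, 4.3, size=349)))
```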
Table 13.1. Summary statistics for the country, industry, and world portfolio returns

                             mean    s.d.    skew   kurtosis
(a) Country portfolios
US                          -0.12    2.77   -0.42    2.48
UK                           0.07    5.07    1.81   14.8
France                       0.10    5.25    0.27    1.32
Germany                     -0.29    5.02   -0.09    0.81
Italy                       -0.12    7.28    0.38    1.71
Japan                        0.11    4.63    0.02    0.58
Canada                      -0.29    3.85   -0.34    0.55
Australia                   -0.15    6.27   -0.25    1.67
Belgium                     -0.22    4.65    0.60    1.87
Denmark                     -0.10    5.32    0.33    1.33
Ireland                      0.18    6.11    0.55    2.72
Netherlands                 -0.12    3.31   -0.04    1.02
Switzerland                 -0.28    4.04   -0.02    0.09
Average                     -0.09    4.89    0.22    2.38
(b) Industry portfolios
Resources                   -0.12    3.74    0.03    0.88
Basic                       -0.19    2.52    0.06    3.71
General industry            -0.05    1.78   -0.40    1.24
Cyclical durables           -0.09    3.24   -0.30    1.22
Noncycl. durables           -0.05    2.45   -0.51    4.27
Cyclical services           -0.06    1.61    0.01    0.68
Noncycl. services           -0.17    3.72    0.88    3.11
Utilities                   -0.28    4.07    0.93    6.46
Information technology       0.18    4.34    0.50    3.01
Financials                   0.00    2.28   -0.16    4.78
Others                      -0.51    2.79    0.21    2.62
Average                     -0.12    2.96    0.11    2.91
(c) World                    1.71    4.34   -0.04    0.79

This table reports descriptive statistics for the country, industry and global portfolios using the decomposition (2) subject to the constraints (3), (4). Returns are measured at the monthly frequency over the period February 1973–February 2002 and are based on a data set covering up to 3,951 firms in developed stock markets.
3.1. Nonlinearity in returns

Previous studies of country and industry effects in international stock returns have been based on the assumption of a single state, so it is important to investigate the validity of this assumption. To determine whether a regime-switching model is appropriate for our analysis, we first verify that two or more states characterize the return generating process of the individual industry and country portfolios. For this purpose we report the outcome of the statistical test proposed by Davies (1977).
Table 13.2. Tests for multiple states

(a) Country portfolios
          US      UK      France   Germany   Italy    Japan    Canada
p value   0.000   0.000   0.000    0.004     0.000    0.005    0.352
          Australia   Belgium   Denmark   Ireland   Netherlands   Switzerland
p value   0.000       0.000     0.000     0.000     0.071         0.341

(b) Industry portfolios
          Resources   Basic   General ind.   Cyc. cons. goods   Noncyc. cons.   Cyc. serv.
p value   0.006       0.000   0.000          0.000              0.000           0.271
          Noncyc. serv.   Utilities   Inf. technology   Financials   Other
p value   0.000           0.000       0.000             0.000        0.000

(c) Global factor
p value   0.000

This table reports Davies' (1977) p values for the test of a single state, accounting for unidentified nuisance parameters under the null hypothesis of a single state. P values below 0.05 indicate the presence of more than one state.
Unlike standard likelihood ratio tests, the Davies test takes into account the problem associated with unidentified nuisance parameters under the null hypothesis of a single regime. The results are shown in Table 13.2. For 10 out of 13 countries and 10 of 11 industries, the null of a single state is rejected at the 1% critical level. Linearity is also strongly rejected for the global portfolio. Hence, there is overwhelming evidence of nonlinear dynamics in the form of multiple regimes in country, industry, and global returns.
These results suggest that there are at least two regimes in the vast majority of return series. However, they do not tell us whether two, three, or even more states are needed to model the return dynamics. To choose among model specifications with multiple states, Table 13.3 reports three standard information criteria that trade off fit (which automatically improves with the number of parameters and thus with the number of underlying states) against parsimony (as measured by the total number of parameters). We report results using the Akaike (AIC), the Schwarz Bayesian (BIC), and the Hannan–Quinn (HQ) information criteria. For the 13 country portfolios, the three criteria unanimously point to a single state for Canada and Switzerland and three states for the UK, and at least two of the three criteria suggest that stock returns in all other countries are better modeled as a two-state process.10 Turning to the industry portfolios, the results are even more homogeneous, with the BIC and HQ criteria selecting a two-state model for nine industries out of 11. At the same time, all three criteria indicate that stock returns in resources are best captured by a three-state model. Only for cyclical services is there considerable disagreement, with the BIC and HQ choosing a single-state model whereas the AIC selects a three-state specification. Finally, for the global portfolio, the AIC and HQ choose a two-state specification, whereas the BIC marginally selects a single-state specification.
10 The finding of a single state for Canada and Switzerland is consistent with the Davies tests in Table 13.2, which could not reject linearity for these two countries.
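As a rough illustration of how the entries of Table 13.3 can be produced, the sketch below computes per-observation AIC, BIC, and HQ values from a maximized log-likelihood. The parameter count for a k-state model and the log-likelihood values used in the example are assumptions for illustration only; the table appears to be on a comparable per-observation scale, but the chapter's exact parameterization may differ.

```python
import numpy as np

def information_criteria(loglik, n_params, T):
    """Akaike, Schwarz-Bayesian and Hannan-Quinn criteria, expressed per
    observation so that smaller values are preferred."""
    aic = (-2.0 * loglik + 2.0 * n_params) / T
    bic = (-2.0 * loglik + n_params * np.log(T)) / T
    hq  = (-2.0 * loglik + 2.0 * n_params * np.log(np.log(T))) / T
    return aic, bic, hq

def n_params_univariate(k):
    """Free parameters of a k-state univariate Gaussian Markov-switching model:
    k means, k variances and k*(k-1) transition probabilities (an assumption;
    the chapter's exact parameterization may differ)."""
    return 2 * k + k * (k - 1)

# example: compare one- and two-state fits on T = 349 monthly observations
for k, ll in [(1, -980.0), (2, -955.0)]:   # hypothetical log-likelihoods
    print(k, information_criteria(ll, n_params_univariate(k), T=349))
```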
Table 13.3. Selection criteria for individual country, industry, and global portfolios

                                   AIC                       BIC                       H-Q
                             k=1    k=2    k=3        k=1    k=2    k=3        k=1    k=2    k=3
(a) Country portfolios
US                          4.882  4.700  4.728      4.904  4.767  4.860      4.891  4.727  4.780
UK                          6.093  5.803  5.720      6.115  5.869  5.852      6.101  5.829  5.773
France                      6.163  6.045  6.050      6.185  6.111  6.182      6.172  6.071  6.102
Germany                     6.074  6.049  6.069      6.096  6.115  6.201      6.083  6.075  6.121
Italy                       6.817  6.731  6.750      6.839  6.798  6.882      6.826  6.758  6.802
Japan                       5.910  5.886  5.894      5.932  5.952  6.027      5.919  5.912  5.947
Canada                      5.543  5.549  5.559      5.565  5.615  5.693      5.552  5.575  5.613
Australia                   6.518  6.435  6.455      6.540  6.501  6.587      6.526  6.461  6.508
Belgium                     5.920  5.856  5.861      5.942  5.923  5.994      5.928  5.883  5.914
Denmark                     6.189  6.133  6.151      6.211  6.199  6.284      6.197  6.160  6.204
Ireland                     6.468  6.327  6.306      6.490  6.393  6.438      6.477  6.354  6.359
Netherlands                 5.241  5.235  5.250      5.263  5.301  5.383      5.250  5.261  5.303
Switzerland                 5.640  5.646  5.660      5.663  5.712  5.793      5.649  5.672  5.713
(b) Industry portfolios
Resources                   5.485  5.462  5.356      5.507  5.528  5.488      5.493  5.488  5.409
Basic                       4.697  4.514  4.490      4.719  4.581  4.623      4.706  4.541  5.543
General industry            4.000  3.893  3.907      4.022  3.959  4.040      4.009  3.920  3.960
Cyclical consumer goods     5.200  5.120  5.110      5.222  5.185  5.242      5.209  5.145  5.163
Noncyclical consumer goods  4.638  4.340  4.359      4.660  4.407  4.492      4.647  4.367  4.412
Cyclical services           3.795  3.798  3.768      3.817  3.865  3.900      3.803  3.825  3.821
Noncyclical services        5.473  5.396  5.373      5.495  5.462  5.506      5.482  5.422  5.426
Utilities                   5.653  5.448  5.463      5.675  5.514  5.595      5.662  5.474  5.516
Information technology      5.783  5.484  5.460      5.805  5.550  5.593      5.792  5.510  5.513
Financials                  4.494  4.225  4.224      4.513  4.292  4.357      4.500  4.252  4.277
Others                      4.898  4.809  4.822      4.921  4.875  4.954      4.907  4.835  4.875
(c) Global                  5.781  5.741  5.749      5.803  5.808  5.881      5.790  5.768  5.802

This table shows the values of various information criteria used to determine whether a single-state (k = 1), a two-state (k = 2), or a three-state (k = 3) model is chosen for the country, industry and global portfolios. AIC gives the value of the Akaike information criterion, BIC the Schwarz Bayesian information criterion and HQ the Hannan–Quinn information criterion. The lowest value for each portfolio indicates the preferred specification.
Overall, therefore, the results in Table 13.3 strongly indicate the presence of two states in the dynamics of the various portfolio returns. Accordingly, the subsequent analysis is based on this specification.
3.2. Joint portfolio dynamics

Addressing the question of the overall importance of industry and country effects requires studying common country and common industry effects. As discussed in Section 2, we do this using a nonlinear dynamic common factor specification, which is distinct from the vast majority of recent work on dynamic factor models (cf. Stock and Watson, 1998).
Table 13.4. Estimation results for the common component models

            Stayer prob.         Ergodic prob.        Duration (months)
            State 1   State 2    State 1   State 2    State 1   State 2    Davies test
Country     0.975     0.976      0.486     0.514      40.1      42.5       0.0000
Industry    0.870     0.962      0.226     0.774      7.7       26.4       0.0000
Global      0.922     0.899      0.565     0.435      12.9      9.9        0.0004

This table reports maximum likelihood estimates of the transition probability parameters of the regime-switching model (9), (10) fitted to the common state model for countries, industries or the global portfolio. The state transition probabilities give the probabilities of remaining in state 1 and state 2, respectively. Steady state or ergodic probabilities provide the average time spent in the two states, whereas the state durations are the average time spent without exiting from the states (in months). The Davies test is for the null of a single state versus the alternative of multiple states.
Unlike that work, our specification does not impose a linear factor structure. This distinction is particularly important when the main interest lies in extracting common factors in the volatility of returns on various portfolios, given the overwhelming empirical evidence of time-varying volatility in stock returns.
We estimate the proposed joint regime-switching model for the return series on the 13 country portfolios and the 11 industry portfolios. To our knowledge, regime-switching models for such large systems of variables have not previously been estimated. The joint estimation of the parameters of a highly nonlinear model for such a large system is a nontrivial exercise. Yet it can yield valuable insights into the joint dynamics of portfolio returns, as discussed below.
Table 13.4 presents estimates of the transition probabilities and average state durations, together with the outcome of the Davies test for multiple states. Volatility estimates are shown in Table 13.5, which also presents results for the global portfolio.11 As expected, the null hypothesis of a linear model with a single state is strongly rejected for the country, industry, and world models. All three information criteria support a two-state model over the single-state model in the case of the joint industry and joint country models, whereas both the AIC and the HQ criterion support the two-state specification over the one-state model for the global return model.
Table 13.4 also shows that the two states identified in country returns have persistence parameters of 0.975 and 0.976, implying long state durations of about 40 and 42 months, respectively. Clearly the model is picking up long-lasting regimes in the common component of the country portfolios. The average volatility is around 4.9% in both states, so for the country component the two states cannot be characterized simply as high- and low-volatility states.
Different results emerge from the parameter estimates for the joint industry model. In the low volatility state (state 2) the average volatility is 2.27%, whereas it is more than twice as high (4.67%) in the high volatility state. Average correlations are negative in the low volatility state and close to zero in the high volatility state. The state transition probabilities for the industry returns listed in Table 13.4, at 0.87 and 0.96, are quite high.
11 As the joint country model has 210 parameters and the joint industry model has 156 parameters (most of which measure the covariances between industry returns in the two states), we do not report all the estimates and instead concentrate on the standard deviations.
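The ergodic probabilities and average durations in Table 13.4 follow mechanically from the "stayer" probabilities of a two-state Markov chain. A minimal sketch, using the standard formulas rather than anything specific to this chapter:

```python
def two_state_summary(p11, p22):
    """Ergodic probabilities and expected durations (in months) of a
    two-state Markov chain from its 'stayer' probabilities p11 and p22."""
    pi1 = (1.0 - p22) / ((1.0 - p11) + (1.0 - p22))   # long-run share of state 1
    pi2 = 1.0 - pi1
    dur1 = 1.0 / (1.0 - p11)                           # expected stay in state 1
    dur2 = 1.0 / (1.0 - p22)
    return pi1, pi2, dur1, dur2

# reproduces the orders of magnitude in Table 13.4
print(two_state_summary(0.975, 0.976))   # country:  ~0.49/0.51, ~40 and ~42 months
print(two_state_summary(0.870, 0.962))   # industry: ~0.23/0.77, ~7.7 and ~26 months
```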
Table 13.5. Volatility estimates for the common component models

(a) Common country component
                        State 1   State 2
US                      3.47      1.85
UK                      3.76      6.04
France                  5.54      4.94
Germany                 5.14      4.84
Italy                   7.56      6.99
Japan                   4.46      4.76
Canada                  3.92      3.75
Australia               6.39      6.13
Belgium                 4.90      4.38
Denmark                 5.14      5.46
Ireland                 5.32      6.75
Netherlands             3.01      3.56
Switzerland             4.16      3.89
Average                 4.88      4.95

(b) Common industry component
                            State 1   State 2
Resources                   5.66      2.99
Basic                       3.96      1.95
General industry            2.60      1.47
Cyclical consumer goods     4.80      2.62
Noncyclical consumer goods  4.25      1.62
Cyclical services           2.01      1.46
Noncyclical services        5.65      2.98
Utilities                   6.69      2.97
Information technology      7.41      2.99
Financials                  3.70      1.68
Others                      2.76      2.79
Average                     4.67      2.27

(c) Global component
                        State 1   State 2
                        5.27      2.67

This table reports maximum likelihood estimates of the volatility parameters of the regime-switching model (9), (10) fitted to the common state model for countries or industries. The models thus extract a nonlinear state variable common across the country or across the industry portfolios.
These transition probabilities imply an average duration of eight months in regime 1 and 26 months in regime 2. Consequently, the steady state probabilities are 23% and 77%, so the industry portfolios spend roughly three times as long in the low volatility regime (state 2) as in the high volatility regime.
Figure 13.1 plots the time series of the smoothed probabilities of the high volatility state identified by the common country and common industry models, as well as by the model for global returns. The high persistence of the common country component stands out. For example, the common country effect stays in the same regime over the period 1986–1997, although this regime is difficult to interpret in terms of high and low volatility. The common industry regime identifies four high volatility periods: around 1974, 1979–1980, a spell from 1986 to September 1987, and the more recent period from late 1997. The global return component follows shorter cyclical movements that are nevertheless well identified by the model. The finding that the global return component is the least persistent factor accords well with the interpretation that it captures a variety of large, common economic shocks typically associated with the global business cycle. In contrast, the common country components are likely to undergo less frequent shifts, as they tend to reflect structural relations that evolve more slowly, especially in countries with relatively stable institutions such as the advanced countries in our data set. The economic interpretation of these results is further discussed in Section 5.
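The smoothed probabilities plotted in Figure 13.1 are the output of a regime-switching filter and smoother. The sketch below implements a textbook two-state Gaussian Hamilton filter followed by a Kim-type smoother; it is a simplified univariate stand-in for the authors' multivariate common-component models, and the example parameters are loosely taken from Tables 13.4 and 13.5 rather than from the actual estimation.

```python
import numpy as np

def smoothed_state_probs(r, mu, sigma, P):
    """Smoothed state probabilities of a two-state Gaussian regime-switching
    model: Hamilton filter forward, Kim smoother backward.

    r     : (T,) returns
    mu    : (2,) state means;  sigma : (2,) state standard deviations
    P     : (2, 2) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
    """
    r, mu, sigma, P = map(np.asarray, (r, mu, sigma, P))
    T = len(r)
    init = np.array([1.0 - P[1, 1], 1.0 - P[0, 0]])
    init = init / init.sum()                # ergodic initial distribution
    pred = np.zeros((T, 2))                 # Pr(s_t | data up to t-1)
    filt = np.zeros((T, 2))                 # Pr(s_t | data up to t)
    for t in range(T):
        pred[t] = init if t == 0 else filt[t - 1] @ P
        dens = np.exp(-0.5 * ((r[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = pred[t] * dens
        filt[t] = joint / joint.sum()
    smooth = np.zeros((T, 2))
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):          # Kim (1994) smoothing recursion
        smooth[t] = filt[t] * (P @ (smooth[t + 1] / pred[t + 1]))
    return smooth                           # columns: state 1, state 2

# illustration with parameters loosely resembling the common industry component
rng = np.random.default_rng(1)
P = np.array([[0.87, 0.13], [0.04, 0.96]])
probs = smoothed_state_probs(rng.normal(0.0, 3.0, 349), [0.0, 0.0], [4.7, 2.3], P)
```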
[Figure 13.1 here: three panels (Global, Common Country, Common Industry) plotting probabilities between 0 and 1 over 1973–2002.]
Fig. 13.1. Smoothed state probabilities for common components (high volatility state).
3.3. Robustness checks

A simple way of gauging the robustness of our estimates is to compare the smoothed probability estimates in the upper panel of Figure 13.1, as well as the associated measures of market volatility (computed as discussed below), with a simple nonparametric measure of global stock return volatility: the intra-month (capitalization weighted) variance of daily stock returns in the 13 countries that we consider. This comparison is plotted in Figure 13.2. Our model appears to do a very good job of picking up the major volatility shifts: the correlation between our model estimates and this high-frequency nonparametric measure is reasonably high at 0.4.
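The nonparametric benchmark used in this comparison is just the within-month variance of daily returns. A minimal sketch, assuming a daily return series indexed by date; the simulated data and the `model_variance_series` name are placeholders.

```python
import numpy as np
import pandas as pd

def intra_month_variance(daily_returns):
    """Within-month variance of daily returns: the nonparametric volatility
    proxy used for the comparison in Figure 13.2.
    `daily_returns` is a pandas Series indexed by trading date."""
    keys = [daily_returns.index.year, daily_returns.index.month]
    return daily_returns.groupby(keys).var()

# illustration with simulated daily data over the sample period
idx = pd.bdate_range("1973-02-01", "2002-02-28")
daily = pd.Series(np.random.default_rng(2).normal(0.0, 1.0, len(idx)), index=idx)
monthly_var = intra_month_variance(daily)
# monthly_var.corr(model_variance_series) would then give the correlation with a
# model-based series of the same length (model_variance_series is hypothetical)
```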
[Figure 13.2 here: monthly series over 1975–2002 of the intra-monthly variance, the MS global variance, and the MS smoothed probability (right axis).]
Fig. 13.2. Actual intra-monthly variance of stock returns and MS estimates.
We also performed a series of tests on the properties of the residuals that attest to the suitability of our model specification. The main results are as follows. For the world portfolio, the coefficient of excess kurtosis goes from 0.79 in the raw return data to −0.10 once the data are normalized by the probability-weighted state means and standard deviations. The Jarque–Bera test statistic for normality (which has a critical value of 5.99) goes from 9.11 to 0.24. For the country portfolios, the average coefficient of excess kurtosis in the raw returns is 2.38; this drops to 0.16 after standardizing by the regime moments. Moreover, the average normality test statistic drops from 297 to 1.53, and the number of rejections falls from 11 to only one once returns are standardized by the regime moments. For the industry portfolios, the average coefficient of excess kurtosis drops from 2.91 to 0.56 and the average value of the normality test declines from 180 to 18.6 upon standardizing by the state moments. The upshot is that the proposed model fits the data well: the residuals from the two-state specification are close to normally distributed for the vast majority of portfolios, despite the very strong non-normality found before accounting for distinct volatility regimes.
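One natural way to standardize returns by the regime moments, as done before re-running the normality tests, is to weight the state means and variances by the smoothed state probabilities. The sketch below follows that logic; the authors' exact normalization may differ in detail.

```python
import numpy as np

def standardize_by_regime_moments(r, probs, mu, sigma):
    """Standardize returns by state-probability-weighted means and standard
    deviations before re-running the normality tests.

    r     : (T,) returns
    probs : (T, 2) smoothed state probabilities
    mu, sigma : (2,) state means and standard deviations
    """
    r, probs, mu, sigma = map(np.asarray, (r, probs, mu, sigma))
    m_t = probs @ mu                      # conditional mean at each date
    s_t = np.sqrt(probs @ sigma ** 2)     # conditional standard deviation
    return (r - m_t) / s_t
```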
4. Variance decompositions

A central question in the literature on the sources of stock return volatility is the relative volatility of geographically and industrially diversified portfolios. To get a first measure of how total market variance evolves over time, we simply sum the global variance, the average country variance, and the average industry variance (all based on conditional moment information reflecting the time-varying state probabilities) as follows:

\sigma_t^2 = \sum_{s_{\alpha t}} \pi_{s_{\alpha t}} \left( \sigma^2_{s_{\alpha t}} + (\mu_{\alpha s_{\alpha t}} - \mu_{\alpha t})^2 \right)
           + \sum_{s_{\beta t}} \pi_{s_{\beta t}} \left( \omega_{\beta t}' \Omega_{\beta s_{\beta t}} \omega_{\beta t} + \omega_{\beta t}' (\mu_{\beta s_{\beta t}} - \mu_{\beta t})^2 \right)
           + \sum_{s_{\gamma t}} \pi_{s_{\gamma t}} \left( \omega_{\gamma t}' \Omega_{\gamma s_{\gamma t}} \omega_{\gamma t} + \omega_{\gamma t}' (\mu_{\gamma s_{\gamma t}} - \mu_{\gamma t})^2 \right),    (9)

where \omega_{\beta t} is the vector of weights of the industry portfolios and \omega_{\gamma t} is the vector of weights of the country portfolios; \mu_{\alpha t} = \sum_{s_{\alpha t}} \pi_{s_{\alpha t}} \mu_{\alpha s_{\alpha t}} is the conditional expectation of the global portfolio return, and \mu_{\beta t} = \sum_{s_{\beta t}} \pi_{s_{\beta t}} \mu_{\beta s_{\beta t}} and \mu_{\gamma t} = \sum_{s_{\gamma t}} \pi_{s_{\gamma t}} \mu_{\gamma s_{\gamma t}} are the J × 1 and K × 1 vectors of conditionally expected returns on the industry and country portfolios, respectively. The first component in (9) accounts for the total variance of the global return component, the second is the value-weighted industry variance, and the third is the value-weighted country variance. Besides accounting for state-dependent covariances, each term contains an extra component arising from variation in the means across states. Notice that this measure of total market variance changes over time because of time variation in the state probabilities.12
12 The squared terms in the variance expression enter due to the binomial nature of the state variable, cf. Timmermann (2000).
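A compact sketch of the calculation in equation (9) may help fix ideas. It takes state probabilities, state means, and state covariance matrices for the global, industry, and country components, together with the capitalization weights, and returns the total systematic variance; all input names are placeholders rather than the authors' notation in code form.

```python
import numpy as np

def market_variance(pi_a, mu_a, sig_a, pi_b, mu_b, cov_b, w_b, pi_g, mu_g, cov_g, w_g):
    """Total systematic variance at one date, in the spirit of equation (9).

    pi_a, mu_a, sig_a : (2,) arrays -- state probabilities, means, std devs of the global factor
    pi_b, mu_b, cov_b : (2,) probs, (2, J) state means, (2, J, J) state covariances (industries)
    w_b               : (J,) industry capitalization weights
    pi_g, mu_g, cov_g, w_g : analogous country inputs of dimension K
    """
    pi_a, mu_a, sig_a = map(np.asarray, (pi_a, mu_a, sig_a))

    def weighted_component(pi, mu, cov, w):
        pi, mu, cov, w = map(np.asarray, (pi, mu, cov, w))
        mbar = pi @ mu                      # conditional mean vector
        total = 0.0
        for s in range(len(pi)):
            total += pi[s] * (w @ cov[s] @ w + w @ (mu[s] - mbar) ** 2)
        return total

    mbar_a = pi_a @ mu_a
    global_part = np.sum(pi_a * (sig_a ** 2 + (mu_a - mbar_a) ** 2))
    return (global_part
            + weighted_component(pi_b, mu_b, cov_b, w_b)    # value-weighted industry variance
            + weighted_component(pi_g, mu_g, cov_g, w_g))   # value-weighted country variance
```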
Figure 13.3 plots the time series of the market volatility measure computed from (9). Volatility varies considerably over time, from a low point around 2.8% to a peak of around 5.5% per month. It was very high around 1974/75, 1980, 1987, 1991, and from late 1997 onward. At these times, the market volatility measure was close to twice as large as during the low volatility spells of the late 1970s and mid-1990s. Recalling that the volatility of the country component does not vary much across the two states, whereas the volatilities of the industry and global components are about twice as high in the high volatility state as in the low volatility state, the figure is easy to understand. Systematic volatility tends to be high when the common industry component and the global component are in the high volatility state at the same time, i.e. in 1974, 1980, 1987, and from 1998 to 2002. Conversely, if they are simultaneously in the low volatility state, systematic volatility will be low.13
The measure of market variance in (9) readily lends itself to a decomposition into its three constituents. Figure 13.4 shows the fraction of total market variance accounted for by the average country, industry, and global components, each scaled by the sum of the three. Time variation in the (average) country fraction is very large, ranging from about two-thirds to about one-third in recent years. In particular, the importance of the country factor has been noticeably lower in periods such as the 1974–1975 oil shock, the 1987 stock market crash, and the information technology boom of the late 1990s. Likewise, the fraction of total market volatility due to the industry component also varies considerably, as shown in the middle panel of Figure 13.4. It rises to about 30% in the immediate aftermath of the two oil shocks of the 1970s (1974 and 1980/81), during the stock market crash of 1987, and during the IT boom and bust cycle from 1997/98 onwards. In the context of the existing literature, the estimated average level in the 10–15% range is slightly higher than the 7% figure of Heston and Rouwenhorst (1994) and more than twice as high as the estimates in Griffin and Karolyi (1998), both based on linear single-state models.14 Figure 13.4 clearly unveils significant changes in the relative importance of the industry factor and shows that its recent rise has in fact been the most persistent of the past 30 years, though not quite to the point where its contribution to volatility has surpassed that of the country factor. As shown in the bottom panel of Figure 13.4, this is partly due to the concomitant rise of the global factor.
13 Overall, our estimates plotted in Figure 13.3 suggest that systematic volatility is nearly trendless: if a trend is fitted to this measure of market variance (which excludes firm-level idiosyncratic variance, as in equation (9)), it is very mildly positive and its statistical significance is quite sensitive to the end-point. A similar inference obtains when applying the nonparametric measure of intra-monthly volatility discussed in Section 3.3 and plotted in Figure 13.2: the slope from regressing this volatility measure on a linear trend is positive (0.0001) but not statistically significant at 5%. This is consistent with Schwert's (1989) finding for the US that market volatility does not display a significant long-term trend. Using aggregate stock price data spanning over a century for various advanced countries, however, Eichengreen and Tong (2004) suggest that stock market volatility displays a U-shape in most countries. It remains to be established whether this result stems from their use of much longer data series, the effects of idiosyncratic variance (which they do not filter out), or from their methodology based on rolling standard deviations of stock price changes and univariate GARCH(1,1) regressions.
14 Griffin and Karolyi (1998) present two sets of estimates, one using a nine-sector breakdown and the other using a 66-industry breakdown. They find that the mean industry factor contribution to total return variance is 2 and 4%, respectively, a lot lower therefore than the above estimates. One possible reason for the lower estimate of Griffin and Karolyi (1998) relative to Heston and Rouwenhorst (1994) as well as ours is the inclusion of emerging markets in their sample. As country-specific shocks have been shown to play a greater role in the determination of stock returns in emerging markets, this is to be expected. However, we show below that much of the difference appears to be model- and time-dependent. Furthermore, Griffin and Karolyi consider a much shorter sample of weekly returns, so differences in estimates are not all that surprising.
[Figure 13.3 here: time series of systematic volatility (roughly 2.8 to 5.3% per month) over 1973–2001.]
Fig. 13.3. Market volatility.
[Figure 13.4 here: three panels showing the variance shares due to the country, industry, and global factors over 1973–2002.]
Fig. 13.4. Decomposition of systematic variance into country, industry and global factors.
The global factor's contribution to overall stock return volatility has risen in recent years, filling some of the gap left by the decline in country-specific volatility.
It is instructive to compare these results with those obtained through the widespread practice of estimating relative contributions by the time series variances of the estimated β_jt and γ_kt, computed over a rolling window. We follow the common practice of using a window length of three years, but also experimented with 4- and 5-year rolling windows and found the trends to be very similar. To facilitate the comparison, Figure 13.5 plots the 3-year rolling window results together with our regime-switching estimates previously plotted in Figure 13.4. Clearly, the rolling window approach smooths out the shifts in factor volatilities and their relative contributions: the respective states become less clearly defined, and the approach overlooks the important spikes associated with the oil shocks of 1973–1974 and 1979–1980.
Finally, we also consider an alternative and complementary measure of the relative significance of the industry and country contributions to portfolio returns, proposed by Griffin and Karolyi (1998). Our two-stage econometric methodology allows us to extend the Griffin–Karolyi decomposition scheme by both letting the relative contributions of each factor vary across states and taking into account the various industry covariances within each state. As in Griffin and Karolyi (1998), let the excess return on the national stock market portfolio of country k (over and above the global portfolio return \hat{\alpha}) be decomposed into country k's unique industry weights times the J industry returns, summed across industries (i.e. \sum_{j=1}^{J} \omega^{\beta}_{jkt} \hat{\beta}_{jt}), plus a "pure" country effect \hat{\gamma}_{kt}:15

R_{kt} - \hat{\alpha}_t = \sum_{j=1}^{J} \omega^{\beta}_{jkt} \hat{\beta}_{jt} + \hat{\gamma}_{kt},    (10)

where \omega^{\beta}_{jkt} is the jth industry's weight in country k. The variance of this excess return conditional on the country state being s_{\gamma t} and the industry state being s_{\beta t} is

Var(R_{kt} - \hat{\alpha}_t \mid s_{\beta t}, s_{\gamma t}) = (\omega^{\beta}_{kt})' \Omega_{\beta s_{\beta}} \omega^{\beta}_{kt} + e_k' \Omega_{\gamma s_{\gamma}} e_k + 2 (\omega^{\beta}_{kt})' Cov(\beta_{jt}, \gamma_{kt} \mid s_{\beta t}, s_{\gamma t}),    (11)

where \omega^{\beta}_{kt} is the J-vector of market capitalization weights of the industries in country k. Similarly, the excess return on the portfolio of industry j (over and above the global portfolio) can be decomposed into industry j's unique country weights times the country returns, summed across countries, plus a pure industry effect \hat{\beta}_{jt}:

R_{jt} - \hat{\alpha}_t = \sum_{k=1}^{K} \omega^{\gamma}_{jkt} \hat{\gamma}_{kt} + \hat{\beta}_{jt},    (12)

where \omega^{\gamma}_{jkt} is the kth country's weight in industry j. The variance of this excess return conditional on the country state being s_{\gamma t} and the industry state being s_{\beta t} is

Var(R_{jt} - \hat{\alpha}_t \mid s_{\beta t}, s_{\gamma t}) = (\omega^{\gamma}_{jt})' \Omega_{\gamma s_{\gamma}} \omega^{\gamma}_{jt} + e_j' \Omega_{\beta s_{\beta}} e_j + 2 (\omega^{\gamma}_{jt})' Cov(\beta_{jt}, \gamma_{kt} \mid s_{\beta t}, s_{\gamma t}),    (13)

15 It is straightforward to show that this decomposition follows from rewriting equation (2) for each individual country portfolio, where the individual firm's weight is the share of that firm in the total market capitalization of the respective country portfolio.
[Figure 13.5 here: three panels (variance due to the country, industry, and global factors) comparing the regime-switching estimates with the rolling window estimates over 1973–2002.]
Fig. 13.5. Comparison of variance decomposition results between the proposed model and the 36-month rolling window approach.
In (13), \omega^{\gamma}_{jt} is the K-vector of market capitalization weights of the countries in industry j.
Part (a) of Table 13.6 reports the time series variances of the "pure" country effects and of the cumulative sum of industry effects in the 13 country portfolios, whereas part (b) reports the time series variances of the pure industry effects and of the cumulative sum of country effects in the 11 industry portfolios. In both cases, these variances are expressed as ratios to the total variance of the excess returns. Their sum is therefore close to, but not exactly equal to, one because of the extra covariance term in (11) and (13) between the industry and country effects. As country volatility does not vary greatly over the two states, to save space Table 13.6 simply presents results separately for the high and low industry volatility states. Although a number of individual country and sector results are of interest in their own right, two findings stand out from the overall means. First, the 3.3% figure reported in the upper right panel is the overall measure of the industry factor contribution in the low industry volatility state, which is well within the range previously estimated by Griffin and Karolyi (1998) (2 and 4%, depending on the level of industry aggregation; see tables 2 and 3 of their paper). Turning to the left panel, however, the same measure yields a much higher estimate of the aggregate industry component in the country portfolios (22.3% on average). In both the high and low industry volatility states, the average pure country volatility accounts for over 90% of total country portfolio volatility; the fact that the right- and left-hand side estimates in part (a) add up to about 120% in the high industry volatility state is due to the larger negative covariance between the pure country and the composite industry effect in that state. Moving to the breakdown of the industry portfolios in the bottom panels of Table 13.6, it is clear that the aggregate contribution of country effects to industry portfolios is also state sensitive, being much lower (17%) in the high industry volatility state than in the low industry volatility state, where it more than doubles (41%). Similarly, the pure industry contribution accounts for 91% of total industry portfolio volatility in the high industry volatility state but only 69% in the low industry volatility state. These results therefore suggest that the decomposition averages reported in previous studies vary considerably over economic states.
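To illustrate the flavor of the decomposition in equations (10)–(11) and Table 13.6, the sketch below computes the shares of a country portfolio's excess-return variance attributable to the pure country effect and to the accumulated industry effects. It uses unconditional variances with fixed weights for simplicity, whereas the chapter conditions on the volatility states and lets the weights vary over time.

```python
import numpy as np

def country_portfolio_shares(gamma_k, betas, w_beta_k):
    """Share of a country portfolio's excess-return variance due to the 'pure'
    country effect and to the accumulated industry effects (cf. eq. (10)).

    gamma_k  : (T,) series of the pure country effect for country k
    betas    : (T, J) series of the J industry effects
    w_beta_k : (J,) industry weights of country k (held fixed here for simplicity)
    """
    gamma_k, betas, w_beta_k = map(np.asarray, (gamma_k, betas, w_beta_k))
    acc_industry = betas @ w_beta_k           # cumulated industry component
    excess = gamma_k + acc_industry           # R_kt - alpha_t in eq. (10)
    v = np.var(excess)
    # the two shares need not sum to one because of the covariance term in (11)
    return np.var(gamma_k) / v, np.var(acc_industry) / v
```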
5. Economic interpretation: Oil, money, and tech shocks

The existence of distinct volatility regimes in stock returns, and of shifts in the factor contributions within them, begs the question of what drives them. Although constructing a multivariate global risk model capable of identifying the various underlying shocks and their propagation into stock prices is beyond the scope of this chapter, it is important to relate the above findings to key global economic developments in light of the existing literature on the drivers of stock market volatility. Doing so also provides an additional check on the reasonableness of our estimates.
A glance at the top and bottom panels of Figure 13.1, as well as at Figure 13.3, suggests one key determinant of the three main spikes in global market volatility identified by our model: 1973–1975, 1979–1980, and 1986–1987. These were periods when the probability of being in the high volatility state for the common industry factor peaked relative to the rest of the sample (bottom panel of Figure 13.1).
Table 13.6. Relative contribution of "pure" country and industry factors to the variance of stock returns

                           High industry volatility state    Low industry volatility state
(a) Country portfolios     Pure country    Acc. industry     Pure country    Acc. industry
US                         0.955           0.091             0.992           0.011
UK                         0.825           0.169             1.010           0.020
France                     1.297           0.114             1.003           0.009
Germany                    0.983           0.153             1.023           0.017
Italy                      0.988           0.102             1.014           0.014
Japan                      0.969           0.112             1.028           0.012
Canada                     0.907           0.213             1.020           0.029
Australia                  0.956           0.212             0.993           0.039
Belgium                    1.092           0.300             1.028           0.033
Denmark                    1.008           0.115             1.033           0.025
Ireland                    0.922           0.227             1.053           0.026
Netherlands                0.626           0.471             0.973           0.107
Switzerland                0.971           0.438             1.033           0.049
Average                    0.974           0.223             1.018           0.033

(b) Industry portfolios    Pure industry   Acc. country      Pure industry   Acc. country
Resources                  0.920           0.161             0.725           0.453
Basic                      0.928           0.080             0.721           0.254
General industry           0.621           0.101             0.684           0.346
Cyclical cons. goods       1.168           0.114             0.941           0.309
Noncycl. cons. goods       0.772           0.138             0.435           0.532
Cyclical services          0.532           0.182             0.594           0.384
Noncycl. services          1.370           0.221             0.708           0.410
Utilities                  1.345           0.060             0.894           0.200
Information technology     0.895           0.059             0.667           0.270
Financials                 1.104           0.128             0.726           0.409
Others                     0.349           0.647             0.511           0.923
Average                    0.910           0.172             0.691           0.408

Part (a) of this table shows the contribution of the "pure" country effect and the cumulated industry effect to the excess return (computed relative to the global return) on the individual country portfolios, using the decomposition equation (11) in this chapter. Part (b) shows the contribution of the "pure" industry effect and the cumulated country effect to the excess return (computed relative to the global return) on the individual industry portfolios, using the decomposition equation (13) in this chapter. The reported figures are ratios of the variance of each component to the variance of their sum (including their covariance).
At the same time, the industry factor's contribution to overall global market volatility rose (middle panel of Figure 13.4). This clearly coincided with large oil shocks: oil prices tripled in 1974, more than doubled in 1979, and fell sharply in 1986, when the spot price of oil reached an in-sample trough below $10/barrel. These periods were marked by substantial short-run volatility in oil prices and by marked shifts in profitability across industries depending on their oil intensity, leading to greater uncertainty about future earnings growth that was also reflected in current stock prices (Guo and Kliesen, 2005; Kilian and Park, 2007).
Our estimates suggest that industry-specific shocks are not the whole story behind the successive ups and downs in global stock return volatility during 1973–2002. Specifically, industry-specific shocks do not seem able to account for two other volatility shifts that we identify: the volatility upturns of 1982–1983 and 1985. Both occasions, however, were marked by substantial volatility in a well-known determinant of stock returns, the short-term (risk-free) interest rate (see Eichengreen and Tong, 2004, and the references therein). Reflecting monetary policy shocks in the US and also in countries like the UK, to which global real interest rates responded by rising to unprecedented levels (see Bernanke and Blinder, 1992 for a discussion of the "exogeneity" of such shocks), volatility in the three-month US Treasury bill market rose sharply during those episodes.16 This is illustrated in Figure 13.6, which plots the intra-month volatility of the three-month Treasury bill yield (calculated from daily data) together with the volatility of returns on the stock market indices of the 13 countries in our sample. The correlation between the two series is reasonably tight at such high frequencies, with a correlation coefficient of 0.42 between end-1981 and end-1985. The fact that these monetary policy shocks were dramatic rather than gradual is consistent with the behavior of the smoothed state probabilities, which display discrete, non-gradual changes in volatility states around those episodes.
Evidence that industry-specific shocks contributed to driving up global stock return volatility from 1997 is also apparent in our estimates, which highlight the higher contribution of the industry factors during that period. Yet a more complex set of circumstances appears to have been at play. To gain insight into them, Figure 13.7(a) plots the smoothed state probabilities for the different industries. Earlier explanations of the rise in volatility and co-movement across mature stock markets during 1997–2001 focus on the IT sector (Brooks and Catão, 2000; Brooks and del Negro, 2002). This is not only because IT stock volatility rose sharply during those years but also because the weight of the sector in the global market portfolio more than doubled, from 10% in mid-1997 to 25% at the market peak in March 2000.
16 Other high global volatility spells that overlap with changes in monetary policy stances and higher money market volatility are observed following the 1998 Russian crisis, the March 2000 stock market crash, and between late 2001 and early 2002, in the wake of the September 11th terrorist attacks. As shown in Figure 13.6, however, none of these money market volatility bouts were of a magnitude similar to those of the late 1970s and early 1980s. In this more recent period, there is also some evidence that monetary policy has become more reactive to the stock market (cf. IMF, 2000; Rigobon and Sack, 2003) and therefore a less independent driver of stock market volatility. This plausibly reflects the much greater weight of the stock market in aggregate wealth from the 1990s (relative to the 1970s and early 1980s), calling for greater attention by policymakers to stock market developments and their impact on aggregate spending and hence on price stability.
[Figure 13.6 here: monthly series over 1975–2002 of the intra-monthly variance of stock returns (left axis) and of the three-month Treasury bill yield (right axis).]
Fig. 13.6. Intra-monthly variance of stock returns and of the three-month Treasury bill yield.
[Figure 13.7(a) here: panels for Resources, Basic, General Industry, Cyclical Consumer Durables, and Non-Cyclical Consumer Durables, plotting probabilities between 0 and 1 over 1973–2002.]
Fig. 13.7(a). Smoothed state probabilities for individual industries (high volatility state).
[Figure 13.7(b) here: panels for Utilities, Information Technology, Financials, Noncyclical Services, and Cyclical Services, plotting probabilities between 0 and 1 over 1973–2002.]
Fig. 13.7(b). Smoothed state probabilities for individual industries (high volatility state).
As discussed by Oliner and Sichel (2000), this coincided with efficiency gains in the information technology sector, an associated shift in relative profitability across industries, and frequent revisions in earnings growth expectations, all of which became tangible, and not so gradually, only in the late 1990s as the sector's weight rose sharply. This is starkly picked up by our smoothed state probability estimates shown in the second panel of Figure 13.7(b). However, our estimates indicate that such a transition into a higher volatility state was not confined to the IT sector. The post-1997 period was also characterized by rising volatility in oil prices: the world oil price tripled between end-1998 and end-2000, before dropping sharply through 2001 and shooting up again subsequently. This is clearly captured by the state probability estimates for the resources industry in Figure 13.7(a). In addition, other industries also witnessed a volatility bout, and these are not limited to media and telecom firms, the industries with closest ties to the IT sector (grouped under cyclical and noncyclical services, respectively; see footnote 6). Some of this generalized rise in volatility no doubt reflects the well-known tightening in world monetary conditions and the financial distress in Asian emerging markets and Russia, which particularly affected the financial services sector (more heavily exposed to those markets; cf. the widely publicized collapse of the US-based hedge fund Long-Term Capital Management), but also large parts of general industry in the US, Japan, and Europe, which exported heavily to those emerging markets (see Forbes and Chinn, 2004 for related evidence). Hence, it is not surprising that the model picks up such a strong common industry component transitioning rapidly from a low to a high volatility state. In a nutshell, whereas previous work has emphasized the role of the IT sector in driving up global market volatility between 1997 and 2002, our industry estimates and the model's allowance for a common industry factor indicate that the phenomenon was more widespread than previous work may suggest.
6. Implications for global portfolio allocation

Our decompositions of market variance are based on the average country- and industry-specific variances. As such, they are statistical measures that do not represent the payoffs from a portfolio investment strategy, since they ignore covariances between the returns on the underlying country, industry, and global equity portfolios. The advantage of such measures is that they provide a clear idea of the relative size of the variances of returns on the three components (global, industry, and country). International investors, however, will be interested in economic measures of volatility and risk that represent feasible investment strategies and hence account for covariances between returns on the different portfolios involved. Changes in these covariances also have important implications: for instance, when such covariances increase, domestic risk becomes less diversifiable, which in turn tends to raise the equity premium and drive up the cost of capital.
The large literature on the links between national stock markets finds that the covariance of (excess) returns between national stock indices displays considerable variation over time (King, Sentana, and Wadhwani, 1994; Engle, Ito, and Lin, 1994; Bekaert and Harvey, 1995; Longin and Solnik, 1995; Karolyi and Stulz, 1996; Hartmann et al.,
2004). In this section, we use firm-level data and the methodology laid out in the previous sections to characterize the behavior of country portfolio covariances. Like King, Sentana, and Wadhwani (1994) and others, we let such time variation in country covariances be driven by an unobserved latent variable but, unlike those authors, we characterize such variations in terms of relatively lengthy historical periods or “states” and allow for differences in industry composition across countries to play a role. Likewise, the same approach is used to characterize the covariance patterns of the various industry portfolios. Because the estimated covariances/correlations within volatility states are conditional upon the entire time series information up to that point, they are not subject to the type of biases affecting unconditional estimates, which have been highlighted by Forbes and Rigobon (2001). Moreover, an important spin-off of the proposed approach that, to the best of our knowledge has not been explored in the literature, is the possibility that the country and industry portfolios may be in different states at a given point in time, thus raising interesting possibilities for risk diversification. To see this, recall that the joint models ((9)–(10)) assume separate state processes for the global return factor (which affects all stocks in every period) and for the country or industry returns. Each of these state variables can be in the high or low volatility state. The return on a geographically diversified portfolio invested in industry j will be αt +βjt , whereas the return on an industrially diversified country portfolio is αt + γkt . For such portfolios there are thus four possible state combinations. For the industry portfolios the four states are:
• high industry volatility, high global volatility (s_βt = 1; s_αt = 1)
• high industry volatility, low global volatility (s_βt = 1; s_αt = 2)
• low industry volatility, high global volatility (s_βt = 2; s_αt = 1)
• low industry volatility, low global volatility (s_βt = 2; s_αt = 2).
The correlation between geographically diversified industry portfolios is likely to vary strongly according to the underlying combination of global and industry state variables. By construction, the global component is common to all stocks. Thus, when the global return variable is in the high volatility state, it will contribute relatively more to variations in the returns of such portfolios and correlations will increase. In contrast, when the global return component is in the low volatility state, correlations between country or industry portfolios will tend to be lower. Similarly, when the industry component is in the low volatility state, the relative significance of the common global return component is larger so that correlations between industry portfolios will be stronger compared to when the industry return process is in the high volatility state. Given the very large differences between volatilities in the high and low volatility states observed for the global and industry portfolios, these effects are likely to give rise to large differences between correlations of geographically diversified industry portfolios in the four possible states. A complication arises when computing these correlations as they depend on the correlation between the global and industry or country portfolio returns. Terms such as
Cov(\alpha_t, \gamma_{kt} \mid s_{\alpha t}, s_{\gamma t}) can be consistently estimated as follows:

\hat{Cov}(\alpha_t, \beta_{jt} \mid s_{\alpha t}, s_{\beta t}) = \frac{\sum_{t=1}^{T} \pi_{s_{\alpha t}} \pi_{s_{\beta t}} (\alpha_t - \hat{\alpha}_{s_{\alpha t}})(\beta_{jt} - \hat{\beta}_{j s_{\beta t}})}{\sum_{t=1}^{T} \pi_{s_{\alpha t}} \pi_{s_{\beta t}}},

\hat{Cov}(\alpha_t, \gamma_{kt} \mid s_{\alpha t}, s_{\gamma t}) = \frac{\sum_{t=1}^{T} \pi_{s_{\alpha t}} \pi_{s_{\gamma t}} (\alpha_t - \hat{\alpha}_{s_{\alpha t}})(\gamma_{kt} - \hat{\gamma}_{k s_{\gamma t}})}{\sum_{t=1}^{T} \pi_{s_{\alpha t}} \pi_{s_{\gamma t}}},    (14)

\hat{Cov}(\beta_{jt}, \gamma_{kt} \mid s_{\beta t}, s_{\gamma t}) = \frac{\sum_{t=1}^{T} \pi_{s_{\beta t}} \pi_{s_{\gamma t}} (\beta_{jt} - \hat{\beta}_{j s_{\beta t}})(\gamma_{kt} - \hat{\gamma}_{k s_{\gamma t}})}{\sum_{t=1}^{T} \pi_{s_{\beta t}} \pi_{s_{\gamma t}}}.
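The estimators in (14) are state-probability-weighted sample covariances. A minimal sketch of one such estimator (a generic helper, not the authors' code):

```python
import numpy as np

def state_weighted_cov(x, y, p_x, p_y, xbar_s, ybar_s):
    """State-probability-weighted covariance estimator as in equation (14).

    x, y           : (T,) factor series (e.g. the global and an industry effect)
    p_x, p_y       : (T,) smoothed probabilities that x and y are in the chosen states
    xbar_s, ybar_s : estimated means of x and y in those states
    """
    x, y, p_x, p_y = map(np.asarray, (x, y, p_x, p_y))
    w = p_x * p_y
    return np.sum(w * (x - xbar_s) * (y - ybar_s)) / np.sum(w)
```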
To investigate just how different these correlations and volatilities are, Table 13.7 presents the estimated covariances and correlations in the two possible states for the industrially diversified country portfolios, whereas Table 13.8 presents the estimated covariances and correlations for the geographically diversified industry portfolios. Variances are presented on the diagonals, covariances above the diagonals, and correlations below the diagonals. For the country portfolios, the key findings are as follows. First, correlations across countries vary substantially, even after allowing for cross-country differences in industry composition. In particular, correlations are generally higher among the Anglo–Saxon countries (notably between Canada, the United States, and the United Kingdom) and lowest between the United States and much of continental Europe and Japan. This result is consistent with the evidence of other studies using different methodologies and measures (see, e.g., IMF, 2000), and our estimates show that it broadly holds across states.17 Second, correlations change markedly across states. As there is not much difference between the variances of country returns in the high and low volatility states, the main driver of the results is whether the global portfolio is in the high or the low volatility state. The average correlation between the country portfolios is 0.30 in the low global volatility state and 0.56 in the high global volatility state. Thus, as other studies using distinct econometric methodologies and data series have found (see, e.g., Solnik and Roulet, 2000; Bekaert, Harvey, and Ng, 2005), the state process for the global return component clearly makes a big difference to the average correlations between the country portfolios: our estimates for 13 mature markets indicate that such correlations almost double in the high volatility state.
Turning to the geographically diversified industry portfolios listed in Table 13.8, a richer menu of possible combinations emerges, as the global high and low volatility states are now supplemented by the high and low industry volatility states. When the industry process is in the high volatility state while the global process is in the low volatility state, the average correlation across industry portfolios is only 0.19. This rises to 0.50 when the industry and global processes are both in the high volatility state or both are in the low volatility state. Finally, when the industry state process is in the low volatility state while the global process is in the high volatility state, the average correlation across the geographically diversified industry portfolios is 0.81. These results show that
17 Among continental European countries, a main exception is the Netherlands, whose country factor volatility is highly correlated with those of the US and the UK.
Table 13.7.
Covariances and correlations for industrially diversified country portfolios
(a) High global volatility state and (b) low global volatility state: 13 × 13 matrices for the US, UK, FR, GE, IT, JP, CA, AU, BE, DE, IR, NL and SW portfolios, with variances on the diagonal, covariances above the diagonal and correlations below the diagonal.
This table reports estimates of the covariances and correlations between the returns on industrially diversified country portfolios. Results are shown for two states: high global volatility and low global volatility. Numbers above the diagonal show covariance estimates, numbers on the diagonal show variance estimates, whereas numbers below the diagonal are estimates of the correlations. The diagonal is in bold for easy reference.
Table 13.8.
Covariances and correlations for geographically diversified industry portfolios
(a) High global volatility, high industry volatility; (b) low global volatility, high industry volatility; (c) high global volatility, low industry volatility; (d) low global volatility, low industry volatility: 11 × 11 matrices for the RESOR, BASIC, GENIN, CYCGD, NCYCG, CYSER, NCYSR, UTILS, ITECH, FINAN and OTHER portfolios, with variances on the diagonal, covariances above the diagonal and correlations below the diagonal.
This table reports estimates of the covariances and correlations between the returns on geographically diversified industry portfolios. Results are shown for four states: high global volatility, high industry volatility (a); low global volatility, high industry volatility (b); high global volatility, low industry volatility (c); low global volatility, low industry volatility (d). Numbers above the diagonal show covariance estimates, numbers on the diagonal show variance estimates, whereas numbers below the diagonal are estimates of the correlations. The diagonal is in bold for easy reference.
the average correlations between geographically diversified industry portfolios vary substantially according to the state process driving the common industry component and the global component, with the non-negligible differences in industry factor correlations within each state being especially magnified in the high industry volatility state.
Finally, we note how different the average volatility level is in the high and low volatility states. For the country portfolios the variation in volatility is, unsurprisingly, somewhat smaller: the mean volatility is 6.4% per month in the high global volatility state and 5.3% in the low volatility state. The mean volatility of the industry portfolios is 6.6% per month in the high industry-, high global volatility state, as compared with an average volatility of these portfolios of 3.6% in the low industry-, low global volatility state.
Important implications follow from these results. Generally, it will be more difficult to reduce equity risk through cross-border diversification when the global volatility process is in the high volatility state. On a macro level, this suggests that international capital flows should be expected to rise or accelerate during periods of low global stock market volatility and to ebb during high volatility states. Moreover, as the gains to cross-border diversification appear to be especially meager when global and industry factors simultaneously lie in a high volatility state, cross-border risk diversification should not be so beneficial during those subperiods. Provided that a country has a sufficiently diversified domestic industrial structure that allows residents to diversify risk along broad industry lines without having to go abroad, international equity flows will tend to be dampened as a result. This raises the question of whether these patterns are actually observed in the data. A systematic testing of this relationship is no mean task – not only because international portfolio investments are driven by a number of effects (see, e.g., Tesar and Werner, 1994), but also because of considerable data problems, even if we were to limit the analysis to US data. Yet, it clearly warrants attention by future research.
7. Conclusion
This chapter has developed a regime-switching modeling framework and applied it to 30 years of firm-level data to address three main questions – whether global stock return volatility displays well-defined volatility regimes, the extent to which equity market volatility is accounted for by global, country- or sector-specific factors, and what implication this has for national equity market correlations and international risk diversification. Our results reveal strong evidence of regimes in international stock returns characterized by different levels of volatility, with the low volatility regime being two to three times more persistent once we average over the various individual country and industry portfolios. The robustness of these results is not only buttressed by a variety of statistical tests on the model's residuals but also by an identification of volatility regimes which is broadly consistent with estimates from alternative nonparametric measures, as well as with what we know about the timing of major shocks deemed to affect stock market volatility. At the very least, this suggests that the single-state assumption underlying the linear models used in previous studies can be improved upon. As discussed above and further stressed below, the inadequacy of the single-state assumption is not only a
technical econometric issue: it also leads one to gloss over important shifts in portfolio diversification possibilities as the various factors switch between high and low volatility regimes over time. To the extent that such states are persistent enough to allow the respective probabilities to be estimated with reasonable precision, market participants should thus be able to reap significant benefits from monitoring the underlying state probabilities as well as cross-country and -industry portfolio correlations within them. As allowing for time-varying factor contributions appears to characterize the data better than linear models with a similar factor structure, this should also deliver more accurate estimates of the various factor contributions. Over the entire period 1973–2002, the country factor contribution averaged some 50% as opposed to 16% for the industry factor. Yet, these contributions have witnessed important variations across volatility states, with the country factor contribution dropping sharply at times, to as low as under 35% around 1973–1974, 1986–1987 and 2000–2001.
Further, as each factor in the model is allowed to be in one of two states at any point in time, we also show that economically interesting state combinations arise, as each combination gives rise to a stronger or weaker pattern of correlations between the various portfolios. In general, the correlations among the various country and industry portfolios are stronger in the high global volatility state than in the low global volatility state; in the case of industrially diversified country-specific portfolios, those correlations nearly double on average. Hence the diversification benefits of investing abroad tend to be considerably smaller when global volatility is high. Further, and also after accounting for different industrial makeups across countries and differences in volatility regimes, pair-wise correlations between the various country portfolios indicate that international diversification benefits are even smaller when confined to certain subsets of countries, such as the Anglo–Saxon nations or within continental Europe.
These results speak directly to various strands of the literature. First, our findings suggest that the apparently greater potential for industry diversification arising from the greater contribution of industry factors to stock return volatility between 1997 and 2002 is likely to be, at least in part, temporary: global stock market volatility typically goes through ups and downs and the contribution of country factors typically declines (rises) during high (low) global volatility states; so, the incentive for global equity diversification along industry lines (as opposed to country lines) should shift accordingly – rather than being a permanent phenomenon. A similar inference follows from the evidence presented in Brooks and del Negro (2002) and Bekaert et al. (2005), using different econometric approaches and distinct levels of industry disaggregation, in their discussion of the IT bubble of the late 1990s.
Second, related inferences can be drawn about the role of "globalization" in driving down the contribution of country factors in stock returns. Although a number of studies have pointed to a decline in home bias and noted that firm operations (particularly among advanced countries) have grown more international (cf.
Diermeier and Solnik, 2001), our finding that the contribution of country factors has fluctuated throughout the period cautions against seeing the 1997–2002 decline as permanent due to "globalization" forces. This is not to exclude that this shift may have a sizeable permanent component, especially for certain country subgroups – notably in Europe (cf. Baele and Inghelbrecht, 2005). What our look at the historical evidence on regime shifts simply suggests is that the estimated longer stay in a low country volatility state, plus the attendant decline in the
contribution from the country factor from the mid-1990s may be picking up temporary as well as permanent factors. More definitively, our estimates also suggest that, in any event, greater globalization has not yet resulted in the industry factor becoming more important than the country factor. More time series data, together with richer structural models that pin down the various sources of market integration, are clearly needed before firmer inferences can be made about the permanency of such shifts.
Third, our estimates of "pure" country portfolio correlations across national stock markets are consistent with the findings of an emerging literature on information frictions or institution-based "gravity" views of international equity flows (Portes and Rey, 2005). Because our estimates are conditional upon the distinct volatility states and use all the time series information up to that point, they are robust to the bias discussed in Forbes and Rigobon (2001) that plagues the unconditional correlations reported in much of the literature. As in such gravity-type models, these conditional correlation estimates clearly show that market correlations tend to be systematically higher – during both high and low volatility states – among Anglo–Saxon countries and across much of continental Europe. An open question for future research is to what extent higher "pure" country correlations among European countries have intensified over time and on a permanent basis since the introduction of the Euro in 1999 and of other regional harmonization policies – rather than resulting from the rise in global factor-driven volatility since 1997.
Finally, interesting implications for the pattern of cross-border capital flows also follow. For one thing, evidence that over the long run average stock return volatility has been mainly determined by country-specific factors suggests that equity risk can be greatly reduced by diversifying portfolios across national borders. This provides an important rationale for the observed dramatic growth in gross cross-border equity holdings, which, in turn, has important implications for international macroeconomic adjustment (Lane and Milesi-Ferretti, 2001). Conversely, however, during high global and high industry volatility regimes (such as those historically associated with large sector-specific shocks such as oil), the risk diversification incentive to cross-border flows is weakened and perceived home bias is strengthened. Even granting that risk diversification is simply one among other potential drivers of cross-border flows, allowing for the existence of such regime shifts may shed new light on the question of what drives the massive swings in the growth of cross-border equity holdings observed in the data. Although further research considering a host of other factors and based on better equity flow data is clearly needed before any robust inference is drawn, this hypothesis emerges as an interesting spin-off of this chapter's results.
14
A Multifactor, Nonlinear, Continuous-Time Model of Interest Rate Volatility Jacob Boudoukh, Christopher Downing, Matthew Richardson, Richard Stanton, and Robert F. Whitelaw
1. Introduction
When one sees so many co-authors on a single chapter and they are not from the hard sciences, the natural question is why? Looking at both the number and quality of contributors to this volume and how much econometric talent Rob Engle has helped nurture through his career, it becomes quite clear that the only way we could participate in Rob's Festschrift is to pool our limited abilities in Financial Econometrics. Given Rob's obvious importance to econometrics, and in particular to finance via his seminal work on volatility, it is quite humbling to be asked to contribute to this volume.
Looking over Rob's career, it is clear how deeply rooted the finance field is in Rob's work. When one thinks of the major empirical papers in the area of fixed income, Fama and Bliss (1987), Campbell and Shiller (1991), Litterman and Scheinkman (1991), Chan, Karolyi, Longstaff and Sanders (1992), Longstaff and Schwartz (1992), Pearson and Sun (1994), Aït-Sahalia (1996b) and Dai and Singleton (2000) come to mind. Yet in terms of citations, all of these papers are dominated by Rob's 1987 paper with David Lilien and Russell Robins, "Estimating Time Varying Risk Premia in the Term Structure: The ARCH-M Model."
Acknowledgments: We would like to thank Tim Bollerslev, John Cochrane, Lars Hansen, Chester Spatt, an anonymous referee and seminar participants at the Engle Festschrift conference, New York Federal Reserve, the Federal Reserve Board, Goldman Sachs, University of North Carolina, U.C. Berkeley, ITAM, the San Diego meetings of the Western Finance Association, the Utah Winter Finance Conference, and the NBER asset pricing program for helpful comments.
In that paper, further expanded upon in Engle and Ng (1993),
the authors present evidence that the yield curve is upward sloping when interest rate volatility is high via an ARCH-M effect on term premia. The result is quite natural to anyone who teaches fixed income and tries to relate the tendency for the term structure to be upward sloping to the duration of the underlying bonds. Given this work by Rob, our contribution to this Festschrift is to explore the relation between volatility and the term structure more closely.
It is now widely believed that interest rates are affected by multiple factors.1 Nevertheless, most of our intuition concerning bond and fixed-income derivative pricing comes from stylized facts generated by single-factor, continuous-time interest rate models. For example, the finance literature is uniform in its view that interest rate volatility is increasing in interest rate levels, though there is some disagreement about the rate of increase (see, for example, Chan, Karolyi, Longstaff and Sanders, 1992; Aït-Sahalia, 1996b; Conley, Hansen, Luttmer and Scheinkman, 1995; Brenner, Harjes and Kroner, 1996; and Stanton, 1997). If interest rates possess multiple factors such as the level and slope of the term structure (Litterman and Scheinkman, 1991), and given the Engle, Lilien and Robins (1987) finding, then this volatility result represents an average over all possible term structure slopes. Therefore, conditional on any particular slope, volatility may be severely misestimated, with serious consequences especially for fixed-income derivative pricing.
Two issues arise in trying to generate stylized facts about the underlying continuous-time, stochastic process for interest rates. First, how do we specify ex ante the drift and diffusion of the multivariate process for interest rates so that it is consistent with the true process underlying the data? Second, given that we do not have access to continuous-time data, but instead to interest rates/bond prices at discretely sampled intervals, how can we consistently infer an underlying continuous-time multivariate process from these data? In single-factor settings, there has been much headway at addressing these issues (see, for example, Aït-Sahalia, 1996a, 2007; Conley, Hansen, Luttmer and Scheinkman, 1995; and Stanton, 1997). Essentially, using variations on nonparametric estimators with carefully chosen moments, the underlying single-factor, continuous-time process can be backed out of interest rate data.
Here, we extend the work of Stanton (1997) to a multivariate setting and provide for the nonparametric estimation of the drift and volatility functions of multivariate stochastic differential equations.2 Basically, we use Milshtein's (1978) approximation schemes for writing expectations of functions of the sample path of stochastic differential equations in terms of the drift, volatility and correlation coefficients. If the expectations are known (or, in our case, estimated nonparametrically) and the functions are chosen appropriately, then the approximations can be inverted to recover the drift, volatility and correlation coefficients. In this chapter, we apply this technique to the short- and long-end of the term structure for a general two-factor, continuous-time diffusion process for interest rates. Our methods can be viewed as a nonparametric alternative to the affine class of multifactor continuous-time interest rate models studied in Longstaff and Schwartz (1992), Duffie and Kan (1996), Dai and Singleton (2000) and Aït-Sahalia and Kimmel (2007b), the quadratic term structure class studied in Ahn, Dittmar and Gallant (2002), and the nonaffine parametric specifications of Andersen and Lund (1997). As an application, we show directly how our model relates to the two-factor model of Longstaff and Schwartz (1992).
Our chapter provides two contributions to the existing literature. First, in estimating this multifactor diffusion process, some new empirical facts emerge from the data. Of particular note, although the volatility of interest rates increases in the level of interest rates, it does so primarily for sharply upward sloping term structures. Thus, the results of previous studies, suggesting an almost exponential relation between interest rate volatility and levels, are due to the term structure on average being upward sloping, and are not a general result per se. Moreover, our volatility result holds for both the short- and long-term rates of interest. Thus, conditional on particular values of the two factors, such as a high short rate of interest and a negative slope of the term structure, the term structure of interest rate volatilities is generally at a lower level across maturities than implied by previous work.
The second contribution is methodological. In this chapter, we provide a way of linking empirical facts and continuous-time modeling techniques so that generating implications for fixed-income pricing is straightforward. Specifically, we use nonparametrically estimated conditional moments of "relevant pricing factors" to build a multifactor continuous-time diffusion process, which can be used to price securities. This process can be considered a generalization of the Longstaff and Schwartz (1992) two-factor model. Using this estimated process, we then show how to value fixed-income securities, in conjunction with an estimation procedure for the functional form of the market prices of risk. As the analysis is performed nonparametrically without any priors on the underlying economic structure, the method provides a unique opportunity to study the economic structure's implications for pricing. Of course, ignoring the last 25 years of term structure theory and placing more reliance on empirical estimation, with its inevitable estimation error, may not be a viable alternative on its own. Nevertheless, we view this approach as helpful for understanding the relation between interest rate modeling and fixed-income pricing.
1 See, for example, Stambaugh (1988), Litterman and Scheinkman (1991), Longstaff and Schwartz (1992), Pearson and Sun (1994), Andersen and Lund (1997), Dai and Singleton (2000) and Collin-Dufresne, Goldstein and Jones (2006), to name a few. This ignores the obvious theoretical reasons for multifactor pricing, as in Brennan and Schwartz (1979), Schaefer and Schwartz (1984), Heath, Jarrow and Morton (1992), Longstaff and Schwartz (1992), Chen and Scott (1992), Duffie and Kan (1996), Ahn, Dittmar and Gallant (2002) and Piazzesi (2005), among others.
2 An exception is Aït-Sahalia (2008) and Aït-Sahalia and Kimmel (2007b), who provide closed form expansions for the log-likelihood function for a wide class of multivariate diffusions.
2. The stochastic behavior of interest rates: Some evidence
In this section, we provide some preliminary evidence for the behavior of interest rates across various points of the yield curve. Under the assumption that there are two interest-rate-dependent state variables, and that these variables are spanned by the short rate of interest and the slope of the term structure, we document conditional means and volatilities of changes in the six-month through five-year rates of interest. The results are generated nonparametrically, and thus impose no structure on the underlying functional forms for the term structure of interest rates.
2.1. Data description Daily values for constant maturity Treasury yields on the three-year, five-year and 10-year US government bond were collected from Datastream over the period January 1983 to December 2006. In addition, three-month, six-month and one-year T-bill rates were obtained from the same source, and converted to annualized yields. This provides us with over 6,000 daily observations. The post-1982 period was chosen because there is considerable evidence that the period prior to 1983 came from a different regime (see, for example, Huizinga and Mishkin, 1986; Sanders and Unal, 1988; Klemkosky and Pilotte, 1992; and Torous and Ball, 1995). In particular, these researchers argue that the October 1979 change in Federal Reserve operating policy led to a once-and-for-all shift in the behavior of the short-term riskless rate. As the Federal Reserve experiment ended in November 1982, it is fairly standard to treat only the post-late-1982 period as stationary. In estimating the conditional distribution of the term structure of interest rates, we employ two conditioning factors. These factors are the short rate of interest – defined here as the three-month yield – and the slope of the term structure – defined as the spread between the 10-year and three-month yields. These variables are chosen to coincide with interest rate variables used in other studies (see Litterman and Scheinkman, 1991; and Chan, Karolyi, Longstaff and Sanders, 1992, among others). Figure 14.1 graphs the time
12
Short Rate Slope
10
8
6
4
2
0
–2
84
86
88
90
92
94
96
98
00
02
04
06
Year
Fig. 14.1. Time series plot of the three-month rate and term structure slope (i.e., the spread between the 10-year and three-month rate) over the 1983–2006 period
Fig. 14.2. Scatter plot of the three-month rate versus the term structure slope over the 1983–2006 period
series of both the short rate and slope. Over the 1983–2006 period, the short rate ranges from 1% to 11%, whereas the slope varies from −1% to 4%. There are several distinct periods of low and high interest rates, as well as slope ranges. As the correlation between the short rate and slope is −0.31, there exists the potential for the two variables combined to possess information in addition to a single factor. Figure 14.2 presents a scatter plot of the short rate and term structure slope. Of particular importance to estimating the conditional distribution of interest rates is the availability of the conditioning data. Figure 14.2 shows that there are two holes in the data ranges, namely at low short rates (i.e., from 1% to 4%) and low slopes (i.e., from −1% to 2%), and at high short rates (i.e., from 9.5% to 11.5%) and low slopes (i.e., from −1% to 1%). This means that the researcher should be cautious in interpreting the implied distribution of interest rates conditional on these values for the short rate and slope.
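To make the construction of the two conditioning variables concrete, the following minimal sketch builds the level and slope factors from a daily panel of constant-maturity yields. The file name and column labels are hypothetical placeholders, not the chapter's actual Datastream layout.

```python
import pandas as pd

# Hypothetical file and column names, assumed for illustration only.
yields = pd.read_csv("cmt_yields.csv", index_col=0, parse_dates=True)

level = yields["y3m"]                   # short rate proxy: three-month yield (annualized, %)
slope = yields["y10y"] - yields["y3m"]  # slope: 10-year minus three-month yield (%)

# On the 1983-2006 sample the text reports a correlation of roughly -0.31.
print(level.corr(slope))
```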
2.2. The conditional distribution of interest rates: A first look
In order to understand the stochastic properties of interest rates, consider conditioning the data on four possible states: (i) high level (i.e., of the short rate)/high slope, (ii) high level/low slope, (iii) low level/low slope, and (iv) low level/high slope. In a generalized
method of moments framework, the moment conditions are:3
$$
E\left[\begin{array}{c}
\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{hr:hs}\right) \times I_{t,hr:hs}\\
\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{hr:ls}\right) \times I_{t,hr:ls}\\
\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{lr:ls}\right) \times I_{t,lr:ls}\\
\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{lr:hs}\right) \times I_{t,lr:hs}\\
\left[\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{hr:hs}\right)^{2}-\left(\sigma^{\tau}_{hr:hs}\right)^{2}\right] \times I_{t,hr:hs}\\
\left[\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{hr:ls}\right)^{2}-\left(\sigma^{\tau}_{hr:ls}\right)^{2}\right] \times I_{t,hr:ls}\\
\left[\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{lr:ls}\right)^{2}-\left(\sigma^{\tau}_{lr:ls}\right)^{2}\right] \times I_{t,lr:ls}\\
\left[\left(\Delta i^{\tau}_{t,t+1}-\mu^{\tau}_{lr:hs}\right)^{2}-\left(\sigma^{\tau}_{lr:hs}\right)^{2}\right] \times I_{t,lr:hs}
\end{array}\right] = 0, \qquad (1)
$$
where $\Delta i^{\tau}_{t,t+1}$ is the change in the $\tau$-period interest rate from $t$ to $t+1$, $\mu^{\tau}_{\cdot|\cdot}$ is the mean change in rates conditional on one of the four states occurring, $\sigma^{\tau}_{\cdot|\cdot}$ is the volatility of the change in rates conditional on these states, and $I_{t,\cdot|\cdot}=1$ if $[\cdot|\cdot]$ occurs, zero otherwise. These moments, $\mu^{\tau}$ and $\sigma^{\tau}$, thus represent coarse estimates of the underlying conditional moments of the distribution of interest rates.
These moment conditions allow us to test a variety of restrictions. First, are $\sigma^{\tau}_{hr:hs}=\sigma^{\tau}_{hr:ls}$ and $\sigma^{\tau}_{lr:hs}=\sigma^{\tau}_{lr:ls}$? That is, does the slope of the term structure help explain volatility at various interest rate levels? Second, similarly, with respect to the mean, are $\mu^{\tau}_{hr:hs}=\mu^{\tau}_{hr:ls}$ and $\mu^{\tau}_{lr:hs}=\mu^{\tau}_{lr:ls}$? Table 14.1 provides estimates of $\mu^{\tau}_{\cdot|\cdot}$ and $\sigma^{\tau}_{\cdot|\cdot}$, and the corresponding test statistics. Note that the framework allows for autocorrelation and heteroskedasticity in the underlying squared interest rate series when calculating the variance–covariance matrix of the estimates. Further, the cross-correlation between the volatility estimates is taken into account in deriving the test statistics.
Several facts emerge from Table 14.1. First, as documented by others (e.g., Chan, Karolyi, Longstaff and Sanders, 1992; and Aït-Sahalia, 1996a), interest rate volatility is increasing in the short rate of interest. Of some interest here, this result holds across the yield curve. That is, conditional on either a low or high slope, volatility is higher for the six-month, one-year, three-year and five-year rates at higher levels of the short rate. Second, the slope also plays an important role in determining interest rate volatility. In particular, at high levels of interest rates, the volatility of interest rates across maturities is much higher at steeper slopes. For example, the six-month and five-year volatilities rise from 5.25 and 6.35 to 7.65 and 7.75 basis points, respectively. Formal tests of the hypothesis $\sigma^{\tau}_{hr:hs}=\sigma^{\tau}_{hr:ls}$ provide 1% level rejections at each of the maturities. There is some evidence in the literature that expected returns on bonds are higher for steeper term structures (see, for example, Fama, 1986, and Boudoukh, Richardson, Smith and Whitelaw, 1999a, 1999b); these papers and the finding of Engle, Lilien and Robins (1987) may provide a link to the volatility result here. Third, the effect of the slope is most important at high interest rate levels. At low short rate levels, though the
3 We define a low (high) level or slope as one that lies below (above) its unconditional mean. Here, this mean is being treated as a known constant, though, of course, it is estimated via the data.
Table 14.1.
Conditional moments of daily interest rate changes (basis points)
                      HR,HS      HR,LS      χ² (HR,HS = HR,LS)   LR,HS      LR,LS      χ² (LR,HS = LR,LS)
Probability           22.76%     26.83%                          27.45%     22.96%

Mean (bp/day)
Six-month             0.032      −0.292     1.747 [0.186]        0.031      0.056      0.033 [0.857]
  (s.e.)              (0.207)    (0.131)                         (0.092)    (0.108)
One-year              0.032      −0.365     2.339 [0.126]        0.060      0.069      0.003 [0.957]
  (s.e.)              (0.215)    (0.147)                         (0.120)    (0.119)
Three-year            0.032      −0.365     1.462 [0.227]        0.063      0.082      0.007 [0.932]
  (s.e.)              (0.211)    (0.158)                         (0.167)    (0.149)
Five-year             −0.070     −0.371     1.304 [0.254]        0.017      0.110      0.170 [0.680]
  (s.e.)              (0.210)    (0.158)                         (0.167)    (0.149)

Volatility (bp/day)
Six-month             7.645      5.248      35.314 [0.000]       3.715      4.006      0.862 [0.353]
  (s.e.)              (0.364)    (0.165)                         (0.163)    (0.265)
One-year              7.928      5.869      24.452 [0.000]       4.879      4.428      2.024 [0.155]
  (s.e.)              (0.367)    (0.187)                         (0.168)    (0.266)
Three-year            7.928      5.869      13.564 [0.000]       6.784      5.520      18.173 [0.000]
  (s.e.)              (0.341)    (0.187)                         (0.180)    (0.229)
Five-year             7.746      6.347      13.567 [0.000]       6.761      5.571      20.389 [0.000]
  (s.e.)              (0.329)    (0.179)                         (0.180)    (0.229)

Average correlation   0.840      0.823                           0.807      0.796
The table presents summary statistics for daily changes in the six-month, one-year, three-year, and five-year yields on US government securities over the 1983–2006 period. Specifically, the table provides the mean, volatility, and cross-correlation of these series, conditional on whether the level of the short rate and slope of the term structure are either low or high (and the associated standard errors). These states of the world are labeled HR and LR for high and low short rates, respectively, and HS and LS for high and low slopes, respectively, and they occur with the probabilities given in the first row of the table. A Wald test that the conditional moments are equal (and the associated p value), holding the short rate state fixed but varying the state for the slope of the term structure, is also provided for the mean and volatility of these series.
volatility at low slopes is less than that at high slopes, the effect is much less pronounced. This is confirmed by the fact that a number of the p values are no longer significant at conventional levels for the test of the hypothesis $\sigma^{\tau}_{lr:hs}=\sigma^{\tau}_{lr:ls}$. Fourth, the conditional means, though not in general reliably estimated, are consistent with existing results in the literature (e.g., Chan, Karolyi, Longstaff and Sanders, 1992; Aït-Sahalia, 1996a; and Stanton, 1997). That is, at low levels of interest rates, the mean tends to be greater than at high interest rates, which can be explained by mean reversion. However, the table also provides an interesting new result, namely that the effect of the slope is of higher magnitude than the level. Further, low slopes tend to be associated with negative changes in rates, whereas high slopes are linked to positive interest rate changes.
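As a rough illustration of how the moment conditions in equation (1) translate into estimates, the sketch below computes the state-conditional means and volatilities with simple indicator weighting. It omits the autocorrelation- and heteroskedasticity-robust standard errors and Wald tests reported in Table 14.1, and the variable names are assumptions for illustration only.

```python
import pandas as pd

def state_conditional_moments(d_yield, level, slope):
    """Coarse conditional moments in the spirit of equation (1).

    d_yield      : daily changes in a tau-period yield (basis points)
    level, slope : conditioning factors aligned with d_yield
    A 'high' level or slope is one above its full-sample mean.
    """
    hi_r, hi_s = level > level.mean(), slope > slope.mean()
    states = {"HR,HS": hi_r & hi_s, "HR,LS": hi_r & ~hi_s,
              "LR,LS": ~hi_r & ~hi_s, "LR,HS": ~hi_r & hi_s}
    rows = {}
    for name, mask in states.items():
        x = d_yield[mask]
        rows[name] = {"prob": mask.mean(),
                      "mean": x.mean(),
                      # sigma solves E[((di - mu)^2 - sigma^2) I] = 0 within the state
                      "vol": x.std(ddof=0)}
    return pd.DataFrame(rows)
```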
2.3. The conditional distribution of interest rates: A closer look
In order to generalize the results of Section 2.2, we employ a kernel estimation procedure for estimating the relation between interest rate changes and components of the term structure of interest rates. Kernel estimation is a nonparametric method for estimating the joint density of a set of random variables. Specifically, given a time series $\Delta i^{\tau}_{t,t+1}$, $i^{r}_{t}$ and $i^{s}_{t}$ (where $i^{r}$ is the level of interest rates, and $i^{s}$ is the slope), generated from an unknown density $f(\Delta i^{\tau}, i^{r}, i^{s})$, then a kernel estimator of this density is
$$\hat{f}(\Delta i^{\tau}, i^{r}, i^{s}) = \frac{1}{T h^{m}} \sum_{t=1}^{T} K\left(\frac{(\Delta i^{\tau}, i^{r}, i^{s}) - (\Delta i^{\tau}_{t,t+1}, i^{r}_{t}, i^{s}_{t})}{h}\right), \qquad (2)$$
where $K(\cdot)$ is a suitable kernel function and $h$ is the window width or smoothing parameter. We employ the commonly used independent multivariate normal kernel for $K(\cdot)$. The other parameter, the window width, is chosen based on the dispersion of the observations. For the independent multivariate normal kernel, Scott (1992) suggests the window width
$$\hat{h}_{i} = k\,\hat{\sigma}_{i}\,T^{-\frac{1}{m+4}},$$
where $\hat{\sigma}_{i}$ is the standard deviation estimate of each variable $z_{i}$, $T$ is the number of observations, $m$ is the dimension of the variables, and $k$ is a scaling constant often chosen via cross-validation. Here, we employ a cross-validation procedure to find the $k$ that provides the right trade-off between the bias and variance of the errors. Across all the data points, we find the $k$s that minimize the mean-squared error between the observed data and the estimated conditional data. This mean-squared error minimization is implemented using a jackknife-based procedure. In particular, the various implied conditional moments at each data point are estimated using the entire sample, except for the actual data point and its nearest neighbors.4 Once the $k$ is chosen, the actual estimation of the conditional distribution of interest rates involves the entire sample, albeit using window widths chosen from partial samples.
To coincide with Section 2.2, we focus on the first two conditional moments of the distribution, and it is possible to show that
$$\hat{\mu}_{\Delta i^{\tau}}(i^{r}, i^{s}) = \sum_{t=1}^{T} w_{t}(i^{r}, i^{s})\,\Delta i^{\tau}_{t}, \qquad (3)$$
$$\hat{\sigma}^{2}_{\Delta i^{\tau}}(i^{r}, i^{s}) = \sum_{t=1}^{T} w_{t}(i^{r}, i^{s})\left(\Delta i^{\tau}_{t} - \hat{\mu}_{\Delta i^{\tau}}(i^{r}, i^{s})\right)^{2}, \qquad (4)$$
4 Due to the serial dependence of the data, we performed the cross-validation omitting 100 observations, i.e., four months in either direction of the particular data point in question. Depending on the moments in question, the optimal ks range from roughly 1.7 to 27.6, which implies approximately twice to 28 times the smoothing parameter of Scott’s asymptotically optimal implied value.
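The following sketch implements the conditional moment estimators in equations (3) and (4) with an independent bivariate normal kernel and Scott-style window widths. For simplicity the scaling constant k is passed in directly rather than chosen by the jackknife cross-validation described above, and the choice of m as the number of conditioning variables is an assumption for this illustration.

```python
import numpy as np

def kernel_conditional_moments(dy, r, s, r0, s0, k=2.0):
    """Nadaraya-Watson estimates of the conditional mean and volatility of dy
    given (level, slope) = (r0, s0), as in equations (3) and (4)."""
    T, m = len(dy), 2                                   # m: number of conditioning variables (assumed)
    h_r = k * r.std(ddof=0) * T ** (-1.0 / (m + 4))     # Scott (1992)-style window widths
    h_s = k * s.std(ddof=0) * T ** (-1.0 / (m + 4))
    u = ((r0 - r) / h_r) ** 2 + ((s0 - s) / h_s) ** 2
    kern = np.exp(-0.5 * u)                             # independent normal kernel (constants cancel)
    w = kern / kern.sum()                               # the weights w_t(r0, s0)
    mu = np.sum(w * dy)                                 # equation (3)
    var = np.sum(w * (dy - mu) ** 2)                    # equation (4)
    return mu, np.sqrt(var)
```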
Fig. 14.3. The volatility of the daily change in the one-year yield (in basis points), conditional on the short rate and the slope of the term structure
The weights, $w_{t}(i^{r}, i^{s})$, where
$$w_{t}(i^{r}, i^{s}) = K\left(\frac{(i^{r}, i^{s}) - (i^{r}_{t}, i^{s}_{t})}{h}\right) \Big/ \sum_{t=1}^{T} K\left(\frac{(i^{r}, i^{s}) - (i^{r}_{t}, i^{s}_{t})}{h}\right),$$
are determined by how close the chosen state, i.e., the particular values of the level and slope, $i^{r}$ and $i^{s}$, is to the observed level and slope of the term structure, $i^{r}_{t}$ and $i^{s}_{t}$. As an illustration, using equation (4), Figure 14.3 provides estimates of the volatility of daily changes in the one-year rate, conditional on the current level of the short rate and the slope of the term structure (i.e., $i^{r}_{t}$ and $i^{s}_{t}$). Although Figure 14.3 represents only the one-year rate, the same effects carry through to the rest of the yield curve and have therefore been omitted for purposes of space. The figure maps these estimates to the relevant range of the data, in particular, for short rates ranging from 3% to 11% and slopes ranging from 0.0% to 3.5%. That said, from Figure 14.2, the data are quite sparse in the joint region of very low rates and low slopes, and thus results must be treated with caution in this range. The main result is that the volatility is maximized at high interest rate levels and high slopes, though the more dramatic changes occur at high slopes.
To see this a little more clearly, Figures 14.4 and 14.5 present cut-throughs of Figure 14.3 across the term structure at short rates of 8.0% and 5.5%, respectively. From Figure 14.2, these levels represent data ranges in which there are many different slopes; thus, conditional on these levels, the estimated relation between the volatility of the six-month, one-year, three-year and five-year rates as a function of the slope is more reliable. Several observations are in order. First, as seen from the figures,
Fig. 14.4. The volatility of the daily change in yields versus the slope, with the short rate fixed at 8%
Fig. 14.5. The volatility of the daily change in yields versus the slope, with the short rate fixed at 5.5%
Fig. 14.6. The volatility of the daily change in yields versus the short rate, with the slope fixed at 2.75%
volatility is increasing in the slope for all maturities, though primarily only for steep term structures, i.e., above 2.0%. Second, volatility is also higher at greater magnitudes of the short rate albeit less noticeably. These results suggest that any valuation requiring a volatility estimate of interest rates should be done with caution. For example, estimating volatility when the term structure is flat relative to upward sloping should lead to quite different point estimates. Third, the relation between volatility and the slope is nonlinear, which, as it turns out in Section 4, will lead to a nonlinear continuous-time diffusion process. This feature can be potentially important as the majority of the multifactor, term structure pricing models are derived from the affine class. Alternatively, Figures 14.6 and 14.7 provide cut-throughs of Figure 14.3 across the term structure at slopes of 2.75% and 1.00%, respectively. These slopes represent data ranges in which there are a number of observations of the interest rate level. The figures show that the estimated relation between the volatility of the six-month, one-year, and especially the three-year and five-year rates as a function of the level depends considerably on the slope of the term structure. For example, the volatility of the six-month and one-year interest rate changes is almost flat over levels of 3.0% to 6.0% at low slopes, whereas it increases roughly 200 basis points at high slopes. Similarly, even at the long end of the yield curve, the increase in volatility is higher at high versus low slopes.
Fig. 14.7. The volatility of the daily change in yields versus the short rate, with the slope fixed at 1%
3. Estimation of a continuous-time multifactor diffusion process
The results of Section 2 suggest that the volatility of changes in the term structure of interest rates depends on at least two factors. Given the importance of continuous-time mathematics in the fixed-income area, the question arises as to how these results can be interpreted in a continuous-time setting. Using data on bond prices, and explicit theoretical pricing models (e.g., Cox, Ingersoll and Ross, 1985), Brown and Dybvig (1986), Pearson and Sun (1994), Gibbons and Ramaswamy (1993) and Dai and Singleton (2000) all estimate parameters of the underlying interest rate process in a fashion consistent with the underlying continuous-time model. These procedures limit themselves, however, to fairly simple specifications. As a result, a literature emerged which allows estimation and inference of fairly general continuous-time diffusion processes using discretely sampled data. Aït-Sahalia (2007) provides a survey of this literature and we provide a quick review here.
First, at a parametric level, there has been considerable effort in the finance literature at working through maximum likelihood applications of continuous-time processes with discretely sampled data, starting with Lo (1988) and continuing more recently with Aït-Sahalia (2002) and Aït-Sahalia and Kimmel (2007a, 2007b). Second, by employing the infinitesimal generators of the underlying continuous-time diffusion processes, Hansen and Scheinkman (1995) and Conley, Hansen, Luttmer and Scheinkman (1995)
construct moment conditions that also make the investigation of continuous-time models possible with discrete-time data. Third, in a nonparametric framework, Aït-Sahalia (1996a, 1996b) develops a procedure for estimating the underlying process for interest rates using discrete data by choosing a model for the drift of interest rates and then nonparametrically estimating its diffusion function. Finally, as an alternative method, Stanton (1997) employs approximations to the true drift and diffusion of the underlying process, and then nonparametrically estimates these approximation terms to back out the continuous-time process (see also Bandi, 2002; Chapman and Pearson, 2000; and Pritsker, 1998). The advantage of this approach is twofold: (i) similar to the other procedures, the data need only be observed at discrete time intervals, and (ii) the drift and diffusion are unspecified, and thus may be highly nonlinear in the state variable.
In this section, we extend the work of Stanton (1997) to a multivariate setting and provide for the nonparametric estimation of the drift and volatility functions of multivariate stochastic differential equations. Similar to Stanton (1997), we use Milshtein's (1978) approximation schemes for writing expectations of functions of the sample path of stochastic differential equations in terms of the drift and volatility coefficients. If the expectations are known (albeit estimated nonparametrically in this chapter) and the functions are chosen appropriately, then the approximations can be inverted to recover the drift and volatility coefficients. We have performed an extensive simulation analysis (not shown here) to better understand the properties of the estimators. Not surprisingly, the standard errors around the estimators, as well as the properties of the goodness of fit, deteriorate as the data become more sparse. Given the aforementioned literature that looks at univariate properties of interest rates, it is important to point out that these properties suffer more in the multivariate setting as we introduce more "Star Trek" regions of the data with the increasing dimensionality of the system. Nevertheless, this point aside, the approximation results here for the continuous-time process carry through to those presented in Stanton (1997); in particular, the first order approximation works well at daily to weekly horizons, while higher order approximations are required for less frequent sampling.
3.1. Drift, diffusion and correlation approximations
Assume that no arbitrage opportunities exist, and that bond prices are functions of two state variables, the values of which can always be inverted from the current level, $R_t$, and a second state variable, $S_t$. Assume that these variables follow the (jointly) Markov diffusion process
$$dR_t = \mu_R(R_t,S_t)\,dt + \sigma_R(R_t,S_t)\,dZ^R_t, \qquad (5)$$
$$dS_t = \mu_S(R_t,S_t)\,dt + \sigma_S(R_t,S_t)\,dZ^S_t, \qquad (6)$$
where the drift, volatility and correlation coefficients (i.e., the correlation between $Z^R$ and $Z^S$) all depend on $R_t$ and $S_t$. Define the vector $X_t = (R_t, S_t)$.
Under suitable restrictions on $\mu$, $\sigma$, and a function $f$, we can write the conditional expectation $E_t[f(X_{t+\Delta})]$ in the form of a Taylor series expansion,5
$$E_t[f(X_{t+\Delta})] = f(X_t) + Lf(X_t)\Delta + \frac{1}{2}L^2 f(X_t)\Delta^2 + \ldots + \frac{1}{n!}L^n f(X_t)\Delta^n + O(\Delta^{n+1}), \qquad (7)$$
where $L$ is the infinitesimal generator of the multivariate process $\{X_t\}$ (see Øksendal, 1985; and Hansen and Scheinkman, 1995), defined by
$$Lf(X_t) = \frac{\partial f(X_t)}{\partial X_t}\,\mu_X(X_t) + \frac{1}{2}\,\mathrm{trace}\!\left[\Sigma(X_t)\,\frac{\partial^2 f(X_t)}{\partial X_t\,\partial X_t'}\right],$$
where
$$\Sigma(X_t) = \begin{pmatrix} \sigma_R^2(R_t,S_t) & \rho(R_t,S_t)\,\sigma_R(R_t,S_t)\,\sigma_S(R_t,S_t) \\ \rho(R_t,S_t)\,\sigma_R(R_t,S_t)\,\sigma_S(R_t,S_t) & \sigma_S^2(R_t,S_t) \end{pmatrix}.$$
Equation (7) can be used to construct numerical approximations to $E_t[f(X_{t+\Delta})]$ in the form of a Taylor series expansion, given known functions $\mu_R$, $\mu_S$, $\rho$, $\sigma_R$ and $\sigma_S$ (see, for example, Milshtein, 1978). Alternatively, given an appropriately chosen set of functions $f(\cdot)$ and nonparametric estimates of $E_t[f(X_{t+\Delta})]$, we can use equation (7) to construct approximations to the drift, volatility and correlation coefficients (i.e., $\mu_R$, $\mu_S$, $\rho$, $\sigma_R$ and $\sigma_S$) of the underlying multifactor, continuous-time diffusion process. The nice feature of this method is that the functional forms for $\mu_R$, $\mu_S$, $\rho$, $\sigma_R$ and $\sigma_S$ are quite general, and can be estimated nonparametrically from the underlying data.
Rearranging equation (7), and using a time step of length $i\Delta$ ($i = 1, 2, \ldots$), we obtain
$$\hat{E}_i(X_t) \equiv \frac{1}{i\Delta}\,E_t\!\left[f(X_{t+i\Delta}) - f(X_t)\right] = Lf(X_t) + \frac{1}{2}L^2 f(X_t)(i\Delta) + \ldots + \frac{1}{n!}L^n f(X_t)(i\Delta)^{n-1} + O(\Delta^n). \qquad (8)$$
From equation (8), each of the $\hat{E}_i$ is a first order approximation to $Lf$,
$$\hat{E}_i(X_t) = Lf(X_t) + O(\Delta).$$
5 For a discussion see, for example, Hille and Phillips (1957), Chapter 11. Milshtein (1974, 1978) gives examples of conditions under which this expansion is valid, involving boundedness of the functions $\mu$, $\sigma$, $f$ and their derivatives. There are some stationary processes for which this expansion does not hold for the functions $f$ that we shall be considering, including processes such as $dx = \mu\,dt + x^3\,dZ$, which exhibit "volatility induced stationarity" (see Conley, Hansen, Luttmer and Scheinkman, 1995). However, any process for which the first order Taylor series expansion fails to hold (for linear $f$) will also fail if we try to use the usual numerical simulation methods (e.g. Euler discretization). This severely limits their usefulness in practice.
Now consider forming linear combinations of these approximations, $\sum_{i=1}^{N}\alpha_i\hat{E}_i(X_t)$. That is, from equation (8),
$$\sum_{i=1}^{N}\alpha_i\hat{E}_i(X_t) = \left(\sum_{i=1}^{N}\alpha_i\right) Lf(X_t) + \frac{1}{2}\left(\sum_{i=1}^{N}\alpha_i\, i\right) L^2 f(X_t)\Delta + \frac{1}{6}\left(\sum_{i=1}^{N}\alpha_i\, i^2\right) L^3 f(X_t)\Delta^2 + \ldots \qquad (9)$$
Can we choose the $\alpha_i$ so that this linear combination is an approximation to $Lf$ of order $N$? For the combination to be an approximation to $Lf$, we require first that the weights $\alpha_1, \alpha_2, \ldots, \alpha_N$ sum to 1. Furthermore, from equation (9), in order to eliminate the first order error term, the weights must satisfy the equation
$$\sum_{i=1}^{N}\alpha_i\, i = 0.$$
More generally, in order to eliminate the $n$th order error term ($n \le N-1$), the weights must satisfy the equation
$$\sum_{i=1}^{N}\alpha_i\, i^{n} = 0.$$
We can write this set of restrictions more compactly in matrix form as
$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & N \\ 1 & 4 & 9 & \cdots & N^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2^{N-1} & 3^{N-1} & \cdots & N^{N-1} \end{pmatrix}\alpha \equiv V\alpha = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
The matrix $V$ is called a Vandermonde matrix, and is invertible for any value of $N$. We can thus obtain $\alpha$ by calculating
$$\alpha = V^{-1}\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \qquad (10)$$
For example, for $N = 3$, we obtain
$$\alpha = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \qquad (11)$$
$$= \begin{pmatrix} 3 \\ -3 \\ 1 \end{pmatrix}. \qquad (12)$$
Substituting $\alpha$ into equation (9), and using equation (8), we get the following third order approximation of the infinitesimal generator of the process $\{X_t\}$:
$$Lf(X_t) = \frac{1}{6\Delta}\left[18\,E_t\!\left(f(X_{t+\Delta}) - f(X_t)\right) - 9\,E_t\!\left(f(X_{t+2\Delta}) - f(X_t)\right) + 2\,E_t\!\left(f(X_{t+3\Delta}) - f(X_t)\right)\right] + O(\Delta^3).$$
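As a quick check on this derivation, the short sketch below solves the Vandermonde system in equation (10) numerically for an arbitrary order N; for N = 3 it reproduces the weights (3, −3, 1) of equation (12), which, combined with the 1/(iΔ) scaling of the Ê_i, give the 18, −9, 2 coefficients above.

```python
import numpy as np

def generator_weights(N):
    """Solve V alpha = (1, 0, ..., 0)' as in equation (10)."""
    i = np.arange(1, N + 1)
    V = np.vander(i, N, increasing=True).T   # rows are i**0, i**1, ..., i**(N-1)
    e1 = np.zeros(N)
    e1[0] = 1.0
    return np.linalg.solve(V, e1)

print(generator_weights(3))   # [ 3. -3.  1.], matching equation (12)
```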
To approximate a particular function $g(x)$, we now need merely to find a specific function $f$ satisfying $Lf(x) = g(x)$. For our purposes, consider the functions
$$f_{(1)}(R) \equiv R - R_t, \quad f_{(2)}(S) \equiv S - S_t, \quad f_{(3)}(R) \equiv (R - R_t)^2, \quad f_{(4)}(S) \equiv (S - S_t)^2, \quad f_{(5)}(R,S) \equiv (R - R_t)(S - S_t).$$
From the definition of $L$, we have
$$Lf_{(1)}(R) = \mu_R(R,S),$$
$$Lf_{(2)}(S) = \mu_S(R,S),$$
$$Lf_{(3)}(R) = 2(R - R_t)\,\mu_R(R,S) + \sigma_R^2(R,S),$$
$$Lf_{(4)}(S) = 2(S - S_t)\,\mu_S(R,S) + \sigma_S^2(R,S),$$
$$Lf_{(5)}(R,S) = (S - S_t)\,\mu_R(R,S) + (R - R_t)\,\mu_S(R,S) + \rho(R,S)\,\sigma_R(R,S)\,\sigma_S(R,S).$$
Evaluating these at $R = R_t$, $S = S_t$, we obtain
$$Lf_{(1)}(R_t) = \mu_R(R_t,S_t), \quad Lf_{(2)}(S_t) = \mu_S(R_t,S_t), \quad Lf_{(3)}(R_t) = \sigma_R^2(R_t,S_t), \quad Lf_{(4)}(S_t) = \sigma_S^2(R_t,S_t), \quad Lf_{(5)}(R_t,S_t) = \rho(R_t,S_t)\,\sigma_R(R_t,S_t)\,\sigma_S(R_t,S_t).$$
Using each of these functions in turn as the function $f$ above, we can generate approximations to $\mu_R$, $\mu_S$, $\sigma_R$, $\sigma_S$ and $\rho$ respectively.
A multifactor, nonlinear, continuous-time model of interest rate volatility
(taking square roots for σR and σS ) are μR (Rt , St ) =
1 [18Et (Rt+Δ − Rt ) − 9Et (Rt+2Δ − Rt ) + 2Et (Rt+3Δ − Rt )] 6Δ + O(Δ3 ),
μS (Rt , St ) =
(13)
1 [18Et (St+Δ − St ) − 9Et (St+2Δ − St ) + 2Et (St+3Δ − St )] 6Δ + O(Δ3 ),
; ⎛ 0 1 0 1 ⎞ < 2 2 < − Rt ) < 1 ⎝ 18Et (Rt+Δ − R0t ) − 9Et (Rt+2Δ 1 ⎠ σR (Rt , St ) = = 2 6Δ +2Et (Rt+3Δ − Rt ) ; ⎛ 0 1 0 1 ⎞ < 2 2 < 18E (S − 9E (S − S ) − S ) t t+Δ t t t+2Δ t < 1 ⎝ 0 1 ⎠ σS (Rt , St ) = = 2 6Δ +2Et (St+3Δ − St ) σRS (Rt , St ) =
1 (18Et [(Rt+Δ − Rt ) (St+Δ − St )] 6Δ − 9Et [(Rt+2Δ − Rt ) (St+2Δ − St )] +2Et [(Rt+3Δ − Rt ) (St+3Δ − St )]) .
The approximations of the drift, volatility and correlation coefficients are written in terms of the true first, second and cross moments of multiperiod changes in the two state variables. If the two-factor assumption is appropriate, and a large stationary time series is available, then these conditional moments can be estimated using appropriate nonparametric methods. In this chapter, we estimate the moments using multivariate density estimation, with appropriately chosen factors as the conditioning variables. All that is required is that these factors span the same space as the true state variables.6 The results for daily changes were provided in Section 2. Equation (13) shows that these estimates are an important part of the approximations to the underlying continuous-time dynamics. By adding multiperiod extensions of these nonparametric estimated conditional moments, we can estimate the drift, volatility and correlation coefficients of the multifactor process described by equations (5) and (6). Figure 14.8 provides the first, second and third order approximations to the diffusion of the short rate against the short rate level and the slope of the term structure.7 The most notable result is that a first order approximation works well; thus, one can consider the theoretical results of this section as a justification for discretization methods currently used in the literature. The description of interest rate behavior given in Section 2, therefore, carries through to the continuous-time setting. Our major finding is that the 6 See
Duffie and Kan (1996) for a discussion of the conditions under which this is possible (in a linear setting). 7 Figures showing the various approximations to the drift of the short rate, the drift and diffusion of the slope, and the correlation between the short rate and the slope are available upon request.
4 A generalized Longstaff and Schwartz (1992) model
313
Volatility (r) First order Second order Third order
0.018 0.016 0.014 0.012 0.01 0.008 0.006
0.035 0.03 0.025
0.004 0.03
0.04
0.05
0.06 r
0.07
0.08
0.09
0.1
0.110
0.01 0.005
0.02 0.015 Slope
Fig. 14.8. First, second and third order approximations to the diffusion (annualized) of the short rate versus the short rate and the slope of the term structure
volatility of interest rates is increasing in the level of interest rates mostly for sharply upward sloping term structures. The question then is what does Figure 14.8, and more generally the rest of the estimated process, mean for fixed-income pricing?
4. A generalized Longstaff and Schwartz (1992) model Longstaff and Schwartz (1992) provide a two-factor general equilibrium model of the term structure. Their model is one of the more popular versions within the affine class of models for describing the yield curve (see also Cox, Ingersoll and Ross, 1985; Chen and Scott, 1995; Duffie and Kan, 1996; and Dai and Singleton, 2000). In the Longstaff and Schwartz setting, all fixed-income instruments are functions of two fundamental factors, the instantaneous interest rate and its volatility. These factors follow diffusion processes, which in turn lead to a fundamental valuation condition for the price of any bond, or bond derivative. As an alternative, here we also present a two-factor continuous-time model for interest rates. The results of Section 2 suggest that the affine class may be too restrictive. Although our results shed valuable light on the factors driving interest rate movements there are potential problems in using this specification to price interest rate contingent claims. A general specification for Rt and St (and the associated prices of risk) may allow arbitrage opportunities if either of these state variables is a known function of an
314
A multifactor, nonlinear, continuous-time model of interest rate volatility
asset price.8 Of course, this point is true of all previous estimations of continuous-time processes to the extent that they use a priced proxy as the instantaneous rate. If we are willing to assume that we have the right factors, however, then there is no problem in an asymptotic sense. That is, as we are estimating these processes nonparametrically, as the sample size gets larger, our estimates will converge to the true functions, which are automatically arbitrage-free (if the economy is). Nevertheless, this is of little consolation if we are trying to use the estimated functions to price assets. To get around this problem, we need to write the model in a form in which neither state variable is an asset price or a function of asset prices. In this chapter, we follow convention by using the observable three-month yield as a proxy for the instantaneous rate, Rt . Furthermore, suppose that the mapping from (R, S) to (R, σR ) is invertible,9 so we can write asset prices as a function of R and σR , instead of R and S.10 As σR is not an asset price, using this variable avoids the inconsistency problem. Specifically, suppose that the true model governing interest rate movements is a generalization of the two-factor Longstaff and Schwartz (1992) model, dRt = μR (R, σ) dt + σ dZ1 , dσt = μσ (R, σ)dt + ρ(R, σ)s(R, σ) dZ1 +
(14)
1 − ρ2 s dZ2 ,
(15)
where dZ1 dZ2 = 0.11 In vector terms, d(Rt , σt ) = M dt + θ dZ, where μR , μσ σ 0 . θ≡ 1 − ρ2 s ρs
M≡
Asset prices, and hence the slope of the term structure, can be written as some function of the short rate and the instantaneous short rate volatility, S(R, σ). From equations (14) and (15), how do we estimate the underlying processes for R and σ given the estimation results of Section 3? Although the short rate volatility, σ, is not directly observable, it is possible to estimate this process.
8 See, for example, Duffie, Ma and Yong (1995). The problem is that, given such a model, we can price any bond, and are thus able to calculate what the state variable "ought" to be. Without imposing any restrictions on the assumed dynamics for Rt and St, there is no guarantee that we will get back to the same value of the state variable that we started with.
9 That is, for a given value of Rt, the volatility, σR, is monotonic in the slope, S. This is the case in most existing multifactor interest rate models, including, for example, all affine models, such as Longstaff and Schwartz (1992).
10 This follows by writing V(R, S) = V(R, S(R, σR)) ≡ U(R, σR).
11 This specification is the most convenient to deal with, as we now have orthogonal noise terms. The correlation between the diffusion terms is ρ, and the overall variance of σ is s² dt.
Specifically, using Ito's Lemma, together with estimates for μR, σR, μS, σS, and ρ, it is possible to write

$$d\sigma_t = \sigma_R\, dR_t + \sigma_S\, dS_t + \frac{1}{2}\left[\sigma_{RR}\,\sigma^2(R_t, S_t) + \sigma_{SS}\,\sigma_S^2(R_t, S_t) + 2\sigma_{RS}\,\sigma(R_t, S_t)\,\sigma_S(R_t, S_t)\,\rho(R_t, S_t)\right] dt.$$
Given this equation, and the assumption that the function S(R, σ) is invertible, the dynamics of σt can be written as a function of the current level of R and σ in a straightforward way. This procedure requires estimation of a matrix of second derivatives. Although there are well-known problems in estimating higher order derivatives using kernel density estimation techniques, it is possible to link the results of Sections 2 and 3 to this generalized Longstaff and Schwartz (1992) model. In particular, using estimates of the second derivatives (not shown), several facts emerge. First, due to the small magnitudes of the estimated drifts of the state variables R and S, the drift of σ depends primarily on the second order terms. Consequently, the importance of the second factor (the slope) is determined by how much the sensitivity of short rate volatility to this factor changes relative to the changes in the sensitivity to the first factor (the level). The general pattern is that volatility increases at a slower rate for high levels and a faster rate for high slopes. Consequently, for high volatilities and levels, the drift of volatility is negative, generating mean reversion. The effect of the second factor, however, is to counter this phenomenon. Second, the diffusion of σ is determined by the sensitivities of short rate volatility to the two factors and the magnitudes of the volatilities of the factors. Based on the estimates of the volatilities and derivatives, the slope has the dominant influence on this effect. In particular, the volatility of σ is high for upward sloping term structures, which also correspond to states with high short rate volatility. Moreover, the sensitivity of this diffusion to the two factors is larger in the slope direction than in the level direction.

As an alternative to the above method, we can estimate an implied series for σ by assuming that the function S(R, σ) is invertible, i.e., that we can equivalently write the model in the form

$$dR_t = \mu_R(R_t, S_t)\, dt + \sigma(R_t, S_t)\, dZ_1^*,$$

$$dS_t = \mu_S(R_t, S_t)\, dt + \sigma_S(R_t, S_t)\, dZ_2^*,$$

where Z1* and Z2* may be correlated. To estimate the function σ(R, S), we apply the methodology described in Section 3.1 to the function f(3)(R, S) ≡ (R − Rt)². Applying the estimated function to each observed (R, S) pair in turn yields a series for the volatility σ, which we can then use in estimating the generalized Longstaff and Schwartz (1992) model given in equations (14) and (15).12 This procedure is in stark contrast to that of Longstaff and Schwartz (1992), and others, who approximate the dynamics of the volatility factor as a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) process. The GARCH process is not strictly compatible with the underlying dynamics of their continuous-time model; here, the estimation is based on approximation
Fig. 14.9. Scatter plot of the three-month rate versus the term structure volatility over the 1983–2006 period
schemes to the diffusion process and is internally consistent. Due to the difficulties in estimating derivatives, we choose this second approach to estimate the continuous-time process.13
12 Although the use of an estimated series for σ rather than the true series may not be the most efficient approach, this procedure is consistent. That is, the problem will disappear as the sample size becomes large, and our pointwise estimates of σ converge to the true values.
13 Although the first approach provides similar results to the second approach, the functional forms underlying the second method are more smooth and thus more suitable for analysis.
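To make the estimation step concrete, the sketch below illustrates the flavor of such a pointwise, nonparametric estimate of the diffusion function: under a first-order approximation, the conditional expectation of squared short-rate changes given the two factors is roughly σ²(R, S) times the observation interval, and a simple Nadaraya–Watson kernel regression can recover it. The Gaussian product kernel, the bandwidths, and all function names here are illustrative assumptions, not the approximation schemes actually used in the chapter.

```python
import numpy as np

def nw_kernel_2d(x1, x2, y, grid1, grid2, h1, h2):
    """Nadaraya-Watson kernel regression of y on (x1, x2), evaluated on a grid."""
    est = np.empty((len(grid1), len(grid2)))
    for i, g1 in enumerate(grid1):
        for j, g2 in enumerate(grid2):
            w = np.exp(-0.5 * (((x1 - g1) / h1) ** 2 + ((x2 - g2) / h2) ** 2))
            est[i, j] = np.sum(w * y) / np.sum(w)   # weights may be tiny far from the data
    return est

def estimate_sigma(r, s, dt, grid_r, grid_s, h_r, h_s):
    """First-order approximation: E[(R_{t+dt} - R_t)^2 | R, S] ~ sigma^2(R, S) * dt."""
    dr2 = np.diff(r) ** 2
    cond_var = nw_kernel_2d(r[:-1], s[:-1], dr2, grid_r, grid_s, h_r, h_s)
    return np.sqrt(cond_var / dt)   # pointwise estimate of sigma(R, S) on the grid
```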
4.1. A general two-factor diffusion process: Empirical results

Figures 14.10–14.12 show approximations to equation (15) for the generalized Longstaff and Schwartz (1992) process as a function of the two factors, the instantaneous short rate and its volatility. It is important to point out that there are few available data at low short rates/high volatilities and high short rates/low volatilities, which corresponds to the earlier comment about interest rates and slopes (see Figure 14.9). Therefore, results in these regions need to be treated cautiously. Figures 14.10 and 14.11 provide the estimates of the continuous-time process for the second interest rate factor, namely its volatility. Several observations are in order. First, there is estimated mean-reversion in volatility; at low (high) levels of volatility, volatility tends to drift upward (downward). The effect of the level of interest rates on this relation appears minimal. Second, and perhaps most important, there is clear evidence that the diffusion of the volatility process is increasing in the level of volatility, yet is affected
Fig. 14.10. First order approximation to the drift (annualized) of the volatility versus the short rate and the volatility of the term structure
by the level of interest rates only marginally. Moreover, volatility’s effect is nonlinear in that it takes effect only at higher levels. This finding suggests extreme caution should be applied when inputting interest rate volatility into derivative pricing models. Most of our models take the relation between the level and volatility for granted; however, with increases from 3% to 11% in the interest rate level, both the drift and diffusion of volatility exhibit only mild increases. On the other hand, changes in the volatility level of much smaller magnitudes have a much larger impact on the volatility process. This finding links the term structure slope result documented earlier in the chapter to a second factor, namely the volatility of the instantaneous rate, and provides a close connection to the Engle, Lilien and Robins (1987) paper mentioned throughout this chapter. As the final piece of the multifactor process for interest rates, Figure 14.12 graphs a first order approximation of the correlation coefficient between the short rate and the volatility, given values of the two factors. Taken at face value, the results suggest a complex variance–covariance matrix between these series in continuous-time. In particular, whereas the correlation decreases in the volatility for most interest rate levels, there appears to be some nonmonotonicity across the level itself. Why is correlation falling as volatility increases? Perhaps, high volatility, just like the corresponding high term structure slope, is associated with aggregate economic phenomena that are less related to the level of interest rates. Given that interest rates are driven by two relatively independent economic factors, namely expectations about both real rates and inflation, this argument
Fig. 14.11. First order approximation to the diffusion (annualized) of the volatility versus the short rate and the volatility of the term structure
seems reasonable. It remains an open question, however, what the exact relationship is between Figure 14.12 and these economic factors.
4.2. Valuation of fixed-income contingent claims

Given the interest rate model described in equation (15), we can write the price of an interest rate contingent claim as V(r, σ, t), depending only on the current values of the two state variables plus time. Then, by Ito's Lemma,

$$\frac{dV(r, \sigma, t)}{V(r, \sigma, t)} = m(r, \sigma, t)\, dt + s_1(r, \sigma, t)\, dZ_1 + s_2(r, \sigma, t)\, dZ_2, \tag{16}$$

where

$$\begin{aligned} m(r, \sigma, t)\, V &= V_t + \mu_r(r, \sigma) V_r + \mu_\sigma(r, \sigma) V_\sigma + \tfrac{1}{2}\,\mathrm{trace}\!\left[\theta^T \nabla^2 V(r, \sigma)\, \theta\right] \\ &= V_t + \mu_r(r, \sigma) V_r + \mu_\sigma(r, \sigma) V_\sigma + \tfrac{1}{2}\sigma^2 V_{rr} + \tfrac{1}{2}s^2 V_{\sigma\sigma} + \rho\sigma s V_{r\sigma}, \\ s_1(r, \sigma, t)\, V &= \sigma V_r + \rho s V_\sigma, \\ s_2(r, \sigma, t)\, V &= \sqrt{1 - \rho^2}\, s V_\sigma. \end{aligned} \tag{17}$$
Fig. 14.12. First order approximation to the correlation coefficient between changes in the short rate and the volatility versus the short rate and the volatility of the term structure

The volatility of the asset, σV, is given by

$$\sigma_V V = \sqrt{(\sigma V_r + \rho s V_\sigma)^2 + (1 - \rho^2)\, s^2 V_\sigma^2} = \sqrt{\sigma^2 V_r^2 + 2\rho\sigma s V_r V_\sigma + s^2 V_\sigma^2}.$$

With a one-factor interest rate model, to prevent arbitrage, the risk premium on any asset must be proportional to its standard deviation.14 Similarly, with two factors, absence of arbitrage requires the excess return on an asset to be a linear combination of its exposure to the two sources of risk. Thus, if the asset pays out dividends at rate d, we can write

$$m = r - \frac{d}{V} + \lambda_r(r, \sigma)\, \frac{V_r}{V} + \lambda_\sigma(r, \sigma)\, \frac{V_\sigma}{V}, \tag{18}$$
where λr and λσ are the prices of short rate risk and volatility risk, respectively. Substituting equation (18) into equation (17), and simplifying, leads to a partial differential equation that must be satisfied by any interest rate contingent claim, assuming the usual
14 Suppose this did not hold for two risky assets. We could then create a riskless portfolio of these two assets with a return strictly greater than r, leading to an arbitrage opportunity (see Ingersoll, 1987).
technical smoothness and integrability conditions (see, for example, Duffie, 1988),

$$\frac{1}{2}\sigma^2 V_{rr} + \left[\mu_r - \lambda_r\right] V_r + \frac{1}{2}s^2 V_{\sigma\sigma} + \left[\mu_\sigma - \lambda_\sigma\right] V_\sigma + \rho\sigma s V_{r\sigma} + V_t - rV + d = 0, \tag{19}$$
subject to appropriate boundary conditions. To price interest rate dependent assets, we need to know not only the processes governing movements in r and σ, but also the prices of risk, λr and λσ. Equation (18) gives an expression for these functions in terms of the partial derivatives Vr and Vσ, which could be used to estimate the prices of risk, given estimates of these derivatives for two different assets, plus estimates of the excess return for each asset. As mentioned above, it is difficult to estimate derivatives precisely using nonparametric density estimation. Therefore, instead of following this route, one could avoid directly estimating the partial derivatives, Vr and Vσ, by considering the instantaneous covariances between the asset return and changes in the interest rate/volatility, cVr and cVσ. From equations (14), (15) and (16) (after a little simplification),

$$\begin{pmatrix} c_{Vr} \\ c_{V\sigma} \end{pmatrix} \equiv \begin{pmatrix} dV\, dr / (V\, dt) \\ dV\, d\sigma / (V\, dt) \end{pmatrix} = \begin{pmatrix} \sigma^2 & \rho\sigma s \\ \rho\sigma s & s^2 \end{pmatrix} \begin{pmatrix} V_r / V \\ V_\sigma / V \end{pmatrix}. \tag{20}$$

This can be inverted, as long as |ρ| < 1, to obtain

$$\begin{pmatrix} V_r / V \\ V_\sigma / V \end{pmatrix} = \begin{pmatrix} \sigma^2 & \rho\sigma s \\ \rho\sigma s & s^2 \end{pmatrix}^{-1} \begin{pmatrix} c_{Vr} \\ c_{V\sigma} \end{pmatrix} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1/\sigma^2 & -\rho/\sigma s \\ -\rho/\sigma s & 1/s^2 \end{pmatrix} \begin{pmatrix} c_{Vr} \\ c_{V\sigma} \end{pmatrix}.$$

To preclude arbitrage, the excess return on the asset must also be expressible as a linear combination of cVr and cVσ,

$$m = r - \frac{d}{V} + \lambda^*_r(r, \sigma)\, c_{Vr} + \lambda^*_\sigma(r, \sigma)\, c_{V\sigma}. \tag{21}$$
Given two different interest rate dependent assets, we can estimate the instantaneous covariances for each in the same way as we estimated ρ(r, σ) above. We can also estimate the excess return for each asset, mi(r, σ) − r, as a function of the two state variables. The two excess returns can be expressed in the form

$$\begin{pmatrix} m_1 - r \\ m_2 - r \end{pmatrix} = \begin{pmatrix} c_{1Vr} & c_{1V\sigma} \\ c_{2Vr} & c_{2V\sigma} \end{pmatrix} \begin{pmatrix} \lambda^*_r \\ \lambda^*_\sigma \end{pmatrix},$$

which can be inverted to yield an estimate of the prices of risk,

$$\begin{pmatrix} \lambda^*_r \\ \lambda^*_\sigma \end{pmatrix} = \begin{pmatrix} c_{1Vr} & c_{1V\sigma} \\ c_{2Vr} & c_{2V\sigma} \end{pmatrix}^{-1} \begin{pmatrix} m_1 - r \\ m_2 - r \end{pmatrix}.$$

Finally, for estimates of the more standard representation of the prices of risk, λr and λσ, equate equations (18) and (21), using equation (20), to obtain

$$\begin{pmatrix} \lambda_r \\ \lambda_\sigma \end{pmatrix} = \begin{pmatrix} \sigma^2 & \rho\sigma s \\ \rho\sigma s & s^2 \end{pmatrix} \begin{pmatrix} \lambda^*_r \\ \lambda^*_\sigma \end{pmatrix}.$$
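As an illustration of the two inversions just described, the following sketch recovers the prices of risk at a single state (r, σ) from the estimated covariances and excess returns of two interest rate dependent assets. The function name and the way the inputs are supplied are assumptions made for the example; the chapter's own inputs would come from the nonparametric estimates of Section 3.

```python
import numpy as np

def prices_of_risk(c1, c2, m1, m2, r, sigma, s, rho):
    """Recover (lambda*_r, lambda*_sigma) and (lambda_r, lambda_sigma) at one state (r, sigma).

    c1, c2 : length-2 arrays of instantaneous covariances (c_Vr, c_Vsigma) for assets 1 and 2
    m1, m2 : the assets' instantaneous expected returns at the same state
    """
    C = np.array([c1, c2])                  # 2x2 matrix of covariances
    excess = np.array([m1 - r, m2 - r])
    lam_star = np.linalg.solve(C, excess)   # (lambda*_r, lambda*_sigma), per equation (21)
    Omega = np.array([[sigma ** 2, rho * sigma * s],
                      [rho * sigma * s, s ** 2]])
    lam = Omega @ lam_star                  # (lambda_r, lambda_sigma), per the last relation above
    return lam_star, lam
```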
Given estimates for the process governing movements in r and σ, and the above procedure for the functions λr and λσ, we can value interest rate dependent assets in one of two ways. The first is to solve equation (19) numerically using a method such as the Hopscotch method of Gourlay and McKee (1977). The second is to use the fact that we can write the solution to equation (19) in the form of an expectation. Specifically, we can write V, the value of an asset which pays out cash flows at a (possibly path-dependent) rate Ct, in the form

$$V_t = E\left[ \int_t^T e^{-\int_t^s \tilde{r}_u\, du}\, C_s\, ds \right], \tag{22}$$
where r̃ follows the "risk adjusted" process,

$$d\tilde{r}_\tau = \left[\mu_r(\tilde{r}_\tau, \tilde{\sigma}_\tau) - \lambda_r(\tilde{r}_\tau, \tilde{\sigma}_\tau)\right] d\tau + \tilde{\sigma}_\tau\, dZ_1, \tag{23}$$

$$d\tilde{\sigma}_\tau = \left[\mu_\sigma(\tilde{r}_\tau, \tilde{\sigma}_\tau) - \lambda_\sigma(\tilde{r}_\tau, \tilde{\sigma}_\tau)\right] d\tau + \rho\, s(\tilde{r}_\tau, \tilde{\sigma}_\tau)\, dZ_1 + \sqrt{1 - \rho^2}\, s\, dZ_2, \tag{24}$$
for all τ > t, and where r̃t = rt, σ̃t = σt. This says that the value of the asset equals the expected sum of discounted cash flows paid over the life of the asset, except that it substitutes the risk adjusted process (r̃, σ̃) for the true process (r, σ). This representation leads directly to a valuation algorithm based on Monte Carlo simulation. For a given starting value of (rt, σt), simulate a number of paths for r̃ and σ̃ using equations (23) and (24). Along each path, calculate the cash flows Ct, and discount these back along the path followed by the instantaneous riskless rate, r̃t. The average of the sum of these values taken over all simulated paths is an approximation to the expectation in equation (22), and hence to the security value, Vt. The more paths simulated, the closer the approximation.
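A minimal sketch of this simulation algorithm is given below, using an Euler discretization of the risk-adjusted dynamics (23)–(24). The drift, diffusion, price-of-risk, and cash-flow functions are placeholders standing in for the estimated functions; the step size, path count, and the floor on volatility are illustrative choices rather than part of the chapter's procedure.

```python
import numpy as np

def mc_value(r0, sigma0, T, dt, n_paths,
             mu_r, mu_sig, lam_r, lam_sig, s_fun, rho_fun, cashflow, seed=0):
    """Monte Carlo value of a claim paying cashflow(r) per unit time, per equations (22)-(24).

    mu_r, mu_sig, lam_r, lam_sig, s_fun, rho_fun are callables of (r, sigma); they stand in
    for the nonparametrically estimated drift, price-of-risk, diffusion and correlation functions.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    r = np.full(n_paths, r0)
    sig = np.full(n_paths, sigma0)
    disc = np.ones(n_paths)                        # running discount factor exp(-integral of r)
    value = np.zeros(n_paths)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rng.standard_normal(n_paths)
        value += disc * cashflow(r) * dt           # accumulate discounted cash flows
        disc *= np.exp(-r * dt)
        s = s_fun(r, sig)
        rho = rho_fun(r, sig)
        # Euler step for the risk-adjusted dynamics (23)-(24)
        r_new = r + (mu_r(r, sig) - lam_r(r, sig)) * dt + sig * np.sqrt(dt) * z1
        sig_new = (sig + (mu_sig(r, sig) - lam_sig(r, sig)) * dt
                   + rho * s * np.sqrt(dt) * z1 + np.sqrt(1.0 - rho ** 2) * s * np.sqrt(dt) * z2)
        r, sig = r_new, np.maximum(sig_new, 1e-8)  # keep the volatility state positive
    return value.mean()                            # average over paths approximates (22)
```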
5. Conclusion

This chapter provides a method for estimating multifactor continuous-time Markov processes. Using Milshtein's (1978) approximation schemes for writing expectations of functions of the sample path of stochastic differential equations in terms of the drift, volatility and correlation coefficients, we provide nonparametric estimation of the drift and diffusion functions of multivariate stochastic differential equations. We apply this technique to the short- and long-end of the term structure for a general two-factor, continuous-time diffusion process for interest rates. In estimating this process, several results emerge. First, the volatility of interest rates is increasing in the level of interest rates only for sharply upward sloping term structures. Thus, the result of previous studies, suggesting an almost exponential relation between interest rate volatility and levels, is due to the term structure on average being upward sloping, and is not a general result per se. Second, the finding that partly motivates this chapter, i.e., the link between slope
and interest rate volatility in Engle, Lilien and Robins (1987), comes out quite naturally from the estimation. Finally, the slope of the term structure, on its own, plays a large role in determining the magnitude of the diffusion coefficient. These volatility results hold across maturities, which suggests that a low dimensional system (with nonlinear effects) may be enough to explain the term structure of interest rates. As a final comment, there are several advantages of the procedure adopted in this chapter. First, there is a constant debate between researchers on the relative benefits of using equilibrium versus arbitrage-free models. Here, we circumvent this issue by using actual data to give us the process and corresponding prices of risk. As the real world coincides with the intersection of equilibrium and arbitrage-free models, our model is automatically consistent. Of course, in a small sample, statistical error will produce estimated functional forms that do not conform. This problem, however, is true of all empirical work. Second, we show how our procedure for estimating the underlying multifactor continuous-time diffusion process can be used to generate fixed income pricing. As an example, we show how our results can be interpreted within a generalized Longstaff and Schwartz (1992) framework, that is, one in which the drift and diffusion coefficients of the instantaneous interest rate and volatility are both (nonlinear) functions of the level of interest rates and the volatility. Third, and perhaps most important, the pricing of fixed-income derivatives depends crucially on the level of volatility. The results in this chapter suggest that volatility depends on both the level and slope of the term structure, and therefore contains insights into the eventual pricing of derivatives.
15
Estimating the Implied Risk-Neutral Density for the US Market Portfolio
Stephen Figlewski

Acknowledgments: Thanks to Justin Birru for excellent assistance on this research and to Otto van Hemert, Robert Bliss, Tim Bollerslev, an anonymous reviewer, and seminar participants at NYU, Baruch, Georgia Tech, Essex University, Lancaster University, Bloomberg, and the Robert Engle Festschrift Conference for valuable comments.
1. Introduction

The Black–Scholes (BS) option pricing model has had an enormous impact on academic valuation theory and also, impressively, on the financial marketplace. It is safe to say that virtually all serious participants in the options markets are aware of the model and most use it extensively. Academics tend to focus on the BS model as a way to value an option as a function of a handful of intuitive input parameters, but practitioners quickly realized that one required input, the future volatility of the underlying asset, is neither observable directly nor easy to forecast accurately. However, an option's price in the market is observable, so one can invert the model to find the implied volatility (IV) that makes the option's model value consistent with the market. This property is often more useful than the theoretical option price for a trader who needs the model to price less liquid options consistently with those that are actively traded in the market, and to manage his risk exposure. An immediate problem with IVs is that when they are computed for options written on the same underlying they differ substantially according to "moneyness". The now-familiar pattern is called the volatility smile, but for options on equities, and stock indexes in particular, the smile has become sufficiently asymmetrical over time, with higher
IVs for low exercise price options, that it is now more properly called a “smirk” or a “skew”.1 Implied volatility depends on the valuation model used to extract it, and the existence of a volatility smile in Black–Scholes IVs implies that options market prices are not fully consistent with that model. Even so, the smile is stable enough over short time intervals that traders use the BS model anyway, by inputting different volatilities for different options according to their moneyness. This jury-rigged procedure, known as “practitioner Black–Scholes”, is an understandable strategy for traders, who need some way to impose pricing consistency across a broad range of related financial instruments, and do not care particularly about theoretical consistency with academic models. This has led to extensive analysis of the shape and dynamic behavior of volatility smiles, even though it is odd to begin with a model that is visibly inconsistent with the empirical data and hope to improve it by modeling the behavior of the inconsistency. Extracting important but unobservable parameters from option prices in the market is not limited to implied volatility. More complex models can be calibrated to the market by implying out the necessary parameter values, such as the size and intensity of discrete price jumps. The most fundamental valuation principle, which applies to all financial assets, not just options, is that a security’s market price should be the market’s expected value of its future payoff, discounted back to the present at a discount rate appropriately adjusted for risk. Risk premia are also unobservable, unfortunately, but a fundamental insight of contingent claims pricing theory is that when a pricing model can be obtained using the principle of no-arbitrage, the risk-neutral probability distribution can be used in computing the expected future payoff, and the discount rate to bring that expectation back to the present is the riskless rate. The derivative security can be priced relative to the underlying asset under the risk-neutralized probability distribution because investors’ actual risk preferences are embedded in the price of the underlying asset. Breeden and Litzenberger (1978) and Banz and Miller (1978) showed that, like implied volatility, the entire risk-neutral probability distribution can be extracted from market option prices, given a continuum of exercise prices spanning the possible range of future payoffs. An extremely valuable feature of this procedure is that it is model-free, unlike extracting IV. The risk-neutral distribution does not depend on any particular pricing model. At a point in time, the risk-neutral probability distribution and the associated riskneutral density function, for which we will use the acronym RND, contain an enormous amount of information about the market’s expectations and risk preferences, and their dynamics can reveal how information releases and events that affect risk attitudes impact the market. Not surprisingly, a considerable amount of previous work has been done to extract and interpret RNDs, using a variety of methods and with a variety of purposes in mind.2 1 Occasionally
a writer will describe the pattern as a "sneer" but this is misleading. A smile curves upward more or less symmetrically at both ends; a smirk also curves upward but more so at one end than the other; a "skew" slopes more or less monotonically downward from left to right; but the term "sneer" would imply a downward curvature, i.e., a concave portion of the curve at one end, which is not a pattern seen in actual options markets.
2 An important practical application of this concept has been the new version of the Chicago Board Options Exchange's VIX index of implied volatility (Chicago Board Options Exchange, 2003). The original VIX methodology constructed the index as a weighted average of BS implied volatilities from eight options written on the S&P 100 stock index. This was replaced in 2003 by a calculation that amounts to estimating the standard deviation of the risk-neutral density from options on the S&P 500 index.
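As a concrete illustration of the inversion described above (finding the IV that reconciles the Black–Scholes value with an observed market price), the following sketch uses a standard root-finder. It is a generic textbook computation rather than code from this chapter, and the bracketing interval for the volatility is an assumption.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(S, X, T, r, vol, q=0.0):
    """Black-Scholes value of a European call with continuous dividend yield q."""
    d1 = (np.log(S / X) + (r - q + 0.5 * vol ** 2) * T) / (vol * np.sqrt(T))
    d2 = d1 - vol * np.sqrt(T)
    return S * np.exp(-q * T) * norm.cdf(d1) - X * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, X, T, r, q=0.0):
    """The volatility that equates the BS model value to the observed option price."""
    # Assumes the observed price lies between the values at the bracketing volatilities.
    return brentq(lambda v: bs_call(S, X, T, r, v, q) - price, 1e-6, 5.0)
```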
Estimation of the RND is hampered by two serious problems. First, the theory calls for options with a continuum of exercise prices, but actual options markets only trade a relatively small number of discrete strikes. This is especially problematic for options on individual stocks, but even index options have strikes at least 5 points apart, and up to 25 points apart or more in some parts of the available range. Market prices also contain microstructure noise from various sources, and bid-ask spreads are quite wide for options, especially for less liquid contracts and those with low prices. Slight irregularities in observed option prices can easily translate into serious irregularities in the implied RND, such as negative probabilities. Extracting a well-behaved estimate of a RND requires interpolation, to fill in option values for a denser set of exercise prices, and smoothing, to reduce the influence of microstructure noise. The second major problem is that the RND can be extracted only over the range of available strikes, which generally does not extend very far into the tails of the distribution. For some purposes, knowledge of the full RND is not needed. But in many cases, what makes options particularly useful is the fact that they have large payoffs in the comparatively rare times when the underlying asset makes a large price move, i.e., in the tails of its returns distribution. The purpose of this chapter is to present a new methodology for extracting complete well-behaved RND functions from options market prices and to illustrate the potential of this tool for understanding how expectations and risk preferences are incorporated into prices in the US stock market. We review a variety of techniques for obtaining smooth densities from a set of observed options prices and select one that offers good performance. This procedure is then modified to incorporate the market’s bid-ask spread into the estimation. Second, we will show how the tails of the RND obtained from the options market may be extended and completed by appending tails from a generalized extreme value (GEV) distribution. We then apply the procedure to estimate RNDs for the S&P 500 stock index from 1996–2008 and develop several interesting results. The next section will give a brief review of the extensive literature related to this topic. Section 3 details how the RND can theoretically be extracted from options prices. The following section reviews alternative smoothing procedures needed to obtain a wellbehaved density from actual options prices. Section 5 presents our new methodology for completing the RND by appending tails from a GEV distribution. Section 6 applies the methodology to explore the behavior of the empirical RND for the Standard and Poor’s 500 index over the period 1996–2008. The results presented in this section illustrate some of the great potential of this tool for revealing how the stock market behaves. The final section will offer some concluding comments and a brief description of several potentially fruitful lines of future research based on this technology.
2. Review of the literature

The literature on extracting and interpreting the risk-neutral distribution from market option prices is broad, and it becomes much broader if the field is extended to cover
research on implied volatilities and on modeling the returns distribution. In this literature review, we restrict our attention to papers explicitly on RNDs. The monograph by Jackwerth (2004) provides an excellent and comprehensive review of the literature on this topic, covering both methodological issues and applications. Bliss and Panigirtzoglou (2002) also give a very good review of the alternative approaches to extracting the RND and the problems that arise with different methods. Bahra (1997) is another often-cited review of methodology, done for the Bank of England prior to the most recent work in this area. One way to categorize the literature is according to the methods used by different authors to extract a RND from a set of option market prices. These fall largely into three approaches: fitting a parametric density function to the market data, approximating the RND with a nonparametric technique, or developing a model of the returns process that produces the empirical RND as the density for the value of the underlying asset on option expiration day. An alternative classification is according to the authors’ purpose in extracting a risk-neutral distribution. Many authors begin with a review of the pros and cons of different extraction techniques in order to select the one they expect to work best for their particular application. Because a risk-neutral density combines estimates of objective probabilities and risk preferences, a number of papers seek to use the RND as a window on market expectations about the effects of economic events and policy changes on exchange rates, interest rates, and stock prices. Other papers take the opposite tack, in effect, abstracting from the probabilities in order to examine the market’s risk preferences that are manifested in the difference between the risk-neutral density and the empirical density. A third branch of the literature is mainly concerned with extracting the RND as an econometric problem. These papers seek to optimize the methodology for estimating RNDs from noisy market options prices. The most ambitious papers construct an implied returns process, such as an implied binomial tree, that starts from the underlying asset’s current price and generates the implied RND on option expiration date. This approach leads to a full option pricing model, yielding both theoretical option values and Greek letter hedging parameters. Bates (1991) was one of the first papers concerned with extracting information about market expectations from option prices. It analyzed the skewness of the RND from S&P500 index options around the stock market crash of 1987 as a way to judge whether the crash was anticipated by the market. Like Bahra (1997), S¨ oderlind and Svensson (1997) proposed learning about the market’s expectations for short-term interest rates, exchange rates, and inflation by fitting RNDs as mixtures of two normal or lognormal densities. Melick and Thomas (1997) modeled the RND as a mixture of three lognormals. Using crude oil options, their estimated RNDs for oil prices during the period of the 1991 Persian Gulf crisis were often bimodal and exhibited shapes that were inconsistent with a univariate lognormal. They interpreted this as the market’s response to media commentary at the time and the anticipation that a major disruption in world oil prices was possible. 
In their examination of exchange rate expectations, Campa, Chang and Reider (1998) explored several estimation techniques and suggested that there is actually little difference among them. However, this conclusion probably depends strongly on the fact that their currency options data only provided five strike prices per day, which substantially limits the flexibility of the functional forms that could be fitted. Malz (1997) also modeled
exchange rate RNDs and added a useful wrinkle. FX option prices are typically quoted in terms of their implied volatilities under the Garman–Kohlhagen model and moneyness is expressed in terms of the option’s delta. For example, a “25 delta call” is an out of the money call option with a strike such that the option’s delta is 0.25. Malz used a simple function involving the prices of option combination positions to model and interpolate the implied volatility smile in delta-IV space. Quite a few authors have fitted RNDs to stock market returns, but for the most part, their focus has not been on the market’s probability estimates but on risk preferences. An exception is Gemmill and Saflekos (2000), who fitted a mixture of two lognormals to FTSE stock index options and looked for evidence that investors’ probability beliefs prior to British elections reflected the dichotomous nature of the possible outcomes. Papers that seek to use RNDs to examine the market’s risk preferences include A¨ıtSahalia and Lo (1998, 2000), Jackwerth (2000), Rosenberg and Engle (2002) and Bliss and Panigirtzoglou (2004). In their 1998 paper, A¨ıt-Sahalia and Lo used a nonparametric kernel smoothing procedure to extract RNDs from S&P 500 index option prices. Unlike other researchers, they assumed that if the RND is properly modeled as a function of moneyness and the other parameters that enter the Black–Scholes model, it will be sufficiently stable over time that a single RND surface defined on log return and days to maturity can be fitted to a whole calendar year (1993). Although they did not specifically state that their approach focuses primarily on risk preferences, it is clear that if the RND is this stationary, its shape is not varying in response to the flow of new information entering the market’s expectations, beyond what is reflected in the changes in the underlying asset price. A¨ıt-Sahalia and Lo (2000) applies the results of their earlier work to the Value-at-Risk problem and proposes a new VaR concept that includes the market’s risk preferences as revealed in the nonparametric RND. Jackwerth (2000) uses the methodology proposed in Jackwerth and Rubinstein (1996) to fit smooth RNDs to stock prices around the 1987 stock market crash and the period following it. The paper explores the market’s risk attitudes, essentially assuming that they are quite stable over time, but subject to substantial regime changes. The resulting risk aversion functions exhibit some anomalies, however, leaving some important open questions. In their cleverly designed study, Bliss and Panigirtzoglou (2004) assume a utility function of a particular form. Given a level of risk aversion, they can then extract the representative investor’s true (subjective) expected probability distribution. They assume the representative investor has rational expectations and find the value of the constant risk-aversion parameter that gives the best match between the extracted subjective distribution and the distribution of realized outcomes. By contrast, Rosenberg and Engle (2002) model a fully dynamic risk-aversion function by fitting a stochastic volatility model to S&P 500 index returns and extracting the “empirical pricing kernel” on each date from the difference between the estimated empirical distribution and the observed RND. The literature on implied trees began with three papers written at about the same time. 
Perhaps the best-known is Rubinstein’s (1994) Presidential Address to the American Finance Association, in which he described how to fit Binomial trees that replicate the RNDs extracted from options prices. Rubinstein found some difficulty in fitting a well-behaved left tail for the RND and chose the approach of using a lognormal density
as a Bayesian prior for the RND. Jackwerth (1997) generalized Rubinstein’s binomial lattice to produce a better fit, and Rubinstein (1998) suggested a different extension, using an Edgeworth expansion to fit the RND and then constructing a tree consistent with the resulting distribution. Both Dupire (1994) and Derman and Kani (1994) also developed implied tree models at about the same time as Rubinstein. Dupire fit an implied trinomial lattice, whereas Derman and Kani, like Rubinstein, used options prices to imply out a binomial tree, but they combined multiple maturities to get a tree that simultaneously matched RNDs for different expiration dates. Their approach was extended in Derman and Kani (1998) to allow implied (trinomial) trees that matched both option prices and implied volatilities. Unfortunately, despite the elegance of these techniques, their ability to produce superior option pricing and hedging parameters was called into question by Dumas, Fleming and Whaley (1998) who offered empirical evidence that the implied lattices were no better than “practitioner Black–Scholes”. The most common method to model the RND is to select a known parametric density function, or a mixture of such functions, and fit its parameters by minimizing the discrepancy between the fitted function and the empirical RND. A variety of distributions and objective functions have been investigated and their relative strengths debated in numerous papers, including those already mentioned. Simply computing an implied volatility using the Black–Scholes equation inherently assumes the risk-neutral density for the cumulative return as of expiration is lognormal. Its mean is the riskless rate (with an adjustment for the concavity of the log function) and it has standard deviation consistent with the implied volatility, both properly scaled by the time to expiration.3 But given the extensive evidence that actual returns distributions are too fat-tailed to be lognormal, research with the lognormal has typically used a mixture of two or more lognormal densities with different parameters. Yet, using the Black–Scholes equation to smooth and interpolate option values has become a common practice. Shimko (1993) was the first to propose converting option prices into implied volatilities, interpolating and smoothing the curve, typically with a cubic spline or a low-order polynomial, then converting the smoothed IVs back into price space and proceeding with the extraction of a RND from the resulting dense set of option prices. We adopt this approach below, but illustrate the potential pitfall of simply fitting a spline to the IV data: since a standard cubic spline must pass through all of the original data points, it incorporates all of the noise from the bid-ask spread and other market microstructure frictions into the RND. A more successful spline-based technique, discussed by Bliss and Panigirtzoglou (2002), uses a “smoothing” spline. This produces a much better-behaved RND by imposing a penalty function on the choppiness of the spline approximation and not requiring the curve to pass through all of the original points exactly. Other papers achieve smooth RNDs by positing either a specific returns process (e.g., a jump diffusion) or a specific terminal distribution (e.g., a lognormal) and extracting its parameters from option prices. Nonparametric techniques (e.g., kernel regression) inherently smooth the estimated RND and achieve the same goal. 
Several papers in this group, in addition to those described above, are worth mentioning. Bates (1996) used currency options prices to estimate the parameters of a
3 The implied volatility literature is voluminous. Poon and Granger (2003) provide an extensive review of this literature, from the perspective of volatility prediction.
jump-diffusion model for exchange rates, implying out parameters that lead to the best match between the terminal returns distribution under the model and the observed RNDs. Buchen and Kelly (1996) suggested using the principle of Maximum Entropy to establish a RND that places minimal constraints on the data. They evaluated the procedure by simulating options prices and trying to extract the correct density. Bliss and Panigirtzoglou (2002) also used simulated option prices, to compare the performance of smoothing splines versus a mixture of lognormals in extracting the correct RND when prices are perturbed by amounts that would still leave them inside the typical bid-ask spread. They concluded that the spline approach dominates a mixture of lognormals. Bu and Hadri (2007), on the other hand, also used Monte Carlo simulation in comparing the spline technique against a parametric confluent hypergeometric density and preferred the latter. One might summarize the results from this literature as showing that the implied riskneutral density may be extracted from market option prices using a number of different methods, but none of them is clearly superior to the others. Noisy market option prices and sparse strikes in the available set of traded contracts are a pervasive problem that must be dealt with in any viable procedure. We will select and adapt elements of the approaches used by these researchers to extract the RND from a set of option prices, add a key wrinkle to take account of the bid-ask spread in the market, and then propose a new technique for completing the tails of the distribution.
3. Extracting the risk-neutral density from options prices, in theory

In the following, the symbols C, S, X, r, and T all have the standard meanings of option valuation: C = call price; S = time 0 price of the underlying asset; X = exercise price; r = riskless interest rate; T = option expiration date, which is also the time to expiration. P will be the price of a put option. We will also use f(x) = risk-neutral probability density function (RND) and F(x) = ∫_{−∞}^{x} f(z) dz = risk-neutral distribution function.

The value of a call option is the expected value of its payoff on the expiration date T, discounted back to the present. Under risk-neutrality, the expectation is taken with respect to the risk-neutral probabilities and discounting is at the risk-free interest rate:

$$C = e^{-rT} \int_X^\infty (S_T - X)\, f(S_T)\, dS_T. \tag{1}$$
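To make equation (1) concrete, the sketch below prices a call by numerically integrating the discounted payoff against a candidate RND on a grid of terminal prices. The lognormal density used in the illustration (with mean equal to the forward price) and the particular parameter values are assumptions chosen only so the result can be checked against the Black–Scholes value; they are not part of this chapter's methodology.

```python
import numpy as np

def call_from_rnd(X, r, T, grid, density):
    """Discounted expected payoff under a risk-neutral density on a terminal-price grid,
    i.e. equation (1), integrated with the trapezoid rule."""
    payoff = np.maximum(grid - X, 0.0)
    y = payoff * density
    return np.exp(-r * T) * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))

# Illustration: a lognormal RND whose mean equals the forward price
S0, X, r, T, vol = 100.0, 100.0, 0.05, 0.5, 0.20
grid = np.linspace(1.0, 400.0, 40001)
mu = np.log(S0) + (r - 0.5 * vol ** 2) * T
f = np.exp(-(np.log(grid) - mu) ** 2 / (2 * vol ** 2 * T)) / (grid * vol * np.sqrt(2 * np.pi * T))
print(call_from_rnd(X, r, T, grid, f))   # approximately the Black-Scholes value (about 6.89)
```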
Increasing the exercise price by an amount dX changes the option value for two reasons. First, it narrows the range of stock prices ST for which the call has a positive payoff. Second, increasing X reduces the payoff by the amount -dX for every ST at which the option is in the money. The first effect occurs when ST falls between X and X + dX. The maximum of the lost payoff is just dX, which contributes to option value multiplied by the probability that ST will end up in that narrow range. So, for discrete dX the impact of the first effect is very small and it becomes infinitesimal relative to the second effect in the limit as dX goes to 0.
These two effects are seen clearly when we take the partial derivative in (1) with respect to X:

$$\frac{\partial C}{\partial X} = \frac{\partial}{\partial X}\left[ e^{-rT} \int_X^\infty (S_T - X)\, f(S_T)\, dS_T \right] = e^{-rT}\left[ -(X - X) f(X) + \int_X^\infty -f(S_T)\, dS_T \right].$$
The first term in brackets corresponds to the effect of changing the range of ST for which the option is in the money. This is zero in the limit, leaving

$$\frac{\partial C}{\partial X} = -e^{-rT} \int_X^\infty f(S_T)\, dS_T = -e^{-rT}\left[1 - F(X)\right].$$

Solving for the risk-neutral distribution F(X) gives

$$F(X) = e^{rT} \frac{\partial C}{\partial X} + 1. \tag{2}$$
In practice, an approximate solution to (2) can be obtained using finite differences of option prices observed at discrete exercise prices in the market. Let there be option prices available for maturity T at N different exercise prices, with X1 representing the lowest exercise price and XN being the highest. We will use three options with sequential strike prices Xn−1, Xn, and Xn+1 in order to obtain an approximation to F(X) centered on Xn.4

$$F(X_n) \approx e^{rT}\left[\frac{C_{n+1} - C_{n-1}}{X_{n+1} - X_{n-1}}\right] + 1. \tag{3}$$
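A small sketch of these finite-difference approximations, assuming the equally spaced strike grid described in footnote 4, is given below; it returns both the distribution in (3) and the density obtained from the second difference in (5) further on. The function name and array layout are illustrative.

```python
import numpy as np

def rnd_from_calls(strikes, calls, r, T):
    """Approximate the risk-neutral distribution F and density f from call prices on an
    equally spaced strike grid, using the centered differences in (3) and (5)."""
    dX = strikes[1] - strikes[0]
    F = np.exp(r * T) * (calls[2:] - calls[:-2]) / (2.0 * dX) + 1.0            # F(X_n), eq. (3)
    f = np.exp(r * T) * (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dX ** 2  # f(X_n), eq. (5)
    return strikes[1:-1], F, f
```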
CN −CN −2 + 1) = the right tail from XN−1 to infinity is approximated by 1 − (erT X N −XN −2
CN −CN −2 . −erT X N −XN −2 Taking the derivative with respect to X in (2) a second time yields the risk-neutral density function at X:
f (X)
∂2C . ∂X 2
(4)
Cn+1 − 2Cn + Cn−1 . (ΔX)2
(5)
=
erT
The density f(Xn ) is approximated as f (Xn )
≈
erT
Equations (1)–(5) show how the portion of the RND lying between X2 and XN−1 can be extracted from a set of call option prices. A similar derivation can be done to yield a
4 In general, the differences (Xn − Xn−1) and (Xn+1 − Xn) need not be equal, in which case a weighting procedure could be used to approximate F(Xn). In our methodology ΔX is a constant value, because we construct equally spaced artificial option prices to fill in values for strikes in between those traded in the market.
procedure for obtaining the RND from put prices. The equivalent expressions to (2)–(5) for puts are: ∂P ∂X Pn+1 − Pn−1 F (Xn ) ≈ erT Xn+1 − Xn−1 F (X) = erT
f (X) = erT f (Xn ) ≈ erT
(6) (7)
∂2P ∂X 2
(8)
Pn+1 − 2Pn + Pn−1 . (ΔX)2
(9)
4. Extracting a risk-neutral density from options market prices, in practice The approach described in the previous section assumes the existence of a set of option prices that are all fully consistent with the theoretical pricing relationship of equation (1). Implementing it with actual market prices for traded options raises several important issues and problems. First, market imperfections in observed option prices must be dealt with carefully or the resulting risk-neutral density can have unacceptable features, such as regions in which it is negative. Second, some way must be found to complete the tails of the RND beyond the range from X2 to XN−1 . This section will review several approaches that have been used in the literature to obtain the middle portion of the RND from market option prices, and will describe the technique we adopt here. The next section will add the tails. We will be estimating RNDs from the daily closing bid and ask prices for Standard and Poor’s 500 Index options. S&P 500 options are particularly good for this exercise because the underlying index is widely accepted as the proxy for the US “market portfolio”, the options are very actively traded on the Chicago Board Options Exchange, and they are cash-settled with European exercise style. S&P 500 options have major expirations quarterly, on the third Friday of the months of March, June, September and December. This will allow us to construct time series of RNDs applying to the value of the S&P index on each expiration date. The data set will be described in further detail below. Here we will take a single date, January 5, 2005, selected at random, to illustrate extraction of an RND in practice.
4.1. Interpolation and smoothing The available options prices for January 5, 2005 are shown in Table 15.1. The index closed at 1,183.74 on that date, and the March options contracts expired 72 days later, on March 18, 2005. Strike prices ranged from 1,050 to 1,500 for calls, and from 500 to 1,350 for puts. Bid-ask spreads were relatively wide: 2 points for contracts trading above 20 dollars down to a minimum of 0.50 in most cases even for the cheapest options. This amounted
332
Estimating the implied risk-neutral density for the US market portfolio
Table 15.1.
Strike price 500 550 600 700 750 800 825 850 900 925 950 975 995 1005 1025 1050 1075 1100 1125 1150 1170 1175 1180 1190 1200 1205 1210 1215 1220 1225 1250 1275 1300 1325 1350 1400 1500
S&P 500 index options prices, January 5, 2005
S&P 500 Index closing level, = 1, 183.74 Option expiration: 3/18/2005 (72 days)
Interest rate = 2.69 Dividend yield = 1.70
Calls
Puts
Best bid
Best offer
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 134.50 136.50 111.10 113.10 88.60 90.60 67.50 69.50 48.20 50.20 34.80 36.80 31.50 33.50 28.70 30.70 23.30 25.30 18.60 20.20 16.60 18.20 14.50 16.10 12.90 14.50 11.10 12.70 9.90 10.90 4.80 5.30 2.30 1.80 0.75 1.00 0.10 0.60 0.15 0.50 0.00 0.50 0.00 0.50
Source: Optionmetrics.
Average price
Implied volatility
— — — — — — — — — — — — — — — 135.500 112.100 89.600 68.500 49.200 35.800 32.500 29.700 24.300 19.400 17.400 15.300 13.700 11.900 10.400 5.050 2.050 0.875 0.350 0.325 0.250 0.250
— — — — — — — — — — — — — — — 0.118 0.140 0.143 0.141 0.135 0.131 0.129 0.128 0.126 0.123 0.123 0.121 0.122 0.120 0.119 0.117 0.114 0.115 0.116 0.132 0.157 0.213
Best bid
Best offer
0.00 0.05 0.00 0.05 0.00 0.05 0.00 0.10 0.00 0.15 0.10 0.20 0.00 0.25 0.00 0.50 0.00 0.50 0.20 0.70 0.50 1.00 0.85 1.35 1.30 1.80 1.50 2.00 2.05 2.75 3.00 3.50 4.50 5.30 6.80 7.80 10.10 11.50 15.60 17.20 21.70 23.70 23.50 25.50 25.60 27.60 30.30 32.30 35.60 37.60 38.40 40.40 41.40 43.40 44.60 46.60 47.70 49.70 51.40 53.40 70.70 72.70 92.80 94.80 116.40 118.40 140.80 142.80 165.50 167.50 — — — —
Average price
Implied volatility
0.025 0.025 0.025 0.050 0.075 0.150 0.125 0.250 0.250 0.450 0.750 1.100 1.550 1.750 2.400 3.250 4.900 7.300 10.800 16.400 22.700 24.500 26.600 31.300 36.600 39.400 42.400 45.600 48.700 52.400 71.700 93.800 117.400 141.800 166.500 — —
0.593 0.530 0.473 0.392 0.356 0.331 0.301 0.300 0.253 0.248 0.241 0.230 0.222 0.217 0.208 0.193 0.183 0.172 0.161 0.152 0.146 0.144 0.142 0.141 0.139 0.139 0.138 0.138 0.136 0.137 0.139 0.147 0.161 0.179 0.198 — —
4 Extracting a risk-neutral density from options market prices, in practice
333
1
Probability
0.8 0.6 0.4 0.2 0 –0.2 800
900
1000
1100 S&P 500 index
Distribution from put prices
Fig. 15.1.
1200
1300
1400
Distribution from call prices
Risk-neutral distribution from raw options prices
to spreads of more than 100% of the average price for many deep out of the money contracts. It is customary to use either transactions prices or the midpoints of the quoted bid-ask spreads as the market’s option prices. Options transactions occur irregularly in time and only a handful of strikes have frequent trading, even for an actively traded contract like S&P 500 index options. Use of transactions data also requires obtaining synchronous prices for the underlying. By contrast, bids and offers are quoted continuously for all traded strikes, whether or not trades are occurring. We will begin by taking the average of bid and ask as the best available measure of the option price. We then modify the procedure to make use of the full spread in the smoothing and interpolation stage. Equations (3) and (7) show how to estimate the probability distribution using a centered difference to compute the slope and the distribution at Xn . In Figure 15.1, we have used uncentered differences, Cn − Cn−1 and Pn − Pn−1 simply for illustration, to construct probability distributions from the average call and put price quotes shown in Table 15.1. The distribution from the puts extends further to the left and the one from the calls extends further to the right, but in the middle range where they overlap, the values are quite close together. There are some discrepancies, notably around 1,250, where the cumulative call probability is 0.698 and the put probability is 0.776, but the more serious problem is around 1,225, where the fitted probability distribution from call prices is nonmonotonic. Figure 15.2 plots the risk-neutral densities corresponding to the distribution functions displayed in Figure 15.1. These are clearly unacceptable as plausible estimates of the true density function. Both RNDs have ranges of negative values, and the extreme fluctuations in the middle portion and sharp differences between call and put RNDs violate our prior beliefs that the RND should be fairly smooth and the same expectations should govern pricing of both calls and puts. Looking at the prices in Table 15.1, it is clear that there will be problems with out of the money puts. Except at 800, there is no bid for puts at any strike below 925 and
334
Estimating the implied risk-neutral density for the US market portfolio 0.025 0.02
Probability
0.015 0.01 0.005 0 –0.005 800
900
1000
1100 S&P 500 index
Density from put prices
Fig. 15.2.
1200
1300
1400
Density from call prices
Risk-neutral density from raw options prices
the ask price is unchanged over multiple contiguous strikes, making the average price equal for different exercise prices. From (9), the estimated RND over these regions will be 0, implying no possibility that the S&P could end up there at expiration. A similar situation occurs for out of the money calls between 1,400 and 1,500. Moreover, the single 0.10 bid for puts at X = 800 produces an average put price higher than that for the next higher strike, which violates the static no-arbitrage condition that a put with a higher strike must be worth more than one with a lower strike. This leads to a region in which the implied risk-neutral density is negative. However, it is not obvious from the prices in Table 15.1 what the problem is that produces the extreme choppiness and negative densities around the at the money index levels between 1,150 and 1,250. Table 15.1 and Figure 15.1 show that even for this very actively traded index option, the available strikes are limited and the resulting risk-neutral distribution is a coarse step function. The problem would be distinctly worse for individual traded stock options whose available strikes are considerably less dense than this. This suggests the use of an interpolation technique to fill in intermediate values between the traded strike prices and to smooth out the risk-neutral distribution. Cubic spline interpolation is a very common first choice as an interpolation tool. Figure 15.3 shows the spline-interpolated intermediate option prices for our calls and puts. To the naked eye, the curves look extremely good, without obvious bumps or wiggles between the market prices, indicated by the markers. Yet these option prices produce the RNDs shown in Figure 15.4, with erratic fluctuations around the at the money stock prices, large discrepancies between RNDs from calls and puts, and negative portions in both curves. The problem is that cubic spline interpolation generates a curve that is forced to go through every observed price, which has the effect of incorporating all of the noise due to market microstructure and other imperfections into the RND. David Shimko (1993) proposed transforming the market option prices into implied volatility (IV) space before interpolating, then retransforming the interpolated curve back to price space to compute a risk-neutral distribution. This procedure does not assume
4 Extracting a risk-neutral density from options market prices, in practice
335
180 160 140
Option price
120 100 80 60 40 20 0 500
600
700
800
900
1000 1100 S&P 500 index
Spline interpolated call price
1300
1400
Market call prices Market put prices
Spline interpolated put price
Fig. 15.3.
1200
Market option prices with cubic spline interpolation
0.05 0.04 0.03
Density
0.02 0.01 0.00 –0.01 –0.02 –0.03 –0.04 800
900
1000
1100
1200
1300
S&P 500 index Density from interpolated put prices
Fig. 15.4.
Density from interpolated call prices
Densities from option prices with cubic spline interpolation
1400
1500
that the Black–Scholes model holds for these option prices. It simply uses the Black– Scholes equation as a computational device to transform the data into a space that is more conducive to the kind of smoothing one wishes to do. Consider the transformed option values represented by their BS IVs. Canonical Black– Scholes would require all of the options to have the same IV. If this constraint were imposed and the fitted IVs transformed back into prices, by construction the resulting risk-neutral density would be lognormal, and hence well-behaved. But because of the wellknown volatility smile, or skew in this market, the new prices would be systematically different from the observed market prices, especially in the left tail. Most option traders do not use the canonical form of the BS model, but instead use “practitioner Black– Scholes”, in which each option is allowed to have its own distinct implied volatility. Despite the theoretical inconsistency this introduces, the empirical volatility smile/skew is quite smooth and not too badly sloped, so it works well enough. Considerable research effort has been devoted to finding arbitrage-free theoretical models based on nonlognormal returns distributions, that produce volatility smiles resembling those found empirically. Inverting those (theoretical) smiles will also lead to option prices that produce well-behaved RNDs. Of course, if market prices do not obey the alternative theoretical model due to market noise, transforming through implied volatility space will not cure the problem. To moderate the effects of market imperfections in option prices, a smooth curve is fitted to the volatility smile/skew by least squares. Shimko used a simple quadratic function, but we prefer to allow greater flexibility with a higher order polynomial. Applying a cubic spline to interpolate the volatility smile still produces bad results for the fitted RND. The main reason for this is that an n-th degree spline constructs an interpolating curve consisting of segments of n-th order polynomials joined together at a set of “knot” points. At each of those points, the two curve segments entering from the left and the right are constrained to have the same value and the same derivatives up to order n-1. Thus, a cubic spline has no discontinuities in the level, slope or second derivative, meaning there will be no breaks, kinks, or even visible changes in curvature at its knot points. But when the interpolated IV curve is translated back into option strike-price space and the RND is constructed by taking the second derivative as in (5), the discontinuous third derivative of the IV curve becomes a discontinuous first derivative – a kink – in the RND. The simple solution is just to interpolate with a fourth order spline or higher.5 The other problem with using a standard n-th degree spline as an interpolating function is that it must pass through every knot point, which forces the curve to incorporate all pricing noise into the RND. As with K knot points, there will be K + n + 1 parameters to fit, this also requires applying enough constraints to the curve at its end-points to allow all of the parameters to be identified with only K data points. 
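The sketch below illustrates this interpolation-in-IV-space idea in a simplified form: a fourth-order polynomial is fitted to the IV smile by least squares (a stand-in for the fourth-order spline with a single at-the-money knot adopted later in the chapter), the smoothed IVs are converted back to call prices on a dense strike grid, and second differences give the RND. The function names, the grid size, and the use of a plain polynomial rather than a spline are assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, X, T, r, vol, q=0.0):
    """Black-Scholes call value, used only as the transformation between IV and price space."""
    d1 = (np.log(S / X) + (r - q + 0.5 * vol ** 2) * T) / (vol * np.sqrt(T))
    return S * np.exp(-q * T) * norm.cdf(d1) - X * np.exp(-r * T) * norm.cdf(d1 - vol * np.sqrt(T))

def rnd_via_iv_space(strikes, ivs, S, r, T, q=0.0, n_grid=2001):
    """Fit a quartic to the IV smile by least squares, convert the smoothed IVs back to call
    prices on a dense, equally spaced strike grid, and take second differences for the RND."""
    coefs = np.polyfit(strikes, ivs, deg=4)             # least-squares smoothing of the smile
    grid = np.linspace(strikes.min(), strikes.max(), n_grid)
    smooth_iv = np.polyval(coefs, grid)
    prices = np.array([bs_call(S, x, T, r, v, q) for x, v in zip(grid, smooth_iv)])
    dX = grid[1] - grid[0]
    f = np.exp(r * T) * (prices[2:] - 2.0 * prices[1:-1] + prices[:-2]) / dX ** 2
    return grid[1:-1], f
```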
5 As mentioned above, some researchers plot the IV smile against the option deltas rather than against the strike prices, which solves this problem automatically. Applying a cubic spline in delta-IV space produces a curve that is smooth up to second order in terms of the partial derivative of the option price, which makes it smooth up to third order in the price itself, eliminating any kinks in the RND.
Previous researchers have used a “smoothing spline” that allows a tradeoff between how close the curve is to the observed data points – it no longer goes through them exactly – and how well its shape conforms to the standard spline constraint that the
derivatives of the spline curve should be smooth across the knot points. For any given problem, the researcher must choose how this tradeoff is resolved by setting the value of a smoothness parameter.6 We depart somewhat from previous practice in this area. We have found that fitted RNDs behave very well using interpolation with just a fourth order polynomial – essentially a fourth degree spline with no knots. Additional degrees of freedom, that allow the estimated densities to take more complex shapes, can be added either by fitting higher order polynomials or by adding knots to a fourth order spline. In this exercise, we found very little difference from either of these modifications. We therefore have done all of the interpolation for our density estimation using fourth order splines with a single knot point placed at the money. Looking again at Table 15.1, we see that many of the bid and ask quotes are for options that are either very deep in the money or very deep out of the money. For the former case, the effect of optionality is quite limited, such that the IV might range from 12.9% to 14.0% within the bid-ask spread. For the lowest strike call, there is no IV at the bid price, because it is below the no-arbitrage minimum call price. The IV at the ask is 15.6%, whereas the IV at the midpoint, which is what goes into the calculations, is 11.8%. In addition to the wide bid-ask spreads, there is little or no trading in deep in the money contracts. On this day, no 1,050 or 1,075 strike calls were traded at all, and only three 1,150 strike calls changed hands. Most of the trading is in at the money or out of the money contracts. But out of the money contracts present their own data problems, because of extremely wide bid-ask spreads relative to their prices. The 925 strike put, for example, would have an IV of 22.3% at its bid price of 0.20 and 26.2% at the ask price of 0.70. Setting the IV for this option at 24.8% based on the mid-price of 0.45 is clearly rather arbitrary. One reason the spread is so wide is that there is very little trading of deep out of the money contracts. On this date, the only trades in puts with strikes of 925 or below were five contracts at a strike of 850, for a total option premium of no more than a couple hundred dollars. It is obvious that the quality of information about the risk-neutral density that can be extracted from the posted quotes on options that do not trade in the market may be quite limited. These observations suggest that it is desirable to limit the range of option strikes that are brought into the estimation process, eliminating those that are too deep in or out of the money. Also, as most trading is in at the money and somewhat out of the money contracts, we can broaden the range with usable data if we combine calls and puts together. The CBOE does this in their calculation of the VIX index, for example, combining calls and puts but using only out of the money contracts. To incorporate these ideas into our methodology, we first discard all options whose bid prices are less than 0.50. On this date, this eliminates calls with strikes of 1,325 and above, and puts with strikes of 925 and below. Next we want to combine calls and puts, using the out of the money contracts for each. But from Table 15.1, with the current index level at 1,183.74, if we simply use puts with strikes up to 1,180 and calls with 6 The procedure imposes a penalty function on the integral of the second derivative of the spline curve to make the fitted curve smoother. 
The standard smoothing spline technique still uses a knot at every data point, so it requires constraints to be imposed at the end-points. See Bliss and Panigirtzoglou (2002), Appendix A, for further information about this approach.
Fig. 15.5. Implied volatilities from all calls and puts, minimum bid price 0.50, fourth degree spline interpolation (1 knot). [Figure: traded call IVs, traded put IVs, and the fourth degree polynomial fitted to the combined IVs, plotted against the S&P 500 index level.]
strikes from 1,190 to 1,300, there will be a jump from the put IV of 14.2% to the call IV of 12.6% at the break point. To smooth out the effect of this jump at the transition point, we blend the call and put IVs in the region around the at the money index level. We have chosen a range of 20 points on either side of the current index value S0 in which the IV will be set to a weighted average of the IVs from the calls and the puts.7 Let Xlow be the lowest traded strike such that (S0 − 20) ≤ Xlow and Xhigh be the highest traded strike such that Xhigh ≤ (S0 + 20). For traded strikes between Xlow and Xhigh we use a blended value between IVput(X) and IVcall(X), computed as:

IVblend(X) = w IVput(X) + (1 − w) IVcall(X),    (10)

where

w = (Xhigh − X) / (Xhigh − Xlow).
In this case, we take put IVs for strikes up to 1,150, blended IVs for strikes 1,170 to 1,200, and call IVs for strikes from 1,205 up. Figure 15.5 plots the raw IVs from the traded options with markers and the interpolated IV curve computed from calls and puts whose bid prices are at least 0.50, as just described. 7 The choice of a 40 point range over which to blend the put and call IVs is arbitrary, but we believe that the specific choice has little impact on the overall performance of the methodology. On January 5, 2005, the discrepancy between the two IVs is about 0.015 in this range, which becomes distributed over the 40 point range of strikes at the rate of about 0.0004 per point. The effect on the fitted RND will be almost entirely concentrated around the midpoint, and it will be considerably smoother than if no adjustment were made and the IV simply jumped from the put value to the call value for the at the money strike. A reasonable criterion in setting the range for IV blending would be to limit it to the area before the IVs from the two sets begin to diverge, as Figure 15.5 illustrates happens when one of them gets far enough out of the money.
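As an illustration of the blending rule in equation (10), here is a small Python sketch; it is our own, with hypothetical function and variable names, and simply applies the linear weight between the put and call IVs inside the chosen blending window.

def blended_iv(strike, iv_put, iv_call, x_low, x_high):
    # Pure put IV below the window, pure call IV above it, linear blend inside
    if strike <= x_low:
        return iv_put
    if strike >= x_high:
        return iv_call
    w = (x_high - strike) / (x_high - x_low)   # weight from equation (10)
    return w * iv_put + (1.0 - w) * iv_call

With S0 = 1,183.74 the window runs from roughly 1,163.74 to 1,203.74, so a strike of 1,190, for example, would receive a mix of the put and call IVs rather than a discrete jump between them.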
This procedure produces an implied risk-neutral density with a very satisfying shape, based on prior expectations that the RND should be smooth. Even so, there might be some concern that we have smoothed out too much. We have no reason to rule out minor bumps in the RND that could arise when an important dichotomous future event is anticipated, such as the possibility of a cut in the Federal Reserve’s target interest rate, or alternatively, if there are distinct groups in the investor population with sharply divergent expectations. We have explored increasing flexibility by fitting fourth order splines using three knots, with one at the midpoint and the others 20 points above and below that price. The choice of how many knots to use and where to place them allows considerable latitude for the user. But we will see shortly that, at least in the present case, it makes very little difference to the results.
4.2. Incorporating market bid-ask spreads

The spline is fitted to the IV observations from the market by least squares. This applies equal weights to the squared deviation between the spline curve and the market IV evaluated at the midpoint of the bid-ask spread at all data points, regardless of whether the spline would fall inside or outside the quoted spread. Given the width of the spreads, it would make sense to be more concerned about cases where the spline fell outside the quoted spread than those remaining within it. To take account of the bid-ask spread, we apply a weighting function to increase the weighting of deviations falling outside the quoted spread relative to those that remain within it. We adapt the cumulative normal distribution function to construct a weighting function that allows weights between 0 and 1 as a function of a single parameter σ:

w(IV) = N[IV − IVAsk, σ]   if IVMidpoint ≤ IV
w(IV) = N[IVBid − IV, σ]   if IV ≤ IVMidpoint        (11)

Figure 15.6 plots an example of this weighting function for three values of σ. Implied volatility is on the x axis, with the vertical solid lines indicating a given option’s IV values at the market’s bid, ask, and midprice, 0.1249, 0.1331, and 0.1290, respectively. These values are obtained by applying the spline interpolation described above separately to the three sets of IVs, from the bid prices, the ask prices and the midprices at each traded strike level. In the middle range where call and put IVs are blended according to equation (10), the bid and ask IV curves from calls and puts are blended in the same way before the interpolation step. Setting σ to a very high value like 100 assigns (almost) equal weights of 0.5 to all squared deviations between the IV at the midpoint and the fitted spline curve at every strike. This is the standard approach that does not take account of the bid-ask spread. With σ = 0.005, all deviations are penalized, but those falling well outside the quoted spread are weighted about three times more heavily than those close to the midprice IV. Setting σ = 0.001 puts very little weight on deviations that are within the spread and close to the midprice IV, while assigning full weight to nearly all deviations falling outside the spread. This is our preferred weighting pattern to make use of the information contained in the quoted spread in the market.
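A minimal sketch of the weighting function in (11) follows, assuming N[·, σ] denotes a zero-mean normal CDF with standard deviation σ; the function name and the default σ = 0.001 are our own choices for illustration.

from scipy.stats import norm

def spread_weight(iv, iv_bid, iv_ask, iv_mid, sigma=0.001):
    # Weight on a squared deviation at fitted value iv, per equation (11):
    # deviations inside the quoted spread get little weight when sigma is small,
    # deviations well outside it get weight close to one.
    if iv >= iv_mid:
        return norm.cdf(iv - iv_ask, scale=sigma)
    return norm.cdf(iv_bid - iv, scale=sigma)

With the chapter's example values (iv_bid = 0.1249, iv_ask = 0.1331, iv_mid = 0.1290), sigma = 100 gives weights near 0.5 everywhere, reproducing the equal-weighted case.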
Fig. 15.6. Alternative weighting of squared deviations within and outside the bid-ask spread. [Figure: weight on squared deviation plotted against implied volatility, with vertical lines marking the IV at the bid, midprice, and ask; curves for equal weights (sigma = 100), increased penalty outside the bid-ask spread (sigma = 0.005), and very small penalty inside the bid-ask spread (sigma = 0.001).]
Figure 15.7 illustrates the effect of changing the degree of the polynomial, the number of knot points and the bid-ask weighting parameter used in the interpolation step. Lines in gray show densities constructed by fitting polynomials of degree 4, 6, and 8, with no knots and equal weighting of all squared deviations. The basic shape of the three curves is close, but higher order polynomials allow greater flexibility in the RND. This allows it to fit more complex densities, but also increases the impact of market noise. Consider the left end of the density. The missing portion of the left tail must be attached below 950, but it is far from clear how it should look to match the density obtained either from the eighth degree polynomial, which slopes sharply downward at that level, or from the sixth degree polynomial, which has a more reasonable slope at that point but whose estimated density is negative there. By contrast, the fourth order polynomial and all three of the spline functions produce very reasonably shaped RNDs that are so close together that they cannot be distinguished in the graph. Although these plots are for a single date, we have found similar results on nearly every date for which this comparison was done, which supports the choice of a fourth order spline with a single knot and with a very small relative weight on deviations that fall within the bid-ask spread in order to extract risk-neutral densities from S&P 500 index options.
4.3. Summary

The following steps summarize our procedure for extracting a well-behaved risk-neutral density from market prices for S&P 500 index options, over the range spanned by the available option strike prices.
Fig. 15.7. Densities constructed using alternative interpolation methods. [Figure: densities against the S&P 500 index level for 4th, 6th, and 8th degree polynomials and for 4th order splines with 1 knot (sigma = 0.001), 1 knot (sigma = 0.005), and 3 knots (sigma = 0.001).]
1. Begin with bid and ask quotes for calls and puts with a given expiration date.
2. Discard quotes for very deep out of the money options. We required a minimum bid price of $0.50 for this study.
3. Combine calls and puts to use only the out of the money and at the money contracts, which are the most liquid.
4. Convert the option bid, ask and midprices into implied volatilities using the Black–Scholes equation. To create a smooth transition from put to call IVs, take weighted averages of the bid, ask and midprice IVs from puts and calls in a region around the current at the money level, using equation (10).
5. Fit a spline function of at least fourth order to the midprice implied volatilities by minimizing the weighted sum of squared differences between the spline curve and the midprice IVs. The weighting function shown in equation (11) downweights deviations that lie within the market’s quoted bid-ask spread relative to those falling outside it. The number of knots should be kept small, and their optimal placement may depend on the particular data set under consideration. In this study we used a fourth order spline with a single knot at the money.
6. Compute a dense set of interpolated IVs from the fitted spline curve and then convert them back into option prices.
7. Apply the procedure described in Section 3 to the resulting set of option prices in order to approximate the middle portion of the RND.
8. These steps produce an empirical RND over the range between the lowest and highest strike price with usable data. The final step is to extend the density into the tails.
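To make steps 5 to 7 concrete, the sketch below strings them together in Python. It is illustrative only: an unweighted fourth degree polynomial stands in for the weighted one-knot spline of step 5, and the exp(rT) scaling of the second derivative follows the standard Breeden–Litzenberger relation rather than quoting the chapter's equation (5) directly; the function names and the grid step are our own choices.

import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, q, sigma):
    # Black-Scholes call price; used only to map the smoothed IVs back to prices
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    return S * np.exp(-q * T) * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d1 - sigma * np.sqrt(T))

def rnd_from_ivs(strikes, mid_ivs, S0, r, q, T, step=0.5):
    # Step 5 (simplified): smooth the midprice IVs with a 4th degree polynomial
    coefs = np.polyfit(strikes, mid_ivs, 4)
    # Step 6: dense grid of interpolated IVs, converted back to call prices
    grid = np.arange(min(strikes), max(strikes) + step, step)
    calls = bs_call(S0, grid, T, r, q, np.polyval(coefs, grid))
    # Step 7: second derivative of the call price with respect to the strike
    rnd = np.exp(r * T) * np.gradient(np.gradient(calls, grid), grid)
    return grid, rnd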
5. Adding tails to the risk-neutral density

The range of strike prices {X1, X2, . . . , XN} for which usable option prices are available from the market or can be constructed by interpolation does not extend very far into the tails of the distribution. The problem is further complicated by the fact that what we are trying to approximate is the market’s aggregation of the individual risk-neutralized subjective probability beliefs in the investor population. The resulting density function need not obey any particular probability law, nor is it even a transformation of the true (but unobservable) distribution of realized returns on the underlying asset. We propose to extend the empirical RND by grafting onto it tails drawn from a suitable parametric probability distribution in such a way as to match the shape of the estimated RND over the portion of the tail region for which it is available. The first question is which parametric probability distribution to use. Some of the earlier approaches to this problem implicitly assume a distribution. For example, the Black–Scholes implied volatility function can be extended by setting IV(X) = IV(X1) for all X < X1 and IV(X) = IV(XN) for all X > XN, where IV(·) is the implied volatility from the Black–Scholes model.8 This forces the tails to be lognormal. Bliss and Panigirtzoglou (2004) do something similar by employing a smoothing spline for the middle portion of the distribution but constraining it to become linear outside the range of the available strikes. Given the extensive empirical evidence of fat tails in returns distributions, constraining the tails of the RND to be lognormal is unlikely to be satisfactory in practice if one is concerned about modeling tail events accurately. Fortunately, similar to the way the Central Limit Theorem makes the normal a natural choice for modeling the distribution of the sample average from an unknown distribution, the Extreme Value distribution is a natural candidate for the purpose of modeling the tails of an unknown distribution. The Fisher–Tippett Theorem proves that under weak regularity conditions the largest value in a sample drawn from an unknown distribution will converge in distribution to one of three types of probability laws, all of which belong to the generalized extreme value (GEV) family.9 We will therefore use the GEV distribution to construct tails for the RND. The standard generalized extreme value distribution has one parameter ξ, which determines the tail shape. The GEV distribution function is

F(z) = exp[−(1 + ξz)^(−1/ξ)].    (12)
The value of ξ determines whether the tail comes from the Fréchet distribution with fat tails relative to the normal (ξ > 0), the Gumbel distribution with tails like the normal (ξ = 0), or the Weibull distribution (ξ < 0) with finite tails that do not extend out to infinity.
8 See, for example, Jiang and Tian (2005).
9 Specifically, let x1, x2, . . . be an i.i.d. sequence of draws from some distribution F and let Mn denote the maximum of the first n observations. If we can find sequences of real numbers an and bn such that the sequence of normalized maxima (Mn − bn)/an converges in distribution to some nondegenerate distribution H(x), i.e., P((Mn − bn)/an ≤ x) → H(x) as n → ∞, then H is a GEV distribution. The class of distribution functions that satisfy this condition is very broad, including all of those commonly used in finance. See Embrechts et al. (1997) or McNeil et al. (2005) for further detail.
Two other parameters, μ and σ, can be introduced to set location and scale of the distribution, by defining

z = (ST − μ) / σ.    (13)
Thus we have three GEV parameters to set, which allows us to impose three conditions on the tail. We will use the expressions FEVL(·) and FEVR(·) to denote the approximating GEV distributions for the left and right tails, respectively, with fEVL(·) and fEVR(·) as the corresponding density functions, and the same notation without the L and R subscripts when referring to both tails without distinction. FEMP(·) and fEMP(·) will denote the estimated empirical risk-neutral distribution and density functions. Let X(α) denote the exercise price corresponding to the α-quantile of the risk-neutral distribution. That is, FEMP(X(α)) = α. We first choose the value of α at which the GEV tail is to begin, and then a second, more extreme point on the tail that will be used in matching the GEV tail shape to that of the empirical RND. These values will be denoted α0R and α1R, respectively, for the right tail and α0L and α1L for the left. The choice of α0 and α1 values is flexible, subject to the constraint that we must be able to compute the empirical RND at both points, which requires X2 ≤ X(α1L) and X(α1R) ≤ XN−1. However, the GEV will fit the more extreme tail of an arbitrary distribution better than the near tail, so there is a tradeoff between data availability and quality, which would favor less extreme values for α0 and α1, versus tail fit, which would favor more extreme values. Consider first fitting a GEV upper tail for the RND. The first condition to be imposed is that the total probability in the tail must be the same for the RND and the GEV approximation. We also want the GEV density to have the same shape as the RND in the area of the tail where the two overlap, so we use the other two degrees of freedom to set the two densities equal at α0R and α1R. The three conditions for the right tail are shown in equations (14a–c):

FEVR(X(α0R)) = α0R,    (14a)
fEVR(X(α0R)) = fEMP(X(α0R)),    (14b)
fEVR(X(α1R)) = fEMP(X(α1R)).    (14c)
The GEV parameter values that will cause these conditions to be satisfied can be found easily using standard optimization procedures. Fitting the left tail of the RND is slightly more complicated than the right tail. As the GEV is the distribution of the maximum in a sample, its left tail relates to probabilities of small values of the maximum, rather than to extreme values of the sample minimum, i.e., the left tail. To adapt the GEV to fitting the left tail, we must reverse it left to right, by defining it on −z. That is, z values must be computed from (15) in place of (13):

z = ((−μL) − ST) / σ    (15)
where μL is the (positive) value of the location parameter for the left tail GEV. (The optimization algorithm will return the location parameter μ ≡ −μL as a negative number.)10 The optimization conditions for the left tail become

FEVL(−X(α0L)) = 1 − α0L,    (16a)
fEVL(−X(α0L)) = fEMP(X(α0L)),    (16b)
fEVL(−X(α1L)) = fEMP(X(α1L)).    (16c)
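The matching conditions in (14a)–(14c), and their left-tail counterparts in (16a)–(16c) after the sign reversal in (15), can be solved with any standard root-finder. The sketch below is our own illustration: it codes the GEV distribution and density from (12)–(13) and hands the three right-tail conditions to scipy's fsolve. The function names and starting values are assumptions, and the empirical-density inputs must come from the fitted RND at the chosen connection points.

import numpy as np
from scipy.optimize import fsolve

def gev_cdf(x, mu, sigma, xi):
    # GEV distribution function from equations (12)-(13); valid where 1 + xi*z > 0
    z = (x - mu) / sigma
    return np.exp(-(1.0 + xi * z) ** (-1.0 / xi))

def gev_pdf(x, mu, sigma, xi):
    # Density: the derivative of gev_cdf with respect to x
    z = (x - mu) / sigma
    t = (1.0 + xi * z) ** (-1.0 / xi)
    return (1.0 / sigma) * t ** (xi + 1.0) * np.exp(-t)

def fit_right_tail(x_a0, x_a1, alpha0, f_emp_a0, f_emp_a1, start=(1200.0, 40.0, -0.1)):
    # Solve (14a)-(14c): match the tail probability at the first connection point
    # and the empirical density at both connection points
    def conditions(params):
        mu, sigma, xi = params
        return [gev_cdf(x_a0, mu, sigma, xi) - alpha0,
                gev_pdf(x_a0, mu, sigma, xi) - f_emp_a0,
                gev_pdf(x_a1, mu, sigma, xi) - f_emp_a1]
    return fsolve(conditions, start)

For the January 5, 2005 example discussed below, the call would be something like fit_right_tail(1271.50, 1283.50, 0.92, f0, f1), where f0 and f1 are the empirical RND values at those two strikes; the left tail is fitted the same way after flipping the sign of the X values.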
Our initial preference was to connect the left and right tails at α0 values of 5% and 95%, respectively. However, for the S&P 500 index options in the sample that will be analyzed below, market prices for options with the relevant exercise prices were not always available for the left tail and rarely were for the right tail. We have therefore chosen default values of α0L = 0.05 and α0R = 0.92, with α1L = 0.02 and α1R = 0.95 as the more remote connection points. In cases where data were not available for these α values, we set α1L = FEMP(X2), the lowest connection point available from the data, and α0L = α1L + 0.03. For the right tail, α1R = FEMP(XN−1), and α0R = α1R − 0.03. On January 5, 2005, the 5% and 2% quantiles of the empirical RND fell at 1,044.00 and 985.50, respectively, and the 92% and 95% right-tail quantiles were 1,271.50 and 1,283.50, respectively.11 The fitted GEV parameters that satisfied equations (14) and (16) were as follows:
Left tail: μ = 1274.60, σ = 91.03, ξ = −0.112;
Right tail: μ = 1195.04, σ = 36.18, ξ = −0.139.
Figure 15.8 plots three curves: the middle portion of the empirical RND extracted from the truncated set of options prices with interpolation using a fourth degree spline with one knot at the money and bid-ask weighting parameter σ = 0.001, as shown in Figure 15.7, and the two GEV distributions whose tails have been matched to the RND at the four connection points. As the figure illustrates, the GEV tail matches the empirical RND very closely in the region of the 5% and 92% tails. Figure 15.9 shows the resulting completed RND with GEV tails. 10 The
procedure as described works well for fitting tails to a RND that is defined on positive X values only, as it is when X refers to an asset price ST , or a simple gross return ST /S0 . Fitting a RND in terms of log returns, however, raises a problem that it may not be possible to fit a good approximating GEV function on the same support as the empirical RND. This difficulty can be dealt with by simply adding a large positive constant to every X value to shift the empirical RND to the right for fitting the tails, and then subtracting it out afterwards, to move the completed RND back to the right spot on the x axis. 11 With finite stock price increments in the interpolation, these quantiles will not fall exactly on any Xn . We therefore choose n at the left-tail connection points such that Xn−1 ≤ X(α) < Xn and set the actual quantiles α0L and α1L equal to the appropriate actual values of the empirical risk neutral distribution and density at Xn . Similarly, the right connection points are set such that Xn−1 < X(α) ≤ Xn .
Fig. 15.8. Risk-neutral density and fitted GEV tail functions. [Figure: the empirical RND, the left tail GEV function, the right tail GEV function, and the 2%, 5%, 92%, and 95% connection points, plotted against the S&P 500 index level.]
Fig. 15.9. Full estimated risk-neutral density function for January 5, 2005. [Figure: the empirical RND with the left and right GEV tails appended, plotted against the S&P 500 index level.]
6. Estimating the risk-neutral density for the S&P 500 from S&P 500 index options

We applied the methodology described above to fit risk-neutral densities for the Standard and Poor’s 500 stock index using S&P 500 index call and put options over the period January 4, 1996–February 20, 2008. In this section we will present interesting preliminary
results on some important issues, obtained from analyzing these densities. The purpose is to illustrate the potential of this approach to generate valuable insights about how investors’ information and risk preferences are incorporated in market prices. The issues we consider are complex and we will not attempt to provide in-depth analysis of them in this chapter. Rather, we offer a small set of what we hope are tantalizing “broad brush” results that suggest directions in which further research along these lines is warranted. Specifically, we first examine the moments of the fitted RNDs and compare them to the lognormal densities assumed in the Black–Scholes model. We then look at how the RND behaves dynamically, as the level of the underlying index changes.
6.1. Data sample Closing bid and ask option prices data were obtained from Optionmetrics through the WRDS system. The RND for a given expiration date is extracted from the set of traded options with that maturity, and each day’s option prices provide an updated RND estimate for the same expiration date. We focus on the quarterly maturities with expirations in March, June, September, and December, which are the months with the most active trading interest.12 The data sample includes option prices for 49 contract maturities and 2,761 trading days. We construct RNDs, updated daily, for each quarterly expiration, beginning immediately after the previous contract expires and ending when the contract has less than two weeks remaining to maturity. Very short maturity contracts were eliminated because we found that their RNDs are often badly behaved. This may be partly due to price effects from trading strategies related to contract expiration and rollover of hedge positions into later expirations. Also, the range of strikes for which there is active trading interest in the market gets much narrower as expiration approaches. We computed Black–Scholes IVs using the closing bid and ask prices reported by Optionmetrics. Optionmetrics was also the source for the riskless rate and dividend yield data, which are also needed in calculating forward values for the index on the option maturity dates.13 Table 15.2 provides summary information on the data sample and the estimated tail parameters. During this period, the S&P index ranged from a low of just under 600 to a high of 1,565.20, averaging 1,140.60. Contract maturities were between slightly over three months down to 14 days, with an average value of about 54 days. The number of market option prices available varied from day to day and some of those for which prices were reported were excluded because implied volatilities could not be computed (typically because the option price violated a no-arbitrage bound). The numbers of usable calls and puts averaged about 46 and 42 each day, respectively. We eliminated those with bid prices in the market less than $0.50. The excluded deep out of the money contracts are quite illiquid and, as Table 15.1 shows, their bid-ask spreads 12 The CBOE lists contracts with maturities in the next three calendar months plus three more distant months from the March–June–September–December cycle, meaning that off-month contracts such as April and May are only introduced when the time to maturity is less than three months. 13 Optionmetrics interpolates US dollar LIBOR to match option maturity and converts it into a continuously compounded rate. The projected dividends on the index are also converted to a continuous annual rate. See the Optionmetrics Manual (2003) for detailed explanations of how Optionmetrics handles the data.
Table 15.2. Summary statistics on fitted S&P 500 risk-neutral densities, January 4, 1996–February 20, 2008

                            Average    Standard deviation    Minimum    Maximum
S&P index                   1140.60    234.75                598.48     1565.20
Days to expiration          54.2       23.6                  14         94
Number of option prices
  # calls available         46.2       17.6                  8          135
  # calls used              37.6       15.4                  7          107
  IVs for calls used        0.262      0.180                 0.061      3.101
  # puts available          41.9       15.0                  6          131
  # puts used               32.9       12.4                  6          114
  IVs for puts used         0.238      0.100                 0.062      1.339
Left tail
  α0L connection point      0.8672     0.0546                0.6429     0.9678
  ξ                         0.0471     0.1864                −0.8941    0.9620
  μ                         1.0611     0.0969                0.9504     2.9588
  σ                         0.0735     0.0920                0.0020     2.2430
Right tail
  α0R connection point      1.0900     0.0370                1.0211     1.2330
  ξ                         −0.1800    0.0707                −0.7248    0.0656
  μ                         1.0089     0.0085                0.8835     1.0596
  σ                         0.0416     0.0175                0.0114     0.2128
Tail parameters refer to the risk-neutral density expressed in terms of gross returns, ST /S0 . “# calls (puts) available” is the number for which it was possible to compute implied volatilities. “# calls (puts) used” is the subset of those available that had bid prices of $0.50 and above.
are very wide relative to the option price. On average about 38 calls and 33 puts were used to fit a given day’s RND, with a minimum of six puts and seven calls. Their implied volatilities averaged around 25%, but covered a very wide range of values. The tail parameters reported in the table relate to risk-neutral densities estimated on gross returns, defined as ST /S0 , where S0 is the current index level and ST is the index on the contract’s expiration date. This rescaling makes it possible to combine RNDs from different expirations so that their tail properties can be compared. Under Black–Scholes assumptions, these simple returns should have a lognormal distribution. For the left tail, if sufficient option price data are available, the connection point is set at the index level where the empirical RND has cumulative probability of α0L = 5%. This averaged 0.8672, i.e., a put option with that strike was about 13% out of the money. The mean value of the fitted left-tail shape parameter ξ was 0.0471, which makes the left-tail shape close to the normal on average, but with a fairly large standard deviation. Note that this does not mean the RND is not fattailed relative to the normal, as we will see when we look at its excess kurtosis in Table 15.3, only that the extreme left tail of the RND defined on simple returns is not fat on average. Indeed, as the RND defined on gross returns is bounded below by 0, the true left tail must be thin-tailed relative to the normal, asymptotically.
Table 15.3. Summary statistics on the risk-neutral density for returns on the S&P 500, January 4, 1996–February 20, 2008

                                          Mean     Std Dev   Quantile
                                                             0.10      0.25      0.50      0.75      0.90
Expected return to expiration             0.61%    0.41%     0.13%     0.25%     0.52%     0.92%     1.22%
Expected return annualized                4.05%    1.89%     1.08%     2.03%     4.88%     5.46%     5.93%
Excess return relative to the riskless
  rate, annualized                        −0.21%   0.43%     −0.57%    −0.30%    −0.16%    −0.04%    0.10%
Standard deviation                        7.55%    2.86%     4.13%     5.49%     7.22%     9.34%     11.40%
Standard deviation annualized             20.10%   5.82%     12.80%    15.56%    19.67%    23.79%    27.57%
Skewness                                  −1.388   0.630     −2.165    −1.651    −1.291    −0.955    −0.730
Excess kurtosis                           6.000    6.830     1.131     2.082     3.806     7.221     13.449
Skewness of RND on log returns            −2.353   1.289     −3.940    −2.834    −2.020    −1.508    −1.202
Excess kurtosis of RND on log returns     20.516   28.677    2.929     4.861     10.515    23.872    49.300
The table summarizes properties of the risk-neutral densities fitted to market S&P 500 Index option prices, with GEV tails appended, as described in the text. The period covers 2,761 days from 49 quarterly options expirations, with between 14 and 94 days to expiration. The RNDs are fitted in terms of gross return, ST /S0 . Excess return relative to the riskless rate is the mean return, including dividends, under the RND minus LIBOR interpolated to match the time to expiration. Excess kurtosis is the kurtosis of the distribution minus 3.0. Skewness and excess kurtosis of RND on log returns are those moments from the fitted RNDs transformed to log returns, defined as log(ST /S0 ).
The right connection point averaged 1.0900, i.e., where a call was 9% out of the money. The tail shape parameter ξ was negative for the right tail, implying a short-tailed distribution with a density that hits zero at a finite value. This result was very strong: although the fitted values for ξ varied over a fairly wide range, with a standard deviation of 0.0707, only 1 out of 2,761 ξ estimates for the right tail was positive. Comparing the σ estimates for the left and right tails, we see that the typical GEV approximations generally resemble those shown for January 5, 2005 in Figures 15.8 and 15.9, with the left tail coming from a substantially wider distribution than the right tail.
6.2. Moments of the risk-neutral density

Table 15.3 displays summary statistics on the moments of the fitted S&P 500 risk-neutral densities. The table, showing the mean, standard deviation, and several quantiles of the distribution of the first four moments of the fitted densities within the set of 2,761 estimated RNDs, provides a number of interesting results. The mean risk-neutralized expected return including dividends was 0.61%, over time horizons varying from three months down to two weeks. At annualized rates, this was 4.05%, but with a standard deviation of 1.89%. The quantile results indicate that the range of expected returns was fairly wide. Perhaps more important is how the return option traders expected to earn compared to the riskless rate. Under risk-neutrality, the expected return on any security, including the stock market portfolio, should be equal to the riskless interest rate, but the third row of Table 15.3 shows that on average, option traders expected a risk-neutralized return 21 basis points below the riskless rate (using LIBOR as the proxy for that rate). The discrepancy was distributed over a fairly narrow range, however, with more than 95% of the values between −1% and +1%. Skewness of the RND defined over returns was strongly negative. In fact, the skewness of the RND was negative on every single day in the sample. Under Black–Scholes assumptions, the distribution of gross returns is lognormal and risk-neutralization simply shifts the density to the left so that its mean becomes the riskless rate. The skewness result in Table 15.3 strongly rejects the hypothesis that the risk-neutral density is consistent with the standard model. Kurtosis was well over 3.0, indicating the RNDs were fat-tailed relative to the normal, although the nonzero skewness makes this result difficult to interpret clearly. To explore these results a little further, we converted the RNDs defined on terminal index levels to RNDs for log returns, defined as r = log(ST/S0).14 This would yield a normal distribution if returns were lognormal. The results for skewness and excess kurtosis are shown in Table 15.3 for comparison, and they confirm what we have seen for gross returns. The RND defined on log returns is even more strongly left-skewed and
14 Let x be a continuous r.v. with density fX(·). Let y = g(x) be a one-to-one transformation of x such that the derivative of x = g−1(y) with respect to y is continuous. Then Y = g(X) is a continuous r.v. with density

fY(y) = [d g−1(y)/dy] fX(g−1(y)).

In our case, r = g(S) = ln(S/S0). Therefore, RNDr(r) = S × RNDS(S), where r = ln(S/S0).
excess kurtosis is increased. The RND was fat-tailed relative to the normal on every single day in the sample.
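The conversion from a density on index levels to a density on log returns is just the change of variables in footnote 14; the short sketch below (our own illustration, with hypothetical names) makes the bookkeeping explicit for a density evaluated on a grid of index levels.

import numpy as np

def rnd_on_log_returns(s_grid, f_s, s0):
    # r = log(S/S0) and, by the change-of-variables formula, f_r(r) = S * f_S(S)
    r_grid = np.log(s_grid / s0)
    f_r = s_grid * f_s
    return r_grid, f_r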
6.3. The dynamic behavior of the S&P 500 risk-neutral density

How does the RND behave when the underlying Standard and Poor’s 500 index moves? The annualized excess return shown in Table 15.3 implies that the mean of the RND (in price space) is approximately equal to the forward price of the index on average. A reasonable null hypothesis would therefore be that if the forward index value changes by some amount ΔF, the whole RND will shift left or right by that amount at every point. This does not appear to be true. Table 15.4 reports the results of regressing the changes in quantiles of the RND on the change in the forward value of the index. Regression (17) was run for each of 11 quantiles, Qj, j = 1, . . . , 11, of the risk-neutral densities:

ΔQj(t) = a + b ΔF(t).    (17)
Under the null hypothesis that the whole density shifts up or down by Δ F(t) as the index changes, the coefficient b should be 1.0 for all quantiles. When (17) is estimated on all observations, all b coefficients are positive and highly significant, but they show a clear negative and almost perfectly monotonic relationship between the quantile and the size of b. When the index falls, the left end of the curve drops by more than the change in the forward index and the right end moves down by substantially less. For example, a 10 point drop in the forward index leads to about a 14 point drop in the 1% and 2% quantiles, but the upper quantiles, 0.90 and above, go down less than 8 points. Similarly, when the index rises the lower quantiles go up further than the upper quantiles. Visually, the RND stretches out to the left when the S&P drops, and when the S&P rises the RND tends to stack up against its relatively inflexible upper end. The next two sets of results compare the behavior of the quantiles between positive and negative returns. Although the same difference in the response of the left and right tails is present in both cases, it is more pronounced when the market goes down than when it goes up. To explore whether a big move has a different impact, the last two sets of results in Table 15.4 report regression coefficients fitted only on days with large negative returns, below −1.0%, or large positive returns greater than +1.0%. When the market falls sharply, the effect on the left tail is about the same as the overall average response to both up and down moves, but the extreme right tail moves distinctly less than for a normal day. By contrast, if the market rises more than 1.0%, the left-tail effect is attenuated whereas the right tail seems to move somewhat more than for a normal day. These interesting and provocative results on how the RND responds to and reflects the market’s changing expectations and (possibly) risk attitudes as prices fluctuate in the market warrant further investigation. One potentially important factor here is that the biggest differences are found in the extreme tails of the RND, in the regions where the empirical RND has been extended with GEV tails. What we are seeing may be a result of changes in the shape of the empirical RND at its ends when the market makes a big move, which the GEV tails then try to match. Note, however, that the empirically observed portion of the RND for the full sample shows the strong monotonic
Table 15.4. Regression of change in quantile on change in the forward S&P index level

Quantile:                         0.01           0.02           0.05           0.10            0.25            0.50
All observations (Nobs = 2712)    1.365 (58.65)  1.412 (72.43)  1.385 (98.62)  1.297 (180.26)  1.127 (269.44)  0.974 (269.88)
Negative return (Nobs = 1298)     1.449 (30.07)  1.467 (35.53)  1.404 (46.96)  1.291 (88.75)   1.119 (128.75)  0.975 (130.77)
Positive return (Nobs = 1414)     1.256 (26.88)  1.308 (34.32)  1.340 (48.97)  1.306 (88.04)   1.148 (137.21)  0.978 (134.38)
Return < −1.0% (Nobs = 390)       1.352 (12.64)  1.390 (14.26)  1.368 (19.62)  1.282 (39.18)   1.140 (54.80)   1.001 (59.48)
Return > 1.0% (Nobs = 395)        1.106 (11.36)  1.194 (14.05)  1.292 (20.28)  1.310 (35.88)   1.173 (61.28)   0.988 (57.29)

Quantile:                         0.75            0.90            0.95           0.98           0.99
All observations (Nobs = 2712)    0.857 (272.05)  0.773 (131.16)  0.730 (88.75)  0.685 (60.08)  0.659 (46.60)
Negative return (Nobs = 1298)     0.867 (134.93)  0.780 (64.80)   0.727 (43.13)  0.661 (28.37)  0.613 (21.45)
Positive return (Nobs = 1414)     0.845 (131.44)  0.756 (62.87)   0.720 (43.00)  0.696 (29.88)  0.691 (23.75)
Return < −1.0% (Nobs = 390)       0.879 (61.05)   0.756 (27.71)   0.670 (17.94)  0.559 (11.28)  0.478 (8.12)
Return > 1.0% (Nobs = 395)        0.843 (55.29)   0.756 (26.27)   0.726 (18.38)  0.710 (13.11)  0.712 (10.53)

Regression equation: ΔRNDQ(t) = a + bΔF(t). The table shows the estimated b coefficient; t-statistics in parentheses.
coefficient estimates throughout its full range, so the patterns revealed in Table 15.4 are clearly more than simply artifacts of the tail fitting procedure.
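For readers who want to reproduce regressions like those in Table 15.4, a minimal sketch of equation (17) follows. It is our own illustration: the inputs are assumed to be the daily change in a given RND quantile and the corresponding change in the forward index level.

import numpy as np

def quantile_shift_beta(dq, df):
    # OLS slope b in equation (17): regress the change in an RND quantile
    # on the change in the forward index level
    X = np.column_stack([np.ones_like(df), df])
    coef, *_ = np.linalg.lstsq(X, dq, rcond=None)
    return coef[1]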
7. Concluding comments We have proposed a comprehensive approach for extracting a well-behaved estimate of the risk-neutral density over the price or return of an underlying asset, using the market prices of its traded options. This involves two significant technical problems: first, how best to obtain a sufficient number of valid option prices to work with, by smoothing the market quotes to reduce the effect of market noise, and interpolating across the relatively sparse set of traded strike prices; and second, how to complete the density functions by extending them into the tails. We explored several potential solutions for the first problem and settled on converting market option price quotes into implied volatilities, smoothing and interpolating them in strike price-implied volatility space, converting back to a dense set of prices, and then applying the standard methodology to extract the middle portion of the risk-neutral density. We then addressed the second problem by appending left and right tails from a generalized extreme value distribution in such a way that each tail contains the correct total probability and has a shape that approximates the shape of the empirical RND in the portion of the tail that was available from the market. Although the main concentration in this chapter has been on developing the estimation technology, the purpose of the exercise is ultimately to use the fitted RND functions to learn more about how the market prices options, how it responds to the arrival of new information, and how market risk preferences behave and vary over time. We presented results showing that the risk-neutral density for the S&P 500 index, as reflected in its options, is far from the lognormal density assumed by the Black–Scholes model – it is strongly negatively skewed and fat-tailed relative to the (log)normal. We also found that when the underlying index moves, the RND not only moves along with the index, but it also changes shape in a regular way, with the left tail responding much more strongly than the right tail to the change in the index. These results warrant further investigation, for the S&P 500 and for other underlying assets that have active options markets. The following is a selection of such projects that are currently under way. The Federal Reserve announces its Federal funds interest rate target and policy decisions at approximately 2:15 in the afternoon at the end of its regular meeting, about every six weeks. This is a major piece of new information and the market’s response is immediate and often quite extreme. Using intraday options data, it is possible to estimate real time RNDs that give a very detailed picture of how the market’s expectations and risk preferences are affected by the information release. The volatility of the underlying asset is a very important input into all modern option pricing models, but volatility is hard to predict accurately and there are a number of alternative techniques in common use. There are also a number of index-based securities that are closely related to one another and should therefore have closely related volatilities. The RND provides insight into what the market’s expected volatility is, and how it
is connected to other volatility measures, like realized historical volatility, volatility estimated from a volatility model such as GARCH, realized future volatility over the life of the option, implied volatility from individual options or from the VIX index, volatility of S&P index futures prices, implied volatility from futures options, volatility of the SPDR tracking ETF, etc. Yet another important issue involves causality and predictive ability of the riskneutral density. Does the information contained in the RND predict the direction and volatility of future price movements, or does it lag behind and follow the S&P index or the S&P futures price? We hope and anticipate that the procedure we have developed here can be put to work in these and other projects, and will ultimately generate valuable new insights into the behavior of financial markets.
16
A New Model for Limit Order Book Dynamics
Jeffrey R. Russell and Taejin Kim
1. Introduction

Nearly half the world’s stock exchanges are organized as order-driven markets such as Electronic Communications Networks or ECNs. These markets are purely electronic with no designated specialists or market makers. In the absence of a market maker, prices are completely determined by limit orders submitted by market participants. Hence, for these markets, the structure of the limit order book, the quantity of shares available for immediate execution at any given price, determines the cost of immediate order execution. The dynamics of the limit order book, therefore, determine how this cost varies over time. Despite the prevalence of these markets there are remarkably few models for the determinants of the structure of the limit order book and its dynamics. This chapter proposes a new dynamic model for the determinants of the structure of the limit order book as determined by the state of the market and asset characteristics. There is a substantial literature that has examined specific features of the limit order book. One literature has examined limit order placement strategies of individual investors. Examples include Biais, Hillion, and Spatt (1995), Coppejans and Domowitz (2002), Ranaldo (2004), and Hall and Hautsch (2004). This approach provides insight into the microbehavior of decisions, but provides only indirect evidence about the overall structure of the limit order book. A second literature has focused on depth (the number of shares available) at the best bid and the best ask. Bollerslev, Domowitz and Wang (1997) propose a dynamic model for the best bid and best ask conditional on order flow. Kavajecz (1999) decomposes depth into specialist and limit order components. He shows that depth at the best bid and best ask are reduced during periods of uncertainty and possible private information. Acknowledgments: We thank Tim Bollerslev and Mark Watson for comments on a previous draft.
The existing literature can address specific questions regarding the limit order book but, in the end, cannot provide direct answers to questions like “what is the expected cost of buying 2,000 shares of Google one minute from now?”. Answers to these questions require a more complete model of the limit order book that models the entire structure, not just a component. These answers will clearly be useful when considering optimal trade execution strategies such as those considered in Almgren and Chriss (2000) who, under parametric assumptions regarding price dynamics, derive a closed form expression for optimal order execution. Furthermore, Engle and Ferstenberg show that optimally executing an order by breaking it up and spreading the execution over time induces a risk component that can be analyzed in the context of classic mean variance portfolio risk. This risk and time to execution is tied to the future shape of the limit order book. Engle, Ferstenberg and Russell (2008) empirically evaluate this mean variance tradeoff, but clearly optimal execution strategies will benefit from rigorous models for the dynamics of the limit order book. This chapter therefore diverges from the existing literature that focuses on specific features of the limit order book. The limit order book is a set of quantities to be bought or sold at different prices and we propose directly modeling the time-varying demand and supply curves. The forecast from the model is therefore a function producing expected quantities over a range of prices as a function of the history of the limit order book and market and asset conditions. The model, therefore, can directly answer the questions regarding the expected cost of a purchase (or sale) in 1 minute. The model is parameterized in a way that allows for easy interpretation and therefore the model is useful in assessing and interpreting how market conditions affect the shape of the limit order book and therefore liquidity. The distribution of depth across the limit order book is modeled by a time-varying normal distribution and therefore depends on two time-varying parameters. The first determines the average distance that the depth lies away from the midquote. As this parameter increases, market liquidity tends to decrease. The second parameter determines how spread out the depth is. Larger values of this parameter lead to a flatter limit order book. These parameters are made time-varying in an autoregressive manner so that the shape of the limit order book next period depends on the shape of the limit order book in the previous period and possibly other variables that characterize the market condition. The ease of interpretation of the proposed model differentiates it from the ACM model proposed by Russell and Engle (2005). The Probit structure of the model is in the spirit of the time series models proposed by Hausman, Lo, and MacKinlay (1992) and Bollerslev and Melvin (1994) although the specific dynamics and application are new. The model is applied to one month of limit order book data. The data come from the Archipelago Exchange. Model estimates are presented for limit order book dynamics at 1 minute increments. We find that the limit order book exhibits very strong persistence, suggesting that new limit orders are slow to replenish the book. We also find that depth tends to move away from the midquote, so that the market becomes less liquid, following larger spreads, smaller trade volume, higher transaction rates, and higher volatility.
We also find that the book tends to become more dispersed (flatter) when spreads are low, trade size is large, transaction rates are high, and volatility is high.
2. The model

This section presents a model for the distribution of the number of shares available across different prices. Our approach decomposes the limit order book into two components: the total depth in the market and the distribution of that depth across the multiple prices. We begin with some notation. Let the midquote at time t be denoted by mt. Next, we denote a grid for N prices on the ask and bid sides. The ith ask price on the grid is denoted by pait and the ith bid price is denoted by pbit. pa1t is the first price at or above the midquote at which depth can be listed and similarly, pb1t is the first price below the midquote at which depth can be listed. We will treat the grid as being equally spaced so that each consecutive price on the ask side is a fixed unit above the previous price. The grid accounts for the fact that available prices in most markets are restricted to fall on values at fixed tick sizes. Hence, the smallest increment considered would be that of the tick size although larger increments could be considered as well. Finally, we define the total number of shares available in each price bin. On the ask side, ait denotes the total depth available in the ith bin. a1t is the shares available in the limit order book at prices p where (pa1t ≤ p ≤ pa2t) and for i > 1, ait denotes the shares available at prices p where (pait < p ≤ pai+1t). A similar notation is used for the bid side of the market where bit denotes the shares available in the limit order book on the bid side. The grid is anchored at the midquote so the grid has a time subscript. In both cases, larger values of i are associated with prices further away from the midquote. Our goal is to specify a model for the expected shares available in each bin given the state of the market and perhaps characteristics of the asset. For a small number of bins (small N) the depth could be modeled by standard time series techniques such as a VAR. These approaches quickly become intractable when N is more than one or two. Additionally, it is difficult to directly interpret the results of a VAR in the relevant context of liquidity. We take a different approach that decomposes the problem into two components. Define the total shares in the limit order book over the first N bins as

Dta = Σ_{i=1}^{N} ait.

We decompose the model for the limit order book into shape and level components. Given the total shares in the limit order book, define

πit = E[ ait / Dta | Dta ]    (1)

as the expected fraction of the depth Dta in bin i, at time t. Given the total shares, the expected depth in bin i at time t is given by

E(ait | Dta) = πit Dta.    (2)
Differences in depth across bins are driven by the π terms. Hence, this decomposition separates the model for the limit order book into a shape component described by the πs and a level given by the overall depth, Dta. In general, both the shape of the limit order book and the total shares available, Dta, will depend on characteristics of the asset and market conditions. Let Ft−1 denote an information set available at time t−1, and let g(Dta | Ft−1) denote a model for the time-varying total shares. We can now generalize (1) and (2) to allow for time-varying probabilities, time-varying total shares, Dta, and a time-varying limit order book:

πit = E[ ait / Dta | Dta, Ft−1 ].    (3)

The one step ahead predicted depth is then given by

E(ait | Ft−1) = ∫_D πit g(Dta | Ft−1) dD.    (4)
Hence, the limit order book can be modeled using a multinomial model for (3) and a univariate time series model for g(Dta | Ft−1). The latter is a univariate series that could be modeled with standard time series models such as an ARMA model. The new part here is, therefore, to find a good model for the multinomial probabilities. The goal in specifying the multinomial model is to find a model that fits the data well, is easily interpreted, and allows for N to be large without requiring a large number of parameters. The limit order book clearly exhibits dependence especially when viewed over short time periods. The model must, therefore, be specified in a flexible way so that the shape depends on the history of the limit order book. Our model is formulated using a multinomial probit model. For the probit model, the multinomial probabilities are determined by areas under the normal density function. These probabilities are time-varying when the mean and variance of the Normal density are time-varying. Specifically, given a mean μt and a variance σt², the probability is given by:

πit = [Φt(pit − mt) − Φt(pi−1t − mt)] / [Φt(pNt − mt) − Φt(0)]

where Φt is the cumulative distribution function for a Normal(μt, σt²). The denominator simply normalizes the probabilities to sum to one. If the grid is set on ticks, then this would correspond to the fraction of the depth that lies on the ith tick above the midquote. This parameterization is convenient to interpret. Clearly as μt increases, the center of the distribution moves away from the midquote. Therefore, larger values of μt are associated with depth lying, on average, further from the midquote. This would correspond to a less liquid market. As σt² increases, the Normal density becomes flatter, spreading out the probability more evenly across the N bins. As σt² goes to infinity the probabilities become equal. An increase or decrease in either the mean or the variance is, therefore, easily interpreted in terms of the average distance that the depth lies from the midquote and how spread out the depth is across the N bins. We now turn to the dynamics of the mean and variance. As the shape of the limit order book will be highly dependent, especially over short time intervals, we begin with the simplest version of the model using an autoregressive structure for the mean and variance. At each time period t, we can calculate the center of the empirical distribution of the depth. This is given by

x̄t = (1/Dta) Σ_{i=1}^{n} (pit − mt) ait.

The difference between the actual mean and the predicted mean is given by

εt = x̄t − Σ_{i=1}^{n} πit (pit − mt).

Similarly, we can compute the empirical variance of the depth across the bins as

s²t = (1/Dta) Σ_{i=1}^{n} (pit − x̄t)² ait

and the associated error is given by

ηt = s²t − Σ_{i=1}^{n} πit (pit − x̄t)².

If the model is correctly specified then both error terms will be serially uncorrelated. These errors are used to build an autoregressive model for the time-varying mean and variance that in turn dictate the time-varying probabilities in the multinomial.
Specifically, a simple model for the dynamics of the mean is given by

$$\mu_t = \beta_0 + \beta_1 \mu_{t-1} + \beta_2 \varepsilon_{t-1}.$$

Similarly, a simple model for the dynamics of the variance is given by

$$\sigma_t^2 = \gamma_0 + \gamma_1 \sigma_{t-1}^2 + \gamma_2 \eta_{t-1}.$$
Clearly, higher order models could be considered. Additionally, other variables that capture the state of the market could be included. The explicit dependence of the current mean and variance on the past mean and variance allows for potential persistence in the series. The error terms allow the updating to depend on the differences between the expected and actual mean and variance. In the next section, we turn to model estimation.
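To make these mechanics concrete, the following sketch (ours, not the chapter's code; all function and variable names are illustrative) computes the bin probabilities implied by a Normal(μ_t, σ_t²), together with the empirical center and spread of the observed depth and the associated errors ε_t and η_t.

import numpy as np
from scipy.stats import norm

def bin_probabilities(edges, mu_t, sigma2_t):
    # edges: distances of bin boundaries from the midquote, e.g. [0, 5, 10, ..., 30] cents
    cdf = norm.cdf(edges, loc=mu_t, scale=np.sqrt(sigma2_t))
    raw = np.diff(cdf)                      # Phi_t(p_i - m_t) - Phi_t(p_{i-1} - m_t)
    return raw / (cdf[-1] - cdf[0])         # normalize so the N bin probabilities sum to one

def moments_and_errors(depth, centers, pi):
    # depth: observed shares a_it in each bin; centers: bin distances p_it - m_t
    share = depth / depth.sum()             # a_it / D_t
    xbar = (centers * share).sum()          # empirical mean distance from the midquote
    s2 = ((centers - xbar) ** 2 * share).sum()
    eps = xbar - (pi * centers).sum()       # error for the mean equation
    eta = s2 - (pi * (centers - xbar) ** 2).sum()   # error for the variance equation
    return xbar, s2, eps, eta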
3. Model estimation

The data on a given side of the market consist of the number of shares available in each bin. We proceed to estimate the parameters of the mean and variance dynamics by maximum likelihood. If each share submitted at time period t could be viewed as an i.i.d. draw from a multinomial distribution, then the likelihood associated with the t-th period would be given by

$$l_t = \pi_{1t}^{a_{1t}}\,\pi_{2t}^{a_{2t}}\cdots\pi_{nt}^{a_{nt}}.$$

This assumes that the shares are i.i.d. draws, which is surely false. Orders are submitted in packets of multiple shares, typically in increments of 100 shares. If all orders were submitted in packets of 100 shares, then the likelihood for the t-th observation would be given by

$$l_t = \pi_{1t}^{\tilde{a}_{1t}}\,\pi_{2t}^{\tilde{a}_{2t}}\cdots\pi_{nt}^{\tilde{a}_{nt}},$$

where $\tilde{a}_{it} = a_{it}/100$. The log likelihood is then given by
$$L = \sum_{t=1}^{T}\sum_{i=1}^{n}\tilde{a}_{it}\,\ln(\pi_{it}).$$
Given initial values μ_0 and σ_0², the sequence of multinomial probabilities can be sequentially updated and the likelihood evaluated for any set of parameters.
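As a rough illustration of this recursive evaluation (again a sketch under our own naming conventions, reusing the two helper functions above; it is not the authors' estimation code), the log likelihood for a candidate parameter vector could be computed as follows.

import numpy as np

def log_likelihood(params, books, edges, centers, mu0, sigma2_0):
    # books: T x N array of per-bin order quantities a~_it (shares divided by 100)
    b0, b1, b2, g0, g1, g2 = params
    mu, s2 = mu0, sigma2_0
    loglik = 0.0
    for a in books:
        pi = bin_probabilities(edges, mu, s2)        # probabilities implied by (mu_t, sigma_t^2)
        loglik += (a * np.log(pi)).sum()             # multinomial contribution for period t
        _, _, eps, eta = moments_and_errors(a, centers, pi)
        mu = b0 + b1 * mu + b2 * eps                 # mean recursion
        s2 = max(g0 + g1 * s2 + g2 * eta, 1e-8)      # variance recursion, floored to stay positive
    return loglik

# The negative of this function can be passed to scipy.optimize.minimize to obtain
# maximum likelihood estimates of the six parameters.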
4. Data

The data consist of limit orders that were submitted through the Archipelago Exchange (ARCA). This exchange has since been bought by the NYSE and is now called ARCA. As of March 2007, Archipelago is the second largest ECN in terms of shares traded (about a 20% market share for NASDAQ stocks). Our data consist of one month of all limit orders submitted in January 2005. The data contain the type of order action: add, modify, and delete. "Add" corresponds to a new order submission. "Modify" occurs when an order is
Fig. 16.1. Distribution of depth measured in cents away from midquote (average depth plotted against price distance, −40 to +40 cents)
modified either in its price or number of shares, or when an order is partially filled. "Delete" signifies that an order was cancelled, filled, or expired. The data also contain a time stamp down to the millisecond, the price and order size, a buy or sell indicator, the stock symbol, and the exchange. We extract orders for a single stock, Google (GOOG). Only orders submitted during regular hours (9:30 to 4:00) are considered. From the order-by-order data we construct the complete limit order book at every minute. This results in 390 observations per day. The average trade price for Google over the month is close to $200. Figure 16.1 presents a plot of the depth at each cent moving away from the midquote, from 1 cent to 40 cents. The plot reveals a peaked distribution, with its peak around 15–20 cents away from the midquote. Of course, this is an unconditional distribution. The limit order book data are merged with Trades and Quotes (TAQ) data for the same time period. From these data we create several variables related to trading and volatility. Past order flow should be related to future order flow and therefore to future limit order placement. For every minute, we construct the logarithm of the average trade size over the most recent 15-minute period. Additionally, we construct the total number of trades executed over the most recent 15-minute period. Both are indications of the degree of market activity. We also create a realized volatility measure constructed by summing squared 1-minute returns over the 15 most recent minutes. Finally, the bid-ask spread at transaction times is averaged over the 15 most recent minutes. In principle, we could model depth out through any distance from the midquote. We focus our attention in this analysis on the depth out through 30 cents. We aggregate the shares into larger, 5-cent bins and, consequently, have six bins on the bid side and six bins
Fig. 16.2. Autocorrelations of depth in different bins on the ask side (sample autocorrelation matrices r̂_0, r̂_1, r̂_2, and r̂_3 of the six-bin depth vector)
on the ask side. Our modeling strategy has separate models for the bid and ask side of the market. In our analysis, we focus on the ask side only.
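For concreteness, the 15-minute conditioning variables described above can be assembled from minute-level trade data along the following lines; this is a sketch with assumed column names, not the data construction actually used in the chapter.

import numpy as np
import pandas as pd

def rolling_features(trades_1min):
    # trades_1min: one row per minute with hypothetical columns 'ret' (one-minute
    # return), 'volume', 'n_trades', and 'spread'
    w = 15
    out = pd.DataFrame(index=trades_1min.index)
    out["realized_var"] = trades_1min["ret"].pow(2).rolling(w).sum()
    out["trading_rate"] = trades_1min["n_trades"].rolling(w).sum()
    out["log_trade_size"] = np.log(trades_1min["volume"].rolling(w).sum()
                                   / out["trading_rate"])
    out["avg_spread"] = trades_1min["spread"].rolling(w).mean()
    return out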
5. Results

We begin with some summary statistics for the minute-by-minute data. At each minute, we observe the depth in the first six 5-cent bins, a_1t, a_2t, ..., a_6t. It is interesting to assess the dependence structure in this vector time series. Specifically, if we stack the depth at time t into a vector x_t, whose first element is a_1t and last element is a_6t, we can construct the autocorrelations of x_t for lags 0 through 3 minutes. The sample autocorrelations are presented in Figure 16.2. For reference, conventional Bartlett standard errors imply that an autocorrelation larger than 2/√T = 0.024 in absolute value is statistically significant. All autocorrelations are positive, indicating that depth at different prices tends to move together. Depth near the diagonal tends to be more highly correlated than depth away from the diagonal, indicating that the correlation between close bins is larger than the
correlation between bins that are far apart. The diagonal elements, the autocorrelations of the same element of the vector x_t, tend to have the highest correlations of all. Although not presented, the generally positive and significant correlation structure continues out through lag 10 (or 10 minutes). We now estimate the model for the distribution of the depth across the bins, the multinomial probit. We begin by estimating the simple, first-order model presented in Section 2, namely μ_t = β_0 + β_1 μ_{t−1} + β_2 ε_{t−1} and σ_t² = γ_0 + γ_1 σ²_{t−1} + γ_2 η_{t−1}. In principle, one could re-initialize the start of each day by setting the initial values of μ_t and σ_t² to some set value, such as an unconditional mean, or perhaps treat the initial values as parameters to be estimated. In practice, with 390 observations per day, simply connecting the days and neglecting the re-initialization is unlikely to have any meaningful effect. This is consistent with the findings in Engle and Russell (1998) for trade-by-trade duration models. The parameter estimates are given in Table 16.1 with t-statistics in parentheses. All parameters are statistically significant at the 1% level. Both the mean and the variance exhibit very strong persistence, indicating that the average distance of the depth from the midquote is highly persistent, as is the degree of spread of the depth across bins. The autoregressive term is near 1 for both models. A natural test of the model is to check whether the one-step-ahead forecast errors for the mean and variance equations (ε_t and η_t) are uncorrelated. The null of a white noise series can be tested by examining the autocorrelations of these in-sample errors. Specifically, we perform a Ljung-Box test on the first 15 autocorrelations of the errors for the mean equation and the variance equation. The p-values are 0.53 and 0.06, respectively. Hence, this simple first-order model does a reasonably good job of capturing the substantial dependence in the shape of the limit order book. We now turn our attention to additional market factors that might influence the dynamics of the limit order book. Glosten (1994) predicts that higher trading rates should result in depth clustering around the midquote: competition among traders in an active market leads to more limit orders being placed near the midquote. Similarly, Rosu (2008) proposes a theoretical model for the dynamics of the limit order book, which also predicts that more depth should cluster around the midquote. Following Glosten and Rosu, we should expect the mean to decrease, and the average distance of the depth to move closer to the midquote, in periods of high trading rates. Periods of high volatility are associated with greater uncertainty. In periods of high uncertainty there might be a higher probability of trading against better-informed agents. Classic microstructure theory predicts a widening of bid-ask spreads when the probability of trading against better-informed agents is higher. We might therefore expect that depth
Table 16.1. Estimated coefficients for time series model

Model for mean                          Model for variance
Intercept      0.057  (6.61)            Intercept      0.13   (3.69)
μ_{t−1}        0.998  (179.5)           σ²_{t−1}       0.962  (679.9)
ε_{t−1}        −0.51  (−511.05)         η_{t−1}        −0.91  (25.93)
should move away from the midquote in periods of high volatility. At the same time, high volatility in the asset price increases the probability that a limit order far from the current price gets executed. This might also serve as an incentive for traders to seek superior execution by placing limit orders further from the current price. Both ideas imply that in periods of higher volatility the mean, the average distance of the depth from the midquote, should increase. We might also expect the distribution of depth to flatten. Hence, we might expect the mean and variance to increase in periods of high asset price volatility. In light of these economic arguments, we next estimate models that condition on recent transaction history and volatility. Specifically, we use the transaction volume over the past 15 minutes, the number of trades over the last 15 minutes, and the realized minute-by-minute volatility over the last 15 minutes. Additionally, we include some other economic variables of interest, including the average spread over the last 15 minutes and the price change over the last 15 minutes. We include all these economic variables within the first-order time series model estimated above. The coefficients of the economic variables are presented in Table 16.2 with t-statistics in parentheses. We begin with a discussion of the realized volatility. Realized variance has a positive coefficient in the mean equation, indicating that when the volatility of the asset price increases, the average distance of the depth tends to move away from the midquote. This is consistent with both ideas: an increased likelihood of trading against better-informed agents moves depth to more conservative prices that account for this risk, and high volatility increases the likelihood of depth further from the midquote getting executed at some point in the future. Similarly, the coefficient on the volatility is positive in the variance equation. This indicates a flattening of the distribution, so that the depth is more evenly spread over the bins. Next, consider the trade size and trading rate variables. We see that larger average trade size tends to move the depth closer to the midquote. Higher trading rates tend to move the depth further from the midquote, on average. The effects of trade size and trading rate on the variance are both positive. Larger trade size may be indicative of larger depth posted at the best bid and ask prices; as the depth at any price is positively serially correlated, this might simply reflect large depth at the ask following large depth at the ask. The trading rate is a little easier to interpret because there is less of a direct link between trading rates and quantities at the best ask. The positive sign here indicates that depth tends to move away from the midquote during periods of high transaction rates. Additionally, the positive sign on both variables in the variance
Table 16.2. Estimated coefficients for time series model with economic variables

                       Model for mean     Model for variance
Realized variance      0.83  (1.76)       45.51  (5.85)
Trade size             −0.07 (1.87)       1.26   (3.08)
Spread                 2.12  (4.15)       26.48  (3.46)
Trading rate           0.072 (3.69)       2.45   (8.34)
Price change           0.56  (7.43)       −10.08 (−9.49)
equation indicates that the depth is more evenly distributed during periods of high trading rates and larger average trade size. Overall, the evidence does not support the predictions of Glosten or the model of Rosu. Wider spreads are associated with more uncertainty. As with volatility, we might expect depth to move away from the midquote in periods of greater uncertainty. Indeed, the sign on the spread is positive both for the mean equation and for the variance equation. Rising prices tend to be associated with depth moving away from the midquote and the distribution becoming more even. Next, we estimate a model for the second component, namely the level of the depth D_t^a on the ask side of the market. Specifically, we specify an ARMA(2,2) model for the logarithm of the total depth:

$$\ln(D_t^a) = c + \alpha_1 \ln D_{t-1}^a + \alpha_2 \ln D_{t-2}^a + \theta_1 \xi_{t-1} + \theta_2 \xi_{t-2} + \lambda\, rv_{t-1} + \xi_t,$$

where ξ_t is white noise and rv_{t−1} is the realized volatility over the last 15 minutes. The other economic variables are not significant, so they are not included in the final model. The estimated model is given in Table 16.3, with t-statistics in parentheses. The in-sample residuals pass a Ljung-Box test with 15 lags. The process is also highly persistent. Although the other economic variables are insignificant, the realized volatility is significant at the 1% level and implies that the level of depth tends to increase following periods of higher volatility. Combining the results for the distribution and the level, we see that the total number of shares in the first 30 cents tends to increase following high volatility periods, but that the distribution of the depth shifts away from the midquote and flattens out. Figure 16.3 presents a plot of the predicted depth under average conditions for all variables except the volatility, which is varied from the average to the 5th percentile (low) and the 95th percentile (high). This plot can be used to calculate the expected cost of purchasing different quantities. Specifically, about 200 more shares are expected to be available in the first price bin when the volatility is high as compared to the low volatility state. About 500 more shares are expected to be available in the high volatility state for the second price bin. Alternatively, the expected costs can be computed for any size trade directly off the curves. The expected cost of purchasing 2000 shares is about $10 more in the low volatility state.

Table 16.3. Estimated coefficients for total depth model

                        Estimate
Intercept               9.76   (86.69)
AR(1)                   1.23   (2.83)
AR(2)                  −0.28   (20.51)
MA(1)                  −0.28   (−7.04)
MA(2)                  −0.18   (−12.21)
Realized variance       2.55   (2.05)
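A specification of this form can be estimated with standard time series software. The following is a rough sketch (ours, not the chapter's estimation code) using statsmodels' SARIMAX with lagged realized volatility as an exogenous regressor; the function name and inputs are illustrative.

import numpy as np
import statsmodels.api as sm

def fit_total_depth(depth, rv):
    # depth: array of total ask-side shares D_t^a; rv: realized volatility series
    y = np.log(np.asarray(depth)[1:])            # ln(D_t^a), dropping the first observation
    x = np.asarray(rv)[:-1].reshape(-1, 1)       # rv_{t-1}, aligned with y
    model = sm.tsa.SARIMAX(y, exog=x, order=(2, 0, 2), trend="c")
    return model.fit(disp=False)

# res = fit_total_depth(total_depth, realized_var); res.summary() then reports the
# intercept, AR, MA, and exogenous coefficients in the spirit of Table 16.3.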
Fig. 16.3. Predicted limit order book under average conditions as volatility varies from low to high (predicted mean depth in each of the six bins for the high and low volatility states)
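Reading expected execution costs off curves like those in Figure 16.3 amounts to walking up the predicted ask-side book; the following minimal illustration (ours, with hypothetical inputs) computes the expected cost of a market purchase of q shares.

def expected_cost(q, predicted_depth, ask_prices):
    # predicted_depth: expected shares in each bin; ask_prices: representative price per bin
    filled, cost = 0.0, 0.0
    for d, p in zip(predicted_depth, ask_prices):
        take = min(d, q - filled)
        cost += take * p
        filled += take
        if filled >= q:
            break
    return cost   # assumes q can be filled within the bins covered

# Comparing expected_cost(2000, depth_high_vol, prices) with
# expected_cost(2000, depth_low_vol, prices) gives the sort of dollar
# difference discussed in the text.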
6. Conclusions

We propose a model for limit order book dynamics. The model is formulated in a way that separates the modeling problem into a model for the level of the depth and a model for the distribution of the depth across specified bins. The decomposition, combined with the use of a convenient probit model, allows the dynamics to be interpreted in a particularly simple way. Specifically, we model the level, the average distance of the depth from the midquote, and the flatness, or spread, of the depth across the bins. The model for the level of the depth can be taken from off-the-shelf processes. The new part here is the model for the time-varying multinomial distribution. We show that simple, low-order models for the probit are able to capture the strong temporal dependence in the shape of the distribution of the depth. More interestingly, we also consider several economic variables. We find that higher volatility predicts that the overall level of the depth will increase, but that depth moves away from the midquote and the distribution tends to flatten out, becoming more dispersed. Contrary to the predictions of Glosten (1994) and Rosu (2008), we find evidence that higher market activity, as measured by trading rates, tends to move depth away from the midquote and flatten the distribution.
Bibliography Abraham, J.M., Goetzmann, W.N. and Wachter, S.M. (1994). “Homogeneous Groupings of Metropolitan Housing Markets,” Journal of Housing Economics, 3, 186–206. Ahn, D., Dittmar, R. and Gallant, A. (2002). “Quadratic Term Structure Models: Theory and Evidence,” Review of Financial Studies, 16, 459–485. A¨ıt-Sahalia, Y. (1996a). “Nonparametric Pricing of Interest Rate Derivative Securities,” Econometrica, 64, 527–560. A¨ıt-Sahalia, Y. (1996b). “Testing Continuous-Time Models of the Spot Interest Rate,” Review of Financial Studies, 9, 385–426. A¨ıt-Sahalia, Y. (2002). “Maximum-Likelihood Estimation of Discretely-Sampled Diffusions: A Closed-Form Approximation Approach,” Econometrica, 70, 223–262. A¨ıt-Sahalia, Y. (2007). “Estimating Continuous-Time Models Using Discretely Sampled Data,” Econometric Society World Congress Invited Lecture, in Advances in Economics and Econometrics, Theory and Applications, Ninth World Congress, (R. Blundell, P. Torsten and W.K. Newey, eds), Econometric Society Monographs, Cambridge University Press. A¨ıt-Sahalia, Y. (2008). “Closed-Form Likelihood Expansions for Multivariate Diffusions,” Annals of Statistics, 36, 906–937. A¨ıt-Sahalia, Y. and Lo, A.W. (1998). “Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices,” Journal of Finance, 53, 499–547. A¨ıt-Sahalia, Y. and Lo, A.W. (2000). “Nonparametric Risk Management and Implied Risk Aversion,” Journal of Econometrics, 94, 9–51. A¨ıt-Sahalia, Y. and Kimmel, R. (2007a). “Maximum Likelihood Estimation of Stochastic Volatility Models,” Journal of Financial Economics, 83, 413–452. A¨ıt-Sahalia, Y. and Kimmel, R. (2007b). “Estimating Affine Multifactor Term Structure Models Using Closed-Form Likelihood Expansions,” Working Paper, Princeton University. Alexander, C. (2001). Market Models: A Guide to Financial Data Analysis. Chichester, UK: John Wiley and Sons, Ltd.
Alexander, C. (2008). Market Risk Analysis, Vol. II: Practical Financial Econometrics. Chichester, UK: John Wiley and Sons, Ltd. Alexander, C. and Lazar, E. (2006). “Normal Mixture GARCH(1,1): Applications to Exchange Rate Modelling,” Journal of Applied Econometrics, 21, 307–336. Almgren, R. and Chriss, N. (2000). “Optimal Execution of Portfolio Transactions” Journal of Risk, 3, 5–39. Altonji, J. and Ham, J. (1990). “Variation in Employment Growth in Canada,” Journal of Labor Economics, 8, 198–236. Andersen, T.G. (1996). “Return Volatility and Trading Volume: An Information Flow Interpretation of Stochastic Volatility,” Journal of Finance, 51, 169–204. Andersen, T.G. and Bollerslev, T. (1998). “ARCH and GARCH Models.” In Encyclopedia of Statistical Sciences, Vol. II. (S. Kotz, C.B. Read and D.L. Banks eds), New York: John Wiley and Sons. Andersen, T.G., Bollerslev, T., Christoffersen, P.F. and Diebold, F.X. (2006a). “Volatility and Correlation Forecasting.” In Handbook of Economic Forecasting. (G. Elliott, C.W.J. Granger and A. Timmermann, eds), Amsterdam: North-Holland, 778–878. Andersen, T.G., Bollerslev, T., Christoffersen, P.F. and Diebold, F.X. (2006b). “Practical Volatility and Correlation Modeling for Financial Market Risk Management.” In Risks of Financial Institutions, (M. Carey and R. Stulz, eds), University of Chicago Press for NBER, 513–548. Andersen, T.G., Bollerslev, T. and Diebold, F.X. (2007). “Roughing it up: Including Jump Components in the Measurement, Modeling and Forecasting of Return Volatility,” Review of Economics and Statistics, 89, 707–720. Andersen, T.G., Bollerslev, T. and Diebold, F.X. (2009). “Parametric and Nonparametric Measurements of Volatility.” In Handbook of Financial Econometrics, (Y. A¨ıt-Sahalia and L.P. Hansen eds), North-Holland Forthcoming. Andersen, T.G., Bollerslev, T., Diebold, F.X. and Ebens, H. (2001). “The Distribution of Realized Stock Return Volatility,” Journal of Financial Economics, 61, 43–76. Andersen, T.G., Bollerslev, T., Diebold, F.X. and Labys, P. (2000). “Great Realizations,” Risk, 13, 105–108. Andersen, T.G., Bollerslev, T. Diebold, F.X. and Labys, P. (2001). “The Distribution of Exchange Rate Volatility,” Journal of the American Statistical Association, 96, 42–55. Correction published in 2003, volume 98, page 501. Andersen, T.G., Bollerslev, T., Diebold, F.X. and Vega, C. (2003). “Micro Effects of Macro Announcements: Real-Time Price Discovery in Foreign Exchange,” American Economic Review, 93, 38–62.
Andersen, T.G., Bollerslev, T., Diebold, F.X. and Vega, C. (2007). “Real-Time Price Discovery in Stock, Bond and Foreign Exchange Markets,” Journal of International Economics, 73, 251–277. Andersen, T.G. and Lund, J. (1997). “Estimating Continuous Time Stochastic Volatility Models of the Short Term Interest Rate,” Journal of Econometrics, 77, 343–377. Andersen, T.G., Bollerslev, T. and Meddahi, N. (2004). “Analytic Evaluation of Volatility Forecasts,” International Economic Review, 45, 1079–1110. Andrews, D.W.K. (1988). “Laws of Large Numbers for Dependent Non-Identically Distributed Random Variables,” Econometric Theory, 4, 458–467. Andrews, D.W.K. (1991). “Asymptotic Normality of Series Estimators for Nonparametric and Semi-parametric Regression Models,” Econometrica, 59, 307– 346. Andrews, D.W.K. (1993). “Tests for Parameter Instability and Structural Change with Unknown Change Point,” Econometrica, 61, 501–533. Ang, A. and Bekaert, G. (2002). “International Asset Allocation with Regime Shifts,” Review of Financial Studies, 15, 1137–87. Ang, A., Chen, J. and Xing, Y. (2006). “Downside Risk,” Review of Financial Studies, 19, 1191–1239. Attanasio, O. (1991). “Risk, Time-Varying Second Moments and Market Efficiency,” Review of Economic Studies, 58, 479–494. Babsiria, M.E. and Zakoian, J.-M. (2001). “Contemporaneous Asymmetry in GARCH Processes,” Journal of Econometrics, 101, 257–294. Baele, L. and Inghelbrecht, K. (2005). “Time-Varying Integration and International Diversification Strategies,” Tilburg University, unpublished manuscript. Bahra, B. (1997). “Implied Risk-Neutral Probability Density Functions from Options Prices: Theory and Application” Working Paper, Bank of England. Bai, J. (1997). “Estimating Multiple Breaks One at a Time,” Econometric Theory, 13, 315–352. Bai J. (2003). “Testing Parametric Conditional Distributions of Dynamic Models,” Review of Economics and Statistics, 85, 532–549. Bai, J. and Chen, Z. (2008). “Testing Multivariate Distributions in GARCH Models,” Journal of Econometrics, 143, 19–36. Baillie, R.T., Bollerslev T. and Mikkelsen H.O. (1996). “Fractionally Integrated Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 74, 3–30.
Baillie, R.T., Chung, C.F. and Tieslau, M.A. (1996). “Analysing Inflation by the Fractionally Integrated ARFIMA-GARCH Model,” Journal of Applied Econometrics, 11, 23–40. Bandi, F. (2002). “Short-Term Interest Rate Dynamics: A Spatial Approach,” Journal of Financial Economics, 65, 73–110. Banz, R. and Miller M. (1978). “Prices for State-Contingent Claims: Some Estimates and Applications,” Journal of Business, 51, 653–672. Barndorff-Nielsen, O.E., Graversen, S.E., Jacod, J. and Shephard N. (2006). “Limit Theorems for Realised Bipower Variation in Econometrics,” Econometric Theory, 22, 677–719. Barndorff-Nielsen, O.E., Graversen, S.E., Jacod, J., Podolskij, M. and Shephard N. (2006). “A Central Limit Theorem for Realised Power and Bipower Variations of Continuous Semimartingales.” In From Stochastic Analysis to Mathematical Finance, Festschrift for Albert Shiryaev, (Y. Kabanov, R. Lipster and J. Stoyanov, eds), 33–68. Springer. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A. and Shephard, N. (2008). “Designing Realised Kernels to Measure the Ex-Post Variation of Equity Prices in the Presence of Noise,” Econometrica, 76, 1481–1536. Barndorff-Nielsen, O.E. and Shephard N. (2001). “Non-Gaussian Ornstein–UhlenbeckBased Models and Some of their Uses in Financial Economics (with discussion),” Journal of the Royal Statistical Society, Series B, 63, 167–241. Barndorff-Nielsen, O.E. and Shephard N. (2002). “Econometric Analysis of Realised Volatility and its Use in Estimating Stochastic Volatility Models,” Journal of the Royal Statistical Society, Series B, 64, 253–280. Barndorff-Nielsen, O.E. and Shephard N. (2004). “Power and Bipower Variation with Stochastic Volatility and Jumps (with discussion),” Journal of Financial Econometrics, 2, 1–48. Barndorff-Nielsen, O.E. and Shephard N. (2006). “Econometrics of Testing for Jumps in Financial Economics Using Bipower Variation,” Journal of Financial Econometrics, 4, 1–30. Barndorff-Nielsen, O.E. and Shephard N. (2007). “Variation, Jumps and High Frequency Data in Financial Econometrics.” In Advances in Economics and Econometrics. Theory and Applications, Ninth World Congress (R. Blundell, T. Persson and W.K. Newey, eds), Econometric Society Monographs, Cambridge University Press, 328– 372. Bartle, R. (1966). The Elements of Integration. New York: Wiley. Bates, D.S. (1991). “The Crash of ’87–Was it Expected? The Evidence from Options Markets,” Journal of Finance, 43, 1009–1044.
Bates, D.S. (1996). “Jumps and Stochastic Volatility: Exchange Rate Process Implicit in Deutsche Mark Options,” Review of Financial Studies, 9, 69–107. Bauwens, L., Laurent, S. and Rombouts, J.V.K. (2006). “Multivariate GARCH Models: A Survey,” Journal of Applied Econometrics, 21, 79–109. Beckers, S., Grinold, R., Rudd, A. and Stefek, D. (1992). “The Relative Importance of Common Factors Across the European Equity Markets,” Journal of Banking and Finance, 16, 75–95. Bekaert, G. and Harvey, C. (1995). “Time-Varying World Market Integration,” Journal of Finance, 50, 403–44. Bekaert, G., Harvey, C. and Ng, A. (2005). “Market Integration and Contagion,” Journal of Business, 78, 39–70. Bekaert, G., Hodrick, R. and Zhang, X. (2005). “International Stock Return Comovements,” NBER Working Paper 11906. Benati, L. (2004). “Evolving Post-World War II U.K. Economic Performance,” Journal of Money, Credit, and Banking, 36, 691–717. Bera, A.K. and Higgins, M.L. (1993). “ARCH Models: Properties, Estimation and Testing,” Journal of Economic Surveys, 7, 305–366. Bera, A.K., Higgins, M.L. and Lee, S. (1992). “Interaction Between Autocorrelation and Conditional Heteroskedasticity: A Random-Coefficient Approach,” Journal of Business and Economic Statistics, 10, 133–142. Bernanke, B. and Blinder, A. (1992). “The Federal Funds Rate and the Channels of Monetary Transmission,” American Economic Review, 82(4), 901–921. Berzeg, K. (1978). “The Empirical Content of Shift-Share Analysis,” Journal of Regional Science, 18, 463–469. Biais, B., Hillioin P., and Spatt, C. (1995). “An Empirical Analysis of the Limit Order Book and the Order Flow in the Paris Bourse,” Journal of Finance, 1655–1689. Bierens, H.J. (1990). “A Consistent Conditional Moment Test of Functional Form,” Econometrica, 58, 1443–1458. Bierens, H.J. and Ploberger, W. (1997). “Asymptotic Theory of Integrated Conditional Moment Tests,” Econometrica, 65, 1129–1151. Billio, M., Caporin, M. and Gobbo, M. (2006). “Flexible Dynamic Conditional Correlation Multivariate GARCH Models for Asset Allocation,” Applied Financial Economics Letters, 2, 123–130. Black, F. (1976). “Studies of Stock Price Volatility Changes,” Proceedings of the Business and Economic Statistics Section, American Statistical Association, 177–181.
Bliss, R. and Panigirtzoglou, N. (2002). “Testing the Stability of Implied Probability Density Functions,” Journal of Banking and Finance, 26, 381–422. Bliss, R. and Panigirtzoglou, N. (2004). “Option Implied Risk Aversion Estimates,” Journal of Finance, 59, 407–446. Boero, G., Smith, J. and Wallis, K.F. (2008). “Uncertainty and Disagreement in Economic Prediction: The Bank of England Survey of External Forecasters,” Economic Journal, 118, 1107–1127. Boivin, J. (2006). “Has U.S. Monetary Policy Changed? Evidence from Drifting Coefficients and Real-Time Data,” Journal of Money, Credit and Banking, 38, 1149–1173. Boivin, J. and Giannoni, M.P. (2006). “Has Monetary Policy Become More Effective?” Review of Economics and Statistics, 88, 445–462. Bollerslev, T. (1986). “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327. Bollerslev, T. (1987). “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return,” Review of Economics and Statistics, 69, 542–547. Bollerslev, T. (1990). “Modeling the Coherence in Short-Run Nominal Exchange Rates: A Multivariate Generalized ARCH Model,” Review of Economics and Statistics, 72, 498–505. Bollerslev, T., Chou, R.Y. and Kroner, K.F. (1992). “ARCH Modeling in Finance: A Selective Review of the Theory and Empirical Evidence,” Journal of Econometrics, 52, 5–59. Bollerslev, T., Domowitz, I. and Wang, I. (1997). “Order Flow and the Bid-Ask Spread: An Empirical Probability Model of Screen-Based Trading,” Journal of Economics Dynamics and Control, 1471–1491. Bollerslev, T. and Engle, R.F. (1986). “Modeling the Persistence of Conditional Variances,” Econometric Reviews, 5, 1–50. Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). “ARCH Models.” In Handbook of Econometrics, Volume IV, (R.F. Engle and D. McFadden eds), Amsterdam: NorthHolland 2959–3038. Bollerslev, T., Engle, R.F. and Wooldridge, J.M. (1988). “A Capital Asset Pricing Model with Time Varying Covariances,” Journal of Political Economy, 96, 116–131. Bollerslev, T. and Ghysels, E. (1996). “Periodic Autoregressive Conditional Heteroskedasticity,” Journal of Business and Economic Statistics, 14, 139– 151.
Bollerslev, T. and Melvin, M. (1994). “Bid-Ask Spreads and Volatility in the Foreign Exchange Market: An Empirical Analysis,” Journal of International Economics, 355–372. Bollerslev, T. and Mikkelsen, H.O. (1996). “Modeling and Pricing Long Memory in Stock Market Volatility,” Journal of Econometrics, 73, 151–184. Bollerslev, T. and Wooldridge, J.M. (1992). “Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time Varying Covariances,” Econometric Reviews, 11, 143–172. Boswijk, H.P. (1992). Cointegration, Identification and Exogeneity, Vol. 37 of Tinbergen Institute Research Series. Amsterdam: Thesis Publishers. Boswijk, H.P. and Doornik, J.A. (2004). “Identifying, Estimating and Testing Restricted Cointegrated Systems: An Overview,” Statistica Neerlandica, 58, 440–465. Boudoukh, J., Richardson, M., Smith, T. and Whitelaw, R.F. (1999a). “Ex Ante Bond Returns and the Liquidity Preference Hypothesis,” Journal of Finance, 54, 1153– 1167. Boudoukh, J., Richardson, M., Smith T. and Whitelaw, R.F. (1999b). “Bond Returns and Regime Shifts,” Working Paper, NYU. Bowley, A.L. (1920). Elements of Statistics. New York: Charles Scribner’s Sons. Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. Revised edition. San Francisco: Holden-Day. Brandt, M.W. and Jones, C.S. (2006). “Volatility Forecasting with RangeBased EGARCH Models,” Journal of Business and Economic Statistics, 24, 470–486. Breeden, D. and Litzenberger, R. (1978). “Prices of State-Contingent Claims Implicit in Option Prices,” Journal of Business, 51, 621–652. Brennan, M. and Schwartz, E. (1979). “A Continuous Time Approach to the Pricing of Bonds,” Journal of Banking and Finance, 3, 133–155. Brenner, R.J., Harjes, R.H. and Kroner, K.F. (1996). “Another Look at Models of the Short-Term Interest Rate,” Journal of Financial and Quantitative Analysis, 31, 85–107. Brockwell, P., Chadraa, E. and Lindner, A. (2006). “Continuous-Time GARCH Processes,” Annals of Applied Probability, 16, 790–826. Brooks, C. (2002). Introductory Econometrics for Finance. Cambridge, UK: Cambridge University Press. Brooks, R. and Cat˜ ao, L. (2000). “The New Economy and Global Stock Returns,” IMF Working Paper 00/216, Washington: International Monetary Fund.
Brooks, R. and del Negro, M. (2002). “International Diversification Strategies,” Working Paper 2002–23, Federal Reserve Bank of Atlanta. Brown, H.J. (1969). “Shift Share Projections of Regional Growth: An Empirical Test,” Journal of Regional Science, 9, 1–18. Brown, S. (1986). Post-Sample Forecasting Comparisons and Model Selection Procedures, Ph.D dissertation, University of California, San Diego. Brown, S., Coulson, N.E. and Engle, R. (1991). “Noncointegration and Econometric Evaluation of Models of Regional Shift and Share,” Working Paper. Brown, S., Coulson, N.E. and Engle, R. (1992). “On the Determination of Regional Base and Regional Base Multipliers,” Regional Science and Urban Economics, 27, 619–635. Brown, S. and Dybvig, P. (1986). “The Empirical Implications of the Cox, Ingersoll, Ross Theory of the Term Structure of Interest Rates,” Journal of Finance, 41, 617–630. Bu, R. and Hadri, K. (2007). “Estimating Option Implied Risk-Neutral Densities using Spline and Hypergeometric Functions,” Econometrics Journal, 10, 216–244. Buchen, P.W. and Kelly, M. (1996). “The Maximum Entropy Distribution of an Asset Inferred from Option Prices,” Journal of Financial and Quantitative Analysis, 31, 143–159. Burns, P. (2005). “Multivariate GARCH with Only Univariate Estimation,” http://www. burns-stat.com. Cai, J. (1994). “A Markov Model of Switching-Regime ARCH,” Journal of Business and Economic Statistics, 12, 309–316. Calvet, L.E., Fisher, A.J. and Thompson, S.B. (2006). “Volatility Comovement: A Multifrequency Approach,” Journal of Econometrics, 131, 179–215. Campa, J.M., Chang, P.H.K. and Reider, R.L. (1998). “Implied Exchange Rate Distributions: Evidence from OTC Option Markets,” Journal of International Money and Finance, 17, 117–160. Campbell, J.Y. and Shiller, R. (1991). “Yield Spreads and Interest Rate Movements: A Bird’s Eye View,” Review of Economic Studies, 58, 495–514. Campbell, J.Y., Lo, A.W. and Mackinlay, A.C. (1997). The Econometrics of Financial Markets, Princeton: Princeton University Press. Caporin, M. and McAleer, M. (2006). “Dynamic Asymmetric GARCH,” Journal of Financial Econometrics, 4, 385–412.
Cappiello, L., Engle, R.F. and Sheppard, K. (2006). “Asymmetric Dynamics in the Correlations of Global Equity and Bond Returns,” Journal of Financial Econometrics, 4, 537–572. Carlino, G.A., DeFina, R. and Sill, K. (2001). “Sectoral Shocks and Metropolitan Employment Growth,” Journal of Urban Economics, 50, 396–417. Carlino, G.A. and Mills, L.O. (1993). “Are, U.S. Regional Incomes Converging? A Time Series Analysis,” Journal of Monetary Economics, 32, 335–346. Carter, C.K. and Kohn, R. (1994). “On Gibbs Sampling for State Space Models,” Biometrika, 81, 541–553. Castle, J.L., Fawcett, N.W.P. and Hendry, D.F. (2009). “Forecasting with EquilibriumCorrection Models During Structural Breaks,” Journal of Econometrics, forthcoming. Castle, J.L. and Hendry, D.F. (2008). “Forecasting UK Inflation: The Roles of Structural Breaks and Time Disaggregation.” In Forecasting in the Presence of Structural Breaks and Model Uncertainty (D.E. Rapach and M.E. Wohar, eds), Bingley: Emerald, 41–92. Cavaglia, S., Brightman, C. and Aked, M. (2000). “The Increasing Importance of Industry Factors,” Financial Analysts Journal, 41–54. Chan, N.H. (2002). Time Series: Applications to Finance. New York: John Wiley and Sons, Inc. Chan, K.C., Karolyi, A., Longstaff, F. and Sanders, A. (1992). “An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,” Journal of Finance, 47, 1209–1227. Chapman, D.A. and Pearson, N.D. (2000). “Is the Short Rate Drift Actually Nonlinear?” Journal of Finance, 55, 355–388. Chen, X. and Ghysels, E. (2007). “News – Good or Bad – and its Impact over Multiple Horizons,” Unpublished paper: Department of Economics, University of North Carolina at Chapel Hill. Chen, R.-R. and Scott, L. (1992). “Pricing Interest Rate Options in a Two-Factor CoxIngersoll-Ross Model of the Term Structure,” Review of Financial Studies, 5, 613– 636. Chen, X. and Shen, X. (1998). “Sieve Extremum Estimates for Weakly Dependent Data,” Econometrica, 66, 289–314. Chernov, M. and Mueller, P. (2007). “The Term Structure of Inflation Forecasts,” Working Paper, London Business School. Chesher, A. and Irish, M. (1987). “Residual Analysis in the Grouped and Censored Normal Linear Model,” Journal of Econometrics, 34, 33–61.
Chicago Board Options Exchange (2003). VIX CBOE Volatility Index. http://www. cboe.com/micro/vix/vixwhite.pdf. Chou, R.Y. (2005). “Forecasting Financial Volatilities with Extreme Values: The Conditional Autoregressive Range (CARR) Model,” Journal of Money, Credit and Banking, 37, 561–582. Chow, G.C. (1960). “Tests of Equality Between Sets of Coefficients in Two Linear Regressions,” Econometrica, 28, 591–605. Christodoulakis, G.A. and Satchell, S.E. (2002). “Correlated ARCH (CorrARCH): Modelling Time-Varying Conditional Correlation Between Financial Asset Returns,” European Journal of Operational Research, 139, 351–370. Christoffersen, P.F. (2003). Elements of Financial Risk Management. San Diego: Academic Press. Christoffersen, P.F. and Diebold, F.X. (1996). “Further Results on Forecasting and Model Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–72. Christoffersen, P.F. and Diebold, F.X. (1997). “Optimal Prediction Under Asymmetric Loss,” Econometric Theory, 13, 808–817. Christoffersen, P.F. and Jacobs, K. (2004). “The Importance of the Loss Function in Option Valuation,” Journal of Financial Economics, 72, 291–318. Clarida, R., Gal´ı, J. and Gertler, M. (2000). “Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory,” Quarterly Journal of Economics, 115, 147–180. Clark, P.K. (1973). “A Subordinated Stochastic Process Model with Finite Variance for Speculative Prices,” Econometrica, 41, 135–156. Clements, M.P. and Hendry, D.F. (1994). “Towards a Theory of Economic Forecasting.” In Non-stationary Time-series Analysis and Cointegration, (Hargreaves, C., ed), Oxford: Oxford University Press, 9–52. Clements, M.P. and Hendry, D.F. (1999). Forecasting Non-stationary Economic Time Series. Cambridge, Mass.: MIT Press. Cleveland, W.S. (1979). “Robust Locally Weighted Fitting and Smoothing Scatterplots,” Journal of the American Statistical Association, 74, 829–836. Collin-Dufresne, P., Goldstein, R. and Jones, C. (2006). “Can Interest Rate Volatility be Extracted from the Cross Section of Bond Yields? An Investigation of Unspanned Stochastic Volatility,” Working Paper, U.C. Berkeley. Conley, T., Hansen, L., Luttmer, E. and Scheinkman, J. (1995). “Estimating Subordinated Diffusions from Discrete Time Data,” The Review of Financial Studies, 10, 525–577.
Conrad, C. and Karanasos, M. (2006). “The Impulse Response Function of the Long Memory GARCH Process,” Economic Letters, 90, 34–41. Coppejans, M. and Domowitz, I. (2002). “An Empirical Analysis of Trades, Orders, and Cancellations in a Limit Order Market,” Discussion Paper, Duke University. Corradi, V. and Distaso, W. (2006). “Semiparametric Comparison of Stochastic Volatility Models Using Realized Measures,” Review of Economic Studies, 73, 635–667. Coulson, N.E. (1993). “The Sources of Sectoral Fluctuations in Metropolitan Areas,” Journal of Urban Economics, 33, 76–94. Coulson, N.E. (1999). “Housing Inventory and Completion,” Journal of Real Estate Finance and Economics, 18, 89–106. Coulson, N.E. (1999). “Sectoral Sources of Metropolitan Growth,” Regional Science and Urban Economics, 39, 723–743. Cox, J.C., Ingersoll, J.E. Jr. and Ross, S.A. (1985). “A Theory of the Term Structure of Interest Rates,” Econometrica, 53, 385–408. Crone, T.M. (2005). “An Alternative Definition of Economic Regions in the United States Based on Similarities in State Business Cycles,” The Review of Economics and Statistics, 87, 617–626. Crone, T.M. and Clayton-Matthews, A. (2005). “Consistent Economic Indexes for the 50 States,” The Review of Economics and Statistics, 87, 593–603. Crouhy, H. and Rockinger, M. (1997). “Volatility Clustering, Asymmetric and Hysteresis in Stock Returns: International Evidence,” Financial Engineering and the Japanese Markets, 4, 1–35. Crow, E.L. and Siddiqui, M.M. (1967). “Robust Estimation of Location,” Journal of the American Statistical Association, 62, 353–389. Dai, Q. and Singleton, K.J. (2000). “Specification Analysis of Affine Term Structure Models,” Journal of Finance, 55, 1943–1978. Davidson, J. (2004). “Moment and Memory Properties of Linear Conditional Heteroskedasticity Models, and a New Model,” Journal of Business and Economic Statistics, 22, 16–29. Davies, R. (1977). “Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternative,” Biometrica, 64, 247–254. de Jong, R.M. (1996). “The Bierens Test Under Data Dependence,” Journal of Econometrics, 72, 1–32. Degiannakis, S. and Xekalaki, E. (2004). “Autoregressive Conditional Heteroscedasticity (ARCH) Models: A Review,” Quality Technology and Quantitative Management, 1, 271–324.
den Hertog, R.G.J. (1994). “Pricing of Permanent and Transitory Volatility for U.S. Stock Returns: A Composite GARCH Model,” Economic Letters, 44, 421– 426. Derman, E. and Kani, I. (1994). “Riding on a Smile,” RISK, 7 (Feb.), 32–39. Derman, E. and Kani, I. (1998). “Stochastic Implied Trees: Arbitrage Pricing with Stochastic Term and Strike Structure of Volatility,” International Journal of Theoretical and Applied Finance, 1, 61–110. Diebold, F.X. (1988). Empirical Modeling of Exchange Rate Dynamics. New York: Springer-Verlag. Diebold, F.X. (2003). “The ET Interview: Professor Robert F. Engle, January 2003,” Econometric Theory, 19, 1159–1193. Diebold, F.X. (2004). “The Nobel Memorial Prize for Robert F. Engle,” Scandinavian Journal of Economics, 106, 165–185. Diebold, F.X. and Lopez, J. (1995). “Modeling Volatility Dynamics.” In Macroeconometrics: Developments, Tensions and Prospects, (K. Hoover ed.), Boston: Kluwer Academic Press, 427–472. Diebold, F.X. and Nerlove, M. (1989). “The Dynamics of Exchange Rate Volatility: A Multivariate Latent Factor ARCH Model,” Journal of Applied Econometrics, 4, 1–21. Diebold, F.X., Rudebusch, G.D. and Aruoba, B. (2006). “The Macroeconomy and the Yield Curve: A Dynamic Latent Factor Approach,” Journal of Econometrics, 131, 309–338. Diemeier, J. and Solnik, J. (2001). “Global Pricing of Equity,” Financial Analysts Journal, 57, 37–47. Ding, Z. and Engle, R.F. (2001). “Large Scale Conditional Covariance Modeling, Estimation and Testing,” Academia Economic Papers, 29, 157–184. Ding, Z., Engle, R.F. and Granger, C.W.J. (1993). “A Long Memory Property of Stock Market Returns and a New Model,” Journal of Empirical Finance, 1, 83–106. Donaldson, R.G. and Kamstra, M. (1997). “An Artificial Neural Network GARCH model for International Stock Return Volatility,” Journal of Empirical Finance, 4, 17–46. Doornik, J.A. (2001). Ox: Object Oriented Matrix Programming, 5.0. London: Timberlake Consultants Press. Doornik, J.A. (2007a). “Econometric Modelling When There are More Variables than Observations.” Working Paper, Economics Department, University of Oxford. Doornik, J.A. (2007b). Object-Oriented Matrix Programming using Ox, 6th edn. London: Timberlake Consultants Press.
Doornik, J.A. (2009). “Autometrics.” Working Paper, Economics Department, University of Oxford. Doornik, J.A. and Hansen, H. (2008). “A Practical Test for Univariate and Multivariate Normality,” Discussion Paper, Nuffield College. Driffill, J. and Sola, M. (1994). “Testing the Term Structure of Interest Rates from a Stationary Switching Regime VAR,” Journal of Economic Dynamics and Control 18, 601–628. Drost, F.C. and Nijman, T.E. (1993). “Temporal Aggregation of GARCH Processes,” Econometrica, 61, 909–927. Duan, J. (1997). “Augmented GARCH(p,q) Process and its Diffusion Limit,” Journal of Econometrics, 79, 97–127. Duchesne, P. and Lalancette, S. (2003). “On Testing for Multivariate ARCH Effects in Vector Time Series Models,” La Revue Canadienne de Statistique, 31, 275–292. Duffie, D. (1988). Security Markets: Stochastic Models, Academic Press, Boston. Duffie, D. and Kan, R. (1996). “A Yield-Factor Model of Interest Rates,” Mathematical Finance, 6, 379–406. Duffie, D., Ma, J. and Yong, J. (1995). “Black’s Consol Rate Conjecture,” Annals of Applied Probability, 5, 356–382. Dumas, B., Fleming, J. and Whaley, R.E. (1998). “Implied Volatility Functions: Empirical Tests,” Journal of Finance, 53, 2059–2106. Dunn, E.S., Jr. (1960). “A Statistical and Analytical Technique for Regional Analysis,” Regional Science Association Papers and Proceedings, 6, 97–112. Dupire, B. (1994). “Pricing and Hedging with Smiles,” RISK, 7 (Jan), 18–20. Eichengreen, B. and Tong, H. (2004). “Stock Market Volatility and Monetary Policy: What the Historical Record Shows,” Paper presented at the Central Bank of Australia. Elder, J. and Serletis, A. (2006). “Oil Price Uncertainty,” Working Paper, North Dakota State University. Elliott, G., Komunjer, I. and Timmermann, A. (2005). “Estimation and Testing of Forecast Rationality under Flexible Loss” Review of Economic Studies, 72, 1107– 1125. Elliott, G., Komunjer, I. and Timmermann, A. (2008). “Biases in Macroeconomic Forecasts: Irrationality or Asymmetric Loss?” Journal of European Economic Association, 6, 122–157. Embrechts, P., Kl¨ uppelberg, C. and Mikosch. T. (1997). Modelling Extremal Values for Insurance and Finance. Springer.
Emmerson, R., Ramanathan, R. and Ramm, W. (1975). “On the Analysis of Regional Growth Patterns,” Journal of Regional Science, 15, 17–28. Enders, W. (2004). Applied Econometric Time Series. Hoboken, NJ: John Wiley and Sons, Inc. Engel, C. and Hamilton, J.D. (1990). “Long Swings in the Dollar: Are they in the Data and do the Markets Know it?,” American Economic Review, 80, 689–713. Engle, R.F. (1978a). “Testing Price Equations for Stability Across Frequency Bands,” Econometrica, 46, 869–881. Engle, R.F. (1978b). “Estimating Structural Models of Seasonality.” In Seasonal Analysis of Economic Time Series, (A. Zellner, ed.). U.S. Department of Commerce, Bureau of Census. Engle, R.F. (1982a). “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation,” Econometrica, 50, 987–1008. Engle, R.F. (1982b). “A General Approach to Lagrange Multiplier Model Diagnostics,” Journal of Econometrics, 20, 83–104. Engle, R.F. (1990). “Discussion: Stock Market Volatility and the Crash of ’87,” Review of Financial Studies, 3, 103–106. Engle, R.F. (1995). ARCH: Selected Readings. Oxford, UK: Oxford University Press. Engle, R.F. (2001). “GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics,” Journal of Economic Perspectives, 15, 157–168. Engle, R.F. (2002a). “Dynamic Conditional Correlation: A Simple Class of Multivariate GARCH Models,” Journal of Business and Economic Statistics, 20, 339–350. Engle, R.F. (2002b). “New Frontiers for ARCH Models,” Journal of Applied Econometrics, 17, 425–446. Engle, R.F. (2004). “Nobel Lecture. Risk and Volatility: Econometric Models and Financial Practice,” American Economic Review, 94, 405–420. Engle, R.F. and Bollerslev, T. (1986). “Modeling the Persistence of Conditional Variances,” Econometric Reviews, 5, 1–50. Engle, R.F. and Ferstenberg, R. (2007). “Execution Risk,” Journal of Portfolio Management, 34–45. Engle, R.F., Ferstenberg, R. and Russell, J. (2008). “Measuring and Modeling Execution Cost and Risk,” University of Chicago Booth School of Business, Working Paper. Engle, R.F. and Gallo, J.P. (2006). “A Multiple Indicator Model for Volatility Using Intra Daily Data,” Journal of Econometrics, 131, 3–27.
Engle, R.F., Ghysels, E. and Sohn, B. (2006). “On the Economic Sources of Stock Market Volatility,” Manuscript, New York University. Engle, R.F. and Gonz´ alez-Rivera, G. (1991). “Semi-Parametric ARCH Models,” Journal of Business and Economic Statistics, 9, 345–359. Engle, R.F. and Granger, C.W.J. (1987). “Cointegration and Error Correction: Representation, Estimation and Testing,” Econometrica, 55, 251–276. Engle, R.F. and Hendry, D.F. (1993). “Testing Super Exogeneity and Invariance in Regression Models,” Journal of Econometrics, 56, 119–139. Engle, R.F., Hendry, D.F. and Richard, J.-F. (1983). “Exogeneity,” Econometrica, 51, 277–304. Engle, R.F., Ito, T. and Lin, W.L. (1990). “Meteor Showers or Heat Waves? Heteroskedastic Intra-Daily Volatility in the Foreign Exchange Market,” Econometrica, 58, 525–542. Engle, R.F. and Kroner, F.K. (1995). “Multivariate Simultaneous Generalized GARCH,” Econometric Theory, 11, 122–150. Engle, R.F. and Lee, G.G.J. (1999). “A Permanent and Transitory Component Model of Stock Return Volatility.” In Cointegration, Causality, and Forecasting: A Festschrift in Honor of Clive W.J. Granger, (R.F. Engle and H. White eds), Oxford, UK: Oxford University Press, 475–497. Engle, R.F., Lilien, D. and Robins, R. (1987). “Estimating Time-Varying Risk Premia in the Term Structure: The ARCH-M Model,” Econometrica, 55, 391–407. Engle, R.F., Lilien, D.M. and Watson, M.W. (1985). “A DYMIMIC Model of Housing Price Determination,” Journal of Econometrics, 28, 307–326. Engle, R.F. and Manganelli, S. (2004). “CAViaR: Conditional Autoregressive Value-atRisk by Regression Quantiles,” Journal of Business and Economic Statistics, 22, 367–381. Engle, R.F. and Mezrich, J. (1996). “GARCH for Groups,” Risk, 9, 36–40. Engle, R.F. and Ng, V.K. (1993). “Measuring and Testing the Impact of News on Volatility,” Journal of Finance, 48, 1749–1778. Engle, R.F. and Ng, V. (1993). “Time-Varying Volatility and the Dynamic Behavior of the Term Structure,” Journal of Money, Credit, and Banking, 25, 336–349. Engle, R.F., Ng, V.K. and Rothschild, M. (1990). “Asset Pricing with a FactorARCH Covariance Structure: Empirical Estimates for Treasury Bills,” Journal of Econometrics, 45, 213–238. Engle, R.F. and Patton, A.J. (2001). “What Good is a Volatility Model?” Quantitative Finance, 1, 237–245.
Engle, R.F. and Rangel, J.G. (2008). “The Spline-GARCH Model for Low Frequency Volatility and its Global Macroeconomic Causes,” Review of Financial Studies, 21, 1187–1222. Engle, R.F. and Rosenberg, J. (1995). “GARCH Gamma,” Journal of Derivatives, 17, 229–247. Engle, R.F. and Rothschild, M. (1992). “Statistical Models for Financial Volatility,” Journal of Econometrics, 52, 1–311. Engle, R.F. and Russell, J.R. (1998). “Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data,” Econometrica, 66, 1127– 1162. Engle, R.F. and Russell, J.R. (2005). “A Discrete-State Continuous-Time Model of Financial Transactions Prices and Times: The ACM–ACD Model,” Journal of Business and Economic Statistics, 23, 166–180. Engle, R.F. and Sheppard, K. (2001). “Theoretical and Empirical Properties of Dynamic Conditional Correlation Multivariate GARCH,” Mimeo, UC San Diego. Engle, R.F. and Watson, M.W. (1981). “A One-Factor Multivariate Time Series Model of Metropolitan Wage Rates,” Journal of the American Statistical Association 76, 774–781. Engle, R.F. and Watson, M.W. (1983). “Alternative Algorithms for Estimation of Dynamic MIMIC, Factor, and Time Varying Coefficient Regression Models,” Journal of Econometrics, 23, 385–400. Ericsson, N.R. and Irons, J.S. (eds) (1994). Testing Exogeneity. Oxford: Oxford University Press. Evans, M.D.D. and Lyons, R.K. (2007). “Exchange Rate Fundamentals and Order Flow.” Manuscript, Georgetown University and University of California, Berkeley. Fama, E.F. (1986). “Term Premiums and Default Premiums in Money Markets,” Journal of Financial Economics, 17, 175–196. Fama, E.F. and Bliss, R. (1987). “The Information in Long Maturity Forward Rates,” American Economic Review, 77, 680–692. Favero, C. and Hendry, D.F. (1992). “Testing the Lucas Critique: A Review,” Econometric Reviews, 11, 265–306. Fiorentini, G., Sentana, E. and Shephard, N. (2004). “Likelihood-Based Estimation of Latent Generalized ARCH Structures,” Econometrica, 72, 1481–1517. Fishburn, P.C. (1977). “Mean-Risk Analysis with Risk Associated Below Target Variance,” American Economic Review, 67, 116–126. Forbes, K. and Chinn, M. (2004). “A Decomposition of Global Linkages in Financial Markets Over Time,” Review of Economics and Statistics, 86, 705–722.
Forbes, K. and Rigobon, R. (2001). “No Contagion, Only Interdependence: Measuring Stock Market Co-Movements,” Journal of Finance, 57, 2223–2261. Fornari, F. and Mele, A. (1996). “Modeling the Changing Asymmetry of Conditional Variances” Economics Letters, 50, 197–203. Forni, M. and Reichlin, L. (1998). “Let’s Get Real: A Dynamic Factor Analytical Approach to Disaggregated Business Cycle,” Review of Economic Studies, 65, 453–474. Foster, D. and Nelson, D.B. (1996). “Continuous Record Asymptotics for Rolling Sample Estimators,” Econometrica, 64, 139–174. Fountas, S. and Karanasos, M. (2007). “Inflation, Output Growth, and Nominal and Real Uncertainty: Empirical Evidence for the G7,” Journal of International Money and Finance, 26, 229–250. Franses, P.H. and van Dijk, D. (2000). Non-Linear Time Series Models in Empirical Finance. Cambridge, UK: Cambridge University Press. Friedman, M. (1977). “Nobel Lecture: Inflation and Unemployment,” Journal of Political Economy, 85, 451–472. Friedman, B.M., Laibson, D.I. and Minsky, H.P. (1989). “Economic Implications of Extraordinary Movements in Stock Prices,” Brookings Papers on Economic Activity, 2, 137–189. Frisch, R. (1933). “Propagation and Impulse Problems in Dynamic Economics,” Essays in Honor of Gustav Cassel, London. Frisch, R. and Waugh, F.V. (1933). “Partial Time Regression as Compared with Individual Trends,” Econometrica, 1, 221–223. Gallant, A.R. and Nychka, D.W. (1987). “Semi-Nonparametric Maximum Likelihood Estimation,” Econometrica, 55, 363–390. Gallant, A.R. and Tauchen, G. (1998). “SNP: A Program for Nonparametric Time Series Analysis,” an online guide available at www.econ.duke.edu/ get/wpapers/index.html. Garratt, A., Lee, K., Pesaran, M.H. and Shin, Y. (2003). “A Long Run Structural Macroeconometric Model of the UK,” Economic Journal, 113, 412–455. Garratt, A., Lee, K., Pesaran, M.H. and Shin, Y. (2006). Global and National Macroeconometric Modelling: A Long-Run Structural Approach. Oxford: Oxford University Press. Gemmill, G. and Saflekos, A. (2000). “How Useful are Implied Distributions? Evidence from Stock-Index Options,” Journal of Derivatives, 7, 83–98. Geweke, J. (1977). “The Dynamic Factor Analysis of Economic Time Series.” In Latent Variables in Socio-Economic Models, (D.J. Aigner and A.S. Goldberger, eds), Amsterdam: North-Holland.
Geweke, J. (1986). “Modeling the Persistence of Conditional Variances: A Comment,” Econometric Review, 5, 57–61. Ghysels, E., Santa-Clara, P. and Valkanov, R. (2005). “There is a Risk-Return Tradeoff After All.” Journal of Financial Economics, 76, 509–548. Ghysels, E., Santa-Clara, P. and Valkanov, R. (2006). “Predicting Volatility: How to Get the Most Out of Returns Data Sampled at Different Frequencies,” Journal of Econometrics, 131, 59–95. Gibbons, M. and Ramaswamy, K. (1993). “A Test of the Cox, Ingersoll and Ross Model of the Term Structure,” Review of Financial Studies, 6, 619–658. Glosten, L. (1994). “Is the Electronic Open Limit Order Book Inevitable?”, Journal of Finance, 1127–1161. Glosten, L.R., Jagannathan, R. and Runkle, D.E. (1993). “On the Relationship between the Expected Value and the Volatility of the Nominal Excess Return on Stocks,” Journal of Finance, 48, 1779–1801. Godfrey, L.G. (1978). “Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables,” Econometrica, 46, 1303–1313. Gonz´ alez-Rivera, G. (1998). “Smooth Transition GARCH Models,” Studies in Nonlinear Dynamics and Econometrics, 3, 61–78. Gonz´ alez-Rivera, G., Senyuz, Z. and Yoldas, E. (2007). “Autocontours: Dynamic Specification Testing,” Mimeo, UC Riverside. Goodman, J.L. (1986). “Reducing the Error in Monthly Housing Starts Estimates,” AREURA Journal, 14, 557–566. Gourieroux, C. and Jasiak, J. (2001). Financial Econometrics. Princeton, NJ: Princeton University Press. Gourieroux, C. and Monfort, A. (1992). “Qualitative Threshold ARCH Models,” Journal of Econometrics, 52, 159–199. Gourieroux, C., Monfort, A., Renault, E. and Trongnon, A. (1987). “Generalized Residuals,” Journal of Econometrics, 34, 5–32. Gourlay, A.R. and McKee, S. (1977). “The Construction of Hopscotch Methods for Parabolic and Elliptic Equations in Two Space Dimensions with a Mixed Derivative,” Journal of Computational and Applied Mathematics, 3, 201– 206. Granger, C.W.J. (1969). “Prediction with a Generalized Cost Function,” OR, 20, 199– 207. Granger, C.W.J (1983). “Acronyms in Time Series Analysis (ATSA),” Journal of Time Series Analysis, 3, 103–107.
Granger, C.W.J. (1999). “Outline of Forecast Theory Using Generalized Cost Functions,” Spanish Economic Review, 1, 161–173. Granger, C.W.J. (2008). “In Praise of Pragmatic Econometrics.” In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry. (J.L. Castle and N. Shephard, eds), Oxford University Press. Forthcoming. Granger, C.W.J. and Machina, M.J. (2006). “Forecasting and Decision Theory.” In Handbook of Economic Forecasting (G. Elliott, C.W.J. Granger and A. Timmermann eds), Amsterdam: North-Holland. Gray, S.F. (1996). “Modeling the Conditional Distribution of Interest Rates as a RegimeSwitching Process,” Journal of Financial Economics, 42, 27–62. Grier, K.B. and Perry, M.J. (2000). “The Effects of Real and Nominal Uncertainty on Inflation and Output Growth: Some GARCH-M Evidence,” Journal of Applied Econometrics, 15, 45–58. Griffin, J. and Karolyi, G.A. (1998). “Another Look at the Role of Industrial Structure of Markets for International Diversification Strategies,” Journal of Financial Economics, 50, 351–373. Griffin, J. and Stultz, R. (2001). “International Competition and Exchange Rate Shocks: A Cross-Country Industry Analysis,” Review of Financial Studies 14, 215–241. Groen, J.J.J., Kapetanios, G. and Price, S. (2009). “Real Time Evaluation of Bank of England Forecasts for Inflation and Growth,” International Journal of Forecasting, 25, 74–80. Gu´egan, D. and Diebolt, J. (1994). “Probabilistic Properties of the β-ARCH Model,” Statistica Sinica, 4, 71–87. Guo, H. and Kliesen, K. (2005). “Oil Price Volatility and U.S. Macroeconomic Activity,” Federal Reserve Bank of St. Louis Review, Nov/Dec., 669–683. Haldane, A. and Quah, D. (1999). “UK Phillips Curves and Monetary Policy,” Journal of Monetary Economics, 44, 259–278. Hall, A. and Hautsch, N. (2004). “Order Aggressiveness and Order Book Dynamics,” Working Paper, University of Copenhagen. Hamilton, J. (1988). “Rational Expectations Econometric Analysis of Changes in Regime: An Investigation of the Term Structure of Interest Rates,” Journal of Economic Dynamics and Control, 12, 365–423. Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press. Hamilton, J.D. (2009). “Daily Changes in Fed Funds Futures Prices,” Journal of Money, Credit and Banking, 41, 567–582. Hamilton, J.D. (2008). “Daily Monetary Policy Shocks and New Home Sales,” Journal of Monetary Economics, 55, 1171–1190.
Hamilton, J. and Jord´ a, O. (2002). “A Model of the Federal Funds Rate Target,” Journal of Political Economy, 110, 1135–1167. Hamilton, J.D. and Lin, G. (1996). “Stock Market Volatility and the Business Cycle,” Journal of Applied Econometrics, 11, 573–593. Hamilton, J.D. and Susmel, R. (1994). “Autoregressive Conditional Heteroskedasticity and Changes in Regimes,” Journal of Econometrics, 64, 307–333. Han, H. and Park, J.Y. (2008). “Time Series Properties of ARCH Processes with Persistent Covariates,” Journal of Econometrics, 146, 275–292. Hansen, B.E. (1994). “Autoregressive Conditional Density Estimation,” International Economic Review, 35, 705–730. Hansen, L.P. and Jagannathan, R. (1991). “Implications of Security Market Data for Models of Dynamic Economies,” Journal of Political Economy, 99, 225–262. Hansen, P.R. and Lunde, A. (2006). “Consistent Ranking of Volatility Models,” Journal of Econometrics, 131, 97–121. Hansen, L.P. and Scheinkman, J. (1995). “Back to the Future: Generating Moment Implications for Continuous Time Markov Processes,” Econometrica, 63, 767–804. Harris, R.D.F., Stoja, E. and Tucker, J. (2007). “A Simplified Approach to Modeling the Comovement of Asset Returns,” Journal of Futures Markets, 27, 575–598. Harrison, J.M. and Kreps, D.M. (1979). “Martingales and Arbitrage in Multiperiod Securities Markets,” Journal of Economic Theory, 20, 381–408. Hartmann, P., Straetman, S. and de Vries, C. (2004). “Asset Market Linkages in Crisis Periods,” Review of Economics and Statistics, 86, 313–326. Harvey, A., Ruiz, E. and Sentana, E. (1992). “Unobserved Component Time Series Models with ARCH Disturbances,” Journal of Econometrics, 52, 129–157. Harvey, C.R. and Siddique, A. (1999). “Autoregressive Conditional Skewness,” Journal of Financial and Quantitative Analysis, 34, 465–487. Harvey, C.R. and Siddique, A. (2000). “Conditional Skewness in Asset Pricing Tests,” Journal of Finance LV, 1263–1295. Haug, S. And Czado, C. (2007). “An Exponential Continuous-Time GARCH Process,” Journal of Applied Probability, 44, 960–976. Hausman, J., A.W. Lo and A.C. MacKinlay (1992). “An Ordered Probit Analysis of Transaction Stock Prices,” Journal of Financial Analysis, 319–379. Heath, D., Jarrow, R. and Morton, A. (1992). “Bond Pricing and the Term Structure of Interest Rates,” Econometrica, 60, 77–105.
Hendry, D.F. (1979). “Predictive Failure and Econometric Modelling in MacroEconomics: The Transactions Demand for Money.” In Economic Modelling, (Ormerod, P. ed), 217–242. London: Heinemann. Hendry, D.F. (1988). “The Encompassing Implications of Feedback Versus Feedforward Mechanisms in Econometrics,” Oxford Economic Papers, 40, 132–149. Hendry, D.F. (1995). Dynamic Econometrics. Oxford: Oxford University Press. Hendry, D.F. (2000). “On Detectable and Non-Detectable Structural Change,” Structural Change and Economic Dynamics, 11, 45–65. Hendry, D.F. (2006). “Robustifying Forecasts from Equilibrium-Correction Models,” Journal of Econometrics, 135, 399–426. Hendry, D.F. (2009). “The Methodology of Empirical Econometric Modeling: Applied Econometrics Through the Looking-Glass.” In Palgrave Handbook of Econometrics. (Mills, T.C. and Patterson, K.D. eds), Basingstoke: Palgrave MacMillan. Hendry, D.F. and Doornik, J.A. (1994). “Modelling Linear Dynamic Econometric Systems,” Scottish Journal of Political Economy, 41, 1–33. Hendry, D.F. and Doornik, J.A. (1997). “The Implications for Econometric Modelling of Forecast Failure,” Scottish Journal of Political Economy, 44, 437–461. Hendry, D.F. and Ericsson, N.R. (1991). “Modeling the Demand for Narrow Money in the United Kingdom and the United States,” European Economic Review, 35, 833–886. Hendry, D.F., Johansen, S. and Santos, C. (2008). “Automatic Selection of Indicators in a Fully Saturated Regression,” Computational Statistics, 33, 317–335. Erratum, 337–339. Hendry, D.F. and Krolzig, H.-M. (2001). Automatic Econometric Model Selection. London: Timberlake Consultants Press. Hendry, D.F. and Krolzig, H.-M. (2005). “The Properties of Automatic Gets Modelling,” Economic Journal, 115, C32–C61. Hendry, D.F. and Massmann, M. (2007). “Co-breaking: Recent Advances and a Synopsis of the Literature,” Journal of Business and Economic Statistics, 25, 33–51. Hendry, D.F. and Mizon, G.E. (1993). “Evaluating Dynamic Econometric Models by Encompassing the VAR.” In Models, Methods and Applications of Econometrics, (Phillips, P.C.B. ed), 272–300. Oxford: Basil Blackwell. Hendry, D.F. and Santos, C. (2005). “Regression Models with Data-Based Indicator Variables,” Oxford Bulletin of Economics and Statistics, 67, 571–595. Hendry, D.F. and Santos, C. (2007). “Automatic Index-Based Tests of Super Exogeneity.” Unpublished paper, Economics Department, University of Oxford.
Hentschel, L. (1995). “All in the Family: Nesting Symmetric and Asymmetric GARCH Models,” Journal of Financial Economics, 39, 71–104. Heston, S.L. (1993). “A Closed-Form Solution for Options with Stochastic Volatility, with Applications to Bond and Currency Options,” Review of Financial Studies, 6, 327–343. Heston, S.L. and Nandi, S. (2000). “A Closed-Form GARCH Option Valuation Model,” Review of Financial Studies, 13, 585–625. Heston, S.L. and Rouwenhorst, G. (1994). “Does Industrial Structure Explain the Benefits of International Diversification” Journal of Financial Economics, 36, 3–27. Higgins, M.L. and Bera, A.K. (1992). “A Class of Nonlinear ARCH Models,” International, Economic Review, 33, 137–158. Hille, E. and Phillips, R. (1957). Functional Analysis and Semigroups, American Mathematical Society, Providence, R.I. HM Treasury (1994). “Economic Forecasting in the Treasury,” Government Economic Service Working Paper No.121. London: HM Treasury. Hogan, W.W. and Warren, J.M. (1972). “Computation of the Efficient Boundary in the E-S Portfolio Selection Model, Journal of Finance and Quantitative Analysis, 7, 1881–1896. Hogan, W.W. and Warren, J.M. (1974). “Toward the Development of an Equilibrium Capital-Market Model Based on Semivariance,” Journal of Finance and Quantitative Analysis, 9, 1–11. Hoover, K.D. and Perez, S.J. (1999). “Data Mining Reconsidered: Encompassing and the General-to-specific Approach to Specification Search.” Econometrics Journal, 2, 167–191. Horvath, M. and Verbrugge, R. (1996). “Shocks and Sectoral Interactions: An Empirical Investigation,” unpublished. Huang, X. and Tauchen, G. (2005). “The Relative Contribution of Jumps to Total Price Variation,” Journal of Financial Econometrics, 3, 456–499. Huber, P.J. (1967). “The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions,” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 221–233. Huizinga, J. and Mishkin, F.S. (1986). “Monetary Policy Regime Shifts and the Unusual Behavior of Real Interest Rates,” Carnegie-Rochester Conference on Public Policy, 24, 231–274. Hwang, S. and Satchell, S.E. (1999). “Modelling Emerging Market Risk Premia Using Higher Moments,” International Journal of Finance and Economics, 4, 271–296.
Hwang, S. and Satchell, S.E. (2005). “GARCH Model with Cross-Sectional Volatility: GARCHX Models,” Applied Financial Economics, 15, 203–216. Ingersoll, J. (1987). Theory of Financial Decision Making, Rowman and Littlefield, Totowa, NJ. International Monetary Fund (2000). World Economic Outlook – Asset Prices and the Business Cycle, Washington, D.C.: International Monetary Fund. Jackwerth, J.C. (1997). “Generalized Binomial Trees,” Journal of Derivatives, 5 (Winter), 7–17. Jackwerth, J.C. (2000). “Recovering Risk Aversion from Option Prices and Realized Returns,” Review of Financial Studies, 13, 433–451. Jackwerth, J.C. (2004). Option-Implied Risk-Neutral Distributions and Risk Aversion. Charlotteville: Research Foundation of AIMR. Jackwerth, J.C. and Rubinstein, M. (1996). “Recovering Probability Distributions from Option Prices,” Journal of Finance, 51, 1611–1631. Jacod, J. (1994). Limit of random measures associated with the increments of a Brownian semimartingale. Preprint number 120, Laboratoire de Probabiliti´es, Universit´e Pierre et Marie Curie, Paris. Jacod, J. (2007). “Statistics and high frequency data.” Unpublished paper. Jacod, J., Li, Y. Mykland, P.A., Podolskij, M. and Vetter, M. (2007). “Microstructure noise in the continuous case: the pre-averaging approach.” Unpublished paper: Department of Statistics, University of Chicago. Jalil, M. (2004). Essays on the Effect of Information on Monetary Policy, unpublished Ph.D. dissertation, UCSD. Jansen, E.S. and Ter¨ asvirta, T. (1996). “Testing parameter constancy and super exogeneity in econometric equations,” Oxford Bulletin of Economics and Statistics, 58, 735–763. Jarque, C.M. and Bera, A.K. (1987). “A Test for Normality of Observations and Regression Residuals,” International Statistical Review, 55, 163–172. Jiang, G.J. and Tian, Y.S. (2005). “The Model-Free Implied Volatility and its Information Content,” Review of Financial Studies, 18, 1305–1342. Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Oxford: Oxford University Press. Johansen, S. and Nielsen, B. (2009). “An Analysis of the Indicator Saturation Estimator as a Robust Regression Estimator.” In The Methodology and Practice of Econometrics, (Castle, J.L. and Shephard, N. eds), Oxford: Oxford University Press.
Jondeau, E. and Rockinger, M. (2006). “The Copula-GARCH Model of Conditional Dependencies: An International Stock Market Application,” Journal of International Money and Finance, 25, 827–853. Judd, J.P., and G.D. Rudebusch (1998). “Taylor’s Rule and the Fed: 1970–1997,” Federal Reserve Bank of San Francisco Review, 3, 3–16. Kalliovirta, L. (2007). “Quantile Residuals for Multivariate Models,” Mimeo, University of Helsinki. Karolyi, A. and Stultz, R. (1996). “Why Do Markets Move Together? An Investigation of U.S.-Japan Stock Return Comovements,” Journal of Finance, 51, 951–86. Kavajecz, K. (1999). “A Specialist’s Quoted Depth and the Limit Book,” Journal of Finance, 747–771. Kawakatsu, H. (2006). “Matrix Exponential GARCH,” Journal of Econometrics, 134, 95–128. Kilian, L. and Gon¸calves, S. (2004). “Bootstrapping Autoregressions with Conditional Heteroskedasticity of Unknown Form,” Journal of Econometrics, 123, 89–120. Kilian, L. and Park, C. (2007). “The Impact of Oil Prices on the US Stock Market,” CEPR Discussion Paper 6166. Kim, C.-J. and Nelson, C.R. (1999a). “Has the US Economy Become More Stable? A Bayesian Approach Based on a Markov-Switching Model of the Business Cycle,” Review of Economics and Statistics, 81, 608–616. Kim, C.-J. and Nelson, C.R. (1999b). State-Space Models with Regime Switching. Classical and Gibb-Sampling Approaches with Applications, Cambridge, Mass. Kim, S., Shephard, N. and Chib, S. (1998). “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models,” Review of Economic Studies, 65, 361–393. Kim, T.-H. and White, H. (2003). “Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regression.” In Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later (T. Fomby and C. Hill, eds.), New York: Elsevier, 107–132. Kim, T.-H. and White, H. (2004). “On more robust estimation of skewness and kurtosis,” Finance Research Letters, 1, 56–73. King, M., Sentana, E. and Wadhwani, S. (1994). “Volatility and Links Between National Stock Markets,” Econometrica, 62, 901–33. Kinnebrock, S. and Podolskij, M. (2008). “A Note on the Central Limit Theorem for Bipower Variation of General Functions,” Stochastic Processes and Their Applications, 118, 1056–1070. Klemkosky, R.C. and Pilotte, E.A. (1992). “Time-Varying Term Premiums on U.S. Treasury Bills and Bonds,” Journal of Monetary Economics, 30, 87–106.
Klüppelberg, C., Lindner, A. and Maller, R. (2004). “A Continuous Time GARCH Process Driven by a Lévy Process: Stationarity and Second Order Behaviour,” Journal of Applied Probability, 41, 601–622. Kodres, L.E. (1993). “Test of Unbiasedness in Foreign Exchange Futures Markets: An Examination of Price Limits and Conditional Heteroskedasticity,” Journal of Business, 66, 463–490. Koenker, R. and Bassett, G. (1978). “Regression Quantiles,” Econometrica, 46, 33–50. Komunjer, I. (2005). “Quasi-Maximum Likelihood Estimation for Conditional Quantiles,” Journal of Econometrics, 128, 127–164. Komunjer, I. and Vuong, Q. (2006). “Efficient Conditional Quantile Estimation: The Time Series Case.” University of California, San Diego Department of Economics Discussion Paper 2006–10. Komunjer, I. and Vuong, Q. (2007a). “Semiparametric Efficiency Bound and M-estimation in Time-Series Models for Conditional Quantiles.” University of California, San Diego Department of Economics Discussion Paper. Komunjer, I. and Vuong, Q. (2007b). “Efficient Estimation in Dynamic Conditional Quantile Models.” University of California, San Diego Department of Economics Discussion Paper. Koren, M. and Tenreyro, S. (2007). “Volatility and Development,” Quarterly Journal of Economics, 122, 243–287. Kose, M.A., Prasad, E.S. and Terrones, M.E. (2006). “How Do Trade and Financial Integration Affect the Relationship Between Growth and Volatility?” Journal of International Economics, 69, 176–202. Krolzig, H.-M. and Toro, J. (2002). “Testing for Super-Exogeneity in the Presence of Common Deterministic Shifts,” Annales d’Économie et de Statistique, 67/68, 41–71. Lane, P.R. and Milesi-Ferretti, G.M. (2001). “The External Wealth of Nations: Measures of Foreign Assets and Liabilities for Industrial and Developing Countries,” Journal of International Economics, 55, 263–294. Laurent, S. and Peters, J.P. (2002). “G@RCH 2.2: An Ox Package for Estimating and Forecasting Various ARCH Models,” Journal of Economic Surveys, 16, 447–485. LeBaron, B. (1992). “Some Relations Between Volatility and Serial Correlation in Stock Market Returns,” Journal of Business, 65, 199–219. Ledoit, O., Santa-Clara, P. and Wolf, M. (2003). “Flexible Multivariate GARCH Modeling with an Application to International Stock Markets,” Review of Economics and Statistics, 85, 735–747. Lee, K., Ni, S. and Ratti, R.A. (1995). “Oil Shocks and the Macroeconomy: The Role of Price Variability,” Energy Journal, 16, 39–56.
Lee, L.F. (1999). “Estimation of Dynamic and ARCH Tobit Models,” Journal of Econometrics, 92, 355–390. Lee, S. and Mykland, P.A. (2008). “Jumps in Financial Markets: A New Nonparametric Test and Jump Dynamics,” Review of Financial Studies, forthcoming. Lee, S. and Taniguchi, M. (2005). “Asymptotic Theory for ARCH-SM Models: LAN and Residual Empirical Processes,” Statistica Sinica, 15, 215–234. Lee, S.W. and Hansen, B.E. (1994). “Asymptotic Theory for the GARCH(1,1) Quasi-Maximum Likelihood Estimator,” Econometric Theory, 10, 29–52. Lee, T.H. (1994). “Spread and Volatility in Spot and Forward Exchange Rates,” Journal of International Money and Finance, 13, 375–382. León, A., Rubio, G. and Serna, G. (2004). “Autoregressive Conditional Volatility, Skewness and Kurtosis,” WP-AD 2004-13, Instituto Valenciano de Investigaciones Economicas. León, A., Rubio, G. and Serna, G. (2005). “Autoregressive Conditional Volatility, Skewness and Kurtosis,” Quarterly Review of Economics and Finance, 45, 599–618. Levine, R. (1997). “Financial Development and Economic Growth: Views and Agenda,” Journal of Economic Literature, 35, 688–726. Lewis, A.L. (1990). “Semivariance and the Performance of Portfolios with Options,” Financial Analysts Journal, 67–76. L’Her, J.F., Sy, O. and Yassine Tnani, M. (2002). “Country, Industry, and Risk Factor Loadings in Portfolio Management,” Journal of Portfolio Management, 28, 70–79. Li, C.W. and Li, W.K. (1996). “On a Double Threshold Autoregressive Heteroskedastic Time Series Model,” Journal of Applied Econometrics, 11, 253–274. Lin, W.-L., Engle, R.F. and Takatoshi, I. (1994). “Do Bulls and Bears Move Across Borders? Transmission of International Stock Returns and Volatility,” Review of Financial Studies, 7, 507–538. Ling, S. and Li, W.K. (1997). “Diagnostic Checking of Nonlinear Multivariate Time Series with Multivariate ARCH Errors,” Journal of Time Series Analysis, 18, 447–464. Litterman, R. and Scheinkman, J. (1991). “Common Factors Affecting Bond Returns,” Journal of Fixed Income, 1, 54–61. Liu, S.M. and Brorsen, B.W. (1995). “Maximum Likelihood Estimation of a GARCH Stable Model,” Journal of Applied Econometrics, 10, 272–285. Lo, A. (1988). “Maximum Likelihood Estimation of Generalized Ito Processes with Discretely Sampled Data,” Econometric Theory, 4, 231–247. Longin, F. and Solnik, B. (1995). “Is the Correlation in International Equity Returns Constant: 1970–1990?” Journal of International Money and Finance, 14, 3–26.
Longstaff, F. and Schwartz, E. (1992). “Interest Rate Volatility and the Term Structure: A Two-Factor General Equilibrium Model,” Journal of Finance, 47, 1259–1282. Lucas, R.E. (1976). “Econometric Policy Evaluation: A Critique.” In The Phillips Curve and Labor Markets, (Brunner, K. and Meltzer, A. eds), Vol. 1 of Carnegie-Rochester Conferences on Public Policy, 19–46. Amsterdam: North-Holland Publishing Company. Lumsdaine, R.L. (1996). “Consistency and Asymptotic Normality of the Quasi-Maximum Likelihood Estimator in IGARCH(1,1) and Covariance Stationary GARCH(1,1) Models,” Econometrica, 64, 575–596. Lutkepohl, H. (2005). New Introduction to Multiple Time Series Analysis, Berlin: Springer-Verlag. Maheu, J.M. and McCurdy, T.H. (2004). “News Arrival, Jump Dynamics and Volatility Components for Individual Stock Returns,” Journal of Finance, 59, 755–794. Malz, A.M. (1997). “Estimating the Probability Distribution of the Future Exchange Rate from Options Prices,” Journal of Derivatives, 5, 18–36. Mancini, C. (2001). “Disentangling the Jumps of the Diffusion in a Geometric Brownian Motion,” Giornale dell’Istituto Italiano degi Attuari LXIV, 19–47. Mao, J.C.T. (1970a). “Models of Capital Budgeting, E-V vs. E-S,” Journal of Finance and Quantitative Analysis, 4, 657–675. Mao, J.C.T. (1970b). “Survey of Capital Budgeting: Theory and Practice,” Journal of Finance, 25, 349–360. Markowitz, H. (1959). Portfolio Selection. New York. McConnell, M.M. and Perez-Quiros, G. (2000). “Output Fluctuations in the United States: What Has Changed Since the Early 1980s?” American Economic Review, 90, 1464–1476. McCulloch, J.H. (1985). “Interest-Risk Sensitive Deposit Insurance Premia: Stable ACH Estimates,” Journal of Banking and Finance, 9, 137–156. McNees, S.K. (1979). “The Forecasting Record for the 1970s,” New England Economic Review, September/October 1979, 33–53. McNeil, A.J., Frei, R. and Embrechts, P. (2005). Quantitative Risk Management. Princeton University Press. McNeil, A.J. and Frey, R. (2000). “Estimation of Tail-Related Risk Measures for Heteroskedastic Financial Time Series: An Extreme Value Approach,” Journal of Empirical Finance, 7, 271–300. Medeiros, M.C. and Veiga, A. (2009). “Modeling Multiple Regimes in Financial Volatility with a Flexible Coefficient GARCH(1,1) Model,” Econometric Theory, 25, 117–161.
Meenagh, D., Minford, P., Nowell, E., Sofat, P. and Srinivasan, N. (2009). “Can the Facts of UK Inflation Persistence be Explained by Nominal Rigidity?” Economic Modelling, 26, 978–992. Melick, W.R. and Thomas, C.P. (1997). “Recovering an Asset’s Implied PDF from Option Prices: An Application to Crude Oil During the Gulf Crisis,” Journal of Financial and Quantitative Analysis, 32, 91–115. Melliss, C. and Whittaker, R. (2000). “The Treasury’s Forecasts of GDP and the RPI: How Have They Changed and What are the Uncertainties?” In Econometric Modelling: Techniques and Applications (S. Holly and M.R. Weale, eds), 38–68. Cambridge: Cambridge University Press. Milhøj, A. (1985). “The Moment Structure of ARCH Processes,” Scandinavian Journal of Statistics, 12, 281–292. Milhøj, A. (1987). “A Conditional Variance Model for Daily Observations of an Exchange Rate,” Journal of Business and Economic Statistics, 5, 99–103. Milhøj, A. (1987). “A Multiplicative Parameterization of ARCH Models.” Working Paper, Department of Statistics, University of Copenhagen. Mills, T.C. (1993). The Econometric Modelling of Financial Time Series. Cambridge, UK: Cambridge University Press. Milshtein, G.N. (1974). “Approximate Integration of Stochastic Differential Equations,” Theory of Probability and Its Applications, 19, 557–562. Milshtein, G.N. (1978). “A Method of Second-Order Accuracy Integration for Stochastic Differential Equations,” Theory of Probability and Its Applications, 23, 396–401. Mincer, J. and Zarnowitz, V. (1969). “The Evaluation of Economic Forecasts.” In Economic Forecasts and Expectations, (J. Mincer ed.) National Bureau of Economic Research, New York. Mitchell, J. (2005). “The National Institute Density Forecasts of Inflation,” National Institute Economic Review, No.193, 60–69. Montiel, P. and Serven, L. (2006). “Macroeconomic Stability in Developing Countries: How Much is Enough?” The World Bank Research Observer, 21, Fall, 151–178. Moors, J.J.A. (1988). “A Quantile Alternative for Kurtosis,” The Statistician, 37, 25–32. Morgan, I.G. and Trevor, R.G. (1999). “Limit Moves as Censored Observations of Equilibrium Futures Prices in GARCH Processes,” Journal of Business and Economic Statistics, 17, 397–408. Müller, U.A., Dacorogna, M.M., Davé, R.D., Olsen, R.B., Pictet, O.V. and von Weizsäcker, J. (1997). “Volatilities of Different Time Resolutions – Analyzing the Dynamics of Market Components,” Journal of Empirical Finance, 4, 213–239.
Nam, K., Pyun, C.S. and Arize, A.C. (2002). “Asymmetric Mean-Reversion and Contrarian Profits: ANST-GARCH Approach,” Journal of Empirical Finance, 9, 563–588. Nelson, D.B. (1990a). “Stationarity and Persistence in the GARCH(1,1) Model,” Econometric Theory, 6, 318–334. Nelson, D.B. (1990b). “ARCH Models as Diffusion Approximations,” Journal of Econometrics, 45, 7–39. Nelson, D.B. (1991). “Conditional Heteroscedasticity in Asset Returns: A New Approach,” Econometrica, 59, 347–370. Nelson, D.B. (1992). “Filtering and Forecasting with Misspecifies ARCH Models I: Getting the Right Variance with the Wrong Model,” Journal of Econometrics, 52, 61–90. Nelson, D.B. (1996a). “Asymptotic Filtering Theory for Multivariate ARCH Models,” Journal of Econometrics, 71, 1–47. Nelson, D.B. (1996b). “Asymptotically Optimal Smoothing with ARCH Models,” Econometrica, 64, 561–573. Nelson, D.B. and Cao, C.Q. (1992). “Inequality Constraints in the Univariate GARCH Model,” Journal of Business and Economic Statistics, 10, 229–235. Nelson, D.B. and Foster, D. (1994). “Asymptotic Filtering Theory for Univariate ARCH Models,” Econometrica, 62, 1–41. Nelson, E. (2009). “An Overhaul of Doctrine: The Underpinning of UK Inflation Targeting,” Economic Journal, 11, F333–F368. Nelson, E. and Nikolov, K. (2004). “Monetary Policy and Stagflation in the UK,” Journal of Money, Credit, and Banking, 36, 293–318. Nerlove, M. (1965). “Two Models of the British Economy: A Fragment of a Critical Survey,” International Economic Review, 6, 127–181. Newey, W.K. and Powell, J.L. (1990). “Efficient Estimation of Linear and Type I Censored Regression Models Under Conditional Quantile Restrictions,” Econometric Theory, 6, 295–317. Newey, W.K. and West, K.D. (1987). “A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708. Nijman, T. and Sentana, E. (1996). “Marginalization and Contemporaneous Aggregation in Multivariate GARCH Processes,” Journal of Econometrics, 71, 71–87. Norrbin, S. and Schlagenhauf, D. (1988). “An Inquiry into the Sources of Macroeconomic Fluctuations,” Journal of Monetary Economics, 22, 43–70.
Nowicka-Zagrajek, J. and Werron, A. (2001). “Dependence Structure of Stable RGARCH Processes,” Probability and Mathematical Statistics, 21, 371–380. Øksendal, B. (1985). Stochastic Differential Equations: An Introduction with Applications, 3rd edition, Springer-Verlag, New York. Oliner, S. and Sichel, D. (2000). “The Resurgence of Growth in the Late 1990s: Is Information Technology the Story?” Journal of Economic Perspectives, 14, 3–22. Optionmetrics (2003). “Ivy DB File and Data Reference Manual, Version 2.0.” Downloadable .pdf file, available on WRDS. Orphanides, A. (2001). “Monetary Policy Rules Based on Real-Time Data” American Economic Review, 91, 964–985. Otsu, T. (2003). “Empirical Likelihood for Quantile Regression.” University of Wisconsin, Madison Department of Economics Discussion Paper. Pagan, A. (1996). “The Econometrics of Financial Markets,” Journal of Empirical Finance, 3, 15–102. Palm, F. (1996). “GARCH Models of Volatility.” In Handbook of Statistics, Volume 14, (C.R. Rao and G.S. Maddala eds), Amsterdam: North-Holland, 209–240. Pantula, S.G. (1986). “Modeling the Persistence of Conditional Variances: A Comment,” Econometric Review, 5, 71–74. Park, B.J. (2002). “An Outlier Robust GARCH Model and Forecasting Volatility of Exchange Rate Returns,” Journal of Forecasting, 21, 381–393. Patton, A.J. (2006a). “Modelling Asymmetric Exchange Rate Dependence,” International Economic Review, 47, 527–556. Patton, A.J. (2006b). “Volatility Forecast Comparison Using Imperfect Volatility Proxies,” Journal of Econometrics, forthcoming. Patton, A.J. and Timmermann, A. (2007a). “Properties of Optimal Forecasts under Asymmetric Loss and Nonlinearity,” Journal of Econometrics, 140, 884–918. Patton, A.J. and Timmermann, A. (2007b). “Testing Forecast Optimality under Unknown Loss,” Journal of American Statistical Association, 102, 1172–1184. Pearson, N. and Sun, T.-S. (1994). “Exploiting the Conditional Density in Estimating the Term Structure: An Application to the Cox, Ingersoll, Ross Model,” Journal of Finance, 49, 1279–1304. Pedersen, C.S. and Satchell, S.E. (2002). “On the Foundation of Performance Measures Under Asymmetric Returns,” Quantitative Finance, 2, 217–223. Pelloni, G. and Polasek, W. (2003). “Macroeconomic Effects of Sectoral Shocks in Germany, the U.K. and the U.S.: A VAR-GARCH-M Approach,” Computational Economics, 21, 65–85.
Perez-Quiros, G. and Timmermann, A. (2000). “Firm Size and Cyclical Variations in Stock Returns,” Journal of Finance, 55, 1229–1262. Pesaran, M.H. and Skouras, S. (2001). Decision-Based Methods for Forecast Evaluation. In Companion to Economic Forecasting (Clements, M.P. and D.F. Hendry eds). Basil Blackwell. Pesaran, M.H. and Zaffaroni, P. (2008). “Model Averaging in Risk Management with an Application to Futures Markets,” CESifo Working Paper Series No. 1358; IEPR Working Paper No. 04.3. Piazzesi, M. (2005). “Bond Yields and the Federal Reserve,” Journal of Political Economy, 113, 311–344. Piazzesi, M. and Swanson, E. (2008). “Futures Prices as Risk-Adjusted Forecasts of Monetary Policy,” Journal of Monetary Economics, 55, 677–691. Pinto, B. and Aizenman, J. (eds) (2005). Managing Economic Volatility and Crises: A Practitioner’s Guide. Cambridge: Cambridge University Press. Poon, S.H. (2005). A Practical Guide to Forecasting Financial Market Volatility. Chichester, UK: John Wiley & Sons, Ltd. Poon, S.-H. and Granger, C.W.J. (2003). “Forecasting Volatility in Financial Markets: A Review,” Journal of Economic Literature, 41, 478–539. Portes, R. and Rey, H. (2005). “The Determinants of Cross-Border Equity Flows,” Journal of International Economics, 65, 269–96. Powell, J. (1984). “Least Absolute Deviations Estimators for the Censored Regression Model,” Journal of Econometrics, 25, 303–325. Pritsker, M. (1998). “Nonparametric Density Estimation of Tests of Continuous Time Interest Rate Models,” The Review of Financial Studies, 11, 449– 487. Protter, P. (2004). Stochastic Integration and Differential Equations. New York: SpringerVerlag. Psaradakis, Z. and Sola, M. (1996). “On the Power of Tests for Superexogeneity and Structural Invariance,” Journal of Econometrics, 72, 151– 175. Ramey, G. and Ramey, V.A. (1995). “Cross-Country Evidence on the Link Between Volatility and Growth,” American Economic Review, 85, 1138–1151. Ramsey, J.B. (1969). “Tests for Specification Errors in Classical Linear Least Squares Regression Analysis,” Journal of the Royal Statistical Society B, 31, 350– 371. Ranaldo, A. (2004). “Order Aggressiveness in Limit Order Book Markets,” Journal of Financial Markets, 53–74.
Revankar, N.S. and Hartley, M.J. (1973). “An Independence Test and Conditional Unbiased Predictions in the Context of Simultaneous Equation Systems,” International Economic Review, 14, 625–631. Rich, R. and Tracy, J. (2006). “The relationship between expected inflation, disagreement and uncertainty: evidence from matched point and density forecasts,” Staff Report No.253, Federal Reserve Bank of New York. Rigobon, R. (2002). “The Curse of Non-Investment Grade Countries,” Journal of Development Economics, 69, 423–449. Rigobon, R. and Sack, B. (2003). “Measuring the Reaction of Monetary Policy to the Stock Market,” Quarterly Journal of Economics, 639–669. Robinson, P.M. (1991). “Testing for Strong Serial Correlation and Dynamic Conditional Heteroskedasticity in Multiple Regression,” Journal of Econometrics, 47, 67–84. Rom, B.M. and Ferguson, K. (1993). “Post-Modern Portfolio Theory Comes of Age,” Journal of Investing, 11–17. Rosenberg, J. and Engle, R. (2002). “Empirical Pricing Kernels,” Journal of Financial Economics, 64, 341–72. Rosu, I. (2008). “A Dynamic Model of the Limit Order Book,” Review of Financial Studies, forthcoming. Rubinstein, M. (1994). “Implied Binomial Trees,” Journal of Finance, 49, 771–818. Rubinstein, M. (1998). “Edgeworth Binomial Trees,” Journal of Derivatives, 5, 20–27. Russell, J. and R.F. Engle, (2005). “A Discrete-State Continuous-Time Model of Transaction Prices and Times: The ACM-ACD Model,” Journal of Business and Economic Statistics, 166–180. Sack, B. (2004). “Extracting the Expected Path of Monetary Policy from Futures Rates,” Journal of Futures Markets, 24, 733–754. Sakata, S. and White, H. (1998). “High Breakdown Point Conditional Dispersion Estimation with Application to S&P 500 Daily Returns Volatility,” Econometrica, 66, 529–568. Salkever, D.S. (1976). “The Use of Dummy Variables to Compute Predictions, Prediction Errors and Confidence Intervals,” Journal of Econometrics, 4, 393–397. S´ anchez, M.J. and Pe˜ na, D. (2003). “The Identification of Multiple Outliers in ARIMA Models,” Communications in Statistics: Theory and Methods, 32, 1265–1287. Sanders, A.B. and Unal, H. (1988). “On the Intertemporal Stability of the Short Term Rate of Interest,” Journal of Financial and Quantitative Analysis, 23, 417–423. Santos, C. and Hendry, D.F. (2006). “Saturation in Autoregressive Models,” Notas Economicas, 19, 8–20.
Sasaki, K. (1963). “Military Expenditures and the Employment Multiplier in Hawaii,” Review of Economics and Statistics, 45, 293–304. Schaefer, S. and Schwartz, E. (1984). “A Two-Factor Model of the Term Structure: An Approximate Analytical Solution,” Journal of Financial and Quantitative Analysis, 19, 413–424. Schwarz, G. (1978). “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464. Schwert, G.W. (1989). “Why Does Stock Market Volatility Change Over Time?” Journal of Finance, 44, 1115–1153. Schwert, G.W. (1990). “Stock Volatility and the Crash of ‘87,” Review of Financial Studies, 3, 77–102. Sensier, M. and van Dijk, D. (2004). “Testing for Volatility Changes in U.S. Macroeconomic Time Series,” Review of Economics and Statistics, 86, 833–839. Sentana, E. (1995). “Quadratic ARCH Models,” Review of Economic Studies, 62, 639– 661. Sentana, E. and Fiorentini, G. (2001). “Identification, Estimation and Testing of Conditional Heteroskedastic Factor Models,” Journal of Econometrics, 102, 143–164. Serv´en, L. (2003). “Real-Exchange-Rate Uncertainty and Private Investment in LDCS,” Review of Economics and Statistics, 85, 212–218. Shephard, N.H. (1994). “Partial Non-Gaussian State Space,” Biometrika 81, 115–131. Shephard, N.H. (1996). “Statistical Aspects of ARCH and Stochastic Volatility Models.” In Time Series Models in Econometrics, Finance and Other Fields, (D.R. Cox, D.V. Hinkley and O.E. Barndorff-Nielsen eds), London: Chapman & Hall, 1–67. Shephard, N.H. (2008). “Stochastic volatility models” In The New Palgrave Dictionary of Economics, 2nd Edn (S.N. Durlauf and L.E. Blume, eds). Palgrave MacMillan. Shields, K., Olekalns, N. Henry, O.T. and Brooks, C. (2005). “Measuring the Response of Macroeconomic Uncertainty to Shocks,” Review of Economics and Statistics, 87, 362–370. Shiller, R.J. (1981). “Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?” American Economic Review, 71, 421–436. Shimko, D. (1993). “The Bounds of Probability,” RISK, 6, 33–37. Siklos, P.L. and Wohar, M.E. (2005). “Estimating Taylor-Type Rules: An Unbalanced Regression?” In Advances in Econometrics, vol. 20 (T.B. Fomby and D. Terrell, eds). Amsterdam: Elsevier. Singleton, K.J. (2006). Empirical Dynamic Asset Pricing. Princeton: Princeton University Press.
S¨ oderlind, P. and Svensson, L. (1997). “New Techniques to Extract Market Expectations from Financial Instruments,” Journal of Monetary Economics, 40, 383–429. Solnik, B. and Roulet, J. (2000). “Dispersion as Cross-Sectional Correlation,” Financial Analysts Journal, 56, 54–61. Somerville, C.T. (2001). “Permits, Starts, and Completions: Structural Relationships Versus Real Options,” Real Estate Economics, 29, 161–190. Sortino, F.A. and Satchell, S.E. (2001). Managing Downside Risk in Financial Markets. Butterworth-Heinemann. Sortino, F.A. and van der Meer, R. (1991). “Downside Risk,” The Journal of Portfolio Management, 17, 27–31. Stambaugh, R.F. (1993). “Estimating Conditional Expectations When Volatility Fluctuates,” NBER Working Paper 140. Stambaugh, R.F. (1988). “The Information in Forward Rates: Implications For Models of the Term Structure,” Journal of Financial Economics, 21, 41–70. Stanton, R. (1997). “A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk,” Journal of Finance, 52, 1973–2002. Stinchcombe, M. and White, H. (1998). “Consistent Specification Testing with Nuisance Parameters Present Only Under the Alternative,” Econometric Theory, 14, 295–324. Stock, J. and Watson, M. (1998). “Diffusion Indexes,” NBER Working Paper 6702, Cambridge, Mass.: National Bureau of Economic Research. Stock, J.H. and Watson, M.W. (2002a). “Forecasting Using Principal Components from a Large Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. Stock, J.H. and Watson, M.W. (2002b). “Has the Business Cycle Changed and Why?” In NBER Macroeconomics Annual 2002. (M. Gertler and K. Rogoff eds), Cambridge, Mass.: MIT Press. Stock, J.H. and Watson, M.W. (2007a). “Why Has U.S. Inflation Become Harder to Forecast?” Journal of Money, Credit, and Banking, 39, 3–34. Stock, J.H. and Watson, M.W. (2007b). Introduction to Econometrics, 2nd edn. Boston: Pearson Education. Tauchen, G. and Pitts, M. (1983). “The Price Variability-Volume Relationship on Speculative Markets,” Econometrica, 51, 485–505. Taylor, J.B. (1993). “Discretion Versus Policy Rules in Practice,” Carnegie-Rochester Conference Series on Public Policy, 39, 195–214. Taylor, S.J. (1986). Modeling Financial Time Series. Chichester, UK: John Wiley and Sons.
Taylor, S.J. (2004). Asset Price Dynamics and Prediction. Princeton, NJ: Princeton University Press. Tesar, L. and Werner, I. (1994). “International Equity Transactions and U.S. Portfolio Choice.” In The Internationalization of Equity Markets, (Frankel, J. ed.), Chicago. Theil, H. and Ghosh, R. (1980). “A Comparison of Shift-share and the RAS Adjustment,” Regional Science and Urban Economics, 10, 175–180. Timmermann, A. (2000). “Moments of Markov Switching Models,” Journal of Econometrics, 96, 75–111. Torous, W. and Ball, C. (1995). “Regime Shifts in Short Term Riskless Interest Rates,” Working Paper, London Business School. Tsay, R.S. (2002). Analysis of Financial Time Series. New York: John Wiley and Sons, Inc. Tse, Y.K. (1998). “The Conditional Heteroskedasticity of the Yen-Dollar Exchange Rate,” Journal of Applied Econometrics, 13, 49–55. Tse, Y.K. (2002). “Residual-Based Diagnostics for Conditional Heteroscedasticity Models,” Econometrics Journal, 5, 358–373. Tse, Y.K. and Tsui, A.K.C. (1999). “A Note on Diagnosing Multivariate Conditional Heteroscedasticity Models,” Journal of Time Series Analysis, 20, 679– 691. Tse, Y.K. and Tsui, A.K.C. (2002). “A Multivariate GARCH Model with Time-Varying Correlations,” Journal of Business and Economic Statistics, 20, 351–362. van der Weide, R. (2002). “GO-GARCH: A Multivariate Generalized Orthogonal GARCH Model,” Journal of Applied Econometrics, 17, 549–564. Varian, H.R. (1974). “A Bayesian Approach to Real Estate Assessment.” In Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage (S.E. Fienberg and A. Zellner, eds). Amsterdam: North-Holland, 195–208. Wallis, K.F. (1989). “Macroeconomic Forecasting: A Survey,” Economic Journal, 99, 28–61. Wallis, K.F. (2004). “An Assessment of Bank of England and National Institute Inflation Forecast Uncertainties,” National Institute Economic Review, No.189, 64– 71. Wallis, K.F. (2008). “Forecast Uncertainty, its Representation and Evaluation,” In Econometric Forecasting and High-Frequency Data Analysis (R.S. Mariano and Y.K. Tse, eds), Vol.13 of the Lecture Notes Series of the Institute for Mathematical Sciences, National University of Singapore, 1–51. Singapore: World Scientific. Wei, S.X. (2002). “A Censored-GARCH Model of Asset Returns with Price Limits,” Journal of Empirical Finance, 9, 197–223.
Weiss, A. (1991). “Estimating Nonlinear Dynamic Models Using Least Absolute Error Estimation,” Econometric Theory, 7, 46–68. West, K.D. (1996). “Asymptotic Inference About Predictive Ability,” Econometrica, 64, 1067–1084. West, K.D. (2006). “Forecast Evaluation.” In Handbook of Economic Forecasting. (G. Elliott, C.W.J. Granger, and A. Timmermann eds), Amsterdam: North-Holland. White, H. (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48, 817–838. White, H. (1994). Estimation, Inference and Specification Analysis. New York: Cambridge University Press. White, H. (2001). Asymptotic Theory for Econometricians. San Diego: Academic Press. White, H. (2006). “Approximate Nonlinear Forecasting Methods.” In Handbook of Economics Forecasting. (G. Elliott, C.W.J. Granger and A. Timmermann, eds), New York: Elsevier, 460–512. Wong, C.S. and Li, W.K. (2001). “On a Mixture Autoregressive Conditional Heteroskedastic Model,” Journal of the American Statistical Association, 96, 982–995. Yang, M. and Bewley, R. (1995). “Moving Average Conditional Heteroskedastic Processes,” Economics Letters, 49, 367–372. Zarnowitz, V. and Lambros, L.A. (1987). “Consensus and Uncertainty in Economic Prediction,” Journal of Political Economy, 95, 591–621. Zako¨ıan, J.-M. (1994). “Threshold Heteroskedastic Models,” Journal of Economic Dynamics and Control, 18, 931–955. Zellner, A. (1986). “Bayesian Estimation and Prediction Using Asymmetric Loss Functions,” Journal of the American Statistical Association, 81, 446–451. Zhang, L., Mykland, P.A. and A¨ıt-Sahalia, Y. (2005). “A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data,” Journal of the American Statistical Association, 100, 1394–1411. Zhang, Z., Li, W.K. and Yuen, K.C. (2006). “On a Mixture GARCH Time Series Model,” Journal of Time Series Analysis, 27, 577–597. Zhou, B. (1996). “High-Frequency Data and Volatility in Foreign-Exchange Rates,” Journal of Business and Economic Statistics, 14, 45–52. Zivot, E. and Andrews, D.W.K. (1992). “Further Evidence on the Great Crash, the OilPrice Shock, and the Unit-Root Hypothesis,” Journal of Business and Economic Statistics, 10, 251–270.
Index
Page numbers in bold refer to glossary definitions.
AARCH 138 Abraham, J. M. 38, 48 ACD (Autoregressive Conditional Duration) 138 ACH1 (Autoregressive Conditional Hazard) 138–9 ACH2 (Adaptive Conditional Heteroskedasticity) 139 ACM (Autoregressive Conditional Multinomial) 139 Adair, M. 7 ADCC (Asymmetric Dynamic Conditional Correlations) 139 AGARCH1 (Asymmetric GARCH) 139, 151 see also TS-GARCH AGARCH2 (Absolute Value GARCH) 139 AGDCC 150 Ahn, D. 298 Aït-Sahalia, Y. 296, 298, 307, 308, 327 Aizenman, J. 98 Akaike’s information criterion 72–3 Alexander, C. 157 Almgren, R. 355 Altonji, J. 15 n.1 American Express (AXP) trade data 130, 130 Tab. 7.4, 131 Tab. 7.5 Andersen, T. 13, 108, 109, 118, 120, 121, 155, 298 Andrews, D. W. K. 71, 254 Ang, A. 117 ANN-ARCH (Artificial Neural Network ARCH) 139
ANOVA models 21 n.5 ANST-GARCH (Asymmetric Nonlinear Smooth Transition GARCH) 139–40 APARCH (Asymmetric Power ARCH) 140, 147, 152 AR (1) model 9 AR(1)-GARCH(1,1) processes and US inflation 202–9 ARCD (AutoRegressive Conditional Density) 140 ARCH (Autoregressive Conditional Heteroskedasticity) x–xi, 2–3, 5, 9, 35, 62–3, 78, 85, 117–18, 140–1, 165 fourth-order 93 Federal Reserve forecasting 87–91, 89 Tab. 5.3, 89 Fig. 5.4 glossary 80 Tab. 5.1, 137–63 and macroeconomics 79–96 OLS estimates of bias in Federal Reserve forecasting 89 Tab. 5.3 re-estimation of original UK inflation model 66–9, 67 Fig. 4.2, 67 Tab. 4.1, 68 Fig. 4.3 (a), (b), 69 Fig. 4.4, 71, 72, 73, 74 Fig. 4.4 (a), 77, 78 Taylor Rule and Federal Reserve policy 91–5, 92 Tab. 5.5, 93 Tab. 5.6, 94 Tab. 5.7, 95 Fig. 5.5 and volatility 79–80 see also GARCH Archipelago Exchange (ARCA) 355, 358
402 ARCH-M (ARCH-in-Mean) 4, 141–2 effects 296–7 ARCH-NNH (ARCH Nonstationary Nonlinear Heteroskedasticity) 141 ARCH-SM (ARCH Stochastic Mean) 142 ARCH-Smoothers 142 Arize, A. C. 139–40 ARMA model 357, 363 processes 82–3 representation 147, 148 Aruoba 110 asset market volatility 97–8 asymptotic standard errors 90 ATGARCH (Asymmetric Threshold GARCH) 142 AUG-GARCH (Augmented GARCH) 142–3 Augustine, monk 6 autocontours 214–15, 230 see also multivariate autocontours autoregressive conditional duration 5 autoregressive conditional skewness and kurtosis 231–56 conditional quantile-based measures 231 LRS model 240 Tab. 12.1, 241 Fig. 12.2, 241 Fig. 12.3, 242, 244, 245 quantile-based measures 238–9 time-varying for the S & P 500 239–44, 239 Fig. 12.1, 240 Tab. 12.1, 241 Fig. 12.2–12.3, 242 Tab. 12.2, 243 Fig. 12.4–12.5, 244 Fig. 12.6 AVGARCH (Absolute Value GARCH) 143 Baba, Y. 143 Babsiria, M. E. 120 n.3 Bacci, M. 8 Bahra, B. 326 Bai, J. 71, 214 Baillie, R. T. 77, 147 Bank of England 63, 65, 326 forecasts 74–6
Index Banz, R. 324 Barlett standard errors 360 Barndorff-Nielsen, O. E. xi, 118, 124, 125, 126, 127, 131, 132, 133 base multiplier model 20, 21 Trace test 26 Tab. 2.4 Basset, G. 7 Bates, D. 326, 328–9 Bauwens, L. 213 Beaud 21 n.5 Bekaert, G. 259 n.1, 262, 262 n.4 BEKK (Baba, Engle, Kraft, Kroner) 143 multivariate autocontours 225, 227, 228 Fig. 11.3, 228 Fig. 11.5, 229 Tab. 11.7 Benati, L. 65, 71, 77 Bera, A. K. 138, 156 Bernoulli random variable 216 Berzeg, K. 21 Bewley, R. 154 Biais, B. 354 Bierens, H. J. 198 Billio, M. 147 Black, F. 118 Black-Scholes (BS) option pricing model 323, 327, 336, 342, 347, 349, 352, equations 328, 341 gammas 149 implied volatilities (IVs) 324, 346 Bliss, R. 296, 326, 327, 328, 329, 342 Boero, G. x, 75 Boivin, J. 91 Bollerslev, T. xi, 6 ARCH (GARCH) 86 citations 7 GARCH modeling 7, 81, 142, 144, 147, 148, 149–50, 151, 155, 156, 157, 158 macroeconomic volatility and stock market volatility, world-wide 108, 109 models to limit order book dynamics 354, 355 realized semivariance 118, 120, 121, 122, 131
Index Boudoukh, J. xi Bowley coefficient of skewness 238–9 Box, G. E. P. 9 Brandt, M. W. 158 Breeden, D. 324 Brenner, R. 150, 153 Breusch, T. 7 Brockwell, P. 145 Brooks, R. 259 n.1, 262 n.4, 267 n.7 Brorsen, B. W. 159 Brown, H. 19, 24 Brown, S. 14, 16, 20, 27, 33 Brownian motion 126, 133, 134, 145, 161 Brownian semimartingale 119, 124, 126, 134 Bu, R. 329 Buchen, P. W. 329 building permits (US) 38–47, 40 Fig. 3.3, 42–3 Tab. 3.1, 44 Fig. 3.4, 45 Fig. 3.5, 46 Fig. 3.6 and GDP growth rate 36 Fig. 3.1 Bureau of Labor Statistics (US) 21 Bureau of the Census (US) 38, 39, 40 Burns, P. 157 business cycle effects 98 Cai, J. 151, 161 Calvet, L. E. 98 Campa, J. M. 326 Campbell, J. 296 Cao, C. Q. 148 Caporin, M. 145, 147 Cappiello, L. 139, 150 Carlino, G. A. 15 n.1, 19, 20, 24, 25 CARR (Conditional AutoRegressive Range) 143 Carter, C. K. 47 Castle, J. L. 69 Cat˜ ao, Luis xi CAViaR (Conditional Autoregressive Value At Risk) xi, 81, 143 see also multi-quantile CAViaR and skewness and kurtosis ccc (Constant Conditional Correlations) 143–4 CCC GARCH 144 Central Limit Theorem 82, 342
403 CGARCG (Component GARCH) 144 Chadraa, E. 145 Chan, K. C. 296 Chang, P. H. K. 326 Chen, J. 117 Chen, X. 120 n.3 Chen, Z. 214 Chernov, M. 202 Chesher, A. 197 Chibb, S. 47 Chicago Board of Trade 88 Choleski decomposition 220 Chou, R. 7, 143 Chow (1960) test 168, 183 Chriss, N. 355 Christodoulakis, G. A. 145 Christoffersen, P. F. 194 Chung, C. F. 77 Clarida, R. 91 Clark, P. K. 155 Clayton-Matthews, A. 38 Clements, M. P. 167 COGARCH (Continuous GARCH) 144–5, 146 cointegration 2, 3, 4, 5, 9, 14, 17, 18, 164 and long run shift-share modeling 22–33 conditional mean/variance x Conley, T. 307 Consensus Economics 195 constant share model (Model 4) 20, 24, 27, 33 Trace tests 25 Tab. 2.3 continuous-time model xi Copula GARCH 145 Corr ARCH (Correlated ARCH) 145 Corradi, V. 132 Coulson, N. E. x, 14, 15 n.1, 16, 19, 20, 21, 27, 33 Crone, T. M. 38, 48 Crouhy, H. 142 Crow, E. L. 238, 239 Czado, C. 146 Dacorogna, M. M. 151 DAGARCH (Dynamic Asymmetric GARCH) 145
404 Dai, Q. 296, 298, 307 Dav´e, R. D. 151 Davidson, J. 151 Davies test 268–9, 271 DCC-GARCH (Dynamic Conditional Correlations) xi, 145, 147 and multivariate autocontours 225, 227, 228 Fig. 11.4, 229 Tab. 11.7, 229 Fig. 11.6, 229 Tab. 11.7, 230 de Jong, R. M. 198 DeFina, R. 15 n.1 deforestation, Amazon 5 del Negro, M. 259 n.1, 262 n. 4, 267 n.7 den Hertog, R. G. J. 144 Department of Commerce (US) 39 Derman, E. 328 developing countries 101 DFM-SV model 38, 45–51 estimation of fixed model coefficients 47 filtering 47 US housing results 57–60, 58 Fig. 3.10, 59–60 Fig. 3.11–13 diag MGARCH (diagonal GARCH) 145–6 Dickey-Fuller (ADF) test 21–2, 69–71, 70 Fig. 4.5 (a), (b) Diebold, F. X. x–xi, 86, 108, 109, 110, 118, 120, 121, 131, 143, 147, 153, 194 Engle interview 13, 14 Ding, Z. 140, 225 Distaso, W. 132 Dittmar, R. 298 Domowitz, I. 354 Donaldson, R. G. 139 Doornik, J. A. 167, 190 Downing, C. xi downside risk 117–36 American Express (AXP) data 130, 130 Tab. 7.4, 131 Tab. 7.5 General Electric data 120–1, 121 Fig. 7.1, 128–30, 128 Tab. 7.2, 129 Tab. 7.3, 131 Tab. 7.5 IBM data 130, 130 Tab. 7.4, 131 tab. 7.5
Index measurement 117–36 trade data, general 130–1 Walt Disney (DIS) data 130, 130 Tab. 7.4, 131 Tab. 7.5 Drost, F. C. 163 DTARCH (Double Threshold ARCH) 146, 147 Duan, J. 142 Duchesne, P. 213 Duffie, D. 298 Dumas, B. 328 Dunn, E. S. Jr. 15, 18–19 Dupire, B. 328 Durbin, J. 7 Dynamic Multiple-Indicator Multiple-Cause (DYMIMIC) model 13–14 Ebens, H. 108 Econometric Society 2, 5, 9, 164 Econometrica 3, 5, 9, 62 Econometrics World Congress 2 Edgeworth expansion 328 EGARCH (Exponential GARCH) 62, 90–1, 91 Tab. 5.4, 143, 146, 147, 152 EGOGARCH (Exponential Continuous GARCH) 146 Elder, J. 79 electrical residential load forecasting 3 electricity prices/demand 4 Electronic Communications Networks (ECNs) 354 Elliott, G. 4–5, 9, 11, 195–6, 200 Emmerson, R. 15, 21 EMWA (Exponentially Weighted Moving Average) 146–7 Engle, R. 1, 5, 6, 9, 33, 37, 296 ARCH and GARCH modeling 4, 16, 20, 27, 35, 47, 86, 87, 92, 121–2, 138, 139, 140–1, 141–2, 144, 145, 147, 149, 150, 151, 152, 155, 156, 157, 159–60, 162, 164, 237, 327 ARCH paper (1982) x–xi, 2–3, 78, 79, 80, 205, 209 ARCH-M paper (1987) 203, 296
Index BEKK model 143, 225 CAViaR 81, 143, 231, 246 citations 7 Cornell PhD thesis x DCC model 225, 231 Diebold interview 13, 14 econometric volatility 97, 118, 137, 140, 257 interest rates 317 mean variance portfolio risk 355, 361 MEM 155 MIT x, 2, 13 Nobel Prize 2, 5, 78, 137 spline-GARCH model 98–9 Stern School of Business, NYU x, 4 super exogeneity 165 thesis examinees 10 time-varying volatility 194 TR 2 test 86, 86 Tab. 5.2, 87, 89, 96 UK inflation research 62, 66–9, 78 as urban economist x, 13–14, 33 equity model of volatility xi Ericsson, N. R. 165, 192 European Monetary System 66 European Union 64 Evans, M.D.D. 110 EVT-GARCH (Extreme Value Theory GARCH) 146 exchange rate mechanism 66 exogeneity see super exogeneity extreme value theory 146 Fama, E. F. 296 F-ARCH (Factor ARCH) 147 Favero, C. 165, 192 FCGARCH (Flexible Coefficient GARCH) 147 FDCC (Flexible Dynamic Conditional Correlations) 147 Federal Reserve see US Federal Reserve Ferguson, K. 120 Ferstenberg, R. 355 F-GARCH (Factor GARCH) 147 FIAPARCH (Fractionally Integrated Power ARCH) 147
405 FIEGARCH (Fractionally Integrated EGARCH) 147 FIGARCH (Fractionally Integrated GARCH) 147–8, 152, 154 Figlewski, S. xi Fiorentini, G. 153, 161 FIREGARCH 158 Fishburn, P. C. 120 Fisher, A. J. 98 Fisher, F. 13 Fisher-Tippett Theorem 342 Fisher transforms 145 Fisher’s Information Matrix 158 Fleming, J. 328 FLEX-GARCH (Flexible GARCH) 148 Forbes, K. 260, 288, 295 forecast errors see generalized forecast errors, optimality and measure Foster, D. 141, 142 Fountas, S. 79 four-part shift share model (Model 5) 21 fractional integration/long memory processes 3, 9 Fr´echet distribution 342 Frey, R. 146 Friedman, B. M. 154 Friedman, M. 63, 77, 78 Frisch, R. 182, 259 Fornari, F. 163 FTSE industry sectors 266 stock index options 327 ‘fundamental volatility’ 98, 98n.2, 99, 100, 105, 108–9 FX option prices 327 GAARCH, Generalized model 138 Gal´ı, J. 91 Gallant, A. R. 159, 202, 298 Gallo, J. P. 121–2 GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) 7, 62, 121–2, 123 Tab. 7.1, 130, 131 Tab. 7.5, 132 Tab. 7.7, 133, 138, 148–9, 152, 231, 240, 258, 259, 353 glossary 137–63
406 GARCH . . . (cont.) inference about the mean 81–7, 83 Fig. 5.1, 84 Fig. 5.2, 85 Fig. 5.3, 86 Tab. 5.2 see also multivariate GARCH models GARCH Diffusion 149 GARCH-EAR (GARCH Exponential AutoRegression) 149 GARCH-in-mean (GARCH-M) model 62, 77, 79 GARCH with skewness and kurtosis (GARCHSK) 140 GARCH-t (GARCH t-distribution) 149–50 GARCH-t generalization 93, 94 Tab. 5.7 GARCHX 150 GARCH-X1 150 GARCH-X2 150 GARCH-Γ (GARCH Gamma) 149 GARCH-Δ (GARCH Delta) 149 GARJI 150 Garratt, A. 69 Gaussian GARCH processes 83 Gaussian limit theory 132 Gaussian quasi-likelihood 122 GCR transformations 233 GDCC (Generalized Dynamic Conditional Correlations) 150 GDP, India and Pakistan 101–5 GDP, US 36, 91, 92, 99–100 growth 77 oil price effects 79 volatility 79 GED-GARCH (Generalized Error Distribution GARCH) 150 Gemmill, G. 327 General Electric (GE) trade data 120–1, 121 Fig. 7.1, 128–30, 128 Tab. 7.2, 129 Tab. 7.3, 131 Tab. 7.5 generalized forecast errors, optimality and measure 194–212 AR-GARCH processes 202–5 Linex inflation forecasts 205–7, 206 Fig. 10.3
Index Mincer-Zarnowitz regressions 195, 200, 209 MSE inflation forecasts 205–9, 206 Fig. 10.3, 208 Fig. 10.4 MSE loss error density 199–200, 202–5, 203 Fig. 10.1, 204 Fig. 10.2, 208 Fig. 10.4 “MSE-loss probability measure” 195 objective error density 202–5, 204 Fig. 10.2 properties under change of measure 200–2 properties under loss functions 197–9 “risk neutral probabilities” 195 testable implications under loss functions 196 US inflation application 205–9, 206 Fig. 10.3, 208 Fig. 10.4 Gertler, M. 91 Geweke, J. 38, 154, 156 Ghosh 21 n.5 Ghysels, E. 98, 120 n.3, 157 Giannoni, M. P. 91 Gibbons, M. 307 Gibbs sampling 47 GJR-GARCH (Glosten, Jagannathan and Runkle) 122, 123 Tab. 7.5, 130, 131 Tab. 7.5, 132 Tab. 7.7, 145, 147, 150–1, 152, 163 global stock market 266 globalization, role of 294 Glosten, L. R. 118, 150, 162, 361, 363, 364 Gobbo, M. 147 Goetzmann, W. N. 38, 48 GO-GARCH (Generalized Orthogonal GARCH) 151 Gon¸calves, S. 95 Gonz´ alez-Rivera, G. xi, 159–60, 214, 215, 216, 217, 224, 230 Google (GOOG) stock 359 Gourieroux, C. 158, 197 Gourlay, A. R. 321 GQARCH (Generalized Quadratic ARCH) 138, 151 Gram-Charlier series expansions 240
Index Granger, C. W. J. x. 9, 14, 194, 195, 196, 197 ARCH acronyms 137, 139 citations 7 downside risk 117 GDP hypothesis 109, 110 Tab. 6.1 ‘Granger Causality’ 7 Nobel Prize 2, 5 student co-publishers 11 Graversen, S. E. 125, 126 Gray, S. F. 151, 161 Great Inflation 62 Great Moderation 37, 38, 43, 60, 61, 62, 100, 100 n.8 Greek letter hedging parameters 326 Grier, K. B. 77, 79 Griffin, J. 276, 279, 281 Groen, J. J. J. 76 GRS-GARCH (Generalized Regime-Switching GARCH) 151 Gu´egan, D. 143 Hadri, K. 329 Haldane, A. 65, 66 Hall, A. 6, 354 Hall, B. 2 Ham, J. 15 n.1 Hamilton, J. x, 3, 4, 5, 9, 90, 98, 138, 151, 161 citations 7 student co-publishers 11 Han, H. 141 Hannan-Quinn criterion 73 Hansen, B. E. 152–3 Hansen, L. 98, 307 Hansen, P. R. 124 HARCH (Heterogeneous ARCH) 151–2 Harjes, R. 150, 153 Harmonised Index of Consumer Prices (CPI) 64 Harris, R. D. F. 159 Harrison, J. M. 195, 200 Hartley, M. J. 166 Harvey, C. R. 140, 160, 238 Haug, S. 146
407 Hausman, J. 355 Hautsch, N. 354 Hendry, D. F. xi, 3, 6, 69, 164, 165, 166, 167, 173, 175, 179, 190, 192 Hentschel, L. 152 Heston, S. L. 160, 261, 262 n.4, 276 Heston-Rouwenhorst decomposition scheme 267 HGARCH (Hentschel GARCH) 152 Higgins, M. L. 138, 156 high-frequency intraday data xi Hille, E. 309 n.5 Hillioin, P. 354 Hodrick, R. 259 n.1, 262, 262 n.4 Hogan, W. W. 120 Hooper, J. 1 Hoover, K. D. 176 Horvath, M. 15 n.1 housing construction, US Alabama population and housing permits 39, 50 Alaska, national factors 52 Arkansas population and housing permits 39, 50 building permit growth rate 41–4, 44 Fig. 3.4 building permits data 38–4 building permits data for representative states 40–1, 40 Fig. 3.3 California 50 conclusions 60–1 DFM-SV model, national/regional factors 38 DFM-SV model results 45–51, 57–60, 58 Fig. 3.10, 59–60 Fig. 3.11–13 DFM with split-sample estimates 51–7, 53–4 Tab. 3.3, 54 Tab. 3.4, 55–6 Tab. 3.5, 58, 60 estimated region results 49–51, 49 Fig. 37, 50 Tab. 3.2, 51 Fig. 3.8, 51 Fig. 3.9 estimation of housing market regions 47–9
  evolution of national/regional factors 35–61
  Florida 50, 52
  Georgia 52
  Hawaii 52
  Louisiana 50
  Mississippi population and housing permits 39, 52
  Missouri sampling standard error 39
  Nebraska sampling standard error 39
  Nevada 50
  Ohio sampling standard error 39
  Rhode Island 52
  seasonality 41–3, 42–3 Tab. 3.1
  South Carolina 52
  South Dakota 50
  spatial correlation 43–5, 46 Fig. 3.6
  standard deviations: volatility 43, 45 Fig. 3.5, 61
  Vermont 49, 50
  Virginia 52
  volatility 43, 57, 58, 60–1
  Washington 50
  West Virginia 52
  Wyoming 52
  Wyoming sampling standard error 39
Huang, X. 118, 131
Huber, P. J. 234–5
Hwang, S. 150, 238
HYGARCH (Hyperbolic GARCH) 152
IBM trade data 130, 130 Tab. 7.4, 131 Tab. 7.5
IGARCH (Integrated GARCH) 146, 148, 152–3
IMF 195
Implied Volatility (IV) 153
India, GDP 101–5
Industry Act UK (1975) 73
inflation uncertainty modeling, UK 62–78
  ARCH model re-estimation 66–9, 67 Fig. 4.2, 67 Tab. 4.1, 68 Fig. 4.3 (a), (b), 69 Fig. 4.4, 71, 72, 73, 74 Fig. 4.4 (a), 77, 78
  Bank of England forecasts 74–6
  Bretton Woods period 65
  business cycle range 65
  CPI 73
  exchange rate targeting 65, 66
  forecast uncertainty 73–6, 74 Fig. 4.6 (a), (b)
  monetary policy 65, 75
  Monetary Policy Committee (MPC) forecasts 74 Fig. 4.6 (a), 75
  mortgage interest and RPI 63–6
  non-stationary behaviour 69–73, 78
  policy environment 63–6
  Retail Price Index (RPI) 66, 69, 73
  seasonality 70
  short-term economic forecasts 73
  structural 'breaks' model 69, 71, 72 Tab. 4.2, 73–4, 74 Fig. 4.6 (a), 75–6, 77, 78
  Survey of External Forecasters (SEF) 74 Fig. 4.6 (a), (b), 75, 76
  'traditional'/modeling approaches compared 63
  Treasury forecasts 74 Fig. 4.6 (a), 76, 77
  uncertainty and the level of inflation 77
  unit root hypothesis 70–1
interest rate volatility, a multifactor, nonlinear, continuous-time model 296–322
  affine class models 298, 313
  ARCH-M effects 296–7
  asset pricing 314
  bond/fixed-income derivative pricing 297
  conditional distribution of 300–2, 302 Tab. 14.1, 303–7, 304 Fig. 14.3, 305 Fig. 14.4–5, 306 Fig. 14.6, 307 Fig. 14.7
  continuous-time multifactor diffusion process 307–13
  data description 299–300, 299 Fig. 14.1, 300 Fig. 14.2
  distribution of four possible states 300–2
  drift, diffusion and correlation approximations 308–13, 313 Fig. 14.8
  equilibrium vs. arbitrage-free debate 322
  fixed-income contingent claims 318–21
  Hopscotch method (Gourlay and McKee) 321
  Kernel estimation 303, 315
  Monte Carlo simulations 321
  nonaffine models 298
  "relevant pricing factor" 298
  stochastic behaviour of interest rates 298–307
  structure theory 298
  two-factor diffusion process results 316–18, 316 Fig. 14.9, 317 Fig. 14.10, 318 Fig. 14.11, 319 Fig. 14.12
  two-factor (Longstaff and Schwartz) model 298, 313–16, 322
  volatilities and levels 298, 301, 315, 316–21, 317 Fig. 14.10, 318 Fig. 14.11, 319 Fig. 14.12, 321
International Financial Statistics (IFS) 99
intraday high-frequency transactions xi
Irish, M. 197
Irons, J. S. 165
ISI Web of Science 79
Jackwerth, J. C. 326, 327, 328
Jacod, J. 118, 125, 126, 127, 133
Jagannathan, R. 98, 118, 150, 162
Jalil, M. 91
Jansen, E. S. 165
Jarque-Bera test 267, 275
Jenkins, G. M. 9
Johansen, S. 6, 7, 165, 166, 173
Jondeau, E. 145
Jones, C. S. 158
Jordà, O. 138
Judd, J. P. 91, 92
Juselius, K. 6
Kalliovirta, L. 214
Kalman filters 4, 35
Kamstra, M. 139
Kan, R. 298
Kani, I. 328
Kapetanios, G. 76
Karanasos, M. 79
Karolyi, G. A. 276, 279, 281, 296
Kavajecz, K. 354
Kawakatsu, H. 154
Kelly, M. 329
Kernel estimation 303, 315
Kilian, L. 95
Kim, E. Han 7
Kim, S. 47
Kim, T. xi
Kim, T.-H. xi, 231, 237, 238, 239, 252
Kimmel, R. 298, 307
King, M. 288
Kinnebrock, S. xi, 125, 126, 133–4
Klüppelberg, C. 144, 146
k-means cluster analysis 48–9
Kodres, L. E. 162
Koenker, R. 7
Kohn, R. 47
Komunjer, I. 5, 200, 236
Koppejans 354
Koren, M. 101
Kraft, D. 143
Kreps, D. M. 195, 200
Krolzig, H.-M. 165, 166
Kroner, K. F. 7, 143, 150, 153, 225
kurtosis
  autoregressive conditional 231–56
  GARCH with skewness and kurtosis (GARCHSK) 140
  and global equity returns 267, 275
  MQ-CAViaR autoregressive conditional skewness and kurtosis 232–4
Labys, P. 118, 120, 121
Laibson, D. I. 154
Lalancette, S. 213
LARCH (Linear ARCH) 153
latent GARCH 153
Laurent, S. 213
Lazar, E. 157
LeBaron, B. 149
Lebesgue measure 249
Lebesgue-Stieltjes probability density function (PDF) 232
Ledoit, O. 148
Lee, G. G. J. 144
Lee, K. 69, 79
Lee, L. F. 162
Lee, S. 118, 138, 142, 152–3
Lee, T. H. 150
Leibnitz, G. 198
León, Á. 140, 231, 240
Level-GARCH 153
Lévy processes 144
Lewis, A. L. 120
LGARCH (Leverage GARCH) 153
  see also GJR
LGARCH2 (Linear GARCH) 153
Li, C. W. 146
Li, W. K. 146, 156, 213
Li, Y. 133
Lilien, D. 35, 141–2, 203, 296, 297, 301, 317, 322
limit order book dynamics, new model 354–64
  ACM model 355
  ARMA model 357, 363
  data 358–60, 359 Fig. 16.1, 360 Fig. 16.2
  description 356–8
  estimation 358
  high volatility periods 361–2, 364
  mean variance portfolio risk 355
  results 360–4, 361 Tab. 16.1, 362 Tab. 16.2, 363 Tab. 16.3, 364 Fig. 16.3
Lin, G. 98
Lindner, A. 144, 145, 146
linear regressions 9
Ling, S. 213
Litterman, R. 296
Litzenberger, R. 324
Liu, S. M. 159
Ljung-Box test 361, 363
LM test 92
LMGARCH (Long Memory GARCH) 153–4
Lo, A. W. 307, 327, 355
log-GARCH (Logarithmic GARCH) 154
London School of Economics (LSE) 2
long base multiplier model 20, 27
  trace test 26 Tab. 2.4
long run shift-share modeling, metropolitan sectoral fluctuations 13–34
  Atlanta Trace test 23, 24, 25 Tab. 2.3, 26 Tab. 2.4
  Atlanta VARs 30 Tab. 2.7
  base multiplier model 20, 21, 26 Tab. 2.4
  Chicago Trace test 25 Tab. 2.3, 26 Tab. 2.4
  Chicago VARs 31 Tab. 2.8
  cointegration 22–33
  constant total share (Model 2) 19, 20, 24, 25
  constant share (Model 4) 20, 24, 27, 33
  Dallas Trace test 23–4, 25 Tab. 2.3, 26 Tab. 2.4
  Dallas VARs 29 Tab. 2.6
  data and evidence 21–33
  four-part shift share model (Model 5) 21
  general model 14–18
  intermediate model (C) 27–33, Tab. 2.5–2.9
  long base multiplier model 20, 27
  Los Angeles Trace test 23, 24, 25 Tab. 2.3, 26 Tab. 2.4
  Los Angeles VARs 32 Tab. 2.9
  model (D) 21, 27–33, Tab. 2.5–2.9
  orthogonalization matrix 16, 18, 33
  Philadelphia Trace test 24, 25 Tab. 2.3, 26 Tab. 2.4
  Philadelphia VARs 28 Tab. 2.5
  sectoral shift-share (Model 3) 19, 24
  short run shift-share model (A) 27–33, Tab. 2.5–2.9
  short run VAR model (B) 27–33, Tab. 2.5–2.9
  'total share component' 15
  total share model (Model 1) 18–19, 24
  trace test long run base multiplier 26 Tab. 2.4
  trace tests 22–7, 23 Tab. 2.2, 25 Tab. 2.3, 26 Tab. 2.4
  trace tests constant share model 25 Tab. 2.3
  'traditional' models 14
  unit root tests 22 Tab. 2.1
Longstaff, F. 296, 298, 313
Lucas, R. E. 165
Lucas critique 171, 182
Lumsdaine, R. L. 152–3
Lund, J. 298
Lunde, A. 124
Lütkepohl, H. 16
Luttmer, E. 307
Lyons, R. K. 110
MACH (Moving Average Conditional Heteroskedastic) 154
Machina, M. J. 8, 196
MacKinlay, A. C. 355
MacKinnon, J. 6
macroeconomic volatility and stock market volatility, world-wide 97–116
  annual consumption data 111, 113–14 Tab. 6.A2
  annual stock market data 111, 112–13 Tab. 6.A1
  asset market volatility 97–8
  basic relationship: stock return/GDP volatilities 101
  basic relationship: stock return/PCE volatilities 101, 103 Fig. 6.3
  choice of sample period 99–100
  controlling level of initial GDP 101–5, 104–5 Figs. 6.4–6.6, 106–7 Figs. 6.7–6.9
  cross-sectional analysis 103 Fig. 6.2, 107–8, 107 Fig. 6.9, 108 Fig. 6.10
  data 99–100
  developing countries 100
  distribution of volatilities 101, 102 Fig. 6.1
  empirical results 100–5
  Granger hypothesis 109, 110 Tab. 6.1
  panel analysis of causal direction 108–9
  quarterly stock index data 111, 114–15 Tab. 6.A3
  stock returns and GDP series 111, 116 Tab. 6.A4
  stock markets and developing countries 100
  transition economies 100
Maheu, J. M. 150
Maller, R. 144, 146
Malz, A. M. 326–7
Manganelli, S. xi, 81, 143, 231, 233, 237, 246
Mao, J. C. T. 120
MARCH1 (Modified ARCH) 154
MARCH2 see MGARCH2
Markov Chain 153, 264
Markov Chain Monte Carlo (MCMC) 38
Markov process 5, 151, 321
Markov switching 5, 260
Markowitz, H. 120
martingale difference properties 214
martingale difference sequence (MDS) 82, 88, 89, 235
martingale share model 20
Massmann, M. 166
Matrix EGARCH 154–5
Maximum Entropy principle 329
Maximum Likelihood Estimates 141
  see also QMLE
McAleer, M. 145
McARCH 95
  philosophy 81
McCulloch, J. H. 139, 159
McCurdy, T. H. 150
McKee, S. 321
McNees, S. K. 62
McNeil, A. J. 146
MDH (Mixture of Distribution Hypothesis) 155
mean absolute error (MAE) 73
Medeiros, M. C. 147
Meenagh, D. 63, 65–6, 71, 76, 77
Mele, A. 163
Melick, W. R. 326
Melliss, C. 73, 76
Melvin, M. 355
MEM (Multiplicative Error Model) 155
metropolitan sectoral fluctuations, sources of 13–34
  demand shocks 21, 27
  four aggregate levels 14–15
  growth rates 15
  industry share of employment 16
  productivity shocks 24, 27, 34
  supply shocks 14, 16, 21, 33
  technology shocks 20
  see also long run shift-share modeling
Metropolitan Statistical Areas 21
Mezrich, J. 162
MGARCH 148, 154, 163
MGARCH1 155–6
MGARCH2 (Multiplicative GARCH) 156
  see also log-GARCH
MGARCH3 (Mixture GARCH) 156
Mikkelsen, H. O. 86, 147
Milhøj, A. 86, 154, 156
Miller, M. 324
Mills, L. O. 19, 20, 24, 25
Milshtein, G. N. 297, 308, 309 n.5, 321
Mincer-Zarnowitz regressions 195, 200, 209
Minsky, H. P. 154
MIT 2
Mitchell, J. 75, 76
mixed data sampling (MIDAS) 98
MN-GARCH (Normal Mixture GARCH) 157
Monash University, Melbourne 2
monetary policy (US) 80
monetary policy shocks (UK/US) 283
Monfort, A. 158
Monte Carlo methods 80, 153
Moors coefficient of kurtosis 239
Moran's I 44 n.6, 46
Morgan, I. G. 162
mortgage rate deviation, US regional 37 Fig. 3.2
MQ-CAViaR autoregressive conditional skewness and kurtosis 232–4
"MSE-loss probability measure" 195
MS-GARCH (Markov Switching GARCH) see SWARCH
Mueller, P. 202
Müller, U. A. 151
multi-quantile CAViaR and skewness and kurtosis 231–56
  consistency and asymptotic normality 234–7
  consistent covariance matrix estimation 237
  estimations 240–4, 242 Tab. 12.2, 243 Fig. 12.5, 244 Fig. 12.6
  MQ-CAViaR process and model 232–4
  simulation 244–6, 245 Tab. 12.3, 246 Tab. 12.4
multivariate autocontours xi, 213–30
  concept 214–15, 230
multivariate dynamic models xi
multivariate GARCH models, autocontour testing
  BEKK model 225, 227, 228 Fig. 11.3, 228 Fig. 11.5, 229 Tab. 11.7
  DCC model 225, 227, 228 Fig. 11.4, 229 Fig. 11.6, 229 Tab. 11.7, 230
  empirical applications 224–30, 224 Tab. 11.5, 225 Tab. 11.6, 226 Fig. 11.2, 228–9 Figs. 11.3–11.6, 229 Tab. 11.7
  empirical process-based testing approach 214
  Monte Carlo simulations 215, 217, 219–22, 221 Tab. 11.1 (a), (b), 221 Tab. 11.2 (a), (b), 230
  normal distributions 218, 219, 220 Fig. 11.1, 224, 227
  power simulations 222–4, 223 Tab. 11.3, 224 Tab. 11.4
  quasi-maximum likelihood estimator 214
  Student-t distribution 218–19, 220 Fig. 11.1, 224, 227, 229 Tab. 11.7, 230
  testing methodology 215–17
MV-GARCH (MultiVariate GARCH) 156
  see also MGARCH1
Mykland, P. A. 118, 133
NAGARCH (Nonlinear Asymmetric GARCH) 156
Nam, K. 139–40
Nandi, S. 160
National Institute of Economic and Social Research (NIESR) 63, 75
Nelson, D. B. 118, 141, 142, 146, 148, 149, 150
Nelson, E. 65
Nerlove, M. 2, 63, 147, 153
neural networks 9
New Keynesian model 65
Newbold, P. 1
Newey, W. K. 206, 236
Newey-West corrections 80, 85, 86 Tab. 5.2
Ng, V. K. 118, 147, 156, 157–8, 160, 162, 296
NGARCH (Nonlinear GARCH) 152, 156–7
Ni, S. 79
Nielsen, B. 165, 166, 173
Nijman, T. E. 163
Nikolov, K. 65
NL-GARCH (NonLinear GARCH) 157
Nobel Prize 2, 5, 78
Norrbin, S. 15 n.1
North American Industry Classification System (NAICS) 21
Nottingham University 1
Nowicka-Zagrajek, J. 158
Nuffield College, Oxford 2
Nychka, D. W. 202
OGARCH (Orthogonal GARCH) 157
oil prices 259, 287, 326
oil shocks 283
Oliner, S. 287
OLS formula and tests 81–7, 83 Fig. 5.1, 84 Fig. 5.2, 85 Fig. 5.3, 86 Tab. 5.2, 90–1, 92, 96
OLS t-test, asymptotic rejection probability 83, 84
Olsen, R. B. 151
Orr, D. 2
Otsu, T. 236
Pagan, A. 7
Pakistan, GDP 101–5
Panigirtzoglou, N. 326, 327, 328, 329, 342
Pantula, S. G. 154, 156
parameter variation across the frequency domain 13
PARCH (Power ARCH) see NGARCH
Pareto distributions 139, 146, 159
Park, J. Y. 141, 158
Patton, A. J. xi, 145, 194, 195
PC-GARCH (Principal Component GARCH) 157
PcGets program 166
PcGive algorithms 166
Pearson, N. 296
Pedersen, C. S. 120
Pelloni, G. 79
Perez, S. J. 176
Perry, M. J. 77, 79
personal consumption expenditures (PCE) 99–100
Pesaran, M. H. 69
PGARCH1 (Periodic GARCH) 157
PGARCH2 (Power GARCH) see NGARCH
Phillips curve 64, 65
Phillips, R. 309 n.5
Piazzesi, M. 88
Pinto, B. 98
Pitts, M. 155
Ploberger, W. 198
PNP-ARCH (Partially NonParametric ARCH) 157–8
Podolskij, M. 125, 126, 133–4
Polasek, W. 79
Poll Tax (UK) 73
Portes, R. 258, 260
portfolio theory 120
Powell, J. L. 236, 237
Power ARCH see NGARCH
Power GARCH see NGARCH
"practitioner Black-Scholes" 324, 328, 336
  see also Black-Scholes (BS) option pricing model
Price, S. 76
Psaradakis, Z. 165
Pictet, O. V. 151
Pyun, C. S. 139–40
QARCH see GQARCH
QMLE (Quasi Maximum Likelihood Estimation) 158
QTARCH (Qualitative Threshold ARCH) 158
Quah, D. 65, 66
Quasi Maximum Likelihood Estimates (QMLE) 138
QUERI consultancy 4
Radon-Nikodým derivative 201
Ramanathan, R. 8, 15, 21
Ramaswamy, K. 307
Ramey, G. 98
Ramey, V. A. 8, 98
Ramm, W. 15, 21
Ranaldo, A. 354
Rangel, J. G. 160
Rangel spline-GARCH model 98–9
Ratti, R. 79
realized semivariance (RS) 117–36
  bipower variation 125, 131–2, 132 Tab. 7.7
  GARCH models 121–2, 123 Tab. 7.1, 130, 131 Tab. 7.5, 132 Tab. 7.7, 133
  GJR model 122, 123 Tab. 7.5, 130, 131 Tab. 7.5, 132 Tab. 7.7
  models and background 122–4
  noise effect 133
  realized variance (RV) 121, 122, 127, 129, 130
  realized variance (RV) definition 118–19
  signature plots 120, 121 Fig. 7.1 (d), (e)
REGARCH (Range EGARCH) 158
regional economics 3
Reider, R. L. 326
Retail Price Index (RPI) 63–5, 65 Fig. 4.1 (a), (b)
Revankar, N. S. 166
Rey, H. 258, 260
RGARCH1 (Randomized GARCH) 158
RGARCH2 (Robust GARCH) 158–9
RGARCH3 (Root GARCH) 159
Richard, J. F. xi, 164
Richardson, Matthew xi
Rigobon, R. 161, 260, 288, 295
RiskMetrics 147
risk-neutral density, US market portfolio estimation xi, 323–53
  adding tails 342–5, 345 Fig. 15.8
  arbitrage-free theoretical models 336
  Binomial Tree models 327–8
  Black-Scholes equations 328, 341
  Black-Scholes implied volatilities (IVs) 324, 346
  Black-Scholes option pricing model 323, 327, 336, 342, 347, 349, 352
  Central Limit Theorem 342
  dynamic behaviour 350–2, 351 Tab. 15.4
  and economic/political events 326
  estimating from S&P 500 index options 345–52, 347 Tab. 15.2, 348 Tab. 15.3
  extracting from option market prices, in practice 331–9, 332 Tab. 15.1, 333 Fig. 15.1, 334 Fig. 15.2, 335 Figs. 15.3–4, 338 Fig. 15.5
  extracting from option prices, in theory 329–31
  exchange rates and expectations 326–7, 329
  Extreme Value distribution 342
  Fisher-Tippett Theorem 342
  Garman-Kohlhagen model 327
  Generalized Extreme Value (GEV) distribution 325, 342–5, 345 Figs. 15.8–9, 349, 352
  Generalized Extreme Value (GEV) parameter values 343–4
  Generalized Extreme Value (GEV) tails 350–2
  Greek letter hedging parameters 326
  implied volatilities (IVs) 323–4, 326, 328, 334, 336, 337, 338, 339, 341, 342, 352
  market bid-ask spreads 339–41, 340 Fig. 15.6
  Maximum Entropy principle 329
  moments of risk-neutral density 349–50
  Monte Carlo simulations 329
  "practitioner Black-Scholes" 324, 328, 336
  risk preferences 324, 325, 327
  skewness and kurtosis 349–50
  "smoothing spline" 336–7
  spline functions 334, 336, 341
  summary 340–1, 341 Fig. 15.7
  tail parameters 347
  tails 352
  volatility 'smile' 323–4
Robins, R. 141–2, 203, 296, 297, 301, 317, 322
Robinson, P. M. 153
Rochester University 2
Rockinger, M. 142, 145
Rom, B. M. 120
Rombouts, J. V. K. 213
Rosenberg, J. xi, 149, 327
Rosu, I. 361, 363, 364
Rothenberg, Jerome 13
Rothschild, M. 147
Roulet, J. 265
Rouwenhorst, G. 261, 262 n.4, 276
RS-GARCH (Regime Switching GARCH) see SWARCH
Rubinstein, M. 327–8
Rubio, G. 140, 231, 240
Rudebusch, G. D. 91, 92, 110
Ruiz, E. 160
Runkle, D. 118, 150, 162
Russell, J. xi, 138, 139, 355, 361
RV (Realized Volatility) 159
Sack, B. 88
Saflekos, A. 327
Sakata, S. 244
San Diego University, California x, xi, 1–12
  changing years 4–6
  citations 7
  Econometrics Research Project 8
  founding years 2–3
  graduate students 6
  middle years 3–4
  university rankings 5–6
  visitors and students 6–7, 9–12
  wives 8
Sanders, A. 296
Santa-Clara, P. 98, 148
Santos, C. xi, 165, 175, 179
SARV (Stochastic AutoRegressive Volatility) see SV (Stochastic Volatility)
SARV(1) 161
Sasaki, K. 20, 21
Satchell, S. E. 120, 145, 150, 238
Scheinkman, J. 296, 307
Schlagenhauf, D. 15 n.1
Schwartz, E. 296, 298, 313
Schwarz criterion 73
Schwert, G. W. 98, 101, 108–9, 160, 162
sectoral shift-share (Model 3) 19, 24
Sensier, M. 71
Sentana, E. 138, 151, 153, 160, 161, 163, 288
Senyuz, Z. 214, 215, 216, 217, 224, 230
Serletis, A. 79
Serna, G. 140, 231, 240
Servén, L. 79
S-GARCH (Simplified GARCH) 159
SGARCH (Stable GARCH) 159
Shephard, N. xi, 47, 63, 118, 124, 125, 126, 127, 131, 132, 133, 153
Sheppard, K. 139, 150
Shields, K. 79
Shiller, R. 98, 296
Shimko, D. 334–6
Shin, Y. 69
short run shift-share modelling 17, 18
short-term economic forecasts, UK government 73
Sichel, D. 287
Siddique, A. 140, 238
Siddiqui, M. M. 238, 239
Sign-GARCH see GJR
Sill, K. 15 n.1
Sims, C. 2
Singleton, K. J. 296, 298, 307
skewness
  autoregressive conditional 231–56
  Bowley coefficient of 238–9
  GARCH with skewness and kurtosis (GARCHSK) 140
  and global equity returns 267, 275
  MQ-CAViaR autoregressive conditional skewness and kurtosis 232–4
Smith, J. x, 75
Söderlind, P. 326
Sohn, B. 98
Sola, M. 165
Solnik, B. 265
Sortino, F. 120
Sortino ratios 120
SPARCH (SemiParametric ARCH) 159–60
Spatt, C. 354
spectral regression 3
spline-GARCH 98–9, 160
SQR-GARCH (Square-Root GARCH) 160
Stambaugh, R. F. 86
Standard and Poor's (S&P) Emerging Markets Database 99
Standard and Poor's (S&P) 500 stock index 239–44, 325, 326, 327, 333, 334, 340, 344, 345–52, 353
Stanford University 2
Stanton, R. xi, 297, 308
STARCH (Structural ARCH) 160
Stdev-ARCH (Standard deviation ARCH) 160
Stern School of Business, NYU x, 4, 5
STGARCH (Smooth Transition GARCH) 147, 160–1
Stinchcombe, M. 8, 233
stochastic volatility 62–3
Stock, J. H. x, 47
stock market volatility x–xi, 97–116
stock market crash (1987) 276, 326, 327
Stoja, E. 159
Strong GARCH 161
Structural GARCH 161
Sun, T.-S. 296
super exogeneity xi, 4
super exogeneity, automatic tests of 164–93
  co-breaking based tests 186
  detectability in conditional models 169–70
  detectable shifts 166–70
  failures 172–3, 181–6, 193
  F-test, impulse-based 189–90
  F-test potency 187–90, 187 Tab. 9.2–9.3, 188 Tab. 9.4, 189 Tab. 9.5–9.6, 190 Tab. 9.7
  F-tests 175, 177, 179, 181, 182, 183, 185
  impulse saturation 165, 166, 173–5, 174 Fig. 9.2–9.3, 178–9, 186, 187, 192–3
  mean shift detection 179–81, 181 Tab. 9.2, 182
  Monte Carlo evidence 166, 175, 193
  Monte Carlo evidence and null rejection frequency 176–9, 177–9 Fig. 9.4
  null rejection frequency 175–9
  regression context 170–3
  simulating the potencies 186–90, 187 Tab. 9.2–9.3, 188 Tab. 9.4, 188 Fig. 9.5, 189 Tab. 9.5–9.6, 190 Tab. 9.7–9.8
  simulation outcomes 167–9, 169 Fig. 9.1
  six conditions 166
  UK money demand testing 190–2
  variance shift detection 181
Survey of External Forecasters, Bank of England 63
Survey of Professional Forecasters (US) 63, 76, 195
Survey Research Centre, Princeton University 39
Susmel, R. 151, 161
SV (Stochastic Volatility) 161
Svensson, L. 326
SVJ (Stochastic Volatility Jump) 161
Swanson, E. 88
Swanson, Norm 6
SWARCH (Regime Switching ARCH) 151, 161
Taniguchi, M. 142
Tauchen, G. 118, 131, 155, 159
Taylor, S. J. 148, 162
Taylor series expansions 309, 309 n.5
Tenreyro, S. 101
Teräsvirta, T. 6, 165
TGARCH (Threshold GARCH) 147, 150, 152, 161–2
Theil 21 n.5
Thomas, C. P. 326
Thompson, S. B. 98
Tieslau, M. A. 77
time series methods 9
time-varying volatility x
Timmermann, A. xi, 4–5, 9, 194, 195, 200, 264
  student co-publishers 12
Tobit-GARCH 162
Toro, J. 165, 166
total share model (Model 1) 18–19, 24
Trades and Quotes (TAQ) data 359
transition economies 100
Trevor, R. G. 162
Tse, Y. K. 145, 147, 213–14
TS-GARCH (Taylor-Schwert GARCH) 152, 159, 162
Tsui, A. K. C. 145, 213
Tucker, J. 159
'Tuesday's Econometrician's Lunch' 2
TVP-Level (Time-Varying Parameter Level) see Level-GARCH
UGARCH (Univariate GARCH) see GARCH
UK money demand and exogeneity 190–2
unit root inference 5
Unobserved GARCH see Latent GARCH
urban economics x, 13–14, 33
US Federal Reserve 80, 299, 339, 352
  forecasting 87–91
  fund rates 95
  future forecasts 89 Tab. 5.3, 89 Fig. 5.4
  monetary policy 205
  policy and Taylor Rule 91–5, 92 Tab. 5.5, 93 Tab. 5.6, 94 Tab. 5.7, 95 Fig. 5.5
US government bonds 299
US interest rates 141–2
US stock return data 224–6
US Treasury bill market 283
Valkanov, R. 98
Value Added Tax (UK) 73
van Dijk, D. 71
VAR (vector autoregression) x, 161
  B-form 16
  'city-industry' 14
VAR-GARCH framework and sectoral shocks 79
Varian, H. R. 200
Variance Targeting 162
VCC (Varying Conditional Correlations) see DCC
VCC-MGARCH (Varying Conditional Correlation) 145
vech GARCH (vectorized GARCH) see MGARCH1
vector equilibrium systems (EqCMs) 165
Vega, C. 109
Veiga, A. 147
Verbrugge, R. 15 n.1
Vetter, M. 133
VGARCH 160, 162–3
VGARCH2 (Vector GARCH) 163
VIX index 337, 353
volatility
  asset market 97–8
  equity model of xi
  'fundamental' 98, 98 n.2, 99, 100, 105, 108–9
  GDP, US 79
  housing construction, US 43, 57, 58, 60–1
  Implied Volatility (IV) 153
  interest rates 296–322
  limit order book periods 361–2, 364
  macroeconomic and stock market, world-wide 97–116
  'smile' 323–4
volatility regimes and global equity returns 257–95
  Akaike (AIC) information criteria 269, 270 Tab. 13.3, 271
  arbitrage pricing theory (APT) 258
  common nonlinear factor approach 265
  conclusions 293–5
  country-industry/affiliation factors 258–9, 260
  country-industry decomposition 259–60
  data 265–7
  economic fixed-length rolling windows approach 265, 279–81, 280 Fig. 13.5
  global portfolio allocation 287–93, 290 Tab. 13.7, 291–2 Tab. 13.8
  global risk diversification 261
  global stock return dynamics 267–75, 268 Tab. 13.1
  Hannan-Quinn (HQ) information criteria 269, 270 Tab. 13.3, 271
  Heston-Rouwenhorst decomposition scheme 267
  interpretation 281–7, 282 Tab. 13.6, 284 Fig. 13.7, 285 Fig. 13.7 (a), 286 Fig. 13.7 (b)
  industry and country effects benchmark 263
  industry portfolios and state combinations 288–9
  industry-specific shocks 283
  international equity flows 260
  international risk diversification 257, 258
  intra-monthly variance of stock returns 274 Fig. 13.2
  IT sector 287
  joint portfolio dynamics 270–3, 271 Tab. 13.4, 272 Tab. 13.5, 273 Fig. 13.1
  "mixtures of normals" model 260
  modeling stock return dynamics 263–5
  monetary policy shocks (UK/US) 283
  nonlinear dynamic common component models 270–3
  nonlinear dynamic dependencies 260
  nonlinearity in returns 268–70, 269 Tab. 13.3, 270 Tab. 13.3
  oil prices and shocks 283, 287
  portfolio diversification 294
  "pure" country and industry portfolios 261–3
  regime-switching/changes 279, 294–5
  regime-switching models 271, 293
  regime-switching processes 262, 267, 268
  risk diversification 288, 295
  robustness checks 273–5
  rolling windows approach 279
  rolling windows approach comparison 280 Fig. 13.5
  Schwarz Bayesian (BIC) information criteria 269, 270 Tab. 13.3
  sector-specific factors/shocks 257, 258, 295
  short term interest rates 283
  single state model 260
  skewness and kurtosis 267, 275
  smoothed state probabilities 273 Fig. 13.1, 285 Fig. 13.7 (a), 286 Fig. 13.7 (b)
  temporary switches 257
  variance decompositions 275–81, 277 Fig. 13.3, 278 Fig. 13.4, 280 Fig. 13.5, 282 Tab. 13.6
Volcker, P. 92, 93
von Weizsäcker 151
VSGARCH (Volatility Switching GARCH) 147, 163
Vuong, Q. 236
Wachter, S. M. 38, 48
Wadhwani, S. 288
wage rates (UK) 66
Wallis, K. F. x, 62, 75
Walt Disney (DIS) trade data 130, 130 Tab. 7.4, 131 Tab. 7.5
Wang, I. 354
Warren, J. M. 120
Watson, M. W. x, 6, 13, 35, 47
Waugh, F. V. 182
Weak GARCH 163
Weibull distribution 342
Weide, R. van der 151
Weiss, A. 234
Weron, A. 158
West, K. D. 198, 206
Whaley, R. E. 328
White, H. xi, 2, 3, 4, 5, 9, 84, 231, 233, 237, 238, 239, 244, 246–7, 252
  citations 7
  student co-publishers 11
white noise and processes 2, 82
White Standard Error test 4, 80, 85 Fig. 5.3, 86 Tab. 5.2, 90–1, 92, 96
White TR² test 85, 86, 86 Tab. 5.2, 87
Whittaker, R. 73, 76
Wiener processes 149
Wolf, M. 148
Wong, C. S. 156
Wooldridge, J. M. 122, 142, 155, 158
World Bank 99
World Development Indicators database (WDI) 99, 111
World Economic Outlook 195
World Federation of Exchanges 99
Xing, Y. 117
Yang, M. 154
Yilmaz, K. x–xi
Yixiao Sun 5
Yoldas, E. xi, 214, 215, 216, 217, 224, 230
Yuen, K. C. 156
Zakoian, J.-M. 120 n.3, 150, 161
ZARCH (Zakoian ARCH) see TGARCH
Zarnowitz, V. 76
Zellner, A. 2
Zhang, X. 156, 259 n.1, 262, 262 n.4
Zhou, B. 133
Zingales, L. 7
Zivot, E. 71