CONTENTS

NICHOLAS BLOOM: The Impact of Uncertainty Shocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
WHITNEY K. NEWEY AND FRANK WINDMEIJER: Generalized Method of Moments With Many Weak Moment Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
DONALD W. K. ANDREWS AND PATRIK GUGGENBERGER: Hybrid and Size-Corrected Subsampling Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
P.-A. CHIAPPORI AND I. EKELAND: The Microeconomics of Efficient Group Behavior: Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
MARCIANO SINISCALCHI: Vector Expected Utility and Attitudes Toward Variation . . . . . 801
FRANK RIEDEL: Optimal Stopping With Multiple Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . 857
GARY CHARNESS AND URI GNEEZY: Incentives to Exercise . . . . . . . . . . . . . . . . . . . . . . . . 909

NOTES AND COMMENTS:
NING SUN AND ZAIFU YANG: A Double-Track Adjustment Process for Discrete Markets With Substitutes and Complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933
SERGIO FIRPO, NICOLE M. FORTIN, AND THOMAS LEMIEUX: Unconditional Quantile Regressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 953
MICHAEL P. KEANE AND ROBERT M. SAUER: Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior . . . . . . . . . . . . . . . 975

ANNOUNCEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 993
FORTHCOMING PAPERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
VOL. 77, NO. 3 — May, 2009
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

EDITOR
STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]

MANAGING EDITOR
GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected]

CO-EDITORS
DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.; [email protected]
STEVEN BERRY, Dept. of Economics, Yale University, 37 Hillhouse Avenue/P.O. Box 8264, New Haven, CT 06520-8264, U.S.A.; [email protected]
WHITNEY K. NEWEY, Dept. of Economics, MIT, E52-262D, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.; [email protected]
WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]
LARRY SAMUELSON, Dept. of Economics, Yale University, New Haven, CT 06520-8281, U.S.A.; [email protected]
HARALD UHLIG, Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.; [email protected]

ASSOCIATE EDITORS
YACINE AÏT-SAHALIA, Princeton University; JOSEPH G. ALTONJI, Yale University; JAMES ANDREONI, University of California, San Diego; DONALD W. K. ANDREWS, Yale University; JUSHAN BAI, New York University; MARCO BATTAGLINI, Princeton University; PIERPAOLO BATTIGALLI, Università Bocconi; DIRK BERGEMANN, Yale University; MICHELE BOLDRIN, Washington University in St. Louis; VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology; J. DARRELL DUFFIE, Stanford University; JEFFREY ELY, Northwestern University; LARRY G. EPSTEIN, Boston University; HALUK ERGIN, Washington University in St. Louis; FARUK GUL, Princeton University; JINYONG HAHN, University of California, Los Angeles; PHILIP A. HAILE, Yale University; PHILIPPE JEHIEL, Paris School of Economics; YUICHI KITAMURA, Yale University; PER KRUSELL, Princeton University and Stockholm University; OLIVER LINTON, London School of Economics; BART LIPMAN, Boston University; THIERRY MAGNAC, Toulouse School of Economics; GEORGE J. MAILATH, University of Pennsylvania; DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse; STEVEN A. MATTHEWS, University of Pennsylvania; ROSA L. MATZKIN, University of California, Los Angeles; LEE OHANIAN, University of California, Los Angeles; WOJCIECH OLSZEWSKI, Northwestern University; ERIC RENAULT, University of North Carolina; PHILIP J. RENY, University of Chicago; JEAN-MARC ROBIN, Université de Paris 1 and University College London; SUSANNE M. SCHENNACH, University of Chicago; UZI SEGAL, Boston College; CHRIS SHANNON, University of California, Berkeley; NEIL SHEPHARD, Oxford University; MARCIANO SINISCALCHI, Northwestern University; JEROEN M. SWINKELS, Washington University in St. Louis; ELIE TAMER, Northwestern University; IVÁN WERNING, Massachusetts Institute of Technology; ASHER WOLINSKY, Northwestern University

EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.; [email protected]

Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA

1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society's website: http://www.econometricsociety.org/submissions.asp. Only electronic submissions will be accepted. In exceptional cases for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.

2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing's website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees.

3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or if he does so during the term in which Econometrica is considering the manuscript, then it is the author's responsibility to provide Econometrica with details. There is no page fee; nor is any payment made to the authors.

4. Econometrica has the policy that all empirical and experimental results as well as simulation experiments must be replicable. For this purpose the Journal editors require that all authors submit datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so such users must acknowledge all authors as the original creators and Econometrica as the original publishers. If you have a compelling reason, we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that a detailed data description and the programs used to generate the estimation datasets are deposited, as well as information on the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission. We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage, should be stated clearly when the paper is first submitted for review. In each case it will be at the editors' discretion whether the paper can be reviewed.

5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less.

6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions). While we understand some papers must be longer, if the main body of a manuscript (excluding appendices) is more than the aforementioned length, it will typically be rejected without review.

7. Additional information that may be of use to authors is contained in the "Manual for Econometrica Authors, Revised," written by Drew Fudenberg and Dorothy Hodges, and published in the July, 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftsmanship. One change from the procedures discussed in this document is that authors are not immediately told which co-editor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized.

8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords.

9. If you plan to submit a comment on an article which has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, then indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment.

10. Manuscripts on experimental economics should adhere to the "Guidelines for Manuscripts on Experimental Economics" written by Thomas Palfrey and Robert Porter, and published in the July, 1991 issue of Econometrica.

Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA.

Copyright © 2009 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Posting of an article on the author's own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email: [email protected].

Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

Membership, Subscriptions, and Claims

Membership, subscriptions, and claims are handled by Blackwell Publishing, P.O. Box 1269, 9600 Garsington Rd., Oxford, OX4 2ZE, U.K.; Tel. (+44) 1865-778171; Fax (+44) 1865-471776; Email [email protected]. North American members and subscribers may write to Blackwell Publishing, Journals Department, 350 Main St., Malden, MA 02148, USA; Tel. 781-3888200; Fax 781-3888232. Credit card payments can be made at www.econometricsociety.org. Please make checks/money orders payable to Blackwell Publishing. Memberships and subscriptions are accepted on a calendar year basis only; however, the Society welcomes new members and subscribers at any time of the year and will promptly send any missed issues published earlier in the same calendar year.

Individual Membership Rates
                                                                        $(a)    €(b)    £(c)   Concessionary(d)
Ordinary Member, 2009, Print + Online, 1933 to date                     $60     €40     £32        $45
Ordinary Member, 2009, Online only, 1933 to date                        $25     €18     £14        $10
Student Member, 2009, Print + Online, 1933 to date                      $45     €30     £25        $45
Student Member, 2009, Online only, 1933 to date                         $10      €8      £6        $10
Ordinary Member—3 years (2009–2011), Print + Online, 1933 to date       $175    €115    £92
Ordinary Member—3 years (2009–2011), Online only, 1933 to date          $70     €50     £38

Subscription Rates for Libraries and Other Institutions
                                                                        $(a)    €(b)    £(c)   Concessionary(d)
Premium 2009, Print + Online, 1999 to date                              $550    €360    £290       $50
Online 2009, Online only, 1999 to date                                  $500    €325    £260       Free

(a) All countries, excluding U.K., Euro area, and countries not classified as high income economies by the World Bank (http://www.worldbank.org/data/countryclass/classgroups.htm), pay the US$ rate. High income economies are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Canadian customers will have 6% GST added to the prices above.
(b) Euro area countries only.
(c) UK only.
(d) Countries not classified as high income economies by the World Bank only.

Back Issues
Single issues from the current and previous two volumes are available from Blackwell Publishing; see address above. Earlier issues from 1986 (Vol. 54) onward may be obtained from Periodicals Service Co., 11 Main St., Germantown, NY 12526, USA; Tel. 518-5374700; Fax 518-5375899; Email [email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487 General Manager: Claire Sashi (
[email protected])

2009 OFFICERS
ROGER B. MYERSON, University of Chicago, PRESIDENT
JOHN MOORE, University of Edinburgh and London School of Economics, FIRST VICE-PRESIDENT
BENGT HOLMSTRÖM, Massachusetts Institute of Technology, SECOND VICE-PRESIDENT
TORSTEN PERSSON, Stockholm University, PAST PRESIDENT
RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT

2009 COUNCIL
(*)DARON ACEMOGLU, Massachusetts Institute of Technology; MANUEL ARELLANO, CEMFI; SUSAN ATHEY, Harvard University; ORAZIO ATTANASIO, University College London; (*)TIMOTHY J. BESLEY, London School of Economics; KENNETH BINMORE, University College London; TREVOR S. BREUSCH, Australian National University; DAVID CARD, University of California, Berkeley; JACQUES CRÉMER, Toulouse School of Economics; (*)EDDIE DEKEL, Tel Aviv University and Northwestern University; MATHIAS DEWATRIPONT, Free University of Brussels; DARRELL DUFFIE, Stanford University; HIDEHIKO ICHIMURA, University of Tokyo; MATTHEW O. JACKSON, Stanford University; LAWRENCE J. LAU, The Chinese University of Hong Kong; CESAR MARTINELLI, ITAM; HITOSHI MATSUSHIMA, University of Tokyo; MARGARET MEYER, University of Oxford; PAUL R. MILGROM, Stanford University; STEPHEN MORRIS, Princeton University; ADRIAN R. PAGAN, Queensland University of Technology; JOON Y. PARK, Texas A&M University and Sungkyunkwan University; CHRISTOPHER A. PISSARIDES, London School of Economics; ROBERT PORTER, Northwestern University; ALVIN E. ROTH, Harvard University; LARRY SAMUELSON, Yale University; ARUNAVA SEN, Indian Statistical Institute; MARILDA SOTOMAYOR, University of São Paulo; JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editor, and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES Australasia: Trevor S. Breusch, Australian National University, CHAIR; Maxwell L. King, Monash University, SECRETARY. Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER. Far East: Joon Y. Park, Texas A&M University and Sungkyunkwan University, CHAIR. Latin America: Pablo Andres Neumeyer, Universidad Torcuato Di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY. North America: Roger B. Myerson, University of Chicago, CHAIR; Claire Sashi, New York University, SECRETARY. South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.
Econometrica, Vol. 77, No. 3 (May, 2009), 623–685
THE IMPACT OF UNCERTAINTY SHOCKS

BY NICHOLAS BLOOM1

Uncertainty appears to jump up after major shocks like the Cuban Missile crisis, the assassination of JFK, the OPEC I oil-price shock, and the 9/11 terrorist attacks. This paper offers a structural framework to analyze the impact of these uncertainty shocks. I build a model with a time-varying second moment, which is numerically solved and estimated using firm-level data. The parameterized model is then used to simulate a macro uncertainty shock, which produces a rapid drop and rebound in aggregate output and employment. This occurs because higher uncertainty causes firms to temporarily pause their investment and hiring. Productivity growth also falls because this pause in activity freezes reallocation across units. In the medium term the increased volatility from the shock induces an overshoot in output, employment, and productivity. Thus, uncertainty shocks generate short sharp recessions and recoveries. This simulated impact of an uncertainty shock is compared to vector autoregression estimations on actual data, showing a good match in both magnitude and timing. The paper also jointly estimates labor and capital adjustment costs (both convex and nonconvex). Ignoring capital adjustment costs is shown to lead to substantial bias, while ignoring labor adjustment costs does not.

KEYWORDS: Adjustment costs, uncertainty, real options, labor and investment.
1. INTRODUCTION

UNCERTAINTY APPEARS TO dramatically increase after major economic and political shocks like the Cuban missile crisis, the assassination of JFK, the OPEC I oil-price shock, and the 9/11 terrorist attacks. Figure 1 plots stock-market volatility—one proxy for uncertainty—which displays large bursts of uncertainty after major shocks, which temporarily double (implied) volatility on average.2 These volatility shocks are strongly correlated with other measures of uncertainty, like the cross-sectional spread of firm- and industry-level earnings and productivity growth. Vector autoregression (VAR) estimations suggest that they also have a large real impact, generating a substantial drop and rebound in output and employment over the following 6 months.

1 This article was the main chapter of my Ph.D. thesis, previously called "The Impact of Uncertainty Shocks: A Firm-Level Estimation and a 9/11 Simulation." I would like to thank my advisors Richard Blundell and John Van Reenen; the co-editor and the referees; my formal discussants Susanto Basu, Russell Cooper, Janice Eberly, Eduardo Engel, John Haltiwanger, Valerie Ramey, and Chris Sims; Max Floetotto; and many seminar audiences. Financial support of the ESRC and the Sloan Foundation is gratefully acknowledged.

2 In financial markets implied share-returns volatility is the canonical measure for uncertainty. Bloom, Bond, and Van Reenen (2007) showed that firm-level share-returns volatility is significantly correlated with a range of alternative uncertainty proxies, including real sales growth volatility and the cross-sectional distribution of financial analysts' forecasts. While Shiller (1981) has argued that the level of stock-price volatility is excessively high, Figure 1 suggests that changes in stock-price volatility are nevertheless linked with real and financial shocks.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6248
FIGURE 1.—Monthly U.S. stock market volatility. Notes: Chicago Board of Options Exchange VXO index of percentage implied volatility, on a hypothetical at the money S&P100 option 30 days to expiration, from 1986 onward. Pre-1986 the VXO index is unavailable, so actual monthly returns volatilities are calculated as the monthly standard deviation of the daily S&P500 index normalized to the same mean and variance as the VXO index when they overlap from 1986 onward. Actual and VXO are correlated at 0.874 over this period. A brief description of the nature and exact timing of every shock is contained in Appendix A. The asterisks indicate that for scaling purposes the monthly VXO was capped at 50. Uncapped values for the Black Monday peak are 58.2 and for the credit crunch peak are 64.4. LTCM is Long Term Capital Management.
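The splicing rule described in these figure notes is mechanical enough to sketch in code. Below is a minimal Python (pandas) illustration, assuming hypothetical inputs `vxo_monthly` (monthly VXO implied-volatility levels from 1986 onward) and `sp500_daily` (a daily S&P500 price index); it follows the rule in the notes rather than reproducing the paper's actual data construction.

```python
import pandas as pd

def spliced_volatility(vxo_monthly: pd.Series, sp500_daily: pd.Series) -> pd.Series:
    """Monthly volatility series: VXO where available, realized volatility
    of daily S&P500 returns before that, rescaled to the VXO's mean and
    variance over the overlapping sample."""
    # Realized volatility: within-month standard deviation of daily returns.
    daily_returns = sp500_daily.pct_change()
    realized = daily_returns.resample("M").std()

    # Rescale the realized series to match the VXO on the overlap period.
    overlap = realized.index.intersection(vxo_monthly.index)
    r, v = realized.loc[overlap], vxo_monthly.loc[overlap]
    rescaled = (realized - r.mean()) / r.std() * v.std() + v.mean()

    # Use the VXO from 1986 onward; fill earlier months with the rescaled series.
    return vxo_monthly.combine_first(rescaled)
```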
Uncertainty is also a ubiquitous concern of policymakers. For example, after 9/11 the Federal Open Market Committee (FOMC), worried about exactly the type of real-options effects analyzed in this paper, stated in October 2001 that "the events of September 11 produced a marked increase in uncertainty [. . .] depressing investment by fostering an increasingly widespread wait-and-see attitude." Similarly, during the credit crunch the FOMC noted that "Several [survey] participants reported that uncertainty about the economic outlook was leading firms to defer spending projects until prospects for economic activity became clearer."

Despite the size and regularity of these second-moment (uncertainty) shocks, there is no model that analyzes their effects. This is surprising given the extensive literature on the impact of first-moment (levels) shocks. This leaves open a wide variety of questions on the impact of major macroeconomic shocks, since these typically have both a first- and a second-moment component.

The primary contribution of this paper is to structurally analyze these types of uncertainty shocks. This is achieved by extending a standard firm-level
model with a time-varying second moment of the driving process and a mix of labor and capital adjustment costs. The model yields a central region of inaction in hiring and investment space due to nonconvex adjustment costs. Firms only hire and invest when business conditions are sufficiently good, and only fire and disinvest when they are sufficiently bad. When uncertainty is higher, this region of inaction expands—firms become more cautious in responding to business conditions.

I use this model to simulate the impact of a large temporary uncertainty shock and find that it generates a rapid drop, rebound, and overshoot in employment, output, and productivity growth. Hiring and investment rates fall dramatically in the 4 months after the shock because higher uncertainty increases the real-option value to waiting, so firms scale back their plans. Once uncertainty has subsided, activity quickly bounces back as firms address their pent-up demand for labor and capital. Aggregate productivity growth also falls dramatically after the shock because the drop in hiring and investment reduces the rate of reallocation from low to high productivity firms, which drives the majority of productivity growth in the model as in the real economy.3 Again productivity growth rapidly bounces back as pent-up reallocation occurs.

3 See Foster, Haltiwanger, and Krizan (2000, 2006).

In the medium term the increased volatility arising from the uncertainty shock generates a "volatility overshoot." The reason is that most firms are located near their hiring and investment thresholds, above which they hire/invest and below which they have a zone of inaction. So small positive shocks generate a hiring and investment response while small negative shocks generate no response. Hence, hiring and investment are locally convex in business conditions (demand and productivity). The increased volatility of business conditions growth after a second-moment shock therefore leads to a medium-term rise in labor and capital.

In sum, these second-moment effects generate a rapid slowdown and bounce-back in economic activity, entirely consistent with the empirical evidence. This is very different from the much more persistent slowdown that typically occurs in response to the type of first-moment productivity and/or demand shock that is usually modelled in the literature.4 This highlights the importance to policymakers of distinguishing between the persistent first-moment effects and the temporary second-moment effects of major shocks.

4 See, for example, Christiano, Eichenbaum, and Evans (2005) and the references therein.

I then evaluate the robustness of these predictions to general equilibrium effects, which for computational reasons are not included in my baseline model. To investigate this I build the falls in interest rates, prices, and wages that occur after actual uncertainty shocks into the simulation. This has little short-run effect on the simulations, suggesting that the results are robust to general equilibrium effects. The reason is that the rise in uncertainty following a second-moment shock not only generates a slowdown in activity, but it also makes firms
temporarily extremely insensitive to price changes. This raises a second policy implication that the economy will be particularly unresponsive to monetary or fiscal policy immediately after an uncertainty shock, suggesting additional caution when thinking about the policy response to these types of events.

The secondary contribution of this paper is to analyze the importance of jointly modelling labor and capital adjustment costs. For analytical tractability and aggregation constraints the empirical literature has estimated either labor or capital adjustment costs individually, assuming the other factor is flexible, or estimated them jointly, assuming only convex adjustment costs.5 I jointly estimate a mix of labor and capital adjustment costs (both convex and nonconvex) by exploiting the properties of homogeneous functions to reduce the state space. The estimation uses simulated method of moments on firm-level data to overcome the identification problem associated with the limited sample size of macro data. I find moderate nonconvex labor adjustment costs and substantial nonconvex capital adjustment costs. I also find that assuming capital adjustment costs only—as is standard in the investment literature—generates an acceptable overall fit, while assuming labor adjustment costs only—as is standard in the labor demand literature—produces a poor fit.

5 See, for example: on capital, Cooper and Haltiwanger (1993), Caballero, Engel, and Haltiwanger (1995), Cooper, Haltiwanger, and Power (1999), and Cooper and Haltiwanger (2006); on labor, Hammermesh (1989), Bertola and Bentolila (1990), Davis and Haltiwanger (1992), Caballero and Engel (1993), Caballero, Engel, and Haltiwanger (1997), and Cooper, Haltiwanger, and Willis (2004); on joint estimation with convex adjustment costs, Shapiro (1986), Hall (2004), and Merz and Yashiv (2007); see Bond and Van Reenen (2007) for a full survey of the literature.

The analysis of uncertainty shocks links with the earlier work of Bernanke (1983) and Hassler (1996) who highlighted the importance of variations in uncertainty.6 In this paper I quantify and substantially extend their predictions through two major advances: first, by introducing uncertainty as a stochastic process which is critical for evaluating the high-frequency impact of major shocks and, second, by considering a joint mix of labor and capital adjustment costs which is critical for understanding the dynamics of employment, investment, and productivity.

6 Bernanke developed an example of uncertainty in an oil cartel for capital investment, while Hassler solved a model with time-varying uncertainty and fixed adjustment costs. There are of course many other linked recent strands of literature, including work on growth and volatility such as Ramey and Ramey (1995) and Aghion, Angeletos, Banerjee, and Manova (2005), on investment and uncertainty such as Leahy and Whited (1996) and Bloom, Bond, and Van Reenen (2007), on the business-cycle and uncertainty such as Barlevy (2004) and Gilchrist and Williams (2005), on policy uncertainty such as Adda and Cooper (2000), and on income and consumption uncertainty such as Meghir and Pistaferri (2004).

This framework also suggests a range of future research. Looking at individual events, it could be used, for example, to analyze the uncertainty impact of trade reforms, major deregulations, tax changes, or political elections. It also suggests there is a trade-off between policy "correctness" and "decisiveness"—it may be better to act decisively (but occasionally incorrectly) than to deliberate
on policy, generating policy-induced uncertainty. For example, when the Federal Open Market Committee discussed the negative impact of uncertainty after 9/11 it noted that "A key uncertainty in the outlook for investment spending was the outcome of the ongoing Congressional debate relating to tax incentives for investment in equipment and software" (November 6th, 2001). Hence, in this case Congress's attempt to revive the economy with tax incentives may have been counterproductive due to the increased uncertainty the lengthy policy process induced.

More generally, the framework in this paper also provides one response to the "where are the negative productivity shocks?" critique of real business cycle theories.7 In particular, since second-moment shocks generate large falls in output, employment, and productivity growth, it provides an alternative mechanism to first-moment shocks for generating recessions. Recessions could simply be periods of high uncertainty without negative productivity shocks. Encouragingly, recessions do indeed appear in periods of significantly higher uncertainty, suggesting an uncertainty approach to modelling business cycles (see Bloom, Floetotto, and Jaimovich (2007)). Taking a longer-run perspective this paper also links to the volatility and growth literature, given the large negative impact of uncertainty on output and productivity growth.

7 See the extensive discussion in King and Rebelo (1999).

The rest of the paper is organized as follows: in Section 2, I empirically investigate the importance of jumps in stock-market volatility; in Section 3, I set up and solve my model of the firm; in Section 4, I characterize the solution of the model and present the main simulation results; in Section 5, I outline my simulated method of moments estimation approach and report the parameter estimates using U.S. firm data; and in Section 6, I run some robustness tests on the simulation results. Finally, Section 7 offers some concluding remarks. Data and programs are provided in an online supplement (Bloom (2009)).

2. DO JUMPS IN STOCK-MARKET VOLATILITY MATTER?

Two key questions to address before introducing any models of uncertainty shocks are (i) do jumps8 in the volatility index in Figure 1 correlate with other measures of uncertainty and (ii) do these have any impact on real economic outcomes? In Section 2.1, I address the first question by presenting evidence showing that stock-market volatility is strongly linked to other measures of productivity and demand uncertainty. In Section 2.2, I address the second question by presenting vector autoregression (VAR) estimations showing that volatility shocks generate a short-run drop in industrial production of 1%, lasting about 6 months, and a longer-run overshoot. First-moment shocks to the interest rates and stock-market levels generate a much more gradual drop and rebound in activity lasting 2 to 3 years. A full data description for both sections is contained in Appendix A.9

8 I tested for jumps in the volatility series using the bipower variation test of Barndorff-Nielsen and Shephard (2006) and found statistically significant evidence for jumps. See Appendix A.1.

9 All data and program files are also available at http://www.stanford.edu/~nbloom/.

2.1. Empirical Evidence on the Links Between Stock-Market Volatility and Uncertainty

The evidence presented in Table I shows that a number of cross-sectional measures of uncertainty are highly correlated with time-series stock-market volatility. Stock-market volatility has also been previously used as a proxy for uncertainty at the firm level (e.g., Leahy and Whited (1996) and Bloom, Bond, and Van Reenen (2007)).

Columns 1–3 of Table I use the cross-sectional standard deviation of firms' pretax profit growth, taken from the quarterly accounts of public companies. As can be seen from column 1 stock-market time-series volatility is strongly correlated with the cross-sectional spread of firm-level profit growth. All variables in Table I have been normalized by their standard deviations (SD). The coefficient implies that the 2.47 SD rise in stock-market time-series volatility that occurred on average after the shocks highlighted in Figure 1 would be associated with a 1.31 SD (1.31 = 2.47 × 0.532) rise in the cross-sectional spread of the growth rate of profits, a large increase. Column 2 reestimates this including a full set of quarterly dummies and a time trend, finding very similar results.10 Column 3 also includes quarterly Standard Industrial Classification (SIC) three-digit industry controls and again finds similar results,11 suggesting that idiosyncratic firm-level shocks are driving the time-series variations in volatility.

10 This helps to control for any secular changes in volatility (see Davis, Haltiwanger, Jarmin, and Miranda (2006)).

11 This addresses the type of concerns that Abraham and Katz (1986) raised about Lillien's (1982) work on unemployment, where time-series variations in cross-sectional unemployment appeared to be driven by heterogeneous responses to common macro shocks.

Columns 4–6 use a monthly cross-sectional stock-return measure and show that this is also strongly correlated with the stock-return volatility index. Columns 7 and 8 report the results from using the standard deviation of annual five-factor Total Factor Productivity (TFP) growth within the National Bureau of Economic Research (NBER) manufacturing industry data base. There is also a large and significant correlation of the cross-sectional spread of industry productivity growth and stock-market volatility. Finally, columns 9 and 10 use a measure of the dispersion across macro forecasters over their predictions for future gross domestic product (GDP), calculated from the Livingstone half-yearly survey of professional forecasters. Once again, periods of high stock-market volatility are significantly correlated with cross-sectional dispersion, in this case in terms of disagreement across macro forecasters.
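As a rough illustration of the regressions reported in Table I below, the following Python sketch computes a within-period cross-sectional spread from a firm-level panel and regresses the standardized volatility index on it. The input names (`firm_panel` with columns 'period', 'firm', 'growth'; `volatility` indexed by period) are assumptions of the sketch, not objects from the paper's data files.

```python
import pandas as pd
import statsmodels.api as sm

def volatility_on_spread(firm_panel: pd.DataFrame, volatility: pd.Series):
    """Regress stock-market volatility on the within-period cross-sectional
    standard deviation of a firm-level variable, both scaled to unit SD."""
    # Within-period cross-sectional standard deviation of the firm variable.
    spread = firm_panel.groupby("period")["growth"].std()

    df = pd.concat({"vol": volatility, "spread": spread}, axis=1).dropna()
    # Normalize both series to a standard deviation of 1, as in the table notes.
    df = df / df.std()

    return sm.OLS(df["vol"], sm.add_constant(df["spread"])).fit()
```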
TABLE I
THE STOCK-MARKET VOLATILITY INDEX REGRESSED ON CROSS-SECTIONAL MEASURES OF UNCERTAINTY(a)

Dependent variable is stock-market volatility(b); the explanatory variable in each column is the period-by-period cross-sectional standard deviation of the series named in the row.

                                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)
Firm profit growth,(c)                0.532    0.526    0.469
  Compustat quarterly                (0.064)  (0.092)  (0.115)
Firm stock returns,(d)                                           0.543    0.544    0.570
  CRSP monthly                                                  (0.037)  (0.038)  (0.037)
Industry TFP growth,(e)                                                                    0.429    0.419
  SIC 4-digit yearly                                                                      (0.119)  (0.125)
GDP forecasts,(f)                                                                                            0.614    0.579
  Livingstone half-yearly                                                                                   (0.111)  (0.121)
Time trend                              No      Yes      Yes       No      Yes      Yes       No      Yes       No      Yes
Month/quarter/half-year dummies(g)      No      Yes      Yes       No      Yes      Yes      n/a      n/a       No      Yes
Controls for SIC 3-digit industry(h)    No       No      Yes       No       No      Yes      n/a      n/a      n/a      n/a
R2                                    0.287    0.301    0.238    0.287    0.339    0.373    0.282    0.284    0.332    0.381
Time span                                  62Q3–05Q1                 62M7–06M12                1962–1996          62H2–98H2
Average units in cross section(i)             327                       355                       425                57.4
Observations in regression                    171                       534                        35                  63

(a) Each column reports the coefficient from regressing the time series of stock-market volatility on the within-period cross-sectional standard deviation (SD) of the explanatory variable calculated from an underlying panel. All variables are normalized to a SD of 1. Standard errors are given in parentheses below the coefficients. So, for example, column 1 reports that the stock-market volatility index is on average 0.532 SD higher in a quarter when the cross-sectional spread of firms' profit growth is 1 SD higher.
(b) The stock-market volatility index measures monthly volatility on the U.S. stock market and is plotted in Figure 1. The quarterly, half-yearly, and annual values are calculated by averaging across the months within the period.
(c) The standard deviation of firm profit growth measures the within-quarter cross-sectional spread of profit growth rates normalized by average sales, defined as (profits_t − profits_{t−1})/(0.5 × sales_t + 0.5 × sales_{t−1}), and uses firms with 150+ quarters of data in Compustat quarterly accounts.
(d) The standard deviation of firm stock returns measures the within-month cross-sectional standard deviation of firm-level stock returns for firms with 500+ months of data in the Center for Research in Securities Prices (CRSP) stock-returns file.
(e) The standard deviation of industry TFP growth measures the within-year cross-industry spread of SIC 4-digit manufacturing TFP growth rates, calculated using the five-factor TFP growth figures from the NBER data base.
(f) The standard deviation of GDP forecasts comes from the Philadelphia Federal Reserve Bank's biannual Livingstone survey, calculated as the (standard deviation/mean) of forecasts of nominal GDP 1 year ahead, using half-years with 50+ forecasts, linearly detrended to remove a long-run downward drift.
(g) Month/quarter/half-year dummies refers to quarter, month, and half-year controls for period effects.
(h) Controls for SIC 3-digit industry denotes that the cross-sectional spread is calculated with SIC 3-digit by period dummies so the profit growth and stock returns are measured relative to the industry period average.
(i) Average units in cross section refers to the average number of units (firms, industries, or forecasters) used to measure the cross-sectional spread.
2.2. VAR Estimates on the Impact of Stock-Market Volatility Shocks

To evaluate the impact of uncertainty shocks on real economic outcomes I estimate a range of VARs on monthly data from June 1962 to June 2008.12 The variables in the estimation order are log(S&P500 stock market index), a stock-market volatility indicator (described below), Federal Funds Rate, log(average hourly earnings), log(consumer price index), hours, log(employment), and log(industrial production). This ordering is based on the assumptions that shocks instantaneously influence the stock market (levels and volatility), then prices (wages, the consumer price index (CPI), and interest rates), and finally quantities (hours, employment, and output). Including the stock-market levels as the first variable in the VAR ensures the impact of stock-market levels is already controlled for when looking at the impact of volatility shocks. All variables are Hodrick–Prescott (HP) detrended (λ = 129,600) in the baseline estimations.

12 Note that this period excludes most of the Credit Crunch, which is too recent to have full VAR data available. I would like to thank Valerie Ramey and Chris Sims (my discussants) for their initial VAR estimations and subsequent discussions.

The main stock-market volatility indicator is constructed to take a value 1 for each of the shocks labelled in Figure 1 and a 0 otherwise. These 17 shocks were explicitly chosen as those events when the peak of HP detrended volatility level rose significantly above the mean.13 This indicator function is used to ensure that identification comes only from these large, and arguably exogenous, volatility shocks rather than from the smaller ongoing fluctuations.

13 The threshold was 1.65 standard deviations above the mean, selected as the 5% one-tailed significance level treating each month as an independent observation. The VAR estimation also uses the full volatility series (which does not require defining shocks) and finds very similar results, as shown in Figure A1.

Figure 2 plots the impulse response function of industrial production (the solid line with plus symbols) to a volatility shock. Industrial production displays a rapid fall of around 1% within 4 months, with a subsequent recovery and rebound from 7 months after the shock. The 1 standard-error bands (dashed lines) are plotted around this, highlighting that this drop and rebound is statistically significant at the 5% level. For comparison to a first-moment shock, the response to a 1% impulse to the Federal funds rate (FFR) is also plotted (solid line with circular symbols), displaying a much more persistent drop and recovery of up to 0.7% over the subsequent 2 years.14 Figure 3 repeats the same exercise for employment, displaying a similar drop and recovery in activity.

14 The response to a 5% fall in the stock-market levels (not plotted) is very similar in size and magnitude to the response to a 1% rise in the FFR.

Figures A1, A2, and A3 in the Appendix confirm the robustness of these VAR results to a range of alternative approaches over variable ordering, variable inclusion, shock definitions, shock timing, and detrending. In particular, these results are robust to identification from uncertainty shocks defined by the 10 exogenous shocks arising from wars, OPEC shocks, and terror events.
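A minimal sketch of this kind of recursively identified, HP-detrended VAR is given below using Python and statsmodels. The input DataFrame, its column ordering, and the lag length are assumptions of the sketch; it illustrates the mechanics rather than reproducing the exact estimations behind Figures 2 and 3.

```python
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.filters.hp_filter import hpfilter

def uncertainty_var(data: pd.DataFrame, lags: int = 12):
    """Fit a VAR on HP-detrended monthly series and return impulse responses.
    The column order of `data` is assumed to encode the recursive ordering in
    the text: log S&P500 level, volatility-shock indicator, federal funds
    rate, log wages, log CPI, hours, log employment, log industrial production."""
    # HP-detrend each series with the monthly smoothing parameter 129,600;
    # hpfilter returns (cycle, trend), and the cycle is the detrended part.
    detrended = data.apply(lambda s: hpfilter(s, lamb=129_600)[0])

    results = VAR(detrended).fit(lags)

    # Impulse responses over 36 months; the orthogonalized responses in the
    # returned object use the Cholesky factor implied by the column ordering.
    return results.irf(36)
```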
FIGURE 2.—VAR estimation of the impact of a volatility shock on industrial production. Notes: Dashed lines are 1 standard-error bands around the response to a volatility shock.
FIGURE 3.—VAR estimation of the impact of a volatility shock on employment. Notes: Dashed lines are 1 standard-error bands around the response to a volatility shock.

3. MODELLING THE IMPACT OF AN UNCERTAINTY SHOCK

In this section I model the impact of an uncertainty shock. I take a standard model of the firm15 and extend it in two ways. First, I introduce uncertainty as a stochastic process to evaluate the impact of the uncertainty shocks shown in Figure 1. Second, I allow a joint mix of convex and nonconvex adjustment costs for both labor and capital. The time-varying uncertainty interacts with the nonconvex adjustment costs to generate time-varying real-option effects, which drive fluctuations in hiring and investment. I also build in temporal and cross-sectional aggregation by assuming firms own large numbers of production units, which allows me to estimate the model's parameters on firm-level data.

15 See, for example, Bertola and Caballero (1994), Abel and Eberly (1996), or Caballero and Engel (1999).

3.1. The Production and Revenue Function

Each production unit has a Cobb–Douglas16 production function

(3.1)  $F(\tilde{A}, K, L, H) = \tilde{A} K^{\alpha} (LH)^{1-\alpha}$

in productivity ($\tilde{A}$), capital (K), labor (L), and hours (H). The firm faces an isoelastic demand curve with elasticity (ε), $Q = B P^{-\varepsilon}$, where B is a (potentially stochastic) demand shifter. These can be combined into a revenue function $R(\tilde{A}, B, K, L, H) = \tilde{A}^{1-1/\varepsilon} B^{1/\varepsilon} K^{\alpha(1-1/\varepsilon)} (LH)^{(1-\alpha)(1-1/\varepsilon)}$. For analytical tractability I define $a = \alpha(1-1/\varepsilon)$ and $b = (1-\alpha)(1-1/\varepsilon)$, and substitute $A^{1-a-b} = \tilde{A}^{1-1/\varepsilon} B^{1/\varepsilon}$, where A combines the unit-level productivity and demand terms into one index, which for expositional simplicity I will refer to as business conditions. With these redefinitions we have17

$S(A, K, L, H) = A^{1-a-b} K^{a} (LH)^{b}.$

Wages are determined by undertime and overtime hours around the standard working week of 40 hours. Following the approach in Caballero and Engel (1993), this is parameterized as $w(H) = w_1(1 + w_2 H^{\gamma})$, where $w_1$, $w_2$, and γ are parameters of the wage equation to be determined empirically.

16 While I assume a Cobb–Douglas production function, other supermodular homogeneous unit revenue functions could be used. For example, when replacing (3.1) with a constant elasticity of substitution aggregator over capital and labor, where $F(\tilde{A}, K, L, H) = \tilde{A}(\alpha_1 K^{\sigma} + \alpha_2 (LH)^{\sigma})^{1/\sigma}$, I obtained similar simulation results.

17 This reformulation to A as the stochastic variable to yield a jointly homogeneous revenue function avoids long-run effects of uncertainty reducing or increasing output because of convexity or concavity in the production function. See Abel and Eberly (1996) for a discussion.

3.2. The Stochastic Process for Demand and Productivity

I assume business conditions evolve as an augmented geometric random walk. Uncertainty shocks are modelled as time variations in the standard deviation of the driving process, consistent with the stochastic volatility measure of uncertainty in Figure 1.
Business conditions are in fact modelled as a multiplicative composite of three separate random walks18: a macro-level component ($A^M_t$), a firm-level component ($A^F_{it}$), and a unit-level component ($A^U_{ijt}$), where $A_{ijt} = A^M_t A^F_{it} A^U_{ijt}$ and i indexes firms, j indexes units, and t indexes time. The macro-level component is modelled as

(3.2)  $A^M_t = A^M_{t-1}(1 + \sigma_{t-1} W^M_t), \qquad W^M_t \sim N(0, 1),$

where $\sigma_t$ is the standard deviation of business conditions and $W^M_t$ is a macro-level independent and identically distributed (i.i.d.) normal shock. The firm-level component is modelled as

(3.3)  $A^F_{it} = A^F_{i,t-1}(1 + \mu_{it} + \sigma_{t-1} W^F_{it}), \qquad W^F_{it} \sim N(0, 1),$

where $\mu_{it}$ is a firm-level drift in business conditions and $W^F_{it}$ is a firm-level i.i.d. normal shock. The unit-level component is modelled as

(3.4)  $A^U_{ijt} = A^U_{ij,t-1}(1 + \sigma_{t-1} W^U_{ijt}), \qquad W^U_{ijt} \sim N(0, 1),$

where $W^U_{ijt}$ is a unit-level i.i.d. normal shock. I assume $W^M_t$, $W^F_{it}$, and $W^U_{ijt}$ are all independent of each other. While this demand structure may seem complex, it is formulated to ensure that (i) units within the same firm have linked investment behavior due to common firm-level business conditions, and (ii) they display some independent behavior due to the idiosyncratic unit-level shocks, which is essential for smoothing under aggregation.

18 A random-walk driving process is assumed for analytical tractability, in that it helps to deliver a homogeneous value function (details in the next section). It is also consistent with Gibrat's law. An equally plausible alternative assumption would be a persistent AR(1) process, such as the following based on Cooper and Haltiwanger (2006): $\log(A_t) = \alpha + \rho \log(A_{t-1}) + v_t$, where $v_t \sim N(0, \sigma_{t-1})$ and $\rho = 0.885$. To investigate this alternative I programmed another monthly simulation with autoregressive business conditions and no labor adjustment costs (so I could drop the labor state) and all other modelling assumptions the same. I found in this setup there were still large real-options effects of uncertainty shocks on output, as plotted in Figure S1 in the supplemental material (Bloom (2009)).

19 This formulation also generates business-conditions shocks at the unit level (firm level) that have three (two) times more variance than at the macro level. This appears to be inconsistent with actual data, since establishment data on things like output and employment are many times more volatile than the macro equivalent. However, it is worth noting two points. First, micro data also typically have much more measurement error than macro data so this could be causing the much greater variance of micro data. In stock-returns data, one of the few micro and macro indicators with almost no measurement error, firm stock returns have twice the variance of aggregate returns consistent with the modelling assumption. Second, because of the nonlinearities in the investment and hiring response functions (due to nonconvex adjustment costs), output and input growth is much more volatile at the unit level than at the macro level, which is smoothed by aggregation. So even if the unit-, firm-, and macro-level business conditions processes all have the same variance, the unit- and firm-level employment, capital, and sales growth outcomes will be more volatile due to more lumpy hiring and investment.

This demand structure also assumes that macro-, firm-, and unit-level uncertainty are the same.19 This is broadly consistent with the results from Table I for firm and macro uncertainty, which show these are highly
correlated. For unit-level uncertainty there is no direct evidence on this, but to the extent that this assumption does not hold, the quantitative impact of macro uncertainty shocks will be reduced (since total uncertainty will fluctuate less than one for one with macro uncertainty), while the qualitative findings will remain. I also evaluate this assumption in Section 6.3 by simulating an uncertainty shock which only changes the variance of $W^M_t$ (rather than changing the variance of $W^M_t$, $W^F_{it}$, and $W^U_{ijt}$), with broadly similar results.

The firm-level business conditions drift ($\mu_{it}$) is also assumed to be stochastic to allow autocorrelated changes over time within firms. This is important for empirically identifying adjustment costs from persistent differences in growth rates across firms, as Section 5 discusses in more detail. The stochastic volatility process ($\sigma_t^2$) and the demand conditions drift ($\mu_{it}$) are both assumed for simplicity to follow two-point Markov chains

(3.5)  $\sigma_t \in \{\sigma_L, \sigma_H\}$, where $\Pr(\sigma_{t+1} = \sigma_j \mid \sigma_t = \sigma_k) = \pi^{\sigma}_{kj}$,

(3.6)  $\mu_{it} \in \{\mu_L, \mu_H\}$, where $\Pr(\mu_{i,t+1} = \mu_j \mid \mu_{it} = \mu_k) = \pi^{\mu}_{kj}$.

3.3. Adjustment Costs

The third piece of technology that determines the firms' activities is the adjustment costs. There is a large literature on investment and employment adjustment costs which typically focuses on three terms, all of which I include in my specification:

Partial Irreversibilities: Labor partial irreversibility, labelled $C^P_L$, derives from per capita hiring, training, and firing costs, and is denominated as a fraction of annual wages (at the standard working week). For simplicity I assume these costs apply equally to gross hiring and gross firing of workers.20 Capital partial irreversibilities arise from resale losses due to transactions costs, the market for lemons phenomenon, and the physical costs of resale. The resale loss of capital is labelled $C^P_K$ and is denominated as a fraction of the relative purchase price of capital.

Fixed Disruption Costs: When new workers are added into the production process and new capital is installed, there may be a fixed loss of output. For example, adding workers may require fixed costs of advertising, interviewing, and training, or the factory may need to close for a few days while a capital refit is occurring. I model these fixed costs as $C^F_L$ and $C^F_K$ for hiring/firing and investment, respectively, both denominated as fractions of annual sales.

20 Micro data evidence, for example, Davis and Haltiwanger (1992), suggests both gross and net hiring/firing costs may be present. For analytical simplicity I have restricted the model to gross costs, noting that net costs could also be introduced and estimated in future research through the addition of two net firing cost parameters.
Quadratic Adjustment Costs: The costs of hiring/firing and investment may also be related to the rate of adjustment due to higher costs for more rapid changes, where $C^Q_L L (E/L)^2$ are the quadratic hiring/firing costs and E denotes gross hiring/firing, and $C^Q_K K (I/K)^2$ are the quadratic investment costs and I denotes gross investment.

The combination of all adjustment costs is given by the adjustment cost function

$C(A, K, L, H, I, E, p_{K,t}) = 52\,w(40)\,C^P_L (E^+ + E^-) + \bigl(I^+ - (1 - C^P_K) I^-\bigr) + \bigl(C^F_L 1_{\{E \neq 0\}} + C^F_K 1_{\{I \neq 0\}}\bigr) S(A, K, L, H) + C^Q_L L \left(\frac{E}{L}\right)^2 + C^Q_K K \left(\frac{I}{K}\right)^2,$

where $E^+$ ($I^+$) and $E^-$ ($I^-$) are the absolute values of positive and negative hiring (investment), respectively, and $1_{\{E \neq 0\}}$ and $1_{\{I \neq 0\}}$ are indicator functions which equal 1 if true and 0 otherwise.

New labor and capital take one period to enter production due to time to build. This assumption is made to allow me to pre-optimize hours (explained in Section 3.5 below), but is unlikely to play a major role in the simulations given the monthly periodicity. At the end of each period there is labor attrition and capital depreciation proportionate to $\delta_L$ and $\delta_K$, respectively.

3.4. Dealing With Cross-Sectional and Time Aggregation

Gross hiring and investment is typically lumpy with frequent zeros in single-plant establishment-level data, but much smoother and continuous in multiplant establishment and firm-level data. This appears to be because of extensive aggregation across two dimensions: cross-sectional aggregation across types of capital and production plants; and temporal aggregation across higher-frequency periods within each year. I build this aggregation into the model by explicitly assuming that firms own a large number of production units and that these operate at a higher frequency than yearly. The units can be thought of as different production plants, different geographic or product markets, or different divisions within the same firm.

To solve this model I need to define the relationship between production units within the firm. This requires several simplifying assumptions to ensure analytical tractability. These are not attractive, but are necessary to enable me to derive numerical results and incorporate aggregation into the model. In doing this I follow the general stochastic aggregation approach of Bertola and Caballero (1994) and Caballero and Engel (1999) in modelling macro and industry investment, respectively, and most specifically Abel and Eberly (2002) in modelling firm-level investment.
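Returning to the adjustment-cost function C(·) defined above, a minimal Python sketch is given below. The parameter defaults are placeholders, not the estimates reported later in the paper, and the function illustrates the three cost components rather than reproducing the paper's code.

```python
def adjustment_cost(S, K, L, I, E, weekly_wage,
                    C_LP=0.02, C_KP=0.30, C_LF=0.01, C_KF=0.01,
                    C_LQ=0.0, C_KQ=0.0):
    """Adjustment costs for one unit in one period.  S is unit revenue, K and
    L the capital and labor stocks, I and E gross investment and gross
    hiring/firing, and weekly_wage the wage at the standard 40-hour week.
    All cost parameters are hypothetical placeholder values."""
    E_plus, E_minus = max(E, 0.0), max(-E, 0.0)
    I_plus, I_minus = max(I, 0.0), max(-I, 0.0)

    # Partial irreversibilities: per-head hiring/firing costs as a fraction of
    # annual wages (52 weeks), plus the resale loss on disinvestment.
    cost = 52 * weekly_wage * C_LP * (E_plus + E_minus)
    cost += I_plus - (1 - C_KP) * I_minus

    # Fixed disruption costs: a fraction of revenue lost whenever any
    # hiring/firing or any investment takes place.
    cost += (C_LF * (E != 0) + C_KF * (I != 0)) * S

    # Quadratic adjustment costs in the rates of hiring and investment.
    cost += C_LQ * L * (E / L) ** 2 + C_KQ * K * (I / K) ** 2
    return cost
```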
Production units are assumed to independently optimize to determine investment and employment. Thus, all linkages across units within the same firm are modelled by the common shocks to demand, uncertainty, or the price of capital. So, to the extent that units are linked over and above these common shocks, the implicit assumption is that they independently optimize due to bounded rationality and/or localized incentive mechanisms (i.e., managers being assessed only on their own unit's profit and loss account).21

21 The semi-independent operation of plants may be theoretically optimal for incentive reasons (to motivate local managers) and technical reasons (the complexity of centralized information gathering and processing). The empirical evidence on decentralization in U.S. firms suggests that plant managers have substantial hiring and investment discretion (see, for example, Bloom and Van Reenen (2007) and Bloom, Sadun, and Van Reenen (2008)).

In the simulation the number of units per firm is set at 250. This number was obtained from two pieces of analysis. First, I estimated the number of production units in my Compustat firms. To do this I started with the work of Davis et al. (2006), showing that Compustat firms with 500+ employees (in the Census Bureau data) have on average 185 establishments each in the United States.22 Their sample is similar to mine, which has Compustat firms with 500+ employees and $10m+ of sales (details in Section 5.4). I then used the results of Bloom, Schankerman, and Van Reenen (2007), who used Bureau Van Dijk data to show that for a sample of 715 Compustat firms, 61.5% of their subsidiaries are located overseas. Again, their sample is similar to mine, having a median of 3839 employees compared to 3450 for my sample. Combining these facts suggests that—if the number of establishments per subsidiary is approximately the same overseas as in the United States—the Compustat firms in my sample should have around 480 establishments: about 185 in the United States and about 295 overseas (295 = 185 × 61.5/(100 − 61.5)).

22 I wish to thank Javier Miranda for helping with these figures.

Second, the simulation results are insensitive to the number of units once firms have 250 or more units. The reason is that with 250 units, the firm is effectively smoothed across independent unit-level shocks, so that more units do not materially change the simulation moments. Since running simulations with large numbers of units is computationally intensive, I used 250 units as a good approximation to the 480 units my firms approximately have on average. Of course this assumption on 250 units per firm will have a direct effect on the estimated adjustment costs (since aggregation and adjustment costs are both sources of smoothing) and thereby have an indirect effect on the simulation. Hence, in Section 5 I reestimate the adjustment costs, assuming instead the firm has 1 and 25 units to investigate this further.

The model also assumes no entry or exit for analytical tractability. This seems acceptable in the monthly time frame (entry/exit accounts for around 2% of employment on an annual basis), but is an important assumption to explore in future research. My intuition is that relaxing this assumption should increase
THE IMPACT OF UNCERTAINTY SHOCKS
637
the effect of uncertainty shocks, since entry and exit decisions are extremely nonconvex, although this may have some offsetting effects through the estimation of slightly “smoother” adjustment costs. There is also the issue of time-series aggregation. Shocks and decisions in a typical business unit are likely to occur at a much higher frequency than annually, so annual data will be temporally aggregated, and I need to explicitly model this. There is little information on the frequency of decision making in firms. The anecdotal evidence suggests monthly frequencies are typical, due to the need for senior managers to schedule regular meetings, which I assume in my main results.23 Section 5.7 undertakes some robustness tests on this assumption and finds that time aggregation is actually quite important. This highlights the importance of obtaining better data on decision making frequency for future research. 3.5. Optimal Investment and Employment The optimization problem is to maximize the present discounted flow of revenues less the wage bill and adjustment costs. As noted above, each unit within the firm is assumed to optimize independently. Units are also assumed to be risk neutral to focus on the real-options effects of uncertainty. Analytical methods suggest that a unique solution to the unit’s optimization problem exists that is continuous and strictly increasing in (A K L) with an almost everywhere unique policy function.24 The model is too complex, however, to be fully solved using analytical methods, so I use numerical methods, knowing that this solution is convergent with the unique analytical solution. Given current computing power, however, I have too many state and control variables to solve the problem as stated, but the optimization problem can be substantially simplified in two steps. First, hours are a flexible factor of production and depend only on the variables (A K L), which are predetermined in period t given the time to build assumption. Therefore, hours can be optimized out in a prior step, which reduces the control space by one dimension. Second, the revenue function, adjustment cost function, depreciation schedules, and demand processes are all jointly homogenous of degree 1 in (A K L), allowing the whole problem to be normalized by one state variable, reducing the state space by one dimension.25 I normalize by capital to operate on A and KL . K These two steps dramatically speed up the numerical simulation, which is run 23
Note that even if shocks continuously hit the firm, if decision making only happens monthly, then there is no loss of generality from assuming a monthly shock process. 24 The application of Stokey and Lucas (1989) for the continuous, concave, and almost surely bounded normalized returns and cost function in (3.7) for quadratic adjustment costs and partial irreversibilities, and Caballero and Leahy (1996) for the extension to fixed costs. 25 The key to this homogeneity result is the random-walk assumption on the demand process. Adjustment costs and depreciation are naturally scaled by unit size, since otherwise units would “outgrow” adjustment costs and depreciation. The demand function is homogeneous through the 1−1/ε B1/ε . trivial renormalization A1−a−b = A
638
NICHOLAS BLOOM
on a state space of ( A L σ μ), making numerical estimation feasible. AppenK K dix B contains a description of the numerical solution method. The Bellman equation of the optimization problem before simplification (dropping the unit subscripts) can be stated as V (At Kt Lt σt μt ) = max S(At Kt Lt Ht ) − C(At Kt Lt Ht It Et ) − w(Ht )Lt It Et Ht
1 E V (At+1 Kt (1 − δK ) + It Lt (1 − δL ) 1+r
+ Et σt+1 μt+1 )
+
where r is the discount rate and E[·] is the expectation operator. Optimizing over hours and exploiting the homogeneity in (A K L) to take out a factor of Kt enables this to be rewritten as Q(at lt σt μt ) = max S ∗ (at lt ) − C ∗ (at lt it lt et ) (3.7) it et
1 − δ K + it + E[Q(at+1 lt+1 σt+1 μt+1 )] 1+r where the normalized variables are lt = Lt /Kt , at = At /Kt , it = It /Kt , and et = Et /Lt , S ∗ (at lt ) and C ∗ (at lt it lt et ) are sales and costs after optimization over hours, and Q(at lt σt μt ) = V (at 1 lt σt μt ), which is Tobin’s Q. 4. THE MODEL’S SOLUTION AND SIMULATING AN UNCERTAINTY SHOCK In this section I present the main results of the model and the uncertainty simulations. I do this before detailing the parameter values to enable readers to get to the main results more quickly. I list all the parameter values in Tables II and III and discuss how I obtained them in Section 5. Simulation parameter robustness tests can be found in Section 6. 4.1. The Model’s Solution A ) space, due to the nonThe model yields a central region of inaction in ( A K L convex costs of adjustment. Units only hire and invest when business conditions are sufficiently good, and only fire and disinvest when they are sufficiently bad. When uncertainty is higher, these thresholds move out: units become more cautious in responding to business conditions. A ) space the valTo provide some graphical intuition, Figure 4 plots in ( A K L ues of the fire and hire thresholds (left and right lines) and the sell and buy
THE IMPACT OF UNCERTAINTY SHOCKS
639
TABLE II PREDEFINED PARAMETERS IN THE MODEL Parameter
Value
α ε
1/3 4
w1 w2
08 24e–9
σH
2 × σL
σ πLH
1/36
σ πHH
071
(μH + μL )/2
002
μ
πLH δK δL
μ
πHL 01 01
r
65%
N
250
Rationale (Also See the Text)
Capital share in output is one-third, labor share is two-thirds. 33% markup. With constant returns to scale yields a + b = 075. I also try a 20% markup to yield a + b = 0833. Hourly wages minimized at a 40 hour week. Arbitrary scaling parameter. Set so the wage bill equals unity at 40 hours. Uncertainty shocks 2× baseline uncertainty (Figure 1 data). σL estimated. I also try 1.5× and 3× baseline shocks. Uncertainty shocks expected every 3 years (17 shocks in 46 years in Figure 1). Average 2-month half-life of an uncertainty shock (Figure 1 data). I also try 1 and 6 month half-lives. Average real growth rate equals 2% per year. The spread μH − μL is estimated. Firm-level demand growth transition matrix assumed symmetric. μ The parameter πHL estimated. Capital depreciation rate assumed 10% per year. Labor attrition assumed 10% for numerical speed (since δL = δK ). I also try δL = 02. Long-run average value for U.S. firm-level discount rate (King and Rebello (1999)). Firms operate 250 units, chosen to match data on establishments per firm. I also try N = 25 and N = 1.
capital thresholds (top and bottom lines) for low uncertainty (σL ) and the preferred parameter estimates in Table III column All. The inner region is the region of inaction (i = 0 and e = 0), where the real-option value of waiting is worth more than the returns to investment and/or hiring. Outside the region of inaction, investment and hiring will be taking place according to the optimal values of i and e. This diagram is a two-dimensional (two-factor) version of the one-dimensional investment models of Abel and Eberly (1996) and Caballero and Leahy (1996). The gap between the investment/disinvestment thresholds is higher than between the hire/fire thresholds due to the higher adjustment costs of capital. Figure 5 displays the same lines for both low uncertainty (the inner box of lines) and high uncertainty (the outer box of lines). It can be seen that the comparative static intuition that higher uncertainty increases real options is confirmed here, suggesting that large changes in σt can have an important impact on investment and hiring behavior. To quantify the impact of these real-option values I ran the thought experiment of calculating what temporary fall in wages and interest rates would be required to keep units hiring and investment thresholds unchanged when un-
640
NICHOLAS BLOOM
FIGURE 4.—Hiring/firing and investment/disinvestment thresholds. Simulated thresholds using the adjustment cost estimates from the column All in Table III. Although the optimal policies are of the (s S) type, it cannot be proven that this is always the case.
certainty temporarily rises from σL to σH . The required wage and interest rate falls turn out to be quantitatively large: units would need a 25% reduction in wages in periods of high uncertainty to leave their marginal hiring decisions unchanged and a 7% (700 basis point) reduction in the interest rates in periods of high uncertainty to leave their marginal investment decisions unchanged.26 The reason this uncertainty effect is so large is that labor and capital adjustment costs lead units to be cautious about hiring and investing. It is expensive to hire and then rapidly fire a worker or to buy a piece of equipment and then quickly resell it. So when uncertainty is high, units optimally postpone hiring and investment decisions for a few months until business conditions become clearer. 26
This can be graphically seen in supplemental material Figure S2, which plots the low and high uncertainty thresholds, but with the change that when σt = σH , interest rates are 7 percentage points lower and wage rates are 25% lower than when σt = σL .
THE IMPACT OF UNCERTAINTY SHOCKS
641
FIGURE 5.—Thresholds at low and high uncertainty. Simulated thresholds using the adjustment cost estimates from the column All in Table III. High uncertainty is twice the value of low uncertainty (σH = 2 × σL ).
Interestingly, recomputing these thresholds with permanent (time invariant) differences in uncertainty results in an even stronger impact on the investment and employment thresholds. So the standard comparative static result on changes in uncertainty will tend to overpredict the expected impact of time changing uncertainty. The reason is that units evaluate the uncertainty of their discounted value of marginal returns over the lifetime of an investment or hire, so high current uncertainty only matters to the extent that it drives up long-run uncertainty. When uncertainty is mean reverting, high current values have a lower impact on expected long-run values than if uncertainty were constant. Figure 6 shows a one-dimensional cut of Figure 4 (using the same x-axis), with the level of hiring/firing (solid line, left y-axis) and cross-sectional density of units (dashed line, right y-axis) plotted. These are drawn for one illustrative set of parameters: baseline uncertainty (σL ), high demand growth (μH ), and
642
NICHOLAS BLOOM
FIGURE 6.—The distribution of units between the hiring and firing thresholds. The hiring response (solid line) and unit-level density (dashed line) for low uncertainty (σL ), high drift (μH ), and the most common capital/labor (K/L) ratio. The distribution of units in (A/L) space is skewed to the right because productivity growth generates an upward drift in A and attrition generates a downward drift in L. The density peaks internally because of lumpy hiring due to fixed costs.
the modal value of capital/labor.27 Three things stand out: first, the distribution is skewed to the right due to positive demand growth and labor attrition; second, the density just below the hiring threshold is low because whenever the unit hits the hiring threshold, it undertakes a burst of activity (due to hiring fixed costs) that moves it to the interior of the space; and third, the density peaks at the interior which reflects the level of hiring that is optimally undertaken at the hiring threshold. 4.2. The Simulation Outline The simulation models the impact of a large, but temporary, rise in the variance of business-conditions (productivity and demand) growth. This secondmoment shock generates a rapid drop in hiring, investment, and productivity 27 Figure 6 is actually a 45◦ cut across Figure 4. The reason is Figure 6 holds K/L constant while allowing A to vary.
THE IMPACT OF UNCERTAINTY SHOCKS
643
growth as units become much more cautious due to the rise in uncertainty. Once the uncertainty shock passes, however, activity bounces back as units clear their pent-up demand for labor and capital. This also leads to a drop and rebound in productivity growth, since the temporary pause in activity slows down the reallocation of labor and capital from low to high productivity units. In the medium term this burst of volatility generates an overshoot in activity due to the convexity of hiring and investment in business conditions. Of course this is a stylized simulation, since other factors also typically change around major shocks. Some of these factors can and will be added to the simulation, for example allowing for a simultaneous negative shock to the first moment. I start by focusing on a second-moment shock only, however, to isolate the pure uncertainty effects and demonstrate that these alone are capable of generating large short-run fluctuations. I then discuss the robustness of this analysis to price changes from general equilibrium effects and a combined first- and second-moment shock. In Section 6 I also show robustness to a range of different parameter values, including adjustment costs and the stochastic process for uncertainty. 4.3. The Baseline Simulation I simulate an economy of 1000 units (four firms) for 15 years at a monthly frequency. This simulation is then repeated 25,000 times, with the values for labor, capital, output, and productivity averaged over all these runs. In each simulation the model is hit with an uncertainty shock in month 1 of year 11, defined as σt = σH in equation (3.5). All other micro and macro shocks are randomly drawn as per Section 3. This generates the average impact of an uncertainty shock, where the average is taken over the distribution of micro and macro shocks. There are both fixed cost and partial irreversibility adjustment costs for labor and capital, which are estimated from Compustat data as explained in Section 5 (in particular see the All column in Table III). Before presenting the simulation results, it is worth first showing the precise impulse that will drive the results. Figure 7a reports the average value of σt normalized to unity before the shock. It is plotted on a monthly basis, with the month normalized to zero on the date of the shock. Three things are clear from Figure 7a: first, the uncertainty shock generates a sharp spike in the average σt across the 25,000 simulations; second, this dies off rapidly with a half-life of 2 months; and third, the shock almost doubles average σt (the rise is less than 100% because some of the 25,000 simulations already had σt = σH when the shock occurred). In Figure 7b I show the average time path of business conditions (Ajt ), showing that the uncertainty shock has no first-moment effect. In Figure 8, I plot aggregate detrended labor, again normalized to 1 at the month before the shock. This displays a substantial fall in the 6 months immediately after the uncertainty shock and then overshoots from month 8 onward, eventually returning to level by around 3 years.
644
NICHOLAS BLOOM
FIGURE 7A.—The simulation has a large second-moment shock.
The initial drop occurs because the rise in uncertainty increases the realoption value of inaction, leading the majority of units to temporarily freeze
FIGURE 7B.—The simulation has no first moment shock.
THE IMPACT OF UNCERTAINTY SHOCKS
645
FIGURE 8.—Aggregate (detrended) labor drops, rebounds, and overshoots. The aggregate figures for Lt are calculated by summing across all units within the simulation.
hiring. Because of the ongoing exogenous attrition of workers, this generates a fall in net employment. Endogenizing quits would of course reduce the impact of these shocks, since the quit rate would presumably fall after a shock. In the model, to offset this I have conservatively assumed a 10% annual quit rate—well below the 15% to 25% quit rate observed over the business cycle in recent Job Openings and Labor Turnover Survey (JOLTS) data (see Davis, Faberman, and Haltiwanger (2006)). This low fixed quit rate could be thought of as the exogenous component due to retirement, maternity, sickness, family relocation, and so forth. The rebound from month 4 onward occurs because of the combination of falling uncertainty (since the shock is only temporary) and rising pent-up demand for hiring (because units paused hiring over the previous 3 months). To make up the shortfall in labor, units begin to hire at a faster pace than usual so the labor force heads back toward its trend level. This generates the rapid rebound in the total labor from month 3 until about month 6. 4.4. The Volatility Overshoot One seemingly puzzling phenomenon, however, is the overshoot from month 7 onward. Pure real-options effects of uncertainty should generate a drop and overshoot in the growth rate of labor (that is the hiring rate), but only a drop and convergence back to trend in the level of the labor force. So the ques-
646
NICHOLAS BLOOM
tion is what is causing this medium-term overshoot in the level of the labor force? This medium-term overshoot arises because the increased volatility of business conditions leads more units to hit both the hiring and firing thresholds. Since more units are clustered around the hiring threshold than the firing threshold, due to labor attrition and business-conditions growth (see Figure 6), this leads to a medium-term burst of net hiring. In effect hiring is convex in productivity just below the hiring threshold—units that receive a small positive shock hire and units that receive a small negative shock do not respond. So total hiring rises in the medium term with the increased volatility of productivity growth. Of course once units have undertaken a burst of hiring, they jump to the interior of the region of inaction and so do not hire again for some time. So in the long-run this results in labor falling back to its long-run trend path. I label this phenomenon the volatility overshoot, since this mediumterm hiring boom is induced by the higher unit-level volatility of businessconditions.28 Thus, the effect of a rise in σt is twofold. First, the real-options effect from increased uncertainty over future business conditions causes an initial drop in activity as units pause investment and hiring. This happens rapidly since expectations change upon impact of the uncertainty shock, so that hiring and investment instantly freeze. Second, the effect from increased volatility of realized business conditions causes a medium-term hiring boom. This takes more time to occur because it is driven by the rise in the realized volatility of productivity growth. This rise in volatility accrues over several months. Thus, the uncertainty drop always precedes the volatility overshoot. These distinct uncertainty and volatility effects are shown in Figure 9. This splits out the expectational effects of higher σt from the realized volatility effects of higher σt . These simulations are shown for 36 months after the shock to highlight the long-run differences between these effects.29 The uncertainty effect is simulated by allowing unit expectations over σt to change after the shock (as in the baseline) but holds the variance of the actual draw of shocks constant. 28 This initially appears similar to the type of “echo effect” that appears in demand for products like cars in response to demand shocks, but these echo effects are actually quite distinct from a volatility overshoot. In the echo effect case, what arises is a lumpy response to a first-moment shock. The fixed costs of adjustment lead to a burst of investment, with subsequent future bursts (echo effects). This can arise in models with one representative agent and perfect certainty, but requires lump-sum adjustment costs. The volatility overshoot in this paper arises from time variation is the cross-sectional distribution, leading to an initial overshoot and then a gradual return to trend. This can arise in a model with no lump-sum adjustment costs (for example, partial irreversibility is sufficient), but it does require a cross section of agents and time variation in the second moment. 29 In general, I plot response for the first 12 months due to the partial equilibrium nature of the analysis, unless longer-run plots are expositionally helpful.
THE IMPACT OF UNCERTAINTY SHOCKS
647
FIGURE 9.—Separating out the uncertainty and volatility effects. The baseline plot is the same as in Figure 8 but extended out for 36 months. For the volatility effect only plot, firms have expectations set to σt = σL in all periods (i.e., uncertainty effects are turned off), while in the uncertainty effect only, they have the actual shocks drawn from a distribution σt = σL in all periods (i.e., the volatility effects are turned off).
This generates a drop and rebound back to levels, but no volatility overshoot. The volatility effect is simulated by holding unit expectations over σt constant but allowing the realized volatility of the business conditions to change after the shock (as in the baseline). This generates a volatility overshoot, but no initial drop in activity from a pause in hiring.30 The baseline figure in the graph is simply the aggregate detrended labor (as in Figure 8). This suggests that uncertainty and volatility have very different effects on economic activity, despite often being driven by the same underlying phenomena. The response of aggregate capital to the uncertainty shock is similar to labor. There is a short-run drop in capital as units postpone investing, followed by a rebound as they address their pent-up demand for investment, and a subsequent volatility driven overshoot (see supplementary material Figure S3).
30 In the figure, the volatility effects also take 1 extra month to begin. This is simply because of the standard finance timing assumption in (3.2) that σt−1 drives the volatility of Ajt . Allowing volatility to be driven by σt delivers similar results because in the short run the uncertainty effect of moving out the hiring and investment thresholds dominates.
648
NICHOLAS BLOOM
4.5. Why Uncertainty Reduces Productivity Growth Figure 10a plots the time series for the growth of aggregate productivity, defined as j Ajt Ljt , where the sum is taken over all j production units in the economy in month t. In this calculation the growth of business conditions (Ajt ) can be used as a proxy for the growth of productivity under the assumption that shocks to demand are small in comparison to productivity (or that the shocks are independent). Following Baily, Hulten, and Campbell (1992), I define three indices as follows31 : Ajt Ljt − Ajt−1 Ljt−1 Ajt−1 Ljt−1
aggregate productivity growth
=
(Ajt − Ajt−1 )Ljt−1 Ajt (Ljt − Ljt−1 ) + Ajt−1 Ljt−1 Ajt−1 Ljt−1
within productivity growth
reallocation productivity growth
The first term, aggregate productivity growth, is the increase in productivity weighted by employment across units. This can be broken down into two subterms: within productivity growth, which measures the productivity increase within each production unit (holding the employment of each unit constant), and reallocation productivity growth, which measures the reallocation of employment from low to high productivity units (holding the productivity of each unit constant). In Figure 10a aggregate productivity growth shows a large fall after the uncertainty shock. The reason is that uncertainty reduces the shrinkage of low productivity units and the expansion of high productivity units, reducing the reallocation of resources toward more productive units.32 This reallocation from low to high productivity units drives the majority of productivity growth in the model so that higher uncertainty has a first-order effect on productivity growth. This is clear from the decomposition which shows that the fall in total is entirely driven by the fall in the reallocation term. The within term is constant since, by assumption, the first moment of the demand conditions shocks is unchanged.33 31 Strictly speaking, Bailey, Hulten, and Campbell (1992) defined four terms, but for simplicity I have combined the between and cross terms into a reallocation term. 32 Formally there is no reallocation in the model because it is partial equilibrium. However, with the large distribution of contracting and expanding units all experiencing independent shocks, gross changes in unit factor demand are far larger than net changes, with the difference equivalent to reallocation. 33 These plots are not completely smooth because the terms are summations of functions which are approximately squared in Ajt . For example Ajt Ljt ≈ λA2jt for some scalar λ since Lit is ap-
THE IMPACT OF UNCERTAINTY SHOCKS
649
FIGURE 10A.—Aggregate productivity growth falls and rebounds after the shock.
FIGURE 10B.—Unit level productivity and hiring for the period before the uncertainty shock.
650
NICHOLAS BLOOM
FIGURE 10C.—Unit level productivity and hiring in the period after the uncertainty shock.
In the bottom two panels this reallocative effect is illustrated by two unit-level scatter plots of gross hiring against log productivity in the month before the shock (Figure 10b) and the month after the shock (Figure 10c). It can be seen that after the shock much less reallocative activity takes place with a substantially lower fraction of expanding productive units and shrinking unproductive units. Since actual U.S. aggregate productivity growth appears to be 70% to 80% driven by reallocation,34 these uncertainty effects should play an important role in the real impact of large uncertainty shocks. Figure 11 plots the level of an alternative productivity measure—Solow productivity. This is defined as aggregate output divided by factor share weighted aggregate inputs: Solow aggregate productivity 1/(ε−1) α Ajt Kjt (Ljt × Hjt )1−α =
j
α
j
Kjt + (1 − α)
Ljt × Hjt
j
proximately linear in Ajt . Combined with the random-walk nature of the driving process (which means some individual units grow very large), this results in lumpy aggregate productivity growth even in very large samples of units. 34 Foster, Haltiwanger, and Krizan (2000, 2006) reported that reallocation, broadly defined to include entry and exit, accounts for around 50% of manufacturing and 90% of retail productivity growth. These figures will in fact underestimate the full contribution of reallocation since they miss the within establishment reallocation, which Bernard, Redding, and Schott’s (2006) results on product switching suggests could be substantial.
THE IMPACT OF UNCERTAINTY SHOCKS
651
FIGURE 11.—Solow aggregate productivity (detrended) drops, rebounds, and overshoots. Solow productivity is defined as aggregate output divided by the factor share weighted aggregate inputs.
I report this series because macro productivity measures are typically calculated in this way using only macro data (note that the previous aggregate productivity measure would require micro data to calculate). As can be seen in Figure 11, the detrended Solow productivity series also falls and rebounds after the uncertainty shock. Again, this initial drop and rebound is because of the initial pause and subsequent catch-up in the level of reallocation across units immediately after the uncertainty shock. The medium-run overshoot is again due to the increased level of cross-sectional volatility, which increases the potential for reallocation, leading to higher aggregate medium-term productivity growth. Finally, Figure 12 plots the effects of an uncertainty shock on output. This shows a clear drop, rebound, and overshoot, very similar to the behavior of the labor, capital, and productivity. What is striking about Figure 12 is the similarity of the size, duration, and time profile of the simulated response to an uncertainty shock compared to the VAR results on actual data shown in Figure 2. In particular, both the simulated and actual data show a drop of detrended activity of around 1% to 2% after about 3 months, a return to trend at around 6 months, and a longer-run gradual overshoot.
652
NICHOLAS BLOOM
FIGURE 12.—Aggregate (detrended) output drops, rebounds, and overshoots.
4.6. Investigating Robustness to General Equilibrium Ideally I would set up my model within a general equilibrium (GE) framework, allowing prices to endogenously change. This could be done, for example, by assuming agents approximate the cross-sectional distribution of units within the economy using a finite set of moments and then using these moments in a representative consumer framework to compute a recursive competitive equilibrium (see, for example, Krusell and Smith (1998), Khan and Thomas (2003), and Bachman, Caballero, and Engel (2008)). However, this would involve another loop in the routine to match the labor, capital, and output markets between units and the consumer, making the program too slow to then loop in the simulated method of moments estimation routine. Hence, there is a trade-off between two options: (i) a GE model with flexible prices but assumed adjustment costs35 and (ii) estimated adjustment costs but in a fixed price model. Since the effects of uncertainty are sensitive to the nature of adjustment costs, I chose to take the second option and leave GE analysis to future work. 35
Unfortunately there are no “off the shelf” adjustment cost estimates that can be used, since no paper has previously jointly estimated convex and nonconvex labor and capital adjustment costs.
THE IMPACT OF UNCERTAINTY SHOCKS
653
This means the results in this model could be compromised by GE effects if factor prices changed sufficiently to counteract factor demand changes.36 One way to investigate this is to estimate the actual changes in wages, prices, and interest rates that arise after a stock-market volatility shock and feed them into the model in an expectations consistent way. If these empirically plausible changes in factor prices radically changed these results, this would suggest they are not robust to GE, while if they have only a small impact, it is more reassuring on GE robustness. To do this I use the estimated changes in factor prices from the VAR (see Section 2.2), which are plotted in Figure 13. An uncertainty shock leads to a short-run drop and rebound of interest rates of up to 1.1% points (110 basis point), of prices of up to 0.5%, and of wages of up to 0.3%. I take these numbers and structurally build them into the model so that when σt = σH , interest
FIGURE 13.—VAR estimation of the impact of a volatility shock on prices. Notes: VAR Cholesky orthogonalized impulse response functions to a volatility shock. Estimated monthly from June 1962 to June 2008. Impact on the Federal Funds rate is plotted as a percentage point change so the shock reduces rates by up to 110 basis points. The impact on the CPI and wages is plotted as percentage change.
36 Khan and Thomas (2008) found in their micro to macro investment model that with GE, the response of the economy to productivity shocks is not influenced by the presence of nonconvex adjustment costs. With a slight abuse of notation this can be characterized as (∂(∂Kt /∂At ))/∂NC ≈ 0, where Kt is aggregate capital, At is aggregate productivity, and NC are nonconvex adjustment costs. The focus of my paper on the direct impact of uncertainty on aggregate variables is different and can be characterized instead as ∂Kt /∂σt . Thus, their results are not necessarily inconsistent with mine. More recent work by Bachman, Caballero, and Engel (2008), found their results depend on the choice of parameter values. Sim (2006) built a GE model with capital adjustment costs and time-varying uncertainty and found that the impact of temporary increases in uncertainty on investment is robust to GE effects.
654
NICHOLAS BLOOM
FIGURE 14.—Aggregate (detrended) output: partial equilibrium and pseudo GE. Pseudo-GE allows interest rates, wages, and prices to be 1.1% points, 0.5%, and 0.3%, respectively, lower during periods of high uncertainty.
rates are 1.1% lower, prices (of output and capital) are 0.5% lower, and wages are 0.3% lower. Units expect this to occur, so expectations are rational. In Figure 14, I plot the level of output after an uncertainty shock with and without these “pseudo-GE” prices changes. This reveals two surprising outcomes: first, the effects of these empirically reasonable changes in interest rates, prices, and wages have very little impact on output in the immediate aftermath of an uncertainty shock; and second, the limited pseudo-GE effects that do occur are greatest at around 3–5 months, when the level of uncertainty (and so the level of the interest rate, price, and wage reductions) is much smaller. To highlight the surprising nature of these two findings, Figure S4 (supplemental material) plots the impact of the pseudo-GE price effects on capital, labor, and output in a simulation without adjustment costs. In the absence of any adjustment costs, these interest rate, prices, and wages changes do have an extremely large effect. So the introduction of adjustment costs both dampens and delays the response of the economy to the pseudo-GE price changes. The reason for this limited impact of pseudo-GE price changes is that after an uncertainty shock occurs, the hiring/firing and investment/disinvestment thresholds jump out, as shown in Figure 5. As a result there are no units near any of the response thresholds. This makes the economy insensitive to changes in interest rates, prices, or wages. The only way to get an impact would be to shift the thresholds back to the original low uncertainty position where the ma-
THE IMPACT OF UNCERTAINTY SHOCKS
655
jority of units are located, but as noted in Section 4.1 the quantitative impact of these uncertainty shocks is equivalent to something like a 7% (700 basis point) higher interest rate and a 25% higher wage rate, so these pseudo-GE price reductions of 1.1% in interest rates, 0.5% in prices, and 0.3% in wages are not sufficient to do this. Of course once the level of uncertainty starts to fall back again, the hiring/firing and investment/disinvestment thresholds begin to move back toward their low uncertainty values. This means they start to move back toward the region in (A/K) and (A/L) space where the units are located, so the economy becomes more sensitive to changes in interest rates, prices, and wages. Thus, these pseudo-GE price effects start to play a role, but only with a lag. In summary, the rise in uncertainty not only reduces levels of labor, capital, productivity, and output, but it also makes the economy temporarily extremely insensitive to changes in factor prices. This is the macro equivalent to the “cautionary effects” of uncertainty demonstrated on firm-level panel data by Bloom, Bond, and Van Reenen (2007). For policymakers this is important since it suggests a monetary or fiscal response to an uncertainty shock is likely to have almost no impact in the immediate aftermath of a shock. But as uncertainty falls back down and the economy rebounds, it will become more responsive, so any response to policy will occur with a lag. Hence, a policymaker trying, for example, to cut interest rates to counteract the fall in output after an uncertainty shock would find no immediate response, but rather a delayed response when the economy was already starting to recover. This cautions against using first-moment policy levers to respond to the second-moment component of shocks; policies aimed directly at reducing the underlying increase in uncertainty are likely to be far more effective. 4.7. A Combined First- and Second-Moment Shock All the large macro shocks highlighted in Figure 1 comprise both a firstand a second-moment element, suggesting a more realistic simulation would analyze these together. This is undertaken in Figure 15, where the output response to a pure second-moment shock (from Figure 12) is plotted alongside the output response to the same second-moment shock with an additional first-moment shock of −2% to business conditions.37 Adding an additional first-moment shock leaves the main character of the second-moment shock unchanged—a large drop and rebound. Interestingly, a first-moment shock on its own shows the type of slow response dynamics that the real data display (see, for example, the response to a 37 I choose −2% because this is equivalent to 1 year of business-conditions growth in the model. Larger or smaller shocks yield a proportionally larger or smaller impact.
656
NICHOLAS BLOOM
FIGURE 15.—Combined first- and second-moment shocks. The second-moment shock only has σt set to σH . The first- and second-moment shock has σt set to σH and also a −2% macro business-conditions shock. The first-moment shock only just has a −2% macro business-conditions shock.
monetary shock in Figure 3). This is because the cross-sectional distribution of units generates a dynamic response to shocks.38 This rapid drop and rebound in response to a second-moment shock is clearly very different from the persistent drop over several quarters in response to a more traditional first-moment shock. Thus, to the extent a large shock is more a second-moment phenomenon—for example, 9/11—the response is likely to involve a rapid drop and rebound, while to the extent it is more a first-moment phenomenon—for example, OPEC II—it is likely to generate a persistent slowdown. However, in the immediate aftermath of these shocks, distinguishing them will be difficult, as both the first- and second-moment components will generate an immediate drop in employment, investment, and productivity. The analysis in Section 2.1 suggests, however, there are empirical proxies for uncertainty that are available in real time to aid policymakers, such as the VXO series for implied volatility (see notes to Figure 1), the cross-sectional spread of stock-market returns, and the cross-sectional spread of professional forecasters. Of course these first- and second-moment shocks differ both in terms of the moments they impact and in terms of their duration: permanent and tempo38 See the earlier work on this by, for example, Caballero and Engel (1993) and Bertola and Caballero (1994).
THE IMPACT OF UNCERTAINTY SHOCKS
657
rary, respectively. The reason is that the second-moment component of shocks is almost always temporary while the first-moment component tends to be persistent. For completeness a persistent second-moment shock would generate a similar effect on investment and employment as a persistent first-moment shock, but would generate a slowdown in productivity growth through the reallocation term rather than a one-time reduction in productivity levels through the within term. Thus, the temporary/permanent distinction is important for the predicted time profile of the impact of the shocks on hiring and investment, and the first-/second-moment distinction is important for the route through which these shocks impact productivity. The only historical example of a persistent second-moment shock was the Great Depression, when uncertainty—as measured by share-returns volatility—rose to an incredible 130% of 9/11 levels on average for the 4 years of 1929 to 1932. While this type of event is unsuitable for analysis using my model given the lack of general equilibrium effects and the range of other factors at work, the broad predictions do seem to match up with the evidence. Romer (1990) argued that uncertainty played an important real-options role in reducing output during the onset of the Great Depression, while Ohanian (2001) and Bresnahan and Raff (1991) reported “inexplicably” low levels of productivity growth with an “odd” lack of output reallocation over this period. 5. ESTIMATING THE MODEL PARAMETERS This section explains how the individual parameter values used to solve the model and to simulate uncertainty shocks in the previous section were obtained. Readers who are focused on the simulation may want to skip to Section 6. The full set of parameters is the vector θ that characterizes the firm’s revenue function, stochastic processes, adjustment costs, and discount rate. The econometric problem consists of estimating this parameter vector θ. Since the model has no analytical closed form solution, this vector cannot be estimated using standard regression techniques. Instead estimation of the parameters is achieved by simulated method of moments (SMM), which minimizes a distance criterion between key moments from actual data (a panel of publicly traded firms from Compustat) and simulated data. Because SMM is computationally intensive, only 10 parameters can be estimated; the remaining 13 are predefined. 5.1. Simulated Method of Moments SMM proceeds as follows: a set of actual data moments Ψ A is selected for the model to match.39 For an arbitrary value of θ the dynamic program is solved 39 See McFadden (1989) and Pakes and Pollard (1989) for the statistical properties of the SMM estimator.
658
NICHOLAS BLOOM
and the policy functions are generated. These policy functions are used to create a simulated data panel of size (κN T + 10), where κ is a strictly positive integer, N is the number of firms in the actual data, and T is the time dimension of the actual data. The first 10 years are discarded so as to start from the ergodic distribution. The simulated moments Ψ S (θ) are then calculated on the remaining simulated data panel, along with an associated criterion function Γ (θ), where Γ (θ) = [Ψ A − Ψ S (θ)] W [Ψ A − Ψ S (θ)], which is a weighted distance between the simulated moments Ψ S (θ) and the actual moments Ψ A . The parameter estimate θ is then derived by searching over the parameter space to find the parameter vector which minimizes the criterion function: (5.1)
θ = arg min[Ψ A − Ψ S (θ)] W [Ψ A − Ψ S (θ)] θ∈Θ
Given the potential for discontinuities in the model and the discretization of the state space, I use an annealing algorithm for the parameter search (see Appendix B). Different initial values of θ are selected to ensure the solution converges to the global minimum. I also run robustness tests in Section 5.7 with different initial distributions. The efficient choice for W is the inverse of the variance–covariance matrix of [Ψ A − Ψ S (θ)]. Defining Ω to be the variance–covariance matrix of the data moments, Lee and Ingram (1991) showed that under the estimating null, the variance–covariance of the simulated moments is equal to κ1 Ω. Since Ψ A and Ψ S (θ) are independent by construction, W = [(1 + κ1 )Ω]−1 , where the first term in the inner brackets represents the randomness in the actual data and the second term represents the randomness in the simulated data. Ω is calculated by block bootstrap with replacement on the actual data. The asymptotic variance of the efficient estimator θ is proportional to (1 + κ1 ). I use κ = 25, with each of these 25 firm panels having independent draws of macro shocks. This implies the standard error of θ is increased by 4% by using simulation estimation. 5.2. Predefined Parameters In principle every parameter could be estimated, but in practice the size of the estimated parameter space is limited by computational constraints. I therefore focus on the parameters about which there are probably the weakest priors—the six adjustment cost parameters, the wage/hours trade-off slope, the baseline level of uncertainty, and the two key parameters that determine the μ firm-level demand drift, Θ = (PRL FCL QCL PRK FCK QCK γ σL πHH μL ). The other 13 parameters are based on values in the data and the literature, and are displayed in Table II. The predefined parameters outlined in Table II are mostly self-explanatory, although a few require further discussion. One of these is ε, which is the elasticity of demand. In a constant returns to scale (CRS) production function setup
THE IMPACT OF UNCERTAINTY SHOCKS
659
this translates directly into the returns to scale parameter on the revenue function, a + b. There is a wide range of estimates of the revenue returns to scale, recent examples being 0.905 in Khan and Thomas (2003), 0.82 in Bachman, Caballero, and Engel (2008), and 0.592 in Cooper and Haltiwanger (2006). I chose a parameter value of ε = 4, which (under CRS) yields a + b =0.75, which is (i) roughly in the midpoint of this literature and (ii) optimal for the speed of the numerical simulation since a = 025 and b = 05 so that capital and labor have integer fraction exponentials which compute much faster.40 This implies a markup of 33%, which is toward the upper end of the range estimates for price–cost markups. I also check the robustness of my results to a parameter value of a + b = 083 (given by ε = 6 with CRS), which is consistent with a 20% markup. The uncertainty process parameters are primarily taken from the macro volatility process in Figure 1, with the baseline level of uncertainty estimated in the simulation. The labor attrition rate is chosen at 10% per annum. This low figure is selected for two reasons: (i) to be conservative in the simulations of an uncertainty shock, since attrition drives the fall in employment levels, and (ii) for numerical speed, as this matches the capital depreciation rate, so that the (L/K) dimension can be ignored if no investment and hiring/firing occur. I also report a robustness test for using an annualized labor attrition rate of 20%, which more closely matches the figures for annualized manufacturing quits in Davis, Faberman, and Haltiwanger (2006). 5.3. Estimation Under the null, a full-rank set of moments (Ψ A ) will consistently estimate the parameter of the adjustment costs (Θ).41 The choice of moments is also important for the efficiency of the estimator. This suggests that moments which are “informative” about the underlying structural parameters should be included. The basic insights of plant- and firm-level data on labor and capital is the presence of highly skewed cross-sectional growth rates and rich time-series dynamics, suggesting some combination of cross-sectional and time-series moments. Two additional issues help to guide the exact choice of moments. 5.3.1. Distinguishing the Driving Process From Adjustment Costs A key challenge in estimating adjustment costs for factor inputs is distinguishing between the dynamics of the driving process and factor adjust40 Integer fractional exponentials are more easily approximated in binary calculations (see Judd (1998, Chapter 2) for details). This is quantitatively important due to the intensity of exponential calculations in the simulation, for example, moving from a + b = 075 to a + b = 076 slows down the simulation by around 15%. Choosing a lower value of a + b also induces more curvature in the value function so that less grid points are required to map the relevant space. 41 Note that even with a full-rank set of moments the parameters are only identified pointwise.
660
NICHOLAS BLOOM
ment costs. Concentrating on the moments from only one factor—for example, capital—makes it very hard to do this. To illustrate this, first consider a very smooth driving process without adjustment costs, which would produce a smooth investment series. Alternatively consider a volatile driving process with convex capital adjustment costs, which would also produce a smooth investment series. Hence, without some additional moments (or assumptions), it would be very hard to estimate adjustment costs using just the investment series data. So I focus on the joint (cross-sectional and dynamic) moments of the investment, employment, and sales growth series. The difference in responses across the three series (investment, employment, and sales growth) identifies the two sets of adjustment costs (for capital and labor).42 5.3.2. Distinguishing Persistent Differences From Adjustment Costs A stylized fact from the estimation of firm- and plant-level investment and labor demand equations is the empirical importance of “fixed effects,” that is, persistent differences across firms and plants in their levels of investment, employment, and output growth rates. Without controls for these persistent differences, the estimates of the adjustment costs could be biased. For example, persistent between-firm differences in investment, employment, and sales growth rates due to different growth rates of demand would (in the absence of controls for this) lead to the estimation of large quadratic adjustment costs, which are necessary to induce the required firm-level autocorrelation. To control for differential firm-level growth rates, the estimator includes two parameters: the spread of firm-level business-conditions growth, μH − μL , which determines the degree of firm-level heterogeneity in the average growth rates of business conditions as defined in (3.3); and the persistence of firm-level μ business-conditions growth, πHH , as defined in (3.6). When μH − μL is large there will be large differences in the growth rates of labor, capital, and output μ across firms, and when πHH is close to unity these will be highly persistent.43 To identify these parameters separately from adjustment costs requires information on the time path of autocorrelation across the investment, employment, and sales growth series. For example, persistent correlations between investment, sales, and employment growth rates going back over many years would help to identify fixed differences in the growth rates of the driving process, while decaying correlations in the investment series only would suggest convex capital adjustment costs. 42 An alternative is a two-step estimation process in which the driving process is estimated first and then the adjustment costs are estimated given this driving process (see, for example, Cooper and Haltiwanger (2006)). μ 43 Note that with πHH = 1 these will be truly fixed effect differences.
THE IMPACT OF UNCERTAINTY SHOCKS
661
So I include moments for the second-order and fourth-order correlations of the investment, employment growth, and sales growth series.44 The secondorder autocorrelation is chosen to avoid a negative bias in these moments from underlying level measurement errors which would arise in a first-order autocorrelation measure, while the fourth-order autocorrelation is chosen to allow a sufficiently large additional time period to pass (2 years) to identify the decay in the autocorrelation series. Shorter and longer lags, like the third-order, fifth-order, and sixth-order autocorrelations could also be used, but in experimentations did not make much difference.45 5.4. Firm-Level Data There are too little data at the macroeconomic level to provide sufficient identification for the model. I therefore identify my parameters using a panel of firm-level data from U.S. Compustat. I select the 20 years of data that cover 1981 to 2000. The data were cleaned to remove major mergers and acquisitions by dropping the top and bottom 0.5% of employment growth, sales growth, and investment rates. Firms with an average of at least 500 employees and $10m sales (in 2000 prices) were kept to focus on firms which are more size homogeneous. This generated a sample of 2548 firms and 22,950 observations with mean (median) employees of 13,540 (3450) and mean (median) sales of $2247m ($495m) in 2000 prices. In selecting all Compustat firms I am conflating the parameter estimates across a range of different industries, and a strong argument can be made for running this estimation on an industry by industry basis. However, in the interests of obtaining the “average” parameters for a macro simulation and to ensure a reasonable sample size, I keep the full panel, leaving industryspecific estimation to future work. Capital stocks for firm i in industry m in year t are constructed by the perpetual inventory method46 : labor figures come from company accounts, while sales figures come from accounts after deflation using the CPI. The investment rate is calculated as (I/K)it = Iit /(05 ∗ (Kit + Kit−1 )), the employment growth rate as (L/L)it = (Lit − Lit−1 )/(05 ∗ (Lit + Lit−1 )), and the sales growth as (S/S)it = (Sit − Sit−1 )/(05 ∗ (Sit + Sit−1 )).47 44
Note: A kth-order correlation for series xit and yit is defined as Corr(xit yit−k ). Note that because the optimal weighting matrix takes into account the covariance across moments, adding extra moments that are highly correlated to included moments has very little impact on the parameters estimates. 46 Kit = (1 − δK )Kit−1 (Pmt /Pmt−1 ) + Iit , initialized using the net book value of capital, where Iit is net capital expenditure on plant, property, and equipment, and Pmt is the industry-level capital goods deflator from Bartelsman, Becker, and Grey (2000). 47 Gross investment rates and net employment growth rates are used since these are directly observed in the data. Under the null that the model is correctly specified, the choice of net versus gross is not important for the consistency of parameter estimates as long as the same actual and simulated moments are matched. 45
662
NICHOLAS BLOOM
The simulated data are constructed in exactly the same way as company accounts are built. First, firm value is created by adding up across the N units in each firm. It is then converted into annual figures using standard accounting techniques: simulated data for flow figures from the accounting profit & loss and cash-flow statements (such as sales and capital expenditure) are added up across the 12 months of the year; simulated data for stock figures from the accounting balance sheet statement (such as capital stock and labor force) are taken from the year end values. By constructing my simulation data in the same manner as company accounts I can estimate adjustment costs using firm-level data sets like Compustat. This has some advantages versus using census data sets like the Logitudinal Research Dataset (LRD) because firm-level data are (i) easily available to all researchers in a range of different countries; (ii) matched into firm level financial and cash-flow data; and (iii) available as a yearly panel stretching back several decades (for example to the 1950s in the United States). Thus, this technique of explicitly building aggregation into estimators to match against aggregated quoted firm-level data should have a broader use in other applications. The obvious disadvantage of using Compustat is it represents only about one-third of employment in the United States (Davis et al. (2006)). 5.5. Measurement Errors Employment figures are often poorly measured in company accounts, typically including all parttime, seasonal, and temporary workers in the total employment figures without any adjustment for hours, usually after heavy rounding. This problem is then made much worse by the differencing to generate growth rates. As a first step toward reducing the sensitivity toward these measurement errors, the autocorrelations of growth rates are taken over longer periods (as noted above). As a second step, I explicitly introduce employment measurement error into the simulated moments to try to mimic the bias these impute into the actual data moments. To estimate the size of the measurement error, I assume that firm wages (Wit ) can be decomposed into Wit = ηt λjt φi Lit , where ηt is the absolute price level, λjt is the relative industry wage rate, φi is a firm-specific salary rate (or skill/seniority mix), and Lit is the average annual firm labor force (hours adjusted). I then regress log Wit on a full set of year dummies, a log of the 4-digit SIC industry average wage from Bartelsman, Becker, and Gray (2000), a full set of firm-specific fixed effects, and log Lit . Under my null on the decomposition of Wit the coefficient on log Lit will be ap2 ), where σL2 is the variation in log employment and proximately σL2 /(σL2 + σME 2 σME is the measurement error in log employment. I find a coefficient (standard error (s.e.)) on log Lit of 0.882 (0.007), implying a measurement error of 13%
THE IMPACT OF UNCERTAINTY SHOCKS
663
in the logged labor force numbers.48 This is reassuringly similar to the 8% estimate for measurement error in Compustat manufacturing firms’ labor figures that Hall (1987) calculated comparing ordinary least squares and instrumental variable estimates. I take the average of these figures and incorporate this into the simulation estimation by multiplying the aggregated annual firm labor force by meit , where meit ∼ iid LN(0 0105) before calculating simulated moments. The other variable which is also potentially affected by measurement error is the capital stock.49 Actual depreciation rates are not observed, so the perpetual inventory capital stock is potentially mismeasured. To investigate the importance of this in Section 5.7 I reestimate the model assuming capital stocks also have a 10% log-normal measurement error. 5.6. Baseline Adjustment-Cost Estimates In this section I present the estimates of the units’ capital and labor adjustment costs. Starting with Table III, the column labelled Data in the bottom panel reports the actual moments from Compustat. These demonstrate that investment rates have a low spread but a heavy right skew, due to the lack of disinvestment, and strong dynamic correlations. Labor growth rates are relatively variable but unskewed, with weaker dynamic correlations. Sales growth rates have similar moments to those of labor, although slightly lower spread and higher degree of dynamics correlations. The column in Table III labelled All presents the results from estimating the preferred specification, allowing for all types of adjustment costs. The estimated adjustment costs for capital imply a large resale loss of around 34% on capital, fixed investment costs of 1.5% of annual sales (about 4 working days), and no quadratic adjustment costs. The estimated labor adjustment costs imply limited hiring and firing costs of about 1.8% of annual wages (about 5 working days), and a high fixed cost of around 2.1% of annual revenue (about 6 working days), with no quadratic adjustment costs. The standard errors suggest all of these point estimates are statistically significant except for the fixed cost of capital adjustment (CKF ). One question is how do these estimates compare to those previously estimated in the literature? Table IV presents a comparison for some other estimates from the literature. Three factors stand out: first, there is tremendous variation is estimated adjustment costs, reflecting the variety of data, techniques, and assumptions used in the different papers; second, my estimates 48 Adding firm- or industry-specific wage trends reduces the coefficient on log Wit , implying an even higher degree of measurement error. Running the reverse regression of log labor on log wages plus the same controls generates a coefficient (s.e.) of 0.990 (0.008), indicating that the proportional measurement error in wages (a better recorded financial variable) is many times smaller than that of employment. The regressions are run using 2468 observations on 219 firms. 49 Sales and capital expenditure values are usually easier to audit and so much better measured.
TABLE III
ADJUSTMENT COST ESTIMATES (TOP PANEL)a

                                                         Adjustment Costs Specification
Estimated Parameters                                     All              Capital
C_K^P: investment resale loss (%)                        33.9 (6.8)       42.7 (14.2)
C_K^F: investment fixed cost (% annual sales)            1.5 (1.5)        1.1 (0.2)
C_K^Q: capital quadratic adjustment cost (parameter)     0 (0.009)        0.996 (0.044)
C_L^P: per capita hiring/firing cost (% annual wages)    1.8 (0.8)
C_L^F: fixed hiring/firing costs (% annual sales)        2.1 (0.9)
C_L^Q: labor quadratic adjustment cost (parameter)       0 (0.037)
σ_L: baseline level of uncertainty                       0.443 (0.009)    0.413 (0.012)
μ_H − μ_L: spread of firm business conditions growth     0.121 (0.002)    0.122 (0.002)
π_HL^μ: transition of firm business conditions growth    0 (0.001)        0 (0.001)
γ: curvature of the hours/wages function                 2.093 (0.272)    2.221 (0.146)
4844 (454.15) 167 (01) 11 (01) 1010 (0017) 0216 (0005) 0258 (0001) 0016 (0001) 3421 (0052)
0 (0002) 0171 0100 (0005) (0005) 0082 0158 (0001) (0001) 0 0011 (0001) (0001) 2000 2013 (0009) (1471) (Continues)
Three factors stand out: first, there is tremendous variation in estimated adjustment costs, reflecting the variety of data, techniques, and assumptions used in the different papers; second, my estimates of positive capital and labor adjustment costs appear broadly consistent with other papers which jointly estimate these; and third, studies which estimate nonconvex adjustment costs report positive and typically very substantial values.

For interpretation, in Table III I also display results for four illustrative restricted models. First, a model with capital adjustment costs only, assuming labor is fully flexible, as is typical in the investment literature. In the Capital column we see that the fit of the model is worse, as shown by the significant rise in the criterion function from 404 to 625.50 This reduction in fit is primarily due to the worse fit of the labor moments, suggesting that ignoring labor adjustment costs is a reasonable approximation when modelling investment. Second, a model with labor adjustment costs only—as is typical in the dynamic labor demand literature—is estimated in the column Labor: the fit is substantially worse, which suggests that ignoring capital adjustment costs is problematic.

50 The 5% critical value of the χ² distribution with 3 degrees of freedom is 7.82, so the column Capital can easily be rejected against the null of All given the difference in criterion values of 221. It is also true, however, that the preferred All specification can itself be rejected as the true representation of the data, given that the 5% critical value of the χ² distribution with 10 degrees of freedom is 18.31.
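The critical values cited in footnote 50 can be checked with a one-line calculation. The snippet below is an illustrative aside, not part of the paper's estimation code.

```python
from scipy.stats import chi2

print(round(chi2.ppf(0.95, df=3), 2))    # about 7.81-7.82: bound for the 3 restrictions in Capital
print(round(chi2.ppf(0.95, df=10), 2))   # about 18.31: bound used for the All specification
```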
TABLE III (BOTTOM PANEL)

                                                 Data − Simulated Moments
Moments                                  Data     All      Capital   Labor    Quad     None
Correlation (I/K)it with (I/K)it−2       0.328    0.060    −0.015    0.049    −0.043   0.148
Correlation (I/K)it with (I/K)it−4       0.258    0.037    0.004     0.088    0.031    0.162
Correlation (I/K)it with (ΔL/L)it−2      0.208    0.003    −0.025    0.004    −0.056   0.078
Correlation (I/K)it with (ΔL/L)it−4      0.158    −0.015   −0.009    0.036    0.008    0.091
Correlation (I/K)it with (ΔS/S)it−2      0.260    −0.023   −0.062    −0.044   −0.102   0.024
Correlation (I/K)it with (ΔS/S)it−4      0.201    −0.010   −0.024    0.018    −0.036   0.087
Standard deviation (I/K)it               0.139    −0.010   0.010     −0.012   0.038    0.006
Coefficient of skewness (I/K)it          1.789    0.004    0.092     1.195    1.311    1.916
Correlation (ΔL/L)it with (I/K)it−2      0.188    −0.007   0.052     −0.075   0.055    0.053
Correlation (ΔL/L)it with (I/K)it−4      0.133    −0.021   0.024     −0.061   0.038    0.062
Correlation (ΔL/L)it with (ΔL/L)it−2     0.160    0.011    0.083     −0.033   0.071    0.068
Correlation (ΔL/L)it with (ΔL/L)it−4     0.108    −0.013   0.054     −0.026   0.045    0.060
Correlation (ΔL/L)it with (ΔS/S)it−2     0.193    −0.019   0.063     −0.091   0.064    0.023
Correlation (ΔL/L)it with (ΔS/S)it−4     0.152    0.003    0.056     −0.051   0.059    0.063
Standard deviation (ΔL/L)it              0.189    −0.022   −0.039    0.001    −0.001   0.005
Coefficient of skewness (ΔL/L)it         0.445    −0.136   0.294     −0.013   0.395    0.470
Correlation (ΔS/S)it with (I/K)it−2      0.203    −0.016   −0.015    −0.164   −0.063   −0.068
Correlation (ΔS/S)it with (I/K)it−4      0.142    −0.008   −0.010    −0.081   −0.030   −0.027
Correlation (ΔS/S)it with (ΔL/L)it−2     0.161    −0.005   0.032     −0.105   −0.024   −0.037
Correlation (ΔS/S)it with (ΔL/L)it−4     0.103    −0.015   0.011     −0.054   −0.005   −0.020
Correlation (ΔS/S)it with (ΔS/S)it−2     0.207    −0.033   0.002     −0.188   −0.040   −0.158
Correlation (ΔS/S)it with (ΔS/S)it−4     0.156    0.002    0.032     −0.071   −0.021   −0.027
Standard deviation (ΔS/S)it              0.165    0.004    0.003     0.033    0.051    0.062
Coefficient of skewness (ΔS/S)it         0.342    −0.407   −0.075    −0.365   0.178    0.370
Criterion, Γ(θ)                                   404      625       3618     2798     6922
a The Data column (bottom panel only) contains the moments from 22,950 observations on investment (I), capital (K), labor (L), and sales (S) for 2548 firms. The other columns contain the adjustment-cost estimates (top panel) and the data moments minus the simulated moments (bottom panel) for all adjustment costs (All), just capital adjustment costs (Capital), just labor adjustment costs (Labor), just quadratic adjustment costs with 1 unit per firm (Quad), and no adjustment costs (None). So, for example, the number 0.328 at the top of the Data column reports that the second lag of the autocorrelation of investment in the data is 0.328, and the number 0.060 to its right reports that in the All specification the simulated moment is 0.060 less than the data moment (so is 0.268 in total). In the top panel standard errors are given in italics in parentheses below the point estimates. Parameters were estimated using simulated method of moments, and standard errors were calculated using numerical derivatives.
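To make the moment set concrete, the sketch below shows how the Data column could in principle be computed from a firm panel. The column names (firm, year, ik, dl, ds) are hypothetical, the code assumes consecutive annual observations within each firm, and it is illustrative rather than the paper's own code.

```python
import pandas as pd

def panel_moments(df: pd.DataFrame) -> pd.Series:
    """24 moments: lag-2 and lag-4 cross-correlations plus standard deviation and skewness
    of I/K (ik), labor growth (dl), and sales growth (ds)."""
    df = df.sort_values(["firm", "year"])
    lag2 = df.groupby("firm")[["ik", "dl", "ds"]].shift(2)
    lag4 = df.groupby("firm")[["ik", "dl", "ds"]].shift(4)
    out = {}
    for x in ["ik", "dl", "ds"]:
        for y in ["ik", "dl", "ds"]:
            out[f"corr({x}, {y}_t-2)"] = df[x].corr(lag2[y])
            out[f"corr({x}, {y}_t-4)"] = df[x].corr(lag4[y])
        out[f"sd({x})"] = df[x].std()
        out[f"skew({x})"] = df[x].skew()
    return pd.Series(out)
```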
Third, a model with quadratic costs only and no cross-sectional aggregation—as is typical in convex adjustment-cost models—is estimated in the Quad column, leading to a moderate reduction in fit generated by excessive intertemporal correlation and an inadequate investment skew. Interestingly, industry and aggregate data are much more autocorrelated and less skewed due to extensive aggregation, suggesting quadratic adjustment costs could be a reasonable approximation at that level of aggregation. Finally, a model with no adjustment costs is estimated in the column None. Omitting adjustment costs clearly reduces the model fit. It also biases the estimation of the business-conditions process toward much larger firm-level growth fixed effects and a lower variance of the idiosyncratic shocks.
TABLE IV
A COMPARISON WITH OTHER CAPITAL AND LABOR ADJUSTMENT-COST ESTIMATES

Capital
Source
PI (%)
Column All in Table III (this paper) Ramey and Shapiro (2001) Caballero and Engel (1999) Hayashi (1982) Cooper and Haltiwanger (2006) Shapiro (1986) Merz and Yashiv (2007) Hall (2004) Nickell (1986) Cooper, Haltiwanger, and Willis (2004)
339 40–80
Fixed (%)
15
Labor Quad
0
PI (%)
Fixed (%)
1.8
2.1
Quad
0
165 25
204
20 0049 3 142 0
133 42 0 8–25 1.7
0
a PI denotes partial irreversibilities, Fixed denotes fixed costs, and Quad denotes quadratic adjustment costs. Missing values indicate no parameter estimated in the main specification. Zeros indicate the parameter was not significantly different from zero. Nickell’s (1986) lower [higher] value is for unskilled [skilled] workers. Shapiro’s (1986) value is a weighted average of (2/3) × 0 for production workers and (1/3) × 4 for nonproduction workers. Merz and Yashiv’s (2007) values are the approximated quadratic adjustment costs at the sample mean. Comparability subject to variation in data sample, estimation technique, and maintained assumptions.
This helps to highlight the importance of jointly estimating the adjustment costs and the driving process.

In Table III there are also some estimates of the driving-process parameters σ_L, μ_H − μ_L, and π_HL^μ, as well as the wage–hours curve parameter γ. What is clear is that changes in the adjustment-cost parameters lead to changes in these parameters. For example, the lack of nonconvex adjustment costs in the column Quad generates an estimated uncertainty parameter of around one-third of the baseline All value and a spread in firm-level growth fixed effects of about two-thirds of the baseline All value. This provides support for the selection of moments that can separately identify the driving-process and adjustment-cost parameters.

5.7. Robustness Tests on Adjustment-Cost Estimates

In Table V I run some robustness tests on the modelling assumptions. The column All repeats the baseline results from Table III for ease of comparison. The column δ_L = 0.2 reports the results from reestimating the model with a 20% (rather than 10%) annual attrition rate for labor. This higher rate of attrition leads to higher estimates of quadratic adjustment costs for labor and capital, and lower fixed costs for labor. This is because with higher labor attrition rates, hiring and firing become more sensitive to current demand shocks (since higher attrition reduces the sensitivity to past shocks). To compensate, the estimated quadratic adjustment costs are higher and the fixed costs are lower.
TABLE V
ADJUSTMENT COST ROBUSTNESS TESTSa

Adjustment Costs Specification
Estimated Parameters                               All            δ_L = 20%      a + b = 0.83   N = 25         N = 1
C_K^P: investment resale loss (%)                  33.9 (6.8)     28.6 (4.8)     29.8 (4.8)     30.3 (8.7)     47.0 (9.1)
C_K^F: investment fixed cost (% annual sales)      1.5 (1.0)      2.1 (0.9)      2.1 (0.5)      0.9 (0.4)      1.3 (0.2)
C_K^Q: capital quadratic adjustment cost           0 (0.009)      0.461 (0.054)  0 (0.007)      0.616 (0.154)  2.056 (0.284)
C_L^P: per capita hiring/firing cost (% wages)     1.8 (0.8)      1.0 (0.1)      0 (0.0)        0 (0.1)        0 (0.1)
C_L^F: fixed hiring/firing costs (% sales)         2.1 (0.9)      0.3 (0.1)      1.7 (0.6)      1.3 (0.8)      0 (0.0)
C_L^Q: labor quadratic adjustment cost             0 (0.037)      0.360 (0.087)  0 (0.021)      0.199 (0.062)  0.070 (0.031)
σ_L: baseline level of uncertainty                 0.443 (0.009)  0.490 (0.019)  0.498 (0.012)  0.393 (0.013)  0.248 (0.008)
μ_H − μ_L: spread of business conditions growth    0.121 (0.002)  0.137 (0.002)  0.123 (0.001)  0.163 (0.002)  0.126 (0.002)
π_HL^μ: transition business conditions growth      0 (0.001)      0 (0.001)      0 (0.001)      0 (0.001)      0 (0.001)
γ: curvature of hours/wages function               2.093 (0.272)  2.129 (0.222)  2.000 (0.353)  2.148 (0.266)  2.108 (0.147)
Criterion, Γ(θ)                                    404            496            379            556            593

Estimated Parameters                               Pre −5%        Pre +5%        Cap ME         Yearly         Weekly
C_K^P: investment resale loss (%)                  33.9 (6.8)     36.1 (4.9)     35.3 (5.1)     45.3 (5.2)     50.0 (30.8)
C_K^F: investment fixed cost (% annual sales)      1.5 (1.0)      1.6 (0.5)      0.9 (0.4)      2.1 (0.3)      1.6 (0.4)
C_K^Q: capital quadratic adjustment cost           0 (0.009)      0.058 (0.037)  0.525 (0.078)  0.025 (0.015)  1.488 (0.729)
C_L^P: per capita hiring/firing cost (% wages)     1.8 (0.8)      0 (0.04)       0 (0.7)        2.0 (0.9)      0 (6.1)
C_L^F: fixed hiring/firing costs (% sales)         2.1 (0.9)      1.6 (0.7)      0.9 (0.4)      2.0 (0.5)      1.3 (0.3)
C_L^Q: labor quadratic adjustment cost             0 (0.037)      0 (0.018)      0 (0.017)      1.039 (0.165)  0.808 (0.612)
σ_L: baseline level of uncertainty                 0.443 (0.009)  0.469 (0.011)  0.515 (0.017)  0.339 (0.011)  0.600 (0.035)
μ_H − μ_L: spread of business conditions growth    0.121 (0.002)  0.132 (0.002)  0.152 (0.003)  0.228 (0.005)  0 (0.018)
π_HL^μ: transition business conditions growth      0 (0.001)      0 (0.001)      0 (0.001)      0.016 (0.001)  n/a
γ: curvature of hours/wages function               2.093 (0.272)  2.056 (0.246)  2.000 (0.211)  2.000 (0.166)  2.129 (0.254)
Criterion, Γ(θ)                                    403            380            394            656            52
a Adjustment-cost estimates using the same methodology as in Table III for the baseline model (All), except: 20% annualized labor attrition (δ_L = 20%), a 20% markup (a + b = 0.83), only 25 units per firm (N = 25), only 1 unit per firm (N = 1), a 5% negative business-conditions shock 6 months prior to the start of the simulation period (Pre −5%), a 5% positive business-conditions shock 6 months prior to the simulation period (Pre +5%), a 10% log-normal capital measurement error (Cap ME), and a simulation run at a yearly frequency (Yearly). The final column (Weekly) is different: it evaluates temporal aggregation bias. It reports parameters estimated using moments from simulated data created by taking the parameter values from the All column, simulating at a weekly frequency, and then aggregating up to yearly data. This is done to test the bias from estimating the model assuming an underlying monthly process when in fact the moments are generated from an underlying weekly process. Standard errors are given in italics below the point estimates.
The column a + b = 0.83 reports the results for a specification with a 20% markup, in which the estimated adjustment costs look very similar to the baseline results. In the columns N = 25 and N = 1, the results are reported for simulations assuming the firm operates 25 units and 1 unit, respectively.51 These assumptions also lead to higher estimates for the quadratic adjustment costs and lower estimates for the nonconvex adjustment costs, to compensate for the reduction in smoothing by aggregation. In the columns Pre −5% and Pre +5%, the results are reported for simulations assuming there is a −5% and a +5% shock, respectively, to business conditions in the 6 months before the simulation begins. This amounts to running the simulation during a recession and a boom, respectively, to investigate how the initial conditions influence the results. As shown in Table V, the results are numerically identical for Pre −5% and similar for Pre +5% to those for All, suggesting the results are robust to different initial conditions. The reason is that the long time period of the simulation (20 years) and the limited persistence of macro shocks mean that the impact of the initial conditions dies out quickly. In the column Cap ME, the parameters are estimated including a 10% log-normal measurement error for capital, with broadly similar results.

Finally, the last two columns investigate the impact of time aggregation. First, the column Yearly reports the results from running the simulation at a yearly frequency without any time aggregation. Dropping time aggregation leads to higher estimated quadratic adjustment costs to compensate for the loss of smoothing by aggregation. Second, the column Weekly reports the results from (i) taking the baseline All parameter estimates and using them to run a weekly-frequency simulation, (ii) aggregating these data to a yearly frequency, and (iii) using this to estimate the model assuming a monthly underlying frequency. Thus, this exercise seeks to understand the bias from assuming the model is monthly if the underlying generating process was in fact weekly. Comparing the All parameter values used to generate the data with the Weekly values estimated from (incorrectly) assuming a monthly underlying frequency reveals a number of differences. This suggests that modelling time aggregation correctly matters for estimating adjustment costs. Given the lack of prior empirical or simulation results on temporal aggregation, this is an area in which further research would be particularly valuable.

51 The specification with N = 1 is included to provide guidance on the impact of simulated aggregation rather than for empirical realism. The evidence from the annual reports of almost all large companies suggests aggregation is pervasive.
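The accounting aggregation that underlies the Yearly and Weekly comparisons (flows summed within the year, stocks taken at year end, as described in Section 5) can be sketched as follows. The function below is a minimal illustration, not the simulation code itself, and assumes the monthly arrays cover whole years.

```python
import numpy as np

def to_annual(monthly_flow, monthly_stock):
    """Aggregate monthly simulated series to annual accounting figures."""
    flow = np.asarray(monthly_flow, dtype=float).reshape(-1, 12)
    stock = np.asarray(monthly_stock, dtype=float).reshape(-1, 12)
    annual_flow = flow.sum(axis=1)   # e.g. sales, capital expenditure: summed over 12 months
    annual_stock = stock[:, -1]      # e.g. capital stock, labor force: year-end value
    return annual_flow, annual_stock
```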
6. SIMULATION ROBUSTNESS

In this section, I undertake a set of simulation robustness tests for different values of the adjustment costs, the predetermined parameters, and the uncertainty process.

6.1. Adjustment Costs

To evaluate the effects of the different types of adjustment costs, I ran three simulations: the first with fixed costs only, the second with partial irreversibilities only, and the third with quadratic adjustment costs only.52 The output from these three simulations is shown in Figure 16. As can be seen, the two specifications with nonconvex adjustment costs generate a distinct drop and rebound in economic activity. The rebound with fixed costs is faster than with partial irreversibilities because of the bunching in hiring and investment, but otherwise they are remarkably similar in size, duration, and profile. The quadratic adjustment-cost specification, in contrast, appears to generate no response to an uncertainty shock. The reason is that there is no kink in adjustment costs around zero, so there is no option value associated with doing nothing.

In summary, this suggests the predictions are very sensitive to the inclusion of some degree of nonconvex adjustment costs, but are much less sensitive to the type (or indeed the level) of these nonconvex adjustment costs.
FIGURE 16.—Different adjustment costs. The fixed-costs specification has only the C_K^F and C_L^F adjustment costs from the All estimation in Table III; the partial-irreversibility specification has only the C_K^P and C_L^P costs from the baseline All estimation in Table III; and the quadratic specification has the adjustment costs from the Quad column in Table III.
52 For fixed costs and partial irreversibilities, the adjustment costs are the fixed-cost and partial-irreversibility components of the parameter values from the All column in Table III. For quadratic adjustment costs, the values are from the Quad column in Table III.
This highlights the importance of the prior step of estimating the size and nature of the underlying labor and capital adjustment costs.

6.2. Predefined Parameters

To investigate the robustness of the simulation results to the assumptions over the predefined parameters, I reran the simulations using the different parameter estimates from Table V. The results, shown in Figure 17, highlight that the qualitative result of a drop and rebound in activity is robust to the different assumptions over the predetermined parameters. This is because of the presence of some nonconvex component in all the sets of estimated adjustment costs in Table V. The size of the drop and rebound did vary across specifications, however. Running the simulation with the N = 1 parameter estimates from Table V leads to a drop of only 1%, about half the baseline drop of about 1.8%. This smaller drop is due to the very high levels of estimated quadratic adjustment costs that are required to smooth the investment and employment series in the absence of cross-sectional aggregation. Of course, the assumption of no cross-sectional aggregation (N = 1) is inconsistent with the pervasive aggregation in the typical large firm. This simulation is presented simply to highlight the importance of building aggregation into estimation routines when it is also present in the data.
FIGURE 17.—Simulation robustness to different parameter assumptions.
In the δ_L = 0.2 specification the drop was around 2.25%, about 30% above the baseline drop, due to the greater labor attrition after the shock. Hence, this more realistic assumption of 20% annual labor attrition (rather than 10% in the baseline) generates a larger drop and rebound in activity. The results for assuming partial cross-sectional aggregation (N = 25), a 20% markup (a + b = 0.83), a preestimation boom (Pre +5%), and capital measurement errors (Cap ME) are all very similar to the baseline simulation (which has full cross-sectional aggregation and a 33% markup).

6.3. Durations and Sizes of Uncertainty Shocks

Finally, I also evaluate the robustness of the simulation predictions to different durations and sizes of uncertainty shocks. In Figure 18, I plot the output response to a shorter-lived shock (a 1-month half-life) and a longer-lived shock (a 6-month half-life), alongside the baseline (a 2-month half-life). It is clear that longer-lived shocks generate larger and more persistent falls in output. The reason is that the pause in hiring and investment lasts longer if the rise in uncertainty is more persistent. Of course, because the rise in uncertainty is more persistent, the cumulative increase in volatility is also larger, so that the medium-term "volatility overshoot" is also greater. Hence, more persistent uncertainty shocks generate a larger and more persistent drop, rebound, and overshoot in activity.
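As a rough quantitative illustration of what these half-lives mean, the sketch below maps each half-life into a monthly persistence rate under the simplifying assumption, made purely for illustration, that the extra uncertainty decays geometrically back toward its baseline after the initial jump.

```python
import numpy as np

for half_life in (1, 2, 6):                  # months, as in Figure 18
    rho = 0.5 ** (1.0 / half_life)           # monthly persistence implied by the half-life
    remaining = rho ** np.arange(13)         # share of the initial jump left in months 0-12
    print(f"half-life {half_life}m: persistence {rho:.3f}, share left after 6 months {remaining[6]:.3f}")
```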
FIGURE 18.—Uncertainty shocks with half-lives (HL) of 1 month, 2 months, and 6 months.
FIGURE 19.—Different sizes of uncertainty shocks. The smaller and larger uncertainty shocks have values of σ_H equal to 150% and 300% of the σ_L level, respectively (the baseline is 200%).
This is interesting in the context of the Great Depression, a period in which uncertainty rose to 260% of the baseline level for over 4 years, which in my (partial equilibrium) model would generate an extremely large and persistent drop in output and employment.

In Figure 19, I plot the output response to a smaller uncertainty shock (σ_H = 1.5 × σ_L), a larger uncertainty shock (σ_H = 3 × σ_L), and the baseline uncertainty shock (σ_H = 2 × σ_L). Surprisingly, the three different sizes of uncertainty shock lead to similarly sized drops in activity. The reason is that real-option values are increasing, but concave, in the level of uncertainty,53 so the impact of a 50% rise in uncertainty on the hiring and investment thresholds is about two-thirds of the size of the impact of the baseline 100% rise in uncertainty. Since the baseline impact on the hiring and investment thresholds is so large, even two-thirds of it pauses almost all hiring and investment. What is different across the different sizes of shocks, however, is that larger uncertainty shocks generate a larger medium-term volatility overshoot because the cumulative increase in volatility is greater.

Finally, in Figure 20, I evaluate the effects of an uncertainty shock which changes only the variance of macro shocks, and not the variance of firm- or unit-level shocks. This changes two things in the simulation. First, overall uncertainty rises by only 33% after a shock, since while macro uncertainty doubles, firm and micro uncertainty are unchanged. Despite this, the initial drop is similar to that in the baseline simulation with a 100% increase in overall uncertainty.
53 See Dixit (1993) and Abel and Eberly (1996) for an analytical derivation and discussion.
FIGURE 20.—Macro uncertainty shock only. In this simulation the volatility of macro shocks is set to σ_H at month 0, while firm- and unit-level shocks have volatility σ_L at all times.
This confirms the results from Figure 19 that even moderately sized uncertainty shocks are sufficient to pause activity. Second, there is no cross-sectional increase in firm- and unit-level variance, substantially reducing the volatility overshoot.54

54 There is still some volatility overshoot due to the averaging across macro shocks in the 25,000 macro draws. The reason for this is that the cross-sectional distribution is right-skewed on average (as shown in Figure 6), so that investment responds more to positive than to negative macro shocks.

7. CONCLUSIONS

Uncertainty appears to dramatically increase after major economic and political shocks like the Cuban missile crisis, the assassination of JFK, the OPEC I oil-price shock, and the 9/11 terrorist attacks. If firms have nonconvex adjustment costs, these uncertainty shocks will generate powerful real-option effects, driving the dynamics of investment and hiring behavior. These shocks appear to have large real effects: the uncertainty component alone generates a 1% drop and rebound in employment and output over the following 6 months, and a milder long-run overshoot.

This paper offers a structural framework to analyze these types of uncertainty shocks, building a model with a time-varying second moment of the
driving process, and a mix of labor and capital adjustment costs. The model is numerically solved and estimated on firm-level data using simulated method of moments. The parameterized model is then used to simulate a large macro uncertainty shock, which produces a rapid drop and rebound in output, employment, and productivity growth. This arises because higher uncertainty leads firms to temporarily pause their hiring and investment. In the medium term, the increased volatility arising from the uncertainty shock generates a volatility overshoot as firms respond to the increased variance of productivity shocks, which drives a medium-term overshoot and a longer-run return to trend. Hence, the simulated response to uncertainty shocks generates a drop, rebound, and longer-run overshoot, much the same as their actual empirical impact.

This temporary impact of a second-moment shock is different from the typically persistent impact of a first-moment shock. While the second-moment effect has its biggest drop by month 3 and has rebounded by about month 6, persistent first-moment shocks generate falls in activity that last several quarters. Thus, for a policymaker in the immediate aftermath of a shock it is critical to distinguish the relative contributions of the first- and second-moment components of shocks for predicting the future evolution of output.

The uncertainty shock also induces a strong insensitivity to other economic stimuli. At high levels of uncertainty the real-option value of inaction is very large, which makes firms extremely cautious. As a result, empirically realistic general-equilibrium-type falls in interest rates, wages, and prices have a very limited short-run effect on reducing the drop and rebound in activity. This raises a second policy implication: in the immediate aftermath of an uncertainty shock, monetary or fiscal policy is likely to be particularly ineffective.

This framework also enables a range of future research. Looking at individual events, it could be used, for example, to analyze the uncertainty impact of major deregulations, tax changes, trade reforms, or political elections. It also suggests there is a trade-off between policy correctness and decisiveness—it may be better to act decisively (but occasionally incorrectly) than to deliberate on policy, generating policy-induced uncertainty.

More generally, these second-moment effects contribute to the "where are the negative productivity shocks?" debate in the business cycle literature. It appears that second-moment shocks can generate short, sharp drops and rebounds in output, employment, investment, and productivity growth without the need for a first-moment productivity shock. Thus, recessions could potentially be driven by increases in uncertainty. Encouragingly, recessions do indeed appear in periods of significantly higher uncertainty, suggesting an uncertainty approach to modelling business cycles (see Bloom, Floetotto, and Jaimovich (2007)). Taking a longer-run perspective, this model also links to the volatility and growth literature, given the evidence for the primary role of reallocation in productivity growth.
The paper also jointly estimates nonconvex and convex labor and capital adjustment costs. I find substantial fixed costs of hiring/firing and investment, a large loss from capital resale, and some moderate per-worker hiring/firing costs. I find no evidence for quadratic investment or hiring/firing adjustment costs. I also find that assuming capital adjustment costs only—as is standard in the investment literature—generates an acceptable overall fit, while assuming labor adjustment costs only—as is standard in the labor demand literature—produces a poor fit.

APPENDIX A: DATA

All data and Stata do files used to create the empirical Figures 1, 2, and 3 and Table I are available at http://www.stanford.edu/~nbloom/. In this appendix I describe the contents and construction of these data sets.

A.1. Stock-Market Volatility Data

A.1.1. Testing for Jumps in Stock-Market Volatility

To test for jumps in stock-market volatility, I use the nonparametric bipower variation test of Barndorff-Nielsen and Shephard (2006). The test works for a time series {x_t, t = 1, 2, ..., N} by comparing the squared variation, SV = Σ_{t=3}^{N} (x_t − x_{t−1})², with the bipower variation, BPV = Σ_{t=3}^{N} |x_t − x_{t−1}| |x_{t−1} − x_{t−2}|. In the limit as dt → 0, if there are no jumps in the data, then E[SV] → E[BPV], since the variation is driven by a continuous process. If there are jumps, however, then E[SV] > E[BPV], since jumps have a squared impact on SV but only a linear impact on BPV. Barndorff-Nielsen and Shephard (2006) suggested two different test statistics—the linear-jump and the ratio-jump test—which have the same asymptotic distribution but different finite-sample properties. Using the monthly data from Figure 1, I reject the null of no jumps at the 2.2% and 1.6% levels using the linear and ratio tests, respectively. Using the daily VXO data underlying Figure 1 (available from January 1986 onward), I reject the null of no jumps using both tests at the 0.0% level.

A.1.2. Defining Stock-Market Volatility Shocks

Given the evidence for the existence of stock-market volatility jumps, I need to define what they are. The main measure is an indicator that takes a value of 1 for each of the 17 events labelled in Figure 1, and 0 otherwise. These 17 events are chosen as those with stock-market volatility more than 1.65 standard deviations above the Hodrick–Prescott detrended (λ = 129600) mean of the stock-market volatility series (the raw undetrended series is plotted in Figure 1). While some of these shocks occur in 1 month only, others span multiple months, so there was a choice over the exact allocation of their timing.
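The two calculations just described, the squared/bipower variation comparison and the 1.65-standard-deviation rule for dating shocks, can be sketched as follows. This is an illustrative Python version rather than the posted Stata/Matlab replication code, and it assumes `vol` is a monthly pandas Series of the volatility index from Figure 1.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def jump_ratio(x):
    """Ratio of squared variation to bipower variation; values well above 1 suggest jumps."""
    dx = np.diff(np.asarray(x, dtype=float))
    sv = np.sum(dx[1:] ** 2)                        # SV  = sum over t of (x_t - x_{t-1})^2
    bpv = np.sum(np.abs(dx[1:]) * np.abs(dx[:-1]))  # BPV = sum over t of |x_t - x_{t-1}||x_{t-1} - x_{t-2}|
    return sv / bpv

def shock_months(vol, lamb=129600, n_sd=1.65):
    """Months in which HP-detrended volatility exceeds its mean by more than n_sd standard deviations."""
    cycle, _trend = hpfilter(vol, lamb=lamb)
    return vol.index[cycle > cycle.mean() + n_sd * cycle.std()]
```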
TABLE A.1
MAJOR STOCK-MARKET VOLATILITY SHOCKS

Event                            Max Volatility    First Volatility    Type
Cuban missile crisis             October 1962      October 1962        Terror
Assassination of JFK             November 1963     November 1963       Terror
Vietnam buildup                  August 1966       August 1966         War
Cambodia and Kent State          May 1970          May 1970            War
OPEC I, Arab–Israeli War         December 1973     December 1973       Oil
Franklin National                October 1974      September 1974      Economic
OPEC II                          November 1978     November 1978       Oil
Afghanistan, Iran hostages       March 1980        March 1980          War
Monetary cycle turning point     October 1982      August 1982         Economic
Black Monday                     November 1987     October 1987        Economic
Gulf War I                       October 1990      September 1990      War
Asian Crisis                     November 1997     November 1997       Economic
Russian, LTCM default            September 1998    September 1998      Economic
9/11 terrorist attack            September 2001    September 2001      Terror
Worldcom and Enron               September 2002    July 2002           Economic
Gulf War II                      February 2003     February 2003       War
Credit crunch                    October 2008      August 2007         Economic
I tried two different approaches: the primary approach is to allocate each event to the month with the largest volatility spike for that event; an alternative approach is to allocate each event to the first month in which volatility went more than 2 standard deviations above the HP-detrended mean. The events can also be categorized as terror, war, oil, or economic shocks, so a third volatility indicator uses only the arguably most exogenous terror, war, and oil shocks. The volatility-shock events, their dates under each timing scheme, and their classification are shown in Table A.1.

It is noticeable from Table A.1 that almost all the shocks are bad events. So one question for empirical identification is how distinct stock-market volatility shocks are from stock-market levels shocks. Fortunately, it turns out that these series do move reasonably independently, because some events—like the Cuban missile crisis—raise volatility without impacting stock-market levels, while others—like Hurricane Katrina—generate falls in the stock market without raising volatility. So, for example, the log detrended stock-market level has a correlation of −0.192 with the main 1/0 volatility shock indicator, a correlation of −0.136 with the 1/0 oil, terror, and war shock indicator, and a correlation of −0.340 with the log detrended volatility index itself. Thus, the impact of stock-market volatility can be separately identified from stock-market levels. In the working paper version of this paper (Bloom (2008)), I briefly described each of the 17 volatility shocks shown in Figure 1 to highlight the fact that these are typically linked to real shocks.
A.2. Cross-Sectional Uncertainty Measures

There are four key cross-sectional uncertainty measures.

Standard Deviation of Firm-Level Profit Growth: This is measured on a quarterly basis using Compustat quarterly accounts. It is the cross-sectional standard deviation of the growth rates of pretax profits (data item 23). Profit growth has a close fit to productivity and demand growth in homogeneous revenue functions, and is one of the few variables to have been continuously reported in quarterly accounts since the 1960s. It is normalized by the firms' average sales (data item 2) and is defined as (profits_t − profits_{t−1})/(0.5 × sales_t + 0.5 × sales_{t−1}). Only firms with 150 or more quarters of accounts with sales and pretax profit figures are used, to minimize the effects of sample-composition changes.55 The growth rates are winsorized at the top and bottom 0.05% to prevent the series from being driven by extreme outliers.

Standard Deviation of Firm-Level Stock Returns: This is measured on a monthly basis using the CRSP data file. It is the cross-sectional standard deviation of the monthly stock returns. The sample is all firms with 500 or more months of stock-returns data. The returns are winsorized at the top and bottom 0.5% to prevent the series from being driven by extreme outliers.

Standard Deviation of Industry-Level TFP Growth: This is measured on an annual basis using the NBER industry database (Bartelsman, Becker, and Gray (2000)). The cross-sectional spread is defined as the standard deviation of the five-factor TFP growth rates, taken across all SIC 4-digit manufacturing industries. The complete sample is a balanced panel for 422 of the 425 industries (results are robust to dropping these three industries).

Standard Deviation of GDP Forecasts: This is measured on a half-yearly basis using the Philadelphia Federal Reserve Bank's Livingston survey of professional forecasters. It is defined as the cross-sectional standard deviation of the 1-year-ahead GDP forecasts normalized by the mean of the 1-year-ahead GDP forecasts. Only half-years with 50 or more forecasts are used, to ensure a sufficient sample size for the calculations. This series is linearly detrended across the sample (1950–2006) to remove a long-run downward drift in forecaster variance.

55 Limiting compositional change helps to address some of the issues raised by Davis, Faberman, and Haltiwanger (2006), who found rising sales volatility of publicly quoted firms but flat volatility of privately held firms. I also include a time trend in column 2 of Table I to directly control for this and focus on short-run movements.
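A sketch of how the first of these measures could be constructed is given below. The data frame layout and column names are hypothetical, and the snippet is illustrative rather than the paper's own code.

```python
import pandas as pd

def profit_growth_dispersion(df: pd.DataFrame) -> pd.Series:
    """Cross-sectional s.d. of normalized profit growth by quarter, from a panel with
    columns gvkey, quarter, profits (pretax income), and sales (illustrative names)."""
    df = df.sort_values(["gvkey", "quarter"])
    lag = df.groupby("gvkey")[["profits", "sales"]].shift(1)
    # Profit growth normalized by average sales, as defined in the text.
    growth = (df["profits"] - lag["profits"]) / (0.5 * df["sales"] + 0.5 * lag["sales"])
    # Winsorize the top and bottom 0.05% of growth rates to limit extreme outliers.
    lo, hi = growth.quantile([0.0005, 0.9995])
    growth = growth.clip(lo, hi)
    # The cross-sectional standard deviation by quarter is the uncertainty proxy.
    return growth.groupby(df["quarter"]).std()
```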
FIGURE A1.—VAR robustness to different shock definitions.
A.3. VAR Data

The VAR estimations are run using monthly data from June 1962 through June 2008. The full set of VAR variables in the estimation is: log industrial production in manufacturing (Federal Reserve Board of Governors, seasonally adjusted), employment in manufacturing (BLS, seasonally adjusted), average hours in manufacturing (BLS, seasonally adjusted), the log consumer price index (all urban consumers, seasonally adjusted), log average hourly earnings for production workers (manufacturing), the Federal Funds rate (effective rate, Federal Reserve Board of Governors), a monthly stock-market volatility indicator (described below), and the log of the S&P 500 stock-market index. All variables are HP detrended using a filter value of λ = 129600.

In Figure A1, the industrial production impulse response function is shown for four different measures of volatility: the actual series in Figure 1 after HP detrending (square symbols), the 1/0 volatility indicator with the shocks scaled by the HP-detrended series (dot symbols), an alternative volatility indicator which dates shocks by their first month (rather than their highest month) (triangle symbols), and a series which uses only the shocks linked to terror, war, and oil (plus symbols). As can be seen, each one of these shock measures generates a rapid drop and rebound in predicted industrial production.

In Figure A2, the VAR results are also shown to be robust to a variety of alternative variable sets and orderings. A simple trivariate VAR (the volatility indicator, log employment, and industrial production only) (square symbols) also displays a drop and rebound. The "quadvariate" VAR (the volatility indicator, log stock-market levels, log employment, and industrial production) also displays a similar drop and rebound (cross symbols), as does the quadvariate VAR with the variable ordering reversed (circular symbols). Hence the response of industrial production to a volatility shock appears robust to both the basic selection and the ordering of the variables.

In Figure A3, I plot the results using different HP detrending filter values: the linearly detrended series (λ = ∞) is plotted (square symbols) alongside the baseline detrending (λ = 129600) (cross symbols) and the "flexible" detrending (λ = 1296). As can be seen, the results again appear robust. I also conducted a range of other experiments, such as adding controls for the oil price (spot price of West Texas Intermediate), and found the results to be similar.
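For readers who want a template for this kind of exercise, the sketch below estimates a VAR on HP-detrended monthly data and produces orthogonalized impulse responses using statsmodels. The column names and the lag length are assumptions made for illustration, not values taken from the paper; the Cholesky identification depends on the column ordering, as discussed above.

```python
from statsmodels.tsa.api import VAR
from statsmodels.tsa.filters.hp_filter import hpfilter

def uncertainty_var_irf(raw, lags=12, periods=36):
    """raw: monthly DataFrame with columns arranged in the desired Cholesky ordering."""
    detrended = raw.apply(lambda s: hpfilter(s, lamb=129600)[0])  # keep the HP cycle of each series
    res = VAR(detrended).fit(lags)
    return res.irf(periods)   # orthogonalized responses via .orth_irfs or .plot(orth=True)

# Example (hypothetical column names):
# irf = uncertainty_var_irf(raw)
# irf.plot(orth=True, impulse="volatility_indicator", response="log_ip")
```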
FIGURE A2.—VAR robustness to different variable sets and ordering.
APPENDIX B: NUMERICAL SOLUTION METHOD

This appendix describes some of the key steps in the numerical techniques used to solve the firm's maximization problem. The full program, which runs on Matlab 64-bit, is at http://www.stanford.edu/~nbloom/.

B.1. Value Function Iteration

The objective is to solve the value function (3.7). This value function solution procedure is used in two parts of the paper. The first is in the simulated method of moments estimation of the unknown adjustment cost parameters,
FIGURE A3.—VAR robustness to different variable detrending assumptions.
whereby the value function is repeatedly solved for a variety of different parameters, the data are simulated, and the resulting moments are used in the parameter search algorithm. The second is in the simulation, where the value function is solved just once—using the estimated parameter choices—and then used to simulate a panel of 1000 units, repeated 25,000 times. The numerical contraction-mapping procedure used to solve the value function is the same in both cases. It proceeds in three steps.

STEP 1: Choose a grid of points in (a, l, σ, μ) space. Given the log-linear structure of the demand process, I use a grid of points in (log(a), log(l), σ, μ) space. In the log(a) and log(l) dimensions this grid is equidistantly spaced; in the σ and μ dimensions the spacing is determined by the estimated parameters. The normalization by capital in a and l—noting that a = A/K and l = L/K—also requires that the grid spacing in the log(a) and log(l) dimensions be the same (i.e., a_{i+1}/a_i = l_{j+1}/l_j, where i, j = 1, 2, ..., N index grid points) so that the set of investment rates {a_i/a_1, a_i/a_2, ..., a_i/a_N} maintains the state space on the grid.56 This equivalence between the grid spacings in the log(a) and log(l) dimensions means that the solution is substantially simplified if the values of δ_K and δ_L are equal, so that depreciation leaves the log(l) dimension unchanged. When δ_K and δ_L are unequal, the difference between them needs to be an integer multiple of the grid spacing. For the log(a) dimension, depreciation is added to the drift in the stochastic process, so there is no constraint on δ_K. Given the conversion to logs, I need to apply the standard Jensen correction to the driving processes (3.2), (3.3), and (3.4); for example, for (3.2), log(A^M_t) = log(A^M_{t−1}) − (σ²_{t−1} − σ²_L)/2 + σ_{t−1} W^M_t. The uncertainty effect on the drift rate is second order compared to the real-options effect, so the simulations are virtually unchanged if this correction is omitted.

I used a grid of 40,000 points (100 × 100 × 2 × 2). I also experimented with finer and coarser partitions and found that there were some changes in the value functions and policy choices as the partition changed, but the characteristics of the solution (i.e., a threshold response space as depicted in Figure 3) were unchanged as long as about 60 or more grid points were used in the log(a) and log(l) dimensions. Hence, the qualitative nature of the simulation results is robust to moderate changes in the number of points in the state-space partition.

56 Note that some extreme choices of the investment rate will move the state off the l grid, which induces an offsetting choice of the employment growth rate e to ensure this does not occur.
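The overall structure of Steps 1-3 can be illustrated with a deliberately stripped-down value function iteration in Python. The one-dimensional grid and placeholder payoff below stand in for the paper's four-dimensional (a, l, σ, μ) problem and its Matlab implementation; only the shape of the loop is meant to carry over.

```python
import numpy as np

beta = 1 / 1.065 ** (1 / 12)           # monthly discounting at a 6.5% annual rate
grid = np.linspace(0.1, 10.0, 100)      # state grid (Step 1)
# Placeholder payoff(state, next state); the firm's true profit net of adjustment costs differs.
payoff = np.log(grid)[:, None] - 0.05 * np.log(grid)[None, :]

V = np.zeros(len(grid))                 # starting value function (Step 3)
policy_old = np.full(len(grid), -1)
for it in range(250):                   # fixed maximum number of loops, as in the text
    candidates = payoff + beta * V[None, :]   # value of each feasible next state
    policy = candidates.argmax(axis=1)
    V = candidates.max(axis=1)
    if np.array_equal(policy, policy_old):    # stop once the policy rule has converged
        break
    policy_old = policy
```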
STEP 2: Define the value function on the grid of points. This is straightforward for most of the grid, but toward the edge of the grid, the random-walk nature of the demand process requires taking expectations of the value function off the edge of the state space. To address this, an extrapolation procedure is used to approximate the value function off the edge of the state space. Under partial irreversibilities and/or fixed costs, the value function is log-linear outside the zone of inaction, so as long as the state space is defined to include the region of inaction, this approximation is exact. Under quadratic adjustment costs, however, the value function is concave, so a log-linear approach is only approximately correct. With a sufficiently large state space, however, the probability of being at a point off the edge of the state space is very low, so any approximation error will have little impact.

STEP 3: The value function iteration process. First, select a starting value for the value function in the first loop. I used the solution for the value function without any adjustment costs, which can be easily derived. In the SMM estimation routine I initially tried using the last solution as the starting value in the next iteration, but found this could generate instability in the estimation loop, so instead I always used the same initial value function. The speed of value function iteration depends on the modulus of contraction, which with a monthly frequency and a 6.5% annual discount rate is close to one, so convergence is relatively slow. The number of loops was fixed at 250, which was chosen to ensure convergence of the policy functions. In practice, value functions typically converge more slowly than the policy rules associated with them. Thus, it is generally more efficient to stop the iterations when the policy functions have converged, even if the value function has not yet fully converged.

B.2. Simulated Method of Moments Estimation

To generate the simulated data for the SMM estimation (used to create Ψ^S(θ) in Equation (5.1)), I simulate an economy of 1000 firms, each with 250 units. This is run for 30 years, with the first 10 years discarded to eliminate the effects of any assumptions on initial conditions. Each firm is randomly assigned an initial drift parameter μ_L or μ_H. I run this simulation 25 times to average out the impact of any individual macro shocks. The same seed is always used in every simulation iteration. I also assume firms are initially distributed equally across μ_L and μ_H, given the symmetry of the transition matrix for μ_it. To ensure that first-moment draws have a constant aggregate drift rate, I numerically set

    Σ_ij A_ijt = exp(((μ_L + μ_H)/2) t) Σ_ij A_ij0,

consistent with (3.6) as N → ∞, which in smaller samples stops extreme draws for individual units from driving macro averages.

I use a simulated annealing algorithm to minimize the criterion function in the estimation step in Equation (5.1). This starts with a predefined first and second guess. From the third guess onward it takes the best prior guess and randomizes from it to generate a new set of parameter guesses.
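A schematic version of this annealing search is sketched below. The criterion function, starting guesses, iteration count, and cooling rate are placeholders; the paper's actual estimation code is in Matlab, and `criterion` stands in for the SMM objective Γ(θ).

```python
import numpy as np

def anneal(criterion, theta0, theta1, n_iter=500, scale0=0.5, cooling=0.995, seed=0):
    """Jump off from the best guess so far with a shrinking ('cooling') jump variance."""
    rng = np.random.default_rng(seed)
    guesses = [np.asarray(theta0, dtype=float), np.asarray(theta1, dtype=float)]
    values = [criterion(g) for g in guesses]
    best = guesses[int(np.argmin(values))]
    best_val = min(values)
    scale = scale0
    for _ in range(n_iter):
        candidate = best + scale * rng.standard_normal(best.shape)
        val = criterion(candidate)
        if val < best_val:
            best, best_val = candidate, val
        scale *= cooling          # reduce the jump variance over time
    return best, best_val
```

In practice the search would be restarted from several initial conditions, as the text describes, to guard against local minima.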
That is, it takes the best-fit parameters so far and randomly "jumps off" from this point for its next guess. Over time the algorithm "cools," so that the variance of the parameter jumps falls, allowing the estimator to fine-tune its parameter estimates around the global best fit. I restart the program with different initial conditions to ensure the estimator converges to the global minimum. The simulated annealing algorithm is extremely slow, which is an issue since it restricts the size of the parameter space which can be estimated. Nevertheless, I use it because it is robust to the presence of local minima and discontinuities in the criterion function across the parameter space.

To generate the standard errors for the parameter point estimates, I compute numerical derivatives of the simulated moments with respect to the parameters and weight them using the optimal weighting matrix. One practical issue is that the value of the numerical derivative, defined as f′(x) = (f(x + ε) − f(x))/ε, is sensitive to the exact value of ε chosen. This is a common problem when calculating numerical derivatives using simulated data with underlying discontinuities, arising, for example, from grid-point-defined value functions. To address this, I calculate four values of the numerical derivative for an ε of +1%, +2.5%, +5%, and −1% of the midpoint of the parameter space57 and then take the median of these numerical derivatives. This helps to ensure that the numerical derivative is robust to outliers arising from any discontinuities in the criterion function.

57 For example, the midpoint of the parameter space for C_K^F is taken as 0.01, so that ε is defined as 0.0001, 0.00025, 0.0005, and −0.0001.

REFERENCES

ABEL, A., AND J. EBERLY (1996): "Optimal Investment With Costly Reversibility," Review of Economic Studies, 63, 581–593. [631,632,639,672]
(2002): "Investment and q With Fixed Costs: An Empirical Analysis," Mimeo, Wharton School, University of Pennsylvania. [635]
ABRAHAM, K., AND L. KATZ (1986): "Cyclical Unemployment: Sectoral Shifts or Aggregate Disturbances?" Journal of Political Economy, 94, 507–522. [628]
ADDA, J., AND R. COOPER (2000): "Balladurette and Juppette: A Discrete Analysis of Scrapping Subsidies," Journal of Political Economy, 108, 778–806. [626]
AGHION, P., G. ANGELETOS, A. BANERJEE, AND K. MANOVA (2005): "Volatility and Growth: Credit Constraints and Productivity-Enhancing Investment," Working Paper 11349, NBER. [626]
BACHMAN, R., R. CABALLERO, AND E. ENGEL (2008): "Lumpy Investment in Dynamic General Equilibrium," Mimeo, Yale. [652,653,659]
BAILY, M., C. HULTEN, AND D. CAMPBELL (1992): "Productivity Dynamics in Manufacturing Plants," Brookings Papers on Economic Activity, 1992, 187–249. [648]
BARLEVY, G. (2004): "The Cost of Business Cycles Under Endogenous Growth," American Economic Review, 94, 964–990. [626]
BARNDORFF-NIELSEN, O., AND N. SHEPHARD (2006): "Econometrics of Testing for Jumps in Financial Economics Using Bipower Variation," Journal of Financial Econometrics, 4, 1–30. [627,675]
BARTELSMAN, E., R. BECKER, AND W. GRAY (2000): “NBER Productivity Database,” available at www.nber.org. [661,662,677] BERNANKE, B. (1983): “Irreversibility, Uncertainty and Cyclical Investment,” Quarterly Journal of Economics, 98, 85–106. [626] BERNARD, A., S. REDDING, AND P. SCHOTT (2006): “Multi-Product Firms and Product Switching,” Working Paper 12293, NBER. [650] BERTOLA, G., AND S. BENTOLILA (1990): “Firing Costs and Labor Demand: How Bad Is Eurosclerosis,” Review of Economic Studies, 54, 318–402. [626] BERTOLA, G., AND R. CABALLERO (1994): “Irreversibility and Aggregate Investment,” Review of Economic Studies, 61, 223–246. [631,635,656] BLOOM, N. (2008): “The Impact of Uncertainty Shocks,” Working Paper 13385, NBER. [676] (2009): “Supplement to ‘The Impact of Uncertainty Shocks’,” Econometrica Supplemental Material, 77, http://econometricsociety.org/ecta/Supmat/6248_data and programs.zip. [627,633] BLOOM, N., S. BOND, AND J. VAN REENEN (2007): “Uncertainty and Investment Dynamics,” Review of Economic Studies, 74, 391–415. [623,626,628,655] BLOOM, N., M. FLOETOTTO, AND N. JAIMOVICH (2007): “Really Uncertain Business Cycles,” Mimeo, Stanford. [627,674] BLOOM, N., R. SADUN, AND J. VAN REENEN (2008): “The Organization of Firms Across Countries,” Mimeo, LSE. [636] BLOOM, N., M. SCHANKERMAN, AND J. VAN REENEN (2007): “Identifying Technology Spillovers and Product Market Rivalry,” Working Paper 13060, NBER. [636] BLOOM, N., AND J. VAN REENEN (2007): “Measuring and Explaining Management Practices Across Firms and Countries,” Quarterly Journal of Economics, 122, 1351–1408. [636] BOND, S., AND J. VAN REENEN (2007): “Microeconometric Models of Investment and Employment,” in Handbook of Econometrics (forthcoming). [626] BRESNAHAN, T., AND D. RAFF (1991): “Intra-Industry Heterogeneity and the Great Depression: The American Motor Vehicles Industry, 1929–1935,” Journal of Economic History, 51, 317–331. [657] CABALLERO, R., AND E. ENGEL (1993): “Microeconomic Adjustment Hazards and Aggregate Dynamics,” Quarterly Journal of Economics, 108, 359–383. [626,632,656] (1999): “Explaining Investment Dynamics in U.S. Manufacturing: A Generalized (S s) Approach,” Econometrica, 67, 783–826. [631,635,666] CABALLERO, R., E. ENGEL, AND J. HALTIWANGER (1995): “Plant-Level Adjustment and Aggregate Investment Dynamics,” Brookings Papers on Economic Activity, 2, 1–54. [626] (1997): “Aggregate Employment Dynamics: Building From Microeconomics Evidence,” American Economic Review, 87, 115–137. [626] CABALLERO, R., AND J. LEAHY (1996): “Fixed Costs: The Demise of Marginal Q,” Working Paper 5508, NBER. [637,639] CHRISTIANO, L., M. EICHENBAUM, AND C. EVANS (2005): “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113, 1–45. [625] COOPER, R., AND J. HALTIWANGER (1993): “The Aggregate Implications of Machine Replacement: Theory and Evidence,” American Economic Review, 83, 360–382. [626] (2006): “On the Nature of the Capital Adjustment Process,” Review of Economic Studies, 73, 611–633. [626,633,659,660,666] COOPER, R., J. HALTIWANGER, AND J. WILLIS (2004): “Dynamics of Labor Demand: Evidence From Plant-Level Observations and Aggregate Implications,” Working Paper 10297, NBER. [626,666] COOPER, R., J. HALTIWANGER, AND L. POWER (1999): “Machine Replacement and the Business Cycle: Lumps and Bumps,” American Economic Review, 89, 921–946. [626] DAVIS, S., R. FABERMAN, AND J. 
HALTIWANGER (2006): “The Flow Approach to Labor Markets: New Data Sources and Micro–Macro Links,” Journal of Economic Perspectives, 20, 3–26. [645, 659,677]
684
NICHOLAS BLOOM
DAVIS, S., AND J. HALTIWANGER (1992): “Gross Job Creation, Gross Job Destruction, and Employment Reallocation,” Quarterly Journal of Economics, 107, 819–863. [626,634] DAVIS, S., J. HALTIWANGER, R. JARMIN, AND J. MIRANDA (2006): “Volatility and Dispersion in Business Growth Rates: Publicly Traded versus Privately Held Firms,” Working Paper 12354, NBER. [628,636,662] DIXIT, A. (1993): The Art of Smooth Pasting, Fundamentals of Pure and Applied Economics. Reading, U.K.: Harwood Academic. [672] DIXIT, A., AND R. PINDYCK (1994): Investment Under Uncertainty. Princeton, NJ: Princeton University Press. FOSTER, L., J. HALTIWANGER, AND C. KRIZAN (2000): “Aggregate Productivity Growth: Lessons From Microeconomic Evidence,” in New Developments in Productivity Analysis. Chicago, IL: University of Chicago Press. [625,650] (2006): “Market Selection, Reallocation and Restructuring in the U.S. Retail Trade Sector in the 1990s,” Review of Economics and Statistics, 88, 748–758. [625,650] GILCHRIST, S., AND J. WILLIAMS (2005): “Investment, Capacity and Uncertainty: A Putty–Clay Approach,” Review of Economic Dynamics, 8, 1–27. [626] HALL, B. (1987): “The Relationship Between Firm Size and Firm Growth in the U.S. Manufacturing Sector,” Journal of Industrial Economics, 35, 583–606. [663] HALL, R. (2004): “Measuring Factor Adjustment Costs,” Quarterly Journal of Economics, 119, 899–927. [626,666] HAMERMESH, D. (1989): “Labor Demand and the Structure of Adjustment Costs,” American Economic Review, 79, 674–89. [626] HASSLER, J. (1996): “Variations in Risk and Fluctuations in Demand—A Theoretical Model,” Journal of Economic Dynamics and Control, 20, 1115–1143. [626] HAYASHI, F. (1982): “Tobin’s Marginal Q and Average Q: A Neoclassical Interpretation,” Econometrica, 50, 213–224. [666] JUDD, K. (1998): Numerical Methods in Economics. Cambridge, MA: MIT Press. [659] KHAN, A., AND J. THOMAS (2003): “Nonconvex Factor Adjustments in Equilibrium Business Cycle Models: Do Nonlinearities Matter?” Journal of Monetary Economics, 50, 331–360. [652, 659] (2008): “Idiosyncratic Shocks and the Role of Nonconvexities in Plant and Aggregate Investment Dynamics,” Econometrica, 76, 395–436. [653] KING, R., AND S. REBELO (1999): “Resuscitating Real Business Cycles,” in Handbook of Macroeconomics, ed. by J. Taylor and M. Woodford. Amsterdam: Elsevier. [627,639] KRUSELL, P., AND A. SMITH (1998): “Income and Wealth Heterogeneity in the Macroeconomy,” Journal of Political Economy, 106, 867–896. [652] LEAHY, J., AND T. WHITED (1996): “The Effects of Uncertainty on Investment: Some Stylized Facts,” Journal of Money Credit and Banking, 28, 64–83. [626,628] LEE, B., AND B. INGRAM (1991): “Simulation Estimation of Time Series Models,” Journal of Econometrics, 47, 197–205. [658] LILLIEN, E. (1982): “Sectoral Shifts and Cyclical Unemployment,” Journal of Political Economy, 90, 777–793. [628] MCFADDEN, D. (1989): “A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration,” Econometrica, 57, 995–1026. [657] MEGHIR, C., AND L. PISTAFERRI (2004): “Income Variance Dynamics and Heterogeneity,” Econometrica, 72, 1–32. [626] MERZ, M., AND E. YASHIV (2007): “Labor and the Market Value of the Firm,” American Economic Review, 97, 1419–1431. [626,666] NICKELL, S. (1986): “Dynamic Models of Labor Demand,” in Handbook of Labor Economics, Vol. 1, ed. by O. Ashenfelter and R. Layard. Amsterdam: North-Holland. [666] OHANIAN, L. 
(2001): “Why Did Productivity Fall So Much During The Great Depression?” American Economic Review, 91, 34–38. [657]
THE IMPACT OF UNCERTAINTY SHOCKS
685
PAKES, A., AND D. POLLARD (1989): “Simulation and the Asymptotics of Optimization Estimators,” Econometrica, 57, 1027–1057. [657] RAMEY, G., AND V. RAMEY (1995): “Cross-Country Evidence on the Link Between Volatility and Growth,” American Economic Review, 85, 1138–1151. [626] RAMEY, V., AND M. SHAPIRO (2001): “Displaced Capital A Study of Aerospace Plant Closings,” Journal of Political Economy, 109, 958–992. [666] ROMER, C. (1990): “The Great Crash and the Onset of the Great Depression,” Quarterly Journal of Economics, 105, 597–624. [657] SHAPIRO, M. (1986): “The Dynamic Demand for Labor and Capital,” Quarterly Journal of Economics, 101, 513–542. [626,666] SHILLER, R. (1981): “Do Stock Prices Move Too Much to Be Justified by Subsequent Changes in Dividends?” American Economic Review, 71, 421–436. [623] SIM, J. (2006): “Uncertainty, Irreversible Investment and General Equilibrium,” Mimeo, Boston University. [653] STOKEY, N., AND R. LUCAS (1989): Recursive Methods in Economic Dynamics, With E. Prescott. Cambridge, MA: Harvard University Press. [637]
Dept. of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305, U.S.A. and the Centre for Economic Performance, London School of Economics and Political Science, London, U.K. and NBER;
[email protected]. Manuscript received January, 2006; final revision received December, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 687–719
GENERALIZED METHOD OF MOMENTS WITH MANY WEAK MOMENT CONDITIONS BY WHITNEY K. NEWEY AND FRANK WINDMEIJER1 Using many moment conditions can improve efficiency but makes the usual generalized method of moments (GMM) inferences inaccurate. Two-step GMM is biased. Generalized empirical likelihood (GEL) has smaller bias, but the usual standard errors are too small in instrumental variable settings. In this paper we give a new variance estimator for GEL that addresses this problem. It is consistent under the usual asymptotics and, under many weak moment asymptotics, is larger than usual and is consistent. We also show that the Kleibergen (2005) Lagrange multiplier and conditional likelihood ratio statistics are valid under many weak moments. In addition, we introduce a jackknife GMM estimator, but find that GEL is asymptotically more efficient under many weak moments. In Monte Carlo examples we find that t-statistics based on the new variance estimator have nearly correct size in a wide range of cases. KEYWORDS: GMM, continuous updating, many moments, variance adjustment.
1. INTRODUCTION MANY APPLICATIONS OF generalized method of moments (GMM; Hansen (1982)) have low precision. Examples include some natural experiments (Angrist and Krueger (1991)), consumption asset pricing models (Hansen and Singleton (1982)), and dynamic panel models (Holtz-Eakin, Newey, and Rosen (1988)). In these settings the use of many moments can improve estimator accuracy. For example, Hansen, Hausman, and Newey (2008) have recently found that in an application from Angrist and Krueger (1991), using 180 instruments, rather than 3, shrinks correct confidence intervals substantially. A problem with using many moments is that the usual Gaussian asymptotic approximation can be poor. The two-step GMM estimator can be very biased. Generalized empirical likelihood (GEL, Smith (1997)) and other estimators have smaller bias, but the usual standard errors are found to be too small in examples in Han and Phillips (2006) and here. In this paper we use alternative asymptotics that addresses this problem in overidentified instrumental variable models that are weakly identified. Such environments seem quite common in econometric applications. Under the alternative asymptotics we find that GEL has a Gaussian limit distribution with asymptotic variance larger than usual. We give a new, “sandwich” variance estimator that is consistent under standard and many weak moment asymptotics. We find in Monte Carlo examples 1 The NSF provided financial support for this paper under Grant 0136869. Helpful comments were provided by J. Hausman, J. Powell, J. Stock, T. Rothenberg, and four referees. This paper has been presented at the North American Winter Meeting and World Meeting of the Econometric Society, Cal Tech, Chicago, Copenhagen, Leuven, Manchester, NYU, Ohio State, Stanford, UC Berkeley, UCL, UCLA, and USC.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6224
688
W. K. NEWEY AND F. WINDMEIJER
that, in a range of cases where identification is not very weak, t-ratios based on the new variance estimator have a better Gaussian approximation than usual. We also show that the Kleibergen (2005) Lagrange multiplier (LM) and conditional likelihood ratio statistics, the Stock and Wright (2000) statistic, and the overidentifying statistic have an asymptotically correct level under these asymptotics, but that the likelihood ratio statistic does not. For comparison purposes we also consider a jackknife GMM estimator that generalizes the jackknife instrumental variable (IV) estimators of Phillips and Hale (1977), Angrist, Imbens, and Krueger (1999), and Blomquist and Dahlberg (1999). This estimator should also be less biased than the two-step GMM estimator. In the linear IV case, Chao and Swanson (2004) derived its limiting distribution under the alternative asymptotics. Here we show that jackknife GMM is asymptotically less efficient than GEL. The alternative asymptotics is based on many weak moment sequences like those of Chao and Swanson (2004, 2005), Stock and Yogo (2005a), and Han and Phillips (2006). This paper picks up where Han and Phillips (2006) left off by showing asymptotic normality with an explicit formula for the asymptotic variance that is larger than usual and by giving a consistent variance estimator. This paper also extends Han and Phillips’ (2006) work by giving primitive conditions for consistency and a limiting distribution when a heteroskedasticity consistent weight matrix is used for the continuous updating estimator (CUE), by analyzing GEL estimators other than the CUE and by considering jackknife GMM. The standard errors we give can be thought of as an extension of the Bekker (1994) standard errors from homoskedasticity and the limited information maximum likelihood (LIML) estimator to heteroskedasticity and GEL. Under homoskedasticity these standard errors and Bekker’s (1994) have the same limit, but the ones here are consistent under heteroskedasticity. The asymptotics here is well suited for IV estimators, but will not be particularly helpful for the type of minimum distance estimator considered in Altonji and Segal (1996). Estimation of the weighting matrix can strongly affect the properties of minimum distance estimators, but the asymptotics here treats it as fixed. The limiting distribution for GEL can be derived by increasing the number of moments in the Stock and Wright (2000) limiting distribution of the CUE. This derivation corresponds to sequential asymptotics, where one lets the number of observations go to infinity and then lets the number of moments grow. We give here simultaneous asymptotics, where the number of moments grows along with, but slower than, the sample size. One might also consider asymptotics where the number of moments increases at the same rate as the sample size, as Bekker (1994) did for LIML. It is harder to do this for GEL than for LIML, because GEL uses a heteroskedasticity consistent weighting matrix. Consequently, estimation of all the elements of this weighting matrix has to be allowed for rather than just estimation of a
GMM WITH WEAK MOMENTS
689
scalar variance term. If the number of instruments grows as fast as the sample size, the number of elements of the weight matrix grows as fast as the square of the sample size. It seems difficult to simultaneously control the estimation error for all these elements. Many weak moment asymptotics sidestep this problem by allowing the number of moments to grow more slowly than the sample size, while accounting for the presence of many instruments by letting identification shrink. In the linear heteroskedastic model, we give primitive conditions for consistency and asymptotic normality of GEL estimators under many weak moments. For consistency of the CUE, these conditions include a requirement that the number of moments m and the sample size n satisfy m2 /n −→ 0. This condition seems minimal given the need to control estimation of the weighting matrix. For asymptotic normality, we require m3 /n −→ 0 for the CUE. We impose somewhat stronger rate conditions for other GEL estimators. In comparison, under homoskedasticity Stock and Yogo (2005a) required m2 /n −→ 0, Hansen, Hausman, and Newey (2008) allowed m to grow at the same rate as n but restricted m to grow slower than the square of the concentration parameter, and Andrews and Stock (2006) required m3 /n −→ 0 when normality is not imposed. Of course one might expect somewhat stronger conditions with a heteroskedasticity consistent weighting matrix. The new variance estimator from the many weak instrument asymptotics is different than in Windmeijer (2005). That paper adjusted for the variability of the weight matrix while the many instrument asymptotics adjust for the variability of the moment derivative. In Section 2 we describe the model, the estimators, and the new asymptotic variance estimator. Test statistics that are robust to weak instruments and many weak instruments are described in Section 3. The alternative asymptotics is set up in Section 4. Section 5 calculates the asymptotic variance. Section 6 gives precise large sample results for GEL. Section 7 reports some Monte Carlo results. Section 8 offers some conclusions and some possible directions for future work. The online Appendix (Newey and Windmeijer (2009)) gives proofs. 2. THE MODEL AND ESTIMATORS The model we consider is for independent and identically distributed (i.i.d.) data where there are a countable number of moment restrictions. In the asymptotics we allow the data generating process to depend on the sample size. To describe the model, let wi (i = 1 n) be i.i.d. observations on a data vector w. Also, let β be a p × 1 parameter vector and let g(w β) = m (g1m (w β) gm (w β)) be an m × 1 vector of functions of the data observation w and the parameter, where m ≥ p. For notational convenience, we
690
W. K. NEWEY AND F. WINDMEIJER
suppress an m superscript on g(w β). The model has a true parameter β0 satisfying the moment condition E[g(wi β0 )] = 0 where E[·] denotes expectation taken with respect to the distribution of wi for sample size n, and we suppress the dependence on n for notational convenience. To describe the estimators and the asymptotic approximation, we will use some notation. Let ej denote the jth unit vector and let ˆ g(β) =
gi (β) = g(wi β)
n
gi (β)/n
i=1
ˆ Ω(β) =
n
gi (β)gi (β) /n
i=1
¯ g(β) = E[gi (β)]
gi = gi (β0 )
Ω(β) = E[gi (β)gi (β) ] ˆ ˆ G(β) = ∂g(β)/∂β
G(β) = E[∂gi (β)/∂β]
Gi (β) = ∂gi (β)/∂β G = G(β0 )
Ω = Ω(β0 )
Gi = Gi (β0 ) −1
B = Ω E[gi ej Gi ] j
j
Ui = Gi ej − Gej − Bj gi
p
Ui = [Ui1 Ui ]
An important example of this model is a single linear equation with instruments orthogonal to disturbances and heteroskedasticity of unknown form. This model is given by (1)
yi = xi β0 + εi
xi = Υi + ηi
E[εi |Zi Υi ] = 0
E[ηi |Zi Υi ] = 0
where yi is a scalar, xi is a p × 1 vector of right-hand side variables, Zi is an m × 1 vector of instrumental variables, and Υi is a p × 1 vector of reduced form values. In this setting the moment functions are g(wi β) = Zi (yi − xi β) The notation for the linear model is then i
gi (β) = Zi (yi − x β)
ˆ g(β) =
n i=1
ˆ Ω(β) =
n
i=1
Zi Zi (yi − xi β)2 /n
Zi (yi − xi β)/n
691
GMM WITH WEAK MOMENTS
¯ g(β) = −E[Zi Υi ](β − β0 ) Ω = E[Zi Zi εi2 ] ˆ G(β) =−
n
Ω(β) = E[Zi Zi (yi − xi β)2 ]
gi = Zi εi
Zi xi /n
G(β) = G = −E[Zi Υi ]
Gi (β) = Gi = −Zi xi
Bj = −Ω−1 E[Zi Zi εi xij ]
i=1
j
Ui = −Zi xij + E[Zi xij ] − Bj Zi εi To describe the Hansen (1982) two-step GMM estimator, let β˙ be a preliminary estimator and let B be a compact set of parameter values. This estimator is given by ¨ β¨ = arg min Q(β) β∈B
ˆ ¨ ˆ ˆ Q(β) = g(β) W g(β)/2
ˆ β) ˙ −1 Wˆ = Ω(
ˆ β) ˙ −1 is optimal in minimizing the asymptotic The weighting matrix Wˆ = Ω( variance of β¨ under standard asymptotics. The CUE has an analogous form where the objective function is simultaneˆ ously minimized over β in Ω(β), that is ˆ βˆ = arg min Q(β) β∈B
ˆ ˆ ˆ ˆ Ω(β)−1 g(β)/2 Q(β) = g(β)
To describe a GEL estimator, let ρ(v) be a function of a scalar v that is concave on an open interval V containing zero and let ρj (0) = ∂j ρ(0)/∂vj . We ˆ normalize ρ(v) so that ρ(0) = 0, ρ1 (0) = −1, and ρ2 (0) = −1. Let L(β) = {λ : λ gi (β) ∈ V i = 1 n} A GEL estimator is given by ˆ βˆ = arg min Q(β) β∈B
ˆ Q(β) = sup
n
ˆ λ∈L(β) i=1
ρ(λ gi (β))/n
as in Smith (1997). The empirical likelihood (EL; Qin and Lawless (1994), Imbens (1997)) estimator is obtained when ρ(v) = ln(1 − v) (and V = (−∞ 1)), and exponential tilting (ET, Imbens (1997), Kitamura and Stutzer (1997)) is obtained when ρ(v) = −ev + 1. When ρ(v) = −v − v2 /2, the objective function ˆ ˆ ˆ ˆ Ω(β)−1 g(β)/2 has an explicit form Q(β) = g(β) (Newey and Smith (2004)) and GEL is CUE. To describe the new variance estimator for GEL, assume that ˆ λ(β) = arg max
ˆ λ∈L(β)
n i=1
ρ(λ gi (β))/n
692
W. K. NEWEY AND F. WINDMEIJER
exists (which will be true with probability approaching 1 in large samples) and let ˆ D(β) =
n
πˆ i (β)
i=1
∂gi (β) ∂β
ˆ gi (β)) ρ1 (λ(β) πˆ i (β) = n ˆ j=1 ρ1 (λ(β) gj (β))
(i = 1 n) ˆ j (β) of D(β) ˆ For the CUE, the jth column D will be taken to be ∂gi (β) 1 ∂gi (β) −1 ˆ ˆ j (β) = 1 ˆ D − gi (β) Ω(β) g(β) n i=1 ∂βj n i=1 ∂βj n
n
ˆ = D( ˆ β) ˆ is an efficient estimator of G = E[∂gi (β0 )/∂β] like that In general, D considered by Brown and Newey (1998). Also let ˆ β) ˆ Ωˆ = Ω(
ˆ β) ˆ ∂2 Q( Hˆ = ∂β ∂β
The estimator of the asymptotic variance of βˆ is Vˆ /n, where ˆ Ωˆ −1 D ˆ Hˆ −1 Vˆ = Hˆ −1 D When m is fixed and identification is strong (i.e., under “textbook” asymptotp p ˆ −→ ˆ −→ ˆ β) 0 so that D G and hence ics), Vˆ will be consistent. In that case, g( p −1 −1 ˆ V −→ (G Ω G) the textbook GMM asymptotic variance. The virtue of Vˆ is that it also is consistent under the alternative, many weak moment asymptotics (when normalized appropriately). Under the alternative asymptotics the asymptotic variance of βˆ has a ˆ Ωˆ −1 G, ˆ where G ˆ = “sandwich” form that is estimated by Vˆ /n. The matrix G −1 ˆ Ωˆ G ˆ has a bias. This ˆ ˆ β)/∂β, ∂g( cannot be used in place of Hˆ in Vˆ because G ˆ ˆ −1 ˆ 2 ˆ ˘ ˆ but bias can be removed by using H = i=j Gi Ω Gj /n for Gi = ∂gi (β)/∂β, we do not consider this further because it did not work well in trial simulations. ˆ Ωˆ −1 D ˆ in Vˆ estimates a different, larger object than H. ˆ It is The middle term D ˆ an estimator of the asymptotic variance of ∂Q(β0 )/∂β under weak identification introduced by Kleibergen (2005) for CUE and Guggenberger and Smith (2005) for other GEL objective functions. They showed that this estimator can be used to construct a test statistic under weak identification with fixed m. Here we give conditions for consistency of a properly normalized version of Vˆ when m is allowed to grow with the sample size. The jackknife GMM estimator is ob-
GMM WITH WEAK MOMENTS
693
tained by deleting “own observation” terms from the double sum that makes up the two-step GMM estimator, as ˆ β) ˘ ˘ ˙ −1 Q(β) = β˘ = arg min Q(β) gi (β) W˘ gj (β)/2n2 W˘ = Ω( β∈B
i=j
where β˙ is a preliminary jackknife GMM estimator based on a known choice of W˘ (analogous to two-step optimal GMM). For example, consider the linear model and let P˘ij = Zi W˘ Zj Here the jackknife GMM estimator is β˘ =
i=j
P˘ij xi xj
−1
P˘ij xi yj
i=j
This estimator is a generalization of the second jackknife instrumental variables estimator (JIVE2) of Angrist, Imbens, and Krueger (1999) to allow a general weighting matrix W˘ . ˆ β) ˘i = ˘ G To describe the variance estimator for jackknife GMM, let Ω˘ = Ω( n ˘ ˘ ˘ ˘ Gi (β), g˘ i = gi (β), and G = i=1 Gi /n. Also let ˘ Ω˘ −1 G ˘ j /n2 Λ˘ J = ˘ Ω˘ −1 g˘ i g˘ Ω˘ −1 G ˘ i /[n2 (n − 1)] H˘ = G G i j j i=j
i=j
The estimator of the asymptotic variance of β˘ is V˘ /n, where ˘ Ω˘ −1 G ˘ + Λ˘ J )H˘ −1 V˘ = H˘ −1 (G This has a sandwich form like Vˆ , with a jackknife estimator H˘ of H rather than the Hessian Hˆ and an explicit adjustment term Λ˘ J for many moments. V˘ will be consistent under both standard and many weak moment asymptotics, though we do not show this result here. The many moment bias of two-step GMM with nonrandom Wˆ has a quite simple explanation that motivates CUE, GEL, and jackknife GMM. This explanation is also valid under many weak moments with a random Wˆ , because estimation of Wˆ does not affect the limiting distribution. The absence of weighting matrix effects from many weak moment asymptotics indicates these asymptotics may not be a good approximation for minimum distance settings like those of Altonji and Segal (1996), where estimation of the weighting matrix is important. Following Han and Phillips (2006), the bias is explained by the fact that the expectation of the objective function is not minimized at the truth. Since the objective function will be close to its expectation in large samples, the estimator will tend to be close to the minimum of the expectation, leading to bias.
694
W. K. NEWEY AND F. WINDMEIJER
When Wˆ equals a nonrandom matrix W the expectation of the GMM objective function is ˆ ˆ E[g(β) (2) W g(β)/2] =E gi (β) W gj (β) i=j
+
n
gi (β) W gi (β)
2n2
i=1 ¯ ¯ = (1 − n−1 )g(β) W g(β)/2 + E[gi (β) W gi (β)]/2n ¯ ¯ = (1 − n−1 )g(β) W g(β)/2 + tr(W Ω(β))/2n ¯ ¯ The term (1 − n−1 )g(β) W g(β) is a “signal” term that is minimized at β0 . The second term is a bias (or “noise”) term that generally does not have zero derivative at β0 (and hence is not minimized at β0 ) when Gi is correlated with gi , for example, when endogeneity is present in the linear model. Also, when Gi and gi are correlated, the second term generally increases in size with the number of moments m. This increasing bias term leads to inconsistency of the two-step GMM estimator under many weak moments, as shown by Han and Phillips (2006). This bias also corresponds to the higher order bias term BG in Newey and Smith (2004) that is important with endogeneity. One way to remove this bias is to choose W so the bias does not depend on β. Note that if W = Ω(β)−1 then the bias term becomes tr(W Ω(β))/2n = tr(Ω(β)−1 Ω(β))/2n = m/2n which does not depend on β. A feasible version −1 ˆ , leading to the objective funcof this bias correction is to choose Wˆ = Ω(β) tion
(3)
ˆ ˆ ˆ ˆ Q(β) = g(β) Ω(β)−1 g(β)/2
=
−1 ˆ gi (β) Ω(β) gj (β)/2n2 +
i=j
=
n
−1 ˆ gi (β) Ω(β) gi (β)/2n2
i=1 −1 ˆ gi (β) Ω(β) gj (β)/2n2 + m/2n
i=j
ˆ The estimator βˆ = arg minβ∈B Q(β) that minimizes this objective function is the CUE. It is interesting to note that it also has a jackknife GMM form. Another way to remove the bias is to simply subtract an estimator ˆ tr(Wˆ Ω(β))/2n of the bias term from the GMM objective function, giving ˆ ˘ ¨ Q(β) = Q(β) − tr(Wˆ Ω(β))/2n =
[gi (β) Wˆ gj (β)]/2n2 i=j
GMM WITH WEAK MOMENTS
695
giving the jackknife GMM objective function. The corresponding estimator will be consistent under many weak moment asymptotics because the own observation terms are the source of the bias in equation (2). In what follows we will focus most of our attention on the GEL estimators. As shown below, when Wˆ is optimal in the usual GMM sense, the GEL estimators will be asymptotically more efficient than the jackknife GMM estimators under many weak moments. They are also inefficient relative to GEL in our Monte Carlo study, giving us further reason for our GEL focus. 3. LARGE SAMPLE INFERENCE As shown by Dufour (1997) in linear models, if the parameter set is allowed to include values where the model is not identified, then a correct confidence interval for a structural parameter must be unbounded with positive probability. Hence, bounded confidence intervals, such as Wald intervals formed in the usual way from Vˆ , cannot be correct. Also, under the weak identification sequence of Stock and Wright (2000), the Wald confidence intervals will not be correct, that is, the new variance estimator is not robust to weak identification. These observations motivate consideration of statistics that are asymptotically correct with weak or many weak moment conditions. One identification robust statistic proposed by Stock and Wright (2000) is a GMM version of the Anderson–Rubin (AR) statistic. For the null hypothesis H0 : β0 = β, where β is known, the GEL version of this statistic, as given by Guggenberger and Smith (2005), is (4)
ˆ AR(β) = 2nQ(β)
Under the null hypothesis and weak identification, or many weak moments, treating this as if it were distributed as χ2 (m) will be asymptotically correct. As a result we can form a joint confidence interval for the vector β by inverting AR(β). Specifically, for the 1 − α quantile qαm of a χ2 (m) distribution, an asymptotic 1 − α confidence interval for β will be {β : AR(β) ≤ qαm }. This confidence interval will be valid under weak identification and under many weak moments. However, there are other confidence intervals that have this property but are smaller in large samples, thus producing more accurate inference. One of these is the Kleibergen (2005) and Guggenberger and Smith (2005) Lagrange multiplier (LM) statistic for GEL. For the null hypothesis H0 : β0 = β, where β is known, the LM statistic is (5)
LM(β) = n
ˆ ˆ ∂Q(β) ˆ −1 ∂Q(β) ˆ ˆ Ω(β)−1 D(β)] [D(β) ∂β ∂β
Under the null hypothesis and weak identification or many weak moments, this statistic will have a χ2 (p) limiting distribution. As a result, we can form
696
W. K. NEWEY AND F. WINDMEIJER
joint confidence intervals for the vector β0 by inverting LM(β). Specifically, for the 1 − α quantile qαp of a χ2 (p) distribution, an asymptotic 1 − α confidence interval is {β : LM(β) ≤ qαp }. These confidence intervals are also correct in the weak identification setting of Stock and Wright (2000). Kleibergen (2005) also proposed a GMM analog of the conditional likelihood ratio (CLR) test of Moreira (2003), motivated by the superior performance of the analogous CLR statistic, relative to LM, in the linear homoskedastic model. Smith (2006) extended this statistic to GEL. Here we consider one version. ˆ Let R(β) be some statistic which should be large if the parameters are identiˆ fied and small if not and is random only through D(β) asymptotically. Kleibergen (2005) suggested using a statistic of a null hypothesis about the rank of ˆ ˆ D(β). We consider a simple choice of R(β) given by ˆ ˆ ˆ ˆ Ω(β)−1 D(β)) R(β) = nξmin (D(β)
where ξmin (A) denotes the smallest eigenvalue of A. A version of the GELCLR statistic is (6)
CLR(β) =
1 ˆ AR(β) − R(β) 2 1/2
2 ˆ ˆ + 4 LM(β)R(β) + (AR(β) − R(β))
Under the null hypothesis H0 : β0 = β, a level α critical value qˆ α (β) for this test statistic can be simulated. Let (qsm−p qsp ), s = 1 S, be i.i.d. draws (independent from each other and over s) of χ2 (m − p) and χ2 (p) random variables. Let qˆ α (β) be the 1 − α quantile of
1 m−p ˆ + qsp − R(β) q 2 s
m−p 1/2 p 2 p ˆ ˆ ; s = 1 S + (qs + qs − R(β)) + 4qs R(β) An asymptotic 1 − α confidence interval can then be formed as {β : CLR(β) ≤ qˆ α (β)}. These confidence intervals will be correct under weak identification and also under many weak moment conditions. ˆ Another test statistic of interest is the overidentification statistic AR(β). This statistic is often used to test all the overidentifying restrictions associated with the moment conditions. Under a fixed number of moment conditions, this statistic converges in distribution to χ2 (m − p) and the critical value for this distribution remains valid under many weak moments. Thus, it will be the case ˆ > qαm−p ) −→ α. that Pr(AR(β)
GMM WITH WEAK MOMENTS
697
In addition to these statistics, Hansen, Heaton, and Yaron (1996) considered the likelihood ratio (LR) statistic corresponding to the CUE. For GEL this statistic takes the form ˆ ˆ β)] ˆ LR(β) = 2n[Q(β) − Q( As discussed in Stock and Wright (2000), this statistic does not have a chisquared limiting distribution under weak identification; we show that it also does not under many weak moments. We find that the critical value for a chi-squared distribution leads to overrejection, so that the confidence interval based on this statistic is too small. Under local alternatives and many weak moments, one could compare the power of some of these test statistics as a test of H0 : β = β0 . The Wald statistic is Tˆ = n(βˆ − β0 ) Vˆ −1 (βˆ − β0 ). We will show that there is a bounded sequence {cn } with cn bounded positive such that LM(β0 ) = Tˆ + op (1)
CLR(β0 ) = cn Tˆ + op (1)
Thus, the Wald test based on Tˆ will be asymptotically equivalent under the null hypothesis and contiguous alternatives to the LM and CLR tests. The implied asymptotic equivalence of LM and CLR is a GMM version of a result of Andrews and Stock (2006). In contrast, a test based on AR(β0 ) will have asymptotic local power equal to size, because its degrees of freedom goes to infinity. However these comparisons do not hold up under weak identification. No power ranking of these statistics is known in that case. The new variance estimator seems useful despite the lack of robustness to weak instruments. Standard errors are commonly used in practice as a measure of uncertainty associated with an estimate. Also, for multidimensional parameters the confidence intervals based on the LM or CLR are more difficult to compute. Confidence ellipses can be formed in the usual way from βˆ and Vˆ , while LM or CLR confidence sets need to be calculated by an exhaustive grid search. Furthermore, the conditions for an accurate many weak moment approximation seem to occur often in applications, as further discussed below. For all these reasons, the standard errors given here seem useful for econometric practice. It does seem wise to check for weak moments in practice. One could develop GMM versions of the Hahn and Hausman (2004) and/or Stock and Yogo (2005b) tests. One could also compare a Wald test based on the corrected standard errors with a test based on an identification robust statistic. 4. MANY WEAK MOMENT APPROXIMATION As always, asymptotic theory is meant to provide an approximation to the distribution of objects of interest in applications. The theory and Monte Carlo
698
W. K. NEWEY AND F. WINDMEIJER
results below indicate that many weak moment asymptotics, applied to βˆ and Vˆ , should provide an improvement in (A) overidentified models where (B) the variance of the Jacobian of the moment functions is large relative to its average and (C) the parameters are quite well identified. Condition (B) is often true in IV settings, tending to hold when reduced form R2 s are low. Condition (C) is also often true in IV settings (e.g., see the brief applications survey in Hansen, Hausman, and Newey (2008)). The many weak moment asymptotics will not provide an improved approximation in minimum distance settings where g(w β) = g1 (w) − g2 (β). In that setting ∂gi (β0 )/∂β is constant, so that condition (B) will not hold. In fact, the asymptotic variance under many weak moments will be the same as the usual variance. Conditions (A), (B), and (C) are simultaneously imposed in the asymptotics, where (i) m grows, (ii) some components of G Ω−1 G go to zero, so that the variance of ∂gi (β0 )/∂β is large relative to G, and (iii) nG Ω−1 G grows, so that the parameters are identified. The following specific condition incorporates each of (i), (ii), and (iii). ASSUMPTION 1: (i) There is a p × p matrix Sn = S˜ n diag(μ1n μpn ) such that S˜n is bounded, the smallest eigenvalue of S˜n S˜n is bounded away from zero, for √ √ each j either μjn = n or μjn / n −→ 0, μn = min1≤j≤p μjn −→ ∞, and m/μ2n is bounded. (ii) nSn−1 G Ω−1 GSn−1 −→ Hand H is nonsingular. This assumption allows for linear combinations of β to have different degrees of identification, similarly to Hansen, Hausman, and Newey (2008). For example, when a constant is included one might consider the corresponding reduced √ form coefficient to be strongly identified. This will correspond to = n For less strong identification μjn will be allowed to grow slower than μ jn √ n This condition is a GMM version of one of Chao and Swanson (2005) for IV. It generalizes Han and Phillips (2006) to allow μjn to differ across j. The linear model of equation (1) is an example. Suppose that it has reduced form and instruments given by xi = (z1i x2i )
μn x2i = π21 z1i + √ z2i + η2i n
Zi = (z1i Z2i )
where z1i is a p1 × 1 vector of included exogenous variables, z2i is a (p − p1 ) × 1 vector of excluded exogenous variables, and Z2i is an (m − p1 ) × 1 vector of instruments. This specification allows for constants in the structural equation and reduced form by allowing an element of z1i to be 1. The variables z2i may not be observed by the econometrician. For example, we could have z2i = f0 (wi ) for a vector of underlying exogenous variables wi and an unknown vector of functions f0 (w). In this case the instrument vector could be Zi = (z1i p1m−p1 (wi ) pm−p1 m−p1 (wi )) , where pjm−p1 (wi ) (j = 1 m − p1 )
GMM WITH WEAK MOMENTS
699
are approximating functions such as power series or splines. In this case the model is like in Newey (1990), except that the coefficient of the unknown function f0 (wi ) goes to zero to model weaker identification. To see how Assumption 1 is satisfied in this example, let
√ n j = 1 p1 , 0 Ip1 ˜ μjn = Sn = π21 Ip−p1 μn j = p1 + 1 p. Then for zi = (z1i z2i ) the reduced form is z1i √ μ = Sn zi / n Υi = π z + n z √ 21 1i 2i n √ G = −E[Zi Υi ] = −E[Zi zi ]Sn / n Assume that zi and Zi are functions of some variables z˜ i , and let σi2 = E[εi2 |˜zi ] > 0 and zi∗ = zi /σi2 . Then nSn−1 G Ω−1 GSn−1 = E[zi Zi ]Ω−1 E[Zi zi ] = E[σi2 zi∗ Zi ](E[σi2 Zi Zi ])−1 E[σi2 Zi zi∗ ] The expression following the second equality is the mean square error of a linear projection of zi∗ on Zi weighted by σi2 . Therefore, if linear combinations of Zi can approximate zi∗ , that is, if there is πm such that limm−→∞ E[σi2 zi∗ − πm Zi 2 ] = 0, then nSn−1 G Ω−1 GSn−1 −→ E[σi2 zi∗ zi∗ ] = E[σi−2 zi zi ] Then it suffices for Assumption 1 to assume that E[σi−2 zi zi ] is nonsingular. Asymptotic normality will lead to different convergence rates for linear combinations of the coefficients. In the linear model example just considered, where β = (β1 β2 ) it will be the case that √ n[(βˆ 1 − β1 ) + π21 (βˆ 2 − β2 )] Sn (βˆ − β) = μn (βˆ 2 − β2 ) is jointly asymptotically normal. Thus, the coefficients βˆ 2 of the endogenous variables converge at rate 1/μn but the coefficients of included exogenous vari√ ables βˆ 1 need not converge at rate 1/ n. Instead, it is the linear combination √ ˆ β2 is the coefficient βˆ 1 + π21 β2 that converges at rate 1/ n. Note that β1 + π21 it is the reduced form coefof z1i in the reduced form equation for yi . Thus, √ ficient that converges to the truth at rate 1/ n. In general, all the structural coefficients may converge at the rate 1/μn . In that case the asymptotic variance matrix of μn (βˆ − β0 ) will be singular with rank equal to p2 . Wald tests of up
700
W. K. NEWEY AND F. WINDMEIJER
to p2 linear combinations can still have the usual asymptotic distribution, but tests of more than p2 linear combinations would need to account for singularity of the asymptotic variance of μn (βˆ − β0 ) The many weak moment asymptotic variance is larger than the usual one when m grows at the same rate as μ2n , for example, when μ2n = m. In the linear model this corresponds to a reduced form √ m x2i = π21 z1i + √ z2i + ηi2 n This sequence of models is a knife-edge case where the additional variance due to many instruments is the same size as the usual variance. If μ2n grew faster than m, the usual variance would dominate, while if μ2n grew slower than m, the additional term would dominate in the asymptotic variance. The case with μ2n growing slower than m is ruled out by Assumption 1 but is allowed in some work on the linear model, for example, see Chao and Swanson (2004) and Hansen, Hausman, and Newey (2008). One specification where μ2n and m grow at the same rate has
m−p1
z2i = C
√ Z2ij / m
E[Z2i Z2i ] = Im−p1
j=1
where C is an unknown constant. In that case the reduced form is C √ Z2ij + ηi2 n j=1
m−p1
x2i = π21 z1i +
This is a many weak instrument specification like that considered by Chao and Swanson (2004, 2005). Despite the knife-edge feature of these asymptotics, we find in simulations below that using the asymptotic variance estimate provides greatly improved approximation in a wide range of cases. Given these favorable results one might expect that the new variance estimator provides an improved approximation more generally than just when m grows at the same rate as μ2n . Hansen, Hausman, and Newey (2008) found such a result for the Bekker (1994) variance in a homoskedastic linear model, and the new variance here extends that to GEL and heteroskedasticity, so we might expect a similar result here. Showing such a result is beyond the scope of this paper though we provide some theoretical support for the linear model example in the next section. 5. ASYMPTOTIC VARIANCES To explain and interpret the results, we first give a formal derivation of the asymptotic variance for GEL and jackknife GMM. We begin with jackknife
GMM WITH WEAK MOMENTS
701
GMM because it is somewhat easier to work with. The usual Taylor expansion ˘ β)/∂β ˘ of the first-order condition ∂Q( = 0 gives ˘ 0 )/∂β Sn (β˘ − β0 ) = −H¯ −1 nSn−1 ∂Q(β ˘ β)/∂β ¯ ∂β Sn−1 H¯ = nSn−1 ∂2 Q( where β¯ is an intermediate value for β, being on the line joining β˘ and β0 (that ¯ Under regularity conditions it will be actually differs from row to row of H). the case that p H¯ −→ HW = lim nSn−1 G W GSn−1 n−→∞
where we assume that Wˆ estimates a matrix W in such a way that the remainders are small and that the limit of nSn−1 G W GSn−1 exists. The asymptotic distribution of Sn (β˘ − β) will then equal the asymptotic distribution of ˘ 0 )/∂β. −HW−1 nSn−1 ∂Q(β The estimation of the weighting matrix will not affect the asymptotic distribution, so that differentiating the jackknife GMM objective function and replacing Wˆ with its limit W gives ˘ 0 )/∂β = nSn−1 ∂Q(β Sn−1 Gi W gj /n + op (1) i=j
√ √ ˆ 0) = (1 − n−1 ) nSn−1 G W ng(β + ψJij /n + op (1) j
ψJij = Sn−1 (Gj − G) W gi + Sn−1 (Gi − G) W gj where the second equality holds by adding and subtracting G to Gi . The √ √ ˆ 0 ) term is the usual GMM one, having asymptotic varinSn−1 G W ng(β −1 −1 ance HJ Ω = limn−→∞ nSn G W ΩW GSn assumed to exist. The other term j
702
W. K. NEWEY AND F. WINDMEIJER
= Sn−1 E[Gj W ΩW Gj ] − G W ΩW G + E[Gj W gi gj W Gi ] Sn−1 This limit is equal to ΛJ = lim E[ψJij ψJ ij ]/2 n−→∞
= lim Sn−1 (E[Gj W ΩW Gj ] + E[Gj W gi gj W Gi ])Sn−1 n−→∞
The U-statistic term is uncorrelated with the usual GMM term, so by the cend ˘ 0 )/∂β −→ tral limit theorem, nS −1 ∂Q(β N(0 HΩ + ΛJ ) It then will follow that n
d Sn (β˘ − β0 ) −→ N(0 VJ )
VJ = HW−1 HΩ HW−1 + HW−1 ΛJ HW−1
a result that was previously derived for the JIVE2 estimator by Chao and Swanson (2004). For GEL we will focus on the asymptotic variance of the CUE because the explicit form of the CUE simplifies the discussion. The other GEL estimaˆ tors will have the same asymptotic variance, essentially because Q(β) will be ˆ quadratic in g(β) near β0 . To derive the CUE asymptotic variance we expand the first-order conditions similarly to jackknife GMM. That gives an analogous expression for Sn (βˆ − β0 ) ˆ with the CUE objective function Q(β) replacing the jackknife GMM objecp −1 2 ˆ ¯ ˘ tive Q(β). It will turn out that nSn ∂ Q(β)/∂β ∂β Sn−1 −→ H from Assumption 1, so that the Hessian term is the same for the CUE as for jackknife GMM. However, the other term in the variance will be different. To derive it, recall the definitions of Bj and Ui from Section 2, and note that the columns of Ui are the population residuals from least squares regression of columns of Gi − G on gi . Assuming we can differentiate under the integral we have ∂Ω(β0 )−1 −1 ∂Ω(β0 ) Ω−1 = −Bj Ω−1 − Ω−1 Bj = −Ω ∂βj ∂βj Then differentiating the CUE objective function with Ω(β)−1 replacing −1 ˆ Ω(β) we have nSn−1
−1 ˆ 0) ˆ ˆ ˆ 0 ) Ω(β)−1 g(β ˆ 0 )}|β=β0 Ω g(β) + g(β ∂Q(β ∂ {g(β) = nSn−1 ∂β ∂β 2
1 (G + Ui ) Ω−1 gj =S n ij=1 n
−1 n
703
GMM WITH WEAK MOMENTS
√ √ ˆ 0) + = nSn−1 G Ω−1 ng(β
j
ψij
n
−1 n
+S
n U Ω−1 gi i
i=1
n
ψij = Sn−1 (Uj Ω−1 gi + Ui Ω−1 gj ) By the projection residual form of Ui , each component of Ui is uncorrelated with every component of gi . Then by the law of large numbers, n p Sn−1 i=1 Ui Ω−1 gi /n −→ 0. Also note that E[ψij ψij ]/2 = Sn−1 E[Ui Ω−1 Ui ]Sn−1 . d ˆ 0 )/∂β −→ It then follows similarly to the jackknife GMM that nSn−1 ∂Q(β N(0 H + Λ) Λ = limn−→∞ Sn−1 E[Ui Ω−1 Ui ]Sn−1 . Then it follows that d Sn (βˆ − β0 ) −→ N(0 V )
V = H −1 + H −1 ΛH −1
We now show that GEL is asymptotically efficient relative to the jackknife GMM, that is, that V ≤ VJ in the positive semidefinite sense, when the jackknife GMM has W = Ω−1 . Let ij = ψJij − ψij . Under W = Ω−1 each element of ij depends on the data only through (1 gi ) (1 gj ). Therefore, by each element of Ui uncorrelated with every component of gi it follows that E[ψij ij ] = 0. Therefore we have E[ψJij ψJ ij ] = E[(ψij + ij )(ψij + ij ) ]
= E[ψij ψij ] + E[ij ij ] ≥ E[ψij ψij ] so that 1 1 E[ψij ψij ] ≤ lim E[ψJij ψJ ij ] = ΛJ n−→∞ 2 2 n−→∞
Λ = lim Thus we have
V = H −1 + H −1 ΛH −1 ≤ H −1 + H −1 ΛJ H −1 = VJ showing the asymptotic efficiency of GEL relative to a jackknife GMM estimator with W = Ω−1 . The linear model provides an example of the asymptotic variance, where, from the earlier notation, Bj = −Ω−1 E[Zi Zi ηij εi ]
j
Ui = −Zi Υij + E[Zi Υij ] + uij
uij = −Zi ηij + Bj Zi εi
√ p Then for ui = [u1i ui ] we have, by Υi = Sn zi / n, Sn−1 E[Ui Ω−1 Ui ]Sn−1 = Sn−1 E[ui Ω−1 ui ]Sn−1
+ E {Zi zi − E[Zi zi ]} Ω−1 {Zi zi − E[Zi zi ]} /n
704
W. K. NEWEY AND F. WINDMEIJER
The second term will be small as long as m grows slowly enough relative to n (when Zij is uniformly bounded, m/n −→ 0 will suffice) so that Λ = lim Sn−1 E[ui Ω−1 ui ]Sn−1 n−→∞
For instance, in the homoskedastic case where E[εi2 |Zi ] = σε2 , E[ηi ηi |Zi ] = Ση , and E[εi ηi |Zi ] = σηε , we have ui = −Zi (ηi − σηε εi /σε2 ), so that Sn−1 E[ui Ω−1 ui ]Sn−1 = Sn−1 E[(ηi − σηε εi /σε2 )(ηi − σηε εi /σε2 ) Zi Ω−1 Zi ]Sn−1 = Sn−1 (Ση − σηε σηε /σε2 )E[Zi (σε2 I)−1 Zi ]Sn−1 = mSn−1 (σε2 Ση − σηε σηε )Sn−1 /σε4
Then, assuming E[zi Zi ]E[Zi zi ] −→ E[zi zi ] = σε2 H and ymptotic variance matrix for Sn (βˆ − β0 ) will be
√
mSn−1 −→ S0 , the as-
V = H −1 + H −1 S0 (σε2 Ση − σηε σηε )S0 H −1 /σε4
This variance for GEL is identical to the asymptotic variance of LIML under many weak instrument asymptotics derived by Stock and Yogo (2005a). Thus we find that in the linear homoskedastic model, GEL and LIML have the same asymptotic variance under many weak moment asymptotics. As shown by Hansen, Hausman, and Newey (2008), the Bekker (1994) standard errors are consistent under many weak instruments, so that Sn Vˆ Sn /n will have the same limit as the Bekker standard errors in a homoskedastic linear model. Since Sn Vˆ Sn /n will also be consistent with heteroskedasticity, one can think of Vˆ as an extension of the Bekker (1994) variance estimator to GEL with heteroskedasticity. It is interesting to compare the asymptotic variance V of the CUE with the usual asymptotic variance formula H −1 for GMM. When m grows slower than μ2n or ∂gi (β0 )/∂β is constant, V = H −1 , but otherwise the variance here is larger than the standard formula. For further comparison we consider a corresponding variance approximation Vn for βˆ for a sample of size n. Replacing H by Hn = nSn−1 G Ω−1 GSn−1 and Λ by Λn = Sn−1 E[Ui Ω−1 Ui ]Sn−1 , premultiplying by (Sn )−1 , and postmultiplying by Sn−1 gives the variance approximation for sample size n of Vn = Sn−1 (Hn−1 + Hn−1 Λn Hn−1 )Sn−1 = (nG Ω−1 G)−1 + (nG Ω−1 G)−1 E[Ui Ω−1 Ui ](nG Ω−1 G)−1 = n−1 (G Ω−1 G)−1 + (G Ω−1 G)−1 (E[Ui Ω−1 Ui ]/n)(G Ω−1 G)−1
GMM WITH WEAK MOMENTS
705
The usual variance approximation for βˆ is (G Ω−1 G)−1 /n. The approximate variance Vn includes an additional term which can be important in practice. When Var(Ω−1/2 ∂gi (β0 )/∂β) is large relative to G Ω−1 G (condition (B) of Section 4), E[Ui Ω−1 Ui ] may be very large relative to G Ω−1 G, leading to the additional term being important, even when n is large. It is interesting to note that the usual term is divided by n and the additional term is divided by n2 . In asymptotic theory with fixed m this makes the additional term a “higher order” variance term. Indeed, by inspection of Donald and Newey (2003), one can see that the additional term corresponds to a higher order variance term involving estimation of G. There are also additional higher order terms that come from the estimation of the weight matrix, but the Jacobian term dominates when identification is not strong. For example, in the linear homoskedastic example, suppose that E[εi3 |Zi ] = 0 and E[εi4 |Zi ] = E[εi4 ], and let κ = E[εi4 ]/(E[εi2 ])2 . For An = E[zi Zi ]E[Zi zi ] the higher order variance approximation for GEL from Donald and Newey (2003) is 2 −1 2 −1 Vn = σε2 A−1 n /n + (m/n)σε An (Ση − σηε σηε /σε )An /n 2 −1 2 + [(5 − κ) + ρ3 (0)(3 − κ)]σε2 A−1 n E[Zi Zi Υi ]An /n
The last term corresponds to weight matrix estimation and will tend to be small when Υi is small, as it is under the asymptotics we consider. In this sense the many weak moment asymptotics accounts well for variability of the derivative of the moment conditions, but takes no account of variability of the weight matrix. Also, it is interesting to note that this last term will be asymptotically small relative to the second term even when m does not grow at the same 2 2 rate as √ μn . For example, if zi is bounded and √ μjn = μn for each j, then Υi ≤ Cμn / n so as long as μn grows slower than n the third (weight matrix) term will be small relative to the second (Jacobian) term. Here the new variance estimator corresponds to the higher order variance, showing that it provides an improved variance approximation more generally than in the knife-edge case where m and μ2n grow at the same rate. 6. LARGE SAMPLE THEORY We give results for GEL, leaving a precise treatment of jackknife GMM to another paper. It is helpful to strengthen Assumption 1. For a matrix A, let A = trace(A A)1/2 denote its Euclidean norm, and for symmetric A, let ξmin (A) and ξmax (A) denote its smallest and largest eigenvalues, respectively. Also, let δ(β) = Sn (β − β0 )/μn where we suppress an n subscript on δ(β) for notational convenience. ASSUMPTION 2: (i) Assumption 1(i) is satisfied. (ii) There is C > 0 with √ ˆ ¯ δ(β) ≤ C ng(β)/μ n for all β ∈ B. (iii) There is C > 0 and M = Op (1)
706
W. K. NEWEY AND F. WINDMEIJER
√ ˆ ˆ such that with probability approaching 1, δ(β) ≤ C ng(β)/μ n + M for all β ∈ B. This condition implies global identification of β0 . We also need conditions ˆ on convergence of g(β), as imposed in the following condition. ASSUMPTION 3: gi (β) is continuous in β and there is C > 0 such that (i) supβ∈B E[{gi (β) gi (β)}2 ]/n −→ 0; (ii) 1/C ≤ ξmin (Ω(β)) and ξmax (Ω(β)) ≤ p ˆ ˜ − Ω(β)]b| ≤ C for all β ∈ B; (iii) supβ∈B Ω(β) − Ω(β) −→ 0; (iv) |a [Ω(β) ˜ β ∈ B; (v) for every C˜ > 0 there is C and Cabβ˜ − β for all a b ∈ m β ˜ δ(β) ≤ C, ˜ √ng( ˆ = Op (1) such that for all β ˜ β ∈ B δ(β) ˜ ≤ C, ˜ − ¯ β) M √ ˆ ˜ ˜ ˜ ˆ β) − g(β)/μ ˆ ¯ ng( g(β)/μ n ≤ Cδ(β − β), and n ≤ Mδ(β − β). These conditions restrict the rate at which m can grow with the sample size. If E[gij (β)4 ] is bounded uniformly in j m, and β, then a sufficient condition p ˆ for Ω(β) − Ω(β) −→ 0 at each β is that m2 /n −→ 0. Uniform convergence may require further conditions. For GEL estimators other than the CUE we need an additional condition. differenASSUMPTION 4: βˆ is the CUE or (i) ρ(v) is three times continuously √ tiable and (ii) there is γ > 2 such that n1/γ (E[supβ∈B gi (β)γ ])1/γ m/n −→ 0. When βˆ is not the CUE, this condition puts further restrictions on the growth rate of m. If the elements of gi (β) were bounded uniformly in n, then this condition is m2 /n1−2/γ −→ 0, which is only slightly stronger than m2 /n −→ 0 The following is a consistency result for CUE. p THEOREM 1: If Assumptions 2–4 are satisfied, then Sn (βˆ − β0 )/μn −→ 0.
We also give more primitive regularity conditions for consistency for the linear model example. Let η˜ i be a vector of the nonzero elements of ηi and let Σi = E[(εi η˜ i ) (εi η˜ i )|Zi Υi ]. √ ASSUMPTION 5: The linear model holds, Υi = Sn zi / n, and there is a constant C with E[εi4 |Zi Υi ] ≤ C, E[ηi 4 |Zi Υi ] ≤ C, Υi ≤ C, ξmin (Σi ) ≥ 1/C, E[Zi Zi ] = Im , E[(Zi Zi )2 ]/n −→ 0, E[zi 4 ] < C, and either βˆ is the √ CUE or for γ > 2 we have E[|εi |γ |Zi ] ≤ C, E[ηi γ |Zi ] ≤ C, n1/γ (E[Zi γ ])1/γ m/n −→ 0. The conditions put restrictions on the rate at which m can grow with the sample size. If Zij is bounded uniformly in j and m, then these conditions will hold for the CUE if m2 /n −→ 0 (for in that case, E[(Zi Zi )2 ]/n = O(m2 /n) −→ 0) and if m2 /n1−2/γ −→ 0 for other GEL estimators.
GMM WITH WEAK MOMENTS
707
p THEOREM 2: If Assumptions 1 and 5 are satisfied, then Sn (βˆ − β0 )/μn −→ 0.
For asymptotic normality some additional conditions are needed. ASSUMPTION 6: g(z β) is twice continuously differentiable in a neighborhood N of β0 , (E[gi 4 ] + E[Gi 4 ])m/n −→ 0, and for a constant C and j = 1 p ∂Gi (β0 ) ∂Gi (β0 ) ≤ C ξmax E ∂βj ∂βj √ ∂Gi (β0 ) −1 Sn n E ≤ C ∂βj ξmax (E[Gi Gi ]) ≤ C
This condition imposes a stronger restriction on the growth rate of the number of moment conditions than was imposed for consistency. If gij (β0 ) were uniformly bounded, a sufficient condition would be that m3 /n −→ 0. √ ˆ √ p p ˆ 0 )]S −1 −→ ¯ − G(β β) 0, n × ASSUMPTION 7: If β¯ −→ β0 , then n[G( n p −1 ˆ ˆ β)/∂β ¯ [∂G( k − ∂G(β0 )/∂βk ]Sn −→ 0 k = 1 p. This condition restricts how the derivatives of the moments vary with the parameters. It is automatically satisfied in the linear model. For the next assumption let 1 ∂gi (β) gi (β) Ωˆ k (β) = n i=1 ∂βk n
Ωk (β) = E[Ωˆ k (β)]
1 ∂2 gi (β) Ωˆ k (β) = gi (β) n i=1 ∂βk ∂β n
1 ∂gi (β) ∂gi (β) Ωˆ k (β) = n i=1 ∂βk ∂β n
Ωk (β) = E[Ωˆ k (β)] Ωk (β) = E[Ωˆ k (β)]
ASSUMPTION 8: For all β on a neighborhood N of β0 and A equal to p ˆ ˜ − A(β)]b| ≤ Ωk Ωk , or Ωk : (i) supβ∈N A(β) − A(β) −→ 0; (ii) |a [A(β) ˜ Cabβ − β. This condition imposes uniform convergence and smoothness conditions ˆ similar to those already required for Ω(β) and Ω(β) above.
708
W. K. NEWEY AND F. WINDMEIJER
ASSUMPTION 9: βˆ is the CUE√ or (i) there is γ > 2 such that n1/γ × (E[supβ∈B gi (β)γ ])1/γ (m + μn )/ n −→ 0 and (ii) μ2n E[di4 ]/n −→ 0 for di = max max gi (β) ∂gi (β)/∂β ∂2 gi (β)/∂β ∂βj β∈B
j
This condition imposes some additional restrictions on the growth of m and μn . In the primary case of interest where μ2n and m grow at the same rate, then μ2n can be replaced by m in this condition. If βˆ is not the CUE, m3 must grow slower than n. The next condition imposes corresponding requirements for the linear model case. 4 0, and βˆ is the ASSUMPTION 10: The linear model √ holds, mE[Z2i ]/n −→ 1/γ γ 1/γ 4 CUE or n (E[Zi ]) (m + μn )/ n −→ 0 and μn E[Zi ]/n −→ 0.
Under these and other regularity conditions we can show that βˆ is asymptotically normal and that the variance estimator is consistent. Recall the definition of Ui from Section 2. THEOREM 3: If Assumption 1 is satisfied, Sn−1 E[Ui Ω−1 Ui ]Sn−1 −→ Λ and Assumptions 2–4 and 6–9 are satisfied or the linear model Assumptions 1, 5, and 10 are satisfied, then for V = H −1 + H −1 ΛH −1 , d Sn (βˆ − β0 ) −→ N(0 V )
p Sn Vˆ Sn /n −→ V
Furthermore, if there is rn and c ∗ = 0 such that rn Sn−1 c −→ c ∗ , then c (βˆ − β0 ) d −→ N(0 1) c Vˆ c/n This result includes the linear model case. The next result shows that χ2 (m) asymptotic approximation for the Anderson–Rubin statistic is correct. Let qαm be the (1 − α)th quantile of a χ2 (m) distribution. THEOREM 4: If (i) mE[gi 4 ]/n −→ 0 and ξmin (Ω) ≥ C or the linear model holds with E[εi4 |Zi ] ≤ C E[εi2 |Zi ] ≥ C, E[Zi Zi ] = I, and mE[Zi 4 ]/n −→ 0; √ (ii) βˆ is the CUE or there is γ > 2 such that n1/γ E[gi γ ]m/ n −→ 0, then Pr(AR(β0 ) ≥ qαm ) −→ α The last result shows that the Wald, LM, CLR, and overidentification statistics described in Section 3 are asymptotically equivalent and have asymptotically correct level under many weak moments, but that the likelihood ratio does not. Let Tˆ = n(βˆ − β0 ) Vˆ −1 (βˆ − β0 )
GMM WITH WEAK MOMENTS
709
THEOREM 5: If Sn−1 E[Ui Ω−1 Ui ]Sn−1 −→ Λ and either Assumptions 1–4 and 6–9 are satisfied or the linear model Assumptions 1, 5, and 10 are satisfied, d then Tˆ −→ χ2 (p), LM(β0 ) = Tˆ + op (1) ˆ β) ˆ ≥ qαm−p ) −→ α Pr(2nQ( ˆ 0 ) − Q( ˆ β)] ˆ ≥ qαp ≥ α + o(1) Pr 2n[Q(β 2 In addition, if there is C > 0 such that ξmin (μ−2 n Sn HV HSn ) − m/μn > C for all n large enough, then there is a bounded sequence {cn } cn ≥ 0, that is bounded away from zero such that
CLR(β0 ) = cn Tˆ + op (1)
qˆ α (β0 ) = cn qαp + op (1)
and Pr(CLR(β0 ) ≥ qˆ α (β0 )) −→ α. 7. MONTE CARLO RESULTS We first carry out a Monte Carlo simulation for the simple linear IV model where the disturbances and instruments have a Gaussian distribution. The parameters of this experiment are the correlation coefficient ρ between the structural and reduced form errors, the concentration parameter (CP), and the number of instruments m. The data generating process is given by yi = xi β0 + εi xi = zi π + ηi εi = ρηi + 1 − ρ2 vi ηi ∼ N(0 1) CP π= ιm mn
vi ∼ N(0 1)
zi ∼ N(0 Im )
where ιm is an m-vector of ones. The concentration parameter in this design is equal to CP. We generate samples of size n = 200, with values of CP equal to 10, 20, or 35; number of instruments m equal to 3, 10, or 15; values of ρ equal to 03, 05, or 09; and β0 = 0. This design covers cases with very weak instruments. For example, when CP = 10 and m = 15, the first stage F -statistic equals CP /m = 067. Table I presents the estimation results for 10,000 Monte Carlo replications. We report median bias and interquartile range (IQR) of two-stage least
710
W. K. NEWEY AND F. WINDMEIJER
TABLE I SIMULATION RESULTS FOR LINEAR IV MODELa CP = 10
Med Bias
CP = 20
IQR
Med Bias
CP = 35
IQR
Med Bias
IQR
(a) ρ = 03
m=3 2SLS GMM LIML CUE JGMM
00509 00516 −00016 −00031 −00127
0.3893 0.3942 0.4893 0.4963 0.5665
00312 00307 00027 00034 −00123
0.2831 0.2885 0.3184 0.3257 0.3474
00184 00186 00020 00013 −00064
0.2226 0.2233 0.2398 0.2410 0.2495
m = 10 2SLS GMM LIML CUE JGMM
01496 01479 00152 00230 00438
0.3059 0.3153 0.6060 0.6501 0.7242
00967 00956 00006 00002 −00117
0.2486 0.2562 0.3846 0.4067 0.4290
00630 00644 −00001 00007 −00088
0.1996 0.2056 0.2568 0.2762 0.2785
m = 15 2SLS GMM LIML CUE JGMM
01814 01809 00262 00375 00781
0.2645 0.2772 0.6605 0.7178 0.7855
01237 01248 −00024 −00008 −00065
0.2281 0.2397 0.4102 0.4629 0.4769
00839 00846 −00047 −00034 −00104
0.1863 0.1981 0.2729 0.3126 0.3128
(b) ρ = 05
m=3 2SLS GMM LIML CUE JGMM
00957 00961 00053 00082 −00189
0.3720 0.3773 0.4761 0.4773 0.5886
00498 00501 00028 00031 −00227
0.2830 0.2850 0.3219 0.3233 0.3576
00300 00296 00018 00009 −00130
0.2191 0.2210 0.2364 0.2376 0.2514
m = 10 2SLS GMM LIML CUE JGMM
02422 02434 00169 00212 00451
0.2768 0.2900 0.5640 0.6044 0.7086
01603 01606 00025 00045 −00137
0.2302 0.2360 0.3641 0.3851 0.4330
01044 01052 −00016 00035 −00118
0.1910 0.1969 0.2529 0.2676 0.2875
m = 15 2SLS GMM LIML CUE JGMM
03000 03021 00320 00484 01051
0.2492 0.2615 0.6377 0.7039 0.7808
02108 02115 00026 00081 −00027
0.2114 0.2233 0.3920 0.4408 0.4890
01432 01437 −00022 00003 −00122
0.1831 0.1911 0.2718 0.3027 0.3207 (Continues)
711
GMM WITH WEAK MOMENTS TABLE I—Continued CP = 10
CP = 20
Med Bias
IQR
Med Bias
CP = 35
IQR
Med Bias
IQR
(c) ρ = 09
m=3 2SLS GMM LIML CUE JGMM
01621 01614 −00053 −00036 −00536
0.3254 0.3313 0.4490 0.4559 0.6532
00855 00848 −00061 −00038 −00441
0.2601 0.2650 0.3054 0.3094 0.3863
00495 00503 −00046 −00034 −00268
0.2077 0.2106 0.2283 0.2291 0.2613
m = 10 2SLS GMM LIML CUE JGMM
04348 04363 −00036 −00034 00385
0.1984 0.2083 0.4823 0.5184 0.7737
02842 02853 −00057 −00070 −00347
0.1836 0.1896 0.3264 0.3477 0.4890
01870 01856 −00049 −00059 −00259
0.1630 0.1699 0.2391 0.2555 0.3155
m = 15 2SLS GMM LIML CUE JGMM
05333 05333 00018 00066 01186
0.1682 0.1800 0.5117 0.5778 0.7972
03747 03748 −00035 −00013 −00232
0.1588 0.1686 0.3331 0.3705 0.5377
02608 02609 00041 00042 −00182
0.1435 0.1517 0.2391 0.2655 0.3378
a n = 200; β = 0; 10,000 replications. 0
squares (2SLS), GMM, LIML, CUE, and the jackknife GMM estimator, denoted JGMM. The CUE is obtained by a standard iterative minimization routine, taking the minimum obtained from five different starting values of β, {−2 −1 2}. The results for 2SLS and GMM are as expected. They are upward biased, with the bias increasing with the number of instruments, the degree of endogeneity, and a decreasing concentration parameter. LIML and CUE are close to being median unbiased, although they display some small biases, accompanied by large interquartile ranges, when CP = 10 and the number of instruments is larger than 3. JGMM displays larger median biases than LIML and CUE in general, and especially in the very weak instrument case when CP = 10 and m = 15, with this bias increasing with ρ. There is a clear reduction in IQR for LIML, CUE, and JGMM when both the number of instruments and the concentration parameter increase, whereas the biases for 2SLS and GMM remain. As expected, the IQR for JGMM is larger than the IQR for CUE, which in turn is larger than that of LIML. The superior performance of LIML might be expected here and in the Wald tests below, because it is a homoskedastic design and LIML imposes homoskedasticity on the weighting matrix. Doing so is often thought to improve small sample performance in homoskedastic cases.
712
W. K. NEWEY AND F. WINDMEIJER
Table II presents rejection frequencies of Wald tests at the 5% nominal level. The purpose here is to analyze our proposed general methods in this wellunderstood setting. The estimators and standard errors utilized in the Wald tests are the two-step GMM estimator with the usual standard errors (GMM), the Windmeijer (2005) standard errors (GMMC), the continuous updating estimator with the usual standard errors (CUE), and the many weak instruments standard errors (CUEC), and equivalent for JGMM. For purposes of comparTABLE II REJECTION FREQUENCIES OF WALD TESTS FOR LINEAR IV MODELa ρ = 03
ρ = 05
ρ = 09
CP = 10
CP = 20
CP = 35
CP = 10
CP = 20
CP = 35
CP = 10
CP = 20
CP = 35
m=3 2SLS GMM GMMC LIML LIMLC CUE CUEC JGMM JGMMC LM
0.0451 0.0489 0.0468 0.0384 0.0317 0.0744 0.0348 0.1080 0.0217 0.0477
0.0440 0.0492 0.0463 0.0392 0.0329 0.0638 0.0382 0.0734 0.0282 0.0444
0.0477 0.0535 0.0510 0.0428 0.0374 0.0621 0.0418 0.0676 0.0370 0.0440
0.0780 0.0835 0.0806 0.0535 0.0439 0.0902 0.0500 0.1085 0.0366 0.0428
0.0653 0.0674 0.0644 0.0470 0.0415 0.0638 0.0433 0.0724 0.0378 0.0455
0.0593 0.0621 0.0579 0.0446 0.0413 0.0600 0.0429 0.0672 0.0401 0.0446
0.1898 0.1940 0.1818 0.0799 0.0767 0.0967 0.0779 0.1265 0.0708 0.0448
0.1274 0.1312 0.1217 0.0637 0.0625 0.0769 0.0648 0.0769 0.0543 0.0451
0.0969 0.1007 0.0933 0.0556 0.0551 0.0675 0.0564 0.0676 0.0482 0.0459
m = 10 2SLS GMM GMMC LIML LIMLC CUE CUEC JGMM JGMMC LM
0.1148 0.1423 0.1147 0.0812 0.0414 0.3450 0.0587 0.3676 0.0224 0.0398
0.0924 0.1157 0.0910 0.0663 0.0367 0.2277 0.0488 0.2513 0.0327 0.0374
0.0793 0.1001 0.0789 0.0627 0.0392 0.1628 0.0450 0.1686 0.0411 0.0363
0.2500 0.2763 0.2291 0.1015 0.0585 0.3080 0.0770 0.3657 0.0472 0.0345
0.1833 0.2089 0.1683 0.0724 0.0462 0.2026 0.0532 0.2415 0.0473 0.0356
0.1384 0.1635 0.1305 0.0587 0.0423 0.1470 0.0433 0.1629 0.0458 0.0329
0.7315 0.7446 0.7034 0.0937 0.0789 0.2159 0.0833 0.3848 0.1107 0.0334
0.5252 0.5423 0.4850 0.0739 0.0663 0.1462 0.0645 0.2520 0.0747 0.0336
0.3572 0.3847 0.3251 0.0612 0.0571 0.1138 0.0527 0.1698 0.0614 0.0345
m = 15 2SLS GMM GMMC LIML LIMLC CUE CUEC JGMM JGMMC LM
0.1641 0.2056 0.1534 0.0995 0.0393 0.4721 0.0709 0.4668 0.0244 0.0318
0.1339 0.1749 0.1269 0.0894 0.0397 0.3450 0.0637 0.3544 0.0341 0.0317
0.1080 0.1425 0.1008 0.0786 0.0413 0.2535 0.0536 0.2397 0.0420 0.0323
0.4081 0.4547 0.3701 0.1285 0.0594 0.4628 0.1001 0.4810 0.0581 0.0342
0.3037 0.3494 0.2720 0.0935 0.0510 0.3234 0.0701 0.3487 0.0571 0.0337
0.2283 0.2704 0.2034 0.0749 0.0473 0.2376 0.0509 0.2475 0.0531 0.0299
0.9329 0.9388 0.9165 0.1062 0.0827 0.3350 0.0887 0.5054 0.1497 0.0314
0.7935 0.8092 0.7535 0.0788 0.0661 0.2209 0.0665 0.3625 0.0936 0.0271
0.6130 0.6483 0.5663 0.0662 0.0596 0.1712 0.0559 0.2545 0.0744 0.0281
a n = 200; H : β = 0; 10,000 replications, 5% nominal size. 0 0
713
GMM WITH WEAK MOMENTS
ison we also give results for 2SLS and LIML with their usual standard errors and LIML with Bekker (1994) standard errors (LIMLC), and the GEL-LM statistic (LM) as defined in (5). We have also investigated the size properties of the GEL-AR and GEL-CLR statistics as defined in (4) and (6), respectively, and found them in these settings to be very similar to those of the LM statistic. They are therefore not reported separately. The LIML Wald test using the Bekker standard errors (LIMLC) has rejection frequencies very close to the nominal size, correcting the usual asymptotic Wald test which tends to be oversized with an increasing number of instruments. The LM statistic shows a tendency to be undersized with an increasing number of instruments. The results for the rejection frequencies of the Wald test show that even with low numbers of instruments the corrected standard errors for the continuous updating estimator produce large improvements in the accuracy of the approximation. When the instruments are not too weak, that is, when CP = 20 and larger, the observed rejection frequencies are very close to the nominal size for all values of m, whereas those based on the usual asymptotic standard errors are much larger than the nominal size. When we consider the “diagonal” elements, that is, increasing the number of instruments and the concentration parameter at the same time, we see that the CUEC Wald test performs very well in terms of size. Similar improvements are found for the corrected JGMM (JGMMC) Wald test, although this test overrejects more when ρ = 09 and m = 15. We next analyze the properties of the CUE using the many weak instrument asymptotics for estimation of the parameters in a panel data process, generated as in Windmeijer (2005): yit = β0 xit + uit
uit = ηi + vit
xit = γxit−1 + ηi + 05vit−1 + εit vit = δi τt ωit
(i = 1 n t = 1 T ) ηi ∼ N(0 1)
εit ∼ N(0 1)
ωit ∼ (χ21 − 1)
δi ∼ U[05 15]
τt = 05 + 01(t − 1)
Fifty time periods are generated, with τt = 05 for t = −49 0 and xi−49 ∼ N(ηi /(1 − γ) 1/1 − γ 2 ), before the estimation sample is drawn. n = 250, T = 6, β0 = 1, and 10,000 replications are drawn. For this data generating process the regressor xit is correlated with the unobserved constant heterogeneity term ηi and is predetermined due to its correlation with vit−1 . The idiosyncratic shocks vit are heteroskedastic over time and at the individual level, and have a skewed chi-squared distribution. The model parameter β0 is estimated by firstdifferenced GMM (see Arellano and Bond (1991)). As xit is predetermined, the sequential moment conditions used are gi (β) = Zi ui (β)
714
W. K. NEWEY AND F. WINDMEIJER
where ⎡x
i1
⎢ 0 Zi = ⎢ ⎣ 0
0 xi1
0 xi2
··· ···
0 0
··· ···
0 0
⎤ ⎥ ⎥ ⎦
··· ··· 0 0 · · · xi1 · · · xiT −1 ⎤ ⎡ y − βx ⎤ ⎡ i2 i2 ui2 (β) ⎢ ⎥ y − βx (β) u ⎥ ⎢ i3 ⎢ i3 i3 ⎥ ⎥=⎢ ui (β) = ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ uiT (β)
yiT − βxiT
This results in a total of 15 moment conditions in this case, but only a maximum of 5 instruments for the cross section in the last time period. The first two sets of results in Table III are the estimation results for values of γ = 0.40 and γ = 0.85, respectively. When γ = 0.40, the instruments are relatively strong, but they are weaker for γ = 0.85. The reported empirical concentration parameter is an object that corresponds to the reduced form of this panel data model and is equal to 261 when γ = 0.4 and 35 when γ = 0.85. This is estimated simply from the linear reduced form estimated by ordinary least squares, and ignores serial correlation and heteroskedasticity over time. This CP is therefore only indicative and does not play the same role as in the linear homoskedastic IV model. Median bias and interquartile range (IQR) are reported for the standard linear one-step and two-step GMM estimators, the CUE, and the JGMM. When γ = 0.40, median biases are negligible for GMM, CUE, and the JGMM, with comparable interquartile ranges. When γ = 0.85 and the instruments are weaker, the linear GMM estimators are downward biased, whereas the CUE and JGMM are median unbiased but exhibit a larger interquartile range than the linear GMM estimators.

TABLE III
SIMULATION RESULTS FOR PANEL DATA MODEL, N = 250, T = 6

               γ = 0.40 (CP = 261)      γ = 0.85 (CP = 35)       γ = 0.85 (CP = 54)
               Med Bias     IQR         Med Bias     IQR         Med Bias     IQR
GMM1           −0.0082      0.0797      −0.0644      0.2077      −0.0836      0.1743
GMM2           −0.0047      0.0712      −0.0492      0.1952      −0.0608      0.1627
CUE             0.0002      0.0734       0.0010      0.2615      −0.0068      0.2218
JGMM            0.0003      0.0737       0.0018      0.2707      −0.0038      0.2280
Instruments    x_{i,t−1}, ..., x_{i1}   x_{i,t−1}, ..., x_{i1}   x_{i,t−1}, ..., x_{i1}; y_{i,t−2}, ..., y_{i1}
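The p-value plots discussed next (Figures 1 and 2) graph actual against nominal rejection frequencies. A minimal sketch of how such a plot can be produced from Wald statistics simulated under the null is given below; the chi-squared(1) reference distribution and the plotting details are assumptions made only for illustration.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def p_value_plot(wald_stats, df=1, label=None):
    """Empirical rejection frequency of a Wald test over a grid of nominal sizes."""
    nominal = np.linspace(0.005, 0.30, 60)
    crit = stats.chi2.ppf(1.0 - nominal, df)                    # critical values
    actual = (np.asarray(wald_stats)[:, None] > crit[None, :]).mean(axis=0)
    plt.plot(nominal, actual, label=label)
    plt.plot(nominal, nominal, linestyle="--")                  # 45-degree line: correct size
    plt.xlabel("nominal size")
    plt.ylabel("rejection frequency")
    return nominal, actual
```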
FIGURE 1.—p-value plot for H0 : β0 = 1, panel data model when γ = 0.85.
Figure 1 presents p-value plots for the Wald tests for the hypothesis H0 : β0 = 1 when γ = 0.85, based on one-step GMM estimates (WGMM1), on two-step GMM estimates (WGMM2), on the Windmeijer (2005) corrected two-step Wald (WGMM2C), on the CUE using the conventional asymptotic variance (WCUE), on the CUE using the variance estimate V̂ described in Section 2 (WCUEC), and equivalently on the JGMM (WJ and WJC). Further displayed is the p-value plot for the LM statistic. It is clear that the usual asymptotic variance estimates for the CUE and JGMM are too small. This problem is similar to that of the linear two-step GMM estimator, leading to rejection frequencies that are much larger than the nominal size. In contrast, use of the variance estimators under many weak instrument asymptotics leads to rejection frequencies that are very close to the nominal size. The third set of results presented in Table III is for the design with γ = 0.85, but with lags of the dependent variable y_{it} included as sequential instruments (y_{i,t−2}, ..., y_{i1}), additional to the sequential lags of x_{it}. As there is feedback from y_{i,t−1} to x_{it} and as x_{it} is correlated with η_i, the lagged values of y_{it} could improve the strength of the instrument set. The total number of instruments increases to 25, with a maximum of 11 for the cross section in the final period. The empirical concentration parameter increases from 35 to 54. The GMM estimators are more downward biased when the extra instruments are included. The CUE and JGMM are still median unbiased and their IQRs have decreased by 15%. As the p-value plot in Figure 2 shows, use of the proposed variance estimators results in rejection frequencies that are virtually equal to the
FIGURE 2.—p-value plot for H0 : β0 = 1, panel data model with additional instruments and when γ = 0.85.
nominal size. Although WGMM2C had good size properties when using the smaller instrument set, use of the additional instruments leads to rejection frequencies that are larger than the nominal size. When further investigating the size properties of the AR and CLR tests, we find that the behavior of the CLR test is virtually indistinguishable from that of the LM test, whereas the AR test tends to have rejection frequencies that are slightly smaller than the nominal size, especially with the larger instrument set. For the power of these tests in the latter example, we find that the CLR and LM tests have identical power, which is slightly less than that of WCUEC, with the AR test having much lower power.

8. CONCLUSION

We have given a new variance estimator for GEL that is consistent under standard asymptotics and also accounts for many weak moment conditions. This approximation is shown to perform well in a simple linear IV and panel data Monte Carlo analysis. One possible topic for future research is higher order asymptotics when m grows so slowly that the standard asymptotic variance formula is correct. As discussed in the paper, we conjecture that the new variance estimator would provide an improved approximation in a range of such cases. Hansen, Hausman,
and Newey (2008) have shown such a result for the Bekker (1994) variance in the homoskedastic linear model. Another interesting topic is the choice of moment conditions under many weak moment conditions. Donald, Imbens, and Newey (2003) gave a criterion for moment choice for GMM and GEL that is quite complicated. Under many weak moment conditions this criterion should simplify. It would be useful in practice to have a simple criterion for choosing the moment conditions. A third topic for future research is the extension of these results to dependent observations. It appears that the variance estimator for the CUE would be the same except that Ω̂ would include autocorrelation terms. It should also be possible to obtain similar results for GEL estimators based on time smoothed moment conditions, like those considered in Kitamura and Stutzer (1997).

REFERENCES

ALTONJI, J., AND L. M. SEGAL (1996): "Small Sample Bias in GMM Estimation of Covariance Structures," Journal of Business and Economic Statistics, 14, 353–366. [688,693]
ANDREWS, D. W. K., AND J. H. STOCK (2006): "Inference With Weak Instruments," in Advances in Economics and Econometrics, Vol. 3, ed. by R. Blundell, W. Newey, and T. Persson. Cambridge, U.K.: Cambridge University Press. [689,697]
ANGRIST, J., AND A. KRUEGER (1991): "Does Compulsory School Attendance Affect Schooling and Earnings," Quarterly Journal of Economics, 106, 979–1014. [687]
ANGRIST, J., G. IMBENS, AND A. KRUEGER (1999): "Jackknife Instrumental Variables Estimation," Journal of Applied Econometrics, 14, 57–67. [688,693]
ARELLANO, M., AND S. R. BOND (1991): "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations," Review of Economic Studies, 58, 277–297. [713]
BEKKER, P. A. (1994): "Alternative Approximations to the Distributions of Instrumental Variables Estimators," Econometrica, 63, 657–681. [688,700,704,713,717]
BLOMQUIST, S., AND M. DAHLBERG (1999): "Small Sample Properties of LIML and Jackknife IV Estimators: Experiments With Weak Instruments," Journal of Applied Econometrics, 14, 69–88. [688]
BROWN, B. W., AND W. K. NEWEY (1998): "Efficient Semiparametric Estimation of Expectations," Econometrica, 66, 453–464. [692]
CHAO, J. C., AND N. R. SWANSON (2004): "Estimation and Testing Using Jackknife IV in Heteroskedastic Regression With Many Weak Instruments," Working Paper, Rutgers University. [688,700,702]
——— (2005): "Consistent Estimation With a Large Number of Weak Instruments," Econometrica, 73, 1673–1692. [688,698,700]
DONALD, S. G., AND W. K. NEWEY (2003): "Choosing the Number of Moments in GMM and GEL Estimation," Working Paper, MIT. [705]
DONALD, S. G., G. W. IMBENS, AND W. K. NEWEY (2003): "Empirical Likelihood Estimation and Consistent Tests With Conditional Moment Restrictions," Journal of Econometrics, 117, 55–93. [717]
DUFOUR, J.-M. (1997): "Some Impossibility Theorems in Econometrics With Applications to Structural and Dynamic Models," Econometrica, 65, 1365–1387. [695]
GUGGENBERGER, P., AND R. J. SMITH (2005): “Generalized Empirical Likelihood Estimators and Tests Under Partial, Weak, and Strong Identification,” Econometric Theory, 21, 667–709. [692,695] HAHN, J., AND J. A. HAUSMAN (2004): “A New Specification Test for the Validity of Instrumental Variables,” Econometrica, 70, 163–189. [697] HAN, C., AND P. C. B. PHILLIPS (2006): “GMM With Many Moment Conditions,” Econometrica, 74, 147–192. [687,688,693,694,698] HANSEN, C., J. A. HAUSMAN, AND W. K. NEWEY (2008): “Estimation With Many Instrumental Variables,” Journal of Business and Economic Statistics, 26, 398–422. [687,689,698,700,704,717] HANSEN, L. P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50, 1029–1054. [687,691] HANSEN, L. P., AND K. J. SINGLETON (1982): “Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models,” Econometrica, 50, 1269–1286. [687] HANSEN, L. P., J. HEATON, AND A. YARON (1996): “Finite-Sample Properties of Some Alternative GMM Estimators,” Journal of Business and Economic Statistics, 14, 262–280. [697] HOLTZ-EAKIN, D., W. K. NEWEY, AND H. ROSEN (1988): “Estimating Vector Autoregressions With Panel Data,” Econometrica, 56, 1371–1396. [687] IMBENS, G. (1997): “One-Step Estimators for Over-Identified Generalized Method of Moments Models,” Review of Economic Studies, 64, 359–383. [691] KITAMURA, Y., AND M. STUTZER (1997): “An Information-Theoretic Alternative to Generalized Method of Moments Estimation,” Econometrica, 65, 861–874. [691,717] KLEIBERGEN, F. (2005): “Testing Parameters in GMM Without Assuming They Are Identified,” Econometrica, 73, 1103–1123. [688,692,695,696] MOREIRA, M. (2003): “A Conditional Likelihood Ratio Test for Structural Models,” Econometrica, 71, 1027–1048. [696] NEWEY, W. K. (1990): “Efficient Instrumental Variables Estimation of Nonlinear Models,” Econometrica, 58, 809–837. [699] NEWEY, W. K., AND R. J. SMITH (2004): “Higher-Order Properties of GMM and Generalized Empirical Likelihood Estimators,” Econometrica, 72, 219–255. [691,694] NEWEY, W. K., AND F. WINDMEIJER (2009): “Supplement to ‘Generalized Method of Moments With Many Weak Moment Conditions’,” Econometrica Supplemental Material, 77, http://www. econometricsociety.org/ecta/Supmat/6224_proofs.pdf. [689,701] PHILLIPS, G. D. A., AND C. HALE (1977): “The Bias of Instrumental Variable Estimators of Simultaneous Equation Systems,” International Economic Review, 18, 219–228. [688] QIN, J., AND J. LAWLESS (1994): “Empirical Likelihood and General Estimating Equations,” Annals of Statistics, 22, 300–325. [691] SMITH, R. J. (1997): “Alternative Semi-Parametric Likelihood Approaches to Generalized Method of Moments Estimation,” Economic Journal, 107, 503–519. [687,691] (2006): “Weak Instruments and Empirical Likelihood: A Discussion of the Papers by D. W. K. Andrews and J. H. Stock and Y. Kitamura,” in Advances in Economics and Econometrics, Vol. 3, ed. by R. Blundell, W. Newey, and T. Persson. Cambridge, U.K.: Cambridge University Press. [696] STOCK, J., AND J. WRIGHT (2000): “GMM With Weak Identification,” Econometrica, 68, 1055–1096. [688,695-697] STOCK, J., AND M. YOGO (2005a): “Asymptotic Distributions of Instrumental Variables Statistics With Many Instruments,” in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews and J. H. Stock. Cambridge, U.K.: Cambridge University Press, 109–120. 
[688,689,704] (2005b): “Testing for Weak Instruments in Linear IV Regression,” in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews and J. H. Stock. Cambridge, U.K.: Cambridge University Press, 80–108. [697]
WINDMEIJER, F. (2005): “A Finite Sample Correction for the Variance of Linear Efficient TwoStep GMM Estimators,” Journal of Econometrics, 126, 25–51. [689,712,713,715]
Dept. of Economics, Massachusetts Institute of Technology, Cambridge, MA 02142-1347, U.S.A.;
[email protected] and Dept. of Economics, University of Bristol, 2 Priory Road, Bristol BS8 1TN, U.K.;
[email protected]. Manuscript received August, 2006; final revision received January, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 721–762
HYBRID AND SIZE-CORRECTED SUBSAMPLING METHODS BY DONALD W. K. ANDREWS AND PATRIK GUGGENBERGER1 This paper considers inference in a broad class of nonregular models. The models considered are nonregular in the sense that standard test statistics have asymptotic distributions that are discontinuous in some parameters. It is shown in Andrews and Guggenberger (2009a) that standard fixed critical value, subsampling, and m out of n bootstrap methods often have incorrect asymptotic size in such models. This paper introduces general methods of constructing tests and confidence intervals that have correct asymptotic size. In particular, we consider a hybrid subsampling/fixed-criticalvalue method and size-correction methods. The paper discusses two examples in detail. They are (i) confidence intervals in an autoregressive model with a root that may be close to unity and conditional heteroskedasticity of unknown form and (ii) tests and confidence intervals based on a post-conservative model selection estimator. KEYWORDS: Asymptotic size, autoregressive model, m out of n bootstrap, exact size, hybrid test, model selection, over-rejection, size-correction, subsample, confidence interval, subsampling test.
1. INTRODUCTION

NONREGULAR MODELS are becoming increasingly important in econometrics and statistics as developments in computation make it feasible to employ more complex models. In a variety of nonregular models, however, methods based on a standard asymptotic fixed critical value (FCV) or the bootstrap do not yield tests or confidence intervals with the correct size even asymptotically. In such cases, the usual prescription in the literature is to use subsampling or m out of n bootstrap methods (where n denotes the sample size and m denotes the bootstrap sample size). For references, see Andrews and Guggenberger (2010a), hereafter denoted AG1. However, AG1 showed that in a fairly broad array of nonregular models these methods do not deliver correct asymptotic size (defined to be the limit of exact size). The purpose of this paper is to provide general methods of constructing tests and confidence intervals (CIs) that do have correct asymptotic size in such models. The results cover cases in which a test statistic has an asymptotic distribution that is discontinuous in some parameters. Examples include inference for (i) post-conservative model selection procedures (such as those based on the Akaike information criterion (AIC)), (ii) parameters in scalar and vector autoregressive models with roots that may be close to unity, (iii) models with
1 D. W. K. Andrews gratefully acknowledges the research support of the National Science Foundation via Grants SES-0417911 and SES-0751517. P. Guggenberger gratefully acknowledges research support from a Sloan Fellowship, a faculty research grant from UCLA in 2005, and from the National Science Foundation via Grant SES-0748922. For comments, the authors thank three referees, a co-editor, Hannes Leeb, Azeem Shaikh, Jim Stock, and the participants at a number of seminars and conferences at which the paper was presented.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7015
a parameter near a boundary, (iv) models with lack of identification at some point(s) in the parameter space, such as models with weak instruments, (v) predictive regression models with nearly integrated regressors, (vi) threshold autoregressive models, (vii) tests of stochastic dominance, (viii) nondifferentiable functions of parameters, and (ix) differentiable functions of parameters that have zero first-order derivative. The methods considered here are quite general. However, their usefulness is greatest in models in which other methods, such as those based on a standard asymptotic FCV or the bootstrap, are not applicable. In models in which other methods work properly (in the sense that the limit of their exact size equals their nominal level), such methods are often preferable to the methods considered here in terms of the accuracy of the asymptotic approximations and/or the power of the test or length of the CI they generate. The first method considered in the paper is a hybrid method that takes the critical value for a given test statistic to be the maximum of a subsampling critical value and the FCV that applies when the true parameters are not near a point of discontinuity of the asymptotic distribution. The latter is usually a normal or chi-squared critical value. By simply taking the maximum of these two critical values, one obtains a test or CI that has correct asymptotic size in many cases where the FCV, subsampling, or both methods have incorrect asymptotic size. Furthermore, the paper shows that the hybrid method has the feature that relative to a subsampling method either (i) the subsampling method has correct size asymptotically, and the subsampling and hybrid critical values are the same asymptotically or (ii) the subsampling method has incorrect size asymptotically and the hybrid method reduces the magnitude of over-rejection for at least some parameter values, sometimes eliminating size distortion. The second method considered in the paper is a size-correction (SC) method. This method can be applied to FCV, subsampling, and hybrid procedures. The basic idea is to use the formulae given in AG1 for the asymptotic sizes of these procedures and to increase the magnitudes of the critical values (by adding a constant or reducing the nominal level) to achieve a test whose asymptotic size equals the desired asymptotic level. Closed-form solutions are obtained for the SC values (based on adding a constant). Numerical work in a number of different examples shows that computation of the SC values is tractable. The paper provides analytical comparisons of the asymptotic power of different SC tests and finds that the SC hybrid test has advantages over FCV and subsampling methods in most cases, but it does not dominate the SC subsampling method. The SC methods that we consider are not asymptotically conservative, but typically are asymptotically nonsimilar. (That is, for tests, the limit of the supremum of the finite-sample rejection probability over points in the null hypothesis equals the nominal level, but the limit of the infimum over points in the null hypothesis is less than the nominal level.) Usually power can be improved in
such cases by reducing the magnitude of asymptotic nonsimilarity. To do so, we introduce “plug-in” size-correction (PSC) methods for FCV, subsampling, and hybrid tests. These methods are applicable if there is a parameter subvector that affects the asymptotic distribution of the test statistic under consideration, is not related to the discontinuity in the asymptotic distribution, and is consistently estimable. The PSC method makes the critical values smaller for some parameter values by making the size-correction value depend on a consistent estimator of the parameter subvector. The asymptotic results for subsampling methods derived in AG1, and utilized here for size-correction, do not depend on the choice of subsample size b provided b → ∞ and b/n → 0 as n → ∞. One would expect that this may lead to poor approximations in some cases. To improve the approximations, the paper introduces finite-sample adjustments to the asymptotic rejection probabilities of subsampling and hybrid tests. The adjustments depend on the magnitude of δn = b/n. The adjusted formulae for the asymptotic rejection probabilities are used to define adjusted SC (ASC) values and adjusted PSC (APSC) values. All of the methods discussed above are applicable when one uses an m out of n bootstrap critical value in place of a subsampling critical value provided m2 /n → 0 and the observations are independent and identically distributed (i.i.d.). The reason is that the m out of n bootstrap can be viewed as subsampling with replacement, and the difference between sampling with and without replacement is asymptotically negligible under the stated conditions; see Politis, Romano, and Wolf (1999, p. 48). Literature that is related to the methods considered in this paper includes the work of Politis and Romano (1994) and Politis, Romano, and Wolf (1999) on subsampling; for the literature on the m out of n bootstrap, see AG1 for references. We are not aware of any methods in the literature that are analogous to the hybrid test or that consider size-correction of subsampling or m out of n bootstrap methods. Neither are we aware of any general methods of sizecorrection for FCV tests for the type of nonregular cases considered in this paper. For specific models in the class considered here, however, some methods are available. For example, for CIs based on post-conservative model selection estimators in regression models, Kabaila (1998) suggested a method of sizecorrection. For models with weak instruments, Anderson and Rubin (1949), Dufour (1997), Staiger and Stock (1997), Moreira (2003, 2009), Kleibergen (2002, 2005), Guggenberger and Smith (2005, 2008), and Otsu (2006) suggested methods with correct asymptotic size. A variant of Moreira’s method also is applicable in predictive regressions with nearly integrated regressors; see Jansson and Moreira (2006). In conditionally homoskedastic autoregressive models, CI methods of Stock (1991), Andrews (1993), Nankervis and Savin (1996), and Hansen (1999) can be used in place of the least squares estimator combined with normal critical values or subsampling critical values. Mikusheva (2007a) showed that the former methods yield correct asymptotic size. (She did not consider the method in Nankervis and Savin (1996).)
This paper considers two examples in detail. First, we consider CIs for the autoregressive parameter ρ in a first-order conditionally heteroskedastic autoregressive (AR(1)) model in which ρ may be close to or equal to 1. Models of this sort are applicable to exchange rate and commodity and stock prices (e.g., see Kim and Schmidt (1993)). We consider FCV, subsampling, and hybrid CIs. The CIs are based on inverting a (studentized) t statistic constructed using a feasible quasi-generalized least squares (FQGLS) estimator of ρ. This is a feasible generalized least squares (GLS) estimator based on a specification of the form of the conditional heteroskedasticity that may or may not be correct. We introduce procedures that are robust to this type of misspecification. We are interested in robustness of this sort because the literature is replete with different forms of autoregressive conditional heteroskedasticity (ARCH), generalized autoregressive conditional heteroskedasticity (GARCH), and stochastic volatility models for conditional heteroskedasticity—not all of which can be correct. We consider the FQGLS estimator because it has been shown that GLS correction of unit root tests yields improvements of power; see Seo (1999) and Guo and Phillips (2001). None of the CIs in the literature, such as those in Stock (1991), Andrews (1993), Andrews and Chen (1994), Nankervis and Savin (1996), Hansen (1999), Chen and Deo (2007), and Mikusheva (2007a), have correct asymptotic size in the presence of conditional heteroskedasticity under parameter values that are not 1/n local-to-unity. Table 2 of Mikusheva (2007b) shows that the nominal .95 CIs of Andrews (1993), Stock (1991) (modified), and Hansen (1999) have finite-sample coverage probabilities of .70, .72, and .73, respectively, for autoregressive parameter values of .3 and .5 under conditionally normal ARCH(1) innovations with ARCH parameter equal to .85, with n = 120, in a model with a linear time trend. Furthermore, GLS versions of the methods listed above cannot be adapted easily to account for conditional heteroskedasticity of unknown form, which is what we consider here.2 Given that the parameter space for ρ includes a unit root and near unit roots, standard two-sided FCV methods for constructing CIs based on a standard normal approximation to the t statistic are known to be problematic even under conditional homoskedasticity. As an alternative, Romano and Wolf (2001) proposed subsampling CIs for ρ. Mikusheva (2007a, Theorem 4) showed that
2
Stock’s (1991) method is not feasible because the asymptotic distribution of the GLS t statistic under local-to-unity asymptotics depends on a constant, h27 below, that is unknown and is quite difficult to estimate because it depends on the unknown form of conditional heteroskedasticity which in turn depends on an infinite number of lags. The methods of Andrews (1993), Andrews and Chen (1994), and Hansen (1999) are not feasible because they depend on a parametric specification of the model since they are parametric bootstrap-type methods. Mikusheva’s (2007a) procedures are variants of those considered in the papers above and hence are not easily adapted to conditional heteroskedasticity of unknown form. Chen and Deo’s (2007) approach relies on the i.i.d. nature of the innovations.
equal-tailed versions of such subsampling CIs under-cover the true value asymptotically under conditional homoskedasticity (i.e., their asymptotic confidence size is less than 1 − α), whereas some versions of the methods listed in the previous paragraph provide correct asymptotic coverage in a uniform sense. The results given here differ from those of Mikusheva (2007a) in several dimensions. First, her results do not apply to least squares (LS) or FQGLS procedures in models with conditional heteroskedasticity. Second, even in a model with conditional homoskedasticity, her results do not apply to symmetric subsampling CIs and do not provide an expression for the asymptotic confidence size. We consider two models: Model 1 includes an intercept, and Model 2 includes an intercept and time trend. We show that equal-tailed two-sided subsampling and two-sided FCV CIs have substantial asymptotic size distortions. On the other hand, symmetric subsampling CIs are shown to have correct asymptotic size. An explanation is given below. All types of hybrid CIs are shown to have correct asymptotic size. Finite-sample results indicate that the hybrid CIs have good coverage probabilities across all types of conditional heteroskedasticity that are considered. The second example is a post-conservative model selection (CMS) example. We consider an LS t test concerning a regression parameter after model selection is used to determine whether another regressor should be included in the model. The model selection procedure uses an LS t test with nominal level 5%. This procedure, which is closely related to AIC, is conservative (i.e., it chooses a correct model, but not necessarily the most parsimonious model, with probability that goes to 1). The asymptotic results for FCV tests in the CMS example are variations of those of Leeb and Pötscher (2005) and Leeb (2006) (and other papers referenced in these two papers). In the CMS example, nominal 5% subsampling, FCV, and hybrid tests have asymptotic and adjusted-asymptotic sizes between 90% and 96% for upper, symmetric, and equal-tailed tests.3 The finite-sample maximum (over the cases considered) null rejection probabilities of these tests for n = 120 and b = 12 are close to the asymptotic sizes. They are especially close for the adjusted-asymptotic sizes for which the largest deviations are 2.0%. Plug-in size-corrected tests perform very well in this example. For example, the 5% PSC hybrid test has finite-sample maximum null rejection probability of 4.8% for upper, symmetric, and equal-tailed tests. Additional examples are given in Andrews and Guggenberger (2009a, 2009b, 2009c, 2010b) and Guggenberger (2010). These examples cover (i) tests when a nuisance parameter may be near a boundary of the parameter space, (ii) tests and CIs concerning the coefficient on an endogenous variable in an instrumental variables regression model with instruments that may be weak, (iii) tests
3 This is for a parameter space of [−.995, .995] for the (asymptotic) correlation between the LS estimators of the two regressors.
TABLE I
ASYMPTOTIC SIZES OF SUBSAMPLING AND HYBRID TESTS AND CONFIDENCE INTERVALS OF SYMMETRIC AND EQUAL-TAILED TWO-SIDED TYPES FOR A VARIETY OF MODELS^a

                                                     Subsampling            Hybrid
Model                                                Sym     Eq-Tail        Sym     Eq-Tail

(a) Nominal 5% Tests
Nuisance parameter near boundary                     10      52.5           5       5
Post-conservative model selection                    94      94             94      94
IV regression–2SLS with possibly weak IVs            55      82             5       5
Parameter-dependent support                          5       5              5       5

(b) Nominal 95% Confidence Intervals
AR with Intercept                                    95      60             95      95
AR with Intercept & trend                            95      25             95      95
Post-consistent model selection                      0       0              0       0
Parameter of interest near boundary                  90      47.5           95      95
Parameters defined by moment inequalities            95      95             95      95
a The details of the post-conservative model selection model are described in the text. The results for the IV regression–2SLS with possibly weak IVs model are for tests concerning the coefficient on an endogenous variable in a linear instrumental variable (IV) model with a single endogenous variable and five IVs.
concerning a parameter that determines the support of the observations, (iv) CIs constructed after the application of a consistent model selection procedure, (v) CIs when the parameter of interest may be near a boundary, (vi) tests and CIs for parameters defined by moment inequalities, and (vii) tests after the application of a pretest. Table I summarizes the asymptotic sizes of subsampling and hybrid procedures in these models for symmetric and equal-tailed two-sided procedures. In many of these models, subsampling procedures have incorrect asymptotic size—often by a substantial amount. In all of these models except those based on post-consistent model selection, hybrid procedures have correct asymptotic size. For post-conservative model selection inference, PSC tests have correct asymptotic size. The remainder of the paper is outlined as follows. Section 2 introduces the testing setup, the hybrid tests, and the CMS example, which is used as a running example in the paper. Section 3 introduces the size-corrected tests, gives power comparisons of the SC tests, and introduces the plug-in size-corrected tests. Section 4 introduces the finite-sample adjustments to the asymptotic sizes of subsampling and hybrid tests. Sections 5 and 6 consider equal-tailed tests and CIs, respectively. Sections 7 and 8 provide the results for the autoregressive and post-conservative model selection examples. The Supplemental Material to this paper (Andrews and Guggenberger (2009b)), gives (i) details concerning the construction of Tables II and III, (ii) proofs of the results in the paper, (iii) size-correction methods based on quantile adjustment, (iv) results concerning power comparisons of SC tests, (v) graphical illustrations of critical
value functions and power comparisons, (vi) size-correction results for equal-tailed tests, (vii) results for combined size-corrected subsampling and hybrid tests, (viii) an additional example of a nuisance parameter near a boundary, and (ix) proofs for the examples in the paper.

2. HYBRID TESTS

2.1. Intuition

We now provide some intuition regarding the potential problem with the asymptotic size of subsampling procedures and indicate why the hybrid procedure introduced below solves the problem in many cases. Suppose we are carrying out a test based on a test statistic Tn and a nuisance parameter γ ∈ Γ ⊂ R appears. Suppose the asymptotic null distribution of Tn is discontinuous at γ = 0. That is, we obtain a different asymptotic distribution under the fixed parameter γ = 0 from that under a fixed γ ≠ 0. As is typical in such situations, suppose the asymptotic distribution of Tn under any drifting sequence of parameters {γn = h/n^r : n ≥ 1} (or γn = (h + o(1))/n^r) depends on the "localization parameter" h.4 Denote this asymptotic distribution by Jh. If the asymptotic distribution of Tn under γn is Jh, then the asymptotic distribution of Tb under γn = h/n^r = (b/n)^r h/b^r = o(1)/b^r is J0 when b/n → 0 as n → ∞. Subsample statistics with subsample size b have the same asymptotic distribution J0 as Tb. In consequence, subsampling critical values converge in probability to the 1 − α quantile, c0(1 − α), of J0, whereas the full-sample statistic Tn converges in distribution to Jh. The test statistic Tn needs a critical value equal to the 1 − α quantile, ch(1 − α), of Jh in order to have an asymptotic null rejection probability of α under {γn : n ≥ 1}. If c0(1 − α) < ch(1 − α), then the subsampling test over-rejects asymptotically under {γn : n ≥ 1} and has asymptotic size greater than α. If c0(1 − α) > ch(1 − α), then it under-rejects asymptotically and is asymptotically nonsimilar. Sequences of the form γn = h/n^r are not the only ones in which the subsampling critical value may be too small. Suppose γn = g/b^r for fixed g ∈ R (or γn = (g + o(1))/b^r). Then Tb has asymptotic distribution Jg and the probability limit of the subsampling critical value is cg(1 − α). On the other hand, γn = (n/b)^r g/n^r and (n/b)^r → ∞, so the full-sample statistic Tn converges to J∞ (when g ≠ 0), which is the asymptotic distribution of Tn when γn is more distant from the discontinuity point than O(n^{−r}). Let c∞(1 − α) denote the 1 − α quantile of J∞. If cg(1 − α) < c∞(1 − α), then the subsampling test over-rejects asymptotically under {γn : n ≥ 1}. Any value of g ∈ R is possible, so one obtains asymptotic size greater than α if cg(1 − α) < ch(1 − α) for any (g, h) such that g = 0 if |h| < ∞ or g ∈ R if h = ∞.5
4 Typically, the constant r > 0 is such that the distribution of Tn under γn is contiguous to its distribution under γ = 0. In most cases, r = 1/2, but in the autoregressive example with a discontinuity at a unit root, we have r = 1.
5 For g = 0 a slightly different argument is needed than that given above.
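The rate mismatch at the heart of this argument is easy to check numerically: under γn = h/n^r the full-sample localization n^r γn stays at h, while the subsample localization b^r γn drifts to 0. A small sketch, under the illustrative choices r = 1/2 and h = 3:

```python
r, h = 0.5, 3.0
for n, b in [(400, 20), (1600, 40), (6400, 80)]:
    gamma_n = h / n**r                        # drifting sequence gamma_n = h / n^r
    print(n, b, n**r * gamma_n, round(b**r * gamma_n, 3))
    # The full-sample localization is always h, while the subsample sees
    # (b/n)^r * h -> 0, so the subsample distribution approximates J_0 rather than J_h.
```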
The hybrid test uses a critical value given by the maximum of the subsampling critical value and c∞ (1 − α). Its probability limit is cg∗ = max{cg (1 − α) c∞ (1 −α)}. In consequence, if the critical value function ch (1 −α) viewed as a function of h is maximized at h = 0, then when (g h) = (0 h) for |h| < ∞, we have cg∗ = c0 (1 − α) ≥ ch (1 − α), and when (g h) is such that h = ∞, we have cg∗ ≥ c∞ (1 − α) = ch (1 − α). On the other hand, if ch (1 − α) is maximized at h = ∞, then cg∗ = c∞ (1 − α) ≥ cg (1 − α) for all g ∈ R ∪ {∞}. Hence, in this case too the hybrid critical value does not lead to over-rejection. In many examples, ch (1 − α) is maximized at either 0 or ∞ and the hybrid test has correct asymptotic size. In some models, the test statistic Tn depends on two nuisance parameters (γ1 γ2 ) and its asymptotic distribution is discontinuous whenever γ1 = 0. In this case, the asymptotic distribution of Tn depends on a localization parameter h1 analogous to h above and the fixed value of γ2 . The asymptotic behaviors of subsampling and hybrid tests in this case are as described above with h1 in place of h except that the conditions for a rejection rate of less than or equal to α must hold for each value of γ2 . It turns out that in a number of models of interest the critical value function is monotone increasing in h1 for some values of γ2 and monotone decreasing in other values. In consequence, subsampling tests over-reject asymptotically, but hybrid tests do not. 2.2. Testing Setup Here we describe the general testing setup. We are interested in tests concerning a parameter θ ∈ Rd in the presence of a nuisance parameter γ ∈ Γ . The null hypothesis is H0 : θ = θ0 for some θ0 ∈ Rd . The alternative hypothesis may be one-sided or two-sided. Let Tn (θ0 ) denote a test statistic based on a sample of size n for testing H0 . It could be a t statistic or some other test statistic. We consider the case where the asymptotic null distribution of Tn (θ0 ) depends on the nuisance parameter γ and is discontinuous at some value(s) of γ. The nuisance parameter γ has up to three components: γ = (γ1 γ2 γ3 ). The points of discontinuity of the asymptotic distribution of Tn (θ0 ) are determined by the first component, γ1 ∈ Rp . We assume that the discontinuities occur when one or more elements of γ1 equal zero. The parameter space for γ1 is Γ1 ⊂ Rp . The second component, γ2 (∈ Rq ), of γ also affects the limit distribution of the test statistic, but does not affect the distance of the parameter γ to the point of discontinuity. The parameter space for γ2 is Γ2 ⊂ Rq . The third component, γ3 , of γ does not affect the limit distribution of the test statistic. It is assumed to be an element of an arbitrary space T3 . Infinite dimensional γ3 parameters, such as error distributions, arise frequently in examples. Due to the central limit theorem (CLT), the asymptotic distribution of a test statistic often does not depend on an error distribution. The parameter space for γ3 is Γ3 (γ1 γ2 ) (⊂ T3 ).
The parameter space for γ is

(2.1)    Γ = {(γ1, γ2, γ3) : γ1 ∈ Γ1, γ2 ∈ Γ2, γ3 ∈ Γ3(γ1, γ2)}.

Let an opening floor bracket (⌊) denote the left endpoint of an interval that may be open or closed at the left end. Define a closing floor bracket (⌋) analogously for the right endpoint.

ASSUMPTION A: (i) Γ satisfies (2.1) and (ii) Γ1 = ∏_{m=1}^{p} Γ_{1,m}, where Γ_{1,m} = ⌊γ^ℓ_{1,m}, γ^u_{1,m}⌋ for some −∞ ≤ γ^ℓ_{1,m} < γ^u_{1,m} ≤ ∞ that satisfy γ^ℓ_{1,m} ≤ 0 ≤ γ^u_{1,m} for m = 1, ..., p.

Next, we describe the asymptotic behavior of Tn(θ0) when the true value of θ is the null value θ0. All limits are as n → ∞. For an arbitrary distribution G, let G(·) denote the distribution function (d.f.) of G, let G(x−) denote the limit from the left of G(·) at x, and let C(G) denote the set of continuity points of G(·). Let α ∈ (0, 1) be a given constant. Define the 1 − α quantile, q(1 − α), of a distribution G by q(1 − α) = inf{x ∈ R : G(x) ≥ 1 − α}. The distribution Jh considered below is the distribution of a proper random variable that is finite with probability 1. Let R+ = {x ∈ R : x ≥ 0}, R− = {x ∈ R : x ≤ 0}, R_{+,∞} = R+ ∪ {+∞}, R_{−,∞} = R− ∪ {−∞}, R∞ = R ∪ {±∞}, R^p_+ = R+ × · · · × R+ (with p copies), and R^p_∞ = R∞ × · · · × R∞ (with p copies). Let r > 0 denote a rate of convergence index such that when the true parameter γ1 satisfies n^r γ1 → h1, then the test statistic Tn(θ0) has an asymptotic distribution that depends on the localization parameter h1. In most examples, r = 1/2, but in the unit root example considered below r = 1. The index set for the different asymptotic null distributions of the test statistic Tn(θ0) is

(2.2)    H = H1 × H2,
         H1 = ∏_{m=1}^{p} H_{1,m},  where  H_{1,m} = R_{+,∞} if γ^ℓ_{1,m} = 0;  R_{−,∞} if γ^u_{1,m} = 0;  R∞ if γ^ℓ_{1,m} < 0 and γ^u_{1,m} > 0,
         H2 = cl(Γ2),

where cl(Γ2) is the closure of Γ2 with respect to R^q_∞. For example, if p = 1,
γ11 = 0, and Γ2 = Rq , then H1 = R+∞ , H2 = Rq∞ , and H = R+∞ × Rq∞ . For notational simplicity, we write h = (h1 h2 ), rather than (h 1 h 2 ) , even though h is a p + q column vector. DEFINITION OF {γnh : n ≥ 1}: Given r > 0 and h = (h1 h2 ) ∈ H, let {γnh = (γnh1 γnh2 γnh3 ) : n ≥ 1} denote a sequence of parameters in Γ for which nr γnh1 → h1 and γnh2 → h2 .
For a given model, we assume there is a single fixed r > 0. The sequence {γnh : n ≥ 1} is defined such that under {γnh : n ≥ 1}, the asymptotic distribution of Tn (θ0 ) depends on h. We assume that Tn (θ0 ) satisfies the following conditions concerning its asymptotic null behavior. ASSUMPTION B: For some r > 0, all h ∈ H, all sequences {γnh : n ≥ 1}, and some distributions Jh , Tn (θ0 ) →d Jh under {γnh : n ≥ 1}. ASSUMPTION K: The asymptotic distribution Jh in Assumption B is the same (proper) distribution, call it J∞ , for all h = (h1 h2 ) ∈ H for which h1m = +∞ or −∞ for all m = 1 p, where h1 = (h11 h1p ) . Assumptions B and K hold in a wide variety of examples of interest; see below and Andrews and Guggenberger (2009b, 2009c, 2010a, 2010b). In examples, when Tn (θ0 ) is a studentized t statistic or a likelihood ratio (LR), Lagrange multiplier (LM), or Wald statistic, J∞ typically is a standard normal, absolute standard normal, or chi-squared distribution. Let c∞ (1 − α) denote the 1 − α quantile of J∞ . As defined, c∞ (1 − α) is an FCV that is suitable when γ is not at or close to a discontinuity point of the asymptotic distribution of Tn (θ0 ). Post-Conservative Model-Selection Example In this example, we consider inference concerning a parameter in a linear regression model after a “conservative” model selection procedure has been applied to determine whether another regressor should enter the model. A “conservative” model selection procedure is one that chooses a correct model, but not necessarily the most parsimonious correct model, with probability that goes to 1 as the sample size n goes to infinity. Examples are model selection based on a test whose critical value is independent of the sample size and the Akaike information criterion (AIC). The model we consider is (2.3)
    y_i = x*_{1i} θ + x*_{2i} β2 + x*′_{3i} β3 + σ ε_i   for i = 1, ..., n,   where
    x*_i = (x*_{1i}, x*_{2i}, x*′_{3i})′ ∈ R^k,   β = (θ, β2, β′_3)′ ∈ R^k,

x*_{1i}, x*_{2i}, θ, β2, σ, ε_i ∈ R, and x*_{3i}, β3 ∈ R^{k−2}. The observations {(y_i, x*_i) : i = 1, ..., n} are i.i.d. The scaled error ε_i has mean 0 and variance 1 conditional on x*_i. We are interested in testing H0 : θ = θ0 after carrying out a model selection procedure to determine whether x*_{2i} should enter the model. The model selection procedure is based on a t test of H0* : β2 = 0 that employs a critical value c that does not depend on n. Because the asymptotic distribution of the test statistic is invariant to the value of θ0, the testing results immediately yield
results for a CI for θ obtained by inverting the test. Also, the inference problem described above covers tests concerning a linear combination of regression coefficients by suitable reparametrization (see the Supplemental Material for details). We consider upper and lower one-sided, and symmetric and equal-tailed two-sided nominal level α FCV, subsampling, and hybrid tests of H0 : θ = θ0. Each test is based on a studentized test statistic Tn(θ0), where Tn(θ0) equals T*_n(θ0), −T*_n(θ0), |T*_n(θ0)|, and T*_n(θ0), respectively, and T*_n(θ0) is defined below. To define the test statistic T*_n(θ0), we let T̂_{n1}(θ0) denote the standard t statistic for testing H0 : θ = θ0 in (2.3) (which is unrestricted in the sense that H0* : β2 = 0 is not imposed). As defined, this statistic has an exact t distribution under H0 and normality of the errors (but the latter is not assumed). We let T̃_{n1}(θ0) denote the "restricted" t statistic for testing H0 which imposes the restriction of H0* : β2 = 0, but uses the unrestricted estimator σ̂ of σ instead of the restricted estimator.6 We let T_{n2} denote the standard t statistic for testing H0* : β2 = 0 (that does not impose H0).7 The model selection test rejects H0* : β2 = 0 if |T_{n2}| > c, where c > 0 is a given critical value that does not depend on n. Typically, c = z_{1−α/2} for some α > 0. The post-model selection test statistic, T*_n(θ0), for testing H0 : θ = θ0 is
(2.4)    T*_n(θ0) = T̃_{n1}(θ0) 1(|T_{n2}| ≤ c) + T̂_{n1}(θ0) 1(|T_{n2}| > c).
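A minimal sketch of how T*_n(θ0) can be computed from data follows. It is not from the paper: the least squares routine, the degrees-of-freedom convention for σ̂, and the variable names are assumptions; only the combination rule in (2.4) and the use of the unrestricted σ̂ in the restricted statistic are taken from the text.

```python
import numpy as np

def post_selection_t(y, x1, x2, X3, theta0, c=1.96):
    """Post-model-selection t statistic T_n*(theta0) as in (2.4)."""
    n = len(y)
    X = np.column_stack([x1, x2, X3])            # unrestricted design: (theta, beta2, beta3)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (n - X.shape[1])    # unrestricted sigma^2 (one common convention)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    T_hat_n1 = (b[0] - theta0) / se[0]           # unrestricted t statistic for theta
    T_n2 = b[1] / se[1]                          # t statistic for H0*: beta2 = 0

    Xr = np.column_stack([x1, X3])               # restricted design (beta2 = 0 imposed)
    XrtXr_inv = np.linalg.inv(Xr.T @ Xr)
    br = XrtXr_inv @ Xr.T @ y
    se_r = np.sqrt(sigma2 * XrtXr_inv[0, 0])     # uses the unrestricted sigma estimate
    T_tilde_n1 = (br[0] - theta0) / se_r

    # Combination rule (2.4): keep the restricted statistic when the model
    # selection t test does not reject beta2 = 0.
    return T_tilde_n1 if abs(T_n2) <= c else T_hat_n1
```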
We now show how the testing problem above fits into the general framework. First, we define the regressor vector x⊥_i = (x⊥_{1i}, x⊥_{2i})′ that corresponds to (x*_{1i}, x*_{2i})′ with x*_{3i} projected out using the population projection. Let G denote the distribution of (ε_i, x*_i). Define

(2.5)    x⊥_i = ( x*_{1i} − x*′_{3i}(E_G x*_{3i}x*′_{3i})^{−1} E_G x*_{3i}x*_{1i},
                 x*_{2i} − x*′_{3i}(E_G x*_{3i}x*′_{3i})^{−1} E_G x*_{3i}x*_{2i} )′ ∈ R²,
         Q = E_G x⊥_i x⊥′_i   and   Q^{−1} = ⎡ Q^{11}  Q^{12} ⎤
                                              ⎣ Q^{12}  Q^{22} ⎦.

The parameter vector γ = (γ1, γ2, γ3) is defined in this example by

(2.6)    γ1 = β2 / (σ(Q^{22})^{1/2}),   γ2 = Q^{12} / (Q^{11}Q^{22})^{1/2},
and
γ3 = (β2 β3 σ G)
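For illustration, the nuisance parameter γ2 = ρ can be estimated by replacing the population moments in (2.5) and (2.6) with sample analogs, as in the following sketch (not from the paper; the projection-matrix implementation and the function name are assumptions):

```python
import numpy as np

def rho_hat(x1, x2, X3):
    """Sample analog of gamma_2 = Q^{12} / (Q^{11} Q^{22})^{1/2} from (2.6)."""
    P = X3 @ np.linalg.inv(X3.T @ X3) @ X3.T        # projection onto the columns of X3
    xperp = np.column_stack([x1 - P @ x1, x2 - P @ x2])
    Q = xperp.T @ xperp / len(x1)                    # sample analog of E_G x_i^perp x_i^perp'
    Qinv = np.linalg.inv(Q)                          # entries Q^{11}, Q^{12}, Q^{22}
    return Qinv[0, 1] / np.sqrt(Qinv[0, 0] * Qinv[1, 1])
```

A consistent estimate of this sort is the ingredient that the plug-in size-correction methods discussed in the Introduction rely on.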
6 One could define T̃_{n1}(θ0) using the restricted (by β2 = 0) LS estimator of σ, but this is not desirable because it leads to an inconsistent estimator of σ under sequences of parameters {β2 = β_{2n} : n ≥ 1} that satisfy β_{2n} → 0 and n^{1/2}β_{2n} ↛ 0 as n → ∞. For subsampling tests, one could define T̃_{n1}(θ0) and T̂_{n1}(θ0) with σ̂ deleted because the scale of the subsample statistics offsets that of the original sample statistic. This does not work for hybrid tests because Assumption K fails if σ̂ is deleted.
7 See Section S11.1 of the Supplemental Material for explicit expressions for T̂_{n1}(θ0), T̃_{n1}(θ0), and T_{n2}.
Note that γ2 = ρ, where ρ = AsyCorr(θ̂, β̂2). The parameter spaces for γ1, γ2, and γ3 are Γ1 = R, Γ2 = [−1 + ζ, 1 − ζ] for some ζ > 0, and

(2.7)    Γ3(γ1, γ2) = { (β2, β3, σ, G) : β2 ∈ R, β3 ∈ R^{k−2}, σ > 0, and for
             Q = E_G x⊥_i x⊥′_i  and  Q^{−1} = ⎡ Q^{11}  Q^{12} ⎤
                                                ⎣ Q^{12}  Q^{22} ⎦,
         (i) β2/(σ(Q^{22})^{1/2}) = γ1,  (ii) Q^{12}/(Q^{11}Q^{22})^{1/2} = γ2,
         (iii) λ_min(Q) ≥ κ,  (iv) λ_min(E_G x*_{3i}x*′_{3i}) ≥ κ,  (v) E_G ||x*_i||^{2+δ} ≤ M,  (vi) E_G ||ε_i x*_i||^{2+δ} ≤ M,
         (vii) E_G(ε_i | x*_i) = 0 a.s., and (viii) E_G(ε²_i | x*_i) = 1 a.s. }
for some κ δ > 0 and M < ∞. The parameter γ2 is bounded away from 1 and −1 because otherwise the LS estimators of θ and β2 could have a distribution that is arbitrarily close to being singular (such as a normal distribution with singular variance matrix). Assumption A holds immediately. The rate of convergence parameter r equals 1/2. The localization parameter h satisfies h = (h1 h2 ) ∈ H = H1 × H2 , where H1 = R∞ and H2 = [−1 + ζ 1 − ζ]. Let (a b) = Φ(a + b) − Φ(a − b), where Φ(·) is the standard normal distribution function. Note that (a b) = (−a b). Calculations in the Supplemental Material establish that the asymptotic distribution Jh∗ of Tn∗ (θ0 ) under a sequence of parameters {γn = (γn1 γn2 γn3 ) : n ≥ 1} (where n1/2 γn1 → h1 , γn2 → h2 , and γn3 ∈ Γ3 (γn1 γn2 ) for all n) is (2.8) Jh∗ (x) = Φ x + h1 h2 (1 − h22 )−1/2 (h1 c)
x h1 + h2 t c 1− φ(t) dt + (1 − h22 )1/2 (1 − h22 )1/2 −∞ when |h1 | < ∞. When |h1 | = ∞, Jh∗ (x) = Φ(x) (which equals the limit as |h1 | → ∞ of Jh∗ (x) defined in (2.8)). For upper one-sided, lower one-sided, and symmetric two-sided tests, the asymptotic distribution Jh of Tn (θ0 ) is given by Jh∗ , −Jh∗ , and |Jh∗ |, respectively. (If Y ∼ Jh∗ , then by definition, −Y ∼ −Jh∗ and ∗ being an |Y | ∼ |Jh∗ |.) This verifies Assumption B. Assumption K holds with J∞ N(0 1) distribution. The asymptotic results that are used to verify Assumption B are closely related to results of Leeb and Pötscher (2005) and Leeb (2006) (and other papers referenced in these two papers). However, no papers in the literature that we are aware of consider subsampling-based methods for post-conservative model
HYBRID AND SIZE-CORRECTED SUBSAMPLING
733
selection procedures, as is done below. The results given below also are related to, but quite different from, those in Andrews and Guggenberger (2009a) for post-consistent model selection estimators, shrinkage estimators, and superefficient estimators. 2.3. Subsampling Critical Value The hybrid test introduced below makes use of a subsampling critical value, which we define here. A subsampling critical value is determined by subsample nbj : j = 1 qn }, where j indexes the subsamstatistics that are denoted by {T ple, b is a subsample size that depends on n, and qn is the number of different subsamples. With i.i.d. observations, there are qn = n!/((n − b)!b!) different subsamples of size b. With time series observations, there are qn = n − b + 1 subsamples each consisting of b consecutive observations. Let {Tnbj (θ0 ) : j = 1 qn } be subsample statistics that are defined exactly as Tn (θ0 ) is defined, but are based on subsamples of size b rather than the nbj : j = 1 qn } that are used to confull sample. The subsample statistics {T struct the subsampling critical value are defined to satisfy one or the other of the following assumptions. nbj = Tnbj ( ASSUMPTION SUB1: T θn ) for all j ≤ qn , where θn is an estimator of θ. nbj = Tnbj (θ0 ) for all j ≤ qn . ASSUMPTION SUB2: T The estimator θn in Assumption Sub1 usually is chosen to be an estimator that is consistent under both the null and alternative hypotheses. Let Lnb (x) and cnb (1 − α) denote the empirical distribution function and nbj : j = the 1 − α sample quantile, respectively, of the subsample statistics {T 1 qn }: (2.9)
−1 n
Lnb (x) = q
qn
nbj ≤ x) 1(T
for x ∈ R
j=1
cnb (1 − α) = inf{x ∈ R : Lnb (x) ≥ 1 − α} The subsampling critical value is cnb (1 − α). The subsampling test rejects H0 : θ = θ0 if Tn (θ0 ) > cnb (1 − α). For subsampling tests (and the hybrid tests introduced below), we make the following assumptions: ASSUMPTION C: (i) b → ∞ and (ii) b/n → 0.
734
D. W. K. ANDREWS AND P. GUGGENBERGER
ASSUMPTION D: (i) {Tnbj (θ0 ) : j = 1 qn } are identically distributed under any γ ∈ Γ for all n ≥ 1, and (ii) Tnbj (θ0 ) and Tb (θ0 ) have the same distribution under any γ ∈ Γ for all n ≥ 1. These assumptions allow for i.i.d., stationary strong-mixing, and even nonstationary observations (as shown in the autoregressive example below). They have been verified in a wide variety of examples in this paper and elsewhere. In the post-conservative model selection example, the subsampling critical values are defined using Assumption Sub1. Let θ and θ denote the restricted and unrestricted least squares (LS) estimators of θ, respectively. The subsample statistics are defined by {Tnbj (θ) : j = 1 qn }, where θ is the “model selection” estimator of θ defined by (2.10)
θ= θ1(|Tn2 | ≤ c) + θ1(|Tn2 | > c)
and Tnbj (θ0 ) is defined just as Tn (θ0 ) is defined but using the jth subsample of size b in place of the full sample of size n. (One could also use the unrestricted estimator θ in place of θ.) Assumption C holds by choice of b. Assumption D holds automatically. 2.4. Technical Assumptions We now state several technical assumptions that are used below. Definethe empirical distribution of {Tnbj (θ0 ) : j = 1 qn } by Unb (x) = qn 1(Tnbj (θ0 ) ≤ x). qn−1 j=1 ASSUMPTION E: For all sequences {γn ∈ Γ : n ≥ 1}, Eθ0 γn Unb (x) →p 0 under {γn : n ≥ 1} for all x ∈ R.
Unb (x) −
ASSUMPTION F: For all ε > 0 and h ∈ H, Jh (ch (1 − α) + ε) > 1 − α, where ch (1 − α) is the 1 − α quantile of Jh . ASSUMPTION G: For all h = (h1 h2 ) ∈ H and all sequences {γnh : n ≥ 1} for which br γnh1 → g1 for some g1 ∈ Rp∞ , if Unb (x) →p Jg (x) under {γnh : n ≥ 1} for all x ∈ C(Jg ) for g = (g1 h2 ) ∈ Rp+q ∞ , then Lnb (x) − Unb (x) →p 0 under {γnh : n ≥ 1} for all x ∈ C(Jg ). ASSUMPTION J: For all ε > 0 and h ∈ H, Jh (ch (τ) + ε) > τ for τ = α/2 and τ = 1 − α/2, where ch (τ) is the τ quantile of Jh . Assumption E holds for i.i.d. observations and for stationary strong-mixing observations with supγ∈Γ αγ (m) → 0 as m → ∞, where {αγ (m) : m ≥ 1} are the strong-mixing numbers of the observations when the true parameters are (θ0 γ); see AG1. Assumptions F and J are not very restrictive. The former is
HYBRID AND SIZE-CORRECTED SUBSAMPLING
735
used for one- and two-sided tests, while the latter is used for equal-tailed tests. nbj } satisfy Assumption Sub2. SecAssumption G holds automatically when {T tion 7 of AG1 provides sufficient conditions for Assumption G when Assumption Sub1 holds. In the post-conservative model selection example, Assumption E holds automatically, Assumptions F and J hold because Jh∗ (x) is strictly increasing in x ∈ R for all h ∈ H, and Assumption G is verified in the Supplemental Material using the proof of Lemma 4 of AG1. 2.5. Definition of Hybrid Tests We now introduce a hybrid test that is useful when the test statistic Tn (θ0 ) has a limit distribution that is discontinuous in some parameter and an FCV or subsampling test over-rejects asymptotically under the null hypothesis. The critical value of the hybrid test is the maximum of the subsampling critical value and a certain fixed critical value. The hybrid test is quite simple to compute, in many situations has asymptotic size equal to its nominal level α (see Lemma 2 below and the examples in Table I), and otherwise over-rejects the null asymptotically less than the standard subsampling test or the FCV test at some null parameter values. In addition, in many scenarios, the power of the hybrid test is quite good relative to FCV and subsampling tests (after all have been sizecorrected); see Section 3.2 below. The hybrid test with nominal level α rejects the null hypothesis H0 : θ = θ0 when (2.11)
∗ Tn (θ0 ) > cnb (1 − α)
where
∗ nb
c (1 − α) = max{cnb (1 − α) c∞ (1 − α)} The hybrid test simply takes the critical value to be the maximum of the usual subsampling critical value and the critical value from the J∞ distribution, which is usually known.8 For example, in the post-conservative model selection example, c∞ (1 − α) equals z1−α and z1−α/2 for one- and two-sided tests, respectively. Hence, the hybrid test is straightforward to compute. Obviously, the rejection probability of the hybrid test is less than or equal to those of the standard subsampling test and the FCV test with critical value c∞ (1 − α). Hence, the hybrid test does not over-reject more often than both of these two tests. 8
Hybrid tests can be defined even when Assumption K does not hold. For example, we can ∗ (1 − α) = max{cnb (1 − α) suph∈H ch∞ (1 − α)}, where ch∞ (1 − α) is the 1 − α quandefine cnb ∞ ∞ ∞ tile of Jh∞ and, given h = (h1 h2 ) ∈ H, h∞ = (h∞ 11 h1p h2 ) ∈ H is defined by h1j = +∞ ∞ ∞ ∞ if h1j > 0, h1j = −∞ if h1j < 0, h1j = +∞ or −∞ (chosen so that h ∈ H) if h1j = 0 for j = 1 p, and h∞ 2 = h2 . When Assumption K holds, this reduces to the hybrid critical value in (2.11). We utilize Assumption K because it leads to a particularly simple form for the hybrid test.
736
D. W. K. ANDREWS AND P. GUGGENBERGER
Furthermore, it is shown in Lemma 2 below that the hybrid test of nominal level α has asymptotic size α provided the 1 − α quantile function c(h1 h2 ) (1 − α) of J(h1 h2 ) is maximized at a boundary point of h1 for each fixed h2 , where h = (h1 h2 ). For example, this occurs if ch (1 − α) is monotone increasing or decreasing in h1 for each fixed h2 ∈ H2 . 2.6. Asymptotic Size The exact and asymptotic sizes of a hybrid test are: (2.12)
∗ ExSzn (θ0 ) = sup Pθ0 γ (Tn (θ0 ) > cnb (1 − α)) γ∈Γ
AsySz(θ0 ) = lim sup ExSzn (θ0 ) n→∞
where Pθγ (·) denotes probability when the true parameters are (θ γ). We are interested in the “asymptotic size” of the test because it approximates the exact size. Uniformity over γ ∈ Γ , which is built into the definition of asymptotic size, is necessary for asymptotic results to give a good approximation to the exact size. The proof of Theorem 1 below shows that the asymptotic size of a hybrid test depends on the asymptotic distributions of the full-sample statistic Tn (θ0 ) and the subsampling statistic Tnbj (θ0 ) under sequences {γnh : n ≥ 1}. By Assumption B, the asymptotic distribution of Tn (θ0 ) is Jh . The asymptotic distribution of Tnbj (θ0 ) under {γnh : n ≥ 1} is shown to be Jg for some g ∈ H. Given h ∈ H, under {γnh : n ≥ 1} not all g ∈ H are possible indices for the asymptotic distribution of Tnbj (θ0 ). The set of all possible pairs of localization parameters (g h) is denoted GH and is defined by (2.13) GH = (g h) ∈ H × H : g = (g1 g2 ) h = (h1 h2 ) g2 = h2 and for m = 1 p (i) g1m = 0 if |h1m | < ∞ (ii) g1m ∈ R+∞ if h1m = +∞ and (iii) g1m ∈ R−∞ if h1m = −∞ where g1 = (g11 g1p ) ∈ H1 and h1 = (h11 h1p ) ∈ H1 . Note that for (g h) ∈ GH, we have |g1m | ≤ |h1m | for all m = 1 p. In the “continuous limit” case (defined as the case where there is no γ1 component of γ), GH simplifies considerably: GH = {(g2 h2 ) ∈ H2 × H2 : g2 = h2 }. See AG1 for further discussion of GH. Define (2.14) MaxHyb (α) = sup 1 − Jh max{cg (1 − α) c∞ (1 − α)} (gh)∈GH
If Jh (x) is continuous at suitable (h x) values, then the following assumption holds.
HYBRID AND SIZE-CORRECTED SUBSAMPLING
737
ASSUMPTION T: MaxHyb (α) = Max−Hyb (α), where Max−Hyb (α) is defined as MaxHyb (α) is defined in (2.14), but with Jh (x) replaced by Jh (x−), where x = max{cg (1 − α) c∞ (1 − α)}. Assumption T holds in the post-conservative model selection example by the continuity of Jh∗ (x) in x for x ∈ R for all h ∈ H. It also holds in all of the examples we have considered except the moment inequality example.9 The following result establishes the asymptotic size of the hybrid test. THEOREM 1: Suppose Assumptions A–G, K, and T hold. Then the hybrid test based on Tn (θ0 ) has AsySz(θ0 ) = MaxHyb (α). COMMENT: Theorem 1 holds by the proof of Theorem 1(ii) of AG1 with cnb (1 − α) replaced by max{cnb (1 − α) c∞ (1 − α)} throughout using a slight variation of Lemma 5(ii) of AG1. 2.7. Properties of Hybrid Tests The following result shows that the hybrid test has better size properties than the subsampling test. It is shown in AG1 that the subsampling test has asymptotic size that satisfies AsySz(θ0 ) = MaxSub (α), where MaxSub (α) is defined just as MaxHyb (α) is, but with cg (1 − α) in place of max{cg (1 − α) c∞ (1 − α)}. LEMMA 1: Suppose Assumptions A–G, K, and T hold. Then either (i) the addition of c∞ (1 − α) to the subsampling critical value is irrelevant asymptotically (i.e., ch (1 − α) ≥ c∞ (1 − α) for all h ∈ H and MaxHyb (α) = MaxSub (α)) or (ii) the nominal level α subsampling test over-rejects asymptotically (i.e., AsySz(θ0 ) > α) and the hybrid test reduces the asymptotic over-rejection for at least one parameter value (g h) ∈ GH. Next, we show that the hybrid test has correct size asymptotically if ch (1 − α) is maximized at h∞ or is maximized at h0 = (0 h2 ) and p = 1, where p is the dimension of h1 and h∞ is any h ∈ H for which Jh = J∞ . For example, for p = 1, the maximization condition is satisfied if ch (1 − α) is monotone increasing or decreasing in h1 , is bowl-shaped in h1 , or is wiggly in h1 with global maximum at 0 or ±∞. The precise condition is the following. (Here, “Quant” abbreviates quantile.) ASSUMPTION QUANT: (i)(a) for all h ∈ H, ch (1 − α) ≤ c∞ (1 − α) and (b) J∞ (c∞ (1 − α)) = 1 − α, or (ii)(a) p = 1, (b) for all h ∈ H, ch (1 − α) ≤ ch0 (1 − α), and (c) J∞ (c∞ (1 − α)) = 1 − α. 9 Assumption T is not needed in the moment inequality example because subsampling has correct asymptotic size in that example; see Andrews and Guggenberger (2009c).
Assumption Quant(i)(b) and (ii)(c) are continuity conditions that are not restrictive.

LEMMA 2: Suppose Assumptions A–G, K, T, and Quant hold. Then the hybrid test based on Tn(θ0) has AsySz(θ0) = α.

3. SIZE-CORRECTED TESTS

3.1. Definition and Justification of Size-Corrected Tests

We now define size-corrected (SC) tests. The size-corrected fixed-critical-value (SC-FCV), subsampling (SC-Sub), and hybrid (SC-Hyb) tests with nominal level α are defined to reject the null hypothesis H0 : θ = θ0 when

(3.1)  Tn(θ0) > cv(1 − α),
       Tn(θ0) > cnb(1 − α) + κ(α),
       Tn(θ0) > max{cnb(1 − α), c∞(1 − α) + κ*(α)},
respectively, where

(3.2)  cv(1 − α) = sup_{h∈H} ch(1 − α),
       κ(α) = sup_{(g,h)∈GH} [ch(1 − α) − cg(1 − α)],
       κ*(α) = sup_{h∈H*} ch(1 − α) − c∞(1 − α),
       H* = {h ∈ H : for some (g, h) ∈ GH, cg(1 − α) < ch(1 − α)}.
If H* is empty, then κ*(α) = −∞ by definition. Size-correction as in (3.1) is possible under the following assumption.

ASSUMPTION L: (i) sup_{h∈H} ch(1 − α) < ∞ and (ii) inf_{h∈H} ch(1 − α) > −∞.

Assumption L is satisfied in most, but not all, examples. Assumption L holds in the post-conservative model selection example because ch(1 − α) is continuous in h ∈ H and has finite limits as |h1| → ∞ and/or |h2| → 1 − ζ. Assumption L(i) is a necessary and sufficient condition for size-correction of the FCV test. Necessary and sufficient conditions for size-correction of the subsampling and hybrid tests are given in Andrews and Guggenberger (2010b). These conditions are weaker than Assumption L, but more complicated. Even the weaker conditions are violated in some examples, for example, in the consistent model selection/superefficient example in Andrews and Guggenberger (2009a). In some cases the FCV test cannot be size-corrected because cv(1 − α) = ∞, but the SC-Sub and SC-Hyb tests still exist and have correct asymptotic size.
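As a concrete illustration, the constants in (3.2) reduce to simple grid maximizations once the quantiles ch(1 − α) have been tabulated. The following sketch assumes such a table is available; the dictionary c, the scalar c_inf, and the list gh_pairs of grid pairs in GH are hypothetical inputs, not objects defined in the paper.

    import numpy as np

    def sc_constants(c, c_inf, gh_pairs):
        """c: dict mapping a grid value h to c_h(1-alpha); gh_pairs: list of (g, h) pairs in GH."""
        cv = max(c.values())                                   # cv(1-alpha)
        kappa = max(c[h] - c[g] for g, h in gh_pairs)          # kappa(alpha)
        h_star = [h for g, h in gh_pairs if c[g] < c[h]]       # grid approximation to H*
        kappa_star = (max(c[h] for h in h_star) - c_inf) if h_star else -np.inf
        return cv, kappa, kappa_star

The quality of the approximation depends on how finely the grid covers H and GH, in line with the computational discussion that follows.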
Also, in some cases, the SC-FCV and SC-Hyb tests exist while the SC-Sub test does not (because κ(α) = ∞). Surprisingly, both cases arise in the instrumental variables (IV) example considered in Andrews and Guggenberger (2010b) (depending upon whether one considers symmetric two-sided or upper one-sided tests).
The following is a continuity condition that is not very restrictive.

ASSUMPTION M: (a)(i) For some h* ∈ H, ch*(1 − α) = sup_{h∈H} ch(1 − α) and (ii) for all h* ∈ H that satisfy the condition in part (i), Jh*(x) is continuous at x = ch*(1 − α). (b)(i) For some (g*, h*) ∈ GH, ch*(1 − α) − cg*(1 − α) = sup_{(g,h)∈GH} [ch(1 − α) − cg(1 − α)] and (ii) for all (g*, h*) ∈ GH that satisfy the condition in part (i), Jh*(x) is continuous at x = ch*(1 − α). (c)(i) When H* is nonempty, for some h* ∈ H*, ch*(1 − α) = sup_{h∈H*} ch(1 − α) and (ii) for all (g, h) ∈ GH with max{cg(1 − α), c∞(1 − α) + κ*(α)} = ch(1 − α), Jh(x) is continuous at x = ch(1 − α).

Assumption M holds in the post-conservative model selection example by the continuity of ch(1 − α) in h ∈ H plus the shape of ch(1 − α) as a function of h1 for each |h2| ≤ 1 − ζ (which is determined by simulation); see Figure 2 in Section 8.
The following result shows that the SC tests have AsySz(θ0) equal to their nominal level under suitable assumptions.

THEOREM 2: Suppose Assumptions A–G and K–M hold. Then the SC-FCV, SC-Sub, and SC-Hyb tests satisfy AsySz(θ0) = α.

COMMENTS: (i) The proof of Theorem 2 can be altered slightly to prove that limn→∞ supγ∈Γ Pθ0,γ(Tn(θ0) > cv(1 − α)) = α for the SC-FCV test under the given assumptions (which is slightly stronger than the result in Theorem 2) and analogously for the SC-Sub and SC-Hyb tests.
(ii) Assumptions C–G are only used for the SC-Sub and SC-Hyb tests. Assumption K is only used for the SC-Hyb test. Part (a) of Assumption M is only used for the SC-FCV test and, analogously, part (b) only for the SC-Sub test and part (c) only for the SC-Hyb test.
To compute cv(1 − α), κ(α), and κ*(α), one needs to be able to compute ch(1 − α) for h ∈ H and carry out maximization over h ∈ H or (g, h) ∈ GH. Computation of ch(1 − α) can be done analytically in some cases, by numerical integration if the density of Jh is available, or by simulation. The maximization step may range in difficulty from being very easy to nearly impossible, depending on the dimension p + q of h, the shape and smoothness of ch(1 − α) as a function of h, and the time needed to compute ch(1 − α) for any given h. For a given example, one can tabulate cv(1 − α), κ(α), and κ*(α) for selected values
of α. Once this is done, the SC-FCV, SC-Sub, and SC-Hyb tests are as easy to apply as the corresponding noncorrected tests.
An alternative method of size-correcting subsampling and hybrid tests is to adjust the quantile of the test rather than to increase the critical value by a fixed amount; see the Supplemental Material.

3.2. Power Comparisons of Size-Corrected Tests

Here we compare the asymptotic power of the SC-FCV, SC-Sub, and SC-Hyb tests. Since all three tests employ the same test statistic Tn(θ0), the comparison is based on the magnitudes of the critical values of the tests for n large. The SC-FCV critical value is fixed. The other two critical values are random and their large-sample behavior depends on the sequence {γn ∈ Γ : n ≥ 1} of true parameters. We focus on the case in which these critical values do not depend on whether the null hypothesis is true, which typically holds when the subsample statistics are defined to satisfy Assumption Sub1 (and fails when they satisfy Assumption Sub2). The possible limits of the SC-Sub and SC-Hyb critical values under {γn,h} are

(3.3)  cg(1 − α) + κ(α) and max{cg(1 − α), c∞(1 − α) + κ*(α)} for g ∈ H
(see Lemma 6(v) of AG1). The relative magnitudes of the limits of the critical values are determined by the shapes of the quantiles cg(1 − α) as functions of g ∈ H.
The first result is that the SC-Hyb test is always at least as powerful as the SC-FCV test. This holds because for all g ∈ H,

(3.4)  max{cg(1 − α), c∞(1 − α) + κ*(α)} = max{cg(1 − α), sup_{h∈H*} ch(1 − α)} ≤ sup_{h∈H} ch(1 − α) = cv(1 − α).
The same is not true of the SC-Sub test vis-à-vis the SC-FCV test.
Next, Theorem S1 in the Supplemental Material shows that (a) if cg(1 − α) ≥ ch(1 − α) for all (g, h) ∈ GH, then the SC-Sub, SC-Hyb, Sub, and Hyb tests are equivalent asymptotically and are more powerful than the SC-FCV test; (b) if cg(1 − α) ≤ ch(1 − α) for all (g, h) ∈ GH, then the SC-FCV, SC-Hyb, FCV, and Hyb tests are equivalent asymptotically and are more powerful than the SC-Sub test; and (c) if H = H1 = R+∞ and ch(1 − α) is uniquely maximized at h* ∈ (0, ∞), then the SC-FCV and SC-Hyb tests are asymptotically equivalent and are either (i) more powerful than the SC-Sub test for all (g, h) ∈ GH or (ii) more powerful than the SC-Sub test for some values of (g, h) ∈ GH but less powerful for other values of (g, h) ∈ GH. Note that these power comparisons hold even if different subsample sizes are used for the hybrid and subsampling
procedures, provided both satisfy b → ∞ and b/n → 0 (because the asymptotic results do not depend on the specific choice of b).
These results show that the SC-Hyb test has some nice power properties. When the SC-Sub test dominates the SC-FCV test, the SC-Hyb test behaves like the SC-Sub test. When the SC-FCV test dominates the SC-Sub test, the SC-Hyb test behaves like the SC-FCV test. In none of the cases considered is the SC-Hyb test dominated by the SC-FCV or SC-Sub tests.

3.3. Plug-In Size-Corrected Tests

Here, we introduce improved size-correction methods that employ a consistent estimator γ̂n2 of the nuisance parameter γ2. The idea is to size-correct a test differently for different values of γ̂n2, rather than size-correct by a value that is sufficiently large to work uniformly for all γ2 ∈ Γ2. This yields a more powerful test. The estimator γ̂n2 is assumed to satisfy the following assumption.

ASSUMPTION N: γ̂n2 − γn2 →p 0 under all sequences {γn = (γn1, γn2, γn3) ∈ Γ : n ≥ 1}.

Assumption N holds in many cases, but it fails in models that are unidentified at the discontinuity point of the asymptotic distribution of Tn(θ0), as occurs in an IV regression model with IVs that may be weak. Define
(3.5)  cvh2(1 − α) = sup_{h1∈H1} c(h1,h2)(1 − α),
       κh2(α) = sup_{g1,h1∈H1 : ((g1,h2),(h1,h2))∈GH} [c(h1,h2)(1 − α) − c(g1,h2)(1 − α)],
       κ*h2(α) = sup_{h1∈H*h2} c(h1,h2)(1 − α) − c∞(1 − α),

where

       H*h2 = {h1 ∈ H1 : for some g1 ∈ H1, (g, h) = ((g1, h2), (h1, h2)) ∈ GH and cg(1 − α) < ch(1 − α)}.

If H*h2 is empty, then κ*h2(α) = −∞. The PSC-FCV, PSC-Sub, and PSC-Hyb tests are defined as in (3.1) with cv(1 − α), κ(α), and κ*(α) replaced by cvγ̂n2(1 − α), κγ̂n2(α), and κ*γ̂n2(α), respectively.
Clearly, cvγ̂n2(1 − α) ≤ cv(1 − α) (with strict inequality whenever γ̂n2 takes a value that does not maximize cvh2(1 − α) over h2 ∈ H2). In consequence, the PSC-FCV test is asymptotically more powerful than the SC-FCV test. Analogous results hold for the critical values and asymptotic power of the PSC-Sub and PSC-Hyb tests relative to the SC-Sub and SC-Hyb tests.
The following continuity assumption is not very restrictive.
ASSUMPTION O: (a)(i) cvh2(1 − α) is uniformly continuous in h2 on H2, (ii) for each h2 ∈ H2, there exists some h1* ∈ H1 such that c(h1*,h2)(1 − α) = cvh2(1 − α), and (iii) for all h = (h1, h2) ∈ H for which ch(1 − α) = cvh2(1 − α), Jh(x) is continuous at x = cvh2(1 − α). (b)(i) κh2(α) is uniformly continuous in h2 on H2, (ii) for each h2 ∈ H2, there exist some g1*, h1* ∈ H1 such that (g*, h*) = ((g1*, h2), (h1*, h2)) ∈ GH and c(h1*,h2)(1 − α) − c(g1*,h2)(1 − α) = κh2(1 − α), and (iii) for all (g, h) ∈ GH for which ch(1 − α) − cg(1 − α) = κh2(1 − α), where h = (h1, h2), Jh(x) is continuous at x = cg(1 − α) + κh2(1 − α). (c)(i) κ*h2(α) is uniformly continuous in h2 on H2, (ii) for each h2 ∈ H2, when H*h2 is nonempty, we have, for some h1* ∈ H*h2, c(h1*,h2)(1 − α) − c∞(1 − α) = κ*h2(1 − α), and (iii) for all (g, h) ∈ GH for which ch(1 − α) − c∞(1 − α) = κ*h2(1 − α), where h = (h1, h2), Jh(x) is continuous at x = max{cg(1 − α), c∞(1 − α) + κ*h2(1 − α)}.

Assumption O holds in the post-conservative model selection example given the definition of Jh*(x) in (2.8).

THEOREM 3: Suppose Assumptions A–G, K, L, N, and O hold. Then (a) cvγ̂n2(1 − α) − cvγn2(1 − α) →p 0, κγ̂n2(α) − κγn2(α) →p 0, and κ*γ̂n2(α) − κ*γn2(α) →p 0 under all sequences {γn = (γn1, γn2, γn3) ∈ Γ : n ≥ 1} and (b) the PSC-FCV, PSC-Sub, and PSC-Hyb tests satisfy AsySz(θ0) = α.

COMMENT: Assumption O(a) is only used for the PSC-FCV test and likewise part (b) is only used for the PSC-Sub test and part (c) for the PSC-Hyb test.

4. FINITE-SAMPLE ADJUSTMENTS

In this section, we introduce a finite-sample adjustment to the AsySz(θ0) of subsampling and hybrid tests. It is designed to give a better approximation to the actual finite-sample sizes of these tests than does AsySz(θ0). The adjustments are used to construct finite-sample adjusted size-corrected (ASC) subsampling and hybrid tests, both with and without plug-in estimation of h2. The idea of the adjustment is to retain the actual ratio δn = b/n of the subsample size to the full-sample size in the approximation to the exact size of the tests, rather than to use its asymptotic limit, which is zero.
The adjustment method is described roughly as follows. For simplicity, consider the case in which γ does not contain subvectors γ2 or γ3, p = 1, and Γ = [0, d] for some 0 < d < ∞. Under Assumption B, the distribution of Tn(θ0) under γ can be approximated by Jhn, where hn = n^r γ. Hence, the distribution of Tb(θ0) under γ can be approximated by Jh*n, where h*n = b^r γ = (b/n)^r hn = δn^r hn. In turn, the 1 − α subsampling quantile cnb(1 − α) under γ
can be approximated by the 1 − α quantile of Jh*n = Jδn^r hn, namely cδn^r hn(1 − α). This leads to the approximation of Pθ0,γ(Tn(θ0) > cnb(1 − α)) by

(4.1)  1 − Jhn(cδn^r hn(1 − α)),

and it leads to the approximation of supγ∈Γ Pθ0,γ(Tn(θ0) > cnb(1 − α)) by

(4.2)  AsySzn(θ0) = sup_{h∈H} [1 − Jh(cδn^r h(1 − α))].

Suppose Jh(cg(1 − α)) is a continuous function of (g, h) at each (g, h) ∈ GH and that Assumption C(ii) holds, that is, δn = b/n → 0. Then, as n → ∞, the quantity in (4.1) approaches 1 − Jh(c0(1 − α)) if hn → h ∈ [0, ∞). It approaches 1 − J∞(cg(1 − α)) if hn → ∞ and δn^r hn → g ∈ [0, ∞]. Hence, for any (g, h) ∈ GH, limn→∞ (1 − Jhn(cδn^r hn(1 − α))) = 1 − Jh(cg(1 − α)) for a suitable choice of {hn ∈ H : n ≥ 1}. This suggests that

(4.3)  limn→∞ sup_{h∈H} [1 − Jh(cδn^r h(1 − α))] = sup_{(g,h)∈GH} [1 − Jh(cg(1 − α))] = AsySz(θ0).

It is shown below that (4.3) does hold, which implies that AsySzn(θ0) is an asymptotically valid finite-sample adjustment to AsySz(θ0).
We now consider the general case in which γ may contain subvectors γ2 and γ3, and p ≥ 1. In this case, only the subvector γ1 affects whether γ is near a discontinuity point of the limit distribution. In consequence, only h1, and not h2, is affected by the δn^r rescaling that occurs above. For a subsampling test, we define

(4.4)  AsySzn(θ0) = sup_{h=(h1,h2)∈H} [1 − Jh(c(δn^r h1, h2)(1 − α))].
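The following is a rough Monte Carlo sketch of (4.4) for the simple case r = 1, scalar h1, and no h2 component; draw_J (a sampler from Jh) and c (a table of the quantiles ch(1 − α) on the same grid) are hypothetical helpers, not objects from the paper. The only difference from the unadjusted calculation is that the ratio δn = b/n is retained instead of being replaced by its limit zero.

    import numpy as np

    def asy_size_n(alpha, b, n, h1_grid, draw_J, c, n_sim=100000, seed=0):
        rng = np.random.default_rng(seed)
        delta = b / n                                  # retained ratio delta_n = b/n
        worst = 0.0
        for h1 in h1_grid:
            draws = draw_J(h1, n_sim, rng)
            # subsampling quantile approximated by the grid point closest to delta_n^r * h1 (r = 1 here)
            g1 = min(h1_grid, key=lambda v: abs(v - delta * h1))
            worst = max(worst, float(np.mean(draws > c[g1])))
        return worst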
Next, we use the finite-sample adjustment to construct adjusted SC-Type and PSC-Type tests for Type = Sub and Hyb, which are denoted ASC-Type and APSC-Type tests. For δ ∈ (0, 1) and h2 ∈ H2, define

(4.5)  κ(δ, α) = sup_{h=(h1,h2)∈H} [c(h1,h2)(1 − α) − c(δ^r h1, h2)(1 − α)],
       κh2(δ, α) = sup_{h1∈H1} [c(h1,h2)(1 − α) − c(δ^r h1, h2)(1 − α)],
       κ*(δ, α) = sup_{h∈H*(δ)} ch(1 − α) − c∞(1 − α),
       κ*h2(δ, α) = sup_{h1∈H*h2(δ)} c(h1,h2)(1 − α) − c∞(1 − α),

where

       H*(δ) = {h ∈ H : c(δ^r h1, h2)(1 − α) < c(h1,h2)(1 − α) for h = (h1, h2)} and
       H*h2(δ) = {h1 ∈ H1 : c(δ^r h1, h2)(1 − α) < c(h1,h2)(1 − α)}.

If H*(δ) is empty, then κ*(δ, α) = −∞. If H*h2(δ) is empty, then κ*h2(δ, α) = −∞. The ASC-Sub and ASC-Hyb tests are defined as in (3.1) with κ(α) and κ*(α) replaced by κ(δn, α) and κ*(δn, α), respectively, where δn = b/n. The APSC-Sub and APSC-Hyb tests are defined as in (3.1) with κ(α) and κ*(α) replaced by κγ̂n2(δn, α) and κ*γ̂n2(δn, α), respectively.
We use the following assumptions.

ASSUMPTION P: (i) The function (g, h) → Jh(cg(1 − α)) for (g, h) ∈ H × H is continuous at all (g, h) ∈ GH and (ii) MaxSub(α) = Max⁻Sub(α), where the latter are defined as MaxHyb(α) is defined in (2.14) but with cg(1 − α) in place of max{cg(1 − α), c∞(1 − α)} and in addition Max⁻Sub(α) has Jh(x) replaced by Jh(x−).

ASSUMPTION Q: ch(1 − α) is continuous in h on H.

ASSUMPTION R: Either H* is nonempty and sup_{h∈H†} ch(1 − α) ≤ sup_{h∈H*} ch(1 − α), where H† = {h ∈ H : h = limk→∞ hvk for some subsequence {vk} and some hvk ∈ H*(δvk) for all k ≥ 1}, or H* is empty and H*(δ) is empty for all δ > 0 sufficiently close to zero.

ASSUMPTION S: For all h2 ∈ H2, either H*h2 is nonempty and sup_{h1∈H†h2} c(h1,h2)(1 − α) ≤ sup_{h1∈H*h2(δ)} c(h1,h2)(1 − α), where H†h2 = {h1 ∈ H1 : h1 = limk→∞ hvk,1 for some subsequence {vk} and some hvk,1 ∈ H*γvk,2(δvk) for all k ≥ 1, where limk→∞ γvk,2 = h2}, or H*h2 is empty and H*h2(δ) is empty for all δ > 0 sufficiently close to zero.

Assumption P is a mild continuity assumption. Assumptions Q, R, and S are not restrictive in most examples. Whether Assumptions R and S hold depends primarily on the shape of ch(1 − α) as a function of h. It is possible for Assumptions R and S to be violated, but only for quite specific and unusual shapes for ch(1 − α). For example, Assumption R is violated in the case where p = 1 and no parameter h2 exists if for some h* ∈ (0, ∞) the graph of ch(1 − α) is (i) bowl-shaped for h ∈ [0, h*] with c0(1 − α) = ch*(1 − α) and (ii) strictly decreasing for h > h* with c∞(1 − α) < ch(1 − α) for all 0 ≤ h < ∞. In this case, H* is empty (because ch(1 − α) takes on its minimum for h = ∞ and its maximum at h = 0), but h* ∈ H*(δ) for all δ ∈ (0, 1), which contradicts Assumption R.
The following result shows that AsySzn(θ0) provides an asymptotically valid finite-sample adjustment to AsySz(θ0) that depends explicitly on the ratio δn = b/n, and that the ASC and APSC tests have AsySz(θ0) = α.
THEOREM 4: (a) Suppose Assumptions A–G and P hold. Then a subsampling test satisfies

       limn→∞ AsySzn(θ0) = AsySz(θ0).

(b) Suppose Assumptions A–G, K–M, Q, and R hold. Then (i) limn→∞ κ(δn, α) = κ(α) and limn→∞ κ*(δn, α) = κ*(α), and (ii) the ASC-Sub and ASC-Hyb tests satisfy AsySz(θ0) = α.
(c) Suppose Assumptions A–G, K, L, N, O, Q, and S hold. Then (i) κγ̂n2(δn, α) − κγn2(α) →p 0 and κ*γ̂n2(δn, α) − κ*γn2(α) →p 0 under all sequences {γn = (γn1, γn2, γn3) ∈ Γ : n ≥ 1}, and (ii) the APSC-Sub and APSC-Hyb tests satisfy AsySz(θ0) = α.

COMMENTS: (i) An analogous result to Theorem 4(a) holds for the hybrid test with c(δ^r h1, h2)(1 − α) replaced by max{c(δ^r h1, h2)(1 − α), c∞(1 − α)} in (4.4).
(ii) In Theorem 4(b), the ASC-Hyb test satisfies lim infn→∞ κ*(δn, α) ≥ κ*(α) and AsySz(θ0) ≤ α without imposing Assumption R. Assumption R is a necessary and sufficient condition for limn→∞ κ*(δn, α) = κ*(α) given the other assumptions.

5. EQUAL-TAILED TESTS

This section considers equal-tailed two-sided hybrid t tests. For brevity, equal-tailed SC and APSC t tests are discussed in the Supplemental Material.
We suppose that Tn(θ0) = τn(θ̂n − θ0)/σ̂n, where θ̂n is an estimator of a scalar parameter θ based on a sample of size n, σ̂n (∈ R) is an estimator of the scale of θ̂n, and τn is a normalization constant, usually equal to n^{1/2}. An equal-tailed hybrid t test of H0 : θ = θ0 versus H1 : θ ≠ θ0 of nominal level α (∈ (0, 1/2)) rejects H0 when

(5.1)  Tn(θ0) > c*nb(1 − α/2) or Tn(θ0) < c**nb(α/2),

where

       c*nb(1 − α/2) = max{cnb(1 − α/2), c∞(1 − α/2)},
       c**nb(α/2) = min{cnb(α/2), c∞(α/2)}.
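As an illustration, here is a minimal sketch of the rejection rule in (5.1), assuming for concreteness that J∞ is standard normal (as it is in the autoregressive example of Section 7) and that the full-sample t statistic and the array of subsample t statistics have already been computed; the names are illustrative.

    import numpy as np
    from scipy.stats import norm

    def equal_tailed_hybrid_reject(t_full, t_sub, alpha=0.05):
        # hybrid critical values: blend subsample quantiles with fixed N(0,1) quantiles
        c_hi = max(np.quantile(t_sub, 1 - alpha / 2), norm.ppf(1 - alpha / 2))
        c_lo = min(np.quantile(t_sub, alpha / 2), norm.ppf(alpha / 2))
        return (t_full > c_hi) or (t_full < c_lo)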
Define

(5.2)  MaxET,Hyb(α) = sup_{(g,h)∈GH} [1 − Jh(c*g(1 − α/2)) + Jh(c**g(α/2))],

where c*g(1 − α/2) = max{cg(1 − α/2), c∞(1 − α/2)} and c**g(α/2) = min{cg(α/2), c∞(α/2)}. If Jh(x) is continuous at suitable (h, x) values, then the following assumption holds.
ASSUMPTION TET: Maxʳ⁻ET,Hyb(α) = Max⁻ET,Hyb(α), where Maxʳ⁻ET,Hyb(α) is defined as MaxET,Hyb(α) is defined in (5.2) but with Jh(c**g(α/2)−) in place of Jh(c**g(α/2)) (where Jh(c**g(α/2)−) denotes the limit from the left of Jh(x) at x = c**g(α/2)) and Max⁻ET,Hyb(α) is defined as in (5.2) with Jh(c*g(1 − α/2)−) in place of Jh(c*g(1 − α/2)).
Assumption TET holds in the post-conservative model selection example by the continuity of Jh*(x) in x for x ∈ R for all h ∈ H.
The proof of Theorem 1 of AG1 can be adjusted straightforwardly to yield the following result for equal-tailed hybrid t tests.

COROLLARY 1: Let α ∈ (0, 1/2) be given. Let Tn(θ0) = τn(θ̂n − θ0)/σ̂n. Suppose Assumptions A–E, G, J, K, and TET hold. Then an equal-tailed hybrid t test satisfies AsySz(θ0) = MaxET,Hyb(α).

6. CONFIDENCE INTERVALS

This section introduces hybrid and size-corrected CIs for a parameter θ ∈ Rd when nuisance parameters η ∈ Rs and γ3 ∈ T3 may appear. (See Andrews and Guggenberger (2009c) for results concerning FCV and subsampling CIs.) The confidence level of a CI for θ requires uniformity over θ as well as over (η, γ3). We make θ and η subvectors of γ so that the results from previous sections, which are uniform over γ ∈ Γ, give the uniformity results that we need for CIs for θ.10 Specifically, we partition θ into (θ1, θ2), where θj ∈ Rdj for j = 1, 2, and we partition η into (η1, η2), where ηj ∈ Rsj for j = 1, 2. Then we consider the same set-up as in Section 2.2 where γ = (γ1, γ2, γ3), but with γ1 = (θ1, η1) and γ2 = (θ2, η2), where p = d1 + s1 and q = d2 + s2. In most examples, either no parameter θ1 or θ2 appears (i.e., d1 = 0 or d2 = 0) and either no parameter η1 or η2 appears (i.e., s1 = 0 or s2 = 0).
We consider a test statistic Tn(θ0) for testing the null hypothesis H0 : θ = θ0 as above. We obtain CIs for θ by inverting tests based on Tn(θ0). Let Θ (⊂ Rd) denote the parameter space for θ and let Γ denote the parameter space for γ. Hybrid CIs for θ are defined by

(6.1)  CIn = {θ0 ∈ Θ : Tn(θ0) ≤ c1−α},

where c1−α = max{cnb(1 − α), c∞(1 − α)}. The critical value c1−α does not depend on θ0 when Assumption Sub1 holds, but does depend on θ0 when Assumption Sub2 holds through the dependence of the subsample statistic on θ0.

10 Of course, with this change, the index parameter h, the asymptotic distributions {Jh : h ∈ H}, and the assumptions are different in any given model in this CI section from the earlier sections on testing.
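For the t-statistic case in (6.2) below, the symmetric two-sided hybrid CI takes a particularly simple form. The following sketch assumes Assumption Sub1, τn = n^{1/2}, and, purely for illustration, that J∞ is the distribution of the absolute value of a standard normal variable; the inputs are illustrative names, not objects from the paper.

    import numpy as np
    from scipy.stats import norm

    def hybrid_symmetric_ci(theta_hat, sigma_hat, n, t_sub_abs, alpha=0.05):
        # hybrid critical value: max of the subsample quantile of |t| and the fixed critical value
        c = max(np.quantile(t_sub_abs, 1 - alpha), norm.ppf(1 - alpha / 2))
        half = c * sigma_hat / np.sqrt(n)
        return theta_hat - half, theta_hat + half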
For example, suppose Tn(θ0) is an (i) upper one-sided, (ii) lower one-sided, or (iii) symmetric two-sided t test of nominal level α and Assumption Sub1 holds. Then the corresponding CI of nominal level α is defined by

(6.2)  CIn = [θ̂n − τn⁻¹σ̂n c1−α, ∞),  CIn = (−∞, θ̂n + τn⁻¹σ̂n c1−α],  or
       CIn = [θ̂n − τn⁻¹σ̂n c1−α, θ̂n + τn⁻¹σ̂n c1−α],

respectively (provided Θ is R). The exact and asymptotic confidence sizes of CIn are

(6.3)  ExCSn = infγ∈Γ Pγ(Tn(θ) ≤ c1−α) and AsyCS = lim infn→∞ ExCSn,

respectively, where θ = (θ1, θ2) and probabilities are indexed by γ = ((θ1, η1), (θ2, η2), γ3) here, whereas they are indexed by (θ, γ) in earlier sections.
An equal-tailed hybrid CI for θ of nominal level α is defined by

(6.4)  CIn = [θ̂n − τn⁻¹σ̂n c*nb(1 − α/2), θ̂n − τn⁻¹σ̂n c**nb(α/2)],

where c*nb(1 − α/2) and c**nb(α/2) are defined in (5.1).
An analogue of Theorem 4 holds regarding the finite-sample adjusted-asymptotic sizes of subsampling and hybrid CIs. In this case, AsyCSn is defined as AsySzn is defined in (4.4) but with suph∈H replaced by infh∈H and Jh replaced by 1 − Jh.
Next, we consider size-corrected CIs. SC-FCV, SC-Sub, and SC-Hyb CIs are defined by (6.1) with their critical values, c1−α, defined as in (3.1) and (3.2) for SC tests. The following are changes in the assumptions for use with CIs.
ASSUMPTION ADJUSTMENTS FOR CIS: (i) θ is a subvector of γ, rather than a separate parameter from γ. In particular, γ = (γ1, γ2, γ3) = ((θ1, η1), (θ2, η2), γ3) for θ = (θ1, θ2) and η = (η1, η2). (ii) Instead of the true probabilities under a sequence {γn,h : n ≥ 1} being {Pθ0,γn,h(·) : n ≥ 1}, they are {Pγn,h(·) : n ≥ 1}. (iii) The test statistic Tn(θ0) is replaced in the assumptions under a true sequence {γn,h : n ≥ 1} by Tn(θn,h), where γn,h = ((θn,h,1, ηn,h,1), (θn,h,2, ηn,h,2), γn,h,3) and θn,h = (θn,h,1, θn,h,2). (iv) In Assumption D, θ0 in Tn,bn,j(θ0) and Tbn(θ0) is replaced by θ, where θ = (θ1, θ2) and γ = ((θ1, η1), (θ2, η2), γ3). (v) θ0 is replaced in the definition of Un,b(x) by θn when the true parameter is γn = ((θn1, ηn1), (θn2, ηn2), γn3) and θn = (θn1, θn2).

Hybrid and size-corrected CIs satisfy the following results.

COROLLARY 2: Let the assumptions be adjusted for CIs as stated above.
(a) Suppose Assumptions A–G, K, and T hold. Then the hybrid CI satisfies AsyCS = 1 − MaxHyb(α).
(b) Let α ∈ (0, 1/2) be given. Suppose Assumptions A–E, G, J, K, and TET hold. Then the equal-tailed hybrid t CI satisfies AsyCS = 1 − MaxET,Hyb(α).
(c) Suppose Assumptions A–G and K–M hold. Then the SC-FCV, SC-Sub, and SC-Hyb CIs satisfy AsyCS = 1 − α.

COMMENT: Corollary 2(a), (b), and (c) hold by the same arguments as for Theorem 1, Corollary 1, and Theorem 2, respectively, with some adjustments.
Definitions and results for CIs of the form PSC-Type for Type = FCV, Sub, and Hyb, and ASC-Type and APSC-Type for Type = Sub and Hyb are analogous to those just stated for SC CIs but with critical values as defined in Sections 3.3 and 4, rather than as in Section 3.1. Size-corrected equal-tailed CIs are defined as in (6.4) with critical values c1−α/2 and cα/2 given by the equal-tailed SC, PSC, ASC, and/or APSC critical values for tests given in the Supplemental Material in place of c*nb(1 − α/2) and c**nb(α/2).

7. CI FOR AN AUTOREGRESSIVE PARAMETER

We now apply the general results above to an AR(1) model with conditional heteroskedasticity. We use the unobserved components representations of the AR(1) model. The observed time series {Yi : i = 0, ..., n} is based on a latent no-intercept AR(1) time series {Yi* : i = 0, ..., n}:
(7.1)  Yi = α + βi + Yi*,
       Yi* = ρY*i−1 + Ui  for i = 1, ..., n,

where ρ ∈ [−1 + ε, 1] for some 0 < ε < 2, {Ui : i = 0, ±1, ±2, ...} are stationary and ergodic with conditional mean 0 given a σ-field Gi−1 defined below, conditional variance σi² = E(Ui²|Gi−1), unconditional variance σU² ∈ (0, ∞), and distribution F. The distribution of Y0* is the distribution that yields strict stationarity for {Yi* : i ≤ n} when ρ < 1, that is, Y0* = Σ∞j=0 ρ^j U−j, and is arbitrary when ρ = 1. We consider two versions of the AR(1) model—Model 1, which has an intercept, and Model 2, which has an intercept and time trend. Model 1 is obtained by setting β = 0 in (7.1). In the notation above, we have θ = 1 − ρ ∈ Θ = [0, 2 − ε]. Models 1 and 2 can be rewritten, respectively, as

(7.2)  Yi = α̃ + ρYi−1 + Ui,  where α̃ = α(1 − ρ),
       Yi = α̃ + β̃i + ρYi−1 + Ui,  where α̃ = α(1 − ρ) + ρβ and β̃ = β(1 − ρ),
for i = 1, ..., n.11
We consider a feasible quasi-GLS (FQGLS) t statistic based on estimators {φ̂²ni : i ≤ n} of the conditional variances {σ²i : i ≤ n}. The estimators {φ̂²ni : i ≤ n} may be based on a parametric specification of the conditional heteroskedasticity, such as a GARCH(1,1) model, or a nonparametric procedure, such as one based on q lags of the observations. In either case, we do not assume that the estimator of conditional heteroskedasticity is consistent. For example, we allow for incorrect specification of the parametric model in the former case and conditional heteroskedasticity that depends on more than q lags in the latter case. The estimated conditional variances {φ̂²ni : i ≤ n} are defined such that they approximate a stationary Gi−1-adapted sequence {φ²i : i ≤ n} in the sense that certain normalized sums have the same asymptotic distribution whether φ̂²ni or φ²i appears in the sum. This is a standard property of feasible and infeasible GLS estimators.
For example, for the model without a time trend, the results cover the case where (i) {φ̂²ni : i ≤ n} are based on a GARCH(1,1) parametric model estimated using LS residuals with GARCH and LS parameter estimators π̂n and (α̂n, ρ̂n), respectively, (ii) (α̂n, ρ̂n) have probability limits given by the true values (α̃0, ρ0) (see (7.2)), (iii) π̂n has a probability limit given by the "pseudo-true" value π0, (iv) φ̂²ni = φ²i1(α̂n, ρ̂n, π̂n), where φ²i1(α̃, ρ, π) is the ith GARCH conditional variance based on a start-up at time 1 and parameters (α̃, ρ, π), and (v) φ²i,−∞(α̃, ρ, π) is the GARCH conditional variance based on a start-up at time −∞. In this case, φ²i = φ²i,−∞(α̃0, ρ0, π0). Thus, φ²i is just φ̂²ni with the estimation error and start-up truncation removed.
Under the null hypothesis that ρ = ρn = 1 − θn, the studentized t statistic is

(7.3)  T*n(θn) = τn(ρ̂ − ρn)/σ̂,

where τn = n^{1/2}, ρ̂ is the LS estimator from the regression of Yi/φ̂i on Yi−1/φ̂i and 1/φ̂i in the case of Model 1, and from the regression of Yi/φ̂i on Yi−1/φ̂i, 1/φ̂i, and i/φ̂i in the case of Model 2, and σ̂² is the (1,1) element of the standard heteroskedasticity-robust variance estimator for the LS estimator in the preceding regression.
To define T*n(θn) more explicitly, let Y, U, X1, and X2 be n-vectors with ith elements given by Yi/φ̂i, Ui/φ̂i, Yi−1/φ̂i, and 1/φ̂i, respectively, in Models 1 and 2, except in Model 2 let X2 be the n × 2 matrix with ith row (1/φ̂i, i/φ̂i). Let Δ̂ be the diagonal n × n matrix with ith diagonal element given by the ith element of the residual vector MXY, where X = [X1 : X2] and MX = In − X(X′X)⁻¹X′.
11 The advantage of writing the model as in (7.1) becomes clear here. For example, in Model 1, the case ρ = 1 and α̃ ≠ 0 is automatically ruled out by model (7.1). This is a case where Yi is dominated by a deterministic trend and the LS estimator of ρ converges at rate n^{3/2}.
That is, Δ̂ = Diag(MXY). Then, by definition,

(7.4)  ρ̂ = (X1′MX2X1)⁻¹X1′MX2Y,
       σ̂² = (n⁻¹X1′MX2X1)⁻¹(n⁻¹X1′MX2Δ̂²MX2X1)(n⁻¹X1′MX2X1)⁻¹.
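To make (7.4) concrete, the following is a small numpy sketch for Model 1 (not the authors' code); the inputs y = (Y0, ..., Yn) and phi_hat = (φ̂1, ..., φ̂n) are illustrative names for the data and the estimated conditional standard deviations.

    import numpy as np

    def fqgls_rho_sigma2(y, phi_hat):
        yy = y[1:] / phi_hat                     # Y_i / phi_i
        x1 = (y[:-1] / phi_hat)[:, None]         # regressor Y_{i-1} / phi_i
        x2 = (1.0 / phi_hat)[:, None]            # intercept regressor 1 / phi_i
        n = len(yy)
        X = np.hstack([x1, x2])
        M2 = np.eye(n) - x2 @ np.linalg.solve(x2.T @ x2, x2.T)          # M_{X2}
        rho_hat = np.linalg.solve(x1.T @ M2 @ x1, x1.T @ M2 @ yy).item()
        resid = yy - X @ np.linalg.lstsq(X, yy, rcond=None)[0]          # M_X Y
        A = x1.T @ M2 @ x1 / n
        B = x1.T @ M2 @ np.diag(resid ** 2) @ M2 @ x1 / n
        sigma2_hat = (np.linalg.inv(A) @ B @ np.linalg.inv(A)).item()
        return rho_hat, sigma2_hat

Model 2 works the same way with X2 replaced by the n × 2 matrix of intercept and trend regressors.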
For upper one-sided, lower one-sided, and symmetric two-sided tests or CIs concerning ρ, we take Tn(θn) equal to T*n(θn), −T*n(θn), and |T*n(θn)|, respectively.
In this section, we provide results for the (infeasible) quasi-GLS estimator based on {φ²i : i ≤ n}. Conditions under which feasible and infeasible quasi-GLS estimators are asymptotically equivalent are technical and, for brevity, sufficient conditions are given in Andrews and Guggenberger (2008b). For technical reasons, these conditions take π̂n to be a discretized estimator and require φ̂²i to depend upon a finite number of lagged squared residuals. Neither of these conditions is particularly restrictive because the grid size for the discretized estimator can be defined such that there is little difference between the discretized and nondiscretized versions of the estimator of π, and any model with stationary conditional heteroskedasticity, such as a GARCH(1,1) model, can be approximated arbitrarily well by taking the number of lags sufficiently large.
By assumption, {(Ui, φ²i) : i ≥ 1} are stationary and strong mixing. We define Gi to be some nondecreasing sequence of σ-fields for i ≥ 1 for which (Uj, φ²j+1) ∈ Gi for all j ≤ i.
The vector of parameters is γ = (γ1, γ2, γ3), where γ1 = θ (= 1 − ρ), γ2 = (λ1, λ2, λ3, λ4, λ5, λ6, λ7) ∈ R7, where λ1 = VarF(Ui), λ2 = VarF(Ui/φ²i) = EF(σ²i/φ⁴i), λ3 = CovF(Ui, Ui/φ²i) = EF(σ²i/φ²i), λ4 = EFφi⁻¹, λ5 = EFφi⁻², λ6 = EFφi⁻⁴, and λ7 = CorrF(Ui, Ui/φ²i) = λ3/(λ1λ2)^{1/2}, γ3 = (α, F) in Model 1, and γ3 = (α, β, F) in Model 2.12 In this example, γ2 = η2 and no parameters θ2 or η1 appear. The distribution of the initial condition Y0* does not appear in γ3 because under strict stationarity it equals the stationary marginal distribution of Ui, and that is completely determined by F and γ1, and in the unit root case it is irrelevant. In the definition of γn,h, we take r = 1.
The parameter spaces are Γ1 = Θ = [0, 2 − ε] for some ε > 0, Γ2 ⊂ Γ2* = {(λ1, λ2, λ3, λ4, λ5, λ6, λ7) ∈ [ε2, ∞)² × (0, ∞) × [ε2, ∞)³ × [0, 1] : λ7 = λ3/(λ1λ2)^{1/2}} for some ε2 > 0, Γ3(γ2) = B1 × F(γ2) in Model 1, and Γ3(γ2) = B2 × F(γ2) in Model 2, where B1 and B2 are bounded subsets of R and R², respectively, and F(γ2) is the parameter space for the stationary and strong-
12 Note that Section 6 discusses CIs for θ, which is an element of γ, whereas here we consider CIs for ρ = 1 − θ, which is not an element of γ. However, a CI for θ immediately yields one for ρ.
mixing distribution F of {Ui : i = 1, 2, ...} for a given value of γ2.13 In particular, we have

(7.5)  F(γ2) = {F : {(Ui, φ²i) : i = 0, ±1, ±2, ...} are stationary and strong mixing under F with EF(Ui|Gi−1) = 0 a.s. and EF(Ui²|Gi−1) = σ²i a.s., where Gi is some nondecreasing sequence of σ-fields for i = 1, 2, ... for which (Uj, φ²j+1) ∈ Gi for all j ≤ i; the strong-mixing numbers {αF(m) : m ≥ 1} satisfy αF(m) ≤ Cm^{−3ζ/(ζ−3)} as m → ∞ for some ζ > 3; sup_{i,s,t,u,v,A} EF|Π_{a∈A} a|^ζ ≤ M, where 0 ≤ i, s, t, u, v < ∞ and A is any nonempty subset of {Ui−s, Ui−t, U²i+1, U−u, U−v, U²1}; φ²i ≥ δ a.s.; λmin(EF X̄1X̄1′U1²/φ²1) ≥ δ, where X̄1 = (Y0*/φ1, φ1⁻¹)′; VarF(Ui) = λ1, VarF(Ui/φ²i) = λ2, CovF(Ui, Ui/φ²i) = λ3, EFφi⁻¹ = λ4, EFφi⁻² = λ5, EFφi⁻⁴ = λ6, and CorrF(Ui, Ui/φ²i) = λ7, where γ2 = (λ1, λ2, λ3, λ4, λ5, λ6, λ7)}

for some C, M < ∞ and δ > 0, where λmin(A) denotes the minimum eigenvalue of a matrix A.
In the Supplemental Material we verify the assumptions of Corollary 2 concerning hybrid CIs, except Assumption B. The Supplemental Material uses Lemma 4 of AG1 to verify Assumption G. Assumption B holds by Theorem 1 in Andrews and Guggenberger (2008b). Slightly weaker assumptions than those in Corollary 2 yield asymptotic size results for FCV and subsampling CIs; see Theorem 3 in Andrews and Guggenberger (2009c). For brevity, we verify assumptions only for Model 1. The moment conditions in F(γ2) are used in the verification of Assumptions B and E for the case where ρ → 1 at a rate slower than n⁻¹. The bounding of φ²i away from zero in F(γ2) is not restrictive because it is a consequence of a suitable choice of φ̂²i.

13 The parameter space Γ2 is a subset of Γ2* because the elements of γ2 are related (given that they all depend on moments of (Ui, φi)) and Γ2* does not incorporate all of these restrictions. An example of a restriction is λ4² = (EFφi⁻¹)² ≤ EFφi⁻² · EF1 = λ5 by the Cauchy–Schwarz inequality. Although the restrictions on Γ2 are not written explicitly, this is not a problem because the subsampling and hybrid procedures do not depend on the specification of Γ2 and the size-correction procedures only depend on λ7 or h27, whose parameter space is known. The parameter space B1 is taken to be bounded, because otherwise there are sequences αn → ∞, ρn → 1 for which α̃n = αn(1 − ρn) does not converge to 0. For analogous reasons, B2 is taken to be bounded.
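For reference, the moments that make up γ2 can be read off directly from a (simulated) stationary draw of (Ui, φ²i); a minimal sketch, with illustrative input names u and phi2:

    import numpy as np

    def gamma2_moments(u, phi2):
        lam1 = np.var(u)                          # Var_F(U_i)
        lam2 = np.var(u / phi2)                   # Var_F(U_i / phi_i^2)
        lam3 = np.cov(u, u / phi2)[0, 1]          # Cov_F(U_i, U_i / phi_i^2)
        lam4 = np.mean(phi2 ** -0.5)              # E_F phi_i^{-1}
        lam5 = np.mean(phi2 ** -1)                # E_F phi_i^{-2}
        lam6 = np.mean(phi2 ** -2)                # E_F phi_i^{-4}
        lam7 = lam3 / np.sqrt(lam1 * lam2)        # Corr_F(U_i, U_i / phi_i^2)
        return lam1, lam2, lam3, lam4, lam5, lam6, lam7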
In this example, H = R+∞ × Γ2. Therefore, to establish Assumption B, we have to consider sequences {γn,h = (γn,h,1, γn,h,2, γn,h,3) : n ≥ 1}, where h = (h1, h2), when the true autoregressive parameter ρ = ρn equals 1 − γn,h,1, where (i) h1 = ∞ and (ii) 0 ≤ h1 < ∞. For AR(1) models with conditional heteroskedasticity, the special case of case (ii) in which ρ = 1 is fixed has been considered by Seo (1999) and Guo and Phillips (2001). For models without conditional heteroskedasticity, case (i) was studied by Park (2002), Giraitis and Phillips (2006), and Phillips and Magdalinos (2007), and case (ii) is the "near integrated" case that has been studied without conditional heteroskedasticity by Bobkowski (1983), Cavanagh (1985), Chan and Wei (1987), Phillips (1987), Elliott (1999), Elliott and Stock (2001), and Müller and Elliott (2003). The latter three papers consider the situation of interest here in which the initial condition Y0* yields a stationary process. Specifically, what is relevant here is the triangular array case with rowwise strictly stationary observations {Yi* : i ≤ n} and ρ that depends on n. Note that case (ii) contains as a special case the unit root model ρ = 1.
We do not consider an AR(1) model here without an intercept, but such a model can be analyzed using the results of Andrews and Guggenberger (2008a). Interestingly, the asymptotic distributions in this case are quite different than in the models with an intercept or intercept and time trend.
For Model 1, we have
(7.6)  T*n(θn) →d J*h under γn,h,

where

       J*h is the N(0,1) distribution for h1 = ∞;

       J*h is the distribution of
       h27 · [∫₀¹ I*D,h(r) dW(r)] / [∫₀¹ I*D,h(r)² dr]^{1/2} + (1 − h27²)^{1/2} Z2  for 0 ≤ h1 < ∞;

       I*D,h(r) = I*h(r) − ∫₀¹ I*h(s) ds;

       I*h(r) = Ih(r) + (2h1)^{−1/2} exp(−h1 r) Z1  for h1 > 0,
       I*h(r) = W(r)                                for h1 = 0;

       Ih(r) = ∫₀ʳ exp(−(r − s)h1) dW(s);

W(·) is a standard Brownian motion, and Z1 and Z2 are independent standard normal random variables that are independent of W(·). As defined, Ih(r)
is an Ornstein–Uhlenbeck process. The parameter h27 ∈ [0, 1] is the limit of CorrFn(Ui, Ui/φ²i) under the sequence {γn,h : n ≥ 1}.
For Model 2, (7.6) holds except that for 0 ≤ h1 < ∞, J*h is the distribution of

(7.7)  h27 · [∫₀¹ (I*D,h(r) − 12∫₀¹ I*D,h(s)s ds · (r − 1/2)) dW(r)] / [∫₀¹ (I*D,h(r) − 12∫₀¹ I*D,h(s)s ds · (r − 1/2))² dr]^{1/2} + (1 − h27²)^{1/2} Z2.

The asymptotic results above apply to a first-order AR model. They should extend without essential change to CIs for the "sum of the AR coefficients" in a pth-order autoregressive model. In particular, the asymptotic distributions for statistics concerning the sum of the AR coefficients should be the same as those for ρ given in (7.6) and (7.7). Of course, the proofs will be more complex. For brevity, we do not provide such proofs.
Figure 1 provides .95 quantile graphs of J*h, −J*h, and |J*h| as functions of h1 for the cases of h27 = 0, .3, .6, and 1.
FIGURE 1.—Autoregression example for Model 1: .95 quantile graphs, ch(.95), for J*h, −J*h, and |J*h| (left, center, and right panels) as functions of h1 for several values of h27.
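The quantiles plotted in Figure 1 can be approximated by simulating the limit expressions in (7.6) on a grid. The following is a crude Euler-discretization sketch for Model 1 and finite h1; it is not the authors' code, and all tuning constants are illustrative.

    import numpy as np

    def quantile_Jh_model1(h1, h27, n_grid=1000, n_sim=10000, q=0.95, seed=0):
        rng = np.random.default_rng(seed)
        dt = 1.0 / n_grid
        r = np.arange(n_grid) * dt                      # left endpoints of the grid on [0, 1]
        draws = np.empty(n_sim)
        for s in range(n_sim):
            dW = rng.normal(scale=np.sqrt(dt), size=n_grid)
            Z1, Z2 = rng.normal(size=2)
            if h1 > 0:
                I = np.zeros(n_grid)                    # I_h at the left endpoints
                for j in range(1, n_grid):
                    I[j] = np.exp(-h1 * dt) * I[j - 1] + dW[j - 1]
                I_star = I + (Z1 / np.sqrt(2 * h1)) * np.exp(-h1 * r)
            else:
                I_star = np.concatenate(([0.0], np.cumsum(dW)[:-1]))   # W(r) for h1 = 0
            ID = I_star - I_star.mean()                 # I*_{D,h}: demeaned by int_0^1 I*_h(s) ds
            num = np.sum(ID * dW)                       # Ito sum approximating int_0^1 ID dW
            den = np.sqrt(np.sum(ID ** 2) * dt)         # (int_0^1 ID^2 dr)^{1/2}
            draws[s] = h27 * num / den + np.sqrt(1.0 - h27 ** 2) * Z2
        return np.quantile(draws, q)

For h1 = ∞, J*h is N(0,1) and the quantile is available in closed form.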
The graphs for different values of h27 have similar shapes, but are progressively less steep as h27 decreases from 1 to 0. All of the graphs are monotone in h1.14
The .95 quantile graphs for J*h are monotone increasing in h1 for each value of h2 because the upper tail of J*h gets thinner as h1 gets smaller. In consequence, the upper one-sided and equal-tailed two-sided subsampling CIs under-cover the true value asymptotically, and the upper FCV CI has correct size asymptotically. The .95 quantile graphs for −J*h are decreasing in h1 for each value of h2 because the lower tail of J*h gets thicker as h1 gets smaller. The .95 quantile graphs for |J*h| are decreasing in h1 for each value of h2 because the lower tail of J*h gets thicker as h1 gets smaller at a faster rate than the upper tail of J*h gets thinner. Because the graphs of −J*h and |J*h| are decreasing in h1, the lower and symmetric subsampling CIs have correct asymptotic size, while the lower FCV CI under-covers the true value asymptotically. These results explain the seemingly puzzling result (quantified in Table II in this section) that the equal-tailed subsampling CI has incorrect size asymptotically while the symmetric subsampling CI has correct size asymptotically.
Table II reports the asymptotic (Asy) and finite-sample adjusted-asymptotic (Adj-Asy) sizes of nominal 95% CIs for Model 1 for symmetric and equal-tailed two-sided FCV, subsampling, and hybrid CIs; see the last two rows of the table. (Symmetric and equal-tailed FCV CIs are the same, so only the former are reported.) These numbers are obtained by simulating the asymptotic formulae of Section 6. Further details concerning the construction of Table II are given in the Supplemental Material.
Table II also reports finite-sample coverage probabilities of these CIs based on a FQGLS estimator ρ̂n that uses a GARCH(1,1) specification for the conditional heteroskedasticity. The GARCH parameters are estimated by the closed-form estimator of Kristensen and Linton (2006). This estimator is employed in the Monte Carlo simulations because it is very quick to compute. Six different forms of the true conditional heteroskedasticity of the innovations are considered: (i) GARCH(1,1) with (intercept, MA, AR) parameters equal to (.20, .15, .80), (ii) IGARCH(1,1) with (intercept, MA, AR) parameters (.20, .20, .80), (iii) GARCH(1,1) with (intercept, MA, AR) parameters (.20, .70, .20), (iv) i.i.d., (v) ARCH(4) with (intercept, AR1–AR4) parameters (.20, .30, .20, .20, .20), and (vi) IARCH(4) with (intercept, AR1–AR4) parameters (.20, .30, .30, .20, .20). In all cases, Ui = σiεi, where εi is standard normal and σi is the multiplicative conditional heteroskedasticity. The ARCH and IARCH processes provide evidence concerning the robustness of the procedures to an incorrect specification of the form of the conditional heteroskedasticity used in the definition of ρ̂n.

14 The graphs in Figures 1 and 2 are computed by simulation. Monotonicity in Figure 1 is established numerically, not analytically.
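For illustration, one draw from DGP (i) can be generated as follows (a sketch, not the simulation code used for Table II; parameter names are illustrative, and the latent series is started near its stationary distribution via a burn-in).

    import numpy as np

    def simulate_case_i(n=131, rho=0.9, omega=0.20, a=0.15, b=0.80, burn=200, seed=0):
        # AR(1) with GARCH(1,1) innovations: intercept omega, MA coefficient a, AR coefficient b
        rng = np.random.default_rng(seed)
        u = np.zeros(n + burn)
        sig2 = np.full(n + burn, omega / (1 - a - b))    # start at the unconditional variance
        for i in range(1, n + burn):
            sig2[i] = omega + a * u[i - 1] ** 2 + b * sig2[i - 1]
            u[i] = np.sqrt(sig2[i]) * rng.standard_normal()
        y = np.zeros(n + burn)
        for i in range(1, n + burn):
            y[i] = rho * y[i - 1] + u[i]                 # latent Y*; Model 1 adds a constant to it
        return y[burn:]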
TABLE II
AR EXAMPLE: CI COVERAGE PROBABILITIES (×100) FOR NOMINAL 95% CIs^a

                                                     Symmetric CIs          Equal-Tailed CIs
Case   Data Generating Process    n = 131 or Asy     FCV     Sub     Hyb    Sub     Hyb

(i)    GARCH                      ρ = −.90           90.7    95.5    95.8   93.0    94.7
       MA = .15, AR = .80         ρ = −.50           91.0    95.0    95.7   90.7    93.6
       h27 = .86                  ρ = .00            91.1    96.0    96.4   90.2    94.1
                                  ρ = .70            90.4    97.9    97.9   88.7    96.6
                                  ρ = .80            89.6    97.8    97.8   88.7    97.0
                                  ρ = .90            87.9    97.9    97.9   89.5    97.5
                                  ρ = .97            82.1    97.5    97.5   92.7    98.0
                                  ρ = 1.0            65.1    95.1    95.1   94.5    96.7
                                  FS-Min             65.1    95.0    95.1   88.4    93.6
                                  Asy                76.8    95.0    95.0   69.6    95.0
                                  Adj-Asy            —       95.0    95.1   89.0    95.1

(ii)   IGARCH                     FS-Min             67.3    95.4    95.6   87.8    93.5
       MA = .20, AR = .80

(iii)  GARCH                      FS-Min             70.4    95.4    96.0   89.6    93.9
       MA = .70, AR = .20         Asy                87.5    95.0    95.0   85.2    95.0
       h27 = .54                  Adj-Asy            —       94.7    95.1   92.6    95.2

(iv)   i.i.d.                     FS-Min             62.4    94.6    94.6   88.1    93.5
       h27 = 1                    Asy                69.4    95.0    95.0   59.7    95.0
                                  Adj-Asy            —       95.0    95.0   86.3    95.1

(v)    ARCH(4)                    FS-Min             69.3    95.8    96.1   88.7    93.8
       (.3, .2, .2, .2)           Asy                87.5    95.0    95.0   85.2    95.0
       h27 = .54                  Adj-Asy            —       94.7    95.1   92.6    95.2

(vi)   IARCH(4)                   FS-Min             71.1    95.7    96.2   88.5    93.7
       (.3, .3, .2, .2)

       Min over h27 ∈ [0, 1]      Asy                69.5    94.9    94.9   59.8    95.0
                                  Adj-Asy            —       94.6    95.0   86.3    95.1

a The asymptotic and adjusted-asymptotic results of Table II are based on 30,000 simulation repetitions. The search over h1 to determine the minimum is done on the interval [−.90, 1] with step size .01 on [−.90, .90] and step size .001 on [.90, 1.0]. The search over h27 to determine the minimum is done on the interval [0, 1] with step size .05. The asymptotic results are computed using a discrete approximation to the continuous stochastic process on [0, 1] with 25,000 grid points.
The integrated GARCH and ARCH processes are not covered by the asymptotic results, but are included to address questions of robustness. The sample size, subsample size, and number of subsamples of consecutive observations employed are n = 131, 12, and 119. (We did not experiment with other sample sizes or subsample sizes.)
For case (i), we report the finite-sample coverage probability for eight values of ρ between −.9 and 1.0, as well as the minimum over ρ ∈ [−.9, 1], which is
denoted FS-Min.15 For brevity, for cases (ii)–(vi), we only report FS-Min. The finite-sample size of a CI depends on the minimum coverage probability over both ρ and different true forms of conditional heteroskedasticity. We do not attempt to determine the finite-sample size via simulations. For the four nonintegrated cases, we report the asymptotic and finite-sample adjusted-asymptotic sizes that correspond to the particular value of h27 for the given case (which are h27 = .86, .54, 1.0, and .54 for cases (i), (iii), (iv), and (v), respectively). These are the correct asymptotic sizes if h27 is known. Table S-II in the Supplemental Material reports results analogous to those in Table II for upper and lower one-sided CIs. We do not report results for any other CIs in the literature because none has a correct asymptotic size.
We now discuss the results in Table II. The two-sided FCV CI under-covers asymptotically by a substantial amount; see the rows of column 4 labeled Asy. Its AsyCS equals 69.5%. The asymptotic results for equal-tailed subsampling CIs are similar, but somewhat worse; see the rows of column 7 labeled Asy. Its AsyCS equals 59.8%. Hence, subsampling CIs can have very poor asymptotic performance. On the other hand, symmetric subsampling CIs have correct AsyCS (up to simulation error) for the reasons described above; see column 5 of the second to last row.
The discussion above of the quantile graphs in Figure 1 leads to the following results, which are corroborated by the numerical results. The two-sided FCV CI under-covers because its upper endpoint is farther away from 1 than it should be. Hence, it misses the true value of ρ too often to the left. On the other hand, the equal-tailed subsampling CI under-covers ρ because its lower endpoint is closer to 1 than it should be. Hence, it misses the true ρ to the right too often.
The finite-sample adjusted-asymptotic results for δn = b/n = 12/131 show much less severe under-rejection for the equal-tailed subsampling CIs than the unadjusted asymptotic results: compare the rows in column 7 denoted Asy and Adj-Asy in Table II. The finite-sample coverage probabilities of the subsampling CIs (for n = 131 and b = 12) are closer in most cases to the adjusted asymptotic sizes than the unadjusted-asymptotic sizes. Hence, it is apparent that the asymptotic size of the equal-tailed subsampling CIs is approached slowly as n → ∞ and is obtained only with large sample sizes. In consequence, increases in the sample size from n = 131 make the equal-tailed subsampling CIs perform worse rather than better. The equal-tailed subsampling CIs can be size-corrected. We do not report results here for such CIs.
None of the CIs considered here is similar asymptotically in a uniform sense or in finite samples. For the latter, see the rows corresponding to different values of ρ in case (i).

15 The minimum is calculated over the set {−.9, −.8, ..., .9, .95, .97, .99, 1.0}. The reason for excluding ρ ∈ (−1.0, −.9) is discussed in the Supplemental Material.
The symmetric and equal-tailed hybrid CIs both have correct AsyCS; see the rows in columns 6 and 8 labeled Asy in Table II. This occurs because for every value of h27, either the critical value of the FCV CI or the subsampling CI is suitable. Hence, the maximum of the two is a critical value that delivers correct asymptotic size. The finite-sample minimum (over ρ) coverage probabilities of the symmetric hybrid CI over the six cases range from 94.6% to 96.2%, which is quite good given the wide variety of conditional heteroskedasticity covered by these six cases. For the equal-tailed hybrid CI, the range is 93.5% to 93.9%, which is also good, but slightly lower than desirable. It is far superior to that of the FCV or equal-tailed subsampling CIs.
We conclude by noting that the same sort of size issues that arise with subsampling in the AR(1) model also arise in vector autoregressive models with roots that may be near unity. For example, they arise with subsampling tests of Granger causality in such models; see Choi (2005).

8. POST-CONSERVATIVE MODEL-SELECTION INFERENCE

Figure 2 provides graphs of the quantiles, ch(1 − α), of |J*h| as a function of h1 ≥ 0 for several values of h2 ≥ 0 for this example. (The quantile graphs are invariant to the signs of h1 and h2.) The corresponding quantile graphs for J*h are remarkably similar to those for |J*h| and, hence, are not given.
In Figure 2, the graphs are hump-shaped with the size of the hump increasing in |h2|. Based on the shape of the graphs, one expects the subsampling, FCV, and hybrid tests all to over-reject the null hypothesis asymptotically and in finite samples, and to do so by a substantial amount when |h2| is large.
FIGURE 2.—Conservative model selection example: .95 quantile graphs, ch(.95), for |J*h| as functions of h1 for several values of the correlation h2.
TABLE III
CONSERVATIVE MODEL SELECTION EXAMPLE: MAXIMUM (OVER h1) NULL REJECTION PROBABILITIES (×100) FOR DIFFERENT VALUES OF THE CORRELATION h2 FOR VARIOUS NOMINAL 5% TESTS, WHERE THE PROBABILITIES ARE ASYMPTOTIC, FINITE-SAMPLE ADJUSTED ASYMPTOTIC, AND FINITE SAMPLE FOR n = 120 AND b = 12, AND THE PARAMETER SPACE FOR h2 IS [−.995, .995]^a

             Sub               FCV               PSC-FCV    Hyb               PSC-Hyb
|h2|         Asy     n = 120   Asy     n = 120   n = 120    Asy     n = 120   n = 120

(a) Upper One-Sided Tests
.00          5.1     5.4       5.3     5.4       4.7        5.1     3.7       3.3
.20          6.9     7.2       7.1     7.5       5.1        6.9     5.3       4.0
.40          11.2    11.0      11.8    11.9      5.1        11.2    8.7       4.5
.60          20.2    19.8      21.8    22.0      4.9        20.2    17.3      4.8
.80          41.3    38.9      44.3    43.8      4.8        41.3    37.2      4.8
.90          61.3    57.5      63.9    62.8      4.6        61.3    56.8      4.6
.95          75.5    72.2      77.2    76.7      4.6        75.5    71.8      4.6
.995         92.9    91.9      93.2    93.1      4.1        92.9    91.9      4.1
Max          92.9    91.9      93.2    93.1      5.1        92.9    91.9      4.8

(b) Symmetric Two-Sided Tests
.00          5.1     5.0       5.4     5.5       5.0        5.1     3.3       3.1
.20          6.0     5.3       6.3     6.5       5.1        6.0     3.8       3.3
.40          8.7     7.3       9.6     10.1      5.2        8.7     5.9       4.0
.60          16.1    12.3      18.2    18.8      5.3        16.1    11.3      4.8
.80          36.2    28.2      40.6    40.3      4.9        36.2    27.8      4.8
.90          57.6    48.5      62.0    61.5      4.5        57.6    48.3      4.5
.95          73.4    66.1      77.1    76.4      4.2        73.4    66.0      4.2
.995         93.9    90.7      95.5    95.3      4.2        93.9    90.7      4.2
Max          93.9    90.7      95.5    95.3      5.3        93.9    90.7      4.8

a The results in Table III are based on 20,000 simulation repetitions. For the finite-sample results, the search over |β2| is done on the interval [0, 10] with step sizes .0025, .025, and .250, respectively, on the intervals [0, .8], [.8, 3], and [3, 10] and also includes the value |β2| = 999999. For the asymptotic results, the search over |h1| is done on the interval [−10, 10] with step size .01. For the finite-sample and asymptotic results, the Max is taken over |h2| values in {0, .2, .4, .6, .8, .9, .95, .99, .995}. For the plug-in size-correction values, the grid of |γ2| values has step sizes .01, .001, .0001, and .00002, respectively, on the intervals [0, .7], [.7, .99], [.99, .996], and [.996, 1.0].
Table III provides null rejection probability results that are analogous to the CI coverage probabilities in Table II but for the present example. The parameter space H2 for the (asymptotic) correlation h2 between the LS estimators of the two regressors is [−.995, .995]. The finite-sample results in Table III are for n = 120, b = 12, and a model with standard normal errors and k = 3 regressors, where x*1i and x*2i are independent standard normal random variables and x*3i = 1. To dramatically increase computational speed, finite-sample results for tests that utilize subsampling critical values are based on qn = 119 subsamples of consecutive observations. Hence, only a small fraction of the "120 choose 12" available subsamples are used. In cases where such tests have correct asymptotic size, their finite-sample performance is expected to be better
when all available subsamples are used than when only qn = 119 are used. Further details concerning Table III are given in the Supplemental Material.
The asymptotic results for the Sub, FCV, and Hyb tests show that all of these tests perform very similarly and very poorly. They are found to over-reject the null hypothesis very substantially for the upper and symmetric cases when the absolute value of the correlation, |h2|, is large. (Results for equal-tailed tests, not reported, are similar to those for symmetric tests.) The asymptotic sizes of these nominal 5% tests range from 93% to 96% (see columns 2, 4, and 7). Even for |h2| = .8, the maximum (over h1) asymptotic rejection probabilities of these tests range from 36% to 44%. Adjusted asymptotic sizes of the nominal 5% Sub and Hyb tests, not reported, are slightly lower than the unadjusted ones, but they are still in the range of 90% to 92%. The finite-sample maximum (over h1 and h2) null rejection probabilities of the nominal 5% Sub, FCV, and Hyb tests are very high and reflect the asymptotic results (see columns 3, 5, and 8).16 They range from 91% to 95%.
Next, we consider PSC tests. We use the following consistent estimator of γn2:

(8.1)  γ̂n2 = −(n⁻¹ Σⁿi=1 x1i x2i) / [(n⁻¹ Σⁿi=1 x²1i)(n⁻¹ Σⁿi=1 x²2i)]^{1/2},
where {(x1i, x2i) : i = 1, ..., n} are the residuals from the regressions of x*ji on x*3i for j = 1, 2. The choice of this estimator is based on the equality γn2 = Qn^{12}/(Qn^{11}Qn^{22})^{1/2} = −Qn,12/(Qn,11Qn,22)^{1/2}, where Qn^{jm} and Qn,jm denote the (j, m) elements of Qn⁻¹ and Qn = EGn x⊥i x⊥i′, respectively, for j, m = 1, 2 (see the second equality in (S11.19) of the Supplemental Material) and Gn is the distribution of (εi, x*i). Consistency of γ̂n2 (i.e., γ̂n2 − γn2 →p 0 under {γn : n ≥ 1}) follows from a lemma in the Supplemental Material. Thus, Assumption N holds. Note that the PSC tests do not depend on the specification of the parameter space for h2. The PSC-FCV CI obtained by inverting the PSC-FCV test considered here is closely related to, but different from, the modified CI of Kabaila (1998).
Table III reports finite-sample maximum (over h1) null rejection probabilities of the PSC-FCV and PSC-Hyb tests (see columns 6 and 9). These tests both perform very well. The maximum (over h1 and |h2|) null rejection probabilities of these tests are all in the range of 4.8% to 5.3% for upper and symmetric tests. For both tests, the maximum rejection rates (over h1) do not vary
too much with |h2|, which is the objective of the plug-in approach. Hence, the plug-in approach works well in this example.
For H2 = [−.999, .999], the finite-sample maximum (over h1 and |h2|) null rejection rates of the PSC tests lie between 6.9% and 7.4%. For H2 = [−.9999, .9999], the PSC tests have corresponding values between 7.1% and 8.3%. Hence, it is clear that bounding |h2| away from 1.0 is not only sufficient for the asymptotic PSC results to hold, but it is necessary for the PSC tests to have good finite-sample size. For practical purposes, this is not much of a problem because (i) h2 can be consistently estimated, so one has a good idea of whether |h2| is close to 1.0, and (ii) |h2| can be very close to 1.0 (i.e., .995 or less) and the PSC tests still perform very well in finite samples.
In conclusion, nominal 5% subsampling, FCV, and hybrid tests have asymptotic and adjusted-asymptotic sizes that are very large—between 90% and 96%—for upper, symmetric, and equal-tailed tests (for H2 = [−.995, .995]). The maximum (over the cases considered) finite-sample null rejection probabilities of these tests for n = 120 and b = 12 are close to the asymptotic values. PSC methods work very well in this example. The PSC-Hyb and PSC-FCV tests have finite-sample maximum (over the cases considered) null rejection probabilities between 4.8% and 5.3% for upper, lower, and symmetric tests (for H2 as above).

REFERENCES

ANDERSON, T. W., AND H. RUBIN (1949): "Estimation of the Parameters of a Single Equation in a Complete Set of Stochastic Equations," Annals of Mathematical Statistics, 20, 46–63. [723]
ANDREWS, D. W. K. (1993): "Exactly Median-Unbiased Estimation of First-Order Autoregressive/Unit Root Models," Econometrica, 61, 139–165. [723,724]
ANDREWS, D. W. K., AND H.-Y. CHEN (1994): "Approximately Median-Unbiased Estimation of Autoregressive Models," Journal of Business and Economic Statistics, 12, 187–204. [724]
ANDREWS, D. W. K., AND P. GUGGENBERGER (2008a): "Asymptotics for Stationary Very Nearly Unit Root Processes," Journal of Time Series Analysis, 29, 203–212. [752]
——— (2008b): "Asymptotics for LS, GLS, and Feasible GLS Statistics in an AR(1) Model With Conditional Heteroskedasticity," Discussion Paper 1665, Cowles Foundation, Yale University. [750,751]
——— (2009a): "Incorrect Asymptotic Size of Subsampling Procedures Based on Post-Consistent Model Selection Estimators," Journal of Econometrics (forthcoming). [725,733,738]
——— (2009b): "Supplement to 'Hybrid and Size-Corrected Subsampling Methods'," Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/7015_extensions.pdf. [725,726,730]
——— (2009c): "Validity of Subsampling and 'Plug-in Asymptotic' Inference for Parameters Defined by Moment Inequalities," Econometric Theory, 25 (forthcoming). [725,730,737,746,751]
——— (2010a): "Asymptotic Size and a Problem With Subsampling and With the m out of n Bootstrap," Econometric Theory, 26 (forthcoming). [721,730]
——— (2010b): "Applications of Subsampling, Hybrid, and Size-Correction Methods," Journal of Econometrics (forthcoming). [725,730,738,739]
BOBKOWSKI, M. J. (1983): "Hypothesis Testing in Nonstationary Time Series," Unpublished Ph.D. Thesis, University of Wisconsin, Madison. [752]
CAVANAGH, C. (1985): "Roots Local to Unity," Unpublished Manuscript, Harvard University. [752]
CHAN, N. H., AND C. Z. WEI (1987): "Asymptotic Inference for Nearly Nonstationary AR(1) Processes," Annals of Statistics, 15, 1050–1063. [752]
CHEN, W., AND R. DEO (2007): "A Smooth Transition to the Unit Root Distribution via the Chi-square Distribution With Interval Estimation for Nearly Integrated Autoregressive Processes," Unpublished Manuscript, Stern School of Business, New York University. [724]
CHOI, I. (2005): "Subsampling Vector Autoregressive Tests of Linear Constraints," Journal of Econometrics, 124, 55–89. [757]
DUFOUR, J.-M. (1997): "Some Impossibility Theorems in Econometrics With Applications to Structural and Dynamic Models," Econometrica, 65, 1365–1387. [723]
ELLIOTT, G. (1999): "Efficient Tests for a Unit Root When the Initial Observation Is Drawn From Its Unconditional Distribution," International Economic Review, 40, 767–783. [752]
ELLIOTT, G., AND J. H. STOCK (2001): "Confidence Intervals for Autoregressive Coefficients Near One," Journal of Econometrics, 103, 155–181. [752]
GIRAITIS, L., AND P. C. B. PHILLIPS (2006): "Uniform Limit Theory for Stationary Autoregression," Journal of Time Series Analysis, 27, 51–60; corr. (forthcoming). [752]
GUGGENBERGER, P. (2010): "The Impact of a Hausman Pretest on the Asymptotic Size of a Hypothesis Test," Econometric Theory, 26 (forthcoming). [725]
GUGGENBERGER, P., AND R. J. SMITH (2005): "Generalized Empirical Likelihood Estimators and Tests Under Partial, Weak, and Strong Identification," Econometric Theory, 21, 667–709. [723]
——— (2008): "Generalized Empirical Likelihood Tests in Time Series Models With Potential Identification Failure," Journal of Econometrics, 142, 134–161. [723]
GUO, B., AND P. C. B. PHILLIPS (2001): "Testing for Autocorrelation and Unit Roots in the Presence of Conditional Heteroskedasticity of Unknown Form," Unpublished Working Paper, Department of Economics, UC Santa Cruz. [724,752]
HANSEN, B. E. (1999): "The Grid Bootstrap and the Autoregressive Model," Review of Economics and Statistics, 81, 594–607. [723,724]
JANSSON, M., AND M. J. MOREIRA (2006): "Optimal Inference in Regression Models With Nearly Integrated Regressors," Econometrica, 74, 681–714. [723]
KABAILA, P. (1998): "Valid Confidence Intervals in Regression After Variable Selection," Econometric Theory, 14, 463–482. [723,759]
KIM, K., AND P. SCHMIDT (1993): "Unit Root Tests With Conditional Heteroskedasticity," Journal of Econometrics, 59, 287–300. [724]
KLEIBERGEN, F. (2002): "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70, 1781–1803. [723]
——— (2005): "Testing Parameters in GMM Without Assuming They Are Identified," Econometrica, 73, 1103–1124. [723]
KRISTENSEN, D., AND O. LINTON (2006): "A Closed-Form Estimator for the GARCH(1,1) Model," Econometric Theory, 22, 323–337. [754]
LEEB, H. (2006): "The Distribution of a Linear Predictor After Model Selection: Unconditional Finite-Sample Distributions and Asymptotic Approximations," in 2nd Lehmann Symposium—Optimality. Institute of Mathematical Statistics Lecture Notes—Monograph Series, Vol. 49. Beachwood, OH: Institute of Mathematical Statistics, 291–311. [725,732]
LEEB, H., AND B. M. PÖTSCHER (2005): "Model Selection and Inference: Facts and Fiction," Econometric Theory, 21, 21–59. [725,732]
MIKUSHEVA, A. (2007a): "Uniform Inferences in Autoregressive Models," Econometrica, 75, 1411–1452. [723-725]
[723-725] (2007b): “Supplement to ‘Uniform Inferences in Autoregressive Models’,” Econometrica Supplemental Material, 75, http://www.econometricsociety.org/ecta/Supmat/6254_proofs. pdf. [724]
762
D. W. K. ANDREWS AND P. GUGGENBERGER
MOREIRA, M. J. (2003): “A Conditional Likelihood Ratio Test for Structural Models,” Econometrica, 71, 1027–1048. [723] (2009): “Tests With Correct Size When Instruments Can Be Arbitrarily Weak,” Journal of Econometrics (forthcoming). [723] MÜLLER, U. K., AND G. ELLIOTT (2003): “Tests for Unit Roots and the Initial Condition,” Econometrica, 71, 1269–1286. [752] NANKERVIS, J. C., AND N. E. SAVIN (1996): “The Level and Power of the Bootstrap t-Test in the Trend Model With AR(1) Errors,” Journal of Business and Economic Statistics, 14, 161–168. [723,724] OTSU, T. (2006): “Generalized Empirical Likelihood Inference for Nonlinear and Time Series Models Under Weak Identification,” Econometric Theory, 22, 513–527. [723] PARK, J. Y. (2002): “Weak Unit Roots,” Unpublished Working Paper, Department of Economics, Rice University. [752] PHILLIPS, P. C. B. (1987): “Towards a Unified Asymptotic Theory for Autoregression,” Biometrika, 74, 535–547. [752] PHILLIPS, P. C. B., AND T. MAGDALINOS (2007): “Limit Theory for Moderate Deviations From a Unit Root,” Journal of Econometrics, 136, 115–130. [752] POLITIS, D. N., AND J. P. ROMANO (1994): “Large Sample Confidence Regions Based on Subsamples Under Minimal Assumptions,” Annals of Statistics, 22, 2031–2050. [723] POLITIS, D. N., J. P. ROMANO, AND M. WOLF (1999): Subsampling. Springer Series in Statistics. New York: Springer. [723] ROMANO, J. P., AND M. WOLF (2001): “Subsampling Intervals in Autoregressive Models With Linear Time Trend,” Econometrica, 69, 1283–1314. [724] SEO, B. (1999): “Distribution Theory for Unit Root Tests With Conditional Heteroskedasticity,” Journal of Econometrics, 91, 113–144. [724,752] STAIGER, D., AND J. H. STOCK (1997): “Instrumental Variables Regression With Weak Instruments,” Econometrica, 65, 557–586. [723] STOCK, J. H. (1991): “Confidence Intervals for the Largest Autoregressive Root in U.S. Macroeconomic Time Series,” Journal of Monetary Economics, 28, 435–459. [723,724]
Dept. of Economics, Cowles Foundation for Research in Economics, Yale University, P.O. Box 208281, Yale Station, New Haven, CT 06520-8281, U.S.A.;
[email protected] and Dept. of Economics, University of California–Los Angeles, Los Angeles, CA 90095, U.S.A.;
[email protected]. Manuscript received March, 2007; final revision received May, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 763–799
THE MICROECONOMICS OF EFFICIENT GROUP BEHAVIOR: IDENTIFICATION¹

BY P.-A. CHIAPPORI AND I. EKELAND

Consider a group consisting of S members facing a common budget constraint p′ξ = 1: any demand vector belonging to the budget set can be (privately or publicly) consumed by the members. Although the intragroup decision process is not known, it is assumed to generate Pareto-efficient outcomes; neither individual consumptions nor intragroup transfers are observable. The paper analyzes when, to what extent, and under which conditions it is possible to recover the underlying structure (individual preferences and the decision process) from the group's aggregate behavior. We show that the general version of the model is not identified. However, a simple exclusion assumption (whereby each member does not consume at least one good) is sufficient to guarantee generic identifiability of the welfare-relevant structural concepts.

KEYWORDS: Collective model, household behavior, nonparametric identification, exterior differential calculus, labor supply, public goods.
1. INTRODUCTION

Group Behavior: The “Black Box” and Beyond

CONSIDER A GROUP consisting of S members. The group has limited resources; specifically, its global consumption vector ξ must satisfy a standard market budget constraint of the form p′ξ = y (where p is a vector of prices and y is total group income). Any demand vector belonging to the global budget set thus defined can be consumed by the members. Some of the goods can be privately consumed, while others may be publicly used. The decision process within the group is not known and is only assumed to generate Pareto-efficient outcomes.² Finally, the intragroup allocation of resources or consumptions is not observable. In other words, the group is perceived as a “black box”; only its aggregate behavior, summarized by the demand function ξ(p, y), is recorded.

¹ This paper was presented at seminars in Chicago, Paris, Tel Aviv, New York, Banff, and London. We thank the participants for their comments. Also, we are indebted to the editor, Eddie Dekel, two anonymous referees, and especially Ian Preston for valuable suggestions. This research received financial support from the NSF (Grant SBR 0532398).
² We view efficiency as a natural assumption in many contexts, and as a natural benchmark in all cases. For instance, the analysis of household behavior often takes the “collective” point of view, where efficiency is the basic postulate. Other models, in particular in the literature on firm behavior, are based on cooperative game theory in a symmetric information context, where efficiency is paramount (see, for instance, the “insider–outsider” literature and, more generally, the models involving bargaining between management and workers or unions). The analysis of intragroup risk sharing, starting with Townsend's (1994) seminal paper, provides other interesting examples. Finally, even in the presence of asymmetric information, first best efficiency is a natural benchmark. For instance, a large part of the empirical literature on contract theory tests models involving asymmetric information against the null of symmetric information and first best efficiency (see Chiappori and Salanie (2000) for a recent survey).

The goal
of the present paper is to provide answers to the following, general question: When is it possible to recover the underlying structure—namely, individual preferences, the decision process, and the resulting intragroup transfers—from the group’s aggregate behavior? In the (very) particular case where the group consists of only one member, the answer is well known: individual demand uniquely defines the underlying preferences. Not much is known in the case of a larger group. However, recent results in the literature on household behavior suggest that, surprisingly enough, when the group is small, the structure can be recovered under reasonably mild assumptions. For instance, in the model of household labor supply proposed by Chiappori (1988a, 1988b, 1992), two individuals privately consume leisure and some Hicksian composite good. The main conclusion is that the two individual preferences and the decision process can generically be recovered (up to an additive constant) from the two labor supply functions. This result has been empirically applied by (among others) Fortin and Lacroix (1997) and Chiappori, Fortin, and Lacroix (2002), and extended by Chiappori (1997) to household production and by Blundell, Chiappori, Magnac, and Meghir (2007) to discrete participation decisions. Fong and Zhang (2001) considered a more general model where leisure can be consumed both privately and publicly. Although the two alternative uses are not independently observed, they can in general be identified under a separability restriction, provided that the consumption of another exclusive good (e.g., clothing) is observed. Altogether, these results suggest that there is information to be gained on the contents of the black box. In a companion paper (Chiappori and Ekeland (2006)), we investigated the properties of aggregate behavior stemming from the efficiency assumption. We concluded that when the group is small enough, a lot of structure is imposed on collective demand by this basic assumption: there exist strong, testable, restrictions on the way the black box may operate. The main point of the present paper is complementary. We investigate to what extent and under which conditions it is possible to recover much (or all) of the interior structure of the black box without opening it. We first show that in the most general case, there exists a continuum of observationally equivalent models, that is, a continuum of different structural settings generating identical observable behavior. This negative result implies that additional assumptions are required. We then provide examples of such assumptions and show that they are surprisingly mild. Essentially, it is sufficient that each agent in the group be excluded from consumption of (at least) one commodity. Moreover, when a “distribution factor” (see below) is available, this requirement can be reduced to the existence of an assignable good (i.e., a private good for which individual consumptions are observed). Under these conditions, the welfare-relevant structure is nonparametrically identifiable in general (in a sense, that is, made clear below), irrespective of the total number of commodities. We conclude that
even when decision processes or intragroup transfers are not known, much can be learned about them from the sole observation of the group's aggregate behavior. This conclusion generalizes the earlier intuition of Chiappori (1988a, 1988b, 1992); it shows that the results obtained in these early contributions, far from being specific to the particular settings under consideration, were in fact general.

Identifiability and Identification

From a methodological perspective, it may be useful to define more precisely what is meant by “recovering the underlying structure.” The structure, in our case, is defined by the (strictly convex) preferences of individuals in the group and the decision process. Because of the efficiency assumption, for any particular cardinalization of individual utilities, the decision process is fully summarized by the Pareto weights corresponding to the outcome at stake. The structure thus consists of a set of individual preferences (with a particular cardinalization) and Pareto weights (with some normalization, e.g., the sum of Pareto weights is taken to be 1). Whether the underlying structure can be recovered from the group's aggregate behavior raises two different issues. One, usually called the identifiability problem, is whether the demand function ξ(p, y) uniquely defines preferences and Pareto weights, possibly within a specific class (e.g., differentiable functions or utilities of a particular functional form). A second and independent issue deals with the possibility of recovering the function ξ(p, y) from available data; it involves specific econometric problems, such as endogeneity, measurement errors, or the introduction of unobserved heterogeneity. The present paper deals only with the former question; that is, we want to find conditions under which the standard integrability result of consumer theory (whereby an individual demand function uniquely identifies the underlying preferences) extends to a nonunitary setting.
It is important to note that our approach is fundamentally nonparametric, in the sense that uniqueness obtains in a very large class of functions (typically, twice differentiable mappings). While most existing empirical work on the topic is parametric, we argue that the conclusions drawn from parametric estimations are much more reliable when the underlying model is nonparametrically identifiable, since otherwise any recommendation made on the basis of the empirical results crucially relies on the arbitrary choice of a particular functional form.³

³ Of course, this discussion should not be interpreted too strictly. In the end, identifying assumptions are (almost) always needed. The absence of nonparametric identifiability, thus, should not necessarily be viewed as a major weakness. We believe, however, that it justifies a more cautious interpretation of the estimates. More importantly, we submit, as a basic methodological rule, that an explicit analysis of nonparametric identifiability is a necessary first step in any consistent empirical strategy, if only to suggest the most adequate identifying assumptions. Applying this approach to collective models is indeed the main purpose of this paper.
Distribution Factors An important tool to achieve identifiability is the presence of distribution factors; see Bourguignon, Browning, and Chiappori (2009). These are defined as variables that can affect group behavior only through their impact on the decision process. Think, for instance, of the choices as resulting from a bargaining process. Typically, the outcomes will depend on the members’ respective bargaining positions; hence, any factor of the group’s environment that may influence these positions (EEPs in McElroy’s (1990) terminology) potentially affects the outcome. Such effects are of course paramount, and their relevance is not restricted to bargaining in any particular sense. In general, group behavior depends not only on preferences and budget constraint, but also on the members’ respective “power” in the decision process. Any variable that changes the powers may have an impact on observed collective behavior. In many cases, distribution factors are readily observable. An example is provided by the literature on household behavior. In their study of household labor supply, Chiappori, Fortin, and Lacroix (2002) used the state of the marriage market, as proxied by the sex ratio by age, race, and state, and the legislation on divorce as particular distribution factors affecting the intrahousehold decision process and thereby its outcome, that is, labor supplies. They found, indeed, that factors more favorable to women significantly decrease (resp. increase) female (resp. male) labor supply. Using similar tools, Oreffice (2007) concluded that the legalization of abortion had a significant impact on intrahousehold allocation of power. In a similar context, Rubalcava and Thomas (2000) used the generosity of single parent benefits and reached identical conclusions. Thomas, Contreras, and Frankenberg (1997), using an Indonesian survey, showed that the distribution of wealth by gender at marriage— another candidate distribution factor—has a significant impact on children’s health in those areas where wealth remains under the contributor’s control.4 Duflo (2000) has derived related conclusions from a careful analysis of a reform of the South African social pension program that extended benefits to a large, previously not covered black population. She found that the recipient’s gender—a typical distribution factor—is of considerable importance for the consequences of the transfers on children’s health. Whenever the aggregate group demand is observable as a function of prices and distribution factors, one can expect that identifiability may be easier to obtain. This is actually known to be the case in particular situations. For instance, Chiappori, Fortin, and Lacroix (2002) showed how the use of distribution factors allows a simpler and more robust estimation of a collective model of labor supply. In the present paper, we generalize these results by providing a general analysis of the estimation of collective models in different contexts, with and without distribution factors. 4
See also Galasso (1999) for a similar investigation.
The Results

Our main conclusions can be summarized as follows:
• In its most general formulation, the model is not identifiable. Any given aggregate demand that is compatible with efficiency can be derived either from a model with private consumption only or from a model with public consumption only. Moreover, even when it is assumed that all consumptions are private (or that they are all public, or that some commodities are privately and others publicly consumed), in the absence of consumption exclusion there exists a continuum of different structural models that generate the same aggregate demand.
• A simple exclusion assumption is in general sufficient to guarantee full, nonparametric identifiability of the welfare-relevant structure. Specifically, we define the collective indirect utility of each member as the utility level that member ultimately reaches for given prices, household income, and possibly distribution factors, taking into account the allocation of resources prevailing within the household. We show the following result: if each agent of the group is excluded from consumption of (at least) one commodity, then, in general, the collective indirect utility of each member can be recovered (up to some increasing transform), irrespective of the total number of commodities.
Our general conclusion, hence, is that one consumption exclusion per agent is sufficient to identify all welfare-relevant aspects of the collective model. Moreover, when distribution factors are available, one assignable good (i.e., a private good for which individual consumptions are observed) is sufficient for identifiability.
Section 2 describes the model. The formal structure of the identifiability problem is analyzed in Section 3. Sections 4–6 consider the case of two-person groups. Section 4 characterizes the limits to identifiability in a general context. Identification under exclusivity or generalized separability assumptions is discussed in Section 5 and applied to specific economic frameworks in Section 6. Section 7 briefly discusses the extension to the general case of S-person groups.

2. THE MODEL

2.1. Preferences

We consider an S-person group. There exist N commodities, n of which are privately consumed within the group while the remaining K = N − n commodities are public. Purchases⁵ are denoted by the vector x ∈ R^N. Here, x = (Σ_s x_s, X), where x_s ∈ R^n denotes the vector of private consumption by agent s and X ∈ R^K is the household's public consumption.⁶

⁵ Formally, purchases could include leisure; then the price vector includes the wages, or virtual wages for nonparticipants.
⁶ Throughout the paper, x_{is} denotes the private consumption of commodity i by agent s.

The correspond-
ing prices are (p P) ∈ RN = Rn × RK and household income is y, giving the budget constraint p (x1 + · · · + xS ) + P X = y Each member has her/his own preferences over the goods consumed in the group. In the most general case, each member’s preferences can depend on other members’ private and public consumptions; this allows for altruism, but also for externalities or any other preference interaction. The utility of member s is then of the form U s (x1 xS X). We shall say that the function U s is normal if it is strictly increasing in (xs X), twice continuously differentiable in (x1 xS X), and the matrix of second derivatives is negative definite. We shall see that identifiability does not obtain in the general setting of normal utilities. Therefore, throughout most of the paper we use a slightly less general framework. Specifically, we concentrate on egoistic preferences, defined as follows: DEFINITION 1: The preferences of agent s are egoistic if they can be represented by a utility function of the form U s (xs X) In words, preferences are egoistic if each agent only cares about his private consumption and the household’s vector of public goods. Most of our results can be extended to allow for preferences of the caring type (i.e., agent s maximizes an index of the form W s (U 1 U S ); however, we will not discuss the identifiability of the W s .7 Finally, we shall denote by z ∈ Rd the vector of distribution factors. 2.2. The Decision Process We now consider the mechanism that the group uses to decide what to buy. Note, first, that if the functions U 1 U S represent the same preferences, then we are in a unitary model where the common utility is maximized under the budget constraint. The same conclusion obtains if one of the partners can act as a dictator and impose her (or his) preferences as the group’s maximand. Clearly, these are very particular cases. In general, the process that takes place within the group is more complex. Following the collective approach, we shall throughout the paper postulate efficiency, as expressed in the following axiom: 7 Each allocation that is efficient with respect to the W s must also be efficient with respect to the U s . The converse is not true (e.g., an allocation which is too unequal may fail to be efficient for the W s ), a property that has sometimes been used to achieve identification (see Browning and Lechène (2001)).
AXIOM 2—Efficiency: The outcome of the group decision process is Pareto-efficient; that is, for any prices (p, P), income y, and distribution factors z, the consumption (x_1, ..., x_S, X) chosen by the group is such that no other vector (x̄_1, ..., x̄_S, X̄) in the budget set could make all members better off, one of them strictly so.

The axiom can be restated as follows: There exist S scalar functions μ^s(p, P, y, z) ≥ 0, 1 ≤ s ≤ S, the Pareto weights, normalized by Σ_s μ^s = 1, such that (x_1, ..., x_S, X) solves

(Pr)   max_{x_1, ..., x_S, X}  Σ_s μ^s(p, P, y, z) U^s(x_1, ..., x_S, X)
       subject to  p′(x_1 + · · · + x_S) + P′X = y.

For any given utility functions U^1, ..., U^S and any price–income bundle, the budget constraint defines a Pareto frontier for the group. From the efficiency axiom, the final outcome will be located on this frontier. It is well known that, for every (p, P, y, z), any point on the Pareto frontier can be obtained as a solution to problem (Pr): the vector μ(p, P, y, z), which belongs to the (S − 1)-dimensional simplex, summarizes the decision process because it determines the final location of the demand vector on this frontier.
The map μ describes the distribution of power. If one of the weights, μ^s, is equal to 1 for every (p, P, y, z), then the group behaves as though s is the effective dictator. For intermediate values, the group behaves as though each person s has some decision power, and the person's weight μ^s can be seen as an indicator of this power.⁸ It is important to note that the weights μ^s will in general depend on prices p, P, income y, and distribution factors z, since these variables may in principle influence the distribution of power within the group, hence the location of the final choice over the Pareto frontier. Three additional remarks can be made:
• While prices enter both Pareto weights and the budget constraint, distribution factors matter only (if at all) through their impact on μ.
• We assume throughout the paper the absence of monetary illusion. In particular, the μ^s are zero-homogeneous in (p, P, y).
• Following Browning and Chiappori (1998), we add some structure by assuming that the μ^s(p, P, y, z) are continuously differentiable for s = 1, ..., S.

⁸ This interpretation must be used with care, since the Pareto coefficient μ^s obviously depends on the particular cardinalization adopted for individual preferences; in particular, μ^s > μ^t does not necessarily mean that s has more power than t. However, the variations of μ^s are significant, in the sense that for any given cardinalization, a policy change that increases μ^s while leaving μ^t constant unambiguously ameliorates the position of s relative to t.
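To make the role of the Pareto weights concrete, here is a minimal numerical sketch (our own illustration, not part of the paper): it solves problem (Pr) for S = 2 under assumed egoistic log utilities, with one private good per member and one public good, and traces how the chosen bundle moves along the Pareto frontier as μ^1 varies. All parameter values are hypothetical.

```python
# A minimal sketch of program (Pr) for S = 2, assuming hypothetical egoistic
# log utilities U^s(x_s, X) = alpha_s*log(x_s) + (1-alpha_s)*log(X); nothing
# here (prices, income, tastes) comes from the paper.
import numpy as np
from scipy.optimize import minimize

p, P, y = 1.0, 2.0, 10.0            # private price, public price, income (assumed)
alpha = (0.7, 0.4)                  # taste parameters (assumed)

def U(s, xs, X):
    return alpha[s] * np.log(xs) + (1 - alpha[s]) * np.log(X)

def collective_demand(mu1):
    mu = (mu1, 1.0 - mu1)           # Pareto weights, normalized to sum to one
    obj = lambda v: -(mu[0] * U(0, v[0], v[2]) + mu[1] * U(1, v[1], v[2]))
    budget = {'type': 'eq', 'fun': lambda v: y - p * (v[0] + v[1]) - P * v[2]}
    res = minimize(obj, x0=[1.0, 1.0, 1.0], constraints=[budget],
                   bounds=[(1e-6, None)] * 3)
    return res.x                    # (x_1, x_2, X)

for mu1 in (0.2, 0.5, 0.8):
    x1, x2, X = collective_demand(mu1)
    print(f"mu1={mu1:.1f}: x1={x1:.2f}, x2={x2:.2f}, X={X:.2f}")
```

With these particular parameters, raising μ^1 shifts resources toward member 1's private consumption; the point is simply that the whole Pareto frontier is spanned as the weights vary.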
From now on, we set π = (p P) ∈ RN . 2.3. Characterization of Aggregate Demand In a companion paper (Chiappori and Ekeland (2006)), we derived necessary and sufficient conditions for a function ξ(π y z) to be the aggregate demand of an efficient S-member group. For the sake of completeness, we briefly restate these conditions. Let us first omit the dependence on distribution factors, and normalize household income to 1. Then we can make the following statements: • If a function ξ(π) is the aggregate demand of an efficient S-member group, then its Slutsky matrix can be decomposed as (1)
S(π) = Σ(π) + R(π)
where the matrix Σ is symmetric and negative and the matrix R is of rank at most S − 1. Equivalently, there exists a subspace R of dimension at least N − (S − 1) such that the restriction of S(π) to R is symmetric and negative. • Conversely, if a “smooth enough” map ξ(π) satisfies Walras’ law π ξ = y and condition (1) in some neighborhood of π, ¯ and if the Jacobian of ξ at π¯ has maximum rank, then ξ(π) can locally be obtained as the aggregate demand of an efficient S-member group; that is, one can (locally) recover S utility functions of the general form U s (x1 xS X) and S Pareto weights μs (π) ≥ 0 such that ξ(π) is the collective demand associated with problem (Pr). We then say that ξ is S-admissible. Relation (1) was initially derived by Browning and Chiappori (1998); it is known as the (symmetric negative plus rank S − 1) SNR(S − 1) condition. A natural question is whether more knowledge about intragroup consumption will generate stronger restrictions. Assume, for instance, that commodities are known to be privately consumed, so that the utility functions are of the form U s (xs ); alternatively assume that consumption is exclusively public, so that the preferences are U s (X). Does the preceding result still hold when utilities are constrained to belong to these specific classes? Interestingly enough, the answer is positive. In fact, it is impossible to distinguish the two cases by only looking at the local structure of the aggregate demand.9 In the paper mentioned above, we proved that whenever a function ξ is S-admissible, then it can be (locally) obtained as a Pareto-efficient aggregate demand for a group in which all consumptions are public, and it can also be (locally) obtained as a 9 Global conditions may however exist; see Cherchye, De Rock, and Vermeulen (2007a, 2007b) for a revealed-preferences approach.
Pareto-efficient aggregate demand for a group in which all consumptions are private. Finally, the same paper provides necessary conditions on the effect of distribution factors.

2.4. Identifiability: The General Problem

Following the discussion above, we now raise the question of identifiability:

QUESTION A—Identifiability: Take an arbitrary demand function ξ(π, y, z) satisfying the SNR(S − 1) condition (1). Is there a unique family of preference relations on R^N, represented by utility functions U^s(x_1, ..., x_S, X) (unique up to an increasing transformation), and, for each cardinalization of preferences, a unique family of differentiable Pareto weights μ^s(π, y, z), 1 ≤ s ≤ S, with Σ_s μ^s = 1, such that ξ(π, y, z) is the aggregate demand associated with problem (Pr)?

Question A refers to what could be called a nonparametric definition of identifiability, because uniqueness is required within the general set of well-behaved functions, rather than within the set of functions sharing a specific parametric form in which only a finite number of parameters can be varied.
It should be clear that in the most general version of the model we consider, identifiability cannot obtain. A demand function that satisfies SNR(S − 1) is compatible with (at least) two different structural models: one where all commodities are privately consumed and one in which all consumption is public. Quite obviously, these models have very different welfare implications, although they generate the same aggregate demand. This suggests that more specific assumptions are needed. In what follows, we assume that each commodity is either known to be privately consumed or known to be publicly consumed. Also, preferences are egoistic in the sense defined above. While these assumptions are natural, we shall actually see that they are not sufficient. The nature of the indeterminacy is deeper than suggested by the previous remark. Even with egoistic preferences (and, as a matter of fact, even when consumptions are assumed to be either all public or, alternatively, all private), it is still the case that a continuum of different structural models generates the same group demand function. In other words, identifying restrictions are needed that go beyond egoistic preferences. In the remainder of the paper, we analyze the exact nature of such restrictions. We first investigate the mathematical structure underlying the model. We then prove a general result regarding uniqueness in this mathematical framework. Finally, we derive the specific results of interest.
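Condition (1) also suggests a simple numerical diagnostic. The sketch below (our own construction; the two Cobb-Douglas members and the price-dependent sharing rule are purely illustrative) builds the aggregate demand of an efficient two-member group with private goods only, computes its Slutsky matrix by finite differences, and checks one necessary implication of SNR(S − 1): since R has rank at most S − 1 = 1, the antisymmetric part of S(π) has rank at most 2.

```python
# A sketch of the SNR(S-1) restriction in condition (1), S = 2.  The aggregate
# demand below is our own toy construction: two Cobb-Douglas members, private
# goods only, and a price-dependent sharing rule; budget shares are assumed.
import numpy as np

N = 5
a1 = np.array([0.4, 0.3, 0.1, 0.1, 0.1])      # member 1 budget shares (assumed)
a2 = np.array([0.1, 0.1, 0.2, 0.3, 0.3])      # member 2 budget shares (assumed)

def demand(p, y):
    rho1 = y * p[0] / (p[0] + p[1])           # member 1's income share (toy rule)
    return a1 * rho1 / p + a2 * (y - rho1) / p

def slutsky(p, y, h=1e-6):
    xi = demand(p, y)
    dxi_dy = (demand(p, y + h) - xi) / h
    S = np.zeros((N, N))
    for j in range(N):
        dp = np.zeros(N)
        dp[j] = h
        S[:, j] = (demand(p + dp, y) - xi) / h + xi[j] * dxi_dy
    return S

S = slutsky(np.array([1.0, 1.5, 2.0, 0.8, 1.2]), 10.0)
A = 0.5 * (S - S.T)                           # antisymmetric part of the Slutsky matrix
print("singular values:", np.round(np.linalg.svd(A, compute_uv=False), 4))
# at most two singular values should be non-negligible, consistent with SNR(1)
```

For a unitary (single decision maker) demand the antisymmetric part would vanish entirely; the rank-2 pattern is the footprint of the second decision maker.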
3. THE MATHEMATICAL STRUCTURE OF THE IDENTIFIABILITY PROBLEM

3.1. The Duality Between Private and Public Consumption

3.1.1. Basic Intuition

With egoistic preferences, program (Pr) above becomes:

(Pr′)   max_{x_1, ..., x_S, X}  Σ_s μ^s(p, P, y, z) U^s(x_s, X)
        subject to  p′(x_1 + · · · + x_S) + P′X = y.

Let x_1(p, P, y, z), ..., x_S(p, P, y, z), X(p, P, y, z) denote its solution. The household demand function is then (x(p, P, y, z), X(p, P, y, z)), where x = Σ_s x_s.
In what follows, we repeatedly use the duality between private and public consumption, a standard tool in public economics. In the neighborhood of a point (p, P, y, z) such that D_P X is of full rank, we can consider the change in variables

ψ : R^{n+K+1+d} → R^{n+K+1+d},   ψ(p, P, y, z) = (p, X(p, P, y, z), y, z).

The economic motivation for such a change in variables is clear. A basic insight underlying the duality between private and public goods is that, broadly speaking, quantities play for public goods the role of prices for private goods, and conversely. Intuitively, in the case of private goods, all agents face the same price but consume different quantities, which add up to the group's demand; with public goods, agents consume the same quantity, but face different (Lindahl) prices, which add up to the market price if the allocation is efficient. This suggests that whenever the direct demand function x(p) is a relevant concept for private consumption, then the inverse demand function P(X) should be used for public goods. The change of variable ψ allows us to implement this intuition. In particular, instead of considering the demand function (x, X) as a function of (p, P, y, z), we shall often consider (x, P) as a function of (p, X, y, z) (then the public prices P are implicitly determined by the condition that demand for public goods must be equal to X, while private prices are equal to p, income is y, and distribution factors are z). While these two viewpoints are clearly equivalent (one can switch from the first to the second and back using the change ψ), the computations are much easier (and more natural) in the second case. Finally, for the sake of clarity, we omit distribution factors for the moment and consider functions of (p, X, y) only.
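The change of variables ψ is easy to emulate numerically. The sketch below is again a hypothetical example with log utilities, so that the collective demand for the public good has a simple closed form under the normalizations a_s + b_s = 1 and μ^1 + μ^2 = 1; it inverts the public-good demand X(p, P, y) in P to recover the inverse demand P(p, X, y).

```python
# A sketch of the change of variables psi: invert the public-good demand
# X(p, P, y) with respect to P.  The closed form below assumes a hypothetical
# two-member household with log utilities, tastes a_s + b_s = 1, and Pareto
# weights summing to one; it is an illustration, not the paper's model.
from scipy.optimize import brentq

mu = (0.6, 0.4)                     # Pareto weights (assumed)
b = (0.3, 0.5)                      # public-good taste parameters (assumed)

def X_demand(p, P, y):
    # with these normalizations, public expenditure is P*X = (sum_s mu_s*b_s)*y
    return sum(m * bs for m, bs in zip(mu, b)) * y / P

def P_inverse(p, X_target, y):
    # numerically solve X_demand(p, P, y) = X_target for the public price P
    return brentq(lambda P: X_demand(p, P, y) - X_target, 1e-8, 1e8)

p, y, P0 = 1.0, 10.0, 2.0
X0 = X_demand(p, P0, y)
print("X at P0:", X0, "| recovered P(p, X0, y):", round(P_inverse(p, X0, y), 6))
```

In this homothetic example X happens not to depend on p; in general, with several public goods, the inversion would be carried out jointly in all public prices for given (p, y).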
3.1.2. Conditional Sharing Rules

We now introduce the notion of a conditional sharing rule. It stems from the following result:

LEMMA 3: For given (p, P), let (x̄_1, ..., x̄_S, X̄) denote a solution to (Pr′). Define ρ^s(p, X, y) = p′x̄_s(p, X, y) for s = 1, ..., S. Then x̄_s solves

(Pr_s)   max_{x_s} U^s(x_s, X̄)   subject to  p′x_s ≤ ρ^s.

PROOF: Assume not. Then there exists some x_s such that p′x_s ≤ ρ^s and U^s(x_s, X̄) > U^s(x̄_s, X̄). But then the allocation (x̄_1, ..., x_s, ..., x̄_S, X̄) is feasible and Pareto dominates (x̄_1, ..., x̄_S, X̄), a contradiction. Q.E.D.

In words, an efficient allocation can always be seen as stemming from a two-stage decision process.¹⁰ At stage 1, members decide on the public purchases X and on the allocation of the remaining income y − P′X between the members; member s receives ρ^s. At stage 2, agents each choose their vector of private consumption, subject to their own budget constraint and taking the level of public consumption as given. The conditional sharing rule is the vector (ρ^1, ..., ρ^S); it generalizes the notion of sharing rule developed in collective models with private goods only (see, for instance, Chiappori (1992)) because it is defined conditionally on the level of public consumption previously chosen. Of course, if all commodities are private (K = 0), then the conditional sharing rule boils down to the standard notion. In all cases, it satisfies the budget constraint

(2)   Σ_s ρ^s = y − P′X.
The conditional sharing rule can be expressed either as a function of (p P y z), as above, or, using the change in variable ψ, as a function of (p X y z). In the first case, ρ is one-homogeneous in (p P y); in the second case, ρ is one-homogeneous in (p y).11 10 Needless to say, we are not assuming that the actual decision process is in two stages. The result simply states that any efficient group behaves as if it was following a process of this type. 11 With a slight notational abuse, we use the same notation ρ in both cases. This convention avoids tedious distinctions in a context in which confusions are easy to avoid.
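The two-stage representation in Lemma 3 can be checked numerically. In the sketch below (our own toy example: two members with assumed Cobb-Douglas preferences over two private goods and one public good), the efficient allocation is computed, the conditional sharing rule ρ^s = p′x̄_s is formed, and member s's private bundle is compared with her ordinary Cobb-Douglas demand at income ρ^s, holding X fixed.

```python
# A numerical sketch of Lemma 3, assuming Cobb-Douglas (log) utilities: two
# members, two private goods, one public good.  All parameters are hypothetical.
import numpy as np
from scipy.optimize import minimize

p = np.array([1.0, 1.5])
P, y = 2.0, 20.0
a = np.array([[0.3, 0.3, 0.4],      # member 1: weights on (x_s1, x_s2, X)
              [0.2, 0.4, 0.4]])     # member 2 (assumed)
mu = np.array([0.55, 0.45])         # Pareto weights

def U(s, xs, X):
    return a[s, 0] * np.log(xs[0]) + a[s, 1] * np.log(xs[1]) + a[s, 2] * np.log(X)

obj = lambda v: -(mu[0] * U(0, v[0:2], v[4]) + mu[1] * U(1, v[2:4], v[4]))
budget = {'type': 'eq', 'fun': lambda v: y - p @ (v[0:2] + v[2:4]) - P * v[4]}
sol = minimize(obj, x0=np.ones(5), constraints=[budget],
               bounds=[(1e-6, None)] * 5).x
x_bar, X_bar = [sol[0:2], sol[2:4]], sol[4]

for s in (0, 1):
    rho_s = p @ x_bar[s]                                  # conditional sharing rule
    stage2 = a[s, 0:2] / a[s, 0:2].sum() * rho_s / p      # demand of (Pr_s)
    print(f"member {s+1}: rho={rho_s:.3f}, efficient x={np.round(x_bar[s], 3)}, "
          f"stage-2 demand={np.round(stage2, 3)}")
```

Up to the numerical tolerance of the solver, the efficient private bundle and the stage-2 conditional demand coincide, which is exactly the content of the lemma.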
We define the conditional indirect utility of member s as the value of program (Prs ); hence (3)
V^s(p, X, ρ) = max_{x_s} { U^s(x_s, X)  s.t.  p′x_s = ρ },
which can be interpreted as the utility reached by member s when consuming X and being allocated an amount ρ for her private expenditures.

3.1.3. Some Duality Results

The envelope theorem applied to (3) gives:

D_p V^s = −λ_s x_s,   D_X V^s = D_X U^s,   D_ρ V^s = λ_s,

where λ_s denotes the Lagrange multiplier of the budget constraint (or equivalently the marginal utility of private income for member s) in (3), and where

D_p V = (∂V/∂p_1, ..., ∂V/∂p_n)′,   D_X V = (∂V/∂X_1, ..., ∂V/∂X_K)′,   D_ρ V = ∂V/∂ρ.
Hence we have a direct extension of Roy's identity: D_p V^s = −x_s D_ρ V^s. The first stage decision process can then be modelled as

max_{X, ρ^1, ..., ρ^S}  Σ_s μ^s(p, P, y) V^s(p, X, ρ^s)
subject to  Σ_s ρ^s + P′X = y.

First order conditions give

μ^s D_ρ V^s(p, X, ρ^s) = γ(p, P, y)   (s = 1, ..., S),
Σ_s μ^s D_X V^s(p, X, ρ^s) = γ(p, P, y) P,
where γ denotes the Lagrange multiplier of the constraint. Setting γ^s = μ^s/γ, we have that

D_ρ V^s(p, X, ρ^s) = 1/γ^s(p, P, y)   (s = 1, ..., S),

which expresses the fact that individual marginal utilities of private incomes, D_ρ V^s(p, X, ρ^s), are inversely proportional to Pareto weights. Finally, we have the conditions

(4)   Σ_s γ^s(p, P, y) D_p V^s = −Σ_s x_s = −x,
      Σ_s γ^s(p, P, y) D_X V^s = P,
which determine x(p P y) and X(p P y). Using the change in variable ψ, we see that (x P), as a function of (p X), can be decomposed as a linear combination of the partial derivatives of the V s . The nice symmetry of these equations illustrates the duality between private and public consumptions.
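The extended Roy identity D_p V^s = −x_s D_ρ V^s can be verified symbolically for a specific functional form. The following sketch assumes a Cobb-Douglas member with two private goods and one public good; the conditional demands and the conditional indirect utility are written down explicitly and the identity is checked coordinate by coordinate.

```python
# A symbolic check of the extended Roy identity D_p V^s = -x_s D_rho V^s,
# assuming a Cobb-Douglas member with two private goods and one public good.
import sympy as sp

p1, p2, rho, X = sp.symbols('p1 p2 rho X', positive=True)
a1, a2, b = sp.symbols('a1 a2 b', positive=True)

# conditional demands of the member at (p, X, rho)
x1 = a1 / (a1 + a2) * rho / p1
x2 = a2 / (a1 + a2) * rho / p2
# conditional indirect utility V^s(p, X, rho), obtained by substitution
V = a1 * sp.log(x1) + a2 * sp.log(x2) + b * sp.log(X)

print(sp.simplify(sp.diff(V, p1) + x1 * sp.diff(V, rho)))   # expected: 0
print(sp.simplify(sp.diff(V, p2) + x2 * sp.diff(V, rho)))   # expected: 0
```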
3.2. Collective Indirect Utility Following Chiappori (2005), we introduce the following, key definition: DEFINITION 4: Given a conditional sharing rule ρs (p X y), the collective indirect utility of agent s is defined by W s (p X y) = V s (p X ρs (p X y)) In words, W s denotes the utility level reached by agent s, at prices p and with total income y, in an efficient allocation such that the household demand for public goods is X, taking into account the conditional sharing rule ρs (p X y). Note that W s depends not only on the preferences of agent s (through the conditional indirect utility V s ), but also on the decision process (through the conditional sharing rule ρs ). Hence W s summarizes the impact on s of the interactions taking place within the group. As such, it is the main concept required for welfare analysis: knowing the W s allows one to assess the impact of any reform (i.e., any change in prices and incomes) on the welfare of each group member. Most of what follows is devoted to the identifiability of the collective indirect utility of each member.
3.3. Identifying Collective Indirect Utilities: The Mathematical Structure

Condition (4) can readily be translated in terms of collective indirect utilities. Since γ^s D_ρ V^s = 1,

γ^s D_p W^s = γ^s D_p V^s + (γ^s D_ρ V^s) D_p ρ^s = −x_s + D_p ρ^s,
γ^s D_X W^s = γ^s D_X V^s + (γ^s D_ρ V^s) D_X ρ^s = γ^s D_X V^s + D_X ρ^s,
γ^s D_y W^s = (γ^s D_ρ V^s) D_y ρ^s = D_y ρ^s,

and using the budget constraint (2), we conclude that

(5)   Σ_s γ^s D_p W^s = −x − (D_p P)′X = −x − D_p A,
      Σ_s γ^s D_X W^s = −(D_X P)′X = P − D_X A,
      Σ_s γ^s D_y W^s = 1 − (D_y P)′X = 1 − D_y A,

where A(p, X, y) = P(p, X, y)′X denotes household total expenditures on public goods. The identifiability question described above can thus be rephrased in mathematical terms as follows:

QUESTION B: Take an arbitrary, S-admissible demand (x(p, P, y), X(p, P, y)) and apply the change of variable ψ in the neighborhood of a regular point to obtain a function (x(p, X, y), P(p, X, y)). Let A(p, X, y) be the total household expenditure on public goods. Is there a unique family of differentiable functions W^s(p, X, y) on R^N, each defined up to some increasing transformation, such that the vector

(−x − D_p A,  P − D_X A,  1 − D_y A)′

can be expressed as a linear combination of the gradients of the W^s, 1 ≤ s ≤ S?

Decomposing a given function as a linear combination of gradients is an old problem in mathematics to which much work has been devoted in the first half of the 20th century, particularly by Elie Cartan, who developed exterior differential calculus for that purpose (see Cartan (1945), Arnold (1978), Bryant, Chern, Gardner, Goldschmidt, and Griffiths (1991)). We shall use these tools in what follows.
3.3.1. Particular Case of Public Goods Only

Before addressing Question B, we may briefly consider two special cases. Assume, first, that all goods are public. Then W^s coincides with the direct utility U^s, that is, W^s(X) = U^s(X). Since A = P′X = y, equations (5) become

(6)   Σ_s γ^s D_X U^s = P(X).

The existence (or characterization) problem is whether one can find functions U^s(X) and γ^s(X), 1 ≤ s ≤ S, to satisfy this equation. It has been addressed in Chiappori and Ekeland (2006). Here we are interested in the uniqueness (or identifiability) question: Are the U^s(X) unique up to an increasing transformation?

3.3.2. Particular Case of Private Goods Only

The opposite polar case obtains when all commodities are private. This is a case that has been repeatedly studied in the literature, starting with Chiappori (1988a, 1988b, 1992). Then A = 0 and W^s is a function of (p, y) only. Moreover, W^s, as V^s, is zero-homogeneous, so we can normalize y to be 1. Equations (5) become

(7)   Σ_s γ^s(p) D_p W^s = −x(p).
Remember that W s is not identical to the standard indirect utility function V s ; the difference, indeed, is that W s captures both the preferences of agent s (through V s ) and the decision process (which, in that case, is fully summarized by the sharing rule ρ). In particular, identifying W s is not equivalent to identifying V s (hence U s ). Specifically, we have that
(8)   W^s(p) = V^s(p, ρ^s(p)) = V^s(p/ρ^s(p), 1),

and it is easy to prove that knowledge of W^s is not sufficient to independently identify both ρ^s and V^s: for any W^s, there exists a continuum of pairs (ρ^s, V^s) such that (8) is satisfied.¹² In contrast with the public good case, knowledge of the collective indirect utilities is therefore not sufficient under private consumption to identify preferences and the decision process (as summarized by

¹² For instance, pick some arbitrary φ(p) mapping R^N into R, and define an alternative solution (ρ̄^s, V̄^s) by
ρ̄^s(p) = φ(p) ρ^s(p)   and   V̄^s(p, 1) = V^s(φ(p) p, 1).

Then (8) is satisfied for the alternative solution.
the sharing rule). This indeterminacy is a direct generalization of a previous result derived by Chiappori (1992) in a three commodity, labor supply setting. The crucial remark, however, is that this indeterminacy is welfare irrelevant. If (V¯ 1 V¯ S ; ρ¯ 1 ρ¯ S ) is a particular solution, the various, alternative solutions (V 1 V S ; ρ1 ρS ) have a very simple interpretation in welfare terms. Namely, the V s are such that the (ordinal) utility of each member s, when facing the sharing rule ρ, is always the same as under the V¯ and (ρ¯ 1 ρ¯ S ). It follows, in particular, that any reform, that is, found to increase the welfare of member s for V¯ s and ρ¯ s will also increase her welfare for V s and ρs . Again, identifying the collective indirect utilities W s is sufficient for welfare analysis. This remark is general and applies to any context, whatever the number of private and public goods. Finally, in the pure private good case, not only is the indeterminacy welfareirrelevant, but it is also behavior-irrelevant; that is, two households with the same collective indirect utilities W s and the same weights γ s will exhibit the same market behavior, irrespective of the particular sharing rule (although intrahousehold allocation will obviously differ). Hence estimation of collective indirect utilities and weights allows us to predict household behavior and derive comparative statics conclusions. 4. IDENTIFIABILITY IN THE GENERAL SETTING: A NEGATIVE RESULT We now state the central mathematical result that underlies most of the following conclusions. It will be applied in various settings. We denote by π = (π1 πN ) the independent variables, and by ξ(π) = (ξ1 (π) ξN (π)) a given vector field. All functions are assumed to be smooth (at least C 2 ). LEMMA 5: Suppose four functions (W¯ 1 (π) W¯ 2 (π) λ¯ 1 (π) λ¯ 2 (π)) satisfy (9)
λ1 (π)DW 1 (π) + λ2 (π)DW 2 (π) = ξ(π)
in some neighborhood Ω_1 of π̄, and that D_π W̄^1, D_π W̄^2, and D_π(λ̄_1/λ̄_2) are linearly independent at π̄. Then for any other family (W^1, W^2, λ_1, λ_2) satisfying (9) in some neighborhood Ω_2 of π̄, there exists a neighborhood Ω_3 ⊂ Ω_1 ∩ Ω_2 of π̄ and two functions F and G, defined on some neighborhood of (W̄^1(π̄), W̄^2(π̄), λ̄_1(π̄)/λ̄_2(π̄)) in R^3, such that, for all π ∈ Ω_3,

(10)   W^1(π) = F( W̄^1(π), W̄^2(π), (λ̄_1/λ̄_2)(π) ),

(11)   W^2(π) = G( W̄^1(π), W̄^2(π), (λ̄_1/λ̄_2)(π) ).
The functions F(t1 t2 t3 ) and G(t1 t2 t3 ) must moreover satisfy the partial differential equation
(12)   (∂G/∂t_1)(∂F/∂t_3) − (∂G/∂t_3)(∂F/∂t_1) = t_3 [ (∂G/∂t_2)(∂F/∂t_3) − (∂G/∂t_3)(∂F/∂t_2) ].

Finally, λ_1 and λ_2 are completely determined by the choice of F and G.

For the proof, see the Appendix. In practice, we can pick one arbitrary function F of three variables. Then (12) becomes a linear partial differential equation for the unknown function G, which is therefore fully determined by its values on any hypersurface not tangent to the vector field (∂F/∂t_3, −t_3 ∂F/∂t_3, −∂F/∂t_1 + t_3 ∂F/∂t_2). Hence the solution is defined up to the arbitrary choice of one function of three variables and one function of two variables.
Lemma 5 provides two conclusions. One is that whenever a given function can be decomposed as a linear combination of two gradients, this can be done in a continuum of different ways. Second, one can exactly characterize the set of such solutions. Specifically, the decomposition is identified up to an arbitrary function of three variables and an arbitrary function of two variables. An immediate implication is the following corollary:

COROLLARY 6: In the collective model, the collective indirect utilities of the members are not uniquely determined by the knowledge of household demand.

To get a better intuition of this result, it is useful to consider the particular case in which all goods are public. Remember that in that case the collective indirect utility of a member is simply the person's direct utility. Now, let (Ū^1, Ū^2) be a particular solution; the inverse demand P(X) can be written as a linear combination of the gradients of (Ū^1, Ū^2), say

λ̄_1 D_X Ū^1 + λ̄_2 D_X Ū^2 = P(X).

Define

U^1(X) = F(Ū^1(X), Ū^2(X)),   U^2(X) = G(Ū^1(X), Ū^2(X)),

where F and G are arbitrary, increasing functions. Clearly, a linear combination of the gradients of U^1 and U^2 must be a linear combination of the gradients of Ū^1 and Ū^2. The economic intuition is that any allocation X which is Pareto-efficient for (U^1, U^2) must also be Pareto-efficient for (Ū^1, Ū^2) (otherwise it would be possible to increase both Ū^1 and Ū^2, but this would increase U^1 and U^2 as well, a contradiction). This gives a first intuition why the Pareto efficiency assumption is not sufficient to distinguish between the two solutions: If a demand function can be collectively rationalized by a couple with utilities (Ū^1, Ū^2), then it can also be collectively rationalized by a couple with utilities (F(Ū^1, Ū^2), G(Ū^1, Ū^2)).
Interestingly enough, this is not the only degree of indeterminacy. To get a different example, set θ̄(X) = λ̄_1(X)/λ̄_2(X), and define

U^1(X) = Ū^1(X) + Ū^2(X) + θ̄(X),
U^2(X) = −Ū^1(X) + log(1 − θ̄(X)).

It is easy to check that

λ̄_2 D_X U^1 + (λ̄_2 − λ̄_1) D_X U^2 = λ̄_1 D_X Ū^1 + λ̄_2 D_X Ū^2 = P(X),

and we conclude that the inverse demand function P(X) can also be expressed as a linear combination of the gradients of utilities (U^1, U^2). Here, F(t_1, t_2, t_3) = t_1 + t_2 + t_3 and G(t_1, t_2, t_3) = −t_1 + ln(1 − t_3) is a particular solution of equation (12).
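The algebra behind this second example can be verified symbolically. The sketch below (our own check, written for two public goods for concreteness, with Ū^1, Ū^2, λ̄_1, λ̄_2 left as abstract functions) confirms that the alternative pair (U^1, U^2) reproduces the same inverse demand P(X).

```python
# A symbolic check of the example above: with theta = lam1/lam2,
# U1 = Ub1 + Ub2 + theta and U2 = -Ub1 + log(1 - theta), the combination
# lam2*D U1 + (lam2 - lam1)*D U2 equals lam1*D Ub1 + lam2*D Ub2 = P(X).
# Two public goods are used for concreteness; all functions are abstract.
import sympy as sp

X1, X2 = sp.symbols('X1 X2')
Ub1, Ub2 = sp.Function('Ub1')(X1, X2), sp.Function('Ub2')(X1, X2)
lam1, lam2 = sp.Function('lam1')(X1, X2), sp.Function('lam2')(X1, X2)

theta = lam1 / lam2
U1 = Ub1 + Ub2 + theta
U2 = -Ub1 + sp.log(1 - theta)

for Xk in (X1, X2):
    lhs = lam2 * sp.diff(U1, Xk) + (lam2 - lam1) * sp.diff(U2, Xk)
    rhs = lam1 * sp.diff(Ub1, Xk) + lam2 * sp.diff(Ub2, Xk)
    print(sp.simplify(lhs - rhs))    # expected: 0 in each coordinate
```

Both coordinates simplify to zero, so the two structural models are observationally equivalent, as claimed.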
5. IDENTIFIABILITY UNDER EXCLUSION

The previous section concludes that identifiability does not obtain in the general version of the collective model. The next step is to find specific, additional assumptions that guarantee identifiability. We now show that a simple exclusion condition is sufficient in general.

5.1. The Main Result

Again, suppose four functions (W̄^1(π), W̄^2(π), λ̄_1(π), λ̄_2(π)) satisfy equation (9) in some neighborhood Ω_1 of π̄ and set θ̄(π) = λ̄_1(π)/λ̄_2(π). We shall say that this solution is generic if D_π W̄^1, D_π W̄^2, and D_π θ̄ are linearly independent at π̄, and

(13)   ∂W̄^2/∂π_1(π̄) ≠ 0   and   D̄_abcd(π̄) ≠ 0 for some a, b, c, d ∈ {1, ..., N},
where

(14)   Φ̄(π) = ( ∂W̄^2/∂π_1(π) )^{-1} ∂θ̄/∂π_1(π)
(15)
¯ ∂Φ ∂πa ∂Φ¯ ¯ abcd = ∂πb D ∂Φ¯ ∂π c ¯ ∂Φ ∂πd
∂W¯ 1 ∂πa ∂W¯ 1
∂W¯ 2 ∂πa ∂W¯ 2
∂θ¯ ∂πa ∂θ¯
∂πb ∂W¯ 1 ∂πc ∂W¯ 1 ∂πd
∂πb ∂W¯ 2 ∂πc ∂W¯ 2 ∂πd
∂πb ∂θ¯ ∂πc ∂θ¯ ∂πd
Note that if equations (13) hold at π, ¯ they will hold in a neighborhood of π¯ as well. In practice, there is no reason to expect that the functions W¯ 1 , W¯ 2 , and ¯ abcd (π) ¯ = 0; and if they θ¯ = λ¯ 1 /λ¯ 2 would satisfy either ∂W¯ 2 /∂π1 (π) = 0 or D do, the equality is not robust to a slight perturbation of the model. We shall say that commodity 1 is excluded from W¯ 1 in Ω if: (16)
∂W̄^1/∂π_1(π) = 0   for all π ∈ Ω.
We now show that an exclusion condition for each agent is sufficient to guarantee identifiability for generic models. Note, first, that the result is obvious if the number of commodities is exactly 2, since individual consumptions are perfectly observed in that case. Also, the case N = 3 has already been solved by Chiappori (1992) for private goods and Blundell, Chiappori, and Meghir (2005) for public goods. We are thus left with the general case N ≥ 4. The main result is the following proposition:

PROPOSITION 7: Let N ≥ 4. Assume that equation (9) has a generic solution (W̄^1, W̄^2, λ̄_1, λ̄_2) in some neighborhood Ω_1 of π̄, where commodity 1 is excluded from W̄^1. Let (W^1, W^2, λ_1, λ_2) be another solution on Ω_1, with commodity 1 excluded from W^1. Then there exists a function F and a neighborhood Ω_2 ⊂ Ω_1 of π̄ such that W^1 = F(W̄^1) on Ω_2, so that W^1 is ordinally identified.

PROOF: From Lemma 5, we know that there is some neighborhood Ω_2 ⊂ Ω_1 and some function F such that

W^1 = F(W̄^1, W̄^2, θ̄)   on Ω_2.

Without loss of generality, it can be assumed that equations (13) hold for all π ∈ Ω_2. From the exclusion condition for W^1 and W̄^1, we derive, for all π ∈ Ω_2,
(17)   0 = ∂W^1/∂π_1(π) = F_2(W̄^1, W̄^2, θ̄) ∂W̄^2/∂π_1(π) + F_3(W̄^1, W̄^2, θ̄) ∂θ̄/∂π_1(π).

Here and later, we denote by F_i, i = 1, 2, 3, the derivative of F with respect to the ith variable. Suppose F_3(W̄^1(π), W̄^2(π), θ̄(π)) ≠ 0 for some π ∈ Ω_2. Equation (17) then implies that

Φ̄ = ( ∂W̄^2/∂π_1 )^{-1} ∂θ̄/∂π_1 = − F_2(W̄^1, W̄^2, θ̄) / F_3(W̄^1, W̄^2, θ̄)

in some neighborhood Ω_3 ⊂ Ω_2 of π. So Φ̄ is a function of (W̄^1, W̄^2, θ̄), and its gradient must be a linear combination of the gradients of W̄^1, W̄^2, and θ̄. For N ≥ 4, this translates into D̄_abcd(π) = 0 for all a, b, c, d ∈ {1, ..., N}. But this contradicts the fact that equation (13) holds for all π in Ω_2.
So we must have F_3(W̄^1(π), W̄^2(π), θ̄(π)) = 0 for all π ∈ Ω_2. Then equation (17) becomes

0 = F_2(W̄^1(π), W̄^2(π), θ̄(π)) ∂W̄^2/∂π_1(π),

which in turn implies F_2(W̄^1(π), W̄^2(π), θ̄(π)) = 0. We are left with W^1(π) = F[W̄^1(π)], as announced. Q.E.D.

Proposition 7 states that, broadly speaking, a simple exclusion condition is sufficient to obtain identifiability. Note that the result is specific to one of the functions: if W^1 does not depend on π_1, then W^1 is ordinally identifiable, irrespective of W^2 (which may not be).

5.2. Generalized Separability

The previous result has an important extension that generalizes the notion of separability. Let φ be a given, smooth function from a neighborhood Ω_1 of π̄ in R^N to R, such that

(18)   ∂φ/∂π_N(π) ≠ 0   for all π ∈ Ω_1.
Suppose four functions (W̄^1(π), W̄^2(π), λ̄_1(π), λ̄_2(π)) satisfy equation (9) in some neighborhood Ω_2 ⊂ Ω_1 of π̄. We shall say that this solution is generic if ∂W̄^1/∂π_N does not vanish on Ω_1, if D_π W̄^1(π̄), D_π W̄^2(π̄), and D_π θ̄(π̄) are linearly independent, and if

(19)   ( ∂W̄^2/∂π_N )^{-1} ∂W̄^2/∂π_1,   ( ∂θ̄/∂π_N )^{-1} ∂θ̄/∂π_1,   and   ( ∂φ/∂π_N )^{-1} ∂φ/∂π_1
are well defined and pairwise distinct on Ω1 (with θ¯ = λ¯ 1 /λ¯ 2 as above).
We now introduce a generalized separability condition, which states that all solutions depend on two particular variables, say π_1 and π_N, only through the common function φ. This is a well-known consequence of separability in standard consumer theory (when a set of commodities is separable, their demand depends on the other prices only through an income effect), which explains the expression “generalized separability.” Technically, we say that W̄^1 is separable through φ if:

(20)   (∂φ/∂π_N)(π) (∂W̄^1/∂π_1)(π) = (∂W̄^1/∂π_N)(π) (∂φ/∂π_1)(π).
COROLLARY 8: Let N ≥ 4. Assume that (W̄^1, W̄^2, λ̄_1, λ̄_2) is a generic solution of equation (9) on Ω_2 and that W̄^1 is separable through φ. Let (W^1, W^2, λ_1, λ_2) be another solution on Ω_2, with W^1 separable through the same φ. Then there exists a function F and a neighborhood Ω_3 ⊂ Ω_2 of π̄ such that W^1 = F(W̄^1) on Ω_3, that is, W^1 is ordinally identified.

PROOF: We claim that W^1 can be written in the form

(21)   W^1(π) = K^1[π_2, ..., π_{N−1}, φ(π)].
Indeed, the map

(22)   (π_1, π_2, ..., π_N) → (π_1, π_2, ..., π_{N−1}, φ(π))

is invertible in a neighborhood Ω_3 of π̄ by condition (18). So W^1 can be written as

W^1(π) = K^1[π_1, π_2, ..., π_{N−1}, φ(π)].

Differentiating, we get

∂W^1/∂π_1 = ∂K^1/∂π_1 + (∂K^1/∂φ)(∂φ/∂π_1),
∂W^1/∂π_N = (∂K^1/∂φ)(∂φ/∂π_N),

so that

( ∂W^1/∂π_N )^{-1} ∂W^1/∂π_1 = (∂K^1/∂π_1) ( (∂K^1/∂φ)(∂φ/∂π_N) )^{-1} + ( ∂φ/∂π_N )^{-1} ∂φ/∂π_1.

Condition (20) then implies that ∂K^1/∂π_1 = 0 for all π, and (21) is proved. Taking (π_1, π_2, ..., π_{N−1}, φ) as local coordinates, we find that W^1 = K^1(π_2, ..., π_{N−1}, φ) does not depend on the first coordinate, and Proposition 7 applies. Q.E.D.
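As a consistency check on Corollary 8, one can verify symbolically that the representation (21) does imply the separability condition (20); the sketch below does this for N = 4 with K and φ left abstract.

```python
# A symbolic consistency check: if W^1 has the form (21), i.e. it depends on
# (pi_1, pi_N) only through phi, then the separability condition (20) holds.
# Here N = 4; K and phi are abstract functions.
import sympy as sp

pi1, pi2, pi3, pi4 = sp.symbols('pi1 pi2 pi3 pi4')
phi = sp.Function('phi')(pi1, pi2, pi3, pi4)
K = sp.Function('K')
W1 = K(pi2, pi3, phi)

lhs = sp.diff(phi, pi4) * sp.diff(W1, pi1)
rhs = sp.diff(W1, pi4) * sp.diff(phi, pi1)
print(sp.simplify(lhs - rhs))        # expected: 0, i.e. condition (20)
```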
Corollary 8 shows that generalized separability is but a particular form of exclusion restriction, so that the general identifiability result applies in that case as well. As it turns out, while Proposition 7 directly relates to public goods, Corollary 8 is easier to use in a private good context. Finally, note that if either the exclusion or the generalized separability property applies to both members (with different excluded goods x1 and x2 , or different separating functions φ1 and φ2 ), then both W 1 and W 2 are ordinally identifiable. Then for any cardinalization of W 1 and W 2 , the coefficients λi are identifiable as well. 5.3. The Meaning of Genericity Before discussing the economic interpretation of the previous results, a remark is in order. In Proposition 7 and Corollary 8, identifiability is only “generic.” Technically, what Proposition 7 shows is that identifiability obtains unless the structural functions W¯ 1 , W¯ 2 , θ¯ satisfy a set of partial differential equations that can be explicitly derived. It is thus important to understand the exact scope of these partial differential equations, and particularly to discuss examples in which they are satisfied everywhere. An obvious case in which the equations are always fulfilled obtains when θ¯ ¯ abcd in is constant: Φ¯ is then identically null and so are the determinants D condition (15). This result is by no means surprising. Indeed, if θ¯ is a constant (say k), then λ¯ 2 (π) = kλ¯ 1 (π) for all π, and (9) becomes ξ(π) = λ1 (π)[Dπ W 1 (π) + kDπ W 2 (π)] = λ1 (π)Dπ [W 1 (π) + kW 2 (π)] In that case, the function ξ is in fact proportional to a single gradient; economically, the group behaves as a single consumer with an indirect utility equal to W 1 (π) + kW 2 (π). The nonidentifiability conclusion is indeed expected in that case: clearly, there exists a continuum of pairs (W 1 (π) W 2 (π)) that add up to the same weighted sum W 1 (π) + kW 2 (π). Note that exclusion does not help much here, unless the variables that enter W 1 and W 2 are all different. If there exists one variable, say πN , that enters both utilities, then if (W 1 (π) W 2 (π)) is a solution, so is (W 1 (π) + kf (πN ) W 2 (π) − f (πN )) for any arbitrary function f . We conclude that when a group behaves as a single consumer, then individual preferences are not identifiable. Ironically, a large fraction of the literature devoted to household behavior tends to assume a unitary setting in which the group is described as a unique decision maker. Our conclusions show that this approach, while analytically convenient, entails a huge cost, since it precludes the (nonparametric) identifiability of individual consumption and welfare. In a
general sense, nonunitary models are indispensable for addressing issues related to intrahousehold allocation.¹³
Finally, except for these particular cases, the identifiability result is quite robust. For instance, whenever a particular solution (W̄^1, W̄^2, θ̄) is nongeneric (in the sense that ∂W̄^2/∂π_1(π̄) = 0 and D̄_abcd(π̄) = 0 for all a, b, c, d ∈ {1, ..., N}, so that (13) is violated), the violation is not robust to slight perturbations of either preferences or weights. An example entailing linear versions of all functions is provided in Chiappori and Ekeland (2007).

6. APPLICATION: IDENTIFIABILITY IN THE COLLECTIVE MODEL (S = 2)

We may now specialize our result to specific economic contexts.

6.1. Identifiability With Purely Public Consumptions

We first consider the benchmark case where all commodities are publicly consumed; then n = 0 and N = K, and the problem can be expressed as (see Section 3.3.1)

Σ_{s=1,2} γ^s D_X U^s = P(X, y),

where γ^s = μ^s/γ. In that case, the exclusion assumption has a natural translation, namely that commodity 1 (resp. 2) is not consumed by member 1 (resp. 2). Also, remember that in the public good case, the collective indirect utility is just the direct utility. We can thus apply Proposition 7 to W^i = U^i and π_j = X^j, and conclude that U^1 and U^2 are ordinally identifiable. Moreover, for any cardinalization of these utilities, the coefficients γ^s are exactly identifiable. Since they are proportional to the Pareto weights and the latter add up to 1, we conclude that the Pareto weights are identifiable as well. In summary, we state the following corollary:

COROLLARY 9: In the collective model with two agents and public consumption only, if for each member there exists at least one good not consumed by that member but consumed by the other, then generically individual preferences are exactly (ordinally) identifiable from household demand, and for any cardinalization of individual utilities, the Pareto weights are exactly identifiable.

We may briefly discuss the conditions needed for identifiability. One is that ∂W^2(π)/∂π_1 ≠ 0, that is, in this case, ∂U^2(π)/∂X^1 ≠ 0. Clearly, if commodity 1 is valued neither by member 1 nor by member 2, household demand for this good is zero and cannot be used for identifiability.
786
P.-A. CHIAPPORI AND I. EKELAND
More demanding is the requirement that ∂θ/∂X 1 = 0. In our context, θ is the ratio of individual Pareto weights. In general, Pareto weights are functions of prices P and income y. In Proposition 7, however, θ is expressed as a function of (X y) to exploit the private/public goods duality. The link between the two is straightforward: If μ denotes the ratio of Pareto weights, we have θ(X y) = μ[P(X y) y], and hence ∂μ ∂Pk ∂θ = 1 ∂X ∂Pk ∂X 1 k=1 K
This partial ∂θ/∂X 1 cannot be zero unless the vector (∂P1 /∂X 1 ∂PK / ∂X 1 ) is orthogonal to the gradient of μ in P. This will surely occur if μ is constant, since the gradient is null. This is the case, discussed above, in which the household behaves as a single decision maker and maximizes the unitary utility U 1 + μU 2 . If μ is not constant, however, then ∂θ/∂X 1 = 0 leads to the equation: K ∂μ ∂Pk [P(X y) y] (X y) = 0 ∂Pk ∂X 1 k=1
Taking $(P, y)$ as independent variables instead of $(X, y)$, we get an equation of the form
$$(23)\qquad \sum_{k=1}^{K}\frac{\partial\mu}{\partial P_k}(P, y)\,\varphi_k(P, y) = 0,$$
where the $\varphi_k$, $1 \le k \le K$, depend on $\mu$, $U^1$, and $U^2$. It can be shown that, generically14 in $(\mu, U^1, U^2)$, equation (23) defines a submanifold $\Sigma$ of codimension 1 (and hence a set of measure 0) in $\mathbb{R}^K \times \mathbb{R}$. In other words, if the actual $\mu$, $U^1$, and $U^2$ are generic (in particular, if $\mu$ is not constant), there is a set $\Sigma$ of measure 0 such that, whenever $(P, y) \notin \Sigma$, Pareto weights and preferences can be uniquely recovered from observing the collective demand function $X(P, y)$ near $(\bar P, \bar y)$.

6.2. Application: Collective Models of Labor Supply With Public Consumptions

An immediate application is to the collective model of household labor supply, initially introduced by Chiappori (1988a, 1988b, 1992).

14 Here, genericity is taken in the sense of Thom; it means that the set $\mathcal{N}$ of $(\mu, U^1, U^2)$ where the property does not hold is small in an appropriate sense. To be precise, it is, in an appropriate function space, contained in a countable union of closed sets with empty interiors. As a consequence, it has itself an empty interior, so that if $(\bar\mu, \bar U^1, \bar U^2)$ happens to belong to $\mathcal{N}$, there are neighbors $(\mu, U^1, U^2)$, which are as close as one wishes to $(\bar\mu, \bar U^1, \bar U^2)$ and which do not belong to $\mathcal{N}$.
The idea is to consider the household as a two-person group making Pareto-efficient decisions on consumption and labor supply. Let $L^s$ denote the leisure of member $s$ and let $w^s$ denote the corresponding wage. Various versions of the model can be considered. In each of them Corollary 9 applies, leading to full identifiability of the model.

MODEL 1—Leisure as an Exclusive Good: In the first model, each member's leisure is exclusive (i.e., excluded from the other member's consumption) and there is no household production. Labor and nonlabor incomes are used to purchase commodities $X^1, \dots, X^K$ that are publicly consumed within the household; utilities are thus of the form $U^s(L^s, X^1, \dots, X^K)$.

MODEL 2—Leisures Are Public, One Exclusive Good per Member: In the second model, leisure of one member is also consumed by the other member; again, there is no household production. The identifying assumption is that there exist two commodities (say, 1 and 2) such that commodity $i$ is exclusively consumed by member $i$.15 One can think, for instance, of clothing as the exclusive commodity (as in Browning, Bourguignon, Chiappori, and Lechene (1994)), but many other examples can be considered. Utilities are then of the form $U^i(L^1, L^2, X^i, X^3, \dots, X^K)$. Again, Corollary 9 applies: from the observation of the two labor supplies and the $K$ consumptions as functions of prices, wages, and nonlabor income, it is possible to uniquely recover preferences and Pareto weights. This is a strong result indeed, since it states that one can, from the sole observation of household labor supply and consumption, identify a person's marginal willingness to pay for her spouse's leisure.

MODEL 3—Leisures as Exclusive Goods With Household Production: As a third example, assume that individual time can be devoted to three different uses: leisure, market work, and household production. The domestic good $Y$ is produced from domestic labor under some constant-returns-to-scale technology, say $Y = f(t^1, t^2)$,16 and publicly consumed within the household. Preferences are of the form $U^s(L^s, Y, X^1, \dots, X^K)$ and one can define
$$(24)\qquad \tilde U^s(L^s, t^1, t^2, X^1, \dots, X^K) = U^s\bigl(L^s, f(t^1, t^2), X^1, \dots, X^K\bigr).$$
Here, $Y$ is not observable in general, but $t^1$ and $t^2$ are observed, which typically requires data on time use (obviously, there is little chance to identify household production if neither the output nor the inputs are observable).

15 This framework is close to (but less general than) that of Fong and Zhang (2001).
16 Other inputs can be introduced at no cost, provided they are observable.
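The recovery of the technology in Model 3 rests on a simple chain-rule property of the composite utilities defined in (24): the marginal rate of substitution between the two time inputs in $\tilde U^s$ coincides with that of $f$ itself, whatever the (unobserved) technology. The following minimal SymPy sketch checks this identity symbolically; the names U, f and the single market good X1 are placeholders introduced here for illustration, not notation from the paper.

```python
import sympy as sp

L, t1, t2, X1 = sp.symbols('L t1 t2 X1')
f = sp.Function('f')(t1, t2)        # unobserved domestic technology (placeholder)
U = sp.Function('U')(L, f, X1)      # composite utility as in (24): U(L, f(t1, t2), X1)

# MRS between the two time inputs, computed from the composite utility and from f directly.
mrs_utility = sp.diff(U, t1) / sp.diff(U, t2)
mrs_technology = sp.diff(f, t1) / sp.diff(f, t2)

print(sp.simplify(mrs_utility - mrs_technology))   # 0: time-use data reveal the technology's MRS
```

Together with constant returns to scale, this is what allows $f$ to be recovered up to a scaling factor in the relation derived just below.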
From Corollary 9, the $\tilde U^s$ are identified. Then the production technology can be recovered up to a scaling factor, using the assumption of constant returns to scale, from the relation
$$\frac{\partial\tilde U^1/\partial t^1}{\partial\tilde U^1/\partial t^2} = \frac{\partial\tilde U^2/\partial t^1}{\partial\tilde U^2/\partial t^2} = \frac{\partial f/\partial t^1}{\partial f/\partial t^2},$$
which in addition generates an overidentifying restriction. Finally, (24) allows us to recover the $U^s$; again, the separability property in (24) generates additional, testable restrictions.

MODEL 4—Leisures Are Public, One Exclusive Good per Member and Household Production: Finally, one can combine Models 2 and 3 by assuming that leisure is a public good, but the demand for two other exclusive goods can be observed. Again, identifiability generically obtains in this context.

6.3. Identifiability in the General Case

We may now address the general case with private and public consumptions, that is, we want to solve the system of equations (5) for a given $A(p, X, y)$. In the presence of private consumption, the problem is more complex, because the exclusivity assumption of the type introduced in Proposition 7 does not have a direct economic translation. Indeed, remember that $W^s(p, X, y) = V^s(p, X, \rho^s(p, X, y))$; hence
$$\frac{\partial W^1}{\partial p_1} = \frac{\partial V^1}{\partial p_1} + \frac{\partial V^1}{\partial\rho^1}\,\frac{\partial\rho^1}{\partial p_1}.$$
We see, in particular, that even when consumer 1 does not consume commodity 1, so that $\partial V^1/\partial p_1 = \partial U^1/\partial x_1 = 0$, we still have that $\partial W^1/\partial p_1 = (\partial V^1/\partial\rho^1)(\partial\rho^1/\partial p_1) \neq 0$ in general. Intuitively, even when a good is not consumed by an agent, the corresponding expenditures may still impact the agent's share of resources, therefore the agent's welfare, through an income effect.

However, Corollary 8 is now the relevant tool. Take a point $(\bar p, \bar X, \bar y)$ and assume that
$$(25)\qquad \frac{\partial\rho^1}{\partial y}(\bar p, \bar X, \bar y) \neq 0.$$
From now on, all results are understood to be local, that is, to hold in some neighborhood of $(\bar p, \bar X, \bar y)$. Assume, now, that commodity 1 is not consumed by member 1 and that commodity 2 is not consumed by member 2:
$$\frac{\partial V^i}{\partial p_i} = \frac{\partial U^i}{\partial x_i} = 0 \qquad (i = 1, 2).$$
It follows, first, that for any solution $W$:
$$\frac{\partial W^1(p, X, y)/\partial p_1}{\partial W^1(p, X, y)/\partial y} = \frac{\partial\rho^1(p, X, y)/\partial p_1}{\partial\rho^1(p, X, y)/\partial y}.$$
Let $(U^1, U^2, \rho^1, \rho^2 = y - A - \rho^1)$ and $(\bar U^1, \bar U^2, \bar\rho^1, \bar\rho^2 = y - A - \bar\rho^1)$ be the utilities and conditional sharing rules corresponding to two different solutions of equations (5). We have:
$$(26)\qquad \frac{\partial W^1/\partial p_1}{\partial W^1/\partial y} = \frac{\partial\rho^1/\partial p_1}{\partial\rho^1/\partial y} \quad\text{and}\quad \frac{\partial\bar W^1/\partial p_1}{\partial\bar W^1/\partial y} = \frac{\partial\bar\rho^1/\partial p_1}{\partial\bar\rho^1/\partial y}.$$
Consider the household demand $x_2$ for commodity 2—which by assumption comes from agent 1. We have
$$x_2(p, X, y) = \xi_1^2\bigl(p_2, p_3, \dots, p_n, X, \rho^1(p, X, y)\bigr) = \bar\xi_1^2\bigl(p_2, p_3, \dots, p_n, X, \bar\rho^1(p, X, y)\bigr),$$
where $\xi_1^i$ (resp. $\bar\xi_1^i$) is the conditional Marshallian demand corresponding to $U^1$ (resp. $\bar U^1$). Therefore,
$$(27)\qquad \frac{\partial\rho^1/\partial p_1}{\partial\rho^1/\partial y} = \frac{\partial x_2/\partial p_1}{\partial x_2/\partial y} = \frac{\partial\bar\rho^1/\partial p_1}{\partial\bar\rho^1/\partial y}.$$
Comparing (26) with (27), we get
$$\frac{\partial\bar W^1/\partial p_1}{\partial\bar W^1/\partial y} = \frac{\partial\rho^1/\partial p_1}{\partial\rho^1/\partial y},$$
which is condition (20). Applying Corollary 8, we get the following corollary:

COROLLARY 10: In the general collective model with two agents, under assumption (25), if each member does not consume at least one of the goods consumed by the other, then generically the indirect collective utility of each member is exactly (ordinally) identifiable from household demand. For any cardinalization of indirect collective utilities, the Pareto weights are exactly identifiable.
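The step from (26) to (27) is just the chain rule: since the excluded good's demand $x_2$ depends on $(p_1, y)$ only through the conditional sharing rule $\rho^1$, the observable ratio $(\partial x_2/\partial p_1)/(\partial x_2/\partial y)$ pins down $(\partial\rho^1/\partial p_1)/(\partial\rho^1/\partial y)$ for any solution. A minimal SymPy sketch of this step follows; the function names rho1 and xi are placeholders for arbitrary smooth functions, not specific forms from the paper.

```python
import sympy as sp

p1, p2, y = sp.symbols('p1 p2 y')                 # price of the excluded good, another price, income
rho1 = sp.Function('rho1')(p1, p2, y)             # member 1's conditional sharing rule (placeholder)
x2 = sp.Function('xi')(p2, rho1)                  # household demand for good 2, consumed by member 1 only:
                                                  # it depends on p1 and y only through rho1

ratio_demand = sp.diff(x2, p1) / sp.diff(x2, y)   # observable from household demand
ratio_sharing = sp.diff(rho1, p1) / sp.diff(rho1, y)

print(sp.simplify(ratio_demand - ratio_sharing))  # 0: the ratio in (27) is identified
```

The same computation applied to $\bar\rho^1$ delivers the second equality in (27).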
Note, however, that the conditions (19) have to be satisfied. Although they hold true in a generic sense, checking that they are fulfilled in a specific context may be tedious. In such cases, using distribution factors may facilitate identifiability (see below). Links With the Existing Literature The statements derived above generalize several results existing in the literature. I. In an earlier work, Chiappori (1992) analyzed a collective model of labor supply in a three goods framework (two labor supplies and an Hicksian composite good). In this model, all commodities are privately consumed and each member’s labor supply is exclusive. Chiappori showed that • efficiency is equivalent to the existence of a sharing rule; • the sharing rule is identifiable from labor supply up to an additive constant; for any choice of the constant, individual preferences are exactly identified; • finally, the additive constant is welfare-irrelevant. Our paper extends these conclusions to a general framework, including an arbitrary number of private and public consumptions. In particular, • the sharing rule of Chiappori’s model (which entails private goods only) can be extended to a conditional sharing rule in the general context; • identifiability up to an additive constant is only true in a three goods context; in the general case, the indeterminacy of direct utilities and the sharing rule is deeper; • however, the indeterminacy is welfare-irrelevant, which generalizes Chiappori’s conclusion. II. In a recent paper, Blundell, Chiappori, and Meghir (2005) (from now on BCM) analyzed a similar model, with the difference that the nonexclusive good is public (and can be interpreted as child expenditures or welfare). They proved that their model is generically identifiable when a distribution factor is available.17 Again, the present paper shows that identifiability obtains in the BCM context with more than three public goods, and actually an arbitrary (larger than three) number of private and public consumptions, even without distribution factors. 6.4. Identifiability Using Distribution Factors The results derived so far do not use distribution factors. We now show how such factors can help the identifiability process by alleviating the necessary conditions. Let z denote a distribution factor which is behavior-relevant, in the 17 BCM provided a detailed analysis of the comparative statics of such a model and of its welfare implications. They showed, in particular, that increasing the Pareto weight of a particular member increases the household demand for a public good if and only if the person’s marginal willingness to pay for the public good is more income sensitive than that of her spouse.
sense that $\partial\theta/\partial z \neq 0$, where $\theta = \lambda^1/\lambda^2$ as usual. Also, we maintain the regularity assumption (25). From $W^1(p, X, y, z) = V^1(p, X, \rho^1(p, X, y, z))$ we get that
$$(28)\qquad \frac{\partial W^1/\partial z}{\partial W^1/\partial y} = \frac{\partial\rho^1/\partial z}{\partial\rho^1/\partial y},$$
and Corollary 8 applies if we can show that the right-hand side does not depend on the particular solution considered. This can be done, as before, when each member is the exclusive consumer of (at least) one commodity. But this requirement can be relaxed. We only need either one exclusive good (instead of two) or an assignable commodity. A good is assignable when it is consumed by both members and the consumption of each member is independently observed. Assume that good 1 is either assignable or exclusively consumed by member 1, so that $x_1^1(p, X, y, z)$ is observed. Now, for any two solutions, we have that
$$x_1^1(p, X, y, z) = \xi_1^1\bigl(p, X, \rho^1(p, X, y, z)\bigr) = \bar\xi_1^1\bigl(p, X, \bar\rho^1(p, X, y, z)\bigr),$$
where $\xi_1^1$ and $\bar\xi_1^1$ denote the conditional demand of member 1 in each solution. Therefore,
$$\frac{\partial\rho^1/\partial z}{\partial\rho^1/\partial y} = \frac{\partial x_1^1/\partial z}{\partial x_1^1/\partial y} = \frac{\partial\bar\rho^1/\partial z}{\partial\bar\rho^1/\partial y},$$
and the right-hand side of (28) does not depend on the particular solution considered, so that identifiability obtains by Corollary 8. The genericity condition (19) may also be easier to check in that case. Consider, in particular, the case of private goods only. Then $W^2(p, y, z) = V^2(p, \rho^2(p, y, z)) = V^2(p, y - \rho^1(p, y, z))$; hence
$$\frac{\partial W^2/\partial z}{\partial W^2/\partial y} = \frac{\partial\rho^2/\partial z}{\partial\rho^2/\partial y} = -\frac{\partial\rho^1/\partial z}{1 - \partial\rho^1/\partial y},$$
which cannot equal $(\partial\rho^1/\partial z)/(\partial\rho^1/\partial y)$ unless $\partial\rho^1/\partial z = 0$, that is, unless the distribution factor is behavior-irrelevant.
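The last formula is again a one-line chain-rule computation, which can be checked symbolically as follows (a minimal sketch; V2 and rho1 are placeholder functions, and the private-goods-only case is assumed so that $\rho^2 = y - \rho^1$):

```python
import sympy as sp

p, y, z = sp.symbols('p y z')
rho1 = sp.Function('rho1')(p, y, z)         # member 1's sharing rule (placeholder)
W2 = sp.Function('V2')(p, y - rho1)         # member 2's collective indirect utility: V2(p, rho2), rho2 = y - rho1

lhs = sp.diff(W2, z) / sp.diff(W2, y)
rhs = -sp.diff(rho1, z) / (1 - sp.diff(rho1, y))

print(sp.simplify(lhs - rhs))               # 0: confirms the displayed expression for member 2's ratio
```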
These results can easily be illustrated on specific functional forms. A simple example, using LES preferences and linear Pareto weights, can be used to show how identification obtains under an exclusion restriction in the collective setting, but not in the unitary context, and policy implications (in terms of welfare analysis of tax reforms and targeted benefits) are discussed. The interested reader is referred to the supplementary material available online (Chiappori and Ekeland (2009)).

7. THE GENERAL CASE (S ≥ 2)

Finally, how do these results generalize to groups of arbitrary sizes? We only indicate here the main results. First, Lemma 5 extends as follows. Consider the equation
$$\xi(\pi) = \sum_{s=1}^{S}\lambda^s(\pi)\,D_\pi W^s,$$
where $\pi \in \mathbb{R}^N$ and $\xi$ maps $\mathbb{R}^N$ into itself. Let $(\bar W^1, \dots, \bar W^S, \bar\lambda^1, \dots, \bar\lambda^S)$ be a particular solution. Under a regularity condition, for any other solution $(W^1, \dots, W^S, \lambda^1, \dots, \lambda^S)$, there exist $S$ functions $F^1, \dots, F^S$ such that
$$W^1(\pi) = F^1\!\left(\bar W^1, \dots, \bar W^S, \frac{\bar\lambda^1}{\bar\lambda^S}, \dots, \frac{\bar\lambda^{S-1}}{\bar\lambda^S}\right), \quad \dots, \quad W^S(\pi) = F^S\!\left(\bar W^1, \dots, \bar W^S, \frac{\bar\lambda^1}{\bar\lambda^S}, \dots, \frac{\bar\lambda^{S-1}}{\bar\lambda^S}\right),$$
where the $F^s$ satisfy, moreover, a set of partial differential equations. Second, identifiability under exclusivity still holds in the following sense. Assume that there exists some variable, say $\pi_1$, such that $\partial W^1/\partial\pi_1 = 0$. Then generically $W^1$ is identifiable. The generalized separability property extends in a similar manner. However, both results require at least $2S$ commodities to apply; moreover, the genericity conditions are more complex. Regarding economic implications, we conclude that while the general model (without exclusivity) is clearly not identifiable, one consumption exclusion per agent is sufficient to obtain generic identifiability in the general case, with the same caveats as above. The techniques are similar to (although more tedious than) those used above. Perhaps more important is the conclusion in the presence of one distribution factor (at least). Indeed, the argument previously described for two agents directly extends to $S \ge 2$. For any two solutions $W^s$ and $\bar W^s$, we have that
$$\frac{\partial W^s/\partial z}{\partial W^s/\partial y} = \frac{\partial x_s^1/\partial z}{\partial x_s^1/\partial y} = \frac{\partial\bar W^s/\partial z}{\partial\bar W^s/\partial y}.$$
Provided that the genericity conditions are satisfied (in particular, the $(\partial\rho^s/\partial z)/(\partial\rho^s/\partial y)$ must be pairwise different), we then conclude that the $W^s$ are
(ordinally) exactly identified, without restriction on the number of distribution factors, from just one assignable good. 8. CONCLUSION The main goal of the paper is to assess under which conditions the aggregate behavior of a group provides enough information to recover the underlying structure (i.e., preferences and the decision process) even when nothing is known (or observed) about the intrahousehold decision making mechanism beyond efficiency. We reach two main conclusions. First, the general version of the model is not identifiable: a continuum of different models generates the same household demand function. Second, one exclusion condition for each member—for each member there is a commodity this member does not consume—is sufficient to generically guarantee full identifiability of the collective indirect utility of each member, that is, of the welfare-relevant concept that summarize preferences and the decision processes. We adopt throughout the paper a “nonparametric” standpoint, in the sense that our results do not rely on specific functional form assumptions. Obviously, the introduction of a particular functional form is likely to considerably facilitate identifiability; that is, it may well be the case that, for models that are not identifiable in the nonparametric sense, all parameters of a given functional form can be exactly identifiable, even when the form is quite flexible. In the end, the results above show that not much is needed to formulate normative judgements that take into account the complex nature of collective decision processes; one exclusive commodity per agent is generically sufficient. Constant Pareto Weights Our identifiability is only generic in the sense that one can always construct examples in which it does not hold. A polar case in which identifiability does not obtain is when Pareto weights are constant; then we are in a “unitary” context, since the group behaves as a single consumer. We conclude that nonunitary models are necessary for identifiability. It is fair to argue, however, that in basically any reasonable, nonunitary model that generates Pareto-efficient allocations, Pareto weights will not be constant. Take, for instance, a simple bargaining framework, as used (among many others) by McElroy (1990) or Chiappori and Donni (2006). A key role in determining the outcome on the Pareto frontier is played by agents’ “threat points.” Various versions have been proposed for these: utilities in case of divorce, noncooperative outcomes, “separate sphere” allocations, and so forth. In all cases, the threat point is an indirect utility, representing the welfare an agent would achieve in some particular context; as such, it is price dependent.
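The non-identifiability of the unitary (constant-weight) case can also be seen mechanically: with a constant weight ratio, any perturbation of the two utilities that leaves the weighted sum unchanged generates exactly the same group behavior. The SymPy sketch below verifies this for a two-variable example; W1, W2, lam and the perturbation h are arbitrary placeholder functions, with h depending on a variable that enters both utilities, as in Section 5.3.

```python
import sympy as sp

pi1, pi2, k = sp.symbols('pi1 pi2 k')       # k is the constant Pareto-weight ratio
W1 = sp.Function('W1')(pi1, pi2)
W2 = sp.Function('W2')(pi1, pi2)
lam = sp.Function('lam')(pi1, pi2)          # the remaining multiplicative term lambda^1
h = sp.Function('h')(pi2)                   # arbitrary perturbation in a commonly entering variable

def xi(A, B):
    """Group behavior generated by the pair (A, B) when the weight ratio is the constant k."""
    return [lam * sp.diff(A + k * B, v) for v in (pi1, pi2)]

gap = [sp.simplify(a - b) for a, b in zip(xi(W1, W2), xi(W1 + k * h, W2 - h))]
print(gap)                                  # [0, 0]: (W1, W2) and (W1 + k*h, W2 - h) are
                                            # observationally equivalent, as claimed in Section 5.3
```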
The Exclusion Assumption These ideas have actually been empirically implemented in a number of existing contributions. Most of the time, the key assumption is that a person’s leisure is exclusively consumed by that person (therefore excluded from the spouse’s consumption) as in Fortin and Lacroix (1997), Blundell, Chiappori, Magnac, and Meghir (2007), Dauphin and Fortin (2001), Vermeulen (2005), Chiappori, Fortin, and Lacroix (2002), Dauphin (2003), Mazzocco (2003, 2007), and Donni (2003, 2004). This is certainly a strong assumption, for two reasons. First, household members may spend time on household production; note, however, that household production can readily be taken into account in this context, as discussed in Section 7 and in Apps and Rees (1996, 1997), and Chiappori (1988b), and empirically estimated among others by Rappoport, Sofer, and Solaz (2003), Couprie (2004), and van Klaveren, van Praag, and Maassen van den Brink (2009). Second, leisure may be partly public, in the sense that the husband may derive direct utility from his wife’s leisure and conversely. Therefore, if leisure is mainly private (in the sense that the externalities just mentioned are of second order), the approach adopted in the literature is valid. If, on the contrary, public effects are paramount and drive most couples’ behavior, the strategy under consideration may be problematic. In the end, whether a member’s leisure may be considered as exclusive is an empirical issue. Previous contributions (Chiappori (1988a, 1988b, 1992)) have derived specific predictions stemming from this assumption. It is interesting to note that (i) almost none of the empirical tests of the collective model with exclusive leisure reject these specific predictions and (ii) in particular, papers aimed at comparing the respective empirical performances of the unitary and the collective approach with exclusive leisure invariably favor the latter (Fortin and Lacroix (1997), Vermeulen (2005)). Anyhow, the results derived in the present paper show that the exclusion assumption can be relaxed in two directions. First, in the presence of distribution factors, one leisure only needs to be exclusive. This finding should lead to new empirical approaches. Second, even if both individual labor supplies are public, the model remains identifiable provided that two other commodities can be found that are each excluded from a member’s consumption (one commodity is sufficient with a distribution factor); see Fong and Zhang (2001) for a careful analysis along these lines. Indeed, exclusion restrictions have been imposed on other consumption goods in the literature. For instance, several papers, following an early contribution by Browning et al. (1994), used clothing expenditures. While assignability of clothing is not too problematic (except maybe for same sex couples), exclusion may be trickier in our context, since it requires independent price variations for the excluded goods. In practice, while different prices for male and female clothing may be available, they tend to be highly correlated. Here, the result, demonstrated in the present paper, that assignability is sufficient for
identifiability in the presence of a distribution factor provides a strong justification to these approaches. Identification Finally, what about actual identification of an identifiable model? Specifically, what type of stochastic structure should be used and to what data can it be applied? Data sets are readily available (and have been used) for models of household labor supply (see the abundant literature referred to above). The case of consumption goods is somewhat different. A standard problem of demand analysis is that while income effects are usually easy to estimate, the measurement of price effects requires price variations, which may be hard to obtain. Two strategies have been used in the literature. One relies on variations through time and regions (see, for instance, Browning and Chiappori (1998) and Browning, Chiappori, and Lewbel (2005)). Alternatively, one may use natural experiments to generate exogenous price variations. For instance, Kapan (2009) used Turkish data collected in a period of high inflation; similar households then faced widely different relative prices depending on the month during which they were surveyed. Both works reject the unitary model when applied to couples, but not for singles; moreover, none rejects the collective model for couples. Regarding the stochastic structure, on the other hand, existing works use mostly the standard approach of consumer theory, in which stochastic error terms (reflecting measurement errors, but also unobserved heterogeneity, etc.) enter demand equations additively. Identification is in general easy to achieve in this context. Some models, however, consider more sophisticated stochastic structure and discuss the related identification issues; see Blundell et al. (2005) in the case of labor supply. All in all, it is fair to conclude that much still has to be done in terms of empirical applications of the collective model. The main goal of the present paper is to provide a theoretical background for such works. One thing seems, however, clear: whenever issues related to intrahousehold allocation of commodities, welfare, or decision power are at stake, nonunitary models are needed. The collective approach provides a consistent and promising theoretical framework for that purpose. APPENDIX: PROOF OF LEMMA 5 The proof is in three steps: Step 1: Consider the 1-form ω = ξi dπi . For information about differential forms, we refer to Chiappori and Ekeland (1997, 1999, 2006) and to the references therein. Equation (9) means that ω = λ¯ 1 d W¯ 1 + λ¯ 2 d W¯ 2 . Differentiating and then taking wedge products, we get (29)
$$(29)\qquad d\omega = d\bar\lambda^1 \wedge d\bar W^1 + d\bar\lambda^2 \wedge d\bar W^2,$$
$$(30)\qquad \omega \wedge d\omega = (\bar\lambda^1\,d\bar\lambda^2 - \bar\lambda^2\,d\bar\lambda^1) \wedge d\bar W^1 \wedge d\bar W^2$$
$$(31)\qquad\phantom{\omega \wedge d\omega} = -(\bar\lambda^2)^2\,d\!\left(\frac{\bar\lambda^1}{\bar\lambda^2}\right) \wedge d\bar W^1 \wedge d\bar W^2,$$
$$(32)\qquad d\omega \wedge d\omega = 2\,d\bar\lambda^1 \wedge d\bar W^1 \wedge d\bar\lambda^2 \wedge d\bar W^2.$$
Now, following Ekeland and Nirenberg (2002), we introduce two sets of differential forms:
E1 = {α|α ∧ dω ∧ dω = 0} E2 = {α ∈ E1 |α ∧ ω ∧ dω = 0} Both E1 and E2 are linear subspaces, of dimension 4 and 3, respectively, and differential ideals. By the Frobenius theorem (see Bryant et al. (1991, Chap. 2, Theorem 1.1)), d(λ¯ 1 /λ¯ 2 ), d W¯ 1 , and d W¯ 2 form a linear basis of E2 . If (W 1 W 2 λ1 λ2 ) is another solution of (9), then ω = λ1 dW 1 + λ2 dW 2 , so that formulas (29), (30), and (32) hold with (W 1 W 2 λ1 λ2 ) replacing (W¯ 1 W¯ 2 λ¯ 1 λ¯ 2 ). So d(λ1 /λ2 ), dW 1 , and dW 2 belong to E2 , and by the Frobenius theorem they are linear combinations of d(λ¯ 1 /λ¯ 2 ), d W¯ 1 , and d W¯ 2 . It follows that there are functions F , G, and H such that (10) and (11) hold, together with
$$\frac{\lambda^1}{\lambda^2}(\pi) = H\!\left(\bar W^1(\pi), \bar W^2(\pi), \frac{\bar\lambda^1}{\bar\lambda^2}(\pi)\right).$$
Step 2: Set $\bar\theta = \bar\lambda^1/\bar\lambda^2$, $F_i = \partial F/\partial t_i$, and $G_i = \partial G/\partial t_i$. From (10), we have
$$\omega = \lambda^1\,dW^1 + \lambda^2\,dW^2 = (\lambda^1 F_1 + \lambda^2 G_1)\,d\bar W^1 + (\lambda^1 F_2 + \lambda^2 G_2)\,d\bar W^2 + (\lambda^1 F_3 + \lambda^2 G_3)\,d\bar\theta.$$
Comparing with $\omega = \bar\lambda^1\,d\bar W^1 + \bar\lambda^2\,d\bar W^2$, we get
$$\lambda^1 F_1 + \lambda^2 G_1 = \bar\lambda^1,\qquad \lambda^1 F_2 + \lambda^2 G_2 = \bar\lambda^2,\qquad \lambda^1 F_3 + \lambda^2 G_3 = 0,$$
which gives (12).
Step 3: Once $W^1$ and $W^2$ are determined, $\lambda^1$ and $\lambda^2$ follow from the decomposition $\omega = \lambda^1\,dW^1 + \lambda^2\,dW^2$, which is unique if $dW^1$ and $dW^2$ are not collinear.
REFERENCES APPS, P. F., AND R. REES (1996): “Labour Supply, Household Production and Intra-Family Welfare Distribution,” Journal of Public Economics, 60, 199–219. [794] (1997): “Collective Labour Supply and Household Production,” Journal of Political Economy, 105, 178–190. [794] ARNOLD, V. I. (1978): Mathematical Methods of Classical Mechanics. Berlin: Springer-Verlag. [776] BLUNDELL, R., P.-A. CHIAPPORI, T. MAGNAC, AND C. MEGHIR (2007): “Collective Labor Supply: Heterogeneity and Nonparticipation,” Review of Economic Studies, 74, 417–447. [764,794] BLUNDELL, R., P.-A. CHIAPPORI, AND C. MEGHIR (2005): “Collective Labor Supply With Children,” Journal of Political Economy, 113, 1277–1306. [781,790,795] BOURGUIGNON, F., M. BROWNING, AND P.-A. CHIAPPORI (2009): “The Collective Approach to Household Behaviour,” Review of Economic Studies (forthcoming). [766] BROWNING, M., AND P.-A. CHIAPPORI (1998): “Efficient Intra-Household Allocations: A General Characterization and Empirical Tests,” Econometrica, 66, 1241–1278. [769,770,795] BROWNING, M., AND V. LECHENE (2001): “Caring and Sharing: Tests Between Alternative Models of Intra-Household Allocation,” Discussion Papers 01-07, Institute of Economics, University of Copenhagen. [768] BROWNING, M., F. BOURGUIGNON, P.-A. CHIAPPORI, AND V. LECHENE (1994): “Incomes and Outcomes: A Structural Model of Intra-Household Allocation,” Journal of Political Economy, 102, 1067–1096. [787,794] BROWNING, M., P.-A. CHIAPPORI, AND A. LEWBEL (2005): “Estimating Consumption Economies of Scale, Adult Equivalence Scales, and Household Bargaining Power,” Mimeo, Boston College. [795] BRYANT, R. L., S. S. CHERN, R. B. GARDNER, H. L. GOLDSCHMIDT, AND P. A. GRIFFITHS (1991): Exterior Differential Systems. New York: Springer-Verlag. [776,796] CARTAN, E. (1945): Les systèmes Différentiels Extérieurs et Leurs Applications Géométriques. Paris: Hermann. [776] CHERCHYE, L., B. DE ROCK, AND F. VERMEULEN (2007a): “The Collective Model of Household Consumption: A Nonparametric Characterization,” Econometrica, 75, 553–574. [770] (2007b): “The Revealed Preference Approach to Collective Consumption Behavior: Testing, Recovery and Welfare Analysis,” Mimeo, Leuven University. [770] CHIAPPORI, P.-A. (1988a): “Rational Household Labor Supply,” Econometrica, 56, 63–89. [764, 765,777,786,794] (1988b): “Nash-Bargained Household Decisions: A Comment,” International Economic Review, 29, 791–796. [764,765,777,786,794] (1992): “Collective Labor Supply and Welfare,” Journal of Political Economy, 100, 437–467. [764,765,773,777,778,781,786,790,794] (1997): “Introducing Household Production in Collective Models of Labor Supply,” Journal of Political Economy, 105, 191–209. [764] (2005): “Modèle collectif et analyse de bien-être,” L’Actualité Economique/Revue D’Analyse Economique, 81, 405–419. [775] CHIAPPORI, P.-A., AND O. DONNI (2006): “Les modèles non unitaires de comportement du ménage: Un survol de la littérature,” L’Actualité Economique/Revue D’Analyse Economique, 82, 9–52. [793] CHIAPPORI, P.-A., AND I. EKELAND (1997): “A Convex Darboux Theorem,” Annali della Scuola Normale Superiore di Pisa, 4.25, 287–297. [795] (1999): “Aggregation and Market Demand: an Exterior Differential Calculus Viewpoint,” Econometrica, 67, 1435–1458. [795] (2006): “The Microeconomics of Group Behavior: General Characterization,” Journal of Economic Theory, 130, 1–26. [764,770,777,795] (2007): “The Microeconomics of Group Behavior: Identification,” Mimeo, Columbia University. [785]
(2009): “Supplement to ‘The Microeconomics of Efficient Group Behavior: Identification’: A Parametric Example,” Econometrica Supplemental Material, 77, http://www. econometricsociety.org/ecta/Supmat/5929_extensions.pdf. [792] CHIAPPORI, P.-A., AND B. SALANIÉ (2000): “Empirical Applications of Contract Theory: A Survey of Some Recent Work,” in Advances in Economics and Econometrics—Theory and Applications, Eighth World Congress, ed. by M. Dewatripont, L. Hansen, and P. Turnovsky. Econometric Society Monographs. Cambridge: Cambridge University Press, 115–149. [763] CHIAPPORI, P.-A., B. FORTIN, AND G. LACROIX (2002): “Marriage Market, Divorce Legislation and Household Labor Supply,” Journal of Political Economy, 110, 37–72. [764,766,794] COUPRIE, H. (2004): “L’influence du Contexte Institutionnel et Familial sur L’Offre de Travail des Femmes,” Ph.D. Dissertation, Universite de la Méditerranée, Aix-Marseille. [794] DAUPHIN, A. (2003): “Rationalité Collective des Ménages Comportant Plusieurs Membres: RéSultats Théoriques et Applications au Burkina Faso,” Thèse de doctorat, Université Laval. [794] DAUPHIN, A., AND B. FORTIN (2001): “A Test of Collective Rationality for Multi-Person Households,” Economic Letters, 71, 211–216. [794] DONNI, O. (2003): “Collective Household Labor Supply: Non-Participation and Income Taxation,” Journal of Public Economics, 87, 1179–1198. [794] (2004): “A Collective Model of Household Behavior With Private and Public Goods: Theory and Some Evidence From U.S. Data,” Working Paper, CIRPEE. [794] DUFLO, E. (2000): “Grandmothers and Granddaughters: Old Age Pension and Intra-Household Allocation in South Africa,” World Bank Economic Review, 17, 1–25. [766] EKELAND, I., AND L. NIRENBERG (2002): “The Convex Darboux Theorem,” Methods and Applications of Analysis, 9, 329–344. [796] FONG, Y., AND J. ZHANG (2001): “The Identifiability of Unobservable Independent and Spousal Leisure,” Journal of Political Economy, 109, 191–202. [764,787,794] FORTIN, B., AND G. LACROIX (1997): “A Test of Neoclassical and Collective Models of Household Labor Supply,” Economic Journal, 107, 933–955. [764,794] GALASSO, E. (1999): “Intrahousehold Allocation and Child Labor in Indonesia,” Mimeo, University of British Columbia. [766] KAPAN, T. (2009): “Essays on Household Bargaining and Matching,” Ph.D. Dissertation, Columbia University. [795] MAZZOCCO, M. (2003): “Individual Euler Equations Rather Than Household Euler Equations,” Manuscript, University of Wisconsin–Madison. [794] (2007): “Household Intertemporal Behavior: A Collective Characterization and a Test of Commitment,” Review of Economic Studies, 74, 857–895. [794] MCELROY, M. B. (1990): “The Empirical Content of Nash Bargained Household Behavior,” Journal of Human Resources, 25, 559–583. [766,793] OREFFICE, S. (2007): “Did the Legalization of Abortion Increase Women’s Household Bargaining Power? Evidence From Labor Supply,” Review of Economics of the Household, 5, 181–207. [766] RAPOPORT, B., C. SOFER, AND A. SOLAZ (2003): “Household Production in a Collective Model: Some New Results,” Cahiers de la MSE, Série Blanche 03039. [794] RUBALCAVA, L., AND D. THOMAS (2000): “Family Bargaining and Welfare,” Mimeo, RAND, UCLA. [766] THOMAS, D., D. CONTRERAS, AND E. FRANKENBERG (1997): “Child Health and the Distribution of Household Resources at Marriage,” Mimeo, RAND, UCLA. [766] TOWNSEND, R. (1994): “Risk and Insurance in Village India,” Econometrica, 62, 539–591. [763] VAN KLAVEREN, C., B. PRAAG, AND H. 
MAASSEN VAN DEN BRINK (2009): “Empirical Estimation Results of a Collective Household Time Allocation Model,” Review of Economics of the Household (forthcoming). [794] VERMEULEN, F. (2005): “And the Winner Is An Empirical Evaluation of Two Competing Approaches to Household Labour Supply,” Empirical Economics, 30, 711–34. [794]
Dept. of Economics, Columbia University, New York, NY 10027, U.S.A.;
[email protected] and Pacific Institute of Mathematical Sciences, University of British Columbia, 1933 West Mall, Vancouver, BC, V6T 1Z2 Canada and Dept. of Mathematics and Dept. of Economics, University of British Columbia, Vancouver, BC, V6T 1Z2 Canada;
[email protected]. Manuscript received June, 2005; final revision received August, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 801–855
VECTOR EXPECTED UTILITY AND ATTITUDES TOWARD VARIATION BY MARCIANO SINISCALCHI1 This paper proposes a model of decision under ambiguity deemed vector expected utility, or VEU. In this model, an uncertain prospect, or Savage act, is assessed according to (a) a baseline expected-utility evaluation, and (b) an adjustment that reflects the individual’s perception of ambiguity and her attitudes toward it. The adjustment is itself a function of the act’s exposure to distinct sources of ambiguity, as well as its variability. The key elements of the VEU model are a baseline probability and a collection of random variables, or adjustment factors, which represent acts exposed to distinct ambiguity sources and also reflect complementarities among ambiguous events. The adjustment to the baseline expected-utility evaluation of an act is a function of the covariance of its utility profile with each adjustment factor, which reflects exposure to the corresponding ambiguity source. A behavioral characterization of the VEU model is provided. Furthermore, an updating rule for VEU preferences is proposed and characterized. The suggested updating rule facilitates the analysis of sophisticated dynamic choice with VEU preferences. KEYWORDS: Ambiguity, attitudes toward variability, reference prior.
1. INTRODUCTION THE ISSUE OF AMBIGUITY in decision-making has received considerable attention in recent years, both from a theoretical perspective and in applications to contract theory, information economics, finance, and macroeconomics. As Ellsberg (1961) first observed, individuals may find it difficult to assign probabilities to events when available information is scarce or unreliable. In these circumstances, agents may avoid taking actions whose ultimate outcomes depend crucially upon the realization of such ambiguous events and instead opt for safer alternatives. Several decision models have been developed to accommodate these patterns of behavior: these models represent ambiguity via multiple priors (Gilboa and Schmeidler (1989), Ghirardato, Maccheroni, and Marinacci (2004)), nonadditive beliefs (Schmeidler (1989)), second-order probabilities (Klibanoff, Marinacci, and Mukerji (2005), Nau (2006), Ergin and Gul (2009)), relative entropy (Hansen and Sargent (2001), Hansen, Sargent, and Tallarini (1999)), or variational methods (Maccheroni, Marinacci, and Rustichini (2006)). This paper proposes a decision model that incorporates key insights from Ellsberg’s original analysis, as well as from cognitive psychology and recent theoretical contributions on the behavioral implications of ambiguity. According 1 This is a substantially revised version of Siniscalchi (2001). Many thanks to Stephen Morris and three anonymous referees, as well as to Eddie Dekel, Paolo Ghirardato, Faruk Gul, Lars Hansen, Peter Klibanoff, Alessandro Lizzeri, Fabio Maccheroni, Massimo Marinacci, and Josè Scheinkman. All errors are my own.
to the proposed model, the individual evaluates uncertain prospects, or acts, by a process suggestive of anchoring and adjustment (Tversky and Kahneman (1974)). The anchor is the expected utility of the prospect under consideration, computed with respect to a baseline probability; the adjustment depends upon its exposure to distinct sources of ambiguity, as well as its variation away from the anchor at states that the individual deems ambiguous. Formally, an act $f$, mapping each state $\omega \in \Omega$ to a consequence $x \in X$, is evaluated via the functional
$$(1)\qquad V(f) = E_p[u \circ f] + A\bigl((E_p[\zeta_i \cdot u \circ f])_{0 \le i < n}\bigr).$$
reported by Ellsberg, the modal preferences are $f_R \succ f_B$ and $f_{RG} \prec f_{BG}$. Epstein and Zhang suggested that "[t]he intuition for this reversal is the complementarity between G and B—there is imprecision regarding the likelihood of B, whereas {B, G} has precise probability 2/3" (Epstein and Zhang (2001, p. 271)). The proposed model enables a representation of the modal preferences in this example that closely matches this interpretation: let $p$ be uniform on the state space $\Omega = \{R, G, B\}$, assume without loss of generality (w.l.o.g.) that $u$ is linear, and let $\zeta_0$ be the random variable given by $\zeta_0(R) = 0$
ζ0 (B) = 1
ζ0 (G) = −1
Finally, let $A(\varphi) = -|\varphi|$ for every $\varphi \in \mathbb{R}$. Thus, in this example, $n = 1$: one-dimensional adjustment factors suffice. The interpretation of the adjustment factor $\zeta_0$ is as follows: since $A(p(\{G\})\zeta_0(G)) = A(p(\{B\})\zeta_0(B))$, G and B are "equally ambiguous"; however, $\zeta_0(G) = -\zeta_0(B)$, that is, their ambiguities "cancel out." This algebraic cancellation corresponds to Epstein and Zhang's notion of complementarity. It is then easily verified that $V(f_R) = \frac{10}{3}$, $V(f_B) = 0$, $V(f_{RG}) = \frac{10}{3}$, and $V(f_{BG}) = \frac{20}{3}$, which is consistent with the preferences indicated above.2

Adjustment Factors $\zeta_i$ and Sources of Ambiguity: Each factor $\zeta_i$ encodes a particular pattern of complementarity and thus reflects a specific aspect of ambiguity. Different considerations lead to a similar intuition. Since $E_p[\zeta_i] = 0$ for all $i$, Eq. (1) can be rewritten in the form
$$(2)\qquad V(f) = E_p[u \circ f] + A\bigl((\mathrm{Cov}_p(\zeta_i, u \circ f))_{0 \le i < n}\bigr).$$
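The values reported for the three-color urn are easy to reproduce numerically. The short script below does so; the only assumption not stated explicitly above is that each bet pays 10 on the named color(s) and 0 otherwise, which is what the reported values $10/3$, $0$, $10/3$, $20/3$ presuppose.

```python
import numpy as np

p = np.array([1/3, 1/3, 1/3])            # uniform baseline prior on states (R, G, B)
zeta0 = np.array([0.0, -1.0, 1.0])       # zeta0(R) = 0, zeta0(G) = -1, zeta0(B) = 1
A = lambda phi: -abs(phi)                # one-dimensional adjustment function

def V(payoff):
    """VEU evaluation with linear utility: baseline expected value plus adjustment."""
    return p @ payoff + A(p @ (zeta0 * payoff))

bets = {
    "f_R":  np.array([10.0, 0.0, 0.0]),   # bet on red (assumed prize of 10)
    "f_B":  np.array([0.0, 0.0, 10.0]),
    "f_RG": np.array([10.0, 10.0, 0.0]),
    "f_BG": np.array([0.0, 10.0, 10.0]),
}
for name, f in bets.items():
    print(name, round(V(f), 4))           # 3.3333, 0.0, 3.3333, 6.6667: f_R > f_B and f_RG < f_BG
```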
acts, and the expectations Ep [ζi · u ◦ f ] = Covp [ζi u ◦ f ] are the Fourier coefficients of the projection of u ◦ f onto this subspace. Adjustments and Variability: As noted above, adjustments to the baseline expected utility (EU) evaluation of an act are also related to the variability, or dispersion, of its utility profile. This can be attractive, as many economic applications of ambiguity-sensitive decision models show that interesting patterns of behavior can arise when agents wish to reduce outcome or utility variability.5 Indeed, Schmeidler (1989, p. 582) suggested that “ambiguity aversion” can be defined as a preference for “smoothing or averaging utility distributions”; see also Chateauneuf and Tallon (2002). The VEU representation relates adjustments to utility variability via two complementary channels. One is immediate from Eq. (2): the covariance of ζi and u ◦ f clearly depends upon the standard deviation of u ◦ f with respect to the baseline prior p. The second channel deserves further discussion. Call two acts f and f¯ complementary if their utility profiles u ◦ f and u ◦ f¯ satisfy u ◦ f¯ = c − u ◦ f for some real constant c: Definition 3 provides a simple behavioral characterization. Notice that the utility profiles of f and f¯ have the same standard deviation; indeed, virtually all classical measures of variability or dispersion for random variables6 consider u ◦ f and u ◦ f¯ = c − u ◦ f to be just as dispersed, because such measures are invariant to translation and sign changes. To relate adjustments to utility variability, the VEU representation incorporates the same invariance property: complementary acts receive the same adjustment. This follows from the symmetry property of the adjustment functional A: for every vector φ, A(φ) = A(−φ).7 Behaviorally, this property corresponds to the main novel axiom in this paper, complementary independence. Behavioral Identification of the Baseline Prior p: One additional consequence of this property, and indeed of the Complementary Independence axiom, deserves special emphasis. Symmetry implies that adjustment terms cancel out when comparing two complementary acts using the VEU representation in Eq. (1); thus, the ranking of complementary acts is effectively determined by their baseline EU evaluation. Conversely, preferences over complementary acts uniquely identify the baseline prior: there is a unique probability p and a cardinally unique utility function u such that, for all complementary acts f and f¯, f f¯ if and only if (iff) Ep [u ◦ f ] ≥ Ep [u ◦ f¯]. Thus, baseline priors have a simple behavioral interpretation in the present setting: they provide a representation of the individual’s preferences over complementary acts. This 5 See, for example, Bose, Ozdenoren, and Pape (2006), Epstein and Schneider (2007), Ghirardato and Katz (2006), or Mukerji (1998). 6 For instance, the mean absolute deviation, the range and (for continuous random variables) the interquantile range, Gini’s mean difference (cf., e.g., Yitzhaki (1982)), or peakedness ordering Bickel and Lehmann (1976). 7 Notice that if f and f¯ are complementary, then Cov(ζi u ◦ f¯) = − Cov(ζi u ◦ f ) for all ζi .
implies that, under complementary independence, the baseline prior is behaviorally identified independently of other elements of the VEU representation. Flexibility and Dynamics: Finally, the functional representation in Eq. (1) is flexible enough to accommodate a broad range of attitudes toward ambiguity, while at the same time allowing for numerical and analytical tractability. The preferences in the three-color-urn example display ambiguity aversion as defined by Schmeidler (1989); correspondingly, the adjustment function A is nonpositive and concave. VEU preferences featuring a nonpositive and concave adjustment function A are variational (Maccheroni, Marinacci, and Rustichini (2006), Corollary 2 and Section 5.1), but VEU preferences allow for considerably more general ambiguity attitudes. For instance, as shown in Section 4.3, a nonpositive, but not necessarily concave adjustment function characterizes “comparative ambiguity aversion” in the sense of Ghirardato and Marinacci (2002); a parsimonious VEU representation with this property can, for instance, accommodate the interesting preference patterns highlighted by Machina (2009) (such patterns are inconsistent with decision models such as maxmin expected utility, variational preferences or smooth-ambiguity-averse preferences (cf. Baillon, L’Haridon, and Placido (2008)). Indeed, the VEU model can accommodate even more complex attitudes towards ambiguity— for instance, stake-dependent attitudes: the previous version of this paper (Siniscalchi (2007)) provides an example. This paper also proposes a possible updating rule for VEU preferences and provides a behavioral characterization. In the covariance formulation of the VEU model in Eq. (2), the proposed rule amounts to replacing expectations and covariances Ep and Covp with their conditional counterparts Ep [·|E] and Covp (· ·|E).8 Section 4.4 provides a behavioral characterization of this updating rule; it also illustrates how this rule enables a recursive analysis of sophisticated choice in dynamic problems. The paper is organized as follows. Section 2 is devoted to preliminaries. Section 3 presents the main characterization result. Section 4 analyzes the components of the VEU representation (Sections 4.1–4.3), and discusses updating and dynamic choice (Sections 4.4 and 4.5). Section 5 discusses the related literature (Section 5.1), as well as additional features and extensions of the VEU representation (Section 5.2). All proofs, as well as additional technical results, are given in the Appendix. Supplemental material is also available online (Siniscalchi (2009)). 2. NOTATION AND DEFINITIONS The following notation is standard. Consider a set Ω (the state space) and a sigma-algebra Σ of subsets of Ω (events). It will be useful to assume that the sigma-algebra Σ is countably generated: that is, there is a countable collection 8
A slight modification is required to ensure monotonicity; see Section 4.4 for details.
S = (Si )i≥0 such that Σ is the smallest sigma-algebra containing S . All finite and countably infinite sets, as well as all Borel subsets of Euclidean n-space, and more generally all standard Borel spaces (Kechris (1995)) satisfy this assumption. Denote by B0 (Σ) the set of Σ-measurable real functions with finite range and by B(Σ) its sup-norm closure. The set of countably additive probability measures on Σ is denoted by ca1 (Σ).For any probability measure π ∈ ca1 (Σ) and function a ∈ B(Σ), let Eπ [a] = Ω a dπ, the standard Lebesgue integral of a with respect to π. Finally, a ◦ b : X → Z denotes the composition of the functions b : X → Y and a : Y → Z . Additional notation is useful to streamline the definition and analysis of the VEU representation. Given m ∈ Z+ ∪ {∞} and a finite or countably infinite collection z = (zi )0≤i<m of elements of B(Σ), let Eπ [z · a] = (Eπ [zi · a])0≤i<m if m > 0 and let Eπ [z · a] = 0 if m = 0. For any collection F ⊂ B(Σ), let E (F; π z) = {Eπ [z · a] ∈ Rm : a ∈ F}. Finally, let 0m denote the zero vector in Rm . Turn now to the decision setting. Consider a convex set X of consequences (outcomes, prizes). As in Anscombe and Aumann (1963), X could be the set of finite-support lotteries over some underlying collection of (deterministic) prizes, endowed with the usual mixture operation. Alternatively, the set X might be endowed with a subjective mixture operation, as in CasadesusMasanell, Klibanoff, and Ozdenoren (2000) or Ghirardato, Maccheroni, Marinacci, and Siniscalchi (2003). An act is a Σ-measurable function from Ω to X. Let F0 be the set of simple acts, that is, acts with finite range. With the usual abuse of notation, denote by x the constant act assigning the consequence x ∈ X to each ω ∈ Ω. The main object of interest is a preference relation on F0 ; its symmetric and asymmetric parts are denoted ∼ and , respectively. As is the case for other decision models, VEU preferences on F0 have a unique extension to a class of nonsimple, bounded acts. This extension is of particular interest in this paper: Proposition 1 uses it to characterize the minimum number of adjustment factors required to provide a VEU representation of a given preference relation. Thus, following Schmeidler (1989), denote by Fb the set of acts f for which there exist x x ∈ X such that x f (ω) x for all ω ∈ Ω. Finally, given a function u : X → R and a set F of acts, let u ◦ F = {u ◦ f ∈ B(Σ) : f ∈ F }. The formal definition of the VEU representation can now be provided. For the reasons just mentioned, the definition accommodates preferences on either F0 or Fb . DEFINITION 1: Let F denote either F0 or Fb . A tuple (u p n ζ A) is a VEU representation of a preference relation on F if the following conditions are met:
1. u : X → R is nonconstant and affine, p ∈ ca1 (Σ), n ∈ Z+ ∪ {∞}, and ζ = (ζi )0≤i
$$f \succsim g \quad\Longleftrightarrow\quad E_p[u \circ f] + A\bigl(E_p[\zeta \cdot u \circ f]\bigr) \ge E_p[u \circ g] + A\bigl(E_p[\zeta \cdot u \circ g]\bigr).$$
Conditions 1 and 5 are self-explanatory. Condition 2 ensures that the adjustment factors ζi are bounded and reflect the fact that constant acts are not subject to ambiguity. The general representation allows for at most countably infinitely many adjustment factors; moreover, by Theorem 1, if the state space Ω is finite, then a finite n suffices. In addition to the normalization A(0n ) = 0, condition 3 formalizes the central symmetry assumption discussed in the Introduction (cf. in particular footnote 7). Condition 4 ensures monotonicity of the VEU representation. Simple examples show that monotonicity necessarily involves a joint restriction on p, ζ, and A.9 In many cases of interest, easy-to-check necessary and sufficient conditions can be provided: see Appendix A for details. The functional A can be extended to all of Rn consistently with the symmetry requirement of condition 3; for instance, let A(φ) = 0 for all φ ∈ Rn \ E (u ◦ F ; p ζ).10 The values assumed by A at such points are obviously irrelevant to the representation of preferences. Restricting the domain of A to E (u ◦ F ; p ζ) as in Definition 1 simplifies the statement of some results. It is useful to point out that the functional A, and hence the entire VEU representation, is not required to be positively homogeneous. This makes it possible to accommodate, for instance, members of the “variational preferences” family studied by Maccheroni, Marinacci, and Rustichini (2006) that satisfy the key symmetry requirement of this paper; furthermore, it enables differentiable specifications of the adjustment functional A, which would otherwise be precluded. Observation: Equivalent Formulations: One can view the collection ζ = (ζi )0≤i
$10/3 = E_p[\zeta_0 \cdot f_B]$. Taking $A(\varphi) = |\varphi|$ instead shows that no general assumption may be made regarding the direction of monotonicity for $A$ alone.
10 Note that $a \in u \circ F$ implies $[\inf_\Omega a + \sup_\Omega a] - a \in u \circ F$, so $\varphi \in \mathcal{E}(u \circ F; p, \zeta)$ implies $-\varphi \in \mathcal{E}(u \circ F; p, \zeta)$.
(Ep [ζi ·u◦f ])0≤i
λg + (1 − λ)x ∼ λg + (1 − λ)f
That is, a crisp act behaves like its certainty equivalent: in particular, as discussed in Ghirardato, Maccheroni, and Marinacci (2004), it does not provide a hedge against the ambiguity that influences any other act g.13 Constant acts are obviously crisp; correspondingly, any VEU representation of the preference assigns them the zero adjustment vector. Since crisp acts behave like constant acts, it seems desirable to ensure that their associated adjustment vector also be zero. DEFINITION 2: Let F denote either F0 or Fb . A VEU representation (u p n ζ A) of a preference relation on F is sharp if (ζi )0≤i
It is sometimes convenient to employ VEU representations that are not sharp: see, for instance, the analysis of updating in Section 4.4. However, adjustment factors in a sharp representation can be interpreted as independent sources of ambiguity; see Section 4 for details.

3. AXIOMATIC CHARACTERIZATION OF VEU PREFERENCES

Mixtures of acts are taken pointwise: for every pair of acts $f, g$ and any $\alpha \in [0, 1]$, $\alpha f + (1 - \alpha)g$ is the act assigning the consequence $\alpha f(\omega) + (1 - \alpha)g(\omega)$ to each state $\omega \in \Omega$. As in the preceding section, let $F$ denote either $F_0$ or $F_b$. Axioms 1–4 are standard:

AXIOM 1—Weak Order: $\succsim$ is transitive and complete on $F$.

AXIOM 2—Monotonicity: For all acts $f, g \in F$, $f(\omega) \succsim g(\omega)$ for all $\omega \in \Omega$ implies $f \succsim g$.

AXIOM 3—Continuity: For all acts $f, g, h \in F$, the sets $\{\alpha \in [0, 1] : \alpha f + (1 - \alpha)g \succsim h\}$ and $\{\alpha \in [0, 1] : h \succsim \alpha f + (1 - \alpha)g\}$ are closed.

AXIOM 4—Nondegeneracy: Not for all $f, g \in F$, $f \succsim g$.

Next, a weak form of the Anscombe and Aumann (1963) independence axiom, owing to Maccheroni, Marinacci, and Rustichini (2006), is assumed.

AXIOM 5—Weak Certainty Independence: For all acts $f, g \in F$, $x, y \in X$, and $\alpha \in (0, 1)$, $\alpha f + (1 - \alpha)x \succsim \alpha g + (1 - \alpha)x$ implies $\alpha f + (1 - \alpha)y \succsim \alpha g + (1 - \alpha)y$.

Loosely speaking, preferences are required to be invariant to translations of utility profiles, but not to rescaling (note that the same weight $\alpha$ is employed when mixing with $x$ and with $y$). As discussed in Maccheroni, Marinacci, and Rustichini (2006), this axiom weakens Gilboa and Schmeidler's (1989) certainty independence, which requires invariance to both translation and rescaling. Since certainty independence will be referenced below, it is reproduced here, even though it is not assumed in Theorem 1.

AXIOM 5∗—Certainty Independence: For all acts $f, g \in F$, $x \in X$, and $\alpha \in (0, 1)$, $f \succsim g$ implies $\alpha f + (1 - \alpha)x \succsim \alpha g + (1 - \alpha)x$.

To ensure that the baseline prior is countably additive, adopt the following axiom, which is in the spirit of Arrow (1974).14 A similar representation could
be obtained without it, but it would not be possible to restrict attention to finite or countably infinite collections of adjustment factors. To state the axiom, for every pair $x, y \in X$ and $E \in \Sigma$, denote by $xEy$ the act that yields $x$ at every state $\omega \in E$ and $y$ elsewhere.

AXIOM 6—Monotone Continuity: For all sequences $(A_k)_{k \ge 1} \subset \Sigma$ such that $A_k \supset A_{k+1}$ and $\bigcap_k A_k = \emptyset$, and for all $x, y, z \in X$ such that $x \succ y \succ z$, there is $k \ge 1$ such that $zA_k x \succ y \succ xA_k z$.

To state the novel axioms in this paper, a preliminary definition is required. Intuitively, it identifies pairs of acts whose utility profiles are "mirror images."

DEFINITION 3: Two acts $f, \bar f \in F$ are complementary if and only if, for any two states $\omega, \omega' \in \Omega$,
$$\tfrac{1}{2}f(\omega) + \tfrac{1}{2}\bar f(\omega) \sim \tfrac{1}{2}f(\omega') + \tfrac{1}{2}\bar f(\omega').$$
If two acts $f, \bar f \in F$ are complementary, then $(f, \bar f)$ is referred to as a complementary pair.

If preferences over $X$ can be represented by a von Neumann–Morgenstern utility function $u(\cdot)$, which is the case under Axioms 1–5, then the utility profiles of the acts $f$ and $\bar f$, denoted $u \circ f$ and $u \circ \bar f$, respectively, satisfy $u \circ \bar f = k - u \circ f$ for some constant $k \in \mathbb{R}$. Thus, complementarity is the preference counterpart of algebraic negation. Notice that if $(f, \bar f)$ and $(g, \bar g)$ are complementary pairs of acts, then, for any weight $\alpha \in [0, 1]$, the mixtures $\alpha f + (1 - \alpha)g$ and $\alpha\bar f + (1 - \alpha)\bar g$ are themselves complementary. The complementary independence axiom may now be formulated.

AXIOM 7—Complementary Independence: For any two complementary pairs $(f, \bar f)$ and $(g, \bar g)$ in $F$, and all $\alpha \in [0, 1]$: $f \succsim \bar f$ and $g \succsim \bar g$ imply $\alpha f + (1 - \alpha)g \succsim \alpha\bar f + (1 - \alpha)\bar g$.

Axiom 7 formalizes the behavioral implications of the key cognitive assumption underlying VEU preferences: the decision-maker's assessment of an act takes into account (i) a baseline evaluation, consistent with EU, as well as (ii) its utility variability around this baseline.15 To elaborate, for EU preferences, the property "$f \succsim \bar f$ and $g \succsim \bar g$ imply that $\alpha f + (1 - \alpha)g \succsim \alpha\bar f + (1 - \alpha)\bar g$" holds regardless of whether or not $f, \bar f$ and $g, \bar g$ are pairwise complementary;
Equivalently, its outcome variability, but taking preferences over prizes into account.
indeed, under Axioms 1–4, this property is equivalent to the standard independence axiom and characterizes EU preferences. Next, recall that complementary acts are mirror images of each other; hence, as noted in the Introduction, virtually all classical measures of dispersion attribute to them the same utility variability. Under the cognitive assumptions considered here, this implies that complementary acts are effectively ranked according to their baseline evaluation, which is assumed to be consistent with EU. In Axiom 7, this applies to the ranking of $f$ vs. $\bar f$, $g$ vs. $\bar g$, and, because complementarity is preserved by mixtures, $\alpha f + (1 - \alpha)g$ vs. $\alpha\bar f + (1 - \alpha)\bar g$. These rankings must be consistent with EU, which leads to the requirement in Axiom 7.

A final assumption is needed:

AXIOM 8—Complementary Translation Invariance: For all complementary pairs $(f, \bar f)$ in $F$ and all $x, \bar x \in X$ with $f \sim x$ and $\bar f \sim \bar x$, $\tfrac{1}{2}f + \tfrac{1}{2}\bar x \sim \tfrac{1}{2}\bar f + \tfrac{1}{2}x$.

Axiom 8 ensures that complementary acts are subject to the same adjustment to their respective baseline evaluations. Observe first that, since $f$ and $\bar f$ in Axiom 8 are complementary, so are the mixtures $\tfrac{1}{2}f + \tfrac{1}{2}\bar x$ and $\tfrac{1}{2}\bar f + \tfrac{1}{2}x$; hence, these acts are evaluated according to their baseline EU evaluation. Consequently, the indifference between these mixtures has a trade-off interpretation: the difference between the baseline EU evaluation of $f$ and $\bar f$ is equal to the utility difference between $x$ and $\bar x$. Since $f \sim x$ and $\bar f \sim \bar x$, it also equals the difference between the overall VEU evaluations of $f$ and $\bar f$. Hence, $f$ and $\bar f$ are subject to the same adjustment.

Complementary translation invariance is much less central to the characterization of VEU preferences than complementary independence (Axiom 7). Indeed, Axiom 8 is actually redundant in two important cases. First, Axiom 8 is implied by Axioms 1–5 and 7 if the utility function representing preferences over $X$ is unbounded either above or below,16 as is the case for the majority of monetary utility functions employed in applications. Second, regardless of the utility function, if preferences satisfy Axioms 1–4 and 5∗ (instead of Axiom 5), then it is trivial to verify that the indifference required by Axiom 8 holds regardless of whether or not $f$ and $\bar f$ are complementary; in other words, Axiom 8 is automatically satisfied by all "invariant-biseparable" preferences (Ghirardato, Maccheroni, and Marinacci (2004)).17 Thus, Axiom 8 is only required to allow for preferences that simultaneously violate Axiom 5∗ and are represented by a bounded utility function on $X$. The main result of this paper can now be stated.

16 A proof is available upon request. Well known axioms ensure that utility is unbounded; see, for example, Maccheroni, Marinacci, and Rustichini (2006).
17 This class includes, for instance, all multiple-priors, α-maxmin, and Choquet expected utility preferences.
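A quick numerical illustration of what Axioms 7 and 8 require, using the VEU specification from the Introduction's three-color example (the two utility profiles below are arbitrary, chosen only for illustration): complementary acts receive identical adjustments, so their ranking, and the ranking of their mixtures, is driven by baseline expected utility alone.

```python
import numpy as np

p = np.array([1/3, 1/3, 1/3])                  # uniform baseline prior on (R, G, B)
zeta0 = np.array([0.0, -1.0, 1.0])
A = lambda phi: -abs(phi)
adjust = lambda a: A(p @ (zeta0 * a))          # adjustment of a utility profile a
V = lambda a: p @ a + adjust(a)                # VEU evaluation

a = np.array([7.0, 2.0, 5.0])                  # utility profile of some act f (illustrative)
a_bar = 9.0 - a                                # complementary act: u(f_bar) = c - u(f), here c = 9

print(adjust(a), adjust(a_bar))                # equal: E_p[zeta0] = 0 and A is symmetric
print(V(a) - V(a_bar), p @ a - p @ a_bar)      # equal: complementary acts are ranked by
                                               # their baseline expected utilities alone
```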
THEOREM 1: Consider a preference relation on F0 . The following statements are equivalent: 1. The preference relation satisfies Axioms 1–8 on L = F0 . 2. admits a sharp VEU representation (u p n ζ A). 3. admits a VEU representation (u p n ζ A). In statement 2, if (u p n ζ A ) is another VEU representation of , then
$p' = p$, $u' = \alpha u + \beta$ for some $\alpha, \beta \in \mathbb{R}$ with $\alpha > 0$, and there is a linear surjection $T : \mathcal{E}(u' \circ F_0; p, \zeta') \to \mathcal{E}(u \circ F_0; p, \zeta)$ such that
$$(5)\qquad T\bigl(E_p[\zeta' \cdot a']\bigr) = \frac{1}{\alpha}E_p[\zeta \cdot a'], \qquad A'\bigl(E_p[\zeta' \cdot a']\bigr) = \alpha A\bigl(T(E_p[\zeta' \cdot a'])\bigr) \qquad \forall a' \in u' \circ F_0.$$
If (p u n ζ A ) is also sharp, then T is a bijection. Finally, if Ω is finite, then n ≤ |Ω| − 1. COROLLARY 1: If a preference relation on F0 satisfies satisfies Axioms 1–8, then it has a unique extension to Fb that satisfies the same axioms and admits a sharp VEU representation on Fb . The primary message of Theorem 1 is the equivalence of statements 1 and 2: Axioms 1–8 are equivalent to the existence of a sharp VEU representation. However, as noted in Section 2, it is sometimes convenient to employ VEU representations that are not sharp. Theorem 1 ensures that the resulting preferences will still satisfy Axioms 1–8. To put it differently, if a preference admits a VEU representation, then it also admits a sharp VEU representation. The second part of Theorem 1 indicates the uniqueness properties of the VEU representation. The baseline probability measure p is unique, and the adjustment factors ζ and function A are unique up to transformations that preserve both the affine structure of the set E (u ◦ F0 ; p ζ) of adjustment vectors and the actual adjustment associated with each element in that set. To elaborate, recall that the role of the adjustment factors ζ is to capture the patterns of “complementarity” among different events; for instance, if ambiguity about two events E and F cancels out, then Ep [ζ · 1E∪F ] = 0. For another tuple of random variables ζ to capture the same complementarities as ζ, it must be the case that also Ep [ζ · 1E∪F ] = 0. Similarly, complementarities among adjustment vectors associated with different acts must be preserved. The existence of a functional T with the properties listed in Theorem 1 ensures this. As Example 1 illustrates, this imposes considerable restrictions on transformations of a given adjustment that can be deemed inessential. EXAMPLE 1: Refer to the ambiguity-averse VEU preferences described in the Introduction in the context of the Ellsberg paradox. Note that E (u ◦ F0 ; p ζ) is the entire real line.
813 Now consider a two-element tuple ζ = (ζ0 ζ1 ) and let A (ϕ) = − ϕ21 + ϕ22 for all ϕ ∈ E (u ◦ F0 ; p ζ ). Suppose T is as in Theorem 1. Then A = A ◦ T implies that, in particular, A ( 13 ζ (R)) = A(T ( 13 ζ (R))) = A( 13 ζ(R)) = 0, so ζ (R) = 0 ∈ R2 . Similarly, T ( 13 ζ (B) + 13 ζ (G)) = 13 ζ(B) + 13 ζ(G) = 0, so A = A ◦ T implies A ( 13 ζ (B) + 13 ζ (G)) = 0, and so ζ (B) = −ζ (G). Finally, A ( 13 ζ (B)) = 13 = A ( 13 ζ (G)). In other words, ζ encodes exactly the same information about B and G as ζ: the two events are equally ambiguous, but their ambiguities cancel out. Of course, ζ does so in a more parsimonious way. Thus, intuitively, ambiguity in the Ellsberg paradox is really “one dimensional,” regardless of the particular vector representation one chooses. The analysis in Section 4.1 expands upon this observation. VEU AND ATTITUDES TOWARD VARIATION
4. ANALYSIS OF THE REPRESENTATION AND ADDITIONAL RESULTS 4.1. Heuristic Construction of the Representation The VEU representation is constructed in three key steps. First, a preliminary numerical representation is obtained invoking results from Maccheroni, Marinacci, and Rustichini (2006); see item 6 in Proposition 6. Second, the baseline prior p is identified: Lemma 1 (cf. also Observation 1) implies that if Axioms 7 and 8 hold,18 there exists a unique probability p such that, for every complementary pair (f f¯), f f¯ iff Ep [u ◦ f ] ≥ Ep [u ◦ f¯], as was claimed in the Introduction. By Axiom 6, p is countably additive (Lemma 5). The third key step is the construction of the adjustment factors ζi and the function A. To provide some intuition, it is useful to focus once again on the three-color-urn problem of the Introduction and Example 1. Recall that the prior p on the state space Ω = {R G B} is assumed to be uniform. Figure 1 depicts the set F0 of acts in the problem under consideration; assuming linear utility for simplicity, this is identified with Euclidean space R3 . The upward-sloping plane in the picture corresponds to the set of crisp acts; in this example, ambiguity concerns the relative likelihood of G vs. B, so intuitively an act h is crisp if and only if h(G) = h(B). Denote this set by C and denote by NC the orthogonal complement of C relative to the inner product defined by the baseline prior p: that is, g ∈ NC if and only if Ep [g · h] = 0 for all h ∈ C. In Figure 1, this set corresponds to the line perpendicular to C and going through the origin.19 By definition, elements of NC are uncorrelated with any crisp act, and thus may be thought of as “purely ambiguous”; the acronym NC stands for the more neutral term noncrisp. In this example, both C and NC 18 As noted above, Axiom 8 need not be imposed explicitly in most cases of interest for applications. 19 Since p is uniform, in this example the elements of NC are also orthogonal to C in the usual Euclidean sense.
814
M. SINISCALCHI
FIGURE 1.—Crisp and noncrisp acts in the Ellsberg paradox.
are easily seen to be closed subsets of R3 . For the general case, see Lemma 6 in the Appendix. It is now possible to define the collection ζ = (ζi )0≤i
VEU AND ATTITUDES TOWARD VARIATION
815
ticular, g = Ep [ζ0 · f ] · ζ0 . In the general case, the expectations Ep [ζi · u ◦ f ], viewed as inner products, are the Fourier coefficients of f relative to the orthonormal basis ζ = (ζi )0≤i n, every tuple f1 fm ∈ Fb admits a crisp combination. If, additionally, (u p n ζ A) is sharp, then 2. for every finite m ≤ n, there is a tuple f1 fm ∈ Fb that admits no crisp combination; 3. for every other VEU representation (u p n ζ A ) of the extension of to Fb , n ≥ n. 4. n = 1 if and only if is not consistent with EU and, for all f g g¯ ∈ Fb such that g g¯ are complementary and not constant, and all α ∈ [0 1], either αf + (1 − α)g or αf + (1 − α)g¯ is crisp. This result complements the analysis in the preceding subsection, and reinforces the interpretation of the number n as reflecting the multiplicity and complexity of the “sources of ambiguity” in a given decision situation. Part 1 of Proposition 1 states that, given any collection of more than n acts, it is possible to construct a crisp combination, that is, a perfect hedge against ambiguity. Intuitively, this means that there cannot be more than n distinct sources or forms of ambiguity; for instance, in the three-color-urn example, given any two noncrisp acts, it is always possible to construct a combination act that delivers the same outcome in states G and B, and is therefore not subject to ambiguity. Conversely, part 2 of the Proposition 1 asserts the existence of a tuple of up to n acts that cannot be combined in any way to construct a perfect hedge. Intuitively, this suggests that each act in such a tuple is subject to a different source
816
M. SINISCALCHI
or form of ambiguity. It is also instructive to note that the tuple f1 fm in the statement is constructed by rescaling the adjustment factors (ζi )0≤i
VEU AND ATTITUDES TOWARD VARIATION
817
In the VEU representation, it also seems plausible to associate nonpositive, but not necessarily concave, adjustment functions with a (different) form of ambiguity aversion. This property turns out to be characterized by weaker forms of Axiom 9 for VEU preferences. AXIOM 10—Complementary Ambiguity Aversion: For all complementary ¯ 12 f + 12 f¯ 12 x + 12 x. ¯ pairs (f f¯) and prizes x x¯ ∈ X such that f ∼ x and f¯ ∼ x, AXIOM 11—Simple Diversification: For all complementary pairs (f f¯) with f ∼ f¯, 12 f + 12 f¯ f . Both axioms have the standard hedging interpretation, but are restricted to complementary acts. Axiom 11 is related to the “diversification” property of Chateauneuf and Tallon (2002). Finally, Ghirardato and Marinacci (2002) proposed a way to compare ambiguity attitudes across decision-makers, mirroring analogous definitions for risk attitudes. This leads to a “comparative” notion of ambiguity aversion. For VEU preferences, this notion, too, characterizes a negative adjustment function. The details are as follows. DEFINITION 4: Given two preference relations 1 and 2 on F0 , 1 is more ambiguity-averse than 2 iff, for all f ∈ F0 and x ∈ X, f 1 x ⇒ f 2 x. Also, 1 is comparatively ambiguity-averse if it is more ambiguity-averse than a preference relation 2 that is consistent with EU. PROPOSITION 2: Let be a preference relation with VEU representation (u p n ζ A). Then the following statements are equivalent: 1. is comparatively ambiguity-averse. 2. satisfies Axiom 10. 3. For all ϕ ∈ E (u ◦ F0 ; p ζ), A(ϕ) ≤ 0. If u(X) is unbounded above or below, or if satisfies Axiom 5∗ , then statements 1–3 are equivalent to the following statement: 4. satisfies Axiom 11. A VEU preference that satisfies the equivalent conditions 1–4 is not necessarily variational or, a fortiori, consistent with maxmin EU. (For completeness, such a VEU preference is also not ambiguity-loving in the sense of Schmeidler, except in the trivial case, i.e., if it is ambiguity-neutral.) The following example shows that this additional flexibility can be advantageous. EXAMPLE 2: Machina (2009) considered the following situation. Let Ω = {ω1 ω4 } and assume that {ω1 ω2 } and {ω3 ω4 } are known to be equally likely (and not ambiguous); the relative likelihood of ω1 vs. ω2 and of ω3 vs.
818
M. SINISCALCHI TABLE I MACHINA’S REFLECTION EXAMPLE: REASONABLE PREFERENCES f1 ≺ f2 AND f3 f4
f1 f2 f3 f4
ω1
ω2
ω3
ω4
$4000 $4000 $0 $0
$8000 $4000 $8000 $4000
$4000 $8000 $4000 $8000
$0 $0 $4000 $4000
ω4 , is not known. Assume further that X = R and u is linear (this is inconsequential for the example). Consider the monetary bets (acts) in Table I. Notice that f1 and f4 only differ by a “reflection,” that is, by exchanging prizes on states that are informationally symmetric. The same is true of f2 and f3 . Hence, it is plausible to expect that f1 ∼ f4 and f2 ∼ f3 . In particular, Machina (2009) conjectured, and L’Haridon and Placido (2009) verified experimentally, that a plausible pattern of “ambiguity-averse” preferences is f1 ≺ f2 and f3 f4 . Machina showed that this pattern is inconsistent with Choquet EU if informational symmetries are respected. Baillon, L’Haridon, and Placido (2008) showed that the same is true for maxmin EU and variational preferences. Recall that the latter two preference models satisfy Schmeidler’s notion of ambiguity aversion.22 However, it is possible to rationalize this pattern with VEU preferences that satisfy comparative ambiguity aversion and respect informational symmetries. Let p be uniform and define two adjustment factors by ζ0 (ω1 ) = 1 = −ζ0 (ω2 ), ζ1 (ω3 ) = 1 = −ζ1 (ω4 ), and ζ0 (ω3 ) = ζ0 (ω4 ) = ζ1 (ω1 ) = ζ1 (ω2 ) = 0. 2 Finally, consider1 the adjustment function A : R → R given by A(φ0 φ1 ) = 1 − 2 1 + |φ0 | − 2 1 + |φ1 | + 1. Monotonicity may be verified by applying Remark 2 (Appendix); straightforward calculations show that the pattern f1 ≺ f2 and f3 f4 is obtained. Finally, A(φ0 φ1 ) ≤ 0 for all (φ0 φ1 ), and so these VEU preferences are comparatively ambiguity-averse by Proposition 2. Since the adjustment function A is not concave on R2 , these VEU preferences do not satisfy Axiom 9 and hence are not variational, and since A(φ) < 0 unless φ = 0 and A is not convex, these VEU preferences are also not ambiguity-loving. For additional discussion of Machina’s reflection example, see Siniscalchi (2008). Turn now to the comparison of ambiguity attitudes across individuals. The Ghirardato and Marinacci “more ambiguity averse than” ordering also has a 22
Smooth-ambiguity preferences Klibanoff, Marinacci, and Mukerji (2005) also rule out this pattern under the appropriate ambiguity-aversion assumption (concavity of the second-order utility).
VEU AND ATTITUDES TOWARD VARIATION
819
simple characterization for VEU preferences. To obtain a meaningful comparison of ambiguity attitudes, it is necessary to ensure that the preferences being compared are represented by the same utility function and baseline prior.23 Furthermore, a comparison solely in terms of the adjustment functions can be obtained if the preferences under consideration also share the same adjustment factors. Proposition 3 provides behavioral characterizations of these conditions, and Proposition 4 characterizes the “more ambiguity averse than” relation for the VEU representation. PROPOSITION 3: Consider two VEU preferences 1 and 2 with representations (u1 p1 n1 ζ 1 A1 ) and (u2 p2 n2 ζ 2 A2 ). Then the following statements are equivalent: 1. For all complementary pairs (f f¯) in F0 , f 1 f¯ if and only if f 2 f¯. 2. p1 = p2 and u1 u2 differ by a positive linear transformation. Furthermore, if statement 1 holds, then 1 and 2 admit a sharp VEU representation with the same vector of adjustment factors if and only if they admit the same set of crisp acts.24 PROPOSITION 4: Consider two VEU preferences 1 and 2 on F0 with representations (u p n1 ζ 1 A1 ) and (u p n2 ζ 2 A2 ). Then 1 is more ambiguityaverse than 2 if and only if, for all f ∈ F0 , A1 (Ep [ζ 1 · u ◦ f ]) ≤ A2 (Ep [ζ 2 · u ◦ f ]). In particular, if n1 = n2 and ζ 1 = ζ 2 = ζ, then 1 is more ambiguity-averse than 2 if and only if A1 (ϕ) ≤ A2 (ϕ) for all ϕ ∈ E (u ◦ F0 ; p ζ). To conclude, Epstein (1999) proposed an alternative definition of ambiguity aversion in which the benchmark is probabilistic sophistication (Machina and Schmeidler (1992)) rather than EU. The implications of this definition for VEU preferences are left to future work. 4.4. Updating This section proposes an updating rule for VEU preferences. Throughout this subsection, two binary relations on F0 will be considered: denotes the individual’s ex ante preferences, whereas E denotes her preferences conditional upon the event E ∈ Σ. To keep notation to a minimum, the event E will be fixed throughout. To provide some heuristics for the proposed updating rule, recall that the VEU preference functional V : F0 → R can be rewritten in “covariance” form: compare Eq. (2) in the Introduction. One possible way the individual might 23 Note that the ranking in Definition 4 already implies that the utility functions coincide: see the proof of Proposition 2. 24 The final statement is not true for VEU representations that are not sharp: examples are readily obtained.
820
M. SINISCALCHI
update her preferences upon learning that the event E has occurred is to update her baseline prior p and use the same functional representation: that is, replace Ep [·] and Covp (· ·) in Eq. (2) with Ep [·|E] and Covp (· ·|E), where Covp (a b|E) = Ep [(a − Ep [a|E])(b − Ep [b|E])|E].25 However, the resulting preferences may violate monotonicity, and in fact the functional A may not even be defined for all vectors (Covp (ζi u ◦ f |E))0≤i 0. As is the case for conditional EU preferences, it will be assumed throughout that the evaluation of acts upon learning that the event E has occurred does not depend upon the consequences that might have been obtained if, counterfactually, E had not obtained: AXIOM 13—Null Complement: For all f g ∈ F0 , if f (ω) = g(ω) for all ω ∈ E, then f ∼E g. The main axiom of this section can be informally stated as follows: if two acts have the same baseline EU evaluation both ex ante and conditional upon E, and the utility of the outcomes they deliver differs from this baseline evaluation only on the event E, then their ex ante and conditional ranking should be the same. This is consistent with the proposed interpretation of VEU preferences. Consider 25
In the covariance formulation, the fact that, in general, Ep [ζi |E] = 0 is inconsequential.
VEU AND ATTITUDES TOWARD VARIATION
821
an individual whose preferences are VEU both ex ante and conditional on E. Upon learning that E has occurred, her evaluation of an act f may change for two reasons: the baseline EU evaluation of f may change, and utility variability in states outside E no longer matters. However, for acts such that the baseline evaluation does not change upon conditioning on E, and which exhibit no variation away from the baseline evalution at states outside E to begin with, it seems plausible to assume that the individual’s evaluation of such acts will not change. These special acts can be characterized by a behavioral condition that, once again, involves complementarity. Consider two complementary acts h h¯ ∈ F0 ¯ ¯ ) for all that are constant on Ω \ E; that is, h(ω) = h(ω ) and h(ω) = h(ω
ω ω ∈ Ω \ E. Suppose that, for any (hence all) ω ∈ Ω \ E, (8)
1 1¯ 1 1 h + h(ω) ∼ h¯ + h(ω) 2 2 2 2
If the preference relation happens to be consistent with EU, then Eq. (8), together with complementarity, readily imply that h ∼ h(ω) for any (hence all) ω ∈ Ω \ E.26 This indicates that h(ω) is a certainty equivalent of h ex ante. However, intuitively, h(ω) can also be viewed as a “conditional certainty equivalent” of h given E: since h(ω ) = h(ω) for all ω ∈ Ω\E, the ranking h ∼ h(ω) suggests that receiving h(ω) for sure at states in E is just as good for the individual as allowing the act h to determine the ultimate prize she will receive conditional upon E.27 Thus, for an EU preference, Eq. (8) implies that the act h has the same certainty equivalent both ex ante and conditional upon E. For general VEU preferences, the above intuition obviously does not apply: it may well be the case that h h(ω) for ω ∈ Ω \ E. However, recall that complementary independence (Axiom 7) implies that VEU preferences always rank complementary acts in accordance with their baseline EU evaluation. Since the mixture acts in Eq. (8) are complementary, the above intuition does apply to the EU preference determined by the individual’s baseline prior. One then concludes that if Eq. (8) holds, then h(ω) is a baseline certainty equivalent of h, both ex ante and conditional upon E; this is formally verified in the proof of Proposition 5. Furthermore, it is clear that h deviates from this baseline only at states in E. Thus, Eq. (8) identifies the class of acts that should be ranked consistently by prior and conditional VEU preferences. ¯ By complementarity, 12 h + 12 h¯ ∼ 12 h(ω) + 12 h(ω); by independence, combining this relation ¯ with Eq. (8) yields 12 h + 12 k ∼ 12 h(ω) + 12 k, with k = 12 h¯ + 12 h(ω). Invoking independence once more yields h(ω) ∼ h. 27 Indeed, this condition may be used to characterize Bayesian updating for EU preferences, as well as prior-by-prior Bayesian updating for maxmin expected utility (MEU) preferences; see Pires (2002). 26
822
M. SINISCALCHI
AXIOM 14 —Baseline-Variation Consistency: For all complementary pairs ¯ such that f f¯ g g¯ are constant on Ω \ E, and for every (f f¯) and (g g) 1 ¯ ∼ 12 g¯ + 12 g(ω), f E g if ω ∈ Ω \ E, 2 f + 12 f¯(ω) ∼ 12 f¯ + 12 f (ω) and 12 g + 12 g(ω) and only if f g. PROPOSITION 5: Consider a preference relation on F0 having a VEU representation (u p n ζ A), an event E ∈ Σ, and another binary relation E on F0 . Assume that E is complete and transitive, and that Axiom 12 holds. Then the following statements are equivalent. 1. Axioms 13 and 14 hold. 2. E has a VEU representation (u p(·|E) n ζE A), where ζE = (ζiE )0≤i
F∈Π
In other words, for every i, the coefficient Ep [ζi a] can be obtained from the conditional baseline expectations Ep [a|F] and conditional coefficients Ep [ζiF a|F] for all F ∈ Π, just like the baseline expectation Ep [a] can be obtained from the conditional baseline expectations Ep [a|F]. Turn now to the consumption–savings example.
VEU AND ATTITUDES TOWARD VARIATION
823
Setup and Notation Consider an agent who has an initial endowment, or wealth, of w0 units of a single good and wishes to consume in periods t = 0 T . At each time t = 0 T − 1, she can choose how much of her current wealth wt to save (st ) and to consume (ct = wt − st ). A unit saved at time t yields rt units of the good at time t + 1, where (rt )0≤i 1 or L < H with equal probability. This is the only technology that allows the agent to transfer the good across periods. Informally, I shall assume that the agent perceives ambiguity about the correlation between rt and rt+1 ; this is inspired by Seidenfeld and Wasserman (1993). Formally, let the state space Ω be the collection of all realizations of the process (rt )0≤t
T Assume discounted power utility on X: u(x) = t=0 δt v(xt ) with v(c) = c 1−γ /(1 − γ). Let the baseline prior p be uniform on Ω, which reflects the distributional assumptions on (rt )0≤t 0 is “suitably small.” Observe that ζt is Πt+2 -measurable; furthermore, it can be verified that Ep [ζt ] = 0 for all 0 ≤ t < T − 1, as required by DefiniT −2 tion 1. Finally, let the adjustment function be defined by A(ϕ) = − t=0 |ϕt | for all ϕ ∈ RT −1 . The following facts are established in Section S.4 of the supplemental material: First, for all f ∈ FA ,
T T −2
T
t s (10) V (f ) = δ Ep [v ◦ ft ] − δ v ◦ fs
Ep ζt
t=0
t=0
s=t+2
824
M. SINISCALCHI
Moreover, the updating rule in Eq. (7) yields, for each τ and F ∈ Πτ , a collection (ζtF )0≤t
Vτ (f |F) = v ◦ fτ +
T
δt−τ Ep [v ◦ ft |F]
t=τ+1
T
s−τ δ v ◦ fs |F −
Ep ζtF
T −2
t=τ−1
s=t+2
Equation (9) is also simpler here: for all a : Ω → R and all t, Ep [ζtΠτ (ω) a|Πτ (ω)] = (12) Ep [ζtG a|G] G∈Πτ+1 : G⊂Πτ (ω)
Analysis of Consumption–Savings Choices The consistent-planning algorithm prescribes that, at each time τ and for any possible cell f ∈ Πτ , the agent choose the level of savings that maximizes her conditional VEU payoff as per Eq. (11), calculated assuming that consumption–savings choices at all subsequent times t = τ + 1 T − 1 and cells G ∈ Πt (with G ⊂ F ) are as determined in prior iterations of the procedure.29 This is conceptually straightforward. However, naively computing the expectations in Eq. (11) at time τ as just described is both analytically cumbersome and computationally intensive: for each possible consumption level at time τ, it is necessary to explicitly calculate how this choice would influence all subsequent consumption–savings decisions at times t > τ. In other words, at any decision point, the entire continuation subtree following a consumption choice must be taken into account. With EU preferences, this is avoided by assigning a continuation value to the subtree following each consumption choice; the decision faced at any time τ then effectively reduces to a simple, two-period problem. It will now be shown that, by virtue of Eq. (12), a similar recursive approach is also possible with VEU preferences and baseline-prior updating. The main difference is that, together with a (baseline) continuation value, it is also necessary to iteratively construct a continuation adjustment corresponding to each adjustment factor ζt . To initialize the recursion, for every w ≥ 0, let VT +1 (w) = 0. Now assume that Vτ+1 and Φτ+1t have been defined for τ + 1 ≤ T + 1 and τ − 1 ≤ t ≤ T − 2; fix τ−1 In the notation of Eq. (7), VF (f ) = t=0 δt Ep [v ◦ ft |F] + δτ Vτ (f |F); however, when evaluating continuation plans at time τ, only Vτ (f |F) is relevant. 29 A simplifying feature of this example is that ties do not arise. 28
VEU AND ATTITUDES TOWARD VARIATION
825
∗ (w) be the (unique, as it turns out) solution to F ∈ Πτ and w ≥ 0, and let sτF the problem
(13)
max v(w − s) + δEp [Vτ+1 (rτ s)|F]
s∈[0w]
−δ
T −2
Φτ+1t (Hs|F ∩ Hτ ) + Φτ+1t (Ls|F ∩ Lτ ) t=τ−1
where, as usual, a summation over an empty index set equals zero. As with EU ∗ preferences, it turns out that sτF (w) = ατF w, where ατF does not itself depend upon w. To complete the inductive step, define the baseline continuation value
∗ ∗ Vτ (w) = v(w − sτF (14) (w)) + δEp Vτ+1 (˜r sτF (w))|F ; then define the continuation adjustments (15)
Φτt (w|F) ⎧ ∗ ⎪ ⎨ δ Φτ+1t (HsτF (w)|F ∩ Hτ ) ∗ = + Φτ+1t (LsτF (w)|F ∩ Lτ ) τ − 1 ≤ t ≤ T − 2, ⎪ ⎩ ζτ−2F (ω)Vτ (w) for any ω ∈ F, t = τ − 2
(the cases t = τ − 1 and t = τ − 2 also require t ≥ 0). Observe that continuation adjustments use the same state variable w as the continuation value; however, they also depend upon the conditioning event F . This is required to keep track of the realization of adjustment factors. The (unique) recursive solution to the problem is the act f ∗ ∈ FA for which ∗ consumption fτ∗ (ω) at time τ in state ω ∈ F ∈ Πτ equals (1 − ατF )wτf (ω). Section S.4 (supplemental material) proves that this coincides with the solution obtained by direct application of the consistent-planning algorithm. ∗ A key step of the argument uses Eq. (12) to show that Φτt (wτf (ω)|F) = T Ep [ζtF s=t+2 δs−τ v ◦ fs∗ |F] for ω ∈ F : that is, as claimed, the functions Φτt keep track of adjustments. As a result, the problem in Eq. (13) is analogous to a two-period decision situation: it is not necessary to explicitly trace out the effects of the choice of s at time τ on subsequent decisions, because the relevant payoff information is encoded in the functions defined in Eqs. (14) and (15). 4.6. Complementary Independence for Other Decision Models This section investigates the implications of the complementary independence axiom for four well known families of preferences: the maxmin-expected utility (MEU) model of Gilboa and Schmeidler (1989), the variational preferences model of Maccheroni, Marinacci, and Rustichini (2006), the Choquet
826
M. SINISCALCHI TABLE II NECESSARY AND SUFFICIENT CONDITIONS FOR COMPLEMENTARY INDEPENDENCE
Model
Representation I(a)
MEU
minq∈C Eq [a]C ⊂ ba1 (Σ)
∀q ∈ C, 2p − q ∈ C
Variational
minq∈ba1 (Σ) (Eq [a] + c ∗ (q)) u unbounded above or below, xf ∼ f and c ∗ (q) = supf ∈F0 (u(xf ) − Eq [u ◦ f ]) a dv, · dv Choquet integral w.r.t. capacity v φ(Eq [a]) dμ(q) ba1 (Σ) μ has finite support
∀q ∈ ba1 (Σ), 2p − q ∈ ba1 (Σ) ⇒ c ∗ (q) = c ∗ (2p − q), and 2p − q ∈ / ba1 (Σ) ⇒ c ∗ (q) = ∞
CEU Smooth
Property of Baseline Prior p
∀E ∈ Σ, 1 − v(Ω \ E) = 2p(E) − v(E) (Only sufficient) ∀q ∈ ba1 (Σ), 2p − q ∈ ba1 (Σ) ⇒ μ(q) = μ(2p − q) and 2p − q ∈ / ba1 (Σ) ⇒ μ(q) = 0
expected utility (CEU) model of Schmeidler (1989), and the smooth-ambiguity model of Klibanoff, Marinacci, and Mukerji (2005). In the interest of conciseness, the results are presented in tabular form (see Table II); the reader is referred to the original papers for details on the representations and their axiomatizations, and to Section S.2 of the supplemental material for formal statements and proofs. The second column in Table II indicates the functional I : u ◦ F0 → R that, along with a utility function u : X → R, represents preferences in each of these models: that is, for all f g ∈ F0 , f g if and only if I(u ◦ f ) ≥ I(u ◦ g). Notation: ba1 (Σ) is the set of probability charges on (Ω Σ). The third column in Table II contains the main results of this subsection. Each entry should be interpreted as follows: the model under consideration satisfies complementary independence (Axiom 7) if and only if there exists a probability p ∈ ba1 (Σ) with the properties indicated in the table. For the smooth-ambiguity model, this condition is only sufficient for Axiom 7.30 It is also important to notice that, for each of these models, under the stated condition, the baseline probability p is fully characterized by preferences: it is the only probability charge such that, for all complementary pairs of acts (f f¯), f f¯ if and only if Ep [u ◦ f ] ≥ Ep [u ◦ f¯]. Table II emphasizes the formal analogy among the various conditions for complementary independence (CI). This allows a unitary interpretation of these conditions. Consider first the MEU, variational, and smooth models. Fix an act f and compute its baseline EU evaluation Ep [u ◦ f ]. Suppose that a probability charge q provides a more pessimistic evaluation of f , in the sense that 30 In the setting of Klibanoff, Marinacci, and Mukerji (2005), it is easy to provide a condition on second-order preferences that is equivalent to the property in Table II and hence implies complementary independence.
VEU AND ATTITUDES TOWARD VARIATION
827
Ep [u◦f ] > Eq [u◦f ]. It is then immediate to verify that E2p−q [u◦f ] > Ep [u◦f ], so the charge 2p − q provides a more optimistic evaluation of f . Indeed, E2p−q [u ◦ f ] exceeds the baseline Ep [u ◦ f ] precisely by the amount by which the latter exceeds Eq [u ◦ f ]. For CI to hold in the MEU, variational, and smooth models, the probability charges q and 2p − q must receive the same weight in the representation of preferences, where the precise meaning of “weight” is modelspecific.31 Informally, under CI, the individual must hold a balanced view of probabilistic assessments that are equally pessimistic and optimistic relative to the baseline p. Thus, the latter serves as a cognitive “center of symmetry.” In the CEU model, the set function defined by E → 1 − v(Ω \ E) is usually called the dual of the capacity v. Furthermore, if v is ambiguity-averse in the sense of Schmeidler (1989), its dual is ambiguity-loving. According to Table II, under CI the dual of v is precisely 2p − v. Again, this suggests that the baseline p acts as a center of symmetry between capacities representing pessimistic and optimistic evaluations.32 This property is satisfied, for instance, in several well known specifications of MEU preferences. For finite state spaces, one important example is provided by mean-standard deviation preferences, represented by the functional V (f ) = Ep [u ◦ f ] − θσp (u ◦ f ) (Grant and Kajii (2007)); analogous representations for general state spaces can be obtained by replacing the standard deviation σp (·) with a different measure of dispersion, such as the Gini mean difference (Yitzhaki (1982)) to ensure monotonicity. For a different, broad class of MEU examples, consider a finite state space Ω, fix a baseline prior p, and let C = {q ∈ Δ(Ω) : p − q ≤ ε}, where · denotes any p norm (p ≥ 1) on R|Ω| ; this suggests a concern for robustness to the misspecification of the baseline prior p. Further details may be found in Siniscalchi (2007). 5. DISCUSSION 5.1. Related Literature In the context of choice under risk, Quiggin and Chambers (1998, 2004) analyzed models featuring an exogenously given, objective reference probability p. Under suitable assumptions, a random variable y is evaluated according to the difference between its expectation Ep (y) with respect to p and a “risk index” ρ(y). See also Epstein (1985) and Safra and Segal (1998). Similar functional forms also appear in the social-choice literature. A classic result owing to Roberts (1980) characterized social-welfare functionals that evaluate a profile u1 uI of utility imputations according to the form 31 For the MEU model, p must be the barycenter of the set of priors C; for variational preferences, q and 2p − q must be equally “costly”; and in the smooth model, q and 2p − q must receive the same second-order probability. 32 I emphasize that ambiguity aversion is not required for the characterization in Table II; however, the interpretation in the text may be more transparent for ambiguity-averse preferences.
828
M. SINISCALCHI
¯ uI − u), ¯ where u¯ = 1I i ui . Ben-Porath and Gilboa (1994) u¯ − g(u1 − u characterized orderings over income distributions that can be represented in what is essentially a special case of the VEU functional, with the uniform distribution as reference probability. These contributions suggest an alternative formulation of the VEU representation. Assume for simplicity that the state space Ω is finite and write Ω = {ω0 ωn−1 }. Also consider a strictly positive probability p on Ω and a utility function u. For every 0 ≤ i < n, let (16)
ζic (ωi ) =
1 − p({ωi }) p({ωi })
and
ζic (ωj ) = −1
∀j = i
Then, for every f ∈ F0 , Ep [ζic · u ◦ f ] = u(f (ωi )) − Ep [u ◦ f ], so (Ep [ζic · u ◦ f ])0≤i
VEU AND ATTITUDES TOWARD VARIATION
829
preferences) allow for multiple supporting hyperplanes and hence, typically, multiple reference priors. One way to ensure uniqueness is to assume that indifference curves are “flat” or smooth at certainty, but, in this case, the prior p only reflects (indeed, under smoothness, approximates) local behavior around the certainty line. The baseline prior in the VEU representation is instead uniquely identified by preferences over complementary acts. Hence, every act contributes to the behavioral identification of the baseline prior. Furthermore, Grant and Polak maintain a form of ambiguity aversion, which is required for the existence of a supporting hyperplane at certainty; the VEU representation instead allows for arbitrary ambiguity attitudes. Finally, the ambiguity index ρ in Grant and Polak (2007) is not invariant to sign changes; the VEU adjustment functional A instead satisfies this invariance property, which supports the intuition that adjustments to baseline evaluations reflect outcome variability, or dispersion. On the other hand, the analysis of VEU preferences provided in this paper does assume and rely upon translation invariance (cf. Axiom 5); however, see Section 5.2 below. Decision models that incorporate a reference prior have also been analyzed in environments where the objects of choice either consist of or include sets of probabilities. In Stinchcombe (2003), Gajdos, Tallon, and Vergnaud (2004b), and Gajdos, Hayashi, Tallon, and Vergnaud (2008), the reference prior is characterized as the Steiner point of the set of probabilities under consideration. In Gajdos, Tallon, and Vergnaud (2004a) and Wang (2003), each object of choice explicitly indicates the reference prior. The present paper complements the analysis of these authors by characterizing a decision model that features a baseline prior in a fully subjective environment. Kopylov (2006) axiomatized a special case of MEU preferences, where the characterizing set of priors is generated by ε-contamination: that is, it takes the form {(1−ε)p+εq : q ∈ Δ}, where p serves as a reference prior and Δ is a set of “contaminating” probability measures. While the prior p is endogenously derived, the set Δ must be specified exogenously. Chateauneuf, Eichberger, and Grant (2007) characterized CEU with respect to a “neo-additive” capacity; this model can be viewed as α-maxmin expected utility with a set of priors obtained by ε-contamination, in which the reference prior and the “contaminating set” are both endogenously derived. Finally, as was noted following Corollary 2 and elsewhere, VEU preferences that satisfy Schmeidler’s ambiguity-aversion assumption (i.e., Axiom 9) are also variational preferences. In this case, the VEU representation can provide a convenient alternative to the variational specification. To elaborate, recall that in the canonical variational representation (cf. the second row in Table II), the utility index V (f ) assigned to an act f is the value of a minimization problem: V (f ) = minq∈ba1 (Σ) Eq [u ◦ f ] + c ∗ (q). In general, there may be no closedform solution to this problem and hence no explicit expression for the utility in-
830
M. SINISCALCHI
dex V (f ).34 On the other hand, the VEU utility index V (f ) is explicitly defined in Eq. (3); VEU representations with a concave function A can thus provide a family of richly parameterized, analytically convenient specifications of variational preferences. Furthermore, Theorem 1 and Corollary 2 provide a full behavioral characterization of preferences that are both VEU and variational. It is worth emphasizing, however, that VEU preferences enable the modeler to capture more nuanced forms of aversion to ambiguity than are allowed by maxmin EU or variational preferences (cf. Section 4.3). 5.2. Additional Features and Extensions Probabilistic Sophistication Non-EU VEU preferences can be probabilistically sophisticated in the sense of Machina and Schmeidler (1992). A characterization of probabilistic sophistication for VEU preferences is left for future work; Section S.3 in the supplemental material provides a simple, related result that sheds further light on the central role of baseline probabilities in the VEU model. Given a preference relation on F0 , define the induced likelihood ordering ⊂ Σ × Σ by ∀E F ∈ Σ
E F
⇔
xEy xFy
for all x y ∈ X with x y
Proposition 10 (supplemental material) shows that the likelihood ordering induced by a VEU preference is represented by a probability measure μ if and only if μ is its baseline prior. Translation-Invariance Because they satisfy the weak certainty independence axiom (Axiom 5), VEU preferences are invariant to “translation in utility space”; in the language of Grant and Polak (2007), they display “constant absolute ambiguity aversion,” as do, for instance, MEU, CEU, variational, and invariant-biseparable preferences. However, this is solely a consequence of Axiom 5: the key axiom in the characterization of the VEU representation, namely complementary independence (Axiom 7), does not imply or require translation-invariance. For instance, consider the smooth-ambiguity model of Klibanoff, Marinacci, and Mukerji (2005): Section 4.6 provides a sufficient condition for complementary independence that involves only the second-order probability μ, but not the second-order utility φ; the latter is unrestricted. Smooth-ambiguity preferences are translation-invariant if and only if φ is negative exponential or linear; it then follows that there exists a rich class of smooth-ambiguity preferences 34
Hansen and Sargent’s (2001) multiplier preferences are variational preferences for which the minimization problem does have a closed-form solution; their popularity in applications is probably due in part to this fact.
VEU AND ATTITUDES TOWARD VARIATION
831
that are not translation-invariant, but nevertheless satisfy complementary independence. For a different perspective on this issue, consider an “aggregator” function W : R2 → R, strictly increasing in both arguments. Also let u p ζ, and A be as in the VEU representation. Then one may consider preferences defined by letting, for all f g ∈ F0 , f g ⇔ W Ep [u ◦ f ] A(Ep [ζ · u ◦ f ]) ≥ W Ep [u ◦ g] A(Ep [ζ · u ◦ g]) The representation in this paper corresponds to the aggregator W (x y) = x + y. It is then easy to verify that Axiom 7 holds for such preferences, even if they are not translation-invariant. Therefore, it may be possible to characterize a version of the VEU representation that does not impose “constant absolute ambiguity aversion.” The resulting model would still feature sign- and translation-invariant adjustments A(Ep [ζ · u ◦ f ]), and hence would be consistent with the variability interpretation described in this paper.35 Such an extension is left to future work. APPENDIX A: CONDITIONS FOR MONOTONICITY REMARK 2: If a tuple (u p n ζ A) satisfies parts 1 and 2 in Definition 1, n < ∞, and A is continuous on E (u ◦ F0 ; p ζ) and differentiable −1 on E (u ◦ F0 ; p ζ) \ A (0), then it satisfies−1part 3 if and only if p(E) + / A (0) and E ∈ Σ. 0≤i 0 such that a + ε1E ∈ B0 (Σ u(X)), (17)
εp(E) + A(Ep [ζ · a] + εEp [ζ · 1E ]) − A(Ep [ζ · a]) ≥ 0
For any ϕ ∈ E (u◦ F0 ; p ζ), if A(ϕ) = 0 or ϕ = Ep [ζ ·a] and a+1E ε ∈ u◦ F0 for some ε > 0, Eq. (17) readily implies the condition in the remark; if A(ϕ) = 0, / u ◦ F0 for any ε > 0, then let F = {ω : a(ω) = ϕ = Ep [ζ · a], but a + 1E ε ∈ max u(X)}; since a is a simple function, F = ∅. Consider the sequence (ak ) [ζ · ak ]) = 0, and there given by ak = a − 1F k1 ; for k large, ak ∈ u ◦ F0 , A(Ep is εk > 0 such that ak + 1E εk ∈ u ◦ F0 . Then p(E) + 0≤i 0 such that a a + 1E ε ∈ u ◦ F0 . To simplify the notation, write ϕη = Ep [ζ · a] + ηEp [ζ · 1E ] for all η ∈ [0 ε]. 35 Axiom 8 would also have to be dropped: after all, its interpretation involves translationinvariance. In any case, recall that its role is limited even in the present setting.
832
M. SINISCALCHI
Consider first the case A(ϕ0 ) = 0. Let ε0 = sup{η ∈ [0 ε] : A(ϕη ) = 0}. If ε0 = 0, then A(ϕη ) is differentiable for all η ∈ (0 ε) and (18)
εp(E) + A(ϕε ) − A(ϕ0 ) = 0 · p(E) + A(ϕ0 ) − A(ϕ0 ) ε ∂ p(E) + A(ϕη )Ep [ζi · 1E ] dη + ∂ϕi 0 0≤i
as required. If ε0 > 0, then by continuity A(ϕε0 ) = 0 = A(ϕ0 ), so (19)
ε0 p(E) + A(ϕε0 ) − A(ϕ0 ) = ε0 p(E) ≥ 0
Thus, in particular, if ε0 = ε, Eq. (17) holds. If instead ε0 < 1, then one can repeat the preceding argument with a = a + ε0 1E and ε = ε − ε0 in lieu of a and ε. By assumption A(Ep [ζ · a ] + ηEp [ζ · 1E ]) = 0 for all η ∈ (0 ε ), so the argument just given implies that (ε − ε0 )p(E) + A(ϕε ) − A(ϕε0 ) ≥ 0. Together with Eq. (19), this implies that Eq. (17) holds in this case as well. Consider now the case A(ϕ0 ) > 0. Let ε1 = sup{η ∈ [0 ε] : A(ϕη ) = 0}. By continuity of A, ε1 > 0; thus, integrating on (0 ε1 ) as in Eq. (18) yields ε1 p(E) + A(ϕε1 ) − A(ϕ0 ) ≥ 0. If ε1 = ε, the proof is complete; otherwise, note that by continuity of A, A(ϕε1 ) = 0. Applying the argument given above to a = a + ε1 1E and ε = ε − ε1 in lieu of a and ε yields (ε − ε1 )p(E) + A(ϕε ) − A(ϕε ) ≥ 0; together with ε1 p(E) + A(ϕε1 ) − A(ϕ0 ) ≥ 0, this implies that Eq. (17) holds. Q.E.D. REMARK 3: If (u p n ζ A) satisfies parts 1 and 2 in Definition 1, and A is concave and positively homogeneous, then (u p n ζ A) satisfies part 3 if and only if p(E) + A(Ep [ζ · 1E ]) ≥ 0 ∀E ∈ Σ. PROOF: Since A is positively homogeneous, it has a unique positively homogeneous extension to E (B0 (Σ); p ζ) given by A(Ep [ζ · αa]) = αA(Ep [ζ · a]) for all α > 0 and a ∈ u ◦ F0 . Hence, A(Ep [ζ · a]) is well defined for all a ∈ B0 (Σ), and A is concave on this domain. Hence, for all ϕ ψ ∈ E (B0 (Σ); p ζ), A(ϕ) = A(ψ + (ϕ − ψ)) = 2A( 12 ψ + 12 (ϕ − ψ)) ≥ 2 12 A(ψ) + 2 12 A(ϕ − ψ), so A(ϕ − ψ) ≤ A(ϕ) − A(ψ). Now suppose that p(E) + A(Ep [ζ · 1E ]) ≥ 0 for all E ∈ Σ and consider a b ∈ B0 (Σ) with a(ω) ≥ b(ω) for all ω. Then a − b ∈ B0 (Σ), and since a(ω) − b(ω) ≥ 0 for all ω, concavity and homogeneity, together with linearity and monotonicity of · dp, imply that (a − b) dp + A(Ep [ζ · (a − b)]) ≥ 0. But the argument given above implies that A(Ep [ζ · (a − b)]) ≤ A(Ep [ζ · a]) − A(Ep [ζ · b]), so a dp + A(Ep [ζ · a]) ≥ b dp + A(Ep [ζ · b]). The other direction is immediate. Q.E.D.
VEU AND ATTITUDES TOWARD VARIATION
833
APPENDIX B: PROOFS B.1. Additional Notation and Preliminaries on Niveloids The indicator function of an event E ∈ Σ will be denoted by 1E . Inequalities between two elements a b of B(Σ) are interpreted pointwise: a ≥ b means that a(ω) ≥ b(ω) for all ω ∈ Ω. Let Φ ⊂ B(Σ) be convex. A functional I : Φ → R is a niveloid iff I(a) − I(b) ≤ sup(a − b) for all a b ∈ Φ; it is normalized if I(γ1Ω ) = γ for all γ ∈ R such that γ1Ω ∈ Φ; it is monotonic iff, for all a b ∈ Φ, a ≥ b implies I(a) ≥ I(b); it is constant-mixture invariant iff, for all a ∈ Φ, α ∈ (0 1), and γ ∈ R with γ1Ω ∈ Φ, I(αa + (1 − α)γ) = I(αa) + (1 − α)γ; it is vertically invariant iff I(a + γ) = I(a) + γ for all a ∈ Φ and γ ∈ R such that a + γ ∈ Φ; and it is affine iff, for all a b ∈ Φ and α ∈ (0 1), I(αa+(1−α)b) = αI(a)+(1−α)I(b). Maccheroni, Marinacci, and Rustichini (2006) (MMR henceforth) demonstrated the usefulness of niveloids in decision theory and established useful results reviewed below. If Φ = B0 (Σ) or Φ = B(Σ), then a functional I : Φ → R is positively homogeneous iff, for all a ∈ Φ and α ≥ 0, I(αa) = αI(a); is c-additive iff I(a + α) = I(a) + α for all α ∈ R+ and a ∈ Φ; is additive iff I(a + b) = I(a) + I(b) for all a b ∈ Φ; is c-linear iff it is c-additive and positively homogeneous; and is linear iff it is additive and positively homogeneous. Let ba(Σ) and ba1 (Σ) denote, respectively, the set of finitely additive measures and the set of charges (finitely additive probabilities) on (Ω Σ). Recall that ba(Σ) is isometrically isomorphic to the norm dual of B0 (Σ) and B(Σ). Also, the σ(ba(Σ) B(Σ)) and σ(ba(Σ) B0 (Σ)) topologies coincide on ba1 (Σ); they are referred to as the weak∗ topology. Furthermore, if Γ ⊂ R is a nonempty, nonsingleton interval, denote by B0 (Σ Γ ) and B(Σ Γ ) the restrictions of B0 (Σ) and B(Σ) to functions taking values in Γ . Then the weak∗ topology on ba1 (Σ) also coincides with the σ(ba(Σ) B0 (Σ Γ )) and σ(ba(Σ) B(Σ Γ )) topologies. The following useful results on niveloids are owing to or reviewed in MMR. In particular, item 6 provides a first representation for preferences satisfying Axioms 1–5. PROPOSITION 6—MMR: Let Γ be an interval such that 0 ∈ int(Γ ) and I : B0 (Σ Γ ) → R. 1. If I is a niveloid, it is supnorm, hence Lipschitz continuous. 2. If I : B0 (Σ K) → R is a niveloid, then it has a (minimal) niveloidal extension to B(Σ). 3. I is a niveloid iff it is monotonic and constant-mixture invariant. 4. If I is constant-mixture invariant, then it is vertically invariant. 5. If I is vertically invariant, then it has a unique, vertically invariant extension Iˆ to B0 (Σ Γ ) + R ≡ {a + 1Ω γ : a ∈ B0 (Σ Γ ) γ ∈ Γ }.
834
M. SINISCALCHI
6. on F0 satisfies Axioms 1–5 if and only if there is a nonconstant, affine function u : X → R and a normalized niveloid I : B0 (Σ u(X)) → R such that f g iff I(u ◦ f ) ≥ I(u ◦ g). The following uniqueness and extension results are straightforward and useful: COROLLARY 3: If I u and I u provide two representations of as per the last point of Proposition 6, then u = αu + β (with α > 0) and I (αa + β) = αI(a) + β for all a ∈ B0 (Σ u(X)). PROOF: Since I and I are normalized, standard results imply that u = αu + β for some α > 0 and β ∈ R. Next, for every a ∈ B0 (Σ Γ ), let f ∈ F0 be such that u ◦ f = a and x ∼ f : thus, since I and I are normalized, u(x) = I(u ◦ f ) = I(a) and similarly u (x) = I (u ◦ f ); that is, αu(x) + β = I (αu ◦ f + β) and therefore αI(a) + β = I (αa + β). [Note that this is consistent with Q.E.D. normalization: αI(γ1Ω ) = αγ and I (αγ1Ω ) = αγ.] COROLLARY 4: A niveloid I : B0 (Σ Γ ) → R admits a unique niveloidal extension to B(Σ Γ ). Therefore, if a preference on F0 admits a niveloidal representation (I u) as in part 6 of Proposition 6, then it admits a unique extension to Fb that satisfies Axioms 1–5. Together with u, the extension of I to B(Σ Γ ) represents the extension of to Fb . PROOF: By Proposition 6, there is a minimal niveloidal extension of I to B(Σ); let Iˆ be its restriction to B(Σ Γ ). If there is another niveloidal extension Iˆ of I to B(Σ Γ ), fix a ∈ B(Σ Γ ) and a sequence ak → a such that ak ∈ ˆ ˆ k ) = limk I(ak ) = limk Iˆ (ak ) = Iˆ (a). B0 (Σ Γ ) for all k. Then I(a) = limk I(a ˆ ˆ ◦ g) for all f g ∈ Fb . One can ˆ ˆ Now define on Fb by f g iff I(u ◦ f ) ≥ I(u verify that this defines a preference relation that satisfies Axioms 1–5. Moreˆ that satisfies the same axioms and coincides with over, consider a preference to Fb . The proof of Lemma 28 in MMR applies verbatim to a preference defined on Fb and yields a representation (Iˆ u ), where Iˆ is a niveloid defined on u ◦ Fb . Since F0 ⊂ Fb , we can take u = u and Iˆ = I on u ◦ F0 . But then ˆ which implies that ˆ = . ˆ Q.E.D. Iˆ = I, NOTE: For notational simplicity, the unique extension of a niveloid I : B0 (Σ Γ ) to B(Σ Γ ) will also be denoted by I. B.2. Characterization of Complementary Independence and Crisp Acts This subsection starts with the “niveloidal representation” of provided by part 6. It will first be shown that Axioms 8 and 7 hold if and only if a “baseline
VEU AND ATTITUDES TOWARD VARIATION
835
linear functional” J can be defined. This identifies a baseline prior. Then, it will be shown that I coincides with J on all crisp acts. Finally, further properties of the set of crisp acts are investigated. To simplify the exposition, throughout this section we maintain the following assumption and definitions: is represented by I u as in Proposition 6, with 0 ∈ int(u(X)). The unique extension of I to B(Σ u(X)), and hence to u ◦ Fb , is implicitly used wherever it is needed. Define J : u ◦ Fb → R by letting, for all a ∈ u ◦ Fb and γ ∈ R with γ − a ∈ u ◦ Fb , (20)
1 1 1 J(a) = γ + I(a) − I(γ − a) 2 2 2
LEMMA 1: J is a well defined, normalized niveloid. If satisfies Axioms 7 and 8 on F0 , then J is affine on F0 and has a unique, normalized, and positive linear extension to B(Σ), also denoted J. Conversely, if J is affine on u ◦ F0 (resp. u ◦ Fb ), then (resp. the extension of to Fb ) satisfies Axioms 7 and 8. PROOF: J as above is well defined. First, for every a ∈ u ◦ Fb , if γ = infΩ a + supΩ a, then γ − a = supΩ a − [a − infΩ a] ∈ u ◦ Fb . Furthermore, if γ γ ∈ R are such that γ − a γ − a ∈ u ◦ Fb , then γ − a = (γ − a) + (γ − γ ), so vertical invariance of I implies that I(γ − a) = I(γ − a) + γ − γ , and so 12 γ − 12 I(γ − a) = 12 γ − 12 I(γ − a) − 12 (γ − γ ) = 12 γ − 12 I(γ − a), as required. Next, J is normalized: if γ ∈ u(X), then γ −γ = 0 ∈ u(X), so J(γ) = 12 γ + 12 I(γ)− 12 I(γ − γ) = 12 γ + 12 γ + 0 = γ, because I is normalized and 0 · 1Ω ∈ u ◦ Fb . Finally, J is a niveloid: for a b ∈ u ◦ Fb , if α β ∈ u(X) are such that α − a β − b ∈ u ◦ Fb , then 2[J(a) − J(b)] = α + I(a) − I(α − a) − β − I(b) + I(β − b) ≤ (α − β) + sup(a − b) + sup(β − b − α + a) Ω
Ω
= 2 sup(a − b) Ω
Turn now to Axioms 8 and 7. First, it will be shown that satisfies Axiom 8 if and only if J( 12 a) = 12 J(a) for all a ∈ u ◦ F . Specifically, let F denote either F0 or Fb . Fix f , f¯, x, and x¯ as in Axiom 8 and let a ∈ u ◦ F and γ ∈ R be such that a = u ◦ f and γ − ¯ = I( 12 f¯ + 12 u(x)); by a = u ◦ f¯. Then 12 f + 12 x¯ ∼ 12 f¯ + 12 x iff I( 12 a + 12 u(x)) ¯ vertical invariance [note that 12 a 12 (γ − a) ∈ u ◦ F ] and the properties of x x, this equals 1 1 1 1 I a + I(γ − a) = I (γ − a) + I(a) 2 2 2 2
836
M. SINISCALCHI
By the definition of J, rearranging terms, this holds iff J( 12 a) − 14 γ = 12 [J(a) − 1 γ], that is, J( 12 a) = 12 J(a). Thus, if J has this property, then Axiom 8 holds. 2 Conversely, for any a ∈ u ◦ F , there is f ∈ F such that u ◦ f = a, and as noted in the first part of this proof, one can find γ ∈ R with γ − a ∈ u ◦ F . Again, there will be f¯ ∈ F with u ◦ f¯ = γ − a, so that f f¯ are complementary: if Axiom 8 holds, the argument just given shows that J( 12 a) = 12 J(a). Now assume that J is affine on Fb . Then, in particular, for all a ∈ Fb , J( 12 a) = J( 12 a + 12 · 0) = 12 J(a) + 12 J(0) = 12 J(a), and, as shown above, in this ¯ and α as in Axiom 7. Let case Axiom 8 holds. Next, consider (f f¯), (g g),
a = u ◦ f and b = u ◦ g, and let z z ∈ R be such that 12 u(f (ω)) + 12 u(f¯(ω)) = z, 1 ¯ u(g(ω)) + 12 u(g(ω)) = z for all ω. Finally, let a¯ = 2z − a and b¯ = 2z − b, so 2 ¯ ¯ ¯ Then f f¯ and g g¯ imply I(a) ≥ I(a) ¯ = I(2z − a), a¯ = u ◦ f and b = u ◦ g. 1 so J(a) = z + 2 I(a) − 12 I(2z − a) ≥ z; similarly, J(b) ≥ z . If J is affine, then J(αa + (1 − α)b) = αJ(a) + (1 − α)J(b) ≥ [αz + (1 − α)z ], so ¯ I(αa + (1 − α)b) − I(αa¯ + (1 − α)b) = I(αa + (1 − α)b) − I(α[2z − a] + (1 − α)[2z − b]) = I(αa + (1 − α)b) − I 2[αz + (1 − α)z ] − αa − (1 − α)b = 2J(αa + (1 − α)b) − 2[αz + (1 − α)z ] ≥ 0 where the last equality follows from the definition of J. Thus, αf + (1 − α)g ¯ that is, Axiom 7 holds. αf¯ + (1 − α)g, Conversely, assume that Axioms 8 and 7 hold on F0 . As shown above, J( 12 a) = 12 J(a) for all a ∈ u ◦ F0 . It will now be shown that J( 12 a + 12 b) = 1 J(a) + 12 J(b) for all a b ∈ u ◦ F0 . 2 Since 0 ∈ int(u(X)), there is δ > 0 such that [−δ δ] ⊂ u(X). Assume first that a b ≤ 12 δ; this implies that (a) a b −a −b ∈ B0 (Σ u(X)) and, furthermore, (b) a − J(a) b − J(b) J(a) − a J(b) − b ∈ B0 (Σ u(X)), because monotonicity of J implies that J(a) J(b) ∈ [− 12 δ 12 δ]. Let f g f¯ g¯ ∈ F0 be such that a − J(a) = u ◦ f , b − J(b) = u ◦ g, J(a) − a = u ◦ f¯, and J(b) − b = ¯ Clearly, (f f¯) and (g g) ¯ are complementary pairs. Furthermore, applyu ◦ g. ing the definition of J with γ = 0, J(a − J(a)) = 12 I(a − J(a)) − 12 I(J(a) − a) and similarly J(b − J(b)) = 12 I(b − J(b)) − 12 I(J(b) − b). Finally, by vertical invariance of J, J(a − J(a)) = J(a) − J(a) = 0 and similarly J(b − J(b)) = 0. ¯ so Axiom 7 implies that 12 f + 12 g ∼ 12 f¯ + 12 g. ¯ It folThus, f ∼ f¯ and g ∼ g, 1 lows that I( 2 [a − J(a)] + 12 [b − J(b)]) = I( 12 [J(a) − a] + 12 [J(b) − b]) or J( 12 [a − J(a)] + 12 [b − J(b)]) = 0, but by vertical invariance of J, this is equivalent to J( 12 a + 12 b) = 12 J(a) + 12 J(b), as claimed. Now, for arbitrary a b ∈ B0 (Σ u(X)), there is an integer K > 0 such that 2−K a 2−K b ≤ 12 δ. Then the argument just given shows that J( 12 (2−K a) +
VEU AND ATTITUDES TOWARD VARIATION
837
(2−K )b) = 12 J(2−K a) + 12 J(2−K b), but it was shown above that, for all c ∈ B0 (Σ u(X)), J( 12 c) = 12 J(c), so it follows that 1 1 1 1 J a + b = 2K J 2−K a + b 2 2 2 2 1 2
1 1 1 1 = 2K J(2−K a) + 2K J(2−K b) = J(a) + J(b) 2 2 2 2 This implies that J(αa + (1 − α)b) = αJ(a) + (1 − α)J(b) for all dyadic rationals α = k2−K , with k ∈ {0 K} for some integer K > 0.36 But since these are dense in [0 1] and J is sup-norm continuous, J is affine. The extension of J to B(Σ) is now standard. Q.E.D. By standard results, if J is linear, there exists a unique p ∈ ba1 (Σ) such that (21) ∀a ∈ B(Σ) J(a) = a dp Ω
OBSERVATION 1: Note that, if f f¯ are complementary acts, then f f¯ iff J(u ◦ f ) ≥ J(u ◦ f¯). Thus, J is identified by preferences over complementary acts. Lemma 1 then shows that if Axioms 7 and 8 hold, such preferences identify the baseline prior p. To investigate further properties of the functional I, a short detour is needed. Begin by defining and characterizing a binary relation, to be interpreted as “unambiguous preference.” The following lemma adapts notions and employs results from Ghirardato, Maccheroni, and Marinacci (2004) (GMM henceforth). Since its proof merely adapts arguments from GMM, it is relegated to the supplemental material. LEMMA 2: There exists a unique, weak∗ compact and convex set C ⊂ ba1 (Σ) such that, for all a b ∈ B0 (Σ u(X)), (22)
∀α ∈ (0 1] c ∈ B0 (Σ u(X)) : I(αa + (1 − α)c) ≥ I(αb + (1 − α)c) ⇐⇒ ∀q ∈ C : a dq ≥ b dq
Furthermore, for all a b ∈ B(Σ u(X)), (23)
36
∀α ∈ (0 1] c ∈ B(Σ u(X)) : I(αa + (1 − α)c) ≥ I(αb + (1 − α)c) a dq ≥ b dq ⇐⇒ ∀q ∈ C :
The claim is easily established by induction on K.
838
M. SINISCALCHI
Notation: Let q(a) = a dq for any q ∈ ba1 (Σ) and q-integrable function a : Ω → R. Next, some key consequences of linearity of J for the set C are investigated. LEMMA 3: Assume that J is linear. Then we can make the following statements: 1. p ∈ C and, for all q ∈ C , 2p − q ∈ C . 2. For all a ∈ B(Σ) such that a ≥ 0, and for all q ∈ C , 2J(a) ≥ q(a). In particular, for all a b ∈ B(Σ) and all q ∈ C , 2J(|a − b|) ≥ q(|a − b|) ≥ |q(a) − q(b)|. PROOF: Consider a b ∈ B0 (Σ u(X)) such that −a −b 2J(a) − a 2J(b) − b ∈ B0 (Σ u(X)), so J(a) = 12 I(a) − 12 I(−a) and similarly for b. Then, for all λ ∈ (0 1] and d ∈ B0 (Σ u(X)), choose γ so that γ − d ∈ B0 (Σ u(X)). Then (1 − λ)γ − λa − (1 − λ)d = λ(−a) + (1 − λ)(γ − d) ∈ B0 (Σ u(X)) and similarly (1 − λ)γ − λb − (1 − λ)d ∈ B0 (Σ u(X)), so the definition of J implies that I(λa + (1 − λ)d) = 2J(λa + (1 − λ)d) + I(λ(−a) + (1 − λ)(γ − d)) − (1 − λ)γ and I(λb + (1 − λ)d) = 2J(λb + (1 − λ)d) + I(λ(−b) + (1 − λ)(γ − d)) − (1 − λ)γ. Therefore, by linearity of J and canceling common terms, I(λa + (1 − λ)d) ≥ I(λb + (1 − λ)d) iff 2J(λa) + I(λ(−a) + (1 − λ)(γ − d)) ≥ 2J(λb) + I(λ(−b) + (1 − λ)(γ − d)). Since a b were chosen so that 2J(a) − a 2J(b) − b ∈ B0 (Σ u(X)), this is also equivalent to I(λ(2J(a) − a) + (1 − λ)(γ − d)) ≥ I(λ(2J(b) − b) + (1 − λ)(γ − d)) by vertical invariance. Finally, since d ∈ B0 (Σ u(X)) if and only if γ − d ∈ B0 (Σ u(X)) for some γ , conclude that a b if and only if 2J(a) − a 2J(b) − b. By Lemma 2, this is equivalent to the condition (24)
∀q ∈ C ⇐⇒
q(a) ≥ q(b) ∀q ∈ C
2J(a) − q(a) ≥ 2J(b) − q(b)
For arbitrary a b ∈ B0 (Σ), let α > 0 be such that αa αb −αa −αb 2J(αa) − αa 2J(αb) − αb ∈ B0 (Σ u(X)) [such an α exists because 0 ∈ u(X)]. Then Eq. (24) must hold for αa and αb, and positive homogeneity of every q ∈ C and J implies that it must hold for a and b as well. Now, for statement 1, define a 0 b for a b ∈ B0 (Σ u(X)) to mean that the left-hand side of Eq. (22) holds, as in the proof of Lemma 2. For every q ∈ C , 2p(Ω) − q(Ω) = 1. Furthermore, for every E ∈ Σ, taking a = 1E and b = 0, q(E) ≥ 0 and so, by Eq. (24), 2p(E) − q(E) ≥ 0 as well. Thus, 2p − q ∈ ba1 (Σ). Thus, let D be the weak∗ convex closure of C ∪{2p−q : q ∈ C }. It is clear that for all a b ∈ B0 (Σ u(X)), r(a) ≥ r(b) for all r ∈ D implies a 0 b; conversely, if a 0 b, then q(a) ≥ q(b) for all q ∈ C , hence 2J(a) − q(a) ≥ 2J(b) − q(b) for all q ∈ C , and hence r(a) ≥ r(b) for all r ∈ D . Since Lemma 2 ensures that C is the unique set of probability charges that represents 0 , C = D , and so for every q ∈ C , 2p−q ∈ C as well. This immediately implies that p = 12 q + 12 (2p−q) ∈ C .
VEU AND ATTITUDES TOWARD VARIATION
839
For statement 2, note first that for any a ∈ B0 (Σ) with a ≥ 0, q(a) ≥ 0 for all q ∈ C : hence, by Eq. (24), 2J(a) ≥ q(a). The inequality now extends to B(Σ) by sup-norm continuity of J and q(·). Finally, for any a b ∈ B(Σ), 2J(|a − b|) ≥ q(|a−b|) ≥ |q(a)−q(b)|, where the second equality follows, for example, from Dudley (1989, Theorem 5.1.1). Q.E.D. Conclude with a useful “vertical invariance” property. LEMMA 4: In the setting of Lemma 2, if a b ∈ B(Σ u(X)) and for some δ ∈ R, q(a) = q(b) + δ for all q ∈ C , then I(a) = I(b) + δ. PROOF: Assume first that inf b(Ω) sup b(Ω) ∈ int(u(X)). Then there exists α ∈ (0 1) such that b + αδ ∈ B(Σ u(X)). For all k ≥ 0, let ak = [1 − (1 − α)k ]a + (1 − α)k b. Then ak ∈ B(Σ u(X)) for all k ≥ 0. Furthermore, (1 − α)ak + αa = (1 − α)[1 − (1 − α)k ]a + (1 − α)k+1 b + αa = [1 − (1 − α)k+1 ]a + (1 − α)k+1 b = ak+1 Now write d d to signify that I(αd + (1 − α)c) = I(αd + (1 − α)c) for all α ∈ (0 1] and c ∈ B(Σ u(X)). By Lemma 2, d d iff q(d) = q(d ) for all q ∈ C . In particular, is conic: d d implies that βd + (1 − β)d
βd + (1 − β)d
. Note that is the symmetric part of the relation defined in the proof of Lemma 2. CLAIM 1: For all k, ak + α(1 − α)k δ ∈ B(Σ u(X)) and ak+1 ak + α(1 − α)k δ. PROOF: For k = 0, a0 + α(1 − α)0 δ = b + αδ ∈ B(Σ u(X)) by the choice of δ. Furthermore, for all q ∈ C , q(a1 ) = q((1 − α)b + αa) = (1 − α)q(b) + αq(a) = (1 − α)q(b) + αq(b) + αδ = q(b) + αδ = q(a0 + α(1 − α)0 δ), so a1 a0 + α(1 − α)0 δ. By induction, for k > 0, (1 − α)[ak−1 + α(1 − α)k−1 δ] + αa = (1 − α)ak−1 + αa + α(1 − α)k δ = ak + α(1 − α)k δ Thus, ak + α(1 − α)k δ ∈ B(Σ u(X)) because a ak−1 + α(1 − α)k−1 δ ∈ B(Σ u(X)). Furthermore, if ak ak−1 + α(1 − α)k−1 δ, then also ak+1 = (1 − α)ak + αa (1 − α)[ak−1 + α(1 − α)k−1 δ] + αa = ak + α(1 − α)k δ because is conic.
Q.E.D.
840
M. SINISCALCHI
The claim implies that, for all k ≥ 1, I(ak ) = I(ak−1 + α(1 − α)k−1 δ) = I(ak−1 ) + α(1 − α)k−1 δ, where the second equality follows from vertical invariance. Thus, I(ak ) = I(b) + αδ
k−1
(1 − α) = I(b) + αδ
=0
1 − (1 − α)k α
= I(b) + δ[1 − (1 − α)^k].

Since a_k → a and I is continuous, the result follows.
If b is arbitrary, for k ≥ 0, let a^k = [k/(k + 1)]a and b^k = [k/(k + 1)]b, so in particular b^k(Ω) ⊂ int(u(X)). Furthermore, for every k ≥ 0 and q ∈ C, q(a^k) = [k/(k + 1)]q(a) = [k/(k + 1)]q(b) + [k/(k + 1)]δ = q(b^k) + [k/(k + 1)]δ, and it has just been shown that then I(a^k) = I(b^k) + [k/(k + 1)]δ. Since a^k → a and b^k → b, continuity implies that I(a) = I(b) + δ. Q.E.D.

B.3. Monotone Continuity

Assume that Γ is nonsingleton. A functional H : B_0(Σ, Γ) → R is monotonely continuous iff, for every α, β, γ ∈ Γ with α > β > γ and every sequence of events (A_k) ⊂ Σ such that A_k ⊃ A_{k+1} for all k and ⋂_k A_k = ∅, there is k such that H(α − (α − γ)1_{A_k}) > β > H(γ + (α − γ)1_{A_k})—or, abusing the notation for binary acts, H(γ A_k α) > β > H(α A_k γ).
Continue to focus on the representation (I, u) of ≽; assume w.l.o.g. that 0 ∈ int(u(X)). Clearly, ≽ satisfies Axiom 6 iff I is monotonely continuous. This property will now be characterized in terms of the functional J defined in Lemma 1.

LEMMA 5: The following statements are equivalent:
1. I is monotonely continuous.
2. For every decreasing sequence (A_k) ⊂ Σ such that ⋂_k A_k = ∅, J(1_{A_k}) → 0.
Thus, if I is monotonely continuous, the charge p representing J is actually a measure.

PROOF OF LEMMA 5: 1 ⇒ 2: Let α ∈ u(X) be such that α > 0 and −α ∈ u(X). For every ε ∈ (0, α), there is k′ such that ε > I(α 1_{A_{k′}}) and k″ such that I(α(1 − 1_{A_{k″}})) > α − ε (take γ = 0 and β = ε, α − ε in the definition of monotone continuity). Letting k = max(k′, k″), so A_k ⊂ A_{k′} and A_k ⊂ A_{k″}, by monotonicity both ε > I(α 1_{A_k}) and I(α(1 − 1_{A_k})) > α − ε hold. Furthermore, since −α ∈ u(X), vertical invariance of I implies that I(α(1 − 1_{A_k})) = α + I(−α 1_{A_k}) > α − ε, that is, ε > −I(−α 1_{A_k}). Hence, ε > (1/2)I(α 1_{A_k}) − (1/2)I(−α 1_{A_k}) = J(α 1_{A_k}). To sum up, if η ≥ 1, then monotonicity
implies that J(1Ak ) ≤ η for all k, and for η ∈ (0 1), taking ε = ηα yields k such that J(1Ak ) = α1 J(α1Ak ) < α1 ε = η. 2 ⇒ 1: Fix α β γ ∈ u(X) with α > β > γ. Then there is k such that J(γ + (α − γ)1Ak ) < γ + 12 (β − γ). Let μ = α + γ, so μ − γ − (α − γ)1A k = α − (α − γ)1Ak ∈ B0 (Σ u(X)). Then, by the definition of J, 1 1 1 γ + (β − γ) > μ + I γ + (α − γ)1Ak 2 2 2 1 − I μ − γ − (α − γ)1Ak ; 2 substituting for μ and simplifying this reduces to 1 1 1 1 β > α + I γ + (α − γ)1Ak − I α − (α − γ)1Ak 2 2 2 2 1 ≥ I γ + (α − γ)1Ak 2 where the inequality follows from monotonicity of I, as α − (α − γ)1Ak ≤ α. Thus, β > I(γ + (α − γ)1Ak ). Similarly, there is k
such that J(α − (α − γ)1Ak
) > α − 12 (α − β), that is, 1 1 1 α − (α − β) < μ + I α − (α − γ)1Ak
2 2 2 1 − I μ − α + (α − γ)1Ak
2 and again substituting for μ and simplifying yields 1 1 1 1 β < γ + I α − (α − γ)1Ak
− I γ + (α − γ)1Ak
2 2 2 2 1 ≤ I α − (α − γ)1Ak
2 because γ + (α − γ)1Ak
≥ γ. Thus, I(α − (α − γ)1Ak
) > β. Therefore, by monotonicity, k = max(k k
) satisfies I(α − (α − γ)1Ak ) > β > I(γ + (α − Q.E.D. γ)1Ak ), as required. B.4. Proof of Theorem 1 It is clear that statement 2 implies 3 in Theorem 1; thus, focus on the nontrivial implications.
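Before turning to the constructive steps, a small numerical illustration of the functional being characterized may be useful (a hedged sketch, not part of the original proof: the state space, baseline prior, adjustment factor, and the particular adjustment function A below are hypothetical). It evaluates I(a) = E_p[a] + A(E_p[ζ·a]) with a zero-mean adjustment factor and a symmetric, nonpositive A, and checks that complementary utility profiles a and γ − a receive the same adjustment.

```python
import numpy as np

# Hypothetical 4-state illustration of a VEU-style functional
# I(a) = E_p[a] + A(E_p[zeta * a]); all ingredients below are made up.

p = np.array([0.25, 0.25, 0.25, 0.25])      # baseline prior
zeta = np.array([1.0, -1.0, 1.0, -1.0])     # one adjustment factor, E_p[zeta] = 0

def A(phi):
    # Symmetric (A(phi) = A(-phi)), nonpositive, with A(0) = 0.
    # With this zeta, I(a) = min over the two priors p*(1 +/- zeta),
    # so the resulting functional is monotone.
    return -abs(phi)

def I(a):
    return float(p @ a) + A(float(p @ (zeta * a)))

a = np.array([1.0, 0.0, 1.0, 0.0])          # utility profile of some act
gamma = a.max() + a.min()                   # gamma - a is the complementary profile
print(I(a), I(gamma - a))                   # equal: E_p[zeta*(gamma - a)] = -E_p[zeta*a]
```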
B.4.1. Statement 3 Implies 1
For all a ∈ u◦ F0 , let Jp (a) = a dp and I(a) = Jp (a)+A(Ep [ζa]). Thus, for all f g ∈ F0 , f g iff I(u ◦ f ) ≥ I(u ◦ g). It is easy to verify that I is constantmixture invariant and normalized (because Ep [ζi ] = 0 for all i and A(0) = 0). Furthermore, by part 3 of Definition 1, it is monotonic and hence a niveloid by Proposition 6. This implies that satisfies the first five axioms in statement 1. Furthermore, for all a ∈ u ◦ F0 , letting γ ∈ u(X) be such that γ − a ∈ u ◦ F0 , 1 1ˆ 1ˆ J(a) ≡ γ + I(a) − I(γ − a) 2 2 2 1 1 1 1 = γ + Jp (a) + A(Ep [ζa]) − Jp (γ − a) 2 2 2 2 1 − A Ep [ζ(γ − a)] 2 = Jp (a) as Ep [ζi ] = 0 for all i and A(φ) = A(−φ) for all φ ∈ E (F0 ; p ζ); thus, the functional J defined in Lemma 1 coincides with Jp on u ◦ F0 , and hence it is affine; thus, satisfies Axioms 7 and 8 as well. Moreover, since p is countably additive, if (Ak ) ⊂ Σ decreases to ∅, J(1Ak ) = Jp (1Ak ) ↓ 0, and Lemma 5 implies that I is monotonely continuous, so satisfies Axiom 6. B.4.2. Statement 1 Implies 2 Since satisfies Axioms 1–5, it admits a nondegenerate niveloidal representation I u by Proposition 6. Furthermore, it is w.l.o.g. to assume that 0 ∈ int(u(X)). Moreover, since satisfies Axioms 7 and 8, the functional J defined in Eq. (20) is affine on u ◦ F0 by Lemma 1. Finally, since satisfies Axiom 6, I is monotonely continuous, so Lemma 5 implies that the measure p representing J is countably additive. This will be the baseline prior in the VEU representation. The next step is to construct the adjustment factors (ζi )0≤i
PROOF: By Lemma 3, a dq ≤ 2J(a) for all a ∈ B(Σ) such that a ≥ 0; hence, possibly by considering truncations and taking suprema, |a|2 dq ≤ 2J(|a|2 ) for all Σ-measurable functions a, where one or both integrals may be infinite. In particular, every a ∈ H is also square-integrable with respect to q, so a → a dq is well defined on H. Furthermore, if ak → a in the L2 (p) norm topology (i.e. J(|ak − a|2 ) → 0), then clearly q(|ak − a|2 ) → 0, which implies that q(ak ) → q(a).37 Hence, q(·) is a continuous linear functional on H. By the Riesz–Frechet theorem, there exists aq ∈ H such that q(a) = a aq . I claim that aq can be chosen to be bounded. To this end, for every M > 0, let EM = {ω : aq (ω) > M}. Then M · p(EM ) ≤ 1EM aq dp = q(EM ) ≤ 2p(EM ) where the second inequality follows from Lemma 3. Then either p(EM ) = 0 or M ≤ 2. Therefore, since q is positive, 0 ≤ aq (ω) ≤ 2 p-a.e., so the claim follows. Q.E.D. Now define the set (25)
C = {c ∈ H : ∀q ∈ C q(c) = J(c)}
To interpret, recall that an act f in F0 or Fb is crisp iff λf +(1 −λ)g ∼ λx+(1 − λ)g for all g ∈ F0 , λ ∈ (0 1], and x ∈ X such that x ∼ f . This is equivalent to I(λu ◦ f + (1 − λ)u ◦ g) = λI(u ◦ f ) + I((1 − λ)u ◦ g) and, hence, by Lemma 2, to q(u ◦ f ) = I(u ◦ f ) for all q ∈ C . In particular, this implies J(u ◦ f ) = I(u ◦ f ) by Lemma 3, and so f is crisp iff u ◦ f ∈ C. The definition of the set C employs this characterization of crisp acts to identify a class of functions in H with analogous properties. Conclude by showing that C is closed in H. By Lemma 6, if (c k ) ⊂ C is such that c k → c for some c ∈ H in the L2 (p) norm topology, then J(c) = limk J(c k ) = limk q(c k ) = q(c) for all q ∈ C ; therefore c ∈ C. Construction of the Adjustment Factors (ζi )0≤i
Consider the orthogonal complement NC⊥ = {c ∈ H : ∀b ∈ NC c b = 0}. If c ∈ C, then q(c) = J(c) for all q ∈ C , so in particular c bi = 0 for all i ≥ 0. Therefore, c b = 0 for any b in the linear span of {b0 b1 }, which is the same as the linear span of {ζ0 ζ1 }. Finally, this implies that c b = 0 for all b ∈ NC. Thus, C ⊂ NC⊥ . Conversely, if c ∈ NC⊥ , then in particular c aq − 1Ω = 0 for all q ∈ C , that is, q(c) = J(c); hence, c ∈ C. Thus, conclude that C = NC⊥ . Since 1Ω ∈ C = NC⊥ , 1Ω ζi = 0, that is, Ep [ζi ] = 0 for all i. Henceforth, let n denote the number of nonzero ζi ’s and assume w.l.o.g. that these are the first n elements of the sequence ζ0 ζ1 Construction of the Adjustment Function A: Define first I˜ : u ◦ Fb + C → R ˜ + c) = I(a) + J(c) for all a ∈ u ◦ Fb and c ∈ C. This is well posed: by letting I(a if a + c = a + c for a a ∈ u ◦ Fb and c c ∈ C, then a − a = c − c ∈ C; thus, for all q ∈ C , q(a) = q(a + (a − a )) = q(a ) + q(c − c) = q(a ) + J(c − c), so that I(a) = I(a ) + J(c − c) by Lemma 4. Thus, I(a) + J(c) = I(a ) + J(c − c) + J(c) = I(a ) + J(c ), as needed. Also note that if a ∈ u ◦ Fb , then there exists γ ∈ R such that γ − a ∈ u ◦ Fb and therefore −a ∈ u ◦ Fb + C because C contains all constant functions. Now consider ϕ ∈ E (u ◦ F b ; p ζ), so there is a ∈ u ◦ Fb such that ϕ = Ep [ζa] = (a ζi )i . Then b = i ϕi ζi is the projection of a onto NC, a − b ∈ ˜ ˜ + 12 I(−b). NC⊥ = C, and thus b = a + (b − a) ∈ u ◦ Fb + C. Let A(ϕ) = 12 I(b)
To see that A(·) is well defined, suppose that ϕ = Ep [ζa ] for some a = a in u ◦ Fb . Then b is also the projection of a onto NC and a − b ∈ C, so b = a + ˜ Furthermore, (b − a ) ∈ u ◦ Fb + C; thus, A(·) is well defined because so is I. 0n = Ep [aζ] for a = 0, which is the unique element in NC ∩ C. Thus, A(0n ) = 1 I(0) + 12 I(−0) = 0. Finally, if ϕ = Ep [ζa] for some a 2 ∈ u ◦ Fb and b ∈ NC is the projection of a, then b ∈ u ◦ Fb + C and so −b = i (−ϕi )ζi ∈ u ◦ Fb + C, ˜ ˜ which implies that A(−ϕ) = 12 I(−b) + 12 I(b) = A(ϕ). Finally, verify that the map f → Ep [u ◦ f ] + A(Ep [ζu ◦ f ]) indeed represents preferences. For a ∈ u ◦ Fb , if γ − a ∈ u ◦ Fb , then Ep [a] + A(Ep [ζa]) = J(a) + 1 ˜ ˜ I(a)+ 12 I(−a) = 12 γ + 12 I(a)− 12 I(γ −a)+ 12 I(a)+ 12 I(γ −a)+ 12 J(−γ) = I(a), 2 decomposing −a as (γ − a) + (−γ) with γ − a ∈ u ◦ Fb and −γ ∈ C. This completes the proof. B.4.3. Proof of Corollary 1 By Corollary 4, has a unique extension to Fb that satisfies Axioms 1–5. Clearly, this preference also satisfies Axiom 6, and Lemma 1 shows that it satisfies Axioms 7 and 8 as well. The argument in the preceding subsection actually constructs a VEU representation of the extension of to Fb , which is sharp. B.4.4. Uniqueness Consider two VEU representations (u p n ζ A) and (u p n ζ A ) of , and assume that the former is sharp. By standard arguments, u = αu+β for
some α β ∈ R with α > 0. Consequently, a ∈ u ◦ F0 if and only if αa + β ∈ u ◦ F0 . Next, for every a ∈ u◦ F0 , let I(a) = Ep [a] +A(Ep [ζ · a]); define I similarly using the second VEU representation. By Corollary 3, αI(a) + β = I (αa + β) for every a ∈ u ◦ F0 . If a γ − a ∈ u ◦ F0 , then αa + β α(γ − a) + β ∈ u ◦ F0 , and so, if J and J are the functionals defined from I and I , respectively, as in Eq. (20), then 1 J (αa + β) = α γ + β + 2 1 =α γ+β+ 2 = αJ(a) + β
1 1 I (αa + β) − I (α(γ − a) + β) 2 2 1 1 [αI(a) + β] − [αI(γ − a) + β] 2 2
This implies that linear extensions of J and J to B(Σ) coincide, and so p = p ; hence, (26)
αA(Ep [ζ · a]) = αI(a) + β − αJ(a) − β = I (αa + β) − J (αa + β) = A (Ep [ζ · αa])
for all a ∈ u ◦ F0 , where the last equality uses the fact that Ep [ζj ] = 0 for all 0 ≤ j < n . Now, to define a suitable linear surjection T : E (u ◦ F0 ; p ζ ) → E (u ◦ F0 ; p ζ), suppose that Ep [ζ · αa] = Ep [ζ · αb] for a b ∈ u ◦ F0 . Let γ ∈ R be such that γ − b ∈ u ◦ F0 , so there is f ∈ F0 such that 12 a + 12 (γ − b) = u ◦ f or, equivalently, 12 (αa + β) + 12 [α(γ − b) + β] = αu ◦ f + β = u ◦ f . But then Ep [ζ · u ◦ f ] = Ep [ζ · 12 (a − b)] = 0, which implies that f is crisp.38 Since (u p n ζ A) is sharp, Ep [ζ · u ◦ f ] = 0 and so Ep [ζ · a] = Ep [ζ · b]. Thus, we can define T by letting T (Ep [ζ · αa]) = Ep [ζ · a] for all a ∈ u ◦ F0 . That T is affine and onto is immediate. Finally, if ϕ = Ep [ζ · αa], then A(T (ϕ )) = A(T (Ep [ζ · αa])) = A(Ep [ζ · a]) = α1 A (Ep [ζ · αa]) = α1 A (ϕ ), where the second equality follows from the definition of T and the third equality follows from Eq. (26): thus, A = α1 A ◦ T . Finally, if (u p n ζ A ) is also sharp, assume that Ep [ζ · a] = Ep [ζ · b]: arguing as above, if γ − b ∈ u ◦ F0 and u ◦ f = 12 a + 12 (γ − b), then f is crisp. Since (u p n ζ A ) is sharp, Ep [ζ · u ◦ f ] = 0, that is, Ep [ζ · αa] = Ep [ζ · αb]. Thus, T is a bijection. B.4.5. Proof of Proposition 1 Recall first that, as shown in the proof of uniqueness, Ep [ζ · a] = 0 implies that a is a crisp function (in u ◦ F0 or u ◦ Fb ). 38 For all g ∈ F0 and x ∈ X with f ∼ x, Ep [λu ◦ f + (1 − λ)u ◦ g] + A (Ep [ζ · [λu ◦ f + (1 − λ)u ◦ g]]) = Ep [λu ◦ f + (1 − λ)u ◦ g] + A (Ep [ζ · (1 − λ)u ◦ g]) = Ep [λu (x) + (1 − λ)u ◦ g] + A (Ep [ζ · [λu (x) + (1 − λ)u ◦ g]]).
Part 1. If n = ∞, the statement holds vacuously. Otherwise, observe that Ep [ζ · u ◦ f1 ] Ep [ζ · u ◦ fm ] is a collection of m > n vectors in Rn , so there must be β1 βm ∈ R, not all zero, such that j βj Ep [ζ · u ◦ fj ] = 0. Let β¯ = j |βj | > 0. Now define α1 αm and g1 gm ∈ Fb by letting (a) αj = |βj |/β¯ and (b) gj = fj if βj > 0, and gj be such j − u ◦ fj for that u ◦ gj = γ a suitable γj ∈ R otherwise. Then j αj u ◦ gj = β1¯ j βj u ◦ fj − β1¯ j : βj <0 βj γj . Therefore, by construction, Ep [ζ · j αj u ◦ gj ] = 0, so j αj gj is a crisp combination of f1 fm . Part 2. Suppose that (u p n ζ A) is sharp, so in particular the ζi ’s are orthonormal. Since each ζi is bounded, there exists γ > 0 such that γζi ∈ u ◦ Fb for all i = 0 m − 1; thus, for each such i, let fi ∈ Fb be such that u ◦ fi = γζi . Suppose there exists a crisp combination j αj gj of f0 fm−1 . For j such that gj = fj , suppose that u ◦ gj = γj − u ◦ fj . Also, for all j = 0 m − 1,
let βj = αj if g j = fj and let βj = −α j otherwise. Then, since (u p n ζ A ) is 1 sharp, Ep [ζ · j βj ζj ] = γ Ep [ζ · j αj u ◦ gj ] = 0n , where constants cancel bem−1
cause Ep [ζ ] = 0n . But since ζ0 ζm−1 are orthonormal, Ep [ζi · j=0 βj ζj ] = βi for 0 ≤ i < m − 1, and not all βi ’s are zero: contradiction. Part 3. Suppose that (u p n ζ A ) is another representation of on Fb and, by contradiction, n < n. By part 2, there is a tuple f0 fn that admits no crisp combination; however, by part 1, every tuple of n + 1 elements must contain a crisp combination: contradiction. Thus, n ≥ n. Part 4. If: This part follows from part 2 and the fact that if is not EU, then n > 0. Only if: Since (u p n ζ A) is sharp and n = 1, is not EU. Now suppose ¯ and α are such that both αf + (1 − α)g and αf + (1 − α)g¯ are that f , g, g, crisp. Since the representation is sharp, Ep [ζ · u ◦ [αf + (1 − α)g]] = Ep [ζ · ¯ = 0; hence, for all f¯ such that (f f¯) are complementary, u ◦ [αf + (1 − α)g]] ¯ = 0. This implies also Ep [ζ · u ◦ [αf¯ + (1 − α)g]] = Ep [ζ · u ◦ [αf¯ + (1 − α)g]] that there is a tuple of size m = 2 that admits no crisp combinations, which contradicts part 2. Q.E.D. B.5. Ambiguity Aversion PROOF OF COROLLARY 2: If satisfies Ambiguity Aversion, then I is concave (cf. MMR, p. 28); in particular, if a γ − a ∈ u ◦ F0 , 1 1 1 1 1 γ = I a + (γ − a) ≥ I(a) + I(γ − a) 2 2 2 2 2 1 1 1 a dp + A(Ep [ζ · a]) + γ = 2 2 2
−
1 2
1 a dp + A Ep [ζ · (γ − a)] 2
1 = γ + A(Ep [ζ · a]) 2 and so A is nonpositive. Finally, A is clearly also concave. Conversely, suppose that A is concave (hence, also nonpositive). Then I is concave, so for all f g ∈ F0 with f ∼ g, I(u ◦ [λf + (1 − λ)g]) ≥ I(u ◦ λf ). Q.E.D. PROOF OF PROPOSITION 2: That condition 3 ⇒ 1 is immediate (consider the EU preference determined by p and u). To see that condition 3 ⇔ 2, note ¯ then that if f f¯ are complementary, with 12 f + 12 f¯ ∼ z ∈ X, f ∼ x, and f¯ ∼ x, 1 1 ¯ 1 1 f + 2 f 2 x + 2 x¯ iff 2 1 1 u ◦ f dp + A(Ep [ζ · u ◦ f ]) u(z) ≥ 2 2 1 1 u ◦ f¯ dp + A(Ep [ζ · u ◦ f¯]) + 2 2 = u(z) + A(Ep [ζ · u ◦ f ]) because Ep [ζ · u ◦ f¯] = −Ep [ζ · u ◦ f ] and A is symmetric; hence, the required ranking obtains iff A(Ep [ζ · u ◦ f ]) ≤ 0. Turn now to condition 1 ⇒ 3. Suppose that is more ambiguity-averse than some EU preference relation . By Corollary B.3 in Ghirardato, Maccheroni, and Marinacci (2004), one can assume that is represented by the nonconstant utility u on X. Arguing by contradiction, suppose that there is f ∈ F0 such that A(Ep [ζ · u ◦ f ]) > 0. Let γ ∈ R be such that γ − u ◦ f ∈ B0 (Σ u(X)) and let f¯ ∈ F0 be such that u ◦ f¯ = γ − u ◦ f . Then A(Ep [ζ · u ◦ f¯]) = A(Ep [ζ · u ◦ f ]) > 0. Furthermore, 12 u ◦ f + 12 u ◦ f¯ = u ◦ ( 12 f + 12 f¯) = 12 γ, which implies A(Ep [ζ · u ◦ ( 12 f + 12 f¯)]) = A(0) = 0. If now f ∼ x and f¯ ∼ x¯ for x x¯ ∈ X, ¯ = 12 γ + A(Ep [ζ · u ◦ f ]) > 12 γ, so 12 x + 12 x¯ 12 f + 12 f¯. Now then 12 u(x) + 12 u(x) let z ∈ X be such that 12 f (ω) + 12 f¯(ω) ∼ z for all ω; then 12 x + 12 x¯ z, so 1 ¯ and since is an x + 12 x¯ z. But f ∼ x and f¯ ∼ x¯ imply f x and f¯ x, 2 1 1 ¯ 1 1 1
1 ¯ hence, z 2 x + 2 x, ¯ a contradiction. EU preference, 2 f + 2 f 2 x + 2 x; To see that condition 3 ⇔ 4, consider first the following claim. CLAIM 2: For a complementary pair (f f¯) such that f ∼ f¯, 12 f + 12 f¯ ∼ z f iff A(Ep [ζ · u ◦ f ]) ≤ 0. PROOF: To prove this claim, let 12 f + 12 f¯ ∼ z ∈ X. Then, since f ∼ f¯ and these acts have the same adjustments, u ◦ f dp = u ◦ f¯ dp, so both integrals
equal u(z). Therefore, (1/2)f + (1/2)f̄ ∼ z ≽ f if and only if u(z) ≥ u(z) + A(E_p[ζ · u ◦ f]) = ∫ u ◦ f dp + A(E_p[ζ · u ◦ f]). Q.E.D.
Claim 2 immediately shows that condition 3 implies 4. For the converse, assume that Axiom 11 holds and consider the cases (a) satisfies Certainty Independence or (b) u(X) is unbounded. In case (a), then I is positively homogeneous, so if ϕ = Ep [ζ · a] for some a ∈ B(Σ u(X)) and α > 0, then ˆ ˆ A(αϕ) = I(αa) − J(αa) = α[I(a) − J(a)] = αA(ϕ); that is, A is also positively homogeneous. In this case, it is w.l.o.g. to assume that u(X) ⊃ [−1 1] and prove the result for f ∈ F0 such that u ◦ f ≤ 13 . This ensures the existence of f¯ ∈ F0 such that u ◦ f¯ = −u ◦ f, as well as g g¯ ∈ F0 such that u ◦ g = u ◦ f − u ◦ f dp and u ◦ g¯ = u ◦ f¯ − u ◦ f¯ dp = −u ◦ g. By construc¯ are complementary and g ∼ g, ¯ because u ◦ g dp = u ◦ g¯ dp = 0. tion, (g g) Claim 2 implies that A(Ep [ζ · u ◦ f ]) = A(Ep [ζ · u ◦ g]) ≤ 0, as required. In case (b), suppose u(X) is unbounded below (the other case is treated analogously). Consider f ∈ F0 and construct f¯ ∈ F0 such that u ◦ f¯ = min u ◦ f (Ω) + max u ◦ f (Ω) − f . Then f and f¯ are complementary. If f ∼ f¯, then Claim 2 suffices to prove the result. Otherwise, let δ = u ◦f dp − u ◦ f¯ dp. If δ > 0, consider f ∈ F0 such that u ◦ f = u ◦ f − δ: then u ◦ f dp = u ◦ f¯ dp, and f and f¯ are complementary, so f ∼ f¯ and Claim 2 implies that A(Ep [ζ · u ◦ f ]) = A(Ep [ζ · u ◦ f ]) ≤ 0. If instead δ < 0, consider f such that u ◦ f = f¯ − δ, so again f ∼ f and Claim 2 can be invoked to yield the required conclusion. Q.E.D. PROOF OF PROPOSITION 3: Statement 2 ⇒ 1 is immediate, so focus on statement 1 ⇒ 2. Since constant acts are complementary, assume w.l.o.g. that u1 = u2 ≡ u; it is also w.l.o.g. to assume that 0 ∈ int(X). Next, consider a ∈ u ◦ F0 such that −a ∈ u ◦ F0 and let f f¯ be such that a = u ◦ f and −a = u ◦ f¯. Then, by the properties of the VEU representation, f 1 f¯ iff f 2 f¯ is equivalent to Ep1 [a] ≥ 0 iff Ep2 [a] ≥ 0. By positive homogeneity, this is true for all a ∈ B0 (Σ); in particular, Ep1 [a − Ep1 [a]] = 0, so Ep2 [a − Ep1 [a]] = 0, that is, Ep1 [a] = Ep2 [a] for all a ∈ B0 (Σ) and the claim follows. Now suppose that statements 1 and 2 hold, and that the VEU representations under consideration are sharp. Then an act f is crisp for j iff Ep [ζ j · u ◦ f ] = 0. Thus, if ζ 1 = ζ 2 , 1 and 2 admit the same crisp acts. Conversely, suppose 1 and 2 admit the same crisp acts; then, for all a ∈ u ◦ F0 , Ep [ζ 1 · a] = 0 iff Ep [ζ 2 · a] = 0, and by positive homogeneity the same is true for all a ∈ B0 (Σ). Therefore, if Ep [ζ 1 · a] = Ep [ζ 1 · b] for a b ∈ u ◦ F0 , then also Ep [ζ 2 · a] = Ep [ζ 2 · b] and the converse implication also holds. Hence, we can define A¯ 2 : E (u ◦ F0 ; p ζ 1 ) → R by A¯ 2 (Ep [ζ 1 · a]) = A2 (Ep [ζ 2 · a]) to get a new VEU representation (u p n1 ζ 1 A¯ 2 ) for 2 . Q.E.D.
PROOF OF PROPOSITION 4: Suppose that 1 is more ambiguity-averse than 2 . Pick f ∈ F0 and let x ∈ X be such that u(x) = Ep [u ◦ f ] + A1 (Ep [ζ 1 · u ◦ f ]) Then f 2 x and therefore Ep [u ◦ f ] + A2 (Ep [ζ 2 · u ◦ f ]) ≥ u(x) = Ep [u ◦ f ] + A1 (Ep [ζ 1 · u ◦ f ]) which yields the required inequality. Conversely, suppose A1 (Ep [ζ 1 · u ◦ f ]) ≤ A2 (Ep [ζ 2 · u ◦ f ]) for all f ∈ F0 . Then, for all x ∈ X, f 1 x implies Ep [u ◦ f ] + A2 (Ep [ζ 2 · u ◦ f ]) ≥ Ep [u ◦ f ] + A1 (Ep [ζ 1 · u ◦ f ]) ≥ u(x) that is, f 2 x, as required. The final claim is immediate.
Q.E.D.
B.6. Updating For a b ∈ u ◦ F0 , let aEb ∈ u ◦ F0 be the function that equals a on E and equals b elsewhere. PROOF OF REMARK 1: Only if: It will be shown that, for any event E ∈ Σ, p(E) = 0 implies I(a) = I(b) for all a b ∈ u ◦ F0 such that a(ω) = b(ω) for ω ∈ / E. To see this, assume w.l.o.g. that I(a) ≥ I(b) and let α = max{max a(Ω) max b(Ω)} and β = min{min a(Ω) min b(Ω)}. Then monotonicity implies that I(αEa) ≥ I(a) ≥ I(b) ≥ I(βEb) = I(βEa). Thus, it is sufficient to show that I(αEa) = I(βEa). This is immediate if α = β, so assume α > β. Since p(E) = 0, Ep [αEa] = Ep [1Ω\E a] = Ep [βEa], so if I(αEa) > I(βEa), it must be the case that A(Ep [ζ · αEa]) > A(Ep [ζ · βEa]). Letting γ = α + β, as usual γ − αEa γ − βEa ∈ u ◦ F0 . Now I(γ − αEa) = Ep [[γ − αEa]] + A Ep [ζ · [γ − αEa]]
= Ep 1Ω\E [γ − a] + A(−Ep [ζ · αEa])
= Ep 1Ω\E [γ − a] + A(Ep [ζ · αEa])
> Ep 1Ω\E [γ − a] + A(Ep [ζ · βEa])
= Ep 1Ω\E [γ − a] + A Ep [ζ · [γ − βEa)]
= Ep [γ − βEa] + A Ep [ζ · [γ − βEa)] = I(γ − βEa) which is a violation of monotonicity, as γ − α = β < α = γ − β. If: Suppose that p(E) > 0 and fix x y ∈ X with x y. If xEy y, we are done. Otherwise, note that xEy ∼ y, that is, [u(x) − u(y)]p(E) + A([u(x) − u(y)]Ep [ζ · 1E ]) = 0, implies
A [u(x) − u(y)]Ep ζ · 1Ω\E = A [u(x) − u(y)]Ep [ζ · 1E ] = −[u(x) − u(y)]p(E); hence, I(u ◦ yEx) = u(y) + p(Ω \ E)[u(x) − u(y)]
+ A [u(x) − u(y)]Ep ζ · 1Ω\E = u(y) + p(Ω \ E)[u(x) − u(y)] − [u(x) − u(y)]p(E) = u(y) + [u(x) − u(y)][p(Ω \ E) − p(E)] < u(x) because p(Ω \ E) − p(E) = 1 − 2p(E) < 1 as p(E) > 0. Thus, x yEx and again Axiom 12 holds. Q.E.D. PROOF OF PROPOSITION 5: Since E is not null, p(E) > 0, so p(·|E) is well defined. CLAIM 3: If (f f¯) are complementary and constant on Ω \ E, then 1 1 1 1 f + f¯(ω) ∼ f¯ + f (ω) 2 2 2 2 holds if and only if u(f (ω)) = Ep [u ◦ f ] = Ep [u ◦ f |E] for all ω ∈ Ω \ E. PROOF: Let γ ∈ R be such that 12 γ = 12 u(f (ω)) + 12 u(f¯(ω)) for all ω ∈ Ω. Also let α = u(f (ω)) and β = u(f¯(ω)) for any (hence all) ω ∈ Ω \ E. Then u ◦ f¯ = γ − u ◦ f and β = γ − α. Thus, for ω ∈ Ω \ E, 1 1 ¯ I u◦ f + f (ω) 2 2 1 1 1 = Ep [u ◦ f ] + β + A Ep [ζ · u ◦ f ] 2 2 2 1 1 1 1 = Ep [u ◦ f ] + γ − α + A Ep [ζ · u ◦ f ] 2 2 2 2
and
1 ¯ 1 f + f (ω) I u◦ 2 2
1 1 1 = Ep [u ◦ f¯] dp + α + A Ep [ζ · u ◦ f¯] 2 2 2 1 1 1 = γ − Ep [u ◦ f ] + α + A Ep [ζ · u ◦ f ] 2 2 2
where the last equality uses the fact that Ep [ζ · u ◦ f¯] = −Ep [ζ · u ◦ f ] and A is symmetric. Hence, 12 f + 12 f¯(ω) ∼ 12 f¯ + 12 f (ω) holds if and only if α = Ep [u ◦ f ]. Furthermore, Ep [u ◦ f ] = Ep [u ◦ f · 1E ] + αp(Ω \ E), so it follows Q.E.D. that α = Ep [u ◦ f |E] as well. Next, note that the adjustment factors ζE = (ζiE )0≤i
= Ep ζ · aE(Ep [a|E]) where the last equality follows from −Ep [ζi · 1E ] = Ep [ζi · 1Ω\E ]. To show that (u p n ζE A) is a VEU representation, it is sufficient to verify monotonicity. Observe that for a b ∈ u ◦ F0 , a ≥ b implies that Ep [a|E] ≥ Ep [b|E], and hence aE(Ep [a|E]) ≥ bE(Ep [b|E]). Since (u p n ζ A) is a VEU representation, Ep [aE(Ep [a|E])] + A(Ep [ζ · aE(Ep [a|E])]) ≥ Ep [bE(Ep [b|E])] + A(Ep [ζ · bE(Ep [b|E])), that is, by Eq. (27), Ep [a|E] + A(Ep [ζE · a|E]) ≥ Ep [b|E] + A(Ep [ζE · b|E]), as required. Now suppose part 1 holds. Fix f g f¯ g¯ ∈ F0 as in Axiom 14. By Claim 3, u ◦ f (ω) = Ep [u ◦ f |E] = Ep [u ◦ f ] and u ◦ g(ω) = Ep [u ◦ g|E] = Ep [u ◦ g] for all ω ∈ Ω \ E. Then the axiom implies that f E g iff f g, that is, iff Ep [u ◦ f ] + A(Ep [ζ · u ◦ f ]) ≥ Ep [u ◦ g] + A(Ep [ζ · u ◦ g]) ⇔ Ep [u ◦ f |E] + A Ep ζ · u ◦ f E(Ep [u ◦ f |E]) ≥ Ep [u ◦ g|E] + A Ep ζ · u ◦ gE(Ep [u ◦ g|E]) ⇔
Ep [u ◦ f |E] + A(Ep [ζE · u ◦ f |E]) ≥ Ep [u ◦ g|E] + A(Ep [ζE · u ◦ g|E])
If now f g ∈ F0 are arbitrary, let x y ∈ X be such that u(x) = Ep [u ◦ f |E] and u(y) = Ep [u ◦ g|E]. Notice that then Ep [u ◦ f Ex] = Ep [u ◦ f Ex|E] = u(x), and
similarly for gEy. Finally, let f and g be such that (f Ex f ) and (gEy g ) are complementary; notice that this requires that f and g be constant on Ω \ E. Then, by Claim 3, the acts f Ex, f , gEy, and g satisfy the assumptions of Axiom 14, and the argument just given shows that then f Ex E gEy iff Ep [u ◦ f |E]+A(Ep [ζE ·u◦f |E]) ≥ Ep [u◦g|E]+A(Ep [ζE ·u◦g|E]). But by Axiom 13, f Ex E gEy iff f E g, so part 2 holds. In the opposite direction, assume that part 2 holds. It is then immediate that Axiom 13 is satisfied. Now assume that f , g, f¯, and g¯ are as in Axiom 14. Then Claim 3 shows that u(f (ω)) = Ep [u ◦ f |E] and u(g(ω)) = Ep [u ◦ g|E] for all ω ∈ Ω \ E, so Ep [u ◦ f |E] + A(Ep [ζE · u ◦ f |E]) = p(E)Ep [u ◦ f |E] + p(Ω \ E)u(f (ω)) + A Ep p(E)(ζ − Ep [ζ|E])u ◦ f |E
= Ep [u ◦ f ] + A Ep [ζ1E u ◦ f ] + Ep ζ1Ω\E Ep [u ◦ f |E] = Ep [u ◦ f ] + A(Ep [ζu ◦ f ]) and similarly for g, so Axiom 14 holds.
Q.E.D.
Conclude by verifying that the “law of iterated conditioning” holds: with notation as in Section 4.4, ζiEF = p(F|E) · [ζiE − Ep [ζiE |F]] = p(F|E)
· p(E) · (ζi − Ep [ζi |E]) − Ep p(E)(ζi − Ep [ζi |E])|F = p(F)ζi − p(F)Ep [ζi |E] − p(F)Ep [ζi |F] + p(F)Ep [ζi |E] = ζiF REFERENCES ALIPRANTIS, C., AND K. BORDER (1994): Infinite Dimensional Analysis. Berlin: Springer Verlag. [808] ANSCOMBE, F. J., AND R. J. AUMANN (1963): “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34, 199–205. [806,809] ARROW, K. J. (1974): Essays in the Theory of Risk-Bearing. Amsterdam: North-Holland. [809] BAILLON, A., O. L’HARIDON, AND L. PLACIDO (2008): “Machina’s Collateral Falsifications of Models of Ambiguity Attitude,” Draft Version, FUR Conference. [805,818] BEN-PORATH, E., AND I. GILBOA (1994): “Linear Measures, the Gini Index, and the IncomeInequality Trade-Off,” Journal of Economic Theory, 64, 443–467. [828] BICKEL, P., AND E. L. LEHMANN (1976): “Descriptive Statistics for Nonparametric Models. III. Dispersion,” Annals of Statistics, 4, 1139–1158. [804] BOGACHEV, V. I. (2007): Measure Theory, Vol. I. Berlin: Springer Verlag. [842]
BOSE, S., E. OZDENOREN, AND A. PAPE (2006): “Optimal Auctions With Ambiguity,” Theoretical Economics, 1, 411–438. [804] CASADESUS-MASANELL, R., P. KLIBANOFF, AND E. OZDENOREN (2000): “Maxmin Expected Utility Over Savage Acts With a Set of Priors,” Journal of Economic Theory, 92, 33–65. [806] CHATEAUNEUF, A., AND J. TALLON (2002): “Diversification, Convex Preferences and Non-Empty Core in the Choquet Expected Utility Model,” Economic Theory, 19, 509–523. [804,817] CHATEAUNEUF, A., J. EICHBERGER, AND S. GRANT (2007): “Choice Under Uncertainty With the Best and Worst in Mind: Neo-Additive Capacities,” Journal of Economic Theory, 137, 538–567. [829] CHATEAUNEUF, A., M. MARINACCI, F. MACCHERONI, AND J.-M. TALLON (2005): “Monotone Continuous Multiple-Priors,” Economic Theory, 26, 973–982. [809] COCHRANE, J. H. (2001): Asset Pricing. Princeton, NJ: Princeton University Press. [803] DUDLEY, R. (1989): Real Analysis and Probability. Belmont, CA: Wadsworth & Brooks/Cole. [814, 839,843] EINHORN, H. J., AND R. M. HOGARTH (1985): “Ambiguity and Uncertainty in Probabilistic Inference,” Psychological Review, 92, 433–461. [802] (1986): “Decision Making Under Ambiguity,” Journal of Business, 59, S225–S250. [802] ELLSBERG, D. (1961): “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669. [801,802] EPSTEIN, L. (1985): “Decreasing Risk Aversion and Mean–Variance Analysis,” Econometrica, 53, 945–962. [827] (1999): “A Definition of Uncertainty Aversion,” Review of Economic Studies, 66, 579–608. [819] EPSTEIN, L. G., AND M. SCHNEIDER (2007): “Learning Under Ambiguity,” Review of Economic Studies, 74, 1275–1303. [804] EPSTEIN, L. G., AND J. ZHANG (2001): “Subjective Probabilities on Subjectively Unambiguous Events,” Econometrica, 69, 265–306. [802,803] ERGIN, H., AND F. GUL (2009): “A Theory of Subjective Compound Lotteries,” Journal of Economic Theory (forthcoming). [801] GAJDOS, T., T. HAYASHI, J.-M. TALLON, AND J.-C. VERGNAUD (2008): “Attitude Toward Imprecise Information,” Journal of Economic Theory, 140, 27–65. [829] GAJDOS, T., J. TALLON, AND J. VERGNAUD (2004a): “Decision Making With Imprecise Probabilistic Information,” Journal of Mathematical Economics, 40, 647–681. [829] (2004b): “Coping With Ignorance: A Decision-Theoretic Approach,” Working Paper 2004-14, INSEE. [829] GHIRARDATO, P., AND J. KATZ (2006): “Indecision Theory: Weight of Evidence and Voting Behavior,” Journal of Public Economic Theory, 8, 379–399. [804] GHIRARDATO, P., AND M. MARINACCI (2002): “Ambiguity Made Precise: A Comparative Foundation,” Journal of Economic Theory, 102, 251–289. [805,816,817] GHIRARDATO, P., F. MACCHERONI, AND M. MARINACCI (2004): “Differentiating Ambiguity and Ambiguity Attitude,” Journal of Economic Theory, 118, 133–173. [801,808,809,811,837,847] GHIRARDATO, P., F. MACCHERONI, M. MARINACCI, AND M. SINISCALCHI (2003): “A Subjective Spin on Roulette Wheels,” Econometrica, 71, 1897–1908. [806] GILBOA, I., AND D. SCHMEIDLER (1989): “Maxmin Expected Utility With a Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. [801,809,816,825] (1993): “Updating Ambiguous Beliefs,” Journal of Economic Theory, 59, 33–49. [822] GRANT, S., AND A. KAJII (2007): “The Epsilon-Gini-Contamination Multiple Priors Model Admits a Linear-Mean-Standard-Deviation Utility Representation,” Economics Letters, 95, 39–47. [827] GRANT, S., AND B. POLAK (2007): “Generalized Variational Preferences,” paper presented at the Australian Economic Theory Workshop, Canberra. [828-830] HANSEN, L., AND T. 
SARGENT (2001): “Robust Control and Model Uncertainty,” The American Economic Review, 91, 60–66. [801,828,830]
HANSEN, L., T. SARGENT, AND T. TALLARINI (1999): “Robust Permanent Income and Pricing,” Review of Economic Studies, 66, 873–907. [801,828] HOGARTH, R. M., AND H. J. EINHORN (1990): “Venture Theory: A Model of Decision Weights,” Management Science, 36, 780–803. [802] KECHRIS, A. (1995): Classical Descriptive Set Theory. New York: Springer Verlag. [806] KLIBANOFF, P. (2001): “Characterizing Uncertainty Aversion Through Preference for Mixtures,” Social Choice and Welfare, 18, 289–301. [816] KLIBANOFF, P., M. MARINACCI, AND S. MUKERJI (2005): “A Smooth Model of Decision Making Under Ambiguity,” Econometrica, 73, 1849–1892. [801,818,826,830] KOPYLOV, I. (2006): “A Parametric Model of Hedging Under Ambiguity,” Mimeo, UC Irvine. [829] L’HARIDON, O., AND L. PLACIDO (2009): “Betting on Machina’s Reflection Example: An Experiment on Ambiguity,” Theory and Decision (forthcoming). [818] MACCHERONI, F., M. MARINACCI, AND A. RUSTICHINI (2006): “Ambiguity Aversion, Robustness, and the Variational Representation of Preferences,” Econometrica, 74, 1447–1498. [801,805, 807,809,811,813,816,825,828,833] MACHINA, M. (2009): “Risk, Ambiguity, and the Rank-Dependence Axioms,” American Economic Review (forthcoming). [805,817,818] MACHINA, M. J., AND D. SCHMEIDLER (1992): “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780. [819,830] MUKERJI, S. (1998): “Ambiguity Aversion and Incompleteness of Contractual Form,” American Economic Review, 88, 1207–1231. [804] NAU, R. (2006): “Uncertainty Aversion With Second-Order Utilities and Probabilities,” Management Science, 52, 136. [801] PIRES, C. P. (2002): “A Rule for Updating Ambiguous Beliefs,” Theory and Decision, 53, 137–152. [821] QUIGGIN, J., AND R. CHAMBERS (1998): “Risk Premiums and Benefit Measures for GeneralizedExpected-Utility Theories,” Journal of Risk and Uncertainty, 17, 121–137. [827] (2004): “Invariant Risk Attitudes,” Journal of Economic Theory, 117, 96–118. [827,828] ROBERTS, K. (1980): “Interpersonal Comparability and Social Choice Theory,” Review of Economic Studies, 47, 421–439. [827,828] SAFRA, Z., AND U. SEGAL (1998): “Constant Risk Aversion,” Journal of Economic Theory, 83, 19–42. [827] SCHMEIDLER, D. (1989): “Subjective Probability and Expected Utility Without Additivity,” Econometrica, 57, 571–587. [801,804-806,816,826,827] SEIDENFELD, T., AND L. WASSERMAN (1993): “Dilation for Sets of Probabilities,” The Annals of Statistics, 21, 1139–1154. [823] SINISCALCHI, M. (2001): “Vector-Adjusted Expected Utility,” Working Papers in Economic Theory 01S3, Princeton University. [801] (2007): “Vector Expected Utility and Attitudes Toward Variation,” Mimeo, Northwestern University. Available at http://faculty.wcas.northwestern.edu/~msi661. [805,808,827] (2008): “Machina’s Reflection Example and VEU Preferences: A Very Short Note,” Mimeo, Northwestern University. Available at http://faculty.wcas.northwestern.edu/~msi661. [818] (2009): “Supplement to ‘Vector Expected Utility and Attitudes Toward Variation’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/ Supmat/7564_Proofs.pdf. [805] STINCHCOMBE, M. (2003): “Choice and Games With Ambiguity as Sets of Probabilities,” Mimeo, University of Texas at Austin. [829] STROTZ, R. (1955–1956): “Myopia and Inconsistency in Dynamic Utility Maximization,” Review of Economic Studies, 23, 165–180. [822] STRZALECKI, T. (2007): “Axiomatic Foundations of Multiplier Preferences,” Mimeo, Northwestern University. [828]
TVERSKY, A., AND D. KAHNEMAN (1974): “Judgement Under Uncertainty: Heuristics and Biases,” Science, 185, 1124–1131. [802] WANG, T. (2003): “A Class of Multi-Prior Preferences,” Mimeo, University of British Columbia. [828,829] YITZHAKI, S. (1982): “Stochastic Dominance, Mean Variance, and Gini’s Mean Difference,” The American Economic Review, 72, 178–185. [804,827]
Dept. of Economics, Northwestern University, 302 Andersen Hall, 2003 Sheridan Road, Evanston, IL 60208-2600, U.S.A.; [email protected]. Manuscript received November, 2007; final revision received December, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 857–908
OPTIMAL STOPPING WITH MULTIPLE PRIORS BY FRANK RIEDEL1,2 We develop a theory of optimal stopping under Knightian uncertainty. A suitable martingale theory for multiple priors is derived that extends the classical dynamic programming or Snell envelope approach to multiple priors. We relate the multiple prior theory to the classical setup via a minimax theorem. In a multiple prior version of the classical model of independent and identically distributed random variables, we discuss several examples from microeconomics, operation research, and finance. For monotone payoffs, the worst-case prior can be identified quite easily with the help of stochastic dominance arguments. For more complex payoff structures like barrier options, model ambiguity leads to stochastic changes in the worst-case beliefs. KEYWORDS: Optimal stopping, ambiguity, uncertainty aversion, robustness.
1. INTRODUCTION OPTIMAL STOPPING PROBLEMS arise frequently in economics, finance, operations research, and statistics. Let us consider four typical examples. In economics, a classic problem is the entry decision of a firm.3 Upon entering the market, the firm pays a fixed cost and receives in return a stochastic dividend from that time onward. The managers aim to find a decision rule that maximizes expected net present value. The standard approach to this problem assumes that the joint distribution of future dividends is known to the firm. In finance, we encounter the optimal stopping problem when pricing, hedging, and looking for exercising rules for American options.4 The buyer of such an option wants to maximize the expected gain from exercising the option. In this case, he faces a similar problem as the firm above. The seller of the option looks for a (no-arbitrage) price and a (super)hedging portfolio. In complete markets, the no-arbitrage price of an American option is the value of the optimal stopping problem under the (unique) equivalent martingale measure. In both cases, one looks for an optimal stopping time under one given probability distribution. 1
With Appendices by Tatjana Chudjakow and Jörg Vorbrink. I thank Tatjana Chudjakow, Daniel Engelage, Larry Epstein, the referees, and the co-editor as well as seminar audiences in Beijing, Chinese Academy of Sciences, Bocconi University, Paris I, Kaiserslautern, ESEM 2008 Milano, and “Theoretischer Ausschuss des Vereins für Socialpolitik” for comments. Financial support through the German Research Foundation, Grant Ri 1128-3-1 and the International Graduate College “Stochastics and Real World Models” Beijing–Bielefeld is gratefully acknowledged. 3 See Dixit and Pindyck (1994), for example. 4 See Duffie (1992), for example. 2
© 2009 The Econometric Society
DOI: 10.3982/ECTA7594
In operations research, textbooks5 look at a typical real estate agent who collects bids for the purchase of a house. Here, one assumes, for example, that the bids are independent and identically distributed with a known distribution. One then determines a sequence of critical thresholds above which the agent sells. In sequential statistics, the famous Wald test6 determines the optimal sampling size for a sequential experiment to enable one to distinguish between two simple hypotheses. Part of the solution is an optimal stopping rule that determines the optimal sampling size based on past observations. The four examples share a common assumption: there is one and only one distribution of payoffs, and this distribution is perfectly known to the agent. This assumption is too strong in a number of cases. The firm might face Knightian uncertainty in the sense that the distribution of future payoffs is not (exactly) known. Even if we invoke Savage’s theory of subjective probability, there is by now sufficient evidence that rational agents as well as real persons use different approaches. The current literature on ambiguity aversion thus uses multiple prior models to account for this uncertainty.7 As far as American options are concerned, markets can be incomplete. In this case, there is more than one equivalent martingale measure, and we are again in a multiple prior framework. Alternatively, one might want to assess the riskiness of an option by studying optimal stopping problems under coherent risk measures. Due to the representation theorem for such risk measures, we again have a multiple prior setting.8 In operations research and macroeconomics,9 one typically asks for the robustness of the optimal rules derived under the assumption of a known distribution. To study this, one solves the optimization problem for a whole class of priors that are close to or suitable perturbations of the original problem under the assumption that the decision maker is playing a game against malevolent nature.10 In sequential statistics, we encounter multiple priors when we move from simple to composite hypotheses. Motivated by the above problems, we study and solve in this paper optimal stopping problems in a multiple prior framework. We develop a theory of optimal stopping along the classical lines using and extending suitable results from martingale theory. This approach works as long as the set of priors is 5
See Porteus (1990), for example. See Wald (1947). 7 Gilboa and Schmeidler (1989) axiomatized multiple prior expected utility. Epstein and Schneider (2003b) provided the extension to dynamic settings. 8 See Artzner, Delbaen, Eber, and Heath (1999) on coherent risk measures. Riedel (2004) axiomatized time-consistent coherent risk measures. 9 See Hansen and Sargent (2001). 10 One can quantify this closeness by taking small balls around the original distribution in the variational topology, for example. Alternatively, one can penalize the distance from the original distribution by some cost function—relative entropy, for instance. We do not pursue this here. 6
time-consistent.11 When the horizon is finite, backward induction leads to the optimal solution as in the Bayesian case. One can thus compute easily the value function of the problem, the generalized Snell envelope of the payoff sequence. An optimal stopping rule is, as in the classical case, to stop when the payoff from stopping is equal to the Snell envelope. The proof of these results is not completely straightforward, though. The classical theory of optimal stopping relies strongly on martingale theory. A martingale is the probabilistic model of a fair game against nature. By definition, (conditional) expected gains from a martingale are zero. A supermartingale is an unfair game against nature where (conditional) expected gains are negative. Every supermartingale can be decomposed into a martingale and a sum of fees that are to be paid in advance (the Doob decomposition). A riskneutral agent would not pay a fee for an expected gain of zero; by backward induction, this carries over to supermartingales and it is optimal to stop immediately when the payoff process is a supermartingale. This reasoning is a version of the so-called optional sampling theorem, which says that the expected gain from stopping a supermartingale is negative; in other words, you cannot beat an unfair game by smart stopping. Snell (1952) has shown that the value process of an optimal stopping problem is a supermartingale, and is a martingale as long as stopping is not optimal.12 It then follows that it is optimal to stop when the current payoff is equal to the value process, because the optional sampling theorem and the supermartingale property of the value process imply that one cannot expect a higher gain by waiting any longer. The key to extend these results to multiple priors is to develop the first steps of a theory of multiple prior martingales. We do this in Section 5. Intuitively, we have to understand what a fair game is in the eyes of an uncertainty-averse, or pessimistic, agent. We define a multiple prior martingale (Mt ) by extending the usual martingale property to the nonlinear multiple prior expectation operator, or Mt = ess inf EP [Mt+1 |Ft ] P∈Q
11 When leaving the Bayesian world, one easily runs into dynamic inconsistencies (Sarin and Wakker (1998), Machina (1989), Yoo (1991), Eichberger and Kelsey (1996)). In the current setting, we also give an example where the naive choice of two priors leads to dynamically inconsistent stopping decisions (Example 3 in Appendix D). The work of Epstein and Schneider (2003b) shows how to overcome this difficulty. The set of priors must satisfy a certain dynamic consistency condition that they call rectangularity. This property appears in other decision–theoretic contexts as well; see, for example, Delbaen (2002b), Riedel (2004), or Föllmer and Schied (2004). It has also been called stability under pasting or time-consistency. We use the latter name here. 12 This is a version of the principle of dynamic programming.
where Q is the set of time-consistent priors.13 Similarly, a multiple prior supermartingale (S_t) satisfies S_t ≥ ess inf_{P∈Q} E_P[S_{t+1} | F_t].
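For concreteness, the following sketch illustrates these definitions numerically (illustrative only: the binomial tree, the interval of one-step-ahead up-probabilities, and all names are ours, not the paper's). The multiple prior conditional expectation is computed as the pointwise minimum of the one-step conditional expectations over the priors, and the supermartingale inequality is checked node by node.

```python
import numpy as np

# Illustrative binomial tree with ambiguity about the "up" probability.
# P_UP is the set of one-step-ahead up-probabilities; using the same set at
# every node makes the induced set of priors time-consistent by construction.
P_UP = np.linspace(0.4, 0.6, 21)

def multiple_prior_expectation(v_up, v_down):
    """ess inf over priors of the one-step conditional expectation."""
    return min(p * v_up + (1 - p) * v_down for p in P_UP)

def is_multiple_prior_supermartingale(values):
    """values[t][k] is the value at time t after k 'up' moves."""
    for t in range(len(values) - 1):
        for k in range(len(values[t])):
            cond = multiple_prior_expectation(values[t + 1][k + 1], values[t + 1][k])
            if values[t][k] < cond - 1e-12:
                return False
    return True

# Example: an additive random walk that moves up by +1 or down by -1.
T = 3
walk = [np.array([2 * k - t for k in range(t + 1)], dtype=float) for t in range(T + 1)]
print(is_multiple_prior_supermartingale(walk))   # True: worst-case drift is negative
```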
We show that a multiple prior martingale is a submartingale for all probability measures P ∈ Q and a martingale for some (worst-case) probability measure P ∈ Q. In other words, an uncertainty-averse agent considers a game against nature as fair if it is a favorable game under all priors and a fair game under the worst prior. We also show that the existence of this worst-case measure requires timeconsistency of the set of priors. The worst-case measure is constructed by backward induction: in a tree, one chooses at every node the worst conditional onestep-ahead probability. The worst-case prior is obtained by pasting together all these one-step-ahead probabilities and the worst-case marginal at time 0. Time-consistency (or rectangularity or m-stability) is needed to ensure that the prior constructed in this way belongs to the original set of priors. Two key results from classical martingale theory hold true for multiple prior supermartingales: the Doob decomposition and the optional sampling theorem. The Doob decomposition states that a multiple prior supermartingale can be written as a multiple prior martingale minus a predictable, increasing process that starts at 0. While the proof is a copy of the original proof, it is noteworthy that we do not have here a uniform decomposition for the class of priors as one obtains it in the optional decomposition theorem (Kramkov (1996)). There, one aims to write a uniform supermartingale as a uniform martingale minus some optional increasing process. Multiple prior martingales are not uniform martingales, in general; thus, the type of decomposition is quite different. The second theorem that plays a key role here is the preservation of the multiple prior supermartingale property under stopping, the so-called optional sampling theorem. This theorem is a typical systems theorem. It says that if you play an unfair game against nature, then you cannot “beat the system” no matter what stopping rule you use. Formally, this means that if we start with a multiple prior supermartingale, then also the stopped process remains a multiple prior supermartingale. Having established these two key theorems, one can proceed as in the classical literature (Chow, Robbins, and Siegmund (1971), Snell (1952)). We show that the value function which one defines by backward induction (Bellman principle) is the smallest multiple prior supermartingale that dominates the payoff process. As long as stopping is not optimal, the value function is a 13 The operator ess inf denotes the essential infimum for a family of random variables. It is the “almost sure” version of an infimum replacing the usual infimum in probabilistic settings.
multiple prior martingale. In discrete time, optimal stopping times are usually not unique. The mentioned rule—to stop whenever the payoff equals the value process—is the smallest optimal stopping time. The multiple prior Doob decomposition allows one to determine the largest optimal stopping time as well. We also obtain a duality result that was first obtained by Föllmer and Schied (2004) and Karatzas and Kou (1998) with different methods. The multiple prior value function is the lower envelope of all Bayesian value functions. Under our assumptions, the infimum is also attained by some probability measure. As a consequence, the smallest optimal stopping rule in the multiple prior case is equal to the smallest optimal stopping rule a Bayesian decision maker would choose under some probability measure P ∗ ∈ Q. In Section 3.2, we extend the theory to infinite horizon where backward induction is not feasible. We show that the value process is still the smallest multiple prior supermartingale that dominates the payoff process. It satisfies the same recursive (or Bellman) equation as the value process does in the finite horizon case. Moreover, we show that the finite horizon solutions converge to the infinite horizon solution as the horizon tends to infinity. This is important for applications as the finite horizon solution can easily be computed by backward induction. The convergence theorem then allows one to approximate the infinite horizon numerically. In Section 4, we study some important optimal stopping problems discussed above and further examples. To this end, we first introduce a multiple prior version of independent and identically distributed random variables. As the concept of a distribution of a random variable is not uniquely defined in the case of multiple priors, we use the following model that allows us to speak of independent random variables with identical ambiguity (instead of distribution). The agent observes a sequence of random variables (εt ) that are projections of a product space S N to the state set S and thus model identical experiments. The agent uses a certain set Mab on S that describe her set of (marginal) beliefs for each random variable εt ; in this sense, we can say that all random variables share the same (degree of) ambiguity. It remains to combine these marginal beliefs in a path-independent way to obtain a set of prior beliefs for the whole sequence (εt ). Having observed the realization of ε1 = x1 , the agent has to specify the set of (conditional) distributions for ε2 given the observation x1 . She again uses Mab for this purpose and repeats this procedure for the other random variables ε2 ε3 As the set of (conditional) distributions for εt is independent of the observations x1 xt−1 , it is appropriate here to speak of independent random variables in a multiple prior framework. As the set does not depend on time, we say that the random variables (εt ) share identical ambiguity. Altogether, we use the name independent random variables with identical ambiguity in this paper.14 Technically, we start with a reference prob14 Epstein and Schneider (2003a) used the name independent and indistinguishably distributed, which leads to the acronym i.i.d., as in the classical case for independent and identically distributed. We feel that this might be confusing and thus prefer another name.
ability under which a sequence of random variables is independent and identically distributed. We then describe a suitable dynamic version of an exponential family of distributions that are parametrized by certain predictable processes with values in an interval [a b]. The interval [a b] describes the agent’s uncertainty. This generalization of independent and identically distributed random variables to multiple priors allows us to highlight several differences between optimal stopping under risk and under uncertainty. We start by discussing several simple problems where the monotonicity of the payoff function and stochastic dominance arguments can be used to identify the worst-case measure ex ante. Under the worst-case measure, the payoffs are independent and identically distributed, and the optimal stopping rule under uncertainty thus coincides with an optimal stopping rule that is well known in the literature. This reasoning applies to the house selling problem and to simple American options like the call or the put. In general, the situation is more subtle. With the help of several types of American options, we show that the marginal distributions for the payoff sequence can change stochastically under the worst-case measure. If the payoff function is a convex function of the current asset price, for example, the state space can be separated into two regions. For low values of the asset price, the agent hopes that the price will decrease further because she wants to stop when the state price is sufficiently far away from the minimum and the worst-case prior thus has high mean values; for high values, on the contrary, the worstcase prior has low mean values. The agent thus picks a different prior from her set of possible priors in each step depending on the state of the world, even in this framework of independent random variables with identical ambiguity. The reason for this is, generally speaking, that the agent supposes a malevolent nature to choose the worst distribution for her and this distribution changes with the state of the world, in general. The same type of reasoning applies to knock-in barrier options and shout options. Barrier options of knock-in type are financial derivatives that become valuable only if the underlying asset price hits a barrier H and the option is “knocked in.” After this event, the owner holds a typical American option, say, a put. In this example, the buyer hopes first that the asset price goes up and hits the upper barrier; afterward, she hopes for plunging asset prices because this yields a higher payoff for the put. Again, the marginal distributions change in a path-dependent way under the worst-case measure. A shout option gives the buyer the right to freeze the asset price at any time before maturity to insure herself against a potential later decline. This option is equivalent to getting a European put option that is at the money at some random time chosen by the buyer. In this case, the buyer first wants the asset price to increase because she can then freeze the asset price at a quite high level; afterward, low asset prices are good for the investor because she owns a
put option. Interestingly, she changes her (marginal) belief after her own action here (presupposing that nature chooses the average returns exactly when she has exercised her option). Decisions under ambiguity are being studied by a number of authors currently. The present paper relies heavily on the fundamental work by Epstein and Schneider (2003b), Delbaen (2002a), and Föllmer and Schied (2004). The duality theorem, Theorem 2, appears in Föllmer and Schied (2004) (derived by other arguments). The notion of a generalized Snell envelope for maxmax expected utility (which is easier than the minimax case treated here) appears also in the theory of dynamic coherent risk measures in Artzner, Delbaen, Eber, Heath, and Ku (2007). Another approach can be found in Karatzas and Zamfirescu (2003) (see also Zamfirescu (2003)), who discussed both the maxmax and the minimax cases, and characterized saddlepoints. However, they did not assume time-consistency. In the framework of Brownian motion, the concept of g-expectation introduced by Peng (1997) is closely related to multiple prior expectations. In that framework, Coquet, Hu, Mémin, and Peng (2002) derived a nonlinear Doob–Meyer decomposition. Nishimura and Ozaki (2007) solved the optimal stopping problem for an American option when the drift term is unknown. The corresponding discrete-time result follows from our examples in Section 4. An application to job search is in Nishimura and Ozaki (2004). In independent work, Miao and Wang (2004) looked at infinite horizon Markovian stopping problems under multiple priors and obtained some solutions for American options in this setting. An outline of the paper is as follows. Section 2 introduces the model and discusses the assumptions. Section 3 contains the main theorems on multiple prior optimal stopping. The proofs of these theorems are in Section 6. Section 4 contains the applications, and Section 5 develops the multiple prior martingale theory. The Appendix contains valuable additional material and the remaining proofs and technical details. 2. MULTIPLE PRIORS AND OPTIMAL STOPPING Let (Ω F P0 (Ft )t∈N ) be a filtered probability space. We assume that F0 is the trivial σ-field and that F is the σ-field generated by the union of all Ft , t ∈ N. Let (Xt )t∈N be an adapted process. The decision maker chooses a stopping time τ with values in N ∪ {∞} of the filtration (Ft )t∈N . From stopping she obtains a payoff Xτ (ω) = Xτ(ω) (ω) for ω ∈ Ω, and we set Xτ (ω) = 0 if τ(ω) = ∞. The agent aims to maximize the expected reward: as she is uncertain about the distribution of X, she uses a class Q of probability measures on (Ω F ). The (multiple prior) expected reward is given by (1)
inf_{P∈Q} E_P X_τ.
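To illustrate the problem in (1) and the backward-induction solution described in the Introduction, here is a minimal sketch (assumptions: a finite-horizon binomial payoff model with an ambiguous up-probability in an interval; all parameters and names are hypothetical, not taken from the paper). It computes the value function V_t = max(X_t, min_P E_P[V_{t+1} | F_t]) and stops the first time the payoff equals the value.

```python
import numpy as np

# Hypothetical example: an American put on a binomial asset, with ambiguity
# about the one-step up-probability (interval [p_lo, p_hi]).
p_lo, p_hi = 0.45, 0.55
up, down = 1.1, 0.9
S0, strike, T = 100.0, 100.0, 5

def payoff(s):                       # exercise payoff of the put
    return max(strike - s, 0.0)

# asset price at time t after k up moves
S = [[S0 * up**k * down**(t - k) for k in range(t + 1)] for t in range(T + 1)]

# value function, filled backward from the horizon
V = [[0.0] * (t + 1) for t in range(T + 1)]
V[T] = [payoff(s) for s in S[T]]
for t in range(T - 1, -1, -1):
    for k in range(t + 1):
        # the one-step expectation is affine in p, so the worst case
        # is attained at an endpoint of the interval
        cont = min(p * V[t + 1][k + 1] + (1 - p) * V[t + 1][k]
                   for p in (p_lo, p_hi))
        V[t][k] = max(payoff(S[t][k]), cont)

print("multiple prior value:", V[0][0])

def smallest_optimal_stop(path_up_moves):
    """First time the payoff equals the value along a 0/1 path of up moves."""
    k = 0
    for t in range(T + 1):
        if payoff(S[t][k]) >= V[t][k] - 1e-12:
            return t
        k += path_up_moves[t]
    return T

print("stop at t =", smallest_optimal_stop([0, 0, 1, 0, 0]))
```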
The aim of this paper is to develop a theory of optimal stopping over finite and infinite time horizons when the expected payoff is evaluated according to (1). To have well defined expectations, the payoff process has to be integrable in a suitable sense. We are going to impose the following uniform integrability condition throughout: A random variable X is called Q-uniformly integrable if it satisfies (2)
lim_{K→∞} sup_{P∈Q} E_P[|X| 1_{|X| ≥ K}] = 0.
We assume that the whole payoff process is bounded by such a random variable.

ASSUMPTION 1: The payoff process (X_t)_{t∈N} is bounded by a Q-uniformly integrable random variable: there exists a random variable Z ≥ 0 such that sup_{t∈N} |X_t| ≤ Z and

lim_{K→∞} sup_{P∈Q} E_P[Z 1_{Z ≥ K}] = 0.
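One crude way to probe Assumption 1 in a concrete specification (purely illustrative; the family of priors and the payoff bound below are made up) is to estimate sup_{P∈Q} E_P[|X| 1_{|X| ≥ K}] by simulation and check that it decays as K grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical family of priors: normal models with means in [-0.2, 0.2]
# and unit variance; X is a single payoff.
means = np.linspace(-0.2, 0.2, 5)

def tail_expectation(mean, K, n=200_000):
    x = rng.normal(mean, 1.0, size=n)
    z = np.abs(x)
    # Monte Carlo estimate of E_P[|X| 1(|X| >= K)]
    return float(np.mean(np.where(z >= K, z, 0.0)))

for K in (1.0, 2.0, 3.0, 4.0):
    worst = max(tail_expectation(m, K) for m in means)
    print(f"K={K}: sup_P E_P[|X| 1(|X|>=K)] ~ {worst:.4f}")
# The supremum shrinks toward 0 as K grows, as Q-uniform integrability requires.
```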
The assumption is clearly satisfied for bounded payoff processes, in particular in any finite model (tree). For some applications, especially in finance when returns are normally distributed or for infinite horizon problems, one has to allow for unbounded processes, though. Let us also note that for finite horizons, the assumption is equivalent to Q-uniform integrability for all Xt t = 1 T , T because supt≤T |Xt | ≤ t=1 |Xt |. The dynamic nature of our problem requires some structure for the set Q. ASSUMPTION 2: All P ∈ Q are locally equivalent to the reference measure P0 , that is, for all t ∈ N and A ∈ Ft we have P(A) = 0 if and only if P0 (A) = 0. The reference measure P0 just serves the role of fixing the sets of measure zero. Economically, this means that the decision maker has perfect knowledge about sure events.15 Note that we assume local equivalence of the priors only. It would not be reasonable to assume that all measures in Q are equivalent to the reference measure P0 on the information up to ∞ given by the σ-field F = F∞ = σ( ∞ t=0 Ft ). As an example, assume that the sequence (Xt ) is independent and identically distributed with different mean values mP = mQ under 15 In a finite model, one can take P0 to be the uniform distribution without loss of generality. More generally, if the measurable space (Ω F ) has a nice topological structure and the minimal expectation as in (1) is continuous from below, one can always construct P0 from the set of priors Q; see Tutsch (2006). Our assumption just excludes the case in which some prior assigns a probability of 0 to an event that can occur with positive probability under the reference measure P0 . We think that it is economically plausible to exclude this degenerate case. A behavioral foundation for this assumption can be found in Epstein and Marinacci (2006).
two measures P Q ∈ Q. By the law of large numbers, the arithmetic mean converges P-almost surely to mP and Q-almost surely to mQ . Hence, the measures P and Q are even singular on F∞ . Due to our assumptions and the Radon–Nikodym theorem, the set of priors Q can be identified with the set of density processes dP P ∈ D= Q dP0 Ft t∈N We impose the following technical condition that ensures that the infimum in (1) is always attained for bounded stopping times; see Lemma 10. ASSUMPTION 3: For every t ∈ N, the family of densities dP Dt = P ∈ Q dP0 Ft is weakly compact in L1 (Ω F P0 ). Note that the assumption is satisfied for (weakly) closed sets of priors when the densities in Dt are bounded by a P0 -integrable random variable. In particular, the assumption is satisfied whenever the state space Ω is finite and the set of priors is closed. The assumption is equivalent to certain monotone continuity conditions; see Corollary 4.35 in Föllmer and Schied (2004) or Chateauneuf, Maccheroni, Marinacci, and Tallon (2005), and also Lemma 9 in Appendix C. A behavioral justification for such kind of continuity was given by Arrow (1971). The next and last assumption of this paper is time-consistency. Timeconsistency requires that the decision maker does what she plans—the plan of action viewed as optimal at time 0 for a sequential decision problem is not changed when new information arrives or merely time elapses. For multiple prior preferences, time-consistency is equivalent to a certain stability condition for the set of priors that we state formally in Assumption 4 below. Let us discuss first where and when time-consistency can be expected to hold. Time-consistency is a consequence of sequential rationality. In a dynamic decision problem, a rational agent has two possible approaches to deal with her problem. On the one hand, she can collapse the tree to a single-stage decision problem and apply the corresponding axioms (here, the axioms of Gilboa and Schmeidler (1989)). On the other hand, she can apply the axioms step by step in every node of the tree and work backward. Both procedures should lead to the same decision if time-consistency is satisfied. In Appendix D, we show by an example in our optimal stopping context that time-consistency is violated if our priors violate the stability condition of Assumption 4 below.
Time-consistency is also normatively appealing when used, for example, by a government agency to regulate markets with the help of a coherent risk measure. Without our assumption, it could happen that the agency liquidates a position at time 0 from which it knows that it would accept it in all states of the world at time 1 (or vice versa). The regulated market participants would have many reasons to complain about such a kind of decision making. It is also quite amazing that the forces of market competition lead automatically to time-consistent sets of multiple priors. When we have an incomplete financial market, equilibrium leads to a set of equivalent martingale measures. This equilibrium set of pricing measures is stable under pasting (see Riedel (2004, Example 2) for the details). The multiple prior price functional generated by incomplete financial markets is time-consistent. In short, a sequentially rational agent who uses the Gilboa–Schmeidler axioms must have a rectangular set of priors, as should an agency using coherent risk measures to assess the riskiness of portfolio positions. Last but not least, competitive market equilibrium leads to rectangular sets of equivalent martingale measures. While time-consistency is frequently violated in non-Bayesian models,16 multiple prior preferences do satisfy it as long as a certain stability condition called rectangularity or stability under pasting17 holds; that we state now. ASSUMPTION 4: The set of priors Q is time-consistent in the following sense. For P and Q in Q, let (pt ) and (qt ) be the density processes of P (resp. Q) with respect to P0 , that is, pt =
$$\frac{dP}{dP_0}\Big|_{\mathcal F_t}, \qquad q_t = \frac{dQ}{dP_0}\Big|_{\mathcal F_t}.$$
16 See Sarin and Wakker (1998), for example. The typical metatheorem states that adding time-consistency to the axioms of a model leaves only expected utility; this holds true for rank-dependent (Choquet expected) utility, Kahneman–Tversky (weighted expected) utility, and betweenness preferences. For multiple priors (and more generally variational utility), time-consistency restricts the choice of priors, but still leaves room for a rich dynamic theory in intertemporal models where agents update their beliefs as uncertainty resolves. For update rules in static models, one might impose an even stronger notion of time-consistency by demanding consistent updating for any possible event. This very strict notion of time-consistency leaves only the Bayesian model; see Epstein and LeBreton (1993).
17 The equivalence of dynamic consistency and rectangularity of priors in multiple prior models has recently received much attention in the decision theory literature and also in mathematical finance; see Epstein and Schneider (2003b) for the decision-theoretic foundation, and Artzner, Delbaen, Eber, Heath, and Ku (2007), Riedel (2004), Detlefsen and Scandolo (2005) for the discussion in the context of coherent risk measures.
Fix some stopping time τ. Define a new probability measure R by setting, for all t ∈ N, ⎧ if t ≤ τ, ⎨ pt dR pτ qt (3) = else. dP0 Ft ⎩ qτ Then R belongs to Q as well. Note that the measure R above is well defined, as the density process q is strictly positive by Assumption 2. The above definition of time-consistency is taken from Delbaen (2002b). It may look different from the definition of rectangularity used in Epstein and Schneider (2003b), but they are equivalent. The Appendix discusses another equivalent definition given by Föllmer and Schied (2004). The assumption ensures that the set of priors is closed under the operation of pasting together marginal and conditional distributions. In fact, if the decision maker uses the measure P until time τ and evaluates expectations after τ according to Q, then the new expectation R, constructed as in (3), is still in her set of priors Q. It has been shown (see Riedel (2004) and, more generally, Delbaen (2002b, Theorems 6.2 and 8.2)) that under Assumption 3, Q is time-consistent if and only if we have for all bounded random variables Z the following version of the law of iterated expectations18 : ess inf EP [Z|Ft ] = ess inf EP ess inf EQ [Z|Ft+1 ]Ft (4) (t ∈ N) P∈Q
(the outer essential infimum being taken over P ∈ Q and the inner one over Q ∈ Q).
In Appendix C, Lemma 11, we show that the law of iterated expectations extends to Q-uniformly integrable random variables. We record this fact as a lemma. LEMMA 1: Under Assumptions 2, 3, and 4, we have for all Q-uniformly integrable random variables Z the law of iterated expectations (4). Before passing to our optimal stopping theory, let us point out two methods of building dynamically consistent multiple prior models. Independent and identically distributed random variables have a natural generalization to (timeconsistent) multiple priors that we call independent random variables with identical ambiguity; see Section 4 below. To obtain such a model, one first chooses a set of priors for the marginal distribution of every random variable of the 18 The operator ess inf denotes the essential infimum of a family of random variables. Similarly, ess sup is the essential supremum. These concepts are the right measure-theoretic version of infimum or supremum in probabilistic environments. See Doob (1998, Chap. V.18), or any good text on probability and measure.
sequence (and this class of priors remains the same throughout because we want to model independent experiments) and then pastes the classes of marginal distributions together to obtain a rectangular set of priors on the whole state space. This model can be used as a building block for many interesting applications of dynamic multiple prior models in the same way as independent and identically distributed random variables are used in the classical setup to study random walks, autoregressive processes, and so on. Another general method to generate rectangular (or time-consistent) sets of priors is given by the time-consistent hull, which is the smallest rectangular set of priors that contains the original set. It can be constructed as follows. Starting with a given set of priors, compute marginal and conditional one-step distributions and paste them together working backward in the tree; see Riedel (2004) for details. 3. MAIN THEOREMS In this section, we gather the main theorems on optimal stopping for ambiguity-averse agents. Their proofs are postponed to Section 6, when we have developed the suitable martingale theory for multiple priors. 3.1. Finite Horizon The problem we consider in this subsection is maximize infP∈Q EP Xτ over all stopping times τ ≤ T for a finite horizon T < ∞. Throughout this section, we maintain the Assumptions 2, 3, and 4. For standard expectations (i.e., Q = {P} a singleton), the general solution to the above problem is well known.19 One proceeds by backward induction: let UTP = XT be the value in the last period. By backward induction, for t < T , set UtP := max{Xt EP [Ut+1 |Ft ]} Then the value process (UtP ) is the smallest P-supermartingale that dominates the payoff process (Xt ) and an optimal stopping time is given by τ∗ = inf{t ≥ 0 : UtP = Xt } The process U P is called the Snell envelope of X under P. To transfer the same kind of reasoning to the multiple prior framework, we have to introduce a suitable concept of a multiple prior supermartingale. 19 The theory starts with Snell (1952). For a textbook account, see Chow, Robbins, and Siegmund (1971). Dixit and Pindyck (1994) provided many important economic applications.
DEFINITION 1: Let Q be a set of priors. Let (Mt )t∈N be an adapted process with EP |Mt | < ∞ for all P ∈ Q and t ∈ N. (Mt ) is called a multiple prior (sub-, super-) martingale with respect to Q if we have, for t ∈ N, ess inf EP [Mt+1 |Ft ] = (≥ ≤)Mt P∈Q
We show later that one can develop analogs of the classical martingale theorems for multiple prior martingales. Here, we just note that this concept is the right one for our purposes. THEOREM 1: Define the multiple prior Snell envelope of X with respect to Q recursively by UT = XT and (5) (t = 0 T − 1) Ut = max Xt ess inf EP [Ut+1 |Ft ] P∈Q
Then (i) U is the smallest multiple prior supermartingale with respect to Q that dominates X; (ii) U is the value process of the optimal stopping problem under ambiguity, that is,
$$U_t = \operatorname*{ess\,sup}_{\tau\ge t}\,\operatorname*{ess\,inf}_{P\in\mathcal Q} E^{P}[X_\tau \mid \mathcal F_t];$$
(iii) an optimal stopping rule is given by τ∗ = inf{t ≥ 0 : Ut = Xt } The above theorem gives a complete solution to the optimal stopping problem under ambiguity. As in the expected utility case, the problem is solved by backward induction. We would like to stress that the theorem does not hold true without time-consistency of the set of priors Q. We give an example of the failure of the backward induction principle in Appendix D. Although the formulation of the backward induction principle is straightforward, the proof is not. To follow the classical argument, we have to extend the classical martingale theory to multiple prior martingales. This is done in Section 5 below. We study next the relationship between the multiple prior Snell envelope U and the usual Snell envelopes U P for the individual priors P ∈ Q. Above, we maximized the minimal expected payoff from stopping. One might as well swap the order, and maximize the expected reward first and then minimize over all priors. It is natural to guess that this leads to the same value.20 20 Such minimax theorems have been derived before with other methods; see Föllmer and Schied (2004, Theorem 6.47). Karatzas and Kou (1998) have a similar theorem for American options in continuous time.
THEOREM 2 —Duality: The multiple prior Snell envelope U constructed in Theorem 1 is the lower envelope of the individual Snell envelopes U P : Ut = ess inf UtP P∈Q
The essential infimum is attained by some measure P ∈ Q, that is, U = U^P. We have the minimax identity
$$\operatorname*{ess\,sup}_{\tau\ge t}\,\operatorname*{ess\,inf}_{P\in\mathcal Q} E^{P}[X_\tau \mid \mathcal F_t] \;=\; \operatorname*{ess\,inf}_{P\in\mathcal Q}\,\operatorname*{ess\,sup}_{\tau\ge t} E^{P}[X_\tau \mid \mathcal F_t].$$
The preceding theorem can be viewed as an equivalence theorem: for a given payoff process X, the ambiguity-averse decision maker behaves like an expected utility maximizer for a certain worst-case measure P. This does not imply, however, that optimal stopping under ambiguity aversion is behaviorally indistinguishable from optimal stopping under expected utility. This is so because the worst-case measure P depends on the payoff process X. For different suitably constructed payoff processes, the ambiguity-averse decision maker behaves like two distinct expected utility maximizers. This makes it possible to distinguish behaviorally between ambiguity-averse and ambiguity-neutral (EU) decision makers. Exotic options provide stark examples of such behavioral differences; see Sections 4.4 and 4.5 below. 3.2. Infinite Time Horizon Many optimal stopping problems are naturally formulated without imposing a finite time horizon. Also, the infinite horizon case frequently leads to simpler closed form solutions that are usually not available in the finite horizon case. We thus extend the analysis of the preceding section to T = ∞. We show that the value function satisfies the same Bellman-type backward recursion as in the finite case. Again, it is optimal to stop when the current payoff is equal to the value function. Moreover, we establish that the solutions of the finite time horizon converge to the infinite horizon solution. This is important as it allows us to approximate the general solution by using the constructive algorithm available in the finite horizon case. The problem we consider in this section is maximize infP∈Q EP Xτ over all stopping times τ that are universally finite, that is, infP∈Q P[τ < ∞] = 1. As we cannot use backward induction as in the finite horizon case, we define the value function at time t as (6)
$$V_t = \operatorname*{ess\,sup}_{\tau\ge t}\,\operatorname*{ess\,inf}_{P\in\mathcal Q} E^{P}[X_\tau \mid \mathcal F_t].$$
OPTIMAL STOPPING WITH MULTIPLE PRIORS
871
THEOREM 3: (i) V is the smallest multiple prior supermartingale with respect to Q that dominates X and is bounded by a random variable in X . (ii) The value process (Vt ) satisfies the Bellman principle Vt = max Xt ess inf EP [Vt+1 |Ft ] P∈Q
for all t ≥ 0. (iii) An optimal stopping rule is given by τ∗ = inf{t ≥ 0 : Vt = Xt } provided that τ∗ is universally finite. The Bellman principle and the supermartingale characterization thus carry over to the infinite horizon. We have to restrict the class of supermartingales a little because we need to apply the optional sampling theorem (OST) in the proof; our version of the OST requires an upper bound that is Q-uniformly integrable. An optimal stopping time need not exist if τ∗ is not universally finite. A simple example is a deterministic strictly increasing sequence (Xt ) with finite limit 1, say. Then Ut = 1 for all t, but Xτ < 1 for all finite stopping times τ. The above theorem characterizes nicely the value process for an infinite horizon stopping problem. In contrast to the finite time horizon, it does not provide a constructive algorithm to compute the value process, though. It is thus important to know that the Snell envelopes of the finite horizon models converge to the infinite horizon value. THEOREM 4—Finite Horizon Approximation: Denote by U T the multiple prior Snell envelope of X with time horizon T . Let V be the value process with infinite time horizon as in (6). Then limT →∞ UtT = Vt for all t ≥ 0. 4. APPLICATIONS Let us now come to the examples and applications we mentioned in the introduction. In some famous examples it is relatively easy to identify the worstcase measure due to independence, monotonicity, and stochastic dominance. The optimal stopping rule under ambiguity is identified as the optimal stopping rule under the worst-case measure; the worst-case measure has the same structure as in the classical example in the sense that the state variables are independent and identically distributed. In general, however, the worst-case measure can have a quite complicated structure and the optimal stopping rule is different from the rules one can find in the literature. We exhibit below three classes of examples in which the worst-case measure assigns path dependence to the distribution of the payoff variables although the corresponding unambiguous model displays independence.
872
FRANK RIEDEL
The basic building block for our model is a sequence of random variables (εt ) that are projections from some product space S N to the state set S. On S, we have a set of marginal distributions Mab that describe the agent’s prior beliefs about one particular experiment. The parameters a < b are real numbers that describe the ambiguity—technically speaking, we describe the set of densities via a certain exponential family (see (7) below), and the parameters are chosen from the real interval [a b]. As the parameters a < b are the same for all εt , one can say that the experiments all share the same (degree of) ambiguity. In a second step, we paste these marginal distributions together to form a time-consistent set of priors for the whole sequence (εt ) of observations. In several important applications, the payoff from stopping is a monotone function of an underlying state variable. This holds true, for example, for the parking problem or the house vendor problem; similarly, the usual asset pricing model has independent returns. In these cases it is possible to identify the worst-case measure ex ante. To illustrate this let us think about the house vendor. If he has in his set of priors a distribution under which prices are lowest in the sense of first-order stochastic dominance, then it is easy to believe that this prior is going to be his worst-case prior. Therefore, he should use this prior to compute his optimal stopping rule. Similarly, for an American put option the buyer uses the prior under which the log returns are independent and identically distributed with highest mean return because the buyer of an American put option is hoping for decreasing asset prices. Many interesting and important problems, however, do not have this simple monotone structure that makes it easy to exploit stochastic dominance to identify the worst-case measure. We discuss here two interesting examples from option pricing where the worst-case prior is path dependent. In both examples, the buyer of the option hopes that prices increase until a certain event has occurred and decrease afterwards. In such cases, the worst-case prior presumes minimal mean returns until the critical event has happened and maximal mean returns afterward. To illustrate this, let us have a look at barrier options, in particular up-and-in American put options. These options pay off like an American put for a certain strike K, but only after the asset price has reached a certain upper barrier H. In other words, the option has to be knocked in, and knock-in occurs when the asset price first reaches a certain high level H. Here the buyer of such an option hopes first that prices are going up, because he wants them to reach the level H. After the level H is hit, the investor is faced with an American put and here he wants prices to decrease. High mean returns are thus the worst-case. In this case the worst-case prior corresponds to the case where the drift of the underlying asset price’s mean return changes at a random time. Mean returns are thus no longer independent. Another similar example that we discuss below is given by so-called shout options; see Section 4.5. Last but not least, we discuss a general class of problems where the payoff function is a U-shaped function of some state variable, with a unique mini-
OPTIMAL STOPPING WITH MULTIPLE PRIORS
873
mum, for example. The agent wants to exercise when the state variable is sufficiently far away from the minimum. One can then show that for high states, the worst-case prior has low mean, and for low states, the worst-case prior has high mean. In principle, the worst-case prior changes drift arbitrarily often during the lifetime of the contract; see Section 4.6. 4.1. Time-Consistent Multiple Priors in Discrete Time: Independent Experiments With Identical Ambiguity We introduce a framework in which one can treat multiple prior versions of the examples discussed in the Introduction. We aim for a multiple prior generalization of a sequence of independent and identically distributed random variables. Our model is the discrete-time version of the κ-ignorance model in Epstein and Chen (2002). S ⊂ R. Let Ω = S N be the set of all Let (S S ν0 ) be a measure space with ∞ sequences with values in S and let B = t=1 Sbe the σ-field generated by ∞ all projections εt : Ω → S. Moreover, let P0 = t=1 ν0 be the probability on (Ω B ) under which the (εt ) are independent and identically distributed with distribution ν0 . Let (Ft ) be the filtration generated by the sequence (εt ). We assume that eλx ν0 (dx) < ∞ S
for all λ ∈ R. The log-Laplace function L(λ) = log eλx ν0 (dx) S
is then well defined. Note that, by definition, E P0 exp(λεt − L(λ)) = 1 We use this property to define densities on (Ω B P0 ). First, fix real numbers a < b. For a given t, the densities (exp(θεt − L(θ)))θ∈[ab] define a family of priors Mab on (S S ). We are going to use this set as the set of marginal distributions for each εt . We paste those marginal priors together to get a timeconsistent family of priors on (Ω B ) as follows. Let (αt ) be a predictable process. Set t t (7) αs ε s − L(αs ) Dαt = exp s=1
s=1
D is a density process that defines a probability measure P α ∼ P0 . Fix some a < b. We denote by P ab the set of all probability measures P ∼ P0 whose density processes satisfy (7) for a predictable process (αt ) with values in [a b]. α
874
FRANK RIEDEL
We record for our purposes the following lemma. LEMMA 2: The set P ab satisfies Assumptions 2, 3, and 4. In general, the set P ab is not convex. Although it is often convenient to have this property for general arguments, we do not need it here. Let us have a look at the one period version of the multiple prior set P ab . In this case, the densities can be written as Dα1 = exp(α1 ε1 ) · const
(α1 ∈ [a b])
As the process (αt ) is predictable, α1 is a real number. Such families are frequently studied in statistics. They are called exponential families and have a lot of nice properties for estimation and hypothesis testing. Here, our decision maker uses a suitable dynamic version of exponential families to set up a time-consistent model.21 4.2. Monotone Problems We consider now stopping problems where the payoff function is a monotone function of the state variables (εt ). The house vendor’s problem in the Introduction and the parking problem discussed below are instances of such a setting. Let us assume that we have Xt = f (t εt ) for some bounded,22 measurable function f that is strictly increasing in its second variable. We consider the stopping problem (8)
maximize infP∈P ab EP Xτ over all stopping times τ ≤ T .
Here, the horizon T can be finite or infinite. In these problems, monotonicity and stochastic dominance allow us to identify ex ante the worst-case probability measure. It is given by the measure P a , where the process αt = a is equal to its lowest possible value for all t. As a consequence, we have the following theorem. THEOREM 5: Let τa denote the optimal stopping time for the classical stopping problem a
maximize EP Xτ over all stopping times τ ≤ T . τa is also the solution of the stopping problem (8) under multiple priors. 21
In Bier and Riedel (2009), we characterized time-consistent multiple priors in finite trees. For our assumption on uniform integrability to be satisfied, it is enough to require |f (t x)| ≤ A(1 + exp(λx)) for some A λ > 0. 22
OPTIMAL STOPPING WITH MULTIPLE PRIORS
875
We apply the previous theorem to two famous examples. EXAMPLE 1 —Selling a House: Consider an agent who collects bids p0 p1 for a house. In the classical version of the problem, the bids are independent and identically distributed. The agent discounts the future with a factor δ ∈ (0 1) and looks for a stopping time τ that maximizes the expected present value E(δτ pτ ). In the finite horizon case, one shows quickly via backward induction that there exist critical prices p∗0 ≥ p∗1 ≥ p∗T = 0 such that it is optimal to stop whenever pt ≥ p∗t . Now let us consider the multiple prior version with a set of priors P ab . As we have a monotone problem, the multiple prior agent behaves as if P a was the right prior. Note that the prices are again independent and identically distributed under P a . To specialize the problem further, assume that prices are log-normal with mean μ and variance σ 2 in the original problem. One can show that under P a , prices are again log-normal with mean a and same variance. Here, time-consistent ambiguity thus corresponds to a shift in the mean of the price distribution. The technical details are in Appendix G. EXAMPLE 2 —The Parking Problem23 : You are driving along the Rhine. Your aim is to park your car as close as possible to the place where the ship leaves for a sightseeing tour. When a spot is empty, you face the decision whether to stop and park or to continue, hoping to find a spot closer to the departure point. Formally, let N ∈ N be the desired parking spot. Let (εt )t∈N be a sequence of binary random variables. The spot k is empty when εk = 1. The payoff from parking at an empty spot is −|N − k|. If you stop at an occupied spot, you pay a fee K (assumed to be so large that it is never optimal to stop at an occupied spot). In the classical version of the problem, the probability p = P[εt = 1] is known to the driver. It is natural, though, to admit that the driver has some uncertainty about the parameter p and to allow for p ∈ [p p]. It might not be obvious how to embed this problem into a monotone problem; for example, the payoff function is not monotone in k. However, after the desired spot N, it is obviously optimal to stop at the first empty spot. One can then reduce the (potentially infinite) parking problem to a finite problem where the payoff at N + 1 is equal to the expected waiting time for finding an empty spot. This reduced problem has a monotone payoff function. Also, it requires some work to write the densities in the form (7). The technical details are provided in Appendix H. Theorem 5 tells us that an ambiguity-averse driver should behave as if the lowest probability p was the correct one. The pessimist thus behaves as if p = p was the case. 23 See Chow, Robbins, and Siegmund (1971) and Lerche, Keener, and Woodroofe (1994) for a generalization.
876
FRANK RIEDEL
For the reader’s convenience, we recall the solution to this classical problem (see, e.g., Ferguson (2006, Chap. 2.11)). Let r ∈ N be the smallest number such that (1 − p)r+1 ≤ 1/2. The optimal rule is to start looking when you are r places away from the desired location and to take the first available spot. If, for example, you think that in the worst case 1 out of 100 places is empty, you should start looking when you are 68 places from your target. 4.3. Simple American Options and Entry Decisions We now discuss simple American options like call and put. For concreteness, we focus on the binomial tree, also known as the Cox–Ross–Rubinstein (CRR) model in finance. Our results extend to any monotone model in the sense of Section 4.2. We show first how to formulate a time-consistent set of priors for the binomial tree in the spirit of Section 4.1; see also Appendix We have S = {0 1} H. ∞ and take ν0 to be uniform on S. Hence, under P0 = t=1 ν0 , the projections (εt ) are independent and identically distributed with P0 [εt = 1] = P0 [εt = 0] = 1/2. Consider now the set P ab as defined in Section 4.1. Fix a predictable process (αt ). By using Bayes’ law, we get P α [εt = 1|Ft−1 ] =
exp(αt ) 1 + exp(αt )
Hence, the probability for εt = 1 is always in the interval [p p] with p=
exp(a) 1 + exp(a)
p=
exp(b) 1 + exp(b)
We denote by P and P the probability measures under which the random variables (εt ) are independent and identically distributed with P[εt = 1] = p and P[εt = 1] = p, respectively. In the binomial model of asset markets (Cox, Ross, and Rubinstein (1979)), there is a riskless asset with price Bt = (1 + r)t for an interest rate r > −1 and a risky asset (St ) given by S0 = 1 and u if εt+1 = 1, St+1 = St · d if εt+1 = 0. To preclude arbitrage opportunities, we assume 0 < d < 1 + r < u. We consider an investor who exercises an American option that pays off A(t St ) when exercised at time t. We consider the problem maximize infP∈Q EP A(τ Sτ ) over all stopping times τ ≤ T . The following theorem provides the solution for monotone A.
OPTIMAL STOPPING WITH MULTIPLE PRIORS
877
THEOREM 6: (i) Let (U t ) be the Snell envelope of Xt = A(t St ) under P. If A(t ·) is increasing in S for all t, the multiple prior Snell envelope is U = U and an optimal stopping rule under ambiguity is given by τ = inf{t ≥ 0 : A(t St ) = Ut }. The same statement holds true for an infinite time horizon provided that τ∞ = inf{t ≥ 0 : A(t St ) = Ut∞ } is universally finite. (ii) Let (U t ) be the Snell envelope of Xt = A(t St ) under P. If A(t ·) is decreasing in S for all t, the multiple prior Snell envelope is U = U and an optimal stopping rule under ambiguity is given by τ = inf{t ≥ 0 : A(t St ) = Ut }. The same statement holds true for an infinite time horizon provided that τ∞ = inf{t ≥ 0 : A(t St ) = Ut∞ } is universally finite. The proof of the preceding theorem relies on the fact that P (or P resp.) is the worst probability measure in the sense of first-order stochastic dominance and that the payoff is a monotone function of the underlying stock price. Although the payoff here is not an increasing function of state variables that are independent under the worst-case measure, we can use essentially the same argument for the proof as the log returns are independent. We collect our conclusions for the American put and American call. COROLLARY 1—American Call: A risk-neutral buyer of an American call uses an optimal stopping rule for the prior P. In particular, if pu + (1 − p)d > 1 + r, the American call is not exercised before maturity. PROOF: From Theorem 6, we know that P is the worst-case measure because the payoff of the American call is increasing in S. If pu + (1 − p)d > 1 + r holds, then St (1 + r)−t is a (strict) submartingale under P and τ = T is optimal. Q.E.D. COROLLARY 2—American Put: A risk-neutral buyer of an American put uses an optimal stopping rule for the prior P. 4.4. Barrier Options While exercising American put and call can be reduced easily to the single prior case by using monotonicity and stochastic dominance, the picture is quite more involved in general. To illustrate this issue, we consider now American barrier options. Barrier options are financial derivatives whose payoffs depend on the underlying’s price hitting a barrier during the lifetime of the contract. We consider the American version of the so-called up-and-in put. Such an option is knocked in when the asset hits a level H; if this does not happen over the lifetime of the contract, the option remains worthless. After knock-in, the buyer owns a usual American put with some strike K. We assume H > K and,
878
FRANK RIEDEL
to avoid the trivial case, H > S0 = 1. To make our life simpler, let us also assume that ud = 1 and that H lies on the grid of possible asset prices. Let T > 0 be the contract’s maturity. Denote by τH = inf{t ≥ 0 : St ≥ H} the knock-in time. After τH , the barrier option coincides with a plain vanilla American put. Let us write p(t x) = ess sup ess inf E P [(K − Sτ )+ /(1 + r)τ |St = x] τ≥t
P∈Q
for the value of the American put in the ambiguous CRR model. From the analysis above, we know that this value coincides with the put’s value under the worst-case measure P where the asset price has maximal mean return: p(t x) = ess sup E P [(K − Sτ )+ /(1 + r)τ |St = x] τ≥t
Let us write b(t x) for the value of the up-and-in put at time t when the asset price is x. As the option’s payoff coincides with that of a simple put after knockin, we have the following result. LEMMA 3: On the event {τH ≤ t}, the value of the up-and-in put is b(t St ) = p(t St ) Before knock-in, the payoff upon exercise is zero. The dynamic programming principle (5) yields b(t St ) = ess inf E P [b(t + 1 St+1 )|St ] P∈Q
on {τH ≤ t}
so (b(t St )) is a multiple prior martingale until knock-in. LEMMA 4: (b(t ∧ τH St∧τH )) is a multiple prior martingale. From the multiple prior martingale property and the optional sampling theorem, Theorem 9, we get b(0 S0 ) = inf E P p(τH H) P∈Q
where we use the fact that the barrier H is hit exactly (by assumption). The value of an American put is a decreasing function of time. Moreover, we have P[τH ≤ t] ≤ P[τH ≤ t]
OPTIMAL STOPPING WITH MULTIPLE PRIORS
879
for all P ∈ Q (see Appendix J for the details). In other words, τH is stochastically largest under P. The usual characterization of first-order stochastic dominance yields b(0 S0 ) = E P p(τH H) Combining this result with our formula for the American put, we obtain ∗ b(0 S0 ) = E P E P (K − Sτ∗ )+ /(1 + r)τ |Fτ∗ where we write τ∗ = inf{t ≥ τH : b(t St ) = (K − St )+ /(1 + r)t } for the optimal stopping time after knock-in. From this formula, we see that the investor uses a worst-case measure P ∗ that is the pasting of P after P at τH . The pessimistic buyer thus presumes a change of drift at knock-in. Before the option becomes valuable, she uses the lowest mean return in her computations, and afterward, she uses the highest mean return. 4.5. Shout Options A shout option gives the buyer the right to freeze the asset price at any time τ before maturity to insure herself against later losses. If the asset price at maturity is ST < Sτ , the buyer receives the difference Sτ − ST while no payment is made otherwise. This contract is equivalent to the following. Upon shouting, the buyer receives a European put option at the money—that is, with strike Sτ —with maturity T . To evaluate the option, let us first compute the expected payoff of the at-themoney put option at time τ which is given by ess inf E P [(Sτ − ST )+ /(1 + r)T −τ |Fτ ] P∈Q
A similar consideration as for the case of the American put shows that the worst-case measure is P; the worst-case for the put’s owner is when the mean return is highest. Therefore, the worst expected payoff of the European put is E P [(Sτ − ST )+ /(1 + r)T −τ |Fτ ] at the time of shouting. From the properties of the conditional expectation and by definition of the asset price in the CRR model, this can be written as + T T −τ P 1−εs εs 1− d u Sτ /(1 + r) E F τ s=τ+1
880
FRANK RIEDEL
As the log returns of the risky asset are independent under the measure P, the expectation is just a function of τ. Overall we can write the value of the at-the-money put at time τ as Sτ g(τ) for some function g; see Appendix K for the details. The buyer is thus faced with an optimal stopping problem where the payoff is monotone in the risky asset. From our analysis of the monotone case, we conclude that the worst-case measure until shouting is given by P. The investor’s beliefs change after she has taken action: as long as she has not shouted, she believes that the risky asset has very low returns; after shouting she changes her beliefs—being pessimistic, she presumes now that the risky asset is going to have high returns. 4.6. U-Shaped Payoffs In the preceding section, the worst-case prior changed exactly once when the barrier option was hit. In general, there can be arbitrarily many changes in the underlying worst-case prior as we illustrate now with options that have convex payoffs Xt = g(t St ) for functions g(t x) that are convex in x with a unique minimum. Typical examples are the straddle with payoff |K − x| or, more generally, a strangle with a payoff like (K − x)+ + c(x − L)+ for some c > 0 and 0 < K ≤ L. As in the preceding section, we assume du = 1. In these cases, the payoff function decreases first until some minimum and increases afterward. We show first that this property carries over to the value function. LEMMA 5: Let v(t x) = ess sup ess inf E P [f (Sτ )/(1 + r)τ |St = x] τ≥t
P∈Q
be the value function. For all t there is a value xt > 0 such that v(t x) is decreasing for x < xt and increasing for x ≥ xt . The previous lemma shows that we can separate the state space of the asset price into two regions. If asset prices are low, the value function is decreasing. Therefore, with the same argument as for simple American options, one can show that P is the worst-case measure here. In the other region, on the contrary, P is the worst-case measure. Here, we have a deterministic boundary, given by the numbers xt such that the worst-case measure changes the drift whenever the asset price crosses this boundary. We then have the following result.
OPTIMAL STOPPING WITH MULTIPLE PRIORS
881
THEOREM 7: The worst-case measure is given by the density D∗t = exp(aεv − L(a)) exp(bεv − L(b)) v≤tSv ≤xv
v≤tSv ≥xv
5. MULTIPLE PRIOR MARTINGALE THEORY This section develops the multiple prior martingale theory that we need for the proof of our main theorems. The material might be useful in other contexts as well. Throughout, we maintain Assumptions 2, 3, and 4. We start by connecting the new concept of multiple prior martingale with the classical concept. LEMMA 6: Let (Mt ) be an adapted process that is bounded by a Q-uniformly integrable random variable. (i) M is a multiple prior submartingale if and only if it is a Q-submartingale for all Q ∈ Q. (ii) M is a multiple prior supermartingale if and only if there exists P ∗ ∈ Q such that M is a P ∗ -supermartingale. (iii) M is a multiple prior martingale with respect to Q if and only if (a) there exists P ∗ ∈ Q such that M is a P ∗ -martingale and (b) it is a Q-submartingale for all Q ∈ Q. PROOF: For (i), note that ess inf EP [Mt+1 |Ft ] ≥ Mt P∈Q
is equivalent to for all P ∈ Q
EP [Mt+1 |Ft ] ≥ Mt
We proceed with (ii). Suppose that M is a P ∗ -supermartingale for some P ∈ Q. Then we have for t ∈ N, ∗
∗
Mt ≥ E P [Mt+1 |Ft ] ≥ ess inf E P [Mt+1 |Ft ] P∈Q
and M is a multiple prior supermartingale. For the converse, we need the assumption of time-consistency. By Lemma 10, there exist measures P t+1 ∈ Q for t ∈ N that coincide with P0 on Ft and satisfy ess inf EP [Mt+1 |Ft ] = EP P∈Q
t+1
[Mt+1 |Ft ]
Let z t+1 be the density of P t+1 with respect to P0 on Ft+1 . The density of P t+1 with respect to P0 on Ft is 1. By Bayes’ formula, we have (9)
Mt ≥ ess inf EP [Mt+1 |Ft ] = EP0 [Mt+1 z t+1 |Ft ] P∈Q
882
FRANK RIEDEL
Construct a new measure P ∗ by setting dP ∗ = z1 z2 · · · zT dP0 FT
(T ∈ N)
By time-consistency, P ∗ ∈ Q. We claim that M is a P ∗ -supermartingale. To see this, use Bayes’ formula and Equation (9) to get ∗ −1 dP dP ∗ Mt+1 F t dP0 Ft+1 dP0 Ft
P∗
E [Mt+1 |Ft ] = E
P0
= EP0 [Mt+1 z t+1 |Ft ] ≤ Mt Therefore, M is a P ∗ -supermartingale. For (iii), combine (i) and (ii).
Q.E.D.
Note the big difference between multiple prior sub- and supermartingales. While a multiple prior submartingale is a submartingale for all Q ∈ Q uniformly, a multiple prior supermartingale is a supermartingale for some Q ∈ Q only. This is due, of course, to the fact that we always take the essential infimum over a class of probability measures. We are now going to extend two fundamental theorems from martingale theory to multiple prior martingales. We start with the famous Doob decomposition. THEOREM 8—Doob Decomposition: Let S be a multiple prior supermartingale (submartingale) with respect to Q. Then there exists a multiple prior martingale M and a predictable, nondecreasing process A with A0 = 0 such that S = M − A (S = M + A). Such a decomposition is unique. PROOF: For uniqueness, note that from S = M − A with the stated properties, we obtain 0 = ess inf E P [St+1 − St + At+1 − At |Ft ] P∈Q
and predictability of A yields the recursive relation (10)
At+1 = At − ess inf E P [St+1 − St |Ft ] P∈Q
In conjunction with A0 = 0, this determines A, and then M, uniquely. Now let A be given by (10) and let A0 = 0. Note that A is predictable and nondecreasing as S is a multiple prior supermartingale. Let Mt = St + At . We have to show that M is a multiple prior martingale. The predictability of A
883
OPTIMAL STOPPING WITH MULTIPLE PRIORS
implies ess inf E P [Mt+1 − Mt |Ft ] = ess inf E P [St+1 − St + At+1 − At |Ft ] P∈Q
P∈Q
= At+1 − At + ess inf E P [St+1 − St |Ft ] = 0 P∈Q
Q.E.D.
This completes the proof.
REMARK 1: As multiple prior submartingales are nothing but uniform Q-submartingales, it is worthwhile to compare the preceding Doob decomposition with the so-called optional or uniform Doob decompositions for Q-submartingales used in the theory of hedging, where Q is given by the (timeconsistent!) set of equivalent martingale measures for some financial market (see El Karoui and Quenez (1995), Föllmer and Kabanov (1998), Kramkov (1996)). Here, we decompose a Q-submartingale into a multiple prior martingale M and a predictable, increasing process A starting at 0. This, however, is not a uniform Doob decomposition as M is usually only a Q-submartingale, not a Q-martingale. In fact, for such a uniform decomposition, A is usually only adapted, not predictable. The next fundamental theorem concerns the preservation of the (super-) martingale property under optimal stopping. THEOREM 9—Optional Sampling Theorem: Let Z be a multiple prior supermartingale with respect to Q that is bounded by a Q-uniformly integrable random variable. Let σ ≤ τ be stopping times. Assume that τ is universally finite in the sense that infP∈Q P[τ < ∞] = 1. Then ess inf EP [Zτ |Fσ ] ≤ Zσ P∈Q
PROOF: By Lemma 6, there exists P ∗ ∈ Q such that Z is a supermartingale under P ∗ . The standard optional sampling theorem states that ∗
EP [Zτ |Fσ ] ≤ Zσ As a consequence, ∗
ess inf EP [Zτ |Fσ ] ≤ EP [Zτ |Fσ ] ≤ Zσ P∈Q
Q.E.D.
REMARK 2: As multiple prior submartingales are Q-submartingales, Theorem 9 holds true for multiple prior submartingales also.
884
FRANK RIEDEL
6. PROOFS OF THE MAIN THEOREMS PROOF OF THEOREM 1: U is a multiple prior supermartingale by definition. Let V be another multiple prior supermartingale with V ≥ X. Then we have VT ≥ XT = UT . Now assume that Vt+1 ≥ Ut+1 ; as V is a multiple prior supermartingale, Vt ≥ ess inf E P [Vt+1 |Ft ] ≥ ess inf E P [Ut+1 |Ft ] P∈Q
P∈Q
We also have Vt ≥ Xt by assumption. Hence Vt ≥ max Xt ess inf EP [Ut+1 |Ft ] = Ut P∈Q
Thus, U is the smallest multiple prior supermartingale that dominates X. Now let Wt = ess supτ≥t ess infP∈Q EP [Xτ |Ft ]. Note that W is well defined because Wt ≥ Xt > −∞ and Wt ≤ ess infP∈Q EP [Z|Ft ] < ∞ by Assumption 1. From the multiple prior supermartingale property of U and U ≥ X, we conclude with the help of the optional sampling theorem, Theorem 9, that for every stopping time τ with values in {t T }, ess inf EP [Xτ |Ft ] ≤ ess inf EP [Uτ |Ft ] ≤ Ut P∈Q
P∈Q
Wt ≤ Ut follows. It remains to be shown that Ut ≤ Wt . To this end, we define the stopping time τt∗ = inf{s ≥ t : Us = Xs } We claim that (Us∧τt∗ )s=tT is a multiple prior martingale. Again using the optional sampling theorem, Theorem 9, we conclude Ut = ess inf EP Uτt∗ |Ft = ess inf EP Xτt∗ |Ft ≤ Wt P∈Q
P∈Q
and we are done. To check the multiple prior martingale property, fix s ∈ {t T }. Note that on the set {τt∗ ≤ s}, we have U(s+1)∧τt∗ = Uτt∗ = Us∧τt∗ . Hence, ess inf EP U(s+1)∧τt∗ |Fs = Us∧τt∗ P∈Q
on {τt∗ ≤ s}. On the complement {τt∗ > s}, we have Us > Xs . The definition of U implies that Us∧τt∗ = Us = max Xs ess inf EP [Us+1 |Fs ] P∈Q
OPTIMAL STOPPING WITH MULTIPLE PRIORS
= ess inf EP [Us+1 |Fs ] P∈Q = ess inf EP U(s+1)∧τt∗ |Fs P∈Q
885
on {τt∗ > s}
Hence, (Us∧τt∗ )s=tT is a multiple prior martingale and the above claim is proved. As a further consequence, by setting t = 0, we obtain that (Us∧τ∗ ) is a multiple prior martingale for τ∗ = τ0∗ = inf{t ≥ 0 : Ut = Xt } The optional sampling theorem, Theorem 9, yields U0 = infP∈Q EP Xτ∗ . This Q.E.D. shows that τ∗ is optimal. REMARK 3: (i) One might wonder where time-consistency of Q was used in the proof. We need it when we apply the optional sampling theorem. This theorem does not hold true without time consistency. (ii) Optimal stopping times are usually not unique.24 By using the Doob decomposition of the Snell envelope U = M − A, one can show that the largest optimal stopping time is τmax = inf{t ≥ 0 : At+1 > 0} The proof runs along the classical lines (see, e.g., Theorem 6.23 in Föllmer and Schied (2004)) and is thus omitted here. PROOF OF THEOREM 2: From the backward definition, it is clear that U ≤ U P for all priors P ∈ Q. U is a multiple prior supermartingale that is bounded by a Q-uniformly integrable random variable (because the payoff process X is). By Lemma 6, there exists P ∈ Q such that U is a Psupermartingale. The classical theory of optimal stopping tells us that U P is the smallest P-supermartingale that dominates X. Hence, U P ≤ U. Altogether, we obtain U = U P . It follows that ess sup ess inf EP [Xτ |Ft ] = Ut = Ut
P
τ≥t
P∈Q
= ess sup EP [Xτ |Ft ] τ≥t
≥ ess inf ess sup EP [Xτ |Ft ] P∈Q
24
τ≥t
Think of an American call that is known to be out of the money until maturity. Such a situation can happen easily in a CRR model for appropriate parameters. Then the agent is indifferent between stopping and waiting as her payoff is zero anyway.
886
FRANK RIEDEL
The other inequality of the minimax identity is trivial. This concludes the proof. Q.E.D. PROOF OF THEOREM 3: We start with (ii). By Lemma 12, there exists a sequence (τk ) of stopping times such that ess inf EP Xτk |Ft+1 ↑ Vt+1 P∈Q
Continuity from below (Lemma 9) and the law of iterated expectations (1) (also Lemma 11) imply that ess inf EP [Vt+1 |Ft ] = lim ess inf EP ess inf EQ Xτk |Ft+1 Ft P∈Q
k→∞
P∈Q
k→∞
P∈Q
Q∈Q
= lim ess inf EP Xτk |Ft ≤ Vt
As Xt ≤ Vt is clear, we obtain max Xt ess inf EP [Vt+1 |Ft ] ≤ Vt P∈Q
For the converse inequality, take some stopping time τ ≥ t and define a new stopping time σ = max{τ t + 1} ≥ t + 1. Then ess inf EP [Xτ |Ft ] P∈Q
= Xt 1{τ=t} + ess inf EP Xτ 1{τ≥t+1} |Ft P∈Q = Xt 1{τ=t} + ess inf EP Xσ 1{τ≥t+1} |Ft P∈Q
= Xt 1{τ=t} + ess inf EP [Xσ |Ft ]1{τ≥t+1} P∈Q = Xt 1{τ=t} + ess inf EP ess inf E Q [Xσ |Ft+1 ]Ft 1{τ≥t+1} P∈Q
Q∈Q
≤ max Xt ess inf EP [Vt+1 |Ft ] P∈Q
Here, we have used the law of iterated expectations (Lemma 1) in the fourth line. (ii) is thus proved. As a consequence, (Vt ) is a multiple prior supermartingale. Now suppose that (Wt ) is another multiple prior supermartingale that dominates X and is bounded by a random variable in X . Then for every stopping time τ ≥ t, the optional sampling theorem, Theorem 9, implies that ess inf EP [Xτ |Ft ] ≤ ess inf EP [Wτ |Ft ] ≤ Wt P∈Q
P∈Q
By taking the supremum over all such stopping times, Vt ≤ Wt follows. This shows (i).
OPTIMAL STOPPING WITH MULTIPLE PRIORS
887
For (iii), one shows first that (Us∧τ∗ ) is a multiple prior martingale; see the proof of Theorem 1. If τ∗ is universally finite, bounded convergence (Lemma 9(iv)) gives inf EP Uτ∗ = lim inf EP UT ∧τ∗ = U0
P∈Q
T →∞ P∈Q
Hence, τ∗ is optimal.
Q.E.D.
PROOF OF THEOREM 4: Note that UtT is bounded by a Q-uniformly integrable random variable and is increasing in T . Hence, we can define Ut∞ = limT →∞ UtT . By continuity from below (Lemma 9) and the definition of the Snell envelope, we obtain T |Ft ] Ut∞ = lim max Xt ess inf EP [Ut+1 T →∞ P∈Q ∞ |Ft ] = max Xt ess inf EP [Ut+1 P∈Q
Hence, U ∞ is a multiple prior supermartingale that dominates X. By Theorem 3, we have U ∞ ≥ V . On the other hand, by Theorem 1, UtT = ess supt≤τ≤T ess infP∈Q EP [Xτ |Ft ] ≤ Vt . As a consequence, V = U ∞ , and the proof is complete. Q.E.D. 7. CONCLUSION We present a unified and general theory of optimal stopping under multiple priors in discrete time. Much of the received theory can be translated to the multiple priors framework provided the priors satisfy the time-consistency criterion. In this case, we also generalize classical martingale theory. A pessimist views a game against nature as fair (a martingale) if and only if it is favorable for all priors and exactly fair for one (the worst) prior. We develop a natural framework for time-consistent multiple priors that generalizes the classical model of independent and identically distributed random variables and we solve a number of problems in this setting. While sometimes monotonicity and stochastic dominance allow ex ante identification of the worst prior, the situation can be quite involved in general; indeed, the marginal beliefs may change stochastically, sometimes depending on the action of the agent, because the agent always picks the worst marginal prior from the set of all possible priors. The current paper may form the basis for building a theory of dynamic stopping games with multiple priors. Concrete examples for such applications are patent races, the war of attrition, and innovations in duopoly with multiple priors. Another application is strategic delay in entry games (the discrete time version of Weeds (2002) is now feasible, for example); for irreversible investment games as in Grenadier (2002), the dynamically consistent theory of optimal stopping is a basic building block in the arguments. Therefore, our results may be used in these papers to generalize to multiple priors.
888
FRANK RIEDEL
From a more general perspective, one can now develop the general theory of Markov perfect equilibria in multiple priors stopping games along the lines of Fudenberg and Tirole (1983). Dutta and Rustichini (1993) had an approach for subgame perfect equilibria in stopping games: they reduced the search for equilibria to a single agent stopping problem. If their arguments go through in the multiple prior case, our theory would provide the extension to multiple priors quite easily. Let us also mention the class of Dynkin games. These are zero-sum games in which both agents have the right to stop. A typical application in finance is the so-called Israeli options; these are American options which can be exercised by both buyer and seller. Our results may allow the existence of a value in such games to be proved. A natural next step is, of course, to extend these results to continuous time. Here, the work of Peng (1997) provides a promising framework to extend the current discrete-time results. Recent work also shows that one might generalize our results to the more general class of dynamic variational preferences (Maccheroni, Marinacci, and Rustichini (2006)) or convex risk measures (Föllmer and Penner (2006)). APPENDIX A: THE SPACE X OF Q-UNIFORMLY INTEGRABLE RANDOM VARIABLES The standard literature on multiple priors and coherent risk measures focuses mainly on finite probability spaces or the space of essentially bounded random variables L∞ (Ω F P0 ).25 In some applications in economics and finance, unbounded payoffs occur naturally, for example, when asset returns are modeled by (log-) normal distributions or when one looks at infinite horizon models. In this paper, we thus allow payoffs to be unbounded; however, we need some degree of integrability so as to get well defined expectations and a clean theory. DEFINITION 2: Let Q be a family of probability measures on (Ω F P0 ). We call a random variable X ∈ L0 (Ω F P0 ) Q-uniformly integrable if lim sup EP |X|1{|X|≥K} = 0
K→∞ P∈Q
We denote by X the space of all Q-uniformly integrable random variables. REMARK 4: Under our Assumption 2, X is a subspace of L1 (Ω F P0 ) that is closed when under taking maxima or minima. 25
Delbaen (2002a) discussed extensions to L0 and recently, several authors considered coherent risk measures on Lp -spaces, for example, Kaina and Rüschendorf (2009) and Filipovic and Svindland (2008).
889
OPTIMAL STOPPING WITH MULTIPLE PRIORS
The following lemma is of importance. LEMMA 7: For all X ∈ X , the mapping P → EP X is weakly continuous on Q. PROOF: By definition of the weak topology, the mapping is weakly continuous for bounded X. Now suppose that Pn → P weakly, and let X ∈ X and ε > 0. By uniform integrability, there exists L > 0 such that sup EPn |X|1{|X|≥L} < ε/3 n∈N
and EP |X|1{|X|≥L} < ε/3 Set Y := X1{|X|≤L} − L1{X<−L} + L1{X>L} . As Y is bounded, we have |EPn Y − EP Y | < ε/3 for large n. Note also that |X − Y | ≤ |X|1{|X|≥L} . We then obtain |EPn X − EP X| ≤ EPn |X − Y | + |EPn Y − EP Y | + E P |Y − X| < ε and the proof is complete.
Q.E.D.
APPENDIX B: EQUIVALENT DESCRIPTIONS OF TIME-CONSISTENCY Several notions of time-consistency have been introduced in the literature. For the sake of the reader and our own convenience, we gather them here in the manner used in the text and prove that they are equivalent to each other. In this section, we fix a finite26 time horizon T < ∞. All stopping times τ are dP . . . . for the densities on FT . thus bounded by T . Moreover, we write dQ In the spirit of Epstein and Schneider (2003b), Q is said to be rectangular if for all stopping times τ and all P Q ∈ Q, the measure R given by R(B) = EQ P(B|Fτ )
(B ∈ F )
belongs to Q as well. Following Föllmer and Schied (2002), we call Q stable if for all stopping times τ, sets A ∈ Fτ , and priors P Q ∈ Q, there exists a unique measure R ∈ Q such that R = P on Fτ and for all random variables Z ∈ X one has (11)
ER [Z|Fτ ] = EP [Z|Fτ ]1Ac + EQ [Z|Fτ ]1A
26 For infinite time horizon, we call a family of probability measures time-consistent if it is time-consistent for all finite horizons T > 0. It is thus enough to consider the finite horizon here.
890
FRANK RIEDEL
LEMMA 8: The following assertions are equivalent: (i) Q is time-consistent. (ii) Q is stable. (iii) Q is rectangular. PROOF —Time-Consistency Implies Stability: Suppose that Q is timeconsistent. Fix a stopping time τ, sets A ∈ Fτ , and priors P Q ∈ Q. Let (pt ) and (qt ) be the density processes of P and Q with respect to P0 . Define a new stopping time σ = τ1A + T 1Ac . By time-consistency, the measure R given by pσ dQ dR = dP0 qσ dP0 belongs to DT Note that dR pτ dQ dP = 1A + 1 Ac dP0 qτ dP0 dP0 Taking conditional expectations, we get dR = pτ dP0 Fτ Hence, R = P on Fτ . An application of the generalized Bayes formula27 yields (11). Stability Implies Rectangularity: Fix a stopping time τ and P Q ∈ Q. Take A = Ω. By stability, there exists a measure R ∈ Q with R = P on Fτ and (11). Take Z = 1B for B ∈ F . Equation (11) yields R(B|Fτ ) = Q(B|Fτ ). As R = P on Fτ , we obtain R(B) = ER R(B|Fτ ) = EP R(B|Fτ ) = EP Q(B|Fτ ) Rectangularity Implies Time-Consistency: Let P Q ∈ Q and let τ be a stopping time. Define R by setting pτ dQ dR = dP0 qτ dP0 For B ∈ F , we obtain, by conditioning and using Bayes’ formula, pτ dQ R(B) = EP0 1B qτ dP0 27
See, for example, Karatzas and Shreve (1991, p. 193).
OPTIMAL STOPPING WITH MULTIPLE PRIORS
= EP0
dQ pτ P0 E 1B F τ qτ dP0
891
= EP0 [pτ Q(B|Fτ )] = EP Q(B|Fτ ) Rectangularity yields R ∈ Q. APPENDIX C: PROPERTIES OF MULTIPLE PRIOR EXPECTED VALUES For the sake of the reader, we list here some properties of multiple prior expected values that are used frequently in the arguments of the main text. For bounded random variables, all results are well known. For our purposes, we extend them to Q-uniformly integrable random variables. Let Q be a set of probability measures equivalent to the reference measure P0 . For random variables Z ∈ X , we define the conditional multiple prior expected value πt (Z) = ess inf EP [Z|Ft ] P∈Q
By uniform integrability, πt is well defined. From the properties of conditional expectations and the essential infimum, it follows immediately that πt has the following characteristics: (i) Monotone: For Z ≥ Z in X , we have πt (Z) ≥ πt (Z ). (ii) Conditionally homogeneous of degree 1: For bounded Ft -measurable random variables λ ≥ 0, we have πt (λZ) = λπt (Z) for all Z ∈ X . (iii) Superadditive: For Z Z ∈ X , we have πt (Z + Z ) ≥ πt (Z) + πt (Z ). (iv) Additive with respect to Ft : For Ft -measurable, Z ∈ X , and all Z ∈ X , we have πt (Z + Z ) = Z + πt (Z ). We need the following continuity properties. LEMMA 9: (i) πt is Lipschitz continuous with respect to the sup-norm on L∞ (Ω F P0 ). (ii) πt is continuous from above in the following sense. If Xk ↓ X in X , then πt (Xk ) ↓ πt (X). (iii) Under Assumption 3, πt is continuous from below in the following sense. If Xk ↑ X in X , then πt (Xk ) ↑ πt (X). (iv) Under Assumption 3, πt satisfies bounded convergence in the following sense. If Xk → X, and |Xk | ≤ Z for all k and some Z ∈ X , then πt (Xk ) → πt (X). PROOF: The unconditional version of these results is given in Delbaen (2002a); see Theorems 3.2 and 3.6. They extend easily to the conditional case; see Detlefsen and Scandolo (2005), for example. We have to show that the results carry over to the larger space X .
892
FRANK RIEDEL
Let Xk ↓ X for Xk X ∈ X . Note that for all k we have |Xk | ≤ Z := max{|X0 | |X|} Therefore, we can apply monotone convergence for all P ∈ Q and get EP [Xk |Ft ] ↓ EP [X|Ft ] As a consequence, πt (X) = ess inf lim EP [Xk |Ft ] P∈Q
k→∞
≥ lim inf ess inf EP [Xk |Ft ] = lim inf πt (Xk ) k→∞
P∈Q
k→∞
The inequality πt (X) ≤ lim infk→∞ πt (Xk ) follows from monotonicity. This proves (ii). For (iii), note that the mappings P → EP Xk are weakly continuous on Q by Lemma 7. If Q is weakly compact, Dini’s lemma states that the mappings converge uniformly to P → EP X, and πt (Xk ) ↑ πt (X) follows. The same argument proves (iv). Q.E.D. LEMMA 10: Let T > 0, Z ∈ X , and τ ≤ T be a stopping time. Under Assumptions 3 and 4, there exists a measure P Zτ ∈ Q that coincides with P0 on the σ-field Fτ and ess inf EP [Z|Fτ ] = EP
Zτ
P∈Q
[Z|Fτ ]
PROOF: We show below that there exists a sequence (P m ) ⊂ Q with P m = P0 on Fτ such that m
ess inf EP [Z|Fτ ] = lim EP [Z|Fτ ] P∈Q
m→∞
By Assumption 3, the sequence has a weak limit point P Zτ ∈ Q and ess inf EP [Z|Fτ ] = EP P∈Q
Zτ
[Z|Fτ ]
follows with the help of Lemma 7. To see this, note first that H := ess inf EP [Z|Fτ ] ≤ EP P∈Q
Zτ
[Z|Fτ ]
On the other hand, by monotone continuity, Zτ m m m [Z|Fτ ] = EP Z = lim EP Z = lim EP EP [Z|Fτ ] Zτ m Zτ (P m = P Zτ on Fτ ) = lim EP EP [Z|Fτ ] = EP H
EP
Zτ
EP
Zτ
OPTIMAL STOPPING WITH MULTIPLE PRIORS
893
Then H = EP [Z|Fτ ] follows. It remains to establish the existence of the minimizing sequence (P m ) ⊂ Q. Note first that one can restrict attention to the set Φ = {EP [Z|Fτ ]|P ∈ Q and P = P0 on Fτ }. This is so because for arbitrary P ∈ Q, we can define a new measure R with density Zτ
dP dR dP0 = dP dP0 dP 0 Fτ
Then R = P0 on Fτ . As Q is time-consistent, R ∈ Q. By Bayes’ formula, EP [Z|Fτ ] = ER [Z|Fτ ] We conclude that ess inf EP [Z|Fτ ] = ess inf Φ P∈Q
The existence of the sequence (P m ) ⊂ Q with the desired properties follows if we can show that Φ is downward directed. Hence, let P Pˆ ∈ Q with P = Pˆ = P0 on Fτ . Then ˆ
ˆ
min{EP [Z|Fτ ] EP [Z|Fτ ]} = EP [Z|Fτ ]1A + EP [Z|Fτ ]1Ac ˆ
for A = {EP [Z|Fτ ] < EP [Z|Fτ ]}. We have to show that there exists R ∈ Q with R = P0 on Fτ and ˆ
EP [Z|Fτ ]1A + EP [Z|Fτ ]1Ac = ER [Z|Fτ ] This follows from the equivalent characterization of time-consistency in Lemma 8, (iii). Q.E.D. We obtain the crucial law of iterated expectations as a corollary. LEMMA 11: Under Assumptions 2, 3, and 4, we have for all Q-uniformly integrable random variables Z ∈ X the law of iterated expectations (4). PROOF: Let Z ∈ X and fix t ∈ N. By the above Lemma 10, there exist Q0 Q1 ∈ Q such that ess inf EQ [Z|Ft+1 ] = EQ0 [Z|Ft+1 ] Q∈Q
and ess inf EP [EQ0 [Z|Ft+1 ]|Ft ] = EQ1 [EQ0 [Z|Ft+1 ]|Ft ] P∈Q
By time-consistency, E^{Q1}[E^{Q0}[Z|Ft+1]|Ft] = E^R[Z|Ft] for some R ∈ Q. It follows that
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P\Bigl[\operatorname*{ess\,inf}_{Q\in\mathcal Q} E^Q[Z\mid\mathcal F_{t+1}]\,\Big|\,\mathcal F_t\Bigr]
= E^R[Z\mid\mathcal F_t] \ge \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[Z\mid\mathcal F_t].
\]
The other inequality is obvious. Q.E.D.
LEMMA 12: Let (Xt) be an adapted process that satisfies Assumption 1. Let Q be a set of probability measures that satisfies Assumptions 2, 3, and 4. Set
\[
V_t = \operatorname*{ess\,sup}_{\tau\ge t}\ \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_\tau\mid\mathcal F_t].
\]
There exists a sequence of stopping times (τk) with τk ≥ t and
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_k}\mid\mathcal F_t] \uparrow V_t.
\]

PROOF: Let us note first that the value process (Vt) is well defined and finite: By Assumption 1, there exists Z ∈ X with |Xt| ≤ Z. The conditional expectation EP[Xτ|Ft] is thus well defined for all P ∈ Q and the value Vt is bounded above by EP[Z|Ft] for an arbitrary P ∈ Q. With the help of Lemma 10, we get a lower bound as follows. There exists a measure P ∈ Q such that
\[
V_t \ge \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[-Z\mid\mathcal F_t] = E^P[-Z\mid\mathcal F_t] > -\infty \quad\text{a.s.}
\]
Let us now come to the proof of the theorem. By the usual properties of the essential supremum (see, e.g., Föllmer and Schied (2004, Appendix A.5)), it is enough to show that the set {ess inf_{P∈Q} EP[Xτ|Ft] | τ ≥ t} is upward directed. Choose two stopping times τ0, τ1 ≥ t. Set
\[
A = \Bigl\{\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}\mid\mathcal F_t] > \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_1}\mid\mathcal F_t]\Bigr\}.
\]
Set τ2 = τ0 1_A + τ1 1_{A^c}. Then τ2 is a stopping time that is greater than or equal to t. The proof is complete if we can show that
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_2}\mid\mathcal F_t]
= \max\Bigl\{\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}\mid\mathcal F_t],\ \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_1}\mid\mathcal F_t]\Bigr\}.
\]
It is obvious from the definition of τ2 that the left-hand side is smaller than or equal to the right-hand side. Let us show the other inequality. By Lemma 10, there exist measures P0 and P1 such that
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_i}\mid\mathcal F_t] = E^{P_i}[X_{\tau_i}\mid\mathcal F_t]\qquad(i = 0, 1).
\]
Then we have
\[
\begin{aligned}
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}1_A + X_{\tau_1}1_{A^c}\mid\mathcal F_t]
&\ge \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}\mid\mathcal F_t]\,1_A + \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_1}\mid\mathcal F_t]\,1_{A^c}\\
&= E^{P_0}[X_{\tau_0}\mid\mathcal F_t]\,1_A + E^{P_1}[X_{\tau_1}\mid\mathcal F_t]\,1_{A^c}\\
&= E^{P_0}[X_{\tau_2}\mid\mathcal F_t]\,1_A + E^{P_1}[X_{\tau_2}\mid\mathcal F_t]\,1_{A^c}.
\end{aligned}
\]
Time-consistency of Q and Lemma 8 imply that there exists a measure P2 ∈ Q such that
\[
E^{P_0}[X_{\tau_2}\mid\mathcal F_t]\,1_A + E^{P_1}[X_{\tau_2}\mid\mathcal F_t]\,1_{A^c} = E^{P_2}[X_{\tau_2}\mid\mathcal F_t].
\]
Altogether, we obtain
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_2}\mid\mathcal F_t] \ge E^{P_2}[X_{\tau_2}\mid\mathcal F_t]
\]
and, as P2 ∈ Q,
\[
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_2}\mid\mathcal F_t] = E^{P_2}[X_{\tau_2}\mid\mathcal F_t].
\]
Now our claim follows as we have
\[
\begin{aligned}
\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_2}\mid\mathcal F_t] &= E^{P_2}[X_{\tau_2}\mid\mathcal F_t]
= E^{P_2}[X_{\tau_0}\mid\mathcal F_t]\,1_A + E^{P_2}[X_{\tau_1}\mid\mathcal F_t]\,1_{A^c}\\
&\ge \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}\mid\mathcal F_t]\,1_A + \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_1}\mid\mathcal F_t]\,1_{A^c}\\
&= \max\Bigl\{\operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_0}\mid\mathcal F_t],\ \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P[X_{\tau_1}\mid\mathcal F_t]\Bigr\},
\end{aligned}
\]
where we have used the definition of A in the last line.
Q.E.D.
APPENDIX D: BACKWARD INDUCTION AND TIME-CONSISTENCY: AN EXAMPLE

The following example shows that backward induction and thus time-consistency fail in general for multiple priors when the set of priors does not satisfy Assumption 4.
FIGURE 1.—Tree for Example 3. The decision maker uses two probabilistic models. In model 1, the (conditional) probability of moving up is 1/3 in every node; in model 2, this probability is 2/3. The payoff from stopping is indicated by the bold numbers at the nodes. Time-consistency fails because the decision maker does not take the worst-case probability measure into account. Here, the worst-case probability measure would have probability 2/3 of moving up in the upper node and probability 1/3 of moving up in the lower node.
EXAMPLE 3: Consider a two-period binomial tree as in Figure 1. Let X0, X1, X2 be the sequence of payoffs. We take X0 = x,
\[
X_1 = \begin{cases} 3 & \text{after up,}\\ 1 & \text{after down,}\end{cases}
\qquad
X_2 = \begin{cases} 0 & \text{after up, up,}\\ 6 & \text{after up, down,}\\ 6 & \text{after down, up,}\\ 0 & \text{after down, down.}\end{cases}
\]
The decision maker believes that the up and down moves are independent and identically distributed. She uses two priors. Under the first prior, one moves up with probability 1/3 in all nodes, whereas under the second prior, one moves up with probability 2/3 in all nodes; see Figure 1. If we use backward induction, the value at time 2 is
\[
U_2 = \begin{pmatrix} 0\\ 6\\ 6\\ 0\end{pmatrix}.
\]
At time 1, the minimal conditional expected payoff in the upper node is achieved for the probability 2/3 with a value of 2. From stopping, we get 3. Hence, backward induction prescribes to stop in this node. Similarly, in the lower node, we obtain a value of 2. Finally, at time 0, the value deduced by backward induction is U0 = max{x, 7/3}. Hence, if x ≥ 7/3, backward induction prescribes to stop immediately and one obtains a value of x. On the other hand, consider what happens if one does not stop at all. Then the ex ante multiple prior expected payoff is
\[
\min\{1/9\cdot 0 + 2/9\cdot 6 + 2/9\cdot 6 + 4/9\cdot 0,\ 4/9\cdot 0 + 2/9\cdot 6 + 2/9\cdot 6 + 1/9\cdot 0\} = 8/3.
\]
Hence, if 7/3 ≤ x < 8/3, we conclude that backward induction does not lead to the ex ante optimal solution. One checks easily that the ex ante optimal decision is to wait until time 2, while backward induction would prescribe to stop immediately.
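The arithmetic in Example 3 is easy to verify by machine. The following Python sketch (an illustration added here, not part of the original argument) computes the naive node-by-node backward-induction value and the ex ante multiple prior value of never stopping for the two priors 1/3 and 2/3, reproducing U0 = max{x, 7/3} and the ex ante value 8/3; the value x = 2.5 is a hypothetical choice inside [7/3, 8/3), where the two answers disagree.

```python
from itertools import product

# Two-period tree of Example 3: payoffs from stopping at each date.
x = 2.5                      # illustrative value of X0 with 7/3 <= x < 8/3
X1 = {"u": 3.0, "d": 1.0}
X2 = {("u", "u"): 0.0, ("u", "d"): 6.0, ("d", "u"): 6.0, ("d", "d"): 0.0}
priors = [1/3, 2/3]          # probability of an up move, constant across nodes

# Naive backward induction: minimize over the priors node by node.
def node_value_t1(first):
    cont = min(p * X2[(first, "u")] + (1 - p) * X2[(first, "d")] for p in priors)
    return max(X1[first], cont)

U1 = {first: node_value_t1(first) for first in ("u", "d")}
U0_backward = max(x, min(p * U1["u"] + (1 - p) * U1["d"] for p in priors))

# Ex ante value of the strategy "never stop": minimize over the two i.i.d. priors.
def expected_X2(p):
    return sum(
        (p if w1 == "u" else 1 - p) * (p if w2 == "u" else 1 - p) * X2[(w1, w2)]
        for w1, w2 in product("ud", repeat=2)
    )

value_wait = min(expected_X2(p) for p in priors)

print(U1)             # {'u': 3.0, 'd': 2.0}: backward induction stops in the upper node
print(U0_backward)    # max(x, 7/3)
print(value_wait)     # 8/3: waiting beats stopping immediately whenever x < 8/3
```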
APPENDIX E: PROOF OF LEMMA 2

As we have strictly positive densities, local equivalence of any P^α and P0 is clear. For weak compactness, it is sufficient to show that P^{ab} is closed and that the family {(D^α_t) | (αs) predictable process with values in [a, b]} is uniformly integrable for a fixed t. We start with uniform integrability. It is sufficient to show that the second moment E^{P0}(D^α_t)^2 remains bounded. Now
\[
E^{P_0}(D_t^{\alpha})^2 = E^{P_0}\exp\Bigl(2\sum_{s=1}^{t}(\alpha_s\varepsilon_s - L(\alpha_s))\Bigr)
= E^{P_0}\exp\Bigl(\sum_{s=1}^{t}(2\alpha_s\varepsilon_s - L(2\alpha_s)) + \sum_{s=1}^{t}(L(2\alpha_s) - 2L(\alpha_s))\Bigr).
\]
The continuous function L remains bounded on [a, b] and on [2a, 2b]; hence we have
\[
\sum_{s=1}^{t}(L(2\alpha_s) - 2L(\alpha_s)) \le A
\]
for some A > 0. It follows that
\[
E^{P_0}(D_t^{\alpha})^2 \le e^{A}\,E^{P_0}\exp\Bigl(\sum_{s=1}^{t}(2\alpha_s\varepsilon_s - L(2\alpha_s))\Bigr) = e^{A} < \infty.
\]
This establishes uniform integrability. To see time-consistency, take two density processes D^α and D^β for two predictable processes (αt) and (βt) with values in [a, b]. Let τ be a stopping time and define a new density by setting
\[
d_t = D_t^{\alpha} = \exp\Bigl(\sum_{s=1}^{t}\alpha_s\varepsilon_s - \sum_{s=1}^{t}L(\alpha_s)\Bigr)
\]
for t ≤ τ and
\[
d_t = D_\tau^{\alpha} D_t^{\beta}/D_\tau^{\beta}
= \exp\Bigl(\sum_{s=1}^{\tau}\alpha_s\varepsilon_s - \sum_{s=1}^{\tau}L(\alpha_s) + \sum_{s=\tau+1}^{t}\beta_s\varepsilon_s - \sum_{s=\tau+1}^{t}L(\beta_s)\Bigr)
\]
else. Let γt = αt for t ≤ τ and γt = βt else. It is easy to see that dt = D^γ_t. Hence, the new probability measure belongs to P^{ab} as well. Q.E.D.

APPENDIX F: PROOF OF THEOREM 5

It is enough to treat the finite horizon case. The infinite horizon result follows from Theorem 4. For a predictable process (αt) with values in [a, b], let us write E^α for the expectation under the probability P^α ∈ P^{ab}. We write E^a for the expectation under the probability P^a for which αt = a for all t. For later use, note that the random variables (εt) are independent and identically distributed under P^a with distribution ν^a that has density exp(ax − L(a)) with respect to ν0. We start with a lemma that shows that P^a is the worst-case measure in the sense of first-order stochastic dominance.

LEMMA 13: For all bounded, increasing measurable functions h : Ω → R and all t ≥ 1,
\[
E^a h(\varepsilon_t) = E^a[h(\varepsilon_t)\mid\mathcal F_{t-1}] = \min_{P\in\mathcal P^{ab}} E^P[h(\varepsilon_t)\mid\mathcal F_{t-1}].
\]
PROOF: Let (αt) be a predictable process with values in [a, b]. The density of P^α with respect to P^a is
\[
\frac{dP^{\alpha}}{dP^{a}}\Big|_{\mathcal F_t} = \frac{D_t^{\alpha}}{D_t^{a}}
= \exp\Bigl(\sum_{s=1}^{t}\bigl[(\alpha_s\varepsilon_s - L(\alpha_s)) - (a\varepsilon_s - L(a))\bigr]\Bigr).
\]
The generalized Bayes law implies
\[
E^{\alpha}[h(\varepsilon_t)\mid\mathcal F_{t-1}]
= E^{a}\Bigl[h(\varepsilon_t)\exp\Bigl(\sum_{s=1}^{t}\bigl[(\alpha_s\varepsilon_s - L(\alpha_s)) - (a\varepsilon_s - L(a))\bigr]\Bigr)\Big|\,\mathcal F_{t-1}\Bigr]
\cdot\exp\Bigl(\sum_{s=1}^{t-1}\bigl[(a\varepsilon_s - L(a)) - (\alpha_s\varepsilon_s - L(\alpha_s))\bigr]\Bigr)
\]
\[
= E^{a}\bigl[h(\varepsilon_t)\exp\bigl((\alpha_t - a)\varepsilon_t + L(a) - L(\alpha_t)\bigr)\mid\mathcal F_{t-1}\bigr]
= E^{a}[h(\varepsilon_t)\mid\mathcal F_{t-1}]\,E^{a}\bigl[\exp\bigl((\alpha_t - a)\varepsilon_t + L(a) - L(\alpha_t)\bigr)\mid\mathcal F_{t-1}\bigr]
+ \operatorname{cov}^{a}\bigl(h(\varepsilon_t),\exp((\alpha_t - a)\varepsilon_t + L(a) - L(\alpha_t))\mid\mathcal F_{t-1}\bigr).
\]
Under P^a, εt is independent of F_{t−1} and the density has expectation 1; hence we get
\[
= E^{a}h(\varepsilon_t) + \operatorname{cov}^{a}\bigl(h(\varepsilon_t),\exp((\alpha_t - a)\varepsilon_t + L(a) - L(\alpha_t))\mid\mathcal F_{t-1}\bigr),
\]
where cov^a denotes the conditional covariance under P^a. As αt − a ≥ 0, we have to compute the conditional covariance under P^a of two monotone increasing functions of εt, which is positive. As a consequence, we get
\[
E^{\alpha}h(\varepsilon_t) \ge E^{a}h(\varepsilon_t).
\]
Q.E.D.
We show now by backward induction that the value functions Ut are bounded, increasing functions of εt. For t = T, the assertion is clear, UT = f(T, εT), and f is bounded and increasing by assumption. For the induction step, note that
\[
U_t = \max\Bigl\{f(t,\varepsilon_t),\ \operatorname*{ess\,inf}_{P\in\mathcal P^{ab}} E^P[U_{t+1}\mid\mathcal F_t]\Bigr\}.
\]
By the induction hypothesis, U_{t+1} is a bounded, increasing function of ε_{t+1}, say U_{t+1} = u_t(ε_{t+1}) for some bounded, increasing, measurable function u_t. By Lemma 13, we thus have U_t = max{f(t, ε_t), E^a[U_{t+1}|F_t]}. As ε_{t+1} is independent of F_t under P^a, we get
\[
(12)\qquad U_t = \max\{f(t,\varepsilon_t),\ E^a u_t(\varepsilon_{t+1})\}.
\]
Hence, Ut is a bounded, increasing function of εt .
Finally, we tackle the proof of Theorem 5. Let U^a be the value function for the classical optimal stopping problem under the prior P^a. We show U = U^a by backward induction. For t = T, we have U_T = f(T, ε_T) = U^a_T. For t < T, note from Equation (12) that U_t = max{f(t, ε_t), E^a U_{t+1}}; hence, U_t satisfies the same recursion as U^a, and U_t = U^a_t follows.
Q.E.D.
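The reduction behind Theorem 5 is straightforward to implement. The Python sketch below is illustrative only: the payoff is the discounted-price payoff of the house-selling problem treated in Appendix G, and the parameters a, σ, δ, and T are made up. It solves the finite-horizon monotone problem by classical backward induction under the worst-case prior P^a, using the recursion U_t = max{f(t, ε_t), E^a u_t(ε_{t+1})} from Equation (12) with the N(a, σ²) worst-case law discretized on a grid.

```python
import numpy as np

# Hypothetical parameters (not from the paper): worst-case drift a, noise scale sigma,
# discount factor delta, horizon T.  The payoff f(t, eps) = delta**t * exp(eps) is the
# discounted-price payoff of the house-selling example; it is increasing in eps.
a, sigma, delta, T = -0.1, 0.3, 0.95, 20

def f(t, eps):
    return delta**t * np.exp(eps)

# Under the worst-case prior P^a the innovations are i.i.d. N(a, sigma^2).
# Discretize that law on a grid to compute the unconditional expectation E^a u_t(eps).
grid = np.linspace(a - 5 * sigma, a + 5 * sigma, 2001)
w = np.exp(-(grid - a)**2 / (2 * sigma**2))
w /= w.sum()

# Backward induction (Equation (12)): U_T = f(T, eps), U_t = max{f(t, eps), E^a U_{t+1}}.
U = f(T, grid)
continuation = []                  # E^a u_t(eps_{t+1}) for t = T-1, ..., 0
for t in range(T - 1, -1, -1):
    cont = float(w @ U)            # a number: eps_{t+1} is independent of F_t under P^a
    continuation.append(cont)
    U = np.maximum(f(t, grid), cont)

# The optimal rule is a threshold rule: stop at t as soon as f(t, eps_t) exceeds the
# continuation value computed above.
print(continuation[::-1][:5])
```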
APPENDIX G: TECHNICAL DETAILS FOR THE HOUSE SELLING PROBLEM (EXAMPLE 1)

We show first how to embed the problem into the framework of Section 4.1. Let S = R, let S be the Borel sets, and let ν0 be a distribution on (S, S) with well defined Laplace transform. Define discounted prices as δ^t p_t = f(t, ε_t) = δ^t exp(ε_t). Then we have a monotone problem. Note that P^a = ⊗_{t=1}^∞ ν^a for the measure ν^a(dx) = exp(ax − L(a)) ν0(dx). Hence, the (ε_t) are independent and identically distributed with distribution ν^a under P^a.

Consider now the special case of log-normal prices. In this case, take ν0 = N(0, σ²). Then L(λ) = σ²λ²/2. Fix some predictable process (α_t) with values in the interval [a, b]. The conditional distribution of ε_t given F_{t−1} is
\[
P(\varepsilon_t\in dx\mid\mathcal F_{t-1}) = E^0\bigl[D_t^{\alpha}/D_{t-1}^{\alpha}\,1_{\{\varepsilon_t\in dx\}}\mid\mathcal F_{t-1}\bigr]
= E^0\bigl[\exp(\alpha_t x - \sigma^2\alpha_t^2/2)\,1_{\{\varepsilon_t\in dx\}}\mid\mathcal F_{t-1}\bigr]
= \exp(\alpha_t x - \sigma^2\alpha_t^2/2)\,P^0(\varepsilon_t\in dx)
\propto \exp\bigl(-(x-\alpha_t)^2/(2\sigma^2)\bigr)\,dx.
\]
Conditional on F_{t−1}, the random variable ε_t is normally distributed with mean α_t (which is known at time t − 1) and variance σ². Note that ε_t is not normally distributed in general because the α_t can be stochastic. However, if α_t = a a.s. for all t, the (ε_t) are independent and normally distributed.

APPENDIX H: TECHNICAL DETAILS FOR THE PARKING PROBLEM (EXAMPLE 2)

We show first how to formulate a time-consistent set of priors for binary random variables in the spirit of Section 4.1. Let S = {0, 1} and take ν0 to be uniform on S. Hence, under P0 = ⊗_{t=1}^∞ ν0, the projections (ε_t) are independent and identically distributed with P0[ε_t = 1] = P0[ε_t = 0] = 1/2. Consider now the set P^{ab} as defined in Section 4.1. Fix a predictable process (α_t). By using
Bayes' law, we get
\[
P^{\alpha}[\varepsilon_t = 1\mid\mathcal F_{t-1}] = E^0\bigl[D_t^{\alpha}/D_{t-1}^{\alpha}\,1_{\{\varepsilon_t=1\}}\mid\mathcal F_{t-1}\bigr]
= E^0\bigl[\exp(\alpha_t - L(\alpha_t))1_{\{\varepsilon_t=1\}}\mid\mathcal F_{t-1}\bigr]
= \exp(\alpha_t - L(\alpha_t))\,P^0(\varepsilon_t = 1) = \frac{\exp(\alpha_t)}{1 + \exp(\alpha_t)},
\]
where we used that (α_t) is predictable. Hence, the probability for ε_t = 1 is always in the interval [p̲, p̄] with
\[
\underline p = \frac{\exp(a)}{1+\exp(a)},\qquad \overline p = \frac{\exp(b)}{1+\exp(b)}.
\]
We denote by P̲ and P̄ the probability measures under which the random variables (ε_t) are identically and independently distributed with P̲[ε_t = 1] = p̲ and P̄[ε_t = 1] = p̄, respectively.

APPENDIX I: PROOF OF THEOREM 6

We give the proof for increasing A. The infinite horizon result follows from Theorem 4 once we have established the result for the finite horizon. So let T > 0. We prove by backward induction that Ut = u(t, St) for a function u(t, S) that is increasing in S; moreover, u(t, St) = U̲_t. We clearly have U_T = A(T, S_T) = U̲_T, and the claim is thus valid for t = T. We have for t < T,
\[
U_t = \max\Bigl\{A(t,S_t),\ \min_{p_{t+1}\in[\underline p,\overline p]}\bigl[p_{t+1}u(t+1,S_t u) + (1-p_{t+1})u(t+1,S_t d)\bigr]\Bigr\}.
\]
By induction hypothesis, u(t+1, St u) ≥ u(t+1, St d); thus
\[
U_t = \max\bigl\{A(t,S_t),\ \underline p\,u(t+1,S_t u) + (1-\underline p)\,u(t+1,S_t d)\bigr\} = \underline U_t.
\]
Hence, Ut is a function u(t, St). As A is increasing in S by assumption and u(t+1, S) is increasing in S by induction hypothesis, u(t, S) is increasing in S. Q.E.D.
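The backward induction in this proof is easy to implement. The following Python sketch is an illustration with made-up parameters, not taken from the paper: it computes the multiple prior value of an American-style claim with an increasing payoff A(t, S) in a finite CRR tree by minimizing over p ∈ {p̲, p̄} at every node (the one-step expectation is linear in p, so an endpoint attains the minimum), and compares the result with the classical value under the single worst-case probability p̲, which Theorem 6 says should coincide.

```python
# Hypothetical CRR parameters: up factor u, down factor d = 1/u, bounds on the
# one-step up probability, horizon T.  Interest is ignored for simplicity.
u, d, T = 1.1, 1 / 1.1, 10
p_low, p_high = 0.4, 0.6
S0 = 100.0

def A(t, S):
    # Example of a payoff increasing in S (an American call with a strike of 100).
    return max(S - 100.0, 0.0)

def stock(t, k):
    # Stock value after k up moves out of t steps.
    return S0 * u**k * d**(t - k)

# Multiple prior value: minimize the one-step expectation over {p_low, p_high}.
V = {k: A(T, stock(T, k)) for k in range(T + 1)}
W = dict(V)   # classical value under the worst-case probability p_low only
for t in range(T - 1, -1, -1):
    V = {k: max(A(t, stock(t, k)),
                min(p * V[k + 1] + (1 - p) * V[k] for p in (p_low, p_high)))
         for k in range(t + 1)}
    W = {k: max(A(t, stock(t, k)),
                p_low * W[k + 1] + (1 - p_low) * W[k])
         for k in range(t + 1)}

print(V[0], W[0])   # the two values agree for increasing payoffs
```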
APPENDIX J: BARRIER OPTIONS^{28}

The other claims being clear, we show here only that P̲[τ_H ≤ t] ≤ P[τ_H ≤ t] for all P ∈ Q holds true. By assumption, we have S0 = 1 and H = u^n for some n ≥ 1. Let us write
\[
\xi_t = \sum_{s=1}^{t}\varepsilon_s,\qquad \sigma_m = \min\{k : \xi_k = m\},\qquad M_t = \max(0,\xi_1,\ldots,\xi_t).
\]
Let us show
\[
\underline P[M_t \ge m] \le P[M_t \ge m]
\]
for all t ≥ 1 and m odd when t is odd and m even otherwise, and all P ∈ Q, by using induction over t. For t = 1, we have
\[
P[M_1 \ge m] = 0 \qquad (P \in \mathcal Q)
\]
for m ≥ 2. For m = 1, we have P̲[M_1 ≥ 1] = p̲ ≤ P[M_1 ≥ 1] for all P ∈ Q. Now let t > 1 and assume that the claim is true for all s < t. We have
\[
(13)\qquad P[M_t \ge m] = P[M_{t-1}\ge m+1] + P[M_{t-1}=m-1,\ \varepsilon_t = 1]
= P[M_{t-1}\ge m+1] + E^P\bigl[P[\varepsilon_t = 1\mid\mathcal F_{t-1}]\,1_{\{M_{t-1}=m-1\}}\bigr]
\ge P[M_{t-1}\ge m+1] + \underline p\,P[M_{t-1}=m-1].
\]
Now we distinguish two cases. If P[M_{t−1} = m−1] ≥ P̲[M_{t−1} = m−1], then by the induction hypothesis,
\[
P[M_t \ge m] \ge \underline P[M_{t-1}\ge m+1] + \underline p\,\underline P[M_{t-1}=m-1] = \underline P[M_t \ge m].
\]

^{28}By Jörg Vorbrink.
If P[M_{t−1} = m−1] ≤ P̲[M_{t−1} = m−1], then rewrite (13) as
\[
\begin{aligned}
P[M_t \ge m] &\ge P[M_{t-1}\ge m+1] + \underline p\,P[M_{t-1}=m-1]\\
&= P[M_{t-1}\ge m-1] + (\underline p-1)\,P[M_{t-1}=m-1]\\
&\ge \underline P[M_{t-1}\ge m-1] + (\underline p-1)\,P[M_{t-1}=m-1]\\
&\ge \underline P[M_{t-1}\ge m-1] + (\underline p-1)\,\underline P[M_{t-1}=m-1]\\
&= \underline P[M_{t-1}\ge m+1] + \underline p\,\underline P[M_{t-1}=m-1] = \underline P[M_t\ge m].
\end{aligned}
\]

APPENDIX K: SHOUT OPTIONS^{29}

As in the previous examples, we fix the finite ambiguous CRR model introduced in Section 4.3. To evaluate the European claim at the time τ we consider the payoff
\[
f(t,S_t) = \begin{cases} 0 & \tau\le t < T,\\ (S_\tau - S_T)^+ & \text{else.}\end{cases}
\]
As Theorem 6 holds for all values of τ and the function is decreasing in its second variable, we obtain for the value of the stopping problem at time τ,
\[
(14)\qquad s(\tau,S_\tau) = \operatorname*{ess\,sup}_{\sigma\ge\tau}\ \operatorname*{ess\,inf}_{P\in\mathcal Q} E^P\bigl[(S_\tau-S_T)^+/(1+r)^{T-\tau}\mid\mathcal F_\tau\bigr]
= E^{\overline P}\bigl[(S_\tau-S_T)^+/(1+r)^{T-\tau}\mid\mathcal F_\tau\bigr].
\]
Clearly the claim is equal to the classical at-the-money European put starting at time τ and maturing at T. Therefore, Equation (14) provides the discounted expected value of the European put at the time of shouting. Using the structure of the CRR model, we can write the value of the claim at time τ as
\[
s(\tau,S_\tau) = E^{\overline P}\Bigl[S_\tau/(1+r)^{T-\tau}\Bigl(1-\prod_{v=\tau+1}^{T}u^{\varepsilon_v}d^{1-\varepsilon_v}\Bigr)^+\Big|\,\mathcal F_\tau\Bigr]
= S_\tau/(1+r)^{T-\tau}\,E^{\overline P}\Bigl[\Bigl(1-\prod_{v=\tau+1}^{T}u^{\varepsilon_v}d^{1-\varepsilon_v}\Bigr)^+\Big|\,\mathcal F_\tau\Bigr],
\]
where we used the fact that the random variables τ and S_τ are F_τ-measurable in the finite model. As was shown in Section 4.3, the random variables (ε_t) are independent and identically distributed under P̄.

^{29}By Tatjana Chudjakow.
Thus, using T* = T − τ, the expectation can be written as
\[
\frac{S_\tau}{(1+r)^{T^*}}\sum_{k<T^*/2}\binom{T^*}{k}\Bigl(\overline p^{\,k}(1-\overline p)^{T^*-k} - [\,\overline p\cdot u]^{k}\,[d(1-\overline p)]^{T^*-k}\Bigr),
\]
and hence
\[
(16)\qquad s(\tau,S_\tau) = E^{\overline P}\bigl[(S_\tau-S_T)^+/(1+r)^{T-\tau}\mid\mathcal F_\tau\bigr] = S_\tau\,g(\tau).
\]
As g(τ) ≥ 0 for all τ ≤ T, s(τ, S_τ) is increasing in its second variable. Again using Theorem 4.5, we can conclude that for t < τ, the worst-case measure is given by P̲. Using time-consistency, we obtain for the density of the worst-case measure
\[
D^*_l = \prod_{v=1}^{\tau\wedge l}\exp(a\varepsilon_v - L(a))\ \prod_{j=\tau+1}^{l}\exp(b\varepsilon_j - L(b)).
\]
Q.E.D.
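As a numerical cross-check of the shout-option value, the following Python sketch is illustrative only: the up factor, interest rate, worst-case probability p̄, and remaining horizon below are made-up parameters, and d = 1/u is assumed. It computes g(τ) by direct enumeration of the post-shout paths under P̄ and compares the result with the binomial-sum expression for g(τ).

```python
from itertools import product
from math import comb

# Hypothetical parameters: CRR up factor, interest rate, worst-case up probability
# after the shout, and remaining time to maturity T* = T - tau.
u_, r, p_bar, T_star = 1.1, 0.02, 0.6, 8
d_ = 1 / u_                     # d = 1/u is assumed here

# Direct enumeration: g(tau) = E_bar[(1 - prod u^eps d^(1-eps))^+] / (1+r)^T*.
g_enum = 0.0
for eps in product((0, 1), repeat=T_star):
    ups = sum(eps)
    prob = p_bar**ups * (1 - p_bar)**(T_star - ups)
    ratio = u_**ups * d_**(T_star - ups)            # S_T / S_tau
    g_enum += prob * max(1 - ratio, 0.0)
g_enum /= (1 + r)**T_star

# Binomial sum: only k < T*/2 up moves leave the at-the-money put in the money.
g_sum = sum(
    comb(T_star, k) * (p_bar**k * (1 - p_bar)**(T_star - k)
                       - (p_bar * u_)**k * (d_ * (1 - p_bar))**(T_star - k))
    for k in range(T_star) if k < T_star / 2
) / (1 + r)**T_star

print(g_enum, g_sum)   # the two numbers agree; s(tau, S_tau) = S_tau * g(tau)
```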
APPENDIX L: OPTIONS WITH U-SHAPED PAYOFFS^{30}

We fix the finite ambiguous CRR model of Section 4.3. To evaluate U-shaped payoffs, we first show by backward induction that Ut = v(t, St) for a U-shaped function v and then compute the worst-case measure. So let T > 0. We prove by backward induction that for every t < T, Ut = v(t, St) for a function v(t, S) that is increasing in St for St ≥ xt for some xt and decreasing for St < xt. As the payoff functions g(t, x) are U-shaped, for every t ≤ T there exists a minimum of g(t, St), say in K*_t. At time T we clearly have UT = g(T, ST), which is decreasing for ST < K*_T and increasing otherwise. Thus, the claim is valid for t = T. Using the induction hypothesis and time-consistency, we obtain for t < T,
\[
U_t = \max\Bigl\{g(t,S_t),\ \min_{p_{t+1}\in[\underline p,\overline p]}\bigl[p_{t+1}v(t+1,S_t\cdot u) + (1-p_{t+1})v(t+1,S_t\cdot d)\bigr]\Bigr\}.
\]
By induction hypothesis, there exists an x_{t+1} that divides the values of S_{t+1} into two regions: the region
\[
D_{t+1} := \{S_{t+1}\le x_{t+1}\},
\]

^{30}By Tatjana Chudjakow.
where the function v(t+1, S_{t+1}) decreases, and the region
\[
I_{t+1} := \{S_{t+1}\ge x_{t+1}\},
\]
where v(t+1, S_{t+1}) increases. Note that the boundary point x_{t+1} is included in both regions. This convention ensures that the boundary cases St · u = x_{t+1} > St · d and St · u > x_{t+1} = St · d do not need to be analyzed separately. Therefore, for the induction step it is sufficient to consider two cases: the case where St satisfies
\[
S_t\cdot u > S_t\cdot d \ge x_{t+1}
\]
and the case
\[
x_{t+1} \ge S_t\cdot u > S_t\cdot d.
\]
In the first case, we are in the situation of the monotone increasing case of Theorem 6 and have by induction hypothesis v(t+1, St·u) ≥ v(t+1, St·d); thus
\[
(17)\qquad U_t = \max\bigl\{g(t,S_t),\ \underline p\,v(t+1,S_t\cdot u) + (1-\underline p)\,v(t+1,S_t\cdot d)\bigr\}.
\]
Hence, Ut is a function v(t, St). Furthermore, g is increasing in St for St ≥ K*_t by assumption and v(t+1, S_{t+1}) is increasing in S_{t+1} by induction hypothesis. Therefore, there exists an xt such that v(t, St) is increasing in St for all St ≥ xt. With the same argument, one obtains the analogous result for the decreasing region of v, and the proof is complete. This iterative procedure provides the sequence x = (x_t)_{t=1,…,T}.

In the second step, we compute the density corresponding to the worst-case measure P* in this case. As one can see from Equation (17), P*[ε_{t+1} = 1|Ft] = p̲ if St ≥ xt. Similarly, one obtains P*[ε_{t+1} = 1|Ft] = p̄ for St < xt, that is,
\[
(18)\qquad P^*[\varepsilon_{t+1}=1\mid\mathcal F_t] = \begin{cases}\underline p & \text{if } S_t\ge x_t,\\ \overline p & \text{else.}\end{cases}
\]
Using the definition of p̲ and p̄, one can paste together the densities. This leads to
\[
D^*_t = \prod_{v\le t,\,S_v\ge x_v}\exp(a\varepsilon_v - L(a))\ \prod_{v\le t,\,S_v< x_v}\exp(b\varepsilon_v - L(b)).
\]
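The two-step construction (value function first, then the thresholds x_t and the pasted worst-case measure in (18)) can be illustrated numerically. The Python sketch below uses made-up parameters and a straddle-type U-shaped payoff that is not taken from the paper; it runs the backward induction in a finite CRR tree, minimizing over p ∈ {p̲, p̄} at every node, and records which of the two probabilities attains the minimum, which is the node-by-node description of P*.

```python
# Hypothetical ambiguous CRR tree with a U-shaped payoff g(t, S) = |S - K|.
u, d, T, S0, K = 1.1, 1 / 1.1, 6, 100.0, 100.0
p_low, p_high = 0.4, 0.6

def g(t, S):
    return abs(S - K)          # U-shaped in S with minimum at K

def stock(t, k):               # price after k up moves out of t steps
    return S0 * u**k * d**(t - k)

V = {k: g(T, stock(T, k)) for k in range(T + 1)}
worst_prob = {}                # worst-case P*[eps_{t+1} = 1] at node (t, k)
for t in range(T - 1, -1, -1):
    newV = {}
    for k in range(t + 1):
        cont_low = p_low * V[k + 1] + (1 - p_low) * V[k]
        cont_high = p_high * V[k + 1] + (1 - p_high) * V[k]
        # The minimizing one-step probability defines the pasted measure P*.
        worst_prob[(t, k)] = p_low if cont_low <= cont_high else p_high
        newV[k] = max(g(t, stock(t, k)), min(cont_low, cont_high))
    V = newV

# Nodes with S_t above the implicit threshold x_t get p_low, the others p_high.
for t in range(T):
    print(t, [(round(stock(t, k), 2), worst_prob[(t, k)]) for k in range(t + 1)])
```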
REFERENCES

ARROW, K. (1971): Essays in the Theory of Risk Bearing. Chicago: Markham. [865]
ARTZNER, P., F. DELBAEN, J.-M. EBER, AND D. HEATH (1999): “Coherent Measures of Risk,” Mathematical Finance, 9, 203–228. [858] ARTZNER, P., F. DELBAEN, J.-M. EBER, D. HEATH, AND H. KU (2007): “Coherent Multiperiod Risk Adjusted Values and Bellman’s Principle,” Annals of Operations Research, 152, 5–22. [863, 866] BIER, M., AND F. RIEDEL (2009): “Time-Consistent Multiple Prior Models in Trees,” Mimeo, Bielefeld University. [874] CHATEAUNEUF, A., F. MACCHERONI, M. MARINACCI, AND J.-M. TALLON (2005): “Monotone Continuous Multiple Priors,” Economic Theory, 26, 973–982. [865] CHOW, Y., H. ROBBINS, AND D. SIEGMUND (1971): Great Expectations: The Theory of Optimal Stopping. Boston: Houghton Mifflin. [860,868,875] COQUET, F., Y. HU, J. MÉMIN, AND S. PENG (2002): “Filtration-Consistent Nonlinear Expectations and Related g-Expectations,” Probability Theory and Related Fields, 123, 1–27. [863] COX, J. C., S. A. ROSS, AND M. RUBINSTEIN (1979): “Option Pricing: A Simplified Approach,” Journal of Financial Economics, 7, 229–263. [876] DELBAEN, F. (2002a): “Coherent Risk Measures on General Probability Spaces,” in Essays in Honour of Dieter Sondermann, ed. by K. Sandmann and P. Schönbucher. New York: Springer Verlag, 1–37. [863,888,891] (2002b): “The Structure of m-Stable Sets and in Particular of the Set of Risk Neutral Measures,” Mimeo, ETH Zürich. [859,867] DETLEFSEN, K., AND G. SCANDOLO (2005): “Conditional and Dynamic Convex Risk Measures,” Finance and Stochastics, 9, 539–561. [866,891] DIXIT, A., AND R. PINDYCK (1994): Investment Under Uncertainty. Princeton, NJ: Princeton University Press. [857,868] DOOB, J. (1998): Measure Theory. Berlin: Springer Verlag. [867] DUFFIE, D. (1992): Dynamic Asset Pricing Theory. Princeton, NJ: Princeton University Press. [857] DUTTA, P. K., AND A. RUSTICHINI (1993): “A Theory of Stopping Time Games With Applications to Product Innovations and Asset Sales,” Economic Theory, 3, 743–763. [888] EICHBERGER, J., AND D. KELSEY (1996): “Uncertainty Aversion and Dynamic Consistency,” International Economic Review, 37, 625–640. [859] EL KAROUI, N., AND M.-C. QUENEZ (1995): “Dynamic Programming and Pricing of Contingent Claims in an Incomplete Market,” SIAM Journal of Control and Optimization, 33, 29–66. [883] EPSTEIN, L., AND Z. CHEN (2002): “Ambiguity, Risk and Asset Returns in Continuous Time,” Econometrica, 70, 1403–1443. [873] EPSTEIN, L. G., AND M. LEBRETON (1993): “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory, 61, 1–22. [866] EPSTEIN, L., AND M. MARINACCI (2006): “Mutual Absolute Continuity of Multiple Priors,” Journal of Economic Theory, 137, 716–720. [864] EPSTEIN, L., AND M. SCHNEIDER (2003a): “IID: Independently and Indistinguishably Distributed,” Journal of Economic Theory, 113, 32–50. [861] (2003b): “Recursive Multiple Priors,” Journal of Economic Theory, 113, 1–31. [858,859, 863,866,867,889] FERGUSON, T. (2006): “Optimal Stopping and Applications,” Electronic Text, University of California, Los Angeles, available at http://www.math.ucla.edu/~tom/Stopping/Contents.html. [876] FILIPOVIC, D., AND G. SVINDLAND (2008): “Optimal Capital and Risk Allocations for Law- and Cash-Invariant Convex Functions,” Finance and Stochastics, 12, 423–439. [888] FÖLLMER, H., AND Y. KABANOV (1998): “Optional Decomposition and Lagrange Multipliers,” Finance and Stochastics, 2, 1–25. [883] FÖLLMER, H., AND I. PENNER (2006): “Convex Risk Measures and the Dynamics of Their Penalty Functions,” Statistics and Decisions, 24, 61–96. [888] FÖLLMER, H., AND A. 
SCHIED (2002): Stochastic Finance, Studies in Mathematics, Vol. 27. Berlin: de Gruyter. [889]
(2004): Stochastic Finance (Second Ed.). Berlin: de Gruyter. [859,861,863,865,867,869, 885,894] FUDENBERG, D., AND J. TIROLE (1983): “Capital as a Commitment: Strategic Investment to Deter Mobility,” Journal of Economic Theory, 31, 227–250. [888] GILBOA, I., AND D. SCHMEIDLER (1989): “Maxmin Expected Utility With Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. [858,865] GRENADIER, S. (2002): “Option Exercise Games: An Application to the Equilibrium Investment Strategies of Firms,” Review of Financial Studies, 15, 691–721. [887] HANSEN, L., AND T. SARGENT (2001): “Robust Control and Model Uncertainty,” American Economic Review Papers and Proceedings, 91, 60–66. [858] KAINA, M., AND L. RÜSCHENDORF (2009): “On Convex Risk Measures on Lp -Spaces,” Mathematical Methods of Operations Research (forthcoming). [888] KARATZAS, I., AND S. KOU (1998): “Hedging American Contingent Claims With Constrained Portfolios,” Finance and Stochastics, 2, 215–258. [861,869] KARATZAS, I., AND S. E. SHREVE (1991): Brownian Motion and Stochastic Calculus. New York: Springer Verlag. [890] KARATZAS, I., AND I. ZAMFIRESCU (2003): “Game Approach to the Optimal Stopping Problem,” Working Paper, Columbia University. [863] KRAMKOV, D. (1996): “Optional Decomposition of Supermartingales and Hedging Contingent Claims in Incomplete Security Markets,” Probability Theory and Related Fields, 105, 459–479. [860,883] LERCHE, R., R. KEENER, AND M. WOODROOFE (1994): “A Generalized Parking Problem,” in Statistical Decision Theory and Related Topics V, ed. by J. B. S. S. Gupta. Berlin: Springer Verlag, 523–532. [875] MACCHERONI, F., M. MARINACCI, AND A. RUSTICHINI (2006): “Dynamic Variational Preferences,” Journal of Economic Theory, 128, 4–44. [888] MACHINA, M. (1989): “Dynamic Consistency and Non-Expected Utility Models of Choice Under Uncertainty,” Journal of Economic Literature, 27, 1622–1668. [859] MIAO, J., AND N. WANG (2004): “Risk, Uncertainty, and Option Exercise,” Working Paper, Boston University. [863] NISHIMURA, K. G., AND H. OZAKI (2004): “Search and Knightian Uncertainty,” Journal of Economic Theory, 119, 299–333. [863] (2007): “Irreversible Investment and Knightian Uncertainty,” Journal of Economic Theory, 136, 668–694. [863] PENG, S. (1997): “BSDE and Related g-Expectation,” in Backward Stochastic Differential Equation, Pitman Research Notes in Mathematics, Vol. 364, ed. by N. El Karoui and L. Mazliak. Reading, MA: Addison-Wesley. [863,888] PORTEUS, E. (1990): Foundations of Stochastic Inventory Theory. Standford, CA: Stanford University Press. [858] RIEDEL, F. (2004): “Dynamic Coherent Risk Measures,” Stochastic Processes and Their Applications, 112, 185–200. [858,859,866-868] SARIN, R., AND P. WAKKER (1998): “Dynamic Choice and Nonexpected Utility,” Journal of Risk and Uncertainty, 17, 87–119. [859,866] SNELL, L. (1952): “Applications of the Martingale Systems Theorem,” Transactions of the American Mathematical Society, 73, 293–312. [859,860,868] TUTSCH, S. (2006): “Konsistente und Konsequente Dynamische Risikomaße und das Problem der Aktualisierung,” Ph.D. Thesis, Humboldt University at Berlin. [864] WALD, A. (1947): Sequential Analysis. New York: Wiley. [858] WEEDS, H. (2002): “Strategic Delay in a Real Options Model of R&D Competition,” Review of Economic Studies, 69, 729–747. [887] YOO, K.-R. (1991): “The Iterative Law of Expectation and Non-Additive Probability Measures,” Economics Letters, 37, 145–149. [859]
ZAMFIRESCU, I.-M. (2003): “Optimal Stopping Under Model Uncertainty,” Ph.D. Thesis, Columbia University. [863]
Institute of Mathematical Economics, Bielefeld University, Bielefeld NRW 33501, Germany; [email protected]. Manuscript received December, 2007; final revision received December, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 909–931
INCENTIVES TO EXERCISE

BY GARY CHARNESS AND URI GNEEZY1

Can incentives be effective in encouraging the development of good habits? We investigate the post-intervention effects of paying people to attend a gym a number of times during one month. In two studies we find marked attendance increases after the intervention relative to attendance changes for the respective control groups. This is entirely driven by people who did not previously attend the gym on a regular basis. In our second study, we find improvements on health indicators such as weight, waist size, and pulse rate, suggesting the intervention led to a net increase in total physical activity rather than to a substitution away from nonincentivized ones. We argue that there is scope for financial intervention in habit formation, particularly in the area of health.

KEYWORDS: Exercise, field experiment, habit formation, incentives.
INTRODUCTION
ON SEPTEMBER 18, 2006, New York Mayor Michael Bloomberg announced a new policy he called conditional cash transfers. He said that the plan was designed to address the simple fact that the stress of poverty often causes people to make decisions—to skip a doctor’s appointment or to neglect other basic tasks—that often only worsen their long-term prospects. Conditional cash transfers give them an incentive to make sound decisions instead. The intention was to provide conditional cash transfers to families of at-risk youngsters to encourage parents and young people to engage in healthy behavior, to stay in school, stay at work, and stay on track to rise out of poverty. Bloomberg also argued that the return on such investments is necessarily delayed, but that this is a clear path out of the cycle of poverty. Bloomberg’s last comment is about changing peoples’ habits. He believes that the cost (estimated at $42 billion) of the program is worth the benefit of this improvement in habits. Whether or not Bloomberg is correct in his assessment, an underlying issue is whether we can construct mechanisms to induce better decision-making. As DellaVigna and Malmendier (2006) have nicely demonstrated, people make poor choices regarding membership options at a health club: people who choose to pay a flat monthly fee for membership in a gym pay more than if they would have chosen to pay a fixed cost per visit. So perhaps the incentives to exercise that are already present are ineffective or insufficiently salient. But can we improve on these existing incentives? Can we go beyond the mere 1 We acknowledge helpful comments from Yan Chen, Martin Dufwenberg, Guillaume Fréchette, Jacob Goeree, Ulrike Malmendier, Uri Simonsohn, and Priscilla Williams, as well as seminar audiences at the Stanford Institute of Theoretical Economics, the Santa Barbara Conference on Communication and Incentives, the ESA meeting in Tucson, the ESA meeting in Lyon, CIDE in Mexico City, and Harvard University. Special thanks go to a co-editor and three anonymous referees, who suggested the second study with the biological measures reported in the paper. Charness and Gneezy each acknowledge support from the National Science Foundation.
identification of behavioral mistakes, and consider the issue of how a welfare maximizer would react if aware of his or her own bias?2

In this vein, the goal of the current paper is to test the conjecture that financial incentives can be used to develop or foster good habits. Habits are an important feature of daily life. However, people often follow a routine without much ongoing consideration about the costs and benefits of the constituent elements of this routine. One such habit is that of regular physical exercise. The physical benefits of exercise are undeniable, as adequate exercise is associated with better health in many respects. In particular, obesity has become a prominent health issue; the National Center for Health Statistics reported that in 2003–2004 a startling 66 percent of American adults were overweight or obese. This is the highest level ever recorded; in comparison, this rate was 47 percent in 1976–1980 and 55 percent in 1988–1994. Indeed, as suggested by this trend, the problem appears to be getting worse more recently: Adult obesity rates rose in 31 states from 2006 to 2007, according to the report from the Trust for America's Health (2007); rates did not decrease in any states. A new public opinion survey featured in the report finds 85 percent of Americans believe that obesity is an epidemic. This increased prevalence of obesity is paralleled by an increase in inactivity. Most jobs today are sedentary, and overweight people are particularly likely to report being inactive. Regular exercise combined with limiting calorie intake was shown to be most effective in reducing body mass (Andersen (1999)). Exercise provides health benefits even if people do not lose weight (Lee, Blair, and Jackson (1999)). There are also psychological benefits to exercising: People who exercise regularly are likely to be less depressed, have higher self-esteem, and have an improved body image (Brownell (1995)). Regular exercise may also reduce stress and anxiety (Kayman, Bruvold, and Stern (1990)). The literature discusses four main barriers to activity (Andersen (1999)): lack of time, embarrassment at taking part in activity, inability to exercise vigorously, and lack of enjoyment of exercise. The traditional approach in economics involves providing financial incentives for people to engage in (or refrain from) various activities, but since strong (nonfinancial) incentives regarding habitual behavior are already in place without any intervention appearing to be necessary,3 can there be much scope for intervention in the incentive structure?
We discuss below two main hypotheses regarding the outcome of using financial incentives to shape habits. The first is the “crowding-out” hypothesis, according to which paying people for an activity (such as exercising) might destroy their intrinsic motivation to perform the task once the incentives are removed (Deci (1971), Gneezy and Rustichini (2000a, 2000b), Frey and Jegen (2001)). The alternative hypothesis is “habit-formation” behavior. The main idea here is that one’s utility from consumption depends on one’s past consumption (Becker and Murphy (1988)). If it is possible to induce a beneficial habit, the policy implications are major. In this paper we undertake financial interventions, conducting two field studies in which we paid university students to attend the university’s gym. In the first of our studies, we compare the behavior of three groups. All groups were given a handout regarding the benefits of exercise. One group had no further requirements; people in the other two groups were paid $25 to attend the gym once in a week, and people in one of these two groups were then paid an additional $100 to then attend the gym eight more times in the following four weeks. We are able to observe attendance before the intervention, during the intervention, and for a period of seven weeks after the end of the intervention. The main result is that post-intervention attendance is more than twice as high for the high-incentive group as for the no-incentive group. This difference does not decline at all during the time following payment, suggesting that the effects do have some degree of persistence. There is very little difference between the behavior of the no-incentive and low-incentive groups, while there is a significant difference between the behavior of the low-incentive and high-incentive groups. In our second study, we invited people to a first meeting in which we took biometric measures and gave them the handout regarding the benefits of exercise. They were paid $75 for this part, and were then invited to come twice more, so that we could obtain biometric information again. They were promised $50 for each of the two later meetings. We randomly divided the participants into three groups. There were no additional requirements for the people in the control group. Participants in the second group were required to attend the gym once during the one-month intervention period, and participants in the third group were required to attend the gym eight times during the intervention period. We find a significant and persistent increase in attendance rates for people in the third group, and this increase is again entirely driven by people who had not been regular gym attendees (at least once per week). We also find improvement in the biometric measures for the third group relative to the other groups. Our results indicate that it may be possible to encourage the formation of good habits by offering monetary compensation for a sufficient number of occurrences, as doing so appears to move some people past the “threshold” needed to engage in an activity. It may often be the case that there is initial resistance to commencing a beneficial regimen, as the startup costs loom large.
However, if people are “walked through” this process with adequate financial incentives to try the regimen regularly for a while, perhaps good habits will develop. Note that the observation that exercising is a habitual behavior suggests that people who are interested in exercising more should try to commit themselves to exercise for a while. By doing so they affect not only their current well-being, but also their future utility, by making future exercise more beneficial.4 This type of self-enforcing mechanism is a possible explanation of the DellaVigna and Malmendier (2006) study. As a self-control mechanism, people may choose the more expensive plan because it reduces the marginal cost of attending to zero, and people believe that this will encourage them to attend the gym in the future. Potential applications are numerous, as much of the population seems to be aware of the benefits of some activity, but incapable of reforming without some assistance. For example, in education, Angrist, Lang, and Oreopoulos (2009) offered merit scholarships to undergraduates at a Canadian university; they have some success in improving performance, but mixed results overall. A recent literature in economics ties habits and self-control. Laibson (1997) and O’Donoghue and Rabin (1999) discussed present-biased (hyperbolic) preferences as an explanation for persistent bad habits and addictions.5 This relates to our study because students may over-emphasize initial setup costs for going to the gym due to hyperbolic discounting. Loewenstein and O’Donoghue (2005) developed a dual-process model in which a person’s behavior is the outcome of an interaction between deliberative goal-based processes, and affective processes that encompass emotions and motivational drives. The incentives to exercise introduced in our study could help in the conflict between the two processes and increase gym visits. Bernheim and Rangel (2004) presented a model in which use among addicts may be a mistake triggered by environmental cues, which addicts may then try to avoid. In a related vein, Benabou and Tirole (2004) developed a theory of internal commitments, wherein one’s self-reputation leads to self-regulation and this “willpower” enables one to maintain good behavior. 1. THE FIELD EXPERIMENTS In our first study, we invited students (from an e-mail list of people interested in participating in experiments) at the University of Chicago to the laboratory. There was no mention of physical fitness or exercise in the recruiting materials. All participants were promised payment if, and only if, they came to the 4 Becker and Murphy (1988) identified conditions under which past consumption of a good raises the marginal utility of present consumption; Becker (1992) applied this to habit formation. This is discussed in more detail below. 5 See Frederick, Loewenstein, and O’Donoghue (2002) for a comprehensive review of empirical research on intertemporal choice, as well as an overview of related theoretical models.
laboratory once on a given date and again a week later. The 120 students we signed up were assigned randomly to one of the three treatments described below. The assignment to treatment was based on the arrival time to the meeting. All students at the university received a membership in the campus athletic facility as part of their fees. Each person was asked to sign a consent form allowing us to get the computerized report (based on the magnetic swipe card used to enter the gym) of his or her visits to the gym during the academic year, so we were able to obtain records concerning past attendance at this facility for all of our participants. Everyone was given a handout about the benefits of exercise; this is shown in the Supplemental Materials on the Econometrica website (Charness and Gneezy (2009)). Forty of these people participated in a different experiment, which was completely unrelated to exercising; this was the control group.6 The other 80 participants (in different sessions) were told that they would receive $25 to visit the gym at least once during the following week and then to return to the lab to answer questions. They were told that we would be checking their computerized records. Upon returning to the lab in the following week, participants were randomly assigned to one of two treatments. For half of them this was the end of the experiment; the other half was promised an additional $100 for attending the gym at least eight times during the next four weeks. All participants in the latter group returned after the month was over. Our second study was conducted at University of California, San Diego, where all registered students receive a membership in the campus athletic facilities as part of their tuition. All participants were asked to sign a consent form granting us access to data on their past and future gym visits. In all, 168 first- and second-year undergraduate students were recruited from the general campus population using e-mail lists.7,8 All participants were paid $175 (in installments of $75 for the first visit and $50 for each of the other two visits, to motivate people to show up each time) to go to a meeting room at the Rady School of Management three times (once in January, once after about one month, and once after about five months) for biometric tests. They were also asked to keep an exercise log for five weeks and to complete a questionnaire. There were no further requirements for the people in the control group; the people in the second group were additionally required to go to the gym at least once in the next month, while the people in the third group were additionally required to go to the gym at least eight times in the next month. By paying the same amount of money to all participants in this study, we control for the 6
This was a marketing experiment studying the effect of coupons on product choice.
7 This differs slightly from the recruiting done in Study 1, where all undergraduates were recruited. This could conceivably have affected pre-intervention attendance rates (as well as perhaps inducing the positive trend in attendance rates in time for each treatment group), as first-year students might have still been settling in.
8 Twelve participants (see below) did not show up to all meetings and were excluded from the data.
possibility that it was the monetary payment, rather than a habit acquired by our requiring multiple gym visits, that caused the effects we observed in our first study; since the control group was paid $175 independently of gym attendance, the additional attendance for the eight-times group cannot be the result of their additional income (and possibly a corresponding additional amount of free time). Participants who replied to the e-mail were invited for an individual meeting and were randomly assigned a treatment group according to arrival time, and given the exercise handout and a questionnaire. We then measured the individual’s height, weight, body fat percentage, waist, pulse, and blood pressure.9 We collected the exercise logs, which showed the number of days of exercise and a brief description of the kinds of exercise in both the gym and otherwise, at the second measurement appointment. The appointments for the second and third meetings were arranged by e-mail. Hypotheses The standard null hypothesis is that our financial intervention will not affect behavior after the end of the intervention. We formalize this as follows: HYPOTHESIS 0: Participants will visit the gym with the same frequency after the incentives are removed as before the incentives were introduced. We also test two competing hypotheses regarding the effect of this incentive. The first hypothesis is the crowding-out effect. Studies indicate that, in some situations, providing rewards may be counterproductive, as providing an extrinsic motivation for a task or activity may crowd out existing intrinsic motivation.10 The formal statement of the hypothesis is next: HYPOTHESIS 1: Participants will visit the gym less frequently after the incentives are removed as compared to before the incentives were introduced. 9 To measure the waist circumference, the research assistant placed a tape measure around the abdomen just above the hip bone. The tape was snug and was kept parallel to the floor. Body fat percentage was measured with a conventional scale that uses the bioelectrical impedance method. A low-level electrical current is passed through the body and the impedance is measured. The result is used in conjunction with weight to determine body fat percentage. Unfortunately, the body’s impedance level can be altered by many factors besides body fat, such as the amount of water in the body, skin temperature, and recent physical activity. Hence, this is a noisy measure of actual body fat. Pulse and blood pressure were measured using an automatic monitor. 10 For early demonstrations in psychology, see Deci (1971) and Lepper and Greene (1978). See Frey and Jegen (2001) and Gneezy and Rustichini (2000a, 2000a) for demonstrations in economic settings. Benabou and Tirole (2003) presented a formal model of this issue. Fehr and Falk (2002) provided a more general framework of the psychology of incentives.
According to this hypothesis, participants are intrinsically motivated to exercise. Any extrinsic intervention, such as paying them to go to the gym, may be counterproductive in the long run by destroying the intrinsic motivation to exercise. According to this process, before the introduction of the incentives participants exercised either because it was good for them or because they simply enjoyed it. After the incentives are introduced, they may instead feel that they exercise just for the money. Even if the incentives are large enough to motivate people to go to the gym while in force (see Gneezy and Rustichini (2000a) and Heyman and Ariely (2004) for discussions of the importance of the size of the incentive), the hypothesis is that after the incentives are removed participants will stop attending the gym because intrinsic motivation has been crowded out.

The competing hypothesis is that people who were paid to attend the gym for some period would attend the gym more frequently even after the incentives are removed.

HYPOTHESIS 2: Participants will visit the gym more frequently after the incentives are removed as compared to before the incentives were introduced.

One motivation for this hypothesis is "habit formation." Becker and Murphy (1988) identified a necessary and sufficient condition for a good to be habitual near a steady state:
\[
(\sigma + 2\delta)\,U_{cS} > -U_{SS},
\]
where δ is the depreciation rate on past consumption, σ is the rate of preference for the present, c is a consumption good, S is the stock of consumption capital, U_{cS} = ∂²U/∂c∂S, and U_{SS} = ∂²U/∂S².11 In words, an increase in one's current consumption of c increases one's future consumption of c if and only if one's behavior displays adjacent complementarity.12,13 Habits may be harmful or beneficial to the extent that they decrease or increase future utility. The marginal utility of today's consumption is correlated with historical consumption; changes today may have only a small effect in the short run, but increasingly large effects in the long run. In this view, "experiences influence desires and choices partly by creating habits, addictions, and traditions" (Becker (1992, p. 335)).
11 See Becker and Murphy (1988, pp. 679–680) for the derivation. We use the Becker (1992, p. 343) formulation.
12 This term was first introduced by Ryder and Heal (1973). An example on page 5 is "A person with adjacent complementarity [who expects to receive a heavy supper] would tend to eat a light breakfast and a substantial lunch," while this would be reversed with distant complementarity.
13 In fact, past consumption of the good raises the marginal utility of present consumption whenever U_{cS} > 0.
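As a purely illustrative check of the adjacent-complementarity condition, the short Python snippet below evaluates U_{cS} and U_{SS} by symbolic differentiation and tests whether (σ + 2δ)U_{cS} > −U_{SS} at a candidate point. The functional form and parameter values are hypothetical and are not taken from Becker and Murphy or from this paper.

```python
import sympy as sp

c, S = sp.symbols("c S", positive=True)
sigma, delta = 0.05, 0.3          # hypothetical preference and depreciation rates

# Hypothetical utility with a habit stock: consumption is valued relative to a
# fraction of the habit stock S.  This is only an example functional form.
U = sp.log(c - sp.Rational(1, 2) * S)

U_cS = sp.diff(U, c, S)           # cross partial, positive here (U_cS > 0)
U_SS = sp.diff(U, S, S)

# Evaluate the habit condition (sigma + 2*delta) * U_cS > -U_SS at a point.
point = {c: 1.0, S: 1.0}
lhs = (sigma + 2 * delta) * U_cS.subs(point)
rhs = -U_SS.subs(point)
print(float(lhs), float(rhs), bool(lhs > rhs))
```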
If exercising is a form of habitual behavior, providing incentives to go to the gym for a while may increase future utility from exercising. If the marginal utility of consumption today is positively correlated with historical consumption, then this period in which people were given financial incentives to go to the gym could also induce people to go to the gym more often in the future. Hence, we call this hypothesis habit formation. Note that crowding out and habit formation are not mutually exclusive. Since they work in opposite directions, the outcome could be hard to interpret. There could also be individual differences: some people may react in line with crowding out, while others react in line with habit formation. We will discuss this in light of our results.

2. RESULTS

Figure 1(a) and (b) graphically presents the rate of gym attendance before and after the intervention period for Study 1 and Study 2, respectively. "Before" refers to the period before the first lab visit, while "After" refers to the period after any incentives were removed.14,15 In Study 1, the average attendance rate for the control group decreased slightly, from 0.59 visits per week in the eight weeks before the intervention period to 0.56 visits per week in the seven weeks after the intervention period. The corresponding change for the group required to attend only one time (henceforth the "one-time group") was from 0.70 visits per week to 0.76 visits per week. In contrast, the corresponding change for the group required to attend the gym eight additional times (henceforth the "eight-times group"16) was from 0.60 visits per week to 1.24 visits per week. Thus, we see an average increase of 0.64 visits, or 107 percent of the baseline, for the eight-times group, compared to a small increase for the one-time group, and a slight decline in gym visits for the control group.

14
We compare the same weeks in Study 1. However, since the intervention period ended earlier for the group required to attend only once than for the group required to attend eight more times, the actual post-intervention period is slightly different. Nevertheless, robustness checks show no qualitative difference for different specifications. 15 As we only obtained the data at the end of the academic year for both Study 1 and Study 2, we did not know actual individual attendance until then, and so we paid all students who showed up for payment. It turns out that compliance was imperfect. Although all 40 people in the onetime group in Study 1 complied with the attendance rule, 3 people of 40 in the eight-times group did not fully comply (2 of these people attended six or seven times). In Study 2, 4 of the 56 people in the one-time treatment did not comply, while 5 of the 60 people in the eight-times group did not fully comply (2 of these people attended six or seven times). If we remove the people who did not fully comply with the rules from the analysis, the treatment effects reported above become stronger. 16 We use “eight-times group” for consistency with Study 2, even though people in this group were actually required to attend nine times overall.
FIGURE 1.—Average weekly gym visits: (a) Study 1; (b) Study 2. Error bars reflect 1 standard error.
In Study 2, we observe a positive trend in attendance for all treatment groups. The average attendance rate for the control group increased from 0.81 visits per week in the 12 weeks before the intervention period to 1.10 visits per week in the 13 weeks after the intervention period; this was a 36 percent increase. The corresponding change for the one-time group was from 0.62 visits per week to 0.87 visits per week; this was a 40 percent increase. The change for the eight-times group was much greater, with an average of 0.52 weekly
visits before the intervention period and 1.46 weekly visits after the intervention period; this was a 181 percent increase.17,18

We can also examine changes on an individual basis. In Study 1, 13 people of 40 participants (32 percent) in the eight-times group increased their average number of gym visits by more than one per week, while only 2 participants (1 participant) in the one-time group (control group) did so. The test of the equality of proportions (see Glasnapp and Poggio (1985)) finds a very significant difference between the high-incentive and no-incentive treatments, as well as between the two incentive treatments (Z = 3.53 and 3.15 for these comparisons, both significant at p < 0.001). There is no difference between the one-time group and the control group (Z = 0.59).

In Study 2, 40 percent of all participants in the eight-times group (24 people of 60) increased their average number of gym visits by more than one per week, while only 12 percent of the participants (7 people of 57) in the one-time group, and 13 percent of the participants (5 people of 39) in the control group did so. The test of the equality of proportions finds a very significant difference between the eight-times and control groups, as well as between the eight-times and one-time groups (Z = 3.40 and 2.90 for the respective comparisons, both significant at p < 0.002).19 There is no difference between the control and one-time groups (Z = 0.07).

17 We note that the difference in pre-intervention attendance rates for the control and eight-times groups in Study 2 (0.81 versus 0.52) is not significant (Z = 1.41, p = 0.159).
18 The observed patterns are robust to whether we use means or medians. In Study 1, the median weekly attendance for the control group and for the one-time group was 0 before the intervention and 0 after the intervention; the median weekly attendance for the eight-times group was 0 before the intervention, but 1.214 after the intervention. In Study 2, the median weekly attendance for the control group was 0.417 before the intervention and 0.615 after the intervention; the median weekly attendance for the one-time group was 0.167 before the intervention and 0.385 after the intervention; the median weekly attendance for the eight-times group was 0.167 before the intervention, but 1.000 after the intervention.
19 A chi-squared test using these three categories shows a significant difference between the distributions in the eight-times treatment versus the other two treatments (χ²₂ = 8.49 and χ²₂ = 11.66, p < 0.012 and p < 0.001, respectively).
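For readers who want to reproduce the simple tests of proportions quoted above, the following Python sketch uses the standard pooled two-sample normal-approximation test (assumed here to correspond to the Glasnapp and Poggio (1985) statistic the authors cite) and recovers the Study 1 comparisons Z = 3.53, 3.15, and 0.59 from the reported counts.

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """Pooled two-sample test of equal proportions (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Study 1 counts: participants whose average weekly visits rose by more than one.
print(round(z_two_proportions(13, 40, 1, 40), 2))   # eight-times vs control: 3.53
print(round(z_two_proportions(13, 40, 2, 40), 2))   # eight-times vs one-time: 3.15
print(round(z_two_proportions(2, 40, 1, 40), 2))    # one-time vs control: 0.59
```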
Regular versus Nonregular Attendees

We can view a cross section of the population by categorizing people before the intervention as regular attendees (at least one visit per week) or nonregular attendees. From the standpoint of public policy, it may well be more useful to target people who rarely (if ever) attend the gym and convert them into regular attendees than to increase the visitation rate for people who already attend the gym regularly. The effect of requiring multiple visits on the people who were not regular attendees is also particularly relevant for testing habit formation.

In Study 1, there were 27 people in the eight-times group who had not been attending the gym regularly (here and elsewhere defined as at least once per week); 12 of these people (44 percent) became regular attendees after being paid to go to the gym for a month; these 12 people represent 30 percent of the sample population. The average change for people who had not been regular attendees was 0.98 visits. In contrast, the average change for the 13 people who were already regular attendees was −0.07. Thus, the entire effect of the incentive for the eight-times group comes from those people who had not been regular attendees.

In Study 2, 49 of 60 people in the eight-times group had not been attending the gym regularly; 26 of these people (53 percent) became regular attendees; these 26 people represent 43 percent of the sample population. The average change for people who had not been regular attendees was 1.20 visits. In contrast, the average change for the 11 people who were already regular attendees was −0.20. Thus, as in Study 1, the entire effect in the eight-times treatment comes from those people who had not been regular attendees.

Table I illustrates the gym attendance rates before and after any intervention for the different groups in Study 1 and Study 2, by previously regular or nonregular attendees:20
TABLE I
MEAN WEEKLY GYM ATTENDANCE RATES^a

                            Ex ante Regular Attendees           Ex ante Nonregular Attendees
                            Before      After       Change      Before      After       Change
Study 1
  Control                   1.844       1.774       −0.070      0.058       0.046       −0.012
                            (0.296)     (0.376)     (0.206)     (0.036)     (0.023)     (0.020)
  One required visit        1.866       1.827       −0.040      0.077       0.181        0.104
                            (0.165)     (0.211)     (0.204)     (0.040)     (0.094)     (0.106)
  Eight required visits     1.644       1.571       −0.073      0.102       1.085        0.983
                            (0.127)     (0.304)     (0.264)     (0.044)     (0.234)     (0.231)
Study 2
  Control group             2.433       2.677        0.244      0.250       0.560        0.310
                            (0.419)     (0.465)     (0.417)     (0.047)     (0.168)     (0.160)
  One required visit        2.051       2.491        0.440      0.193       0.395        0.202
                            (0.191)     (0.583)     (0.537)     (0.039)     (0.079)     (0.080)
  Eight required visits     1.901       1.706       −0.195      0.204       1.405        1.201
                            (0.402)     (0.786)     (0.411)     (0.038)     (0.170)     (0.171)

^a Standard errors are in parentheses.
20 For the purposes of analysis, we exclude the weeks of spring and winter break, as attendance rates were, perforce, extremely low during these weeks and thus not really representative. In any case, our results are qualitatively unchanged when these weeks are included.
We do observe a small and insignificant increase in attendance for nonregulars in the one-time group; however, there is a large and highly significant (t = 4.26) effect for nonregulars in the eight-times group.

We see that in no treatment of Study 2 is there a significant effect on the attendance rates of those people who were already regular attendees. There is an upward trend for ex ante nonregular attendees in both the control group and the one-time group; this is significant for both the control group and the one-time group (t = 1.94 and t = 2.52, respectively). However, by far the largest effect (with t = 7.02) is observed for nonregulars in the eight-times group. This effect is significantly larger than the effect for the control group and the effect for the one-time group (Z = 2.30, p = 0.011 and Z = 2.74, p = 0.003, respectively, one-tailed Wilcoxon rank sum tests). We also note that the change for regular attendees in the eight-times group is actually negative, in contrast to the modest upward trend for regular attendees in the control and one-time groups. While this difference-in-difference is at most only marginally significant (Wilcoxon rank sum test, Z = 1.34; p = 0.090 on a one-tailed test), it does suggest the possibility that ex ante regular attendees experience some crowding out (Hypothesis 1).

One might expect that simply requiring people to become familiar with the gym by going through the initial setup might lead to benefits. But if this were the full explanation, there should be little difference between any groups who were required to attend at least once, since they all went to the gym and incurred the setup costs. Yet we see that the increase in gym attendance, in both studies, is significantly and substantially larger for nonregular attendees in the eight-times group than in the other groups. Thus, we see support for Hypothesis 2 over Hypothesis 0 when people who had not regularly attended the gym were required to make multiple visits (obviously we cannot test Hypothesis 2 against Hypothesis 1 for those people who had not attended the gym before the intervention, as their attendance rate cannot decrease). On the other hand, Hypothesis 0 appears to hold for the other treatments. Hypothesis 1 is generally rejected, although perhaps not for the people in Study 2 who were initially regular gym attendees.21

Changes Over Time

It is not surprising that the financial incentives lead to a strong effect during the incentive period. But how persistent are the post-intervention effects? Do they appear to be diminishing over time?

21 A final concern is whether people exceeded the attendance requirements during the intervention period. In Study 1, 23 of 40 people in the one-time group exceeded the required attendance level during the intervention period, while 20 of 40 people in the eight-times group did so. In Study 2, 38 of 57 people in the one-time group exceeded the required attendance level during the intervention period, while 35 of 60 people in the eight-times group did so.
FIGURE 2.—Average gym visits: (a) Study 1; (b) Study 2.
Figures 2(a) and (b) show that there is very little change in attendance rates once the intervention is over.22 We see little if any change over the remaining time for any group.23 We note a tendency for people in the eight-times group to delay much of their required visits until the latter part of the required time period. In Study 1, there was little gym attendance in the early part of the intervention period. There is a pronounced peak for this group in Study 2, as by far the highest weekly average occurred in the fourth week of the intervention period. This seems consistent with self-control issues, since most standard models with uncertainty would predict some precautionary gym visits early on, to take into account later unforeseen shocks. On the other hand, we observe a sharp dip in attendance for the eight-times group in Study 2 in the week after the intervention period, suggesting some degree of intertemporal substitution.

A natural concern is the decay rate of the post-intervention attendance for the eight-times group. A regression of the gym attendance for this group in Study 1 on time over the post-intervention period shows that the average gym attendance increases at the insignificant rate of 0.004 visits per period, so we see no signs of decay in gym attendance over time after the intervention. A similar regression for Study 2 shows that the average gym attendance drops at the insignificant rate of 0.010 visits per period. Given that the increase over the pre-intervention rate for this group was 1.201, the benefits would erode in about 120 weeks with a linear decline (and, of course, more slowly with a constant percentage decrease from week to week).

A final issue concerns attrition. Given the payoff structure in Study 1 (people received $25 after one week and the eight-times group received $100 only after the following month), it is not surprising that there was no attrition at all, in terms of people showing up for money. In Study 2, we paid a large portion ($75) to people who came to the first session, so that some people might have felt satisfied with their earnings and dropped out. In fact, 3 people out of 42 (7 percent) dropped out of the control group, 3 people out of 60 (5 percent) dropped out of the one-time group, and 6 people out of 66 (9 percent) dropped out of the eight-times group. All of the dropouts occurred after the first session, but before the second measurement appointment; these people received the $75 that was paid at the first session. It seems natural to have a higher attrition rate in the eight-times group, since the task to earn more money is more difficult. Nevertheless, the test of proportions finds no significant difference across treatments for these attrition rates.24

22 A smoothed version of Figure 2(b) (with weeks grouped together to lower the noise) can be found in the Supplemental Material on the Econometrica website.
23 Both studies necessarily ended at the end of the school year, as our Institutional Review Board (IRB) approval to gather these data did not extend to the next academic year. The gap for week 0 in Figure 2(a) reflects the week of spring break, while the gaps in Figure 2(b) reflect the low-usage periods mentioned in footnote 16.
24 We have Z = 0.45 for the control versus the one-time group, Z = 0.36 for the eight-times group versus the control group, and Z = 0.89 for the eight-times group versus the one-time group.
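As an illustration of the decay-rate regression described above, the following sketch (Python with numpy; the attendance series is a made-up placeholder, not the study's data) fits a linear time trend to post-intervention weekly attendance and converts a hypothetical slope into the implied erosion horizon.

    import numpy as np

    # Hypothetical post-intervention series: average weekly gym visits of the
    # eight-times group, one observation per week (placeholder numbers only).
    weeks = np.arange(1, 21)
    avg_visits = np.array([1.5, 1.4, 1.6, 1.5, 1.3, 1.6, 1.5, 1.4, 1.5, 1.6,
                           1.4, 1.5, 1.3, 1.5, 1.6, 1.4, 1.5, 1.5, 1.4, 1.5])

    # Linear time trend: the slope estimates the change in visits per week.
    slope, intercept = np.polyfit(weeks, avg_visits, deg=1)
    print(f"estimated trend: {slope:+.3f} visits per week")

    # With a gain of 1.201 visits over the pre-intervention rate (Study 2) and
    # a linear decline of 0.010 visits per week, the gain would erode in about
    # 1.201 / 0.010 = 120 weeks, as stated in the text.
    print("implied erosion horizon:", 1.201 / 0.010, "weeks")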
Regression Analysis

We supplement our descriptive results and nonparametric statistics with some Tobit regressions that account for the left-censoring problem.25 These are presented in Table II. The regressions confirm our earlier discussion. Specification 1 of Study 1 indicates that only when eight visits are required do we observe a significant increase in post-intervention gym attendance. Specification 2 shows that the effect largely vanishes for ex ante regular attendees who were required to visit the gym eight times. Specification 1 of Study 2 also indicates that only when eight visits are required do we observe a significant increase in post-intervention gym attendance. Specification 2 shows that the effect vanishes entirely (going slightly in the other direction) for ex ante regular attendees who were required to attend eight times. Note that we find no significance for gender in any regression.26

TABLE II
TOBIT REGRESSIONS FOR GYM ATTENDANCE RATE AFTER INTERVENTION

                               Study 1                                Study 2
Independent Variables          Spec. 1           Spec. 2              Spec. 1           Spec. 2
Attendance before              1.262*** [0.154]  1.434*** [0.205]     1.045*** [0.112]  1.195*** [0.140]
One-time group                 0.292 [0.358]     0.184 [0.450]        −0.022 [0.289]    −0.043 [0.307]
Eight-times group              1.320*** [0.350]  1.874*** [0.404]     0.884*** [0.284]  1.234*** [0.294]
Male                           0.135 [0.280]     0.153 [0.268]        −0.031 [0.249]    −0.114 [0.240]
One-time × regular             —                 0.198 [0.589]        —                 0.230 [0.480]
Eight-times × regular          —                 −1.527*** [0.533]    —                 −1.664*** [0.480]
Constant                       −1.243** [0.366]  −1.362*** [0.386]    −0.020 [0.250]    −0.114 [0.252]
# Observations                 120               120                  156               156
Pseudo R²                      0.211             0.241                0.140             0.164

a The control-group attendee is the omitted variable in these regressions. Standard errors are in brackets. ** indicates significance at the 5 percent level. *** indicates significance at the 1 percent level, two-tailed test.
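The paper does not reproduce its estimation code, so the following is only a minimal sketch of a Tobit model left-censored at zero, of the kind reported in Table II, written in Python with numpy and scipy; the regressor layout and the simulated data are illustrative assumptions, not the authors' data.

    import numpy as np
    from scipy import optimize, stats

    def tobit_negloglik(params, y, X, censor_point=0.0):
        """Negative log-likelihood of a Tobit model left-censored at censor_point."""
        beta, log_sigma = params[:-1], params[-1]
        sigma = np.exp(log_sigma)            # keep the scale parameter positive
        xb = X @ beta
        uncensored = y > censor_point
        ll = np.where(
            uncensored,
            stats.norm.logpdf((y - xb) / sigma) - np.log(sigma),   # observed part
            stats.norm.logcdf((censor_point - xb) / sigma),        # censored part
        )
        return -ll.sum()

    # Illustrative (simulated) data: post-intervention attendance regressed on
    # pre-intervention attendance and treatment dummies, censored below at zero.
    rng = np.random.default_rng(0)
    n = 150
    X = np.column_stack([np.ones(n),                  # constant
                         rng.gamma(1.0, 1.0, n),      # attendance before
                         rng.integers(0, 2, n),       # one-time group dummy
                         rng.integers(0, 2, n)])      # eight-times group dummy
    y_star = X @ np.array([-1.0, 1.2, 0.1, 1.0]) + rng.normal(0, 1.0, n)
    y = np.maximum(y_star, 0.0)                       # left-censoring at zero

    start = np.zeros(X.shape[1] + 1)
    res = optimize.minimize(tobit_negloglik, start, args=(y, X), method="BFGS")
    print("Tobit coefficients:", res.x[:-1], " sigma:", np.exp(res.x[-1]))

The log-sigma parameterization is a design choice that keeps the scale parameter positive during unconstrained optimization.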
25 Note that these regressions address within-participant change measured on a nearly continuous variable, rather than differences in proportions with arbitrary thresholds or across individual comparisons.
26 Consistent with the habit-formation story, we find a significant correlation between gym attendance during and after the intervention for the eight-times groups in both Study 1 and Study 2
Biometric and Questionnaire Data

As mentioned earlier, in Study 2 we gathered data on each participant’s weight, body fat percentage, waist size, pulse rate, and blood pressure. The full data are presented in Appendix A. Table III summarizes the differences in these measures between the first and last measurements. We find modest but significant differences across treatments in the change in levels over time for body fat percentage, weight, waist size, BMI, and pulse rate.27 Overall, with the exception of the blood-pressure measures, we see that the biometric measures of the eight-times group improved significantly relative to both the control group and (with the further exception of the pulse rate) the one-time group.28 Thus, it appears that there are real health benefits that accrue from paying people to go to the gym eight times in a month.29

TABLE III
BIOMETRIC DATA AVERAGES AND CHANGES OVER TIME—STUDY 2

                 Control (G1)               One-Time (G2)              Eight-Times (G3)
                 First         Δ            First         Δ            First         Δ
Body fat %       25.7 (1.54)   1.41 (0.42)  21.6 (1.07)   0.29 (0.33)  26.9 (1.09)   −0.78 (0.21)
Pulse rate       78.0 (1.86)   3.90 (2.08)  81.8 (1.56)   −1.75 (1.45) 80.2 (1.47)   −1.25 (1.78)
Weight (kg)      61.8 (2.03)   0.57 (0.55)  59.8 (1.60)   0.59 (0.21)  64.0 (1.54)   −0.34 (0.25)
BMI              22.7 (0.64)   0.23 (0.19)  21.7 (0.45)   0.22 (0.07)  23.2 (0.40)   −0.12 (0.09)
Waist (in.)      34.3 (0.63)   0.07 (0.36)  33.0 (0.47)   −0.10 (0.27) 35.0 (0.42)   −0.72 (0.23)
Systolic BP      122 (1.82)    5.23 (1.71)  121 (2.01)    2.32 (1.88)  122 (1.90)    1.78 (1.99)
Diastolic BP     74.0 (1.22)   2.87 (1.22)  75.8 (1.44)   1.07 (1.23)  74.7 (1.07)   2.58 (1.33)

                 Difference-in-Difference (Wilcoxon–Mann–Whitney Test)
                 G1–G2             G2–G3              G1–G3
Body fat %       1.12* [0.088]     −1.07*** [0.000]   −2.19*** [0.000]
Pulse rate       5.65** [0.030]    −0.50 [0.974]      5.15** [0.040]
Weight (kg)      −0.02 [0.560]     0.93*** [0.005]    0.91*** [0.006]
BMI              0.01 [0.560]      0.34*** [0.005]    0.35*** [0.006]
Waist (in.)      0.17 [0.790]      0.62** [0.045]     0.79* [0.068]
Systolic BP      2.91* [0.084]     0.54 [0.897]       3.45* [0.090]
Diastolic BP     1.80 [0.160]      −1.51 [0.654]      0.29 [0.535]

a Standard errors are in parentheses; two-tailed p-values are in brackets. Body mass index (BMI) is calculated using the formula BMI = (weight in kilograms)/(height in meters)². “First” refers to the first measurement, which was taken in the initial week, and “Δ” indicates the change from the initial level as determined using the final measurement, taken 20 weeks later.
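The difference-in-difference columns of Table III compare, two groups at a time, the participants' changes in a biometric measure between the first and final measurements. A minimal sketch of such a comparison with scipy (the arrays are made-up placeholders, not the study's data):

    import numpy as np
    from scipy import stats

    # Made-up per-participant changes in weight (kg) between the first and the
    # final measurement, one array per treatment group.
    change_control = np.array([0.8, 1.2, -0.1, 0.6, 0.9, 0.4, 1.1])
    change_eight   = np.array([-0.5, -0.2, -0.9, 0.1, -0.4, -0.6, -0.3])

    # Difference-in-difference of the mean changes (G1 - G3 in Table III's notation).
    print("diff-in-diff:", change_control.mean() - change_eight.mean())

    # Wilcoxon-Mann-Whitney test on the two samples of changes, two-sided p-value.
    u_stat, p_value = stats.mannwhitneyu(change_control, change_eight,
                                         alternative="two-sided")
    print("Mann-Whitney U:", u_stat, " p =", p_value)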
(r = 0.6035 and r = 0.6802 in the respective studies; both correlation coefficients are significant at p < 0.0001).
27 We include for convenience both weight and BMI, even though the latter is an isomorphic function of weight for a given height.
28 Some of the significance of the improvement stems from the control group having gotten worse over time in their biometric measures. We suspect that this is not so uncommon, particularly among first-year students. There is an expression “the freshman 15,” which refers to students’ weight gain in the first year living away from home.
There are also some differences between the control and one-time group, but these tend to have lower significance levels; perhaps there is a slight benefit from merely requiring one visit.30

We asked people to fill out a questionnaire at the first meeting, with questions involving the frequency of exercise, whether people wished to exercise more, whether people thought that being paid money to go to the gym would increase the amount they exercise, and, if so, whether this would have a long-term effect. Other questions pertained to the grade point average (GPA), to happiness with respect to social life and academic performance, and to the extent to which a change was needed in their lives. As would be expected with randomization, there were no substantial differences in the responses to these questions across treatment groups; the full summary statistics can be found in Appendix B. On average, students at the first meeting reported exercising 2.2 times per week, while 84 percent wished to exercise more. Eighty-two percent thought that paying money to exercise would help; of these, 79 percent thought that there would be a long-term effect. On a 1–7 scale, respondents reported a mean happiness level of 5.97 with their social life and 5.40 with their academic life; an average score of 4.59 was observed for people feeling that a change was needed in their life.

Overall, we found that the responses had little predictive power in terms of the change in gym attendance over time; we present regressions to this effect in Appendix B. Only the question about needing a change in one’s life showed any significance, and this was not robust to different specifications. Interaction effects were not significant.31 Thus, our results seem to be largely

29 A reviewer suggests the following rough calibration of our health results. Suppose we assume that the only effect of the eight-times treatment was to increase the amount of exercise, and further assume that this amounted to 400 calories expended during each exercise session. The eight-times group has about 15 more exercise sessions than the control group (combining the extra sessions during the intervention period with the effect in the post-intervention periods), or expends roughly 6000 more calories, which corresponds to a little less than 1 kilogram. We find a difference-in-difference for weight of 0.91 kilograms. Again, this is only a rough calibration.
30 We do measure the amount of gym and nongym exercise in Study 2 by asking participants to keep exercise journals during the intervention period. The people in the control group reported exercising on an average of 7.87 days, with 79 percent reporting going to the gym during this time and 26 percent reporting nongym exercise. Participants in the one-time group reported exercising on an average of 4.91 days, with 100 percent reporting going to the gym during this time and 33 percent reporting nongym exercise. People in the eight-times group reported exercising on an average of 8.58 days, with 100 percent reporting going to the gym during this time and 22 percent reporting nongym exercise. Thus, while the one-time group reports less frequent exercise than the eight-times group, there is only a small difference in frequency between the control group and the eight-times group. This is puzzling, and we suspect that self-reported data may not be the most reliable in this case.
31 We also performed a regression (not shown) with two interaction terms for each independent variable. Only one of the coefficients for the 16 interaction terms (that for the interaction between the one-time group and one’s happiness with one’s social life) was significant (it was negative).
robust to the responses on our questionnaire. It appears that the mechanism by which one’s exercise frequency increases is independent of one’s attitudes and views.

3. CONCLUSION

Some of us have too many bad habits, such as smoking, and too few good ones, such as exercising. Could incentives be used to “improve” one’s habit formation—reducing the bad ones and increasing the good ones? This is an important public-policy question that comes to mind when discussing, for example, incentives to get an education. A major argument made by opponents of using monetary incentives in education is the risk of crowding out intrinsic incentives. Strong and robust evidence shows that the introduction of extrinsic incentives can alter the meaning of the interaction, and hence be counterproductive. In education, for example, it might result in focusing attention on test scores, instead of a more holistic approach in which test scores are only one component. A particular concern arises regarding the long-term effect of the monetary intervention. Even if incentives are effective while present, after they are removed people may revert to effort levels even lower than the initial ones. Nevertheless, some recent evidence in education (e.g., Angrist and Lavy (2009), Angrist, Lang, and Oreopoulos (2009), and Kremer, Miguel, and Thornton (2009) and the discussion therein) indicates that paying children to overcome initial resistance to engaging in a potentially beneficial activity may be quite successful.

In this paper we chose to test the effect of such extrinsic incentives on behavior that may be easier to evaluate than education, because the goal is perhaps better defined. In each of our two studies we paid one group to go to the gym for several weeks, and we observed the gym attendance for this group and all others after the incentives (if any) were removed. Two competing predictions regarding the long-term effect on exercising can arise from the existing literature. The use of incentives might weaken the intrinsic motivation to engage in exercise, such that when the incentives are removed people would exercise less than before. Alternatively, the period of time during which people were induced to exercise might be sufficient to induce the formation of a habit that will remain even after the removal of the incentives.32

We find a positive effect from paying people to go to the gym eight times over a period of one month, as the rate of gym visits after the intervention increased significantly in both studies. Upon closer examination, we have the encouraging result that our incentive scheme was successful in creating this positive habit of exercising more: Participants who did not attend the gym before our study began to do so during our intervention and continued to go after

32 We reiterate that we cannot readily reject explanations (such as the ones mentioned on p. 912) other than habit formation, although we do feel that the habit-formation story fits best.
it was concluded. This result is robust to a number of factors, including gender, the expressed desire to exercise more, and satisfaction regarding one’s social and academic life. Hence, the main result of this paper is that paying people to go to the gym regularly positively reinforced this behavior. The concerns discussed above regarding a strong decline in exercising after removing the incentives were not completely rejected, as there is some slight evidence (primarily in Study 2) that imposing requirements can actually backfire with respect to people who have already been attending the gym regularly. Finally, the evidence shows that people derive health benefits from our intervention, as the relative change in several biometric indices is significantly better for the eight-times group than for the other groups in Study 2. Given the enormous sums of money spent on health care, even a modest improvement may yield large social benefits. Furthermore, if it is possible to favorably influence the habits of young people, there is at least the possibility that this improvement will last for a long time, providing social benefits for the entire period. Of course, we cannot substantiate such a strong claim; however, we do find that the gym attendance rate does not decrease significantly during the post-intervention period in either study.

Habits increase the marginal utility of engaging in an activity in the future. People seem to systematically underestimate the impact of their current actions on the utility of future action and to discount the future too much. As a result, people may underinvest in habit-forming activities (either because they fail to realize the link between current and future consumption or because they do not think that they care about the future).

The implications of our findings for public policy are straightforward. Incentives to exercise work, but they should be targeted at people who currently do not exercise and must mandate enough practice hours for the habit to develop. We find that merely providing information about the benefits of exercise or even requiring one gym visit does not have much of an effect. Furthermore, paying people who currently exercise is at best a waste of money; at worst, as our findings hint, it can actually weaken post-intervention exercise habits for people who had already been exercising.

This paper clearly does not exhaust the agenda on habit formation and monetary interventions to improve behavior. There are some very important questions left unanswered, such as why monetary interventions sometimes succeed and sometimes do not. For example, Volpp et al. (2008) found that a lottery-based financial incentive improved medication adherence for patients. However, behavior reverted to almost exactly what it was initially once the incentives were dropped. Another open question is the effect of incentives on bad habits. Findings in the literature on bad habits, such as smoking, are not as encouraging as our findings. For example, in cigarette smoking cessation, researchers have used punishment or rewards (Donatelle et al. (2004)), with very little success. The basic finding is that people refrain from smoking when incentives are present, but go back to smoking after the incentives are removed.
An interesting question that future research might address is why habits that we are trying to eliminate seem different from habits that we are trying to acquire. This is not a straightforward extension, since the literature on the neurobiology of addiction (see, e.g., the discussion in Bernheim and Rangel (2004)) finds that smoking and other substance addictions are qualitatively different from other “negative habits.” On the other hand, a recent study by Giné, Karlan, and Zinman (2008) found a positive effect for the use of incentives on smoking cessation. It is clear that more research is needed in the area of incentives and habit formation/cessation.

APPENDIX A: BIOMETRICS

TABLE A.I
BIOMETRIC DATA AVERAGES—STUDY 2

Control:
                 1st            2nd            3rd            Diff
Body fat %       25.7 (1.54)    25.6 (1.55)    27.1 (1.62)    1.41 (0.42)
Pulse rate       78.0 (1.86)    80.3 (1.80)    81.9 (2.12)    3.90 (2.08)
Weight (kg)      61.8 (2.03)    61.5 (1.95)    62.4 (1.91)    0.57 (0.55)
BMI              22.7 (0.64)    22.6 (0.61)    23.0 (0.61)    0.23 (0.19)
Waist (in.)      34.3 (0.63)    34.1 (0.58)    34.3 (0.62)    0.07 (0.36)
Systolic BP      122 (1.82)     123 (2.37)     127 (2.14)     5.23 (1.71)
Diastolic BP     74.0 (1.22)    75.9 (1.14)    76.8 (1.18)    2.87 (1.22)

One-Time:
                 1st            2nd            3rd            Diff
Body fat %       21.6 (1.07)    22.0 (1.11)    21.8 (1.14)    0.29 (0.33)
Pulse rate       81.8 (1.69)    79.9 (1.52)    80.1 (1.59)    −1.75 (1.73)
Weight (kg)      59.8 (1.60)    60.0 (1.56)    60.4 (1.60)    0.59 (0.21)
BMI              21.7 (0.45)    21.8 (0.44)    21.9 (0.45)    0.22 (0.07)
Waist (in.)      33.0 (0.47)    32.9 (0.46)    32.8 (0.48)    −0.10 (0.27)
Systolic BP      121 (2.01)     125 (2.04)     123 (1.83)     2.32 (1.88)
Diastolic BP     75.8 (1.44)    75.6 (1.11)    76.8 (1.20)    1.07 (1.23)

Eight-Times:
                 1st            2nd            3rd            Diff
Body fat %       26.9 (1.09)    26.7 (1.09)    26.1 (1.07)    −0.78 (0.21)
Pulse rate       80.2 (1.47)    81.7 (1.48)    79.0 (1.81)    −1.25 (1.78)
Weight (kg)      64.0 (1.54)    64.0 (1.51)    63.7 (1.52)    −0.34 (0.25)
BMI              23.2 (0.40)    23.2 (0.39)    23.1 (0.39)    −0.12 (0.09)
Waist (in.)      35.0 (0.42)    34.7 (0.41)    34.3 (0.44)    −0.72 (0.23)
Systolic BP      122 (1.90)     125 (1.85)     125 (1.76)     1.78 (1.99)
Diastolic BP     74.7 (1.07)    76.3 (1.08)    77.3 (1.25)    2.58 (1.33)

a Standard errors are in parentheses. Body mass index (BMI) is calculated using the formula BMI = (weight in kilograms)/(height in meters)². “1st,” “2nd,” and “3rd” refer to the first, second, and third times that the biometric data were taken.
APPENDIX B: QUESTIONNAIRE SUMMARY STATISTICS AND REGRESSIONS, STUDY 2

TABLE B.I
QUESTIONNAIRE SUMMARY STATISTICS

                                         Control         One-Time        Eight-Times     Aggregate
Exercise frequency (per week)            2.391 (0.257)   2.202 (0.264)   2.058 (0.187)   2.194 (0.136)
Wish to exercise more (yes = 1)          0.821 (0.062)   0.807 (0.053)   0.883 (0.042)   0.840 (0.029)
Money helps (yes = 1)                    0.872 (0.054)   0.789 (0.040)   0.825 (0.043)   0.824 (0.028)
If so, long-term effect (yes = 1)        0.875 (0.052)   0.744 (0.063)   0.767 (0.058)   0.790 (0.034)
GPA                                      3.324 (0.064)   3.077 (0.065)   3.267 (0.067)   3.211 (0.039)
Happiness with social life (1–7)         5.949 (0.107)   5.948 (0.107)   6.017 (0.087)   5.974 (0.065)
Happiness with academic affairs (1–7)    5.590 (0.171)   5.246 (0.153)   5.433 (0.133)   5.404 (0.087)
Change needed in life (1–7)              4.500 (0.247)   4.456 (0.194)   4.783 (0.178)   4.594 (0.116)

a Standard errors are in parentheses.
TABLE B.II
OLS REGRESSIONS FOR CHANGE IN ATTENDANCE RATE AND QUESTIONNAIRE RESPONSES

Independent Variables                     1                 2                 3                 4
Exercise frequency (per week)             −0.056 [0.070]    −0.070 [0.071]    —                 —
Wish to exercise more (yes = 1)           −0.435 [0.329]    −0.459 [0.331]    —                 —
Money helps (yes = 1)                     −1.034 [0.765]    −1.147 [0.777]    —                 —
If so, long-term effect (yes = 1)         −0.057 [0.278]    −0.052 [0.279]    —                 —
GPA                                       0.054 [0.270]     0.063 [0.271]     —                 —
Happiness with social life (1–7)          0.117 [0.138]     0.137 [0.140]     —                 —
Happiness with academic affairs (1–7)     0.025 [0.134]     0.020 [0.136]     —                 —
Change needed in life (1–7)               0.162** [0.080]   0.257* [0.143]    0.079 [0.066]     0.144 [0.123]
Change needed × one-time                  —                 −0.196 [0.180]    —                 −0.190 [0.159]
Change needed × eight-times               —                 −0.068 [0.176]    —                 0.012 [0.155]
Constant                                  1.488 [1.100]     1.580 [1.108]     0.481*** [0.103]  0.471*** [0.103]
No. observations                          132               132               155               155
R²                                        0.055             0.065             0.009             0.025

a The control-group attendee is the omitted variable in these regressions. *, **, *** indicate significance at the 10 percent, 5 percent, and 1 percent levels, respectively, for a two-tailed test.
REFERENCES
ANDERSEN, R. E. (1999): “Exercise, an Active Lifestyle and Obesity,” The Physician and Sport Medicine, available at http://www.physsportsmed.com/index.php?art=psm_10.1_1999?article=1021. [910]
ANGRIST, J., D. LANG, AND P. OREOPOULOS (2009): “Incentives and Services for College Achievement: Evidence From a Randomized Trial,” American Economic Journal: Applied Economics, 1, 136–163. [912,926]
ANGRIST, J., AND V. LAVY (2009): “The Effect of High Stakes High School Achievement Awards: Evidence From a Group Randomized Trial,” American Economic Review (forthcoming). [926]
BECKER, G. S. (1992): “Habits, Addictions and Traditions,” Kyklos, 45, 327–345. [912,915]
BECKER, G. S., AND K. M. MURPHY (1988): “A Theory of Rational Addiction,” Journal of Political Economy, 96, 675–700. [911,912,915]
BENABOU, R., AND J. TIROLE (2003): “Intrinsic and Extrinsic Motivation,” Review of Economic Studies, 70, 489–520. [914]
——— (2004): “Willpower and Personal Rules,” Journal of Political Economy, 112, 848–886. [912]
BERNHEIM, B. D., AND A. RANGEL (2004): “Addiction and Cue-Triggered Decision Processes,” American Economic Review, 94, 1558–1590. [912,928]
BROWNELL, K. D. (1995): “Exercise in the treatment of obesity,” in Eating Disorders and Obesity: A Comprehensive Handbook, ed. by K. D. Brownell and C. G. Fairburn. New York: Guilford Press, 473–478. [910]
CHARNESS, G., AND U. GNEEZY (2009): “Supplement to ‘Incentives to Exercise’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/7416_instructions to experimental subjects.zip. [913]
DECI, E. (1971): “Effects of Externally Mediated Rewards on Intrinsic Motivation,” Journal of Personality and Social Psychology, 18, 105–115. [911,914]
DELLAVIGNA, S. (2009): “Psychology and Economics: Evidence From the Field,” Journal of Economic Literature (forthcoming). [910]
DELLAVIGNA, S., AND U. MALMENDIER (2006): “Paying Not to Go to the Gym,” American Economic Review, 96, 694–719. [909,912]
DONATELLE, R. J., D. HUDSON, S. DOBIE, A. GOODALL, M. HUNSBERGER, AND K. OSWALD (2004): “Incentives in Smoking Cessation: Status of the Field and Implications for Research and Practice With Pregnant Smokers,” Nicotine & Tobacco Research, 6, S163–S179. [927]
ELLISON, G. (2006): “Bounded Rationality in Industrial Organization,” in Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, ed. by R. Blundell, W. K. Newey, and T. Persson. Cambridge, U.K.: Cambridge University Press. [910]
FEHR, E., AND A. FALK (2002): “Psychological Foundations of Incentives,” European Economic Review, 46, 687–724. [914]
FREDERICK, S., G. LOEWENSTEIN, AND T. O’DONOGHUE (2002): “Time Discounting and Time Preference: A Critical Review,” Journal of Economic Literature, 40, 351–401. [912]
FREY, B. S., AND R. JEGEN (2001): “Motivation Crowding Theory,” Journal of Economic Surveys, 15, 589–611. [911,914]
GINÉ, X., D. KARLAN, AND J. ZINMAN (2008): “Put Your Money Where Your Butt Is: A Commitment Savings Account for Smoking Cessation,” available at http://karlan.yale.edu/p/CARES_dec08.pdf. [928]
GLASNAPP, D., AND J. POGGIO (1985): Essentials of Statistical Analysis for the Behavioral Sciences. Columbus, OH: Merrill. [918]
GNEEZY, U., AND A. RUSTICHINI (2000a): “Pay Enough, or Don’t Pay at All,” Quarterly Journal of Economics, 115, 791–810. [911,914,915]
——— (2000b): “A Fine Is a Price,” Journal of Legal Studies, 29, 1–17. [911]
HEIDHUES, P., AND B. KŐSZEGI (2008): “Competition and Price Variation When Consumers Are Loss Averse,” American Economic Review, 98, 1245–1248. [910]
HEYMAN, J., AND D. ARIELY (2004): “Effort for Payment: A Tale of Two Markets,” Psychological Science, 15, 787–793. [915]
KAYMAN, S., W. BRUVOLD, AND J. STERN (1990): “Maintenance and Relapse After Weight Loss in Women: Behavioral Aspects,” American Journal of Clinical Nutrition, 52, 800–807. [910]
KREMER, M., E. MIGUEL, AND R. THORNTON (2009): “Incentives to Learn,” The Review of Economics and Statistics (forthcoming). [926]
LAIBSON, D. I. (1997): “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Economics, 112, 443–477. [912]
LAUGA, D. (2008): “Persuasive Advertising With Sophisticated but Impressionable Consumers,” Mimeo, UCSD, available at http://management.ucsd.edu/faculty/directory/lauga/docs/persuasive_advertising.pdf. [910]
LEE, C., S. BLAIR, AND A. JACKSON (1999): “Cardiorespiratory Fitness, Body Composition, and All-Cause and Cardiovascular Disease Mortality in Men,” American Journal of Clinical Nutrition, 69, 373–380. [910]
LEPPER, M., AND D. GREENE (1978): The Hidden Costs of Reward: New Perspectives in the Psychology of Human Motivation. New York: Elbaum/Wiley. [914]
LOEWENSTEIN, G., AND T. O’DONOGHUE (2005): “Animal Spirits: Affective and Deliberative Processes in Economic Behavior,” Mimeo, available at http://people.cornell.edu/pages/edo1/will.pdf. [912]
NATIONAL CENTER FOR HEALTH STATISTICS, http://www.cdc.gov/nchs/products/pubs/pubd/hestats/overweight/overwght_adult_03.htm.
O’DONOGHUE, T., AND M. RABIN (1999): “Doing It Now or Later,” American Economic Review, 89, 103–124. [912]
RYDER, H., AND G. HEAL (1973): “Optimal Growth With Intertemporally Dependent Preferences,” Review of Economic Studies, 40, 1–31. [915]
THALER, R., AND S. BENARTZI (2004): “Save More Tomorrow: Using Behavioral Economics to Increase Employee Saving,” Journal of Political Economy, 112, S164–S187. [910]
TRUST FOR AMERICA’S HEALTH (2007): “F as in Fat: How Obesity Policies Are Failing in America,” available at http://healthyamericans.org/reports/obesity2007/. [910]
VOLPP, K., G. LOEWENSTEIN, A. TROXEL, J. DOSHI, M. PRICE, M. LASKIN, AND S. KIMMEL (2008): “Financial Incentive-Based Approaches for Weight Loss: A Randomized Trial,” Journal of the American Medical Association, 300, 2631–2637. [927]
Dept. of Economics, University of California at Santa Barbara, 2127 North Hall, Santa Barbara, CA 93106-9210, U.S.A.; [email protected]
and
Rady School of Management, University of California at San Diego, Otterson Hall, 9500 Gilman Dr., #0553, La Jolla, CA 92093-0553, U.S.A.; [email protected].

Manuscript received September, 2007; final revision received December, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 933–952
NOTES AND COMMENTS

A DOUBLE-TRACK ADJUSTMENT PROCESS FOR DISCRETE MARKETS WITH SUBSTITUTES AND COMPLEMENTS

BY NING SUN AND ZAIFU YANG1

We propose a new Walrasian tâtonnement process called a double-track procedure for efficiently allocating multiple heterogeneous indivisible items in two distinct sets to many buyers who view items in the same set as substitutes but items across the two sets as complements. In each round of the process, a Walrasian auctioneer first announces the current prices for all items, buyers respond by reporting their demands at these prices, and then the auctioneer adjusts simultaneously the prices of items in one set upward but those of items in the other set downward. It is shown that this procedure converges globally to a Walrasian equilibrium in finitely many rounds.

KEYWORDS: Adjustment process, auction, substitute, complement, indivisibility.
1. INTRODUCTION

TÂTONNEMENT PROCESSES or auctions are fundamental instruments for discovering market-clearing prices and efficient allocations. The study of such processes provides one way of addressing the question of price formation and has long been a major issue of economic research. In 1874, Leon Walras formulated the first tâtonnement process—a type of auction. Samuelson (1941) and Arrow and Hurwicz (1958) were among the first to study the convergence of certain tâtonnement processes. They proved that such processes converge globally to an equilibrium for any economy with divisible goods when the goods are substitutable. This study then generated great hope that such processes might also work for a larger class of economies with divisible goods, but Scarf (1960) soon dashed such hopes by showing that when goods exhibit complementarity, such processes can oscillate and will never tend toward equilibrium. Later it was Scarf (1973) who developed a remarkable process that can find an equilibrium in any reasonable economy with divisible goods.

The study on tâtonnement processes for markets with indivisible goods began in the early 1980s. In a seminal paper, Kelso and Crawford (1982)

1 We are deeply grateful to Atsushi Kajii and Larry Samuelson for many insightful comments and constructive suggestions, and to five anonymous referees for very useful feedback. In particular, a referee suggested we strengthen Theorem 2 from if to if and only if. We also thank Tommy Andersson, Vince Crawford, Mamoru Kaneko, Gerard van der Laan, Michael Ostrovsky, Lars-Gunnar Svensson, Dolf Talman, Walter Trockel, and many seminar and conference participants for their helpful comments. All errors are our own. This research was financially supported by KIER, Kyoto University (Sun), and by the Ministry of Education, Science and Technology of Japan, the Netherlands Organization for Scientific Research (NWO), and CentER, Tilburg University (Yang).
developed an adjustment process that allows each firm to hire several workers.2 They showed that their process efficiently allocates workers with competitive salaries to firms, provided that every firm views all the workers as substitutes. This condition is called gross substitutes (GS) and has been widely used in auction, matching, and equilibrium models. Gul and Stacchetti (2000) devised an elegant ascending auction that finds a Walrasian equilibrium in finitely many steps when all goods are substitutes. While their analysis is mathematically sophisticated and quite demanding, Ausubel (2006) significantly simplified the analysis by developing a simpler and more elegant dynamic auction. Based on his auction, he also proposed a novel strategy-proof dynamic procedure that yields a Vickrey–Clarke–Groves (VCG) outcome. Milgrom (2000) presented an ascending auction with an emphasis on the sale of radio spectrum licenses in the United States. However, all these processes were designed and work only for substitutes. In contrast, it is widely recognized3 that complementarities pose a challenge for designing dynamic mechanisms for discovering market-clearing prices and efficient allocations.

The current paper explores a tâtonnement process for markets with indivisible goods which exhibit a typical pattern of complementarities. More specifically, we study a market model where a seller wishes to sell two distinct sets S_1 and S_2 of several heterogeneous items to a number of buyers. In general, buyers view items in the same set as substitutes but items across the two sets as complements. This condition is called gross substitutes and complements (GSC), generalizing the GS condition. Many typical situations fit this general description, stretching from the sale of computers and software packages to consumers, to the allocation of workers and machines to firms and of takeoff and landing slots to airlines, and so forth.4 In our earlier analysis (Sun and Yang (2006b)), we showed that if all agents in an exchange economy have GSC preferences, the economy has a Walrasian equilibrium. But the method is nonconstructive, and so, in particular, the important issue of how to find the equilibrium

2 Special but well-studied unit-demand models typically assume that every consumer demands at most one item or every person needs only one opposite-sex partner. See Gale and Shapley (1962), Shapley and Scarf (1974), Crawford and Knoer (1981), and Demange, Gale, and Sotomayor (1986) among others.
3 The current state of the art is well documented in Milgrom (2000), Jehiel and Moldovanu (2003), Klemperer (2004), and Maskin (2005). We quote from Milgrom (2000, p. 258): “The problem of bidding for complements has inspired continuing research both to clarify the scope of the problem and to devise practical auction designs that overcome the exposure problem.” The so-called exposure problem refers to a phenomenon concerning an ascending auction: at the earlier stages of the auction, all items were over-demanded, but as the prices are going up, some or all items may be exposed to the possibility that no bidder wants to demand them anymore, because complementary items have become too expensive. As a result, the ascending auction will get stuck in disequilibrium.
4 Ostrovsky (2008) independently proposed a similar condition for a supply chain model where prices of goods are fixed and a non-Walrasian equilibrium solution is used.
See also Shapley (1962), Samuelson (1974), Rassenti, Smith, and Bulfin (1982), Krishna (2002), and Milgrom (2007) for related work.
prices and allocation is not dealt with. The existing auction processes, however, are hindered by the exposure problem and cannot handle this situation.

In contrast, in this paper we propose a new Walrasian tâtonnement process—a double-track procedure that circumvents the exposure problem and can discover a Walrasian equilibrium. This procedure works as follows. In each round of the process, a Walrasian auctioneer first calls out the current prices for all items, buyers are asked to report their demands at these prices, and then according to buyers’ reported demands the auctioneer adjusts the prices of over-demanded items in one set S_1 (or S_2) upward but those of under-demanded items in the other set S_2 (or S_1) downward. We prove that this procedure finds a Walrasian equilibrium in finitely many rounds. Unlike traditional tâtonnement processes that typically adjust prices continuously, this procedure adjusts prices only in integer or fixed quantities.

The proposed procedure differs markedly from the existing auctions in that it adjusts simultaneously the prices of items in S_1 and S_2, respectively, in opposite directions, whereas the existing auctions typically adjust all prices simultaneously in only one direction (either ascending or descending). When all items are substitutes (i.e., either S_1 = ∅ or S_2 = ∅), the proposed procedure coincides with Ausubel’s (2006) auction and is similar to Gul and Stacchetti (2000). In general the proposed procedure deals with circumstances, including complements, that go beyond the existing models with substitutes. Unlike sealed-bid auction mechanisms, the proposed procedure is informationally efficient in the sense that it only requires buyers to report their demands at several price vectors along a finite path to equilibrium, and it is also privacy-preserving in the sense that buyers need not report their demands beyond that finite path and can avoid exposing their values over all possible bundles. This property is important, because in reality businessmen are generally unwilling to reveal their values or costs. To design this new procedure, it is crucial to introduce a new characterization of the GSC condition called generalized single improvement (GSI), generalizing the single improvement (SI) property of Gul and Stacchetti (1999).

This paper proceeds as follows. Section 2 introduces the market model and Section 3 presents the main results. Section 4 concludes.

2. THE MARKET MODEL

A seller wishes to sell a set N = {β_1, β_2, ..., β_n} of n indivisible items to a finite group I of buyers. The items may be heterogeneous and can be divided into two sets S_1 and S_2 (i.e., N = S_1 ∪ S_2 and S_1 ∩ S_2 = ∅). For instance, one can think of S_1 as computers and of S_2 as software packages. Items in the same set can also be heterogeneous. Every buyer i has a value function u^i : 2^N → R specifying his valuation u^i(B) (in units of money) on each bundle B, with u^i(∅) = 0, where 2^N denotes the family of all bundles of items. It is standard to assume that u^i is weakly increasing, every buyer can pay up to his
value and has quasilinear utilities in money, and the seller values each bundle at zero. Note, however, that weak monotonicity can be dropped; see Sun and Yang (2006a, 2006b).

A price vector p = (p_1, ..., p_n) ∈ R^n specifies a price p_h for each item β_h ∈ N. Buyer i’s demand correspondence D^i(p), the net utility function v^i(A, p), and the indirect utility function V^i(p) are defined, respectively, by

(1)   D^i(p) = arg max_{A ⊆ N} { u^i(A) − Σ_{β_h ∈ A} p_h },

      v^i(A, p) = u^i(A) − Σ_{β_h ∈ A} p_h,

      V^i(p) = max_{A ⊆ N} { u^i(A) − Σ_{β_h ∈ A} p_h }.
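When the number of items is small, the objects defined in (1) can be computed by brute force over all 2^n bundles. The following Python sketch is purely illustrative (the buyer's valuation and the item names c and s below are made up, not taken from the paper); it enumerates every bundle to return D^i(p) and V^i(p).

    from itertools import combinations

    def all_bundles(items):
        """Enumerate every subset of the item set N."""
        items = list(items)
        for r in range(len(items) + 1):
            for combo in combinations(items, r):
                yield frozenset(combo)

    def net_utility(u, bundle, p):
        """v^i(A, p) = u^i(A) - sum of the prices of the items in A."""
        return u(bundle) - sum(p[item] for item in bundle)

    def demand(u, p, items):
        """Return (D^i(p), V^i(p)) by brute force over all bundles."""
        values = {A: net_utility(u, A, p) for A in all_bundles(items)}
        best = max(values.values())
        return [A for A, v in values.items() if v == best], best

    # Illustrative valuation: a computer c and a software package s, viewed as
    # complements by this buyer (integer values).
    def u_buyer(A):
        return {frozenset(): 0, frozenset({"c"}): 3,
                frozenset({"s"}): 2, frozenset({"c", "s"}): 9}[frozenset(A)]

    prices = {"c": 4, "s": 3}
    D, V = demand(u_buyer, prices, ["c", "s"])
    print("D^i(p) =", [sorted(A) for A in D], " V^i(p) =", V)

At these prices only the full bundle maximizes net utility, so the sketch reports D^i(p) = {{c, s}} and V^i(p) = 2.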
It is known that for any value function u^i : 2^N → R, the indirect utility function V^i is a decreasing, continuous, and convex function.

An allocation of items in N is a partition π = (π(i), i ∈ I) of items among all buyers in I, that is, π(i) ∩ π(j) = ∅ for all i ≠ j and ∪_{i ∈ I} π(i) = N. Note that π(i) = ∅ is allowed. At allocation π, buyer i receives bundle π(i). An allocation π is efficient if Σ_{i ∈ I} u^i(π(i)) ≥ Σ_{i ∈ I} u^i(ρ(i)) for every allocation ρ. Given an efficient allocation π, let R(N) = Σ_{i ∈ I} u^i(π(i)). We call R(N) the market value of the items.

DEFINITION 1: A Walrasian equilibrium (p, π) consists of a price vector p ∈ R^n_+ and an allocation π such that π(i) ∈ D^i(p) for every i ∈ I.

It is well known that every equilibrium allocation is efficient, but an equilibrium may not always exist. To ensure the existence of an equilibrium, we need to impose some conditions on the model. The most important one is called the gross substitutes and complements condition, which is defined below.5

DEFINITION 2: The value function u^i of buyer i satisfies the gross substitutes and complements (GSC) condition if for any price vector p ∈ R^n, any item β_k ∈ S_j for j = 1 or 2, any δ ≥ 0, and any A ∈ D^i(p), there exists B ∈ D^i(p + δe(k)) such that [A ∩ S_j] \ {β_k} ⊆ B and [A^c ∩ S_j^c] ⊆ B^c.

5 The following piece of notation is used throughout the paper. For any integer k (1 ≤ k ≤ n), e(k) denotes the kth unit vector in R^n. Let Z^n stand for the integer lattice in R^n and let 0 denote the n-vector of 0’s. For any subset A of N, let e(A) = Σ_{β_k ∈ A} e(k). When A = {β_k}, we also write e(A) as e(k). For any subset A of N, let A^c denote its complement, that is, A^c = N \ A. For any vector p ∈ R^n and any set A ∈ 2^N, let p(A) = Σ_{β_k ∈ A} p_k e(k). So we have p(N) = p for any p ∈ R^n. For any finite set A, #(A) denotes the number of elements in A. For any set D ⊆ R^n, co(D) denotes its convex hull.
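Definition 1 can likewise be checked mechanically for a candidate pair (p, π): π must partition N and give each buyer a bundle maximizing his net utility at p. The sketch below is again illustrative (made-up values and hypothetical helper names), not the authors' procedure.

    from itertools import chain, combinations

    def powerset(items):
        s = list(items)
        return [frozenset(c) for c in
                chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

    def is_walrasian_equilibrium(value_fns, p, allocation, items):
        """Definition 1: the allocation partitions N and pi(i) is in D^i(p) for every i."""
        assigned = sorted(x for bundle in allocation.values() for x in bundle)
        if assigned != sorted(items):                  # pi must partition N
            return False
        for i, u in value_fns.items():
            def net(A):
                return u(A) - sum(p[x] for x in A)     # net utility v^i(A, p)
            if net(frozenset(allocation[i])) != max(net(A) for A in powerset(items)):
                return False
        return True

    # Tiny illustration with two buyers and two items (made-up values).
    u1 = lambda A: {frozenset(): 0, frozenset({"c"}): 3,
                    frozenset({"s"}): 2, frozenset({"c", "s"}): 9}[frozenset(A)]
    u2 = lambda A: len(A)                              # buyer 2 values each item at 1
    print(is_walrasian_equilibrium({"1": u1, "2": u2}, {"c": 4, "s": 3},
                                   {"1": {"c", "s"}, "2": set()}, ["c", "s"]))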
GSC says that buyer i views items in each set S_j as substitutes, but items across the two sets S_1 and S_2 as complements, in the sense that if the buyer demands a bundle A at prices p and the price of some item β_k ∈ S_j is now increased, then he would still demand the items both in A and in S_j whose prices did not rise, but he would not demand any item in the other set S_j^c that was not in his choice set A at prices p. In particular, when either S_1 = ∅ or S_2 = ∅, GSC reduces to the gross substitutes (GS) condition of Kelso and Crawford (1982). GS excludes complements and requires that all the items be substitutes. This case has been studied extensively in the literature; see, for example, Kelso and Crawford (1982), Gul and Stacchetti (1999, 2000), Milgrom (2000, 2004), Crawford (2008), and Ausubel (2006). Moreover, Ausubel (2004) and Perry and Reny (2005) studied auctions for selling many identical units of a good (a special case of GS) with interdependent values.

The following assumptions will be used in this paper:

ASSUMPTION A1—Integer Private Values: Every buyer i’s value function u^i : 2^N → Z_+ takes integer values and is his private information.

ASSUMPTION A2—Gross Substitutes and Complements: Every buyer i’s value function u^i satisfies the GSC condition.

ASSUMPTION A3—The Auctioneer’s Knowledge: The auctioneer knows some integer value U* greater than any buyer’s possible maximum value.

The integer value assumption in Assumption A1 is quite standard and natural, since one cannot value a bundle of goods more closely than to the nearest penny. Assumption A3 is merely a technical assumption used only in Theorem 4. The essence and difficulty of designing an adjustment process for locating a Walrasian equilibrium in this market lie in the facts that every buyer’s valuation of any bundle of goods is private information and is therefore unobservable to the auctioneer (Assumption A1), and that there are multiple indivisible substitutes and complements for sale (Assumption A2).

3. THE DOUBLE-TRACK ADJUSTMENT PROCESS

3.1. The Basis

This subsection provides the basis on which the double-track procedure will be established for finding a Walrasian equilibrium in the market described in the previous section. We begin with a new characterization of the GSC condition.

DEFINITION 3: The value function u^i of buyer i has the generalized single improvement (GSI) property if for any price vector p ∈ R^n and any bundle A ∉ D^i(p), there exists a bundle B ∈ 2^N such that v^i(A, p) < v^i(B, p) and B satisfies exactly one of the following conditions:
(i) A ∩ S_j = B ∩ S_j, and #[(A \ B) ∩ S_j^c] ≤ 1 and #[(B \ A) ∩ S_j^c] ≤ 1, for either j = 1 or j = 2.
(ii) Either B ⊆ A and #[(A \ B) ∩ S_1] = #[(A \ B) ∩ S_2] = 1, or A ⊆ B and #[(B \ A) ∩ S_1] = #[(B \ A) ∩ S_2] = 1.

GSI says that, for buyer i, every suboptimal bundle A at prices p can be strictly improved by adding an item to it, removing an item from it, or doing both, within one of the two sets S_1 or S_2. The bundle A can also be strictly improved by simultaneously adding one item from each set S_j to it, or by simultaneously removing one item from each set A ∩ S_j, j = 1, 2. We call bundle B a GSI improvement of A. When either S_1 or S_2 is empty, GSI coincides with the single improvement (SI) property of Gul and Stacchetti (1999), which in turn is equivalent to the GS condition. The GSI property plays an important role in our adjustment process design, as SI does in Ausubel (2006) and Gul and Stacchetti (2000). We now state the following theorem whose proof, together with those of Theorems 3 and 5 and Lemmas 2 and 4, is deferred to the Appendix.

THEOREM 1: Conditions GSC and GSI are equivalent.

Let p, q ∈ R^n be any vectors. With respect to the order (S_1, S_2), we define their generalized meet s = (s_1, ..., s_n) = p ∧_g q and join t = (t_1, ..., t_n) = p ∨_g q by

      s_k = min{p_k, q_k} for β_k ∈ S_1,    s_k = max{p_k, q_k} for β_k ∈ S_2;
      t_k = max{p_k, q_k} for β_k ∈ S_1,    t_k = min{p_k, q_k} for β_k ∈ S_2.
Note that the two operations are different from the standard meet and join operations. A set W ⊆ R^n is called a generalized lattice if p ∧_g q, p ∨_g q ∈ W for any p, q ∈ W. For p, q ∈ R^n, we introduce a new order by defining p ≤_g q if and only if p(S_1) ≤ q(S_1) and p(S_2) ≥ q(S_2). Given a set W ⊆ R^n, a point p* ∈ W is called a smallest element if p* ≤_g q for every q ∈ W. Similarly, a point q* ∈ W is called a largest element if q* ≥_g p for every p ∈ W. It is easy to verify that a compact generalized lattice has a unique smallest (largest) element in it. Given a generalized lattice W ⊆ R^n, we say a function f : W → R is a generalized submodular function if f(p ∧_g q) + f(p ∨_g q) ≤ f(p) + f(q) for all p, q ∈ W. Ausubel and Milgrom (2002, Theorem 10) showed that items are substitutes for a buyer if and only if his indirect utility function is submodular. Our next theorem generalizes their result from GS to GSC preferences and will be used to establish Theorem 3 below. The proof of Theorem 2, as well as those of Lemmas 1, 3, 5, and 6, is relegated to the Supplemental Material (Sun and Yang (2009)).

THEOREM 2: A value function u^i satisfies the GSC condition if and only if the indirect utility function V^i is a generalized submodular function.
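The generalized meet, join, and order translate directly into code. The sketch below (Python; the item labels and numbers are arbitrary illustrations) represents a price vector as a dict from items to prices.

    def meet_g(p, q, S1, S2):
        """Generalized meet p ∧_g q: coordinatewise min on S1 and max on S2."""
        return {k: (min(p[k], q[k]) if k in S1 else max(p[k], q[k])) for k in p}

    def join_g(p, q, S1, S2):
        """Generalized join p ∨_g q: coordinatewise max on S1 and min on S2."""
        return {k: (max(p[k], q[k]) if k in S1 else min(p[k], q[k])) for k in p}

    def leq_g(p, q, S1, S2):
        """p ≤_g q iff p ≤ q coordinatewise on S1 and p ≥ q coordinatewise on S2."""
        return all(p[k] <= q[k] for k in S1) and all(p[k] >= q[k] for k in S2)

    # Small illustration (arbitrary numbers): two items in S1, one in S2.
    S1, S2 = {"a", "b"}, {"c"}
    p = {"a": 1, "b": 5, "c": 4}
    q = {"a": 3, "b": 2, "c": 7}
    print(meet_g(p, q, S1, S2))   # {'a': 1, 'b': 2, 'c': 7}
    print(join_g(p, q, S1, S2))   # {'a': 3, 'b': 5, 'c': 4}
    print(leq_g(meet_g(p, q, S1, S2), join_g(p, q, S1, S2), S1, S2))  # True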
For the market model, define the Lyapunov function L : R^n → R by

(2)   L(p) = Σ_{β_h ∈ N} p_h + Σ_{i ∈ I} V^i(p),

where V^i is the indirect utility function of buyer i ∈ I. This type of function is well known in the literature for economies with divisible goods (see, e.g., Arrow and Hahn (1971) and Varian (1981)), but was only recently explored ingeniously by Ausubel (2005, 2006) in the context of indivisible goods. His Proposition 1 in both papers shows that if an equilibrium exists, then the set of equilibrium price vectors coincides with the set of minimizers of the Lyapunov function. The following lemma strengthens this result by providing a necessary and sufficient condition for the existence of an equilibrium.

LEMMA 1: For the market model, p* is a Walrasian equilibrium price vector if and only if it is a minimizer of the Lyapunov function L defined by (2) with its value L(p*) equal to the market value R(N).

A set D ⊆ R^n is integrally convex if D = co(D) and x ∈ D implies x ∈ co(D ∩ N(x)), where N(x) = {z ∈ Z^n : ‖z − x‖_∞ < 1} and ‖·‖_∞ denotes the maximum norm; that is, every point x ∈ D can be represented as a convex combination of integral points in N(x) ∩ D. Favati and Tardella (1990) originally introduced this concept for discrete subsets of Z^n. The following theorem will be used to prove the convergence of the double-track procedures.

THEOREM 3: Assume that the market model satisfies Assumptions A1 and A2. Then (i) the Lyapunov function L defined by (2) is a continuous, convex, and generalized submodular function; (ii) the set of Walrasian equilibrium price vectors forms a nonempty, compact, integrally convex, and generalized lattice, implying that all its vertices, including both its smallest and largest equilibrium price vectors, denoted by p̲ and p̄, respectively, are integer vectors.

The theorem asserts that (i) the Lyapunov function is a well-behaved function, meaning that a local minimum is also a global minimum, and (ii) the set of Walrasian equilibrium price vectors possesses an elegant geometry: (a) the set is an integral polyhedron, that is, all vertices including p̲ and p̄ are integer vectors, and (b) the intersection of the set with any unit hypercube {x} + [0, 1]^n for x ∈ Z^n is integrally convex and thereby all of its vertices are integer vectors.

3.2. The Formal Procedure

We are now ready to give a formal description of the double-track adjustment process.6 This process can be seen as an extension of Ausubel (2006)

6 We refer to Yang (1999) for various adjustment processes for finding Walrasian equilibria, Nash equilibria, and their refinements in continuous models.
from GS to GSC preference environments and thus from the standard order ≤ to the new order ≤_g.7 More specifically, when either S_1 = ∅ or S_2 = ∅ (i.e., all items are substitutes to the buyers), this new process coincides exactly with Ausubel’s. The new order ≤_g differs from the standard order ≤ used by the existing auctions in that the new process adjusts prices of items in one set upward, but at the same time adjusts prices of items in the other set downward. Therefore, we define an n-dimensional cube for price adjustment by

      □ = {δ ∈ R^n : 0 ≤ δ_k ≤ 1 ∀β_k ∈ S_1,  −1 ≤ δ_l ≤ 0 ∀β_l ∈ S_2}.

For any buyer i ∈ I, any price vector p ∈ Z^n, and any price variation δ ∈ □, choose

(3)   S̃^i ∈ arg min_{S ∈ D^i(p)} Σ_{β_h ∈ S} δ_h.

The next lemma asserts that for any buyer i, any p ∈ Z^n, and any δ ∈ □, his optimal bundle S̃^i in (3) chosen from D^i(p) remains constant for all price vectors on the line segment from p to p + δ. This property is crucial for the auctioneer to adjust the current price vector to the next one and is a consequence of the GSI property.

LEMMA 2: If Assumptions A1 and A2 hold for the market model, then for any i ∈ I, any p ∈ Z^n, and any δ ∈ □, the solution S̃^i of formula (3) satisfies S̃^i ∈ D^i(p + λδ), and the Lyapunov function L(p + λδ) is linear in λ, for any parameter λ ≥ 0 such that 0 ≤ λδ_k ≤ 1 for every β_k ∈ S_1 and −1 ≤ λδ_l ≤ 0 for every β_l ∈ S_2.

Given a current price vector p(t) ∈ Z^n, the auctioneer first asks every buyer i to report his demand D^i(p(t)). Then she uses every buyer’s reported demand D^i(p(t)) to determine the next price vector p(t + 1). The underlying rationale for the auctioneer is to choose a direction δ ∈ □ so as to reduce the value of the Lyapunov function L as much as possible. To achieve this, she needs to solve the problem

(4)   max_{δ ∈ □} [ L(p(t)) − L(p(t) + δ) ].
Note that the above formula involves every buyer’s valuation of every bundle of goods, so it uses private information. Apparently, it is impossible for the 7 This process can be also viewed as a direct generalization of Gul and Stacchetti (2000) from GS to GSC environments. We adopt here the Lyapunov function approach instead of matroid theory used by Gul and Stacchetti, because the former is more familiar in economics and much simpler than the latter.
auctioneer to know such information unless the buyers tell her. Fortunately, she can fully infer the difference between L(p(t)) and L(p(t) + δ) just from the reported demands D^i(p(t)) and the price variation δ. To see this, we know from the definition of the Lyapunov function that for any given p(t) ∈ Z^n and δ ∈ □, the difference is given by

(5)   L(p(t)) − L(p(t) + δ) = Σ_{i ∈ I} [ V^i(p(t)) − V^i(p(t) + δ) ] − Σ_{β_h ∈ N} δ_h.

Although, at prices p(t), each buyer i may have many optimal choices, his indirect utility V^i(p(t)) at p(t) is unique since every optimal choice gives him the same indirect utility. Lemma 2 tells us that some S̃^i of his optimal choices remains unchanged when prices vary from p(t) to p(t) + δ. It is immediately clear that his indirect utility V^i(p(t) + δ) at prices p(t) + δ equals V^i(p(t)) − Σ_{β_h ∈ S̃^i} δ_h. Now we obtain the change in indirect utility for buyer i when prices move from p(t) to p(t) + δ. This change is unique and is given by

(6)   V^i(p(t)) − V^i(p(t) + δ) = min_{S ∈ D^i(p(t))} Σ_{β_h ∈ S} δ_h = Σ_{β_h ∈ S̃^i} δ_h,

where S̃^i is a solution given by (3) for buyer i with respect to price vector p(t) and the variation δ. Consequently, equation (5) becomes the following simple formula, whose right side involves only the price variation δ and optimal choices at p(t):

(7)   L(p(t)) − L(p(t) + δ) = Σ_{i ∈ I} min_{S ∈ D^i(p(t))} Σ_{β_h ∈ S} δ_h − Σ_{β_h ∈ N} δ_h
                            = Σ_{i ∈ I} Σ_{β_h ∈ S̃^i} δ_h − Σ_{β_h ∈ N} δ_h.
The next result shows that the set of solutions to problem (4) is a generalized lattice, and both its smallest and largest elements are integral, resembling Theorem 3 and following also from the generalized submodularity of the Lyapunov function. LEMMA 3: If Assumptions A1 and A2 hold for the market model, then the set of solutions to problem (4) is a nonempty, integrally convex, and generalized lattice, and both its smallest and largest elements are integer vectors. Given the current price vector p(t), the next price vector p(t + 1) is given by p(t + 1) = p(t) + δ(t), where δ(t) is the unique smallest element as described in Lemma 3. Since δ(t) is an integer vector, this implies that the auctioneer
does not need to search everywhere in the cube □ to achieve a maximal decrease in the value of the Lyapunov function. It suffices to search only the vertices (i.e., the integer vectors) of the cube □, and doing so will lead to the same maximal value decrease of the Lyapunov function. Let Δ = □ ∩ Z^n. By (7), the decision problem (4) of the seller boils down to computing the unique smallest solution δ(t) (on the order ≤_g) of the optimization problem

(8)   max_{δ ∈ Δ} [ Σ_{i ∈ I} min_{S ∈ D^i(p(t))} Σ_{β_h ∈ S} δ_h − Σ_{β_h ∈ N} δ_h ].
The max–min in formula (8) has a meaningful and interesting interpretation: when the prices are adjusted from p(t) to p(t + 1) = p(t) + δ(t), all buyers try to minimize their losses in indirect utility, whereas the seller strives for the highest gain. Nevertheless, the entire computation for (8) is carried out solely by the seller according to buyers’ reported demands D^i(p(t)). The computation of (8) is fairly simple because the seller can easily calculate the value min_{S ∈ D^i(p(t))} Σ_{β_h ∈ S} δ_h for each given δ ∈ Δ and buyer i. Now we summarize the adjustment process as follows.

The Dynamic Double-Track (DDT) Procedure

Step 1: The auctioneer announces an initial price vector p(0) ∈ Z^n_+ with p(0) ≤_g p̲. Let t := 0 and go to Step 2.

Step 2: After the announcement of p(t), the auctioneer asks every buyer i to report his demand D^i(p(t)). Then, according to (8) and the reported demands D^i(p(t)), the auctioneer computes the unique smallest element δ(t) (on the order ≤_g) and obtains the next price vector p(t + 1) := p(t) + δ(t). If p(t + 1) = p(t), then the procedure stops. Otherwise, let t := t + 1 and return to Step 2.

First, observe that this procedure simultaneously adjusts prices upward for items in S_1 and downward for items in S_2, but it does not run the two sets independently. When S_1 ≠ ∅ and S_2 = ∅ (or S_2 ≠ ∅ and S_1 = ∅), that is, the GS case, the procedure reduces to a multi-item ascending (descending) auction. Second, the rules of the procedure are simple, transparent, detail-free, and privacy-preserving, because buyers are asked to reveal only their demands and nothing else, such as their values. Third, to guarantee p(0) ≤_g p̲, the auctioneer just needs to set the initial prices of items in S_1 so low and those of items in S_2 so high that all items in S_1 are over-demanded, but all items in S_2 are under-demanded. This can easily be done because every buyer’s value function u^i is weakly increasing with u^i(∅) = 0 and is bounded above by U* given in Assumption A3. For instance, the auctioneer can simply take p(0) = (p_1(0), ..., p_n(0)) by setting p_k(0) = 0 for any β_k ∈ S_1 and p_k(0) = U* for any β_k ∈ S_2. Observe from the proof of the following Lemma 4 (in the Appendix) that Lemma 4(i) and (ii) are independent of the choice of p(0).
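To make the price-adjustment rule concrete, the following Python sketch carries out one round of Step 2: it enumerates the integer vertices Δ of the cube, evaluates the decrease of the Lyapunov function via formula (8) using only the reported demand sets, and selects the smallest maximizer in the order ≤_g. It is an illustrative toy implementation under the paper's assumptions, not the authors' code; the function names and the two-item example at the end are made up.

    from itertools import product, combinations

    def demand(u, p, items):
        """Brute-force demand correspondence D^i(p) of a value function u."""
        bundles = [frozenset(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]
        net = {A: u(A) - sum(p[x] for x in A) for A in bundles}
        best = max(net.values())
        return [A for A, v in net.items() if v == best]

    def ddt_round(p, reported_demands, S1, S2):
        """One Step-2 update: pick delta(t) maximizing formula (8) over the
        integer vertices of the price-adjustment cube (steps of 0/+1 on S1 and
        0/-1 on S2), then take the smallest maximizer in the order <=_g."""
        items = list(S1) + list(S2)
        grids = [(0, 1)] * len(S1) + [(0, -1)] * len(S2)
        deltas = [dict(zip(items, steps)) for steps in product(*grids)]

        def gain(d):
            # sum_i min_{S in D^i(p(t))} sum_{h in S} d_h  -  sum_{h in N} d_h
            buyers = sum(min(sum(d[x] for x in S) for S in D)
                         for D in reported_demands)
            return buyers - sum(d.values())

        best_gain = max(gain(d) for d in deltas)
        maximizers = [d for d in deltas if gain(d) == best_gain]
        # Generalized meet of all maximizers: by Lemma 3 it is itself a
        # maximizer and is the smallest one in the order <=_g.
        delta = {k: (min(d[k] for d in maximizers) if k in S1
                     else max(d[k] for d in maximizers)) for k in items}
        return {k: p[k] + delta[k] for k in p}, delta

    # Toy market (made up): one item "a" in S1, one item "b" in S2; a single
    # buyer with integer values who views a and b as complements.
    S1, S2 = {"a"}, {"b"}
    u1 = lambda A: {frozenset(): 0, frozenset({"a"}): 2,
                    frozenset({"b"}): 1, frozenset({"a", "b"}): 5}[frozenset(A)]
    p = {"a": 0, "b": 4}                      # low on S1, high on S2
    reported = [demand(u1, p, ["a", "b"])]    # one reported demand set per buyer
    p_next, delta = ddt_round(p, reported, S1, S2)
    print("delta(t) =", delta, " p(t+1) =", p_next)   # b's price moves down by 1

Repeating the update until delta(t) = 0 mirrors the stopping rule of the DDT procedure.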
LEMMA 4: Under Assumptions A1 and A2, the DDT procedure has the following properties: (i) p(t) ≤g p implies p(t + 1) ≤g p, and (ii) p(t + 1) = p(t) implies p(t) ≥g p.

We are ready to establish the following convergence theorem for the DDT procedure.

THEOREM 4: For the market model under Assumptions A1–A3, the DDT procedure converges to the smallest equilibrium price vector p in a finite number of rounds.

PROOF: Recall that by Theorem 3(ii), the market model has not only a nonempty set of equilibrium price vectors, but also a unique smallest equilibrium price vector p. Let {p(t), t = 0, 1, ...} be the sequence of price vectors generated by the procedure. Note that p(t + 1) = p(t) + δ(t) with δ(t) ∈ □ ∩ Z^n for t = 0, 1, ..., that p(t) ≤g p(t + 1) for t = 0, 1, ..., and that all p(t) are integer vectors. Because p(0) ≤g p, Lemma 4(i) implies that p(t) ≤g p for all t. Since δ(t) is an integer vector for any t and the sequence {p(t), t = 0, 1, ...} is bounded above by p, the sequence must be finite. This means that p(t*) = p(t* + 1) for some t*, that is, the sequence can be written as {p(t), t = 0, 1, ..., t*}. Note that p(t) ≠ p(t + 1) and δ(t) ≠ 0 for any t = 0, 1, ..., t* − 1. By Lemma 4(ii), p(t*) ≥g p. Because p(t*) ≤g p, it is clear that p(t*) = p. This shows that the procedure terminates with the smallest equilibrium price vector p in finitely many rounds. Q.E.D.

Similar to Ausubel (2006), Gul and Stacchetti (2000) in the GS case, and Demange, Gale, and Sotomayor (1986) in the unit-demand case (a special GS case), the DDT procedure finds the smallest equilibrium price vector p only if p(0) ≤g p. Now we modify the DDT procedure so that it converges globally to an equilibrium price vector (not necessarily p) starting from any integer price vector p(0). Analogous to the discrete set Δ, define the discrete set Δ* = −Δ. Through Δ*, we lower the prices of items in S_1 but raise the prices of items in S_2.

The Global Dynamic Double-Track (GDDT) Procedure

Step 1: Choose any initial price vector p(0) ∈ Z^n_+. Let t := 0 and go to Step 2.

Step 2: The auctioneer asks every buyer i to report his demand D^i(p(t)) at p(t). Then based on the reported demands D^i(p(t)), the auctioneer computes the unique smallest element δ(t) (on the order ≤g) according to (8). If δ(t) = 0, go to Step 3. Otherwise, set the next price vector p(t + 1) := p(t) + δ(t) and t := t + 1. Return to Step 2.
Step 3: The auctioneer asks every buyer i to report his demand D^i(p(t)) at p(t). Then based on the reported demands D^i(p(t)), the auctioneer computes the unique largest element δ(t) (on the order ≤g) according to (8), where Δ is replaced by Δ*. If δ(t) = 0, then the procedure stops. Otherwise, set the next price vector p(t + 1) := p(t) + δ(t) and t := t + 1. Return to Step 3.

First, observe that Step 2 of the GDDT procedure is the same as Step 2 of the DDT procedure, and Step 3 of the GDDT procedure is also the same as Step 2 of the DDT procedure except that in Step 3 we switch the roles of S_1 and S_2 by moving from Δ to Δ*. Second, the GDDT procedure terminates in Step 3 and never goes from Step 3 back to Step 2. This is a crucial feature, quite different from Ausubel's (2006) global auction, which in the GS case requires repeated implementation of his ascending auction and descending auction one after the other. Here we find that his global auction only needs to run the ascending auction and the descending auction once each. Third, because the order ≤g is defined with respect to the specified order of (S_1, S_2), the auctioneer computes the unique largest element δ(t) in Step 3 (which is equivalent to the unique smallest element if we redefine the order ≤g with respect to the order (S_2, S_1)). Theorem 5 below dispenses with Assumption A3.

THEOREM 5: For the market model under Assumptions A1 and A2, starting with any integer price vector, the GDDT procedure converges to an equilibrium price vector in a finite number of rounds.

4. CONCLUDING REMARKS

We conclude with a short summary that highlights the main contributions of the current paper. We proposed the (G)DDT procedure, which finds Walrasian equilibria in environments containing complements that lie beyond those that could be handled before. The essential feature of the proposed procedure is that it adjusts the prices of items in one set upward but those of items in the other set downward. The GSI property plays a crucial role in establishing these procedures. In this paper, however, we did not address the strategic issue: when confronting a price mechanism, is it a best policy for every buyer to reveal his demand truthfully? More specifically, does sincere bidding constitute a Nash equilibrium (or one of its variants) of the game induced by the mechanism? If that is the case, the mechanism is said to be strategy-proof. In Sun and Yang (2008), based upon the current GDDT procedure, we developed a strategy-proof dynamic auction for GSC environments and proved that sincere bidding is an ex post perfect equilibrium of the game induced by the auction mechanism.

APPENDIX

We first introduce two auxiliary lemmas. Lemma 5 gives a different formulation of the GSC condition in terms of multiple price increases for the goods in
one set and multiple price decreases for the goods in the other set, as opposed to a single price increase for one good. The original definition of GSC has the advantage of simplicity and is easy to use in checking whether a utility function has the GSC property, whereas this alternative formulation shows rich properties of GSC and will be handy for proving various results. Lemma 6 provides an alternative form of Definition 3 by relaxing the strict inequality v^i(A, p) < v^i(B, p) to v^i(A, p) ≤ v^i(B, p).

LEMMA 5: A value function u^i : 2^N → R satisfies the GSC condition if and only if for any price vectors p, q ∈ R^n with q_k ≥ p_k for all β_k ∈ S_j for j = 1 or 2 and q_l ≤ p_l for all β_l ∈ S_j^c, and for any bundle A ∈ D^i(p), there exists a bundle B ∈ D^i(q) such that

   {β_k | β_k ∈ A ∩ S_j and q_k = p_k} ⊆ B   and   {β_l | β_l ∈ A^c ∩ S_j^c and q_l = p_l} ⊆ B^c.
LEMMA 6: A value function u^i : 2^N → R satisfies the GSI condition if and only if, for any price vector p ∈ R^n and any set A ∉ D^i(p), there exists another set B (≠ A) satisfying one of the conditions (i) and (ii) of Definition 3 and v^i(A, p) ≤ v^i(B, p).

Because the proof of Theorem 1 is quite involved, it will be helpful to give its road map in advance. For the necessity part, we first pick any price vector p and fix any bundle A ∉ D^i(p). Then we vary the prices and use the boundedness of the value function and the GSC condition to construct a new bundle B which is at least as good as A given the new prices and satisfies condition (i) or (ii) of Definition 3. Our construction will then imply that B is at least as good as A given the original prices. For the sufficiency part, we mainly use monotonicity and continuity of the indirect utility function.

PROOF OF THEOREM 1: We first prove that GSC implies GSI. Pick any p ∈ R^n, fix any A ∉ D^i(p), and fix C ∈ D^i(p). It follows from the boundedness of the value function that for the given p, there exists a large real number M* such that for any q ∈ R^n, any T ∈ D^i(q), and any β_k ∈ N, q_k ≥ p_k + M* implies β_k ∉ T, and q_k ≤ p_k − M* implies β_k ∈ T. Using M*, define p̂ = p + M* e(A^c ∩ C^c). Then it still holds that A ∉ D^i(p̂) and C ∈ D^i(p̂). Because C ≠ A, there are two possibilities: Case I, C \ A ≠ ∅, and Case II, C \ A = ∅ and A \ C ≠ ∅. Now we discuss them in detail. In Case I (i.e., C \ A ≠ ∅), choose an item β_k ∈ C \ A and assume β_k ∈ S_j for some j = 1 or 2. Let p̆ = p̂ + M* e([C \ (A ∪ {β_k})] ∩ S_j) − M* e(A ∩ S_j^c). Note that when p̂ changes to p̆, the price of item β_k does not change. Then, with regard to C ∈ D^i(p̂) and β_k ∈ C ∩ S_j, it follows from the GSC condition and Lemma 5 that there exists a set C̄ ∈ D^i(p̆) such that β_k ∈ C̄. Clearly, {β_k} ⊆
(C¯ \ A) ∩ Sj . Meanwhile, observe that p˘ = p + M ∗ e(Ac \ [(C ∩ Sjc ) ∪ {βk }]) − ˘ we have M ∗ e(A ∩ Sjc ). Then, by the definition of M ∗ and the construction of p, c ¯ ¯ ¯ (C \ A) ∩ Sj ⊆ {βk } and A ∩ Sj ⊆ C. In summary, it yields (C \ A) ∩ Sj = {βk } ¯ and A ∩ Sjc ⊆ C. In Subcase I(a) in which (C¯ \A)∩Sjc = ∅, select an item βh ∈ (C¯ \A)∩Sjc . Let p˜ = p˘ + M ∗ e([C \ (A ∪ {βh })] ∩ Sjc ) − M ∗ e(A ∩ Sj ). Note that when p˘ changes ˜ the price of item βh does not change. Then, with regard to C¯ ∈ Di (p) ˘ to p, and βh ∈ C¯ ∩ Sjc , it follows from the GSC condition and Lemma 5 that there ˜ such that βh ∈ B. Observe that p˜ = p + M ∗ e(Ac \ exists a bundle B ∈ Di (p) ∗ {βk βh }) − M e(A). Then the definition of M ∗ and the construction of p˜ imply that A ⊆ B and B \ A ⊆ {βk βh }. Thus we have A \ B = ∅ and B \ A = {βh βk } or {βh }. Namely, the set B satisfies the condition (i) or (ii) of Definition 3. ¯ ∩ Sj = ∅, choose an In Subcase I(b) in which (C¯ \ A) ∩ Sjc = ∅ and (A \ C) ∗ c ¯ ∩ Sj . Let p˜ = p˘ + M e((C \ A) ∩ S ) − M ∗ e((A ∩ Sj ) \ {βh }). item βh ∈ (A \ C) j ˜ the price of item βh does not change. Then, Note that when p˘ changes to p, ¯ it follows from the GSC condition ˘ and βh ∈ Sj \ C, with regard to C¯ ∈ Di (p) ˜ such that βh ∈ and Lemma 5 that there exists a set B ∈ Di (p) / B. Next, observe that p˜ = p + M ∗ e(Ac \ {βk }) − M ∗ e(A \ {βh }). Then the definition of M ∗ and the construction of p˜ imply that A \ B ⊆ {βh } and B \ A ⊆ {βk }. Therefore we have A \ B = {βh } and B \ A = {βk } or ∅. This shows that the set B satisfies the condition (i) or (ii) of Definition 3. ¯ Then B satisfies In Subcase I(c) in which C¯ = A ∪ {βk }, let p˜ = p˘ and B = C. the condition (i) of Definition 3. In Case II (i.e., C ⊆ A and A\C = ∅), choose an item βk ∈ A\C and assume βk ∈ Sj for some j = 1 or 2. Let p˘ = pˆ − M ∗ e((A \ {βk }) ∩ Sj ). Note that when ˘ the price of item βk does not change. Then, with regard to pˆ changes to p, ˆ and βk ∈ Sj \ C, it follows from the GSC condition and Lemma 5 C ∈ Di (p) ¯ Meanwhile, note that p˘ = ˘ such that βk ∈ that there exists a set C¯ ∈ Di (p) / C. ∗ c ∗ p + M e(A ) − M e((A ∩ Sj ) \ {βk }). Then, by the definition of M ∗ and the ¯ ∩ Sj ⊆ {βk }. Consequently, it ˘ we have C¯ ⊆ A and (A \ C) construction of p, ¯ ∩ Sj = {βk }. leads to C¯ ⊆ A and (A \ C) ¯ ∩ S c = ∅, choose an item βh ∈ (A \ C) ¯ ∩ Sc . In Subcase II(a) in which (A \ C) j j ∗ c ˜ the price Let p˜ = p˘ − M e((A \ {βh }) ∩ Sj ). Note that when p˘ changes to p, ¯ ˘ and βh ∈ Sjc \ C, of item βh does not change. Then, with regard to C¯ ∈ Di (p) it follows from the GSC condition and Lemma 5 that there exists a bundle B ∈ ˜ such that βh ∈ Di (p) / B. Next, note that p˜ = p + M ∗ e(Ac ) − M ∗ e(A \ {βk βh }). Then the definition of M ∗ and the construction of p˜ imply that B ⊆ A and A \ B ⊆ {βh βk }. Therefore, we have B ⊆ A, and A \ B = {βh βk } or {βh }. Thus the set B satisfies the condition (i) or (ii) of Definition 3. ¯ Then B satisfies In Subcase II(b) in which C¯ = A \ {βk }, let p˜ = p˘ and B = C. the condition (i) of Definition 3.
By summing up all the above cases, we conclude that there always exist a ˜ (B = A) that satisfy one of the conditions price vector p˜ and a set B ∈ Di (p) ˜ that vi (B p) ˜ = (i) and (ii) of Definition 3. Next, it follows from B ∈ Di (p) i i ˜ ≥ v (A p). ˜ The construction of p˜ implies that p([A ˜ \ B] ∪ [B \ A]) = V (p) ˜ − p([A \ B] ∪ [B \ A]). As a result, we have vi (B p) − vi (A p) = vi (B p) ˜ ≥ 0, and by Lemma 6 we see immediately that the GSI condition is vi (A p) satisfied. It remains to show that GSI implies GSC. Choose any price vector p ∈ Rn , βk ∈ Sj for some j = 1 or 2, δ ≥ 0, and A ∈ Di (p). It is clear that if βk ∈ / A, then A ∈ Di (p + δe(k)). If we choose B = A, then the GSC condition is immediately satisfied. Now we assume that βk ∈ A. Let δ∗ = V i (p) − V i (p + δe(k)). Then we have 0 ≤ δ∗ ≤ δ, A ∈ Di (p + εe(k)), and V i (p + εe(k)) = V i (p) − ε for all ε ∈ [0 δ∗ ]. We need to consider two separate cases. First, if δ∗ = δ, then we have A ∈ Di (p + δe(k)) and we can choose B = A. Clearly, the GSC condition is satisfied. In the rest, we deal with the case of δ∗ < δ. In this case we have V i (p + εe(k)) = V i (p + δ∗ e(k)) and A ∈ / Di (p + εe(k)) for all ε > δ∗ . In i particular, we have A ∈ / D (p + δe(k)). Now let {δν } be any sequence of positive real numbers which converges to 0. Since A ∈ / Di (p + (δ∗ + δν )e(k)), it follows from the GSI condition that there exists a GSI improvement set Bν of A such that vi (Bν p + (δ∗ + δν )e(k)) > vi (A p + (δ∗ + δν )e(k)). Notice that βk does not belong to any such GSI improvement set Bν . Suppose that this statement is false. Then for some ν we would have vi (Bν p + δ∗ e(k)) − δν = vi (Bν p + (δ∗ + δν )e(k)) > vi (A p + (δ∗ + δν )e(k)) = vi (A p + δ∗ e(k)) − δν . This leads to vi (Bν p + δ∗ e(k)) > vi (A p + δ∗ e(k)) = V i (p + δ∗ e(k)), yielding a contradiction. Meanwhile, since the number of sets Bν is finite, without loss of generality we can assume that there exists a positive integer ν ∗ such that Bν = B for all ν ≥ ν ∗ . Then by the continuity of net utility function vi (B ·), we have vi (B p+δ∗ e(k)) = vi (A p+δ∗ e(k)) = V i (p+δ∗ e(k)) = V i (p+δe(k)). In addition, since βk ∈ / B, we have vi (B p + δe(k)) = vi (B p + δ∗ e(k)) = i V (p + δe(k)). This implies B ∈ Di (p + δe(k)). Furthermore, since B is a / B, it satisfies either (i) A ∩ Sjc = B ∩ Sjc , GSI improvement set of A and βk ∈ (A \ B) ∩ Sj = {βk }, and [(B \ A) ∩ Sj ] ≤ 1 or (ii) B ⊆ A, (A \ B) ∩ Sj = {βk }, and [(A \ B) ∩ Sjc ] = 1. This concludes that [A ∩ Sj ] \ {βk } ⊆ B and Ac ∩ Sjc ⊆ Bc , and thus the GSC condition is satisfied. Q.E.D. PROOF OF THEOREM 3: By Theorem 3.1 of Sun and Yang (2006b) the model has an equilibrium. Then by Lemma 1 the set of equilibria is equal to the set of minimizers of the Lyapunov function L. Let Λ = arg min{L(p)|p ∈ Rn } It follows from Theorem 2 and the remark after formula (1) that the Lyapunov function L is a continuous, convex, and generalized submodular function. Now we prove statement (ii). We first show that Λ is a generalized lattice. Take any p q ∈ Λ. So we have L(p) = L(q) = R(N), where R(N) is the market value. Clearly, R(N) ≤ L(p ∧g q) ≤ L(p) + L(q) − L(p ∨g q) ≤ 2R(N) − R(N) = R(N). This shows that L(p ∧g q) = L(p ∨g q) = R(N) and p ∧g q p ∨g q ∈ Λ.
So the set Λ is a nonempty, convex, and generalized lattice. Clearly, Λ is also compact. Next, we prove that Λ is also an integrally convex set. Suppose the statement is false. Define A = {p ∈ Λ|p ∈ / co(Λ ∩ N(p))} where N(p) = {z ∈ Zn |z − p∞ < 1}. Then A is a nonempty subset of Λ. Observe that p ∈ co(Λ ∩ N(p)) for every p ∈ Λ ∩ Zn because N(p) = {p} for every p ∈ Zn , and so A ∩ Zn = ∅. Let p∗ ∈ A be a vector that has at least as many integral coordinates as any other vector in A. Thus, the number of integral coordinates of p∗ is the largest among all vectors in A. Since co(N(p∗ )) is a hypercube, it is a generalized lattice. Let q∗ be the generalized smallest element of co(N(p∗ )). Obviously, q∗ ∈ Zn , q∗ = p∗ , and qh∗ = p∗h whenever p∗h is an integer. Let δ∗ = p∗ − q∗ . Clearly, δ∗h = 0 whenever p∗h is an integer. Then δ∗ ∈ (defined before Lemma 2), δ∗ ∈ / Zn , and 0 < δ∗ ∞ < 1. Define ¯λ = 1/δ∗ ∞ > 1. By Lemma 2 we know8 that L(q∗ + λδ∗ ) is linear in λ on ¯ Recall that p∗ is a minimizer of the Lyapunov function L. the interval [0 λ]. ∗ ¯ ∗ ), Thus, if q ∈ / Λ, that is, L(q∗ ) > L(p∗ ) = L(q∗ + δ∗ ), then L(p∗ ) > L(q∗ + λδ ∗ yielding a contradiction. We now consider the case where q ∈ Λ, that is, L(q∗ ) = L(p∗ ) = L(q∗ + δ∗ ). Then it follows from the linearity of L in λ that ¯ ∗ ); that is, q∗ + λδ ¯ ∗ ∈ Λ. By the construction of λ, ¯ q∗ + λδ ¯ ∗ L(p∗ ) = L(q∗ + λδ ∗ ∗ has more integral coordinates than p . Therefore, by the choice of p , we see ¯ ∗ ∈ Λ \ A. That is, q∗ + λδ ¯ ∗ ∈ co(Λ ∩ N(q∗ + λδ ¯ ∗ )). Moreover, that q∗ + λδ ∗ ∗ ∗ ∗ ∗ ¯ observe that N(q + λδ ) ⊆ N(p ), q ∈ co(Λ ∩ N(p )), and p∗ is a convex ¯ ∗ . As a result, we have p∗ ∈ co(Λ ∩ N(p∗ )), combination of q∗ and q∗ + λδ contradicting the hypothesis that p∗ ∈ A. Finally, by definition, we know that every vertex of an integrally convex set is an integral vector and thus every vertex of Λ must be integral as well. So all the vertices of Λ, including the generalized smallest and largest equilibrium price ¯ are integral vectors. Furthermore, since the set Λ is bounded, vectors p and p, Λ has a finite number of vertices. Clearly, Λ is an integral polyhedron. Q.E.D. We extend and modify the arguments of Propositions 2 and 5 of Ausubel (2006) under the GS condition to prove the following two lemmas under the GSC condition. PROOF OF LEMMA 2: Assume by way of contradiction that there exists λ > 0 such that 0 ≤ λδk ≤ 1 for any βk ∈ S1 and −1 ≤ λδl ≤ 0 for any βl ∈ S2 but S˜ i ∈ / Di (p + λδ). By the GSI property, for S˜ i there exists a GSI improvement bundle A with vi (A p + λδ) > vi (S˜ i p + λδ). By the construction of S˜ i , we see that vi (S˜ i p + λδ) ≥ vi (C p + λδ) for all C ∈ Di (p) and hence A ∈ / Di (p). n i i ˜i Then it follows from Assumption A1 and p ∈ Z that v (A p) ≤ v (S p) − 1 On the other hand, since 0 ≤ λδk ≤ 1 for any βk ∈ S1 and −1 ≤ λδl ≤ 0 8
Note that Lemma 2 and its proof are independent of the current theorem and its proof.
for any β_l ∈ S_2, and A is a GSI improvement bundle of S̃^i, we must have |Σ_{β_h∈S̃^i} λδ_h − Σ_{β_h∈A} λδ_h| ≤ 1 and thus Σ_{β_h∈S̃^i} λδ_h − 1 ≤ Σ_{β_h∈A} λδ_h. The previous two inequalities imply that v^i(A, p + λδ) ≤ v^i(S̃^i, p + λδ), yielding a contradiction. To prove the second part, observe that the above result S̃^i ∈ D^i(p + λδ) implies L(p + λδ) = L(p) + λ(Σ_{β_h∈N} δ_h − Σ_{i∈I} Σ_{β_h∈S̃^i} δ_h) for all λ ≥ 0 such that 0 ≤ λδ_k ≤ 1 for any β_k ∈ S_1 and −1 ≤ λδ_l ≤ 0 for any β_l ∈ S_2. So L is linear in λ. Q.E.D.

PROOF OF LEMMA 4: (i) Suppose to the contrary that in the DDT procedure there exists a price vector p(t) such that p(t) ≤g p but p(t + 1) ≰g p. Then we have p(t) ∧g p = p(t), but

(9)   p(t) ≤g p(t + 1) ∧g p ≤g p(t + 1)   and   p(t + 1) ∧g p ≠ p(t + 1).
On the other hand, recall from Lemma 1 that since p is the smallest equilibrium price vector on the order of ≤g, it minimizes L(·) and so L(p) ≤ L(p(t + 1) ∨g p). Since L(·) is a generalized submodular function by Theorem 3(i), we have L(p(t + 1) ∨g p) + L(p(t + 1) ∧g p) ≤ L(p(t + 1)) + L(p). Adding the previous inequalities leads to L(p(t + 1) ∧g p) ≤ L(p(t + 1)). By the construction of p(t + 1), this implies that L(p(t + 1) ∧g p) = L(p(t + 1)) and so p(t + 1) ≤g p(t + 1) ∧g p, contradicting inequality (9).

(ii) Suppose to the contrary that there exists a price vector p(t) such that p(t + 1) = p(t) but p(t) ≱g p. Then p(t) ∧g p is less than p in at least one component on the order of ≤g. Since p is the smallest equilibrium price vector on the order of ≤g, we know that p(t) ∧g p is not an equilibrium price vector of the market model. Applying Lemma 1, this implies that L(p) < L(p(t) ∧g p). Because L(·) is a generalized submodular function, we also have that L(p(t) ∨g p) + L(p(t) ∧g p) ≤ L(p(t)) + L(p). Adding the previous inequalities implies that L(p(t) ∨g p) < L(p(t)). Because p(t) ∨g p ≥g p(t) and p(t) ∨g p ≠ p(t), there exists p′, a strict convex combination of p(t) and p(t) ∨g p, such that p′ ∈ {p(t)} + □ and L(p′) < L(p(t)), due to the convexity of L(·) by Theorem 3(i) and the previous strict inequality. By Lemma 3, we know that L(p(t) + δ(t)) < L(p(t)) and hence p(t + 1) ≠ p(t), contradicting the hypothesis. Q.E.D.

PROOF OF THEOREM 5: By Theorem 3(ii), the market has a Walrasian equilibrium, and by Lemma 1, the Lyapunov function L(·) attains its minimum value at any equilibrium price vector and is bounded from below. Since the prices and value functions take only integer values, the Lyapunov function is an integer-valued function and it decreases by a positive integer value in each
round of the GDDT procedure. This guarantees that the procedure terminates in finitely many rounds, that is, δ(t*) = 0 in Step 3 for some t* ∈ Z_+. Let p(0), p(1), ..., p(t*) be the generated finite sequence of price vectors. Let t̄ ∈ Z_+ be the time when the GDDT procedure finds δ(t̄) = 0 in Step 2. We claim that L(p) ≥ L(p(t̄)) for all p ≥g p(t̄). Suppose to the contrary that there exists some p ≥g p(t̄) such that L(p) < L(p(t̄)). By the convexity of L(·) via Theorem 3(i), there is a strict convex combination p′ of p and p(t̄) such that p′ ∈ {p(t̄)} + □ and L(p′) < L(p(t̄)). Lemma 3 and Step 2 of the GDDT procedure imply that L(p(t̄) + δ(t̄)) = min_{δ∈□} L(p(t̄) + δ) = min_{δ∈Δ} L(p(t̄) + δ) ≤ L(p′) < L(p(t̄)) and so δ(t̄) ≠ 0, yielding a contradiction. Therefore, we have L(p ∨g p(t̄)) ≥ L(p(t̄)) for all p ∈ R^n, because p ∨g p(t̄) ≥g p(t̄) for all p ∈ R^n. We will further show that L(p ∨g p(t)) ≥ L(p(t)) for all t = t̄ + 1, t̄ + 2, ..., t* and p ∈ R^n. By induction, it suffices to prove the case of t = t̄ + 1. Notice that p(t̄ + 1) = p(t̄) + δ(t̄), where δ(t̄) ∈ Δ* is determined in Step 3 of the GDDT procedure. Suppose to the contrary that there is p ∈ R^n such that L(p ∨g p(t̄ + 1)) < L(p(t̄ + 1)). Then if we start the GDDT procedure from p(t̄ + 1), by the same argument as before we can find a δ (≠ 0) ∈ Δ in Step 2 such that L(p(t̄ + 1) + δ) < L(p(t̄ + 1)). Since L(·) is a generalized submodular function, we have L(p(t̄) ∨g (p(t̄ + 1) + δ)) + L(p(t̄) ∧g (p(t̄ + 1) + δ)) ≤ L(p(t̄)) + L(p(t̄ + 1) + δ). Recall that L(p(t̄) ∨g (p(t̄ + 1) + δ)) ≥ L(p(t̄)). It follows that L(p(t̄) ∧g (p(t̄ + 1) + δ)) ≤ L(p(t̄ + 1) + δ) < L(p(t̄ + 1)). Observe that δ′ = 0 ∧g (δ(t̄) + δ) ∈ Δ* and p(t̄) ∧g (p(t̄ + 1) + δ) = p(t̄) + δ′. This yields L(p(t̄) + δ′) < L(p(t̄) + δ(t̄)), contradicting the fact that L(p(t̄) + δ(t̄)) = min_{δ∈Δ*} L(p(t̄) + δ) = min_{δ∈□*} L(p(t̄) + δ), where □* = −□. By the symmetry between Step 2 and Step 3, similarly we can also show that L(p ∧g p(t*)) ≥ L(p(t*)) for all p ∈ R^n (see a detailed proof for this statement in the Supplemental Material). We proved above that L(p ∨g p(t*)) ≥ L(p(t*)) for all p ∈ R^n. Since L(·) is a generalized submodular function, we have L(p) + L(p(t*)) ≥ L(p ∨g p(t*)) + L(p ∧g p(t*)) ≥ 2L(p(t*)) for all p ∈ R^n. This shows that L(p(t*)) ≤ L(p) holds for all p ∈ R^n and, by Lemma 1, p(t*) is an equilibrium price vector. Q.E.D.

REFERENCES
ARROW, K. J., AND F. H. HAHN (1971): General Competitive Analysis. San Francisco: Holden-Day. [939] ARROW, K. J., AND L. HURWICZ (1958): “On the Stability of the Competitive Equilibrium, I,” Econometrica, 26, 522–552. [933] AUSUBEL, L. (2004): “An Efficient Ascending-Bid Auction for Multiple Objects,” American Economic Review, 94, 1452–1475. [937] (2005): “Walrasian Tatonnement for Discrete Goods,” Preprint. [939] (2006): “An Efficient Dynamic Auction for Heterogeneous Commodities,” American Economic Review, 96, 602–629. [934,935,937-939,943,944,948] AUSUBEL, L., AND P. MILGROM (2002): “Ascending Auctions With Package Bidding,” Frontiers of Theoretical Economics, 1, Article 1. [938]
CRAWFORD, V. P. (2008): “The Flexible-Salary Match: A Proposal to Increase the Salary Flexibility of the National Resident Matching Program,” Journal of Economic Behavior and Organization, 66, 149–160. [937] CRAWFORD, V. P., AND E. M. KNOER (1981): “Job Matching With Heterogeneous Firms and Workers,” Econometrica, 49, 437–450. [934] DEMANGE, D., D. GALE, AND M. SOTOMAYOR (1986): “Multi-Item Auctions,” Journal of Political Economy, 94, 863–872. [934,943] FAVATI, P., AND F. TARDELLA (1990): “Convexity in Nonlinear Integer Programming,” Ricerca Operativa, 53, 3–44. [939] GALE, D., AND L. SHAPLEY (1962): “College Admissions and the Stability of Marriage,” American Mathematical Monthly, 69, 9–15. [934] GUL, F., AND E. STACCHETTI (1999): “Walrasian Equilibrium With Gross Substitutes,” Journal of Economic Theory, 87, 95–124. [935,937,938] (2000): “The English Auction With Differentiated Commodities,” Journal of Economic Theory, 92, 66–95. [934,935,937,938,940,943] JEHIEL, P., AND B. MOLDOVANU (2003): “An Economic Perspective on Auctions,” Economic Policy, 18, 269–308. [934] KELSO, A., AND V. P. CRAWFORD (1982): “Job Matching, Coalition Formation, and Gross Substitutes,” Econometrica, 50, 1483–1504. [933,937] KLEMPERER, P. (2004): Auctions: Theory and Practice. Princeton, NJ: Princeton University Press, 269–282. [934] KRISHNA, V. (2002): Auction Theory. New York: Academic Press. [934] MASKIN, E. (2005): “Recent Contributions to Mechanism Design: A Highly Selective Review,” Preprint. [934] MILGROM, P. (2000): “Putting Auction Theory to Work: The Simultaneous Ascending Auction,” Journal of Political Economy, 108, 245–272. [934,937] (2004): Putting Auction Theory to Work. New York: Cambridge University Press. [937] (2007): “Package Auctions and Exchanges,” Econometrica, 75, 935–965. [934] OSTROVSKY, M. (2008): “Stability in Supply Chain Network,” American Economic Review, 98, 897–923. [934] PERRY, M., AND P. RENY (2005): “An Efficient Multi-Unit Ascending Auction,” Review of Economic Studies, 72, 567–592. [937] RASSENTI, S., V. SMITH, AND R. BULFIN (1982): “A Combinatorial Auction Mechanism for Airport Time Slot Allocation,” The Bell Journal of Economics, 13, 402–417. [934] SAMUELSON, P. A. (1941): “The Stability of Equilibrium: Comparative Statics and Dynamics,” Econometrica, 9, 97–120. [933] (1974): “Complementarity,” Journal of Economic Literature, 12, 1255–1289. [934] SCARF, H. (1960): “Some Examples of Global Instability of the Competitive Equilibrium,” International Economic Review, 1, 157–172. [933] (1973): The Computation of Economic Equilibria. New Haven, CT: Yale University Press. [933] SHAPLEY, L. (1962): “Complements and Substitutes in the Optimal Assignment Problem,” Naval Research Logistics Quarterly, 9, 45–48. [934] SHAPLEY, L., AND H. SCARF (1974): “On Cores and Indivisibilities,” Journal of Mathematical Economics, 1, 23–37. [934] SUN, N., AND Z. YANG (2006a): “Double-Track Auction and Job Matching Mechanisms for Allocating Substitutes and Complements,” FBA Discussion Paper 241, Yokohama National University, Yokohama. [936] (2006b): “Equilibria and Indivisibilities: Gross Substitutes and Complements,” Econometrica, 74, 1385–1402. [934,936,947] (2008): “A Double-Track Auction for Substitutes and Complements,” FBA Discussion Paper 656, Institute of Economic Research, Kyoto University, Kyoto; available at http://www. kier.kyoto-u.ac.jp. [944]
(2009): “Supplement to ‘A Double-Track Adjustment Process for Discrete Markets With Substitutes and Complements’,” Econometrica Supplemental Material, 77, http://www. econometricsociety.org/ecta/Supmat/6514_proofs.pdf. [938] VARIAN, H. R. (1981): “Dynamic Systems With Applications to Economics,” in Handbook of Mathematical Economics, 1, ed. by K. J. Arrow and M. D. Intriligator. Amsterdam: NorthHolland. [939] YANG, Z. (1999): Computing Equilibria and Fixed Points. Boston: Kluwer. [939]
School of Economics, Shanghai University of Finance and Economics, Shanghai, China; [email protected] and Faculty of Business Administration, Yokohama National University, Yokohama, Japan; [email protected]. Manuscript received June, 2006; final revision received October, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 953–973
UNCONDITIONAL QUANTILE REGRESSIONS BY SERGIO FIRPO, NICOLE M. FORTIN, AND THOMAS LEMIEUX1 We propose a new regression method to evaluate the impact of changes in the distribution of the explanatory variables on quantiles of the unconditional (marginal) distribution of an outcome variable. The proposed method consists of running a regression of the (recentered) influence function (RIF) of the unconditional quantile on the explanatory variables. The influence function, a widely used tool in robust estimation, is easily computed for quantiles, as well as for other distributional statistics. Our approach, thus, can be readily generalized to other distributional statistics. KEYWORDS: Influence functions, unconditional quantile, RIF regressions, quantile regressions.
1. INTRODUCTION

IN THIS PAPER, we propose a new computationally simple regression method to estimate the impact of changing the distribution of explanatory variables, X, on the marginal quantiles of the outcome variable, Y, or other functionals of the marginal distribution of Y. The method consists of running a regression of a transformation—the (recentered) influence function defined below—of the outcome variable on the explanatory variables. To distinguish our approach from commonly used conditional quantile regressions (Koenker and Bassett (1978), Koenker (2005)), we call our regression method an unconditional quantile regression.2 Empirical researchers are often interested in changes in the quantiles, denoted q_τ, of the marginal (unconditional) distribution, F_Y(y). For example, we may want to estimate the direct effect dq_τ(p)/dp of increasing the proportion of unionized workers, p = Pr[X = 1], on the τth quantile of the distribution of wages, where X = 1 if the worker is unionized and X = 0 otherwise. In the case of the mean μ, the coefficient β of a standard regression of Y on X is a measure of the impact of increasing the proportion of unionized
We thank the co-editor and three referees for helpful suggestions. We are also indebted to Joe Altonji, Richard Blundell, David Card, Vinicius Carrasco, Marcelo Fernandes, Chuan Goh, Jinyong Hahn, Joel Horowitz, Guido Imbens, Shakeeb Khan, Roger Koenker, Thierry Magnac, Ulrich Müller, Geert Ridder, Jean-Marc Robin, Hal White, and seminar participants at CESG2005, UCL, CAEN–UFC, UFMG, Econometrics in Rio 2006, PUC-Rio, IPEA-RJ, SBE Meetings 2006, Tilburg University, Tinbergen Institute, KU Leuven, ESTE-2007, Harvard–MIT Econometrics Seminar, Yale, Princeton, Vanderbilt, and Boston University for useful comments on earlier versions of the manuscript. Fortin and Lemieux thank SSHRC for financial support. Firpo thanks CNPq for financial support. Usual disclaimers apply. 2 The “unconditional quantiles” are the quantiles of the marginal distribution of the outcome variable Y . Using “marginal” instead of “unconditional” would be confusing, however, since we also use the word “marginal” to refer to the impact of small changes in covariates (marginal effects). © 2009 The Econometric Society
workers on the mean wage, dμ(p)/dp. As is well known, the same coefficient β can also be interpreted as an impact on the conditional mean.3 Unfortunately, the coefficient βτ from a single conditional quantile regression, βτ = FY−1 (τ|X = 1) − FY−1 (τ|X = 0), is generally different from dqτ (p)/dp = (Pr[Y > qτ |X = 1]− Pr[Y > qτ |X = 0])/fY (qτ ), the effect of increasing the proportion of unionized workers on the τth quantile of the unconditional distribution of Y .4 A new approach is therefore needed to provide practitioners with an easy way to compute dqτ (p)/dp, especially when X is not univariate and binary as in the above example. Our approach builds upon the concept of the influence function (IF), a widely used tool in the robust estimation of statistical or econometric models. As its name suggests, the influence function IF(Y ; ν FY ) of a distributional statistic ν(FY ) represents the influence of an individual observation on that distributional statistic. Adding back the statistic ν(FY ) to the influence function yields what we call the recentered influence function (RIF). One convenient feature of the RIF is that its expectation is equal to ν(FY ).5 Because influence functions can be computed for most distributional statistics, our method easily extends to other choices of ν beyond quantiles, such as the variance, the Gini coefficient, and other commonly used inequality measures.6 For the τth quantile, the influence function IF(Y ; qτ FY ) is known to be equal to (τ − 1{Y ≤ qτ })/fY (qτ ). As a result, RIF(Y ; qτ FY ) is simply equal to qτ + IF(Y ; qτ FY ). We call the conditional expectation of the RIF(Y ; ν FY ) modeled as a function of the explanatory variables, E[RIF(Y ; ν FY )|X] = mν (X), the RIF regression model.7 In the case of quantiles, E[RIF(Y ; qτ FY )|X] = mτ (X) can be viewed as an unconditional quantile regression. We show that the average derivative of the unconditional quantile regression, E[mτ (X)], corresponds to the marginal effect on the unconditional quantile of a small location shift in the distribution of covariates, holding everything else constant. Our proposed approach can be easily implemented as an ordinary least squares (OLS) regression. In the case of quantiles, the dependent variable in the regression is RIF(Y ; qτ FY ) = qτ + (τ − 1{Y ≤ qτ })/fY (qτ ). It is easily 3 The conditional mean interpretation is the wage change that a worker would expect when her union status changes from non-unionized to unionized, or β = E(Y |X = 1) − E(Y |X = 0). Since the unconditional mean is μ(p) = pE(Y |X = 1) + (1 − p)E(Y |X = 0), it follows that dμ(p)/dp = E(Y |X = 1) − E(Y |X = 0) = β. 4 The expression for dqτ (p)/dp is obtained by implicit differentiation applied to FY (qτ ) = p · (Pr[Y ≤ qτ |X = 1] − Pr[Y ≤ qτ |X = 0]) + Pr[Y ≤ qτ |X = 0]. 5 Such property is important in some situations, although for the marginal effects in which we are interested in this paper the recentering is not fundamental. In Firpo, Fortin, and Lemieux (2007b), the recentering is useful because it allows us to identify the intercept and perform Oaxaca-type decompositions at various quantiles. 6 See Firpo, Fortin, and Lemieux (2007b) for such regressions on the variance and Gini. 7 In the case of the mean, since the RIF is simply the outcome variable Y , a regression of RIF(Y ; μ) on X is the same as an OLS regression of Y on X.
computed by estimating the sample quantile qτ , estimating the density fY (qτ ) at that point qτ using kernel (or other) methods, and forming a dummy variable 1{Y ≤ qτ }, indicating whether the value of the outcome variable is below qτ . Then we can simply run an OLS regression of this new dependent variable on the covariates, although we suggest more sophisticated estimation methods in Section 3. We view our approach as an important complement to the literature concerned with the estimation of quantile functions. However, unlike Imbens and Newey (2009), Chesher (2003), and Florens, Heckman, Meghir, and Vytlacil (2008), who considered the identification of structural functions defined from conditional quantile restrictions in the presence of endogenous regressors, our approach is concerned solely with parameters that capture changes in unconditional quantiles in the presence of exogenous regressors. The structure of the paper is as follows. In the next section, we define the key object of interest, the “unconditional quantile partial effect” (UQPE) and show how RIF regressions for the quantile can be used to estimate the UQPE. We also link this parameter to the structural parameters of a general model and the conditional quantile partial effects (CQPE). The estimation issues are addressed in Section 3. Section 4 presents an empirical application of our method that illustrates well the difference between our method and conditional quantiles regressions. We conclude in Section 5.
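For illustration, the recipe just described can be sketched in a few lines of Python (the code and function names below are ours, not the authors' implementation; a Stata ado file for RIF-OLS is mentioned in footnote 17). The sketch assumes numpy is available and uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth for f_Y(q_τ); the bandwidth rule is a convenient illustrative choice rather than the paper's prescription.

    import numpy as np

    def rif_quantile(y, tau):
        # RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau),
        # with f_Y(q_tau) estimated by a Gaussian kernel density estimator.
        q_tau = np.quantile(y, tau)                      # sample quantile
        b = 1.06 * y.std(ddof=1) * len(y) ** (-0.2)      # rule-of-thumb bandwidth (illustrative)
        z = (y - q_tau) / b
        f_q = np.exp(-0.5 * z ** 2).sum() / (len(y) * b * np.sqrt(2.0 * np.pi))
        return q_tau + (tau - (y <= q_tau)) / f_q

    def rif_ols(y, X, tau):
        # Unconditional quantile regression: OLS of RIF(Y; q_tau) on X.
        rif = rif_quantile(np.asarray(y, dtype=float), tau)
        Z = np.column_stack([np.ones(len(rif)), np.asarray(X, dtype=float)])
        beta, *_ = np.linalg.lstsq(Z, rif, rcond=None)
        return beta                                      # intercept first, then slopes

    # Illustration on simulated data (not the CPS sample used in Section 4):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 2))
    y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=5000)
    print(rif_ols(y, X, tau=0.9))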
2. UNCONDITIONAL PARTIAL EFFECTS

2.1. General Concepts

We assume that Y is observed in the presence of covariates X, so that Y and X have a joint distribution, F_{Y,X}(·, ·) : R × X → [0, 1], and X ⊂ R^k is the support of X. By analogy with a standard regression coefficient, our object of interest is the effect of a small increase in the location of the distribution of the explanatory variable X on the τth quantile of the unconditional distribution of Y. We represent this small location shift in the distribution of X in terms of the counterfactual distribution G_X(x). By definition, the unconditional (marginal) distribution function of Y can be written as

(1)   F_Y(y) = ∫ F_{Y|X}(y|X = x) · dF_X(x).
Under the assumption that the conditional distribution FY |X (·) is unaffected by this small manipulation of the distribution of X, a counterfactual distribution
of Y, G*_Y, can be obtained by replacing F_X(x) with G_X(x)^8:

(2)   G*_Y(y) ≡ ∫ F_{Y|X}(y|X = x) · dG_X(x).

Our regression method builds on some elementary properties of the influence function, a measure introduced by Hampel (1968, 1974) to study the infinitesimal behavior of real-valued functionals ν(F_Y), where ν : F_ν → R, and where F_ν is a class of distribution functions such that F_Y ∈ F_ν if |ν(F)| < +∞. Let G_Y be another distribution in the same class. Let F_Y^{t·G_Y} ∈ F_ν represent the mixing distribution, which is t away from F_Y in the direction of the probability distribution G_Y: F_Y^{t·G_Y} = (1 − t) · F_Y + t · G_Y = t · (G_Y − F_Y) + F_Y, where 0 ≤ t ≤ 1. The directional derivative of ν in the direction of the distribution G_Y can be written as

(3)   ∂ν(F_Y^{t·G_Y})/∂t |_{t=0} = lim_{t↓0} [ν(F_Y^{t·G_Y}) − ν(F_Y)]/t = ∫ IF(y; ν, F_Y) · d(G_Y − F_Y)(y),

where IF(y; ν, F_Y) = ∂ν(F_Y^{t·Δ_y})/∂t|_{t=0}, with Δ_y denoting the probability measure that puts mass 1 at the value y. The von Mises (1947) linear approximation of the functional ν(F_Y^{t·G_Y}) is

   ν(F_Y^{t·G_Y}) = ν(F_Y) + t · ∫ IF(y; ν, F_Y) · d(G_Y − F_Y)(y) + r(t; ν; G_Y, F_Y),

where r(t; ν; G_Y, F_Y) is a remainder term. We define the recentered influence function (RIF) more formally as the leading terms of the above expansion for the particular case where G_Y = Δ_y and t = 1. Since ∫ IF(y; ν, F_Y) · dF_Y(y) = 0 by definition, it follows that

   RIF(y; ν, F_Y) = ν(F_Y) + ∫ IF(s; ν, F_Y) · dΔ_y(s) = ν(F_Y) + IF(y; ν, F_Y).

Finally, note that the last equality in equation (3) also holds for RIF(y; ν, F_Y). In the presence of covariates X, we can use the law of iterated expectations to express ν(F_Y) in terms of the conditional expectation of RIF(y; ν, F_Y)

8 Instead of assuming a constant conditional distribution F_{Y|X}(·|·), we could allow the conditional distributions to vary as long as they converge as the marginal distributions of X converge to one another.
given X:

(4)   ν(F_Y) = ∫ RIF(y; ν, F_Y) · dF_Y(y)
             = ∫ [ ∫ RIF(y; ν, F_Y) · dF_{Y|X}(y|X = x) ] · dF_X(x)
             = ∫ E[RIF(Y; ν, F_Y)|X = x] · dF_X(x),
where the first equality follows from the fact that the influence function integrates to zero, and the second equality comes from substituting in equation (1). Equation (4) shows that when we are interested in the impact of covariates on a specific distributional statistic ν(F_Y) such as a quantile, we simply need to integrate over E[RIF(Y; ν, F_Y)|X], which is easily done using regression methods. By contrast, in equation (1) we need to integrate over the whole conditional distribution F_{Y|X}(y|X = x), which is, in general, more difficult to estimate.9

We now state our main result on how the impact of a marginal change in the distribution of X on ν(F_Y) can be obtained using the conditional expectation of the RIF(Y; ν, F_Y). Note that all proofs are provided in the Appendix.

THEOREM 1—Marginal Effect of a Change in the Distribution of X: Suppose we can induce a small perturbation in the distribution of covariates, from F_X in the direction of G_X, maintaining the conditional distribution of Y given X unaffected. The marginal effect of this distributional change on the functional ν(F_Y) is given by integrating up the conditional expectation of the (recentered) influence function with respect to the changes in distribution of the covariates d(G_X − F_X):

   ∂ν(F_Y^{t·G*_Y})/∂t |_{t=0} = ∫ E[RIF(Y; ν, F_Y)|X = x] · d(G_X − F_X)(x),

where F_Y^{t·G*_Y} = (1 − t) · F_Y + t · G*_Y.

We next consider a particular change, a small location shift t, in the distribution of covariates X. Let X_j be a continuous covariate in the vector X, where 1 ≤ j ≤ k.

9 Most other approaches, such as the conditional quantile regression method of Machado and Mata (2005), have essentially proposed to estimate and integrate the whole conditional distribution, F_{Y|X}(y|X = x), over a new distribution G_X of X to obtain the counterfactual unconditional distribution of Y. See also Albrecht, Björklund, and Vroman (2003) and Melly (2005). By contrast, we show in Section 3 that our approach requires estimating the conditional distribution F_{Y|X}(y|X = x) = Pr[Y > y|X = x] only at one point of the distribution. Note that these approaches do not generate a marginal effect parameter, but instead a total effect of changes in the distribution of X on selected features (e.g., quantiles) of the unconditional distribution of Y.
The new distribution G_X will be the distribution of a random k × 1 vector Z, where Z_l = X_l for l ≠ j, l = 1, ..., k, and Z_j = X_j + t. In this special case, let α_j(ν) denote the partial effect of a small change in the distribution of covariates from F_X to G_X on the functional ν(F_Y). Collecting all j entries, we construct the k × 1 vector α(ν) = [α_j(ν)]_{j=1}^k. We can write the unconditional partial effect α(ν) as an average derivative.

COROLLARY 1—Unconditional Partial Effect: Assume that ∂X, the boundary of the support X of X, is such that if x ∈ ∂X, then f_X(x) = 0. Then the vector α(ν) of partial effects of small location shifts in the distribution of a continuous covariate X on ν(F_Y) can be written using the vector of average derivatives10

(5)   α(ν) = ∫ (dE[RIF(Y; ν)|X = x]/dx) · dF(x).

2.2. The Case of Quantiles

Turning to the specific case of quantiles, consider the τth quantile q_τ = ν_τ(F_Y) = inf_q {q : F_Y(q) ≥ τ}. It follows from the definition of the influence function that

   RIF(y; q_τ) = q_τ + IF(y; q_τ) = q_τ + (τ − 1{y ≤ q_τ})/f_Y(q_τ) = c_{1τ} · 1{y > q_τ} + c_{2τ},

where c_{1τ} = 1/f_Y(q_τ), c_{2τ} = q_τ − c_{1τ} · (1 − τ), and f_Y(q_τ) is the density of Y evaluated at q_τ. Thus

   E[RIF(Y; q_τ)|X = x] = c_{1τ} · Pr[Y > q_τ|X = x] + c_{2τ}.

From equation (5), the unconditional partial effect, which we denote by α(τ) in the case of the τth quantile, simplifies to

(6)   α(τ) = ∂ν_τ(F_Y^{t·G*_Y})/∂t |_{t=0} = c_{1τ} · ∫ (d Pr[Y > q_τ|X = x]/dx) · dF_X(x),

where the last term is the average marginal effect from the probability response model Pr[Y > q_τ|X]. We call the parameter α(τ) = E[dE[RIF(Y, q_τ)|X]/dx] the unconditional quantile partial effect (UQPE), by analogy with the Wooldridge (2004) unconditional average partial effect (UAPE), which is defined as E[dE[Y|X]/dx].11

10 The expression dE[RIF(Y; ν)|X = x]/dx is the k-vector of partial derivatives [∂E[RIF(Y; ν)|X = x]/∂x_j]_{j=1}^k.
11 The UAPE is a special case of Corollary 1 for the mean (ν = μ), where α(μ) = E[dE[Y|X]/dx] since RIF(Y, μ) = y.
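To fix ideas, note that the quantile influence function has mean zero, so the recentering indeed returns the quantile on average: since Pr[Y ≤ q_τ] = τ,

   E[IF(Y; q_τ)] = (τ − Pr[Y ≤ q_τ])/f_Y(q_τ) = 0   and   E[RIF(Y; q_τ)] = q_τ.

Together with equation (4), this means that averaging the unconditional quantile regression function E[RIF(Y; q_τ)|X] over the distribution of X recovers q_τ itself, just as averaging E[Y|X] recovers the mean in the special case RIF(Y, μ) = Y noted in footnote 11.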
Our next result provides an interpretation of the UQPE in terms of a general structural model, Y = h(X, ε), where the unknown mapping h(·, ·) is invertible in the second argument, and ε is an unobservable determinant of the outcome variable Y. We also show that the UQPE can be written as a weighted average of a family of conditional quantile partial effects (CQPE), which is the effect of a small change of X on the conditional quantile of Y:

   CQPE(τ, x) = ∂Q_τ[h(X, ε)|X = x]/∂x = ∂h(x, Q_τ[ε])/∂x,
where Q_τ[Y|X = x] ≡ inf_q {q : F_{Y|X}(q|x) ≥ τ} is the conditional quantile operator. For the sake of simplicity and comparability between the CQPE and the UQPE, we consider the case where ε and X are independent. Thus, we can use the unconditional form for Q_τ[ε] in the last term of the above equation.12 In a linear model Y = h(X, ε) = X′β + ε, both the UQPE and the CQPE are trivially equal to the parameter β_j of the structural form for any quantile. While this specific result does not generalize beyond the linear model, useful connections can still be drawn between the UQPE and the underlying structural form, and between the UQPE and the CQPE. To establish these connections, we define three auxiliary functions. The first function, ω_τ : X → R_+, is a weighting function defined as the ratio between the conditional density given X = x and the unconditional density: ω_τ(x) ≡ f_{Y|X}(q_τ|x)/f_Y(q_τ). The second function, ε_τ : X → R, is the inverse function h^{-1}(·, q_τ), which shall exist under the assumption that h is strictly monotonic in ε. The third function, ζ_τ : X → (0, 1), is a “matching” function indicating where the unconditional quantile q_τ falls in the conditional distribution of Y: ζ_τ(x) ≡ {s : Q_s[Y|X = x] = q_τ} = F_{Y|X}(q_τ|X = x).

PROPOSITION 1—UQPE and the Structural Form: (i) Assuming that the structural form Y = h(X, ε) is strictly monotonic in ε and that X and ε are independent, the parameter UQPE(τ) will be

   UQPE(τ) = E[ω_τ(X) · ∂h(X, ε_τ(X))/∂x].

(ii) We can also represent UQPE(τ) as a weighted average of CQPE(ζ_τ(x), x):

   UQPE(τ) = E[ω_τ(X) · CQPE(ζ_τ(X), X)].

12 In this setting, the identification of the UQPE requires F_{ε|X} to be unaffected by changes in the distribution of covariates. The identification of the CQPE requires quantile independence between ε and X, that is, the τ-conditional quantile of ε given X equals the τ-unconditional quantile of ε. Independence between ε and X guarantees, therefore, that both the UQPE and the CQPE parameters are identified.
Result (i) of Proposition 1 shows formally that UQPE(τ) is equal to a weighted average (over the distribution of X) of the partial derivatives of the structural function. In the simple case of the linear model mentioned above, it follows that ∂h(X, ε_τ(X))/∂x = β and UQPE(τ) = β for all τ. More generally, the UQPE will typically depend on τ in nonlinear settings. For example, when h(X, ε) = h̃(X′β + ε), where h̃ is differentiable and strictly monotonic, simple algebra yields UQPE(τ) = β · h̃′(h̃^{-1}(q_τ)), which depends on τ. Finally, note that independence plays a crucial role here. If, instead, we had dropped the independence assumption between ε and X, we would not be able, even in a linear model, to express UQPE(τ) as a simple function of the structural parameter β.13 Result (ii) shows that UQPE(τ) is a weighted average (over the distribution of X) of a family of CQPE(ζ_τ(X), X) at ζ_τ(X), the conditional quantile corresponding to the τth unconditional quantile of the distribution of Y, q_τ. But while result (ii) of Proposition 1 provides a more structural interpretation of the UQPE, it is not practical from an estimation point of view as it would require estimating h and F_ε, the distribution of ε, using nonparametric methods. As shown below, we propose a simpler way to estimate the UQPE based on the estimation of average marginal effects.
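To see where the expression UQPE(τ) = β · h̃′(h̃^{-1}(q_τ)) comes from, suppose for concreteness that h̃ is strictly increasing and that ε has density f_ε. Then Pr[Y > q_τ|X = x] = Pr[ε > h̃^{-1}(q_τ) − xβ], so that

   d Pr[Y > q_τ|X = x]/dx = f_ε(h̃^{-1}(q_τ) − xβ) · β   and   f_Y(q_τ) = (h̃^{-1})′(q_τ) · ∫ f_ε(h̃^{-1}(q_τ) − xβ) dF_X(x).

Substituting both expressions into equation (6) yields

   UQPE(τ) = β · ∫ f_ε(h̃^{-1}(q_τ) − xβ) dF_X(x) / f_Y(q_τ) = β / (h̃^{-1})′(q_τ) = β · h̃′(h̃^{-1}(q_τ)),

so the UQPE varies with τ only through the derivative of h̃ evaluated at the corresponding quantile; detailed derivations for this and related examples appear in the working paper version cited in footnote 13.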
3. ESTIMATION

In this section, we discuss the estimation of UQPE(τ) using RIF regressions. Equation (6) shows that three components are involved in the estimation of UQPE(τ): the quantile q_τ, the density of the unconditional distribution of Y that appears in the constant c_{1τ} = 1/f_Y(q_τ), and the average marginal effect E[d Pr[Y > q_τ|X]/dX]. We discuss the estimation of each component in turn and then briefly address the asymptotic properties of related estimators.

The estimator of the τth population quantile of the marginal distribution of Y is q̂_τ, the usual τth sample quantile, which can be represented, as in Koenker and Bassett (1978), as

   q̂_τ = arg min_q Σ_{i=1}^N (τ − 1{Y_i − q ≤ 0}) · (Y_i − q).
13 These examples are worked in detail in the working paper version of this article. See Firpo, Fortin, and Lemieux (2007a).
We estimate the density of Y, f̂_Y(·), using the kernel density estimator14

   f̂_Y(q̂_τ) = (1/(N · b)) Σ_{i=1}^N K_Y((Y_i − q̂_τ)/b),
where K_Y(z) is a kernel function and b a positive scalar bandwidth. We suggest three estimation methods for the UQPE based on three ways, among many, to estimate the average marginal effect E[d Pr[Y > q_τ|X]/dX]. As discussed in Firpo, Fortin, and Lemieux (2009), the first two estimators will be consistent if we correctly impose functional form restrictions. The third estimator involves a fully nonparametric first stage and, therefore, will be consistent quite generally for the average derivative parameter. The first method estimates the average marginal effect E[d Pr[Y > q_τ|X]/dX] with an OLS regression, which provides consistent estimates if Pr[Y > q_τ|X = x] is linear in x. This method, which we call RIF-OLS, consists of regressing RIF(Y; q̂_τ) = ĉ_{1τ} · 1{Y > q̂_τ} + ĉ_{2τ} on X. The second method uses a logistic regression of 1{Y > q̂_τ} on X to estimate the average marginal effect, which is then multiplied by ĉ_{1τ}. Again, the average marginal effect from this logit model will be consistent if Pr[Y > q_τ|X = x] = Λ(xθ_τ), where Λ(·) is the cumulative distribution function (c.d.f.) of a logistic distribution and θ_τ is a vector of coefficients. We call this method RIF-Logit. In the empirical section, we use these two estimators and find that, in our application, they yield estimates very close to the fully nonparametric estimator. The last estimation method, called RIF-NP, is based on a nonparametric estimator that does not require any functional form assumption on Pr[Y > q_τ|X = x] to be consistent. We use the method discussed by Newey (1994) and estimate Pr[Y > q_τ|X = x] by polynomial series. As the object of interest is the average of d Pr[Y > q_τ|X = x]/dx, once we have a polynomial function that approximates the conditional probability, we can easily take derivatives of polynomials and average them. As shown by Stoker (1991) for the average derivative case and later formalized in a more general setting by Newey (1994), the choice of the nonparametric estimator for the derivative is not crucial in large samples. Averaging any regular nonparametric estimator with respect to X yields an estimator that converges at the usual parametric rate and has the same limiting distribution as other estimators based on different nonparametric methods.15

14 In the empirical section we propose using the Gaussian kernel. The requirements for the kernel and the bandwidth are described in Firpo, Fortin, and Lemieux (2009). We propose using the kernel density estimator, but other consistent estimators of the density could be used as well.
15 Nonparametric estimation of Pr[Y > q_τ|X = x] could also be performed by series approximation of the log-odds ratio, which would keep predictions between 0 and 1 (Hirano, Imbens, and Ridder (2003)). Note, however, that we are mainly interested in another object, the derivative d Pr[Y > q_τ|X = x]/dx, and imposing that the conditional probability lies in the unit interval does not necessarily add much structure to its derivative.
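As a concrete illustration of the RIF-Logit method (our own sketch, assuming numpy and statsmodels are available; it is not the authors' code), one can fit a logit of 1{Y > q̂_τ} on X, average the implied marginal effects over the sample, and rescale by ĉ_{1τ} = 1/f̂_Y(q̂_τ), with f̂_Y(q̂_τ) obtained from a kernel density estimator as above:

    import numpy as np
    import statsmodels.api as sm

    def uqpe_rif_logit(y, X, tau, f_q):
        # RIF-Logit estimate of UQPE(tau); f_q is an estimate of f_Y(q_tau).
        y = np.asarray(y, dtype=float)
        q_tau = np.quantile(y, tau)
        d = (y > q_tau).astype(float)               # dependent variable 1{Y > q_tau}
        Z = sm.add_constant(np.asarray(X, dtype=float))
        theta = sm.Logit(d, Z).fit(disp=0).params   # logit coefficients (constant first)
        p = 1.0 / (1.0 + np.exp(-Z @ theta))        # fitted probabilities Lambda(x'theta)
        ame = ((p * (1.0 - p))[:, None] * theta[None, 1:]).mean(axis=0)
        return ame / f_q                            # multiply by c_{1,tau} = 1/f_Y(q_tau)

The RIF-NP variant would instead approximate Pr[Y > q̂_τ|X = x] by a polynomial series, differentiate the fitted polynomial, and average the derivatives, as described in the text.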
We study the asymptotic properties of our estimators in detail in Firpo, Fortin, and Lemieux (2009), where we establish the limiting distributions of these estimators, discuss how to estimate their asymptotic variances, and show how to construct test statistics. Three important results from Firpo, Fortin, and Lemieux (2009) are summarized here. The first result is that the asymptotic linear expression of each one of the three estimators consists of three components. The first component is associated with uncertainty regarding the density; the second component is associated with the uncertainty regarding the population quantile; and the third component is associated with the average derivative term E[d Pr[Y > qτ |X]/dX]. The second result states that because the density is nonparametrically estimated by kernel methods, the rate of convergence of the three estimators will be dominated by this slower term. In Firpo, Fortin, and Lemieux (2009), we use a higher order expansion type of argument to allow for the quantile and the average derivative components to be explicitly included. By doing so, we can introduce a refinement in the expression of the asymptotic variance. Finally, the third result is that to test the null hypothesis that UQPE = 0, we do not need to estimate the density, as E[d Pr[Y > qτ |X]/dX] = 0 ⇔ UQPE = 0. Thus, we can use test statistics that converge at the parametric rate. In this case, the only components that contribute to the asymptotic variance are the quantile and the average derivative. As with standard average marginal effects, we can also estimate the UQPE for a dummy covariate by estimating E[Pr[Y > qτ |X = 1]] − E[Pr[Y > qτ |X = 0]] instead of E[d Pr[Y > qτ |X]/dX] using any of the three methods discussed above. Like in the example of union status mentioned in the Introduction, the UQPE in such cases represents the impact of a small change in the probability p = Pr[X = 1], instead of the small location shift for a continuous covariate considered in Section 2.16 4. EMPIRICAL APPLICATION In this section, we present an empirical application to illustrate how the unconditional quantile regressions work in practice using the three estimators discussed above.17 We also show how the results compare to standard (conditional) quantile regressions. Our application considers the direct effect of union status on male log wages, which is well known to be different at different points of the wage distribution.18 We use a large sample of 266,956 observa16
See Firpo, Fortin, and Lemieux (2007a) for more detail. A Stata ado file that implements the RIF-OLS estimator is available on the author’s website, http://www.econ.ubc.ca/nfortin/. 18 See, for example, Chamberlain (1994) and Card (1996). For simplicity, we maintain the assumption that union coverage status is exogenous. Studies that have used selection models or longitudinal methods to allow the union status to be endogenously determined (e.g., Lemieux (1998)) suggest that the exogeneity assumption only introduces small biases in the estimation. 17
tions on U.S. males from the 1983–1985 Outgoing Rotation Group (ORG) supplement of the Current Population Survey.19 Looking at the impact of union status on log wages illustrates well the difference between conditional and unconditional quantiles regressions. Consider, for example, the effect of union status estimated at the 90th and 10th quantiles. Finding that the effect of unions (for short) estimated using conditional quantile regressions is smaller at the 90th than at the 10th quantile simply means that unions reduce within-group dispersion, where the “group” consists of workers who share the same values of the covariates X (other than union status). This does not mean, however, that increasing the rate of unionization would reduce overall wage dispersion as measured by the difference between the 90th and the 10th quantiles of the unconditional wage dispersion. To answer this question we have to turn to unconditional quantile regressions. In addition to the within-group wage compression effect captured by conditional quantile regressions, unconditional quantile regressions also capture an inequality-enhancing between-group effect linked to the fact that unions increase the conditional mean of wages of union workers. This creates a wedge between otherwise comparable union and non-union workers.20 As a result, unions tend to increase wages for low wage quantiles where both the betweenand within-group effects go in the same direction, but can decrease wages for high wage quantiles where the between- and within-group effects go in opposite directions. As a benchmark, Table I reports the RIF-OLS estimated coefficients of the log wages model for the 10th, 50th, and 90th quantiles. The results (labeled UQR for unconditional quantile regressions) are compared with standard OLS (conditional mean) estimates and with standard (conditional) quantile regressions (CQR) at the corresponding quantiles. For the sake of comparability, we use simple linear specifications for all estimated models. We also show in Figure 1 how the estimated UQPE of unions changes when we use the RIF-Logit and RIF-NP methods instead. Interestingly, the UQPE of unions first increases from 0.198 at the 10th quantile to 0.349 at the median, before turning negative (−0.137) at the 90th quantile. These findings strongly confirm the point discussed above that unions 19 We start with 1983 because it is the first year in which the ORG supplement asked about union status. The dependent variable is the real log hourly wage for all wage and salary workers, and the explanatory variables include six education classes, married, non-white, and nine experience classes. The hourly wage is measured directly for workers paid by the hour and is obtained by dividing usual earnings by usual hours of work for other workers. Other data processing details can be found in Lemieux (2006). 20 In the case of the variance, it is easy to write down an analytical expression for the betweenand within-group effects (see, for example, Card, Lemieux, and Riddell (2004)) and find the conditions under which one effect dominates the other. It is much harder to ascertain, however, whether the between- or the within-group effect tends to dominate at different points of the wage distribution.
964
S. FIRPO, N. M. FORTIN, AND T. LEMIEUX TABLE I
COMPARING OLS, UNCONDITIONAL QUANTILE REGRESSIONS (UQR), AND CONDITIONAL QUANTILE REGRESSIONS (CQR); 1983–1985 CPS DATA FOR MENa 10th Centile
Union status Non-white Married Education Elementary HS dropout Some college College Post-graduate Constant
50th Centile
90th Centile
OLS
UQR
CQR
UQR
CQR
UQR
CQR
0179 (0002) −0134 (0003) 0140 (0002)
0195 (0002) −0116 (0005) 0195 (0004)
0288 (0003) −0139 (0004) 0166 (0004)
0337 (0004) −0163 (0004) 0156 (0003)
0195 (0002) −0134 (0003) 0146 (0002)
−0135 (0004) −0099 (0005) 0043 (0004)
0088 (0003) −0120 (0005) 0089 (0003)
−0351 (0004) −0190 (0003) 0133 (0002) 0406 (0003) 0478 (0004)
−0307 (0009) −0344 (0007) 0058 (0004) 0196 (0004) 0138 (0004)
−0279 (0006) −0127 (0004) 0058 (0003) 0252 (0005) 0287 (0007)
−0452 (0006) −0195 (0004) 0179 (0004) 0464 (0005) 0522 (0005)
−0374 (0005) −0205 (0003) 0133 (0003) 0414 (0004) 0482 (0004)
−0240 (0005) −0068 (0003) 0154 (0005) 0582 (0008) 0844 (0012)
−0357 (0007) −0227 (0005) 0172 (0004) 0548 (0006) 0668 (0006)
1742 (0004)
0970 (0005)
1145 (0006)
1735 (0006)
1744 (0004)
2511 (0008)
2332 (0005)
a Robust standard errors (OLS) and bootstrapped standard errors (200 replications) for UQR and CQR are given in parentheses. All regressions also include a set of dummies for labor market experience categories.
have different effects at different points of the wage distribution.21 The conditional quantile regression estimates reported in the corresponding columns show, as in Chamberlain (1994), that unions shift the location of the conditional wage distribution (i.e., positive effect on the median) but also reduce conditional wage dispersion. The difference between the estimated effect of unions for conditional and unconditional quantile regression estimates is illustrated in more detail in panel A of Figure 1, which plots both conditional and unconditional quantile regression estimates of union status at 19 different quantiles (from the 5th to the 95th).22 As indicated in Table I, the unconditional union effect is highly nonmonotonic, while the conditional effect declines monotonically. More precisely, the unconditional effect first increases from about 0.1 at the 5th quantile to about 0.4 at the 35th quantile, before declining and eventually reaching 21 Note that the effects are very precisely estimated for all specifications and the R-squared (close to 0.40) are sizeable for cross-sectional data. 22 Bootstrapped standard errors are provided for both estimates. Analytical standard errors for the UQPE are nontrivial and derived in Firpo, Fortin, and Lemieux (2009).
UNCONDITIONAL QUANTILE REGRESSIONS
965
FIGURE 1.—Unconditional and conditional quantile regression estimates of the effect of union status on log wages.
a large negative effect of over −0.2 at the 95th quantile. By contrast, standard (conditional) quantile regression estimates decline almost linearly from about 0.3 at the 5th quantile to barely more than 0 at the 95th quantile. At first glance, the fact that the effect of unions is uniformly positive for conditional quantile regressions, but negative above the 80th quantile for unconditional quantile regressions may seem puzzling. Since Proposition 1 states that the UQPE is a weighted average of the CQPEs, for the UQPE to be negative it must be that some of the CQPEs are negative too. Unlike the UQPE, however, the CQPE generally depends on X. For the sake of clarity, in Figure 1 we report the conditional quantile regressions using a highly restricted specification where the effect of unions is not allowed to depend on a rich set of other covariates (no interaction terms). When we relax this assumption, we find that conditional quantile regressions estimates are often negative for more “skilled” workers (in high education/high labor market experience cells). However, these negative effects are averaged away by positive effects in the more parsimonious conditional quantile regressions. On the other hand, because the matching function ζτ (x) from Proposition 1 reassigns some of the negative union effects from the s-conditional quantiles to the τ-unconditional quantiles at the top of the wage distribution and because the weighting func-
966
S. FIRPO, N. M. FORTIN, AND T. LEMIEUX
tion ωτ (x) puts more weight on these workers, the UQPE becomes negative for workers at the top end of the wage distribution. Panel B shows that the RIF-OLS and RIF-Logit estimates of the UQPE are very similar, which confirms the “folk wisdom” in empirical work that, in many instances, using a linear probability model or a logit gives very similar average marginal effects. More importantly, Figure 1 shows that the RIF-NP estimates are also very similar to the estimates obtained using these two simpler methods.23 This suggests that, at least for this particular application, using a simple linear specification for the unconditional quantile regressions provides fairly accurate estimates of the UQPE. The small difference between RIF-OLS and RIF-NP estimates stands in sharp contrast to the large differences between the RIF-OLS estimates and the conditional quantile regression estimates in panel A. The large differences between the conditional and unconditional quantile regressions results have important implications for understanding recent changes in wage inequality. There is a long tradition in labor economics of attempting to estimate the effect of unionization on the (unconditional) distribution of wages.24 The unconditional quantile regressions provide a simple and direct way to estimate this effect at all points of the distribution. The estimates reported in Figure 1 show that unionization progressively increases wages in the three lower quintiles of the distribution, peaking around the 35th quantile, and actually reduces wages in the top quintile of the distribution. As a result, the decline in unionization over the last three decades should have contributed to a reduction in wage inequality at the bottom end of the distribution and to an increase in wage inequality at the top end. This precisely mirrors the actual U-shaped changes observed in the data.25 By contrast, conditional quantile regressions results describe a positive but monotonically declining effect of unionization on wages, which fails to account for the observed pattern of changes in the wage distribution. 5. CONCLUSION In this paper, we propose a new regression method to estimate the effect of explanatory variables on the unconditional quantiles of an outcome variable. The proposed unconditional quantile regression method consists of running a regression of the (recentered) influence function of the unconditional quantile of the outcome variable on the explanatory variables. The influence 23 The RIF-NP is estimated using a model fully saturated with all possible interactions (up to 432 parameters) of our categorical variables, omitting for each estimated quantile the interactions that would result in perfect predictions. For the RIF-OLS, the figure graphs the estimated coefficients, while for the RIF-Logit and RIF-NP, the average unconditional partial effects are displayed. 24 See, for example, Card (1996) and DiNardo, Fortin, and Lemieux (1996). 25 See, for example, Autor, Katz, and Kearney (2008) and Lemieux (2008).
UNCONDITIONAL QUANTILE REGRESSIONS
967
function is a widely used tool in robust estimation that can easily be computed for each quantile of interest. We show how standard partial effects, that we call unconditional quantile partial effects (UQPE), can be estimated using our regression approach. Another important advantage of the proposed method is that it can be easily generalized to other distributional statistics such as the Gini, the log variance, or the Theil coefficient. Once the recentered influence function for these statistics is computed, all that is required is running a regression of the resulting RIF on the covariates. We discuss in a companion paper (Firpo, Fortin, and Lemieux (2007b)) how our regression method can be used to generalize traditional Oaxaca–Blinder decompositions, devised for means, to other distributional statistics. Finally, our method can be useful even when the independence assumption is relaxed. However, the interpretation of the identified parameter in terms of its relation to the structural function linking observed and unobserved factors to the dependent variable would change. Yet, the UQPE parameter would still be defined by holding unobserved variables and other components of X fixed when evaluating the marginal effect of changes in the distribution of Xj on a given quantile of the unconditional distribution of Y . Such structural averaged marginal effects can be useful in practice. We plan to show in future work how our approach can be used when instrumental variables are available for the endogenous covariates and how consistent estimates of marginal effects can be obtained by adding a control function in the unconditional quantile regressions. APPENDIX PROOF OF THEOREM 1: The effect on the functional ν of the distribution of Y of an infinitesimal change in the distribution of X from FX toward GX is defined as ∂ν(FYt·G∗Y )/∂t|t=0 . Given that equation (3) also applies to RIF(y; ν), it follows that ∂ν(FYt·G∗Y ) = RIF(y; ν) · d(G∗ − FY )(y) Y ∂t t=0 Substituting in equations (1) and (2), and applying the fact that E[RIF(Y ; ν)|X = x] = y RIF(y; ν) · dFY |X (y|X = x) yields ∂ν(FYt·G∗Y ) ∂t
= t=0
RIF(y; ν) · dFY |X (y|X = x) · d(GX − FX )(x)
=
E[RIF(Y ; ν)|X = x] · d(GX − FX )(x)
Q.E.D.
968
S. FIRPO, N. M. FORTIN, AND T. LEMIEUX
X (·; t) of the random PROOF OF COROLLARY 1: Consider the distribution G vector Z = X + tj , where tj = t · ej and ej = [0 0 1 0 0] , which is a k vector of zeros except at the jth entry, which equals 1. The density of Z ∗ (·; t) of Y using is gX (x; t) = fX (x − tj ).26 The counterfactual distribution G Y X (·; t) will be FY |X and G ∗ (y; t) = FY |X (y|x) · fX (x − tj ) · dx G Y =
FY |X (y|x) · fX (x) · dx −t ·
FY |X (y|x) ·
= FY (y) + t ·
∂fX (x)/∂xj · fX (x) · dx + χt fX (x)
FY |X (y|x) · ej · lX (x) · fX (x) · dx + χt
where the second line is obtained using a first-order expansion, where lX (x) = −d ln(fX (x))/dx = −fX (x)/fX (x), and fX (x) =[∂fX (x)/∂xl ]kl=1 is the k vector of partial derivatives of fX (x). Therefore, χt = O(t 2 ). Now, define x gX (x) = fX (x) · (1 + ej · lX (x)) and GX (x) = gX (ξ) · dξ By the usual definition of the counterfactual distribution G∗Y of Y using FY |X and GX , we have ∗ GY (y) = FY |X (y|x) · gX (x) · dx = FY (y) +
FY |X (y|x) · ej · lX (x) · fX (x) · dx
Thus we can write ∗ (y; t) = FY (y) + t · (G∗ (y) − FY (y)) + χt = FYt·G∗ + χt G Y Y Y Hence, ∗ (·; t)) − ν(FY ) ν(G Y t↓0 t ∗ ν(FYt·G∗Y ) − ν(FY ) ν(GY (·; t)) − ν(FYt·G∗Y ) + lim = lim t↓0 t↓0 t t ν(FYt·G∗Y + χt ) − ν(FYt·G∗Y ) ∂ν(FYt·G∗Y ) + lim = t↓0 ∂t t t=0
αj (ν) ≡ lim
26
The density of X is fX (·) and, by definition of densities,
x
fX (ξ) · dξ = FX (x).
969
UNCONDITIONAL QUANTILE REGRESSIONS
where the last term vanishes: ν(FYt·G∗Y + χt ) − ν(FYt·G∗Y ) O(χt ) = lim lim t↓0 t↓0 t t = lim O(|t|) = O(1) · lim t t↓0
t↓0
Using Theorem 1, it follows that ∂ν(FYt·G∗Y ) = E[RIF(Y ; ν)|X = x] · d(GX − FX )(x) αj (ν) = ∂t t=0 = E[RIF(Y ; ν)|X = x] · ej · lX (x) · fX (x) · dx Applying partial integration and using the condition that fX (x) is zero at the boundary of the support yields ej · E[RIF(Y ; ν)|X = x] · lX (x) · fX (x) · dx = = Hence
ej ·
∂E[RIF(Y ; ν)|X = x] · fX (x) · dx ∂xj
αj (ν) =
dE[RIF(Y ; ν)|X = x] · fX (x) · dx dx
∂E[RIF(Y ; ν F)|X = x] · fX (x) · dx ∂xj
Q.E.D.
PROOF OF PROPOSITION 1: (i) Starting from equation (6), d Pr[Y ≤ qτ |X = x] 1 · · dFX (x) UQPE(τ) = − fY (qτ ) dx and assuming that the structural form Y = h(X ε) is monotonic in ε, so that ετ (x) = h−1 (x qτ ), we can write Pr[Y ≤ qτ |X = x] = Pr[ε ≤ ετ (X)|X = x] = Fε|X (ετ (x)|x) = Fε (ετ (x)) Taking the derivative with respect to x, we get d
Pr[Y ≤ qτ |X = x] ∂h−1 (x qτ ) = fε (ετ (x)) · dx ∂x
970
S. FIRPO, N. M. FORTIN, AND T. LEMIEUX
Defining H(x ετ (x) qτ ) = h(x ετ (x)) − qτ , it follows that ∂H(x ετ qτ )/∂x ∂h−1 (x qτ ) ∂ετ (x) = =− ∂x ∂x ∂H(x ετ qτ )/∂ετ
−1 ∂h(x ε) ∂h(x ετ )/∂x ∂h(x ετ ) · =− =− ∂h(x ετ )/∂ετ ∂x ∂ε ε=ετ
Similarly, −1 ∂H(x ετ qτ )/∂qτ ∂h(x ε) ∂h−1 (x qτ ) =− = ∂qτ ∂H(x ετ qτ )/∂ετ ∂ε ε=ετ Hence, fY |X (qτ ; x) = d
Pr[Y ≤ qτ |X = x] Fε (h−1 (x qτ )) =d dqτ dqτ
∂h−1 (x qτ ) ∂qτ −1 ∂h(x ε) = · fε (ετ (x)) ∂ε ε=ετ = fε (ετ (x)) ·
Substituting in these expressions yields UQPE(τ)
−1
= −(fY (qτ ))
·
d
Pr[Y ≤ qτ |X = x] · dFX (x) dx
= (fY (qτ ))−1 −1 ∂h(x ε) ∂h(x ετ ) · · fε (ετ (x)) · · dFX (x) ∂x ∂ε ε=ετ ∂h(X ετ (X)) = (fY (qτ ))−1 · E fY |X (qτ |X) · ∂x fY |X (qτ |X) ∂h(X ετ (X)) =E · fY (qτ ) ∂x ∂h(X ετ (X)) = E ωτ (X) · ∂x (ii) Let the CQPE be defined as CQPE(τ x) = d
Qτ [Y |X = x] dx
UNCONDITIONAL QUANTILE REGRESSIONS
971
where τ denote the quantile of the conditional distribution: τ = Pr[Y ≤ Qτ [Y |X = x]|X = x]. Since Y = h(X ε) is monotonic in ε, τ = Pr[Y ≤ Qτ [Y |X = x]|X = x] = Pr ε ≤ h−1 (X Qτ [Y |X = x])|X = x = Fε h−1 (x Qτ [Y |X = x]) Thus, by the implicit function theorem, CQPE(τ x) fε (h−1 (x Qτ [Y |X = x])) · ∂h−1 (x Qτ [Y |X = x])/∂x fε (h−1 (x Qτ [Y |X = x])) · ∂h−1 (x q)/∂q|q=Qτ [Y |X=x] = − −∂h x h−1 (x Qτ [Y |X = x]) /∂x −1 −1 ∂h(x ε) ∂h(x ε) · ∂ε ε=h−1 (xQτ [Y |X=x]) ∂ε ε=h−1 (xQτ [Y |X=x]) =−
=
∂h(x h−1 (x Qτ [Y |X = x])) ∂x
Using the matching function ζτ (x) ≡ {s : Qs [Y |X = x] = qτ }, we can write CQPE(s x) for the τth conditional quantile at a fixed x (Qs [Y |X = x]) that equals (matches) the τth unconditional quantile (qτ ) as CQPE(s x) = CQPE(ζτ (x) x) ∂h(x h−1 (x Qs [Y |X = x])) ∂x ∂h(x h−1 (x qτ )) ∂h(X ετ (X)) = = ∂x ∂x
=
Therefore, ∂h(X ετ (X)) UQPE(τ) = E ωτ (X) · ∂x = E[ωτ (X) · CQPE(ζτ (X) X)]
Q.E.D.
REFERENCES ALBRECHT, J., A. BJÖRKLUND, AND S. VROMAN (2003): “Is There a Glass Ceiling in Sweden?” Journal of Labor Economics, 21, 145–178. [957] AUTOR, D. H., L. F. KATZ, AND M. S. KEARNEY (2008): “Trends in U.S. Wage Inequality: Revising the Revisionists,” Review of Economics and Statistics, 90, 300–323. [966]
972
S. FIRPO, N. M. FORTIN, AND T. LEMIEUX
CARD, D. (1996): “The Effect of Unions on the Structure of Wages: A Longitudinal Analysis,” Econometrica, 64, 957–979. [962,966] CARD, D., T. LEMIEUX, AND W. C. RIDDELL (2004): “Unions and Wage Inequality,” Journal of Labor Research, 25, 519–562. [963] CHAMBERLAIN, G. (1994): “Quantile Regression Censoring and the Structure of Wages,” in Advances in Econometrics, ed. by C. Sims. New York: Elsevier. [962,964] CHESHER, A. (2003): “Identification in Nonseparable Models,” Econometrica, 71, 1401–1444. [955] DINARDO, J., N. M. FORTIN, AND T. LEMIEUX (1996): “Labor Market Institutions and the Distribution of Wages: A Semi-Parametric Approach, 1973–1992,” Econometrica, 64, 1001–1044. [966] FIRPO, S., N. M. FORTIN, AND T. LEMIEUX (2007a): “Unconditional Quantile Regressions,” Technical Working Paper 339, National Bureau of Economic Research, Cambridge, MA. [960,962] (2007b): “Decomposing Wage Distributions Using Recentered Influence Function Regressions,” Unpublished Manuscript, University of British Columbia. [954,967] (2009): “Supplement to ‘Unconditional Quantile Regressions’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/6822_extensions.pdf; http://www.econometricsociety.org/ecta/Supmat/6822_data and programs.zip. [961,962,964] FLORENS, J. P., J. J. HECKMAN, C. MEGHIR, AND E. VYTLACIL (2008): “Identification of Treatment Effects Using Control Functions in Models With Continuous, Endogenous Treatment and Heterogeneous Effect,” Econometrica, 76, 1191–1207. [955] HAMPEL, F. R. (1968): “Contribution to the Theory of Robust Estimation,” Ph.D. Thesis, University of California at Berkeley. [956] (1974): “The Influence Curve and Its Role in Robust Estimation,” Journal of the American Statistical Association, 60, 383–393. [956] HIRANO, K., G. W. IMBENS, AND G. RIDDER (2003): “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161–1189. [961] IMBENS, G. W., AND W. K. NEWEY (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Econometrica (forthcoming). [955] KOENKER, R. (2005): Quantile Regression, New York: Cambridge University Press. [953] KOENKER, R., AND G. BASSETT (1978): “Regression Quantiles,” Econometrica, 46, 33–50. [953, 960] LEMIEUX, T. (1998): “Estimating the Effects of Unions on Wage Inequality in a Panel Data Model With Comparative Advantage and Non-Random Selection,” Journal of Labor Economics, 16, 261–291. [962] (2006): “Increasing Residual Wage Inequality: Composition Effects, Noisy Data, or Rising Demand for Skill?” American Economic Review, 96, 461–498. [963] (2008): “The Changing Nature of Wage Inequality,” Journal of Population Economics, 21, 21–48. [966] MACHADO, A. F., AND J. MATA (2005): “Counterfactual Decomposition of Changes in Wage Distributions Using Quantile Regression,” Journal of Applied Econometrics, 20, 445–465. [957] MELLY, B. (2005): “Decomposition of Differences in Distribution Using Quantile Regression,” Labour Economics, 12, 577–590. [957] NEWEY, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349–1382. [961] STOKER, T. M. (1991): “Equivalence of Direct, Indirect and Slope Estimators of Average Derivatives,” in Nonparametric and Semiparametric Methods, ed. by W. A. Barnett, J. L. Powell, and G. Tauchen. Cambridge, U.K.: Cambridge University Press. [961] VON MISES, R. 
(1947): “On the Asymptotic Distribution of Differentiable Statistical Functions,” The Annals of Mathematical Statistics, 18, 309–348. [956]
UNCONDITIONAL QUANTILE REGRESSIONS
973
WOOLDRIDGE, J. M. (2004): “Estimating Average Partial Effects Under Conditional Moment Independence Assumptions,” Unpublished Manuscript, Michigan State University. [958]
Escola de Economia de São Paulo, Fundação Getúlio Vargas, Rua Itapeva 474, São Paulo, SP 01332-000, Brazil; [email protected], Dept. of Economics, University of British Columbia, 997-1873 East Mall, Vancouver, BC V6T 1Z1, Canada and Canadian Institute for Advanced Research, Toronto, Canada [email protected], and Dept. of Economics, University of British Columbia, 997-1873 East Mall, Vancouver, BC V6T 1Z1, Canada; [email protected]. Manuscript received November, 2006; final revision received December, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 975–991
CLASSIFICATION ERROR IN DYNAMIC DISCRETE CHOICE MODELS: IMPLICATIONS FOR FEMALE LABOR SUPPLY BEHAVIOR BY MICHAEL P. KEANE AND ROBERT M. SAUER1 Two key issues in the literature on female labor supply are (i) whether persistence in employment status is due to unobserved heterogeneity or state dependence, and (ii) whether fertility is exogenous to labor supply. Until recently, the consensus was that unobserved heterogeneity is very important and fertility is endogenous. Hyslop (1999) challenged this. Using a dynamic panel probit model of female labor supply including heterogeneity and state dependence, he found that adding autoregressive errors led to a substantial diminution in the importance of heterogeneity. This, in turn, meant he could not reject that fertility is exogenous. Here, we extend Hyslop (1999) to allow classification error in employment status, using an estimation procedure developed by Keane and Wolpin (2001) and Keane and Sauer (2005). We find that a fairly small amount of classification error is enough to overturn Hyslop’s conclusions, leading to overwhelming rejection of the hypothesis of exogenous fertility. KEYWORDS: Female labor supply, fertility, discrete choice, classification error, simulated maximum likelihood.
1. INTRODUCTION FOR MANY YEARS, two key issues have played a major role in the literature on female labor supply. One is the attempt to distinguish true state dependence from unobserved heterogeneity as potential explanations for the substantial observed persistence in work decisions (see, e.g., Heckman and Willis (1977), Nakamura and Nakamura (1985), and Eckstein and Wolpin (1989)).2 The second is whether children and nonlabor income can reasonably be viewed as exogenous to labor supply (see, e.g., Chamberlain (1984), Rosenzweig and Schultz (1985), Mroz (1987), and Jakubson (1988)). Until recently, the consensus of the literature was that unobserved heterogeneity is an important source of persistence and that fertility is endogenous, that is, women with greater unobserved tastes for work or greater unobserved skill endowments tend to have fewer children.3 Hyslop (1999) challenged these conclusions. Using recursive importance sampling techniques 1 This research is supported in part by the Australian Research Council, through a grant to Michael Keane (ARC Grants FF0561843 and DP0774247), and the Economic and Social Research Council of the United Kingdom, through a grant to Robert Sauer (ESRC Grant RES-00022-1529). 2 Labor market policies can have very different effects depending on whether persistence is due to unobserved heterogeneity (i.e., relatively immutable differences across individuals in tastes for work, motivation, productivity, etc.) or due to state dependence (i.e., habit persistence, human capital accumulation, barriers to labor market entry, etc.). 3 For instance, Chamberlain (1984) estimated probit models for married women’s labor force participation, and Jakubson (1988) estimated panel tobit models for married women’s hours, and they both overwhelmingly rejected exogeneity of children.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7642
976
M. P. KEANE AND R. M. SAUER
(see Keane (1994)), he estimated a complex panel probit model of married women’s labor supply that included a rich pattern of unobserved heterogeneity, true state dependence, and autoregressive errors. Hyslop found that the equicorrelation assumption of the random effects model was soundly rejected. Allowing for autoregressive errors led to a substantial diminution in the apparent importance of permanent unobserved heterogeneity. This, in turn, reduced the importance of correlation between unobserved heterogeneity and children/nonlabor income for labor supply behavior. Hence, rather surprisingly, he could not reject that fertility and nonlabor income are exogenous to female labor supply. Here, we examine the potential sensitivity of Hyslop’s results to classification error in employment status. Prior work has shown that misclassification of work status is important in micro data sets. Perhaps the best known evidence was provided by Poterba and Summers (1986). In the Current Population Survey (CPS), they found the probability an employed person falsely reports being unemployed or out of the labor force is 15%, while the probability an unemployed person reports being employed is 40%.4 Might Hyslop’s results be sensitive to allowing for such misclassification of employment status? To address this issue, we nest Hyslop’s (1999) panel probit model of married women’s work decisions within a model of classification error in reported employment status. We first replicate Hyslop’s (1999) results, using the Panel Study of Income Dynamics (PSID) data on married women’s work decisions between 1981 and 1987. We then show that inferences regarding exogeneity of fertility/nonlabor income are indeed sensitive to classification error: allowing for misclassification leads us to strongly reject the exogeneity hypothesis. The intuition for the change in results is simple: If the data contain classification error, persistence in employment status is understated and so is the importance of permanent unobserved heterogeneity. Allowing for classification error leads one to infer more persistence in “true” employment status, making unobserved heterogeneity more important. This increases the apparent magnitude of the covariance between individual effects and fertility/nonlabor income as well. Thus, to the extent one believes classification error in reported employment status is important in panel data, our results should move one’s priors toward accepting the endogeneity of fertility and nonlabor income. This, in turn, provides additional motivation for the importance of jointly modelling female labor supply and fertility, as in, for example, Moffitt (1984), Hotz and Miller (1988), and Keane and Wolpin (2006). 4
These figures are derived from Poterba and Summers’ (1986) Table II. To obtain their results Poterba and Summers use the CPS reconciliation data. In the reconciliation data, Census sends an interviewer to reinterview a household a week after its original interview. The interviewer determines if reports disagree and, in the event of a disagreement, attempts to determine true employment status.
CLASSIFICATION ERROR IN CHOICE MODELS
977
Introducing classification error in the panel probit model creates a serious computational problem: lagged true choices are unobserved, making simulation of state contingent transition probabilities intractable. This makes the GHK approach to simulating the likelihood infeasible, as it relies on simulating transition probabilities (see Keane (1994)). Instead, following Keane and Wolpin (2001) and Keane and Sauer (2005) we simulate the likelihood using unconditional simulations. As our focus is on substantive results, we refer the reader to those papers for the econometric methods. The rest of the paper is organized as follows: Section 2 presents our panel probit model with classification error. Section 3 describes the PSID data used in the estimation. Section 4 presents the estimation results, while Section 5 concludes. Simulations are available in the online supplement (Keane and Sauer (2009)) 2. A PANEL PROBIT MODEL WITH CLASSIFICATION ERROR In Section 2.1 we present a model of married women’s labor supply decisions exactly like that in Hyslop (1999). In Section 2.2 we extend it to allow for classification error. 2.1. The Basic Panel Probit Model—Hyslop (1999) Consider the following specification for the work decision rule: (1)
hit = 1(Xit β+γhit−1 +uit > 0)
(i = 1 N, t = 0 T )
where hit indicates whether woman i works at time t. She chooses to work if and only if (iff) the expression in parentheses is true. Xit is a vector of covariates including nonlabor income, number of children in different age ranges, age, race, education, and time dummies. hit−1 is lagged employment status and uit is an error term. The decision rule is in “reduced form” in the sense that we have substituted out for the wage as a function of Xit and hit−1 , and the Xit are assumed to be exogenous (a key hypothesis which we will test). In the simple static probit model, the coefficient γ is set to zero and uit is assumed to be serially independent and normally distributed with zero mean and variance σu2 . The scale normalization is achieved by setting σu2 equal to 1. In the static random effects (RE) model, uit is decomposed into two components, (2)
uit = αi + εit
where αi is a time-invariant individual effect distributed N(0 σα2 ). This individual effect, which captures unobserved time-invariant taste and productivity characteristics of woman i, generates serial correlation in uit . The transitory error component, εit , is serially uncorrelated, conditionally independent of αi ,
978
M. P. KEANE AND R. M. SAUER
and distributed N(0 σε2 ). Because σu2 = σα2 + σε2 and we normalize σu2 = 1, only σα2 is directly estimated. Although αi in (2) is usually assumed to be conditionally independent of Xit , one can allow αi to be correlated with Zit , a vector that contains only the time varying elements of Xit .5 This yields a correlated random effects model (CRE), where (3)
αi =
T
Zit δt + ηi
t=0
Here ηi ∼ N(0 ση2 ) and is conditionally independent of Zit and Xit . The variT ance of permanent heterogeneity is now σα2 = Var( t=0 Zit δt )+ση2 . In the CRE model, the δt ’s are estimated in addition to ση2 and β. Thus, exogeneity of children and nonlabor income can be examined via hypothesis tests on δt .6 To see how the CRE model relaxes exogeneity, note that the basic panel probit assumes (3a)
P(hit = 1|Xi hit−1 αi ) = P(hit = 1|Xit hit−1 αi )
(3b)
E(αi |Xi1 XiT ) = E(αi ) = 0
These equations imply that, conditional on (hit−1 αi ), only Xit helps predict hit ; that is, leads and lags of X do not matter. Equation (3a) is equivalent to E(εit |Xis ) = 0 for all t and s, a type of strict exogeneity we will call SE-A. Together, (3a) and (3b) imply E(uit |Xis ) = 0 for all t and s, a stronger form of strict exogeneity we will call SE-B. By dropping assumption (3b), the CRE model relaxes SE-B while maintaining SE-A.7 Next we allow εit to be serially correlated. This could arise from persistence in shocks to tastes and/or productivity. Letting εit follow an AR(1) process, we have (4)
εit = ρεit−1 + vit
5 Letting a time-invariant element of Xit shift αi is equivalent to letting it shift Xit β by a constant. 6 The CRE model was first suggested by Chamberlain (1982) and first used by him (Chamberlain (1984)) to test exogeneity of children to married women’s labor supply (i.e., employment status). 7 Intuitively, the CRE model allows the unobserved individual effects αi , which may capture tastes for work and/or latent skill endowments, to be correlated with fertility and nonlabor income (in all periods), but it still maintains that current shocks to employment status εit , which may arise from transitory shocks to tastes and/or productivity, do not alter future fertility or nonlabor income.
CLASSIFICATION ERROR IN CHOICE MODELS
979
where vit is normally distributed with zero mean and variance σv2 , and is conditionally independent of εit−1 . We assume the process is stationary, so σε2 = (σv2 )/(1 − ρ2 ). 8 The scale normalization and independence assumption gives σu2 = ση2 + σε2 = 1, and variance stationarity in the AR(1) process gives σu2 = ση2 + σv2 /(1 − ρ2 ) = 1. Thus, we can estimate ρ and ση2 , and “back out” σv2 using the formula σv2 = (1 − ρ2 )(1 − ση2 ). Finally, in addition to estimating ρ and ση2 , we can allow for “true state dependence” by letting γ in (1) be nonzero. Thus, we decompose the persistence in observed choice behavior into that due to (i) permanent unobserved heterogeneity, (ii) first-order state dependence, and (iii) AR(1) serial correlation.9 In dynamic probit models like (1)–(4), it is well known that if the hit process is not observed from its start, simply treating the first observed h as exogenous can lead to severe bias. Heckman (1981) proposed an approximate solution to this initial conditions problem where the first observed h is determined by a probit model:10 (5)
hit = 1(Xit β + γhit−1 + uit > 0)
(t ≥ 1)
hi0 = 1(Xi0 β0 + ui0 > 0) ρt = corr(ui0 uit )
(t ≥ 1)
Here t = 0 is the first period of observed data (not the start of the hit process) and hi0 is the first observed h. The error in the first period probit for hi0 , denoted ui0 , is assumed to be N(0 1). ρt is the correlation between ui0 and the errors uit for t ≥ 1. Hyslop (1999) adopted the restriction that the ρt ’s are equal. Let ρ0 denote their common value. In this case ρ0 is also the covariance between ui0 and the individual effect αi . (See Keane and Sauer (2006) for a derivation.) 8 The stationarity assumption may be controversial. We assume stationarity because Hyslop (1999) did so, and we want our results to differ from his only due to inclusion of classification error. 9 As discussed by Wooldridge (2005), what distinguishes true state dependence (γ) from serial correlation (i.e., random effects or AR(1) errors) in (1) is whether or not there is a causal effect of lagged X’s on current choices. If the observed persistence in choices is generated entirely by serially correlated errors (i.e., γ = 0), then lagged Xit ’s do not help to predict the current choice, conditional on the current Xit . Of course, this assertion rules out any direct effect of lagged X on the current choice. More generally, it is well known one cannot disentangle true state dependence from serial correlation without some parametric assumptions (see Chamberlain (1984) for discussion). 10 Again, we choose this method for comparability with Hyslop (1999). See Heckman (1981), Wooldridge (2005), and Keane and Wolpin (1997) for details on various alternative solutions.
980
M. P. KEANE AND R. M. SAUER
2.2. Incorporating Classification Error We generalize the panel probit model in (1)–(5) by nesting it within a model of classification error. Let h∗it denote the reported choice, while hit is the true choice. Let πjk denote the probability a true j is recorded as a k, where j k = 0 1, and assume these classification rates are determined by a logit model with the index function (6)
lit = γ0 + γ1 hit + γ2 h∗it−1 + ωit
where lit > 0 implies h∗it = 1, while h∗it = 0 otherwise. Naturally, we allow h∗it to be a function of hit , as the probability of a reported “1” should be greater if the person is actually employed.11 We also include the lagged reported choice h∗it−1 to capture persistence in misreporting. The error term ωit is distributed logistically and independent of uit , conditional on hit and h∗it−1 .12 Combining (1)–(6), we arrive at the following panel data probit model of female labor supply decisions with classification error in reported choices: (7)
hit = 1(Xit β + γhit−1 + uit > 0)
(t ≥ 1)
uit = αi + εit αi =
T
Zit δt + ηi
t=0
εit = ρεit−1 + vit hi0 = 1(Xi0 β0 + ui0 > 0) ρ0 = corr(ui0 uit ) lit = γ0 + γ1 hit + γ2 h∗it−1 + ωit for i = 1 N and t = 0 T . The full vector of estimable parameters is θ = {β γ δ ση2 ρ β0 ρ0 γ0 γ1 γ2 }. 11
Hausman, Abrevaya, and Scott-Morton (1998) noted that the key condition for identification of measurement error rates in parametric discrete choice models is that the probability of a reported “1” be increasing in the probability of a true “1” (and similarly for “0”). In our notation this requires that π01 + π10 < 1, which in our model is equivalent to γ1 > 0. Thus, classification error can not be so severe that people misreport their state more often than not—certainly a mild requirement. 12 Keane and Sauer (2005) showed that the classification error scheme in (6) performs quite well in repeated sampling experiments on panel probit models using our estimation procedure (i.e., both the parameters of (6) and the “true” process (1)–(5) are recovered with precision).
CLASSIFICATION ERROR IN CHOICE MODELS
981
3. DATA We use the same data as Hyslop (1999), who graciously gave us his data set. While in some cases we might have made different decisions in defining covariates or constructing the sample, it is essential the data be identical to facilitate replication. The data are from the 1986 Panel Study of Income Dynamics (PSID), including both the random core sample of families and nonrandom Survey of Economic Opportunity. The sample period is 1979–1985. We include women aged 18–60 in 1980, who were continuously married during the period and whose husbands were employed each year. This gives N = 1812 women and 12,684 person/year observations. A woman is classified as employed if she reports positive annual hours worked and positive earnings. Table I reports means and standard deviations of variables in the analysis sample. The average employment rate over the whole sample is 70. Covariates used to predict employment are nonlabor income, number of children in three different age ranges (0–2, 3–5, and 6–17), age, education, and race (equal to 1 if black).13 As in Hyslop (1999), the log of husband’s average earnings over the sample period yip = ln( T1 t yitm ) is used as a proxy for permanent nonlabor income. Transitory nonlabor income is proxied by yit = ln(yitm ) − yip . yip and yit enter as separate covariates in estimation. The number of children aged 0–2 years lagged 1 year also appears as a covariate (see Hyslop (1999) for discussion). The degree of persistence in employment status is very strong. P(hit = 1|hit−1 = 1) is 91%, while P(hit = 0|hit−1 = 0) is 78%. There is also an important asymmetry in transition rates. P(hit = 1|hit−1 = 1 hit−2 = 0) is 722%, while P(hit = 1|hit−1 = 0 hit−2 = 1) is only 403%. This implies the model is not simply RE, but that there is also some type of short run persistence (e.g., AR(1) errors and/or state dependence). Transition patterns are critical for identifying the relative importance of random effects, AR(1) errors, and first-order state dependence, but if some transitions are spurious, due to misclassification of employment status, there may be a substantial effect on estimates of the relative importance of these factors, as well as on conclusions regarding the endogeneity of nonlabor income and fertility in the CRE model. 4. ESTIMATION RESULTS Tables II–IV present SML estimates of the CRE model in (7). In addition to the variables reported, all models control for race, education, a quadratic in age, unrestricted year effects, and the lagged value of number of children aged 0–2. 13 There is substantial variation over time in numbers of children and transitory nonlabor income. The standard deviations over time of the three fertility variables (in ascending age order) and transitory nonlabor income are 159, 182, 375, and 149, respectively. Significant variation in these variables is important for identification of the δt in the CRE estimator.
982
M. P. KEANE AND R. M. SAUER
TABLE I SAMPLE CHARACTERISTICS PSID WAVES 12–19 (1978–1985) (N = 1812)a Mean 1
Std. Dev. 2
Employed (avg. over 1979–1985)
705 (008)
362
Employed 1979
710 (011) 694 (011) 687 (011) 682 (011) 700 (011) 733 (010) 727 (010)
454
Employed 1980 Employed 1981 Employed 1982 Employed 1983 Employed 1984 Employed 1985
Husband’s annual earnings (avg. over 1979–1985)
2959 (47)
461 464 466 458 442 445
1997
No. children aged 0–2 years (avg. over 1978–1985)
249 (007)
313
No. children aged 3–5 years (avg. over 1978–1985)
296 (008)
338
No. children aged 6–17 years (avg. over 1978–1985)
989 (022)
948
Age (1980)
3434 (02)
977
Education (maximum over 1979–1985)
1290 (05)
233
Race (1 = black)
216 (010)
412
a Means and standard errors (in parentheses) for 1812 continuously married women in the PSID between 1979 and 1985, aged 18–60 in 1980, and with a husband who has positive annual hours worked and positive wages each year during the sample period. Earnings are in thousands of 1987 dollars. Variable definitions and sample selection criteria are the same as those chosen by Hyslop (1999).
983
CLASSIFICATION ERROR IN CHOICE MODELS TABLE II CORRELATED RANDOM EFFECTS PROBIT MODELS OF EMPLOYMENT STATUS (SML ESTIMATES)a Hyslop
Keane and Sauer
No CE 1
No CE 2
γ0
−341 (05) −099 (03) −300 (03) −247 (03) −084 (03) 804 (02) —
−336 (05) −103 (03) −305 (03) −245 (03) −083 (03) 829 (04) —
γ1
—
—
γ2
—
—
−4888.38 1812 32.36 (.00)∗∗ 12.77 (.12) 21.74 (.01)∗∗ 48.50 (.00)∗∗
−4887.75 1812 35.31 (.00)∗∗ 13.02 (.11) 23.01 (.00)∗∗ 48.71 (.00)∗∗
yip yit #Kids0-2t #Kids3-5t #Kids6-17t Var(ηi )
Log-likelihood N δ#Kids0-2 = 0 δ#Kids3-5 = 0 δ#Kids6-17 = 0 δyit = 0
CE 3
−400 (04) −127 (02) −290 (04) −265 (03) −090 (02) 938 (07) −2427 (09) 6996 (21) — −4878.27 1812 52.14 (.00)∗∗ 49.04 (.00)∗∗ 49.50 (.00)∗∗ 50.08 (.00)∗∗
CE 4
−375 (04) −172 (03) −388 (05) −271 (04) −087 (03) 943 (10) −2386 (11) 5056 (19) 2611 (11) −4672.62 1812 57.34 (.00)∗∗ 61.04 (.00)∗∗ 61.19 (.00)∗∗ 62.60 (.00)∗∗
a All specifications include number of children aged 0–2 years lagged 1 year, race, maximum years of education over the sample period, a quadratic in age, and unrestricted year effects. Nonlabor income is measured by yip and yit , which denote husband’s permanent (sample average) and transitory (deviations from sample average) annual earnings, respectively. Var(ηi ) is the variance of permanent unobserved heterogeneity and the γ ’s are the classification error parameters. ∗ indicates significance at the 1% level and ∗∗ indicates significance at the 5% level.
4.1. Basic CRE Model Column 1 of Table II reports estimates of the CRE model with no AR(1) serial correlation, no first-order state dependence (SD(1)), and no correction for classification error (No CE). The estimates were obtained by Hyslop (1999) using the SML-GHK algorithm.14 The parameter estimates are as expected: the negative effect of “permanent” nonlabor income on work decisions is stronger 14
This CRE model could have been estimated without simulation (e.g., using quadrature). We use SML so differences with the AR(1) models in columns 3 and 4 do not arise due to simulation per se.
984
M. P. KEANE AND R. M. SAUER TABLE III
CORRELATED RANDOM EFFECTS PROBIT MODELS OF EMPLOYMENT STATUS WITH AR(1) ERRORS (SML ESTIMATES)a Hyslop
Keane and Sauer
No CE 1
No CE 2
CE 3
CE 4
γ0
−332 (05) −097 (03) −272 (03) −234 (03) −077 (02) 546 (04) 696 (04) —
−327 (04) −108 (03) −251 (03) −219 (02) −083 (02) 582 (03) 710 (05) —
γ1
—
—
γ2
—
—
−345 (00) −112 (01) −306 (02) −265 (01) −079 (01) 830 (03) 746 (02) −2650 (12) 7909 (35) —
−4663.71 1812 9.65 (.29) 9.37 (.31) 8.04 (.43) 8.22 (.22)
−4662.55 1812 10.27 (.25) 10.39 (.24) 9.44 (.31) 8.91 (.18)
−4646.65 1812 36.05 (.00)∗∗ 43.80 (.00)∗∗ 52.44 (.00)∗∗ 53.84 (.00)∗∗
−345 (00) −085 (01) −307 (02) −269 (01) −083 (01) 831 (04) 748 (03) −2675 (13) 6837 (85) 1576 (19) −4633.67 1812 37.31 (.00)∗∗ 35.17 (.00)∗∗ 34.53 (.00)∗∗ 40.45 (.00)∗∗
yip yit #Kids0-2t #Kids3-5t #Kids6-17t Var(ηi ) ρ
Log-likelihood N δ#Kids0-2 = 0 δ#Kids3-5 = 0 δ#Kids6-17 = 0 δyit = 0
a All specifications include number of children aged 0–2 years lagged 1 year, race, maximum years of education over the sample period, a quadratic in age, and unrestricted year effects. Nonlabor income is measured by yip and yit , which denote husband’s permanent (sample average) and transitory (deviations from sample average) annual earnings, respectively. Var(ηi ) is the variance of permanent unobserved heterogeneity and the γ ’s are the classification error parameters. ρ is the AR(1) serial correlation coefficient. ∗ indicates significance at the 1% level and ∗∗ indicates significance at the 5% level.
than that of transitory nonlabor income, and young children have a larger negative effect on employment than older children. The estimate of Var(ηi ) implies that 804% of the overall error variance is due to permanent unobserved heterogeneity.15 The bottom four rows of the table report likelihood ratio tests for 15 The proportion of the overall error variance σu2 due to permanent unobserved heterogeneity is ση2 /σu2 = ση2 /(ση2 + σε2 ) = ση2 , following the normalization for scale, σu2 = 1.
985
CLASSIFICATION ERROR IN CHOICE MODELS TABLE IV
CORRELATED RANDOM EFFECTS PROBIT MODELS OF EMPLOYMENT STATUS WITH AR(1) ERRORS AND FIRST-ORDER STATE DEPENDENCE (SML ESTIMATES)a Hyslop
Keane and Sauer
No CE 1
No CE 2
γ0
−285 (05) −140 (04) −252 (05) −135 (05) −054 (04) 485 (04) −213 (04) 1042 (09) 494 (03) —
−291 (05) −137 (05) −254 (05) −131 (04) −053 (04) 519 (06) −141 (03) 1031 (07) 561 (09) —
γ1
—
—
γ2
—
—
−4643.52 1812 3.39 (.91) 3.84 (.87) 3.34 (.91) 2.92 (.82)
−4641.62 1812 6.02 (.65) 6.78 (.56) 6.89 (.55) 5.92 (.43)
yip yit #Kids0-2t #Kids3-5t #Kids6-17t Var(ηi ) ρ ht−1 Corr(ui0 uit )
Log-likelihood N δ#Kids0-2 = 0 δ#Kids3-5 = 0 δ#Kids6-17 = 0 δyit = 0
CE 3
−362 (01) −134 (03) −322 (05) −158 (03) −072 (02) 781 (09) 619 (03) 733 (03) 835 (18) −2684 (09) 6842 (14) — −4609.70 1812 39.80 (.00)∗∗ 35.90 (.00)∗∗ 32.97 (.00)∗∗ 47.70 (.00)∗∗
CE 4
−451 (01) −186 (03) −420 (05) −171 (03) −110 (03) 787 (11) 649 (03) 726 (04) 853 (21) −2252 (08) 5427 (21) 1335 (17) −4583.94 1812 36.91 (.00)∗∗ 32.25 (.00)∗∗ 31.19 (.00)∗∗ 38.20 (.00)∗∗
a All specifications include number of children aged 0–2 years lagged 1 year, race, maximum years of education over the sample period, a quadratic in age, and unrestricted year effects. Nonlabor income is measured by yip and yit , which denote husband’s permanent (sample average) and transitory (deviations from sample average) annual earnings, respectively. Var(ηi ) is the variance of permanent unobserved heterogeneity and the γ ’s are the classification error parameters. ρ is the AR(1) serial correlation coefficient and ht−1 is lagged participation status. Corr(ui0 uit ) is the error correlation relevant for the Heckman approximate solution to the initial conditions problem. ∗ indicates significance at the 1% level and ∗∗ indicates significance at the 5% level.
exogeneity of children in three age ranges (0–2, 3–5, and 6–17) and nonlabor income (i.e., tests of H(0): δt = 0 ∀t). The null hypothesis that children and nonlabor income are exogenous is clearly rejected.
986
M. P. KEANE AND R. M. SAUER
Column 2 presents estimates of the exact same model except we use our SML algorithm, based on unconditional simulation, instead of SML-GHK.16 We also fix the level of classification error to near zero, that is, π01 = π10 = 0025. The purpose of this exercise is to verify that any difference between our results and those of Hyslop (1999) that we may find later is due to introduction of classification error, not use of a different simulation method. Comparing columns 1 and 2, we see the results are essentially identical—the alternative estimation method makes almost no difference. Next, we introduce classification error. Column 3 presents estimates of the model with no persistence in misclassification (i.e., γ2 = 0 in (7)), while column 4 reports the model that allows persistence. Allowing for classification error (of either type) produces little change in the coefficients on covariates, but it increases the estimated fraction of the overall error variance due to unobserved heterogeneity from about 80% to 94%.17 This large increase in the importance of unobserved heterogeneity suggests that misclassification exaggerates the frequency of transitions between labor market states. Given the increased importance of the random effects, it is not surprising that the χ2 statistics for the hypotheses of exogenous fertility and nonlabor income increase substantially, leading to even stronger rejections of exogeneity.18 The estimates of γ0 and γ1 in column 3 can be used to calculate the classification error rates implied by the model. The probability of reporting working when the true state is not working ( π01 ) is 081. Conversely, the probability of falsely reporting nonemployment ( π10 ) is 010. These classification error rates can be compared to the analogous rates of 4% and 15% obtained by Poterba and Summers for the CPS.19 The overall (i.e., unconditional) error rate implied by our model is only 18%. Thus, we see that even a fairly “small” amount of classification error can lead to substantial attenuation bias in the importance of unobserved heterogeneity. The estimate of γ2 in column 4 implies considerable persistence in misreporting. However, we reserve further discussion of this point until we get to the models with AR(1) errors. The reason is that, as we shall see, in models Hyslop used 40 draws to implement GHK, while we use M = 1500 simulated choice histories. Comparing columns 2 and 3 by a likelihood ratio test produces a χ2 (2) of 1996 and a p-value of 000. Thus, introducing classification error also leads to a significant improvement in fit. 18 The increased σα2 makes it easier to detect correlations between the individual effect and error fertility and nonlabor income. Note that σα2 is bigger in the CRE models with classification ση2 is larger (recall that σα2 = Var( t=0T Zit × both because Zit δt is more important and because δt ) + ση2 ). 19 The CPS asks about current employment while the PSID asks about annual employment, so the two measures are not strictly comparable. Our prior is that a current measure would tend to have less error (i.e., it is easier to say if you are employed today than if you were employed at all during the past year), so that CPS error rates would be lower than PSID error rates. Of course, this is only speculative. The point of our comparison is merely to show that our error rates are not implausibly high. 16 17
CLASSIFICATION ERROR IN CHOICE MODELS
987
with only random effects the parameter γ2 tends to “sop up” omitted serial correlation in εit . 4.2. CRE With AR(1) Errors Table III reports estimates of the same sequence of models as in Table II, except now we allow for AR(1) serial correlation in the transitory error. Columns 1 and 2 reproduce the rather dramatic finding from Hyslop (1999). Specifically, with the introduction of AR(1) errors we can no longer reject the null hypothesis that fertility and nonlabor income are exogenous at any conventional level of significance. Introducing AR(1) errors has little impact on the nonlabor income and fertility coefficients, but the importance of the individual effect is considerably reduced, dropping from 80% of the overall error variance to only 55% (compare column 1 of Tables II and III). The estimated AR(1) coefficient ( ρ) is 696 and precisely estimated, and including it leads to a 225 point improvement in the log-likelihood (compare column 1 in Tables II and III). Thus, AR(1) serial correlation appears to be an important source of persistence in reported labor market states. This replicates Hyslop’s other main result: that the equicorrelation assumption is soundly rejected. Table III columns 3 and 4 introduce classification error. Here we see our main result. When classification is introduced, the fraction of variance accounted for by random effects increases from about 55% to 83%, and the hypotheses of exogenous fertility and nonlabor income are soundly rejected. This is true regardless of whether we allow for persistence in classification error. Note that this change in the exogeneity test results is consistent with the overall importance of the random effect increasing when we account for measurement error. As the importance of the RE increases, the correlation between it and fertility/nonlabor income becomes easier to detect (and more important as a determinant of labor supply behavior). Also note that the AR(1) parameter actually increases (slightly) when we introduce classification error, from 70 in column 1 to 75 in columns 3 and 4. Thus, Hyslop’s other main finding—the rejection of equicorrelation—is still supported.20 The estimates of γ0 and γ1 in column 3 imply that π01 is 066 and π10 is 005. These error rates are again comparable to the figures of 4% and 15% obtained by Poterba and Summers (1986). The overall (i.e., unconditional) rate of misclassification implied by our model is 13%. A likelihood ratio test for the joint significance of γ0 and γ1 produces a χ2 (2) statistic of 318, implying a p-value of 000. 20 Note also that introduction of AR(1) errors into (either) classification error model reduces the fraction of variance due to permanent unobserved heterogeneity from about 94% to 83% (compare columns 3 and 4 in Table II to columns 3 and 4 in Table III).
988
M. P. KEANE AND R. M. SAUER
Column 4 presents estimates allowing for persistence in misclassification. The estimate of γ2 implies substantial persistence. There is a substantial increase in the probability of falsely reporting a particular labor market state if that same state was reported in the previous period.21 This suggests that persistent misclassification may be an important source of recorded persistence in female employment. Note, however, that the strength of persistence in misclassification is sensitive to the inclusion of AR(1) errors in the model. In Table III column 4, relaxing the restriction that γ2 = 0 results in a rather large improvement in the log-likelihood of 13 points, but this is much smaller than the 206 point improvement we saw in Table II column 4 when we added persistent classification error to a model without AR(1) errors.22 Thus, while still significant, persistent classification error does not give nearly so great a likelihood improvement once we allow for AR(1) errors.23 4.3. CRE With AR(1) Errors and SD(1) Table IV reports results for more general CRE models which allow for both AR(1) serial correlation and first-order state dependence (SD(1)). As in Hyslop (1999), we deal with the initial conditions problem that arises when SD(1) is included in the model by using the Heckman approximate solution. Column 1 reports the model without classification error from Hyslop (1999). The coefficient on lagged employment is a strong 1042 and is precisely estimated. Including lagged employment in the model improves the log-likelihood by 20 points over the CRE +AR(1) model, and reduces the variance of the individual effect from 55% to 49%. The estimate of the AR(1) serial correlation coefficient ρ falls dramatically from 696 to −213. Column 2 reports estimates of the same model, except using our SML estimator. The estimates are little different from Hyslop’s and his main finding is again replicated: In the CRE +AR(1) + SD(1) model, the hypothesis of exogeneity of fertility and nonlabor income cannot be rejected at any conventional level of significance. Columns 3 and 4 report estimates of models that include classification error. These models produce substantial improvements in the log-likelihood: 21 For instance, the probability of reporting employment, when the true state is nonemployment and nonemployment is reported in the previous period, is 064, but if employment was reported in the previous period, this error rate rises to 250. Similarly, the probability of reporting nonemployment, when the true state is employment and employment is reported in the previous period, is only 003, but if nonemployment was reported in the previous period, this error rate rises to 015. 22 Also, comparing column 4 of Tables II and III, we see that including the AR(1) error component leads to a drop in the estimated persistence in misclassification (i.e., γ2 falls from 261 to 158). 23 The intuition for how the parameters ρ and γ2 are distinguished is similar to that for how serial correlation and state dependence are distinguished. Specially, γ2 > 0 implies that lagged X’s help to predict current choices, while ρ > 0 does not have this implication.
CLASSIFICATION ERROR IN CHOICE MODELS
989
32 points with no persistence in classification error and an additional 26 points when persistence is allowed. Compared to Hyslop (1999), they produce quite different estimates of the importance of random effects, AR(1) errors, and state dependence. The first-order state dependence coefficient falls to about 73 (compared to 104 in column 1), the fraction of the error variance due to random effects increases to 78 (compared to 49 in column 1), and the AR(1) coefficient increases to about 62 to 65 (compared to −21). Particularly notable is the complete reversal in sign on the AR(1) coefficient, back to a more plausible positive value. Thus, failure to account for classification error produces severe attenuation biases in the importance of unobserved heterogeneity and AR(1) serial correlation, and an upward bias in the extent of first-order state dependence.24 Note π10 = 015) are that the estimated classification error rates ( π01 = 064 and similar in magnitude to those obtained in earlier specifications and remain statistically significant. They are also quite close to the analogous rates calculated by Poterba and Summers for the CPS (i.e., 4% and 15%, respectively). The overall error rate implied by our model is 18%. Also, the estimated degree of persistence in misclassification is only slightly smaller than in the RE + AR(1) model (compare γ2 in column 4 of Tables III and IV).25 Finally, the classification error models in columns 3 and 4 again reject overwhelmingly the hypotheses of exogenous fertility and nonlabor income. The difference in results from columns 1 and 2 is again a direct result of the greater estimated variance of the random effect in models that accommodate classification error. 5. CONCLUSION Estimating the relative importance of state dependence and permanent unobserved heterogeneity, and the influence of children and nonlabor income, have long been important topics in the literature on female labor supply. Hyslop (1999) contributed to this literature by estimating panel probit models of married women’s employment decisions, using PSID data from 1979 to 1985. His innovation was to relax the equicorrelation assumption of the common 24
The main parameter of the Heckman approximate solution to the initial conditions problem,
i0 uit ), also suffers from an attenuation bias. ρ0 = Corr(u 25 The intuition for how one can distinguish true state dependence γ > 0 from persistence in misclassification γ2 > 0 is as follows. If there is persistence in classification error but no true state dependence, we should have E(h∗it |Xit h∗it−1 Xit−1 ) = E(h∗it |Xit h∗it−1 ) However, in a first-order Markov model, the lagged state is only a sufficient statistic for lagged inputs if it is measured without error. Thus, if true state dependence is also present (in addition to persistent misreporting), then lagged X’s will help to predict current choices even conditional on the lagged (measured) choice.
5. CONCLUSION

Estimating the relative importance of state dependence and permanent unobserved heterogeneity, and the influence of children and nonlabor income, has long been an important topic in the literature on female labor supply. Hyslop (1999) contributed to this literature by estimating panel probit models of married women's employment decisions, using PSID data from 1979 to 1985. His innovation was to relax the equicorrelation assumption of the common
random effects model by allowing for an AR(1) error component. He obtained two main findings: (i) the AR(1) error component is important, and when it is included the importance of random effects is substantially reduced, and (ii) once the AR(1) error component is included, the hypothesis that fertility and husband's income are exogenous—in the sense of being uncorrelated with the random effects—cannot be rejected.
We extend Hyslop's model by nesting it within a model of classification error in reported employment status. Our estimates imply that the extent of classification error in the data is rather modest, that is, employment status is misclassified in about 1.3% to 1.8% of cases on average. The extent of classification error that we estimate for the PSID is in the ballpark of estimates obtained by Poterba and Summers for employment status in the CPS, which gives face validity to our results.
Crucially, we find that even these modest levels of classification error (i.e., 1.3% to 1.8%) are sufficient to cause models that ignore it to substantially understate the importance of individual random effects. This is obviously due to the spurious transitions created by misclassification. After correcting for classification error, we obtain a large increase in the estimated variance of the random effects. As a result, correlation between the random effects and fertility/nonlabor income becomes easier to detect, and we soundly reject the hypothesis that fertility and nonlabor income are exogenous. This is in sharp contrast to main result (ii) in Hyslop (1999). Our results suggest that researchers estimating dynamic discrete choice models should be careful to consider the possible impact of misclassification on their results.

REFERENCES

CHAMBERLAIN, G. (1982): "Multivariate Regression Models for Panel Data," Journal of Econometrics, 18, 5–46. [978]
——— (1984): "Panel Data," in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M. D. Intriligator. Amsterdam: Elsevier Science, Chapter 22. [975,978,979]
ECKSTEIN, Z., AND K. I. WOLPIN (1989): "Dynamic Labor Force Participation of Married Women and Endogenous Work Experience," Review of Economic Studies, 56, 375–390. [975]
HAUSMAN, J. A., J. ABREVAYA, AND F. M. SCOTT-MORTON (1998): "Misclassification of the Dependent Variable in a Discrete-Response Setting," Journal of Econometrics, 87, 239–269. [980]
HECKMAN, J. J. (1981): "The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time–Discrete Data Stochastic Process and Some Monte Carlo Evidence," in Structural Analysis of Discrete Data With Econometric Applications, ed. by C. F. Manski and D. McFadden. Cambridge, MA: MIT Press. [979]
HECKMAN, J. J., AND R. WILLIS (1977): "A Beta-Logistic Model for the Analysis of Sequential Labor Force Participation by Married Women," Journal of Political Economy, 85, 27–58. [975]
HOTZ, V. J., AND R. A. MILLER (1988): "An Empirical Analysis of Life Cycle Fertility and Female Labor Supply," Econometrica, 56, 91–118. [976]
HYSLOP, D. R. (1999): "State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women," Econometrica, 67, 1255–1294. [975-977,979,981,982,986-990]
JAKUBSON, G. (1988): "The Sensitivity of Labor Supply Parameter Estimates to Unobserved Individual Effects: Fixed- and Random-Effects Estimates in a Nonlinear Model Using Panel Data," Journal of Labor Economics, 6, 302–329. [975]
KEANE, M. P. (1994): "A Computationally Practical Simulation Estimator for Panel Data," Econometrica, 62, 95–116. [976,977]
KEANE, M. P., AND R. M. SAUER (2005): "A Computationally Practical Simulation Estimation Algorithm for Dynamic Panel Data Models With Unobserved Endogenous State Variables," Unpublished Manuscript, available at http://ssrn.com/abstract=448240. [977,980]
——— (2006): "Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior," Discussion Paper 2332, IZA. [979]
——— (2009): "Supplement to 'Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior'," Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/7642_simulations.zip. [977]
KEANE, M. P., AND K. I. WOLPIN (1997): "The Career Decisions of Young Men," Journal of Political Economy, 105, 473–522. [979]
——— (2001): "The Effect of Parental Transfers and Borrowing Constraints on Educational Attainment," International Economic Review, 42, 1051–1103. [977]
——— (2006): "The Role of Labor and Marriage Markets, Preference Heterogeneity and the Welfare System in the Life Cycle Decisions of Black, Hispanic and White Women," Working Paper 06-004, PIER. [976]
MOFFITT, R. (1984): "Life Cycles Profile of Labour Supply and Fertility," Review of Economic Studies, 51, 263–278. [976]
MROZ, T. A. (1987): "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica, 55, 765–799. [975]
NAKAMURA, A., AND M. NAKAMURA (1985): "Dynamic Models of the Labor Force Behavior of Married Women Which Can Be Estimated Using Limited Amounts of Information," Journal of Econometrics, 27, 273–298. [975]
POTERBA, J. M., AND L. H. SUMMERS (1986): "Reporting Errors and Labor Market Dynamics," Econometrica, 54, 1319–1338. [976]
ROSENZWEIG, M. R., AND T. P. SCHULTZ (1985): "The Demand for and Supply of Births: Fertility and Its Life Cycle Consequences," American Economic Review, 75, 992–1015. [975]
WOOLDRIDGE, J. M. (2005): "Simple Solutions to the Initial Conditions Problem in Dynamic, Nonlinear Panel Data Models With Unobserved Effects," Journal of Applied Econometrics, 20, 39–54. [979]
University of Technology Sydney, City Campus, P.O. Box 123, Broadway, Sydney, NSW 2007, Australia and Arizona State University, Tempe, AZ 85287, U.S.A.; [email protected] and School of Economics, Finance and Management, University of Bristol, 8 Woodland Road, Bristol BS8 1TN, U.K.; [email protected]. Manuscript received December, 2007; final revision received November, 2008.
Econometrica, Vol. 77, No. 3 (May, 2009), 993–999
ANNOUNCEMENTS

2009 AUSTRALASIAN MEETING
THE ECONOMETRIC SOCIETY AUSTRALASIAN MEETING in 2009 (ESAM09) will be held in Canberra, Australia, from 7 to 10 July. The meeting will be hosted by the School of Economics in the College of Business and Economics at the Australian National University, and the program co-chairs will be Professor Heather Anderson and Dr. Maria Racionero. The program will consist of invited speakers and contributed papers on a wide range of both theoretical and applied areas of econometrics, microeconomics, and macroeconomics. Please visit the conference website at http://esam09.anu.edu.au for more information.

2009 FAR EAST AND SOUTH ASIA MEETING
THE 2009 FAR EAST AND SOUTH ASIA MEETING of the Econometric Society (FESAMES 2009) will be hosted by the Faculty of Economics of the University of Tokyo, on 3–5 August, 2009. The venue of the meeting is the Hongo campus of the University of Tokyo, which is about ten minutes away from the Tokyo JR station. The program will consist of invited and contributed sessions in all fields of economics. We encourage you to register at your earliest convenience. Partial subsidy is available for travel and local expenses of young economists submitting a paper for a contributed session in the conference.

Confirmed Invited Speakers:
Dilip Abreu (Princeton University)
Abhijit Banerjee (Massachusetts Institute of Technology)
Markus K. Brunnermeier (Princeton University)
Larry G. Epstein (Boston University)
Faruk R. Gul (Princeton University)
James Heckman (University of Chicago)
Han Hong (Stanford University)
Ali Hortacsu (University of Chicago)
Michihiro Kandori (University of Tokyo)
Dean Karlan (Yale University)
Nobuhiro Kiyotaki (Princeton University)
John List (University of Chicago)
Charles Manski (Northwestern University)
Daniel McFadden (University of California, Berkeley)
Costas Meghir (University College London)
Roger Myerson (University of Chicago)
Sendhil Mullainathan (Harvard University)
Ariel Pakes (Harvard University)
Giorgio Primiceri (Northwestern University)
Jean-Marc Robin (Paris School of Economics/UCL)
Yuliy Sannikov (Princeton University)
Hyun Song Shin (Princeton University)
Christopher Sims (Princeton University)
James Stock (Harvard University)
David Weir (University of Michigan)

Program Committee:
Co-Chairs:
Hidehiko Ichimura (University of Tokyo)
Hitoshi Matsushima (University of Tokyo)
Toni Braun, Macroeconomics (University of Tokyo)
Xiaohong Chen, Econometric Theory (Yale University)
Shinichi Fukuda, Monetary Economics, Macroeconomics, International Finance (University of Tokyo)
Takeo Hoshi, Finance and Banking, Monetary Economics, Japanese Economy (UC San Diego)
Toshihiro Ihori, Public Economics, Fiscal Policy (University of Tokyo)
Hideshi Itoh, Contract Theory (Hitotsubashi University)
Atsushi Kajii, Economic Theory, Information Economics, General Equilibrium, Game Theory (Kyoto University)
Kazuya Kamiya, General Equilibrium, Decision Theory, Computational Economics (University of Tokyo)
Yuichi Kitamura, Microeconometrics (Yale University)
Kazuharu Kiyono, Trade, Industrial Economics, Applied Economics (Waseda University)
Siu Fai Leung, Labor (Hong Kong University of Science and Technology)
Akihiko Matsui, Game Theory (University of Tokyo)
Tomoyuki Nakajima, Macroeconomic Theory (Kyoto University)
Masao Ogaki, Macroeconometrics (Ohio State University)
Hiroshi Ohashi, Industrial Organization, International Trade (University of Tokyo)
Joon Y. Park, Time Series Econometrics (Texas A&M University)
Tatsuyoshi Saijo, Experimental Economics, Mechanism Design (Osaka University)
Makoto Saito, Asset Pricing, Consumption and Investment, Monetary Theory (Hitotsubashi University)
Yasuyuki Sawada, Development, Applied Econometrics (University of Tokyo)
Shigehiro Serizawa, Mechanism Design, Social Choice Theory (Osaka University)
Akihisa Shibata, International Macroeconomics (Kyoto University)
Takatoshi Tabuchi, Urban Economics, International Trade, Economic Geography (University of Tokyo)
Noriyuki Yanagawa, Law and Economics, Financial Contract (University of Tokyo)

Local Organizing Committee:
Co-Chairs:
Hidehiko Ichimura (University of Tokyo)
Hitoshi Matsushima (University of Tokyo)
Yoichi Arai (University of Tokyo)
Yun Jeong Choi (University of Tokyo)
Julen Esteban-Pretel (University of Tokyo)
Masahiro Fujiwara (University of Tokyo)
Fumio Hayashi (University of Tokyo)
Isao Ishida (University of Tokyo)
Takatoshi Ito (University of Tokyo)
Motoshige Itoh (University of Tokyo)
Katsuhito Iwai (University of Tokyo)
Yasushi Iwamoto (University of Tokyo)
Yoshitsugu Kanemoto (University of Tokyo)
Takashi Kano (University of Tokyo)
Takao Kobayashi (University of Tokyo)
Tatsuya Kubokawa (University of Tokyo)
Naoto Kunitomo (University of Tokyo)
Hisashi Nakamura (University of Tokyo)
Tetsuji Okazaki (University of Tokyo)
Yasuhiro Omori (University of Tokyo)
Akihiko Takahashi (University of Tokyo)
Yoshiro Miwa (University of Tokyo)
Kazuo Ueda (University of Tokyo)
Makoto Yano (Kyoto University)
Jiro Yoshida (University of Tokyo)
Hiroshi Yoshikawa (University of Tokyo)

2009 EUROPEAN MEETING
THE 2009 EUROPEAN MEETING of the Econometric Society (ESEM) will take place in Barcelona, Spain from 23 to 27 August, 2009. The Meeting is jointly organized by the Barcelona Graduate School of Economics and it will run in parallel with the Congress of the European Economic Association (EEA).
The Program Committee Chairs are Prof. Juuso Välimäki (Helsinki School of Economics) for Theoretical and Applied Economics and Prof. Gerard J. van den Berg (Free University Amsterdam) for Econometrics and Empirical Economics. This year's Fisher–Schultz Lecture will be given by Faruk Gul (Princeton University). The Laffont Lecture will be given by Guido Imbens (Harvard University).

The Local Arrangements Committee:
Albert Carreras, Chairman—Universitat Pompeu Fabra and Barcelona GSE
Carmen Bevià—Universitat Autònoma de Barcelona and Barcelona GSE
Jordi Brandts—Institute for Economic Analysis-CSIC and Barcelona GSE
Eduard Vallory, Secretary—Barcelona GSE Director-General

All details regarding the congress can be found on the website http://eea-esem2009.barcelonagse.eu/.

2009 LATIN AMERICAN MEETING
THE 2009 LATIN AMERICAN MEETINGS will be held jointly with the Latin American and Caribbean Economic Association in Buenos Aires, Argentina, from October 1 to 3, 2009. The Meetings will be hosted by Universidad Torcuato Di Tella (UTDT). The Annual Meetings of these two academic associations will be run in parallel, under a single local organization. By registering for LAMES 2009, participants will be welcome to attend all sessions of both meetings. Andrés Neumeyer (UTDT) is the conference chairman. The LAMES Program Committee is chaired by Emilio Espina (UTDT). The LACEA Program Committee is chaired by Sebastián Galiani (Washington University in St. Louis).

Plenary Speakers:
Ernesto Zedillo, Former President of Mexico, Yale University
Roger Myerson, Nobel Laureate, University of Chicago, LAMES Presidential Address
Mauricio Cardenas, Brookings Institution, LACEA Presidential Address
Daron Acemoglu, MIT
Guido Imbens, Harvard University
John Moore, University of Edinburgh

Invited Speakers:
Fernando Alvarez, University of Chicago
Jere Behrman, University of Pennsylvania
Abhijit Banerjee, MIT
Pablo Beker, University of Warwick
Samuel Berlinski, University College of London
Richard Blundell, University College of London
Gustavo Bobonis, University of Toronto
Michele Boldrin, Washington University in St. Louis
Maristella Botticini, Boston University
Francois Bourguignon, Paris School of Economics
Francisco Buera, Northwestern University
Guillermo Calvo, Columbia University
Matias Cattaneo, University of Michigan
V. V. Chari, University of Minnesota
Satyajit Chatterjee, Philadelphia Federal Bank
Lawrence Christiano, Northwestern University
Ernesto Dal Bo, University of California, Berkeley
David DeJong, University of Pittsburgh
José de Gregorio, Banco Central de Chile
Augusto de la Torre, The World Bank
Rafael Di Tella, Harvard University
Juan Dubra, Universidad de Montevideo
Esther Duflo, MIT
Jonathan Eaton, New York University
Huberto Ennis, Universidad Carlos III de Madrid
Martin Eichenbaum, Northwestern University
Raquel Fernandez, New York University
Sergio Firpo, Fundacao Getulio Vargas Sao Paulo
Paul Gertler, University of California, Berkeley
Edward Glaeser, Harvard University
Ricardo Hausman, Harvard University
Christian Hellwig, UCLA
Bo Honoré, Princeton University
Hugo Hopenhayn, UCLA
Boyan Jovanovic, New York University
Dean Karlan, Yale University
Pat Kehoe, University of Minnesota
Tim Kehoe, University of Minnesota
Felix Kubler, Swiss Finance Institute
Victor Lavy, Hebrew University
David Levine, Washington University in St. Louis
Santiago Levy, Inter American Development Bank
Rodolfo Manuelli, Washington University
Rosa Matzkin, UCLA
Enrique Mendoza, University of Maryland
Dilip Mookherjee, Boston University
John Nye, George Mason University
Rohini Pande, Harvard University
Fabrizio Perri, University of Minnesota
Andrew Postlewaite, University of Pennsylvania
Martin Redrado, Banco Central de la Republica Argentina
Carmen Reinhart, University of Maryland
Rafael Repullo, CEMFI
James Robinson, Harvard University
Esteban Rossi-Hansberg, Princeton University
Ernesto Schargrodsky, Universidad Di Tella
Karl Schmedders, Kellogg School of Management, Northwestern University
Paolo Siconolfi, Columbia University
Michele Tertilt, Stanford University
Miguel Urquiola, Columbia University
Martin Uribe, Columbia University
Andres Velasco, Ministerio de Hacienda, Chile
John Wallis, University of Maryland
Chuck Whiteman, University of Iowa
Stanley Zin, Carnegie Mellon University

Further information can be found at the conference website at http://www.lacealames2009.utdt.edu or by email at [email protected].

2010 NORTH AMERICAN WINTER MEETING
THE 2010 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Atlanta, GA, from January 3 to 5, 2010, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. The program committee will be chaired by Dirk Bergemann of Yale University. This year we are pleased to invite submissions of entire sessions in addition to individual papers.

Program Committee:
Dirk Bergemann, Yale University, Chair
Marco Battaglini, Princeton University (Political Economy)
Roland Benabou, Princeton University (Behavioral Economics)
Markus Brunnermeier, Princeton University (Financial Economics)
Xiaohong Chen, Yale University (Theoretical Econometrics, Time Series)
Liran Einav, Stanford University (Industrial Organization)
Luis Garicano, University of Chicago (Organization, Law and Economics)
John Geanakoplos, Yale University (General Equilibrium Theory, Mathematical Economics)
Mike Golosov, MIT (Macroeconomics)
Pierre Olivier Gourinchas, University of California (International Finance)
Igal Hendel, Northwestern (Empirical Microeconomics)
Johannes Hoerner, Yale University (Game Theory)
Han Hong, Stanford University (Applied Econometrics)
Wojciech Kopczuk, Columbia University (Public Economics)
Martin Lettau, University of California, Berkeley (Finance)
Enrico Moretti, University of California, Berkeley (Labor)
Muriel Niederle, Stanford University (Experimental Game Theory, Market Design)
Luigi Pistaferri, Stanford University (Labor)
Esteban Rossi-Hansberg, Princeton University (International Trade)
Marciano Siniscalchi, Northwestern University (Decision Theory)
Robert Townsend, Massachusetts Institute of Technology (Development Economics)
Oleg Tsyvinski, Yale University (Macroeconomics, Public Finance)
Harald Uhlig, University of Chicago (Macroeconomics, Computational Finance)
Ricky Vohra, Northwestern University (Auction, Mechanism Design)
Econometrica, Vol. 77, No. 3 (May, 2009), 1001
FORTHCOMING PAPERS

THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica.

AL-NAJJAR, NABIL I.: "Decision Makers as Statisticians: Diversity, Ambiguity and Learning."
ANDREONI, JAMES, AND B. DOUGLAS BERNHEIM: "Social Image and the 50–50 Norm: A Theoretical and Experimental Analysis of Audience Effects."
CILIBERTO, FEDERICO, AND ELIE TAMER: "Market Structure and Multiple Equilibria in Airline Markets."
DUFFIE, DARRELL, SEMYON MALAMUD, AND GUSTAVO MANSO: "Information Percolation With Equilibrium Search Dynamics."
GOSSNER, OLIVIER, EHUD KALAI, AND ROBERT WEBER: "Information Independence and Common Knowledge."
HIRANO, KEISUKE, AND JACK R. PORTER: "Asymptotics for Statistical Treatment Rules."
MYKLAND, PER A., AND LAN ZHANG: "Inference for Continuous Semimartingales Observed at High Frequency."
ONATSKI, ALEXEI: "Testing Hypotheses About the Number of Factors in Large Factor Models."
WANG, QIYING, AND PETER C. B. PHILLIPS: "Structural Nonparametric Cointegrating Regression."