CONTENTS

FATIH GUVENEN: A Parsimonious Macroeconomic Model for Asset Pricing . . . . . . . . . . 1711
VERONICA GUERRIERI AND GUIDO LORENZONI: Liquidity and Trading Dynamics . . . . . . . . . . 1751
FEDERICO CILIBERTO AND ELIE TAMER: Market Structure and Multiple Equilibria in Airline Markets . . . . . . . . . . 1791
MOSHE HAZAN: Longevity and Lifetime Labor Supply: Evidence and Implications . . . . . . . . . . 1829
SUSUMU IMAI, NEELAM JAIN, AND ANDREW CHING: Bayesian Estimation of Dynamic Discrete Choice Models . . . . . . . . . . 1865
QIYING WANG AND PETER C. B. PHILLIPS: Structural Nonparametric Cointegrating Regression . . . . . . . . . . 1901
JOHN K.-H. QUAH AND BRUNO STRULOVICI: Comparative Statics, Informativeness, and the Interval Dominance Order . . . . . . . . . . 1949

NOTES AND COMMENTS:
DEAN KARLAN AND JONATHAN ZINMAN: Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment . . . . . . . . . . 1993
DANIEL ACKERBERG, JOHN GEWEKE, AND JINYONG HAHN: Comments on "Convergence Properties of the Likelihood of Computed Dynamic Models" . . . . . . . . . . 2009

ANNOUNCEMENTS . . . . . . . . . . 2019
FORTHCOMING PAPERS . . . . . . . . . . 2023
REPORT OF THE PRESIDENT . . . . . . . . . . 2025
(2009 Volume Table of Contents is located on p. iii of this issue.)
VOL. 77, NO. 6 — November, 2009
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org EDITOR STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] MANAGING EDITOR GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected] CO-EDITORS DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.;
[email protected] WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] JEAN-MARC ROBIN, Maison des Sciences Economiques, Université Paris 1 Panthéon–Sorbonne, 106/112 bd de l’Hôpital, 75647 Paris Cedex 13, France and University College London, U.K.;
[email protected] LARRY SAMUELSON, Dept. of Economics, Yale University, 20 Hillhouse Avenue, New Haven, CT 06520-8281, U.S.A.;
[email protected] JAMES H. STOCK, Dept. of Economics, Harvard University, Littauer M-24, 1830 Cambridge Street, Cambridge, MA 02138, U.S.A.;
[email protected] HARALD UHLIG, Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.;
[email protected] ASSOCIATE EDITORS YACINE AÏT-SAHALIA, Princeton University JOSEPH G. ALTONJI, Yale University JAMES ANDREONI, University of California, San Diego JUSHAN BAI, Columbia University MARCO BATTAGLINI, Princeton University PIERPAOLO BATTIGALLI, Università Bocconi DIRK BERGEMANN, Yale University XIAOHONG CHEN, Yale University VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology J. DARRELL DUFFIE, Stanford University JEFFREY ELY, Northwestern University HALUK ERGIN, Washington University in St. Louis MIKHAIL GOLOSOV, Yale University FARUK GUL, Princeton University JINYONG HAHN, University of California, Los Angeles PHILIP A. HAILE, Yale University MICHAEL JANSSON, University of California, Berkeley PHILIPPE JEHIEL, Paris School of Economics and University College London PER KRUSELL, Princeton University and Stockholm University FELIX KUBLER, University of Zurich
OLIVER LINTON, London School of Economics BART LIPMAN, Boston University THIERRY MAGNAC, Toulouse School of Economics (GREMAQ and IDEI) GEORGE J. MAILATH, University of Pennsylvania DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse STEVEN A. MATTHEWS, University of Pennsylvania ROSA L. MATZKIN, University of California, Los Angeles LEE OHANIAN, University of California, Los Angeles WOJCIECH OLSZEWSKI, Northwestern University NICOLA PERSICO, New York University BENJAMIN POLAK, Yale University PHILIP J. RENY, University of Chicago SUSANNE M. SCHENNACH, University of Chicago UZI SEGAL, Boston College NEIL SHEPHARD, University of Oxford MARCIANO SINISCALCHI, Northwestern University JEROEN M. SWINKELS, Northwestern University ELIE TAMER, Northwestern University EDWARD J. VYTLACIL, Yale University IVÁN WERNING, Massachusetts Institute of Technology ASHER WOLINSKY, Northwestern University
EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.;
[email protected] Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
INDEX

ARTICLES

AL-NAJJAR, NABIL I.: Decision Makers as Statisticians: Diversity, Ambiguity, and Learning . . . 1371
ALVAREZ, FERNANDO, AND FRANCESCO LIPPI: Financial Innovation and the Transactions Demand for Cash . . . 363
ANDREONI, JAMES, AND B. DOUGLAS BERNHEIM: Social Image and the 50–50 Norm: A Theoretical and Experimental Analysis of Audience Effects . . . 1607
ANDREWS, DONALD W. K., AND PATRIK GUGGENBERGER: Hybrid and Size-Corrected Subsampling Methods . . . 721
ARELLANO, MANUEL, AND STÉPHANE BONHOMME: Robust Priors in Nonlinear Panel Data Models . . . 489
BAI, JUSHAN: Panel Data Models With Interactive Fixed Effects . . . 1229
BANDIERA, ORIANA, IWAN BARANKAY, AND IMRAN RASUL: Social Connections and Incentives in the Workplace: Evidence From Personnel Data . . . 1047
BARANKAY, IWAN: (See BANDIERA)
BERNHEIM, B. DOUGLAS: (See ANDREONI)
BLOOM, NICHOLAS: The Impact of Uncertainty Shocks . . . 623
BONHOMME, STÉPHANE: (See ARELLANO)
BURKE, JONATHAN L.: Virtual Determinacy in Overlapping Generations Models . . . 235
CHAMBERLAIN, GARY, AND MARCELO J. MOREIRA: Decision Theory Applied to a Linear Panel Data Model . . . 107
CHARNESS, GARY, AND URI GNEEZY: Incentives to Exercise . . . 909
CHIAPPORI, P.-A., AND I. EKELAND: The Microeconomics of Efficient Group Behavior: Identification . . . 763
CHING, ANDREW: (See IMAI)
CILIBERTO, FEDERICO, AND ELIE TAMER: Market Structure and Multiple Equilibria in Airline Markets . . . 1791
CONLON, JOHN R.: Two New Conditions Supporting the First-Order Approach to Multisignal Principal–Agent Problems . . . 249
COSTINOT, ARNAUD: An Elementary Theory of Comparative Advantage . . . 1165
CURRARINI, SERGIO, MATTHEW O. JACKSON, AND PAOLO PIN: An Economic Model of Friendship: Homophily, Minorities, and Segregation . . . 1003
DUFFIE, DARRELL, SEMYON MALAMUD, AND GUSTAVO MANSO: Information Percolation With Equilibrium Search Dynamics . . . 1513
EKELAND, I.: (See CHIAPPORI)
ELLISON, GLENN, AND SARA FISHER ELLISON: Search, Obfuscation, and Price Elasticities on the Internet . . . 427
ELLISON, SARA FISHER: (See ELLISON)
FORTNOW, LANCE, AND RAKESH V. VOHRA: The Complexity of Forecast Testing . . . 93
GARRATT, RODNEY J., THOMAS TRÖGER, AND CHARLES Z. ZHENG: Collusion via Resale . . . 1095
GNEEZY, URI, KENNETH L. LEONARD, AND JOHN A. LIST: Gender Differences in Competition: Evidence From a Matrilineal and a Patriarchal Society . . . 1637
GNEEZY, URI: (See CHARNESS)
GOVINDAN, SRIHARI, AND ROBERT WILSON: On Forward Induction . . . 1
GUERRE, EMMANUEL, ISABELLE PERRIGNE, AND QUANG VUONG: Nonparametric Identification of Risk Aversion in First-Price Auctions Under Exclusion Restrictions . . . 1193
GUERRIERI, VERONICA, AND GUIDO LORENZONI: Liquidity and Trading Dynamics . . . 1751
GUGGENBERGER, PATRIK: (See ANDREWS)
GUVENEN, FATIH: A Parsimonious Macroeconomic Model for Asset Pricing . . . 1711
HANSEN, LARS PETER, AND JOSÉ A. SCHEINKMAN: Long-Term Risk: An Operator Approach . . . 177
HAZAN, MOSHE: Longevity and Lifetime Labor Supply: Evidence and Implications . . . 1829
HELLWIG, CHRISTIAN, AND GUIDO LORENZONI: Bubbles and Self-Enforcing Debt . . . 1137
HÖRNER, JOHANNES, AND STEFANO LOVO: Belief-Free Equilibria in Games With Incomplete Information . . . 453
HÖRNER, JOHANNES, AND NICOLAS VIEILLE: Public vs. Private Offers in the Market for Lemons . . . 29
IMAI, SUSUMU, NEELAM JAIN, AND ANDREW CHING: Bayesian Estimation of Dynamic Discrete Choice Models . . . 1865
IMBENS, GUIDO W., AND WHITNEY K. NEWEY: Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity . . . 1481
JACKSON, MATTHEW O.: (See CURRARINI)
JAIN, NEELAM: (See IMAI)
KASAHARA, HIROYUKI, AND KATSUMI SHIMOTSU: Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices . . . 135
LAGOS, RICARDO, AND GUILLAUME ROCHETEAU: Liquidity in Asset Markets With Search Frictions . . . 403
LEONARD, KENNETH L.: (See GNEEZY)
LIPPI, FRANCESCO: (See ALVAREZ)
LIST, JOHN A.: (See GNEEZY)
LORENZONI, GUIDO: (See GUERRIERI)
LORENZONI, GUIDO: (See HELLWIG)
LOVO, STEFANO: (See HÖRNER)
MALAMUD, SEMYON: (See DUFFIE)
MANSO, GUSTAVO: (See DUFFIE)
MOREIRA, MARCELO J.: (See CHAMBERLAIN)
MYKLAND, PER A., AND LAN ZHANG: Inference for Continuous Semimartingales Observed at High Frequency . . . 1403
NEWEY, WHITNEY K., AND FRANK WINDMEIJER: Generalized Method of Moments With Many Weak Moment Conditions . . . 687
NEWEY, WHITNEY K.: (See IMBENS)
ONATSKI, ALEXEI: Testing Hypotheses About the Number of Factors in Large Factor Models . . . 1447
PERRIGNE, ISABELLE: (See GUERRE)
PHILLIPS, PETER C. B.: (See WANG)
PIN, PAOLO: (See CURRARINI)
PISSARIDES, CHRISTOPHER A.: The Unemployment Volatility Puzzle: Is Wage Stickiness the Answer? . . . 1339
QUAH, JOHN K.-H., AND BRUNO STRULOVICI: Comparative Statics, Informativeness, and the Interval Dominance Order . . . 1949
RASUL, IMRAN: (See BANDIERA)
RIEDEL, FRANK: Optimal Stopping With Multiple Priors . . . 857
ROCHETEAU, GUILLAUME: (See LAGOS)
SCHEINKMAN, JOSÉ A.: (See HANSEN)
SEO, KYOUNGWON: Ambiguity and Second-Order Belief . . . 1575
SHIMOTSU, KATSUMI: (See KASAHARA)
SIEGEL, RON: All-Pay Contests . . . 71
SINISCALCHI, MARCIANO: Vector Expected Utility and Attitudes Toward Variation . . . 801
STRULOVICI, BRUNO: (See QUAH)
TAMER, ELIE: (See CILIBERTO)
TRÖGER, THOMAS: (See GARRATT)
VIEILLE, NICOLAS: (See HÖRNER)
VOHRA, RAKESH V.: (See FORTNOW)
VUONG, QUANG: (See GUERRE)
WANG, QIYING, AND PETER C. B. PHILLIPS: Structural Nonparametric Cointegrating Regression . . . 1901
WILSON, ROBERT: (See GOVINDAN)
WINDMEIJER, FRANK: (See NEWEY)
ZHANG, LAN: (See MYKLAND)
ZHENG, CHARLES Z.: (See GARRATT)
NOTES AND COMMENTS

ACKERBERG, DANIEL, JOHN GEWEKE, AND JINYONG HAHN: Comments on "Convergence Properties of the Likelihood of Computed Dynamic Models" . . . 2009
ECHENIQUE, FEDERICO, AND IVANA KOMUNJER: Testing Models With Multiple Equilibria by Quantile Methods . . . 1281
FIRPO, SERGIO, NICOLE M. FORTIN, AND THOMAS LEMIEUX: Unconditional Quantile Regressions . . . 953
FORTIN, NICOLE M.: (See FIRPO)
GEWEKE, JOHN: (See ACKERBERG)
GONÇALVES, SÍLVIA, AND NOUR MEDDAHI: Bootstrapping Realized Volatility . . . 283
GOSSNER, OLIVIER, EHUD KALAI, AND ROBERT WEBER: Information Independence and Common Knowledge . . . 1317
HAHN, JINYONG: (See ACKERBERG)
HEYDENREICH, BIRGIT, RUDOLF MÜLLER, MARC UETZ, AND RAKESH V. VOHRA: Characterization of Revenue Equivalence . . . 307
HIRANO, KEISUKE, AND JACK R. PORTER: Asymptotics for Statistical Treatment Rules . . . 1683
KALAI, EHUD: (See GOSSNER)
KARLAN, DEAN, AND JONATHAN ZINMAN: Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment . . . 1993
KARNI, EDI: A Mechanism for Eliciting Probabilities . . . 603
KEANE, MICHAEL P., AND ROBERT M. SAUER: Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior . . . 975
KLEVEN, HENRIK JACOBSEN, CLAUS THUSTRUP KREINER, AND EMMANUEL SAEZ: The Optimal Income Taxation of Couples . . . 537
KOMUNJER, IVANA: (See ECHENIQUE)
KREINER, CLAUS THUSTRUP: (See KLEVEN)
LEE, SOKBAE, OLIVER LINTON, AND YOON-JAE WHANG: Testing for Stochastic Monotonicity . . . 585
LEMIEUX, THOMAS: (See FIRPO)
LINTON, OLIVER: (See LEE)
MEDDAHI, NOUR: (See GONÇALVES)
MÜLLER, RUDOLF: (See HEYDENREICH)
NORETS, ANDRIY: Inference in Dynamic Discrete Choice Models With Serially Correlated Unobserved State Variables . . . 1665
PORTER, JACK R.: (See HIRANO)
RINCÓN-ZAPATERO, JUAN PABLO, AND CARLOS RODRÍGUEZ-PALMERO: Corrigendum to "Existence and Uniqueness of Solutions to the Bellman Equation in the Unbounded Case" . . . 317
RODRÍGUEZ-PALMERO, CARLOS: (See RINCÓN-ZAPATERO)
SAEZ, EMMANUEL: (See KLEVEN)
SAUER, ROBERT M.: (See KEANE)
SHI, SHOUYONG: Directed Search for Equilibrium Wage–Tenure Contracts . . . 561
SINCLAIR-DESGAGNÉ, BERNARD: Ancillary Statistics in Principal–Agent Models . . . 279
STOYE, JÖRG: More on Confidence Intervals for Partially Identified Parameters . . . 1299
SUN, NING, AND ZAIFU YANG: A Double-Track Adjustment Process for Discrete Markets With Substitutes and Complements . . . 933
SWENSEN, ANDERS RYGH: Corrigendum to "Bootstrap Algorithms for Testing and Determining the Cointegration Rank in VAR Models" . . . 1703
UETZ, MARC: (See HEYDENREICH)
VOHRA, RAKESH V.: (See HEYDENREICH)
WEBER, ROBERT: (See GOSSNER)
WHANG, YOON-JAE: (See LEE)
YANG, ZAIFU: (See SUN)
ZINMAN, JONATHAN: (See KARLAN)
ANNOUNCEMENTS

ANNOUNCEMENTS . . . 319, 607, 993, 1329, 1705, 2019
ECONOMETRICA REFEREES 2007–2008 . . . 347
2008 ELECTION OF FELLOWS TO THE ECONOMETRIC SOCIETY . . . 617
FORTHCOMING PAPERS . . . 325, 615, 1001, 1336, 1709, 2023
REPORT OF THE EDITORS 2007–2008 . . . 341
REPORT OF THE EDITORS OF THE MONOGRAPH SERIES . . . 357
REPORT OF THE PRESIDENT . . . 2025
REPORT OF THE SECRETARY . . . 327
REPORT OF THE TREASURER . . . 335
SUBMISSION OF MANUSCRIPTS TO THE ECONOMETRIC SOCIETY MONOGRAPH SERIES . . . 361, 1337
TORSTEN PERSSON
President of the Econometric Society, 2008
First Vice-President of the Econometric Society, 2007
Second Vice-President of the Econometric Society, 2006
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA 1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society’s website: http://www.econometricsociety.org/submissions.asp Only electronic submissions will be accepted. In exceptional cases for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A. 2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing’s website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees. 3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or if he does so during the term in which Econometrica is considering the manuscript, then it is the author’s responsibility to provide Econometrica with details. There is no page fee; nor is any payment made to the authors. 4. Econometrica has the policy that all empirical and experimental results as well as simulation experiments must be replicable. For this purpose the Journal editors require that all authors submit datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so such users must acknowledge all authors as the original creators and Econometrica as the original publishers. If you have compelling reason we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that detailed data description and the programs used to generate the estimation datasets are deposited, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission. 
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage should be stated clearly when the paper is first submitted for review. In each case it will be at the editors’ discretion whether the paper can be reviewed. 5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less. 6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand some papers must be longer, if the main body of a manuscript (excluding appendices) is more than the aforementioned length, it will typically be rejected without review. 7. Additional information that may be of use to authors is contained in the “Manual for Econometrica Authors, Revised” written by Drew Fudenberg and Dorothy Hodges, and published in the July, 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftmanship. One change from the procedures discussed in this document is that authors are not immediately told which coeditor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized. 8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords. 9. If you plan to submit a comment on an article which has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, then indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment. 10. Manuscripts on experimental economics should adhere to the “Guidelines for Manuscripts on Experimental Economics” written by Thomas Palfrey and Robert Porter, and published in the July, 1991 issue of Econometrica. Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA. Copyright © 2009 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Posting of an article on the author’s own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email:
[email protected]. Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Membership, Subscriptions, and Claims Membership, subscriptions, and claims are handled by Blackwell Publishing, P.O. Box 1269, 9600 Garsington Rd., Oxford, OX4 2ZE, U.K.; Tel. (+44) 1865-778171; Fax (+44) 1865-471776; Email
[email protected]. North American members and subscribers may write to Blackwell Publishing, Journals Department, 350 Main St., Malden, MA 02148, USA; Tel. 781-3888200; Fax 781-3888232. Credit card payments can be made at www.econometricsociety.org. Please make checks/money orders payable to Blackwell Publishing. Memberships and subscriptions are accepted on a calendar year basis only; however, the Society welcomes new members and subscribers at any time of the year and will promptly send any missed issues published earlier in the same calendar year.

Individual Membership Rates
                                                                      $ (a)    € (b)    £ (c)    Concessionary (d)
Ordinary Member 2009, Print + Online, 1933 to date                     $60      €40      £32      $45
Ordinary Member 2009, Online only, 1933 to date                        $25      €18      £14      $10
Student Member 2009, Print + Online, 1933 to date                      $45      €30      £25      $45
Student Member 2009, Online only, 1933 to date                         $10      €8       £6       $10
Ordinary Member, 3 years (2009–2011), Print + Online, 1933 to date     $175     €115     £92
Ordinary Member, 3 years (2009–2011), Online only, 1933 to date        $70      €50      £38

Subscription Rates for Libraries and Other Institutions
                                                                      $ (a)    € (b)    £ (c)    Concessionary (d)
Premium 2009, Print + Online, 1999 to date                             $550     €360     £290     $50
Online 2009, Online only, 1999 to date                                 $500     €325     £260     Free

(a) All countries, excluding U.K., Euro area, and countries not classified as high income economies by the World Bank (http://www.worldbank.org/data/countryclass/classgroups.htm), pay the US$ rate. High income economies are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Canadian customers will have 6% GST added to the prices above.
(b) Euro area countries only.
(c) UK only.
(d) Countries not classified as high income economies by the World Bank only.

Back Issues
Single issues from the current and previous two volumes are available from Blackwell Publishing; see address above. Earlier issues from 1986 (Vol. 54) onward may be obtained from Periodicals Service Co., 11 Main St., Germantown, NY 12526, USA; Tel. 518-5374700; Fax 518-5375899; Email
[email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487 General Manager: Claire Sashi (
[email protected]) 2009 OFFICERS ROGER B. MYERSON, University of Chicago, PRESIDENT JOHN MOORE, University of Edinburgh and London School of Economics, FIRST VICE-PRESIDENT BENGT HOLMSTRÖM, Massachusetts Institute of Technology, SECOND VICE-PRESIDENT TORSTEN PERSSON, Stockholm University, PAST PRESIDENT RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2009 COUNCIL (*)DARON ACEMOGLU, Massachusetts Institute of Technology MANUEL ARELLANO, CEMFI SUSAN ATHEY, Harvard University ORAZIO ATTANASIO, University College London (*)TIMOTHY J. BESLEY, London School of Economics KENNETH BINMORE, University College London TREVOR S. BREUSCH, Australian National University DAVID CARD, University of California, Berkeley JACQUES CRÉMER, Toulouse School of Economics (*)EDDIE DEKEL, Tel Aviv University and Northwestern University MATHIAS DEWATRIPONT, Free University of Brussels DARRELL DUFFIE, Stanford University HIDEHIKO ICHIMURA, University of Tokyo MATTHEW O. JACKSON, Stanford University
LAWRENCE J. LAU, The Chinese University of Hong Kong CESAR MARTINELLI, ITAM HITOSHI MATSUSHIMA, University of Tokyo MARGARET MEYER, University of Oxford PAUL R. MILGROM, Stanford University STEPHEN MORRIS, Princeton University ADRIAN R. PAGAN, Queensland University of Technology JOON Y. PARK, Texas A&M University and Sungkyunkwan University CHRISTOPHER A. PISSARIDES, London School of Economics ROBERT PORTER, Northwestern University ALVIN E. ROTH, Harvard University LARRY SAMUELSON, Yale University ARUNAVA SEN, Indian Statistical Institute MARILDA SOTOMAYOR, University of São Paulo JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editor, and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES Australasia: Trevor S. Breusch, Australian National University, CHAIR; Maxwell L. King, Monash University, SECRETARY. Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER. Far East: Joon Y. Park, Texas A&M University and Sungkyunkwan University, CHAIR. Latin America: Pablo Andres Neumeyer, Universidad Torcuato Di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY. North America: Roger B. Myerson, University of Chicago, CHAIR; Claire Sashi, New York University, SECRETARY. South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.
Econometrica, Vol. 77, No. 6 (November, 2009), 1711–1750
A PARSIMONIOUS MACROECONOMIC MODEL FOR ASSET PRICING

BY FATIH GUVENEN¹

I study asset prices in a two-agent macroeconomic model with two key features: limited stock market participation and heterogeneity in the elasticity of intertemporal substitution in consumption (EIS). The model is consistent with some prominent features of asset prices, such as a high equity premium, relatively smooth interest rates, procyclical stock prices, and countercyclical variation in the equity premium, its volatility, and in the Sharpe ratio. In this model, the risk-free asset market plays a central role by allowing non-stockholders (with low EIS) to smooth the fluctuations in their labor income. This process concentrates non-stockholders' labor income risk among a small group of stockholders, who then demand a high premium for bearing the aggregate equity risk. Furthermore, this mechanism is consistent with the very small share of aggregate wealth held by non-stockholders in the U.S. data, which has proved problematic for previous models with limited participation. I show that this large wealth inequality is also important for the model's ability to generate a countercyclical equity premium. When it comes to business cycle performance, the model's progress has been more limited: consumption is still too volatile compared to the data, whereas investment is still too smooth. These are important areas for potential improvement in this framework.

KEYWORDS: Equity premium puzzle, limited stock market participation, elasticity of intertemporal substitution, wealth inequality, Epstein–Zin preferences.
1. INTRODUCTION

SINCE THE 1980S, a vast body of empirical research has documented some interesting and puzzling features of asset prices. For example, Mehra and Prescott (1985) showed that the equity premium observed in the historical U.S. data was hard to reconcile with a canonical consumption-based asset pricing model and, as it later turned out, with many of its extensions. A parallel literature in financial economics found that the equity premium is predictable by a number of variables including the dividend yield, challenging the long-held view that stock returns follow a martingale (Campbell and Shiller (1988)). Other studies have documented that the expected equity premium, its volatility, and the ratio of the two—the conditional Sharpe ratio—move over time

¹ For helpful conversations and comments, I thank Daron Acemoglu, John Campbell, V. V. Chari, Jeremy Greenwood, Lars Hansen, Urban Jermann, Narayana Kocherlakota, Per Krusell, Martin Lettau, Debbie Lucas, Sydney Ludvigson, Rajnish Mehra, Martin Schneider, Tony Smith, Kjetil Storesletten, Ivan Werning, Amir Yaron, and, especially, a co-editor and five anonymous referees, as well as seminar participants at Duke University (Fuqua), Massachusetts Institute of Technology, Ohio State University, University of Montréal, University of Pittsburgh, University of Rochester, UT-Austin, UT-Dallas, NBER Economic Fluctuations and Growth Meetings, NBER Asset Pricing Meetings, the SED Conference, and the AEA Winter Meetings. Financial support from the National Science Foundation under Grant SES-0351001 is gratefully acknowledged. The usual disclaimer applies.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6658
following a countercyclical business cycle pattern (Schwert (1989) and Chou, Engle, and Kane (1992)).

In this paper, I ask if these asset pricing phenomena can be explained in a parsimonious macroeconomic model with two key features: limited participation in the stock market and heterogeneity in the elasticity of intertemporal substitution in consumption (EIS). The limited nature of stock market participation and the concentration of stock wealth even among stockholders are well documented. For example, until the 1990s more than two-thirds of U.S. households did not own any stocks at all, while the richest 1% held 48% of all stocks (Poterba and Samwick (1995) and Investment Company Institute (2002)). As for the heterogeneity in preferences, the empirical evidence that I review in Section 3 indicates that stockholders have a higher EIS than non-stockholders. The interaction of these two features is important, as will become clear below.

I choose the real business cycle model as the foundation to build upon because the poor asset pricing implications of that framework are well known; this helps to highlight the role of the new features considered in this paper. Specifically, I study an economy with competitive markets and a neoclassical production technology subject to capital adjustment costs. There are two types of agents. The majority of households (first type) do not participate in the stock market where claims to the firm's future dividend stream are traded. However, a risk-free bond is available to all households, so non-stockholders can also accumulate wealth and smooth consumption intertemporally. Finally, consistent with empirical evidence, non-stockholders are assumed to have a low EIS, whereas stockholders have a higher elasticity. To clarify the role played by different preference parameters, I employ Epstein–Zin preferences and disentangle risk aversion from the EIS. I find that heterogeneity in risk aversion plays no essential role, whereas heterogeneity in the EIS (and especially the low EIS of non-stockholders) is essential for the results of this paper.

I first examine a benchmark version of the model in which labor supply is inelastic. The calibrated model is consistent with some salient features of asset prices, such as a high equity premium with a plausible volatility, and a low average interest rate. Furthermore, the variability of the interest rate is very low in the U.S. data, which has proved challenging to explain for some previous models that have otherwise successful implications. The standard deviation of the risk-free rate is about 4%–6.5% in the present model, which is still higher than in the U.S. data, but quite low compared to some of these earlier studies. So, the present paper provides a step in the right direction as far as interest rate volatility is concerned. Although there are now several papers that have made progress in explaining these unconditional moments in the context of production economies,² some

² Jermann (1998), Boldrin, Christiano, and Fisher (2001), Danthine and Donaldson (2002), Storesletten, Telmer, and Yaron (2007), and Uhlig (2006), among others.
aspects of asset price dynamics have proved more difficult to generate. The present model is consistent with the procyclical variation in stock prices, the mean reversion in the equity premium, and the countercyclical variation in the expected equity premium, in its volatility, and in the conditional Sharpe ratio. While the model also reproduces the long-horizon predictability of the equity premium, the degree of predictability is quantitatively small compared to the data. This paper, as well as earlier models with limited participation, builds on the empirical observation, first made by Mankiw and Zeldes (1991), that stockholders’ consumption growth is more volatile (and more highly correlated with returns) than that of non-stockholders. Therefore, a high equity premium can be consistent with the relatively smooth per capita consumption process in the U.S. data, since stockholders only make up a small fraction of the population. Existing theoretical models differ precisely in the economic mechanisms they propose for generating this high volatility of stockholders’ consumption growth. The mechanism in this paper differs from earlier studies (most notably, Saito (1995) and Basak and Cuoco (1998)) in some crucial ways. In particular, in these earlier models, non-stockholders consume out of wealth, which they must invest in the bond market given the absence of any other investment opportunity. As a result, each period stockholders make interest payments to non-stockholders, which leverages the capital income of stockholders, thereby amplifying their consumption volatility. Although this is a potentially powerful mechanism, it only works quantitatively if these interest payments are substantial, which in turn requires non-stockholders to own a substantial fraction of aggregate wealth. But, in reality, non-stockholders own only one-tenth of aggregate wealth in the United States, and this counterfactual implication has been an important criticism raised against these models. One contribution of this paper is to propose a new economic mechanism that avoids this counterfactual implication. Specifically, the mechanism results from the interaction of three factors. First, non-stockholders receive labor income every period, which is stochastic, and trade in the bond market to smooth the fluctuations in their consumption. Second, because of their low EIS, non-stockholders have a stronger desire for consumption smoothing— and therefore need the bond market much more—than stockholders (who have a higher EIS and an additional asset for consumption smoothing purposes). However, and third, since the source of risk is aggregate, the bond market cannot eliminate this risk and merely reallocates it across agents. In equilibrium, stockholders make payments to non-stockholders in a countercyclical fashion, which serves to smooth the consumption of non-stockholders and amplifies the volatility of stockholders, who then demand a large premium for holding aggregate risk. As shown in Section 6.3, this mechanism is consistent with a very small wealth share of non-stockholders precisely because it is the cyclical nature of interest payments that is key, and not their average amount (which can very well be zero).
The same mechanism also explains why the equity premium is countercyclical. Essentially, because non-stockholders have very low wealth, they become effectively more risk averse during recessions when their wealth falls even further (because with incomplete markets, value functions have more curvature at low wealth levels). This is not the case for stockholders who hold substantially more wealth. Consequently, during recessions, non-stockholders demand more consumption smoothing, which strengthens the mechanism described above (i.e., increased trade in the bond market, more volatile consumption growth for stockholders), generating a higher premium in recessions. In Section 5, I quantify the contribution of these channels to both the level and the countercyclicality of the equity premium. I also investigate the extent to which labor supply choice can be endogenized without compromising overall performance. Cobb–Douglas utility does not appear to be suitable for this task: it results in a deterioration of asset pricing results and generates labor hours much smoother than in the data. One reason for these results is that these preferences do not allow an independent calibration of the EIS and the Frisch labor supply elasticity, which are both crucial for my analysis. This poor performance perhaps does not come as a surprise in light of the earlier findings: for example, Lettau and Uhlig (2000), Boldrin, Christiano, and Fisher (2001, hereafter BCF), and Uhlig (2006) uncovered various problems generated by endogenous labor supply in asset pricing models and identified certain labor market frictions that successfully overcome these difficulties. Incorporating the same frictions into the model with Cobb–Douglas utility could also improve its performance, although this is beyond the scope of the present paper. Next I consider the utility specification first introduced by Greenwood, Hercowitz, and Huffman (1988, GHH) and find that it performs better: it preserves the plausible asset pricing implications of the model with inelastic labor fairly well and it generates business cycle implications in the same ballpark as existing macroeconomic models. While these results suggest that GHH preferences could provide a promising direction for endogenizing labor supply in this class of models, there is still much room for improvement: consumption volatility remains significantly higher than in the U.S. data, whereas investment volatility is still too low. Therefore, the progress made by this model has been rather limited in tackling these well-known shortcomings shared by many macro-based asset pricing models. A potentially fruitful approach could be to introduce certain labor market frictions, such as wage rigidities, that have been found to improve asset pricing models along these dimensions (Uhlig (2006)). This paper is related to a growing literature that builds models for jointly studying asset prices and macroeconomic behavior. In addition to the papers cited in footnote 2, Danthine and Donaldson (2002) constructed an entrepreneur–worker model, in which the worker lives hand-to-mouth and there is no labor supply choice. In this environment, labor contracts between the two agents act as “operational leverage” and affect asset prices in a way
that is similar to limited participation. Storesletten, Telmer, and Yaron (2007) built a heterogeneous-agent model and showed that persistent idiosyncratic shocks with countercyclical innovation variance generate plausible unconditional moments. As noted earlier, one difference in my paper is the focus on the dynamics of asset prices, which is not studied in these papers. Finally, in terms of integrating recursive preferences into macro-based asset pricing models, an important precursor is Tallarini (2000), who showed how one can fix the EIS and increase the risk aversion in the standard real business cycle (RBC) model to generate a high market price of risk without causing a deterioration in business cycle properties. The present paper goes one step further by introducing limited participation, preference heterogeneity, and adjustment costs, and generates a high equity premium with a risk aversion that is much lower than in Tallarini (2000). Other notable contributions that are contemporaneous with the present paper include Campanale, Castro, and Clementi (2007) and Gomes and Michaelides (2008), who built full-blown macro-based asset pricing models with recursive preferences, and Uhlig (2007), who provided a convenient log-linear framework for asset pricing with recursive preferences, labor–leisure choice, and long-run risk. I further discuss some of these papers later.

The paper is organized as follows. The next section presents the model, and the parametrization is discussed in Section 3. Sections 4 and 5 contain the asset pricing results and the mechanism. The macroeconomic implications are presented in Section 6; Section 7 concludes.

2. THE MODEL

Households

The economy is populated by two types of agents who live forever. The population is constant and is normalized to unity. Let μ ∈ (0, 1) denote the measure of the second type of agents (who will be called stockholders later). Consumers are endowed with one unit of time every period, which they allocate between market work and leisure. I consider three different preference specifications in this paper that can be written as special cases of the Epstein–Zin recursive utility function

(1)    U_t^i = [ (1 − β) u^i(c_t, 1 − l_t) + β ( E_t[ (U_{t+1}^i)^{1−α_i} ] )^{(1−ρ_i)/(1−α_i)} ]^{1/(1−ρ_i)}

for i = h, n, where throughout the paper the superscripts h and n denote stockholders and non-stockholders, respectively, and c and l denote consumption and labor supply, respectively. For the parametrizations I consider below, the risk aversion parameter for static wealth gambles will be proportional to α_i, and the EIS will be inversely proportional to ρ_i, although the precise relationship will also depend on the choice of u. As indicated by the superscripts, the two types are allowed to differ in their preference parameters.
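To illustrate how the recursion in (1) is evaluated numerically, the following is a minimal sketch of one backward step of the aggregator, written for a finite set of next-period states. The felicity value, the probabilities, and the parameter values are placeholder assumptions for illustration, not the paper's calibration, and the curvature conventions simply follow equation (1) as reconstructed above.

```python
import numpy as np

def epstein_zin_step(felicity, U_next, probs, beta, rho, alpha):
    """
    One backward step of the recursion in (1):
        U = [ (1-beta)*u + beta*( E[U'^(1-alpha)] )^((1-rho)/(1-alpha)) ]^(1/(1-rho))
    felicity : current-period u(c, 1 - l), a scalar
    U_next   : array of continuation utilities across next-period states
    probs    : conditional probabilities of those states
    """
    expected_term = np.dot(probs, U_next ** (1.0 - alpha))      # E[U'^(1-alpha)]
    inner = (1.0 - beta) * felicity + beta * expected_term ** ((1.0 - rho) / (1.0 - alpha))
    return inner ** (1.0 / (1.0 - rho))

# Placeholder values purely for illustration:
U = epstein_zin_step(felicity=1.0, U_next=np.array([0.9, 1.1]),
                     probs=np.array([0.5, 0.5]), beta=0.99, rho=0.5, alpha=4.0)
```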
It should be emphasized that the choice of recursive preferences is made mainly for clarity: by disentangling risk aversion from the elasticity of intertemporal substitution, these preferences allow us to examine the impact of heterogeneity in the EIS on asset prices without generating corresponding differences in risk aversion that could confound the inference.³,⁴

³ This expositional advantage notwithstanding, the ability to calibrate risk aversion and EIS separately is not essential for the main substantive results of the paper. This can be seen by comparing the results reported here to the working paper version (Guvenen (2006)), which uses constant relative risk aversion (CRRA) preferences.

⁴ Habit preferences also break the reciprocal relation between the relative risk aversion (RRA) and EIS parameters. However, as is well known, these preferences can generate a high equity premium even in a simple RBC model with inelastic labor supply and capital adjustment costs (Jermann (1998)). Therefore, using habit in the present model would confound the mechanisms created by habit with those studied here (e.g., resulting from limited participation). Having said that, given the popularity of habit preferences in the recent business cycle research (see, e.g., Christiano, Eichenbaum, and Evans (2005)), future work should explore the implications of a model that combines habit formation with limited stock market participation.

The Firm

There is an aggregate firm that produces a single consumption good using capital (K_t) and labor (L_t) inputs according to a Cobb–Douglas technology: Y_t = Z_t K_t^θ L_t^{1−θ}, where θ ∈ (0, 1) is the factor share parameter. The technology level evolves according to

    log(Z_{t+1}) = φ log(Z_t) + ε_{t+1},    ε ~ iid N(0, σ_ε²).

The firm's managers maximize the value of the firm, which equals the value of the future dividend stream generated by the firm, {D_{t+j}}_{j=1}^∞, discounted by the marginal rate of substitution process of firm owners, {β^j Λ_{t+j}/Λ_t}_{j=1}^∞. Specifically, the firm's problem is

(2)    P_t^s = max_{{I_{t+j}, L_{t+j}}} E_t [ Σ_{j=1}^∞ β^j (Λ_{t+j}/Λ_t) D_{t+j} ]

subject to the law of motion for capital, which features "adjustment costs" in investment:

(3)    K_{t+1} = (1 − δ) K_t + Φ(I_t/K_t) K_t.

P_t^s is the ex-dividend value of the firm, and I normalize the number of shares outstanding to unity (for convenience) so that P_t^s is also the stock price. The adjustment cost function Φ(·) is concave in investment, which captures the difficulty of quickly changing the level of capital installed in the firm.
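To fix ideas, the sketch below simulates the technology process and the capital law of motion (3) under a mechanical investment rule. The investment rule, the linear Φ, and all numerical values are illustrative assumptions only; in the model, investment is chosen optimally by the firm and Φ is concave.

```python
import numpy as np

def simulate_technology_and_capital(phi, sigma_eps, delta, T, K0, inv_rate, Phi, seed=0):
    """
    Simulate the exogenous technology process
        log Z_{t+1} = phi * log Z_t + eps_{t+1},  eps ~ iid N(0, sigma_eps^2),
    together with the capital law of motion (3),
        K_{t+1} = (1 - delta) * K_t + Phi(I_t / K_t) * K_t,
    under a fixed (hypothetical) investment rate I_t = inv_rate * Y_t.
    """
    rng = np.random.default_rng(seed)
    theta = 0.3                      # capital share, as in the baseline parametrization
    logZ, K = 0.0, K0
    path = []
    for _ in range(T):
        Z = np.exp(logZ)
        Y = Z * K ** theta           # labor input normalized to 1 for this illustration
        I = inv_rate * Y
        K = (1.0 - delta) * K + Phi(I / K) * K
        logZ = phi * logZ + rng.normal(0.0, sigma_eps)
        path.append((Z, K, Y, I))
    return path

# Hypothetical values and a linear Phi (no adjustment costs), purely for illustration:
path = simulate_technology_and_capital(phi=0.95, sigma_eps=0.01, delta=0.008,
                                       T=120, K0=10.0, inv_rate=0.2, Phi=lambda x: x)
```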
Every period the firm sells one-period bonds, at price P_t^f, to finance part of its investment. The total supply of these bonds is constant over time and equals a fraction, χ, of the average capital stock owned by the firm (as in Jermann (1998), Danthine and Donaldson (2002)). As a result, the firm makes net interest payments in each period in the amount of (1 − P_t^f)χK̄ to bond owners.⁵ An equity share in this firm entitles its owner to the entire stream of future dividends, which is given by the profits net of wages, investment, and interest payments: D_t = Z_t K_t^θ L_t^{1−θ} − W_t L_t − I_t − (1 − P_t^f)χK̄.

⁵ The introduction of corporate debt into this framework allows me to model bonds as a positive net supply asset, which is more realistic. However, the Modigliani–Miller theorem holds in this framework in the sense that stockholders are able to fully undo the effect of leverage in their portfolio. Therefore, the existence of leverage has no effect on quantity allocations, which I have verified by solving the model without leverage.

Financial Markets

In this economy, the firm's equity shares (stocks) and one-period bonds issued by the firm are traded. The difference between the two groups is in their investment opportunity sets: the "non-stockholders" can freely trade the risk-free bond, but they are restricted from participating in the stock market. The "stockholders," on the other hand, have access to both markets and hence are the sole capital owners in the economy. Finally, I impose portfolio constraints as a convenient way to prevent Ponzi schemes.

Individuals' Dynamic Problem and the Equilibrium

In a given period, the portfolio of each group can be expressed in terms of the beginning-of-period capital stock, K, the aggregate bond holdings of non-stockholders after production, B, and the technology level, Z. Let Υ denote the aggregate state vector (K, B, Z). The dynamic programming problem of a stockholder can be expressed as

(4)    V^h(ω; Υ) = max_{c, l, b', s'} { (1 − β) u(c, 1 − l) + β ( E[ V^h(ω', Υ')^{1−α_h} | Z ] )^{(1−ρ_h)/(1−α_h)} }^{1/(1−ρ_h)}

s.t.

(5)    c + P^f(Υ) b' + P^s(Υ) s' ≤ ω + W(K, Z) l,
(6)    ω' = b' + s' (P^s(Υ') + D(Υ')),
(7)    K' = Γ_K(Υ),    B' = Γ_B(Υ),
(8)    b' ≥ B̲,
where ω denotes financial wealth, b' and s' are individual bond and stock holdings, respectively, Γ_K and Γ_B denote the laws of motion for the wealth distribution, which are determined in equilibrium, and P^f is the equilibrium bond pricing function. The problem of a non-stockholder can be written as above with s' ≡ 0 and the superscript h replaced with n. Finally, the stock return and the risk-free rate are defined as usual: R^s' = (P^s' + D')/P^s − 1 and R^f = 1/P^f − 1, and the equity premium is denoted by R^ep ≡ R^s − R^f.

A stationary recursive competitive equilibrium for this economy is given by a pair of value functions, V^i(ω^i; Υ), i = h, n; consumption, labor supply, and bond holding decision rules for each type of agent, c^i(ω^i; Υ), l^i(ω^i; Υ), and b^i(ω^i; Υ); a stockholding decision rule for stockholders, s'(ω^h; Υ); stock and bond pricing functions, P^s(Υ) and P^f(Υ); a competitive wage function, W(K, Z); an investment function for the firm, I(Υ); laws of motion for aggregate capital and the aggregate bond holdings of non-stockholders, Γ_K(Υ) and Γ_B(Υ); and a marginal utility process, Λ(Υ), for the firm owners, such that the following statements hold:
(i) Given the pricing functions and the laws of motion, the value function and decision rules of each agent solve that agent's dynamic problem.
(ii) Given W(K, Z) and the equilibrium discount rate process obtained from Λ(Υ), the investment function, I(Υ), and the labor choice of the firm, L(Υ), are optimal.
(iii) All markets clear: (a) μ b^h(ω̄^h; Υ) + (1 − μ) b^n(ω̄^n; Υ) = χK̄/P^f(Υ) (bond market), (b) μ s'(ω̄^h; Υ) = 1 (stock market), and (c) L(Υ) = μ l^h(ω̄^h; Υ) + (1 − μ) l^n(ω̄^n; Υ) (labor market), where ω̄^i denotes the wealth of each type of agent in state Υ in equilibrium.
(iv) Aggregate laws of motion are consistent with individual behavior: K' = (1 − δ)K + Φ(I(Υ)/K)K and B' = (1 − μ) b^n(ω̄^n; Υ).
(v) There exists an invariant probability measure P defined over the ergodic set of equilibrium distributions.

3. QUANTITATIVE ANALYSIS

The solution to the recursive competitive equilibrium is obtained using numerical methods. In addition to the well-known challenges associated with solving incomplete markets asset pricing models (cf. Krusell and Smith (1997) and Storesletten, Telmer, and Yaron (2007)), the present model is further complicated by the presence of (i) capital adjustment costs, (ii) Epstein–Zin preferences, and (iii) leverage. These features raise a number of issues that we now discuss. First, because markets are incomplete, one cannot solve for allocations first and then obtain prices as is typically done in representative agent models. Instead, allocations and pricing functions must be solved simultaneously.
A PARSIMONIOUS MACROECONOMIC MODEL
1719
ond, the three features mentioned above introduce important nonlinearities into equilibrium functions, making it essential to start the algorithm with good initial guesses and update them very slowly. Third, asset prices are well known to be more sensitive to approximation errors (much more so than quantity allocations). Furthermore, some key variables in the model exhibit significant volatility and are, therefore, not confined to narrow regions of the state space, so the accuracy of local (e.g., log-linear, etc.) approximations is not always easy to ensure. So, instead, I approximate the equilibrium functions over the entire state space using multi-dimensional cubic splines and check that equilibrium conditions ((i)–(v) at the end of Section 2) are satisfied at all grid points. Finally, the wealth distribution (which is a relevant state vector in this model) is not approximated by moments as done by Krusell and Smith (1997), but instead its evolution is tracked exactly via the functions ΓK and ΓB . Of course, this is only feasible here because there are two types of agents, so the wealth distribution has only two dimensions. Overall, the algorithm I use trades speed for precision and is, therefore, not very fast, but does deliver accurate results. I now provide an outline of the algorithm for the baseline model with inelastic labor supply introduced in Section 3.1, which contains all the essential components. A supplemental computational appendix (Guvenen (2010)) contains further details. Step 0—Initialization: Choose appropriate grids for ωh , ωn (each agent’s wealth holdings), and Υ . Off-grid values for all functions are obtained by cubic spline interpolation. Initial guesses for equilibrium functions are obtained by solving a simplified version of the model with no adjustment costs (ξ = ∞), with no leverage (χ = 0), and with CRRA utility (α = ρ). The algorithm for this simplified model is essentially the same as Steps 1–5 below (with no need for Step 2(c) because P s ≡ K), but is much more stable. Let superscript j index the iteration number. Set j = 1 and start the iteration: Step 1—Solve Each Agent’s Dynamic Problem (defined by equations (4)– (8)): These are standard Bellman equations and are solved via value function iteration. The fact that preferences are of the Epstein–Zin form poses no additional difficulty in this step—in fact, it makes it easier in certain ways. More on this in the appendix. j Step 2—Update Equilibrium Functions: Λj−1 I j−1 Dj−1 P sj−1 ΓK . ) j (a) Use c hj (ω; Υ ) and V hj (ω; Υ ) obtained in Step 1 to construct ΛΛj(Υ . (Υ ) Using this stochastic discount factor, solve the firm’s problem ((2) and (3)) to obtain I j (Υ ). (b) Obtain Dj (Υ ) = ZK θ L1−θ − W L − I j (Υ ) − (1 − P fj−1 (Υ ))χK and upj date ΓK (Υ ): K = (1 − δ)K + Φ(I j (Υ )/K)K. 0 (Υ ) ≡ P sj−1 (Υ ) Now iterate for m = (c) Define the temporary variable P m−1 (Υ )) | Υ ] Set P sj (Υ ) = m (Υ ) = E[βΛj (Υ Z )(Dj (Υ ) + P 1 M: P M (Υ ) P
1720
FATIH GUVENEN
Step 3—Update the Bond Pricing Function, P fj−1 : The method here follows Krusell and Smith (1997) closely. First, solve q) Vh (ω; Υ (1−ρ)/(1−α) 1/(1−ρ) (1 − β)c 1−ρ + β E[V hj (ω ; Υ )1−α |Υ ] = max b s
st
c + qb + P sj (Υ )s ≤ ω + W (K Z)
and equations (6)–(8) (and with s ≡ 0 for the non-stockholder). The only difference between this problem and the individual’s original Bellman equation is that here the individual views the current period bond price as some arbitrary parameter q which is not necessarily equal to P fj−1 (Υ ) This problem generates bond holding rules bh (ω; Υ q) and bn (ω; Υ q) which explicitly depend on q Then, at each grid point, search over values of q to find q∗ such that the bond market clears, that is, excess demand is less than 10−8 . Then set P fj (Υ ) = q∗ (Υ ) j Step 4: Obtain ΓB (Υ ): B (Υ ) = (1 − μ) bn (n Υ q∗ (Υ )). Step 5: Iterate on Steps 1–4 until convergence. I require maximum percentage discrepancy (across all points in the state space) between consecutive iterations to be less than 10−6 for P f 10−4 for P s , and 10−6 for aggregate laws of motion. Further tightening these convergence criteria has no noticeable effect. As further described in the appendix, additional checks are conducted to ensure the accuracy of the solution. For example, one should also test whether the stock market clears at all state points, which is not explicitly imposed by the algorithm above. It indeed does: the maximum |μs (h ; Υ ) − 1| < 10−5 Another useful check is to see whether increasing the number of grid points changes the results: doubling the number of points in ω and K directions and tripling them in the B direction (simultaneously) had no noticeable effect on the statistics studied in the paper. Further details are provided in the supplementary computational appendix (Guvenen (2010)). 3.1. Baseline Parametrization A model period corresponds to 1 month of calendar time to approximate the frequent trading in financial markets. Because asset pricing statistics are typically reported at annual frequencies and macroeconomic statistics are reported at quarterly frequencies, I aggregate financial variables and quantities to their respective reporting frequencies to calculate the relevant statistics as explained below. Table I summarizes the baseline parameter choices. The capital share parameter, θ is set to 0.3. The functional form for Φ is specified as a1 (It /Kt )1−1/ξ + a2 , as in Jermann (1998), where a1 and a2 are constants chosen such that the steady state level of capital is invariant to ξ The curvature parameter ξ determines the severity of adjustment costs. As ξ approaches infinity, Φ becomes linear, and investment is converted into capital
A PARSIMONIOUS MACROECONOMIC MODEL
1721
TABLE I BASELINE PARAMETRIZATIONa Parameter
Value
Calibrated Outside the Model β∗ Time discount rate 1/ρh EIS of stockholders 1/ρn EIS of non-stockholders μ Participation rate φ∗ Persistence of aggregate shock θ Capital share ξ Adjustment cost coefficient Depreciation rate δ∗ B Borrowing limit χ Leverage ratio
0.99 0.3 0.1 0.2 0.95 0.30 0.40 0.02 6W 015
Calibrated Inside the Model (to Match Targets) Standard deviation of shock (%) σε∗ αh = αn Relative risk aversion
15/15/11 6
a The asterisks indicate that the reported value refers to the implied quarterly value for a parameter that is calibrated to monthly frequency. W is the average monthly wage rate in the economy. The last two parameters are chosen (i) to match the standard deviation of Hodrick–Prescott filtered output in quarterly data (1.89%) and (ii) to generate an annual Sharpe ratio of 0.25. The standard deviation values refer to CONS/CD/GHH models, respectively.
one for one (frictionless economy limit). At the other extreme, as ξ approaches zero, Φ becomes a constant function and the capital stock remains constant regardless of the investment level (exchange economy limit). I set ξ = 040, which is broadly consistent with the values reported in the empirical literature (see Christiano and Fisher (1998) for a survey of existing estimates). Because there is also a fair amount of disagreement about the correct value of ξ, in Section 6.2, I also conduct sensitivity analysis with respect to this parameter. The calibration of the capital accumulation equation is completed by setting δ to 00066 implying a quarterly depreciation rate of 2%. As for the technology shock, I match the first order autocorrelation of 0.95 of the Solow residuals at quarterly frequencies by setting φ = 0976 at monthly frequency. I discretize the AR(1) process for Zt using a 15-state Markov process. The innovation standard deviation, σε is set later below. Given the absence of idiosyncratic shocks in the present model, it does not seem realistic for borrowing constraints to bind frequently for entire groups of population. Therefore, in the baseline case I calibrate these constraints to be quite loose—equal to 6 months of labor income for both types of agents— which almost never bind in the simulations.6 As for the calibration of the leverage ratio, Masulis (1988, Table 1.3) reported that the leverage ratio (debt/book 6 In the supplementary appendix, I show that if constraints were tight enough to bind frequently, if anything, this raises the equity premium.
1722
FATIH GUVENEN
value) of U.S. firms has varied between 13% and 44% from 1929 to 1986. With my calibration, the leverage ratio in the model is set to 15% of the average equity value and fluctuates between 11% and 32%. Moreover, this calibration also ensures that the firm is always able to pay its interest obligations, so the corporate bond is default-free. Participation Rates The model assumes a constant participation rate in the stock market, which seems to be a reasonable approximation for the period before the 1990s when the participation rate was either stable or increasing gradually (Poterba and Samwick (1995, Table 7)). In contrast, during the 1990s participation increased substantially: from 1989 to 2002 the number of households who owned stocks increased by 74%; by 2002, half of U.S. households had become stock owners (Investment Company Institute (2002)). Modeling the participation boom in this later period would require going beyond the stationary structure of the present model, so instead, I exclude this later period (1992–present) both when calibrating the participation rate and when comparing the model to the data. I set the participation rate in the model, μ to 20%, roughly corresponding to the average rate from 1962 to 1992 (a period during which participation data are available). Note that even during times when participation was higher, households in the top 20% have consistently owned more than 98% of stocks (Poterba and Samwick (1995, Table 9)). Utility Functions I consider three different specifications for the period utility function. First, to provide a simple and well-known benchmark, I begin with the case where labor supply is inelastic (i.e., leisure is not valued) and assume that the period i utility function is of the standard power form: u(c 1 − l) = c 1−ρ This is a useful benchmark that allows a direct comparison to the existing literature where inelastic labor supply is the most common assumption. In addition, this case allows us to illustrate the key mechanisms that result from limited participation in their simplest form. To distinguish between different versions of the model, I will refer to this case as the CONS model. The remaining two specifications feature valued leisure for a full-blown quantitative analysis. The first one features a Cobb–Douglas function (hereafter, the CD model) commonly used in macroeconomic analysis: u(c 1 − l) = i (c γ (1 − l)1−γ )1−ρ . However, one restrictive property of this functional form is i that ρ and γ jointly pin down the EIS, the fraction of time devoted to market work, and the Frisch labor supply elasticity. In other words, choosing the two parameters to match the first two empirical magnitudes automatically pins down the Frisch elasticity, which is a serious restriction given that we are interested in constructing a model that allows us to study macroeconomic quantities and asset prices jointly. To overcome this difficulty I use a third utility function:
A PARSIMONIOUS MACROECONOMIC MODEL
1723
i
u(c 1 − l) = (c − ψ(l1+γ )/(1 + γ))1−ρ , introduced by Greenwood, Hercowitz, and Huffman (1988, hereafter the GHH model). This specification has three distinct parameters that can be chosen to separately target the three parameters mentioned above. This feature will be useful in the analysis that follows. Preference Parameters There is a large body of empirical work that documents heterogeneity in the EIS across the population (see Guvenen (2006) for a more comprehensive review of the empirical evidence). These studies find that, by and large, nonstockholders (and the poor, in general) have an elasticity of substitution that is very low—close to zero—while stockholders (and the wealthy, in general) have an EIS that is higher. For example, Blundell, Browning, and Meghir (1994) estimated that households in the top income quintile have an EIS that is three times that of households in the bottom quintile of the distribution. Similarly, Barsky, Juster, Kimball, and Shapiro (1997) estimated the distribution of the EIS parameter in the population and found the average to be below 02, but also found the highest percentiles to be exceeding unit elasticity. One theoretical explanation for this observed heterogeneity was provided by Browning and Crossley (2000). They started with a model of choice where agents consume several goods with different income elasticities. Because the budget share of luxuries rises with wealth, the aggregate consumption bundle of wealthy individuals has more goods with high income elasticities than that of poor individuals. Browning and Crossley (2000) proved that this observation also implies that luxuries are easier to postpone than necessities and, consequently, that the EIS (with respect to total consumption) increases with wealth. Since stockholders are substantially wealthier than non-stockholders, this also implies heterogeneity in the EIS across these two groups as found in these studies. To broadly capture the empirical evidence described above, I set the EIS of non-stockholders to 01 and assume an EIS that is three times higher (0.3) for stockholders (in all versions of the model).7 Finally, I set β equal to 09966 (monthly) so as to match the U.S. capital–output ratio of 25 in annual data. 7
Although in this paper I do not explicitly model the source of the heterogeneity in the EIS (so as not to add another layer of complexity), one way to do this would be by assuming nonhomothetic preferences following Browning and Crossley’s analysis. Specifically, suppose that both agents have identical utility functions that feature “benchmark consumption levels”: uit = (cti − aC t )1−ρ /(1 − ρ) where C t is aggregate consumption. With this specification, the EIS is rising with the consumption (and, therefore, the wealth) of an individual. Furthermore, wealth inequality in this framework is mainly due to limited participation and is quite robust to changes in the curvature of the utility of both agents (see Guvenen (2006) for a detailed analysis of this point). Therefore, with these preferences, stockholders continue to be much wealthier and, consequently, consume more than non-stockholders. By choosing ρ and a appropriately, one can generate the same EIS values assumed (exogenously) in the present paper. The supplemental appendix presents the details of such a calibration that broadly generates the same asset pricing results as in the CONS model.
1724
FATIH GUVENEN
With Cobb–Douglas preferences, there is only one additional parameter, γ which is chosen to match the average time devoted to market activities (036 of discretionary time). I continue to keep the EIS values of both groups as above. However, as noted above, γ and ρi also determine the Frisch labor supply elasticity, which means that assuming heterogeneity in the EIS also implies unintended heterogeneity in the Frisch elasticity: 1.35 for stockholders and 0.69 for non-stockholders. Although such heterogeneity is difficult to justify with any empirical evidence I am aware of, there seems to be no practical way to get around this problem with CD preferences; I will return to this caveat later. The GHH specification provides more flexibility, with one additional parameter. The Frisch elasticity is now equal to 1/γ for both types of agents, which I set equal to 1. This value is consistent with the estimates reported in Kimball and Shapiro (2003). However, there is a fair degree of disagreement in the literature about the correct value of this parameter, so I also discuss below the effect of different values for γ on the results. The average hours worked is given by L = (W (κ)/((1 + γ)κ))1/γ , where W (κ) is the average wage rate in the economy whose dependence on κ is made explicit. For a target value of L = 036, this equation is solved to obtain the required value of κ. The existing empirical evidence on the risk aversion parameter is much less precise than one would like. Moreover, the limited evidence available pertains to the population average, whereas what will matter for asset prices in this framework is the risk aversion of stockholders, who constitute only a small fraction of the population, making those average figures even less relevant. Therefore, I calibrate the risk aversion of stockholders indirectly, that is, by matching the model to some empirical targets. (I also conduct a sensitivity analysis in Section 6.2.) Specifically, I first consider the CONS model. I choose the two parameters that are free at this point (αh σε ) to match two empirical targets: (i) the volatility of Hodrick–Prescott filtered quarterly output (189%) and (ii) an annual Sharpe ratio of 025. I then set the risk aversion of non-stockholders equal to the same value. My target value for the Sharpe ratio is somewhat lower than the 032 figure in the U.S. data (Table II). This is because forcing the model to explain the full magnitude of the risk premium is likely to come at the expense of poor performance in other areas, such as macroeconomic behavior or asset price dynamics, which I am also interested in analyzing. The present choice is intended to balance these different considerations. For practical considerations, I restrict the parameter search to integer values in the αh direction (from 2 to 10) and consider 01 increments in the σε direction (from 01% to 2%). I minimize an equally weighted quadratic objective written in the percent deviation from each empirical target. The minimum is obtained for σε = 15% (quarterly standard deviation) for the CONS model with αh = 6 For the CD and GHH models, I keep the risk aversion parameter at this value and choose σε in each case to match output volatility. The
1725
A PARSIMONIOUS MACROECONOMIC MODEL TABLE II
UNCONDITIONAL MOMENTS OF ASSET RETURNS: MODEL WITH INELASTIC LABOR SUPPLY U.S. Data αh αn
CONS Model 6/6 0.3/0.1
6/6 0.3/0.3
6/6 0.1/0.1
6/12 0.3/0.1
E(Rep ) σ(Rep ) σ(Rs ) E(Rep )/σ(Rep ) E(Rf ) σ(Rf )
A. Stock and Bond Returns 617 (199) 546 244 194 (141) 219 153 193 (138) 206 147 032 (011) 025a 016 194 (054) 131 320 544 (062) 665 455
765 272 270 028 024 852
552 220 208 025 135 671
E(P s /D) σ(log(P s /D)) σ( log D)
221 (058) 263 (167) 134 (094)
295 387 242
271 269 191
1/ρh 1/ρn
σ( log c h ) σ( log c n )
B. Price–Dividend Ratio 272 259 266 138 191 140
C. Consumption Growth Volatility >1.5–2.0b 242 078
112
244
a The Sharpe ratio of 0.25 is one of the two empirical targets in the calibration. All statistics are reported in annualized percentages. Annual returns are calculated by summing log monthly returns. The numbers in parentheses in the second column are the standard errors of the statistics to reflect the sampling variability in the U.S. data. b The value reported here represents an approximate lower bound for this ratio based on the empirical evidence discussed in Section 5.1.
resulting values are σε = 15% in the CD model and σε = 11% in the GHH model.8 These values of the innovation standard deviation are close to the values used by BCF, Danthine and Donaldson (2002), and Storesletten, Telmer, and Yaron (2007) in a context similar to ours.9 Nevertheless, these figures are quite high compared to the direct estimate of the volatility of Solow residuals for the postwar period, which is about 0.7%. This suggests that it may be more appropriate to interpret the exogenous driving source in this class of models as encompassing more than just technology shocks (such as fiscal policy shocks, among others).
8
These volatility figures are helped a bit by the choice of a monthly decision frequency and time aggregation of the resulting simulated data. If, instead, the model were solved at quarterly frequency, the values of σε necessary to match the same targets would be 10%–15% higher, depending on specification. 9 BCF used permanent shocks with σε = 18% per quarter; Storesletten, Telmer, and Yaron (2007) also used permanent shocks with σε = 33% per year. Danthine and Donaldson (2002) used a two-state Markov process with persistence of 0.97 and a σε = 56% per quarter.
1726
FATIH GUVENEN
4. MODEL RESULTS: ASSET PRICES 4.1. The Unconditional Moments of Asset Prices I begin by discussing the unconditional moments of stock and bond returns, and then turn to the conditional moments in the next section. Table II displays the statistics from the simulated CONS model along with their empirical counterparts computed from the historical U.S. data covering the period 1890–1991.10 I first examine the inelastic labor supply case. This case provides a useful benchmark, both because it is the most common case studied in the literature and because it allows one to understand the key mechanisms generated by limited participation without the added complexity of labor supply choice. The Equity Premium As shown in the third column of Table II, in the calibrated model the target Sharpe ratio of 025 is attained with a moderately high risk aversion of 6. Clearly, a given Sharpe ratio can be generated by many different combinations of equity premium and volatility, so matching this target does not say anything about the numerator and the denominator. The corresponding equity premium is 545%, which is slightly lower than the historical figure of 62%. The volatility of the equity premium is 219% compared to 194% in the data. Therefore, the model generates an equity premium with mean and volatility that are in the right ballpark compared to the data. The Mechanism The high equity premium is generated by a general equilibrium mechanism that amplifies stockholders’ consumption growth volatility and does so in a procyclical fashion, causing them to demand a high equity premium. Specifically, the mechanism results from the interaction of three features of the model, which reinforce each other. First, limited participation creates an asymmetry in consumption smoothing opportunities: facing persistent (aggregate) labor income shocks, non-stockholders have to exclusively rely on the bond market, whereas stockholders have another margin—they can also adjust their capital holdings. Second, because of their low EIS, non-stockholders have a stronger desire for a smooth consumption process compared to stockholders. The combination of these two effects implies that non-stockholders need the bond market much more than stockholders. Third, and importantly, the bond market is not a very effective device for consumption smoothing in the face of aggregate 10 The data are taken from Campbell (1999). The stock return and the risk-free rate are calculated from Standard and Poor’s 500 index and the 6-month commercial paper rate (bought in January and rolled over in July), respectively. All returns are real and are obtained by deflating nominal returns with the consumption deflator series available in the same data set.
A PARSIMONIOUS MACROECONOMIC MODEL
1727
risk, because it merely reallocates the risk rather than reducing it, as would be the case if shocks were idiosyncratic. As a result, non-stockholders’ desire for smooth consumption is satisfied via trade in the bond market, at the expense of higher volatility in stockholders’ consumption. Moreover, since these large fluctuations in stockholders’ consumption are procyclical, they are reluctant to own the shares of the aggregate firm that performs well in booms and poorly in recessions. Therefore, they demand a high equity premium. In Section 5, I quantify the role of this mechanism and contrast it with earlier models of limited participation, such as Saito (1995) and Basak and Cuoco (1998). The Risk-Free Rate Turning to the risk-free rate, the mean is 13%, which compares well to the low average interest rate of 19% in the data. It is important to note that the low risk-free rate is helped by the fact that the model abstracts from longrun growth and preferences are of the Epstein–Zin form. To see this, consider the following expression for the log risk-free rate, which holds as a fairly good approximation:11 (9)
f
h rt ≈ − ln β + ρh Et ( log ct+1 ) + κ
where κ contains terms that involve the volatility of consumption and wealth, which turns out to be secondary for the present discussion. With secular growth, the consumption growth term on the right-hand side would be nonzero—unlike in the present model—pushing the average riskfree rate up. For example, taking an annual growth rate of 15%, and setting f ρh = 333 as calibrated above would imply rt = 585%. As is well known, this “risk-free rate puzzle” is even more severe with CRRA utility, because in this case it would be the risk aversion parameter that would appear in front of the f second term, which is αh = 6 in this case, implying rt = 102%. This discussion reiterates the well-known point that models with CRRA utility functions and long-run growth that match the equity premium typically imply a high average interest rate. Epstein–Zin preferences mitigate this problem if one assumes an EIS that is higher than the reciprocal of the risk aversion parameter, as is the case here. Another well-documented feature of the interest rate—and as it turns out, a challenging one to explain—is its low volatility. The standard deviation is 544% in the historical sample, although different time periods (such as the postwar sample) can yield values as low as 2% per year (see, e.g., Campbell (1999) for a discussion). The corresponding figure is 665% in the model (and 11
For an exact derivation of this expression, human wealth would need to be tradeable. Although this is not the case in the present model, the equation holds fairly well and provides a useful approximation.
1728
FATIH GUVENEN TABLE III
UNCONDITIONAL MOMENTS OF ASSET RETURNS: MODEL WITH ELASTIC LABOR SUPPLY U.S. Data Endogenous Leisure?
Model CD
GHHBaseline
BCF
D-D
GHH(Low Frisch)
Linear
None
E(Rep ) σ(Rep ) σ(Rs ) E(Rep )/σ(Rep ) E(Rf ) σ(Rf )
617 (199) 194 (141) 187 (138) 032 (011) 194 (054) 544 (062)
A. Stock and Bond Returns 265 421 154 174 148 165 017 024 287 142 491 410
403 181 179 022 173 446
663 — 184 036a 120 246
523 253 243 021 132 1061
E(P s /D) σ(log(P s /D)) σ( log D)
221 (058) 263 (167) 134 (094)
B. Price–Dividend Ratio 257 247 136 178 148 112
259 191 119
— — —
— — 455
a BCF reported the time average of the conditional Sharpe ratio, E (Rep )/σ (RS ), instead of the unconditional t t Sharpe ratio reported in the present paper. The statistics from BCF refer to their benchmark model (called the preferred two sector model), which has the best overall performance. The statistics from Danthine and Donaldson (2002, D-D) are from their Table 6, right column with smoothed dividends, which generates the best overall performance. A dash indicates that the corresponding statistic has not been reported in that paper. In the fifth column, the Frisch elasticity is set to 0.5 and the σε is recalibrated to 1.3% per quarter to match output volatility.
further falls to 41% with endogenous labor supply below). Although this figure is higher than the empirical values, the low volatility of the interest rate has turned out to be quite difficult to generate, especially in macro-based asset pricing models. For example, as I report in Table III, this volatility is 246% in BCF and 106% in Danthine and Donaldson (2002); it is 115% in Jermann (1998) (not reported). Thus, the present model provides a step in the right direction. So what explains the relatively low variability of interest rates in the model? To understand the mechanism, consider the bond market diagram in Figure 1. The left panel depicts the case of a representative agent with a low EIS, which is a feature common to the models mentioned above. For example, both the endogenous and the external habit models imply a low EIS (despite differing in their risk aversion implications). With a low EIS, however, the interaction of the resulting inelastic (steep) bond demand curve with a bond supply that is perfectly inelastic at zero (because of the representative–agent assumption) means that even small shifts in the demand curve—due to labor income shocks and the consequent change in the demand for savings—generate large movements in the bond price and, hence, in the risk-free rate. In the present model, the mechanism is different. First, for the following discussion it is convenient to label non-stockholders’ bond decision rule as the “bond demand” and the negative of stockholders’ bond decision rule as the “bond supply.” Now notice that the majority—80%—of the population (non-
A PARSIMONIOUS MACROECONOMIC MODEL
1729
FIGURE 1.—Determination of bond price volatility.
stockholders) have a very low EIS as before, implying very inelastic bond demand (right panel). Turning to bond supply, the key difference here is that it is not inelastic at all. In fact, the stockholders’ supply curve is rather flat, both because of their higher EIS and because they have another asset—equity in the firm—that can act as a partial substitute for bond. As a result, a shift in the bond demand curve (resulting from fluctuations in non-stockholders’ labor income) of similar magnitude as before now results in smaller fluctuations in the interest rate, and the rest is reflected in the variability of trade volume. The Price–Dividend Ratio The average price–dividend (P/D) ratio in the CONS model is 272, which is about 20% higher than the average of 221 in the data. Its volatility is 266%, which compares fairly well with the empirical figure (262%). Finally, the volatility of dividend growth is 191%, which is too volatile compared to the 134% figure in the U.S. data. This high volatility is due to the leverage in the capital structure and is one of the dimensions with which the labor supply choice will help. 4.2. The Role of Preference Heterogeneity One advantage of Epstein–Zin utility is that it allows me to easily examine the effects of heterogeneity in the EIS and risk aversion parameters on asset prices. I conduct three experiments reported in the last three columns of Table II. First (fourth column), I keep all aspects of the baseline parametrization intact, but only increase non-stockholders’ EIS from 0.1 to 0.3, which eliminates
1730
FATIH GUVENEN
all preference heterogeneity from the model. With this change, the equity premium falls significantly, from 55% to 244%, and its volatility falls from 219% to 153%. More importantly, the price of risk falls from 025 to 016. Moving down the column, the volatilities of all variables go down—by 30%–50%. This makes some variables, such as the P/D ratio, too smooth compared to the data, while bringing some others closer to their empirical counterparts, such as the interest rate and dividend growth volatilities. Overall, these results show that the EIS of non-stockholders has a major impact on asset prices, perhaps most importantly on the equity premium and the Sharpe ratio, which are key statistics that this model seeks to explain. Second, an alternative way to eliminate preference heterogeneity is by reducing the EIS of stockholders from 03 to 01, which, as could be anticipated, has qualitatively the opposite effect (fifth column). The equity premium now increases to 765%, but the volatility is also higher at 272%. As a result, the rise in the Sharpe ratio remains rather modest: it is 028, up from 025. Finally, other volatilities—that of the interest rate, P/D ratio, and dividend growth— are also significantly higher. Therefore, stockholders’ EIS has a larger effect on volatilities, but a smaller effect on the Sharpe ratio. The two experiments above also reveal another role played by the EIS heterogeneity. As mentioned earlier, in the U.S. data, stockholders’ consumption growth is more volatile than non-stockholders’: σ( log c h )/σ( log c n ) > 15 − 20 (see Section 5.1 for more details). The baseline CONS model is consistent with this empirical fact, generating a ratio of volatilities equal to 242 (panel C of Table II). However, if preference heterogeneity is eliminated, this ratio falls from 242 to 078 in the fourth column and to 112 in the fifth column. This is because stockholders have access to two separate assets that they can use to smooth consumption effectively and, absent preference heterogeneity they choose to do exactly that. (Notice that this result also explains why reducing stockholders’ EIS in the fifth in column above had a small effect on the Sharpe ratio.) To summarize the findings so far, the low EIS of non-stockholders plays an important role in generating a high Sharpe ratio, whereas lowering stockholders’ EIS has a larger effect on asset price volatilities, but only a small effect on the Sharpe ratio; furthermore, it creates a counterfactually low consumption growth volatility for this group. The baseline calibration—with EIS heterogeneity—generates results consistent with both asset prices and the relative consumption volatility of the two groups. Third, and finally, in the last column, I examine the effect of non-stockholders’ risk aversion by doubling it to 12. Comparing the sixth column to the baseline case in the third column shows that this change has a minor effect, if at all, across the board. Surprisingly, doubling the risk aversion of 80% of the population has very little impact on the unconditional moments of asset prices. Loosely speaking, this is due to the fact that non-stockholders’ only direct effect on asset prices is through the bond market and their (precautionary) bond
A PARSIMONIOUS MACROECONOMIC MODEL
1731
demand is largely determined by their EIS, but very little influenced by their risk aversion.12 4.3. Asset Prices With Endogenous Labor Supply In the previous section, I found that the benchmark model with inelastic labor supply generated plausible behavior for the unconditional moments of stock and short-term bond returns. Nevertheless, labor supply choice is central for any serious macroeconomic analysis. Therefore, I now relax the inelastic labor supply assumption and consider the two utility functions—Cobb–Douglas and GHH—described above. Results Table III reports the results (third to fifth columns). To provide a comparison, the last two columns display the same set of statistics from two leading macro-based asset pricing models proposed in the existing literature, namely BCF and Danthine and Donaldson (D-D) (2002). The first paper features an endogenous labor–leisure choice, whereas the latter paper has inelastic labor and is more comparable to the model analyzed in the previous section. With Cobb–Douglas utility (CD model, third column), the first point to observe is the rather large fall in the equity premium, which is now 265% (compared to 55% with inelastic labor supply), accompanied by a smaller fall in the volatility (which is 154% compared to 219% before), resulting in a fall in the Sharpe ratio from 025 to 017. Moving down the column, notice that both the risk-free rate and the dividend growth are less volatile than before and now are closer to their respective empirical values. The P/D ratio is also less volatile and is now too smooth compared to data. Overall, I view these results as a step backward compared to the model with inelastic labor supply. I next turn to the GHH model in the fourth column, which delivers a more respectable equity premium level of 42%, with a volatility of 175%. The resulting Sharpe ratio is 024, slightly lower than in the model with inelastic labor supply. Furthermore, the volatility of the risk-free rate is 41%, which is lower than the model with inelastic labor supply (at 665%). As noted earlier, this low volatility is also an important improvement of this model over earlier production economy models. As for the price–dividend ratio, the volatility is about 70% that in the data, but higher than in the model with CD preferences. The volatility of dividend growth now falls to 112% annually. Overall, the model with GHH preferences displays asset pricing behavior that is comparable to the model with inelastic labor supply. In the next column, I explore the impact of lowering the Frisch elasticity closer to the values reported in the microeconomics literature. I set 1/γ to 0.5 12 Furthermore, in the supplementary appendix, I also report the results on asset price dynamics for αn = 12 and find that they are very similar to the case with αn = 6
1732
FATIH GUVENEN
following Domeij and Floden (2006), who surveyed existing microeconometric studies and argued that part of the reason for the very low estimates is the bias that arises from ignoring borrowing constraints. This change in calibration has a relatively modest effect on statistics: most notably, the equity premium falls slightly to 4% and the Sharpe ratio falls to 0.22. As I discuss in Section 6.1, the main drawback of this calibration is that it implies a labor hours process that is too smooth, which is also why macroeconomists typically use higher values similar to those in the baseline calibration. It is useful to compare these results to earlier studies. In the working paper version, BCF showed that introducing a flexible labor supply choice reduces the equity premium substantially, from 445% to 015%, and reduces the Sharpe ratio from 027 to 003.13 One goal of their paper, then, is to identify some prominent labor market frictions that overcome this difficulty associated with endogenous labor supply. Their baseline model features such a framework that matches the historical equity premium as well as a number of other unconditional moments (reported in the sixth column of Table III). In comparison, here flexible labor supply choice has a smaller negative impact on the Sharpe ratio, especially with GHH preferences. This difference stems from two sources. First, BCF employed a specification where utility is linear in leisure and separable from consumption. Consequently, fluctuations in labor hours have no direct effect on utility, which makes it relatively costless to smooth consumption fluctuations by adjusting one’s labor supply. The only loss comes from the fact that to smooth fluctuations, labor supply would need to move in the opposite direction dictated by the substitution effect: rise in recessions when wages are low and fall in expansions when wages are high. In contrast, with the nonseparable preferences in this paper, agents do care about fluctuations in leisure as well as how leisure comoves with consumption, which makes it more costly to adjust labor supply to suppress fluctuations in the marginal utility of consumption. Second, with GHH preferences there is no wealth effect on labor supply choice, so labor hours are strongly procyclical due to the substitution effect of wages over the business cycle, making it an even less effective tool for smoothing fluctuations in marginal utility. As a result, the price of risk does not fall as much in this framework when labor supply choice is endogenized. 4.4. The Dynamics of Asset Prices I now examine the extent to which the limited participation model captures some salient aspects of asset price dynamics. As mentioned earlier, some of these features, such as the countercyclical price of risk, have been difficult to generate in some asset pricing models. The results reported below are from 13 See Boldrin, Christiano, and Fisher (1999, Table II). The statement in the text is based on comparing the results in column 4 to column 8.
1733
A PARSIMONIOUS MACROECONOMIC MODEL
the GHH model, but I found these statistics to be remarkably robust across the different specifications analyzed in this paper.14 4.4.1. Mean Reversion and Predictability of Returns I begin with the price–dividend ratio, which is procyclical in the model, consistent with empirical evidence, with a correlation of 0.73 with output. The procyclicality follows from the fact that when the persistent technology shock hits the economy, the stock price capitalizes all future productivity gains upon impact and thus increases substantially, while the initial response of dividends is muted due to the gradual rise of the capital stock after the shock (because of adjustment costs), making the ratio of the two variables procyclical. A second well-known observation concerning the equity premium is that it tends to revert to its mean over time. The first row of Table IV displays the autocorrelations of the equity premium from the U.S. data. Notice that these autocorrelations are close to zero, which makes it difficult to measure them precisely. As a result, they are not uniformly negative. To circumvent this problem, an alternative statistic used in the literature aggregates consecutive autocorrelation coefficients, which reveals a stronger pattern of mean reversion (third row). The second and fourth rows display the model counterparts of the two measures of mean reversion, which are consistent with the signs and rough magnitudes of these statistics in the data. The finding of mean reversion is a clear departure from the martingale hypothesis of returns and is closely linked to the predictability of returns documented by Campbell and Shiller (1988), among others. To show this, I first regress log stock returns on the log price–dividend ratio using the historical TABLE IV THE AUTOCORRELATION STRUCTURE OF KEY FINANCIAL VARIABLESa Autocorrelation at Lag j (years)
rs − rf U.S. data Model
j s f s f i=1 ρ[(r − r )t (r − r )t−i ] U.S. data Model
1
2
3
5
7
003 −003
−022 −003
008 −002
−014 −002
010 −002
003 −003
−019 −006
−011 −008
−029 −013
−015 −015
a Statistics from the model are calculated from annualized values of each variable, except for the stock price, which is taken to be the value at the end of the year.
14 The supplementary appendix reports the counterparts of the tables in this section for the CONS and CD models.
1734
FATIH GUVENEN TABLE V LONG-HORIZON REGRESSIONS ON PRICE–DIVIDEND RATIOa U.S. Data Horizon (k)
Coefficient
Model R2
Coefficient
R2
1 2 3 5 7
A. Stock Returns −021 0.07 −036 0.12 −041 0.13 −070 0.23 −087 0.27
−019 −040 −049 −077 −091
0.09 0.16 0.22 0.28 0.35
1 2 3 5 7
B. Excess Returns −022 0.09 −039 0.14 −047 0.15 −077 0.26 −094 0.33
−011 −020 −029 −042 −049
0.02 0.04 0.06 0.10 0.12
a The table reports the coefficients and R2 values of the regression of log stock returns (top panel) and log excess returns (bottom panel) on the log price– dividend ratio. Horizon (k) is in years.
U.S. data (panel A of Table V). The well-known pattern documented in the literature can been seen here: the coefficients are negative, indicating that a high price–dividend ratio forecasts lower returns in the future. Moreover, both the coefficients and the R2 values are increasing with horizon. The model counterpart is reported in the last two columns. The coefficient estimates and the R2 ’s are consistent with empirical results: predictability is modest at the 1-year horizon, but increases steadily, reaching 35% at the 7-year horizon. The coefficients also increase quickly first and then grow more slowly. Overall, the model generates significant predictability of stock returns. Turning to excess returns in panel B of Table V, the predictability in the U.S. data is evident again. The last two columns report the same regressions using simulated data. The coefficients on the price–dividend ratio are negative and increase in absolute value with horizon consistent with the data. However, the amount of predictability is smaller than in the data: the R2 of the regression is only 12% at the 7-year horizon compared to 33% in the data. Thus, the model is qualitatively consistent with excess return predictability, but does not quantitatively capture the total magnitude observed in the U.S. data. 4.4.2. Countercyclical Variation in the Price of Risk I now examine the cyclical behavior of the conditional moments. Figure 2 plots the typical simulated paths of the expected equity premium and Sharpe
A PARSIMONIOUS MACROECONOMIC MODEL
1735
FIGURE 2.—The cyclical behavior of expected excess returns and the Sharpe ratio.
ratio against output over a period of 200 years.15 Consistent with the U.S. data (Chou, Engle, and Kane (1992)), both the expected premium and the Sharpe ratio rise visibly in recessions and fall in expansions (correlation with output is −055 and −063 respectively). Moreover, the conditional volatility is also countercyclical and has a correlation with output of −042 (to save space, the plot is not included in the figure). The conditional moments are also quite variable: the 95% confidence interval for the expected excess return extends from 13% to 48%, and the 95% confidence interval for its conditional volatility extends from 91% to 185%. The fact that expected returns are more variable than conditional volatility (as measured by their coefficient of variation) results in the countercyclicality of the Sharpe ratio mentioned above. With few exceptions,16 this countercyclicality of the market price of risk has been difficult to generate in consumption-based asset pricing models. In the next section, I explain the new mechanism in this model which generates this result. 5. UNDERSTANDING THE SOURCES OF EQUITY PREMIUM The large and countercyclical equity premium arises from the interaction of (i) limited stock market participation, (ii) the low EIS of non-stockholders, and (iii) consumption smoothing in the face of persistent (aggregate) labor income shocks. What distinguishes the mechanism in this paper from earlier 15
Each series is rescaled by its standard deviation and shifted vertically to fit in the same graph. Thus, while the amplitudes are not informative, the comovement between the series is preserved. 16 Most notably, Campbell and Cochrane (1999) and Bansal and Yaron (2004).
1736
FATIH GUVENEN
models of limited participation is (ii) and (iii). These two features combine to make the timing of trade in the bond market the driving force behind the high equity premium. In this section I quantify this mechanism. Because labor supply choice adds another level of complexity to the analysis, in this section, I focus on the CONS model. 5.1. Why Is There a High Equity Premium? The annual standard deviation of consumption growth in the CONS model is 361% for stockholders, but only 148% for non-stockholders. To quantify the sources of stockholders’ consumption volatility, I first substitute the equilibrium conditions (iii)(a) and (b) (from the end of Section 2) as well as the expression for dividends into the budget constraint of a stockholder, and after some straightforward manipulation, I obtain
θZK θ L1−θ − I (B − P f B ) (10) − ch = W + μ μ Ah
ah
This expression provides a useful decomposition. Part of the variation in stockholders’ consumption comes from aggregate sources, that is, fluctuations in wage and capital income, which is denoted by Ah . The scaling factor 1/μ that appears in Ah reflects the fact that aggregate capital income risk is concentrated among stockholders, since they are the sole capital owners in this economy. However, this risk only has a modest contribution to the volatility of their consumption, as will become clear shortly. A second part of variability arises from trade with the non-stockholders in the bond market: the component denoted by ah ≡ (B − P f B )/μ measures the net interest payments received by non-stockholders, which is made by stockholders indirectly through the firm they own. Using equation (10), stockholders’ consumption growth volatility can be approximated as h
a h h (11) var( log c ) ≈ var( log A ) + var h A
ah h + 2 cov log A − h A Similarly, non-stockholders’ consumption can be written as c n = W + (B − P B )/(1 − μ) and the variance of their consumption growth is obtained by replacing Ah and ah in equation (11) with An ≡ W and an ≡ −ah μ/(1 − μ) respectively. The top panel of Table VI shows that interest payments are very small on average, making up a small fraction of consumption: E(ai )/E(c i ) is less than f
1737
A PARSIMONIOUS MACROECONOMIC MODEL TABLE VI DECOMPOSITION OF CONSUMPTION VOLATILITY AND EQUITY PREMIUMa Fraction of E(c i ) Explained by
Stockholders Non-stockholders
E(Ai )/E(c i )
E(ai )/E(c i )
1.011 0.995
0.011 0.005
Fraction of σ 2 ( log c i ) Accounted for by σ 2 ( log c i )
Stockholders Non-stockholders
2
(3.61%) (1.48%)2
σ 2 ( log Ai )
i σ 2 ( a i )
i 2σ( log Ai − a i )
0.185 3.13
0.340 0.61
0.475 −2.74
A
A
a i = h n. See the text and equation (10) for the definitions of ai and Ai .
101% for both stockholders and non-stockholders. However, interest payments are volatile and, more importantly, they vary systematically with the business cycle. This is shown in the lower panel of Table VI, which displays the fraction of consumption growth variance explained by each of the three terms in equation (11). For stockholders, only 185% of consumption growth variance is attributable to fluctuations in aggregate income, despite the fact that this component makes up nearly all of their average consumption (row 1). Hence, the concentration of aggregate capital income risk, included in Ah , contributes only modestly to consumption fluctuations and, consequently, to the equity premium. The main source of volatility for stockholders comes from the bond market: interest payments, ah account for the remaining three-quarters of variance (034 + 0475). What is really crucial for this extra volatility is the timing of interest payments: corr( log Ah (ah /Ah )) = −0947 (which can be calculated from the third row of Table VI), which means that the payments received by the non-stockholders increase exactly when aggregate income falls, that is, in recessions.17 Therefore, consumption smoothing for nonstockholders comes at the expense of large fluctuations in stockholders’ consumption, so the covariance term in the third column accounts for nearly half of the total volatility. The flip side of this story is seen in the variance of nonstockholders: var( log c n ) would be 3.13 times higher were it not for the consumption smoothing provided by stockholders through the bond market (that is, if the bond market were shut down, so aht ≡ 0). This negative correlation is not due to the appearance of Ah in the first term as well as in the denominator of the second term. To verify this, I also calculated corr( log Ah ah ) = −0953 which is as negative as the reported correlation. 17
1738
FATIH GUVENEN
A Comparison to Earlier Models The mechanism proposed in this paper differs from earlier models with limited participation in two key dimensions. First, in Saito (1995) and Basak and Cuoco (1998), non-stockholders begin life with some financial wealth but do not have a labor income flow. The only way to sustain consumption is then by investing this wealth in the bond market. As a result, each period nonstockholders receive substantial interest payments from stockholders, which in turn leverages the capital income of the latter group and amplifies their consumption volatility. Although this is a potentially powerful mechanism, for it to work, interest payments need to be substantial—as large as the consumption of non-stockholders—which in turn implies that non-stockholders must hold a large fraction of aggregate wealth. This counterfactual implication has been one of the main criticisms raised against these models. In contrast, as noted above, the interest payments in the present model are very small, and, as we shall see below, the fraction of aggregate wealth held by non-stockholders is also small and consistent with empirical evidence. Instead, the amplification here is generated by the cyclical nature of these payments that serve to insure non-stockholders against aggregate fluctuations. The mechanism in Danthine and Donaldson (2002) is more closely related to the present paper. In their model, entrepreneurs smooth fluctuations in their workers’ consumption at the expense of extra volatility in their own consumption, which results in a high equity premium. An important difference from the present model is that this smoothing happens through insurance within the firm, instead of being through the bond market. As a result, the bond price is determined by the entrepreneurs alone (and not jointly with workers, which is the case here). It would be interesting to know if this difference would result in different implications for asset price dynamics, which is not explored in that paper. A second difference from previous models is that here preference heterogeneity is essential for amplifying stockholders’ consumption volatility relative to that of non-stockholders. This point has already been discussed in Section 4.2, so I do not repeat it here, but it is important to note that this feature contrasts with the previous papers mentioned above, where non-stockholders have no choice but to lend to stockholders simply to survive, which amplifies the consumption volatility of the latter group regardless of preference heterogeneity. Empirical Evidence Finally, the high volatility of stockholders’ consumption relative to that of non-stockholders is well documented empirically. For example, Mankiw and Zeldes (1991) found the ratio of standard deviations to be 1.6 using the Panel Study of Income Dynamics data, although their consumption measure consists of only food expenditures, which is likely to understate the true volatility ratio for nondurables. Attanasio, Banks, and Tanner (2002) used expenditures
A PARSIMONIOUS MACROECONOMIC MODEL
1739
on nondurables and services from the U.K. Family Expenditure Survey and calculated stockholders’ volatility to be 1.5 to 2 times larger than that of nonstockholders.18 Another piece of evidence comes from Aït-Sahalia, Parker, and Yogo (2004), who documented that the sales of luxury goods (such as expensive French wine, charitable donations, rents on luxury condos in Manhattan, etc.) display volatility that exceeds the standard deviation of aggregate nondurables consumption by a factor of 5–10. They interpreted this finding as indicative of highly volatile expenditures by the very wealthy. 5.2. Why Is the Equity Premium Countercyclical? It is instructive to start from the following decomposition of the conditional Sharpe ratio that can be derived under CRRA preferences.19 Although I use recursive utility in the present model, it turns out that the same decomposition still holds fairly well here and is useful for understanding the source of time variation:20 ep
(12)
SRt ≡
Et (Rt+1 ) ep h h ) × corrt ( log ct+1 Rt+1 ) ≈ αh × σt ( log ct+1 ep σt (Rt+1 )
The conditional correlation term in (12) is close to 1 and is very stable over the business cycle, and αh is simply a constant. Therefore, equation (12) points h ) as the source of time variation in the Sharpe ratio. Indeed, to σt ( log ct+1 in the model, SRt is 48% higher during a recession than it is in a boom, h ) is 39% higher in a recession than in a boom, explaining a and σt ( log ct+1 large fraction of the countercyclicality in SRt .21 Moreover, because conditional ep volatility, σt (Rt+1 ) is also countercyclical (18% higher during recessions) the 18 These figures are likely to be downward biased because existing measures of stockholders’ consumption are based on micro data sets that contain few “extremely rich” households, that is, those in the top 1% of the wealth distribution. At the same time, these households own nearly half of all stocks, and the top 0.5% own 37% of all stocks (Poterba and Samwick (1995)). 19 Two additional assumptions that are needed to derive the decomposition are that portfolio constraints are not binding, and consumption growth and asset returns are homoskedastic and jointly log-normal. 20 Although one can also derive an exact decomposition of the Sharpe ratio with recursive preferences (Campbell (1999, eq. 22)), I do not use this alternative form because (i) it requires the additional assumption of complete markets (or human wealth to be tradeable) and (ii) the decomposition has the same form as the one used here, but adds a second term that involves the covariance of the equity premium with the return on wealth. The contribution of this second term depends on the values of risk aversion and EIS, and, for the values used in this paper, it does not seem to be of first order importance for the cyclical behavior. Therefore, I opt for the simpler decomposition. which is useful for discussing the mechanism. 21 A recession is defined as the states of the world when output is 1 standard deviation or more below the mean, that is, Yt < E(Y ) − σ(Y ). An expansion is defined analogously: Yt > E(Y ) + h h σ(Y ). Using average output as the dividing line, I get E[σt (ct+1 )|Yt < E(Y )]/E[σt (ct+1 )|Yt > E(Y )] = 123
FIGURE 3.—Change in the slope of value functions with wealth level.
expected equity premium, E_t(R^{ep}_{t+1}), is 68% higher during recessions compared to booms.
The natural question that follows is, "Why is the stockholders' consumption growth volatility countercyclical?" To answer this question, two points should be noted. First, a well-known feature of the consumption–savings problem in the presence of uninsurable income risk is that the value function has more curvature at lower wealth levels and less curvature at higher wealth levels. While this is well known in the expected utility case, the same is true here with recursive preferences, as can be seen from Figure 3, which plots how the slopes of V^h and V^n change with ω (keeping aggregate state variables fixed). If markets were complete, with the formulation of recursive preferences in (1), the value functions would be linear in wealth (Epstein (1988)), implying that the slopes would be constant in wealth. This is obviously not the case for either agent in Figure 3, owing to incomplete markets. Second, and relatedly, non-stockholders own substantially less wealth per capita than stockholders (more on this in Section 6.3). Figure 3 shows the bounds of the ergodic distribution of wealth for each type of agent, where this difference in wealth levels is evident. Thus, non-stockholders are on the part of their value function that has high curvature: a relatively small change in their wealth holdings changes their attitudes toward risk and, consequently, changes their precautionary savings demand significantly. This is clearly not the case for stockholders, who remain on the very flat part of their value function over the business cycle. To further illustrate this point, Figure 4 plots the normalized curvature of the value function (i.e., risk aversion with respect to
FIGURE 4.—Change in attitudes toward risk over the business cycle.

wealth gambles), −ω V^i_{ωω}/V^i_{ω}, for each type of agent as a function of realized output, using simulated data from the CONS model.22 I normalize the average value to 1 for both groups so that the vertical axis can be interpreted as the percentage change in risk attitudes over the business cycle. The effect described above can be clearly seen here: non-stockholders' risk aversion increases significantly in recessions (and, consequently, so does their demand for insurance), whereas it barely moves for stockholders. Combining this countercyclical change in non-stockholders' demand for consumption smoothing with the mechanism described in the previous section implies that there should be more trade in the risk-free asset market in recessions. This is exactly what happens: ρ(|a^h_t|, Y_t) = −0.55, which shows that the average size of interest payments increases—in turn raising stockholders' consumption growth volatility—during recessions.23
Finally, in contrast to the 39% rise in σ_t(Δ log c^h_{t+1}) mentioned above, σ_t(Δ log c^n_{t+1}) goes down slightly—by 10%—in recessions. Of course, this is entirely consistent with the mechanism discussed above whereby non-stockholders have an even stronger desire for a smooth consumption process during recessions. Nevertheless, because the two groups' conditional
22 Risk aversion with respect to wealth gambles depends on the market structure in addition to α.
23 Notice that the countercyclicality of trade volume reflects larger transfers of both signs in the bond market during recessions. This can be seen more clearly by calculating E(a^h_t | recession, ΔY_t < 0) = 0.62 > 0.40 = E(a^h_t | boom, ΔY_t < 0) and E(a^h_t | recession, ΔY_t > 0) = −0.66 < −0.43 = E(a^h_t | boom, ΔY_t > 0). Thus, because non-stockholders are more averse to fluctuations during recessions, they borrow more aggressively when they receive a negative shock to prevent an immediate fall in consumption; when they receive a positive shock, they save more aggressively as a buffer stock to self-insure against future negative shocks.
volatilities move in opposite directions over the business cycle, aggregate consumption growth volatility exhibits little time variation (it goes up by only 8% in recessions), masking the large cyclical changes in σ_t(Δ log c^h_{t+1}) responsible for the countercyclical Sharpe ratio. A corollary to this observation is that if a researcher studies equation (12) with the aggregate consumption process (as would be the case in a representative-agent framework), the lack of time variation in the volatility of aggregate consumption growth would lead the researcher to require the curvature parameter, α, to be time-varying so as to explain the countercyclicality of SR_t. This is the approach adopted, for example, in Campbell and Cochrane (1999), among others.

6. MODEL RESULTS: MACROECONOMIC QUANTITIES

In this section I examine the implications of the model for macroeconomic quantities along two dimensions. I first analyze the performance of the model for business cycle statistics and then turn to the cross-sectional implications.

6.1. Business-Cycle Implications

Table VII displays standard business cycle statistics from the U.S. data as well as their counterparts from the CD and GHH models. As before, the last two columns report the corresponding figures from BCF and Danthine and Donaldson (2002) for comparison.
TABLE VII
BUSINESS CYCLE STATISTICS IN THE U.S. DATA AND IN THE MODEL^a

                          U.S. Data     ------------------ Model ------------------    BCF       D-D
                                        CD             GHH-Baseline   GHH-Low Frisch
Leisure preferences          —          Cobb–Douglas   GHH            GHH              Linear    None

Volatility
  σ(Y)                    1.89 (0.21)   1.97           1.95           1.96             1.97      1.77
  σ(C)/σ(Y)               0.40 (0.04)   0.92           0.78           0.75             0.69      0.82
  σ(I)/σ(Y)               2.39 (0.06)   1.38           1.76           1.86             1.67      1.72
  σ(L)/σ(Y)               0.80 (0.05)   0.07           0.50           0.32             0.51      —

Correlation With Output
  ρ(Y, C)                 0.76 (0.05)   0.99           0.99           0.97             0.95      0.96
  ρ(Y, I)                 0.96 (0.01)   0.99           0.99           0.98             0.97      0.93
  ρ(Y, L)                 0.78 (0.05)   0.96           0.94           0.99             0.86      —

^a The statistics for the U.S. data are taken from Table 1 in BCF. The model statistics are computed after simulated data have been aggregated to quarterly frequency, logged, and then Hodrick–Prescott filtered.
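As a rough illustration of the procedure described in the table note, the following Python sketch aggregates a monthly series to quarterly frequency, takes logs, applies the Hodrick-Prescott filter, and reports two of the statistics. The random-walk input series is only a stand-in for simulated model data, so the numbers it produces are meaningless placeholders.

import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(0)

# Stand-in monthly series for output and consumption (levels); placeholders only.
months = pd.date_range("1960-01", periods=600, freq="MS")
log_y = np.cumsum(0.003 + 0.008 * rng.standard_normal(600))
log_c = 0.5 * log_y + np.cumsum(0.002 * rng.standard_normal(600))
data = pd.DataFrame({"Y": np.exp(log_y), "C": np.exp(log_c)}, index=months)

# Aggregate to quarterly frequency, take logs, HP-filter (lambda = 1600 for quarterly data).
quarterly = data.resample("QS").mean()
cycles = {}
for col in quarterly:
    cycle, _trend = hpfilter(np.log(quarterly[col]), lamb=1600)
    cycles[col] = cycle
cycles = pd.DataFrame(cycles)

sigma_y = cycles["Y"].std() * 100                     # percent standard deviation of output
rel_sigma_c = cycles["C"].std() / cycles["Y"].std()   # sigma(C)/sigma(Y)
corr_yc = cycles["Y"].corr(cycles["C"])
print(f"sigma(Y) = {sigma_y:.2f}%, sigma(C)/sigma(Y) = {rel_sigma_c:.2f}, rho(Y, C) = {corr_yc:.2f}")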
CD Model

In the third column, the first row reports the volatility of output, which was one of the two calibration targets and is matched fairly well. The volatility of consumption (normalized by that of output) is about 0.40 in the U.S. data, but is overstated at 0.92 in the model. In contrast, investment is too smooth, with a volatility of 1.38 in the model compared to 2.39 in the U.S. data. Although these discrepancies are bothersome, the most glaring shortcoming of the CD model is seen in the behavior of labor hours, which is an order of magnitude too smooth compared to the data (0.07 versus 0.80).
However, the behavior of the aggregate labor supply masks significant heterogeneity between the two groups' behavior. In particular, labor hours are quite volatile but countercyclical for stockholders (σ(l^h)/σ(Y) = 0.77; correlation with output is −0.96), whereas they are smoother but procyclical for non-stockholders because of their low Frisch labor supply elasticity (σ(l^n)/σ(Y) = 0.24; correlation is +0.97). The behavior of aggregate hours largely mirrors that of non-stockholders (i.e., procyclical and smooth), because they make up a substantial majority of the population. However, aggregate hours are even smoother than those of non-stockholders, exactly because the labor supply of the two groups moves in opposite directions over the business cycle, partly canceling each other out. To increase the volatility of aggregate hours, one would need to increase the Frisch elasticity of non-stockholders and/or assume a smaller fraction of households to be stockholders. Unfortunately, a higher Frisch elasticity also means a higher EIS for non-stockholders, which contradicts the main empirical evidence this paper builds upon. Because of this important shortcoming, I conclude that Cobb–Douglas utility does not appear to be a promising direction to pursue in this framework for endogenizing labor supply.

Procyclicality of Aggregate Hours

Despite the important shortcomings of the CD model, the procyclicality of aggregate hours is an encouraging finding, in light of earlier studies that have consistently found that capital adjustment costs create countercyclical hours in representative-agent models (cf. BCF and Uhlig (2006)). Thus, it is important to understand why the present model delivers the opposite result. The answer lies in the specific nature of the cross-sectional heterogeneity generated by limited participation. To see why this helps, first note that adjustment costs generate a volatile and procyclical stock price (as in the data). In a representative-agent model, such as the one studied by BCF, this implies a large positive wealth effect during expansions, which overcomes the substitution effect from a higher wage rate and causes labor hours to drop during expansions. In contrast, here 80% of the population are non-stockholders and therefore do not experience the described wealth effect. Instead, they mainly respond to the substitution effect of higher wages and increase their labor hours. Although stockholders behave in a similar fashion to the representative agent described
above, they only make up a small fraction of the population. This result therefore suggests that the countercyclicality of hours found in earlier studies arises from the interaction of adjustment costs with the representative-agent assumption. By relaxing the latter, the present model is able to generate procyclical hours with Cobb–Douglas preferences.

GHH Model

Turning to the GHH model, the volatility of consumption is lower, whereas the volatility of investment is higher compared to the CD model, and the model is now in the same ballpark as the other two macro-based asset pricing models reported in the table. The more significant improvement is in the volatility of labor hours, where the CD model failed most dramatically: now, the volatility of hours is much higher than before (equal to half the volatility of output) and is closer to the data. This improvement is due to the flexibility afforded by GHH preferences for calibrating the Frisch elasticity independently of the EIS. Aggregate hours (as well as each group's labor hours) are also procyclical, which is due to the lack of wealth effects with GHH preferences. Finally, the fifth column reports the statistics from the GHH model when the Frisch elasticity is lowered to 0.5. As seen here, the major impact of this change is a large fall in the volatility of labor hours, which is a major reason why macroeconomic studies typically use higher values, as in the baseline calibration. In summary, the macroeconomic implications of GHH preferences are tolerable—that is, comparable to previous macro-based asset pricing models—and are significantly better than those of the CD model.

6.2. Asset Prices versus Macro Performance: Some Trade-Offs

As noted earlier, there is a fair amount of uncertainty regarding the values of some key model parameters, most notably ξ and α^h. Therefore, I now examine the effect on asset prices and macroeconomic volatilities of varying these two parameters over a plausible range. I first consider variations in the adjustment cost parameter around its baseline value of 0.4. As ξ is raised from 0.2 to 0.7, it becomes less costly for stockholders to adjust the level of capital installed in the firm, which makes investment a more effective tool for consumption smoothing. The effect can be seen in the right panel of Figure 5 (circles), which plots the combinations of investment volatility and consumption volatility (scaled by output volatility) implied by the model for each value of ξ. The baseline case is marked by the intersection of the dashed horizontal and vertical lines. As ξ rises, investment becomes much more volatile (rises from 1.16 to 2.33) while consumption becomes smoother (from 0.96 to 0.67), bringing both statistics closer to their empirical counterparts (although the lowest volatility figure for consumption is still higher than the 0.40 value in the U.S. data).
FIGURE 5.—Trade-off between asset prices and macroeconomic volatilities.
Not surprisingly, however, lower adjustment costs and the resulting smoothness of consumption also affect asset prices. This is shown in the left panel, which plots the combinations of equity premium and Sharpe ratio implied by each value of ξ. Raising ξ from 0.2 to 0.7 reduces the equity premium by more than half (from 8.20% to 3.67%), which is mostly driven by a lower volatility of returns (from 28.1% to 16.1%). As a result, the fall in the Sharpe ratio is comparatively modest (from 0.285 to 0.229). This comparison suggests that the model's implications for the volatilities of macroeconomic variables can be improved, at the expense of a smaller equity premium and a slightly lower Sharpe ratio.
Next I turn to the risk aversion of stockholders and consider values of α^h from 4 to 10. Starting in the right panel (marked by squares), notice that increasing risk aversion has a very minor impact on the volatilities of investment and consumption (although, qualitatively, it moves the model in the right direction). It does, however, have a more significant impact on asset prices (left panel): the equity premium rises from 4.79% to 6.65%, with an almost proportional rise in the Sharpe ratio (from 0.221 to 0.306).
These two exercises illustrate the trade-offs between the asset pricing and the macroeconomic implications of the model. Together, they suggest that if one is willing to tolerate a risk aversion as high as 10 for stockholders, adjustment costs could be reduced by setting, for example, ξ = 0.6. The higher risk aversion would deliver a sizeable equity premium and Sharpe ratio, while the lower adjustment cost would move macroeconomic volatilities closer to the data. Nevertheless, a risk aversion of 10 seems quite high, especially for stockholders, who are willingly undertaking substantial risks in financial markets.
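A quick back-of-the-envelope check on these figures: the Sharpe ratio should be close to the equity premium divided by the standard deviation of excess returns. The small Python sketch below applies this to the two endpoints of the ξ sweep reported above; the small gaps relative to 0.285 and 0.229 reflect rounding and the conditional versus unconditional distinction.

# Sharpe ratio implied by the reported equity premia and return volatilities (xi sweep).
endpoints = {0.2: (8.20, 28.1), 0.7: (3.67, 16.1)}   # xi: (equity premium %, return s.d. %)
for xi, (ep, sd) in endpoints.items():
    print(f"xi = {xi}: implied Sharpe ratio = {ep / sd:.3f}")
# Prints roughly 0.29 and 0.23, in line with the reported 0.285 and 0.229.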
FIGURE 6.—Evolution of wealth distribution over time.
6.3. Distributional Implications

Given the central role played by heterogeneity for the asset pricing results of this paper, an important question to ask is whether the implied cross-sectional behavior is broadly consistent with empirical evidence. I first look at the wealth distribution implied by the GHH model (Figure 6), which reveals substantial inequality between the two groups: on average stockholders own 92% of aggregate wealth, whereas non-stockholders own the remaining 8%. (These shares are fairly robust across specifications: stockholders' share is 86% in the CD model and 89% in the CONS model.) These figures compare fairly well to the U.S. data, where in the 1980s and 1990s stockholders owned an average of 82% of all net worth (fungible wealth, including housing) and more than 90% of financial assets (see Poterba and Samwick (1995) and Guvenen (2006)). As noted above, this implication for wealth inequality is an important difference between the present model and earlier models with limited participation, which required substantial wealth holdings for non-stockholders.
This discussion, however, raises a question of its own: Is the amount of trade in the risk-free asset implied by the model quantitatively plausible, or do we need a substantial amount of trade to sustain the mechanism that generates the equity premium? One way to answer this question is by looking at the trade volume, which I define as the (absolute) change in non-stockholders' bond holdings during a given year relative to the level of their bond holdings: E(|ΔB|)/E(B). This measure is less than 11% per year across all versions of the model reported in Tables II and III. This modest figure perhaps does not come as a surprise, given the gradual shifts in the wealth shares shown in Figure 6. As an alternative measure, I compare the size of non-stockholders' per capita transactions to their wage income: E(|B − P^f B′|/(1 − μ))/E(W) is
less than 1.9% annually across all simulations.24 For a non-stockholding household with an annual income of $50,000, this upper bound implies a net change, for example in their savings account, of $950 during the year. These relatively small payments entail large changes in stockholders' consumption, because they are synchronized across all non-stockholders (i.e., they result from aggregate shocks) and the effect is concentrated among a small number of stockholders.25

7. CONCLUSIONS

I have studied the implications of a two-agent model with limited stock market participation and heterogeneity in the EIS for asset prices and macroeconomic quantities. My findings suggest that the cross-sectional heterogeneity generated by the interaction of these two features could be important for several macroeconomic and asset pricing phenomena.
The model highlights a new mechanism for the equity premium that results from the limited consumption smoothing opportunities available to the majority of the population, who are non-stockholders. These households turn to the bond market, which effectively transfers their (aggregate) labor income risk and concentrates it among a small group of stockholders. As a result, stockholders demand to be compensated in the form of a high equity premium. In addition to generating broadly plausible behavior for asset prices, this mechanism is also consistent with the very low wealth holdings of non-stockholders in the U.S. data, which have been problematic for previous models of limited participation.
This significant skewness in wealth holdings, in turn, combines with the basic mechanism above to provide a novel explanation for the countercyclicality of the equity premium: essentially, because non-stockholders are very poor and poorly insured, they become effectively more risk averse during recessions, when their wealth falls even further. Therefore, stockholders provide more insurance to these agents and bear more risk during these periods. This leads them to demand a higher ex ante premium to hold equity risk.
Another appealing implication is the fairly smooth interest rate process generated by the model despite the very low EIS of the majority of households (non-stockholders). This result is due to the higher EIS of the stockholders, who are on the other side of the bond market.
I have also investigated the extent to which the labor supply decision can be endogenized in this framework. Cobb–Douglas utility gave rise to some counterfactual implications and therefore did not seem to offer a promising
24 To construct the measure, I first annualize the total transactions by adding up B_t − P^f_t B_{t+1} over 12 consecutive months and then take the absolute value. If I instead take the absolute value of monthly transactions first and then add up, the reported figure changes very little: it rises from 1.9% to 1.97%.
25 To see this, note that E(|B − P^f B′|/μ)/E(C^s) = 49%.
direction. In contrast, GHH preferences did not cause deterioration in asset pricing implications compared to the inelastic labor case. At the same time, the model’s implications for some key business cycle statistics, such as the volatilities of consumption and investment, need improvement. Therefore, the present model makes only modest progress in tackling these well-known shortcomings shared by many macro-based asset pricing models. A limitation of the present model is that it abstracts from long-run growth. Essentially, I assume that the economy is stationary in levels to circumvent the difficulties associated with obtaining a stationary wealth distribution with preference heterogeneity and long-run growth. Chan and Kogan (2002) were able to generate a stationary wealth distribution by introducing “benchmark consumption levels” into preferences. Although incorporating a similar idea into this framework would be a valuable extension, this is beyond the scope of the present analysis and remains an important area for expanding this model. Finally, there are many parallels between the asset pricing results of this paper and those obtained in the external habit model of Campbell and Cochrane (1999). It turns out that these similarities point to a deeper connection between the two frameworks: indeed, the limited participation model studied in this paper has a reduced form representation that is very similar to Campbell and Cochrane’s model, where the “external habit stock” in their framework corresponds to the consumption of non-stockholders in the present model. However, these similarities do not extend to the macroeconomic and policy implications of the two models, so the limited participation model should not be interpreted as providing microfoundations for the particular habit specification used by these authors. These links between the two frameworks are explored in detail in Guvenen (2009). I hope that these results will also encourage further research on the reasons behind limited participation, which is not addressed in this paper.26 Furthermore, given the central role played by limited participation in this model, another important research avenue is to investigate the consequences of the recent trends in participation observed in most countries for asset prices as well as for wealth inequality and welfare.
26 Recently, Gomes and Michaelides (2008) constructed a life-cycle model where individuals pay a one-time fixed cost to become a stockholder and argued that in this setting limited participation has a much smaller impact on asset prices. Their conclusion does not follow from the endogeneity of participation per se, because one can introduce a one-time cost into the present framework and show that it leads individuals with high EIS to enter the stock market whereas those with low EIS stay out. With a cost of appropriate magnitude, the resulting equilibrium is identical to the one studied here. (Further calculations on this fixed cost are available upon request.) The difference instead appears to come from some of the calibration choices made by these authors.
REFERENCES AÏT-SAHALIA, Y., J. PARKER, AND M. YOGO (2004): “Luxury Goods and the Equity Premium,” Journal of Finance, 59, 2959–3004. [1739] ATTANASIO, O., J. BANKS, AND S. TANNER (2002): “Asset Holding and Consumption Volatility,” Journal of Political Economy, 110, 771–92. [1738] BANSAL, R., AND A. YARON (2004): “Risks for the Long-Run: A Potential Resolution of Asset Pricing Puzzles,” Journal of Finance, 59, 1481–1509. [1735] BARSKY, B., F. JUSTER, M. KIMBALL, AND M. SHAPIRO (1997): “Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in the Health and Retirement Survey,” Quarterly Journal of Economics, 112, 537–579. [1723] BASAK, S., AND D. CUOCO (1998): “An Equilibrium Model With Restricted Stock Market Participation,” Review of Financial Studies, 11, 309–341. [1713,1727,1738] BLUNDELL, R., M. BROWNING, AND C. MEGHIR (1994): “Consumer Demand and the Life-Cycle Allocation of Household Expenditures,” Review of Economic Studies, 61, 57–80. [1723] BOLDRIN, M., L. CHRISTIANO, AND J. FISHER (1999): “Habit Persistance, Asset Returns, and the Business Cycle,” Unpublished Manuscript, Northwestern University. [1732] (2001): “Habit Persistence, Asset Returns, and the Business Cycle,” American Economic Review, 91, 149–166. [1712,1714] BROWNING, M., AND T. CROSSLEY (2000): “Luxuries Are Easier to Postpone: A Proof,” Journal of Political Economy, 108, 1022–1026. [1723] CAMPANALE, C., R. CASTRO, AND G. CLEMENTI (2007): “Asset Pricing in a Production Economy With Chew-Dekel Preferences,” Unpublished Manuscript, Stern School of Business, New York University. [1715] CAMPBELL, J. Y. (1999): “Asset Prices, Consumption, and the Business Cycle,” in Handbook of Macroeconomics, Vol. 1, ed. by J. Taylor and M. Woodford. Amsterdam: North-Holland, 1231–1303. [1726,1727,1739] CAMPBELL, J., AND J. H. COCHRANE (1999): “By Force of Habit: A Consumption Based Explanation of Aggregate Stock Market Behavior,” Journal of Political Economy, 107, 205–251. [1735,1742,1748] CAMPBELL, J. Y., AND R. J. SHILLER (1988): “Stock Prices, Earnings and Expected Dividends,” Journal of Finance, 43, 661–676. [1711,1733] CHAN, Y., AND L. KOGAN (2002): “Catching Up With the Joneses: Heterogeneous Preferences and the Dynamics of Asset Prices,” Journal of Political Economy, 110, 1255–1285. [1748] CHOU, R., R. F. ENGLE, AND A. KANE (1992): “Measuring Risk Aversion From Excess Returns on a Stock Index,” Journal of Econometrics, 52, 210–224. [1712,1735] CHRISTIANO, L. J., AND J. D. M. FISHER (1998): “Stock Market and Investment Good Prices: Implications for Macroeconomics,” Unpublished Manuscript, Northwestern University. [1721] CHRISTIANO, L., M. EICHENBAUM, AND C. L. EVANS (2005): “Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy, 113, 1–45. [1716] DANTHINE, J.-P., AND J. DONALDSON (2002): “Labor Relations and Asset Returns,” Review of Economic Studies, 69, 41–64. [1712,1714,1717,1725,1728,1731,1738,1742] DOMEIJ, D., AND M. FLODEN (2006): “The Labor-Supply Elasticity and Borrowing Constraints: Why Estimates Are Biased,” Review of Economic Dynamics, 9, 242–262. [1732] EPSTEIN, L. G. (1988): “Risk Aversion and Asset Prices,” Journal of Monetary Economics, 22, 179–192. [1740] GREENWOOD, J., Z. HERCOWITZ, AND G. W. HUFFMAN (1988): “Investment, Capacity Utilization, and the Real Business Cycles,” American Economic Review, 78, 402–417. [1714,1723] GOMES, F., AND A. 
MICHAELIDES (2008): “Asset Pricing With Limited Risk Sharing and Heterogeneous Agents,” Review of Financial Studies, 21, 415–448. [1715,1748] GUVENEN, F. (2006): “Reconciling Conflicting Evidence on the Elasticity of Intertemporal Substitution: A Macroeconomic Perspective,” Journal of Monetary Economics, 53, 1451–1472. [1716,1723,1746]
(2009): “Limited Stock Market Participation versus External Habit: An Intimate Link,” Unpublished Manuscript, University of Minnesota. [1748] (2010): “Supplement to ‘A Parsimonious Macroeconomic Model for Asset Pricing’: Technical Appendix and Extensions,” Econometrica Supplemental Material, 78, http://www. econometricsociety.org/ecta/Supmat/6658_data and programs.pdf. [1719,1720] INVESTMENT COMPANY INSTITUTE (2002): “Equity Ownership in America,” available online at http://www.ici.org/statements/res. [1712,1722] JERMANN, U. J. (1998): “Asset Pricing in Production Economies,” Journal of Monetary Economics, 41, 257–275. [1712,1716,1717,1720,1728] KIMBALL, M., AND M. SHAPIRO (2003): “Labor Supply: Are the Income and Substitution Effects Both Large or Both Small?” Unpublished Manuscript, University of Michigan. [1724] KRUSELL, P., AND A. A. SMITH, JR. (1997): “Income and Wealth Heterogeneity, Portfolio Choice, and Equilibrium Asset Returns,” Macroeconomic Dynamics, 1, 387–422. [1718-1720] LETTAU, M., AND H. UHLIG (2000): “Can Habit Formation Be Reconciled With Business Cycle Facts?” Review of Economic Dynamics, 3, 79–99. [1714] MANKIW, N. G., AND S. P. ZELDES (1991): “The Consumption of Stockholders and Nonstockholders,” Journal of Financial Economics, 29, 97–112. [1713,1738] MASULIS, R. W. (1988): The Debt–Equity Choice. Institutional Investor Series in Finance. Cambridge, MA: Ballinger Press. [1721] MEHRA, R., AND E. C. PRESCOTT (1985): “The Equity Premium: A Puzzle,” Journal of Monetary Economics, 15, 145–161. [1711] POTERBA, J., AND A. A. SAMWICK (1995): “Stock Ownership Patterns, Stock Market Fluctuations, and Consumption,” Brookings Papers on Economic Activity, 26, 295–373. [1712,1722, 1739,1746] SAITO, M. (1995): “Limited Participation and Asset Pricing,” Unpublished Manuscript, University of British Columbia. [1713,1727,1738] SCHWERT, G. W. (1989): “Why Does Stock Market Volatility Change Over Time?” Journal of Finance, 44, 1115–1153. [1712] STORESLETTEN, K., C. TELMER, AND A. YARON (2007): “Asset Pricing With Idiosyncratic Risk and Overlapping Generations,” Review of Economic Dynamics, 10, 519–548. [1712,1715,1718, 1725] TALLARINI, T. (2000): “Risk Sensitive Business Cycles,” Journal of Monetary Economics, 45, 507–532. [1715] UHLIG, H. (2006): “Macroeconomics and Asset Markets: Some Mutual Implications,” Unpublished Manuscript, University of Chicago. [1712,1714,1743] (2007): “Leisure, Growth, and Long-Run Risk,” Unpublished Manuscript, University of Chicago. [1715]
Dept. of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 Fourth Street, South Minneapolis, MN 55455, U.S.A. and NBER;
[email protected]; http://www.econ.umn.edu/~guvenen. Manuscript received August, 2006; final revision received August, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1751–1790
LIQUIDITY AND TRADING DYNAMICS BY VERONICA GUERRIERI AND GUIDO LORENZONI1 In this paper, we build a model where the presence of liquidity constraints tends to magnify the economy’s response to aggregate shocks. We consider a decentralized model of trade, where agents may use money or credit to buy goods. When agents do not have access to credit and the real value of money balances is low, agents are more likely to be liquidity constrained. This makes them more concerned about their short-term earning prospects when making their consumption decisions and about their short-term spending opportunities when making their production decisions. This generates a coordination element in spending and production which leads to greater aggregate volatility and greater comovement across producers. KEYWORDS: Liquidity, money, search, aggregate volatility, amplification.
1. INTRODUCTION DURING RECESSIONS, CONSUMERS, facing a higher risk of unemployment and temporary income losses, tend to hold on to their reserves of cash, bonds, and other liquid assets as a form of self-insurance. This precautionary behavior can lead to reduced spending and magnify the initial decline in aggregate activity. In this paper, we explore formally this idea and show that this amplification mechanism depends crucially on the consumers’ access to liquidity in a broad sense. The scarcer the access to liquidity, the more consumers are likely to be liquidity constrained in the near future. This means that they are more concerned about their short-term earnings prospects when making their spending decisions, and these decisions become more responsive to the spending decisions of other agents. In other words, there is a stronger “coordination effect” which can magnify the effect of aggregate shocks on aggregate output and induce more comovement across different sectors of the economy. We consider a decentralized model of production and exchange in the tradition of search models of money, where credit frictions arise from the limited ability to verify the agents’ identity. There are a large number of households, each composed of a consumer and a producer. Consumers and producers from different households meet and trade in spatially separated markets, 1 For helpful comments we thank a co-editor, three anonymous referees, Daron Acemoglu, Fernando Alvarez, Marios Angeletos, Boragan Aruoba, Gabriele Camera, Ricardo Cavalcanti, Chris Edmond, Ricardo Lagos, Robert Lucas, David Marshall, Robert Shimer, Nancy Stokey, Christopher Waller, Dimitri Vayanos, Iván Werning, Randall Wright, and seminar participants at MIT, the Cleveland Fed (Summer Workshop on Money, Banking, and Payments), University of Maryland, University of Notre Dame, Bank of Italy, AEA Meetings (Chicago), the Chicago Fed, Stanford University, the Philadelphia Fed Monetary Macro Workshop, UCLA, San Francisco Fed, St. Louis Fed, University of Chicago GSB, NYU, IMF, Bank of France/Toulouse Liquidity Conference (Paris), and the Minneapolis Fed/Bank of Canada Monetary Conference. Lorenzoni thanks the Federal Reserve Bank of Chicago for its generous hospitality during part of this research.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7231
or islands. In each island, the gains from trade are determined by a local productivity shock. An exogenous aggregate shock determines the distribution of local shocks across islands. A good aggregate shock reduces the proportion of low productivity islands and increases that of high productivity islands, that is, it leads to a first-order stochastic shift in the distribution of local productivities. Due to limited credit access, households accumulate precautionary money balances to bridge the gap between current spending and current income. Money is supplied by the government and grows at a constant rate. In a stationary equilibrium, the rate of return on money is equal to the inverse of the money growth rate. A lower rate of return on money reduces the equilibrium real value of the money stock. In the model, we distinguish different regimes along two dimensions: credit access and the rate of return on money. In regimes with less credit access and a lower rate of return on money, agents are more likely to face binding liquidity constraints. In such regimes, we show that there is a coordination element both in spending and in production decisions: agents are less willing to trade (buy and sell) when they expect others to trade less. This leads both to greater aggregate volatility and to greater comovement among islands. We first obtain analytical results in two polar cases which we call unconstrained and fully constrained regimes. An unconstrained regime arises when either households have unrestricted access to credit or the value of real money balances is sufficiently high. In this case, households are never liquidity constrained in equilibrium. Our first result is that, in an unconstrained regime, the quantity traded in each island is independent of what happens in other islands. The result follows from the fact that households are essentially fully insured against idiosyncratic shocks. This makes their expected marginal value of money constant, allowing the consumer and the producer from the same household to make their trading decisions independently. At the opposite end of the spectrum, a fully constrained regime arises when households have no credit access and the value of real money balances is sufficiently low, so that households expect to be liquidity constrained for all realizations of the idiosyncratic shocks. In this case, the decisions of the consumer and the producer are tightly linked. The consumer needs to forecast the producer’s earnings and the producer needs to forecast the consumer’s spending so as to evaluate the household’s marginal value of money. Next, we look at the aggregate implications of these linkages. In all regimes, a bad aggregate shock has a negative compositional effect: as fewer islands have high productivity, aggregate output decreases. However, in an unconstrained regime there is no feedback from this aggregate fall in output to the level of trading in an island with a given local shock. In a fully constrained regime, instead, the linkage between trading decisions in different islands generates an additional effect on trading and output. A bad aggregate shock reduces the probability of high earnings for the producer, inducing the consumer to reduce spending. At the same time, the producer expects his partner to
spend less, reducing his incentive to produce. These two effects imply that a lower level of aggregate activity induces lower levels of activity in each island, conditional on the local shock, leading to a magnified fall in aggregate activity. Numerical results show that our mechanism is also at work in intermediate regimes, where the liquidity constraint is occasionally binding, and that reduced credit access and a lower rate of return on money lead to higher volatility and comovement. This paper is related to the literature on search models of decentralized trading, going back to Diamond (1982, 1984) and Kiyotaki and Wright (1989). In particular, Diamond (1982) puts forth the idea that “the difficulty of coordination of trade” may contribute to cyclical volatility. The contribution of our paper is to show that the presence of this coordination effect depends crucially on credit market conditions and on the monetary regime. This allows us to identify a novel connection between financial development, liquidity supply, and aggregate dynamics. Our model allows for divisible money and uses the Lagos and Wright (2005) approach to make the model tractable. In Lagos and Wright (2005), agents alternate trading in a decentralized market with trading in a centralized competitive market. The combination of quasilinear preferences and periodic access to a centralized market ensures that the distribution of money holdings is degenerate when agents enter the decentralized market. Here we use these same two ingredients, with a modified periodic structure. In our model, agents have access to a centralized market every three periods. The extra period of decentralized trading is necessary to make the precautionary motive matter for trading decisions in the decentralized market of the previous period. This is at the core of our amplification mechanism. A three-period structure was also used by Berentsen, Camera, and Waller (2005) to study the short-run neutrality of money. They showed that, away from the Friedman rule, random monetary injections can be nonneutral, since they have a differential effect on agents with heterogeneous money holdings. Although different in its objectives, their analysis also relies on the lack of consumption insurance. Our work is also related to a large number of papers that have explored the implications of different monetary regimes for risk sharing in environments with idiosyncratic risk (e.g., Aiyagari and Williamson (2000), Reed and Waller (2006)), and is related to Rocheteau and Wright (2005) for the use of competitive pricing à la Lucas and Prescott (1974) in a money search model. More broadly, the paper is related to the literature exploring the relation between financial frictions and aggregate volatility, including Bernanke and Gertler (1989), Bencivenga and Smith (1991), Acemoglu and Zilibotti (1997), and Kiyotaki and Moore (1997). In particular, Kiyotaki and Moore (2001) also emphasized the effect of a limited supply of liquid assets (money) on aggregate dynamics. Their paper studies a different channel by which limited liquidity can affect the transmission of aggregate shocks, focusing on the effects on investment and capital accumulation. Our paper is also related to the vast literature on the effect of liquidity constraints on consumption decisions. In particular, our argument relies on the
idea that when liquidity constraints are binding less often, consumption becomes less sensitive to short-term income expectations. Some evidence consistent with this idea is in the article by Jappelli and Pagano (1989), who showed that the excess sensitivity of consumption to current income is less pronounced in countries with more developed credit markets, and in the article by Bacchetta and Gerlach (1997), who showed that excess sensitivity has declined in the United States as a consequence of financial innovations.
The rest of the paper is organized as follows. In Section 2, we introduce a baseline model, with simple binary shocks and no credit access, and derive our main analytical results. In Section 3, we analyze an extended version of the model, generalizing the shock distribution and allowing for credit access. Section 4 presents a further extension with imperfect information and public signals. Section 5 concludes. The Appendix contains all the proofs not given in the text.

2. THE MODEL

The economy is populated by a unit mass of infinitely lived households, composed of two agents: a consumer and a producer. Time is discrete and each period agents produce and consume a single, perishable consumption good. The economy has a simple periodic structure: each time period t is divided into three subperiods, s = 1, 2, 3. We will call them periods whenever there is no risk of confusion. In periods 1 and 2, the consumer and the producer from each household travel to spatially separated markets, or islands, where they interact with consumers and producers from other households. There is a continuum of islands and each island receives the same mass of consumers and producers in both periods 1 and 2. The assignment of agents to islands is random and satisfies a law of large numbers, so that each island receives a representative sample of consumers and producers. In each island there is a competitive goods market, as in Lucas and Prescott (1974). The consumer and the producer from the same household do not communicate while traveling in periods 1 and 2, but get together at the end of each period. In period 3, all consumers and producers trade in a single centralized market.2
In period 1 of time t, a producer located in island k has access to the linear technology

    y_{t1} = θ^k_t n_t,

where y_{t1} is output, n_t is labor effort, and θ^k_t is the local level of productivity, which is random and can take two values: 0 and θ > 0. At time t, a fraction ζ_t
2 The use of a household made of two agents who are spatially separated during a trading period goes back to Lucas (1990) and Fuerst (1992).
of islands is randomly assigned the high productivity level θ, while a fraction 1 − ζ_t is unproductive. The aggregate shock ζ_t is independently drawn and publicly revealed at the beginning of period 1, and takes two values, ζ_H and ζ_L, in (0, 1), with probabilities α and 1 − α. The island-specific productivity θ^k_t is only observed by the consumers and producers located in island k. In Section 3, we will generalize the distributions of local and aggregate shocks.
In periods 2 and 3, each producer has a fixed endowment of consumption goods: y_{t2} = e_2 and y_{t3} = e_3. We assume that the value of e_3 is large, so as to ensure that equilibrium consumption in period 3 is strictly positive for all households. The household's preferences are represented by the utility function

    E[ Σ_{t=0}^{∞} β^t ( u(c_{t1}) − v(n_t) + U(c_{t2}) + c_{t3} ) ],
where c_{ts} is consumption in subperiod (t, s) and β ∈ (0, 1) is the discount factor. Both u and U are increasing, strictly concave, with continuous first and second derivatives on (0, ∞). The function u is bounded below, with u(0) = 0, has finite right derivative at 0, and satisfies the Inada condition lim_{c→∞} u′(c) = 0. The function U satisfies the Inada conditions lim_{c→0} U′(c) = ∞ and lim_{c→∞} U′(c) = 0. The function v represents the disutility of effort, is increasing and convex, has continuous first and second derivatives on [0, n̄), and satisfies v′(0) = 0 and lim_{n→n̄} v′(n) = ∞.
We assume that the consumers' identity cannot be verified in the islands they visit in periods 1 and 2, so credit contracts are not feasible. There is an exogenous supply of perfectly divisible notes issued by the government—money—and the only feasible trades in periods 1 and 2 are trades of money for goods. Each household is endowed with a stock of money M_0 at date 0. At the end of each subperiod 3, the government injects (γ − 1)M_t units of money by a lump-sum transfer to each household (a lump-sum tax if γ < 1). Therefore, the stock of money M_t grows at the constant gross rate γ. In this paper, we make no attempt to explain the government's choice of the monetary regime, but simply explore the effect of different regimes on equilibrium behavior.
Let us comment briefly on two of the assumptions made. First, the fact that in subperiod 3, consumers and producers trade in a centralized market and have linear utility is essential for tractability, as it allows us to derive an equilibrium with a degenerate distribution of money balances at the beginning of (t, 1), as in Lagos and Wright (2005).3 Second, we assume that the household is split into a consumer and a producer who make separate decisions in period 1, without observing the shock of the partner. This assumption allows us to capture, in a compact way, the effects of short-term income uncertainty on consumption and production decisions.
3 See Shi (1997) for a different approach to obtain a degenerate distribution of money holdings.
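To fix ideas, the following Python sketch encodes one set of functional forms consistent with the assumptions just stated on u, U, and v, and numerically checks the boundary and Inada-type conditions. These particular forms and parameter values are illustrative choices, not the ones used in the paper.

import numpy as np

# Illustrative functional forms (not from the paper) satisfying the stated assumptions.
a, kappa, n_bar = 2.0, 1.0, 1.0

def u(c):            # bounded below, u(0) = 0, finite right derivative at 0, u'(c) -> 0
    return 1.0 - np.exp(-a * c)

def U(c):            # Inada conditions at 0 and at infinity
    return np.log(c)

def v(n):            # v'(0) = 0, v'(n) -> infinity as n -> n_bar, increasing and convex
    return kappa * (-n_bar * np.log(1.0 - n / n_bar) - n)

def u_prime(c):
    return a * np.exp(-a * c)

def v_prime(n):
    return kappa * (1.0 / (1.0 - n / n_bar) - 1.0)

print(u(0.0), u_prime(0.0), u_prime(1e6))      # 0.0, finite, ~0
print(v_prime(0.0), v_prime(0.999 * n_bar))    # 0.0, very large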
2.1. First Best

The first-best allocation provides a useful benchmark for the rest of the analysis. Consider a social planner with perfect information who can choose the consumption and labor effort of the households. Given that there is no capital, there is no real intertemporal link between times t and t + 1. Therefore, we can look at a three-period planner's problem.
Each household is characterized by a pair (θ, θ̃), where the first element represents the shock in the producer's island and the second represents the shock in the consumer's island. An allocation is given by consumption functions {c_s(θ, θ̃, ζ)}_{s∈{1,2,3}} and an effort function n(θ, θ̃, ζ). The planner chooses an allocation that maximizes the ex ante utility of the representative household,

    E[ u(c_1(θ, θ̃, ζ)) − v(n(θ, θ̃, ζ)) + U(c_2(θ, θ̃, ζ)) + c_3(θ, θ̃, ζ) ],

subject to the economy's resource constraints. In period 1, given the aggregate shock ζ, there is one resource constraint for each island θ:4

    E[c_1(θ̃, θ, ζ) | θ, ζ] ≤ E[y_1(θ, θ̃, ζ) | θ, ζ],

where y_1(θ, θ̃, ζ) = θ n(θ, θ̃, ζ). In period s = 2, 3, the resource constraint is

    E[c_s(θ, θ̃, ζ) | ζ] ≤ e_s.

The resource constraints for periods 1 and 2 reflect the assumption that each island receives a representative sample of consumers and producers.
An optimal allocation is easy to characterize. Due to the separability of the utility function, the optimal consumption and output levels in a given island are not affected by the fraction ζ of productive islands in the economy. Namely, c_1(θ̃, θ, ζ) = y_1(θ, θ̃, ζ) = y_1^∗(θ) for all θ̃ and ζ, where y_1^∗(0) = 0 and y_1^∗(θ) satisfies
θu (y1∗ (θ)) = v (y1∗ (θ)/θ)
˜ ζ) = e2 for all θ θ, ˜ and ζ, that is, houseMoreover, at the optimum, c2 (θ θ ˜ holds are fully insured against the shocks θ and θ. Finally, given linearity, the consumption levels in period 3 are not pinned down, as consumers are ex ante ˜ ζ) such that E[c3 (θ θ ˜ ζ)] = e3 . indifferent among all profiles c3 (θ θ 2.2. Equilibrium Let us normalize all nominal variables (prices and money holdings), dividing them by the aggregate money stock Mt . Then we can focus on stationary 4
From now on, “island θ” is short for “an island with productivity shock θ.”
LIQUIDITY AND TRADING DYNAMICS
1757
monetary equilibria where money is valued and where normalized prices only depend on the current aggregate shock ζt . Therefore, we drop the time index t. We begin by characterizing optimal individual behavior. Let p1 (θ ζ) denote the normalized price of goods in period 1 in island θ, and let p2 (ζ) and p3 (ζ) denote the normalized prices in periods 2 and 3. Consider a household with an initial stock of money m (normalized) at the beginning of period 1 after ˜ ζ). the realization of ζ. The consumer travels to island θ˜ and consumes c1 (θ Since money holdings are nonnegative, the budget constraint and the liquidity constraint in period 1 are (2)
˜ ζ) + p1 (θ ˜ ζ)c1 (θ ˜ ζ) ≤ m m1 (θ ˜ ζ) ≥ 0 m1 (θ
˜ ζ) denotes the consumer’s normalized money holdings at the end where m1 (θ of period 1. In the meantime, the producer, located in island θ, produces and sells y1 (θ ζ) = θn(θ ζ). At the end of period 1, the consumer and the producer get together and pool their money holdings. Therefore, in period 2 the budget and liquidity constraints are (3)
˜ ζ) + p2 (ζ)c2 (θ θ ˜ ζ) ≤ m1 (θ ˜ ζ) + p1 (θ ζ)θn(θ ζ) m2 (θ θ ˜ ζ) ≥ 0 m2 (θ θ
˜ ζ), and end-of-period normalized money holdwhere consumption, c2 (θ θ ˜ ζ), are now contingent on both shocks θ and θ. ˜ Finally, in peings, m2 (θ θ riod 3, the consumer and the producer are located in the same island and the revenues p3 (ζ)e3 are immediately available. Moreover, the household receives a lump-sum nominal transfer equal to γ − 1, in normalized terms. The constraints in period 3 are then (4)
˜ ζ) + p3 (ζ)c3 (θ θ ˜ ζ) m3 (θ θ ˜ ζ) + p2 (ζ)e2 + p3 (ζ)e3 + γ − 1 ≤ m2 (θ θ ˜ ζ) ≥ 0 m3 (θ θ
˜ ζ) at the end of subA household with normalized money balances m3 (θ θ −1 ˜ period 3, will have normalized balances γ m3 (θ θ ζ) at the beginning of the following subperiod 1, as the rate of return on normalized money holdings between (t 3) and (t +1 1) is equal to the inverse of the growth rate of the money stock, γ. Let V (m) denote the expected utility of a household with money balances m at the beginning of period 1, before the realization of the aggregate
1758
V. GUERRIERI AND G. LORENZONI
shock ζ. The household’s problem is then characterized by the Bellman equation ˜ ζ)) − v(n(θ ζ)) + U(c2 (θ θ ˜ ζ)) V (m) = max E u(c1 (θ (5) {cs }{ms }n
˜ ζ) + βV (γ −1 m3 (θ θ ˜ ζ)) + c3 (θ θ
subject to the budget and liquidity constraints specified above.5 The solution to this problem gives us the optimal household’s choices as functions of the shocks and of the initial money balances m, which we denote by c1 (θ ζ; m), ˜ ζ; m), and so forth. c2 (θ θ We are now in a position to define an equilibrium.6 DEFINITION 1: A stationary monetary equilibrium is given by prices {p1 (θ ζ) p2 (ζ) p3 (ζ)}, a distribution of money holdings with cumulative distribution function (c.d.f.) H(·) and support M, and an allocation {n(θ ζ; m), ˜ ζ; m), c3 (θ θ ˜ ζ; m), m1 (θ ζ; m), m2 (θ θ ˜ ζ; m), m3 (θ c1 (θ ζ; m), c2 (θ θ ˜ θ ζ; m)} such that the following statements hold: (i) The allocation solves problem (5) for each m ∈ M. (ii) Goods markets clear E[c1 (θ ζ; m)|θ ζ] dH(m) = θ E[n(θ ζ; m)|θ ζ] dH(m) M
M
for all θ ζ
M
˜ ζ; m)|ζ] dH(m) = es E[cs (θ θ
(iii) The distribution H(·) satisfies
M
for s = 2 3 and all ζ
m dH(m) = 1 and
˜ m) ˜ ζ; m) ˜ : γ −1 m3 (θ θ ˜ ≤ m|ζ] H(m) = Pr[(θ θ
for all m and ζ
Condition (iii) ensures that the distribution H(·) is stationary. As we will see below, we can focus on equilibria where the distribution of money balances is degenerate at m = 1. Therefore, from now on, we drop the argument m from the equilibrium allocations. To characterize the equilibrium, it is useful to derive the household’s firstorder conditions. From problem (5) we obtain three Euler equations, with re5 Standard dynamic programming techniques can be applied. To take care of the unboundedness of the per-period utility function, one can extend the argument in Lemma 7 of Lagos and Wright (2004). 6 We focus on monetary equilibria, that is, equilibria where money has positive value. As usual in money search environments, nonmonetary equilibria also exist.
1759
spective complementary slackness conditions (6)
˜ ζ)) ≥ u (c1 (θ
˜ ζ) p1 (θ ˜ ζ))|θ ˜ ζ E U (c2 (θ θ p2 (ζ) ˜ ζ) ≥ 0) for all θ ˜ ζ (m1 (θ p2 (ζ) p3 (ζ)
(7)
˜ ζ)) ≥ U (c2 (θ θ
(8)
˜ ζ)) 1 ≥ p3 (ζ)βγ −1 V (γ −1 m3 (θ θ
˜ ζ) ≥ 0) for all θ θ ˜ ζ (m2 (θ θ ˜ ζ) ≥ 0) for all θ θ ˜ ζ (m3 (θ θ
the optimality condition for labor supply (9)
v (n(θ ζ)) = θ
p1 (θ ζ) ˜ ζ))|θ ζ E U (c2 (θ θ p2 (ζ)
for all θ ζ
and the envelope condition (10)
V (m) = E
˜ ζ))
u (c1 (θ ˜ ζ) p1 (θ
Our assumptions allow us to simplify the equilibrium characterization as follows. Since θ = 0 with probability ζ > 0, the Inada condition for U implies ˜ ζ) and m3 (θ θ ˜ ζ) are strictly positive for all θ, θ, ˜ and ζ. To insure that m1 (θ against the risk of entering period 2 with zero money balances, households always keep positive balances at the end of periods 1 and 3. This implies that (6) and (8) always hold as equalities. Condition (8), holding with equality, shows why we obtain equilibria with a degenerate distribution of money balances as in Lagos and Wright (2005). Given that the normalized supply of money is equal to 1, a stationary equilibrium with a degenerate distribution H(·) must satisfy ˜ ζ) = 1 γ −1 m3 (θ θ
˜ ζ for all θ θ
In equilibrium, all agents adjust their consumption in period 3 so as to reach the same level of m3 , irrespective of their current shocks. The assumptions that utility is linear in period 3 and that e3 is large enough imply that the marginal utility of consumption in period 3 is constant, ensuring that this behavior is optimal.7 Moreover, equation (8), as an equality, implies that in all stationary equilibria, p3 (ζ) is independent of the aggregate shock ζ and equal to γ/(βV (1)). From now on, we just denote it as p3 . 7 When γ > β, it can be shown that all stationary equilibria are characterized by a degenerate distribution of money holdings.
1760
V. GUERRIERI AND G. LORENZONI
This leaves us with condition (7). In general, this condition can be either ˜ depending on the parameters of the binding or slack for different pairs (θ θ), model. However, we are able to give a full characterization of the equilibrium by looking at specific monetary regimes, namely, by making assumptions about ˜ ζ) ≥ 0 is γ. First, we look at equilibria where the liquidity constraint m2 (θ θ never binding. We will show that this case arises if and only if γ = β, that is, in a monetary regime that follows the Friedman rule. Second, we look at equilibria ˜ ζ) ≥ 0 is binding for all pairs (θ θ) ˜ and for all where the constraint m2 (θ θ ζ. We will show that this case arises if and only if the rate of money growth is sufficiently high, that is, when γ ≥ γˆ for a given cutoff γˆ > β. These two polar cases provide analytically tractable benchmarks which illustrate the mechanism at the core of our model. The numerical example in Section 3.3 considers the case of economies with γ ∈ (β γ), ˆ where the liquidity constraint in period 2 is binding for a subset of agents. 2.2.1. Unconstrained Equilibrium We begin by considering unconstrained equilibria, that is, stationary monetary equilibria where the liquidity constraint in period 2 is never binding. In this case, condition (7) always holds as an equality. Combining conditions (6)–(8), all as equalities, and (10) gives (11)
˜ ζ)) u (c1 (θ u (c1 (θ˜ ζ )) = βγ −1 E ˜ ζ) p1 (θ p1 (θ˜ ζ )
where θ˜ and ζ represent variables in the next time period. Taking expectations with respect to θ˜ and ζ on both sides shows that a necessary condition for an unconstrained equilibrium is γ = β. The following proposition shows that this condition is also sufficient. Moreover, under this monetary regime, the equilibrium achieves an efficient allocation.8 PROPOSITION 1: An unconstrained stationary monetary equilibrium exists if and only if γ = β and achieves a first-best allocation. For our purposes, it is especially interesting to understand how the level of activity is determined in a productive island in period 1. Let p1 (ζ) and y 1 (ζ) denote p1 (θ ζ) and y1 (θ ζ). Substituting (7) into (6) (both as equalities), we can rewrite the consumer’s optimality condition in period 1 as (12) 8
u (y 1 (ζ)) =
p1 (ζ) p3
This result is closely related to the analysis in Section 4 of Rocheteau and Wright (2005).
LIQUIDITY AND TRADING DYNAMICS
1761
Similarly, the producer’s optimality condition (9) can be rewritten as p (ζ) y 1 (ζ) =θ 1 (13) v p3 θ These two equations describe, respectively, the demand and the supply of consumption goods in island θ as a function of the price p1 (ζ). Jointly, they determine the equilibrium values of p1 (ζ) and y 1 (ζ) for each ζ. These equations highlight that, in an unconstrained equilibrium, consumers and producers do not need to forecast the income/spending of their partners when making their optimal choices, given that their marginal value of money is constant and equal to 1/p3 . This implies that trading decisions in a given island are independent of trading decisions in all other islands. We will see that this is no longer true when we move to a constrained equilibrium. Conditions (12) and (13) can be easily manipulated to obtain the planner’s first-order condition (1), showing that in an unconstrained equilibrium, y 1 (ζ) is independent of ζ and equal to the first best. 2.2.2. Fully Constrained Equilibrium We now turn to stationary monetary equilibria where the liquidity constraint ˜ ζ) = 0 for all θ θ, ˜ and ζ. We refer is always binding in period 2, that is, m2 (θ θ to them as fully constrained equilibria. We will show that such equilibria arise when the money growth rate γ is sufficiently high. Again, our main objective is to characterize how output is determined in period 1. First, however, we need to derive the equilibrium value of p2 (ζ). At the beginning of each period, the entire money supply is in the hands of the consumers. Since in a fully constrained equilibrium consumers spend all their money in period 2 and normalized money balances are equal to 1, market clearing gives us (14)
p2 (ζ)e2 = 1
which pins down p2 (ζ). To simplify notation, we normalize e2 = 1, so as to have p2 (ζ) = 1. Consider now a consumer and a producer in a productive island in period 1. Given that the consumer will be liquidity constrained in period 2, his consumption in that period will be fully determined by his money balances. In period 1, the consumer is spending p1 (ζ)y 1 (ζ) and expects his partner’s income to be p1 (ζ)y 1 (ζ) with probability ζ and zero otherwise. Therefore, he expects total money balances at the beginning of period 2 to be 1 in the first case and 1 − p1 (ζ)y 1 (ζ) in the second. Using p2 (ζ) = 1, we can then rewrite the Euler equation (6) as u (y 1 (ζ)) = p1 (ζ) ζU (1) + (1 − ζ)U (1 − p1 (ζ)y 1 (ζ)) (15)
A symmetric argument on the producer's side shows that the optimality condition (9) can be written as

(16)    v'(\bar y_1(\zeta)/\theta) = \theta \bar p_1(\zeta)\bigl[\zeta U'(1) + (1 - \zeta)\, U'(1 + \bar p_1(\zeta)\bar y_1(\zeta))\bigr]

These two equations correspond to (12) and (13) in the unconstrained case, and jointly determine p̄₁(ζ) and ȳ₁(ζ) for each ζ. The crucial difference with the unconstrained case is that now ζ, the fraction of productive islands in the economy, enters the optimal decisions of consumers and producers in a given productive island, since it affects their expected income and consumption in the following period. We will see in a moment how this affects aggregate volatility and comovement. Notice that (15) and (16) implicitly define a "demand curve" and a "supply curve," y^D(p₁, ζ) and y^S(p₁, ζ).9 It is easy to show that, for any ζ, there exists a price where the two curves intersect. For comparative statics, it is useful to make the additional assumption

(A1)
−(1 − ζ_H) c U″(c)/U′(c) ≤ 1    for all c,
which ensures that the income effect on labor supply is not too strong and that the supply curve is positively sloped.

LEMMA 1: The function y^D(p₁, ζ) is decreasing in p₁. Under assumption (A1), the function y^S(p₁, ζ) is increasing in p₁ and, for given ζ, there is a unique pair (p̄₁(ζ), ȳ₁(ζ)) which solves (15) and (16).

To complete the equilibrium characterization, it remains to find p₃ and check that consumers are indeed constrained in period 2, that is, that (7) holds. In the next proposition, we show that this condition is satisfied as long as γ is above some cutoff γ̂.

PROPOSITION 2: There is a γ̂ > β such that a fully constrained stationary monetary equilibrium exists if and only if γ ≥ γ̂.10

It is useful to clarify the role of the rate of return on money γ⁻¹ in determining whether we are in a constrained or unconstrained equilibrium. Notice that in an unconstrained equilibrium, the household's normalized money balances at the beginning of period 1, which are equal to 1, must be sufficient to purchase both p̄₁(ζ)ȳ₁(ζ) and p₂(ζ)e₂ in case the consumer is assigned to a

9 These are not standard partial-equilibrium demand and supply functions, as they represent the relation between the price p₁ and the demand/supply of goods in a symmetric equilibrium where prices and quantities are identical in all productive islands.

10 Under assumption (A1), if γ ≥ γ̂, there is a unique fully constrained equilibrium. However, we cannot rule out, in general, the existence of other partially constrained equilibria.
productive island and the producer is assigned to an unproductive one. Therefore, in an unconstrained equilibrium, the following inequality holds for all ζ:

\frac{1}{p_2(\zeta)} \ge e_2 + \frac{\bar p_1(\zeta)}{p_2(\zeta)}\,\bar y_1(\zeta)

On the other hand, (14) shows that 1/p₂(ζ) is constant and equal to e₂ in a fully constrained equilibrium. That is, the real value of money balances in terms of period 2 consumption is uniformly lower in a fully constrained equilibrium. This is due to the fact that the rate of return on money is low. This reduces the agents' willingness to hold money, reducing the equilibrium real value of money balances. Through this channel, high money growth reduces the households' ability to self-insure.

2.3. Coordination, Amplification, and Comovement

We now turn to the effects of the aggregate shock ζ on the equilibrium allocation in the various regimes considered. Aggregate output in period 1 is given by

(17)
Y_1(\zeta) = \zeta\,\bar y_1(\zeta)
Consider the proportional effect of a small change in ζ on aggregate output: (18)
\frac{d \log Y_1(\zeta)}{d\zeta} = \frac{1}{\zeta} + \frac{d \log \bar y_1(\zeta)}{d\zeta}
When ζ decreases, there is a smaller fraction of productive islands, so aggregate output mechanically decreases in proportion to ζ. This composition effect corresponds to the first term in (18). The open question is whether a change in ζ also affects the endogenous level of activity in a productive island. This effect is captured by the second term in (18) and will be called the coordination effect. In an unconstrained equilibrium, we know that ȳ₁(ζ) is independent of ζ. Therefore, if money growth follows the Friedman rule and the rate of return on money is equal to β⁻¹, the coordination effect is absent. What happens in a fully constrained equilibrium, that is, when the rate of return on money is sufficiently low? Consider the demand and supply curves in a productive island, y^D(p₁, ζ) and y^S(p₁, ζ), derived above. Applying the implicit function theorem to (15) and (16) yields

\frac{\partial y^D(\bar p_1(\zeta), \zeta)}{\partial \zeta} = \bar p_1(\zeta)\,\frac{U'(1) - U'(1 - \bar p_1(\zeta)\bar y_1(\zeta))}{u''(\bar y_1(\zeta)) + (\bar p_1(\zeta))^2 (1 - \zeta)\, U''(1 - \bar p_1(\zeta)\bar y_1(\zeta))} > 0
and

\frac{\partial y^S(\bar p_1(\zeta), \zeta)}{\partial \zeta} = \theta \bar p_1(\zeta)\,\frac{U'(1) - U'(1 + \bar p_1(\zeta)\bar y_1(\zeta))}{v''(\bar y_1(\zeta)/\theta)/\theta - \theta(\bar p_1(\zeta))^2 (1 - \zeta)\, U''(1 + \bar p_1(\zeta)\bar y_1(\zeta))} > 0
Both inequalities follow from the strict concavity of U. On the demand side, the intuition is the following. In period 1, a consumer in a productive island is concerned about receiving a bad income shock. Given that he is liquidity constrained, this shock will directly lower his consumption from 1 to 1 − p̄₁(ζ)ȳ₁(ζ). An increase in ζ lowers the probability of a bad shock, decreasing the expected marginal value of money and increasing the consumer's willingness to spend, for any given price. On the supply side, as ζ increases, a producer in a productive island expects his partner to spend p̄₁(ζ)ȳ₁(ζ) with higher probability. This generates a negative income effect which induces him to produce more, for any given price. These two effects shift both demand and supply to the right and, under assumption (A1), lead to an increase in equilibrium output.11 This proves the following result.

PROPOSITION 3 —Coordination: Under assumption (A1), in a fully constrained equilibrium, the output in the productive islands, ȳ₁(ζ), is increasing in ζ.

This is the central result of our paper and it shows that when liquidity constraints are binding, there is a positive coordination effect, as consumers and producers try to keep their spending and income decisions aligned. Consumers spend more when they expect their partners to earn more, and producers work more when they expect their partners to spend more. This has two main consequences. First, the impact of an aggregate shock on the aggregate level of activity is magnified, leading to increased volatility. Second, there is a stronger degree of comovement across islands. Let us analyze these two implications formally. Since ȳ₁(ζ) is independent of ζ in an unconstrained equilibrium and is increasing in ζ in a fully constrained equilibrium, equation (18) implies immediately that ∂ log Y₁(ζ)/∂ζ is larger in a fully constrained equilibrium than in an unconstrained equilibrium. This leads to the following result.

11 It is useful to mention what would happen in an environment where the producer and consumer from the same household can communicate (but not exchange money) in period 1. In that case, in a productive island there will be two types of consumers and producers, distinguished by the local shock of their partners. Consumers (producers) paired with low productivity partners will have lower demand (supply). So also in that case, an increase in ζ would lead to an increase in activity in the productive island. However, that case is less tractable, due to the four types of agents involved, and it fails to capture the effect of uncertainty on the agents' decisions.
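To fix ideas, the following is a minimal numerical sketch of the binary-model system (15)–(16). It is not the authors' code: the isoelastic functional forms and all parameter values are purely illustrative assumptions. Solving the two equations at several values of ζ and comparing the resulting ȳ₁(ζ) gives a quick check of the coordination effect in Proposition 3 (note that with ρ₂ = 1 assumption (A1) holds, so output should rise with ζ).

import numpy as np
from scipy.optimize import fsolve

rho1, rho2, eta, theta = 0.5, 1.0, 0.3, 1.0   # assumed parameter values

def residuals(vars, zeta):
    p, y = np.maximum(vars, 1e-9)              # keep the root finder in the positive region
    m = p * y                                  # nominal spending; below 1 at the solution
    # U'(1) = 1 for the isoelastic form, hence the bare "zeta" terms below.
    E_cons = zeta + (1 - zeta) * max(1 - m, 1e-9) ** (-rho2)   # E[U'] faced by the consumer
    E_prod = zeta + (1 - zeta) * (1 + m) ** (-rho2)            # E[U'] faced by the producer
    eq15 = y ** (-rho1) - p * E_cons                 # consumer optimality, equation (15)
    eq16 = (y / theta) ** eta - theta * p * E_prod   # producer optimality, equation (16)
    return [eq15, eq16]

for zeta in (0.4, 0.6, 0.8):
    p, y = fsolve(residuals, x0=[1.0, 0.5], args=(zeta,))
    print(f"zeta={zeta:.1f}: p1_bar={p:.3f}, y1_bar={y:.3f}")   # y1_bar rises with zeta

With these assumed parameters the printed ȳ₁(ζ) is increasing in ζ, which is exactly the comparative static established analytically above.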
PROPOSITION 4 —Amplification: Under assumption (A1), Var[log Y₁(ζ)] is larger in a fully constrained equilibrium than in an unconstrained equilibrium.

To measure comovement we look at the coefficient of correlation between local output in any given island and aggregate output. In an unconstrained equilibrium, there is some degree of correlation between the two, simply because an increase in ζ increases both aggregate output and the probability of a high productivity shock in any given island. However, in a fully constrained equilibrium, the correlation tends to be stronger. Now, even conditional on the island receiving the high productivity shock, an increase in ζ tends to increase both local and aggregate output, due to the coordination effect. This leads to the following result.12

PROPOSITION 5 —Comovement: Under assumption (A1), Corr[y₁(θ, ζ), Y₁(ζ)] is larger in a fully constrained equilibrium than in an unconstrained equilibrium.

3. THE EXTENDED MODEL

In the baseline model, the rate of return on money is the only determinant of the households' access to liquidity. In this section, we allow a fraction of households to have access to credit each period. This introduces an additional dimension of liquidity, which makes the model better suited to interpret the effects of financial innovation and of financial crises, which can be described as changes in the fraction of households with credit access. We also generalize the distribution of local and aggregate shocks. Using this extended model, we first focus on the polar cases of unconstrained and fully constrained equilibria to generalize the main result of the previous section, Proposition 3. Next, we use a numerical example to analyze the model's implications for amplification and comovement, both in the polar cases above and in intermediate cases.

The setup is as in Section 2 except for two differences. First, we allow for general distributions of aggregate and local shocks. The aggregate shock ζ_t has c.d.f. G(·) and support [ζ̲, ζ̄]. Conditional on ζ_t, the local productivity shock θ_t^k has c.d.f. F(·|ζ_t) with support [0, θ̄]. We assume that F(θ|ζ) has an atom at 0, is continuous in θ on (0, θ̄], and is continuous and decreasing in ζ for each θ. The latter property implies that a larger ζ leads to a distribution of θ that first-order stochastically dominates a distribution associated with a lower ζ. As before, a law of large numbers applies, so F(·|ζ) also represents the distribution of productivity shocks across islands.
12 An alternative measure of comovement is the correlation between the level of activity in any given pair of islands, that is, Corr[y₁(θ, ζ), y₁(θ̃, ζ)]. In a setup with independent and identically distributed (i.i.d.) idiosyncratic shocks, the two measures are interchangeable, as it is possible to prove that Corr[y(θ, ζ), y(θ̃, ζ)] = (Corr[y(θ, ζ), Y(ζ)])².
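As a quick illustration of this equivalence, here is a small Monte Carlo sketch (purely illustrative; the binary island shock and the sample sizes are assumptions, not part of the model) that compares the island-to-island correlation with the squared island-to-aggregate correlation.

import numpy as np

rng = np.random.default_rng(1)
T, N = 20000, 500                             # periods and islands (assumed sizes)
zeta = rng.uniform(0.3, 0.7, size=T)          # aggregate shock
y = (rng.random((T, N)) < zeta[:, None]).astype(float)   # i.i.d. binary island outcomes given zeta
Y = y.mean(axis=1)                            # aggregate output

corr_pair = np.corrcoef(y[:, 0], y[:, 1])[0, 1]   # Corr across two islands
corr_agg = np.corrcoef(y[:, 0], Y)[0, 1]          # Corr between one island and the aggregate
print(corr_pair, corr_agg ** 2)               # the two numbers should be approximately equal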
Second, we assume that at the beginning of each period t, a fraction φ of households is randomly selected and their identity can be fully verified during that period, so that they have full credit access. Each island is visited by a representative sample of households: a fraction φ of households with credit access, or credit households, and a fraction 1 − φ of anonymous households who need money to trade goods, or money households. We study stationary equilibria, defined along the lines of Definition 1, and focus on equilibria where the distribution of beginning-of-period money balances is degenerate at m = 1. In subperiod 3, households do not know if they will or will not have access to credit in the next period and hence all choose the same money holdings. Without loss of generality, we can assume that all loans are repaid in subperiod 3. Moreover, we assume that IOUs issued by credit households can circulate and so are a perfect substitute for money. Therefore, we can represent the problem of credit households using the same budget constraints (2)–(4), simply omitting the nonnegativity constraints for m₁(θ, ζ) and m₂(θ, θ̃, ζ). In a stationary equilibrium, the behavior of money households is characterized by the optimality conditions (6)–(9), as in Section 2.2. The assumption that F(·|ζ) has an atom at 0, together with the Inada condition for U, ensures that (6) and (8) always hold as equalities, as in the binary case, while (7) can hold with equality or not depending on the shocks θ and θ̃, and on the monetary regime. The behavior of credit households is described by the same optimality conditions, except for one difference: since they can hold negative nominal balances in subperiod 2, equation (7) always holds as an equality.
separability of the utility function implies that equilibrium output in island θ depends only on the local productivity and is not affected by the distribution of productivities in other islands. Second, consider fully constrained equilibria where money households are always constrained in period 2 and there are no credit households, that is, when γ ≥ γˆ and φ = 0. Following steps similar to Section 2.2, we can show that p2 (ζ) is constant and equal to 1 (under the normalization e2 = 1). Consider a consumer and a producer in island θ. The consumer’s Euler equation and the producer’s optimality condition can be rewritten as (19)
u'(y_1(\theta,\zeta)) = p_1(\theta,\zeta) \int_0^{\bar\theta} U'(c_2(\tilde\theta, \theta, \zeta))\, dF(\tilde\theta|\zeta)

(20)    v'(y_1(\theta,\zeta)/\theta) = \theta p_1(\theta,\zeta) \int_0^{\bar\theta} U'(c_2(\theta, \tilde\theta, \zeta))\, dF(\tilde\theta|\zeta)
where, from the consumer’s budget constraints in periods 1 and 2, (21)
c_2(\theta, \tilde\theta, \zeta) = 1 - p_1(\tilde\theta,\zeta) y_1(\tilde\theta,\zeta) + p_1(\theta,\zeta) y_1(\theta,\zeta)
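For intuition about how this system pins down prices and quantities, the following sketch iterates on nominal income x(θ, ζ) = p₁(θ, ζ)y₁(θ, ζ), island by island, until (19)–(21) hold; it anticipates the fixed-point argument described below. Everything in it is an illustrative assumption: the small discrete grid for θ, the isoelastic functional forms, and the helper name solve_island are not from the paper.

import numpy as np
from scipy.optimize import fsolve

rho1, rho2, eta = 0.5, 1.0, 0.3
u_p = lambda c: np.maximum(c, 1e-9) ** (-rho1)    # u'(c), clipped away from zero
U_p = lambda c: np.maximum(c, 1e-9) ** (-rho2)    # U'(c)
v_p = lambda n: np.maximum(n, 1e-9) ** eta        # v'(n)

# Discretized F(.|zeta): an atom at theta = 0 plus a few positive productivity levels.
theta_grid = np.array([0.0, 0.6, 0.9, 1.2, 1.5])
weights    = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

def solve_island(theta, x, w):
    """Solve (19)-(20) for one island, taking other islands' nominal incomes x as given."""
    def res(z):
        p, y = z
        c_cons = 1.0 - p * y + x           # period-2 consumption when the consumer is here (eq. 21)
        c_prod = 1.0 - x + p * y           # period-2 consumption when the producer is here
        eq19 = u_p(y) - p * np.sum(w * U_p(c_cons))
        eq20 = v_p(y / theta) - theta * p * np.sum(w * U_p(c_prod))
        return [eq19, eq20]
    p, y = fsolve(res, x0=[1.0, 0.5])
    return p * y

def T(x):
    """One update of nominal income in every island (the map used below)."""
    return np.array([solve_island(th, x, weights) if th > 0 else 0.0 for th in theta_grid])

x = np.zeros_like(theta_grid)
for _ in range(200):                       # iterate to an (approximate) fixed point
    x_new = T(x)
    if np.max(np.abs(x_new - x)) < 1e-10:
        break
    x = x_new
print(np.round(x, 4))                      # nominal income, increasing in theta

With ρ₁ = 0.5 the lower and upper bounds in assumption (A2) hold, so the update should behave like a contraction and the resulting profile of nominal incomes is increasing in θ, in line with Proposition 7 below.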
Equations (19) and (20) are analogous to (15) and (16), and represent the demand and supply in island θ, taking as given prices and quantities in other islands. They define two functional equations in p1 (· ζ) and y1 (· ζ). In the Appendix, we show that this pair of functional equations has a unique solution. To do so, we define nominal income x(θ ζ) ≡ p1 (θ ζ)y1 (θ ζ) and solve a fixed point problem for the function x(· ζ). To solve our fixed point problem, we use a contraction mapping argument, making the assumption
(A2)    −c u″(c)/u′(c) ∈ [ρ, 1)    for all c,

for some ρ > 0. The upper bound on −cu″(c)/u′(c) is needed to ensure that the demand elasticity in a given island θ is high enough. This guarantees that in islands where productivity is higher, prices do not fall too much, so that nominal income is increasing in θ. That is, producers in more productive islands receive higher earnings. This property is both economically appealing and useful on technical grounds, as it allows us to prove the monotonicity of
the mapping used in our fixed point argument.13 The lower bound ρ is used to prove the discounting property of the same mapping.14 As in the binary model, we can then characterize a fully constrained equilibrium and find a cutoff γ̂ such that such an equilibrium exists whenever γ ≥ γ̂.

PROPOSITION 7: In the extended model, under assumption (A2), there is a cutoff γ̂ > β such that a fully constrained stationary monetary equilibrium exists if and only if φ = 0 and γ ≥ γ̂. In equilibrium, both output y₁(θ, ζ) and nominal income p₁(θ, ζ)y₁(θ, ζ) are monotone increasing in θ.

We could prove existence under weaker conditions by using a different fixed point argument. However, the contraction mapping approach helps us derive the coordination result in Proposition 8 below.

3.2. Aggregate Implications

We now turn to the analysis of the impact of the aggregate shock ζ on the equilibrium allocation. Aggregate output in period 1 is given by

(22)
Y_1(\zeta) \equiv \int_0^{\bar\theta} y_1(\theta, \zeta)\, dF(\theta|\zeta)
where y1 (θ ζ) ≡ φy1C (θ ζ) + (1 − φ)y1M (θ ζ), given that in each island θ there is a fraction φ of credit households (C) and a fraction 1 − φ of money households (M). The proportional response of output to a small change in ζ can be decomposed as in the binary case: (23)
\frac{d \ln Y_1}{d\zeta} = \frac{1}{Y_1} \int_0^{\bar\theta} y_1(\theta, \zeta)\, \frac{\partial f(\theta|\zeta)}{\partial \zeta}\, d\theta + \frac{1}{Y_1} \int_0^{\bar\theta} \frac{\partial y_1(\theta, \zeta)}{\partial \zeta}\, dF(\theta|\zeta)
The first term is the mechanical composition effect of having a larger fraction of more productive islands. This effect is positive both in an unconstrained and in a fully constrained equilibrium. This follows from the fact that an increase in

13 It is useful to mention alternative specifications which can deliver the same result (nominal income increasing in θ) without imposing restrictions on risk aversion in period 1. One possibility is to introduce local shocks as preference shocks. For example, we could assume that the production function is the same in all islands while the utility function takes the form θu(c₁), where θ is the local shock. In this case, it is straightforward to show that both p₁(θ, ζ) and y₁(θ, ζ) are increasing in θ, irrespective of the curvature of u. This immediately implies that nominal income is increasing in θ. Another possibility is to use more general preferences, which allow us to distinguish risk aversion from the elasticity of intertemporal substitution. For example, using a version of Epstein and Zin (1989) preferences, it is possible to show that our results only depend on the elasticity of substitution between c₁ and c₂, and not on risk aversion.

14 This assumption is minimally restrictive, as ρ is only required to be nonzero.
ζ leads to a first-order stochastic shift in the distribution of θ and that y₁(θ, ζ) is increasing in θ in both regimes, as shown in Propositions 6 and 7. The second term in (23) is our coordination effect. As in the binary case, this effect is zero in an unconstrained equilibrium, since, by Proposition 6, output in any island θ is independent of the economy-wide distribution of productivity. Turning to a fully constrained equilibrium, we can generalize Proposition 3 and show that y₁(θ, ζ) is increasing in ζ for any realization of the local productivity shock θ. For this result, we make a stronger assumption than the one used in the binary model, that is, we assume that U has a coefficient of relative risk aversion smaller than 1:

(A1′)
−c U″(c)/U′(c) ≤ 1    for all c.
This condition is sufficient to prove that the labor supply in each island is positively sloped. Assumption (A1′) is stronger than needed, as numerical examples show that the labor supply is positively sloped also for parametrizations with a coefficient of relative risk aversion greater than 1. In fact, the coordination effect can be more powerful when agents are more risk averse.

PROPOSITION 8 —Coordination: Consider the extended model. Under assumptions (A1′) and (A2), in a fully constrained equilibrium, for each θ > 0, the output y₁(θ, ζ) is increasing in ζ.

To understand the mechanism behind this result, consider the following partial equilibrium exercise. Focus on island θ, taking as given p₁(θ̃, ζ) and y₁(θ̃, ζ) for all θ̃ ≠ θ. Consider the demand and supply equations (19) and (20). Since, from Proposition 7, p₁(θ̃, ζ)y₁(θ̃, ζ) is increasing in θ̃, it follows that U'(c₂(θ̃, θ, ζ)) is decreasing in θ̃ and U'(c₂(θ, θ̃, ζ)) is increasing in θ̃. Hence, when ζ increases, the integral on the right-hand side of (19) decreases, while the integral on the right-hand side of (20) increases.15 The intuition is similar to that for the binary model. When a liquidity constrained consumer expects higher income, his marginal value of money decreases and he increases consumption for any p₁(θ, ζ). When a producer expects higher spending by his partner, he faces a negative income effect and produces more for any p₁(θ, ζ). The first effect shifts the demand curve to the right, the second shifts the supply curve also to the right. Therefore, equilibrium output increases in island θ. On top of this partial equilibrium mechanism, there is a general equilibrium feedback due to the endogenous response of prices and quantities in islands θ̃ ≠ θ. This magnifies the initial effect. As nominal output in all other islands increases, there is a further increase in the marginal value of money for the

15 This follows immediately from the fact that an increase in ζ leads to a shift of the distribution of θ in the sense of first-order stochastic dominance.
consumers and a further decrease for the producers, leading to an additional increase in output. Summing up, the coordination effect identified in Proposition 8 is driven by the agents' expectations regarding nominal income in other islands. This effect tends to magnify the output response to aggregate shocks in a fully constrained economy and to generate more comovement across islands. Going back to equation (23), we have established that the coordination effect is zero in the unconstrained case and positive in the fully constrained case. However, this is not sufficient to establish that output volatility is greater in the constrained economy, since we have not compared the relative magnitude of the compositional effect, which is positive in both cases. In the binary model, the comparison was unambiguous, given that this effect was identical in the two regimes. However, with general shock distributions, it is difficult to compare the relative size of this effect in the two regimes and to obtain the analogues of Propositions 4 and 5. Therefore, to gauge the implications of our coordination effect for volatility and comovement, we turn to a numerical example.

3.3. Numerical Example

We use a numerical example both to analyze the aggregate implications of our coordination effect in the two polar cases analyzed above and to study intermediate cases in which the fraction of households with credit access φ is in (0, 1) and the money growth rate γ is in the intermediate range (β, γ̂). We choose isoelastic instantaneous utility functions: u(c) = c^{1−ρ₁}/(1 − ρ₁), U(c) = c^{1−ρ₂}/(1 − ρ₂), and v(n) = n^{1+η}/(1 + η). There are two aggregate states, ζ_L and ζ_H, with probabilities α and 1 − α. Conditional on the aggregate state, the shock θ is log-normally distributed with mean μ_H in state ζ_H, mean μ_L in state ζ_L, and variance σ².16 We interpret each sequence of three subperiods as a year and set the discount factor β to 0.97. We normalize e₂ = 1 and set ρ₁ = 0.5, ρ₂ = 1, η = 0.3, and e₃ = 3.17 For the shock distribution, we choose α = 0.2, μ_L = 0.5, μ_H = 0.56, and σ² = 0.19.18 The aggregate shock μ_H − μ_L is chosen to deliver a standard deviation of log Y₁ equal to 0.05 in the first-best allocation. In Figure 1 we look at the effects of different liquidity regimes on volatility and comovement, plotting the standard deviation of log Y₁ (top panel) and

16 Even though this distribution does not have an atom at 0, in our example, consumers never exhaust their money balances in period 1.

17 The parameters ρ₁, η, and e₃ are chosen to roughly match the empirical relation between money velocity and the nominal interest rate in the postwar U.S. data (following Lucas (2000), Lagos and Wright (2005), and Craig and Rocheteau (2008)).

18 The variance of the idiosyncratic shock σ² yields a standard deviation of income volatility at the household level equal to 0.2, consistent with estimates in Hubbard, Skinner, and Zeldes (1994).
FIGURE 1.—Volatility and comovement.
the correlation coefficient of y₁ and Y₁ (bottom panel) as functions of γ for different levels of φ. With φ = 1, all consumers have perfect access to credit and the equilibrium achieves the first-best allocation, so both volatility and comovement are independent of γ. With φ = 0, volatility and comovement increase with γ until the economy reaches the fully constrained equilibrium for γ ≥ γ̂. In the intermediate case with φ = 0.5, volatility and comovement are both increasing in γ and, for each γ > β, achieve intermediate levels relative to the two extreme cases. The figure shows that as the rate of return on money falls (that is, as γ increases) or the fraction of households with access to credit decreases, both volatility and comovement increase. In particular, a fully constrained economy is significantly more volatile than the unconstrained economy, but the relation between γ and volatility is concave, and relatively large effects already appear when γ is not far from the Friedman rule (e.g., at γ = 1). Experimenting with the parameters shows that increasing the elasticity of labor supply by lowering η tends to lead to larger effects. Increasing the second period risk aversion ρ₂ can increase or decrease volatility, depending on the initial parameters. A higher ρ₂ increases the precautionary motive, making households more responsive to a negative shock. However, there is a countervailing force, as a higher ρ₂ also increases the equilibrium value of real balances.
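For readers who want a rough anchor for these magnitudes, the following sketch computes the two Figure 1 statistics for the unconstrained, first-best benchmark only, where just the composition effect operates. It is not the authors' code: the decimal points in the calibration are inferred from the text above, and the simulation design (sample sizes, random seed) is an assumption.

import numpy as np

rng = np.random.default_rng(0)
alpha, mu_L, mu_H, sigma2 = 0.2, 0.5, 0.56, 0.19   # calibration as read above
rho1, eta = 0.5, 0.3
expo = (1 + eta) / (rho1 + eta)       # first-best output y*(theta) = theta**expo, from theta*u'(y) = v'(y/theta)

n_periods, n_islands = 5000, 1000
low_state = rng.random(n_periods) < alpha          # state zeta_L occurs with probability alpha
mu = np.where(low_state, mu_L, mu_H)

theta = np.exp(mu[:, None] + np.sqrt(sigma2) * rng.standard_normal((n_periods, n_islands)))
y = theta ** expo                     # island-level output in the unconstrained benchmark
Y = y.mean(axis=1)                    # aggregate output each period

print("std of log Y1 :", np.log(Y).std())
print("corr(y1, Y1)  :", np.corrcoef(y[:, 0], Y)[0, 1])

In a fully constrained economy the same statistics would have to be computed from the solved policy functions y₁(θ, ζ), and both would be larger, as the figure shows.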
4. NEWS SHOCKS

Consider now the general model of Section 3, with the only difference that the aggregate shock ζ is not observed by the households in period 1. Instead, they all observe a public signal ξ ∈ [ξ̲, ξ̄], which is drawn at the beginning of each period, together with the aggregate shock ζ, from a continuous distribution with joint density function g(ζ, ξ). Taking an agent located in an island with productivity θ, his posterior density regarding ζ can be derived using Bayes' rule:

g(\zeta|\xi, \theta) = \frac{f(\theta|\zeta)\, g(\zeta, \xi)}{\int_{\underline\zeta}^{\bar\zeta} f(\theta|\tilde\zeta)\, g(\tilde\zeta, \xi)\, d\tilde\zeta}
The distribution g(ζ|ξ, θ) is then used to derive the agent's posterior beliefs regarding θ̃ in the island where his partner is located:

F(\tilde\theta|\xi, \theta) = \int_{\underline\zeta}^{\bar\zeta} F(\tilde\theta|\zeta)\, g(\zeta|\xi, \theta)\, d\zeta
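A small numerical sketch of these two objects, for illustration only: the grid for ζ, the lognormal density used for f(θ|ζ), and the Gaussian signal density are all assumptions, not the paper's specification.

import numpy as np
from scipy.stats import lognorm, norm

zeta_grid = np.linspace(0.2, 0.8, 121)        # assumed support of zeta
dz = zeta_grid[1] - zeta_grid[0]

f_theta = lambda th, z: lognorm.pdf(th, s=0.4, scale=np.exp(z))   # f(theta|zeta), assumed lognormal
F_theta = lambda th, z: lognorm.cdf(th, s=0.4, scale=np.exp(z))   # F(theta|zeta)
g_signal = lambda z, xi: norm.pdf(xi, loc=z, scale=0.1)           # signal density given zeta (assumed)

def posterior(xi, theta):
    """g(zeta | xi, theta) on the grid, by Bayes' rule (uniform prior over the grid)."""
    kernel = f_theta(theta, zeta_grid) * g_signal(zeta_grid, xi)
    return kernel / (kernel.sum() * dz)

def partner_cdf(theta_tilde, xi, theta):
    """F(theta_tilde | xi, theta): posterior belief about the partner's island."""
    return np.sum(F_theta(theta_tilde, zeta_grid) * posterior(xi, theta)) * dz

# A higher signal xi is "good news": the posterior c.d.f. of the partner's productivity falls.
print(partner_cdf(1.5, xi=0.3, theta=1.5))
print(partner_cdf(1.5, xi=0.7, theta=1.5))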
We will make the assumption that F(θ̃|ξ, θ) is nonincreasing in ξ for any pair (θ, θ̃). This means that, conditional on θ, the signal ξ is "good news" for θ̃, in the sense of Milgrom (1981). We also make the natural assumption that F(θ̃|ξ, θ) is nonincreasing in θ. In period 2, the actual shock ζ is publicly revealed. In this environment, we study a stationary equilibrium along the lines of that described in Section 3. Prices and allocations now depend on the local shocks and on the aggregate shocks ξ and ζ. In particular, prices and quantities in period 1 depend only on θ and ξ, given that ζ is not in the information set of the households in that period. Aggregate output in period 1 becomes

(24)
Y_1(\zeta, \xi) \equiv \int_0^{\bar\theta} y_1(\theta, \xi)\, dF(\theta|\zeta)
We can now look separately at the output response to the productivity shock ζ and to the news shock ξ. In particular, the next proposition shows that the output response to ζ is positive both in an unconstrained and in a fully constrained equilibrium, while the output response to the signal ξ is positive only in the fully constrained case.

PROPOSITION 9: Consider an economy with imperfect information regarding the aggregate shock. Under assumptions (A1′) and (A2), in an unconstrained equilibrium, ∂Y₁(ζ, ξ)/∂ζ > 0 and ∂Y₁(ζ, ξ)/∂ξ = 0. In a fully constrained equilibrium, ∂Y₁(ζ, ξ)/∂ζ > 0 and ∂Y₁(ζ, ξ)/∂ξ > 0.
This result is not surprising in light of the analysis in the previous sections. Compare the expression for aggregate output under imperfect information (24) with the corresponding expression in the case of full information (22). By definition, the productivity shock ζ affects the distribution of local shocks F(·|ζ) in both cases. However, the trading decisions of anonymous households in island θ are affected only by the agents' expectations about that distribution, which, in the case of imperfect information, are driven by the signal ξ. It follows that the effect of ζ is analogous to the mechanical composition effect in the model with full information on ζ, while the effect of ξ is analogous to the coordination effect. The advantage of an environment with imperfect information is that these two effects can be disentangled. In an unconstrained economy, as we know from Proposition 6, output in island θ is independent of the economy-wide distribution of productivity and thus does not respond to ξ. The result that the output response to ξ is positive in a fully constrained economy is a natural extension of Proposition 8. In island θ, trading is lower the more pessimistic agents are about trading in all other islands. The only difference is how expectations are formed. The perceived distribution of productivities for an agent in island θ depends now on the signal ξ instead of on the actual ζ. A negative signal ξ makes both consumers and producers in island θ more pessimistic about trading in other islands, even if the underlying ζ is unchanged. This highlights that expectations are at the core of our amplification result.

5. CONCLUDING REMARKS

In this paper, we have analyzed how different liquidity regimes affect the response of an economy to aggregate shocks. A liquidity regime depends both on the households' access to credit and on the value of their money holdings. We show that in regimes where liquidity constraints are binding more often, there is a coordination motive in the agents' trading decisions. This generates both an amplified response to aggregate shocks and a larger degree of comovement. Our mechanism is driven by the combination of risk aversion, idiosyncratic uncertainty, and decentralized trade. All three ingredients are necessary for the mechanism to operate. Risk aversion and idiosyncratic risk give rise to an insurance problem. Decentralized trade implies that agents with no access to credit can only self-insure using their money holdings.19 A nice feature of our setup is that simply by changing the credit and monetary regimes, we move from an environment in which idiosyncratic risk is perfectly insurable (unconstrained equilibrium) to an environment in which idiosyncratic risk is completely uninsurable (fully constrained equilibrium). In this sense, the mechanism identified in this paper speaks more broadly about the effect of uninsurable idiosyncratic risk on aggregate behavior.

19 Reed and Waller (2006) also pointed out the risk sharing implications of different monetary regimes in a model à la Lagos and Wright (2005).
For analytical tractability, we have developed our argument in a periodic framework à la Lagos and Wright (2005). This framework is clearly special in many respects and, in particular, displays no endogenous source of persistence. It would be interesting to investigate, numerically, the quantitative implications of our mechanism in a version of the model that allows for richer dynamics of individual asset positions.20 A similar extension would also help to clarify the relation between our results and the literature on the aggregate implications of imperfect risk sharing, such as Krusell and Smith (1998).21

The current crisis is a good example of how anticipated changes in access to liquidity can have a substantial impact on aggregate activity. Our model provides a possible interpretation of these effects: a reduction in credit access induces consumers to be more cautious in their spending decisions and more concerned about their income prospects in the near future, making the recession worse. Our model can also be applied to interpret the effects of more gradual regime changes: for example, many have argued that a gradual increase in credit access for households and firms has contributed to a reduction in aggregate volatility in the United States after the mid 1980s. The model's predictions are qualitatively consistent with this story and also emphasize that the high inflation of the 1970s, by reducing the value of the real balances in the hands of consumers, may have further increased volatility in the pre-1982 period.

APPENDIX

PROOF OF PROPOSITION 1: In the main text we show that γ = β is a necessary condition for an unconstrained equilibrium and that an unconstrained equilibrium achieves first-best efficiency in period 1. In period 2, if the liquidity constraint is slack, all households consume the same amount, as U'(c₂(θ, θ̃, ζ)) = p₂(ζ)/p₃ for all θ and θ̃. By market clearing, c₂(θ, θ̃, ζ) must then be equal to e₂. Since any stationary allocation c₃(θ, θ̃, ζ) is consistent with first-best efficiency, this completes the proof of efficiency. It remains to prove that γ = β is sufficient for an unconstrained equilibrium to exist. To do so, we construct such an equilibrium. Let the prices be

p_1(\theta) = p_3\, u'(y_1^*(\theta))    for all θ,
p_2 = p_3\, U'(e_2),

and let p₃ take any value in (0, p̂₃], where p̂₃ ≡ [u'(y₁*(θ))y₁*(θ) + U'(e₂)e₂]⁻¹. From the argument above, the consumption levels in periods 1 and 2 must be
20 See, for example, the computational approach in Molico (2006).

21 In Krusell and Smith (1998), the entire capital stock of the economy is a liquid asset and the presence of uninsurable idiosyncratic risk has minor effects on aggregate dynamics. To explore our mechanism, it would be interesting to assume that capital is at least partially illiquid.
at their first-best level. Substituting in the budget constraints the prices above and the first-best consumption levels in periods 1 and 2, we obtain

c_3(\theta, \tilde\theta, \zeta) = e_3 - u'(y_1^*(\tilde\theta))\,y_1^*(\tilde\theta) + u'(y_1^*(\theta))\,y_1^*(\theta)

The assumption that e₃ is large ensures that c₃(θ, θ̃) > 0 for all θ and θ̃. Moreover, it is easy to show that money holdings are nonnegative thanks to the assumption p₃ ≤ p̂₃. It is also easy to check that the allocation is individually optimal and satisfies market clearing, completing the proof. Q.E.D.

PROOF OF LEMMA 1: Applying the implicit function theorem to (15) and (16), we obtain

(25)
\frac{\partial y^D(p_1, \zeta)}{\partial p_1} = \frac{\zeta U'(e_2) + (1 - \zeta)U'(e_2 - p_1\bar y_1) - p_1\bar y_1(1 - \zeta)U''(e_2 - p_1\bar y_1)}{u''(\bar y_1) + p_1^2(1 - \zeta)U''(e_2 - p_1\bar y_1)}

(26)    \frac{\partial y^S(p_1, \zeta)}{\partial p_1} = \theta\,\frac{\zeta U'(e_2) + (1 - \zeta)U'(e_2 + p_1\bar y_1) + p_1\bar y_1(1 - \zeta)U''(e_2 + p_1\bar y_1)}{v''(\bar y_1/\theta)/\theta - \theta p_1^2(1 - \zeta)U''(e_2 + p_1\bar y_1)}
The concavity of u and U imply that the numerator of (25) is positive and the denominator is negative, proving that ∂y D (p1 ζ)/∂p1 < 0. The concavity of U and the convexity of v show that the denominator of (26) is positive. It remains to show that the numerator is also positive. The following chain of inequalities is sufficient for that: ζU (e2 ) + (1 − ζ)U (e2 + p1 y 1 ) + (1 − ζ)p1 y 1 U (e2 + p1 y 1 ) > U (e2 + p1 y 1 ) + (1 − ζ)(e2 + p1 y 1 )U (e2 + p1 y 1 ) ≥ 0 The first inequality follows because the concavity of U implies U (e2 ) > U (e2 + p1 y 1 ) and e2 U (e2 + p1 y 1 ) < 0. The second inequality follows from assumption (A1), completing the proof that ∂y S (p1 ζ)/∂p1 > 0. Existence can be shown using similar arguments as in the proof of Lemma 2 below. Uniqueness follows immediately. Q.E.D. PROOF OF PROPOSITION 2: First, we complete the characterization of a fully constrained equilibrium, presenting the steps omitted in the text. Then we will define γˆ and prove that such an equilibrium exists iff γ ≥ γ. ˆ Suppose for the
1776
V. GUERRIERI AND G. LORENZONI
moment that (15) and (16) have a unique solution, p1 (θ ζ) and y1 (θ ζ). In unproductive islands, output and nominal output are zero, y1 (0 ζ) = 0 and p1 (0 ζ)y1 (0 ζ) = 0. From the consumer’s budget constraint in period 2, we obtain ˜ ζ) = e2 − p1 (θ ˜ ζ)y1 (θ ˜ ζ) + p1 (θ ζ)y1 (θ ζ) c2 (θ θ The price level in unproductive islands is obtained from the Euler equation (6),
˜ ζ))|θ ˜ ζ −1 p1 (0 ζ) = u (0) E U (c2 (0 θ From the consumer’s budget constraint in period 3, we obtain c3 = e3 . Combining the Euler equations (6) and (8) and the envelope condition (10), p3 is uniquely pinned down by (27)
1 ˜ ζ)) = βγ −1 E U (c2 (θ θ p3
The only optimality condition that remains to be checked is the Euler equation ˜ ζ) and in period 2—equation (7). Notice that given our construction of c2 (θ θ ˜ ˜ ˜ the concavity of U U (c2 (θ θ ζ)) ≥ minζ˜ U (c2 (θ θ ζ)) for all θ θ ζ. It fol˜ ζ is lows that a necessary and sufficient condition for (7) to hold for all θ θ (28)
min U (c2 (θ θ ζ)) ≥ ζ
1 p3
We now define the cutoff ˜ ζ)) E U (c2 (θ θ γˆ ≡ β min U (c2 (θ θ ζ)) ζ
and prove the statement of the proposition. Using (27) to substitute for p3 , condition (28) is equivalent to γ ≥ γ. ˆ Therefore, if an unconstrained equilibrium exists, (28) implies γ ≥ γ, ˆ proving necessity. If γ ≥ γ, ˆ the previous steps show how to construct a fully constrained equilibrium, proving sufficiency. In the case where (15) and (16) have multiple solutions, one can follow the steps above and find a value of γˆ for each solution. The smallest of these values gives us the desired cutoff. Under assumption (A1), Lemma 1 ensures that (15) and (16) have a unique solution and, from the characterization above, the fully constrained equilibrium is unique. Q.E.D. PROOF OF PROPOSITION 4: The argument in the text shows that d log Y1 (ζ)/ dζ is larger in a fully constrained equilibrium, for all ζ ∈ [ζL ζH ], which implies that log Y1 (ζH ) − log Y1 (ζL ) is larger as well. This proves our statement, since Var[log Y1 (ζ)] = α(1 − α)[log Y1 (ζH ) − log Y1 (ζL )]2 . Q.E.D.
LIQUIDITY AND TRADING DYNAMICS
1777
PROOF OF PROPOSITION 5: Let μy = E[y1 (θ ζ)]. Since Y1 (ζ) = E[y1 (θ ζ)|ζ], we have Cov[y1 (θ ζ) Y1 (ζ)] = E E (y1 (θ ζ) − μy )(Y1 (ζ) − μy )|ζ = Var[Y1 (ζ)] and, hence, Corr[y1 (θ ζ) Y1 (ζ)] = (Var[Y1 (ζ)]/ Var[y1 (θ ζ)])1/2 . Using the decomposition Var[y1 (θ ζ)] = Var[Y1 (ζ)] + E[Var[y1 (θ ζ)|ζ]], rewrite this correlation as Corr[y1 (θ ζ) Y1 (ζ)] E Var[y1 (θ ζ)|ζ] −1/2 = 1+ Var[Y1 (ζ)] −1/2 (1 − α)ζL (1 − ζL )(y 1 (ζL ))2 + αζH (1 − ζH )(y 1 (ζH ))2 = 1+ α(1 − α)(ζH y 1 (ζH ) − ζL y 1 (ζL ))2 Therefore, the correlation is lower in the unconstrained economy if and only if f (ν U ) < f (ν C ), where ν U and ν C denote, respectively, the ratio y 1 (ζH )/y 1 (ζL ) in the unconstrained and in the fully constrained regimes, and the function f (ν) is defined as f (ν) ≡
α(1 − α)(ζH ν − ζL )2 (1 − α)ζL (1 − ζL ) + αζH (1 − ζH )ν 2
Notice that f (ν) is continuous and differentiable, ν U = 1 from Proposition 1, and ν C > 1 from Proposition 3. Therefore, to prove our statement, it is sufficient to show that f (ν) > 0 for ν ≥ 1. Differentiating f (ν) shows that f (ν) has the same sign as ζH (ζH ν − ζL )((1 − α)ζL (1 − ζL ) + αζH (1 − ζH )ν 2 ) − αζH (1 − ζH )(ζH ν − ζL )2 ν Since ζH > ζL , if ν ≥ 1, then ζH ν − ζL > 0. Some algebra shows that the expression above has the same sign as (1 − α)(1 − ζL ) + α(1 − ζH )ν and is always positive, completing the proof. Q.E.D. PROOF OF PROPOSITION 6: It is easy to generalize the first-best allocation described in Section 2.1 for the binary model. Solving the planner problem for the extended model, the optimal output level in period 1, in island θ, is equal to the y1∗ (θ) that satisfies (29)
θu (y1∗ (θ)) = v (y1∗ (θ)/θ)
1778
V. GUERRIERI AND G. LORENZONI
˜ ζ) = c2M (θ θ ˜ ζ) = e2 for all θ θ, ˜ Optimal consumption in period 2 is c2C (θ θ and ζ, where C and M denote, respectively, credit and money households. Next, we prove that any unconstrained equilibrium achieves a first-best al˜ and ζ, for both credit and location. Since (7) holds as an equality for all θ, θ, i ˜ ζ) is equal to a constant c2 for all money households, it follows that c2 (θ θ ˜ θ, θ, ζ, and i = C M. Then market clearing requires c2 = e2 . Substituting in (6) (for a consumer in island θ) and (9) (for a producer in island θ), and given that (6) holds as an equality, we obtain c1C (θ ζ) = c1M (θ ζ) = c1 (θ) and nC (θ ζ) = nM (θ ζ) = n(θ) for all θ and ζ, where u (c1 (θ)) =
p1 (θ) p1 (θ) U (e2 ) and v (n(θ)) = θ U (e2 ) p2 p2
These two conditions, and market clearing in island θ, imply that y1C (θ ζ) = y1M (θ ζ) = y1∗ (θ) as defined by the planner optimality condition (29). Therefore, consumption levels in periods 1 and 2 achieve the first best. Since any consumption allocation in period 3 is consistent with first-best efficiency, this completes the argument. The proof that γ = β is necessary for an unconstrained equilibrium to exist is the same as in the binary model. To prove sufficiency, when γ = β, we can construct an unconstrained equilibrium with prices p1 (θ) = p3 u (y1∗ (θ))
for all θ
p2 = p3 U (e2 ) for some p3 ∈ (0 pˆ 3 ], where pˆ 3 ≡ [u (y1∗ (θ))y1∗ (θ) + U (e2 )e2 ]−1 . From the argument above, consumption levels in periods 1 and 2 are at their first-best level. Substituting in the budget constraints the prices above and the first-best consumption levels in periods 1 and 2, we obtain ˜ ζ) = c3M (θ θ ˜ ζ) = e3 − u (y1∗ (θ))y ˜ 1∗ (θ) ˜ + u (y1∗ (θ))y1∗ (θ) c3C (θ θ Moreover, choosing any p3 ≤ pˆ 3 ensures that money holdings are nonnegative. It is straightforward to check that this allocation satisfies market clearing and that it is individually optimal, completing the proof. Finally, it is easy to show that y1∗ (θ) is increasing by applying the implicit function theorem to the planner’s optimality condition (29). Q.E.D. Preliminary Results for Proposition 7 To prove Proposition 7, it is useful to prove some preliminary lemmas, which will be used to show that the system of functional equations (19) and (20) has a unique solution (p1 (· ζ) y1 (· ζ)) for a given ζ. These results will also be useful to prove Proposition 8.
LIQUIDITY AND TRADING DYNAMICS
1779
Let us define a fixed point problem for the function x(· ζ). Recall from the text that x(θ ζ) ≡ p1 (θ ζ)y1 (θ ζ). To save on notation, in the lemmas we fix ζ and refer to p1 (· ζ), y1 (· ζ), x(· ζ), and F(· ζ) as p(·), y(·), x(·), and F(·). Notice that in an island where θ = 0, x(0) = 0. Moreover, nonnegativity of consumption in period 2 requires that x(θ) ≤ 1 for all θ. Therefore, we restrict attention to the set of measurable, bounded functions x : [0 θ] → [0 1] that satisfy x(0) = 0. We use X to denote this set. LEMMA 2: Given θ > 0 and a function x ∈ X, there exists a unique pair (p y) which solves the system of equations (30)
u (y) − p
θ
0
(31)
˜ dF(θ) ˜ = 0 U (1 − py + x(θ))
v (y/θ) − θp
θ
˜ + py) dF(θ) ˜ = 0 U (1 − x(θ)
0
The pair (p y) satisfies py ∈ [0 1]. PROOF: We proceed in two steps, first we prove existence, then uniqueness. Step 1—Existence. For a given p ∈ (0 ∞), it is easy to show that there is a unique y which solves (30) and a unique y which solves (31), which we denote, respectively, by y D (p) and y S (p). Finding a solution to (30) and (31) is equivalent to finding a p that solves (32)
y D (p) − y S (p) = 0
It is straightforward to prove that y D (p) and y S (p) are continuous on (0 ∞). We now prove that they satisfy four properties: (a) py D (p) < 1 for all p ∈ (0 ∞), (b) y S (p) < θn¯ for all p ∈ (0 ∞), (c) lim supp→0 y D (p) = ∞, and (d) lim supp→∞ py S (p) = ∞. Notice that x(0) = 0 with positive probability, so the Inada condition for U can be used to prove property (a). Similarly, to prove property (b), we can use the assumption limn→n¯ v (n) = ∞. To prove (c) notice that (a) implies lim supp→0 py D (p) ≤ 1. If lim supp→0 py D (p) = 1, then we immediately have lim supp→0 y D (p) = ∞. If, instead, lim supp→0 py D (p) < 1, then there exists a K ∈ (0 1) and an ε > 0 such that py D (p) < K for all p ∈ (0 ε). ˜ is bounded Since U is decreasing, this implies that U (1 − py D (p) + x(θ)) above by U (1 − K) < ∞ for all p ∈ (0 ε), which implies lim p
p→0
θ
˜ dF(θ) ˜ = 0 U (1 − py D (p) + x(θ))
0
Using (30), this requires limp→0 u (y D (p)) = 0 and, hence, limp→0 y D (p) = ∞. To prove property (d), suppose, by contradiction, that there exist a K > 0 and
1780
V. GUERRIERI AND G. LORENZONI
˜ + py S (p)) is a P > 0, such that py S (p) ≤ K for all p ≥ P. Then U (1 − x(θ) bounded below by U (1 + K) > 0 for all p ∈ (P ∞), which implies θ ˜ + py S (p)) dF(θ) ˜ = ∞ (33) U (1 − x(θ) lim p p→∞
0
Moreover, since 0 ≤ py S (p) ≤ K for all p ≥ P, it follows that limp→∞ y S (p) = 0 and thus (34)
lim v (y S (p)/θ) < ∞
p→∞
Using equation (31), conditions (33) and (34) lead to a contradiction, completing the proof of (d). Properties (a) and (d) immediately imply lim supp→∞ (p × y S (p)−py D (p)) = ∞, while (b) and (c) imply lim supp→0 (y D (p)−y S (p)) = ∞. It follows that there exists a pair (p p ), with p < p , such that y D (p ) − y S (p ) > 0 and y D (p ) − y S (p ) < 0. By the intermediate value theorem there exists a p which solves (32). Property (a) immediately implies that py ∈ [0 1], where y = y D (p) = y S (p). ˆ = y S (p). ˆ To Step 2—Uniqueness. Let pˆ be a zero of (32) and let yˆ = y D (p) D S show uniqueness, it is sufficient to show that dy (p)/dp − dy (p)/dp < 0 at ˆ Applying the implicit function theorem gives p = p. θ θ ˜ − pˆ yˆ ˜
D U (c˜2D ) dF(θ) U (c˜2D ) dF(θ) dy (p) 0 0 = θ dp p=pˆ D 2 ˜ ˆ + pˆ u (y) U (c˜2 ) dF(θ) 0
˜ and where c˜2D = 1 − pˆ yˆ + x(θ) θ θ S ˜ ˜
S U (c˜2 ) dF(θ) + pˆ yˆ U (c˜2S ) dF(θ) dy (p) 0 0 = θ dp p=pˆ 2−p ˜ ˆ ˆ2 v (y/θ)/θ U (c˜2S ) dF(θ) 0
˜ + pˆ y. ˆ Using (30) and (31), the required inequality can where c˜2S = 1 − x(θ) then be rewritten as θ ˆ ˆ v (y/θ) u (y) D ˜ U (c˜2 ) dF(θ) − pˆ yˆ θ2 pˆ 0 θ ˆ v (y/θ) 2 D ˜ ˆ ˆ ˜ u (y) + p − U (c2 ) dF(θ) θpˆ 0 θ ˜ (u (y) ˆ + yu ˆ (y)) ˆ > 0 + pˆ U (c˜2S ) dF(θ) 0
LIQUIDITY AND TRADING DYNAMICS
1781
The first two terms on the left-hand side are positive. Assumption (A2) implies that also the last term is positive, completing the argument. Q.E.D. LEMMA 3: Given a function x ∈ X, for any θ > 0 let (p(θ) y(θ)) be the unique pair solving the system (30) and (31), and define z(θ) ≡ p(θ)y(θ). The function z(θ) is monotone increasing. PROOF: Define the two functions θ ˜ dF(θ) ˜ U (1 − z + x(θ)) h1 (z y; θ) ≡ u (y)y − z 0
θ
h2 (z y; θ) ≡ v (y/θ)y/θ − z
˜ + z) dF(θ) ˜ U (1 − x(θ)
0
which correspond to the left-hand sides of (30) and (31) multiplied, respectively, by y and y/θ. Lemma 2 ensures that for each θ > 0 there is a unique positive pair (z(θ) y(θ)) which satisfies h1 (z(θ) y(θ); θ) = 0
and h2 (z(θ) y(θ); θ) = 0
Applying the implicit function theorem, gives
(35)
∂h1 ∂h2 ∂h2 ∂h1 − ∂y ∂θ ∂y ∂θ z (θ) = ∂h1 ∂h2 ∂h2 ∂h1 − ∂z ∂y ∂z ∂y
To prove the lemma, it is sufficient to show that z (θ) > 0 for all θ ∈ (0 θ]. Using z and y as shorthand for z(θ) and y(θ), the numerator on the righthand side of (35) can be written as
y y y y − 2 v + v [u (y) + u (y)y] θ θ θ θ and the denominator can be written, after some algebra, as (36)
θ y y y z ˜ dF(θ) ˜ v + v U (1 − z + x(θ)) θ θ θ θ 0 θ ˜ + z) dF(θ) ˜ + [u (y) + u (y)y]z U (1 − x(θ) +
2
y zθ2
0
y y θ − u (y)v u (y)v θ θ
1782
V. GUERRIERI AND G. LORENZONI
Assumption (A2) ensures that both numerator and denominator are negative, completing the proof. Q.E.D. We can now define a map T from the space X into itself. DEFINITION 2: Given a function x ∈ X, for any θ > 0, let (p(θ) y(θ)) be the unique pair solving the system (30) and (31). Define a map T : X → X as follows. Set (T x)(θ) = p(θ)y(θ) if θ > 0 and (T x)(θ) = 0 if θ = 0. The following lemmas prove monotonicity and discounting for the map T . These properties will be used to find a fixed point of T . In turn, this fixed point will be used to construct the equilibrium in Proposition 7. LEMMA 4: Take any x0 x1 ∈ X, with x1 (θ) ≥ x0 (θ) for all θ. Then (T x1 )(θ) ≥ (T x0 )(θ) for all θ. PROOF: For each θ˜ ∈ [0 θ] and any scalar λ ∈ [0 1], with a slight abuse of ˜ λ) ≡ x0 (θ) ˜ + λ(θ), ˜ where (θ) ˜ ≡ x1 (θ) ˜ − x0 (θ) ˜ ≥ 0. notation, we define x(θ 0 ˜ 1 ˜ ˜ ˜ Notice that x(θ 0) = x (θ) and x(θ 1) = x (θ). Fix a value for θ and define the two functions θ ˜ λ)) dF(θ) ˜ U (1 − z + x(θ h1 (z y; λ) ≡ yu (y) − z 0
θ
h2 (z y; λ) ≡ v (y/θ)y/θ − z
˜ λ) + z) dF(θ) ˜ U (1 − x(θ
0
Applying Lemma 2, for each λ ∈ [0 1], we can find a unique positive pair (z(λ) y(λ)) that satisfies h1 (z(λ) y(λ); λ) = 0
and h2 (z(λ) y(λ); λ) = 0
We are abusing notation in the definition of h1 (· ·; λ) h2 (· ·; λ) z(λ), and y(λ), given that the same symbols were used above to define functions of θ. Here we keep θ constant throughout the proof, so no confusion should arise. Notice that, by construction, (T x0 )(θ) = z(0) and (T x1 )(θ) = z(1). Therefore, to prove our statement, it is sufficient to show that z (λ) ≥ 0 for all λ ∈ [0 1]. Applying the implicit function theorem yields
(37)
∂h1 ∂h2 ∂h2 ∂h1 − ∂y ∂λ ∂y ∂λ z (λ) = ∂h1 ∂h2 ∂h2 ∂h1 − ∂z ∂y ∂z ∂y
LIQUIDITY AND TRADING DYNAMICS
1783
Using z and yas shorthand for z(λ) and y(λ), the numerator on the right-hand side of (37) can be written as
[u (y) + u (y)y]z
θ
˜ λ) + z)(θ) ˜ dF(θ) ˜ U (1 − x(θ
0
+
θ y y z y ˜ λ))(θ) ˜ dF(θ) ˜ v + v U (1 − z + x(θ θ θ θ θ 0
The denominator takes a form analogous to (36). Again, assumption (A2) ensures that both the numerator and the denominator are negative, completing the argument. Q.E.D. Before proving the discounting property, it is convenient to restrict the space X to the space X˜ of functions bounded in [0 z] for an appropriate z < 1. The following lemma shows that the map T maps X˜ into itself, and that any fixed ˜ point of T in X must lie in X. LEMMA 5: There exists a z < 1, such that if x ∈ X, then (T x)(θ) ≤ z for all θ. PROOF: Set x(0) = 0 and x(θ) = 1 for all θ > 0. Setting x(·) = x(·) and θ = θ, equations (30) and (31) take the form u (y) = p F(0)U (1 − py) + (1 − F(0))U (2 − py) v (y/θ) = θp F(0)U (1 + py) + (1 − F(0))U (py) ˆ Since ˆ y) ˆ denote the pair that solves these equations and let z ≡ pˆ y. Let (p F(0) > 0 and U satisfies the Inada condition, limc→0 U (c) = ∞, inspecting the first equation shows that z < 1. Now take any x ∈ X. Since x(θ) ≤ x(θ) for all θ, Lemma 4 implies that (T x)(θ) ≤ (T x)(θ). Moreover, Lemma 3 implies that (T x)(θ) ≤ (T x)(θ) = z. Combining these inequalities, we obtain Q.E.D. (T x)(θ) ≤ z. LEMMA 6: There exists a δ ∈ (0 1) such that the map T satisfies the discounting property: for any x0 x1 ∈ X˜ such that x1 (θ) = x0 (θ) + a for some a > 0, the follow inequality holds: |(T x1 )(θ) − (T x0 )(θ)| ≤ δa for all θ ∈ [0 θ]
1784
V. GUERRIERI AND G. LORENZONI
˜ λ) ≡ x0 (θ) ˜ + PROOF: Proceeding as in the proof of Lemma 4, define x(θ ˜ ˜ ˜ λ(θ), where now (θ) = a for all θ. After some algebra, we obtain nv (n) yu (y) A+ 1+ B 1+ u (y) v (n) (38) a z (λ) = nv (n) nv (n) yu (y) yu (y) A+ 1+ B+ − 1+ u (y) v (n) v (n) u (y) where y and n are shorthand for y(λ) and y(λ)/θ, and
θ
˜ λ) + z(λ)) dF(θ) ˜ U (1 − x(θ
z(λ) A=−
0
θ
˜ λ) + z(λ)) dF(θ) ˜ U (1 − x(θ
0
θ
z(λ) B=−
˜ λ)) dF(θ) ˜ U (1 − z(λ) + x(θ
0
θ
˜ λ)) dF(θ) ˜ U (1 − z(λ) + x(θ
0
˜ λ) are both in [0 z] and z < e2 , and given that Now, given that z(λ) and x(θ U has continuous first and second derivatives on (0 ∞), it follows that both A and B are bounded above. We can then find a uniform upper bound on both A and B, independent of λ and of the functions x0 and x1 chosen. Let C be this upper bound. Given that u (y) ≤ 0, then nv (n) nv (n) yu (y) A+ 1+ B≤ 2+ C 1+ u (y) v (n) v (n) Therefore, (38) implies −1 nv (n)/v (n) − yu (y)/u (y) z (λ) ≤ 1 + a (2 + nv (n)/v (n))C Recall that ρ > 0 is a lower bound for −yu (y)/u (y). Then ρ nv (n)/v (n) − yu (y)/u (y) −yu (y)/u (y) ≥ ≥ (2 + nv (n)/v (n))C 2C 2C Setting δ ≡ 1/[1 + ρ/(2C)] < 1, it follows that z (λ) ≤ δa for all λ ∈ [0 1]. Integrating both sides of the last inequality over [0 1] gives z(1) − z(0) ≤ δa. By construction (T x1 )(θ) = z(1) and (T x0 )(θ) = z(0), completing the proof. Q.E.D.
LIQUIDITY AND TRADING DYNAMICS
1785
PROOF OF PROPOSITION 7: We first uniquely characterize prices and allocations in a fully constrained equilibrium. Next, we will use this characterization to prove our claim. The argument in the text and the preliminary results above ˜ ζ) = 0 for all θ and θ, ˜ show that if there exists an equilibrium with m2 (θ θ then p1 (θ ζ) and y1 (θ ζ) must solve the functional equations (19) and (20) for any given ζ. To find the equilibrium pair (p1 (θ ζ) y1 (θ ζ)), we first find a fixed point of the map T defined above (Definition 2). Lemmas 4 and 6 show that T is a map from a space of bounded functions into itself and satisfies the assumptions of Blackwell’s theorem. Therefore, a fixed point exists and is unique. Let x denote the fixed point. Then Lemma 2 shows that we can find two functions p1 (θ ζ) and y1 (θ ζ) for a given ζ that satisfy (30) and (31). Since x(θ ζ) is a fixed point of T , we have x(θ ζ) = p1 (θ ζ)y1 (θ ζ), and substituting in (30) and (31) shows that (19) and (20) are satisfied. Therefore, in a fully constrained equilibrium p1 (θ ζ) and y1 (θ ζ) are uniquely determined, and so is labor supply n(θ ζ) = y1 (θ ζ)/θ. Moreover, from the budget constraint and the market clearing condition in period 2, consumption in period 2 is uniquely determined by (21). The price p2 is equal to 1, given the normalization in the text. From the consumer’s budget constraint in period 3, we obtain c3 = e3 . Combining the Euler equations (6) and (8) and the envelope condition (10), p3 is uniquely pinned down by (39)
1 ˜ ζ)) = βγ −1 E U (c2 (θ θ p3
Moreover, equilibrium money holdings are m1 (θ ζ) = 1 − p1 (θ ζ)y1 (θ ζ), ˜ ζ) = 0, and m3 (θ θ ˜ ζ) = γ. Define the cutoff m2 (θ θ ˜ ζ)) E U (c2 (θ θ γˆ ≡ β min U (c2 (θ θ ζ)) ζ
The only optimality condition that remains to be checked is the Euler equation ˜ ζ), Lemma 3 in period 2, that is, equation (7). Given the definition of c2 (θ θ ˜ It implies that it is an increasing function of θ and a decreasing function of θ. ˜ and follows that a necessary and sufficient condition for (7) to hold for all θ, θ, ζ is (40)
1 min U (c2 (θ θ ζ)) ≥ ζ p3
Substituting the expression (39) for 1/p3 , this condition is equivalent to γ ≥ γ. ˆ ˜ ζ) is uniquely deTherefore, if a fully constrained equilibrium exists, c2 (θ θ termined and condition (40) implies that γ ≥ γ, ˆ proving necessity. Moreover, if γ ≥ γ, ˆ the previous steps show how to construct a fully constrained equilibrium, proving sufficiency.
1786
V. GUERRIERI AND G. LORENZONI
Finally, the proof that nominal income p1 (θ ζ)y1 (θ ζ) is monotone increasing in θ, for a given ζ, follows immediately from Lemma 3. To prove that also output y1 (θ ζ) is monotone increasing in θ, let us use the same functions h1 (z y; θ), and h2 (z y; θ), and the same notation as in the proof of Lemma 3. For a given ζ, apply the implicit function theorem to get
(41)
∂h2 ∂h1 ∂h1 ∂h2 − ∂z ∂θ y (θ) = ∂z ∂θ ∂h1 ∂h2 ∂h2 ∂h1 − ∂z ∂y ∂z ∂y
Then it is sufficient to show that y (θ) > 0 for all θ ∈ (0 θ]. Using z and y as shorthand for z(θ) and y(θ), the numerator on the right-hand side of (41) can be written as
y y y y +v v 2 θ θ θ θ θ
θ ˜ ˜ ˜ ˜ × z U (1 − z + x(θ)) dF(θ) − U (1 − z + x(θ)) dF(θ) 0
0
and is negative. Finally, the denominator is equal to (36) and is negative thanks to assumption (A2), as we have argued in the proof of Lemma 3. This completes the argument. Q.E.D. PROOF OF PROPOSITION 8: The proof proceeds in three steps. The first two steps prove that, for each θ, the nominal income in island θ, x(θ ζ), is increasing with the aggregate shock ζ. Using this result, the third step shows that y1 (θ ζ) is increasing in ζ. Consider two values ζ I and ζ II , with ζ II > ζ I . Denote, respectively, by TI and TII the maps defined in Definition 2 under the distributions F(θ|ζI ) and F(θ|ζII ). Let xI and xII be the fixed points of TI and TII , that is, xI (θ) ≡ x(θ ζ I ) and xII (θ) ≡ x(θ ζ II ) for any θ. Again, to save on notation, we drop the period index for y1 . Step 1. Let the function x0 be defined as x0 = TII xI . In this step, we want to prove that x0 (θ) > xI (θ) for all θ > 0. We will prove it pointwise for each θ. Fix θ > 0 and define the functions
θ
h1 (z y; ζ) ≡ yu (y) − z
˜ dF(θ|ζ) ˜ U (1 − z + xI (θ))
0
θ
h2 (z y; ζ) ≡ v (y/θ)y/θ − z 0
˜ + z) dF(θ|ζ) ˜ U (1 − xI (θ)
LIQUIDITY AND TRADING DYNAMICS
1787
for ζ ∈ [ζ^I, ζ^II]. Lemma 2 implies that we can find a unique pair (z(ζ), y(ζ)) that satisfies

$$h_1\bigl(z(\zeta), y(\zeta); \zeta\bigr) = 0 \quad \text{and} \quad h_2\bigl(z(\zeta), y(\zeta); \zeta\bigr) = 0.$$

Once more, we are abusing notation in the definition of h1(·, ·; ζ), h2(·, ·; ζ), z(ζ), and y(ζ). However, as θ is kept constant, there is no room for confusion. Notice that z(ζ^I) = x^I(θ), since x^I is a fixed point of T^I, and z(ζ^II) = x^0(θ) by construction. Therefore, to prove our statement, we need to show that z(ζ^II) > z(ζ^I). It is sufficient to show that z'(ζ) > 0 for all ζ ∈ [ζ^I, ζ^II]. Applying the implicit function theorem gives

(42)    $z'(\zeta) = \dfrac{\frac{\partial h_1}{\partial y}\frac{\partial h_2}{\partial \zeta} - \frac{\partial h_2}{\partial y}\frac{\partial h_1}{\partial \zeta}}{\frac{\partial h_1}{\partial z}\frac{\partial h_2}{\partial y} - \frac{\partial h_2}{\partial z}\frac{\partial h_1}{\partial y}}.$

Notice that x^I(θ̃) is monotone increasing in θ̃, by Lemma 3, and U is strictly concave. Therefore, U'(1 − z + x^I(θ̃)) is decreasing in θ̃ and U'(1 − x^I(θ̃) + z) is increasing in θ̃. By the properties of first-order stochastic dominance, $\int_0^{\bar{\theta}} U'(1 - z + x^I(\tilde{\theta}))\,dF(\tilde{\theta}|\zeta)$ is decreasing in ζ and $\int_0^{\bar{\theta}} U'(1 - x^I(\tilde{\theta}) + z)\,dF(\tilde{\theta}|\zeta)$ is increasing in ζ. This implies that ∂h1/∂ζ > 0 and ∂h2/∂ζ < 0. Using y as shorthand for y(ζ), the numerator on the right-hand side of (42) is, with the usual notation,

$$\bigl[u'(y) + y\,u''(y)\bigr]\frac{\partial h_2}{\partial \zeta} - \frac{1}{\theta}\left[v'\!\left(\frac{y}{\theta}\right) + \frac{y}{\theta}\,v''\!\left(\frac{y}{\theta}\right)\right]\frac{\partial h_1}{\partial \zeta}.$$

The denominator is the analogue of (36). Once more, assumption (A2) ensures that both numerator and denominator are negative, completing the argument.

Step 2. Define the sequence of functions (x^0, x^1, ...) in X, using the recursion x^{j+1} = T^II x^j. Since, by Step 1, x^0 ≥ x^I (where by x^0 ≥ x^I we mean x^0(θ) ≥ x^I(θ) for all θ > 0) and, by Lemma 4, T^II is a monotone operator, it follows that this sequence is monotone, with x^{j+1} ≥ x^j. Moreover, T^II is a contraction by Lemmas 4 and 6, so this sequence has a limit point, which coincides with the fixed point x^II. This implies that x^II ≥ x^0 and, together with the result in Step 1, shows that x^II > x^I, as we wanted to prove.

Step 3. Fix θ > 0 and, with the usual abuse of notation, define the functions
$$h_1(z, y; \zeta) \equiv y\,u'(y) - z\int_0^{\bar{\theta}} U'\bigl(1 - z + x(\tilde{\theta}, \zeta)\bigr)\,dF(\tilde{\theta}|\zeta),$$
$$h_2(z, y; \zeta) \equiv v'(y/\theta)\,y/\theta - z\int_0^{\bar{\theta}} U'\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,dF(\tilde{\theta}|\zeta).$$
Notice the difference with the definitions of h1 and h2 in Step 1: now x(θ̃, ζ) replaces x^I(θ̃). The functions z(ζ) and y(ζ) are defined in the usual way. Applying the implicit function theorem, we get

$$y'(\zeta) = \frac{\frac{\partial h_2}{\partial z}\frac{\partial h_1}{\partial \zeta} - \frac{\partial h_1}{\partial z}\frac{\partial h_2}{\partial \zeta}}{\frac{\partial h_1}{\partial z}\frac{\partial h_2}{\partial y} - \frac{\partial h_2}{\partial z}\frac{\partial h_1}{\partial y}}.$$

To evaluate the numerator, notice that

$$\frac{\partial h_1}{\partial z} = -\int_0^{\bar{\theta}} U'\bigl(1 - z + x(\tilde{\theta}, \zeta)\bigr)\,dF(\tilde{\theta}|\zeta) + z\int_0^{\bar{\theta}} U''\bigl(1 - z + x(\tilde{\theta}, \zeta)\bigr)\,dF(\tilde{\theta}|\zeta) < 0,$$

$$\frac{\partial h_2}{\partial z} = -\int_0^{\bar{\theta}} U'\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,dF(\tilde{\theta}|\zeta) - z\int_0^{\bar{\theta}} U''\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,dF(\tilde{\theta}|\zeta)$$
$$\le -\int_0^{\bar{\theta}} \Bigl[U'\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr) + \bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,U''\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\Bigr]\,dF(\tilde{\theta}|\zeta) \le 0,$$

where the last inequality follows from assumption (A1′) (this is the only place where this assumption is used). Furthermore, notice that

$$\frac{\partial h_1}{\partial \zeta} = -z\int_0^{\bar{\theta}} U''\bigl(1 - z + x(\tilde{\theta}, \zeta)\bigr)\,\frac{\partial x(\tilde{\theta}, \zeta)}{\partial \zeta}\,dF(\tilde{\theta}|\zeta) - z\int_0^{\bar{\theta}} U'\bigl(1 - z + x(\tilde{\theta}, \zeta)\bigr)\,\frac{\partial f(\tilde{\theta}|\zeta)}{\partial \zeta}\,d\tilde{\theta} > 0,$$

where the first element is positive from Steps 1 and 2, and the second element is positive because ζ leads to a first-order stochastic increase in θ̃ and U'(1 − z + x(θ̃, ζ)) is decreasing in θ̃. A similar reasoning shows that

$$\frac{\partial h_2}{\partial \zeta} = z\int_0^{\bar{\theta}} U''\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,\frac{\partial x(\tilde{\theta}, \zeta)}{\partial \zeta}\,dF(\tilde{\theta}|\zeta) - z\int_0^{\bar{\theta}} U'\bigl(1 - x(\tilde{\theta}, \zeta) + z\bigr)\,\frac{\partial f(\tilde{\theta}|\zeta)}{\partial \zeta}\,d\tilde{\theta} < 0.$$
Putting together the four inequalities just derived shows that the numerator is negative. The denominator takes the usual form, analogous to (36), and is negative. This completes the proof. Q.E.D.

PROOF OF PROPOSITION 9: From expression (24) it follows that

$$\frac{\partial Y_1(\zeta, \xi)}{\partial \zeta} = \int_0^{\bar{\theta}} y_1(\theta, \xi)\,\frac{\partial f(\theta|\zeta)}{\partial \zeta}\,d\theta, \qquad \frac{\partial Y_1(\zeta, \xi)}{\partial \xi} = \int_0^{\bar{\theta}} \frac{\partial y_1(\theta, \xi)}{\partial \xi}\,dF(\theta|\zeta).$$
In the case of an unconstrained equilibrium, the analogue of Proposition 6 can be easily derived, showing that ∂y1(θ, ξ)/∂ξ = 0 and ∂y1(θ, ξ)/∂θ > 0. These properties imply that ∂Y1(ζ, ξ)/∂ζ > 0 and ∂Y1(ζ, ξ)/∂ξ = 0. Next, consider a fully constrained equilibrium, where φ = 0 and γ ≥ γ̂. For each value of ξ, the functions p1(θ, ξ) and y1(θ, ξ) can be derived solving the following system of functional equations, analogous to (19) and (20):

$$u'\bigl(y_1(\theta, \xi)\bigr) = p_1(\theta, \xi)\int_0^{\bar{\theta}} U'\bigl(c_2(\tilde{\theta}, \theta, \xi)\bigr)\,dF(\tilde{\theta}|\xi, \theta),$$
$$v'\bigl(y_1(\theta, \xi)/\theta\bigr) = \theta\,p_1(\theta, \xi)\int_0^{\bar{\theta}} U'\bigl(c_2(\theta, \tilde{\theta}, \xi)\bigr)\,dF(\tilde{\theta}|\xi, \theta),$$
˜ θ ξ) = 1 − p1 (θ ξ)y1 (θ ξ) + p1 (θ ˜ ξ)y1 (θ ˜ ξ). The only formal difwhere c2 (θ ˜ θ) deference between these and (19) and (20) is that the distribution F(θ|ξ pends also on θ. However, this does not affect any of the steps of Proposition 7 (there is only a minor difference in the proof of the analogue of Lemma 3; the details are available on request). Therefore, this system has a unique solution for each ξ. Next, following the steps of Propositions 7 and 8, we can show that y1 (θ ξ) is increasing in θ and ξ. This implies that ∂Y1 (ζ ξ)/∂ζ > 0 and Q.E.D. ∂Y1 (ζ ξ)/∂ξ > 0. REFERENCES ACEMOGLU, D., AND F. ZILIBOTTI (1997): “Was Prometheus Unbounded by Chance? Risk, Diversification and Growth,” Journal of Political Economy, 105, 709–751. [1753] AIYAGARI, R., AND S. WILLIAMSON (2000): “Money and Dynamic Credit Arrangements With Private Information,” Journal of Economic Theory, 91, 248–279. [1753] BACCHETTA, P., AND S. GERLACH (1997): “Consumption and Credit Constraints: International Evidence,” Journal of Monetary Economics, 40, 207–238. [1754] BENCIVENGA, V., AND B. SMITH (1991): “Financial Intermediation and Endogenous Growth,” Review of Economic Studies, 58, 195–209. [1753] BERENTSEN, A., G. CAMERA, AND C. WALLER (2005): “The Distribution of Money Balances and the Nonneutrality of Money,” International Economic Review, 46, 465–486. [1753]
Graduate School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637, U.S.A. and NBER;
[email protected] and Massachusetts Institute of Technology, Cambridge, MA 02142, U.S.A. and NBER;
[email protected]. Manuscript received June, 2007; final revision received May, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1791–1828
MARKET STRUCTURE AND MULTIPLE EQUILIBRIA IN AIRLINE MARKETS

BY FEDERICO CILIBERTO AND ELIE TAMER1

We provide a practical method to estimate the payoff functions of players in complete information, static, discrete games. With respect to the empirical literature on entry games originated by Bresnahan and Reiss (1990) and Berry (1992), the main novelty of our framework is to allow for general forms of heterogeneity across players without making equilibrium selection assumptions. We allow the effects that the entry of each individual airline has on the profits of its competitors, its "competitive effects," to differ across airlines. The identified features of the model are sets of parameters (partial identification) such that the choice probabilities predicted by the econometric model are consistent with the empirical choice probabilities estimated from the data. We apply this methodology to investigate the empirical importance of firm heterogeneity as a determinant of market structure in the U.S. airline industry. We find evidence of heterogeneity across airlines in their profit functions. The competitive effects of large airlines (American, Delta, United) are different from those of low cost carriers and Southwest. Also, the competitive effect of an airline is increasing in its airport presence, which is an important measure of observable heterogeneity in the airline industry. Then we develop a policy experiment to estimate the effect of repealing the Wright Amendment on competition in markets out of the Dallas airports. We find that repealing the Wright Amendment would increase the number of markets served out of Dallas Love.

KEYWORDS: Entry models, inference in discrete games, multiple equilibria, partial identification, airline industry, firm heterogeneity.

1 We thank a co-editor and three referees for comments that greatly improved the paper. We also thank T. Bresnahan, A. Cohen, B. Honoré, C. Manski, M. Mazzeo, A. Pakes, J. Panzar, A. Paula, R. Porter, W. Thurman, and seminar participants at many institutions and meetings for comments. We especially thank K. Hendricks for insightful suggestions. We also thank S. Sakata for help with a version of his genetic algorithm, Ed Hall for computing help, and T. Whalen at the Department of Justice for useful insights on airlines' entry decisions. B. Karali provided excellent research assistance. The usual disclaimer applies. Tamer gratefully acknowledges research support from the National Science Foundation and the A. P. Sloan Foundation.

1. INTRODUCTION

WE PROVIDE A PRACTICAL METHOD to estimate the payoff functions of players in complete information, static, discrete games. With respect to the empirical literature on entry games originated by Bresnahan and Reiss (1990) (BR) and Berry (1992), the main novelty of our framework is to allow for general forms of heterogeneity across players without making equilibrium selection assumptions. These assumptions are typically made on the form of firm heterogeneity to ensure that, for a given value of the exogenous variables, the economic model predicts a unique number of entrants. In the ensuing econometric models, multiple equilibria in the identity of the firms exist, but the number of entrants is unique across equilibria. This uniqueness leads to standard estimation
of the parameter using maximum likelihood or method of moments. On the other hand, models with general forms of player heterogeneity have multiple equilibria in the number of entrants, and so the insights of BR and Berry do not generalize easily. We present an econometric framework that allows for multiple equilibria and where different selection mechanisms can be used in different markets. This framework directs the inferential strategy for a “class of models,” each of which corresponds to a different selection mechanism. We use the simple condition that firms serve a market only if they make nonnegative profits in equilibrium to derive a set of restrictions on regressions.2 In games with multiple equilibria, this simple condition leads to upper and lower bounds on choice probabilities.3 The economic model implies a set of choice probabilities which lies between these lower and upper bounds. Heuristically, our estimator then is based on minimizing the distance between this set and the choice probabilities that can be consistently estimated from the data. Our econometric methodology restricts the parameter estimates to a set and thus partially identifies the parameters (see footnote 3). Each parameter in this set corresponds to a particular selection mechanism that is consistent with the model and the data. We use recently developed inferential methods in Chernozhukov, Hong, and Tamer (2007) (CHT) to construct confidence regions that cover the identified parameter with a prespecified probability.4 We apply our methods to data from the airline industry, where each observation is a market (a trip between two airports).5 The idea behind cross-section studies is that in each market, firms are in a long-run equilibrium. The objective of our econometric analysis is to infer long-run relationships between the exogenous variables in the data and the market structure that we observe at some point in time, without trying to explain how firms reached the observed 2 The idea of deriving results for a class of models goes back to Sutton (2000). Taking a class of models approach to game theoretic settings, one “abandon(s) the aim of identifying some unique equilibrium outcome. Instead, we admit some class of candidate models (each of which may have one or more equilibria) and ask whether anything can be said about the set of outcomes that can be supported as an equilibrium of any candidate model.” The necessary and weak condition on behavior is similar to the “viability condition” discussed by Sutton (see also Sutton (1991)). 3 Tamer (2003) also used this insight to show that, for a simple 2 × 2 game with multiple equilibria, the model provides inequality restrictions on the regression. Sufficient conditions are then given to guarantee that these inequality restrictions point-identify the parameter of interest. These conditions are not easy to generalize to larger games. However, the paper noted that, in general, inequality restrictions constrain the parameter vector to lie in the identified set, and an estimator was suggested (Tamer (2003, p. 153)). 4 CHT focused on constructing confidence regions for the arg min of a function (in this paper, the minimum distance objective function) and also confidence regions for the true but potentially partially identified parameter. Other econometric methods that can be used are Romano and Shaikh (2008), Bugni (2007), Beresteanu and Molinari (2008), Andrews and Soares (2009), and Canay (2007). 5 Berry (1992) used the same data source, but from earlier years.
equilibrium. For example, we model the entry decision of American Airlines as having a different effect on the profit of its competitors than the entry of Delta or of low cost carriers has. In addition, we perform a policy exercise using our estimated model to study how the Wright Amendment, a law restricting competition in markets out of Dallas Love airport, affects the state of these markets with respect to competition or market structure. This law was partially repealed in 2006, so we can compare the predictions of our model with what actually happened. We estimate two versions of a static complete information entry game. These versions differ in the way in which the entry of a firm, its “competitive effect,” affects the profits of its competitors. In the simpler version, which follows the previous literature, these competitive effects are captured by firm-specific indicator variables in the profit functions of other airlines. These indicator variables measure the firms’ “fixed competitive effects.” In the more complex version, a firm’s competitive effect is a variable function of the firm’s measure of observable heterogeneity. The measure of observable heterogeneity that affects competitors’ profits is an airline’s airport presence, which is a function of the number of markets served by a firm out of an airport. The theoretical underpinnings for these “variable competitive effects” are given in Hendricks, Piccione, and Tan (1997), who showed that as long as an airline has a large airport presence, its dominant strategy is not to exit from a spoke market, even if that means it suffers losses in that market. Thus, the theoretical prediction is that the larger an airline’s airport presence, the larger its variable competitive effects should be. We find evidence of heterogeneity across airlines in their profit functions. We find that the competitive effects of large airlines (American, Delta, United) are different from those of low cost carriers and Southwest. We also find that the (negative) competitive effect of an airline is increasing in its airport presence, which is an important measure of observable heterogeneity in the airline industry. Moreover, we also find evidence of heterogeneity in the effects of control variables on the profits of airline firms, which affects the probability of observing different airlines as the control variables change. Then we develop a policy experiment to estimate the effect of repealing the Wright Amendment on competition in markets out of the Dallas airports. We find that repealing the Wright Amendment would increase the number of markets served out of Dallas Love. As part of our analysis, we also estimate the variance–covariance matrix, and find evidence of correlation in the unobservables as well as evidence of different variances and distributions of the firm unobservable heterogeneity. This paper contributes to a growing literature on inference in discrete games. In the complete information setting, complementary approaches include Bjorn and Vuong (1985) and Bajari, Hong, and Ryan (2005), where equilibrium selection assumptions are imposed. Another approach makes informational assumptions. For example, Seim (2002), Sweeting (2004), and AradillasLopez (2005) considered the case where the entry game has incomplete information, so that neither the firms nor the econometrician observe the profits of
all competitors. Andrews, Berry, and Jia (2003) proposed methods applicable to entry models to construct confidence regions for models with inequality restrictions. More recently, Pakes, Porter, Ho, and Ishii (2005) provided a novel economic framework that leads to a set of econometric models with inequality restrictions on regressions. They also provide a method for obtaining confidence regions. Finally, further insights about identification in these settings is given in Berry and Tamer (2006). This article also adds to the literature on inference in partially identified models. This literature has a history in econometrics, starting with the Frisch bounds on parameters in linear models with measurement error (Frisch (1934)) and the work of Marschak and Andrews (1944) (which contains one of the earliest examples of a structural model with partially identified parameters). More recently, Manski and collaborators further developed this literature with new results and made it part of the econometrics toolkit starting with Manski (1990). See also Manski (2007) and the references therein, Manski and Tamer (2002), and Imbens and Manski (2004). In the industrial organization literature, Haile and Tamer (2003) used partial identification methods to construct bounds on valuation distributions in second price auctions. In the statistics literature, the Frechet bounds on joint distributions given knowledge of marginals are well known (see also Heckman, Lalonde, and Smith (1999)), and these were used starting with the important result of Peterson (1976) in competing risks. In the labor literature, the bounds approach to inference has been prominent in the treatment–response and selection literature where several papers discuss and use exclusion restrictions to tighten the bounds and gain more information. See Manski (1994) for a discussion of the selection problem using partial identification and exclusion restrictions, and Blundell, Gosling, Ichimura, and Meghir (2007) and Honoré and Lleras-Muney (2006) for important empirical papers that use and expand this methodology. See also bounds based on revealed preference in Blundell, Browning, and Crawford (2003), and bounds on various treatment effects derived in Heckman and Vytlacil (1999). The remainder of the paper is organized as follows. Section 2 presents the empirical model of market structure and the main idea of the econometric methodology. Section 3 formalizes the inferential approach, providing conditions for the identification and estimation of the parameter sets. Then Section 4 discusses market structure in U.S. airline markets. Section 5 presents the estimation results. Section 6 reports the results of our policy experiment. Section 7 concludes, and provides limitations and future work. 2. AN EMPIRICAL MODEL OF MARKET STRUCTURE We follow Berry (1992) in modeling market structure. In particular, let the profit function for firm i in market m be πim (θ; y−im ), where y−im is a vector that represents other potential entrants in market m and θ is a finite parameter
of interest determining the shape of πim. This function can depend on both market-specific and firm-specific variables.6 A market m is defined by Xm, where Xm = (Sm, Zm, Wm). Sm is a vector of market characteristics which are common among the firms in market m; Zm = (Z1m, ..., ZKm) is a matrix of firm characteristics that enter into the profits of all the firms in the market, for example, some product attributes that consumers value; K is the total number of potential entrants in market m; Wm = (W1m, ..., WKm) are firm characteristics where Wim enters only into firm i's profit in market m, such as the cost variables. The profit function for firm i in market m is

(1)    $\pi_{im} = S_m\alpha_i + Z_{im}\beta_i + W_{im}\gamma_i + \sum_{j\neq i}\delta_{ij}\,y_{jm} + \sum_{j\neq i}Z_{jm}\phi_{ij}\,y_{jm} + \varepsilon_{im},$
where εim is the part of profits that is unobserved to the econometrician.7 We assume throughout that εim is observed by all players in market m. Thus, this is a game of complete information. An important feature of the profit function in this paper is the presence of {δij, φij}, which summarize the effect other airlines have on i's profits. In particular, notice that this function can depend directly on the identity of the firms (the yj's, j ≠ i). Also, the effect on the profit of firm i of having firm j in its market is allowed to be different from that of having firm k in its market (δij ≠ δik). For example, the parameters δij can measure a particularly aggressive behavior of one airline (e.g., American) against another airline (e.g., Southwest).8 These competitive effects could also measure the extent of product differentiation across airlines (Mazzeo (2002)). Finally, δij and φij could measure cost externalities among airlines at airports.9
6 The fully structural form expression of the profit function should be written in terms of prices, quantities, and costs. However, because of lack of data on prices, quantities, and costs, most of the previous empirical literature on entry games had to specify the profit function in a reduced form. There exist data on airline prices and quantities, but these variables would be endogenous in this model. We would have to find adequate instruments and extend our methodology to include additional regression equations, one for the demand side and one for the supply side. This is clearly beyond the scope of our paper. As stated in the Introduction, the main contribution of this paper is to take the models used by previous empirical literature on entry games and allow for general forms of heterogeneity across players without making equilibrium selection assumptions. 7 The linearity imposed on the profit function is not essential. We only require that the profit function be known up to a finite dimensional parameter. 8 See the discussion in Bresnahan (1989, Section 2.2.3) for an interpretation of the δij ’s as measures of the expectations that each firm has on the behavior of its competitors. 9 See Borzekowski and Cohen (2004) for an example of a game of technology adoption with multiple equilibria.
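The specification in (1) is straightforward to evaluate for a candidate parameter vector. The sketch below is only an illustration of that bookkeeping, not the authors' code: all array shapes, names, and dimensions are hypothetical.

```python
import numpy as np

def profit_vector(y, S, Z, W, alpha, beta, gamma, delta, phi, eps):
    """Evaluate equation (1) for every potential entrant in one market.

    y     : (K,) 0/1 vector, the entry configuration being evaluated
    S     : (dS,) market characteristics shared by all firms
    Z     : (K, dZ) firm characteristics entering every firm's profit
    W     : (K, dW) firm characteristics entering only the firm's own profit
    alpha : (K, dS); beta : (K, dZ); gamma : (K, dW) own-coefficient blocks
    delta : (K, K) fixed competitive effects; delta[i, j] is j's effect on i
    phi   : (K, K, dZ) variable competitive effects interacted with Z[j]
    eps   : (K,) profit shocks observed by the firms, not the econometrician
    """
    K = len(y)
    pi = np.empty(K)
    for i in range(K):
        competitive = sum(
            (delta[i, j] + Z[j] @ phi[i, j]) * y[j] for j in range(K) if j != i
        )
        pi[i] = S @ alpha[i] + Z[i] @ beta[i] + W[i] @ gamma[i] + competitive + eps[i]
    return pi
```

With all δ and φ terms set to zero, the expression collapses to K unrelated single-firm profit equations, which is the sense in which the competitive effects are the objects of interest here.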
3. IDENTIFICATION

We examine the conceptual framework that we use to identify the model. For simplicity, we start with a bivariate game where we show how to analyze the identified features of this game without making equilibrium selection assumptions. We then show that the same insights carry over to richer games.

3.1. Simple Bresnahan and Reiss 2 × 2 Game

Consider the version of the model above with two players:

(2)    $y_{1m} = 1[\alpha_1 X_{1m} + \delta_2 y_{2m} + \varepsilon_{1m} \ge 0],$
       $y_{2m} = 1[\alpha_2 X_{2m} + \delta_1 y_{1m} + \varepsilon_{2m} \ge 0],$
where (X1m X2m ) is a vector of observed exogenous regressors that contain market-specific variables. Here, a firm is in market m if, in a pure strategy Nash equilibrium, it makes a nonnegative profit. Following BR, Berry (1992), and Mazzeo (2002), we do not consider mixed strategy equilibria.10 The econometric structure in (2) is a binary simultaneous equation system. With large enough support for ε’s, this game has multiple equilibria. The presence of multiple equilibria complicates inference due to the coherency issue (see Heckman (1978) and Tamer (2003)). The likelihood function predicted by the model will sum to more than 1. A way to complete the model is to specify a rule that “picks” a particular equilibrium in the region of multiplicity. Another way to solve the coherency issue is to find some feature that is common to all equilibria and transform the model into one that predicts this feature uniquely. This is the solution adopted by Bresnahan and Reiss (1991a) and Berry (1992), which we illustrate next. When δ1 δ2 < 0 (monopoly profits are larger than duopoly profits), the map between the support of the unobservables (the ε) and the set of pure strategy equilibria of the game is as illustrated in the left-hand panel (LHP) of Figure 1. Notice that multiple equilibria in the identity, but not in the number of firms, happen when −αi Xi ≤ εi ≤ −αi Xi − δ3−i for i = 1 2 (we suppress dependence on m for simplicity). The shaded center region of the figure contains payoff pairs where either firm could enter as a monopolist in the simultaneous-move entry game. To ensure the uniqueness of equilibrium in the number of firms, Bresnahan and Reiss (1990) assumed that the sign of the δ’s is known. Consider, however, the simple 2 × 2 discrete game illustrated in the right hand panel (RHP) of Figure 1. In this case, where δi > 0 for i = 1 2 and for −δ3−i − αi Xi ≤ εi ≤ −αi Xi , both players enter or no player enters.11 Here, a player benefits from 10 It is simple, conceptually, to accommodate mixed strategies in our framework. We discuss this below. See also Berry and Tamer (2006). 11 This could be the case when positive externalities are present.
the other player entering the market. We can again use BR's approach and estimate the probability of the outcome (1, 0), of the outcome (0, 1), and of the outcome "either (1, 1) or (0, 0)," but it is clear that we need to know the sign of the δ's. Our methodology does not require knowledge of the signs of the δ's. Finally, with more than two firms, one must assume away any heterogeneity in the effect of observable determinants of profits, including the presence of a competitor, on the firms' payoff functions. If one drops these assumptions, different equilibria can exist with different numbers of players, even if the signs of the δ's are known. Heuristically, in three-player games where one player is a large firm and the other two players are small firms, there can be multiple equilibria, where one equilibrium includes the large firm as a monopolist while the other has the smaller two firms enter as duopolists (as we will discuss in the empirical section). This happens when one allows a differential effect on profits from the entry of a large firm versus a small one (δlarge ≠ δsmall). In contrast, our methodology allows for general forms of heterogeneity in the effect of the observable determinants of profits.

FIGURE 1.—Regions for multiple equilibria: LHP, δ1, δ2 < 0; RHP, δ1, δ2 > 0.

Main Idea

We illustrate the main idea starting with the case where the δ's are negative. The choice probabilities predicted by the model are
(3)    $\Pr(1, 1|X) = \Pr(\varepsilon_1 \ge -\alpha_1 X_1 - \delta_2;\ \varepsilon_2 \ge -\alpha_2 X_2 - \delta_1),$
       $\Pr(0, 0|X) = \Pr(\varepsilon_1 \le -\alpha_1 X_1;\ \varepsilon_2 \le -\alpha_2 X_2),$
       $\Pr(1, 0|X) = \Pr\bigl((\varepsilon_1, \varepsilon_2) \in R_1(X, \theta)\bigr) + \int \Pr\bigl((1, 0)|\varepsilon_1, \varepsilon_2, X\bigr)\,1[(\varepsilon_1, \varepsilon_2) \in R_2(\theta, X)]\,dF_{\varepsilon_1\varepsilon_2},$
where

$$R_1(\theta, X) = \bigl\{(\varepsilon_1, \varepsilon_2) : (\varepsilon_1 \ge -\alpha_1 X_1;\ \varepsilon_2 \le -\alpha_2 X_2) \cup (\varepsilon_1 \ge -\alpha_1 X_1 - \delta_2;\ -\alpha_2 X_2 \le \varepsilon_2 \le -\alpha_2 X_2 - \delta_1)\bigr\},$$
$$R_2(\theta, X) = \bigl\{(\varepsilon_1, \varepsilon_2) : (-\alpha_1 X_1 \le \varepsilon_1 \le -\alpha_1 X_1 - \delta_2;\ -\alpha_2 X_2 \le \varepsilon_2 \le -\alpha_2 X_2 - \delta_1)\bigr\},$$

X = (X1, X2), and θ is a finite dimensional parameter of interest that contains the α's, the δ's, and parameters of the joint distribution of the ε's. The first two equalities in (3) are simple. For example, the model predicts (1, 1) uniquely if and only if the ε's belong to the upper right quadrant. The third equality provides the predicted probability for the (1, 0) event. This probability consists of the case when (1, 0) is the unique equilibrium of the game, that is, when (ε1, ε2) ∈ R1, and also when (1, 0) is a potentially observable outcome of the game and it is the outcome that was "selected." The selection mechanism is the function Pr((1, 0)|ε1, ε2, X), which is allowed to depend on the unobservables in an arbitrary way. It is unknown to the econometrician and can differ across markets. This term is an infinite dimensional nuisance parameter.12 Heuristically, the identified feature of the above model is the set of parameters for which there exists a proper selection function such that the choice probabilities predicted by the model are equal to the empirical choice probabilities obtained from the data (or consistently estimated). We exploit the fact that this (selection) function is a proper probability and hence lies in [0, 1]. Hence, an implication of the above model is

(4)    $\Pr\bigl((\varepsilon_1, \varepsilon_2) \in R_1\bigr) \le \Pr\bigl((1, 0)\bigr) \le \Pr\bigl((\varepsilon_1, \varepsilon_2) \in R_1\bigr) + \Pr\bigl((\varepsilon_1, \varepsilon_2) \in R_2\bigr).$
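As a concrete illustration of the bounds in (4), the following sketch simulates the probability mass of the regions R1 and R2 for the two-player game (2) when δ1, δ2 < 0. The index values, the competitive effects, and the bivariate normal distribution for (ε1, ε2) are hypothetical placeholders rather than anything estimated in the paper.

```python
import numpy as np

def bounds_10(a1X1, a2X2, d1, d2, rho=0.0, R=200_000, seed=0):
    """Simulated lower/upper bounds on Pr(1, 0 | X) when d1, d2 < 0.

    R1: region where (1, 0) is the unique pure-strategy equilibrium.
    R2: region of multiplicity, where either monopoly outcome can arise.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    e1, e2 = rng.multivariate_normal([0.0, 0.0], cov, size=R).T

    in_R1 = ((e1 >= -a1X1) & (e2 <= -a2X2)) | \
            ((e1 >= -a1X1 - d2) & (e2 >= -a2X2) & (e2 <= -a2X2 - d1))
    in_R2 = (e1 >= -a1X1) & (e1 <= -a1X1 - d2) & \
            (e2 >= -a2X2) & (e2 <= -a2X2 - d1)

    lower = in_R1.mean()
    upper = lower + in_R2.mean()
    return lower, upper

# Example with hypothetical index values and negative competitive effects
print(bounds_10(a1X1=0.5, a2X2=0.3, d1=-1.0, d2=-0.8))
```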
The model predicts the first two equations in (3) above and the inequality restriction on the choice probability of the (1, 0) outcome in (4). The upper and lower bound probabilities for the (1, 0) event are illustrated in Figure 2. Sufficient point-identification conditions based on the predicted choice probabilities of the (0, 0) and (1, 1) outcomes were given in Tamer (2003). In the next section, we extend this inferential approach to more general games.

12 If we were to allow for mixed strategy equilibria, then each choice probability in (3) will need to be adjusted to account for each outcome being on the support of the mixed strategy equilibrium. More on this below.

FIGURE 2.—Upper and lower probability bounds on the Pr(1, 0). The shaded area in the graph on the right hand side represents the region for (ε1, ε2) that would predict the outcome (1, 0) uniquely. The shaded region in the graph on the left hand side represents the region where (1, 0) would be predicted if we always select (1, 0) to be the equilibrium in the region of multiplicity. The probability of the epsilons falling in the respective regions provides an upper and a lower bound on the probability of observing (1, 0).

3.2. Identification: General Setup

Here, we consider general games with many players and basically extend the insights from the previous section on bivariate games. We consider models where the number of markets is large, as opposed to requiring that the number of players within each market is large. We also require that the joint distribution of ε be known up to a finite parameter vector which is part of the parameter vector θ. As in the setup above, our approach to identification is to "compare" the (conditional) distribution of the observables (the data) to the distribution predicted by the model for a given parameter value. To estimate the conditional choice probability vector P(y|X), a nonparametric conditional expectation estimator can be used. We then derive the predicted choice probabilities in any given market m and find parameters that minimize their distance (to be formally defined below). We first provide an assumption that is used throughout.

ASSUMPTION 1: We have a random sample of observations (ym, Xm), m = 1, ..., n.13 Let n → ∞. Assume that the random vector ε is continuously distributed on R^K independently of X = (X1, ..., XK) with a joint distribution function F that is known up to a finite dimensional parameter that is part of θ.

13 We do not need independent and identically distributed (i.i.d.) random sampling here. All that is needed is for the law of large numbers to hold. Moreover, an i.i.d. assumption can be made conditional on fixed effects.
The predicted choice probability for y′ given X is

$$\Pr(y'|X) = \int \Pr(y'|\varepsilon, X)\,dF = \int_{R_1(\theta, X)} \Pr(y'|\varepsilon, X)\,dF + \int_{R_2(\theta, X)} \Pr(y'|\varepsilon, X)\,dF = \underbrace{\int_{R_1(\theta, X)} dF}_{\text{unique outcome region}} + \underbrace{\int_{R_2(\theta, X)} \Pr(y'|\varepsilon, X)\,dF}_{\text{multiple outcome region}},$$

where y′ = (y1, ..., yK) is some outcome which is a sequence of 0's or/and 1's, for example, American, Southwest, and Delta serving the market. The third equality splits the likelihood of observing y′ into two regions, R1(θ, X) and R2(θ, X). The first region of the unobservables, R1(θ, X), is where y′ is the unique observable outcome of the entry game. The second region, R2, is where the game admits multiple potentially observable outcomes, one of which is y′. The region R2 can be complicated. For example, in a subregion of R2, y′ and y′′ are the equilibria in pure strategies, while in another subregion of R2, y′ and y′′′ can be the multiple pure strategy equilibria. Mixed strategy equilibria can also exist in region R2 (sometimes uniquely), and if y′ is on the support of the mixing distribution, then y′ is a potentially observable outcome. Hence, allowing for mixed strategies does not present additional problems, but for computational simplicity we do not allow for mixing in our empirical application.14

14 An important consequence of this is the fact that in some cases, when one estimates the model, there might only be mixed strategy equilibria. In the application below, this never happened. For more on inference with mixed strategies, see Berry and Tamer (2006).

The probability function Pr(y′|ε, X) is the selection function for outcome y′ in regions of multiplicity. This function is not specified, and one objective of the methodology in this paper is to examine the question of what can be learned when researchers remain agnostic about this selection function. One can condition this function further on the various equilibria as functions of both ε and X, in which case the statistical model becomes one of a mixture. See Berry and Tamer (2006) for more on this. Generally, without assumptions on equilibrium selection, the model partially identifies the finite dimensional parameter of interest. Bjorn and Vuong (1985) assumed that this function is a constant. More recently, Bajari, Hong, and Ryan (2005) used a more flexible parametrization. To obtain the sharp identified set, one way to proceed is to use semiparametric likelihood, where the parameter space contains the space of unknown probability functions that include the selection functions. Although this is an attractive avenue down which to proceed theoretically, since this will provide
information on the selection functions, it is difficult to implement practically.15 A practical way to proceed that can be used in many games is to exploit the fact that the selection functions are probabilities and hence bounded between 0 and 1, and so an implication of the above model is

(5)    $\int_{R_1(\theta, X)} dF \le \Pr(y'|X) \le \int_{R_1(\theta, X)} dF + \int_{R_2(\theta, X)} dF.$

In vectorized format, these inequalities correspond to the upper and lower bounds on conditional choice probabilities:

(6)    $H_1(\theta, X) \equiv \begin{bmatrix} H_1^1(\theta, X) \\ \vdots \\ H_1^{2^K}(\theta, X) \end{bmatrix} \le \begin{bmatrix} \Pr(y^1|X) \\ \vdots \\ \Pr(y^{2^K}|X) \end{bmatrix} \le \begin{bmatrix} H_2^1(\theta, X) \\ \vdots \\ H_2^{2^K}(\theta, X) \end{bmatrix} \equiv H_2(\theta, X),$

where Pr(y|X) (the vector of the form (Pr(0, 0), Pr(0, 1), ...)) is a 2^K vector of conditional choice probabilities. The inequalities are interpreted element by element. The H's are functions of θ and of the distribution function F. For example, these functions were derived analytically in (4) for the 2 × 2 game. The lower bound function H1 represents the probability that the model predicts a particular market structure as the unique equilibrium.16 H2 contains, in addition, the probability mass of the region where there are multiple equilibria. This is a conditional moment inequality model, and the identified feature is the set of parameter values that obey these restrictions for all X almost everywhere and represents the set of economic models that is consistent with the empirical evidence. More formally, we can state the definition:

DEFINITION 1: Let ΘI be such that

(7)    $\Theta_I = \{\theta \in \Theta \text{ s.t. inequalities (6) are satisfied at } \theta\ \forall X \text{ a.s.}\}.$

We say that ΘI is the identified set.

In general, the set ΘI is not a singleton and it is hard to characterize this set, that is, to find out whether it is finite, convex, etcetera. Next, following Tamer (2003), we provide sufficient conditions that guarantee point-identification.

15 Another approach to sharp inference in this setup is the recent interesting work of Beresteanu, Molinari, and Molchanov (2008).

16 Notice that there are cross-equation restrictions that can be exploited in the "cube" defined in (5), like the fact that the selection probabilities sum to 1.
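In general games the regions R1 and R2 have no closed form, but the bounds entering (6) can be simulated by enumerating the 2^K entry configurations draw by draw, and a candidate θ can then be screened against the inequalities in (7). The sketch below is a minimal illustration of that logic under a pure-strategy Nash equilibrium condition; the profit function, the simulation draws, and the tolerance are all hypothetical inputs, not the paper's implementation.

```python
import itertools
import numpy as np

def simulated_bounds(profit_fn, theta, X, eps_draws):
    """Simulate lower and upper bounds over the 2**K outcomes for one market X.

    profit_fn(y, X, theta, eps) -> (K,) vector of profits under configuration y.
    eps_draws : (R, K) simulation draws, held fixed across candidate theta's.
    """
    R, K = eps_draws.shape
    outcomes = list(itertools.product([0, 1], repeat=K))  # all 2**K configurations
    H1 = np.zeros(len(outcomes))
    H2 = np.zeros(len(outcomes))
    for eps in eps_draws:
        equilibria = []
        for k, y in enumerate(outcomes):
            y = np.array(y)
            pi = profit_fn(y, X, theta, eps)
            # entrants must earn nonnegative profits ...
            ok_in = all(pi[i] >= 0 for i in range(K) if y[i] == 1)
            # ... and no firm that stays out could profitably enter unilaterally
            ok_out = True
            for i in range(K):
                if y[i] == 0:
                    y_dev = y.copy()
                    y_dev[i] = 1
                    if profit_fn(y_dev, X, theta, eps)[i] >= 0:
                        ok_out = False
            if ok_in and ok_out:
                equilibria.append(k)
        for k in equilibria:
            H2[k] += 1.0 / R              # outcome is potentially observable
        if len(equilibria) == 1:
            H1[equilibria[0]] += 1.0 / R  # outcome is the unique prediction
    return H1, H2

def satisfies_inequalities(P_hat, H1, H2, tol=0.0):
    """Element-by-element check of H1 <= P(y|X) <= H2, as in (6) and (7)."""
    return bool(np.all(P_hat >= H1 - tol) and np.all(P_hat <= H2 + tol))
```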
3.3. Exclusion Restriction

The system of equations that we consider is similar to a simultaneous equation system except that here the dependent variable takes finitely many values. As in the classical simultaneous equation system, exclusion restrictions can be used to reach point-identification. In particular, exogenous variables that enter one firm's profit function and not the other's play a key role. We explain using model (2) above.

THEOREM 2: In model (2), let Assumption 1 hold with K = 2. Suppose X1 (X2) (suppressing the dependence on m) is such that x1^1 | X1^{-1}, X2 (x2^1 | X2^{-1}, X1) is continuously distributed with support on R and that (α1^1, α2^1) ≠ (0, 0), where Xi = (xi^1, Xi^{-1}) and αi = (αi^1, αi^{-1}) for i = 1, 2. Finally, normalize αi^1 = 1 for i = 1, 2. Then (α1^{-1}, α2^{-1}, δ1, δ2) is identified.17

PROOF: First, consider the choice probabilities for (0, 0):

(8)    $P(0, 0|X_1, X_2) = P(0, 0|x_1^1, X_1^{-1};\ x_2^1, X_2^{-1}) = P(\varepsilon_1 \le -\alpha_1 X_1;\ \varepsilon_2 \le -\alpha_2 X_2) \;\longrightarrow\; P(\varepsilon_2 \le -\alpha_2 X_2) \quad \text{as } x_1^1 \to -\infty.$

Hence, we see that the choice probability for (0, 0), as we drive x1^1 to −∞, isolates the distribution function for ε2 and the parameter vector α2. Hence, conditioning on those x1^1's (where player 1 is out of the market with probability 1 regardless of what player 2 does), this (0, 0) choice probability point-identifies the marginal distribution of ε2 and α2. Similarly, by driving x2^1 to −∞, we can identify the marginal distribution of ε1 and α1. The same lines as above can be used to also identify (δ1, δ2) along with the joint distribution of (ε1, ε2). In the above discussion, we implicitly assumed that the signs of αi^1 for i = 1, 2 are positive. This is without loss of generality, since large positive values of xi^1 conditional on other x's will yield that firm 1 always enters in case α1^1 is positive; when α1^1 is negative, firm 1 does not enter. Now, we can use the choice probabilities for (1, 1) to identify (δ1, δ2). Q.E.D.

17 The identification in this theorem relies on driving values of one regressor to ±∞ while keeping the others finite. What this effectively does is render the game into a single decision problem, since the player with the large value for one of the regressors will always be in the market or out, regardless of what the other player does.

Independent variation in one regressor while driving another to take extreme values on its support (identification at infinity) identifies the parameters of model (2). Identification at infinity arguments have been used extensively
in econometrics. See, for example, Heckman (1990) and Andrews and Schafgans (1998). In more realistic games with many players, variation in excluded exogenous variables (like the airport presence or cost variables we use in the empirical application) helps shrink the set ΘI. The support conditions above are sufficient for point identification, and are not essential since our inference methods are robust to non-point-identification. However, the exogeneity of the regressors and the exclusion restrictions are important restrictions that are discussed in Section 4.2.

3.4. Estimation

The estimation problem is based on the conditional moment inequality model

(9)    $H_1(\theta, X) \le \Pr(y|X) \le H_2(\theta, X).$

Our inferential procedure uses the objective function

$$Q(\theta) = \int \bigl\|(P(X) - H_1(X, \theta))_-\bigr\| + \bigl\|(P(X) - H_2(X, \theta))_+\bigr\|\,dF_X,$$

where $(A)_- = [a_1 1[a_1 \le 0], \ldots, a_{2^K} 1[a_{2^K} \le 0]]$ and similarly for $(A)_+$ for a 2^K vector A, and where ‖·‖ is the Euclidean norm. It is easy to see that Q(θ) ≥ 0 for all θ ∈ Θ and that Q(θ) = 0 if and only if θ ∈ ΘI, the identified set in Definition 1. The object of interest is either the set ΘI or the (possibly partially identified) true parameter θI ∈ ΘI. We discuss inference on both θI and ΘI, but we present confidence regions for θI, which is the true but potentially non-point-identified parameter that generated the data.18 Statistically, the main difference in whether one considers ΘI or θI as the parameter of interest is that confidence regions for the former are weakly larger than for the latter. Evidently, in the case of point-identification, the regions coincide asymptotically. Inference in partially identified models is a current area of research in econometrics, and in this paper we follow the framework of Manski and Tamer (2002), Imbens and Manski (2004), and Chernozhukov, Hong, and Tamer (2007).19 We discuss first the building of consistent estimators for the identified set, which contains parameters that cannot be rejected as the truth. To estimate ΘI, we first take a sample analog of Q(·). To do that, we first replace Pr(y|X) with a consistent estimator Pn(X). Then we define the set Θ̂I as

(10)    $\hat{\Theta}_I = \{\theta \in \Theta \mid nQ_n(\theta) \le \nu_n\},$

where νn → ∞ and νn/n → 0 (take for example νn = ln(n)) and

(11)    $Q_n(\theta) = \frac{1}{n}\sum_{i=1}^n \bigl\|(P_n(X_i) - H_1(X_i, \theta))_-\bigr\| + \bigl\|(P_n(X_i) - H_2(X_i, \theta))_+\bigr\|,$

where ‖·‖ is the Euclidean norm. Theorem 3 below shows that the set estimator defined above is a Hausdorff-consistent estimator of the set ΘI.

18 Earlier versions of the paper contained estimators for sets Cn such that limn→∞ P(ΘI ⊆ Cn) = α. The current results provide confidence regions for points instead, as the co-editor suggested. The earlier results for the sets, which were not very different, can be obtained from the authors upon request.

19 Other set inference methods that one can use to obtain confidence regions for sets include Andrews, Berry, and Jia (2003), Beresteanu and Molinari (2008), Romano and Shaikh (2008), Pakes, Porter, Ho, and Ishii (2005), Bugni (2007), and Canay (2007).

THEOREM 3: Let Assumption 1 hold. Suppose that for the function Qn defined in (11), (i) $\sup_\theta |Q_n(\theta) - Q(\theta)| = O_p(1/\sqrt{n})$ and (ii) $Q_n(\theta_I) = O_p(1/n)$ for all θI ∈ ΘI. Then we have that, with probability (w.p.) approaching 1,

$$\hat{\Theta}_I \subseteq_{wp1} \Theta_I \quad \text{and} \quad \Theta_I \subseteq_{wp1} \hat{\Theta}_I \quad \text{as } n \to \infty.$$

PROOF: Following the proof of Theorem 3.1 in CHT, first we show that $\hat{\Theta}_I \subseteq_{wp1} \Theta_I$. This event is equivalent to the event that $Q(\theta_n) = o_p(1)$ for all $\theta_n \in \hat{\Theta}_I$. We have

$$Q(\theta_n) \le |Q_n(\theta_n) - Q(\theta_n)| + Q_n(\theta_n) = O_P(1/\sqrt{n}) + O_p(\nu_n/n) = o_p(1).$$

On the other hand, we now show that $\Theta_I \subseteq_{wp1} \hat{\Theta}_I$. This event, again, is equivalent to the event that $Q_n(\theta_I) \le \nu_n/n$ with probability 1 for all θI ∈ ΘI. From the hypothesis of the theorem, we have $Q_n(\theta_I) = O_p(1/n)$. This can be made less than νn/n with probability approaching 1. Q.E.D.
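Given simulated bound functions and an estimated choice-probability vector for each market, the sample objective (11) and the set estimator (10) can be coded in a few lines. The sketch below is illustrative only; the choice νn = ln(n) is the example mentioned in the text, and everything else (names, data layout) is hypothetical.

```python
import numpy as np

def Qn(theta, markets, H1_fn, H2_fn, Pn):
    """Sample objective (11): average norm of the bound violations.

    markets : list of market-level X vectors; Pn[i] is the estimated 2**K
    choice-probability vector for market i; H1_fn/H2_fn return the simulated
    bounds at (X, theta).
    """
    total = 0.0
    for i, X in enumerate(markets):
        low = Pn[i] - H1_fn(X, theta)   # negative entries violate the lower bound
        up = Pn[i] - H2_fn(X, theta)    # positive entries violate the upper bound
        total += np.linalg.norm(np.minimum(low, 0.0)) + np.linalg.norm(np.maximum(up, 0.0))
    return total / len(markets)

def set_estimate(theta_grid, markets, H1_fn, H2_fn, Pn):
    """Set estimator (10): keep all theta with n*Qn(theta) below the cutoff nu_n."""
    n = len(markets)
    nu_n = np.log(n)                    # one admissible choice of nu_n
    return [th for th in theta_grid if n * Qn(th, markets, H1_fn, H2_fn, Pn) <= nu_n]
```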
To conduct inference in the above moment inequalities model, we use the methodology of CHT, where the above inequality is a canonical example of a moment inequality model. We construct a set Cn such that limn→∞ P(θI ∈ Cn) ≥ α for a prespecified α ∈ (0, 1) for any θI ∈ ΘI. In fact, the set Cn that we construct will not only have the desired coverage property, but will also be consistent in the sense of Theorem 3. This confidence region is based on the principle of collecting all of the parameters that cannot be rejected. The confidence regions we report are constructed as follows. Let

(12)    $C_n(c) = \bigl\{\theta \in \Theta : n\bigl(Q_n(\theta) - \min_t Q_n(t)\bigr) \le c\bigr\}.$

We start with an initial estimate of ΘI. This set can be, for example, Cn(c0) = Cn(0). Then we will subsample the statistic n(Qn(θ0) − min_t Qn(t)) for θ0 ∈ Cn(c0) and obtain the estimate of its α-quantile, c1(θ0). That is, c1(θ0) is the α-quantile of $\{b_n(Q_{b_n,j}(\theta_0) - \min_t Q_{b_n,j}(t)),\ j = 1, \ldots, B_n\}$. We repeat this for all θ0 ∈ Cn(c0). We take the first updated cutoff c1 to be $c_1 = \sup_{\theta_0 \in C_n(c_0)} c_1(\theta_0)$. This will give us the confidence set Cn(c1). We then redo the above step, replacing c0 with c1, which will get us c2. As the confidence region we can report Cn ≡ Cn(c2) or the generally "smaller"

$$\tilde{\Theta}_I = \bigl\{\theta \in \Theta : n\bigl(Q_n(\theta) - \min_t Q_n(t)\bigr) \le \min(c_2, c_n(\theta))\bigr\},$$

where cn(θ) is the estimated α-quantile of n(Qn(θ) − min_t Qn(t)). In our data set, we find that there is not much difference between the two, so we report Cn(c2). See CHT for more on this and for other ways to build asymptotically equivalent confidence regions. Also, more on subsampling size and other steps can be found in the online Supplemental Material (Ciliberto and Tamer (2009)).
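A sketch of one pass of the cutoff-updating subsampling scheme just described; the subsample size bn, the number of subsamples Bn, the grid over Θ, and the way Qn is recomputed on subsamples are all placeholder choices, not the ones used in the paper.

```python
import numpy as np

def confidence_region(theta_grid, data, Qn_fn, alpha=0.95, bn=400, Bn=200, seed=0):
    """One cutoff update: start from Cn(0), subsample n*(Qn(theta) - min_t Qn(t)),
    and keep every theta whose statistic is below the largest alpha-quantile.

    theta_grid : iterable of hashable parameter values (e.g., tuples)
    Qn_fn(theta, sample) : recomputes the objective (11) on a given sample
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    Qn_full = {th: Qn_fn(th, data) for th in theta_grid}
    q_min = min(Qn_full.values())
    C0 = [th for th in theta_grid if n * (Qn_full[th] - q_min) <= 0.0]  # Cn(0)

    cutoffs = []
    for th in C0:
        stats = []
        for _ in range(Bn):
            idx = rng.choice(n, size=bn, replace=False)   # requires bn <= n
            sub = [data[i] for i in idx]
            Qb = {t: Qn_fn(t, sub) for t in theta_grid}
            stats.append(bn * (Qb[th] - min(Qb.values())))
        cutoffs.append(np.quantile(stats, alpha))
    c1 = max(cutoffs) if cutoffs else 0.0
    return [th for th in theta_grid if n * (Qn_full[th] - q_min) <= c1]
```

In practice the step would be repeated once more with the updated cutoff, as in the text, to obtain c2 and the reported region Cn(c2).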
3.5. Simulation

In general games, it is not possible to derive the functions H1 and H2 analytically. Here, we provide a brief description of the simulation procedure that can be used to obtain an estimate of these functions for a given X and a given value for the parameter vector θ. We first draw R simulations of market and firm unobservables for each market m. These draws remain fixed during the optimization stage. We transform the random draw into one with a given covariance matrix. Then we obtain the "payoffs" for every player i as a function of other players' strategies, observables, and parameters. This involves computing a 2^K vector of profits for each simulation draw and for every value of θ. If π(y^j, X, θ) ≥ 0 for some j ∈ {1, ..., 2^K}, then y^j is an equilibrium of that game. If this equilibrium is unique, then we add 1 to the lower bound probability for outcome y^j and add 1 for the upper bound probability. If the equilibrium is not unique, then we add a 1 only to the upper bound of each of the multiple equilibria's upper bound probabilities. For example, the upper bound on the outcome probability Pr(1, 1, ..., 1|X) is

$$\hat{H}_2^{2^K}(X, \theta) = \frac{1}{R}\sum_{j=1}^{R} 1\Bigl[\pi_1\bigl(X_1, \theta;\ y_{-1}^{2^K}, \varepsilon_1^j\bigr) \ge 0,\ \ldots,\ \pi_K\bigl(X_K, \theta;\ y_{-K}^{2^K}, \varepsilon_K^j\bigr) \ge 0\Bigr],$$
where 1[∗] is equal to 1 if the logical condition ∗ is true and where R is the number of simulations; here we assume that R increases to infinity with sample size.20 The methods developed by McFadden (1989) and Pakes and Pollard (1989) can be easily used to show that Ĥi(X, θ) converges almost surely uniformly in θ and X to Hi(X, θ) as the number of simulations increases, for i = 1, 2.

20 Since the objective function is nonlinear in the moment condition that contains the simulated quantities, it is important to drive the number of simulations to infinity; otherwise, there will be a simulation error that does not vanish and can lead to inconsistencies.

4. MARKET STRUCTURE IN THE U.S. AIRLINE INDUSTRY

Our work contributes to the literature started by Reiss and Spiller (1989) and continued by Berry (1992). Reiss and Spiller (1989) provided evidence that unobservable firm heterogeneity in different markets is important in determining the effect of market power on airline fares. Berry (1992) showed that firm observable heterogeneity, such as airport presence, plays an important role in determining airline profitability, providing support to the studies that show a strong positive relationship between airport presence and airline fares.21 Berry also found that profits decline rapidly in the number of entering firms, consistent with the findings of Bresnahan and Reiss (1991b). In this paper, we investigate the role of heterogeneity in the effects that each firm's entry has on the profits of its competitors: we call this their competitive effect. Then we use our model to perform a policy exercise on how market structures will change, at least in the short run and within our model, in markets out of and into Dallas after the repeal of the Wright Amendment. We start with a data description.

21 See Borenstein (1989) and Evans and Kessides (1993).

4.1. Data Description

To construct the data, we follow Berry (1992) and Borenstein (1989). Our main data come from the second quarter of the 2001 Airline Origin and Destination Survey (DB1B). We discuss the data construction in detail in the Supplemental Material. Here, we provide information on the main features of the data set.

Market Definition

We define a market as the trip between two airports, irrespective of intermediate transfer points and of the direction of the flight. The data set includes a sample of markets between the top 100 metropolitan statistical areas (MSAs), ranked by population size.22

22 The list of the MSAs is available from the authors.

In this sample we also include markets that are
temporarily not served by any carrier, which are the markets where the number of observed entrants is equal to zero. The selection of these markets is discussed in the Supplemental Material. Our data set includes 2742 markets. Carrier Definition We focus our analysis on the strategic interaction between American, Delta, United, and Southwest, because one of the objectives of this paper is to develop the policy experiment to estimate the impact of repealing the Wright Amendment. To this end, we need to pay particular attention to the nature of competition in markets out of Dallas. Competition out of Dallas has been under close scrutiny by the Department of Justice. In May 1999, the Department of Justice (DOJ) filed an antitrust lawsuit against American Airlines, charging that the major carrier tried to monopolize service to and from its Dallas/Fort Worth (DFW) hub.23 So, using data from 2001—the year when American won the case against the DOJ—we investigate whether American shows a different strategic behavior than other large firms. Among the other large firms, Delta and United are of particular interest because they interact intensely with American at its two main hubs: Dallas (Delta) and Chicago O’Hare (United). In addition to considering American, Delta, United, and Southwest individually, we build two additional categorical variables that indicate the types of the remaining firms.24 The categorical variable medium airlines, MAm , is equal to 1 if either America West, Continental, Northwest, or USAir is present in market m. Lumping these four national carriers into one type makes sense if we believe that they do not behave in strategically different ways from each other in the markets we study. To facilitate this assumption, we drop markets where one of the two endpoints is a hub of the four carriers included in the type medium airlines.25 23 In particular, in April 27, 2001, the District Court of Kansas dismissed the DOJ’s case, granting summary judgment to American Airlines. The DOJ’s complaint focused on American’s responses to Vanguard Airlines, Sun Jet, and Western Pacific. In each case, fares dropped dramatically and passenger traffic rose when the low cost carriers (LCCs) began operations at DFW. According to the DOJ, American then used a combination of more flights and lower fares until the low cost carriers were driven out of the route or drastically curtailed their operations. American then typically reduced service and raised fares back to monopoly levels once the low cost carriers were forced out of DFW routes. In the lawsuit, the DOJ claimed that American responded aggressively against new entry of low cost carriers in markets out of Dallas/Fort Worth, a charge that was later dismissed. 24 In a previous draft of this paper, which is available from the authors’ websites, we showed that we could also construct vectors of outcomes where an element of the vector is the number of how many among Continental, Northwest, America West, and USAir are in the market. This is analogous to a generalized multivariate version of Berry (1992) and, especially, of Mazzeo (2002). We chose to let MAm and LCCm be categorical variables, since most of the time they take either a 0 or 1 value. 25 See the Supplemental Material for a list of these hubs.
The categorical variable low cost carrier small, LCCm , is equal to 1 if at least one of the small low cost carriers is present in market m. 4.2. Variable Definitions We now introduce the variables used in our empirical analysis. Table I presents the summary statistics for these variables. Airport Presence Using Berry’s (1992) insight, we construct measures of carrier heterogeneity using the carrier’s airport presence at the market’s endpoints. First, we compute a carrier’s ratio of markets served by an airline out of an airport over the total number of markets served out of an airport by at least one carrier.26 Then we define the carrier’s airport presence as the average of the carrier’s airport presence at the two endpoints. We maintain that the number of markets that one airline (e.g., Delta) serves out of one airport (e.g., Atlanta) is taken as given by the carrier when it decides whether to serve another market.27 Cost Firm- and market-specific measures of cost are not available. We first compute the sum of the geographical distances between a market’s endpoints and the closest hub of a carrier as a proxy for the cost that a carrier has to face to serve that market.28 Then we compute the difference between this distance and the nonstop distance between two airports, and we divide this difference by the nonstop distance. This measure can be interpreted as the percentage of the nonstop distance that the airline must travel in excess of the nonstop distance if the airline uses a connecting instead of a nonstop flight. This is a good measure of the opportunity fixed cost of serving a market, even when a carrier serves that market on a nonstop basis, because it measures the cost of the best alternative to nonstop service, which is a connecting flight through the closest hub. It is associated with the fixed cost of providing airline service because it is a function of the total capacity of a plane, but does not depend on the number of passengers transported on a particular flight. We call this variable cost. 26
See the discussion in the Supplemental Material for more on this. The entry decision in each market is interpreted as a “marginal” decision, which takes the network structure of the airline as given. This marginal approach to the study of the airline markets is also used in the literature that studies the relationship between market concentration and pricing. For example, Borenstein (1989) and Evans and Kessides (1993) did not include prices in other markets out of Atlanta (e.g., ATL-ORD) to explain fares in the market ATL-AUS. The reason for this marginal approach is that modeling the design of a network is too complicated. 28 Data on the distances between airports, which are also used to construct the variable close airport are from the data set Aviation Support Tables: Master Coordinate, available from the National Transportation Library. See the Supplemental Material for the list of hubs. 27
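For concreteness, the two firm-level variables just defined (airport presence and the cost proxy) could be computed along the following lines; the data structures and names below are hypothetical, not the actual DB1B layout.

```python
def airport_presence(served, carrier, a, b):
    """Average, over the two endpoint airports, of the share of markets out of
    each airport that the carrier serves.

    served : dict mapping airport -> {carrier: set of markets served}
    """
    def share(airport):
        all_markets = set().union(*served[airport].values())
        return len(served[airport].get(carrier, set())) / len(all_markets)
    return 0.5 * (share(a) + share(b))

def cost_proxy(nonstop_dist, dist_a_to_hub, dist_b_to_hub):
    """Percentage detour of a connecting routing through the carrier's closest
    hub, relative to the nonstop distance between the two endpoints."""
    connecting = dist_a_to_hub + dist_b_to_hub
    return (connecting - nonstop_dist) / nonstop_dist
```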
TABLE I
SUMMARY STATISTICS

                                   AA        DL        UA        MA        LCC       WN
Airline (%)                      0.426     0.551     0.275     0.548     0.162     0.247
                                (0.494)   (0.497)   (0.447)   (0.498)   (0.369)   (0.431)
Airport presence (%)             0.422     0.540     0.265     0.376     0.098     0.242
                                (0.167)   (0.180)   (0.153)   (0.135)   (0.077)   (0.176)
Cost (%)                         0.736     0.420     0.784     0.229     0.043     0.302
                                (1.609)   (1.322)   (1.476)   (0.615)   (0.174)   (0.860)

Market level variables
Wright amendment (0/1)               0.029 (0.169)
Dallas airport (0/1)                 0.070 (0.255)
Market size (population)         2,258,760 (1,846,149)
Per capita income ($)            32,402.29 (3911.667)
Income growth rate (% * 100)         5.195 (0.566)
Market distance (miles)           1084.532 (624.289)
Closest airport (miles)             34.623 (20.502)
U.S. center distance (miles)      1570.614 (593.798)
Number of markets                     2742
The Wright Amendment The Wright Amendment was passed in 1979 to stimulate the growth of the Dallas/Fort Worth airport. To achieve this objective, Congress restricted airline service out of Dallas Love, the other major airport in the Dallas area. In particular, the Wright Amendment permitted air carrier service between Love Field and airports only in Texas, Louisiana, Arkansas, Oklahoma, New Mexico, Alabama, Kansas, and Mississippi, provided that the air carrier did not permit through service or ticketing and did not offer for sale transportation outside of these states.29 In October 2006, a bill was enacted that determined the full repeal of the Wright Amendment in 2014. Between 2006 and 2014, nonstop flights outside the Wright zone would still be banned, connecting flights outside the Wright zone would be allowed immediately, and only domestic flights would be allowed out of Dallas Love. We construct a binary variable, Wright, equal to 1 if entry into the market is regulated by the Wright Amendment, and equal to 0 otherwise. Wright is equal to 1 for the markets between Dallas Love and any airport except the ones located in Texas, Louisiana, Arkansas, Oklahoma, New Mexico, Alabama, Kansas, and Mississippi. We also construct another categorical variable, called Dallas market, which is equal to 1 if the market is between either of the two Dallas airports and any other airport in the data set. This variable controls for the presence of a Dallas fixed effect. More details on the Wright Amendment are given in the Supplemental Material. Control Variables We use six control variables. Three of these are demographic variables.30 The geometric mean of the city populations at the market endpoints measures the market size. The average per capita incomes (per capita income) and the average rates of income growth (income growth rate) of the cities at the market endpoints measure the strength of the economies at the market endpoints. The other three variables are geographic. The nonstop distance between the endpoints is the measure of market distance. The distance from each airport to the closest alternative airport controls for the possibility that passengers can fly from different airports to the same destination (close airport).31 Finally, we use 29 The Shelby Amendment, passed in 1997, dropped the original restriction on flights between Dallas Love and airports in Alabama, Kansas, and Mississippi. In 2005, an amendment was passed that exempted Missouri from the Wright restrictions. 30 Data are from the Regional Economic Accounts of the Bureau of Economic Analysis, download in February 2005. 31 For example, Chicago Midway is the closest alternative airport to Chicago O’Hare. Notice that for each market, we have two of these distances, since we have two endpoints. Our variable is equal to the minimum of these two distances. In previous versions of the paper, we addressed the concern that many large cities have more than one airport. For example, it is possible to fly
Market Size Does Not Explain Market Structure

To motivate the analysis that follows, we have classified markets by market size of the connected cities. The relevant issue is whether market size alone determines market structure (Bresnahan and Reiss (1990)). Table II provides some evidence that the variation in the number of firms across markets cannot be explained by market size alone.

TABLE II
DISTRIBUTION OF THE NUMBER OF CARRIERS BY MARKET SIZEa

Number of Firms      Large     Medium    Small     Total
0                     7.07      7.31      7.73      7.29
1                    41.51     22.86     20.91     30.63
2                    29.03     24.30     22.14     25.93
3                    12.23     19.67     16.34     15.72
4                     8.07     15.14     14.59     11.93
5                     1.66      9.58     16.17      7.48
6                     0.42      1.13      2.11      1.02
Number of markets    1202       971       569      2742

a Cross-tabulation of the percentage of firms serving a market by the market size, which is here measured by the geometric mean of the populations at the market endpoints.

Identification in Practice

We assume that the unobservables are not correlated with our exogenous variables. This is part of the content of Assumption 1. Notice that this assumption would be clearly violated if we were to use variables that the firm can
choose, such as prices or quantities. However, we are considering a reduced form profit function, where all of the control variables (e.g., population, distance) are maintained to be exogenous.33

The main difficulty in estimating model (1) is the presence of the competitors' entry decisions, since we consider a simultaneous move entry game. Theorem 2 in Section 3.3 shows that an exclusion restriction helps to point identify θI. Here, an exclusion restriction consists of a variable that enters firm i's profit but not firm j's. If this variable has wide support (e.g., a large degree of variation), then this reduces the size of the identified set. Berry (1992) assumed that the airport presence of one carrier is excluded from the profit equations of its competitors. Then airport presence is a market–carrier-specific variable that shifts the individual profit functions without changing the competitors' profit functions. We refer to this model as the fixed competitive effect specification (that is, φij = 0 ∀i, j). For example, the airport presence of American is excluded from the profit function of Delta. In the fixed competitive effects version, we have two exclusion restrictions: both the airport presence (used by Berry) and the cost of the competitors are excluded from the profit function.

The second version of model (1) that we estimate is called variable competitive effects. This version includes the airport presence of one airline in the profit function of all airlines. As mentioned in the Introduction, the theoretical underpinnings for these variable competitive effects are in Hendricks, Piccione, and Tan (1997). In this version, only the cost variable shifts the individual profit functions without changing the competitors' profit functions, while airport presence is included in the profit functions of all firms.

The economic rationale for excluding the competitors' cost but including their airport presence in a firm's profit function is the following. The airport presence variable is a measure of product differentiation. Thus, the airport presence of each firm is likely to enter the demand side of the profit function of all firms.34 In contrast, a variable that affects the fixed cost of one firm directly enters the (reduced form) profit function of only that firm.35 We maintain that this variable does not enter the profit function of the competitors directly.
33 The presence of market-, airport-, and airline-specific random effects controls for unobserved heterogeneity in the data.
34 Berry (1990) used airport presence as a measure of product differentiation in a discrete choice model of demand for airline travel.
35 Notice that variables affecting the variable costs would not work as instruments because they would enter into the reduced form profit functions. The excluded variables must be determinants of the fixed cost.
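To fix ideas, here is a schematic version of a reduced-form profit with both a fixed and a variable competitive effect, the latter scaled by the rival's airport presence. The exact functional form is model (1) earlier in the paper; the names below are our own and the code is only an illustrative sketch of the structure discussed in this section, not the authors' specification.

```python
def profit(i, y, X_m, Z_m, cost_m, beta, gamma, alpha, delta, phi, eps):
    """Illustrative reduced-form profit of firm i in market m.
    y: 0/1 entry decisions of all firms; X_m: market-level controls;
    Z_m[j]: airport presence of firm j; cost_m[j]: cost variable of firm j;
    delta[j], phi[j]: fixed and variable competitive effects of rival j."""
    own = alpha * Z_m[i] + gamma * cost_m[i] + sum(b * x for b, x in zip(beta, X_m))
    rivals = [j for j in range(len(y)) if j != i and y[j] == 1]
    competition = sum(delta[j] + phi[j] * Z_m[j] for j in rivals)
    return own + competition + eps[i]

# Under the fixed competitive effects specification, phi[j] = 0 for all rivals j;
# under the Berry-style exclusion restriction, only firm i's own Z_m[i] enters its profit.
```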
Reporting of Estimates

In our results, we report superset confidence regions that cover the truth, θI, with a prespecified probability. This parameter might be partially identified, and hence our confidence intervals are robust to non-point-identification. Since generically these models are not point identified, and since the true parameter, along with all parameters in the identified set, minimizes a nonlinear objective function, it is not possible to provide estimates of the bounds on the true parameter.36 Our reported confidence regions have the coverage property and can also be used as consistent estimators for the bounds of the partially identified parameter θI. In each table, we therefore report the cube that contains the confidence region, defined as the set of parameters that cannot be rejected as the truth with at least 95% probability.37

5. EMPIRICAL RESULTS

Before discussing our results, we specify in detail the error structure of our empirical model and discuss the first stage estimation. First, we include firm-specific unobserved heterogeneity, uim.38 Then we add market-specific unobserved heterogeneity, um. Finally, we add airport-specific unobserved heterogeneity, uom and udm, where uom is an error that is common across all markets whose origin is o and udm is an error that is common across all markets whose destination is d.39 uim, um, uom, and udm are independent and normally distributed, except where explicitly mentioned. Recall that εim is the sum of all four errors.

With regard to the first stage estimation of the empirical probabilities, we first discretize the variables and then use a nonparametric frequency estimator. We discuss the way we discretize the variables in the Supplemental Material. The nonparametric frequency estimator consists of counting the fraction of markets with a given realization of the exogenous variables in which we observe a given market structure.40

36 The reason is that it is not possible to solve for the upper and lower endpoints of the bounds, especially in a structural model where the objective function is almost always mechanically minimized at a unique point.
37 Not every parameter in the cube belongs to the confidence region. This region can contain holes, but here we report the smallest connected "cube" that contains the confidence region.
38 In one specification (third column of Table IV in Section 5.2), we estimate the covariance matrix of the unobserved variables (reported in Table VI).
39 Recall that our markets are defined irrespective of the direction of the flight. Thus, the use of the terms "origin" and "destination" means either one of the market endpoints.
40 An alternative to discretization and nonparametric estimation is to add a distributional assumption in the first stage. In previous versions of the paper, we estimated the empirical probabilities using a multinomial logit. This discretization is necessary since inference procedures with a nonparametric first step with continuous regressors have not been developed.
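As an illustration of the first-stage estimator just described, the following sketch computes, for each cell of the discretized exogenous variables, the empirical frequency of every observed market structure. Names and data shapes are illustrative; this is not the authors' code.

```python
from collections import Counter, defaultdict

def frequency_estimator(cells, outcomes):
    """cells: discretized exogenous variables per market (hashable tuples).
    outcomes: observed market structure per market, e.g. a tuple of 0/1 entry decisions.
    Returns the empirical P(outcome | cell) for every cell."""
    counts = defaultdict(Counter)
    for x, y in zip(cells, outcomes):
        counts[x][y] += 1
    return {x: {y: n / sum(c.values()) for y, n in c.items()}
            for x, c in counts.items()}

# Example with three markets and two firms:
cells = [(1, 0), (1, 0), (0, 1)]
outcomes = [(1, 0), (1, 1), (0, 0)]
print(frequency_estimator(cells, outcomes))
# {(1, 0): {(1, 0): 0.5, (1, 1): 0.5}, (0, 1): {(0, 0): 1.0}}
```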
5.1. Fixed Competitive Effects

This section provides the estimation results for model (1) when we restrict φij = φj = 0 ∀i, j. Essentially, this is the same specification as the one used by Bresnahan and Reiss (1990) and Berry (1992), and therefore it provides the ideal framework with which to compare our methodology. This first version is also useful for investigating the case where the competitive effect of one airline is allowed to vary by the identity of its competitors. For example, we allow Delta's effect on American to be different than Delta's effect on Southwest. In this case, the number of parameters to be estimated gets large very quickly. Thus, this specification allows for a more flexible degree of heterogeneity that is computationally very difficult to have without restricting φij = 0 ∀i, j.

Berry Specification

The second column of Table III presents the estimation results for a variant of the model estimated by Berry (1992). Here we assume βi = β, αi = α, and δij = δ ∀i, j. Most importantly, this implies that the effects of firms on each other, measured by δ, are identical. In the second column of Table III, the reported confidence interval is the "usual" 95% confidence interval since the coefficients are point identified. The main limitation of this model is that the effects of firms on each other are identical, which ensures that in each market there is a unique equilibrium in the number of firms.

The parameter competitive fixed effect captures the effect of the number of firms on the probability of observing another firm entering a market. We estimate the effect of an additional firm to be [−14.151, −10.581]. The entry of a firm lowers the probability that we see its competitors in the market. As the number of markets that an airline serves at an airport increases, the probability that the firm enters into the market increases as well. This is seen from the positive effect of airport presence, which is [3.052, 5.087]. As expected, the higher is the value of the variable cost, the lower is the probability that the firm serves that market ([−0.714, 0.024]). A higher income growth rate increases the probability of entry ([0.370, 1.003]), as do market size ([0.972, 2.247]), U.S. center distance ([1.452, 3.330]), market distance ([4.356, 7.046]), per capita income ([0.568, 2.623]), and close airport ([4.022, 9.831]). The Wright Amendment has a negative impact on entry, as its coefficient is estimated to be [−20.526, −8.612].

Next, we present values of the distance function at the parameter values where this function is minimized. For this specification, the distance function takes the value 1756.2. This function can be interpreted as a measure of "fit" among different specifications that use the same exogenous variables.

Berry's (1992) (symmetry) assumptions ensure that the equilibrium is unique in the number of firms, though there might be multiple equilibria in the identity of firms. To examine the existence of multiple equilibria in the identity of firms,
TABLE III
EMPIRICAL RESULTSa

                                  Berry (1992)          Heterogeneous          Heterogeneous          Firm-to-Firm
                                                        Interaction            Control                Interaction
Competitive fixed effect          [−14.151, −10.581]
  AA                                                    [−10.914, −8.822]      [−9.510, −8.460]
  DL                                                    [−10.037, −8.631]      [−9.138, −8.279]
  UA                                                    [−10.101, −4.938]      [−9.951, −5.285]
  MA                                                    [−11.489, −9.414]      [−9.539, −8.713]
  LCC                                                   [−19.623, −14.578]     [−19.385, −13.833]
  WN                                                    [−12.912, −10.969]     [−10.751, −9.29]
LAR on LAR (LAR: AA, DL, UA, MA)                                                                      [−9.086, −8.389]
LAR on LCC                                                                                            [−20.929, −14.321]
LAR on WN                                                                                             [−10.294, −9.025]
LCC on LAR                                                                                            [−22.842, −9.547]
WN on LAR                                                                                             [−9.093, −7.887]
LCC on WN                                                                                             [−13.738, −7.848]
WN on LCC                                                                                             [−15.950, −11.608]
Airport presence                  [3.052, 5.087]        [11.262, 14.296]       [10.925, 12.541]       [9.215, 10.436]
Cost                              [−0.714, 0.024]       [−1.197, −0.333]       [−1.036, −0.373]       [−1.060, −0.508]
Wright                            [−20.526, −8.612]     [−14.738, −12.556]     [−12.211, −10.503]     [−12.092, −10.602]
Dallas                            [−6.890, −1.087]      [−1.186, 0.421]        [−1.014, 0.324]        [−0.975, 0.224]
Market size                       [0.972, 2.247]        [0.532, 1.245]         [0.372, 0.960]         [0.044, 0.310]
  WN                                                                           [0.358, 0.958]
  LCC                                                                          [0.215, 1.509]
Market distance                   [4.356, 7.046]        [0.106, 1.002]         [0.062, 0.627]         [−0.057, 0.486]
  WN                                                                           [−2.441, −1.121]
  LCC                                                                          [−0.714, 1.858]
Close airport                     [4.022, 9.831]        [−0.769, 2.070]        [−0.289, 1.363]        [−1.399, −0.196]
  WN                                                                           [1.751, 3.897]
  LCC                                                                          [0.392, 5.351]
U.S. center distance              [1.452, 3.330]        [−0.932, −0.062]       [−0.275, 0.356]        [−0.606, 0.242]
  WN                                                                           [−0.357, 0.860]
  LCC                                                                          [−1.022, 0.673]
Per capita income                 [0.568, 2.623]        [−0.080, 1.010]        [0.286, 0.829]         [0.272, 1.073]
Income growth rate                [0.370, 1.003]        [0.078, 0.360]         [0.086, 0.331]         [0.094, 0.342]
Constant                          [−13.840, −7.796]     [−1.362, 2.431]        [−1.067, −0.191]       [0.381, 2.712]
  MA                                                                           [−0.016, 0.852]
  LCC                                                                          [−2.967, −0.352]
  WN                                                                           [−0.448, 1.073]
Function value                    1756.2                1644.1                 1627                   1658.3
Multiple in identity              0.837                 0.951                  0.943                  0.969
Multiple in number                0                     0.523                  0.532                  0.536
Correctly predicted               0.328                 0.326                  0.325                  0.308

a These set estimates contain the set of parameters that cannot be rejected at the 95% confidence level. See Chernozhukov, Hong, and Tamer (2007) and the Supplemental Material for more details on constructing these confidence regions.
we simulate results and find that in 83.7% of the markets there exist multiple equilibria in the identity of firms.

Finally, we report the percentage of outcomes that are correctly predicted by our model. Clearly, in each market we only observe one outcome in the data. The model, however, predicts several equilibria in that market. If one of them is the outcome observed in the data, then we conclude that our model predicted the outcome correctly. We find that our model predicts 32.8% of the outcomes in the data. This is also a measure of fit that can be used to compare models.

Heterogeneous Competitive Fixed Effects

The third column allows for firms to have different competitive effects on their competitors. We relax the assumption δij = δ ∀i, j. Here we only assume δij = δj ∀i, j. For example, the effect of American's presence on Southwest's and Delta's entry decisions is given by δAA, while the effect of Southwest's presence on the decisions of the other airlines is given by δWN. All the δ's are estimated to be negative, which is in line with the intuition that profits decline when other firms enter a market. There is quite a bit of heterogeneity in the effects that firms have on each other. The row denoted AA reports the estimates for the effect of American on the decision of the other airlines to enter into the market. We estimate the effect of American on the other airlines to be [−10.914, −8.822]. The entry decision of low cost carriers (LCC), instead, has a stronger effect on other airlines; the estimate of this effect is [−19.623, −14.578].

The coefficient estimates for the control variables are quite different in the second and third columns. This suggests that assuming symmetry introduces some bias in the estimates of the exogenous variables. For example, in the second column we estimate the effect of market distance to be [4.356, 7.046], while in the third column the effect is [0.106, 1.002]. The estimates for the constant are also different: [−13.840, −7.796] in the second column and [−1.362, 2.431] in the third column.

The differences in the competitive effects are large enough to lead to multiple equilibria in the number of firms in 52.3% of the markets. Thus, even the simplest form of heterogeneity introduces the problem of multiple equilibria in a fundamental way. Next, we show that multiple equilibria can also be present when we allow for other types of heterogeneity in the empirical model.

Control Variables With Heterogeneous Fixed Effects

The fourth column allows the control variables to have different effects on the profits of firms. In practice, we drop the assumption αi = α ∀i. This is interesting because relaxing this assumption leads to multiple equilibria, even if the competitive effects are the same across firms.
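To illustrate what "multiple equilibria in the identity of firms" means in the simplest case, the following sketch enumerates the pure-strategy Nash equilibria of a simultaneous entry game for a single market under hypothetical profit values. This is an illustration of the concept only, not the simulation code behind Table III.

```python
from itertools import product

def pure_nash(profit_fn, n_firms):
    """Enumerate pure-strategy Nash equilibria of a simultaneous entry game.
    profit_fn(i, y) is firm i's profit under entry profile y; staying out yields 0."""
    equilibria = []
    for y in product((0, 1), repeat=n_firms):
        stable = all(
            profit_fn(i, y) >= 0 if y[i] == 1
            else profit_fn(i, y[:i] + (1,) + y[i + 1:]) <= 0
            for i in range(n_firms)
        )
        if stable:
            equilibria.append(y)
    return equilibria

# Hypothetical two-firm market: each firm is profitable as a monopolist but not as a
# duopolist, so there are two equilibria that differ in the identity of the entrant
# while the number of entrants (one) is the same.
pi = {(0, 0): (0.0, 0.0), (1, 0): (2.0, 0.0), (0, 1): (0.0, 1.0), (1, 1): (-1.0, -0.5)}
print(pure_nash(lambda i, y: pi[y][i], 2))  # [(0, 1), (1, 0)]
```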
We estimate market size to have a similar positive effect on the probability that all firms enter into a market (the estimated sets overlap). On the contrary, we find that market distance increases the probability of entry of large national carriers, but it has a negative effect on the entry decision of Southwest. This is consistent with anecdotal evidence that Southwest serves shorter markets than the larger national carriers.

Firm-to-Firm Specific Competitive Effects

We now allow Delta's effect (Delta's effect is coded as the effect of a large (LAR) type firm) on American (whose effect is also coded as the effect of a type LAR firm) to be different than Delta's effect on Southwest (WN). Here, the competitive effects of American (AA), Delta (DL), United (UA), and the type MA are coded as the effect of a type LAR firm. Therefore, δLAR,LAR measures the competitive effect of the entry of a large carrier, for example, American, on another large carrier, for example, Delta. δWN,LAR measures the competitive effect of Southwest on one of the four LAR firms. The other parameters are defined similarly.

We find that the competitive effect of large firms on other large firms (LAR on LAR, or δLAR,LAR) is [−9.086, −8.389], which is smaller than the competitive effect of large firms on low cost firms (LAR on LCC). The LAR competitive effects are not symmetric, in the sense that δLCC,LAR is larger than δLAR,LCC. Finally, the competitive effects of Southwest and large firms on each other are symmetric. Overall, these results suggest that the competitive effects are firm-to-firm specific. In later specifications, we do not allow for the competitive effects to vary in this very general way, in order to reduce the number of parameters to be estimated. However, we find that allowing for variable competitive effects and for a flexible variance–covariance structure leads to results that are equally rich in terms of firm-to-firm effects.

5.2. Variable Competitive Effects

In this section, we study models where the competitive effect of a firm on the other carriers' profits from serving that market varies with its airport presence. Here, the main focus is on the estimation of φj.41

Variable Competitive Effects With Independent Unobservables

The second column of Table IV reports the estimation results when the errors are assumed to be i.i.d. Most importantly, the coefficients φj, which measure the variable competitive effects, are all negative, as we would expect. This implies that the larger is the airport presence of an airline, the less likely is the entry of its competitors in markets where the airline is present.
41 We restrict φij = φj for computational reasons.
TABLE IV
VARIABLE COMPETITIVE EFFECTS
Fixed effect AA DL UA MA LCC WN Variable effect AA DL UA MA LCC WN Airport presence Cost AA DL UA MA LCC WN Wright Dallas Market size WN LCC Market distance WN LCC Close airport WN LCC U.S. center distance WN LCC Per capita income Income growth rate Constant MAm LCC WN Function value Multiple in identity Multiple in number Correctly predicted
Independent Unobs
Variance–Covariance
Only Costs
[−9.433, −8.485] [−10.216, −9.255] [−6.349, −3.723] [−9.998, −8.770] [−28.911, −20.255] [−9.351, −7.876]
[−8.817, −8.212] [−9.056, −8.643] [−4.580, −3.813] [−7.476, −6.922] [−14.952, −14.232] [−6.570, −5.970]
[−11.351, −9.686] [−12.472, −11.085] [−10.671, −8.386] [−11.906, −10.423] [−11.466, −8.917] [−12.484, −10.614]
[−5.792, −4.545] [−3.812, −2.757] [−10.726, −5.645] [−6.861, −4.898] [−9.214, 13.344] [−10.319, −8.256] [14.578, 16.145] [−1.249, −0.501]
[−4.675, −3.854] [−3.628, −3.030] [−8.219, −7.932] [−7.639, −6.557]
[−17.800, −16.346] [0.368, 1.323] [0.230, 0.535] [0.260, 0.612] [−0.432, 0.507] [0.009, 0.645] [−3.091, −1.819] [−1.363, 1.926] [−0.373, 0.422] [1.164, 3.387] [1.059, 3.108] [−9.271, 0.506] [0.276, 1.008] [−0.930, 0.367] [0.929, 1.287] [0.136, 0.331] [−0.522, 0.163] [0.664, 1.448] [−1.528, −0.180] [1.405, 2.215] 1616 0.9538 0.6527 0.3461
[−11.345, −10.566] [10.665, 11.260] [−0.387, −0.119]
[−16.781, −15.357] [0.839, 1.132] [0.953, 1.159] [0.823, 1.068]
[−0.791, 0.024] [−1.236, 0.069] [−1.396, −0.117] [−1.712, 0.072] [−17.786, 1.045] [−0.802, 0.169] [−14.284, −10.479] [−5.517, −2.095] [1.946, 2.435]
[0.316, 0.724] [−2.036, −1.395]
[−0.039, 1.406]
[0.400, 1.433] [2.078, 2.450] [1.875, 2.243] [0.015, 0.696] [0.668, 1.097]
[3.224, 6.717]
[0.824, 1.052] [0.151, 0.316] [−0.827, −0.523] [0.279, 0.747] [−0.233, 0.454] [1.401, 1.659] 1575 0.9223 0.3473 0.3375
[1.416, 2.307] [1.435, 2.092] [−12.404, −10.116]
[2.346, 3.339]
1679 0.9606 0.0728 0.3011
We compare these results to those presented in the fourth column of Table III. To facilitate the comparison, it is worth mentioning that in Table III the competitive effect of one firm (for example, American) on the others is captured by a constant term, for example, δAA. In Table IV, the same competitive effect is captured by a linear function of American's airport presence, δAA + φAA ZAAm. Our findings suggest that both the fixed and variable effects are negative. For example, we find δAA equal to [−9.433, −8.485] and φAA equal to [−5.792, −4.545]. Thus, a firm's entry lowers the probability of observing other firms in the market. Moreover, the larger is the airport presence of the firm, the smaller is the probability of a competitor's entry. This is consistent with the idea that entry is less likely when the market is being served by another firm that is particularly attractive, because of the positive effect of airport presence on demand.

Variable Competitive Effects With Correlated Unobservables

In the third column, we relax the i.i.d. assumption on the unobservables and estimate the variance–covariance matrix.42 Notice that the results are quite similar in the second and third columns. For this reason, here we provide a discussion of the economic magnitude (that is, the marginal effects) of the parameters estimated in the third column.

Table V presents the marginal effects of the variables. The results are organized in three panels. The top and middle panels show the marginal effects associated with a unit discrete change.43 The bottom panel shows the effect that the entry of a carrier, for example, American, has on the probability that we observe one of its competitors in the market.

Before presenting our results, we clarify up front an important point. Normally, the marginal effects are a measure of how changes in the variables of the model affect the probability of observing the discrete event that is being studied. Here, there are six discrete events that our model must predict, as many as the carriers that can enter into a market, and there are eight market structures in which we can observe any given carrier. For example, we can observe American as a monopoly, as a duopoly with Delta or United, and so on. If there were no multiple equilibria, this would not create any difficulty: We could simply sum over the probabilities of all the market structures where American is in the market and that would give us the total probability of observing American in the market. However, we do have multiple equilibria, and we only observe

42 This correlation structure of the unobservable errors allows the unobservable profits of the firms to be correlated. For example, in markets where large firms face high fuel costs, small firms also face high fuel costs. Another possibility is that there are characteristics of a market that we are unable to observe, and that affect large firms and Southwest differently, so that when American enters, Southwest does not, and vice versa.
43 Recall that we have discretized our data.
TABLE V
MARGINAL EFFECTSa

                          AA        DL        UA        MA        LCC       WN        No Firms
Market size
  Positive                0.1188    0.1136    0.0571    0.1188    0.0849    0.1118    −0.0033
  Negative               −0.0494   −0.0720   −0.0001   −0.0442   −0.1483   −0.0300    −0.0033
Market distance
  Positive                0.0177    0.0165    0.0106    0.0177    0.0099    0.0000     0.0006
  Negative               −0.0354   −0.0377   −0.0110   −0.0360   −0.0128   −0.0377     0.0006
Close airport
  Positive                0.1178    0.1122    0.0312    0.1048    0.0662    0.1178    −0.0033
  Negative               −0.0375   −0.0518   −0.0004   −0.0318   −0.0911   −0.0175    −0.0033
Change income
  Positive                0.0283    0.0265    0.0149    0.0283    0.0171    0.0277    −0.0007
  Negative               −0.0140   −0.0193   −0.0001   −0.0120   −0.0339   −0.0086    −0.0007
Per capita income
  Positive                0.0576    0.0546    0.0291    0.0576    0.0364    0.0573    −0.0015
  Negative               −0.0270   −0.0377   −0.0002   −0.0237   −0.0699   −0.0160    −0.0015
U.S. center distance
  Positive                0.0177    0.0181    0.0052    0.0171    0.0038    0.0181    −0.0004
  Negative               −0.0044   −0.0055   −0.0001   −0.0033   −0.0076   −0.0011    −0.0004

Airport presence          0.0673    0.0498    0.1888    0.0734    0.0599    0.1040
Cost                     −0.0102   −0.0068   −0.0117   −0.0120   −0.0054   −0.0125

Entry of:
  AA                      ···      −0.3606   −0.2556   −0.4108   −0.0704   −0.2143
  DL                     −0.3336    ···      −0.2658   −0.3908   −0.0335   −0.2126
  UA                     −0.2486   −0.2630    ···      −0.2696   −0.0675   −0.2015
  MA                     −0.3877   −0.3941   −0.2717    ···      −0.0989   −0.2766
  LCC                    −0.0998   −0.1579   −0.0721   −0.1415    ···      −0.0411
  WN                     −0.2256   −0.2356   −0.2030   −0.2868   −0.0242    ···

a The numbers that we report are marginal effects. They are appropriately selected percentage changes in the original probability of a particular outcome. In the top and middle panels we report the largest change in the average upper bounds of the probabilities of observing a given carrier in any possible market structure.
lower and upper bounds on the probabilities of each market structure. Summing over the upper bounds of the probabilities of the market structures where American is in the market is not the appropriate solution, because the maximum probability of observing one market structure, for example, an American monopoly, necessarily excludes seeing another market structure, for example, a duopoly with American and Delta, with its maximum probability. There is one important exception to the point just made. The probability of observing the market structure with no firms is uniquely identified because the competitive effects are negative. Thus, in our discussion we will pay particular attention to this outcome, where no firm enters into a market.
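As a purely illustrative example of this point (the numbers below are hypothetical, not estimates): suppose that for some market the model delivers bounds P(AA monopoly) ∈ [0.20, 0.50] and P(AA–DL duopoly) ∈ [0.10, 0.40]. Because the two outcomes are mutually exclusive, the sum of the lower bounds, 0.30, is a valid lower bound on the probability of observing American. The sum of the upper bounds, 0.90, is not the corresponding sharp upper bound, however, because the equilibrium selection that pushes the monopoly outcome to 0.50 cannot simultaneously push the duopoly outcome to 0.40. By contrast, since all the competitive effects are negative, no firm enters in any equilibrium exactly when every carrier's monopoly profit is negative, which is why the probability of the no-entry outcome is a single number rather than an interval.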
In the top and middle panels, we report the largest positive and negative changes in the average upper bounds of the probabilities of observing a given carrier in any possible market structure. We report both the positive and negative changes because an increase in market size, for example, increases the profits of all firms. Thus, all firms individually are more likely to enter. However, here we are looking at the simultaneous decision of all firms. Consequently, we may see that some market structures become more likely to be observed at the expense of other market structures. The dominant effect for one particular firm might end up being negative. We identify the dominant effects in italics.

In practice, we increase one variable at a time by an economically meaningful amount and compute the average upper bounds by taking the means of the upper bounds for one market structure across markets. We also compute the average upper bounds across markets at the observed values of the variables. Finally, we take the differences of all of the upper bounds for all 64 market structures, and report the largest positive and negative changes among them.

In the top panel, an increase in market size of 1 million people is associated with a maximum increase of 11.88% in the probability of observing American Airlines and a maximum decrease of 4.94%. This means that there is one market structure where American is present that is 11.88% more likely to be observed and there is another market structure where American is present that is 4.94% less likely to be observed. We interpret this combination of results as evidence that an increase in market size is associated with an overall increase in the likelihood of observing American in a market. If the nonstop distance increases by 250 miles, then the overall likelihood of observing American in a market decreases by 3.54%. If the distance to the closest alternative airport increases by more than 50 miles, then the probability of observing American increases by 11.78%. If income grows 1% faster, then the probability of observing American serving the market increases by 2.83%. If per capita income increases by 5,000 dollars, then the maximum probability of observing a market structure where American is serving the market increases by 5.76%. Finally, if the distance from the U.S. geographical center increases by 250 miles, then the maximum probability of observing American increases by 1.77%. The interpretation of the results for the other firms is analogous.

The middle panel reports the effect of an increase in the variables measuring heterogeneity on the probability of observing an airline, or no airlines ("No Firms"), in the market. Generally, the effects are much larger in this middle panel than in the top panel, suggesting that observable heterogeneity is a key determinant of entry. If American's airport presence increases by 15%, then the probability of observing American increases by 6.73%. Finally, an increase
TABLE VI
VARIANCE–COVARIANCE MATRIX

       AA    DL               UA                MA                LCC               WN
AA     1     [0.043, 0.761]   [−0.110, 0.442]   [0.103, 0.626]    [−0.217, 0.752]   [0.055, 0.355]
DL           [5.052, 6.895]   [−0.200, 0.190]   [0.629, 0.949]    [−0.128, 0.656]   [0.218, 0.834]
UA                            [2.048, 3.340]    [−0.173, 0.309]   [−0.213, 0.652]   [0.192, 0.797]
MA                                              [2.396, 5.558]    [−0.094, 0.313]   [0.093, 0.862]
LCC                                                               [2.026, 6.705]    [0.093, 0.764]
WN                                                                                  [2.063, 2.331]
of 50% in the cost associated with serving a market lowers the probability of observing American by approximately 1%.44

The numbers in the bottom panel of Table V are derived in a slightly different fashion from those in the top and middle panels, and they require additional discussion. Let us say that we want to quantify the effect of American's entry on the probability of observing one of its competitors. If the equilibrium were unique, the answer would be straightforward: we could set the parameters that measure the competitive effects of American equal to zero and then recompute the new equilibria. Then we would just have to compute the change in the probabilities of observing the other firms in each market and take the averages across all markets. With multiple equilibria, the analysis of the marginal effects has to take into account that we estimate lower and upper bounds for the probability of observing any market structure in each market. We find that Delta's entry can decrease the probability of observing American in the market by as much as 33.36%. The effect of Delta's entry varies a lot by the identity of the opponent, as we observe that it is just −3.35% for low cost carriers.

Next, we discuss the estimation results for the variance–covariance matrix. Recall that the unobservables are made up of four components, one of which is a firm-specific component. We estimate the covariances of these firm-specific components. In addition, we estimate the variance of the sum of the four components. We find that the variances of all the other firms are larger than one. For example, the variance for Delta is [5.052, 6.895]. This suggests that the unobservable heterogeneity for these firms is larger than for American.

The estimated correlations are quite interesting. In general, it is hard to identify correlation coefficients among unobservables in multivariate discrete choice models, and this is no exception since the confidence regions are wide. A few points are worth making. For example, the unobservables of Southwest

44 We do not report the negative effect, since for the variable airport presence, the increase in the maximum probability of observing a firm in one particular market structure is always larger in absolute value than the maximum decrease of observing that firm in another market structure.
and low cost carriers are positively correlated, suggesting that it is more likely that both of them would be in a market, ceteris paribus.

Only Costs

The fourth column of Table IV estimates the model without the variables that measure airport presence. In practice, we set βi = β = 0, in addition to δij = δj and φij = φj = 0 ∀i, j. We present this specification so as to address the concern that airport presence could be endogenous if airlines choose their network, instead of choosing only whether to enter into a particular market for a given exogenous network. This concern is particularly reasonable when we perform our policy simulation. The results should be compared with those in the third column of Table III. With the exception of Southwest, the fixed competitive effects are all similar. This is reassuring because it suggests that the variation in the costs is enough to identify the competitive effects. This specification fits the data less well than the specification in the third column of Table III.

6. POLICY EXPERIMENT: THE REPEAL OF THE WRIGHT AMENDMENT

We develop a policy experiment to examine how our model predicts that market structures in markets out of Dallas Love will change after the repeal of the Wright Amendment. To this end, it is crucial to study the individual firms' strategic responses to the repeal of the amendment. In practice, we first take all 93 markets out of Dallas Love and simulate the predicted outcomes with the Wright Amendment still in place. We then repeal the law (we set the variable Wright equal to zero) and recompute the new predicted outcomes. Following the same approach as when we computed the marginal effects, we report the maximum change in the average upper bounds of the probabilities of observing a given carrier in any possible market structure before and after the repeal of the Wright Amendment. Our estimates provide a within-model prediction of the effect of the repeal that should be interpreted as a short-run response.

In Table VII, we present the policy simulations for three different specifications. The second column of Table VII reports the policy results when we use the specification in the third column of Table IV. This is an interesting specification because it accounts for correlation in the unobservables. We report the results when we use the values of the parameters at which the objective function is minimized; this is the number in the middle. Then we report the lowest and largest numbers for the policy results that we derive when we use all the parameters in the estimated set.

The first result of interest is in the first row, which reports the probability of observing markets not served by any carrier. We find that the percentage of markets that would not be served would drop by 63.84% after the repeal of the Wright Amendment, suggesting that its repeal would increase the number of markets served out of Dallas Love. Of those new markets, as many as 47.44%
TABLE VII
PREDICTED PROBABILITIES FOR POLICY ANALYSIS: MARKETS OUT OF DALLAS LOVE

Airline     Variance–Covariance             Independent Unobs               Only Costs
No firms    [−0.6514, −0.6384, −0.6215]     [−0.7362, −0.6862, −0.6741]     [−0.6281, −0.6162, −0.5713]
AA          [0.4448, 0.4634, 0.4711]        [0.2067, 0.3013, 0.3280]        [0.3129, 0.3782, 0.4095]
DL          [0.4768, 0.4988, 0.5056]        [0.2733, 0.3774, 0.4033]        [0.3843, 0.4315, 0.4499]
UA          [0.1377, 0.1467, 0.1519]        [0.1061, 0.1218, 0.2095]        [0.2537, 0.3315, 0.3753]
MA          [0.4768, 0.4988, 0.5056]        [0.2733, 0.3774, 0.4033]        [0.3656, 0.4143, 0.4342]
LCC         [0.3590, 0.3848, 0.4156]        [0.8369, 0.8453, 0.8700]        [0.2839, 0.3771, 0.3933]
WN          [0.4480, 0.4744, 0.4847]        [0.2482, 0.2697, 0.3367]        [0.3726, 0.4228, 0.4431]
could be served by Southwest. American and Delta, which have strong airport presences at Dallas/Fort Worth, would serve a percentage of these markets as well, that is, at most 46.34% and 49.88%, respectively. These marked changes in market structures suggest that one reason why the Wright Amendment was not repealed until 2006 was to protect American's monopolies in markets out of Dallas/Fort Worth. Repealing the Wright Amendment would lead to a remarkable increase in service in new markets out of Dallas Love and thus would reduce the incentive for American to prevent entry of new competitors in markets out of Dallas/Fort Worth.

As we said, these are dramatic increases and they do raise some concern that our methodology might overestimate the effects of the repeal of the Wright Amendment. First, we tried to get some anecdotal information on how Southwest plans to react to the repeal of the Wright Amendment. We checked Southwest's web page and found that since the partial repeal of the Wright Amendment in October 2006, Southwest has started offering one-stop, same-plane or connecting service itineraries to and from Dallas Love Field to 43 cities beyond the Wright Amendment area. This pattern of entry into new markets confirms that the repeal of the Wright Amendment is bound to have dramatic effects on airline service out of the Dallas Love airport.

As a second check, we compared our results with those that we would derive using the coefficient estimates in the second column of Table IV. The main result, concerning the change in the number of markets that are not served, is almost identical. The other results, with the exception of that for the low cost carriers, are very similar.

Finally, we checked our predictions using a specification where the airport presence variables are not included. The policy change might be so major that firms change their network structure when the Wright Amendment is repealed. The fourth column of Table VII reports the policy results when we use the specification presented in the fourth column of Table IV. This last specification shows results that are almost identical to those in the third column of Table VII.
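Schematically, the policy experiment amounts to recomputing the simulated bounds on each outcome probability for the Dallas Love markets with the Wright dummy switched off, and comparing average upper bounds before and after, in the same way as for the marginal effects. The sketch below assumes a generic simulate_bounds routine that returns, for each market, lower and upper probability bounds for each of the 64 market structures; that routine and the data layout are hypothetical stand-ins, not the authors' simulation code.

```python
def policy_effect(simulate_bounds, love_markets, params):
    """Change in the average upper bound of each outcome probability when the
    Wright dummy is set to zero in all Dallas Love markets (illustrative sketch)."""
    baseline = simulate_bounds(love_markets, params)              # Wright = 1 (status quo)
    repealed_markets = [dict(m, wright=0) for m in love_markets]  # flip the policy dummy
    repealed = simulate_bounds(repealed_markets, params)          # Wright = 0 (repeal)

    structures = baseline[0].keys()                               # the 64 market structures
    n = len(love_markets)

    def avg_upper(bounds, s):
        return sum(bounds[m][s][1] for m in range(n)) / n

    return {s: avg_upper(repealed, s) - avg_upper(baseline, s) for s in structures}
```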
7. CONCLUSIONS This paper is a first step in the development of methods that study inference in entry models without making equilibrium selection assumptions. To that extent, these methods are used to study the effect of multiple equilibria on learning about parameters of interest. However, the methodology used in this paper has important limitations. The model imposes distributional assumptions on the joint distribution of the unobservables and on the shape of the variable profit function. Though it is conceptually possible to study the identification problem in our model without making strong parametric assumptions, it is not clear at this point that those results are practically attractive since they will involve a semiparametric model with possibly partially identified parameters. Moreover, the empirical analysis of the paper looks at the airline industry and network from a long-run perspective where its network is taken as exogenous in the short run. To relax this assumption, one needs to use a more complicated model that accounts for the dynamics of entry and of adjustment to one airline’s network in response to entry by a competitor. This is something we do not pursue here. Hence, the results, especially the policy experiments, need to be interpreted as the response in the short run and within our model. On the other hand, the econometric model allows for flexible correlation among firm unobservables and for spatial correlation among market unobservables. In addition, it is possible to test whether a certain selection mechanism is consistent with the data and the model by verifying whether estimates obtained under a given mechanism lie in our sets. To do that, one needs to deal with model misspecification, a topic that we leave also for future research. REFERENCES ANDREWS, D. W. K., AND M. SCHAFGANS (1998): “Semiparametric Estimation of the Intercept of a Sample Selection Model,” Review of Economic Studies, 65, 487–517. [1803] ANDREWS, D. W. K., AND G. SOARES (2009): “Inference for Parameters Defined by Moment Inequalities Using Generalize Moment Selection,” Econometrica (forthcoming). [1792] ANDREWS, D., S. BERRY, AND P. JIA (2003): “On Placing Bounds on Parameters of Entry Games in the Presence of Multiple Equilibria,” Working Paper, Yale University. [1794,1803] ARADILLAS-LOPEZ, A. (2005): “Semiparametric Estimation of a Simultaneous Game With Incomplete Information,” Working Paper, Princeton University. [1793] BAJARI, P., H. HONG, AND S. RYAN (2005): “Identification and Estimation of Discrete Games of Complete Information,” Working Paper, Stanford University. [1793,1800] BERESTEANU, A., AND F. MOLINARI (2008): “Asymptotic Properties for a Class of Partially Identified Models,” Econometrica, 76, 763–814. [1792,1803] BERESTEANU, A., F. MOLINARI, AND I. MOLCHANOV (2008): “Sharp Identification Regions in Games,” Working Paper, Cornell University. [1801] BERRY, S. (1990): “Airport Presence as Product Differentiation,” American Economic Review, Papers and Proceedings, 80, 394–399. [1812] (1992): “Estimation of a Model of Entry in the Airline Industry,” Econometrica, 60, 889–917. [1791,1792,1794,1796,1806-1808,1812,1814-1816] BERRY, S., AND E. TAMER (2006): “Identification in Models of Oligopoly Entry,” in Advances in Economics and Econometrics. Ninth World Congress of the Econometric Society, Vol. 2, ed.
by R. Blundell, W. Newey, and T. Persson. Cambridge: Cambridge University Press, 46–85. [1794,1796,1800] BJORN, P., AND Q. VUONG (1985): “Simultaneous Equations Models for Dummy Endogenous Variables: A Game Theoretic Formulation With an Application to Labor Force Participation,” Working Paper 537, Caltech. [1793,1800] BLUNDELL, R., M. BROWNING, AND I. CRAWFORD (2003): “Nonparametric Engel Curves and Revealed Preference,” Econometrica, 71, 205–240. [1794] BLUNDELL, R., A. GOSLING, H. ICHIMURA, AND C. MEGHIR (2007): “Changes in the Distribution of Male and Female Wages Accounting for Employment Composition,” Econometrica, 75, 323–363. [1794] BORENSTEIN, S. (1989): “Hubs and High Fares: Dominance and Market Power in the US Airline Industry,” RAND Journal of Economics, 20, 344–365. [1806,1808] BORZEKOWSKI, R., AND A. COHEN (2004): “Estimating Strategic Complementarities in Credit Unions’ Outsourcing Decisions,” Working Paper, Fed. [1795] BRESNAHAN, T. (1989): “Empirical Studies of Industries With Market Power,” in Handbook of Industrial Organization, Vol. 2, ed. by R. Schmalensee and R. Willig. North-Holland. [1795] BRESNAHAN, T., AND P. REISS (1990): “Entry in Monopoly Markets,” Review of Economic Studies, 57, 531–553. [1791,1796,1811,1814] (1991a): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57–81. [1796] (1991b): “Entry and Competition in Concentrated Markets,” Journal of Political Economy, 99, 977–1009. [1806] BUGNI, F. (2007): “Bootstrap Inference in Partially Identified Models,” Working Paper, Northwestern University. [1792,1803] CANAY, I. (2007): “EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity,” Working Paper, Northwestern. [1792,1803] CHERNOZHUKOV, V., H. HONG, AND E. TAMER (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models,” Econometrica, 75, 1243–1284. [1792,1803,1816] CILIBERTO, F., AND E. TAMER (2009): “Supplements to ‘Market Structure and Multiple Equilibria in Airline Markets’,” Econometrica Supplemental Material, 77, http://www. econometricsociety.org/ecta/Supmat/5368_simulations.pdf; http://www.econometricsociety. org/ecta/Supmat/5368_data and programs. zip. [1805] EVANS, W., AND I. KESSIDES (1993): “Localized Market Power in the U.S. Airline Industry,” Review of Economics and Statistics, 75, 66–75. [1806,1808] FRISCH, R. (1934): Statistical Confluence Analysis. Oslo: University Institute of Economics. [1794] HAILE, P., AND E. TAMER (2003): “Inference With an Incomplete Model of English Auctions,” Journal of Political Economy, 111, 1–51. [1794] HECKMAN, J. (1978): “Dummy Endogenous Variables in a Simultaneous Equation System,” Econometrica, 46, 931–960. [1796] (1990): “Varieties of Selection Bias,” The American Economic Review, 80, 313–318. [1803] HECKMAN, J. J., AND E. J. VYTLACIL (1999): “Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatement Effects,” Proceedings of the National Academy of Science: Economic Sciences, 96, 4730–4734. [1794] HECKMAN, J., R. LALONDE, AND J. SMITH (1999): “The Economics and Econometrics of Active Market Programs,” in Handbook of Labor Economics, ed. by O. Ashenfelter and D. Card. Amsterdam: North Holland, Chap. 31. [1794] HENDRICKS, K., M. PICCIONE, AND G. TAN (1997): “Entry and Exit in Hub-Spoke Networks,” RAND Journal of Economics, 28, 291–303. [1793,1812] HONORÉ, B., AND A. LLERAS-MUNEY (2006): “Bounds in Competing Risks Models and the War on Cancer,” Econometrica, 74, 1675–1698. 
[1794] IMBENS, G., AND C. F. MANSKI (2004): “Confidence Intervals for Partially Identified Parameters,” Econometrica, 72, 1845–1857. [1794,1803]
MANSKI, C. F. (1990): “Nonparametric Bounds on Treatment Effects,” The American Economic Review, 80, 319–323. [1794] (1994): “The Selection Problem,” in Advances in Econometrics: Sixth World Congress. Econometric Society Monographs, Vol. 2, ed. by C. Sims. Cambridge: Cambridge University Press, 143–170. [1794] (2007): Identification for Prediction and Decision. Cambridge, MA: Harvard University Press. [1794] MANSKI, C. F., AND E. TAMER (2002): “Inference on Regressions With Interval Data on a Regressor or Outcome,” Econometrica, 70, 519–547. [1794,1803] MARSCHAK, J., AND W. H. ANDREWS (1944): “Random Simultaneous Equations and the Theory of Production,” Econometrica, 12, 143–203. [1794] MAZZEO, M. (2002): “Product Choice and Oligopoly Market Structure,” RAND Journal of Economics, 33, 1–22. [1795,1796,1807] MCFADDEN, D. (1989): “A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration,” Econometrica, 57, 995–1026. [1806] PAKES, A., AND D. POLLARD (1989): “Simulation and the Asymptotics of Optimization Estimators,” Econometrica, 57, 1027–1057. [1806] PAKES, A., J. PORTER, K. HO, AND J. ISHII (2005): “Moment Inequalities and Their Applications,” Working Paper, Harvard University. [1794,1803] PETERSON, A. V. (1976): “Bounds for a Joint Distribution Function With Fixed Subdistribution Functions: Applications to Competing Risks,” Proceedings of the National Academy of Sciences of the United States of America, 73, 11–13. [1794] REISS, P., AND P. SPILLER (1989): “Competition and Entry in Small Airline Markets,” Journal of Law and Economics, 32, S179–S202. [1806] ROMANO, J., AND A. SHAIKH (2008): “Inference for the Identified Set in Partially Identified Econometric Models,” Journal of Statistical Planning and Inference, 138, 2786–2807. [1792,1803] SEIM, K. (2002): “An Empirical Model of Firm Entry With Endogenous Product-Type Choices,” Working Paper, Stanford Business School. [1793] SUTTON, J. (1991): Sunk Costs and Market Structure: Price Competition, Advertising, and the Evolution of Concentration. Cambridge, MA: MIT Press. [1792] (2000): Marshall’s Tendencies: What Can Economists Know? Cambridge, MA: MIT Press. [1792] SWEETING, A. (2004): “Coordination Games, Multiple Equilibria, and the Timing of Radio Commercials,” Working Paper, Duke University. [1793] TAMER, E. (2003): “Incomplete Bivariate Discrete Response Model With Multiple Equilibria,” Review of Economic Studies, 70, 147–167. [1792,1796,1798,1801]
Dept. of Economics, University of Virginia, Monroe Hall, Charlottesville, VA 22903, U.S.A.;
[email protected] and Dept. of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208, U.S.A.;
[email protected]. Manuscript received August, 2004; final revision received February, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1829–1863
LONGEVITY AND LIFETIME LABOR SUPPLY: EVIDENCE AND IMPLICATIONS BY MOSHE HAZAN1 Conventional wisdom suggests that increased life expectancy had a key role in causing a rise in investment in human capital. I incorporate the retirement decision into a version of Ben-Porath’s (1967) model and find that a necessary condition for this causal relationship to hold is that increased life expectancy will also increase lifetime labor supply. I then show that this condition does not hold for American men born between 1840 and 1970 and for the American population born between 1890 and 1970. The data suggest similar patterns in Western Europe. I end by discussing the implications of my findings for the debate on the fundamental causes of long-run growth. KEYWORDS: Longevity, hours worked, human capital, economic growth.
1. INTRODUCTION THE LIFE EXPECTANCY at age 5 of American men born in the mid-19th century was 52.5 years and their average years of schooling were less than 9. Their peers, born 100 years later, gained more than 16 years of life and invested 6 more years in schooling (see Figure 1). Conventional wisdom suggests that these gains in life expectancy positively affected schooling by increasing the horizon over which investments in schooling have been paid off. Hereafter, I refer to this mechanism as the Ben-Porath mechanism, following the seminal work of Ben-Porath (1967).2 1 I thank a co-editor and four referees for detailed and valuable comments. I also thank Daron Acemoglu, Stefania Albanesi, Mark Bils, Hoyt Bleakley, Matthias Doepke, Reto Foellmi, Oded Galor, Eric Gould, Peter Howitt, Charles Jones, Sebnem Kalemli-Ozcan, Todd Kaplan, Peter Klenow, Kevin Lang, Doron Kliger, Robert Margo, Joram Mayshar, Omer Moav, Joel Mokyr, Dilip Mookherjee, Andy Newman, Claudia Olivetti, Daniele Paserman, Valerie Ramey, Yona Rubinstein, Eytan Sheshinski, Avi Simhon, Guillaume Vandenbroucke, David Weil, Joseph Zeira, Stephen Zeldes, Fabrizio Zilibotti, Hosny Zoabi, conference participants in the Minerva DEGIT, Jerusalem 2006; Society for Economic Dynamics, Vancouver 2006; NBER Summer Institute, Growth Meeting 2006; European Growth and Integration since the Mid-Nineteenth Century (Marie Curie Research Training Network), Lund 2006; The European Meeting of the Econometric Society, Milan 2008; Rags to Riches, Barcelona 2008, and seminar participants at Bar Ilan University, Boston University, Brown University, Columbia University, the University of Cyprus, Haifa University, Hebrew University, Universitat Pompeu Fabra, and the University of Zurich. Amnon Schreiber and Shalva Zonenashvili provided excellent research assistance. Financial support from the Hebrew University is greatly acknowledged. 2 The essence of the Ben-Porath model is that individuals choose their human capital according to the future rewards that this human capital will receive. The mechanism described above and labeled “the Ben-Porath mechanism” is one of but several predictions of the Ben-Porath model, and the aim of the current paper is the empirical evaluation of only this prediction. Another prominent prediction of the Ben-Porath model suggests that an increase in the rental rate on human capital will increase future rewards to human capital and hence increase investment in schooling. This prediction is discussed in the concluding remarks.
© 2009 The Econometric Society
DOI: 10.3982/ECTA8107
FIGURE 1.—Life expectancy at age 5 and average years of schooling by year of birth. Average years of schooling are our own estimates using IPUMS data and Margo’s (1986) methodology. For data sources on life expectancy, see Section 4.1.
Prominent scholars have emphasized that in the context of economic growth, exogenous reductions in mortality rates were crucial in initiating the process of human capital accumulation, which itself was instrumental in the transition from “stagnation” to “growth.” For example, Galor and Weil (1999, p. 153) wrote: [. . . ] A second effect of falling mortality is that it raises the rate of return on investments in a child’s human capital and thus can induce households to make quality–quantity tradeoffs. This inducement to increased investment in child quality would be complementary to the increase in the rate of return to human capital discussed in Section 1. [. . . ] The effect of lower mortality in raising the expected rate of return to human capital investments will nonetheless be present, leading to more schooling and eventually to a higher rate of technological progress. This will in turn raise income and further lower mortality.
This mechanism has been explored theoretically by others as well (see Meltzer (1992), de la Croix and Licandro (1999), Kalemli-Ozcan, Ryder, and Weil (2000), Boucekkine, de la Croix, and Licandro (2002, 2003), Soares (2005), Cervellati and Sunde (2005), and Boldrin, Jones, and Khan (2005), among oth-
ers).3 But while each work emphasizes different aspects, all of these works have two things in common: a common objective, namely, to explain the transition from stagnation to growth and a shared crucial reliance on the Ben-Porath mechanism. Jena, Mulligan, Philipson, and Sun (2008) took the Ben-Porath mechanism one step further and quantified the monetary gains accrued over a lifetime due to this mechanism. Focusing on the gains in life expectancy in the United States from 1900 to 2000, their estimates range between $3,711 and $26,505 per capita, in 1996 dollars. The Ben-Porath mechanism, however, is not just a legacy of the past. In the context of comparative development, several scholars, using different tools, have tried to evaluate the causal effect of life expectancy on investment in human capital. Acemoglu and Johnson (2006) and Lorentzen, McMillan, and Wacziarg (2008) found no effect of life expectancy on school enrollment using cross-country regressions, whereas Bils and Klenow (2000) and Manuelli and Seshadri (2005) found positive effects, although of a different order of magnitude, using calibrated general equilibrium models.4 The causal effect of life expectancy on investment in human capital is studied in the development literature as well. Jayachandran and Lleras-Muney (2009) found a significant effect of a decline in maternal mortality risk on female literacy rates and school enrollment in Sri Lanka, and Fortson (2007) found a large negative effect of regional HIV prevalence on individual human capital investment in sub-Saharan Africa. Finally, the Ben-Porath mechanism is also mentioned outside the academic realm. In the public debate on the benefits of improving health in developing countries, a popular view suggests that while improving the health and longevity of the poor is an end in itself, it is also a means to achieving economic development. This view is best reflected in the report of the World Health Organization’s Commission on Macroeconomics and Health (2001, p. 25): The gains in growth of per capita income as a result of improved health are impressive, but tell only a part of the story. Even if per capita economic growth were unaffected by health, there would still be important gains in economic well-being from increased longevity. [. . . ] Longer-lived households will tend to invest a higher fraction of their incomes in education and financial saving, because their longer time horizon allows them more years to reap the benefits of such investments.
Despite its popularity, the evidence on the Ben-Porath mechanism is brief and mixed, and encompasses the experience of only recent decades. My purpose 3 Hazan and Zoabi (2006) criticized this literature, arguing that in a setting where parents choose fertility and the education of their children, a rise in the life expectancy of the children increases not only the returns to quality, but also the returns to quantity, mitigating the incentive to invest more in the children’s education. 4 See also Caselli (2005) and Ashraf, Lester, and Weil (2008). Both works present calibrated values for the elasticity of human capital with respect to the adult mortality rate. The former uses cross-country data and the latter use microestimates.
is to investigate empirically the relevance of this mechanism to the transition from stagnation to growth of today’s developed countries. I do so by noting that there is a fundamental asymmetry between providing support for a hypothesis and refuting it. While meeting a necessary condition is only a prerequisite for providing supportive evidence for a hypothesis, failure to meet a necessary condition is sufficient to refute one. I examine, therefore, a crucial implication of the Ben-Porath mechanism. Specifically, I argue that although the Ben-Porath mechanism is phrased as the effect of the prolongation of (working) life, it in fact suggests that as individuals live longer, they invest more in human capital if and only if their lifetime labor supply increases. Importantly, incorporation of the retirement choice into a version of Ben-Porath’s (1967) model does not change the above statement.5 Section 2 of the paper formulates this argument. Clearly, this statement is true as long as schooling is desired only to increase labor market productivity. In Section 9, I discuss several other motives that may positively affect the investment in human capital in response to an increase in longevity. The discussion above suggests that the necessary condition for the BenPorath mechanism can be tested directly by looking at the correlation between longevity and lifetime labor supply. I therefore suggest estimating the empirical counterpart of the lifetime labor supply, that is, the expected total working hours over a lifetime (henceforth ETWH) of consecutive cohorts of American men born between 1840 and 1970, and of all American individuals born between 1890 and 1970. A positive correlation between ETWH and longevity should serve as supportive evidence for the Ben-Porath mechanism. Conversely, a negative correlation between these two variables would suggest that the Ben-Porath mechanism cannot account for any of the immense increase in education that has accompanied the growth process over the last 150 years. The ETWH is determined by three factors: the age specific mortality rates, which determine the probability of being alive at each age, and the labor supply decisions along both the extensive and intensive margins at each age. Clearly, holding labor supply decisions constant, the Ben-Porath mechanism suggests a positive effect of longevity on lifetime labor supply and thereby on investment in education.6 However, the data suggest that the reduction in labor supply along both the extensive and intensive margins outweighs the gains in longevity, leading to a decline in lifetime labor supply. Thus, if one attempts to decompose the observed change in schooling over the relevant period to its different sources, the total effect of the Ben-Porath mechanism enters with a 5
An earlier version of this paper (Hazan (2006)) showed that incorporation of a leisure choice does not change the statement made above. 6 This is the partial, causal effect of life expectancy on education which the literature aims to estimate. See, for example, Acemoglu and Johnson (2006), Lorentzen, McMillan, and Wacziarg (2008), and Jayachandran and Lleras-Muney (2009).
nonpositive sign, and it therefore cannot provide an explanation for the observed rise in education. My approach has two major advantages. First, it relies on sound theoretical prediction and therefore the empirical test is not specific to econometric specifications or structural assumptions. Second, it uses the experience of today’s developed countries over more than 150 years and can therefore shed light on the long-run economic consequences of the prolongation of life in the developing world. The rest of the paper is organized as follows. In Section 2, I present a simplified version of the Ben-Porath model to explicitly derive the effect of an increase in life expectancy on education and lifetime labor supply. In Section 3, I present my methodology for the estimation of the ETWH and in Section 4, I describe the data. In Section 5, I present my results for men and in Section 6, I present results for all individuals by combining the labor supply of both men and women. In Section 7, I explore the robustness of the results and in Section 8, I provide suggestive evidence that my results are not confined to the United States, but are a robust feature of the growth process in the 19th and 20th centuries. In Section 9, I discuss the broader implications of my findings and present some concluding remarks. 2. A PROTOTYPE OF THE BEN-PORATH MODEL In this section, I present a simplistic version of the Ben-Porath model. The purpose of this section is to explicitly emphasize the implications of this type of model for the effect of an increase in longevity on lifetime labor supply and thereby on investment in schooling.7 Denote consumption at age t by c(t) and let the utility from consumption, u(c), be twice continuously differentiable, strictly increasing, and strictly concave. Assume that labor supply is indivisible so the individual may be either fully employed or retired. Disutility of work, f (t), is independent of consumption and increases with age, f (t) > 0.8 The individual works until retirement, R, and lives until T , R ≤ T . This structure implies that the individual’s lifetime utility, V , is given by
V = \int_0^T e^{-\rho t}\,u(c(t))\,dt - \int_0^R e^{-\rho t}\,f(t)\,dt

7 The Ben-Porath model allows for continuing investment in human capital during the phase of working life. My simplistic variant of the model does not allow for that. Modeling the schooling decision as in Ben-Porath (1967) will complicate the model, making the derivation of lifetime labor supply analytically intractable. I conjecture, nevertheless, that the results derived in this section would hold under the more realistic structure of the original Ben-Porath model.
8 This is a conventional way to model the retirement motive. See, for example, Sheshinski (2008) and Bloom, Canning, and Moore (2007).
where ρ is the subjective discount rate. The individual's productivity during the working period is assumed to be equal to his human capital h. The latter is determined by the individual's choice of the length of the schooling period, s, and the human capital production function, h(s) = e^{\theta(s)}. Finally, schooling occurs prior to entering the labor market and the sole cost of schooling is foregone earnings. The budget constraint of the individual is then given by

(1)    \int_s^R e^{-rt}\,e^{\theta(s)}\,dt = \int_0^T e^{-rt}\,c(t)\,dt
where r is the interest rate. Define the Lagrangian associated with maximizing lifetime utility V, subject to the budget constraint, (1), and let λ be the Lagrange multiplier associated with this problem. The first order conditions with respect to c(t), s, and R are, respectively,

(2)    e^{-\rho t}\,u'(c(t)) = \lambda e^{-rt}, \qquad e^{-rs+\theta(s)} \ge \int_s^R e^{-rt+\theta(s)}\,\theta'(s)\,dt,

and

(3)    e^{-\rho R}\,f(R) \le \lambda e^{-rR}\,e^{\theta(s)}.
To illustrate the relationship between lifetime labor supply and investment in human capital in the most transparent way, I make two simplifying assumptions. First, I assume that r = ρ. This assumption ensures that consumption is constant throughout the individual’s life. This property is warranted because the Ben-Porath mechanism is silent with respect to life-cycle considerations of consumption. Second, I concentrate on an interior solution for the schooling and retirement choices. This implies that both (2) and (3) hold with strict equality.9 Using the budget constraint, the optimal consumption becomes (4)
c = c(s, R) = \frac{e^{\theta(s)}\,(e^{-rs} - e^{-rR})}{1 - e^{-rT}}
and (2) and (3) can be rewritten, respectively, as

(5)    \frac{1}{\theta'(s)} = \frac{1 - e^{-r(R-s)}}{r}

9 Sufficient conditions for an interior solution for R are f(0) = 0 and f(T) = ∞.
and

(6)    f(R) = u'(c(s, R))\,e^{\theta(s)}
Note that the left-hand side of (5) is the cost of increasing schooling to the point where productivity rises by one unit, while the right-hand side of (5) is the discounted value of an increase in income by one unit per period over the productive life, R − s. Similarly, the left-hand side of (6) is the disutility from work at age R, while the right-hand side of (6) is the marginal cost of retiring, measured in terms of the loss of utility from foregone consumption. Inspection of (4), (5), and (6) reveals the effect of longevity on the optimal level of schooling, s, and lifetime labor supply, R − s. This effect is summarized in the following two propositions. The proofs are relegated to the Appendix.

PROPOSITION 1: If θ(·) is twice continuously differentiable, strictly increasing, and strictly concave, an increase in longevity induces an increase in schooling and in lifetime labor supply.

PROPOSITION 2: If θ(·) is linear, an increase in longevity induces an increase in schooling and has no effect on lifetime labor supply.

It follows that if there are no diminishing returns to schooling, changes in longevity positively affect schooling but leave the optimal lifetime labor supply unaffected. Hall and Jones (1999) and Bils and Klenow (2000) argued that in a cross section of countries, there are sharp diminishing returns to human capital.10 In contrast, the typical finding in studies based on microdata within countries is that of linear returns to education. Some argue, however, that the latter studies are more prone to ability bias, which may drive the estimates toward linearity (e.g., Card (1995)). Assuming that the returns to education are (weakly) concave, an increase in longevity that raises the investment in human capital therefore has a nonnegative effect on lifetime labor supply. I conclude from Propositions 1 and 2 that for any reasonable human capital production function, a rise in longevity that induces an increase in the investment in human capital must also induce a rise in lifetime labor supply. It should be mentioned that in an earlier version of this paper (Hazan (2006)), the intensive margin of the labor supply decision was modeled and the results with respect to the effect of longevity on schooling and lifetime labor supply were similar to those summarized in Propositions 1 and 2. I now proceed with my
10 Most models which analyze long-run growth assume that human capital is strictly increasing and strictly concave with respect to time invested (see Galor and Weil (2000), Kalemli-Ozcan, Ryder, and Weil (2000), Hazan and Berdugo (2002), and Moav (2005), among others). The assumption that θ(·) is strictly increasing and strictly concave implies that the rate of return is diminishing, a less restrictive assumption.
empirical exercise of estimating the ETWH of consecutive cohorts of Americans to see whether their expected lifetime labor supply has indeed increased in parallel to the increase in their longevity and schooling, as the Ben-Porath mechanism predicts.

3. METHODOLOGY

In this section, I explain my methodology for estimating the ETWH of each cohort. Let TWHc denote the lifetime working hours of a representative member of cohort c. Then ETWHc is an average of working hours at each age t, lc(t), weighted by the probability of remaining in the labor market at each age, the survivor function, denoted by Sc(t). The ETWH depends, of course, on the age at which expectations are calculated. Formally, the ETWH of an individual aged t0 who belongs to cohort c is

E(TWH_c \mid t \ge t_0) = \sum_{t=t_0}^{\infty} l_c(t)\,S_c(t \mid t \ge t_0)
Below I explain how I estimate the survivor function, Sc(t|t ≥ t0), and then discuss how I deal with the manner in which individuals form their expectations with respect to the relevant variables that determine the ETWH.

3.1. The Survivor Function

To estimate the survivor function, Sc(t|t ≥ t0), I estimate the hazard function (i.e., the rate of leaving the labor market in the age interval [t, t + 1)) and then calculate the survivor function directly. Two factors affect this hazard function: (i) mortality rates—at each age individuals may die and leave the labor market—and (ii) retirement rates—conditional on being alive, at each age individuals choose whether to continue working or to permanently leave the labor market and retire. Specifically, an individual of cohort c who survives to age t0 and is still alive at age t, leaves the labor market if he dies in the age interval [t, t + 1), an event that occurs with probability qc(t). If he remains alive, an event that occurs with probability 1 − qc(t), he may choose to retire with probability Rc(t). Applying the law of large numbers, it follows that the hazard function for the representative member of cohort c is given by (7)
λc (t) = qc (t) + (1 − qc (t)) · Rc (t)
where qc (t) and Rc (t) are now interpreted as the mortality rate and the retirement rate of the representative member of cohort c at age t, respectively. Hence, to estimate the hazard function using (7), I need data on mortality and
retirement rates for each cohort c at each age t, t ≥ t0. Finally, the survivor function, Sc(t|t ≥ t0), is given by

(8)    S_c(t \mid t \ge t_0) = \prod_{i=t_0}^{t} \bigl(1 - \lambda_c(i)\bigr)
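To make the mechanics of (7), (8), and the ETWH formula concrete, the following minimal Python sketch computes the ETWH of a single hypothetical cohort. All inputs (the mortality rates, retirement rates, and annual hours, and the age range) are illustrative placeholders of my own choosing, not the estimates constructed in this paper.

    import numpy as np

    # Illustrative inputs for one cohort (not the paper's data): ages t0, ..., 95
    t0 = 20
    ages = np.arange(t0, 96)
    q = np.linspace(0.005, 0.25, ages.size)                              # hypothetical mortality rates q_c(t)
    ret = np.where(ages < 45, 0.0, np.linspace(0.0, 0.35, ages.size))    # hypothetical retirement rates R_c(t)
    hours = np.full(ages.size, 52 * 50.0)                                # hypothetical annual hours l_c(t)

    # Equation (7): hazard of leaving the labor market between ages t and t+1
    hazard = q + (1.0 - q) * ret

    # Equation (8): survivor function S_c(t | t >= t0), a cumulative product of (1 - hazard)
    survivor = np.cumprod(1.0 - hazard)

    # ETWH: hours at each age weighted by the probability of still being in the labor market
    etwh = np.sum(hours * survivor)
    print(f"Illustrative ETWH: {etwh:,.0f} hours")

Under the cohort estimates discussed in Section 3.2 below, the inputs would be each cohort's realized series; under the period estimates they would be taken from the cross section at the age at which expectations are formed.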
3.2. The Formation of Expectations

It is important to understand how individuals form expectations regarding mortality rates, retirement rates, and the hours they intend to work over the course of their lives, because most of the investment in human capital predates entry to the labor market. Specifically, I am interested in the way each cohort anticipates its mortality rate at each age, qc(t), its retirement rate at each age, Rc(t), and the hours it intends to work at each age, lc(t). At one extreme, one can assume that each cohort perfectly foresees its course of life and hence use the actual mortality rates, retirement rates, and hours worked of cohort members at each age. I refer to these estimates as cohort estimates. At the other extreme, one can assume that each cohort has static expectations and hence uses mortality rates, retirement rates, and hours worked by age from the cross section at the age at which expectations are formed. I refer to these estimates as period estimates. I estimate the ETWH under these two extreme assumptions, on the presumption that individuals' beliefs about the future are a weighted average of these two extremes.

4. DATA

As suggested in Section 3, to estimate the ETWH, I need data on three variables: the expected mortality rate, the expected retirement rate, and the expected working hours. As mentioned in Section 3.2, I need different data for the cohort estimates and the period estimates. In particular, since the cohort estimates require the utilization of actual cohort data, I can produce these estimates for cohorts born between 1840 and 1930. In contrast, the period estimates require cross-sectional data and hence I have these estimates for cohorts born between 1850 and 1970.11 In what follows, each subsection begins by discussing data sources and a general description of each variable. A description of the data for the cohort estimates then follows.12
11 More accurately, men born between 1836–1845 are referred to as "men born 1840," men born between 1846–1855 are referred to as "men born 1850," etc.
12 For brevity, I neither discuss nor present the data for the period estimates in detail and only present the period estimates of the ETWH. A discussion of these data can be found in Hazan (2006).
4.1. Mortality Rates Generally, there are two types of life tables: period life tables and cohort life tables. A period life table is generated from cross-sectional data. It reports, among other things, the probability of dying within an age interval in the concurrently living population. A cohort life table, on the other hand, follows a specific cohort and reports, among other things, the probability of dying within an age interval in that specific cohort. If mortality rates at each age were constant over time, the period life table and the cohort life table would coincide. However, if mortality rates were falling over time, the period life table would underestimate gains in life expectancy of each cohort. In my estimation, I employ data from cohort life tables for the cohort estimates and period life tables for the period estimates. My main source is Bell, Wade, and Goss (1992), who provided both period and cohort life tables from 1900 to 2080.13 For earlier periods, I use period life tables from Haines (1998). Note that one can construct cohort life tables from the period life table by culling mortality rates for different ages from different years. Looking across the cohorts, I observe that mortality rates have been declining at all ages for men born in 1840 onward. Since I aim to discover whether individuals were expected to increase or decrease their ETWH and decide on their education in relation to that, I am interested in mortality rates at the “relevant ages.” Since investment in formal education does not start prior to age 5 and entrance to the labor market starts, on average, at age 20, I focus on mortality rates, conditional on surviving to ages 5 and 20. The available data on mortality can be presented in several ways. One way is to use mortality rates at each age to construct survival curves. These curves show the percentage of individuals who are still alive at each age. A second way is to estimate the life expectancy. Graphically, this is the area under a survival curve. Below I present summary data of these two approaches. 4.1.1. Mortality Rates—Cohort Estimates for Men Born Between 1840 and 1930 Figure 2 plots the survival curves for men born in 1840, 1880, and 1930 who survived to age 20.14 As can be seen from the figure, survival to each age has been increasing, with the largest gains concentrated in the ages 50–75. These gains are translated into sizable gains in life expectancy at age 20. While a 20year-old man who belongs to the cohort born in 1840 was expected to live for another 43.2 years, his counterpart in the cohort born in 1880 was expected to live for another 45.65 years and their counterpart born in 1930 was expected 13
Data for the years 1990–2080 reflect projected mortality. Including all 10 cohorts on the same graph hides more than it reveals. I choose the cohort born in 1840 because it is the oldest, the cohort born in 1930 because it is the youngest, and the cohort born in 1880 because it is in the middle. 14
FIGURE 2.—The probability of remaining alive, conditional on reaching age 20, for men born in 1840, 1880, and 1930: Cohort estimates.
to live for another 53.01 years. Overall, conditional on surviving to age 20, individuals born in 1930 were expected to live almost 10 years more than their counterparts born in 1840. Finally, there were also reductions in mortality rates at younger ages. The probability of surviving to age 20, conditional on being alive at age 5 has increased from 0.92 for individuals born in 1840 to 0.98 for individuals born in 1930, with most of the increase concentrated in the younger cohorts.15 4.2. Labor Force Participation and Retirement Rates To estimate retirement rates, I first estimate labor force participation rates and then compute the retirement rates between age t and age t + 1, Rc (t), as the rate of change in labor force participation between age t and t + 1.16 To 15 Figures showing the life expectancy at age 20 and the probability of surviving from age 5 to 20 can be found in Hazan (2006). 16 Although nonparticipation at a given age does not necessarily imply permanent retirement, this is what I assume here. This is not a bad assumption since I assume that retirement does not start prior to age 45. For men age 45 and above, the rate of exit from and reentry to the labor force is supposedly rather low. Furthermore, if the decision to leave the labor force and then return is uncorrelated across individuals of the same age and cohort, things would average out because I estimate variables at the cohort level. For expositional purposes, in this section I present the data on labor force participation rates.
estimate labor force participation rates, I use the Integrated Public Use Microdata Series (IPUMS) which are available from 1850 to 2000 (except for 1890) (Ruggles, Sobek, Alexander, Fitch, Goeken, Kelly Hall, King, and Ronnander (2004)). Prior to 1940, an individual was considered to be part of the labor force if he or she reported having a gainful occupation. This is also known as the concept of “gainful employment.” From 1940 onward, however, the definition changed and an individual is considered to be part of the labor force if, within a specific reference week, he or she has a job from which he or she is temporarily absent, working, or seeking work. Some scholars have argued that the former definition is more comprehensive than the latter. Moen (1988) suggested a method of estimating a consistent time series of labor force participation rates across all available IPUMS samples, based on the concept of gainful employment. In my estimation I employ the method suggested by Moen.17 4.2.1. Labor Force Participation and Retirement Rates—Cohort Estimates for Men Born Between 1840 and 1930 For each cohort I estimate the labor force participation rate based on the concept of gainful employment at each age starting from age 45.18 Similar to Figure 2, Figure 3 presents labor force participation rates for men born in 1840, 1880, and 1930. As can be seen, from age 55 and over, the younger the cohort is, the faster is the decline in his participation. Notice that while participation at age 45 is about 96–97 percent for all three cohorts, by age 60 it declines to 89 percent for men born in 1840, 80 percent for men born in 1880, and 76 percent for the men born in 1930. By age 70, the estimates are 61 percent, 48 percent, and 29 percent, respectively.19 Thus, while the fraction of those who survive to each age has increased, the fraction of those who have already retired has increased as well. In Section 5.1, I combine the survival and retirement rates to obtain the fraction of those who remain in the labor market at each age, Sc (t|t ≥ t0 ). 17
See also Costa (1998a, Chap. 2). I assume that participation rates are constant for all cohorts between age 20 and 45. The data support this claim firmly. In addition, from age 75 and over, there are too few observations in each cell. Hence I estimate participation in 5-year intervals (75–79, 80–84, 85–89, and 90–94) and use a linear trend to predict participation at each age. Finally, members of the cohort born in 1920 were 84 years old in 2000 and members of the cohort born in 1930 were 74 years old in 2000. Hence for the cohort born in 1920, at ages 85–94, I use the participation rates of the cohort born in 1910 and for the cohort born in 1930, at age 75–84, I use the participation rates of the cohort born in 1920, and at ages 85–94, the participation rates of the cohort born in 1910. 19 The long-run decline in labor force participation at age 55 and above was discussed by Costa (1998a) and Moen (1988). Lee (2001) discussed the length of the retirement period of cohorts of American men born between 1850 and 1990. 18
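As a concrete illustration of the construction described in Section 4.2, the short Python sketch below converts a participation profile into retirement rates, reading "the rate of change in labor force participation between age t and t + 1" as the proportional decline in participation (the share of age-t participants who have left by age t + 1). The participation numbers are made up for illustration and are not the IPUMS estimates.

    import numpy as np

    # Hypothetical participation rates for one cohort at ages 60, 61, ..., 65 (not IPUMS estimates)
    participation = np.array([0.89, 0.87, 0.84, 0.80, 0.75, 0.70])

    # Retirement rate R_c(t): the rate at which age-t participants leave the labor force by age t+1
    retirement = (participation[:-1] - participation[1:]) / participation[:-1]
    print(np.round(retirement, 3))

These retirement rates, combined with the mortality rates of Section 4.1, feed the hazard function (7).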
FIGURE 3.—Labor force participation for men born in 1840, 1880, and 1930: Cohort estimates.
4.3. Hours Worked Questions about hours worked last week or usual hours worked per week were not asked by the U.S. Bureau of the Census prior to 1940. Hence, it is not possible to estimate a consistent time series of hours worked by age and sex from microdata over my period of interest, 1860–present. Whaples (1990), who did probably the most comprehensive study on the length of the American work week prior to 1900, put together the available aggregated time-series data from as early as 1830 to the present day. Clearly, such series suffer from biases due to the aggregation itself (e.g., changes over time in the workers’ age composition, the fraction of parttime workers, the fraction of women in the labor force, and so forth), due to sampling of different industries (e.g., manufacturing vs. all private sectors vs. all sectors of the economy) and a host of other reasons. Whaples (1990) reported two time series for the pre-1900 period: the Weeks and the Aldrich. The former suggests that the average work week was 62 hours in 1860, 61.1 hours in 1870, and 60.7 hours in 1880, while the latter suggests that the average work week was 66 hours in 1860, 63 hours in 1870, 61.8 hours in 1880, and 60 hours in 1890. During the last quarter of the 19th century, state Bureaus of Labor Statistics published several surveys of the economic circumstances of nonfarm wage earners. I rely on nine such surveys published between 1888 and 1899, all of which contain information on individuals’ daily hours of work, their wages, age
and sex, as well as other personal characteristics.20 Specifically, I combine the surveys from California in 1892, Kansas in 1895, 1896, 1897, and 1899, Maine in 1890, Michigan stone workers in 1888, Michigan railway workers in 1893, and Wisconsin in 1895. Altogether I have data on 13,515 male workers.21 I use this combined data set to generate an estimate of hours worked by males for 1890. Average hours worked by males yields an estimate of 10.2 hours per day, or 61.2 per week.22 The microdata set allows me to study the distribution of hours worked across the male population in more detail. The data suggest that average weekly hours did not vary much by age: although hours are somewhat higher at ages 20–29 and 30–39, 61.7 and 61.8, respectively, they were only reduced to 60.2, 60.5, 60.3, and 60.2 for the age groups 40–49, 50–59, 60–69, and 70–79, respectively. Across the wage distribution, however, there is more variation. The work week of individuals whose wages are in the 10th percentile consisted of 62.15 hours, while that of individuals whose wages are in the 90th percentile consisted of only 56.53 hours. Starting in 1900, in contrast, consistent time series on hours worked both by men, women, and all individuals are available from Ramey and Francis (2009), which are based on Kendrick (1961).23 For my main results of ETWH for men, I use the time series of hours for males age 14+. These data, however, present hours worked by person and not per worker. Hence, to transform these data into hours per worker, I estimate employment rates for males age 14+ from Census data and divide the hours per person by the fraction of men employed in each year. The resulting time series suggests that weekly hours per male worker fluctuated at around 50 hours between 1900 and 1925. It then sharply declined for about a decade during the Great Depression, rebounded to almost 57 hours a week during war time in the years 1943 and 1944, and then started its long-run decline from about 45 hours a week in 1946 to about 36 hours by 1970. Since then it has fluctuated at around this value.24 20 The data are available through the Historical Labor Statistics Project, Institute of Business and Economic Research, University of California, Berkeley, CA 94720. See Carter, Ransom, Sutch, and Zhao (2006). 21 Costa (1998b) argued that when these data sets are pulled together, they represent quite well the occupational distribution of the 1900 census and the 1910 industrial distribution. Hence I assume that they represent the U.S. population at that time. 22 Hours reported in these data sets are per day. As discussed in Costa (1998b), the 1897 Kansas data set included a question on whether hours worked were reduced or increased on Saturday. Nine percent reported that hours were reduced, 14 percent reported that hours were increased, and 76 percent reported that they remained the same. Sundstorm (2006) also argued that the typical number of working days per week in the late 19th century was 6. Hence, I assume a 6-day work week. 23 The data are available at http://econ.ucsd.edu/~vramey/research.html/Century_Public_ Data.xls. 24 See also Jones (1963), which documents average weekly hours in manufacturing for the years 1900–1957.
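The conversions just described are simple but easy to misapply, so a brief sketch may help. The daily-hours figure and the 6-day week follow the text above; the hours-per-person and employment-rate values are placeholders standing in for the Ramey and Francis (2009) and Census series, and the 52-week year is the assumption introduced in Section 4.3.1 below.

    # 1890 microdata report hours per day; a 6-day work week gives weekly hours
    hours_per_day_1890 = 10.2
    weekly_hours_1890 = hours_per_day_1890 * 6        # = 61.2 hours per week

    # Ramey and Francis (2009) report hours per person; dividing by the employment
    # rate converts them to hours per worker (placeholder values shown here)
    hours_per_person, employment_rate = 34.0, 0.78
    weekly_hours_per_worker = hours_per_person / employment_rate

    # Annual hours l(t), under the assumption of 52 working weeks per year
    annual_hours_1890 = weekly_hours_1890 * 52
    print(weekly_hours_1890, round(weekly_hours_per_worker, 1), annual_hours_1890)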
4.3.1. A Baseline Time Series for Hours The discussion above highlights several obstacles in generating a consistent time series of hours worked at each age t for each cohort c. First, in some series the sample consists of men and women, while in others it consists only of men. Second, some series consist of only part of the economy, while others report on all sectors of the economy. Third, over time, there is a change in the pattern of hours worked over a lifetime: in the 1890s and in 1940, hours by age did not vary much, but starting in 1950, hours by age varied substantially.25 These issues posit a problem in generating consistent time series of hours worked by age for each cohort. In an attempt to overcome these obstacles, I make the following assumptions. First, for the period 1860–1880, I take the Weeks estimates, which are lower than the Aldrich estimates for all years: 62 hours in 1860, 61.1 hours in 1870, and 60.7 hours in 1880. For 1890, I take my estimate from the microdata sets published by the state Bureaus of Labor Statistics: 61.2 hours a week.26 For the years 1900 to present I use the time series of Ramey and Francis (2009) for males age 14+, adjusted to account for the employment rate as discussed above in Section 4.3. Second, I have to overcome the changes in the pattern of hours worked over the life cycle of the different cohorts. Given the data limitations, in the baseline estimates I do not allow for any age variation in hours in a given year. Under this assumption, the only difference in annual hours worked across cohorts arises from the year of entry and year of retirement from the labor market.27 Figure 4 displays the time series for weekly hours worked by males that is used for the main estimates of ETWH presented in Section 5. For each cohort in my cohort estimates, I use a subset of this series. For example, men born in 1880 joined the labor market in 1900 (by assumption, all cohorts enter the labor market at age 20). Since I need data on hours worked until Sborn 1880 (t|t ≥ t0 ) = 0, and this is true for the cohort born in 1880, at age 94, lborn 1880 (t) is hours worked from 1900 to 1993.28 For my period estimates I only need the average hours worked at the age at which expectations are calculated, which, by assumption, is age 5. Hence for the cohort born in 1850, I use average hours in 1855–1864; for the cohort born in 1860, I use average hours in 1865–1874; and so forth. Finally, since this series is expressed in terms of 25 Hazan (2006) presented the cross-sectional relationship between age and hours for various years. 26 Since I have very few observations for the period 1860–1900, I use a quadratic fitting curve to assign values for years in which data are missing. 27 Figures 9, 11, 12, and 13 (Sections 5.2, 5.3, and 6) present estimates of ETWH both for men and all individuals using data taken solely from Ramey and Francis (2009). These data allow me to utilize age–year variation in hours worked. These estimates, however, are only available for a subset of cohorts. 28 Note that while for each cohort I need data on hours worked at all ages until Sc (t|t ≥ t0 ) = 0, in practice, for all cohorts, by the age of 80, Sc (t|t ≥ t0 ) is sufficiently close to 0 and, therefore, hours worked above this age have a negligible effect on the ETWH.
FIGURE 4.—Weekly hours worked by men, 1860–present.
weekly hours worked, and my mortality rates and retirement rates are annual, I convert the hours series to an annual series as well. Since most men in the labor market work most of the year, I avoid further complications and assume that all cohorts work 52 weeks a year.29 Hence my annual series, l(t), is the series presented in Figure 4 multiplied by 52.30 5. RESULTS FOR MEN In this section, I present my results for men. I begin by estimating the probability of remaining in the labor market, or the fraction of individuals who 29
This assumption is carefully examined in Section 7. The series presented in Figure 4 has many “jumps.” The first is in 1900, when I combine the earliest data with the Ramey and Francis data, and then during the Great Depression and World War II. To alleviate concerns that the main results of the paper are driven by these changes, I fit a quartic curve to this series and use the predicted values to generate estimates of ETWH. These estimates are very similar to those presented in Section 5. Alternatively, since I have values from two series for 1900, I calculate the ratio between these two values and adjust the pre-1900 data by this ratio. Although this reduces hours worked by about 15 percent for the pre-1900 period, the ETWH is still declining across cohorts. 30
remain in the labor market, conditional on being alive at age 5 and age 20.31 This also enables me to present estimates on the expected number of years each cohort was expected to work. I then combine the probability of remaining in the labor market with the series of hours worked per year to arrive at my main results—the ETWH. 5.1. The Probability of Remaining in the Labor Market—Cohort Estimates for Men Born 1840–1930 In this section, I present my cohort estimates of Sc (t|t ≥ t0 )—the fraction of individuals who remain in the labor market at age t, conditional on being alive at age t0 , for members of cohort c. Specifically, I let t0 = 20 and assume that individuals of each cohort enter the labor market at age 20. I then estimate the fraction of those who remain in the labor market at all ages over 20 by estimating the hazard function, (7), and computing Sc (t|t ≥ 20) using (8). Figure 5 shows the fraction of individuals who remain in the labor market conditional on being alive at age 20 and on entering the labor market at that age. Given my
FIGURE 5.—The probability of remaining in the labor market, conditional on entry into the labor force at age 20: Cohort estimates for men. 31 I use the terms “probability of surviving” and “the fraction of individuals who survived” interchangeably. Although from an individual point of view, the former is the appropriate term, for the representative member of each cohort, the latter is relevant.
FIGURE 6.—Expected number of years in the labor market at age 5 and age 20, conditional on entry into the labor force at age 20: Cohort estimates for men.
assumption that participation rates remain constant from age 20 to age 45, it is evident from (7) that the fraction of individuals who participate in the labor market over the age interval 20–45 is affected solely by death rates. Since it was shown in Section 4.1.1 that mortality rates have been declining monotonically over time, it is not surprising that the fraction of those who participate in the labor market is higher for younger cohorts than for older ones, up to age 45. However, from age 55, the two variables that affect the fraction of those who participate in the labor market work in opposite directions. As a result, while this fraction is higher at younger ages for the younger cohorts, the curves for men born in 1840, 1880, and 1930 intersect at about the age of 63.32 It is worth noting that the area under each such survival curve is the expected number of years each cohort is expected to be working in the labor market. Figure 6 plots the number of years that each cohort was expected to work, for individuals who survive to age 20, assuming that entry age is fixed at 20.33 As can be seen from this figure, the representative member of the cohort born in 1840 was expected to work for 37.23 years, whereas his counterpart born in 32
In fact, this is the pattern across all the cohorts. Note that this is a very conservative assumption. While participation at ages 20–24 is lower than at ages 25–45 for the younger cohorts, probably due to college education, for the oldest cohorts, the average age of entrance to the labor market was likely to have been lower than 20. Hence I overestimate the difference in the expected number of years in the labor market between the oldest and youngest cohorts, which, in turn, underestimates the difference in ETWH. 33
1930 was expected to work for 41.73 years. I then redo this exercise, assuming that expectations are calculated at age 5 (i.e., t0 = 5), but maintaining the assumption that entry to the labor market occurs at age 20. Since the probability of surviving to age 20, conditioned on surviving to age 5, increases across cohorts, the difference in the expected number of years across the cohort is larger by about 2 years. Overall, it is evident that the lower mortality rates for the younger cohorts slightly outweigh their higher retirement rates. Given that the ETWH is an average of the hours worked at each age, weighted by the probability of being in the labor market at that age, the trend in ETWH across the cohorts at hand will be mostly determined by the trend in hours worked at each age. 5.2. ETWH: Cohort Estimates I now present the main results of the paper. Figures 7 and 8 present the cohort estimates of the ETWH for cohorts of men born between 1840 and 1930. Each figure contains two series of estimates: The first is labeled “by age 95” and shows the ETWH until each cohort is completely retired from the labor market; the second is labeled “by age 70” and shows the ETWH, truncated at age 70. The latter is presented to alleviate any concerns that the declining trend
FIGURE 7.—Expected total working hours over the lifetime of consecutive cohorts of men born between 1840 and 1930. Individuals are assumed to enter the labor market at age 20: Cohort estimates are calculated at age 20.
FIGURE 8.—Expected total working hours over the lifetime of consecutive cohorts of men born between 1840 and 1930. Individuals are assumed to enter the labor market at age 20: Cohort estimates are calculated at age 5.
of ETWH might be driven by men older than 70 years old, who, conditional on participating in the labor market, worked more than 60 hours a week in the late 19th century.34 I begin by presenting the estimates under the assumption that expectations are calculated at age 20. As can be seen in Figure 7, the lifetime labor supply as measured by the ETWH of consecutive cohorts has been declining monotonically. The oldest cohort, born in 1840, was expected to work 115,378 hours in its lifetime. In contrast, the youngest cohort, born in 1930, was expected to work only 81,411 hours. This amounts to a decline of more than 29 percent between men born in 1840 and 1930—an average decline of more than 2.5 percent between two adjacent cohorts. The probability of surviving to age 20 from age 5, however, has increased from 0.92 for the cohort born in 1840 to 0.98 for the cohort born in 1930. Since investment in education begins at age 5, one might rightfully argue that the age at which expectations should be calculated is age 5.35 This is what I do 34
Hereafter, all figures which present estimates of ETWH show ETWH by both age 95 and age
70. 35 Recall that while expectations are calculated at age 5, it is assumed that the age of entry into the labor market is 20.
in Figure 8. As can be seen, although the difference in the ETWH between the cohorts has narrowed, it is still substantial: while members of the earliest cohort were expected at age 5 to work for 106,176 hours over their lifetime, their counterparts born 90 years later, were expected at that age to work for 79,684 hours. This amounts to a decline of nearly 25 percent between men born in 1840 and 1930, an average decline of more than 2 percent between two adjacent cohorts. Finally, note that in both figures, the decline in ETWH is monotonic across the cohorts. The main advantage of the estimates presented in Figures 7 and 8 is that they encompass 10 cohorts of men born over a period of 90 years. They suffer, however, from two disadvantages, due to the time series of annual hours worked used in the estimation. First, the hours series used for these estimates combines different sources for the pre-1900 and post-1900 period. Second, it does not allow for age–year variation in hours worked. To alleviate concerns that the declining trend in ETWH is generated due to potential biases in the time series of hours worked that is used, I employ data on hours worked by men from 1900 to 2005 computed by Ramey and Francis (2009). These data enable me to overcome the two shortcomings just mentioned, at the expense of obtaining cohort estimates of the ETWH of men born between 1890 and 1930.36 Given that Ramey and Francis reported hours per person by age groups, with the youngest age group containing men aged 10–13, I assume that expectations are formed at age 10 and I remove my earlier assumption that men of all cohorts enter the labor market at age 20. Figure 9 presents the cohort estimates of ETWH for men born 1890–1930, using Ramey and Francis’ data.37 A few points are worth mentioning. First, similar to the estimates presented in Figures 7 and 8, ETWH is monotonically decreasing across cohorts. Second, although the estimates based on Ramey and Francis’ data are somewhat larger than those presented in Figures 7 and 8, the difference across cohorts is almost constant. Finally, the decline across cohorts does not spring from different behavior at very old ages: the difference between the ETWH by age 95 and by age 70 is almost constant. 5.3. ETWH: Period Estimates One reason to present the period estimates is that assuming that individuals perfectly foresee their entire lifetime may be a strong assumption. Hence, I also 36 The Ramey and Francis’ data comprise hours per person, not per worker. I therefore adjust my methodology such that I define the survivor function as the probability of being alive and I weight it by hours worked per person, a series which already takes into account the participation decision, conditional on being alive. 37 Men born in 1920 were age 85 in 2005 and men born in 1930 were age 75 in 2005. I assume that men born in 1920 work the same number of hours at ages 86–95 as men born in 1910. Similarly, I assume that men born in 1930 work the same number of hours at ages 75–84 as men born in 1920 and the same number of hours at ages 86–95 as men born in 1910.
FIGURE 9.—Expected total working hours over the lifetime of consecutive cohorts of men born in 1890–1930. Cohort estimates calculated at age 10. Hours series based on Ramey and Francis (2009).
present the period estimates for ETWH for men born between 1850 and 1970. Figure 10 presents the period estimates for ETWH until age 79, assuming that expectations are taken at age 5.38 While the period estimates series does not monotonically decline between each adjacent cohort, for example, men born 1850 expected to work 1000 hours less than men born 1860, the general trend is clear: ETWH declined by more than 35 percent between men born in 1850 and men born in 1970. The baseline time series of hours used in these estimates is that used in the cohort estimates presented in Figures 7 and 8. Note that the nature of the period estimates exposes them to a larger biases than the cohort estimates, for a given bias in the hours series. Hence, to alleviate the concern that the declining trend in ETWH is driven by biases in the time series of hours worked, 38 The truncation at age 79 is because the period life tables in Haines (1998) do not report the death rate for individuals age 80 and over. This is not a major problem, however. Since S(·) is nonincreasing and since, in the data, the older the cohort is, the larger the value of Sc (79), when I use Sc (t), t ≤ 79, to estimate the ETWH, I underestimate the differences across cohorts. Estimates of the ETWH by age 70 are not presented, for clarity, because their values are very similar to those presented here.
FIGURE 10.—Expected total working hours and average years of schooling of consecutive cohorts of men born in 1850–1970. Individuals are assumed to enter the labor market at age 20: Period estimates are calculated at age 5.
I use the Ramey and Francis data to derive period estimates, which, similar to the corresponding estimates presented in Figure 9, have the two advantages mentioned in Section 5.2. These estimates are presented in Figure 11. As can be seen from the figure, the period estimates of the ETWH are downward trending, with a decrease of nearly 30,000 hours between men born in 1890 and men born in 1970.39 6. RESULTS FOR ALL INDIVIDUALS Thus far, the estimates presented of ETWH were only for men. One may worry, however, that the focus on men may be biasing the results against the Ben-Porath mechanism. To see why, suppose one tested the theory on women instead. One would find that as longevity increased, both education and lifetime labor supply increased, thereby supporting the Ben-Porath mechanism. 39 The “dip” in the ETWH for men born in 1920 results from the far fewer hours worked during the years 1931–1935. Recall that “men born 1920” in effect were born between 1916–1925, so they were 10 years old between 1926–1935. The hours series used for this cohort is the average across these 10 years.
FIGURE 11.—Expected total working hours over the lifetime of consecutive cohorts of men born in 1890–1970. Individuals are assumed to enter the labor market at age 20: Period estimates are calculated at age 10. Hours series based on Ramey and Francis (2009).
Thus, in this section, I present estimates of ETWH for all individuals by combining mortality data and labor market decisions for both men and women.40 Due to data availability on hours worked by all individuals, however, I can only present cohort estimates of ETWH for individuals born between 1890 and 1930 and period estimates for individuals born between 1890 and 1970. These are presented in Figures 12 and 13, respectively.41 Two main features are worth mentioning. First, the ETWH of all individuals still shows a declining trend across cohorts, although not monotonically across each two adjacent cohorts. The reason for the declining trend is that although average hours of work across men and women were virtually unchanged for those aged 22–54, hours fell substantially for the younger and older age groups (Ramey and Francis (2009)). It turns out that this fall outweighs the gains in life expectancy across the cohorts at study. Second, in light of the long-run trend 40 I thank two referees for raising this point and for suggesting that I make use of the Ramey and Francis data to address this issue. 41 The Ramey and Francis data comprise hours per person, not per worker. I therefore adjust my methodology such that I define the survivor function as the probability of being alive, and weight it by hours worked per person, a series which already takes into account the participation decision, conditional on being alive. Mortality data for all individuals have been used.
FIGURE 12.—Expected total working hours over the lifetime of consecutive cohorts of all individuals born in 1890–1930. Cohort estimates are calculated at age 10.
FIGURE 13.—Expected total working hours over the lifetime of consecutive cohorts of all individuals born in 1890–1970. Period estimates are calculated at age 10.
of increasing labor supply of women, the decline in ETWH across cohorts is of a much smaller magnitude compared to the estimates for men. 7. ROBUSTNESS OF THE RESULTS In this section, I explore the robustness of my estimates for the ETWH for men. Some scholars argue that in 19th century America, most employment, particularly that in agriculture, was seasonal (Atack and Bateman (1992), Engerman and Goldin (1994)). Since seasonality in employment declined over time, my assumption that workers of all cohorts work 52 weeks a year biases upward the difference across cohorts in the ETWH. To explore this possibility, I conduct a counterfactual experiment. I try to answer the hypothetical question, How many weeks of employment a year do the representative members of the cohorts born between 1850 and 1910 expect to work, such that their ETWHs would be equal to that of the representative member of the cohort born in 1970. I then compare the answer to the estimates implied by Engerman and Goldin (1994). For the representative member of the cohort born in 1850 to have his ETWH equal that of the representative member of the cohort born in 1970, he should have expected to work 1,596 hours a year. In 1860, the year at which the representative member of the cohort born in 1850 was 10 years old, the weekly average hours of work was 62.17. Hence, to work 1,596 annual hours, the representative member of this cohort should have expected to be employed for about 26 weeks a year. The answer for this hypothetical question for all cohorts born between 1850 and 1910 is presented in Figure 14. As can be seen, for all these cohorts, employment of less than 31 weeks a year was enough to expect a lifetime labor supply that is equal to that of the cohort born in 1970. Note that these numbers imply an expected length of unemployment of almost 5 months a year, which is well above the findings of Engerman and Goldin (1994) and Atack, Bateman, and Margo (2002). Specifically, Engerman and Goldin found that in 1900 the length of unemployment, conditional on being unemployed, was between 3 and 4 months. Yet, the probability of being unemployed in 1900 was less than 50 percent. Taking these two findings together, it follows that the expected months of unemployment did not exceed 2. Similarly, Atack, Bateman, and Margo (2002) found that the full-time equivalent months of employment was nearly 11 months a year both in 1870 and 1880. 8. THE EUROPEAN EXPERIENCE Was the American experience unique? Does the lifetime labor supply of European men display a different time trend? In this section, I briefly discuss the data on the determinants of lifetime labor supply in some European coun-
FIGURE 14.—Counterfactual experiment: expected number of weeks of employment that would equalize ETWH to that of the cohort born in 1970.
tries and compare them to U.S. data.42 Although the data in this section are somewhat suggestive, my purpose is to show that my results are not unique to the U.S. experience, but rather are a robust feature of the process of development of today’s developed economies. To this end, I present time series of (i) life expectancies for males at age 5 (Figure 15), (ii) labor force participation of men aged 65 and over (Figure 16), and (iii) annual hours of work of full-time production workers (Figure 17). Figures 15–17 demonstrate remarkable similarities across these countries in the determinants of ETWH, both in terms of the trends and the magnitudes. I therefore conjecture that the decline in ETWH across cohorts is not unique to the American experience, but is a robust feature of the process of development in today’s developed economies. 9. CONCLUDING REMARKS In this paper, I demonstrate that the commonly utilized mechanism, according to which prolonging the period in which individuals may receive returns on their human capital spurs investment in human capital and causes growth, has 42 The selection of countries reflects availability of data from the various sources used. References to the various sources are given in the figures.
FIGURE 15.—Life expectancies at age 5 for males in selected countries, period life tables. Data for France, Germany, the Netherlands, and the United Kingdom are from the Human Mortality Database. Data sources for the United States are described in Section 4.1.
an important implicit implication: namely, that as life is prolonged, lifetime labor supply must increase as well. Hence, I argue that this mechanism has to satisfy this necessary condition. Utilizing data on consecutive cohorts of American men born between 1840 and 1970, I show that this mechanism fails to satisfy its necessary condition. Specifically, the estimates of lifetime labor supply and average years of schooling, which are shown together in Figure 10, reject this necessary condition unequivocally.43 I also provide suggestive evidence that the determinants of lifetime labor supply are remarkably similar between the United States and other developed countries, such as England, France, Ger-
43 The correlation between the period estimates of ETWH and schooling is −0.93, with a p-value of 0, and between the cohort estimates of ETWH and schooling is −0.85, with a p-value of 0.0081. One may argue that hours per school day may have been reduced as well, challenging the argument that schooling has been increasing. Ramey and Francis (2009) argued that the average weekly hours spent in school by individuals in the age group 14–17 increased from 1.4 in 1900 to 20.2 in 1970 and have been fluctuating around this value since then (see their Table 3). Goldin (1999) provided data on the average length of the school term and the average number of days attended per pupil enrolled. Both series show monotonic increases from the school year 1869–1870 (which is the earliest data point of this series). For example, the average number of days attended per pupil enrolled increased from about 80 days in the school year 1869–1870 to nearly 100 in the school year 1899–1900 and to 150 days in the school year 1939–1940.
FIGURE 16.—Labor force participation of men aged 65 and over in selected countries. Data are from Table 2A.2 of Costa (1998a).
many, and the Netherlands. Thus, I conjecture that my main result that ETWH has declined is a robust feature of the process of development in today’s developed economies. I therefore conclude that the Ben-Porath mechanism had a nonpositive effect on investment in education and, therefore, cannot account for any of the immense increase in educational attainment observed over the last 150 years. My results lend credence to mechanisms that emphasize an increase in the net return to private investment in human capital. One possible candidate is technological progress, which increases the economic return to human capital.44 Another candidate is public education, which increases the net private returns to human capital by reducing the cost of acquiring education (Galor and Moav (2006)). My results have implications at a broader level as well. In the recent debate on the fundamental causes of long-run growth, several scholars have advocated the “geography” hypothesis. According to this hypothesis, exogenous differences in the environment are the fundamental cause of long-run growth. One 44
See Galor and Weil (2000) for a growth model driven by the interplay between human capital and technological progress, and Acemoglu (2007) for a theoretical analysis of the interplay between factors’ supply and the nature of technological change.
1858
MOSHE HAZAN
FIGURE 17.—Annual hours worked by full-time production workers in selected countries. Data from Huberman and Minns (2007).
important difference is the "disease burden" in the tropics, which, compared to temperate zones, results in high morbidity and mortality rates, and in turn, impedes development (Bloom and Sachs (1998), Sachs (2003)). My results, however, suggest that an important element of the geography hypothesis is not supported by the data, namely that mortality decline did not play a role in the growth process of the United States and Western Europe via the human capital channel. Furthermore, mortality rates in mid-19th century America were much higher than those existing in sub-Saharan Africa today.45 Hence, if lessons of the past guide our perceptions of the future, my results cast doubt on the optimistic view advocated by the World Health Organization's Commission on Macroeconomics and Health (2001), as quoted in the Introduction. Some caveats are in order. First, my analysis was conducted for a representative member of each cohort. However, it could be that ETWH has increased for more educated individuals while it declined for less educated workers and that the latter dominated. While this is possible, data limitations preclude
45 Using data from the World Development Indicators for the year 2000, I average three measures of mortality across all 48 countries of sub-Saharan Africa: life expectancy at birth, adult mortality rate, and child mortality rate. The figures for sub-Saharan Africa are 51.61 years, 407 per 1000, and 147 per 1000, respectively. The corresponding numbers for mid-19th century America are 37.23 years, 585 per 1000, and 322 per 1000, respectively.
me from estimating ETWH in different segments of the skill distribution. In particular, weekly hours worked by wage or education cannot be estimated consistently prior to 1940, and mortality rates by wage or education are not available. Second, one should not conclude from this paper that gains in life expectancy are useless, or that they do not affect growth. For one thing, they are desirable for their own sake, as long as individuals value life (over death). Murphy and Topel (2006) built a model to value longevity and health, based on individuals' willingness to pay, and estimated substantial economic gains from both gains in life expectancies and improvements in health over the 20th century in America.46 Finally, human capital might also make leisure more valuable (Vandenbroucke (2009)), provide social status (Fershtman, Murphy, and Weiss (1996)), and increase the attractiveness in the marriage market (Gould (2008)). Thus, greater longevity can potentially increase the investment in human capital for these reasons, rather than for labor market productivity. Hence, one can build a model in which an increase in longevity reduces total lifetime labor supply and increases education and total welfare, reconciling my findings with the Ben-Porath (1967) model.

APPENDIX: PROOFS FOR PROPOSITIONS 1 AND 2

Differentiating (5) and (6) with respect to longevity, T, yields, respectively,

(9)    -\frac{\theta''(s)}{[\theta'(s)]^{2}}\,\frac{ds}{dT} = e^{-r(R-s)}\left(\frac{dR}{dT} - \frac{ds}{dT}\right)
and (10)
f'(R)\,\frac{dR}{dT} = u'(\cdot)\,e^{\theta(s)}\,\theta'(s)\,\frac{ds}{dT} + e^{\theta(s)}\,u''(\cdot)\,\frac{dc}{dT}
46 Related to gains in longevity are improvements in health. From a theoretical point of view, however, longevity and health are distinct. While longevity measures the length of (productive) life, health affects the productivity (in school or in the labor market) per unit of time. Interestingly, Bleakley (2007) analyzed the eradication of the nonfatal disease hookworm from the American south and found a positive effect of the eradication on schooling. Moreover, in a related work (Bleakley (2006)), he found an interesting natural experiment that forms a bridge between health and longevity. In Colombia, most of the malarial areas were afflicted with vivax malaria, a high-morbidity strain. However, significant portions of the country suffered from elevated rates of falciparum, a malaria parasite associated with high mortality. Bleakley found that eradicating vivax malaria produced substantial gains in human capital and income, while on the other hand, his estimates indicated no such gains from eradicating falciparum.
where dc/dT is given by

(11)    \frac{dc}{dT} = e^{\theta(s)}\,\frac{\Bigl[\theta'(s)\bigl(e^{-rs} - e^{-rR}\bigr)\frac{ds}{dT} + r\bigl(e^{-rR}\frac{dR}{dT} - e^{-rs}\frac{ds}{dT}\bigr)\Bigr](1 - e^{-rT}) - re^{-rT}\bigl(e^{-rs} - e^{-rR}\bigr)}{(1 - e^{-rT})^{2}}
ds PROOF OF PROPOSITION 1: Solving for dT in equations (9), (10), and (11), using the first order conditions, and taking into account the second order conds > 0. Given that, by the strict concavity of θ(·), the left-hand ditions yields dT side of (9) is positive. Clearly, the right-hand side of (9) is positive if and only dR ds > dT , which implies that d(R−s) > 0. Q.E.D. if dT dT ds > 0 in equations (9), (10), and PROOF OF PROPOSITION 2: Solving for dT (11), using the first order conditions, and taking into account the second order ds conditions yields dT > 0. The linearity of θ(·) implies that the left-hand side of ds = dT , (9) equals 0. Clearly, the right-hand side of (9) equals 0 if and only if dR dT d(R−s) which implies that dT = 0. Q.E.D.
REFERENCES ACEMOGLU, D. (2007): “Equilibrium Bias of Technology,” Econometrica, 75, 1371–1409. [1857] ACEMOGLU, D., AND S. JOHNSON (2006): “Disease and Development: The Effect of Life Expectancy on Economic Growth,” Working Paper 12269, NBER. [1831,1832] ASHRAF, Q. H., A. LESTER, AND D. N. WEIL (2008): “When Does Improving Health Raise GDP?” in NBER Macroeconomics Annual, Vol. 23. University of Chicago Press. [1831] ATACK, J., AND F. BATEMAN (1992): “How Long Was the Workday in 1880?” Journal of Economic History, 52, 129–160. [1854] ATACK, J., F. BATEMAN, AND R. A. MARGO (2002): “Part-Year Operation in Nineteenth-Century American Manufacturing: Evidence From the 1870 and 1880 Censuses,” Journal of Economic History, 62, 792–809. [1854] BELL, F. C., A. H. WADE, AND S. C. GOSS (1992): Life Tables for the United States Social Security Area 1900–2080. Washington, DC: Government Printing Office. [1838] BEN-PORATH, Y. (1967): “The Production of Human Capital and the Life Cycle of Earnings,” Journal of Political Economy, 75, 352–365. [1829,1832,1833,1859] BILS, M., AND P. J. KLENOW (2000): “Does Schooling Cause Growth?” American Economic Review, 90, 1160–1183. [1831,1835] BLEAKLEY, H. (2006): “Malaria in the Americas: A Retrospective Analysis of Childhood Exposure,” Working Paper 2006-35, CEDE. [1859] (2007): “Disease and Development: Evidence From Hookworm Eradication in the American South,” Quarterly Journal of Economics, 122, 73–117. [1859] BLOOM, D. E., AND J. D. SACHS (1998): “Geography, Demography and Economic Growth in Africa,” Brookings Papers on Economic Activity, 2, 207–295. [1858] BLOOM, D. E., D. CANNING, AND M. MOORE (2007): “A Theory of Retirement,” Working Paper 26, PGDA. [1833]
LONGEVITY AND LIFETIME LABOR SUPPLY
1861
BOLDRIN, M., L. E. JONES, AND A. KHAN (2005): “Three Equations Generating an Industrial Revolution?” Unpublished Manuscript, University of Minnesota. [1830] BOUCEKKINE, R., D. DE LA CROIX, AND O. LICANDRO (2002): “Vintage Human Capital, Demographic Trends, and Endogenous Growth,” Journal of Economic Theory, 104, 340–375. [1830] (2003): “Early Mortality Decline at the Dawn of Modern Growth,” Scandinavian Journal of Economics, 105, 401–418. [1830] CARD, D. (1995): “Earnings, Schooling and Ability Revisited,” in Research in Labor Economics, ed. by S. Polachek. Greenwich, CT: JAI Press. [1835] CARTER, S. B., R. RANSOM, R. SUTCH, AND H. ZHAO (2006): “Computer Files of Various State Bureau of Labor Statistics Reports,” available at http://www.eh.net/databases/labor/. [1842] CASELLI, F. (2005): “Accounting for Cross-Country Income Differences,” in Handbook of Economic Growth, ed. by P. Aghion and S. N. Durlauf, Vol. 1A. Elsevier, 679–741. [1831] CERVELLATI, M., AND U. SUNDE (2005): “Human Capital Formation, Life Expectancy and Process of Economic Development,” American Economic Review, 95, 1653–1672. [1830] COSTA, D. L. (1998a): The Evolution of Retirement. Chicago, IL: University of Chicago Press. [1840,1857] (1998b): “The Wage and the Length of the Work Day: From the 1890s to 1991,” Working Paper 6504, NBER. [1842] DE LA CROIX, D., AND O. LICANDRO (1999): “Life Expectancy and Endogenous Growth,” Economics Letters, 65, 255–263. [1830] ENGERMAN, S., AND C. GOLDIN (1994): “Seasonality in Nineteenth Century Labor Market,” in Economic Development in Historical Perspective, ed. by D. Schaefer and T. Weiss. Palo Alto, CA: Stanford University Press. [1854] FERSHTMAN, C., K. M. MURPHY, AND Y. WEISS (1996): “Social Status, Education, and Growth,” Journal of Political Economy, 104, 108–132. [1859] FORTSON, J. G. (2007): “Mortality Risk and Human Capital Investment: The Impact of HIV/AIDS in Sub-Saharan Africa,” Unpublished Manuscript, University of Chicago. [1831] GALOR, O., AND O. MOAV (2006): “Das Human Kapital: A Theory of the Demise of the Class Structure,” Review of Economic Studies, 73, 85–117. [1857] GALOR, O., AND D. N. WEIL (1999): “From Malthusian Stagnation to Modern Growth,” American Economic Review, 89, 150–154. [1830] (2000): “Population, Technology, and Growth: From Malthusian Stagnation to the Demographic Transition and Beyond,” American Economic Review, 90, 806–828. [1835,1857] GOLDIN, C. (1999): “A Brief History of Education in the United States,” Historical Paper 119, NBER. [1856] GOULD, E. D. (2008): “Marriage and Career: The Dynamic Decisions of Young Men,” Journal of Human Capital, 2, 337–378. [1859] HAINES, M. R. (1998): “Estimated Life Table for the United States, 1850–1910,” Historical Methods, 31, 149–167. [1838,1850] HALL, R. E., AND C. I. JONES (1999): “Why Do Some Countries Produce so Much More Output per Worker Than Others?” Quarterly Journal of Economics, 114, 83–116. [1835] HAZAN, M. (2006): “Longevity and Lifetime Labor Input: Data and Implications,” Discussion Paper 5963, CEPR. [1832,1835,1837,1839,1843] HAZAN, M., AND B. BERDUGO (2002): “Child Labor, Fertility, and Economic Growth,” Economic Journal, 112, 810–828. [1835] HAZAN, M., AND H. ZOABI (2006): “Does Longevity Cause Growth? A Theoretical Critique,” Journal of Economic Growth, 11, 363–376. [1831] HUBERMAN, M., AND C. MINNS (2007): “The Times They Are Not Changin’: Days and Hours of Work in Old and New Worlds, 1870–2000,” Explorations in Economic History, 44, 538–567. 
[1858] HUMAN MORTALITY DATABASE, UC Berkeley, and Max Planck Institute for Demographic Research. Available at www.mortality.org. [1856]
1862
MOSHE HAZAN
JAYACHANDRAN, S., AND A. LLERAS-MUNEY (2009): “Life Expectancy and Human Capital Investments: Evidence From Maternal Mortality Declines,” Quarterly Journal of Economics, 124 (forthcoming). [1831,1832] JENA, A., C. MULLIGAN, T. J. PHILIPSON, AND E. SUN (2008): “The Value of Life in General Equilibrium,” Working Paper 14157, NBER. [1831] JONES, E. B. (1963): “New Estimates of Hours of Work per Week and Hourly Earnings, 1900– 1957,” The Review of Economics and Statistics, 45, 374–385. [1842] KALEMLI-OZCAN, S., H. E. RYDER, AND D. N. WEIL (2000): “Mortality Decline, Human Capital Investment, and Economic Growth,” Journal of Development Economics, 62, 1–23. [1830,1835] KENDRICK, J. W. (1961): Productivity Trends in the United States. Princeton, NJ: Princeton University Press. [1842] LEE, C. (2001): “The Expected Length of Male Retirement in the United States,” Journal of Population Economics, 14, 641–650. [1840] LORENTZEN, P., J. MCMILLAN, AND R. WACZIARG (2008): “Death and Development,” Journal of Economic Growth, 13, 81–124. [1831,1832] MANUELLI, R. E., AND A. SESHADRI (2005): “Human Capital and the Wealth of Nations,” Unpublished Manuscript, University of Wisconsin–Madison. [1831] MARGO, R. A. (1986): “Race and Human Capital: Comment,” American Economic Review, 76, 1221–1224. [1830] MELTZER, D. (1992): “Mortality Decline, the Demographic Transition, and Economic Growth,” Ph.D. Thesis, University of Chicago. [1830] MOAV, O. (2005): “Cheap Children and the Persistence of Poverty,” Economic Journal, 115, 88–110. [1835] MOEN, J. R. (1988): “From Gainful Employment to Labor Force: Definitions and a New Estimate of Work Rates of American Males, 1860 to 1980,” Historical Methods, 21, 149–159. [1840] MURPHY, K. M., AND R. H. TOPEL (2006): “The Value of Health and Longevity,” Journal of Political Economy, 114, 871–904. [1859] RAMEY, V. A., AND N. FRANCIS (2009): “A Century of Work and Leisure,” American Economic Journal: Macroeconomics (forthcoming). [1842,1843,1849,1850,1852,1856] RUGGLES, S., M. SOBEK, T. ALEXANDER, C. A. FITCH, R. GOEKEN, P. KELLY HALL, M. KING, AND C. RONNANDER (2004): Integrated Public Use Microdata Series: Version 3.0 [Machinereadable database]. Minneapolis, MN: Minnesota Population Center. [1840] SACHS, J. D. (2003): “Institutions Don’t Rule: Direct Effects of Geography on per Capita Income,” Working Paper 9490, NBER. [1858] SHESHINSKI, E. (2008): The Economic Theory of Annuities. Princeton, NJ: Princeton University Press. [1833] SOARES, R. R. (2005): “Mortality Reductions, Educational Attainment, and Fertility Choice,” American Economic Review, 95, 580–601. [1830] SUNDSTORM, W. A. (2006): “Hours and Working Conditions,” in Historical Statistics of the United States, Vol. 2, ed. by S. B. Carter, S. S. Gartner, M. R. Haines, A. L. Olmstead, R. Sutch, and G. Wright. Cambridge University Press, 46–54. [1842] VANDENBROUCKE, G. (2009): “Trends in Hours: The U.S. From 1900 to 1950,” Journal of Economic Dynamics and Control, 33, 237–249. [1859] WHAPLES, R. (1990): “The Shortening of the American Work Week: An Economic and Historical Analysis of Its Context, Causes and Consequences,” Ph.D. Thesis, University of Pennsylvania. [1841]
LONGEVITY AND LIFETIME LABOR SUPPLY
1863
WORLD HEALTH ORGANIZATION (2001): “Macroeconomics and Health: Investing in Health for Economic Development,” Report, Commission on Macroeconomics and Health, Geneva. [1831,1858]
Dept. of Economics, The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 91905, Israel;
[email protected]. Manuscript received September, 2008; final revision received May, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1865–1899
BAYESIAN ESTIMATION OF DYNAMIC DISCRETE CHOICE MODELS BY SUSUMU IMAI, NEELAM JAIN, AND ANDREW CHING1 We propose a new methodology for structural estimation of infinite horizon dynamic discrete choice models. We combine the dynamic programming (DP) solution algorithm with the Bayesian Markov chain Monte Carlo algorithm into a single algorithm that solves the DP problem and estimates the parameters simultaneously. As a result, the computational burden of estimating a dynamic model becomes comparable to that of a static model. Another feature of our algorithm is that even though the number of grid points on the state variable is small per solution-estimation iteration, the number of effective grid points increases with the number of estimation iterations. This is how we help ease the “curse of dimensionality.” We simulate and estimate several versions of a simple model of entry and exit to illustrate our methodology. We also prove that under standard conditions, the parameters converge in probability to the true posterior distribution, regardless of the starting values. KEYWORDS: Bayesian estimation, dynamic programming, discrete choice models, Markov chain Monte Carlo.
1. INTRODUCTION STRUCTURAL ESTIMATION OF DYNAMIC DISCRETE CHOICE (DDC) models has become increasingly popular in empirical economics. Examples include Keane and Wolpin (1997) on labor economics, Erdem and Keane (1996) on marketing, Imai and Krishna (2004) on crime, and Rust (1987) on empirical industrial organization. Structural estimation of DDC models is appealing because it captures the dynamic forward-looking behavior of individuals. This is important in understanding agents’ behavior in various settings. For example, in the labor market, individuals carefully consider future prospects when they decide whether to change jobs. Moreover, since structural estimation allows us to obtain estimates of parameters that have economic interpretations, based on these interpretations and the solution of the model, we can assess the effect of fundamental changes in policy regimes by simply changing the estimated value of “policy” parameters and simulating the model. However, one major obstacle in adopting the structural estimation method has been its computational burden, which is mainly due to the following two reasons. First, the likelihood or the moment conditions are based on the explicit solution of a dynamic programming (DP) model. For instance, solving an infinite 1 We are very grateful to the editor and the anonymous referees for insightful comments. Thanks also go to Chris Ferrall, Chris Flinn, Wesley Hartmann, Mike Keane, Justin McCrary, Andriy Norets, Matthew Osborne, Peter Rossi, John Rust, and seminar participants at the UIUC, NYU, Ohio State University, University of Kansas, University of Michigan, University of Minnesota, SBIES, 2006 Quantitative Marketing and Economics Conference, and 2005 Econometrics Society World Congress for helpful comments on the earlier draft of the paper. We also thank SSHRC and FQRSC for financial support. All remaining errors are our own.
© 2009 The Econometric Society
DOI: 10.3982/ECTA5658
1866
S. IMAI, N. JAIN, AND A. CHING
horizon DP problem requires us to obtain the fixed point of a Bellman operator for each possible point in the state space. Second, the possible number of points in the state space increases exponentially with the dimensionality of the state space. This is commonly referred to as the curse of dimensionality, which makes the estimation of DDC models infeasible even in a relatively simple setting. In this paper, we propose an estimator that helps overcome the two computational difficulties of structural estimation of infinite horizon DP models. Our estimator is based on the Bayesian Markov chain Monte Carlo (MCMC) estimation algorithm, where we simulate the posterior distribution by repeatedly drawing parameters from a pseudo-Markov chain until convergence. In contrast to the conventional MCMC estimation approach, we combine the Bellman equation step and the MCMC algorithm step into a single hybrid solutionestimation step, which we iterate until convergence. The key innovation in our algorithm is that, for a given state space point, we only need to conduct a single iteration of the Bellman operator during each estimation step. Since evaluating a Bellman operator once is as computationally demanding as computing a static model, the computational burden of estimating a DP model is in order of magnitude comparable to that of estimating a static model.2 This is in contrast to conventional estimation methods that “estimate” the model only after solving the DP problem. Our estimation method is related to the algorithm advocated by Aguirregabiria and Mira (2002) and others, which is an extension of the method developed by Hotz and Miller (1993) and Hotz, Miller, Sanders, and Smith (1994).3 However, their estimation algorithms, which are not based on the full solution of the model, have difficulties dealing with unobserved heterogeneity. This is because they essentially recover the value function from the observed choices of individuals at each point of the state space by conditioning on observed state variables. In contrast, our estimation algorithm is based on the full solution of the DP problem and, therefore, it can accommodate a rich specification of both observed and unobserved heterogeneity.4 2
Ferrall (2005) also considered an optimal mix of model solution and estimation algorithms. Arcidiacono and Jones (2003) adopted the expectation–maximization (EM) algorithm to estimate different parts of a dynamic model with latent types sequentially rather than jointly. Using a Monte Carlo experiment, they showed that their method could potentially result in significant computational gain compared with the full information maximum likelihood. 3 See also Aguirregabiria and Mira (2007) and Arcidiacono and Miller (2009) for extensions of the work of Hotz et al. to estimate models with dynamic games and finite mixture models. 4 In contrast to Ackerberg (2004), where the entire DP problem needs to be solved for each parameter simulation, in our algorithm, the Bellman operator needs to be evaluated only once for each parameter value. Furthermore, there is an additional computational gain because our pseudo-MCMC algorithm guarantees that, except for the initial burn-in simulations, most of the parameter draws are from a distribution close to the true posterior distribution. In Ackerberg’s case, the initial parameter simulation and, therefore, the DP solution would be inefficient because
BAYESIAN ESTIMATION OF DDC MODELS
1867
We avoid the computational burden of the full solution by approximating the expected value function (that is, the emax function) at a state space point using the average of value functions of past iterations in which the parameter vector is “close” to the current parameter vector and the state variables are either exactly the same as the current state variables (if the state space is finite) or close to the current state variables (if the state space is continuous). This method of updating the emax function is similar to Pakes and McGuire (2001) except in the important respect that we also include the parameter vector in determining the set of iterations over which averaging occurs. Note that the probability function that determines the next period parameter values is not a Markov transition function because our updated emax function depends on the past simulations of parameter vectors and value functions. We prove that under mild conditions, the probability function converges to the true MCMC transition function as the number of iterations of our Bayesian MCMC algorithm increases. That is, as the number of iterations increases, our algorithm becomes closer to the standard MCMC algorithm. Our algorithm also helps in the “curse of dimensionality” situation where the dimension of the state space is high. In most DP solution exercises involving a continuous state variable, the state space grid points, once determined, are fixed over the entire algorithm, as in Rust (1997). In our Bayesian DP algorithm, the state space grid points do not have to be the same for each solution-estimation iteration. In fact, by varying the state space grid points at each solution-estimation iteration, our algorithm allows for an arbitrarily large number of state space grid points by increasing the number of iterations. This is how our estimation method reduces the computational burden in highdimensional cases. We demonstrate the performance of our algorithm by estimating a dynamic, infinite horizon model of firm entry and exit choice with observed and unobserved heterogeneity. The unobserved random effects coefficients are assumed to have a continuous distribution function, and the observed characteristics are assumed to be continuous as well. It is well known that for a conventional dynamic programming simulated maximum likelihood estimation strategy, this setup imposes a severe computational burden. The computational burden is due to the fact that during each estimation step, the DP problem has to be solved for each firm hundreds of times. Because of the observed heterogeneity, each firm has a different parameter value. Furthermore, because the random effects term has to be integrated out numerically via Monte Carlo integration, for each firm, one has to simulate the random effects parameter hundreds of times, and for each simulation, solve for the DP problem. This is why most at the initial stage, true parameter distribution is not known. On the other hand, if prior to the estimation, one has a fairly accurate prior about the location of the parameter estimates, and thus the model needs to be solved at only very few parameter values up front, then the algorithm could be computationally efficient.
1868
S. IMAI, N. JAIN, AND A. CHING
practitioners of structural estimation follow Heckman and Singer (1984), and assume discrete distributions for random effects and only allow for discrete types as observed characteristics. We show that the computational burden of the estimation exercise above, using our algorithm, becomes quite similar in difficulty to the Bayesian estimation of a static discrete choice model with random effects (see McCulloch and Rossi (1994) for details). Indeed, through simulation-estimation exercises, we show that the computing time for our estimation exercise is around five times as fast and significantly more accurate than the conventional random effects simulated maximum likelihood estimation algorithm. In addition to the experiments, we formally prove that under very mild conditions, the distribution of parameter estimates simulated from our solution-estimation algorithm converges in probability to the true posterior distribution as we increase the number of iterations. Our algorithm shows that the Bayesian methods of estimation, suitably modified, can be used effectively to conduct full-solution-based estimation of structural DDC models. Thus far, application of Bayesian methods to estimate such models has been particularly difficult. The main reason is that the solution of the DP problem, that is, the repeated calculation of the Bellman operator, is computationally so demanding that the MCMC, which typically involves far more iterations than the standard maximum likelihood (ML) routine, becomes infeasible quickly with a relatively small increase in model complexity. One of the few examples of Bayesian estimation is Lancaster (1997). He successfully estimated the equilibrium search model where the Bellman equation can be transformed into an equation where all the information on optimal choice of the individual can be summarized in the reservation wage and, hence, there is no need to solve the value function. Another line of research is Geweke and Keane (2000) and Houser (2003), who estimated the DDC model without solving the DP problem. In contrast, our paper accomplishes Bayesian estimation based on full solution of the infinite horizon DP problem by simultaneously solving for the DP problem and iterating on the pseudo-MCMC algorithm. The difference turns out to be important because their estimation algorithms can only accommodate limited specification of unobserved heterogeneity.5 Our estimation method makes Bayesian application to DDC models not only computationally feasible, but possibly even superior to the existing (nonBayesian) methods, by reducing the computational burden of estimating a dynamic model to that of estimating a static one. Furthermore, the usually cited 5 Since the working paper version of this paper has been circulated, several authors have used the Bayesian DP algorithm and made some important extensions. Osborne (2007) applied the Bayesian DP algorithm to the estimation of dynamic discrete choice model with random effects and estimated the dynamic consumer brand choice model. Norets (2007) applied it to the DDC model with serially correlated state variables. Also see Brown and Flinn (2006) for a classical econometric application of the idea.
BAYESIAN ESTIMATION OF DDC MODELS
1869
advantages of Bayesian estimation over classical estimation methods apply here as well. That is, first, the conditions for the convergence of the pseudoMCMC algorithm are in general weaker than the conditions for the global maximum of the ML estimator, as we show in this paper. Second, in MCMC, standard deviations of parameter estimates are simply the sample standard errors of parameter draws, whereas in ML estimation, standard errors have to be computed, usually either by inverting the numerically calculated information matrix, which is valid only in a large sample world, or by repeatedly bootstrapping and reestimating the model, which is computationally demanding. The organization of the paper is as follows. In Section 2, we present a general version of the DDC model and discuss conventional estimation methods as well as our Bayesian DP algorithm. In Section 3, we state theorems and corollaries on the convergence of our algorithm under some mild conditions. In Section 4, we present a simple model of entry and exit. In Section 5, we present the simulation and estimation results of several experiments applied to the model of entry and exit. Finally, in Section 6, we conclude and briefly discuss the future direction of this research. Appendices are provided as a supplement on the Econometrica website (Imai, Jain, and Ching (2009)). Appendix A contains some results of the simulation-estimation exercises of the basic model and the random effects model. Appendix B contains all proofs. Appendix C contains plots of the MCMC estimation of the random effects model. 2. THE FRAMEWORK We estimate an infinite horizon dynamic model of a forward-looking agent. Let θ be the J-dimensional parameter vector. Let S be the set of state space points and let s be an element of S. We assume that S is finite. Let A be the set of all possible actions and let a be an element of A. We assume A to be finite to study discrete choice models. Let R(s a a θ) be the current period return function of choosing action a, where s is the state variable and is a vector whose ath element a is a random shock to current returns to choice a. We further assume that follows a multivariate distribution F(|θ) with density function dF( θ) and is independent over time. We assume that the transition probability of next period state s , given current period state s and action a, is f (s |s a θ), where θ is the parameter vector. Then the time invariant value function can be defined to be the maximum of the discounted sum of expected revenues as ∞ τ V (st t θ) ≡ max E β R sτ aτ aτ θ st t {at at+1 }
τ=t
where β is the discount factor. This value function is known to be the unique solution to the Bellman equation V (s θ) = max R(s a a θ) + βEs [V (s θ)|s a] (1) a∈A
1870
S. IMAI, N. JAIN, AND A. CHING
where s is the next period’s state variable. The expectation is taken with respect to the next period shock and the next period state s . If we define V (s a a θ) to be the expected value of choosing action a, then
V (s a a θ) = R(s a a θ) + βEs [V (s θ)|s a] and the value function can be written as V (s θ) = max V (s a a θ) a∈A
We assume that the data set for estimation includes variables which correspond to state vector s and choice a in our model but the choice shock is not obd T d d adiτ Gdiτ }Ni=1τ=1 , where N d served. That is, the observed data are YN d T d ≡ {siτ 6 d is the number of firms and T is the number of time periods. Furthermore, d adiτ = arg max V (siτ a a θ) a∈A
G = d iτ
d d R siτ aiτ adiτ θ 0
d if (siτ adiτ ) ∈ Ψ ,
otherwise.
The current period return is observable in the data only when the pair of state and choice variables belongs to the set Ψ . In the entry–exit problem of firms that we discuss later, profit of a firm is only observed when the incumbent firm stays in. In this case, Ψ is a set whose state variable is being an incumbent (and the capital stock) and the choice variable is staying in. Let π(·) be the prior distribution of θ. Furthermore, let L(YN d T d |θ) be the likelihood of the model, given the parameter θ and the value function V (· · θ), which is the solution of the DP problem. Then, we have the posterior distribution function of θ: (2)
P(θ|YN d T d ) ∝ π(θ)L(YN d T d |θ) d
d
T Let ≡ {iτ }Ni=1τ=1 . Because is unobserved to the econometrician, the likelihood is an integral over it. That is, if we define L(YN d T d | θ) to be the likelihood conditional on ( θ), then
L(YN d T d |θ) =
6
L(YN d T d | θ) dF (|θ)
We denote any variables with d superscript to be the data.
BAYESIAN ESTIMATION OF DDC MODELS
1871
The value function enters into the likelihood through choice probability, which is a component of the likelihood. That is,7 d d (3) P[a = adiτ |siτ V θ] = Pr : adiτ = arg max R(siτ a a θ) a∈A d + βEs [V (s θ)|siτ a] Below we briefly describe the conventional estimation approaches and then the Bayesian dynamic programming algorithm we propose. 2.1. The Maximum Likelihood Estimation The conventional ML estimation procedure of the DP problem consists of two main steps. First is the solution of the DP problem and the subsequent construction of the likelihood, which is called the inner loop; second is the estimation of the parameter vector, which is called the outer loop. DP Step (Inner Loop) Given parameter vector θ, we solve for the fixed point V (· · θ) of the Bellman operator Tθ : Tθ V(s θ) ≡ max R(s a a θ) + βEs [V (s θ)|s a] a
This typically involves several steps. Step a. The random choice shock is drawn a fixed number of times, say, M , generating (m) m = 1 M . At iteration 0, we let the expected value function be 0, that is, E [V (0) (s θ)] = 0 for every s ∈ S. Then we set initial guess of the value function at iteration 1 to be the current period payoff. That is, V (1) s (m) θ = max R s a (m) a θ a∈A
for every s ∈ S, (m) . Step b. Assume we are at iteration t of the Bellman operator. Given s ∈ S and (m) , the value of every choice a ∈ A is calculated. For the emax function, [V (t−1) (s θ)] comwe use the approximated expected value function E puted at the previous iteration t − 1 for every s ∈ S. Hence, the iteration t 7 Notice that it is not necessary that we have a random choice shock a for each choice a. What is important for the feasibility of estimation is that the likelihood, which is based on the choice d d V θ], is well defined and bounded for all {adiτ siτ }, and for uniformly probability P[a = adiτ |siτ bounded V and θ ∈ Θ
1872
S. IMAI, N. JAIN, AND A. CHING
value of choice a is V (t) s a (m) a θ V (t−1) (s θ) f (s |s a θ) = R s a (m) E a θ + β s
Then we compute the value function, V (t) (s (m) θ) = maxa∈A {V (t) (s a (m) , m = 1 M . (m) a θ)} This calculation is done for every s ∈ S and Step c. The approximation for the expected value function is computed by taking the average of value functions over simulated choice shocks as (4)
M V (t) (s θ) ≡ 1 V (t) s (m) θ E M m=1
Steps b and c have to be repeated for every state space point s ∈ S. Furthermore, the two steps have to be repeated until the value function converges. That is, for a small δ > 0, |V (t) (s (m) θ) − V (t−1) (s (m) θ)| < δ for all s ∈ S and m = 1 M . Likelihood Construction The important increment of the likelihood is the conditional choice probd d ability P[a = adiτ |siτ V θ] given the state siτ , value function V and the parameter θ. For example, suppose that the per period return function is specified as a θ) + a R(s a a θ) = R(s a θ) is the deterministic component of the per period return funcwhere R(s tion. Also, denote (s a θ) = R(s [V (s θ)]f (s |s a θ) a θ) + β E V s
to be the deterministic component of the value of choosing action a. Then d P[adiτ |siτ V θ] (s ad θ) − V (s a θ); ∀a = ad |sd V θ = P a − adiτ ≤ V iτ iτ iτ
which becomes a multinomial probit specification when the error term is assumed to follow a joint normal distribution.8 8 As long as the choice probability is well defined, the error term does not have to be additive. The dynamic discrete choice model is essentially a multinomial discrete choice model, where the right hand side includes future expected value functions.
BAYESIAN ESTIMATION OF DDC MODELS
1873
Likelihood Maximization Routine (Outer Loop) Suppose we have J parameters to estimate. In a typical ML estimation routine, where one uses the Newton hill climbing algorithm, at iteration t, likelihood is derived under the original parameter vector θ(t) and under the perturbed parameter vector θ(t) + θj , j = 1 J. The perturbed likelihood is used together with the original likelihood to derive the new direction of the hill climbing algorithm. This is done to derive the parameters for the iteration t + 1, θ(t+1) . That is, during a single ML estimation routine, the DP problem needs to be solved in full J + 1 times. Furthermore, often the ML estimation routine has to be repeated many times until convergence is achieved. During a single iteration of the maximization routine, the inner loop algorithm needs to be executed at least as many times as the number of parameters plus 1. Since the estimation requires many iterations of the maximization routine, the entire algorithm is usually computationally extremely burdensome. 2.2. The Conventional Bayesian MCMC Estimation A major computational issue in the Bayesian estimation method is that the posterior distribution, given by equation (2), is a high-dimensional and complex function of the parameters. Instead of directly simulating the posterior, we adopt the MCMC strategy and construct a transition density from current parameter θ to the next iteration parameter θ f (θ θ ), which satisfies, among other more technical conditions, the equality P(θ|YN d T d ) = f (θ θ )P(θ |YN d T d ) dθ where P(θ|YN d T d ) is the posterior distribution of θ given YN d T d . From the transition density, we simulate the sequence of parameters {θ(s) }ts=1 , which is known to converge to the correct posterior. The conventional Bayesian estimation method applied to the DDC model proceeds in the following two main steps.9 Metropolis–Hastings (M–H) Step The M–H algorithm is a Markov chain simulation algorithm used to draw from a complex target distribution.10 In our case, the target density is proportional to π(θ)L(YN d T d |θ). Given θ(t) , the parameter vector at iteration t, we draw the new parameter vector θ(t+1) as follows: First, we draw the candidate 9
See Tierney (1994) and Tanner and Wong (1987) for details on Bayesian estimation. See Robert and Casella (2004) for more details on the M–H algorithm.
10
1874
S. IMAI, N. JAIN, AND A. CHING
parameter vector θ∗(t) from a candidate generating density (or proposal density) q(θ(t) θ∗(t) ). Then we accept θ∗(t) , that is, set θ(t+1) = θ∗(t) with probability,
π(θ∗(t) )L(YN d T d |θ∗(t) )q(θ∗(t) θ(t) ) λ θ(t) θ∗(t) = min (5) 1 π(θ(t) )L(YN d T d |θ(t) )q(θ(t) θ∗(t) ) and we reject θ∗(t) , that is, set θ(t+1) = θ(t) with probability 1 − λ. Since the likelihood is a function of the value function, the DP problem needs to be solved for each θ∗(t) . Hence, similar to the maximum likelihood estimation procedure, one can interpret the M–H step as the outer loop of the estimation algorithm and interpret the DP step involved in constructing the likelihood for each candidate parameter vector as the inner loop. This DP step is the same as the one described in the previous subsection. The full-solutionbased Bayesian MCMC method turns out to be even more burdensome computationally than the full-solution-based ML method because MCMC typically requires a lot more iterations than the ML routine. We next present our algorithm for estimating the parameter vector θ We call it the Bayesian dynamic programming (Bayesian DP) algorithm. The key innovation of our algorithm is that we solve the DP problem and estimate the parameters simultaneously rather than sequentially as in the conventional methods described above. 2.3. The Bayesian Dynamic Programming Estimation The main difference between the Bayesian DP algorithm and the conventional algorithm is that during each estimation step, we do not solve for the fixed point of the Bellman operator. In fact, during each modified M–H step, we iterate the Bellman operator only once. This operator can be expressed as Tθ(t) V (· · θ)
(t) ≡ max R(s a a θ) + β E [V (s θ)]f (s |s a θ) a
s
Our Bellman operator Tθ(t) depends on t because our approximation of the (t) , depends on t. In conventional methods, this expected value function, E approximation is given by equation (4). In contrast, we approximate it by averaging over a subset of past iterations. Let H(t) ≡ {(s) θ∗(s) V (s) }ts=1 be the history of shocks, candidate parameters,11 and value functions up to it∗(t) eration t. Let V (t) (s a (t) H(t−1) ) be the value of choice a and let a θ (t) (t) ∗(t) (t−1) V (s θ H ) be the value function derived at iteration t of our 11
We do not use past history of θ(s) ; hence it is not included in H(t) .
BAYESIAN ESTIMATION OF DDC MODELS
1875
solution–estimation algorithm. Then the value function and the approxima(t) [V (s θ)|H(t−1) ] for the expected value function E [V (s θ)] at tion E iteration t are defined recursively as (t) V (s θ)|H(t−1) E (6) ≡
V
V
N(t)
Kh (θ − θ∗(t−n) ) V (t−n) s (t−n) θ∗(t−n) H(t−n−1) N(t) n=1 Kh (θ − θ∗(t−k) )
∗(t−n)
k=1
θ H s a θ∗(t−n) = R s a (t−n) a (t−n) +β E V s θ∗(t−n) H(t−n−1) f (s |s a θ)
(t−n)
(t−n)
(t−n) a
(t−n−1)
s
s (t−n) θ∗(t−n) H(t−n−1) θ∗(t−n) H(t−n−1) = max V (t−n) s a (t−n) a a∈A
where Kh (·) is a multivariate kernel with bandwidth h > 0. The approximated expected value function given by equation (6) is the weighted average of value functions of N(t) most recent iterations. The sample size of the average, N(t), increases with t, that is, N(t) → ∞ as t → ∞. Furthermore, we let t − N(t) → ∞ as t → ∞. The weights are high for the value functions at iterations with parameters close to the current parameter vector θ(t) . This is similar to the idea of Pakes and McGuire (2001), where the expected value function is the average of the past N iterations. In their algorithm, averages are taken only over the value functions that have the same state space point as the current state space point s. In our case, averages are taken over the value functions that have the same state as the current state s as well as parameters that are close to the current parameter θ(t) . We now describe the complete Bayesian DP algorithm at iteration t. Suppose that {(l) }tl=1 , {θ∗(l) }tl=1 , and {V (l) (s (l) θ∗(l) H(l−1) )}tl=1 are given for all discrete s ∈ S. Then we update the value function and the parameters as follows. Modified M–H Step12 We draw the new parameters θ(t+1) as follows: First, we draw the candidate parameter vector θ∗(t) from the proposal density q(θ(t) θ∗(t) ). Then we 12 We are grateful to Andriy Norets for pointing out a flaw in the Gibbs sampling scheme adopted in the earlier draft. We follow Norets (2007) and Osborne (2007), and adopt the modified Metropolis–Hastings algorithm for the MCMC sampling. Notice that in a linear model (see
1876
S. IMAI, N. JAIN, AND A. CHING
accept θ∗(t) , that is, set θ(t+1) = θ∗(t) with probability λ(θ(t) θ∗(t) |H(t−1) ) defined in the same way as before, in equation (5), and we reject θ∗(t) , that is, we set θ(t+1) = θ(t) with probability 1 − λ. Modified DP Step As explained in the conventional Bayesian MCMC algorithm, during each Metropolis–Hastings step, we need to update the expected value function (t) [V (· · ·)|H(t−1) ] for parameters θ(t) and θ∗(t) . To do so for all s ∈ S, without E iterating on the Bellman operator until convergence, we follow equation (6). For use in future iterations, we simulate the value function by drawing (t) to ∗(t) derive V (t) (s a (t) H(t−1) ) and V (t) (s (t) θ∗(t) H(t−1) ) a θ We repeat these two steps until the sequence of parameter simulations converges to a stationary distribution. In our algorithm, in addition to the DP and Bayesian MCMC methods, nonparametric kernel techniques are also used to approximate the value function. Notice that the convergence of kernel-based approximation is not based on the large sample size of the data, but on the number of Bayesian DP iterations. Moreover, the Bellman operator is evaluated only once during each estimation iteration. Hence, the Bayesian DP algorithm avoids the computational burden of solving for the DP problem during each estimation step, which involves repeated evaluation of the Bellman operator.13 It is important to note that the modified Metropolis–Hastings algorithm is not a Markov chain.14 This is because it involves value functions calculated in past iterations. Hence, convergence of our algorithm is by no means trivial. McCulloch and Rossi (1994) for an example), the equations are zit = Rit γ + uit uit ∼ N(0 Σ)
1 if zijt ≥ max{zit }, yijt = 0 otherwise. Then, once zit is derived by data augmentation, the first equation is linear in parameter γ and thus can be estimated by linear Gibbs sampling. However, in our case, zit = Rit γ + βEV (Rit+1 |Rit θ) + uit
uit ∼ N(0 Σ)
where EV (Rit+1 |Rit θ) is a nonlinear function of the state variable Rit and, thus, linear Gibbs sampling algorithms cannot be applied. 13 Both Osborne (2007) and Norets (2007) approximated the expected value function using the value functions computed in the past iterations evaluated at the past parameter draws θ(t−n) . Here, we use the value functions evaluated at the past proposal parameter draws θ∗(t−n) . We chose to do so because given θ(t) it is easier to control the random movement of θ∗(t) than the random movement of θ(t+1) , since θ∗(t) is drawn from a known distribution function. This simplifies both the proofs and the empirical example. By keeping the conditional variance of the proposal density given θ(t) small, we can guarantee that the invariant distribution of θ∗(t) is not very different from that of θ 14 We are grateful to Peter Rossi for emphasizing this.
BAYESIAN ESTIMATION OF DDC MODELS
1877
In the next section, we state theorems and corollaries which show that under some mild assumptions, the distribution of the parameters generated by our algorithm converges to the true posterior in probability. All proofs are in Appendix B. 3. THEORETICAL RESULTS In this section, we prove the convergence of the Bayesian DP algorithm in the basic model. We then present the random effects model and the continuous state space model in two subsections. To facilitate the proof, we modify the DP step slightly to calculate the expected value function. That is, we simulate the value function by drawing (t) to derive ∗(t) V (t) s a (t) H(t−1) a θ (t) V s θ∗(t) H(t−1) f (s |s a θ) s a (t) θ∗(t) + β E =R a s
where s a (t) θ∗(t) = min max R s a (t) θ∗(t) −MR MR R a a for a large positive MR . This makes the current period return function used in equation (1) uniformly bounded, which simplifies the proof. This modification does not make any difference in practice because MR can be set arbitrarily large. Let V denote the solution of the Bellman equation:15 a a θ) + βEs [V (s θ)|s a] V (s θ) = max R(s Next we show that under some mild assumptions, our algorithm generates a sequence of parameters θ(1) θ(2) that converges in probability to the correct posterior distribution. ASSUMPTION 1: Parameter space Θ ⊆ RJ is compact, that is, closed and bounded in the Euclidean space RJ . The proposal density q(θ ·) is continuously differentiable, strictly positive, and uniformly bounded in the parameter space given any θ ∈ Θ.16 15 Given the expected value function, per period return in the likelihood construction is set to See equation (3). be R not R 16 Compactness of the parameter space is a standard assumption used in proving the convergence of the MCMC algorithm. It is often not necessary but simplifies the proofs. An example of the proposal density that satisfies Assumption 1 is the multivariate normal density, truncated to only cover the compact parameter space.
1878
S. IMAI, N. JAIN, AND A. CHING
a a θ)| < MR for ASSUMPTION 2: For any s ∈ S, a ∈ A, and , θ ∈ Θ, |R(s a · ·) some MR > 0. Also, R(s a · θ) is a nondecreasing function in and R(s satisfies the Lipschitz condition in terms of and θ. Also, the density function dF( θ) and the transition function f (·|· a ·) given a satisfy the Lipschitz condition. ASSUMPTION 3: β is known17 and β < 1. ASSUMPTION 4: For any s ∈ S, , and θ ∈ Θ, V (0) (s θ) < MI for some MI > 0. Furthermore, V (0) (s · ·) also satisfies the Lipschitz condition in terms of and θ. (t) [V (s θ)], Assumptions 2, 3, and 4 jointly make V (t) (s θ) and hence E t = 1 uniformly bounded, measurable, and continuous, and satisfy the Lipschitz condition as well. ASSUMPTION 5: π(θ) is positive and bounded for any θ ∈ Θ. Similarly, for any θ ∈ Θ and V uniformly bounded, L(YN d T d |θ V (· θ)) > 0 and is bounded and uniformly continuous in θ ∈ Θ. as follows. For some s > 0, define t(1) = s Define the sequence t(l), N(l) and N(1) = N(s). Let t(2) be such that t(2) − N(t(2)) = t(1) + 1. Such t(2) exists from the assumption on N(s) stated in Assumption 6. Also, let N(2) = N(t(2)). Similarly, for any l > 2, let t(l +1) be such that t(l +1)−N(t(l +1)) = + 1) = N(t(l + 1)). t(l) + 1 and let N(l ASSUMPTION 6: N(t) is nondecreasing in t, increases at most by one for a unit increase in t, and N(t) → ∞. Furthermore, t − N(t) → ∞ and there exists a finite + 1) < AN(l) for all l > 1, and, for any l = 2 constant A > 0 such that N(l N(t(l) + 1) = N(t(l)) + 1. An example of a sequence that satisfies Assumption 6 is t(l) ≡ s + N(l) = l for l > 0, and N(t) = l + 1 for t(l) < t ≤ t(l + 1), l > 1.
(l+1)(l+2) 2
,
ASSUMPTION 7: The bandwidth h is a nonincreasing function of N and as N → ∞, h(N) → 0 and Nh(N)9J → ∞. Further, h(N) is constant for N(t(l)) < N ≤ N(t(l + 1)). ASSUMPTION 8: Kh (·) is a multivariate kernel with bandwidth h > 0. That is, Kh (z) = (1/hJ )K(z/ h) where K is a nonnegative, continuous, bounded real 17 The assumption that β is known may not be necessary but greatly simplifies the proofs. However, we show later that β can be successfully estimated as long as its prior is restricted to be strictly less than 1.
BAYESIAN ESTIMATION OF DDC MODELS
1879
function which is symmetric around 0 and integrates to 1,4Jthat is, K(z) dz = 1. Furthermore, zK(z) dz < ∞ and |z|>1/ h K(z) dz ≤ Ah for some positive constant A where for a vector z, |z| = supj=1J |zj |, and K has an absolutely integrable Fourier transform. THEOREM 1: Suppose Assumptions 1–8 are satisfied for V (t) , π, L, , and θ. (t) [V (s θ)| Then, the sequence of approximated expected value functions E (t−1) H ] converges to E [V (s θ)] in probability uniformly over s ∈ S, θ ∈ Θ as t → ∞. Similarly, the sequence of value functions V (t) (s θ H(t−1) ) converges to V (s θ) in probability uniformly over s, , andθ ∈ Θ as t → ∞. COROLLARY 1: Suppose Assumptions 1–8 are satisfied. Then λ(θ(t) θ∗(t) | H ) converges to λ(θ(t) θ∗(t) ) in probability uniformly in Θ. (t−1)
THEOREM 2: Suppose Assumptions 1–8 are satisfied for V (t) , t = 1 π, L, θ(t) in probability, where θ(t) is a Markov chain , and θ. Then θ(t) converges to generated by the Metropolis–Hastings algorithm with proposal density q(θ θ(∗) ) and acceptance probability function λ(θ θ(∗) ). COROLLARY 2: The sequence of parameter simulations generated by the Metropolis–Hastings algorithm with proposal density q(θ θ∗ ) and acceptance probability λ(θ θ∗ ) converges to the true posterior in total variation norm. That is, n lim K (θ ·)μ0 (dθ) − μ =0 n→∞
TV
for arbitrary initial distribution μ0 , where μ is the true posterior distribution and K n (θ ·) is the transition kernel for n iterations. By Corollary 2, we can conclude that the distribution of the sequence of parameters θ(t) generated by the Bayesian DP algorithm converges in probability to the true posterior distribution. To understand the basic logic of the proof of Theorem 1, suppose that parameter θ(t) stays fixed at a value θ∗ for all iterations t. Then equation (6) reduces to (t) V (s θ∗ )|H(t−1) = E
1 (t−n) (t−n) ∗ (t−n−1) V θ H s N(t) n=1 N(t)
Then our algorithm boils down to a simple version of the machine learning algorithm discussed by Pakes and McGuire (2001) and Bertsekas and Tsitsiklis (1996). They approximated the expected value function by taking the average over all past value function iterations whose state space point is the same
1880
S. IMAI, N. JAIN, AND A. CHING
as the state space point s . Bertsekas and Tsitsiklis (1996) discussed the convergence issues and showed that under some assumptions, the sequence of the value functions from the machine learning algorithm converges to the true value function almost surely. The difficulty of the proofs lies in extending the logic of the convergence of the machine learning algorithm to the framework of estimation, where the parameter vector moves around as well. Our answer to this issue is simple: for a parameter vector θ ∈ Θ at iteration t, we look at the past iterations and use value functions at parameters θ∗(t−n) that are very close to θ. Then the convergence is very similar to the case where the parameter vector is fixed, as long as the number of past value functions used can be made arbitrarily large. This is guaranteed by Assumption 1, since every neighborhood in the compact parameter space Θ will be visited infinitely often. It is important to note that for convergence of the value function, the estimation algorithm does not have to be Markov. The only requirement is that during the iteration, each neighborhood in Θ has a strictly positive probability of being drawn. 3.1. Random Effects Consider a model where, for a subset of parameters, each agent has a different value θi , which is randomly drawn from a density f ( θi |θ(1) ). The parameter vector of the model is θ ≡ (θ(1) θ(2) ), where θ(1) is the parameter vector for the distribution of the random coefficients and θ(2) is the vector of other parameters. The parameter vector of firm i is ( θi θ(2) ). Instead of explicitly integrating the likelihood over θi , we follow the commonly adopted and computationally efficient procedure of treating each θi as a parameter and drawing it from its density. It is known (see McCulloch and Rossi (1994), Albert and Chib (1993), and Chib and Greenberg (1996)) that instead of drawing the entire parameter d vector ({ θi }Ni=1 θ(1) θ(2) ) at once, it is often simpler to partition it into several blocks and draw the parameters of each block separately given the other parameters. Here, we propose to draw them in the following three blocks. At iteration t the blocks are d (t) (t) Block 1: Draw { θi(t+1) }Ni=1 given θ(1) θ(2) d (t+1) (t) given { θi(t+1) }Ni=1 , θ(2) Block 2: Draw θ(1) d (t+1) (t+1) (t+1) N θi }i=1 , θ(1) Block 3: Draw θ(2) given { Below we describe in detail the algorithm at each block. Block 1—Modified M–H Step for Drawing θi For firm i, we draw the new random effects parameters θi(t+1) as follows: We set the proposal density as the distribution function of θi , that is, f ( θi |θ(1) ). Notice that the prior is a function of θ(1) and θ(2) , and not of θi . Hence for drawing θi given θ(1) and θ(2) , the prior is irrelevant. Similarly, given θ(1) , the likelihood
BAYESIAN ESTIMATION OF DDC MODELS
1881
increment of firms other than i is also irrelevant in drawing θi . Therefore, we draw θi using the likelihood increment of firm i, which can be written as θi θ(2) f θi |θ(1) Li YiT d | where
θi θ(2) ≡ L YiT d | θi θ(2) V (t) · · · Li YiT d | θi θ(2) H(t−1)
because the likelihood depends on the value function. Now, we draw the canθi∗(t) |θ(1) ). Then we acdidate parameter θi∗(t) from the proposal density f ( ∗(t) (t+1) ∗(t) cept θi , that is, set θi = θi with probability (t) ∗(t) (t−1) θi H λ1 θ
(t) (t) (t) θi∗(t) θ(2) ))f ( θi∗(t) |θ(1) )f ( θi(t) |θ(1) ) Li (YiT d |( 1 = min (t) (t) (t) Li (YiT d |( θi(t) θ(2) ))f ( θi(t) |θ(1) )f ( θi∗(t) |θ(1) )
(t) θi∗(t) θ(2) )) Li (YiT d |( 1 ; = min (t) Li (YiT d |( θi(t) θ(2) )) θi(t+1) = θi(t) with probability 1 − λ1 . otherwise, reject θi∗(t) , that is, set (t+1) Block 2—Drawing θ(1)
N d d (t+1) Conditional on { θi(t+1) }Ni=1 , the density of θ(1) is proportional to i=1 f ( θi(t+1) | θ(1) ) Drawing from this density is straightforward as it does not involve the solution of the DP problem.18 Block 3—Modified M–H Algorithm for Drawing θ(2) (t+1) We draw the new parameters θ(2) as follows: First, we draw the candidate ∗(t) (t) ∗(t) ∗(t) from the proposal density q(θ(2) θ(2) ). Then we accept θ(2) , parameter θ(2)
18 As pointed out by a referee, a potential issue could be serial correlation of θi , because the new θi ’s could be dependent on θ(1) from the past iteration. Furthermore, draws of new θi could be heavily centered around the past mean of θi , which may suppress sufficient movement of θ(1) . MCMC plots in Appendix C for the example discussed later show that there is sufficient movement of the parameters θ(1) and that serial correlation is small. An important statistic to look at in this case is the acceptance probability of the M–H draw of θi(t) . If the acceptance probability is too low, then there is insufficient movement of θi(t) over iteration t and thus their hyperparameter θ1(t) will exhibit high correlation across t. On the other hand, if it is too high, that is, close to 1, then their mean θ1(t) will not change much if N is large, and thus will result in high serial correlation of θ1(t) . In our random effects example, the acceptance probability of θi(t) is around 15–25%, which is considered to be quite appropriate in the MCMC literature. If the acceptance rate is either too high or too low, then a different procedure such as the ones proposed by Osborne (2007) or Norets (2007) is recommended.
1882
S. IMAI, N. JAIN, AND A. CHING
(t+1) ∗(t) that is, set θ(2) = θ(2) with probability
(t+1) ∗(t) (t−1) θ(2) H λ2 θ(1) ⎫ ⎧ Nd ⎪ ⎪ ⎪ ⎪ (t+1) ∗(t) (t+1) ∗(t) ∗(t) (t) ⎪ ⎪ ⎪ ⎪ π(θ θ ) L (Y | θ θ ) q(θ θ ) d i iT i ⎪ ⎪ (1) (2) (2) (2) (2) ⎬ ⎨ i=1 ; 1 = min Nd ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ (t+1) (t) (t) (t) ∗(t) ⎪ ⎪ ⎪ Li (YiT d | θi(t+1) θ(2) ) q(θ(2) θ(2) ) ⎪ ⎭ ⎩ π(θ(1) θ(2) ) i=1 ∗(t) (t+1) (t) otherwise, we reject θ(2) , that is, set θ(2) = θ(2) with probability 1 − λ2 .
Bellman Equation Step During each M–H step, for each agent i we evaluate the expected value func(t) [V (· · θi θ(2) )|H(t−1) ]. To do so for each agent, for all s ∈ S, we follow tion E equation (6) as before.19 For use in future iterations, we simulate the value function by drawing (t) to derive (t−1) V (t) s a (t) a θi θ(2) H s a (t) =R θi θ(2) a (t) V s E θi θ(2) H(t−1) f (s |s a θ) +β V
(t)
s
(t−1) θi θ(2) H(t−1) = max V (t) s a (t) s (t) a θi θ(2) H a∈A
The additional computational burden necessary to estimate the random coefficient model is the computation of the value function which has to be done separately for each firm i, because each firm has a different random effects parameter vector. In this case, adoption of the Bayesian DP algorithm results in a large reduction in computational cost. 3.2. Continuous State Space So far, we assumed a finite state space. However, the Bayesian DP algorithm can also be applied, with minor modifications, in a straightforward manner to other settings of dynamic discrete choice models. One example is the random grid approximation of Rust (1997). Conventionally, randomly generated state vector grid points are fixed throughout the solution-estimation algorithm. If we follow this procedure and 19
For more details, see Experiment 1 in Section 5.1.
1883
BAYESIAN ESTIMATION OF DDC MODELS
let sm , m = 1 M, be the random grids that are generated before the start of the solution-estimation algorithm, then, given parameter θ, the expected value function approximation at iteration t of the DP solution algorithm using the Rust random grids method would be (t−1) (t) E (7) s V (s θ)|s a H ⎤ ⎡ ⎥ ⎢ M ⎢ N(t) Kh (θ − θ∗(t−n) ) ⎥ ⎥ ⎢ (t−n) (t−n) ∗(t−n) (t−n−1) V θ H sm ≡ ⎥ ⎢ N(t) ⎥ ⎢ m=1 ⎣ n=1 K (θ − θ∗(t−k) ) ⎦ h
k=1
×
f (sm |s a θ) f (sl |s a θ) M
l=1
Notice that in this definition of emax function approximation, the grid points remain fixed over all iterations. In contrast, in our Bayesian DP algorithm, random grids can be changed at each solution-estimation iteration. Let s(t) be the random grid point generated at iteration t. Here s(τ) , τ = 1 2 are drawn independently from a distribution. Then the expected value function can be approximated as (t−1) (t) E s V (s θ)|s a H ≡
N(t)
V (t−n) s(t−n) (t−n) θ∗(t−n) H(t−n−1)
n=1
×
Kh (θ − θ∗(t−n) )f (s(t−n) |s a θ) N(t) ∗(t−k) (t−k) Kh (θ − θ )f (s |s a θ) k=1
In the Rust method, if the total number of random grids is M, then the number of computations required for each iteration of the Bellman operator is M. Hence, at iteration τ, the number of DP computations that is required is Mτ. If a single DP solution step requires τ iterations of the Bellman operator and if each Newton ML step requires K DP solution steps, then to iterate the Newton ML algorithm once, we need to compute a single DP iteration MτK times. In contrast, in our Bayesian DP algorithm, at iteration t we only need to draw one state vector s(t) (so that M = 1) and only iterate on the Bellman operator once on that state vector (so that τ = 1 and K = 1). Still, at iteration t
1884
S. IMAI, N. JAIN, AND A. CHING
the number of random grid points is N(t), which can be made arbitrarily large by increasing the number of iterations. In other words, in contrast to the Rust method, the accuracy of the DP computation in our algorithm automatically increases with the number of iterations.

Another issue that arises in applications of the Rust random grid method is that the method assumes that the transition density function f(s' | s, a, θ) is not degenerate. That is, we cannot use the random grid algorithm if the transition from s to s', given a and θ, is deterministic. It is also well known that the random grid algorithm becomes inaccurate if the transition density has a small variance. In these cases, several versions of polynomial-based expected value function approximation have been used. Keane and Wolpin (1994) approximated the emax function using polynomials of the deterministic part of the value functions for each choice and state space point. Imai and Keane (2004) used Chebychev polynomials of state variables. It is known that in some cases, global approximation using polynomials can be numerically unstable and exhibit "wiggling." Here, we propose a kernel-based local interpolation approach to emax function approximation. The main obstacle to local approximation has been the computational burden of having a large number of grid points. As pointed out earlier, in our solution-estimation algorithm, we can make the number of grid points arbitrarily large by increasing the total number of iterations, even though the number of grid points per iteration is 1. Thus, if the continuous state variable evolves deterministically, we approximate the emax function \hat{E}_{s'}[V(s', \theta) \mid s, a] as follows. Let K_{h_s}(\cdot) be the kernel function with bandwidth h_s for the state variable and K_{h_\theta}(\cdot) the kernel function with bandwidth h_\theta for the parameter vector θ. Then

\hat{E}^{(t)}[V(s, \theta) \mid H^{(t-1)}] \equiv \sum_{n=1}^{N(t)} V^{(t-n)}(s^{(t-n)}, \epsilon^{(t-n)}, \theta^{*(t-n)}, H^{(t-n-1)}) \times \frac{K_{h_s}(s - s^{(t-n)}) K_{h_\theta}(\theta - \theta^{*(t-n)})}{\sum_{k=1}^{N(t)} K_{h_s}(s - s^{(t-k)}) K_{h_\theta}(\theta - \theta^{*(t-k)})}.
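The following Python sketch illustrates this local interpolation (again our own illustration rather than the authors' implementation; the Gaussian kernels, bandwidths, and the toy value history are assumptions): the emax value at a point (s, θ) is a kernel-weighted average, in both the state and the parameter, of value functions stored from past solution-estimation iterations.

import numpy as np

def gauss(u, h):
    # one-dimensional Gaussian kernel weight with bandwidth h
    return np.exp(-0.5 * (u / h) ** 2)

def emax_local_interpolation(s, theta, history, h_s, h_theta):
    """history: list of (s_past, theta_past, V_past) from earlier iterations."""
    s_past = np.array([rec[0] for rec in history])
    th_past = np.array([rec[1] for rec in history])
    v_past = np.array([rec[2] for rec in history])
    w = gauss(s_past - s, h_s) * gauss(th_past - theta, h_theta)
    return np.sum(w * v_past) / np.sum(w)

# toy usage: past iterations stored as (state, parameter, value)
rng = np.random.default_rng(0)
history = [(x, t, np.log(1.0 + x) + t) for x, t in
           zip(rng.uniform(0, 5, 500), rng.normal(0.5, 0.05, 500))]
print(emax_local_interpolation(2.0, 0.5, history, h_s=0.3, h_theta=0.02))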
4. EXAMPLES

We estimate a simple, infinite horizon, dynamic discrete choice model of entry and exit, where firms are in a competitive environment.20 We describe the model in general first and then consider two simplifications. In the general model, there are firm-specific random effects and the state variable evolves stochastically.
20 For an estimation exercise based on this model, see Roberts and Tybout (1997).
In the first model, which we term the basic model, there is no observed or unobserved heterogeneity. In the second model, in addition, the state variable evolves deterministically.

The firm is either an incumbent (I) or a potential entrant (O). If the incumbent firm chooses to stay, its per period return is

R_I^{IN}(K_t, \epsilon_t, \theta_i) = \alpha_i K_t + \epsilon_{1t},

where K_t is the capital of the firm, \epsilon_{1t} is an independent and identically distributed (i.i.d.) random shock, and \theta_i is the vector of parameters, including the firm-specific parameter \alpha_i, which is distributed according to N(\alpha, \sigma_\alpha^2). When there are no random effects, \alpha_i = \alpha for all i and \sigma_\alpha = 0. If the firm chooses to exit, its per period return is

R_I^{OUT}(K_t, \epsilon_t, \theta_i) = \epsilon_{2t}.

Similarly, if the potential entrant chooses to enter, its per period return is

R_O^{IN}(K_t, \epsilon_t, \theta_i) = -\delta + \epsilon_{1t},

and if it decides to stay out, its per period return is

R_O^{OUT}(K_t, \epsilon_t, \theta_i) = \epsilon_{2t},

where \epsilon_{2t} is an i.i.d. shock. We assume that the random components of current period returns are i.i.d. and normally distributed, that is, \epsilon_{lt} \sim N(0, \sigma_l^2), l = 1, 2.

The level of capital K_t evolves as follows. If the incumbent firm stays in, then

\ln K_{t+1} = b_0 + b_1 X_i^d + b_2 \ln K_t + u_{t+1},

where u_{t+1} \sim N(0, \sigma_u^2) and X_i^d is a firm-specific characteristic vector observable to the econometrician. In the simple specification without firm-specific heterogeneity, b_1 is set to zero. In the specification where we allow for heterogeneity, we set b_0 to zero. In the specification where we assume the capital transition for the incumbent who stays to be deterministic, we simply set K_{t+1} = K_t; in other words, b_0 = 0, b_1 = 0, b_2 = 1, and \sigma_u = 0. If the potential entrant enters, then

\ln K_{t+1} = b_e + u_{t+1}.

Now consider a firm that is an incumbent at the beginning of period t. Let V_I(K_t, \epsilon_t, \theta_i) be the value function of the incumbent with capital stock K_t and let V_O(0, \epsilon_t, \theta_i) be the value function of the potential entrant, who has capital stock 0. The Bellman equation for the optimal choice of the incumbent is

V_I(K_t, \epsilon_t, \theta_i) = \max\{V_I^{IN}(K_t, \epsilon_t, \theta_i), V_I^{OUT}(K_t, \epsilon_t, \theta_i)\},
where

V_I^{IN}(K_t, \epsilon_t, \theta_i) = R_I^{IN}(K_t, \epsilon_{1t}, \theta_i) + \beta E_{\epsilon_{t+1}} V_I(K_{t+1}(K_t, u_{t+1}, \theta_i), \epsilon_{t+1}, \theta_i)

is the value of staying in during period t. Similarly,

V_I^{OUT}(K_t, \epsilon_t, \theta_i) = R_I^{OUT}(K_t, \epsilon_{2t}, \theta_i) + \beta E_{\epsilon_{t+1}} V_O(0, \epsilon_{t+1}, \theta_i)

is the value of exiting during period t. The Bellman equation for the optimal choice of the potential entrant is

V_O(0, \epsilon_t, \theta_i) = \max\{V_O^{IN}(0, \epsilon_t, \theta_i), V_O^{OUT}(0, \epsilon_t, \theta_i)\},

where

V_O^{IN}(0, \epsilon_t, \theta_i) = R_O^{IN}(0, \epsilon_{1t}, \theta_i) + \beta E_{\epsilon_{t+1}} V_I(K_{t+1}(0, u_{t+1}, \theta_i), \epsilon_{t+1}, \theta_i)

is the value of entering during period t and

V_O^{OUT}(0, \epsilon_t, \theta_i) = R_O^{OUT}(0, \epsilon_{2t}, \theta_i) + \beta E_{\epsilon_{t+1}} V_O(0, \epsilon_{t+1}, \theta_i)

is the value of staying out during period t. Notice that the capital stock of a potential entrant is always 0.

Notice that if we assume \epsilon_{lt} to be extreme value distributed (see Rust (1987) for details on dynamic discrete choice models based on extreme value error terms), then the deterministic component of the value function can be expressed analytically, greatly simplifying the solution of the dynamic programming problem. To allow for correlation of the revenue function across different choices, one can adopt the random coefficient logit specification, where the random coefficient term is added to the per period revenue function. Then, in the basic model, the underlying latent per period revenue would be R_I^{IN}(K_t, \epsilon_{at}, \theta_i, \theta), where \epsilon_{at}, a \in A, is assumed to be i.i.d. extreme value distributed and the distribution of \theta_i is assumed to be G(d\theta_i; \theta). McFadden and Train (2000) showed that any choice probabilities can be approximated by the random coefficient multinomial logit model. Since \theta_i is only introduced to allow for correlation of the revenues across different choices, and not to add serial correlation of choices, we assume \theta_i to be i.i.d. over time as well. R_I^{OUT}(K_t, \epsilon_{at}, \theta_i, \theta), R_O^{IN}(K_t, \epsilon_{at}, \theta_i, \theta), and R_O^{OUT}(K_t, \epsilon_{at}, \theta_i, \theta) are similarly defined. To derive the expected value function at iteration t + 1, we would first draw \theta_m \sim G(d\theta_i; \theta), m = 1, \ldots, M, and then use the analytic formula for the expected value function proposed by Rust (1987) to evaluate
\hat{E}[V_I(K, \theta_m, \theta)] = \log\{\exp[R_I^{IN}(K, 0, \theta_m, \theta) + \beta \hat{E}^{(t)} V_I(K, \theta)] + \exp[R_I^{OUT}(K, 0, \theta_m, \theta) + \beta \hat{E}^{(t)} V_O(K, \theta)]\}.

These two steps are repeated to derive

\hat{E}^{(t+1)}[V_I(K, \theta) \mid K] = \frac{1}{M} \sum_{m=1}^{M} \hat{E}[V_I(K, \theta_m, \theta)].
Similarly, \hat{E}^{(t+1)}[V_O(K, \theta) \mid K] is computed. This algorithm involves Monte Carlo integration over \theta_m and thus is very similar to the DP step of the Bayesian DP algorithm for the probit case. Hence, our algorithm would apply straightforwardly to the case of mixed logit where the mixture distribution is continuous, and would result in a computational gain.

We now discuss estimation of the basic model. The parameter vector θ of the model is (δ, α, β, σ_1, σ_2, σ_u, b_0, b_2, b_e). The state variables are the capital stock K and the status of the firm Γ ∈ {I, O}, that is, whether the firm is an incumbent or a potential entrant. We assume that for each firm, we only observe the capital stock, the profit of the firm that stays in, and the entry–exit status over T^d periods. That is, we know {K^d_{it}, R^d_{it}, Γ^d_{it}}_{t=1,\ldots,T^d; i=1,\ldots,N^d}, where R^d_{it} ≡ α K^d_{it} + ε_{1t} if the firm stays in. We assume the prior distribution of all parameters to be diffuse; that is, we set π(θ) = 1. Below, we explain the estimation steps in detail.

Assume we start with the initial guess of expected value functions being zero, that is, \hat{E}^{(0)}[V_I(K, θ^{(0)})] = 0 and \hat{E}^{(0)}[V_O(0, θ^{(0)})] = 0. We employ the modified random walk M–H algorithm, where at iteration s the proposal density q(θ^{(s)}, θ^{*(s)}), given the iteration s parameter θ^{(s)}, is:21 δ^{*(s)} ∼ N(δ^{(s)}, σ_δ^2), α^{*(s)} ∼ N(α^{(s)}, σ_α^2), ln σ_1^{*(s)} ∼ N(ln σ_1^{(s)}, σ^2_{ln σ}), ln σ_2^{*(s)} ∼ N(ln σ_2^{(s)}, σ^2_{ln σ}), b_0^{*(s)} ∼ N(b_0^{(s)}, σ^2_{b_0}), b_2^{*(s)} ∼ N(b_2^{(s)}, σ^2_{b_2}), b_e^{*(s)} ∼ N(b_e^{(s)}, σ^2_{b_e}), and ln σ_u^{*(s)} ∼ N(ln σ_u^{(s)}, σ^2_{ln σ_u}); when we estimate β, β^{*(s)} ∼ N(β^{(s)}, σ_β^2). Given the parameters of iteration s, θ^{(s)} = (δ^{(s)}, α^{(s)}, σ_1^{(s)}, σ_2^{(s)}, b_0^{(s)}, b_2^{(s)}, b_e^{(s)}, σ_u^{(s)}), we draw the candidate parameter θ^{*(s)} from these normal densities.
21 The standard errors of the innovations of the random walk M–H are all set to be 0.004, except for β, for which it is set to be 0.001.
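A minimal Python sketch of this proposal step is given below (our illustration; the dictionary representation of θ and the helper name propose are assumptions, while the innovation standard errors 0.004 and 0.001 follow footnote 21). Standard-deviation parameters are perturbed on the log scale, which keeps the candidate values positive.

import numpy as np

rng = np.random.default_rng(1)

def propose(theta, step=0.004, step_beta=0.001, estimate_beta=False):
    """Random walk M-H proposal for the entry-exit model parameters."""
    cand = dict(theta)
    for name in ('delta', 'alpha', 'b0', 'b2', 'be'):
        cand[name] = theta[name] + step * rng.standard_normal()
    for name in ('sigma1', 'sigma2', 'sigma_u'):
        # log-scale random walk for standard deviations
        cand[name] = np.exp(np.log(theta[name]) + step * rng.standard_normal())
    if estimate_beta and 'beta' in theta:
        cand['beta'] = theta['beta'] + step_beta * rng.standard_normal()
    return cand

theta0 = dict(delta=0.4, alpha=0.2, sigma1=0.3, sigma2=0.3,
              sigma_u=0.4, b0=0.0, b2=0.4, be=0.5)
print(propose(theta0))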
Expected Value Function Iteration Step

We update the expected value function for parameters θ^{(s)} and θ^{*(s)}. First, we derive \hat{E}^{(s)}[V_Γ(K, θ) | H^{(s)}] for θ = θ^{(s)} and θ^{*(s)}, using the Gaussian kernel22

K_h(\theta - \theta') = (2\pi)^{-L/2} \prod_{j=1}^{J} h_j^{-1} \exp[-\tfrac{1}{2}((\theta_j - \theta'_j)/h_j)^2],

as follows:

\hat{E}^{(s)}[V_I(K, \theta) \mid H^{(s-1)}] \equiv \sum_{n=1}^{N(s)} \frac{1}{M} \sum_{j=1}^{M} V_I^{(s-n)}(K, \epsilon^{j(s-n)}, \theta^{*(s-n)}, H^{(s-n-1)}) \times \frac{K_h(\theta - \theta^{*(s-n)}) I_{L\theta}(\theta^{*(s-n)})}{\sum_{k=1}^{N(s)} K_h(\theta - \theta^{*(s-k)}) I_{L\theta}(\theta^{*(s-k)})},

\hat{E}^{(s)}[V_O(0, \theta) \mid H^{(s-1)}] \equiv \sum_{n=1}^{N(s)} \frac{1}{M} \sum_{j=1}^{M} V_O^{(s-n)}(0, \epsilon^{j(s-n)}, \theta^{*(s-n)}, H^{(s-n-1)}) \times \frac{K_h(\theta - \theta^{*(s-n)}) I_{L\theta}(\theta^{*(s-n)})}{\sum_{k=1}^{N(s)} K_h(\theta - \theta^{*(s-k)}) I_{L\theta}(\theta^{*(s-k)})}.
The expected value function is updated by taking the weighted average over the L iterations, out of the past N(s) iterations, in which the parameter vector θ^{*(l)} was closest to θ (we denote this as I_{Lθ}(θ^{*(l)}) = 1), where L is set to be 2000 and N(s) increases up to 3000. As discussed before, in principle, only one simulation of ε is needed during each solution-estimation iteration. But that requires the number of past iterations used for averaging, that is, N(s), to be large, which adds to the computational burden. Instead, in our example, we draw ε ten times and take an average. Hence, when we derive the expected value function, instead of averaging past value functions, we average over past average value functions, that is, (1/M') \sum_{m=1}^{M'} V_Γ(K, ε^{(j)}_m, θ^{(j)}), where M' = 10. This obviously increases the accuracy per iteration and reduces the need to have a large N(s). It is important to notice that as the algorithm proceeds, and t and N(t) become sufficiently large, the computational burden of our nonparametric approximation of the expected value functions could become greater than that of solving the DP problem.23

22 Kernel bandwidth h_j is set to be 0.02 for all j = 1, 2, \ldots, J.
23 We thank an anonymous referee for emphasizing this point.
In our examples, we have set the number of MCMC iterations and the maximum of N(t) arbitrarily, but at these values, the
algorithm is computationally much superior to the full-solution-based MCMC, without experiencing any noticeable loss in accuracy in posterior distribution estimation. To avoid this arbitrariness, and still avoid the computational burden of a large N(t), one could initially set N(t) at a fixed number \bar{N} and occasionally conduct one-step Bellman updates, using the expected value \hat{E}^{(t)}[V(s')] computed from the past values as the initial value for the DP iteration. If the newly iterated expected value function is sufficiently close to \hat{E}^{(t)}[V(s')], then there is no need for an increase in \bar{N}. Norets (2008) considered other ways to combine the Bayesian DP algorithm with standard DP steps to gain further computational efficiency.

To further integrate the value function over the capital shock u, we can use the Rust random grid integration method, which uses a fixed grid. Since the state space has only one dimension, we use equally spaced capital grid points K_m, m = 1, \ldots, M, and apply equation (7). That is, for the incumbent,

\hat{E}^{(s)}[V_I(K'(K^d_{it}, u, \theta), \theta) \mid K^d_{it}, H^{(s-1)}] = \sum_m \hat{E}^{(s)}[V_I(K_m, \theta) \mid H^{(s-1)}] \times \frac{[K_m \sigma_u]^{-1} \exp(-(\ln K_m - b_0 - b_2 \ln K^d_{it})^2 / (2\sigma_u^2))}{\sum_{l=1}^{M} [K_l \sigma_u]^{-1} \exp(-(\ln K_l - b_0 - b_2 \ln K^d_{it})^2 / (2\sigma_u^2))}
for θ = θ^{(s)}, θ^{*(s)}. For the entrant,

\hat{E}^{(s)}[V_I(K'(0, u, \theta), \theta) \mid H^{(s-1)}] = \sum_m \hat{E}^{(s)}[V_I(K_m, \theta) \mid H^{(s-1)}] \times \frac{[K_m \sigma_u]^{-1} \exp(-(\ln K_m - b_e)^2 / (2\sigma_u^2))}{\sum_{l=1}^{M} [K_l \sigma_u]^{-1} \exp(-(\ln K_l - b_e)^2 / (2\sigma_u^2))}

for θ = θ^{(s)}, θ^{*(s)}.
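The following Python sketch (ours; the grid, parameter values, and placeholder emax values are assumptions) shows this fixed-grid integration in code: the emax values at the capital grid points are weighted by the lognormal transition density of next-period capital and then summed.

import numpy as np

def integrate_emax_over_capital(emax_on_grid, K_grid, mean_log_K, sigma_u):
    """Weight emax values at fixed capital grid points by the lognormal
    density of next-period capital and return the weighted sum."""
    dens = (1.0 / (K_grid * sigma_u)) * np.exp(
        -((np.log(K_grid) - mean_log_K) ** 2) / (2.0 * sigma_u ** 2))
    weights = dens / dens.sum()
    return np.sum(weights * emax_on_grid)

# toy usage: incumbent with current capital K_it and transition
# ln K' = b0 + b2 ln K_it + u, u ~ N(0, sigma_u^2)
K_grid = np.linspace(0.5, 50.0, 200)
emax_on_grid = np.log(1.0 + K_grid)          # placeholder emax values
b0, b2, sigma_u, K_it = 0.0, 0.4, 0.4, 3.0
print(integrate_emax_over_capital(emax_on_grid, K_grid,
                                  b0 + b2 * np.log(K_it), sigma_u))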
Modified DP Step

We draw ε^{j(s)}_l ∼ N(0, σ_l^2), l = 1, 2; j = 1, \ldots, M, and compute

V_I^{IN}(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)}) = R_I^{IN}(K_m, \epsilon^{j(s)}_1, \theta^{*(s)}) + \beta \hat{E}^{(s)}[V_I(K'(K_m, u_{t+1}, \theta^{*(s)}), \theta^{*(s)}) \mid K_m, H^{(s-1)}],
V_I^{OUT}(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)}) = R_I^{OUT}(K_m, \epsilon^{j(s)}_2, \theta^{*(s)}) + \beta \hat{E}^{(s)}[V_O(0, \theta^{*(s)}) \mid H^{(s-1)}],

V_I(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)}) = \max\{V_I^{IN}(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)}),\ V_I^{OUT}(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)})\}

to derive

\frac{1}{M} \sum_{j=1}^{M} V_I^{(s)}(K_m, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)})

and

\frac{1}{M} \sum_{j=1}^{M} V_O^{(s)}(0, \epsilon^{j(s)}, \theta^{*(s)}, H^{(s-1)}).
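In code, the modified DP step is a single Bellman update on the current capital grid. The sketch below is a schematic version under assumed inputs (the emax arrays and parameter values are placeholders, not the authors' code): draw M shocks, compute the stay-in and exit values at each grid point, take the maximum, and average over the shock draws.

import numpy as np

def modified_dp_step(K_grid, theta, emax_in, emax_out, M=10, seed=0):
    """One Bellman update at the candidate parameter.

    emax_in[m] : discounted-expectation input for staying in at K_grid[m]
    emax_out   : discounted-expectation input for the potential entrant
    """
    rng = np.random.default_rng(seed)
    eps1 = rng.normal(0.0, theta['sigma1'], size=(M, 1))
    eps2 = rng.normal(0.0, theta['sigma2'], size=(M, 1))
    v_in = theta['alpha'] * K_grid[None, :] + eps1 + theta['beta'] * emax_in[None, :]
    v_out = eps2 + theta['beta'] * emax_out
    v_incumbent = np.maximum(v_in, v_out)      # shape (M, len(K_grid))
    return v_incumbent.mean(axis=0)            # average over the shock draws

theta = dict(alpha=0.2, sigma1=0.3, sigma2=0.3, beta=0.98)
K_grid = np.linspace(0.5, 50.0, 10)
print(modified_dp_step(K_grid, theta, emax_in=np.log(1 + K_grid), emax_out=0.5))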
Modified M–H Step

We draw the new parameter vector θ^{(s+1)} from the posterior distribution. Let

I_i = [I^d_{i1}(IN), \ldots, I^d_{it}(IN), \ldots, I^d_{iT^d}(IN)],

where I^d_{it}(IN) = 1 if the firm either enters or decides to stay in, and = 0 otherwise. Similarly, we use K_i and R_i to denote vectors of K^d_{it} and R^d_{it}. The likelihood increment for firm i at time t is (suppressing the superscript (s − 1) on H and denoting φ(·) to be the standard normal density for convenience)
L_i(I_i, K_i, R_i \mid \theta)
= \Pr\Big(\epsilon_{2t} \le R^d_{it} + \beta\{\hat{E}^{(s)}[V_I(K'(K^d_{it}, u, \theta), \theta) \mid K^d_{it}, H] - \hat{E}^{(s)}[V_O(0, \theta) \mid H]\}\Big)
  \times \frac{1}{\sigma_1}\phi\Big(\frac{R^d_{it} - \alpha K^d_{it}}{\sigma_1}\Big) \frac{1}{\sigma_u K^d_{it+1}}\phi\Big(\frac{\ln K^d_{it+1} - b_0 - b_2 \ln K^d_{it}}{\sigma_u}\Big) \times I^d_{it}(IN)\, I^d_{it+1}(IN)
+ \Pr\Big(\epsilon_{2t} - \epsilon_{1t} > \alpha K^d_{it} + \beta\{\hat{E}^{(s)}[V_I(K'(K^d_{it}, u, \theta), \theta) \mid K^d_{it}, H] - \hat{E}^{(s)}[V_O(0, \theta) \mid H]\}\Big) \times I^d_{it}(IN)\,(1 - I^d_{it+1}(IN))
+ \Pr\Big(\epsilon_{2t} - \epsilon_{1t} \le -\delta + \beta\{\hat{E}^{(s)}[V_I(K'(0, u, \theta), \theta) \mid H] - \hat{E}^{(s)}[V_O(0, \theta) \mid H]\}\Big)
  \times \frac{1}{\sigma_u K^d_{it+1}}\phi\Big(\frac{\ln K^d_{it+1} - b_e}{\sigma_u}\Big) \times (1 - I^d_{it}(IN))\, I^d_{it+1}(IN)
+ \Pr\Big(\epsilon_{2t} - \epsilon_{1t} > -\delta + \beta\{\hat{E}^{(s)}[V_I(K'(0, u, \theta), \theta) \mid H] - \hat{E}^{(s)}[V_O(0, \theta) \mid H]\}\Big) \times (1 - I^d_{it}(IN))(1 - I^d_{it+1}(IN)).
The algorithm sets θ^{(s+1)} = θ^{*(s)} with probability λ(θ^{(s)}, θ^{*(s)} | H^{(s-1)}), where the random walk proposal density satisfies q(θ^{*(s)}, θ^{(s)}) = q(θ^{(s)}, θ^{*(s)}). Thus,

\lambda(\theta^{(s)}, \theta^{*(s)} \mid H^{(s-1)}) = \min\left\{1,\ \frac{\pi(\theta^{*(s)}) \prod_i L_i(I_i, K_i, R_i \mid \theta^{*(s)})\, q(\theta^{*(s)}, \theta^{(s)})}{\pi(\theta^{(s)}) \prod_i L_i(I_i, K_i, R_i \mid \theta^{(s)})\, q(\theta^{(s)}, \theta^{*(s)})}\right\}
= \min\left\{1,\ \frac{\pi(\theta^{*(s)}) \prod_i L_i(I_i, K_i, R_i \mid \theta^{*(s)})}{\pi(\theta^{(s)}) \prod_i L_i(I_i, K_i, R_i \mid \theta^{(s)})}\right\}.
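Because the prior is flat and the random walk proposal is symmetric, the acceptance step reduces to a likelihood ratio. A schematic Python version (ours; loglik stands in for the likelihood increments defined above) is:

import numpy as np

def mh_accept(theta_cur, theta_cand, loglik, rng):
    """Accept theta_cand with probability min(1, L(cand)/L(cur)); with a flat
    prior and a symmetric proposal the ratio is just the likelihood ratio.
    loglik(theta) returns the sum over firms of the log likelihood increments."""
    log_ratio = loglik(theta_cand) - loglik(theta_cur)
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return theta_cand, True
    return theta_cur, False

# toy usage with a quadratic pseudo log likelihood
rng = np.random.default_rng(2)
loglik = lambda th: -0.5 * ((th - 0.4) / 0.05) ** 2
theta, accepted = mh_accept(0.40, 0.41, loglik, rng)
print(theta, accepted)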
Notice that if the firm stays out or exits, then its future capital stock is zero. Therefore, no averaging over capital grid points is required to derive the emax function, which is simply \hat{E}^{(s)}[V_O(0, \theta) \mid H^{(s-1)}]. In the next section, we present the results of several Monte Carlo studies we conducted using our Bayesian DP algorithm. The first experiment is for the model that incorporates observed and unobserved heterogeneity; the second incorporates deterministic capital transition.24

5. SIMULATION AND ESTIMATION

Denote the true values of θ by θ_0; that is, for the basic model, θ_0 = (δ_0, σ^0_1, σ^0_2, σ^0_u, α_0, b^0_0, b^0_2, b^0_e, β_0). We set them as δ_0 = 0.4, σ^0_1 = 0.3, σ^0_2 = 0.3, σ^0_u = 0.4, α_0 = 0.1, b^0_0 = 0.0, b^0_2 = 0.4, b^0_e = 0.5, and β_0 = 0.98. We first solve the DP problem numerically using the conventional full-solution method described earlier in detail. Next, we generate artificial data based on this DP solution. All estimation exercises are done on a 2.8 GHz Pentium 4 Linux workstation. For data generation, we solved the DP problem, where during each iteration we set capital grid points M_K = 200 to be equally
spaced between 0 and \bar{K}, which we set to be 50. We draw M = 1000 revenue shocks ε_m, m = 1, \ldots, M, and calculate the value function V(K_i, ε_m). Then we compute the expected value function as their sample average,

\hat{E}[V(K_i)] = \sum_{m=1}^{M} V(K_i, \epsilon_m)/1000.

24 The results of the experiment, where we estimate the basic model (without heterogeneity), are shown in Appendix A, Table AI.
We simulate artificial data of capital stock, profit, and entry–exit choice sequences {K^d_{it}, R^d_{it}, I^d_{it}}_{i=1,\ldots,N^d; t=1,\ldots,T^d} using this DP solution. We then estimate the model using the simulated data with our Bayesian DP routine. We either set the discount factor at the true value β_0 = 0.98 or estimate it, but with its prior being π(β) ∼ N(\bar{β}, σ_β) if β ≤ \bar{β} and π(β) = 0 otherwise, where σ_β = 0.2 and \bar{β} = 0.995. We imposed restrictions on the prior to guarantee that β < 1 so that the Bellman operator is a contraction mapping.25 Next, we report results of the experiments mentioned above.26

5.1. Experiment 1: Random Effects

We now report estimation results of a model that includes observed and unobserved heterogeneity. For data generation, we assume that the profit coefficient for each firm i, α_i, is distributed normally with mean α = 0.2 and standard error σ_α = 0.1. For the transition of capital, we simulate X^d_i from N(0.0, 1.0) and set b_1 = 0.1. All other parameters are set at the true values given by the vector θ_0. Notice that if we use conventional simulated ML estimation to estimate the model, for each firm i, we need to draw α_i many times, say M_α times, and for each draw, we need to solve the DP problem. If the number of firms in the data is N^d, then for a single simulated likelihood evaluation, we need to solve the DP problem N^d M_α times. This process is computationally demanding and most researchers use only a finite number of types, typically less than 10, as an approximation of the observed heterogeneity and the random effect.27 Since in our Bayesian DP estimation exercise the computational burden of estimating the dynamic model is similar to that of a static model, we can easily accommodate random effects estimation.

25 Notice that in DDC models, the discount factor is nonparametrically unidentified (see Rust (1994) and Magnac and Thesmar (2002)). Hence, estimation of the discount factor relies on functional form assumptions.
26 The results reported in Appendix A, Table AI, show that the full-solution-based ML outperforms the Bayesian DP in the basic model (without heterogeneity).
27 The only exceptions are economists who have access to supercomputers or large PC clusters. Bound, Stinebrickner, and Waidmann (2007) used interpolation methods to evaluate the expected value functions where the unobserved health status was continuous. They used PC clusters for their estimation.
In contrast to the solution-estimation algorithm of the basic model, we iterate the Bellman operator once for each firm i separately. Let θ_{-α} be the parameter vector except for the random effects term α_i. Then, for any given K, we derive

\hat{E}^{(s)}[V_\Gamma(K, \theta_{-\alpha}, \alpha_i) \mid H^{(s-1)}] = \sum_{n=1}^{N(s)} \frac{1}{M} \sum_{l=1}^{M} V_\Gamma^{(s-n)}(K, \epsilon^{l(s-n)}, \theta_{-\alpha}^{*(s-n)}, \alpha_i^{*(s-n)}, H^{(s-n-1)}) \times \frac{K_h(\theta_{-\alpha} - \theta_{-\alpha}^{*(s-n)})\, K_h(\alpha_i - \alpha_i^{*(s-n)})}{\sum_{k=1}^{N(s)} K_h(\theta_{-\alpha} - \theta_{-\alpha}^{*(s-k)})\, K_h(\alpha_i - \alpha_i^{*(s-k)})}.
As pointed out by Heckman (1981) and others, the missing initial state vector is likely to be correlated with the unobserved heterogeneity αi , which would result in bias in the parameter estimates. To deal with this problem, for each firm i, given parameters (θ−α αi ), we simulate the model for 100 initial periods to derive the initial capital and the initial status of the firm. Then we proceed to construct the likelihood increment for firm i. We set N(s) to go up to 1000 iterations. The one-step Bellman operator for each firm i is the part where we have an increase in computational burden, but it turns out that the additional burden is far lighter than that of computing the fixed point of the Bellman operator for each firm Mα times to integrate out the random effects αi , as would be done in the simulated ML estimation strategy. We set the sample size to be 100 firms for 100 periods. All priors are diffuse. We set the initial guess of the expected value function to be 0. The Bayesian DP iteration was conducted 10000 times. We only use the draws from the 5001st iteration up to the 10000th iteration to derive the posterior means and standard deviations. We conducted 50 replications in this experiment. Table I reports the average of the posterior means (PM), the posterior standard deviations (PSD), and the standard deviation of the posterior means (sd(PM)) for the 50 replications. There are three sets of results. To obtain the first and second set of results (Bayesian DP 1, Bayesian DP 2, respectively), we set the initial parameter values equal to the true ones. We fix the discount factor β in Bayesian DP 1, while we estimate it in Bayesian DP 2. To obtain the third set of results (Bayesian DP 3), we fix β at the true value and set the other initial parameter values to be half of the true ones. All these results show that the posterior means are very close to—indeed within 1 standard deviation of—the true parameter values. In particular, the results presented in Bayesian DP 3
TABLE I
POSTERIOR MEANS AND STANDARD DEVIATIONSa

              Bayesian DP 1                   Bayesian DP 2                   Bayesian DP 3
        PM       PSD      sd(PM)        PM       PSD      sd(PM)        PM       PSD      sd(PM)      True
δ     0.4005   0.0162    0.0224       0.3957   0.0165    0.0239       0.4011   0.0163    0.0226       0.4
α     0.2013   0.0104    0.0104       0.2012   0.0105    0.0103       0.2013   0.0104    0.0105       0.2
σα    0.1006   0.00736   0.00655      0.1005   0.00736   0.00651      0.1006   0.00735   0.00656      0.1
σ1    0.3005   0.00284   0.00261      0.3006   0.00292   0.00271      0.3005   0.00286   0.00264      0.3
σ2    0.3034   0.0113    0.0184       0.2995   0.0124    0.0208       0.3036   0.0116    0.0190       0.3
b1    0.0993   0.00481   0.00425      0.0994   0.00490   0.00434      0.0993   0.00482   0.00420      0.1
b2    0.3975   0.00943   0.00941      0.3977   0.00952   0.00924      0.3975   0.00943   0.00939      0.4
be    0.4954   0.0125    0.0146       0.4954   0.0127    0.0149       0.4954   0.0126    0.0146       0.5
σu    0.4014   0.00314   0.00316      0.4014   0.00318   0.00312      0.4014   0.00314   0.00318      0.4
β                                     0.9689   0.0109    0.0170                                       0.98
CPU          4 h 0 min                       3 h 58 min                       4 h 0 min

a PM is the average of the posterior means across 50 replications; PSD is the average of the posterior standard deviations across 50 replications; sd(PM) is the standard deviation of the posterior means across 50 replications.
confirm the robustness of the Bayesian DP algorithm to the initial parameter values.28

On the other hand, there is a fairly large bias in the parameters estimated by simulated ML with M_α = 100. The point estimate of the entry cost parameter δ = 0.3795, the mean of the profit coefficient α = 0.1701 and its standard error σ_α = 0.09326, and the standard error of the choice shock σ_2 = 0.2805 are all downwardly biased, and except for σ_α, the magnitude of the bias is larger than the standard error.29 The downward bias seems to be especially large for α, which leads us to conclude that the simulation size of M_α = 100 is not enough to integrate out the unobserved heterogeneity sufficiently accurately.

The CPU time required for the Bayesian DP algorithm is about 4 hours, whereas for the full-solution-based Bayesian MCMC estimation, we needed about 31 hours, and for the full-solution-based ML estimation, 21 hours. That is, the Bayesian DP is about 8 times as fast as the full-solution-based Bayesian MCMC algorithm and about 5 times as fast as the simulated ML algorithm.30

28 The standard deviations of the posterior means across the 50 replications, sd(PM), reflect the data uncertainty of the 50 simulation-estimation exercises. Note that they are very close to the mean of the posterior standard deviations, PSD. Thus, the standard deviation of the Bayesian DP draws captures the data uncertainty well.
29 These values are the averages of 10 simulation-estimation exercises. The detailed results are shown in Tables AIII and AIV of Appendix A.
30 When we solve for the DP problem, both for the full-solution-based Bayesian estimation and the simulated ML estimation (details in Appendix A), we set M = 100. If we were to set M = 1000, then a single Newton iteration would take about 4 hours and 20 minutes, which is about the same CPU time as required for the entire Bayesian DP algorithm.
We also tried to reduce the computational time for the full-solution-based ML algorithm by reducing the number of draws for α_i from 100 to 20. Then the CPU time fell to 8 hours and 43 minutes, which is still about twice as much time as required for the Bayesian DP algorithm. However, the point estimate of α is 0.145, with a larger downward bias than the estimate with M_α = 100. If we were to try to reduce the bias of the full-solution-based ML method by increasing the simulation size for the unobserved heterogeneity from M_α = 100 to, say, M_α = 1000, then the CPU time would be at least 200 hours. We also tried the ML estimation where the simulation size for ε draws is reduced from 100 to 20 while keeping M_α = 100. The parameter estimates and their standard errors are very similar to those with 100 draws. However, the total CPU time of the ML estimation with 20 draws is 18 hours and 15 minutes, hardly different from that of the original 100 draws.

Another estimation strategy for the simulated ML could be to expand the state variables of the DP problem to include both X and α_i. Then we have to assign grid points for the three-dimensional state space points (K, X, α_i). If we assign 100 grid points per dimension, then we end up having 10000 times more grid points than before. Hence, the overall computational burden would be quite similar to the previous simulated ML estimation strategy. Thus, our Bayesian DP algorithm outperforms the full-solution-based conventional methods when the model includes observed and unobserved heterogeneity. Furthermore, the computational advantage of the Bayesian DP algorithm over the conventional full-solution-based Bayesian or ML estimation grows as the discount factor becomes closer to 1. Ching, Imai, Ishihara, and Jain (2009) showed that while the time required to estimate the model under full-solution-based conventional methods becomes twice as much when β changes from 0.6 to 0.8, and becomes 20 times as much when β changes from 0.6 to 0.98, there is no difference in the overall computational performance of the Bayesian DP algorithm. This is because in full-solution-based algorithms, the closer the discount factor is to 1, the more time is required for the DP algorithm to converge. However, in our algorithm, the DP iteration is done only once during each parameter estimation step, regardless of the value of the discount factor.

5.2. Experiment 2: Deterministic Transition

At iteration t we use K^{(t)}_1, \ldots, K^{(t)}_{M_K} as grid points. We set M_K = 10; hence the total number of grid points increases over iterations up to M_K × N(t) = 10 × 1000 = 10000. The formula for the expected value function for the incumbent who stays in is

\hat{E}^{(t)}[V_I(K, \theta) \mid H^{(t-1)}] \equiv \sum_{n=1}^{N(t)} \sum_{m=1}^{M_K} \frac{1}{M} \sum_{j=1}^{M} V_I^{(t-n)}(K^{(t-n)}_m, \epsilon^{(t-n)}_j, \theta^{*(t-n)}, H^{(t-n-1)}) \times \frac{K_{h_K}(K - K^{(t-n)}_m)\, K_{h_\theta}(\theta - \theta^{*(t-n)})}{\sum_{k=1}^{N(t)} \sum_{m=1}^{M_K} K_{h_K}(K - K^{(t-k)}_m)\, K_{h_\theta}(\theta - \theta^{*(t-k)})},

where K_{h_K} is the kernel for the capital stock with bandwidth h_K. The formulas for the expected value functions for the incumbent who exits and the potential entrant who stays out or enters are the same as those in the basic model.
TABLE II
POSTERIOR MEANS AND STANDARD DEVIATIONSa

Parameter       PM        PSD       sd(PM)     True Value
δ             0.1905    0.0124     0.0175        0.2
α             0.1019    0.00477    0.00428       0.1
σ1            0.3980    0.00506    0.00532       0.4
σ2            0.3961    0.0126     0.0187        0.4
b1            0.2004    0.00466    0.00401       0.2
σu            0.2000    0.00322    0.00366       0.2

Sample size    10000
CPU time       48 min 3 s

a PM is the average of the posterior means across 50 replications; PSD is the average of the posterior standard deviations across 50 replications; sd(PM) is the standard deviation of the posterior means across 50 replications.
Table II shows the posterior means and the posterior standard deviations of the parameter estimates. They are the averages over 50 replications of the simulation-estimation exercise. We can see that the parameter estimates are close to the true values. We can also see that the posterior standard deviations closely reflect the data uncertainty. The entire exercise took about 48 minutes.

6. CONCLUSION

We have proposed a Bayesian estimation algorithm where the infinite horizon DP problem is solved and parameters are estimated at the same time. This dramatically increases the speed of estimation, particularly in models with observed and unobserved heterogeneity. We have demonstrated the effectiveness of our approach by estimating a simple infinite horizon, dynamic model of entry–exit choice. We find that the computational time required for estimating this dynamic model is in line with the time required for Bayesian estimation of static models. The additional computational cost of our algorithm is the cost of using information obtained in past iterations. Our Monte Carlo experiments show that the more complex a model becomes, the smaller is this cost relative to the cost of computing the full solution.

We have also shown that our algorithm may help reduce the computational burden when the dimension of the state space is high. As is well known, the
computational burden increases exponentially with an increase in the dimension of the state space. In our algorithm, even though at each iteration, the number of state space points on which we calculate the expected value function is small, the total number of “effective” state space points over the entire solution-estimation iteration grows with the number of Bayesian DP iterations. This number can be made arbitrarily large without much additional computational cost, and it is the total number of effective state space points that determines accuracy. This explains why our nonparametric approximation of the expected value function works well under the assumption of a continuous state space with deterministic transition function of the state variable. In this case, as is discussed in the main body of the paper, the Rust random grid method may face computational difficulties. It is worth mentioning that since we are locally approximating the expected value function nonparametrically, as we increase the number of parameters, we may face the curse of dimensionality in terms of the number of parameters to be estimated. So far, in our examples, this issue does not seem to have made a difference. The reason could be that most dynamic models specify per period return function and transition functions to be smooth and well behaved. Hence, we know in advance that the value functions we need to approximate are smooth and, hence, are well suited for nonparametric approximation. Furthermore, the simulation exercises show that with a reasonably large sample size, the MCMC simulations are tightly centered around the posterior mean. Hence, the actual multidimensional area where we need to apply nonparametric approximation is small. But in empirical exercises that involve many more parameters, one probably needs to adopt an iterative MCMC strategy where only up to four or five parameters are moved at once, which is also commonly done in conventional ML estimation. REFERENCES ACKERBERG, D. (2004): “A New Use of Importance Sampling to Reduce Computational Burden in Simulation Estimation,” Unpublished Manuscript, University of Arizona. [1866] AGUIRREGABIRIA, V., AND P. MIRA (2002): “Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models,” Econometrica, 70, 1519–1543. [1866] (2007): “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75, 1–53. [1866] ALBERT, J., AND S. CHIB (1993): “Bayesian Analysis of Binary and Polychotomous Data,” Journal of the American Statistical Association, 88, 669–679. [1880] ARCIDIACONO, P., AND J. B. JONES (2003): “Finite Mixture Distributions, Sequential Likelihood and the EM Algorithm,” Econometrica, 71, 933–946. [1866] ARCIDIACONO, P., AND R. MILLER (2009): “CCP Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity,” Unpublished Manuscript, Duke University. [1866] BERTSEKAS, D. P., AND J. TSITSIKLIS (1996): Neuro-Dynamic Programming. Cambridge, MA: Athena Scientific Press. [1879,1880] BOUND, J., T. STINEBRICKNER, AND T. WAIDMANN (2007): “Health, Economic Resources and the Work Decisions of Older Men,” Working Paper 13657, NBER. [1892]
BROWN, M., AND C. FLINN (2006): “Investment in Child Quality Over Marital States,” Unpublished Manuscript, University of Wisconsin. [1868] CHIB, S., AND E. GREENBERG (1996): “Markov Chain Monte Carlo Simulation Methods in Econometrics,” Econometric Theory, 12, 409–431. [1880] CHING, A., S. IMAI, M. ISHIHARA, AND N. JAIN (2009): “A Practitioner’s Guide to Bayesian Estimation of Discrete Choice Dynamic Programming Models,” Working Paper, Rotman School of Management, University of Toronto. [1895] ERDEM, T., AND M. P. KEANE (1996): “Decision Making Under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets,” Marketing Science, 15, 1–20. [1865] FERRALL, C. (2005): “Solving Finite Mixture Models: Efficient Computation in Economics Under Serial and Parallel Execution,” Computational Economics, 25, 343–379. [1866] GEWEKE, J., AND M. P. KEANE (2000): “Bayesian Inference for Dynamic Discrete Choice Models Without the Need for Dynamic Programming,” in Simulation Based Inference and Econometrics: Methods and Applications, ed. by R. Mariano, T. Schuermann, and M. J. Weeks. Cambridge, MA: Cambridge University Press. [1868] HECKMAN, J. J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process,” in Structural Analysis of Discrete Data With Econometric Applications, ed. by C. F. Manski and D. McFadden. Cambridge, MA: MIT Press, 179–195. [1893] HECKMAN, J. J., AND B. SINGER (1984): “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52, 271–320. [1868] HOTZ, J. V., AND R. MILLER (1993): “Conditional Choice Probabilities and the Estimation of Dynamic Models,” Review of Economic Studies, 60, 497–529. [1866] HOTZ, J. V., R. A. MILLER, S. SANDERS, AND J. SMITH (1994): “A Simulation Estimator for Dynamic Models of Discrete Choice,” Review of Economic Studies, 61, 265–289. [1866] HOUSER, D. (2003): “Bayesian Analysis of Dynamic Stochastic Model of Labor Supply and Savings,” Journal of Econometrics, 113, 289–335. [1868] IMAI, S., AND M. P. KEANE (2004): “Intertemporal Labor Supply and Human Capital Accumulation,” International Economic Review, 45, 601–641. [1884] IMAI, S., AND K. KRISHNA (2004): “Employment, Deterrence and Crime in a Dynamic Model,” International Economic Review, 45, 845–872. [1865] IMAI, S., N. JAIN, AND A. CHING (2009): “Supplement to ‘Bayesian Estimation of Dynamic Discrete Choice Models’,” Econometrica Supplemental Material, 77, http://www. econometricsociety.org/ecta/Supmat/5658_proofs.pdf; http://www.econometricsociety.org/ecta/ Supmat/5658_data and programs.zip. [1869] KEANE, M. P., AND K. I. WOLPIN (1994): “The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence,” The Review of Economics and Statistics, 76, 648–672. [1884] (1997): “The Career Decisions of Young Men,” Journal of Political Economy, 105, 473–521. [1865] LANCASTER, T. (1997): “Exact Structural Inference in Optimal Job Search Models,” Journal of Business & Economic Statistics, 15, 165–179. [1868] MAGNAC, T., AND D. THESMAR (2002): “Identifying Dynamic Decision Processes,” Econometrica, 70, 801–816. [1892] MCCULLOCH, R., AND P. ROSSI (1994): “An Exact Likelihood Analysis of the Multinomial Probit Model,” Journal of Econometrics, 64, 207–240. [1868,1876,1880] MCFADDEN, D., AND K. 
TRAIN (2000): “Mixed MNL Models for Discrete Response,” Journal of Applied Econometrics, 15, 447–470. [1886] NORETS, A. (2007): “Inference in Dynamic Discrete Choice Models With Serially Correlated Unobserved State Variables,” Unpublished Manuscript, University of Iowa. [1868,1875,1876, 1881]
(2008): “Implementation of Bayesian Inference in Dynamic Discrete Choice Models,” Unpublished Manuscript, Princeton University. [1889] OSBORNE, M. J. (2007): “Consumer Learning, Switching Costs, and Heterogeneity: A Structural Estimation,” Discussion Paper EAG 07-10, Dept. of Justice, Antitrust Division, EAG. [1868, 1875,1876,1881] PAKES, A., AND P. MCGUIRE (2001): “Stochastic Algorithms, Symmetric Markov Perfect Equilibrium, and the ‘Curse’ of Dimensionality,” Econometrica, 69, 1261–1281. [1867,1875,1879] ROBERT, C. P., AND G. CASELLA (2004): Monte Carlo Statistical Methods. Springer Texts in Statistics. New York: Springer-Verlag. [1873] ROBERTS, M., AND J. TYBOUT (1997): “The Decision to Export in Columbia: An Empirical Model of Entry With Sunk Costs,” American Economic Review, 87, 545–564. [1884] RUST, J. (1987): “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher,” Econometrica, 55, 999–1033. [1865,1886,1887] (1994): “Structural Estimation of Markov Decision Processes,” in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam, Netherlands: Elsevier, 3082–3139. [1892] (1997): “Using Randomization to Break the Curse of Dimensionality,” Econometrica, 65, 487–516. [1867,1882] TANNER, M. A., AND W. H. WONG (1987): “The Calculation of Posterior Distributions by Data Augmentation,” Journal of the American Statistical Association, 82, 528–550. [1873] TIERNEY, L. (1994): “Markov Chains for Exploring Posterior Distributions,” The Annals of Statistics, 22, 1701–1762. [1873]
Dept. of Economics, Queen’s University, 233 Dunning Hall, 94 University Avenue, Kingston, ON K7L 5M2, Canada;
[email protected], Dept. of Economics, City University London, Northampton Square, London EC1V 0HB, U.K. and Dept. of Economics, Northern Illinois University, 508 Zulauf Hall, DeKalb, IL 60115, U.S.A.;
[email protected], and Rotman School of Management, University of Toronto, 105 St. George Street, Toronto, ON M5S 3E6, Canada;
[email protected]. Manuscript received January, 2005; final revision received May, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1901–1948
STRUCTURAL NONPARAMETRIC COINTEGRATING REGRESSION

BY QIYING WANG AND PETER C. B. PHILLIPS1

Nonparametric estimation of a structural cointegrating regression model is studied. As in the standard linear cointegrating regression model, the regressor and the dependent variable are jointly dependent and contemporaneously correlated. In nonparametric estimation problems, joint dependence is known to be a major complication that affects identification, induces bias in conventional kernel estimates, and frequently leads to ill-posed inverse problems. In functional cointegrating regressions where the regressor is an integrated or near-integrated time series, it is shown here that inverse and ill-posed inverse problems do not arise. Instead, simple nonparametric kernel estimation of a structural nonparametric cointegrating regression is consistent and the limit distribution theory is mixed normal, giving straightforward asymptotics that are useable in practical work. It is further shown that use of augmented regression, as is common in linear cointegration modeling to address endogeneity, does not lead to bias reduction in nonparametric regression, but there is an asymptotic gain in variance reduction. The results provide a convenient basis for inference in structural nonparametric regression with nonstationary time series when there is a single integrated or near-integrated regressor. The methods may be applied to a range of empirical models where functional estimation of cointegrating relations is required.

KEYWORDS: Brownian local time, cointegration, functional regression, Gaussian process, integrated process, kernel estimate, near integration, nonlinear functional, nonparametric regression, structural estimation, unit root.
1. INTRODUCTION

A GOOD DEAL OF RECENT ATTENTION in econometrics has focused on functional estimation in structural econometric models and the inverse problems to which they frequently give rise. A leading example is a structural nonlinear regression where the functional form is the object of primary interest. In such systems, identification and estimation are typically much more challenging than in linear systems because they involve the inversion of integral operator equations which may be ill-posed in the sense that the solutions may not exist, may not be unique, and may not be continuous. Some recent contributions to this field include Newey, Powell, and Vella (1999), Newey and Powell (2003), Ai and Chen (2003), Florens (2003), and Hall and Horowitz (2005). Overviews of the ill-posed inverse literature are given in Florens (2003) and Carrasco, Florens, and Renault (2007). All of this literature has focused on microeconometric and stationary time series settings.

In linear structural systems, problems of inversion from the reduced form are much simpler, and conditions for identification and consistent estimation techniques have been extensively studied. Under linearity, it is also well known

1 The authors thank a co-editor and two referees for helpful comments. Wang acknowledges partial research support from the Australian Research Council. Phillips acknowledges partial research support from a Kelly Fellowship and NSF Grant SES 06-47086.
that the presence of nonstationary regressors can provide a simplification. In particular, for cointegrated systems involving time series with unit roots, structural relations are actually present in the reduced form (and therefore always identified) because of the unit roots in a subset of the determining equations. In fact, such models can always be written in error correction or reduced rank regression format where the structural relations are immediately evident. The present paper shows that nonstationarity leads to major simplifications in the context of structural nonlinear functional regression. The primary simplification arises because in nonlinear models with endogenous nonstationary regressors, there is no ill-posed inverse problem. In fact, there is no inverse problem at all in the functional treatment of such systems. Furthermore, identification does not require the existence of instrumental variables that are orthogonal to the equation errors. Finally, and perhaps most importantly for practical work, consistent estimation may be accomplished using standard kernel regression techniques, and inference may be conducted in the usual way and is valid asymptotically under simple regularity conditions. These results for kernel regression in structural nonlinear models of cointegration open up new possibilities for empirical research. The reason why there is no inverse problem in structural nonlinear nonstationary systems can be explained heuristically as follows. In a nonparametric structural setting, it is conventional to impose on the disturbances a zero conditional mean condition given certain instruments, so as to assist in identifying an infinite-dimensional function. Such conditions lead to an integral equation involving the conditional probability distribution of the regressors and the structural function integrated over the space of the regressor. This equation describes the relation between the structure and reduced form, and its solution, if it exists and is unique, delivers the unknown structural function. But when the endogenous regressor is nonstationary, there is no invariant probability distribution of the regressor, only the local time density of the limiting stochastic process corresponding to a standardized version of the regressor as it sojourns in the neighborhood of a particular spatial value. Accordingly, there is no integral equation relating the structure to the reduced form. In fact, the structural equation itself is locally also a reduced form equation in the neighborhood of this spatial value, for when an endogenous regressor is in the locality of a specific value, the systematic part of the structural equation depends on that specific value and the equation is effectively a reduced form. What is required is that the nonstationary regressor spends enough time in the vicinity of a point in the space to ensure consistent estimation. This in turn requires recurrence, so that the local time of the limit process corresponding to the time series is positive. In addition, the random wandering nature of a stochastically nonstationary regressor such as a unit root process ensures that the regressor inevitably departs from any particular locality and thereby assists in tracing out (and identifying) the structural function over a wide domain. The process is similar to the manner in which instruments may shift the location in
which a structural function is observed and in doing so assist in the process of identification when the data are stationary. Linear cointegrating systems reveal a strong form of this property. As mentioned above, in linear cointegration the inverse problem disappears completely because the structural relations continue to be present in the reduced form. Indeed, they are the same as reduced form equations up to simple time shifts, which are of no importance in linear long run relations. In nonlinear structural cointegration, the same behavior applies locally in the vicinity of a particular spatial value, thereby giving local identification of the structural function and facilitating estimation. In linear cointegration, the signal strength of a nonstationary regressor ensures that least squares estimation is consistent, although the estimates are well known to have second order bias (Phillips and Durlauf (1986), Stock (1987)) and are therefore seldom used in practical work. Much attention has therefore been given in the time series literature to the development of econometric estimation methods that remove the second order bias and are asymptotically and semiparametrically efficient. In nonlinear structural functional estimation with a single nonstationary regressor, this paper shows that local kernel regression methods are consistent and that under some regularity conditions they are also asymptotically mixed normally distributed, so that conventional approaches to inference are possible. It is not necessary to use special methods or even an augmented regression equation where the cointegrating model is adjusted for the conditional mean to account for endogeneity, such as the augmented regressions that underlie semiparametric methods like FM regression or dynamic least squares in linear cointegrating models. These results constitute a major simplification in the functional treatment of nonlinear cointegrated systems and they directly open up empirical applications with existing methods. In related recent work, Karlsen, Myklebust, and Tjøstheim (2007) and Schienle (2008) used Markov chain methods to develop an asymptotic theory of kernel regression that allows for some forms of nonstationarity and endogeneity in the regressor. Schienle also considered additive nonparametric models with many nonstationary regressors and smooth backfitting methods of estimation. The results in the current paper are obtained using local time convergence techniques, extending those in Wang and Phillips (2009) to the endogenous regressor case and allowing for both integrated and near-integrated regressors with general forms of serial dependence in the generating mechanism and equilibrium error. The validity of the limit theory in the case of near-integrated regressors is important in practice because it is often convenient in empirical work not to insist on unit roots and to allow for roots near unity in the regressors. By contrast, conventional methods of estimation and inference in parametric models of linear cointegration are known to break down when the regressors have roots local to unity.
The paper is organized as follows. Section 2 introduces the model and assumptions. Section 3 provides the main results on the consistency and limit distribution of the kernel estimator in a structural model of nonlinear cointegration and associated methods of inference. Section 4 reports some Monte Carlo simulations that explore the finite sample performance of the kernel estimator and the effects of augmented regression specification. Section 5 concludes and outlines ways in which the present paper may be extended. Proofs and various subsidiary technical results are given in Sections 6–9, which function as appendices to the paper.

2. MODEL AND ASSUMPTIONS

We consider the nonlinear structural model of cointegration

(2.1)   y_t = f(x_t) + u_t    (t = 1, 2, \ldots, n),
where u_t is a zero mean stationary equilibrium error, x_t is a jointly dependent nonstationary regressor, and f is an unknown function to be estimated with the observed data \{y_t, x_t\}_{t=1}^{n}. The conventional kernel estimate of f(x) in model (2.1) is given by

(2.2)   \hat{f}(x) = \frac{\sum_{t=1}^{n} y_t K_h(x_t - x)}{\sum_{t=1}^{n} K_h(x_t - x)},
where K_h(s) = (1/h)K(s/h), K(x) is a nonnegative real function, and the bandwidth parameter h ≡ h_n → 0 as n → ∞. The limit behavior of \hat{f}(x) has been investigated in past work in some special situations, notably where the error process u_t is a martingale difference sequence and there is no contemporaneous correlation between x_t and u_t. These are strong conditions, they are particularly restrictive in relation to the conventional linear cointegrating regression framework, and they are unlikely to be satisfied in econometric applications. However, they do facilitate the development of a limit theory by various methods. In particular, Karlsen, Myklebust, and Tjøstheim (2007) investigated \hat{f}(x) in the situation where x_t is a recurrent Markov chain, allowing for some dependence between x_t and u_t. Under similar conditions and using related Markov chain methods, Schienle (2008) investigated additive nonlinear versions of (2.1) and obtained a limit theory for nonparametric regressions under smooth backfitting. Wang and Phillips (2009, hereafter WP) considered an alternative treatment by making use of local time limit theory and, instead of recurrent Markov chains, worked with partial sum representations of the type x_t = \sum_{j=1}^{t} \xi_j, where ξ_j is a general linear process.
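For readers who wish to compute (2.2) directly, the short Python sketch below gives a minimal implementation with a Gaussian kernel on simulated data (our illustration; the data generating process and the bandwidth choice h = n^{-1/6}, motivated by the discussion in Remark A below, are assumptions rather than prescriptions from the paper).

import numpy as np

def f_hat(x_eval, x, y, h):
    """Nadaraya-Watson kernel estimate of f at the points x_eval, as in (2.2),
    with K a standard normal kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x[None, :] - x_eval[:, None]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

# toy cointegrating data: x_t a random walk, y_t = f(x_t) + u_t
rng = np.random.default_rng(0)
n = 2000
x = np.cumsum(rng.standard_normal(n))
u = 0.2 * rng.standard_normal(n)
y = 1.0 / (1.0 + np.abs(x)) + u                  # f(x) = 1/(1+|x|)
grid = np.linspace(-5, 5, 11)
print(np.round(f_hat(grid, x, y, h=n ** (-1 / 6)), 3))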
These authors showed that the limit theory for \hat{f}(x) has links to traditional nonparametric asymptotics for stationary models with exogenous regressors even though the rates of convergence are different and typically slower when x_t is nonstationary and the limit theory is mixed normal rather than normal. In extending this work, it seems particularly important to relax conditions of independence and permit joint determination of x_t and y_t, and to allow for serial dependence in the equilibrium errors u_t and the innovations driving x_t, so that the system is a time series structural model. The goal of the present paper is to do so and to develop a limit theory for structural functional estimation in the context of nonstationary time series that is more in line with the type of assumptions made for parametric linear cointegrated systems.

Throughout the paper we let \{\epsilon_t\}_{t \ge 1} be a sequence of independent and identically distributed (i.i.d.) continuous random variables with E\epsilon_1 = 0 and E\epsilon_1^2 = 1, and with the characteristic function \varphi(t) of \epsilon_1 satisfying \int_{-\infty}^{\infty} |\varphi(t)|\, dt < \infty. The sequence \{\epsilon_t\}_{t \ge 1} is assumed to be independent of another i.i.d. random sequence \{\lambda_t\}_{t \ge 1} that enters into the generating mechanism for the equilibrium errors. These two sequences comprise the innovations that drive the time series structure of the model. We use the following assumptions in the asymptotic development.

ASSUMPTION 1: x_t = \rho x_{t-1} + \eta_t, where x_0 = 0, \rho = 1 + \kappa/n with \kappa being a constant, and \eta_t = \sum_{k=0}^{\infty} \phi_k \epsilon_{t-k} with \phi \equiv \sum_{k=0}^{\infty} \phi_k \neq 0 and \sum_{k=0}^{\infty} |\phi_k| < \infty.

ASSUMPTION 2: u_t = u(\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-m_0+1}, \lambda_t, \lambda_{t-1}, \ldots, \lambda_{t-m_0+1}) satisfies Eu_t = 0 and Eu_t^4 < \infty for t \ge m_0, where u(x_1, \ldots, x_{m_0}, y_1, \ldots, y_{m_0}) is a real measurable function on R^{2m_0}. We define u_t = 0 for 1 \le t \le m_0 - 1.

ASSUMPTION 3: K(x) is a nonnegative bounded continuous function satisfying \int K(x)\, dx < \infty and \int |\hat{K}(x)|\, dx < \infty, where \hat{K}(x) = \int e^{ixt} K(t)\, dt.

ASSUMPTION 4: For given x, there exists a real function f_1(s, x) and a 0 < \gamma \le 1 such that, when h is sufficiently small, |f(hy + x) - f(x)| \le h^{\gamma} f_1(y, x) for all y \in R and \int_{-\infty}^{\infty} K(s) f_1(s, x)\, ds < \infty.

Assumption 1 allows for both a unit root (κ = 0) and a near unit root (κ ≠ 0) regressor by virtue of the localizing coefficient κ, and is standard in the near-integrated regression framework (Chan and Wei (1987), Phillips (1987, 1988)). The regressor x_t is then a triangular array formed from a (weighted) partial sum of linear process innovations that satisfy a simple summability condition with long run moving average coefficient φ ≠ 0. We remark that in the cointegrating framework, it is conventional to set κ = 0 so that the regressor is integrated and this turns out to be important in inference. Indeed, in linear parametric cointegration, it is well known (e.g., Elliott (1998)) that near integration (κ ≠ 0) leads to failure of standard cointegration estimation and test
procedures. As shown here, no such failures occur under near integration in the nonparametric regression context.

Assumption 2 allows the equation error u_t to be serially dependent and cross-correlated with x_s for |t − s| < m_0, thereby inducing endogeneity in the regressor. As a consequence, we may have cov(u_t, x_t) ≠ 0. This makes the model in the current paper essentially different from the one investigated in Theorem 3.2 of WP. WP imposed the condition that x_t is adapted to F_{t−1}, where (u_t, F_t) forms a martingale difference. Hence, under the conditions of WP, one always has cov(u_t, x_t) = E[x_t E(u_t | F_{t−1})] = 0. This difference explains why the proof of the main result in the current paper is so different from that in WP. In WP, we could use a general martingale central limit theorem (CLT) result, but such an approach is not possible in the current framework because the sample covariance function is not a martingale. In the asymptotic development below, the lag parameter m_0 in Assumption 2 is assumed to be finite, but this could likely be relaxed under some additional conditions and with greater complexity in the proofs, although that is not done here. It is not necessary for u_t to depend on λ_s, in which case there would be only a single innovation sequence. However, in most practical cases involving cointegration between two variables, we can expect that there will be two innovation sequences. While u_t is stationary in Assumption 2, we later discuss some nonstationary cases where the conditional variance of u_t may depend on x_t. Note also that Assumption 2 allows for a nonlinear generating mechanism for the equilibrium error u_t. This seems appropriate in a context where the regression function itself is allowed to take a general nonlinear form.

Assumption 3 places stronger conditions on the kernel function than are usual in kernel estimation, requiring that the Fourier transform of K(x) is integrable. This condition is needed for technical reasons in the proofs and is clearly satisfied for many commonly used kernels, like the normal kernel or kernels that have a compact support. Assumption 4, which was used in WP, is quite weak and can be verified for various kernels K(x) and regression functions f(x). For instance, if K(x) is a standard normal kernel or has a compact support, a wide range of regression functions f(x) are included. Thus, commonly occurring functions like f(x) = |x|^β and f(x) = 1/(1 + |x|^β) for some β > 0 satisfy Assumption 4 with γ = min{β, 1}. When γ = 1, stronger smoothness conditions on f(x) can be used to assist in developing analytic forms for the asymptotic bias function in kernel estimation.

3. MAIN RESULT AND OUTLINE OF THE PROOF

The limit theory for the conventional kernel regression estimate \hat{f}(x) under random normalization turns out to be very simple and is given in the following theorem.
THEOREM 3.1: For any h satisfying nh2 → ∞ and h → 0, (3.1)
fˆ(x) →p f (x)
Furthermore, for any h satisfying nh2 → ∞ and nh2(1+2γ) → 0, n 1/2 h (3.2) Kh (xt − x) (fˆ(x) − f (x)) →D N(0 σ 2 ) t=1
where σ 2 = E(u2m0 )
∞ −∞
K 2 (s) ds/
∞ −∞
K(x) dx.
REMARK A: The result (3.1) implies that fˆ(x) is a consistent estimate of f (x). Furthermore, as in WP, we may show that
√ fˆ(x) − f (x) = oP an hγ + ( nh)−1/2 (3.3) where γ is defined as in Assumption 4 and an diverges to infinity as slowly as required. This indicates that a possible “optimal” bandwidth h which yields the best rate in (3.3) or the minimal E(fˆ(x) − f (x))2 at least for general γ satisfies
√ h∗ ∼ a arg min hγ + ( nh)−1/2 ∼ a n−1/[2(1+2γ)] h
where a and a are positive constants. In the most common case that γ = 1, this result suggests a possible optimal bandwidth to be h∗ ∼ a n−1/6 , so that h = o(n−1/6 ) ensures undersmoothing. This is different from nonparametric regression with a stationary regressor, which typically requires h = o(n−1/5 ) for undersmoothing. Under stronger smoothness conditions on f (x), it is possible to develop an explicit expression for the bias function and the weaker condition h = o(n−1/10 ) applies for undersmoothing. Some further discussion and results are given in Remark C and Section 9. REMARK B: To outline the essentials of the argument in the proof of Theorem 3.1, we split the error of estimation fˆ(x) − f (x) as n
(3.4)
fˆ(x) − f (x) =
ut K[(xt − x)/ h]
t=1 n
K[(xt − x)/ h]
t=1 n [f (xt ) − f (x)]K[(xt − x)/ h]
+
t=1 n t=1
K[(xt − x)/ h]
1908
Q. WANG AND P. C. B. PHILLIPS
√ The result (3.3), which implies (3.1) by letting an = min{h−γ ( nh)1/2 }, will follow if we prove n
√
ut K[(xt − x)/ h] = OP ( nh)1/2
(3.5)
Θ1n :=
(3.6)
n √ Θ2n := [f (xt ) − f (x)]K[(xt − x)/ h] = OP { nh1+γ }
t=1
t=1
and if, for any an diverging to infinity as slowly as required, (3.7)
Θ3n := 1
n
√ K[(xt − x)/ h] = oP {an /( nh)}
t=1
On the other hand, it is readily seen that h
n
1/2 (fˆ(x) − f (x))
Kh (xt − x)
t=1 n
ut K[(xt − x)/ h]
+ Θ2n Θ3n =
n K[(xt − x)/ h] t=1
t=1
√ By virtue of (3.6) and (3.7) with an = (nh2+4γ )−1/8 , we obtain Θ2n Θ3n →P 0, since nh2+4γ → 0. The stated result (3.2) will then follow if we prove [nt] n (nh2 )−1/4 (3.8) uk K[(xk − x)/ h] (nh2 )−1/2 K[(xk − x)/ h] k=1
k=1
→D {d0 NL (t 0) d1 L(1 0)} ∞ ∞ on D[0 1]2 , where d02 = |φ|−1 E(u2m0 ) −∞ K 2 (s) dt, d1 = |φ|−1 −∞ K(s) ds, L(t 0) is the local time process at the origin of the Gaussian diffusion process {Jκ (t)}t≥0 defined by t (3.9) e(t−s)κ W (s) ds Jκ (t) = W (t) + κ 1/2
0
and {W (t)}t≥0 being a standard Brownian motion, and where N is a standard normal variate independent of L(t 0). The local time process L(t a) is defined
STRUCTURAL NONPARAMETRIC COINTEGRATING REGRESSION
1909
by (3.10)
1 L(t a) = lim →0 2
t
I |Jκ (r) − a| ≤ dr
0
Indeed, since P(L(1, 0) > 0) = 1, the required result (3.2) follows by (3.8) and the continuous mapping theorem. It remains to prove (3.5)–(3.8), which are established in Section 6. As for (3.8), it is clearly sufficient for the required result to show that the finite-dimensional distributions converge in (3.8).

REMARK C: Results (3.2) and (3.8) show that fˆ(x) has an asymptotic distribution that is mixed normal and that this limit theory holds even in the presence of an endogenous regressor. The mixing variate in the limit distribution depends on the local time process L(1, 0), as follows from (3.8). Explicitly,

(3.11)  (nh^2)^{1/4} \bigl( \hat f(x) - f(x) \bigr) \to_D d_0 d_1^{-1}\, N\, L^{-1/2}(1, 0)

whenever nh² → ∞ and nh^{2(1+2γ)} → 0. Again, this is different from nonparametric regression with a stationary regressor. As noticed in WP, in the nonstationary case, the amount of time spent by the process around any particular spatial point is of order √n rather than n, so that the corresponding convergence rate in such regressions is now √(√n h) = (nh²)^{1/4}, which requires that nh² → ∞. In effect, the local sample size is √n h in nonstationary regression involving integrated processes, rather than nh as in the case of stationary regression. The condition that nh^{2(1+2γ)} → 0 is required to remove bias. This condition can be further relaxed if we add stronger smoothness conditions on f(x) and incorporate an explicit bias term in (3.11). A full development requires further conditions and a very detailed analysis, which we defer to later work. In the simplest case where κ = 0, u_t is a martingale difference sequence with E(u_t²) = σ_u², u_t is independent of x_t, K satisfies ∫K(y) dy = 1, ∫yK(y) dy = 0 and has compact support, and f has continuous, bounded third derivatives, it is shown in Section 9 that
(3.12)  (nh^2)^{1/4} \Bigl( \hat f(x) - f(x) - \frac{h^2}{2} f''(x) \int_{-\infty}^{\infty} y^2 K(y)\, dy \Bigr) \Rightarrow \frac{ N\bigl( 0,\; \sigma_u^2 \int_{-\infty}^{\infty} K^2(s)\, ds \bigr) }{ L(1, 0)^{1/2} },

provided nh^{14} → 0 and nh² → ∞. Importantly, there is no linear term in the bias function appearing in (3.12), in contrast to the limit theory for local level nonparametric regression for stationary time series.
REMARK D: As is clear from the second member of (3.8), the signal strength in the present kernel regression is O(Σ_{k=1}^{n} K[(x_k − x)/h]) = O(√n h), which gives the local sample size in this case, so consistency requires that the bandwidth h does not pass to zero too fast (viz., nh² → ∞). On the other hand, when h tends to zero slowly, estimation bias is manifest even in very large samples. Some illustrative simulations are reported in the next section.

REMARK E: The limiting variance of the (randomly normalized) kernel estimator in (3.2) is simply a scalar multiple of the variance of the equilibrium error, namely Eu²_{m_0}, rather than a conditional variance that depends on x_t ∼ x, as is commonly the case in kernel regression theory for stationary time series. This difference is explained by the fact that, under Assumption 2, u_t is stationary and, even though u_t is correlated with the shocks ε_t, ..., ε_{t−m_0+1} involved in generating the regressor x_t, the variation of u_t when x_t ∼ x is still measured by Eu²_{m_0} in the limit theory. If Assumption 2 is relaxed to allow for some explicit nonstationarity in the conditional variance of u_t, then this may impact the limit theory. The manner in which the limit theory is affected depends on the form of the conditional variance function. For instance, suppose the equilibrium error is ũ_t = g(x_t)u_t, where u_t satisfies Assumption 2 and is independent of x_t, and where g is a positive continuous function (e.g., g(x) = 1/(1 + |x|^α) for some α > 0). In this case, under some additional regularity conditions, modifications to the arguments given in Proposition 7.2 show that the variance of the limit distribution is now given by σ²(x) = E(u²_{m_0}) g(x)² ∫_{−∞}^{∞} K²(s) ds / ∫_{−∞}^{∞} K(x) dx. The limiting variance of the kernel estimator is then simply a scalar multiple of the variance of the equilibrium error, where the scalar depends on g(x).

REMARK F: Theorem 3.1 gives a pointwise result at the value x, while the process x_t itself is recurrent and wanders over the whole real line. For fixed points x ≠ x′, the kernel cross-product satisfies
(3.13)  \frac{1}{\sqrt{n}\, h} \sum_{t=1}^{n} K\Bigl( \frac{x_t - x}{h} \Bigr) K\Bigl( \frac{x_t - x'}{h} \Bigr) = o_p(1) \quad \text{for } x \ne x'.

To show (3.13), note that if x_t/√t has a bounded density h_t(y), as in WP, we have

E\Bigl\{ K\Bigl( \frac{x_t - x}{h} \Bigr) K\Bigl( \frac{x_t - x'}{h} \Bigr) \Bigr\} = \int K\Bigl( \frac{\sqrt{t}\,y - x}{h} \Bigr) K\Bigl( \frac{\sqrt{t}\,y - x'}{h} \Bigr) h_t(y)\, dy
  = h\, t^{-1/2} \int K(y)\, K\Bigl( y + \frac{x - x'}{h} \Bigr) h_t\Bigl( \frac{yh + x}{\sqrt{t}} \Bigr) dy
  \sim h\, t^{-1/2}\, h_t(0) \int K(y)\, K\Bigl( y + \frac{x - x'}{h} \Bigr) dy = o\bigl( h\, t^{-1/2} \bigr)

whenever x ≠ x′, h → 0, and t → ∞. Then

\frac{1}{\sqrt{n}\, h} \sum_{t=1}^{n} K\Bigl( \frac{x_t - x}{h} \Bigr) K\Bigl( \frac{x_t - x'}{h} \Bigr) = o_p\Bigl( \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \frac{1}{t^{1/2}} \Bigr) = o_p(1).

This result and Theorem 2.1 of WP give

\frac{1}{\sqrt{n}\, h} \sum_{t=1}^{n}
\begin{pmatrix}
K^2\bigl( \frac{x_t - x}{h} \bigr) & K\bigl( \frac{x_t - x}{h} \bigr) K\bigl( \frac{x_t - x'}{h} \bigr) \\
K\bigl( \frac{x_t - x}{h} \bigr) K\bigl( \frac{x_t - x'}{h} \bigr) & K^2\bigl( \frac{x_t - x'}{h} \bigr)
\end{pmatrix}
\Rightarrow L(1, 0)
\begin{pmatrix}
\int K(s)^2\, ds & 0 \\
0 & \int K(s)^2\, ds
\end{pmatrix}.
Following the same line of argument as in the proof of Theorem 3.2 of WP, it follows that in the special case where u_t is a martingale difference sequence independent of x_t, the regression ordinates (fˆ(x), fˆ(x′)) have a mixed normal limit distribution with diagonal covariance matrix. The ordinates are then asymptotically conditionally independent given the local time L(1, 0). Extension of this theory to the general case where u_t and x_t are dependent involves more complex limit theory and is left for later work.

REMARK G: The error variance term Eu²_{m_0} in the limit distribution (3.2) may be estimated by a localized version of the usual residual based method. Indeed, by letting

(3.14)  \hat\sigma_n^2 = \frac{\sum_{t=1}^{n} [y_t - \hat f(x)]^2\, K_h(x_t - x)}{\sum_{t=1}^{n} K_h(x_t - x)},
we have the following theorem under minor additional conditions.
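A minimal computational sketch of the localized residual variance estimator in (3.14) is given below. It reuses kernel weights of the kind computed for fˆ(x) and assumes fˆ(x) has already been obtained (for instance with the illustrative helper sketched after Theorem 3.1); the function name is hypothetical.

```python
import numpy as np

def local_error_variance(y, x, x0, h, f_hat_x0):
    """Localized residual variance sigma_hat_n^2 at the point x0, as in (3.14)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    w = K((x - x0) / h) / h              # K_h(x_t - x0)
    resid2 = (y - f_hat_x0) ** 2         # squared residuals localized at x0
    return np.dot(w, resid2) / w.sum()
```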
THEOREM 3.2: In addition to Assumptions 1–4, suppose that Eu^{8}_{m_0} < ∞ and ∫_{−∞}^{∞} K(s) f_1^2(s, x) ds < ∞ for given x. Then, for any h satisfying nh² → ∞ and h → 0,

(3.15)  \hat\sigma_n^2 \to_p Eu^2_{m_0}.

Furthermore, for any h satisfying nh² → ∞ and nh^{2(1+γ)} → 0,

(3.16)  (nh^2)^{1/4} \bigl( \hat\sigma_n^2 - Eu^2_{m_0} \bigr) \to_D \sigma_1\, N\, L^{-1/2}(1, 0),

where N and L(1, 0) are defined as in (3.8) and

\sigma_1^2 = E\bigl( u^2_{m_0} - Eu^2_{m_0} \bigr)^2 \int_{-\infty}^{\infty} K^2(s)\, ds \Big/ \int_{-\infty}^{\infty} K(x)\, dx.
While the estimator σ̂_n² is constructed from the regression residuals y_t − fˆ(x), it is also localized at x because of the action of the kernel function K_h(x_t − x) in (3.14). Note, however, that in the present case the limit theory for σ̂_n² is not localized at x. In particular, the limit of σ̂_n² is the unconditional variance Eu²_{m_0}, not a conditional variance, and the limit distribution of σ̂_n² given in (3.16) depends only on the local time L(1, 0) of the limit process at the origin, not on the precise value of x. The explanation is that conditioning on the neighborhood x_t ∼ x is equivalent to x_t/√n ∼ x/√n or x_t/√n ∼ 0, which translates into the local time of the limit process of x_t at the origin, irrespective of the given value of x. For the same reason, as discussed in Remark E above, the limit distribution of the kernel regression estimator given in (3.2) depends on the variance Eu²_{m_0}. However, as discussed in Remark E, in the more general context where there is nonstationary conditional heterogeneity, the limit of σ̂_n² may be correspondingly affected. For instance, in the case considered there where ũ_t = g(x_t)u_t, u_t satisfies Assumption 2, and g is a positive continuous function, we find that σ̂_n² →_p Eu²_{m_0} g(x)².

REMARK H: In parametric cointegrating regression, techniques such as FM regression (Phillips and Hansen (1990)) have been developed to eliminate the second order bias effects in the limit theory that arise from endogeneity, thereby improving upon simple least squares regression. These techniques typically augment the regression equation to address the effects of endogeneity by adjusting for the (long run) conditional mean. Interestingly, there is no need to augment the regression equation in this way to achieve bias reduction in nonparametric cointegrating regression. To illustrate, we take the case where the regressor follows the simple model x_t = x_{t−1} + ε_t and E(u_t | ε_t) = λε_t. The augmented regression equation is, say,

(3.17)  y_t = f(x_t) + E(u_t | \varepsilon_t) + \bigl( u_t - E(u_t | \varepsilon_t) \bigr) = f(x_t) + \lambda\, \Delta x_t + u_{yt},
which is the nonparametric analogue of the augmented regression used in linear cointegration models. From observations on x_t and the residuals û_t = y_t − fˆ(x_t), where fˆ is defined by (2.2), the first stage nonparametric estimate of f in model (2.1), the least squares estimate of λ is given by

\hat\lambda = \frac{\sum_{t=1}^{n} [y_t - \hat f(x_t)]\, \Delta x_t}{\sum_{t=1}^{n} (\Delta x_t)^2}.
Using λ̂ in place of λ in (3.17) and conventional kernel regression to estimate f in this equation, we have the following nonparametric augmented regression estimate of f:

(3.18)  \hat f_a(x) = \frac{\sum_{t=1}^{n} (y_t - \hat\lambda\, \Delta x_t)\, K_h(x_t - x)}{\sum_{t=1}^{n} K_h(x_t - x)}.
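To make the two-step construction concrete, here is a hedged sketch of the feasible augmented estimator: the first stage computes fˆ(x_t) at the sample points, λ̂ is obtained by least squares of the first-stage residuals on Δx_t, and the second stage reruns the kernel regression on y_t − λ̂Δx_t, following (3.17)–(3.18). The Gaussian kernel and the function names are illustrative; nothing beyond the displayed formulas is prescribed by the paper.

```python
import numpy as np

def augmented_kernel_regression(y, x, grid, h):
    """Feasible augmented estimate f_hat_a on a grid, following (3.17)-(3.18)."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # illustrative kernel

    def nw(resp, points):
        # Local level estimate of the regression of resp on x, at each point.
        out = np.empty(len(points))
        for i, x0 in enumerate(points):
            w = K((x - x0) / h)
            out[i] = np.dot(w, resp) / w.sum()
        return out

    # First stage: residuals y_t - f_hat(x_t) and least squares estimate of lambda.
    dx = np.diff(x, prepend=0.0)                 # Delta x_t = eps_t (taking x_0 = 0)
    lam_hat = np.dot(y - nw(y, x), dx) / np.dot(dx, dx)
    # Second stage: kernel regression of y_t - lam_hat * Delta x_t on x_t.
    return nw(y - lam_hat * dx, grid), lam_hat
```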
We now show that the limit distribution of fˆ_a(x) is the same as that of fˆ(x), except for a scale variance effect that arises from the adjusted error term u_{yt} in (3.17). Hence, there is no bias reduction in the use of the augmented regression equation (3.17), unlike linear parametric cointegration. Indeed, we have the following theorem.

THEOREM 3.3: In addition to Assumptions 1–4, assume that E(u²_{m_0} ε²_{m_0}) < ∞, K(x) has compact support, and f(x) satisfies |f(x) − f(y)| ≤ C|x − y| whenever |x − y| is sufficiently small. Then, for any h satisfying nh² → ∞ and h → 0,

(3.19)  \hat\lambda \to_p \lambda.

Furthermore, for any h satisfying h n^{1/2+δ_0} → ∞ and nh⁶ → 0, where 0 < δ_0 < 1,

(3.20)  \Bigl( h \sum_{t=1}^{n} K_h(x_t - x) \Bigr)^{1/2} \bigl( \hat f_a(x) - f(x) \bigr) \to_D N(0, \sigma_a^2),

where σ_a² = E(u²_{y m_0}) \int_{-\infty}^{\infty} K^2(s)\, ds \Big/ \int_{-\infty}^{\infty} K(x)\, dx.
Thus, the effect of estimating the augmented regression equation in (3.17) is to deliver a scale variance reduction that corresponds to E(u²_{yt}) ≤ E(u²_t) in
the limit theory for the estimator fˆ_a(x). The variance reduction results from the inclusion of the stationary regressor Δx_t in (3.17). There is no bias reduction, unlike the case of linear parametric cointegration. The same result and the same limit theory apply for the infeasible kernel estimate

(3.21)  \tilde f_a(x) = \frac{\sum_{t=1}^{n} (y_t - \lambda\, \Delta x_t)\, K[(x_t - x)/h]}{\sum_{t=1}^{n} K[(x_t - x)/h]},
where λ is assumed known.

4. SIMULATIONS

This section reports the results of a simulation experiment investigating the finite sample performance of the kernel regression estimator. The generating mechanism follows (2.1) and has the explicit form

y_t = f(x_t) + \sigma u_t, \quad \Delta x_t = \varepsilon_t, \quad u_t = (\lambda_t + \theta\varepsilon_t)/(1 + \theta^2)^{1/2},

where (ε_t, λ_t) are i.i.d. N(0, I_2) and x_0 = 0. The following two regression functions were used in the simulations:

f_A(x) = \sum_{j=1}^{\infty} \frac{(-1)^{j+1} \sin(j\pi x)}{j^2}, \qquad f_B(x) = x^3.
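The sketch below reproduces this generating mechanism (random walk regressor, endogenous error with correlation parameter θ) so the bandwidth comparisons reported next can be replicated; the Epanechnikov kernel and the truncation of f_A at j = 4 follow the text, while the function names and structure are illustrative choices.

```python
import numpy as np

def f_A(x):
    # Truncated at j = 4 for computation, as described in the text.
    j = np.arange(1, 5)
    return np.sum(((-1.0) ** (j + 1)) * np.sin(np.outer(x, j) * np.pi) / j**2, axis=1)

def simulate(n=500, theta=2.0, sigma=0.2, f=f_A, seed=0):
    """One draw of y_t = f(x_t) + sigma*u_t with Delta x_t = eps_t, x_0 = 0, and
    u_t = (lambda_t + theta*eps_t)/sqrt(1 + theta^2), (eps_t, lambda_t) ~ N(0, I_2)."""
    rng = np.random.default_rng(seed)
    eps, lam = rng.standard_normal(n), rng.standard_normal(n)
    x = np.cumsum(eps)                                    # integrated regressor
    u = (lam + theta * eps) / np.sqrt(1.0 + theta**2)     # corr(u_t, eps_t) = theta/sqrt(1+theta^2)
    return x, f(x) + sigma * u

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)
```

Note that corr(u_t, ε_t) = θ/√(1 + θ²), which reproduces the weak (θ = 0.2) and strong (θ = 2.0) endogeneity settings described below.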
The first function corresponds (up to a scale factor) to the function used in Hall and Horowitz (2005) and is truncated at j = 4 for computation. Figures 1 and 2 graph these functions (the solid lines) and the mean simulated kernel estimates (broken lines) over the intervals [0, 1] and [−1, 1] for kernel estimates of f_A and f_B, respectively. Bias, variance, and mean squared error for the estimates were computed on the grid of values {x = 0.01k; k = 0, 1, ..., 100} for [0, 1] and {x = −1 + 0.02k; k = 0, 1, ..., 100} for [−1, 1], based on 20,000 replications. Simulations were performed for θ = 0.2 (weak endogeneity, corr(u_t, ε_t) = 0.2) and θ = 2.0 (strong endogeneity, corr(u_t, ε_t) = 0.9), for σ = 0.2, and for the sample size n = 500. An Epanechnikov kernel was used with bandwidths h = n^{−10/18}, n^{−1/2}, n^{−1/3}, n^{−1/5}. Table I shows the performance of the regression estimates fˆ, fˆ_a, and f̃_a computed for various bandwidths, two of which (n^{−1/3}, n^{−1/5}) satisfy and two of which (n^{−10/18}, n^{−1/2}) violate the condition nh² → ∞ of Theorem 3.1, thereby showing the effects of bandwidth on estimation. While smaller bandwidths may
FIGURE 1.—Graphs over the interval [0, 1] of f_A(x) and Monte Carlo estimates of E(fˆ_A(x)) for h = n^{−1/2} (short dashes), h = n^{−1/3} (dotted), and h = n^{−1/5} (long dashes) with θ = 2, σ = 0.2, and n = 500.
reduce bias, when h → 0 so fast that the condition nh² → ∞ fails, the "signal" Σ_{k=1}^{n} K[(x_k − x)/h] no longer diverges and the estimate fˆ is inconsistent (see (3.8)). Also, since x_t is recurrent and wanders over the real line, some simulations are inevitably thin in subsets of the chosen domains (as in the simulation design
FIGURE 2.—Graphs over the interval [0, 1] of f_A(x) (solid line), the Monte Carlo estimate of E(fˆ_A(x)) for h = n^{−1/3} (short dashes), and 95% estimation bands (dotted) with θ = 2, σ = 0.2, and n = 500.
TABLE I

SUMMARY COMPARISONS (AVERAGED OVER THE GRIDS DESCRIBED IN THE TEXT) OF LOCAL LEVEL, FEASIBLE AUGMENTED, AND INFEASIBLE AUGMENTED NONPARAMETRIC ESTIMATES

Model A: f_A(x) = Σ_{j=1}^{4} (−1)^{j+1} sin(jπx)/j²

                         fˆ(x)                      f̃_a(x)                     fˆ_a(x)
 θ     h           Bias    Std    MSE        Bias    Std    MSE        Bias    Std    MSE
 2     n^{-10/18} -0.005  0.129  0.017      0.000  0.064  0.004     -0.001  0.069  0.005
       n^{-1/2}    0.000  0.126  0.016      0.001  0.063  0.004      0.001  0.070  0.005
       n^{-1/3}    0.011  0.122  0.016      0.015  0.076  0.008      0.014  0.083  0.009
       n^{-1/5}    0.067  0.141  0.032      0.073  0.114  0.027      0.071  0.117  0.027
 0.2   n^{-10/18} -0.001  0.135  0.018     -0.000  0.133  0.018     -0.000  0.133  0.018
       n^{-1/2}    0.001  0.128  0.017      0.001  0.126  0.016      0.002  0.126  0.016
       n^{-1/3}    0.015  0.122  0.016      0.016  0.120  0.015      0.016  0.121  0.016
       n^{-1/5}    0.072  0.141  0.033      0.073  0.139  0.033      0.072  0.140  0.033

Model B: f_B(x) = x³

 2     n^{-10/18}  0.000  0.125  0.164      0.000  0.058  0.004     -0.003  0.153  0.028
       n^{-1/2}    0.001  0.119  0.015      0.000  0.056  0.003     -0.000  0.171  0.030
       n^{-1/3}    0.000  0.104  0.011      0.000  0.055  0.003     -0.001  0.371  0.138
       n^{-1/5}    0.000  0.105  0.011      0.004  0.069  0.007     -0.002  0.491  0.244
 0.2   n^{-10/18}  0.001  0.127  0.017      0.001  0.124  0.016     -0.002  0.188  0.039
       n^{-1/2}   -0.001  0.121  0.015     -0.001  0.119  0.014     -0.002  0.203  0.042
       n^{-1/3}    0.000  0.104  0.011      0.000  0.102  0.010     -0.001  0.381  0.146
       n^{-1/5}    0.001  0.102  0.012      0.000  0.102  0.012     -0.002  0.497  0.249
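For completeness, a hedged sketch of how summary statistics of the kind shown in Table I can be regenerated from the simulation pieces above: it averages bias, standard deviation, and MSE of the local level estimate over the evaluation grid across replications. The replication count is reduced here purely for illustration; the paper uses 20,000.

```python
import numpy as np

def table_row(f, grid, h, theta=2.0, sigma=0.2, n=500, reps=200, seed=1):
    """Grid-averaged bias, std, and MSE of the local level kernel estimate."""
    rng = np.random.default_rng(seed)
    K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)    # Epanechnikov kernel
    est = np.empty((reps, len(grid)))
    for r in range(reps):
        eps, lam = rng.standard_normal(n), rng.standard_normal(n)
        x = np.cumsum(eps)
        y = f(x) + sigma * (lam + theta * eps) / np.sqrt(1.0 + theta**2)
        for i, x0 in enumerate(grid):
            w = K((x - x0) / h)
            s = w.sum()
            est[r, i] = np.dot(w, y) / s if s > 0 else np.nan   # thin local samples can occur
    bias = np.nanmean(est, axis=0) - f(grid)
    std = np.nanstd(est, axis=0)
    return np.nanmean(bias), np.nanmean(std), np.nanmean(bias**2 + std**2)
```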
here) and this inevitably affects performance due to the small "local" sample size. The results in Table I show that in both models the degree of endogeneity (θ) in the regressor has a negligible effect on the properties of the kernel regression estimate fˆ, although estimation bias does increase in Model A when the bandwidth is h = n^{−1/5}, which is the conventional rate for stationary series. For both models, finite sample performance of fˆ in terms of mean squared error (MSE) seems to be optimized for h around n^{−1/3}. Table I also enables a comparison between the local level estimate fˆ and the feasible and infeasible augmented regression estimates fˆ_a(x) and f̃_a(x). The infeasible estimate has uniformly smaller variance than fˆ(x), as may be expected from asymptotic theory, and it also has smaller variance than fˆ_a(x), resulting from the finite sample effects of estimating λ in the latter. For Model A and the case of strong endogeneity, the variance of the feasible estimate fˆ_a(x) is considerably smaller than that of fˆ(x), so the feasible procedure has a clear advantage in this case. But these gains do not appear under weak endogeneity or under either weak or strong endogeneity for
Model B. The variance of the feasible estimate fˆ_a(x) is, in fact, much larger than that of fˆ(x) for Model B. Both fˆ_a(x) and f̃_a(x) display negligible bias, just as fˆ(x), but there is no apparent gain in terms of bias reduction from the use of the augmented regression estimates, including the infeasible estimate. These results indicate that feasible nonparametric estimation of the augmented regression equation (3.17), leading to fˆ_a(x), does not dominate the simple nonparametric regression estimate fˆ(x) in terms of bias. Neither does fˆ_a(x) always dominate fˆ(x) in terms of variance in finite samples, even though asymptotic theory suggests an improvement and the equation error in (3.17) has smaller variance than that of (2.1). This outcome contrasts with linear parametric cointegrating regression, where it is generally beneficial—and near universal empirical practice—to fit the augmented regression equation. Figures 1 and 2 show results for the Monte Carlo approximations to E(fˆ_A(x)) and E(fˆ_B(x)) corresponding to bandwidths h = n^{−1/2} (broken line), h = n^{−1/3} (dotted line), and h = n^{−1/5} (dashed and dotted line) for θ = 2. Figures 3 and 4 show the Monte Carlo approximations to E(fˆ_A(x)) and E(fˆ_B(x)) together with a 95% pointwise "estimation band." As in Hall and Horowitz (2005), these bands connect points f(x_j) ± δ_j, where each δ_j is chosen so that the interval [f(x_j) − δ_j, f(x_j) + δ_j] contains 95% of the 10,000 simulated values of fˆ(x_j) for Models A and B, respectively. Apparently, the bands can be wide, reflecting the slower rate of convergence of the kernel estimate fˆ(x) in the nonstationary case. In particular, since x_t spends only √n of its time in the neighborhood of any specific point, the effective sample
FIGURE 3.—Graphs of f_B(x) and Monte Carlo estimates of E(fˆ_B(x)) for h = n^{−1/2} (short dashes), h = n^{−1/3} (dotted), and h = n^{−1/5} (long dashes) with θ = 2, σ = 0.2, and n = 500.
FIGURE 4.—Graphs of estimation bands for f_B(x) (solid line), the Monte Carlo estimate of E(fˆ_B(x)) for h = n^{−1/3} (short dashes), and 95% estimation bands (dotted) with θ = 2, σ = 0.2, and n = 500.
size for pointwise estimation purposes is √500 ≈ 22. When h = n^{−1/3}, it follows from Theorem 3.1 that the convergence rate is (nh²)^{1/4} = n^{1/12}, which is much slower than the rate (nh)^{1/2} = n^{2/5} for conventional kernel regression. Using Theorems 3.1 and 3.2, an asymptotic 100(1 − α)% level confidence interval for f(x) is given by

\hat f(x) \pm z_{\alpha/2} \left( \frac{ \hat\sigma_n^2\, \mu_{K^2}/\mu_K }{ \sum_{t=1}^{n} K\bigl( (x_t - x)/h \bigr) } \right)^{1/2},

where μ_{K²} = ∫_{−∞}^{∞} K²(s) ds, μ_K = ∫_{−∞}^{∞} K(s) ds, and z_{α/2} = Φ^{−1}(1 − α/2) using the standard normal cumulative distribution function (c.d.f.) Φ. Figures 5 and 6 show the empirical coverage probabilities of these pointwise asymptotic confidence intervals for f_A and f_B over 100 equispaced points on the domains [0, 1] and [−1, 1], using an Epanechnikov kernel, various bandwidths as shown, and setting α = 0.05 and n = 500. For both functions, the coverage rates are more uniform over the respective domains for the smaller bandwidth choices, but the undercoverage also increases as the bandwidth gets smaller. For both f_A and f_B there is evidence of substantial undercoverage, with around 60% coverage over most of the domain when h = n^{−1/3} and around 70% coverage when h = n^{−1/4}. Coverage is higher (around 80%) when h = n^{−1/5}, but for function f_A it dips to below 60% in the region (around x ∼ 0.7) where the nonparametric estimator is most biased for larger bandwidths (see Figure 1).
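A hedged sketch of the pointwise interval just displayed: given fˆ(x), σ̂_n², and the kernel weight sum at x, the interval only requires the two kernel constants μ_{K²} and μ_K. For the Epanechnikov kernel used in the simulations, μ_{K²} = 3/5 and μ_K = 1; the helper below computes them numerically so other kernels can be plugged in. The function name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def confidence_interval(f_hat_x, sigma2_hat, kernel_weight_sum, K, alpha=0.05):
    """Asymptotic 100(1-alpha)% interval for f(x), based on Theorems 3.1 and 3.2.
    kernel_weight_sum is sum_t K((x_t - x)/h) evaluated at the point x."""
    s = np.linspace(-5, 5, 20001)
    mu_K2 = np.trapz(K(s) ** 2, s)          # integral of K^2
    mu_K = np.trapz(K(s), s)                # integral of K
    half = norm.ppf(1 - alpha / 2) * np.sqrt(sigma2_hat * mu_K2 / mu_K / kernel_weight_sum)
    return f_hat_x - half, f_hat_x + half
```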
FIGURE 5.—Coverage probabilities of (nominal 95%) confidence intervals for fA (x) over [0 1] for different bandwidths.
5. CONCLUSION

The main results in the paper have many implications. First, there is no inverse problem in structural models of nonlinear cointegration of the form (2.1), where the regressor is an endogenously generated integrated or near-integrated process. This result reveals a major simplification in structural nonparametric regression in cointegrating models, avoiding the need for instrumentation and eliminating ill-posed functional equation inversions. Second,
FIGURE 6.—Coverage probabilities of (nominal 95%) confidence intervals for fB (x) over [0 1] for different bandwidths.
functional estimation of (2.1) is straightforward in practice and may be accomplished by standard kernel methods with no modification. These methods yield consistent estimates that have a mixed normal limit distribution, thereby validating conventional methods of inference in the nonstationary nonparametric setting. Third, the methods are applicable without change when the regressor is near-integrated with a root local to unity rather than unity, providing a major departure from the parametric case where near integration presents substantial difficulties in inference. The results open up interesting possibilities for functional regression in empirical research with integrated and near-integrated processes. There are some possible extensions of the ideas presented here to other models involving nonlinear functions of integrated processes. In particular, additive nonlinear cointegration models (cf. Schienle (2008)) and partial linear cointegration models may be treated in a similar way to (2.1). But multiple nonadditive regression models do present difficulties arising from the nonrecurrence of the limit processes in high dimensions (cf. Park and Phillips (2001)). There are issues of specification testing, functional form tests, and cointegration tests, which may now be addressed using these methods. Also, the simulations reported here indicate that there is a need to improve confidence interval coverage probabilities in the use of these nonparametric methods. We hope to report progress on some of these issues in later work.

6. PROOF OF THEOREM 3.1

As shown in Remark B, the proof of the theorem essentially amounts to proving (3.5)–(3.8). To do so, we will make use of various subsidiary results which are proved here and in the next section. First, it is convenient to introduce the following definitions and notation. If α_n^{(1)}, α_n^{(2)}, ..., α_n^{(k)} (1 ≤ n ≤ ∞) are random elements of D[0, 1], we will understand the condition

\bigl( \alpha_n^{(1)}, \alpha_n^{(2)}, \ldots, \alpha_n^{(k)} \bigr) \to_D \bigl( \alpha_\infty^{(1)}, \alpha_\infty^{(2)}, \ldots, \alpha_\infty^{(k)} \bigr)

to mean that, for all α_∞^{(1)}, α_∞^{(2)}, ..., α_∞^{(k)} continuity sets A_1, A_2, ..., A_k,

P\bigl( \alpha_n^{(1)} \in A_1, \alpha_n^{(2)} \in A_2, \ldots, \alpha_n^{(k)} \in A_k \bigr) \to P\bigl( \alpha_\infty^{(1)} \in A_1, \alpha_\infty^{(2)} \in A_2, \ldots, \alpha_\infty^{(k)} \in A_k \bigr)
(see Billingsley (1968, Theorem 3.1) or Hall (1977)). D[0 1]k will be used to denote D[0 1] × · · · × D[0 1], the k-times coordinate product space of D[0 1]. We still use ⇒ to denote weak convergence on D[0 1]. To prove (3.8), we use the following lemma. LEMMA 6.1: Suppose that {Ft }t≥0 is an increasing sequence of σ-fields, q(t) is a process that is Ft -measurable for each t and continuous with probability 1,
Eq²(t) < ∞, and q(0) = 0. Let ψ(t), t ≥ 0, be a process that is nondecreasing and continuous with probability 1, and satisfies ψ(0) = 0 and Eψ²(t) < ∞. Let ξ be a random variable which is F_t-measurable for each t ≥ 0. If, for any γ_j ≥ 0, j = 1, 2, ..., r, and any 0 ≤ s < t ≤ t_0 < t_1 < ··· < t_r < ∞,

E\Bigl\{ \exp\Bigl( -\sum_{j=1}^{r} \gamma_j [\psi(t_j) - \psi(t_{j-1})] \Bigr) [q(t) - q(s)] \Bigm| \mathcal{F}_s \Bigr\} = 0 \quad a.s.,

E\Bigl\{ \exp\Bigl( -\sum_{j=1}^{r} \gamma_j [\psi(t_j) - \psi(t_{j-1})] \Bigr) \bigl( [q(t) - q(s)]^2 - [\psi(t) - \psi(s)] \bigr) \Bigm| \mathcal{F}_s \Bigr\} = 0 \quad a.s.,
then the finite-dimensional distributions of the process (q(t), ξ)_{t≥0} coincide with those of the process (W[ψ(t)], ξ)_{t≥0}, where W(s) is a standard Brownian motion with EW²(s) = s, independent of ψ(t).

PROOF: This lemma is an extension of Theorem 3.1 of Borodin and Ibragimov (1995, p. 14) and the proof follows the same lines as in their work. Indeed, by using the fact that ξ is F_t-measurable for each t ≥ 0, it follows from the same arguments as in the proof of Theorem 3.1 of Borodin and Ibragimov (1995) that, for any t_0 < t_1 < ··· < t_r < ∞, α_j ∈ R, and s ∈ R,

E \exp\Bigl( i \sum_{j=1}^{r} \alpha_j [q(t_j) - q(t_{j-1})] + is\xi \Bigr)
  = E\Bigl\{ \exp\Bigl( i \sum_{j=1}^{r-1} \alpha_j [q(t_j) - q(t_{j-1})] + is\xi \Bigr) E\bigl[ \exp\bigl( i\alpha_r [q(t_r) - q(t_{r-1})] \bigr) \bigm| \mathcal{F}_{t_{r-1}} \bigr] \Bigr\}
  = E\Bigl\{ \exp\Bigl( -\frac{\alpha_r^2}{2} [\psi(t_r) - \psi(t_{r-1})] \Bigr) \exp\Bigl( i \sum_{j=1}^{r-1} \alpha_j [q(t_j) - q(t_{j-1})] + is\xi \Bigr) \Bigr\}
  = \cdots = E \exp\Bigl( -\sum_{j=1}^{r} \frac{\alpha_j^2}{2} [\psi(t_j) - \psi(t_{j-1})] + is\xi \Bigr),
which yields the stated result. Q.E.D.
By virtue of Lemma 6.1, we now obtain the proof of (3.8). Technical details of some subsidiary results that are used in this proof are given in the next section. Set

\zeta_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} \varepsilon_k, \qquad S_n(t) = \frac{1}{d_0 (nh^2)^{1/4}} \sum_{k=1}^{[nt]} u_k K\Bigl( \frac{x_k - x}{h} \Bigr),

\bar\psi_n(t) = \frac{1}{d_1 \sqrt{nh^2}} \sum_{k=1}^{[nt]} K\Bigl( \frac{x_k - x}{h} \Bigr), \qquad \psi_n(t) = \frac{1}{d_0^2 \sqrt{nh^2}} \sum_{k=1}^{[nt]} u_k^2 K^2\Bigl( \frac{x_k - x}{h} \Bigr),

for 0 ≤ t ≤ 1, where d_0 and d_1 are defined as in (3.8). We will prove in Propositions 7.1 and 7.2 that ζ_n(t) ⇒ W(t), ψ̄_n(t) ⇒ ψ(t), and ψ_n(t) ⇒ ψ(t) on D[0, 1], where ψ(t) := L(t, 0). Furthermore, we will prove in Proposition 7.4 that {S_n(t)}_{n≥1} is tight on D[0, 1]. These facts imply that {S_n(t), ψ_n(t), ψ̄_n(t), ζ_n(t)}_{n≥1} is tight on D[0, 1]⁴. Hence, for each {n′} ⊆ {n}, there exists a subsequence {n″} ⊆ {n′} such that

(6.1)  \{ S_{n''}(t), \psi_{n''}(t), \bar\psi_{n''}(t), \zeta_{n''}(t) \} \to_d \{ \eta(t), \psi(t), \psi(t), W(t) \}
on D[0, 1]⁴, where η(t) is a process continuous with probability 1, by noting (7.26) below. Write F_s = σ{W(t), 0 ≤ t ≤ 1; η(t), 0 ≤ t ≤ s}. It is readily seen that F_s ↑ and η(s) is F_s-measurable for each 0 ≤ s ≤ 1. Also note that ψ(t) (for any fixed t ∈ [0, 1]) is F_s-measurable for each 0 ≤ s ≤ 1. If we prove that, for any 0 ≤ s < t ≤ 1,

(6.2)  E\bigl\{ [\eta(t) - \eta(s)] \bigm| \mathcal{F}_s \bigr\} = 0 \quad a.s.,

(6.3)  E\bigl\{ [\eta(t) - \eta(s)]^2 - [\psi(t) - \psi(s)] \bigm| \mathcal{F}_s \bigr\} = 0 \quad a.s.,

then it follows from Lemma 6.1 that the finite-dimensional distributions of (η(t), ψ(1)) coincide with those of {N L^{1/2}(t, 0), L(1, 0)}, where N is a normal variate independent of L(t, 0). The result (3.8) therefore follows, since η(t) does not depend on the choice of the subsequence.
Let 0 ≤ t_0 < t_1 < ··· < t_r = 1, let r be an arbitrary integer, and let G(···) be an arbitrary bounded measurable function. To prove (6.2) and (6.3), it suffices to show that

(6.4)  E\bigl\{ [\eta(t_j) - \eta(t_{j-1})]\, G[\eta(t_0), \ldots, \eta(t_{j-1});\, W(t_0), \ldots, W(t_r)] \bigr\} = 0,

(6.5)  E\bigl\{ \bigl( [\eta(t_j) - \eta(t_{j-1})]^2 - [\psi(t_j) - \psi(t_{j-1})] \bigr)\, G[\eta(t_0), \ldots, \eta(t_{j-1});\, W(t_0), \ldots, W(t_r)] \bigr\} = 0.

Recall (6.1). Without loss of generality, we assume the sequence {n″} is just {n} itself. Since S_n(t), S_n²(t), and ψ_n(t), for each 0 ≤ t ≤ 1, are uniformly integrable (see Proposition 7.3), the statements (6.4) and (6.5) will follow if we prove

(6.6)  E\bigl\{ [S_n(t_j) - S_n(t_{j-1})]\, G[\cdots] \bigr\} \to 0,

(6.7)  E\bigl\{ \bigl( [S_n(t_j) - S_n(t_{j-1})]^2 - [\psi_n(t_j) - \psi_n(t_{j-1})] \bigr)\, G[\cdots] \bigr\} \to 0,

where G[···] = G[S_n(t_0), ..., S_n(t_{j−1}); ζ_n(t_0), ..., ζ_n(t_r)] (see, e.g., Theorem 5.4 of Billingsley (1968)). Furthermore, by using similar arguments to those in the proofs of Lemmas 5.4 and 5.5 in Borodin and Ibragimov (1995), we may choose

G(y_0, y_1, \ldots, y_{j-1};\, z_0, z_1, \ldots, z_r) = \exp\Bigl( i \sum_{k=0}^{j-1} \lambda_k y_k + i \sum_{k=0}^{r} \mu_k z_k \Bigr).

Therefore, by independence of the ε_k, we only need to show that

(6.8)  E\Bigl\{ \sum_{k=[nt_{j-1}]+1}^{[nt_j]} u_k K[(x_k - x)/h]\, \exp\bigl( i\mu_j^{*} [\zeta_n(t_j) - \zeta_n(t_{j-1})] + i\chi(t_{j-1}) \bigr) \Bigr\} = o\bigl\{ (nh^2)^{1/4} \bigr\},

(6.9)  E\Bigl\{ \Bigl[ \Bigl( \sum_{k=[nt_{j-1}]+1}^{[nt_j]} u_k K[(x_k - x)/h] \Bigr)^2 - \sum_{k=[nt_{j-1}]+1}^{[nt_j]} u_k^2 K^2[(x_k - x)/h] \Bigr] \exp\bigl( i\mu_j^{*} [\zeta_n(t_j) - \zeta_n(t_{j-1})] + i\chi(t_{j-1}) \bigr) \Bigr\} = o\bigl\{ (nh^2)^{1/2} \bigr\},

where χ(s) = χ(x_1, ..., x_s, u_1, ..., u_s), a functional of x_1, ..., x_s, u_1, ..., u_s, and μ_j^{*} = Σ_{k=j}^{r} μ_k.
Note that χ(s) depends only on (..., ε_{s−1}, ε_s) and λ_1, ..., λ_s, and we may write

(6.10)  x_t = \sum_{j=1}^{t} \rho^{t-j} \eta_j = \sum_{j=1}^{t} \rho^{t-j} \sum_{i=-\infty}^{j} \varepsilon_i \phi_{j-i}
      = \rho^{t-s} x_s + \sum_{j=s+1}^{t} \rho^{t-j} \sum_{i=-\infty}^{s} \varepsilon_i \phi_{j-i} + \sum_{j=s+1}^{t} \rho^{t-j} \sum_{i=s+1}^{j} \varepsilon_i \phi_{j-i}
      := x^{*}_{st} + \bar{x}_{st},

where x^{*}_{st} depends only on (..., ε_{s−1}, ε_s) and

\bar{x}_{st} = \sum_{j=1}^{t-s} \rho^{t-j-s} \sum_{i=1}^{j} \varepsilon_{i+s} \phi_{j-i} = \sum_{i=s+1}^{t} \varepsilon_i \sum_{j=0}^{t-i} \rho^{t-j-i} \phi_j.
Now, by independence of the ε_k again and conditioning arguments, it suffices to show that, for any μ,

(6.11)  \sup_{y,\, 0 \le s < m \le n} E\Bigl| \sum_{k=s+1}^{m} u_k K[(y + \bar{x}_{sk})/h]\, \exp\Bigl( i\mu \sum_{i=1}^{m} \varepsilon_i/\sqrt{n} \Bigr) \Bigr| = o\bigl\{ (nh^2)^{1/4} \bigr\},

(6.12)  \sup_{y,\, 0 \le s < m \le n} E\Bigl| \Bigl[ \Bigl( \sum_{k=s+1}^{m} u_k K[(y + \bar{x}_{sk})/h] \Bigr)^2 - \sum_{k=s+1}^{m} u_k^2 K^2[(y + \bar{x}_{sk})/h] \Bigr] \exp\Bigl( i\mu \sum_{i=1}^{m} \varepsilon_i/\sqrt{n} \Bigr) \Bigr| = o\bigl\{ (nh^2)^{1/2} \bigr\}.
This follows from Proposition 7.5. The proof of (3.8) is now complete. We next prove (3.5)–(3.7). In fact, it follows from Proposition 7.3 that, uniformly in n, EΘ²_{1n}/(nh²)^{1/2} = d_0² ES_n²(1) ≤ C. This yields (3.5) by Markov's inequality. It follows from Claim 1 in the proof of Proposition 7.2 that x_t/(√n φ) satisfies Assumption 2.3 of WP. The same argument as in the proof of (5.18) in WP yields (3.6). As for (3.7), it follows from Proposition 7.2, together with the fact that P(L(t, 0) > 0) = 1. The proof of Theorem 3.1 is now complete. Q.E.D.
7. SOME USEFUL SUBSIDIARY PROPOSITIONS

In this section we will prove the following propositions required in the proof of Theorem 3.1. Notation will be the same as in the previous section except when explicitly mentioned.

PROPOSITION 7.1: We have

(7.1)  \zeta_n(t) \Rightarrow W(t) \quad \text{and} \quad \bar\zeta_n(t) := \frac{1}{\sqrt{n}\,\phi} \sum_{k=1}^{[nt]} \rho^{[nt]-k} \eta_k \Rightarrow J_\kappa(t) \quad \text{on } D[0, 1],

where {W(t), t ≥ 0} is a standard Brownian motion and J_κ(t) is defined as in (3.9).

PROOF: The first statement of (7.1) is well known. To show that ζ̄_n(t) ⇒ J_κ(t), for each fixed l ≥ 1, put

Z_{1j}^{(l)} = \sum_{k=0}^{l} \phi_k \varepsilon_{j-k} \quad \text{and} \quad Z_{2j}^{(l)} = \sum_{k=l+1}^{\infty} \phi_k \varepsilon_{j-k}.
It is readily seen that for any m ≥ 1, m
ρm−j Z1j(l) =
j=1
m
ρm−j
j=1
=
l
l
−k
ρ φk
l−1
ρ
m−j
j +
l
l
ρ−k φk
k=0
ρ
m+s−1
s=1 l
ρj m−s
s=0
=
m j=1
k=0
+
φk j−k
k=0
1−s
l
ρ−j φj
j=s
ρ−j φj
j=s+1 m
ρm−j j + R(m l)
say
j=1
Therefore, for fixed l ≥ 1, [nt] l 1 [nt]−j 1 1 −k ζn (t) = (7.2) ρ φk √ ρ j + √ R([nt] l) φ k=0 n j=1 nφ 1 [nt]−j (l) ρ Z2j +√ nφ j=1 [nt]
Note that
√1 n
[nt]
l
j=1
ρ[nt]−j j ⇒ Jκ (t) (see Chan and Wei (1987) and Phillips
(1987)) and k=0 ρ−k φk → φ as n → ∞ first and then l → ∞. By virtue of Theorem 4.1 of Billingsley (1968, p. 25), to prove ζn (t) ⇒ Jκ (t), it suffices to show that for any δ > 0, " √ # (7.3) lim sup P sup R([nt] l) ≥ δ n = 0 n→∞
0≤t≤1
for fixed l ≥ 1 and (7.4)
[nt] √ Z2j(l) ≥ δ n = 0 lim lim sup P sup l→∞ n→∞ 0≤t≤1
j=1
|κ| Recall limn→∞ ρn = eκ , which yields e−|κ| /2 ≤ ρk ≤ for all −n ≤ k ≤ n and 2e ∞ n sufficiently large. The result (7.3) holds since k=0 |φk | < ∞, and hence as n → ∞, l l l 1 1 P |φj | + |φj | −→ 0 √ sup R([nt] l) ≤ √ max |j | n 0≤t≤1 n −l≤j≤n j=s s=0 j=s+1
We next prove (7.4). Noting m
ρ
m−j
Z
(l) 2j
j=1
=
∞ k=l+1
φk
m
ρm−j j−k
for any m ≥ 1
j=1
by applying the Hölder inequality and the independence of k , we have k (t) 2 m 2 ∞ ∞ n (l) m−j Z2j ≤ |φk | |φk |E max ρ j−k E sup 0≤t≤1
j=1
k=l+1
≤ Cn
1≤m≤n
k=l+1 ∞
j=1
2 |φk |
k=l+1
Result (7.4) now follows immediately from the Markov inequality and Σ_{k=l+1}^{∞} |φ_k| → 0 as l → ∞. The proof of Proposition 7.1 is complete. Q.E.D.

PROPOSITION 7.2: For any h satisfying h → 0 and nh² → ∞, we have

(7.5)  \frac{1}{\sqrt{nh^2}} \sum_{k=1}^{[nt]} K^{i}\Bigl( \frac{x_k - x}{h} \Bigr) \Rightarrow d_i\, L(t, 0) \quad (i = 1, 2),

(7.6)  \frac{1}{\sqrt{nh^2}} \sum_{k=1}^{[nt]} u_k^2 K^{2}\Bigl( \frac{x_k - x}{h} \Bigr) \Rightarrow d_0^2\, L(t, 0),
on D[0, 1], where d_i = |φ|^{−1} ∫_{−∞}^{∞} K^{i}(s) ds, i = 1, 2, d_0² = |φ|^{−1} Eu²_{m_0} ∫_{−∞}^{∞} K²(s) ds, and L(t, s) is the local time process of the Gaussian diffusion process {J_κ(t), t ≥ 0} defined by (3.9), in which {W(t), t ≥ 0} is a standard Brownian motion.

PROPOSITION 7.3: For any fixed 0 ≤ t ≤ 1, S_n(t), S_n²(t), and ψ_n(t), n ≥ 1, are uniformly integrable.

PROPOSITION 7.4: {S_n(t)}_{n≥1} is tight on D[0, 1].

PROPOSITION 7.5: Results (6.11) and (6.12) hold true for any u ∈ R.

To prove Propositions 7.2–7.5, we need some preliminaries. Let r(x) and r_1(x) be bounded functions such that ∫_{−∞}^{∞} (|r(x)| + |r_1(x)|) dx < ∞. We first calculate the values of I^{(s)}_{kl} and II^{(s)}_k defined by

(7.7)  I^{(s)}_{kl} = E\Bigl\{ r(\bar{x}_{sk}/h)\, r_1(\bar{x}_{sl}/h)\, g(u_k)\, g_1(u_l)\, \exp\Bigl( i\mu \sum_{j=1}^{l} \varepsilon_j/\sqrt{n} \Bigr) \Bigr\},
       II^{(s)}_k = E\Bigl\{ r(\bar{x}_{sk}/h)\, g(u_k)\, \exp\Bigl( i\mu \sum_{j=1}^{k} \varepsilon_j/\sqrt{n} \Bigr) \Bigr\},

under different settings of g(x) and g_1(x), where x̄_{sk} is defined as in (6.10). We have the following lemmas, which will play a core role in the proof of the main results. We always assume k < l and let C denote a constant not depending on k, l, and n, which may be different from line to line.

LEMMA 7.1: Suppose ∫|r̂(λ)| dλ < ∞, where r̂(t) = ∫ e^{itx} r(x) dx.
(a) If E|g(u_k)| < ∞, then, for all k ≥ s + 1,

(7.8)  |II^{(s)}_k| \le C\, h/\sqrt{k - s}.

(b) If Eg(u_k) = 0 and Eg²(u_k) < ∞, then, for all k ≥ s + 1,

(7.9)  |II^{(s)}_k| \le C\bigl[ (k - s)^{-2} + h/(k - s) \bigr].

LEMMA 7.2: Suppose that ∫|r̂(λ)| dλ < ∞ and ∫|r̂_1(λ)| dλ < ∞, where r̂(t) = ∫e^{itx} r(x) dx and r̂_1(t) = ∫e^{itx} r_1(x) dx. Suppose that Eg(u_l) = Eg_1(u_k) = 0 and Eg²(u_{m_0}) + Eg_1²(u_{m_0}) < ∞. Then, for any ε > 0, there exists an n_0 > 0 such that, for all n ≥ n_0, all l − k ≥ 1, and all k ≥ s + 1,

(7.10)  |I^{(s)}_{kl}| \le C\bigl[ \varepsilon (l - k)^{-2} + h(l - k)^{-1} \bigr]\bigl[ (k - s)^{-2} + h/\sqrt{k - s} \bigr].
We only prove Lemma 7.2 with s = 0. The proofs of Lemma 7.1 and of Lemma 7.2 with s ≠ 0 are the same and hence the details are omitted.
l
!
j / n
j=1
√
E exp(−itx k / h) exp(iλx l / h)g(uk )g1 (ul )
=
× exp iμ
l
√
j / n
rˆ(t)ˆr1 (λ) dt dλ
j=1
Define Since
l j=k
l
= 0 if l < k, and put ∇(k) =
x =
l
q
l−q
q=1
j=0
ρ
l−q−j
φj =
k
k j=0
ρ−j φj and asq = ρl−q ∇(s − q).
l−m0
+
q=1
q=k+1
+
l
q alq
q=l−m0 +1
it follows from independence of the k ’s that, for l − k ≥ m0 + 1, (2) (3)
(7.11) |Ikl | ≤ E eiz / h E eiz / h g1 (ul ) |ˆr1 (λ)| ×
iz(1) / h
E e g(uk ) rˆ(t) dt dλ
where z (1) =
k
√ q (λalq − takq + uh/ n)
q=1
l−m0
z (2) =
√ q (λalq + uh/ n)
q=k+1
z
(3)
=
l q=l−m0 +1
√ q (λalq + uh/ n)
√ We may take n sufficiently large so that u/ n is as small as required. Without loss of generality, we assume u = 0 in the following proof for convenience of notation. We first show that, for all k sufficiently large, √ (1)
(7.12) Λ(λ k) := E eiz / h g(uk ) |ˆr (t)| dt ≤ C(k−2 + h/ k) To estimate Λ(λ k), we need some preliminaries. Recall ρ = 1 + κ/n. For s any given s, we have limn→∞ |∇(s)| = | j=0 φj |. This fact implies that k0 can be taken sufficiently large such that whenever n is sufficiently large, (7.13)
∞
|φj | ≤ e−|κ| |φ|/4 ≤ e−|κ| |∇(k0 )|
j=k0 /2+1
and hence for all k0 ≤ s ≤ n and 1 ≤ q ≤ s/2, ∞ (7.14) |asq | ≥ 2−1 e−|κ| |∇(k0 /2)| − 2e|κ| |φj | ≥ e−|κ| |φ|/4 j=k0 /2+1
where we have used the well known fact that limn→∞ ρn = eκ , which yields e−|κ| /2 ≤ ρk ≤ 2e|κ| for all −n ≤ k ≤ n. Further write Ω1 (Ω2 , respectively) for the set of 1 ≤ q ≤ k/2 such that |λalq − takq | ≥ h (|λalq − takq | < h, respectively), and B1 =
q∈Ω2
a2kq
B2 =
alq akq
B3 =
q∈Ω2
a2lq
q∈Ω2
√ By virtue of (7.13), it is readily seen that B1 ≥ Ck whenever #(Ω1 ) ≤ k, where #(A) denotes the number of elements in A. We are now ready to prove (7.12). First notice that there exist constants γ1 > 0 and γ2 > 0 such that $ −γ1 if |t| ≥ 1, e i1 t (7.15) |Ee | ≤ −γ t 2 e 2 if |t| ≤ 1, since E1 = 0, E21 = 1, and 1 has a density; see, for example, Chapter 1 of Petrov (1995). Also note that
(λalq − takq )2 = λ2 B3 − 2λtB2 + t 2 B1
q∈Ω2
= B1 (t − λB2 /B1 )2 + λ2 (B3 − B22 /B1 ) ≥ B1 (t − λB2 /B1 )2
since B22 ≤ B1 B3 , by Hölder’s inequality. It follows from the independence of t that, for all k ≥ k0 , % $ iW (1) / h 2 ≤ exp −γ1 #(Ω1 ) − γ2 h−2 Ee (λalq − takq ) q∈Ω2
≤ exp{−γ1 #(Ω1 ) − γ2 B1 h−2 (t − λB2 /B1 )2 } k/2 where W (1) = q=1 q (λalq − takq ). This, together with the facts z (1) = W (1) + k (1) is indeq=k/2+1 q (λalq − takq ) and k/2 ≤ k − m0 (which implies that W pendent of uk ) yield that
Λ(λ k) ≤ E exp(iW (1) / h) E|g(uk )||ˆr (t)| dt ≤C
√ #(Ω1 )≥ k
exp(−γ1 #(Ω1 ))|ˆr (t)| dt
+C
√ #(Ω1 )≤ k
exp(−γ2 B1 h−2 (t − λB2 /B1 )2 ) dt
≤ Ck
−2
|ˆr (t)| dt +
exp(−γ2 B1 h−2 t 2 ) dt
√ ≤ C(k−2 + h/ k) This proves (7.12) for k ≥ k0 . We now turn back to the proof of (7.10). We will estimate Ikl in three separate settings: l − k ≥ 2k0
and
k ≥ k0
l − k ≤ 2k0
and
k ≥ k0
l>k
and k ≤ k0
where, without loss of generality, we assume k0 ≥ 2m0 . CASE I: l − k ≥ 2k0 and k ≥ k0 . We first notice that, for any δ > 0, there exist constants γ3 > 0 and γ4 > 0 such that, for all s ≥ k0 and q ≤ s/2, $ −γ3 if |λ| ≥ δh, e |E exp(i1 λasq / h)| ≤ −γ λ2 / h2 4 e if |λ| ≤ δh. This fact follows from (7.14) and (7.15) with a simple calculation. Hence it follows from the facts l − m0 ≥ (l + k)/2 and l − q ≥ k0 for all k ≤ q ≤ (l + k)/2,
since l − k ≥ 2k0 and k0 ≥ 2m0 , that (7.16)
& iz(2) / h (l+k)/2 ≤ Ee |E exp(iq λalq / h)| q=k
$ ≤
exp(−γ3 (l − k)) exp(−γ4 (l − k)λ2 / h2 )
if |λ| ≥ δh, if |λ| ≤ δh.
On the other hand, since Eg1 (ul ) = 0, we have (7.17)
iz(3) / h
(3)
E e g1 (ul ) = E eiz / h − 1 g1 (ul ) ≤ h−1 E z (3) |g1 (ul )| ≤ C(E21 )1/2 (Eg12 (ul ))1/2 |λ|h−1
We also have (7.18)
iz(3) / h
E e g1 (ul ) → 0
whenever
λh−1 → ∞
uniformly for all l ≥ m0 . Indeed, supposing φ0 = 0 (if φ0 = 0, we may use φ1 and so on), we have E{exp(iz (3) / h)g1 (ul )} = E{exp(il φ0 λρn−l / h)g∗ (l )} where g∗ (l ) = E[exp(i(z (3) − l φ0 λρn−l )/ h)g1 (ul ) | l ]. By recalling that l has a density d(x), it is readily seen that sup |g∗ (x)|d(x) dx ≤ E|g1 (ul )| < ∞ λ
uniformly for all l. The result (7.18) follows from the Riemann–Lebesgue theorem. By virtue of (7.18), for any > 0, there exists an n0 (A0 respectively) such that, for all n ≥ n0 (|λ|/ h ≥ A0 , respectively), |E{exp(iz (3) / h)g1 (ul )}| ≤ . This, together with (7.12) and (7.16) with δ = A0 , yields that (2) kl
I
:=
iz(2) / h iz(3) / h
E e E e g1 (ul ) Λ(λ k)|ˆr1 (λ)| dλ
|λ|>A0 h −γ3 (l−k)
≤ Ce
−2
(k
√ + h/ k)
|λ|>A0 h
√ ≤ C(l − k)−2 (k−2 + h/ k)
|ˆr1 (λ)| dλ
Similarly it follows from (7.12), (7.16) with δ = A0 , and (7.17) that iz(2) / h iz(3) / h
(1) E e E e Ikl := g1 (ul ) Λ(λ k)|ˆr1 (λ)| dλ |λ|≤A0 h
√ ≤ C(k−2 + h/ k)h−1
|λ|≤A0 h
λ exp(−γ4 (l − k)λ2 / h2 ) dλ
√ ≤ Ch(l − k)−1 (k−2 + h/ k) The result (7.10) in Case I now follows from √ (1) (2) + Ikl ≤ C[(l − k)−2 + h(l − k)−1 ](k−2 + h/ k) Ikl ≤ Ikl CASE II: l − k ≤ 2k0 and k ≥ k0 . In this case, we only need to show that (7.19)
√ |Ikl | ≤ C( + h)(k−2 + h/ k)
In fact, as in (7.11), we have iz(4) / h iz(5) / h
E e E e (7.20) |Ikl | ≤ g(uk )g1 (ul ) |ˆr (t)||ˆr1 (λ)| dt dλ where
k−m0
z (4) =
q [λalq − takq ]
q=1
z
(5)
=
l q=k−m0 +1
√ q (λalq + uh/ n) − t
k
q akq
q=k−m0 +1
Similar arguments to those in the proof of (7.12) give that, for all λ and all k ≥ k0 , √ (4) Λ1 (λ k) := E eiz / h |ˆr (t)| dt ≤ C(k−2 + h/ k) Note that E|g(uk )g1 (ul )| ≤ (Eg2 (uk ))1/2 (Eg12 (ul ))1/2 < ∞ For any > 0, similar to the proof of (7.18), there exists an n0 (A0 , respectively) (5) such that, for all n ≥ n0 (|λ|/ h ≥ A0 , respectively), |E{eiz / h g(uk )g1 (ul )}| ≤ .
By virtue of these facts, we have |Ikl | ≤ + |λ|≤A0 h
≤C
|λ|>A0 h
|λ|≤A0 h
iz(5) / h
E e g(uk )g1 (ul ) |ˆr1 (λ)|Λ1 (λ k) dλ
dλ + −2
≤ C( + h)(k
|λ|>A0 h
√ |ˆr1 (λ)| dλ (k−2 + h/ k)
√ + h/ k)
This proves (7.19) and hence the result (7.10) in Case II. CASE III: l > k and k ≤ k0 . In this case, we only need to prove (7.21) |Ikl | ≤ C (l − k)−3/2 + h(l − k)−1 To prove (7.21), split l > k into l − k ≥ 2k0 and l − k ≤ 2k0 . The result (7.10) then follows from the same arguments as in the proofs of Cases I and II but replacing the estimate of Λ(λ k) in (7.12) by Λ(λ k) ≤ E|g(uk )| |ˆr (t)| dt ≤ C We omit the details. The proof of Lemma 7.2 is now complete.
Q.E.D.
We are now ready to prove the propositions. We first mention that, under the conditions on K(t), if we let r(t) = K(y/h + t) or r(t) = K²(y/h + t), then ∫|r(x)| dx = ∫|K(x)| dx < ∞ and ∫|r̂(λ)| dλ ≤ ∫|K̂(λ)| dλ < ∞, uniformly for all y ∈ R.

PROOF OF PROPOSITION 7.5: Let r(t) = r_1(t) = K(y/h + t) and g(x) = g_1(x) = x. It follows from Lemma 7.2 that, for any ε > 0, there exists an n_0 such that, whenever n ≥ n_0,

\sum_{1 \le k < l \le n} |I_{kl}| \le C \sum_{1 \le k < l \le n} \bigl[ \varepsilon (l - k)^{-2} + h(l - k)^{-1} \bigr]\bigl( k^{-2} + h/\sqrt{k} \bigr)
  \le C \Bigl( \varepsilon + h \sum_{k=1}^{n} k^{-1} \Bigr) \sum_{k=1}^{n} \bigl( k^{-2} + h/\sqrt{k} \bigr)
  \le C (\varepsilon + h \log n)\bigl( C + \sqrt{n}\, h \bigr).

This implies (6.12) since h log n → 0 and nh² → ∞. The proof of (6.11) is similar and the details are omitted. Q.E.D.
PROOF OF PROPOSITION 7.2: We first note that, under a suitable probability space {Ω, F, P}, there exists an equivalent process ζ̂_n(t) of ζ̄_n(t) (i.e., ζ̂_n(i/n) =_d ζ̄_n(i/n), 1 ≤ i ≤ n, for each n ≥ 1) such that

(7.22)  \sup_{0 \le t \le 1} |\hat\zeta_n(t) - J_\kappa(t)| = o_P(1),

by Proposition 7.1 and the Skorohod–Dudley–Wichura representation theorem. Also, we may make the following claim:

CLAIM 1: x_{jn} := ζ̂_n(j/n) or, equivalently, x_{jn} = ζ̄_n(j/n) satisfies Assumption 2.3 of WP.

The proof of this claim is similar to Corollary 2.2 of WP. Here we only give an outline. Write ζ̄_n(l/n) − ζ̄_n(k/n) = S_{1l} + S_{2l}, where S_{1l} =
√1 nφ
l j=k+1
ρl−j
j i=−∞
i φj−i + (ρl−k − 1)ζn (k/n) and
j l−i l l ρl −j ρl −i −j S2l = √ ρ i φj−i = √ ρ i ρ φj nφ j=k+1 nφ i=k+1 j=0 i=k+1
l l−i 2 Furthermore let dlkn = (ρ2l /(nφ2 )) i=k+1 ρ−2i ( j=0 ρ−j φj )2 and Ftn = 2 σ( t−1 t ). Recall (7.14). It is readily seen that dlkn ≥ C(l − k)/n whenever l − k is sufficiently large. This implies that dlkn satisfies Assumption 2.3(i) of WP. On the other hand, by using a similar argument as in the proof of Corollary 2.2 of WP with minor modifications, it may be shown that the standardized sum l−i 2 ' l l l−i
ρ−i i ρ−j φj ρ−2i ρ−j φj S2l /dlkn = i=k+1
j=0
i=k+1
j=0
has a bounded density hlk (x) satisfying sup |hlk (x) − n(x)| → 0 as l − k ≥ δn → ∞ x
√ 2 where n(x) = e−x /2 / 2π is the standard normal density. Hence, conditional on Fkn , (xln − xkn )/dlkn = (S1l + S2l )/dlkn has a density hlk (x − S1l /dlkn ),
which is uniformly bounded by a constant C and x − S1l −S1l − hlk sup suphlk d d |u|≤δ lkn
l−k≥δn
lkn
1 2 ≤ 2 sup hlk (x) − √ e−x /2 2π l−k≥δn −(x + u)2 1 2 − e−x /2 +√ sup supexp 2 2π |u|≤δ x →0
as n → ∞ first and then δ → 0. This proves that Assumption 2.3(ii) holds true for xkn , and also completes the proof of Claim 1. By virtue of√all the above facts, it follows from Theorem 2.1 of WP with the settings cn = n|φ|/ h and g(t) = K i (t − x/ h), i = 1 2, that √ ˆ ∞ [nt] φ nφ ζ (k/n) − x n − L(t 0) sup √ Ki K i (s) ds 2 h 0≤t≤1 nh −∞ k=1
→P 0 √ This, together with the fact that ζˆ n (k/n) =d ζn (k/n) = xk /( nφ) 1 ≤ k ≤ n for √ each n ≥ 1, implies that the finite-dimensional distributions of Tin (t) := [nt] (1/ nh2 ) k=1 K i [(xk − x)/ h] converge to those of di L(t 0). On the other hand, by applying the same argument as in the proof of Proposition 7.4, it is easy to show that Tin (t) n ≥ 1, is tight. Hence Tin (t) ⇒ di L(t 0), i = 1 or 2, on D[0 1]. This proves the result (7.5). [nt] To prove (7.6), write ψ n (t) = √1nh k=1 K 2 [(xk − x)/ h]u2k and ψ n (t) = [nt] 2 2 √1 k=1 K [(xk − x)/ h]Euk . We first prove nh (7.23)
sup E|ψ n (t) − ψ n (t)|2 = o(1)
0≤t≤1
In fact, by recalling xk = x∗0k + x 0k (see (6.10)), where x∗0k depends only on 0 −1 we have, almost surely, E |ψ n (t) − ψ n (t)|2 | 0 −1 !2
m y + x 0k 1 2 (u2k − Eu2k ) K ≤ 2 sup E nh y1≤m≤n h k=1
n x0k 2 1 2 g (uk ) ≤ 2 sup Er nh y k=1 h
! x 0k x 0l Er r g(uk )g1 (ul ) +2 h h 1≤k
where r(t) = K 2 (y/ h + t), g(t) = t 2 − Eu2k , and g1 (t) = t 2 − Eu2l . Again it follows from Lemmas 7.1 and 7.2 that, for any > 0, there exists an n0 such that for all n ≥ n0 , n 1 −1/2 k + C( + h log n) nh k=m 0 1 ≤ C + h log n + √ nh
E[|ψ n (t) − ψ n (t)|2 | 0 −1 ] ≤ C
almost surely. The result (7.23) follows from nh2 → ∞, h log n → 0, and the fact that is arbitrary. The result (7.23) means that ψ n (t) and ψ n (t) have the same finite-dimensional limit distributions. Hence, the finite-dimensional distributions of ψ n (t) converge to those of d02 L(t 0), since ψ n (t) ⇒ d02 L(t 0) on D[0 1] by (7.5) and the fact Eu2k = Eu2m0 whenever k ≥ m0 . On the other hand, ψ n (t) is tight on D[0 1], which follows from the same argument as in the proof of Proposition 7.4. This proves ψ n (t) ⇒ d02 L(t 0) on D[0 1], that is, the result (7.6). Q.E.D. PROOF OF PROPOSITION 7.3: We first claim that, for each fixed t, (7.24)
sup E[ψ n (t)]2 < ∞ n
[nt] where ψ n (t) = √1nh k=1 K 2 [(xk − x)/ h]Eu2k as above. In fact, by recalling xk = x∗sk + x sk (see (6.10)), where x∗sk depends only on s s−1 it follows from Lemma 7.1 with r(t) = ry (t) = K 2 (y/ h + t) and g(t) = 1 that, for each fixed t, E|ψ n (t)|2 ≤
n C 4 xk − x EK nh2 k=1 h
%! $ 2 xk − x 2 xl − x +2 K E K h h 1≤k
n x0k C 2 ≤ sup Ery 2 nh k=1 y h
+2
sup Ery
1≤k
y
xkj x 0k sup Ery h h y
!
n C −1/2 ≤ hk + 2 h2 k−1/2 (l − k)−1/2 nh2 k=1 1≤k
!
<∞ uniformly on n, as required. The result (7.24), together with (7.23), implies that supn E[ψn (t)]2 < ∞ and hence ψn (t) is uniformly integrable. To prove the uniform integrability of Sn2 (t), we first notice that (7.25)
sup E|ψn (t) − Sn2 (t)| = o(1)
0≤t≤1
This follows from the similar argument as in the proof of (7.23) and the fact that xl − x 2d0−2 xk − x 2 K uk ul K ψn (t) − Sn (t) = nh2 1≤k
0 and fixed t, we have 2 ES (t)IS2 (t)≥A − Eψn (t)IS2 (t)≥A ≤ sup E|ψn (t) − S 2 (t)| = o(1) n n n n 0≤t≤1
This, together with the fact that Eψn (t)ISn2 (t)≥A ≤ Eψn (t)Iψn (t)≥√A +
√
AP(Sn2 (t) ≥ A)
≤ Eψn (t)Iψn (t)≥√A + A−1/2 Eψn (t) + o(1) implies that lim sup ESn2 (t)ISn2 (t)≥A ≤ lim sup Eψn (t)Iψn (t)≥√A + A−1/2 Eψn (t)
A→∞
n
A→∞
n
= 0 where we have used the uniform integrability of ψn (t). That is, Sn2 (t) is uniformly integrable. The integrability of Sn (t) follows from that of Sn2 (t). The proof of Proposition 7.3 is now complete. Q.E.D. PROOF OF PROPOSITION 7.4: We will use Theorem 4 of Billingsley (1974) to establish the tightness of Sn (t) on D[0 1]. According to this theorem, we only need to show that (7.26) max uk K[(xk − x)/ h] = oP (nh2 )1/4 1≤k≤n
and that there exists a sequence of αn ( δ) satisfying limδ→0 lim supn→∞ αn ( δ) = 0 for each > 0 such that, for 0 ≤ t1 ≤ t2 ≤ · · · ≤ tm ≤ t ≤ 1
t − tm ≤ δ
we have (7.27)
P |Sn (t) − Sn (tm )| ≥ | Sn (t1 ) Sn (t2 ) Sn (tm ) ≤ αn ( δ)
a.s.
n By noting max1≤k≤n |uk K[(xk − x)/ h]| ≤ { j=1 u4j K 4 [(xj − x)/ h]}1/4 , the re sult (7.26) follows from Eu4j K 4 [(xj − x)/ h] ≤ Ch/ j by Lemma 7.1, with a simple calculation. As for (7.27), it only needs to show that [nt] (7.28) uk K[(xk − x)/ h] sup P |t−s|≤δ k=[ns]+1 ≥ dn [ns] [ns]−1 ; η[ns] η1 ≤ αn ( δ) In terms of the independence, we may choose αn ( δ) as −2
2 −1/2
αn ( δ) := (nh )
sup E y0≤t≤δ
[nt]
2 0k
uk K[(y + x )/ h]
k=1
As in the proof of (7.23) with a minor modification, it is clear that whenever n is large enough, αn ( δ) ≤ −2 (nh2 )−1/2 sup y
[nδ]
E u2k K 2 [(y + x 0k )/ h] k=1
+ −2 (nh2 )−1/2 sup 2 y
× K[(y + x 0l )/ h] ≤ C−2 (nh2 )−1/2
[nδ]
E uk ul K[(y + x )/ h] 0k 1≤k
√ h/ k(1 + + h log n)
k=1
This yields limδ→0 lim supn→∞ αn ( δ) = 0 for each > 0. The proof of Proposition 7.4 is complete. Q.E.D.
8. PROOF OF THEOREM 3.2 We may write σˆ n2 − Eu2m0 = Θ3n [Θ4n + Θ5n + Θ6n ] where Θ3n is defined as in (3.7), Θ4n =
n
u2t − Eu2m0 K[(xt − x)/ h]
t=1 n Θ5n := 2 [f (xt ) − fˆ(x)]ut K[(xt − x)/ h] t=1
n
Θ6n :=
[f (xt ) − fˆ(x)]2 K[(xt − x)/ h]
t=1
√ As in the proof of (3.6) with minor modifications, have Θ6n = OP { nh1+γ }. √ we As in the proof of (3.5), we obtain Θ4n = OP {( nh)1/2 } and Θ 1n :=
n
√ u2t K[(xt − x)/ h] = OP ( nh)
t=1
These facts, together with (3.7), imply that
√ σˆ n2 − Eu2m0 = oP an hγ/2 + ( nh)−1/2 (8.1) where an diverges to infinity as slowly as required and where we use the fact that by Hölder’s inequality, √
1/2 γ/2 |Θ5n | ≤ 2Θ1/2 6n Θ1n = OP ( nh)h √ Now, result (3.15) follows from (8.1) by choosing an = min{h−γ/4 ( nh)1/4 }. On the other hand, similar to the proof of (3.8), we may prove [nt] (nh2 )−1/4 (8.2) (u2k − Eu2m0 )K[(xk − x)/ h] k=1
(nh2 )−1/2
n
K[(xk − x)/ h]
k=1
→D d0 NL1/2 (t 0) d1 L(1 0)
∞ on D[0 1]2 , where d0 2 = |φ|−1 E(u2m0 − Eu2m0 )2 −∞ K 2 (s) dt, d1 = |φ|−1 × ∞ K(s) ds, and N is a standard normal variate independent of L(1 0), as −∞
1940
Q. WANG AND P. C. B. PHILLIPS
in (3.8). This, together with the fact that Θ3n (Θ4n + Θ5n ) = oP (an hγ ) for any an diverging to infinity as slowly as required, yields (nh2 )1/4 σˆ n2 − Eu2m0 = (nh2 )−1/2 Θ3n (nh2 )−1/4 Θ4n + (nh2 )1/4 Θ3n (Θ5n + Θ6n ) →D σ1 NL(1 0)−1/2 whenever nh2 → ∞ and nh2+2γ → 0. The proof of Theorem 3.2 is now complete. Q.E.D. 9. BIAS ANALYSIS We consider the special case where, in addition to earlier conditions, κ = 0, with E(u2t ) = σu2 ut is independent of ut is a martingale difference sequence xt K satisfies K(y) dy = 1 yK(y) dy = 0 and has compact support, and f has continuous, bounded third derivatives. It follows from the proof of Theorems 2.1 and 3.1 of WP that, on a suitably enlarged probability space (9.1)
n xt − x 1 →P L(1 0) K √ h nh t=1
and ∞ xt − x 2 2 ut K N 0 σu K (s) ds h t=1 −∞ ⇒ n L(1 0)1/2 xt − x K h t=1
n
(9.2)
(nh2 )1/4
whenever nh2 → ∞ and h → 0. The error decomposition is
n
(9.3)
xt − x {f (xt ) − f (x)}K hn t=1 fˆ(xt ) − f (x) = n xt − x K hn t=1 xt − x ut K hn t=1 + n xt − x K hn t=1 n
The bias term in the numerator of the first term of (9.3) involves (9.4)
n xt − x = Ia + Ib + Ic {f (xt ) − f (x)}K h t=1
where xt − x (xt − x)K Ia = f (x) h t=1
n
n 1 xt − x Ib = f (x) (xt − x)2 K 2 h t=1 n $
1 Ic = f (xt ) − f (x) − f (x)(xt − x) − f (x)(xt − x)2 2 t=1 xt − x ×K h
%
As in (9.1) above, we have (9.5)
n Ib 1 1 xt − x (where H(s) := s2 K(s)) H √ 3 = f (x) √ 2 h nh nh t=1 ∞ 1 H(y) dy L(1 0) →P f (x) 2 −∞
We show below that the remaining terms of (9.4) have the order √ √ √ Ia + Ic = OP ( nh3 )1/2 + ( nh5 log n)1/2 + ( nh4 ) (9.6) It follows from (9.1), (9.4), and (9.5) that
(9.7)
n xt − x {f (xt ) − f (x)}K h t=1 n xt − x K h t=1 1 √ (Ia + Ib + Ic ) nh = n 1 xt − x K √ hn nh t=1
1942
Q. WANG AND P. C. B. PHILLIPS
=
∞ h2 f (x) y 2 K(y) dy {1 + op (1)} 2 −∞ 1/2 3 1/2 h h log n + OP + + h3 √ √ n n
Then, from (9.3), (9.2), and (9.7), ∞ h2 (nh2 )1/4 fˆ(xt ) − f (x) − f (x) y 2 K(y) dy 2 −∞ n xt − x 1 ut K (nh2 )1/4 t=1 h = n 1 xt − x K √ h nh t=1 + OP h + h2 (log n)1/2 + n1/4 h7/2 ∞ 2 2 N 0 σu K (s) ds −∞ ⇒ L(1 0)1/2 provided h4 log n + nh14 → 0 for which nh14 → 0 suffices. It remains to prove (9.6). As shown in the proof of Proposition 7.2, xtn = n√−1/2 xt satisfies Assumption 2.3 of WP, so that for t > s the scaled quantity n √ (xtn − xsn ) has a uniformly bounded density htsn (y) Furthermore we may t−s prove that htsn is locally Lipschitz in the neighborhood of the origin, that is, (9.8)
|htsn (x) − htsn (0)| ≤ c|x|
Then, for some constant C whose value may change in each occurrence, we have (9.9)
E|Ic | ≤
n t=1
∞ −∞
$ √ √ f ( ty) − f (x) − f (x)( ty − x)
√ % √ x t 1 2 y− ht0n (y) dy − f (x)( ty − x) K 2 h h ∞ $ n 1 f (hy + x) − f (x) − f (x)(hy) ≤ hC √ t −∞ t=1
% 1 2 − f (x)(hy) K(y) dy 2 ∞ n √ 1 4 ≤ Ch |s|3 K(s) ds ≤ C nh4 √ t −∞ t=1 using the fact that K has compact support. As for Ia , we have EI ≤ Ch E 2 a
2
n
H1
t=1
≤ Ch2
n t=1
EH12
xt − x h
!2 (with H1 (y) := yK(y))
! xt − x xs − x xt − x + H1 EH1 h h h 1≤s
It is readily seen that EH12
(9.10) Since
√ ty − x xt − x Ch = H12 ht0n (y) dy ≤ √ h h t
H1 (y) dy = 0 and using (9.8), we also have $ % xt − x E H1 Fs h √ t − sy + xs − x htsn (y) dy = H1 h h xs − x hy H1 y + dy htsn √ =√ h t −s t −s h xs − x hy htsn √ H1 y + − htsn (0) dy ≤√ h t −s t −s 2 h H1 y + xs − x |y| dy ≤C √ h t −s
since y is restricted to the compact support of K. Thus, xs − x xt − x (9.11) EH1 H1 h h x x − x − x s t E H1 ≤ E H1 Fs h h
1944
Q. WANG AND P. C. B. PHILLIPS
2 xs − x xs − x h E H1 ≤C √ H1 y + h |y| dy h t −s 2 h h |H1 (y)||H1 (y + z)||y| dz dy ≤C √ √ s t −s 2 h h ≤C √ √ s t −s
Taking the bounds (9.10) and (9.11) in EI2a , we get
(9.12)
n 1 2 1 1 5 EI ≤ Ch √ + Ch √ √ s t −s t t=1 1≤s
3
√ √ 1 √1 = 2 n log n + O( n) Combining (9.9) and using the fact that 1≤s
10. PROOF OF THEOREM 3.3 We rewrite fˆa (x) as n
fˆa (x) =
(yt − λxt )K[(xt − x)/ h]
t=1 n
K[(xt − x)/ h]
t=1 n
+ (λˆ − λ)
xt K[(xt − x)/ h]
t=1 n
K[(xt − x)/ h]
t=1
Recall that xt = t and yt − λxt = f (xt ) + uyt . Since uyt and t satisfy Assumption 2, as in the proof of Theorem 3.1 which makes an application of (3.8), Theorem 3.3 will follow if we prove λˆ →P λ, that is, we only need to prove the result (3.19).
We may write n [f (xt ) − fˆ(xt )]t
λˆ − λ =
t=1 n
n
+ 2 t
uyt t
t=1 n
t=1
2 t
t=1
n n n Since n1 t=1 2t →as E21 < ∞ and ( t=1 [f (xt ) − fˆ(xt )]t )2 ≤ t=1 [f (xt ) − n fˆ(xt )]2 t=1 2t by Hölder’s inequality, the result (3.19) will follow if we prove (10.1)
1 uyt t = oP (1) n t=1
(10.2)
1 [f (xt ) − fˆ(xt )]2 = oP (1) n t=1
n
n
Note that {uyt t Ft }t≥1 , where Ft = σ(1 t ), form a martingale difference and E(uyt t )2 ≤ E(u2t 2t ) < ∞. Result (10.1) follows straightforwardly. We next prove (10.2). Throughout the remaining part of the proof, we denote by C C1 constants which may differ from line to line. Recall that |f (x) − f (y)| ≤ C|x − y| and K has a compact support. We then have (|xt − xj |/ h)K[(xt − xj )/ h] ≤ C1 K[(xt − xj )/ h] and hence n
(10.3)
J1j :=
[f (xt ) − f (xj )]K[(xt − xj )/ h]
t=1 n
K[(xt − xj )/ h]
t=1
C
n
|xt − xj |K[(xt − xj )/ h]
t=1
≤
n
≤ C1 h K[(xt − xj )/ h]
t=1
Further, let Yj =
n
n
J2j =
t=j+1
K[(xt − xj )/ h] and
ut K[(xt − xj )/ h]
t=1 n
t=1
K[(xt − xj )/ h]
Since \sum_{t=1}^{n} K[(x_t - x_j)/h] \ge Y_j, result (10.3) together with (3.4) yields
1 2 2 [f (xj ) − fˆ(xj )]2 ≤ (J + J1j2 ) n j=1 n j=1 2j n
(10.4)
n
n 2 n−2 2 −2 Y ut K[(xt − xj )/ h] + Ch2 ≤ n j=1 j t=1 := Λn + Ch2
say t j Since xt − xj = s=j+1 s if t > j and xt − xj = − s=t+1 s if t < j, by using similar arguments as in the proofs of (6.12) and/or Proposition 7.2, we have (10.5)
E
n
2 ut K[(xt − xj )/ h]
≤ C(nh2 )1/2
t=1
On the other hand, by noting that xt is a random walk and hence a 1/2-null recurrent Markov chain with Lebesgue measure as an invariance measure (see Section 6 of Karlsen and Tjøstheim (2001)), it follows from Lemmas 3.4 and 3.5 and Theorem 5.1 that, for all δ > 0, ! t n 1 h ≥ Cn−δ a.s. (10.6) K s √ h n t=1 s=1 Result (10.6), together with i.i.d. properties of i , implies that ∀η > 0 and ∀δ > 0, there exists an n0 such that for all n ≥ n0 , 1 −δ (10.7) P Yj ≥ Cn 1 ≤ j ≤ n − 2 h n−j ! t j 1 −δ K s =P h ≥ Cn 1 ≤ j ≤ n − 1 h j t=1 s=1 ≥ 1 − η This, together with (10.5), yields that, for ∀η > 0 and ∀δ > 0, 1 −δ −δ −δ Yj ≥ Cn 1 ≤ j ≤ n − 2 P(Λn ≥ n ) ≤ η + P Λn ≥ n h n−j n n−2 2 1 xt − xj ut K ≥ Ch2 n1−3δ ≤η+P n − j h j=1 t=1
\le \eta + C h^{-2} n^{3\delta - 1} (\sqrt{n}\, h) \sum_{j=1}^{n-2} \frac{1}{n - j}
≤ η + Ch−1 n3δ−1/2 log n whenever n ≥ n0 . Taking δ = δ0 /4 and recalling hn1/2+δ0 → ∞, we obtain Λn = oP (1). This, together with (10.4), proves (10.2) and also completes the proof of Theorem 3.3. Q.E.D. REFERENCES AI, C., AND X. CHEN (2003): “Efficient Estimation of Models With Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, 71, 1795–1843. [1901] BILLINGSLEY, P. (1968): Convergence of Probability Measures. New York: Wiley. [1920,1923,1926] (1974): “Conditional Distributions and Tightness,” The Annals of Probability, 2, 480–485. [1937] BORODIN, A. N., AND I. A. IBRAGIMOV (1995): Limit Theorems for Functionals of Random Walks. Proceedings of the Steklov Institute of Mathematics, Vol. 195. Providence, RI: American Mathematical Society. [1921,1923] CARRASCO, M., J.-P. FLORENS, AND E. RENAULT (2007): “Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization,” in Handbook of Econometrics, Vol. 6B, ed. by J. Heckman and E. Leamer. Amsterdam: North Holland. [1901] CHAN, N. H., AND C. Z. WEI (1987): “Asymptotic Inference for Nearly Nonstationary AR(1) Process,” The Annals of Statistics, 15, 1050–1063. [1905,1926] ELLIOTT, G. (1998): “On the Robustness of Cointegration Methods When Regression Almost Have Unit Roots,” Econometrica, 66, 149–158. [1905] FLORENS, J.-P. (2003): “Inverse Problems and Structural Econometrics: The Example of Instrumental Variables,” in Advances in Economics and Econometrics: Theory and Applications— Eighth World Congress. Econometric Society Monographs, Vol. 36, ed. by L. P. Hansen, S. J. Turnovsky, and M. Dewatripont. Cambridge, NY: Cambridge University Press. [1901] HALL, P. (1977): “Martingale Invariance Principles,” The Annals of Probability, 5, 875–887. [1920] HALL, P., AND J. L. HOROWITZ (2005): “Nonparametric Methods for Inference in the Presence of Instrumental Variables,” The Annals of Statistics, 33, 2904–2929. [1901,1914,1917] KARLESN, H. A., AND D. TJØSTHEIM (2001): “Nonparametric Estimation in Null Recurrent Time Series,” The Annals of Statistics, 29, 372–416. [1946] KARLESN, H. A., T. MYKLEBUST, AND D. TJØSTHEIM (2007): “Nonparametric Estimation in a Nonlinear Cointegration Type Model,” The Annals of Statistics, 35, 252–299. [1903,1904] NEWEY, W. K., AND J. J. POWELL (2003): “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565–1578. [1901] NEWEY, W. K., J. L. POWELL, AND F. VELLA (1999): “Nonparametric Estimation of Triangular Simultaneous Equations Models,” Econometrica, 67, 565–603. [1901] PARK, J. Y., AND P. C. B. PHILLIPS (2001): “Nonlinear Regressions With Integrated Time Series,” Econometrica, 69, 117–161. [1920] PETROV, V. V. (1995): Limit Theorems of Probability Theory. Sequences of Independent Random Variables. Oxford Studies in Probability, Vol. 4. New York: The Clarendon Press, Oxford University Press. [1929] PHILLIPS, P. C. B. (1987): “Towards a Unified Asymptotic Theory for Autoregression,” Biometrika, 74, 535–547. [1905,1926] (1988): “Regression Theory for Near-Integrated Time Series,” Econometrica, 56, 1021–1044. [1905]
PHILLIPS, P. C. B., AND S. N. DURLAUF (1986): “Multiple Time Series Regression With Integrated Processes,” Review of Economic Studies, 53, 473–495. [1903] PHILLIPS, P. C. B., AND B. E. HANSEN (1990): “Statistical Inference in Instrumental Variables Regression With I(1) Processes,” Review of Economic Studies, 57, 99–125. [1912] SCHIENLE, M. (2008): “Nonparametric Nonstationary Regression,” Unpublished Ph.D. Thesis, University of Mannheim. [1903,1904,1920] STOCK, J. H. (1987): “Asymptotic Properties of Least Squares Estimators of Cointegration Vectors,” Econometrica, 55, 1035–1056. [1903] WANG, Q., AND P. C. B. PHILLIPS (2009): “Asymptotic Theory for Local Time Density Estimation and Nonparametric Cointegrating Regression,” Econometric Theory, 25, 710–738. [1903,1904]
School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia; [email protected]
and
Cowles Foundation, Yale University, Box 208281, New Haven, CT 06520-8281, U.S.A., University of Auckland, Auckland, New Zealand, University of Southampton, Southampton, U.K., and Singapore Management University, Singapore; [email protected].

Manuscript received February, 2008; final revision received February, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1949–1992
COMPARATIVE STATICS, INFORMATIVENESS, AND THE INTERVAL DOMINANCE ORDER BY JOHN K.-H. QUAH AND BRUNO STRULOVICI1 We identify a new way to order functions, called the interval dominance order, that generalizes both the single crossing property and a standard condition used in statistical decision theory. This allows us to provide a unified treatment of the major theorems on monotone comparative statics with and without uncertainty, the comparison of signal informativeness, and a non-Bayesian theorem on the completeness of increasing decision rules. We illustrate the concept and results with various applications, including an application to optimal stopping time problems where the single crossing property is typically violated. KEYWORDS: Single crossing property, interval dominance order, supermodularity, comparative statics, optimal stopping time, capital deepening, complete class theorem, statistical decision theory, informativeness.
1. INTRODUCTION

A PRINCIPAL CONCERN in the theory of monotone comparative statics is the behavior of an optimal solution as the objective function changes.^2 Consider a family of real-valued functions {f(·, s)}s∈S, defined on the domain X ⊆ R and parameterized by s in S ⊆ R. Under what conditions can we guarantee that^3

(1)    arg max_{x∈X} f(x, s″) ≥ arg max_{x∈X} f(x, s′)    whenever s″ > s′?
In an influential paper, Milgrom and Shannon (1994) showed that (1) holds if the family of functions {f(·, s)}s∈S obeys the single crossing property (SCP). Apart from guaranteeing (1), the single crossing property has other features that make it an easily applicable concept for comparative statics, but it is not a necessary condition for (1). Indeed, Milgrom and Shannon showed that it is necessary and sufficient to guarantee that, for all Y ⊆ X,

(2)    arg max_{x∈Y} f(x, s″) ≥ arg max_{x∈Y} f(x, s′)    whenever s″ > s′.
^1 We would like to thank Ian Jewitt for many stimulating conversations. Rabah Amir, Alan Beggs, Eddie Dekel, Juan-Jose Ganuza, Aki Matsui, Paul Milgrom, Leonard Mirman, Andrea Patacconi, Herakles Polemarchakis, Edward Schlee, and Aleksey Tetenov also provided helpful feedback. Part of this research was carried out while John Quah was visiting professor at the National University of Singapore and he would like to thank the Economics Department at NUS for its hospitality and support. Finally, we are very grateful to the editor and three referees for their many insightful comments.

^2 Early contributions to this literature include Topkis (1978), Milgrom and Roberts (1990), Vives (1990), and Milgrom and Shannon (1994). A textbook treatment can be found in Topkis (1998). Ashworth and Bueno de Mesquita (2006) discussed applications in political science.

^3 The sets in (1) are ordered according to the strong set order, which we define in Section 2. It reduces to the standard order on the real numbers when the sets are singletons.
This leaves open the possibility that there may be other useful concepts for comparative statics in situations where the modeler is principally interested in comparative statics on the domain X (rather than all subsets of X). The first objective of this paper is to introduce a new way to order functions that is weaker than the single crossing property but is still sufficient to guarantee (1). We call this new order the interval dominance order (IDO). We show that the family {f(·, s)}s∈S obeys the interval dominance order if and only if, for all intervals Y ⊆ X,

(3)    arg max_{x∈Y} f(x, s″) ≥ arg max_{x∈Y} f(x, s′)    whenever s″ > s′.
It is clear from this characterization that IDO is weaker than SCP.^4 For IDO to be useful in applications, it helps if there is a simple way to check for the property. A sufficient condition for a family of differentiable functions {f(·, s)}s∈S to obey IDO is that, for any two functions f(·, s″) and f(·, s′) with s″ > s′, there is a nondecreasing positive function α (which may depend on s′ and s″) such that

    (df/dx)(x, s″) ≥ α(x)(df/dx)(x, s′).

We give applications where the function α arises naturally.

An important feature of the single crossing property is that it is, in some sense, robust to the introduction of uncertainty. Suppose {f(·, s)}s∈S is an SCP family and interpret s as the state of the world, which is unknown to the agent when he is choosing x. Assuming that the agent is an expected utility maximizer, he will choose x to maximize ∫_{s∈S} f(x, s)λ(s) ds, where λ is his subjective probability over states. Since the optimal choice of x increases with s if s is known, one expects the agent's decision under uncertainty to have the same pattern, that is, the optimally chosen x should increase when higher states are more likely. It turns out that SCP does indeed possess this comparative statics property (see Athey (2002)); formally,

(4)    arg max_{x∈X} ∫_{s∈S} f(x, s)γ(s) ds ≥ arg max_{x∈X} ∫_{s∈S} f(x, s)λ(s) ds

whenever {f(·, s)}s∈S obeys SCP and γ dominates λ by the monotone likelihood ratio.^5 An important feature of the interval dominance order is that, even though it is a weaker property than SCP, it is still robust to the introduction of uncertainty; that is, (4) holds whenever {f(·, s)}s∈S obeys IDO.

^4 Y ⊂ X is an interval of X if any x ∈ X is also in Y whenever there is x′ and x″ in Y such that x′ ≤ x ≤ x″. For example, if X = {1, 2, 3}, then {1, 2} (but not {1, 3}) is an interval of X.

^5 Note that γ dominates λ by the monotone likelihood ratio if γ(s)/λ(s) is increasing in s.
The second objective of this paper is to bridge the gap between the literature on monotone comparative statics and the closely related literature in statistical decision theory on informativeness. In that setting, the agent takes an action x (in X) before the state is known but after observing a signal z (in Z ⊂ R) which conveys information on the true state. The information structure H refers to the family of distribution functions {H(·|s)}s∈S, where H(·|s) is the distribution of z conditional on the state s. Assuming that the agent has a prior given by the density function λ on S, the value of H for the payoff function f is

    V(H, f) ≡ max_{φ∈D} ∫_{s∈S} ∫_{z∈Z} f(φ(z), s) dH(z|s) λ(s) ds,

where D is the set of all decision rules (which are maps from Z to X). So V(H, f) is the agent's ex ante expected payoff from an optimally chosen decision rule. Lehmann identified an intuitive condition under which H can be thought of as more informative than (another information structure) G. He goes on to show that if H is more informative than G, then

(5)    V(H, f) ≥ V(G, f)

whenever f obeys the following property: f(·, s) is a quasiconcave function of the action x and (1) holds. (In other words, the peaks of the quasiconcave functions f(·, s) are moving right with increasing s.) This restriction imposed on f by Lehmann implies that {f(·, s)}s∈S obeys the interval dominance order, but significantly, it need not obey the single crossing property. We extend Lehmann's result by showing that if H is more informative than G, then (5) holds whenever {f(·, s)}s∈S obeys IDO (and thus, in particular, SCP). In this way, we have found a single condition on the payoff function that is useful for both comparative statics and comparative informativeness, so results in one category extend seamlessly into results in the other.^6

^6 Economic applications of Lehmann's concept of informativeness can be found in Persico (2000), Athey and Levin (2001), Levin (2001), Bergemann and Valimaki (2002), and Jewitt (2006). The distinction between Lehmann's restriction on the payoff function and the single crossing property was first highlighted in Jewitt (2006), which also discussed the significance of this distinction in economic applications.

Our final major result uses Lehmann's informativeness theorem to identify conditions under which a decision-maker, including one who is not Bayesian, will pick a decision rule where the action is increasing in the signal. (In statistical terminology, increasing decision rules form an essentially complete class.) Our result generalizes the complete class theorem of Karlin and Rubin (1956), who assumed that the payoff functions obey the same restriction as the one employed by Lehmann; we generalize it to an IDO class of payoff functions.

The paper contains several applications, which serve primarily to illustrate the use of the IDO property and the results relating to it, but may also be of interest in themselves. The following are the main applications. (A1) We show that for any optimal stopping problem, a lower discount rate delays the optimal stopping time and raises the value of the problem. (A2) The IDO property can
also be used to examine a basic issue in optimal growth theory: we show that a lower discount rate leads to capital deepening, that is, the optimal capital stock is higher at all times. Both (A1) and (A2) are shown under general conditions, providing significant extensions to existing results. (A3) We illustrate the use of the informativeness results by applying them to a portfolio problem. Consider a group of investors who pool their funds with a single fund manager; the manager chooses a portfolio consisting of a safe and a risky asset, and each investor's return is proportional to their contribution to the fund. We show that a fund manager who is more informed than another in the sense of Lehmann can choose a portfolio that gives higher ex ante utility to every investor. (A4) Finally, we consider a treatment response problem studied in Manski (2005). We show that our generalization of Karlin and Rubin's complete class theorem allows for more realistic payoff functions and fractional treatment rules.^7

The paper is organized as follows. The next section discusses the interval dominance order and its basic properties. Section 3 is devoted to applications. The IDO property for decision-making under uncertainty is studied in Section 4. In Section 5, we introduce Lehmann's notion of informativeness and extend his result, while Section 6 proves a complete class theorem; Section 7 concludes.

2. THE INTERVAL DOMINANCE ORDER AND COMPARATIVE STATICS

This section introduces the interval dominance order, provides simple ways to check that two functions respect this order, and demonstrates its relevance to monotone comparative statics. We begin with a review of the single crossing property.

2.1. The Single Crossing Property

Let X be a subset of the real line (denoted by R), and let f and g be two real-valued functions defined on X. We say that g dominates f by the single crossing property (which we denote by g ≽_SC f) if, for all x″ and x′ with x″ > x′, the following statement holds:

(6)    f(x″) − f(x′) ≥ (>) 0   ⇒   g(x″) − g(x′) ≥ (>) 0.
A family of real-valued functions {f(·, s)}s∈S, defined on X and parameterized by s in S ⊂ R, is referred to as an SCP family if the functions are ordered by the single crossing property (SCP), that is, whenever s″ > s′, we have f(·, s″) ≽_SC f(·, s′). Note that for any x″ > x′, the function Δ : S → R defined by Δ(s) = f(x″, s) − f(x′, s) crosses the horizontal axis at most once, which gives the motivation for the term "single crossing."

^7 The first two examples are found in Section 3, the third in Section 5, and the last in Section 6.
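When X is a finite grid, the comparison in (6) can be checked mechanically. The following sketch is ours, not part of the paper; the function name is illustrative. It simply scans all pairs x″ > x′ and reports whether g dominates f by SCP.

    from itertools import combinations

    def sc_dominates(f_vals, g_vals):
        """Return True iff g dominates f by the single crossing property (6)
        on a finite grid: for all i < j,
          f[j] - f[i] >= 0  implies  g[j] - g[i] >= 0, and
          f[j] - f[i] >  0  implies  g[j] - g[i] >  0."""
        for i, j in combinations(range(len(f_vals)), 2):
            df = f_vals[j] - f_vals[i]
            dg = g_vals[j] - g_vals[i]
            if (df >= 0 and dg < 0) or (df > 0 and dg <= 0):
                return False
        return True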
The crucial role played by the single crossing property when comparing the solutions to optimization problems was highlighted by Milgrom and Shannon (1994). Since the solution to an optimization problem is not necessarily unique, before we state their result, we must first define an ordering on sets. Let S′ and S″ be two subsets of R. We say that S″ dominates S′ in the strong set order (see Topkis (1998)), and we write S″ ≥ S′, if for any x″ in S″ and x′ in S′, we have max{x′, x″} in S″ and min{x′, x″} in S′.^8 It follows immediately from this definition that if S″ = {x″} and S′ = {x′}, then x″ ≥ x′. More generally, suppose that both sets contain their largest and smallest elements. Then it is clear that the largest (smallest) element in S″ is larger than the largest (smallest) element in S′.^9

THEOREM—Milgrom and Shannon (1994): Suppose that f and g are real-valued functions defined on X ⊂ R. Then arg max_{x∈Y} g(x) ≥ arg max_{x∈Y} f(x) for any Y ⊆ X if and only if g ≽_SC f.^10

Note that the necessity of the single crossing property is obvious since we are requiring monotonicity of the optimal solution for all subsets Y of X. In particular, we can choose Y = {x′, x″}, in which case arg max_{x∈Y} g(x) ≥ arg max_{x∈Y} f(x) implies (6). In fact, SCP is not necessary for monotone comparative statics if we only require arg max_{x∈Y} g(x) ≥ arg max_{x∈Y} f(x) for Y = X or for Y belonging to a particular subcollection of the subsets of X. Consider Figures 1 and 2: in both cases, we have arg max_{x∈X} g(x) ≥ arg max_{x∈X} f(x); furthermore, arg max_{x∈Y} g(x) ≥ arg max_{x∈Y} f(x), where Y is any closed interval contained in X. In Figure 1, SCP is satisfied (specifically, (6) is satisfied), but this is not true in Figure 2. In Figure 2, we have f(x′) = f(x″) but g(x″) < g(x′), violating SCP. This type of violation of SCP can arise naturally in an economic setting, as the following simple example shows. We shall return to this example at various points in the paper to illustrate our results.

EXAMPLE 1: Consider a firm producing some good whose price we assume is fixed at 1 (either because of market conditions or for some regulatory reason). It has to decide on the production capacity (x) of its plant. Assume that a plant with production capacity x costs Dx, where D > 0. Let s be the state of the world, which we identify with the demand for the good. The unit cost of producing the good in state s is c(s). We assume that, for all s, D + c(s) < 1.

^8 Note that this definition of the strong set order makes sense on any lattice (see Topkis (1998)).

^9 Throughout this paper, when we say that something is "greater" or "increasing," we mean to say that it is greater or increasing in the weak sense. Most of the comparisons in this paper are weak, so this convention is convenient. When we are making a strict comparison, we shall say so explicitly, as in "strictly higher," "strictly increasing," and so forth.

^10 Milgrom and Shannon's result is situated in a lattice space. The theorem stated here is its one-dimensional analog.
FIGURE 1.—Illustration of the single-crossing property.
The firm makes its capacity decision before the state of the world is realized and makes its production decision after the state is revealed. Suppose it chooses capacity x and the realized state of the world (and thus realized demand) is s ≥ x. In this case, the firm should produce up to its capacity, so that its profit is Π(x, s) = x − c(s)x − Dx. On the other hand, if s < x, the firm will produce (and sell) s units of the good, giving it a profit of Π(x, s) = s − c(s)s − Dx. It is easy to see that Π(·, s) is increasing linearly for x ≤ s and thereafter declines linearly with a slope of −D. Its maximum is achieved at x = s, with Π(s, s) = (1 − c(s) − D)s. Suppose s″ > s′ and c(s″) > c(s′); in other words, the state with higher demand also has higher unit cost. Then it is possible that Π(s″, s″) < Π(s″, s′); diagrammatically, this means that the peak of the Π(·, s″) curve (achieved at x = s″) lies below the Π(·, s′) curve. If this occurs, we have the situation depicted in Figure 2, with f(·) = Π(·, s′) and g(·) = Π(·, s″).
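A small numerical version of this example may help; the sketch below is ours, the particular numbers are invented and only need to satisfy D + c(s) < 1, and it reuses the sc_dominates helper from the earlier sketch. With a much higher unit cost in the high-demand state, the peak of Π(·, s″) falls below the Π(·, s′) curve, so the single crossing test (6) fails even though the maximizer still moves up with the state.

    def profit(x, s, c, D=0.1):
        """Pi(x, s): sell min(x, s) units at price 1, unit cost c, capacity cost D per unit."""
        return min(x, s) * (1 - c) - D * x

    s_lo, c_lo = 1.0, 0.2          # state s' : low demand, low unit cost
    s_hi, c_hi = 3.0, 0.85         # state s'': high demand, high unit cost (0.85 + 0.1 < 1)

    xs = [i / 100 for i in range(0, 501)]              # candidate capacities
    f = [profit(x, s_lo, c_lo) for x in xs]            # f = Pi(., s')
    g = [profit(x, s_hi, c_hi) for x in xs]            # g = Pi(., s'')

    best_f = xs[max(range(len(f)), key=f.__getitem__)]
    best_g = xs[max(range(len(g)), key=g.__getitem__)]
    print(best_f, best_g)          # 1.0 and 3.0: the optimal capacity rises with the state
    print(sc_dominates(f, g))      # False: across the two peaks, (6) is violated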
2.2. The Interval Dominance Order

FIGURE 2.—Illustration of the interval dominance order, where the single-crossing property is violated.

We wish to find a way to order functions that is useful for monotone comparative statics and also weaker than the single crossing property. In particular, we want the ordering to allow us to say that g dominates f (with respect to this new order) whenever both functions are concave and arg max_{x∈X} g(x) ≥ arg max_{x∈X} f(x) (as in Figure 2). To this end, it is useful to look again at Figure 2 and to notice that violations of (6) can only occur if we compare points x′ and x″ on opposite sides of the maximum point of f. This suggests that a possible way to weaken SCP, while retaining comparative statics, is to require (6) to hold only for a certain collection of pairs {x′, x″}, rather than all possible pairs.

The set J is an interval of X if, whenever x′ and x″ are in J, any element x in X such that x′ ≤ x ≤ x″ is also in J.^11 Let f and g be two real-valued functions defined on X. We say that g dominates f by the interval dominance order (or, for short, g I-dominates f, with the notation g ≽_I f) if

(7)    f(x″) − f(x′) ≥ (>) 0   ⇒   g(x″) − g(x′) ≥ (>) 0

^11 Note that X need not be an interval in the conventional sense, that is, X need not be, using our terminology, an interval of R. Furthermore, the fact that J is an interval of X does not imply that it is an interval of R. For example, if X = {1, 2, 3}, then J = {1, 2} is an interval of X, but of course neither X nor J are intervals of R.
holds for x″ and x′ such that x″ > x′ and f(x″) ≥ f(x) for all x in the interval [x′, x″] = {x ∈ X : x′ ≤ x ≤ x″}. Clearly, the interval dominance order (IDO) is weaker than ordering by SCP. For example, in Figure 2, g I-dominates f, but g does not dominate f by SCP.

Notice that for the function g to I-dominate f, two conditions—arising from the weak and strong inequalities in (7)—have to be satisfied. The strong inequality in (7) implies the following property:

(A) If f is strictly increasing over some interval [x′, x″] in X, then g is also strictly increasing over that interval.

The weak inequality part of (7) guarantees the following property:

(B) If f(x″) ≥ f(x) for x in [x′, x″] and f(x′) = f(x″), then g(x″) ≥ g(x′).

Within a class of well-behaved functions, properties (A) and (B) can be used to characterize the interval dominance order. A real-valued function f defined on an interval X of the real line is piecewise monotone if any compact interval in X may be partitioned into finitely many intervals within which f is either strictly increasing, strictly decreasing, or constant. (In other words, on any compact set, the function f has finitely many turning points or plateaus.) A sufficient condition for a function to be piecewise monotone is that it is real analytic. The next result says that for continuous and piecewise monotone functions, IDO is characterized by (A) and (B).

PROPOSITION 1:^12 Suppose X is an interval of R, and f : X → R is continuous and piecewise monotone. Then g ≽_I f if and only if conditions (A) and (B) (as defined above) are satisfied.

^12 We are very grateful to an anonymous referee for suggesting this result and Proposition 4.

PROOF: It is clear that conditions (A) and (B) are necessary for g ≽_I f. To show that they are sufficient, we need to show that if f(x″) ≥ f(x) for x in [x′, x″] and f(x″) > f(x′), then g(x″) > g(x′). Since f is continuous and piecewise monotone, we may write [x′, x″] = ⋃_{i=0}^{N} [a_i, a_{i+1}], where a_0 = x′ and a_{N+1} = x″, and the intervals [a_i, a_{i+1}] belong to two types: either f is strictly increasing in [a_i, a_{i+1}] or f(x) ≤ f(a_i) = f(a_{i+1}) for x ∈ [a_i, a_{i+1}]. Condition (A) guarantees that g(a_{i+1}) > g(a_i) for intervals of the first type, while condition (B) guarantees that g(a_{i+1}) ≥ g(a_i) for intervals of the second type. Intervals of the first type must exist since f(x″) > f(x′), and so we conclude that g(x″) > g(x′). Q.E.D.

Note that the continuity of f is needed in this result; without it, the partition described in the proof may not be possible. For example, suppose f : [0, 2] → R satisfies the following: f is strictly decreasing in the interval [0, 1), with f(0) = 0, and in the interval [1, 2], we have f(x) = 1. Let g obey g(x) = 0 for all x in [0, 2]. Then f(2) ≥ f(x) for all x in [0, 2] and f(2) > f(0), but g(2) = g(0). So g does not I-dominate f, even though conditions (A) and (B) are satisfied.

2.3. The Comparative Statics Theorem

In visualizing the relationship between f and g, it may be helpful to note that we could rewrite the definition in the following manner: g ≽_I f if and only if

    x″ ∈ arg max_{x∈[x′,x″]} f(x)            x″ ∈ arg max_{x∈[x′,x″]} g(x)
        and                          ⇒           and
    x′ ∉ arg max_{x∈[x′,x″]} f(x)            x′ ∉ arg max_{x∈[x′,x″]} g(x)
In other words, on any interval [x′, x″], if f is maximized at x″, then g is also maximized at x″, and if x″ is the unique maximizer of f in the interval, then g is also uniquely maximized at x″. It is a small step from this observation to the more general conclusion that arg max_{x∈X} g(x) ≥ arg max_{x∈X} f(x) whenever g ≽_I f. This is stated in Theorem 1, which also gives the precise sense in which IDO is necessary for monotone comparative statics. For that part of the result, we impose a mild regularity condition on the objective function. A function f : X → R is said to be regular if arg max_{x∈[x′,x″]} f(x) is nonempty for any points x′ and x″ with x″ > x′. Suppose the set X is such that X ∩ [x′, x″] is always closed, and thus compact, in R (with respect to the Euclidean topology). This is true, for example, if X is finite, if it is closed, or if it is a (not necessarily closed) interval. Then f is regular if it is upper semicontinuous with respect to the relative topology on X.

THEOREM 1: Suppose that f and g are real-valued functions defined on X ⊂ R and g ≽_I f. Then the following property holds:

(∗)    arg max_{x∈J} g(x) ≥ arg max_{x∈J} f(x)    for any interval J of X.

Furthermore, if property (∗) holds and g is regular, then g ≽_I f.

PROOF: Assume that g I-dominates f, and that x″ is in arg max_{x∈J} f(x) and x′ is in arg max_{x∈J} g(x). We need only consider the case where x″ > x′. Since x″ is in arg max_{x∈J} f(x), we have f(x″) ≥ f(x) for all x in [x′, x″] ⊆ J. Since g ≽_I f, we also have g(x″) ≥ g(x′); thus x″ is in arg max_{x∈J} g(x). Furthermore, f(x″) = f(x′), so that x′ is in arg max_{x∈J} f(x). If not, f(x″) > f(x′), which implies (by the fact that g ≽_I f) that g(x″) > g(x′), contradicting the assumption that g is maximized at x′. To prove the other direction, we assume that there is an interval [x′, x″] such that f(x″) ≥ f(x) for all x in [x′, x″]. This means that x″ is in
arg max_{x∈[x′,x″]} f(x). There are two possible violations of IDO. One possibility is that g(x′) > g(x″); in this case, by the regularity of g, the set arg max_{x∈[x′,x″]} g(x) is nonempty but does not contain x″, which violates (∗). Another possible violation of IDO occurs if g(x″) = g(x′) but f(x″) > f(x′). In this case, the set arg max_{x∈[x′,x″]} g(x) either contains x′, which violates (∗) since arg max_{x∈[x′,x″]} f(x) does not contain x′, or it does not contain x″, which also violates (∗). Q.E.D.

2.4. Sufficient Conditions for IDO

While Theorem 1 is a straightforward result that follows easily from our definition of the interval dominance order, it nonetheless provides the basic motivation for the concept. Beyond this, the usefulness of the IDO concept may hinge on whether there are simple ways of checking that one function I-dominates another. The next result provides a sufficient condition for I-dominance; since it is a condition on derivatives, we refer to it as condition (D).

PROPOSITION 2: Suppose that X is an interval of R, the functions f, g : X → R are absolutely continuous on compact intervals in X (and thus f and g are differentiable a.e.), and the following condition holds:

(D) there is an increasing and strictly positive function α : X → R such that g′(x) ≥ α(x)f′(x) for almost all x.

Then g ≽_I f; more specifically, if f(x″) ≥ f(x) for all x ∈ [x′, x″], then

(8)    g(x″) − g(x′) ≥ α(x′)(f(x″) − f(x′)).

Note that the function α in Proposition 2 can be a constant function; if it is a constant ᾱ, then we obtain g(x″) − g(x′) ≥ ᾱ(f(x″) − f(x′)), which implies g ≽_SC f. When α is not constant, the functions f and g in Proposition 2 need not be related by SCP, as the following example shows. Let f : [0, M] → R be a differentiable and quasiconcave function, with f(0) = 0 and a unique maximum at x∗ in (0, M). Let α : [0, M] → R be given by α(x) = 1 for x ≤ x∗ and α(x) = 1 + (x − x∗) for x > x∗. Consider g : [0, M] → R satisfying g(0) = f(0) = 0 with g′(x) = α(x)f′(x) (as in Proposition 2). Then it is clear that g(x) = f(x) for x ≤ x∗ and g(x) < f(x) for x > x∗. In other words, g coincides with f up to the point x = x∗; thereafter, g falls more steeply than f. The function g is also quasiconcave with a unique maximum at x∗ and g I-dominates f (weakly), but g does not dominate f by SCP. To see this, choose x′ and x″ on either side of x∗ such that f(x′) = f(x″). Then we have a violation of (6) since g(x′) = f(x′) = f(x″) > g(x″).^13

^13 If we wish the I-dominance to be strict, define g̃ by g̃′(x) = α(x)f′(x) + ε, where ε is a positive real number. Then g̃ I-dominates f strictly (in the sense that arg max_{[0,M]} g̃(x) > x∗), but for ε sufficiently small, SCP will still be violated.
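The construction in the preceding paragraph is easy to reproduce numerically. The sketch below is ours; the particular f and grid are illustrative. It builds g by integrating g′ = αf′ as in condition (D), then checks the definition (7) directly on the grid: g I-dominates f, while the single crossing comparison across the peak fails.

    def ido_dominates(f_vals, g_vals):
        """Check the interval dominance order (7) on a finite grid: for every
        pair i < j such that f[j] >= f[k] for all k in [i, j],
          f[j] - f[i] >= 0 implies g[j] - g[i] >= 0, and
          f[j] - f[i] >  0 implies g[j] - g[i] >  0."""
        n = len(f_vals)
        for i in range(n):
            running_max = f_vals[i]
            for j in range(i + 1, n):
                running_max = max(running_max, f_vals[j])
                if f_vals[j] < running_max:
                    continue                 # x_j does not maximize f on [x_i, x_j]
                df = f_vals[j] - f_vals[i]
                dg = g_vals[j] - g_vals[i]
                if (df >= 0 and dg < 0) or (df > 0 and dg <= 0):
                    return False
        return True

    M, x_star, n = 2.0, 1.0, 200
    xs = [M * i / n for i in range(n + 1)]
    f = [x * (2 * x_star - x) for x in xs]                       # quasiconcave, peak at x*
    alpha = [1.0 if x <= x_star else 1.0 + (x - x_star) for x in xs]
    g = [0.0]                                                    # g(0) = f(0) = 0
    for k in range(1, n + 1):
        g.append(g[-1] + alpha[k] * (f[k] - f[k - 1]))           # g' = alpha * f'

    print(ido_dominates(f, g))        # True: g I-dominates f
    i, j = 50, 150                    # x' = 0.5 and x'' = 1.5, on opposite sides of x*
    print(f[j] - f[i], g[j] - g[i])   # first ~0, second < 0: (6) fails, so no SCP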
The proof of Proposition 2 follows easily from the following lemma.

LEMMA 1: Suppose [x′, x″] is a compact interval of R, and that α and h are real-valued functions defined on [x′, x″], with h integrable and α increasing (and thus integrable as well). If ∫_x^{x″} h(t) dt ≥ 0 for all x in [x′, x″], then

(9)    ∫_{x′}^{x″} α(t)h(t) dt ≥ α(x′) ∫_{x′}^{x″} h(t) dt.

PROOF: We confine ourselves to the case where α is an increasing and differentiable function. If we can establish (9) for such functions, then we can extend it to all increasing functions α, since any such function can be approximated by an increasing and differentiable function.

The function H(t) ≡ α(t) ∫_t^{x″} h(z) dz is absolutely continuous and thus differentiable a.e. By the fundamental theorem of calculus, H(x″) − H(x′) = ∫_{x′}^{x″} H′(t) dt; furthermore, H(x′) = α(x′) ∫_{x′}^{x″} h(t) dt, H(x″) = 0, and, by the product rule,

(10)    H′(t) = α′(t) ∫_t^{x″} h(z) dz − α(t)h(t).

Therefore,

    −α(x′) ∫_{x′}^{x″} h(t) dt = −H(x′) = ∫_{x′}^{x″} H′(t) dt
                               = ∫_{x′}^{x″} α′(t) (∫_t^{x″} h(z) dz) dt − ∫_{x′}^{x″} α(t)h(t) dt,

where the last equality follows from (10). The term ∫_{x′}^{x″} α′(t)(∫_t^{x″} h(z) dz) dt ≥ 0 by assumption and so (9) follows. Q.E.D.
PROOF OF PROPOSITION 2: Consider x′ and x″ in X such that x″ > x′, and assume that f(x) ≤ f(x″) for all x in [x′, x″]. Since f is absolutely continuous on [x′, x″], f(x″) − f(x) = ∫_x^{x″} f′(t) dt (with an analogous expression for g). We then have

    ∫_{x′}^{x″} g′(t) dt ≥ ∫_{x′}^{x″} α(t)f′(t) dt ≥ α(x′) ∫_{x′}^{x″} f′(t) dt,

where the second inequality follows from Lemma 1. This leads to (8), from which we obtain g(x″) ≥ (>) g(x′) if f(x″) ≥ (>) f(x′). Q.E.D.
It may be helpful to have a version of Proposition 2 for applications where the agent's choice set is discrete. This is stated in the next result. The proof is in the Appendix and uses a discrete version of Lemma 1.

PROPOSITION 3: Suppose that X = {x_1, x_2, ..., x_N}, with x_i < x_{i+1} for i = 1, 2, ..., N − 1. The functions f, g : X → R satisfy g ≽_I f if there is an increasing and positive function α : X → R such that

(11)    g(x_{i+1}) − g(x_i) ≥ α(x_i)(f(x_{i+1}) − f(x_i))    for i = 1, 2, ..., N − 1.
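For completeness, here is a minimal sketch of how condition (11) can be checked on a finite grid. The helper is ours, not from the paper; a candidate α is simply verified rather than searched for, and the names are illustrative.

    def satisfies_condition_11(xs, f_vals, g_vals, alpha):
        """Verify the sufficient condition (11) of Proposition 3:
        alpha is positive and increasing on the grid, and
        g(x_{i+1}) - g(x_i) >= alpha(x_i) * (f(x_{i+1}) - f(x_i)) for every i."""
        a = [alpha(x) for x in xs]
        if any(v <= 0 for v in a) or any(b < c for c, b in zip(a, a[1:])):
            return False
        return all(
            g_vals[i + 1] - g_vals[i] >= a[i] * (f_vals[i + 1] - f_vals[i])
            for i in range(len(xs) - 1)
        )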
While (as we shall see in the next section) Proposition 2 provides a useful way to check for IDO, condition (D) in that proposition is not necessary for IDO. Note that IDO (like SCP) is an ordinal property, so if g ≽_I f, then h ∘ g ≽_I h̃ ∘ f for any strictly increasing functions h and h̃. Consider a differentiable function f defined on the interval [x′, x″] such that f(x) < f(x″) for all x in [x′, x″) and f′(x′) = −1. Now choose a differentiable and strictly increasing function h̄ such that h̄′(f(x′)) = 1 and

(12)    ḡ(x″) − ḡ(x′) < f(x″) − f(x′),

where ḡ = h̄ ∘ f. While ḡ ≽_I f, we claim that ḡ and f are not related by condition (D). Suppose otherwise; then since ḡ′(x′) = −1, f′(x′) = −1, and ḡ′(x) ≥ α(x)f′(x), we must have α(x′) ≥ 1, but in this case (8) will contradict (12). This example also shows that (D) is not an ordinal property: while g = f and f trivially obey (D) (simply choose α ≡ 1), ḡ and f do not obey that condition.

The next result gives a way to weaken condition (D) for the case of analytic functions. Note that the conditions (A) and (B′) in this result are both ordinal in the following sense: if g and f obey (A), then h ∘ g and h̃ ∘ f obey (A), where h and h̃ are differentiable functions with strictly positive derivatives (and the same is true of (B′)).

PROPOSITION 4: Suppose that X is an interval of R and the functions f, g : X → R are analytic. Then sufficient conditions for g ≽_I f are (A) (as defined in Section 2.2) and (B′):

(B′) If f(x″) ≥ f(x) for x in [x′, x″] and f(x′) = f(x″), then

(13)    g′(x′)/f′(x′) ≤ g′(x″)/f′(x″)
whenever f (x ) and f (x ) are nonzero. The proof of Proposition 4 is in the Appendix; it proceeds by showing that (A) and (B ) imply condition (B) in Proposition 1.14 This result generalizes
Proposition 2 in the following sense: suppose that (D) is satisfied, so there is a positive increasing function α such that g (x) ≥ α(x)f (x); then g and f obey (A) and (B ). That they obey (A) is clear. To check that (B ) is satisfied, suppose f (x ) ≥ f (x) for x in [x x ] and f (x ) = f (x ), in which case f (x) ≤ 0 and f (x ) ≥ 0. If f (x ) = 0, then g (x ) ≥ α(x )f (x ) implies that g (x )/f (x ) ≤ α(x ). Similarly, if f (x ) = 0, we have g (x )/f (x ) ≥ α(x ). Inequality (13) follows since α(x ) ≤ α(x ). The next section is devoted to applying the theoretical results obtained so far: we illustrate the use (and usefulness) of the IDO property through a number of simple examples. We shall focus on individual decision problems, though our methods can also be applied in a game-theoretic context.15 Our theory could be fruitfully developed in different directions, some of which we mention in the Conclusion. In this paper, we focus instead on the development of the theory for decision-making under uncertainty. In particular, we show in Section 4 that the interval dominance order shares an important feature with the SCP order: in some precise sense, the order is preserved when one moves from a nonstochastic to a stochastic environment. Section 4 also provides the foundation for the results on informativeness covered in Sections 5 and 6. Readers who are keen on exploring those developments can skip the next section and go straight to Section 4. 3. APPLICATIONS OF THE IDO PROPERTY EXAMPLE 2: A very natural application of Proposition 2 is to the comparative statics of optimal stopping time problems. We consider a simple deterministic problem here; in Quah and Strulovici (2007) we showed that the results in the next proposition extend naturally to a stochastic optimal stopping time problem. x Suppose we are interested in maximizing Vδ (x) = 0 e−δt u(t) dt for x ≥ 0, where δ > 0 and the function u : R+ → R is bounded on compact intervals and is measurable. So x may be interpreted as the stopping time, δ is the discount rate, u(t) is the cash flow or utility of cash flow at time t (which may be positive or negative), and Vδ (x) is the discounted sum of the cash flow (or its utility) when x is the stopping time. We are interested in how the optimal stopping time changes with the discount rate. It seems natural that the optimal stopping time will rise as the discount rate δ falls. This intuition is correct but it cannot be proved by the 14
Note that because f is analytic, it is piecewise monotone and continuous. Monotone comparative statics based on the single crossing property has been used, among other things, to guarantee that a player’s strategy is increasing in the strategy of his opponent (see Milgrom and Roberts (1990)) and, in Bayesian games, to guarantee that a player’s strategy is increasing in the signal he receives (Athey (2001)). An IDO-based theory can serve the same purpose and, since our results are more general, the restrictions on payoff functions to guarantee monotonicity are potentially less stringent. 15
methods of concave maximization since Vδ need not be a quasiconcave function. Indeed, it will have a turning point every time u changes sign, and its local maxima occur when u changes sign from positive to negative. Changing the discount rate does not change the times at which local maxima are achieved, but it potentially changes the time at which the global maximum is achieved, that is, it changes the optimal stopping time. The next result gives the solution to this problem.

PROPOSITION 5: Suppose that δ1 > δ2 > 0. Then the following statements hold: (i) Vδ2 ≽_I Vδ1; (ii) arg max_{x≥0} Vδ2(x) ≥ arg max_{x≥0} Vδ1(x); (iii) max_{x≥0} Vδ2(x) ≥ max_{x≥0} Vδ1(x).

PROOF: The functions Vδ2 and Vδ1 are absolutely continuous and thus differentiable a.e.; moreover,

    V′δ2(x) = e^{−δ2 x} u(x) = e^{(δ1−δ2)x} V′δ1(x).

Note that the function α(x) = exp((δ1 − δ2)x) is increasing, so Vδ2 ≽_I Vδ1 (by Proposition 2) and (ii) follows from Theorem 1. For (iii), let us suppose that Vδ1(x) is maximized at x = x∗. Then for all x in [0, x∗], Vδ1(x) ≤ Vδ1(x∗). Note that Vδ1(0) = Vδ2(0) = 0 and α(0) = 1. Thus, applying the inequality (8) (with x′ = 0 and x″ = x∗), we obtain Vδ2(x∗) ≥ Vδ1(x∗). Since max_{x≥0} Vδ2(x) ≥ Vδ2(x∗), we obtain (iii). Q.E.D.

Arrow and Levhari (1969) gave a version of Proposition 5(iii) (but not (ii)). They required u to be a continuous function; with this assumption, they showed that the value function V̄, defined by V̄(δ) = max_{x≥0} Vδ(x), is right differentiable and has a negative derivative. This result is the crucial step (in their proof) that guarantees the existence of a unique internal rate of return for an investment project, that is, a unique δ∗ such that V̄(δ∗) = 0. It is possible for us to extend and apply Proposition 5 to prove something along these lines, but we shall not do so in this paper.^16

We should point out that we cannot strengthen Proposition 5(i) to say that Vδ2 ≽_SC Vδ1. To see this, suppose u(t) = 1 − t, and choose x′ and x″ (with x′ and x″ smaller and bigger than 1, respectively) such that Vδ1(x′) = Vδ1(x″). It follows from this that the function F, defined by F(δ) = Vδ(x″) − Vδ(x′) = ∫_{x′}^{x″} e^{−δt}(1 − t) dt, satisfies
(14)    F(δ1) = ∫_{x′}^{x″} e^{−δ1 t}(1 − t) dt = 0.
We would like to thank Herakles Polemarchakis for pointing out Arrow and Levhari’s result.
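The conclusions of Proposition 5, and the failure of SCP discussed around (14), are easy to see numerically. The sketch below is ours; the discount rates and grid are illustrative. It takes u(t) = 1 − t, approximates Vδ(x) by a Riemann sum, and confirms that the maximizer stays near x = 1 for both discount rates while the ranking of two particular stopping times x′ and x″ flips.

    import math

    def V(delta, x, n=1000):
        """Riemann-sum approximation of V_delta(x) = int_0^x e^(-delta t)(1 - t) dt."""
        h = x / n
        return sum(math.exp(-delta * (i + 0.5) * h) * (1 - (i + 0.5) * h) * h
                   for i in range(n))

    d1, d2 = 1.0, 0.2                      # delta_1 > delta_2
    xs = [i / 50 for i in range(1, 151)]   # candidate stopping times in (0, 3]
    best1 = max(xs, key=lambda x: V(d1, x))
    best2 = max(xs, key=lambda x: V(d2, x))
    print(best1, best2)                    # both close to 1, where u changes sign

    # Two stopping times on either side of 1 with V_{d1}(x') = V_{d1}(x''):
    xp = 0.5
    xpp = min((x for x in xs if x > 1), key=lambda x: abs(V(d1, x) - V(d1, xp)))
    print(V(d1, xpp) - V(d1, xp))          # ~0 by construction
    print(V(d2, xpp) - V(d2, xp))          # negative: the single crossing test fails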
x Note that F (δ1 ) = − x e−δ1 t t(1 − t) dt > 0 because of (14). Therefore, if δ2 is close to and smaller than δ1 , we have F(δ2 ) < 0; equivalently, Vδ2 (x ) > Vδ2 (x ). So we obtain a violation of SCP.17 Put another way, if u(t) = 1 − t and the only stopping times available to the agent were x and x , then while the agent will be indifferent between them at δ = δ1 , she will strictly prefer x at δ = δ2 , so a lower discount rate leads to earlier stopping. On the other hand, if she can stop at any time x ≥ 0, then it is clear that, whatever the discount rate, it is optimal to stop just before u turns negative, that is, at x = 1. This is reflected in Proposition 5. EXAMPLE 3: Many optimization problems involve trading off costs and benefits. In this case, the objective function has the form Π(x) = B(x) − C(x), where B(x) and C(x) represent the benefit and cost of action x, respectively. The next proposition provides conditions under which Π˜ I-dominates ˜ Π, where Π˜ = B˜ − C. PROPOSITION 6 18 : Suppose that the functions Π and Π˜ are defined on an interval X of R, and that the benefit and cost functions for both are differentiable, with C > 0. Then Π˜ I Π if there exists a positive and increasing function α such that, for all x ∈ X, (i) B˜ (x) ≥ α(x)B (x) and (ii) α(x) ≥ C˜ (x)/C (x). PROOF: Clearly, Π˜ (x) = B˜ (x) − C˜ (x) ≥ α(x)Π (x) if assumptions (i) and (ii) hold. The result then follows from Proposition 2. Q.E.D. If B > 0, conditions (i) and (ii) in this proposition may be succinctly written as C˜ (x) B˜ (x) ≥ α(x) ≥ ; B (x) C (x) in this case, we may choose α to be B˜ /B if this is increasing in x (or C˜ /C if this is increasing in x). In other words, Π˜ I Π if the increase in marginal benefits is proportionately larger than the increase in marginal cost and if (say) the ratio of benefits after and before the change is increasing in x. As an application of Proposition 6, consider the profit maximization problem of a firm, where B(x) = xP(x) is the revenue derived from producing x units of output. Suppose that the inverse demand function changes from P to P˜ 17 We choose u(t) = 1 − t for concreteness. The crucial features of this example remain true for any decreasing function u that crosses the horizontal axis. 18 We would like to thank an anonymous referee for suggesting this more general presentation of an earlier result.
and the cost function changes from C to C̃. By Theorem 1, arg max_{x∈X} Π̃(x) ≥ arg max_{x∈X} Π(x) if Π̃ ≽_I Π. We claim that this condition holds if

(15)    α(x) ≡ P̃(x)/P(x) ≥ C̃′(x)/C′(x)
and α is increasing in x. In other words, the firm will increase its output if the ratio of the inverse demand functions is increasing in x and greater than the ra˜ tio of the marginal costs. To establish this claim, observe that B(x) = α(x)B(x). Differentiating this expression and using the inequality α (x)B(x) ≥ 0, we obtain B˜ (x) ≥ α(x)B (x), so the conditions of Proposition 6 are satisfied.19 EXAMPLE 4: We wish to show, in the context of a model of optimal growth, that lowering the discount rate of the representative agent leads to capital deepening, that is, a higher capital stock at all times. To do this, we consider the general optimal control problem ∞ max U(c k) = e−δs u(c(s) k(s) s) ds 0
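A quick numerical check of this claim is given below; the sketch is ours and the demand and cost functions are invented for illustration. When P̃/P is increasing in x and everywhere above C̃′/C′ on the relevant domain, the maximizer of Π̃ is no smaller than that of Π, as Proposition 6 and (15) predict.

    # Original and new inverse demand, and original and new cost functions.
    def P(x):  return 10 - x
    def Pt(x): return (10 - x) * (1 + 0.1 * x)    # Pt(x)/P(x) = 1 + 0.1x, increasing
    def C(x):  return 2 * x
    def Ct(x): return 2.2 * x                     # Ct'(x)/C'(x) = 1.1 <= 1 + 0.1x for x >= 1

    xs = [1 + i / 100 for i in range(0, 801)]     # restrict to x >= 1 so (15) holds
    Pi  = [x * P(x)  - C(x)  for x in xs]
    Pit = [x * Pt(x) - Ct(x) for x in xs]

    def argmax(vals):
        return xs[max(range(len(vals)), key=vals.__getitem__)]

    print(argmax(Pi), argmax(Pit))                # 4.0 and roughly 5.1: output rises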
˙ subject to (a) k(t) = H(c(t) k(t) t), where c is a vector-valued function (of t) belonging to a certain set and k is scalar-valued, and (b) the initial condition k(0) = k0 . In the context of optimal growth, k(t) is the capital stock at time t, while the control variables c(t) can include a vector of nondurable consumption and a vector of time use (including labor hours). Our result requires no functional form assumptions on H, but as an example, we could have H(c(t) k(t) t) = Q(k(t) c2 (t) t) − c1 (t) − ηk(t), where η ∈ (0 1) is the rate of depreciation, c1 (t) and c2 (t) are the consumption and labor input at time t, respectively, and Q(· t) is the production function (which is allowed to vary with time t). We allow the felicity function u to depend directly on consumption, capital, and time. t We say that capital is beneficial if the optimal value of t12 e−δs u(c(s) k(s) s) ds subject to (a) and the boundary conditions k(t1 ) = k1 and k(t2 ) = k2 is strictly increasing in k1 . In other words, raising the capital stock at t1 strictly increases the utility achieved in the period [t1 t2 ]. Clearly, this is a mild condition which, in essence, is guaranteed if felicity strictly increases with consumption and production strictly increases with capital. This condition is all that is needed to guarantee that a lower discount rate leads to capital deepening.20 19
Note that our argument does not require that P be decreasing in x. This problem is typically treated in a discrete time model using the Bellman equation (see Amir (1996) and Boyd and Becker (1997)). In that context, capital deepening requires additional 20
¯ and (c ˆ are ¯ k) ˆ k) PROPOSITION 7: Suppose that capital is beneficial, and (c ˆ respectively. If solutions to the optimal growth problem at discount rates δ¯ and δ, ˆ ¯ ˆδ < δ, ¯ then k(t) ≥ k(t) for all t ≥ 0. PROOF: Suppose, contrary to the claim in the proposition, that there is a ¯ ) > k(t ˆ ). Let t be the largest t below t such that k(t) ¯ = time t at which k(t ˆ ¯ ˆ k(t); the existence of t is guaranteed by the continuity of k and k and the ¯ ˆ ¯ ) = k(T ˆ ). fact that k(0) = k(0). Let T be the earliest time after t at which k(T ¯ Set T = ∞ if no such time exists. Notice that for t in [t T ], we have k(t) ≥ ˆ ˜ ˜ k) the k(t), with a strict inequality for t in (t T ). For a given t, denote by (c T ¯ path that maximizes t e−δs u(c(s) k(s) s) ds subject to (a) and the boundary ˆ ¯ ). We claim that conditions k(t) = k(t) and k(T ) = k(T T T ¯ ¯ −δs ¯ ˜ ¯ ˜ (16) e u(c(s) k(s) s) ds ≥ e−δs u(c(s) k(s) s) ds t
t T
≥
¯ ˆ ˆ e−δs u(c(s) k(s) s) ds
t
˜ the first inequal˜ k); The second inequality follows from the optimality of (c ity is an equality if t = t and (by the fact that capital is beneficial) is a strict ¯ c) ¯ is the optimal policy at disinequality if t is in (t T ). In short, because (k ¯ count rate δ, any policy that has a lower capital stock at time t in [t T ] and that leads to the same capital stock at time T must accumulate less utility between ¯ t and T (at discount rate δ). Define the function F(· T δ) by T
ˆ ¯ ˆ ¯ F(t T δ) ≡ e−δs u(c(s) k(s) s) − u(c(s) k(s) s) ds t
with
¯ δ ˆ δ = δ
ˆ ¯ = k(t), (ii) eiIn the previous paragraph found t and T such that (i) k(t) ˆ ¯ ther T = ∞ or k(T ) = k(T ), and (iii) for all t in the interval [t T ], we have ¯ ≤ 0, where the inequality is strict for t in (t T ). (Condition (iii) folF(t T δ) lows from (16).) assumptions on the felicity and production functions to ensure that the capital stock in period t + 1 increases with (i) a fall in the discount rate and (ii) an increase in period t’s capital stock. In turn, these two properties guarantee that lowering the discount rate raises the capital stock over all periods. In a continuous time model, the capital stock is a continuous function of time. This property allows us to dispense with the supermodularity-type assumptions made in the discrete time model.
ˆ I-dominates F(· T δ), ¯ Proposition 2 guarantees that the function F(· T δ) ¯ ˆ ( δ− δ)t ˆ =e ¯ With F(T T δ) ¯ = 0, condition (iii) and since Ft (t T δ) Ft (t T δ). ˆ ˆ = 0 Indeed, since the I-dominance property imply that F(t T δ) ≤ F(T T δ) ¯ < 0 for t in (t T ), we obtain the sharper conclusion that F(t T δ) ˆ < F(t T δ) 021 which may be rewritten as T T ˆ ˆ ˆ ¯ ˆ ¯ e−δs u(c(s) k(s) s) ds < e−δs u(c(s) k(s) s) ds t
t
ˆ at the discount rate δˆ means that ˆ k) This is a contradiction: the optimality of (c ¯ if the ¯ k) it cannot accumulate strictly less utility over the interval [t T ] than (c initial and final capital stocks are the same (conditions (i) and (ii)). Q.E.D. 4. THE INTERVAL DOMINANCE ORDER WHEN THE STATE IS UNCERTAIN Let {f (· s)}s∈S be a family of functions parameterized by s in S. Each function f (· s) maps X ⊂ R to R; S is either an interval of R or consists of finitely many points in R. We say this is a quasiconcave family of functions with increasing peaks (QCIP family) if two properties hold. First, every function f (· s) is regular and quasiconcave, where the latter means that f (· s) is maximized in some interval [x∗ (s) x∗ (s)] of X, with f (· s) strictly increasing in the interval {x ∈ X : x ≤ x∗ (s)} and strictly decreasing in the interval {x ∈ X : x ≥ x∗ (s)}. Second, arg maxx∈X f (x s ) ≥ arg maxx∈X f (x s ) whenever s > s . Interpreting s to be the state of the world, consider the problem of an agent who has to choose x under uncertainty, that is, before s is realized. We assume the agent maximizes the expected value of his objective. Formally, when S is an interval, the agent maximizes f (x s)λ(s) ds F(x λ) = s∈S
where λ : S → R is the density function defined over the states of the world. It is natural to think that if the agent considers the higher states to be more likely, then his optimal value of x will increase. Is this true? Note that a QCIP family of functions is an instance of an IDO family, that is, a family of regular functions f (· s) : X → R, with X ⊆ R, such that f (· s ) I-dominates f (· s ) whenever s > s ; therefore, we may also pose the same question in that more general context. One way to formalize the notion that higher states are more likely is via the monotone likelihood ratio (MLR) property. Let λ and γ be two density 21
We can see this by retracing the proof of Lemma 1. In Lemma 1, the inequality (3) is strict if x (a) x h(t) dt > 0 for x in a set with positive measure in (x x ) and (b) α is a strictly increasing ¯ ˆ function. Note that in this application, α(t) = e(δ−δ)t , which is strictly increasing in t.
functions defined on the interval S of R (or, in the case where S is finite, two probability mass functions defined on S) and assume that λ(s) > 0 for s in S. We call γ an MLR shift of λ if γ(s)/λ(s) is increasing in s. For distributional changes of this kind, there are two results that come close, though not quite, to addressing the problem we posed. Ormiston and Schlee (1993) identified some conditions under which an MLR shift will raise the agent’s optimal choice. Among other conditions, they assumed that F(·; λ) is a quasiconcave function. This will hold if all functions in the family {f (· s)}s∈S are concave but will not generally hold if the functions are just quasiconcave. Athey (2002) gave a related result, which says that an MLR shift will lead to a higher optimal choice of x provided {f (· s)}s∈S is an SCP family. As we already pointed out in Example 1, a QCIP family need not be an SCP family. The next result gives the solution to the problem we posed. THEOREM 2: Let {f (· s)}s∈S be an IDO family. Then F(· γ) I F(· λ) if γ is an MLR shift of λ. Consequently, arg maxx∈X F(x γ) ≥ arg maxx∈X F(x λ) Notice that since {f (· s)}s∈S in Theorem 2 is assumed to be an IDO family, we know (from Theorem 1) that arg maxx∈X f (x s ) ≥ arg maxx∈X f (x s ). Thus Theorem 2 guarantees that the comparative statics which holds when s is known also holds when s is unknown, but experiences an MLR shift.2223 The proof of Theorem 2 requires a lemma (Lemma 2 stated below). Its motivation arises from the observation that if g SC f , then for any x > x such that g(x ) − g(x ) ≥ (>) 0, we must also have f (x ) − f (x ) ≥ (>) 0. Lemma 2 is the (less trivial) analog of this observation in the case when g I f . LEMMA 2: Let X be a subset of R, and let f and g be two regular functions defined on X. Then g I f if and only if the following property holds: (M) If g(x ) ≥ g(x) for x in [x x ], then g(x ) − g(x ) ≥ (>) 0
⇒
f (x ) − f (x ) ≥ (>) 0
PROOF: Suppose x < x and g(x ) ≥ g(x) for x in [x x ]. There are two possible ways for property (M) to be violated. One possibility is that f (x ) > f (x ). By regularity, we know that arg maxx∈[x x ] f (x) is nonempty; choosing x∗ in this set, we have f (x∗ ) ≥ f (x) for all x in [x x∗ ], with f (x∗ ) ≥ f (x ) > f (x ). Since g I f , we must have g(x∗ ) > g(x ), which is a contradiction. 22
We echo an observation that was made by Athey (2002) in a similar context. Apart from being interesting in itself, the monotonicity property guaranteed by Theorem 2 can play a crucial role in establishing the existence of a pure strategy equilibrium in certain Bayesian games (see Athey (2001)). 23
The other possible violation of (M) occurs if g(x ) > g(x ) but f (x ) = f (x ). By regularity, we know that arg maxx∈[x x ] f (x) is nonempty, and if f is maximized at x∗ with f (x∗ ) > f (x ), then we are back to the case considered above. So assume that x and x are both in arg maxx∈[x x ] f (x). Since f I g, we must have g(x ) ≥ g(x ), contradicting our initial assumption. So we have shown that (M) holds if g I f . The proof that (M) implies g I f is similar. Q.E.D. PROOF OF THEOREM 2: We shall only consider the case where S is an interval. The case where S is a finite set is left to the Appendix. The proof can be broken down into two distinct steps. Step 1. Denoting sup S by s∗ , we prove the following property: if F(x λ) ≥ F(x λ) for x ∈ [x x ], then s∗ (17) (f (x s) − f (x s))λ(s) ds ≥ 0 for any s˜ ∈ S. s˜
Assume instead that there is s¯ such that s∗ (18) (f (x s) − f (x s))λ(s) ds < 0 s¯
By the regularity of f (· s¯), there is x¯ that maximizes f (· s¯) in [x x ]. We claim ¯ λ), which is a contradiction. To establish this inequality, that F(x λ) < F(x ¯ λ) into the integral above s¯ and that below s¯ , and we break up F(x λ) − F(x examine each in turn. ¯ s¯) ≥ f (x s¯) for all x in [x ¯ x ] and {f (· s)}s∈S is an IDO family, Since f (x ¯ s) ≥ f (x s) for all s ≤ s¯ (using Lemma 2). Thus we also have f (x s¯ ¯ s) − f (x s))λ(s) ds ≥ 0 (19) (f (x s∗
¯ s¯) ≥ f (x s¯) for all x in [x x], ¯ which where s∗ = inf S. Notice also that f (x ¯ s) ≥ f (x s) for all s ≥ s¯ . Aggregating across s, we obtain implies that f (x s∗ ¯ s) − f (x s))λ(s) ds ≥ 0 (20) (f (x s¯
It follows from (18) and (20) that s∗ ¯ s) − f (x s))λ(s) ds (f (x s¯
=
s¯
> 0
s∗
¯ s) − f (x s))λ(s) ds + (f (x
s∗ s¯
(f (x s) − f (x s))λ(s) ds
s∗ ¯ s) − f (x s))λ(s) ds > 0; in other Combining this with (19), we obtain s∗ (f (x ¯ λ) > F(x λ). This completes our proof of (17). words, F(x Step 2. The function H(· λ) : [s∗ s∗ ] → R defined by s˜ H(˜s λ) = (f (x s) − f (x s))λ(s) ds s∗
satisfies H(s∗ λ) ≥ H(˜s λ) for all s˜ in [s∗ s∗ ]; this follows from (17). Defining H(· γ) in an analogous fashion, we also have H (˜s γ) = [γ(s)/λ(s)]H (˜s λ) for s˜ in S. Since γ is an upward MLR shift of λ, the ratio γ(s)/λ(s) is increasing in s. By Proposition 2, H(· γ) I H(· λ). In particular, we have H(s∗ γ) ≥ (>) H(s∗ γ) = 0 if H(s∗ λ) ≥ (>) H(s∗ λ) = 0. Rewriting this, we have F(x γ) ≥(>) F(x γ) if F(x λ) ≥ (>) F(x λ). Q.E.D. It is natural to ask whether the MLR condition in Theorem 2 can be weakened, that is, can we obtain F(· γ) I F(· λ) for every IDO family without requiring an MLR shift from λ to γ? The following proposition shows, in the context where S is finite, that the answer is “No.” Indeed, it makes a stronger claim: MLR shifts are necessary as long as one requires monotone comparative statics for every SCP family. PROPOSITION 8: Let γ and λ be two probability mass functions defined on S = {1 2 N} and suppose that γ is not an MLR shift of λ. Then there is an SCP family {f (· i)}i∈S with arg maxx∈X F(x γ) < arg maxx∈X F(x λ). The proof of Proposition 8 is in the Appendix. We end this section with two simple applications of Theorem 2. EXAMPLE 1—Continued: Recall that in state s, the firm’s profit is Π(x s). It achieves its maximum at x∗ (s) = s with Π(s s) = (1 − c(s) − D)s, which is strictly positive by assumption. The firm has to choose its capacity before the state of the world is realized; we assume that s is drawn from S, an interval in R, and has a distribution given by the density function λ : S → R. We can think of the firm as maximizing its expected profit, which is Π(x s)λ(s) S ds, or, more generally, its expected utility from profit, which is U(x λ) = S u(Π(x s) s)λ(s) ds, where, for each s, the function u(· s) : R → R is strictly increasing. The family {u(Π(· s) s)}s∈S is a QCIP, hence IDO, family. By Theorem 2, we know that an upward MLR shift of the density function will lead the firm to choose a greater capacity. EXAMPLE 5: Consider a firm that has to decide on when to launch a new product. The more time the firm gives itself, the more it can improve the quality of the product and its manufacturing process, but it also knows that there is a rival about to launch a similar product. If it is not anticipated by
its rival, the firm’s profit is a continuous and strictly increasing function of time φ : R+ → R+ . If the rival launches its product at time s, then the firm’s profit has a step fall at s and thereafter declines strictly with time. Formally, the firm’s profit function in state s, denoted by π(· s), satisfies π(t s) = φ(t) for t ≤ s and π(t s) = w(t s) for t > s, where w is strictly decreasing in t and limt→s+ w(t s) < φ(s). The firm decides on the launch date t by maximizing Π(t λ) = s∈S π(t s)λ(s) ds, where λ : R+ → R is the density function over s. It is clear that each π(· s) is a quasiconcave function and {π(· s)}s∈S is a QCIP (hence IDO) family, though Π(· λ) need not be a quasiconcave function (of t). By Theorem 2, if the firm thinks that it is less likely that the rival will launch early, in the sense that there is an MLR shift in the density function, then it will decide on a later launch date. 5. COMPARING INFORMATION STRUCTURES24 Consider an agent who, as in the previous section, has to make a decision before the state of the world (s) is realized, where the set of possible states S is a subset of R. Suppose that, before he makes his decision, the agent observes a signal z. This signal is potentially informative of the true state of the world; we refer to the collection {H(·|s)}s∈S , where H(·|s) is the distribution of the signal z conditional on s, as the information structure of the decision-maker’s problem. (Whenever convenient, we shall simply call this information structure H.) We assume that, for every s, H(·|s) admits a density function and has the compact interval Z ⊂ R as its support. We say that H is MLR-ordered if the density function of H(·|s ) is an MLR shift of the density function of H(·|s ) whenever s > s . We assume that the agent has a prior distribution Λ on S. We allow either of the following: (i) S is a compact interval and Λ admits a density function with S as its support or (ii) S is finite and Λ has S as its support. The agent’s decision rule (under H) is a measurable map from Z to the set of actions X (contained in R). Denoting the utility of action x in state s by u(x s), the decision rule φ : Z → X gives the ex ante expected utility U (φ H Λ) = u(φ(z) s) dH(z|s) dΛ(s) S Z We denote the agent’s posterior distribution (on S) upon observing z by ΛzH . The ex ante utility of an agent with the decision rule φ : Z → X may be equivalently written as z U (φ H Λ) = u(φ(z) s) dΛH dMHΛ z∈Z
s∈S
where MHΛ is the marginal distribution of z. A decision rule φˆ : Z → X that maximizes the agent’s (posterior) expected utility at each realized signal is 24 We are very grateful to Ian Jewitt for introducing us to the literature in this section, and the next, and for extensive discussions.
STATICS, INFORMATIVENESS, AND THE INTERVAL DOMINANCE
1971
called an H-optimal decision rule. We assume that X is compact and that {u(· s)}s∈S is an equicontinuous family. This guarantees that the map from x to u(x s) dΛzH (s) is continuous, so the problem maxx∈X S u(x s) dΛzH (s) has a S solution at every value of z; in other words, φˆ exists. The agent’s ex ante utility using such a rule is denoted by V (H Λ u). Consider now an alternative information structure given by G = {G(·|s)}s∈S ; we assume that G(·|s) admits a density function and has the compact interval Z as its support. What conditions will guarantee that the information structure H is more favorable than G in the sense of offering the agent a higher ex ante utility; in other words, how can we guarantee that V (H Λ u) ≥ V (G Λ u)? It is well known that this holds if H is more informative than G according to the criterion developed by Blackwell (1953); furthermore, this criterion is also necessary if one does not impose significant restrictions on u (see Blackwell (1953) or, for a recent textbook treatment, Gollier (2001)). We wish instead to consider the case where a significant restriction is imposed on u; specifically, we assume that {u(· s)}s∈S is an IDO family. We show that, in this context, a different notion of informativeness due to Lehmann (1988) is the appropriate concept.25 Our assumptions on H guarantee that, for any s, H(·|s) admits a density function with support Z; therefore, for any (z s) in Z × S, there exists a unique element in Z, which we denote by T (z s), such that H(T (z s)|s) = G(z|s). We say that H is more accurate than G if T is an increasing function of s.26 To gain some intuition on why this is a natural definition, note that since the signals under H and G do not interact, there is no loss of generality if we imagine that the signals are monotonically related to one another. With that convenient point of view, Lehmann’s condition says that for a given signal from G, the higher is the state s, the higher is the signal under H. Thus H gives more information. It is of course crucial that we are in an ordered context where “higher” and “lower” make sense. Our goal in this section is to prove the following result. THEOREM 3: Suppose {u(· s)}s∈S is an IDO family, G is MLR-ordered, and Λ is the agent’s prior distribution on S. If H is more accurate than G, then (21)
V(H, Λ, u) ≥ V(G, Λ, u).
25 Jewitt (2006) gave the precise sense in which Lehmann’s concept is weaker than Blackwell’s (see also Lehmann (1988) and Persico (2000)) and also discussed its relationship with the concept of concordance. Some papers with economic applications of Lehmann’s concept of informativeness are Persico (2000), Athey and Levin (2001), Levin (2001), Bergemann and Valimaki (2002), and Jewitt (2006). Athey and Levin (2001) explored other related concepts of informativeness and their relationship with the payoff functions. The manner in which these papers are related to Lehmann’s (1988) result and to each other is not straightforward; Jewitt (2006) provided an overview. 26 The concept is Lehmann’s; the term accuracy follows Persico (2000).
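Lehmann accuracy is straightforward to check in parametric examples. The sketch below is ours, not from the paper; it assumes two normal location experiments (which are MLR-ordered), computes T(z, s) as the H(·|s)-quantile of G(z|s) on a grid, and verifies that T is increasing in s whenever the H-signal is less noisy.

```python
# Illustrative check of Lehmann accuracy for two normal location experiments
# (our own toy example): under G the signal is z ~ N(s, sig_G^2), under H it is
# z ~ N(s, sig_H^2) with sig_H < sig_G, so H should be more accurate than G.
import numpy as np
from scipy.stats import norm

sig_G, sig_H = 2.0, 1.0                     # H is the sharper experiment
states = np.linspace(-1.0, 1.0, 21)
signals = np.linspace(-3.0, 3.0, 25)

def T(z, s):
    """Unique t with H(t|s) = G(z|s); in closed form t = s + (sig_H/sig_G)(z - s)."""
    return norm.ppf(norm.cdf(z, loc=s, scale=sig_G), loc=s, scale=sig_H)

# For each fixed z, T(z, .) should be increasing in s when H is more accurate than G.
grid = np.array([[T(z, s) for s in states] for z in signals])
print("T(z, .) increasing in s for every z:",
      bool(np.all(np.diff(grid, axis=1) >= -1e-12)))
# Swapping sig_G and sig_H makes the check fail, as expected.
```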
This theorem generalizes a number of earlier results. Lehmann (1988) established a special case of Theorem 3 in which {u(· s)}s∈S is a QCIP family. Persico (1996) gave a version of Theorem 3 in which {u(· s)}s∈S is an SCP family, but he required the optimal decision rule to vary smoothly with the signal, a property that is not generally true without the sufficiency of the first-order conditions for optimality. Jewitt (2006) proved Theorem 3 for the general SCP case.27 To prove Theorem 3, we first note that if G is MLR-ordered, then the family of posterior distributions {ΛzG }z∈Z is also MLR-ordered, that is, if z > z , then ΛzG is an MLR shift of ΛzG . This result follows from Bayes’ rule, which tells us that (22)
λ_G(s|z'') / λ_G(s|z') = [m(z') / m(z'')] · [g(z''|s) / g(z'|s)],
where λ_G(·|z) is the density function of Λ_G^z (or, in the case where S is finite, the probability mass function), g(·|s) is the density function of G(·|s), and m is the density function of the marginal distribution over z. When G is MLR-ordered and z'' > z', the right hand side of this equation is increasing in s, so the left hand side is also increasing in s and thus {Λ_G^z}z∈Z is MLR-ordered.28
Since {u(·, s)}s∈S is an IDO family, Theorem 2 guarantees that there exists a G-optimal decision rule that is increasing in z. Therefore, Theorem 3 is valid if we can show that for any increasing decision rule ψ : Z → X under G, there is a rule φ : Z → X under H that gives a higher ex ante utility; that is,

∫_S ∫_Z u(φ(z), s) dH(z|s) dΛ(s) ≥ ∫_S ∫_Z u(ψ(z), s) dG(z|s) dΛ(s).

This inequality in turn follows from aggregating (across s) the inequality (23) below.

PROPOSITION 9: Suppose {u(·, s)}s∈S is an IDO family and H is more accurate than G. Then for any increasing decision rule ψ : Z → X under G, there is an increasing decision rule φ : Z → X under H such that, at each state s, the distribution of utility induced by φ and H(·|s) first-order stochastically dominates the distribution of utility induced by ψ and G(·|s). Consequently, at each state s,

(23)  ∫_Z u(φ(z), s) dH(z|s) ≥ ∫_Z u(ψ(z), s) dG(z|s).
27 However, there is a sense in which it is incorrect to say that Theorem 3 generalizes Lehmann’s result. The criterion employed by us here (and indeed by Persico (1996) and Jewitt (2006) as well)—comparing information structures with the ex ante utility—is less stringent than the criterion Lehmann used. In the next section we shall compare information structures using precisely the same criterion as Lehmann and prove a result (Corollary 1) that is stronger than Theorem 3. 28 A discussion of results closely related to this can be found in Milgrom (1981).
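The step from an MLR-ordered signal family to MLR-ordered posteriors, equation (22), can also be illustrated numerically. The following sketch is ours (a finite state space and normal signal densities, which are MLR-ordered in the state); it checks that the posterior ratio λ_G(s|z'')/λ_G(s|z') is increasing in s for z'' > z'.

```python
# Numerical illustration of equation (22): MLR-ordered signals give MLR-ordered
# posteriors. Finite states, normal signal densities (our own toy specification).
import numpy as np
from scipy.stats import norm

states = np.array([0.0, 0.5, 1.0, 1.5])        # S
prior  = np.array([0.4, 0.3, 0.2, 0.1])        # Λ, any full-support prior works

def posterior(z, sigma=1.0):
    """Posterior λ_G(.|z) by Bayes' rule with g(z|s) the N(s, sigma^2) density."""
    likes = norm.pdf(z, loc=states, scale=sigma)
    w = likes * prior
    return w / w.sum()

z_lo, z_hi = 0.2, 0.9                          # z' < z''
ratio = posterior(z_hi) / posterior(z_lo)      # λ_G(s|z'') / λ_G(s|z')
print("posterior ratio over s:", np.round(ratio, 3))
print("increasing in s (MLR shift):", bool(np.all(np.diff(ratio) >= 0)))
```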
(At a given state s, a decision rule ρ and a distribution on z induce a distribution of utility in the following sense: for any measurable set U of R, the probability of {u ∈ U} equals the probability of {z ∈ Z : u(ρ(z) s) ∈ U}. So it is meaningful to refer, as this proposition does, to the distribution of utility at each s.) Our proof of Proposition 9 requires the following lemma. LEMMA 3: Suppose {u(· s)}s∈S is an IDO family and H is more accurate than G. Then for any increasing decision rule ψ : Z → X under G, there is an increasing decision rule φ : Z → X under H such that, for all (z s),
(24) u φ(T (z s)) s ≥ u(ψ(z) s) PROOF: We shall only demonstrate here how we construct φ from ψ in the case where ψ takes only finitely many values. This is true, in particular, when the set of actions X is finite. The extension to the case where the range of ψ is infinite is shown in the Appendix; the proof that φ is increasing is also postponed to the Appendix. The proof below assumes that S is a compact interval, but it can be modified in an obvious way to deal with the case where S is finite. The proof is divided into two steps. In Step 1, the problem of finding φ will be reformulated in a way that makes it easier to solve. In Step 2, we provide an explicit construction of φ. Step 1. For every t¯ in Z and s in S, there is a unique z¯ in Z such that ¯ s). This follows from the fact that G(·|s) is a strictly increasing cont¯ = T (z tinuous function (since it admits a density function with support Z). We write z¯ = τ(t¯ s); note that because T is increasing in both its arguments, the function τ : Z × S → Z is decreasing in s. Given this, the problem of finding φ obeying (24) is equivalent to the problem of finding φ such that, for all (t s),
(25) u(φ(t) s) ≥ u ψ(τ(t s)) s Step 2. We now show how, for a given value of t, we may construct φ(t) to obey (25). Note that because τ is decreasing and ψ is increasing, the function ψ(τ(t ·)) is decreasing in s. This, together with our assumption that ψ takes finitely many values, allow us to partition S = [s∗ s∗∗ ] into the sets S1 , S2 SM , where M is odd, with the following properties: (i) if m > n, then any element in Sm is greater than any element in Sn ; (ii) whenever m is odd, Sm is a singleton, with S1 = {s∗ } and SM = {s∗∗ }; (iii) when m is even, Sm is an open interval; (iv) for any s and s in Sm , we have ψ(τ(t s )) = ψ(τ(t s )); (v) for s in Sm and s in Sn such that m > n, ψ(τ(t s )) ≥ ψ(τ(t s )). In other words, we have partitioned S into finitely many sets, so that within each set, ψ(τ(t ·)) takes the same value. Denoting ψ(τ(t s)) for s in Sm by ψm , (v) says that ψ1 ≥ ψ2 ≥ ψ3 ≥ · · · ≥ ψM . Establishing (25) involves finding x∗ such that (26)
u(x∗, sm) ≥ u(ψm, sm) for any sm ∈ Sm (m = 1, 2, …, M)
and then setting x∗ = φ(t). Indeed such an x∗ can always be constructed given any decreasing sequence of actions {ψm }1≤m≤M . (Loosely speaking, the reason for this is that optimality requires actions to increase with the state, while this sequence is moving in the wrong direction.) In the interval [ψ2 ψ1 ], we pick the largest action φˆ 2 that maximizes u(· s∗ ) in that interval. This exists because u(· s∗ ) is continuous and X ∩ [ψ2 ψ1 ] is compact. By the IDO property, (27)
u(φ̂2, sm) ≥ u(ψm, sm) for any sm ∈ Sm (m = 1, 2)
Recall that S3 is a singleton; we call that element s3 . The action φˆ 4 is chosen to be the largest action in [ψ4 φˆ 2 ] that maximizes u(· s3 ). Since ψ3 is in that interval, we have u(φˆ 4 s3 ) ≥ u(ψ3 s3 ). Since u(φˆ 4 s3 ) ≥ u(ψ4 s3 ), the IDO property guarantees that u(φˆ 4 s4 ) ≥ u(ψ4 s4 ) for any s4 in S4 . Using the IDO property again (specifically, Lemma 2), we have u(φˆ 4 sm ) ≥ u(φˆ 2 sm ) for sm in Sm (m = 1 2) since u(φˆ 4 s3 ) ≥ u(φˆ 2 s3 ). Combining this with (27), we have found φˆ 4 in [ψ4 φˆ 2 ] such that (28)
u(φ̂4, sm) ≥ u(ψm, sm) for any sm ∈ Sm (m = 1, 2, 3, 4)
We can repeat the procedure finitely many times, at each stage choosing φˆ m+1 (for m odd) as the largest element maximizing u(· sm ) in the interval [ψm+1 φˆ m−1 ] and, finally, choosing φˆ M+1 as the largest element maximizing u(· s∗∗ ) in the interval [ψM φˆ M−1 ]. It is clear that φ(t) = φˆ M will satisfy (26). Q.E.D. PROOF OF PROPOSITION 9: Let z˜ denote the random signal received under information structure G and let u˜ G denote the (random) utility achieved when the decision rule ψ is used. Correspondingly, we denote the random signal received under H by t˜, with u˜ H denoting the utility achieved by the rule φ, as constructed in Lemma 3. Observe that for any fixed utility level u and at a given state s ,
Pr[ũ_H ≤ u′ | s = s′] = Pr[u(φ(t̃), s′) ≤ u′ | s = s′]
= Pr[u(φ(T(z̃, s′)), s′) ≤ u′ | s = s′]
≤ Pr[u(ψ(z̃), s′) ≤ u′ | s = s′] = Pr[ũ_G ≤ u′ | s = s′],
where the second equality comes from the fact that, conditional on s = s′, the distribution of t̃ coincides with that of T(z̃, s′), and the inequality comes from the fact that u(φ(T(z, s′)), s′) ≥ u(ψ(z), s′) for all z (by Lemma 3).
Finally, the fact that, given the state, the conditional distribution of u˜ H firstorder stochastically dominates u˜ G means that the conditional mean of u˜ H must Q.E.D. also be higher than that of u˜ G . EXAMPLE 1—Continued: As a simple application of Theorem 3, we return again to this example (previously discussed in Sections 2 and 4), where a firm has to decide on its production capacity before the state of the world is realized. Recall that the profit functions {Π(· s)}s∈S form an IDO (though not necessarily SCP) family. Suppose that before it makes its decision, the firm receives a signal z from the information structure G. Provided G is MLR-ordered, we know that the posterior distributions (on S) will also be MLR-ordered (in z). It follows from Theorem 2 that a higher signal will cause the firm to decide on a higher capacity. Assuming the firm is risk neutral, its ex ante expected profit is V (G Λ Π), where Λ is the firm’s prior on S. Theorem 3 tells us that a more accurate information structure H will lead to a higher ex ante expected profit; the difference V (H Λ Π) − V (G Λ Π) represents what the firm is willing to spend for the more accurate information structure. It is worth pointing out that our use of Proposition 9 to prove Theorem 3 (via (23)) does not fully exploit the property of first order stochastic dominance that Proposition 9 obtains. In our next application, this stronger conclusion is crucial. EXAMPLE 6: There are N investors, with investor i having wealth wi > 0 and the strictly increasing Bernoulli utility function vi . These investors place their wealth with a manager who decides on an investment policy; specifically, the N manager allocates the total pool of funds W = i=1 wi between a risky asset with return s in state s and a safe asset with return r > 0. Denoting the fraction invested in the risky asset by x, investor i’s utility (as a function of x and s) is given by ui (x s) = vi ((xs + (1 − x)r)wi ). It is easy to see that {ui (· s)}s∈S is an IDO family. Indeed, it is also an SCP and a QCIP family: for s > r, ui (· s) is strictly increasing in x; for s = r, ui (· r) is the constant vi (rwi ); and for s < r, ui (· s) is strictly decreasing in x. Before she makes her portfolio decision, the manager receives a signal z from some information structure G. She employs the decision rule ψ, where ψ(z) (in [0 1]) is the fraction of W invested in the risky asset. We assume that ψ is increasing in the signal; in the next section we shall give the precise sense in which this restriction involves no loss of generality. Suppose that the manager now has access to a superior information structure H. By Proposition 9, there is an increasing decision rule φ under H such that, at any state s, the distribution of investor k’s utility under H and φ first-order stochastically dominates the distribution of k’s utility under G and ψ. In particular, (23) holds for u = uk . Aggregating across states, we obtain Uk (φ H Λk ) ≥ Uk (ψ G Λk ), where Λk is investor k’s (subjective) prior; in other words, k’s ex ante utility is higher with the new information structure and the new decision rule.
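Before continuing with the example, a quick numerical check of the claim that {ui(·, s)}s∈S is single crossing (hence IDO) in this portfolio problem; the Bernoulli utility, wealth, and safe return below are our own illustrative choices, not from the paper.

```python
# Check of the single-crossing structure in the portfolio example: with
# u_i(x, s) = v_i((x*s + (1-x)*r) * w_i), the payoff is increasing in x when
# s > r, constant when s = r, and decreasing when s < r.
import numpy as np

r, w_i = 1.05, 100.0
v_i = np.log                                   # a strictly increasing Bernoulli utility
x_grid = np.linspace(0.0, 1.0, 101)

def u(x, s):
    return v_i((x * s + (1.0 - x) * r) * w_i)

for s in [0.9, 1.05, 1.3]:
    d = np.diff(u(x_grid, s))
    if np.allclose(d, 0):
        trend = "constant"
    elif np.all(d >= 0):
        trend = "increasing"
    elif np.all(d <= 0):
        trend = "decreasing"
    else:
        trend = "non-monotone"
    print(f"s = {s:4.2f}: u(., s) is {trend} in x")
```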
But even more can be said, because, for any other investor i, ui(·, s) is a strictly increasing transformation of uk(·, s), that is, there is a strictly increasing function f such that ui = f ◦ uk. It follows from Proposition 9 that (23) is true, not just for u = uk, but for u = ui. Aggregating across states, we obtain Ui(φ, H, Λi) ≥ Ui(ψ, G, Λi), where Λi is investor i's prior. To summarize, we have shown the following: although different investors may have different attitudes toward risk and different priors, the greater accuracy of H compared to G allows the manager to implement a new decision rule that gives greater ex ante utility to every investor.
Finally, we turn to the following question: How important is the accuracy criterion to the results of this section? For example, Theorem 3 tells us that when H is more accurate than G, it gives the agent a higher ex ante utility for any prior that he may have on S. This raises the possibility that the accuracy criterion may be weakened if we only wish H to give a higher ex ante utility than G for a particular prior. However, this is not the case, as the next result shows.

PROPOSITION 10: Let S be finite, and let H and G be two information structures on S. If (21) holds at a given prior Λ* which has S as its support and for any SCP family {u(·, s)}s∈S, then (21) holds at any prior Λ which has S as its support and for any SCP family.

PROOF: For the distribution Λ* (Λ) we denote the probability of state s by λ*(s) (λ(s)). Given the SCP family {u(·, s)}s∈S, define the family {ũ(·, s)}s∈S by

ũ(x, s) = [λ(s)/λ*(s)] u(x, s).

The ex ante utility of the decision rule φ under H when the agent's utility is ũ may be written as

Ũ(φ, H, Λ*) = Σ_{s∈S} λ*(s) ∫_Z ũ(φ(z), s) dH(z|s).

Clearly,

U(φ, H, Λ) ≡ Σ_{s∈S} λ(s) ∫_Z u(φ(z), s) dH(z|s) = Ũ(φ, H, Λ*).

From this, we conclude that

(29)  V(H, Λ, u) = V(H, Λ*, ũ).

Crucially, the fact that {u(·, s)}s∈S is an SCP family guarantees that {ũ(·, s)}s∈S is also an SCP family. By assumption, V(H, Λ*, ũ) ≥ V(G, Λ*, ũ). Applying (29) to both sides of this inequality, we obtain V(H, Λ, u) ≥ V(G, Λ, u). Q.E.D.
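The reweighting step in this proof is easy to verify numerically. The sketch below is ours (arbitrary finite states, signals, payoffs, and priors); it checks the identity (29), V(H, Λ, u) = V(H, Λ*, ũ), by computing both values by brute force over signal-by-signal optimal actions.

```python
# Numerical check of the reweighting identity (29) used in the proof of
# Proposition 10: with u~(x, s) = [λ(s)/λ*(s)] u(x, s), V(H, Λ, u) = V(H, Λ*, u~).
# States, signals, payoffs and priors below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_s, n_z, n_x = 4, 6, 5
H = rng.dirichlet(np.ones(n_z), size=n_s)      # H[s, z] = Pr(z | s), rows sum to 1
u = rng.normal(size=(n_x, n_s))                # u[x, s], an arbitrary payoff family
lam      = np.array([0.1, 0.2, 0.3, 0.4])      # Λ
lam_star = np.array([0.25, 0.25, 0.25, 0.25])  # Λ*

def value(prior, payoff):
    # V = sum_z max_x sum_s prior(s) H(z|s) payoff(x, s)
    joint = prior[None, :] * H.T               # joint[z, s]
    return np.einsum('zs,xs->zx', joint, payoff).max(axis=1).sum()

u_tilde = (lam / lam_star)[None, :] * u
print(value(lam, u), value(lam_star, u_tilde))  # the two numbers coincide
```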
Loosely speaking, this result says that if we wish to have ex ante utility comparability for any SCP family (or, even more strongly, any IDO family), then fixing the prior does not lead to a weaker criterion of informativeness. A weaker criterion can only be obtained if we fix the prior and require ex ante utility comparability for a smaller class of utility families.29
To construct a converse to Theorem 3, we assume that there are two states and two actions, and that the actions are nonordered with respect to u in the sense that x1 is the better action in state s1 and x2 is the better action in s2 (i.e., u(x1, s1) > u(x2, s1) and u(x1, s2) < u(x2, s2)). This condition guarantees that information on the state is potentially useful; if it does not hold, the decision problem is clearly trivial since either x1 or x2 will be unambiguously superior to the other action. Note also that the family {u(·, s1), u(·, s2)} is an IDO family. We have the following result.

PROPOSITION 11: Suppose that S = {s1, s2}, X = {x1, x2}, and that the actions are nonordered with respect to u. If H is MLR-ordered and not more accurate than G, then there is a prior Λ̄ on S such that V(H, Λ̄, u) < V(G, Λ̄, u).

PROOF: Since H is not more accurate than G, there are z̄ and t̄ such that

(30)  G(z̄|s1) = H(t̄|s1) and G(z̄|s2) < H(t̄|s2).
Given any prior Λ and with the information structure H, we may work out the posterior distribution and the posterior expected utility of any action after receipt of a signal. We claim that there is a prior Λ̄ such that action x1 maximizes the agent's posterior expected utility after he receives the signal z < t̄ (under H) and action x2 maximizes the agent's posterior expected utility after he receives the signal z ≥ t̄. This result follows from the assumption that H is MLR-ordered and is proved in the Appendix. Therefore, the decision rule φ such that φ(z) = x1 for z < t̄ and φ(z) = x2 for z ≥ t̄ maximizes the agent's ex ante utility, that is,

V(H, Λ̄, u) = U(φ, H, Λ̄)
= λ̄(s1){u(x1, s1)H(t̄|s1) + u(x2, s1)[1 − H(t̄|s1)]}
+ λ̄(s2){u(x1, s2)H(t̄|s2) + u(x2, s2)[1 − H(t̄|s2)]}.

Now consider the decision rule ψ under G given by ψ(z) = x1 for z < z̄ and ψ(z) = x2 for z ≥ z̄. We have

U(ψ, G, Λ̄) = λ̄(s1){u(x1, s1)G(z̄|s1) + u(x2, s1)[1 − G(z̄|s1)]}
+ λ̄(s2){u(x1, s2)G(z̄|s2) + u(x2, s2)[1 − G(z̄|s2)]}.

29 This possibility was explored by Athey and Levin (2001).
Comparing the expressions for U(ψ, G, Λ̄) and U(φ, H, Λ̄), and bearing in mind (30) and the fact that x2 is the optimal action in state s2, we see that

U(ψ, G, Λ̄) > U(φ, H, Λ̄) = V(H, Λ̄, u).

Therefore, V(G, Λ̄, u) > V(H, Λ̄, u). Q.E.D.
6. THE COMPLETENESS OF INCREASING DECISION RULES The model of information we constructed in the last section is drawn from statistical decision theory.30 In the context of statistical decisions, the agent is a statistician conducting an experiment, the signal is the random outcome of the experiment, and the state of the world should be interpreted as a parameter of the model being considered by the statistician. In the previous section, we identified conditions under which a Bayesian statistician (i.e., a statistician who has a prior on the states of the world) will pick an increasing decision rule. Our objective in this section is to strengthen that conclusion: we show that, under the same conditions, statisticians who use other criteria to choose their decision rule may also confine their attention to increasing decision rules. This conclusion follows from a natural application of the informativeness results of the previous section (in particular, Proposition 9). Consider an information structure G = {G(·|s)}s∈S ; as in the previous section, we assume that the signal z is drawn from a compact interval Z and that its distribution G(·|s) admits a density function with Z as its support. The set of possible states, S, may either be a compact interval or a finite set of points. Unlike the previous section, we do not assume that the statistician has a prior on S. The utility associated with each action and state is given by the function u : X × S → R. We assume that the set of actions X is a compact subset of R and that u is continuous in x. Let D be the set of all decision rules (which are measurable maps from Z to X). The expected utility of the rule ψ in state s (denoted by EG (ψ s)) is defined as EG (ψ s) = u(ψ(z) s) dG(z|s) z∈Z
Note that our assumptions guarantee that this is well defined. The decision rule ψ̃ is said to be at least as good as another decision rule ψ if it gives higher expected utility in all states, that is, EG(ψ̃, s) ≥ EG(ψ, s) for all s in S. A subset D′ of decision rules forms an essentially complete class if for any
30 For an introduction to statistical decision theory, see Blackwell and Girshik (1954) or Berger (1985). Statistical decision theory has been used in econometrics to study portfolio allocation, treatment choice, and model uncertainty in macroeconomic policy, among other areas. See the survey of Hirano (2008).
decision rule ψ, there is a rule ψ˜ in D that is at least as good as ψ. Results that identify some subset of decision rules as an essentially complete class are called complete class theorems.31 It is useful to identify such a class of decision rules because, while statisticians may differ on the criterion they adopt to choose amongst decision rules, it is typically the case that a rule satisfying their preferred criterion can be found in an essentially complete class. Consider the Bayesian statistician with prior Λ; her ex ante utility from the rule ψ (recall that this is denoted by U (ψ G Λ)) can be written as s∈S EG (ψ s) dΛ(s). It is clear that if U (ψ G Λ) is maximized ˆ then a rule ψ˜ in the complete class that is at least as good as ψˆ will at ψ = ψ, also maximize U (ψ G Λ). A non-Bayesian statistician will choose a rule using a different criterion; the two most commonly used criteria are the maxmin and minimax regret. The maxmin criterion evaluates a decision rule according to the lowest utility the rule could bring; formally, a rule ψˆ satisfies this criterion if it solves maxψ∈D {mins∈S EG (ψ s)}. The regret of a decision rule ψ, which we denote by r(ψ), is defined as maxs∈S [maxψ ∈D EG (ψ s) − EG (ψ s)]. In other words, the regret of a rule ψ is the utility gap between the ideal rule (if s is known) and the rule ψ. A rule ψˆ satisfies the minimax regret criterion if it solves minψ∈D r(ψ). It is not hard to check that if a decision rule for either the maxmin or minimax regret criterion exists, then such a rule can be found in an essentially complete class.32 The following complete class theorem is the main result of this section. It provides a justification for the statistician to restrict her attention to increasing decision rules. THEOREM 4: Suppose {u(· s)}s∈S is an IDO family and G is MLR-ordered. Then the increasing decision rules form an essentially complete class. Theorem 4 generalizes the complete class theorem of Karlin and Rubin (1956), which in turn generalizes Blackwell and Girshik (1954, Theorem 7.4.3). Karlin and Rubin (1956) established the essential completeness of increasing decision rules under the assumption that {u(· s)}s∈S is a QCIP family, which is a special case of our assumption that {u(· s)}s∈S forms an IDO family. Note 31
Some readers may wonder why, in our definition of a decision rule, we did not allow the statistician to mix actions at any given signal. The answer is that when the signal space is atomless (as we have assumed), allowing her to do so will not make a difference, since the set of decision rules involving only pure actions form an essentially complete class (see Blackwell (1951)). 32 In the context of statistical decisions, the maxmin criterion was first studied by Wald; the minimax regret criterion is due to Savage. Discussion and motivation for the maxmin and minimax regret criteria can be found in Blackwell and Girshik (1954), Berger (1985), and Manski (2005). For some recent applications of the minimax regret criterion, see Manski (2004, 2005) and Manski and Tetenov (2007); a closely related criterion was employed by Chamberlain (2000).
that Theorem 4 is not known even for the case where {u(· s)}s∈S forms an SCP family. An immediate application of Theorem 4 is that it allows us to generalize Theorem 3. Recall that Theorem 3 tells us that, for the Bayesian statistician, if H is more accurate than G, then H achieves a higher ex ante utility. Combining Theorem 4 and Proposition 9 allows us to strengthen that conclusion by establishing the superiority of H over G under a more stringent criterion; specifically, Corollary 1 below tells us that when an experiment H is more accurate than G, then H is capable of achieving higher expected utility in all states. COROLLARY 1: Suppose {u(· s)}s∈S is an IDO family, H is more accurate than G, and G is MLR-ordered. Then for any decision rule ψ : Z → X under G, there is an increasing decision rule φ : Z → X under H such that (31)
EH(φ, s) ≥ EG(ψ, s) at each s ∈ S.
PROOF: If ψ is not increasing, then by Theorem 4, there is an increasing rule ψ¯ that is at least as good as ψ. Proposition 9 in turn guarantees that there is an ¯ since (23) says increasing decision rule φ under H that is at least as good as ψ, ¯ s) at each s. that EH (φ s) ≥ EG (ψ Q.E.D. Corollary 1 is a generalization of Lehmann (1988), which established a version of this result in the case where {u(· s)}s∈S is a QCIP family. This result is not known even for the case where {u(· s)}s∈S forms an SCP family. PROOF OF THEOREM 4: Having assumed that Z is a compact interval, we can assume, without further loss of generality, that Z = [0 1]. Let ψ be a decision rule (not necessarily increasing) under G. We show that there exists an increasing decision rule that is at least as good as ψ. We shall confine ourselves here to the case where ψ takes only finitely many values, an assumption which certainly holds if the set of actions X is finite. The case where ψ has an infinite range is covered in the Appendix. The proof is divided into two steps. In Step 1, we show that given G and ¯ and a decision rule ψ¯ that gives ψ, there is another information structure G exactly the same payoff distribution as ψ (under G) in every state. In Step 2, ¯ is less accurate than G. It then follows we show that the information structure G from Proposition 9 that there is an increasing decision rule under G that is at ¯ and hence at least as good as ψ (under G). In least as good as ψ¯ (under G) other words, the decision-maker who uses a strategy that is not increasing is, in a sense, debasing the information made available to her by G. Step 1. Suppose that the actions taken under ψ are exactly x1 , x2 xn ¯ = (arranged in increasing order). We construct a new information structure G ¯ ¯ ¯ {G(·|s)} s∈S along the following lines. For each s, G(0|s) = 0 and G(k/n|s) =
PrG [ψ(z) ≤ xk |s] for k = 1 2 n, where the right side of the second equation refers to the probability of {z ∈ Z : ψ(z) ≤ xk } under the distribution G(·|s). We define tk (s) as the unique element in Z that obeys G(tk (s)|s) = ¯ G(k/n|s). (Note that t0 (s) = 0 for all s.) Any z in ((k − 1)/n k/n) may be written as z = θ[(k − 1)/n] + (1 − θ)[k/n] for some θ in (0 1). We define (32)
Ḡ(z|s) = G(θ tk−1(s) + (1 − θ) tk(s) | s).
¯ This completely specifies G. ¯ Define a new decision rule ψ¯ by ψ(z) = x1 for z in [0 1/n]; for k ≥ 2, we ¯ have ψ(z) = xk for z in ((k − 1)/n k/n]. This is an increasing decision rule since we have arranged xk to be increasing with k. It is also clear from our ¯ and ψ¯ that, at each state s, the distribution of utility induced construction of G ¯ ¯ by G and ψ equals the distribution of utility induced by G and ψ. ¯ Provided this is true, PropoStep 2. We claim that G is more accurate than G. sition 9 says that there is an increasing decision rule φ under G that is at least ¯ that is, at each s, the distribution of utility induced by G as good as ψ¯ under G, ¯ and ψ. ¯ Since the and φ first-order stochastically dominates that induced by G latter coincides with the distribution of utility induced by G and ψ, the proof is complete. ¯ follows from the assumption that G is MLRThat G is more accurate than G ordered. Denote the density function associated to the distribution G(·|s) by g(·|s). The probability of Zk = {z ∈ Z : ψ(z) ≤ xk } is given by 1Zk (z)g(z|s) dz, ¯ we have where 1Zk is the indicator function of Zk . By the definition of G, ¯ G(k/n|s) = 1Zk (z)g(z|s) dz Recall that tk (s) is defined as the unique element that obeys G(tk (s)|s) = ¯ G(k/n|s); equivalently,
(33)  Ḡ(k/n|s) − G(tk(s)|s) = ∫ [1_{Zk}(z) − 1_{[0, tk(s)]}(z)] g(z|s) dz = 0.

The function W given by W(z) = 1_{Zk}(z) − 1_{[0, tk(s)]}(z) has the following single-crossing-type condition: for z > tk(s), we have W(z) ≥ 0, and for z ≤ tk(s), we have W(z) ≤ 0.33 Let s′ > s; since G(·|s) is MLR-ordered, g(z|s′)/g(z|s) is an increasing function of z. By a standard result (see, for example, Athey (2002, Lemma 5)), we have

(34)  Ḡ(k/n|s′) − G(tk(s)|s′) = ∫ [1_{Zk}(z) − 1_{[0, tk(s)]}(z)] g(z|s′) dz ≥ 0.

33 This property is related to but not the same as the single crossing property we have defined in this paper; Athey (2002) referred to this property as SC1 and the one we use as SC2.

This implies that tk(s′) ≥ tk(s) since G(tk(s′)|s′) = Ḡ(k/n|s′). To show that G is more accurate than Ḡ, we require T(z, s) to be increasing in s, where T is defined by G(T(z, s)|s) = Ḡ(z|s). For z = k/n, T(z, s) = tk(s), which we have shown is increasing in s. For z in the interval ((k − 1)/n, k/n), recall (see (32)) that Ḡ(z) was defined such that T(z, s) = θtk−1(s) + (1 − θ)tk(s). Since both tk−1 and tk are increasing in s, T(z, s) is also increasing in s. Q.E.D.
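The two steps of this proof can be illustrated numerically. The sketch below is our own discretized toy experiment, not the paper's: it builds Ḡ from a deliberately non-monotone rule ψ and checks both that the increasing step rule ψ̄ reproduces the state-by-state action distribution and that each tk(s) is increasing in s, which is the accuracy comparison used in Step 2.

```python
# Sketch of the construction in the proof of Theorem 4 (our own discretized toy
# example): from a non-monotone rule psi under an MLR-ordered normal experiment G,
# build Gbar(k/n|s) = Pr_G[psi(z) <= x_k | s] and check (i) the increasing step
# rule reproduces the action distribution and (ii) each t_k(.) is increasing.
import numpy as np
from scipy.stats import norm

states  = np.array([0.0, 1.0, 2.0])
z_grid  = np.linspace(-5.0, 7.0, 4001)            # fine grid standing in for Z
actions = np.array([0.0, 1.0, 2.0])               # x_1 < x_2 < x_3 (n = 3)

def psi(z):                                        # a deliberately non-monotone rule
    return np.where(z < 0.0, 1.0, np.where(z < 1.0, 0.0, np.where(z < 2.0, 2.0, 1.0)))

psi_vals = psi(z_grid)
tk = []
for s in states:
    w = norm.pdf(z_grid, loc=s, scale=1.0)
    w = w / w.sum()                                # discretized G(.|s)
    gbar_cdf = np.array([w[psi_vals <= x].sum() for x in actions])
    # psibar plays x_k on ((k-1)/n, k/n], so its action distribution under Gbar is
    # exactly the increments of gbar_cdf, the same as psi's under G:
    assert np.allclose(np.diff(np.concatenate(([0.0], gbar_cdf))),
                       [w[psi_vals == x].sum() for x in actions])
    # t_k(s) solves G(t_k(s)|s) = Gbar(k/n|s)
    tk.append(norm.ppf(np.clip(gbar_cdf, 1e-12, 1 - 1e-12), loc=s, scale=1.0))

tk = np.array(tk)                                  # rows indexed by s, columns by k
print("each t_k(s) increasing in s:", bool(np.all(np.diff(tk, axis=0) >= 0)))
```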
It is clear from our proof of Theorem 4 that we can in fact give a sharper statement of that result; we do so below. THEOREM 4 : Suppose {u(· s)}s∈S is an IDO family and G is MLR-ordered. Then for any decision rule ψ : Z → X, there is an increasing decision rule φ : Z → X such that, at each s, the distribution of utility induced by G and φ first-order stochastically dominates the distribution of utility induced by G and ψ. Complete class theorems obviously have a role in statistics and econometrics, but they are also relevant in theoretical economic modeling. The next example gives an instance of such an application. EXAMPLE 6—Continued: Recall that we assume in Section 5 that the manager’s decision rule under G is increasing in the signal. This restriction on the decision rule needs to be justified; provided G is MLR-ordered, this justification is provided by Theorem 4 . Let ψ˜ be a (not necessarily increasing) decision rule. Theorem 4 tells us that for some investor k, there is an increasing decision rule ψ : Z → X such that, at each s, the distribution of uk induced by G and ψ first order stochastically dominates the distribution of uk induced by G ˜ This implies that and ψ. ˜ (35) uk (φ(z) s) dG(z|s) ≥ uk (ψ(z) s) dG(z|s) z∈Z
Aggregating this inequality across states, we obtain Uk(ψ, G, Λk) ≥ Uk(ψ̃, G, Λk), that is, the increasing rule gives investor k a higher ex ante utility. However, we can say more because, for any other investor i, ui(·, s) is just an increasing transformation of uk(·, s), that is, there is a strictly increasing function f such that ui = f ◦ uk. Appealing to Theorem 4′ again, we see that (35) is true if uk is replaced with ui. Aggregating this inequality across states gives us Ui(ψ, G, Λi) ≥ Ui(ψ̃, G, Λi). In short, we have shown that any decision rule admits an increasing decision rule that (weakly) raises the ex ante utility of every investor. This justifies our assumption that the manager uses an increasing decision rule.
Our final application generalizes an example found in Manski (2005, Proposition 3.1) on monotone treatment rules.
EXAMPLE 7: Suppose that there are two ways to treat patients with a particular medical condition. Treatment A is the status quo; it is known that a patient who receives this treatment will recover with probability p¯ A . Treatment B is a new treatment whose effectiveness is unknown. The probability of recovery with this treatment, pB , corresponds to the unknown state of the world and takes values in some set P. We assume that the planner receives a signal z of pB that is MLR-ordered with respect to pB . (Manski (2005) considered the case of N subjects who were randomly selected to receive treatment B, with z being the number who were cured. Clearly, the distribution of z is binomial; it is also not hard to check that it is MLR-ordered with respect to pB .) Normalizing the utility of a cure at 1 and that of no cure at 0, the planner’s expected utility when a member of the population receives treatment A is p¯ A . Similarly, the expected utility of treatment B is pB . Therefore, the planner’s utility if she subjects fraction x of the population to B (and the rest to A) is (36)
u(x, pB) = (1 − x)p̄A + xpB
The planner's decision (treatment) rule maps z to the proportion x of the (patient) population who will receive treatment B. As pointed out in Manski (2005), {u(·, pB)}pB∈P is a QCIP family, and so Karlin and Rubin (1956) guaranteed that decision rules where x increases with z form an essentially complete class.
Suppose now that the planner has a different payoff function that takes into account the cost of the treatment. We denote the cost of having fraction x treated with B and the rest with A by C(x). Then the payoff function is

u(x, pB) = (1 − x)p̄A + xpB − C(x).

If the costs of treatments A and B are both linear or, more generally, if C is convex, then one can check that {u(·, pB)}pB∈P will still be a QCIP family. We can then appeal to Karlin and Rubin (1956) to obtain the essential completeness of the increasing decision rules. But there is no particular reason to believe that C is convex; indeed C will never be convex if the presence of scale economies leads to the total cost of having both treatments in use being more expensive than subjecting the entire population to one treatment or the other. (Formally, there is x∗ ∈ (0, 1) such that C(x∗) > C(0) and C(x∗) > C(1).) However, u is supermodular in (x, pB) whatever the shape of C, so {u(·, pB)}pB∈P is an IDO family; Theorem 4 tells us that the planner may confine herself to increasing decision rules since they form an essentially complete class.34
34 Manski and Tetenov (2007) proved a complete class theorem with a different variation on the payoff function (36). For the payoff function (36), Manski (2005) in fact showed a sharper result: the planner will choose a rule where the whole population is subject to B (A) if the number of treatment successes in the sample goes above (below) a particular threshold. The modification of the payoff function in Manski and Tetenov (2007) is motivated in part by the desire to obtain fractional treatment rules, in which, for a nonnegligible set of sample outcomes, both treatments will be in use. In our variation on (36), it is clear that fractional treatment is plausible if large values of x involve very high costs.
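A small numerical sketch of this last point (the cost function and parameter values are our own illustrative choices, not the paper's): even with a non-convex C, the payoff has increasing differences in (x, pB), and the optimal treated fraction is monotone in pB although u(·, pB) need not be quasiconcave.

```python
# Illustration of Example 7 with a non-convex cost (our own choice of C):
# u(x, pB) = (1 - x)*pA_bar + x*pB - C(x) is supermodular in (x, pB), so the
# optimal treated fraction is increasing in pB even though u(., pB) can dip
# in the interior (fail quasiconcavity).
import numpy as np

pA_bar = 0.5
x_grid  = np.linspace(0.0, 1.0, 201)
pB_grid = np.linspace(0.0, 1.0, 101)

def C(x):
    # Non-convex: running both treatments at once is costlier than either corner
    return 0.15 * np.sin(np.pi * x)

U = (1.0 - x_grid[None, :]) * pA_bar + x_grid[None, :] * pB_grid[:, None] - C(x_grid)[None, :]

# Increasing differences in (x, pB): differences in x are increasing in pB.
dx = np.diff(U, axis=1)
print("increasing differences:", bool(np.all(np.diff(dx, axis=0) >= -1e-12)))

# Monotone comparative statics: argmax over x is (weakly) increasing in pB.
best_x = x_grid[U.argmax(axis=1)]
print("optimal fraction increasing in pB:", bool(np.all(np.diff(best_x) >= -1e-12)))
```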
7. CONCLUSION
In their conclusion, Milgrom and Shannon (1994) noted that
the theory of monotone comparative statics, even for optimization models, is still unfinished. One priority is the analysis of economic applications involving stochastic models. The first- and second-order stochastic dominance relations, Blackwell's informativeness order for information systems, and the likelihood ratio order are among ones that could be usefully integrated with the single crossing and/or supermodularity conditions.
Part of this paper addresses the issues highlighted by Milgrom and Shannon; we have shown that the interval dominance order provides a unified framework which connects and generalizes Lehmann's informativeness theorem and Karlin and Rubin's monotone class theorem, as well as the comparative statics theorems under uncertainty of Ormiston and Schlee (1993) and Athey (2002). In addition, the interval dominance order provides its own set of applications, such as to optimal stopping problems, which are not covered by earlier assumptions.
There are many ways in which the ideas in this paper can be further developed. For example, IDO can be extended to a multidimensional setting. Notions like quasisupermodularity (Milgrom and Shannon (1994)) and C-quasisupermodularity (Quah (2007)), which are important for multidimensional comparative statics, are essentially variations on the single crossing property; therefore, like the single crossing property, they can be generalized. We explore these issues in a companion paper (see Quah and Strulovici (2007)). An issue closely connected with the value of information is the marginal incentive to acquire information and the extent to which it varies with the payoff functions and (in games) the nature of the strategic interaction. We hope that the ideas developed here can be employed to help address those issues.

APPENDIX

PROOF OF PROPOSITION 3: We first establish the following lemma.

LEMMA 4: Let {di}0≤i≤M and {ai}0≤i≤M be two finite sequences of real numbers with the properties Dj = Σ_{i=j}^{M} di ≥ 0 for j = 0, 1, …, M and ai+1 ≥ ai. Then Σ_{i=0}^{M} ai di ≥ a0 D0.

PROOF: It is straightforward to check that

(37)  a0 D0 + Σ_{i=1}^{M} Di(ai − ai−1) = Σ_{i=0}^{M} ai di
(this is just a discrete version of integration by parts). This gives us the desired conclusion, since the assumptions guarantee that Σ_{i=1}^{M} Di(ai − ai−1) ≥ 0. Q.E.D.

Returning to the proof of Proposition 3, suppose f(xk+K) ≥ f(xk+i) for i = 0, 1, 2, …, K − 1. This means that Dj = Σ_{i=j}^{K−1} di ≥ 0 (for j = 0, 1, …, K − 1), where di = f(xk+i+1) − f(xk+i). Denoting ai = α(xk+i), (11) and Lemma 4 give us

Σ_{i=0}^{K−1} [g(xk+i+1) − g(xk+i)] ≥ Σ_{i=0}^{K−1} ai di ≥ a0 (f(xk+K) − f(xk)).

Since Σ_{i=0}^{K−1} [g(xk+i+1) − g(xk+i)] = g(xk+K) − g(xk), we obtain g(xk+K) ≥ (>) g(xk) if f(xk+K) ≥ (>) f(xk). Q.E.D.

PROOF OF PROPOSITION 4: It suffices to show that condition (B) in Proposition 1 holds. To do that, we require the following lemma.

LEMMA 5: Suppose f and g obey the conditions of Proposition 4. In addition, suppose f is strictly decreasing in the interval [x1, x2] and strictly increasing in the interval [x3, x4], with x2 ≤ x3, f(x1) = f(x4), and f(x2) = f(x3). Then

(38)
[g(x4 ) − g(x3 )] + [g(x2 ) − g(x1 )] ≥ 0
PROOF: Define z : [x1, x2] → [x3, x4] by f(x) = f(z(x)). Except when f′ = 0 (and there are just finitely many such points in [x3, x4] since f is analytic), z is differentiable and f′(x) = f′(z(x))z′(x). Therefore, for all but finitely many values of x in [x1, x2],

g′(x) − g′(z(x))z′(x) = [g′(x)/f′(x)] f′(x) − [g′(z(x))/f′(z(x))] f′(z(x))z′(x)
≥ [g′(x)/f′(x)] [f′(x) − f′(z(x))z′(x)]
= 0.

Integrating this inequality between x1 and x2 gives us (38). Q.E.D.
Returning to the proof of Proposition 4, suppose f (x ) = f (x ) and f (x) ≤ f (x ) for x ∈ [x x ]. We prove that g(x ) ≥ g(x ) by induction on the number of local maxima in the open interval (x x ). (Since f is analytic, this num-
ber is finite; note also that there cannot be intervals within [x x ] where f is constant, because that implies that f is constant in X.) First consider the case where, in (x x ), f has no local maxima but achieves its minimum at x∗ . Applying Lemma 5 (with x4 = x , x1 = x , and x2 = x3 = x∗ ), we obtain g(x ) − g(x ) ≥ 0. Now consider the case where there are K local maxima in the interval (x x ). Let x∗ be the highest of these local peaks; formally f (x∗ ) ≥ f (x ) for any x which is a local maximum of f . We define x to be the smallest x in [x x ] such that f (x∗ ) = f (x) and define x¯ to be the largest x such that ¯ By continuity, these points exist, and f is strictly increasing in f (x∗ ) = f (x). ¯ x ] and strictly decreasing in [x x]. In the interval (x x∗ ), f has strictly [x ¯ fewer than K local maxima, so (i) g(x) ≤ g(x∗ ). Similarly, (ii) g(x∗ ) ≤ g(x) ¯ Finally, Lemma 5 since f has strictly fewer than K local maxima in (x∗ x). ¯ + [g(x) − g(x )] ≥ 0. Together, the inequalguarantees that (iii) [g(x ) − g(x)] ities (i)–(iii) guarantee that g(x ) − g(x ) ≥ 0. Q.E.D. PROOF OF THEOREM 2—The Case Where S Is Finite: We can assume, with no loss of generality, that S = {1 2 N}. The agent’s objective function is N F(x λ) = i=1 f (x i)λi , where λi is the probability of state i; we wish to show that F(· γ) I F(· λ) if γ is an MLR shift of λ. This claim follows from an application of Theorem 2 in the case where S is an interval (which was proved in the main body of the paper). Define S¯ = (0 N] and the family {f¯(· s)}s∈S¯ by f¯(x s) = f¯(x i) whenever s ∈ (i − 1 i], where i = 1 2 N. The density function λ¯ : S → R (and similarly γ) ¯ is defined ¯ ¯ by λ(s) = λi for s ∈ (i − 1 i]. Clearly, γ¯ is an MLR shift of λ, {f¯(· s)}s∈S is ¯ ¯ ¯ ¯ = F(x λ) (where F(x ¯ = ¯ = F(x γ), and F(x λ) λ) an IDO family, F(x γ) ¯ ¯ γ) ¯ λ); ¯ ¯ f (x s)λ(s) ds). Theorem 2 (for an interval S) tells us that F(· ¯ I F(· S Q.E.D. in other words, F(· γ) I F(· λ). PROOF OF PROPOSITION 8: Suppose that γ is not an MLR shift of λ, so there are states k and k + 1 such that γk /λk > γk+1 /λk+1 , where λk is the probability of state k, and so forth. It follows that there is positive scalar a such that (39)
(λk+1 a)/λk > 1 > (γk+1 a)/γk
Let X = {x1 x2 } with x1 < x2 and define {f (· i)}i∈S in the following way: for i < k, f (x1 i) = ε and f (x2 i) = 0; for i = k, f (x1 k) = 1 and f (x2 k) = 0; for i = k + 1, f (x1 k + 1) = 0 and f (x2 k + 1) = a; finally, for i > k + 1, f (x1 i) = 0 and f (x2 i) = ε. This is an SCP family provided ε > 0, since f (x2 i) − f (x1 i) takes values −ε, −1, a, and, finally, ε as i increases.
Note that the first inequality in (39) says that λk+1 a > λk, so if ε is small,

F(x2, λ) = λk+1 a + Σ_{i=k+2}^{N} λi ε > Σ_{i=1}^{k−1} λi ε + λk = F(x1, λ).

Now the second inequality in (39) says that γk > γk+1 a, so if ε is sufficiently small,

F(x1, γ) = Σ_{i=1}^{k−1} γi ε + γk > γk+1 a + Σ_{i=k+2}^{N} γi ε = F(x2, γ).
So the shift from λ to γ has reduced the optimal value of x from x2 to x1. Q.E.D.

PROOF OF LEMMA 3—Continued: We first show that φ (as constructed in the proof in Section 5) is an increasing rule. We wish to compare φ(t′) against φ(t), where t′ > t. Note that the construction of φ(t) involves partitioning S into subsets S1, S2, …, SM that obey (i)–(v). In particular, (v) says that for any s in Sm, ψ(τ(t, s)) takes the same value, which we denote by ψm. To obtain φ(t′), we first partition S into disjoint sets S′1, S′2, …, S′L, where L is odd, with the partition satisfying properties (i)–(v). The important thing to note is that, for any s, we have

(40)  ψ(τ(t′, s)) ≥ ψ(τ(t, s)).

This is clear: both ψ and τ(·, s) are increasing functions and t′ > t. We denote ψ(τ(t′, s)) for s in S′k by ψ′k. Any s in S belongs to some Sm and some S′k, in which case (40) may be rewritten as

(41)  ψ′k ≥ ψm.
The construction of φ(t ) involves the construction of φˆ 2 , φˆ 4 , and so forth. The action φˆ 2 is the largest action that maximizes u(· s∗ ) in the interval [ψ2 ψ1 ]. Comparing this with φˆ 2 , which is the largest action that maximizes u(· s∗ ) in the interval [ψ2 ψ1 ], we know that φˆ 2 ≥ φˆ 2 , since (following from (41)) ψ2 ≥ ψ2 and ψ1 ≥ ψ1 . By definition, φˆ 4 is the largest action that maximizes u(· s3 ) in [ψ4 φˆ 2 ], ¯ be the largest odd number where s3 refers to the unique element in S3 . Let m such that sm¯ ≤ s3 . (Recall that sm¯ is the unique element in Sm¯ .) By definition, is the largest element that maximizes u(· sm¯ ) in [ψm+1 φˆ m−1 ]. We claim φˆ m+1 ¯ ¯ ¯ . This is an application of Theorem 1. It follows from (i) s3 ≥ sm¯ , that φˆ 4 ≥ φˆ m+1 ¯ ¯ is defined, along with (41), so u(· s3 ) I u(· sm¯ ), (ii) the manner in which m
guarantees that ψ4 ≥ ψm+1 , and (iii) we know (from the previous paragraph) ¯ that φˆ 2 ≥ φˆ 2 ≥ φˆ m−1 . ¯ So we obtain φˆ 4 ≥ φˆ m+1 ≥ φ(t). Repeating the argument finitely many ¯ times (on φˆ 6 and so on), we obtain φ(t ) ≥ φ(t). This completes the proof of Lemma 3 in the special case where ψ takes finitely many values. Extension to the Case Where the Range of ψ Is Infinite. The strategy is to approximate ψ with a sequence of simpler decision rules. Let {An }n≥1 be a sequence of finite subsets of X such that An ⊂ An+1 and n≥1 An is dense in X. (This sequence exists because X is compact.) The function ψn : Z → X is defined as follows: ψn (z) is the largest element in An that is less than or equal to ψ(z). The sequence of decision rules ψn has the properties (i) ψn is increasing in z, (ii) ψn+1 (z) ≥ ψn (z) for all z, (iii) the range of ψn is finite, and (iv) the increasing sequence ψn converges to ψ pointwise. Since ψn takes only finitely many values, we know there is an increasing decision rule φn (as defined in the proof of Lemma 3 in Section 5) such that
(42) u φn (T (z s)) s ≥ u(ψn (z) s) for all (z s) We claim that φn is also an increasing sequence. This follows from the fact that (43)
ψn+1 (τ(t s)) ≥ ψn (τ(t s)) for all (t s)
This inequality plays the same role as (40); the latter was used to show that φ(t ) ≥ φ(t). Mimicking the argument there, (43) tells us that φn+1 (t) ≥ φn (t). Since φn is an increasing sequence and X is compact, it has a limit, which we denote as φ. Since, for each n, φn is an increasing decision rule, φ is also an increasing decision rule. For each n, (42) holds; taking limits and using the continuity of u with respect to x, we obtain u(φ(T (z s)) s) ≥ u(ψ(z) s). Q.E.D. PROOF OF PROPOSITION 11—Continued: It remains for us to show how Λ¯ is constructed. We denote the density function of H(·|s) by h(·|s). It is clear ¯ 1 ) and λ(s ¯ 2 ) (the that since the actions are nonordered, we may choose λ(s probabilities of s1 and s2 , respectively) such that (44)
¯ 1 )h(t¯|s1 )[u(x1 s1 ) − u(x2 s1 )] λ(s ¯ 2 )h(t¯|s2 )[u(x2 |s2 ) − u(x1 s2 )] = λ(s
Rearranging this equation, we obtain (45)
¯ 2 )h(t¯|s2 )u(x1 |s2 ) ¯ 1 )h(t¯|s1 )u(x1 |s1 ) + λ(s λ(s ¯ 1 )h(t¯|s1 )u(x2 |s1 ) + λ(s ¯ 2 )h(t¯|s2 )u(x2 |s2 ) = λ(s
¯ the posterior distribution after observing t¯ is such Therefore, given the prior Λ, that the agent is indifferent between actions x1 and x2 .
STATICS, INFORMATIVENESS, AND THE INTERVAL DOMINANCE
1989
Suppose the agent receives the signal z < t¯. Since H is MLR-ordered, we have h(z|s1 ) h(z|s2 ) ≥ h(t¯|s1 ) h(t¯|s2 ) This fact, together with (44), guarantees that ¯ 1 )h(z|s1 )[u(x1 s1 ) − u(x2 s1 )] λ(s ¯ 2 )h(z|s2 )[u(x2 |s2 ) − u(x1 s2 )] ≥ λ(s Rearranging this equation, we obtain ¯ 2 )h(z|s2 )u(x1 |s2 ) ¯ 1 )h(z|s1 )u(x1 |s1 ) + λ(s λ(s ¯ 1 )h(z|s1 )u(x2 |s1 ) + λ(s ¯ 2 )h(z|s2 )u(x2 |s2 ) ≥ λ(s So, after observing z < t¯, the (posterior) expected utility of action x1 is greater than that of x2 . In a similar way, we can show that x2 is the optimal action after observing a signal z ≥ t¯. Q.E.D. PROOF OF THEOREM 4—The Case Where ψ Has an Infinite Range: We ¯ and increasing decision rule ψ¯ with the construct an alternative experiment G following two properties: ¯ and ψ¯ equals (P1) At each state s, the distribution of losses induced by G that induced by G and ψ. ¯ (P2) G is more accurate than G. An application of Proposition 9 then guarantees that there is an increasing decision rule under G that is at least as good as ψ. Thus our proof is essentially the same as the one we gave for the finite case in Section 6, except that ¯ and ψ¯ is more complicated. construction of G Since X is compact, there is a smallest compact interval M that contains X. At a state s, we denote the distribution on M induced by G and ψ by F(·|s), that is, for any x in M, F(x|s) = PrG [ψ(z) ≤ x|s]. There are two noteworthy features of {F(·|s)}s∈S : (i) For a fixed s¯ , we may partition M into (disjoint) contour sets Us¯ (r), that is, Us¯ (r) = {x ∈ M : F(x|¯s) = r}. It is possible that for some r, Us¯ (r) is empty, but if it is nonempty, then it has a minimum and the minimum is in X (and not just in M). Crucially, this partition is common across all states s, that is, for any other state s, there is some r such that Us (r ) = Us¯ (r). (ii) The atoms of F(·|s) also do not vary with s; that is, if x is an atom for F(·|¯s), then it is an atom for F(·|s) for every other state s. Features (i) and (ii) follow easily from the definition of F , the compactness of X, and the fact that G(·|s) is atomless and has support Z at every state s.
1990
J. K.-H. QUAH AND B. STRULOVICI
To each element x in M we associate a number ε(x), where ε(x) > 0 if and only if x is an atom and x∈X ε(x) < ∞. (Note that there are at most countably many atoms, so the infinite summation makes sense.) We define the map Y : M → R, where Y (x) = x + {x ∈M:x ≤x} ε(x ). Itis clear that this map is a strictly increasing and hence 1–1 map. Let Y ∗ = x∈M [Y (x) − ε(x) Y (x)]. ∗ ∗ The difference between ∞ Y and the range of Y (i.e., set Y \ Y (M)), may be written in the form n=1 In , where In = [Y (an ) − ε(an ) Y (an )) and {an }n∈N is the set of atoms. (Loosely speaking, the “gaps” In arise at every atom.) ˜ We define the distribution G(·|s) on Y ∗ in the following way. For y in Y (M), −1 ˜ ˜ ˜ n |s), G(y|s) = F(Y (y)|s). For y = Y (an ) − ε(an ), G(y|s) is the limit of G(y where yn is a sequence in Y (M) tending to y from the left; if no such sequence exists (which occurs if and only if there is an atom at the smallest ˜ element of X), let G(y|s) = 0. (One can check that this definition is un˜ ambiguous.) It remains for us to define G(y|s) for y in the open interval (Y (an ) − ε(an ) Y (an )). For y = C(an ) or y = C(an ) − ε(an ), define t(y) by ˜ G(t(y)|s) = G(y|s). Any element y in (Y (an ) − ε(an ) Y (an )) may be written as θ[Y (an ) − ε(an )] + (1 − θ)[Y (an )]. We define
˜ G(y|s) = G θt(C(an ) − ε(an )) + (1 − θ)t(C(an ))|s ˜ We have now completely specified the distribution G(·|s). Note that we have constructed this distribution to be atomless, so for every number r in [0 1], ˜ = r} is nonempty. Indeed, following from observation the set {y ∈ Y ∗ : G(y|s) ˆ (i) above, this set has a smallest element, which we denote by y(r). We define ˆ : r ∈ [0 1]}. Observation (i) also tells us that Y ∗∗ does not vary with Y ∗∗ = {y(r) ¯ ˜ Therefore, for any r in s. We denote the restriction of G(·|s) to Y ∗∗ by G(·|s). ¯ [0 1] and any state s, there is a unique y in Y ∗∗ such that G(y|s) = r. One can check that property (P2) (stated above) holds: G is more accurate ¯ Formally, the map T defined by G(T (y s)|s) = G(y|s) ¯ than G. exists and has the property that T (y s) is increasing in s; the proof is substantially the same as that for the finite case. Furthermore, T (·|s) has a unique inverse (in Y ∗∗ ). So we have identified precisely the properties of T needed for the application of Proposition 9. Consider the decision rule ψ¯ : Y ∗∗ → X defined as follows: if ¯ y is in Y (M), define ψ(y) = Y −1 (y); if y is in [Y (an ) − ε(an ) Y (an )), define ¯ and ψ¯ generate the same distribution ¯ ψ(y) = an . It is not hard to verify that G of losses as G and ψ, as required by (P1). Q.E.D. REFERENCES AMIR, R. (1996): “Sensitivity Analysis in Multisector Optimal Economic Dynamics,” Journal of Mathematical Economics, 25, 123–141. [1964] ARROW, K., AND D. LEVHARI (1969): “Uniqueness of the Internal Rate of Return With Variable Life of Investment,” Economic Journal, 79 (315), 560–566. [1962]
ASHWORTH, S., AND E. BUENO DE MESQUITA (2006): “Monotone Comparative Statics in Models of Politics,” American Journal of Political Science, 50 (1), 214–231. [1949] ATHEY, S. (2001): “Single Crossing Properties and the Existence of Pure Strategy Equilibria in Games of Incomplete Information,” Econometrica, 69 (4), 861–890. [1961,1967] (2002): “Monotone Comparative Statics Under Uncertainty,” Quarterly Journal of Economics, 117 (1), 187–223. [1950,1967,1981,1984] ATHEY, S., AND J. LEVIN (2001): “The Value of Information in Monotone Decision Problems,” Stanford Working Paper 01-003. [1951,1971,1977] BECKER, R., AND J. BOYD (1997): Capital Theory, Equilibrium Analysis, and Recursive Utility. Oxford: Blackwell. [1964] BERGEMANN, D., AND J. VALIMAKI (2002): “Information Acquisition and Efficient Mechanism Design,” Econometrica, 70, 1007–1033. [1951,1971] BERGER, J. O. (1985): Statistical Decision Theory and Bayesian Analysis. New York: SpringerVerlag. [1978,1979] BLACKWELL, D. (1951): “On a Theorem of Lyapunov,” The Annals of Mathematical Statistics, 22 (1), 112–114. [1979] (1953): “Equivalent Comparisons of Experiments,” The Annals of Mathematical Statistics, 24 (2), 265–272. [1971] BLACKWELL, D., AND M. A. GIRSHIK (1954): Theory of Games and Statistical Decisions. New York: Dover Publications. [1978,1979] CHAMBERLAIN, G. (2000): “Econometrics and Decision Theory,” Journal of Econometrics, 95, 255–283. [1979] GOLLIER, C. (2001): The Economics of Risk and Time. Cambridge: MIT Press. [1971] HIRANO, K. (2008): “Decision Theory in Econometrics,” in The New Palgrave Dictionary of Economics (Second Ed.), ed. by S. N. Durlauf and L. E. Blume. Basingstoke, England: Palgrave Macmillan. [1978] JEWITT, I. (2006): “Information Order in Decision and Agency Problems,” Personal Manuscript, Nuffield College, Oxford University. [1951,1971,1972] KARLIN, S., AND H. RUBIN (1956): “The Theory of Decision Procedures for Distributions With Monotone Likelihood Ratio,” The Annals of Mathematical Statistics, 27 (2), 272–299. [1951, 1979,1983] LEHMANN, E. L. (1988): “Comparing Location Experiments,” The Annals of Statistics, 16 (2), 521–533. [1971,1972,1980] LEVIN, J. (2001): “Information and the Market for Lemons,” Rand Journal of Economics, 32 (4), 657–666. [1951,1971] MANSKI, C. (2004): “Statistical Treatment Rules for Heterogeneous Populations,” Econometrica, 72 (4), 1221–1246. [1979] (2005): Social Choice With Partial Knowledge of Treatment Response. Princeton: Princeton University Press. [1952,1979,1982,1983] MANSKI, C., AND A. TETENOV (2007): “Admissible Treatment Rules for a Risk-Averse Planner With Experimental Data on an Innovation,” Journal of Statistical Planning and Inference, 137 (6), 1998–2010. [1979,1983] MILGROM, P. (1981): “Good News and Bad News: Representation Theorems and Applications,” Bell Journal of Economics, 12, 380–391. [1972] MILGROM, P., AND J. ROBERTS (1990): “Rationalizability, Learning, and Equilibrium in Games With Strategic Complementarities,” Econometrica, 58 (6), 1255–1277. [1949,1961] MILGROM, P., AND C. SHANNON (1994): “Monotone Comparative Statics,” Econometrica, 62 (1), 157–180. [1949,1953,1984] ORMISTON, M. B., AND E. SCHLEE (1993): “Comparative Statics Under Uncertainty for a Class of Economic Agents,” Journal of Economic Theory, 61, 412–422. [1967,1984] PERSICO, N. (1996): “Information Acquisition in Affiliated Decision Problems”. Discussion Paper 1149, Department of Economics, Northwestern University. [1972]
(2000): “Information Acquisition in Auctions,” Econometrica, 68 (1), 135–148. [1951, 1971] QUAH, J. (2007): “The Comparative Statics of Constrained Optimization Problems,” Econometrica, 75 (2), 401–431. [1984] QUAH, J. K.-H., AND B. STRULOVICI (2007): “Comparative Statics With the Interval Dominance Order II,” Incomplete Manuscript; available at http://www.economics.ox.ac.uk/Members/john. quah/hlp30.pdf. [1961,1984] TOPKIS, D. M. (1978): “Minimizing a Submodular Function on a Lattice,” Operations Research, 26, 305–321. [1949] (1998): Supermodularity and Complementarity. Princeton: Princeton University Press. [1949,1953] VIVES, X. (1990): “Nash Equilibrium With Strategic Complementarities,” Journal of Mathematical Economics, 19, 305–321. [1949]
Dept. of Economics, Oxford University, Manor Road, Oxford OX1 3UQ, United Kingdom; [email protected] and Dept. of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60201, U.S.A.; [email protected]. Manuscript received November, 2007; final revision received April, 2009.
Econometrica, Vol. 77, No. 6 (November, 2009), 1993–2008
NOTES AND COMMENTS

OBSERVING UNOBSERVABLES: IDENTIFYING INFORMATION ASYMMETRIES WITH A CONSUMER CREDIT FIELD EXPERIMENT

BY DEAN KARLAN AND JONATHAN ZINMAN1

Information asymmetries are important in theory but difficult to identify in practice. We estimate the presence and importance of hidden information and hidden action problems in a consumer credit market using a new field experiment methodology. We randomized 58,000 direct mail offers to former clients of a major South African lender along three dimensions: (i) an initial “offer interest rate” featured on a direct mail solicitation; (ii) a “contract interest rate” that was revealed only after a borrower agreed to the initial offer rate; and (iii) a dynamic repayment incentive that was also a surprise and extended preferential pricing on future loans to borrowers who remained in good standing. These three randomizations, combined with complete knowledge of the lender’s information set, permit identification of specific types of private information problems. Our setup distinguishes hidden information effects from selection on the offer rate (via unobservable risk and anticipated effort), from hidden action effects (via moral hazard in effort) induced by actual contract terms. We find strong evidence of moral hazard and weaker evidence of hidden information problems. A rough estimate suggests that perhaps 13% to 21% of default is due to moral hazard. Asymmetric information thus may help explain the prevalence of credit constraints even in a market that specializes in financing high-risk borrowers.

KEYWORDS: Adverse selection, credit markets, development finance, information asymmetries, microfinance, moral hazard.
1. INTRODUCTION INFORMATION ASYMMETRIES are important in theory. Stiglitz and Weiss (1981) sparked a large theoretical literature on the role of asymmetric information in credit markets that has influenced economic policy and lending practice worldwide (Armendariz de Aghion and Morduch (2005), Bebczuk (2003)). Theories show that information frictions and ensuing credit market failures can create inefficiency at both the micro and the macro level via underinvestment (Mankiw (1986), Gale (1990), Banerjee and Newman (1993), Hubbard (1998)), overinvestment (De Meza and Webb (1987), Bernanke and Gertler (1990)), or poverty traps (Mookherjee and Ray (2002)). Many policies 1 We are grateful to the National Science Foundation, BASIS/USAID, and the Bill and Melinda Gates Foundation for funding research expenses, and to the lender for implementing the experiment and financing the loans. Thanks to Jonathan Bauchet and Karen Lyons for excellent research assistance. Thanks to four referees, the editor Costas Meghir, seminar participants, and numerous colleagues (in particular Gharad Bryan and Chris Udry) for helpful comments. Much of this paper was completed while Zinman worked at the Federal Reserve Bank of New York (FRBNY), and we thank FRBNY—particularly Jamie McAndrews, Jamie Stewart, and Joe Tracy—for research support. The views expressed herein are those of the authors and do not necessarily reflects those of our funders, the FRBNY, or the Federal Reserve System.
have been put forth to address information asymmetry problems. A better understanding of which information asymmetries are empirically salient is critical for determining optimal remedies, if any. Information asymmetries are difficult to identify in practice. Empirical evidence on the existence and importance of specific information frictions is relatively thin, in general, and particularly so for credit markets (Chiappori and Salanie (2000)). Distinguishing between hidden information and hidden action is difficult even when precise data on underwriting criteria and clean variation in contract terms are available, as a single interest rate may produce independent, conflated selection and incentive effects. For example, a positive correlation between loan default and a randomly assigned interest rate, conditional on observable risk, could be due to adverse selection ex ante (those with relatively high probabilities of default will be more likely to accept a high rate) or moral hazard ex post (because those given high rates have greater incentive to default).2 We test for the presence of distinct types of asymmetric information problems using a new field experiment methodology that was implemented by a South African finance company that typically lends at 200% APR. Our design randomizes interest rates independently along three dimensions: (i) the interest rate offered in a direct-mail solicitation; (ii) the actual interest rate on the loan contract; (iii) the interest rate offered on future loans. The design produces borrowers who select in at identical rates and then face different repayment incentives going forward, and borrowers who select in at different rates and then face identical repayment incentives. The ability to disentangle hidden information (defined here as ex-ante selection effects, which includes both selection on unobserved risk type, i.e., classic adverse selection, and selection on unobserved anticipated effort) from hidden action (defined here as ex-post incentive effects) is critical from a policy and practical perspective. For instance, hidden information problems should motivate policymakers and lenders to consider subsidies, loan guarantees, information coordination, and enhanced screening strategies. Hidden action problems should motivate policymakers and lenders to consider legal reforms in the areas of liability and garnishment, and enhanced dynamic contracting schemes. Our theoretical model formalizes how our experimental design can disentangle hidden information from hidden action effects. The model also highlights an interesting limitation of the design for testing theory: it can only isolate the effect of classic adverse selection (on risk type alone) if there is no hidden 2 See Ausubel (1999) for a related discussion of the problem of disentangling adverse selection and moral hazard in a consumer credit market. See Chiappori and Salanie (2000) on the analogous problem in insurance markets. Insurance markets have been the subject of relatively active interplay between theoretical and empirical contributions, but recent papers on other markets have also made important strides toward identifying the independent effects of adverse selection and/or moral hazard; see, for example, Cardon and Hendel (2001) on health insurance, and Shearer (2004) on labor contracts.
action effect (i.e., no moral hazard in effort). The limitation comes from the fact that, in a world with both hidden information and hidden action problems, there can be selection on two distinct forms of hidden information: selection on unobserved risk type (classic adverse selection) and selection on unobserved anticipated effort. In that case our design identifies effects of the reduced-form combination of both forms of hidden information; i.e., of selection on the two different forms of ex-ante unobservables. Our empirical results indicate weak evidence of hidden information and strong evidence of economically significant moral hazard. A rough estimate suggests that moral hazard explains perhaps 13% to 21% of default in our sample. Information asymmetries thus may help explain the prevalence of credit constraints even in a market that specializes in financing high-risk borrowers at very high rates. 2. MARKET AND LENDER OVERVIEW Our cooperating lender operated for over 20 years as one of the largest, most profitable micro-lenders in South Africa. It competed in a “cash loan” industry segment that offers small, high interest, short-term, uncollateralized credit with fixed monthly repayment schedules to a “working poor” population.3 Cash loan borrowers generally lack access to traditional institutional sources such as commercial banks. Cash loan sizes tend to be small relative to the fixed costs of underwriting and monitoring them, but substantial relative to a typical borrower’s income. For example, the lender’s median loan size of R1000 ($150) was 32% of its median borrower’s gross monthly income. Cash lenders arose to substitute for traditional “informal sector” moneylenders following deregulation of the usury ceiling in 1992, and they are regulated by the Micro Finance Regulatory Council (MFRC). Cash lenders focusing on the observably highest-risk market segment typically make 1-month maturity loans at 30% interest per month. Informal sector moneylenders charge 30%– 100% per month. Lenders targeting observably lower-risk segments charge as little as 3% per month.4 Our cooperating lender’s product offerings were somewhat differentiated from competitors. It did not pursue collection or collateralization strategies such as direct debit from paychecks or physically keeping bank books and ATM 3 Aggregate outstanding loans in this market segment equal 38% of non-mortgage consumer credit Department of Trade and Industry South Africa (2003). 4 The cash loan market has important differences and similarities with “traditional” microcredit (e.g., the Grameen Bank, or government or non-profit lending programs). In contrast to our setting, most microcredit has been delivered by lenders with explicit social missions that target groups of female entrepreneurs, sometimes in group settings. On the other hand, the industrial organization of microcredit is trending steadily in the direction of the for-profit, more competitive delivery of individual, untargeted credit that characterizes the cash loan market (Robinson (2001), Porteous (2003)). This push is happening both from the bottom-up (non-profits converting to for-profits) as well as from the top-down (for-profits expanding into microcredit segments).
cards of clients. Its pricing was transparent and linear, with no surcharges, application fees, or insurance premiums added to the cost of the loan. The lender also had a “medium-maturity” product niche, with a 90% concentration of 4-month loans (Web Appendix Table I.A). Most other cash lenders focus on 1-month or 12+-month loans. The lender’s normal 4-month rates, absent this experiment, ranged from 7.75% to 11.75% per month depending on observable risk, with 75% of clients in the high-risk (11.75%) category. Per standard practice in the cash loan market, essentially all of the lender’s underwriting and transactions were conducted face to face in its network of over 100 branches. Its risk assessment technology combined centralized credit scoring with decentralized loan officer discretion. Rejection was prevalent even with a modal rate of 200% annual percentage rate (APR); the lender denied 50% of new loan applicants. Reasons for rejection included unconfirmed employment, suspicion of fraud, poor credit rating, and excessive debt burden. Applicants who were approved often defaulted on their loan obligation, despite facing several incentives to repay. Carrots included decreasing prices and increasing future loan sizes following good repayment behavior. Sticks included reporting to credit bureaus, frequent phone calls from collection agents, court summons, and wage garnishments. Repeat borrowers had default rates of about 15%; first-time borrowers defaulted twice as often. 3. EXPERIMENTAL DESIGN AND IMPLEMENTATION The sample frame consisted of 57,533 former clients5 with good repayment histories from 86 predominantly urban branches. Everyone in the sample frame had borrowed from the lender within the past 24 months, and did not have a loan outstanding in the 30 days prior to the mailer. Web Appendix Tables I.A and I.B present summary statistics on the sample frame and the subsample of clients who obtained a loan in this study (Karlan and Zinman (2009)). The lender assigns prior borrowers into low-, medium-, and high-risk categories, and this determines the borrower’s loan pricing and maturity options under normal operations. 5 Information asymmetries may be less prevalent among former clients than new clients if hidden type is revealed through the lending relationship (Elyasiani and Goldberg (2004)). Hence there is reason to expect that a lender faces more adverse selection among new clients (those who have not previously done business with the firm). The lender tried addressing this possibility by sending solicitations to 3000 individuals from a mailing list purchased from a consumer database. Only one person from this list borrowed. Another list was purchased from a different vendor, and 5000 letters were sent without randomized interest rates. Only two people responded. The lender had no previous experience with direct mail solicitation to new clients, and concluded that the lack of response was due to low-quality (fraudulent or untargeted) lists from the consumer database firms, or to consumer unfamiliarity with receiving a solicitation from a firm they have not done business with in the past. In general, unsolicited direct mail is not common in South Africa, but individuals are accustomed to receiving mail from firms with which they do business (e.g., the lender mails solicitations and monthly statements to prior and existing clients).
3.1. Experimental Design and Integrity
The experiment was conducted in three waves: July, September, and October 2003. In each wave clients were randomly assigned three interest rates conditional on their observable risk category. Rates ranged from an upper bound of the prior interest rate for each individual to a lower bound of 3.25% per month (see Web Appendix Tables VIII and IX for details on the rate distributions). The offer rate r o was featured on the direct mailer. The contract and future rates r c and r f were only revealed to clients and loan officers if the client took up the offer (i.e., applied), and after the loan officer completed her initial underwriting (Web Appendix Figure 1 shows the experimental operations, step-by-step).6 Our design contains built-in integrity checks for whether r c and r f were indeed surprises: both client takeup and loan officer approve/reject decisions were uncorrelated with the surprise rates (Web Appendix Table II, columns 4 and 5). Nor were there any instances of clients applying for the loan, being approved, and then not taking out the loan. This fact further corroborates that the contract rate and dynamic repayment incentive were surprises; i.e., that borrowers made takeup decisions with reference to r o only. 5028 (8.7%) clients took up the offer by applying for a loan. Clients applied by entering a branch office and filling out an application in person with a loan officer. Loan applications were taken and assessed as per the lender's normal underwriting procedures. The loan application process took at most one hour, typically less. Loan officers first updated observable information (current debt load, external credit report, and employment information) and decided whether to offer any loan based on their updated risk assessment. 4348 (86.5%) of applicants were approved. Next, loan officers decided the maximum loan size and maturity for which applicants qualified. Each loan supply decision was made "blind" to the experimental rates; i.e., the credit, loan amount, and maturity length decisions were made as if the individual were applying to borrow at the normal rate dictated by her observable risk class.7 After clients chose an allowable loan size and maturity, special software revealed r c in the 41% of cases in which it was lower than r o (otherwise no mention was made of a potentially lower rate). Loan officers were instructed to present the lower contract rate as simply what the computer dictated, not as part of a special promotion or anything particular to the client. Due to operational constraints, clients were then permitted to adjust their desired loan size following
6 Web Appendix Table II, columns 1–3 shows that, as one would expect, the randomly assigned rates were essentially uncorrelated with baseline client characteristics, conditional on observable risk. The prevalence of significant correlations (3 out of 45 cases) is what one would expect to occur by chance.
7 A lower interest rate normally would allow for a larger loan. This would work against identifying moral hazard on the interest rate, so we constrained the maximum allowable loan size to be calculated based on the normal, not experimental, interest rate.
the revelation of r c . In theory, endogenizing loan size in this fashion can work against identifying moral hazard on the contract rate (since a lower r c strengthens repayment incentives ceteris paribus, but might induce choice of a higher loan size that weakens repayment incentives). In practice, however, only about 3% of borrowers who received r c < r o changed their loan demand after r c was revealed. Last, 47% of clients were randomly assigned and informed of a dynamic incentive r f in which clients received the same low contract interest rate on all future loans for one year as long as they remained in good standing with the lender.8 This explicitly raised the benefits of repaying the initial loan on time (or equivalently the cost of defaulting) in the 98% of cases where the contract rate was less than the lender’s normal rate. The average discount embodied in r c , and hence r f , was substantial: an average of 350 basis points off the monthly rate. Moreover, the lender’s prior data suggested that, conditional on borrowing once, a client would borrow again within a year more than half the time. Clients not receiving the dynamic incentive obtained r c for just the first loan (which had only a 4-month maturity in 80% of the cases). Clients were informed of r f by the branch manager only after all paperwork had been completed and all other terms of the loan were finalized. 3.2. Default Outcomes Following the execution of the loan contract we tracked repayment behavior using the lender’s administrative data. In principle, a measure of default should summarize the true economic cost of lending. In practice the true cost is very difficult to measure because of uncertainty and fixed costs in originating, monitoring, and collections. Given these difficulties, the lender lacked a summary statistic for default, and instead relied on a range of proxies for true costs (this is common practice). Consultation with the lender suggested focusing on three measures: (i) monthly average proportion past due (the average default amount in each month divided by the total debt burden); (ii) proportion of months in arrears (the number of months with positive arrearage divided by the number of months in which the loan was outstanding); and (iii) account in collection status (typically, the lender considered a loan in collection status if there are three or more months of payments in arrears). Web Appendix Table I.A presents summary statistics on these default measures. We also create a summary index that aggregates across these three measures of default to allow us to address the problem of multiple inference Kling, Liebman, and Katz (2007). 8 For operational reasons, the dynamic repayment incentive was randomized at the branch level during the first and second wave of the experiment, and at the individual level for the third wave.
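To make the construction of these proxies and of the summary index concrete, a minimal Python sketch follows; the loan-month panel and its column names are hypothetical stand-ins rather than the lender's data, and the index follows the mean-of-standardized-measures form described above.

import pandas as pd

# Hypothetical loan-month panel (illustrative column names, not the lender's schema):
# one row per loan per month outstanding.
panel = pd.DataFrame({
    "loan_id":           [1, 1, 1, 1, 2, 2, 2, 2],
    "amount_due":        [250, 250, 250, 250, 300, 300, 300, 300],
    "amount_past_due":   [0, 125, 250, 375, 0, 0, 0, 300],
    "months_in_arrears": [0, 1, 2, 3, 0, 0, 0, 1],
})

# (i) monthly average proportion past due
avg_prop_past_due = (panel["amount_past_due"] / panel["amount_due"]).groupby(panel["loan_id"]).mean()
# (ii) proportion of months in arrears
prop_months_arrears = (panel["amount_past_due"] > 0).groupby(panel["loan_id"]).mean()
# (iii) account in collection status: three or more months of payments in arrears
in_collection = panel.groupby("loan_id")["months_in_arrears"].max().ge(3).astype(float)

default = pd.DataFrame({
    "avg_prop_past_due": avg_prop_past_due,
    "prop_months_arrears": prop_months_arrears,
    "in_collection": in_collection,
})
# Summary index: mean of the standardized values of the three default measures.
z = (default - default.mean()) / default.std(ddof=0)
default["summary_index"] = z.mean(axis=1)
print(default)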
3.3. Identification Strategy: Intuition
A stylized example, illustrated in Figure 1, captures the heart of our identification strategy. Individuals decide whether to take up at the solicitation's offer rate r o , which can be "high" or "low." Of those that take up at the high r o , some are randomly surprised with a new lower contract interest rate r c , while the remainder continue to receive the high rate (i.e., r c = r o ). We identify any hidden information effect (the combination of selection on risk and on anticipated effort induced through selection on the offer rate) by considering the sample that received the low r c , and comparing the repayment behavior of those who took up at the high r o (cells 2 and 3 in the figure) with those who took up at the low r o (cells 4 and 5). Because everyone in this sample was randomly assigned identical contracts (i.e., low r c ), but selected in at varying, randomly assigned rates, any difference in repayment comes from hidden information: from selection on unobservables, including both type and anticipated effort, induced by r o . Similarly, we identify any effect of hidden action (moral hazard) by considering the sample that took up at the high r o , and comparing the repayment behavior of those who received the high r c (cell 1) with those who received the low r c (cells 2 and 3). These borrowers selected in identically, but ultimately received randomly different r c . Any difference in default comes from the resulting moral hazard. We also identify moral hazard by comparing the repayment behavior of borrowers who both selected in and contracted at identical rates, but face different dynamic repayment incentives from randomly assigned future interest rates r f that are conditional on repayment of the initial loan (cell 2 v. cell 3; cell 4 v. cell 5).
FIGURE 1.—Some basic intuition for our identification strategy.
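The comparisons behind Figure 1 can also be illustrated with simulated data. The sketch below is ours: the data-generating process and parameter values are invented for illustration (they are not estimates from this study), with hidden information built in through an unobserved type that raises both take-up at high offer rates and default, and hidden action built in through a default probability that rises with the contract rate.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
theta = rng.uniform(size=n)                       # unobserved risk type
r_o = rng.choice([0.08, 0.11], size=n)            # randomized offer rate (monthly)

# Take-up: safer types are more deterred by a high offer rate, so the pool
# selecting in at the high rate is riskier on average (hidden information).
takeup = rng.uniform(size=n) < 0.25 - 0.8 * (r_o - 0.08) * (1.0 - theta)

# Surprise contract rate, weakly below the offer rate (about half get a discount).
r_c = np.where(rng.uniform(size=n) < 0.5, 0.08, r_o)

# Default rises with risk type and, through moral hazard, with the contract rate.
default = rng.uniform(size=n) < 0.10 + 0.15 * theta + 1.5 * (r_c - 0.08)

b = takeup
low_c, high_o = b & (r_c == 0.08), b & (r_o == 0.11)
# Hidden information: same (low) contract rate, different offer rates (cells 2+3 vs. 4+5).
hidden_info = default[low_c & (r_o == 0.11)].mean() - default[low_c & (r_o == 0.08)].mean()
# Hidden action: same (high) offer rate, different contract rates (cell 1 vs. cells 2+3).
moral_hazard = default[high_o & (r_c == 0.11)].mean() - default[high_o & (r_c == 0.08)].mean()
print(round(hidden_info, 3), round(moral_hazard, 3))

Both differences come out positive here by construction; the point of the design is that each comparison isolates one channel while the randomization holds the other fixed.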
4. THEORETICAL MODEL AND IDENTIFICATION STRATEGY
We now formalize what can be learned about the presence or absence of different types of asymmetric information problems from empirical tests based on our design. To do this, we provide a model of loan takeup and repayment in the presence of hidden information (in the form of risky prospects and anticipated effort) and hidden actions (in the form of realized effort). Our goal is not to put forward new theory that incorporates both adverse selection and moral hazard and discusses their interplay (e.g., Chassagnon and Chiappori (1997)), but rather to detail precisely what is meant by each in this context. Models with similar features can be found in many sources (e.g., Bardhan and Udry (1999)). Because the lender decisions included in the model are randomized, we only need to model borrower decisions. We do this in three stages, following the experimental design:
1. The individual decides whether or not to borrow at an exogenously set offer rate r o . In making this decision, the individual believes that any repeat loans will be provided at the lender's normal interest rate r.
2. The lender randomly lowers the interest rate for some borrowers to r c < r o . With an independent randomization the lender also lowers the repeat borrowing rate to r f = r c < r for some borrowers. Given r c and r f the borrower decides how much effort, e ∈ [e̲, ē], to put into generating cash flows to repay the loan.
3. Cash flows (i.e., project returns) are realized and the borrower decides whether or not to repay the loan.
We define the borrower's decision process as follows. Each individual has the opportunity to invest in a project but is liquidity constrained and requires financing of 1 from a single lender to do so. We refer to "project" here in a broad sense that includes household as well as entrepreneurial activities. Individuals are indexed by a risk type θ_i ∈ [θ̲, θ̄]. The project either succeeds (g) and returns Y(θ_i) or fails (b) and returns 0. The probability of project success π(θ_i, e) is a function of the project risk type, θ_i, and the effort put forth by the borrower, e ∈ [e̲, ē]. Both risk type and effort are observable to the borrower but incompletely observable to the lender. The borrower is subject to a state specific cost of default, C_i(r f ), i = g, b, which is a decreasing function of the future lending rate. Under limited liability C_b(r f ) < C_g(r f ), but we will explore the implications of relaxing this assumption below. We assume that the borrower is risk neutral and we make the following assumptions regarding returns and repayment:
ASSUMPTION 1: Y(θ_i) > 1 + r o for all θ_i: if the project succeeds, the loan can be repaid at the offer interest rate. If the project fails the loan cannot be repaid (this follows from the borrower's liquidity constraint).
ASSUMPTION 2: ∂π(θ_i, e)/∂e > 0; ∂²π/∂e² < 0: the likelihood of project success is increasing and concave in effort.
ASSUMPTION 3: ∂π(θ_i, e)/∂θ_i < 0: the likelihood of project success is decreasing in risk type.
ASSUMPTION 4: π(θ_i, e)Y(θ_i) = Ȳ(e) for all θ_i: all risk types have the same expected project return. This follows Stiglitz and Weiss (1981) and implies that projects with a higher θ_i are "riskier" in terms of second order stochastic dominance. It also implies, as we show below, that borrowers with higher θ choose a lower effort level.
ASSUMPTION 5: C_g(r f ) ≥ 1 + r o for all relevant r f : there is no strategic default. We make this assumption because we do not observe empirically why the borrower repays or not (e.g., whether the project succeeds or fails) and hence cannot test separately for each of the possible channels through which hidden actions affect repayment. So we use the model to focus on what can be learned about moral hazard in effort under the assumption that borrowers repay if they are able. An alternative interpretation (given our broad definition of "project") is that "effort" is a tractable way to model all borrower activities that impact repayment.
We now turn to solving the three stages of the model. Consider a borrower using backwards induction:
Stage 3. By Assumptions 1 and 5 the borrower repays if and only if the project succeeds.
Stage 2. Knowing that she will repay if and only if the project succeeds, the borrower chooses effort to solve:

    max_{e ∈ [e_L, e_H]}  π(θ_i, e)(Y(θ_i) − 1 − r c + C_b(r f )) − e − C_b(r f ).

Effort is decreasing in r c given Assumption 2. This implies:
HIDDEN ACTION EFFECT 1: A given set of borrowers exerts less effort at higher contract interest rates than at lower contract interest rates, holding constant offer and future interest rates. Thus repayment is decreasing in r c .
Effort is also decreasing in r f given Assumptions 1, 2, and 5. This implies:
HIDDEN ACTION EFFECT 2: A given set of borrowers exerts less effort as the cost of default decreases, holding constant offer and contract interest rates. Thus repayment is decreasing in r f .
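These two effects can be illustrated numerically under functional forms chosen purely for concreteness and not taken from the paper: Y(θ) = θ and Ȳ(e) = 2.5(1 − exp(−e)), so that π(θ, e) = Ȳ(e)/θ satisfies Assumptions 2–4 for θ ≥ 2.5. The following sketch solves the stage-2 problem by grid search.

import numpy as np

def optimal_effort(theta, r_c, C_b, grid=np.linspace(0.0, 5.0, 2001)):
    # Grid search of the stage-2 problem under the illustrative forms
    # Y(theta) = theta and pi(theta, e) = 2.5 * (1 - exp(-e)) / theta.
    pi = 2.5 * (1.0 - np.exp(-grid)) / theta
    payoff = pi * (theta - 1.0 - r_c + C_b) - grid - C_b
    return grid[payoff.argmax()]

theta = 3.5
# Hidden Action Effect 1: effort falls as the contract rate rises (C_b held fixed).
print([round(optimal_effort(theta, r_c, C_b=0.3), 2) for r_c in (0.04, 0.08, 0.12)])
# Hidden Action Effect 2: effort falls as the cost of default C_b(r^f) falls,
# i.e., as the dynamic repayment incentive weakens.
print([round(optimal_effort(theta, 0.08, C_b), 2) for C_b in (0.6, 0.3, 0.0)])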
Risk type, θ_i, also affects effort, which then affects repayment. To see this note the borrower's first order condition for optimal effort:

    ∂π(θ_i, e)/∂e · (Y(θ_i) − 1 − r c + C_b(r f )) = 1.

Given Assumption 4, we can implicitly define optimal effort ê as a function of r c , C_b(r f ), and θ_i:

(1)    [1 − Ȳ′(ê(r c , C_b(r f ), θ_i))] Y(θ_i) / Ȳ′(ê(r c , C_b(r f ), θ_i)) = C_b(r f ) − 1 − r c .

Equation (1) implies that ê(r c , C_b(r f ), θ_i) must be a decreasing function of θ_i, i.e., ceteris paribus, higher risk borrowers put in less effort.9 We use this finding below to help interpret our third effect, the effect of the offer rate on repayment.
Stage 1. An individual decides to take up the offer if the expected return from her project, given expected optimized effort at the offer interest rate, ê(r o , C_b(r), θ_i),10 is greater than her next-best option (set to zero for simplicity). That is, an individual borrows from the lender if and only if

(2)    π(θ_i, ê(r o , C_b(r), θ_i))(Y(θ_i) − 1 − r o + C_b(r)) − ê(r o , C_b(r), θ_i) − C_b(r) ≥ 0,

where ê(r o , C_b(r), θ_i) is the optimal level of effort for an individual with project type θ_i that borrows and expects to pay the offer interest rate, r o . If we assume that C_b(r) < 1 + r o then the left-hand side of (2) is increasing in riskiness, θ_i. To see this, note that the envelope theorem implies that the increase in θ_i has no indirect effect through effort. The only effect of increasing θ_i comes through the term π(θ_i, ê), which has a negative first derivative by Assumption 3. Consequently, for a given r o , either all borrowers will take out a loan, or there will be a separation with those with a higher θ_i taking a loan. We define the implicit function θ(r o ) as the θ_i below which individuals, offered interest rate r o , do not borrow, i.e. the θ_i at which equation (2) equals zero. The implicit function theorem implies that:

(3)    dθ(r o )/dr o > 0.

9 This follows from the concavity of π in effort.
10 In stage one the borrower evaluates optimal effort at the offer rate and standard future borrowing rate because any surprise rates have not yet been revealed (see Web Appendix Table II, column 4 for corroborating evidence that takeup is uncorrelated with surprise rates).
This partial derivative implies that higher offer interest rates lead to a riskier pool of clients. Coupled with Assumption 4, this produces the classic Stiglitz– Weiss adverse selection effect: a higher offer interest rate leads to a lower repayment rate. If Cb (r) > 1 + r o , which is implied by Cb (r) ≥ Cg (r), we would get the opposite result. That is, increasing the interest rate would lead to less risk in the borrower pool—advantageous selection. The classic adverse selection result relies heavily on the asymmetry of borrower default costs across states. While the empirical prevalence of limited liability gives the asymmetry assumption some appeal, there may be cases in which it does not hold. Our empirical results, will shed light on the plausibility of the asymmetry assumption. Note also that Equation (3) is only true for a marginal change in r o . If we consider a discrete change in r o the risk pool will change through two channels. One is the classic Stiglitz–Weiss adverse selection effect. Two is an anticipated effort effect that cannot be signed theoretically without an additional assumption: although we know from (1) that riskier clients exert less effort, we can only assert that the anticipated effort effect here actually draws in riskier clients (and thereby reinforces the classic adverse selection effect) by assuming that the cost of additional effort at the discretely higher rate is greater than the benefit. So without that additional assumption the net effect of r o on the risk pool and hence on repayment is ambiguous in theory. Of course we will test the net effect empirically. HIDDEN INFORMATION EFFECT: We label the net effect of r o on repayment “Hidden Information” because it only exists if there is selection on unobservables prior to actual effort choice. It is important to note that what we learn about the nature of asymmetric information from the empirical test of how r o affects repayment depends on whether we find either of the hidden action effects (i.e., on whether there is moral hazard in effort). If there is no hidden action effect then the test identifies any effect of classic adverse selection (or of advantageous selection) on risk. If there is a hidden action effect then our test identifies the reduced-form combination of any classic adverse selection effect and an effect of selection on anticipated effort that may either reinforce or offset any classic adverse selection. In this case the offer rate provides a one-sided test for hidden information: if we find that r o affects repayment this is evidence of a hidden information problem that works through either or both channels (classic adverse selection and/or selection on anticipated effort). But if we find that r o does not affect repayment we might be failing to identify offsetting effects of classic adverse selection and advantageous selection on anticipated effort that can still have negative welfare consequences.11 11 See Finkelstein and McGarry (2006) and Fang, Keane, and Silverman (2008) for evidence and discussion on the effects of offsetting selection on risk exposure and other decision inputs.
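The selection margin can be made concrete with the same illustrative functional forms as in the earlier sketch (Y(θ) = θ, Ȳ(e) = 2.5(1 − exp(−e))) together with the assumption C_b(r) < 1 + r o ; the parameter values below are ours, chosen only so that the threshold θ(r o ) defined by equation (2) is interior and its movement with r o , as in equation (3), is visible.

import numpy as np

def takeup_value(theta, r_o, C_b, grid=np.linspace(0.0, 5.0, 2001)):
    # Left-hand side of (2), evaluated at the anticipated optimal effort,
    # under the illustrative forms Y(theta) = theta, pi = 2.5*(1 - exp(-e))/theta.
    pi = 2.5 * (1.0 - np.exp(-grid)) / theta
    return (pi * (theta - 1.0 - r_o + C_b) - grid - C_b).max()

def threshold(r_o, C_b=0.3, thetas=np.linspace(2.5, 5.0, 501)):
    # Smallest risk type willing to borrow at offer rate r_o: theta(r_o) in the text.
    values = np.array([takeup_value(t, r_o, C_b) for t in thetas])
    take = thetas[values >= 0]
    return take[0] if take.size else np.nan

# Equation (3): a higher offer rate raises the threshold, drawing in a riskier pool.
print([round(threshold(r_o), 2) for r_o in (0.04, 0.12, 0.20)])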
In sum, our experimental design allow us, under the assumptions detailed above, to differentiate between hidden information and hidden action effects that have theoretical and policy relevance. Below we present the empirical results and then discuss implications in the Conclusion. 5. RESULTS Table I presents estimates from an empirical model of default that tests for the three asymmetric information effects derived above: (4)
    Y_i = α + β_o r o + β_c r c + β_b C + X_i + ε_i,
where Yi is one of the measures of default described in Section 3.2, and Xi is a vector of the randomization conditions: observable risk, mailer wave, and branch.12 Adding controls for loan size and maturity does not change the results. We estimate equation (4) on the entire sample of 4348 individuals who obtained a loan. The specifications in Table I vary only in how they measure default, and in whether the dynamic repayment incentive C is measured as a binary variable (= 1 if r f = r c on future loans conditional on good repayment of initial loan) or with binary and continuous (r − r f ) variables. Columns 1–6 estimate the effects of the randomly assigned interest rates on default using individual default measures. Columns 7 and 8 use a summary index of the three default measures; these results are interpreted as the average effect of the interest rate on default, in standard deviation units. The first row of Table I presents estimates of βc , the effect of the contract rate on default. This coefficient identifies any Hidden Action Effect 1, with βc > 0 indicating moral hazard in effort on the contract rate. Seven out of the eight coefficients are positive; the one marginally significant result (column 3) implies that a 100 basis point cut would reduce the average number of months in arrears by 3%. The next row presents estimates of βb , the effect of the dynamic repayment incentive on default. Every specification points to economically and statistically significant moral hazard. Columns 1, 3, and 5 imply that clients assigned any dynamic incentive defaulted an estimated 13 to 21 percent less than the mean. The summary index test also finds a large and significant effect. Columns 2, 4, and 6 show that the effect is increasing in and driven by the size of the discount on future loans, as each 100 basis point decrease in r f reduces default by about 4% in the full sample. The second-to-last row of the table shows that binary incentive and the size of the discount are jointly significant in all specifications. 12 The dynamic incentive was randomized at the branch office level in waves 1 and 2 and hence the error term allows for clustering at the branch level. This is done to allow for the possibility that borrowers in the same branch are subject to similar shocks. Thus following Bloom (2005) and Duflo, Glennerster, and Kremer (2006) we cluster the standard errors at the unit of randomization.
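As a sketch of how specification (4) with branch-clustered standard errors could be estimated, the code below uses a synthetic borrower-level data frame; the column names, sample construction, and coefficient values used to generate the data are hypothetical and are not the study's data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4348
df = pd.DataFrame({
    "offer_rate":        rng.choice([3.25, 7.75, 11.75], size=n),
    "dynamic_incentive": rng.integers(0, 2, size=n),
    "risk_category":     rng.choice(["low", "medium", "high"], size=n),
    "wave":              rng.integers(1, 4, size=n),
    "branch":            rng.integers(1, 87, size=n),
})
df["contract_rate"] = np.minimum(df["offer_rate"], rng.choice([3.25, 7.75, 11.75], size=n))
# Synthetic outcome with a small moral hazard effect, for illustration only.
df["prop_past_due"] = (0.03 + 0.005 * df["contract_rate"]
                       - 0.02 * df["dynamic_incentive"]
                       + rng.normal(0, 0.15, size=n)).clip(lower=0)

model = smf.ols(
    "prop_past_due ~ offer_rate + contract_rate + dynamic_incentive"
    " + C(risk_category) + C(wave) + C(branch)",
    data=df,
)
# Cluster at the branch level, the unit of randomization for the dynamic incentive.
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["branch"]})
print(result.params[["offer_rate", "contract_rate", "dynamic_incentive"]])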
TABLE I
EMPIRICAL TESTS OF HIDDEN INFORMATION AND HIDDEN ACTION: FULL SAMPLE OLS

Dependent Variable:                  Monthly Average         Proportion of           Account in              Standardized Index of
                                     Proportion Past Due     Months in Arrears       Collection Status       Three Default Measures
Mean of Dependent Variable:          0.09       0.09         0.22       0.22         0.12       0.12         0          0
                                     (1)        (2)          (3)        (4)          (5)        (6)          (7)        (8)

Contract rate                        0.005      0.002        0.006*     0.002        0.001      -0.001       0.014      0.004
  (Hidden Action Effect 1)           (0.003)    (0.004)      (0.003)    (0.004)      (0.005)    (0.005)      (0.011)    (0.013)
Dynamic repayment incentive dummy    -0.019*    -0.000       -0.028**   0.004        -0.025**   -0.004       -0.080**   -0.000
  (Hidden Action Effect 2)           (0.010)    (0.017)      (0.011)    (0.021)      (0.012)    (0.020)      (0.032)    (0.057)
Dynamic repayment incentive size                -0.005                  -0.009**                -0.006                  -0.023*
                                                (0.004)                 (0.004)                 (0.005)                 (0.013)
Offer rate                           0.005      0.004        0.002      0.002        0.007      0.007        0.015      0.015
  (Hidden Information Effect)        (0.003)    (0.003)      (0.003)    (0.004)      (0.005)    (0.005)      (0.011)    (0.012)

Observations                         4348       4348         4348       4348         4348       4348         4348       4348
Adjusted R-squared                   0.08       0.08         0.14       0.15         0.06       0.06         0.10       0.11
Probability(both dynamic
  incentive variables = 0)                      0.06                    0.00                    0.06                    0.01
Probability(all 3 or 4 interest
  rate variables = 0)                0.0004     0.0005       0.0003     0.0012       0.0006     0.0016       0.0000     0.0001

*significant at 10%; **significant at 5%; ***significant at 1%. Each column presents results from a single OLS model with the RHS variables shown and controls for the randomization conditions: observable risk, month of offer letter, and branch. Adding loan size and maturity as additional controls does not change the results. Robust standard errors in parentheses are corrected for clustering at the branch level. "Offer rate" and "Contract rate" are in monthly percentage point units (7.00% interest per month is coded as 7.00). "Dynamic repayment incentive" is an indicator variable equal to one if the contract interest rate is valid for one year (rather than just one loan) before reverting back to the normal (higher) interest rates. "Dynamic repayment incentive size" interacts the above indicator variable with the difference between the lender's normal rate for that individual's risk category and the experimentally assigned contract interest rate. A positive coefficient on the Offer Rate variable indicates hidden information, a positive coefficient on the Contract Rate or Dynamic Repayment Incentive variables indicates hidden action (moral hazard). The dependent variable in columns (7) and (8) is a summary index of the three dependent variables used in columns (1)–(6). The summary index is the mean of the standardized value for each of the three measures of default.
The next row presents estimates of βo , the effect of the offer rate on default. Given the presence of moral hazard this coefficient identifies a net Hidden Information Effect that is the combination of any classic adverse selection, and selection on anticipated effort that may either reinforce or offset any classic adverse selection. A positive (negative) coefficient indicates net adverse (advantageous) selection on hidden information. The point estimates are positive in all eight specifications but never significant. The bottom row shows the F -test p-value for the null hypothesis of no asymmetric information effects on default (i.e., that all of the interest rate coefficients = 0). This hypothesis is rejected with >99% confidence in each of the 8 specifications. Web Appendix Table III shows that we find similar results if we bin clients along the lines of Web Appendix Figure 1 and compare means. The Web Appendix also contains additional results (in Tables IV–VIII) on heterogeneity in hidden information and hidden action effects by gender and borrowing history, and on the efficiency of the lender’s underwriting process. 6. CONCLUSION We develop a new market field experiment methodology that disentangles hidden information from hidden action effects. The experiment was implemented on a sample of successful prior borrowers by a for-profit lender in a high-risk South African consumer loan market. The results indicate significant moral hazard, with weaker evidence for adverse selection on hidden information. Practically, identifying the existence and prevalence of any specific information asymmetries is important because of the preponderance of credit market interventions that presuppose credit rationing arising from these asymmetric information problems. But theory and practice are far ahead of the empirical evidence. To craft optimal policies and business strategies, we need answers to at least three key questions: • Which models of information asymmetries (if any) accurately describe existing markets? • What lending practices are effective at mitigating information asymmetries? • What are the welfare implications of resolving information asymmetry problems in credit markets? Our paper makes inroads to the first question only in one particular market and, hence, does not lead directly to a policy prescription. There are many promising directions for future research and we mention a few. One is replicating our experimental design in different markets. There is particularly strong motivation for studying more marginal (e.g., first-time) borrowers, since these borrowers are the focus of many interventions and may pose relatively severe hidden information problems. Our design can also be
adapted to other product and service markets in which it is useful to separate selection effects from ex-post incentive effects. Another direction is to design tests that address the key confound discussed in the theoretical section: selection processes can attract types who exert less unobserved effort as well as types who are the innately more risky. Collecting supplemental data on margins of effort and riskiness that are not typically observed by the principal can help isolate these different selection channels (Finkelstein and McGarry (2006); Fang, Keane, and Silverman (2008)). Another approach to isolating adverse selection on risk would be to study contexts where effort can be observed (e.g., settings where firms can closely monitor employee actions and productivity). Uncovering the actual nature and practical implications (if any) of asymmetric information problems in credit markets will require theoretical as well as empirical progress. We highlight a fundamental entangling of selection and effort, specifically that selection processes may draw in individuals with different anticipated effort as well as with different project risks. Thus the clean theoretical distinction between adverse selection and moral hazard may not be identifiable empirically in many contexts. Salanie (2005) lauds the “constant interaction between theory and empirical studies” (p. 221) that has characterized the closely related literature on insurance markets. Comparably intense interactions would deepen our understanding of credit markets, and field experiments can be a useful tool for testing and refining theories as well as practice. REFERENCES ARMENDARIZ DE AGHION, B., AND J. MORDUCH (2005): The Economics of Microfinance. Cambridge, MA: MIT Press. [1993] AUSUBEL, L. M. (1999): “Adverse Selection in the Credit Market,” Working Paper, Department of Economics, University of Maryland. [1994] BANERJEE, A., AND A. NEWMAN (1993): “Occupational Choice and the Process of Development,” Journal of Political Economy, 101, 274–298. [1993] BARDHAN, P., AND C. UDRY (1999): Development Microeconomics. Oxford, U.K.: Oxford University Press. [2000] BEBCZUK, R. (2003): Asymmetric Information in Financial Markets: Introduction and Applications. Cambridge: Cambridge University Press. [1993] BERNANKE, B., AND M. GERTLER (1990): “Financial Fragility and Economic Performance,” Quarterly Journal of Economics, 105, 87–114. [1993] BLOOM, H. S. (2005): “Randomizing Groups to Evaluate Place-Based Programs,” in Learning More From Social Experiments: Evolving Analytical Approaches, ed. by H. S. Bloom. New York: Russel Sage Foundation, Chap. 4. [2004] CARDON, J. H., AND I. HENDEL (2001): “Asymmetric Information in Health Insurance: Evidence From The National Medical Expenditure Survey,” RAND Journal of Economics, 32, 408–427. [1994] CHASSAGNON, A., AND P. A. CHIAPPORI (1997): “Insurance Under Moral Hazard and Adverse Selection: The Case of Pure Competition,” Working Paper 28, Laval-Laboratoire Econometrie. [2000]
CHIAPPORI, P. A., AND B. SALANIE (2000): “Testing for Asymmetric Information in Insurance Markets,” Journal of Political Economy, 108, 56–78. [1994] DE MEZA, D., AND D. C. WEBB (1987): “Too Much Investment: A Problem of Asymmetric Information,” Quarterly Journal of Economics, 102, 281–292. [1993] DEPARTMENT OF TRADE AND INDUSTRY SOUTH AFRICA (2003): “Credit Law Review, Summary of Findings of the Technical Committee,” Report, Department of Trade and Industry, South Africa. [1995] DUFLO, E., R. GLENNERSTER, AND M. KREMER (2006): “Using Randomization in Development Economics Research: A Toolkit,” Working Paper 333, National Bureau of Economic Research. [2004] ELYASIANI, E., AND L. G. GOLDBERG (2004): “Relationship Lending: A Survey of the Literature,” Journal of Economics and Business, 56, 315–330. [1996] FANG, H., M. P. KEANE, AND D. SILVERMAN (2008): “Sources of Advantageous Selection: Evidence From the Medigap Insurance Market,” Journal of Political Economy, 116, 303–350. [2003,2007] FINKELSTEIN, A., AND K. MCGARRY (2006): “Multiple Dimensions of Private Information: Evidence From the Long-Term Care Insurance Market,” American Economic Review, 96, 938–958. [2003,2007] GALE, W. G. (1990): “Federal Lending and The Market for Credit,” Journal of Public Economics, 42, 177–193. [1993] HUBBARD, R. G. (1998): “Capital Market Imperfections and Investment,” Journal of Economic Literature, 36, 193–225. [1993] KARLAN, D., AND J. ZINMAN (2009): “Supplement to ‘Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/5781_data and programs.zip. [1996] KLING, J., J. LIEBMAN, AND L. KATZ (2007): “Experimental Analysis of Neighborhood Effects,” Econometrica, 75, 83–120. [1998] MANKIW, N. G. (1986): “The Allocation of Credit and Financial Collapse,” Quarterly Journal of Economics, 101, 455–470. [1993] MOOKHERJEE, D., AND D. RAY (2002): “Contractual Structure and Wealth Accumulation,” American Economic Review, 92, 818–849. [1993] PORTEOUS, D. (2003): “Is Cinderella Finally Coming to the Ball: SA Micro Finance in Broad Perspective,” Working Paper, Micro Finance Regulatory Council. [1995] ROBINSON, M. (2001): The Microfinance Revolution: Sustainable Finance for the Poor. Washington, DC: IBRD/The World Bank. [1995] SALANIE, B. (2005): The Economics of Contracts: A Primer. Cambridge, MA: MIT Press. [2007] SHEARER, B. (2004): “Piece Rates, Fixed Wages and Incentives: Evidence From a Field Experiment,” Review of Economic Studies, 71, 513–534. [1994] STIGLITZ, J. E., AND A. WEISS (1981): “Credit Rationing in Markets With Imperfect Information,” American Economic Review, 71, 393–410. [1993,2001]
Dept. of Economics, Yale University, P.O. Box 208269, New Haven, CT 065208269, U.S.A.; [email protected] and Dartmouth College, Hanover, NH 03755, U.S.A.; [email protected]. Manuscript received March, 2005; final revision received December, 2008.
Econometrica, Vol. 77, No. 6 (November, 2009), 2009–2017
COMMENTS ON "CONVERGENCE PROPERTIES OF THE LIKELIHOOD OF COMPUTED DYNAMIC MODELS"
BY DANIEL ACKERBERG, JOHN GEWEKE, AND JINYONG HAHN
We show by counterexample that Proposition 2 in Fernández-Villaverde, Rubio-Ramírez, and Santos (Econometrica (2006), 74, 93–119) is false. We also show that even if their Proposition 2 were corrected, it would be irrelevant for parameter estimates. As a more constructive contribution, we consider the effects of approximation error on parameter estimation, and conclude that second order approximation errors in the policy function have at most second order effects on parameter estimates.
KEYWORDS: Approximation error.
1. INTRODUCTION
FERNÁNDEZ-VILLAVERDE, RUBIO-RAMÍREZ, AND SANTOS (2006; FRS hereafter) consider likelihood based estimation of economic models which cannot be solved analytically and must be solved numerically or with some other form of approximation. This approximated model (e.g., an optimal policy function in the dynamic environment of FRS) is used in place of the exact model to form an approximated likelihood function. The approximated likelihood is then used for statistical inference. FRS posed the question, "What are the effects on statistical inference of using an approximated likelihood instead of the exact likelihood?" Assuming that the approximated model is converging to the exact model, they obtained two results. First, given a fixed number of observations T, they found conditions under which the approximated likelihood function converges to the exact likelihood function. This is an important contribution, since if the approximate likelihood function converges to something other than the exact likelihood function, it is hard to see how one could ever perform meaningful inference.
In the second part of the paper (Section 4), the authors consider implications of the relative speed at which the approximated model (in their case, the policy function) converges to the exact model. Their main conclusion here is that "second order approximation errors in the policy function, which almost always are ignored by researchers, have first order effects on the likelihood function" (emphasis in the introduction of original text). More specifically, Proposition 2 in their paper states that the difference between the approximated likelihood function and the exact likelihood function is bounded by a function that includes a term of the form TBχδ, where δ is a bound on the approximation error of the model, and where B and χ are constants that do not depend on T. This result implies that as T increases, one may need the approximation error δ to be shrinking at a rate faster than T for the approximated likelihood to converge to the exact likelihood. Given the authors' emphasis on the result, a reader might additionally conclude that, for meaningful inference on the
parameter vector γ0, one may also need the approximation error δ to disappear at a rate faster than T.
This comment has two parts. First, we show by counterexample that Proposition 2 in FRS is false. In a simple model that satisfies the assumptions of the FRS framework, we find a sharp bound on the likelihood function that can either be larger or smaller (depending on parameters) than the bound claimed by Proposition 2. We note that Geweke (2007) was the first to point out the error in Proposition 2 in FRS. He showed that the upper bound derived there is incorrect. He also pointed out the logical error of using an upper bound that is not sharp to make conclusions regarding the effects of approximation error. Our note goes one step further, using the counterexample to show that even if their Proposition 2 were corrected, it would be irrelevant for parameter inference.
Second, as a more constructive contribution, we extend the results of FRS to explicitly consider the effects of approximation error on parameter inference from a classical perspective.1 What is relevant for classical maximum likelihood inference on γ0 is not the behavior of the approximated likelihood function per se (as considered by FRS), but the behavior of the maximizer of the approximated likelihood function. Denote this maximizer as γ̂_j and call it the pseudo-maximum likelihood estimator (PMLE). Our first result shows that as T increases, the difference between γ̂_j and the true parameter vector γ0 converges to something bounded by a term of the same order as the approximation error. Hence, we conclude that second order approximation errors in the policy function have at most second order effects on parameter inference.
We then investigate the consistency and asymptotic normality of the PMLE. The analysis is a straightforward extension of that used in the simulated maximum likelihood literature (e.g., Gouriéroux and Monfort (1991), Hajivassiliou and Ruud (1994)). We first show that as long as the approximation error converges to zero at any rate (as T increases), γ̂_j is a consistent estimate of γ0.2 Regarding asymptotic normality, we show that as long as the approximation error disappears at a rate faster than √T, the approximation error does not affect the asymptotic distribution of the maximum likelihood estimator. In other words, the asymptotic distribution of √T(γ̂_j − γ0) is normal with mean 0 and variance given by the inverse of the information matrix of the exact model. Under our assumptions, this information matrix can be consistently estimated in the standard way.
Note that our extension is done under some additional regularity conditions that are not assumed by FRS. 2 A working paper version of Fernández-Villaverde, Rubio-Ramírez, and Santos (2005) shows that if the approximated likelihood function converges to the exact likelihood function, the maximum likelihood estimate using the approximated likelihood converges to the value of the parameter vector that maximizes the exact likelihood function. However, this is done for fixed T and thus does not imply anything about consistency or asymptotic normality.
We conclude that the relative impact of approximation error is not as large as one might conclude from a reading of FRS. The effects of approximation error on classical inference regarding γ0 are of the same magnitude as the approximation error itself. There is no sense in which the effects of approximation error on point estimates worsen as the sample size T increases. To put the result in context, it is helpful to compare the effects of approximation error to the effects of simulation error in maximum likelihood estimation. In both cases, there are "bias" terms that must disappear at a rate faster than √T to not affect the asymptotic distribution of the estimator. However, we emphasize that these are only asymptotic results. One's estimates are only going to be as accurate as one's approximations, and we believe one should make concerted efforts to make these approximations as precise as possible. We also note that it is beyond the scope of this comment to characterize the convergence behavior of approximation error using various approximation techniques.3
2. SIMPLIFIED VERSION OF FRS'S PROPOSITION 2
We consider a very simplified, static version of the model studied by FRS. This simple model is sufficient to illustrate the key points of this comment. Note that since our simplified model is just a restricted version of the model of FRS,4 our counterexample to FRS's Proposition 2 is in fact a valid counterexample. Our simple model is y_t = g(v_t; γ), where v_t is an independent and identically distributed unobservable, and γ is an unknown parameter which will be assumed to be a scalar for simplicity of notation. As in FRS, we will assume that we cannot compute the function g(·; γ) exactly. Alternatively, it can be approximated by a function g_j(·; γ), where j indexes the accuracy of the approximation. This approximated model g_j(·; γ) will in turn generate an approximation p_j(·; γ) to the true density p(·; γ) of y_t. Lemma 6 of FRS then implies that, under some regularity conditions, if ‖g_j(·; γ) − g(·; γ)‖ ≤ δ, where ‖·‖ is the sup norm, then there exists some constant χ > 0 such that

(1)    |p_j(y_t; γ) − p(y_t; γ)| ≤ χδ.

3 For example, quadrature and interpolation.
4 Formally, one can get to our simplified model from the FRS framework by simply adding the additional assumptions that S_t = {·} and W_t = {·}.
Their Proposition 2 further implies that

(2)    |∏_{t=1}^T p_j(y_t; γ) − ∏_{t=1}^T p(y_t; γ)| ≤ TBχδ,
where, like χ, B is a constant that does not depend on T.5
3. COUNTEREXAMPLE TO PROPOSITION 2
We now show by counterexample that Proposition 2 of FRS is incorrect. Consider the model y_t = g(v_t; γ) = γ + v_t, where v_t ∼ N(0, σ²). Suppose that σ² is known, so the only parameter to estimate is γ. Suppose that instead of using the true model g(v_t; γ) = γ + v_t, the econometrician uses an approximated model g_j(v_t; γ) = γ + δ + v_t. For now, consider the approximation error δ ≠ 0 to be a constant. Again, note that this simple model satisfies the assumptions of the FRS framework. The exact and approximated likelihoods for y_t are, respectively,

    p(y_t; γ) = (1/(σ√(2π))) exp(−(y_t − γ)²/(2σ²))

and

    p_j(y_t; γ) = (1/(σ√(2π))) exp(−(y_t − δ − γ)²/(2σ²)).
It can be shown6 that the difference in these two individual likelihoods can be bounded by (3)
|p(yt ; γ) − pj (yt ; γ)| ≤ χ(δ)|δ|
where χ(δ) is such that

(4)    χ(δ) ≤ (1/(σ²√(2π))) exp(−1/2)

and

(5)    1/(σ√(2π)) ≤ liminf_{|c|→∞} |c|χ(c) ≤ limsup_{|c|→∞} |c|χ(c) ≤ 2/(σ√(2π)).
5 The second term in the published version of Proposition 2 of FRS disappears in our simple example since S_t = {·} and W_t = {·}.
6 Proofs of (3), (4), and (5) are available in the Supplemental Material (Ackerberg, Geweke, and Hahn (2009)).
It is also shown that the difference in the joint likelihoods can be bounded by

(6)    |∏_{t=1}^T p_j(y_t; γ) − ∏_{t=1}^T p(y_t; γ)| ≤ (1/(σ√(2π)))^{T−1} |√T δ| χ(√T δ),
and that this bound is sharp.
Consider the case where σ < 1/√(2π). Because |√T δ|χ(√T δ) is bounded from above and away from zero for T large, our bound is of order K^{T−1}, where K > 1. Comparing our bound to the bound derived in Proposition 2 of FRS (i.e., (2)), it is obvious that for large enough T, our bound will be strictly larger. Given that our bound is sharp, this contradicts Proposition 2 of FRS. If σ > 1/√(2π), then our bound is of order K^{T−1} with K < 1, which implies that in other cases the bound in Proposition 2 of FRS is too big.
Given this counterexample, one might be inclined to try to find a valid bound for the joint likelihood of more general models. However, further consideration of this simple counterexample suggests that this might not be a fruitful endeavor. Note that the maximum likelihood estimator (MLE) of γ in the simple counterexample is γ̂ = ȳ. The MLE using the approximate model is γ̂_j = ȳ − δ. Hence, the effect of approximation error on inference regarding γ is of the same order of magnitude as the approximation error in the model g_j(v_t; γ). In addition, the impact of a fixed level of approximation error does not depend on T. These results are true regardless of the value of the standard deviation σ. In contrast, the dependence of our sharp bound (6) on T depends dramatically on the value of σ. When σ < 1/√(2π), the bound increases exponentially in T (for fixed δ). When σ > 1/√(2π), the bound actually shrinks. Thus, while the effects of approximation error on γ̂_j do not depend on σ, the effect of approximation error on the bound of the joint likelihood depends critically on σ. The counterexample is therefore suggestive that the way in which these joint likelihood function bounds depend on T may not be relevant for studying the effects of approximation error on inference, even in more general models. In this simple example, the Bayesian posterior means (with flat priors) are equivalent to the MLEs. Hence, this evidence is also suggestive that the dependence of a joint likelihood bound on T is not relevant for both classical inference and at least some aspects of Bayesian inference.
In the next two sections, we provide a constructive analysis of the effects of approximation error on classical maximum likelihood inference. This analysis is based on the differences in the true and the approximating average log likelihoods rather than the differences in their levels.
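The geometric-versus-linear contrast can be checked numerically. The sketch below assumes that χ(·) is the sharp coefficient implicit in (3), computed by grid search as the supremum over y of |p(y; γ) − p_j(y; γ)| divided by |δ|, and evaluates the right-hand side of (6) for a fixed δ and σ < 1/√(2π); the parameter values are illustrative.

import numpy as np
from scipy.stats import norm

def chi(c, sigma, y=np.linspace(-5.0, 5.0, 200001)):
    # Sharp coefficient in (3): sup_y |phi_sigma(y) - phi_sigma(y - c)| / |c|,
    # computed here by grid search.
    return np.abs(norm.pdf(y, 0.0, sigma) - norm.pdf(y, c, sigma)).max() / abs(c)

sigma, delta = 0.2, 0.05              # sigma < 1/sqrt(2*pi), so K > 1 below
K = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
for T in (1, 5, 10, 20):
    c = np.sqrt(T) * delta
    bound = K ** (T - 1) * abs(c) * chi(c, sigma)
    print(T, f"{bound:.3g}")          # grows geometrically in T, not linearly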
4. THE EFFECTS OF APPROXIMATION ERROR ON CLASSICAL INFERENCE
We now extend the results in FRS to examine the effects of approximation error on classical inference regarding γ. This is done in the context of the static model of the previous section, but we suspect that our results could also be shown in FRS's more general framework. We first show that we can bound the effect of approximation error on parameter inference by a term of the same order of magnitude as the approximation error. In other words, we show that second order approximation errors in the model have at most second order effects on inference regarding the model's parameters. In the following section, we explicitly analyze the effect of approximation error on the asymptotic distribution of maximum likelihood estimators.
Denote the true value of γ by γ0. Define γ̂_j as the pseudo-maximum likelihood estimator (PMLE) which maximizes the approximated joint log likelihood function, that is,

    γ̂_j = arg max_γ (1/T) ∑_{t=1}^T log p_j(y_t; γ).

To characterize the magnitude of the effect of approximation error on estimation, we will investigate how the probability limit of γ̂_j depends on the approximation error. We use the Sobolev norm to measure the degree of approximation:
DEFINITION 1: We define

    Δ_j ≡ max_{k=0,1,2} sup_{y_t, γ} | (∂^k/∂γ^k) log p_j(y_t; γ) − (∂^k/∂γ^k) log p(y_t; γ) |.

The Δ_j measures how well the individual log likelihood p_j approximates both the level and the shape of the exact log likelihood. This approximation error in the individual log likelihood is generated by the difference between the exact model (g(v_t; γ)) and the approximated model (g_j(v_t; γ)). One could derive this bound Δ_j from lower level assumptions on bounds relating g_j(v_t; γ), g(v_t; γ), and their derivatives.7 For this comment, it is sufficient to observe that this approximation error bound Δ_j will not depend on the sample size T. This is because in our simple model, none of g_j(v_t; γ), g(v_t; γ), or the distribution of v_t depends on the sample size T. We will assume that the index j is such that the approximation error gets small as j gets larger:
CONDITION 1: We assume Δ_j → 0 as j → ∞.
We will assume the following standard regularity conditions on the exact likelihood, which can be found in, for example, Newey and McFadden (1994; NM hereafter):
7 Given that we bound differences in shapes as well as levels between the approximated and true individual log likelihoods, this might require extra regularity conditions (on g_j(v_t; γ), g(v_t; γ), and the distribution of v_t) in addition to those assumed by FRS.
We will assume that the index j is such that the approximation error gets small as j gets larger:

CONDITION 1: We assume Δ_j → 0 as j → ∞.

We will assume the following standard regularity conditions on the exact likelihood, which can be found in, for example, Newey and McFadden (1994; NM hereafter):

CONDITION 2: (i) If γ ≠ γ_0, then p(y_t; γ) ≠ p(y_t; γ_0); (ii) γ_0 ∈ Γ, which is compact; (iii) log p(y_t; γ) is continuous at each γ ∈ Γ with probability 1; (iv) E[sup_{γ∈Γ} |log p(y_t; γ)|] < ∞.

Given these conditions, NM (Lemma 2.2) implies that the function Q_0(γ) ≡ E[log p(y_t; γ)] is uniquely maximized at γ_0. We will strengthen this identification result by making the following additional assumption:

CONDITION 3: We assume Q_0''(γ_0) < 0.

Last, we assume additional regularity conditions on both the exact individual likelihood p(y_t; γ) (Condition 4) and the approximated individual likelihood p_j(y_t; γ) (Condition 5):

CONDITION 4: (i) log p(y_t; γ) is twice continuously differentiable; (ii) there exists some d(y_t) with E[d(y_t)] < ∞ such that |log p(y_t; γ)| ≤ d(y_t), |∇_γ log p(y_t; γ)| ≤ d(y_t), and |∇_γγ log p(y_t; γ)| ≤ d(y_t) for all γ ∈ Γ.

CONDITION 5: (i) log p_j(y_t; γ) is twice continuously differentiable; (ii) there exists some d(y_t) with E[d(y_t)] < ∞ such that |log p_j(y_t; γ)| ≤ d(y_t), |∇_γ log p_j(y_t; γ)| ≤ d(y_t), and |∇_γγ log p_j(y_t; γ)| ≤ d(y_t) for all γ ∈ Γ; (iii) γ_j uniquely maximizes Q_j(γ) ≡ E[log p_j(y_t; γ)].

Note that part (iii) of Condition 5 is not guaranteed by NM (Lemma 2.2) because p_j(y_t; γ) is not the true likelihood of the data y_t. However, given this additional identification condition, NM (Theorem 2.1) implies that the PMLE γ̂_j converges to γ_j in probability:

THEOREM 1: Suppose that Condition 5 is satisfied. Then γ̂_j converges to γ_j in probability as T → ∞ while j is fixed.

The proofs of all theorems are given in the Supplemental Material (Ackerberg, Geweke, and Hahn (2009)). We next relate the difference between γ_j and γ_0 to the approximation error of p_j:

THEOREM 2: Suppose that Conditions 1–5 are satisfied. Then there exists ζ > 0 such that |γ_j − γ_0| ≤ ζ · Δ_j.

The constant ζ in Theorem 2 does not depend on the sample size T. Hence, Theorem 2 states that the difference between the true parameter and the probability limit of the PMLE using the approximated likelihood is bounded by a term of the same magnitude as the approximation error Δ_j. As a result, we conclude that second order approximation errors in the model have at most second order effects on inference regarding parameters.
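Theorem 2 can be visualized with a small Monte Carlo exercise. The sketch below is ours and again relies on the assumed shifted-mean Gaussian example (with arbitrary σ, γ_0, and a sequence of shrinking shifts standing in for Δ_j → 0): it approximates the pseudo-true value γ_j by maximizing a large-sample average log likelihood on a grid and checks that |γ_j − γ_0| shrinks in proportion to the approximation error, as the existence of a fixed ζ requires.

```python
# Monte Carlo sketch of Theorem 2 -- our illustration, not the authors' code.
# Assumed example: exact model y_t = gamma0 + v_t, v_t ~ N(0, sigma^2); the
# approximated model shifts the mean by delta, so its error shrinks with delta.
import numpy as np

sigma, gamma0 = 1.0, 0.3
rng = np.random.default_rng(2)
y = gamma0 + sigma * rng.standard_normal(100_000)   # large sample stands in for E[.]
gamma_grid = np.linspace(0.0, 0.6, 1201)

def avg_loglik_j(y, gamma, shift):
    """Average approximated log likelihood at gamma (mean shifted by `shift`)."""
    e = y - gamma - shift
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - e**2 / (2 * sigma**2))

for delta in (0.2, 0.1, 0.05, 0.025):
    values = [avg_loglik_j(y, g, delta) for g in gamma_grid]
    gamma_j = gamma_grid[int(np.argmax(values))]     # pseudo-true value gamma_j
    # In this example the approximation error is proportional to delta over any
    # bounded range of y_t, so the ratio below plays the role of zeta in Theorem 2.
    print(f"delta={delta:5.3f}  |gamma_j - gamma0| = {abs(gamma_j - gamma0):.3f}  "
          f"ratio to delta = {abs(gamma_j - gamma0) / delta:.2f}")
```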
5. ASYMPTOTIC RESULTS WITH APPROXIMATION ERROR

Last, we explicitly consider the effects of approximation errors on standard asymptotic approximations. For this purpose, we assume that the index j is a function j(T) of the sample size T such that j(T) → ∞ as T → ∞. It is fairly obvious that one will need the approximation error to disappear asymptotically in order to obtain consistent estimates. What might be less obvious is the rate at which the approximation error needs to disappear, both for consistency and for standard asymptotic approximations to be valid. The results in this section are based on very standard arguments that have also been used in the simulated maximum likelihood literature (Gouriéroux and Monfort (1991), Hajivassiliou and Ruud (1994)); the proofs are in the Supplemental Material.
Our first result regards the consistency of the PMLE γ̂_{j(T)}.

THEOREM 3: Suppose that Conditions 1–5 are satisfied. Then γ̂_{j(T)} = γ_0 + o_p(1) as T → ∞ and Δ_{j(T)} → 0.

This result states that γ̂_{j(T)} is a consistent estimate of γ_0 regardless of the rate at which Δ_{j(T)} converges to zero (as T → ∞). Intuitively, the approximated likelihood converges to the exact likelihood (since Δ_{j(T)} → 0), and the maximum of the exact likelihood converges to γ_0 (since T → ∞), providing the result.
Our next result considers the asymptotic distribution of √T (γ̂_{j(T)} − γ_0) as T → ∞ and Δ_{j(T)} → 0.

THEOREM 4: Suppose that Conditions 1–5 are satisfied. Then

√T (γ̂_{j(T)} − γ_0) ⇒ N(0, I^{−1})

if √T Δ_{j(T)} → 0 as T → ∞. Here, I = −Q_0''(γ_0) denotes the Fisher information.

To appreciate Theorem 4, note that standard arguments imply that the MLE that maximizes the exact joint likelihood, that is,

γ̂_0 = arg max_γ (1/T) Σ_{t=1}^{T} log p(y_t; γ),

also has asymptotic distribution

(7)   √T (γ̂_0 − γ_0) ⇒ N(0, I^{−1}).

A comparison of (7) to Theorem 4 implies that, as long as Δ_{j(T)} → 0 at a rate faster than √T, approximation error does not affect the asymptotic distribution of the estimator. Intuitively, approximation error introduces a bias term of order Δ_{j(T)} into the PMLE. If this bias term disappears at a rate slower than √T, it dominates the asymptotic distribution. If the term disappears faster than √T, it vanishes from the asymptotic distribution. Again, this is very reminiscent of the “bias term” that arises in simulated maximum likelihood estimation. That bias term (which is inversely proportional to the number of simulation draws) also needs to disappear at a rate faster than √T in order not to affect the asymptotic distribution.
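The role of the rate condition √T Δ_{j(T)} → 0 can also be seen in a short simulation. The sketch below is ours, built on the same assumed shifted-mean Gaussian example with the approximation error set by construction to δ_T = T^(−a): a > 1/2 corresponds to error vanishing faster than √T and a < 1/2 to error vanishing more slowly. In this example the Fisher information is I = 1/σ², so when the rate condition holds, √T (γ̂_{j(T)} − γ_0) should be approximately N(0, σ²).

```python
# Simulation sketch of the rate condition in Theorem 4 -- ours, same assumed example.
# With the approximated model's mean shifted by delta_T = T**(-a), the PMLE is
# ybar - delta_T, so sqrt(T)*(PMLE - gamma0) carries a deterministic bias of
# -sqrt(T)*delta_T, which vanishes when a > 1/2 and diverges when a < 1/2.
import numpy as np

rng = np.random.default_rng(3)
gamma0, sigma, reps = 0.0, 1.0, 1000

for a in (1.0, 0.25):                      # error vanishing faster / slower than sqrt(T)
    print(f"delta_T = T^(-{a}):")
    for T in (100, 1000, 10000):
        delta_T = T ** (-a)
        samples = gamma0 + sigma * rng.standard_normal((reps, T))
        pmle = samples.mean(axis=1) - delta_T
        z = np.sqrt(T) * (pmle - gamma0)
        print(f"  T={T:6d}  mean of sqrt(T)*(PMLE - gamma0) = {z.mean():+.3f}, "
              f"std = {z.std():.3f}   (limit under Theorem 4: mean 0, std {sigma:.1f})")
# When a = 1 the scaled bias -sqrt(T)*delta_T = -T**(-1/2) drifts to zero; when
# a = 0.25 it equals -T**(1/4) and dominates, so the normal approximation fails.
```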
Last, we note that under our assumptions it is straightforward to show that Q_j''(γ̂_{j(T)}) is a consistent estimate of Q_0''(γ_0). In other words, we can consistently estimate I using the approximated model. Thus, our results imply that as long as the approximation error disappears at a rate faster than √T, it can be ignored for purposes of forming asymptotically valid confidence intervals and hypothesis tests.

REFERENCES

ACKERBERG, D., J. GEWEKE, AND J. HAHN (2009): “Supplement to ‘Comments on “Convergence Properties of the Likelihood of Computed Dynamic Models”’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/7669_proofs.pdf. [2012,2015]
FERNÁNDEZ-VILLAVERDE, J., J. F. RUBIO-RAMÍREZ, AND M. S. SANTOS (2005): “Convergence Properties of the Likelihood of Computed Dynamic Models,” Technical Working Paper 315, NBER. [2010]
——— (2006): “Convergence Properties of the Likelihood of Computed Dynamic Models,” Econometrica, 74, 93–119. [2009]
GEWEKE, J. (2007): “Convergence Properties of the Likelihood of Computed Dynamic Models: Comment,” Unpublished Manuscript, University of Iowa, available at http://www.biz.uiowa.edu/faculty/jgeweke/papers/discussionA/comment.pdf. [2010]
GOURIÉROUX, C., AND A. MONFORT (1991): “Simulation Based Inference in Models With Heterogeneity,” Annales d’Économie et de Statistique, 20/21, 69–107. [2010,2016]
HAJIVASSILIOU, V. A., AND P. A. RUUD (1994): “Classical Estimation Methods for LDV Models Using Simulation,” in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam: Elsevier. [2010,2016]
NEWEY, W. K., AND D. MCFADDEN (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam: Elsevier. [2014]
Dept. of Economics, University of California Los Angeles, 8283 Bunche Hall, Los Angeles, CA 90095, U.S.A.; [email protected], University of Technology Sydney, City Campus, P.O. Box 123, Broadway, Sydney, NSW 2007, Australia and University of Colorado, Boulder, CO 80309, U.S.A.; [email protected], and Dept. of Economics, University of California Los Angeles, 8283 Bunche Hall, Los Angeles, CA 90095, U.S.A.; [email protected]. Manuscript received January, 2008; final revision received July, 2008.
Econometrica, Vol. 77, No. 6 (November, 2009), 2019–2021
ANNOUNCEMENTS

2010 NORTH AMERICAN WINTER MEETING
THE 2010 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Atlanta, GA, from January 3 to 5, 2010, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. The program committee will be chaired by Dirk Bergemann of Yale University.
This year, the program of the Winter Meetings contains five semi-plenary sessions, scheduled for Sunday, January 3, 2:30 pm, and Monday, January 4, 10:15 am. No regular sessions will be scheduled during the semi-plenary sessions. The program for the semi-plenary sessions is as follows:

January 3, 2:30 pm:

Advances in Econometrics
Presiding: Lars Hansen, University of Chicago
Gary Chamberlain, Harvard University—Bayesian Aspects of Treatment Choice
Peter M. Robinson, London School of Economics—Nonparametric Trending Regression With Cross-Sectional Dependence
Discussants: Edward J. Vytlacil, Yale University; Mark W. Watson, Princeton University

The Economics of the Financial Crises
Presiding: Gary Gorton, Yale University
Markus K. Brunnermeier, Princeton University—Macroeconomics With a Financial Sector
Darrell Duffie, Stanford University—The Failure Mechanics of Dealer Banks
Discussant: John Geanakoplos, Yale University

Labor Markets, Search and Human Capital
Presiding: Daron Acemoglu, Massachusetts Institute of Technology
Steven Davis, University of Chicago—Labor Market Search: New Evidence and Unresolved Issues
Robert Shimer, University of Chicago—Human Capital in the Theory of Unemployment
Discussant: Giuseppe Moscarini, Yale University

January 4, 10:15 am:

Applications of Nonlinear Filtering Methods in Econometrics (joint with the Journal of Business & Economic Statistics)
Presiding: Serena Ng, Columbia University
Lars Peter Hansen, University of Chicago—Applications of Nonlinear Filtering Methods in Econometrics
Discussants: Jesus Fernandez-Villaverde, University of Pennsylvania; Eric Michel Renault, University of North Carolina; Jean-Francois Richard, University of Pittsburgh; David Dejong, University of Pittsburgh

Auction and Mechanism Design
Presiding: Andrew Postlewaite, University of Pennsylvania
Stephen Edward Morris, Princeton University—Robust Mechanism Design
Paul Klemperer, University of Oxford—Auctions for Public Policy: Central-Bank Liquidity Provision, Airport Landing-Slot Allocation, and “Toxic Asset” Purchases
Discussant: Rakesh Vohra, Northwestern University

Program Committee:
Dirk Bergemann, Yale University, Chair
Marco Battaglini, Princeton University (Political Economy)
Roland Benabou, Princeton University (Behavioral Economics)
Markus Brunnermeier, Princeton University (Financial Economics)
Xiaohong Chen, Yale University (Theoretical Econometrics, Time Series)
Liran Einav, Stanford University (Industrial Organization)
Luis Garicano, University of Chicago (Organization, Law and Economics)
John Geanakoplos, Yale University (General Equilibrium Theory, Mathematical Economics)
Mike Golosov, MIT (Macroeconomics)
Pierre Olivier Gourinchas, University of California (International Finance)
Igal Hendel, Northwestern (Empirical Microeconomics)
Johannes Hoerner, Yale University (Game Theory)
Han Hong, Stanford University (Applied Econometrics)
Wojciech Kopczuk, Columbia University (Public Economics)
Martin Lettau, University of California, Berkeley (Finance)
Enrico Moretti, University of California, Berkeley (Labor)
Muriel Niederle, Stanford University (Experimental Game Theory, Market Design)
Luigi Pistaferri, Stanford University (Labor)
Esteban Rossi-Hansberg, Princeton University (International Trade)
Marciano Siniscalchi, Northwestern University (Decision Theory)
Robert Townsend, Massachusetts Institute of Technology (Development Economics)
Aleh Tsyvinski, Yale University (Macroeconomics, Public Finance)
Harald Uhlig, University of Chicago (Macroeconomics, Computational Finance)
Rakesh Vohra, Northwestern University (Auction, Mechanism Design)
2010 WORLD CONGRESS OF THE ECONOMETRIC SOCIETY
THE TENTH WORLD CONGRESS of the Econometric Society will be held in Shanghai from August 17th to August 21st, 2010. It is hosted by Shanghai Jiao Tong University in cooperation with Shanghai University of Finance and Economics, Fudan University, China Europe International Business School, and the Chinese Association of Quantitative Economics.
The congress is open to all economists, including those who are not now members of the Econometric Society. It is hoped that papers presented at the Congress will represent a broad spectrum of applied and theoretical economics and econometrics.
The Program Co-Chairs are:
Professor Daron Acemoglu, MIT Department of Economics, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.
Professor Manuel Arellano, CEMFI, Casado del Alisal 5, 28014 Madrid, Spain.
Professor Eddie Dekel, Department of Economics, Northwestern University, 2003 Sheridan Rd., Evanston, IL 60208-2600, U.S.A., and Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel.
Submissions will open on November 1st, 2009 and will be accepted only in electronic form at www.eswc2010.com. The deadline for submissions is January 30th, 2010. Financial assistance for young scholars will be allocated once the decisions on submitted papers have been made. At least one co-author must be a member of the Society or must join prior to submission. This can be done electronically at www.econometricsociety.org.
The Chair of the Local Organizing Committee is:
Professor Lin Zhou, Department of Economics, Shanghai Jiao Tong University, Shanghai 200052, China, and Department of Economics, Arizona State University, Tempe, AZ 85287, U.S.A.
Detailed information on registration and housing will be sent by email to all members of the Econometric Society in due course and will be available at www.eswc2010.com.
Econometrica, Vol. 77, No. 6 (November, 2009), 2023
FORTHCOMING PAPERS

THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica.

AHN, DAVID S., AND HALUK ERGIN: “Framing Contingencies.”
BAI, YAN, AND JING ZHANG: “Solving the Feldstein–Horioka Puzzle With Financial Frictions.”
BAJARI, PATRICK, HAN HONG, AND STEPHEN P. RYAN: “Identification and Estimation of a Discrete Game of Complete Information.”
BEARE, BRENDAN: “Copulas and Temporal Dependence.”
BESANKO, DAVID, ULRICH DORASZELSKI, YAROSLAV KRYUKOV, AND MARK SATTERTHWAITE: “Learning-by-Doing, Organizational Forgetting, and Industry Dynamics.”
BESLEY, TIMOTHY, AND TORSTEN PERSSON: “State Capacity, Conflict and Development.”
BIAIS, BRUNO, THOMAS MARIOTTI, JEAN-CHARLES ROCHET, AND STÉPHANE VILLENEUVE: “Large Risks, Limited Liability and Dynamic Moral Hazard.”
CHAMBERLAIN, GARY: “Binary Response Models for Panel Data: Identification and Information.”
GILBOA, ITZHAK, FABIO MACCHERONI, MASSIMO MARINACCI, AND DAVID SCHMEIDLER: “Objective and Subjective Rationality in a Multiple Prior Model.”
KOJIMA, FUHITO, AND MIHAI MANEA: “Axioms for Deferred Acceptance.”
RAHMAN, DAVID, AND ICHIRO OBARA: “Mediated Partnerships.”
ROMANO, JOSEPH P., AND AZEEM M. SHAIKH: “Inference for the Identified Set in Partially Identified Econometric Models.”
STOVALL, JOHN E.: “Multiple Temptations.”
YAMASHITA, TAKURO: “Mechanism Games With Multiple Principals and Three or More Agents.”
Econometrica, Vol. 77, No. 6 (November, 2009), 2025–2027
THE ECONOMETRIC SOCIETY ANNUAL REPORTS, 2008

REPORT OF THE PRESIDENT
1. THE SOCIETY AND ITS EXISTING PUBLICATIONS

THE SOCIETY IS THE LEADING LEARNED SOCIETY for economists in the world, and its worldwide coverage is becoming even more important as economic science expands in regions outside its traditional strongholds in North America and Europe. It has been both a privilege and a pleasure to serve as the Society's President in the year of 2008.
One of the Society's major activities is to publish high-quality research so as to further the objective of supporting economics with a quantitative-theoretical and quantitative-empirical orientation. The flagship journal Econometrica continues to be very well managed thanks to the devoted work of the Editor, Stephen Morris, assisted in the Princeton editorial office by Mary Beth Bellando, and to the efforts of the six Co-Editors and the many Associate Editors. The Society's Monograph Series was effectively edited by Andrew Chesher and George Mailath. In 2008, the Society also took important steps towards adding two new journals to its publications—more about this in Section 4 below.
2. NOMINATING COMMITTEES

As required in the constitution and by-laws, the annual elections of Officers, Council, and Fellows are preceded by work in three independent Nominating Committees. In 2008, I appointed the following committees:
OFFICERS: Richard Blundell, Chair; Avinash Dixit; Lars Peter Hansen; Elhanan Helpman; John Moore; Roger Myerson; Torsten Persson; Jean Tirole.
COUNCIL: Lars Peter Hansen, Chair; Susan Athey; Timothy Besley; Takatoshi Ito; Stephen Morris; Adrian Pagan; Jörgen Weibull.
FELLOWS: Matthew Jackson, Chair; Orazio Attanasio; Gabrielle Demange; Pinelopi Goldberg; Søren Johansen; Nancy Stokey.
I thank all the committee members for undertaking this important work for the Society.
3. THE REGIONS

Aside from its central activities, the Society also operates in six regions, mainly by organizing scientific meetings. During the year, I attended the following regional meetings:
North American Winter Meeting, New Orleans, Louisiana, January 4–6, 2008
North American Summer Meeting, Pittsburgh, Pennsylvania, June 18–21, 2008
Australasian Meeting, Wellington, New Zealand, July 9–11, 2008
Far Eastern and South Asian Meeting, Singapore, July 16–18, 2008
European Meeting, Milan, Italy, August 27–31, 2008
Latin American Meeting, Rio de Janeiro, Brazil, November 20–22, 2008
It was a true pleasure to take part in these gatherings, which I uniformly found scientifically exciting, impeccably organized, and well attended. I would like to thank all the members of program committees and local organizing committees, whose work was essential to make the meetings such a success. It was particularly gratifying to see the high quality and the general excitement in the meetings in Asia and Latin America. These are also the regions with the fastest growing number of members, a growth that brought the worldwide membership of the Society close to 6000 by the end of 2008. During the year, the Executive Committee initiated a discussion about creating a separate African region of the Society, based on a proposal from the African Econometric Society (AES)—an independent organization that has organized a set of meetings for some ten years. An ad-hoc committee (Timothy Besley, Chair; Yaw Nyarko; Rafael Repullo; and Chris Udry) was set up with the dual purposes of advising the Executive Committee as well as the AES along the way to the prospective formation of an African region.
4. NEW SOCIETY JOURNALS

In 2007, the Blundell–Hansen–Persson committee proposed to create two new Society journals: one with a focus on economic theory and its applications, another with a focus on quantitative methods and applications, both broadly defined. Following a go-ahead decision by the Executive Committee, my work during the year was directed towards working out a concrete proposal that could be posed to the Council and the Fellowship.
As for the theory journal, the working hypothesis was that we might be able to reach an agreement with the Society for Economic Theory to arrange an adoption by the Econometric Society of the existing journal Theoretical Economics (TE). In the fall of 2008, Roger Myerson and I were indeed able to come to a conditional agreement with that society about the terms of such an adoption, including governance, editorial board, time plan, etc. According to this agreement, Martin Osborne would continue as the Editor of TE, with partial extensions and replacements of the editorial board to help publish a larger number of applied theory papers, and the first issues under the Society's ownership would appear in 2010.
As for the quantitative journal, named Quantitative Economics (QE), no similar adoption agreement was foreseen. Lars Hansen and I therefore conducted a search for an editorial board that would be prepared to start up the new journal, also to appear in 2010. Eventually, we came to an agreement with Orazio Attanasio to serve as Editor of QE, and with Steven Durlauf, Victor Rios-Rull, and Elie Tamer to serve as Co-Editors, conditional on a positive decision.
In parallel with these discussions, Rafael Repullo and I were working on a business plan for the new journals. By the end of the year, a concrete proposal was ready to take to the Council and the Fellows for a vote. In early 2009, a large majority of Council members and Fellows supported the proposal to create two new open-access Society journals, TE and QE, with the first issues to appear in 2010.
5. CLOSING REMARKS

During my time in the Society's Executive Committee, I have been fortunate enough to collaborate with a group of extraordinary people. In particular, it has been a privilege
to work together with Past Presidents Richard Blundell and Lars Hansen, as well as my successors Roger Myerson, John Moore and Bengt Holmström. Finally, the Society would not be able to function without the devoted and skillful work by its Executive Vice-President, Rafael Repullo, and General Manager, Claire Sashi. I am grateful to both of them for making my job as President so much easier.

Torsten Persson
PRESIDENT IN 2008