CONTENTS

DAVID BESANKO, ULRICH DORASZELSKI, YAROSLAV KRYUKOV, AND MARK SATTERTHWAITE: Learning-by-Doing, Organizational Forgetting, and Industry Dynamics . . . 453
FRANCISCO M. GONZALEZ AND SHOUYONG SHI: An Equilibrium Theory of Learning, Search, and Wages . . . 509
JAN EECKHOUT AND PHILIPP KIRCHER: Sorting and Decentralized Price Competition . . . 539
ANDREW CHESHER: Instrumental Variable Models for Discrete Outcomes . . . 575
YAN BAI AND JING ZHANG: Solving the Feldstein–Horioka Puzzle With Financial Frictions . . . 603
FUHITO KOJIMA AND MIHAI MANEA: Axioms for Deferred Acceptance . . . 633
DAVID S. AHN AND HALUK ERGIN: Framing Contingencies . . . 655

NOTES AND COMMENTS:
GUIDO KUERSTEINER AND RYO OKUI: Constructing Optimal Instruments by First-Stage Prediction Averaging . . . 697
FRANCESCO BARTOLUCCI AND VALENTINA NIGRO: A Dynamic Model for Binary Panel Data With Unobserved Heterogeneity Admitting a √n-Consistent Conditional Estimator . . . 719
FEDERICO A. BUGNI: Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Identified Set . . . 735
ITZHAK GILBOA, FABIO MACCHERONI, MASSIMO MARINACCI, AND DAVID SCHMEIDLER: Objective and Subjective Rationality in a Multiple Prior Model . . . 755
DIRK BERGEMANN AND JUUSO VÄLIMÄKI: The Dynamic Pivot Mechanism . . . 771
TAKURO YAMASHITA: Mechanism Games With Multiple Principals and Three or More Agents . . . 791
VIKTOR WINSCHEL AND MARKUS KRÄTZIG: Solving, Estimating, and Selecting Nonlinear Dynamic Models Without the Curse of Dimensionality . . . 803
YUICHIRO KAMADA: Strongly Consistent Self-Confirming Equilibrium . . . 823
MARTIN PESENDORFER AND PHILIPP SCHMIDT-DENGLER: Sequential Estimation of Dynamic Discrete Games: A Comment . . . 833

ANNOUNCEMENTS . . . 843
FORTHCOMING PAPERS . . . 845

VOL. 78, NO. 2 — March, 2010
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

EDITOR
STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]

MANAGING EDITOR
GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected]

CO-EDITORS
DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.; [email protected]
WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.; [email protected]
JEAN-MARC ROBIN, Maison des Sciences Economiques, Université Paris 1 Panthéon–Sorbonne, 106/112 bd de l'Hôpital, 75647 Paris Cedex 13, France and University College London, U.K.; [email protected]
LARRY SAMUELSON, Dept. of Economics, Yale University, 20 Hillhouse Avenue, New Haven, CT 06520-8281, U.S.A.; [email protected]
JAMES H. STOCK, Dept. of Economics, Harvard University, Littauer M-24, 1830 Cambridge Street, Cambridge, MA 02138, U.S.A.; [email protected]
HARALD UHLIG, Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.; [email protected]

ASSOCIATE EDITORS
YACINE AÏT-SAHALIA, Princeton University; JOSEPH G. ALTONJI, Yale University; JAMES ANDREONI, University of California, San Diego; JUSHAN BAI, Columbia University; MARCO BATTAGLINI, Princeton University; PIERPAOLO BATTIGALLI, Università Bocconi; DIRK BERGEMANN, Yale University; XIAOHONG CHEN, Yale University; VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology; J. DARRELL DUFFIE, Stanford University; JEFFREY ELY, Northwestern University; HALUK ERGIN, Washington University in St. Louis; MIKHAIL GOLOSOV, Yale University; FARUK GUL, Princeton University; JINYONG HAHN, University of California, Los Angeles; PHILIP A. HAILE, Yale University; MICHAEL JANSSON, University of California, Berkeley; PHILIPPE JEHIEL, Paris School of Economics and University College London; PER KRUSELL, Princeton University and Stockholm University; FELIX KUBLER, University of Zurich; OLIVER LINTON, London School of Economics; BART LIPMAN, Boston University; THIERRY MAGNAC, Toulouse School of Economics (GREMAQ and IDEI); GEORGE J. MAILATH, University of Pennsylvania; DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse; STEVEN A. MATTHEWS, University of Pennsylvania; ROSA L. MATZKIN, University of California, Los Angeles; LEE OHANIAN, University of California, Los Angeles; WOJCIECH OLSZEWSKI, Northwestern University; NICOLA PERSICO, New York University; BENJAMIN POLAK, Yale University; PHILIP J. RENY, University of Chicago; SUSANNE M. SCHENNACH, University of Chicago; UZI SEGAL, Boston College; NEIL SHEPHARD, University of Oxford; MARCIANO SINISCALCHI, Northwestern University; JEROEN M. SWINKELS, Northwestern University; ELIE TAMER, Northwestern University; EDWARD J. VYTLACIL, Yale University; IVÁN WERNING, Massachusetts Institute of Technology; ASHER WOLINSKY, Northwestern University

EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.; [email protected]

Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
Econometrica, Vol. 78, No. 2 (March, 2010), 453–508
LEARNING-BY-DOING, ORGANIZATIONAL FORGETTING, AND INDUSTRY DYNAMICS

BY DAVID BESANKO, ULRICH DORASZELSKI, YAROSLAV KRYUKOV, AND MARK SATTERTHWAITE1

Learning-by-doing and organizational forgetting are empirically important in a variety of industrial settings. This paper provides a general model of dynamic competition that accounts for these fundamentals and shows how they shape industry structure and dynamics. We show that forgetting does not simply negate learning. Rather, they are distinct economic forces that interact in subtle ways to produce a great variety of pricing behaviors and industry dynamics. In particular, a model with learning and forgetting can give rise to aggressive pricing behavior, varying degrees of long-run industry concentration ranging from moderate leadership to absolute dominance, and multiple equilibria.

KEYWORDS: Dynamic stochastic games, Markov-perfect equilibrium, learning-by-doing, organizational forgetting, industry dynamics, multiple equilibria.
1. INTRODUCTION

EMPIRICAL STUDIES PROVIDE ample evidence that the marginal cost of production decreases with cumulative experience in a variety of industrial settings. This fall in marginal cost is known as learning-by-doing. More recent empirical studies also suggest that organizations can forget the know-how gained through learning-by-doing due to labor turnover, periods of inactivity, and failure to institutionalize tacit knowledge.2 Organizational forgetting has been largely ignored in the theoretical literature. This is problematic because Benkard (2004) showed that organizational forgetting is essential to explain the dynamics in the market for wide-bodied airframes in the 1970s and 1980s.

1 We have greatly benefitted from the comments and suggestions of a co-editor and two anonymous referees. We are also indebted to Lanier Benkard, Luis Cabral, Jiawei Chen, Stefano Demichelis, Michaela Draganska, Ken Judd, Pedro Marin, Ariel Pakes, Michael Ryall, Karl Schmedders, Chris Shannon, Kenneth Simons, Scott Stern, Mike Whinston, and Huseyin Yildirim as well as the participants of various conferences. Guy Arie and Paul Grieco provided excellent research assistance. Besanko and Doraszelski gratefully acknowledge financial support from the National Science Foundation under Grant 0615615. Doraszelski further benefitted from the hospitality of the Hoover Institution during the academic year 2006–2007. Kryukov thanks the General Motors Center for Strategy in Management at Northwestern's Kellogg School of Management for support during this project. Satterthwaite acknowledges gratefully that this material is based on work supported by the National Science Foundation under Grant 0121541.
2 See Wright (1936), Hirsch (1952), DeJong (1957), Alchian (1963), Levy (1965), Kilbridge (1962), Hirschmann (1964), Preston and Keachie (1964), Baloff (1971), Dudley (1972), Zimmerman (1982), Lieberman (1984), Gruber (1992), Irwin and Klenow (1994), Jarmin (1994), Pisano (1994), Bohn (1995), Hatch and Mowery (1998), Thompson (2001), and Thornton and Thompson (2001) for empirical studies of learning-by-doing; see Argote, Beckman, and Epple (1990), Darr, Argote, and Epple (1995), Benkard (2000), Shafer, Nembhard, and Uzumeri (2001), and Thompson (2003) for organizational forgetting.
© 2010 The Econometric Society
DOI: 10.3982/ECTA6994
In this paper we build on the computational Markov-perfect equilibrium framework of Ericson and Pakes (1995) to analyze how the economic fundamentals of learning-by-doing and organizational forgetting interact to determine industry structure and dynamics.3 We add organizational forgetting to Cabral and Riordan's (1994) (C–R) seminal model of learning-by-doing.4 This seemingly small change has surprisingly large effects. Dynamic competition with learning and forgetting is akin to racing down an upward-moving escalator. As long as a firm makes sales sufficiently frequently so that the gain in know-how from learning outstrips the loss in know-how from forgetting, it moves down its learning curve and its marginal cost decreases. However, if sales slow down or come to a halt, perhaps because of its competitors' aggressive pricing, then the firm slides back up its learning curve and its marginal cost increases. This cannot happen in the C–R model. Due to this qualitative difference, organizational forgetting leads to a rich array of pricing behaviors and industry dynamics that the existing literature neither imagined nor explained.

It is often said that learning-by-doing promotes market dominance because it gives a more experienced firm the ability to profitably underprice its less experienced rival and therefore shut out the competition in the long run. As Dasgupta and Stiglitz (1988, p. 247) explained:

"firm-specific learning encourages the growth of industrial concentration. To be specific, one expects that strong learning possibilities, coupled with vigorous competition among rivals, ensures that history matters in the sense that if a given firm enjoys some initial advantages over its rivals it can, by undercutting them, capitalize on these advantages in such a way that the advantages accumulate over time, rendering rivals incapable of offering effective competition in the long run."
However, if organizational forgetting "undoes" learning-by-doing, then forgetting may be a procompetitive force and an antidote to market dominance through learning. Two reasons for suspecting this come to mind. First, to the extent that the leader has more to forget than the follower, forgetting should work to equalize differences between firms. Second, because forgetting makes improvements in competitive position from learning transitory, it should make firms reluctant to invest in the acquisition of know-how through price cuts. We reach the opposite conclusion: Organizational forgetting can make firms more aggressive rather than less aggressive. This aggressive pricing behavior, in turn, puts the industry on a path toward one firm leading—perhaps even dominating—the market.

3 Dynamic stochastic games and feedback strategies that map states into actions date back at least to Shapley (1953). Maskin and Tirole (2001) provided the fundamental theory that shows how many subgame-perfect equilibria of these games can be represented consistently and robustly as Markov-perfect equilibria.
4 Prior to the infinite-horizon price-setting model of C–R, the literature had studied learning-by-doing using finite-horizon quantity-setting models (Spence (1981), Fudenberg and Tirole (1983), Ghemawat and Spence (1985), Ross (1986), Dasgupta and Stiglitz (1988), Cabral and Riordan (1997)).
In the absence of organizational forgetting, the price that a firm sets reflects two goals. First, by winning a sale, the firm moves down its learning curve. This is the advantage-building motive. Second, the firm prevents its rival from moving down its learning curve. This is the advantage-defending motive. But in the presence of organizational forgetting, bidirectional movements through the state space are possible, and this opens up new strategic possibilities for building and defending advantage. By winning a sale, the firm ensures that it does not slide back up its learning curve even if it forgets. At the same time, by denying its rival a sale, the firm sets up the possibility that its rival will move back up its learning curve if it forgets. Because organizational forgetting reinforces the advantage-building and advantage-defending motives in this way, it can create strong incentives to cut prices so as to win a sale. Organizational forgetting is thus a source of aggressive pricing behavior. While the existing literature has mainly focused on the dominance properties of firms’ pricing behaviors, we find that these properties are neither necessary nor sufficient for market dominance in our more general setting. We therefore go beyond the existing literature and directly examine the industry dynamics that firms’ pricing behaviors imply. We find that organizational forgetting is a source of—not an antidote to—market dominance in the long run. If forgetting is sufficiently weak, then asymmetries may arise but cannot persist. As in the C–R model, learning-by-doing operates as a ratchet: firms inexorably—if at different rates—move toward the bottom of their learning curves where cost parity obtains. If forgetting is sufficiently strong, then asymmetries cannot arise in the first place because forgetting stifles investment in learning altogether. For intermediate degrees of forgetting, asymmetries arise and persist. Even extreme asymmetries akin to near monopoly are possible. This is because, in the presence of organizational forgetting, the leader can use price cuts to delay or even stall the follower in moving down its learning curve. Organizational forgetting is not the only source of long-run market dominance. As C–R showed in their discussion of predatory pricing and as we also demonstrate in Section 8, a model of learning-by-doing that incorporates shutout elements, such as entry and exit or a choke price, can lead to long-run market dominance, much in the way Dasgupta and Stiglitz (1988) describe. Nevertheless, we exclude shut-out elements from our basic model for two reasons. First, the interaction between learning and forgetting is subtle and generates an enormous variety of interesting, even surprising, equilibria. Isolating it is, therefore, useful theoretically. Second, from an empirical viewpoint, as with Intel and AMD, we may see an apparently stable hierarchy of firms with differing market shares, costs, and profits. Our model can generate such an outcome, while shut-out model elements favor more extreme outcomes in which either one firm dominates or all firms compete on equal footing. Organizational forgetting is also a source of multiple equilibria. If the inflow of know-how into the industry due to learning is substantially smaller than the outflow of know-how due to forgetting, then it is virtually impossible that both
firms reach the bottom of their learning curves. Conversely, if the inflow is substantially greater than the outflow, then it is virtually inevitable that they do reach the bottom. In both cases, the primitives of the model tie down the equilibrium. This is no longer the case if the inflow roughly balances the outflow. If both firms believe that they cannot profitably coexist at the bottom of their learning curves, then both cut their prices in the hope of acquiring a competitive advantage early on and maintaining it throughout. However, if both firms believe that they can profitably coexist, then neither cuts its price, thereby ensuring that the anticipated symmetric industry structure emerges. Consequently, in addition to the degree of forgetting, the equilibrium by itself is an important determinant of pricing behavior and industry dynamics.

Our finding of multiplicity is important for two reasons. First, to our knowledge, all applications of Ericson and Pakes' (1995) framework have found a single equilibrium. Pakes and McGuire (1994, p. 570) (P–M) indeed held that "nonuniqueness does not seem to be a problem." It is, therefore, striking that we obtain up to nine equilibria for some parameterizations. Second, we show that multiple equilibria in our model arise from firms' expectations regarding the value of continued play. Being able to pinpoint the driving force behind multiple equilibria is a first step toward tackling the multiplicity problem that plagues the estimation of dynamic stochastic games and inhibits the use of counterfactuals in policy analysis.5

In sum, we show that learning-by-doing and organizational forgetting are distinct economic forces. Forgetting, in particular, does not simply negate learning. The unique role forgetting plays comes about because it enables bidirectional movements through the state space. Thus the interaction of learning and forgetting can give rise to aggressive pricing behavior, long-run industry concentration of varying degrees, and multiple equilibria.

We also make two methodological contributions. First, we point out a weakness of the P–M algorithm, the major tool for computing equilibria in the literature following Ericson and Pakes (1995). Specifically, we prove that our dynamic stochastic game has equilibria that the P–M algorithm cannot compute. Roughly speaking, in the presence of multiple equilibria, "in between" two equilibria that it can compute there is one equilibrium it cannot. This severely limits its ability to provide a complete picture of the set of solutions to the model. Second, we propose a homotopy or path-following algorithm. The algorithm traces out the equilibrium correspondence by varying the degree of forgetting. This allows us to compute equilibria that the P–M algorithm cannot compute. We find that the equilibrium correspondence contains a unique path that starts at the equilibrium of the C–R model. Whenever this path bends back on itself

5 See Ackerberg, Benkard, Berry, and Pakes (2007) and Pakes (2008) for a discussion of the issue.
and then forward again, there are multiple equilibria. In addition, the equilibrium correspondence may contain one or more loops that cause additional multiplicity. To our knowledge, our paper is the first to describe in detail the structure of the set of equilibria of a dynamic stochastic game in the tradition of Ericson and Pakes (1995).

The organization of the remainder of the paper is as follows. Sections 2 and 3 describe the model specification and our computational strategy. Section 4 provides an overview of the equilibrium correspondence. Section 5 analyzes industry dynamics and Section 6 characterizes the pricing behavior that drives it. Section 7 describes how organizational forgetting can lead to multiple equilibria. Section 8 undertakes a number of robustness checks. Section 9 summarizes and concludes. Throughout the paper, in presenting our findings, we distinguish between results, which are established numerically through a systematic exploration of a subset of the parameter space, and propositions, which hold true for the entire parameter space. If a proposition establishes a possibility through an example, then the example is presented adjacent to the proposition. If the proof of a proposition is deductive, then it is contained in the Appendix.

2. MODEL

For expositional clarity, we focus on the basic model of an industry with two firms and neither entry nor exit; the online Appendix (Besanko, Doraszelski, Kryukov, and Satterthwaite (2010)) outlines the general model. Our basic model is the C–R model with organizational forgetting added and, to allow for our computational approach, specific functional forms for demand and cost.

Firms and States

We consider a discrete-time, infinite-horizon dynamic stochastic game of complete information played by two firms. Firm n ∈ {1, 2} is described by its state e_n ∈ {1, …, M}. A firm's state indicates its cumulative experience or stock of know-how. By making a sale, a firm can add to its stock of know-how. Following C–R, we use a period just long enough for a firm to make a sale.6 As suggested by the empirical studies of Argote, Beckman, and Epple (1990), Darr, Argote, and Epple (1995), Benkard (2000), Shafer, Nembhard, and Uzumeri (2001), and Thompson (2003), we account for organizational forgetting. Accordingly, the evolution of firm n's stock of know-how is governed by the law of motion

$$e_n' = e_n + q_n - f_n,$$

where e_n' and e_n denote firm n's stock of know-how in the subsequent and current period, respectively, the random variable q_n ∈ {0, 1} indicates whether firm n makes a sale and gains a unit of know-how through learning-by-doing, and the random variable f_n ∈ {0, 1} indicates whether firm n loses a unit of know-how through organizational forgetting.

6 A sale may involve a single unit or a batch of units (e.g., 100 aircraft or 10,000 memory chips) that are sold to a single buyer.
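As a concrete illustration, the law of motion is easy to simulate. The following minimal sketch is ours, not the authors' code; it uses the forgetting probability Δ(e_n) specified below and mimics the boundary truncation described under State-to-State Transitions:

```python
import random

def step_knowhow(e, won_sale, delta, M=30):
    """One-period update e' = e + q - f, truncated to the state space
    {1, ..., M} as in the paper's boundary-modified transitions."""
    q = 1 if won_sale else 0                       # learning-by-doing
    prob_forget = 1 - (1 - delta) ** e             # Delta(e) = 1 - (1-delta)^e
    f = 1 if random.random() < prob_forget else 0  # organizational forgetting
    return min(max(e + q - f, 1), M)

# Example: a firm with e = 10 that wins the sale at forgetting rate 0.05.
print(step_knowhow(10, True, 0.05))
```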
At any point in time, the industry is characterized by a vector of firms' states e = (e_1, e_2) ∈ {1, …, M}². We refer to e as the state of the industry. We use e^[2] to denote the vector (e_2, e_1) constructed by interchanging the stocks of know-how of firms 1 and 2.

Learning-by-Doing

Firm n's marginal cost of production c(e_n) depends on its stock of know-how e_n through a learning curve

$$c(e_n) = \begin{cases} \kappa e_n^\eta & \text{if } 1 \le e_n < m, \\ \kappa m^\eta & \text{if } m \le e_n \le M, \end{cases}$$

where η = log₂ ρ for a progress ratio of ρ ∈ (0, 1]. Marginal cost decreases by 100(1 − ρ) percent as the stock of know-how doubles, so that a lower progress ratio implies a steeper learning curve. The marginal cost of production at the top of the learning curve, c(1), is κ > 0 and, as in C–R, m represents the stock of know-how at which a firm reaches the bottom of its learning curve.7

Organizational Forgetting

We let Δ(e_n) = Pr(f_n = 1) denote the probability that firm n loses a unit of know-how through organizational forgetting. We assume that this probability is nondecreasing in the firm's experience level. This has several advantages. First, experimental evidence in the management literature suggests that forgetting by individuals is an increasing function of the current stock of learned knowledge (Bailey (1989)). Second, a direct implication of Δ(·) increasing is that the expected stock of know-how in the absence of further learning is a decreasing convex function of time.8 This phenomenon, known in the psychology literature as Jost's second law, is consistent with experimental evidence on forgetting by individuals (Wixted and Ebbesen (1991)). Third, in the capital–stock model employed in empirical work on organizational forgetting, the amount of depreciation is assumed to be proportional to the stock of know-how. Hence, the additional know-how needed to counteract depreciation must increase with

7 While C–R take the state space to be infinite, that is, M = ∞ in our notation, they make the additional assumption that the price that a firm charges does not depend on how far it is beyond the bottom of its learning curve (C–R, p. 1119). This is tantamount to assuming, as we do, that the state space is finite.
8 See the online Appendix for a proof.
the stock of know-how. Our specification has this feature. However, unlike the capital–stock model, it is consistent with a discrete state space.9

The specific functional form we employ is

$$\Delta(e_n) = 1 - (1 - \delta)^{e_n},$$

where δ ∈ [0, 1] is the forgetting rate.10 If δ > 0, then Δ(e_n) is increasing and concave in e_n; δ = 0 corresponds to the absence of organizational forgetting, the special case C–R analyzed.11 Other functional forms are plausible, and we explore one of them in the online Appendix.

Demand

The industry draws its customers from a large pool of potential buyers. In each period, one buyer enters the market and purchases a good from one of the two firms. The utility that the buyer obtains by purchasing good n is v − p_n + ε_n, where p_n is the price of good n, v is a deterministic component of utility, and ε_n is a stochastic component that captures the idiosyncratic preference for good n of this period's buyer. Both ε_1 and ε_2 are unobservable to firms, and are independently and identically type 1 extreme value distributed with location parameter 0 and scale parameter σ > 0. The scale parameter governs the degree of horizontal product differentiation. As σ → 0, goods become homogeneous. The buyer purchases the good that gives it the highest utility. Given our distributional assumptions, the probability that firm n makes the sale is given by the logit specification

$$D_n(\mathbf{p}) = \Pr(q_n = 1) = \frac{\exp\left(\frac{v - p_n}{\sigma}\right)}{\sum_{k=1}^{2} \exp\left(\frac{v - p_k}{\sigma}\right)} = \frac{1}{1 + \exp\left(\frac{p_n - p_{-n}}{\sigma}\right)},$$

where p = (p_1, p_2) is the vector of prices and p_{-n} denotes the price the other firm charges. Demand effectively depends on differences in prices because we
9 See Benkard (2004) for an alternative approximation to the capital–stock model.
10 One way to motivate this functional form is to imagine that the stock of know-how is dispersed among a firm's workforce. In particular, assume that e_n is the number of skilled workers and that organizational forgetting is the result of labor turnover. Then, given a turnover rate of δ, Δ(e_n) is the probability that at least one of the e_n skilled workers leaves the firm.
11 In state e = (e_1, e_2) ∈ {m, m + 1, …, M}² where both firms have reached the bottom of their learning curves, if one firm adds a unit of know-how, moving the industry to either state e' = (e_1 + 1, e_2) or state e' = (e_1, e_2 + 1), then firms' marginal costs remain constant and equal. If there is learning but no forgetting (ρ < 1, δ = 0), then each pair of states e, e' ∈ {m, m + 1, …, M}² satisfies Maskin and Tirole's (2001, p. 204) criterion for belonging "to the same element of the Markov partition." If, however, there is both learning and forgetting (ρ < 1, δ > 0), then every state e ∈ {1, …, M}² is a distinct member of the Markov partition and is payoff-relevant.
assume, as do C–R, that the buyer always purchases from one of the two firms in the industry. In Section 8 we discuss the effects of including an outside good in the specification.

State-to-State Transitions

From one period to the next, a firm's stock of know-how moves up or down or remains constant depending on realized demand q_n ∈ {0, 1} and organizational forgetting f_n ∈ {0, 1}. The transition probabilities are

$$\Pr(e_n' \mid e_n, q_n) = \begin{cases} 1 - \Delta(e_n) & \text{if } e_n' = e_n + q_n, \\ \Delta(e_n) & \text{if } e_n' = e_n + q_n - 1, \end{cases}$$

where, at the upper and lower boundaries of the state space, we modify the transition probabilities to be Pr(M|M, 1) = 1 and Pr(1|1, 0) = 1, respectively. Note that a firm can increase its stock of know-how only if it makes a sale in the current period, an event that has probability D_n(p); otherwise, it runs the risk that its stock of know-how decreases.

Bellman Equation

We define V_n(e) to be the expected net present value of firm n's cash flows if the industry is currently in state e. The value function V_n : {1, …, M}² → [−V̂, V̂], where V̂ is a sufficiently large constant, is implicitly defined by the Bellman equation
$$V_n(e) = \max_{p_n} \; D_n(p_n, p_{-n}(e))\,(p_n - c(e_n)) + \beta \sum_{k=1}^{2} D_k(p_n, p_{-n}(e))\,\overline{V}_{nk}(e), \tag{1}$$
where p_{-n}(e) is the price charged by the other firm in state e, β ∈ (0, 1) is the discount factor, and V̄_{nk}(e) is the expectation of firm n's value function conditional on the buyer purchasing the good from firm k ∈ {1, 2} in state e as given by

$$\overline{V}_{n1}(e) = \sum_{e_1' = e_1}^{e_1 + 1} \; \sum_{e_2' = e_2 - 1}^{e_2} V_n(e') \Pr(e_1' \mid e_1, 1) \Pr(e_2' \mid e_2, 0), \tag{2}$$

$$\overline{V}_{n2}(e) = \sum_{e_1' = e_1 - 1}^{e_1} \; \sum_{e_2' = e_2}^{e_2 + 1} V_n(e') \Pr(e_1' \mid e_1, 0) \Pr(e_2' \mid e_2, 1). \tag{3}$$
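In code, the transition probabilities and the conditional expectations (2) and (3) are straightforward to evaluate. The sketch below is illustrative only (the function and variable names are ours, not the authors'); it assumes V_n is stored as an M×M array indexed from zero:

```python
import numpy as np

def trans_prob(e_next, e, q, delta, M=30):
    """Pr(e'|e, q): one unit is forgotten with probability Delta(e);
    transitions are modified at the boundaries so that Pr(M|M,1) = 1
    and Pr(1|1,0) = 1."""
    Delta = 1.0 - (1.0 - delta) ** e
    up, down = min(e + q, M), max(e + q - 1, 1)
    if up == down:
        return 1.0 if e_next == up else 0.0
    if e_next == up:
        return 1.0 - Delta
    if e_next == down:
        return Delta
    return 0.0

def vbar(V, e1, e2, k, delta, M=30):
    """Conditional expectation of firm 1's value function given that the
    buyer bought from firm k, as in equations (2) and (3)."""
    q1, q2 = (1, 0) if k == 1 else (0, 1)
    total = 0.0
    for e1p in range(max(e1 - 1, 1), min(e1 + 1, M) + 1):
        for e2p in range(max(e2 - 1, 1), min(e2 + 1, M) + 1):
            total += (V[e1p - 1, e2p - 1]
                      * trans_prob(e1p, e1, q1, delta, M)
                      * trans_prob(e2p, e2, q2, delta, M))
    return total

# Example with an arbitrary value function:
V = np.ones((30, 30))
print(vbar(V, 5, 7, 1, delta=0.05))  # equals 1.0 since V is constant
```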
The policy function p_n : {1, …, M}² → [−p̂, p̂], where p̂ is a sufficiently large constant, specifies the price p_n(e) that firm n sets in state e.12 Let h_n(e, p_n, p_{-n}(e), V_n) denote the maximand in the Bellman equation (1). Differentiating it with respect to p_n and using the properties of logit demand, we obtain the first-order condition (FOC)

$$0 = \frac{\partial h_n(\cdot)}{\partial p_n} = \frac{1}{\sigma} D_n(p_n, p_{-n}(e)) \bigl[\sigma - (p_n - c(e_n)) - \beta \overline{V}_{nn}(e) + h_n(\cdot)\bigr].$$

Differentiating h_n(·) a second time yields

$$\frac{\partial^2 h_n(\cdot)}{\partial p_n^2} = \frac{1}{\sigma}\bigl(2 D_n(p_n, p_{-n}(e)) - 1\bigr) \frac{\partial h_n(\cdot)}{\partial p_n} - \frac{1}{\sigma} D_n(p_n, p_{-n}(e)).$$

If the FOC is satisfied, then ∂²h_n(·)/∂p_n² = −(1/σ)D_n(p_n, p_{-n}(e)) < 0. h_n(·) is therefore strictly quasiconcave in p_n, so that the pricing decision p_n(e) is uniquely determined by the solution to the FOC (given p_{-n}(e)).

12 In what follows, we assume that p̂ is chosen large enough to not constrain pricing behavior.
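The logit derivative properties invoked here are standard; for completeness, a sketch of the calculation behind the FOC (our own rendering, not reproduced from the paper) is

```latex
\frac{\partial D_n}{\partial p_n} = -\frac{1}{\sigma} D_n (1 - D_n), \qquad
\frac{\partial D_{-n}}{\partial p_n} = \frac{1}{\sigma} D_n D_{-n},
\qquad 1 - D_n = D_{-n} \text{ (no outside good)},
```

so that, applied to $h_n = D_n\,(p_n - c(e_n)) + \beta D_n \overline V_{nn}(e) + \beta D_{-n} \overline V_{n,-n}(e)$,

```latex
\frac{\partial h_n}{\partial p_n}
= D_n - \tfrac{1}{\sigma} D_n (1 - D_n)(p_n - c(e_n))
  - \tfrac{\beta}{\sigma} D_n (1 - D_n)\,\overline V_{nn}(e)
  + \tfrac{\beta}{\sigma} D_n D_{-n}\,\overline V_{n,-n}(e)
= \tfrac{1}{\sigma} D_n \bigl[\sigma - (p_n - c(e_n)) - \beta \overline V_{nn}(e) + h_n(\cdot)\bigr].
```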
Equilibrium

In our model, firms face identical demand and cost primitives. Asymmetries between firms arise endogenously from the effects of their pricing decisions on realized demand and organizational forgetting. Hence, we focus attention on symmetric Markov-perfect equilibria (MPE). In a symmetric equilibrium, the pricing decision taken by firm 2 in state e is identical to the pricing decision taken by firm 1 in state e^[2], that is, p_2(e) = p_1(e^[2]), and similarly for the value function. It therefore suffices to determine the value and policy functions of firm 1. We define V(e) = V_1(e) and p(e) = p_1(e) for each state e. Further, we let V̄_k(e) = V̄_{1k}(e) denote the conditional expectation of firm 1's value function and let D_k(e) = D_k(p(e), p(e^[2])) denote the probability that the buyer purchases from firm k ∈ {1, 2} in state e. Given this notation, the Bellman equation and FOC can be expressed as

$$F_e^1(\mathbf{V}^*, \mathbf{p}^*) = -V^*(e) + D_1^*(e)\,(p^*(e) - c(e_1)) + \beta \sum_{k=1}^{2} D_k^*(e)\,\overline{V}_k^*(e) = 0, \tag{4}$$
$$F_e^2(\mathbf{V}^*, \mathbf{p}^*) = \sigma - (1 - D_1^*(e))\,(p^*(e) - c(e_1)) - \beta \overline{V}_1^*(e) + \beta \sum_{k=1}^{2} D_k^*(e)\,\overline{V}_k^*(e) = 0, \tag{5}$$

where we use asterisks to denote an equilibrium. The collection of equations (4) and (5) for all states e ∈ {1, …, M}² can be written more compactly as

$$F(\mathbf{V}^*, \mathbf{p}^*) = \begin{bmatrix} F_{(1,1)}^1(\mathbf{V}^*, \mathbf{p}^*) \\ F_{(2,1)}^1(\mathbf{V}^*, \mathbf{p}^*) \\ \vdots \\ F_{(M,M)}^2(\mathbf{V}^*, \mathbf{p}^*) \end{bmatrix} = \mathbf{0}, \tag{6}$$

where 0 is a (2M² × 1) vector of zeros. Any solution to this system of 2M² equations in 2M² unknowns V* = (V*(1,1), V*(2,1), …, V*(M,M)) and p* = (p*(1,1), p*(2,1), …, p*(M,M)) is a symmetric equilibrium in pure strategies. A slightly modified version of Proposition 2 in Doraszelski and Satterthwaite (2010) establishes that such an equilibrium always exists.

Baseline Parameterization

Since our focus is on how learning-by-doing and organizational forgetting affect pricing behavior and the industry dynamics this behavior implies, we explore the full range of values for the progress ratio ρ and the forgetting rate δ. To do so, we fix the remaining parameters to their baseline values given below. We specify a grid of 100 equidistant values of ρ ∈ (0, 1]. For each of them, we use the homotopy algorithm described in Section 3 to trace the equilibrium as δ ranges from 0 to 1. Typically this entails solving the model for a few thousand intermediate values of δ. If an important or interesting property is true for each of these systematically computed equilibria, then we report it as a result. In Section 8 we then vary the values of the parameters other than ρ and δ so as to discuss their influence on the equilibrium and demonstrate the robustness of our conclusions.

While we explore the full range of values for ρ and δ, we note that most empirical estimates of progress ratios are in the range of 0.7 to 0.95 (Dutton and Thomas (1984)). However, a very steep learning curve, with ρ much less than 0.7, may also capture a practically relevant situation. Suppose the first unit of a product is a hand-built prototype and the second unit is a guinea pig for organizing the production line. After this point, the gains from learning-by-doing are more or less exhausted and the marginal cost of production is close to
zero.13 Benkard (2000) and Argote, Beckman, and Epple (1990) found monthly rates of depreciation ranging from 4 to 25 percent of the stock of know-how. In the online Appendix, we show how to map these estimates that are based on a capital–stock model of organizational forgetting into our specification. The implied values of the forgetting rate δ fall below 0.1.

In our baseline parameterization we set M = 30 and m = 15. The marginal cost at the top of the learning curve κ is equal to 10. For a progress ratio of ρ = 0.85, this implies that the marginal cost of production declines from a maximum value of c(1) = 10 to a minimum value of c(15) = · · · = c(30) = 5.30. For ρ = 0.15, we have the case of a hand-built prototype where the marginal cost of production declines very quickly from c(1) = 10 to c(2) = 1.50 and c(3) = 0.49 to c(15) = · · · = c(30) = 0.01.

Turning to demand, we set σ = 1 in our baseline parameterization. To illustrate, in the Nash equilibrium of a static price-setting game (obtained by setting β = 0 in our model), the own-price elasticity of demand ranges between −8.86 in state (1, 15) and −2.13 in state (15, 1) for a progress ratio of ρ = 0.85. The cross-price elasticity of firm 1's demand with respect to firm 2's price is 2.41 in state (15, 1) and 7.84 in state (1, 15). For ρ = 0.15, the own-price elasticity ranges between −9.89 and −1.00, and the cross-price elasticity ranges between 1.00 and 8.05. These reasonable elasticities suggest that the results reported below are not artifacts of extreme parameterizations.

We finally set the discount factor to β = 1/1.05. It may be thought of as β = ζ/(1 + r), where r > 0 is the per-period discount rate and ζ ∈ (0, 1] is the exogenous probability that the industry survives from one period to the next. Consequently, our baseline parameterization corresponds to a variety of scenarios that differ in the length of a period. For example, it corresponds to a period length of 1 year, a yearly discount rate of 5 percent, and certain survival. Perhaps more interestingly, it also corresponds to a period length of 1 month, a monthly discount rate of 1 percent (which translates into a 12.68 percent annual discount rate), and a monthly survival probability of 0.96. To put this—our focal scenario—in perspective, technology companies such as IBM and Microsoft had costs of capital in the range of 11 to 15 percent per annum in the late 1990s. Furthermore, an industry with a monthly survival probability of 0.96 has an expected lifetime of 26.25 months. Thus this scenario is consistent with a pace of innovative activity that is expected to make the current generation of products obsolete within 2–3 years.
13 To avoid a marginal cost of close to zero, shift the cost function c(e_n) by τ > 0. While introducing a component of marginal cost that is unresponsive to learning-by-doing shifts the policy function by τ, the value function and the industry dynamics remain unchanged.
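The learning-curve values quoted above are easy to verify. A minimal check (our own sketch, with the baseline parameters hard-coded as defaults):

```python
import math

def marginal_cost(e, rho=0.85, kappa=10.0, m=15):
    """Learning curve c(e) = kappa * min(e, m)^eta with eta = log2(rho)."""
    eta = math.log2(rho)
    return kappa * min(e, m) ** eta

print(round(marginal_cost(15), 2))            # 5.3  (rho = 0.85)
print(round(marginal_cost(2, rho=0.15), 2))   # 1.5  (rho = 0.15)
print(round(marginal_cost(3, rho=0.15), 2))   # 0.49
```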
3. COMPUTATION

In this section we first describe a novel algorithm for computing equilibria of dynamic stochastic games that is based on the homotopy method.14 Then we turn to the P–M algorithm that is the standard means for computing equilibria in the literature following Ericson and Pakes (1995). We show that it is inadequate for characterizing the set of solutions to our model, although it remains useful for obtaining a starting point for the homotopy algorithm. A reader who is more interested in the economic implications of learning and forgetting may skip ahead to Section 4 after reading the first part of this section that introduces the homotopy algorithm by way of an example.

3.1. Homotopy Algorithm

Our goal is to explore the graph of the equilibrium correspondence as the forgetting rate δ and the progress ratio ρ vary:
$$\mathbf{F}^{-1} = \{(\mathbf{V}^*, \mathbf{p}^*, \delta, \rho) \mid F(\mathbf{V}^*, \mathbf{p}^*; \delta, \rho) = \mathbf{0},\; \delta \in [0, 1],\; \rho \in (0, 1]\}, \tag{7}$$
where F(·) is the system of equations (6) that defines an equilibrium, and we make explicit that it depends on δ and ρ (recall that we hold fixed the remaining parameters). The graph F⁻¹ is a surface, or set of surfaces, that may have folds. Our homotopy algorithm explores this graph by taking slices of it for given values of ρ:
$$\mathbf{F}^{-1}(\rho) = \{(\mathbf{V}^*, \mathbf{p}^*, \delta) \mid F(\mathbf{V}^*, \mathbf{p}^*; \delta, \rho) = \mathbf{0},\; \delta \in [0, 1]\}. \tag{8}$$
The homotopy algorithm starts from a single equilibrium that has already been computed and traces out an entire path of equilibria in F⁻¹(ρ) by varying δ. The homotopy algorithm is therefore also called a path-following algorithm and δ is the homotopy parameter.

EXAMPLE: An example helps explain how the homotopy algorithm works. Consider the equation F(x; δ) = 0, where
$$F(x; \delta) = -15.289 - \frac{\delta}{1 + \delta^4} + 67.500x - 96.923x^2 + 46.154x^3. \tag{9}$$
Equation (9) implicitly relates an endogenous variable x to an exogenous parameter δ. Figure 1 graphs the set of solutions F⁻¹ = {(x, δ) | F(x; δ) = 0, δ ∈ [0, 1]}. There are multiple solutions at δ = 0.3, namely x = 0.610, x = 0.707,
See Schmedders (1998, 1999) for an application of the homotopy method to general equilibrium models with incomplete asset markets and see Berry and Pakes (2007) for an application to estimating demand systems.
LEARNING, FORGETTING, AND INDUSTRY DYNAMICS
465
FIGURE 1.—Homotopy example.
and x = 0783. Finding them is trivial with the graph in hand, but even for this simple case, the graph is less than straightforward to draw. Whether one solves F(x; δ) = 0 for x taking δ as given or for δ taking x as given, the result is a correspondence, not a function. The homotopy method introduces an auxiliary variable s that indexes each point on the graph, starting at point A for s = 0 and ending at point D for s = s¯. The graph is then just the parametric path given by a pair of functions (x(s) δ(s)) that satisfy F(x(s); δ(s)) = 0 or, equivalently, (x(s) δ(s)) ∈ F −1 . While there are infinitely many such pairs, a simple way to select a member of this family is to differentiate F(x(s); δ(s)) = 0 with respect to s: (10)
∂F(x(s); δ(s)) ∂F(x(s); δ(s)) x (s) + δ (s) = 0 ∂x ∂δ
This differential equation in two unknowns x (s) and δ (s) must be satisfied so as to remain “on path.” One possible approach for tracing out the path in (s) ∂F(x(s);δ(s))/∂δ F −1 is to solve equation (10) for the ratio xδ (s) = − ∂F(x(s);δ(s))/∂x that indicates the direction of the next step along the path from s to s + ds. This approach, however, fails because the ratio switches from +∞ to −∞ at points such as B in Figure 1. So instead of solving for the ratio, we simply solve for each term of
466
BESANKO, DORASZELSKI, KRYUKOV, AND SATTERTHWAITE
the ratio. This insight implies that the graph F −1 in Figure 1 is the solution to the system of differential equations (11)
x (s) =
∂F(x(s); δ(s)) ∂δ
δ (s) = −
∂F(x(s); δ(s)) ∂x
These are the so-called basic differential equations for our example. They reduce the task of tracing out the set of solutions to solving a system of differential equations. Given an initial condition, this can be done with a variety of methods (see Chapter 10 of Judd (1998)). If δ = 0, then F(x; δ) = 0 is easily solved for x = 05, thereby providing an initial condition (point A in Figure 1). From there the homotopy algorithm uses the basic differential equations to determine the next step along the path. It continues to follow—step by step—the path until it reaches δ = 1 (point D). In our example, the auxiliary variable s is decreasing from point A to point D. Therefore, whenever δ (s) switches sign from negative to positive (point B), the path is bending backward and there are multiple solutions. Conversely, whenever the sign of δ (s) switches back from positive to negative (point C), the path is bending forward. Returning to our model of learning and forgetting, let x = (V∗ p∗ ) denote the 2M 2 endogenous variables. Our goal is to explore F−1 (ρ), a slice of the graph of the equilibrium correspondence. Proceeding as in our example, a parametric path is a set of functions (x(s) δ(s)) ∈ F−1 (ρ). Differentiating F(x(s); δ(s) ρ) = 0 with respect to s yields the necessary conditions for remaining on path: (12)
∂F(x(s); δ(s) ρ) ∂F(x(s); δ(s) ρ) x (s) + δ (s) = 0 ∂x ∂δ
where ∂F(x(s);δ(s)ρ) is the (2M 2 ×2M 2 ) Jacobian, x (s) and ∂F(x(s);δ(s)ρ) are (2M 2 × ∂x ∂δ 1) vectors, and δ (s) is a scalar. This system of 2M 2 differential equations in the 2M 2 + 1 unknowns xi (s), i = 1 2M 2 , and δ (s) has a solution that obeys the basic differential equations ∂F(y(s); ρ) i+1 i = 1 2M 2 + 1 (13) yi (s) = (−1) det ∂y −i where y(s) = (x(s) δ(s)) and the notation (·)−i is used to indicate that the . Note that ith column is removed from the (2M 2 × 2M 2 + 1) Jacobian ∂F(y(s);ρ) ∂y equation (13) reduces to equation (11) if x is a scalar instead of a vector. Garcia and Zangwill (1979) and Chapter 2 of Zangwill and Garcia (1981) proved that the basic differential equations (13) satisfy the conditions in equation (12). The homotopy method requires that F(y; ρ) be continuously differentiable with respect to y and that the Jacobian ∂F(y;ρ) have full rank at all points ∂y in F−1 (ρ). To appreciate the importance of the latter requirement, known as
LEARNING, FORGETTING, AND INDUSTRY DYNAMICS
467
has less than full rank at some regularity, note that if the Jacobian ∂F(y(s);ρ) ∂y point y(s), then the determinants of all its (2M 2 × 2M 2 ) submatrices are zero. Hence, according to the basic differential equations (13), yi (s) = 0 for i = 1 2M 2 + 1, and the algorithm stalls. On the other hand, with regularity in place, the implicit function theorem ensures that F−1 (ρ) consists only of continuous paths; paths that suddenly terminate, endless spirals, branch points, isolated equilibria, and continua of equilibria are ruled out (see Chapter 1 of Zangwill and Garcia (1981)). While our assumed functional forms ensure continuous differentiability, we have been unable to establish regularity analytically. Indeed, we have numerical evidence suggesting that regularity can fail. In practice, failures of regularity are not a problem as long as they are confined to isolated points. Because our algorithm computes just a finite number of points along the path, it is extremely unlikely to hit an irregular point.15 We refer the reader to Borkovsky, Doraszelski, and Kryukov (2010) for a fuller discussion of this issue and a step-by-step guide to solving dynamic stochastic games using the homotopy method. As Result 1 in Section 4 shows, we have always been able to trace out a path in F−1 (ρ) that starts at the equilibrium for δ = 0 and ends at the equilibrium for δ = 1. Whenever this “main path” folds back on itself, the homotopy algorithm automatically identifies multiple equilibria. This makes it well suited for models like ours that have multiple equilibria. Nevertheless, the homotopy algorithm cannot be guaranteed to find all equilibria.16 The slice F−1 (ρ) may contain additional equilibria that are off the main path. These equilibria form one or more loops (see Result 1 in Section 4). We have two intuitively appealing but potentially fallible ways to try and identify additional equilibria. First, we use a large number of restarts of the P–M algorithm, often trying to “propagate” equilibria from “nearby” parameterizations. Second, and more systematically, just as we can choose δ as the homotopy parameter while keeping ρ fixed, we can choose ρ while keeping δ fixed. This allows us to “crisscross” the parameter space in an orderly fashion by using the equilibria on the δ-slices as initial conditions to generate ρ-slices. A ρ-slice must either intersect with all δ-slices or lead us to an additional equilibrium that, in turn, gives us an initial condition to generate an additional δ-slice. We continue this process until all the ρ- and δ-slices “match up” (for details see Grieco (2008)). 3.2. Pakes and McGuire (1994) Algorithm While the homotopy method has advantages in finding multiple equilibria, it cannot stand alone. The P–M algorithm (or some other means for solving a 15 Our programs use Hompack (Watson, Billups, and Morgan (1987), Watson et al. (1997)) written in Fortran 90. They are available from the authors upon request. 16 Unless the system of equations that defines them happens to be polynomial; see Judd and Schmedders (2004) for some early efforts along this line.
468
BESANKO, DORASZELSKI, KRYUKOV, AND SATTERTHWAITE
system of nonlinear equations) is necessary to compute a starting point for our homotopy algorithm. Recall that V2 (e) = V1 (e[2] ) and p2 (e) = p1 (e[2] ) for each state e in a symmetric equilibrium, and it therefore suffices to determine V and p, the value and policy functions of firm 1. The P–M algorithm is iterative. An iteration cycles through the states in some predetermined order and updates V and p as it progresses from one iteration to the next. The strategic situation firms face in setting prices in state e is similar to a static game if the value of continued play is taken as given. The P–M algorithm computes the best reply of firm 1 against p(e[2] ) in this game and uses it to update the value and policy functions of firm 1 in state e. More formally, let h1 (e p1 p(e[2] ) V) be the maximand in the Bellman equation (1) after symmetry is imposed. The best reply of firm 1 against p(e[2] ) in state e is given by G2e (V p) = arg max h1 e p1 p e[2] V (14) p1
and the value associated with it is (15) G1e (V p) = max h1 e p1 p e[2] V p1
Write the collection of equations (14) and (15) for all states e ∈ {1 M}2 as ⎛ 1 ⎞ G(11) (V p) ⎜ G1 (V p) ⎟ ⎜ (21) ⎟ G(V p) = ⎜ (16) ⎟ ⎝ ⎠ G2(MM) (V p) Given an initial guess x0 = (V0 p0 ), the P–M algorithm executes the iteration xk+1 = G(xk )
k = 0 1 2
until the changes in the value and policy functions of firm 1 are deemed small (or a failure to converge is diagnosed). The P–M algorithm does not lend itself to computing multiple equilibria. To identify more than one equilibrium (for a given parameterization of the model), it must be restarted from different initial guesses, but different initial guesses may or may not lead to different equilibria. This, however, still understates the severity of the problem. Whenever our dynamic stochastic game has multiple equilibria, the P–M algorithm cannot compute a substantial fraction of them even if an arbitrarily large number of initial guesses are tried. The problem is this. The P–M algorithm continues to iterate until it reaches a fixed point x = G(x). A necessary condition for convergence is local stability of
LEARNING, FORGETTING, AND INDUSTRY DYNAMICS
469
at the fixed point and the fixed point. Consider the (2M 2 × 2M 2 ) Jacobian ∂G(x) ∂x 17 let ( ∂G(x) ) be its spectral radius. The fixed point is locally stable under the ∂x ∂G(x) P–M algorithm if ( ∂x ) < 1, that is, if all eigenvalues are within the complex unit circle. Given local stability, the P–M algorithm converges provided that the initial guess is close (perhaps very close) to the fixed point. Conversely, if ( ∂G(x) ) ≥ 1, then the fixed point is unstable and the P–M algorithm cannot ∂x compute it. The following proposition identifies a subset of equilibria that the P–M algorithm cannot compute. PROPOSITION 1: Let (x(s) δ(s)) ∈ F−1 (ρ) be a parametric path of equilibria. (i) If δ (s) ≤ 0, then ( ∂G(x(s)) |(δ(s)ρ) ) ≥ 1 and the equilibrium x(s) is unstable ∂x under the P–M algorithm. (ii) Moreover, the equilibrium x(s) remains unstable with either dampening or extrapolation applied to the P–M algorithm. Part (i) of Proposition 1 establishes that the P–M algorithm cannot compute equilibria on any part of the path for which δ (s) ≤ 0 Whenever δ (s) switches sign from positive to negative, the main path connecting the equilibrium at δ = 0 with the equilibrium at δ = 1 bends backward and multiple equilibria arise. Conversely, whenever the sign of δ (s) switches back from negative to positive, the main path bends forward. Hence, for a fixed forgetting rate δ(s), between two equilibria for which δ (s) > 0 lies a third—necessarily unstable—equilibrium for which δ (s) ≤ 0 Similarly, a loop has equilibria for which δ (s) > 0 and equilibria for which δ (s) ≤ 0. Consequently, if we have multiple equilibria (for a given parameterization of the model), then the P–M algorithm can compute at best 1/2 to 2/3 of them. Dampening and extrapolation are often applied to the P–M algorithm in the hope of improving its likelihood or speed of convergence. The iteration xk+1 = ωG(xk ) + (1 − ω)xk
k = 0 1 2
is said to be dampened if ω ∈ (0 1) and extrapolated if ω ∈ (1 ∞). Part (ii) of Proposition 1 establishes the futility of these attempts.18 The ability of the P–M algorithm to provide a reasonably complete picture of the set of solutions to the model is limited beyond the scope of Proposition 1. Our numerical analysis indicates that the P–M algorithm cannot compute equilibria on some part of the path for which δ (s) > 0: PROPOSITION 2: Let (x(s) δ(s)) ∈ F−1 (ρ) be a parametric path of equilibria. Even if δ (s) > 0, we may have ( ∂G(x(s)) |(δ(s)ρ) ) ≥ 1 so that the equilibrium x(s) ∂x is unstable under the P–M algorithm. 17 Let A be an arbitrary matrix and let ς(A) be the set of its eigenvalues. The spectral radius of A is (A) = max{|λ| : λ ∈ ς(A)}. 18 Dampening and extrapolation may, of course, still be helpful in computing equilibria for which δ (s) > 0.
470
BESANKO, DORASZELSKI, KRYUKOV, AND SATTERTHWAITE
In the online Appendix we prove Proposition 2 by way of an example and illustrate equilibria of our model that the P–M algorithm cannot compute. As is well known, not all Nash equilibria of static games are stable under best reply dynamics (see Chapter 1 of Fudenberg and Tirole (1991)).19 Since the P–M algorithm incorporates best reply dynamics, it is reasonable to expect that this limits its usefulness. In the online Appendix, we argue that this is not the case. More precisely, we show that, holding fixed the value of continued play, the best reply dynamics are contractive and therefore converge to a unique fixed point irrespective of the initial guess. The value function iteration also is contractive, holding fixed the policy function. Hence, each of the two building blocks of the P–M algorithm “works.” What makes it impossible to obtain a substantial fraction of equilibria is the interaction of value function iteration with best reply dynamics. The P–M algorithm is a pre-Gauss–Jacobi method. The subsequent literature has instead sometimes used a pre-Gauss–Seidel method (Benkard (2004), Doraszelski and Judd (2008)). Whereas a Gauss–Jacobi method replaces the old guesses for the value and policy functions with the new guesses at the end of an iteration after all states have been visited, a Gauss–Seidel method updates after each state. This has the advantage that “information” is used as soon as it becomes available (see Chapters 3 and 5 of Judd (1998)). We have been unable to prove that Proposition 1 carries over to this alternative algorithm. We note, however, that the Stein–Rosenberg theorem (see Proposition 6.9 in Section 2.6 of Bertsekas and Tsitsiklis (1997)) asserts that for certain systems of linear equations, if the Gauss–Jacobi algorithm fails to converge, then so does the Gauss–Seidel algorithm. Hence, it does not seem reasonable to presume that the Gauss–Seidel variant of the P–M algorithm is immune to the difficulties the original algorithm suffers. 4. EQUILIBRIUM CORRESPONDENCE This section provides an overview of the equilibrium correspondence. In the absence of organizational forgetting, C–R established uniqueness of equilibrium. Theorem 2.2 in C–R extends to our model: PROPOSITION 3: If organizational forgetting is either absent (δ = 0) or certain (δ = 1), then there is a unique equilibrium.20 The cases of δ = 0 and δ = 1 are special in that they ensure that movements through the state space are unidirectional. When δ = 0, a firm can never move 19 More generally, in static games, Nash equilibria of degree −1 are unstable under any Nash dynamics, that is, dynamics with rest points that coincide with Nash equilibria, including replicator and smooth fictitious play dynamics (Demichelis and Germano (2002)). 20 Proposition 3 pertains to both symmetric and asymmetric equilibria.
LEARNING, FORGETTING, AND INDUSTRY DYNAMICS
471
“backward” to a lower state and when δ = 1, it can never move “forward” to a higher state. Hence, backward induction can be used to establish uniqueness of equilibrium (see Section 7 for details). In contrast, when δ ∈ (0 1), a firm can move in either direction. These bidirectional movements break the backward induction and make multiple equilibria possible: PROPOSITION 4: If organizational forgetting is neither absent (δ = 0) nor certain (δ = 1), then there may be multiple equilibria. Figure 2 proves the proposition and illustrates the extent of multiplicity. It shows the number of equilibria that we have identified for each combination of progress ratio ρ and forgetting rate δ. Darker shades indicate more equilibria. As can be seen, we have found up to nine equilibria for some values of ρ and δ. Multiplicity is especially pervasive for forgetting rates δ in the empirically relevant range below 01. In dynamic stochastic games with finite actions, Herings and Peeters (2004) have shown that generically the number of MPE is odd. While they consider both symmetric and asymmetric equilibria, in a two-player game with symmetric primitives such as ours, asymmetric equilibria occur in pairs. Hence, their result immediately implies that generically the number of symmetric equilibria
FIGURE 2.—Number of equilibria.
472
BESANKO, DORASZELSKI, KRYUKOV, AND SATTERTHWAITE
is odd in games with finite actions. Figure 2 suggests that this carries over to our setting with continuous actions. To understand the geometry of how multiple equilibria arise, we take a close look at the slices of the graph of the equilibrium correspondence that our homotopy algorithm computes. RESULT 1: The slice F−1 (ρ) contains a unique path that connects the equilibrium at δ = 0 with the equilibrium at δ = 1. In addition, F−1 (ρ) may contain (one or more) loops that are disjoint from this “main path” and from each other. Figure 3 illustrates Result 1. To explain this figure, recall that an equilibrium consists of a value function V∗ and a policy function p∗ , and is thus an element of a high-dimensional space. To succinctly represent it, we proceed in two steps. First, we use p∗ to construct the probability distribution over next period’s state e given this period’s state e, that is, the transition matrix that characterizes the Markov process of industry dynamics. We compute the transient distribution over states in period t, μt (·), starting from state (1 1) in period 0. This tells us how likely each possible industry structure is in period t given that both firms began at the top of their learning curves. In addition, we compute the limiting (or ergodic) distribution over states, μ∞ (·).21 The transient distributions capture short-run dynamics and the limiting distribution captures long-run (or steady-state) dynamics. Second, we use the transient distribution over states in period t, μt (·), to compute the expected Herfindahl index Ht = (D∗1 (e)2 + D∗2 (e)2 )μt (e) e
The time path of H t summarizes the implications of learning and forgetting for industry dynamics. If the industry evolves asymmetrically, then H t > 05. The maximum expected Herfindahl index H ∧ = max H t t∈{1100}
is a summary measure of short-run industry concentration. The limiting expected Herfindahl index H∞, computed using μ∞(·) instead of μt(·), is a summary measure of long-run industry concentration. If H∞ > 0.5, then an asymmetric industry structure persists.

²¹Let P be the M² × M² transition matrix. The transient distribution in period t is given by μt = μ0 Pᵗ, where μ0 is the 1 × M² initial distribution and Pᵗ is the tth matrix power of P. If δ ∈ (0, 1), then the Markov process is irreducible because logit demand implies that the probability of moving forward is always nonzero. That is, all its states belong to a single closed communicating class, and the 1 × M² limiting distribution μ∞ solves the system of linear equations μ∞ = μ∞P. If δ = 0 (δ = 1), then there is also a single closed communicating class, but its sole member is state (M, M) ((1, 1)).
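To make the objects in footnote 21 concrete, here is a minimal numerical sketch of how the transient distributions, the limiting distribution, and the expected Herfindahl index can be computed with standard linear algebra. The transition matrix P, the initial distribution mu0, and the vector of win probabilities D1 are placeholders for the equilibrium objects; this is not the authors' code.

```python
import numpy as np

def transient_distribution(P, mu0, t):
    """mu_t = mu_0 P^t for a row-stochastic transition matrix P."""
    return mu0 @ np.linalg.matrix_power(P, t)

def limiting_distribution(P):
    """Solve mu = mu P together with the normalization sum(mu) = 1.
    Valid when the Markov process is irreducible (delta in (0, 1))."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def expected_herfindahl(mu, D1):
    """H = sum_e [D1*(e)^2 + D2*(e)^2] mu(e), with D2 = 1 - D1."""
    return float(mu @ (D1 ** 2 + (1.0 - D1) ** 2))
```

The maximum expected Herfindahl index H∧ is then simply the maximum of expected_herfindahl(transient_distribution(P, mu0, t), D1) over t ∈ {1, …, 100}, and H∞ uses the limiting distribution instead.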
FIGURE 3.—Limiting expected Herfindahl index H∞ (solid line) and maximum expected Herfindahl index H∧ (dashed line).
In Figure 3 we visualize F⁻¹(ρ) for a variety of progress ratios by plotting H∧ (dashed line) and H∞ (solid line). As can be seen, multiple equilibria arise whenever the main path folds back on itself. Moreover, there is one loop for ρ ∈ {0.75, 0.65, 0.55, 0.15, 0.05}, two loops for ρ ∈ {0.85, 0.35}, and three loops for ρ = 0.95, thus adding further multiplicity. Figure 3 is not necessarily a complete picture of the equilibria to our model. As discussed in Section 3.1, no algorithm is guaranteed to find all equilibria. We do find all equilibria along the main path and we have been successful in finding a number of loops, but other loops may exist because, to trace out a loop, we must somehow compute at least one equilibrium on the loop, and doing so is problematic.
Types of Equilibria

Despite the multiplicity, the equilibria of our game exhibit the four typical patterns shown in Figure 4.²² The parameter values are ρ = 0.85 and δ ∈ {0, 0.0275, 0.08}; they represent the median progress ratio across a wide array of empirical studies combined with the cases of no, low, and high organizational forgetting. One should recognize that the typical patterns, helpful as they are in understanding the range of behaviors that can occur, lie on a continuum and thus morph into each other in complicated ways as we change the parameter values.

The upper left panel of Figure 4 is typical for what we call a flat equilibrium without well (ρ = 0.85, δ = 0). The policy function is very even over the entire state space. In particular, the price that a firm charges in equilibrium is fairly insensitive to its rival’s stock of know-how. The upper right panel shows a flat equilibrium with well (ρ = 0.85, δ = 0.0275). While the policy function remains even over most of the state space, price competition is intense during the industry’s birth. This manifests itself as a “well” in the neighborhood of state (1, 1).

The lower left panel of Figure 4 exemplifies a trenchy equilibrium (ρ = 0.85, δ = 0.0275). The parameter values are the same as for the flat equilibrium with well, thereby providing an instance of multiplicity.²³ The policy function is uneven and exhibits a “trench” along the diagonal of the state space. This trench starts in state (1, 1) and extends beyond the bottom of the learning curve in state (m, m) all the way to state (M, M). Hence, in a trenchy equilibrium, price competition between firms with similar stocks of know-how is intense but abates once firms become asymmetric. Finally, the lower right panel illustrates an extra-trenchy equilibrium (ρ = 0.85, δ = 0.08). The policy function has not only a diagonal trench, but also trenches parallel to the edges of the state space. In these sideways trenches, the leader competes aggressively with the follower.

²²The value functions corresponding to the policy functions in Figure 4 can be found in the online Appendix, where we also provide tables of the value and policy functions for ease of reference.

²³As can be seen in the upper right panel of Figure 3, the main path in F⁻¹(0.85) bends back on itself at δ = 0.0275, and there are three equilibria for slightly lower values of δ and only one for slightly higher values. This particular parameterization (if not the pattern of behavior it generates) is therefore almost nongeneric in that it approximates the isolated occurrence of an even number of equilibria. Due to the limited precision of our homotopy algorithm, we have indeed been unable to find a third equilibrium.

FIGURE 4.—Policy function p∗(e1, e2); marginal cost c(e1) (solid line in e2 = 30 plane).

Sunspots

For a progress ratio of ρ = 1 the marginal cost of production is constant at c(1) = · · · = c(M) = κ and there are no gains from learning-by-doing. It clearly is an equilibrium for firms to disregard their stocks of know-how and set the same prices as in the Nash equilibrium of a static price-setting game (obtained by setting β = 0). Since firms’ marginal costs are constant, so are the static Nash equilibrium prices. Thus, we have an extreme example of a flat equilibrium with p∗(e) = κ + 2σ = 12 and V∗(e) = σ/(1 − β) = 21 for all states e ∈ {1, …, M}².
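This benchmark is easy to verify numerically. The check below assumes the baseline values κ = 10 and σ = 1, which are not stated in this excerpt but are implied by κ + 2σ = 12 and σ/(1 − β) = 21 with β = 1/1.05; by symmetry, each firm wins with probability 1/2.

```python
# Sanity check of the flat-equilibrium benchmark for rho = 1 (constant cost).
# Assumed baseline parameters (inferred, not stated here): kappa = 10, sigma = 1.
kappa, sigma, beta = 10.0, 1.0, 1 / 1.05
D = 0.5                           # symmetric logit: each firm wins half the time
p = kappa + sigma / (1 - D)       # static FOC: p = c + sigma/(1 - D) = kappa + 2*sigma
V = D * (p - kappa) / (1 - beta)  # present value of the static profit stream
print(p, round(V, 2))             # 12.0 21.0
```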
Figure 2 shows that, in the case of ρ = 1, there are two more equilibria for a range of forgetting rates δ below 0.1. Since the state of the industry has no bearing on the primitives, we refer to these equilibria as sunspots. One of the sunspots is a trenchy equilibrium, while the other one is, depending on δ, either a flat or a trenchy equilibrium. In the trenchy equilibrium, the industry evolves toward an asymmetric structure where the leader charges a lower price than the follower and enjoys a higher probability of making a sale. Consequently, the net present value of cash flows to the leader exceeds that to the follower. The value in state (1, 1), however, is lower than in the static Nash equilibrium, that is, V∗(1, 1) < 21.²⁴ This indicates that value is destroyed as firms fight for dominance.

The existence of sunspots and the fact that these equilibria persist for ρ ≈ 1 suggest that the concept of MPE is richer than one may have thought. Besides describing the physical environment of the industry, the state serves as a summary of the history of play: A larger stock of know-how indicates that—on average—a firm has won more sales than its rival, with the likely reason being that the firm has charged lower prices. Hence, by conditioning their current behavior on the state, firms implicitly condition on the history of play. The difference with a subgame perfect equilibrium is that there firms have the entire history of play at their disposal, whereas here they have but a crude indication of it. Nevertheless, “barely” payoff-relevant state variables (such as firms’ stocks of know-how if ρ ≈ 1) open the door for bootstrap-type equilibria, which are familiar from repeated games, to arise in Markov-perfect settings.

In sum, accounting for organizational forgetting in a model of learning-by-doing leads to multiple equilibria and a rich array of pricing behaviors. In the next section we explore what these behaviors entail for industry dynamics, both in the short run and in the long run.
5. INDUSTRY DYNAMICS

Figures 5 and 6 display the transient distribution in periods 8 and 32, respectively, and Figure 7 displays the limiting distribution for our four typical cases.²⁵

FIGURE 5.—Transient distribution over states in period 8 given initial state (1, 1).

In the flat equilibrium without well (ρ = 0.85, δ = 0, upper left panels), the transient and limiting distributions are unimodal. The most likely industry structure is symmetric. For example, the modal state is (5, 5) in period 8, is (9, 9) in period 16, is (17, 17) in period 32, and is (30, 30) in period 64. Turning from the short run to the long run, the industry is sure to remain in state (30, 30), because with logit demand, a firm always has a positive probability of making a sale irrespective of its own price and that of its rival, so that, in the absence of organizational forgetting, both firms must eventually reach the bottom of their learning curves.²⁶ In short, the industry starts symmetric and stays symmetric.

By contrast, in the flat equilibrium with well (ρ = 0.85, δ = 0.0275, upper right panels), the transient distributions are first bimodal and then unimodal, as is the limiting distribution. The modal states are (1, 8) and (8, 1) in period 8, are (4, 11) and (11, 4) in period 16, and are (9, 14) and (14, 9) in period 32, but the modal state is (17, 17) in period 64 and the modal states of the limiting distribution are (24, 25) and (25, 24). Thus, as time passes, firms end up competing on equal footing. In sum, the industry evolves first toward an asymmetric structure and then toward a symmetric structure. As we discuss in detail in Section 6, the well serves to build, but not to defend, a competitive advantage.

²⁴For example, if δ = 0.0275, then V∗(28, 21) = 25.43 and p∗(28, 21) = 12.33 for the leader, V∗(21, 28) = 22.39 and p∗(21, 28) = 12.51 for the follower, and V∗(1, 1) = 19.36.

²⁵To avoid clutter, we do not graph states that have a probability of less than 10⁻⁴.

²⁶The absence of persistent asymmetries is not an artifact of our functional forms. C–R pointed out that it holds true as long as the support of demand is unbounded (see their Assumption 1(a) and footnote 6 on p. 1118).
FIGURE 6.—Transient distribution over states in period 32 given initial state (1, 1).
While the modes of the transient distributions are more separated and pronounced in the trenchy equilibrium (ρ = 0.85, δ = 0.0275, lower left panels) than in the flat equilibrium with well, the dynamics of the industry are similar at first. Unlike in the flat equilibrium with well, however, the industry continues to evolve toward an asymmetric structure. The modal states are (14, 21) and (21, 14) in period 64 and are (21, 28) and (28, 21) in the limiting distribution. Although the follower reaches the bottom of its learning curve in the long run and attains cost parity with the leader, asymmetries persist because the diagonal trench serves to build and to defend a competitive advantage.

In the extra-trenchy equilibrium (ρ = 0.85, δ = 0.08, lower right panels), the sideways trench renders it unlikely that the follower ever makes it down from the top of its learning curve. The transient and limiting distributions are bimodal, and the most likely industry structure is extremely asymmetric.
FIGURE 7.—Limiting distribution over states.
The modal states are (1, 7) and (7, 1) in period 8, are (1, 10) and (10, 1) in period 16, are (1, 15) and (15, 1) in period 32, are (1, 19) and (19, 1) in period 64, and are (1, 26) and (26, 1) in the limiting distribution. In short, one firm acquires a competitive advantage early on and maintains it with an iron hand.

Returning to Figure 3, the maximum expected Herfindahl index H∧ (dashed line) and the limiting expected Herfindahl index H∞ (solid line) highlight the fundamental economics of organizational forgetting. If forgetting is sufficiently weak (δ ≈ 0), then asymmetries may arise but cannot persist, that is, H∧ ≥ 0.5 and H∞ ≈ 0.5. Moreover, if asymmetries arise in the short run, they are modest. If forgetting is sufficiently strong (δ ≈ 1), then asymmetries cannot arise in the first place, that is, H∧ ≈ H∞ ≈ 0.5, because forgetting stifles investment in learning altogether,²⁷ but for intermediate degrees of forgetting, asymmetries arise and persist.
²⁷We further document this investment stifling in the online Appendix.
These asymmetries can be so pronounced that the leader is virtually a monopolist. Since the Markov process of industry dynamics is irreducible for δ ∈ (0, 1), the follower must eventually overtake the leader. The limiting expected Herfindahl index H∞ may be a misleading measure of long-run industry concentration if such leadership reversals happen frequently. This, however, is not the case: Leadership reversals take a long time to occur when H∞ is high. To establish this, define τ(e1, e2) to be the first-passage time into the set {(ẽ1, ẽ2) | ẽ1 ≤ ẽ2} if e1 ≥ e2 or {(ẽ1, ẽ2) | ẽ1 ≥ ẽ2} if e1 ≤ e2. That is, τ(e) is the expected time it takes the industry to move from a state e below (or on) the diagonal of the state space, where firm 1 leads and firm 2 follows, to a state ẽ above (or on) it, where firm 1 follows and firm 2 leads. Taking the average with respect to the limiting distribution yields the summary measure

τ∞ = Σe τ(e)μ∞(e).
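Computationally, the expected first-passage times solve a linear system: τ(e) = 0 on the target set and τ(e) = 1 + Σe′ P(e, e′)τ(e′) elsewhere. A generic sketch, with P and the boolean target mask standing in for the model’s objects:

```python
import numpy as np

def expected_first_passage(P, target):
    """Expected hitting times of `target` (boolean mask over states) for a
    finite Markov chain with row-stochastic transition matrix P.
    Solves (I - Q) tau_free = 1, where Q restricts P to non-target states."""
    free = ~target
    n_free = int(free.sum())
    Q = P[np.ix_(free, free)]
    tau = np.zeros(P.shape[0])
    tau[free] = np.linalg.solve(np.eye(n_free) - Q, np.ones(n_free))
    return tau
```

τ∞ is then the average of these hitting times under μ∞, with the target set reflected across the diagonal depending on which firm leads in state e.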
For the trenchy and extra-trenchy equilibria, τ∞ = 295 and τ∞ = 83,406, respectively, indicating that a leadership reversal takes a long time to occur. Hence, the asymmetry captured by H∞ persists. In the online Appendix we plot the expected time to a leadership reversal τ∞ in the same format as Figure 3. Just like H∞, τ∞ is largest for intermediate degrees of forgetting. Moreover, τ∞ is of substantial magnitude, easily reaching and exceeding 1000 periods. Asymmetries are therefore persistent in our model because the expected time until the leader and the follower switch roles is (perhaps very) long.

We caution the reader that the absence of persistent asymmetries for small forgetting rates δ in Figure 3 may be an artifact of the finite size of the state space (M = 30 in our baseline parameterization). Given δ = 0.01, say, Δ(30) = 0.26 and organizational forgetting is so weak that the industry is sure to remain in or near state (30, 30). This minimizes bidirectional movements and restores the backward induction logic that underlies uniqueness of equilibrium for the extreme case of δ = 0 (see Proposition 3). We show in the online Appendix that increasing M, while holding δ fixed, facilitates persistent asymmetries as the industry becomes more likely to remain in the interior of the state space. Furthermore, as emphasized in Section 1, shut-out model elements can give rise to persistent asymmetries even in the absence of organizational forgetting. We explore this issue further in Section 8.

To summarize, contrary to what one might expect, organizational forgetting does not negate learning-by-doing. Rather, as can be seen in Figure 3, over a range of progress ratios ρ above 0.6 and forgetting rates δ below 0.1, learning and forgetting reinforce each other. Starting from the absence of both learning (ρ = 1) and forgetting (δ = 0), a steeper learning curve (lower progress ratio) tends to give rise to a more asymmetric industry structure, just as a higher forgetting rate does. In the next section we analyze the pricing behavior that drives these dynamics.
6. PRICING BEHAVIOR

Rewriting equation (5) shows that firm 1’s price in state e satisfies

(17) p∗(e) = c∗(e) + σ/(1 − D∗1(e)),

where the virtual marginal cost

(18) c∗(e) = c(e1) − βφ∗(e)

equals the actual marginal cost c(e1) minus the discounted prize βφ∗(e) from winning the current period’s sale. The prize, in turn, is the difference in the value of continued play to firm 1 if it wins the sale, V∗1(e), versus if it loses the sale, V∗2(e):

(19) φ∗(e) = V∗1(e) − V∗2(e).

Note that irrespective of the forgetting rate δ, the equilibrium of our dynamic stochastic game reduces to the static Nash equilibrium if firms are myopic. Setting β = 0 in equations (17) and (18) gives the usual FOC for a static price-setting game with logit demand:

(20) p†(e) = c(e1) + σ/(1 − D†1(e)),

where D†k(e) = Dk(p†(e), p†(e[2])) denotes the probability that, in the static Nash equilibrium, the buyer purchases from firm k ∈ {1, 2} in state e. Thus, if β = 0, then p∗(e) = p†(e) and V∗(e) = D†1(e)(p†(e) − c(e1)) for all states e ∈ {1, …, M}².
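Equation (20) is a fixed point that is straightforward to compute by iterating the FOC. The sketch below uses the two-firm logit form consistent with the markup σ/(1 − D); σ = 1 is an assumed placeholder value, not a parameter stated in this excerpt.

```python
import numpy as np

def logit_shares(p, sigma):
    """Purchase probabilities in a two-firm logit without outside good."""
    w = np.exp(-np.asarray(p, dtype=float) / sigma)
    return w / w.sum()

def static_nash(c, sigma, tol=1e-12, max_iter=10_000):
    """Iterate p_n = c_n + sigma / (1 - D_n(p)) to a fixed point, as in (20)."""
    c = np.asarray(c, dtype=float)
    p = c + 2 * sigma                      # symmetric starting guess
    for _ in range(max_iter):
        p_new = c + sigma / (1.0 - logit_shares(p, sigma))
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    raise RuntimeError("fixed-point iteration did not converge")

# Both firms at the bottom of the learning curve, c(m) = 5.30:
print(static_nash([5.30, 5.30], sigma=1.0))   # -> [7.3 7.3]
```

With equal costs the shares are 1/2 and p† = c + 2σ; the resulting 7.30 matches the static Nash price quoted below for the trenchy equilibrium.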
6.1. Price Bounds

Comparing equations (17) and (20) shows that equilibrium prices p∗(e) and p∗(e[2]) coincide with the prices that obtain in a static Nash equilibrium with costs equal to virtual marginal costs c∗(e) and c∗(e[2]). Static Nash equilibrium prices are increasing in either firm’s cost (Vives (1999, p. 35)). Therefore, if both firms’ prizes are nonnegative, static Nash equilibrium prices are an upper bound on equilibrium prices, that is, if φ∗(e) ≥ 0 and φ∗(e[2]) ≥ 0, then p∗(e) ≤ p†(e) and p∗(e[2]) ≤ p†(e[2]).

A sufficient condition for φ∗(e) ≥ 0 for each state e is that the value function V∗(e) is nondecreasing in e1 and nonincreasing in e2. Intuitively, it should not hurt firm 1 if it moves down its learning curve and it should not benefit firm 1 if firm 2 moves down its learning curve. While neither we nor C–R have succeeded in proving it, our computations show that this intuition is valid in the absence of organizational forgetting:
RESULT 2: If organizational forgetting is absent (δ = 0), then p∗(e) ≤ p†(e) for all e ∈ {1, …, M}².

Result 2 highlights the fundamental economics of learning-by-doing: As long as improvements in competitive position are valuable, firms use price cuts as investments to achieve them. We complement Result 2 by establishing a lower bound on equilibrium prices in states where at least one of the two firms has reached the bottom of its learning curve:

PROPOSITION 5: If organizational forgetting is absent (δ = 0), then (i) p∗(e) = p†(e) = p†(m, m) > c(m) for all e ∈ {m, …, M}² and (ii) p∗(e) > c(m) for all e1 ∈ {m, …, M} and e2 ∈ {1, …, m − 1}.

Part (i) of Proposition 5 sharpens Theorem 4.3 in C–R by showing that once both firms have reached the bottom of their learning curves, equilibrium prices revert to static Nash levels. To see why, note that, given δ = 0, the prize reduces to φ∗(e) = V∗(e1 + 1, e2) − V∗(e1, e2 + 1), but beyond the bottom of their learning curves, firms’ competitive positions can neither improve nor deteriorate. Hence, as we show in the proof of the proposition, V∗(e) = V∗(e′) for all e, e′ ∈ {m, …, M}², so that the advantage-building and advantage-defending motives disappear, the prize is zero, and equilibrium prices revert to static Nash levels. This rules out trenches penetrating into this region of the state space. Trenchy and extra-trenchy equilibria therefore cannot arise in the absence of organizational forgetting.

Part (ii) of Proposition 5 restates Theorem 4.3 in C–R for the situation where the leader but not the follower has reached the bottom of its learning curve. The leader no longer has an advantage-building motive, but continues to have an advantage-defending motive. This raises the possibility that the leader uses price cuts to delay the follower in moving down its learning curve. The proposition shows that there is a limit to how aggressively the leader defends its advantage: below-cost pricing is never optimal in the absence of organizational forgetting.

The story changes dramatically in the presence of organizational forgetting. The equilibrium may exhibit soft competition in some states and price wars in other states. Consider the trenchy equilibrium (ρ = 0.85, δ = 0.0275). The upper bound in Result 2 fails in state (22, 20), where the leader charges 6.44 and the follower charges 7.60, significantly above its static Nash equilibrium price of 7.30. The follower’s high price stems from its prize of −1.04. This prize, in turn, reflects that if the follower wins the sale, then the industry most likely moves to state (22, 21) and thus moves closer to the brutal price competition on the diagonal of the state space. Indeed, the follower’s value function decreases from 20.09 in state (22, 20) to 19.56 in state (22, 21). To avoid this undesirable possibility, the follower charges a high price. The lower bound in
Proposition 5 fails in state (20, 20), where both firms charge 5.24 as compared to a marginal cost of 5.30. The prize of 2.16 makes it worthwhile to price below cost even beyond the bottom of the learning curve because “in the trench” winning the current sale confers a lasting advantage. This discussion provides the instances that prove the next two propositions.

PROPOSITION 6: If organizational forgetting is present (δ > 0), then we may have p∗(e) > p†(e) for some e ∈ {1, …, M}².

Figure 8 illustrates Proposition 6 by plotting the share of equilibria that violate the upper bound in Result 2.²⁸ Darker shades indicate higher shares. As can be seen, the upper bound continues to hold if organizational forgetting is very weak (δ ≈ 0) and possibly also if learning-by-doing is very weak (ρ ≈ 1).
FIGURE 8.—Share of equilibria with p∗(e) > p†(e) for some e ∈ {1, …, M}².

²⁸To take into account the limited precision of our computations, we take the upper bound to be violated if p∗(e) > p†(e) + ε for some e ∈ {1, …, M}², where ε is positive but small. Specifically, we set ε = 10⁻², so that if prices are measured in dollars, then the upper bound must be violated by more than a cent. Given that the homotopy algorithm solves the system of equations up to a maximum absolute error of about 10⁻¹², Figure 8 therefore almost certainly understates the extent of violations.
Apart from these extremes (and a region around ρ = 0.45 and δ = 0.25), at least some, if not all, equilibria entail at least one state where equilibrium prices exceed static Nash equilibrium prices.

Taken alone, Proposition 6 suggests that organizational forgetting makes firms less aggressive. This makes sense: After all, why invest in improvements in competitive position when they are transitory? However, organizational forgetting can also be a source of aggressive pricing behavior:

PROPOSITION 7: If organizational forgetting is present (δ > 0), then we may have p∗(e) ≤ c(m) for some e1 ∈ {m, …, M} and e2 ∈ {1, …, M}.

Figure 9 illustrates Proposition 7 by plotting the share of equilibria in which a firm prices below cost even though it has reached the bottom of its learning curve. Note that Figure 9 is a conservative tally of how often the lower bound in Proposition 5 fails, because the lower bound in part (i) already fails if the leader charges less than its static Nash equilibrium price, not less than its marginal cost. In sum, the leader may be more aggressive in defending its advantage in the presence of organizational forgetting than in its absence. The most dramatic expression of this aggressive pricing behavior is the diagonal trenches that are the defining feature of trenchy and extra-trenchy equilibria.
FIGURE 9.—Share of equilibria with p∗(e) ≤ c(m) for some e1 ∈ {m, …, M} and e2 ∈ {1, …, M}.
6.2. Wells and Trenches

This section develops intuition as to how wells and trenches can arise. Our goal is to provide insight as to whether equilibria featuring wells and trenches are economically plausible and, at least potentially, empirically relevant.

Wells

A well, as seen in the upper right panel of Figure 4, is a preemption battle fought by firms at the top of their learning curves. Consider our leading example of a flat equilibrium with well (ρ = 0.85, δ = 0.0275).²⁹ Table I details firms’ competitive positions at various points in time, assuming that firm 1 leads and firm 2 follows. Having moved down the learning curve first, the leader has a lower cost and a higher prize than the follower. In the modal state (8, 1) in period 8, the leader therefore charges a lower price and enjoys a higher probability of making a sale. In time the follower also moves down the learning curve, and the leader’s advantage begins to erode (see the modal state (11, 4) in period 16) and eventually vanishes completely (see the modal state (17, 17) in period 64). The prizes reflect this erosion. The leader’s prize is higher than the follower’s in state (8, 1) (3.95 versus 2.20) but lower in state (11, 4) (1.16 versus 1.23). Although the leadership position is transitory, it is surely worth having. Both firms use price cuts in state (1, 1) in the hope of being the first to move down the learning curve. In the example, the prize of 6.85 justifies charging the price of 5.48 that is well below the marginal cost of 10. The well is therefore an investment in building competitive advantage.
TABLE I
FLAT EQUILIBRIUM WITH WELL (ρ = 0.85, δ = 0.0275)

                         Leader                                 Follower
Period  Modal State   Cost    Prize   Price  Prob.  Value    Cost    Prize   Price  Prob.  Value
0       (1, 1)        10.00   6.85    5.48   0.50    5.87    10.00   6.85    5.48   0.50    5.87
8       (8, 1)         6.14   3.95    7.68   0.81   22.99    10.00   2.20    9.14   0.19    5.34
16      (11, 4)        5.70   1.16    7.20   0.62   20.08     7.22   1.23    7.68   0.38   11.48
32      (14, 9)        5.39   0.36    7.16   0.53   20.06     5.97   0.64    7.27   0.47   17.30
64      (17, 17)       5.30  −0.01    7.31   0.50   20.93     5.30  −0.01    7.31   0.50   20.93
∞       (25, 24)       5.30  −0.01    7.30   0.50   21.02     5.30  −0.00    7.30   0.50   21.02
²⁹As a point of comparison, we provide details on firms’ competitive positions at various points in time for our leading example of a flat equilibrium without well (ρ = 0.85, δ = 0) in the online Appendix.
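The depth of the well can be checked directly against the pricing identity implied by equations (17)–(19), p∗(e) = c(e1) − βφ∗(e) + σ/(1 − D∗1(e)). In state (1, 1), symmetry pins D∗1 at exactly 0.5, so the Table I entries reproduce the reported price; σ = 1 and β = 1/1.05 are assumed baseline values as above.

```python
# Verify the well in state (1, 1) of the flat equilibrium with well (Table I).
# Assumed baseline parameters: sigma = 1, beta = 1/1.05.
beta, sigma = 1 / 1.05, 1.0
c, phi, D = 10.0, 6.85, 0.5           # cost, prize, and win probability in (1, 1)
p = c - beta * phi + sigma / (1 - D)  # identity from equations (17)-(19)
print(round(p, 2))                    # 5.48, the below-cost price in Table I
```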
More abstractly, a well is the outcome of an auction in state (1, 1) for the additional future profits—the prize—that accrue to the firm that makes the first sale and acquires transitory industry leadership. As equations (17)–(19) show, a firm’s price in equilibrium is virtual marginal cost marked up. Virtual marginal cost, in turn, accounts for the discounted prize from winning the current period’s sale and, getting to the essential point, the prize is the difference in the value of continued play if the firm rather than its rival wins.

Diagonal Trenches

A diagonal trench, as seen in the lower panels of Figure 4, is a price war between symmetric or nearly symmetric firms. Extending along the entire diagonal of the state space, a diagonal trench has the curious feature that the firms compete fiercely—perhaps pricing below cost—even when they both have exhausted all gains from learning-by-doing. Part (i) of Proposition 5 rules out this type of behavior in the absence of organizational forgetting.

Like a well, a diagonal trench serves to build a competitive advantage. Unlike a well, a diagonal trench also serves to defend a competitive advantage, thereby rendering it (almost) permanent: The follower recognizes that to seize the leadership position, it would have to “cross over” the diagonal trench and struggle through another price war. Crucially, this price war is part of a MPE and, as such, a credible threat the follower cannot ignore.

The logic behind a diagonal trench has three parts. If a diagonal trench exists, then the follower does not contest the leadership position. If the follower does not contest the leadership position, then being the leader is valuable. Finally, to close the circle of logic, if being the leader is valuable, then firms price aggressively on the diagonal of the state space in a bid for the leadership position, thus giving rise to the diagonal trench. Table II illustrates this argument by providing details on firms’ competitive position in various states for our leading example of a trenchy equilibrium (ρ = 0.85, δ = 0.0275).
TABLE II
TRENCHY EQUILIBRIUM (ρ = 0.85, δ = 0.0275)

                    Leader                                 Follower
State      Cost    Prize   Price  Prob.  Value    Cost    Prize   Price  Prob.  Value
(21, 20)   5.30    3.53    5.57   0.72   21.91    5.30    0.14    6.54   0.28   19.56
(21, 21)   5.30    2.14    5.26   0.50   19.79    5.30    2.14    5.26   0.50   19.79
(22, 20)   5.30    3.22    6.44   0.76   23.98    5.30   −1.04    7.60   0.24   20.09
(28, 21)   5.30   −0.13    7.63   0.55   25.42    5.30   −0.71    7.81   0.45   22.37
(20, 20)   5.30    2.16    5.24   0.50   19.82    5.30    2.16    5.24   0.50   19.82

Part 1. Trench Sustains Leadership. To see why the follower does not contest the leadership position, consider a state such as (21, 20), where the follower has almost caught up with the leader.
Suppose the follower wins the current sale. In this case, the follower may leapfrog the leader if the industry moves against the odds to state (20, 21). However, the most likely possibility, with a probability of 0.32, is that the industry moves to state (21, 21). Due to the brutal price competition “in the trench,” the follower’s expected cash flow in the next period decreases to −0.02 = 0.50 × (5.26 − 5.30) compared to 0.34 = 0.28 × (6.54 − 5.30) if the industry had remained in state (21, 20). Suppose, in contrast, the leader wins. This completely avoids sparking a price war. Moreover, the most likely possibility, with a probability of 0.32, is that the leader enhances its competitive advantage by moving to state (22, 20). If so, the leader’s expected cash flow in the next period increases to 0.87 = 0.76 × (6.44 − 5.30) compared to 0.20 = 0.72 × (5.57 − 5.30) if the industry had remained in state (21, 20). Because winning the sale is more valuable to the leader than to the follower, the leader’s prize in state (21, 20) is almost 25 times larger than the follower’s and the leader underprices the follower. As a consequence, the leader defends its position with a substantial probability of 0.79. In other words, the diagonal trench sustains the leadership position.

Part 2. Leadership Generates Value. Because the leader underprices the follower, over time the industry moves from state (21, 20) to (or near) the modal state (28, 21) of the limiting distribution. Once there, the leader underprices the follower (7.63 versus 7.81) despite cost parity and thus enjoys a higher probability of making a sale (0.55 versus 0.45). The leader’s expected cash flow in the current period is, therefore, 0.55 × (7.63 − 5.30) = 1.27 as compared to the follower’s expected cash flow of 0.45 × (7.81 − 5.30) = 1.14. Because the follower does not contest the leadership position, the leader is likely to enjoy these additional profits for a long time (recall that τ∞ = 295). Hence, being the leader is valuable.

Part 3. Value Induces Trench. Because being the leader is valuable, firms price aggressively on the diagonal of the state space in a bid for the leadership position. The prize is 2.16 in state (20, 20) and 2.14 in state (21, 21), and justifies charging a price of 5.24 and 5.26, respectively, even though all gains from learning-by-doing have been exhausted. This gives rise to the diagonal trench. Observe that this argument applies at every state on the diagonal, because no matter where on the diagonal the firms happen to be, winning the current sale confers a lasting advantage. The trench therefore extends along the entire diagonal of the state space.

All this can be summed up in a sentence: Building a competitive advantage creates the diagonal trench that defends the advantage and creates the prize that makes it worthwhile to fight for the leadership position. A diagonal trench is thus a self-reinforcing mechanism for gaining and maintaining market dominance.
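The cash-flow comparisons in Part 1 are simple products of Table II entries; recomputing them makes the asymmetry of the incentives explicit (tiny discrepancies with the text reflect rounding of the table entries).

```python
# Expected next-period cash flows behind Part 1, from Table II (c = 5.30).
c = 5.30
cases = [
    ("follower if industry moves to (21, 21)", 0.50, 5.26),
    ("follower if industry stays at (21, 20)", 0.28, 6.54),
    ("leader if industry moves to (22, 20)",   0.76, 6.44),
    ("leader if industry stays at (21, 20)",   0.72, 5.57),
]
for label, prob, price in cases:
    print(f"{label}: {prob * (price - c):+.2f}")
# -0.02, +0.35, +0.87, +0.19 -- winning helps the leader, hurts the follower
```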
TABLE III
EXTRA-TRENCHY EQUILIBRIUM (ρ = 0.85, δ = 0.08)

                    Leader                                 Follower
State      Cost    Prize   Price  Prob.  Value    Cost    Prize   Price  Prob.  Value
(26, 1)    5.30    6.43    8.84   0.90   53.32   10.00    0.12   11.00   0.10    2.42
(26, 2)    5.30    6.21    7.48   0.88   46.31    8.50    0.21    9.44   0.12    2.55
(26, 3)    5.30    5.16    6.94   0.85   40.23    7.73    0.27    8.65   0.15    2.78
(26, 7)    5.30    3.04    6.14   0.73   24.76    6.34    0.58    7.15   0.27    4.16
(26, 8)    5.30    2.33    5.99   0.66   21.86    6.14    1.08    6.64   0.34    4.78
(26, 9)    5.30    1.17    6.24   0.51   19.84    5.97    1.71    6.29   0.49    6.07
(26, 10)   5.30    0.16    6.83   0.40   19.09    5.83    1.96    6.44   0.60    8.08
Sideways Trenches

A sideways trench, as seen in the lower right panel of Figure 4, is a price war between very asymmetric firms. This war is triggered when the follower starts to move down its learning curve. Table III provides details on firms’ competitive positions in various states for our leading example of an extra-trenchy equilibrium (ρ = 0.85, δ = 0.08). The sideways trench is evident in the decrease in the leader’s price from state (26, 1) to state (26, 8) and the increase from state (26, 8) to state (26, 10).

Note that the follower has little chance of making it down its learning curve as long as its probability of winning a sale is less than the probability of losing a unit of know-how through organizational forgetting. While D∗2(26, 1) = 0.10 > 0.08 = Δ(1), we have D∗2(26, 2) = 0.12 < 0.15 = Δ(2) and D∗2(26, 3) = 0.15 < 0.22 = Δ(3). Hence, the leader can stall the follower at the top of its learning curve and, indeed, the modal state of the limiting distribution is (26, 1).

The additional future profits stemming from the leader’s ability to stall the follower are the source of its large prize in state (26, 1). In state (26, 2), the prize is almost as large because, by winning a sale, the leader may move the industry back to state (26, 1) in the next period. The leader’s prize falls as the follower moves further down its learning curve, because it takes progressively longer for the leader to force the follower back up its learning curve and because the lower cost of the follower makes it harder for the leader to do so. In the unlikely event that the follower crashes through the sideways trench in state (26, 8), the leader’s prize falls sharply. At the same time, the follower’s prize rises sharply as it turns from a docile competitor into a viable threat.

A sideways trench, like a diagonal trench, is a self-reinforcing mechanism for gaining and maintaining market dominance, but a diagonal trench is about fighting an imminent threat, whereas a sideways trench is about fighting a distant threat. One can think of a sideways trench as an endogenously arising mobility barrier in the sense of Caves and Porter (1977) or the equilibrium manifestation of former Intel CEO Andy Grove’s dictum, “Only the paranoid survive.”
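The stalling condition is easy to reproduce. The forgetting probability Δ(e) = 1 − (1 − δ)^e (this functional form is inferred from the values quoted in the text, e.g., Δ(30) = 0.26 at δ = 0.01) can be compared directly with the follower’s win probabilities from Table III.

```python
# Stalling at the sideways trench (extra-trenchy equilibrium, delta = 0.08):
# the follower gains ground only if its win probability D2*(26, e2) exceeds
# its probability Delta(e2) of losing a unit of know-how.
delta = 0.08
Delta = lambda e: 1 - (1 - delta) ** e
for e2, win in [(1, 0.10), (2, 0.12), (3, 0.15)]:
    print(f"e2={e2}: D2*={win:.2f}, Delta={Delta(e2):.2f}, gains ground: {win > Delta(e2)}")
# e2=1: True (0.10 > 0.08); e2=2: False (0.12 < 0.15); e2=3: False (0.15 < 0.22)
```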
In sum, the four types of equilibria that we have identified in Section 4 give rise to distinct yet plausible pricing behaviors and industry dynamics. Rather than impeding aggressive behavior, organizational forgetting facilitates it. In its absence, the equilibria are flat, either with or without a well, depending on the progress ratio. Generally speaking, organizational forgetting is associated with “trenchier” equilibria, more aggressive behavior, and more concentrated industries both in the short run and in the long run.

6.3. Dominance Properties

Traditional intuition suggests that learning-by-doing leads by itself to market dominance by giving a more experienced firm the ability to profitably underprice its less experienced rival. This enables the leader to widen its competitive advantage over time, thereby further enhancing its ability to profitably underprice the follower. C–R formalized this idea with “two concepts of self-reinforcing market dominance” (p. 1115): An equilibrium exhibits increasing dominance (ID) if p∗(e) − p∗(e[2]) < 0 whenever e1 > e2 and exhibits increasing increasing dominance (IID) if p∗(e) − p∗(e[2]) is decreasing in e1. If ID holds, the leader charges a lower price than the follower and therefore enjoys a higher probability of making a sale. If IID holds, the price gap between the firms widens with the length of the lead.³⁰

In the absence of organizational forgetting, Theorem 3.3 in C–R shows that ID and IID hold provided that the discount factor β is sufficiently close to 1 (alternatively, close to 0; see Theorem 3.1 in C–R). Our computations show that β = 1/1.05 in our baseline parameterization suffices.

RESULT 3: If organizational forgetting is absent (δ = 0), then IID holds. Thus, ID holds.

Even if an equilibrium satisfies ID and IID, it is not clear that the industry is inevitably progressing toward monopolization. If the price gap between the firms is small, then the impact of ID and IID on industry structure and dynamics may be trivial.³¹ In such a scenario, the leader charges a slightly lower price than the follower and this gap widens a bit over time.

³⁰Athey and Schmutzler’s (2001) notion of weak increasing dominance describes the relationship between players’ states and their actions in dynamic games with deterministic state-to-state transitions and coincides with the notion of ID in C–R. Similar notions also have been used by Vickers (1986) and Budd, Harris, and Vickers (1993) in dynamic investment games.

³¹Indeed, C–R showed in their Theorem 3.2 that p∗(e) → p†(m, m) for all e ∈ {1, …, M}² as β → 1, that is, both firms price as if at the bottom of their learning curves. This suggests that the price gap may be small for “reasonable” discount factors.
However, with even a modest degree of horizontal product differentiation, the firms still split sales more or less equally and thus move down the learning curve in tandem. This is exactly what happens when δ = 0. For example, the flat equilibrium without well (ρ = 0.85, δ = 0) satisfies IID and thus ID, but with a maximum expected Herfindahl index of 0.52 the industry is essentially a symmetric duopoly at all times. More generally, as Figure 3 shows, in the absence of organizational forgetting, asymmetries are modest if they arise at all. Although ID and IID hold, the maximum expected Herfindahl index across all equilibria is 0.67 (attained at ρ = 0.65). Hence, ID and IID are not sufficient for economically meaningful market dominance.

ID and IID are also not necessary for market dominance. The extra-trenchy equilibrium (ρ = 0.85, δ = 0.08), for example, violates ID and thus IID. Yet, the industry is likely to be a near monopoly at all times. More generally, while the empirical studies of Argote, Beckman, and Epple (1990), Darr, Argote, and Epple (1995), Benkard (2000), Shafer, Nembhard, and Uzumeri (2001), and Thompson (2003) warrant accounting for organizational forgetting in a model of learning-by-doing, doing so may cause ID and IID to fail.

PROPOSITION 8: If organizational forgetting is present (δ > 0), then IID may fail. Also ID may fail.

In the absence of organizational forgetting, C–R have already shown that ID and IID may fail for intermediate values of β (see their Remark C.5 on p. 1136). Result 3 and Proposition 8 make the comparative dynamics point that ID and IID may hold when δ = 0 but fail when δ > 0 (holding fixed the remaining parameters). Figure 10 illustrates Proposition 8 by plotting the share of equilibria that violate IID (upper panel) and ID (lower panel). As can be seen, all equilibria fail to obey IID unless forgetting or learning is very weak. Even violations of ID are extremely common, especially for forgetting rates δ in the empirically relevant range below 0.1. Of course, we do not argue that the concepts of ID and IID have no place in the analysis of industry dynamics. Caution, however, is advisable. Since ID and IID are neither necessary nor sufficient for market dominance, making inferences about the evolution of the industry on their basis alone may be misleading.

6.4. Summary

Table IV summarizes the broad patterns of pricing behavior and industry dynamics. Acknowledging that the know-how gained through learning-by-doing can be lost through organizational forgetting is important. Generally speaking, organizational forgetting is associated with “trenchier” equilibria, more aggressive behavior, and more concentrated industries both in the short run and in the long run. Moreover, the dominance properties of firms’ pricing behavior can break down in the presence of organizational forgetting.
FIGURE 10.—Share of equilibria that violate IID (upper panel) and share of equilibria that violate ID (lower panel).
TABLE IV
PRICING BEHAVIOR AND INDUSTRY DYNAMICS

                                          Flat Eqbm.     Flat Eqbm.    Trenchy       Extra-trenchy
                                          Without Well   With Well     Eqbm.         Eqbm.
Leading example                           ρ = 0.85,      ρ = 0.85,     ρ = 0.85,     ρ = 0.85,
                                          δ = 0          δ = 0.0275    δ = 0.0275    δ = 0.08
Preemption battle (well)                  no             yes           no            no
Price war triggered by imminent
  threat (diagonal trench)                no             no            yes           yes
Price war triggered by distant
  threat (sideways trench)                no             no            no            yes
Short-run market dominance                no             yes           yes           yes
Long-run market dominance                 no             no            yes, modest   yes, extreme
Dominance properties                      yes            no, mostly    no, mostly    no, mostly
The key difference between a model with and without organizational forgetting is that, in the former, a firm can move both forward to a higher state and backward to a lower state. This possibility of bidirectional movement enhances the advantage-building and advantage-defending motives. By winning a sale, a firm makes itself less, and its rival more, vulnerable to organizational forgetting. This can create strong incentives to cut prices. Rather than impeding it, organizational forgetting therefore facilitates aggressive pricing, as manifested in the trenchy and extra-trenchy equilibria that we have identified.

7. ORGANIZATIONAL FORGETTING AND MULTIPLE EQUILIBRIA

While the equilibrium is unique if organizational forgetting is either absent (δ = 0) or certain (δ = 1), multiple equilibria are common for intermediate degrees of forgetting. Surprisingly, for some values of ρ and δ, the equilibria range from “peaceful coexistence” to “trench warfare.” Consequently, in addition to the primitives of learning-by-doing and organizational forgetting, the equilibrium by itself is an important determinant of pricing behavior and industry dynamics.

Why do multiple equilibria arise in our model? To explore this question, think about the strategic situation faced by firms in setting prices in state e. The value of continued play to firm n is given by the conditional expectations of its value function, Vn1(e) and Vn2(e), as defined in equations (2) and (3). Holding the value of continued play fixed, the strategic situation in state e is akin to a static game. If the reaction functions in this game intersect more than once, then multiple equilibria arise. On the other hand, if they intersect only
once, irrespective of the value of continued play, then we say that the model satisfies statewise uniqueness.

PROPOSITION 9: Statewise uniqueness holds.

Not surprisingly, the proof of Proposition 9 relies on the functional form of demand. This is reminiscent of the restrictions on demand (e.g., log concavity) that Caplin and Nalebuff (1991) set forth to guarantee uniqueness of Nash equilibrium in their analysis of static price-setting games.

Given that the model satisfies statewise uniqueness, multiple equilibria must arise from firms’ expectations regarding the value of continued play. To see this, consider again state e. The intersection of the reaction functions constitutes a Nash equilibrium in prices in a subgame in which firm n believes that its value of continued play is given by Vn1(e) and Vn2(e). If firms have rational expectations, that is, if the conjectured value of continued play is actually attained, then these prices constitute an equilibrium of our dynamic stochastic game. In our model, taking the value of continued play as given, the reaction functions intersect only once because we have statewise uniqueness, but there may be more than one value of continued play that is consistent with rational expectations. In this sense, multiplicity is rooted in the dynamics of the model.

The key driver of multiplicity is organizational forgetting. Dynamic competition with learning and forgetting is like racing down an upward-moving escalator. Unless a firm makes sales at a rate that exceeds the rate at which it loses know-how through forgetting, its marginal cost is bound to increase. The inflow of know-how to the industry is one unit per period, whereas in expectation the outflow in state e is Δ(e1) + Δ(e2). Consider state (e, e), where e ≥ m, on the diagonal of the state space at or beyond the bottom of the learning curve. If 1 < 2Δ(e), then it is impossible that both firms reach the bottom of their learning curves and remain there. Knowing this, firms have no choice but to price aggressively. The result is trench warfare as each firm uses price cuts to push the state to its side of the diagonal and keep it there. If, however, 1 > 2Δ(e), then it is virtually inevitable that both firms reach the bottom of their learning curves, and firms may as well price softly. In both cases, the primitives of the model tie down the equilibrium. This is no longer the case if 1 ≈ 2Δ(e), setting the stage for multiple equilibria as diverse as peaceful coexistence and trench warfare.

If firms believe that they cannot peacefully coexist at the bottom of their learning curves and that one firm will come to dominate the market, then both firms will cut their prices in the hope of acquiring a competitive advantage early on and maintaining it throughout. This naturally leads to trench warfare and market dominance. If, however, firms believe that they can peacefully coexist at the bottom of their learning curves, then neither firm cuts its price. Soft pricing, in turn, ensures that the anticipated symmetric industry structure actually emerges. A back-of-the-envelope calculation is reassuring here. Recall that m = 15 and M = 30 in our baseline parameterization, and observe that 1 = 2Δ(15) implies δ ≈ 0.045, 1 = 2Δ(20) implies δ ≈ 0.034, and 1 = 2Δ(30) implies δ ≈ 0.023. This range of forgetting rates, for which the inflow of know-how approximately equals the outflow, is indeed where multiplicity prevails (see again Figure 2).
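The threshold forgetting rates follow from solving 1 = 2Δ(e) with Δ(e) = 1 − (1 − δ)^e (the functional form inferred earlier), which gives δ = 1 − 0.5^(1/e).

```python
# Forgetting rates at which the expected outflow 2*Delta(e) equals the
# one-unit inflow of know-how per period: 1 = 2*Delta(e) <=> delta = 1 - 0.5**(1/e).
for e in (15, 20, 30):
    print(e, round(1 - 0.5 ** (1 / e), 3))
# 15 -> 0.045, 20 -> 0.034, 30 -> 0.023, as in the back-of-the-envelope calculation
```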
Perhaps the starkest example of multiplicity arising from firms’ expectations about each other is the sunspot equilibria discussed in Section 4. The gains from learning-by-doing are absent (ρ = 1) and, consequently, the state variables are payoff-irrelevant in the sense of Maskin and Tirole (2001). One equilibrium is for firms to set the same price as in the Nash equilibrium of a static price-setting game. In addition to this flat equilibrium, there can also be a sunspot in the form of a trenchy equilibrium where the leader charges a lower price than the follower. The leader correctly anticipates that the follower will back down and charge a higher price, thus allowing the leader to enjoy a higher probability of making a sale. This equilibrium can be sustained because the state carries enough history for punishments to be effectively administered.³²

³²In their discussion of predatory pricing, C–R contrasted an equilibrium in which neither firm exits the industry with another equilibrium in which the firm losing the first sale exits. C–R called this latter case a “bootstrap” equilibrium because it is sustained by the leader correctly anticipating that by pricing aggressively, it will drive the follower out of the industry. Our sunspots are similar in that they—like any equilibrium—rely on self-fulfilling expectations, but different in that they do not rely on payoff-relevant state variables.

A sufficient condition for uniqueness of equilibrium in a dynamic stochastic game with a finite state space is that the model satisfies statewise uniqueness and that movements through the state space are unidirectional. Statewise uniqueness precludes players’ actions from giving rise to multiple equilibria, and unidirectional movements preclude their expectations from doing so. The proof of Proposition 3 illustrates the power of this sufficient condition. Specifically, if δ = 0 in our game, then a firm can never move backward to a lower state. Hence, once the industry reaches state (M, M), it remains there forever, so that the value of future play in state (M, M) coincides with the value of being in this state ad infinitum. In conjunction with statewise uniqueness, this uniquely determines the value of being in state (M, M). Next consider states (M − 1, M) and (M, M − 1). The value of future play in states (M − 1, M) and (M, M − 1) depends on the value of being in state (M, M). Statewise uniqueness ensures that firms’ prices in states (M − 1, M) and (M, M − 1) as well as the value of being in these states are uniquely determined. Continuing to work backward establishes that the equilibrium is unique.

8. ROBUSTNESS CHECKS

We have conducted extensive robustness checks regarding our specification of the discount factor, demand (product differentiation, outside good, and choke price), learning-by-doing, and entry and exit. In the interest of brevity,
we confine ourselves here to pointing out if and how our results regarding aggressive pricing behavior (wells and trenches), market dominance (persistent asymmetries), and multiple equilibria change with the specification of the model. A more detailed discussion can be found in the online Appendix along with some further checks (frequency of sales, organizational forgetting) that we omit here.

8.1. Discount Factor

Two extreme cases merit discussion. First, as β → 0 and firms become more myopic, the wells and trenches vanish and we obtain a flat equilibrium without well. In the limit of β = 0, equation (20) implies that the equilibrium of our dynamic stochastic game reduces to the static Nash equilibrium irrespective of the forgetting rate δ. Second, as β → 1, the wells and trenches deepen: More patient firms have a stronger incentive to cut prices in the present so as to seize the leadership position in the future. In addition to the four typical cases in Figure 4, we obtain other types of equilibria with more complex patterns of trenches. The fact that high discount factors exacerbate the multiplicity problem is hardly surprising in light of the folk theorems for repeated games (Friedman (1971), Rubinstein (1979), Fudenberg and Maskin (1986)).

8.2. Demand

Product Differentiation

A higher degree of horizontal product differentiation σ lowers the extent to which firms interact strategically with each other. As σ → ∞, firms eventually become monopolists and have no incentive to cut prices to acquire or defend a competitive advantage. As a result, the equilibrium is unique and the industry evolves symmetrically.

Outside Good

As in C–R, we assume that the buyer always purchases from one of the two firms in the industry. Allowing the buyer to choose, instead, an alternative made from a substitute technology (outside good) implies that the price elasticity of aggregate demand for the two competing firms is no longer zero. If the outside good is made sufficiently attractive, then in state (1, 1) the probability that either firm wins the one unit of demand that is available each period becomes small unless they price aggressively—below marginal cost—against the outside good. Making it down from the top of its learning curve consequently requires a firm to incur substantial losses in the near term. In the long term, however, fighting one’s way down the learning curve has substantial rewards, because the outside good is a much less formidable competitor to a firm at the bottom of its learning curve than to a firm at the top.
If the discount factor is held fixed at its baseline value, then even a moderately attractive outside good sufficiently constrains firms’ pricing behavior so that we no longer have sunspots for a progress ratio of ρ = 1. If we further increase the attractiveness of the outside good, then the rewards from fighting one’s way down the learning curve become too far off in the future to justify the required aggressive pricing with its attendant near-term losses. In the ensuing equilibrium, the price that a firm charges is fairly insensitive to its rival’s stock of know-how, because the outside good is the firm’s main competitor and wins the sale in most periods. As a result, the inflow of know-how to the industry through learning is much smaller than the outflow through forgetting. This implies that the equilibrium is unique and entails both firms being stuck at the top of their learning curves. Trenchy and extra-trenchy equilibria disappear.

If the discount factor is increased as the outside good is made increasingly attractive, then the near-term losses of fighting one’s way down the learning curve do not overwhelm the long-term rewards from doing so. Firms price aggressively in trenchy and extra-trenchy equilibria. Therefore, provided the discount factor is sufficiently close to one, the presence of an economically significant price elasticity of aggregate demand does not seem to change the variety and multiplicity of equilibria in any fundamental way.

Choke Price

As in C–R, our logit specification for demand ensures that a firm always has a positive probability of making a sale and, in the absence of forgetting, must therefore eventually reach the bottom of its learning curve. This precludes the occurrence of long-run market dominance in the absence of organizational forgetting. Suppose instead that the probability that firm n makes a sale is given by a linear specification. Due to the choke price in the linear specification, a firm is surely able to deny its rival a sale by pricing sufficiently aggressively. Given a sufficiently low degree of horizontal product differentiation, firms at the top of their learning curves fight a preemption battle. The industry remains in an asymmetric structure as the winning firm takes advantage of the choke price to stall the losing firm at the top of its learning curve. In other words, the choke price is a shut-out model element that can lead to persistent asymmetries even in the absence of organizational forgetting.

8.3. Learning-by-Doing

Following C–R, we assume that m < M represents the stock of know-how at which a firm reaches the bottom of its learning curve. In a bottomless learning specification with m = M, we obtain other types of equilibria in addition to the four typical cases in Figure 4. Particularly striking is the plateau equilibrium. This equilibrium is similar to a trenchy equilibrium except that the diagonal trench is interrupted by a region of very soft price competition. On this plateau,
both firms charge prices well above cost. This “cooperative” behavior contrasts markedly with the price war of the diagonal trench.

8.4. Entry and Exit

We assume that at any point in time there is a total of N firms, each of which can be either an incumbent firm or a potential entrant. Once an incumbent firm exits the industry, it perishes, and a potential entrant automatically takes its “slot” and has to decide whether or not to enter. Organizational forgetting remains a source of aggressive pricing behavior, market dominance, and multiple equilibria in the general model with entry and exit. The possibility of exit adds another component to the prize from winning a sale, because by winning a sale, a firm may move the industry to a state in which its rival is likely to exit. But if the rival exits, then it may be replaced by an entrant that comes into the industry at the top of its learning curve or it may not be replaced at all. As a result, pricing behavior is more aggressive than in the basic model without entry and exit. This leads to more pronounced asymmetries both in the short run and in the long run. Because entry and exit are shut-out model elements, asymmetries can arise and persist even in the absence of organizational forgetting (see the online Appendix for a concrete example). Entry and exit may also give rise to multiple equilibria, as C–R have already shown (see their Theorem 4.1).

9. CONCLUSIONS

Learning-by-doing and organizational forgetting are empirically important in a variety of industrial settings. This paper provides a general model of dynamic competition that accounts for these economic fundamentals and shows how they shape industry structure and dynamics. We contribute to the numerical analysis of industry dynamics in two ways. First, we show that there are equilibria that the P–M algorithm cannot compute. Second, we propose a homotopy algorithm that allows us to describe in detail the structure of the set of solutions to our model.

In contrast to the present paper, the theoretical literature on learning-by-doing has largely ignored organizational forgetting. Moreover, it has mainly focused on the dominance properties of firms’ pricing behavior. By directly examining industry dynamics, we are able to show that ID and IID may not be sufficient for economically meaningful market dominance. By generalizing the existing models of learning, we are able to show that these dominance properties break down with even a small degree of forgetting. Yet, it is precisely in the presence of organizational forgetting that market dominance ensues, both in the short run and in the long run.
Our analysis of the role of organizational forgetting reveals that learning and forgetting are distinct economic forces. Forgetting, in particular, does not simply negate learning. The unique role played by organizational forgetting comes about because it makes bidirectional movements through the state space possible. As a consequence, a model with forgetting can give rise to aggressive pricing behavior, varying degrees of long-run industry concentration ranging from moderate leadership to absolute dominance, and multiple equilibria.

Diagonal and sideways trenches are part and parcel of the self-reinforcing mechanisms that lead to market dominance. Since the leadership position is aggressively defended, firms fight a price war to attain it. This provides all the more reason to aggressively defend the leadership position, because if it is lost, then another price war ensues. This seems like a good story to tell. Our computations show that this is not just an intuitively sensible story, but also a logically consistent one that—perhaps—plays out in real markets.

APPENDIX

PROOF OF PROPOSITION 1: Part (i). The basic differential equations (13) set

δ′(s) = det(∂F(x(s); δ(s), ρ)/∂x).

The Jacobian ∂F(x(s); δ(s), ρ)/∂x is a (2M² × 2M²) matrix and therefore has an even number of eigenvalues. Its determinant is the product of its eigenvalues. Hence, if δ′(s) ≤ 0, then there exists at least one real nonnegative eigenvalue. (Suppose to the contrary that all eigenvalues are either complex or real and negative. Since the number of complex eigenvalues is even, so is the number of real eigenvalues. Moreover, the product of a conjugate pair of complex eigenvalues is positive, as is the product of an even number of real negative eigenvalues.)

To relate the P–M algorithm to our homotopy algorithm, let (x(s), δ(s)) ∈ F⁻¹(ρ) be a parametric path of equilibria. We show in the online Appendix that

(21) ∂G(x(s))/∂x |(δ(s),ρ) = ∂F(x(s); δ(s), ρ)/∂x + I,

where I denotes the (2M² × 2M²) identity matrix. The proof is completed by recalling a basic result from linear algebra: Let A be an arbitrary matrix and let ς(A) be its spectrum. Then ς(A + I) = ς(A) + 1 (see Proposition A.17 in Appendix A of Bertsekas and Tsitsiklis (1997)).
Hence, because ∂F(x(s); δ(s), ρ)/∂x has at least one real nonnegative eigenvalue, it follows from equation (21) that ∂G(x(s))/∂x |_(δ(s),ρ) has at least one real eigenvalue equal to or greater than unity. Hence, the spectral radius of ∂G(x(s))/∂x |_(δ(s),ρ) is greater than or equal to unity.

Part (ii). Consider the iteration x_{k+1} = G̃(x_k) = ωG(x_k) + (1 − ω)x_k, where ω > 0. Using equation (21), its Jacobian at (x(s), δ(s)) ∈ F⁻¹(ρ) is

∂G̃(x(s))/∂x |_(δ(s),ρ) = ω ∂G(x(s))/∂x |_(δ(s),ρ) + (1 − ω)I = ω ∂F(x(s); δ(s), ρ)/∂x + I.

As before, it follows that the spectral radius of ∂G̃(x(s))/∂x |_(δ(s),ρ) is greater than or equal to unity.
Q.E.D.
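To see the mechanics of this argument numerically, here is a minimal sketch (ours, not part of the paper): a stand-in matrix with a known, hypothetical spectrum plays the role of the Jacobian ∂F/∂x, and the script checks both the spectrum-shift identity ς(A + I) = ς(A) + 1 and the part (ii) conclusion that dampening cannot push the spectral radius below one once ∂F/∂x has a real nonnegative eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a matrix with a known spectrum, playing the role of the Jacobian
# dF/dx in part (i): one real nonnegative eigenvalue (0.7), several real
# negative ones, and two complex-conjugate pairs (as 2x2 blocks).
blocks = [[[0.7]], [[-0.4]], [[-1.2]], [[-0.9]],
          [[-0.3, -0.8], [0.8, -0.3]],
          [[0.2, -1.1], [1.1, 0.2]]]
n = sum(len(b) for b in blocks)  # 8, a small stand-in for 2M^2
D = np.zeros((n, n))
i = 0
for b in blocks:
    k = len(b)
    D[i:i + k, i:i + k] = b
    i += k
V = rng.normal(size=(n, n))
A = V @ D @ np.linalg.inv(V)  # similar to D, hence same eigenvalues

# The linear-algebra fact used to finish part (i): the spectrum of A + I
# is the spectrum of A shifted by one.
eA = np.sort_complex(np.linalg.eigvals(A))
eAI = np.sort_complex(np.linalg.eigvals(A + np.eye(n)))
print("spectrum shift holds:", np.allclose(eAI, eA + 1))

# Part (ii): dampening cannot help. For any omega > 0, the eigenvalue
# omega*0.7 + 1 of omega*A + I keeps the spectral radius at or above one.
for omega in (0.05, 0.5, 1.0):
    radius = np.abs(np.linalg.eigvals(omega * A + np.eye(n))).max()
    print(f"omega = {omega}: spectral radius = {radius:.3f}")
```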
PROOF OF PROPOSITION 3: We rewrite the Bellman equations and FOCs in state e as

(22)  V1 = D1(p1, p2)(p1 − c(e1) + β(V̄11 − V̄12)) + βV̄12,

(23)  V2 = D2(p1, p2)(p2 − c(e2) + β(V̄22 − V̄21)) + βV̄21,

(24)  0 = σ/D2(p1, p2) − (p1 − c(e1) + β(V̄11 − V̄12)),

(25)  0 = σ/D1(p1, p2) − (p2 − c(e2) + β(V̄22 − V̄21)),

where, to simplify the notation, Vn is shorthand for Vn(e), V̄nk is shorthand for V̄nk(e), pn is shorthand for pn(e), etcetera, and we use the fact that D1(p1, p2) + D2(p1, p2) = 1.

Case (i). First suppose δ = 0. The proof proceeds in a number of steps. In Step 1, we establish that the equilibrium in state (M, M) is unique. In Step 2a, we assume that there is a unique equilibrium in state (e1 + 1, M), where e1 ∈ {1, …, M − 1}, and show that this implies that the equilibrium in state (e1, M) is unique. In Step 2b, we assume that there is a unique equilibrium in state (M, e2 + 1), where e2 ∈ {1, …, M − 1}, and show that this implies that the equilibrium in state (M, e2) is unique. By induction, Steps 1, 2a, and 2b establish uniqueness along the upper edge of the state space. In Step 3, we assume that there is a unique equilibrium in states (e1 + 1, e2) and (e1, e2 + 1), where e1 ∈ {1, …, M − 1} and e2 ∈ {1, …, M − 1}, and show that this implies that the equilibrium in state (e1, e2) is unique. Hence, uniqueness in state (M − 1, M − 1) follows from uniqueness in states (M, M − 1) and (M − 1, M), uniqueness in state (M − 2, M − 1) follows from uniqueness in states (M − 1, M − 1) and (M − 2, M), etcetera. Working backward gives uniqueness in states (e1, M − 1), where e1 ∈ {1, …, M − 1}. This, in turn, gives uniqueness in states (e1, M − 2), where e1 ∈ {1, …, M − 1}, and so on.
Step 1. Consider state e = (M, M). From the definition of the state-to-state transitions in Section 2, we have V̄11 = V̄12 = V1 and V̄21 = V̄22 = V2. Imposing these restrictions and solving equations (22) and (23) for V1 and V2, respectively, yields

(26)  V1 = D1(p1, p2)(p1 − c(e1))/(1 − β),

(27)  V2 = D2(p1, p2)(p2 − c(e2))/(1 − β).

Simplifying equations (24) and (25) yields

(28)  0 = σ/D2(p1, p2) − (p1 − c(e1)) = F1(p1, p2),

(29)  0 = σ/D1(p1, p2) − (p2 − c(e2)) = F2(p1, p2).

The system of equations (28) and (29) determines equilibrium prices. Once we have established that there is a unique solution for p1 and p2, equations (26) and (27) immediately ascertain that V1 and V2 are unique. Let p1(p2) and p2(p1) be defined by

F1(p1(p2), p2) = 0,  F2(p1, p2(p1)) = 0,

and set F(p1) = p1 − p1(p2(p1)). The p1 that solves the system of equations (28) and (29) is the solution to F(p1) = 0, and this solution is unique provided that F(p1) is strictly monotone. The implicit function theorem yields

F′(p1) = 1 − (−(∂F1/∂p2)/(∂F1/∂p1)) · (−(∂F2/∂p1)/(∂F2/∂p2)).

Straightforward differentiation shows that

−(∂F1/∂p2)/(∂F1/∂p1) = (−D1(p1, p2)/D2(p1, p2))/(−1/D2(p1, p2)) = D1(p1, p2) ∈ (0, 1),

−(∂F2/∂p1)/(∂F2/∂p2) = (−D2(p1, p2)/D1(p1, p2))/(−1/D1(p1, p2)) = D2(p1, p2) ∈ (0, 1).
It follows that F′(p1) > 0.

Step 2a. Consider state e = (e1, M), where e1 ∈ {1, …, M − 1}. We have V̄12 = V1 and V̄22 = V2. Imposing these restrictions and solving equations (22) and (23) for V1 and V2, respectively, yields

(30)  V1 = D1(p1, p2)(p1 − c(e1) + βV̄11)/(1 − βD2(p1, p2)),

(31)  V2 = [D2(p1, p2)(p2 − c(e2) − βV̄21) + βV̄21]/(1 − βD2(p1, p2)).
Substituting equations (30) and (31) into equations (24) and (25), and dividing through by (1 − β)/(1 − βD2(p1, p2)) and 1/(1 − βD2(p1, p2)), respectively, yields

(32)  0 = (1 − βD2(p1, p2))σ/[(1 − β)D2(p1, p2)] − (p1 − c(e1) + βV̄11) = G1(p1, p2),

(33)  0 = (1 − βD2(p1, p2))σ/D1(p1, p2) − (p2 − c(e2) − β(1 − β)V̄21) = G2(p1, p2).

The system of equations (32) and (33) determines equilibrium prices as a function of V̄11 and V̄21. These are given by V1(e1 + 1, M) and V2(e1 + 1, M), respectively, and are unique by hypothesis. As in Step 1, once we have established that there is a unique solution for p1 and p2, equations (30) and (31) immediately ascertain that, in state e = (e1, M), V1 and V2 are unique. Proceeding as in Step 1, set G(p1) = p1 − p1(p2(p1)), where p1(p2) and p2(p1) are defined by G1(p1(p2), p2) = 0 and G2(p1, p2(p1)) = 0, respectively. We have to show that G(·) is strictly monotone. Straightforward differentiation shows that

−(∂G1/∂p2)/(∂G1/∂p1) = [−D1(p1, p2)/((1 − β)D2(p1, p2))]/[−(1 − βD2(p1, p2))/((1 − β)D2(p1, p2))] = D1(p1, p2)/(1 − βD2(p1, p2)) ∈ (0, 1),

−(∂G2/∂p1)/(∂G2/∂p2) = [−(1 − β)D2(p1, p2)/D1(p1, p2)]/[−(1 − βD2(p1, p2))/D1(p1, p2)] = (1 − β)D2(p1, p2)/(1 − βD2(p1, p2)) ∈ (0, 1).

It follows that G′(p1) > 0.
Step 2b. Consider state e = (M, e2), where e2 ∈ {1, …, M − 1}. We have V̄11 = V1 and V̄21 = V2. The argument is completely symmetric to the argument in Step 2a and therefore is omitted.

Step 3. Consider state e = (e1, e2), where e1 ∈ {1, …, M − 1} and e2 ∈ {1, …, M − 1}. The system of equations (24) and (25) determines equilibrium prices as a function of V̄11, V̄12, V̄21, and V̄22. These are given by V1(e1 + 1, e2), V1(e1, e2 + 1), V2(e1 + 1, e2), and V2(e1, e2 + 1), respectively, and are unique by hypothesis. As in Step 1, once we have established that there is a unique solution for p1 and p2, equations (22) and (23) immediately ascertain that, in state e = (e1, e2), V1 and V2 are unique. Let H1(p1, p2) and H2(p1, p2) denote the right-hand side of equations (24) and (25), respectively. Proceeding as in Step 1, set H(p1) = p1 − p1(p2(p1)), where p1(p2) and p2(p1) are defined by H1(p1(p2), p2) = 0 and H2(p1, p2(p1)) = 0, respectively. We have to show that H(·) is strictly monotone. Straightforward differentiation shows that

−(∂H1/∂p2)/(∂H1/∂p1) = (−D1(p1, p2)/D2(p1, p2))/(−1/D2(p1, p2)) = D1(p1, p2) ∈ (0, 1),

−(∂H2/∂p1)/(∂H2/∂p2) = (−D2(p1, p2)/D1(p1, p2))/(−1/D1(p1, p2)) = D2(p1, p2) ∈ (0, 1).

It follows that H′(p1) > 0.

Case (ii). Next suppose δ = 1. A similar induction argument as in the case of δ = 0 can be used to establish the claim except that in the case of δ = 1 we anchor the argument in state (1, 1) rather than state (M, M). Q.E.D.

PROOF OF PROPOSITION 5: Part (i). Consider the static Nash equilibrium. The FOCs in state e are

(34)  p†1(e) = c(e1) + σ/(1 − D1(p†1(e), p†2(e))),

(35)  p†2(e) = c(e2) + σ/(1 − D2(p†1(e), p†2(e))).

Equations (34) and (35) imply p†1(e) > c(e1) and p†2(e) > c(e2), and thus, in particular, p†(m, m) > c(m). In addition, p†(e) = p†(m, m) because c(e1) = c(e2) = c(m) for all e ∈ {m, …, M}².
Turning to our dynamic stochastic game, suppose that δ = 0. The proof of part (i) proceeds in a number of steps, similar to the proof of Proposition 3. In Step 1, we establish that equilibrium prices in state (M, M) coincide with the static Nash equilibrium. In Step 2a, we assume that the equilibrium in state (e1 + 1, M), where e1 ∈ {m, …, M − 1}, coincides with the equilibrium in state (M, M) and show that this implies that the equilibrium in state (e1, M) does the same. In Step 2b, we assume that the equilibrium in state (M, e2 + 1), where e2 ∈ {m, …, M − 1}, coincides with the equilibrium in state (M, M) and show that this implies that the equilibrium in state (M, e2) does the same. In Step 3, we assume that the equilibrium in states (e1 + 1, e2) and (e1, e2 + 1), where e1 ∈ {m, …, M − 1} and e2 ∈ {m, …, M − 1}, coincides with the equilibrium in state (M, M) and show that this implies that the equilibrium in state (e1, e2) does the same. Also similar to the proof of Proposition 3, we continue to use Vn as shorthand for Vn(e), V̄nk as shorthand for V̄nk(e), pn as shorthand for pn(e), etcetera.

Step 1. Consider state e = (M, M). From the proof of Proposition 3, equilibrium prices are determined by the system of equations (28) and (29). Since equations (28) and (29) are equivalent to equations (34) and (35), equilibrium prices are p1 = p†1 and p2 = p†2. Substituting equation (28) into (26) and equation (29) into (27) yields equilibrium values

(36)  V1 = σD1(p1, p2)/[(1 − β)D2(p1, p2)],

(37)  V2 = σD2(p1, p2)/[(1 − β)D1(p1, p2)].

Step 2a. Consider state e = (e1, M), where e1 ∈ {m, …, M − 1}. Equilibrium prices are determined by the system of equations (32) and (33). Given V̄11 = V1(e1 + 1, M) = V1(M, M) and V̄21 = V2(e1 + 1, M) = V2(M, M), it is easy to see that, in state e = (e1, M), p1 = p1(M, M) and p2 = p2(M, M) are a solution. Substituting equation (32) into (30) and equation (33) into (31) yields equilibrium values V1 = V1(M, M) and V2 = V2(M, M) as given by equations (36) and (37).

Step 2b. Consider state e = (M, e2), where e2 ∈ {m, …, M − 1}. The argument is completely symmetric to the argument in Step 2a and therefore is omitted.

Step 3. Consider state e = (e1, e2), where e1 ∈ {m, …, M − 1} and e2 ∈ {m, …, M − 1}. Equilibrium prices are determined by the system of equations (24) and (25). Given V̄11 = V1(e1 + 1, e2) = V1(M, M), V̄12 = V1(e1, e2 + 1) = V1(M, M), V̄21 = V2(e1 + 1, e2) = V2(M, M), and V̄22 = V2(e1, e2 + 1) = V2(M, M), it is easy to see that, in state e = (e1, e2), p1 = p1(M, M) and p2 = p2(M, M) are a solution. Substituting equation (24) into (22) and equation (25) into (23) yields equilibrium values V1 = V1(M, M) and V2 = V2(M, M) as given by equations (36) and (37).
Part (ii). We show that p2(e) > c(m) for all e1 ∈ {1, …, m − 1} and e2 ∈ {m, …, M}. The claim follows because p∗(e) = p2(e[2]). The proof of part (ii) proceeds in two steps. In Step 1, we establish that the equilibrium price of firm 2 in state (e1, M), where e1 ∈ {1, …, m − 1}, exceeds c(m). In Step 2, we extend the argument to states in which firm 2 has not yet reached the bottom of its learning curve. We proceed by induction: Assuming that the equilibrium in state (e1, e2 + 1), where e1 ∈ {1, …, m − 1} and e2 ∈ {m, …, M − 1}, coincides with the equilibrium in state (e1, M), we show that the equilibrium in state (e1, e2) does the same.

Step 1. Consider state e = (e1, M), where e1 ∈ {1, …, m − 1}. From the proof of Proposition 3, equilibrium prices are determined by the system of equations (32) and (33). In equilibrium, we must have Vn(e) ≥ 0 for all e ∈ {1, …, M}² because a firm can always set price equal to cost. Hence, V̄21 ≥ 0 and equation (33) implies p2 > c(m). For reference in Step 2, note that substituting equation (32) into (30) and equation (33) into (31) yields equilibrium values

(38)  V1 = σD1(p1, p2)/[(1 − β)D2(p1, p2)],

(39)  V2 = [σD2(p1, p2) + βD1(p1, p2)V̄21]/D1(p1, p2).

Step 2. Consider state e = (e1, e2), where e1 ∈ {1, …, m − 1} and e2 ∈ {m, …, M − 1}. Equilibrium prices are determined by the system of equations (24) and (25). Assuming V̄12 = V1(e1, e2 + 1) = V1(e1, M) and V̄22 = V2(e1, e2 + 1) = V2(e1, M) as given by equations (38) and (39) in Step 1, equations (24) and (25) collapse to equations (32) and (33). Hence, in state e = (e1, e2), p1 = p1(e1, M) and p2 = p2(e1, M) are a solution. Further substituting equation (24) into (22) and equation (25) into (23) yields equilibrium values V1 = V1(e1, M) and V2 = V2(e1, M) as given by equations (38) and (39). Q.E.D.

PROOF OF PROPOSITION 9: We rewrite the FOCs in state e as

(40)  0 = σ/D2(p1, p2) − (p1 − c(e1) + β(V̄11 − V̄12)),

(41)  0 = σ/D1(p1, p2) − (p2 − c(e2) + β(V̄22 − V̄21)),

where, to simplify the notation, V̄nk is shorthand for V̄nk(e), pn is shorthand for pn(e), etcetera, and we use the fact that D1(p1, p2) + D2(p1, p2) = 1. The system of equations (40) and (41) determines equilibrium prices. We have to establish that there is a unique solution for p1 and p2 irrespective of V̄11, V̄12, V̄21, and V̄22.
Let H1(p1, p2) and H2(p1, p2) denote the right-hand side of equations (40) and (41), respectively. Proceeding as in Step 3 of the proof of Proposition 3, set H(p1) = p1 − p1(p2(p1)), where p1(p2) and p2(p1) are defined by H1(p1(p2), p2) = 0 and H2(p1, p2(p1)) = 0, respectively. We have to show that H(·) is strictly monotone. Straightforward differentiation shows that

−(∂H1/∂p2)/(∂H1/∂p1) = (−D1(p1, p2)/D2(p1, p2))/(−1/D2(p1, p2)) = D1(p1, p2) ∈ (0, 1),

−(∂H2/∂p1)/(∂H2/∂p2) = (−D2(p1, p2)/D1(p1, p2))/(−1/D1(p1, p2)) = D2(p1, p2) ∈ (0, 1).

It follows that H′(p1) > 0.
Q.E.D.
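As a numerical companion to the uniqueness arguments above, the following sketch (ours, not from the paper) solves the pricing system (28)–(29) by iterating the composed map p1 ↦ p1(p2(p1)) from Step 1 of the proof of Proposition 3. The logit demand below is an assumption chosen so that ∂D1/∂p1 = −D1D2/σ, matching the derivative identities used in the differentiations above; the cost and scale values are arbitrary.

```python
import numpy as np

# Illustrative solver for the system (28)-(29); parameters are arbitrary.
sigma = 1.0
c1, c2 = 5.0, 7.0                       # marginal costs c(e1), c(e2)

def D1(p1, p2):
    """Demand share of firm 1 under the assumed logit form; D1 + D2 = 1."""
    return 1.0 / (1.0 + np.exp((p1 - p2) / sigma))

def bisect(f, lo, hi):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    flo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The reaction functions of the proof: F1(p1(p2), p2) = 0, F2(p1, p2(p1)) = 0.
def p1_of_p2(p2):
    return bisect(lambda p1: sigma / (1 - D1(p1, p2)) - (p1 - c1), c1, c1 + 100.0)

def p2_of_p1(p1):
    return bisect(lambda p2: sigma / D1(p1, p2) - (p2 - c2), c2, c2 + 100.0)

# Iterate the composition p1 -> p1(p2(p1)); its slope is D1*D2 in (0, 1),
# so it contracts to the unique solution of F(p1) = 0.
p1 = c1 + 2.0 * sigma
for _ in range(100):
    p1 = p1_of_p2(p2_of_p1(p1))
p2 = p2_of_p1(p1)
print(f"unique equilibrium prices: p1 = {p1:.6f}, p2 = {p2:.6f}")
```

Because the slope of the composed map is D1(p1, p2)D2(p1, p2) ∈ (0, 1), the iteration contracts to the unique fixed point, mirroring the conclusion F′(p1) > 0.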
REFERENCES

ACKERBERG, D., L. BENKARD, S. BERRY, AND A. PAKES (2007): "Econometric Tools for Analyzing Market Outcomes," in Handbook of Econometrics, Vol. 6A, ed. by J. Heckman and E. Leamer. Amsterdam: North-Holland, 4171–4276. [456]
ALCHIAN, A. (1963): "Reliability of Progress Curves in Airframe Production," Econometrica, 31, 679–693. [453]
ARGOTE, L., S. BECKMAN, AND D. EPPLE (1990): "The Persistence and Transfer of Learning in an Industrial Setting," Management Science, 36, 140–154. [453,457,463,490]
ATHEY, S., AND A. SCHMUTZLER (2001): "Investment and Market Dominance," Rand Journal of Economics, 32, 1–26. [489]
BAILEY, C. (1989): "Forgetting and the Learning Curve: A Laboratory Study," Management Science, 35, 340–352. [458]
BALOFF, N. (1971): "Extension of the Learning Curve: Some Empirical Results," Operational Research Quarterly, 22, 329–340. [453]
BENKARD, C. L. (2000): "Learning and Forgetting: The Dynamics of Aircraft Production," American Economic Review, 90, 1034–1054. [453,457,463,490]
——— (2004): "A Dynamic Analysis of the Market for Wide-Bodied Commercial Aircraft," Review of Economic Studies, 71, 581–611. [453,459,470]
BERRY, S., AND A. PAKES (2007): "The Pure Characteristics Demand Model," International Economic Review, 48, 1193–1225. [464]
BERTSEKAS, D., AND J. TSITSIKLIS (1997): Parallel and Distributed Computation: Numerical Methods. Belmont: Athena Scientific. [470,498]
BESANKO, D., U. DORASZELSKI, Y. KRYUKOV, AND M. SATTERTHWAITE (2010): "Supplement to 'Learning-by-Doing, Organizational Forgetting, and Industry Dynamics'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/6994_proofs.pdf; http://www.econometricsociety.org/ecta/Supmat/6994_data and programs.zip. [457]
BOHN, R. (1995): "Noise and Learning in Semiconductor Manufacturing," Management Science, 41, 31–42. [453]
BORKOVSKY, R., U. DORASZELSKI, AND Y. KRYUKOV (2010): "A User's Guide to Solving Dynamic Stochastic Games Using the Homotopy Method," Operations Research (forthcoming). [467]
BUDD, C., C. HARRIS, AND J. VICKERS (1993): "A Model of the Evolution of Duopoly: Does the Asymmetry Between Firms Tend to Increase or Decrease?" Review of Economic Studies, 60, 543–573. [489]
CABRAL, L., AND M. RIORDAN (1994): "The Learning Curve, Market Dominance, and Predatory Pricing," Econometrica, 62, 1115–1140. [454]
——— (1997): "The Learning Curve, Predation, Antitrust, and Welfare," Journal of Industrial Economics, 45, 155–169. [454]
CAPLIN, A., AND B. NALEBUFF (1991): "Aggregation and Imperfect Competition: On the Existence of Equilibrium," Econometrica, 59, 25–59. [493]
CAVES, R., AND M. PORTER (1977): "From Entry Barriers to Mobility Barriers: Conjectural Decisions and Contrived Deterrence to New Competition," Quarterly Journal of Economics, 91, 241–262. [489]
DARR, E., L. ARGOTE, AND D. EPPLE (1995): "The Acquisition, Transfer, and Depreciation of Knowledge in Service Organizations: Productivity in Franchises," Management Science, 41, 1750–1762. [453,457,490]
DASGUPTA, P., AND J. STIGLITZ (1988): "Learning-by-Doing, Market Structure and Industrial and Trade Policies," Oxford Economic Papers, 40, 246–268. [454,455]
DEJONG, J. (1957): "The Effects of Increasing Skills on Cycle Time and Its Consequences for Time Standards," Ergonomics, 1, 51–60. [453]
DEMICHELIS, S., AND F. GERMANO (2002): "On (Un)Knots and Dynamics in Games," Games and Economic Behavior, 41, 46–60. [470]
DORASZELSKI, U., AND K. JUDD (2008): "Avoiding the Curse of Dimensionality in Dynamic Stochastic Games," Working Paper, Harvard University, Cambridge. [470]
DORASZELSKI, U., AND M. SATTERTHWAITE (2010): "Computable Markov-Perfect Industry Dynamics," Rand Journal of Economics (forthcoming). [462]
DUDLEY, L. (1972): "Learning and Productivity Changes in Metal Products," American Economic Review, 62, 662–669. [453]
DUTTON, J., AND A. THOMAS (1984): "Treating Progress Functions as a Managerial Opportunity," Academy of Management Review, 9, 235–247. [462]
ERICSON, R., AND A. PAKES (1995): "Markov-Perfect Industry Dynamics: A Framework for Empirical Work," Review of Economic Studies, 62, 53–82. [454,456,457,464]
FRIEDMAN, J. (1971): "A Non-Cooperative Equilibrium for Supergames," Review of Economic Studies, 38, 1–12. [495]
FUDENBERG, D., AND E. MASKIN (1986): "The Folk Theorem in Repeated Games With Discounting or With Incomplete Information," Econometrica, 54, 533–554. [495]
FUDENBERG, D., AND J. TIROLE (1983): "Learning-by-Doing and Market Performance," Bell Journal of Economics, 14, 522–530. [454]
——— (1991): Game Theory. Cambridge: MIT Press. [470]
GARCIA, C., AND W. ZANGWILL (1979): "An Approach to Homotopy and Degree Theory," Mathematics of Operations Research, 4, 390–405. [466]
GHEMAWAT, P., AND A. M. SPENCE (1985): "Learning Curve Spillovers and Market Performance," Quarterly Journal of Economics, 100, 839–852. [454]
GRIECO, P. (2008): "Numerical Approximation of an Equilibrium Correspondence: Method and Applications," Working Paper, Northwestern University, Evanston. [467]
GRUBER, H. (1992): "The Learning Curve in the Production of Semiconductor Memory Chips," Applied Economics, 24, 885–894. [453]
HATCH, N., AND D. MOWERY (1998): "Process Innovation and Learning by Doing in Semiconductor Manufacturing," Management Science, 44, 1461–1477. [453]
HERINGS, J., AND R. PEETERS (2004): "Stationary Equilibria in Stochastic Games: Structure, Selection, and Computation," Journal of Economic Theory, 118, 32–60. [471]
HIRSCH, W. (1952): "Progress Functions of Machine Tool Manufacturing," Econometrica, 20, 81–82. [453]
HIRSCHMANN, W. (1964): "Profit From the Learning Curve," Harvard Business Review, 42, 125–139. [453]
IRWIN, D., AND P. KLENOW (1994): "Learning-by-Doing Spillovers in the Semiconductor Industry," Journal of Political Economy, 102, 1200–1227. [453]
JARMIN, R. (1994): "Learning by Doing and Competition in the Early Rayon Industry," Rand Journal of Economics, 25, 441–454. [453]
JUDD, K. (1998): Numerical Methods in Economics. Cambridge: MIT Press. [466,470]
JUDD, K., AND K. SCHMEDDERS (2004): "A Computational Approach to Proving Uniqueness in Dynamic Games," Working Paper, Hoover Institution, Stanford. [467]
KILBRIDGE, M. (1962): "A Model for Industrial Learning Costs," Management Science, 8, 516–527. [453]
LEVY, F. (1965): "Adaptation in the Production Process," Management Science, 11, B136–B154. [453]
LIEBERMAN, M. (1984): "The Learning Curve and Pricing in the Chemical Processing Industries," Rand Journal of Economics, 15, 213–228. [453]
MASKIN, E., AND J. TIROLE (2001): "Markov Perfect Equilibrium, I: Observable Actions," Journal of Economic Theory, 100, 191–219. [454,459,494]
PAKES, A. (2008): "Theory and Empirical Work on Imperfectly Competitive Markets," Fisher–Schultz Lecture, Harvard University, Cambridge. [456]
PAKES, A., AND P. MCGUIRE (1994): "Computing Markov-Perfect Nash Equilibria: Numerical Implications of a Dynamic Differentiated Product Model," Rand Journal of Economics, 25, 555–589. [456,467]
PISANO, G. (1994): "Knowledge, Integration, and the Locus of Learning: An Empirical Analysis of Process Development," Strategic Management Journal, 15 (Special Issue), 85–100. [453]
PRESTON, L., AND E. KEACHIE (1964): "Cost Functions and Progress Functions: An Integration," American Economic Review, 54, 100–107. [453]
ROSS, D. (1986): "Learning to Dominate," Journal of Industrial Economics, 34, 337–353. [454]
RUBINSTEIN, A. (1979): "Equilibrium in Supergames With the Overtaking Criterion," Journal of Economic Theory, 21, 1–9. [495]
SCHMEDDERS, K. (1998): "Computing Equilibria in the General Equilibrium Model With Incomplete Asset Markets," Journal of Economic Dynamics and Control, 22, 1375–1401. [464]
——— (1999): "A Homotopy Algorithm and an Index Theorem for the General Equilibrium Model With Incomplete Asset Markets," Journal of Mathematical Economics, 32, 225–241. [464]
SHAFER, S., D. NEMBHARD, AND M. UZUMERI (2001): "The Effects of Worker Learning, Forgetting, and Heterogeneity on Assembly Line Productivity," Management Science, 47, 1639–1653. [453,457,490]
SHAPLEY, L. (1953): "Stochastic Games," Proceedings of the National Academy of Sciences, 39, 1095–1100. [454]
SPENCE, M. (1981): "The Learning Curve and Competition," Bell Journal of Economics, 12, 49–70. [454]
THOMPSON, P. (2001): "How Much Did the Liberty Shipbuilders Learn? New Evidence for an Old Case Study," Journal of Political Economy, 109, 103–137. [453]
——— (2003): "How Much Did the Liberty Shipbuilders Forget?" Working Paper, Florida International University, Miami. [453,457,490]
THORNTON, R., AND P. THOMPSON (2001): "Learning From Experience and Learning From Others: An Exploration of Learning and Spillovers in Wartime Shipbuilding," American Economic Review, 91, 1350–1368. [453]
VICKERS, J. (1986): "The Evolution of Market Structure When There Is a Sequence of Innovations," Journal of Industrial Economics, 35, 1–12. [489]
VIVES, X. (1999): Oligopoly Pricing: Old Ideas and New Tools. Cambridge: MIT Press. [481]
WATSON, L., S. BILLUPS, AND A. MORGAN (1987): "HOMPACK: A Suite of Codes for Globally Convergent Homotopy Algorithms," ACM Transactions on Mathematical Software, 13, 281–310. [467]
WATSON, L., M. SOSONKINA, R. MELVILLE, A. MORGAN, AND H. WALKER (1997): "Algorithm 777: HOMPACK90: A Suite of Fortran 90 Codes for Globally Convergent Homotopy Algorithms," ACM Transactions on Mathematical Software, 23, 514–549. [467]
WIXTED, J., AND E. EBBESEN (1991): "On the Form of Forgetting," Psychological Science, 2, 409–415. [458]
WRIGHT, T. (1936): "Factors Affecting the Cost of Airplanes," Journal of Aeronautical Sciences, 3, 122–128. [453]
ZANGWILL, W., AND C. GARCIA (1981): Pathways to Solutions, Fixed Points, and Equilibria. Englewood Cliffs: Prentice Hall. [466,467]
ZIMMERMAN, M. (1982): "Learning Effects and the Commercialization of New Energy Technologies: The Case of Nuclear Power," Bell Journal of Economics, 13, 297–310. [453]
Kellogg School of Management, Northwestern University, Evanston, IL 60208, U.S.A.; [email protected],
Dept. of Economics, Harvard University, Cambridge, MA 02138, U.S.A.; [email protected],
Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.; [email protected],
and
Kellogg School of Management, Northwestern University, Evanston, IL 60208, U.S.A.; [email protected].

Manuscript received February, 2007; final revision received June, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 509–537
AN EQUILIBRIUM THEORY OF LEARNING, SEARCH, AND WAGES

BY FRANCISCO M. GONZALEZ AND SHOUYONG SHI1

We examine the labor market effects of incomplete information about the workers' own job-finding process. Search outcomes convey valuable information, and learning from search generates endogenous heterogeneity in workers' beliefs about their job-finding probability. We characterize this process and analyze its interactions with job creation and wage determination. Our theory sheds new light on how unemployment can affect workers' labor market outcomes and wage determination, providing a rational explanation for discouragement as the consequence of negative search outcomes. In particular, longer unemployment durations are likely to be followed by lower reemployment wages because a worker's beliefs about his job-finding process deteriorate with unemployment duration. Moreover, our analysis provides a set of useful results on dynamic programming with optimal learning.

KEYWORDS: Learning, wages, unemployment, directed search, monotone comparative statics.
1. INTRODUCTION

WHEN WORKERS HAVE INCOMPLETE INFORMATION about their own job-finding process, search outcomes convey valuable information. Differences in search outcomes that may initially be caused by luck can induce different updating of workers' beliefs about their own job-finding process, which will influence workers' search behavior in the future and lead to further differences in their reemployment rates and wages. In this paper, we develop an equilibrium framework to characterize this endogenous heterogeneity generated by learning from search, and we analyze its interactions with job creation and wage determination.

Our theory sheds new light on how unemployment can affect workers' labor market outcomes and wage determination. As a particular illustration, our theory provides a novel explanation for why longer unemployment durations are likely to be followed by lower reemployment rates and wages (see Addison and Portugal (1989)). It thus complements common human-capital explanations, which emphasize that workers' skills depreciate during unemployment (Pissarides (1992)) or that unemployment durations may signal differences in labor productivity (Lockwood (1991)). These explanations alone are unlikely to explain the effects of unemployment on workers' labor market outcomes. For instance, Addison and Portugal (1989) found that reemployment wages and rates fall significantly over short unemployment durations, and they do so for low-skilled as well as high-skilled workers, even after trying to control for observed and unobserved heterogeneity.2 Our broader view of human capital emphasizes a distinction between a worker's matching ability and labor productivity, and a distinction between exogenous and endogenous heterogeneity. These distinctions can be useful for devising new empirical strategies to discriminate between duration dependence in workers' search behavior and the effect of uncontrolled worker heterogeneity (Heckman and Borjas (1980)).

1 A co-editor and four anonymous referees made detailed comments and suggestions that led to significant improvements of the paper. This paper (an earlier version of which had the title "Learning From Unemployment") was presented at the NBER Summer Institute (Cambridge, 2005), the Canadian Macro Study Group Meeting (Vancouver, 2006), Society for Economic Dynamics Meeting (Istanbul, 2009), NBER Conference on Macro Perspectives of the Labor Market (Minneapolis FED, 2009), University of Pennsylvania, Illinois–Urbana/Champaign, and Wisconsin–Madison. We have benefited from comments and suggestions by the participants of these conferences and the workshops, and by Philipp Kircher, Rob Shimer, Stephan Lauermann, Kevin Reffett, and Bob Becker. Both authors would like to acknowledge financial support from the Social Sciences and Humanities Research Council of Canada. The second author would also like to acknowledge financial support from the Bank of Canada Fellowship. The opinion expressed here is the authors' own and does not reflect the view of the Bank of Canada.

© 2010 The Econometric Society   DOI: 10.3982/ECTA8061

One contribution of our paper is to integrate search and learning into an equilibrium framework. The need for an equilibrium framework arises because when workers change their search behavior as a result of learning, firms have an incentive to adjust vacancies and wage offers to respond to these changes. Thus, learning affects the wage distribution. In turn, the availability of vacancies and the wage distribution can affect workers' search behavior and, hence, the information contained in a worker's search outcomes. The equilibrium interactions between workers' search, firms' vacancy creation, and the wage distribution are important for understanding the tensions between aggregate and individual behavior, as reflected, for instance, in the relationship between wages and the duration of vacancies as well as unemployment. Indeed, our analysis uses the properties of the equilibrium wage function to establish a central result that a worker's desired wages are a strictly increasing function of the worker's beliefs.

In our model, a worker's ability is either high or low permanently. A high ability implies that the worker has a higher probability of forming a productive match with a random job. A worker has incomplete information about his ability and, hence, does not precisely know his matching probability. We model search as a directed process as in Moen (1997) and Acemoglu and Shimer (1999). That is, workers know the wage offers before choosing where to apply.3 Directed search allows for sorting of the workers into jobs, which makes an equilibrium block recursive in the sense that individuals' decisions and market tightness are independent of the distribution of workers. Block recursivity allows for a tractable analysis of the equilibrium interactions between equilibrium wages and learning.4

2 Changes in wealth and search intensity during unemployment can also play a role. However, even after trying to control for wealth effects and search intensity, Alexopoulos and Gladden (2007) found that unemployment duration still has strong negative effects on a worker's labor market outcomes.
3 See Peters (1984, 1991), Burdett, Shi, and Wright (2001), and Shi (2001) for analyses of directed search as a strategic problem that leads to the competitive search equilibrium outcome as the market becomes large.

Success and failure to find a match both convey useful information about a worker's type. Success in getting a match is good news about a worker's ability. Failure is bad news, which induces a worker to search for jobs that are easier to get. Those jobs come with lower wages as part of the equilibrium trade-off between wages and market tightness. Thus, learning from search induces not only reservation wages, but also desired wages, to increase with beliefs. Firms offer different wages to cater to these workers, who sort according to beliefs, resulting in a nondegenerate distribution of equilibrium wages among ex post equally productive workers.

Endogenous heterogeneity in workers' beliefs provides a rational explanation for discouragement as the consequence of negative search outcomes. This is a natural explanation for the negative effect of unemployment duration on future wages found by Addison and Portugal (1989). As a worker becomes pessimistic, he searches for lower wages so as to raise his job-finding probability. The flip side of this result is that vacancies that offer high wages to target optimistic workers are filled more rapidly than low-wage vacancies, consistent with the evidence in Barron, Bishop, and Dunkelberg (1985) and Holzer, Katz, and Krueger (1991). Moreover, despite workers' attempt to search for lower wages as the unemployment spell continues, the average job-finding probability can fall with unemployment duration, as the evidence indicates (e.g., Shimer (2008)). This is because the ability composition of workers in any given cohort worsens with unemployment duration.

Our analysis provides a sharp characterization of learning from search, resolving a number of problems inherent to the analysis of optimal learning from experience. These problems arise because, as search outcomes generate variations in a worker's posterior beliefs about his ability, these variations are valuable to the worker only if the worker's value function is convex in beliefs. Because such convexity can make optimal decisions not unique and the value function not differentiable, standard techniques in dynamic programming (Stokey, Lucas, and Prescott (1989)) cannot be used to study the policy function, which is the key object in our analysis.5 We resolve this difficulty by exploiting a connection between convexity of the value function and standard monotone comparative statics results (e.g., Topkis (1998) and Milgrom and Shannon (1994)). The connection is not immediately obvious and, to our knowledge, has not been examined.

4 Shi (2009) first formalized this notion of block recursive equilibria and proved existence of such equilibria in the context of on-the-job search where firms offer wage–tenure contracts to direct workers' search. Menzio and Shi (2009) established existence of block recursive equilibria in a dynamic, stochastic environment with on-the-job search.
5 Although the literature on optimal learning (e.g., Easley and Kiefer (1988)) recognizes the analytical difficulty caused by a convex value function, it has either ignored the difficulty or focused on corner solutions (e.g., Balvers and Cosimano (1993)).
Because a worker's decision problem is formulated with dynamic programming, the objective function involves the future value function, which is endogenous. Moreover, we cannot presume properties of the objective function such as concavity, in contrast with other applications of lattice-theoretic techniques to dynamic programming (e.g., Amir, Mirman, and Perkins (1991), Mirman, Morand, and Reffett (2008)). In the end, we establish a set of useful results in dynamic programming with optimal learning. First, convexity of the value function and monotonicity of the policy function are closely related. Second, under a mild condition, the value function is strictly convex and the policy function is strictly monotone. Third, under the same condition, optimal decisions obey the first-order condition and a general version of the envelope theorem is valid. Finally, optimal decisions are unique if the worker's search history has ever contained a match failure.

Our emphasis on learning from search is close in spirit to that of Burdett and Vishwanath (1988). They considered the case in which workers learn about the unknown distribution of wages from the random arrival of wage offers and showed that learning from search can induce reservation wages to decline with unemployment duration. In contrast, we analyze workers' learning about their ability, study an environment where wages and vacancies are endogenously determined, and focus on desired wages rather than reservation wages.

2. THE MODEL ENVIRONMENT

Time is discrete. All agents are risk neutral and discount the future at a rate r > 0. There is a unit measure of workers, divided between employment and unemployment. The measure of firms is determined endogenously by free entry. An employed worker produces, after which a separation shock makes him unemployed with probability δ > 0. An unemployed worker searches for a job and receives the unemployment benefit per period, b ≥ 0.

Each worker has unknown ability i that is either high (H) or low (L). Ability is a worker's permanent characteristic, determined at the time when the worker first enters the market. A new worker has ability i with probability pi, where pH = p ∈ (0, 1) and pL = 1 − p. Ability determines a worker's productivity as follows.6 Upon meeting a randomly drawn firm, the productivity of a worker with ability i is realized to be y > 0 with probability ai, and ŷ ≤ 0 with probability (1 − ai), while the cost of production is normalized to 0. We refer to ai as a type-i worker's productive units and we assume 0 < aL < aH < 1 so that a high-ability worker is more likely to be productive than a low-ability worker. Clearly, a firm hires a worker only when the worker is productive, and labor productivity of every employed worker is y > 0.

6 We are very grateful to Daron Acemoglu and the referees for directing us toward this formulation. In a previous version of the paper (Gonzalez and Shi (2007)), we formulated the problem as one of incomplete information about the characteristics of local labor markets rather than individuals.
A natural interpretation of a worker's ability in our model is in terms of the worker's skill bundle, in the spirit of recent literature on human capital (see Lazear (2004)). According to this view, workers are heterogeneous with respect to the specific composition of their skill bundle, and different firms demand different skill bundles. A firm must review a worker's application to determine whether a worker's skill bundle fits the firm. However, to focus on workers' learning about their human capital, we abstract from the actual formation of heterogeneous matches by assuming, as above, that the skill bundle of high-ability workers is relatively more likely to fit a random firm.

A worker learns about his ability from his labor market experience. After an infinitely long history in the market, a worker eventually learns his true ability. To rule out this uninteresting case, we assume that, with probability σ > 0, an exit shock forces a worker (employed or unemployed) out of the market at the end of each period. An exiting worker's payoff is normalized to zero, and the worker is replaced with a new worker who enters the market through unemployment so that the labor force remains constant.

The events in a period unfold as follows. First, new workers enter the market through unemployment, replacing the workers who exited the market in the previous period. Nature determines a new worker's ability. Second, an employed worker produces and gets the wage, after which the job separation shock is realized. Meanwhile, unemployed workers search for jobs and new matches are formed. Finally, the exit shock is realized.

There is a continuum of submarkets indexed by x that are linked to matching rates in that submarket. The domain of x is X = [0, 1/aH]. A submarket x is characterized by a wage level, W(x), and a tightness, λ(x). The functions W(·) and λ(·) are public information, taken as given by agents and determined in equilibrium. A worker's or a firm's search decision in each period is to choose x, that is, the submarket to search.7 Search is directed in the sense that an agent explicitly takes into account the trade-off that a submarket with a high wage has relatively fewer vacancies per worker in the equilibrium. As in Moen (1997) and Acemoglu and Shimer (1999), a firm does not directly set wages; rather, it chooses a pair (W, λ) from the menu {(W(x), λ(x)) : x ∈ X}.8

7 Workers who differ in beliefs may also choose different levels of search intensity and labor market participation. Although our analysis can shed light on such differences, we abstract from them for simplicity.
8 It is inadequate to index the submarkets by the length of unemployment duration of the participating workers. First, workers with the same unemployment duration can be heterogeneous in beliefs about their ability if they have different employment histories. Indexing submarkets by unemployment duration alone does not allow these workers to optimally make different search choices.

In each submarket, the number of matches is given by a matching function. Since a firm hires a worker only when the worker is productive at the job, it is useful to specify the matching function to determine the number of productive matches rather than the number of contacts. Let v(x) denote the number of vacancies created in submarket x, and let ui(x) denote the number of type-i unemployed workers in submarket x, where i ∈ {H, L}. We define the total productive units of workers searching in submarket x as

(2.1)  ue(x) = aH uH(x) + aL uL(x).

A function F(ue(x), v(x)) gives the number of productive matches in the submarket. The index x is the matching rate for each productive unit in submarket x; that is,

x = F(ue(x), v(x))/ue(x).
For a type-i worker in submarket x, the probability of getting a productive match is ai x. Thus, given x, the lower a worker's ability, the lower his matching probability. The matching probability of a vacancy in submarket x is F/v = x/λ(x), where λ(x) ≡ v(x)/ue(x) is the effective tightness in the submarket.

The above specification of the matching function uses workers' productive units as an argument, which are similar to the efficiency units of search commonly used in the literature where workers are heterogeneous. This specification enables us to focus on productive matches by combining the process of making contacts (i.e., receiving applications) and the process of evaluating the applicants. The only relevant information for a worker's learning is contained in the matching probabilities aH x and aL x.9 This formulation significantly simplifies the analysis of the learning problem, because neither workers nor firms need to learn about the composition of high- versus low-ability workers in a submarket. In any submarket x, a worker's matching probability depends only on his own ability and x, while a vacancy's matching probability depends only on x. Thus, given the choice x, an agent's expected payoff is independent of the level and the composition of the productive units in the submarket. Accordingly, free entry of firms into the submarket ensures that the effective tightness and the wage in the submarket are functions only of x.

We impose the following standard assumption on the matching function.10

ASSUMPTION 1: (i) F(ue, v) ≤ min{uH + uL, v}; (ii) F is strictly increasing, strictly concave, and twice differentiable in each argument whenever x < 1/aH; (iii) F is linearly homogeneous; (iv) F(1, 0) = 0, F(1, ∞) ≥ 1/aH, and x/λ(x) ≤ 1 for all x ≤ 1/aH.

9 In this sense, our matching function implicitly assumes that a worker who fails to get a job does not know whether he has made a contact, that is, whether his application has been considered by a firm.
10 An example that satisfies the assumption is F(ue, v) = ue v/(ue + Bv) if v/ue ≤ 1/(aH − B), and F(ue, v) = ue/aH otherwise, where B ∈ (0, aH) is a constant. In this example, λ(x) = x/(1 − Bx).
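As an aside, the example in footnote 10 is easy to check numerically. The sketch below is ours, with arbitrary parameter values; it verifies that λ(x) = x/(1 − Bx) satisfies the inequalities stated in (2.2) just below, and that x/λ(x) is strictly decreasing.

```python
import numpy as np

# Check of the footnote-10 example: F(ue, v) = ue*v/(ue + B*v) implies
# lambda(x) = x/(1 - B*x). Parameter values are illustrative; B must lie in (0, aH).
aH, B = 0.9, 0.5
x = np.linspace(1e-6, 1 / aH, 1000)

lam = x / (1 - B * x)
lam1 = 1 / (1 - B * x) ** 2        # lambda'(x)
lam2 = 2 * B / (1 - B * x) ** 3    # lambda''(x)

# Properties (2.2): lambda'(x) > lambda(x)/x > 0 and lambda''(x) > 0
print(np.all(lam1 > lam / x), np.all(lam / x > 0), np.all(lam2 > 0))
# x/lambda(x) = 1 - B*x is strictly decreasing in x, as claimed below (2.2).
print(np.all(np.diff(x / lam) < 0))
```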
Since F(1, λ) = x, we can solve for λ and verify that Assumption 1 implies

(2.2)  λ′(x) > λ(x)/x > 0 and λ″(x) > 0 for all x ∈ (0, 1/aH].
Moreover, x/λ(x) strictly decreases in x. That is, if it is easy for a worker to find a match in a submarket, it must be difficult for a firm to find a match there.

The key feature of the model is the incomplete information about worker ability, which implies that workers face a signal extraction problem. Search histories are informative because low-ability workers are more likely to fail to get matches in any given submarket. As we show below, self-selection of workers into submarkets according to their own information implies that firms do not need to know the workers' histories.

3. LEARNING IN DIRECTED SEARCH EQUILIBRIUM

3.1. Learning From Search

A worker learns about his a, the probability that he will be productive with a randomly selected job. We refer to a worker's expectation of a as his belief and denote it as μ. The domain of μ is M ≡ [aL, aH]. When a new worker first enters the market, the initial belief is μ0 = paH + (1 − p)aL, where p ∈ (0, 1). This initial belief is common to all new workers and it is public information.11

The updating of beliefs depends on the particular submarket into which the worker just searched. Consider an arbitrary period. The worker enters the period with Pi as the prior probability of a = ai, where ai ∈ {aH, aL}, and with μ as the prior belief computed from these prior probabilities. After searching in the period, the worker either gets a match (denoted as k = 1) or fails to get a match (denoted as k = 0). Bayesian updating yields the posterior probabilities

(3.1)  P(ai | x, k = 1) = Pi ai/μ,   P(ai | x, k = 0) = Pi(1 − x ai)/(1 − x μ).

The posterior belief is E(a | x, k) = aH P(aH | x, k) + aL[1 − P(aH | x, k)]. Using the relationship μ = PH aH + (1 − PH)aL, we can solve Pi in terms of μ:

(3.2)  PH = (μ − aL)/(aH − aL),   PL = (aH − μ)/(aH − aL).

Substituting (3.1) and (3.2), we express posterior beliefs as E(a | x, k = 1) = φ(μ) and E(a | x, k = 0) = H(x, μ), where

(3.3)  φ(μ) ≡ aH + aL − aH aL/μ,

(3.4)  H(x, μ) ≡ aH − (aH − μ)(1 − x aL)/(1 − x μ).

11 For simplicity we abstract from heterogeneity in the initial beliefs among new workers. Note that our model does generate heterogeneous beliefs among workers with different employment histories.
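A minimal numerical sketch of the updating step may be useful here (ours, with arbitrary parameter values): it recovers the prior from the belief via (3.2), applies Bayes' rule (3.1) directly, and confirms that the resulting posterior beliefs match the closed forms φ(μ) in (3.3) and H(x, μ) in (3.4).

```python
import numpy as np

# Belief updating, equations (3.1)-(3.4); parameter values are illustrative.
aL, aH = 0.3, 0.9
p = 0.5
mu = p * aH + (1 - p) * aL    # initial belief mu_0
x = 0.8                        # submarket choice, x in [0, 1/aH]

# Prior probability of a = aH recovered from the belief, equation (3.2)
PH = (mu - aL) / (aH - aL)

# Direct Bayes rule, equation (3.1)
PH_success = PH * aH / mu
PH_failure = PH * (1 - x * aH) / (1 - x * mu)

# Posterior beliefs E(a | x, k), versus the closed forms (3.3) and (3.4)
post_success = aH * PH_success + aL * (1 - PH_success)
post_failure = aH * PH_failure + aL * (1 - PH_failure)
phi = aH + aL - aH * aL / mu                          # (3.3)
H = aH - (aH - mu) * (1 - x * aL) / (1 - x * mu)      # (3.4)
print(np.isclose(post_success, phi), np.isclose(post_failure, H))
print(f"success lifts the belief: {phi:.4f} > {mu:.4f} > {H:.4f} after failure")
```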
If μ > aL, then E(a | x, k) > aL for both k = 0 and k = 1. Also, φ(μ) > μ > H(x, μ) for all μ ∈ (aL, aH), φ′(μ) > 0, and φ″(μ) < 0. The sequence of beliefs, μ, is a Markov process, and a worker's belief is a sufficient statistic for the worker's unemployment history. Note that H(x, μ) is decreasing in x; that is, a higher x reduces the worker's posterior belief after the worker fails to find a match. However, φ is independent of x, because x does not affect the likelihood ratio of a match success between the two types.

The value of x measures the informativeness of search. Intuitively, search outcomes in a market with a higher x are more informative because such a market has a relatively higher matching probability for a worker; if a worker fails to find a match in such a market, the worker more likely attributes the failure to low ability. This relationship between x and the informativeness of search can be made precise using Blackwell's (1951) criterion. Consider the information revealed by search in two different submarkets, with x > x′. Let K and K′ be the random number of matches associated with x and x′. Intuitively, one can construct the random variable K′ by "adding noise" to K as follows. First, let the worker randomize with probability of success ax, where a ∈ {aL, aH}; then, whenever the realization is a success, randomize again with success probability x′/x. The result is a Bernoulli trial with probability of success equal to ax′. In other words, if x > x′, the random variable, or experiment, K is sufficient for K′ (see DeGroot (1970, pp. 433–439)).

3.2. A Worker's Value Function

Consider first a worker with belief μ who is employed at wage w in a period. Denote the worker's value function, discounted to the end of the previous period, as Je(μ, w). After producing and obtaining the wage w, the separation shock forces the worker into unemployment with probability δ and then, independently, the exit shock forces the worker out of the market with probability σ. If the worker remains employed after the two shocks, the continuation value is Je(μ, w). If the worker is separated from the job but remains in the market, the continuation value is denoted V(μ). If the worker is out of the market, the continuation value is 0. Thus, the Bellman equation for Je is

(1 + r)Je(μ, w) = w + (1 − σ)[(1 − δ)Je(μ, w) + δV(μ)].

The above equation yields

(3.5)  Je(μ, w) = (1/A)[w/(1 − σ) + δV(μ)],  where A ≡ (r + σ)/(1 − σ) + δ.
Now consider an unemployed worker who enters a period with belief μ. If he chooses to search in submarket x, the expected probability of finding a (productive) match is xμ. If he fails to find a match, his belief is updated downward to H(x, μ) as defined by (3.4). In this case, his continuation value is (1 − σ)V(H(x, μ)), which takes into account the probability of exogenous exit. If the worker succeeds in finding a match in the current period, his belief is updated upward to φ(μ) as defined by (3.3). In this case, the worker can choose whether or not to accept the match. We impose Assumption 2 below to guarantee that a worker always accepts a match, and so the worker's continuation value after finding a match is (1 − σ)Je(φ(μ), W(x)). Thus, under Assumption 2, the worker's expected return to searching in submarket x, excluding the unemployment benefit, is (1 − σ)R(x, μ), where

(3.6)  R(x, μ) ≡ xμJe(φ(μ), W(x)) + (1 − xμ)V(H(x, μ)).

Since the value functions are discounted to the end of the previous period, then

(3.7)  (1 + r)V(μ) = b + (1 − σ) max_{x∈X} R(x, μ).
Denote the set of optimal decisions in (3.7) as G(μ) and a selection from G(μ) as g(μ). When choosing a submarket x, the worker faces two considerations. One is the familiar trade-off between the wage and the matching probability in models of directed search. That is, a submarket with a higher x has a higher job-finding probability and a lower wage. Another consideration is learning from the search outcome. As discussed earlier, search in a submarket with a high x (i.e., a low wage) is more informative than search in a submarket with a low x. The value of this information is captured by the features of the value function, to be described later in Theorems 3.1 and 4.1.

It is useful to note that the set of solutions G(μ) generically contains only a finite number of values. That is, given beliefs μ, a worker prefers to search in only a few submarkets and possibly only one submarket. Over time, the worker switches from one submarket to another not because he is indifferent between these submarkets, but because search outcomes induce the worker to update beliefs.

In principle, workers may have incentive to engage in the following "experimentation": searching during a period solely to gather information and, thus, refusing to enter a match once they learn that a match has occurred. This may occur because a worker who finds a match revises his belief upward to φ(μ). We do not think that this form of experimentation is important in practice, unless it is associated with heterogeneity among productive matches, which does not exist here. Thus, we rule out such experimentation by focusing on the case in which employment is sufficiently valuable to a worker so that the worker always prefers to accept a match that he searches for.

ASSUMPTION 2: Define x∗ by the solution to λ′(x∗) = aH λ(1/aH) and note that x∗ ∈ (0, 1/aH). Assume that labor productivity satisfies

(y − b)/c > [A + aH x∗]λ′(x∗) − aH λ(x∗).
This sufficient condition implies that a worker prefers getting the lowest equilibrium wage every period starting now to remaining unemployed in the current period and then getting the highest possible wage from a match starting next period (see Appendix A). Intuitively, the condition requires that the opportunity cost of rejecting a match, as reflected by (y − b), should be sufficiently high to a worker.12 Stronger than necessary, this condition significantly simplifies the analysis and the exposition of our main results. As in Burdett and Vishwanath (1988), one can relax the condition by introducing a direct cost of search per period, which further increases a worker's opportunity cost of rejecting an offer. For simplicity, however, we have not included such a cost of search.

12 The discount rate in Assumption 2, appearing through A, reflects both workers' and firms' discount rate. For a worker, a higher discount rate lowers the benefit from experimentation for any given wage. However, when firms discount future at a higher rate, the present value of a filled job falls and wages in all submarkets must be lower to induce firms to enter. In this case, the loss of the current wage from experimentation falls. With a common discount rate, the effect through firms' discount rate dominates.

REMARK 1: Since x∗λ′(x∗) > λ(x∗), Assumption 2 implies y − b > cAλ(x∗)/x∗, which in turn implies y − b > cAλ′(0). The last inequality says that there are feasible wages at which employment is better than unemployment for a worker.

3.3. Free Entry of Firms and the Equilibrium Definition

There is free entry of firms into the market. After incurring a cost c ∈ (0, y), a firm can post a vacancy for a period in any one of the submarkets. Denote the value of a job filled at wage w, discounted to the end of the previous period, as Jf(w). Then

(3.8)  (1 + r)Jf(w) = y − w + (1 − σ)(1 − δ)Jf(w).

The matching probability for a vacancy in submarket x is x/λ(x), and the continuation value of a match is (1 − σ)Jf(W(x)). Solving Jf from (3.8) and using A defined in (3.5), we can express a firm's value of a vacancy in submarket x as

(3.9)  Jv(x) = −c + [x/λ(x)] · [(y − W(x))/A].

A recruiting firm chooses x to maximize Jv(x). In equilibrium, a firm is willing to enter any submarket, provided that the wage in the submarket is consistent with the free-entry condition. Precisely, Jv(x) and the number of vacancies, v(x), satisfy Jv(x) ≤ 0 and v(x) ≥ 0 for all x ∈ X, where the two inequalities hold with complementary slackness. Thus, for all x such that v(x) > 0, the wage function is

(3.10)  W(x) = y − cAλ(x)/x.

Conversely, for any feasible wage level specified in (3.11)(i) below, we require the number of vacancies to be positive. The wage function has the properties

(3.11)  (i) b + caH[x∗λ′(x∗) − λ(x∗)] ≤ W(x) ≤ y − cAλ′(0);
        (ii) W′(x) < 0;
        (iii) 2W′(x) + xW″(x) < 0.
520
F. M. GONZALEZ AND S. SHI
icantly reducing the dimensionality of the state variables in individuals’ decisions. Block recursivity is a consequence of directed search. In our model, directed search allows the workers to sort according to beliefs about their ability. Since each submarket attracts only the workers with particular beliefs, firms that post vacancies in that submarket calculate the expected profit with only such workers in mind—they do not need to consider how other workers with different beliefs are distributed. Free entry of firms guarantees that each submarket has exactly the effective tightness specified for that submarket. If search were undirected, instead, an individual’s search decision would depend on the wage distribution which, in turn, would evolve as individuals learn about their ability. 3.4. Existence of an Equilibrium Let us analyze a worker’s problem, (3.7). It is easy to see that the right-hand side of (3.7) is a contraction mapping on V . Using (3.11), standard arguments show that a unique value function V exists, which is positive, bounded, and continuous on M = [aL aH ] (see Stokey, Lucas, and Prescott (1989, p. 79)). Moreover, the set of maximizers, G, is nonempty, closed, and upper hemicontinuous. The following theorem summarizes the existence result and some other features of the equilibrium (see Appendix A for a proof). THEOREM 3.1: Under Assumptions 1 and 2, there exists an equilibrium where all matches are accepted. In the equilibrium, g(μ) > 0 for all g(μ) ∈ G(μ) and all μ ∈ M. Moreover, V is strictly increasing, (weakly) convex and almost everywhere differentiable. Let us explain the results in the theorem. First, optimal choices of x are strictly positive. A worker who chooses x = 0 never finds a match and does not learn anything from search (i.e., H(0 μ) = μ). Since there are feasible wages at which employment is strictly better than unemployment (see Remark 1), a worker will choose x > 0. Second, the value function of an unemployed worker is strictly increasing in the worker’s beliefs. Because a worker with higher beliefs can always choose to enter the same submarket as does a worker with lower beliefs and, thereby, can obtain a match with a higher expected probability, the former gets a higher expected payoff. Third, the value function is (weakly) convex in beliefs, as is standard in optimal learning problems (see Nyarko (1994)). Search generates information by creating variations in the worker’s posterior beliefs. Such variations can never be harmful to the worker because the worker can always choose to ignore the information. Weak convexity of the value function reflects this fact. A worker’s reservation wage can be defined in the conventional way as the lowest permanent income that a worker will accept to forego search. This is given as (r + σ)V (μ). Monotonicity of the value function determines the
behavior of reservation wages. Because V(μ) is strictly increasing, the reservation wage strictly falls over each unemployment spell as the worker's beliefs about his own ability deteriorate. Put differently, a worker's permanent income strictly declines over each unemployment spell. Similarly, with strict monotonicity of V, (3.7) implies that a worker's reservation wage is always strictly lower than the desired wage, that is, (r + σ)V(μ) < W(g(μ)) for all μ > aL. Our focus is on a worker's desired wage, which is defined as w(μ) = W(g(μ)). Desired wages are much more difficult to analyze than reservation wages, because they depend on optimal learning from search. As will become clear in the next section, monotonicity of the optimal search decision relies crucially on convexity of the value function.

4. MONOTONICITY OF WORKERS' DESIRED WAGES

In this section, we establish the result that a worker's desired wage, w(μ), is an increasing function of beliefs. Because w(μ) = W(g(μ)), where W(·) is decreasing, it is equivalent to establish the result that a worker's policy function for the submarket to search, x = g(μ), is a decreasing function. For what follows, we define z = −x and refer to z, rather than x, as the worker's search decision. Then the objective function in (3.7) becomes R(−z, μ), and the feasible set of choices is −X = [−1/aH, 0]. The domain of μ is M = [aL, aH].

A difficulty in proving monotonicity of the policy function arises from the feature inherent to learning that the value function is convex in beliefs. This feature implies that optimal choices may not be unique or interior, and the value function may not be differentiable.13 In this context, standard techniques in dynamic programming for proving policy functions to be monotone are not applicable (e.g., Stokey, Lucas, and Prescott (1989, pp. 80–87)).14 A natural approach is to use lattice-theoretic techniques associated with a supermodular objective function (see Topkis (1998)). In our model, R is supermodular in (z, μ) if and only if R has increasing differences in (z, μ), because the latter variables lie in closed intervals of the real line.15

13 One can attempt to impose strong assumptions to make R concave in z and then use the first-order condition to characterize the optimal choice of z. However, such assumptions invariably require restrictions on the degree of convexity of the value function. It is difficult to verify that these restrictions can be satisfied by the fixed point of the Bellman equation, (3.7).

14 In different modeling environments, there are techniques to establish differentiability of value functions and optimal choices in dynamic programming, for example, Santos (1991). However, those techniques also require the value function to be concave. On the other hand, the literature on optimal learning (e.g., Easley and Kiefer (1988)) has either ignored the difficulty arising from a convex value function or focused on corner solutions (e.g., Balvers and Cosimano (1993)).

15 Let z ∈ Z and μ ∈ M, where Z and M are partially ordered sets. A function f(z, μ) has increasing differences in (z, μ) if f(z1, μ1) − f(z1, μ2) ≥ f(z2, μ1) − f(z2, μ2) for all z1 > z2 and μ1 > μ2. If the inequality is strict, then f has strictly increasing differences. Because Z, M, and Z × M are all lattices in our model, the feature of increasing differences implies supermodularity (see Topkis (1998, p. 45)).
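Before turning to the lattice-theoretic argument, the contraction-mapping logic of (3.7) can be made concrete computationally. The following is a minimal value-function-iteration sketch, not taken from the paper: the two-type parameterization, the linear wage schedule W, and all numbers are illustrative assumptions, with the flow payoff written in the form R = μR̂ that equation (4.1) below makes explicit, and with beliefs updated by Bayes' rule from the matching probabilities ai x.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's calibration).
aH, aL = 0.9, 0.3                     # job-finding abilities of the two types
r, sigma, delta, A = 0.05, 0.02, 0.1, 2.0
b_flow, y, k = 0.4, 1.0, 0.5

def W(x):                             # hypothetical decreasing wage function
    return y - k * x

p_grid = np.linspace(0.0, 0.99, 60)   # probability of being the high type
x_grid = np.linspace(0.0, 1 / aH, 60) # search choices
mu = p_grid * aH + (1 - p_grid) * aL  # expected ability (the belief state)

def update(p, x):
    """Bayes posteriors after a match success (phi) and a failure (H)."""
    m = p * aH + (1 - p) * aL
    p_succ = p * aH / m
    p_fail = p * (1 - aH * x) / (1 - m * x)
    return p_succ, p_fail

V = np.zeros_like(p_grid)
for _ in range(500):                  # iterate the Bellman operator of (3.7)
    TV = np.empty_like(V)
    for i, p in enumerate(p_grid):
        m, best = mu[i], -np.inf
        for x in x_grid:
            ps, pf = update(p, x)
            Vs = np.interp(ps, p_grid, V)     # value at success posterior
            Vf = np.interp(pf, p_grid, V)     # value at failure posterior
            R = m*x*W(x)/(A*(1-sigma)) + (delta/A)*m*x*Vs + (1 - m*x)*Vf
            best = max(best, R)
        TV[i] = (b_flow + (1 - sigma) * best) / (1 + r)
    if np.max(np.abs(TV - V)) < 1e-9:
        break
    V = TV

print(V[::10])   # V should come out increasing and weakly convex in beliefs
```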
The connection between lattice-theoretic techniques and the dynamic programming problem in (3.7) is far from obvious. First, because [−zW(−z)] is nonmonotone in z, the current payoff in the objective function, [−μzW(−z)], is not supermodular in (μ, z). Second, whatever features one imposes on the value function to make the objective function supermodular must be confirmed as those of the fixed point of the Bellman equation, (3.7). In other applications of lattice-theoretic techniques to dynamic programming, this confirmation is achieved by assuming supermodularity of the current payoff function and using concavity of the value function recursively via the Bellman equation (e.g., Amir, Mirman, and Perkins (1991) and Mirman, Morand, and Reffett (2008)). This approach is not applicable here, because the current payoff is not supermodular and the value function is convex.

To proceed, we note that optimal choices remain unchanged if we divide the objective function R by the state variable μ. This transformation eliminates the first difficulty above, that is, the ambiguous effect of the nonmonotone function [−zW(−z)] on modularity of the objective function. Accordingly, we express (3.7) as

(1 + r)V(μ) = b + (1 − σ)μ max_{z∈−X} R̂(z, μ),

where R̂ is defined as R̂(z, μ) ≡ μ⁻¹R(−z, μ), that is,

(4.1)  R̂(z, μ) = −zW(−z)/[A(1 − σ)] − (δ/A)zV(φ(μ)) + (z + 1/μ)V(H(−z, μ)).
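As a quick numerical illustration of why dividing by μ leaves the optimal choice unchanged, the sketch below evaluates a hypothetical objective R and its transform R̂ = R/μ on a grid and confirms that the maximizers coincide; the functional form is an illustrative stand-in, not the model's R.

```python
import numpy as np

z = np.linspace(-1.0, 0.0, 201)       # search choices, z = -x
beliefs = [0.4, 0.6, 0.8]             # beliefs mu > 0

def R(z, mu):
    # Hypothetical objective: nonmonotone current payoff plus continuation.
    W = 1.0 + 0.5 * z                 # stands in for W(-z)
    return -mu * z * W + (1 + mu * z) * 0.3

for mu in beliefs:
    # Dividing by the positive constant mu cannot move the argmax.
    assert np.argmax(R(z, mu)) == np.argmax(R(z, mu) / mu)
print("argmax of R and of Rhat = R/mu coincide for every tested belief")
```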
Next, note that the second term on the right-hand side of (4.1), with negative sign, is associated with the value of the information contained in a match success. This term is strictly submodular in (z, μ) because searching in a high-z market has a relatively low chance of success, while the payoff of success increases in a worker's belief. To ensure that R̂ is supermodular, it is necessary to restrict this submodular term. The value of the information contained in a match success is regulated by the job separation rate δ, because a worker uses such information only when he becomes unemployed again in the future. Accordingly, we impose an upper bound on δ:

ASSUMPTION 3: The job separation rate satisfies 0 < δ ≤ δ̄, where δ̄ > 0 is defined in part (iii) of Lemma A.1 in Appendix A.

Denote Z(μ) = arg max_{z∈−X} R̂(z, μ) and z(μ) ∈ Z(μ). Clearly, G(μ) = −Z(μ) and g(μ) = −z(μ). Denote the greatest selection of Z(μ) as z̄(μ) and
the smallest selection as z(μ). Every selection z(μ) is an increasing function if for all μa and μb in M, with μa > μb, it is true that z(μa) ≥ z(μb) for all z(μa) ∈ Z(μa) and all z(μb) ∈ Z(μb). If z(μa) > z(μb) for all μa > μb in the preceding definition, then every selection z(μ) is strictly increasing. We state the following theorem (see Appendix B for a proof).

THEOREM 4.1: Maintain Assumptions 1, 2, and 3. Part 1: R̂(z, μ) is strictly supermodular in (z, μ), and so every selection z(μ) is an increasing function. Part 2: The following statements are all equivalent to each other: (i) V(μ) is strictly convex for all μ. (ii) Every selection z(μ) ∈ Z(μ) is strictly increasing in μ. (iii) For all μ > aL, {−1/aH} ∉ Z(μ) and so Z(μ) is interior. (iv) {−1/aH} ∉ Z(aH). (v) The following condition holds:

(4.2)  (y − b)/c < (A + 1)λ′(1/aH) − aHλ(1/aH).
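Strictly increasing differences (footnote 15) and the resulting monotone selection are easy to verify numerically for a given function. The sketch below checks all discrete cross differences of a hypothetical objective on a grid and confirms that the argmax is nondecreasing in μ; the functional form is an illustrative stand-in, not the model's R̂.

```python
import numpy as np

z = np.linspace(-1.0, 0.0, 80)        # choice grid (z = -x)
mu = np.linspace(0.3, 0.9, 60)        # belief grid
Z, M = np.meshgrid(z, mu, indexing="ij")

# Hypothetical strictly supermodular objective: cross partial is 1 > 0.
F = Z * M - 0.5 * (Z + 0.6) ** 2

# Increasing differences <=> every discrete cross difference is positive:
# F(z1,m1) - F(z1,m2) - F(z2,m1) + F(z2,m2) > 0 for z1 > z2, m1 > m2.
cross = F[1:, 1:] - F[1:, :-1] - F[:-1, 1:] + F[:-1, :-1]
print("strictly increasing differences:", bool((cross > 0).all()))

# Monotone selection: the maximizer z(mu) is nondecreasing in mu.
argmax_idx = F.argmax(axis=0)
print("argmax nondecreasing in mu:", bool((np.diff(argmax_idx) >= 0).all()))
```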
Part 1 states that the policy function, z(μ), is a (weakly) increasing function and so desired wages are an increasing function of beliefs. These results are implied by strict supermodularity of R̂ (see Topkis (1998, p. 79)). R̂ is strictly supermodular because a worker's expected value in the case of a match failure, (1 + μz)V(H(−z, μ)), is strictly supermodular and Assumption 3 ensures that this supermodular component dominates the submodular component, −(δ/A)μzV(φ(μ)), that is associated with a match success.

To gain further intuition, consider a hypothetical lottery that gives a "prize" of E(a|z, k) when k = 0 (i.e., match failure) and 0 when k = 1 (i.e., match success), where k = 0 occurs with probability (1 + zμ). The expected value of the lottery conditional on k = 0 is α ≡ (1 + μz)H(−z, μ). Note that α increases in z and is strictly supermodular in (z, μ). Lowering z (i.e., increasing x = −z) always lowers α, because it reduces the probability and the expected size of the prize. This effect of a lower z has the flavor of a winner's curse. In addition, the curse gets worse as μ is higher, because the marginal impact of lowering z on α increases in μ. That is, with a higher μ, lowering z reduces the probability of winning the prize by a larger amount, in which case a match failure indicates that the expected prize is even more likely to be low. Now note that supermodularity of α translates into supermodularity of (1 + μz)V(H(−z, μ)), because V is convex and strictly increasing.

Let us make three remarks on Part 1. First, convexity of the value function plays an important role in the proof of strict supermodularity of R̂, as explained above. Second, because R̂ is strictly supermodular, every selection of z, rather than just the greatest or the smallest selection, is an increasing function. Third, it can be verified that strict supermodularity of R̂ is sufficient but not necessary for the original function R(−z, μ) to have strict single crossing in (z, μ), as defined by Milgrom and Shannon (1994) (see the Supplemental Material (Gonzalez and Shi (2010))). Although strict single crossing is enough to prove
that the policy function is weakly increasing, we need strict supermodularity of R̂ to establish strict monotonicity.

Part 2 states the necessary and sufficient conditions for the policy function to be strictly increasing, and establishes the equivalence between strict monotonicity of the policy function and strict convexity of the value function. In general, strong conditions are required for the policy function to be strictly monotone.16 In our model, only a very mild condition, (4.2), is necessary and sufficient for every selection z(μ) to be a strictly increasing function. Condition (4.2) is equivalent to statement (iv), which requires that a worker with the most optimistic belief μ = aH finds it not optimal to search for the lowest wage, that is, to search in the submarket with the lowest z. If (4.2) is not satisfied, then it is optimal for all workers to search for the lowest wage, regardless of their beliefs. In this sense, (4.2) can be viewed as a regularity condition for learning to be a useful explanation for the fact that wage losses upon reemployment increase with unemployment duration.17

To see the role of (4.2), let us first explain statements (i)–(iii) in Theorem 4.1. The equivalence between (i) and (ii) relies on the following standard property of optimal learning: The value function V is strictly convex in beliefs if and only if there do not exist μa and μb in M, with μa > μb, and a choice x0 such that x0 is optimal for all μ ∈ [μb, μa] (see Nyarko (1994)). Because every selection z(μ) is weakly increasing, as established earlier, this standard property implies that the value function is strictly convex if and only if every selection z(μ) is strictly increasing. It is easy to see that statement (ii) in Theorem 4.1 implies (iii) which, in turn, implies (iv).

The key step in the proof of Part 2 is to show that (iv) implies (i). That is, if the value function is not strictly convex, then a worker with the most optimistic belief should search for the lowest wage. To understand this result, suppose that the value function is not strictly convex. In this case, there is an interval of beliefs [μb, μa], with μa > μb, such that the optimal choice is the same under such beliefs. This must mean that under such beliefs, local variations in the positive or negative signal are not valuable to the worker. In particular, the value function must be linear in the interval of beliefs induced by a match success, [φ(μb), φ(μa)]. In this case, strict concavity of the function [−zW(−z)] implies that the payoff function is strictly concave in z and, hence, the optimal

16 See Amir (1996) and Edlin and Shannon (1998). Their methods require the value function to be continuously differentiable. In particular, Edlin and Shannon (1998) assumed that the objective function, R̂(z, μ), has increasing marginal differences. To compute marginal differences, R̂(z, μ) must be continuously differentiable with respect to z. Because R̂ depends on z through the future value function, as well as W, it is differentiable with respect to z only if the value function is so.

17 The condition in the theorem can hold simultaneously with Assumption 2. To see this, note that the right-hand side of the condition in Assumption 2 is strictly increasing in x∗ and, hence, is less than the right-hand side of the condition given in the theorem (since x∗ < 1/aH).
choice of z is unique for all μ ∈ (μb, μa). For this unique choice to be constant for all μ ∈ (μb, μa), it must be at the corner z = −1/aH; otherwise, strict supermodularity of R̂ implies that the optimal choice strictly increases in μ. Repeating the above argument, we know that for any positive integer i, the value function must be linear over beliefs in [φⁱ(μb), φⁱ(μa)] and that the optimal choice under such beliefs must be the singleton {−1/aH}, where φⁱ is defined as φⁱ(·) = φ(φⁱ⁻¹(·)). Because φⁱ(μ) converges to aH for all μ ∈ (aL, aH), the choice {−1/aH} must also be optimal when μ = aH.

The above explanation for why (iv) implies (i) relies on the assumption δ > 0.18 If δ = 0, instead, the information revealed by a match success is not valuable to the worker, because the worker will never be unemployed again. In this case, the above induction does not apply and so, for some belief μ = μa > aL, the worker may find it optimal to choose z = −1/aH. Once this happens, it is optimal for the worker to choose z = −1/aH for all μ ≤ μa, in which case the value function is linear in the subinterval [aL, μa].

It is useful to clarify the role of the equilibrium wage function for the results obtained so far. In contrast to a model of decision theory, our model requires the wage in each submarket to be consistent with free entry of firms. This equilibrium requirement results in the wage function W(−z), as given by (3.10). Given the standard assumptions on the matching technology in Assumption 1, the wage function has the properties listed in (3.11). These properties are not important for the policy functions z(μ) and w(μ) to be weakly increasing. The latter relies on strict supermodularity of R̂, which requires only that the value function V be weakly convex. However, the equilibrium wage function is critical for the policy functions to be strictly increasing. In particular, in the above explanation for why statement (iv) implies (i), we have explicitly used property (iii) in (3.11) that the function [−zW(−z)] is strictly concave. If the wage function were exogenous or if it had no connection to the matching technology, it would not be clear how it should satisfy (3.11). In this sense, the equilibrium structure of the model is essential for our analysis to capture the intuitive link between learning from search and discouragement.

5. FURTHER CHARACTERIZATION OF EQUILIBRIUM PATHS

Condition (4.2) is necessary and sufficient for optimal choices to be interior for all μ > aL (see Theorem 4.1). We exploit this feature to provide a sharper characterization of an equilibrium than in the previous section. In particular, we establish the validity of the first-order condition, a generalized version of the envelope theorem, and a discipline on the set of paths of optimal choices. Together with monotonicity of the policy function, these results provide an operational way to do dynamic programming when the value function is convex.

18 The equivalence between statements (i) and (ii) in Theorem 4.1 does not require δ > 0. However, the equivalence between these two statements and other statements does require δ > 0.
Since we maintain Assumptions 1–3 and condition (4.2) in this section, the results rely on strict supermodularity of R̂. Since we focus on symmetric equilibria, all workers with beliefs μ use the same selection z(μ).

Let us introduce some notation. For any μ, let μ+ denote the limit to μ from the right, μ− denote the limit from the left, f′(μ+) denote the right-hand derivative of any function f, and f′(μ−) denote the left-hand derivative. Recall that φ(μ) is the posterior belief reached from the prior μ through a match success. Denote the posterior belief reached through a match failure with an optimal choice as h(μ) ≡ H(−z(μ), μ). For any S in the σ-algebra of M, denote φ(S) = {φ(μ) : μ ∈ S} and h(S) = {h(μ) : μ ∈ S}. For any μ ∈ M, construct Υ(μ) = {Υi(μ)} for i = 0, 1, 2, ... by Υ0(μ) = {μ} and Υi+1(μ) = {φ(Υi(μ)), h(Υi(μ))}. We call Υ(μ) the tree of equilibrium beliefs generated from μ and call Υi(μ) the ith layer of the tree. Given μ and the optimal choice z(μ), beliefs in the next period will be φ(μ) with mean probability −z(μ)μ, and h(μ) with mean probability [1 + z(μ)μ]. In Appendix C, we establish the following theorem.

THEOREM 5.1: Let μ be any arbitrary value in the interior of (aL, aH). The following results hold: (i) V′(h(μ)) exists for all z(μ) ∈ Z(μ), and so optimal choices in every period obey the first-order condition, R̂1(z(μ), μ) = 0. (ii) z̄(μ) is right-continuous and z(μ) is left-continuous. (iii) V satisfies the envelope conditions

(1 + r)V′(μ+) = (1 − σ)R2(−z̄(μ), μ+),
(1 + r)V′(μ−) = (1 − σ)R2(−z(μ), μ−).

(iv) V′(μ) exists if and only if V′(φ(μ)) exists and z̄(μ) = z(μ). (v) If V′(μa) exists for a particular (interior) μa, such as μa = h(μ) for any interior μ, then the optimal choice z(μ′) is unique and the value function V(μ′) is differentiable at all μ′ ∈ Υ(μa).

Recall that the value function is differentiable almost everywhere (see Theorem 3.1). Part (i) above states that a match failure induces posterior beliefs at which the value function is differentiable, regardless of whether the value function is differentiable at the prior belief. To explain this result, consider an arbitrary (interior) prior belief μ and let the posterior belief following a match failure be μ′ = h(μ). If the value function is not differentiable at μ′, the left-hand derivative of V(μ′) must be strictly lower than the right-hand derivative. This implies that by searching for a wage slightly lower than w(μ) (i.e., in a submarket slightly lower than z(μ)), the worker's future marginal value falls by a discrete amount even though the worker learns only slightly more about his ability when he fails to find a match. The worker can avoid this discretely larger marginal loss by choosing z slightly above z(μ), which keeps the posterior slightly above μ′. Since the cost to increasing z is a marginal reduction
in the matching probability, the net marginal gain from increasing z slightly above z(μ) is positive. This contradicts the optimality of z(μ).

The above limited sense of differentiability of the value function is sufficient to ensure that optimal choices obey the first-order condition, as stated in part (i) of Theorem 5.1. Although the value function may fail to be differentiable if a worker has never experienced a match failure, this potential failure does not invalidate the first-order condition. The reason is that for any given prior belief, μ, search choices in the current period do not affect the posterior belief in the case of a match success, φ(μ). Thus, optimal search decisions are independent of whether or not the value function is differentiable at φ(μ). As long as the value function is differentiable at h(μ), the worker's objective function is differentiable at optimal choices, and so the first-order condition applies.

Part (ii) of Theorem 5.1 describes one-sided continuity of the highest and the smallest selection of optimal choices. Such continuity is needed for part (iii), which is a generalized version of the envelope theorem. Part (iv) states that uniqueness of the optimal choice under a belief μ is necessary, but not sufficient, for the derivative V′(μ) to exist. For the latter, the value function must also be differentiable at the posterior belief φ(μ).

Part (v) of Theorem 5.1 puts discipline on equilibrium paths. If initial beliefs lie outside the measure-zero set where the value function is not differentiable, then the value function remains differentiable on the entire tree of beliefs generated by the equilibrium, in which case the optimal choice is unique. Even if a worker's initial beliefs lie in this measure-zero set, the first match failure takes the worker out of this set, after which the value function is differentiable and the optimal choice is unique.

6. STEADY STATE DISTRIBUTIONS AND WORKER FLOWS

We now determine the distribution of workers and discuss how current unemployment durations and past unemployment spells can influence reemployment rates and wages. Immediately before the labor market opens in a period, measure employed workers with beliefs μ and type i ∈ {H, L} as ei(μ), and similarly, ûi(μ) for the unemployed. The stationary distribution of workers over beliefs is {(eH(μ), eL(μ), ûH(μ), ûL(μ)) : μ ∈ Υ(μ0)}, where Υ(μ0) is the tree of equilibrium beliefs generated from μ0.

Consider unemployed workers of type i ∈ {H, L}. There are three cases. One is that the unemployed workers are newborns. The measure of newborns with type i is

(6.1)  ûi(μ0) = σpi,
where pH = p and pL = 1 − p. The outflow from and inflow to this group are both equal to σpi , and so stationarity always holds for this group.
The second case of unemployed workers of type i is that these workers were unemployed in the previous period, in which case their beliefs in the current period are h(μ) = H(−z(μ), μ) for some μ ∈ Υ(μ0). All of these workers move out of the group in the period. The inflow is type-i unemployed workers with beliefs μ who survive exogenous exit and fail to find a match in the current period; the probability of this joint event is (1 − σ)[1 − ai g(μ)]. Thus, stationarity requires

(6.2)  ûi(h(μ)) = (1 − σ)[1 − ai g(μ)]ûi(μ),  μ ∈ Υ(μ0).
The third case of unemployed workers of type i is that these workers separated from their jobs in the previous period. These workers' beliefs in the current period are φ(μ) for some μ ∈ Υ(μ0). Again, all of these workers move out of the group in the period. The inflow is type-i employed workers with beliefs φ(μ) who exogenously separate from jobs and survive exogenous exit. Thus, stationarity requires

(6.3)  ûi(φ(μ)) = (1 − σ)δei(φ(μ)),  μ ∈ Υ(μ0).
Similarly, consider employed workers of type i with beliefs φ(μ). The outflow from the group in the period is [σ + (1 − σ)δ]ei(φ(μ)), which is generated by exogenous exit from the market and exogenous job separation. The inflow is type-i unemployed workers with beliefs μ who find a match in the current period and survive exogenous exit; the probability of this joint event is (1 − σ)ai g(μ). Thus, stationarity requires

(6.4)  [σ + (1 − σ)δ]ei(φ(μ)) = (1 − σ)ai g(μ)ûi(μ),  μ ∈ Υ(μ0).
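The transition probabilities in (6.2)–(6.4) involve the posterior maps φ and h. The following minimal sketch reconstructs these maps from Bayes' rule under the two-type structure, using the matching probabilities ai g(μ); the exact definitions of φ and H appear earlier in the paper, so treat these formulas, the constant search choice, and all numbers as illustrative assumptions. It also traces how beliefs drift down over an unemployment spell, which is the discouragement mechanism discussed below.

```python
# Two types with job-finding abilities aH > aL; a worker's belief is the
# probability pH of being the high type; mu = E[ability] is the state.
aH, aL = 0.9, 0.3
x = 0.8 / aH                      # hypothetical search choice g(mu)

def mu_of(pH):
    return pH * aH + (1 - pH) * aL

def phi(pH):                      # Bayes posterior after a match success
    return pH * aH / mu_of(pH)

def h(pH):                        # Bayes posterior after a match failure
    return pH * (1 - aH * x) / (1 - mu_of(pH) * x)

pH = 0.5
print("success lifts the belief:", mu_of(phi(pH)) > mu_of(pH))
for t in range(5):                # repeated failures: beliefs deteriorate
    pH = h(pH)
    print(f"after {t + 1} failures, mu = {mu_of(pH):.4f}")
```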
The stationary distribution is determined by (6.1)–(6.4), together with the requirement that the total measure of workers is one. Because the equilibrium is block recursive, optimal choices are independent of the distribution, and so (6.1)–(6.4) are linear equations in the measures of workers. It is straightforward to solve these equations by going through the nodes of the tree, starting at the root, μ0. Given the equilibrium tree of beliefs, Υ(μ0), the stationary distribution of workers over such beliefs is unique.

In the stationary equilibrium, the set of active submarkets is {g(μ) : μ ∈ Υ(μ0)}. In submarket g(μ), the measure of type-i workers is ui(g(μ)) = ûi(μ), where i ∈ {H, L}. The total number of matches in this submarket is [aH ûH(μ) + aL ûL(μ)]g(μ). The average job-finding probability in submarket g(μ) is

(6.5)  f(g(μ)) = {[aH ûH(μ) + aL ûL(μ)]/[ûH(μ) + ûL(μ)]} g(μ).
Given μ, this probability is stationary over time because the composition of workers in the submarket is constant in the stationary equilibrium. Similarly, the average job-finding probability in the entire economy is constant over time.
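The recursion (6.1)–(6.4) can be solved exactly as described above, by walking the belief tree Υ(μ0) from its root. A minimal sketch follows, with an assumed two-type Bayes updating rule, a hypothetical constant policy g(μ), and a finite truncation of the tree—all illustrative assumptions rather than the paper's computation; it ends by evaluating (6.5) at the root submarket.

```python
from collections import defaultdict

# Illustrative parameters (assumptions only).
aH, aL, sigma, delta, p = 0.9, 0.3, 0.02, 0.1, 0.5
g = lambda mu: 0.8 / aH                   # hypothetical search policy g(mu)

def mu_of(pH):                            # belief = expected ability
    return pH * aH + (1 - pH) * aL

def phi(pH):                              # posterior after a match success
    return pH * aH / mu_of(pH)

def h(pH, x):                             # posterior after a match failure
    return pH * (1 - aH * x) / (1 - mu_of(pH) * x)

# Measures keyed by (rounded) high-type probability; root holds newborns.
u = {"H": defaultdict(float), "L": defaultdict(float)}
e = {"H": defaultdict(float), "L": defaultdict(float)}
u["H"][round(p, 10)] = sigma * p          # (6.1) with p_H = p
u["L"][round(p, 10)] = sigma * (1 - p)    # (6.1) with p_L = 1 - p
layer = [p]

for _ in range(12):                       # truncate the tree at 12 layers
    nxt = []
    for pH in layer:
        x = g(mu_of(pH))
        key, ps, pf = round(pH, 10), phi(pH), h(pH, x)
        for i, a in (("H", aH), ("L", aL)):
            ui = u[i][key]
            # (6.2): survive exit, fail to match -> unemployed at h(mu).
            u[i][round(pf, 10)] += (1 - sigma) * (1 - a * x) * ui
            # (6.4): survive exit, match -> employed at phi(mu).
            ei = (1 - sigma) * a * x * ui / (sigma + (1 - sigma) * delta)
            e[i][round(ps, 10)] += ei
            # (6.3): separations flow back into unemployment at phi(mu).
            u[i][round(ps, 10)] += (1 - sigma) * delta * ei
        nxt += [ps, pf]
    layer = nxt

# Average job-finding probability at the root submarket, equation (6.5).
k = round(p, 10)
f_root = (aH*u["H"][k] + aL*u["L"][k]) / (u["H"][k] + u["L"][k]) * g(mu_of(p))
print("f(g(mu0)) =", round(f_root, 4))
```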
To see how unemployment duration influences reemployment rates and wages, let us follow a given cohort of unemployed workers with beliefs μ. As established in Theorem 4.1, workers search for lower wages as their beliefs about their ability deteriorate with unemployment duration. Accordingly, discouragement is reflected in wage losses at reemployment, providing a natural explanation for the negative effect of unemployment duration on future wages found by Addison and Portugal (1989).19 This mechanism can also help to explain why similar workers are paid different wages (see Burdett and Mortensen (1998) and Mortensen (2003)).

On reemployment rates, learning from search has two effects. First, for any given ability ai, the job-finding probability ai g(μ) increases in the course of unemployment as workers search for lower wages that are easier to get. Second, as high-ability workers are more successful in getting jobs and exiting from unemployment, the average ability in the cohort remaining unemployed decreases with unemployment duration, which reduces the average job-finding probability in the cohort. More precisely, the average job-finding probability in the cohort, given by (6.5), is an increasing function of the ratio of high- to low-ability workers in the cohort, ûH(μ)/ûL(μ), which decreases as μ decreases with unemployment duration. When this composition effect dominates the effect of g(μ), the average job-finding probability falls with unemployment duration.

Confounding the above composition effect, workers who become unemployed at the same time can differ in their beliefs μ because their histories of past unemployment can differ. This implication naturally suggests that an empirical investigation of job-finding probabilities and reemployment wages should take into account not only the worker's most recent unemployment spell, as is typically done in the empirical literature, but also the history of the worker's previous unemployment spells.

Our theory suggests a simple empirical strategy to take into account a worker's labor market history. Because a worker's beliefs follow a Markov process, the effect of past labor market history is summarized by the worker's beliefs when entering the most recent unemployment spell. In turn, the latter beliefs have a monotone relationship to the worker's wage at the most recent job. Thus, a worker's pre-unemployment wage serves the role of summarizing the worker's previous experience in the labor market. This role complements conventional human-capital explanations that view wages as a summary of the workers' human capital.

The previous argument also provides a novel explanation for Addison and Portugal's (1989) finding that unemployment duration increases with preunemployment wages after controlling for skills. Workers with higher preunemployment wages are those who had relatively shorter durations in previous unemployment spells and, hence, are more optimistic about their ability

19 Addison and Portugal (1989) controlled for observed heterogeneity by including, for example, schooling, age, race, location, experience, and industry dummies. They also controlled for unobserved heterogeneity by estimating a predisplacement wage equation first and then imposing the resulting restrictions in the postdisplacement wage equation.
when entering the current unemployment spell. These workers will search for jobs offering higher wages, which are relatively harder to get.

7. CONCLUSION

In this paper, we proposed an equilibrium theory of learning from search in the labor market. The main assumption is that unemployed workers have incomplete information about their job-finding ability and learn about their ability from search outcomes. Success and failure of search both convey useful information about a worker's type. As workers experience different search outcomes, their labor market histories and, hence, their beliefs about their ability diverge. The theory formalizes a notion akin to discouragement. That is, over each unemployment spell, unemployed workers update their beliefs about their job-finding ability downward and reduce not only reservation wages, but also desired wages. Firms cater to these workers by offering different wages. Thus, learning from search generates endogenous heterogeneity in workers' histories that can be useful for understanding how unemployment can affect workers' labor market outcomes and wage determination.

Our paper integrates learning from search into an equilibrium framework to determine jointly the workers' search behavior, the incentives to create jobs, and the wage distribution. The equilibrium analysis was made tractable with directed search, which made the equilibrium block recursive in the sense that search behavior and market tightness are independent of the wage distribution. Another contribution of the paper is to provide a set of results in dynamic programming when the value function is convex. We identified a connection between convexity of a worker's value function in beliefs and the property of supermodularity, established the property that the policy functions are monotone, and provided conditions under which the first-order condition and the envelope condition are valid. These results are likely to be useful in other learning problems, because convexity of the value function in beliefs is inherent to optimal learning from experience.

The equilibrium theory of learning from search provides a novel mechanism for generating endogenous heterogeneity among unemployed workers. The learning process turns ex ante identical workers into ex post heterogeneous workers who differ in posterior beliefs about their job-finding probabilities. Such endogenous heterogeneity makes a worker's entire labor market history relevant for his future labor market outcomes. With block recursivity, it will be feasible and interesting to examine the interactions between such endogenous heterogeneity and ex ante heterogeneity among workers and firms.

APPENDIX

See the Supplemental Material (Gonzalez and Shi (2010)) for complete proofs.
A. Proof of Theorem 3.1

First, we prove existence of the equilibrium. Given the analysis leading to Theorem 3.1, it suffices to show that Assumption 2 is sufficient for all matches to be accepted, in which case V indeed obeys (3.7). Consider a worker with beliefs μ ∈ M who gets a match in submarket x ∈ X. The worker strictly prefers to accept the match if and only if Je(φ(μ), W(x)) > V(φ(μ)), which is equivalent to W(x) > (r + σ)V(φ(μ)). A sufficient condition is that the inequality holds for x = 1/aH and μ = aH. Substituting V(aH) from (A.1) in Lemma A.1, we rewrite this sufficient condition as (y − b)/c > [A + aHxH]λ′(x∗) − aHλ(xH), where x∗ is defined by λ′(x∗) = aHλ(1/aH) and xH = g(aH). Since the right-hand side of the inequality is maximized at xH = x∗, the inequality is ensured by Assumption 2.

Second, we prove that g(μ) > 0 for all g(μ) ∈ G(μ) and all μ ∈ M. Suppose that g(μ) = 0 for some μ ∈ M, contrary to the theorem. In this case, (3.6) and (3.7) yield R(0, μ) = V(μ) = b/(r + σ). Substituting this value of V for the future value function, we obtain a lower bound on the payoff R, say, R̃(x, μ). Using Remark 1, we can prove that some x0 > 0 maximizes R̃(x, μ) and achieves R̃(x0, μ) > R(0, μ)—contradiction.

Third, we prove that V is strictly increasing. Let TV(μ) denote the right-hand side of (3.7). Since T is a contraction mapping, it suffices to prove that TV(μa) > TV(μb) for any continuous and increasing function V and any μa, μb ∈ M, with μa > μb (see Stokey, Lucas, and Prescott (1989)). Denote gi = g(μi) ∈ G(μi), where i ∈ {a, b}. We have

R(ga, μa) − R(gb, μb) ≥ R(gb, μa) − R(gb, μb)
≥ gb(μa − μb)[W(gb)/(A(1 − σ)) + (δ/A)V(φ(μb)) − V(H(gb, μb))]
> gb(μa − μb)[V(φ(μb)) − V(H(gb, μb))] ≥ 0.

The first inequality comes from the fact that gi ∈ arg maxx R(x, μi) and the second inequality comes from V(H(gb, μa)) ≥ V(H(gb, μb)). The strict inequality uses the fact that gb > 0 and W(x) > (r + σ)V(φ(μ)) for all x and μ (see above proof). The last inequality comes from φ(μb) ≥ H(gb, μb). Hence, TV(μa) > TV(μb).

Finally, (weak) convexity of V follows from standard arguments (e.g., Nyarko (1994, Proposition 3.2)). Because a convex function is almost everywhere differentiable (see Royden (1988, pp. 113–114)), V is almost everywhere differentiable. Q.E.D.

The proofs of the following lemmas are omitted:
LEMMA A.1: Denote xi = g(ai), where i ∈ {H, L}. The following results hold: (i) The optimal choice xi is unique and satisfies R1(xi, ai) ≥ 0, with strict inequality only if xi = 1/aH. The value function satisfies

(A.1)  V(ai) = [Ab + ai xi W(xi)]/[(r + σ)(A + ai xi)].
(ii) Condition (4.2) is necessary and sufficient for xH < 1/aH. Also, xL ≥ xH, with strict inequality if xH < 1/aH. (iii) δ/A < V′(aL+)/V′(aH−) for all δ ≤ δ̄, where δ̄ is the smallest positive solution to Ω(δ) = 0 and Ω is defined as

(A.2)  Ω(δ) = [(r + σ)/(1 − σ) + δ]² − δ(aL/aH)[1 + (r + σ)/(1 − σ) + δ] + [(r + σ)/(1 − σ)](aL/aH).

LEMMA A.2: For any given z, the functions μV(φ(μ)) and (1 + zμ)V(H(−z, μ)) are convex in μ if V(·) is convex, and strictly convex in μ if V(·) is strictly convex.

B. Proof of Theorem 4.1

First, we prove that R̂(z, μ) is strictly supermodular. Once this is done, the monotone selection theorem in Topkis (1998, Theorem 2.8.4, p. 79) implies that every selection from Z(μ) is increasing. To prove that R̂ is strictly supermodular, take arbitrary za, zb ∈ −X, and arbitrary μa, μb ∈ M, with za > zb and μa > μb. Denote D = [R̂(za, μa) − R̂(za, μb)] − [R̂(zb, μa) − R̂(zb, μb)]. We need to show D > 0. Temporarily denote φj = φ(μj), Hij = H(−zi, μj), and Vij = V(Hij), where i, j ∈ {a, b}. Computing D, we have

D = D1 − [V(φa) − V(φb)](za − zb)δ/A ≥ D1 − V′(aH−)[φa − φb](za − zb)δ/A,

where the inequality follows from convexity of V and where D1 denotes

D1 = (za + 1/μa)Vaa − (zb + 1/μa)Vba − (za + 1/μb)Vab + (zb + 1/μb)Vbb.
Denote H̃ = min{Hba, Hab}. Because H(−z, μ) is a strictly increasing function of z and μ for all μ ∈ (aL, aH), then Haa > H̃ ≥ Hbb. Because V is convex, we have

min{[Vaa − Vba]/[Haa − Hba], [Vaa − Vab]/[Haa − Hab]} ≥ [Vaa − V(H̃)]/[Haa − H̃] ≥ [Vaa − Vbb]/[Haa − Hbb].

Substituting Vba, Vab, and Vbb from these inequalities, and substituting H̃, we have

D1 ≥ (za − zb)(φa − φb)[Vaa − V(H̃)]/[Haa − H̃] ≥ V′(aL+)(za − zb)(φa − φb),
where the second inequality follows from convexity. Thus, a sufficient condition for D > 0 is δ/A < V′(aL+)/V′(aH−), which is implied by Assumption 3 (see Lemma A.1).

We next establish that the five statements (i)–(v) in Theorem 4.1 are equivalent.

(i) ⇐⇒ (ii) Optimal learning has the following standard property (see Nyarko (1994, Proposition 4.1)): The value function is strictly convex in beliefs if and only if there do not exist μa and μb in M, with μa > μb, and a choice z0 such that z0 ∈ Z(μ) for all μ ∈ [μb, μa]. Since z(μ) is an increasing function, as proven above, the standard property implies that V is strictly convex if and only if every selection z(μ) is strictly increasing for all μ.

(ii) ⇒ (iii) Suppose {−1/aH} ∈ Z(μa) for some μa > aL so that (iii) is violated. Because every selection z(μ) is increasing, Z(μ) contains only the singleton {−1/aH} for all μ < μa. In this case, (ii) does not hold for μ ≤ μa. Note that since z(μ) < 0 by Theorem 3.1, the result {−1/aH} ∉ Z(μ) implies that Z(μ) is interior.

(iii) ⇒ (iv) This follows from aH > aL.

(iv) ⇐⇒ (v) See part (ii) of Lemma A.1.

(iv) ⇒ (i) We prove that a violation of (i) implies that {−1/aH} ∈ Z(aH), which violates (iv). Suppose that V is not strictly convex. Proposition 4.1 in Nyarko (1994) implies that there exist μa and μb in M, with μa > μb, and a choice z0 such that z0 ∈ Z(μ) and V(μ) is linear for all μ ∈ [μb, μa]. Since μa > μb, let μb > aL and μa < aH without loss of generality. We deduce that V(μ) is linear for all μ ∈ [φ(μb), φ(μa)]: If V(μ) were strictly convex in any subinterval of [φ(μb), φ(μa)], Lemma A.2 above would imply that R(−z0, μ) is strictly convex in μ in some subinterval of [μb, μa]. Similarly, V(μ) is linear for all μ ∈ [Hb, Ha], where Hi denotes H(−z0, μi) for i ∈ {a, b}. Denote the slope of V as V′(φb) for μ ∈ [φ(μb), φ(μa)] and V′(Hb) for μ ∈ [Hb, Ha]. For all μ ∈ [μb, μa], we have

R̂(z, μ) = −zW(−z)/[(1 − σ)A] − (δz/A)[V(φ(μb)) + V′(φb)(φ(μ) − φ(μb))] + (1/μ)(1 + zμ)[V(Hb) + V′(Hb)(H(−z, μ) − Hb)].

Because (1 + zμ)H(−z, μ) is linear in z, the last two terms in the above expression are linear in z. In this case, part (iii) in (3.11) implies that R̂(z, μ) is strictly concave in z and twice continuously differentiable in z and μ for all μ ∈ [μb, μa]. Thus, the optimal choice z(μ) is unique and, by the supposition, equal to z0. Since z0 < 0 (see Theorem 3.1), z0 satisfies the complementary slackness condition, R̂1(z0, μ) ≤ 0 and z0 ≥ −1/aH. Moreover, in this case, strict supermodularity of R̂ implies R̂12(z, μ) > 0 and strict concavity of R̂ in z implies R̂11(z, μ) < 0 for all μ ∈ [μb, μa]. If z0 > −1/aH, then R̂1(z0, μ) = 0, which implies dz0/dμ = −R̂12/R̂11 > 0. This contradicts the supposition that z0 is constant for all μ ∈ [μb, μa]. Thus, z0 = −1/aH.
Repeat the above argument for all μ ∈ [φⁱ(μb), φⁱ(μa)], where φⁱ(μ) = φ(φⁱ⁻¹(μ)) and i = 1, 2, .... For such μ, V is linear and Z(μ) is the singleton {−1/aH}. Take an arbitrary μc ∈ (μb, μa). Since Z(φⁱ(μc)) = {−1/aH} for all positive integers i, then limi→∞ Z(φⁱ(μc)) = {−1/aH}. From the definition of φ(μ), it is clear that φ(aH) = aH, φ(aL) = aL, and φ(μ) > μ for all μ ∈ (aL, aH). Thus, limi→∞ φⁱ(μ) = aH for every μ ∈ (aL, aH) and, particularly, for μ = μc. Because Z is upper hemicontinuous, we conclude that {−1/aH} ∈ Z(aH). Q.E.D.

C. Proof of Theorem 5.1

Fix μ ∈ (aL, aH) and use the notation h(μ) = H(−z(μ), μ).

(i) Because H(−z, μ) is increasing in z, H(−z+(μ), μ) = h+(μ) and H(−z−(μ), μ) = h−(μ). Since V′(h+(μ)) ≥ V′(h−(μ)), we can prove that R̂1(z+(μ), μ) ≥ R̂1(z−(μ), μ) (see the Supplemental Material (Gonzalez and Shi (2010))). However, the optimality of z(μ) requires R̂1(z+(μ), μ) ≤ 0 ≤ R̂1(z−(μ), μ). It must be true that R̂1(z−(μ), μ) = R̂1(z+(μ), μ) = 0, which requires that V′(h−(μ)) = V′(h+(μ)) = V′(h(μ)).

(ii) Let {μi} be a sequence with μi → μ and μi ≥ μi+1 ≥ μ for all i. Because z̄(μ) is an increasing function, {z̄(μi)} is a decreasing sequence, and z̄(μi) ≥ z̄(μ) for all i. Thus, z̄(μi) ↓ zc for some zc ≥ z̄(μ). On the other hand, the theorem of the maximum implies that the correspondence Z(μ) is upper hemicontinuous (see Stokey, Lucas, and Prescott (1989, p. 62)). Because μi → μ, and z̄(μi) ∈ Z(μi) for each i, upper hemicontinuity of Z implies that there is a subsequence of {z̄(μi)} that converges to an element in Z(μ). This element must be zc, because all convergent subsequences of a convergent sequence must have the same limit. Thus, zc ∈ Z(μ), and so zc ≤ max Z(μ) = z̄(μ). Therefore, z̄(μi) ↓ zc = z̄(μ), which shows that z̄(μ) is right-continuous. Similarly, by examining the sequence {μi} with μi → μ and μ ≥ μi+1 ≥ μi for all i, we can show that z is left-continuous.

(iii) Let μa be another arbitrary value in the interior of (aL, aH). Because z̄(μ) maximizes R(−z, μ) for each given μ, then

(1 + r)V(μa) = b + (1 − σ)R(−z̄(μa), μa) ≥ b + (1 − σ)R(−z̄(μ), μa),
(1 + r)V(μ) = b + (1 − σ)R(−z̄(μ), μ) ≥ b + (1 − σ)R(−z̄(μa), μ).

For μa > μ, we have

[R(−z̄(μ), μa) − R(−z̄(μ), μ)]/[(1 + r)(μa − μ)] ≤ [V(μa) − V(μ)]/[(1 − σ)(μa − μ)] ≤ [R(−z̄(μa), μa) − R(−z̄(μa), μ)]/[(1 + r)(μa − μ)].
Take the limit μa ↓ μ. Under (4.2), V′(H(−z̄(μa), μa)) exists for each μa (see part (i)). Because z̄(μa) is right-continuous, limμa↓μ z̄(μa) = z̄(μ). Thus, all three ratios above converge to the same limit, [1/(1 − σ)]V′(μ+) = [1/(1 + r)]R2(−z̄(μ), μ+), where R2(−z̄(μ), μ+) is given as

−[W(−z̄(μ))/((1 − σ)A) + (δ/A)V(φ(μ)) − V(H(−z̄(μ), μ))]z̄(μ) − [μz̄(μ)δ/A]V′(φ(μ))φ′(μ) + [μz̄(μ) + 1]V′(H(−z̄(μ), μ))H2(−z̄(μ), μ).

Similarly, using left-continuity of z(μa), we can prove that V′(μ−) = [(1 − σ)/(1 + r)]R2(−z(μ), μ−).

(iv) From the above expression for R2 and the relation R = μR̂, we can verify

R2(−z̄(μ), μ+) ≥ R2(−z̄(μ), μ−) ≥ R̂(z(μ), μ−) + μR̂2(z(μ), μ−) = R2(−z(μ), μ−).

The first inequality comes from strict convexity of V, and it is strict if and only if V′(φ+(μ)) > V′(φ−(μ)). The second inequality comes from strict supermodularity of R̂(z, μ), and it is strict if and only if z̄(μ) > z(μ). Therefore, V′(μ+) = V′(μ−) if and only if V′(φ(μ)) exists and z̄(μ) = z(μ).

(v) Assume that V′(μa) exists for a particular (interior) μa, such as μa = h(μ) for any arbitrary interior μ. By part (iv), z(μa) is unique and V′(φ(μa)) exists. Recall that V′(h(μa)) always exists, by part (i). Since V is now differentiable at all posterior beliefs reached from μa under the optimal choice, we can take each of these subsequent nodes and repeat the argument. This shows that the optimal choice is unique and the value function is differentiable at all nodes on the tree generated from μa in the equilibrium. Q.E.D.

REFERENCES

ACEMOGLU, D., AND R. SHIMER (1999): "Efficient Unemployment Insurance," Journal of Political Economy, 107, 893–928. [510,513]
ADDISON, J. T., AND P. PORTUGAL (1989): "Job Displacement, Relative Wage Changes, and Duration of Unemployment," Journal of Labor Economics, 7, 281–302. [509–511,529]
ALEXOPOULOS, M., AND T. GLADDEN (2007): "The Effects of Wealth and Unemployment Benefits on Search Behavior and Labor Market Transitions," Manuscript, University of Toronto. [510]
AMIR, R. (1996): "Sensitivity Analysis of Multisector Optimal Economic Dynamics," Journal of Mathematical Economics, 25, 123–141. [524]
AMIR, R., L. J. MIRMAN, AND W. R. PERKINS (1991): "One-Sector Nonclassical Optimal Growth: Optimality Conditions and Comparative Dynamics," International Economic Review, 32, 625–644. [512,522]
BALVERS, R. J., AND T. F. COSIMANO (1993): "Periodic Learning About a Hidden State Variable," Journal of Economic Dynamics and Control, 17, 805–827. [511,521]
BARRON, J. M., J. BISHOP, AND W. C. DUNKELBERG (1985): "Employer Search: The Interviewing and Hiring of New Employees," The Review of Economics and Statistics, 67, 43–52. [511]
BLACKWELL, D. (1951): "The Comparison of Experiments," in Proceedings of the Second Berkeley Symposium on Statistics and Probability. Berkeley: University of California Press. [516]
BURDETT, K., AND D. T. MORTENSEN (1998): "Wage Differentials, Employer Size, and Unemployment," International Economic Review, 39, 257–273. [529]
BURDETT, K., AND T. VISHWANATH (1988): "Declining Reservation Wages and Learning," Review of Economic Studies, 55, 655–665. [512,518]
BURDETT, K., S. SHI, AND R. WRIGHT (2001): "Pricing and Matching With Frictions," Journal of Political Economy, 109, 1060–1085. [510]
DEGROOT, M. H. (1970): Optimal Statistical Decisions. New York: McGraw-Hill. [516]
EASLEY, D., AND N. M. KIEFER (1988): "Controlling a Stochastic Process With Unknown Parameters," Econometrica, 56, 1045–1064. [511,521]
EDLIN, A. S., AND C. SHANNON (1998): "Strict Monotonicity in Comparative Statics," Journal of Economic Theory, 81, 201–219. [524]
GONZALEZ, F. M., AND S. SHI (2007): "An Equilibrium Theory of Declining Reservation Wages and Learning," Working Paper 292, University of Toronto. [512]
——— (2010): "Supplement to 'An Equilibrium Theory of Learning, Search and Wages'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8061_proofs.pdf. [523,530,534]
HECKMAN, J. T., AND G. J. BORJAS (1980): "Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers From a Continuous Time Model of Heterogeneity and State Dependence," Economica, 47, 247–283. [510]
HOLZER, H. J., L. F. KATZ, AND A. B. KRUEGER (1991): "Job Queues and Wages," Quarterly Journal of Economics, 106, 739–768. [511]
LAZEAR, E. (2004): "Firm-Specific Human Capital: A Skill-Weights Approach," Manuscript, Stanford University. [513]
LOCKWOOD, B. (1991): "Information Externalities in the Labour Market and the Duration of Unemployment," Review of Economic Studies, 58, 733–753. [510]
MENZIO, G., AND S. SHI (2009): "Block Recursive Equilibria for Stochastic Models of Search on the Job," Journal of Economic Theory (forthcoming). [511]
MILGROM, P., AND C. SHANNON (1994): "Monotone Comparative Statics," Econometrica, 62, 157–180. [511,523]
MIRMAN, L. J., O. F. MORAND, AND K. L. REFFETT (2008): "A Qualitative Approach to Markovian Equilibrium in Infinite Horizon Economies With Capital," Journal of Economic Theory, 139, 75–98. [512,522]
MOEN, E. R. (1997): "Competitive Search Equilibrium," Journal of Political Economy, 105, 385–411. [510,513]
MORTENSEN, D. T. (2003): Wage Dispersion: Why Are Similar Workers Paid Differently? Cambridge: MIT Press. [529]
NYARKO, Y. (1994): "On the Convexity of the Value Function in Bayesian Optimal Control Problems," Economic Theory, 4, 303–309. [520,524,531,533]
PETERS, M. (1984): "Bertrand Equilibrium With Capacity Constraints and Restricted Mobility," Econometrica, 52, 1117–1129. [510]
——— (1991): "Ex ante Price Offers in Matching Games: Non-Steady State," Econometrica, 59, 1425–1454. [510]
PISSARIDES, C. A. (1992): "Loss of Skill During Unemployment and the Persistence of Employment Shocks," Quarterly Journal of Economics, 107, 1371–1391. [510]
ROYDEN, H. L. (1988): Real Analysis. New York: Macmillan Co. [531]
SANTOS, M. (1991): "Smoothness of the Policy Function in Discrete Time Economic Models," Econometrica, 59, 1365–1382. [521]
SHI, S. (2001): "Frictional Assignment I: Efficiency," Journal of Economic Theory, 98, 232–260. [510]
——— (2009): "Directed Search for Equilibrium Wage-Tenure Contracts," Econometrica, 77, 561–584. [511,519]
SHIMER, R. (2008): "The Probability of Finding a Job," American Economic Review: Papers and Proceedings, 98, 268–273. [511]
STOKEY, N., R. E. LUCAS, JR., AND E. PRESCOTT (1989): Recursive Methods in Economic Dynamics. Cambridge, MA: Harvard University Press. [511,520,521,531,534]
TOPKIS, D. M. (1998): Supermodularity and Complementarity. Princeton, NJ: Princeton University Press. [511,521–523,532]
Dept. of Economics, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada, T2N 1N4;
[email protected] and Dept. of Economics, University of Toronto, 150 St. George Street, Toronto, Ontario, Canada, M5S 3G7;
[email protected]. Manuscript received August, 2008; final revision received December, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 539–574
SORTING AND DECENTRALIZED PRICE COMPETITION

BY JAN EECKHOUT AND PHILIPP KIRCHER1

We investigate the role of search frictions in markets with price competition and how it leads to sorting of heterogeneous agents. There are two aspects of value creation: the match value when two agents actually trade and the probability of trading governed by the search technology. We show that positive assortative matching obtains when complementarities in the former outweigh complementarities in the latter. This happens if and only if the match-value function is root-supermodular, that is, its nth root is supermodular, where n reflects the elasticity of substitution of the search technology. This condition is weaker than the condition required for positive assortative matching in markets with random search.

KEYWORDS: Competitive search equilibrium, directed search, two-sided matching, decentralized price competition, complementarity, root-supermodularity, sorting.

© 2010 The Econometric Society    DOI: 10.3982/ECTA7953
1. INTRODUCTION

WE ADDRESS THE ROLE OF SEARCH FRICTIONS in the classic assignment problem when there is price competition. We are interested in a simple condition for positive assortative matching (PAM) that exposes the different forces that induce high types to trade with other high types. In the neoclassical benchmark (Becker (1973), Rosen (1974)), there is full information about prices and types, and markets clear perfectly. Supermodularity of the match value then induces PAM. At the other extreme, Shimer and Smith (2000) assumed that there are random search frictions, and agents cannot observe prices and types until after they meet. They derived a set of conditions that ensure PAM and that jointly imply that the match value is log-supermodular. In this paper, we consider a world with search frictions, yet there is information about prices and types. This circumvents the feature of the random search model that agents necessarily meet many trading partners that they would rather have avoided. Heterogeneous sellers compete in prices for buyers, and we find that sorting is driven by a simple efficiency trade-off between the gains from better match values and the losses due to no trade. The former are captured by complementarities in the match value, which have to offset complementarities in the search technology as measured by the elasticity of substitution. This economic trade-off establishes that PAM occurs for all type distributions if and only if the match value is root-supermodular, that is, its nth root is supermodular where n depends on the elasticity of substitution of the search technology.

1 The paper initially circulated under the title "The Sorting Effect of Price Competition." We would like to thank numerous colleagues and seminar participants for insightful discussions and comments. We greatly benefited from comments by Ken Burdett, John Kennan, Stephan Lauermann, Benny Moldovanu, Michael Peters, Andrew Postlewaite, Shouyong Shi, Robert Shimer, and Randy Wright. Kircher gratefully acknowledges support from the National Science Foundation, Grant SES-0752076, and Eeckhout acknowledges support from the ERC, Grant 208068.
This condition is weaker than log-supermodularity and has a transparent economic interpretation.

The key ingredients of our model are diversity, market frictions, and price competition. Diversity is the hallmark of economic exchange. People have different preferences over goods and are endowed with diverse talents. Such diverse tastes and endowments lead to different market prices that are driven by the supply and demand of each variety. Spatially differentiated goods like houses, for example, are priced depending on the characteristics of the occupants, location, and the dwelling itself. Assets in the stock market are differentiated depending on many characteristics, most notably mean and variance. In labor markets, salaries vary substantially depending on the experience and skill of the worker and on the productivity and safety of the job. While centralized price setting (see Rosen (2002) for an overview) adequately captures environments such as the stock market, in many other environments trading is decentralized and frictions are nonnegligible. In the labor market, for example, unemployment is a natural feature; in the housing market, several months' delay in finding a buyer is usual. To capture these features, we consider a decentralized market framework with search frictions, yet with price competition. This framework is known as directed search or competitive search.

Sellers have one unit for sale and buyers want to buy one unit. Think of "locations" or "submarkets" indexed by the quality of the product and the trading price. Sellers of a particular quality choose the location with the price they want to obtain. Buyers observe the sellers at the various locations and decide at which location they would like to trade, that is, which quality–price combination to seek. At each location there remain search frictions that prevent perfect trade: When the ratio of buyers to sellers at a location is high, then the probability of trade is high for the sellers and low for the buyers. Observe that the location metaphor is used for simplicity but is not crucial (e.g., in Peters (1991, 1997a) buyers choose an individual seller with the desired quality–price announcement, but sometimes multiple buyers choose the same seller and not all can trade). Prices guide the trading decisions just like in the Walrasian model of Becker (1973) and Rosen (1974), only now the possibility that a person cannot trade remains an equilibrium feature that is taken into account in the price setting.

One novelty of our setting relative to the earlier directed search literature is that it is designed to handle rich (continuous) type distributions on both sides of the market. We identify the economic forces that drive the sorting pattern, and provide a necessary and sufficient condition on the strength of supermodularity that ensures positive assortative matching. The key economic insight is that the creation of value can be decomposed into two sources: the complementarity in the match value upon trading and the complementarity in the search technology. In the Walrasian framework, only the first source is present. When both are present, they trade off against each other: the first leads toward positive assortative matching; the second leads toward negative assortative matching.
If the former outweighs the latter, positive assortative matching obtains. We can summarize the necessary and sufficient condition required for PAM by root-supermodularity of the match-value function, that is, the nth root of the match-value function is supermodular. The magnitude of n is determined by the upper bound of the elasticity of substitution of the search technology. Similarly, match values that are nowhere n-root-supermodular lead to negative assortative matching (NAM), where n now denotes the lower bound of the elasticity of substitution in the search technology.

The economic intuition of this trade-off between frictions and complementarities in match values is transparent in terms of the fundamentals of the economy. In the absence of any complementarities, sorting is not important for the creation of match value. The key aspect is "trading security," that is, to ensure trade and avoid frictions. High-type buyers would like to trade where few other buyers attempt to trade. This allows them to secure trade with high probability and they are willing to pay for this. While sellers know that they might be idle if they attract few buyers on average, some are willing to do this at a high enough price. The low-type sellers are those who find it optimal to provide this trading security, as their opportunity cost of not trading is lowest. This results in negative assortative matching: high-type buyers match with low-type sellers.

In the directed search literature, Shi (2001) was the first to highlight for a specific search technology that supermodularity is not enough to ensure positive assortative matching. Here, we address in a general context the extent of the complementarities required for positive assortative matching and we isolate the economic forces that govern such sorting.2 How much supermodularity is needed—how fast marginal output changes across different matched types—depends on how fast the probability of matching changes when moving across different types with different buyer-to-seller ratios. The change in the matching probability is captured by the elasticity of substitution of the search technology. The elasticity of substitution measures how many more matches are created as the ratio of buyers to sellers increases. If it is high, then matching rates are very sensitive to the buyer–seller ratio and submarkets with lots of low-type sellers make it easy for the high-type buyers to trade, while submarkets with lots of low-type buyers make it easy for the high seller types to trade. The "trading-security" motive is important since the gains from negative sorting are large, and positive sorting only arises if the match value improves substantially when high types trade with high types rather than low types. If the elasticity of substitution in the search technology is low, then it is difficult to generate additional matches for the high types and even moderate strength of the match value motive will offset the tendency to seek trading security.

The exact level of supermodularity required for positive sorting can be expressed by requiring a concave transformation of the match value to be supermodular. In particular, it can be summarized by the (relative) Arrow–Pratt measure of the transform, which has to be as large as the elasticity of substitution of the search technology. The latter is in the unit interval, so the associated transform is the nth root, where n depends on the exact magnitude of the elasticity of substitution. The root-supermodularity condition therefore neatly summarizes the trade-off between complementarity in match value and the elasticity of substitution of the search technology.

For PAM our condition is weaker than log-supermodularity required in random search models such as Shimer and Smith (2000)3 and Smith (2006). The key difference is that our framework allows agents to seek the quality and price they desire. This leads to a rather simple and straightforward condition for sorting. It requires a lower degree of complementarity in the match value to overcome the search frictions. Only when the search technology approaches perfect substitutability is log-supermodularity needed. Our condition for positive assortative matching therefore falls in between those for frictionless trade of Becker and random search. Yet, when it comes to negative assortative matching, our results differ substantially. Match values that are nowhere n-root-supermodular induce negative sorting. In particular, this is the case for any weakly submodular match-value function. If the matching technology never approaches perfect complementarity (this excludes the urn–ball search technology), then there are strictly supermodular match-value functions such that negative sorting arises for any distribution of types. To our knowledge, this is new in the literature on sorting with or without frictions. In comparison, negative assortative matching obtains only under stronger conditions both in the frictionless case (strict submodularity) and with random search (log-submodularity).

Our requirement of root-supermodularity is necessary and sufficient to ensure positive assortative matching if we allow for any distribution of types. It is binding when the buyer–seller ratio in some market induces the highest possible elasticity of substitution of the search technology. For some distributions, this is not a binding restriction, and in this case there are match value functions with less complementarity that nonetheless induce positive assortative matching. In that sense, our condition is one of weak necessity. Likewise, the condition that ensures negative assortative matching for any distribution of types is stringent, requiring, for example, the absence of any complementarities for the case of urn–ball matching. Again, we show that for many search technologies (such as urn–ball) there exist particular distributions for which weaker requirements suffice.

2 We relate our findings to Shi's (2001) insight in greater detail in Section 6.

3 The models are not immediately comparable partly because random search requires a set-based notion of assortative matching, while the frictionless benchmark and our model do not. Note also that the conditions in Shimer and Smith (2000) include log-supermodularity even of first and cross-partial derivatives, but not log-supermodularity. However, coupled with monotonicity as assumed throughout our model, log-supermodularity is implied by their conditions. We discuss the relation to Shimer and Smith (2000) and other work in Section 6.
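For readers who want to check n-root-supermodularity of a candidate match-value function directly, the following is a minimal grid-based sketch; the match-value function and the values of n below are illustrative assumptions only, whereas in the paper n is tied to the elasticity of substitution of the search technology.

```python
import numpy as np

def is_root_supermodular(f, n, x, y):
    """Check whether f(x, y)**(1/n) is supermodular on a grid:
    all discrete cross differences of g = f**(1/n) must be >= 0."""
    X, Y = np.meshgrid(x, y, indexing="ij")
    g = f(X, Y) ** (1.0 / n)
    cross = g[1:, 1:] - g[1:, :-1] - g[:-1, 1:] + g[:-1, :-1]
    return bool((cross >= -1e-12).all())

x = np.linspace(0.1, 1.0, 200)
y = np.linspace(0.1, 1.0, 200)

# Illustrative match value: f(x, y) = exp(x*y) is log-supermodular,
# hence n-root-supermodular for every n >= 1.
f = lambda X, Y: np.exp(X * Y)
for n in (1, 2, 4):
    print(f"n = {n}:", is_root_supermodular(f, n, x, y))
```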
Our results hold for very general search technologies and match values. Yet, it turns out that a large class of widely used search technologies shares a common condition, that of square-root-supermodularity. This is the case for any search technology that has bounds on its derivatives at zero and some curvature restriction, for example, the urn–ball search technology. In this class, the value of the elasticity at zero is always one-half. In contrast, the constant elasticity of substitution (CES) search technology satisfies the Inada conditions and, therefore, does not have bounded derivatives. Because its elasticity of substitution is constant, it separates the range of positive and negative sorting exactly.
Finally, we establish existence of a sorting equilibrium and show efficiency, that is, that the planner's solution can be decentralized. While the efficiency properties of directed search models are well known (see, e.g., Moen (1997), Acemoglu and Shimer (1999b), and Shi (2001)), we discuss in particular the connection of our condition to the well-known Hosios condition. Hosios' (1990) original contribution considers identical buyer and seller types, and relates the first derivative of the aggregate search technology to the match value. In our setting, this holds for each submarket. With heterogeneity, agents have a choice of which submarket to join. Our root-supermodularity condition ensures efficient sorting across submarkets by relating the elasticity of substitution of the aggregate search technology to the complementarities in the match value.
In the discussion section, we relate our model to existing results in the search literature. We discuss directed and random search, and the relationship of our model to the large literature on the foundations of competitive equilibria as limits of matching games with vanishing frictions. We consider a convergent sequence of search technologies in our static economy such that, in the limit, the short side of the market gets matched with certainty. To our knowledge, considering vanishing frictions as the limit of a sequence of static search technologies is new in this literature on foundations of competitive equilibrium. In the conclusions, we also highlight that our results not only apply to search markets but also shed some initial light on sorting in many-to-many matching markets.
2. THE MODEL
We cast our model in the context of a generic trading environment between buyers and sellers, as is often done in the directed search literature. This environment includes the labor market and many other markets with two-sided heterogeneity and search frictions. Our setup is chosen to be as general as possible and to encompass a broad class of different search technologies.
Players: There is a mass of heterogeneous sellers who are indexed by a type y ∈ Y that is observable. Let S(y) denote the measure of sellers with types weakly below y ∈ Y. We assume $Y = [\underline{y}, \bar{y}] \subset \mathbb{R}_+$ and that $S(\bar{y})$ denotes the overall measure of sellers. Each seller has one good for sale. On the other side of the market there is a unit mass of buyers. Buyers differ in their valuation for the good, which is private information. Each buyer draws his type x independently and identically distributed (i.i.d.) from distribution B(x) on $X = [\underline{x}, \bar{x}] \subset \mathbb{R}_+$. S and B are C², with strictly positive derivatives s and b, respectively. It is convenient to think of a continuum of agents of each type, and of b(x) and s(y) as the size of the group of type x buyers and type y sellers.
Preferences: The value of a good consumed by buyer x and bought from seller y is given by f(x, y), where f is a strictly positive function $f : \mathbb{R}^2_+ \to \mathbb{R}_{++}$. Conditional on consuming and paying a price p, the utility of the buyer is f(x, y) − p and that of the seller is p. That is, agents have quasilinear utilities. We discuss broader preferences for the seller in the conclusion. We assume that f is twice continuously differentiable in (x, y). We consider indices x and y that are ordered such that they increase the utility of the buyer: f_x > 0, f_y > 0. The utility of an agent who does not consume is normalized to zero. Clearly, no trade takes place at prices below zero or above $f(\bar{x}, \bar{y})$, and we define the set of feasible prices as $P = [0, f(\bar{x}, \bar{y})]$. All agents maximize expected utility.
Search Technology: The model is static.⁴ There are search frictions in the sense that, with positive probability, a buyer does not get to match with the seller he has chosen. The extent of the frictions depends on the competition for the goods. We capture this idea of competition by considering the ratio of buyers to sellers, denoted by λ ∈ [0, ∞], and refer to it as the expected queue length. This ratio varies in general with the quality of the good offered and the price posted. When a seller faces a ratio λ, he meets (and trades with) a buyer with probability m(λ). The idea that relatively more buyers make it easier to sell is captured by assuming that m : [0, ∞] → [0, 1] is a strictly increasing function. Analogously, buyers who want to trade at a price–quality combination that attracts a ratio λ of buyers to sellers can buy with probability q(λ), where q : [0, ∞] → [0, 1] is a strictly decreasing function: when there are relatively more buyers, it becomes harder for them to trade. Trading in pairs requires that q(λ) = m(λ)/λ. We additionally impose the standard assumption that m is twice continuously differentiable, strictly concave, and has a strictly decreasing elasticity.
⁴ We discuss our findings for steady states of a repeated model in the conclusion. See also our working paper version.
Examples of Search Technologies: There are many ways to interpret and provide a microfoundation for the search technology. The most common one arises when buyers directly choose a seller but use an anonymous strategy in their selection. That means that once they decide on the quality–price combination, they choose one of the sellers with these characteristics at random. In a large market with many buyers and sellers, the probability that a seller has at least one buyer and can trade is approximately $m_1(\lambda) = 1 - e^{-\lambda}$. This search technology was first proposed by Butters (1977) (see also Peters (1991), Shi (2001), Shimer (2005)). Variations of this specification arise naturally; for example, when a fraction 1 − β of the buyers gets lost on the way to the sellers, we have $m_2(\lambda) = 1 - e^{-\beta\lambda}$. Alternatively, if, at each price–quality combination, agents form pairs randomly, but trade only occurs when a seller is paired with a buyer (in the spirit of Kiyotaki and Wright (1993)), the matching probability is $m_3(\lambda) = \lambda/(1 + \lambda)$.
Extensive Form and Trading Decisions: The extensive form of the market interaction has two stages. In stage 1, all sellers simultaneously post a price p at which they are willing to sell the good. In stage 2, after observing the sellers' qualities and their posted prices (y, p), buyers simultaneously decide where to attempt to buy; that is, each buyer chooses the quality–price combination (y, p) that she seeks. A buyer for whom all the prices p are too high can always choose the option of no trade, denoted by ∅.⁵ A buyer who gets matched consumes the good and pays the posted price. Whether a buyer gets matched with a seller is determined by the search technology. This two-stage extensive form is in the spirit of, for example, Peters (1991, 2000) and Acemoglu and Shimer (1999a, 1999b).
⁵ To make the choice of no trade consistent with the rest of our notation, let $\emptyset = (\emptyset_y, \emptyset_p)$, where $\emptyset_y < \underline{y}$ denotes a nonexistent quality and $\emptyset_p < 0$ denotes a nonexistent price, and the trading probability at ∅ is zero.
We denote by G(y, p) and H(x, y, p) the distributions of trading decisions of sellers and buyers; that is, G(y, p) is the measure of sellers who offer a quality–price combination below (y, p), and H(x, y, p) is the measure of buyers with types below x who attempt to buy a quality–price combination that is below (y, p). For many subsequent discussions, the marginals of these distributions are important; we denote them with subscripts. For example, H_X(x) is the fraction of buyers with type below x, and H_YP(y, p) is the fraction of buyers that search for a quality below y at a price below p. We impose the following two requirements. First, we require G_Y = S and H_X = B; that is, the measure of traders coincides with the distribution in the population. Second, we require H_YP to be absolutely continuous with respect to G, which means that if there are no sellers who have chosen prices in some set, then no buyers will try to buy from that set. This will enable us to use the Radon–Nikodym derivative below.
Equilibrium: Our equilibrium concept follows the literature on large games (see, e.g., Mas-Colell (1984)), where the payoff of each individual is determined only by his own decision and by the distribution of trading decisions G and H in the economy, which in turn have to arise from the optimal decisions of the individual traders.⁶
⁶ We are grateful to Michael Peters for pointing out to us this approach, which brings the competitive search model in line with the standard game-theoretic approach to large markets.
To define the expected payoffs for each agent given G and H, let the function $\Lambda_{GH} : Y \times P \to [0, \infty]$ denote the expected queue length at each quality–price combination. Along the support of the sellers' trading distribution G it is given by the Radon–Nikodym derivative
$\Lambda_{GH} = dH_{YP}/dG$.⁷ Along the support of G, we can define the expected payoff of sellers as

(1)  $\pi(y, p; G, H) = m(\Lambda_{GH}(y, p))\, p$

and the expected payoff of buyers as

(2)  $u(x, y, p; G, H) = q(\Lambda_{GH}(y, p))\,[f(x, y) - p].$

⁷ On the support of G, the Radon–Nikodym derivative is well defined up to a zero-measure set: any two derivatives coincide almost everywhere. To achieve everywhere well-defined payoffs in (1) and (2), assume some rule that selects a unique $\Lambda_{GH}$ on supp G for each (G, H). For our existence proof, we require the selection to be continuous and differentiable wherever possible on supp G, as this will select the derivative that we construct.
So far the payoffs are only determined on the path of play, since the buyer–seller ratio $\Lambda_{GH}$ is only well defined there. We extend the payoff functions by extending the queue length function $\Lambda_{GH}$ to all of Y × P. A seller who contemplates a deviation and offers a price different from all other sellers, that is, (y, p) ∉ supp G, has to form a belief about the queue length that he will attract. We follow the literature (e.g., McAfee (1993), Acemoglu and Shimer (1999b), Shimer (2005)) by imposing restrictions on beliefs in the spirit of subgame perfection: the seller expects a queue length $\Lambda_{GH}(y, p)$ larger than zero only if there is a buyer type x ∈ X who is willing to trade with him. Moreover, he expects the highest queue length for which he can find such a buyer type, which means that he expects buyers to queue up for the good until it is no longer profitable for them to do so. Formally, that means that

(3)  $\Lambda_{GH}(y, p) = \sup\Big\{\lambda \in \mathbb{R}_+ : \exists x;\ q(\lambda)[f(x, y) - p] \ge \max_{(y', p') \in \operatorname{supp} G} u(x, y', p'; G, H)\Big\}$

if that set is nonempty, and $\Lambda_{GH}(y, p) = 0$ otherwise. This extension defines the queue length and thus the matching frictions and payoffs on the entire domain.⁸ Here the queue length function $\Lambda_{GH}$ acts similarly to Rosen's (1974) hedonic price schedule in the sense that individuals take this function as given, and an equilibrium simply states that all trading decisions according to G and H are indeed optimal given the implied queue lengths.
⁸ For particular microfoundations of the matching function in an economy with one-sided heterogeneity, Peters (1991, 1997a, 2000) showed that the specification of the matching frictions in (3) indeed arises on the equilibrium path after a deviation by an individual seller.
DEFINITION 1: An equilibrium is a pair of trading distributions (G, H) such that the following conditions hold:
(i) Seller Optimality: (y, p) ∈ supp G only if p maximizes (1) for y.
(ii) Buyer Optimality: (x, y, p) ∈ supp H only if (y, p) maximizes (2) for x.
Assortative Matching: Our main focus is on the sorting of buyers across sellers. In ex ante terms, an allocation is not one-to-one since the ratio of buyers to sellers is, in general, different from 1. Therefore, we define sorting in terms of the distribution of visiting decisions of buyers H. Consider active buyer types x who choose to be in the market rather than taking their outside option ((x, ∅) ∉ supp H). We say that H entails assortative matching if there exists a strictly monotone function ν that maps these buyer types into Y such that $H_{XY}(x, \nu(x)) = B(x)$ for all active buyer types. This means that ν(x) is the seller type with which buyer type x would like to trade. We say that matching is positive assortative if ν is strictly increasing and negative assortative if it is strictly decreasing. Since ν is strictly monotone, it is uniquely characterized by its inverse $\mu \equiv \nu^{-1}$, where μ(y) denotes the buyer type that visits seller y. Throughout we will consider this inverse and call it the assignment.
3. THE MAIN RESULTS
In equilibrium, an individual seller of type y takes the trading distributions G and H as given, and according to part (i) of the equilibrium definition, his pricing decision solves $\max_p m(\Lambda_{GH}(y, p))\,p$. This seller can set a price that does not attract any buyers ($\Lambda_{GH}(y, p) = 0$) or he can set a price that attracts buyers ($\Lambda_{GH}(y, p) > 0$), and we can substitute (3), which holds by assumption outside the support of G and also, by equilibrium condition (ii), on the support of G. Therefore, the seller's problem can be written as

$\max_{\lambda, p}\ \big\{ m(\lambda)p : \lambda = \sup\{\lambda' : \exists x;\ q(\lambda')[f(x, y) - p] \ge U(x; G, H)\}\big\},$

where we introduced $U(x; G, H) \equiv \max_{(y', p') \in \operatorname{supp} G} u(x, y', p'; G, H)$ to denote the highest utility that a buyer of type x can obtain. By equilibrium condition (ii), U(x; G, H) is continuous. Therefore, for sellers who trade with positive probability, this problem is equivalent to

(4)  $\max_{x, \lambda, p}\ \big\{ m(\lambda)p : q(\lambda)[f(x, y) - p] = U(x; G, H)\big\}.$

This maximization problem has a natural interpretation that is common to much of the literature on competing mechanism design. It states that a seller can choose prices and trading probabilities as well as the buyer type that he wants to attract, as long as the utility for this buyer is as large as the utility that he can get by trading with other sellers. Note also that (x, y, p) cannot be in the support of the buyers' equilibrium trading strategy H if there does not exist a λ such that (x, λ, p) solves (4) for y, since the price and associated queue length offered by y would not allow buyer x to obtain his expected equilibrium utility U(x; G, H). To simplify notation in what follows, we suppress the dependence of the variables on G and H when there is no danger of confusion.
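As an illustration of how a seller's choice in (4) can be computed, the following sketch solves a toy instance on a grid. All functional forms and numbers here are our own hypothetical choices, not taken from the paper: we use the urn–ball technology m₁, a match value f(x, y) = (x + y)², and an illustrative market utility schedule U(x) = x.

```python
import numpy as np

# Toy instance of the seller's problem (4): seller y picks a buyer type x,
# a queue length lam, and a price p subject to q(lam) * (f(x,y) - p) = U(x).
m = lambda lam: 1.0 - np.exp(-lam)   # urn-ball technology m1
q = lambda lam: m(lam) / lam         # buyer's matching probability
f = lambda x, y: (x + y) ** 2        # hypothetical match value
U = lambda x: x                      # hypothetical utility schedule

y = 1.0
xs = np.linspace(0.1, 2.0, 200)
lams = np.linspace(0.01, 10.0, 2000)
X, L = np.meshgrid(xs, lams)

# Substituting the constraint, p = f(x,y) - U(x)/q(lam), the objective
# m(lam) * p becomes m(lam) * f(x,y) - lam * U(x), since m(lam) = lam*q(lam).
profit = m(L) * f(X, y) - L * U(X)
i = np.unravel_index(np.argmax(profit), profit.shape)
x_star, lam_star = X[i], L[i]
p_star = f(x_star, y) - U(x_star) / q(lam_star)
print(x_star, lam_star, p_star)
```

The substitution used in the comment is exactly the one performed analytically in the derivation that follows.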
We will now derive a necessary condition for assortative matching. For expositional purposes, we focus in this derivation on a particular class of equilibria that fulfill a number of differentiability conditions. Consider a candidate equilibrium (G, H) that is assortative, that is, it permits a strictly monotone assignment μ(y), and has a unique price p(y) offered by seller type y, with both μ(y) and p(y) differentiable.⁹ The focus on a differentiable equilibrium is just for convenience of exposition in the main body. The formal proofs do not assume differentiability a priori.
⁹ We require this only for those types that trade with strictly positive probability. A unique price p(y) means that (y, p(y)) ∈ supp G and (y, p′) ∉ supp G for any other p′ ≠ p(y). Finally, we note that μ(y) and p(y) are differentiable only if U(x; G, H) is twice differentiable in x and Λ(y, p(y)) is totally differentiable in y, as shown in (10) and (11) below.
For any seller y who trades at an interior queue length, we can use the constraint to substitute out the price in (4). Since m(λ) = λq(λ), this yields

(5)  $\max_{x, \lambda}\ m(\lambda) f(x, y) - \lambda U(x).$

Along the equilibrium path, seller y's assigned buyer type μ (i.e., μ(y)) and his queue length Λ (i.e., Λ(y, p(y))) solve this program and are characterized by its first-order conditions

(6)  $m'(\Lambda) f(\mu, y) - U(\mu) = 0,$

(7)  $m(\Lambda) f_x(\mu, y) - \Lambda U'(\mu) = 0.$

The first-order conditions only characterize an optimal choice if the second-order condition is satisfied. To verify the second-order condition, we derive the Hessian along the equilibrium path:

(8)  $\begin{pmatrix} m''(\Lambda) f(\mu, y) & m'(\Lambda) f_x(\mu, y) - U'(\mu) \\ m'(\Lambda) f_x(\mu, y) - U'(\mu) & m(\Lambda) f_{xx}(\mu, y) - \Lambda U''(\mu) \end{pmatrix}.$

The term m″(Λ)f(μ, y) is strictly negative, and the point (Λ, μ) is a local maximum only if the determinant of the Hessian is positive:

(9)  $m''(\Lambda) f(\mu, y)\big(m(\Lambda) f_{xx}(\mu, y) - \Lambda U''(\mu)\big) - \big(m'(\Lambda) - m(\Lambda)/\Lambda\big)^2 f_x(\mu, y)^2 \ge 0,$

where in the last term of this inequality we have substituted U′ from (7). Totally differentiating (7) with respect to y and using (7) yields the expression

(10)  $U''(\mu) = \frac{m(\Lambda)}{\Lambda} f_{xx}(\mu, y) + \frac{1}{\Lambda \mu'}\left[\left(m'(\Lambda) - \frac{m(\Lambda)}{\Lambda}\right) f_x(\mu, y)\, \frac{d\Lambda}{dy} + m(\Lambda) f_{xy}(\mu, y)\right].$
Totally differentiating (6) with respect to y and substituting (7) yields an expression for the change of the queue length along the equilibrium path:

(11)  $\frac{d\Lambda}{dy} = -\frac{1}{m''(\Lambda) f(\mu, y)}\left[\left(m'(\Lambda) - \frac{m(\Lambda)}{\Lambda}\right) f_x(\mu, y)\, \mu' + m'(\Lambda) f_y(\mu, y)\right].$

Substituting (10) and (11) into (9) allows us to cancel terms, and after rearranging and multiplying by μ′(y)², we are left with

(12)  $\mu'(y)\left[f_{xy}(\mu, y) - \frac{m'(\Lambda)\big(\Lambda m'(\Lambda) - m(\Lambda)\big)}{\Lambda m''(\Lambda) m(\Lambda)}\, \frac{f_x(\mu, y) f_y(\mu, y)}{f(\mu, y)}\right] \ge 0.$

To satisfy the second-order condition, both terms in (12) must have identical signs. Under PAM (μ′ > 0), the term in square brackets has to be positive; under NAM (μ′ < 0), it has to be negative. Defining

(13)  $a(\lambda) \equiv \frac{m'(\lambda)\big(m'(\lambda)\lambda - m(\lambda)\big)}{\lambda m(\lambda) m''(\lambda)},$

the following lemma follows immediately.
LEMMA 1: In any differentiable equilibrium that satisfies positive assortative matching,

(14)  $\frac{f_{xy}(\mu, y)\, f(\mu, y)}{f_x(\mu, y)\, f_y(\mu, y)} \ge a(\Lambda)$

has to hold along the equilibrium path, with the opposite sign in any differentiable equilibrium with negative assortative matching.
This condition is stronger than standard supermodularity, because our assumptions on the search technology imply that a(λ) ∈ [0, 1] for all λ.¹⁰ A related but different condition was reported by Shi (2001) for a specific directed search model. His condition arises as a special case of (14), as we discuss in more detail in Section 6. The benefit of expression (14) is that it provides a clear economic interpretation of the trade-offs for sorting in markets in which both search frictions and complementarities in values are present.
¹⁰ One can rewrite (13) as $a(\lambda) = m'(\lambda) q'(\lambda)/(m''(\lambda) q(\lambda))$, and our assumptions on the search technology immediately yield a(λ) > 0 for all λ ∈ (0, ∞). Furthermore, some straightforward algebra shows that a strictly decreasing elasticity of m implies that a(λ) < 1 for all λ ∈ (0, ∞). More details are presented in the published working paper version. All results in this paper obtain even without the standard assumption that the elasticity of m is decreasing; the only difference is that the right-hand side of condition (14) might then be larger than 1, which requires stronger supermodularity conditions.
The economic insight of Lemma 1 becomes transparent when we interpret condition (14) in terms of the aggregate search technology M. This aggregate search technology is defined as the total number of matches that arise when β buyers are in a market with σ sellers, that is, M(β, σ) = σm(β/σ). Substituting for M in (14) delivers the condition

(15)  $\frac{f_{xy}(\mu, y)\, f(\mu, y)}{f_x(\mu, y)\, f_y(\mu, y)} \ge \frac{M_b(\Lambda, 1)\, M_s(\Lambda, 1)}{M_{bs}(\Lambda, 1)\, M(\Lambda, 1)}.$

The right-hand side measures the elasticity of substitution of the aggregate search technology M, denoted by ES_M.¹¹ When f has constant returns, the left-hand side measures the inverse of the elasticity of substitution of the match value function f, denoted by ES_f (see Hicks (1932)). The condition highlights the nature of the trade-off between match value and trading security. To obtain PAM, the inverse of the elasticity of substitution of the surplus function must exceed the elasticity of substitution of the search technology: $ES_f^{-1} \ge ES_M$. If different markets are very substitutable (high ES_M), then x and y have to be strong complements (high f_xy and, therefore, low ES_f). The latter corresponds to the gain in match value due to complementarity and reflects the marginal increase in output from increasing both types. That degree of complementarity must offset the gains from using additional low types to help high types trade. If the elasticity of substitution ES_M is large, additional low types are very efficient in providing such trading security. Therefore, complementarities in production have to be strong to nevertheless induce PAM. For aggregate search technologies with a constant elasticity of substitution, the right-hand side of (14) is constant and determines the degree of supermodularity required of f. In general, the supremum and infimum of that elasticity become important. Let $\bar{a} = \sup_\lambda a(\lambda)$ and $\underline{a} = \inf_\lambda a(\lambda)$; both lie in [0, 1]. We discuss some specific search technologies in depth in the next section, after presenting the main results on sorting.
¹¹ We are grateful to John Kennan for pointing out that a(λ) is equal to the elasticity of substitution of the aggregate search technology, ES_M.
To state our main result, we first introduce a notion of the degree of supermodularity. Clearly, for condition (14) to hold, it does not suffice that the function f is simply supermodular. For any two buyer and seller types x₂ > x₁ and y₂ > y₁, supermodularity means that the total value when the high types trade and when the low types trade is higher than when there is cross-trade (low with high and vice versa): f(x₂, y₂) + f(x₁, y₁) ≥ f(x₂, y₁) + f(x₁, y₂). This also means that the extreme values (very high f and very low f) on the left-hand side of the inequality are jointly higher than the intermediate values on the
right. The equivalent condition when f(x, y) is differentiable is that the cross-partial is positive: f_xy(x, y) > 0. Such a condition only includes the gains if agents trade, but in our setting we also need to consider the losses if agents do not trade. These losses especially affect the high types and give them extra incentives to ensure trade by attracting (many) low types. We therefore need a stronger condition for positive sorting, and the idea that assortative matching becomes harder can be captured by strengthening the supermodularity condition as follows. Let g be a concave function and require that g ∘ f be supermodular, that is, g ∘ f(x₂, y₂) + g ∘ f(x₁, y₁) ≥ g ∘ f(x₂, y₁) + g ∘ f(x₁, y₂). Concavity affects the extreme values on the left of the inequality more than the intermediate values on the right, which makes this condition for assortative matching more difficult to fulfill. This is easiest to see in the differential version of this inequality, $\partial^2 g(f(x, y))/\partial x\, \partial y \ge 0$ or, equivalently,

(16)  $\frac{f_{xy}(x, y)\, f(x, y)}{f_x(x, y)\, f_y(x, y)} \ge -\frac{g''(f(x, y))\, f(x, y)}{g'(f(x, y))}.$

Exactly how much more difficult it is to sustain this inequality is captured by the (relative) Arrow–Pratt measure of the transform g on the right-hand side of (16). For example, this measure is 0 if g is a linear transformation and it is 1 if g is a log-transformation. Compare this inequality with (14). By virtue of the sup (or inf) of a, the right-hand side of (14) is a constant in the unit interval. A constant right-hand side of (16) with similar magnitude is exactly induced by the transformation $g(f) = \sqrt[n]{f}$. We say that the function f is n-root-supermodular with coefficient n ∈ [1, ∞) if $\sqrt[n]{f}$ is supermodular. By (16), this requires that the cross-partial derivative of f is sufficiently large, that is,

$\frac{f_{xy}(x, y)\, f(x, y)}{f_x(x, y)\, f_y(x, y)} \ge 1 - n^{-1}.$

This captures standard supermodularity when n = 1 and approaches log-supermodularity as n → ∞. We can now state the main result:
THEOREM 1: For any type distributions B and S, any equilibrium is positive assorted if and only if the function f is n-root-supermodular, where $n = (1 - \bar{a})^{-1}$. For any type distributions B and S, any equilibrium is negative assorted if and only if the function f is nowhere n-root-supermodular, where $n = (1 - \underline{a})^{-1}$.
See the Appendix for the proof. The proof focuses on positive assortative matching and consists of two parts. First, we show that (strict) n-root-supermodularity implies positive assortative matching. Since we want to rule out other equilibria that might be nonassortative, we cannot work with a monotone differentiable assignment μ; therefore, we deploy a different proof technique than in the derivation of condition
(14). Second, we show that positive assortative matching for all type distributions implies that f has to be (weakly) n-root-supermodular. Here the proof works by contradiction: if f is not n-root-supermodular at some point (x, y) in the domain, then we can construct a type distribution such that types in the neighborhood of (x, y) trade at a queue length λ with a(λ) close enough to $\bar{a}$ and, therefore, larger than the degree of root-supermodularity of f. This directly contradicts the condition for PAM in Lemma 1 for differentiable equilibria, and a similar contradiction can be derived for nondifferentiable equilibria. Key here is that the result holds for all distributions. For a particular type distribution, PAM may arise with less complementarity, because the value $\bar{a}$ might not be attained in equilibrium. The proofs in the case of negative assortative matching are completely analogous and are omitted for brevity.
The theorem establishes a dividing range between positive and negative sorting. This dividing range collapses to a line when $\bar{a} = \underline{a}$ (see also Section 4, where we discuss constant elasticity of substitution matching technologies). Such a sharp cutoff is also a feature of Becker's (1973) frictionless theory, but our cutoff is shifted toward larger complementarities. In our environment, the fact that low types are valuable because they can help facilitate trade for the high types has the novel implication that, when $\underline{a} > 0$, NAM obtains for all type distributions even if f is strictly supermodular, as long as it is nowhere n-root-supermodular ($n = (1 - \underline{a})^{-1}$). On the other hand, if $\underline{a} < \bar{a}$, then the areas of positive and negative sorting are not as sharply divided. This is the case specifically for those search technologies, such as the urn–ball technology, that have $\underline{a} = 0$. Still, any f that is weakly submodular (f_xy ≤ 0) induces NAM.¹²
¹² In general, negative assortative matching has to arise under the strict inequality $f_{xy} < \underline{a}\, f_x f_y f^{-1}$. The case of $\underline{a} = 0$ is special because negative assortative matching is ensured even when f_xy = 0, since in this case our assumptions on the search technology still imply a(λ) > 0 whenever λ ∈ (0, ∞). Therefore, for all types that trade with positive probability (λ ≠ 0, ∞), the elasticity is strictly positive and the proof technique immediately extends to this case.
The conditions in Theorem 1 are particularly strong so as to ensure sorting under any possible type distribution. This gives us useful bounds, but these bounds might not be necessary for given type distributions. If the elasticity of substitution is not constant, it may be the case that neither the supremum $\bar{a}$ nor the infimum $\underline{a}$ is reached on the equilibrium path. This explains the weaker notion in an example by Shi (2001), who considered the urn–ball search technology and a given seller type distribution. His Example 5.2 has negative sorting despite f_xy > 0 and $\underline{a} = 0$. We formalize this in the next proposition.
PROPOSITION 1: Consider a search technology such that a(·) is not constant:
(i) There exist distributions B and S and functions f that are nowhere n-root-supermodular ($n = (1 - \bar{a})^{-1}$) such that any equilibrium exhibits positive assortative matching.
(ii) There exist distributions B and S and strictly n-root-supermodular ($n = (1 - \underline{a})^{-1}$) functions f such that any equilibrium exhibits negative assortative matching.
See the Appendix for all proposition proofs.
Finally, we establish existence of a (differentiable) equilibrium. Existence in our setup is more complicated than in frictionless matching models because we cannot employ the standard measure-consistency condition. In our setup, it is possible that more agents from one side attempt to trade with the other, and this imbalance is absorbed through different trading probabilities.¹³ The system retains tractability when we impose the sufficient conditions for assortative matching (either PAM or NAM), in which case we can exploit differential equation (11) to construct the equilibrium path along the first-order condition and use the sufficient conditions to show that deviations are not profitable.
¹³ In frictionless one-to-one matching models with a continuum of agents, existence can be proven by considering the efficient allocation, which can be characterized by a linear program for which existence was proven by Kantorovich (1958). The efficient allocation in our setting resembles Kantorovich's optimal transportation problem, with the one major difference that it is not a linear program, since the buyer–seller ratio enters the objective (see (18)). Interpreting a submarket as a coalition of many buyers and sellers in the spirit of the many-to-many matching literature still does not allow us to adopt existence proofs from this literature, since the proofs we are aware of rely on finite coalitions of bounded size, whereas in our setting submarkets with uncountably many buyers and sellers arise.
PROPOSITION 2: If the function f satisfies n-root-supermodularity for $n = (1 - \bar{a})^{-1}$ (or nowhere n-root-supermodularity for $n = (1 - \underline{a})^{-1}$), then for any type distributions B and S, there exists a differentiable equilibrium.
4. CHARACTERIZATION
In this section we discuss the characterization of the equilibrium. We consider two particular classes of commonly used search technologies that allow particularly sharp bounds on the degree of supermodularity: those that are bounded and imply square-root-supermodularity, and those that have a constant elasticity of substitution. We then investigate the properties of the equilibrium price schedule.
4.1. Common Search Technologies
Square-root-supermodularity is the property that applies to a large class of search technologies, including those that are built on microfoundations, such as the example search technologies m₁, m₂, and m₃ outlined above. The class is characterized by technologies with local bounds on the derivatives and enough curvature. To lay this out formally, it will be convenient to consider the matching probability q(λ) of the buyers, which is linked to the matching probability of the sellers via m(λ) = λq(λ).
PROPOSITION 3—Square-Root-Supermodularity: Let |q′(0)| > 0 and |q″(0)| < ∞, and let 1/q be convex. For any type distributions B and S, any equilibrium exhibits PAM if and only if f(x, y) is square-root-supermodular.
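As a quick numerical illustration of this proposition (our sketch, not the authors' code), one can evaluate the elasticity of substitution a(λ) from (13) for the example technologies m₁, m₂, and m₃ by finite differences: in each case a(λ) is close to one-half near λ = 0 and stays inside (0, 1). The CES entry anticipates the class discussed next, where a(λ) is a constant.

```python
import numpy as np

# a(lam) = m'(lam) * (lam*m'(lam) - m(lam)) / (lam * m(lam) * m''(lam)),
# evaluated with central finite differences.
def a(m, lam, h=1e-5):
    mp = (m(lam + h) - m(lam - h)) / (2 * h)             # m'(lam)
    mpp = (m(lam + h) - 2 * m(lam) + m(lam - h)) / h**2  # m''(lam)
    return mp * (lam * mp - m(lam)) / (lam * m(lam) * mpp)

techs = {
    'urn-ball m1': lambda l: 1 - np.exp(-l),
    'm2 (beta = 0.7)': lambda l: 1 - np.exp(-0.7 * l),
    'pairwise m3': lambda l: l / (1 + l),
    'CES (r = 2, k = 2)': lambda l: (1 + 2 * l**(-2.0))**(-0.5),
}
for name, m in techs.items():
    print(name, [round(a(m, l), 4) for l in (0.01, 1.0, 5.0)])
# The first three rows start near 0.5 at lam = 0.01; the CES row is
# constant at 1/(1 + r) = 1/3, and m3 is itself CES with a = 1/2.
```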
What drives the sorting pattern is the relation between the complementarities in match value and the elasticity of substitution of the search technology. It is then somewhat striking that sorting in such a large class of search technologies—arguably the most relevant ones—depends on exactly the same condition: square-root-supermodularity. The explanation is entirely driven by the value of the elasticity of substitution at zero. The bounds on the derivatives imply that it is necessarily pinned down at one-half, which turns out to be a general property of homothetic functions, as can be seen in the proof. This makes square-root-supermodularity necessary. The curvature restriction is equivalent to the requirement that the elasticity of substitution does not exceed one-half at any point other than zero and, therefore, square-root-supermodularity is sufficient.
Constant elasticity of substitution (CES) matching technologies are often assumed for their simplicity. Since the elasticity of substitution is invariant, they can be represented by $m(\lambda) = (1 + k\lambda^{-r})^{-1/r}$, where r > 0 and k > 1. The associated aggregate CES search technology for a given number of buyers and sellers β and σ is defined as (see, among others, Menzio (2007))

$M(\beta, \sigma) = \beta\sigma\,(\beta^r + k\sigma^r)^{-1/r}.$

The elasticity of substitution is given by $ES_M = (1 + r)^{-1}$. The CES matching technologies do not fall into the previous category because either the bounds at zero are violated or the curvature restriction does not hold. The exception is the knife-edge case with r = 1, which corresponds to (a variation of) the matching technology m₃ = λ/(λ + k) that is CES.
The CES search technology nonetheless gives very sharp predictions on the necessary and sufficient conditions for positive and negative assortative matching: PAM arises when f(x, y) is n-root-supermodular and NAM arises when f(x, y) is nowhere n-root-supermodular, where n = (1 + r)/r is the same in both cases. It is important to stress here that n-root-supermodularity is a necessary condition for positive assortative matching even if we consider only a particular type distribution. This is stronger than our Theorem 1, and arises exactly because the elasticity is constant and we do not have to worry whether the supremum is actually realized on the path of play. Moreover, since Theorem 1 ensures NAM for any given distribution, it also provides direct evidence that NAM will arise for any type distributions even if the match value function
is (moderately) supermodular, since the elasticity of substitution is bounded away from zero. The class of CES search technologies spans the entire range of n-root-supermodularity, from supermodularity to log-supermodularity, as stated in the next corollary to Theorem 1.
COROLLARY 1: Let the search technology be CES with elasticity ES_M. Then a necessary and sufficient condition for PAM is one of the following cases:
(i) Supermodularity if ES_M → 0 (Leontief).
(ii) Square-root-supermodularity if ES_M = 1/2 (m₃).
(iii) Log-supermodularity if ES_M → 1 (Cobb–Douglas).
4.2. The Equilibrium Price Schedule
Our results are cast in terms of the monotonicity of the allocation, offering sharp predictions on assortative matching. In contrast, equilibrium does not provide equally general predictions in terms of the monotonicity of the price schedule. Equilibrium prices can be both increasing and decreasing in type, because agents are compensated through both prices and trading probabilities. This is not the case in the frictionless model of Becker (1973). There, p′(y) = f_y > 0; that is, the slope of the price schedule is equal to the marginal product of being matched with a better seller. For our setting, we derive the equilibrium price schedule in the Appendix. It satisfies

(17)  $p'(y) = f_y + a\big[(1 - \eta_m)\, f_x\, \mu' - \eta_m f_y\big],$

where η_m = λm′/m is the elasticity of m, a is the elasticity of substitution, and μ′ is the change of trading partner along a differentiable equilibrium. This price schedule decentralizes the efficient allocation (Proposition 4 below). It reflects the marginal benefit conditional on matching, but additionally reflects the marginal benefit from the change in the probability of a match. In this world with trading frictions, sellers can be rewarded through higher prices or better trading probabilities. Higher seller types obviously have to make higher equilibrium profits, yet this increase may be due more to the second source than to the first, and equilibrium prices can actually be declining. For this to happen, though, the trading probabilities have to rise substantially, which is only possible under negative assortative matching.
Inspection of equation (17) immediately reveals that under PAM (with μ′ > 0) the price schedule is increasing in seller type. The effect introduced by the search frictions can never be so strong that prices actually decrease: both a and η_m are in [0, 1], and as a result the aggregate sign on the f_y term, as determined by (1 − aη_m), is positive. This is not necessarily true under NAM, where μ′ < 0. Prices can then be decreasing: for example, consider some fixed type distributions and f_y sufficiently small. Then sellers must make nearly identical
profits. If buyer types remain important (f_x ≫ 0), high buyer types obtain substantially higher equilibrium utility than low buyer types. Therefore, in equilibrium, low seller types leave high utility to their (high-type) customers and obtain a low queue length, since dΛ/dy in equation (11) is positive under NAM. To make nearly equal profits according to (4), the low seller types have to charge a higher price in equilibrium. Since the price change (17) does not depend directly on the cross-partial, particularly simple examples of this phenomenon can be constructed with modular match values (f_xy = 0).
Finally, it is instructive to consider the price function in a symmetric world. Suppose there is symmetry between buyers and sellers in the match value function f(x, y) and in the aggregate search technology M(β, σ), and the type distributions are identical for buyers and sellers. Then it is straightforward to show that under root-supermodularity and, therefore, PAM, a "symmetric" equilibrium exists with μ(y) = y and a constant queue length λ = 1 along the equilibrium path. Since symmetry of M implies that η_m = 1/2, the pricing function reduces exactly to the marginal value of Becker (1973), that is, p′ = f_y. This highlights the fact that the effect on prices due to search frictions is only prevalent in the presence of asymmetries. In a positively assorted equilibrium under symmetry, the effects of frictions exactly cancel out.
5. EFFICIENCY OF THE DECENTRALIZED ALLOCATION
Consider a planner who chooses trading distributions (G, H) to maximize the surplus in the economy, subject to the same search technology. The planner maximizes

(18)  $\max_{G, H}\ \int q(\Lambda_{GH}(y, p))\, f(x, y)\, dH$

(19)  $\text{s.t.}\quad G_Y = S, \qquad H_X = B, \qquad \Lambda_{GH} = dH_{YP}/dG,$

where the constraints correspond to the restrictions in the decentralized economy. Prices simply constitute transfers between agents and, therefore, they do not enter the planner's objective directly. They do allow the planner to let identical sellers trade at different queue lengths Λ(y, p) and Λ(y, p′) with potentially different buyers, which is also possible in the decentralized economy. Since prices play no direct role in the planner's problem, we could as well have indexed the queue length by some other label, such as a "location," instead of prices.
PROPOSITION 4: If f is strictly n-root-supermodular with $n = (1 - \bar{a})^{-1}$ (nowhere n-root-supermodular with $n = (1 - \underline{a})^{-1}$), then any solution to the planner's problem is positive (negative) assorted and can be decentralized as an equilibrium.
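The discussion that follows uses the fact that seller y's equilibrium profits, which equal (m(Λ) − Λm′(Λ))f by (5) and (6), coincide with M_s(Λ, 1)f for the aggregate technology M(β, σ) = σm(β/σ). A small numerical check of this identity (our sketch, using the urn–ball technology as an illustrative example):

```python
import numpy as np

m = lambda l: 1 - np.exp(-l)                 # urn-ball technology
M = lambda beta, sigma: sigma * m(beta / sigma)

h = 1e-6
for lam in (0.5, 1.0, 3.0):
    m_prime = (m(lam + h) - m(lam - h)) / (2 * h)
    profit_share = m(lam) - lam * m_prime              # from (5) and (6)
    M_s = (M(lam, 1 + h) - M(lam, 1 - h)) / (2 * h)    # partial wrt sellers
    print(lam, round(profit_share, 6), round(M_s, 6))  # the two coincide
```

The agreement follows from differentiating σm(β/σ) with respect to σ and evaluating at σ = 1, β = Λ.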
This result is in line with the efficiency properties of directed search models in general; see, for example, Moen (1997), Acemoglu and Shimer (1999b), and Shi (2001). It is worth highlighting this efficiency property because it allows us to interpret our sorting condition from an efficiency point of view. Our result provides a condition that augments the standard Hosios (1990) condition for efficiency by relating different submarkets. The Hosios (1990) condition holds for a particular (x, y) market and equates the social contribution to match formation with the split of the surplus between buyer and seller. In our decentralized equilibrium, substituting (6) into (5) yields the Hosios condition, which can be rewritten to say that seller y's equilibrium profits are M_s(Λ, 1)f(x, y) and reflect his marginal contribution to match creation.
With two-sided heterogeneity, the issue of efficiency hinges on which (x, y) combinations trade in equilibrium. Our contribution is to show that this is governed not by the derivative of the aggregate matching technology M, but by its elasticity of substitution a(λ). The Hosios condition is usually associated with the elasticity η_m of the individual search technology m, since M_s = 1 − η_m. A similar connection exists in our setting between the elasticity of substitution of M, denoted by a, and the elasticities of the individual matching technology m. To see this, observe that

(20)  $a(\lambda) \equiv \frac{m'(\lambda)\big(m'(\lambda)\lambda - m(\lambda)\big)}{\lambda m(\lambda) m''(\lambda)} = \frac{1 - \eta_m(\lambda)}{-\eta_{m'}(\lambda)}.$

The first equality is the condition we derived above in equation (13). The second equality follows immediately after rearranging terms, where η denotes the elasticity of the subscripted function: η_m = λm′/m and η_{m′} = λm″/m′. As with the Hosios condition, the condition here depends on the elasticity via 1 − η_m, which captures the marginal effect on the search technology. In addition, it depends on η_{m′}, which captures the second-order marginal effect on the search technology. This effect governs how the matching probability changes as we move across different matched pairs. The latter effect is obviously absent with homogeneous types and, therefore, in the standard Hosios condition.
6. DISCUSSION OF RELATED LITERATURE
We relate our findings to models and results from three distinct literatures.
6.1. Directed Search
There is an extensive literature on directed search with and without two-sided heterogeneity. Contributions range from work that provides a rationale for unemployment in the labor market and waiting times in the product market (for example, Peters (1991, 1997b, 2000, 2007), Acemoglu and Shimer (1999a, 1999b), Burdett, Shi, and Wright (2001), Shi (2001), Mortensen and Wright
(2002), Galenianos and Kircher (2009), Kircher (2009), and Delacroix and Shi (2006)) to work that models more elaborate trading mechanisms (such as McAfee (1993), Peters (1997a), Shi (2002), Shimer (2005), and Eeckhout and Kircher (2010)). Here we focus our attention on specific aspects of the most closely related paper by Shi (2001).
Shi was the first researcher to show that, in an environment with directed search, supermodularity is not sufficient to attain PAM. He assumed that firms can freely enter with type y if they pay some entry cost C(y). He derived a condition, seemingly different from ours, that requires f_xy to be sufficiently large. Here we show that our findings are consistent. His condition is

(21)  $\frac{f f_{xy}}{f_x f_y} > \frac{C f_y (f_y - C_y)}{C_y (f C_y - C f_y)}.$

The strength of this condition (i.e., the magnitude of the right-hand side) cannot readily be evaluated. Moreover, this condition seems not to depend on the search technology m, which is in apparent contradiction to our results, which imply that sorting depends on the elasticity of substitution of the search technology. It turns out, even though it is not directly visible, that condition (21) depends crucially on features of the urn–ball search technology assumed in Shi (2001). In particular, the right-hand side (RHS) will look different when the search technology is not urn–ball. A simple example is the case of CES, where the RHS is a constant.
Recall that our condition (14) gives a condition for PAM for a given type distribution. To see that condition (21) arises as a special case of it, we now derive the equilibrium conditions in Shi (2001) with free entry and for a general search technology. Equilibrium profits can be obtained by substituting (6) into (5). If, after entry, seller type y trades with buyer type μ(y) at queue length Λ(y) (more precisely, Λ(y, p(y))), the free entry condition requires

(22)  $\big[m(\Lambda(y)) - \Lambda(y)\, m'(\Lambda(y))\big]\, f(\mu(y), y) = C(y).$

Differentiating (22), after eliminating terms that add to zero by (7), and using the derivative of (6), we obtain m(Λ(y)) f_y(μ(y), y) = C_y(y). For the special case of the urn–ball search technology m₁, these two equations coincide with Shi's (2001) characteristic equations. We can invert them to obtain an analytic expression for Λ(y) as a function of the entry cost, and substitution into the RHS of (14) recovers Shi's (2001) result.¹⁴ Still, the right-hand side of (14) depends crucially on the elasticity of substitution of the specific search technology in question, as can easily be seen when the RHS of (14) is constant and, therefore, the level of entry plays no role. For urn–ball, the elasticity of substitution is nonconstant and indirectly depends on the entry cost. By varying the entry cost, any seller type distribution can be sustained (by setting the entry cost equal to the equilibrium profits), and, by Proposition 3, square-root-supermodularity provides the relevant bound on the strength of (21). In our setting, entry does not simplify the analysis, because inverting the free entry conditions yields Λ(y) as a function of the inverse of the search technology, which for general search technologies does not have a nice analytic representation. Our approach, therefore, relies directly on the second-order conditions of the seller's optimization problem (4). Using a general search technology allows us to derive the fundamental economic trade-off between complementarities in match value and complementarities in the search technology, and to obtain explicit bounds on the strength of supermodularity that hold for any type distribution.
¹⁴ For $m_1(\lambda) = 1 - e^{-\lambda}$, we obtain a nice analytic expression for the elasticity of substitution: $a(\lambda) = \lambda^{-1} - e^{-\lambda}(1 - e^{-\lambda})^{-1}$. There are a multitude of ways to use the entry cost to substitute out the queue length along the equilibrium path. Observe that $m_1(\Lambda(y)) f_y(\mu(y), y) = C_y(y)$ implies $\Lambda(y) = -\ln\big(1 - C_y(y)/f_y(\mu(y), y)\big)$. Using this, one could write the elasticity of substitution and thus the RHS of (21) as $a(\Lambda(y)) = \big[-\ln\big(1 - C_y(y)/f_y(\mu(y), y)\big)\big]^{-1} + 1 - f_y(\mu(y), y)/C_y(y)$. Alternatively, one could use both entry conditions to express the elasticity of substitution as
$a(\Lambda(y)) = \frac{C(y)\, f_y(\mu(y), y)\big[f_y(\mu(y), y) - C_y(y)\big]}{C_y(y)\big[f(\mu(y), y)\, C_y(y) - C(y)\, f_y(\mu(y), y)\big]},$
which exactly recovers the RHS of (21).
6.2. Random Search
In the Introduction, we compared our root-supermodularity condition to the conditions in the random search model of Shimer and Smith (2000). It is worth noting first that random search models adopt a notion of positive assortative matching that differs from the notion in this paper and in the frictionless environment of Becker (1973). In random search, sellers meet many different buyer types and the probability of meeting any particular buyer type is zero. Therefore, sellers are willing to accept matches from some set of buyer types. For a given seller, the set of buyers for which matching is mutually agreeable is called the matching set. Positive assortative matching means that any element in the acceptance set of a lower type is either included or strictly below any element in the acceptance set of a higher type. The conditions in Shimer and Smith (2000) derive their economic meaning from the fact that they ensure connectedness of these matching sets. The exact conditions are supermodularity of f, log-supermodularity of f_x and f_y, and log-supermodularity of f_xy. Unlike our match value function, theirs is a symmetric function f such that f(x, y) = f(y, x). They also assume that f ≥ 0 and f_y(0, y) ≤ 0 ≤ f_y(1, y) for all y. These assumptions do not directly include log-supermodularity of f, which we used as a lower bound to compare the strength of our condition to theirs. We now show that log-supermodularity is
implied under the additional monotonicity restriction imposed in our model, that is, f_y(x, y) ≥ 0 (and, by symmetry, f_x(x, y) ≥ 0).
Assume that the conditions of the previous paragraph hold. A function f is log-supermodular if log f is supermodular or, equivalently, if for all (x, y) the condition f_xy f − f_x f_y ≥ 0 holds (where we suppress the arguments). Obviously this condition holds whenever f_x = 0, because of supermodularity (f_xy ≥ 0) and f ≥ 0. Now we establish that it holds even at points with f_x > 0. First, observe that log-supermodularity trivially holds at (0, 0) under the assumptions above. Then it is sufficient to show that at any (x, y) at which log-supermodularity holds, the left-hand side of the condition increases in x. The argument applies symmetrically for increases in y, which establishes the result that log-supermodularity holds at all (x, y). The left-hand side of the log-supermodularity condition increases in x if

(23)  $f_{xxy}\, f + f_{xy}\, f_x - f_{xx}\, f_y - f_x\, f_{xy} \ge 0.$

Log-supermodularity of f_x was assumed, which implies $f_{xxy} f_x - f_{xx} f_{xy} \ge 0$. From this inequality we can substitute for f_xxy in (23), and we can also substitute for f_xy from the log-supermodularity condition itself, to get the more demanding inequality $f_{xx} f_y + f_{xy} f_x - f_{xx} f_y - f_x f_{xy} \ge 0$, which holds trivially. We have, therefore, established that the conditions in Shimer and Smith (2000), together with monotonicity, imply log-supermodularity. Although the reverse is not true (not every log-supermodular function fulfills the conditions in Shimer and Smith (2000); not all log-supermodular functions also have first and cross-partial derivatives that are log-supermodular), this result at least gives us a useful lower bound for the strength of supermodularity required under random search that can be used for comparison with our setting.
6.3. Vanishing Frictions and Convergence to the Walrasian Equilibrium
The competitive benchmark of the Walrasian economy (Becker (1973), Rosen (1974)) induces positive sorting under mere supermodularity. There are no frictions in a competitive setting. Such a lack of frictions can be captured in our setup by assuming that agents can perfectly match into pairs. This leads to a benchmark search technology represented by $m^B(\lambda) = \min\{\lambda, 1\}$ (see the kinked, solid line m(λ) in Figure 1). The short side of the market always matches with probability 1, while the types on the long side get rationed in proportion to the buyer–seller ratio.
FIGURE 1.—Vanishing frictions for the static search technology.
We can now consider vanishing frictions to be a sequence of matching functions that converges to $m^B$ and investigate whether the condition for sorting reduces to the mere supermodularity required in the Walrasian benchmark.
This approach of considering the limit economy as frictions vanish ties in with the large literature that validates Walrasian trade as the limit of matching and bargaining games (see, among many others, Rubinstein and Wolinsky
(1985), Gale (1986), and, more recently, Lauermann (2007)). This literature generally studies dynamic games and shows convergence as trading becomes more frequent. While this approach can be replicated with similar success in a dynamic extension of our setting,¹⁵ our contribution here is to take a different perspective by modeling vanishing frictions directly through changes in the search technology.
¹⁵ The working paper version of this paper incorporates the fully dynamic extension of the model, including results on the convergence of our condition. We further discuss the dynamic model in the Conclusion.
We immediately obtain an apparent discrepancy between the idea of convergence to Becker's (1973) supermodularity condition and the n-root-supermodularity condition implied by Theorem 1. For example, the class of logarithmic search technologies

$m(\lambda) = 1 - \ln\big(1 + e^{(1-\lambda)/(1-\delta)}\big)\big/\ln\big(1 + e^{1/(1-\delta)}\big)$

with δ ∈ (0, 1) fulfills the premise of Proposition 3 and, therefore, requires square-root-supermodularity for any level of δ to induce assortative matching. Yet it converges uniformly to the competitive benchmark $m^B(\lambda)$ as δ → 1, where we would expect the weaker condition of supermodularity (Becker (1973)) to apply.
To resolve this apparent discrepancy, observe that our condition for sorting entails the elasticity of substitution a(λ, δ), which depends on the search technology through the parameter δ.¹⁶ While m → m^B uniformly as δ → 1, the elasticity of substitution does not converge to zero uniformly. In particular, in markets with few buyers, the elasticity of substitution remains close to one-half. With vanishing frictions, the strength of the square-root-supermodularity condition comes only from the submarkets with few buyers (λ ≈ 0), that is, when at least some sellers match with very low probabilities due, for example, to an aggregate imbalance where the overall mass of sellers exceeds the mass of buyers. If this is not the case, that is, if all sellers can trade with probability bounded away from zero along a sequence of δ's such that m → m^B, then the standard supermodularity condition emerges: some tedious application of l'Hôpital's rule reveals that $\lim_{\delta \to 1} a(\lambda, \delta) = 0$ for all λ > 0. More generally, this means that, as frictions vanish, the set of seller types that trade with positive probability but for whom Becker's condition does not (approximately) govern the matching pattern includes only those sellers with queue length around zero (i.e., those that can hardly trade). Becker's (1973) insight is, therefore, recovered for vanishing frictions as it applies to all types that have nonvanishing trading prospects.
¹⁶ Some algebra establishes that $a(\Lambda, \delta) = \big(1 + \exp\big(\tfrac{1-\Lambda}{1-\delta}\big)\big)\tfrac{1-\delta}{\Lambda} - \exp\big(\tfrac{1-\Lambda}{1-\delta}\big)\Big(\ln\big(1 + \exp\big(\tfrac{1}{1-\delta}\big)\big) - \ln\big(1 + \exp\big(\tfrac{1-\Lambda}{1-\delta}\big)\big)\Big)^{-1}$.
A special case is that of the CES search technology, because the only way to get convergence to $m^B$ is by changing the elasticity of substitution, a → 0. By construction, there is then not only uniform convergence of m, but also uniform convergence of a, and as a result the necessary and sufficient condition for PAM converges to mere supermodularity for all matched pairs.
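This limit behavior is easy to verify numerically from the closed form in footnote 16 (a sketch under our reconstruction of that formula; high-precision arithmetic is needed because the two terms nearly cancel when δ is close to 1):

```python
from mpmath import mp, mpf, exp, log

mp.dps = 60  # extra precision: the two terms below nearly cancel

def a(lam, delta):
    # Elasticity of substitution of the logarithmic technology
    # (closed form from footnote 16).
    E = exp((1 - lam) / (1 - delta))
    gap = log(1 + exp(1 / (1 - delta))) - log(1 + E)
    return (1 + E) * (1 - delta) / lam - E / gap

for delta in (mpf('0.5'), mpf('0.9'), mpf('0.99')):
    print(float(a(mpf(1), delta)), float(a(mpf('0.001'), delta)))
# The first column tends to 0 as delta -> 1 (Becker's condition
# re-emerges for lam > 0); the second column stays near one-half,
# so square-root-supermodularity keeps its bite only where queue
# lengths are close to zero.
```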
CONCLUSION
In the presence of search frictions in a market with two-sided matching, price competition gives rise to two distinct and opposing forces that determine sorting. The degree of complementarity in the match value is a force toward positive assortative matching, whereas search frictions embody a force toward negative assortative matching. We identify a condition, based on the elasticities of substitution of the match value function and of the search technology, that summarizes this trade-off. It tells us exactly how much additional complementarity above and beyond mere supermodularity—namely root-supermodularity—is needed in the match value to induce positive sorting, where the exact root depends on the elasticity of substitution of the search technology.
This elasticity condition also augments the standard Hosios (1990) condition for efficiency by relating different submarkets. In addition to the split of the surplus for a given pair of buyer–seller types as analyzed by Hosios, the novel determinant of efficiency here is which types are matched in equilibrium. Then not only is the derivative of the aggregate search technology important (as in Hosios), but so is the elasticity of substitution across different pairs.
In this work, we have made various simplifying assumptions, some of which we relaxed in the working paper version of this paper. If seller preferences depend on the price and additionally on their own type—for example, due to opportunity costs that depend on the seller's own type—our results still obtain, only now the match value is the sum of the buyer's and the seller's valuation: if the
sellers' preferences are of the form f^s(y) + p, then our conditions on the match value function refer to f(x, y) + f^s(y). Our results further generalize if sellers also care about the buyer's type, provided they are able to specify the desired buyer type together with the price so as to avoid problems of adverse selection. Alternatively, our results apply if the seller posts a payoff he wants to obtain (rather than a price), which makes the buyer the residual claimant. In addition to the preferences, we also relax the time structure. We consider steady states in a repeated interaction and show that n-root-supermodularity still ensures positive assortative matching. The condition of n-root-supermodularity ($n = (1 - \bar{a})^{-1}$) is still sufficient, though a weaker root that depends on the discount factor may also suffice.¹⁷
¹⁷ For the search technologies in Proposition 3, square-root-supermodularity remains necessary, while for CES matching technologies, weaker conditions apply that depend on the discount factor. Note that these results assume the existence of a steady state, which can be assured under a "cloning" assumption that we make in the working paper.
We conclude with a final thought on the connection to many-to-many matching markets, for which the literature still lacks a characterization of the sorting patterns. While our setup requires each seller to trade a single unit with at most one buyer, it does resemble a particular kind of two-sided many-to-many matching market. When β buyers of type x and σ sellers of type y form a coalition, they produce output M(β, σ)f(x, y). Instead of buyers and sellers, the sides can be interpreted as teachers and students, where a coalition is a school, or machines and workers, where a coalition is a factory. Given the similarity in structure, we expect our results to apply to this setting as well.
APPENDIX
PROOF OF THEOREM 1: We prove the result for case (i), positive assortative matching. An analogous derivation establishes the result for negative assortative matching. The proof for PAM consists of two parts, one for the sufficient condition and one for the necessary condition.
PROPOSITION A1—Sufficiency: If the function f(x, y) is strictly n-root-supermodular, where $n = (1 - \bar{a})^{-1}$, then any equilibrium entails positive assortative matching under any type distributions B(x), S(y).
PROOF—By Contradiction: Consider a (candidate) equilibrium (G, H) that does not entail positive assortative matching. Then there exist (x, y, p) and (x′, y′, p′) on the support of H such that x > x′ but y < y′. Then x has to be part of the solution to the seller's optimization problem (4) for y, and x′ has to be part of the solution to (4) for y′, given U(·; G, H). We contradict this in four steps.
Step 1—Reformulating the sellers' maximization problem. The optimization problem (4) of seller y can be written as

(24)  max_{x,λ,p} m(λ)p  subject to  q(λ)[f(x, y) − p] = U(x, G, H)

      ⇔  max_{x,λ} {m(λ)f(x, y) − λU(x, G, H)}

      ⇔  max_x Π(x, y | U(·, G, H)),

where Π in the last line is defined as

(25)  Π(x, y | V(·)) ≡ max_λ {m(λ)f(x, y) − λV(x)}
for any positive and continuous function V(·). The following obvious property will be useful later: (I) For any two positive and continuous functions V(·) and W(·), and any buyer type x, the inequality Π(x, y | V(·)) < Π(x, y | W(·)) holds if and only if V(x) > W(x). We have achieved the desired contradiction if the maximizer of (24) for y is smaller than that for y′. Defining Γ(y | V(·)) = arg max_x Π(x, y | V(·)), this means that we have achieved the contradiction if

(26)  max Γ(y | U(·, G, H)) ≤ min Γ(y′ | U(·, G, H)).
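The single-crossing logic behind (26) can be visualized numerically. For the illustrative primitives below (an urn-ball technology m(λ) = 1 − e^{−λ}, a root-supermodular f, and a hypothetical buyer utility V, none of which come from the paper), the profit Π of (25) exhibits increasing differences in (x, y), so maximizers for a higher seller type sit weakly above those for a lower type:

```python
import numpy as np

# Sketch of (25) under assumed primitives: Pi(x, y | V) computed by grid
# search over the queue length, then a check that Pi(x, y') - Pi(x, y) is
# increasing in x for y' > y (increasing differences drives (26)).
lam = np.linspace(0.01, 20, 4000)
m = 1 - np.exp(-lam)                      # urn-ball matching probability
f = lambda x, y: (x + y + 1) ** 2         # strictly root-supermodular
V = lambda x: 1 + 0.5 * x                 # hypothetical buyer utility

Pi = lambda x, y: np.max(m * f(x, y) - lam * V(x))

xs = np.linspace(0.0, 1.0, 6)
diffs = [Pi(x, 0.9) - Pi(x, 0.1) for x in xs]
print(np.all(np.diff(diffs) > 0))         # True for this example
```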
Step 2—Introducing differentiability through an auxiliary buyer utility V(·). To show (26), it will be convenient to have Π differentiable. To achieve this, we do not work directly with the buyers' equilibrium utility U(·, G, H); rather, we work with a particular auxiliary function V(·) that we define implicitly by

(27)  Π(x, y | V(·)) = Π(κ, y | U(·, G, H))

for all x ≤ κ ≡ max Γ(y | U(·, G, H)), and V(x) = U(x, G, H) otherwise. This means that if seller y has to leave utility V(x) to the buyers, he is indifferent between all buyer types below κ, that is, Γ(y | V) = [x̲, κ]. Equation (27) defines V(x) uniquely by property (I) established in the previous step. Note that V(x) is differentiable by construction, since the implicit function theorem delivers

V′(x) = (m(λ)/λ) f_x(x, y),

where λ takes the value that maximizes the right hand side of (25). Since κ is a maximizer of Π(x, y | U(·, G, H)), property (I) also establishes another property: (II) V(x) ≤ U(x, G, H) everywhere.

Step 3—Positive cross-partials. Now consider a seller y′ > y in a neighborhood of y. Taking the cross-partial of Π(x, y′ | V) and incorporating that V is defined
by (27) together with (25), we obtain, after some tedious algebra, for all x ∈ [x̲, κ], that

(28)  ∂²Π(x, y′ | V)/∂x ∂y′ |_{y′=y} = m(λ) [ f_{xy}(x, y) − (m′(λ)(λm′(λ) − m(λ)))/(λm″(λ)m(λ)) · f_x(x, y)f_y(x, y)/f(x, y) ],

where λ takes the value that maximizes the right hand side of (25). This cross-partial evaluated at y′ = y is strictly positive, since the right hand side of (28) is strictly positive by strict n-root-supermodularity of f. Hence, for y′ slightly larger than y, the cross-partial remains strictly positive by continuity. On [x̲, κ] we have Π(x, y | V) = Π(κ, y | V) by construction and, therefore, Π(x, y′ | V) < Π(κ, y′ | V) when x < κ. Therefore, any x that maximizes Π(·, y′ | V) has to lie above κ, and we obtain a third property: (III) min Γ(y′ | V) ≥ κ.

Step 4—Reintroducing U(·, G, H) instead of the auxiliary buyer utility V(·). By construction, V(x) = U(x, G, H) for x ≥ κ and, by (II), it holds that V(x) ≤ U(x, G, H) everywhere. Therefore, by (I) we have Π(x, y′ | V) = Π(x, y′ | U(·, G, H)) for x ≥ κ and Π(x, y′ | V) ≥ Π(x, y′ | U(·, G, H)) everywhere. Since, by (III), min Γ(y′ | V) ≥ κ, this implies immediately that min Γ(y′ | U(·, G, H)) ≥ κ. By the definition of κ, this implies (26). Q.E.D.

PROPOSITION A2—Necessity: If any equilibrium is positively assorted under any type distributions B(x) and S(y), then f(x, y) is weakly n-root-supermodular, where n = (1 − a)^{−1}.

PROOF: By contradiction. Suppose there exists some (x̂, ŷ) such that the match value function is not n-root-supermodular, but there exists an equilibrium that exhibits PAM for any distributions B, S. We will contradict this in four steps; the main insights are in the first three steps.

Step 1—Construct a set Z_ε around (x̂, ŷ) where f is nowhere n-root-supermodular. By the smoothness properties of f, there exists ε > 0 such that f is not n-root-supermodular anywhere on Z_ε = [x̂ − ε, x̂ + ε] × [ŷ − ε, ŷ + ε]. We can choose ε such that

f_{xy}(x, y) − α f_x(x, y)f_y(x, y)/f(x, y) < 0

for all (x, y) ∈ Z_ε for some α < a. By continuity of a(λ), there exist λ₁, λ₂ such that a(λ) > α for all λ ∈ [λ₁, λ₂]. If buyer and seller types are in Z_ε and they trade at queue lengths in [λ₁, λ₂], the lack of sufficient supermodularity means that PAM cannot be sustained, as we formalize in the next steps.

Step 2—Let Z_ε shrink so that types are similar. Consider a sequence {ε_k}_{k=1}^{∞}, 0 < ε_k < ε, that monotonically converges to zero. Let B_k and S_k be associated
sequences of distributions of buyer and seller types. Let B_k be uniform with support on X_k = [x̂ − ε_k, x̂ + ε_k] and unit mass, B_k(x̂ + ε_k) = 1. Let S_k be uniform with support on Y_k = [ŷ − ε_k, ŷ + ε_k] with mass S_k(ŷ + ε_k) = 2/(λ₁ + λ₂); that is, the aggregate ratio of buyers to sellers remains constant at the average of λ₁ and λ₂, independent of k. By construction, the buyer–seller types that trade are within Z_ε for any k.

Step 3—For some k, all buyers and sellers trade at queue lengths in (λ₁, λ₂). Consider an equilibrium (G_k, H_k) for each k. Note first that the difference in expected buyer utilities converges to zero, in the sense that for every ξ > 0, there exists κ such that |U(x₁, G_k, H_k) − U(x₂, G_k, H_k)| ≤ ξ for any x₁, x₂ ∈ X_k and any k ≥ κ. This notion of convergence is used throughout this proof. It can be shown based on equilibrium condition (ii), which ensures that |U(x₁, G_k, H_k) − U(x₂, G_k, H_k)| ≤ max_{λ∈[0,∞]} max_{y∈Y_k} q(λ)|f(x₁, y) − f(x₂, y)|. Assuming without loss of generality that x₁ ≥ x₂, the right hand side of the inequality is bounded by f(x₁, ŷ + ε_k) − f(x₂, ŷ − ε_k), and this term vanishes since x₁ − x₂ ≤ 2ε_k → 0 and ŷ + ε_k − (ŷ − ε_k) = 2ε_k → 0 and f is continuous. Given that the differences in buyer utility vanish with large k and given that the distance in types vanishes, it is easy to show that the distance between the highest queue length that is part of a solution to (4) for some y ∈ Y_k and the lowest queue length that is part of a solution to (4) for some y ∈ Y_k converges to zero. (Also, the differences in the value of program (4) across seller types in Y_k vanish with increasing k, as used in the next step.) Since the differences in queue lengths across sellers vanish, but the aggregate buyer–seller ratio is (λ₁ + λ₂)/2, all sellers trade at queue lengths in (λ₁, λ₂) for k sufficiently large. If we restrict attention only to differentiable equilibria, this immediately contradicts the assumption that the equilibria are PAM, since condition (14) in Lemma 1 is violated.

Step 4—Nondifferentiable equilibria. Finally, we rule out equilibria that are PAM but nondifferentiable. Let π_k(y) = max_p π(y, p, G_k, H_k) denote the equilibrium profit of seller y, that is, the value of program (4). In the proof of Proposition A1, the indifference condition (27), which defines the auxiliary utility V_k(x), can be restated as Π(x, y | V_k(·)) = π_k(y), or

(29)  max_λ {m(λ)f(x, y) − λV_k(x)} = π_k(y).
Note that the maximizer of the left-hand side (LHS) of (29) is the value used in (28) in the previous proof. We are done if we can show that there exists k such that the maximizers of the LHS of (29) lie in [λ₁, λ₂] for all x ∈ X_k and any y ∈ Y_k. Then arguments analogous to those in the proof of Proposition A1 establish that there has to be negative assortative matching, since the cross-partial in (28) is negative, ruling out PAM. To show the missing part, recall that the equilibrium profits π_k(y) across sellers in Y_k become nearly identical for large k (see Step 3 above). Since profits lie in a bounded set, there exist a limit point π_∞ and a subsequence such
that, for any ξ, the distance between the equilibrium profit π_k(y) of any y ∈ Y_k and π_∞ is less than ξ as k becomes sufficiently large. This convergence of the RHS of (29) and the vanishing differences between buyer types mean that there is a subsequence for which V_k(x) approaches some limit value V_∞ arbitrarily closely for all x ∈ X_k and any y ∈ Y_k. Since V_k(x) converges to V_∞ and the support of buyer types shrinks to x̂, the queue lengths that maximize the LHS of (29) have to converge. Finally, observe that they have to converge to a value within [λ₁, λ₂], as we now show. The profit π_k(y) can, by (5), be written as max_{x,λ} m(λ)f(x, y) − λU(x, G_k, H_k). Let (x*_k, λ*_k) be the equilibrium type and equilibrium queue length which maximize this expression. Since equilibrium queue lengths lie in [λ₁, λ₂] for large k, as shown in Step 3, we have λ*_k ∈ (λ₁, λ₂) for k large enough. Since all maximizers of the LHS of (29) converge, and λ*_k is such a maximizer (for x*_k), all maximizers converge to the limit of λ*_k, which lies within (λ₁, λ₂). Q.E.D.

PROOF OF PROPOSITION 1: (i) Given search technology m, let a₁ = (2/3)a̲ + (1/3)ā and a₂ = (1/3)a̲ + (2/3)ā, where a̲ and ā denote the lower and upper bounds of a(λ). Choose λ₁ and λ₂ such that a(λ) ∈ [a₁, a₂] for all λ ∈ [λ₁, λ₂]. Consider f(x, y) = (x + y + 1)^{(n+n₂)/2}. This function is n₂-root-supermodular but nowhere n-root-supermodular, where n₂ = (1 − a₂)^{−1} and n = (1 − ā)^{−1}. Now consider a sequence of distributions B_k and S_k with support on [0, ε_k], with B_k(ε_k) = 1 and S_k(ε_k) = 2/(λ₁ + λ₂). Arguments analogous to Steps 2–4 in the proof of Proposition A2 show that all agents desire to trade at queue lengths in (λ₁, λ₂), and n₂-root-supermodularity implies PAM. This establishes the first part. Part (ii) can be established analogously, with match value function f(x, y) = (x + y + 1)^{(n+n₁)/2}, where n₁ = (1 − a₁)^{−1} and n = (1 − a̲)^{−1}. Q.E.D.

PROOF OF PROPOSITION 2: We prove the result for positive sorting; the proof for negative sorting is analogous. We construct a positively assorted differentiable equilibrium (G, H) in three steps: First we explore necessary conditions that restrict the connection between the queue length, the assignment, and the price that different seller types face in equilibrium. Then we "reverse-engineer" the associated equilibrium (G, H) and, finally, we check that the equilibrium conditions are indeed met.

Step 1—Exploiting necessary conditions. Rather than considering equilibrium distributions (G, H) directly, we reverse-engineer them by first exploiting some necessary conditions on the relationship between the queue length Λ(y) [formally, Λ(y, p(y))], the assignment μ(y), and the price p(y) in a differentiable equilibrium. First, the buyer–seller ratio integrated over a range of seller types equals the number of buyers that choose these types (as required by the Radon–Nikodym derivative), which relates Λ to μ via ∫_y^{ȳ} Λ(ỹ) dS(ỹ) = ∫_{μ(y)}^{x̄} dB(x̃). This yields

(30)  μ′(y) = s(y)Λ(y)/b(μ(y)).
Second, Λ and μ are linked via the first-order conditions given in (6) and (7) for some positive and increasing function U(·). From (6) and (7) we can derive (11), which together with (30) yields

(31)  Λ′(y) = − (1/(m″(Λ(y))f(μ(y), y))) × [ (Λ(y)m′(Λ(y)) − m(Λ(y))) (s(y)/b(μ(y))) f_x(μ(y), y) + m′(Λ(y))f_y(μ(y), y) ].
Third, Λ and μ are linked via two boundary conditions. Intuitively, the lowest active buyer type, that is, the lowest type x₀ that does not take the outside option, has to obtain at least as much utility as the outside option of zero, and has to get exactly zero if x₀ > x̲; otherwise, lower types would gain by becoming active. A similar logic holds for the lowest seller type y₀ that trades in equilibrium. Therefore, the boundary buyers' equilibrium utility [given in (6)] and the boundary sellers' equilibrium profits [given by (6) substituted into (5)] have to satisfy

(32)  m′(Λ(y₀))f(μ(y₀), y₀) ≥ 0, with equality if μ(y₀) > x̲,

(33)  [m(Λ(y₀)) − Λ(y₀)m′(Λ(y₀))]f(μ(y₀), y₀) ≥ 0, with equality if y₀ > y̲.
Equations (30) and (31) together constitute a differential equation system in (Λ, μ). One initial condition is μ(ȳ) = x̄. Given a second initial condition on the queue length at the top seller, Λ(ȳ) = λ̄ ∈ (0, ∞), the system uniquely determines Λ(y) and μ(y) (in the direction of lower y) at all y down to some limit point y₀(λ̄). This limit point is characterized either by y₀(λ̄) = y̲, or by μ(y₀(λ̄)) = x̲, or by lim_{y↓y₀(λ̄)} Λ(y) = 0, or by lim_{y↓y₀(λ̄)} Λ(y) = ∞, whichever arises first. Since the lower bound has to satisfy (32) and (33), this imposes restrictions on the free parameter λ̄. We can show (the proof is available in the working paper version of the paper) that there exists an initial condition λ̄ ∈ (0, ∞) such that the resulting y₀(λ̄), Λ(y₀(λ̄)), and μ(y₀(λ̄)) fulfill the boundary conditions (32) and (33). For the following discussion, consider such a λ̄, which fixes the associated solutions Λ and μ to (30) and (31), and fixes the associated boundary types y₀ and x₀ = μ(y₀) uniquely.

The price function p(y) for each type y ≥ y₀ can then be reconstructed, since the profit m(Λ(y))p(y) has to equal the constructed profits given by (6) substituted into (5), yielding, after division by m(Λ(y)),

p(y) = [1 − Λ(y)m′(Λ(y))/m(Λ(y))]f(μ(y), y).

For types below y₀, note that y₀ > y̲ implies by (33) that Λ(y₀) = lim_{y↓y₀} Λ(y) = 0, which implies p(y₀) = lim_{y↓y₀} p(y) = 0, since lim_{Λ→0} m(Λ)/Λ = m′(0) and, therefore, lim_{y↓y₀} Λ(y)m′(Λ(y))/m(Λ(y)) = 1. Note that the finite limit lim_{Λ→0} m(Λ)/Λ = m′(0) indeed exists: finite m′(λ) exists by assumption for λ > 0, is monotone (by m″(λ) < 0), and is bounded (by m′(λ) ≤ 1, as otherwise lim_{λ→0} q(λ) = lim_{λ→0} m(λ)/λ = lim_{λ→0} m′(λ) > 1, which would violate q(λ) ∈ [0, 1]). The boundary seller does not obtain any buyers even at a zero price, and all types below him also obtain no buyers independent of the price they charge, because their quality is also too low. So we can set p(y) = 0 for all y < y₀.

Step 2—Recovering the equilibrium (G, H). The equilibrium distributions (G, H) can now be constructed from the μ and p functions derived in the first step. Consider the sellers first. We integrate over all of them that offer prices below p as derived in the previous step:

G(y, p) = ∫_{y₀}^{y} s(ỹ) I[p(ỹ) ≤ p] dỹ,

where I is an indicator function that takes the value 1 if the qualifier in square brackets is true and takes the value 0 otherwise. Clearly, G_Y = S by construction, as required. Next consider the buyers. Types below x₀ choose their outside option ∅. That is, at any price p ≥ 0 these types trade below (by our convention in footnote 5) and, therefore, have mass B(x). Therefore, for all x < x₀ we have H(x, y, p) = B(x) for all (y, p) ∈ (Y × P) ∪ {∅}. For all buyers with x ≥ x₀ we have H(x, ∅) = B(x₀) and, for all other (y, p) ∈ Y × P,

(34)  H(x, y, p) = ∫_{y₀}^{y} b(μ(ỹ)) I[μ(ỹ) ≤ x] I[p(ỹ) ≤ p] μ′(ỹ) dỹ + B(x₀).

Clearly, H_X = B, as required.
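The reverse-engineering of Steps 1 and 2 lends itself to a numerical sketch. The fragment below is illustrative only: the urn-ball technology, the power-function match value, uniform type densities b = s = 1, and the crude Euler scheme are all assumptions made for the example; one would then search over the shooting parameter λ̄ until the terminal values satisfy the boundary conditions (32) and (33):

```python
import numpy as np

# Rough Euler "shooting" sketch of Step 1 (assumed primitives): integrate
# (30)-(31) downward from the top types mu(y_bar) = 1, Lambda(y_bar) = lam_bar,
# with b = s = 1 on [0, 1], m(lam) = 1 - exp(-lam), f(x, y) = (x + y + 1)^2.
m  = lambda l: 1 - np.exp(-l)
m1 = lambda l: np.exp(-l)          # m'
m2 = lambda l: -np.exp(-l)         # m''
f  = lambda x, y: (x + y + 1) ** 2
fx = lambda x, y: 2 * (x + y + 1)
fy = lambda x, y: 2 * (x + y + 1)

def shoot(lam_bar, steps=20000):
    y, mu, lam, price = 1.0, 1.0, lam_bar, 0.0
    dy = 1.0 / steps
    while y > 0 and mu > 0 and 1e-6 < lam < 1e6:
        # (31) with b = s = 1, and (30): mu'(y) = Lambda(y)
        dlam = -((lam * m1(lam) - m(lam)) * fx(mu, y) + m1(lam) * fy(mu, y)) \
               / (m2(lam) * f(mu, y))
        y, mu, lam = y - dy, mu - lam * dy, lam - dlam * dy
        price = (1 - lam * m1(lam) / m(lam)) * f(mu, y)   # on-path price
    return y, mu, lam, price

print(shoot(1.0))   # adjust lam_bar until (32)-(33) hold at the limit point
```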
Step 3—Checking the equilibrium conditions. By construction, the function Λ(y) constructed in the first step coincides with the Radon–Nikodym derivative Λ_{G,H}(y, p) of H with respect to G along all (y, p(y)). Also, the function U(·) in the first step coincides with U(·, G, H) by construction. To check that (G, H) is indeed an equilibrium, we can extend Λ_{G,H} to the entire domain by (3) and check the equilibrium conditions (i) and (ii). Condition (i) amounts to verifying that no seller y wants to deviate and offer a price different from p(y) constructed above (because the (y, p(y)) are the only combinations in the support of G), which is equivalent to checking that no seller has a profitable deviation from (μ(y), Λ(y), p(y)) in (4). Additionally, condition (ii) requires us to check that no buyer μ(y) wants to deviate and trade at some combination other than (y, p(y)) (again, the (μ(y), y, p(y)) are the only combinations in the support of H, except for those buyers below x₀, for whom we have to check that they do not want to trade at all).

The verification is facilitated by the observation that if sellers do not have an incentive to deviate, then buyers have no incentive to deviate. This follows
directly from the fact that a profitable deviation for buyers would mean that, in program (4), sellers can make higher profits. (Another way to see this is that (31) is exactly the buyers' envelope condition.) Moreover, for the sellers, we only have to consider types in [y₀, ȳ]. If there are seller types below y₀, these types do not have a profitable deviation because, by boundary condition (33), type y₀ makes zero profits, and we will verify that he does not have a profitable deviation despite being a higher type.

For types in [y₀, ȳ], we know that (μ(y), Λ(y), p(y)) constructed above is indeed a local maximum in (4), because n-root-supermodularity implies that the Hessian (8) is negative definite. We now establish that the solution is a global maximum. Consider a seller y with assigned buyer type x, that is, x = μ(y). Now suppose there is another buyer x′ = μ(y′), different from x, which is optimal for y, that is, x′ satisfies the necessary first-order conditions (6) and (7) for optimality for seller y (together with some queue length).¹⁸ Since (x′, y) fulfill both (6) and (7), they satisfy the generalization of (6),

(35)  q(ς(x′, y))f_x(x′, y) − U′(x′) = 0,

where ς(x′, y) is defined as the queue length such that Λ = ς(x′, y) solves m′(Λ)f(x′, y) − U(x′) = 0, in analogy to (7). Now suppose that x′ > x, which implies y′ > y; the opposite case is analogous. Since μ(y′) = x′, these types also fulfill (6) and (7) by our construction in Step 1; therefore, they also fulfill

(36)  q(ς(x′, y′))f_x(x′, y′) − U′(x′) = 0.

We rule out that both (35) and (36) are satisfied simultaneously by showing that q(ς(x′, y))f_x(x′, y) is strictly increasing in y. The derivative of this expression with respect to y, together with implicit differentiation of (7) to recover ∂ς(x′, y)/∂y, is strictly positive if and only if f_{xy}(x′, y) > a(ς(x′, y))f_y(x′, y)f_x(x′, y)f(x′, y)^{−1}, which is ensured by n-root-supermodularity (where n = (1 − a)^{−1}). This implies that the solution to the first-order conditions (6) and (7) is a global maximum. Q.E.D.

PROOF OF PROPOSITION 3: Trade in pairs requires λq(λ) = m(λ). Therefore, q″(λ) = [m″(λ) − 2q′(λ)]λ^{−1}. Then |q″(0)| < ∞ implies m″(0) = 2q′(0). Together with q′(0) = 0, this implies m″(0) = 0. Use λq(λ) = m(λ) to write (13) as a(λ) = m′(λ)q′(λ)/(m″(λ)q(λ)) and substitute to get a(0) = m′(0)/(2q(0)). Since q(0) = lim_{λ→0} m(λ)/λ = m′(0), one obtains a(0) = 1/2.

¹⁸This argument assumes that x′ satisfies x′ = μ(y′) for some y′, which does not hold if x′ < x₀. Note that in the case x′ < x₀, both types x′ and x₀ obtain zero utility (see (32)), and seller y is at least as well off according to (4) by attracting x₀ as by attracting x′. For x₀ it holds that μ(y₀) = x₀.
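As a numerical aside (our own illustration, not part of the proof), the urn-ball technology m(λ) = 1 − e^{−λ} satisfies trade in pairs with q(λ) = m(λ)/λ, and a direct computation of a(λ) = m′(λ)q′(λ)/(m″(λ)q(λ)) confirms both a(0) = 1/2 and the bound a(λ) ≤ 1/2 that is established in general next:

```python
import numpy as np

# Check of a(0) = 1/2 for one trade-in-pairs technology, the urn-ball
# m(lam) = 1 - exp(-lam) with q(lam) = m(lam)/lam (an assumed example).
lam = np.linspace(1e-4, 25, 100000)
m, m1, m2 = 1 - np.exp(-lam), np.exp(-lam), -np.exp(-lam)
q = m / lam
q1 = (lam * m1 - m) / lam ** 2
a = m1 * q1 / (m2 * q)            # a(lam) = m'(lam) q'(lam) / (m''(lam) q(lam))
print(a[0], a.max())              # ~0.5 as lam -> 0, and never exceeds 1/2
```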
Further, a(λ) ≤ 1/2 for all λ if and only if q(λ)^{−1} is convex: since q(λ)^{−1} = λm(λ)^{−1}, we have (suppressing the argument of m)

(λm^{−1})″ = λ²m^{−3}[−m″q + 2m′q′].

This is positive if and only if −m″q + 2m′q′ ≥ 0 or, equivalently, a(λ) = m′q′/(m″q) ≤ 1/2. Q.E.D.

PROOF OF PROPOSITION 4: We first show that the planner's assignment coincides with the equilibrium assignment if it is positive assortative. Then we sketch why root-supermodularity implies the associated direction of sorting by showing that it induces the direction locally (the full proof for global assortative matching is available on request).

Assume that H in the planner's solution is assortative, that is, it permits an assignment μ that is strictly monotone. Since H_{X,Y}(μ(y), y) = B(μ(y)) and H_X = B, all the mass is concentrated only on (μ(y), y) pairs. For a given (μ(y), y) pair, the concavity of the matching function implies that it is optimal that all of these agents trade at the same queue length Λ(y) [formally, Λ(y, p(y)) for some p(y)].¹⁹ Since all mass is concentrated only on (μ(y), y) pairs, the constraints can be conveniently summarized by a single constraint, ∫_y^{ȳ} Λ(ỹ) dS(ỹ) = 1 − B(μ(y)) in the case of positive sorting and ∫_y^{ȳ} Λ(ỹ) dS(ỹ) = B(μ(y)) in the case of negative sorting. For given (G, H), there is almost everywhere a unique Λ fulfilling this constraint, and a given Λ yields a unique μ and, thus, a unique (G, H) [for the given p(y)], as can be seen by the analogous construction in Step 2 of the existence proof. The planner can, therefore, directly control Λ, which by the constraint governs the assignment μ, leading to the much simpler control problem

(37)  max_{Λ, y₀} ∫_{y₀}^{ȳ} s(y) · m(Λ(y)) · f(μ(y), y) dy

      s.t.  μ′(y) = ±s(y)Λ(y)/b(μ(y)),

where the sign on the constraint is positive for positive sorting and negative for negative sorting, and y₀ denotes the lowest seller type that is assigned buyers by the planner. The Hamiltonian for problem (37) is

(38)  H(y, Λ, μ) = s(y) · m(Λ) · f(μ, y) + φ · s(y)Λ/b(μ),

where φ is the multiplier.

¹⁹Formally, the objective in (18) can be written as max_{G,H} ∫ q(Λ_{G,H}(y, p))f(μ(y), y) dH_{Y,P}, which is equivalent to max_{G,H} ∫ m(Λ_{G,H}(y, p))f(μ(y), y) dG by the third constraint. This problem is equivalent to max_{G,H} ∫ m(Λ(y))f(μ(y), y) dG_Y such that G_Y = S, H_X = B, and Λ = dH_Y/dG_Y, where Λ(y) := ∫ Λ_{G,H}(y, p) dG(p|y), since the concavity of m always makes it optimal to assign the average queue length.
The optimality conditions for the Hamiltonian are

Λ:  ∂H/∂Λ = m′(Λ) · f(μ, y) + φ/b(μ) = 0,

μ:  ∂H/∂μ = s(y) · m(Λ) · f_x(μ, y) − φ · s(y)Λ b′(μ)/b²(μ) = −φ′.

Defining A(μ(y)) = −φ(y)/b(μ(y)), the optimality conditions can be written as

(39)  m′(Λ(y)) · f(μ(y), y) = A(μ(y)),

(40)  q(Λ(y)) · f_x(μ(y), y) = A′(μ(y)).

These equations are identical to the first-order conditions (6) and (7) of the decentralized economy with an appropriate reinterpretation of the variables. To establish that the solution to this program is identical to the solution of the decentralized economy, focus on the case of positive sorting (the alternative case follows analogous steps). The planner's boundary conditions are the following: at the upper bound, assortative matching means that μ(ȳ) = x̄; at the lower bound, observe that it is never optimal to assign lower types if higher types have matching probability zero. Therefore, Λ(y) = 0 or Λ(y) = ∞ only at y = y₀. Moreover, obviously y₀ ≥ y̲ and μ(y₀) ≥ x̲. Therefore, the planner's problem has the same boundary conditions as the decentralized equilibrium. In the proof of existence (Proposition 2), we showed that under n-root-supermodularity with n = (1 − a)^{−1}, any solution of these first-order conditions and boundary conditions constitutes an equilibrium when integrated up to the corresponding distributions (G, H).

Finally, we sketch why the planner's solution is positive assortative if f is n-root-supermodular with n = (1 − a)^{−1}. Assume that the planner's solution locally yields a differentiable assignment: on some subset of X × Y, the distribution H fulfills H_{X,Y}(μ(y), y) = B(μ(y)) for some function μ that is differentiable. Optimality still requires that μ satisfies (38) and the associated optimality conditions (39) and (40). Yet to maximize the Hamiltonian, the second-order condition must be satisfied: (39) and (40) are identical to (6) and (7) under an appropriate relabeling of variables, and the second-order condition therefore reduces to (14), which requires positive sorting. This rules out locally decreasing assignments. A tedious proof that extends this logic globally is available from the authors. Q.E.D.
These equations are identical to first-order conditions (6) and (7) of the decentralized economy with appropriate reinterpretation of the variables. To establish that the solution to this program is identical to the solution of the decentralized economy, focus on the case of positive sorting (the alternative case follows analogous steps). The planner’s boundary conditions are the following: at the upper bound, assortative matching means that μ(y) = x; at the lower bound, observe that it is never optimal to assign lower types if higher types have matching probability zero. Therefore, Λ(y) = 0 or Λ(y) = ∞ only at y = y0 Moreover, obviously y0 ≥ y and μ(y0 ) ≥ x Therefore, the planners’ problem has the same boundary conditions as the decentralized equilibrium. In the proof of existence (Proposition 2), we showed that under n-rootsupermodularity for n = (1 − a)−1 any solution of these first-order conditions and the boundary solutions constitutes an equilibrium when integrated up to the corresponding distributions (G H) Finally, we sketch why the planner’s solution is positive assortative if f is n-root-supermodular with n = (1 − a)−1 . Assume that the planner’s solution locally yields a differentiable assignment: on some subset of X × Y , the distribution H fulfills HX Y (μ(y) y) = B(x) for some function μ that is differentiable. Optimality still requires that μ satisfies (38) and associated optimality conditions (39) and (40). Yet to maximize the Hamiltonian, the second-order condition must be satisfied: (39) and (40) are identical to (6) and (7) under appropriate relabeling of variables, and the second-order condition therefore reduces to (14), which requires positive sorting. This rules out locally decreasing assignments. A tedious proof that extends this logic globally is available from the authors. Q.E.D. PROOF OF THE EQUILIBRIUM PRICE SCHEDULE: In a differentiable assortative equilibrium with price function p(y) assignment function μ(y), and queue length Λ(y) [formally Λ(y p(y))], the equilibrium buyer utility U(μ(y)) = q(Λ(y))[f (μ(y) y) − p(y)] can be totally differentiated to get (41)
U μ = Λ q [f − p] + q(fx μ + fy − p )
where we suppress all arguments. Note further that firms' equilibrium profits can be recovered by substituting (6) into (5), yielding [m − Λm′]f. Equating this to the definition of expected profits as trading probability times price (i.e., mp), we obtain the price schedule p(y) along the equilibrium path as p = [1 − Λm′/m]f = [1 − η_m]f. Substituting this and (6) into (41), we get, after canceling terms, that 0 = q′η_m Λ′f + q[f_y − p′]. We can solve this for p′: use (11) to substitute out Λ′f and use the fact that a = m′q′/(m″q) to get, after rearranging, that p′ = f_y + a[f_x μ′(m/Λ − m′)η_m/m′ − (1 − η_m)f_y]. Since [m/Λ − m′]η_m/m′ = 1 − η_m, we obtain (17). Q.E.D.

REFERENCES

ACEMOGLU, D., AND R. SHIMER (1999a): "Efficient Unemployment Insurance," Journal of Political Economy, 107, 893–928. [545,557]
——— (1999b): "Holdups and Efficiency With Search Frictions," International Economic Review, 40, 827–849. [543,545,546,557]
BECKER, G. S. (1973): "A Theory of Marriage: Part I," Journal of Political Economy, 81, 813–846. [539,540,552,555,556,559-562]
BURDETT, K., S. SHI, AND R. WRIGHT (2001): "Pricing and Matching With Frictions," Journal of Political Economy, 109, 1060–1085. [557]
BUTTERS, G. R. (1977): "Equilibrium Distributions of Sales and Advertising Prices," Review of Economic Studies, 44, 465–491. [544]
DELACROIX, A., AND S. SHI (2006): "Directed Search on the Job and the Wage Ladder," International Economic Review, 47, 651–699. [558]
EECKHOUT, J., AND P. KIRCHER (2010): "Sorting versus Screening: Search Frictions and Competing Mechanisms," Journal of Economic Theory (forthcoming). [558]
GALE, D. (1986): "Bargaining and Competition, Part I: Characterization," Econometrica, 54, 785–806. [561]
GALENIANOS, M., AND P. KIRCHER (2009): "Directed Search With Multiple Job Applications," Journal of Economic Theory, 114, 445–471. [558]
HICKS, J. R. (1932): The Theory of Wages. London: Macmillan. [550]
HOSIOS, A. (1990): "On the Efficiency of Matching and Related Models of Search and Unemployment," Review of Economic Studies, 57, 279–298. [543,557,562]
KANTOROVICH, L. V. (1958): "On the Translocation of Masses," Management Science, 5, 1–4. [553]
KIRCHER, P. (2009): "Efficiency of Simultaneous Search," Journal of Political Economy, 117, 861–913. [558]
KIYOTAKI, N., AND R. WRIGHT (1993): "A Search-Theoretic Approach to Monetary Economics," American Economic Review, 83, 63–77. [545]
LAUERMANN, S. (2007): "Dynamic Matching and Bargaining Games: A General Approach," Mimeo, University of Michigan. [561]
MAS-COLELL, A. (1984): "On a Theorem by Schmeidler," Journal of Mathematical Economics, 13, 201–206. [545]
MCAFEE, R. P. (1993): "Mechanism Design by Competing Sellers," Econometrica, 61, 1281–1312. [546,558]
MENZIO, G. (2007): "A Theory of Partially Directed Search," Journal of Political Economy, 115, 748–769. [554]
MOEN, E. (1997): "Competitive Search Equilibrium," Journal of Political Economy, 105, 385–411. [543,557]
MORTENSEN, D., AND R. WRIGHT (2002): "Competitive Pricing and Efficiency in Search Equilibrium," International Economic Review, 43, 1–20. [558]
PETERS, M. (1991): "Ex ante Pricing in Matching Games: Non Steady States," Econometrica, 59, 1425–1454. [540,544-546,557]
——— (1997a): "A Competitive Distribution of Auctions," The Review of Economic Studies, 64, 97–123. [540,546,558]
——— (1997b): "On the Equivalence of Walrasian and Non-Walrasian Equilibria in Contract Markets: The Case of Complete Contracts," The Review of Economic Studies, 64, 241–264. [557]
——— (2000): "Limits of Exact Equilibria for Capacity Constrained Sellers With Costly Search," Journal of Economic Theory, 95, 139–168. [545,546,557]
——— (2007): "Unobservable Heterogeneity in Directed Search," Mimeo, UBC. [557]
ROSEN, S. (1974): "Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition," Journal of Political Economy, 82, 34–55. [539,540,546,560]
——— (2002): "Markets and Diversity," American Economic Review, 92, 1–15. [540]
RUBINSTEIN, A., AND A. WOLINSKY (1985): "Equilibrium in a Market With Sequential Bargaining," Econometrica, 53, 1133–1150. [561]
SHI, S. (2001): "Frictional Assignment. 1. Efficiency," Journal of Economic Theory, 98, 232–260. [541,543,544,549,552,557,558]
——— (2002): "A Directed Search Model of Inequality With Heterogeneous Skills and Skill-Biased Technology," The Review of Economic Studies, 69, 467–491. [558]
SHIMER, R. (2005): "The Assignment of Workers to Jobs in an Economy With Coordination Frictions," Journal of Political Economy, 113, 996–1025. [544,546,558]
SHIMER, R., AND L. SMITH (2000): "Assortative Matching and Search," Econometrica, 68, 343–369. [539,542,559,560]
SMITH, L. (2006): "The Marriage Model With Search Frictions," Journal of Political Economy, 114, 1124–1144. [542]
Dept. of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104, U.S.A. and UPF Barcelona, ICREA, and GSE; [email protected]
and
Dept. of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104, U.S.A. and University of Oxford, Oxford, U.K.; [email protected].

Manuscript received May, 2008; final revision received October, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 575–601
INSTRUMENTAL VARIABLE MODELS FOR DISCRETE OUTCOMES

BY ANDREW CHESHER¹

Single equation instrumental variable models for discrete outcomes are shown to be set identifying, not point identifying, for the structural functions that deliver the values of the discrete outcome. Bounds on identified sets are derived for a general nonparametric model and sharp set identification is demonstrated in the binary outcome case. Point identification is typically not achieved by imposing parametric restrictions. The extent of an identified set varies with the strength and support of instruments, and typically shrinks as the support of a discrete outcome grows. The paper extends the analysis of structural quantile functions with endogenous arguments to cases in which there are discrete outcomes.

KEYWORDS: Partial identification, nonparametric methods, nonadditive models, discrete distributions, ordered choice, endogeneity, instrumental variables, structural quantile functions, incomplete models.
1. INTRODUCTION

THIS PAPER GIVES RESULTS on the identifying power of single equation instrumental variables (IV) models for a discrete outcome, Y, in which explanatory variables, X, may be endogenous. Outcomes can be binary, for example, indicating the occurrence of an event; integer valued, for example, recording counts of events; or ordered, for example, giving a point on an attitudinal scale or obtained by interval censoring of an unobserved continuous outcome. Endogenous and other observed variables can be continuous or discrete.

The scalar discrete outcome Y is determined by a structural function Y = h(X, U); it is identification of the function h that is studied. Here X is a vector of possibly endogenous variables, U is a scalar continuously distributed unobservable random variable that is normalized to be marginally uniformly distributed on the unit interval, and h is restricted to be weakly monotonic, normalized nondecreasing, and caglad in U.

¹I thank Victor Chernozhukov, Martin Cripps, Russell Davidson, Simon Sokbae Lee, Arthur Lewbel, Charles Manski, Lars Nesheim, Adam Rosen, Konrad Smolinski, and Richard Spady for stimulating comments and discussions, and referees for very helpful comments. The support of the Leverhulme Trust through grants to the research project Evidence Inference and Inquiry and to the Centre for Microdata Methods and Practice (CeMMAP) is acknowledged. The support for CeMMAP given by the U.K. Economic and Social Research Council under Grant RES-58928-0001 since June 2007 is acknowledged. This is a revised and corrected version of CeMMAP Working Paper CWP 05/07 titled "Endogeneity and Discrete Outcomes." The main results of the paper were presented at an Oberwolfach Workshop on March 19, 2007. Detailed results for binary response models were given at a conference in honor of the 60th birthday of Peter Robinson at the LSE, May 25, 2007. I am grateful for comments at these meetings and at subsequent presentations of this and related papers.
There are instrumental variables Z excluded from the structural function h, and U is distributed independently of Z for Z lying in a set Ω. X may be endogenous in the sense that U and X may not be independently distributed. This is a single equation model in the sense that there is no specification of structural equations that determine the value of X. In this respect, the model is incomplete.

There could be parametric restrictions. For example, the function h(X, U) could be specified to be the structural function associated with a probit or a logit model with endogenous X; in the latter case,

h(X, U) = 1[U > (1 + exp(Xβ))^{−1}],  U ∼ Unif(0, 1),

with U potentially jointly dependent with X but independent of the instrumental variables Z, which are excluded from h. The results of this paper apply in this case. Until now, instrumental variables analysis of binary outcome models has been confined to linear probability models.

The central result of this paper is that the single equation IV model set identifies the structural function h. Parametric restrictions on the structural function do not typically secure point identification, although they may reduce the extent of identified sets. Underpinning the identification results are the inequalities, for all τ ∈ (0, 1) and z ∈ Ω,

(1)  Pr_a[Y ≤ h(X, τ)|Z = z] ≥ τ,  Pr_a[Y < h(X, τ)|Z = z] < τ,
which hold for any structural function h that is an element of an admissible structure that generates the probability measure indicated by Pr_a. In the binary outcome case, these inequalities sharply define the identified set of structural functions for the probability measure under consideration, in the sense that all functions h, and only functions h, that satisfy these inequalities for all τ ∈ (0, 1) and all z ∈ Ω are elements of the observationally equivalent admissible structures that generate the probability measure Pr_a.

When Y has more than two points of support, the model places restrictions on structural functions in addition to those that come from (1), and the inequalities define an outer region,² that is, a set within which lies the set of structural functions identified by the model. Calculation of the sharp identified set seems to be infeasible when X is continuous, or discrete with many points of support, without additional restrictions. Similar issues arise in some of the models of oligopoly market entry discussed in Berry and Tamer (2006).

²This terminology is borrowed from Beresteanu, Molchanov, and Molinari (2008).

When the outcome Y is continuously distributed (in which case h is strictly monotonic in U), both probabilities in (1) are equal to τ and, with additional
completeness restrictions, the model point identifies the structural function, as set out in Chernozhukov and Hansen (2005), where the function h is called a structural quantile function. This paper extends the analysis of structural quantile functions to cases in which outcomes are discrete.

Many applied researchers facing a discrete outcome and endogenous explanatory variables use a control function approach. This is rooted in a more restrictive complete, triangular model that can be point identifying, but the model's restrictions are not always applicable. There is a brief discussion in Section 4 and a detailed comparison with the single equation instrumental variable model in Chesher (2009).

A few papers take a single equation IV approach to endogeneity in parametric count data models, basing identification on moment conditions.³ Mullahy (1997) and Windmeijer and Santos Silva (1997) considered models in which the conditional expectation of a count variable, given explanatory variables X = x and an unobserved scalar heterogeneity term V = v, is multiplicative: exp(xβ) × v, with X and V correlated, and with V and the instrumental variables Z having a degree of independent variation. This IV model can point identify β, but the fine details of the functional form restrictions are influential in securing point identification, and the approach, based as it is on a multiplicative heterogeneity specification, is not applicable when discrete variables have bounded support.

The paper is organized as follows. The main results of the paper are given in Section 2, which specifies an IV model for a discrete outcome, and presents and discusses the set identification results. Section 3 presents two illustrative examples: one with a binary outcome and a binary endogenous variable; the other involving a parametric ordered-probit-type problem. Section 4 discusses alternatives to the set identifying single equation IV model and outlines some extensions, including the case that arises with panel data when there is a vector of discrete outcomes.

2. IV MODELS AND THEIR IDENTIFYING POWER

This section presents the main results of the paper. Section 2.1 defines a single equation instrumental variable model for a discrete outcome and develops the probability inequalities that play a key role in defining the identified set of structural functions. In Section 2.2, theorems are presented that deliver bounds on the set of structural functions identified by the IV model in the M > 2 outcome case and deliver sharp identification in the binary outcome case. Section 2.3 discusses the identification results, with brief comments on the impact of the support and strength of instruments and the discreteness of the outcome on the identified set, on sharpness, and on local independence restrictions.
³See the discussion in Section 11.3.2 of Cameron and Trivedi (1998).
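Returning to the logit special case of the Introduction, the setting can be simulated directly. In the sketch below (our illustration; the Gaussian dependence between U and X and the instrument equation X = Z + η are assumptions of the example, not of the model), U is uniform and independent of Z, yet strongly dependent with X:

```python
import numpy as np
from scipy.special import ndtr   # standard normal CDF

# Simulation sketch of the logit IV design (assumed DGP; the model only
# requires U _|_ Z and Y = h(X, U) with h of the logit threshold form).
rng = np.random.default_rng(0)
n, beta, rho = 200_000, 1.0, 0.6
z = rng.choice([-1.0, 0.0, 1.0], size=n)
eps = rng.standard_normal(n)
eta = rho * eps + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
u = ndtr(eps)                          # U ~ Unif(0,1), independent of Z
x = z + eta                            # endogenous: eta (hence U) moves X
y = (u > 1.0 / (1.0 + np.exp(x * beta))).astype(int)   # h(X, U)

print(np.corrcoef(u, x)[0, 1])         # clearly nonzero: X is endogenous
for zv in (-1.0, 0.0, 1.0):            # yet Pr[U <= 0.25 | Z = z] ~ 0.25
    print(zv, np.mean(u[z == zv] <= 0.25))
```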
2.1. Model

The following two restrictions define a model D for a scalar discrete outcome.

RESTRICTION D1: Y = h(X, U), where U ∈ (0, 1) is continuously distributed and h is weakly monotonic (normalized caglad, nondecreasing) in its last argument. X is a vector of explanatory variables. The codomain of h is some ascending sequence {y_m}_{m=1}^{M} which is independent of X. M may be unbounded. The function h is normalized so that the marginal distribution of U is uniform.

RESTRICTION D2: There exists a vector Z such that Pr[U ≤ τ|Z = z] = τ for all τ ∈ (0, 1) and all z ∈ Ω.

A key implication of the weak monotonicity condition contained in Restriction D1 is that the function h(x, u) is characterized by threshold functions {p_m(x)}_{m=0}^{M} with, for m ∈ {1, ..., M},

(2)  h(x, u) = y_m  if and only if  p_{m−1}(x) < u ≤ p_m(x)

and, for all x, p₀(x) ≡ 0 and p_M(x) ≡ 1. The structural function h is a nondecreasing step function, and the value of Y increases as U ascends through thresholds that depend on the value of the explanatory variables X, but not on Z.

Restriction D2 requires that the conditional distribution of U given Z = z be invariant with respect to z for variations within Ω. If Z is a random variable and Ω is its support, then the model requires that U and Z be independently distributed. But Z is not required to be a random variable. For example, values of Z might be chosen purposively, for example by an experimenter, and then Ω is some set of values of Z that can be chosen.

Restriction D1 excludes the variables Z from the structural function h. These variables play the role of instrumental variables with the potential to contribute to the identifying power of the model if they are indeed "instrumental" in determining the value of the endogenous X. But the model D places no restrictions on the way in which the variables X, possibly endogenous, are generated.

Data are informative about the conditional distribution function of (Y, X) given Z for Z = z ∈ Ω, denoted by F_{YX|Z}(y, x|z). Let F_{UX|Z} denote the joint distribution function of U and X given Z. Under the weak monotonicity condition embodied in the model D, an admissible structure S^a ≡ {h^a, F^a_{UX|Z}} with structural function h^a delivers a conditional distribution for (Y, X) given Z as

(3)  F^a_{YX|Z}(y_m, x|z) = F^a_{UX|Z}(p^a_m(x), x|z),  m ∈ {1, ..., M}.

Here the functions {p^a_m(x)}_{m=0}^{M} are the threshold functions that characterize the structural function h^a as in (2) above.
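A minimal sketch of the threshold representation (2) in code (the particular thresholds and outcome points are hypothetical; only the representation itself comes from Restriction D1):

```python
import numpy as np

# h(x, .) is a nondecreasing step function: it equals y_m on the interval
# (p_{m-1}(x), p_m(x)], with p_0 = 0 and p_M = 1.  Thresholds are illustrative.
y_points = np.array([1.0, 2.0, 3.0])                 # {y_m}, M = 3
def thresholds(x):                                   # p_0(x), ..., p_M(x)
    return np.array([0.0, 0.3 + 0.1 * x, 0.7 + 0.2 * x, 1.0])

def h(x, u):
    p = thresholds(x)
    m = np.searchsorted(p, u, side='left')           # first m with p_m >= u
    return y_points[max(m - 1, 0)]

print(h(0.5, 0.2), h(0.5, 0.4), h(0.5, 0.9))         # 1.0 2.0 3.0
```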
Distinct structures admitted by the model D can deliver identical distributions of Y and X given Z for all z ∈ Ω. Such structures are observationally equivalent, and the model is set, not point, identifying because within a set of admissible observationally equivalent structures there can be more than one distinct structural function. This can happen because, on the right hand side of (3), certain variations in the functions p^a_m(x) can be offset by altering the sensitivity of F^a_{UX|Z}(u, x|z) to variations in u and x so that the left hand side of (3) is left unchanged. Crucially, the independence restriction (Restriction D2) places limits on the variations in the functions p^a_m(x) that can be so compensated, and results in the model having nontrivial set identifying power.

A pair of probability inequalities place limits on the structural functions which lie in the set identified by the model. They are the subject of the following theorem.⁴

THEOREM 1: Let S^a ≡ {h^a, F^a_{UX|Z}} be a structure admitted by the model D that delivers a distribution function for (Y, X) given Z (F^a_{YX|Z}), and let Pr_a indicate probabilities calculated using this distribution. Then, for all z ∈ Ω and τ ∈ (0, 1), the following inequalities hold:

(4)  Pr_a[Y ≤ h^a(X, τ)|Z = z] ≥ τ,  Pr_a[Y < h^a(X, τ)|Z = z] < τ.

PROOF: For all x, each admissible h^a(x, u) is caglad for variations in u, so for all x and τ ∈ (0, 1),

{u : h^a(x, u) ≤ h^a(x, τ)} ⊇ {u : u ≤ τ},  {u : h^a(x, u) < h^a(x, τ)} ⊂ {u : u ≤ τ},

which lead to the following inequalities, which hold for all τ ∈ (0, 1) and for all x and z:

Pr_a[Y ≤ h^a(X, τ)|X = x, Z = z] ≥ F^a_{U|XZ}(τ|x, z),
Pr_a[Y < h^a(X, τ)|X = x, Z = z] < F^a_{U|XZ}(τ|x, z).

Let F^a_{X|Z} be the distribution function of X given Z associated with F^a_{YX|Z}. Using this distribution to take expectations over X given Z = z on the left hand sides of these inequalities delivers the left hand sides of the inequalities (4). Taking expectations similarly on the right hand sides yields the distribution function of U given Z = z, F^a_{U|Z}(τ|z), which is equal to τ for all z ∈ Ω and τ ∈ (0, 1) under the conditions of model D. Q.E.D.

⁴Theorem 1, part (b) of Chernozhukov and Hansen (2001) states these inequalities under similar assumptions. That paper and Chernozhukov and Hansen (2005) study the case in which the outcome is continuous. The potential for set identification which the inequalities afford when the outcome is discrete is not discussed.
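A Monte Carlo sketch can make Theorem 1 concrete. In the fragment below (the triangular Gaussian data generating process and the particular threshold functions are illustrative assumptions; the model D itself imposes neither), both inequalities in (4) hold at every (τ, z) checked:

```python
import numpy as np
from scipy.special import ndtr

# Monte Carlo check of the Theorem 1 inequalities (4) in a three-outcome
# ordered design with endogenous X (all DGP choices are hypothetical).
rng = np.random.default_rng(1)
n, rho = 500_000, 0.5
z = rng.choice([-1.0, 1.0], size=n)
eps = rng.standard_normal(n)
x = z + rho * eps + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
u = ndtr(eps)                                        # U ~ Unif(0,1), U _|_ Z

p1 = lambda x_: ndtr(-0.5 - 0.5 * x_)                # thresholds p_1(x), p_2(x)
p2 = lambda x_: ndtr(0.5 - 0.5 * x_)
y = 1 + (u > p1(x)).astype(int) + (u > p2(x)).astype(int)     # Y in {1,2,3}
h_of = lambda tau, x_: 1 + (tau > p1(x_)).astype(int) + (tau > p2(x_)).astype(int)

for tau in (0.25, 0.5, 0.75):
    for zv in (-1.0, 1.0):
        sel = z == zv
        hx = h_of(tau, x[sel])
        print(tau, zv,
              np.mean(y[sel] <= hx) >= tau,          # first inequality in (4)
              np.mean(y[sel] < hx) < tau)            # second inequality in (4)
```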
2.2. Identification

Consider the model D, a structure S^a = {h^a, F^a_{UX|Z}} admitted by it, and the set S̃_a of all structures admitted by D and observationally equivalent to S^a. Let H̃_a be the set of structural functions which are components of structures contained in S̃_a. Let F^a_{YX|Z} be the joint distribution function of (Y, X) given Z delivered by the observationally equivalent structures in the set S̃_a.

The model D set identifies the structural function that generates F^a_{YX|Z}: it must be one of the structural functions in the set H̃_a. The inequalities (4) constrain this set as follows: all structural functions in the identified set H̃_a satisfy the inequalities (4) when they are calculated using the probability distribution F^a_{YX|Z}; conversely, no admissible function that violates one or other of the inequalities at any value of z or τ can lie in the identified set.

Thus the inequalities (4), in general, define an outer region within which H̃_a lies. This is the subject of Theorem 2. When the outcome Y is binary, the inequalities do define the identified set; that is, all and only functions that satisfy the inequalities (4) lie in the identified set H̃_a. This is the subject of Theorem 3. There is a discussion in Section 2.3.3 of sharp identification in the case when Y has more than two points of support.

THEOREM 2: Let S^a be a structure admitted by the model D and delivering the distribution function F^a_{YX|Z}. Let S* ≡ {h*, F*_{UX|Z}} be any observationally equivalent structure admitted by the model D. Let Pr_a indicate probabilities calculated using the distribution function F^a_{YX|Z}. Then, for all z ∈ Ω and τ ∈ (0, 1), the following inequalities are satisfied:

(5)  Pr_a[Y ≤ h*(X, τ)|Z = z] ≥ τ,  Pr_a[Y < h*(X, τ)|Z = z] < τ.

PROOF: Let Pr* indicate probabilities calculated using F*_{YX|Z}. Because the structure S* is admitted by model D, Theorem 1 implies that, for all z ∈ Ω and τ ∈ (0, 1),

Pr*[Y ≤ h*(X, τ)|Z = z] ≥ τ,  Pr*[Y < h*(X, τ)|Z = z] < τ.

Since S^a and S* are observationally equivalent, F*_{YX|Z} = F^a_{YX|Z}, and the inequalities (5) follow on substituting Pr_a for Pr*. Q.E.D.

There is the following corollary, the proof of which is elementary and is omitted.

COROLLARY: If the inequalities (5) are violated for any (z, τ) ∈ Ω × (0, 1), then h* ∉ H̃_a.
The consequence of these results is that, for any probability measure F^a_{YX|Z} generated by an admissible structure, the set of functions that satisfy the inequalities (5) contains all members of the set of structural functions H̃_a identified by the model D. When the outcome Y is binary, the sets are identical, a sharpness result which follows from the next theorem.

THEOREM 3: If Y is binary and h*(x, u) satisfies the restrictions of the model D and the inequalities (5), then there exists a proper distribution function F*_{UX|Z} such that S* = {h*, F*_{UX|Z}} satisfies the restrictions of model D and is observationally equivalent to structures S^a that generate the distribution F^a_{YX|Z}.

A proof of Theorem 3 is given in the Appendix. The proof is constructive. For a given distribution F^a_{YX|Z}, each value of z ∈ Ω, and each structural function h* satisfying the inequalities (5), a proper distribution function F*_{UX|Z} is constructed which respects the independence condition of Restriction D2 and has the property that, at the chosen value of z, the pair {h*, F*_{UX|Z}} delivers the distribution function F^a_{YX|Z} at that value of z.

2.3. Discussion

2.3.1. Intersection Bounds

Let Ĩ_a(z) be the set of structural functions that satisfy the inequalities (5) for all τ ∈ (0, 1) at a value z ∈ Ω. Let H̃_a(z) denote the set of structural functions identified by the model at z ∈ Ω; that is, H̃_a(z) contains the structural functions which lie in those structures admitted by the model that deliver the distribution F^a_{YX|Z} for Z = z. When Y is binary, Ĩ_a(z) = H̃_a(z); otherwise, Ĩ_a(z) ⊇ H̃_a(z). The identified set of structural functions H̃_a, defined by the model given a distribution F^a_{YX|Z}, is the intersection of the sets H̃_a(z) for z ∈ Ω, and because Ĩ_a(z) ⊇ H̃_a(z) for each z ∈ Ω, the identified set is a subset of the set defined by the intersection of the inequalities (5); thus

(6)  H̃_a ⊆ Ĩ_a = { h* : for all τ ∈ (0, 1), min_{z∈Ω} Pr_a[Y ≤ h*(X, τ)|Z = z] ≥ τ and max_{z∈Ω} Pr_a[Y < h*(X, τ)|Z = z] < τ },

with H̃_a = Ĩ_a when the outcome is binary. The set Ĩ_a can be estimated by calculating (6) using an estimate of the distribution F_{YX|Z}. Chernozhukov, Lee, and Rosen (2009) gave results on inference in the presence of intersection bounds. There is an illustration in Chesher (2009).
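A grid-search sketch of the intersection set (6), specialized to the binary outcome, binary X example developed in Section 3.1 below. The conditional probabilities used as inputs are hypothetical numbers chosen to be consistent with a common structure (in practice they would be estimates):

```python
import numpy as np

# Sketch of (6) for binary Y and X in {x1, x2}: a candidate (theta1, theta2)
# is kept iff both inequalities hold for all tau at every instrumental value.
Z = (-1.0, 1.0)
beta = {-1.0: (0.7, 0.3), 1.0: (0.2, 0.8)}   # (Pr[X=x1|z], Pr[X=x2|z])
pi   = {-1.0: (0.4, 0.0), 1.0: (0.2, 0.4)}   # (Pr[Y=0, X=xj|z])

taus = np.linspace(0.001, 0.999, 999)

def in_I(theta1, theta2):
    for z in Z:
        b, p = beta[z], pi[z]
        hi = np.where(taus > theta1, b[0], p[0]) + np.where(taus > theta2, b[1], p[1])
        lo = np.where(taus > theta1, p[0], 0.0) + np.where(taus > theta2, p[1], 0.0)
        if not (np.all(hi >= taus) and np.all(lo < taus)):
            return False
    return True

grid = np.linspace(0.01, 0.99, 99)
members = [(t1, t2) for t1 in grid for t2 in grid if in_I(t1, t2)]
print(len(members), members[0])   # a nonempty intersection set
```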
2.3.2. Strength and Support of Instruments

It is clear from (6) that the support of the instrumental variables, Ω, is critical in determining the extent of an identified set. The strength of the instruments is also critical. When instrumental variables are good predictors of some particular value of the endogenous variables, say x*, the identified sets for the values of the threshold crossing functions at X = x* will tend to be small in extent. In the extreme case of perfect prediction, there can be point identification. For example, suppose X is discrete with K points of support, x₁, ..., x_K, and suppose that for some value z* of Z, Pr[X = x_{k*}|Z = z*] = 1. Then the values of all the threshold functions at X = x_{k*} are point identified and, for m ∈ {1, ..., M},⁵

(7)  p_m(x_{k*}) = Pr[Y ≤ y_m|Z = z*].

⁵This is so because

Pr[Y ≤ y_m|Z = z*] = Σ_{k=1}^{K} Pr[U ≤ p_m(x_k)|x_k, z*] Pr[X = x_k|z*] = Pr[U ≤ p_m(x_{k*})|x_{k*}, z*],

the second equality following because of perfect prediction at z*. Because of the independence restriction and the uniform marginal distribution normalization embodied in Restriction D2, for any value u,

u = Pr[U ≤ u|z*] = Σ_{k=1}^{K} Pr[U ≤ u|x_k, z*] Pr[X = x_k|z*] = Pr[U ≤ u|x_{k*}, z*],

which delivers the result (7) on substituting u = p_m(x_{k*}).
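The perfect-prediction result (7) is easy to confirm by simulation. A sketch under assumed primitives (the data generating process and thresholds below are hypothetical):

```python
import numpy as np
from scipy.special import ndtr

# At z* = 1 the endogenous X equals x* = 1 with probability 1, so the
# thresholds at x* are point identified via (7).  DGP choices are illustrative.
rng = np.random.default_rng(2)
n = 400_000
z = rng.choice([0.0, 1.0], size=n)
eps = rng.standard_normal(n)
u = ndtr(eps)                                       # U ~ Unif(0,1), U _|_ Z
x = np.where(z == 1.0, 1.0, (eps + rng.standard_normal(n) > 0).astype(float))
p1 = lambda x_: 0.2 + 0.3 * x_                      # hypothetical p_1, p_2
p2 = lambda x_: 0.6 + 0.2 * x_
y = 1 + (u > p1(x)).astype(int) + (u > p2(x)).astype(int)

sel = z == 1.0
print(np.mean(y[sel] <= 1), p1(1.0))                # both ~ 0.5
print(np.mean(y[sel] <= 2), p2(1.0))                # both ~ 0.8
```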
2.3.3. Sharpness

The inequalities of Theorem 1 define the identified set when the outcome is binary. When Y has more than two points of support, there may exist admissible functions that satisfy the inequalities but do not lie in the identified set. This happens when, for a function that satisfies the inequalities, say h*, it is not possible to find an admissible distribution function F*_{UX|Z} which, when paired with h*, delivers the "observed" distribution function F^a_{YX|Z}. In the three or more outcome model, without further restrictions on the structural function, the identified set is characterized by inequalities that are additional to those provided in Theorem 1. Section 3.1.3 gives an example based on a three outcome model. Chesher and Smolinski (2009) gave inequalities defining the identified set when X is binary. When X is continuous, it is not feasible to compute the identified set without additional restrictions because in that case F_{UX|Z} is infinite dimensional.
A similar situation arises in the oligopoly entry game studied in Ciliberto and Tamer (2009). Some progress is possible when X is discrete, but if there are many points of support for Y and X, then computations are infeasible without further restriction. Chesher and Smolinski (2009) gave some results using parametric restrictions.

2.3.4. Discreteness of Outcomes

The degree of discreteness in the distribution of Y affects the extent of the identified set. The difference between the two probabilities in the inequalities (4), which delimit the identified set, is the conditional probability of the event that a (Y, X) realization lies on the structural function. This is an event of measure zero when Y is continuously distributed. As the support of Y grows more dense and the distribution of Y comes to be continuous, the maximal probability mass (conditional on X and Z) on any point of support of Y passes to zero, and the upper and lower bounds come to coincide. However, even when the bounds coincide, there can remain more than one observationally equivalent structural function admitted by the model. In the absence of parametric restrictions, this is always the case when the support of Z is less rich than the support of X. The continuous outcome case was studied in Chernozhukov and Hansen (2005) and Chernozhukov, Imbens, and Newey (2007), where completeness conditions are provided under which there is point identification of a structural function.

2.3.5. Local Independence

It is possible to proceed under weaker independence restrictions, for example, Pr[U ≤ τ|Z = z] = τ for τ ∈ τ_L, some restricted set of values of τ, and z ∈ Ω. It is straightforward to show that, with this amendment to the model, Theorems 1 and 2 hold for τ ∈ τ_L, from which results on set identification of h(·, τ) for τ ∈ τ_L can be developed.

3. ILLUSTRATIONS AND ELUCIDATION

This section illustrates the results of the paper with two examples. The first example has a binary outcome and a discrete endogenous variable which, for simplicity in this illustration, is specified as binary. It is shown how the probability inequalities of Theorem 2 deliver inequalities on the values taken by the threshold crossing function which determines the binary outcome. In this case, it is easy to develop admissible distributions for unobservables which, taken with each member of the identified set, deliver the probabilities used to construct the set.

The second example employs a restrictive parametric ordered-probit-type model such as might be used to analyze interval censored data or data on ordered choices. This example demonstrates that parametric restrictions alone
are not sufficient to deliver point identification. By varying the number of "choices," the impact of the degree of discreteness of an outcome on set identification is clearly revealed. In both examples one can clearly see the effect of instrument strength on the extent of identified sets.

3.1. Binary Outcomes and Binary Endogenous Variables

In the first example, there is a threshold crossing model for a binary outcome Y with a binary explanatory variable X, which may be endogenous. An unobserved scalar random variable U is continuously distributed, normalized Uniform on (0, 1), and restricted to be distributed independently of the instrumental variables Z. The model is

Y = h(X, U) ≡ { 0, 0 ≤ U ≤ p(X);  1, p(X) < U ≤ 1 },

with U ⊥⊥ Z for Z ∈ Ω and U ∼ Unif(0, 1).
The distribution of X is restricted to have support independent of U and Z, with two distinct points of support: {x₁, x₂}. The values taken by p(X) are denoted by θ₁ ≡ p(x₁) and θ₂ ≡ p(x₂). The identifiability of these structural features is of interest. A shorthand notation for the conditional probabilities about which data are informative is

π₁(z) ≡ Pr[Y = 0 ∩ X = x₁|z],  π₂(z) ≡ Pr[Y = 0 ∩ X = x₂|z],
β₁(z) ≡ Pr[X = x₁|z],  β₂(z) ≡ Pr[X = x₂|z].

The set of values of Θ ≡ {θ₁, θ₂} identified by the model for a particular distribution of Y and X given Z = z ∈ Ω is now obtained by applying the results given earlier. There is a set associated with each value of z in Ω, and the identified set for variations in z over Ω is the intersection of the sets obtained at each value of z. The sharpness of the identified set is demonstrated by a constructive argument.

3.1.1. The Identified Set

First, expressions are developed for the probabilities that appear in the inequalities (4) which, in this binary outcome case, define the identified set. With these in hand, it is straightforward to characterize the identified set. The ordering of θ₁ and θ₂ is important and in general is not restricted a priori. First consider the case in which θ₁ ≤ θ₂.

Consider the event {Y < h(X, τ)}. This occurs if and only if h(X, τ) = 1 and Y = 0, and since h(X, τ) = 1 if and only if p(X) < τ, there is

(8)  Pr[Y < h(X, τ)|z] = Pr[Y = 0 ∩ p(X) < τ|z].
As far as the inequality p(X) < τ is concerned, there are three possibilities: τ ≤ θ₁, θ₁ < τ ≤ θ₂, and θ₂ < τ. In the first case, p(X) < τ cannot occur and the probability (8) is zero. In the second case, p(X) < τ only if X = x₁, and the probability (8) is therefore Pr[Y = 0 ∩ X = x₁|z] = π₁(z). In the third case, p(X) < τ whatever value X takes, and the probability (8) is therefore Pr[Y = 0|z] = π₁(z) + π₂(z). The situation is

Pr[Y < h(X, τ)|z] = { 0, 0 ≤ τ ≤ θ₁;  π₁(z), θ₁ < τ ≤ θ₂;  π₁(z) + π₂(z), θ₂ < τ ≤ 1 }.

The inequality Pr[Y < h(X, τ)|Z = z] < τ restricts the identified set because, in each row above, the value of the probability must be less than any value of τ in the interval to which it relates and, in particular, must not exceed the infimum of that interval. The result is the pair of inequalities

(9)  π₁(z) ≤ θ₁,  π₁(z) + π₂(z) ≤ θ₂.
Now consider the event {Y ≤ h(X, τ)}. This occurs if and only if h(X, τ) = 1, when any value of Y is admissible, or h(X, τ) = 0 and Y = 0, so the probability of the event is

Pr[Y ≤ h(X, τ)|z] = Pr[Y = 0 ∩ τ ≤ p(X)|z] + Pr[p(X) < τ|z].

Again there are three possibilities to consider: τ ≤ θ₁, θ₁ < τ ≤ θ₂, and θ₂ < τ. In the first case, τ ≤ p(X) occurs whatever the value of X and Pr[Y ≤ h(X, τ)|z] = π₁(z) + π₂(z). In the second case, τ ≤ p(X) when X = x₂ and p(X) < τ when X = x₁, so Pr[Y ≤ h(X, τ)|z] = β₁(z) + π₂(z). In the third case, p(X) < τ whatever the value taken by X, so Pr[Y ≤ h(X, τ)|z] = 1. The situation is

Pr[Y ≤ h(X, τ)|z] = { π₁(z) + π₂(z), 0 ≤ τ ≤ θ₁;  β₁(z) + π₂(z), θ₁ < τ ≤ θ₂;  1, θ₂ < τ ≤ 1 }.
The inequality Pr[Y ≤ h(X, τ)|Z = z] ≥ τ restricts the identified set because, in each row above, the value of the probability must at least equal all values of τ in the interval to which it relates and, in particular, must at least equal the supremum of that interval. The result is the pair of inequalities

(10)  θ₁ ≤ π₁(z) + π₂(z),  θ₂ ≤ β₁(z) + π₂(z).

Bringing (9) and (10) together gives, for the case in which Z = z, the part of the identified set in which θ₁ ≤ θ₂, which is defined by the inequalities

(11)  π₁(z) ≤ θ₁ ≤ π₁(z) + π₂(z) ≤ θ₂ ≤ β₁(z) + π₂(z).

The part of the identified set in which θ₂ ≤ θ₁ is obtained directly by exchange of indexes,

(12)  π₂(z) ≤ θ₂ ≤ π₁(z) + π₂(z) ≤ θ₁ ≤ π₁(z) + β₂(z),

and the identified set for the case in which Z = z is the union of the sets defined by the inequalities (11) and (12). The resulting set consists of two rectangles in the unit square, one above and one below the 45° line, oriented with edges parallel to the axes. The two rectangles intersect at the point θ₁ = θ₂ = π₁(z) + π₂(z). There is one such set for each value of z in Ω, and the identified set for Θ ≡ (θ₁, θ₂) delivered by the model is the intersection of these sets. The result is not, in general, a connected set: it comprises two disjoint rectangles in the unit square, one strictly above and the other strictly below the 45° line. However, with a strong instrument and rich support, one of these rectangles will not be present.

3.1.2. Sharpness

The set just derived is precisely the identified set; that is, for every value Θ in the set, a distribution for U given X and Z can be found which is proper, satisfies the independence restriction U ⊥⊥ Z, and delivers the distribution of Y given X and Z used to define the set. The existence of such a distribution is now demonstrated.

Consider some value z and a value Θ* ≡ {θ₁*, θ₂*} with, say, θ₁* ≤ θ₂*, which satisfies the inequalities (11), and consider a distribution function for U given X and Z, F*_{U|XZ}. The proposed distribution is piecewise uniform, but other choices could be made. Define values of the proposed distribution function as

(13)  F*_{U|XZ}(θ₁*|x₁, z) ≡ π₁(z)/β₁(z),
      F*_{U|XZ}(θ₁*|x₂, z) ≡ (θ₁* − π₁(z))/β₂(z),
      F*_{U|XZ}(θ₂*|x₁, z) ≡ (θ₂* − π₂(z))/β₁(z),
      F*_{U|XZ}(θ₂*|x₂, z) ≡ π₂(z)/β₂(z).
The choice of values for F*_{U|XZ}(θ1*|x1, z) and F*_{U|XZ}(θ2*|x2, z) ensures that this structure is observationally equivalent to the structure which generated the conditional probabilities that define the identified set.6 The proposed distribution respects the independence restriction because the implied probabilities marginal with respect to X do not depend on z, as follows:
\[
\begin{aligned}
\Pr[U \le \theta_1^*|z] &= \beta_1(z)F^*_{U|XZ}(\theta_1^*|x_1, z) + \beta_2(z)F^*_{U|XZ}(\theta_1^*|x_2, z) = \theta_1^*, \\
\Pr[U \le \theta_2^*|z] &= \beta_1(z)F^*_{U|XZ}(\theta_2^*|x_1, z) + \beta_2(z)F^*_{U|XZ}(\theta_2^*|x_2, z) = \theta_2^*.
\end{aligned}
\]
It just remains to determine whether the proposed distribution of U given X and Z = z is proper, that is, has probabilities that lie in the unit interval and respect monotonicity. Both F*_{U|XZ}(θ1*|x1, z) and F*_{U|XZ}(θ2*|x2, z) lie in [0, 1] by definition. The other two elements lie in the unit interval if and only if
\[
\pi_1(z) \le \theta_1^* \le \pi_1(z) + \beta_2(z), \qquad \pi_2(z) \le \theta_2^* \le \beta_1(z) + \pi_2(z),
\]
which both hold when θ1* and θ2* satisfy the inequalities (11). The case under consideration has θ1* ≤ θ2*, so if the distribution function of U given X and Z = z is to be monotonic, it must be that the following inequalities hold:
\[
F^*_{U|XZ}(\theta_1^*|x_1, z) \le F^*_{U|XZ}(\theta_2^*|x_1, z), \qquad
F^*_{U|XZ}(\theta_1^*|x_2, z) \le F^*_{U|XZ}(\theta_2^*|x_2, z).
\]
Manipulating the expressions in (13) yields the result that these inequalities are satisfied if θ1* ≤ π1(z) + π2(z) ≤ θ2*, which is assured when θ1* and θ2* satisfy the inequalities (11). There is a similar argument for the case θ2* ≤ θ1*.

The argument above applies at each value z ∈ Ω, so it can be concluded that, for each value Θ* in the set formed by intersecting sets obtained at each z ∈ Ω, there exists a proper distribution function F*_{U|XZ} with U independent of Z which, combined with that value, delivers the probabilities used to define the sets.

6. This is because for j ∈ {1, 2}, αj(z) ≡ Pr[Y = 0|xj, z] = Pr[U ≤ θj|X = xj, Z = z].

3.1.3. Numerical Example

The identified sets are illustrated using probability distributions generated by a structure in which binary Y ≡ 1[Y* > 0] and X ≡ 1[X* > 0] are generated
by a triangular linear equation system which delivers values of latent variables Y* and X* as
\[
Y^* = a_0 + a_1 X + \varepsilon, \qquad X^* = b_0 + b_1 Z + \eta.
\]
Latent variates ε and η are jointly normally distributed conditional on (and marginal with respect to) an instrumental variable Z:
\[
\begin{pmatrix} \varepsilon \\ \eta \end{pmatrix} \Big|\, Z \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix} \right).
\]
Let Φ denote the standard normal distribution function. The structural equation for binary Y is
\[
Y = \begin{cases} 0, & 0 < U \le p(X), \\ 1, & p(X) < U \le 1, \end{cases}
\]
with U ≡ Φ(ε) ∼ Unif(0, 1), U ⊥⊥ Z, and p(X) = Φ(−a0 − a1X) with X ∈ {0, 1}. Figure 1 shows identified sets when the parameter values that generate the probabilities are a0 = 0, a1 = 0.5, b0 = 0, b1 = 1, r = −0.25, for which
\[
p(0) = \Phi(-a_0) = 0.5, \qquad p(1) = \Phi(-a_0 - a_1) = 0.308,
\]
and z takes values in Ω ≡ {0, −0.75, 0.75}.

Part (a) in Figure 1 shows the identified set when z = 0. It comprises two rectangular regions that touch at the point p(0) = p(1), but otherwise are not connected. In the upper rectangle, p(1) ≥ p(0), and in the lower rectangle, p(1) ≤ p(0). The dashed lines intersect at the location of p(0) and p(1) in the structure that generates the probability distributions used to calculate the identified sets. In that structure, p(0) = 0.5 > p(1) = 0.308, but there are observationally equivalent structures that lie in the rectangle above the 45° line in which p(1) > p(0).

Part (b) in Figure 1 shows the identified set when z = 0.75. At this instrumental value, the range of values of p(1) in the identified set is smaller than when z = 0, but the range of values of p(0) is larger. Part (c) shows the identified set when z = −0.75. At this instrumental value, the range of values of p(1) in the identified set is larger than when z = 0 and the range of values of p(0) is smaller.

Part (d) shows the identified set (the solid filled rectangle) when all three instrumental values are available. The identified set is the intersection of the sets drawn in Figure 1 parts (a)–(c). The strength and support of the instrument in this case is sufficient to eliminate the possibility that p(1) > p(0). If the instrument were stronger (b1 > 1), the solid filled rectangle would be smaller and, as b1 increased without limit, it would contract to a point. For the structure used to construct this example, the model achieves "point identification at infinity," because the mechanism that generates X is such that as Z passes to ±∞, the value of X becomes perfectly predictable.

FIGURE 1.—Identified sets with a binary outcome and binary endogenous variable as instrumental values, z, vary. Strong instrument (b1 = 1). Dotted lines intersect at the values of p(0) and p(1) in the distribution generating structure. Parts (a)–(c) show identified sets at each of three values of the instrument. Part (d) shows the intersection (solid area) of these identified sets. The instrument is strong enough and has sufficient support to rule out the possibility p(1) > p(0).
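The arithmetic of this example is easily reproduced. The R fragment below is an illustration written for this discussion, not the code behind Figure 1: it computes π1(z), π2(z), β1(z), and β2(z) from the triangular normal structure (using pmvnorm from the mvtnorm package for the bivariate normal probabilities) and intersects the rectangles (11) and (12) across z ∈ Ω.

    library(mvtnorm)                       # pmvnorm: bivariate normal probabilities

    a0 <- 0; a1 <- 0.5; b0 <- 0; b1 <- 1; r <- -0.25
    Sig <- matrix(c(1, r, r, 1), 2, 2)     # correlation matrix of (eps, eta)

    probs <- function(z) {
      # Y = 0 iff eps <= -a0 - a1*X; X = x1 = 0 iff eta <= -b0 - b1*z.
      pi1 <- as.numeric(pmvnorm(upper = c(-a0, -b0 - b1 * z), corr = Sig))
      pi2 <- pnorm(-a0 - a1) -
             as.numeric(pmvnorm(upper = c(-a0 - a1, -b0 - b1 * z), corr = Sig))
      beta1 <- pnorm(-b0 - b1 * z)
      c(pi1 = pi1, pi2 = pi2, beta1 = beta1, beta2 = 1 - beta1)
    }
    p <- sapply(c(0, -0.75, 0.75), probs)  # one column per z in Omega

    # Rectangle (11) (theta1 <= theta2), intersected over z; the theta1 interval
    # comes out empty, reproducing the elimination of p(1) > p(0) in Figure 1(d).
    c(max(p["pi1", ]), min(p["pi1", ] + p["pi2", ]))                  # theta1 range
    c(max(p["pi1", ] + p["pi2", ]), min(p["beta1", ] + p["pi2", ]))   # theta2 range

    # Rectangle (12) (theta2 <= theta1): the surviving solid rectangle.
    c(max(p["pi1", ] + p["pi2", ]), min(p["pi1", ] + p["beta2", ]))   # theta1 range
    c(max(p["pi2", ]), min(p["pi1", ] + p["pi2", ]))                  # theta2 range

Setting b1 <- 0.3 in the same fragment reproduces the weak-instrument configuration of Figure 2, in which neither rectangle is eliminated.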
FIGURE 2.—Identified sets with a binary outcome and binary endogenous variable as instrumental values, z, vary. Weak instrument (b1 = 0.3). Dotted lines intersect at the values of p(0) and p(1) in the distribution generating structure. Parts (a)–(c) show identified sets at each of three values of the instrument. Part (d) shows the intersection (solid area) of these identified sets. The instrument is weak and there are observationally equivalent structures in which p(1) > p(0).
Figure 2 shows identified sets when the instrument is weaker, achieved by setting b1 = 0.3. In this case, even when all three values of the instrument are employed there are observationally equivalent structures in which p(1) > p(0).7

7. In Supplemental Material (Chesher (2010)), more extensive graphical displays are available.
3.2. Three-Valued Outcomes

When the outcome has more than two points of support, the inequalities of Theorem 1 define an outer region within which the set of structural functions identified by the model lies. This is demonstrated in a three outcome case:
\[
Y = h(X, U) \equiv
\begin{cases}
0, & 0 < U \le p_1(X), \\
1, & p_1(X) < U \le p_2(X), \\
2, & p_2(X) < U \le 1,
\end{cases}
\]
with U ⊥⊥ Z ∈ Ω, U ∼ Unif(0, 1), and X binary, taking values in {x1, x2} as before. The structural features whose identification is of interest are now
\[
\theta_{11} \equiv p_1(x_1), \quad \theta_{12} \equiv p_1(x_2), \quad \theta_{21} \equiv p_2(x_1), \quad \theta_{22} \equiv p_2(x_2),
\]
and the probabilities about which data are informative are
\[
\begin{aligned}
\pi_{11}(z) &\equiv \Pr[Y = 0 \cap X = x_1|z], & \pi_{12}(z) &\equiv \Pr[Y = 0 \cap X = x_2|z], \\
\pi_{21}(z) &\equiv \Pr[Y \le 1 \cap X = x_1|z], & \pi_{22}(z) &\equiv \Pr[Y \le 1 \cap X = x_2|z],
\end{aligned}
\tag{14}
\]
and β1(z) and β2(z) as before. Consider putative values of parameters which fall in the order θ11 < θ12 < θ22 < θ21. The inequalities of Theorem 1 place the following restrictions on the θ's:
\[
\pi_{11}(z) < \theta_{11} \le \pi_{11}(z) + \pi_{12}(z) < \theta_{12} \le \pi_{21}(z) + \pi_{12}(z), \tag{15a}
\]
\[
\pi_{11}(z) + \pi_{22}(z) < \theta_{22} \le \pi_{21}(z) + \pi_{22}(z) < \theta_{21} \le \pi_{21}(z) + \beta_2(z). \tag{15b}
\]
However, when determining whether it is possible to construct a proper distribution FUX|Z that exhibits independence of U and Z, and delivers the probabilities (14), it is found that the inequality θ22 − θ12 ≥ π22(z) − π12(z) is required to hold and this is not implied by the inequalities (15).8 This inequality and the inequality θ21 − θ11 ≥ π21(z) − π11(z) are required when the ordering θ11 < θ21 < θ12 < θ22 is considered. However, in the case of the ordering θ11 < θ12 < θ21 < θ22, the inequalities of Theorem 1 guarantee that both of these inequalities hold. So if there were the additional restriction that this latter ordering prevails, then the inequalities of Theorem 1 would define the identified set.9

8. Details can be found in Chesher and Smolinski (2009).

3.3. Ordered Outcomes: A Parametric Example

In the second example, Y records an ordered outcome in M classes, X is a continuous explanatory variable, and there are parametric restrictions. The model used in this illustration has Y generated as in an ordered probit model with specified threshold values c0, ..., cM and potentially endogenous X. The unobservable variable in a threshold crossing representation is distributed independently of Z, which varies across a set of instrumental values Ω. This sort of specification might arise when studying ordered choice using an ordered probit model or when employing interval censored data to estimate a linear model, in both cases allowing for the possibility of endogenous variation in the explanatory variable.

To allow a graphical display, just two parameters are unrestricted in this example. In many applications there would be other free parameters, for example, the threshold values. The parametric model considered states that for some constant parameter value α ≡ (α0, α1),
\[
Y = h(X, U; \alpha), \qquad U \perp\!\!\!\perp Z \in \Omega, \qquad U \sim \mathrm{Unif}(0, 1),
\]
where, for m ∈ {1, ..., M}, with Φ denoting the standard normal distribution function,
\[
h(X, U; \alpha) = m \quad \text{if} \quad \Phi(c_{m-1} - \alpha_0 - \alpha_1 X) < U \le \Phi(c_m - \alpha_0 - \alpha_1 X),
\]
and c0 = −∞, cM = +∞, and c1, ..., cM−1 are specified finite constants. The notation h(X, U; α) makes explicit the dependence of the structural function on the parameter α. For a conditional probability function FY|XZ, a conditional density fX|Z, and some value α, the probabilities in (4) are
\[
\Pr[Y \le h(X, \tau; \alpha)|Z = z] = \sum_{m=1}^{M} \int_{\{x : h(x, \tau; \alpha) = m\}} F_{Y|XZ}(m|x, z)\, f_{X|Z}(x|z)\, dx, \tag{16}
\]
\[
\Pr[Y < h(X, \tau; \alpha)|Z = z] = \sum_{m=2}^{M} \int_{\{x : h(x, \tau; \alpha) = m\}} F_{Y|XZ}(m - 1|x, z)\, f_{X|Z}(x|z)\, dx. \tag{17}
\]
9. There are six feasible permutations of the θ's, of which three are considered in this section; the other three are obtained by exchange of the second index.
In the numerical calculations, the conditional distribution of Y and X given Z = z is generated by a structure of the form
\[
Y^* = a_0 + a_1 X + W, \qquad X = b_0 + b_1 Z + V,
\]
\[
\begin{pmatrix} W \\ V \end{pmatrix} \Big|\, Z \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & s_{uv} \\ s_{uv} & s_{vv} \end{pmatrix} \right),
\]
\[
Y = m \quad \text{if} \quad c_{m-1} < Y^* \le c_m, \qquad m \in \{1, \ldots, M\}.
\]
Here c0 ≡ −∞, cM ≡ ∞, and c1, ..., cM−1 are the specified finite constants employed in the definition of the structure and in the parametric model whose identifying power is being considered. The probabilities in (16) and (17) are calculated for each choice of α by numerical integration.10

Illustrative calculations are done for 5 and 11 class specifications with thresholds chosen as quantiles of the standard normal distribution at equispaced probability levels. For example, in the 5 class case the thresholds are Φ^{−1}(p) for p ∈ {0.2, 0.4, 0.6, 0.8}, that is, {−0.84, −0.25, 0.25, 0.84}. The instrumental variable ranges over the interval Ω ≡ [−1, 1], the parameter values employed in the calculations are
\[
a_0 = 0, \quad a_1 = 1, \quad b_0 = 0, \quad s_{uv} = 0.6, \quad s_{vv} = 1,
\]
and the value of b1 is set to 1 or 2 to allow comparison of identified sets as the strength of the instrument, equivalently the support of the instrument, varies.

Figure 3 shows the set defined by the inequalities of Theorem 1 for the intercept and slope coefficients α0 and α1 in a 5 class model. The dark shaded set is obtained when the instrument is relatively strong (b1 = 2). This set lies within the set obtained when the instrument is relatively weak (b1 = 1). Figure 4 shows identified sets (shaded) for these weak and strong instrument scenarios when there are 11 classes rather than 5. The 5 class sets are shown in outline. The effect of reducing the discreteness of the outcome is substantial and there is a substantial reduction in the extent of the set as the instrument is strengthened.

The sets portrayed here are outer regions which contain the sets identified by the model. The identified sets are computationally challenging to produce in this continuous endogenous variable case. Chesher and Smolinski (2009) investigated feasible procedures based on discrete approximations.
10. The integrate procedure in R (R Development Core Team (2009)) was used to calculate probabilities. Intersection bounds over z ∈ ΩZ were obtained as in (6) using the R function optimise. The resulting probability inequalities were inspected over a grid of values of τ at each value of α considered, a value being classified as out of the identified set as soon as a value of τ was encountered for which there was violation of one or other of the inequalities (6). I am grateful to Konrad Smolinski for developing and programming a procedure to efficiently track out the boundaries of the sets.
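The procedure described in footnote 10 can be sketched compactly. The R fragment below is a minimal illustration written for this discussion, not the code behind Figures 3 and 4: the helper names (h_of_x, FYgXZ, fXgZ, in_outer_region) and the grid sizes are of my choosing, and it checks the inequalities on a finite grid of z values rather than computing intersection bounds with optimise. For one trial value of α it evaluates (16) and (17) by integrate and flags any violation.

    thr <- qnorm(c(0.2, 0.4, 0.6, 0.8))     # finite thresholds, 5 class case
    a0 <- 0; a1 <- 1; b0 <- 0; b1 <- 2; suv <- 0.6; svv <- 1   # strong instrument

    # h(x, tau; alpha): the class m with Phi(c_{m-1} - alpha0 - alpha1*x) < tau
    # <= Phi(c_m - alpha0 - alpha1*x); locate alpha0 + alpha1*x + qnorm(tau).
    h_of_x <- function(x, tau, alpha)
      1L + findInterval(alpha[1] + alpha[2] * x + qnorm(tau), thr)

    # F_{Y|XZ}(m|x, z) implied by the structure: W | V = v is normal with
    # mean (suv/svv)*v and variance 1 - suv^2/svv; m = 0 gives 0, m = M gives 1.
    FYgXZ <- function(m, x, z) {
      cm <- c(-Inf, thr, Inf)[m + 1]
      v  <- x - b0 - b1 * z
      pnorm((cm - a0 - a1 * x - (suv / svv) * v) / sqrt(1 - suv^2 / svv))
    }
    fXgZ <- function(x, z) dnorm(x, mean = b0 + b1 * z, sd = sqrt(svv))

    # Probabilities (16) and (17) at one (tau, z) by numerical integration.
    p_le <- function(tau, z, alpha) integrate(function(x)
      FYgXZ(h_of_x(x, tau, alpha), x, z) * fXgZ(x, z), -Inf, Inf)$value
    p_lt <- function(tau, z, alpha) integrate(function(x)
      FYgXZ(h_of_x(x, tau, alpha) - 1, x, z) * fXgZ(x, z), -Inf, Inf)$value

    # alpha is outside the outer region as soon as some (tau, z) violates
    # Pr[Y <= h(X, tau; alpha)|z] >= tau or Pr[Y < h(X, tau; alpha)|z] < tau.
    in_outer_region <- function(alpha, taus = seq(0.01, 0.99, by = 0.02),
                                zs = seq(-1, 1, length.out = 11)) {
      for (z in zs) for (tau in taus)
        if (p_le(tau, z, alpha) < tau || p_lt(tau, z, alpha) >= tau)
          return(FALSE)
      TRUE
    }
    in_outer_region(c(0, 1))   # TRUE at the generating values a0 = 0, a1 = 1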
FIGURE 3.—Outer regions within which lie identified sets for an intercept α0 and slope coefficient α1 in a 5 class ordered probit model with endogenous explanatory variable. The dashed lines intersect at the values of a0 and a1 used to generate the distributions employed in this illustration.
4. CONCLUDING REMARKS

It has been shown that, when outcomes are discrete, single equation IV models do not point identify the structural function that delivers the discrete outcome. The models have been shown to have partial identifying power and set identification results have been obtained. Identified sets tend to be smaller when instrumental variables are strong and have rich support, and when the discrete outcome has rich support. Imposing parametric restrictions reduces the extent of the identified sets, but, in general, parametric restrictions do not deliver point identification of the values of parameters.

FIGURE 4.—Outer regions within which lie identified sets for an intercept α0 and slope coefficient α1 in an 11 class ordered probit model with endogenous explanatory variable. Outer regions for the 5 class model displayed in Figure 3 are shown in outline. The dashed lines intersect at the values of a0 and a1 used to generate the distributions employed in this illustration.

To secure point identification of structural functions, more restrictive models are required. For example, specifying recursive structural equations for the outcome and endogenous explanatory variables, and restricting all latent variates and instrumental variables to be jointly independently distributed produces a triangular system model which can be point identifying.11 This is the control function approach studied in Blundell and Powell (2004), Chesher (2003), and Imbens and Newey (2009), and contrasted with the single equation IV model in the binary outcome case in Chesher (2009). The restrictions of the triangular model rule out full simultaneity (Koenker (2005, Section 8.8.2)) such as arises in the simultaneous entry game model of Tamer (2003). An advantage of the single equation IV approach set out in this paper is that it allows an equation-by-equation attack on such simultaneous equations models for discrete outcomes, avoiding the need to deal directly with the coherency and completeness issues they pose.
11. But not when endogenous variables are discrete (Chesher (2005)).
The weak restrictions imposed in the single equation IV model lead to partial identification of deep structural objects which complements the many developments in the analysis of point identification of the various average structural features studied in, for example, Heckman and Vytlacil (2005).

There are a number of interesting extensions. For example, the analysis can be extended to the multiple discrete outcome case such as arises in the study of panel data. Consider a model for T discrete outcomes, each determined by a structural equation as
\[
Y_t = h_t(X, U_t) \quad (t = 1, \ldots, T),
\]
where each function h_t is weakly increasing and caglad for variations in U_t, each U_t is a scalar random variable normalized marginally Unif(0, 1) with U ≡ {U_t}_{t=1}^T, and instrumental variables Z ∈ Ω are independently distributed. In practice, there will often be cross equation restrictions, for example, requiring each function h_t to be determined by a common set of parameters. Define h ≡ {h_t}_{t=1}^T, τ ≡ {τ_t}_{t=1}^T, and
\[
C(\tau) \equiv \Pr\Bigl[\,\bigcap_{t=1}^{T} \{U_t \le \tau_t\}\Bigr],
\]
which is a copula since the components of U have marginal uniform distributions. An argument along the lines of that used in Section 2.1 leads to the inequalities
\[
\Pr\Bigl[\,\bigcap_{t=1}^{T} \{Y_t \le h_t(X, \tau_t)\}\,\Big|\,Z = z\Bigr] \ge C(\tau),
\qquad
\Pr\Bigl[\,\bigcap_{t=1}^{T} \{Y_t < h_t(X, \tau_t)\}\,\Big|\,Z = z\Bigr] < C(\tau),
\]
which hold for all τ ∈ [0, 1]^T and z ∈ Ω. These can be used to delimit the sets of structural function and copula combinations {h, C} identified by the model.

Other extensions arise on relaxing restrictions maintained so far. For example, it is straightforward to generalize to the case in which exogenous variables appear in the structural function. In the binary outcome case, additional heterogeneity, W, independent of instruments Z, can be introduced if there is a monotone index restriction, that is, if the structural function has the form h(Xβ, U, W) with h monotonic in Xβ and in U.12 This allows extension to measurement error models in which observed X̃ = X + W. This can be further extended to the general discrete outcome case if a monotone index restriction holds for all threshold functions.

12. Chesher (2009) studies the power of monotone index restrictions in the single equation IV model for binary outcomes.

APPENDIX

PROOF OF THEOREM 3—Sharp Set Identification for Binary Outcomes: The proof proceeds by considering a structural function h(x, u) that (i) is weakly monotonic nondecreasing for variations in u, (ii) is characterized by a threshold function p(x), and (iii) satisfies the inequalities of Theorem 1 when probabilities are calculated using a conditional distribution FYX|Z. A proper conditional distribution FUX|Z is constructed such that U and Z are independent and have the property that the distribution function generated by {h, FUX|Z} is identical to the FYX|Z used to calculate the probabilities in Theorem 1.

Attention is directed to constructing a distribution for U conditional on both X and Z, FU|XZ. This is combined with FX|Z, the (identified) distribution of X conditional on Z implied by FYX|Z, to obtain the required distribution of (U, X) conditional on Z. The construction of FUX|Z is done for a representative value, z, of Z. The argument of the proof can be repeated for any z such that the inequalities of Theorem 1 are satisfied.

It is helpful to introduce some abbreviated notation. At many points, dependence on z is not made explicit in the notation. Let Ψ denote the support of X conditional on Z. Y is binary, taking values in {y1, y2}. Define conditional probabilities as
\[
\alpha_1(x) \equiv \Pr[Y = y_1|x, z], \qquad \alpha_1 \equiv \Pr[Y = y_1|z] = \int \alpha_1(x)\, dF_{X|Z}(x|z),
\]
and α2(x) ≡ 1 − α1(x), α2 ≡ 1 − α1, and note that dependence of Ψ, α1(x), α2(x), and so forth on z is not made explicit in the notation. A threshold function p(x) is proposed such that
\[
Y = \begin{cases} y_1, & 0 \le U \le p(x), \\ y_2, & p(x) < U \le 1, \end{cases}
\]
and this function satisfies some inequalities to be stated. The threshold function is a continuous function of x and does not depend on z. Define the following functions which, in general, depend on z:
\[
u_1(v) = \min(v, \alpha_1), \qquad u_2(v) = v - u_1(v).
\]
Define sets as
\[
X(s) \equiv \{x : p(x) = s\}, \qquad X[s] \equiv \{x : p(x) \le s\},
\]
and let φ denote the empty set. Define
\[
s_1(v) \equiv \min_{s}\Bigl\{ s : \int_{x \in X[s]} \alpha_1(x)\, dF_{X|Z}(x|z) = u_1(v) \Bigr\}
\]
and
\[
s_2(v) \equiv \min_{s}\Bigl\{ s : \int_{x \in X[s]} \alpha_2(x)\, dF_{X|Z}(x|z) = u_2(v) \Bigr\}
\]
and functions
\[
\beta_1(v, x) \equiv \begin{cases} \alpha_1(x), & x \in X[s_1(v)], \\ 0, & x \notin X[s_1(v)], \end{cases}
\qquad
\beta_2(v, x) \equiv \begin{cases} \alpha_2(x), & x \in X[s_2(v)], \\ 0, & x \notin X[s_2(v)]. \end{cases}
\]
For a structural function h(x, u) characterized by the threshold function p(x) and for a probability measure that delivers α1(x) and FX|Z, a distribution function FU|XZ is defined as
\[
F_{U|XZ}(u|x, z) \equiv \beta(u, x) \equiv \beta_1(u, x) + \beta_2(u, x),
\]
where z is the value of Z upon which there is conditioning at various points in the definition of β(u, x). Consider functions p that satisfy the inequalities of Theorem 1, which in this binary outcome case can be expressed as follows: for all u ∈ (0, 1),
\[
\int_{p(x)<u} \alpha_1(x)\, dF_{X|Z}(x|z) < u \le \alpha_1 + \int_{p(x)<u} \alpha_2(x)\, dF_{X|Z}(x|z). \tag{A1}
\]
Now the following statements are shown to be true:
(i) For all x and any z, the distribution function β(u, x) is proper: (a) β(0, x) = 0; (b) β(1, x) = 1; (c) for v′ > v, β(v′, x) ≥ β(v, x).
(ii) There is an independence property: for all u,
\[
\int_{x \in \Psi} \beta(u, x)\, dF_{X|Z}(x|z) = u.
\]
(iii) If p satisfies the inequalities (A1), then there is an observational equivalence property: for all x, β(p(x), x) = α1(x).

(i)(a) Proper distribution: β(0, x) = 0. By definition, u1(0) = u2(0) = 0 and so s1(0) = s2(0) = 0. Therefore, X[s1(0)] = X[s2(0)] = φ, which implies that, for all x, β1(0, x) = β2(0, x) = 0 and so β(0, x) = 0.
(i)(b) Proper distribution: β(1, x) = 1. By definition, u1(1) = α1, so s1(1) is the smallest value of s such that X[s] = Ψ, so s = max_{x∈Ψ} p(x). With X[s1(1)] = Ψ it is assured that β1(1, x) = α1(x) for all x. By definition, u2(1) = α2, so s2(1) is the smallest value of s such that X[s] = Ψ. With X[s2(1)] = Ψ it is assured that β2(1, x) = α2(x) for all x. So, for all x, β(1, x) = α1(x) + α2(x) = 1.

(i)(c) Proper distribution: nondecreasing β(u, x). Since u1(v) and u2(v) are nondecreasing functions of v, so are s1(v) and s2(v). It follows that, for v′ > v,
\[
X[s_1(v')] \supseteq X[s_1(v)], \qquad X[s_2(v')] \supseteq X[s_2(v)],
\]
and so, for all x,
\[
\beta_1(v', x) \ge \beta_1(v, x), \qquad \beta_2(v', x) \ge \beta_2(v, x),
\]
and it follows that the sum of the functions, β(v, x), is a nondecreasing function of v.

(ii) Independence. By definition,
\[
\int_{x \in \Psi} \beta_1(v, x)\, dF_{X|Z}(x|z) = \int_{x \in X[s_1(v)]} \alpha_1(x)\, dF_{X|Z}(x|z) = u_1(v),
\]
\[
\int_{x \in \Psi} \beta_2(v, x)\, dF_{X|Z}(x|z) = \int_{x \in X[s_2(v)]} \alpha_2(x)\, dF_{X|Z}(x|z) = u_2(v),
\]
and so
\[
\int_{x \in \Psi} \beta(v, x)\, dF_{X|Z}(x|z) = u_1(v) + u_2(v) = v,
\]
which does not depend on z.

(iii) Observational equivalence. This requires that, for all x, β(p(x), x) = α1(x), which is true if, for all x, (a) β1(p(x), x) = α1(x) and (b) β2(p(x), x) = 0. Each equation is considered in turn. The inequalities (A1) come into play.

(a) β1(p(x), x) = α1(x). Since for all u,
\[
\int_{p(x)<u} \alpha_1(x)\, dF_{X|Z}(x|z) < u,
\]
there exists δ(u) > 0 such that
\[
\int_{p(x)<u+\delta(u)} \alpha_1(x)\, dF_{X|Z}(x|z) = u
\]
for all u ≤ α1. It follows that s1(v) > v for all v, which implies X[v] ⊂ X[s1(v)] and, in particular, X(v) ⊂ X[s1(v)].
For some value of x, x*, define p* ≡ p(x*). Then for p* ∈ (0, 1),
\[
X(p^*) \equiv \{x : p(x) = p^*\} \subset X[p^*] \subset X[s_1(p^*)].
\]
Recall that β1(v, x) is equal to α1(x) for all x ∈ X[s1(v)]. It has been shown that for any p*, all x such that p(x) = p* lie in the set X[s1(p*)] and so β1(p*, x*) = α1(x*), and the result is β1(p(x), x) = α1(x).

(b) β2(p(x), x) = 0. Recall X(v) ≡ {x : p(x) = v}. It is required to show that X(v) ∩ X[s2(v)] is empty for all v. Since u2(v) = 0 for v ≤ α1, then s2(v) = 0 for v ≤ α1 and X[s2(v)] = φ for v ≤ α1. Therefore, for v ≤ α1,
\[
X(v) \cap X[s_2(v)] = X(v) \cap \varphi = \varphi.
\]
From (A1) there is the inequality
\[
\int_{p(x)<v} \alpha_2(x)\, dF_{X|Z}(x|z) \ge v - \alpha_1.
\]
For v > α1, the constraint implies that there exists γ(v) ≥ 0 such that
\[
\int_{p(x)<v-\gamma(v)} \alpha_2(x)\, dF_{X|Z}(x|z) = v - \alpha_1,
\]
and, since for v > α1,
\[
s_2(v) \equiv \min_{s}\Bigl\{ s : \int_{p(x)<s} \alpha_2(x)\, dF_{X|Z}(x|z) = v - \alpha_1 \Bigr\},
\]
it follows that s2(v) < v. It follows that for v > α1, X(v) ∩ X[s2(v)] = φ because, with s2(v) < v,
\[
\{x : p(x) < s_2(v)\} \cap \{x : p(x) = v\} = \varphi.
\]
Define p* = p(x*). Then for p* ∈ (0, 1), X(p*) ∩ X[s2(p*)] = φ, so β2(p*, x*) = 0 and there is the result, for all x, β2(p(x), x) = 0. Q.E.D.

REFERENCES

BERESTEANU, A., I. MOLCHANOV, AND F. MOLINARI (2008): “Sharp Identification Regions in Games,” Working Paper CWP15/08, CeMMAP, doi: 10.1920/wp.cem.2008.1508. [576]
BERRY, S., AND E. TAMER (2006): “Identification in Models of Oligopoly Entry,” in Advances in Economics and Econometrics: Theory and Applications: Ninth World Congress, Vol. 2, ed. by R. Blundell, W. K. Newey, and T. Persson. Cambridge: Cambridge University Press. [576]
BLUNDELL, R. W., AND J. L. POWELL (2004): “Endogeneity in Semiparametric Binary Response Models,” Review of Economic Studies, 71, 655–679. [595]
CAMERON, A. C., AND P. K. TRIVEDI (1998): Regression Analysis of Count Data. Econometric Society Monograph, Vol. 30. Cambridge: Cambridge University Press. [577]
CHERNOZHUKOV, V., AND C. HANSEN (2001): “An IV Model of Quantile Treatment Effects,” Working Paper 02-06, Department of Economics, MIT. Available at http://ssrn.com/abstract=298879 or doi: 10.2139/ssrn.298879. [579]
CHERNOZHUKOV, V., AND C. HANSEN (2005): “An IV Model of Quantile Treatment Effects,” Econometrica, 73, 245–261. [577,579,583]
CHERNOZHUKOV, V., G. W. IMBENS, AND W. K. NEWEY (2007): “Instrumental Variable Estimation of Nonseparable Models,” Journal of Econometrics, 139, 4–14. [583]
CHERNOZHUKOV, V., S. LEE, AND A. ROSEN (2009): “Intersection Bounds: Estimation and Inference,” Working Paper CWP19/09, CeMMAP, doi: 10.1920/wp.cem.2009.1909. [581]
CHESHER, A. D. (2003): “Identification in Nonseparable Models,” Econometrica, 71, 1405–1441. [595]
CHESHER, A. D. (2005): “Nonparametric Identification Under Discrete Variation,” Econometrica, 73, 1525–1550. [595]
CHESHER, A. D. (2009): “Single Equation Endogenous Binary Response Models,” Working Paper CWP23/09, CeMMAP, doi: 10.1920/wp.cem.2009.2309. [577,581,595,596]
CHESHER, A. D. (2010): “Supplement to ‘Instrumental Variable Models for Discrete Outcomes’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7315_extensions.pdf; http://www.econometricsociety.org/ecta/Supmat/7315_programs and data.zip. [590]
CHESHER, A. D., AND K. SMOLINSKI (2009): “IV Models of Ordered Choice,” Working Paper CWP37/09, CeMMAP. [582,583,591,593]
CILIBERTO, F., AND E. TAMER (2009): “Market Structure and Multiple Equilibria in Airline Markets,” Econometrica, 77, 1791–1828. [583]
HECKMAN, J. J., AND E. VYTLACIL (2005): “Structural Equations, Treatment Effects, and Econometric Policy Evaluation,” Econometrica, 73, 669–738. [596]
IMBENS, G. W., AND W. K. NEWEY (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–1512. [595]
KOENKER, R. W. (2005): Quantile Regression. Econometric Society Monograph, Vol. 38. Cambridge: Cambridge University Press. [595]
MULLAHY, J. (1997): “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” Review of Economics and Statistics, 79, 586–593. [577]
R DEVELOPMENT CORE TEAM (2009): R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at http://www.R-project.org. [593]
TAMER, E. (2003): “Incomplete Simultaneous Discrete Response Model With Multiple Equilibria,” Review of Economic Studies, 70, 147–165. [595]
WINDMEIJER, F. A. G., AND J. M. C. SANTOS SILVA (1997): “Endogeneity in Count Data Models: An Application to Demand for Health Care,” Journal of Applied Econometrics, 12, 281–294. [577]
Dept. of Economics, University College London, Gower Street, London WC1E 6BT, U.K. and CeMMAP; [email protected].

Manuscript received July, 2007; final revision received July, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 603–632
SOLVING THE FELDSTEIN–HORIOKA PUZZLE WITH FINANCIAL FRICTIONS

BY YAN BAI AND JING ZHANG1

Unlike the prediction of a frictionless open economy model, long-term average savings and investment rates are highly correlated across countries—a puzzle first identified by Feldstein and Horioka (1980). We quantitatively investigate the impact of two types of financial frictions on this correlation. One is limited enforcement, where contracts are enforced by the threat of default penalties. The other is limited spanning, where the only asset available is noncontingent bonds. We find that the calibrated model with both frictions produces a savings–investment correlation and a volume of capital flows close to the data. To solve the puzzle, the limited enforcement friction needs low default penalties under which capital flows are much lower than those in the data, and the limited spanning friction needs to exogenously restrict capital flows to the observed level. When combined, the two frictions interact to endogenously restrict capital flows and thereby solve the Feldstein–Horioka puzzle.

KEYWORDS: Feldstein–Horioka, savings, investment, financial frictions, limited enforcement, international capital flows.

1. We are grateful for valuable comments and continuous encouragement from Patrick Kehoe, Timothy Kehoe, Linda Tesar, Richard Rogerson, Berthold Herrendorf, and Chris House. We thank four anonymous referees and a co-editor for many useful suggestions. For helpful comments, we also thank Cristina Arellano, David Backus, Rui Castro, V. V. Chari, Narayana Kocherlakota, Ellen McGrattan, Fabrizio Perri, Vivian Yue, and seminar and conference participants at Arizona State University, the Cleveland Fed, Florida State University, the Midwest Macroeconomic meeting 2005, the Minneapolis Fed, the NBER IFM November 2005, UBC, UIUC, the University of Iowa, the University of Michigan, the University of Minnesota, the University of Montreal, and the University of Texas. All remaining errors are our own.

© 2010 The Econometric Society. DOI: 10.3982/ECTA6619

1. INTRODUCTION

THE FELDSTEIN–HORIOKA (henceforth FH) puzzle is one of the most robust empirical regularities in international finance. Feldstein and Horioka (1980) found that a cross-country regression of average domestic investment rates on average domestic savings rates results in a large, positive regression coefficient. Their finding is tightly linked to the empirical observation that net capital flows across countries are small. Feldstein and Horioka conjectured that the FH coefficient should be zero in a frictionless world economy and concluded that there must be sizeable financial frictions in international capital markets.

Our objective is to quantitatively assess the implications of different financial frictions on the FH coefficient and the volume of capital flows across countries. To achieve this, we build a model with a continuum of small open economies. Each economy is a one-sector production economy that experiences idiosyncratic shocks to its total factor productivity (TFP). We analyze two financial frictions that are commonly studied in the literature. One is limited enforcement, where contracts are enforced by the threat of default penalties: permanent
exclusion from financial markets and a loss in output. The other is limited spanning, which restricts the set of available assets to noncontingent bonds.2 We find that the interaction of these two frictions generates an FH coefficient and a volume of capital flows close to the data.

To understand the role of each friction, we first examine the frictionless model, where a full set of contingent contracts is traded and contracts are fully enforceable. This model generates substantial capital flows across countries; the average current-account-to-GDP ratio reaches 62%, much higher than the 7% observed in the data. The reason is that countries have a large incentive to borrow and lend so as to smooth consumption and allocate capital stocks efficiently, given the volatility of calibrated TFP shocks. The large volume of capital flows breaks the link between savings and investment, and leads to an FH coefficient close to zero.

We then turn to the enforcement model, in which countries trade a full set of assets but only have a limited capacity to enforce repayment. In this environment, state-contingent debt limits arise endogenously to ensure that countries never default on state-contingent liabilities. Given the volatile shock process and the benefit of trading contingent claims, the default penalties of permanent exclusion and output losses make continuation in international financial markets highly attractive. Countries therefore have little incentive to default. Consequently, the model implies large capital flows and a close-to-zero FH coefficient. To match the observed FH coefficient, we find that the default penalties have to be close to zero, that is, almost no exclusion from the markets and no loss in output. With this low default penalty, default incentives are high, and the volume of capital flows is much lower than that in the data. We conclude that limited enforcement alone cannot jointly reproduce the FH coefficient and the capital flows in the data.

We next consider the bond model, in which the spanning of assets is limited to a noncontingent bond. We follow the literature in imposing the natural debt limits to ensure that countries are able to repay without incurring negative consumption.3 The natural debt limits are quite loose and rarely bind in equilibrium. As a result, this model generates large capital flows and a counterfactually small FH coefficient. Clearly, as one tightens the debt limits exogenously, the implied FH coefficient increases. In particular, the bond model generates the observed FH coefficient when we set the exogenous debt limits tight enough to produce capital flows close to the data.

Our work shows that limited spanning and limited enforcement combine to endogenously reduce the volume of capital flows to a level consistent with the data. When countries trade noncontingent bonds with an option to default, endogenous debt limits are highly restrictive for two reasons. First, these

2. Limited enforcement has been studied by Kehoe and Levine (1993), Kocherlakota (1996), and Kehoe and Perri (2002), among others. Limited spanning has been studied by Mendoza (1991), Aiyagari (1994), and Baxter and Crucini (1995), among others.

3. See Aiyagari (1994) and Zhang (1997).
debt limits have to ensure that countries prefer to repay, even under the worst realization of the TFP shock. Second, the benefits of staying in the markets are considerably lower when the only available asset is a noncontingent bond. Countries thus have a greater incentive to default, implying that the debt limits are tighter. These tight debt limits lead to a volume of capital flows of 10% and an FH coefficient close to that found in the data. They also help produce a degree of international risk sharing, cross-country dispersions of savings and investment rates, and time-series volatilities of output, consumption, and net exports close to those found in the data.

It is well known that savings and investment data are highly correlated not only across countries, but also in the time series of a given country.4 Our work demonstrates that the cross-country correlation, not the time-series correlation, helps evaluate the significance of financial frictions. All of our models produce a positive time-series correlation independently of financial frictions because both savings and investment respond positively to persistent TFP shocks. This is consistent with the findings of Baxter and Crucini (1993) and Mendoza (1991). The cross-country correlation, however, does depend on financial frictions, which affect the ability of countries to borrow and the degree of divergence between the average savings and investment rates.

Our work builds on Castro (2005), who demonstrated that the bond model can explain the FH finding when exogenous debt limits are calibrated to match the observed capital flows. While this is an important contribution, Castro’s analysis leaves the source of the debt limits unexplained. The contribution of our work is to identify one potential source for these required debt limits: the interaction of the limited spanning and limited enforcement frictions. Understanding the source of debt limits is important if one is interested in how savings, investment, and capital flows respond to changes in default penalties or contracting technologies.

Our work is closely related to Kehoe and Perri (2002). They found that limited enforcement severely restricts capital flows when default penalties consist of permanent exclusion from financial markets but no drop in output. In contrast, we find under the same default penalties that limited enforcement barely restricts capital flows. The difference comes from two sources. First, our shock process is more volatile than theirs. We calibrate to TFPs of both developed and developing countries, while they calibrate to those of developed countries only. Second, our multicountry model offers more insurance opportunities than their two-country model. Thus, in our model, there is both a greater need and a greater opportunity to insure, which leads to larger capital flows.

Our two-friction model builds on Zhang (1997), who studied endogenous debt limits in a pure exchange economy. In his setup, the debt limits depend only on exogenous endowment shocks and are independent of agents’ choices.

4. Tesar (1991) documented that savings and investment are highly correlated over time.
In contrast, in our production economy, the debt limits depend on both exogenous shocks and endogenous capital stocks. Thus, countries affect the debt limits they face through their choices of capital stocks.5

Relative to the vast empirical literature, there are few theoretical studies on the FH finding. Westphal (1983) argued that the FH finding is due to official capital controls. This finding, however, has persisted even after the widespread dismantling of capital controls. Obstfeld (1986) argued that population growth might generate a savings–investment comovement in a life-cycle model. Summers (1988), however, showed that the FH finding persists even after controlling for population growth. Barro, Mankiw, and Sala-I-Martin (1995) showed in a deterministic model that the savings and investment rates are perfectly correlated under full capital mobility after countries reach steady states. We instead show in a stochastic model that these two rates are uncorrelated under full capital mobility.

The rest of the paper is organized as follows. Section 2 confirms the FH finding with updated data. Section 3 shows that the FH puzzle can be resolved by combining limited spanning and limited enforcement. In Section 4, we study each friction in isolation. We conclude in Section 5. The Supplemental Material (Bai and Zhang (2010)) comprises two technical appendixes.

2. THE PUZZLE CONFIRMED

Feldstein and Horioka (1980) found a positive correlation between long-term savings and investment rates across countries. This finding is interpreted as a puzzle relative to a world with a frictionless financial market, which is an assumption behind most of the models in international economics. In this section, we reexamine the Feldstein–Horioka finding with updated data and show that the Feldstein–Horioka puzzle still exists today.

In their seminal paper, Feldstein and Horioka (1980) measured the long-run cross-country relationship between savings and investment rates by estimating
\[
(I/Y)_i = \gamma_0 + \gamma_1 (S/Y)_i + \varepsilon_i, \tag{1}
\]
where Y is gross domestic product (GDP), S is gross domestic savings (GDP minus private and government consumption), I is gross domestic investment, and (S/Y)_i and (I/Y)_i are period averages of savings rates and investment rates for each country i. All the variables are in nominal terms. Feldstein and Horioka took the long-period averages of these rates to handle the cyclical endogeneity of savings and investment rates. The constant term γ0 captures the impact of the common shocks that affect all the countries on the world average savings and investment rates.6 The coefficient γ1 tells us whether high-saving countries are also high-investing countries on average.

Obviously, the regression coefficient γ1 should be one in a world with closed economies because domestic investment must be fully financed by domestic savings. Feldstein and Horioka argued that γ1 should be zero in a world without financial frictions. Based on a sample of 16 Organization for Economic Cooperation and Development (OECD) countries7 over the 15-year period from 1960 to 1974, they found that γ1 is 0.89 with a standard error of 0.07. They interpreted this finding as evidence of a high degree of financial frictions.

The Feldstein–Horioka finding stimulated a large empirical literature that attempted to refute the puzzle by studying different data samples and periods, by adding other variables to the original ordinary least squares regression, or by using different estimation methods. Across empirical studies, however, the FH coefficient has remained large and significant, although it has tended to decline in recent years (see Coakley, Kulasi, and Smith (1998) for a detailed review).

We confirm the Feldstein–Horioka finding using a data set with 64 countries for the period 1960–2003.8 We find that the FH coefficient is 0.52 with a standard error of 0.06. Although lower than the original estimate, it is still positive and significantly different from zero. These results are robust to different subgroups of countries and subperiods (see Table I).9 Thus, the positive long-run correlation between savings and investment rates remains a pervasive regularity in the data.

5. Abraham and Carceles-Poveda (2010) studied a similar model but with aggregate production. The endogenous borrowing constraints depend on shocks and aggregate capital stock, and thus are independent of agents’ choices.
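In code, regression (1) is a one-line computation. The R fragment below is an illustration, not the authors' programs; it assumes a hypothetical data frame fh with one row per country and columns S_Y and I_Y containing the period-average savings and investment rates, and it also verifies the decomposition of γ1 reported in equation (2) below.

    fit <- lm(I_Y ~ S_Y, data = fh)                        # FH regression (1)
    coef(summary(fit))["S_Y", c("Estimate", "Std. Error")]

    # Decomposition (2): the OLS slope equals cor(S/Y, I/Y) * std(I/Y) / std(S/Y).
    with(fh, cor(S_Y, I_Y) * sd(I_Y) / sd(S_Y))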
TABLE I
CROSS-COUNTRY REGRESSION COEFFICIENTS

                                           FH Coefficient (s.e.)a
Group of Countries                  1960–2003      1960–1974      1974–2003
Full sample (64 countries)          0.52 (0.06)    0.60 (0.07)    0.46 (0.05)
Subsample (16 OECD countries)       0.67 (0.11)    0.61b (0.11)   0.56 (0.13)

a The term s.e. refers to the standard error.
b The new data source produces an FH coefficient different from Feldstein and Horioka’s original estimate for the same sample. See Appendix A.3 for details.
6. For more discussion, see Frankel (1992).
7. These countries are Australia, Austria, Belgium, Canada, Denmark, Finland, Germany, Greece, Ireland, Italy, Japan, the Netherlands, New Zealand, Sweden, the United Kingdom, and the United States.
8. For a detailed description of data, see Appendix A.
9. To compare with the Feldstein–Horioka result, we take two subperiods (1960–1974 and 1974–2003) and two subgroups of countries (16 OECD countries and the rest of the countries).
TABLE II
CROSS-COUNTRY SAVINGS, INVESTMENT, AND CAPITAL FLOWSa

     Mean         Standard Deviation              Correlation                    Capital Flows
S/Y    I/Y        S/Y    I/Y        (S/Y, I/Y)  (S/Y, gy)  (I/Y, gy)    CA/Y (std)    TA/Y (std)
0.21   0.22       0.07   0.04       0.77        0.31       0.47         0.07 (0.04)   0.49 (0.29)

a gy denotes average growth of real GDP per worker, CA/Y denotes the average absolute current-account-to-GDP ratio, and TA/Y denotes the average absolute foreign-asset-position-to-GDP ratio. The term std refers to the standard deviation across countries.
To further understand the FH finding, we decompose the FH coefficient γ1 as
\[
\gamma_1 = \mathrm{cor}\bigl((S/Y)_i, (I/Y)_i\bigr)\,\frac{\mathrm{std}\bigl((I/Y)_i\bigr)}{\mathrm{std}\bigl((S/Y)_i\bigr)}, \tag{2}
\]
where cor denotes the correlation and std denotes the standard deviation. We report the correlation between the average savings and investment rates and their standard deviations across countries in Table II. The average savings rate has a larger standard deviation than the average investment rate: 0.07 versus 0.04. These two rates have a correlation of 0.77. In addition, we find that countries that grow faster not only invest more, but also save more on average. In particular, the correlation of the average growth rate of GDP per worker with the average investment rate is 0.47 and that with the average savings rate is 0.31. The sample mean of the savings rates is close to that of the investment rates, both of which are around 20%.

Another way to examine the Feldstein–Horioka finding is by looking at differences between domestic savings and investment rates. A frictionless international financial market should allow domestic investment rates of countries to diverge widely from their savings rates. In the data, however, differences between savings and investment rates have not been large for most of the countries. The average of the absolute current-account-to-GDP ratios, referred to as the capital flow ratio for simplicity, is 7% for the 64 countries over the full period, as shown in Table II. The average of the absolute foreign-asset-position-to-GDP ratios is 49%. International financial markets over this period do not seem to have enabled countries to reap the long-run gains from intertemporal trade.

3. A SOLUTION FROM TWO FINANCIAL FRICTIONS

Feldstein and Horioka interpreted their finding as an indication of a high degree of financial frictions. An open question is what kinds of financial frictions can explain the finding quantitatively. To address this question, we study two
types of financial frictions. One is limited spanning, where countries are limited to trading one noncontingent asset. The other is limited enforcement of contracts, where contracts are enforced by the threat of a reversion to costly financial autarky. We find that the model with both frictions (labeled as the bond-enforcement model) can solve the Feldstein–Horioka puzzle quantitatively.

3.1. The Model Environment

Following Clarida (1990), we consider a continuum of small open economies to study a large number of countries in a tractable fashion. All economies produce a homogeneous good that can be either consumed or invested. Each economy consists of a production technology and a benevolent government that maximizes utility on behalf of a continuum of identical domestic consumers. Countries face idiosyncratic shocks in their production technologies. The world economy has no aggregate uncertainty.

The production function is the standard Cobb–Douglas AK^α L^{1−α}, where A denotes total factor productivity (TFP), K is capital, and L is labor. TFP has two components: one is a deterministic growth component that increases at rate g_a, is common across countries, and is constant across periods; the other is a country-specific idiosyncratic shock a, which follows a Markov process with finite support and transition matrix Π. The history of the idiosyncratic shock is denoted by a^t and the probability of a^t, as of period 0, is denoted by π(a^t). We normalize each country's allocations by its labor endowment and the common deterministic growth rate (1 + g_a)^{1/(1−α)}. The production function can thus be simplified to ak^α, where lowercase letters denote variables after normalization. Each country s0 is indexed by its initial idiosyncratic TFP shock, capital stock, and asset holding: (a0, k0, b0).

International financial markets are characterized by two frictions. One is limited spanning: the menu of available assets is restricted to noncontingent bonds. The other is limited enforcement: countries have the option to default and the extent to which countries can be penalized is restricted to a reversion to costly financial autarky. When countries have an option to repudiate their obligations, there must be some penalty for debt default to give borrowers an incentive to repay. Following the sovereign debt literature, we assume that debt contracts are enforced by exclusion from international financial markets as in Eaton and Gersovitz (1981) and an associated drop in output as in Bulow and Rogoff (1989).

In such an environment, the government in each country s0 chooses a sequence of feasible allocations of consumption, capital stocks, and noncontingent bonds, denoted by x = {c(a^t), k(a^t), b(a^t)}, to maximize a continuum of identical consumers' utility given by
\[
\sum_{t=0}^{\infty} \sum_{a^t} \beta^t \pi(a^t)\, u(c(a^t)), \tag{3}
\]
where β denotes the discount factor and u denotes utility which satisfies the usual Inada conditions.10 Any feasible allocation must satisfy the budget constraints, the enforcement constraints, the natural debt limits, and the nonnegativity constraints on consumption and capital. The budget constraints are given by
\[
c(a^t) + k(a^t) - (1 - \delta)k(a^{t-1}) + b(a^t) \le a_t k(a^{t-1})^{\alpha} + R_t b(a^{t-1}), \tag{4}
\]
where R_t is the risk-free interest rate and δ is the per-period depreciation rate of capital. The enforcement constraints capture the limited enforcement friction and require that the continuation utility must be at least as high as autarky utility for each possible future shock, that is,
\[
U(a^{t+1}, x) \ge V^{\mathrm{AUT}}(a^{t+1}, k(a^t)) \quad \text{for any } a^{t+1}. \tag{5}
\]
The continuation utility under allocation x from a^{t+1} onward is given by
\[
U(a^{t+1}, x) = \sum_{\tau=t+1}^{\infty} \sum_{a^\tau} \beta^{\tau-t-1} \pi(a^\tau|a^{t+1})\, u(c(a^\tau)),
\]
where π(a^τ|a^{t+1}) denotes the conditional probability of a^τ given a^{t+1}. The autarky utility from a^{t+1} onward is given by
\[
V^{\mathrm{AUT}}(a^{t+1}, k(a^t)) = \max_{\{c(a^\tau),\, k(a^\tau)\}} \sum_{\tau=t+1}^{\infty} \sum_{a^\tau} \beta^{\tau-t-1} \pi(a^\tau|a^{t+1})\, u(c(a^\tau))
\]
subject to nonnegativity constraints (c(a^τ), k(a^τ) ≥ 0) and budget constraints given by c(a^τ) + k(a^τ) − (1 − δ)k(a^{τ−1}) ≤ (1 − λ)a_τ k(a^{τ−1})^α for any τ ≥ t + 1, with k(a^t) given. Here the penalty parameter λ represents a drop in output associated with defaulting on debt, which has been extensively documented in the sovereign debt literature; see Tomz and Wright (2007) and Cohen (1992). Two possible channels lead to output drops after default. One is the disruption of international trade and the other is the disruption of domestic financial systems. Either disruption could lead to output drops if either trade or banking credit is essential for production. We follow the sovereign debt literature and model the output loss exogenously.

10. We drop the country index s0 for simplicity of notation when doing so does not cause any confusion.
In the spirit of Aiyagari (1994), we impose the natural debt limits to ensure that countries are able to repay even under the lowest shock without incurring negative consumption:
\[
b(a^t) \ge -D(k(a^t)), \tag{6}
\]
where D(k(a^t)) = (a k(a^t)^α + (1 − δ)k(a^t))/(R − 1) and a is the lowest potential TFP shock. In the presence of the enforcement constraints, the natural debt limits never bind in equilibrium. We impose these limits to rule out the Ponzi scheme that does not violate the enforcement constraints.

3.2. Equilibrium and the Solution Strategy

An equilibrium in the bond-enforcement model is a sequence of prices {R_t} and allocations {c(a^t), k(a^t), b(a^t)} such that allocations solve each country's problem given prices and that the world resource conditions are satisfied every period:
\[
\sum_{s_0} \sum_{a^t} \pi(a^t)\bigl[c(a^t, s_0) + k(a^t, s_0) - (1 - \delta)k(a^{t-1}, s_0) - a_t k(a^{t-1}, s_0)^{\alpha}\bigr] = 0.
\]
Since our model has no aggregate uncertainty, interest rates are constant under an invariant distribution. Given the interest rate R, each country's problem, labeled the original problem, is to maximize utility given by (3) subject to the budget constraints (4), the enforcement constraints (5), the natural debt limits (6), and the nonnegativity constraints on consumption and capital.

The optimal allocation in the original problem is different from a competitive equilibrium where consumers decide on borrowing, since the consumers fail to internalize the impact of their choices on nationwide debt limits. This point was made by Jeske (2006). To decentralize the optimal allocation, we can impose taxes or subsidies on foreign borrowing and lending of each consumer in a competitive setting, similarly to Kehoe and Perri (2002) and Wright (2006).11

To compute the equilibrium, we restate our original problem recursively as
\[
W(a, k, b) = \max_{c,\, k',\, b'}\; u(c) + \beta \sum_{a'|a} \pi(a'|a)\, W(a', k', b') \tag{P}
\]
subject to
\[
c + k' - (1 - \delta)k + b' \le ak^{\alpha} + Rb, \tag{7}
\]

11. For details see Bai and Zhang (2010).
\[
c, k' \ge 0, \qquad b' \ge -D(k'), \qquad W(a', k', b') \ge V^{\mathrm{AUT}}(a', k') \quad \forall a'. \tag{8}
\]
This technical approach is similar to the approach used in Abreu, Pearce, and Stacchetti (1990),12 with one key difference in that our problem is dynamic rather than repeated. Capital and bond holdings are endogenous state variables that alter the set of feasible allocations in the following period. Thus, they not only affect the current utility, but also the future prospects in the continuation of the dynamic problem. Nonetheless, we show that the original problem can be restated in such a recursive formulation following Atkeson (1991).13

We solve the P-problem iteratively. In each iteration n, we compute the corresponding optimal welfare Wn given Wn−1 as
\[
W_n(a, k, b) = \max_{c,\, k',\, b'}\; u(c) + \beta \sum_{a'|a} \pi(a'|a)\, W_{n-1}(a', k', b') \tag{9}
\]
subject to the constraints (7), (8), and
\[
W_{n-1}(a', k', b') \ge V^{\mathrm{AUT}}(a', k') \quad \text{for all } a'. \tag{10}
\]
One feature of this algorithm is that the domain Sn of Wn needs to be updated in each iteration. This is because, for Wn to be well defined, the set of feasible allocations that satisfy the constraints (7), (8), and (10) needs to be nonempty for each state (a, k, b) ∈ Sn. Clearly, this set of feasible allocations depends on the continuation welfare Wn−1. In particular, when Wn−1 decreases, some states (a, k, b) cannot find any feasible allocations. Thus, a smaller domain can be supported.

We now describe the details of our iterative algorithm. We start with a value of W0 that is sufficiently high and a sufficiently large set of states S0. Specifically, we set S0 to include all the states under which the set of allocations that satisfy the constraints (7) and (8) is nonempty. We set W0 as the optimal welfare in the P-problem under the constraints (7) and (8) only. For each n ≥ 1, we construct Sn given Wn−1 to include all the states that permit a nonempty set of feasible allocations. The associated welfare function Wn on Sn is constructed according to (9). Both sequences of {Sn} and {Wn} are decreasing, and converge to the limits S and W, respectively. The limit W corresponds to the optimal welfare in the original problem.

12. We thank an anonymous referee for directing us to this alternative solution strategy. Relative to our original one, this approach improves computation efficiency. For details on both approaches see Bai and Zhang (2010).

13. The model in Atkeson (1991) had a complete set of assets and private information, while our model has incomplete markets and complete information. Despite these differences, the adaptation of his approach is straightforward.
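To fix ideas, the schematic R fragment below is an illustration of one update (9)-(10), not the authors' code. All objects are hypothetical inputs: agrid, kgrid, and bgrid are discretized state grids, Pi is the transition matrix, W is the array of continuation welfare W_{n-1}, and vaut is a precomputed matrix of autarky values V^AUT(a', k'). The gross interest rate R = 1.04 is an illustrative placeholder consistent with the 4% calibration target, and the natural debt limits, which never bind in the presence of (10), are omitted. NA entries in the output mark states dropped from the domain Sn.

    update_W <- function(W, vaut, agrid, kgrid, bgrid, Pi,
                         alpha = 0.33, delta = 0.10, beta_ = 0.89, R = 1.04,
                         sigma = 2) {
      u  <- function(c) (c^(1 - sigma) - 1) / (1 - sigma)   # CES utility, sigma = 2
      na <- length(agrid); nk <- length(kgrid); nb <- length(bgrid)
      Wn <- array(NA_real_, dim = c(na, nk, nb))
      for (ia in 1:na) for (ik in 1:nk) for (ib in 1:nb) {
        # resources implied by the budget constraint (7)
        y     <- agrid[ia] * kgrid[ik]^alpha + (1 - delta) * kgrid[ik] + R * bgrid[ib]
        reach <- which(Pi[ia, ] > 0)
        best  <- -Inf
        for (ikp in 1:nk) for (ibp in 1:nb) {
          cons <- y - kgrid[ikp] - bgrid[ibp]
          if (cons <= 0) next
          wnext <- W[, ikp, ibp]
          # constraint (10): continuation weakly dominates autarky for every
          # reachable a'; NA continuation values (outside S_{n-1}) are infeasible
          if (any(is.na(wnext[reach])) || any(wnext[reach] < vaut[reach, ikp])) next
          val <- u(cons) + beta_ * sum(Pi[ia, reach] * wnext[reach])
          if (val > best) best <- val
        }
        if (is.finite(best)) Wn[ia, ik, ib] <- best
      }
      Wn   # NA entries are the states dropped from the domain S_n
    }

Iterating update_W to convergence and then adjusting R until the bond market clears under the invariant distribution mirrors the equilibrium computation described next.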
We compute the equilibrium of the bond-enforcement model as follows. We start with an initial guess of the interest rate R. We then follow the above iterative algorithm to compute the optimal welfare W and the associated optimal decision rules. We next find the invariant distribution and calculate the excess demand in the bond markets under this interest rate R. We finally update the interest rate and repeat the above procedures until the bond markets clear.

3.3. Calibration

To quantitatively evaluate the FH coefficient in the bond-enforcement model, we calibrate the model parameters. Countries share all parameter values that describe tastes and technology, and differ only in their shock realizations. As is standard in the literature, we adopt the constant elasticity of substitution utility function u(c) = (c^{1−σ} − 1)/(1 − σ), where the risk aversion parameter σ is chosen to be 2. The discount rate β is calibrated to be 0.89 to match the U.S. average real capital return of 4% per annum. The technology parameters are set to match U.S. equivalents: the capital depreciation rate δ is set at 10% per annum, and the capital share α is set at 0.33.

The benchmark default penalties are permanent exclusion from financial markets and a loss in output. We set the output drop parameter λ at 1.4%, following Tomz and Wright (2007). Given the importance of the default penalties, in the next section we conduct a wide range of the sensitivity analysis. In particular, we experiment with different values of the output drop parameter and allow for partial exclusion under which countries can regain access to international financial markets with some probability each period after default.

Calibration of the world TFP process requires a rich stochastic process to capture the key features of the TFP series for a large number of countries. Using the standard growth accounting method, we compute a TFP series for each country.14 We take out the common deterministic trend of 1.01% from the logged TFP series. There are three key features of the 64 TFP series. First, there is a wide range of TFP levels across countries. For example, the average TFP difference between the United States and Senegal is more than 13 times the cross-country average of time-series standard deviations. Second, TFPs of poor countries are generally more volatile than those of rich countries, as shown in Figure 1(a). The mean of the coefficients of variation of the TFP series is 0.02 for OECD countries and is 0.04 for developing countries. Third, TFPs for some countries have different characteristics during different subperiods, as shown in Figure 1(b). For example, Peruvian TFP shows an abrupt change in 1980: its coefficient of variation and the mean are 0.02 and 3.8 before 1980, and 0.05 and 3.5 after 1980.

14. For details, see Appendix B.
FIGURE 1.—Key features of the TFP processes.
To generate the features of the TFP series in the data, we specify the world productivity process as a stochastic regime-switching process. We assume that the world has three regimes, each of which is characterized by its mean, persistence, and standard deviation of innovations {(μ_j, ρ_j, ν_j)}_{j=1,2,3}.15 The TFP shock a_{it} of country i at period t in regime j follows the autoregressive process

(11)    a_{it} = μ_j(1 − ρ_j) + ρ_j a_{i,t−1} + ν_j ε_{it},
where ε_{it} is independently and identically distributed, drawn from a standard normal distribution. At period t + 1, country i has some probability of switching to another regime, governed by the transition matrix P. Finally, we use maximum likelihood to estimate all the parameters. The estimation algorithm, described in Appendix B, is an extension of the expectation maximization (EM) principle of Hamilton (1989). Table III reports parameter estimates and standard errors. For convenience, the regimes will be referred to as the low, middle, and high regimes according to their conditional means. All three regimes are persistent, with ρ around 0.99. The middle regime is more volatile than the other two. Switching between regimes mimics the abrupt changes in some countries' TFP processes in the data, such as those of Iran and Peru. The regime-switching process successfully replicates the key features of the cross-country TFP series.
15 The three-regime specification greatly improves the goodness of fit over the two-regime specification, while introducing a fourth regime barely improves it. For tractability, we choose the three-regime specification.
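A minimal simulator for the process (11), written to take the regime parameters as inputs (the point estimates themselves appear in Table III below):

    import numpy as np

    def simulate_tfp(T, mu, rho, nu, P, seed=0):
        """Simulate one country's TFP path under the regime-switching AR(1) in
        (11); mu, rho, nu are length-3 arrays, P is the 3x3 transition matrix."""
        rng = np.random.default_rng(seed)
        j = rng.integers(len(mu))            # initial regime
        a = np.empty(T)
        a[0] = mu[j]                         # start at the regime mean
        for t in range(1, T):
            j = rng.choice(len(mu), p=P[j])  # regime switch between periods
            a[t] = mu[j] * (1 - rho[j]) + rho[j] * a[t - 1] + nu[j] * rng.normal()
        return a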
TABLE III
ESTIMATED PARAMETERS OF THE WORLD PRODUCTIVITY PROCESS^a

                                                               Switching Probability P
Regime    Mean μ        Innovation ν     Persistence ρ    Low           Middle        High
Low       2.07 (1.08)   0.023 (0.0001)   0.995 (0.003)    0.92 (0.11)   0.04 (0.15)   0.04 (0.08)
Middle    3.46 (0.37)   0.070 (0.0003)   0.987 (0.011)    0.06 (0.06)   0.90 (0.05)   0.04 (0.04)
High      4.58 (0.14)   0.020 (0.0000)   0.981 (0.003)    0.04 (0.14)   0.03 (0.19)   0.93 (0.10)

^a Numbers in parentheses are standard errors.
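Collecting the Table III point estimates for use with the simulator above; the eigenvector computation recovers the stationary regime distribution implied by P, a quick consistency check we add for illustration:

    import numpy as np

    mu  = np.array([2.07, 3.46, 4.58])      # regime means (low, middle, high)
    rho = np.array([0.995, 0.987, 0.981])   # persistence
    nu  = np.array([0.023, 0.070, 0.020])   # innovation standard deviations
    P   = np.array([[0.92, 0.04, 0.04],     # rows: current regime, cols: next
                    [0.06, 0.90, 0.04],
                    [0.04, 0.03, 0.93]])

    vals, vecs = np.linalg.eig(P.T)
    stat = np.real(vecs[:, np.argmax(np.real(vals))])
    stat /= stat.sum()   # long-run probabilities of (low, middle, high) regimes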
3.4. Quantitative Results

Now, with the calibrated parameters and the estimated world TFP process, we can compute the predictions of the bond-enforcement model by simulation. To be consistent with the empirical data, in each simulation we obtain 64 series of 44 periods from the invariant distribution. We then compute domestic savings as output minus consumption and domestic investment as changes in capital stocks plus capital depreciation. After calculating the average savings and investment rates, we run the same regression as in equation (1). We simulate the model 1,000 times. Table IV reports the results, along with comparisons to the data and the results from the other models. The bond-enforcement model generates an FH coefficient of 0.52, significantly different from zero and similar to that in the data. Thus, this model solves the Feldstein–Horioka puzzle. Our results come from the interaction of limited enforcement and limited spanning. These two frictions together generate endogenous debt limits on noncontingent bonds because lenders will not offer a loan that would be defaulted on in the next period. In particular, the endogenous debt limit B depends on the current TFP shock a and the next-period capital stock k′ as

    B(a, k′) ≡ min_{a′ : π(a′|a) > 0} { −b̃(a′) : W(a′, k′, b̃(a′)) = V^{AUT}(a′, k′) }.
For each (a, k′), the debt limit B specifies the maximum amount of debt that can be supported without default under all future contingencies. We examine features of the endogenous debt limits. Figure 2(a) plots the endogenous debt limit function B over capital for the median shock in the middle-volatility regime. The debt limit is increasing in capital because poor countries have more incentive to default than rich countries. To interpret the scale of the debt limits, we also graph them in Figure 2(b) in terms of output.16
16 The debt-limit-over-output ratio is not monotonically increasing in capital due to the concavity of the production function. The zigzag pattern in the figure is the result of the numerical approximation.
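A sketch of the computation implied by the displayed debt-limit formula; W, V^{AUT}, and the root-finder behind b_tilde are hypothetical stand-ins for the model's interpolated value functions. Since market welfare is increasing in bond holdings, b̃(a′) can be found by bisection on b in W(a′, k′, b) = V^{AUT}(a′, k′).

    def debt_limit(a, k_next, shocks, pi, b_tilde):
        """B(a, k') = min over reachable shocks a' of -b_tilde(a', k'), where
        b_tilde(a', k') solves W(a', k', b) = V_AUT(a', k') for b; `pi` is the
        shock transition matrix indexed by shock indices."""
        return min(-b_tilde(a_next, k_next)
                   for a_next in shocks if pi[a, a_next] > 0)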
TABLE IV
COMPARISON ACROSS MODELS^a

                                                                          Bond             Bond
                 Data          Two-Friction   Frictionless  Enforcement  Natural Limits   ad hoc Limits
FH coeff (s.e.)  0.52 (0.06)   0.52 (0.05)    −0.01 (0.01)  −0.01 (0.02)  0.05 (0.02)     0.52 (0.05)
CA/Y (std)       0.07 (0.04)   0.10 (0.09)     0.62 (0.33)   0.56 (0.27)  0.38 (0.13)     0.11 (0.09)
TA/Y (std)       0.49 (0.29)   0.40 (0.21)     6.12 (5.15)   4.10 (1.76)  5.60 (3.44)     0.39 (0.23)
std(S/Y)         0.06          0.06            0.41          0.28         0.34            0.06
std(I/Y)         0.04          0.04            0.05          0.04         0.04            0.04
cor(S/Y, I/Y)    0.77          0.77           −0.09         −0.10         0.39            0.77
cor(S/Y, g_y)    0.31          0.65            0.00          0.10         0.41            0.64
cor(I/Y, g_y)    0.47          0.66            0.89          0.83         0.88            0.68
mean(S/Y)        0.21          0.13           −0.06         −0.03         0.12            0.13
mean(I/Y)        0.22          0.14            0.21          0.21         0.21            0.15
RS coeff (s.e.)  0.78 (0.01)   0.60 (0.01)     0.00 (0.00)   0.01 (0.00)  0.46 (0.03)     0.61 (0.01)
TS cor (std)     0.49 (0.42)   0.73 (0.04)     0.52 (0.10)   0.38 (0.14)  0.61 (0.02)     0.74 (0.04)

^a CA/Y denotes the average absolute current-account-to-GDP ratio and TA/Y denotes the average absolute foreign-asset-position-to-GDP ratio. The term std refers to the cross-country variation in the time-series average, cor denotes the correlation, and s.e. denotes the standard error. RS coeff reports the coefficient β_1 in the panel regression Δ log c_{it} − Δ log c̄_t = β_0 + β_1(Δ log y_{it} − Δ log ȳ_t) + u_{it}, where c̄_t and ȳ_t denote the average consumption and output across countries at date t. TS cor denotes the average time-series correlation between detrended savings and investment using the Hodrick–Prescott (HP) filter. Bond ad hoc Limits denotes the bond model with the constraint (14), where κ is set at 9.8% to match the observed FH coefficient.
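A sketch of how the reported FH coefficient is obtained from a simulated panel; s and inv stand for hypothetical arrays of savings and investment rates with one row per country and one column per period:

    import numpy as np

    def fh_coefficient(s, inv):
        """Cross-country FH regression as in equation (1): regress time-averaged
        investment rates on time-averaged savings rates; returns the slope."""
        s_bar, i_bar = s.mean(axis=1), inv.mean(axis=1)   # long-run averages
        X = np.column_stack([np.ones_like(s_bar), s_bar])
        beta, *_ = np.linalg.lstsq(X, i_bar, rcond=None)
        return beta[1]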
These endogenous debt limits overall allow countries to borrow about 30% of their output on average. As a result, the model generates a capital flow ratio of 10% and a foreign-asset-to-output ratio of 40%, which are close to their empirical counterparts of 7% and 49%.
FIGURE 2.—Endogenous debt limits. (a) Levels of debt limits. (b) Ratios of debt limits to output.
The endogenous debt limits are key to understanding savings and investment behavior. Countries with low capital face tight debt limits and cannot borrow much to invest when they are experiencing good productivity shocks: they have to save more to invest more. Countries with high capital intend to lend abroad when they are experiencing bad shocks. Nonetheless, total lending must equal total borrowing in equilibrium, so these countries have to invest more at home as the interest rate decreases to lower their lending incentive. Consequently, the average savings and investment rates are positively correlated across countries, with a correlation of 0.77, the same as that in the data. Also, the model produces small dispersions of savings and investment rates, similar to those found in the data; the standard deviations are 0.06 and 0.04, respectively. Furthermore, the model generates positive correlations between output growth and the savings and investment rates, although these correlations are higher than those in the data, as shown in Table IV.

4. THE ROLE OF EACH FRICTION

The bond-enforcement model deviates from the standard complete markets model in two ways: limited spanning of assets and limited enforcement of debt contracts. Could we have quantitatively solved the Feldstein–Horioka puzzle with just one of these frictions? To address this question, we first examine the frictionless complete markets model, then the enforcement model, where contingent contracts have limited enforceability, and finally the bond model, where assets have limited spanning. Across these models, we maintain the benchmark parameter values, with the exception that the discount factor is recalibrated in each model to match the interest rate of 4% per annum.17 By doing so, we highlight the impact of each financial friction, and more importantly their interaction, on capital flows and the FH coefficient.
17 The discount factor is calibrated to be 0.96 in the frictionless model, 0.955 in the benchmark enforcement model, and 0.94 in the bond model with the natural debt limits.

4.1. The Complete Markets Model

Feldstein and Horioka conjectured that in a general equilibrium model of a world economy without financial frictions, the FH coefficient should be zero. Interestingly, they made this conjecture long before there existed quantitative stochastic general equilibrium models that could be used to evaluate it. In this subsection, we verify the Feldstein–Horioka conjecture in a standard complete markets model. Under frictionless financial markets, countries trade a complete set of Arrow securities. The government chooses allocations to maximize (3) subject to the
budget constraints

(12)    c(a^t) + k(a^t) + Σ_{a_{t+1}|a^t} q(a^t, a_{t+1}) b(a^t, a_{t+1})
            ≤ a_t k(a^{t−1})^α + (1 − δ)k(a^{t−1}) + b(a^{t−1}, a_t)

and the no-Ponzi constraints

(13)    b(a^t, a_{t+1}) ≥ −B̄,
where b(a^t, a_{t+1}) denotes the quantity of Arrow securities that deliver one unit of consumption if state a_{t+1} is realized next period and q(a^t, a_{t+1}) denotes the price of such Arrow securities. The borrowing limit B̄ > 0 is set so large that the no-Ponzi constraints never bind in equilibrium. As reported in Table IV, the complete markets model generates an FH coefficient of −0.01.18 Thus, the frictionless model produces an FH coefficient close to zero, as Feldstein and Horioka conjectured. To understand this result, we first look at investment and savings decisions. Under a persistent shock process, investment depends on changes in TFP shocks: a country with a higher average TFP growth rate invests more on average. Savings depends not on changes, but on levels of shocks: a country with a higher average TFP level saves more on average. As a result, the average growth of output is positively correlated with the average investment rate, but uncorrelated with the average savings rate, as reported in Table IV. We then examine the two terms of the FH coefficient in equation (2): the correlation between the average savings and investment rates and their relative dispersion. The change and the level of our persistent, mean-reverting process are slightly negatively correlated, which implies a small, negative correlation between the average savings and investment rates of −0.09. Additionally, the relative dispersion of investment and savings is small because the change has a smaller dispersion than the level under the persistent shock process. Consequently, the frictionless model produces an FH coefficient close to zero. Moreover, the savings dispersion in the frictionless model is much larger than that in the two-friction model: 0.41 versus 0.06, which implies that the two frictions have substantial impacts on limiting capital flows across countries. This analysis shows that the size of the FH coefficient depends on the persistence of the TFP process. As the persistence decreases, the correlation between changes and levels of shocks becomes more negative, and the dispersion of changes relative to levels rises.
18 If savings is defined as national savings instead of domestic savings, the FH coefficient will be zero. The intuition is simple. National income and consumption of each country are constant over time due to fully diversified portfolios, which leads to constant national savings rates over time. Investment, however, varies with TFP shocks over time. Thus, average national savings rates and average investment rates are uncorrelated across countries.
Thus, the correlation between the average investment and savings rates becomes more negative, and the relative dispersion of the two rates rises with lower persistence. As a result, the FH coefficient becomes more negative. To illustrate this point, we vary the persistence parameter of an AR(1) process, where the innovation standard deviation is set at 4.2% to match the unconditional standard deviation of the TFP data. When persistence falls from 0.999 to 0.5, the FH coefficient falls from −0.001 to −0.12, but the FH puzzle remains. The frictionless model generates a large capital flow ratio, 62%, which is about 9 times that in the data. The average foreign-asset-position-to-GDP ratio is also large, 6.12, which is about 12 times that in the data. Furthermore, the pattern of capital flows deserves some attention. Rich, fast-growing countries on average both invest and save a lot, while poor, stagnant countries on average both invest and save little. Even under complete markets, these two types of countries have relatively low capital flows. Rich, stagnant countries on average save a lot but invest little, and poor, miracle countries on average save little but invest heavily. Thus, capital flows generally move from rich, stagnant countries to poor, miracle countries. This prediction is not observed in the data, as pointed out by Lucas (1990).

4.2. The Enforcement Model

We now examine the enforcement model. It allows a complete set of assets to be traded, but international financial contracts have limited enforceability. Formally, each country chooses an allocation x = {c(a^t), k(a^t), b(a^t, a_{t+1})} to maximize its welfare given by (3), subject to the budget constraints (12), the no-Ponzi constraints (13), and the enforcement constraints (5).19 As shown in Table IV, the enforcement model produces an FH coefficient of −0.01 and a capital flow ratio of 0.56, which are close to the predictions of the frictionless model. In this model, limited enforceability of contracts generates endogenous borrowing limits, which ensure that countries prefer to repay their contingent claims next period. Figure 3 plots these limits on contingent claims at the median shock of each regime. Borrowing limits are larger for countries with larger capital stocks or higher TFP shocks because they have less incentive to default. Moreover, these state-contingent debt limits are very loose when compared with the noncontingent debt limits in the two-friction model. On average, countries can borrow three times their income. This is because continued participation in financial markets is so attractive that countries have little incentive to default under the rich asset structure and the volatile shock process.
19 Solving the enforcement model is computationally intensive. In addition to transforming the enforcement constraints recursively, we also need to deal with the curse of dimensionality that arises under state-contingent assets. We adapt the approach proposed by Atkeson and Lucas (1992) to compute this model. For details, see Zhang (2005).
FIGURE 3.—Comparison of endogenous debt limits.
The enforcement model thus generates a capital flow ratio of 0.56 and a foreign-asset-to-output ratio of 4.1, both of which are much higher than those in the data. Under these large debt limits, the response of investment to shocks is similar to that found in the frictionless model; the correlation of the investment rate with output growth is 0.83. Savings starts to respond slightly to changes in shocks; the correlation between the savings rate and output growth remains low at 0.10. Consequently, savings and investment rates remain almost uncorrelated, and the relative dispersion of the investment and savings rates is still small, as in the frictionless model. As a result, the enforcement model generates an FH coefficient close to zero. Default penalties play an important role in determining capital flows under limited enforcement; lenient penalties increase the default incentive and thus reduce borrowing and lending. We conduct a sensitivity analysis on default penalties to examine the robustness of our results. We start with the output drop parameter. As shown in Table V, a smaller output drop reduces capital flows and drives up the FH coefficient, but the effects are quantitatively small. Even under a zero output drop, the enforcement model still produces an FH coefficient close to zero and a large volume of capital flows. In contrast, Kehoe and Perri (2002) found that this default penalty leads to small capital flows. The difference comes from two sources. First, our shock process is more volatile than theirs: we calibrate to the TFPs of both developed and developing countries, while they calibrate to those of developed countries only. Second, our multicountry model offers more insurance opportunities than their two-country model.
TABLE V
SENSITIVITY ANALYSIS OF THE ENFORCEMENT MODEL^a

                 Output Loss λ                                Reentry Probability η
                 λ = 2.0%       λ = 0.9%       λ = 0%         η = 20%      η = 40%      η = 100%
FH coeff (s.e.)  −0.015 (0.02)  −0.011 (0.02)  −0.009 (0.02)  0.08 (0.03)  0.11 (0.03)  0.23 (0.03)
CA/Y             0.57           0.55           0.51           0.10         0.07         0.02
std(S/Y)         0.29           0.27           0.26           0.06         0.04         0.016
std(I/Y)         0.04           0.04           0.04           0.01         0.01         0.005
cor(S/Y, I/Y)    −0.10          −0.08          −0.06          0.36         0.36         0.70

^a The term s.e. refers to the standard error.
Thus, in our model, there is both a greater need and a greater opportunity to insure, which leads to larger capital flows. We next relax the assumption of permanent exclusion from international financial markets after default. To give the enforcement model the best chance of matching the data, we shut down the output drop so as to generate the highest possible FH coefficient. The default welfare on the right-hand side of the enforcement constraints becomes

    V^D(a, k) = max_{c, k′} u(c) + β Σ_{a′|a} π(a′|a)[(1 − η)V^D(a′, k′) + ηW(a′, k′, 0)]

subject to c + k′ − (1 − δ)k ≤ ak^α, where η denotes the reentry probability and W denotes the market welfare of the enforcement model. Table V reports the results for different reentry probabilities. As the reentry probability η increases, capital flows decrease and the FH coefficient increases. Even when η approaches 100%, the enforcement model still generates an FH coefficient much lower than that found in the data. The capital flow ratio and the dispersions of the savings and investment rates, however, are much lower than those found in the data. To produce the observed FH coefficient, we would need an even lower default penalty: countries can access the markets with some probability in the defaulting period and with certainty in the next period. This seems unrealistic. Gelos, Sahay, and Sandleris (2004) documented that, on average, defaulting countries are excluded from international financial markets for 5 years, which implies an η of 20%. Moreover, capital flows and the dispersions of savings and investment rates would be even lower. The enforcement model has a counterfactual implication when the reentry probability is high: countries hit with bad shocks might respond by increasing capital and investment. The reason is as follows. In this model, the demand for capital decreases when shocks are bad or when enforcement constraints
are binding. With contingent assets, the enforcement constraints tend to bind under good shocks and relax under bad shocks. Bad shocks thus have two effects: they decrease capital due to lower returns, but increase capital due to the relaxation of the enforcement constraints. Hence, capital and investment might increase when countries are hit with bad shocks if the effect of relaxing the binding enforcement constraints is large enough. This counterfactual implication helps explain why the FH coefficient remains small under high reentry probabilities. As the reentry probability increases, enforcement constraints tighten and capital flows decrease, thereby increasing the savings–investment correlation across countries. This increase in the correlation is dampened by the counterfactual implication as follows. Consider a country that increases investment when hit by a bad shock. Its investment is high, but output and savings are low. Moreover, savings tends to decrease by more than output due to the incentive of consumption smoothing. Thus, this country has a high investment rate and a low savings rate in response to the bad shock. These responses are large enough to lead to a high average investment rate and a low average savings rate. This lowers the standard deviation of the average investment rate and the savings–investment correlation across countries, and dampens the increase in the FH coefficient. Note that in these experiments we assume that defaulting countries have their debt fully written off and are treated the same after reentry as countries that have never defaulted. In the data, the default penalties are more severe because these assumptions are violated. Defaulting countries receive only partial debt relief and must repay a nonnegligible fraction of their outstanding debt when reentering the market.20 In addition, though defaulting countries regain access to markets, they may have only limited access and face a higher cost of borrowing relative to nondefaulting countries. If we enrich the enforcement model further along these dimensions, the default penalty will increase, which loosens borrowing limits and lowers the FH coefficient.

4.3. The Bond Model

We next examine the bond model, where countries can trade only noncontingent bonds. To isolate the role of limited spanning, we impose natural debt limits, which ensure that countries are able to repay without incurring negative consumption. These debt limits are loose enough that only a small fraction of countries are constrained in equilibrium. Formally, each country maximizes welfare given by (3) subject to the budget constraints (4) and the natural debt limits (6). As shown in Table IV, the bond model with the natural debt limits produces a small FH coefficient of 0.05 and a large capital flow ratio of 0.38.
20 The debt recovery rate is 40% for Ecuador's 1999 default, 36.5% for Russia's 1998 default, and 28% for Argentina's 2001 default.
The investment behavior in the bond model is similar to that in the frictionless model because of the large debt limits. Savings behavior in this model, however, is different due to precautionary motives under incomplete markets; countries tend to save more when the TFP shock increases. Thus, the correlation between the average savings rate and average output growth increases from zero in the frictionless model to 0.41 in the bond model, and similarly for the correlation between the average rates of savings and investment. Despite the increases in the correlation between these two rates, the dispersion of the savings rate is still much larger than that of the investment rate: 0.34 versus 0.04. This is because the amount of borrowing is virtually unrestricted: the foreign-asset-position-to-GDP ratio is 5.6, which is much higher than 0.49 in the data. As a result, the bond model with the natural debt limits implies a counterfactually small FH coefficient. Most of the literature with incomplete markets imposes ad hoc debt limits. When these limits are tightened exogenously, the bond model can generate the observed FH coefficient. This was shown by Castro (2005), who restricted countries to borrow no more than a fraction κ of their resources at the beginning of the period: (14)
b(at ) ≥ −κ(at k(at−1 )α + (1 − δ)k(at−1 ) + Rt b(at−1 ))
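As an illustration of (14), a sketch under the paper's benchmark technology parameters (the function name is ours; κ = 9.8% is the value calibrated just below):

    def ad_hoc_debt_limit(a, k, b, R, alpha=0.33, delta=0.10, kappa=0.098):
        """Lower bound on next-period bond holdings under constraint (14):
        debt may not exceed a fraction kappa of beginning-of-period resources."""
        resources = a * k**alpha + (1 - delta) * k + R * b
        return -kappa * resources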
We conduct a similar analysis in the bond model. We find that when κ is set at 9.8%, the bond model reproduces the observed FH coefficient. As reported in Table IV, the bond model under the constraint (14) produces implications similar to those of the bond-enforcement model. Castro's analysis shows that debt limits need to be severe enough to restrict capital flows close to the data in order to resolve the FH puzzle. Although this is an important contribution, Castro's analysis leaves the source of the borrowing constraints unexplained. Our work suggests that the interaction of the two financial frictions could be a potential source of these required borrowing constraints. Moreover, our analysis is useful for predicting the effects on savings, investment, and capital flows of a change in default penalties or contracting technologies.

4.4. Interaction of Two Frictions

The key reason that the two-friction model can solve the FH puzzle is that the interaction of the two frictions generates tight endogenous debt limits that restrict capital flows close to the data. As shown in Figure 3, the enforcement model under the benchmark default penalties generates loose state-contingent debt limits, which ensure repayment if the contingency occurs. When limited spanning is also introduced, the debt limits become noncontingent to ensure that countries prefer to repay even under the worst realization of shocks. Clearly, the noncontingent debt limits are tighter than the contingent ones.
Moreover, market welfare is lower in the two-friction model than in the enforcement model because limited spanning restricts trading opportunities, although the autarky utilities are the same across these two models. Countries thus have a larger incentive to default in the two-friction model than in the enforcement model. As a result, the debt limits are tightened further in the presence of both frictions. In addition, the two-friction model does not exhibit the counterfactual investment behavior of the enforcement model, even when the debt limits are tight. The key is that limited spanning makes repayments noncontingent. Repayments are more painful under bad shocks with lower income, and thus the enforcement constraints tend to bind when countries are hit with bad shocks. Therefore, investment tends to decrease in response to bad shocks, as in the data. The absence of the counterfactual implication helps explain why the two-friction model can generate the observed FH coefficient, while the enforcement model cannot under the same capital flow ratio. The interaction of the two frictions improves the model's quantitative performance along four dimensions over the frictionless model, the enforcement model, and the bond model with the natural debt limits. First, the two-friction model produces a capital flow ratio close to that in the data, while the other three models generate ratios at least 5 times that in the data (see Table IV). Second, the two-friction model generates the observed dispersion of average savings rates, 0.06, while the other three models generate a dispersion at least 4 times that found in the data. Third, the two-friction model produces a correlation between average savings and investment rates almost the same as that in the data, while this correlation is much lower in the other three models. Last, the two-friction model generates an average savings rate closest to the data, although it underperforms with regard to the average investment rate. In these dimensions, the two-friction model has implications similar to those of the bond model with exogenous debt limits calibrated to match the observed FH coefficient. The interaction of the two frictions also helps account for the imperfect risk sharing observed in the data. The empirical literature commonly measures the degree of risk sharing as the coefficient on output growth in a panel regression of consumption growth on output growth. As shown in Table IV, international risk sharing is far from perfect empirically, which is completely at odds with the perfect risk sharing prediction of the complete markets model. The enforcement model and the bond model with the natural debt limits still provide too much risk sharing relative to the data. With the tight debt limits, the bond-enforcement model greatly reduces risk sharing across countries and generates a degree of risk sharing that is much closer to that found in the data. We now examine the impact of default penalties in the two-friction model. When the output loss parameter λ is zero, permanent exclusion from financial markets is the only default penalty. Under this scenario, we find that capital flows are small and the FH coefficient is large, as shown in Table VI.
TABLE VI
SENSITIVITY ANALYSIS OF THE BOND-ENFORCEMENT MODEL^a

                 η = 0%                                     λ = 1.4%
                 λ = 2.0%     λ = 0.9%     λ = 0%           η = 20%      η = 40%      η = 100%
FH coeff (s.e.)  0.45 (0.05)  0.61 (0.05)  0.94 (0.02)      0.81 (0.04)  0.86 (0.03)  0.93 (0.02)
CA/Y             0.11         0.09         0.01             0.04         0.03         0.02
std(S/Y)         0.06         0.05         0.03             0.04         0.03         0.03
std(I/Y)         0.04         0.04         0.02             0.03         0.03         0.03
cor(S/Y, I/Y)    0.72         0.84         0.99             0.94         0.96         0.98

^a The term s.e. refers to the standard error.
In contrast, in the enforcement model this default penalty supports large capital flows and generates a close-to-zero FH coefficient. This is because exclusion from financial markets is less severe when only noncontingent bonds can be traded. Thus, to match the observed capital flows, we need the output loss as part of the default penalties. As the output drop parameter λ rises from 0 to 2%, the FH coefficient decreases from 0.94 to 0.45, but remains large and positive. We also experiment with partial exclusion from financial markets in Table VI. We set the output loss at the benchmark value of 1.4%. When the reentry probability is 20%, the model produces a small capital flow ratio of 4% and a large FH coefficient of 0.81. Capital flows decrease and the FH coefficient rises as we further increase the reentry probability to make default less painful. Again, we assume in all these experiments that the defaulting country has all its debt written off and has full access to the markets upon reentry. If we relax these assumptions, the FH coefficient will decrease, since borrowing limits become looser with higher default penalties.

4.5. Time-Series and Cross-Section Predictions

Savings and investment are also positively correlated over business cycles within a country, as documented by Tesar (1991). The international business cycle literature shows that a positive time-series savings–investment correlation arises in either a frictionless model or a model with financial frictions.21 We confirm the previous results. Moreover, we highlight that the cross-country dimension, and not the time-series dimension, of savings and investment data helps evaluate the significance of financial frictions. Finally, we demonstrate the success of the two-friction model mechanism in producing key international business cycle statistics.
21 See Backus, Kehoe, and Kydland (1992), Baxter and Crucini (1993), and Mendoza (1991).
To examine the time-series implications, we must introduce capital adjustment costs, as is standard in international business cycle models, so as to reduce the volatility of investment to a level close to that in the data.22 Following the literature, we specify the capital adjustment cost as χ(k_{t+1}/k_t − 1)^2 k_t/2, where χ is calibrated to match the volatility of investment in the data. The resource constraints are modified accordingly to reflect the adjustment costs. We find that all of the models, with or without financial frictions, generate a positive time-series correlation between savings and investment rates (reported in the last row of Table IV), while the FH coefficients are almost the same as those estimated before. The result comes from endogenous responses of savings and investment to the persistent shock process. When hit by a good shock, a country increases investment to utilize the good production opportunity and also increases savings to smooth consumption. On the other hand, when hit by a bad shock, a country reduces both savings and investment. Thus, savings and investment are positively correlated over time, and this mechanism is present in each of the models. In contrast to the time-series dimension, the cross-section dimension studies how divergent the long-run average savings and investment rates are across countries. One can imagine that in a world with a persistent shock process, each country could have positively correlated savings and investment rates over time, yet countries might have very different average savings and investment rates. In this study, we have shown that the degree of divergence depends on the ability of countries to borrow and lend, which in turn is governed by the degree of financial frictions. This explains why the cross-section dimension helps evaluate the significance of financial frictions. We also examine the bond-enforcement model's implications for key international business cycle statistics, reported in Table VII.

TABLE VII
TIME-SERIES IMPLICATIONS OF THE BOND-ENFORCEMENT MODEL^a

                        Volatility                                  Cyclicality                       International
         std(y)   std(nx/y)   std(c)/std(y)   std(i)/std(y)   cor(c,y)   cor(i,y)   cor(nx/y, y)   cor(y,y*)   cor(c,c*)
Data     3.53%    5.59%       1.01            4.08            0.74       0.58       −0.15          0.08        0.05
Model    3.35%    3.18%       0.68            4.04            0.98       0.78       0.09           0.02        0.02

^a The data statistics are calculated from logged (except for net exports and investment) and HP-filtered annual time series, 1960–2003. The model statistics are averages from 1,000 simulations of 64 series and 44 periods, where the relevant series have been logged and HP-filtered as in the data series. All statistics are averages across countries. std denotes the standard deviation, cor denotes the correlation, y denotes output, c denotes consumption, i denotes investment, and nx denotes net exports. The relative standard deviation of investment and output is computed using growth rates, since investment might be negative in the model.
22 We did not impose the capital adjustment cost in the enforcement model, due to technical complexity.
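The adjustment cost used in these experiments, written as a function (the value of χ below is an illustrative placeholder; the paper calibrates χ to the investment volatility in the data):

    def adjustment_cost(k_next, k, chi=0.5):
        """Capital adjustment cost chi * (k_next/k - 1)**2 * k / 2; chi = 0.5
        is a placeholder, not the calibrated value."""
        return 0.5 * chi * (k_next / k - 1.0) ** 2 * k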
The bond-enforcement model generates fluctuations of output, consumption, and net exports close to those observed in the data. It also comes close to matching the cyclical behavior of consumption and investment. In addition, the model generates a cross-country output correlation that is the same as the consumption correlation, due to limited risk sharing under the tight endogenous debt limits. These tight constraints, however, also limit the model's ability to generate the countercyclicality of net exports found in the data.23

5. CONCLUSION

The Feldstein–Horioka finding of a positive long-run savings–investment correlation across countries is one of the most robust findings in international finance. Our work first shows that this finding is a puzzle for the frictionless (complete markets) model. To our knowledge, this point is new to the literature. Most existing theoretical studies examine the positive time-series savings–investment correlation in the data and find that this observation can arise even in a frictionless model, as savings and investment comove in response to productivity shocks. We find, however, that the frictionless model implies a correlation of zero between the long-run savings and investment rates across countries. Our work then quantitatively investigates whether plausibly calibrated financial frictions can explain this finding. We find that a calibrated model with both limited spanning and limited enforcement frictions produces a savings–investment correlation and capital flows close to those found in the data. In contrast, the model with limited enforcement alone cannot jointly produce the capital flows and the FH coefficient found in the data. The model with limited spanning produces this finding only when we exogenously set debt limits that restrict the volume of capital flows to be consistent with the data. The two frictions together generate such debt limits endogenously through their interaction. In sum, our work analyzes the roles of different financial frictions in one unified framework and highlights the importance of the interaction between the two frictions. In this work, the limited enforcement friction endogenizes borrowing constraints and links them to the fundamental parameters of the default penalties. This analysis is useful for predicting international capital flows when the underlying default penalties change. On the other hand, the two-friction model still assumes that the set of available contracts is exogenously incomplete and does not provide a deep reason for debt to be noncontingent. A future extension is to endogenize the set of contracts available. Another interesting extension would be to allow for equilibrium default, since in the data we do observe frequent episodes of sovereign default. In our bond-enforcement model, default never occurs in equilibrium because the endogenous borrowing constraints ensure that countries repay their debt under any future contingency.
23 A similar result is also found in Kehoe and Perri (2002).
If endogenous debt contracts are instead set to ensure debt repayments only in expectation, default might occur in equilibrium.24 Theoretically, equilibrium default could improve welfare by providing state contingency in debt repayments upon default. It is still an open question whether such an extension could have a significant effect on the savings–investment correlation.

APPENDIX A: DATA SAMPLE AND SOURCES

In this appendix, we describe our data sources, identify the countries in our sample, and document several important changes in the systems of national accounts.

A.1. Data Sources

Our nominal data series are from the World Bank's publication World Development Indicators 2007. These include nominal GDP, nominal final consumption, and nominal gross capital formation. Our population and real data series are from the Penn World Table 6.2 (Heston, Summers, and Aten (2006)). These series include real GDP per capita (Laspeyres) and the shares of consumption, government expenditure, and investment. Total employment data are mainly from databases compiled by the Groningen Growth and Development Centre. The missing employment data are supplemented by the Penn World Table 6.2 as

    employment = [real GDP per capita (chained) × population] / [real GDP per worker (chained)].
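In code form (the series names are ours):

    def employment(rgdp_per_capita_chained, population, rgdp_per_worker_chained):
        """Back out employment from Penn World Table 6.2 series via the
        identity displayed above."""
        return rgdp_per_capita_chained * population / rgdp_per_worker_chained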
The data on capital flows are from the data set compiled by Lane and Milesi-Ferretti (2007).

A.2. Country Sample

A total of 98 countries have all relevant data series available for our whole sample period (1960–2003). We select 2003 as the ending year because the Penn World Table 6.2 has missing data for many countries in 2004. We exclude economies that have a real GDP per capita of less than 4.5% or a population of less than 1% of the U.S. levels in 2000. We also exclude Luxembourg, Hong Kong, Taiwan, and China. The 64 countries remaining in the sample are Algeria, Argentina, Australia, Austria, Belgium, Bolivia, Brazil, Cameroon, Canada, Chile, Colombia, Costa Rica, Côte d'Ivoire, Denmark, the Dominican Republic, Ecuador, Egypt, El Salvador, Finland, France, Germany, Greece, Guatemala, Guinea, Honduras, Iceland, India, Indonesia, Iran,
24 See Eaton and Gersovitz (1981), Arellano (2007), Chatterjee, Corbae, Nakajima, and Rios-Rull (2007), and Livshits, MacGee, and Tertilt (2007).
Ireland, Israel, Italy, Japan, Korea, Malaysia, Mexico, Morocco, the Netherlands, New Zealand, Nicaragua, Norway, Pakistan, Panama, Paraguay, Peru, the Philippines, Portugal, Romania, Senegal, Singapore, South Africa, Spain, Sri Lanka, Sweden, Switzerland, the Syrian Arab Republic, Thailand, Tunisia, Turkey, the United Kingdom, the United States, Uruguay, Venezuela, and Zimbabwe.

A.3. Changes in Systems of National Accounts

Our estimate for the 16 OECD countries that Feldstein and Horioka (1980) used over 1960–1974 is 0.67, which differs from the estimate of 0.89 found in the original study because of changes in the systems of national accounts (SNA). In the data source they used, National Accounts of OECD Countries 1974, the 1953 SNA and the 1968 SNA are used for these countries. In our data source, the World Development Indicators 2007, most countries use the 1993 SNA. The adoption of the 1993 SNA involves many changes. Some changes are simply reclassifications of items between various components of GDP, but others involve adding new transactions or suppressing old ones. Among all of the components of GDP, the largest overall revisions affect gross fixed capital formation. The 1993 SNA broadens the concept of investment to include several types of expenditure that were not formerly considered to be capital spending, such as spending on computer software and expenditures on mineral exploration, entertainment, and artistic works. These changes lead to an upward revision of gross capital formation and to a decrease in the FH coefficient.

APPENDIX B: ESTIMATION OF THE WORLD PRODUCTIVITY PROCESS

The TFP level of country i at period t is defined as

(15)    log A^i_t = log Y^i_t − α log K^i_t − (1 − α) log L^i_t,
where Y^i_t is real GDP, K^i_t is the capital stock constructed from gross capital formation data, and L^i_t is employment. The average growth rate g_a of the TFP series of the 64 countries is 1.01%. In this appendix, we describe the expectation maximization (EM) algorithm used to obtain the maximum likelihood estimates of the parameters of the regime-switching process specified in (11). This is an extension of the EM principle of Hamilton (1989). The log-likelihood function is given by

    L(Ψ; Θ) = Σ_{i=1}^{N} log f(Ψ^i; Θ),
where Ψ^i = {a_{iT}, a_{i,T−1}, …, a_{i1}} is a vector containing all the observations of country i's TFP, Θ = {{μ_j, ρ_j, ν_j}_{j=1,2,3}, P} denotes the parameters to be estimated, N denotes the number of countries, and T denotes the number of periods.
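The country-level density f, written out next, can be evaluated recursively with Hamilton's (1989) filter instead of summing over all regime paths. A minimal sketch (our own illustration, not the authors' code; p0 is an assumed initial regime distribution):

    import numpy as np
    from scipy.stats import norm

    def country_loglik(a, mu, rho, nu, P, p0):
        """Evaluate log f(Psi_i; Theta) for one TFP series `a` by filtering
        the unobserved regime; mu, rho, nu are per-regime parameter arrays."""
        probs = np.asarray(p0, dtype=float)
        loglik = 0.0
        for t in range(1, len(a)):
            probs = P.T @ probs                  # predicted regime probabilities
            dens = norm.pdf(a[t], mu * (1 - rho) + rho * a[t - 1], nu)
            joint = probs * dens                 # joint density of regime and a_t
            lik = joint.sum()
            loglik += np.log(lik)
            probs = joint / lik                  # Bayes update of the regime
        return loglik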
The density function f is given by

    f(Ψ^i; Θ) = Σ_{m^i} f(a_{iT}|m_{iT}, a_{i,T−1}; Θ) · · · f(a_{i2}|m_{i2}, a_{i1}; Θ) × p(m_{iT}|m_{i,T−1}) · · · p(m_{i2}|m_{i1}) p(m_{i1}),

where the sum is over all regime paths m^i = (m_{i1}, …, m_{iT}) and m_{it} denotes the regime that country i is in at period t. To make computation feasible, we assume the process to be stationary and restrict ρ ≤ 0.995 in the estimation. Due to the nonlinearity of the maximum likelihood function, we cannot solve for the parameters analytically. Instead, we use the EM algorithm to obtain the maximum likelihood estimates iteratively. We start with an initial guess of the parameters. In iteration n, given Θ_{n−1}, we update the conditional probability of each regime in each period for each country using Bayes' rule; next, given the conditional probabilities, we compute Θ_n with the maximum likelihood method. We iterate these procedures until Θ_n converges.

REFERENCES

ABRAHAM, A., AND E. CARCELES-POVEDA (2010): "Endogenous Trading Constraints With Incomplete Asset Markets," Journal of Economic Theory (forthcoming). [606]
ABREU, D., D. PEARCE, AND E. STACCHETTI (1990): "Toward a Theory of Discounted Repeated Games With Imperfect Monitoring," Econometrica, 58 (5), 1041–1063. [612]
AIYAGARI, S. R. (1994): "Uninsured Idiosyncratic Risk and Aggregate Saving," Quarterly Journal of Economics, 109 (3), 659–684. [604,611]
ARELLANO, C. (2007): "Default Risk, the Real Exchange Rate and Income Fluctuations in Emerging Economies," American Economic Review, 98 (3), 690–712. [628]
ATKESON, A. (1991): "International Lending With Moral Hazard and Risk of Repudiation," Econometrica, 59 (4), 1069–1089. [612]
ATKESON, A., AND R. E. LUCAS JR. (1992): "On Efficient Distribution With Private Information," Review of Economic Studies, 59 (3), 427–453. [619]
BACKUS, D. K., P. J. KEHOE, AND F. E. KYDLAND (1992): "International Real Business Cycles," Journal of Political Economy, 100 (4), 745–775. [625]
BAI, Y., AND J. ZHANG (2010): "Supplement to 'Solving the Feldstein–Horioka Puzzle With Financial Frictions'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/6619_extensions.pdf; http://www.econometricsociety.org/ecta/Supmat/6619_data and programs.zip. [606,611,612]
BARRO, R. J., N. G. MANKIW, AND X. SALA-I-MARTIN (1995): "Capital Mobility in Neoclassical Models of Growth," American Economic Review, 85 (1), 103–115. [606]
BAXTER, M., AND M. J. CRUCINI (1993): "Explaining Saving-Investment Correlations," American Economic Review, 83 (3), 416–436. [605,625]
——— (1995): "Business Cycles and the Asset Structure of Foreign Trade," International Economic Review, 36 (4), 821–854. [604]
BULOW, J., AND K. ROGOFF (1989): "Sovereign Debt: Is to Forgive to Forget?" American Economic Review, 79 (1), 43–50. [609]
CASTRO, R. (2005): "Economic Development and Growth in the World Economy," Review of Economic Dynamics, 8 (1), 195–230. [605,623]
CHATTERJEE, S., D. CORBAE, M. NAKAJIMA, AND J.-V. RIOS-RULL (2007): "A Quantitative Theory of Unsecured Consumer Credit With Risk of Default," Econometrica, 75 (6), 1525–1589. [628]
CLARIDA, R. H. (1990): "International Lending and Borrowing in a Stochastic Stationary Equilibrium," International Economic Review, 31 (3), 543–558. [609]
COAKLEY, J., F. KULASI, AND R. SMITH (1998): "The Feldstein–Horioka Puzzle and Capital Mobility: A Review," International Journal of Finance and Economics, 3 (2), 169–188. [607]
COHEN, D. (1992): "The Debt Crisis: A Postmortem," NBER Macroeconomics Annual, 7, 65–105. [610]
EATON, J., AND M. GERSOVITZ (1981): "Debt With Potential Repudiation: Theoretical and Empirical Analysis," Review of Economic Studies, 48 (2), 289–309. [609,628]
FELDSTEIN, M., AND C. HORIOKA (1980): "Domestic Saving and International Capital Flows," Economic Journal, 90 (358), 314–329. [603,606,629]
FRANKEL, J. A. (1992): "Measuring International Capital Mobility: A Review," American Economic Review, 82 (2), 197–202. [607]
GELOS, R. G., R. SAHAY, AND G. SANDLERIS (2004): "Sovereign Borrowing by Developing Countries: What Determines Market Access?" Working Paper 04/221, IMF. [621]
HAMILTON, J. D. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57 (2), 357–384. [614,629]
HESTON, A. R., R. SUMMERS, AND B. ATEN (2006): "Penn World Table Version 6.2," Center for International Comparisons at the University of Pennsylvania. [628]
JESKE, K. (2006): "Private International Debt With Risk of Repudiation," Journal of Political Economy, 114 (3), 576–593. [611]
KEHOE, P. J., AND F. PERRI (2002): "International Business Cycles With Endogenous Incomplete Markets," Econometrica, 70 (3), 907–928. [604,605,611,620,627]
KEHOE, T. J., AND D. K. LEVINE (1993): "Debt-Constrained Asset Markets," Review of Economic Studies, 60 (4), 865–888. [604]
KOCHERLAKOTA, N. (1996): "Implications of Efficient Risk Sharing Without Commitment," Review of Economic Studies, 63 (4), 595–609. [604]
LANE, P. R., AND G. M. MILESI-FERRETTI (2007): "The External Wealth of Nations Mark II," Journal of International Economics, 73 (2), 223–250. [628]
LIVSHITS, I., J. MACGEE, AND M. TERTILT (2007): "Consumer Bankruptcy: A Fresh Start," American Economic Review, 97 (1), 402–418. [628]
LUCAS, R. E. (1990): "Why Doesn't Capital Flow From Rich to Poor Countries?" American Economic Review, Papers and Proceedings, 80 (2), 92–96. [619]
MENDOZA, E. G. (1991): "Real Business Cycles in a Small Open Economy," American Economic Review, 81 (4), 797–818. [604,605,625]
OBSTFELD, M. (1986): "Capital Mobility in the World Economy: Theory and Measurement," Carnegie-Rochester Conference Series on Public Policy, 24, 55–103. [606]
SUMMERS, L. H. (1988): "Tax Policy and International Competitiveness," in International Aspects of Fiscal Policies, ed. by J. Frenkel. Chicago: Chicago University Press, 349–375. [606]
TESAR, L. L. (1991): "Savings, Investment, and International Capital Flows," Journal of International Economics, 31 (1–2), 55–78. [605,625]
TOMZ, M., AND M. L. WRIGHT (2007): "Do Countries Default in 'Bad Times'?" Journal of the European Economic Association, 5 (2–3), 352–360. [610,613]
WESTPHAL, U. (1983): "'Domestic Saving and International Capital Movements in the Long Run and the Short Run' by M. Feldstein," European Economic Review, 21 (1–2), 157–159. [606]
WRIGHT, M. L. (2006): "Private Capital Flows, Capital Controls and Default Risk," Journal of International Economics, 69 (1), 120–149. [611]
ZHANG, H. H. (1997): "Endogenous Borrowing Constraints With Incomplete Markets," Journal of Finance, 52 (5), 2187–2209. [604,605]
ZHANG, J. (2005): “Essays on International Economics,” Ph.D. Dissertation, University of Minnesota. [619]
Economics Dept., Arizona State University, Tempe, AZ 85287, U.S.A.; yan.[email protected]
and
Dept. of Economics, University of Michigan, 611 Tappan St., Ann Arbor, MI 48109, U.S.A.; [email protected].

Manuscript received August, 2006; final revision received September, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 633–653
AXIOMS FOR DEFERRED ACCEPTANCE

BY FUHITO KOJIMA AND MIHAI MANEA1

The deferred acceptance algorithm is often used to allocate indivisible objects when monetary transfers are not allowed. We provide two characterizations of agent-proposing deferred acceptance allocation rules. Two new axioms—individually rational monotonicity and weak Maskin monotonicity—are essential to our analysis. An allocation rule is the agent-proposing deferred acceptance rule for some acceptant substitutable priority if and only if it satisfies non-wastefulness and individually rational monotonicity. An alternative characterization is in terms of non-wastefulness, population monotonicity, and weak Maskin monotonicity. We also offer an axiomatization of the deferred acceptance rule generated by an exogenously specified priority structure. We apply our results to characterize efficient deferred acceptance rules.

KEYWORDS: Deferred acceptance algorithm, stable allocations, axioms, individually rational monotonicity, weak Maskin monotonicity, population monotonicity, non-wastefulness.
1. INTRODUCTION

IN AN ASSIGNMENT PROBLEM, a set of indivisible objects that are collectively owned need to be allocated to a number of agents, with each agent being entitled to receive at most one object. Student placement in public schools and university housing allocation are examples of important assignment problems in practice. The agents are assumed to have strict preferences over the objects (and being unassigned). An allocation rule specifies an assignment of the objects to the agents for each preference profile. No monetary transfers are allowed. In many assignment problems, each object is endowed with a priority over agents. For example, schools in Boston give higher priority to students who live nearby or have siblings already attending. An allocation rule is stable with respect to a given priority profile if there is no agent–object pair (i, a) such that (i) i prefers a to his assigned object and (ii) either i has higher priority for a than some agent who receives a or a is not assigned to other agents up to its quota. In the school choice settings of Balinski and Sönmez (1999) and Abdulkadıroğlu and Sönmez (2003), priorities represent a social objective. For example, it may be desirable that in Boston students attend high schools within walking distance from their homes or that in Turkey students with excellent achievements in mathematics and science go to the best engineering universities. Stability is regarded as a normative fairness criterion in the following sense. An allocation is stable if no student has justified envy, that is, any school
1 We thank Haluk Ergin, John Friedman, Drew Fudenberg, David Laibson, Bart Lipman, Stephen Morris, Andrew Postlewaite, Tayfun Sönmez, Utku Ünver, and three anonymous referees for helpful comments. We are especially grateful to Al Roth and one of the referees for suggesting the analyses of Sections 6 and, respectively, 5 and 7.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7443
that a student prefers to his assigned school is attended (up to capacity) by students who are granted higher priority for it. The deferred acceptance algorithm of Gale and Shapley (1962) determines a stable allocation which has many appealing properties. The agent-proposing deferred acceptance allocation Pareto dominates any other stable allocation. Moreover, the agent-proposing deferred acceptance rule makes truthful reporting of preferences a dominant strategy for every agent. Consequently, the deferred acceptance rule is used in many practical assignment problems such as student placement in New York City and Boston (Abdulkadıroğlu, Pathak, and Roth (2005), Abdulkadıroğlu, Pathak, Roth, and Sönmez (2005)) and university house allocation at MIT and the Technion (Guillen and Kesten (2008), Perach, Polak, and Rothblum (2007)), to name some concrete examples. There are proposals to apply the rule to other problems such as course allocation in business schools (Sönmez and Ünver (2009)) and assignment of military personnel to positions (Korkmaz, Gökçen, and Çetinyokuş (2008)). Despite the importance of deferred acceptance rules in both theory and practice, no axiomatization has yet been obtained in an object allocation setting with unspecified priorities. Our first results (Theorems 1 and 2) offer two characterizations of deferred acceptance rules with acceptant substitutable priorities. For the first characterization, we introduce a new axiom, individually rational (IR) monotonicity. We say that a preference profile R′ is an IR monotonic transformation of a preference profile R at an allocation μ if, for every agent, any object that is acceptable and preferred to μ under R′ is preferred to μ under R. An allocation rule ϕ satisfies IR monotonicity if every agent weakly prefers the allocation ϕ(R′) to the allocation ϕ(R) under R′ whenever R′ is an IR monotonic transformation of R at ϕ(R). If R′ is an IR monotonic transformation of R at ϕ(R), then the interpretation of the change in reported preferences from R to R′ is that all agents place fewer claims on objects they cannot receive at R, in the sense that each agent's set of acceptable objects that are preferred to ϕ(R) shrinks. Intuitively, the IR monotonicity axiom requires that all agents be weakly better off when some agents claim fewer objects. The IR label captures the idea that each agent effectively places claims only on acceptable objects; an agent may not be allocated unacceptable objects because he can opt to remain unassigned, so the relevant definition of an upper contour set includes the IR constraint. IR monotonicity requires that allocations be monotonic with respect to the IR-constrained upper contour sets. IR monotonicity resembles Maskin monotonicity (Maskin (1999)), but the two axioms are independent. We also define a weak requirement of efficiency, the non-wastefulness axiom. An allocation rule is non-wasteful if, at every preference profile, any object that an agent prefers to his assignment is allocated up to its quota to other agents. Our first characterization states that an allocation rule is the deferred acceptance rule for some acceptant substitutable priority if and only if it satisfies non-wastefulness and IR monotonicity (Theorem 1).
To further understand deferred acceptance rules, we provide a second characterization based on axioms that are mathematically more elementary and tractable than IR monotonicity. An allocation rule is population monotonic if, for every preference profile, when some agents deviate to declaring every object unacceptable (which we interpret as leaving the market unassigned), all other agents are weakly better off (Thomson (1983a, 1983b)). Following Maskin (1999), R′ is a monotonic transformation of R at μ if, for every agent, any object that is preferred to μ under R′ is also preferred to μ under R. An allocation rule ϕ satisfies weak Maskin monotonicity if every agent weakly prefers ϕ(R′) to ϕ(R) under R′ whenever R′ is a monotonic transformation of R at ϕ(R). Our second result shows that an allocation rule is the deferred acceptance rule for some acceptant substitutable priority if and only if it satisfies non-wastefulness, weak Maskin monotonicity, and population monotonicity (Theorem 2). We also study allocation rules that are stable with respect to an exogenously specified priority profile C (Section 6). We show that the deferred acceptance rule at C is the only stable rule at C that satisfies weak Maskin monotonicity (Theorem 3). In addition to stability, efficiency is often a goal of the social planner. We apply our axiomatizations to the analysis of efficient deferred acceptance rules. The Maskin monotonicity axiom plays a key role. Recall that an allocation rule ϕ satisfies Maskin monotonicity if ϕ(R′) = ϕ(R) whenever R′ is a monotonic transformation of R at ϕ(R) (Maskin (1999)). We prove that an allocation rule is an efficient deferred acceptance rule if and only if it satisfies Maskin monotonicity, along with non-wastefulness and population monotonicity; an equivalent set of conditions consists of Pareto efficiency, weak Maskin monotonicity, and population monotonicity (Theorem 4). Priorities are not primitive in our model except for Section 6, and our axioms are "priority-free" in the sense that they do not involve priorities. The IR monotonicity axiom conveys the efficiency cost imposed by stability with respect to some priority structure.2 Whenever some agents withdraw claims for objects that they prefer to their respective assignments, all agents (weakly) benefit. In the context of the deferred acceptance algorithm, the inefficiency is brought about by agents who apply for objects that tentatively accept, but subsequently reject, them. While it is intuitive that deferred acceptance rules satisfy IR monotonicity, it is remarkable that this priority-free axiom fully
2 We do not regard IR monotonicity as a normative (either desirable or undesirable) requirement, but as a positive comprehensive description of the deferred acceptance algorithm. The reason is that priorities often reflect social objectives, and priority-free statements such as IR monotonicity may lack normative implications for priority-based assignment problems. The present welfare analysis disregards the social objectives embedded in the priorities. Nonetheless, it should be reiterated that for a given priority structure, the corresponding deferred acceptance rule attains constrained efficiency subject to stability.
describes the theoretical contents of the deferred acceptance algorithm (along with the requirement of non-wastefulness).
The weak Maskin monotonicity axiom is mathematically similar to and is weaker than (i.e., implied by) Maskin monotonicity. We establish that weak Maskin monotonicity is sufficient, along with non-wastefulness and population monotonicity, to characterize deferred acceptance rules. At the same time, if we replace weak Maskin monotonicity by Maskin monotonicity in the list of axioms above, we obtain a characterization of efficient deferred acceptance rules. The contrast between these two findings demonstrates that the inefficiency of some deferred acceptance rules can be attributed entirely to instances where weak Maskin monotonicity is satisfied while Maskin monotonicity is violated.
Our analysis focuses on substitutable priorities because priorities may be non-responsive but substitutable in applications. Such priorities arise, for example, in school districts concerned with balance in race distribution (Abdulkadıroğlu and Sönmez (2003)) or in academic achievement (Abdulkadıroğlu, Pathak, and Roth (2005)) within each school. A case in point is the New York City school system, where each Educational Option school must allocate 16% of its seats to top performers in a standardized exam, 68% to middle performers, and 16% to bottom performers. In the context of house allocation, some universities impose bounds on the number of rooms or apartments assigned to graduate students in each program (arts and sciences, business, public policy, law, etc.). Furthermore, substitutability of priorities is an "almost necessary" condition for the non-emptiness of the core.3 When priorities are substitutable, the core coincides with the set of stable allocations. Since the relevant restrictions on priorities vary across applications, allowing for substitutable priorities is a natural approach.
Special instances of deferred acceptance rules have been characterized in the literature. Svensson (1999) axiomatized the serial dictatorship allocation rules. Ehlers, Klaus, and Pápai (2002), Ehlers and Klaus (2004), Ehlers and Klaus (2006), and Kesten (2009) offered various characterizations for the mixed dictator and pairwise-exchange rules. Mixed dictator and pairwise-exchange rules correspond to deferred acceptance rules with acyclic priority structures. For responsive priorities, Ergin (2002) showed that the only deferred acceptance rules that are efficient correspond to acyclic priority structures.
Other allocation mechanisms have been previously characterized. Pápai (2000) characterized the hierarchical exchange rules, which generalize the priority-based top trading cycle rules of Abdulkadıroğlu and Sönmez (2003).
3 Formally, suppose there are at least two proper objects a and b. Fix a non-substitutable priority for a. Then there exist a preference profile for the agents and a responsive priority for b such that, regardless of the priorities for the other objects, the core is empty. The first version of this result, for a slightly different context, appears in Sönmez and Ünver (2009). The present statement is from Hatfield and Kojima (2008).
In the context of housing markets, Ma (1994) characterized the top trading cycle rule of David Gale described by Shapley and Scarf (1974). Kesten (2006) showed that the deferred acceptance rule and the top trading cycle rule for some fixed priority profile are equivalent if and only if the priority profile is acyclic.4
When the priority structure is a primitive of the model as in Section 6, alternative characterizations of the corresponding deferred acceptance rule are known. The classic result of Gale and Shapley (1962) implies that the deferred acceptance rule is characterized by constrained efficiency subject to stability. Alcalde and Barberà (1994) characterized the deferred acceptance rule by stability and strategy-proofness. Balinski and Sönmez (1999) considered allocation rules over the domain of pairs of responsive priorities and preferences. An allocation rule respects improvements if an agent is weakly better off when his priority improves for each object. Balinski and Sönmez (1999) showed that the deferred acceptance rule is the only stable rule that respects improvements.
2. FRAMEWORK
Fix a set of agents N and a set of (proper) object types O. There is one null object type, denoted ∅. Each object a ∈ O ∪ {∅} has quota qa; ∅ is not scarce, q∅ = |N|. Each agent i is allocated exactly one object in O ∪ {∅}. An allocation is a vector μ = (μi)i∈N that assigns object μi ∈ O ∪ {∅} to agent i, with each object a being assigned to at most qa agents. We write μa = {i ∈ N | μi = a} for the set of agents who receive object a under μ.
Each agent i has a strict (complete, transitive, and antisymmetric) preference relation Ri over O ∪ {∅}.5 We denote by Pi the asymmetric part of Ri, that is, aPi b if and only if aRi b and a ≠ b. An object a is acceptable (unacceptable) to agent i if aPi ∅ (∅Pi a). Let R = (Ri)i∈N denote the preference profile of all agents. For any N′ ⊂ N, we use the notation RN′ = (Ri)i∈N′.6 We write μ′Rμ if and only if μ′i Ri μi for all i ∈ N. We denote by A and R the sets of allocations and preference profiles, respectively.
An allocation rule ϕ : R → A maps preference profiles to allocations. At R, agent i is assigned object ϕi(R), and object a is assigned to the set of agents ϕa(R).
4 Kesten's acyclicity condition is stronger than Ergin's.
5 The null object may represent private schools in the context of student placement in public schools or off-campus housing in the context of university house allocation. Taking into consideration preferences that rank the null object above some proper objects is natural in such applications.
6 Our analysis carries through if we do not stipulate that preferences rank pairs of unacceptable objects, but alternatively regard as identical all preferences that agree on the ranking of acceptable objects.
3. DEFERRED ACCEPTANCE
A priority for a proper object a ∈ O is a correspondence Ca : 2N → 2N satisfying Ca(N′) ⊂ N′ and |Ca(N′)| ≤ qa for all N′ ⊂ N; Ca(N′) is interpreted as the set of high priority agents in N′ "chosen" by object a. The priority Ca is substitutable if agent i is chosen by object a from a set of agents N′ whenever i is chosen by a from a set N″ that includes N′; formally, for all N′ ⊂ N″ ⊂ N, we have Ca(N″) ∩ N′ ⊂ Ca(N′). The priority Ca is acceptant if object a accepts each agent when its quota is not entirely allocated; formally, for all N′ ⊂ N, |Ca(N′)| = min(qa, |N′|).7 Let C = (Ca)a∈O denote the priority profile; C is substitutable (acceptant) if Ca is substitutable (acceptant) for all a ∈ O.
The allocation μ is individually rational at R if μi Ri ∅ for all i ∈ N. The allocation μ is blocked by a pair (i, a) ∈ N × O at (R, C) if aPi μi and i ∈ Ca(μa ∪ {i}). An allocation μ is stable at (R, C) if it is individually rational at R and is not blocked by any pair (i, a) ∈ N × O at (R, C).
When C is substitutable, the following iterative procedure, called the (agent-proposing) deferred acceptance algorithm, produces a stable allocation at (R, C) (Gale and Shapley (1962); extended to the case of substitutable priorities by Roth and Sotomayor (1990)).
Step 1. Every agent applies to his most preferred acceptable object under R (if any). Let Ña1 be the set of agents applying to object a. Object a tentatively accepts the agents in Na1 = Ca(Ña1) and rejects the applicants in Ña1 \ Na1.
Step t (t ≥ 2). Every agent who was rejected at step t − 1 applies to his next preferred acceptable object under R (if any). Let Ñat be the new set of agents applying to object a. Object a tentatively accepts the agents in Nat = Ca(Nat−1 ∪ Ñat) and rejects the applicants in (Nat−1 ∪ Ñat) \ Nat.
The deferred acceptance algorithm terminates when each agent who is not tentatively accepted by some object has been rejected by every object acceptable to him. Each agent tentatively accepted by a proper object at the last step is assigned that object and all other agents are assigned the null object. The deferred acceptance rule ϕC is defined by setting ϕC(R) equal to the allocation obtained when the algorithm is applied for (R, C). The allocation ϕC(R) is the agent-optimal stable allocation at (R, C): it is stable at (R, C) and is weakly preferred under R by every agent to any other stable allocation at (R, C) (Theorem 6.8 in Roth and Sotomayor (1990)).
REMARK 1: It can be easily shown that no two distinct priority profiles induce the same deferred acceptance rule. Therefore, the subsequent characterization results lead to unique representations.
7 The acceptant responsive priority Ca for a linear order ≻a on N is defined as follows. For all N′ ⊂ N, Ca(N′) is the set of min(qa, |N′|) top-ranked agents in N′ under ≻a. The class of acceptant responsive priorities is a subset of the class of acceptant substitutable priorities. Studying substitutable priorities is important because priorities may often be non-responsive but substitutable in practice, as discussed in the Introduction.
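The algorithm just described is directly computable. The following Python sketch is ours and purely illustrative: it assumes preferences are supplied as ordered lists of acceptable proper objects and each priority Ca as a choice function over sets of agents; the names deferred_acceptance and responsive are not from the paper.

    def responsive(order, quota):
        """Acceptant responsive priority induced by a linear order (footnote 7):
        choose the min(quota, |N'|) top-ranked agents in N'."""
        rank = {agent: r for r, agent in enumerate(order)}
        return lambda agents: set(sorted(agents, key=rank.__getitem__)[:quota])

    def deferred_acceptance(prefs, choice):
        """Agent-proposing deferred acceptance with substitutable priorities.
        prefs[i]: list of acceptable proper objects, most preferred first.
        choice[a]: substitutable, acceptant choice function for object a.
        Returns a dict mapping each agent to an object, or to None (the null object)."""
        tentative = {a: set() for a in choice}   # agents tentatively held by each object
        nxt = {i: 0 for i in prefs}              # index of the next object i applies to
        while True:
            held = set().union(*tentative.values()) if tentative else set()
            rejected = [i for i in prefs if i not in held and nxt[i] < len(prefs[i])]
            if not rejected:
                break
            applicants = {a: set(s) for a, s in tentative.items()}
            for i in rejected:                   # each rejected agent applies onward
                applicants[prefs[i][nxt[i]]].add(i)
                nxt[i] += 1
            for a in choice:                     # object keeps its chosen set N_a^t
                tentative[a] = choice[a](applicants[a])
        assignment = {i: None for i in prefs}
        for a, holders in tentative.items():
            for i in holders:
                assignment[i] = a
        return assignment

    # Reproduction of Example 1 below (Section 4): N = {i, j, k}, q_a = q_b = 1.
    choice = {"a": responsive(["i", "j", "k"], 1), "b": responsive(["k", "i", "j"], 1)}
    prefs = {"i": ["b", "a"], "j": ["a"], "k": ["a", "b"]}
    print(deferred_acceptance(prefs, choice))    # {'i': 'a', 'j': None, 'k': 'b'}

Recomputing choice[a] over the currently held agents alone returns the same set under substitutability, which is what makes the unconditional re-choosing inside the loop safe.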
4. FIRST CHARACTERIZATION OF DEFERRED ACCEPTANCE RULES
We introduce two axioms—non-wastefulness and individually rational (IR) monotonicity—that characterize the set of deferred acceptance rules. Priorities are not primitive in our model except for Section 6, and our axioms are priority-free in the sense that they do not involve priorities.
DEFINITION 1—Non-Wastefulness: An allocation rule ϕ is non-wasteful if
aPi ϕi(R) ⇒ |ϕa(R)| = qa, ∀R ∈ R, i ∈ N, a ∈ O ∪ {∅}.
Non-wastefulness is a weak efficiency condition. It requires that an object is not assigned to an agent who prefers it to his allocation only if the entire quota of that object is assigned to other agents. Note that if ϕ is non-wasteful, then ϕ(R) is individually rational at R for every R ∈ R, as the null object is not scarce.
To introduce the main axiom, we say that R′i is an individually rational monotonic transformation of Ri at a ∈ O ∪ {∅} (R′i i.r.m.t. Ri at a) if any object that is ranked above both a and ∅ under R′i is ranked above a under Ri, that is,
bP′i a and bP′i ∅ ⇒ bPi a, ∀b ∈ O.
R′ is an IR monotonic transformation of R at an allocation μ (R′ i.r.m.t. R at μ) if R′i i.r.m.t. Ri at μi for all i.
DEFINITION 2—IR Monotonicity: An allocation rule ϕ satisfies individually rational monotonicity if
R′ i.r.m.t. R at ϕ(R) ⇒ ϕ(R′) R′ ϕ(R).
In words, ϕ satisfies IR monotonicity if every agent weakly prefers ϕ(R′) to ϕ(R) under R′ whenever R′ is an IR monotonic transformation of R at ϕ(R). If R′ i.r.m.t. R at ϕ(R), then the interpretation of the change in reported preferences from R to R′ is that all agents place fewer claims on objects they cannot receive at R, in the sense that each agent's set of acceptable objects that are preferred to ϕ(R) shrinks. Intuitively, the IR monotonicity axiom requires that all agents be weakly better off when some agents claim fewer objects. The IR label captures the idea that each agent effectively places claims only on acceptable objects. An agent may not be allocated unacceptable objects because he can opt to remain unassigned (∅ represents the outside option), so the relevant definition of an upper contour set includes the IR constraint. Hence IR monotonicity requires that allocations be monotonic with respect to the IR constrained upper contour sets (ordered according to set inclusion).
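Because preferences are finite lists, the i.r.m.t. relation can be tested mechanically. A minimal sketch, assuming each preference is encoded as a ranked list over O ∪ {∅} with None standing for the null object (an encoding we introduce for illustration, not the paper's notation):

    def ranked_above(pref, b, a):
        """True if b is ranked strictly above a in the list pref (None is the null object)."""
        return pref.index(b) < pref.index(a)

    def irmt(pref_new, pref_old, a, objects):
        """R'_i i.r.m.t. R_i at a: every proper object ranked above both a and the
        null object under R'_i must also be ranked above a under R_i."""
        return all(ranked_above(pref_old, b, a)
                   for b in objects
                   if ranked_above(pref_new, b, a) and ranked_above(pref_new, b, None))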
THEOREM 1: An allocation rule ϕ is the deferred acceptance rule for some acceptant substitutable priority C, that is, ϕ = ϕC, if and only if ϕ satisfies non-wastefulness and IR monotonicity.
The proof appears in the Appendix. Example 1 below, borrowed from Ergin (2002), illustrates an instance where a deferred acceptance rule satisfies IR monotonicity and provides some intuition for the "only if" part of the theorem.
IR monotonicity resembles Maskin (1999) monotonicity. R′i is a monotonic transformation of Ri at a ∈ O ∪ {∅} (R′i m.t. Ri at a) if any object that is ranked above a under R′i is also ranked above a under Ri, that is, bP′i a ⇒ bPi a, ∀b ∈ O ∪ {∅}. R′ is a monotonic transformation of R at an allocation μ (R′ m.t. R at μ) if R′i m.t. Ri at μi for all i.
DEFINITION 3—Maskin Monotonicity: An allocation rule ϕ satisfies Maskin monotonicity if
R′ m.t. R at ϕ(R) ⇒ ϕ(R′) = ϕ(R).
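The monotonic transformation test is the same check without the acceptability clause, reusing ranked_above from the previous sketch (again a hedged illustration in our encoding):

    def mt(pref_new, pref_old, a, objects):
        """R'_i m.t. R_i at a: every object (including the null object, None) ranked
        above a under R'_i must also be ranked above a under R_i."""
        return all(ranked_above(pref_old, b, a)
                   for b in objects + [None]
                   if ranked_above(pref_new, b, a))

Note that mt implies irmt at every a, since the i.r.m.t. hypothesis quantifies over a smaller set of objects; this containment is the one invoked in the comparison of the two axioms below.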
On the one hand, IR monotonicity has implications for a larger set of preference profile pairs (R, R′) than Maskin monotonicity, as R′ m.t. R at ϕ(R) ⇒ R′ i.r.m.t. R at ϕ(R). On the other hand, for every preference profile pair (R, R′) for which both axioms have implications (i.e., R′ m.t. R at ϕ(R)), Maskin monotonicity imposes a stronger restriction than IR monotonicity (as ϕ(R′) = ϕ(R) ⇒ ϕ(R′) R′ ϕ(R)). Example 1 establishes the independence of the IR monotonicity and Maskin monotonicity axioms. The example also shows that deferred acceptance rules do not always satisfy Maskin monotonicity (cf. Kara and Sönmez (1996)) and that some top trading cycle rules violate IR monotonicity, but satisfy Maskin monotonicity.
EXAMPLE 1: Let N = {i, j, k}, O = {a, b}, and qa = qb = 1. Consider the strict orderings ≻a and ≻b specified as
≻a : i, j, k
≻b : k, i, j
Let C denote the responsive priorities that correspond to these orderings, defined as in footnote 7. Consider the set of preferences for the agents:
Ri : b, a, ∅
R′i : ∅, b, a
Rj : a, ∅, b
R′j : ∅, a, b
Rk : a, b, ∅
Let R = (Ri, Rj, Rk), R′ = (R′i, Rj, Rk), and R″ = (Ri, R′j, Rk).
In the first step of the deferred acceptance algorithm for (R, C), i applies to b, and j and k apply to a; then k is rejected by a. In the second step, k applies to b and i is rejected by b. At the third step, i applies to a and j is rejected by a. The algorithm terminates after the third step and the final allocation is given by ϕC(R) = (ϕCi(R), ϕCj(R), ϕCk(R)) = (a, ∅, b).
In the first step of the deferred acceptance algorithm for (R″, C), i applies to b and k applies to a. The algorithm ends at the first step and ϕC(R″) = (b, ∅, a). All agents prefer ϕC(R″) to ϕC(R) under R″ (the preference is weak for j and strict for i and k) due to the fact that R″ i.r.m.t. R at ϕC(R). Indeed, there is a chain of rejections in the deferred acceptance algorithm for (R, C): k is rejected by a because j claims higher priority to a; next, i is rejected by b because k claims higher priority to b; then j is rejected by a because i claims higher priority for a. Hence j receives the null object in spite of his initial priority claim to a, which starts off the rejection chain. If j does not claim higher priority to a and reports R′j instead of Rj, then the rejection chain does not occur, weakly benefiting everyone (with respect to R″). Also, note that ϕC violates Maskin monotonicity since R″ m.t. R at ϕC(R) and ϕCi(R″) ≠ ϕCi(R).
IR monotonicity is not satisfied by the top trading cycle rule (Abdulkadıroğlu and Sönmez (2003)) associated with the priorities (≻a, ≻b).8 At R, i and k trade their priorities for a and b; the top trading cycle allocation is μ = (b, ∅, a). At R′, i is assigned the null object and then j receives a, for which he has higher priority than k. The top trading cycle allocation is μ′ = (∅, a, b). IR monotonicity is violated as R′ i.r.m.t. R at μ and agent k strictly prefers μ to μ′ under Rk. At R′, k does not receive a because he has lower priority than j for a and cannot trade his priority for b with the priority of i for a since i does not place claims for b. The top trading cycle rule considered here satisfies Maskin monotonicity by Pápai (2000) and Takamiya (2001).
The following examples show that non-wastefulness and IR monotonicity are independent axioms if |N|, |O| ≥ 2 and there is at least one scarce object, that is, qa < |N| for some a ∈ O.
EXAMPLE 2: Consider the rule that allocates the null object to every agent for all preference profiles. This rule trivially satisfies IR monotonicity, but violates non-wastefulness.
EXAMPLE 3: Let N = {1, 2, …, n}. Suppose that a is one of the scarce objects (qa < n) and b is a proper object different from a (such a and b exist by assumption). Let R̄ denote a (fixed) preference profile at which every agent ranks a first and ∅ second. Define the following allocation rule:
8 We follow a definition of top trading cycles from Kesten (2006), which assumes the existence of a null object and allows each agent to consider some objects unacceptable.
(i) At any preference profile where agent qa reports R̄qa, the assignment is according to the serial dictatorship with the ordering of agents 1, 2, …, n, that is, agent 1 picks his most preferred object, agent 2 picks his most preferred available object, and so on (an object is available for an agent if the number of preceding agents who choose that object is smaller than its quota).
(ii) At any other preference profile, the assignment is specified by the serial dictatorship with the agent ordering 1, 2, …, qa − 1, qa + 1, qa, qa + 2, …, n, defined analogously to (i).
The allocation rule described above clearly satisfies non-wastefulness, but violates IR monotonicity. Indeed, let R′qa be a preference relation for agent qa that ranks a first and b second. The profile (R′qa, R̄N\{qa}) i.r.m.t. R̄ at the allocation for R̄, but agent qa is assigned a at R̄ and b at (R′qa, R̄N\{qa}), and aP′qa b.
5. SECOND CHARACTERIZATION OF DEFERRED ACCEPTANCE RULES
We offer an alternative characterization of deferred acceptance rules in terms of more elementary axioms. These axioms are mathematically more tractable and contribute to further understanding of deferred acceptance rules. For instance, in Section 7, we obtain a characterization of Pareto efficient deferred acceptance rules via a simple alteration in the new collection of axioms.
We first define the weak Maskin monotonicity axiom. Recall that R′i is a monotonic transformation of Ri at a ∈ O ∪ {∅} (R′i m.t. Ri at a) if any object that is ranked above a under R′i is also ranked above a under Ri, that is, bP′i a ⇒ bPi a, ∀b ∈ O ∪ {∅}. R′ is a monotonic transformation of R at an allocation μ (R′ m.t. R at μ) if R′i m.t. Ri at μi for all i.
DEFINITION 4—Weak Maskin Monotonicity: An allocation rule ϕ satisfies weak Maskin monotonicity if
R′ m.t. R at ϕ(R) ⇒ ϕ(R′) R′ ϕ(R).
To gain some perspective, note that the implication of R′ m.t. R at ϕ(R) is that ϕ(R′) = ϕ(R) under Maskin monotonicity, but only that ϕ(R′) R′ ϕ(R) under weak Maskin monotonicity. Therefore, any allocation rule that satisfies the standard Maskin monotonicity axiom also satisfies weak Maskin monotonicity.
We next define the population monotonicity axiom (Thomson (1983a, 1983b)). As a departure from the original setting, suppose that the collection of all objects (qa copies of each object type a ∈ O ∪ {∅}) needs to be allocated to a subset of agents N′ or, equivalently, that the agents outside N′ receive ∅ and are removed from the assignment problem. It is convenient to view the new setting as a restriction on the set of preference profiles, whereby the agents in N \ N′ are constrained to report every object as unacceptable. Specifically, let R∅ denote a fixed preference profile that ranks ∅ first for every agent. For any R ∈ R, we interpret the profile (RN′, R∅N\N′) as a deviation from R generated by restricting the assignment problem to the agents in N′.
DEFINITION 5—Population Monotonicity: An allocation rule ϕ is population monotonic if
ϕi(RN′, R∅N\N′) Ri ϕi(R), ∀i ∈ N′, ∀N′ ⊂ N, ∀R ∈ R.
The definitions of weak Maskin monotonicity and population monotonicity are inspired by the connection between IR monotonicity and the deferred acceptance algorithm. IR monotonicity clearly implies both weak Maskin monotonicity and population monotonicity. Building on the intuition for Theorem 1, we prove that the latter two axioms, along with non-wastefulness, are sufficient to characterize deferred acceptance rules (the proof appears in the Appendix).
THEOREM 2: An allocation rule ϕ is the deferred acceptance rule for some acceptant substitutable priority C, that is, ϕ = ϕC, if and only if ϕ satisfies non-wastefulness, weak Maskin monotonicity, and population monotonicity.
We show that the three axioms from Theorem 2 are independent if |N|, |O| ≥ 2 and qa < |N| − 1 for at least one object a ∈ O.9 The rule described in Example 2 satisfies weak Maskin monotonicity and population monotonicity, and violates non-wastefulness. The rule from Example 3 satisfies non-wastefulness and population monotonicity, but not weak Maskin monotonicity. Last, the following example defines a non-wasteful and weakly Maskin monotonic rule, which is not population monotonic.
EXAMPLE 4: Let N = {1, 2, …, n}. Consider the allocation rule defined as follows:
(i) At any preference profile where agent 1 declares every object unacceptable, the assignment is according to the serial dictatorship allocation for the ordering of agents 1, 2, …, n − 2, n − 1, n.
(ii) Otherwise, the assignment is specified by the serial dictatorship for the ordering 1, 2, …, n − 2, n, n − 1.
The allocation rule so defined satisfies non-wastefulness and weak Maskin monotonicity, but not population monotonicity. To show that the rule violates population monotonicity, suppose that a is an object with qa < n − 1 and b ≠ a is a proper object (such a and b exist by assumption). Let R be a preference profile where the first ranked objects are b for agent 1; a for agents 2, 3, …, qa, n − 1, n; and ∅ for the other agents. Note that agent n receives a at R and some c with aPn c at (R∅1, RN\{1}).
9 If qa ≥ |N| − 1 for all a ∈ O, then non-wastefulness implies population monotonicity. In that case, in any market that excludes at least one agent, every non-wasteful allocation assigns each of the remaining agents his favorite object.
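Examples 3 and 4 are built from serial dictatorships, which are straightforward to compute. A minimal sketch under the list encoding used earlier (every preference list contains None, the never-scarce null object; a hypothetical instance of Example 4 with n = 4 and qa = 2 follows):

    def serial_dictatorship(ordering, prefs, quotas):
        """Agents pick in order; an object stays available while fewer than its
        quota of preceding agents have chosen it; None is always available."""
        remaining = dict(quotas)
        assignment = {}
        for i in ordering:
            for x in prefs[i]:               # ranked list over O ∪ {None}
                if x is None:
                    assignment[i] = None
                    break
                if remaining[x] > 0:
                    remaining[x] -= 1
                    assignment[i] = x
                    break
        return assignment

    # Example 4 with n = 4, q_a = 2 (< n - 1), case (ii) ordering 1, 2, 4, 3:
    prefs = {1: ["b", None], 2: ["a", None], 3: ["a", None], 4: ["a", None]}
    print(serial_dictatorship([1, 2, 4, 3], prefs, {"a": 2, "b": 1}))
    # {1: 'b', 2: 'a', 4: 'a', 3: None}; under case (i) ordering 1, 2, 3, 4
    # with agent 1 reporting [None], agent 4 would lose a, as in the text.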
IR monotonicity implies both weak Maskin monotonicity and population monotonicity, and under the assumption of non-wastefulness, by Theorems 1 and 2, is equivalent to the conjunction of the latter two axioms. However, the following example shows that weak Maskin monotonicity and population monotonicity do not imply IR monotonicity if |N|, |O| ≥ 2.
EXAMPLE 5: Let N = {1, 2, …, n}. Fix two proper objects a and b (such a and b exist by assumption). Consider the allocation rule that, at every preference profile R, specifies the following assignments:
(i) Agent 1 is assigned the higher ranked object between a and ∅ under R1.
(ii) Agent 2 is assigned the higher ranked object between b and ∅ under R2, except for the case bP1 ∅P1 a, when he is assigned ∅.
(iii) Agents in N \ {1, 2} are assigned ∅.
One can check that this allocation rule satisfies weak Maskin monotonicity and population monotonicity. To show that the rule violates IR monotonicity, let R be a preference profile where agent 1 ranks b first and a second, and agent 2 ranks b first, and let R′1 be a preference for agent 1 that ranks b first and ∅ second. Then IR monotonicity is violated since (R′1, RN\{1}) i.r.m.t. R at the allocation under R, but agent 2 is assigned b at R and ∅ at (R′1, RN\{1}), and bP2 ∅.
6. AXIOMS FOR STABLE RULES
In this section, we study stable allocation rules with respect to an exogenously specified priority structure C. We say that an allocation rule ϕ is stable at C if ϕ(R) is stable at (R, C) for all R. We show that the deferred acceptance rule at C is the only allocation rule that is stable at C and satisfies weak Maskin monotonicity.
THEOREM 3: Let C be an acceptant substitutable priority. Suppose that ϕ is a stable allocation rule at C. Then ϕ is the deferred acceptance rule for C, that is, ϕ = ϕC, if and only if it satisfies weak Maskin monotonicity.
PROOF: The "only if" part is a consequence of Theorem 2. The "if" part follows from Lemma 2 in the Appendix. Q.E.D.
7. EFFICIENT DEFERRED ACCEPTANCE RULES
An allocation μ Pareto dominates another allocation μ′ at the preference profile R if μi Ri μ′i for all i ∈ N and μi Pi μ′i for some i ∈ N. An allocation is Pareto efficient at R if no allocation Pareto dominates it at R. An allocation rule ϕ is Pareto efficient if ϕ(R) is Pareto efficient at R for all R ∈ R. An allocation rule ϕ is group strategy-proof if there exist no N′ ⊂ N and R, R′ ∈ R such that ϕi(R′N′, RN\N′) Ri ϕi(R) for all i ∈ N′ and ϕi(R′N′, RN\N′) Pi ϕi(R) for some i ∈ N′.
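Pareto dominance between two allocations is likewise a finite check. A small sketch in the same encoding (names are ours); since preferences are strict, a difference in assignments together with weak preference already yields strict preference:

    def weakly_prefers(pref, x, y):
        """True if x is ranked at least as high as y in the list pref."""
        return pref.index(x) <= pref.index(y)

    def pareto_dominates(mu, nu, prefs):
        """mu Pareto dominates nu at R: every agent weakly prefers mu_i to nu_i,
        and at least one agent is assigned a strictly better object."""
        weak = all(weakly_prefers(prefs[i], mu[i], nu[i]) for i in prefs)
        strict = any(mu[i] != nu[i] for i in prefs)
        return weak and strict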
In general, there are deferred acceptance rules that are neither Pareto efficient nor group strategy-proof. Since deferred acceptance rules are often used in resource allocation problems where efficiency is one of the goals of the social planner (besides stability), it is desirable to develop necessary and sufficient conditions for the efficiency of these rules. PROPOSITION 1: Let C be an acceptant substitutable priority. The following properties are equivalent. (i) ϕC is Pareto efficient. (ii) ϕC satisfies Maskin monotonicity. (iii) ϕC is group strategy-proof. The proof is given in the Appendix. Proposition 1 generalizes part of Theorem 1 from Ergin (2002). Under the assumption that priorities are responsive, Ergin established that a deferred acceptance rule is Pareto efficient if and only if it is group strategy-proof, and that these properties hold if and only if the priority is acyclic. Takamiya (2001) showed that Maskin monotonicity and group strategy-proofness are equivalent for any allocation rule. THEOREM 4: Let ϕ be an allocation rule. The following conditions are equivalent. (i) ϕ is the deferred acceptance rule for some acceptant substitutable priority C, that is, ϕ = ϕC , and ϕ is Pareto efficient. (ii) ϕ satisfies non-wastefulness, Maskin monotonicity, and population monotonicity. (iii) ϕ satisfies Pareto efficiency, weak Maskin monotonicity, and population monotonicity. The proof appears in the Appendix. In view of Proposition 1, two additional characterizations of efficient deferred acceptance rules are obtained by replacing the Pareto efficiency property in condition (i) of Theorem 4 with Maskin monotonicity and, respectively, group strategy-proofness. Recall from Theorem 2 that weak Maskin monotonicity is sufficient, along with non-wastefulness and population monotonicity, to characterize deferred acceptance rules. Theorem 4 shows that if we replace weak Maskin monotonicity by Maskin monotonicity in the list of axioms above, we obtain a characterization of efficient deferred acceptance rules. The contrast between these two results demonstrates that the inefficiency of some deferred acceptance rules can be attributed entirely to instances where weak Maskin monotonicity is satisfied, but Maskin monotonicity is violated.
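For very small markets, the equivalences in Proposition 1 can also be explored by brute force over all preference profiles. The sketch below reuses deferred_acceptance, responsive, and mt from the earlier snippets, so it is illustrative rather than part of the paper's apparatus:

    from itertools import permutations, product

    objects = ["a", "b"]
    agents = ["1", "2"]
    orders = [list(p) for p in permutations(objects + [None])]  # strict orders over O ∪ {∅}

    def da_profile(profile, choice):
        # acceptable proper objects, in preference order, for each agent
        prefs = {i: profile[i][:profile[i].index(None)] for i in agents}
        return deferred_acceptance(prefs, choice)

    choice = {"a": responsive(["1", "2"], 1), "b": responsive(["2", "1"], 1)}

    def maskin_monotonic(choice):
        for R in product(orders, repeat=2):
            R = dict(zip(agents, R))
            mu = da_profile(R, choice)
            for Rp in product(orders, repeat=2):
                Rp = dict(zip(agents, Rp))
                if all(mt(Rp[i], R[i], mu[i], objects) for i in agents):
                    if da_profile(Rp, choice) != mu:
                        return False
        return True

    print(maskin_monotonic(choice))

With only two agents no Ergin cycle can arise, so this acyclic priority structure is expected to print True; with an Ergin-cyclic three-agent priority one would expect False, in line with Ergin (2002).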
8. CONCLUSION
Our axiomatizations provide a comprehensive description of the theoretical contents of deferred acceptance rules. The intuition behind the axioms sheds light on the mechanics of the deferred acceptance algorithm. The present analysis only restricts the priorities to be acceptant and substitutable. When no additional information about the priority structure is available, our axioms represent the strongest statements satisfied by deferred acceptance rules. The axioms are priority-free and may prove useful in characterizing deferred acceptance rules with restrictions on priorities relevant in applications.
APPENDIX
PROOF OF THEOREM 1: Since IR monotonicity implies weak Maskin monotonicity and population monotonicity, the "if" part of Theorem 1 follows from the "if" part of Theorem 2, which we establish later. We prove the "only if" part here. We need to show that a deferred acceptance rule ϕC with acceptant substitutable priority C satisfies the non-wastefulness and IR monotonicity axioms. ϕC is non-wasteful since C is acceptant and the deferred acceptance rule is stable.
To prove that ϕC satisfies IR monotonicity, suppose that R′ i.r.m.t. R at ϕC(R). We need to show that ϕC(R′) R′ ϕC(R) =: μ^0. Define μ^1 by assigning each agent i the higher ranked object between μ^0_i and ∅ under R′i. For t ≥ 1, if μ^t can be blocked at (R′, C), we choose an arbitrary object a^t that is part of a blocking pair and define μ^{t+1} by
(A.1) μ^{t+1}_i = a^t if i ∈ C_{a^t}(μ^t_{a^t} ∪ {j ∈ N | a^t P′_j μ^t_j}), and μ^{t+1}_i = μ^t_i otherwise.
If μ^t cannot be blocked, we let μ^{t+1} = μ^t. Part of the next lemma establishes that each μ^t is well defined, that is, μ^t is an allocation for all t ≥ 0. The sequence (μ^t)t≥0 is a variant of the vacancy chain dynamics of Blum, Roth, and Rothblum (1997).10
10 As in Section 5, the exclusion of agent i with preferences Ri from the market can be modeled as a change in i's reported preferences, making every object unacceptable, which is an IR monotonic transformation of Ri at every object.
LEMMA 1: The sequence (μ^t)t≥0 satisfies
(A.2) μ^t ∈ A,
(A.3) μ^t R′ μ^{t−1},
(A.4) μ^t_a ⊂ Ca(μ^t_a ∪ {j ∈ N | a P′_j μ^t_j}), ∀a ∈ O,
for every t ≥ 1. The sequence (μ^t)t≥0 becomes constant in a finite number of steps T, and the allocation μ^T is stable at (R′, C).
PROOF: We prove the claims (A.2)–(A.4) by induction on t. We first show the induction base case, t = 1. The definition of μ^1 immediately implies that μ^1 ∈ A and μ^1 R′ μ^0, proving (A.2) and (A.3) (at t = 1). To establish (A.4) (at t = 1), fix a ∈ O. We have that
(A.5) μ^0_a = Ca(μ^0_a ∪ {j ∈ N | a P_j μ^0_j})
because μ^0 is stable at (R, C) and Ca is an acceptant and substitutable priority. By construction,
(A.6) μ^1_a ⊂ μ^0_a.
Since R′ i.r.m.t. R at μ^0, it must be that {j ∈ N | a P′_j μ^1_j} ⊂ {j ∈ N | a P_j μ^0_j}.11 Therefore,
(A.7) μ^1_a ∪ {j ∈ N | a P′_j μ^1_j} ⊂ μ^0_a ∪ {j ∈ N | a P_j μ^0_j}.
Ca's substitutability and (A.5)–(A.7) imply μ^1_a ⊂ Ca(μ^1_a ∪ {j ∈ N | a P′_j μ^1_j}).
To establish the inductive step, we assume that the conclusion holds for t ≥ 1 and prove it for t + 1. The only nontrivial case is μ^t ≠ μ^{t+1}. By the inductive hypothesis (A.4) (at t), μ^t_{a^t} ⊂ C_{a^t}(μ^t_{a^t} ∪ {j ∈ N | a^t P′_j μ^t_j}). Then the definition of (μ^t)t≥0 implies that
(A.8) μ^{t+1}_{a^t} = C_{a^t}(μ^t_{a^t} ∪ {j ∈ N | a^t P′_j μ^t_j}).
To prove (A.2) (at t + 1), first note that (A.8) implies |μ^{t+1}_{a^t}| = |C_{a^t}(μ^t_{a^t} ∪ {j ∈ N | a^t P′_j μ^t_j})| ≤ q_{a^t}. If a ≠ a^t, then by construction μ^{t+1}_a ⊂ μ^t_a, and by (A.2) (at t) we conclude that |μ^{t+1}_a| ≤ |μ^t_a| ≤ qa. Therefore, μ^{t+1} ∈ A.
To show (A.3) (at t + 1), note that a^t = μ^{t+1}_j P′_j μ^t_j for any j ∈ μ^{t+1}_{a^t} \ μ^t_{a^t} and that each agent outside μ^{t+1}_{a^t} \ μ^t_{a^t} is assigned the same object under μ^{t+1} and μ^t. Therefore, μ^{t+1} R′ μ^t.
We show (A.4) (at t + 1) separately for the cases a = a^t and a ≠ a^t. By construction of μ^{t+1},
μ^{t+1}_{a^t} ∪ {j ∈ N | a^t P′_j μ^{t+1}_j} = μ^t_{a^t} ∪ {j ∈ N | a^t P′_j μ^t_j}.12
11 Suppose that aP′_j μ^1_j. Then μ^1_j R′_j ∅ implies aP′_j ∅. By definition, μ^1_j R′_j μ^0_j, so aP′_j μ^0_j. The assumption that R′_j i.r.m.t. R_j at μ^0_j, along with aP′_j ∅ and aP′_j μ^0_j, implies that aP_j μ^0_j.
12 We have μ^t_{a^t} ⊂ μ^{t+1}_{a^t} by construction and {j ∈ N | a^t P′_j μ^{t+1}_j} ⊂ {j ∈ N | a^t P′_j μ^t_j} since μ^{t+1}_j R′_j μ^t_j for every j ∈ N. At the same time, an inspection of (A.1) reveals that μ^{t+1}_{a^t} \ μ^t_{a^t} = {j ∈ N | a^t P′_j μ^t_j} \ {j ∈ N | a^t P′_j μ^{t+1}_j}.
Then (A.8) implies that
μ^{t+1}_{a^t} = C_{a^t}(μ^{t+1}_{a^t} ∪ {j ∈ N | a^t P′_j μ^{t+1}_j}).
For any a ≠ a^t, we have μ^{t+1}_a ⊂ μ^t_a by construction, and {j ∈ N | a P′_j μ^{t+1}_j} ⊂ {j ∈ N | a P′_j μ^t_j} since μ^{t+1} R′ μ^t. Therefore,
(A.9) μ^{t+1}_a ∪ {j ∈ N | a P′_j μ^{t+1}_j} ⊂ μ^t_a ∪ {j ∈ N | a P′_j μ^t_j}.
Recall the inductive hypothesis (A.4) (at t): μ^t_a ⊂ Ca(μ^t_a ∪ {j ∈ N | a P′_j μ^t_j}). Then (A.9), along with the facts that Ca is substitutable and μ^{t+1}_a ⊂ μ^t_a, leads to
μ^{t+1}_a ⊂ Ca(μ^{t+1}_a ∪ {j ∈ N | a P′_j μ^{t+1}_j}),
completing the proof of the induction step.
By (A.3), the sequence (μ^t)t≥0 becomes constant in a finite number of steps T. The final allocation μ^T is individually rational at R′ and is not blocked at (R′, C), so is stable at (R′, C). Q.E.D.
To finish the proof of the "only if" part, let μ^T be the stable matching identified in Lemma 1. We have that ϕC(R′) R′ μ^T because ϕC(R′) is the agent-optimal stable allocation at (R′, C). Therefore, we obtain
ϕC(R′) R′ μ^T R′ μ^{T−1} R′ · · · R′ μ^1 R′ μ^0 = ϕC(R),
showing that ϕC satisfies IR monotonicity.
Q.E.D.
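The (A.1) dynamics in the proof can be read as an algorithm: starting from the individually rational truncation μ^1, repeatedly let an object that is part of a blocking pair choose from its current holders together with all agents claiming it. A minimal sketch under the earlier encoding, with helper names of our own; iterating to a fixed point yields the stable allocation μ^T of Lemma 1, and by (A.4) no current holder is ever displaced along the way:

    def prefers(pref, a, b):
        """Strict preference: a ranked above b in the list pref (None is the null object)."""
        return pref.index(a) < pref.index(b)

    def vacancy_chain_step(mu, prefs, choice):
        """One application of (A.1) at (R', C): if some pair (i, a) blocks mu, move
        every agent chosen by that object to it; all other assignments are kept.
        Returns mu unchanged when mu is stable."""
        for a in choice:
            holders = {i for i in mu if mu[i] == a}
            claimants = {j for j in mu if prefers(prefs[j], a, mu[j])}
            for i in claimants:
                if i in choice[a](holders | {i}):      # (i, a) blocks mu
                    chosen = choice[a](holders | claimants)
                    return {j: (a if j in chosen else mu[j]) for j in mu}
        return mu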
PROOF OF THEOREM 2: Since weak Maskin monotonicity and population monotonicity are implied by IR monotonicity, the "only if" part of Theorem 2 follows from the "only if" part of Theorem 1 shown above. We only need to prove the "if" part here.
Fix a rule ϕ that satisfies the non-wastefulness, weak Maskin monotonicity, and population monotonicity axioms. To show that ϕ is a deferred acceptance rule for some acceptant substitutable priority, we proceed in three steps. First, we construct a priority profile C and verify that it is acceptant and substitutable. Second, we show that for every R ∈ R, ϕ(R) is a stable allocation at (R, C). Third, we prove that ϕ(R) is the agent-optimal stable allocation at (R, C).
For a ∈ O ∪ {∅}, let R^a be a fixed preference profile which ranks a as the most preferred object for every agent. For each a ∈ O and N′ ⊂ N, define
Ca(N′) = ϕa(R^a_{N′}, R^∅_{N\N′}).
We have that Ca(N′) ⊂ N′ because ϕ is non-wasteful and the null object is not scarce.
Step 1—Ca is an acceptant and substitutable priority for all objects a ∈ O. Ca is an acceptant priority because ϕ is non-wasteful. To show that Ca is substitutable, consider N′ ⊂ N″ ⊂ N. Assume that i ∈ Ca(N″) ∩ N′. By definition, ϕi(R^a_{N″}, R^∅_{N\N″}) = a. Since i ∈ N′ ⊂ N″, population monotonicity for the subset of agents N′ and the preference profile (R^a_{N″}, R^∅_{N\N″}) implies that ϕi(R^a_{N′}, R^∅_{N\N′}) R^a_i ϕi(R^a_{N″}, R^∅_{N\N″}) = a. Hence ϕi(R^a_{N′}, R^∅_{N\N′}) = a, which by definition means that i ∈ Ca(N′). This shows Ca(N″) ∩ N′ ⊂ Ca(N′).
Step 2—ϕ(R) is a stable allocation at (R, C) for all R ∈ R. For all R, ϕ(R) is individually rational because ϕ is non-wasteful and the null object is not scarce. To show that no blocking pair exists, we proceed by contradiction. Assume that (i, a) ∈ N × O blocks ϕ(R), that is,
(A.10) aPi ϕi(R),
(A.11) i ∈ Ca(ϕa(R) ∪ {i}).
Let N′ = ϕa(R). N′ has qa elements by non-wastefulness of ϕ and (A.10). Fix a preference R_i^{a,ϕ_i(R)} for agent i which ranks a first and ϕi(R) second. Note that (R_i^{a,ϕ_i(R)}, R^a_{N′}, R_{N\(N′∪{i})}) m.t. R at ϕ(R) (R_i^{a,ϕ_i(R)} m.t. Ri at ϕi(R) by (A.10), R^a_j m.t. Rj at ϕj(R) for j ∈ N′ because ϕj(R) = a by the definition of N′, and the preferences of the agents outside N′ ∪ {i} are identical under the two preference profiles). As ϕ satisfies weak Maskin monotonicity, it follows that for all j ∈ N′,
ϕj(R_i^{a,ϕ_i(R)}, R^a_{N′}, R_{N\(N′∪{i})}) R^a_j ϕj(R) = a,
hence
ϕj(R_i^{a,ϕ_i(R)}, R^a_{N′}, R_{N\(N′∪{i})}) = a.
Using ϕ's population monotonicity for the subset of agents N′ ∪ {i} and the preference profile (R_i^{a,ϕ_i(R)}, R^a_{N′}, R_{N\(N′∪{i})}), we obtain
ϕj(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) R^a_j ϕj(R_i^{a,ϕ_i(R)}, R^a_{N′}, R_{N\(N′∪{i})}) = a, ∀j ∈ N′.
From the definition of R^a, it follows that
(A.12) ϕj(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) = a, ∀j ∈ N′.
From the construction of Ca, (A.11) is equivalent to ϕi(R^a_{N′∪{i}}, R^∅_{N\(N′∪{i})}) = a. Note that (R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) m.t. (R^a_{N′∪{i}}, R^∅_{N\(N′∪{i})}) at ϕ(R^a_{N′∪{i}}, R^∅_{N\(N′∪{i})}) (R_i^{a,ϕ_i(R)} m.t. R^a_i at ϕi(R^a_{N′∪{i}}, R^∅_{N\(N′∪{i})}) = a and the preferences
of all other agents are identical under the two preference profiles). As ϕ satisfies weak Maskin monotonicity, it follows that
(A.13) ϕi(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) R_i^{a,ϕ_i(R)} ϕi(R^a_{N′∪{i}}, R^∅_{N\(N′∪{i})}) = a,
hence
(A.14) ϕi(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) = a.
By (A.12) and (A.14), ϕa(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) ⊃ N′ ∪ {i}, hence ϕ(R_i^{a,ϕ_i(R)}, R^a_{N′}, R^∅_{N\(N′∪{i})}) allocates a to at least |N′| + 1 = qa + 1 agents, which is a contradiction with the feasibility of ϕ.
Step 3—ϕ(R) = ϕC(R) for all R ∈ R. We state and prove the main part of this step as a separate lemma so that we can use it in the proof of Theorem 3 as well.
LEMMA 2: Let C be an acceptant substitutable priority and suppose that ϕ is a stable allocation rule at C that satisfies weak Maskin monotonicity. Then ϕ is the deferred acceptance rule for C, that is, ϕ = ϕC.
PROOF: Fix a preference profile R. For each i ∈ N, let R′i be the truncation of Ri at ϕCi(R), that is, R′i and Ri agree on the ranking of all proper objects, and a proper object is unacceptable under R′i if and only if it is less preferred than ϕCi(R) under Ri.
We first establish that ϕC(R) is the unique stable allocation at (R′, C). Since ϕC(R) is stable at (R, C), it is also stable at (R′, C). By definition, ϕC(R′) is the agent-optimal stable allocation at (R′, C), thus ϕC(R′) R′ ϕC(R). This leads to ϕC(R′) R ϕC(R) as R′i is the truncation of Ri at ϕCi(R) for all i ∈ N. Then the stability of ϕC(R′) at (R′, C) implies its stability at (R, C). But ϕC(R) is the agent-optimal stable allocation at (R, C), so it must be that ϕC(R) R ϕC(R′). The series of arguments above establishes that ϕC(R) = ϕC(R′). Thus ϕC(R) is the agent-optimal stable allocation at (R′, C).
Let μ be a stable allocation at (R′, C). We argue that μ = ϕC(R). Since ϕC(R) is the agent-optimal stable allocation at (R′, C), we have that ϕCi(R) R′i μi for all i ∈ N. Since μ is stable at (R′, C) and R′i is the truncation of Ri at ϕCi(R), it follows that μi ∈ {ϕCi(R), ∅} for all i ∈ N. If μi ≠ ϕCi(R) for some agent i ∈ N, then ϕCi(R) P′i μi = ∅ and, writing a = ϕCi(R), |μa| < |ϕCa(R)| ≤ qa, which is a contradic-
tion with the stability of μ at (R′, C) (as Ca is acceptant). It follows that μ = ϕC(R), hence ϕC(R) is the unique stable allocation at (R′, C).
By hypothesis, ϕ is a stable allocation rule at C, thus ϕ(R′) is a stable allocation at (R′, C). As ϕC(R) is the unique stable allocation at (R′, C), we need ϕ(R′) = ϕC(R). We have that R m.t. R′ at ϕ(R′) because R′i is the truncation of Ri at ϕi(R′) = ϕCi(R) for all i ∈ N. As ϕ satisfies weak Maskin monotonicity, it follows that ϕ(R) R ϕ(R′) = ϕC(R). Since ϕ(R) is a stable allocation at (R, C) and ϕC(R) is the agent-optimal stable allocation at (R, C), we obtain that ϕ(R) = ϕC(R), finishing the proof of the lemma. Q.E.D.
We resume the proof of Step 3. By assumption, ϕ satisfies weak Maskin monotonicity. Step 1 shows that C is an acceptant substitutable priority and Step 2 proves that ϕ is a stable allocation rule at C, so ϕ satisfies all the hypotheses of Lemma 2. Therefore, ϕ = ϕC, which completes the proof of Step 3 and of the "if" part of the theorem. Q.E.D.
PROOF OF PROPOSITION 1: We prove each of the three implications (i) ⇒ (ii) ⇒ (iii) ⇒ (i) by contradiction.
To show (i) ⇒ (ii), assume that ϕC is Pareto efficient, but not Maskin monotonic. Then there exist preference profiles R, R′ such that R′ m.t. R at ϕC(R) and ϕC(R′) ≠ ϕC(R). As ϕC satisfies weak Maskin monotonicity by Theorem 2, it follows that ϕC(R′) Pareto dominates ϕC(R) at R′. Since R′ m.t. R at ϕC(R), this implies that ϕC(R′) Pareto dominates ϕC(R) at R, which contradicts the assumption that ϕC is Pareto efficient.
To show (ii) ⇒ (iii), assume that ϕC is Maskin monotonic, but not group strategy-proof. Then there exist N′ ⊂ N and preference profiles R, R′ such that ϕCi(R′N′, RN\N′) Ri ϕCi(R) for all i ∈ N′, with strict preference for some i. For every i ∈ N′, let R″i be a preference relation that ranks ϕCi(R′N′, RN\N′) first and ϕCi(R) second.13 Clearly, (R″N′, RN\N′) m.t. (R′N′, RN\N′) at ϕC(R′N′, RN\N′) and (R″N′, RN\N′) m.t. R at ϕC(R). Then the assumption that ϕC is Maskin monotonic leads to
ϕC(R″N′, RN\N′) = ϕC(R′N′, RN\N′) and ϕC(R″N′, RN\N′) = ϕC(R),
which is a contradiction with ϕCi(R′N′, RN\N′) Pi ϕCi(R) for some i ∈ N′.
To show (iii) ⇒ (i), suppose that ϕC is group strategy-proof, but not Pareto efficient. Then there exist a preference profile R and an allocation μ such that μ Pareto dominates ϕC(R) at R. For every i ∈ N, let R′i be a preference that ranks μi as the most preferred object. Clearly, μ is the agent-optimal stable allocation at (R′, C), hence ϕC(R′) = μ. The deviation for all agents
13 If ϕCi(R′N′, RN\N′) = ϕCi(R), then we simply require that R″i rank ϕCi(R′N′, RN\N′) first.
in N to report R′ rather than R leads to a violation of group strategy-proofness of ϕC. Q.E.D.
PROOF OF THEOREM 4: We prove the three implications (i) ⇒ (ii) ⇒ (iii) ⇒ (i).
To show (i) ⇒ (ii), assume that ϕ = ϕC for some acceptant substitutable priority C and that ϕ is Pareto efficient. By the equivalence of properties (i) and (ii) in Proposition 1, ϕ satisfies Maskin monotonicity. By Theorem 2, ϕ satisfies non-wastefulness and population monotonicity.
To show (ii) ⇒ (iii), suppose that ϕ satisfies non-wastefulness, Maskin monotonicity, and population monotonicity. Since Maskin monotonicity implies weak Maskin monotonicity, Theorem 2 shows that ϕ = ϕC for some acceptant substitutable priority C. As ϕC satisfies Maskin monotonicity by assumption, the equivalence of conditions (i) and (ii) in Proposition 1 implies that ϕ is Pareto efficient.
To show (iii) ⇒ (i), assume that ϕ satisfies Pareto efficiency, weak Maskin monotonicity, and population monotonicity. As Pareto efficiency implies non-wastefulness, by Theorem 2 we obtain that ϕ = ϕC for some acceptant substitutable priority C. Q.E.D.
REFERENCES
ABDULKADIROĞLU, A., AND T. SÖNMEZ (2003): "School Choice: A Mechanism Design Approach," American Economic Review, 93, 729–747. [633,636,641]
ABDULKADIROĞLU, A., P. A. PATHAK, AND A. E. ROTH (2005): "The New York City High School Match," American Economic Review Papers and Proceedings, 95, 364–367. [634,636]
ABDULKADIROĞLU, A., P. A. PATHAK, A. E. ROTH, AND T. SÖNMEZ (2005): "The Boston Public School Match," American Economic Review Papers and Proceedings, 95, 368–372. [634]
ALCALDE, J., AND S. BARBERÀ (1994): "Top Dominance and the Possibility of Strategy-Proof Stable Solutions to Matching Problems," Economic Theory, 4, 417–435. [637]
BALINSKI, M., AND T. SÖNMEZ (1999): "A Tale of Two Mechanisms: Student Placement," Journal of Economic Theory, 84, 73–94. [633,637]
BLUM, Y., A. ROTH, AND U. ROTHBLUM (1997): "Vacancy Chains and Equilibration in Senior-Level Labor Markets," Journal of Economic Theory, 76, 362–411. [646]
EHLERS, L., AND B. KLAUS (2004): "Resource-Monotonicity for House Allocation Problems," International Journal of Game Theory, 32, 545–560. [636]
EHLERS, L., AND B. KLAUS (2006): "Efficient Priority Rules," Games and Economic Behavior, 55, 372–384. [636]
EHLERS, L., B. KLAUS, AND S. PÁPAI (2002): "Strategy-Proofness and Population-Monotonicity for House Allocation Problems," Journal of Mathematical Economics, 38, 329–339. [636]
ERGIN, H. (2002): "Efficient Resource Allocation on the Basis of Priorities," Econometrica, 70, 2489–2498. [636,640,645]
GALE, D., AND L. S. SHAPLEY (1962): "College Admissions and the Stability of Marriage," American Mathematical Monthly, 69, 9–15. [634,637,638]
GUILLEN, P., AND O. KESTEN (2008): "On-Campus Housing: Theory vs. Experiment," Mimeo, University of Sydney and Carnegie Mellon University. [634]
HATFIELD, J. W., AND F. KOJIMA (2008): "Matching With Contracts: Comment," American Economic Review, 98, 1189–1194. [636]
KARA, T., AND T. SÖNMEZ (1996): "Nash Implementation of Matching Rules," Journal of Economic Theory, 68, 425–439. [640]
KESTEN, O. (2006): "On Two Competing Mechanisms for Priority-Based Allocation Problems," Journal of Economic Theory, 127, 155–171. [637,641]
KESTEN, O. (2009): "Coalitional Strategy-Proofness and Resource Monotonicity for House Allocation Problems," International Journal of Game Theory, 38, 17–21. [636]
KORKMAZ, İ., H. GÖKÇEN, AND T. ÇETİNYOKUŞ (2008): "An Analytic Hierarchy Process and Two-Sided Matching Based Decision Support System for Military Personnel Assignment," Information Sciences, 178, 2915–2927. [634]
MA, J. (1994): "Strategy-Proofness and the Strict Core in a Market With Indivisibilities," International Journal of Game Theory, 23, 75–83. [637]
MASKIN, E. (1999): "Nash Equilibrium and Welfare Optimality," Review of Economic Studies, 66, 23–38. [634,635,640]
PÁPAI, S. (2000): "Strategyproof Assignment by Hierarchical Exchange," Econometrica, 68, 1403–1433. [636,641]
PERACH, N., J. POLAK, AND U. ROTHBLUM (2007): "A Stable Matching Model With an Entrance Criterion Applied to the Assignment of Students to Dormitories at the Technion," International Journal of Game Theory, 36, 519–535. [634]
ROTH, A. E., AND M. A. O. SOTOMAYOR (1990): Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Econometric Society Monographs. Cambridge: Cambridge University Press. [638]
SHAPLEY, L., AND H. SCARF (1974): "On Cores and Indivisibility," Journal of Mathematical Economics, 1, 23–37. [637]
SÖNMEZ, T., AND U. ÜNVER (2009): "Course Bidding at Business Schools," International Economic Review (forthcoming). [634,636]
SVENSSON, L.-G. (1999): "Strategy-Proof Allocation of Indivisible Goods," Social Choice and Welfare, 16, 557–567. [636]
TAKAMIYA, K. (2001): "Coalition Strategy-Proofness and Monotonicity in Shapley-Scarf Housing Markets," Mathematical Social Sciences, 41, 201–213. [641,645]
THOMSON, W. (1983a): "The Fair Division of a Fixed Supply Among a Growing Population," Mathematics of Operations Research, 8, 319–326. [635,642]
THOMSON, W. (1983b): "Problems of Fair Division and the Egalitarian Solution," Journal of Economic Theory, 31, 211–226. [635,642]
Dept. of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305, U.S.A.;
[email protected] and Dept. of Economics, MIT, 50 Memorial Drive, Cambridge, MA 02142, U.S.A.;
[email protected]. Manuscript received September, 2007; final revision received August, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 655–695
FRAMING CONTINGENCIES
BY DAVID S. AHN AND HALUK ERGIN1
The subjective likelihood of a contingency often depends on the manner in which it is described to the decision maker. To accommodate this dependence, we introduce a model of decision making under uncertainty that takes as primitive a family of preferences indexed by partitions of the state space. Each partition corresponds to a description of the state space. We characterize the following partition-dependent expected utility representation. The decision maker has a nonadditive set function ν over events. Given a partition of the state space, she computes expected utility with respect to her partition-dependent belief, which weights each cell in the partition by ν. Nonadditivity of ν allows the probability of an event to depend on the way in which the state space is described. We propose behavioral definitions for those events that are transparent to the decision maker and those that are completely overlooked, and connect these definitions to conditions on the representation.
KEYWORDS: Partition-dependent expected utility, support theory.
1 This paper supersedes an earlier draft titled "Unawareness and Framing." Comments from Raphaël Giraud, Todd Sarver, a co-editor, and anonymous referees were very helpful. Klaus Nehring deserves special thanks for discovering and carefully explaining to us the relationship between binary bet acyclicity and the product rule, leading to the material in Section 4.4. We thank the National Science Foundation for financial support under Grants SES-0550224, SES-0551243, and SES-0835944.
1. INTRODUCTION
THIS PAPER FORMALLY INCORPORATES the framing of contingencies into decision making under uncertainty. Its primitives are descriptions of acts, which map contingencies to outcomes. For example, the following health insurance policy associates deductibles on the left with contingencies on the right:

    $500   surgery
    $100   prenatal care

Compare this to the following policy, which includes some redundancies:

    $500   laminotomy
    $500   other surgeries
    $100   prenatal care

Both policies provide effectively identical levels of coverage. Nonetheless, a consumer might evaluate them differently. The second policy explicitly mentions laminotomies, which she may overlook or fail to fully consider when
© 2010 The Econometric Society
DOI: 10.3982/ECTA7019
evaluating the first contract. This oversight is behaviorally revealed if the consumer is willing to pay a higher premium for the second contract, reflecting an increased personal belief of the likelihood of surgery after laminotomies are mentioned.
The primary methodological innovation of the paper is its ability to discriminate between different presentations of the same act. Our general model expands the standard subjective model of decision making under uncertainty. We introduce a richer set of primitives that distinguishes the different expressions for an act as distinct choice objects. In particular, lists of contingencies with associated outcomes are the primitive objects of choice. Choices over lists are captured by a family of preferences, where each preference is indexed by a partition of the state space. We interpret the partition as a description of the different events. Equipped with this primitive, we present axioms that characterize the suggested partition-dependent expected utility representation. To our knowledge, this is the first axiomatic attempt to incorporate framing of contingencies as a consideration in decision making.2
We characterize the following utility function, which we call partition-dependent expected utility. Each event E carries a weight ν(E). Each outcome x delivers a utility u(x). When presented a list E1, …, En of contingencies that partition the state space, the decision maker judges the probability of Ei to be ν(Ei)/Σj ν(Ej). Suppose E = F ∪ G with F, G disjoint. Since ν is not necessarily additive, the judged likelihood of event E = F ∪ G can depend on whether it is coarsely expressed as E or finely expressed as the union of two subevents F ∪ G. The utility for a list
(x1, E1; …; xn, En)
is obtained by aggregating her utilities u(xi) over the consequences xi by the normalized weights ν(Ei)/Σj ν(Ej) on their corresponding events Ei. This particular functional form departs modestly and parsimoniously from standard expected utility by relaxing the additivity of ν. Indeed, given a fixed list E1, …, En of events, it maintains the affine aggregation and probabilistic sophistication of standard expected utility.3
Savage (1954) and Anscombe and Aumann (1963) did not distinguish different presentations of the same act. They implicitly assumed the psychological principle of extensionality, that the framing of an event is inconsequential to its judged likelihood. Despite its normative appeal, extensionality is violated in
2 A recent paper by Bourgeois-Gironde and Giraud (2009) considers presentation effects in the Bolker–Jeffrey decision model.
3 In fact, shortly we directly impose the Anscombe–Aumann representation on preferences given a fixed list of contingencies. As will be clear in the sequel, the belief will change between lists.
experiments where unpacking a contingency into finer subcontingencies affects its perceived likelihood. In a classic experiment, Fischoff, Slovic, and Lichtenstein (1978) told car mechanics that a car fails to start and asked for the likelihood that different parts could cause the failure. The mechanics' likelihood assessments depended on whether a part's subcomponents were explicitly listed.
Tversky and Koehler (1994) proposed a nonextensional model of judgment, which they called support theory. Support theory begins with a function P(A, B), which reflects the likelihood of a hypothesis A given that A or the mutually exclusive hypothesis B holds. It connects these likelihoods by asserting P(A, B) = s(A)/(s(A) + s(B)), where s(·) is a nonadditive "support function" over different hypotheses. Support theory enjoys considerable success among psychologists for its ability to "accommodate many mechanisms that influence subjective probability, but integrate them via the construct of the support" (Brenner, Koehler, and Rottenstreich (2002)). This paper contributes to the development of support theory by, first, providing a decision theoretic model and foundation for support theory, second, studying the uniqueness of the support function under different assumptions on the behavioral data, and third, identifying new classes of events which have special properties in terms of their support.
One interpretation of nonextensionality is through unforeseen contingencies. The general idea of a decision maker with a coarse understanding of the state space appears in papers by Dekel, Lipman, and Rustichini (2001), by Epstein, Marinacci, and Seo (2007), by Ghirardato (2001), and by Mukerji (1997). Our contribution is to compare preferences across descriptions to identify which contingencies had been unforeseen. This basic insight of using the explicit expression of unforeseen contingencies as a foundation for their identification was anticipated in psychology and in economics. Tversky and Koehler (1994, p. 565) connected nonextensional judgment and unforeseen contingencies: "The failures of extensionality highlight what is perhaps the fundamental problem of probability assessment, namely the need to consider unavailable possibilities… People cannot be expected to generate all relevant future scenarios." Dekel, Lipman, and Rustichini (1998a, p. 524) distinguished unforeseen contingencies from null events, because "an 'uninformative' statement—such as 'event x might or might not happen'—can change the agent's decision." Our model formally executes their suggested test.
Beyond unforeseen contingencies, there are other psychological sources for nonextensional judgment. A first source is limited memory or recall. For example, the car mechanics surveyed by Fischoff, Slovic, and Lichtenstein (1978) had surely heard of the mechanical failures before. To explain nonextensionality, Tversky and Koehler (1994, p. 549) appealed to "memory and attention… Unpacking a category into its components might remind people of possibilities that would not have been considered otherwise."
A second source of nonextensionality is that different descriptions alter the salience of events. For example, Fox and Rottenstreich (2003) asked subjects
to report the probability that Sunday would be the hottest day of the coming week. Subjects' reports depended significantly on whether the rest of the week was described as a single event or separated as Monday, Tuesday, and so on, with a mean of 1/3 in the first case and of 1/7 in the latter. In such cases, descriptions affect probability judgments without suggesting unforeseen or unrecalled cases.
The next section introduces the primitives of our theory. Section 3 defines the suggested partition-dependent expected utility representation. Section 4 axiomatizes the representation and discusses the uniqueness of its components. Finally, Section 5 defines two families of events—those which are completely understood and those which are completely overlooked—and examines the structure of these families when the representation holds.
2. A NONEXTENSIONAL MODEL OF DECISION MAKING
This section introduces the primitives of the model. The closest formalism of which we are aware is the model of decision making under ignorance by Cohen and Jaffray (1980), which also considers different descriptions of the state space.4 However, they imposed as a normative condition that preference is invariant to the manner in which the states are expressed, while this dependence is exactly our focus.
Let S denote a state space. A finite partition of S is a collection of nonempty and pairwise disjoint subsets π = {E1, …, En} such that S = ∪ni=1 Ei. The events E1, …, En are called the cells of partition π.5 Let Π∗ denote the collection of all finite partitions of S. We interpret each partition π ∈ Π∗ as a description of the state space S: it explicitly mentions categories of possible states, where each cell of the partition is a category, and these categories are comprehensive.
For any π ∈ Π∗, let σ(π) denote the algebra induced by π.6 Define the binary relation ≥ on Π∗ by π′ ≥ π if σ(π′) ⊃ σ(π), that is, if π′ is finer than π. If π′ ≥ π, then π′ is a richer description of the state space than π. The meet π ∧ π′ and join π ∨ π′ respectively denote the finest common coarsening and the coarsest common refinement of π and π′. For any event E ⊂ S, let Π∗E denote the set of finite partitions of E. If E ∈ π ∈ Π∗ and πE ∈ Π∗E, we slightly abuse notation and let π ∨ πE denote π ∨ [πE ∪ {Ec}].
The model considers a set of descriptions Π ⊂ Π∗. We assume that Π includes the vacuous description {S} and is closed under ∧ and ∨. Some definitions in the sequel reference two collections of events. First, let C = ∪π∈Π π denote the collection of cells of partitions in Π. Second, let E = ∪π∈Π σ(π) denote the collection of all unions of cells of some partition in Π. Clearly, E is the algebra generated by C. Most results focus on two cases of Π. In the first
4 We thank Raphaël Giraud for bringing this work to our attention.
5 For any partition π ∈ Π∗, we adopt the convention where π ∪ {∅} is identified with π.
6 Since π is finite, σ(π) is the family of unions of cells in π and the empty set.
case, descriptions can be indexed so they become progressively finer, in which case Π is a filtration. In the second case, all possible descriptions are included, in which case Π = Π∗. We discuss the distinction shortly.

Let X denote a finite set of consequences or prizes. Invoking the Anscombe–Aumann structure, let ΔX denote the set of lotteries on X. An act f : S → ΔX maps states to lotteries. Slightly abusing notation, let p ∈ ΔX also denote the corresponding constant act. Let Fπ denote the family of acts that respect the partition π, that is, f⁻¹(p) ∈ σ(π) for all p ∈ ΔX. In words, the act f is σ(π)-measurable if it assigns a constant lottery to all states in a particular cell of the partition: if s, s′ ∈ E ∈ π, then f(s) = f(s′). Informally, Fπ is the set of acts or contracts that can be described using the descriptive power of π; an act g ∉ Fπ requires a finer categorization than is available in π. Let F = ⋃_{π∈Π} Fπ denote the universe of acts under consideration. For any act f ∈ F, let π(f) denote the coarsest available partition π ∈ Π such that f ∈ Fπ.7 Note that when Π ≠ Π∗, because π(f) is the coarsest partition within Π, it could be strictly finer than the partition induced by f, that is, the coarsest partition (among all partitions) that makes f measurable. Similarly, for any pair of acts f, g ∈ F, let π(f, g) be the coarsest available partition π ∈ Π such that f, g ∈ Fπ.

Our primitive is a family of preferences {≽π}π∈Π indexed by partitions π, where each ≽π is defined over the family Fπ of π-measurable acts. Our interpretation of f ≽π g is that f is weakly preferred to g when the state space is described as the partition π. If f ∉ Fπ, then the description π is too coarse to express the structure of f. If either f or g is not π-measurable, then the statement f ≽π g is nonsensical. The strict and symmetric components ≻π and ∼π carry their standard meanings.

The restriction to π-measurable acts is not innocuous, particularly when framing effects are interpreted as reflecting unawareness. Consider a health insurance contract that covers eighty percent of the cost of surgery. The exact benefit of the insurance depends on which surgery is required, about which the consumer might have only a vague understanding. Nonetheless, its terms are described without explicitly mentioning every possible surgery. The measurability assumption precludes such contracts, a limitation of our model.

Our original motivation was to study preferences over lists. The family of preferences {≽π}π∈Π provides a parsimonious primitive that loses little descriptive power relative to a model that begins with preferences over lists. Suppose we started with a list

⎛x1  E1⎞
⎜ ⋮   ⋮ ⎟
⎝xn  En⎠
7 The existence of π(f) is guaranteed by our assumption that Π is closed under the operation ∧. To see this, let π ∈ Π be any partition according to which f is measurable. Since π is a finite partition, there are finitely many partitions that are (weakly) coarser than π. Hence the set Π′ = {π′ ∈ Π : π′ ≤ π and f ∈ Fπ′} is finite and nonempty, and π(f) = ⋀_{π′∈Π′} π′.
which is a particular expression of the act f. This list is more compactly represented as a pair (f, π), where the partition π = {E1, …, En} denotes the list of explicit contingencies on the right. This description π is necessarily richer than the coarsest expression of f, so f ∈ Fπ. Now suppose the decision maker is deciding between two lists, which are represented as (f, π1) and (g, π2). Then the events in both π1 and π2 are explicitly mentioned. So the family of described events is the coarsest common refinement of π1 and π2, their join π = π1 ∨ π2. Then (f, π1) is preferred to (g, π2) if and only if (f, π) is preferred to (g, π). We can therefore restrict attention to the preferences over pairs (f, π) and (g, π), where f, g ∈ Fπ. Moving the partition from being carried by the acts to being carried as an index of the preference relation arrives at exactly the model studied here. The lists are expressed through indexed preference relations for the resulting economy of notation.

The partition π that indexes f ≽π g is the coarsest refinement of the observable descriptions in the lists π1 and π2 that accompanied f and g. The partition π is not meant to be interpreted as anything more. It is exogenous information that is an observable component of the decision problem and should not be taken as a direct measure of the decision maker's subjective understanding of the state space. In fact, Section 5 suggests a method for inferring her subjective understanding of the state space from her preferences over lists.

8 Consider the famous "Linda problem," where subjects are told that "Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations." The subjects believe the event "Linda is a bank teller" is less probable than the event "Linda is a bank teller and is active in the feminist movement" (Tversky and Kahneman (1983, p. 297)).

An important consideration is exactly which preferences are available or observable to the analyst. How rich are the preferences that can be sensibly elicited from the decision maker? This question speaks directly to the structure of the collection Π. Consider the interpretation of framing in terms of availability or recall. Once an event is explicitly mentioned to the decision maker, this pronouncement cannot be reversed. In this case, after being presented with prior partitions π1, …, πt−1, the relevant behavior after also being told πt is with respect to the refinement of the prior presentations π1, …, πt−1 and the current πt. So the appropriate assumption in this case is that Π is a filtration. On the other hand, under different motivations for framing, it seems more reasonable to consider the family of all descriptions. For example, if framing effects are due to salience, these effects are independent of the decision maker's ability to recall events. A similar argument can be made for the representativeness heuristic.8 Even for motivations where preferences under the full set of descriptions cannot be elicited for a single subject, the analyst could believe there is enough uniformity in the population to elicit preferences across subjects, in which case a particular description could be given to one subject while
alternative descriptions are given to others. Similarly, it might be useful to consider counterfactual assessments about what a particular decision maker would have done if she had been presented alternative sequences of descriptions. We therefore consider two canonical cases: in the first, Π is a filtration; in the second, Π is the family of all finite partitions. The appropriateness of either case depends on the application. Neither case is obviously more technically challenging. When Π is larger, the theory leverages more information about the decision maker, but also must rationalize more of her choices.

Given a partition π = {E1, …, En} ⊂ E and acts f1, …, fn ∈ F, define a new act by

⎛f1  E1⎞        ⎧f1(s)  if s ∈ E1,
⎜ ⋮   ⋮ ⎟ (s) = ⎨  ⋮
⎝fn  En⎠        ⎩fn(s)  if s ∈ En.9
Null events for our setting with a family of preferences are defined as follows:

DEFINITION 1: Given π ∈ Π, an event E ∈ σ(π) is π-null if

⎛p  E  ⎞      ⎛q  E  ⎞
⎝f  E^c⎠  ∼π  ⎝f  E^c⎠

for all f ∈ Fπ and p, q ∈ ΔX; E ∈ σ(π) is π-nonnull if it is not π-null. The event E is null if E is π-null for any π such that E ∈ π; E is nonnull if it is not null.10

9 Note that the partition π does not necessarily belong to Π. However, the assumption that π ⊂ E guarantees that π is coarser than some partition π′ ∈ Π. To see this, let πi ∈ Π be such that Ei ∈ σ(πi) for each i = 1, …, n and let π′ = π1 ∨ π2 ∨ · · · ∨ πn ∈ Π. Then π ≤ π′ and the new act defined above belongs to F_{π′∨π(f1)∨···∨π(fn)} ⊂ F.
10 Note that for an event E to be nonnull, E only needs to be nonnull for some partition π including E, but not necessarily for all partitions whose algebras include E.
3. PARTITION-DEPENDENT EXPECTED UTILITY

We study the following utility representation. The decision maker has a nonadditive set function ν : C → R+ over relevant contingencies. Presented with a description π = {E1, E2, …, En} of the state space, she places a weight ν(Ek) on each described event. Following Tversky and Koehler (1994), we refer to ν(E) as the support of E. Normalizing by the sum, μπ(Ek) = ν(Ek)/∑_i ν(Ei)
defines a probability measure μπ over σ(π). Then her utility for the act f expressed as

f = ⎛p1  E1⎞
    ⎜p2  E2⎟
    ⎜ ⋮   ⋮ ⎟
    ⎝pn  En⎠

is ∑_{i=1}^n u(pi)μπ(Ei), where u : ΔX → R is an affine utility function over objective lotteries. The following definition avoids division by zero during the normalization.

DEFINITION 2: A support function is a weakly positive set function ν : C → R+ such that ∑_{E∈π} ν(E) > 0 for all π ∈ Π.

Although ∅ is not in C, since it is not an element of any partition, we adhere to the convention that ν(∅) = 0. We can now formally define the utility representation.

DEFINITION 3: {≽π}π∈Π admits a partition-dependent expected utility (PDEU) representation if there exist a nonconstant affine von Neumann–Morgenstern (vNM) utility function u : ΔX → R and a support function ν : C → R+ such that for all π ∈ Π and f, g ∈ Fπ,

f ≽π g  ⇐⇒  ∫_S u∘f dμπ ≥ ∫_S u∘g dμπ,
where μπ is the unique probability measure on (S, σ(π)) such that for all E ∈ π,

(1)    μπ(E) = ν(E) / ∑_{F∈π} ν(F).
When such a pair (u, ν) exists, we call it a PDEU representation. The support ν(E) corresponds to the relative weight of E in lists where E, but not its subevents, is explicitly mentioned. The nonadditivity of ν allows for framing effects: E and F can be disjoint yet ν(E) + ν(F) ≠ ν(E ∪ F). The normalization of dividing by ∑_{E∈π} ν(E) is also significant. If the complement of E is unpacked into finer subsets, then the assessed likelihood of E will be indirectly affected in the denominator. So the probability of E depends directly on its description πE and indirectly on the description π_{E^c} of its complement.

PDEU is closely related to support theory, which was introduced by Tversky and Koehler (1994) and extended by Rottenstreich and Tversky (1997). Support theory begins with descriptions of events, called hypotheses. Tversky and
Koehler (1994) analyzed comparisons of likelihood between pairs (A, B) of mutually exclusive hypotheses that they call evaluation frames, which consist of a focal hypothesis A and an alternative hypothesis B. The probability judgment of A relative to B is P(A, B) = s(A)/[s(A) + s(B)], where s(A) is the support assigned to hypothesis A based on the strength of its evidence. They focused on the case of nonadditive support for the same motivations as we do. They also characterized the formula for P(A, B). However, they directly treated P, rather than preference, as primitive (Tversky and Koehler (1994, Theorem 1)). Our theory translates support theory from judgment to decision making and extends its scope beyond binary evaluation frames. Our results provide behavioral axiomatic foundations for the model and precise requirements for identifying a unique support function from behavioral data.

Alongside its psychological pedigree, there are sound methodological arguments for PDEU. These points will develop in the sequel, but we summarize a few here. First, while the beliefs μπ could be left unconnected across partitions, the consequent lack of basic structure would not be amenable to applications or comparative statics. Second, PDEU has an attractively compact form. As in the standard case, preference is summarized by two mathematical objects: one function for utility and another for likelihood. Third, an inherited virtue of the standard model is that a large number of implied preferences can be determined from a small number of choice observations. Under PDEU, once the weights of specific events are fixed, the weights of many other events can be computed by comparing likelihood ratios. This tractably generates counterfactual predictions about behavior under alternative descriptions of the state space, an exercise that would be difficult without any structure across partitions. Finally, PDEU associates interesting classes of behavior with features of ν. For example, specific kinds of framing effects are characterized by subadditivity. The availability heuristic associates the probability of events with the number of cases that the decision maker can recall; if more precise descriptions aid recall, then the support function is subadditive. These sorts of characterizations are provided in the online supplement (Ahn and Ergin (2010)). PDEU also guarantees natural structure on special collections of events, in particular, those that are immune to framing and those that are completely overlooked without explicit mention. These results are presented in Section 5.

We sometimes refer to the following prominent example of PDEU.

EXAMPLE 1—Principle of Insufficient Reason: Suppose ν is a constant function, for example, ν(E) = 1 for every nonempty E. Then the decision maker puts equal probability on all described contingencies. Such a criterion for cases of extreme ignorance or unawareness was advocated by Laplace and Leibnitz as the principle of insufficient reason, but is sensitive to the framing of the states.
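To make the normalization in equation (1) concrete, here is a minimal numerical sketch in Python (our own illustration, not code from the paper; the state names and the helper mu are assumptions). It shows how the constant support function of Example 1 makes the probability of a fixed event depend on how finely its complement is described.

    from fractions import Fraction

    def mu(nu, partition):
        """Normalize a support function over the cells of a partition, as in equation (1)."""
        total = sum(nu(cell) for cell in partition)
        assert total > 0, "nu must be a support function: positive total on every partition"
        return {cell: Fraction(nu(cell), total) for cell in partition}

    # Principle of insufficient reason: constant support on nonempty events.
    nu_constant = lambda cell: 1

    # Event E described against a coarse vs. a fine description of its complement.
    E = frozenset({"e"})
    coarse = [E, frozenset({"f", "g"})]             # {E, E^c}
    fine = [E, frozenset({"f"}), frozenset({"g"})]  # E^c unpacked into two cells

    print(mu(nu_constant, coarse)[E])  # 1/2
    print(mu(nu_constant, fine)[E])    # 1/3: unpacking the complement dilutes E

With an additive ν the two printed probabilities would agree, which is the partition-independent special case introduced in Definition 4 below.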
When the set function ν is additive, the probabilities of events do not depend on their expressions and the model reduces to standard subjective expected utility.

DEFINITION 4: {≽π}π∈Π admits a partition-independent expected utility representation if it admits a PDEU representation (u, ν) with finitely additive ν.

4. AXIOMS AND REPRESENTATION THEOREMS

This section provides axiomatic characterizations of PDEU in two settings: when Π is a filtration and when Π includes all finite partitions.

4.1. Basic Axioms

We first present axioms that will be required in both settings. The first five are standard and are collectively denoted as the Anscombe–Aumann axioms.

AXIOM 1—Weak Order: ≽π is complete and transitive for all π ∈ Π.

AXIOM 2—Independence: For all π ∈ Π, f, g, h ∈ Fπ and α ∈ (0, 1), if f ≻π g, then αf + (1 − α)h ≻π αg + (1 − α)h.

AXIOM 3—Archimedean Continuity: For all π ∈ Π and f, g, h ∈ Fπ, if f ≻π g ≻π h, then there exist α, β ∈ (0, 1) such that αf + (1 − α)h ≻π g ≻π βf + (1 − β)h.

AXIOM 4—Nondegeneracy: For all π ∈ Π, there exist f, g ∈ Fπ such that f ≻π g.

AXIOM 5—State Independence: For all π ∈ Π, π-nonnull E ∈ σ(π), p, q ∈ ΔX, and f ∈ Fπ,

⎛p  E  ⎞      ⎛q  E  ⎞
⎝f  E^c⎠  ≽π  ⎝f  E^c⎠   ⇐⇒   p ≽{S} q.

State independence has some additional content here: not only is the utility for a consequence invariant to the event in which it obtains, it also is invariant to the description of the state space.

These axioms guarantee a collection of probability measures μπ : σ(π) → [0, 1] and an affine function u : ΔX → R such that ∫_S u∘f dμπ represents ≽π. That is, fixing a partition π, the preference ≽π is standard expected utility given the subjective belief μπ. The model's interest derives from the relationship between preferences across descriptions.

Any expression of a contract f must mention at least the different events in which it delivers the various payments. At a minimum, the events in π(f)
must be explicitly mentioned, recalling that π(f) is the coarsest available partition π ∈ Π such that f ∈ Fπ. Similarly, when comparing two acts f and g, the coarsest description available to express both f and g is π(f, g) = π(f) ∨ π(g), where none of the payoff-relevant contingencies is unpacked into finer subevents. This motivates the following binary relation on F.

DEFINITION 5: For all f, g ∈ F, define f ≽ g if f ≽π(f,g) g, where π(f, g) = π(f) ∨ π(g).

Under the Anscombe–Aumann axioms, the single relation ≽ compactly summarizes the entire family of relations {≽π}π∈Π. For example, consider the preference between two acts f and g given a description π that is strictly finer than π(f, g). Does the preference f ≽π g hold? To answer this question equipped only with ≽, take any act h such that π = π(h). Then f ≽π g if and only if αf + (1 − α)h ≽ g for α ∈ (0, 1) close to 1, since the mixture act αf + (1 − α)h is close to f in terms of payoffs but requires the minimal description π. We use ≽ in the sequel for its notational convenience. However, where ≽ is invoked, much of the force is implicit in its construction. These assumptions should, therefore, be delicately interpreted.

The following principle is a verbatim application of the classic axiom of Savage (1954) to the defined relation ≽.

AXIOM 6—Sure-Thing Principle: For all events E ∈ E and acts f, g, h, h′ ∈ F,

⎛f  E  ⎞     ⎛g  E  ⎞          ⎛f   E  ⎞     ⎛g   E  ⎞
⎝h  E^c⎠  ≽  ⎝h  E^c⎠   ⇐⇒   ⎝h′  E^c⎠  ≽  ⎝h′  E^c⎠.

The sure-thing principle is usually invoked to establish coherent conditional preferences: the relative likelihood of subevents of E is independent of the prizes associated with E^c. In our context, this coherence is already guaranteed by the Anscombe–Aumann axioms. Here, the marginal power of the axiom is to require that the preference conditional on E is independent of the description of E^c induced by h or h′. To see this, assume for simplicity that the images of h and h′ are disjoint from the images of f and g. Then the implied descriptions to make the comparison in the left hand side can be divided into two parts: the description of E implied by f and g, and the description of E^c implied by h. The descriptions in the right hand side can be similarly divided: the same description of E generated by f and g, and the possibly different description of E^c generated by h′. The sure-thing principle requires that the relative likelihoods of subevents of E are independent of how the complement E^c is expressed.11

11 Given the Anscombe–Aumann axioms, this feature of the sure-thing principle is perhaps more transparently expressed by the following equivalent condition: Fix an event E ∈ E. Let π|E = {A ∩ E : A ∈ π}. For any π, π′ ∈ Π such that E ∈ σ(π) ∩ σ(π′), if π|E = π′|E and f, g ∈ Fπ ∩ Fπ′ with f|E^c = g|E^c, then f ≽π g ⇐⇒ f ≽π′ g.
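Under PDEU this separability holds because the sign of the utility difference between two acts agreeing outside E depends only on the weights of cells inside E. A small sanity check (our own illustration; all support and utility values are made up, and pdeu_value is an assumed helper):

    def pdeu_value(u_act, partition, nu):
        """Partition-dependent expected utility: sum of u * nu over cells, normalized."""
        total = sum(nu[cell] for cell in partition)
        return sum(u_act[cell] * nu[cell] for cell in partition) / total

    # A nonadditive support function on a four-state space {a, b, c, d} (made-up values).
    nu = {("a",): 2.0, ("b",): 1.0, ("c",): 1.5, ("d",): 0.5, ("c", "d"): 3.5}

    # f and g differ only inside E = {a, b}; both pay the same on E^c = {c, d}.
    coarse = [("a",), ("b",), ("c", "d")]    # E^c described as one cell
    fine = [("a",), ("b",), ("c",), ("d",)]  # E^c unpacked

    u_f = {("a",): 1.0, ("b",): 0.0, ("c", "d"): 0.3, ("c",): 0.3, ("d",): 0.3}
    u_g = {("a",): 0.0, ("b",): 1.0, ("c", "d"): 0.3, ("c",): 0.3, ("d",): 0.3}

    for partition in (coarse, fine):
        diff = pdeu_value(u_f, partition, nu) - pdeu_value(u_g, partition, nu)
        print(diff > 0)  # True under both descriptions of E^c

The difference of PDEU values is the ν-weighted utility gap on the cells inside E divided by a positive normalizer, so its sign cannot depend on how E^c is described.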
There are situations where such separability might be restrictive. For example, the judged relative likelihood of a failure of an automobile's alarm system to a failure of its transmission might depend on how finely its audio system is described. This is because alarm and audio systems are electronic components, while the transmission is mechanical. Nonetheless, such separability is required in classic support theory, where the relative likelihood in an evaluation frame (A, B) of hypothesis A to hypothesis B is independent of how any third hypothesis is described. This separability is a consequence of summarizing likelihood with a single function ν and, therefore, is necessary for a PDEU representation.

We occasionally reference the following standard condition, which excludes any nonempty null events:

AXIOM 7—Strict Admissibility: If f(s) ≽ g(s) for all s ∈ S and f(s′) ≻ g(s′) for some s′ ∈ S, then f ≻ g.

4.2. Π Is a Filtration

We say that Π is a filtration if the refinement relation ≥ is complete on Π. Given the restriction to finite partitions, Π can then be indexed by a finite or countably infinite sequence as Π = {πt}_{t=0}^T with π0 = {S} and πt+1 > πt for 0 ≤ t < T. When T is finite, πT is the finest partition in Π; therefore, F = ⋃_{π∈Π} Fπ = F_{πT} and E = ⋃_{π∈Π} σ(π) = σ(πT). For any expressible act f ∈ F, here π(f) refers to the first partition in {πt}_{t=0}^T for which f is measurable, but π(f) could be strictly finer than the algebra induced by f. Similarly, π(f, g) refers to the first partition in the filtration where f and g become describable.

THEOREM 1: Given a filtration {πt}_{t=0}^T, {≽πt}_{t=0}^T admits a PDEU representation if and only if it satisfies the Anscombe–Aumann axioms and the sure-thing principle.

See Appendix B for the proof. Some intuition for Theorem 1 is provided after presenting the uniqueness result. A precise statement regarding the uniqueness of u and ν requires an additional definition.

DEFINITION 6: A filtration Π = {πt}_{t=0}^T is gradual with respect to {≽πt}_{t=0}^T if there exists a πt-nonnull event E ∈ πt ∩ πt+1 for all t = 1, …, T − 1.

In words, Π is gradual if it never splits all of the πt-nonnull events into finer descriptions. For example, suppose π1 = {{a, b}, {c, d}} and π2 = {{a}, {b}, {c}, {d}}. This filtration is not gradual because π2 splits every event in π1. An alternative elicitation could describe the state space as π′2 = {{a}, {b}, {c, d}} and then as π′3 = {{a}, {b}, {c}, {d}}. This filtration collects a
strictly richer set of preferences. In the alternative elicitation, ν is uniquely identified up to a constant scalar.12

THEOREM 2: Suppose {πt}_{t=0}^T is a filtration and {≽πt}_{t=0}^T admits a PDEU representation (u, ν). Then the following statements are equivalent:
(i) {πt}_{t=0}^T is gradual with respect to {≽πt}_{t=0}^T.
(ii) If (u′, ν′) also represents {≽πt}_{t=0}^T, then there exist numbers a, c > 0 and b ∈ R such that u′(p) = au(p) + b for all p ∈ ΔX and ν′(E) = cν(E) for all E ∈ C \ {S}.

See Appendix B for the proof. The identification of the support function ν is surprisingly delicate. This delicacy provides some intuition for how the support function is elicited. When two cells are in the same partition, identifying ν is simple. For example, if E, F ∈ π, then the likelihood ratio ν(E)/ν(F) is identified by μπ(E)/μπ(F), where μπ is the probability measure on π implied by the Anscombe–Aumann axioms on ≽π. When E and F are not part of the same partition, an appropriate chain of available partitions and betting preferences calibrates the likelihood ratio ν(E)/ν(F). For example, suppose S = {a, b, c, d}, T = 2, π1 = {{a, b}, {c, d}}, and π2 = {{a}, {b}, {c, d}}. Consider the ratio ν({a, b})/ν({a}). First, consider preferences when the states are described as the partition π1 to identify the likelihood ratio ν({a, b})/ν({c, d}) of {a, b} to {c, d}. Next, considering the preferences when the states are described as π2 reveals the ratio ν({c, d})/ν({a}) of {c, d} to {a}. Then we can identify

ν({a, b})/ν({a}) = ν({a, b})/ν({c, d}) × ν({c, d})/ν({a}),

that is, "the {c, d}'s cancel" when the revealed likelihood ratios multiply out.

This approach of indirectly linking the cells with intermediate connections might encounter two obstacles. First, if {c, d} is π1-null, then the ratio is undefined. Second, if the filtration specifies π2 = {{a}, {b}, {c}, {d}}, then it is not gradual and there is no cell common to π1 and π2 with which to execute the indirect comparison of {a, b} to {a}. Instead, the ratios would reflect ν({a, b})/ν({c, d}) in the first case and (ν({c}) + ν({d}))/ν({a}) in the second. However, since generally ν({c, d}) ≠ ν({c}) + ν({d}), these ratios are not useful in identifying ν({a, b})/ν({a}).

If two events E and F can be connected through such a chain of disjoint nonnull cells across partitions, then the ratio of ν(E) to ν(F) is pinned down. Otherwise, the ratio cannot be identified. Assuming that the filtration is gradual ensures that all cells can be connected, hence providing unique identification of ν up to a scalar multiple.

12 The exception is the value ν(S) at the vacuous description {S}, which is unidentified because this quantity always divides itself to unity.
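The chain calibration just described is mechanical, and a short script makes the role of the common cell {c, d} visible (our own illustration; the support values are made up):

    # Made-up "true" support values the analyst is trying to recover.
    nu = {"ab": 3.0, "cd": 2.0, "a": 1.5, "b": 0.8, "c": 0.9, "d": 0.4}

    # Revealed likelihood ratios within a partition satisfy mu(E)/mu(F) = nu(E)/nu(F).
    r1 = nu["ab"] / nu["cd"]            # from pi_1 = {{a,b}, {c,d}}
    r2 = nu["cd"] / nu["a"]             # from pi_2 = {{a}, {b}, {c,d}}: {c,d} is common
    print(r1 * r2, nu["ab"] / nu["a"])  # equal: the {c,d}'s cancel

    # Non-gradual alternative: pi_2' = {{a}, {b}, {c}, {d}} shares no cell with pi_1.
    r2_bad = (nu["c"] + nu["d"]) / nu["a"]  # best available ratio mentions {c} and {d}
    print(r1 * r2_bad, nu["ab"] / nu["a"])  # differ when nu({c,d}) != nu({c}) + nu({d})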
4.3. Π Is the Collection of All Finite Partitions

We now consider the case where Π = Π∗, the collection of all finite partitions of S. Then E = 2^S and C = 2^S \ {∅}. Unlike when Π is a filtration, the sure-thing principle is insufficient for PDEU. The problem is that the calibrated likelihood ratio of E to F can depend on the particular chain of comparisons used to link them. When Π is a filtration, there is only one such sequence available. The next example illustrates the potential dependence.

EXAMPLE 2: Let S = {a, b, c, d} and ΔX = [0, 1]. Let π∗ = {{a, b}, {c, d}} with μπ∗({a, b}) = 2/3 and μπ∗({c, d}) = 1/3. For any π ≠ π∗, let μπ(C) = 1/|π| for all cells C ∈ π. Suppose u(p) = p, so ≽π is represented by ∫_S f dμπ. These preferences satisfy the Anscombe–Aumann axioms and the sure-thing principle, but admit no PDEU representation. To the contrary, suppose (u, ν) was such a representation. Let π1 = {{a, b}, {c}, {d}}, π2 = {{a, d}, {b}, {c}}, and π3 = {{a}, {b}, {c, d}}. Then, multiplying relevant likelihood ratios,

ν({a, b})/ν({c, d}) = ν({a, b})/ν({c}) × ν({c})/ν({b}) × ν({b})/ν({c, d})
                    = μπ1({a, b})/μπ1({c}) × μπ2({c})/μπ2({b}) × μπ3({b})/μπ3({c, d}) = 1.
We can directly obtain a contradictory conclusion:

ν({a, b})/ν({c, d}) = μπ∗({a, b})/μπ∗({c, d}) = 2.

The example suggests that an additional assumption on implied likelihood ratios across different sequences of comparisons is required. Preferences across partitions are summarized by the defined relation ≽, which compares acts assuming their coarsest available description. This relation is intransitive: the implied partitions π(f, g), π(g, h), and π(f, h) are generally distinct. The following statement is a common generalization of transitivity.

AXIOM 8—Acyclicity: For all acts f1, …, fn ∈ F,

f1 ≻ f2, …, fn−1 ≻ fn   ⇒   f1 ≽ fn.

This generalization is still too strong. Given the Anscombe–Aumann axioms, acyclicity guarantees additivity of ν.
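The path dependence in Example 2 can be reproduced directly (our own illustration of the example's beliefs; the helper mu is an assumption):

    from fractions import Fraction

    def mu(partition, cell):
        """Example 2's beliefs: 2/3-1/3 on pi*, uniform on every other partition."""
        pi_star = frozenset({frozenset("ab"), frozenset("cd")})
        if frozenset(partition) == pi_star:
            return Fraction(2, 3) if cell == frozenset("ab") else Fraction(1, 3)
        return Fraction(1, len(partition))

    pi1 = [frozenset("ab"), frozenset("c"), frozenset("d")]
    pi2 = [frozenset("ad"), frozenset("b"), frozenset("c")]
    pi3 = [frozenset("a"), frozenset("b"), frozenset("cd")]
    star = [frozenset("ab"), frozenset("cd")]

    chain = (mu(pi1, frozenset("ab")) / mu(pi1, frozenset("c"))
             * mu(pi2, frozenset("c")) / mu(pi2, frozenset("b"))
             * mu(pi3, frozenset("b")) / mu(pi3, frozenset("cd")))
    direct = mu(star, frozenset("ab")) / mu(star, frozenset("cd"))
    print(chain, direct)  # 1 and 2: no single support function can deliver both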
PROPOSITION 1: {≽π}π∈Π∗ admits a partition-independent expected utility representation if and only if it satisfies the Anscombe–Aumann axioms and acyclicity.13

See Appendix C for the proof. Acyclicity therefore precludes nonadditive support functions. It is behaviorally restrictive because some cycles seem intuitive in the presence of framing effects, such as the following example.

EXAMPLE 3: This example is inspired by Tversky and Kahneman (1983), who reported that the predicted frequency across subjects of seven-letter words ending with ing is higher than those with n as the sixth letter. Consider the following events regarding a random seven-letter word:
E1:  _ _ _ _ _ t _,
E2:  _ _ _ _ _ n _.
The decision maker might consider E1 more likely than E2 because the letter t is more common than n. Now consider the following pair of events:
E2:  _ _ _ _ _ n _,
E3:  _ _ _ _ i n g.
The decision maker considers E2 more likely than E3, because it is a strict superset. But, when presented with E1 and E3,
E1:  _ _ _ _ _ t _,
E3:  _ _ _ _ i n g,
she thinks E3 is more likely, since she is now reminded of the large number of present participles that end with ing. Letting p ≻ q, we have a strict cycle:

⎛p  E1  ⎞     ⎛p  E2  ⎞        ⎛p  E2  ⎞     ⎛p  E3  ⎞        ⎛p  E3  ⎞     ⎛p  E1  ⎞
⎝q  E1^c⎠  ≻  ⎝q  E2^c⎠,      ⎝q  E2^c⎠  ≻  ⎝q  E3^c⎠,      ⎝q  E3^c⎠  ≻  ⎝q  E1^c⎠.14

13 It is clear that Proposition 1 remains true if acyclicity of ≻ is replaced with transitivity of ≽. Define the certainty equivalence relation ≽∗ on F by: f ≽∗ g if there exist p, q ∈ ΔX such that f ∼ p ≽ q ∼ g. The relation ≽∗ is monotone (or weakly admissible) if f ≽∗ g whenever f(s) ≽∗ g(s) for all s ∈ S. Then, Proposition 1 also remains true if acyclicity of ≻ is replaced with monotonicity of ≽∗. Details are available from the authors upon request.
14 We are very grateful to an anonymous referee for suggesting this example.

The heart of this example is the nonempty intersection shared by E2 and E3. When E2 and E3 are mentioned together, this intersection primes the consideration
of subevents of E3, namely words ending with ing. The decision maker is not comparing the likelihood of "seven-letter words ending with ing" against "seven-letter words with n in the sixth place," but against "seven-letter words with n in the sixth place that may or may not end with ing." On the other hand, when comparing E1 to E2, she is directly comparing the two events, without explicit mention of E2 ∩ E3. Finally, suppose E3 had been _ _ _ _ _ d _. This event is disjoint from E1 and E2, and a cycle now seems less plausible.

This suggests that cycles where subsequent events are disjoint should be excluded, since these have meaningful likelihood interpretations even in the presence of framing. This motivates the following definition.

DEFINITION 7: A cycle of events E1, E2, …, En, E1 is sequentially disjoint if E1 ∩ E2 = E2 ∩ E3 = · · · = En−1 ∩ En = En ∩ E1 = ∅.

AXIOM 9—Binary Bet Acyclicity: For any sequentially disjoint cycle of sets E1, …, En, E1 and lotteries p1, …, pn; q ∈ ΔX,

⎛p1  E1  ⎞     ⎛p2  E2  ⎞              ⎛pn−1  En−1  ⎞     ⎛pn  En  ⎞
⎝q   E1^c⎠  ≻  ⎝q   E2^c⎠,   …,      ⎝q     En−1^c⎠  ≻  ⎝q   En^c⎠

⇒   ⎛p1  E1  ⎞     ⎛pn  En  ⎞
     ⎝q   E1^c⎠  ≽  ⎝q   En^c⎠.

This consistency on likelihoods is only applicable across comparisons of disjoint events, a sensible restriction given our model of framing. If A and B intersect, then eliciting whether A is judged more likely than B is delicate. The delicacy is that we cannot directly measure the likelihood of the coarsest expression of A versus the coarsest expression of B, because no partition allows a comparison of A to B. The best we can do is assess the subjective likelihood of "A \ B or A ∩ B" versus "B \ A or A ∩ B." Once framing effects are allowed, this is a conceptually distinct question.

We can now characterize PDEU preferences when Π is rich.

THEOREM 3: {≽π}π∈Π∗ admits a PDEU representation if and only if it satisfies the Anscombe–Aumann axioms, the sure-thing principle, and binary bet acyclicity.

See Appendix C for the proof. Turning to uniqueness, the following definition translates Definition 6 of a gradual filtration to the current setting with all partitions.

DEFINITION 8: A sequence of events E1, E2, …, En is sequentially disjoint if E1 ∩ E2 = E2 ∩ E3 = · · · = En−1 ∩ En = ∅.
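The disjointness requirements in Definitions 7 and 8 are easy to operationalize; the following helper is our own illustration, not part of the paper:

    def sequentially_disjoint(events, cycle=False):
        """Check Definition 8; with cycle=True, also Definition 7's wrap-around pair."""
        pairs = list(zip(events, events[1:]))
        if cycle:
            pairs.append((events[-1], events[0]))
        return all(not (e & f) for e, f in pairs)

    E1, E2, E3 = frozenset("abc"), frozenset("de"), frozenset("fg")
    print(sequentially_disjoint([E1, E2, E3], cycle=True))  # True: a valid cycle
    print(sequentially_disjoint([E1, E2, E1 | E2]))         # False: consecutive overlap

Note that the cycle in Example 3 fails this test because E2 ∩ E3 ≠ ∅, which is exactly why binary bet acyclicity does not rule it out.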
AXIOM 10—Event Reachability: For any distinct nonnull events E, F ⊊ S, there exists a sequentially disjoint sequence of nonnull events E1, …, En such that E1 = E and En = F.

THEOREM 4: Assume that {≽π}π∈Π∗ admits a PDEU representation (u, ν). The following statements are equivalent:
(i) {≽π}π∈Π∗ satisfies event reachability.
(ii) If (u′, ν′) also represents {≽π}π∈Π∗, then there exist numbers a, c > 0 and b ∈ R such that u′(p) = au(p) + b for all p ∈ ΔX and ν′(E) = cν(E) for all E ⊊ S.

The proof follows from Lemma 4 in Appendix A. Strict admissibility implies event reachability, but the converse is false: event reachability is strictly weaker than strict admissibility.

4.4. Binary Bet Acyclicity and the Product Rule

Binary bet acyclicity is reminiscent of an implication of support theory called the product rule, which is well known in the psychological literature. Roughly speaking, if R(A, B) denotes the relative likelihood of hypothesis A to a mutually exclusive hypothesis B, the product rule requires R(A, C)R(C, B) = R(A, D)R(D, B). Rewritten as R(A, C)R(C, B)R(B, D) = R(A, D), this is a special case of the consistency across likelihood ratios implied by binary bet acyclicity. The product rule and binary bet acyclicity have similar intuition: the particular comparison event, C or D, used to calibrate the quantitative likelihood ratio of A to B is irrelevant. One way to think of the product rule is as a limited version of binary bet acyclicity that only precludes cycles of size four, but allows for larger cycles. Given strict admissibility, if there are no cycles of size four, then there are no cycles of any size. Therefore, binary bet acyclicity is equivalent to the product rule. The product rule also enjoys some empirical support.15

The next result formally states this equivalence. As Appendix D argues in more detail, Theorem 1 of Tversky and Koehler (1994) can be restated as a representation result for the relative likelihood ratios R(A, B). Theorem 5(ii) directly follows from Tversky and Koehler (1994) and from Nehring (2008), who independently provided a proof of the same result.16

15 In an experiment involving judging the likelihoods that professional basketball teams would defeat others, Fox (1999) elicited ratios of support values and found an "excellent fit of the product rule for these data" at both the aggregate and individual subject level.
16 To clarify the relationship of the result to Tversky and Koehler (1994), we provide a proof of Theorem 5(ii) based on the proof of Theorem 1 in Tversky and Koehler (1994) (see Lemma 7 in Appendix D). As suggested above, Theorem 5(ii) can also be proven by showing that, under the hypotheses of the theorem, if there are no binary bet cycles of size four, then there are no binary bet cycles of any size. Details are available from the authors upon request.
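Under a PDEU representation, R(A, B) = ν(A)/ν(B) for disjoint nonnull events, so the product rule holds identically; a one-instance check (our own illustration with arbitrary support values):

    # Arbitrary positive support values for four pairwise disjoint events (made up).
    nu = {"A": 1.7, "B": 0.6, "C": 2.2, "D": 0.9}

    def R(x, y):
        """Relative likelihood of x to y implied by a PDEU support function."""
        return nu[x] / nu[y]

    # Product rule: the comparison event used to link A and B is irrelevant.
    lhs = R("A", "C") * R("C", "B")
    rhs = R("A", "D") * R("D", "B")
    print(abs(lhs - rhs) < 1e-12)  # True: both equal nu(A)/nu(B)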
THEOREM 5—Tversky and Koehler (1994), Nehring (2008): Suppose that {≽π}π∈Π∗ satisfies the Anscombe–Aumann axioms, the sure-thing principle, and strict admissibility. Then the following statements hold:
(i) There exists an affine vNM utility function u : ΔX → R and a unique family of probabilities {μπ}π∈Π∗ with μπ : σ(π) → [0, 1], such that two conditions hold:
(a) For any π ∈ Π∗ and f, g ∈ Fπ, f ≽π g ⇐⇒ ∫_S u∘f dμπ ≥ ∫_S u∘g dμπ, and for any E ∈ π, μπ(E) > 0.
(b) For any nonempty disjoint events A, B, the ratio defined by

R(A, B) ≡ μπ(A)/μπ(B)

is independent of π ∈ Π∗ such that A, B ∈ π.17
(ii) ≽ satisfies binary bet acyclicity if and only if R satisfies the product rule (Tversky and Koehler (1994))

R(A, B)R(B, C) = R(A, D)R(D, C)

for all nonempty events A, B, C, D such that [A ∪ C] ∩ [B ∪ D] = ∅.

See Appendix D for the proof.

17 Under the assumptions of the theorem, these ratios can also be directly defined through preference. Fix any p ≻ q. For all disjoint nonempty A, B, define R(A, B) as follows. Without loss of generality, suppose

⎛p  A  ⎞     ⎛p  B  ⎞
⎝q  A^c⎠  ≽  ⎝q  B^c⎠.

Then there exists a unique α ∈ (0, 1] such that

⎛αp + (1 − α)q  A  ⎞     ⎛p  B  ⎞
⎝q              A^c⎠  ∼  ⎝q  B^c⎠.

Define R(A, B) = 1/α and R(B, A) = α.

Theorem 5 can be leveraged to connect the case where Π is a filtration with the case where Π includes all partitions. Specifically, binary bet acyclicity is equivalent to assuming that the likelihood ratio of E to F would not have changed if another filtration had been used for elicitation. Details can be found in the online supplement.

5. TRANSPARENT EVENTS AND COMPLETELY OVERLOOKED EVENTS

Throughout this section, let Π = Π∗. We now define two interesting families of events. The first family consists of those events that are completely transparent to the decision maker prior to any further description of the state space. The second family is the opposite: those events that are completely overlooked until they are explicitly described. These definitions are imposed on the preferences directly. When preferences admit a PDEU representation, the families
of transparent and of overlooked events are closed under union and intersection, which is potentially useful for applications. Moreover, these events can be readily identified from the support function ν, which is another useful consequence of PDEU for applications.

5.1. Transparent Events

We now consider those events whose explicit descriptions have no effect on choice. If event A was already in mind when deciding between acts f and g, then mentioning it explicitly should have no bearing on preference. Conversely, if its explicit description reverses preference, then A must not have been completely considered.

DEFINITION 9: Fix {≽π}π∈Π∗. An event A is transparent if for any π ∈ Π∗ and for any f, g ∈ Fπ,

f ≽π g  ⇐⇒  f ≽π∨{A,A^c} g.

Let A denote the family of all transparent events. The events in A are those that are immune to manipulation by framing or description. Someone designing a contract and deciding which contingencies to explicitly mention cannot change the decision maker's willingness to pay for the contract by mentioning an event in A. The family A has some nice features when preferences admit a strictly admissible PDEU representation.

PROPOSITION 2: Suppose {≽π}π∈Π∗ admits a PDEU representation (u, ν) and satisfies strict admissibility. Then the following statements hold:
(i) A ∈ A if and only if ν(E) = ν(E ∩ A) + ν(E ∩ A^c) for all events E ≠ S.
(ii) A is an algebra.18
(iii) ν is additive on A \ {S}, that is, for all disjoint A, B ∈ A such that A ∪ B ≠ S,

ν(A ∪ B) = ν(A) + ν(B).

Moreover, ν(A) + ν(A^c) = ν(B) + ν(B^c) for any A, B ∈ A \ {∅, S}.
(iv) A = 2^S if and only if ν is additive on 2^S \ {S}.

See Appendix E for the proof.

18 The algebraic structure of A is similar to the structure of unambiguous events under some definitions (Nehring (1999)). This structure is arguably restrictive for unambiguous events, but does not carry these shortcomings for our interpretation. Moreover, the behavioral definitions that induce algebras in that literature are logically independent of our definition of transparent events.

Given a strictly admissible PDEU representation, an event A is transparent if every other event is additive with respect to its intersection and relative
complement with A. This is natural, since if A was already understood, mentioning it should have no effect on the judged likelihood of any other event. Moreover, the family A is an algebra. Thus (S, A) can be sensibly interpreted as the prior understanding of the state space before any descriptions. This understanding might vary across agents, that is, one decision maker might understand more events than another, but A can be elicited from preferences. Finally, the support function is additive over the transparent events. Since complementary weights sum to a constant number, if we redefine the value ν(S) ≡ ν(A) + ν(A^c) for an arbitrary A ∈ A \ {∅, S}, then (S, A, ν|A) defines a probability space after appropriate normalization.

EXAMPLE 4: Let π∗ = {A1, …, An} be a partition of the state space. Interpret π∗ as the decision maker's a priori understanding of the state space before any additional details are provided in the description. Suppose that when the state space is described as the partition π, the decision maker understands both the explicitly described events in π and those events in π∗ which she understood a priori. She then adapts the principle of insufficient reason over the refinement π ∨ π∗. In terms of the representation, this is captured by setting ν(E) = |{i : E ∩ Ai ≠ ∅}|. For example, a consumer might understand that chemotherapy, surgery, drugs, and behavioral counseling are possible treatments when purchasing health insurance, even if they are not specifically mentioned, but when a specific disease is mentioned, she applies the principle of insufficient reason over its relevant treatments. In this case, A is the algebra generated by π∗. Even if the prior understanding π∗ of the decision maker is unknown to the analyst, the example confirms that Definition 9 recovers π∗ from preferences.

The notion of transparency can also be defined relative to a partition. In other words, one can define the events A(π) that are understood once the partition π is announced to the decision maker. The operator A(π) has appealing properties across partitions. Under strictly admissible PDEU representations, A(π) has the properties of A described in Proposition 2 for every π. Details are in the online supplement.

5.2. Completely Overlooked Events

As a counterpoint to the events that are understood perfectly, we now discuss the events that are completely overlooked. In the unforeseen contingencies interpretation of our model, these will correspond to the completely unforeseen events.

DEFINITION 10: Fix {≽π}π∈Π∗. An event E ⊂ S is completely overlooked if E = ∅ or if, for all three-cell partitions {E, F, G} of S and p, q, r ∈ ΔX,

⎛p  E ∪ F⎞            ⎛p  F    ⎞
⎝q  G    ⎠  ∼ r  ⇐⇒  ⎝q  E ∪ G⎠  ∼ r.
In words, E is completely overlooked if the decision maker never puts any weight on E unless it is explicitly described to her. In the first comparison of the definition, she attributes all the likelihood of receiving p to F, because E carries no weight when it is not separately mentioned; in the second comparison, all the likelihood of q is similarly attributed to G. Due to the framing of both acts, E remains occluded and the certainty equivalents are equal because both appear to be bets on F or G.

It is important to notice that an event does not have to be either transparent in the sense of Definition 9 or completely overlooked. The two definitions represent extreme cases that admit many intermediate possibilities.

A completely overlooked event is distinct from a null event. Whenever E ∪ F ≠ S, the preference

⎛p  E ∪ F⎞     ⎛p′  E⎞
⎝q  G    ⎠  ≺  ⎜p   F⎟
               ⎝q   G⎠

is consistent with E being completely overlooked. Here, the presentation of the second act explicitly mentions E, at which point the decision maker assigns it some positive likelihood. In contrast, this strict preference is precluded whenever E is null, because then the decision maker would be indifferent to whether p or p′ was assigned to the impossible event E. On the other hand, all null events are completely overlooked. The event E might contribute no additional likelihood to E ∪ F for two reasons. First, the decision maker may have completely overlooked the event E when it was grouped as E ∪ F. Second, she may have actually considered its possibility, but concluded that E was impossible. These cases are behaviorally indistinguishable.

PROPOSITION 3: Suppose {≽π}π∈Π∗ satisfies strict admissibility and admits the PDEU representation (u, ν) where ν is monotone. Then the following statements hold:
(i) E is completely overlooked if and only if ν(E ∪ F) = ν(F) for any nonempty event F disjoint from E such that E ∪ F ≠ S.
(ii) If E and F are completely overlooked and E ∪ F ≠ S, then E ∩ F and E ∪ F are also completely overlooked.
(iii) If |S| ≥ 3 and all nonempty events are completely overlooked, then ν(E) = ν(F) for all nonempty E, F ≠ S.

The first part of the proposition relates completely overlooked events with their marginal contribution to the weighting function ν. The second part shows that the family of completely overlooked events has some desirable properties: closure under set operations is guaranteed when the sets do not cover all of S. The third part characterizes the principle of insufficient reason. This extreme
case where all nonempty events are completely overlooked is represented by a constant support function where ν(E) = 1 for every nonempty E. The decision maker places a uniform distribution over the events that are explicitly mentioned in a description π.19

19 In fact, part (iii) of Proposition 3 can be strengthened to the following statements: if two disjoint sets E and F, with E ∪ F ≠ S, are completely overlooked, then the principle of insufficient reason is applied to subevents of their union: ν(D) = ν(D′) for all D, D′ ⊂ E ∪ F. Then E ∪ F can be considered an area of the state space of which the decision maker has no understanding.

APPENDIX A: PRELIMINARY OBSERVATIONS

In this section we state and prove a set of preliminary lemmas and a uniqueness result for general Π. We note that the results in this section apply to both the case where Π is a filtration and the case where Π is the set of all finite partitions. We first state, without proof, the straightforward observation that the first five axioms provide a simple analog of the Anscombe–Aumann expected utility theorem.

LEMMA 1: The collection {≽π}π∈Π satisfies the Anscombe–Aumann axioms if and only if there exist an affine utility function u : ΔX → R with [−1, 1] ⊂ u(ΔX) and a unique family of probability measures {μπ}π∈Π with μπ : σ(π) → [0, 1] such that

f ≽π g  ⇐⇒  ∫_S u∘f dμπ ≥ ∫_S u∘g dμπ
for any f, g ∈ Fπ.

The next lemma states that the sure-thing principle is necessary for a PDEU representation.

LEMMA 2: If {≽π}π∈Π admits a partition-dependent expected utility representation, then ≽ satisfies the sure-thing principle.

PROOF: For any f, g ∈ F, note that D(f, g) ≡ {s ∈ S : f(s) ≠ g(s)} ∈ σ(π(f, g)); hence

f ≽ g  ⇐⇒  f ≽π(f,g) g
       ⇐⇒  ∫_{D(f,g)} u∘f dμπ(f,g) ≥ ∫_{D(f,g)} u∘g dμπ(f,g)
       ⇐⇒  ∑_{F∈π(f,g): F⊂D(f,g)} u(f(F))ν(F) ≥ ∑_{F∈π(f,g): F⊂D(f,g)} u(g(F))ν(F),
where the second equivalence follows from multiplying both sides by ∑_{F′∈π(f,g)} ν(F′). Now, to demonstrate the sure-thing principle, let E ∈ E and f, g, h, h′ ∈ F. Let

f̂ = ⎛f  E  ⎞,   ĝ = ⎛g  E  ⎞,   f̂′ = ⎛f   E  ⎞,   ĝ′ = ⎛g   E  ⎞.
    ⎝h  E^c⎠        ⎝h  E^c⎠         ⎝h′  E^c⎠         ⎝h′  E^c⎠

Note that D ≡ D(f̂, ĝ) = D(f̂′, ĝ′) ⊂ E and πD ≡ {F ∈ π(f̂, ĝ) : F ⊂ D(f̂, ĝ)} = {F ∈ π(f̂′, ĝ′) : F ⊂ D(f̂′, ĝ′)}. Hence, by the observation made in the first paragraph,

f̂ ≽ ĝ  ⇐⇒  ∑_{F∈πD} u(f̂(F))ν(F) ≥ ∑_{F∈πD} u(ĝ(F))ν(F)
        ⇐⇒  ∑_{F∈πD} u(f(F))ν(F) ≥ ∑_{F∈πD} u(g(F))ν(F)
        ⇐⇒  ∑_{F∈πD} u(f̂′(F))ν(F) ≥ ∑_{F∈πD} u(ĝ′(F))ν(F)
        ⇐⇒  f̂′ ≽ ĝ′.
Q.E.D.
The next lemma summarizes the general implications of the Anscombe–Aumann axioms and the sure-thing principle.

LEMMA 3: Assume that {≽π}π∈Π satisfies the Anscombe–Aumann axioms and the sure-thing principle. Then {≽π}π∈Π admits a representation (u, {μπ}π∈Π) as in Lemma 1. For any events E, F ∈ C and partitions π, π′ ∈ Π, the following statements hold:
(i) If E ∈ π ∩ π′, then μπ(E) = 0 ⇔ μπ′(E) = 0.
(ii) If E, F ∈ π ∩ π′ and E ∩ F = ∅, then μπ(E)μπ′(F) = μπ(F)μπ′(E).

PROOF: To prove part (i), it suffices to show that if E ∈ π ∩ π′, then μπ(E) = 0 ⇒ μπ′(E) = 0. Suppose that μπ(E) = 0. Select any two lotteries p, q ∈ ΔX that satisfy u(p) > u(q), and select any two acts h, h′ ∈ F such that π(h) = π and π(h′) = π′. Then

⎛p  E  ⎞     ⎛q  E  ⎞
⎝h  E^c⎠  ∼  ⎝h  E^c⎠
by Lemma 1. Hence

⎛p   E  ⎞     ⎛q   E  ⎞
⎝h′  E^c⎠  ∼  ⎝h′  E^c⎠

by the sure-thing principle. Since u(p) > u(q), the last indifference can hold only if μπ′(E) = 0 by Lemma 1.

To prove part (ii), observe that if either side of the desired equality is zero, then part (ii) is immediately implied by part (i). So now assume that both sides are strictly positive. Then all of the terms μπ(E), μπ(F), μπ′(F), and μπ′(E) are strictly positive. As before, select any two lotteries p, q ∈ ΔX such that u(p) > u(q) and define a new lottery r by

r = μπ(E)/[μπ(E) + μπ(F)] · p + μπ(F)/[μπ(E) + μπ(F)] · q.

Select any two acts h, h′ ∈ F such that p, q, r ∉ h(S) ∪ h′(S), π(h) = π, and π(h′) = π′. By the choice of r and the expected utility representation of ≽π, we have

⎛p  E         ⎞     ⎛r  E ∪ F     ⎞
⎜q  F         ⎟  ∼  ⎝h  (E ∪ F)^c ⎠.
⎝h  (E ∪ F)^c ⎠

Hence by the sure-thing principle,

⎛p   E         ⎞     ⎛r   E ∪ F     ⎞
⎜q   F         ⎟  ∼  ⎝h′  (E ∪ F)^c ⎠.
⎝h′  (E ∪ F)^c ⎠

This indifference, in conjunction with the expected utility representation of ≽π′, implies that

u(r) = μπ′(E)/[μπ′(E) + μπ′(F)] · u(p) + μπ′(F)/[μπ′(E) + μπ′(F)] · u(q).

We also have

u(r) = μπ(E)/[μπ(E) + μπ(F)] · u(p) + μπ(F)/[μπ(E) + μπ(F)] · u(q)

by the definition of r. Subtracting u(q) from each side of the prior two expressions for u(r) above, we obtain

μπ(E)/[μπ(E) + μπ(F)] · [u(p) − u(q)] = μπ′(E)/[μπ′(E) + μπ′(F)] · [u(p) − u(q)],
which further simplifies to

μπ(F)/μπ(E) = μπ′(F)/μπ′(E),

since both sides of the previous equality are strictly positive.
Q.E.D.
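Lemma 3(ii) says that the within-partition odds of E against F do not depend on which common partition carries them; under PDEU both sides reduce to ν(E)/ν(F). A quick numerical check (our own illustration; the support values are made up):

    nu = {"E": 1.2, "F": 0.7, "G": 2.0, "H": 0.4, "GH": 3.1}

    def mu(partition, cell):
        """Equation (1): normalize nu over the cells of the partition."""
        return nu[cell] / sum(nu[c] for c in partition)

    pi = ["E", "F", "GH"]            # E, F described alongside the coarse cell G ∪ H
    pi_prime = ["E", "F", "G", "H"]  # E, F described alongside a finer complement

    lhs = mu(pi, "E") * mu(pi_prime, "F")
    rhs = mu(pi, "F") * mu(pi_prime, "E")
    print(abs(lhs - rhs) < 1e-12)  # True: the odds of E to F are description-free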
By part (i) of Lemma 3, for all π, π′ ∈ Π, an event E ∈ π ∩ π′ is π-null if and only if it is π′-null. Hence under the Anscombe–Aumann axioms and the sure-thing principle, we can change quantifiers in the definitions of null and nonnull events in C. An event E ∈ C is null if and only if E is π-null for some partition π ∈ Π with E ∈ π. Similarly, an event E ∈ C is nonnull if and only if E is π-nonnull for every partition π ∈ Π with E ∈ π.20

20 Note that ∅ is null and S is nonnull by nondegeneracy. Also, there may exist a nonnull event E ∈ C, which is π-null for some π ∈ Π such that E ∈ σ(π). From the above observation concerning the quantifiers, this can only be possible if E is not a cell in π, but a union of its cells. This would correspond to a representation where, for example, E is a union of two disjoint subevents E = E1 ∪ E2 and ν(E) > 0, yet ν(E1) = ν(E2) = 0.

We will next state and prove a general uniqueness result that will imply the uniqueness Theorems 2 and 4. To do so, we first need to generalize event reachability so that it applies to our general model.

AXIOM 11—Generalized Event Reachability: For any distinct nonnull events E, F ∈ C \ {S}, there exists a sequence of nonnull events E1, …, En ∈ C such that E = E1, F = En, and, for each i = 1, …, n − 1, there is π ∈ Π such that Ei, Ei+1 ∈ π.

Note that when Π is the set of all finite partitions, generalized event reachability is equivalent to event reachability.

LEMMA 4: Assume that {≽π}π∈Π admits a PDEU representation (u, ν). Then the following statements are equivalent:
(i) {≽π}π∈Π satisfies generalized event reachability.
(ii) If (u′, ν′) also represents {≽π}π∈Π, then there exist numbers a, c > 0 and b ∈ R such that u′(p) = au(p) + b for all p ∈ ΔX and ν′(E) = cν(E) for all E ∈ C \ {S}.

PROOF: Assume that {≽π}π∈Π admits the PDEU representation (u, ν). Let C∗ denote the set of nonnull events in C. The collection C∗ is nonempty since nondegeneracy ensures that S ∈ C∗. Define the binary relation ≈ on C∗ by E ≈ F if there exists a sequence of events E1, …, En ∈ C∗ with E = E1, F = En and, for each i = 1, …, n − 1, there is π ∈ Π such that Ei, Ei+1 ∈ π. The relation ≈ is reflexive, symmetric, and transitive, defining an equivalence relation on C∗. For any E ∈ C∗, let [E] = {F ∈ C∗ : E ≈ F} denote the equivalence class of
E with respect to ≈. Let C∗/≈ = {[E] : E ∈ C∗} denote the quotient set of all equivalence classes of C∗ modulo ≈, with a generic class R ∈ C∗/≈. Note that, given the above definitions, event reachability is equivalent to C∗/≈ consisting of two equivalence classes {S} and C∗ \ {S}.

We first show the (i) ⇒ (ii) part. Suppose that (u′, ν′) is a PDEU representation of {≽π}π∈Π and that generalized event reachability is satisfied. For each π ∈ Π, let μπ and μ′π, respectively, denote the probability distributions derived from ν and ν′ by Equation (1). Applying the uniqueness component of the Anscombe–Aumann expected utility theorem to ≽π, we have μπ = μ′π and u′ = au + b for some a > 0 and b ∈ R. If E ∈ C is null, then ν(E) = μπ(E) = 0 = μ′π(E) = ν′(E) for any π ∈ Π with E ∈ π. Also note that if E, F ∈ C∗ are such that there exists π ∈ Π with E, F ∈ π, then

ν(E)/ν(F) = μπ(E)/μπ(F) = μ′π(E)/μ′π(F) = ν′(E)/ν′(F).
We next extend the equality ν(E)/ν(F) = ν′(E)/ν′(F) to any pair of events E, F ∈ C∗ \ {S}, so as to conclude that there exists c > 0 such that ν′(E) = cν(E) for all E ∈ C \ {S}. Let E, F ∈ C∗ \ {S}. By generalized event reachability, there exist E1, …, En ∈ C∗ such that E = E1, F = En and, for each i = 1, …, n − 1, there is π ∈ Π such that Ei, Ei+1 ∈ π. Then

ν(E)/ν(F) = ν(E1)/ν(E2) × · · · × ν(En−1)/ν(En) = ν′(E1)/ν′(E2) × · · · × ν′(En−1)/ν′(En) = ν′(E)/ν′(F),

where the middle equality follows from the existence of π ∈ Π such that Ei, Ei+1 ∈ π for each i = 1, …, n − 1. Thus ν′ is a scalar multiple of ν on C∗ \ {S}, determined by the constant c = ν′(E)/ν(E) for any E ∈ C∗ \ {S}.

To see the (i) ⇐ (ii) part, suppose that generalized event reachability is not satisfied. Then the relation ≈ defined above has at least two distinct equivalence classes R and R′ that are different from {S}. Define ν′ : C → R+ by

ν′(E) = ν(E) if E ∈ R,   ν′(E) = 2ν(E) otherwise,

for E ∈ C. Take any π ∈ Π. If π ∩ R ≠ ∅, then ν′(E) = ν(E) for all E ∈ π. If π ∩ R = ∅, then ν′(E) = 2ν(E) for all E ∈ π. Hence (u, ν) and (u, ν′) are two partition-dependent expected utility representations of {≽π}π∈Π such that there does not exist a c > 0 with ν′(E) = cν(E) for all E ∈ C \ {S}. Q.E.D.

APPENDIX B: PROOFS FOR SECTION 4.2

PROOF OF THEOREM 1: Necessity is implied by Lemmas 1 and 2. We now prove sufficiency. Let u and {μπ}π∈Π be as guaranteed by Lemma 1. We de-
fine ν on ⋃_{t=0}^k πt recursively on k ≥ 0, which will define ν on the whole C = ⋃_{t=0}^T πt.21

Step 0: Let ν(S) := c0 for an arbitrary constant c0 > 0.

Step 1: For all E ∈ π1, set ν(E) := c1 μπ1(E) for an arbitrary constant c1 > 0.

Step k + 1 (k ≥ 0): Assume the following inductive assumptions:
(i) The nonnegative set function ν has already been defined on ⋃_{t=0}^k πt.
(ii) For all t = 0, 1, …, k, ∑_{E′∈πt} ν(E′) > 0 (i.e., nondegeneracy is satisfied).
(iii) For all t = 0, 1, …, k and for all E ∈ πt, μπt(E) = ν(E)/∑_{E′∈πt} ν(E′).

Case 1. Assume that there exists E∗ ∈ πk ∩ πk+1 such that μπk(E∗) > 0. Then by Lemma 3, μπk+1(E∗) > 0 and by the inductive assumption, ν(E∗) > 0. For all E ∈ πk+1 \ πk = πk+1 \ (⋃_{t=1}^k πt) (the equality is because we have a filtration), define ν(E) by

(2)    ν(E) = ν(E∗)/μπk+1(E∗) · μπk+1(E).
Equation (2) also holds (as an equation rather than a definition) for E ∈ πk+1 ∩ πk, since

ν(E)/ν(E∗) = μπk(E)/μπk(E∗) = μπk+1(E)/μπk+1(E∗),

where the first equality is by the inductive assumption and the second is by Lemma 3. It is now easy to verify that ν satisfies (i), (ii), and (iii) on ⋃_{t=1}^{k+1} πt.

Case 2. Assume that for all E ∈ πk ∩ πk+1, μπk(E) = 0. Let ck+1 > 0 be an arbitrary constant and for all E ∈ πk+1 \ πk = πk+1 \ (⋃_{t=1}^k πt), define ν(E) by

(3)    ν(E) = ck+1 μπk+1(E).
Equation (3) actually also holds (as an equation rather than a definition) for E ∈ πk+1 ∩ πk, since for all such E, μπk(E) = 0; hence by Lemma 3, μπk+1(E) = 0 and by the inductive assumption, ν(E) = 0. It is now easy to verify that ν satisfies (i), (ii), and (iii) on ⋃_{t=1}^{k+1} πt. Q.E.D.

21 The ck constants in the iterative definition show just how flexible we are in defining ν, which also hints at the role of gradualness in guaranteeing uniqueness. In the iterative definition, Step 1 is a subcase of the subsequent step; however, we prefer to write it down explicitly because it is substantially simpler.

PROOF OF THEOREM 2: In light of the general uniqueness result Lemma 4, we only need to prove that generalized event reachability is equivalent to gradualness for filtrations. Suppose that {≽πt}_{t=0}^T admits a PDEU representation (u, ν).
First assume that {πt}_{t=0}^T is gradual with respect to {≽πt}_{t=0}^T. Let E, F ∈ C \ {S} be distinct nonnull events. Then there exist πi, πj such that 0 < i, j ≤ T, E ∈ πi, and F ∈ πj. Without loss of generality, let i ≤ j, let Ei−1 := E, Ej := F, and, for each t ∈ {i, i + 1, …, j − 1}, let Et ∈ πt ∩ πt+1 be a πt-nonnull event as guaranteed by gradualness. Then Ei−1, Ei, Ei+1, …, Ej ∈ C is a sequence of nonnull events such that E = Ei−1, F = Ej, and Et, Et+1 ∈ πt+1 ∈ Π for each t = i − 1, i, …, j − 1. Hence generalized event reachability is satisfied.

Now assume that generalized event reachability is satisfied. Let 0 < t∗ < T. By nondegeneracy, there exist a πt∗-nonnull event E ∈ πt∗ and a πt∗+1-nonnull event F ∈ πt∗+1. Then E, F ∈ C \ {S} are nonnull; hence by generalized event reachability, there exists a sequence of nonnull events E1, …, En ∈ C such that E = E1, F = En, and, for each i = 1, …, n − 1, there is t such that Ei, Ei+1 ∈ πt. For each i = 1, …, n, let t̲(i) = min{t : Ei ∈ πt} and t̄(i) = sup{t : Ei ∈ πt}.22 Then Ei ∈ πt if and only if t̲(i) ≤ t ≤ t̄(i). Note that t̲(1) ≤ t∗ ≤ t̄(1), t̲(n) ≤ t∗ + 1 ≤ t̄(n), and t̲(i + 1) ≤ t̄(i) for i = 1, …, n − 1. Hence t̲(i) ≤ t∗ and t∗ + 1 ≤ t̄(i) for some i = 1, …, n. Then Ei ∈ πt∗ ∩ πt∗+1 and Ei is nonnull; hence Ei is πt∗-nonnull by Lemma 3. We conclude that {πt}_{t=0}^T is gradual with respect to {≽πt}_{t=0}^T. Q.E.D.

22 We use supremum here since this value can be +∞.

APPENDIX C: PROOFS FOR SECTION 4.3

PROOF OF PROPOSITION 1: For the necessity part, assume that {≽π}π∈Π∗ admits a partition-independent expected utility representation (u, ν). Note that f ≽ g if and only if ∫_S u∘f dν ≥ ∫_S u∘g dν for any f, g ∈ F. Thus ≽ is transitive, hence acyclic. The necessity of the Anscombe–Aumann axioms follows immediately from the standard Anscombe–Aumann expected utility theorem.

Now turning to sufficiency, assume that {≽π}π∈Π∗ satisfies the Anscombe–Aumann axioms and acyclicity. Let u and {μπ}π∈Π∗ be as guaranteed by Lemma 1. We first show that, for all π ∈ Π∗ \ {{S}} and E ∈ π,

(4)    μπ(E) = μ{E,E^c}(E).
Suppose for a contradiction that μπ(E) > μ{E,E^c}(E) in (4). Let μπ(E) > α > μ{E,E^c}(E). Since the range of u contains the interval [−1, 1], there exist p, q ∈ ΔX such that u(p) = 1 and u(q) = 0. Define the act h by

h = ⎛p  E  ⎞.
    ⎝q  E^c⎠

Note that αp + (1 − α)q ≻ h. Let f ∈ F be such that π(f) = π and, for all s ∈ S, u(f(s)) < 0. Then there exists ε ∈ (0, 1) such that the act hε ≡ (1 − ε)h + εf
satisfies π(hε) = π and hε ≻π αp + (1 − α)q. Then h ≻ hε ≻ αp + (1 − α)q ≻ h, a contradiction to ≻ being acyclic. The argument for the case where μπ(E) < μ{E,E^c}(E) is entirely symmetric, hence omitted.

Define ν : 2^S → [0, 1] by ν(∅) ≡ 0, ν(S) ≡ 1, and ν(E) ≡ μ{E,E^c}(E) for E ≠ ∅, S. To see that ν is finitely additive, let E and F be nonempty disjoint sets. If E ∪ F = S, then F = E^c, so

ν(E) + ν(F) = μ{E,E^c}(E) + μ{E,E^c}(E^c) = 1 = ν(E ∪ F).

If E ∪ F ⊊ S, let π = {E, F, (E ∪ F)^c} and π′ = {E ∪ F, (E ∪ F)^c}. Then by (4),

ν(E) + ν(F) = μπ(E) + μπ(F) = 1 − μπ((E ∪ F)^c) = 1 − μπ′((E ∪ F)^c) = μπ′(E ∪ F) = ν(E ∪ F).

Therefore ν is additive. To conclude, note that for any π ∈ Π∗, the definition of ν and (4) imply that μπ(E) = ν(E) for all E ∈ π. Hence (u, ν) is a partition-independent representation of {≽π}π∈Π∗. Q.E.D.

PROOF OF THEOREM 3: The necessity of the Anscombe–Aumann axioms follows from the standard Anscombe–Aumann expected utility theorem. The necessity of the sure-thing principle was established in Lemma 2. We now establish the necessity of binary bet acyclicity.

LEMMA 5: If {≽π}π∈Π∗ admits a PDEU representation, then it satisfies binary bet acyclicity.

PROOF: First note that for any (possibly empty) disjoint events E and F, and (not necessarily distinct) lotteries p, q, r ∈ ΔX, we have

⎛p  E  ⎞     ⎛r  F  ⎞
⎝q  E^c⎠  ≽  ⎝q  F^c⎠   ⇐⇒   [u(p) − u(q)]ν(E) ≥ [u(r) − u(q)]ν(F).

To see the necessity of binary bet acyclicity, let E1, …, En, E1 be a sequentially disjoint cycle of events and let p1, p2, …, pn; q ∈ ΔX be such that

⎛pi  Ei  ⎞     ⎛pi+1  Ei+1  ⎞
⎝q   Ei^c⎠  ≻  ⎝q     Ei+1^c⎠     ∀i = 1, …, n − 1.

The observation made in the first paragraph implies that

[u(p1) − u(q)]ν(E1) > [u(p2) − u(q)]ν(E2) > · · · > [u(pn) − u(q)]ν(En).
684
D. S. AHN AND H. ERGIN
Since [u(p1 ) − u(q)]ν(E1 ) > [u(pn ) − u(q)]ν(En ), we conclude that pn En p1 E1 q E1 q En
Q.E.D.
We next prove the sufficiency part. Suppose that {π }π∈Π ∗ satisfies the Anscombe–Aumann axioms, the sure-thing principle, and binary bet acyclicity. Let (u {μπ }π∈Π ∗ ) be a representation of {π }π∈Π ∗ guaranteed by Lemma 1. For any two disjoint nonnull events E and F , define the ratio E μπ (E) ≡ F μπ (F) where π is a partition such that E F ∈ π. By part (ii) of Lemma 3, the value of E does not depend on the particular choice of π. Moreover, EF is well defined F and strictly positive since E and F are nonnull. Finally, EF × EF = 1 by construction. The following lemma appeals to binary bet acyclicity in generalizing this equality. LEMMA 6: Suppose that {π }π∈Π ∗ satisfies the Anscombe–Aumann axioms, the sure-thing principle, and binary bet acyclicity. Then, for any sequentially disjoint cycle of nonnull events E1 En E1 ∈ E , (5)
E1 E2 En−1 En × × ··· × × = 1 E2 E3 En E1
PROOF: Let (u {μπ }π∈Π ∗ ) be a representation of {π }π∈Π ∗ guaranteed by Lemma 1. We first show that for any p1 pn q ∈ ΔX such that u(q) = 0 and u(pi ) ∈ (0 1) for i = 1 n, pi+1 Ei+1 pi Ei (6) ∼ (∀i = 1 n − 1) q Ei q Ei+1 p1 E1 pn En ⇒ ∼ q E1 q En Note that it is enough to show that the hypothesis in Equation (6) above implies pn En p1 E1 q E1 q En Let ε¯ ∈ (0 1) be such that u(pi ) + ε¯ < 1 for i = 1 n. Since the range of the utility function u over lotteries contains the unit interval [−1 1], for each ε ∈ (0 ε) ¯ and i ∈ {1 n}, there exists pi (ε) ∈ ΔX such that u(pi (ε)) =
FRAMING CONTINGENCIES
685
u(pi ) + εi , where εi refers to the ith power of ε. The expected utility representation of Lemma 1 and the fact that Ei is nonnull implies that for sufficiently small ε ∈ (0 ε), ¯ pi+1 (ε) Ei+1 pi (ε) Ei q Ei q Ei+1 for i = 1 n − 1. By binary bet acyclicity, this implies pn (ε) En p1 (ε) E1 q E1 q En Appealing to the continuity of the expected utility representation of Lemma 1 in the assigned lotteries f (s) and taking ε → 0 proves the desired conclusion. We can now prove Equation (5). The case where n = 2 immediately follows from our definition of event ratios, so assume that n ≥ 3. Fix t1 > 0 and recursively define ti = t1 ×
E1 E2 Ei−1 × × ··· × E2 E3 Ei
for i = 2 n. By selecting a sufficiently small t1 , we may assume that t1 tn ∈ (0 1). Also note that ti+1 /ti = Ei /Ei+1 for i = 1 n − 1. Recall that the range of the utility function u over lotteries contains the unit interval [−1 1], so there exist lotteries p1 pn q ∈ ΔX such that u(pi ) = ti for i = 1 n and u(q) = 0. Fix any i ∈ {1 n − 1}. Let π = {Ei Ei+1 (Ei ∪ Ei+1 ) }. Since ti+1 /ti = Ei /Ei+1 , we have μπ (Ei+1 )u(pi+1 ) = μπ (Ei )u(pi ). Hence pi+1 Ei+1 pi Ei ∼ q Ei q Ei+1 by the expected utility representation of Lemma 1. Since the above indifference holds for any i ∈ {1 n − 1}, by Equation (6), we have p1 E1 pn En ∼ q E1 q En Hence by the expected utility representation of π for π = {E1 En (E1 ∪En ) }, we have μπ (E1 )u(p1 ) = μπ (En )u(pn ). This implies tn /t1 = E1 /En . Recalling the construction of tn , we then have the desired conclusion: E1 E2 En−1 E1 × × ··· × = E2 E3 En En
Q.E.D.
We can now conclude the proof of sufficiency. Assume that {π }π∈Π ∗ satisfies the Anscombe–Aumann axioms, the sure-thing principle, and binary bet
686
D. S. AHN AND H. ERGIN
acyclicity. Define C ∗ and ≈ as in the proof of Lemma 4. Let C ∗ denote the set of nonnull events in C . The collection C ∗ is nonempty, since nondegeneracy ensures that S ∈ C ∗ . Define the binary relation ≈ on C ∗ by E ≈ F if there exists a sequentially disjoint sequence of nonnull events E1 En ∈ C ∗ with E = E1 and F = En .23 The relation ≈ is reflexive, symmetric, and transitive. So ≈ is an equivalence relation on C ∗ . For any E ∈ C ∗ , let [E] = {F ∈ C ∗ : E ≈ F} denote the equivalence class of E with respect to ≈. Let C ∗ / ≈ = {[E] : E ∈ C ∗ } denote the quotient set of all equivalence classes of C ∗ modulo ≈, with a generic class R ∈ C ∗ / ≈.24 Select a representative event GR ∈ R for every equivalence class R ∈ C ∗ / ≈, invoking the axiom of choice if the quotient is uncountable. We next define ν. For all null E ∈ C , let ν(E) = 0. For every class R ∈ C ∗ / ≈, arbitrarily assign a positive value ν(GR ) > 0 for its representative. We conclude by defining ν(E) for any E ∈ C ∗ \ {S}. If E = G[E] , then E represents its equivalence class and ν(E) has been assigned. Otherwise, whenever E = G[E] , since E ≈ G[E] , there exists a sequentially disjoint path of nonnull events E1 En ∈ C ∗ such that E = E1 and G[E] = En . Then let ν(E) =
E1 En−1 × ··· × × ν G[E] E2 En
Note that the definition of ν(E) above is independent of the particular choice of the path E1 En , because for any other such sequentially disjoint path of nonnull events E = F1 Fm = G[E] , E1 En−1 Fm F2 × ··· × × × ··· × =1 E2 En Fm−1 F1 by Lemma 6. We next verify that ν : C \ {S} → R+ defined above is a nondegenerate set function that satisfies (7)
ν(E) μπ (E) = ν(F) F∈π
for any event E ∈ π of any partition π ∈ Π ∗ \ {{S}}. Let π ∈ Π ∗ \ {{S}}. By nondegeneracy and the expected utility representation for π , there exists a π-nonnull F ∈ π. Then, since Lemma 3 implies that π-nonnull events in C are nonnull, F is nonnull so the denominator on the right hand side of Equation (7) is strictly positive and so the fraction is well 23 Note that this definition slightly differs from that used in the general uniqueness result (Lemma 4). The two definitions can easily be verified as equivalent, since Π is the set of all finite partitions. 24 Note that [S] = {S} and E ≈ F for any disjoint nonnull E and F .
FRAMING CONTINGENCIES
687
defined. This also implies that ν is a nondegenerate set function. Observe that Equation (7) immediately holds if E is null, since then ν(E) = 0 and μπ (E) = 0 follows from E being π-null. Let Cπ∗ ⊂ π denote the nonnull cells of π. To finish the proof of the theorem, we show that (μπ (E))/(μ π (F)) = (ν(E))/(ν(F)) for any distinct E F ∈ Cπ∗ . Along with the fact that E∈Cπ∗ μπ (E) = 1, this will prove Equation (7). Let E F ∈ Cπ∗ be distinct. Note that [E] = [F] since E and F are disjoint. Suppose first that neither E nor F is G[E] . Then there exists a sequentially disjoint path of nonnull events E1 En ∈ C ∗ such that E = E1 , G[E] = En and ν(E) =
E1 En−1 × ··· × × ν G[E] E2 En
Then F E1 En = G[E] forms such a path from F to G[E] , hence we have ν(F) =
F E1 En−1 × × ··· × × ν G[E] E1 E2 En
Dividing the term for ν(E) by the term for ν(F), we obtain EF = ν(E) . ν(F) The other possibility is that exactly one of E or F (without loss of generality E) is G[E] . Then the nonnull events F = E1 and E2 = E make up a path from F to E = G[E] . Then ν(F) =
F × ν(E) E Q.E.D.
as desired. APPENDIX D: PROOF OF THEOREM 5
Part (i) follows from Lemma 1 and Lemma 3 in Appendix A. In part (ii), if {π }π∈Π ∗ satisfies binary bet acyclicity, then it has a PDEU representation, implying the product rule. The next lemma shows that the product rule is also sufficient for a PDEU representation, establishing the other direction of Theorem 5(ii). LEMMA 7 —Tversky and Koehler (1994), Nehring (2008): Suppose that {π }π∈Π ∗ satisfies the Anscombe–Aumann axioms, the sure-thing principle, and strict admissibility. Let u : ΔX → R, {μπ }π∈Π ∗ , and R be as in Theorem 5(i). Then the product rule implies that there exists a strictly positive support function ν such that μπ (E) = (ν(E))/( F∈π ν(F)), for any π ∈ Π ∗ and E ∈ π. We will show that the above lemma follows from the proof of Theorem 1 in Tversky and Koehler (1994). The general idea is first to establish a natural correspondence between probability judgments P (which are the primitive of their analysis) and event ratios R, and then to translate Tversky and
688
D. S. AHN AND H. ERGIN
Koehler’s (1994) axioms and arguments to event ratios. We also argue that a key assumption of Tversky and Koehler (1994) on probability judgments— proportionality—is implied by our construction of event ratios using the Anscombe–Aumann axioms and the sure-thing principle. Throughout the remainder of this section, we assume strict admissibility, which is also implicitly assumed in Tversky and Koehler (1994). Remember that for any two disjoint nonempty events A and B, R(A B) ≡ AB and in Tverν(A) .25 Therefore, the sky and Koehler’s (1994) representation, P(A B) = ν(A)+ν(B) probability judgment function P is related to event ratios via (8)
A P(A B) = B P(B A)
(9)
P(A B) =
1 B 1+ A
where A and B are nonempty disjoint events. Tversky and Koehler (1994) also used the operation A ∨ B for explicit disjunction of disjoint nonempty events A and B. Then the term P(A B ∨ C) is naturally related to event ratios via (10)
P(A B ∨ C) =
1 C B 1+ + A A
where A, B, and C are nonempty disjoint events.26 We next state Tversky and Koehler’s (1994) proportionality axiom on P (see Tversky and Koehler (1994, Equation (4), p. 549). AXIOM 12—Proportionality: For all pairwise disjoint nonempty events A, B, and C, P(A B) P(A B ∨ C) = P(B A) P(B A ∨ C) Given Equations (9) and (10) and AB = 1/( AB ) for disjoint nonempty events A and B, one can equivalently express the proportionality axiom in terms of event ratios. 25 Tversky and Koehler (1994) distinguished between the collection of hypotheses H and the collection of events 2S . They assumed that every hypothesis A ∈ H corresponds to a unique event A ∈ 2S , and defined the functions P(· ·) and ν(·) on hypotheses rather than events. For simplicity of exposition, we directly work with events rather than hypotheses. 26 Note that the object B ∨ C that denotes the explicit disjunction of B and C is not an event. B C Intuitively, P(A B ∨ C) = 1/(1 + B∨C ) where B∨C is naturally associated with A +A , yielding A A Equation (10).
FRAMING CONTINGENCIES
689
AXIOM 13—Proportionality: For all pairwise disjoint nonempty events A, B, and C, AB A = BC C Under the assumptions of the lemma, event ratios satisfy proportionality since π = {A B C (A ∪ B ∪ C) } is a partition and ABC μπ (A) μπ (B) μπ (C) = = 1 BCA μπ (B) μπ (C) μπ (A) Therefore, the probability judgment function also satisfies proportionality. We = 1 for any nonempty event A. adopt the convention that A A PROOF OF LEMMA 7: We next prove a verbatim adaption of the proof of Theorem 1 in Tversky and Koehler (1994). To establish sufficiency, we define ν as follows. Let S = {{a} : a ∈ S} be the set of singleton events.27 Select some D∗ ∈ S and set ν(D∗ ) = 1. For any other singleton event C ∈ S, such that C = D∗ , define ν(C) = DC∗ . Given any event A ∈ 2S such that A = S ∅, select some C ∈ S such that A ∩ C = ∅ and define ν(A) through ν(A) A = ν(C) C that is, ν(A) =
A C C D∗
To demonstrate that ν(A) is uniquely defined, suppose B ∈ S \ {C} and A ∩ B = ∅. We want to show that (11)
A B A C = ∗ CD B D∗
If D∗ = B or D∗ = C, then Equation (11) directly follows from proportionality. If, on the other hand, D∗ ∩ B = D∗ ∩ C = ∅, then by repeated application of proportionality, A A B A B D∗ = = C BC B D∗ C Tversky and Koehler (1994) called a hypothesis A elementary if the associated event A is a singleton. Therefore, the collection of singleton events S above takes the role of the collection of elementary hypotheses E in their proof. 27
690
D. S. AHN AND H. ERGIN
proving Equation (11). To complete the definition of ν, let ν(∅) = 0 and fix ν(S) > 0 arbitrarily. To establish the desired representation, we first show that for any disjoint events A and B such that A B = S ∅, we have ν(A)/ν(B) = AB . Two cases must be considered. First suppose that A ∪ B = S; hence, there exists a singleton event C ∈ S such that A ∩ C = B ∩ C = ∅. In this case, ν(A) A B AC A = ν(C) ν(C) = = ν(B) C C CB B by proportionality. Second, suppose A∪B = S. In this case, there is no C ∈ S that is not included in either A or B, so the preceding argument cannot be applied. To show that (ν(A))/(ν(B)) = AB , suppose C D ∈ S and A ∩ C = B ∩ D = ∅. Hence, ν(A) ν(A)ν(C)ν(D) = ν(B) ν(C)ν(D)ν(B) ACD CDB A (by the product rule). = B =
For any pair of disjoint events, therefore, we obtain AB = (ν(A))/(ν(B)) and ν is unique up to a choice of unit which is determined by ν(D∗ ). It is easy to see that this implies that μπ (E) = ν(E)ν(F) for any π ∈ Π ∗ and E ∈ π. Q.E.D. F∈π
APPENDIX E: PROOFS FOR SECTION 5 PROOF OF PROPOSITION 2: (i) To see the ⇒ part of (i), assume that A ∈ A and let E be any event. Assume without loss of generality that E = ∅. Consider the partition π = {E E ∩ A E ∩ A }. Since E = S, the sets E ∩ A and E ∩ A cannot both be empty. Hence by strict admissibility ν(E ∩ A) + ν(E ∩ A ) > 0. Assume without loss of generality that [0 1] ⊂ u(ΔX) and let p q r ∈ ΔX be such that u(p) = 1, u(q) = 0, and (12)
u(r) =
ν(E) ν(E) + ν(E ∩ A) + ν(E ∩ A )
Define the act f by p E f= q E
FRAMING CONTINGENCIES
691
Then f ∈ Fπ and f ∼π r. Hence by A ∈ A we have that f ∼π∨{AA } r. Since π ∨ {A A } = {E ∩ A E ∩ A E ∩ A E ∩ A }, the last indifference implies that (13)
u(r) =
ν(E ∩ A) + ν(E ∩ A ) ν(E ∩ A) + ν(E ∩ A ) + ν(E ∩ A) + ν(E ∩ A )
By Equations (12), (13), and ν(E ∩ A) + ν(E ∩ A ) > 0, we conclude that ν(E) = ν(E ∩ A) + ν(E ∩ A ). To see the ⇐ part of (i), assume that ν(E) = ν(E ∩ A) + ν(E ∩ A ) for any event E = S. Take any π ∈ Π ∗ . If π is the trivial partition, then the desired conclusion follows trivially from state independence. So assume without loss of generality that π is nontrivial and let π = π ∨ {A A }. It suffices to show that μπ (F) = μπ (F) for all F ∈ π. To see this, note that ν(F) ν(F ∩ A) + ν(F ∩ A ) μπ (F) = = = μπ (F) ν(E) [ν(E ∩ A) + ν(E ∩ A )] E∈π
E∈π
where the middle equality follows from our assumption, and F = S and E = S since π is nontrivial. (ii) By definition, A is closed under complements and ∅ S ∈ A. It suffices to show that A is closed under intersections. Let A B ∈ A and take any event E = S. We have that ν(E) = ν(E ∩ A) + ν(E ∩ A ) = ν(E ∩ A ∩ B) + ν(E ∩ A ∩ B ) + ν(E ∩ A ) by part (i), A B ∈ A, and E E ∩ A = S. Similarly, we have that ν(E ∩ (A ∩ B) ) = ν(E ∩ (A ∩ B) ∩ A) + ν(E ∩ (A ∩ B) ∩ A ) = ν(E ∩ A ∩ B ) + ν(E ∩ A ) The two equalities above imply that ν(E) = ν(E ∩ A ∩ B) + ν(E ∩ (A ∩ B) ) Therefore, by part (i), A ∩ B ∈ A. (iii) We next prove the first part of (iii). Let A B ∈ A be disjoint events such that A ∪ B = S. Since A ∈ A, we have by part (i) that ν(A ∪ B) = ν([A ∪ B] ∩ A) + ν([A ∪ B] ∩ A ) = ν(A) + ν(B) Hence ν is additive on A \ {S}.
692
D. S. AHN AND H. ERGIN
To see the second part of (iii), let A B ∈ A \ {∅ S}. Note that ν(A) + ν(A ) = ν(A ∩ B) + ν(A ∩ B ) + ν(A ∩ B) + ν(A ∩ B ) by part (i) applied twice to B ∈ A and to A A = S. By the exact symmetric argument, and interchanging the roles of A and B, we also have that ν(B) + ν(B ) = ν(B ∩ A) + ν(B ∩ A ) + ν(B ∩ A) + ν(B ∩ A ) Hence ν(A) + ν(A ) = ν(B) + ν(B ) as desired. (iv) Immediately follows from parts (i) and (iii).
Q.E.D.
PROOF OF PROPOSITION 3: (i) The ⇐ part of (i) is easily seen to hold even without monotonicity of ν. To see the ⇒ part, assume that E is completely overlooked. If E = ∅, then the conclusion is immediate, so assume without loss of generality that E = ∅. Take any nonempty event F disjoint from E such that E ∪ F = S. Let G = S \ (E ∪ F) = ∅. We first show that (14)
ν(E ∪ F) ν(F) = ν(G) ν(E ∪ G)
The fractions above are well defined since strict admissibility guarantees that the denominators do not vanish. To see (14), let p q r ∈ ΔX be such that u(p) > u(q) and (15)
ν(G) ν(E ∪ F) u(p) + u(q) = u(r) ν(E ∪ F) + ν(G) ν(E ∪ F) + ν(G) p E∪F ⇐⇒ ∼ r q G
By E being completely overlooked, we have (16)
ν(F) ν(E ∪ G) u(p) + u(q) = u(r) ν(F) + ν(E ∪ G) ν(F) + ν(E ∪ G) p F ⇐⇒ ∼ r q E∪G
Since u(p) > u(q), (15) and (16) imply that ν(E ∪ F) ν(F) = ν(E ∪ F) + ν(G) ν(F) + ν(E ∪ G) which is equivalent to (14).
FRAMING CONTINGENCIES
693
By monotonicity of ν, we have that ν(F) ν(E ∪ F) ν(F) ≤ ≤ ν(E ∪ G) ν(G) ν(G) By Equation (14), all the weak equalities above are indeed equalities, hence, in particular, ν(F) = ν(E ∪ F) as desired. (ii) Assume that E and F are completely overlooked and E ∪ F = S. To see that E ∪ F is completely overlooked, let G be a nonempty event disjoint from E ∪ F such that E ∪ F ∪ G = S. Then G is disjoint from E and E ∪ G = S. By part (i), we have ν(E ∪ G) = ν(G). Moreover, E ∪ G is disjoint from F and E ∪ F ∪ G = S. Again by part (i), we have, ν(E ∪ F ∪ G) = ν(E ∪ G). Hence ν(E ∪ F ∪ G) = ν(G), as desired. To see that E ∩ F is completely overlooked, suppose that G is a nonempty event disjoint from E ∩ F such that [E ∩ F] ∪ G = S. We show that ν(G ∪ [E ∩ F]) = ν(G) by considering three cases. This will imply, by part (i), that E ∩ F is completely overlooked. Case 1. G ⊂ E. In this case, G \ F = ∅, for otherwise G ⊂ E ∩ F would not be disjoint from E ∩ F . Moreover, (G \ F) ∪ F = G ∪ F ⊂ E ∪ F = S, hence by part (i) we have that ν([G \ F] ∪ F) = ν(G \ F). By monotonicity (17)
ν(G) ≤ ν(G ∪ [E ∩ F]) ≤ ν(G ∪ F) = ν([G \ F] ∪ F) = ν(G \ F) ≤ ν(G)
Hence ν(G ∪ [E ∩ F]) = ν(G). Case 2. G ⊂ F . We again have that ν(G ∪ [E ∩ F]) = ν(G) by exactly the same argument as the one above, changing the roles of events E and F . Case 3. G \ E = ∅ and G \ F = ∅. It cannot be that both G ∪ E and G ∪ F are equal to S, because otherwise [G ∪ E] ∩ [G ∪ F] = G ∪ [E ∩ F] = S, contradicting the hypothesis. Assume without loss of generality that G ∪ F = S. Hence by part (i), we have that ν([G \ F] ∪ F) = ν(G \ F). By Equation (17) again, we conclude that ν(G ∪ [E ∩ F]) = ν(G). (iii) The ⇐ part of (iii) is easily seen to hold even without monotonicity of ν. We only prove the ⇒ part. We first show that ν(G) = ν(G ) if G = ∅ S. To see this, note that since there are at least three states, G or G is not a singleton. Without loss of generality, suppose that G has at least two elements and let {G1 G2 } be a two element partition of G. Then by part (i), ν(G) = ν(G1 ∪ G2 ) = ν(G1 ) = ν(G1 ∪ G ) = ν(G ) where the second equality follows because G2 and G1 ∪ G2 = S are completely unforeseen; the third equality follows because G and G1 ∪ G = S are completely unforeseen; and the fourth equality follows because G1 and G1 ∪G = S are completely unforeseen.
694
D. S. AHN AND H. ERGIN
Take any distinct events E F = ∅ S. If E \ F = ∅, then ν(E \ F) ≤ ν(E) = ν(E ) ≤ ν((E \ F) ) = ν(E \ F) where the inequalities follow from monotonicity of ν, hence ν(E) = ν(E \ F). Similarly ν(E \ F) ≤ ν(F ) = ν(F) ≤ ν((E \ F) ) = ν(E \ F) hence ν(F) = ν(E \ F) = ν(E) as desired. The case where F \ E = ∅ is entirely symmetric. Q.E.D. REFERENCES AHN, D. S., AND H. ERGIN (2010): “Supplement to ‘Framing Contingencies’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7019_proofs.pdf. [663] ANSCOMBE, F. J., AND R. J. AUMANN (1963): “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34, 199–205. [656] BOURGEOIS-GIRONDE, S., AND R. GIRAUD (2009): “Framing Effects as Violations of Extensionality,” Theory and Decision, 67, 385–404. [656] BRENNER, L. A., D. J. KOEHLER, AND Y. ROTTENSTREICH (2002): “Remarks on Support Theory: Recent Advances and Future Directions,” in Heuristics and Biases: The Psychology of Intuitive Judgment, ed. by T. Gilovich, D. Griffin, and D. Kahneman. New York: Cambridge University Press, 489–509. [657] COHEN, M., AND J.-Y. JAFFRAY (1980): “Rational Behavior Under Complete Ignorance,” Econometrica, 48, 1281–1299. [658] DEKEL, E., B. L. LIPMAN, AND A. RUSTICHINI (1998a): “Recent Development in Modeling Unforeseen Contingencies,” European Economic Review, 42, 523–542. [657] (2001): “Representing Preferences With a Unique Subjective State Space,” Econometrica, 69, 891–934. [657] EPSTEIN, L. G., M. MARINACCI, AND K. SEO (2007): “Coarse Contingencies and Ambiguity,” Theoretical Economics, 2, 355–394. [657] FISCHOFF, B., P. SLOVIC, AND S. LICHTENSTEIN (1978): “Fault Trees: Sensitivity of Estimated Failure Probabilities to Problem Representation,” Journal of Experimental Psychology: Human Perception and Performance, 4, 330–334. [657] FOX, C. R. (1999): “Strength of Evidence, Judged Probability, and Choice Under Uncertainty,” Cognitive Psychology, 38, 167–189. [671] FOX, C. R., AND Y. ROTTENSTREICH (2003): “Partition Priming in Judgment Under Uncertainty,” Psychological Science, 14, 195–200. [657] GHIRARDATO, P. (2001): “Coping With Ignorance: Unforeseen Contingencies and Non-Additive Uncertainty,” Economic Theory, 17, 247–276. [657] MUKERJI, S. (1997): “Understanding the Nonadditive Probability Decision Model,” Economic Theory, 9, 23–46. [657] NEHRING, K. (1999): “Capacities and Probabilistic Beliefs: A Precarious Coexistence,” Mathematical Social Sciences, 38, 196–213. [673] (2008): Personal Communication. [671,672,687] ROTTENSTREICH, Y., AND A. TVERSKY (1997): “Unpacking, Repacking, and Anchoring: Advances in Support Theory,” Psychological Review, 104, 406–415. [662] SAVAGE, L. J. (1954): The Foundations of Statistics. New York: Wiley. [656,665] TVERSKY, A., AND D. KAHNEMAN (1983): “Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment,” Psychological Review, 90, 293–315. [660,669]
FRAMING CONTINGENCIES
695
TVERSKY, A., AND D. J. KOEHLER (1994): “Support Theory: A Nonextensional Representation of Subjective Probability,” Psychological Review, 101, 547–567. [657,661-663,671,672,687-689]
Dept. of Economics, University of California, 508-1 Evans Hall 3880, Berkeley, CA 94720-3880, U.S.A.;
[email protected] and Dept. of Economics, Washington University in Saint Louis, Campus Box 1208, Saint Louis, MO 63130, U.S.A.;
[email protected]. Manuscript received March, 2007; final revision received July, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 697–718
NOTES AND COMMENTS CONSTRUCTING OPTIMAL INSTRUMENTS BY FIRST-STAGE PREDICTION AVERAGING BY GUIDO KUERSTEINER AND RYO OKUI1 This paper considers model averaging as a way to construct optimal instruments for the two-stage least squares (2SLS), limited information maximum likelihood (LIML), and Fuller estimators in the presence of many instruments. We propose averaging across least squares predictions of the endogenous variables obtained from many different choices of instruments and then use the average predicted value of the endogenous variables in the estimation stage. The weights for averaging are chosen to minimize the asymptotic mean squared error of the model averaging version of the 2SLS, LIML, or Fuller estimator. This can be done by solving a standard quadratic programming problem. KEYWORDS: Model averaging, instrumental variable, many instruments, two-stage least squares, LIML, Fuller’s estimator, higher order theory.
1. INTRODUCTION IN THIS PAPER, WE PROPOSE a new and flexible method of constructing optimal instruments for two stage least squares (2SLS), limited information maximum likelihood (LIML), and Fuller’s estimators of linear models when there are many instruments available. Donald and Newey (2001) and Newey (2007) proposed a selection criterion to select instruments in a way that balances higher order bias and efficiency. We show that the model averaging approach of Hansen (2007) can be applied to the first stage of the 2SLS estimator as well as to a modification of LIML and Fuller’s (1977) estimator. The benefits of model averaging mostly lie in a more favorable trade-off between estimator bias and efficiency relative to procedures that rely on a single set of instruments. In contrast to the existing literature on model averaging, the weights for averaging are not restricted to be positive. Our theoretical results show that for certain choices of weights, the model averaging 2SLS estimator (MA2SLS) controls higher order bias and achieves the same higher order rates of convergence as the Nagar (1959) and LIML estimators, and thus dominates conventional 2SLS procedures. Model averaging allowing for bias reduction requires a refined asymptotic approximation to the mean squared error (MSE) of the 2SLS estimator. We provide such an approximation by including terms 1 We thank Whitney Newey, Xiaohong Chen, Bruce Hansen, Jack Porter, Kaddour Hadri, Ken Ariga, Yuichi Kitamura, Kosuke Oya, Han Hong, and seminar participants at Columbia, Georgetown, USC, Yale, CIREQ, Berkeley, Hitotsubashi, Seoul National (SETA08), Kyoto, and Osaka as well as two anonymous referees for valuable comments and suggestions. Okui acknowledges financial support from the Hong Kong University of Science and Technology under Project DAG05/06.BM16 and from the Research Grants Council of Hong Kong under Project HKUST643907. We are solely responsible for all errors.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7444
698
G. KUERSTEINER AND R. OKUI
of the next higher order than the leading bias term in our MSE approximation. This approach provides a criterion that directly captures the trade-off between higher order variance and bias correction. Conventional instrument selection procedures often depend on an ad hoc ordering of the instruments. By allowing our model weights to be both positive and negative, we establish that the MA2SLS and corresponding LIML and Fuller estimators have the ability to select arbitrary subsets of instruments from an orthogonalized set of instruments. In other words, if there are certain orthogonal directions in the instrument space that are particularly useful for the first stage, our procedure is able to individually select these directions from the instrument set. The selection of relevant directions is done in a fully data-dependent and computationally efficient way without requiring any prior knowledge about the strength of the instruments. 2. DEFINITIONS AND IMPLEMENTATION Following Donald and Newey (2001), we consider the model (2.1)
yi = Yi βy + x1i βx + i = Xi β + i ηi Yi E[Yi |zi ] Xi = = f (zi ) + ui = + x1i x1i 0
i = 1 N
where yi is a scalar outcome variable, Yi is a d1 × 1 vector of endogenous variables, x1i is a vector of included exogenous variables, zi is a vector of exogenous variables (including x1i ), i and ui are unobserved random variables with second moments which do not depend on zi , and f (·) is an unknown function of z. Let fi = f (zi ). The set of instruments has the form ZMi ≡ (ψ1 (zi ) ψM (zi )) , where ψk s are functions of zi such that ZMi is an M × 1 vector of instruments. Let y = (y1 yN ) and define X, , f , and u similarly. First-stage prediction averaging is based on a weighting vector W , where M W = (w1 wM ) and m=1 wm = 1 for some M such that M ≤ N for any N. We note that W is implicitly indexed by the sample size N. In Section 4, we discuss in more detail the restrictions that need to be imposed on W and M, but we point out here that wm is allowed to take on positive and negative values. Let Zmi be the vector of the first m elements of ZMi , let Zm be the matrix M (Zm1 ZmN ) and let Pm = Zm (Zm Zm )−1 Zm . Define P(W ) = i=1 wm Pm . ˆ of β is defined as The MA2SLS estimator, β, (2.2)
βˆ = (X P(W )X)−1 X P(W )y
The definition of (2.2) can be extended to the LIML estimator. Let
(y − Xβ) Pm (y − Xβ) Λˆ m = min β (y − Xβ) (y − Xβ)
CONSTRUCTING OPTIMAL INSTRUMENTS
699
M ˆ and define Λ(W ) = m=1 wm Λˆ m The model averaging LIML (MALIML) esˆ of β then is defined as timator, β, (2.3)
ˆ ˆ )X X)−1 (X P(W )y − Λ(W )X y) βˆ = (X P(W )X − Λ(W
Similarly we consider a modification to Fuller’s (1977) estimator. Let ⎞ ⎛ α Λˆ m − (1 − Λˆ m ) ⎟ ⎜ N −m Λˇ m = ⎝ ⎠ α ˆ (1 − Λm ) 1− N −m ˇ where α is a constant chosen by the econometrician.2 Define Λ(W )= M ˇ Λ w . The model averaging Fuller estimator (MAFuller) now is defined m=1 m m as (2.4)
ˇ ˇ )X X)−1 (X P(W )y − Λ(W )X y) βˆ = (X P(W )X − Λ(W
We choose W to minimize the approximate MSE, Sλ (W ), defined in (4.1), of λ βˆ for some fixed λ ∈ Rd The optimal weight, denoted W ∗ , is the solution to minW ∈Ω Sλ (W ), where Ω is some set. We consider several versions of Ω which lead to different estimators. The MA2SLS, MALIML, and MAFuller estimators are unconstrained if Ω = ΩU = {W ∈ l1 |W 1M = 1}, where l1 is a space of absolutely summable sequences defined in Assumption 4. From a finite sample point of view, it may be useful to further constrain the weights W to lie in a compact set. This is achieved in Ω = ΩC = {W ∈ l1 |W 1M = 1; wm ∈ [−1 1] ∀m ≤ M}. If we allow only positive weights, then Ω = ΩP = {W ∈ l1 |W 1M = 1; wm ∈ [0 1] ∀m ≤ M}. We now discuss how to estimate Sλ (W ). Let β˜ denote some preliminary es˜ As pointed out in Donald timator of β, and define the residuals ˜ = y − X β. and Newey (2001), it is important that β˜ does not depend on the weighting vector W . We use the 2SLS estimator with the number of instruments selected by the first-stage Mallows criterion in simulations for MA2SLS, and use the corresponding LIML and Fuller estimator for MALIML and MAFuller. Let Hˆ be some estimator of H = f f/n. Let u˜ be some preliminary residual vector of the first-stage regression. Let u˜ λ = u˜ Hˆ −1 λ. Define σˆ 2 = ˜ ˜ /N
σˆ λ2 = u˜ λ u˜ λ /N
σˆ λ = u˜ λ ˜ /N
3 ˆ −1 ˆ ˆ 1λ uˆ M ˆ 1λ uˆ M Let uˆ m λ = (PM − Pm )X H λ and U = (u λ ) (u λ ). Let Γ be the M × M matrix whose (i j) element is min(i j) and let K = (1 2 M) . The 2 Popular choices are α = 1 or α = 4 See, for example, Hahn, Hausman, and Kuersteiner (2004). 3 Note that u˜ is the preliminary residual vector, but uˆ m λ s are the vectors of the differences of the residuals.
700
G. KUERSTEINER AND R. OKUI
criterion Sˆλ (W ) for choosing the weights for the MA2SLS defined in (2.2) is (2.5)
(K W )2 ˆ (W Γ W ) K W ˆ + bλ − BλN Sˆλ (W ) = aˆ λ N N N ˆ W UW − σˆ λ2 (M − 2K W + W Γ W ) + σˆ 2 N
2 2 with aˆ λ = σˆ λ , bˆ λ = σˆ 2 σˆ λ2 + σˆ λ , and Bˆ λN = λ Hˆ −1 Bˆ N Hˆ −1 λ, where Bˆ N is some 4 estimator of BN defined in (4.2). When the weights are only allowed to be positive, we may use the simpler criterion,
(2.6)
ˆ (K W )2 ˆ λ2 (M − 2K W + W Γ W ) 2 W UW − σ ˆ + σˆ Sλ (W ) = aˆ λ N N
that does not account for the terms of smaller order involving bˆ λ and Bˆ λN . For the MALIML and MAFuller estimators defined in (2.3) and (2.4), we choose W based on the criterion (2.7)
2 W ΓW ) Sˆ λ (W ) = (σˆ 2 σˆ λ2 − σˆ λ N ˆ W UW − σˆ λ2 (M − 2K W + W Γ W ) + σˆ 2 N
Theorem 4.4 establishes the sense in which Sˆλ (W ) defined in (2.5)–(2.7) is a valid estimator of the corresponding population criterion Sλ (W ) From a practical point of view, it is often important to report the quality of fit for the first-stage regression in 2SLS. The model averaging predictor of X is Xˆ = P(W )X. When dim(β) = 1 and Xi has mean zero (or is demeaned), a ˆ 2 and measure of the fit of the first stage is the pseudo R2 defined as corr(X X) estimated by ˆ 2= R˜ 2 = corr(X X)
(X P(W )X)2 X P(W )P(W )X · X X
We have R˜ 2 ∈ [0 1] by construction and Theorem A.5 in the Supplemental Material (Kuersteiner and Okui (2010)) shows that R˜ 2 →p E(fi2 )/(E(fi2 ) + σu2 ), which is the population R2 2 When dim(β) = 1, we have BN = 2(σ2 Σu + 4σu ), where Σu = E[ui ui ], σu = E[ui i ], and we 2 may use Bˆ λN = 2(σˆ 2 σλ2 + 4σˆ λ ). The Supplemental Material (Kuersteiner and Okui (2010)) to this paper contains an expression for Bˆ N for the general case. 4
CONSTRUCTING OPTIMAL INSTRUMENTS
701
3. MOTIVATION AND DISCUSSION OF THEORETICAL RESULTS We refer to instrument selection procedures that take the ordering of instruments in ZM as given and only choose the number of instruments as sequential instrument selection procedures. Sequential instrument selection is a special case of model averaging with weights chosen from the set Ωsq ≡ {W ∈ l1 |wm = 1 for some m and wj = 0 for j = m} to minimize Sλ (W ). Note that when W ∈ Ωsq , it follows that K W = m and (I − P(W ))(I − P(W )) = (I − Pm ). Hence, S(W ) for 2SLS with W restricted to W ∈ Ωsq simplifies to m2 m −1 2 f (I − Pm )f H aσ + (bσ − BN ) + σ H −1 (3.1) N N N for m ≤ M Because m/N = o(m2 /N) as m → ∞, the expression for S(W ) with W ∈ Ωsq reduces to the result of Donald and Newey (2001, Proposition 1). We note that all sets Ω = ΩU ΩP contain the set Ωsq as a subset (i.e., Ωsq ⊂ Ω). This guarantees that MA2SLS, MALIML, and MAFuller weakly dominate the sequential instrument selection procedure in the sense that Sλ (W ∗ ) ≤ minW ∈Ωsq Sλ (W ). Theorem 4.3 in Section 4 establishes that the inequality is strict for MA2SLS, MALIML, and MAFuller when W ∈ ΩP The theorem holds under conditions where f (I − Pm )f is monotonically decaying and convex in m, and thus applies to situations where the instruments have already been optimally ordered. Since ΩP ⊂ ΩC ⊂ ΩU , Theorem 4.3 establishes strict dominance for all model averaging estimators considered in this paper. To better understand these results, it is useful to consider the bias variance trade-off of 2SLS in more detail. It can be shown that the largest term √ of the ˆ higher order bias of MA2SLS, β in (2.2), is proportional to K W / N. When a specific first-stage model with exactly m instruments is selected, this result reduces √ to the well known result that the higher order bias is proportional to m/ N. To illustrate the bias reduction properties of MA2SLS, we consider an extreme case where the higher order bias is completely eliminated. This occurs when W satisfies the additional constraint K W = 0. Thus, the higher order rate of convergence of MA2SLS can be improved relative to the rate for 2SLS by allowing wj to be both positive and negative. In fact, the Nagar estimator can be interpreted as a special case of the MA2SLS, where M = N, wm = N/(N − m) for some m, wN = −m/(N − m), and wj = 0 otherwise. An instrument selection method related to ours is from Kuersteiner (2002), who proposed a kernel weighted form of the 2SLS estimator in the context of time series models and showed that kernel weighting reduces the bias of 2SLS. Let k = diag(k1 kM ), where kj = k((j − 1)/M) are kernel functions k(·) evaluated at j/M with k(0) = 1. The kernel weighted 2SLS esti mator then is defined as in (2.2) with P(W ) replaced by ZM k(ZM ZM )−1 kZM For expositional purposes and to relate kernel weighting to model averaging, we consider a special case in which instruments are mutually orthogonal so
702
G. KUERSTEINER AND R. OKUI
that ZM ZM is a diagonal matrix. Let Z˜ j be the jth column of ZM such that ZM = (Z˜ 1 Z˜ M ) and P˜j = Z˜ j (Z˜ j Z˜ j )−1 Z˜ j . For a given set of kernel weights k, there exist weights W such that for wj = k2j − k2j+1 and wM = k2M , the relaM M tionship m=1 wm Pm = j=1 k2j P˜j = ZM k(ZM ZM )−1 kZM holds. In other words, the kernel weighted 2SLS estimator corresponds to model averaging with the weights {wm }M m=1 defined above. Okui’s (2008) shrinkage 2SLS estimator is also a special case of the averaged estimator (2.2). In this case, wL = s wM = 1 − s s ∈ [0 1], and wj = 0 for j = L M where L (<M) is fixed. Okui’s procedure can be interpreted in terms of kernel √weighted 2SLS. Letting the kernel function k(x) = 1 for x ≤ L/M k(x) = s for L/M < x ≤ 1, and k(x) = 0 otherwise implies that the kernel weighted 2SLS estimator formulated on the orthogonalized instruments is equivalent to Okui’s procedure. The common feature of kernel weighted 2SLS estimators is that they shrink the first stage coefficient estimators toward zero. Shrinkage in the first-stage reduces bias in the second stage at the cost of reduced efficiency. While kernel weighting has been shown to reduce bias, conventional kernels with monotonically decaying “tails” cannot completely eliminate bias. Kuersteiner (2002) showed that the distortions of the optimal 2SLS weight matrix introduced by kernel weights asymptotically dominate the higher order variance of βˆ for conventional choices of k(·). This later problem was recently addressed by Canay (2010) through the use of top-flat kernels (e.g., Politis and Romano (1995)). Despite these advances, conventional kernel based methods have significant limitations due to the fact that once a kernel function is chosen, the weighting scheme is not flexible. The more flexible weighting schemes employed by MA2SLS guarantee that the net effect of bias reduction at the cost of decreased efficiency always results in a net reduction of the approximate MSE of the second-stage estimator. Similar improvements are possible for LIML and Fuller’s estimator, where two higher order variance terms can be balanced efficiently against each other through flexible instrument weighting. A second advantage of model averaging is its ability to eliminate irrelevant instruments with an appropriate choice of W . Let ZjM be the jth column of ZM when j ≤ M and define Z˜ 2 = (I − P1 )Z2M Z˜ 3 = (I − P2 )Z3M Z˜ M = (I − PM−1 )ZM such that Z1 Z˜ 2 Z˜ M are orthogonal and span ZM . Then, M PM = j=1 P˜j , where P˜j = Z˜ j (Z˜ j Z˜ j )−1 Z˜ j for j > 1 and P˜1 = Z1 (Z1 Z1 )−1 Z1 . It M M M follows that m=1 wm Pm = j=1 w˜ j P˜j for w˜ j = m=j wm . Then w˜ j is the weight on the mth orthogonalized instrument. We may write W = D−1 W˜ , where D is an M × M matrix with elements dij = 1{j ≥ i} and W˜ = (w˜ 1 w˜ M ) . The only constraint we impose on W˜ is w˜ 1 = 1. Since W˜ is otherwise unconstrained, we can set w˜ j = 0 for any 1 < j ≤ M so that the jth instrument is eliminated. The
CONSTRUCTING OPTIMAL INSTRUMENTS
703
use of negative weights thus allows MA2SLS, MALIML, and MAFuller to pick out relevant instruments from a set of instruments that contains redundant instruments. 4. REGULARITY CONDITIONS AND FORMAL RESULTS This section covers the formal theory underlying our estimators. All proofs are available in the Supplemental Material (Kuersteiner and Okui (2010)). The choice of model weights W is based on an approximation to the higher order ˆ Following Donald and Newey (2001), we approximate the MSE MSE of β. conditional on the exogenous variable z, E[(βˆ − β0 )(βˆ − β0 ) |z], by σ2 H −1 + S(W ), where (4.1)
ˆ ) + rˆ(W ) N(βˆ − β0 )(βˆ − β0 ) = Q(W ˆ E[Q(W )|z] = σ2 H −1 + S(W ) + T (W )
H = f f/N, and (ˆr (W ) + T (W ))/ tr(S(W )) = op (1) as N → ∞. However, because of the possibility of bias elimination by setting K W = 0, we need to consider an expansion that contains additional higher order terms for the MA2SLS case. We show the asymptotic properties of MA2SLS, MALIML, and MAFuller under the following assumptions. ASSUMPTION 1: {yi Xi zi } are independent and identically distributed (i.i.d.), E[2i |zi ] = σ2 > 0, and E[ηi 4 |zi ] and E[|i |4 |zi ] are bounded. ASSUMPTION 2: (i) H¯ ≡ E[fi fi ] exists and is nonsingular. (ii) For some α > 1/2,
sup m2α sup λ f (I − Pm )f λ/N = Op (1) m≤M
λ λ=1
(iii) Let N+ be the set of positive integers. There exists a subset J¯ ⊂ N+ with a finite number of elements such that supm∈J¯ supλ λ λ f (Pm − Pm+1 )f λ/N = 0 with ¯ it follows that probability approaching 1 (wpa1) and for all m ∈ / J,
inf m2α+1 sup λ f (Pm − Pm+1 )f λ/N > 0 wpa1 ¯ m∈ / Jm≤M
λ λ=1
ASSUMPTION 3: (i) Let uia be the ath element of ui . E[ri usia |zi ] are constant and bounded for all a, and r s ≥ 0 and r + s ≤ 5. Let σu = E[ui i |zi ] and Σu = E[ui ui |zi ]. (ii) ZM ZM are nonsingular wpa1. (iii) maxi≤N PMii →p 0, where PMii signifies the (i i)th element of PM . (iv) fi is bounded.
704
G. KUERSTEINER AND R. OKUI
ASSUMPTION 4: Let W + = (|w1N | |wMN |) . The following conditions ∞ hold: 1M W = 1; W ∈ l1 for all N, where l1 = {x = (x1 )| i=1 |xi | ≤ Cl1 < ∞} for some constant Cl1 , M ≤ N; and, as N → ∞ and M → ∞, M K W + = m=1 |wm |m → ∞. For some sequence L ≤ M such that L → ∞ as ¯ where J¯ is defined in Assumption 2(iii), it follows that N → ∞ and L ∈ / J, √ j supj∈/ Jj≤L | m=1 wm | = O(1/ N) as N → ∞. ¯ √ ASSUMPTION 5: It holds either that (i) K W + / N → 0 or (ii) K W + /N → 0 and M/N → 0. ASSUMPTION 6: The eigenvalues of E[Zki Zki ] are bounded away from zero uniformly in k. Let H¯ k = E[fi Zki ](E[Zki Zki ])−1 E[fi Zki ] and H¯ = E[fi fi ]. −2α 8 ¯ ¯ Then Hk − H = O(k ) for k → ∞. E[|i | |z] and E[|uia |8 |z] are uniformly bounded in z for all a.
ASSUMPTION 7: β ∈ Θ, where Θ is a compact subset of Rd . REMARK 1: The second part of Assumption 2 allows for redundant instruments where f (Pm − Pm+1 )f/N = 0 for some m, as long as the number of such cases is small relative to M. Assumptions 1–3 are similar to those imposed in Donald and Newey (2001). The set J¯ corresponds to the set of redundant instruments. Assumption 4 collects the conditions that weights must satisfy and is related to the conditions imposed by Donald and Newey (2001) on the number of instruments. The condition K W + → ∞ may be understood as the number√of instruments tendj | s=1 ws | = O(1/ N), guarantees that ing to infinity. The condition, supj∈/ Jj≤L ¯ small models receive asymptotically negligible weight and is needed to guarantee first-order asymptotic efficiency of the MA2SLS, MALIML, and MAFuller estimators. We also restrict W to lie in the space of absolutely summable sequences l1 . The fact that the sequences in l1 have infinitely many elements creates no problems since one can always extend W to l1 by setting wj = 0 for all j > M. Assumption 5 limits the rate at which the number of instruments is allowed to increase and Assumption 5(i) guarantees standard firstorder asymptotic properties of the estimators. A weaker condition, Assumption 5(ii), is sufficient for the MALIML/Fuller estimator. Assumptions 6 and 7 are used when we derive the asymptotic MSE of the MALIML/Fuller estimator. THEOREM 4.1: Suppose that Assumptions 1–3 are satisfied. Define μi (W ) = E[2i ui ]Pii (W ) and μ(W ) = (μ1 (W ) μN (W )) . If W satisfies Assumptions 4 and 5(i), then, for MA2SLS (βˆ defined in (2.2)), the decomposition given by (4.1)
705
CONSTRUCTING OPTIMAL INSTRUMENTS
holds with S(W ) = H −1 Cum[i i ui ui ] ) + (σ2 Σu + σu σu
N (Pii (W ))2 i=1
N
+ σu σu
(K W )2 N
(W Γ W ) N N fi Pii (W )
N
fi Pii (W ) KW i=1 i=1 2 BN + E[1 u1 ] + E[21 u1 ] − N N N f (I − P(W ))μ(W ) μ(W ) (I − P(W ))f + + N N f (I − P(W ))(I − P(W ))f H −1 + σ2 N where d = dim(β) and (4.2)
N 1 fi σu H −1 σu fi BN = 2 σ Σu + dσu σ + N i=1 2
u
N 1 −1 −1 + (fi σu H fi σu + σu fi H σu fi ) N i=1
2 ). REMARK 2: When d = 1, BN = 2(σ2 Σu + 4σu
Note that the term BN is positive semidefinite. This implies that a higher order formula that neglects the term (K W /N)BN overestimates the effect on the bias of including more instruments. A number of special cases discussed in the Supplemental Material (Kuersteiner and Okui (2010)) lead to simplifications of the above result. In particular, if Cum[i i ui ui ] = 0 and E[2i ui ] = 0, as would be the case if i and ui were jointly Gaussian, the following result is obtained. COROLLARY 4.1: Suppose that the same conditions as in Theorem 4.1 hold and that, in addition, Cum[i i ui ui ] = 0 and E[2i ui ] = 0. Then, for MA2SLS (βˆ defined in (2.2)), the decomposition given by (4.1) holds with 2 (W Γ W ) −1 (K W ) σu σu (4.3) + (σ2 Σu + σu σu ) S(W ) = H N N KW 2 f (I − P(W ))(I − P(W ))f − BN + σ H −1 N N
706
G. KUERSTEINER AND R. OKUI
where BN is as defined before. For the MALIML and MAFuller estimator, we obtain the following result. THEOREM 4.2: Suppose that Assumptions 1–4, 5(ii), 6, and 7 are satisfied. , μv (W ) = (μv1 (W ) Let vi = ui − (σu /σ2 )i . Define Σv = Σu − σuε σuε 2 μvN (W )) , and μvi (W ) = E[i vi ]Pii (W ). Then, for MALIML (βˆ defined in (2.3)) and MAFuller (βˆ defined in (2.4)), the decomposition given by (4.1) holds with W Γ W f (I − P(W ))(I − P(W ))f −1 σ2 Σv + σ2 S(W ) = H N N N (Pii (W ))2
+ ζˆ + ζˆ N f (I − P(W ))μv (W ) μv (W ) (I − P(W ))f − H −1 − N N
+ Cum[i i vi vi ]
i=1
where ζˆ =
N
fi Pii (W )E[2i vi ]/N − K W /N
i=1
N
fi E[2i vi ]/N
i=1
When Cum[i i vi vi ] = 0 and E[2i vi ] = 0, we have W Γ W −1 2 2 f (I − P(W ))(I − P(W ))f σ Σv + σ H −1 (4.4) S(W ) = H N N The following theorem shows that MA2SLS, MALIML, and MAFuller dominate corresponding estimators based on sequential moment selection under some regularity conditions on the population goodness-of-fit of the first-stage regression. THEOREM 4.3: Assume that Assumptions 1–5 hold. Let γm = λ H −1 f (I − Pm )f H −1 λ/N Assume that there exists a nonstochastic function C(a) such that supa∈[−εε] γm(1+a) /γm = C(a) wpa1 as N m → ∞ for some ε > 0. Assume that C(a) = (1 + a)−2α + o(|a|2α ) as a → 0. (i) For Sλ (W ) given by (4.3), it follows that min Sλ (W )
W ∈ΩP
min Sλ (W )
W ∈Ωsq
< 1 wpa1
CONSTRUCTING OPTIMAL INSTRUMENTS
707
(ii) For Sλ (W ) given by (4.4), it follows that min Sλ (W )
W ∈ΩP
min Sλ (W )
<1
wpa1
W ∈Ωsq
REMARK 3: The additional conditions on γm imposed in Theorem 4.3 are satisfied if γm = δm−2α , but are also satisfied for more general specifications. For example, if γm = δ(m)m−2α + op (m−2α ) as m → ∞, where the function δ(m) satisfies δ(m(1 + a))/δ(m) = 1 + o(|a|2α ) wpa1, then the condition holds. To show that Wˆ , which is found by minimizing Sˆ λ (W ), has certain optimality properties, we need to impose the following additional technical conditions. ASSUMPTION 8: For some α, supm≤M m2α+1 (supλ λ=1 λ f (Pm − Pm+1 )f λ/N) = Op (1). ASSUMPTION 9: Hˆ − H = op (1), σˆ 2 − σ2 = op (1), σˆ λ2 − σλ2 = op (1), σˆ λ − σλ = op (1), and Bˆ N − BN = op (1). ASSUMPTION 10: Let α be as defined in Assumption 8. For some 0 < ε < min(1/(2α) 1) and δ such that 2αε > δ > 0, it holds that M = O(N (1+δ)/(2α+1) ). For some ϑ > (1 + δ)/(1 − 2αε), it holds that E(|ui |2ϑ ) < ∞. Further assume that σˆ λ2 − σλ2 = op (N −δ/(2α+1) ). Assumption 8 imposes additional smoothness on f . Assumption 9 assumes the consistency of the estimators of the parameters in the criterion function. Assumption 10 restricts the order of the number of instruments and assumes the existence of the moments of ui . It also imposes a condition on the rate of the consistency of σˆ λ2 . For example, when α = 3/4, M = O(N 3/5 ), E[ui 16 ] < ∞, and σˆ λ2 − σλ2 = op (N −1/5 ), Assumption 10 is satisfied by taking ε = 1/2, δ = 1/2, and ϑ = 8. We note that σˆ λ2 − σλ2 = op (N −1/5 ) is achievable. The following result generalizes a result established by Li (1987) to the case of the MA2SLS, MALIML, and MAFuller estimators. THEOREM 4.4: Let Assumptions 1–10 hold. For Ω = ΩU , ΩC , or ΩP and Wˆ = arg minW ∈Ω Sˆλ (W ), where Sˆλ (W ) is defined in either (2.5) or (2.7), it follows that (4.5)
Sˆλ (Wˆ ) →p 1 inf Sλ (W )
W ∈Ω
Theorem 4.4 complements the result in Hansen (2007). Apart from the fact that Sˆλ (W ) is different from the criterion in Hansen (2007), there are more
708
G. KUERSTEINER AND R. OKUI
technical differences between our result and Hansen’s (2007). Hansen (2007) showed (4.5) only for a restricted set Ω, where Ω has a countable number of elements. We are able to remove the countability restriction and allow for more general W . However, in turn we need to impose an upper bound M on the maximal complexity of the models considered. 5. MONTE CARLO 5.1. Design We use the same experimental design as Donald and Newey (2001) to ease comparability of our results with theirs. Our data-generating process is the model: yi = βYi + i
Yi = π Zi + ui
for i = 1 N, where Yi is a scalar, β is the scalar parameter of interest, Zi ∼ i.i.d. N(0 IM ), and (i ui ) is i.i.d. jointly normal with variances 1 and covariance c. The integer M is the total number of instruments considered in each experiment. We fix the true value of β at 01 and examine how well each procedure estimates β. In this framework, each experiment is indexed by the vector of specifications: (N M c {π}). We set N = 100 1000. The number of instruments is M = 20 when N = 100 and M = 30 when N = 1000. The degree of endogeneity is controlled by the covariance c and set to c = 01 05 09. We consider the three specifications for π listed in Table I. Models A and B were considered by Donald and Newey (2001). In Model C, the first M/2 instruments are completely irrelevant. Other instruments are relevant and the strength of them decreases gradually as in Model B. We use this model to investigate the ability of our procedure to eliminate irrelevant instruments. For each model, c(M) is set so that π satisfies π π = R2f /(1 − R2f ), where R2f is the theoretical value of R2 and we set R2f = π π/(π π + 1) = 01 001. The number of replications is 5000. The experiments are conducted with Ox 5.1 for Windows (see Doornik (2007)). TABLE I MODEL SPECIFICATIONS FOR MONTE CARLO SIMULATIONS Model A
πm =
Model B R2f M(1−R2f )
Model C
πm = c(M)(1 −
m )4 M+1
πm =
0 c(M)(1 −
m−M/2 4 ) M/2+1
for m ≤ M/2 for m > M/2
CONSTRUCTING OPTIMAL INSTRUMENTS
709
5.2. Estimators We compare the performances of the following 14 estimators. Seven of them are existing procedures and the other seven procedures are the MA2SLS, MALIML, and MAFuller estimators developed in this paper. First, we consider the 2SLS estimator with all available instruments (2SLS-All in the tables). Second, the 2SLS estimator with the number of instruments chosen by Donald and Newey’s (2001) procedure is examined (2SLS-DN), where we use the criterion function (2.6). The kernel weighted generalized method of moments (GMM) of Kuersteiner (2002) is also examined (KGMM). Let ΩKGMM = {W ∈ l1 : wm = L−1 if m ≤ L and 0 otherwise for some L ≤ M}. Then the MA2SLS estimator with W ∈ ΩKGMM corresponds to the kernel weighted √ 2SLS estimator with kernel k(x) = max(1 − x 0). Because the weights are always positive with ΩKGMM , we use the criterion function (2.6) for KGMM. The procedure 2SLS-U is the MA2SLS estimator with Ω = ΩU . The procedure 2SLS-P uses the set Ω = ΩP . The criterion for MA2SLS-U and MA2SLS-P is formula (2.5). The procedure 2SLS-Ps also uses the set ΩP , but the criterion for computing weights is (2.6). For the LIML estimators, we consider the LIML estimator with all available instruments (LIML-All) and the LIML estimator with the number of instruments chosen by Donald and Newey’s (2001) procedure (LIML-DN). We use the criterion function (2.7) for LIML-DN. The procedures LIML-U and LIML-P are the MALIML estimators with Ω = ΩU and Ω = ΩP , respectively. For these MALIML estimators, we minimize the criterion (2.7) to obtain optimal weights. The estimators Fuller-All, Fuller-DN, Fuller-U, and Fuller-P are defined in the same way.5 For 2SLS-DN, LIML-DN, and Fuller-DN the optimal number of instruments is obtained by a grid search. We also use a grid search to find the L that minimizes the criterion for KGMM. For the MA2SLS, MALIML, and MAFuller estimators, we use the procedure SolveQP in Ox to minimize the criteria (see Doornik (2007)). We use the 2SLS, LIML, and Fuller estimators, respectively, with the number of instruments that minimizes the first-stage Mallows criterion as a first-stage estimator β˜ to estimate the parameters of the criterion function Sλ (W ). For each estimator, we compute the median absolute deviation relative to that of 2SLS-DN (RMAD).6 The measure KW+ is the average M value of m=1 m max(wm 0). The measure KW− is the average value of M m=1 m| min(wm 0)|. This measure is zero for W ∈ ΩP and W ∈ Ωsq 5 We do not report results for estimators based on ΩC to preserve space. Their performance is very similar to that of estimators using ΩU 6 We use this robust measure because of concerns about the existence of moments of estimators.
710
G. KUERSTEINER AND R. OKUI
5.3. Results Tables II–IV report simulation results for Models A–C. Table II contains results for the case when c = 01, where endogeneity is weak. 2SLS-All as well as 2SLS-U perform best. The KW+ and KW− statistics for 2SLS-U reveal that model averaging only partly attempts to offset bias, as can be expected from the theoretical analysis. 2SLS-P also performs well, but not as well as 2SLS-All, and 2SLS-U. The procedures that are based on (2.6) (2SLS-DN, KGMM, and 2SLS-Ps) do not perform well, but the performance of 2SLS-Ps is best among these three. This result reveals the importance of considering additional higher order terms in approximating the MSE. All the LIML and Fuller based estimators perform less well than their 2SLS counterparts. Nonetheless, in contrast to the results for 2SLS, both LIML-DN and the MALIML estimators improve substantially over LIML-All. The performance of LIML-U relative to LIMLAll is particularly strong with similar results holding for Fuller. When identification is strong (i.e., N = 1000 and R2 = 01), the difference between the different estimators disappears: all of them perform roughly at par. This is a reflection of the first-order asymptotic properties that will eventually dominate the finite sample issues in large samples. In Table III, we consider the case of intermediate endogeneity (c = 05). When N = 100 and R2f = 001, the 2SLS-type estimators perform well and the performance of 2SLS-P is particularly good. Fuller-P and Fuller-DN also do well in this case. On the other hand, when identification is strong (N = 1000 and/or R2f = 01), the LIML-type and even more so the Fuller estimators perform better than the 2SLS-type estimators, and Fuller-P is the best estimator overall, sometimes outperforming 2SLS based methods by large margins. LIML-All and to a lesser extent Fuller-All perform particularly poorly across all scenarios in Table III. This underlines the importance of data-dependent instrument selection for the LIML and Fuller estimators. Table IV contains results for the case of strong endogeneity (c = 09). In these experiments, LIML is generally the preferred method with a slight advantage over Fuller. LIML-U and LIML-P often perform equally well and at least as well as LIML-DN. In the designs with R2f = 01 and N = 100 as well as R2f = 001 and N = 1000, LIML-U and -P do sometimes significantly better than LIML-DN. Similar relationships hold for the Fuller-U and -P estimators relative to Fuller-DN. Overall LIML-P is the preferred procedure among the data-dependent procedures in Table IV, but Fuller-P is only slightly worse. Among the 2SLS procedures, 2SLS-Ps shows good performance over the range of cases reported on Table IV. Its RMAD value is never above 1 and in most cases significantly less. 2SLS-U outperforms 2SLS-All across all experiments. The ranking relative to 2SLS-DN depends on the data-generating process. In Model A and especially in Model C, 2SLS-U outperforms 2SLS-DN, whereas the latter has the upper hand in Model B. It is remarkable that 2SLS-U performs particularly well in Model C when R2f = 01, demonstrating its ability to pick out relevant instruments.
TABLE II
MONTE CARLO RESULTS (c = 0.1)

N = 100, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.38   20     0       0.42   20     0       0.37   20     0
2SLS-DN      1.00   4.47   0       1.00   4.35   0       1.00   4.52   0
KGMM         1.00   3.28   0       0.98   3.24   0       0.98   3.32   0
2SLS-U       0.40   72.4   65.6    0.44   73.2   66.4    0.39   71.2   64.4
2SLS-P       0.43   9.89   0       0.48   10.1   0       0.41   9.84   0
2SLS-Ps      0.75   4.9    0       0.80   4.96   0       0.75   4.92   0
LIML-All     1.98   20     0       1.89   20     0       1.93   20     0
LIML-DN      1.14   4.95   0       1.00   4.55   0       1.12   5.09   0
LIML-U       0.95   309    333     0.84   152    168     0.95   189    205
LIML-P       1.00   7.02   0       0.88   6.64   0       0.98   7.17   0
Fuller-All   1.42   20     0       2.23   20     0       1.36   20     0
Fuller-DN    0.62   3.72   0       1.00   3.54   0       0.61   3.76   0
Fuller-U     0.63   785    870     0.99   606    730     0.64   487    542
Fuller-P     0.58   5.13   0       0.93   5.01   0       0.57   5.2    0

N = 100, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.66   20     0       0.74   20     0       0.68   20     0
2SLS-DN      1.00   9.43   0       1.00   7.13   0       1.00   9.91   0
KGMM         1.02   6.44   0       0.97   5.4    0       1.02   7.08   0
2SLS-U       0.68   80.5   71.6    0.75   87.1   78.4    0.70   78.4   69.8
2SLS-P       0.70   13.5   0       0.80   13.3   0       0.72   13.4   0
2SLS-Ps      0.91   8.22   0       0.93   7.36   0       0.92   8.73   0
LIML-All     1.79   20     0       2.01   20     0       1.80   20     0
LIML-DN      1.48   9.16   0       1.26   5.58   0       1.49   10.5   0
LIML-U       1.19   175    189     1.03   406    451     1.26   226    244
LIML-P       1.29   8.75   0       1.17   6.13   0       1.38   9.38   0
Fuller-All   1.60   20     0       1.78   20     0       1.61   20     0
Fuller-DN    1.17   8.4    0       1.14   5.44   0       1.14   9.45   0
Fuller-U     1.02   310    346     0.97   442    491     1.06   416    457
Fuller-P     1.07   8.05   0       1.05   5.9    0       1.11   8.52   0

N = 1000, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.37   30     0       0.58   30     0       0.29   30     0
2SLS-DN      1.00   7.65   0       1.00   7.63   0       1.00   7.81   0
KGMM         0.95   5.43   0       0.93   5.94   0       0.96   5.68   0
2SLS-U       0.38   189    180     0.60   215    205     0.31   176    168
2SLS-P       0.42   14.4   0       0.66   16.7   0       0.33   13.7   0
2SLS-Ps      0.68   7.54   0       0.83   8.65   0       0.57   7.68   0
LIML-All     1.23   30     0       1.92   30     0       0.97   30     0
LIML-DN      0.97   8.24   0       1.14   5.31   0       0.79   10.3   0
LIML-U       0.75   369    392     0.90   898    959     0.62   524    556
LIML-P       0.88   10.9   0       1.07   7.82   0       0.72   12     0
Fuller-All   1.10   30     0       1.70   30     0       0.87   30     0
Fuller-DN    0.73   5.6    0       1.02   4.85   0       0.54   6.75   0
Fuller-U     0.58   682    733     0.81   1060   1130    0.47   1480   1580
Fuller-P     0.68   9.27   0       0.95   7.26   0       0.55   9.98   0

N = 1000, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     1.02   30     0       0.93   30     0       0.97   30     0
2SLS-DN      1.00   29     0       1.00   15.3   0       1.00   23.2   0
KGMM         1.14   15.5   0       0.98   13     0       1.01   15.5   0
2SLS-U       0.99   128    106     0.94   246    231     0.98   198    184
2SLS-P       1.01   27.9   0       0.97   23.9   0       0.99   26     0
2SLS-Ps      1.03   23.4   0       0.98   14.8   0       0.99   21.2   0
LIML-All     1.26   30     0       1.12   30     0       1.20   30     0
LIML-DN      1.24   29.2   0       1.06   11.2   0       1.17   21.8   0
LIML-U       1.26   311    311     1.05   1390   1480    1.12   1220   1310
LIML-P       1.24   23.2   0       1.05   11.4   0       1.15   19.3   0
Fuller-All   1.24   30     0       1.12   30     0       1.19   30     0
Fuller-DN    1.23   29.2   0       1.06   11.2   0       1.15   21.8   0
Fuller-U     1.24   314    314     1.04   1410   1490    1.12   1230   1320
Fuller-P     1.22   23.2   0       1.04   11.4   0       1.14   19.3   0
TABLE III
MONTE CARLO RESULTS (c = 0.5)

N = 100, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.77   20     0       0.84   20     0       0.74   20     0
2SLS-DN      1.00   4.41   0       1.00   4.37   0       1.00   4.46   0
KGMM         0.98   3.23   0       0.97   3.2    0       0.98   3.29   0
2SLS-U       0.76   71.9   65.2    0.83   73.5   66.8    0.74   71.7   65
2SLS-P       0.76   9.81   0       0.81   9.89   0       0.74   9.77   0
2SLS-Ps      0.86   4.87   0       0.89   4.86   0       0.85   4.89   0
LIML-All     1.40   20     0       1.52   20     0       1.38   20     0
LIML-DN      0.97   4.94   0       0.95   4.56   0       0.95   5.05   0
LIML-U       0.92   2860   3140    0.94   132    144     0.88   142    157
LIML-P       0.90   7.05   0       0.93   6.72   0       0.88   7.19   0
Fuller-All   0.95   20     0       1.04   20     0       0.90   20     0
Fuller-DN    0.79   3.72   0       0.79   3.53   0       0.79   3.77   0
Fuller-U     0.81   3870   4270    0.83   383    430     0.80   664    746
Fuller-P     0.77   5.14   0       0.79   5.06   0       0.76   5.21   0

N = 100, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.87   20     0       1.25   20     0       0.81   20     0
2SLS-DN      1.00   8.13   0       1.00   5.7    0       1.00   8.5    0
KGMM         0.99   5.87   0       0.93   4.51   0       1.05   6.3    0
2SLS-U       0.87   81.8   73.9    1.20   84.2   77.1    0.80   79.4   71.8
2SLS-P       0.84   12.4   0       1.02   10.7   0       0.79   12.1   0
2SLS-Ps      0.91   7.43   0       0.94   5.72   0       0.87   7.79   0
LIML-All     0.87   20     0       1.25   20     0       0.82   20     0
LIML-DN      0.83   9.36   0       0.89   5.84   0       0.75   10.6   0
LIML-U       0.93   166    176     1.05   189    207     0.76   118    124
LIML-P       0.81   8.96   0       0.92   6.89   0       0.72   9.68   0
Fuller-All   0.77   20     0       1.11   20     0       0.72   20     0
Fuller-DN    0.80   8.45   0       0.84   5.62   0       0.76   9.6    0
Fuller-U     0.89   254    289     0.98   217    239     0.73   287    317
Fuller-P     0.74   8.23   0       0.85   6.59   0       0.66   8.8    0

N = 1000, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.82   30     0       1.32   30     0       0.62   30     0
2SLS-DN      1.00   7.1    0       1.00   5.76   0       1.00   6.74   0
KGMM         1.00   5.21   0       0.99   4.72   0       0.97   5.13   0
2SLS-U       0.82   187    179     1.28   201    193     0.61   175    168
2SLS-P       0.81   13.7   0       1.07   13.1   0       0.62   12.9   0
2SLS-Ps      0.89   7.13   0       0.97   6.55   0       0.74   7.09   0
LIML-All     0.84   30     0       1.30   30     0       0.63   30     0
LIML-DN      0.87   8.53   0       0.89   6.14   0       0.66   10.6   0
LIML-U       0.89   247    259     1.07   262    282     0.61   242    256
LIML-P       0.76   11.2   0       0.93   9.02   0       0.57   12.2   0
Fuller-All   0.73   30     0       1.15   30     0       0.55   30     0
Fuller-DN    0.89   5.57   0       0.86   5.3    0       0.72   6.77   0
Fuller-U     0.90   696    747     1.04   339    366     0.61   962    1030
Fuller-P     0.70   9.41   0       0.87   8.29   0       0.52   10.1   0

N = 1000, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.70   30     0       1.50   30     0       0.95   30     0
2SLS-DN      1.00   12.9   0       1.00   7.73   0       1.00   16.8   0
KGMM         0.79   12.9   0       0.98   7.04   0       1.15   13.9   0
2SLS-U       0.93   207    197     1.33   216    210     0.85   183    177
2SLS-P       0.75   16.6   0       1.03   9.84   0       0.84   14.9   0
2SLS-Ps      0.80   13.2   0       0.99   7.23   0       0.84   12.8   0
LIML-All     0.47   30     0       0.98   30     0       0.63   30     0
LIML-DN      0.46   29.4   0       0.94   12.2   0       0.62   22.3   0
LIML-U       0.48   26.9   3.91    0.96   16.4   4.78    0.61   23.4   5.32
LIML-P       0.47   23.5   0       0.95   13.1   0       0.63   20.9   0
Fuller-All   0.47   30     0       0.97   30     0       0.63   30     0
Fuller-DN    0.46   29.4   0       0.95   12.2   0       0.62   22.3   0
Fuller-U     0.49   27     4.01    0.96   16.4   4.92    0.61   23.4   5.44
Fuller-P     0.47   23.5   0       0.95   13     0       0.62   20.9   0
TABLE IV
MONTE CARLO RESULTS (c = 0.9)

N = 100, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.97   20     0       1.06   20     0       0.94   20     0
2SLS-DN      1.00   4.18   0       1.00   3.78   0       1.00   4.31   0
KGMM         1.00   3.12   0       0.98   2.95   0       1.00   3.18   0
2SLS-U       0.97   71.7   65.2    1.06   68.6   62.5    0.94   72.1   65.6
2SLS-P       0.97   9.52   0       1.03   8.75   0       0.94   9.69   0
2SLS-Ps      0.97   4.65   0       0.96   4.18   0       0.96   4.82   0
LIML-All     0.94   20     0       1.03   20     0       0.93   20     0
LIML-DN      0.94   5.61   0       0.89   5.74   0       0.95   5.58   0
LIML-U       0.93   140    152     0.89   90.4   95.7    0.93   189    217
LIML-P       0.91   7.4    0       0.87   7.88   0       0.91   7.36   0
Fuller-All   0.78   20     0       0.86   20     0       0.76   20     0
Fuller-DN    0.98   3.87   0       0.94   3.9    0       0.97   3.83   0
Fuller-U     0.97   565    623     0.95   423    475     0.96   555    642
Fuller-P     0.95   5.46   0       0.93   6.09   0       0.95   5.33   0

N = 100, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.93   20     0       1.81   20     0       0.71   20     0
2SLS-DN      1.00   3.81   0       1.00   2.65   0       1.00   3.34   0
KGMM         0.97   3.38   0       0.91   2.48   0       0.97   3.01   0
2SLS-U       0.94   77.4   71.6    1.69   62.4   57.5    0.69   73.4   67.7
2SLS-P       0.92   8.65   0       1.30   6.1    0       0.69   8.65   0
2SLS-Ps      0.93   4.47   0       0.97   2.94   0       0.75   4.69   0
LIML-All     0.38   20     0       0.73   20     0       0.29   20     0
LIML-DN      0.56   10.8   0       0.73   8.21   0       0.38   11.4   0
LIML-U       0.46   36.5   29.8    0.67   9.93   0.319   0.33   61.3   57.6
LIML-P       0.44   10.3   0       0.67   9.71   0       0.32   10.7   0
Fuller-All   0.33   20     0       0.65   20     0       0.26   20     0
Fuller-DN    0.66   9.4    0       0.73   7.71   0       0.46   10.2   0
Fuller-U     0.52   66.6   66.7    0.61   10.1   0.856   0.41   201    215
Fuller-P     0.49   9.51   0       0.61   9.49   0       0.39   9.84   0

N = 1000, R²f = 0.01
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.94   30     0       1.72   30     0       0.75   30     0
2SLS-DN      1.00   3.69   0       1.00   2.15   0       1.00   4.11   0
KGMM         0.99   3.06   0       0.91   2.07   0       0.99   3.17   0
2SLS-U       0.95   172    166     1.65   135    131     0.73   167    161
2SLS-P       0.93   9.7    0       1.25   6      0       0.74   10.8   0
2SLS-Ps      0.93   4.46   0       0.91   2.69   0       0.82   5.33   0
LIML-All     0.36   30     0       0.66   30     0       0.29   30     0
LIML-DN      0.66   12.3   0       0.68   10.9   0       0.59   12.2   0
LIML-U       0.47   126    123     0.59   14.1   0.251   0.39   325    337
LIML-P       0.44   13.4   0       0.59   13.9   0       0.36   13.5   0
Fuller-All   0.31   30     0       0.57   30     0       0.25   30     0
Fuller-DN    0.87   7.8    0       0.70   9.6    0       0.91   7.47   0
Fuller-U     0.61   346    366     0.52   25.9   14.8    0.53   1350   1430
Fuller-P     0.53   11.3   0       0.52   13.3   0       0.46   11     0

N = 1000, R²f = 0.1
             ------Model A------   ------Model B------   ------Model C------
             RMAD   KW+    KW−     RMAD   KW+    KW−     RMAD   KW+    KW−
2SLS-All     0.71   30     0       2.23   30     0       0.21   30     0
2SLS-DN      1.00   2.98   0       1.00   5.02   0       1.00   1.15   0
KGMM         0.83   3.56   0       0.90   4.62   0       0.98   1.23   0
2SLS-U       1.13   177    171     1.77   139    134     0.15   125    120
2SLS-P       0.81   8.73   0       1.03   6.63   0       0.17   9.05   0
2SLS-Ps      0.86   4.97   0       0.98   4.71   0       0.18   6.12   0
LIML-All     0.25   30     0       0.79   30     0       0.07   30     0
LIML-DN      0.25   29.6   0       0.79   16.9   0       0.07   24     0
LIML-U       0.26   23.9   0       0.78   16.3   0       0.07   22     0
LIML-P       0.26   23.9   0       0.78   16.3   0       0.07   21.9   0
Fuller-All   0.25   30     0       0.78   30     0       0.07   30     0
Fuller-DN    0.25   29.6   0       0.79   16.8   0       0.07   24     0
Fuller-U     0.26   23.9   0       0.79   16.3   0       0.07   22     0
Fuller-P     0.26   23.9   0       0.79   16.3   0       0.07   21.9   0
Overall, our results show that when the correlation between structural and reduced form errors is low, 2SLS-U is the preferred procedure among all data-dependent estimators considered. With increasing correlation, LIML-P and Fuller-P perform best in most cases. In addition, the LIML and Fuller estimators are quite sensitive to the number of instruments, and data-dependent methods clearly dominate LIML-All and Fuller-All in the majority of cases. The simulation results also confirm our theoretical findings. They indicate the importance of considering additional higher order terms in approximating the MSE of the estimator: 2SLS-P generally performs better than 2SLS-Ps except in Model B when c = 0.9. Our results also document the ability of the model averaging procedure with possibly negative weights to pick out irrelevant instruments. In Model C, our estimators based on ΩU perform better than the sequential selection methods, and 2SLS-U significantly outperforms 2SLS-All when R²f = 0.1, c ≥ 0.5, and N = 1000. Finally, we note that 2SLS-Ps systematically outperforms 2SLS-DN across all Monte Carlo designs.

6. CONCLUSIONS

For models with many overidentifying moment conditions, we show that model averaging of the first-stage regression can be done in a way that reduces the higher order MSE of the 2SLS estimator relative to procedures that are based on a single first-stage model. The procedures we propose are easy to implement numerically. Monte Carlo experiments document that the MA2SLS estimators perform at least as well as conventional moment selection approaches, and perform particularly well when the degree of endogeneity is low to moderate and when the instrument set contains uninformative instruments. When endogeneity is moderate to strong, the MALIML and MAFuller estimators with positive weights are the preferred estimators.

REFERENCES

CANAY, I. A. (2010): "Simultaneous Selection and Weighting of Moments in GMM Using a Trapezoidal Kernel," Journal of Econometrics (forthcoming). [702]
DONALD, S. G., AND W. K. NEWEY (2001): "Choosing the Number of Instruments," Econometrica, 69, 1161–1191. [697-699,701,703,704,708,709]
DOORNIK, J. A. (2007): Ox 5—An Object-Oriented Matrix Programming Language. London: Timberlake Consultants Ltd. [708,709]
FULLER, W. A. (1977): "Some Properties of a Modification of the Limited Information Estimator," Econometrica, 45, 939–954. [697,699]
HAHN, J., J. HAUSMAN, AND G. KUERSTEINER (2004): "Estimation With Weak Instruments: Accuracy of Higher-Order Bias and MSE Approximations," Econometrics Journal, 7, 272–306. [699]
HANSEN, B. E. (2007): "Least Squares Model Averaging," Econometrica, 75, 1175–1189. [697,707,708]
KUERSTEINER, G. M. (2002): "Mean Squared Error Reduction for GMM Estimators of Linear Time Series Models," Unpublished Manuscript, University of California, Davis. [701,702,709]
KUERSTEINER, G., AND R. OKUI (2010): “Supplement to ‘Constructing Optimal Instruments by First Stage Prediction Averaging’: Auxiliary Appendix,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7444_proofs.pdf; http://www.econometricsociety.org/ecta/Supmat/7444_data and programs.zip. [700,703,705] LI, K.-C. (1987): “Asymptotic Optimality for Cp , CL , Cross-Validation and Generalized CrossValidation: Discrete Index Set,” The Annals of Statistics, 15, 958–975. [707] NAGAR, A. L. (1959): “The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations,” Econometrica, 27, 575–595. [697] NEWEY, W. K. (2007): “Choosing Among Instruments Sets,” Unpublished Manuscript, Massachusetts Institute of Technology. [697] OKUI, R. (2008): “Instrumental Variable Estimation in the Presence of Many Moment Conditions,” Journal of Econometrics (forthcoming). [702] POLITIS, D. N., AND J. P. ROMANO (1995): “Bias-Corrected Nonparametric Spectral Estimation,” Journal of Time Series Analysis, 16, 67–103. [702]
Dept. of Economics, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, U.S.A.;
[email protected] and Institute of Economic Research, Kyoto University, Yoshida-Hommachi, Sakyo, Kyoto, Kyoto, 606-8501, Japan;
[email protected]. Manuscript received September, 2007; final revision received October, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 719–733
A DYNAMIC MODEL FOR BINARY PANEL DATA WITH UNOBSERVED HETEROGENEITY ADMITTING A √n-CONSISTENT CONDITIONAL ESTIMATOR

BY FRANCESCO BARTOLUCCI AND VALENTINA NIGRO¹

A model for binary panel data is introduced which allows for state dependence and unobserved heterogeneity beyond the effect of available covariates. The model is of quadratic exponential type and its structure closely resembles that of the dynamic logit model. However, it has the advantage of being easily estimable via conditional likelihood with at least two observations (further to an initial observation) and even in the presence of time dummies among the regressors.

KEYWORDS: Longitudinal data, quadratic exponential distribution, state dependence.

¹ We thank a co-editor and three anonymous referees for helpful suggestions and insightful comments. We are also grateful to Franco Peracchi and Frank Vella for their comments and suggestions. Francesco Bartolucci acknowledges financial support from the Einaudi Institute for Economics and Finance (EIEF), Rome. Most of the article was developed during the period Valentina Nigro spent at the University of Rome "Tor Vergata" and is part of her Ph.D. dissertation.
1. INTRODUCTION

BINARY PANEL DATA ARE USUALLY ANALYZED by using a dynamic logit or probit model which includes, among the explanatory variables, the lags of the response variable and has individual-specific intercepts; see Arellano and Honoré (2001) and Hsiao (2005), among others. These models allow us to disentangle the true state dependence (i.e., how the experience of an event in the past can influence the occurrence of the same event in the future) from the propensity to experience a certain outcome in all periods, when the latter depends on unobservable factors (see Heckman (1981a, 1981b)). State dependence arises in many economic contexts, such as job decision, investment choice, and brand choice, and can determine different policy implications. The parameters of main interest in these models are typically those for the covariates and the true state dependence, which are referred to as structural parameters. The individual-specific intercepts are referred to as incidental parameters; they are of interest only in certain situations, such as when we need to obtain marginal effects and predictions.

In this paper, we introduce a model for binary panel data which closely resembles the dynamic logit model, and, as such, allows for state dependence and unobserved heterogeneity between subjects, beyond the effect of the available covariates. The model is a version of the quadratic exponential model (Cox (1972)) with covariates in which (i) the first-order effects depend on the covariates
© 2010 The Econometric Society
DOI: 10.3982/ECTA7531
and on an individual-specific parameter for the unobserved heterogeneity, and (ii) the second-order effects are equal to a common parameter when they are referred to pairs of consecutive response variables and to 0 otherwise. We show that this parameter has the same interpretation that it has in the dynamic logit model in terms of log-odds ratio, a measure of association between binary variables which is well known in the statistical literature on categorical data analysis (Agresti (2002, Chap. 8)). For the proposed model, we also provide a justification as a latent index model in which the systematic component depends on expectation about future outcomes, beyond the covariates and the lags of the response variable, and the stochastic component has a standard logistic distribution.

An important feature of the proposed model is that, as for the static logit model, the incidental parameters can be eliminated by conditioning on sufficient statistics for these parameters, which correspond to the sums of the response variables at individual level. Using a terminology derived from Rasch (1961), these statistics will be referred to as total scores. The resulting conditional likelihood allows us to identify the structural parameters for the covariates and the state dependence with at least two observations (further to an initial observation). The estimator of the structural parameters based on the maximization of this function is √n-consistent; moreover, it is simpler to compute than the estimator of Honoré and Kyriazidou (2000) and may be used even in the presence of time dummies. On the basis of a simulation study, the results of which are reported in the Supplemental Material file (Bartolucci and Nigro (2010)), we also notice that the estimator has good finite-sample properties in terms of both bias and efficiency.

The paper is organized as follows. In the next section, we briefly review the dynamic logit model for binary panel data. The proposed model is described in Section 3, where we also show that the total scores are sufficient statistics for its incidental parameters. Identification of the structural parameters and the conditional maximum likelihood estimator of these parameters is illustrated in Section 4.

2. DYNAMIC LOGIT MODEL FOR BINARY PANEL DATA

In the following discussion, we first review the dynamic logit model for binary panel data; then we discuss conditional inference and related inferential methods on its structural parameters.

2.1. Basic Assumptions

Let yit be a binary response variable equal to 1 if subject i (i = 1, …, n) makes a certain choice at time t (t = 1, …, T) and equal to 0 otherwise; also let xit be a corresponding vector of strictly exogenous covariates. The standard
fixed-effects approach for binary panel data assumes that

(1)    y_{it} = 1\{y_{it}^* \geq 0\}, \qquad y_{it}^* = \alpha_i + x_{it}'\beta + y_{i,t-1}\gamma + \varepsilon_{it}, \qquad i = 1, \ldots, n, \; t = 1, \ldots, T,
where 1{·} is the indicator function and yit* is a latent variable which may be interpreted as utility (or propensity) of the choice. Moreover, the zero-mean random variables εit represent error terms. Of primary interest are the vector of parameters for the covariates, β, and the parameter that measures the state dependence effect, γ. These are the structural parameters which are collected in the vector θ = (β', γ)'. The individual-specific intercepts αi are instead the incidental parameters. The error terms εit are typically assumed to be independent and identically distributed conditionally on the covariates and the individual-specific parameters, and assumed to have a standard logistic distribution. The conditional distribution of yit given αi, Xi = (xi1 ··· xiT), and yi0, …, yi,t−1 can then be expressed as

(2)    p(y_{it} \mid \alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1}) = p(y_{it} \mid \alpha_i, x_{it}, y_{i,t-1}) = \frac{\exp[y_{it}(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)]}{1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)}
for i = 1, …, n and t = 1, …, T. This is a dynamic logit formulation which implies the following conditional distribution of the overall vector of response variables yi = (yi1, …, yiT)' given αi, Xi, and yi0:

(3)    p(y_i \mid \alpha_i, X_i, y_{i0}) = \frac{\exp\bigl(y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta + y_{i*}\gamma\bigr)}{\prod_t [1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)]}

where y_{i+} = \sum_t y_{it} and y_{i*} = \sum_t y_{i,t-1} y_{it}, with the sum and the product ranging over t = 1, …, T. The statistic yi+ is referred to as the total score of subject i. For what follows, it is important to note that

\log \frac{p(y_{it} = 0 \mid \alpha_i, X_i, y_{i,t-1} = 0)\, p(y_{it} = 1 \mid \alpha_i, X_i, y_{i,t-1} = 1)}{p(y_{it} = 0 \mid \alpha_i, X_i, y_{i,t-1} = 1)\, p(y_{it} = 1 \mid \alpha_i, X_i, y_{i,t-1} = 0)} = \gamma

for i = 1, …, n and t = 1, …, T. Thus, the parameter γ for the state dependence corresponds to the conditional log-odds ratio between (yi,t−1, yit) for every i and t.
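To fix the data structure used throughout, the following minimal Python sketch (with made-up parameter values) simulates a binary panel from the dynamic logit specification (1)–(2) and computes the total scores yi+; the rule used to draw the initial observations yi0 is an illustrative assumption, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_dynamic_logit(n=500, T=4, beta=1.0, gamma=0.5):
    """Draw y_it = 1{alpha_i + x_it*beta + y_i,t-1*gamma + eps_it >= 0} with
    standard logistic errors, as in equation (1); one scalar covariate."""
    alpha = rng.standard_normal(n)                 # individual intercepts
    x = rng.standard_normal((n, T + 1))            # covariate for t = 0,...,T
    y = np.zeros((n, T + 1), dtype=int)
    # illustrative rule for the initial observation y_i0 (not part of (1))
    y[:, 0] = (alpha + x[:, 0] * beta + rng.logistic(size=n) >= 0).astype(int)
    for t in range(1, T + 1):
        ystar = alpha + x[:, t] * beta + y[:, t - 1] * gamma + rng.logistic(size=n)
        y[:, t] = (ystar >= 0).astype(int)
    return x, y

x, y = simulate_dynamic_logit()
total_score = y[:, 1:].sum(axis=1)                 # y_i+ over t = 1,...,T
print(np.bincount(total_score))                    # frequencies of each score
```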
2.2. Conditional Inference

As mentioned in Section 1, an effective approach to estimate the model illustrated above is based on the maximization of the conditional likelihood given suitable sufficient statistics. For the static version of the model, in which the parameter γ is equal to 0, we have that yi is conditionally independent of αi given yi0, Xi, and the total score yi+, and then p(yi | αi, Xi, yi+) = p(yi | Xi, yi+). The likelihood based on this conditional probability allows us to identify β for T ≥ 2; by maximizing this likelihood we also obtain a √n-consistent estimator of β. Even though referred to a simpler context, this result goes back to Rasch (1961) and was developed by Andersen (1970). See also Magnac (2004), who characterized other situations in which the total scores are sufficient statistics for the individual-specific intercepts.

Among the first authors to deal with the conditional approach for the dynamic logit model (γ is unconstrained) were Cox (1958) and Chamberlain (1985). In particular, the latter noticed that when T = 3 and the covariates are omitted from the model, p(yi | αi, yi0, yi1 + yi2 = 1, yi3) does not depend on αi for every yi0 and yi3. On the basis of this conditional distribution, it is therefore possible to construct a likelihood function which depends on the response configurations of only certain subjects (those such that yi1 + yi2 = 1), and which allows us to identify and consistently estimate the parameter γ.

The approach of Chamberlain (1985) was extended by Honoré and Kyriazidou (2000) to the case where, as in (2), the model includes exogenous covariates. In particular, when these covariates are continuous, they proposed to estimate the vector θ of structural parameters by maximizing a weighted conditional log-likelihood with weights depending on the individual covariates through a kernel function which must be defined in advance. Although the weighted conditional approach of Honoré and Kyriazidou (2000) is of great interest, their results about identification and consistency are based on certain assumptions on the support of the covariates which rule out, for instance, time dummies. Moreover, the approach requires careful choice of the kernel function and of its bandwidth, since these choices affect the performance of their estimator. Furthermore, the estimator is consistent as n → ∞, but its rate of convergence to the true parameter value is slower than √n, unless only discrete covariates are present. See also Magnac (2004) and Honoré and Tamer (2006).

Even though it is not strictly related to the conditional approach, it is worth mentioning that a recent line of research investigated dynamic discrete choice models with fixed effects, proposing bias corrected estimators (see Hahn and Newey (2004), Carro (2007)). Although these estimators are only consistent when the number of time periods goes to infinity, they have a reduced order of the bias without increasing the asymptotic variance. Monte Carlo simulations have shown their good finite-sample performance in comparison to the
estimator of Honoré and Kyriazidou (2000) even with not very long panels (e.g., seven time periods).

3. PROPOSED MODEL FOR BINARY PANEL DATA

In this section, we introduce a quadratic exponential model for binary panel data and we discuss its main features in comparison to the dynamic logit model.

3.1. Basic Assumptions

We assume that
(4)    p(y_i \mid \alpha_i, X_i, y_{i0}) = \frac{\exp\bigl(y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta_1 + y_{iT}(\phi + x_{iT}'\beta_2) + y_{i*}\gamma\bigr)}{\sum_z \exp\bigl(z_+\alpha_i + \sum_t z_t x_{it}'\beta_1 + z_T(\phi + x_{iT}'\beta_2) + z_{i*}\gamma\bigr)}

where the sum \sum_z ranges over all possible binary response vectors z = (z_1, \ldots, z_T)'; moreover, z_+ = \sum_t z_t and z_{i*} = y_{i0}z_1 + \sum_{t>1} z_{t-1}z_t. The denominator does not depend on yi; it is simply a normalizing constant that we denote by μ(αi, Xi, yi0). The model can be viewed as a version of the quadratic exponential model of Cox (1972) with covariates in which the first-order effect for yit is equal to αi + xit'β1 (to which we add φ + xit'β2 when t = T) and the second-order effect for (yis, yit) is equal to γ when t = s + 1 and equal to 0 otherwise. The need for a different parametrization of the first-order effect when t = T and t < T will be clarified below.

It is worth noting that the expression for the probability of yi given in (4) closely resembles that given in (3) which results from the dynamic logit model. From some simple algebra, we also obtain that

\log \frac{p(y_{it} = 0 \mid \alpha_i, X_i, y_{i,t-1} = 0)\, p(y_{it} = 1 \mid \alpha_i, X_i, y_{i,t-1} = 1)}{p(y_{it} = 0 \mid \alpha_i, X_i, y_{i,t-1} = 1)\, p(y_{it} = 1 \mid \alpha_i, X_i, y_{i,t-1} = 0)} = \gamma

for every i and t. Then, under the proposed quadratic exponential model, γ has the same interpretation that it has under the dynamic logit model, that is, the log-odds ratio between each pair of consecutive response variables. Not surprisingly, the dynamic logit model coincides with the proposed model in the absence of state dependence (γ = 0).²

² It is also possible to show that, up to a correction term, expression (4) is an approximation of that in (3) obtained by a first-order Taylor expansion around αi = 0, β = 0, and γ = 0.
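Since the normalizing constant in (4) is a sum over all 2^T binary vectors, the joint probability is directly computable for small T. A minimal Python sketch, with illustrative parameter values:

```python
import itertools
import numpy as np

def qexp_prob(y, alpha_i, X_i, y0, beta1, beta2, phi, gamma):
    """p(y_i | alpha_i, X_i, y_i0) under the quadratic exponential model (4),
    computed by enumerating all 2^T response vectors z."""
    T = len(y)
    def potential(z):
        z = np.asarray(z)
        zstar = y0 * z[0] + (z[:-1] * z[1:]).sum()      # z_i* as defined above
        return (z.sum() * alpha_i + z @ (X_i @ beta1)
                + z[-1] * (phi + X_i[-1] @ beta2) + zstar * gamma)
    logs = np.array([potential(z) for z in itertools.product([0, 1], repeat=T)])
    mu = np.logaddexp.reduce(logs)                      # log normalizing constant
    return np.exp(potential(np.asarray(y)) - mu)

X_i = np.ones((3, 1)) * 0.5                             # T = 3, one covariate
p = qexp_prob([1, 0, 1], alpha_i=0.1, X_i=X_i, y0=1,
              beta1=np.array([0.4]), beta2=np.array([0.2]), phi=0.0, gamma=0.7)
print(p)
```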
The main difference with respect to the dynamic logit is in the resulting conditional distribution of yit given the available covariates Xi and yi0, …, yi,t−1. In fact, (4) implies that

(5)    p(y_{it} \mid \alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1}) = \frac{\exp\{y_{it}[\alpha_i + x_{it}'\beta_1 + y_{i,t-1}\gamma + e_t^*(\alpha_i, X_i)]\}}{1 + \exp[\alpha_i + x_{it}'\beta_1 + y_{i,t-1}\gamma + e_t^*(\alpha_i, X_i)]}

where, for t < T,

(6)    e_t^*(\alpha_i, X_i) = \log \frac{1 + \exp[\alpha_i + x_{i,t+1}'\beta_1 + e_{t+1}^*(\alpha_i, X_i) + \gamma]}{1 + \exp[\alpha_i + x_{i,t+1}'\beta_1 + e_{t+1}^*(\alpha_i, X_i)]} = \log \frac{p(y_{i,t+1} = 0 \mid \alpha_i, X_i, y_{it} = 0)}{p(y_{i,t+1} = 0 \mid \alpha_i, X_i, y_{it} = 1)}

and

(7)    e_T^*(\alpha_i, X_i) = \phi + x_{iT}'\beta_2.

Then, for t = T, the proposed model is equivalent to a dynamic logit model with a suitable parametrization. The interpretation of this correction term will be discussed in detail in Section 3.2. For the moment, it is important to note that the conditional probability depends on present and future covariates, meaning that these covariates are not strictly exogenous (see Wooldridge (2001, Sec. 15.8.2)). The relation between the covariates and the feedback of the response variables vanishes when γ = 0. Consider also that, for t < T, the same Taylor expansion mentioned in footnote 2 leads to e_t^*(αi, Xi) ≈ 0.5γ. Under this approximation, p(yit | αi, Xi, yi0, …, yi,t−1) does not depend on the future covariates and these covariates can be considered strictly exogenous in an approximate sense.

In the simpler case without covariates, the conditional probability of yit becomes

p(y_{it} \mid \alpha_i, y_{i0}, \ldots, y_{i,t-1}) = \frac{\exp\{y_{it}[\alpha_i + y_{i,t-1}\gamma + e_t^*(\alpha_i)]\}}{1 + \exp[\alpha_i + y_{i,t-1}\gamma + e_t^*(\alpha_i)]}, \qquad t = 1, \ldots, T-1,

whereas, for the last period, we have the logistic parametrization

p(y_{iT} \mid \alpha_i, y_{i0}, \ldots, y_{i,T-1}) = \frac{\exp[y_{iT}(\alpha_i + y_{i,T-1}\gamma)]}{1 + \exp(\alpha_i + y_{i,T-1}\gamma)}

where

e_t^*(\alpha_i) = \log \frac{p(y_{i,t+1} = 0 \mid \alpha_i, y_{it} = 0)}{p(y_{i,t+1} = 0 \mid \alpha_i, y_{it} = 1)}
which is 0 only in the absence of state dependence.

Finally, we have to clarify that the possibility to use quadratic exponential models for panel data is already known in the statistical literature; see Diggle, Heagerty, Liang, and Zeger (2002) and Molenberghs and Verbeke (2004). However, the parametrization adopted in this type of literature, which is different from the one we propose, is sometimes criticized for lack of a simple interpretation. In contrast, for our parametrization, we provide a justification as a latent index model.

3.2. Model Justification and Related Issues

Expression (5) implies that the proposed model is equivalent to the latent index model
(8)    y_{it} = 1\{y_{it}^* \geq 0\}, \qquad y_{it}^* = \alpha_i + x_{it}'\beta_1 + y_{i,t-1}\gamma + e_t^*(\alpha_i, X_i) + \varepsilon_{it},
where the error terms εit are independent and have standard logistic distribution. Assumption (8) is similar to assumption (1) on which the dynamic logit model is based, the main difference being in the correction term e_t^*(αi, Xi). As is clear from (6), this term can be interpreted as a measure of the effect of the present choice yit on the expected utility (or propensity) at the next occasion (t + 1). In the presence of positive state dependence (γ > 0), this correction term is positive, since making the choice today has a positive impact on the expected utility. Also note that the different definition of e_t^*(αi, Xi) for t < T and t = T (compare equations (6) and (7)) is motivated by considering that e_T^*(αi, Xi) has an unspecified form, because it would depend on future covariates not in Xi; then we assume this term to be equal to a linear form of the covariates xiT, in a way similar to that suggested by Heckman (1981c) to deal with the initial condition problem.

As suggested by a referee, it is possible to justify formulation (8), which involves the correction term for the expectation, on the basis of an extension of the job search model described by Hyslop (1999). The latter is based on the maximization of a discounted utility and relies on a budget constraint in which search costs are considered only for subjects who did not participate in the labor market in the previous year. In our extension, subjects who decide to not participate in the current year save an amount of these costs for the next year, but benefit from the amounts previously saved according to the same rule. The reservation wage is then modified so that the decision to participate depends on future expectation about the participation state, beyond the past state. This motivates the introduction of the correction term e_t^*(αi, Xi) in (8), which accounts for the difference between the behavior of a subject who has a budget constraint including expectation about future search costs and a subject who has a budget constraint that does not include this expectation.
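Because (6) defines e_t^*(αi, Xi) by a backward recursion that starts from the terminal condition (7), the correction terms can be computed in a single pass. A minimal Python sketch, with illustrative parameter values:

```python
import numpy as np

def correction_terms(alpha_i, X_i, beta1, beta2, phi, gamma):
    """Backward recursion for e_t^*(alpha_i, X_i), t = 1,...,T, following
    equations (6) and (7). X_i has one row per period t = 1,...,T."""
    T = X_i.shape[0]
    e = np.zeros(T)
    e[T - 1] = phi + X_i[T - 1] @ beta2              # equation (7)
    for t in range(T - 2, -1, -1):                   # t = T-1,...,1 (0-indexed)
        a = alpha_i + X_i[t + 1] @ beta1 + e[t + 1]
        # equation (6): log of a ratio of two logistic normalizers
        e[t] = np.log1p(np.exp(a + gamma)) - np.log1p(np.exp(a))
    return e

# illustrative values; with gamma > 0 every e_t^* is positive, as in the text
X_i = np.ones((4, 1))
print(correction_terms(alpha_i=0.2, X_i=X_i, beta1=np.array([0.5]),
                       beta2=np.array([0.3]), phi=0.1, gamma=0.8))
```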
Two issues that are worth discussing so as to complete the description of the properties of the model are (i) model consistency with respect to marginalizations over a subset of the response variables and (ii) how to avoid assumption (7) on the last correction term.

Assume that (4) holds for the T response variables in yi. For the subsequence of responses y_i^{(T−1)}, where in general y_i^{(t)} = (y_{i1}, …, y_{it})', we have

p\bigl(y_i^{(T-1)} \mid \alpha_i, X_i, y_{i0}\bigr) = \exp\Bigl(\sum_t y_{it}(\alpha_i + x_{it}'\beta_1) + \sum_t y_{i,t-1}y_{it}\gamma\Bigr) \times \bigl[1 + \exp(\phi + x_{iT}'\delta + y_{i,T-1}\gamma)\bigr] \big/ \mu(\alpha_i, X_i, y_{i0})

with δ = β1 + β2. After some algebra, this expression can be reformulated as

(9)    p\bigl(y_i^{(T-1)} \mid \alpha_i, X_i, y_{i0}\bigr) = \frac{\exp\Bigl(\sum_t y_{it}(\alpha_i + x_{it}'\beta_1) + \sum_t y_{i,t-1}y_{it}\gamma + y_{i,T-1}e_{T-1}(\alpha_i, X_i)\Bigr)}{\mu_{T-1}(\alpha_i, X_i, y_{i0})}

with

e_{T-1}(\alpha_i, X_i) = \log \frac{1 + \exp(\phi + x_{iT}'\delta + \gamma)}{1 + \exp(\phi + x_{iT}'\delta)}

and μ_{T−1}(αi, Xi, yi0) denoting the normalizing constant, which is equal to the sum of the numerator of (9) for all possible configurations of the first T − 1 response variables. Note that e_{T−1}(αi, Xi) has an interpretation similar to the correction term e_{T−1}^*(αi, Xi) for the future expectation which is defined above. When γ = 0, e_{T−1}(αi, Xi) = 0 and then p(y_i^{(T−1)} | αi, Xi, yi0) = p(y_i^{(T−1)} | αi, X_i^{(T−1)}, yi0) with X_i^{(t)} = (x_{i1} ··· x_{it}). The latter probability can be expressed as in (4) and model consistency with respect to marginalization exactly holds. In the other cases, this form of consistency approximately holds, in the sense that by substituting e_{T−1}(αi, Xi) with its linear approximation, we obtain a distribution p(y_i^{(T−1)} | αi, X_i^{(T−1)}, yi0) which can be cast into (4). This argument can be iterated to show that, at least approximately, model consistency holds with respect to marginalizations over an arbitrary number of response variables³; in this case, the distribution of interest is p(y_i^{(t)} | αi, X_i^{(t)}, yi0) with t smaller than T − 1.

³ Simulation results (see the Supplemental Material file) show that, for different values of γ, the bias of the conditional estimator of the structural parameters is negligible and is comparable to that resulting from computing these estimators on the complete data sequence.
Finally, assumption (7) on the last correction term e_T^*(αi, Xi) can be avoided by conditioning the joint distribution on the corresponding outcome yiT. This removes this correction term since we have

p(y_{i1}, \ldots, y_{i,T-1} \mid \alpha_i, X_i, y_{i0}, y_{iT}) = \frac{\exp\bigl(\sum_t y_{it}\alpha_i + \sum_t y_{it} x_{it}'\beta_1 + y_{i*}\gamma\bigr)}{\mu_{T-1}(\alpha_i, X_i, y_{i0}, y_{iT})}
This conditional version of the proposed model also has the advantage of being consistent across T. However, it would need at least three observations (beyond the initial one) to make the model parameters identifiable. Moreover, the conditional estimator becomes less efficient with respect to the same estimator applied to the initial model.

3.3. Conditional Distribution Given the Total Score

The main advantage of the proposed model with respect to the dynamic logit model is that the total scores yi+, i = 1, …, n, represent a set of sufficient statistics for the incidental parameters αi. This is because, for every i, yi is conditionally independent of αi given Xi, yi0, and yi+. First of all, note that, under assumption (4),

p(y_{i+} \mid \alpha_i, X_i, y_{i0}) = \sum_{z(y_{i+})} p(y_i = z \mid \alpha_i, X_i, y_{i0}) = \frac{\exp(y_{i+}\alpha_i)}{\mu(\alpha_i, X_i, y_{i0})} \sum_{z(y_{i+})} \exp\Bigl(\sum_t z_t x_{it}'\beta_1 + z_T(\phi + x_{iT}'\beta_2) + z_{i*}\gamma\Bigr)

where the sum \sum_{z(y_{i+})} is restricted to all response configurations z such that z_+ = y_{i+}. After some algebra, the conditional distribution at issue becomes
(10)    p(y_i \mid \alpha_i, X_i, y_{i0}, y_{i+}) = \frac{p(y_i \mid \alpha_i, X_i, y_{i0})}{p(y_{i+} \mid \alpha_i, X_i, y_{i0})} = \frac{\exp\bigl(\sum_t y_{it} x_{it}'\beta_1 + y_{iT}(\phi + x_{iT}'\beta_2) + y_{i*}\gamma\bigr)}{\sum_{z(y_{i+})} \exp\bigl(\sum_t z_t x_{it}'\beta_1 + z_T(\phi + x_{iT}'\beta_2) + z_{i*}\gamma\bigr)}
The expression above does not depend on αi and, therefore, is also denoted by p(yi | Xi, yi0, yi+). The same circumstance happens for the elements of β1 that correspond to the covariates which are time constant. To make this clearer, consider that we can divide the numerator and the denominator of (10) by exp(yi+ xi1'β1) and, after rearranging terms, we obtain

(11)    p(y_i \mid X_i, y_{i0}, y_{i+}) = \frac{\exp\bigl(\sum_{t>1} y_{it} d_{it}'\beta_1 + y_{iT}(\phi + x_{iT}'\beta_2) + y_{i*}\gamma\bigr)}{\sum_{z(y_{i+})} \exp\bigl(\sum_{t>1} z_t d_{it}'\beta_1 + z_T(\phi + x_{iT}'\beta_2) + z_{i*}\gamma\bigr)}
with dit = xit − xi1, t = 2, …, T. We consequently assume that β1 does not include any intercept common to all time occasions and regression parameters for covariates which are time constant; if these parameters were included, they would not be identified. This is typical of other conditional approaches, such as that of Honoré and Kyriazidou (2000), and of fixed-effects approaches in which the individual intercepts are estimated together with the structural parameters. Similarly, β2 must not contain any intercept for the last occasion, since this is already included through φ.

4. CONDITIONAL INFERENCE ON THE STRUCTURAL PARAMETERS

In the following discussion, we introduce a conditional likelihood based on (11). We also provide formal arguments on the identification of the structural parameters via this function and on the asymptotic properties of the estimator that results from its maximization.

4.1. Structural Parameters Identification via Conditional Likelihood

For an observed sample (Xi, yi0, yi), i = 1, …, n, the conditional likelihood has logarithm
(12)    \ell(\theta) = \sum_i 1\{0 < y_{i+} < T\} \log\bigl[p_\theta(y_i \mid X_i, y_{i0}, y_{i+})\bigr]

where the subscript θ has been added to p(·|·) to indicate that this probability, which is defined in (11), depends on θ. Note that in this case θ = (β1', β2', φ, γ)'. Also note that the response configurations yi with sum 0 or T are removed since these do not contain information on θ.
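For small T, the conditional probability (11), and hence ℓ(θ) in (12), can be evaluated by brute-force enumeration of the configurations z with z+ = yi+. The following Python sketch does exactly this; the resulting function could be handed to any numerical optimizer, or to a Newton–Raphson routine of the kind described in Section 4.2 below. The placeholder data at the bottom are illustrative and are not drawn from the model.

```python
import itertools
import numpy as np

def cond_loglik(theta, X, y0, Y):
    """Conditional log-likelihood (12) for the quadratic exponential model.
    theta = (beta1..., beta2..., phi, gamma); X: (n, T, p); y0: (n,); Y: (n, T)."""
    n, T, p = X.shape
    beta1, beta2 = theta[:p], theta[p:2 * p]
    phi, gamma = theta[2 * p], theta[2 * p + 1]
    configs = np.array(list(itertools.product([0, 1], repeat=T)))
    ll = 0.0
    for i in range(n):
        yi, si = Y[i], Y[i].sum()
        if si == 0 or si == T:
            continue                       # uninformative configurations dropped
        d = X[i] - X[i][0]                 # d_it = x_it - x_i1, as in (11)
        def log_num(z):
            zstar = y0[i] * z[0] + (z[:-1] * z[1:]).sum()    # z_i*
            return (z[1:] @ (d[1:] @ beta1)
                    + z[-1] * (phi + X[i][-1] @ beta2) + zstar * gamma)
        Z = configs[configs.sum(axis=1) == si]               # z with z_+ = y_i+
        denom = np.logaddexp.reduce([log_num(z) for z in Z])
        ll += log_num(yi) - denom
    return ll

# illustrative call with placeholder data (not generated from the model)
rng = np.random.default_rng(2)
n, T, p = 200, 3, 1
X = rng.standard_normal((n, T, p)); y0 = rng.integers(0, 2, n)
Y = rng.integers(0, 2, (n, T))
print(cond_loglik(np.zeros(2 * p + 2), X, y0, Y))
```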
To obtain a simple expression for the score and the information matrix corresponding to ℓ(θ), consider that (11) may be expressed in the canonical exponential family form

p_\theta(y_i \mid X_i, y_{i0}, y_{i+}) = \frac{\exp[u(y_{i0}, y_i)' A(X_i)' \theta]}{\sum_{z(y_{i+})} \exp[u(y_{i0}, z)' A(X_i)' \theta]}

where u(y_{i0}, y_i) = (y_{i2}, \ldots, y_{iT}, y_{i*})' and

A(X_i) = \begin{pmatrix} d_{i2} & \cdots & d_{i,T-1} & d_{iT} & 0 \\ 0 & \cdots & 0 & x_{iT} & 0 \\ 0 & \cdots & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix}

with 0 denoting a column vector of zeros of suitable dimension. From standard results on exponential family distributions (Barndorff-Nielsen (1978, Chap. 8)), it is easy to obtain

s(\theta) = \nabla_\theta \ell(\theta) = \sum_i 1\{0 < y_{i+} < T\}\, A(X_i)\, v_\theta(X_i, y_{i0}, y_i),

J(\theta) = -\nabla_{\theta\theta} \ell(\theta) = \sum_i 1\{0 < y_{i+} < T\}\, A(X_i)\, V_\theta(X_i, y_{i0}, y_{i+})\, A(X_i)'

where

v_\theta(X_i, y_{i0}, y_i) = u(y_{i0}, y_i) - E_\theta[u(y_{i0}, y_i) \mid X_i, y_{i0}, y_{i+}],
V_\theta(X_i, y_{i0}, y_{i+}) = V_\theta[u(y_{i0}, y_i) \mid X_i, y_{i0}, y_{i+}].

Suppose now that the subjects in the sample are independent of each other with αi, Xi, yi0, and yi drawn, for i = 1, …, n, from the true model

(13)    f_0(\alpha, X, y_0, y) = f_0(\alpha, X, y_0)\, p_0(y \mid \alpha, X, y_0)

where f0(α, X, y0) denotes the joint distribution of the individual-specific intercept, the covariates X = (x1 ··· xT), and the initial observation y0. Furthermore, p0(y | α, X, y0) denotes the conditional distribution of the response variables under the quadratic exponential model (4) when θ = θ0, with θ0 denoting the true value of its structural parameters. Under this assumption, we have that Q(θ) = ℓ(θ)/n converges in probability to Q0(θ) = E0[ℓ(θ)/n] = E0{log[p_θ(y | X, y0, y+)]} for any θ, where E0(·) denotes the expected value under the true model. By simple algebra, it is possible to show that the first derivative ∇θ Q(θ) is equal to 0 at θ = θ0 and that, provided E0[A(X)A(X)'] is of full rank, the second derivative matrix ∇θθ Q(θ) is always negative definite. This implies that
Q0(θ) is strictly concave with its only maximum at θ = θ0 and, therefore, the vector of structural parameters is identified. Note that the regularity condition that E0[A(X)A(X)'] is of full rank, necessary to ensure that ∇θθ Q(θ) is negative definite, rules out cases of time-constant covariates (see also the discussion in Section 3.3). It is also worth noting that the structural parameters of the model are identified with T ≥ 2, whereas identification of the structural parameters of the dynamic logit model is only possible when T ≥ 3 (Chamberlain (1993)). See also the discussion provided by Honoré and Tamer (2006).

4.2. Conditional Maximum Likelihood Estimator

The conditional maximum likelihood estimator of θ, denoted by θ̂ = (β̂1', β̂2', φ̂, γ̂)', is obtained by maximizing the conditional log-likelihood ℓ(θ). This maximum may be found by a simple iterative algorithm of Newton–Raphson type. At the hth step, this algorithm updates the estimate of θ at the previous step, θ^(h−1), as θ^(h) = θ^(h−1) + J(θ^(h−1))⁻¹ s(θ^(h−1)). Note that the information matrix J(θ) is always nonnegative definite since it corresponds to the sum of a series of variance–covariance matrices. Provided E0[A(X)A(X)'] is of full rank, J(θ) is also positive definite with probability approaching 1 as n → ∞. Then we can reasonably expect that ℓ(θ) is strictly concave and has its unique maximum at θ̂ in most economic applications, where the sample size is usually large. Since we also have that the parameter space is equal to R^k, with k denoting the dimension of θ, the above algorithm is very simple to implement and usually converges in a few steps to θ̂, regardless of the starting value θ^(0).

Under the true model (13), and provided that E0[A(X)A(X)'] exists and is of full rank, we have that θ̂ exists, is a √n-consistent estimator of θ0, and has asymptotic Normal distribution as n → ∞. This result may be proved on the basis of standard asymptotic results (cf. Theorems 2.7 and 3.1 of Newey and McFadden (1994)). From Newey and McFadden (1994, Sec. 4.2), we also derive that the standard errors for the elements of θ̂ can be obtained as the corresponding diagonal elements of Ĵ⁻¹ under square root. Note that Ĵ is obtained as a by-product from the Newton–Raphson algorithm described above. These standard errors can be used to construct confidence intervals for the parameters and to test hypotheses on them in the usual way.

To study the finite-sample properties of the conditional estimator, we performed a simulation study (for a detailed description, see the Supplemental Material file) that closely follows the one performed by Honoré and Kyriazidou (2000). In particular, we first considered a benchmark design under which samples of different size are generated from the quadratic exponential model (4) for 3 and 7 time occasions, only one covariate generated from a Normal
distribution, and different values of γ between 0.25 and 2. As in Honoré and Kyriazidou (2000), we also considered other scenarios based on more sophisticated designs for the regressors. Under each scenario, we generated a suitable number of samples and, for every sample, we computed the proposed conditional estimator, whose properties were mainly evaluated in terms of median bias and median absolute error (MAE). We also computed the corresponding standard errors and obtained confidence intervals with different levels for each structural parameter.

On the basis of the simulation study, we conclude that, for each structural parameter, the bias of the conditional estimator is always negligible (with the exception of the estimator γ̂ when n is small); this bias tends to increase with γ, to decrease with n, and to decrease very quickly with T. Similarly, we observe that the MAE decreases with n at a rate close to √n and much faster with T. This depends on the fact that the number of observations that contribute to the conditional likelihood increases more than proportionally with T, as an increase of T also determines an increase of the actual sample size.⁴ Moreover, the MAE of the estimator of each parameter increases with γ. This is mainly due to the fact that when γ is positive, its increase implies a decrease of the actual sample size. The simulation results also show that the confidence intervals based on the conditional estimator attain the nominal level for each parameter. This confirms the validity of the rule to compute standard errors based on the information matrix Ĵ.

Given the same interpretation of the parameters of the quadratic exponential and the dynamic logit models, it is quite natural to compare the proposed conditional estimator with available estimators of the parameters of the latter model. In particular, the results of our simulation study can be compared with those of Honoré and Kyriazidou (2000). It emerges that our estimator performs better than their estimator in terms of both bias and efficiency. This is mainly due to the fact that the former exploits a larger number of response configurations with respect to the latter. Similarly, our estimator can be compared with the bias corrected estimator proposed by Carro (2007). In this case, we observe that the former performs much better than the latter when the parameter of interest is γ, whereas our estimator performs slightly worse than that of Carro (2007) when the parameters of interest are those in β1. However, when considering these conclusions, one must be conscious that the results compared here derive from simulation studies performed under different, although very similar, models.

⁴ The actual sample size is the number of response configurations yi such that 0 < yi+ < T. These response configurations contain information on the structural parameters and contribute to ℓ(θ); see equation (12).

REFERENCES

AGRESTI, A. (2002): Categorical Data Analysis (Second Ed.). New York: Wiley. [720]
ANDERSEN, E. B. (1970): "Asymptotic Properties of Conditional Maximum-Likelihood Estimators," Journal of the Royal Statistical Society, Ser. B, 32, 283–301. [722]
ARELLANO, M., AND B. HONORÉ (2001): "Panel Data Models: Some Recent Developments," in Handbook of Econometrics, Vol. V, ed. by J. J. Heckman and E. Leamer. Amsterdam: North-Holland. [719]
BARNDORFF-NIELSEN, O. (1978): Information and Exponential Families in Statistical Theory. New York: Wiley. [729]
BARTOLUCCI, F., AND V. NIGRO (2010): "Supplement to 'A Dynamic Model for Binary Panel Data With Unobserved Heterogeneity Admitting a √n-Consistent Conditional Estimator'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7531_data.pdf; http://www.econometricsociety.org/ecta/Supmat/7531_data and programs.zip. [720]
CARRO, J. M. (2007): "Estimating Dynamic Panel Data Discrete Choice Models With Fixed Effects," Journal of Econometrics, 140, 503–528. [722,731]
CHAMBERLAIN, G. (1985): "Heterogeneity, Omitted Variable Bias, and Duration Dependence," in Longitudinal Analysis of Labor Market Data, ed. by J. J. Heckman and B. Singer. Cambridge: Cambridge University Press. [722]
(1993): "Feedback in Panel Data Models," Unpublished Manuscript, Department of Economics, Harvard University. [730]
COX, D. R. (1958): "The Regression Analysis of Binary Sequences," Journal of the Royal Statistical Society, Ser. B, 20, 215–242. [722]
(1972): "The Analysis of Multivariate Binary Data," Applied Statistics, 21, 113–120. [719,723]
DIGGLE, P. J., P. J. HEAGERTY, K.-Y. LIANG, AND S. L. ZEGER (2002): Analysis of Longitudinal Data (Second Ed.). New York: Oxford University Press. [725]
HAHN, J., AND W. NEWEY (2004): "Jackknife and Analytical Bias Reduction for Nonlinear Panel Models," Econometrica, 72, 1295–1319. [722]
HECKMAN, J. J. (1981a): "Statistical Models for Discrete Panel Data," in Structural Analysis of Discrete Data With Econometric Applications, ed. by D. McFadden and C. F. Manski. Cambridge, MA: MIT Press. [719]
(1981b): "Heterogeneity and State Dependence," in Structural Analysis of Discrete Data With Econometric Applications, ed. by D. McFadden and C. F. Manski. Cambridge, MA: MIT Press. [719]
(1981c): "The Incidental Parameter Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process," in Structural Analysis of Discrete Data With Econometric Applications, ed. by D. McFadden and C. F. Manski. Cambridge, MA: MIT Press. [725]
HONORÉ, B. E., AND E. KYRIAZIDOU (2000): "Panel Data Discrete Choice Models With Lagged Dependent Variables," Econometrica, 68, 839–874. [720,722,723,728,730,731]
HONORÉ, B. E., AND E. TAMER (2006): "Bounds on Parameters in Panel Dynamic Discrete Choice Models," Econometrica, 74, 611–629. [722,730]
HSIAO, C. (2005): Analysis of Panel Data (Second Ed.). New York: Cambridge University Press. [719]
HYSLOP, D. R. (1999): "State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women," Econometrica, 67, 1255–1294. [725]
MAGNAC, T. (2004): "Panel Binary Variables and Sufficiency: Generalizing Conditional Logit," Econometrica, 72, 1859–1876. [722]
MOLENBERGHS, G., AND G. VERBEKE (2004): "Meaningful Statistical Model Formulations for Repeated Measures," Statistica Sinica, 14, 989–1020. [725]
NEWEY, W. K., AND D. MCFADDEN (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. 4, ed. by R. F. Engle and D. L. McFadden. Amsterdam: North-Holland. [730]
RASCH, G. (1961): "On General Laws and the Meaning of Measurement in Psychology," in Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, Vol. 4. Berkeley, CA: University of California Press, 321–333. [720,722]
WOOLDRIDGE, J. M. (2001): Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press. [724]
Dipartimento di Economia, Finanza e Statistica, Università di Perugia, Via A. Pascoli 20, 06123 Perugia, Italy;
[email protected] and Dipartimento di Studi Economico-Finanziari e Metodi Quantitativi, Università di Roma "Tor Vergata," Via Columbia 2, 00133 Roma, Italy; [email protected]. Manuscript received October, 2007; final revision received September, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 735–753
BOOTSTRAP INFERENCE IN PARTIALLY IDENTIFIED MODELS DEFINED BY MOMENT INEQUALITIES: COVERAGE OF THE IDENTIFIED SET

BY FEDERICO A. BUGNI¹

This paper introduces a novel bootstrap procedure to perform inference in a wide class of partially identified econometric models. We consider econometric models defined by finitely many weak moment inequalities,² which encompass many applications of economic interest. The objective of our inferential procedure is to cover the identified set with a prespecified probability.³ We compare our bootstrap procedure, a competing asymptotic approximation, and subsampling procedures in terms of the rate at which they achieve the desired coverage level, also known as the error in the coverage probability. Under certain conditions, we show that our bootstrap procedure and the asymptotic approximation have the same order of error in the coverage probability, which is smaller than that obtained by using subsampling. This implies that inference based on our bootstrap and asymptotic approximation should eventually be more precise than inference based on subsampling. A Monte Carlo study confirms this finding in a small sample simulation.

KEYWORDS: Partial identification, moment inequalities, inference, bootstrap, subsampling, asymptotic approximation, rates of convergence.
1. INTRODUCTION

THIS PAPER CONTRIBUTES to the growing literature on inference in partially identified econometric models. A model is said to be partially identified or set identified when the sampling process and the maintained assumptions restrict the value of the parameter of interest to a set, called the identified set, which is smaller than the logical range of the parameter but potentially⁴ larger than a single point. Partially identified models arise naturally in economic models when strong and usually unrealistic assumptions are traded by weaker and

¹ I am indebted to my advisors, Joel Horowitz, Rosa Matzkin, and Elie Tamer, for their guidance and support. I thank a co-editor and three anonymous referees for comments and suggestions that have significantly helped to improve this paper. I also thank Donald Andrews, Ivan Canay, Xiaohong Chen, Silvia Glaubach, Nenad Kos, Enno Mammen, Charles Manski, Adam Rosen, Viktor Subbotin, Xun Tang, and the participants of seminars at Oberwolfach, Northwestern, UCL, Brown, UPenn, UCLA, Columbia, NYU, Yale, Duke, and Michigan for their comments. Financial support from the Robert Eisner Memorial Fellowship and the Dissertation Year Fellowship is gratefully acknowledged. Any and all errors are my own.
² We can also admit models defined by moment equalities by combining pairs of weak moment inequalities.
³ We deal with the objective of covering each element of the identified set with a prespecified probability in Bugni (2010a).
⁴ If the parameter of interest is restricted to a single point, the model is said to be point identified.
© 2010 The Econometric Society
DOI: 10.3982/ECTA8056
more credible ones. In this paper, we consider partially identified models defined by finitely many weak moment inequalities.⁵

The goal of this paper is twofold. The first objective is to introduce a novel bootstrap procedure to construct confidence sets for our class of partially identified models. In large samples, our bootstrap procedure achieves exactly the desired coverage probability. The second objective is to compare our bootstrap procedure with competing inferential procedures in terms of the rate of convergence of the error in the coverage probability, that is, in terms of the rate at which they achieve the desired coverage level. To the best of our knowledge, this is the first paper that performs this comparison among competing inferential procedures for partially identified models.

The ultimate purpose of our confidence sets is to conduct hypothesis testing in partially identified econometric models. Based on the duality between hypothesis testing and confidence sets, the hypothesis testing problem can be translated into the construction of confidence sets that cover the object of interest with a minimum prespecified probability. In the literature on partially identified models, there are two possible objects of interest, each of them related to different hypothesis testing problems. On the one hand, the object of interest can be the identified set; on the other hand, the object of interest can be each of the points of the identified set.⁶ In this paper, we focus exclusively on the problem of construction of confidence sets that cover the identified set with a prespecified probability.⁷

The formal structure of the problem is as follows. An economic model relates the distribution of the observables with a finite dimensional parameter, denoted by θ, that belongs to a parameter space, denoted by Θ. The model is partially identified and, hence, the distribution of the observables restricts the true value of the parameter to the identified set, which is denoted by ΘI. A set Cn(1 − α) is a confidence set for the identified set with confidence level (1 − α) if and only if the following property is satisfied:
(1)    \liminf_{n \to +\infty} P\bigl(\Theta_I \subseteq C_n(1-\alpha)\bigr) \geq 1 - \alpha.
Romano and Shaikh (2010) provided a link between the confidence set Cn(1 − α) and a hypothesis testing problem whose null hypothesis is H0: θ ∈ ΘI and whose alternative hypothesis is H1: θ ∉ ΘI. Specifically, if we decide to reject the null hypotheses for all parameter values that lie outside of Cn(1 − α) then,

⁵ As mentioned earlier, moment equalities can be admitted by combining pairs of weak moment inequalities.
⁶ This distinction was pointed out by Imbens and Manski (2004), who showed that the confidence set for the identified set will also be a confidence set for each of its elements.
⁷ As we mentioned in footnote 3, in Bugni (2010a), we focus on the problem of construction of confidence sets that cover each of the elements of the identified set with a prespecified probability. In that paper, we proposed an analogous bootstrap procedure and obtained the same theoretical results as in this paper.
in the limit, the probability of incorrectly rejecting at least one of these hypotheses will be less than α. The confidence set is said to provide consistent inference in level if it satisfies the coverage requirement, condition (1), with equality. This is a desirable feature since it implies that the confidence set is not excessively large, which would result in unnecessary loss of statistical power of the underlying hypothesis test. We show that our bootstrap procedure provides consistent inference in level.

Our results on confidence regions for partially identified models build upon the criterion function approach introduced by Chernozhukov, Hong, and Tamer (2007) (henceforth, CHT). In their paper, they implemented their inference using a resampling technique called subsampling. In essence, we implement the criterion function approach in a wide class of econometric models using an alternative resampling technique, the bootstrap. Our bootstrap procedure differs qualitatively from replacing the subsampling method provided by CHT with the bootstrap, that is, we do not merely propose a bootstrap analogue of their subsampling method. In fact, we show in Section A.2.5 of the Appendix (Supplemental Material (Bugni (2010b))) that a bootstrap analogue of their subsampling procedure would, in general, fail to be consistent in level. The difference between our bootstrap method and the bootstrap analogue of the subsampling procedure proposed by CHT lies in the choice of the bootstrap criterion function, which is the key to our consistency result. Following similar techniques to those used to implement our bootstrap scheme, we also propose an asymptotic approximation to perform consistent inference in level.⁸

There are currently many methods available to implement inference in partially identified econometric models. Given the choice of the criterion function, the researcher can implement inference using our bootstrap, asymptotic approximation, or subsampling. Since all these methods provide consistent inference in level (that is, they all achieve the desired goal asymptotically), an important basis of comparison is the rate at which the error in the coverage probability vanishes (that is, the rate at which the goal is achieved). If two methods have errors in the coverage probability that converge to zero at different rates, then the one that converges faster will eventually be more accurate than the one that converges slower. One of the main contributions of this paper is to show that, under certain conditions, our bootstrap and our asymptotic approximation have errors in the coverage probability that converge to zero at the same rate, which is a faster rate than the one obtained by using subsampling. Hence, under these conditions, our results imply that inference

⁸ In independent research, a similar asymptotic approximation was also proposed by Soares (2006), CHT, and Andrews and Soares (2010).
based on our bootstrap and our asymptotic approximation should eventually be more precise than inference based on subsampling.⁹

Several papers have proposed different inferential methods to perform inference in partially identified models. Most of these references only construct confidence sets that cover the elements of the identified set with a prespecified probability, which is a different coverage objective from the one considered in this paper.¹⁰ Notable exceptions are CHT, Beresteanu and Molinari (2008), and Romano and Shaikh (2010). Beresteanu and Molinari (2008) proposed an alternative approach to inference in partially identified models using random set theory, which is significantly different from the techniques used in our paper. Romano and Shaikh (2010) considered the general problem of constructing coverage regions for the identified set using a subsampling stepdown control procedure that is comparable to the subsampling procedure proposed by CHT. They showed formally that the subsampling stepdown control procedure cannot be replaced by a naive bootstrap stepdown control procedure. All of the procedures considered in our paper, including our bootstrap approximation, can be used as an ingredient in a stepdown control inferential procedure that results in consistent inference in level.¹¹

The rest of the paper is organized as follows. Section 2 considers the construction of confidence sets for the identified set. Section 2.1 introduces our assumptions and provides an example of an econometric model where these assumptions are satisfied. In Section 2.2, we introduce our bootstrap procedure to perform inference, demonstrate its consistency in level, and analyze its error in the coverage probability. In Section 2.3, we consider the two competing inferential procedures—subsampling and asymptotic approximation—for which we also show consistency in level and analyze the error in coverage probability. Section 3 concludes the paper and provides directions for further research. The online Appendix (Bugni (2010b)) collects all the proofs for the paper, intermediate results, supplementary explanations, and Monte Carlo simulations. Note that, except for assumptions, enumerations preceded by "A" refer to the online Appendix.

2. CONFIDENCE SETS FOR THE IDENTIFIED SET

Our objective is to construct confidence sets that cover the identified set with a prespecified probability and we choose to accomplish this objective using the

⁹ This result can be related to results in the literature on inference about sample averages, where the rate of convergence of subsampling procedures is likely to be slow, relative to the bootstrap or the asymptotic approximation. See, for example, Horowitz (2002), Politis and Romano (1994), and Politis, Romano, and Wolf (1999).
¹⁰ These include Andrews, Berry, and Jia (2004), Pakes, Porter, Ho, and Ishii (2006), Soares (2006), Romano and Shaikh (2008), Rosen (2008), Andrews and Soares (2010), and Canay (2010).
¹¹ I thank an anonymous referee for suggesting this point. This is formally shown in Section A.4.1 of the Appendix.
According to this approach, the first step is to define a nonnegative function of the parameter space, denoted by $Q$, that equals zero if and only if the parameter belongs to the identified set. This function is referred to as a criterion function because it provides a criterion that characterizes the identified set. We denote its sample analogue by $Q_n$ and define $\Gamma_n = \sup_{\theta \in \Theta_I} a_n Q_n(\theta)$, where $\{a_n\}_{n=1}^{+\infty}$ is a sequence of constants that makes the (asymptotic) distribution of $\Gamma_n$ nondegenerate. If we let $c_n$ denote the $(1-\alpha)$ quantile of $\Gamma_n$, then a confidence set for the identified set with confidence level $(1-\alpha)$ is given by

(2)  $C_n(1-\alpha) = \{\theta \in \Theta : a_n Q_n(\theta) \leq c_n\}$.
Therefore, the criterion function approach translates the problem of constructing confidence sets into a problem of approximating the quantiles of $\Gamma_n$. This approximation problem is nonstandard precisely because the econometric model is partially identified. In a class of models defined by moment inequalities and equalities, we show how to perform this approximation using the bootstrap.

2.1. Setup

In this section, we introduce the assumptions that define our econometric model. We consider two separate sets of assumptions. The first set of assumptions constitutes what we call the general model. The second set of assumptions is a particular subset of the first and constitutes what we call the conditionally separable model. The reason for considering two separate setups is that consistency in level can be shown under the assumptions of the general model, but results regarding rates of convergence require the stronger restrictions imposed by the conditionally separable model. After introducing both sets of assumptions, we provide an example to illustrate each of these frameworks.

2.1.1. General Model

The following assumptions constitute our general model in the independent and identically distributed (i.i.d.) setting.

ASSUMPTION A1: For the probability space $(\Omega, \mathcal{B}, P)$, let $Z : \Omega \to S_Z$ be a random vector. We observe an i.i.d. sample of size $n$, denoted by $X_n = \{Z_i\}_{i=1}^n$.

ASSUMPTION A2: The parameter space, denoted by $\Theta$, is a compact and convex subset of a finite dimensional Euclidean space $\mathbb{R}^\eta$ ($\eta < +\infty$).

ASSUMPTION A3: The identified set, denoted by $\Theta_I$, is given by

$\Theta_I = \big\{\theta \in \Theta : \{E(m_j(Z,\theta)) \leq 0\}_{j=1}^J\big\},$
where $m(z,\theta) : S_Z \times \Theta \to \mathbb{R}^J$ is a (jointly) measurable function and $E(m(Z,\theta)) : \Theta \to \mathbb{R}^J$ is a lower semicontinuous function. Moreover, $\Theta_I$ is a proper subset of $\Theta$.

ASSUMPTION A4: For every $\theta \in \Theta$ and every $j = 1, 2, \dots, J$, the variance of $m_j(Z,\theta)$ is finite and positive. For every $z \in S_Z$, the function $\{m(z,\theta) - E(m(Z,\theta)) : \Theta \to \mathbb{R}^J\}$ belongs to $\mathcal{B}$, which is a separable subset of the space of bounded functions that map $\Theta$ onto $\mathbb{R}^J$ with the sup-norm metric. The empirical process $v_n(m,\theta) = n^{-1/2} \sum_{i=1}^n \big(m(Z_i,\theta) - E(m(Z,\theta))\big)$ is stochastically equicontinuous, that is, for any $\varepsilon > 0$,

$\lim_{\eta \downarrow 0}\ \limsup_{n \to +\infty}\ P^*\Big(\sup_{\theta \in \Theta}\ \sup_{\{\theta' \in \Theta : \|\theta' - \theta\| \leq \eta\}} \big\|v_n(m,\theta) - v_n(m,\theta')\big\| > \varepsilon\Big) = 0,$
where $\|\cdot\|$ denotes Euclidean distance and $P^*$ denotes the outer measure12 with respect to $P$.

We briefly comment on some of the assumptions. Assumption A1 requires that the sample be i.i.d. The consistency of the bootstrap procedure proposed in this paper rests on the law of large numbers, the central limit theorem, and the law of the iterated logarithm applied to empirical processes. Consistency of our bootstrap procedure can be generalized to non-i.i.d. settings, provided that these results hold and, of course, that the resampling method is adequately modified for these settings.

Assumption A3 defines the identified set as the intersection of finitely many weak moment inequalities. Of course, equality restrictions can be accommodated by combining two moment inequalities. Also, notice that Assumption A3 allows the identified set to be empty, which would imply that the model is misspecified.

The present setup allows for econometric models defined by conditional moment conditions as long as the conditioning covariates have finite support.13 To see why, suppose that the conditioning covariate $X$ has finite support given by $S_X$ and let the identified set be given by $\Theta_I = \{\theta \in \Theta : \{\{E(M_j(Y,\theta)|X = x) \leq 0\}_{j=1}^J\}_{x \in S_X}\}$, where $M(y,\theta) : S_Y \times \Theta \to \mathbb{R}^J$ is a jointly measurable function and, for every $x \in S_X$, $E(M(Y,\theta)|X = x) : \Theta \to \mathbb{R}^J$ is lower semicontinuous. By defining $Z = (Y, X)$ and, for each $x \in S_X$, $m_x(Z,\theta) = M(Y,\theta)1[X = x]$, the identified set is reexpressed according to Assumption A3.

When we combine the total boundedness of the parameter space of Assumption A2 with the stochastic equicontinuity condition of Assumption A4, we deduce that the class of functions $\{m(z,\theta) - E(m(Z,\theta)) : S_Z \to \mathbb{R}^J\}$, indexed by $\theta \in \Theta$, is $P$-Donsker.

12 Let $(\Omega, \mathcal{B}, P)$ be a probability space. For any arbitrary subset of $\Omega$, denoted $A$, its outer measure is defined by $P^*(A) = \inf_{S \in \mathcal{B}}\{P(S) : A \subseteq S\}$.
13 The methods proposed in this paper cannot handle conditioning covariates with infinite support. If this is the case, one can still use our techniques by partitioning the support of the conditioning covariate into finitely many bins. In this process, some information will be lost, and so our method will result in conservative inference.
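To make the finite-support reexpression concrete, here is a minimal sketch (our illustration, not the paper's: the base moment $M(y,\theta) = y - \theta$, the binary support, and the simulated data are all invented for the example) of building the unconditional moments $m_x(Z,\theta) = M(Y,\theta)1[X = x]$ and their sample analogues. Note that $E(M(Y,\theta)1[X=x]) = E(M(Y,\theta)|X=x)P(X=x)$, so the two formulations have the same sign whenever $P(X=x) > 0$.

```python
import numpy as np

# Toy illustration (scalar theta, base moment M(y, theta) = y - theta,
# covariate X with finite support {0, 1}); not prescribed by the paper.
def M(y, theta):
    """Base conditional moment function M(y, theta)."""
    return y - theta

def unconditional_moments(y, x, theta, support=(0, 1)):
    """Reexpress E(M(Y,theta)|X=x) <= 0, x in S_X, as unconditional moments
    m_x(Z,theta) = M(Y,theta) * 1[X=x], one per support point."""
    return np.array([M(y, theta) * (x == xk) for xk in support])  # shape (K, n)

def sample_moments(y, x, theta, support=(0, 1)):
    """Sample analogue E_n(m_x(Z,theta)) = n^{-1} sum_i m_x(Z_i,theta)."""
    return unconditional_moments(y, x, theta, support).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=200)
y = x + rng.normal(size=200)
print(sample_moments(y, x, theta=0.5))  # one sample moment per covariate value
```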
2.1.2. Conditionally Separable Model

The following assumptions constitute our conditionally separable model in the i.i.d. setting.

ASSUMPTION B1: For the probability space $(\Omega, \mathcal{B}, P)$, let $(X, Y) : \Omega \to S_X \times \mathbb{R}^J$ be a random vector, where $S_X$, the support of $X$, is composed of $K$ values: $S_X = \{x_k\}_{k=1}^K$. We observe an i.i.d. sample of size $n$, denoted by $X_n = \{X_i, Y_i\}_{i=1}^n$.

ASSUMPTION B2: The parameter space, denoted by $\Theta$, is a compact and convex subset of a finite dimensional Euclidean space $\mathbb{R}^\eta$ ($\eta < +\infty$).

ASSUMPTION B3: The identified set, denoted by $\Theta_I$, is given by

$\Theta_I = \big\{\theta \in \Theta : \{\{E(Y_j - M_j(\theta, X)|X = x_k) \leq 0\}_{j=1}^J\}_{k=1}^K\big\},$

where, for each $x \in S_X$, $M(\theta, x) : \Theta \to \mathbb{R}^J$ is continuous. Moreover, $\Theta_I$ is a proper subset of $\Theta$.

ASSUMPTION B4: For every $x \in S_X$ and for every $j = 1, 2, \dots, J$, $\{Y_j|X = x\}$ has positive variance and finite fourth absolute moments.

This setup is a particular case of the general model that strengthens Assumptions A1, A3, and A4. Assumption B1 organizes the observable variables into conditioning covariates and dependent variables, and requires the former to have finite support. Recall that Assumption A3 allows the identified set to be defined by conditional moment inequalities of the form $E(m_j(Z,\theta)|X = x) \leq 0$. Assumption B3 strengthens this by requiring that the conditional expectation $E(m_j(Z,\theta)|X = x)$ separate into the expectation of a random variable that does not involve the parameter, $E(Y_j|X = x)$, and a conditionally nonstochastic function of the parameter, $E(M_j(\theta, X)|X = x) = M_j(\theta, x)$. Finally, Lemma A.1 shows that Assumptions B1–B4 imply Assumption A4.

2.1.3. Illustrative Example

To illustrate both of our frameworks and highlight their differences, we consider the following example related to Manski (1989). Suppose that our model predicts that $E(Z - f(X,\theta)|W) = 0$, where $f$ is a known function, $Z$ is the explained variable, $X$ is the explanatory variable, $\theta$ is the parameter of interest, and $W$ is an exogenous variable. Typical examples of this setup are linear index models, such as the linear model, the probit model, or the logit model. Suppose that certain observations of the explained variable are missing (or censored). Let $U$ denote the binary variable that takes value 1 if the observation is unobserved and 0 otherwise.
Suppose that $\{Z|W\}$ has logical lower and upper bounds, given by $Z_L(W)$ and $Z_H(W)$, respectively.14 Also assume that the support of $W$ is given by finitely many values: $S_W = \{w_k\}_{k=1}^K$. Under these conditions, the identified set for the parameter of interest is given by

$\Theta_I = \left\{\theta \in \Theta : \left\{\begin{aligned} &E\big[-(Z(1-U) + Z_H(w_k)U - f(X,\theta))1[W = w_k]\big] \leq 0,\\ &E\big[(Z(1-U) + Z_L(w_k)U - f(X,\theta))1[W = w_k]\big] \leq 0 \end{aligned}\right\}_{k=1}^K\right\}.$
Under random sampling and certain regularity conditions,15 this example satisfies the assumptions of the general framework. In particular, if the explanatory variable $X$ is endogenous or has continuous support, it cannot replace $W$ in the role of conditioning covariate and, therefore, the example does not satisfy the assumptions of the conditionally separable model. On the other hand, if the explanatory variable $X$ is exogenous and has finite support, it can replace $W$ as the conditioning covariate and the example then satisfies the assumptions of the conditionally separable model.

14 When $\{Z|W\}$ has no logical lower bound (respectively, upper bound), then $Z_L(W) = -\infty$ (respectively, $Z_H(W) = +\infty$).
15 These conditions are made explicit in Section A.2.2.
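To fix ideas, here is a sketch of the sample analogues of these $2K$ moment inequalities (our illustration only: a linear index $f(x,\theta) = x\theta$, constant logical bounds $Z_L = 0$ and $Z_H = 1$, and a binary $W$; the function name and data are invented):

```python
import numpy as np

# Toy sketch of the missing-outcome example (constant bounds simplify the
# general Z_L(W), Z_H(W)); theta is in the estimated set when all sample
# moments are below the chosen slackness.
def example_moments(z, x, w, u, theta, w_support=(0, 1), z_lo=0.0, z_hi=1.0):
    """Sample analogues of, for each w_k,
    E[-(Z(1-U) + Z_H*U - f(X,theta)) 1[W=w_k]] <= 0 and
    E[ (Z(1-U) + Z_L*U - f(X,theta)) 1[W=w_k]] <= 0."""
    f = x * theta
    moments = []
    for wk in w_support:
        ind = (w == wk)
        moments.append(np.mean(-(z * (1 - u) + z_hi * u - f) * ind))
        moments.append(np.mean((z * (1 - u) + z_lo * u - f) * ind))
    return np.array(moments)

rng = np.random.default_rng(1)
n = 500
w = rng.integers(0, 2, n)
x = rng.normal(size=n)
z = np.clip(0.3 * x + 0.5, 0.0, 1.0)
u = rng.random(n) < 0.2  # roughly 20% of outcomes missing
print(example_moments(z, x, w, u, theta=0.3))
```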
2.2. Bootstrap Procedure

In this section, we introduce our bootstrap procedure to construct confidence regions for the identified set. To implement any inferential procedure based on the criterion function approach, we need to complete certain steps. First, we need to define the criterion function for our problem. Second, we need to generate an estimator of the identified set. This estimator is not our final goal, but an input to our inferential procedure. Once we complete these preliminary steps, we are ready to define the resampling procedure that implements our inference.

2.2.1. Criterion Function

By definition, a function $Q : \Theta \to \mathbb{R}$ is a valid criterion function if it is nonnegative and takes value zero if and only if it is evaluated at a parameter in the identified set. Lemma A.2 characterizes all possible criterion functions for the type of models considered in this paper. This result reveals that there is a wide range of possible criterion functions. By restricting the class of functions under consideration, we can obtain desirable asymptotic results for our inferential procedures, such as consistency in level and rates of convergence. With this objective in mind, we consider the following assumption.

ASSUMPTION CF: The population criterion function is given by $Q(\theta) = G(\{[E(m_j(Z,\theta))]_+\}_{j=1}^J)$, where, for an arbitrary vector of positive constants $\{w_j\}_{j=1}^J$, $G : \mathbb{R}_+^J \to \mathbb{R}$ is one of two functions: $G(x) = \sum_{j=1}^J w_j x_j$ or $G(x) = \max\{w_j x_j\}_{j=1}^J$.

Throughout this paper, we focus on criterion functions that satisfy Assumption CF, but most of our results extend to more general criterion functions.16 Based on this assumption, the corresponding sample analogue is given by $a_n Q_n(\theta) = G(\{[\sqrt{n}\, E_n(m_j(Z,\theta))]_+\}_{j=1}^J)$, where, for every $j = 1, 2, \dots, J$, $E_n(m_j(Z,\theta))$ denotes $n^{-1}\sum_{i=1}^n m_j(Z_i,\theta)$.

2.2.2. Estimation of the Identified Set

The estimator of the identified set is an ingredient in the construction of confidence sets that cover the identified set with a prespecified probability. An estimator of the identified set is adequate for the purpose of inference if a confidence set that uses this estimator as an input produces inference that is consistent in level.

By definition, the identified set is the subset of the parameter space that satisfies $Q(\theta) = 0$. Therefore, the analogy principle suggests defining the estimator of the identified set as the collection of parameters that satisfy $Q_n(\theta) = 0$. In our context, this set estimator is given by $\hat{\Theta}_I^{AP} = \{\theta \in \Theta : \{E_n(m_j(Z,\theta)) \leq 0\}_{j=1}^J\}$. This random set is called the analogy principle estimator. In settings of practical relevance, the analogy principle estimator is not an adequate estimator for the purpose of inference. Following CHT, an adequate estimator requires the introduction of some slackness in the sample moment inequalities, thereby expanding the analogy principle estimator. Specifically, if we let $\{\tau_n\}_{n=1}^{+\infty}$ be a positive sequence such that $\tau_n/\sqrt{n} = o(1)$ and $\sqrt{\ln\ln n}/\tau_n = o(1)$ (almost surely), then our estimator of the identified set is given by

$\hat{\Theta}_I(\tau_n) = \big\{\theta \in \Theta : \{E_n(m_j(Z,\theta)) \leq \tau_n/\sqrt{n}\}_{j=1}^J\big\}.$

Our estimator can be interpreted as an expansion of the analogy principle estimator by a slackness factor of $\tau_n/\sqrt{n}$. The following lemma formalizes its properties.
16 In particular, our theoretical results extend to a generalization of Assumption CF, called Assumption CF′, which is described in Section A.2.3.
LEMMA 2.1: Assume Assumptions A1–A4. Let $\{\tau_n\}_{n=1}^{+\infty}$ be a positive sequence such that $\tau_n/\sqrt{n} = o(1)$ and $\sqrt{\ln\ln n}/\tau_n = o(1)$, almost surely, and define $\hat{\Theta}_I(\tau_n) = \{\theta \in \Theta : \{E_n(m_j(Z,\theta)) \leq \tau_n/\sqrt{n}\}_{j=1}^J\}$. For a sequence of positive numbers $\{\varepsilon_n\}_{n=1}^{+\infty}$ such that $\varepsilon_n = o(1)$ and $(\tau_n/\sqrt{n})/\varepsilon_n = o(1)$, almost surely, define $\Theta_I(\varepsilon_n) = \{\theta \in \Theta : \{E(m_j(Z,\theta)) \leq \varepsilon_n\}_{j=1}^J\}$. If the identified set is nonempty, then $P(\liminf\{\Theta_I \subseteq \hat{\Theta}_I(\tau_n) \subseteq \Theta_I(\varepsilon_n)\}) = 1$, and if the identified set is empty, then $P(\liminf\{\hat{\Theta}_I(\tau_n) = \emptyset\}) = 1$.

When the identified set is nonempty, then, with probability one, our set estimate will eventually be "sandwiched" between two sets: the identified set itself and a set that converges to the identified set. When the identified set is empty, then, with probability one, our set estimate will eventually become empty. As we show later, Lemma 2.1 implies that the proposed estimator of the identified set is adequate for the purpose of inference.

The restrictions on the sequence $\{\tau_n\}_{n=1}^{+\infty}$ provide little guidance on how to implement the estimator (and the inference based on it) in a finite sample setting. We comment on this important practical question in the next subsection.
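In a finite sample, the estimator can be computed by brute force; the sketch below makes the slackness explicit (our illustration: a parameter grid stands in for $\Theta$, $\tau_n = \sqrt{\ln\ln n}\,\ln n$ is one invented choice satisfying the rate conditions, and `sample_moment_fn` is a hypothetical function returning the vector $E_n(m_j(Z,\theta))$):

```python
import numpy as np

# Sketch of the set estimator on a parameter grid; requires n >= 3 so that
# log(log(n)) is defined. tau_n / sqrt(n) -> 0 and sqrt(ln ln n)/tau_n -> 0.
def estimate_identified_set(theta_grid, sample_moment_fn, n):
    """Return grid points in Theta_hat_I(tau_n): all moments <= tau_n / sqrt(n)."""
    tau_n = np.sqrt(np.log(np.log(n))) * np.log(n)  # one admissible rate
    slack = tau_n / np.sqrt(n)
    return [th for th in theta_grid if np.all(sample_moment_fn(th) <= slack)]

# Usage with the earlier toy moments, e.g.:
# est = estimate_identified_set(np.linspace(-1, 2, 61),
#                               lambda th: sample_moments(y, x, th), n=200)
```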
2.2.3. The Procedure

We now introduce our bootstrap procedure to construct confidence sets that cover the identified set with a prespecified probability. We propose two different procedures: one to be used if the model satisfies the assumptions of the general model and one to be used exclusively if the model satisfies the assumptions of the conditionally separable model.

Bootstrap Procedure for the General Model. The following bootstrap method is intended for the general model and so, in particular, it could also be applied to the conditionally separable model.17 This will be referred to as the bootstrap procedure for the general model and it consists of the following steps:

Step 1. Choose $\{\tau_n\}_{n=1}^{+\infty}$ to be a positive sequence such that $\tau_n/\sqrt{n} = o(1)$ and $\sqrt{\ln\ln n}/\tau_n = o(1)$, almost surely.

Step 2. Estimate the identified set with

$\hat{\Theta}_I(\tau_n) = \big\{\theta \in \Theta : \{E_n(m_j(Z,\theta)) \leq \tau_n/\sqrt{n}\}_{j=1}^J\big\}.$

Step 3. Repeat the following procedure for $s = 1, 2, \dots, S$. Construct bootstrap samples of size $n$ by sampling randomly with replacement from the data. Denote the bootstrapped observations by $\{Z_i^*\}_{i=1}^n$ and, for every $j = 1, 2, \dots, J$, let $E_n^*(m_j(Z,\theta))$ denote $n^{-1}\sum_{i=1}^n m_j(Z_i^*,\theta)$. Compute

$\Gamma_n^* = \begin{cases} \displaystyle\sup_{\theta \in \hat{\Theta}_I(\tau_n)} G\Big(\Big\{\big[\sqrt{n}\,\big(E_n^*(m_j(Z,\theta)) - E_n(m_j(Z,\theta))\big)\big]_+ \cdot 1\big[E_n(m_j(Z,\theta)) \leq \tau_n/\sqrt{n}\big]\Big\}_{j=1}^J\Big) & \text{if } \hat{\Theta}_I(\tau_n) \neq \emptyset,\\ 0 & \text{if } \hat{\Theta}_I(\tau_n) = \emptyset. \end{cases}$

Step 4. Let $\hat{c}_n^B(1-\alpha)$ be the $(1-\alpha)$ quantile of the distribution of $\Gamma_n^*$, approximated with arbitrary accuracy in the previous step. The $(1-\alpha)$ confidence set for the identified set is given by

$\hat{C}_n^B(1-\alpha) = \Big\{\theta \in \Theta : G\big(\big\{\big[\sqrt{n}\, E_n(m_j(Z,\theta))\big]_+\big\}_{j=1}^J\big) \leq \hat{c}_n^B(1-\alpha)\Big\}.$

To implement our procedure, we need to specify the sequence $\{\tau_n\}_{n=1}^{+\infty}$ described in the first step. This sequence enters the procedure in two places: first, in the estimation of the identified set in Step 2 and, second, in the indicator function term of the bootstrap criterion function in Step 3.18 The restrictions on the rate of the sequence $\{\tau_n\}_{n=1}^{+\infty}$ in Step 1 provide little guidance on how to choose this sequence in practice. As we show later, our bootstrap procedure will be consistent in level and have the same rate of convergence regardless of the specific choice of the sequence $\{\tau_n\}_{n=1}^{+\infty}$. Thus, our asymptotic analysis does not provide a criterion for an "optimal" choice of this sequence, and we consider the development of a data-dependent choice of this sequence to be outside the scope of this paper. The experience drawn from Monte Carlo simulations suggests that the finite sample performance of our inferential method does not depend critically on the specific choice of this sequence.

The key to the consistency in level of our bootstrap procedure is the bootstrap analogue criterion function defined in Step 3. In particular, it is essential to the consistency result that we introduce (a) the recentering term, that is, subtracting the sample moment from the bootstrap sample moment, and (b) the indicator function term. Because of these two terms, our bootstrap procedure differs qualitatively from the bootstrap version of the subsampling scheme proposed by CHT. We analyze these differences in Section A.2.5.

17 Nevertheless, as we explain next, if the model satisfies the assumptions of the conditionally separable model, there are advantages to using the bootstrap procedure specialized for this framework.
18 In principle, the sequence $\{\tau_n\}_{n=1}^{+\infty}$ in these two steps could be two different sequences, provided that they both satisfy the rate requirements of Step 1. In fact, the formal arguments in the Appendix allow these two sequences to differ. We restrict both sequences to coincide in the main text only to simplify the notation.
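The procedure above maps directly into code. Below is a minimal sketch of Steps 1–4 for the general model under illustrative choices that are ours, not the paper's: a grid search replaces the supremum over $\Theta$, $G$ is the sum criterion with unit weights, `data` is assumed to be a NumPy array of observations, `moment_fn(theta, data)` is a hypothetical user-supplied function returning $E_n(m_j(Z,\theta))$, and $\tau_n = \sqrt{\ln\ln n}\,\ln n$ is one admissible rate.

```python
import numpy as np

# Sketch of the general-model bootstrap (Steps 1-4); all tuning choices are
# illustrative, not the paper's prescriptions.
def bootstrap_confidence_set(data, theta_grid, moment_fn, alpha=0.05, S=999, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    tau_n = np.sqrt(np.log(np.log(n))) * np.log(n)           # Step 1 (one valid rate)
    slack = tau_n / np.sqrt(n)
    En = {th: moment_fn(th, data) for th in theta_grid}
    est = [th for th in theta_grid if np.all(En[th] <= slack)]  # Step 2
    stats = []
    for _ in range(S):                                        # Step 3
        boot = data[rng.integers(0, n, size=n)]               # resample with replacement
        if not est:
            stats.append(0.0)
            continue
        vals = []
        for th in est:
            recentered = np.sqrt(n) * (moment_fn(th, boot) - En[th])
            terms = np.maximum(recentered, 0.0) * (En[th] <= slack)
            vals.append(terms.sum())                          # G = sum, unit weights
        stats.append(max(vals))
    c_hat = np.quantile(stats, 1 - alpha)                     # Step 4
    return [th for th in theta_grid
            if np.maximum(np.sqrt(n) * En[th], 0.0).sum() <= c_hat]
```

The recentering in `recentered` and the indicator `(En[th] <= slack)` are exactly the two terms singled out above as the key to consistency in level.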
Bootstrap Procedure for the Conditionally Separable Model. In the conditionally separable model, we not only show the consistency in level of the inferential procedure, but also establish the rate of convergence of the error of the bootstrap approximation. To understand why we need to introduce a separate bootstrap method for the conditionally separable model, we need to distinguish between whether the design of the covariates is fixed or random. The design of the covariates refers to how the econometrician perceives the distribution of covariates in the sample. If the design of the covariates is fixed, then the distribution of the covariates is considered to be nonstochastic (or stochastic and conditioned upon); if the design of the covariates is random, then the distribution of the covariates is considered to be stochastic. Of course, the inferential procedure we perform is different depending on the case.

In the fixed covariates case, the cell frequency of the covariates is constant, which implies that the bootstrap procedure proposed in the previous section delivers a rate of convergence of order $n^{-1/2}$. In the random covariates case, the cell frequency of the covariates is random, which implies that the bootstrap procedure proposed in the previous section delivers a rate of convergence of order $n^{-1/2}\ln n \ln\ln n$ (rather than $n^{-1/2}$).19 Nevertheless, it is possible to design a bootstrap method that achieves a rate of convergence of order $n^{-1/2}$ independently of the design of the covariates. This will be referred to as the bootstrap procedure specialized for the conditionally separable model and it consists of the following steps:

Step 1. Choose $\{\tau_n\}_{n=1}^{+\infty}$ to be a positive sequence such that $\tau_n/\sqrt{n} = o(1)$ and $\sqrt{\ln\ln n}/\tau_n = o(1)$, almost surely.

Step 2. Estimate the identified set with

$\hat{\Theta}_I(\tau_n) = \big\{\theta \in \Theta : \big\{\big\{\hat{p}_k\big(E_n(Y_j|x_k) - M_j(\theta,x_k)\big) \leq \tau_n/\sqrt{n}\big\}_{j=1}^J\big\}_{k=1}^K\big\},$

where, for every $k = 1, 2, \dots, K$, $\hat{p}_k = n^{-1}\sum_{i=1}^n 1[X_i = x_k]$.

Step 3. Repeat the following procedure for $s = 1, 2, \dots, S$. Construct bootstrap samples of size $n$ by sampling randomly with replacement from the data.20 Denote the bootstrapped observations by $\{Y_i^*, X_i^*\}_{i=1}^n$ and, for every $k = 1, 2, \dots, K$ and $j = 1, 2, \dots, J$, let $\hat{p}_k^* = n^{-1}\sum_{i=1}^n 1[X_i^* = x_k]$ and $E_n^*(Y_j|x_k) = (\hat{p}_k^* n)^{-1}\sum_{i=1}^n Y_{ji}^* 1[X_i^* = x_k]$. Compute

$\Gamma_n^* = \begin{cases} \displaystyle\sup_{\theta \in \hat{\Theta}_I(\tau_n)} G\Big(\Big\{\Big\{\big[\sqrt{n}\,\hat{p}_k^*\big(E_n^*(Y_j|x_k) - E_n(Y_j|x_k)\big)\big]_+ \cdot 1\big[\hat{p}_k\big(E_n(Y_j|x_k) - M_j(\theta,x_k)\big) \leq \tau_n/\sqrt{n}\big]\Big\}_{j=1}^J\Big\}_{k=1}^K\Big) & \text{if } \hat{\Theta}_I(\tau_n) \neq \emptyset,\\ 0 & \text{if } \hat{\Theta}_I(\tau_n) = \emptyset. \end{cases}$

Step 4. Let $\hat{c}_n^B(1-\alpha)$ be the $(1-\alpha)$ quantile of the distribution of $\Gamma_n^*$, simulated with arbitrary accuracy in the previous step. The $(1-\alpha)$ confidence set for the identified set is given by

$\hat{C}_n^B(1-\alpha) = \Big\{\theta \in \Theta : G\Big(\Big\{\Big\{\big[\sqrt{n}\,\hat{p}_k\big(E_n(Y_j|x_k) - M_j(\theta,x_k)\big)\big]_+\Big\}_{j=1}^J\Big\}_{k=1}^K\Big) \leq \hat{c}_n^B(1-\alpha)\Big\}.$
19 This result is available from the author upon request.
20 Of course, bootstrap samples are constructed to respect the assumption on the design of the covariates.
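A distinctive feature of Step 3 here is that the recentered term inside $[\cdot]_+$ does not vary with $\theta$. The sketch below (a toy design with binary $X$ and scalar $Y$; all names invented; covariate cells assumed nonempty in both samples) exploits this by computing it once per draw:

```python
import numpy as np

# Sketch of the specialized Step 3 statistic for one bootstrap draw (sum
# criterion with unit weights; illustrative only).
def specialized_gamma_star(y, x, y_b, x_b, est_set, M, support, slack):
    n = len(y)
    bracket = {}
    for k, xk in enumerate(support):       # depends on k only, not on theta
        p_k_b = np.mean(x_b == xk)
        En = y[x == xk].mean()
        En_b = y_b[x_b == xk].mean()
        bracket[k] = max(np.sqrt(n) * p_k_b * (En_b - En), 0.0)
    vals = []
    for th in est_set:
        total = 0.0
        for k, xk in enumerate(support):
            p_k = np.mean(x == xk)
            En = y[x == xk].mean()
            ind = p_k * (En - M(th, xk)) <= slack
            total += bracket[k] * ind
        vals.append(total)
    return max(vals) if vals else 0.0
```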
The only distinction between the general procedure and the one specialized for the conditionally separable model occurs in the definition of the bootstrap criterion function in Step 3. In the latter, the argument inside the $[\cdot]_+$ function is a random variable rather than a random function, which is the key feature that allows us to obtain a rate of convergence of order $n^{-1/2}$, regardless of the assumption about the covariates.

2.2.4. Asymptotic Properties

In this section, we establish the asymptotic properties of our bootstrap procedure. The results of this section are based on two representation theorems, which are stated and proved in the Appendix. As a first step, we show that the distribution of the statistic of interest has a certain asymptotic representation (Theorem A.1). In a second step, we establish that, conditional on the sample, our bootstrap approximation has an analogous asymptotic representation (Theorem A.3). Based on these results, we can show that bootstrap confidence sets are consistent in level.

THEOREM 2.1—Consistency in Level—Bootstrap Approximation: Assume Assumptions A1–A4 and CF. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$\lim_{n\to+\infty} P\big(\Theta_I \subseteq \hat{C}_n^B(1-\alpha)\big) = (1-\alpha).$
The representation theorems are also the key to analyzing the error in the coverage probability (ECP) of the bootstrap approximation, which is the difference between the desired coverage level and the actual coverage level. By Theorem 2.1, the error in the coverage probability of our bootstrap approximation converges to zero. In the conditionally separable model, the representation theorems are used to provide an upper bound on the rate at which this convergence occurs.

THEOREM 2.2—ECP—Bootstrap Approximation: Assume Assumptions B1–B4 and CF, and choose the bootstrap procedure to be the one specialized for the conditionally separable model. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$P\big(\Theta_I \subseteq \hat{C}_n^B(1-\alpha)\big) - (1-\alpha) = O\big(n^{-1/2}\big).$

In terms of coverage, the only relevant case is the nonempty identified set, since the empty set is trivially covered by any confidence set.21

21 Our bootstrap procedure also exhibits desirable properties when the identified set is empty. This is established in Lemma A.7.
The previous theorem shows that, in the conditionally separable model, the error in the coverage probability converges to zero at a rate of order $n^{-1/2}$. Provided that the sequence $\{\tau_n\}_{n=1}^{+\infty}$ satisfies the requirements of Step 1, the rate of convergence of the error in the coverage probability does not depend on the particular choice of the sequence. In this sense, our asymptotic analysis does not provide a criterion for an "optimal" choice of this sequence.

2.3. Alternative Procedures

In the previous sections, we proposed a bootstrap scheme to perform inference in partially identified models and we studied its properties. In this section, we consider alternative inferential methods.

2.3.1. Subsampling

There are different subsampling procedures that can be proposed to approximate the distribution of interest. A first scheme is a subsampling analogue of the bootstrap procedure proposed in the preceding section, which will be referred to as Subsampling 1. A second scheme22 is the subsampling procedure proposed by CHT, which will be referred to as Subsampling 2. Both of these procedures are described and studied in detail in Section A.6.1. The basic difference between the two methods is the definition of the subsampling criterion function in Step 3. On the one hand, Subsampling 1 uses a criterion function that includes a recentering term and an indicator function term, exactly as in the bootstrap scheme of Section 2.2.3. On the other hand, the criterion function of Subsampling 2 has neither of these terms. The benchmark subsampling scheme for the presentation of the results is Subsampling 1; at the end of this subsection, we briefly discuss how the results differ for Subsampling 2. (A code sketch of the Subsampling 1 statistic follows Theorem 2.3 below.)

First, we establish the consistency in level of the benchmark subsampling approximation.

THEOREM 2.3—Consistency in Level—Subsampling 1: Assume Assumptions A1–A4 and CF. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$\lim_{n\to+\infty} P\big(\Theta_I \subseteq \hat{C}_{b_n,n}^{SS1}(1-\alpha)\big) = (1-\alpha).$
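For concreteness, the following sketch shows one draw of the Subsampling 1 statistic; it mirrors the bootstrap sketch of Section 2.2.3 (same invented grid-based criterion and names), with the two changes that define subsampling: a subsample of size $b_n$ drawn without replacement and a $\sqrt{b_n}$ scaling.

```python
import numpy as np

# Sketch of one Subsampling 1 draw (illustrative only). The text below
# suggests that, in practice, b_n is of order n^(2/3).
def subsampling1_stat(data, est_set, En, moment_fn, slack, b_n, rng):
    sub = data[rng.choice(len(data), size=b_n, replace=False)]
    vals = [0.0]                      # statistic is 0 when est_set is empty
    for th in est_set:
        recentered = np.sqrt(b_n) * (moment_fn(th, sub) - En[th])
        vals.append((np.maximum(recentered, 0.0) * (En[th] <= slack)).sum())
    return max(vals)
```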
Second, we establish the rate of convergence of the error in the coverage probability of the inference based on the benchmark subsampling approximation.

22 I thank an anonymous referee for suggesting the consideration of the second version of subsampling.
THEOREM 2.4—ECP—Subsampling 1: Assume Assumptions B1–B4 and CF, and that the distribution of $\{\{Y_j 1[X = x_k]\}_{j=1}^J\}_{k=1}^K$ is strongly nonlattice. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$P\big(\Theta_I \subseteq \hat{C}_{b_n,n}^{SS1}(1-\alpha)\big) - (1-\alpha) = O\big(b_n/n + b_n^{-1/2}\big).$

This theorem establishes an upper bound on the rate at which the error in the coverage probability of the subsampling approximation converges to zero. This upper bound depends on the choice of the subsample size, reflecting the usual trade-off: increasing the subsample size increases the precision of the averages within a subsample, but decreases the total number of subsamples available. The choice of the subsample size that minimizes this upper bound is $b_n = O(n^{2/3})$, which results in an error in the coverage probability of order $n^{-1/3}$.

We can provide conditions under which this rate constitutes not just an upper bound on the rate of convergence of the error in the coverage probability, but also a lower bound. We now describe the results; all the formal arguments are provided in Section A.6.1. Under certain conditions, Lemma A.9 shows that the conditional distribution of our subsampling approximation has the asymptotic representation

(3)  $\underbrace{P\big(\hat{\Gamma}_{b_n,n}^{SS1} \leq h \mid X_n\big)}_{\text{subsampling approx.}} = \underbrace{P(\Gamma_n \leq h)}_{\text{exact distribution}} + K_1(h)\, b_n^{-1/2} + K_2(h)\, \frac{b_n}{n} + o_p\Big(b_n^{-1/2} + \frac{b_n}{n}\Big),$

uniformly over $h \geq \mu$ (for any $\mu > 0$), where $K_1$ and $K_2$ are two nonstochastic functions given in the Appendix. Politis, Romano, and Wolf (1999) provided intuition for the nature of the two leading terms in the error of the subsampling approximation. The term of order $b_n^{-1/2}$ appears because we are approximating a distribution by repeatedly extracting random samples of size $b_n$ from the data. This leading term also appears in the bootstrap, but is of order $n^{-1/2}$ instead of $b_n^{-1/2}$. The term of order $b_n/n$ appears because samples are extracted without replacement from a finite population, which introduces error in the approximation of the variance of the distribution. This term does not appear in the bootstrap, where samples are extracted with replacement.

From this equation, it follows that, for any $h \geq \mu$, the absolute value of the error in the approximation of Subsampling 1 is minimized by choosing subsample size $b_n = C(h)n^{2/3}$, where $C(h)$ is the positive minimizer of $|K_1(h)C(h)^{-1/2} + K_2(h)C(h)|$. Therefore, if the functions $K_1$ and $K_2$ share the same sign (for relevant values of $h$), then the error in the approximation of Subsampling 1 converges to zero at a rate that cannot be faster than $n^{-1/3}$.
For the purpose of inference, we are interested in values of $h$ in a neighborhood of the $(1-\alpha)$ quantile of the limiting distribution, which we denote by $c_\infty(1-\alpha)$. In Lemma A.11, we show that, for all significance levels that are relevant for inferential purposes, $K_2(c_\infty(1-\alpha))$ is positive. Therefore, if $K_1(c_\infty(1-\alpha))$ is also positive, the previous logic implies that the approximation of Subsampling 1 converges to the distribution of interest at an exact rate of $n^{-1/3}$. This is the content of Theorem A.11. The conditions under which $K_1(c_\infty(1-\alpha))$ is positive involve restrictions on the moments of $\{\{Y_j 1[X = x_k]\}_{j=1}^J\}_{k=1}^K$ that, to the best of our knowledge, lack an intuitive interpretation.

In the case that $K_1(c_\infty(1-\alpha))$ is nonpositive, it might be possible to make the error on the right-hand side of equation (3) $o_p(n^{-1/3})$ by a particularly judicious choice of $C(h)$. However, this approach would not be very practical, since it requires careful empirical selection of the subsample size based on estimation of $K_1(c_\infty(1-\alpha))$ and $K_2(c_\infty(1-\alpha))$. In practice, based on equation (3), the subsample size is likely to be chosen as $b_n = Cn^{2/3}$ for a fixed $C > 0$. In this case, unless $K_1(c_\infty(1-\alpha))C^{-1/2} + K_2(c_\infty(1-\alpha))C = 0$, the approximation of Subsampling 1 will converge to the distribution of interest at a rate of $n^{-1/3}$. According to previous sections, the bootstrap delivers an error in the coverage probability of order $n^{-1/2}$. Hence, in the conditionally separable model and under certain conditions, the error in the coverage probability of the bootstrap approximation is eventually smaller than the error in the coverage probability produced by the approximation of Subsampling 1.

In the Appendix, we show that the approximation of Subsampling 2 generates consistent inference in level (Theorem A.13) and we provide an upper bound on the rate of convergence of the error in the coverage probability (Theorem A.14). In this case, the upper bound on the rate of convergence is of order $b_n^{-1/2} + \tau_n(b_n/n)^{1/2}$, which is even worse than the one obtained for Subsampling 1. Moreover, Lemma A.16 provides conditions under which this rate is not just an upper bound, but the exact rate of convergence of the error in the coverage probability.

2.3.2. Asymptotic Approximation

Theorem A.1 shows that the statistic of interest converges weakly to a continuous function of a tight Gaussian process with a certain variance–covariance function. An asymptotic approximation can be constructed by replacing the Gaussian process with an estimate. This procedure is described in detail in Section A.6.2. Following the steps we used for the bootstrap approximation, we can prove consistency in level for the asymptotic approximation.
THEOREM 2.5—Consistency in Level—Asymptotic Approximation: Assume Assumptions A1–A4 and CF. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$\lim_{n\to+\infty} P\big(\Theta_I \subseteq \hat{C}_n^{AA}(1-\alpha)\big) = (1-\alpha).$
Moreover, we can also establish the rate of convergence of the error in the coverage probability of the asymptotic approximation.

THEOREM 2.6—ECP—Asymptotic Approximation: Assume Assumptions B1–B4 and CF. If the identified set is nonempty, then, for any $\alpha \in (0, 0.5)$,

$P\big(\Theta_I \subseteq \hat{C}_n^{AA}(1-\alpha)\big) - (1-\alpha) = O\big(n^{-1/2}\big).$

For the conditionally separable model, the bootstrap and the asymptotic approximation have the same upper bound on the rate of convergence of the error in the coverage probability. In other words, the bootstrap does not seem to provide any improvement over the asymptotic approximation, an improvement usually referred to as an asymptotic refinement.23 The lack of asymptotic refinements is expected given that the statistic of interest is not asymptotically pivotal.24 Finally, notice that the implementation of the asymptotic approximation requires a simulation that involves exactly the same amount of computation as the implementation of the bootstrap approximation.

3. CONCLUSION

This paper contributes to the growing literature on inference in partially identified (or set identified) econometric models. We build on the criterion function approach proposed by CHT. The first contribution of this paper is to introduce a novel bootstrap procedure to construct confidence sets that cover the identified set with a prespecified probability in a wide class of partially identified models. The models considered are those defined by finitely many moment inequalities and equalities, which include many applications of economic interest. Asymptotically, the coverage level provided by our confidence set converges to the desired coverage level, that is, our procedure is consistent in level. This constitutes an advantage relative to other inferential procedures that have been proposed in the literature. Along the lines of our bootstrap procedure, we also propose an asymptotic approximation that is also consistent in level.25

23 To obtain asymptotic refinements, one could consider a computationally intensive procedure called prepivoting. This procedure is described in detail in Hall (1992) and Horowitz (2002). The study of the validity of the prepivoting procedure in this setting is outside the scope of this paper.
24 This is discussed in Section A.2.3.
25 As we mentioned earlier, this approximation was independently introduced by Soares (2006), Andrews and Soares (2010), CHT, and working paper versions of this paper.
The second contribution of our paper is to analyze the rate of convergence of the error in the coverage probability for our bootstrap approximation, our asymptotic approximation, and subsampling approximations such as the one proposed by CHT. We show that our bootstrap approximation and our asymptotic approximation have errors in the coverage probability of order $n^{-1/2}$. Under certain conditions, we show that the error in the coverage probability of the subsampling approximation converges to zero at a rate of $n^{-1/3}$ or slower. As a consequence, under these conditions, our bootstrap and our asymptotic approximation should eventually provide inference that is more precise than that of the subsampling approximation. The Monte Carlo simulations presented in the Appendix reveal that our bootstrap approximation and our asymptotic approximation have a satisfactory finite sample performance. Moreover, both of these approximations exhibit a much better finite sample performance than the subsampling procedures, in accordance with the results regarding rates of convergence of the error in the coverage probability.

This paper opens several topics for further research. A first topic is to find an adequate data-dependent procedure to choose the sequence $\{\tau_n\}_{n=1}^{+\infty}$, which is the key to our consistency result. A second important generalization of this paper would be to allow for continuous covariates, which requires modifying the formal arguments in nontrivial ways. A final topic left for future research is to study whether the theoretical properties of our inferential method (coverage in level and rates) hold uniformly over a relevant class of probability distributions. The literature that studies this problem in partially identified models has focused exclusively on the problem of covering each element of the identified set with a prespecified probability. Extending these results to the problem of covering the identified set with a prespecified probability appears to be a very challenging problem.

REFERENCES

ANDREWS, D. W. K., AND G. SOARES (2010): "Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection," Econometrica, 78, 119–157.
ANDREWS, D. W. K., S. BERRY, AND P. JIA (2004): "Confidence Regions for Parameters in Discrete Games With Multiple Equilibria, With an Application to Discount Chain Store Location," Mimeo, Yale University and M.I.T.
BERESTEANU, A., AND F. MOLINARI (2008): "Asymptotic Properties for a Class of Partially Identified Models," Econometrica, 76, 763–814.
BUGNI, F. A. (2010a): "Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Elements of the Identified Set," Mimeo, Duke University.
——— (2010b): "Supplement to 'Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Elements of the Identified Set'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8056_extensions.pdf; http://www.econometricsociety.org/ecta/Supmat/8056_data and programs.zip.
CANAY, I. A. (2010): "EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity," Journal of Econometrics (forthcoming).
CHERNOZHUKOV, V., H. HONG, AND E. TAMER (2007): "Parameter Set Inference in a Class of Econometric Models," Econometrica, 75, 1243–1284.
HALL, P. (1992): The Bootstrap and Edgeworth Expansion. New York: Springer.
HOROWITZ, J. L. (2002): "The Bootstrap," in Handbook of Econometrics, Vol. 5. Amsterdam: Elsevier Science, Chap. 52, 3159–3228.
IMBENS, G., AND C. F. MANSKI (2004): "Confidence Intervals for Partially Identified Parameters," Econometrica, 72, 1845–1857.
MANSKI, C. F. (1989): "Anatomy of the Selection Problem," The Journal of Human Resources, 24, 343–360.
PAKES, A., J. PORTER, K. HO, AND J. ISHII (2006): "Moment Inequalities and Their Application," Mimeo, Harvard University, University of Wisconsin, and Columbia University.
POLITIS, D. N., AND J. P. ROMANO (1994): "Large Sample Confidence Regions Based on Subsamples Under Minimal Assumptions," The Annals of Statistics, 22, 2031–2050.
POLITIS, D. N., J. P. ROMANO, AND M. WOLF (1999): Subsampling. New York: Springer.
ROMANO, J. P., AND A. M. SHAIKH (2008): "Inference for the Identified Set in Partially Identified Econometric Models," Journal of Statistical Planning and Inference, 138, 2786–2807.
——— (2010): "Inference for the Identified Set in Partially Identified Econometric Models," Econometrica, 78, 169–211.
ROSEN, A. (2008): "Confidence Sets for Partially Identified Parameters That Satisfy a Finite Number of Moment Inequalities," Journal of Econometrics, 146, 107–117.
SOARES, G. (2006): "Inference for Partially Identified Models With Inequality Moment Constraints," Mimeo, Yale University.
Dept. of Economics, Duke University, 213 Social Sciences Building, Box 90097, Durham, NC 27708, U.S.A.;
[email protected]. Manuscript received August, 2008; final revision received September, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 755–770
OBJECTIVE AND SUBJECTIVE RATIONALITY IN A MULTIPLE PRIOR MODEL

BY ITZHAK GILBOA, FABIO MACCHERONI, MASSIMO MARINACCI, AND DAVID SCHMEIDLER1

A decision maker (DM) is characterized by two binary relations. The first reflects choices that are rational in an "objective" sense: the DM can convince others that she is right in making them. The second relation models choices that are rational in a "subjective" sense: the DM cannot be convinced that she is wrong in making them. In the context of decision under uncertainty, we propose axioms that the two notions of rationality might satisfy. These axioms allow a joint representation by a single set of prior probabilities and a single utility index. It is "objectively rational" to choose f in the presence of g if and only if the expected utility of f is at least as high as that of g given each and every prior in the set. It is "subjectively rational" to choose f rather than g if and only if the minimal expected utility of f (with respect to all priors in the set) is at least as high as that of g. In other words, the objective and subjective rationality relations admit, respectively, a representation à la Bewley (2002) and à la Gilboa and Schmeidler (1989). Our results thus provide a bridge between these two classic models, as well as a novel foundation for the latter.

KEYWORDS: Multiple priors, rationality.
1. INTRODUCTION A CENTRAL ISSUE IN DECISION UNDER UNCERTAINTY has been the modeling of Ellsberg-type phenomena (Ellsberg (1961)) that arise in the presence of ambiguity, that is, when decision makers (DMs) do not have enough information to quantify uncertainty with a single probability measure. Following Schmeidler’s (1986, 1989) model involving nonadditive probabilities, Gilboa and Schmeidler (1989) and Bewley (2002) suggested models with sets of probabilities: the former modeling a complete preference relation by the maxmin rule, and the latter modeling an incomplete relation by the unanimity rule. Our purpose here is to show that the two models, and their perspectives on ambiguity, are complementary and can be fruitfully combined in a preference formation perspective. We take a normative viewpoint and attempt to capture preferences that are justifiable. In many decision problems of interest, preferences that can be solidly justified are incomplete, yet decisions eventually have to be made. Thus we deal with two preference relations. We correspondingly suggest two notions of rational choice. A choice is objectively rational if the DM can convince others that she is right in making it. 1 We wish to thank Eyal Baharad, Erio Castagnoli, Simone Cerreia-Vioglio, Eric Danan, Eddie Dekel, Gabi Gayer, Paolo Ghirardato, Al Klevorick, Dan Levin, Bob Nau, Klaus Nehring, Efe Ok, Wolfgang Pesendorfer, Ben Polak, Peter Wakker, and three anonymous referees for comments and references. This project was supported by the European Research Council (Advanced Grant BRSCDP-TEA), the Israel Science Foundation (Grants 975/03 and 355/06), and the Pinhas Sapir Center for Development.
A choice is subjectively rational if others cannot convince the DM that she is wrong in making it.

It can be useful to think of "being able to convince" as "having a proof that." Consider a model (not formalized here) in which preference statements such as $f \succsim g$ or $f \succ g$ are basic propositions, used to generate proofs in a given logic. A proof employs objective propositions that are accepted as statistical analysis of objective evidence, as scientific facts, and so forth, as well as preference statements by the DM. The proof is allowed to use standard logic, mathematical analysis, and statistical analysis, and also decision theoretic axioms. For example, transitivity may be used as an inference rule, allowing one to concatenate a proof that $f \succsim g$ with a proof that $g \succsim h$ to get a proof that $f \succsim h$.

Using this logic metaphor, it is objectively rational to express a preference $f \succsim g$ if there is a proof that starts only with objective facts, and uses the logic described above to show that $f \succsim g$. In other words, one may view the objectively rational relation as consisting of all provable preference statements. By contrast, the relation is subjectively rational if, starting with its preference statements, no inconsistencies result. Thus, objective rationality is essentially a property of particular instances of the relation, whereas subjective rationality is a property of the entire relation.2

While we refer to objective and subjective rationality throughout the paper, the reader may think of an incomplete relation, describing justifiable choice, and a complete relation, describing the choices that will eventually be made.

To illustrate our approach, we next consider two classic axioms, transitivity and independence. In this regard, observe that one may choose to model objective and subjective rationality using axioms different from those we employ in this paper. The axioms are supposed to capture the regularities satisfied by the two notions of rationality, that is, the ability to convince others of one's opinion or to insist on it. Which axioms are acceptable for these two notions is ultimately a subjective matter that may depend on culture or personality. Therefore, whether a particular axiom is satisfied by objective or subjective rationality becomes an empirical question, to be settled by the degree to which people tend to be convinced by arguments. It follows that the axioms we propose here should only be viewed as a suggestion for the list of regularities the two notions of rationality may satisfy.

Transitivity

We will require both the objective and the subjective rational relations to be transitive. However, this axiom is interpreted differently in the two cases.
2 A more precise, though more cumbersome, term for "objective rationality" would be "intersubjective rationality," because we do not make any reference to an externally defined objectivity or to "truth." Note that Simon (1947, 1957) also distinguished between objective rationality and subjective rationality, and in his case "objectivity" has a more classical meaning, referring to the experimenter's judgment.
Regarding objective rationality, transitivity is a basic "inference rule," as explained above. When subjective rationality is discussed, we assume that all preference pairs are given as data. If the DM expresses strict preferences $f \succ g \succ h \succ f$, we believe that she can be convinced that her preferences are irrational or wrong. Assuming transitivity as a normative condition of subjective rationality captures this intuition.

Independence

We maintain that objective and subjective rationality differ when we consider the independence axiom as in Anscombe–Aumann's model, namely that $f \succsim g$ if and only if $\alpha f + (1-\alpha)h \succsim \alpha g + (1-\alpha)h$.

Consider first objective rationality. Suppose that $f \succsim^* g$. Hence, there exists a proof, starting with objectively acceptable preferences, that $f$ is at least as good as $g$. The standard argument in favor of the independence axiom can be concatenated with this proof to conclude that $\alpha f + (1-\alpha)h \succsim^* \alpha g + (1-\alpha)h$.

Next suppose that subjective rationality is concerned. Assume that the DM expressed preferences $f \sim g$ and $\alpha f + (1-\alpha)h \prec \alpha g + (1-\alpha)h$. Will she necessarily be embarrassed by these preferences? We maintain that this need not be the case. For example, assume that there are two states of the world, and that $f = (1, 0)$ and $g = (0, 1)$. The DM has no information about the probability of the two states, and therefore her objective rationality relation does not rank them. Having to make a decision, the DM might shrug her shoulders and decide that they are equivalent, namely, that $f \sim g$, due to symmetry. But when $f$ and $g$ are mixed with $h = f$ and $\alpha = 1/2$, the mixture $(1/2)g + (1/2)h$ completely hedges against uncertainty, whereas the mixture of $f$ with itself does not provide any reduction of uncertainty. The DM might plausibly argue that in this case $\alpha f + (1-\alpha)h$ is not equivalent to $\alpha g + (1-\alpha)h$, because the former is uncertain, whereas the latter is not. As a result, the independence axiom is not as normatively appealing for subjective rationality as it is for objective rationality. On the other hand, observe that no asymmetric uncertainty reduction would result if $f$ and $g$ were mixed with a constant act $h$. For this reason, the conjunction of $f \sim g$ and $\alpha f + (1-\alpha)h \prec \alpha g + (1-\alpha)h$ seems more difficult to justify as consistent, and subjective rationality will be required to satisfy C-Independence.
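For a worked version of this hedging argument (our numbers; the DM is assumed to evaluate acts by minimal expected utility over all priors $p = (p_1, 1-p_1)$, anticipating the maxmin rule introduced below):

```latex
% Worked two-state hedging computation (illustrative; utilities read directly
% from the payoffs of f = (1,0) and g = (0,1)).
\[
\min_{p \in [0,1]} \big[ p \cdot 1 + (1-p) \cdot 0 \big] = 0
\quad \text{for } f = (1,0) \text{ and, symmetrically, for } g = (0,1),
\]
\[
\tfrac{1}{2} g + \tfrac{1}{2} f = \big(\tfrac{1}{2}, \tfrac{1}{2}\big)
\quad \Longrightarrow \quad
\min_{p \in [0,1]} \big[ p \cdot \tfrac{1}{2} + (1-p) \cdot \tfrac{1}{2} \big] = \tfrac{1}{2},
\]
% so the 50-50 mixture of g with h = f strictly improves on both f and g,
% while (1/2)f + (1/2)f = f yields minimal expected utility 0.
```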
The Present Model

We use two binary relations, $(\succsim^*, \succsim^\wedge)$, interpreted as the objective and subjective rationality relations, respectively. We first provide a characterization of $\succsim^*$ so that it can be represented by a unanimity rule à la Bewley (2002) with a set of priors $C^*$ and a utility index $u^*$ (Theorem 1). If $\succsim^\wedge$ satisfies the axioms of Gilboa and Schmeidler (1989), then it can be represented by the maxmin rule with a set of priors $C^\wedge$ and a utility index $u^\wedge$ (Theorem 2). However, the two sets of priors and the two utility indexes are unrelated. We therefore introduce two additional properties, explicitly connecting the two relations. The first property, Consistency, requires that a preference instance that is objectively rational is also subjectively rational. The second, Caution, deals with the way subjective rationality completes preferences between acts involving uncertainty and acts that do not. These two properties hold if and only if the two sets of priors ($C^*$ and $C^\wedge$) are identical and the two utility indexes ($u^*$ and $u^\wedge$) are equivalent. Taken together, the axioms imply the existence of a set of priors and a utility index that represent both $\succsim^*$ and $\succsim^\wedge$ simultaneously: the former via the unanimity rule and the latter via the maxmin rule (Theorem 3). Finally, we observe that one of the connecting properties (Caution) guarantees uncertainty aversion. This suggests that the maxmin representation can follow even if the assumptions on $\succsim^\wedge$ are only completeness, transitivity, and continuity (Theorem 4).

2. MODEL AND RESULTS

2.1. Preliminaries

We use a version of the Anscombe and Aumann (1963) model as restated by Fishburn (1970). Let $X$ be a set of outcomes. A von Neumann–Morgenstern lottery is a finite support probability distribution over $X$. The set of lotteries, $L$, is endowed with a mixing operation: for every $P, Q \in L$ and every $\alpha \in [0,1]$, $\alpha P + (1-\alpha)Q \in L$ is defined pointwise (over $X$). The set of states of the world is $S$, endowed with an algebra $\Sigma$ of events. The set $\Delta(\Sigma)$ of (finitely additive) probabilities on $\Sigma$ is endowed with the eventwise convergence topology. The set of (simple) acts $F$ consists of all simple measurable functions $f : S \to L$. It is endowed with a mixture operation as well, performed pointwise (over $S$).

The DM is characterized by two binary relations $\succsim^*$ and $\succsim^\wedge$ on $F$, denoting objective and subjective rational preferences, respectively. The derived relations $\succ^*$, $\sim^*$, $\succ^\wedge$, and $\sim^\wedge$ are defined as usual. We extend $\succsim^*$ and $\succsim^\wedge$ to $L$ by identifying lotteries with constant acts. The set of all constant acts is denoted by $F_c$.3 For a function $u : X \to \mathbb{R}$ and a lottery $P \in L$, we write $E_P u = \sum_{x \in X} P(x)u(x)$.

2.2. Axioms

We begin with the basic conditions that will be imposed on both relations $\succsim^*$ and $\succsim^\wedge$.
3 We sometimes abuse notation by writing $P \in L$ for the corresponding constant act in $F_c$.
BASIC CONDITIONS:

PREORDER: $\succsim$ is reflexive and transitive.

MONOTONICITY: For every $f, g \in F$, $f(s) \succsim g(s)$ for all $s \in S$ implies $f \succsim g$.

ARCHIMEDEAN CONTINUITY: For all $f, g, h \in F$, the sets $\{\lambda \in [0,1] : \lambda f + (1-\lambda)g \succsim h\}$ and $\{\lambda \in [0,1] : h \succsim \lambda f + (1-\lambda)g\}$ are closed in $[0,1]$.

NONTRIVIALITY: There exist $f, g \in F$ such that $f \succ g$.

We interpret transitivity and monotonicity as axioms of rationality. Transitivity was discussed in the Introduction under the two interpretations. We here assume that monotonicity is also satisfied by the two relations: if act $f$ pointwise dominates act $g$, it should be easy for the DM to argue that $f$ is at least as good a choice as $g$; also, she will be embarrassed not to exhibit such a preference in this case.

Reflexivity is a model-related assumption: it states that we prefer to represent the weak rather than the strict part of the preference relations involved. In the presence of incomplete relations, such an assumption is not innocuous because strict preferences are not the complement of weak ones. Still, this is a modeling choice that makes no claims about the DM's preferences or behavior. Similarly, Nontriviality is a modeling assumption that simply rules out the uninteresting case of an overall indifferent DM, who would feature a constant utility function and any beliefs whatsoever, without any uniqueness results. Finally, the continuity axiom has a familiar status: it can be viewed as a purely "technical" condition, having to do with the mathematical idealization we use, and it can also be viewed as an assumption whose content can be challenged by thought experiments.

Next we discuss axioms that are specific to objective or to subjective rationality.4

C-COMPLETENESS: For every $f, g \in F_c$, $f \succsim^* g$ or $g \succsim^* f$.

COMPLETENESS: For every $f, g \in F$, $f \succsim^\wedge g$ or $g \succsim^\wedge f$.

Observe that we require that objective rationality be complete when restricted to the subset of constant acts. C-Completeness verifies that the incompleteness of the objectively rational relation $\succsim^*$ is not due to any difficulties that the DM might have in determining her preferences under certainty.5

4 Since each of the following axioms will be assumed for one relation only, we state them directly in terms of this relation rather than in terms of an abstract relation $\succsim$ as above. In the sequel, we use phrases such as "C-Completeness" and "$\succsim^*$ satisfies C-Completeness" interchangeably.
INDEPENDENCE: For every $f, g, h \in F$ and every $\alpha \in (0,1)$, $f \succsim^* g$ if and only if $\alpha f + (1-\alpha)h \succsim^* \alpha g + (1-\alpha)h$.

C-INDEPENDENCE: For every $f, g \in F$, every $h \in F_c$, and every $\alpha \in (0,1)$, $f \succsim^\wedge g$ if and only if $\alpha f + (1-\alpha)h \succsim^\wedge \alpha g + (1-\alpha)h$.

UNCERTAINTY AVERSION: For every $f, g \in F$, if $f \sim^\wedge g$, then $(1/2)f + (1/2)g \succsim^\wedge g$.

The two versions of the Independence axiom were discussed in the Introduction. Uncertainty Aversion is implied by Independence, and there is therefore no need to explicitly require that it hold for objective rationality. We find it a plausible assumption for subjective rationality due to the intuition that "smoothing out" acts should be desirable. That is, we assume that a DM who finds two acts, $f$ and $g$, equally attractive will be embarrassed to state that their mixture is worse than both. Of course, this is but an assumption on what may or may not embarrass or convince the DM. For example, a DM who cannot reason in terms of the mixture operation may be subjectively rational while violating Uncertainty Aversion or C-Independence.

2.3. Representation of Objective and of Subjective Rationality

2.3.1. Unanimity Representation of Objective Rationality

The axioms we imposed on $\succsim^*$ deliver a unanimity representation. Our first result is conceptually similar to Bewley (2002). However, it represents a weak (rather than a strict) preference and it applies to an infinite state space; it is proved in Appendix B.

THEOREM 1: The following statements are equivalent:
(i) $\succsim^*$ satisfies the Basic Conditions, C-Completeness, and Independence.
(ii) There exist a nonempty closed and convex set $C^*$ of probabilities on $\Sigma$ and a nonconstant function $u^* : X \to \mathbb{R}$ such that, for every $f, g \in F$,

(1)  $f \succsim^* g$ iff $\displaystyle\int_S E_{f(s)} u^* \, dp(s) \geq \int_S E_{g(s)} u^* \, dp(s)$ for all $p \in C^*$.

Moreover, in this case, $C^*$ is unique and $u^*$ is unique up to positive affine transformations.

5 Indeed, incompleteness of tastes will also result in incomplete preferences. See Aumann (1962), Kannai (1963), Richter (1966), Peleg (1970), and, more recently, Ok (2002), Dubra, Maccheroni, and Ok (2004), and Mandler (2005). Nau (2006) and Ok, Ortoleva, and Riella (2008) suggested models with incompleteness of both tastes and beliefs.
2.3.2. Maxmin Representation of Subjective Rationality

The axioms we imposed on $\succsim^\wedge$ deliver a maxmin rule.

THEOREM 2—Gilboa and Schmeidler (1989, Theorem 1): The following statements are equivalent:
(i) $\succsim^\wedge$ satisfies the Basic Conditions, Completeness, C-Independence, and Uncertainty Aversion.
(ii) There exist a nonempty closed and convex set $C$ of probabilities on $\Sigma$ and a nonconstant function $u : X \to \mathbb{R}$ such that, for every $f, g \in F$,

(2)  $f \succsim^\wedge g$ iff $\displaystyle\min_{p \in C} \int_S E_{f(s)} u \, dp(s) \geq \min_{p \in C} \int_S E_{g(s)} u \, dp(s)$.

Moreover, in this case, $C$ is unique and $u$ is unique up to positive affine transformations.
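To see how the two representations part ways, consider the introductory two-state example with $C = C^*$ equal to the full simplex of priors $(p, 1-p)$ and utilities read directly from the payoffs (our illustration):

```latex
% Unanimity vs. maxmin on f = (1,0), g = (0,1), with C = C* = all priors.
% Unanimity (Bewley): f ≿* g would require p >= 1 - p for ALL p in [0,1],
% which fails; g ≿* f fails symmetrically, so ≿* does not rank f and g.
% Maxmin (Gilboa-Schmeidler): f and g are ranked, indifferent to each other,
% and strictly below the constant act yielding 1/2:
\[
\min_{p\in[0,1]} p \;=\; \min_{p\in[0,1]} (1-p) \;=\; 0 \;<\; \tfrac{1}{2}.
\]
```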
2.4. Relating Objective and Subjective Rationality

We now discuss the relationship between the two orders.

CONSISTENCY: $f \succsim^* g$ implies $f \succsim^\wedge g$.

Intuitively, we argued that it is subjectively rational to prefer $f$ to $g$ if the DM cannot be convinced that she is wrong in exhibiting such a preference. One way in which the DM can be proven wrong is by pointing out that there are compelling, objective reasons to exhibit the opposite preference. A similar condition appears in Nehring (2001, 2009), titled Compatibility.6 If $(\succsim^*, \succsim^\wedge)$ are represented as in Theorems 1 and 2, it is straightforward and essentially known that Consistency is equivalent to $u = u^*$ and $C \subseteq C^*$.7

CAUTION: For $g \in F$ and $f \in F_c$, $g \not\succsim^* f$ implies $f \succsim^\wedge g$.

This property implies that the DM in question is rather averse to ambiguity. Comparing a potentially uncertain act $g$ and a constant (risky) act $f$, the DM first checks whether there are compelling reasons to prefer $g$ to $f$. If there are, namely, $g \succsim^* f$, the property has no bite (and $g \succsim^\wedge f$ would follow from Consistency). If, however, no such reasons can be found, the DM would opt for the risky act over the uncertain one.

Observe that Consistency appears to be essential to our interpretation: if the DM can be convinced of a given claim (i.e., that $f$ is at least as good as $g$), it should better be the case that she cannot be convinced that it is wrong to accept this claim.

6 In Nehring's work, compatibility is supposed to hold between a preference relation over acts, assumed complete, and a likelihood relation over events, allowed to be incomplete.
7 See Nehring (2001, 2009) and Ghirardato, Maccheroni, and Marinacci (2004).
By contrast, Caution is a much more demanding assumption. If we impose it as a condition on subjective rationality, it suggests that, whenever $f$ is a constant act ($f \in F_c$), the DM will be embarrassed to state a strict preference $g \succ^\wedge f$ unless there is an "objective proof" that the weak preference $g \succsim f$ should hold. We discuss relaxations of this assumption below.

THEOREM 3: The following statements are equivalent:
(i) $\succsim^*$ satisfies the Basic Conditions, C-Completeness, and Independence; $\succsim^\wedge$ satisfies the Basic Conditions, Completeness, and C-Independence; and jointly $(\succsim^*, \succsim^\wedge)$ satisfy Consistency and Caution.
(ii) There exist a nonempty closed and convex set $C$ of probabilities on $\Sigma$ and a nonconstant function $u : X \to \mathbb{R}$ such that, for every $f, g \in F$,

(3)  $f \succsim^* g$ iff $\displaystyle\int_S E_{f(s)} u \, dp(s) \geq \int_S E_{g(s)} u \, dp(s)$ for all $p \in C$,

and

(4)  $f \succsim^\wedge g$ iff $\displaystyle\min_{p \in C} \int_S E_{f(s)} u \, dp(s) \geq \min_{p \in C} \int_S E_{g(s)} u \, dp(s)$.
Moreover, in this case, C is unique and u is unique up to positive affine transformations. Notice that we do not need to assume that ∧ satisfies Uncertainty Aversion. In fact, its connection with ∗ through Caution already guarantees that ∧ satisfies this property. Theorem 3 can be also viewed as providing a novel foundation for the maxmin representation (2), based on the interplay of the two preferences ∗ and ∧ : the maxmin rule can be interpreted as a completion of the unanimity rule. If we take this approach, the following slightly stronger version of Caution allows us to further reduce the assumptions imposed on ∧ . DEFAULT TO CERTAINTY: For g ∈ F and f ∈ Fc , g ∗ f implies f ∧ g. This condition strengthens Caution by adding a “tie-breaking” rule that favors certainty. With this condition we can state another theorem. THEOREM 4: Statements (i) and (ii) of Theorem 3 are equivalent to the following one: (iii) ∗ satisfies the Basic Conditions, C-Completeness, and Independence; ∧ satisfies Preorder, Archimedean Continuity, and Completeness; and jointly (∗ ∧ ) satisfy Consistency and Default to Certainty.
OBJECTIVE AND SUBJECTIVE RATIONALITY
763
This result can be interpreted as follows. Suppose that a DM starts with an incomplete preference relation, ∗ , satisfying the Basic Conditions, CCompleteness, and Independence, which thus admits a unanimity representation by a set of probabilities and a utility index. Suppose further that the DM needs to make decisions and that ∧ is her completion of ∗ , which also satisfies Preorder and Archimedean Continuity. Default to Certainty then characterizes a behavior (modeled by ∧ ) that conforms to the maxmin model. Thus, Theorem 4 provides another possible account by which maxmin behavior might emerge from incomplete preferences. 3. DISCUSSION 3.1. Extremity of the Maxmin Rule The extreme nature of Caution is reflected in the extremity of the maxmin rule, when the set of probabilities C is interpreted as representing “hard evidence.”8 Indeed, it has often been argued that evaluating an act f by its worstcase expected utility is unreasonable. One may consider alternatives to Caution. Simply dropping the property allows a representation of ∗ by one set of probabilities, C ∗ , as in (3), and a representation of ∧ by another set of probabilities, C, as in (4), where C ⊆ C ∗ . If ∧ satisfies Independence, C will reduce to a singleton chosen out of C ∗ . Should one find this choice too arbitrary, one may choose a nonsingleton proper subset of C ∗ as suggested, for example, by Gajdos, Hayashi, Tallon, and Vergnaud (2008). Another completion of ∗ to a maxmin relation ∧ with a proper nonsingleton subset C ⊆ C ∗ is suggested by Kopylov (2009), building on the present paper and the intuition of preference deferral suggested by Danan and Ziegelmeyer (2006). Finally, we mention that the set C ∗ used for the unanimity rule may also be a subset of the set of probabilities defined by hard evidence. In fact, what counts as hard evidence is ultimately a subjective matter, as in the choice of significance level in hypotheses testing. 3.2. Related Literature Ghirardato, Maccheroni, and Marinacci (GMM; 2004) modeled a preference relation ∧ which may exhibit nonneutrality to ambiguity, and they derived from it a relation that captures “unambiguous preferences.” This relation, which they also denote by ∗ , is incomplete whenever ∧ fails to satisfy the independence axiom. Moreover, when ∧ is a maxmin expected utility relation, ∗ turns out to be precisely the unanimity relation with respect to the same set of priors. 8 However, the set C in Gilboa and Schmeidler (1989) is derived from preferences and need not coincide with a set of probabilities that are externally given to the DM. The set C is defined in behavioral terms, as a representation of a binary relation ∧ , and it need not coincide with any cognitive notion of a set of probabilities.
764
GILBOA, MACCHERONI, MARINACCI, AND SCHMEIDLER
The present paper is close to GMM in terms of the mathematical structure, and we have indeed partly relied on its derivation of the unanimity rule (as opposed to the earlier work by Bewley (2002)). However, the emphasis is different. In our case, both ∧ and ∗ are assumed as primitive relations, and the focus is on the relationships between them, as a step in the direction of modeling the reasoning process behind the completion of ∗ to a subjectively rational but complete order ∧ . If, for instance, one were to replace Caution by the requirement that ∧ satisfies independence, the derived relation ∗ in GMM would equal ∧ . By contrast, our model would still distinguish between subjective and objective rationality, and may be used to discuss the process by which a particular prior (corresponding to ∧ ) is selected out of the set of possible priors (corresponding to ∗ ). Nehring (2001, 2009) also discussed the tension between the inability to have complete preferences that are rationally derived and the need to make decisions. His model also deals with a pair of relations and the connection between them. In particular, he suggests that “contexts” can be used to choose a way to complete a relation, and has an axiom similar to our Consistency. Formally, our unanimity representation result for ∗ , though independent, is similar to Girotto and Holzer (2005): the setup is slightly different and our proof is simpler. There are other models that assume more than a single relation as a description of a preference formation process. Rubinstein (1988) discussed the generation of preferences over lotteries based on similarity relations. Mandler (2005) distinguished between “psychological preferences,” which may be incomplete, and “revealed preferences,” which are complete but may be intransitive. Danan (2008) also dealt with two relations, cognitive and behavioral. APPENDIX A: PROOFS AND RELATED MATERIAL B0 (Σ) is the vector space generated by the indicator functions of the elements of Σ, endowed with the supnorm. We denote by ba(Σ) the set of all bounded, finitely additive set functions on Σ, and by Δ(Σ) the set of all probabilities on Σ. As well known, ba(Σ), endowed with the total variation norm, is isometrically isomorphic to the norm dual of B0 (Σ). In this case the weak* topology, w∗ , of ba(Σ) coincides with the eventwise convergence topology. Given a nonsingleton interval K in the real line (whose interior is denoted K ◦ ), B0 (Σ K) is the set of the functions in B0 (Σ) taking values in K. Clearly, B0 (Σ) = B0 (Σ R). PROOF OF THEOREM 3: Assume that (∗ ∧ ) satisfy (i). Let u∗ and C ∗ represent ∗ as in Theorem 1. In F , set f g if and only if λf + (1 − λ)h ∧ λg + (1 − λ)h for all λ ∈ [0 1] and h ∈ F . Lemma 1 and Propositions 5 and 7 of GMM guarantee that there exist a nonempty closed and convex set C of
OBJECTIVE AND SUBJECTIVE RATIONALITY
765
probabilities on Σ, a nonconstant function u : X → R, and a monotonic and constant linear functional I : B0 (Σ) → R such that, for every f g ∈ F , (5)
f ∧ g
iff
(6)
f g
iff
S
(7)
S
Ef (s) u dp(s) ≤ I(Ef u)
min p∈C
I(Ef u) ≥ I(Eg u) Ef (s) u dp(s) ≥ Eg(s) u dp(s) ∀p ∈ C
S
Moreover, equality holds in (7) for all f ∈ F if (and only if) ∧ satisfies Uncertainty Aversion. On constant acts, by Consistency, ∗ (represented by E· u∗ ) is a nontrivial subrelation of ∧ (represented by E· u). Corollary B.3 of GMM allows us to assume u∗ = u. Proposition 4 of GMM implies that is the maximal subrelation of ∧ satisfying Independence. Consistency then implies ∗ ⊆ and Proposition A.1 of GMM delivers C ⊆ C ∗ . To show the converse inclusion, we use Caution. If there is g ∈ F such ∗ E u dp(s), then there is Q ∈ L such that I(Eg u) > that I(Eg u) > min p∈C S g(s) ∧ EQ u > minp∈C ∗ S Eg(s) u dp(s). That is, g ∗ Q and g Q, which violates ∗ Caution. Thus, by (7) and C ⊆ C , minp∈C S Ef (s) u dp(s) ≤ I(Ef u) ≤ minp∈C ∗ S Ef (s) u dp(s) ≤ minp∈C S Ef (s) u dp(s) for all f ∈ F . Proposition A.1 of GMM delivers C ∗ ⊆ C.9 The rest is trivial. Q.E.D. PROOF OF THEOREM 4: Assume that (∗ ∧ ) satisfy (iii). Let C and u represent ∗ as in Theorem 1. Let P Q ∈ L. By Consistency, P ∗ Q implies P ∧ Q. By Default to Certainty, P ∗ Q implies P ∧ Q. Therefore, ∧ and ∗ coincide on L. Hence, Monotonicity of ∗ and Consistency imply that ∧ also satisfies Monotonicity. Since ∧ satisfies Monotonicity and Archimedean Continuity, for each f there exists Rf ∈ L such that Rf ∼∧ f . ∧ For every f , f ∗ Rf would imply, by Default to Certainty, Rf f . ∗ Hence, f Rf . Therefore, ERf u ≤ S Ef (s) u dp(s) for all p ∈ C and ERf u ≤ minp∈C S Ef (s) u dp(s). Moreover, if ERf u < minp∈C S Ef (s) u dp(s), take P ∈ L such that P ∧ f (s) for all s ∈ S. Then there is γ ∈ (0 1] such that ERf u < EγP+(1−γ)Rf u = minp∈C S Ef (s) u dp(s). Thus, f ∗ γP + (1 − γ)Rf ∧ Rf and, by Consistency, f ∧ Rf , which is absurd. In conclusion, ERf u = minp∈C S Ef (s) u dp(s) for all f ∈ F and all Rf ∈ L such that Rf ∼∧ f . Finally, f ∧ g ⇐⇒ Rf ∧ Rg ⇐⇒ ERf u ≥ ERg u ⇐⇒ Q.E.D. minp∈C S Ef (s) u dp(s) ≥ minp∈C S Eg(s) u dp(s). The rest is trivial. 9
In particular, this implies that coincides with ∗ .
766
GILBOA, MACCHERONI, MARINACCI, AND SCHMEIDLER
APPENDIX B: SUPPLEMENTARY MATERIAL We recall that a binary relation on B0 (Σ K) is: • a preorder if it is reflexive and transitive, • continuous if ϕn ψn for all n ∈ N, ϕn → ϕ and ψn → ψ imply ϕ ψ • Archimedean if the sets {λ ∈ [0 1] : λϕ + (1 − λ)ψ η} and {λ ∈ [0 1] : η λϕ + (1 − λ)ψ} are closed in [0 1] for all ϕ ψ η ∈ B0 (Σ K), • affine if for all ϕ ψ η ∈ B0 (Σ K) and α ∈ (0 1), ϕ ψ if and only if αϕ + (1 − α)η αψ + (1 − α)η, • monotonic if ϕ ≥ ψ implies ϕ ψ, • nontrivial if there exists ϕ ψ ∈ B0 (Σ K) such that ϕ ψ but not ψ ϕ. PROPOSITION 1—GMM, Proposition A.1: For j = 1 2, let Cj be nonempty subsets of Δ(Σ) and let j be the relations defined on B0 (Σ K) by ϕ dp ≥ ψ dp ∀p ∈ Cj ϕ j ψ ⇐⇒ S
Then
S
ϕ j ψ
⇐⇒
∗
ϕ dp ≥ S
ψ dp ∀p ∈ cow (Cj ) S
and the following statements are equivalent: (i) ϕ 1 ψ ⇒ ϕ 2 ψ for all ϕ and ψ in B0 (Σ K). ∗ ∗ (ii) cow (C2 ) ⊆ cow (C1 ). (iii) infp∈C2 S ϕ dp ≥ infp∈C1 S ϕ dp for all ϕ ∈ B0 (Σ K). PROPOSITION 2 —GMM, Proposition A.2: is a nontrivial, continuous, affine, and monotonic preorder on B0 (Σ K) if and only if there exists a nonempty subset C of Δ(Σ) such that ϕ ψ ⇐⇒ (8) ϕ dp ≥ ψ dp ∀p ∈ C S
S
w∗
Moreover, co (C) is the unique weak* closed and convex subset of Δ(Σ) representing in the sense of Eq. (8). The complete proofs of the above propositions appear in Ghirardato, Maccheroni, and Marinacci (2002). To prove our results, we need some additional lemmas. LEMMA 1: Let be a preorder on B0 (Σ). Then is affine if and only if ϕ ψ implies γϕ + η γψ + η for all η ∈ B0 (Σ) and all γ ∈ R+ . The proof is a standard exercise.
OBJECTIVE AND SUBJECTIVE RATIONALITY
767
LEMMA 2: If is an affine preorder on B0 (Σ K), then there exists a unique affine preorder on B0 (Σ) that coincides with on B0 (Σ K). Moreover, if is monotonic (resp. Archimedean), then is monotonic (resp. Archimedean) too. PROOF: Suppose first 0 ∈ K ◦ . We begin with a claim: CLAIM: Given any ϕ ψ ∈ B0 (Σ K), the following facts are equivalent: (i) ϕ ψ (ii) There exists α > 0 such that αϕ αψ ∈ B0 (Σ K) and αϕ αψ. (iii) αϕ αψ for all α > 0 such that αϕ αψ ∈ B0 (Σ K). PROOF OF THE CLAIM: (i)⇒(ii) and (iii)⇒(i) are obvious. (ii)⇒(iii) follows from affinity. Q.E.D. If ϕ ψ ∈ B0 (Σ), set ϕ ψ ⇐⇒ αϕ αψ for some α > 0 such that αϕ αψ ∈ B0 (Σ K). By the Claim, is a well defined binary relation on B0 (Σ), which coincides with on B0 (Σ K). Moreover, ϕ ψ if and only if αϕ αψ for all α > 0 such that αϕ αψ ∈ B0 (Σ K). By standard arguments, is an affine preorder and it is monotonic if is monotonic. As to uniqueness, let be an affine preorder on B0 (Σ) that coincides with on B0 (Σ K). For all ϕ ψ ∈ B0 (Σ), take α > 0 such that αϕ αψ ∈ B0 (Σ K). Then the Claim (applied to ), the fact that coincides with on B0 (Σ K), and the definition of guarantee that ϕ ψ ⇐⇒ αϕ αψ ⇐⇒ αϕ αψ ⇐⇒ ϕ ψ, that is, coincides with on B0 (Σ). Suppose 0 ∈ / K ◦ . Given any k ∈ K ◦ , for ϕ ψ ∈ B0 (Σ K − k) set ϕ k ψ ⇐⇒ ϕ + k ψ + k. Then k is an affine preorder on B0 (Σ K − k) (monotonic if is monotonic). Since 0 ∈ (K − k)◦ , there is a unique affine preorder k on B0 (Σ) that coincides with k on B0 (Σ K − k) (monotonic if is monotonic). Such an extension coincides with on B0 (Σ K) and it is the unique affine preorder on B0 (Σ) with this property. Finally, if is Archimedean, denote by the unique affine preorder on B0 (Σ) which coincides with on B0 (Σ K). For all ϕ ψ η ∈ B0 (Σ) take α > 0 and β ∈ R such that αϕ + β αψ + β αη + β ∈ B0 (Σ K). Then, by Lemma 1, {λ ∈ [0 1] : λϕ + (1 − λ)ψ η} coincides with {λ ∈ [0 1] : λ(αϕ + β) + (1 − λ)(αψ + β) αη + β}, which is closed since is Archimedean. This argument and the one obtained by reversing all the relations show that is Archimedean. Q.E.D. LEMMA 3: An affine and monotonic preorder on B0 (Σ K) is continuous if and only if it is Archimedean. PROOF: Obviously, continuity implies the Archimedean property. Conversely, assume is Archimedean. Since is monotonic and Archimedean, then the affine preorder on B0 (Σ) that coincides with on B0 (Σ K) is monotonic and Archimedean too (Lemma 2).
768
GILBOA, MACCHERONI, MARINACCI, AND SCHMEIDLER
If ϕn 0 for all n ∈ N and ϕn → ϕ, let M = sups∈S ϕ(s), which is indeed a maximum. For all ε ∈ (0 1) there is n such that ϕn ≤ ϕ + ε ≤ ϕ + ε(M + 1 − ϕ). In fact, M ≥ ϕ implies M + 1 − ϕ ≥ 1. Therefore, for all ε ∈ (0 1) there is n ∈ N such that ε(M + 1) + (1 − ε)ϕ = ϕ + ε(M + 1 − ϕ) ≥ ϕn 0. Monotonicity of delivers that, for all ε ∈ (0 1), ε(M + 1) + (1 − ε)ϕ 0, but is Archimedean, hence the set of all ε with this property is closed and because it contains (0 1), it also contains 0; in particular, ϕ 0. Conclude that, if ϕn → ϕ, ψn → ψ, and ϕn ψn for all n ∈ N, then ϕn − ψn 0 for all n ∈ N and ϕn − ψn → ϕ − ψ; therefore ϕ − ψ 0, that is, ϕ ψ. Thus, is continuous, which immediately implies that is continuous too. Q.E.D. Now Lemma 3 and Proposition 2 deliver the following corollary. COROLLARY 1: is a nontrivial, Archimedean, affine, and monotonic preorder on B0 (Σ K) if and only if there exists a nonempty subset C of Δ(Σ) such that (9) ϕ dp ≥ ψ dp ∀p ∈ C ϕ ψ ⇐⇒ S
S
w∗
Moreover, co (C) is the unique weak* closed and convex subset of Δ(Σ) representing in the sense of Eq. (9). All the results we have proved in this appendix hold more generally if B0 (Σ) is replaced by any normed Riesz space with unit.10 PROOF OF THEOREM 1: Assume (i). By standard arguments, there exists a nonconstant u∗ such that U(·) ≡ E· u∗ represents ∗ on L. Also, B0 (Σ U(L)) = {U ◦ f : f ∈ F} and U ◦ f = U ◦ g if and only if f (s) ∼∗ g(s) for all s ∈ S, which by Monotonicity implies f ∼∗ g. We can therefore define ∗ as follows: for ϕ ψ ∈ B0 (Σ U(L)), ϕ ∗ ψ if f ∗ g for some f g ∈ F such that ϕ = U ◦ f and ψ = U ◦ g. By standard arguments, (i) implies that ∗ is a preorder that satisfies the conditions of Corollary 1. Hence, there exists a unique nonempty weak* closed ∗ ϕ ∗ ψ ⇐⇒ and convex subset C of Δ(Σ) ∗such that, for ϕ ψ ∈ B0 (Σ U(L)), ∗ ∗ ϕ dp ≥ Sψ dp for all p ∈ S C . Therefore, for f g ∈ ∗F , f g ⇐⇒ ∗U ◦ f U (U ◦ f ) dp ≥ S (U ◦ g) dp for all p ∈ C ⇐⇒ S Ef (s) u dp(s) ≥ S ◦ g ⇐⇒ ∗ E u dp(s) for all p ∈ C ∗ . The rest is trivial. Q.E.D. S g(s) Final Technical Remarks REMARK 1: There is a natural trade-off between Archimedean Continuity and Independence. Theorem 1 holds unchanged if we replace Archimedean 10
For definitions, see Chapter 8 of Aliprantis and Border (2006).
OBJECTIVE AND SUBJECTIVE RATIONALITY
769
Continuity with the following stronger condition and replace Independence with the following weaker condition. STRONG ARCHIMEDEAN CONTINUITY: For all e f g h ∈ F , the set {λ ∈ [0 1] : λf + (1 − λ)g ∗ λh + (1 − λ)e} is closed in [0 1]. WEAK INDEPENDENCE: For every f g h ∈ F , and every α ∈ (0 1), f ∗ g implies αf + (1 − α)h ∗ αg + (1 − α)h In fact, Strong Archimedean Continuity implies Archimedean Continuity, while Shapley and Baucells (1998, Lemma 1.2) showed that Preorder, Strong Archimedean Continuity, and Weak Independence imply Independence. Thus, representation (1) holds if Archimedean Continuity and Independence are replaced by Strong Archimedean Continuity and Weak Independence. Conversely, (1) implies Strong Archimedean Continuity and (Weak) Independence. REMARK 2: Lemma 3 and the implied Corollary 1 are the main technical novelties, with respect to the results of GMM, that we need for the proof of Theorem 1. Lacking the link between the algebraic Archimedean property and the topological continuity property established here, GMM had to resort to the topological continuity of the functional that represented the original preferences so as to obtain a unanimity representation of the derived unambiguous preferences (see the proof of their Proposition 5). As already observed, here ∗ is assumed as a primitive and their techniques cannot be directly replicated. REFERENCES ALIPRANTIS, C. D., AND K. C. BORDER (2006): Infinite Dimensional Analysis (Third Ed.). New York: Springer-Verlag. [768] ANSCOMBE, F. J., AND R. J. AUMANN (1963): “A Definition of Subjective Probability,” The Annals of Mathematics and Statistics, 34, 199–205. [758] AUMANN, R. J. (1962): “Utility Theory Without the Completeness Axiom,” Econometrica, 30, 445–462. [760] BEWLEY, T. (2002): “Knightian Decision Theory: Part I,” Decisions in Economics and Finance, 25, 79–110. [755,757,760,764] DANAN, E. (2008): “Revealed Preference and Indifferent Selection,” Mathematical Social Sciences, 55, 24–37. [764] DANAN, E., AND A. ZIEGELMEYER (2006): “Are Preferences Complete? An Experimental Measurement of Indecisiveness Under Risk,” Working Paper, Max Planck Institute of Economics. [763] DUBRA, J., F. MACCHERONI, AND E. A. OK (2004): “Expected Utility Theory Without the Completeness Axiom,” Journal of Economic Theory, 115, 118–133. [760] ELLSBERG, D. (1961): “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669. [755] FISHBURN, P. C. (1970): Utility Theory for Decision Making. New York: Wiley. [758] GAJDOS, T., T. HAYASHI, J.-M. TALLON, AND J.-C. VERGNAUD (2008): “Attitude Toward Imprecise Information,” Journal of Economic Theory, 140, 27–65. [763]
770
GILBOA, MACCHERONI, MARINACCI, AND SCHMEIDLER
GHIRARDATO, P., F. MACCHERONI, AND M. MARINACCI (2002): “Ambiguity From the Differential Viewpoint,” Social Science Working Paper 1130, Caltech. [766] (2004): “Differentiating Ambiguity and Ambiguity Attitude,” Journal of Economic Theory, 118, 133–173. [761,763] GILBOA, I., AND D. SCHMEIDLER (1989): “Maxmin Expected Utility With Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. [755,757,761,763] GIROTTO, B., AND S. HOLZER (2005): “Representation of Subjective Preferences Under Ambiguity,” Journal of Mathematical Psychology, 49, 372–382. [764] KANNAI, Y. (1963): “Existence of a Utility in Infinite Dimensional Partially Ordered Spaces,” Israel Journal of Mathematics, 1, 229–234. [760] KOPYLOV, I. (2009): “Choice Deferral and Ambiguity Aversion,” Theoretical Economics, 4, 199–225. [763] MANDLER, M. (2005): “Incomplete Preferences and Rational Intransitivity of Choice,” Games and Economic Behavior, 50, 255–277. [760,764] NAU, R. (2006): “The Shape of Incomplete Preferences,” The Annals of Statistics, 34, 2430–2448. [760] NEHRING, K. (2001): “Ambiguity in the Context of Probabilistic Beliefs,” Mimeo, UC Davis. [761,764] (2009): “Imprecise Probabilistic Beliefs as a Context for Decision-Making Under Ambiguity,” Journal of Economic Theory, 144, 1054–1091. [761,764] OK, E. A. (2002): “Utility Representation of an Incomplete Preference Relation,” Journal of Economic Theory, 104, 429–449. [760] OK, E. A., P. ORTOLEVA, AND G. RIELLA (2008): “Incomplete Preferences Under Uncertainty: Indecisiveness in Beliefs vs. Tastes,” Mimeo, NYU. [760] PELEG, B. (1970): “Utility Functions for Partially Ordered Topological Spaces,” Econometrica, 38, 93–96. [760] RICHTER, M. (1966): “Revealed Preference Theory,” Econometrica, 34, 625–645. [760] RUBINSTEIN, A. (1988): “Similarity and Decision-Making Under Risk,” Journal of Economic Theory, 46, 145–153. [764] SCHMEIDLER, D. (1986): “Integral Representation Without Additivity,” Proceedings of the American Mathematical Society, 97, 255–261. [755] (1989): “Subjective Probability and Expected Utility Without Additivity,” Econometrica, 57, 571–587. [755] SHAPLEY, L. S., AND M. BAUCELLS (1998): “A Theory of Multiperson Utility,” Working Paper 779, UCLA. [769] SIMON, H. A. (1947): Administrative Behavior (Second Ed.). New York: The Free Press. [756] (1957): “Comparison of Game Theory and Learning Theory,” in Models of Man. New York: Wiley, 274–279. [756]
HEC, Paris, France and Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel;
[email protected], Department of Decision Sciences, Dondena, and IGIER, Università Bocconi, 20136 Milano, Italy;
[email protected], Department of Decision Sciences, Dondena, and IGIER, Università Bocconi, 20136 Milano, Italy;
[email protected], and Department of Economics, The Ohio State University, Columbus, OH 43210, U.S.A. and School of Mathematical Sciences, Tel Aviv University, Tel Aviv 69978, Israel;
[email protected]. Manuscript received November, 2008; final revision received September, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 771–789
THE DYNAMIC PIVOT MECHANISM BY DIRK BERGEMANN AND JUUSO VÄLIMÄKI1 We consider truthful implementation of the socially efficient allocation in an independent private-value environment in which agents receive private information over time. We propose a suitable generalization of the pivot mechanism, based on the marginal contribution of each agent. In the dynamic pivot mechanism, the ex post incentive and ex post participation constraints are satisfied for all agents after all histories. In an environment with diverse preferences it is the unique mechanism satisfying ex post incentive, ex post participation, and efficient exit conditions. We develop the dynamic pivot mechanism in detail for a repeated auction of a single object in which each bidder learns over time her true valuation of the object. The dynamic pivot mechanism here is equivalent to a modified second price auction. KEYWORDS: Pivot mechanism, dynamic mechanism design, ex post equilibrium, marginal contribution, multiarmed bandit, Bayesian learning.
1. INTRODUCTION IN THIS PAPER, we generalize the idea of the pivot mechanism (due to Green and Laffont (1977)) to dynamic environments with private information. We design an intertemporal sequence of transfer payments which allows each agent to receive her flow marginal contribution in every period. In other words, after each history, the expected transfer that each agent must pay coincides with the dynamic externality cost that she imposes on the other agents. In consequence, each agent is willing to truthfully report her information in every period. We consider a general intertemporal model in discrete time and with a common discount factor. The private information of each agent in each period is her perception of her future payoff path conditional on the realized signals and allocations. We assume throughout that the information is statistically independent across agents. At the reporting stage of the direct mechanism, each agent reports her information. The planner then calculates the efficient allocation given the reported information. The planner also calculates for each agent i the optimal allocation when agent i is excluded from the mechanism. The total expected discounted payment of each agent is set equal to the externality cost imposed on the other agents in the model. In this manner, each agent receives as her payment her marginal contribution to the social welfare in every conceivable continuation game. 1 We thank the editor and four anonymous referees for many helpful comments. The current paper is a major revision and supersedes “Dynamic Vickrey–Clarke–Groves Mechanisms” (2007). We are grateful to Larry Ausubel, Jerry Green, Paul Healy, John Ledyard, Benny Moldovanu, Michael Ostrovsky, David Parkes, Alessandro Pavan, Ilya Segal, Xianwen Shi, and Tomasz Strzalecki for many informative conversations. The authors acknowledge financial support through National Science Foundation Grants CNS 0428422 and SES 0518929, and the Yrjö Jahnsson’s Foundation, respectively.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7260
772
D. BERGEMANN AND J. VÄLIMÄKI
With transferable utilities, the social objective is simply to maximize the expected discounted sum of the individual utilities. Since this is essentially a dynamic programming problem, the solution is by construction time-consistent. In consequence, the dynamic pivot mechanism is time-consistent and the social choice function can be implemented by a sequential mechanism without any ex ante commitment by the designer (apart from the commitment to the transfers promised for the current period). In contrast, in revenue-maximizing problems, it is well known that the optimal solution relies critically on the ability of the principal to commit to a contract, see Baron and Besanko (1984). Interestingly, Battaglini (2005) showed that in dynamic revenue-maximizing problems with stochastic types, the commitment problems are less severe than with constant types. The dynamic pivot mechanism yields a positive monetary surplus for the planner in each period and, therefore, the planner does not need outside resources to achieve the efficient allocation. Finally, the dynamic pivot mechanism induces all agents to participate in the mechanism after all histories. In the intertemporal environment there are many transfer schemes that support the same incentives as the pivot mechanism. In particular, the monetary transfers necessary to induce the efficient action in period t may become due at some later period s provided that the net present value of the transfers remains constant. We say that a mechanism supports efficient exit if an agent who ceases to affect current and future allocations also ceases to pay and receive transfers. This condition is similar to the requirement often made in the scheduling literature that the mechanism be an online mechanism (see Lavi and Nisan (2000)). We establish that in an environment with diverse preferences, the dynamic pivot mechanism is the only efficient mechanism that satisfies ex post incentive compatibility, ex post participation, and efficient exit conditions. The basic idea of the dynamic pivot mechanism is first explored in the context of a scheduling problem where a set of privately informed bidders compete for the services of a central facility over time. This class of problems is perhaps the most natural dynamic allocation analogue to the static single-unit auction. The scheduling problem is kept deliberately simple and all the relevant private information arrives in the initial period. Subsequently, we use the dynamic pivot mechanism to derive the dynamic auction format for a model where bidders learn their valuations for a single object over time. In contrast to the scheduling problem where a static mechanism could still have implemented the efficient solution, a static mechanism now necessarily fails to support the efficient outcome as more information arrives over time. In turn, this requires a more complete understanding of the intertemporal trade-offs in the allocation process. By computing the dynamic marginal contributions, we can derive explicit and informative expressions for the intertemporal transfer prices. In recent years, a number of papers have been written with the aim to explore various issues arising in dynamic allocation problems. Among the contributions
THE DYNAMIC PIVOT MECHANISM
773
which focus on socially efficient allocation, Cavallo, Parkes, and Singh (2006) proposed a Markovian environment for general allocation problems and analyzed two different classes of sequential incentives schemes: (i) Groves-like payments and (ii) pivot-like payments. They established that Groves-like payments, which award every agent positive monetary transfers equal to the sum of the valuation of all other agents, guarantee interim incentive compatibility and ex post participation constraints after all histories. In contrast, pivot-like payments guarantee interim incentive compatibility and ex ante participation constraints. Athey and Segal (2007) considered a more general dynamic model in which the current payoffs are allowed to depend on the entire past history including past signals and past actions. In addition, they also allowed for hidden action as well as hidden information. The main focus of their analysis is on incentive compatible mechanisms that are budget balanced in every period of the game. Their mechanism, called balanced team mechanism, transfers the insight from the Arrow (1979) and D’Aspremont and Gerard-Varet (1979) mechanisms into a dynamic environment. In addition, Athey and Segal (2007) presented conditions in terms of ergodic distributions over types and patients agents such that insights from repeated games can be employed to guarantee interim participation constraints. In contrast, we emphasize voluntary participation without any assumptions about the discount factor or the ergodicity of the type distributions. We also define an efficient exit condition which allows us to single out the dynamic pivot mechanism in the class of efficient mechanisms. The focus of the current paper is on the socially efficient allocation, but a number of recent papers have analyzed the design of dynamic revenuemaximizing mechanisms, beginning with the seminal contributions by Baron and Besanko (1984) and Courty and Li (2000), who considered optimal intertemporal pricing policies with private information in a setting with two periods. Battaglini (2005) considered the revenue-maximizing long-term contract of a monopolist in a model with an infinite time horizon when the valuation of the buyer changes in a Markovian fashion over time. In particular, Battaglini (2005) showed that the optimal continuation contracts for a current high type are efficient, as his payoff is determined by the allocations for the current low type (by incentive compatibility). The net payoffs of the types then have a property related to the marginal contribution here. But as Battaglini (2005) considered revenue-maximizing contracts, the lowest type served receives zero utility, and hence the notion of marginal contribution refers only to the additional utility generated by higher types, holding the allocation constant, rather than the entire incremental social value. Most recently, Pavan, Segal, and Toikka (2008) developed a general allocation model and derived the optimal dynamic revenue-maximizing mechanism. A common thread in these papers is a suitable generalization of the notion of virtual utility to dynamic environments.
774
D. BERGEMANN AND J. VÄLIMÄKI
2. MODEL Uncertainty We consider an environment with private and independent values in a discrete-time, infinite-horizon model. The flow utility of agent i ∈ {1 2 I} in period t ∈ N is determined by the current allocation at ∈ A, the current monetary transfer pit ∈ R, and a state variable θit ∈ Θi . The von Neumann– Morgenstern utility function ui of agent i is quasilinear in the monetary transfer: ui (at pit θit ) vi (at θit ) − pit The current allocation at ∈ A is an element of a finite set A of possible allocations. The state of the world θit for agent i is a general Markov process on the state space Θi . The aggregate state is given by the vector θt = (θ1t θIt ) I with Θ = i=1 Θi . There is a common prior Fi (θi0 ) regarding the initial type θi0 of each agent i. The current state θit and the current action at define a probability distribution for next period state variables θit+1 on Θi . We assume that this distribution can be represented by a stochastic kernel Fi (θit+1 ; θit at ). The utility functions ui (·) and the probability transition functions Fi (·; at θit ) are common knowledge at t = 0. The common prior Fi (θi0 ) and the stochastic kernels Fi (θit+1 ; θit at ) are assumed to be independent across agents. At the beginning of each period t, each agent i observes θit privately. At the end of each period, an action at ∈ A is chosen and payoffs for period t are realized. The asymmetric information is therefore generated by the private observation of θit in each period t. We observe that by the independence of the priors and the stochastic kernels across i, the information of agent i, θit+1 , does not depend on θjt for j = i. The expected absolute value of the flow payoff is assumed to be bounded by some K < ∞ for every i a θ and allocation plan a : Θ → A: vi (a (θ ) θ ) dF(θ ; a θ) < K i
×
The nature of the state space Θ depends on the application at hand. At this point, we stress that the formulation accommodates the possibility of random arrival or departure of the agents. The arrival or departure of agent i can be represented by an inactive state θi where vi (at θi ) = 0 for all at ∈ A and a random time τ at which agent i privately observes her transition in or out of the inactive state. Social Efficiency All agents discount the future with a common discount factor δ 0 < δ < 1. The socially efficient policy is obtained by maximizing the expected discounted
THE DYNAMIC PIVOT MECHANISM
775
sum of valuations. Given the Markovian structure, the socially optimal program starting in period t at state θt can be written as ∞ I s−t E δ vi (as θis ) W (θt ) max ∞ {as }s=t
s=t
i=1
For notational ease, we omit the conditioning state in the expectation operator, when the conditioning event is obvious, as in the above, where E[·] = Eθt [·]. Alternatively, we can represent the social program in its recursive form: I vi (at θit ) + δEW (θt+1 ) W (θt ) = max E at
i=1
The socially efficient policy is denoted by a∗ = {a∗t }∞ t=0 . The social externality cost of agent i is determined by the social value in the absence of agent i: ∞ s−t W−i (θt ) max E δ vj (as θjs ) ∞ {as }s=t
s=t
j=i
The efficient policy when agent i is excluded is denoted by a∗−i = {a∗−it }∞ t=0 . The marginal contribution Mi (θt ) of agent i at signal θt is defined by (1)
Mi (θt ) W (θt ) − W−i (θt )
The marginal contribution of agent i is the change in the social value due to the addition of agent i.2 Mechanism and Equilibrium We focus attention on direct mechanisms which truthfully implement the socially efficient policy a∗ . A dynamic direct mechanism asks every agent i to report her state θit in every period t. The report rit ∈ Θi may or may not be truthful. The public history in period t is a sequence of reports and allocations until period t − 1, or ht = (r0 a0 r1 a1 rt−1 at−1 ), where each rs = (r1s rIs ) is a report profile of the I agents. The set of possible public histories in period t is denoted by Ht . The sequence of reports by the agents is part of the public history and we assume that the past reports of each agent are observable to all the agents. The private history of agent i in period t consists of the public history and the sequence of private observations until period t, or hit = (θi0 r0 a0 θi1 r1 a1 θit−1 rt−1 at−1 θit ) The set of possible private histories in period t is denoted by Hit . An (efficient) dynamic direct 2 In symmetric information environments, we used the notion of marginal contribution to construct efficient equilibria in dynamic first price auctions; see Bergemann and Välimäki (2003, 2006).
776
D. BERGEMANN AND J. VÄLIMÄKI
mechanism is represented by a family of allocations and monetary transfers, ∗ I {a∗t pt }∞ t=0 : at : Θ → (A) and pt : Ht × Θ → R . With the focus on efficient ∗ mechanisms, the allocation at depends only on the current (reported) state rt ∈ Θ, while the transfer pt may depend on the entire public history. A (pure) reporting strategy for agent i in period t is a mapping from the private history into the state space: rit : Hit → Θi . For a given mechanism, the expected payoff of agent i from reporting strategy ri = {rit }∞ t=0 given the strategies r−i = {r−it }∞ t=0 is E
∞ δt vi (a∗ (rt ) θit ) − pi (ht rt ) t=0
Given the mechanism {a∗t pt }∞ t=0 and the reporting strategies r−i , the optimal strategy of bidder i can be stated recursively:
Vi (hit ) = max E vi (a∗t (rit r−it ) θit ) − pi (ht rit r−it ) + δVi (hit+1 ) rit ∈Θi
The value function Vi (hit ) represents the continuation value of agent i given the current private history hit . We say that a dynamic direct mechanism is interim incentive compatible if for every agent and every history, truthtelling is a best response given that all other agents report truthfully. We say that the dynamic direct mechanism is periodic ex post incentive compatible if truthtelling is a best response regardless of the history and the current state of the other agents. In the dynamic context, the notion of ex post incentive compatibility is qualified by periodic, as it is ex post with respect to all signals received in period t, but not ex post with respect to signals arriving after period t. The periodic qualification arises in the dynamic environment, as agent i may receive information at some later time s > t such that in retrospect she would wish to change the allocation choice in t and hence her report in t. Finally we define the periodic ex post participation constraints of each agent. After each history ht , each agent i may opt out (permanently) from the mechanism. The value of the outside option is denoted Oi (hit ) and it is defined by the payoffs that agent i receives if the planner pursues the efficient policy a∗−i for the remaining agents. The periodic participation constraint requires that each agent’s equilibrium payoff after each history weakly exceeds Oi (hit ). For the remainder of the text, we say that a mechanism is ex post incentive compatible and individually rational if it satisfies the periodic ex post incentive and participation constraints. 3. SCHEDULING: AN EXAMPLE We consider the problem of allocating time to use a central facility among competing agents. Each agent has a private valuation for the completion of a
THE DYNAMIC PIVOT MECHANISM
777
task which requires the use of the central facility. The facility has a capacity constraint and can only complete one task per period. The cost of delaying any task is given by the discount rate δ < 1. The agents are competing for the right to use the facility at the earliest available time. The objective of the social planner is to sequence the tasks over time so as to maximize the sum of the discounted utilities. In an early contribution, Dolan (1978) developed a static mechanism to implement a class of related scheduling problems with private information. An allocation policy in this setting is a sequence of choices at ∈ {0 1 I} where at denotes the bidder chosen in period t. We allow for at = 0 and hence the possibility that no bidder is selected in t. Each agent has only one task to complete and the value θi0 ∈ R+ of the task is constant over time and independent of the realization time (except for discounting). The transition function is then given by θit+1 =
0 if at = i θit if at = i
For this scheduling model, we find the marginal contribution of each agent and derive the associated dynamic pivot mechanism. We determine the marginal contribution of bidder i by comparing the value of the social program with and without i. With the constant valuations over time for all i, the optimal policy is given by assigning in every period the alternative j with the highest remaining valuation. To simplify notation, we define the positive valuation vi θi0 . We may assume without loss of generality (after relabelling) that the valuations vi are ordered with respect to the index i: v1 ≥ · · · ≥ vI ≥ 0. Due to the descending order of valuations, we identify each task i with the period i + 1 in which it is completed along the efficient path: (2)
W (θ0 ) =
I
δt−1 vt
t=1
Similarly, the efficient program in the absence of task i assigns the tasks in ascending order, but necessarily skips task i in the assignment process: (3)
W−i (θ0 ) =
i−1 t=1
δt−1 vt +
I−1
δt−1 vt+1
t=i
By comparing the social program with and without i, (2) and (3), respectively, we find that the assignments for agents j < i remain unchanged after i is removed, but that each agent j > i is allocated the slot one period earlier than
778
D. BERGEMANN AND J. VÄLIMÄKI
in the presence of i. The marginal contribution of i from the point of view of period 0 is Mi (θ0 ) = W (θ0 ) − W−i (θ0 ) =
I
δt−1 (vt − vt+1 )
t=i
The social externality cost of agent i is established in a straightforward manner. At time t = i − 1, agent i completes her task and realizes the value vi . The immediate opportunity cost is the next highest valuation vi+1 . But this overstates the externality, because in the presence of i, all less valuable tasks are realized one period later. The externality cost of agent i is hence equal to the next valuable task vi+1 minus the improvement in future allocations due to the delay of all tasks by one period: (4)
pi (θt ) = vi+1 −
I
δ (vt − vt+1 ) = (1 − δ) t−i
t=i+1
I
δt−i vt+1
t=i
Since we have by construction vt − vt+1 ≥ 0, the externality cost of agent i in the intertemporal framework is less than in the corresponding single allocation problem where it would be vi+1 . Consequently, the final expression states that the externality of agent i is the cost of delay imposed on the remaining and less valuable tasks.3 4. THE DYNAMIC PIVOT MECHANISM We now construct the dynamic pivot mechanism for the general model described in Section 2. The marginal contribution of agent i is her contribution to the social value. In the dynamic pivot mechanism, the marginal contribution will also be the information rent that agent i can secure for herself if the planner wishes to implement the socially efficient allocation. In a dynamic setting, if agent i can secure her marginal contribution in every continuation game of the mechanism, then she should be able to receive the flow marginal contribution mi (θt ) in every period. The flow marginal contribution accrues incrementally over time and is defined recursively: Mi (θt ) = mi (θt ) + δEMi (θt+1 ) 3
In the online Supplementary Material (Bergemann and Välimäki (2010)), we show that the socially efficient scheduling can be implemented through a bidding mechanism rather than the direct revelation mechanism used here. In a recent and related contribution, Said (2008) used the dynamic pivot mechanism and a payoff equivalence result to construct bidding strategies in a sequence of ascending auctions with entry and exit of the agents.
THE DYNAMIC PIVOT MECHANISM
779
The flow marginal contribution can be expressed directly in terms of the social value functions, using the definition of the marginal contribution given in (1) as (5)
mi (θt ) W (θt ) − W−i (θt ) − δE[W (θt+1 ) − W−i (θt+1 )]
The continuation payoffs of the social programs with and without i, respectively, may be governed by different transition probabilities, as the respective social decisions in period t, a∗t a∗ (θt ) and a∗−it a∗−i (θ−it ), may differ. The continuation value of the socially optimal program, conditional on current allocation at and state θt is W (θt+1 |at θt ) EF(θt+1 ;at θt ) W (θt+1 ) where the transition from state θt to state θt+1 is controlled by the allocation at . For notational ease, we omit the expectations operator E from the conditional expectation. We adopt the same notation for the marginal contributions Mi (·) and the individual value functions Vi (·). The flow marginal contribution mi (θt ) is expressed as mi (θt ) =
I
vj (a∗t θjt ) −
vj (a∗−it θjt )
j=i
j=1 ∗ t
+ δ[W−i (θt+1 |a θt ) − W−i (θt+1 |a∗−it θt )] A monetary transfer p∗i (θt ) such that the resulting flow net utility matches the flow marginal contribution leads agent i to internalize her social externalities: (6)
p∗i (θt ) vi (a∗t θit ) − mi (θt )
We refer to p∗i (θt ) as the transfer of the dynamic pivot mechanism. The transfer p∗i (θt ) depends only on the current report θt and not on the entire public history ht . We can express p∗i (θt ) in terms of the flow utilities and the social continuation values: p∗i (θt ) = (7) [vj (a∗−it θjt ) − vj (a∗t θjt )] j=i
+ δ[W−i (θt+1 |a∗−it θt ) − W−i (θt+1 |a∗t θt )] The transfer p∗i (θt ) for agent i depends on the report of agent i only through the determination of the social allocation which is a prominent feature of the static Vickrey–Clarke–Groves mechanisms. The monetary transfers p∗i (θt ) are always nonnegative, as the policy a∗−it is by definition an optimal policy to maximize the social value of all agents exclusive of i. It follows that in every period t,
780
D. BERGEMANN AND J. VÄLIMÄKI
the sum of the monetary transfers across all agents generates a weak budget surplus. THEOREM 1—Dynamic Pivot Mechanism: The dynamic pivot mechanism {a∗t p∗t }∞ t=0 is ex post incentive compatible and individually rational. PROOF: By the unimprovability principle, it suffices to prove that if agent i receives as her continuation value her marginal contribution, then truthtelling is incentive compatible for agent i in period t, or (8)
vi (a∗ (θt ) θit ) − p∗i (θt ) + δMi (θt+1 |a∗t θt ) ≥ vi (a∗ (rit θ−it ) θit ) − p∗i (rit θ−it ) + δMi (θt+1 |a∗ (rit θ−it ) θt )
for all rit ∈ Θi and all θ−it ∈ Θ−i , and we recall that we denote the socially efficient allocation at the true state profile θt by a∗t a∗ (θt ). By construction of p∗i in (7), the left-hand side of (8) represents the marginal contribution of agent i. We can express the marginal contributions Mi (·) in terms of the different social values to get (9)
W (θt ) − W−i (θt ) ≥ vi (a∗ (rit θ−it ) θit ) − p∗i (rit θ−it )
+ δ W (θt+1 |a∗ (rit θ−it ) θt ) − W−i (θt+1 |a∗ (rit θ−it ) θt )
We then insert the transfer price p∗i (rit θ−it ) (see (7)) into (9) to obtain W (θt ) − W−i (θt ) ≥ vi (a∗ (rit θ−it ) θit ) − +
vj (a∗−it θjt ) − δW−i (θt+1 |a∗−it θt )
j=i
vj (a∗ (rit θ−it ) θjt ) + δW (θt+1 |a∗ (rit θ−it ) θt )
j=i
But now we reconstitute the entire inequality in terms of the respective social values: W (θt ) − W−i (θt ) ≥
I
vj (a∗ (rit θ−it ) θjt )
j=1
+ δW (θt+1 |a∗ (rit θ−it ) θt ) − W−i (θt ) The above inequality holds for all rit by the social optimality of a∗ (θt ) in state θt . Q.E.D.
THE DYNAMIC PIVOT MECHANISM
781
The dynamic pivot mechanism specifies a unique monetary transfer after every history. It guarantees that the ex post incentive and ex post participation constraints are satisfied after every history. In the intertemporal environment, each agent evaluates the monetary transfers to be paid in terms of the expected discounted transfers, but is indifferent (up to discounting) over the incidence of the transfers over time. This temporal separation between allocative decisions and monetary decisions may be undesirable for many reasons. First, if the agents and the principal do not have the ability to commit to future transfer payments, then delays in payments become problematic. In consequence, an agent who is not pivotal should not receive or make a payment. Second, if it is costly (in a lexicographic sense) to maintain accounts of future monetary commitments, then the principal wants to close down (as early as possible) the accounts of those agents who are no longer pivotal.4 This motivates the following efficient exit condition. Let state θτi in period τi be such that the probability that agent i affects the efficient social decision a∗t in period t is equal to zero for all t ≥ τi , that is, Pr({θt |a∗t (θt ) = a∗−it (θt )}|θτi ) = 0. In this case, agent i is irrelevant for the mechanism in period τi , and we say that the mechanism satisfies the efficient exit condition if agents neither make nor receive transfers in periods where they are irrelevant for the mechanism. DEFINITION 1—Efficient Exit: A dynamic direct mechanism satisfies the efficient exit condition if for all i hτi θτi ,
pi hτi θτi = 0 We establish the uniqueness of the dynamic pivot mechanism in an environment with diverse preferences and the efficient exit condition. The assumption of diverse preferences allows for rich preferences over the current allocations and indifference over future allocations. ASSUMPTION 1—Diverse Preferences: (i) For all i, there exists θi ∈ Θi such that for all a, vi (a θi ) = 0 and Fi (θi ; a θi ) = 1 (ii) For all i a, and x ∈ R+ , there exists θiax ∈ Θi such that x if at = a, ax vi (at θi ) = 0 if at = a, and for all at Fi (θi ; at θiax ) = 1 4 We would like to thank an anonymous referee for the suggestion to consider the link between exit and uniqueness of the transfer rule.
782
D. BERGEMANN AND J. VÄLIMÄKI
The diverse preference assumption assigns to each agent i a state, θi , which is an absorbing state and in which i gets no payoff from any allocation. In addition, each agent i has a state in which i has a positive valuation x for a specific current allocation aand no value for other current or any future allocations. The diverse preferences condition is similar to the rich domain conditions introduced in Green and Laffont (1977) and Moulin (1986) to establish the uniqueness of the Groves and the pivot mechanism in a static environment. Relative to their conditions, we augment the diverse (flow) preferences with the certain transition into the absorbing state θi . With this transition we ensure that the diverse flow preferences continue to matter in the intertemporal environment. The assumption of diverse preference in conjunction with the efficient exit condition guarantees that in every dynamic direct mechanism there are some types, specifically types of the form θiax , that receive exactly the flow transfers they would have received in the dynamic pivot mechanism. LEMMA 1: If {a∗t pt }∞ t=0 is ex post incentive compatible and individually rational, and satisfies the efficient exit condition, then pi (ht θiax θ−it ) = p∗i (θiax θ−it )
for all i a x θ−it ht
PROOF: In the dynamic pivot mechanism, if the valuation x of type θiax for allocation a exceeds the social externality cost, that is, (10)
x ≥ W−i (θ−it ) −
vj (a θjt ) − δW−i (θ−it+1 |a θ−it )
j=i
then p∗i (θiax θ−it ) is equal to the above social externality cost; otherwise it is zero. We now argue by contradiction. By the ex post incentive compatibility constraints, all types θiax of agent i, where x satisfies the inequality (10), must pay the same transfer. To see this, suppose that for some x y ∈ R+ satisfying (10), ay ay we have pi (ht θiax θ−it ) < pi (ht θi θ−it ). Now type θi has a strict incenax tive to misreport rit = θi , a contradiction. We therefore denote the transfer for all x and θiax satisfying (10) by pi (ht a θ−it ), and denote the corresponding dynamic pivot transfer by p∗i (a θ−it ). Suppose next that pi (ht a θ−it ) > p∗i (a θ−it ). This implies that the ex post participation constraint for some x with pi (ht a θ−it ) > x > p∗i (a θ−it ) is violated, contradicting the hypothesis of the lemma. Suppose to the contrary that pi (ht a θ−it ) < p∗i (a θ−it ), and consider the incentive constraints of a type θiax with a valuation x such that (11)
pi (ht a θ−it ) < x < p∗i (a θ−it )
THE DYNAMIC PIVOT MECHANISM
783
If the inequality (11) is satisfied, then it follows that a∗ (θiax θ−it ) = a∗−i (θ−it ) and, in particular, that a∗ (θiax θ−it ) = a. If the ex post incentive constraint of type θiax were satisfied, then we would have (12)
vi (a∗ (θiax θ−it ) θiax ) − pi (ht θiax θ−it ) ≥ vi (a θiax ) − pi (ht a θ−it )
Given that θi = θiax , we rewrite (12) as 0 − pi (ht θiax θ−it ) ≥ x − pi (ht a θ−it ). But given (11), this implies that pi (ht θiax θ−it ) < 0. In other words, type θiax receives a strictly positive subsidy even though her report is not pivotal for the social allocation as a∗ (θiax θ−it ) = a∗−i (θ−it ). Now, a positive subsidy violates the ex post incentive constraint of the absorbing type θi . By the efficient exit condition, type θi should not receive any contemporaneous (or future) subsidies. But by misreporting her type to be θiax , type θi would gain access to a positive subsidy without changing the social allocation. It thus follows that pi (ht θiax θ−it ) = p∗i (θiax θ−it ) for all a and all x. Q.E.D. Given that the transfers of the dynamic pivot mechanism are part of every dynamic direct mechanism with diverse preferences, we next establish that every type θi0 in t = 0 has to receive the same ex ante expected utility as in the dynamic pivot mechanism. LEMMA 2: If {a∗t pt }∞ t=0 is ex post incentive compatible and individually rational, and satisfies the efficient exit condition, then for all i and all θ0 , Vi (θ0 ) = Mi (θ0 ). PROOF: The argument is by contradiction. Consider i such that Vi (θ0 ) = Mi (θ0 ). Suppose first that Vi (θ0 ) > Mi (θ0 ). Then there is a history hτ and a state θτ such that pi (hτ θτ ) < p∗i (θτ ). We show that such a transfer pi (hτ θτ ) leads to a violation of the ex post incentive constraint for some type θiax ∈ Θi . a∗ x Specifically consider the incentive constraint of a type θi τ with pi (hτ θτ ) < x < p∗i (θτ ) at a misreport θiτ : (13)
a∗ x
a∗ x
a∗ x vi a∗ θi τ θ−iτ θi τ − pi hτ θi τ θ−iτ a∗ x
a∗ x
+ δVi hiτ+1 |a∗ θi τ θ−iτ θi τ θ−iτ a∗ x ≥ vi a∗ (θiτ θ−iτ ) θi τ − pi (hτ θτ )
a∗ x
+ δVi hiτ+1 |a∗ θiτ θ−iτ θi τ θ−iτ
By hypothesis, we have pi (hτ θτ ) < x < p∗i (θτ ) and if x < p∗i (θτ ), then we can a∗ x infer from marginal contribution pricing that a∗ (θi τ θ−iτ ) = a∗ (θiτ θ−iτ ). ∗ a x But as the type θi τ has only a positive valuation for a∗ (θiτ θ−iτ ), it follows
784
D. BERGEMANN AND J. VÄLIMÄKI
that the left-hand side of (13) is equal to zero. However, the right-hand side is a∗ x equal to vi (a∗ (θiτ θ−iτ ) θi τ ) − pi (hτ θτ ) = x − pi (hτ θτ ) > 0, leading to a contradiction. Suppose next that for some ε > 0, we have (14)
Mi (θ0 ) − Vi (θ0 ) > ε
Econometrica, Vol. 78, No. 2 (March, 2010), 771–789
THE DYNAMIC PIVOT MECHANISM BY DIRK BERGEMANN AND JUUSO VÄLIMÄKI1 We consider truthful implementation of the socially efficient allocation in an independent private-value environment in which agents receive private information over time. We propose a suitable generalization of the pivot mechanism, based on the marginal contribution of each agent. In the dynamic pivot mechanism, the ex post incentive and ex post participation constraints are satisfied for all agents after all histories. In an environment with diverse preferences it is the unique mechanism satisfying ex post incentive, ex post participation, and efficient exit conditions. We develop the dynamic pivot mechanism in detail for a repeated auction of a single object in which each bidder learns over time her true valuation of the object. The dynamic pivot mechanism here is equivalent to a modified second price auction. KEYWORDS: Pivot mechanism, dynamic mechanism design, ex post equilibrium, marginal contribution, multiarmed bandit, Bayesian learning.
1. INTRODUCTION IN THIS PAPER, we generalize the idea of the pivot mechanism (due to Green and Laffont (1977)) to dynamic environments with private information. We design an intertemporal sequence of transfer payments which allows each agent to receive her flow marginal contribution in every period. In other words, after each history, the expected transfer that each agent must pay coincides with the dynamic externality cost that she imposes on the other agents. In consequence, each agent is willing to truthfully report her information in every period. We consider a general intertemporal model in discrete time and with a common discount factor. The private information of each agent in each period is her perception of her future payoff path conditional on the realized signals and allocations. We assume throughout that the information is statistically independent across agents. At the reporting stage of the direct mechanism, each agent reports her information. The planner then calculates the efficient allocation given the reported information. The planner also calculates for each agent i the optimal allocation when agent i is excluded from the mechanism. The total expected discounted payment of each agent is set equal to the externality cost imposed on the other agents in the model. In this manner, each agent receives as her payment her marginal contribution to the social welfare in every conceivable continuation game. 1 We thank the editor and four anonymous referees for many helpful comments. The current paper is a major revision and supersedes “Dynamic Vickrey–Clarke–Groves Mechanisms” (2007). We are grateful to Larry Ausubel, Jerry Green, Paul Healy, John Ledyard, Benny Moldovanu, Michael Ostrovsky, David Parkes, Alessandro Pavan, Ilya Segal, Xianwen Shi, and Tomasz Strzalecki for many informative conversations. The authors acknowledge financial support through National Science Foundation Grants CNS 0428422 and SES 0518929, and the Yrjö Jahnsson’s Foundation, respectively.
With transferable utilities, the social objective is simply to maximize the expected discounted sum of the individual utilities. Since this is essentially a dynamic programming problem, the solution is by construction time-consistent. In consequence, the dynamic pivot mechanism is time-consistent and the social choice function can be implemented by a sequential mechanism without any ex ante commitment by the designer (apart from the commitment to the transfers promised for the current period). In contrast, in revenue-maximizing problems, it is well known that the optimal solution relies critically on the ability of the principal to commit to a contract, see Baron and Besanko (1984). Interestingly, Battaglini (2005) showed that in dynamic revenue-maximizing problems with stochastic types, the commitment problems are less severe than with constant types. The dynamic pivot mechanism yields a positive monetary surplus for the planner in each period and, therefore, the planner does not need outside resources to achieve the efficient allocation. Finally, the dynamic pivot mechanism induces all agents to participate in the mechanism after all histories. In the intertemporal environment there are many transfer schemes that support the same incentives as the pivot mechanism. In particular, the monetary transfers necessary to induce the efficient action in period t may become due at some later period s provided that the net present value of the transfers remains constant. We say that a mechanism supports efficient exit if an agent who ceases to affect current and future allocations also ceases to pay and receive transfers. This condition is similar to the requirement often made in the scheduling literature that the mechanism be an online mechanism (see Lavi and Nisan (2000)). We establish that in an environment with diverse preferences, the dynamic pivot mechanism is the only efficient mechanism that satisfies ex post incentive compatibility, ex post participation, and efficient exit conditions. The basic idea of the dynamic pivot mechanism is first explored in the context of a scheduling problem where a set of privately informed bidders compete for the services of a central facility over time. This class of problems is perhaps the most natural dynamic allocation analogue to the static single-unit auction. The scheduling problem is kept deliberately simple and all the relevant private information arrives in the initial period. Subsequently, we use the dynamic pivot mechanism to derive the dynamic auction format for a model where bidders learn their valuations for a single object over time. In contrast to the scheduling problem where a static mechanism could still have implemented the efficient solution, a static mechanism now necessarily fails to support the efficient outcome as more information arrives over time. In turn, this requires a more complete understanding of the intertemporal trade-offs in the allocation process. By computing the dynamic marginal contributions, we can derive explicit and informative expressions for the intertemporal transfer prices. In recent years, a number of papers have been written with the aim to explore various issues arising in dynamic allocation problems. Among the contributions
THE DYNAMIC PIVOT MECHANISM
773
which focus on socially efficient allocation, Cavallo, Parkes, and Singh (2006) proposed a Markovian environment for general allocation problems and analyzed two different classes of sequential incentive schemes: (i) Groves-like payments and (ii) pivot-like payments. They established that Groves-like payments, which award every agent positive monetary transfers equal to the sum of the valuation of all other agents, guarantee interim incentive compatibility and ex post participation constraints after all histories. In contrast, pivot-like payments guarantee interim incentive compatibility and ex ante participation constraints. Athey and Segal (2007) considered a more general dynamic model in which the current payoffs are allowed to depend on the entire past history including past signals and past actions. In addition, they also allowed for hidden action as well as hidden information. The main focus of their analysis is on incentive compatible mechanisms that are budget balanced in every period of the game. Their mechanism, called the balanced team mechanism, transfers the insight from the Arrow (1979) and D'Aspremont and Gerard-Varet (1979) mechanisms into a dynamic environment. In addition, Athey and Segal (2007) presented conditions in terms of ergodic distributions over types and patient agents such that insights from repeated games can be employed to guarantee interim participation constraints. In contrast, we emphasize voluntary participation without any assumptions about the discount factor or the ergodicity of the type distributions. We also define an efficient exit condition which allows us to single out the dynamic pivot mechanism in the class of efficient mechanisms. The focus of the current paper is on the socially efficient allocation, but a number of recent papers have analyzed the design of dynamic revenue-maximizing mechanisms, beginning with the seminal contributions by Baron and Besanko (1984) and Courty and Li (2000), who considered optimal intertemporal pricing policies with private information in a setting with two periods. Battaglini (2005) considered the revenue-maximizing long-term contract of a monopolist in a model with an infinite time horizon when the valuation of the buyer changes in a Markovian fashion over time. In particular, Battaglini (2005) showed that the optimal continuation contracts for a current high type are efficient, as his payoff is determined by the allocations for the current low type (by incentive compatibility). The net payoffs of the types then have a property related to the marginal contribution here. But as Battaglini (2005) considered revenue-maximizing contracts, the lowest type served receives zero utility, and hence the notion of marginal contribution refers only to the additional utility generated by higher types, holding the allocation constant, rather than the entire incremental social value. Most recently, Pavan, Segal, and Toikka (2008) developed a general allocation model and derived the optimal dynamic revenue-maximizing mechanism. A common thread in these papers is a suitable generalization of the notion of virtual utility to dynamic environments.
2. MODEL

Uncertainty

We consider an environment with private and independent values in a discrete-time, infinite-horizon model. The flow utility of agent i ∈ {1, 2, …, I} in period t ∈ N is determined by the current allocation at ∈ A, the current monetary transfer pit ∈ R, and a state variable θit ∈ Θi. The von Neumann–Morgenstern utility function ui of agent i is quasilinear in the monetary transfer:

ui(at, pit, θit) ≡ vi(at, θit) − pit.

The current allocation at ∈ A is an element of a finite set A of possible allocations. The state of the world θit for agent i is a general Markov process on the state space Θi. The aggregate state is given by the vector θt = (θ1t, …, θIt) with Θ = ∏_{i=1}^{I} Θi. There is a common prior Fi(θi0) regarding the initial type θi0 of each agent i. The current state θit and the current action at define a probability distribution for next period's state variable θit+1 on Θi. We assume that this distribution can be represented by a stochastic kernel Fi(θit+1; θit, at). The utility functions ui(·) and the probability transition functions Fi(·; θit, at) are common knowledge at t = 0. The common prior Fi(θi0) and the stochastic kernels Fi(θit+1; θit, at) are assumed to be independent across agents. At the beginning of each period t, each agent i observes θit privately. At the end of each period, an action at ∈ A is chosen and payoffs for period t are realized. The asymmetric information is therefore generated by the private observation of θit in each period t. We observe that by the independence of the priors and the stochastic kernels across i, the information of agent i, θit+1, does not depend on θjt for j ≠ i. The expected absolute value of the flow payoff is assumed to be bounded by some K < ∞ for every i, a, θ, and allocation plan a′ : Θ → A:

∫_Θ |vi(a′(θ′), θ′)| dF(θ′; a, θ) < K.
The nature of the state space Θ depends on the application at hand. At this point, we stress that the formulation accommodates the possibility of random arrival or departure of the agents. The arrival or departure of agent i can be represented by an inactive state θi where vi(at, θi) = 0 for all at ∈ A and a random time τ at which agent i privately observes her transition in or out of the inactive state.

Social Efficiency

All agents discount the future with a common discount factor δ, 0 < δ < 1. The socially efficient policy is obtained by maximizing the expected discounted
sum of valuations. Given the Markovian structure, the socially optimal program starting in period t at state θt can be written as

W(θt) ≡ max_{{as}_{s=t}^{∞}} E[ Σ_{s=t}^{∞} δ^{s−t} Σ_{i=1}^{I} vi(as, θis) ].
For notational ease, we omit the conditioning state in the expectation operator when the conditioning event is obvious, as in the above, where E[·] = E_{θt}[·]. Alternatively, we can represent the social program in its recursive form:

W(θt) = max_{at ∈ A} [ Σ_{i=1}^{I} vi(at, θit) + δ E W(θt+1) ].
The socially efficient policy is denoted by a∗ = {a∗t}_{t=0}^{∞}. The social externality cost of agent i is determined by the social value in the absence of agent i:

W−i(θt) ≡ max_{{as}_{s=t}^{∞}} E[ Σ_{s=t}^{∞} δ^{s−t} Σ_{j≠i} vj(as, θjs) ].
The efficient policy when agent i is excluded is denoted by a∗−i = {a∗−it}_{t=0}^{∞}. The marginal contribution Mi(θt) of agent i at signal θt is defined by

(1)  Mi(θt) ≡ W(θt) − W−i(θt).
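On a small finite specification of the model, W, W−i, and Mi can be computed directly by standard value iteration. The following sketch is a minimal illustration under hypothetical two-agent payoffs and transition kernels; all names and numbers are assumptions for illustration only, not from the paper.

```python
# Sketch: value iteration for the social program W, the program without
# agent 0 (W_{-0}), and the marginal contribution M_0 = W - W_{-0}, on a
# hypothetical two-agent, two-state, two-action model.
import itertools

delta = 0.9
states_i = ["H", "L"]                 # per-agent state space Theta_i
actions = ["a1", "a2"]

# flow values v_i(a, theta_i), illustrative numbers
v = {0: {("a1", "H"): 2.0, ("a1", "L"): 0.5, ("a2", "H"): 0.0, ("a2", "L"): 0.0},
     1: {("a1", "H"): 0.0, ("a1", "L"): 0.0, ("a2", "H"): 1.5, ("a2", "L"): 0.3}}

def kernel(theta_i, a):
    # independent transition kernel F_i(. ; theta_i, a), illustrative persistence
    stay = 0.8 if a == "a1" else 0.6
    other = "H" if theta_i == "L" else "L"
    return {theta_i: stay, other: 1 - stay}

profiles = list(itertools.product(states_i, states_i))

def value_iteration(agents, iters=500):
    W = {th: 0.0 for th in profiles}
    for _ in range(iters):
        W = {th: max(sum(v[i][(a, th[i])] for i in agents)
                     + delta * sum(kernel(th[0], a)[t0] * kernel(th[1], a)[t1] * W[(t0, t1)]
                                   for (t0, t1) in profiles)
                     for a in actions)
             for th in profiles}
    return W

W = value_iteration(agents=[0, 1])
W_minus_0 = value_iteration(agents=[1])               # exclude agent 0
M_0 = {th: W[th] - W_minus_0[th] for th in profiles}  # definition (1)
print(M_0)
```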
The marginal contribution of agent i is the change in the social value due to the addition of agent i.2

Mechanism and Equilibrium

We focus attention on direct mechanisms which truthfully implement the socially efficient policy a∗. A dynamic direct mechanism asks every agent i to report her state θit in every period t. The report rit ∈ Θi may or may not be truthful. The public history in period t is a sequence of reports and allocations until period t − 1, or ht = (r0, a0, r1, a1, …, rt−1, at−1), where each rs = (r1s, …, rIs) is a report profile of the I agents. The set of possible public histories in period t is denoted by Ht. The sequence of reports by the agents is part of the public history and we assume that the past reports of each agent are observable to all the agents. The private history of agent i in period t consists of the public history and the sequence of private observations until period t, or hit = (θi0, r0, a0, θi1, r1, a1, …, θit−1, rt−1, at−1, θit). The set of possible private histories in period t is denoted by Hit. An (efficient) dynamic direct

2 In symmetric information environments, we used the notion of marginal contribution to construct efficient equilibria in dynamic first price auctions; see Bergemann and Välimäki (2003, 2006).
mechanism is represented by a family of allocations and monetary transfers, {a∗t, pt}_{t=0}^{∞}, with a∗t : Θ → Δ(A) and pt : Ht × Θ → R^I. With the focus on efficient mechanisms, the allocation a∗t depends only on the current (reported) state rt ∈ Θ, while the transfer pt may depend on the entire public history. A (pure) reporting strategy for agent i in period t is a mapping from the private history into the state space: rit : Hit → Θi. For a given mechanism, the expected payoff of agent i from reporting strategy ri = {rit}_{t=0}^{∞} given the strategies r−i = {r−it}_{t=0}^{∞} is

E[ Σ_{t=0}^{∞} δ^t ( vi(a∗(rt), θit) − pi(ht, rt) ) ].
Given the mechanism {a∗t, pt}_{t=0}^{∞} and the reporting strategies r−i, the optimal strategy of bidder i can be stated recursively:

Vi(hit) = max_{rit ∈ Θi} E[ vi(a∗t(rit, r−it), θit) − pi(ht, rit, r−it) + δVi(hit+1) ].
The value function Vi(hit) represents the continuation value of agent i given the current private history hit. We say that a dynamic direct mechanism is interim incentive compatible if for every agent and every history, truthtelling is a best response given that all other agents report truthfully. We say that the dynamic direct mechanism is periodic ex post incentive compatible if truthtelling is a best response regardless of the history and the current state of the other agents. In the dynamic context, the notion of ex post incentive compatibility is qualified by periodic, as it is ex post with respect to all signals received in period t, but not ex post with respect to signals arriving after period t. The periodic qualification arises in the dynamic environment, as agent i may receive information at some later time s > t such that in retrospect she would wish to change the allocation choice in t and hence her report in t. Finally, we define the periodic ex post participation constraints of each agent. After each history ht, each agent i may opt out (permanently) from the mechanism. The value of the outside option is denoted Oi(hit) and it is defined by the payoffs that agent i receives if the planner pursues the efficient policy a∗−i for the remaining agents. The periodic participation constraint requires that each agent's equilibrium payoff after each history weakly exceeds Oi(hit). For the remainder of the text, we say that a mechanism is ex post incentive compatible and individually rational if it satisfies the periodic ex post incentive and participation constraints.

3. SCHEDULING: AN EXAMPLE

We consider the problem of allocating time to use a central facility among competing agents. Each agent has a private valuation for the completion of a
task which requires the use of the central facility. The facility has a capacity constraint and can only complete one task per period. The cost of delaying any task is given by the discount rate δ < 1. The agents are competing for the right to use the facility at the earliest available time. The objective of the social planner is to sequence the tasks over time so as to maximize the sum of the discounted utilities. In an early contribution, Dolan (1978) developed a static mechanism to implement a class of related scheduling problems with private information. An allocation policy in this setting is a sequence of choices at ∈ {0, 1, …, I}, where at denotes the bidder chosen in period t. We allow for at = 0 and hence the possibility that no bidder is selected in t. Each agent has only one task to complete and the value θi0 ∈ R+ of the task is constant over time and independent of the realization time (except for discounting). The transition function is then given by

θit+1 = 0    if at = i,
θit+1 = θit  if at ≠ i.
For this scheduling model, we find the marginal contribution of each agent and derive the associated dynamic pivot mechanism. We determine the marginal contribution of bidder i by comparing the value of the social program with and without i. With the constant valuations over time for all i, the optimal policy is given by assigning in every period the alternative j with the highest remaining valuation. To simplify notation, we define the positive valuation vi ≡ θi0. We may assume without loss of generality (after relabelling) that the valuations vi are ordered with respect to the index i: v1 ≥ · · · ≥ vI ≥ 0. Due to the descending order of valuations, we identify each task i with the period i − 1 in which it is completed along the efficient path:

(2)  W(θ0) = Σ_{t=1}^{I} δ^{t−1} vt.
Similarly, the efficient program in the absence of task i assigns the tasks in ascending order, but necessarily skips task i in the assignment process:

(3)  W−i(θ0) = Σ_{t=1}^{i−1} δ^{t−1} vt + Σ_{t=i}^{I−1} δ^{t−1} vt+1.
By comparing the social program with and without i, (2) and (3), respectively, we find that the assignments for agents j < i remain unchanged after i is removed, but that each agent j > i is allocated the slot one period earlier than
in the presence of i. The marginal contribution of i from the point of view of period 0 is

Mi(θ0) = W(θ0) − W−i(θ0) = Σ_{t=i}^{I} δ^{t−1} (vt − vt+1),

where we set vI+1 ≡ 0.
The social externality cost of agent i is established in a straightforward manner. At time t = i − 1, agent i completes her task and realizes the value vi. The immediate opportunity cost is the next highest valuation vi+1. But this overstates the externality, because in the presence of i, all less valuable tasks are realized one period later. The externality cost of agent i is hence equal to the next most valuable task vi+1 minus the improvement in future allocations due to the delay of all tasks by one period:

(4)  pi(θt) = vi+1 − Σ_{t=i+1}^{I} δ^{t−i} (vt − vt+1) = (1 − δ) Σ_{t=i}^{I} δ^{t−i} vt+1.
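The closed forms (2)-(4) are easy to check numerically. A minimal sketch, assuming illustrative valuations (none from the paper), computes W(θ0), W−i(θ0), Mi(θ0), and verifies that the two expressions in (4) agree:

```python
# Sketch: numerical check of (2)-(4) for the scheduling example, with
# illustrative valuations v_1 >= v_2 >= ... >= v_I and v_{I+1} = 0.
delta = 0.9
v = [10.0, 7.0, 4.0, 1.0]            # v_1 .. v_I (hypothetical numbers)
I = len(v)
ext = v + [0.0]                      # ext[t] = v_{t+1}, with v_{I+1} = 0

W0 = sum(delta**t * v[t] for t in range(I))                     # eq. (2)

def W_minus(i):                      # eq. (3); i is the 1-based task index
    first = sum(delta**(t - 1) * v[t - 1] for t in range(1, i))
    second = sum(delta**(t - 1) * v[t] for t in range(i, I))    # skips task i
    return first + second

for i in range(1, I + 1):
    M_i = W0 - W_minus(i)            # marginal contribution of agent i
    # eq. (4): the two equivalent expressions for the externality cost
    p1 = ext[i] - sum(delta**(t - i) * (ext[t - 1] - ext[t]) for t in range(i + 1, I + 1))
    p2 = (1 - delta) * sum(delta**(t - i) * ext[t] for t in range(i, I + 1))
    assert abs(p1 - p2) < 1e-9
    print(f"agent {i}: M_i = {M_i:.3f}, p_i = {p2:.3f}")
```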
Since we have by construction vt − vt+1 ≥ 0, the externality cost of agent i in the intertemporal framework is less than in the corresponding single allocation problem where it would be vi+1. Consequently, the final expression states that the externality of agent i is the cost of delay imposed on the remaining and less valuable tasks.3

4. THE DYNAMIC PIVOT MECHANISM

We now construct the dynamic pivot mechanism for the general model described in Section 2. The marginal contribution of agent i is her contribution to the social value. In the dynamic pivot mechanism, the marginal contribution will also be the information rent that agent i can secure for herself if the planner wishes to implement the socially efficient allocation. In a dynamic setting, if agent i can secure her marginal contribution in every continuation game of the mechanism, then she should be able to receive the flow marginal contribution mi(θt) in every period. The flow marginal contribution accrues incrementally over time and is defined recursively:

Mi(θt) = mi(θt) + δE Mi(θt+1).

3 In the online Supplementary Material (Bergemann and Välimäki (2010)), we show that the socially efficient scheduling can be implemented through a bidding mechanism rather than the direct revelation mechanism used here. In a recent and related contribution, Said (2008) used the dynamic pivot mechanism and a payoff equivalence result to construct bidding strategies in a sequence of ascending auctions with entry and exit of the agents.
The flow marginal contribution can be expressed directly in terms of the social value functions, using the definition of the marginal contribution given in (1), as

(5)  mi(θt) ≡ W(θt) − W−i(θt) − δE[W(θt+1) − W−i(θt+1)].
The continuation payoffs of the social programs with and without i, respectively, may be governed by different transition probabilities, as the respective social decisions in period t, a∗t ≡ a∗(θt) and a∗−it ≡ a∗−i(θ−it), may differ. The continuation value of the socially optimal program, conditional on current allocation at and state θt, is

W(θt+1 | at, θt) ≡ E_{F(θt+1; at, θt)} W(θt+1),

where the transition from state θt to state θt+1 is controlled by the allocation at. For notational ease, we omit the expectations operator E from the conditional expectation. We adopt the same notation for the marginal contributions Mi(·) and the individual value functions Vi(·). The flow marginal contribution mi(θt) is expressed as

mi(θt) = Σ_{j=1}^{I} vj(a∗t, θjt) − Σ_{j≠i} vj(a∗−it, θjt) + δ[W−i(θt+1 | a∗t, θt) − W−i(θt+1 | a∗−it, θt)].

A monetary transfer p∗i(θt) such that the resulting flow net utility matches the flow marginal contribution leads agent i to internalize her social externalities:

(6)  p∗i(θt) ≡ vi(a∗t, θit) − mi(θt).
We refer to p∗i(θt) as the transfer of the dynamic pivot mechanism. The transfer p∗i(θt) depends only on the current report θt and not on the entire public history ht. We can express p∗i(θt) in terms of the flow utilities and the social continuation values:

(7)  p∗i(θt) = Σ_{j≠i} [vj(a∗−it, θjt) − vj(a∗t, θjt)] + δ[W−i(θt+1 | a∗−it, θt) − W−i(θt+1 | a∗t, θt)].

The transfer p∗i(θt) for agent i depends on the report of agent i only through the determination of the social allocation, which is a prominent feature of the static Vickrey–Clarke–Groves mechanisms. The monetary transfers p∗i(θt) are always nonnegative, as the policy a∗−it is by definition an optimal policy to maximize the social value of all agents exclusive of i. It follows that in every period t, the sum of the monetary transfers across all agents generates a weak budget surplus.
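Spelled out as code, (7) is a direct computation once the two social programs are available. The sketch below uses hypothetical helper names (a_star, a_star_minus_i, E_W_minus_i) that are assumed to be supplied by a surrounding model implementation such as the value-iteration sketch above; it is an illustration, not the paper's implementation.

```python
# Sketch of the transfer formula (7): the flow loss that agent i imposes
# on the others plus the discounted change in their continuation value.
# a_star / a_star_minus_i are the efficient policies with and without i;
# E_W_minus_i(i, theta, a) is the expected continuation value of the
# program without i given today's action a. All are assumed helpers.
def pivot_transfer(i, theta, agents, delta, v, a_star, a_star_minus_i, E_W_minus_i):
    a_full = a_star(theta)                 # efficient action with agent i
    a_excl = a_star_minus_i(theta)         # efficient action without agent i
    others = [j for j in agents if j != i]
    flow_loss = sum(v[j](a_excl, theta[j]) - v[j](a_full, theta[j]) for j in others)
    cont_loss = E_W_minus_i(i, theta, a_excl) - E_W_minus_i(i, theta, a_full)
    return flow_loss + delta * cont_loss   # nonnegative by optimality of a_excl
```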
THEOREM 1—Dynamic Pivot Mechanism: The dynamic pivot mechanism {a∗t, p∗t}_{t=0}^{∞} is ex post incentive compatible and individually rational.

PROOF: By the unimprovability principle, it suffices to prove that if agent i receives as her continuation value her marginal contribution, then truthtelling is incentive compatible for agent i in period t, or

(8)  vi(a∗(θt), θit) − p∗i(θt) + δMi(θt+1 | a∗t, θt) ≥ vi(a∗(rit, θ−it), θit) − p∗i(rit, θ−it) + δMi(θt+1 | a∗(rit, θ−it), θt)
for all rit ∈ Θi and all θ−it ∈ Θ−i, and we recall that we denote the socially efficient allocation at the true state profile θt by a∗t ≡ a∗(θt). By construction of p∗i in (7), the left-hand side of (8) represents the marginal contribution of agent i. We can express the marginal contributions Mi(·) in terms of the different social values to get

(9)  W(θt) − W−i(θt) ≥ vi(a∗(rit, θ−it), θit) − p∗i(rit, θ−it) + δ[W(θt+1 | a∗(rit, θ−it), θt) − W−i(θt+1 | a∗(rit, θ−it), θt)].
We then insert the transfer price p∗i(rit, θ−it) (see (7)) into (9) to obtain

W(θt) − W−i(θt) ≥ vi(a∗(rit, θ−it), θit) − Σ_{j≠i} vj(a∗−it, θjt) − δW−i(θt+1 | a∗−it, θt) + Σ_{j≠i} vj(a∗(rit, θ−it), θjt) + δW(θt+1 | a∗(rit, θ−it), θt).
But now we reconstitute the entire inequality in terms of the respective social values:

W(θt) − W−i(θt) ≥ Σ_{j=1}^{I} vj(a∗(rit, θ−it), θjt) + δW(θt+1 | a∗(rit, θ−it), θt) − W−i(θt).

The above inequality holds for all rit by the social optimality of a∗(θt) in state θt. Q.E.D.
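Since the proof reduces ex post incentive compatibility to the pointwise inequality (8), the property can be verified mechanically on any finite specification. A minimal sketch, again with hypothetical helpers assumed to come from the surrounding model code:

```python
# Sketch: brute-force verification of the ex post incentive constraint (8)
# on a finite model. a_star (efficient allocation), p_star (pivot transfer
# as in (7)), E_M_i (expected discounted marginal contribution after action
# a), and the flow values v are hypothetical, assumed-supplied helpers.
def expost_ic_holds(i, theta, Theta_i, delta, v, a_star, p_star, E_M_i, tol=1e-9):
    def payoff(report_i):
        # agent i reports report_i while all other agents report truthfully
        th_r = tuple(report_i if j == i else theta[j] for j in range(len(theta)))
        a = a_star(th_r)
        return v[i](a, theta[i]) - p_star(i, th_r) + delta * E_M_i(i, theta, a)
    truthful = payoff(theta[i])
    return all(truthful >= payoff(r) - tol for r in Theta_i)
```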
The dynamic pivot mechanism specifies a unique monetary transfer after every history. It guarantees that the ex post incentive and ex post participation constraints are satisfied after every history. In the intertemporal environment, each agent evaluates the monetary transfers to be paid in terms of the expected discounted transfers, but is indifferent (up to discounting) over the incidence of the transfers over time. This temporal separation between allocative decisions and monetary decisions may be undesirable for many reasons. First, if the agents and the principal do not have the ability to commit to future transfer payments, then delays in payments become problematic. In consequence, an agent who is not pivotal should not receive or make a payment. Second, if it is costly (in a lexicographic sense) to maintain accounts of future monetary commitments, then the principal wants to close down (as early as possible) the accounts of those agents who are no longer pivotal.4 This motivates the following efficient exit condition. Let state θτi in period τi be such that the probability that agent i affects the efficient social decision a∗t in period t is equal to zero for all t ≥ τi, that is, Pr({θt | a∗t(θt) ≠ a∗−it(θt)} | θτi) = 0. In this case, agent i is irrelevant for the mechanism in period τi, and we say that the mechanism satisfies the efficient exit condition if agents neither make nor receive transfers in periods where they are irrelevant for the mechanism.

DEFINITION 1—Efficient Exit: A dynamic direct mechanism satisfies the efficient exit condition if for all i, hτi, θτi,

pi(hτi, θτi) = 0.

We establish the uniqueness of the dynamic pivot mechanism in an environment with diverse preferences and the efficient exit condition. The assumption of diverse preferences allows for rich preferences over the current allocations and indifference over future allocations.

ASSUMPTION 1—Diverse Preferences: (i) For all i, there exists θi ∈ Θi such that for all a,

vi(a, θi) = 0 and Fi(θi; a, θi) = 1.

(ii) For all i, a, and x ∈ R+, there exists θi^{ax} ∈ Θi such that

vi(at, θi^{ax}) = x if at = a,  vi(at, θi^{ax}) = 0 if at ≠ a,

and for all at, Fi(θi; at, θi^{ax}) = 1.

4 We would like to thank an anonymous referee for the suggestion to consider the link between exit and uniqueness of the transfer rule.
The diverse preference assumption assigns to each agent i a state, θi, which is an absorbing state and in which i gets no payoff from any allocation. In addition, each agent i has a state in which i has a positive valuation x for a specific current allocation a and no value for other current or any future allocations. The diverse preferences condition is similar to the rich domain conditions introduced in Green and Laffont (1977) and Moulin (1986) to establish the uniqueness of the Groves and the pivot mechanism in a static environment. Relative to their conditions, we augment the diverse (flow) preferences with the certain transition into the absorbing state θi. With this transition we ensure that the diverse flow preferences continue to matter in the intertemporal environment. The assumption of diverse preference in conjunction with the efficient exit condition guarantees that in every dynamic direct mechanism there are some types, specifically types of the form θi^{ax}, that receive exactly the flow transfers they would have received in the dynamic pivot mechanism.

LEMMA 1: If {a∗t, pt}_{t=0}^{∞} is ex post incentive compatible and individually rational, and satisfies the efficient exit condition, then

pi(ht, θi^{ax}, θ−it) = p∗i(θi^{ax}, θ−it)  for all i, a, x, θ−it, ht.
PROOF: In the dynamic pivot mechanism, if the valuation x of type θi^{ax} for allocation a exceeds the social externality cost, that is,

(10)  x ≥ W−i(θ−it) − Σ_{j≠i} vj(a, θjt) − δW−i(θ−it+1 | a, θ−it),
then p∗i(θi^{ax}, θ−it) is equal to the above social externality cost; otherwise it is zero. We now argue by contradiction. By the ex post incentive compatibility constraints, all types θi^{ax} of agent i, where x satisfies the inequality (10), must pay the same transfer. To see this, suppose that for some x, y ∈ R+ satisfying (10), we have pi(ht, θi^{ax}, θ−it) < pi(ht, θi^{ay}, θ−it). Now type θi^{ay} has a strict incentive to misreport rit = θi^{ax}, a contradiction. We therefore denote the transfer for all x and θi^{ax} satisfying (10) by pi(ht, a, θ−it), and denote the corresponding dynamic pivot transfer by p∗i(a, θ−it). Suppose next that pi(ht, a, θ−it) > p∗i(a, θ−it). This implies that the ex post participation constraint for some x with pi(ht, a, θ−it) > x > p∗i(a, θ−it) is violated, contradicting the hypothesis of the lemma. Suppose to the contrary that pi(ht, a, θ−it) < p∗i(a, θ−it), and consider the incentive constraints of a type θi^{ax} with a valuation x such that

(11)  pi(ht, a, θ−it) < x < p∗i(a, θ−it).
If the inequality (11) is satisfied, then it follows that a∗(θi^{ax}, θ−it) = a∗−i(θ−it) and, in particular, that a∗(θi^{ax}, θ−it) ≠ a. If the ex post incentive constraint of type θi^{ax} were satisfied, then we would have

(12)  vi(a∗(θi^{ax}, θ−it), θi^{ax}) − pi(ht, θi^{ax}, θ−it) ≥ vi(a, θi^{ax}) − pi(ht, a, θ−it).

Given that a∗(θi^{ax}, θ−it) ≠ a, we rewrite (12) as 0 − pi(ht, θi^{ax}, θ−it) ≥ x − pi(ht, a, θ−it). But given (11), this implies that pi(ht, θi^{ax}, θ−it) < 0. In other words, type θi^{ax} receives a strictly positive subsidy even though her report is not pivotal for the social allocation as a∗(θi^{ax}, θ−it) = a∗−i(θ−it). Now, a positive subsidy violates the ex post incentive constraint of the absorbing type θi. By the efficient exit condition, type θi should not receive any contemporaneous (or future) subsidies. But by misreporting her type to be θi^{ax}, type θi would gain access to a positive subsidy without changing the social allocation. It thus follows that pi(ht, θi^{ax}, θ−it) = p∗i(θi^{ax}, θ−it) for all a and all x. Q.E.D.

Given that the transfers of the dynamic pivot mechanism are part of every dynamic direct mechanism with diverse preferences, we next establish that every type θi0 in t = 0 has to receive the same ex ante expected utility as in the dynamic pivot mechanism.

LEMMA 2: If {a∗t, pt}_{t=0}^{∞} is ex post incentive compatible and individually rational, and satisfies the efficient exit condition, then for all i and all θ0, Vi(θ0) = Mi(θ0).

PROOF: The argument is by contradiction. Consider i such that Vi(θ0) ≠ Mi(θ0). Suppose first that Vi(θ0) > Mi(θ0). Then there is a history hτ and a state θτ such that pi(hτ, θτ) < p∗i(θτ). We show that such a transfer pi(hτ, θτ) leads to a violation of the ex post incentive constraint for some type θi^{ax} ∈ Θi. Specifically, consider the incentive constraint of a type θi^{a∗τ x} with pi(hτ, θτ) < x < p∗i(θτ) at a misreport θiτ:
(13)  vi(a∗(θi^{a∗τ x}, θ−iτ), θi^{a∗τ x}) − pi(hτ, θi^{a∗τ x}, θ−iτ) + δVi(hiτ+1 | a∗(θi^{a∗τ x}, θ−iτ), (θi^{a∗τ x}, θ−iτ)) ≥ vi(a∗(θiτ, θ−iτ), θi^{a∗τ x}) − pi(hτ, θτ) + δVi(hiτ+1 | a∗(θiτ, θ−iτ), (θi^{a∗τ x}, θ−iτ)).
By hypothesis, we have pi(hτ, θτ) < x < p∗i(θτ), and if x < p∗i(θτ), then we can infer from marginal contribution pricing that a∗(θi^{a∗τ x}, θ−iτ) ≠ a∗(θiτ, θ−iτ). But as the type θi^{a∗τ x} has only a positive valuation for a∗(θiτ, θ−iτ), it follows
that the left-hand side of (13) is equal to zero. However, the right-hand side is equal to vi(a∗(θiτ, θ−iτ), θi^{a∗τ x}) − pi(hτ, θτ) = x − pi(hτ, θτ) > 0, leading to a contradiction. Suppose next that for some ε > 0, we have

(14)  Mi(θ0) − Vi(θ0) > ε.
By hypothesis of ex post incentive compatibility, we have for all reports ri0,

Mi(θ0) − [vi(a∗(ri0, θ−i0), θi0) − pi(h0, ri0, θ−i0) + δVi(hi1 | a∗(ri0, θ−i0), θi0)] > ε.

Given a∗0, we can find, by the diverse preference condition, a type θi = θi^{a∗0 x} such that a∗0 = a∗(θi^{a∗0 x}, θ−i0). Now by Lemma 1, there exists a report ri0 for agent i, namely ri0 = θi^{a∗0 x}, such that a∗0 is induced at the price p∗i(θ0). After inserting ri0 = θi^{a∗0 x} into the above inequality and observing that vi(a∗(ri0, θ−i0), θi0) − pi(h0, ri0, θ−i0) = mi(θ0), we conclude that Mi(θ1) − Vi(hi1 | a∗(ri0, θ−i0), θi0) > ε/δ. Now we repeat the argument we started with (14) and find that there is a path of realizations θ0, …, θt, such that the difference between the marginal contribution and the value function of agent i grows without bound. But the marginal contribution of agent i is finite given that the expected flow utility of agent i is bounded by some K > 0, and thus eventually the ex post participation constraint of the agent is violated and we obtain the desired contradiction. Q.E.D.

The above lemma can be viewed as a revenue equivalence result for all (efficient) dynamic direct mechanisms. As we are analyzing a dynamic allocation problem with an infinite horizon, we cannot appeal to the revenue equivalence results established for static mechanisms. In particular, the statement of the standard revenue equivalence results involves a fixed utility for the lowest type. In the infinite-horizon model here, the diverse preference assumption gives us a natural candidate of a lowest type in terms of θi, and the efficient exit condition determines her utility. The remaining task is to argue that among all intertemporal transfers with the same expected discounted value, only the time profile of the dynamic pivot mechanism satisfies the relevant conditions. Alternative payment streams could require an agent to pay either earlier or later relative to the dynamic pivot transfers. If the payments were to occur later, payments would have to be lower in an earlier period by the above revenue equivalence result. This would open the possibility for a “short-lived” type θi^{ax} to induce action a at a price below the dynamic pivot transfer and hence violate incentive compatibility. The reverse argument applies if the payments were to occur earlier relative to the dynamic pivot transfer, for example, if the agent were to be asked to post a bond at the beginning of the mechanism.
THEOREM 2—Uniqueness: If the diverse preference condition is satisfied and if {a∗t, pt}_{t=0}^{∞} is ex post incentive compatible and individually rational, and satisfies the efficient exit condition, then it is the dynamic pivot mechanism.

PROOF: The proof is by contradiction. Suppose not. Then by Lemma 2 there exists an agent i, a history hτ, and an associated state θτ such that pi(hτ, θτ) ≠ p∗i(θτ). Suppose first that pi(hτ, θτ) < p∗i(θτ). We show that the current monetary transfer pi(hτ, θτ) violates the ex post incentive constraint of some type θi^{ax}. Consider now a type θi^{a∗τ x} with a valuation x for the allocation a∗τ such that x > p∗i(θτ). Her ex post incentive constraints are given by

vi(a∗(θi^{ax}, θ−it), θi^{ax}) − pi(ht, θi^{ax}, θ−it) + δVi(hit+1 | a∗(θi^{ax}, θ−it), (θi^{ax}, θ−it)) ≥ vi(a∗(rit, θ−it), θit) − pi(ht, rit, θ−it) + δVi(hit+1 | a∗(rit, θ−it), (θi^{ax}, θ−it))

for all rit ∈ Θi. By the efficient exit condition, we have for all rit,

Vi(hit+1 | a∗(θi^{ax}, θ−it), (θi^{ax}, θ−it)) = Vi(hit+1 | a∗(rit, θ−it), (θi^{ax}, θ−it)) = 0.

By Lemma 1, pi(ht, θi^{ax}, θ−it) = p∗i(θi^{ax}, θ−it) = p∗i(θτ). Consider then the misreport riτ = θiτ by type θi^{ax}. The ex post incentive constraint now reads x − p∗i(θτ) ≥ x − pi(hτ, θτ), which leads to a contradiction, as by hypothesis we have pi(hτ, θτ) < p∗i(θτ).

Suppose next that pi(hτ, θτ) > p∗i(θτ). Now by Lemma 2, it follows that the ex ante expected payoff is equal to the value of the marginal contribution of agent i in period 0. It therefore follows from pi(hτ, θτ) > p∗i(θτ) that there also exists another time τ′ and state θτ′ such that pi(hτ′, θτ′) < p∗i(θτ′). By repeating the argument in the first part of the proof, we obtain a contradiction. Q.E.D.

We should reiterate that in the definition of the ex post incentive and participation conditions, we required that a candidate mechanism satisfies these conditions after all possible histories of past reports. It is in the spirit of the ex post constraints that these constraints hold for all possible states rather than only on strictly positive probability events. In the context of establishing the uniqueness of the mechanism, this allows us to use the diverse preference condition without making an additional assumption about the transition probability from a given state θit into a specific state θi^{ax}. We merely require the existence of these types in establishing the above result.
5. LEARNING AND LICENSING

In this section, we show how our general model can be interpreted as one where the bidders learn gradually about their preferences for an object that is auctioned repeatedly over time. We use the insights from the general pivot mechanism to deduce properties of the efficient allocation mechanism. A primary example of an economic setting that fits this model is the leasing of a resource or license over time. In every period t, a single indivisible object can be allocated to a bidder i ∈ {1, …, I}, and the allocation decision at ∈ {1, 2, …, I} simply determines which bidder gets the object in period t.

To describe the uncertainty explicitly, we assume that the true valuation of bidder i is given by ωi ∈ Ωi = [0, 1]. Information in the model represents, therefore, the bidder's prior and posterior beliefs on ωi. In period 0, bidder i does not know the realization of ωi, but she has a prior distribution θi0(ωi) on Ωi. The prior and posterior distributions on Ωi are assumed to be independent across bidders. In each subsequent period t, only the winning bidder in period t − 1 receives additional information leading to an updated posterior distribution θit on Ωi according to Bayes' rule. If bidder i does not win in period t, we assume that she gets no information, and consequently the posterior is equal to the prior. In the dynamic direct mechanism, the bidders simply report their posteriors at each stage.

The socially optimal assignment over time is a standard multiarmed bandit problem and the optimal policy is characterized by an index policy (see Whittle (1982)). In particular, we can compute for every bidder i the index based exclusively on the information about bidder i. The index of bidder i after private history hit is the solution to the optimal stopping problem

γi(hit) = max_{τi} E[ Σ_{l=0}^{τi} δ^l vi(at+l) ] / E[ Σ_{l=0}^{τi} δ^l ],
where at+l is the path in which alternative i is chosen l times following a given past allocation (a0, …, at). An important property of the index policy is that the index of alternative i can be computed independent of any information about the other alternatives. In particular, the index of bidder i remains constant if bidder i does not win the object. The socially efficient allocation policy a∗ = {a∗t}_{t=0}^{∞} is to choose in every period a bidder i if γi(hit) ≥ γj(hjt) for all j.
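The index γi(hit) can be computed numerically. The sketch below uses Whittle's retirement formulation, which is one standard way to evaluate such an index, for a concrete Bernoulli learning specification with a Beta posterior; the specification, the truncation, and all parameters are illustrative assumptions, not from the paper.

```python
# Sketch: Gittins-style index for a bidder whose per-period reward is 1
# with probability omega_i (Bernoulli), with a Beta(a, b) posterior on
# omega_i. Retirement formulation: bisect on the per-period outside
# reward lam at which the bidder is indifferent between continuing
# and retiring. Horizon truncation at N observations is an approximation.
from functools import lru_cache

delta = 0.9
N = 200   # stop updating after N observations (truncation, approximation)

@lru_cache(maxsize=None)
def cont_value(a, b, lam):
    """Value of the stopping problem at posterior Beta(a, b) when a
    per-period retirement reward lam is always available."""
    p = a / (a + b)                     # posterior mean of omega_i
    if a + b >= N:                      # truncated tail: no more learning
        return max(lam, p) / (1 - delta)
    stay = p * (1 + delta * cont_value(a + 1, b, lam)) \
         + (1 - p) * delta * cont_value(a, b + 1, lam)
    return max(lam / (1 - delta), stay)

def gittins_index(a, b, tol=1e-4):
    lo, hi = 0.0, 1.0                   # omega_i in [0, 1] bounds the index
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if cont_value(a, b, lam) > lam / (1 - delta) + 1e-12:
            lo = lam                    # continuing still beats retiring
        else:
            hi = lam
    return 0.5 * (lo + hi)

print(gittins_index(1, 1))  # index for a uniform prior; exceeds the mean 0.5
```

Note that the index exceeds the posterior mean whenever further experimentation has option value, which is exactly why the index of a bidder stays constant in periods in which she does not win and learns nothing.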
In the dynamic direct mechanism, we construct a transfer price such that under the efficient allocation, each bidder's net payoff coincides with her flow marginal contribution mi(θt). We consider first the payment of the bidder i who has the highest index in state θt and who should therefore receive the object in period t. To match her net payoff to her flow marginal contribution, we must have

(15)  mi(θt) = vi(hit) − pi(θt).
The remaining bidders, j ≠ i, should not receive the object in period t, and their transfer price must offset the flow marginal contribution: mj(θt) = −pj(θt). We expand mi(θt) by noting that i is the efficient assignment and that another bidder, say k, would be the efficient assignment in the absence of i:

mi(θt) = vi(hit) − vk(hkt) + δ(W−i(θt+1 | i, θt) − W−i(θt+1 | k, θt)).

The continuation value without i in t + 1, but conditional on having assigned the object to i in period t, is simply equal to the value conditional on θt, or W−i(θt+1 | i, θt) = W−i(θt). The additional information generated by the assignment to agent i only pertains to agent i and hence has no value for the allocation problem once i is removed. The flow marginal contribution of the winning agent i is, therefore,

mi(θt) = vi(hit) − (1 − δ)W−i(θt).

It follows that p∗i(θt) = (1 − δ)W−i(θt), which is the flow externality cost of assigning the object to agent i. A similar analysis leads to the conclusion that each losing bidder makes zero payments: p∗j(θt) = −mj(θt) = 0.

THEOREM 3—Dynamic Second Price Auction: The socially efficient allocation rule a∗ is ex post incentive compatible in the dynamic direct mechanism with the payment rule p∗, where

p∗j(θt) = (1 − δ)W−j(θt) if a∗t = j,  and  p∗j(θt) = 0 if a∗t ≠ j.

The incentive compatible pricing rule has a few interesting implications. First, we observe that in the case of two bidders, the formula for the dynamic second price reduces to the static solution. If we remove one bidder, the social program has no other choice but to always assign the object to the remaining bidder. But then the expected value of that assignment policy is simply equal to the expected value of the object for the remaining bidder j in period t by the martingale property of the Bayesian posterior. In other words, the transfer is equal to the current expected value of the next best competitor. It should be noted, though, that the object is not necessarily assigned to the bidder with the highest current flow payoff.
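As a quick sanity check on the two-bidder case just discussed, the price formula collapses to the loser's current posterior mean; a minimal sketch with illustrative numbers:

```python
# Sketch: with two bidders, removing bidder i leaves the planner no choice
# but to assign the object to j forever, so by the martingale property
# W_{-i}(theta_t) = E[omega_j | theta_jt] / (1 - delta), and the winner
# pays the loser's current expected value (all numbers hypothetical).
delta = 0.95
E_omega_j = 0.4                        # posterior mean of the losing bidder
W_minus_i = E_omega_j / (1 - delta)    # value of assigning to j forever
p_star_i = (1 - delta) * W_minus_i
print(p_star_i)                        # 0.4: the static second-price analogue
```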
With more than two bidders, the flow value of the social program without bidder i is different from the flow value of any remaining alternative. Since there are at least two bidders left after excluding i, the planner has the option to abandon any chosen alternative if its value happens to fall sufficiently. This option value increases the social flow payoff and hence the transfer that the efficient bidder must pay. In consequence, the social opportunity cost is higher than the highest expected valuation among the remaining bidders. Second, we observe that the transfer price of the winning bidder is independent of her own information about the object. This means that for all periods in which the ownership of the object does not change, the transfer price stays constant as well, even though the value of the object to the winning bidder may change.

REFERENCES

ARROW, K. (1979): “The Property Rights Doctrine and Demand Revelation Under Incomplete Information,” in Economics and Human Welfare: Essays in Honor of Tibor Scitovsky, ed. by M. Boskin. New York: Academic Press, 23–39. [773]
ATHEY, S., AND I. SEGAL (2007): “An Efficient Dynamic Mechanism,” Discussion Paper, Harvard University and Stanford University. [773]
BARON, D., AND D. BESANKO (1984): “Regulation and Information in a Continuing Relationship,” Information Economics and Policy, 1, 267–302. [772,773]
BATTAGLINI, M. (2005): “Long-Term Contracting With Markovian Consumers,” American Economic Review, 95, 637–658. [772,773]
BERGEMANN, D., AND J. VÄLIMÄKI (2003): “Dynamic Common Agency,” Journal of Economic Theory, 111, 23–48. [775]
——— (2006): “Dynamic Price Competition,” Journal of Economic Theory, 127, 232–263. [775]
——— (2010): “Supplement to ‘The Dynamic Pivot Mechanism’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7260_extensions.pdf. [778]
CAVALLO, R., D. PARKES, AND S. SINGH (2006): “Optimal Coordinated Planning Among Self-Interested Agents With Private State,” in Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, Cambridge. [773]
COURTY, P., AND H. LI (2000): “Sequential Screening,” Review of Economic Studies, 67, 697–717. [773]
D’ASPREMONT, C., AND L. GERARD-VARET (1979): “Incentives and Incomplete Information,” Journal of Public Economics, 11, 25–45. [773]
DOLAN, R. (1978): “Incentive Mechanisms for Priority Queuing Problems,” Bell Journal of Economics, 9, 421–436. [777]
GREEN, J., AND J. LAFFONT (1977): “Characterization of Satisfactory Mechanisms for the Revelation of the Preferences for Public Goods,” Econometrica, 45, 427–438. [771,782]
LAVI, R., AND N. NISAN (2000): “Competitive Analysis of Incentive Compatible Online Auctions,” in Proceedings of the 2nd Conference on Electronic Commerce. New York: ACM Press, 233–241. [772]
MOULIN, H. (1986): “Characterization of the Pivotal Mechanism,” Journal of Public Economics, 31, 53–78. [782]
PAVAN, A., I. SEGAL, AND J. TOIKKA (2008): “Dynamic Mechanism Design: Revenue Equivalence, Profit Maximization and Information Disclosure,” Discussion Paper, Northwestern University and Stanford University. [773]
SAID, M. (2008): “Auctions With Dynamic Populations: Efficiency and Revenue Maximization,” Discussion Paper, Yale University. [778]
WHITTLE, P. (1982): Optimization Over Time, Vol. 1. Chichester: Wiley. [786]
Dept. of Economics, Yale University, 30 Hillhouse Avenue, New Haven, CT 06520, U.S.A.;
[email protected] and Dept. of Economics, Helsinki School of Economics, Arkadiankatu 7, 00100 Helsinki, Finland;
[email protected]. Manuscript received July, 2007; final revision received September, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 791–801
MECHANISM GAMES WITH MULTIPLE PRINCIPALS AND THREE OR MORE AGENTS BY TAKURO YAMASHITA1 We consider a class of mechanism games in which there are multiple principals and three or more agents. For a mechanism game in this class, a sort of folk theorem holds: there is a threshold value for each of the principals such that an allocation is achieved at a pure-strategy sequential equilibrium of the game if and only if (i) it is incentive compatible and (ii) it attains an expected utility for each principal that is greater than or equal to the threshold value for the principal. KEYWORDS: Mechanism design, revelation principle, common agency, folk theorem.
1. INTRODUCTION THERE ARE A NUMBER OF ECONOMIC EXAMPLES in the real world where multiple principals independently design contracts, rules, or, in general, “mechanisms,” and agents independently make their decisions under these mechanisms. For example, several countries (principals) may compete with each other by designing regulation and/or corporate tax schemes (mechanisms) to attract multinational enterprises (agents), and the enterprises may then compete with each other if they operate in the same country or countries.2 In this market, not only the agents, but the principals compete with each other by designing mechanisms, a factor that represents one of the main conceptual differences between a multiprincipal model and a (standard) single-principal model. In this paper, we are interested in what kind of allocations can be achieved at the “equilibria” of such a multiprincipal, multiagent situation. Note that, for this purpose, we require a model of a “mechanism game” in which each of the principals strategically designs a mechanism as a player of the game. Hence, we need to model an equilibrium of the game as a strategy profile such that neither an agent nor a principal can be better off by any deviation from the strategy profile. There is a growing literature on mechanism design with common agency, which studies a mechanism game with multiple principals and a single agent, and many important observations about equilibrium allocations have been obtained.3 However, for mechanism games with multiple principals and multiple 1 I am grateful to Koichi Tadenuma, Hideshi Itoh, Hideo Suehiro, Akira Okada, Matthew O. Jackson, R. Preston McAfee, Jonathan D. Levin, Ilya Segal, Takuro Miyazaki, and seminar participants at Hitotsubashi University, Tokyo University, Kyoto University, and Yokohama National University for their comments and discussion on previous versions of this paper. I am especially grateful to Balazs Szentes, the anonymous referees, and the co-editor for their helpful comments and suggestions. 2 See Calzolari (2004), among others, for analyses of such a model and its variations. 3 See Martimort (2006) for a recent survey of mechanism design with common agency.
agents, we do not know much about the characteristics of equilibrium allocations. For example, Epstein and Peters (1999) considered a model of mechanism games with multiple principals and agents that is similar to our model. They showed a generalized version of the revelation principle in the sense that, under some technical conditions, there is a class of mechanisms, called generalized revelation mechanisms, such that every equilibrium allocation of a mechanism game is attained at an equilibrium in which every principal offers a generalized revelation mechanism. While the paper by Epstein and Peters (1999) provides an initial step toward characterizing the equilibrium allocations, it does not give a direct way to obtain them.4 In this paper, we offer a way to obtain the set of all pure-strategy, sequential-equilibrium allocations for a mechanism game with multiple principals and three or more agents under certain assumptions. More specifically, we consider the following mechanism games. There are two or more principals and three or more agents. Each agent is assumed to have some private information in advance. At the first stage of the game, the principals simultaneously and independently design mechanisms, and those mechanisms are observed by every agent. Then, at the second stage, each of the agents simultaneously and independently sends a message to each of the principals. Finally, an allocation is realized according to the mechanisms designed by the principals and the messages sent by the agents. For such a mechanism game, under certain assumptions, we find a sort of folk theorem for the set of all pure-strategy, sequential-equilibrium allocations. More precisely, there is a threshold value for each principal such that an allocation is realized at a pure-strategy equilibrium of the game if and only if (i) it is incentive compatible, which basically means that, given the allocation, no agent of any type can be made better off by pretending to be another type, and (ii) it attains an expected utility for each principal that is greater than or equal to the threshold value for the principal. In other words, the set of all pure-strategy equilibrium allocations is characterized by the threshold values of the principals and incentive-compatibility conditions. The rest of the paper is organized as follows. In Section 2, a formal model of mechanism games is introduced. In Section 3, we characterize the set of all equilibrium allocations of a mechanism game and, in addition, we discuss some relationships between our results and the existing literature. Section 4 concludes the paper.
4 Other papers that offer generalizations of revelation mechanisms or their variations (especially menu mechanisms) include Peters (2001), Martimort and Stole (2002), Page and Monteiro (2003), Han (2006), and Pavan and Calzolari (2010). Except for Han (2006), these papers study a model with a single agent (common agency). Han (2006) studied mechanism games with multiple agents but considered bilateral-contracting games, which are based on an essentially different model from ours in the sense that, in the model of Han (2006), each agent can observe only what the principals have offered to that particular agent and cannot observe what the principals have offered to the other agents. In our model, in contrast, every agent observes all of the mechanisms designed at the first stage.
2. A MECHANISM GAME

Let J be the set of all principals and let I be the set of all agents. We assume that |I| ≥ 3 so that there are three or more agents. Each agent i ∈ I has private information, which we call the type of agent i and denote by θ_i ∈ Θ_i. We assume that the set of all possible types of agent i, Θ_i, is finite for each i. Define Θ = ∏_{i∈I} Θ_i. Let p(θ) denote the probability that θ = (θ_i)_{i∈I} ∈ Θ is realized. We assume that p is common knowledge among all the principals and agents. Each principal j ∈ J chooses a choice variable y_j ∈ Y_j. We assume that each Y_j is finite and define Y = ∏_{j∈J} Y_j.^5 We refer to z : Θ → Y as an allocation. Let v_j : Y × Θ → R represent principal j's utility function and let u_i : Y × Θ → R represent agent i's utility function. We denote the expected utility of principal j for allocation z by E_θ[v_j(z(θ), θ)] = Σ_θ v_j(z(θ), θ) p(θ) and denote the conditional expected utility of agent i of type θ_i for allocation z by

  E_{θ_{-i}}[u_i(z(θ_i, θ_{-i}), θ_i, θ_{-i}) | θ_i] = Σ_{θ_{-i}} u_i(z(θ_i, θ_{-i}), θ_i, θ_{-i}) p(θ_i, θ_{-i}) / Σ_{θ_{-i}} p(θ_i, θ_{-i}).
In this paper, we consider the following mechanism game. At Stage 1, each principal simultaneously offers a mechanism to the agents. It is assumed that there is an exogenously given (nonempty) set of messages, denoted by M, that any agent can send to any principal. Any mapping c_j : M^{|I|} → Y_j is called a mechanism of principal j. Let C_j be an arbitrary set of mechanisms of principal j. We refer to C_j as the (pure) strategy set of principal j. We define C = ∏_j C_j. At Stage 1.5, each agent i observes θ_i ∈ Θ_i and c = (c_j)_{j∈J} ∈ C, that is, the type of the agent and the mechanisms offered at Stage 1. At Stage 2, each agent i simultaneously reports a message m_{ji} ∈ M to each principal j. Finally, y_j = c_j((m_{ji})_{i∈I}) is realized for each j. In this formulation, note that a message m_{ji} potentially conveys some information to principal j about θ_i and c_{-j} = (c_k)_{k≠j}, that is, the agent's type and the mechanisms designed by the principals other than j. Therefore, through the design of c_j at Stage 1, each principal j attempts to give each agent an appropriate incentive to convey the information about θ_i and c_{-j}. In each subgame where the principals have offered c ∈ C at the first stage, let s_{ji}(θ_i, c) ∈ M denote agent i's message to principal j when agent i's type is θ_i. Then we say that s_i(c) = (s_{ji}(θ_i, c))_{j∈J, θ_i∈Θ_i} is agent i's continuation strategy given c and that s_i = (s_i(c))_{c∈C} is agent i's strategy. Let z(·|c, s(c)) : Θ → Y
^5 Although the finiteness of Y is restrictive in the sense that, for example, some environments with money and/or lotteries are excluded from our model, the author believes that similar results are obtained even if Y is infinite.
denote the allocation induced by c ∈ C and s(c), where s(c) = (s_i(c))_{i∈I} is a continuation strategy profile of the agents given c. That is, for each θ, z(θ|c, s(c)) = (c_j((s_{ji}(θ_i, c))_{i∈I}))_{j∈J}, where each c_j((s_{ji}(θ_i, c))_{i∈I}) ∈ Y_j denotes principal j's choice variable when the agents' types are θ = (θ_i)_{i∈I}. In this paper, we consider pure-strategy sequential equilibria of the game.^6 First, in each subgame where the principals have offered c = (c_j)_{j∈J} at the first stage, a continuation strategy profile of the agents, s*(c) = (s*_i(c))_{i∈I}, is said to be a continuation equilibrium given c if s*(c) is a Bayesian equilibrium among the agents given c, that is, for any i, θ_i, and (m_{ji})_{j∈J} ∈ M^{|J|},

  E_{θ_{-i}}[u_i((c_j(s*_{ji}(θ_i, c), s*_{j,-i}(θ_{-i}, c)))_{j∈J}, θ_i, θ_{-i}) | θ_i]
    ≥ E_{θ_{-i}}[u_i((c_j(m_{ji}, s*_{j,-i}(θ_{-i}, c)))_{j∈J}, θ_i, θ_{-i}) | θ_i].

Let S*(c) be the set of all continuation equilibria given c. Throughout the paper, we assume that S*(c) is nonempty for any c ∈ C, although this is a restrictive assumption because S*(c) is defined as the set of pure-strategy continuation equilibria.^7 We now define an equilibrium of the game. Let c* ∈ C and let s* be a strategy profile of the agents. We say that (c*, s*) is a (pure-strategy) equilibrium if (i) s*(c) ∈ S*(c) for any c ∈ C and (ii) for each j and c_j ∈ C_j,

  E_θ[v_j(z(θ | c*_j, c*_{-j}, s*(c*_j, c*_{-j})), θ)] ≥ E_θ[v_j(z(θ | c_j, c*_{-j}, s*(c_j, c*_{-j})), θ)].

We denote by Z* the set of all equilibrium allocations of the game.

2.1. Recommendation Mechanisms

In the following analysis, we consider a mechanism game such that each C_j contains a special mechanism, which we call a recommendation mechanism. Let D_j be the set of all mappings from Θ to Y_j. In the following definition, we interpret each d_j : Θ → Y_j as a (standard) revelation mechanism of principal j. We assume that M is sufficiently large so that M ⊇ Θ_i × D_j for any j and i.

DEFINITION 1: We call ĉ_j ∈ C_j a recommendation mechanism of principal j if there exist θ̲_i ∈ Θ_i for each i, d̲_j ∈ D_j, and y̲_j ∈ Y_j, such that both of the following conditions hold.
^6 A standard definition of a sequential equilibrium consists of a strategy profile and a belief system. We omit the description of a belief system in the following definition, because it is obvious as long as we consider only pure strategies of the mechanism game.
^7 If we allow mixed strategies and assume that M is finite, then a (mixed-strategy) continuation equilibrium always exists for any c ∈ C, and hence, we no longer need this existence assumption. The author believes that similar results can be obtained for the mixed-strategy equilibrium allocations, but it is an open question.
(i) If every agent i reports a message in Θ_i × D_j, that is, m_{ji} = (θ_i, d_{ji}) ∈ Θ_i × D_j for all i ∈ I, then

  ĉ_j((θ_i, d_{ji})_{i∈I}) = { d_j((θ_i)_{i∈I})  if |{i | d_{ji} = d_j}| ≥ |I| − 1,
                             { y̲_j             otherwise.
(ii) If there is h ∈ I such that m_{jh} ∉ Θ_h × D_j, then for any (m_{ji})_{i≠h} ∈ M^{|I|−1}, we have ĉ_j(m_{jh}, (m_{ji})_{i≠h}) = ĉ_j((θ̲_h, d̲_j), (m_{ji})_{i≠h}).

Under the recommendation mechanism, each agent i is asked to report the agent's type θ_i ∈ Θ_i and to recommend a revelation mechanism d_{ji} ∈ D_j to principal j. If the message of agent h was not in Θ_h × D_j, then the principal interprets it as if agent h had sent (θ̲_h, d̲_j) ∈ Θ_h × D_j. When each agent i reports (θ_i, d_{ji}) ∈ Θ_i × D_j and if at least |I| − 1 agents recommend the same d_j ∈ D_j, then the principal obeys the recommendation and assigns d_j(θ). If there is no such d_j, an arbitrary choice variable y̲_j is assigned. Hence, for any h ∈ I and (θ_h, d_{jh}) ∈ Θ_h × D_j, as long as every other agent i ≠ h reports (θ_i, d_j) to principal j, we have ĉ_j((θ_h, d_{jh}), (θ_i, d_j)_{i≠h}) = ĉ_j((θ_h, d_j), (θ_i, d_j)_{i≠h}) (= d_j(θ_h, θ_{-h})).

3. EQUILIBRIUM ALLOCATIONS

In this section, we discuss the characteristics of Z*, the set of all equilibrium allocations. The first property that an equilibrium allocation z ∈ Z* must satisfy is the incentive compatibility condition. For an allocation z : Θ → Y, let z_j : Θ → Y_j represent the jth projection of z, so that for all θ, z(θ) = (z_j(θ))_{j∈J}. We say that z is incentive compatible if, for each i, θ_i, and (θ_{ji})_{j∈J} ∈ Θ_i^{|J|},

  E_{θ_{-i}}[u_i((z_j(θ_i, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i] ≥ E_{θ_{-i}}[u_i((z_j(θ_{ji}, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i].

That is, z is called an incentive-compatible allocation if no agent i of any type θ_i can be made better off by pretending to be another type.^8 Let Z^IC be the set of all incentive-compatible allocations. The following lemma states that any equilibrium allocation must be incentive compatible.

LEMMA 1: Z* ⊆ Z^IC.

PROOF: Let (c*, s*) be an equilibrium and let z ∈ Z* be the equilibrium allocation (i.e., z_j(θ) = c*_j((s*_{ji}(θ_i, c*))_{i∈I}) for each j and θ). Because s*(c*) is a
^8 The incentive compatibility condition is more complicated than in a single-principal model, because agent i can report distinct types to distinct principals in a multiprincipal model.
continuation equilibrium given c*, for each i, θ_i, and (m_{ji})_{j∈J} ∈ M^{|J|},

  E_{θ_{-i}}[u_i((z_j(θ_i, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i] ≥ E_{θ_{-i}}[u_i((c_j(m_{ji}, s*_{j,-i}(θ_{-i}, c*)))_{j∈J}, θ_i, θ_{-i}) | θ_i],

and, in particular, if we let m_{ji} = s*_{ji}(θ_{ji}, c*) for each j, then we obtain c_j(s*_{ji}(θ_{ji}, c*), s*_{j,-i}(θ_{-i}, c*)) = z_j(θ_{ji}, θ_{-i}), and hence, the right-hand side of this inequality becomes E_{θ_{-i}}[u_i((z_j(θ_{ji}, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i], which implies that z is incentive compatible. Q.E.D.

Now we state the main result of the paper.

THEOREM 1: There is a threshold value v̲_j ∈ R for each j such that Z* = {z ∈ Z^IC | ∀j, E_θ[v_j(z(θ), θ)] ≥ v̲_j}.

As the first step of the proof, we define v̲_j for each j as
  v̲_j = min_{c_{-j}∈C_{-j}} max_{c_j∈C_j} min_{s(c_j,c_{-j})∈S*(c_j,c_{-j})} E_θ[v_j(z(θ | c_j, c_{-j}, s(c_j, c_{-j})), θ)].
We show that v̲_j is well defined. In general, for a set X and a function f : X → R, f̲ = min_{x∈X} f(x) and f̄ = max_{x∈X} f(x) exist if (i) X ≠ ∅ and (ii) {f(x) | x ∈ X} is finite. Because S*(c) is assumed to be nonempty for any c ∈ C, the first condition is met. Because Θ and Y are finite, the second condition is also met. Thus, v̲_j exists. In the following, as we have already shown Z* ⊆ Z^IC, it suffices to show that an incentive-compatible allocation z ∈ Z^IC is attained at some equilibrium if and only if E_θ[v_j(z(θ), θ)] ≥ v̲_j for every j.

PROOF OF THEOREM 1—"Only if" Direction: Suppose that there is an equilibrium (c, s) such that, for some j, we have E_θ[v_j(z(θ | c, s(c)), θ)] < v̲_j. By the definition of v̲_j, there is c'_j ∈ C_j such that for any s(c'_j, c_{-j}) ∈ S*(c'_j, c_{-j}), we have E_θ[v_j(z(θ | c'_j, c_{-j}, s(c'_j, c_{-j})), θ)] ≥ v̲_j. This means that principal j can be better off by deviating from c_j to c'_j regardless of the continuation equilibrium played by the agents, which contradicts that c is an equilibrium array of mechanisms. Q.E.D.

Before the proof of the "if" direction, we first show the following lemma. Recall that ĉ_k represents a recommendation mechanism of principal k ∈ J.

LEMMA 2: For each j and c_j, there is ŝ(c_j, ĉ_{-j}) ∈ S*(c_j, ĉ_{-j}) such that E_θ[v_j(z(θ | c_j, ĉ_{-j}, ŝ(c_j, ĉ_{-j})), θ)] ≤ v̲_j.
REMARK: The lemma states that, for any mechanism c_j of principal j, if the other principals offer recommendation mechanisms ĉ_{-j}, then we can find a continuation equilibrium ŝ(c_j, ĉ_{-j}) that makes the expected utility of principal j no more than v̲_j. In the proof, we construct ŝ_{ki}(θ_i, c_j, ĉ_{-j}) for k ≠ j so that each agent i truthfully reports θ_i and recommends the same revelation mechanism, denoted by d̄_k in the proof, to principal k. Then, because every agent recommends the same revelation mechanism, principal k obeys the recommendation (and assigns d̄_k(θ)). By the definition of d̄_k, the expected utility of principal j cannot be greater than v̲_j.

PROOF OF LEMMA 2: Fix an arbitrary principal j. By the definition of v̲_j, there is c̄_{-j} ∈ C_{-j} such that, for each c_j, there is s̄(c_j, c̄_{-j}) ∈ S*(c_j, c̄_{-j}) that satisfies

  E_θ[v_j(z(θ | c_j, c̄_{-j}, s̄(c_j, c̄_{-j})), θ)] ≤ v̲_j.

It suffices to show that for any c_j, there is ŝ(c_j, ĉ_{-j}) ∈ S*(c_j, ĉ_{-j}) such that z(θ | c_j, ĉ_{-j}, ŝ(c_j, ĉ_{-j})) = z(θ | c_j, c̄_{-j}, s̄(c_j, c̄_{-j})) for any θ. Fix an arbitrary mechanism c_j ∈ C_j. For each k ≠ j, define d̄_k ∈ D_k so that for each θ = (θ_i)_{i∈I}, we have d̄_k(θ) = c̄_k((s̄_{ki}(θ_i, c_j, c̄_{-j}))_{i∈I}). We construct ŝ(c_j, ĉ_{-j}) as follows: For each i and θ_i, let ŝ_{ji}(θ_i, c_j, ĉ_{-j}) = s̄_{ji}(θ_i, c_j, c̄_{-j}) and let ŝ_{ki}(θ_i, c_j, ĉ_{-j}) = (θ_i, d̄_k) for k ≠ j. Then, for each θ,

  c_j((ŝ_{ji}(θ_i, c_j, ĉ_{-j}))_{i∈I}) = c_j((s̄_{ji}(θ_i, c_j, c̄_{-j}))_{i∈I}),
  ĉ_k((ŝ_{ki}(θ_i, c_j, ĉ_{-j}))_{i∈I}) = d̄_k(θ) = c̄_k((s̄_{ki}(θ_i, c_j, c̄_{-j}))_{i∈I})  for all k ≠ j,

which implies that z(θ | c_j, ĉ_{-j}, ŝ(c_j, ĉ_{-j})) = z(θ | c_j, c̄_{-j}, s̄(c_j, c̄_{-j})) for each θ. Second, we show that ŝ(c_j, ĉ_{-j}) ∈ S*(c_j, ĉ_{-j}). Suppose that agent h of type θ_h deviated from ŝ_h(c_j, ĉ_{-j}) by reporting m_{jh} ∈ M to principal j (instead of ŝ_{jh}(θ_h, c_j, ĉ_{-j})) and (θ_{kh}, d_{kh}) ∈ Θ_h × D_k to each principal k ≠ j (instead of ŝ_{kh}(θ_h, c_j, ĉ_{-j}) = (θ_h, d̄_k)).^9 Then for each θ_{-h}, principal j assigns c_j(m_{jh}, s̄_{j,-h}(θ_{-h}, c_j, c̄_{-j})), while each k ≠ j assigns, because the agents other than h recommend d̄_k,

  d̄_k(θ_{kh}, θ_{-h}) = c̄_k(s̄_{kh}(θ_{kh}, c_j, c̄_{-j}), s̄_{k,-h}(θ_{-h}, c_j, c̄_{-j})).

If this is a profitable deviation, then it means that under (c_j, c̄_{-j}), agent h can be better off by deviating from s̄_h(θ_h, c_j, c̄_{-j}) to (m_{jh}, (s̄_{kh}(θ_{kh}, c_j, c̄_{-j}))_{k≠j}), which contradicts that s̄(c_j, c̄_{-j}) ∈ S*(c_j, c̄_{-j}). Q.E.D.
^9 Notice that reporting m_{kh} ∉ Θ_h × D_k is equivalent to reporting (θ̲_h, d̲_k) ∈ Θ_h × D_k.
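To make the outcome rule of Definition 1 concrete before turning to the equilibrium construction, here is a minimal Python sketch of ĉ_j. It is only an illustration, not part of the paper's formal apparatus: the identifiers and the encoding of revelation mechanisms as named lookup tables from type profiles to choice variables are our own assumptions.

```python
from collections import Counter

def recommendation_outcome(messages, theta_default, d_default, y_default,
                           valid_types, revelation_mechs):
    """Outcome rule of a recommendation mechanism (Definition 1).

    messages         : one (type, mechanism name) pair per agent
    theta_default    : default type per agent (the underlined theta_i)
    d_default        : name of the default revelation mechanism d_j
    y_default        : fallback choice variable y_j
    revelation_mechs : dict mapping a name to {type profile: choice variable}
    """
    n = len(messages)
    cleaned = []
    for i, m in enumerate(messages):
        # (ii) any message outside Theta_i x D_j is read as the default message
        if (isinstance(m, tuple) and len(m) == 2
                and m[0] in valid_types[i] and m[1] in revelation_mechs):
            cleaned.append(m)
        else:
            cleaned.append((theta_default[i], d_default))
    types = tuple(t for t, _ in cleaned)
    # (i) obey a revelation mechanism recommended by at least |I| - 1 agents
    (d_star, votes), = Counter(d for _, d in cleaned).most_common(1)
    if votes >= n - 1:
        return revelation_mechs[d_star][types]
    return y_default
```

The near-unanimity threshold |I| − 1 is what makes a unilateral change of recommendation by one agent irrelevant, which the discussion in Section 3.1 exploits.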
PROOF OF THEOREM 1—"If" Direction: Fix an arbitrary z ∈ Z^IC such that E_θ[v_j(z(θ), θ)] ≥ v̲_j for every j. We show that there is a strategy profile of the agents, s*, such that (ĉ, s*) is an equilibrium that achieves z as the equilibrium allocation. We define s*(c) for each c ∈ C as follows. First, if c = ĉ, then each agent i of type θ_i reports s*_{ji}(θ_i, ĉ) = (θ_i, z_j) to each principal j, where z_j : Θ → Y_j is the jth projection of z. Then we have z(θ | ĉ, s*(ĉ)) = z(θ) for all θ. Second, if c = (c_j, ĉ_{-j}) is such that c_j ≠ ĉ_j for some j, then the agents play s*(c_j, ĉ_{-j}) = ŝ(c_j, ĉ_{-j}), where ŝ(c_j, ĉ_{-j}) is, as defined in Lemma 2, the continuation equilibrium given (c_j, ĉ_{-j}) at which principal j's expected utility is less than or equal to v̲_j. Finally, for any other c ∈ C, let s*(c) be an arbitrary continuation equilibrium given c. Thus, as long as the agents follow s*, each principal j cannot be better off by any deviation from ĉ_j. Now it suffices to show that s*(ĉ) ∈ S*(ĉ). Under ĉ, suppose that an agent i of type θ_i reported (θ_{ji}, d_{ji}) ∈ Θ_i × D_j to each principal j, instead of reporting s*_{ji}(θ_i, ĉ) = (θ_i, z_j) to each j.^10 Because the agents other than i recommend z_j to principal j, principal j assigns z_j(θ_{ji}, θ_{-i}) when the other agents' types are θ_{-i}. Hence, by this deviation, the agent's expected utility becomes E_{θ_{-i}}[u_i((z_j(θ_{ji}, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i], which cannot be greater than E_{θ_{-i}}[u_i((z_j(θ_i, θ_{-i}))_{j∈J}, θ_i, θ_{-i}) | θ_i] because z is an incentive-compatible allocation. Thus, agent i cannot be better off by any deviation under ĉ. Q.E.D.

3.1. Discussion

The Structure of a Recommendation Mechanism

In the proof of Theorem 1, to achieve z as an equilibrium allocation, we have actually constructed an equilibrium such that each principal offers a recommendation mechanism. At this equilibrium, if every principal has offered a recommendation mechanism at the first stage, then each agent i of type θ_i reports (θ_i, z_j) to each principal j. We have shown that as long as z is incentive compatible, an agent cannot be better off under any deviation. If principal j has deviated from ĉ_j at the first stage, then the agents play a continuation equilibrium at which principal j's expected utility is no more than v̲_j. Lemma 2 assures that such a continuation equilibrium exists. Therefore, as long as principal j's expected utility given z is greater than or equal to v̲_j, principal j cannot be better off by any deviation from ĉ_j. Notice that, as long as no more than one principal has deviated, every agent recommends the same revelation mechanism to each of the nondeviating principals. Although an agent can potentially recommend another revelation mechanism, such a deviation cannot essentially change the allocation. This structure of the recommendation mechanism is basically the same as that
^10 Notice that reporting m_{ji} ∉ Θ_i × D_j is equivalent to reporting (θ̲_i, d̲_j) ∈ Θ_i × D_j.
of an agreement mechanism in the literature on implementation.^11 Consider a situation with a single principal and N agents, where we have p(θ) > 0 only if θ_1 = ··· = θ_N. An agreement mechanism is a revelation mechanism φ such that φ(θ̃, θ̃, ..., θ̃) = φ(θ_i, θ̃, ..., θ̃) for any i, θ_i, and θ̃. Then, under the agreement mechanism, each agent truthfully reporting θ_i = θ̃ is a Nash equilibrium.

Other Folk-Theorem Results for Mechanism Games

Folk-theorem results similar to ours have been obtained in the literature on delegation games (e.g., Fershtman, Judd, and Kalai (1991), Polo and Tedeschi (2000), and Katz (2006)). In contrast to the mechanism game considered in this paper, in a delegation game there is an "agent j" for each principal j who serves only principal j, and this agent j chooses y_j on behalf of principal j. The task of principal j is to design a monetary transfer scheme for agent j to affect the agent's incentive to choose y_j. To obtain the folk-theorem results of those papers, the assumptions of complete information and the existence of money are crucial, whereas we obtain the folk-theorem result under incomplete information. For example, in Katz (2006),^12 the folk-theorem result was obtained as follows. Katz (2006) constructed an equilibrium at which each principal j carefully designs a monetary transfer scheme so that, under the monetary transfer scheme, agent j is indifferent as to the choice of any y_j,^13 and, therefore, (i) it is one of the best choices for agent j to choose a "cooperative" y_j if no principal has deviated at Stage 1 and (ii) it is one of the best choices for agent j to choose another choice variable y_{jk} that "punishes" principal k (≠ j) if principal k has deviated at Stage 1. Because of the assumptions of complete information and the existence of money, on the other hand, those papers do not need three or more agents to obtain their folk-theorem results, in contrast to our approach.

With One or Two Agents

Notice that the folk-theorem result cannot generally be extended to games with only one or two agents. Clearly, when there is only one agent, the agent does not report a principal's deviation to the other principals if such a deviation is preferable for the agent. See the literature on common agency^14
^11 See Palfrey (2002) and Maskin and Sjöström (2002) for a formal definition of the agreement mechanism and related discussion.
^12 Katz (2006) considered a hidden-action model (i.e., principal j cannot observe y_j chosen by agent j, but observes only a noisy signal of y_j), whereas Fershtman, Judd, and Kalai (1991) and Polo and Tedeschi (2000) considered models without hidden actions. On the other hand, Fershtman, Judd, and Kalai (1991) and Polo and Tedeschi (2000) obtained the folk-theorem results under stronger concepts of incentive compatibility.
^13 More specifically, under this monetary transfer scheme, the agent always earns the expected utility that is equal to the level of the "outside option."
^14 For example, see the recent survey by Martimort (2006).
regarding the general characteristics of the equilibrium allocations in games with only one agent. When there are exactly two agents, the result is ambiguous. For example, if the principals can "punish" both of the agents^15 when their messages are inconsistent, then we obtain the same result as in the case with three or more agents. However, if it is impossible to punish both of the agents at the same time, then the folk-theorem result would not be obtained in general.

4. CONCLUSION

In this paper, we have considered a mechanism game with multiple principals and three or more agents, and have shown that the set of all pure-strategy, sequential-equilibrium allocations is characterized by the incentive compatibility of allocations and the threshold values for the principals. More precisely, an allocation is attained at a pure-strategy equilibrium of the game if and only if it is incentive compatible and it attains an expected utility for each principal greater than or equal to the threshold value for the principal. This result suggests that the set of all equilibrium allocations of a mechanism game could be very large. If one is interested in whether some particular allocation (such as an "optimal" allocation in some sense) is in this set, then this folk-theorem result may be useful. However, if one is interested in comparative statics for equilibrium allocations, for example, then some criteria may be called for so as to select "reasonable" equilibrium allocations. For single-principal settings, the literature on mechanism design and implementation has provided several arguments that eliminate some "unreasonable" equilibrium allocations, for example, collusion among agents and imperfect commitment of the principal.^16 It may be worth studying how those arguments could work for equilibrium selection in multiple-principal settings.
^15 An example is the "economic situation" discussed by Jackson (1991).
^16 For example, collusion among agents was studied by Laffont and Martimort (1997), and imperfect commitment of the principal was studied by Bester and Strausz (2001).

REFERENCES

BESTER, H., AND R. STRAUSZ (2001): "Contracting With Imperfect Commitment and Revelation Principle: The Single Agent Case," Econometrica, 69, 1077–1098. [800]
CALZOLARI, G. (2004): "Incentive Regulation of Multinational Enterprises," International Economic Review, 45, 157–282. [791]
EPSTEIN, L., AND M. PETERS (1999): "A Revelation Principle for Competing Mechanisms," Journal of Economic Theory, 88, 119–160. [792]
FERSHTMAN, C., K. JUDD, AND E. KALAI (1991): "Observable Contracts: Strategic Delegation and Cooperation," International Economic Review, 32, 551–559. [799]
HAN, S. (2006): "Menu Theorems for Bilateral Contracting," Journal of Economic Theory, 131, 157–178. [792]
JACKSON, M. O. (1991): "Bayesian Implementation," Econometrica, 59, 461–477. [800]
KATZ, M. (2006): "Observable Contracts as Commitments: Interdependent Contracts and Moral Hazard," Journal of Economics and Management Strategy, 15, 685–706. [799]
LAFFONT, J. J., AND D. MARTIMORT (1997): "Collusion Under Asymmetric Information," Econometrica, 65, 875–911. [800]
MARTIMORT, D. (2006): "Multi-Contracting Mechanism Design," in Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. I, ed. by R. Blundell, W. K. Newey, and T. Persson. Cambridge, MA: Cambridge University Press, 57–101. [791,799]
MARTIMORT, D., AND L. STOLE (2002): "The Revelation and Delegation Principles in Common Agency Games," Econometrica, 70, 1659–1673. [792]
MASKIN, E., AND T. SJÖSTRÖM (2002): "Implementation Theory," in Handbook of Social Choice and Welfare, Vol. I, ed. by K. Arrow, A. Sen, and K. Suzumura. Amsterdam: Elsevier, 237–288. [799]
PAGE, F. H., AND P. K. MONTEIRO (2003): "Three Principles of Competitive Nonlinear Pricing," Journal of Mathematical Economics, 39, 63–109. [792]
PALFREY, T. R. (2002): "Implementation Theory," in Handbook of Game Theory With Economic Applications, Vol. III, ed. by R. J. Aumann and S. Hart. Amsterdam: Elsevier, 2273–2326, Chap. 61. [799]
PAVAN, A., AND G. CALZOLARI (2010): "Truthful Revelation Mechanisms for Simultaneous Common Agency Games," American Economic Journal: Microeconomics (forthcoming). [792]
PETERS, M. (2001): "Common Agency and the Revelation Principle," Econometrica, 69, 1349–1372. [792]
POLO, M., AND P. TEDESCHI (2000): "Delegation Games and Side-Contracting," Research in Economics, 54, 101–116. [799]
Dept. of Economics, Stanford University, Stanford, CA 94305, U.S.A.;
[email protected]. Manuscript received February, 2007; final revision received June, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 803–821
SOLVING, ESTIMATING, AND SELECTING NONLINEAR DYNAMIC MODELS WITHOUT THE CURSE OF DIMENSIONALITY BY VIKTOR WINSCHEL AND MARKUS KRÄTZIG1 We present a comprehensive framework for Bayesian estimation of structural nonlinear dynamic economic models on sparse grids to overcome the curse of dimensionality for approximations. We apply sparse grids to a global polynomial approximation of the model solution, to the quadrature of integrals arising as rational expectations, and to three new nonlinear state space filters which speed up the sequential importance resampling particle filter. The posterior of the structural parameters is estimated by a new Metropolis–Hastings algorithm with mixing parallel sequences. The parallel extension improves the global maximization property of the algorithm, simplifies the parameterization for an appropriate acceptance ratio, and allows a simple implementation of the estimation on parallel computers. Finally, we provide all algorithms in the open source software JBendge for the solution and estimation of a general class of models. KEYWORDS: Dynamic stochastic general equilibrium (DSGE) models, Bayesian structural time series econometrics, curse of dimensionality.
1. INTRODUCTION

MANY MODERN MACROECONOMIC MODELS with rational expectations are nonlinear, and linearization combined with a Kalman (1960) filter has several disadvantages. Fernández-Villaverde and Rubio-Ramírez (2006) reported evidence for nonlinearities in macroeconomic data; consequently, linear estimates are likely to be biased. Theoretical nonlinearities naturally arise in the aggregate, for example, in a model with nominal rigidities and a lower bound on the nominal interest rate; see Billi and Adam (2007). Unfortunately, a nonlinear likelihood based econometric approach is a complex numerical operation with at least four problems. We propose innovations to all of them. The first problem is to solve the model. Here we use the Smolyak operator for a polynomial approximation of the policy functions as well as for the quadrature of rational expectations. The second problem is to evaluate the likelihood, where we introduce three nonlinear state space filters. The third problem is to generate parameter estimates. We do this by a new parallel Metropolis–Hastings algorithm. The fourth problem is to program all these
1 We thank Wouter Denhaan, Paul Fackler, Jesús Fernández-Villaverde, James Heckman, Florian Heiss, Kenneth Judd, Michel Juillard, Felix Kübler, Alexander Ludwig, Thomas Mertens, Juan Rubio-Ramírez, and the participants of the Institute on Computational Economics at the University of Chicago and Argonne National Laboratory, 2005, for discussion and inspiration. We also thank the anonymous referees for their valuable comments. This research was supported by the Deutsche Forschungsgemeinschaft through the SFB 649 Economic Risk.
complex interacting algorithms. As a solution, we provide the open source software JBendge2 for a general class of models. The currently predominant approximation strategy is a local perturbation approach based on the implicit function theorem discussed by Judd and Guu (1997), Judd and Jin (2002), and Gaspar and Judd (2005).3 A global approximation based on the projection method is presented by Judd (1992). Aruoba, Fernández-Villaverde, and Rubio-Ramírez (2006) compared these approaches. The global approximation of functions or integrands suffers from exponentially growing computational costs, called the curse of dimensionality, in case it is extended from the univariate to the multivariate case by the tensor operator. The curse effectively restricts tensor based global approximation methods to be useful only for a few (in our case, six) dimensions. We replace the tensor operator by the Smolyak (1963) operator, applicable for polynomial, finite element, or wavelet approximations. It decreases the approximation burden for sufficiently smooth functions and integrands from exponentially growing costs to a polynomial order; see Bungartz and Griebel (2004) for an extensive summary. The current approach to the evaluation of the likelihood of state space models is dominated by the sequential importance resampling particle filter and its extensions described in Doucet, de Freitas, and Gordon (2001). We introduce three new nonlinear filters based on deterministic integration to overcome the high computational burden of the particle filter. The first filter is the simple but very fast deterministic Smolyak Kalman filter, which is an improvement of the unscented Kalman filter by Julier and Uhlmann (1997). The unscented integration restricts the computational costs to rise linearly with the dimension of the integrand and, accordingly, suffers from a decreasing approximation quality. Moreover, the unscented method is tailored to integrands weighted by Gaussian densities and cannot be easily adapted to other cases, whereas the Smolyak quadrature is readily applicable to any existing univariate scheme. The Smolyak as well as the unscented Kalman filters approximate some non-Gaussian densities with Gaussians during the filtering recursions. As an improvement, we propose the Smolyak particle filter, which uses the density obtained by the Smolyak Kalman filter as the proposal density for the sequential importance resampling particle filter. This is an approach similar to the unscented particle filter of van der Merwe, Doucet, de Freitas, and Wan (2000). Our third new filter deterministically approximates the non-Gaussian densities by Gaussian sums.
2 JBendge is an acronym for Java based Bayesian estimation of nonlinear dynamic general equilibrium models. The homepage of this project is at http://jbendge.sourceforge.net/. It is licensed under the GPL.
3 Current software packages that implement these methods are, for example, Juillard (1996) or Schmitt-Grohé and Uribe (2004).
For the parameter estimation, we combine the Metropolis–Hastings algorithm with the genetic algorithm of Storn and Price (1997) to obtain a parallel version. This improves the global maximization properties, automatically chooses the innovation variances on the run, and simplifies the implementation on parallel computers. One application of the Smolyak operator is presented by Heiss and Winschel (2008), where a Smolyak based mixed logit likelihood estimator systematically outperforms simulation based methods. The research on likelihood based nonlinear dynamic structural estimation has just started with a number of papers by Fernández-Villaverde and Rubio-Ramírez (2004, 2005, 2006) and a recent estimation by Amisano and Tristani (2007). The rest of the paper is divided into four sections: economics, econometrics, results, and conclusion.4 In the economics Section 2, we present the generic model class, the optimality conditions for the example model, the Smolyak operator, the nonlinear solution strategy based on a Chebyshev approximation, and the approximation error estimator. In the econometrics Section 3, we describe the Smolyak Kalman filter, the particle filter, the Smolyak particle filter, and the Smolyak sum filter, the parallel Metropolis–Hastings algorithm, the convergence diagnostics, and the calculation of the marginal likelihood. In the results Section 4, we summarize the performance of the solution method, the filters, and the estimators. In Section 5, we conclude.

2. ECONOMICS

Our general model class

  0 = f(s_t, x_t, z_t; θ),                               where f : R^{d_s+d_x+d_z} → R^{d_x},
  z_t = E_e[h(s_t, x_t, e_{t+1}, s_{t+1}, x_{t+1}; θ)],  where h : R^{d_s+d_x+d_e+d_s+d_x} → R^{d_z},
  s_{t+1} = g(s_t, x_t, e_{t+1}; θ),                     where g : R^{d_s+d_x+d_e} → R^{d_s},
is formulated in terms of dynamic first-order optimality conditions f, expected functions h, and state transitions g. The variables are states s_t ∈ R^{d_s}, policies x_t ∈ R^{d_x}, expected variables z_t ∈ R^{d_z}, and random shocks e_{t+1} ∈ R^{d_e}, which are usually assumed to follow a normal distribution e_{t+1} ∼ N(0, Σ_e) with a diagonal Σ_e. The vector θ contains the structural parameters.
^4 Additional material is available in the supplemental material to this paper (Winschel and Krätzig (2010)), which has details on the linearization of the general model, the new filters, the Metropolis–Hastings algorithm, an example of a two-dimensional Smolyak polynomial approximation, and detailed simulation results.
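For readers who think in code, the model class amounts to three callables plus dimension data. The following container is a purely illustrative sketch under our own naming conventions; it is not JBendge's interface.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class ModelClass:
    """0 = f(s, x, z; theta),  z = E_e[h(s, x, e', s', x'; theta)],
    s' = g(s, x, e'; theta), with dimensions d_s, d_x, d_z, d_e."""
    f: Callable  # first-order conditions, returns a vector in R^{d_x}
    h: Callable  # integrand of the rational expectation, returns R^{d_z}
    g: Callable  # state transition, returns R^{d_s}
    d_s: int
    d_x: int
    d_z: int
    d_e: int
    Sigma_e: np.ndarray  # diagonal covariance matrix of the shocks
```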
2.1. Model

In our example, the allocation problem is solved by the dynamic optimization

  max_{ {{c_{nt}, l_{nt}, i_{nt}}_{n=1}^N}_{t=0}^∞ }  U = E_0 Σ_{t=0}^∞ Σ_{n=1}^N β^t U_{nt}
for n = 1, ..., N countries and all future periods t ≥ 0. The welfare function U is a discounted sum of country utilities U_{nt} = (c_{nt}^{θ_n} (1 − l_{nt})^{1−θ_n})^{1−τ_n}/(1 − τ_n) with a common discount factor β for all countries, an elasticity of intertemporal substitution τ_n, and a substitution rate θ_n between consumption and leisure for each country. The world budget constraint Σ_{n=1}^N (y_{nt} − c_{nt} − i_{nt}) = 0 restricts the world output to be either consumed or invested. The policy variables are consumption c_{nt}, labor l_{nt}, and investment i_{nt} for each country. The production technology y_{nt} = e^{a_{nt}} k_{nt}^{α_n} l_{nt}^{1−α_n} depends on productivity a_{nt}, capital k_{nt}, labor l_{nt}, and the technical substitution rate α_n. The capital and productivity transitions are

  (1)  k_{nt+1} = i_{nt} + (1 − δ_n) k_{nt} − 0.5 κ_n i_{nt}^2,
  (2)  a_{nt+1} = ρ_n a_{nt} + e_{nt+1},
where δ_n are the depreciation rates and ρ_n are the autocorrelation coefficients of the productivity processes, with normally distributed shocks e_{nt} ∼ N(0, σ_{e_n}) which are independent across countries and time. In the capital transition equation (1), we include capital adjustment costs parameterized by κ_n. They assure that the state of the system is not simply the aggregate capital stock, but its distribution across the countries. The variables and parameters of the example model correspond to the variables of the general model class as follows: s_t = {k_{nt}, a_{nt}}_{n=1}^N, x_t = {c_{nt}, l_{nt}, i_{nt}}_{n=1}^N, e_t = {e_{nt}}_{n=1}^N, and θ = {τ_n, θ_n, α_n, δ_n, ρ_n, κ_n, σ_{e_n}}_{n=1}^N ∪ {β}.

2.2. Optimality

The Bellman equation for V_t ≡ V(k_{1t}, ..., k_{Nt}, a_{1t}, ..., a_{Nt}; θ) is
  V_t = max_{ {c_{nt}, l_{nt}, i_{nt}, k_{nt+1}}_{n=1}^N }  Σ_{n=1}^N U_{nt} + β E_t V_{t+1} + λ_B Σ_{n=1}^N (y_{nt} − c_{nt} − i_{nt})
        + Σ_{n=1}^N λ_n (k_{nt+1} − (1 − δ_n) k_{nt} − i_{nt} + 0.5 κ_n i_{nt}^2).
As a characterization of the solution, we obtain N Euler equations

  (3)  (∂U_{nt}/∂c_{nt}) · 1/(1 − κ_n i_{nt}) − β E_t[(∂U_{nt+1}/∂c_{nt+1}) (∂y_{nt+1}/∂k_{nt+1} + (1 − δ_n)/(1 − κ_n i_{nt+1}))] = 0,
N intratemporal trade-offs between consumption and labor supply

  (4)  ∂U_{nt}/∂l_{nt} + (∂U_{nt}/∂c_{nt}) (∂y_{nt}/∂l_{nt}) = 0,

and N − 1 cross-country optimality conditions for n = 2, ..., N,

  (5)  ∂U_{1t}/∂c_{1t} = ∂U_{nt}/∂c_{nt}.

The 4N equations for the variables c_{nt}, l_{nt}, i_{nt}, and k_{nt+1} are the N Euler conditions (3), the N intratemporal trade-offs between consumption and labor (4), the N − 1 equalities of marginal utilities (5), the budget constraint, and the N capital transitions (1). The mapping into the general model class is the following: the 2N − 1 equations (3) and (5) describe the policy functions for l_{1t}, ..., l_{Nt} and i_{2t}, ..., i_{Nt} in the general first-order conditions f. The general model functions h for the N forward looking variables are given by the arguments of the expected values in the Euler equations (3). The N capital transitions (1) and the N productivity transitions (2) form the state transition functions g of the general model.

2.3. Solution

Our solution approach is to iterate on the implicitly defined policy functions x* : R^{d_s} → R^{d_x}. The policy values x^{(k)} at the grid of the states in each iteration k are obtained as the solutions to f(s, x^{(k)}, z^{(k−1)}) = 0 for given expected variables z^{(k−1)} = E_e[h(x^{(k−1)})] based on the policies x^{(k−1)} from the previous iteration. We generate the start values for the iteration from a linear approximation. The evaluation of z^{(k)} involves a numerical integration, which can be thought of as a polynomial approximation of the integrand with a subsequent trivial integration. This amounts to evaluating

  E_e[h(e)] = ∫ h(e) p(e) de ≈ Σ_j w_j h(e_j),

where the continuous random variable e and its density p(e) are essentially discretized into realizations e_j with mass w_j (see the sketch after the footnote below). An alternative approach would solve for x in f(s, x, E_e[h(x)]) = 0 in one large system of d_g × d_x equations, where d_g denotes the number of grid points of the approximation. Our method has the advantage that we can solve d_g separate systems with d_x equations during each iteration. Moreover, the Jacobians for these small systems are available.^5 A restriction of the function iteration scheme is that all policy variables have to appear in the contemporary policy vector x in f(s, x, z) = 0.
^5 JBendge uses a symbolic engine for an automatic calculation of these Jacobians.
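A minimal sketch of the quadrature step z = E_e[h(e)] ≈ Σ_j w_j h(e_j) for normally distributed shocks, here written with a dense tensor-product Gauss–Hermite rule for transparency; the paper's point is precisely that this dense rule is replaced by a Smolyak rule in higher dimensions. All names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def expectation(h, sigma_e, level=5):
    """Approximate E_e[h(e)] for e ~ N(0, diag(sigma_e**2)) by a
    tensor-product Gauss-Hermite rule: E[h] ~= sum_j w_j h(e_j)."""
    x, w = hermgauss(level)                 # nodes/weights for weight exp(-x^2)
    d = len(sigma_e)
    grids = np.meshgrid(*([x] * d), indexing="ij")
    weights = np.ones_like(grids[0])
    for wg in np.meshgrid(*([w] * d), indexing="ij"):
        weights = weights * wg
    # change of variables e = sqrt(2) * sigma * x; normalization pi^{d/2}
    nodes = np.sqrt(2.0) * np.stack([g.ravel() for g in grids], axis=-1) * sigma_e
    weights = weights.ravel() / np.pi ** (d / 2)
    return sum(wj * h(ej) for wj, ej in zip(weights, nodes))

# Sanity check: E[exp(e)] for e ~ N(0, 0.1^2) equals exp(0.005).
# expectation(lambda e: np.exp(e[0]), np.array([0.1])) ~= 1.00501
```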
2.3.1. Approximation
The approximation of the policy function x*(s; c) = Σ_j c_j b_j(s) is characterized by the coefficient vector c = {c_1, ..., c_J} of Chebyshev polynomials with basis functions b_j(s). These coefficients are obtained from the constraint that the true function values are equal to the approximation at the Gauss–Lobatto grid of the states, whose number drives the solution costs. The tensor operator extends a univariate approximation to the multivariate case by combining each element of the univariate grids and basis functions with each other. The univariate approximation of a function x is given by

  U^i(x) = Σ_{j=1}^{m_i} a_j^i x(s_j^i)
with approximation level i ∈ N and grid points s_j^i. The a_j^i are functions in a function approximation and weights in a numerical integration. The multidimensional (d > 1) tensor product operator ⊗ is defined as

  (U^{i_1} ⊗ ··· ⊗ U^{i_d})(x) = Σ_{j_1=1}^{m_{i_1}} ··· Σ_{j_d=1}^{m_{i_d}} (a_{j_1}^{i_1} ⊗ ··· ⊗ a_{j_d}^{i_d}) x(s_{j_1}^{i_1}, ..., s_{j_d}^{i_d}).
The number of grid points ∏_{j=1}^d m_{i_j} and, thus, the number of function evaluations needed to identify the coefficients of the approximation c grow exponentially with d. The Smolyak operator is given by

  A_{q,d}(x) = Σ_{q−d+1 ≤ |i| ≤ q} (−1)^{q−|i|} \binom{d−1}{q−|i|} (U^{i_1} ⊗ ··· ⊗ U^{i_d}),
where d is the dimensionality of the function to be approximated, q is the level of approximation, and i = {i_1, ..., i_d} is a multiindex with |i| = Σ_j i_j. This formula highlights the fact that the Smolyak operator is a simple linear combination of some lower level tensor products. The Smolyak operator constructs the multivariate grid as a combination of some lower level Cartesian products × of the univariate grids s^{i_j}:
  (6)  H_{q,d} = ⋃_{q−d+1 ≤ |i| ≤ q} (s^{i_1} × ··· × s^{i_d}).
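A minimal sketch of the grid construction (6), assuming nested Chebyshev–Gauss–Lobatto points with the common choice m_1 = 1 and m_i = 2^{i−1} + 1 for i > 1; this specific univariate rule is our assumption for the illustration, while the operator itself accepts any univariate scheme.

```python
from itertools import product
import numpy as np

def cheb_points(m):
    # Chebyshev-Gauss-Lobatto points on [-1, 1]; m = 1 gives the midpoint
    if m == 1:
        return np.array([0.0])
    j = np.arange(m)
    return -np.cos(np.pi * j / (m - 1))

def smolyak_grid(d, q):
    """Sparse grid H_{q,d} of equation (6), for level q >= d: the union of
    Cartesian products s^{i_1} x ... x s^{i_d} with q-d+1 <= |i| <= q."""
    def m(i):  # number of univariate points at level i
        return 1 if i == 1 else 2 ** (i - 1) + 1
    points = set()
    for idx in product(range(1, q - d + 2), repeat=d):
        if q - d + 1 <= sum(idx) <= q:
            grids = [np.round(cheb_points(m(i)), 12) for i in idx]
            for pt in product(*grids):          # rounding merges duplicates
                points.add(tuple(float(v) for v in pt))
    return np.array(sorted(points))

# For d = 2 this reproduces the familiar counts: smolyak_grid(2, 3) has
# 5 points and smolyak_grid(2, 4) has 13 points.
```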
We illustrate the Smolyak operator step by step in the Supplemental Material (Winschel and Krätzig (2010)). For alternative presentations, see Kübler and Krüger (2004), Heiss and Winschel (2008), Bungartz and Griebel (2004), or Barthelmann, Novak, and Ritter (1999). The last paper is the closest to our
application and contains the following result: for a smooth policy function x(s) (i.e., x ∈ F_d^k = {f : [a, b]^d → R | D^α f is continuous if α_i ≤ k ∀i}), the accuracy of a polynomial Smolyak approximation is O(n^{−k} (log n)^{(k+1)(d−1)}), where n is the number of grid points.

The error of the approximation is controlled by the Euler error proposed by Judd (1992), which is a normalization of the residuals of the first-order conditions. Dividing the residual by (1 − l_n(s))^{(1−θ_n)(1−τ_n)}/(κ_n i_n(s) − 1) and taking it to the power of 1/(θ_n(1 − τ_n) − 1) gives the Euler equation in terms of consumption:

  r^c(s) = c_n(s) − [ β E_t h(s, x*(s), e', s', x*(s'); θ) / ( (1 − l_n(s))^{(1−θ_n)(1−τ_n)}/(κ_n i_n(s) − 1) ) ]^{1/(θ_n(1−τ_n)−1)},
  s' = g(s, x*(s), e').

The Euler error is finally given by r^E = |r^c(s)/c_n(s)|. A log10 error of −3 means that the utility loss due to the approximation is less than one per 1000 dollars.

3. ECONOMETRICS

The Bayesian estimation given a model M_i is based on the joint density

  p(ω, y, θ_{M_i} | M_i) = p(ω | y, θ_{M_i}, M_i) p(y | θ_{M_i}, M_i) p(θ_{M_i} | M_i)

of observables y, variables of interest ω, and unobservable parameters and states θ_{M_i} = {θ} ∪ {s}. The factorization describes functional forms and shock distributions for the variables of interest p(ω | y, θ_{M_i}, M_i), the likelihood p(y | θ_{M_i}, M_i), and the prior density of unobservables p(θ_{M_i} | M_i). The likelihood transforms the prior into the posterior of unobservables

  p(θ_{M_i} | y, M_i) = p(y | θ_{M_i}, M_i) p(θ_{M_i} | M_i) / p(y | M_i)
                      = p(y | θ_{M_i}, M_i) p(θ_{M_i} | M_i) / ∫ p(y | θ_{M_i}, M_i) p(θ_{M_i} | M_i) dθ_{M_i}.
The marginal likelihood p(y | M_i) integrates out the parameters and transforms a model prior p(M_i) into its posterior p(M_i | y), whose ratio, the Bayes factor

  p(M_i | y) / p(M_j | y) = [p(M_i) / p(M_j)] · [p(y | M_i) / p(y | M_j)],

selects the model with the best in-sample forecast quality, that is, the one which allows a higher compression of the explained data. This criterion is valid for nonnested models with different functional forms or shock densities.
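As a small numerical illustration of this model-choice rule (our own example, not from the paper), posterior model probabilities follow from the log marginal likelihoods and the model priors:

```python
import numpy as np

def model_posteriors(log_ml, priors):
    """Posterior model probabilities p(M_i | y) from log marginal
    likelihoods log p(y | M_i) and model priors p(M_i)."""
    log_post = np.log(priors) + np.asarray(log_ml)
    log_post -= log_post.max()          # stabilize the exponentiation
    post = np.exp(log_post)
    return post / post.sum()

# Two models with equal priors: a log marginal likelihood gap of 2.3
# (a Bayes factor of about 10) yields posteriors of roughly 0.91 / 0.09.
print(model_posteriors([-100.0, -102.3], [0.5, 0.5]))
```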
3.1. Filtering

The model solution x*(s), the state transition equation s_t = g*(s_{t−1}, x*(s_{t−1}), e_t) = g(s_{t−1}, e_t) ⇔ p(s_t | s_{t−1}), and a measurement equation y_t = m*(s_t, x*(s_t)) + ε_t = m(s_t) + ε_t ⇔ p(y_t | s_t) form a nonlinear state space model. For a given parameter vector and prior of the unobserved state p(s_0) = p(s_0 | y_0), the data of each period y_t transform the previous posterior density of unobserved states p(s_{t−1} | y_{1:t−1}) in a prediction and filtering step into the next posterior p(s_t | y_{1:t}).^6 In the prediction step, the prior is formed according to the Chapman–Kolmogorov equation

  p(s_t | y_{1:t−1}) = ∫ p(s_t, s_{t−1} | y_{1:t−1}) ds_{t−1} = ∫ p(s_t | s_{t−1}) p(s_{t−1} | y_{1:t−1}) ds_{t−1}

and is updated in the filtering step to the new posterior

  p(s_t | y_{1:t}) = p(s_t, y_t | y_{1:t−1}) / p(y_t | y_{1:t−1}) = p(y_t | s_t) p(s_t | y_{1:t−1}) / ∫ p(y_t | s_t) p(s_t | y_{1:t−1}) ds_t

with respect to y_t, which becomes the prior with respect to new data y_{t+1} in the next iteration. The normalizing constant l_t = ∫ p(y_t | s_t) p(s_t | y_{1:t−1}) ds_t = p(y_t | y_{1:t−1}) is the period's contribution to the likelihood of the complete sample
  L(θ; y_{1:T}) = p(y_{1:T} | θ) = ∏_{t=1}^T p(y_t | y_{1:t−1}, θ) = ∏_{t=1}^T l_t.
The likelihood transforms the prior parameter density into the posterior

  p(θ | y_{1:T}) = p(y_{1:T} | θ) p(θ) / p(y_{1:T}).
An overview of various approaches to this problem can be found in Arulampalam, Maskell, Gordon, and Clapp (2002). The following subsections summarize the main ideas for the filters. Detailed formulas can be found in the Supplemental Material.

3.1.1. Smolyak Kalman Filter

The extended Kalman filter linearizes the state space equations and then applies the Kalman filter. A widely used improvement is the deterministic unscented filter by Julier and Uhlmann (1997), where the prediction p(s_t | y_{1:t−1}) and posterior densities p(s_t | y_{1:t}) are approximated by Gaussians whose first
^6 The notation y_{1:t} is a shorthand for {y_1, ..., y_t}.
two moments are updated by the Kalman method. The unscented integration of these two moments addresses the curse of dimensionality by restricting the number of grid points to rise linearly with the dimension and thereby shifts the curse to a rising inaccuracy of the approximation. Our Smolyak Kalman filter uses the theoretically sound Smolyak Gaussian quadrature instead. Its advantage is that the approximation level can be freely chosen and that any shock density with an existing univariate quadrature scheme can be incorporated. Our approach reduces the moment approximation error, but also retains the error from assuming the densities to be Gaussian.

3.1.2. Particle Filter

The particle filter updates the posterior density of the unobserved states by sequential importance sampling. It is easy to implement, very general, and arbitrarily accurate, but it needs very large samples since it does not use the smoothness of the integrand and the latest observation in its proposal density p(s_t | s_{t−1}), which should also condition on y_t.

3.1.3. Smolyak Particle Filter

Our second filter is the Smolyak particle filter, which generates the particle filter's proposal density by the densities obtained from the Smolyak Kalman filter. By that it incorporates the latest observation and combines the advantages of both filters—the accuracy of the slow particle filter and the speed of the potentially inaccurate Smolyak Kalman filter. This idea is similar to that used by Amisano and Tristani (2007), who generated the proposal density by some kind of extended Kalman filter, or to that used by van der Merwe et al. (2000), who used the unscented filter. In contrast to the particle filter, these filters do not break down in state space models with very small measurement errors or without any errors at all.

3.1.4. Smolyak Sum Filter

The basic idea for the Smolyak sum filter is that any density of practical concern can be approximated as a weighted sum of normal densities, p(x) ≈ Σ_i ω_i N(x; μ_i, Σ_i). This filter approximates the prediction and filtering densities and effectively runs several Smolyak Kalman filters in parallel. The weights are updated according to the likelihood contribution of each summand. The idea can be traced back to Alspach and Sorenson (1972). More recently, Kotecha and Djurić (2003) revived this approach, but they used costly importance sampling to approximate the involved integrals. Instead, we propose to use Smolyak Gaussian quadrature again. Anderson and Moore (1979) presented this approach for a model with additive noise in the measurement and state equations. In our setting of nonadditive state shocks, we therefore implicitly assume that the weights are preserved during the prediction step. This simplifying assumption deserves further elaboration.
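A minimal sketch of the sequential importance resampling particle filter of Section 3.1.2, which also shows how the likelihood contributions l_t of Section 3.1 are accumulated. The state space model enters through user-supplied sampling and density routines, and all names are illustrative; the bootstrap proposal p(s_t | s_{t−1}) is exactly the limitation discussed above.

```python
import numpy as np

def particle_filter_loglik(y, draw_s0, draw_transition, meas_logpdf,
                           n=40_000, seed=0):
    """Log likelihood sum_t log l_t, where l_t = p(y_t | y_{1:t-1}) is
    estimated by averaging p(y_t | s_t^(j)) over n particles propagated
    through the transition density (the bootstrap proposal).

    draw_s0(rng, n)            : n draws from p(s_0), array (n, d_s)
    draw_transition(rng, s)    : one draw of p(s_t | s_{t-1}) per particle
    meas_logpdf(y_t, s)        : log p(y_t | s_t) per particle, array (n,)
    """
    rng = rng_state = np.random.default_rng(seed)
    particles = draw_s0(rng, n)
    loglik = 0.0
    for y_t in y:
        particles = draw_transition(rng, particles)   # prediction step
        logw = meas_logpdf(y_t, particles)            # filtering weights
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                # log l_t
        idx = rng.choice(n, size=n, p=w / w.sum())    # resampling step
        particles = particles[idx]
    return loglik
```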
3.2. Posterior Density

The Metropolis–Hastings algorithm generates draws from a density if this density can be evaluated at any point of its domain. Once we can evaluate the likelihood in p(y | θ)p(θ), we obtain a histogram of draws of θ that approximates its posterior p(θ | y). For details, see the introduction in Chib and Greenberg (1995). The parameter space is traversed by a random walk. A generated candidate parameter θ̂_n is accepted with the probability given by the ratio between the candidate's posterior and the posterior of the previously accepted parameter θ̂_{n−1}. This qualifies the algorithm as a global maximizer. The survival according to the parameter's fitness, measured by the posterior, qualifies it as a genetic algorithm. The innovation variance influences the density region covered by the random walk. After convergence, sampling around the mode with a large variance generates candidates far from the current value and results in a low acceptance ratio. Smaller variances increase the acceptance ratio, but decrease the region being covered so that low probability regions may be undersampled. If the innovation variance is tuned to balance this trade-off at an acceptance ratio of around 30%, the algorithm converges to a representative sample drawn from the target density. The critical choices of the algorithm are the starting value θ̂_0, the convergence detection, and the innovation variance of the random walk.

3.2.1. Convergence Test

The starting value θ̂_0 drives the number of burn-in draws to be discarded before a representative region of the target density is reached. Formal convergence tests along with eyeballing make up an important part of the estimation process. The convergence test we use can be based on one subdivided sequence or on several parallel sequences as in our parallel algorithm. The idea is to diagnose convergence if the sequences appear to be drawn from the same distribution according to a distance measure between the within- and across-sequence variances. Examining only one sequence will result in overly optimistic diagnostic tests. Gelman and Rubin (1992) pointed out that lack of convergence, in many problems, can easily be detected from many sequences but not from one.

3.2.2. Parallel Mixing Metropolis–Hastings

The variance for the innovation shock is the major problem of the algorithm, since the optimal variance can be derived only from the variance of the target density. The usual approach is, therefore, to estimate the variance repeatedly from training sequences, which are also useful to assure robustness with respect to the start value.
We propose to run multiple sequences simultaneously and not sequentially. By doing so, we can assure robustness with respect to the start values, unbias the convergence diagnostic test, and, most importantly, estimate the innovation variance on the fly. The idea is to run several sequences m = 1, ..., M in parallel and generate the candidate vector in each sequence according to

  θ̂*_m = θ̂_{m,n} + γ^GE (θ̂_{m_1,n} − θ̂_{m_2,n}) + ε,   where ε ∼ N(0, bI).
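A minimal sketch of one sweep of this candidate-generation step (illustrative code, not JBendge's implementation); each sequence m draws two distinct donor sequences m_1, m_2 and applies the usual Metropolis–Hastings acceptance step, which is valid here because the proposal is symmetric given the donor sequences.

```python
import numpy as np

def de_mh_sweep(chains, log_post, log_post_vals, gamma_ge=0.3, b=1e-6, rng=None):
    """One sweep of the parallel mixing Metropolis-Hastings update:
    theta* = theta_m + gamma_ge * (theta_m1 - theta_m2) + eps, eps ~ N(0, bI)."""
    rng = rng or np.random.default_rng()
    M, d = chains.shape
    for m in range(M):
        m1, m2 = rng.choice([k for k in range(M) if k != m], size=2, replace=False)
        cand = (chains[m] + gamma_ge * (chains[m1] - chains[m2])
                + rng.normal(0.0, np.sqrt(b), size=d))
        lp = log_post(cand)
        if np.log(rng.uniform()) < lp - log_post_vals[m]:   # acceptance step
            chains[m], log_post_vals[m] = cand, lp
    return chains, log_post_vals
```

ter Braak (2006) discusses why this differential evolution update leaves the target density invariant when one sequence is updated at a time.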
The innovation from the random shock ε is common to the standard algorithm, but we use only one variance common to all shocks. The additional source of innovation is the difference between the parameter vectors from two randomly chosen sequences m_1 and m_2. The scalar γ^GE and the shock variance b determine the relative weight of the mixing and random walk innovations, and are the only two values that need to be tuned for an appropriate acceptance ratio, instead of one variance for each estimated parameter. The intuition of our approach is that the variance of the difference between two randomly drawn parameter vectors is the optimal variance given that the sequence has converged. Our idea originated from the diagnosis test where the within- and between-sequence variances are examined and convergence is detected when they are of a similar size. Since we know that the optimal variance is the scaled variance of the target density, the between-sequence variance is a good estimator of the optimal variance. ter Braak (2006) derived the same candidate by analogy to the global evolutionary optimization algorithm, called differential evolution, by Storn and Price (1997). This evolutionary feature substantially improves the search for the mode of the posterior in the beginning of the sampling process.

3.3. Marginal Likelihood

The marginal likelihood p(y | M_i) can be approximated by a simple calculation based on the Metropolis–Hastings draws and their associated posterior values. Gelfand and Dey (1994) showed that for any density h(θ_{M_i} | M_i), the expected value

  E_{p(θ_{M_i} | y, M_i)}[ h(θ_{M_i} | M_i) / ( p(y | θ_{M_i}, M_i) p(θ_{M_i} | M_i) ) ]

approximates the inverse of the marginal likelihood, p(y | M_i)^{−1}.

4. RESULTS

All results are calculated by JBendge, which is programmed in Java. The hardware we use is a shared memory cluster with 16 CPUs. The parallel Metropolis–Hastings algorithm allows for a simple parallelization.
The model is parameterized equally for all countries by α_n = 0.4, β = 0.99, δ_n = 0.02, ρ_n = 0.95, θ_n = 0.357, τ_n = 2.0, and σ_{e_n} = 0.007. The solutions for the multicountry models are calculated for κ_n = 0.01. The estimations are done only for the one country model with the parameterization τ = 50 and σ_a = 0.035. To simplify the estimation, we set capital adjustment costs to zero (κ = 0) and obtain analytical expressions for the steady state values, ā = 0.0, k̄ = 23.27, c̄ = 1.29, ī = 0.47, l̄ = 0.31, and ȳ = 1.75, which are independent of τ and σ_a. The missing investment costs allow us to compare our estimates with those obtained by Fernández-Villaverde and Rubio-Ramírez (2005).

4.1. Solution

The multicountry model allows us to vary the number of countries and thus the dimensionality of the approximated policy functions. Table I documents the results. The tensor approximations are constructed from a minimal 3 point univariate approximation so as to obtain a nonlinear solution. This results in 3^d grid points for the simplest tensor approximation. For the Smolyak approximation, we use levels 2, 3, and 4 for the models with four and six states, and afterward use only levels 2 and 3. The quality of the approximations is measured by the number of grid points, the running time for a solution in seconds, and the maximal absolute Euler error at 10,000 random points in the approximation space. The Smolyak operator is already superior to the tensor operator for a small model with four states, where the Euler error on a 41 point Smolyak grid is smaller than the error on a tensor grid with 81 points, although the solution time is the same. For the next level with a similar error, the Smolyak operator is more than three times faster and uses about five times fewer points (137 vs. 625). The efficiency gain for the model with six states is even more dramatic. Here the Smolyak operator needs only 85 compared to 729 points of the tensor operator for similar approximation accuracy. The running times are accordingly about four times lower for the Smolyak operator. The tensor operator breaks down for models beyond six states, while the Smolyak operator is still doing fine. The biggest model we were able to solve has 22 states, and it took around 1 hour for an approximation error of 1.7E–5.

4.2. Estimation

We simulated data sets of 100 observations, starting from the deterministic steady states, generated by very accurate nonlinear solutions of the one country model with the states, productivity a and capital k, and one labor decision l. The consumption policy can be solved analytically from the intratemporal trade-off in terms of the labor policy. The observables in the measurement model are investment i, labor l, and output y. The measurement shocks are assumed to be
TABLE I
SMOLYAK (S) AND TENSOR (T) BASED SOLUTIONS

States   Op.   Points   Error    Sec.
   4     S         9    6.6E–4      0.3
   4     S        41    8.1E–6      2.5
   4     S       137    9.3E–7     24.0
   4     T        81    4.9E–5      2.5
   4     T       625    1.8E–7     88.5
   6     S        13    6.2E–4      0.7
   6     S        85    5.1E–5     12.5
   6     S       389    9.3E–7    201.5
   6     T       729    6.5E–5     54.1
   8     S        17    5.9E–4      1.3
   8     S       145    3.5E–5     29.9
  10     S        21    7.5E–4      2.3
  10     S       221    4.0E–5     69.2
  12     S        25    4.4E–4      3.8
  12     S       313    4.8E–5    157.8
  14     S        29    4.3E–4      5.7
  14     S       421    3.7E–5    339.1
  16     S        33    4.5E–4      8.5
  16     S       545    4.0E–5    724.1
  18     S        37    3.7E–4     12.2
  18     S       685    2.6E–5   1819.4
  20     S        41    3.3E–4     17.1
  20     S       841    1.9E–5   2107.4
  22     S        45    3.3E–4     23.3
  22     S      1013    1.7E–5   4087.4
additive. We present the estimates of two variants of the models: one with small measurement errors, {σ_i, σ_l, σ_y} = {8.66E–4, 1.1E–3, 1.58E–4}, and one with large measurement errors, {σ_i, σ_l, σ_y} = {4.65E–3, 3.12E–3, 1.75E–2}. The large standard deviations are set to 1% of the steady state values and the small ones are set to the values in Fernández-Villaverde and Rubio-Ramírez (2005).

4.2.1. Likelihood

The particle filter is run with 40,000 particles, the Smolyak Kalman filter with integration level 3, the Smolyak particle filter with integration level 2 and 500 particles, and the Smolyak sum filter with integration level 3 and 20 Gaussian summands. All solutions are calculated with Smolyak level 3 for the policy approximation and the rational expectation integrals. Figures 1 and 2 show slices through the multidimensional likelihood. The left plots show likelihood values from data with small measurement errors and
816
V. WINSCHEL AND M. KRÄTZIG
FIGURE 1.—Likelihood at true parameters for τ = 50 and σa = 0035.
the right plots show values from data with large measurement errors. We set all parameters to their true values and vary τ at the abscissas, and plot them against the likelihood values at the ordinates. The results are rather encouraging for all our filters as the values are very similar. As expected, the particle filter gets into trouble for small measurement errors in the model with τ = 50 and σa = 0035 in the left plot of Figure 1. The running times for one likelihood evaluation are very different: 120 seconds for the particle filter, 0.2 seconds for the Smolyak Kalman filter, 6 seconds for the Smolyak particle filter, 0.5 seconds for the Smolyak sum filter, 0.015 seconds for the linear Kalman filter, and 0.2 seconds for the extended Kalman filter. The particle filter is hardly useful in combination with a Chebyshev approximation where the policy function interpolation at the particles is the main bottleneck. A possible remedy is to use a finite element solution with its fast interpolation and to accept a larger grid for the same approximation quality.
FIGURE 2.—Likelihood at true parameters for τ = 20 and σa = 0007.
NONLINEAR DYNAMIC MODELS
817
4.2.2. Parameters We have implemented an interactive sampling environment in JBendge where the Metropolis–Hastings parameters γ GE and b can be changed, and individual sequences can be restarted while sampling. This greatly simplifies the maximization and acceptance ratio tuning. The usual procedure within our framework has three stages. The first one searches for the posterior mode. At this stage the mixing parameter γ GE can be rather large between 0.8 and 2.0 according to the prescription for the differential evolution algorithm of Storn and Price (1997). During this stage the acceptance ratio is usually very low. During the second stage, we sample until the diagnostic tests signal convergence, and tune the parameters γ GE and b to obtain an acceptance ratio around 0.3. For all estimations, parameter b is set to 1E–6 and γ GE is set between 0.1 and 0.4. Both parameters are remarkably stable across the linear and nonlinear estimations. This helps us to find the appropriate values for the nonlinear estimation by fast linear estimation runs. Once the convergence of the sampler is detected, we get into the third stage, where we sampled 50,000 draws. The start values of the sequences are random draws from the uniform parameter priors: [0 1] for α ρ θ; [075 1] for β; [0 005] for δ; [0 100] for τ; [0 01] for all shock variances. We run the estimation for 13 parameters with 32 parallel sequences. The synchronization necessary for the 16 CPUs results in an average utilization of 50% for the nonlinear estimation, while the linear estimation, scales almost linearly. It takes about 1500 draws for each of the 32 sequences to find the mode of the posterior density and another 1000–4000 draws to converge. The complete estimation process takes around 5 minutes for the linear estimation, about 2 hours for the nonlinear estimation with the Smolyak Kalman filter, 4 hours with the Smolyak sum filter, and 20 hours with the Smolyak particle filter. The particle filter is much too slow to be of practical use. In Figure 3 we report representative Smolyak Kalman filter estimates of the model with τ = 50 and σa = 0035. The vertical bars indicate the true parameter values. The data are generated with a very accurate solution with integration and solution approximation level 5, an Euler error of 1E–10, and small measurement errors. The summary of our findings is the following. All nonlinear filters clearly outperform the linearization plus Kalman filter approach. All nonlinear filters have problems accurately estimating the parameter τ, which is biased and has a large standard deviation of the posterior. The standard deviation of the posteriors for the measurement errors of output and investment are large and are of the same magnitude as the estimates themselves. The results are similar with respect to the various filters. Large measurement errors do not deteriorate the mean of the estimates, but consistently increase the standard deviations of the posteriors.
818
V. WINSCHEL AND M. KRÄTZIG
FIGURE 3.—Parameter posterior with the Smolyak Kalman filter.
We investigated the estimation problems for the parameter τ in some detail. Our first suspicion was that this result is due to poor performance of our filters. Since we did not run a full estimation with an accurate particle filter, we had no benchmark. What we did was to compare the likelihood values calculated by the particle filter at several parameter vectors visited by other filters during their sampling process. The finding was that τ is not properly identified
NONLINEAR DYNAMIC MODELS
819
in this model as the particle filter also clearly indicates a flat likelihood along the τ dimension. The detailed simulations can be found in the Supplemental Material. We conclude that the poor estimates of τ are not a problem of our nonlinear filters, but are a feature of the model, and thanks to good global search properties of our parallel Metropolis–Hastings algorithm, we were able to find these parameters. The last calculation is the marginal likelihood which gives similar results for large and small measurement errors. The Smolyak Kalman filter clearly outperforms the estimation based on the linear Kalman filter. The estimations with the Smolyak sum and Smolyak particle filter are better than that with the Smolyak Kalman filter. 5. CONCLUSION The Smolyak operator is highly effective for a global solution of models with about 20 states. The operator is also useful for numerical integration essential in many econometric applications. The Smolyak Kalman filter is very fast and useful if the posterior and prediction densities are reasonably approximated by a Gaussian density. If not, they can be approximated by Gaussian sums in the Smolyak sum filter. The particle filter is of little use in our implementation in combination with a Chebyshev approximation where the interpolation is costly and it results in very costly likelihood evaluations. The Smolyak particle filter is slower than both deterministic filters, but much faster than the particle filter. The parallelized Metropolis–Hastings algorithm improves the global maximization properties of the serial algorithm, avoids extensive training sequences and robustness checks, simplifies the choice of the innovation variance, provides a less biased convergence diagnostic test, and allows implementation of the algorithms on parallel computers. A major practical improvement for handling the Metropolis–Hastings algorithm is the interactive graphical user interface of JBendge, which allows a very comfortable estimation and algorithm tuning process. REFERENCES ALSBACH, D., AND H. SORENSON (1972): “Nonlinear Estimation Using Gaussian Sum Approximation,” IEEE Transactions on Automated Control, 17, 439–448. [811] AMISANO, G., AND O. TRISTANI (2007): “Euro Area Inflation Persistence in an Estimated Nonlinear DSGE Model,” Discussion Paper 6373, CEPR. [805,811] ANDERSON, B. D. O., AND J. B. MOORE (1979): Optimal Filtering. Mineola, NY: Dover Publications. [811] ARULAMPALAM, S., S. MASKELL, N. GORDON, AND T. CLAPP (2002): “A Tutorial on Particle Filters for On-Line Non-Linear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, 50, 174–188. [810]
820
V. WINSCHEL AND M. KRÄTZIG
ARUOBA, B. S., J. FERNÁNDEZ-VILLAVERDE, AND J. F. RUBIO-RAMÍREZ (2006): “Comparing Solution Methods for Dynamic Equilibrium Economies,” Journal of Economic Dynamics and Control, 30, 2477–2508. [804] BARTHELMANN, V., E. NOVAK, AND K. RITTER (1999): “High Dimensional Polynomial Interpolation on Sparse Grids,” Advances in Computational Mathematics, 12, 273–288. [808] BILLI, R., AND K. ADAM (2007): “Discretionary Monetary Policy and the Zero Lower Bound on Nominal Interest Rates,” Journal of Monatary Economics, 54, 728–752. [803] BUNGARTZ, H.-J., AND M. GRIEBEL (2004): “Sparse Grids,” Acta Numerica, 13, 147–269. [804, 808] CHIB, S., AND E. GREENBERG (1995): “Understanding the Metropolis–Hastings Algorithm,” The American Statistician, 49, 327–335. [812] DOUCET, A., N. DE FREITAS, AND N. GORDON (eds.) (2001): Sequential Monte Carlo Methods in Practice. New York: Springer. [804] FERNÁNDEZ-VILLAVERDE, J., AND J. F. RUBIO-RAMÍREZ (2004): “Comparing Dynamic Equilibrium Models to Data: A Bayesian Approach,” Journal of Econometrics, 123, 153–187. [805] (2005): “Estimating Dynamic Equilibrium Economies: Linear versus Nonlinear Likelihood,” Journal of Applied Econometrics, 20, 891–910. [805,814,815] (2006): “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74, 1059–1087. [803,805] GASPAR, J., AND K. JUDD (2005): “Solving Large-Scale Rational-Expectations Models,” Macroeconomic Dynamics, 1, 45–75. [804] GELFAND, A., AND D. DEY (1994): “Bayesian Model Choice: Asymptotics and Exact Calculations,” Journal of the Royal Statistical Society, Ser. B, 56, 501–514. [813] GELMAN, A., AND D. B. RUBIN (1992): “Inference From Iterative Simulation Using Multiple Sequences,” Statistical Science, 7, 457–472. [812] HEISS, F., AND V. WINSCHEL (2008): “Likelihood Approximation by Numerical Integration on Sparse Grids,” Journal of Econometrics, 144, 62–80. [805,808] JUDD, K. L. (1992): “Projection Methods for Solving Aggregate Growth Models,” Journal of Economic Theory, 58, 410–452. [804,809] JUDD, K. L., AND S.-M. GUU (1997): “Asymptotic Methods for Aggregate Growth Models,” Journal of Economic Dynamics and Control, 21, 1025–1042. [804] JUDD, K. L., AND H.-H. JIN (2002): “Perturbation Methods for General Dynamic Stochastic Models,” Discussion Paper, Hoover Institution, Stanford. [804] JUILLARD, M. (1996): “Dynare: A Program for the Resolution and Simulation of Dynamic Models With Forward Variables Through the Use of a Relaxation Algorithm,” Discussion Paper, CEPREMAP. [804] JULIER, S. J., AND J. K. UHLMANN (1997): “A New Extension of the Kalman Filter to Nonlinear Systems,” in Proceedings of AeroSense: 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls, Orlando, FL. [804,810] KALMAN, R. E. (1960): “A New Approach to Linear Filtering and Prediction Problems,” Transactions of the ASME—Journal of Basic Engineering, 82, 35–45. [803] KOTECHA, J., AND P. DJURIC´ (2003): “Gaussian Sum Particle Filter,” IEEE Transactions on Signal Processing, 51, 2602–2612. [811] KÜBLER, F., AND D. KRÜGER (2004): “Computing Equilibrium in OLG Models With Stochastic Production,” Journal of Economic Dynamics and Control, 28, 1411–1436. [808] SCHMITT-GROHÉ, S., AND M. URIBE (2004): “Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function,” Journal of Economic Dynamics and Control, 28, 755–775. [804] SMOLYAK, S. 
(1963): “Quadrature and Interpolation Formulas for Tensor Products of Certain Classes of Functions,” Soviet Mathematics—Doklady, 4, 240–243. [804] STORN, R., AND K. PRICE (1997): “Differential Evolution—A Simple and Efficient Heuristic for Global Optimization Over Continuous Spaces,” Journal of Global Optimisation, 11, 341–359. [805,813,817]
NONLINEAR DYNAMIC MODELS
821
BRAAK, C. J. (2006): “A Markov Chain Monte Carlo Version of the Genetic Algorithm Differential Evolution: Easy Bayesian Computing for Real Parameter Spaces,” Statistics and Computing, 16, 239–249. [813] VAN DER MERWE, R., A. DOUCET, N. DE FREITAS, AND E. WAN (2000): “The Unscented Particle Filter,” Advances in Neural Information Processing Systems, 13, 584–590. [804,811] WINSCHEL, V., AND M. KRÄTZIG (2010): “Supplement to ‘Solving, Estimating, and Selecting Nonlinear Dynamic Models Without the Curse of Dimensionality’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/6297_Extensions.pdf; http://www.econometricsociety.org/ecta/Supmat/6297_data and programs.pdf. [805,808] TER
Dept. of Economics, L7, 3-5, University of Mannheim, 68131 Mannheim, Germany;
[email protected] and School of Business and Economics, Institute of Statistics and Econometrics, Humboldt-Universität zu Berlin, 10099 Berlin, Germany;
[email protected]. Manuscript received February, 2006; final revision received January, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 823–832
STRONGLY CONSISTENT SELF-CONFIRMING EQUILIBRIUM BY YUICHIRO KAMADA1 Fudenberg and Levine (1993a) introduced the notion of self-confirming equilibrium, which is generally less restrictive than Nash equilibrium. Fudenberg and Levine also defined a concept of consistency, and claimed in their Theorem 4 that with consistency and other conditions on beliefs, a self-confirming equilibrium has a Nash equilibrium outcome. We provide a counterexample that disproves Theorem 4 and prove an alternative by replacing consistency with a more restrictive concept, which we call strong consistency. In games with observed deviators, self-confirming equilibria are strongly consistent self-confirming equilibria. Hence, our alternative theorem ensures that despite the counterexample, the corollary of Theorem 4 is still valid. KEYWORDS: Self-confirming equilibrium, consistency, strong consistency, Nash equilibrium.
1. INTRODUCTION FUDENBERG AND LEVINE (1993a; henceforth FL) introduced the notion of self-confirming equilibrium.2 In general, it is less restrictive than the notion of Nash equilibrium. This is mainly because beliefs can be incorrect at off-path information sets in a self-confirming equilibrium, which results in the possibility that two players have different beliefs about the strategy used by a third player. This is illustrated in the “horse” example of Fudenberg and Kreps (1988). FL defined a concept of consistency in an attempt to preclude this possibility, and claimed in their Theorem 4 that with consistency and other conditions on beliefs, a self-confirming equilibrium has a Nash equilibrium outcome. We provide a counterexample that disproves Theorem 4 and prove an alternative by replacing consistency with a more restrictive notion, which we call strong consistency. Briefly, consistency requires that each player’s belief be correct at the information sets that are reachable if he sticks to his equilibrium strategy and the opponents deviate.3 Strong consistency further requires that each player’s belief be correct at certain other information sets—those that are reachable if he sticks to actions that he plays on the equilibrium path, the opponents deviate, and he himself deviates at off-path information sets. 1 I would like to thank Akihiko Matsui and Drew Fudenberg, who read through previous versions of the present paper several times and gave me insightful advice. A co-editor and two anonymous referees encouraged me to pursue significant improvement in exposition. Thanks to Joshua Gottlieb and Shih En Lu for their careful proofreading. Thanks also to Michihiro Kandori, Tadashi Hashimoto, and Thomas Barrios. 2 For related issues, see also Battigalli (1987), Fudenberg and Kreps (1995), Fudenberg and Levine (1993b), and Kalai and Lehrer (1993). 3 For a justification of consistent self-confirming equilibrium, see FL and Fudenberg, Kreps, and Levine (1988).
© 2010 The Econometric Society
DOI: 10.3982/ECTA7496
824
YUICHIRO KAMADA
As a consequence of the alternative theorem proved here, we have that in games with observed deviators, in particular, in two-person games, strong consistency is sufficient to ensure a Nash equilibrium outcome, so the corollary of FL’s Theorem 4 is valid. 2. NOTATION, DEFINITIONS, AND THEOREM 4 OF FL We follow the same notation as in FL (pp. 525–527). Here we review and expand it. Fix an I-player game in extensive form with perfect recall. X is the set of nodes; 0 are the moves of Nature; Hi , H, and H−i are the sets of information sets; A(hi ) is the set of actions at hi ; Si , S, and S−i are the sets of pure strategies and profiles; Σi , Σ, and Σ−i are the sets of mixed strategies and profiles; Πi , Π, and Π−i are the sets of behavior strategies and profiles; πˆ i (·|σi ) is the behavior strategy induced by (giving the same outcome as) σi ; H(·) is the set of the information sets reachable under the argument (strategy or strategy profile)4 ; μi is i’s belief (a probability measure over Π−i ); ui (·) is i’s expected utility given the argument (a strategy profile or a strategy–belief pair). We assume that each player knows (at least) his own payoff function, the extensive form of the game, and the probability distribution over Nature’s moves. Some new notation follows: N is the set of players. We let Π−ij = k=ij Πk . Let p(z|b) denote the probability of reaching the terminal node z given a strategy profile or strategy–belief pair b. FL defined the following concepts: An information set hj , j = i, is relevant to player i given a belief μi if there exists si ∈ Si such that p(hj |si μi ) > 0. The set of information sets that are relevant to i given μi is denoted Ri (μi ). A game has observed deviators if for all players i, all strategy profiles s ∈ S, and all deviations si = si , h ∈ H(si s−i ) \ H(s) implies that there is no s−i with h ∈ H(si s−i ). In FL, it was proved that every two-player game of perfect recall has observed deviators.5 We say that profile σ ∈ Σ is equivalent to another profile σ ∈ Σ if they lead to the same distribution over terminal nodes, that is, p(z|σ) = p(z|σ ) for all z ∈ Z.
×
DEFINITIONS 1 AND 2 OF FL: Profile σ ∈ Σ is a self-confirming equilibrium if ∀i ∈ N, ∀si ∈ support(σi ), ∃μi s.t. the following statements hold: (i) si maximizes ui (· μi ). (ii) μi [{π−i |πj (hj ) = πˆ j (hj |σj )}] = 1 ∀j = i ∀hj ∈ H(si σ−i ). It is a consistent self-confirming equilibrium if “∀hj ∈ H(si σ−i )” in (ii) above is replaced by a stronger requirement, “∀hj ∈ H(si ).” 4 5
¯ Notice that, in contrast to FL, we do not distinguish between what they denote H(·) and H(·). See Lemma 2 of FL.
SELF-CONFIRMING EQUILIBRIUM
825
A self-confirming equilibrium σ is said to have unitary beliefs if for each player i, a single belief μi can be used to rationalize every si ∈ support(σi ). That is, in Definition 1 of FL, we replace “∀si ∈ support(σi ), ∃μi s.t.” with “∃μi s.t. ∀si ∈ support(σi ).” A self-confirming equilibrium σ is said to have independent beliefs if for each player i and each si ∈ support(σi ), the associated belief μi satisfies μi ( j=i Π¯ j ) = j=i μi (Π¯ j × Π−ij ) for all ( j=i Π¯ j ) ⊆ Π−i , where Π¯ j ⊆ Πj for all j ∈ N. The set of consistent self-confirming equilibria is strictly smaller than that of self-confirming equilibria, while it is strictly larger than that of Nash equilibrium.6 It was defined in FL in an attempt to rule out the possibility of a non-Nash outcome, as is claimed in Theorem 4 of FL.
×
×
×
THEOREM 4 OF FL: Every consistent self-confirming equilibrium with independent, unitary beliefs is equivalent to a Nash equilibrium. The next section provides a counterexample to this theorem. It also establishes that Theorems 1 and 3 of FL are incorrect as well, and that the proof of their Theorem 2 needs a correction. 3. A COUNTEREXAMPLE Consider the game depicted in Figure 1. In this game, player 1 moves first at his information set h1 , choosing between L1 and R1 . Knowing 1’s choice,
FIGURE 1.—The counterexample. 6 The examples in which consistent self-confirming equilibrium distinguishes itself from selfconfirming equilibrium and Nash equilibrium are given in FL. See Example 1 of FL for selfconfirming equilibrium, and see Examples 2, 3, and 4 of FL (and the game in Figure 1 of the present paper which we will explain in the next section) for Nash equilibrium.
826
YUICHIRO KAMADA TABLE I STRATEGIES AND BELIEFS IN CONSISTENT SELF-CONFIRMING EQUILIBRIUM s∗ Player 1
Player 2
Player 3
Equilibrium strategy profile s∗
R1
L2 r2
R3
Player 1’s belief Player 2’s belief Player 3’s belief
· R1 R1
R 2 r2 · L2 r2
R3 L3 ·
player 2 moves next. After L1 (resp. R1 ), player 2 chooses between L2 and R2 at his information set h2 (resp. l2 and r2 at h2 ). If L1 and L2 are chosen, each player receives the payoff 2, and if R1 and r2 are chosen, each player receives the payoff 1. Otherwise, player 3 gets the move at his information set h3 , not knowing players 1 and 2’s choices. Regardless of players 1 and 2’s choices, payoffs are (3 0 0) if player 3 plays L3 and (0 3 0) if player 3 plays R3 . We will show that (R1 r2 ) is played in a consistent self-confirming equilibrium with independent, unitary beliefs while it is not a Nash equilibrium outcome. To see that (R1 r2 ) is played in a consistent self-confirming equilibrium with independent, unitary beliefs, consider the strategy profile s∗ = (R1 (L2 r2 ) R3 ). We first verify that this is a consistent self-confirming equilibrium by considering the beliefs of players 1, 2, and 3 to be ((R2 r2 ) R3 ), (R1 L3 ), and (R1 (L2 r2 )), respectively. Table I presents these strategies and beliefs. It is easy to see that no player has an incentive to deviate from s∗ under these beliefs: By playing L1 , player 1 expects the payoff 0; by playing l2 , player 2 expects the payoff 0; and h2 and h3 lie off the equilibrium path so that there is no incentive to deviate at these information sets. Thus, it suffices to show that each player has the correct belief at the information sets reachable under his equilibrium strategy.7 The beliefs specified above (or in Table I) are incorrect only in that player 1 believes player 2 will play R2 at h2 and player 2 believes player 3 will play L3 at h3 . These incorrect beliefs hold in a consistent self-confirming equilibrium because h2 is not included in H(R1 ) and h3 is not included in H((L2 r2 )). Thus s∗ is a consistent self-confirming equilibrium. Moreover, s∗ has independent, unitary beliefs, because correlations are not allowed in each player’s belief and beliefs are concentrated on singletons. Next, we show that (R1 r2 ) cannot be played in a Nash equilibrium. To see this, suppose to the contrary, that is, that (R1 r2 ) is played in a Nash equilibrium. If player 3 played R3 with a probability greater than 1/3, then player 2 would take l2 with probability 1. So player 3 must be playing L3 with a probability of at least 2/3. This means that if player 1 plays L1 , he obtains at least 2 7 Player i’s belief is defined as a measure on the space Π−i , so the term “correct belief at information set hj ” is not appropriate. Throughout this paper, we use it to mean “belief that is correct at hj ” so as to simplify exposition.
SELF-CONFIRMING EQUILIBRIUM
827
as his payoff. Thus no matter how player 3 plays, at least one of players 1 and 2 has an incentive to deviate. This means that (R1 r2 ) cannot be played in a Nash equilibrium.8 In FL’s proof of their Theorem 4, they constructed a strategy profile π that is supposed to be a Nash equilibrium. In our example, π is (R1 (R2 r2 ) L3 ), but this is not a Nash equilibrium since player 1 would have an incentive to deviate.9 The example also establishes that Theorems 1 and 3 of FL are incorrect as well, and the proof of their Theorem 2 needs a correction. Theorem 1 claims that even if we relax condition (ii) for consistent self-confirming equilibrium by allowing beliefs to also be incorrect at information sets that cannot be reached when opponents’ equilibrium strategies are fixed, the set of possible strategy profiles does not change. In our example, strategy profile (R1 (L2 r2 ) L3 ) is not a consistent self-confirming equilibrium since player 1, being restricted to believe that player 3 will play L3 , has an incentive to deviate. On the other hand, this strategy profile is allowed in the relaxed condition: Since player 2’s strategy makes h3 unreachable, player 1’s belief about player 3’s strategy can be arbitrary.10 Theorem 3 claims that for each consistent self-confirming equilibrium of a game whose information sets are ordered by precedence, there is an equivalent extensive-form correlated equilibrium of Forges (1986). It is straightforward to check that s∗ is not an extensive-form correlated equilibrium.11 Finally, Theorem 2 claims that, in games with observed deviators, selfconfirming equilibria are consistent self-confirming. This claim itself is correct, but FL’s proof is incorrect because it uses the result of Theorem 1. The claim is a consequence of our Proposition 2. 4. STRONGLY CONSISTENT SELF-CONFIRMING EQUILIBRIUM AND NASH EQUILIBRIUM
To define the equilibrium concept that enables us to rule out non-Nash outcomes, we need one more piece of notation. Let Hi (si∗ σ−i ) = h|h ∈ H(si ) for some si ∈ Si s.t. si (h ) = si∗ (h ) ∀h ∈ H(si∗ σ−i ) 8 It would be easy to see that this example holds for an open set of payoffs around the payoffs we give. Thus the example is not trivial. i 9 There is an illogical jump when FL claimed “ui (si π−i ) = ui (si π−i )” holds for all si ∈ Si . In fact, it is satisfied only for all si ∈ support(σi ). Moreover, π is not well defined for information sets which are reached only by deviations made by more than one player. 10 Theorem 1 can be modified without loss of the intuition by slightly modifying one of its conditions. Specifically, replace condition (iii) of our Definition 1 in Section 4 with condition (iii ) as follows: Condition (iii ): μi [{π−i |πj (hj ) = πˆ j (hj |σj )}] = 1 ∀j = i ∀hj ∈ Hi (si σ−i ) ∩ H(σj ). 11 Theorem 3 and its corollary can be shown to be true by replacing consistency with strong consistency, which we will define in Section 4.
828
YUICHIRO KAMADA
be the set of action-possible information sets for player i under (si∗ σ−i ), that is, the set of information sets that can be reached when player i conforms to si∗ at nodes that are reached under (si∗ σ−i ). Note that the action-possible information sets for a player are determined not only by his own strategy, but also by his opponents’ strategies. DEFINITION 1: Profile σ ∈ Σ is a strongly consistent self-confirming equilibrium if ∀i ∈ N, ∀si ∈ support(σi ), ∃μi s.t. the following statements hold: (i) si maximizes ui (· μi ). (iii) μi [{π−i |πj (hj ) = πˆ j (hj |σj )}] = 1 ∀j = i ∀hj ∈ Hi (si σ−i ). PROPOSITION 1: Every strongly consistent self-confirming equilibrium with independent, unitary beliefs is equivalent to a Nash equilibrium. The difference between consistency and strong consistency can be seen best in the example in the previous section: (R1 r2 ) is not played in any strongly consistent self-confirming equilibrium; if it were, then player 2 must have the correct belief at h3 . This is because h3 can be reached by the action combination (L1 R2 ), which does not contradict player 2’s on-path play, namely r2 . Because player 1 must also have the correct belief at h3 , players 1 and 2’s beliefs about the strategy of player 3 must coincide. This implies (R1 r2 ) cannot satisfy the best-response condition (condition (i) of our Definition 1), as we have seen already. A strongly consistent self-confirming equilibrium can have a non-Nash outcome.12 This is immediate because Proposition 2 below ensures that strongly consistent self-confirming equilibrium reduces to self-confirming equilibrium in games with observed deviators. PROPOSITION 2: In games with observed deviators, hence a fortiori in twoplayer games, self-confirming equilibria are strongly consistent self-confirming. We omit the proof for this proposition. It is just a matter of showing that in games with observed deviators, action-possible information sets for player i that lie off the equilibrium path are not relevant to him.13 This establishes that the following result from FL holds without any modification as a corollary of Proposition 1. 12
Examples where outcomes (and hence strategy profiles) arise which are not Nash, but are strongly consistent self-confirming, are given in FL. (See Examples 2, 3, and 4 of FL.) 13 Notice that the game in Figure 1 does not have observed deviators. This is because player 3 cannot tell which of players 1 and 2 has deviated from (R1 (R2 r2 )) when he gets his turn to move at h3 . Thus the assumption of the proposition fails, allowing for the difference between self-confirming equilibrium and strongly consistent self-confirming equilibrium, which we have seen already.
SELF-CONFIRMING EQUILIBRIUM
829
COROLLARY 1: In games with observed deviators, and hence a fortiori in twoplayer games, every self-confirming equilibrium with independent, unitary beliefs is equivalent to a Nash equilibrium. Now, we are going to prove Proposition 1. Intuitively, the proof of the main theorem is as follows: Fix a strongly consistent self-confirming equilibrium σ and construct a new strategy profile σ as follows: πˆ k (σk )(hk ) is how player i believes k will play at the information set hk if hk is relevant to player i. For an information set which is irrelevant to all the players, the strategy is specified arbitrarily by how player j actually plans to play. This construction is well defined because strong consistency ensures, as we will see, that if an information set hk is relevant to both players i and j, they have the correct beliefs at hk so that they have the same beliefs at hk . Thus σ is a Nash equilibrium because σ−i is constructed according to player i’s belief μi whenever an information set in question is relevant to i, and i takes a best response against μi by condition (i) of our Definition 1. A formal proof is given below. PROOF OF PROPOSITION 1: Let σ be a strongly consistent self-confirming equilibrium with independent, unitary beliefs. The condition of unitary beliefs ensures that a single belief rationalizes all si ∈ support(σi ). For each player i, take one such belief, denoted μi . μ Before proceeding, we need one more piece of notation: π−ii ∈ Π−i is defined μi μi by π−i = j=i πj , where for each hj ∈ Hj and each aj ∈ A(hj ), μ πj i (hj )(aj ) = πj (hj )(aj )μi (dπj × Π−ij )
×
Πj
We now construct each player k’s behavior strategy πk by the rule14 ⎧ μi if ∃i ∈ N i = k s.t. hk ∈ Ri (μi ), ⎨ πk (hk ) (1) πk (hk ) = πˆ k (σk )(hk ) if hk ∈ H \ Ri (μi ) ⎩ i∈N\{k}
The construction of πk in (1) is well defined. To see this, first observe that Ri (μi ) ⊆ si ∈support(σi ) Hj (sj σ−j ) holds. This is because player i’s belief about player j’s strategy is correct at on-path information sets, by condition (iii) of our Definition 1. By the condition of unitary beliefs, this implies that player j has the correct belief at all hk ∈ Ri (μi ). Similarly, player i has the correct belief at all hk ∈ Rj (μj ). Thus if hk ∈ Ri (μi ) and hk ∈ Rj (μj ), i = k = j, the beliefs of players i and j about player k’s strategy at the information set hk are correct, μ μ so in particular πk i (hk ) = πk j (hk ) holds. 14
This construction is different from the original in FL.
830
YUICHIRO KAMADA
Now, construct each player i’s strategy σi and belief μi by (2)
σi = σˆ i (πi )
μi [{π−i }] = 1
where σˆ i (πi ) is a mixed strategy induced by (giving the same outcome as) πi . We will show that this σ is a Nash equilibrium. Because of the condition of independent beliefs, from Lemma 1 in the Appendix, for all si ∈ Si , (3)
p(·|si π−ii ) = p(·|si μi )
(4)
ui (si π−ii ) = ui (si μi )
(5)
H(π−ii ) \ Hi = Ri (μi )
μ
μ
μ
Now define μi such that (6)
μi ({π−ii }) = 1 μ
Then from (5) and (6), (7)
Ri (μi ) = Ri (μi )
×
×
Q ∼Q = h∈Q (A(h)) and Π−i = h∈H−i \Q (A(h)), For Q ⊆ H−i , define Π−i Q Q Q Q ∼Q Q Q ⊆ Π−i . and then define μi to satisfy μi (B−i ) = μi (B−i × Π−i ) for every B−i By (1), (2), and (6), Ri (μi )
μi
Ri (μi )
= μi
From this and (7), we can apply Lemma 1 of FL15 to have for all si ∈ Si , p(·|si μi ) = p(·|si μi ) and
ui (si μi ) = ui (si μi )
which mean, according to (2) and (6), for all si ∈ Si , (8)
) = p(·|si π−ii ) and p(·|si π−i μ
ui (si π−i ) = ui (si π−ii ) μ
From (4) and (8), we have for all si ∈ Si , ) = ui (si μi ) ui (si π−i
Because of condition (i) of our Definition 1, for all si ∈ support(σi ), for all si ∈ Si , ui (si μi ) ≥ ui (si μi ) R(μ )
R(μ )
Lemma 1 of FL states that “If μi and μˆ i are two distributions on Π−i such that μi i = μˆ i i , then (a) R(μi ) = R(μˆ i ), and (b) ui (si μi ) = ui (si μˆ i ) for all si .” Also p(·|si μi ) = p(·|si μˆ i ) for all si ∈ Si is implicit in this result. 15
831
SELF-CONFIRMING EQUILIBRIUM
From the above two expressions and (2), we obtain for all i ∈ N, for all si ∈ support(σi ), for all si ∈ Si , ui (si σ−i ) ≥ ui (si σ−i )
This inequality implies that σ is a Nash equilibrium if we establish that σ and σ are equivalent, because then we can replace “support(σi )” in the above inequality by “support(σi ).”16 Thus to conclude the proof, it now suffices to show that σ and σ are equivalent. To see this, first observe that, for each player i, condition (iii) of our Definition 1 and H(si σ−i ) ⊆ Hi (si σ−i ) imply that, for all si ∈ support(σi ), p(·|si σ−i ) = p(·|si μi ) From this and (2), (3), and (8), we obtain, for all si ∈ support(σi ), ) p(·|si σ−i ) = p(·|si σ−i
This equality means that σ and σ are equivalent. This concludes the proof. Q.E.D. APPENDIX LEMMA 1: If profile σ ∈ Σ has independent beliefs, for all i ∈ N, si ∈ μ support(σi ), and the associated belief μi , then p(·|si π−ii ) = p(·|si μi ), ui (si μi μ π−i ) = ui (si μi ), and H(π−ii ) \ Hi = Ri (μi ) hold for all si ∈ Si . PROOF: Define h(aj ) = A−1 (aj ) to be the information set where the action ˜ aj is possible. The path of actions to z ∈ Z, a(z), is the set of actions which are necessarily taken to get to the terminal node z. Let Z(si ) be the set of the terminal nodes reachable under si . For all z ∈ Z, p(z|si π−ii ) μ
i
= |{z} ∩ Z(s )| ·
˜ aj ∈a(z)j =i
πj (h(aj ))(aj )μi (dπj × Π−ij )
Πj
= |{z} ∩ Z(si )| 16 To see this, first note that fixing σ−i , “∀si ∈ support(σi )” in the last inequality can be re ), si (hi ) = si∗ (hi )]].” Now, we have that placed by “∀si s.t. [∃si∗ ∈ support(σi ) s.t. [∀hi ∈ H(si∗ σ−i if (a) ∃si∗ ∈ support(σi ) s.t. [∀hi ∈ H(si∗ σ−i ), si (hi ) = si∗ (hi )], then (b) ∃si∗ ∈ support(σi ) s.t. ), si (hi ) = si∗ (hi )], because if (b) does not hold, we have Z(si ) \ Z(σi σ−i ) = ∅. [∀hi ∈ H(si∗ σ−i ) = ∅ by the assumption that σ is equivalent to σ , but However, this implies Z(si ) \ Z(σi σ−i this contradicts (a). So we can replace “∀si ∈ support(σi )” by “∀si s.t. [∃si∗ ∈ support(σi ) s.t. [∀hi ∈ H(si∗ σ−i ), si (hi ) = si∗ (hi )]].” A special case of this is “∀si ∈ support(σi ).”
832
YUICHIRO KAMADA
·
···
ΠI
··· Πi+1
Πi−1
i
= |{z} ∩ Z(s )| · Π−i
Π1
˜ aj ∈a(z)j =i
πj (h(aj ))(aj ) μi
×dπ j=i
j
πj (h(aj ))(aj ) μi (dπ−i )
˜ aj ∈a(z)j =i
p(z|si π−i )μi (dπ−i )
= Π−i
= p(z|si μi ) In particular, the second equality follows from the condition of independent beliefs. Thus under this condition, given si and μi , the distribution over the μ terminal nodes when players j = i follow π−ii is identical to what i believes in μ his belief μi . The other two equations follow, that is, ui (si π−ii ) = ui (si μi ) and μi H(σˆ −i (π−i )) \ Hi = Ri (μi ) hold. Q.E.D. REFERENCES BATTIGALLI, P. (1987): “Comportamento razionale ed equilibrio nei giochi e nelle situazioni sociali,” Unpublished Dissertation, Università Commerciale “L. Bocconi”, Milano. [823] FORGES, F. (1986): “An Approach to Communication Equilibria,” Econometrica, 54, 1375–1386. [827] FUDENBERG, D., AND D. M. KREPS (1988): “A Theory of Learning, Experimentation and Equilibrium in Games,” Mimeo, Stanford University. [823] (1995): “Learning in Extensive-Form Games I. Self-Confirming Equilibria,” Games and Economic Behavior, 8, 20–55. [823] FUDENBERG, D., AND D. K. LEVINE (1993a): “Self-Confirming Equilibrium,” Econometrica, 61, 523–545. [823] (1993b): “Steady State Learning and Nash Equilibrium,” Econometrica, 61, 547–573. [823] FUDENBERG, D., D. M. KREPS, AND D. K. LEVINE (1988): “On the Robustness of Equilibrium Refinements,” Journal of Economic Theory, 44, 354–380. [823] KALAI, E., AND E. LEHRER (1993): “Rational Learning Leads to Nash Equilibrium,” Econometrica, 61, 1019–1045. [823]
Dept. of Economics, Harvard University, Cambridge, MA 02138, U.S.A.;
[email protected]. Manuscript received October, 2007; final revision received June, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 833–842
SEQUENTIAL ESTIMATION OF DYNAMIC DISCRETE GAMES: A COMMENT BY MARTIN PESENDORFER AND PHILIPP SCHMIDT-DENGLER1 Recursive procedures which are based on iterating on the best response mapping have difficulties converging to all equilibria in multi-player games. We illustrate these difficulties by revisiting the asymptotic properties of the iterative nested pseudo maximum likelihood method for estimating dynamic games introduced by Aguirregabiria and Mira (2007). An example shows that the iterative method may not be consistent. KEYWORDS: Dynamic discrete games, multiple equilibria, pseudo maximum likelihood estimation, recursive methods.
AGUIRREGABIRIA AND MIRA (2007), henceforth AM (2007), studied pseudo maximum likelihood estimators of dynamic games and proposed an iterative nested pseudo maximum likelihood method. This comment revisits the asymptotic properties of the sequential method. We illustrate that the method may not be consistent. We provide an example in which the sequential method converges to a fixed number distinct from the true parameter value with probability approaching 1. EXAMPLE: Consider a repeated game with t = 1 2 ∞. Every period t, two firms, indexed by i = 1 2, simultaneously decide whether to be active or not. Firm i’s period payoff is equal to εi1 if firm i is active and firm 3 − i is not active; θ + εi1 if both firms are active; and εi2 if firm i is not active. The true parameter θ0 is contained in the interior of a compact interval Θ with Θ = [−1 −10]. The tuple of random variables (εi1 εi2 ) is such that the difference εi = εi1 − εi2 is drawn independently every period from the distribution function Fα and observed privately by firm i prior to making the choice with ⎧ εi − 1 + α 1 ⎪ ⎪ 1 − α + 2α − [1 − α ∞) ⎪ ⎪ σ 2 ⎨ Fα (εi ) = εi (1) [α 1 − α) ⎪ ⎪ ε − α ⎪ i ⎪ ⎩ 2α [−∞ α) σ where √ denotes the standard normal cumulative distribution function, σ = 2α/ 2π, and α(θ0 ) > 0 is small.2 There are no publicly observed state vari1
We thank the editor, an anonymous referee, Helmut Elsinger, and Oliver Linton for helpful comments. 2 The cumulative distribution function (c.d.f.) Fα arises when the joint density of (ε1 ε2 ) takes the form fα (ε1 − ε2 ) · φ(ε2 ), where φ(·) denotes the standard normal probability distribution function (p.d.f.) and fα (x) is a p.d.f. that equals 2α φ( x−α ) for x ∈ [−∞ α), equals 1 for x ∈ σ σ x−1+α [α 1 − α), and equals 2α φ( ) for x ∈ [1 − α ∞). σ σ © 2010 The Econometric Society
DOI: 10.3982/ECTA7633
834
M. PESENDORFER AND P. SCHMIDT-DENGLER
ables, and firms strategies are a function of the privately observed payoff shock only. Firms play a Markov equilibrium. The example satisfies Assumptions 1–4 in AM (2007). By construction Fα approaches the uniform distribution in the limit when α vanishes. We assume that α(θ0 ) is chosen sufficiently small (as a function of the level of θ0 ) to allow us to focus on the uniform distribution part of Fα . The assumption α > 0 ensures that εi is distributed on the real line. EQUILIBRIUM
Let P denote the probability that firm i is active and let P = (P 1 P 2 ). Firm i is active if and only if (θ+εi1 )·P 3−i +εi1 ·(1−P 3−i ) > εi2 , which yields (θ)·P 3−i > εi2 −εi1 and gives the following expression for firm i’s probability of being active: i
(2)
P i = Ψ (P 3−i θ) = 1 − Fα (−θ · P 3−i )
We denote Ψ (P θ) = (Ψ (P 2 θ) Ψ (P 1 θ)). An equilibrium solves P = Ψ (P 1 for i = 1 2. θ). The symmetric equilibrium for α small is given by P i = 1−θ The equilibrium is the unique symmetric equilibrium, but it is not a stable equilibrium in the sense that the fixed point on the best response mapping is not asymptotically stable. The equilibrium is evolutionary stable in the sense of Maynard Smith (1982). The instability property in the best response mapping plays a central role in establishing inconsistency of the nested pseudo likelihood (NPL) method, but does not appear to be a reasonable equilibrium refinement concept for the incomplete information Markov game. The reason is that another firm’s strategy is not observable and it is not clear how firms learn from opponents’ behavior to justify the best response mapping as a refinement concept. To “learn” from opponents’ play, a firm would have to calculate long-run averages to infer strategies. But any such long-run average calculation would violate the Markov assumption. NPL METHOD
1 P
2 ) denote the sample frequency estimator of the choice Let
PM = (P M M probabilities. The pseudo log-likelihood for any tuple (P 1 P 2 ) is proportional to (3)
1 ln(1 − Fα (−θ · P 2 )) + (1 − P
1 ) ln Fα (−θ · P 2 ) QM (θ P) ∝ P M M
2 ln(1 − Fα (−θ · P 1 )) + (1 − P
2 ) ln Fα (−θ · P 1 ) +P M M
AM (2007, p. 18, eqs. (29) and (30)) defined the NPL method as a sequence of K }, where the K-stage solves estimators { θM (4)
PK−1 ) θKM = arg max QM (θ θ∈Θ
ESTIMATION OF DYNAMIC DISCRETE GAMES
835
K } are obtained recursively as and the probabilities {P (5)
K PK−1 θM (PK−1 )) PK = Ψ (
We shall examine the limit of the sequential method. Notice that function (5) is distinct from the best response function (2), as θ is not fixed but a function of the choice probabilities. AM (2007) introduced the NPL fixed point as a pair ( θ P) that satisfies the two conditions (6)
θ = arg max QM (θ P) and P = Ψ ( P θ) θ∈Θ
The NPL estimator is defined as the NPL fixed point with the highest value of the pseudo likelihood among all the NPL fixed points. AM (2007) established the consistency of the NPL estimator. We shall illustrate that the sequential method may not converge to the correct fixed point. We shall illustrate that the sequential NPL method can lead to inconsistent estimates. NPL LIMIT K We examine the limit θ of the NPL sequence θM . An estimator θM is conP sistent if θM −→ θ0 , that is, limM→∞ Pr(|θM − θ0 | ≥ μ) = 0. The proof of the following result is given in the Appendix. ∞ M
LIMIT RESULT: The following statements hold. P (i)
PM −→ P0 . (ii) Suppose
PM is the starting value of the NPL choice probability sequence. ∞ P Then θM −→ −1 for any θ0 ∈ (−1 −10). ∞ converges with probability 1 to the number −1 for The limit estimator θM any value of the true parameter θ0 ∈ (−1 −10), even when the choice probability sequence is initialized at the consistent frequency estimator. The example K shows that properties of the estimator θM rely on the order in which the limit is taken. When M is held fixed and the limit K → ∞ is considered, then the sequential method converges to a number distinct from the true value.
ILLUSTRATION
Figure 1 illustrates the NPL difference equation graphically. To simplify the illustration, we depict the NPL difference equation in terms of choice proba2 /P 1 . The NPL sequence for p K = P K is formally stated in equability ratios, p K K tions (8) and (9) in the Appendix. The illustration assumes a true parameter value of θ0 = −2. The equilibrium choice probabilities are then P 1 = P 2 = 1/3
2 ≈ 1/3 for large M.
1 ≈ P and P M M
836
M. PESENDORFER AND P. SCHMIDT-DENGLER
FIGURE 1.—NPL difference equation.
The NPL difference equation has three fixed points. The middle fixed point p = 1 yields the true parameter value θ0 = −2. This fixed point is unstable, as the slope of the difference equation is larger than 1 at p = 1. So the NPL sequence attains the fixed point p = 1 only if it starts at the true value p = 1. For any starting value with p = 1, the NPL sequence moves away from that point. There are two additional fixed points of the NPL sequence with approximate values for (p θ) of (373 −1), (1/373 −1), respectively. These fixed points are stable, and notice that both fixed points imply an approximate parameter value of θ ≈ −1. Any starting point p > 1 converges to the fixed point 1 2 ≈ 021 and P∞ ≈ 079. (373 −1) with equilibrium choice probabilities of P∞ Any starting point p < 1 converges to the third point that equals (1/373 −1) 1 2 ≈ 079 and P∞ ≈ 021. with equilibrium choice probabilities of P∞ The NPL method always converges. A researcher reaches a stable fixed point with probability approaching 1 as M increases. The stable fixed points have approximately the same likelihood values and the same parameter value estimates. Hence, the researcher may incorrectly conclude that the NPL estimate of θ is unique. Observe that the probability that the NPL method converges to the true parameter value approaches zero as M increases, as only starting values that lie on the 45 degree line, where p = 1, yield consistent estimates.
ESTIMATION OF DYNAMIC DISCRETE GAMES
837
= 1 approaches zero as M With the frequency estimator, the probability that p increases. Note that the instability of the fixed point at p = 1 stems from θ0 being smaller than −1 For θ0 ∈ (−1 0), the NPL difference equation will only have one stable fixed point and the NPL method will converge to the true parameter with probability approaching 1. AM (2007) explained that in case of multiple fixed points, the researcher may initiate the sequence at different starting values for the choice probabilities P0 and choose the sequence that maximizes the pseudo maximum likelihood in the limit. This suggestion works in the example only if the econometrician guesses correctly that the choice probability estimates lie on the 45 degree line. The 45 degree line emerges in this simple example because of the assumed symmetry. Guessing the relationship between choice probabilities correctly may be more difficult in richer settings. For instance, introducing a slight asymmetry in payoffs in the current example would require the researcher to find the solution to a cubic equation.3 Yet, as already observed in AM (2007), consistent estimates of θ emerge only if all the NPL fixed points are calculated and compared. Computationally, the task of finding all fixed points is demanding. Importantly, this task is not achieved by the NPL method when the fixed point on the best response mapping is not asymptotically stable. The inconsistency of the NPL method appears not to be an artefact of the chosen static example. A Monte Carlo study in Pesendorfer and SchmidtDengler (2008) illustrated that the same problem may emerge in richer settings. In a rich and realistic dynamic entry game, the NPL method converged, but did not converge to the true value in three of five dynamic entry equilibria. APPENDIX: PROOF OF THE LIMIT RESULT (i) This follows immediately as the sample frequency estimator is consistent. (ii) We begin by describing the expression for the NPL difference equation. In the description, we initially impose the condition that along the NPL sequence (A)
P 1 P 2 ∈ (α 1 − α)
Later, we establish that condition (A) indeed holds at each point along the NPL sequence PK . Observe that condition (A) eventually holds at the starting values, that is, for any μ > 0 there exists an M such that, for all M > M
1 P
2 ∈ (α 1 − α)) > 1 − μ. This follows immediately from part (i) as Pr(P M M P0 ∈ (α 1 − α). 3
Details of such an example are available from the authors on request.
838
M. PESENDORFER AND P. SCHMIDT-DENGLER
The necessary first order condition in problem (4) when P 1 and P 2 satisfy property (A) yields (7)
1 P 2 /(1 + θ · P 2 ) + (1 − P
1 )/θ ∂QM /∂θ = P M M
2 P 1 (1 + θ · P 1 ) + (1 − P
2 )/θ +P M M = 0
θM (P) which gives rise to a quadratic equation in θ.4 Substituting the solution into equation (5) yields the following difference equation that characterizes the NPL method: 2 2 1 PK−1 PK−1 P 1 2 (8) 1 + hM · K−1 (PK PK ) = 1 + hM 1 1 2 PK−1 PK−1 PK−1 where
2
1 2−P 2−P M M − ·p 4 4 1 1 2 1 2
M
M
M PM · p [2 − P − (2 − P ) · p]2 + 4 · P + 4
hM (p) = −
We wish to study the limit of the NPL sequence (8). Notice that the right hand side in equation (8) is determined by the probability ratios pK−1 = 2 1 PK−1 /PK−1 and does not depend on the probability levels. Restating the sequence in terms of the probability ratios yields a one dimensional difference equation which is easier to analyze: (9)
pK = gM (pK−1 ) =
1 + hM (pK−1 )/pK−1 1 + hM (pK−1 )
1 P
2 < 1/2, the function gM in (9) has exactly three fixed points: When P M M p = 1
1 2 2 1 2
1 − P
2 + (2 − P
M
M
M
1 ) PM /(2P −P ) − 4P p∗M = 2 − P M M M
1 2 2 1 2
1 )
1
2
M
M
M PM /(2P (2 − P −P ) − 4P p∗∗ M = 2 − PM − PM − M ∗ with p∗∗ M < 1 < pM . Part (i) and the assumption θ0 < −1 imply that for any
1 P
2 < 1/2) > 1 − μ, μ > 0, there exists an M such that, for all M > M, Pr(P M M 4
1 )/(4P 2 ) − (2 − P
2 )/(4P 1 ) + ([(2 − P
1 )/(4P 2 ) − (2 − With solution θM (P) = −(2 − P M M M
1 P
2 /(4P 1 P 2 ))1/2
2 )/(4P 1 )]2 + P P M M M
ESTIMATION OF DYNAMIC DISCRETE GAMES
839
and the described fixed points arise with probability 1 as M → ∞. The first
2 )/2 which yields θ
1 + P fixed point implies equal choice probabilities of (P M M close to θ0 . The second and third fixed points yield choice probabilities of ∗∗ ∗∗ (1/(1 + p∗M ) p∗M /(1 + p∗M )) and (1/(1 + p∗∗ M ) pM /(1 + pM )), respectively, with θ = −1. Which of the described fixed points is attained as the NPL limit is determined by the shape of the function gM and the starting values. Next, we observe four properties of gM which are then used to determine the limit of the NPL sequence. Then we briefly sketch the proofs of these properties.5 Properties of gM PROPERTY 1: gM (p) > 1 if and only if p > 1, and gM (p) = 1 if and only if p = 1. PROPERTY 2: gM has a nonnegative derivative for p ≥ 1. PROPERTY 3: The derivative ∂gM (p)/∂p evaluated at p = 1 equals −1 + P
1 + P
2 ) and, from Limit Result part (i), ∂gM (p)/∂p|p=1 −→ 2/(P −θ0 . M M PROPERTY 4: For any μ > 0 there exists an M such that, for all M > M, Pr(limp→∞ gM (p) < ∞) ≥ 1 − μ. Property (A) implies that 1 + hM (pK−1 )/pK−1 1 + hM (pK−1 ) ∈ (α 1 − α), and Property 1 follows immediately from inspection of equation (9). Without loss of generality, we may relabel firms’ identities and by Property 1, we may restrict attention to the case p ≥ 1 and to fixed points p = 1 and p = p∗M . We do so for the remainder of this proof. Properties 2 and 3 can be seen by taking the derivative. Property 4 can be established by using l’Hospital’s rule as
lim hM (p) = lim (hM (p)/p)/(1/p) p→∞
p→∞
∂(hM (p)/p) ∂(1/p) p→∞ ∂p ∂p
= lim
The derivative Property 3 combined with monotonicity Property 2 imply that the fixed point p = 1 is unstable and fixed points p∗M (and p∗∗ M ) are stable. To see this, observe that for any μ > 0 there exists an M such that, for all M > M, with probability 1 − μ, the monotone function gM intersects the 45 degree line at p = 1 from below, as the slope is strictly larger than 1 at p = 1. In turn, this implies that the function gM intersects the 45 degree line at fixed points p∗M 5
A complete proof of the properties can be obtained from the authors.
840
M. PESENDORFER AND P. SCHMIDT-DENGLER
(and p∗∗ M ) from above. Now, as the function gM is monotone for p ≥ 1 (and finite at ∞), it must hold that the slope of the function gM at fixed points p∗M (and p∗∗ M ) is between 0 and 1 (and strictly less than 1 from Property 4), which establishes (local) stability. We can now determine the limit of the NPL sequence. For any μ > 0, there exists an M such that, for all M > M, with probability 1 − μ, equation (9) converges to fixed point p∗M whenever the starting value exceeds 1 (and it converges to fixed point two, p∗∗ M , whenever the starting value is less than 1). To see this, notice that for starting values in the interval (1 p∗M ), the difference equation (9) increases toward fixed point p∗M as the function gM is monotone increasing and above the 45 degree line. On the other hand, for starting values in the interval (p∗M ∞), the difference equation (9) decreases toward fixed point p∗M as the function gM is monotone increasing and below the 45 degree line. Next, we establish property (A). We already know from part (i) that for any
1 P
1 ∈ (P0 − α P0 + μ > 0, there exists an M such that, for all M > M, Pr(P M M α)) > 1 − μ. We need to establish that the updated choice probabilities, based on the updating equation (8), are contained in (α 1 − α) whenever p ∈ [α/(1 − α) (1 − α)/α]. Without loss of generality, we relabel firms’ identities so that P 2 ≥ P 1 and we examine the condition for p ∈ [1 (1 − α)/α]. We need to show that α < PK1 (p) and PK2 (p) < 1 − α. The second inequality can be established
1 by rewriting the equation hM (p) conveniently as hM (p) = −[(2 − PM )/4 + (2 − 1 2 1 1
2 ) · p/4] + [(2 − P
M
M
M
M P )/4 + (2 − P ) · p/4]2 − (2 − P −P ) · p/2. For any M
μ > 0, there exists an M such that, for all M > M, with probability 1 − μ, the term in parentheses is strictly positive, and the term under the square root is strictly smaller than the square of the first term in square brackets. Thus, with probability 1 − μ, the expression hM (p) is strictly less than zero on [1 p∗M ]. Since PK2 = 1 + hM (p)/p, this implies that PK2 < 1 − α. An examination of the derivative of PK1 (p) reveals that it equals ∂hM (p)/∂p, which is nonpositive. Thus, it suffices to establish that limp→∞ PK1 (p) > α with probability 1 − μ. Rewriting the inequality yields 1 2 1 1
M
M
M
M [2 − P + (2 − P ) · p]2 − 8(2 − P −P )·p
2 ) · p]
1 + (2 − P > −4(1 − α) + [2 − P M M The expression under the root is positive (which can be immediately seen from the equivalent representation of the root in (8)). Squaring both the left and
1 − α(2 − P
2 )] > (1 − α)[P
1 − right hand sides yields (after cancelling) p · [P M M M 2α], which indeed holds with probability 1 − μ for p sufficiently large. So far we have shown that for starting values P 2 = P 1 , the NPL sequence converges to the limit θ = −1. To complete the argument, we need to establish
1 = P
2 ) = 0. Note that the most likely outcome of an (M P0 ) that limM→∞ Pr(P M M
841
ESTIMATION OF DYNAMIC DISCRETE GAMES
binomial distribution is given by k = (M + 1)P0 where x is the smallest integer less than or equal to x. Using this notation, we find that an upper bound on the probability
2 ) =
1 = P Pr(P M M
M M k=0
is given by M M k
k=0
k
(P0 ) (1 − P0 ) k
M−k
(P0 ) (1 − P0 ) k
M−k
M k
M k
(P0 )k (1 − P0 )M−k
(P0 ) (1 − P0 ) k
M−k
= Pr(k = k)
√ Robbins (1955) illustrated bounds on M! and showed that M! = 2πM(M/ e)M erM , where 1/(12M +1) < rM < 1/(12M). For M > max(1/P0 P0 /(1 − P0 )), we can use these bounds to obtain that Pr(k = k) is less than or equal to M (MP0 )k (M(1 − P0 ))M−k 1/(12M) e 2πk(M − k) (k)k (M − k)(M−k) (MP0 )k M ≤ 2π(MP0 − 1)(M(1 − P0 ) − P0 ) (MP0 − 1)k × =
(M(1 − P0 ))M−k e1/(12M) (M(1 − P0 ) − P0 )(M−k) 1 1 k P0 1 2π(MP0 − 1) (1 − P0 ) − 1− M P0 M 1
× 1− ≤
P0 M(1 − P0 )
1/(12M)
(M−k) e
1 1 (M+1)P0 P0 1 2π(MP0 − 1) (1 − P0 ) − 1− M P0 M 1
×
P0 1− M(1 − P0 )
1/(12M)
M e
842
M. PESENDORFER AND P. SCHMIDT-DENGLER
The inequalities follow because (MP0 − 1) < (M + 1)P0 < (M + 1)P0 . The first term in the last expression converges to zero, and the remaining three
1 = P
2 ) = 0. This completes terms are bounded. It follows that limM→∞ Pr(P M M the proof. Q.E.D. REFERENCES AGUIRREGABIRIA, V., AND P. MIRA (2007): “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75, 1–53. [833-835,837] MAYNARD SMITH, J. (1982): Evolution and the Theory of Games. Cambridge, U.K.: Cambridge University Press. [834] PESENDORFER, M., AND P. SCHMIDT-DENGLER (2008): “Asymptotic Least Squares Estimators for Dynamic Games,” Review of Economic Studies, 75, 901–928. [837] ROBBINS, H. (1955): “A Remark on Stirling’s Formula,” American Mathematical Monthly, 62, 26–29. [841]
Dept. of Economics, London School of Economics, Houghton Street, WC2A 2AE London, U.K.;
[email protected] and Dept. of Economics, London School of Economics, Houghton Street, WC2A 2AE London, U.K;
[email protected]. Manuscript received December, 2007; final revision received June, 2009.
Econometrica, Vol. 78, No. 2 (March, 2010), 843–844
ANNOUNCEMENTS 2010 WORLD CONGRESS OF THE ECONOMETRIC SOCIETY
THE TENTH WORLD CONGRESS of the Econometric Society will be held in Shanghai from August 17th to August 21th, 2010. It is hosted by Shanghai Jiao Tong University in cooperation with Shanghai University of Finance and Economics, Fudan University, China Europe International Business School, and the Chinese Association of Quantitative Economics. The congress is open to all economists, including those who are not now members of the Econometric Society. It is hoped that papers presented at the Congress will represent a broad spectrum of applied and theoretical economics and econometrics. The Program Co-Chairs are: Professor Daron Acemoglu, MIT Department of Economics, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A. Professor Manuel Arellano, CEMFI, Casado del Alisal 5, 28014 Madrid, Spain. Professor Eddie Dekel, Department of Economics, Northwestern University, 2003 Sheridan Rd., Evanston, IL 60208-2600, U.S.A., and Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel. Submissions will be open from November 1st, 2009 and will be accepted only in electronic form at www.eswc2010.com. The deadline for such submissions will be January 30th, 2010. There will be financial assistance for young scholars to be allocated once the decisions on submitted papers have been made. At least one co-author must be a member of the Society or must join prior to submission. This can be done electronically at www.econometricsociety.org. The Chair of the Local Organizing Committee is: Professor Lin Zhou, Department of Economics, Shanghai Jiao Tong University, Shanghai 200052, China, and Department of Economics, Arizona State University, Tempe, AZ 85287, U.S.A. Detailed information on registration and housing will be sent by email to all members of the Econometric Society in due course and will be available at www.eswc2010.com. THE 2011 NORTH AMERICAN WINTER MEETING
THE 2011 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Denver, CO, from January 7–9, 2011, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers.

The program committee invites contributions in the form of individual papers and entire sessions (of three or four papers). Each person may submit only one paper and present only one paper; however, each person is allowed to be the co-author of several papers submitted to the conference. At least one co-author must be a member of the Society or must join prior to submission. You may join the Econometric Society at http://www.econometricsociety.org.

Submissions should represent original manuscripts not previously presented at any Econometric Society regional meeting or submitted to other professional organizations for presentation at these same meetings.

Prospective contributors are invited to submit titles and abstracts of their papers by May 3, 2010 at the conference website:
https://editorialexpress.com/conference/NAWM2011
Authors who submit complete papers will be treated favorably.

The following information should also be provided electronically at the time of submission: the authors' names, affiliations, complete addresses, and telephone and fax numbers; the email addresses and websites (if any) of the submitters; the JEL primary field name and number (important, since papers will be assigned to program committee members with overlapping research interests); and the paper title.

Program Committee Chair: Markus K. Brunnermeier
Econometrica, Vol. 78, No. 2 (March, 2010), 845
FORTHCOMING PAPERS

THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica.

BRUHIN, ADRIAN, HELGA FEHR-DUDA, AND THOMAS EPPER: "Risk and Rationality: Uncovering Heterogeneity in Probability Distortion."
COMPTE, OLIVIER, AND PHILIPPE JEHIEL: "The Coalitional Nash Bargaining Solution."
ERGIN, HALUK, AND TODD SARVER: "A Unique Costly Contemplation Representation."
HASHIMOTO, TADASHI: "A Corrigendum to 'Games With Imperfectly Observable Actions in Continuous Time'."
HELPMAN, ELHANAN, OLEG ITSKHOKI, AND STEPHEN REDDING: "Inequality and Unemployment in a Global Economy."
PETERS, MICHAEL: "Noncontractible Heterogeneity in Directed Search."
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA

1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society's website: http://www.econometricsociety.org/submissions.asp. Only electronic submissions will be accepted. In exceptional cases, for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.

2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing's website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees.

3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or does so while Econometrica is considering the manuscript, it is the author's responsibility to provide Econometrica with details. There is no page fee, nor is any payment made to the authors.

4. Econometrica has the policy that all empirical and experimental results, as well as simulation experiments, must be replicable. For this purpose the Journal editors require that all authors submit the datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so, such users must acknowledge all authors as the original creators and Econometrica as the original publisher. If you have a compelling reason, we may post restrictions regarding such usage. At the same time, the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access, as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that a detailed data description and the programs used to generate the estimation datasets be deposited, as well as information on the source of the data, so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made a reasonable effort to obtain permission to make the final data used in estimation available, but were not granted permission.
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage, should be stated clearly when the paper is first submitted for review. In each case it will be at the editors' discretion whether the paper can be reviewed.

5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less.

6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance, footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand some papers must be longer, if the main body of a manuscript (excluding appendices) is more than the aforementioned length, it will typically be rejected without review.

7. Additional information that may be of use to authors is contained in the "Manual for Econometrica Authors, Revised," written by Drew Fudenberg and Dorothy Hodges and published in the July, 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftsmanship. One change from the procedures discussed in this document is that authors are not immediately told which coeditor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized.

8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords.

9. If you plan to submit a comment on an article which has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, then indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment.

10. Manuscripts on experimental economics should adhere to the "Guidelines for Manuscripts on Experimental Economics" written by Thomas Palfrey and Robert Porter, and published in the July, 1991 issue of Econometrica.

Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA.

Copyright © 2010 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Posting of an article on the author's own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email:
[email protected].

Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY, and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

Membership
Joining the Econometric Society, and paying by credit card the corresponding membership rate, can be done online at www.econometricsociety.org. Memberships are accepted on a calendar year basis, but the Society welcomes new members at any time of the year, and in the case of print subscriptions will promptly send all issues published earlier in the same calendar year.

Membership Benefits
• Possibility to submit papers to Econometrica, Quantitative Economics, and Theoretical Economics
• Possibility to submit papers to Econometric Society Regional Meetings and World Congresses
• Full text online access to all published issues of Econometrica (Quantitative Economics and Theoretical Economics are open access)
• Full text online access to papers forthcoming in Econometrica (Quantitative Economics and Theoretical Economics are open access)
• Free online access to Econometric Society Monographs, including the volumes of World Congress invited lectures
• Possibility to apply for travel grants for Econometric Society World Congresses
• 40% discount on all Econometric Society Monographs
• 20% discount on all John Wiley & Sons publications
• For print subscribers, hard copies of Econometrica, Quantitative Economics, and Theoretical Economics for the corresponding calendar year

Membership Rates
Membership rates depend on the type of member (ordinary or student), the class of subscription (print and online or online only), and the country classification (high income or middle and low income). The rates for 2010 are the following:
Ordinary Members                          High Income           Other Countries
Print and Online, 1 year (2010)           $90 / €65 / £55       $50
Online only, 1 year (2010)                $50 / €35 / £30       $10
Print and Online, 3 years (2010–2012)     $216 / €156 / £132    $120
Online only, 3 years (2010–2012)          $120 / €84 / £72      $24

Student Members                           High Income           Other Countries
Print and Online, 1 year (2010)           $50 / €35 / £30       $50
Online only, 1 year (2010)                $10 / €7 / £6         $10
Euro rates are for members in Euro area countries only. Sterling rates are for members in the UK only. All other members pay the US dollar rate. Countries classified as high income by the World Bank are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Croatia, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US).

Institutional Subscriptions
Information on Econometrica subscription rates for libraries and other institutions is available at www.econometricsociety.org. Subscription rates depend on the class of subscription (print and online or online only) and the country classification (high income, middle income, or low income).

Back Issues and Claims
For back issues and claims contact Wiley Blackwell at
[email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org
Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-998-3820; Fax 212-995-4487
General Manager: Claire Sashi (
[email protected])

2010 OFFICERS
JOHN MOORE, University of Edinburgh and London School of Economics, PRESIDENT
BENGT HOLMSTRÖM, Massachusetts Institute of Technology, FIRST VICE-PRESIDENT
JEAN-CHARLES ROCHET, Toulouse School of Economics, SECOND VICE-PRESIDENT
ROGER B. MYERSON, University of Chicago, PAST PRESIDENT
RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2010 COUNCIL
DARON ACEMOGLU, Massachusetts Institute of Technology
MANUEL ARELLANO, CEMFI
SUSAN ATHEY, Harvard University
ORAZIO ATTANASIO, University College London
DAVID CARD, University of California, Berkeley
JACQUES CRÉMER, Toulouse School of Economics
(*)EDDIE DEKEL, Tel Aviv University and Northwestern University
MATHIAS DEWATRIPONT, Free University of Brussels
DARRELL DUFFIE, Stanford University
GLENN ELLISON, Massachusetts Institute of Technology
HIDEHIKO ICHIMURA, University of Tokyo
(*)MATTHEW O. JACKSON, Stanford University
MICHAEL P. KEANE, University of Technology Sydney
LAWRENCE J. LAU, Chinese University of Hong Kong
CESAR MARTINELLI, ITAM
ANDREW MCLENNAN, University of Queensland
ANDREU MAS-COLELL, Universitat Pompeu Fabra and Barcelona GSE
AKIHIKO MATSUI, University of Tokyo
HITOSHI MATSUSHIMA, University of Tokyo
MARGARET MEYER, University of Oxford
PAUL R. MILGROM, Stanford University
STEPHEN MORRIS, Princeton University
JUAN PABLO NICOLINI, Universidad Torcuato di Tella
CHRISTOPHER A. PISSARIDES, London School of Economics
(*)ROBERT PORTER, Northwestern University
JEAN-MARC ROBIN, Université de Paris I and University College London
LARRY SAMUELSON, Yale University
ARUNAVA SEN, Indian Statistical Institute
JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editors of Econometrica (Stephen Morris), Quantitative Economics (Orazio Attanasio), and Theoretical Economics (Martin J. Osborne), and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES
Australasia: Trevor S. Breusch, Australian National University, CHAIR; Maxwell L. King, Monash University, SECRETARY.
Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER.
Far East: Hidehiko Ichimura, University of Tokyo, CHAIR.
Latin America: Pablo Andres Neumeyer, Universidad Torcuato di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY.
North America: Bengt Holmström, Massachusetts Institute of Technology, CHAIR; Claire Sashi, New York University, SECRETARY.
South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.