CONTENTS

SRIHARI GOVINDAN AND ROBERT WILSON: On Forward Induction ........ 1
JOHANNES HÖRNER AND NICOLAS VIEILLE: Public vs. Private Offers in the Market for Lemons ........ 29
RON SIEGEL: All-Pay Contests ........ 71
LANCE FORTNOW AND RAKESH V. VOHRA: The Complexity of Forecast Testing ........ 93
GARY CHAMBERLAIN AND MARCELO J. MOREIRA: Decision Theory Applied to a Linear Panel Data Model ........ 107
HIROYUKI KASAHARA AND KATSUMI SHIMOTSU: Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices ........ 135
LARS PETER HANSEN AND JOSÉ A. SCHEINKMAN: Long-Term Risk: An Operator Approach ........ 177
JONATHAN L. BURKE: Virtual Determinacy in Overlapping Generations Models ........ 235
JOHN R. CONLON: Two New Conditions Supporting the First-Order Approach to Multisignal Principal–Agent Problems ........ 249

NOTES AND COMMENTS:
BERNARD SINCLAIR-DESGAGNÉ: Ancillary Statistics in Principal–Agent Models ........ 279
SÍLVIA GONÇALVES AND NOUR MEDDAHI: Bootstrapping Realized Volatility ........ 283
BIRGIT HEYDENREICH, RUDOLF MÜLLER, MARC UETZ, AND RAKESH V. VOHRA: Characterization of Revenue Equivalence ........ 307
JUAN PABLO RINCÓN-ZAPATERO AND CARLOS RODRÍGUEZ-PALMERO: Corrigendum to “Existence and Uniqueness of Solutions to the Bellman Equation in the Unbounded Case” ........ 317

ANNOUNCEMENTS ........ 319
FORTHCOMING PAPERS ........ 325
REPORT OF THE SECRETARY ........ 327
REPORT OF THE TREASURER ........ 335
REPORT OF THE EDITORS 2007–2008 ........ 341
ECONOMETRICA REFEREES 2007–2008 ........ 347
REPORT OF THE EDITORS OF THE MONOGRAPH SERIES ........ 357
SUBMISSION OF MANUSCRIPTS TO THE ECONOMETRIC SOCIETY MONOGRAPH SERIES ........ 361

VOL. 77, NO. 1 — January, 2009
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org EDITOR STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] MANAGING EDITOR GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected] CO-EDITORS DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.;
[email protected] STEVEN BERRY, Dept. of Economics, Yale University, 37 Hillhouse Avenue/P.O. Box 8264, New Haven, CT 06520-8264, U.S.A.;
[email protected] WHITNEY K. NEWEY, Dept. of Economics, MIT, E52-262D, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.;
[email protected] WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] LARRY SAMUELSON, Dept. of Economics, Yale University, New Haven, CT 06520-8281, U.S.A.;
[email protected] HARALD UHLIG, Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.;
[email protected] ASSOCIATE EDITORS YACINE AÏT-SAHALIA, Princeton University JOSEPH G. ALTONJI, Yale University JAMES ANDREONI, University of California, San Diego DONALD W. K. ANDREWS, Yale University JUSHAN BAI, New York University MARCO BATTAGLINI, Princeton University PIERPAOLO BATTIGALLI, Università Bocconi DIRK BERGEMANN, Yale University MICHELE BOLDRIN, Washington University in St. Louis VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology J. DARRELL DUFFIE, Stanford University JEFFREY ELY, Northwestern University LARRY G. EPSTEIN, Boston University HALUK ERGIN, Washington University in St. Louis FARUK GUL, Princeton University JINYONG HAHN, University of California, Los Angeles PHILIP A. HAILE, Yale University PHILIPPE JEHIEL, Paris School of Economics YUICHI KITAMURA, Yale University PER KRUSELL, Princeton University and Stockholm University OLIVER LINTON, London School of Economics
BART LIPMAN, Boston University THIERRY MAGNAC, Toulouse School of Economics GEORGE J. MAILATH, University of Pennsylvania DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse STEVEN A. MATTHEWS, University of Pennsylvania ROSA L. MATZKIN, University of California, Los Angeles LEE OHANIAN, University of California, Los Angeles WOJCIECH OLSZEWSKI, Northwestern University ERIC RENAULT, University of North Carolina PHILIP J. RENY, University of Chicago JEAN-MARC ROBIN, Université de Paris 1 and University College London SUSANNE M. SCHENNACH, University of Chicago UZI SEGAL, Boston College CHRIS SHANNON, University of California, Berkeley NEIL SHEPHARD, Oxford University MARCIANO SINISCALCHI, Northwestern University JEROEN M. SWINKELS, Washington University in St. Louis ELIE TAMER, Northwestern University IVÁN WERNING, Massachusetts Institute of Technology ASHER WOLINSKY, Northwestern University
EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.;
[email protected] Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
Econometrica, Vol. 77, No. 1 (January, 2009), 1–28
ON FORWARD INDUCTION

BY SRIHARI GOVINDAN AND ROBERT WILSON1

A player’s pure strategy is called relevant for an outcome of a game in extensive form with perfect recall if there exists a weakly sequential equilibrium with that outcome for which the strategy is an optimal reply at every information set it does not exclude. The outcome satisfies forward induction if it results from a weakly sequential equilibrium in which players’ beliefs assign positive probability only to relevant strategies at each information set reached by a profile of relevant strategies. We prove that if there are two players and payoffs are generic, then an outcome satisfies forward induction if every game with the same reduced normal form after eliminating redundant pure strategies has a sequential equilibrium with an equivalent outcome. Thus in this case forward induction is implied by decision-theoretic criteria.

KEYWORDS: Game theory, equilibrium refinement, forward induction, backward induction.
THIS PAPER HAS TWO PURPOSES. One is to provide a general definition of forward induction for games in extensive form with perfect recall. As a refinement of weakly sequential equilibrium, forward induction restricts the support of a player’s belief at an information set to others’ strategies that are optimal replies to some weakly sequential equilibrium with the same outcome, if there are any that reach that information set. The second purpose is to resolve a conjecture by Hillas and Kohlberg (2002, Sec. 13.6), of which the gist is that an invariant backward induction outcome satisfies forward induction. A backward induction outcome is invariant if every game representing the same strategic situation (i.e., they have the same reduced normal form) has a sequential equilibrium with an equivalent outcome. For a game with two players and generic payoffs, we prove that an invariant backward induction outcome satisfies forward induction.

The definitions and theorem are entirely decision-theoretic. None of the technical devices invoked in game theory, such as perturbations of players’ strategies or payoffs, is needed.2

Sections 1 and 2 review the motivations for backward induction and forward induction. Sections 3 and 4 provide general definitions of forward induction and invariance. The formulation and proof of the theorem are in Sections 5 and 6. Section 7 examines an alternative version of forward induction, and Section 8 concludes. Appendix A proves a technical lemma and Appendix B describes a definition of forward induction for a game in normal form.
1 This work was funded in part by a grant from the National Science Foundation of the United States. We are grateful for superb insights and suggestions from a referee.

2 We retain Kreps and Wilson’s (1982) definition of sequential equilibrium that consistent beliefs are limits of beliefs induced by sequences of completely mixed strategies converging to equilibrium strategies. However, Kohlberg and Reny (1997, Theorem 3.2) established that this property is implied by elementary assumptions that represent the “fundamental property of noncooperative games, namely that no player’s strategy choice affects any other player’s choice.”

© 2009 The Econometric Society
DOI: 10.3982/ECTA6956
1. INTRODUCTION

We consider a finite game in extensive form specified by a game tree and an assignment of players’ payoffs to its terminal nodes. Throughout, we assume perfect recall, so the game tree induces a decision tree for each player. A pure strategy for a player specifies an action at each of his information sets, and a mixed strategy is a distribution over pure strategies. A mixed strategy induces a behavioral strategy that mixes anew according to the conditional distribution among actions at each information set. Kuhn (1953, Theorem 4) established for a game with perfect recall that each behavioral strategy is induced by a mixed strategy, and vice versa, inducing the same distribution on histories of play.

1.1. Backward Induction

Economic models formulated as games typically have multiple Nash equilibria. Decision-theoretic criteria are invoked to select among Nash equilibria. For a game in extensive form with perfect recall, the primary criterion is backward induction. Backward induction is invoked to eliminate Nash equilibria that depend on implausible behaviors at information sets excluded by other players’ equilibrium strategies. Thus backward induction requires that a player’s strategy remains optimal after every contingency, even those that do not occur if all players use equilibrium strategies. We assume here that backward induction is implemented by sequential equilibrium as defined by Kreps and Wilson (1982, pp. 872, 882).

A sequential equilibrium is a pair of profiles of players’ behavioral strategies and beliefs. Here we define a player’s belief to be a conditional probability system (i.e., satisfying Bayes’ rule where well defined) that at each of his information sets specifies a distribution over pure strategies that do not exclude the information set from being reached. A player’s belief is required to be consistent in that it is a limit of the conditional distributions induced by profiles of completely mixed or equivalent behavioral strategies converging to the profile of equilibrium strategies. Kreps and Wilson’s exposition differs in that for a player’s belief at an information set they used only the induced distribution over nodes in that information set. This restriction cannot be invoked here since the purpose of forward induction is to ensure that the support of a player’s belief at an information set is confined to others’ optimal strategies wherever possible, both before this information set as in their formulation in terms of nodes, and also subsequently in the continuation of the game. Thus we use throughout the more general specification that a player’s belief is over strategies.
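As a concrete illustration of Kuhn's mixed-to-behavioral correspondence cited above, the following sketch converts a mixed strategy into its induced behavioral strategy for a purely hypothetical one-player decision tree; the tree and the probabilities are illustrative only, not taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical decision tree: at information set h1 choose a or b; information set h2
# is reached only after a, where the choice is c or d.  A pure strategy fixes an action
# at both information sets; a mixed strategy is a distribution over pure strategies.
mixed = {("a", "c"): F(3, 10), ("a", "d"): F(4, 10),
         ("b", "c"): F(1, 10), ("b", "d"): F(2, 10)}

# Induced behavioral strategy: at each information set, mix according to the conditional
# distribution among the pure strategies that do not exclude that information set.
p_a = sum(p for s, p in mixed.items() if s[0] == "a")                      # behavior at h1
p_c_given_h2 = mixed[("a", "c")] / p_a                                     # behavior at h2

# Both representations induce the same distribution over histories of play (Kuhn's theorem).
assert p_a * p_c_given_h2 == mixed[("a", "c")]
assert p_a * (1 - p_c_given_h2) == mixed[("a", "d")]
assert (1 - p_a) == mixed[("b", "c")] + mixed[("b", "d")]
```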
The defining feature of a sequential equilibrium is the requirement that in the event an information set is reached, the player acting there behaves according to a strategy that in the continuation is optimal given his belief about Nature’s and other players’ strategies. A weakly sequential equilibrium as defined by Reny (1992, p. 631) is the same as a sequential equilibrium except that if a player’s strategy excludes an information set from being reached, then his continuation strategy there need not be optimal. Section 3 provides a formal definition of weakly sequential equilibrium, which is then used in our definition of forward induction.

1.2. Forward Induction

Kohlberg and Mertens (1986, Sec. 2.3) emphasized that refining Nash equilibrium to sequential equilibrium is not sufficient to ensure that behaviors are justified by plausible beliefs. One of their chief illustrations is their Proposition 6 that a stable set of Nash equilibria contains a stable set of the game obtained by deleting a strategy that is not an optimal reply to any equilibrium in the set. Kohlberg and Mertens labeled this result forward induction, but they and other authors do not define the criterion explicitly. The main idea is the one expressed by Hillas and Kohlberg (2002, Sec. 42.13.6) in their survey article: “Forward induction involves an assumption that players assume, even if they see something unexpected, that the other players chose rationally in the past,” to which one can add “and other players will choose rationally in the future.” This addendum is implicit because rationality presumes that prior actions are part of an optimal strategy. See also Kohlberg (1990) and van Damme (2002).

Studies of particular classes of two-player games with generic payoffs reveal aspects of what this idea entails. Outside-option games (as in Example 2.1 below) were addressed by van Damme (1989, p. 485). He proposed as a minimal requirement that forward induction should select a sequential equilibrium in which a player rejects the outside option if in the ensuing subgame there is only one equilibrium whose outcome he prefers to the outside option. Signaling games (as in Example 2.2 below) were addressed by Cho and Kreps (1987, p. 202). They proposed an intuitive criterion that was refined further by Banks and Sobel (1987, Sec. 3) to obtain criteria called divinity and universal divinity. These are obtained from iterative application of criteria called D1 and D2 by Cho and Kreps (1987, p. 205). Briefly, a sequential equilibrium satisfies the intuitive criterion if no type of sender could obtain a payoff higher than his equilibrium payoff were he to choose a nonequilibrium message and the receiver responds with an action that is an optimal reply to a belief that imputes zero probability to Nature’s choice of those types that cannot gain from such a deviation regardless of the receiver’s reply. The D1 criterion requires that after an unexpected message, the receiver’s belief imputes zero probability to a type of sender for which there is another type who prefers this deviation for a larger set of those responses of
the receiver that are justified by beliefs concentrated on types who could gain from the deviation and response. See also Cho and Sobel (1990) and surveys by van Damme (2002), Fudenberg and Tirole (1993, Sec. 11), Hillas and Kohlberg (2002), and Kreps and Sobel (1994). Battigalli and Siniscalchi (2002, Sec. 5) derived the intuitive criterion from an epistemic model. They said that a player strongly believes that an event is true if he remains certain of this event after any history that does not contradict this event. They consider a signaling game and a belief-complete space of players’ types; for example, one containing all possible hierarchies of conditional probability systems (beliefs about beliefs) that satisfy a coherency condition. Say that a player expects an outcome if his first-order beliefs are consistent with this outcome, interpreted as a probability distribution on terminal nodes of the game tree. They showed in Proposition 11 that an outcome of a sequential equilibrium satisfies the intuitive criterion under the following assumption about the epistemic model: The sender (1) is rational and (2) expects the outcome and believes that (2a) the receiver is rational and (2b) the receiver expects the outcome and strongly believes that (2b.i) the sender is rational and (2b.ii) the sender expects the outcome and believes the receiver is rational.
The key aspect of this condition is the receiver’s strong belief in the sender’s rationality. This implies that the receiver sustains his belief in the sender’s rationality after any message for which there exists some rational explanation for sending that message.

These contributions agree that forward induction should ensure that a player’s belief assigns positive probability only to a restricted set of strategies of other players. In each case, the restricted set comprises strategies that satisfy minimal criteria for rational play. Two prominent contributions apply such criteria iteratively. McLennan (1985, p. 901) and Reny (1992, p. 639) proposed different algorithms for iterative elimination of beliefs that are implausible according to some criterion. McLennan defined the set of justifiable equilibria iteratively by excluding a sequential equilibrium that includes a belief for one player that assigns positive probability at an information set to an action of another player that is not optimal in any sequential equilibrium in the restricted set obtained in the previous iteration. Reny defined the set of explicable equilibria iteratively by excluding a belief that assigns positive probability to a pure strategy that is not a best response to some belief in the restricted set obtained in the previous iteration. Essentially, these procedures apply variants of Pearce’s (1984) iterative procedure for identifying rationalizable strategies to the more restrictive context of sequential equilibria.

1.3. Synopsis

In Section 2 we illustrate further the motivation for forward induction via two standard examples from the literature.
Our analyses of these examples anticipate the theorem in Section 6 by showing that the result usually obtained by “forward induction reasoning” is implied by the decision-theoretic criterion called invariance. Invariance requires that the outcome should be unaffected by whether a mixed strategy is treated as a pure strategy.

In Section 3 we propose a general definition of forward induction. Its key component specifies relevant pure strategies, that is, those that satisfy minimal criteria for rational play resulting in any given outcome—the induced probability distribution on terminal nodes of the game tree and thus on possible paths of equilibrium play. Our definition says that a pure strategy is relevant if there is some weakly sequential equilibrium with that outcome for which the strategy prescribes an optimal continuation at every information set the strategy does not exclude.3 We then say that an outcome satisfies forward induction if it results from a weakly sequential equilibrium in which each player’s belief at an information set reached by relevant strategies assigns positive probability only to relevant strategies. In Section 6 we prove for general two-player games with generic payoffs that backward induction and invariance imply forward induction. Thus for such games forward induction is implied by standard decision-theoretic criteria.

3 See Govindan and Wilson (2007) for a version that obtains results for the weaker concept of relevant actions, rather than strategies. We are indebted to a referee who provided an example of an irrelevant strategy that uses only relevant actions.

2. EXAMPLES

In this section we illustrate the main ideas with two standard examples. These examples illustrate how forward induction can reject some sequential equilibria in favor of others. Each example is first addressed informally using the ‘forward induction reasoning’ invoked by prior authors. The literature provides no formal definition of forward induction and we defer the statement of our definition to Section 3, but the main idea is evident from the context. Each example is then analyzed using the decision-theoretic criterion called invariance to obtain the same result. Invariance is defined formally in Section 4 and invoked in Theorem 6.1, but in these examples and the theorem it is sufficient to interpret invariance as requiring only that the outcome resulting from a sequential equilibrium is not affected by adding a redundant pure strategy, that is, a pure strategy whose payoffs for all players are replicated by a mixture of other pure strategies.

2.1. An Outside-Option Game

The top panel of Figure 1 displays the extensive and normal forms of a two-player game consisting of a subgame with simultaneous moves that is preceded by an outside option initially available to player I. The component of Nash
FIGURE 1.—Two versions of a game with an outside option.
equilibria in which player I chooses his outside option includes an equilibrium in which player II’s strategy has probability 2/3 of his left column and therefore player I is indifferent about deviating to his top row in the subgame, whereas there is no such equilibrium justifying deviating to the bottom row. Alternatively, player I might anticipate that II will recognize rejection of the outside option as a signal that I intends to choose the top row and therefore II should respond with the left column. To apply forward induction, one excludes from the support of II’s belief the dominated strategy in which I rejects the outside option and chooses the bottom row in the subgame. If this restriction is imposed, then after I rejects the outside option, II is sure that I will play the top row and, therefore, II’s optimal strategy is the left column, and anticipating this, player I rejects the outside option. As in Hillas (1994, Figure 2), one can invoke invariance to obtain this conclusion. The bottom panel of Figure 1 shows the expanded extensive form after adjoining the redundant strategy in which, after tentatively rejecting the outside option, player I randomizes between the outside option and the top row of the subgame with probabilities 3/4 and 1/4. Player II does not observe which strategy of player I led to rejection of the outside option. In the unique sequential equilibrium of this expanded game, player I rejects both the outside option and the redundant strategy, and then chooses the top row of the final subgame. A lesson from this example is that an expanded game has imperfect information in the sense that II has imperfect observability about whether I chose the redundant strategy. This is significant for II because I retains the option to choose the bottom row in the subgame iff he rejected the redundant strategy.
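Figure 1 is not reproduced here, so as a numerical check the following sketch assumes a payoff assignment consistent with the description above: an outside option worth 2 to player I, and a subgame in which II placing probability 2/3 on his left column makes I indifferent between the top row and the outside option while the bottom row can never match it. The specific numbers are assumptions, not values read from the figure.

```python
# Assumed payoffs for Example 2.1 (illustrative only): outside option -> (2, 2);
# simultaneous-move subgame with rows Top/Bottom for I and columns Left/Right for II.
subgame = {("Top", "Left"): (3, 1), ("Top", "Right"): (0, 0),
           ("Bottom", "Left"): (0, 0), ("Bottom", "Right"): (1, 3)}
outside = (2, 2)

def payoff_I(row, p_left):
    """Player I's expected subgame payoff when II plays Left with probability p_left."""
    return p_left * subgame[(row, "Left")][0] + (1 - p_left) * subgame[(row, "Right")][0]

# II mixing 2/3 on Left makes I indifferent between Top and the outside option,
# whereas no mixture by II makes Bottom worth as much as the outside option.
assert abs(payoff_I("Top", 2 / 3) - outside[0]) < 1e-12
assert max(payoff_I("Bottom", p / 100) for p in range(101)) < outside[0]

# Forward induction (or the invariance argument with the redundant 3/4-1/4 strategy):
# once II's belief after rejection of the outside option is concentrated on Top,
# II's best reply is Left, and anticipating Left, I strictly prefers Top to opting out.
best_reply_II = max(["Left", "Right"], key=lambda col: subgame[("Top", col)][1])
assert best_reply_II == "Left"
assert payoff_I("Top", 1.0) > outside[0]
```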
Even though subgame perfection could suffice in the original and expanded game for this simple example, in general one needs sequential equilibrium to analyze the expanded game—as will be seen later in Figure 4 of Section 7. Addition of redundant strategies can alter equilibrium strategies, but if invariance is satisfied, then the induced probabilities of actions along equilibrium paths of the original game are preserved and thus so too is the predicted outcome. One could explicitly map equilibria of an expanded game into induced behavioral probabilities of actions in the original game, but we omit this complication.

2.2. A Signaling Game

The top panel of Figure 2 displays the two-player two-stage signaling game Beer–Quiche studied by Cho and Kreps (1987, Sec. II) and discussed further by Kohlberg and Mertens (1986, Sec. 3.6.B) and Fudenberg and Tirole (1993, Sec. 11.2).
FIGURE 2.—Two versions of the Beer–Quiche game.
Consider sequential equilibria with the outcome QQ-R; that is, both types W and S of player I (the sender) choose Q and player II (the receiver) responds to Q with R and to B with a probability of F that is ≥ 1/2. The equilibria in this component are sustained by player II’s belief after observing B that imputes to I’s type W a greater likelihood of having deviated than to type S. In all these equilibria, B is not an optimal action for type W. But in the equilibrium for which player II assigns equal probabilities to W and S after observing B and mixes equally between F and R, type S is indifferent between Q and B. If II recognizes this as the source of I’s deviation, then he infers after observing B that I’s type is S and therefore chooses R. Alternatively, if player I’s type is S, then he might deviate to B in hopes that this action will credibly signal his type, since his equilibrium payoff is 2 from Q but he obtains 3 from player II’s optimal reply R if the signal is recognized, but type W has no comparable incentive to deviate—this is the “speech” suggested by Cho and Kreps (1987, pp. 180–181) to justify their intuitive criterion.

One applies forward induction by excluding from the support of II’s belief after observing B those strategies that take action B when I’s type is W. In fact, the sequential equilibria in which both types of I choose Q do not survive this restriction on II’s belief because II’s optimal response to B is then R, which makes it advantageous for player I’s type S to deviate by choosing B. Thus sequential equilibria with the outcome QQ-R do not satisfy forward induction. This leaves only sequential equilibria with the outcome BB-R in which both types of player I choose B and II chooses R after observing B.

As in Example 2.1, one can obtain this same conclusion by invoking invariance. The bottom panel of Figure 2 shows the extensive form after adjoining a mixed action X for type S of player I that produces a randomization between B and Q with probabilities 1/9 and 8/9. Denote by BQ player I’s pure strategy that chooses B if his type is W and chooses Q if his type is S, and similarly for his other pure strategies. The normal form of this expanded game is shown in Table I with all payoffs multiplied by 10 (we intentionally omit the pure strategy BX to keep the analysis simple). Now consider the following extensive form that has the same reduced normal form. Player I initially chooses whether or not to use his pure strategy QQ, and if not then subsequently he chooses among his other pure strategies BB, BQ, QB, and QX. After each of these five pure
TABLE I
STRATEGIC FORM OF THE BEER–QUICHE GAME WITH THE REDUNDANT STRATEGY QX

(Rows are player I's strategies, written as his action when his type is W followed by his action when his type is S; columns are player II's strategies, written as his response to B followed by his response to Q.)

            B: F, Q: F   B: F, Q: R   B: R, Q: F   B: R, Q: R
  B B           9, 1         9, 1        29, 9        29, 9
  B Q           0, 1        18, 10        2, 0        20, 9
  Q B          10, 1        12, 0        28, 10       30, 9
  Q Q           1, 1        21, 9         1, 1        21, 9
  Q X           2, 1        20, 8         4, 2        22, 9
strategies, the extensive form in the bottom panel ensues, but with I’s action dictated by his prior choice of a pure strategy. That is, nature chooses I’s type to be W or S, the selected pure strategy dictates the subsequent choice of B or Q, and then player II (still having observed only which one of B or Q was chosen) chooses F or R. At player I’s information set where, after rejecting QQ, he chooses among his other pure strategies, a sequential equilibrium requires that he assigns zero probability to BQ because it is strictly dominated by QX in the continuation. At player II’s information set, after observing B, a sequential equilibrium requires that his behavioral strategy is an optimal reply to some consistent belief about those strategies of player I that reach this information set. But every mixture of I’s pure strategies BB, QB, and QX implies that, given his choice of B, the induced conditional probability that his type is S exceeds 9/10. Therefore, player II’s reply to B must be R in every sequential equilibrium of this game. Hence the sequential equilibria with outcome QQ-R are inconsistent with invariance, in agreement with failure to satisfy forward induction. A similar analysis applies to the game in Figure 3, which resembles games considered in studies of signaling via costly educational credentials in labor markets as in Spence (1974) and Kreps (1990, Sec. 17.2). In this game, forward induction rejects the outcome of the pooling equilibrium in which both types of player I move right and II responds to right with up (and to left with probability of up ≥1/2), and accepts the outcome of the separating equilibrium in which only I’s top type moves right. The most common use of forward induction in economic models is to reject pooling equilibria in favor of a separating equilibrium, although often what is actually assumed is a weaker implication of forward induction. The way in which invariance is invoked in Example 2.2 is indicative of the proof of Theorem 6.1 in Section 6.
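As a numerical check on this argument, the following sketch recomputes two entries of Table I from an assumed Beer–Quiche parameterization (prior 0.1 on W and 0.9 on S, a payoff of 1 for the type's preferred breakfast plus 2 for avoiding a fight, and 1 to the receiver for fighting W or retreating against S; these values are inferred from Table I and the discussion above rather than stated explicitly), and then verifies that every mixture of BB, QB, and QX gives the receiver a posterior on S of at least 9/10 after observing B.

```python
from fractions import Fraction as F
from itertools import product

P_W, P_S = F(1, 10), F(9, 10)                      # assumed prior over sender types

def sender_payoff(t, meal, fight):
    preferred = {"W": "Q", "S": "B"}[t]             # W prefers quiche, S prefers beer
    return (1 if meal == preferred else 0) + (0 if fight else 2)

def receiver_payoff(t, fight):
    return 1 if (fight and t == "W") or (not fight and t == "S") else 0

# Sender strategies as the probability of choosing B for each type; QX randomizes 1/9-8/9.
sender = {"BB": {"W": F(1), "S": F(1)}, "BQ": {"W": F(1), "S": F(0)},
          "QB": {"W": F(0), "S": F(1)}, "QQ": {"W": F(0), "S": F(0)},
          "QX": {"W": F(0), "S": F(1, 9)}}
receiver = {"FF": ("F", "F"), "FR": ("F", "R"), "RF": ("R", "F"), "RR": ("R", "R")}

def cell(s, r):
    uI = uII = F(0)
    for t, prior in (("W", P_W), ("S", P_S)):
        for meal, p in (("B", sender[s][t]), ("Q", 1 - sender[s][t])):
            fight = receiver[r][0 if meal == "B" else 1] == "F"
            uI += prior * p * sender_payoff(t, meal, fight)
            uII += prior * p * receiver_payoff(t, fight)
    return 10 * uI, 10 * uII                        # Table I multiplies payoffs by 10

assert cell("BB", "RR") == (29, 9) and cell("QX", "RF") == (4, 2)   # spot-check Table I

# Any mixture of BB, QB, QX that reaches information set B leaves P(S | B) >= 9/10,
# so R is the receiver's best reply to B, as claimed in the text.
for a, b, c in product(range(4), repeat=3):
    if a + b + c == 0:
        continue
    pB_W = F(a, a + b + c)                          # only BB sends B when the type is W
    pB_S = F(a, a + b + c) + F(b, a + b + c) + F(c, a + b + c) * F(1, 9)
    posterior_S = P_S * pB_S / (P_W * pB_W + P_S * pB_S)
    assert posterior_S >= F(9, 10)
```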
FIGURE 3.—A signaling game with pooling and separating equilibria.
3. DEFINITION OF FORWARD INDUCTION

In this section we propose a general definition of forward induction for a game in extensive form with perfect recall. Our definition of forward induction relies on the solution concept called weakly sequential equilibrium by Reny (1992, p. 631). Recall from Section 1.1 that a weakly sequential equilibrium is the same as a sequential equilibrium except that a player’s strategy need not be optimal at information sets it excludes. See Reny (1992, Sec. 3.4) for an expanded justification of weakly sequential equilibrium as the right concept for analysis of forward induction.

Our definition differs from Reny’s in that we interpret players’ beliefs as specifying distributions over others’ strategies. Beliefs over strategies typically encode more information than necessary to implement sequential rationality, that is, as in Kreps and Wilson (1982), the conditional distribution over nodes in an information set suffices to verify optimality. However, it is only from a belief specified as a conditional distribution over strategies that one can verify whether a player’s belief recognizes the rationality of others’ strategies. As Examples 2.1 and 2.2 illustrate, the purpose of forward induction as a refinement is to reject outcomes that deter player I’s deviation by the threat of II’s response that is optimal for II only because his belief does not recognize I’s deviation as part of an optimal strategy for some equilibrium with the same outcome. To reject such outcomes, it is sufficient that the support of II’s belief is confined to I’s pure strategies that are optimal replies at information sets they do not exclude.

The following definition is the analog of the definitions in Kreps and Wilson (1982) and Reny (1992).

DEFINITION 3.1—Weakly Sequential Equilibrium: A weakly sequential equilibrium is a pair (b, μ) of profiles of players’ behavioral strategies and beliefs. At each information set h_n of player n, his behavioral strategy specifies a distribution b_n(·|h_n) over his feasible actions, and his belief specifies a distribution μ_n(·|h_n) over profiles of Nature’s and other players’ pure strategies that enable h_n to be reached. These profiles are required to satisfy the following conditions:
(i) Consistency: There exists a sequence {b^k} of profiles of completely mixed behavioral strategies converging to b and a sequence {σ^k} of completely mixed equivalent normal-form strategies such that for each information set of each player the conditional distribution specified by μ is the limit of the conditional distributions obtained from {σ^k}.4
(ii) Weak Sequential Rationality: For each player n and each information set h_n that b_n does not exclude, each action in the support of b_n(·|h_n) is part of a pure strategy that is an optimal reply to μ_n(·|h_n) in the continuation from h_n.

4 In this definition, Nature’s strategy is not perturbed. Also, the belief μ_n(·|h_n) might entail correlation, as observed by Kreps and Ramey (1987).
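A minimal numerical sketch of the consistency condition (i), using the outside-option game of Example 2.1 and purely hypothetical tremble rates, shows how a belief at an unreached information set arises as a limit of conditional distributions induced by completely mixed strategies, and why consistency alone does not pin the belief down.

```python
# Player I's pure strategies: Out, In-Top, In-Bottom.  Player II's belief at the
# information set reached when I rejects the outside option is the limit of the
# conditional distributions induced by completely mixed strategies for I.
def belief_after_in(p_out, p_top, p_bottom):
    """Conditional distribution over (In-Top, In-Bottom), given that Out was rejected."""
    reach = p_top + p_bottom
    return p_top / reach, p_bottom / reach

# Trembles of the same order leave II's limiting belief split evenly ...
for eps in (1e-2, 1e-4, 1e-6):
    assert belief_after_in(1 - 2 * eps, eps, eps) == (0.5, 0.5)

# ... while trembles that make In-Bottom far less likely concentrate the limiting
# belief on In-Top.  Consistency admits both; forward induction is what excludes
# weight on In-Bottom when only In-Top can be part of an optimal strategy.
top, bottom = belief_after_in(1 - 1e-4 - 1e-8, 1e-4, 1e-8)
assert top > 0.9999
```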
A sequential equilibrium is defined exactly the same except that each player’s actions must be optimal at all his information sets, including those excluded by his equilibrium strategy.

We interpret forward induction as a property of an outcome of the game, defined as follows.

DEFINITION 3.2—Outcome of an Equilibrium: The outcome of an equilibrium of a game in extensive form is the induced probability distribution over the terminal nodes of the game tree.

A key feature in the definition of forward induction is the concept of a relevant strategy.

DEFINITION 3.3—Relevant Strategy: A pure strategy of a player is relevant for a given outcome if there is a weakly sequential equilibrium with that outcome for which the strategy at every information set it does not exclude prescribes an optimal continuation given the player’s equilibrium belief there.

Thus a relevant strategy is optimal for some expectation about others’ equilibrium play with that outcome and his beliefs at events after their deviations. For instance, in Example 2.2 of a signaling game, the strategy QB of the sender I in which type W chooses Q and type S chooses B is relevant for the outcome QQ-R because it is an optimal reply to the weakly sequential equilibrium with that outcome in which the receiver II responds to B by using F and R with equal probabilities. But the strategies BB and BQ are irrelevant because B is not an optimal reply for I’s type W to any weakly sequential equilibrium with outcome QQ-R.

For the standard examples in Section 2 it is sufficient to interpret forward induction as requiring merely that player II’s belief at the information set excluded by I’s equilibrium strategy imputes positive probability only to the node reached by I’s nonequilibrium relevant strategy. For general games, however, a stronger requirement is desirable. We propose a general definition of forward induction that identifies those outcomes resulting from the conjunction of rational play and belief that others’ play is rational, and thus minimally consistent with Battigalli and Siniscalchi’s (2002) epistemic model of strong belief in rationality. Because a relevant strategy is optimal, hence rational, in some weakly sequential equilibrium with the same outcome, the relevant strategies are the minimal set for which one can require the support of one player’s belief to recognize the rationality of other players’ strategies—indeed that is the lesson from the standard examples in Section 2. Our proposed definition of a forward induction outcome therefore requires that the outcome results from a weakly sequential equilibrium in which every player maintains the hypothesis that other players are using relevant strategies throughout the game, as long as that hypothesis is tenable.
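The relevance claims above for the Beer–Quiche strategies can be checked directly. The sketch below uses the same assumed payoff parameterization as before (1 for the type's preferred breakfast, 2 for avoiding a fight) and fixes the receiver's on-path response R to Q, as the outcome QQ-R requires.

```python
# Sender's continuation payoff when the receiver answers Q with R (fixed by the outcome
# QQ-R) and answers B with F with probability q; payoffs are the assumed standard ones.
def sender_payoff(t, meal, q):
    preferred = {"W": "Q", "S": "B"}[t]
    breakfast = 1 if meal == preferred else 0
    p_not_fought = 1.0 if meal == "Q" else 1 - q
    return breakfast + 2 * p_not_fought

# Type W strictly prefers Q for every response to B, so any strategy in which type W
# chooses B (that is, BB or BQ) is irrelevant for the outcome QQ-R.
assert all(sender_payoff("W", "Q", q / 10) > sender_payoff("W", "B", q / 10)
           for q in range(11))

# Type S is exactly indifferent when the receiver mixes equally between F and R after B,
# so B is an optimal continuation for S in that weakly sequential equilibrium: QB is relevant.
assert sender_payoff("S", "B", 0.5) == sender_payoff("S", "Q", 0.5) == 2.0
```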
Thus forward induction is applied only to information sets reached by profiles of relevant strategies:

DEFINITION 3.4—Relevant Information Set: An information set is relevant for an outcome if it is not excluded by every profile of strategies that are relevant for that outcome.5

Then we define a forward induction outcome as follows.6

DEFINITION 3.5—Forward Induction: An outcome satisfies forward induction if it results from a weakly sequential equilibrium in which at every information set that is relevant for that outcome the support of the belief of the player acting there is confined to profiles of Nature’s strategies and other players’ strategies that are relevant for that outcome.

5 This differs from Kuhn’s (1953, Definition 6) and Reny’s (1992, p. 631) definition of an information set that is relevant for a pure strategy because the information set is not excluded by that strategy.

6 For readers who prefer normal-form analysis, Definition B.1 in Appendix B proposes a slightly stronger definition applied to the normal form of the game that is equivalent to the definition proposed here when payoffs are generic and that enables an analog of Theorem 6.1.

Section 7 compares this definition with Reny’s alternative interpretation.

Applied to the standard examples in Section 2, our definition yields the conclusions obtained from “forward induction reasoning” in the literature. For instance, in Example 2.2 of a signaling game, the outcome QQ-R does not satisfy forward induction because the definition requires that after observing B, player II assigns zero probability to I’s irrelevant strategies BB and BQ, and thus assigns positive probability only to I’s relevant strategy QB that enables II’s information set B to be reached.

For signaling games in general, forward induction implies the intuitive criterion, D1, D2 (whose iterative version defines universal divinity), and Cho and Kreps’ (1987, Sec. IV.5) strongest criterion, called never weak best response (NWBR) for signaling games. These implications are verified by showing that a strategy s of the sender is irrelevant if s prescribes that his type t sends a message m that is not sent by any type in some weakly sequential equilibrium with the given outcome, and the pair (t, m) satisfies any of these criteria. For instance, the criterion NWBR excludes the strategy s from the receiver’s belief if the continuation strategy m at information set t yields exactly the sender’s type-contingent payoff from the given outcome for some beliefs and optimal responses of the receiver only when some other type t′ that could send m would get a type-contingent payoff that is higher than from the designated outcome for the same or a larger set of the receiver’s optimal responses. But this condition implies that there is no weakly sequential equilibrium with the same outcome for which m is an optimal action for type t. Were there such an equilibrium, the receiver could use any such response at the off-the-equilibrium-path
information set m, but then type t′ could obtain a superior payoff by sending m. Thus, m cannot be an optimal continuation by type t in any weakly sequential equilibrium with the given outcome, and therefore s is an irrelevant strategy.

For general games in extensive form, forward induction implies a version of NWBR that Fudenberg and Tirole (1993, p. 454) attributed to Kohlberg and Mertens (1986, Proposition 6). A pure strategy that is an inferior reply to every equilibrium with a given outcome chooses an inferior action at some information set that intersects a path of equilibrium play. Such a strategy is irrelevant for that outcome according to Definition 3.3 and, therefore, if the outcome satisfies forward induction according to Definition 3.5, then it results from a weakly sequential equilibrium in which, at every information set that is relevant for that outcome, the support of the belief of the player acting there assigns zero probability to this irrelevant strategy.

4. DEFINITION OF INVARIANCE

In this section we define invariance as a property of a solution concept. First we define relations of equivalence between games and between outcomes of equivalent games. Recall that a player’s pure strategy is redundant if its payoffs for all players are replicated by a mixture of his other pure strategies. From the normal form of a game one obtains its reduced normal form by deleting redundant strategies. Thus the reduced normal form is the minimal representation of the essential features of the strategic situation.

DEFINITION 4.1—Equivalent Games: Two games are equivalent if their reduced normal forms are the same up to relabeling of strategies.

As specified in Definition 3.2, the outcome of an equilibrium of a game in extensive form is the induced probability distribution on terminal nodes and thus on the paths through the tree. Associated with each outcome is a set of profiles of Nature’s and players’ mixed strategies that result in the outcome, and in turn each such profile can be replicated by a profile of mixed strategies in the reduced normal form. Hence we define equivalent outcomes as follows.

DEFINITION 4.2—Equivalent Outcomes of Equivalent Games: Outcomes of two equivalent games are equivalent if they result from the same profile of mixed strategies of their reduced normal form.

Trivially, the outcome of any Nash equilibrium is equivalent to the outcome of a Nash equilibrium of any equivalent game. For any solution concept that is a refinement of Nash equilibrium, we define invariance as follows.

DEFINITION 4.3—Invariant Outcome: An outcome is invariant for a solution concept if every equivalent game has an equivalent outcome of an equilibrium selected by the solution concept.
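Redundancy can be verified mechanically. The following sketch checks from Table I that QX is redundant in the expanded Beer–Quiche game: against every pure strategy of the receiver, its payoffs for both players are replicated by the mixture 8/9 QQ + 1/9 QB (the weights corresponding to X randomizing 1/9 on B and 8/9 on Q).

```python
from fractions import Fraction as F

# Rows of Table I (payoffs to I and II, times 10), columns ordered FF, FR, RF, RR.
table = {
    "QQ": [(1, 1), (21, 9), (1, 1), (21, 9)],
    "QB": [(10, 1), (12, 0), (28, 10), (30, 9)],
    "QX": [(2, 1), (20, 8), (4, 2), (22, 9)],
}

# QX is redundant: for every receiver strategy, both players' payoffs from QX equal
# the corresponding 8/9-1/9 mixture of the payoffs from QQ and QB.
for col in range(4):
    for player in range(2):
        mixture = F(8, 9) * table["QQ"][col][player] + F(1, 9) * table["QB"][col][player]
        assert mixture == table["QX"][col][player]
```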
This definition is used in Section 6 where the solution concept is sequential equilibrium. Existence of invariant outcomes of sequential equilibria for generic games in extensive form with perfect recall is implied by the existence of stable sets as defined by Mertens (1989).7

7 In Govindan and Wilson (2006) we proved that if a solution concept satisfies invariance and a condition called strong backward induction, then it selects sets of equilibria that are stable in the weaker sense defined by Kohlberg and Mertens (1986, Sec. 3.5).

5. FORMULATION

In this section we introduce notation used in the proof of Theorem 6.1 in Section 6. Let Γ be a game in extensive form with perfect recall. For each player n let H_n be the collection of his information sets, and let S_n, B_n, and Σ_n be his sets of pure, behavioral, and mixed strategies. A pure strategy chooses an action at each information set in H_n, a behavioral strategy chooses a distribution over actions at each information set, and a mixed strategy chooses a distribution over pure strategies. We say that a pure strategy enables an information set if the strategy’s prior actions do not exclude the information set from being reached, and similarly for behavioral and mixed strategies.

Let P be the outcome of a sequential equilibrium of the game Γ. Say that a pure strategy or an information set is P-relevant if it is relevant for the outcome P. Let Σ(P) and B(P) be the sets of Nash equilibria of Γ represented as profiles of mixed and behavioral strategies, respectively, that result in the outcome P. Also, let BM(P) be the set of weakly sequential equilibria whose outcome is P, where each (b, μ) ∈ BM(P) consists of a profile b ∈ B(P) of players’ behavioral strategies and a profile μ of players’ consistent beliefs. As in Definition 3.1, in each weakly sequential equilibrium (b, μ) the belief of a player n at his information set h_n ∈ H_n is a probability distribution μ_n(·|h_n) over Nature’s and other players’ pure strategies that reach h_n. A pure strategy s_n ∈ S_n for player n is optimal in reply to (b, μ) if at each information set of n enabled by s_n, the action specified there by s_n is optimal given (b, μ).

Given an outcome P, say that a P-path through the game tree is one that terminates at a node in the support of P. Actions on P-paths are called equilibrium actions. Let H_n(P) be the collection of player n’s information sets that intersect P-paths. Obviously, every equilibrium in B(P) prescribes the same mixture at each information set in H_n(P). Let S_n(P) ⊂ S_n comprise those pure strategies s_n of player n such that s_n chooses an equilibrium action at every information set in H_n(P) that s_n enables. Note that if σ ∈ Σ(P), then the support of σ_n is contained in S_n(P). Moreover, every strategy in S_n(P) is optimal against every equilibrium in Σ(P). Partition the complement T_n(P) ≡ S_n \ S_n(P) into subsets R_n and Q_n of n’s pure strategies that are P-relevant and P-irrelevant, respectively. Note that S_n(P) may also contain P-irrelevant strategies.
Define an equivalence relation among player n’s pure strategies as follows. Two strategies are equivalent if they prescribe the same action at each information set in Hn (P). Let En (P) be the set of equivalence classes. Denote a typical element of En (P) by En and let En (sn ) be the equivalence class that contains sn . Let En◦ (P) be the subcollection of equivalence classes that contain strategies in Sn (P). Thus, any strategy that is used in some equilibrium in Σ(P) belongs to some equivalence class in En◦ (P), while any strategy that is in Tn (P) does not. (In Example 2.2, for P = QQ − R and player n = I, Sn (P) = {QQ}, Rn (P) = {QB}, En (P) = {QQ QB BQ BB}, and En◦ (P) = {QQ}.) If Rn is not empty, then for each probability δ ∈ (0 1) let tnδ be a mixed strategy of n of the form [1 − δ]sn∗ + δρn , where sn∗ is a strategy in Sn (P) and ρn is a mixed strategy whose support is Rn . Since sn∗ is a best reply against every equilibrium in Σ(P), tnδ is an approximate best reply against equilibria in Σ(P) when δ is small, a fact we need in the next section. Define a game Gδ in normal form by adding to the normal form G of Γ the redundant pure strategy tnδ for each player n for whom Rn is not empty. In particular, tnδ is added iff there is some information set in Hn (P) where some nonequilibrium action is part of a P-relevant strategy. (In Example 2.1, the redundant strategy tnδ for player I is the one shown in Figure 1 with parameter δ = 1/4; in Example 2.2, it is the strategy QX with parameter δ = 1/9.) Next we define a game Γ δ in extensive form with perfect recall whose normal form is equivalent to Gδ , and thus also Γ δ is equivalent to Γ . A path of play in Γ δ consists of choices by players in an initial stage, followed by a path of play in a copy of Γ . In the subsequent play of Γ , no player is informed about choices made by other players in the initial stage of Γ δ . The rules of Γ δ are the following. If Rn is empty, then in Γ δ player n chooses among all his equivalence classes in En (P) in the initial stage. If Rn is not empty, then in the initial stage he first chooses whether to play an equivalence class in En◦ (P) or not. If he decides to play something in En◦ (P), then he chooses one of these equivalence classes; if he chooses not to, then he proceeds to a second information set where he chooses to play either the redundant pure strategy tnδ or an equivalence class among those not in En◦ (P). After these initial stages for all players, Γ δ evolves the same as Γ does, that is, a copy of Γ follows each sequence of choices in the initial stage. In Γ δ the information sets in Γ are expanded to encompass appropriate copies of Γ to represent that no player ever observes what others chose in the initial stage; thus, the information revealed in Γ δ is exactly the same as in Γ . The information set hn ∈ Hn in Γ has in Γ δ for each En ∈ En (P) an expanded copy hδn (En ) and a copy hδn (tnδ ). Nature makes the choice at the expansions of those information sets in Hn (P) (but not at expansions of those in Hn \Hn (P)) according to the equivalence class chosen in the initial stage or at all expansions of information sets in Hn if tnδ was chosen. That is, if n chooses tnδ at the second information set, then Nature automatically implements the entire strategy; but if he chooses some equivalence class En in En (P), then Nature implements actions prescribed by En at each hδn (En ) when hn ∈ Hn (P)
and leaves it to him to choose at those that are not, if and when they occur. (If Γ is the Beer–Quiche game and δ = 1/9, then tnδ = QX and Γ δ is the game in which player I chooses whether to play QQ, and if not then he chooses among BB, BQ, QB, and QX, as described in the text of Example 2.2.) One can interpret player n’s choice of an equivalence class from, say, En◦ (P) as equivalent to making an initial commitment to take a specified equilibrium action at every information set in Hn (P), leaving the actions at n’s other information sets unspecified until those information sets are reached. A pure strategy sn ∈ Sn can be implemented in Γ δ by first choosing En (sn ) in the initial stage and then making the choices prescribed by sn at all hn ∈ Hn \Hn (P), and any strategy in Γ δ that begins by choosing some equivalence class En in the first stage ends up implementing some sn ∈ En . Observe too that the redundant pure strategy tnδ , when available, ends up implementing the mixture given by tnδ . Thus, it is obvious that Gδ is obtained from the normal form of Γ δ by deleting some redundant pure strategies in the latter that are duplicates of other pure strategies. Hence, by a slight abuse of notation, we view Gδ as the normal form of Γ δ . The game Γ δ is now easily seen to be equivalent to Γ . Suppose hn ∈ Hn \Hn (P). If an equivalence class En contains a pure strategy sn that enables hn in Γ , then in Γ δ the corresponding strategy sn —that is, / Hn (P)— choosing En in the initial stage and then making sn ’s choices at all hn ∈ enables hδn (En ). Conversely, if En does not contain such an sn , then there is an information set vn ∈ Hn (P) that precedes hn , is enabled by En , and where En makes a choice different from the one that leads to hn . Thus, in Γ δ , Nature’s choice at vnδ (En ) prevents hδn (En ) from being reached. Therefore, to analyze the game Γ δ , we need to consider only information sets hδn (En ) where En contains a strategy that enables hn in Γ . For simplicity in this section and the next, by an information set hδn (En ) in Γ δ of player n, we mean an hn ∈ Hn \Hn (P) and an En that contains a strategy that enables hn in Γ . Now assume the game has two players. We use m to denote the opponent of player n. Suppose that En and En are two equivalence classes that contain strategies that enable some hn ∈ / Hn (P). The information that n has at hδn (En ) δ and hn (En ) about m’s choices are the same at both information sets. Therefore, a pure strategy of m in Gδ enables one if and only if it enables the other. In particular, in a sequential equilibrium (b˜ δ μ˜ δ ) of Γ δ , player n’s belief at hδn (En ) is independent of En and can thus be denoted μ˜ δn (·|hn ). Likewise, suppose σ˜ mδ is a mixed strategy of m in Gδ that enables some information set hδn (En ) of n. Then σ˜ mδ induces a conditional distribution τ˜ mδ over the pure strategies of m in Gδ that enable hδn (En ). Let σm and τm be the equivalent strategies in G. It is easily checked that τm is the conditional distribution induced by σm over the pure strategies that enable hn , and an action an at hδn (En ) in Γ δ is optimal against τ˜ mδ iff it is optimal against τm in Γ .
6. STATEMENT AND PROOF OF THE THEOREM

In this section we show for two-player games with generic payoffs that an invariant backward induction outcome satisfies forward induction. The notion of genericity we invoke is the following. Let G be the space of all games generated by assigning payoffs to the terminal nodes of a fixed two-player game tree. In Govindan and Wilson (2001) we showed that there exists a closed lower-dimensional subset G0 such that for each game not in G0 there are finitely many outcomes of Nash equilibria. For technical reasons, in Appendix A we construct another closed lower-dimensional subset denoted G1. Now a game is generic if it is in the complement of both G0 and G1. With this, the formal statement of our theorem is the following:

THEOREM 6.1: An outcome of a two-player game with perfect recall and generic payoffs satisfies forward induction if it is invariant for the solution concept sequential equilibrium.

PROOF: Assume that Γ is a two-player game in extensive form with perfect recall and generic payoffs. Assume also that P is an invariant sequential equilibrium outcome of Γ, that is, each game equivalent to Γ has a sequential equilibrium whose outcome is equivalent to P. Because Γ is equivalent to the game Γ^δ defined in Section 5, Γ^δ has a sequential equilibrium (b̃^δ, μ̃^δ) whose outcome is equivalent to P. Because μ̃^δ is consistent, there exists a sequence {b̃^δ_ε} of profiles of completely mixed behavioral strategies that converges as ε ↓ 0 to b̃^δ and a corresponding equivalent sequence {σ̃^δ_ε} of profiles of completely mixed strategies in the normal form G^δ that converges to some profile σ̃^δ and such that the belief profile μ̃^δ is the limit of the beliefs derived from the sequence {σ̃^δ_ε}. Since b̃^δ induces an outcome that is equivalent to P, the strategy σ̃^δ, which is equivalent to b̃^δ, has its support in S_n(P) for each n: indeed, a strategy in T_n(P), or the strategy t_n^δ when available, chooses a nonequilibrium action at some h_n ∈ H_n(P) in Γ that it enables. Therefore, under b̃^δ each player n in the initial stage assigns positive probability only to choices of equivalence classes in E_n◦(P).

Corresponding to the sequence {σ̃^δ_ε}, there is an equivalent sequence {σ^δ_ε} of profiles of mixed strategies in the normal form of Γ for which there is an equivalent sequence {b^δ_ε} of profiles of behavioral strategies in the extensive form of Γ. Let μ^δ_ε be the profile of beliefs induced by σ^δ_ε. Denote selected limit points of these sequences by σ^δ, b^δ, and μ^δ. By construction, σ^δ ∈ Σ(P), b^δ ∈ B(P), and μ^δ is consistent. It follows from our remarks at the end of the previous section that for each n and h_n ∉ H_n(P), μ^δ_n(·|h_n) is equivalent to μ̃^δ_n(·|h_n), and an action at h_n is optimal in Γ against μ^δ_n(·|h_n) iff it is optimal in Γ^δ at h_n^δ(E_n) against μ̃^δ_n(·|h_n) for the corresponding copies in Γ^δ.
Next we argue that (bδ μδ ) is a weakly sequential equilibrium of Γ . Let hn be an information set of player n that bδn enables. We need to show that the choice made by bδn at hn is optimal against μδn (·|hn ). If hn belongs to Hn (P), then μn (·|hn ) is derived from σmδ and obviously bδn chooses optimally at hn . / Hn (P). Let an be an arbitrary action at hn that is chosen Suppose that hn ∈ with positive probability by bδn . Since hn is enabled by bδn there exists a pure strategy sn in the support of σ δ that enables hn and chooses an there. Since σ δ is equivalent to σ˜ δ , in b˜ δ player n with positive probability chooses En (sn ) and then makes the choices prescribed by sn . Sequential rationality of an at hδn (En ) implies its optimality against μ˜ δn (·|hn ). Hence, from the previous paragraph an is optimal against μδn (·|hn ) in Γ . Since an was arbitrary, this shows that (bδ μδ ) is a weakly sequential equilibrium of Γ . For some sequence δ ↓ 0, (σ δ bδ μδ ) converges to some limit point (σ b μ). Clearly, σ ∈ Σ(P), b ∈ B(P), μ is consistent, and (b μ) ∈ BM(P) is a weakly sequential equilibrium of Γ because BM(P) is a closed set. It remains to prove the forward induction property for the belief profile μ. δ } be the sequence For each n for whom Rn is not empty, and each δ, let {τ˜ nε δ δ of conditional distributions over Tn ≡ Tn (P) ∪ {tn } induced by the sequence δ {σ˜ εδ } and let τ˜ nδ be a limit point. The sequence {τ˜ nε } and therefore its limit are determined by choices made after n chooses in the initial stage to avoid equivalence classes in En◦ (P). Therefore, the probability of tnδ is nonzero under τ˜ nδ iff tnδ is chosen with positive probability at n’s second information set in the initial stage, and the probability of sn ∈ Tn (P) is nonzero under τ˜ nδ iff n chooses E(sn ) with positive probability at this stage and then implements the choices of sn with positive probability after this choice. Express τ˜ nδ as a convex combination α˜ δn tnδ + [1 − α˜ δn ]τˆ nδ where the support of τˆ nδ is contained in Tn (P). Then sequential rationality at the initial stage after rejecting equivalence classes in En◦ (P) and at subsequent information sets have the following two implications. First, α˜ δn is nonzero only if tnδ is at least as good a reply as each sn ∈ Tn (P) against bδm . Second, if α˜ δn < 1, then a strategy sn ∈ Tn (P) belongs to the support of τˆ nδ only if it is at least as good a reply against bδm as the other strategies in Tnδ , and for / Hn (P) enabled by sn in Γ , the choice prescribed by sn at hn is optimal each hn ∈ at hδn (En (sn )) given the belief μ˜ δn (·|hn ). If an information set hδm (Em ) of player m is enabled by τ˜ nδ but not by σ˜ nδ , then the beliefs μ˜ δm (·|hm ) are derived from τ˜ nδ . δ The sequence {τ˜ nε } induces a corresponding sequence of equivalent strategies in G that induces a sequence of conditional distributions over Tn (P). Because tnδ = [1 − δ]sn∗ + δρn , the limit of the sequence of strategies in G δ that is equivalent to the sequence {τ˜ nε } is [1 − δ]α˜ δn sn∗ + δα˜ δn ρ + [1 − α˜ δn ]τˆ nδ . Therefore, the limit of the sequence of conditional distributions over Tn (P) is τnδ = αδn ρ + [1 − αδn ]τˆ nδ , where αδn = α˜ δn δ/[α˜ δn δ + (1 − α˜ δn )]. Obviously if an information set hm of player m is enabled by τnδ but not by σnδ , then the beliefs μδm (·|hm ) are those derived from τnδ . 
Passing to a subsequence if necessary, the limit τn of the sequence τδn can be expressed as a convex combination τn = αn ρn + [1 − αn]τ̂n, where αn and τ̂n
are the limits of αδn and τ̂δn, respectively. As in the previous paragraph, if an information set hm of player m is enabled by τn but not by σn, then the beliefs μm(·|hm) are those derived from τn.

CLAIM 6.2: (i) αn > 0. (ii) If αn < 1, then the support of τ̂n consists of strategies in Rn. In particular, for each sn in its support and each information set hn that sn enables, the choice of sn at hn is optimal given (b, μ).

PROOF OF CLAIM: We prove (ii) first. Suppose αn < 1. Let sn be a strategy in Tn(P) that is not optimal in reply to (b, μ). We show that sn is not in the support of τ̂δn for all sufficiently small δ, which proves the second statement. Let hn be an information set that sn enables where its action is not optimal. If hn ∈ Hn(P), then every strategy in En(sn) is suboptimal against bm. Because sn∗, the strategy that belongs to Sn(P) and to the support of tδn for all δ, is optimal against bδm for all δ and because b is the limit of bδ, for sufficiently small δ the strategy tδn does better against bδ than every strategy in the equivalence class En(sn). At the second information set in the initial stage of Γδ where n decides among the redundant strategy tδn and equivalence classes not in En◦(P), sequential rationality implies that he chooses the equivalence class En(sn) with zero probability for all small δ. As we remarked above, this implies that for such δ, the probability of sn is zero in τ̂δn. If hn ∉ Hn(P), then there exists another strategy sn′ in the equivalence class En(sn) that agrees with sn elsewhere but prescribes an optimal continuation at hn. Obviously, for all small δ, sn′ is a better reply than sn in reply to (bδ, μδ). But then sequential rationality at the copy hδn(En(sn)) of hn in Γδ for such small δ implies that he would choose according to sn′ there and not sn. Again, the probability of sn under τ̂δn is zero for small δ. Thus every strategy in the support of τ̂n is optimal in reply to (b, μ) and therefore is a P-relevant strategy.

It remains to show that αn > 0. Suppose to the contrary that αn = 0. Let Ŝn be the set of strategies in the support of either σn or τ̂n. Let Ĥm be the collection of information sets in Hm enabled by strategies in Ŝn. Because (b, μ) is a weakly sequential equilibrium, we obtain the following properties for each information set hm in Ĥm of player m: if hm is enabled by σn, then the action prescribed by bm at hm is optimal against σn; if hm is enabled by τ̂n and not by σn, then the action prescribed by bm is optimal against τ̂n. Therefore, for each small η > 0 there exists a perturbation Γ(η) of Γ, where only m's payoffs are perturbed, such that σm is optimal against σn(η) ≡ [1 − η]σn + ητ̂n in Γ(η) and that Γ(η) converges to Γ as η goes to zero. As we argued above, τ̂n is optimal against σm in Γ. Therefore, for all small η, (σm, σn(η)) is an equilibrium of Γ(η). Since Γ is generic, it belongs to some component C of the open set G\G1, where G1 is the set constructed in Appendix A. Since G\G1 has finitely many connected components, C is open in G\G1 and, hence, in G. Therefore, the sequence Γ(η) is in C for all small η. By Lemma A.1 in Appendix A, there
exists a strategy τn′ such that (i) the support of τn′ equals Ŝn and (ii) σm is a best reply against τn′ in Γ. Therefore, for all 0 ≤ ε ≤ 1, (σm, (1 − ε)σn + ετn′) is an equilibrium of Γ. But because the strategies in the support of τ̂n choose a nonequilibrium action at some hn ∈ Hn(P) that they enable, all these equilibria result in different outcomes. This is impossible because the payoffs in Γ are generic and therefore Γ has only finitely many equilibrium outcomes, as shown in Govindan and Wilson (2001). Thus αn > 0. Q.E.D.

Now we prove that P satisfies forward induction by showing that (b, μ) induces beliefs that assign positive probability only to P-relevant strategies. Let hn be an information set of n that is enabled by bn. If hn ∈ Hn(P), then obviously n's belief over the continuation strategies of m is the one derived from σm, and strategies in the support of σm are obviously P-relevant. If hn ∉ Hn(P), then the only strategies of m that enable hn are those in Tm(P). If there is no strategy in Rm that enables hn, then there is nothing to prove. Otherwise, the subset of strategies in Rm that enable hn is not empty and then the strategy τm, which by the above claim has Rm as its support, enables hn. Therefore, μn(·|hn) is derived from τm, and in this case, too, the restriction on beliefs imposed by forward induction holds. Thus P satisfies forward induction. Q.E.D.

Theorem 6.1 resolves a conjecture by Hillas and Kohlberg (2002, Sec. 13.6). Its remarkable aspect is that backward induction and invariance suffice for forward induction—if there are two players and payoffs are generic. No further assumption about rationality of behavior or plausibility of beliefs is invoked; neither are perturbations of strategies invoked as in studies of perfect equilibria and stable sets of equilibria; and for signaling games there is no reliance on Cho and Kreps' (1987, p. 181) auxiliary scenario in which the sender makes a "speech", which the other's intransigent belief ignores, pointing out that a deviation would be rational provided merely that the receiver recognizes and acts on its implications by excluding irrelevant strategies from the support of his belief. Invariance excludes one particular presentation effect by requiring that the outcome should not depend on whether a mixed strategy is treated as an additional pure strategy. One interpretation of forward induction is that it excludes another presentation effect by requiring that the outcome does not depend on irrelevant strategies. Indeed, van Damme (2002, p. 1555) interpreted forward induction as akin to the axiom called independence of irrelevant alternatives in social choice theory. In the case of a game, the analog of social choice is the outcome (the probability distribution on terminal nodes) and the irrelevant alternatives are players' irrelevant strategies.

Our proof of Theorem 6.1 relies on the assumption that Γ has two players and generic payoffs. Indeed, the conclusion of Claim 6.2 relies on genericity.8

8 That extensions to nongeneric payoffs are problematic can be seen in studies of signaling games where signals are costless to the sender, as in Chen, Kartik, and Sobel (2008).
Moreover, the proof does not suffice for the Beer–Quiche game in Section 2.2 if the two types of player I are treated as two different players; in particular, the game Γδ has a sequential equilibrium in which both types choose Q. The intuitive reason why the proof does not apply to an N-player game is that at a player's information set that does not intersect paths of equilibrium play, his beliefs might need to evaluate the relative likelihood of one opponent choosing a relevant strategy compared to another opponent choosing an irrelevant strategy. Asking for a sequential equilibrium in the game Γδ does not impose any discipline on such considerations. We surmise that additional decision-theoretic criteria must be imposed to obtain a version of Theorem 6.1 for a game with more than two players.

7. RENY'S INTERPRETATION OF FORWARD INDUCTION

An implication of Theorem 6.1 is that there is no conflict between backward and forward induction if one adopts the decision-theoretic principle of invariance. This conclusion depends on our definitions of relevant strategies and forward induction outcomes; for example, we interpret forward induction as a refinement of weakly sequential equilibrium that ensures the outcome does not depend on one player believing the other is using an irrelevant strategy at a relevant information set. In this section we compare our definitions with the principal alternative, represented by the discussion in Reny (1992, Sec. 4). He invoked "best response motivated inferences" as an instance of "forward induction logic" and concluded from an example that it can conflict with backward induction. Although he does not propose an explicit definition, the main ingredients differ from our formulation as follows.

Our definitions are narrow—we interpret forward induction as a property of an outcome of a weakly sequential equilibrium and ask only that the outcome results from one in which the support of a player's belief at a relevant information set is confined to relevant strategies, which we limit to those strategies that are optimal replies to some weakly sequential equilibrium. Reny's view applies forward induction reasoning directly to players' strategies rather than to outcomes, and applies it to more information sets and more strategies. At every information set not excluded by a player's own strategy, he asks only that the support of the player's belief is confined to those strategies that reach that information set for which there are some beliefs of the other player that would justify using them.

The implications of Reny's expanded view of forward induction reasoning are illustrated by the motivating example in his Figure 3. The top panel of Figure 4 shows a game in which players I and II alternately choose whether to end the game. Reny argued that this example shows a tension between forward and backward induction. He observed that I's choice of the pure strategy D strictly dominates Ad. He inferred from this that forward induction should require that if I rejects D, then II must believe that I's strategy is surely Aa and hence
FIGURE 4.—Top panel: Reny’s example of a game between players I and II. Bottom panel: The game modified so that player I can choose the redundant strategy x(δ) after rejecting D.
II’s only optimal reply is Ad. But backward induction requires each player to choose d, and before that D, which contradicts the seeming implication of forward induction that II’s strategy should be Ad. From this Reny concluded that II’s backward induction strategy is “rendered ‘irrational”’ and thus “the inappropriateness, indeed the inapplicability of the usual backward programming argument in the presence of best response motivated inferences” (Reny (1992, p. 637), italics as in original). Our analysis of this example differs in two respects. First, I’s only relevant strategy is to choose D initially, so forward induction according to our definition has no implications for II’s beliefs. This is so because our definitions identify outcomes that result from the conjunction of rational play and beliefs that other players are playing rationally; hence we apply them only to information sets reached by rational play as represented by relevant strategies. In contrast,
Reny applied forward induction to the belief of a player when the other player’s strategy is an optimal reply to an arbitrary belief. In the top panel of Figure 4, for player II to choose A requires that either II believes I is irrational or II believes that I believes II is irrational. Specifically, for II at her first decision node to believe that I’s strategy is Ad amounts to believing that I is irrational (because D dominates Ad as noted above); for II to believe that I’s strategy is Aa and that I is rational requires II to ascribe to I a belief that II’s strategy is Aa with high probability, which is an irrational strategy for II (because at II’s second decision node the continuation d dominates the continuation a). Our view is that a coherent theory of rational play and beliefs that others are playing rationally (i.e., a theory consistent with strong belief in rationality) is possible only with the more circumscribed definition of relevant strategies that we propose. Note, however, that we admit fewer strategies as relevant but restrict beliefs only at relevant information sets, so our definition of forward induction is neither stronger nor weaker than Reny’s interpretation. The other respect in which our analysis differs is that we invoke invariance. The bottom panel of Figure 4 shows an expanded extensive form in which player I can reject D and then choose between action A or the new pure strategy x(δ) for some probability δ ∈ (1/2 1). (See Govindan and Wilson (2006, Sec. 2.3) for a similar example.) The two information sets indicate that player II cannot know whether I chose A or x(δ) after rejecting D. The branch points indicated by black boxes refer to moves by Nature, that is, Nature takes over and implements the strategy x(δ) using the indicated probabilities (1 − δ δ) and (0 1) at I’s first and second information sets after I chooses x(δ). Note that x(δ) is redundant because it is replicated by the mixed strategy that chooses between D and Aa with probabilities 1 − δ and δ. In the expanded game it is somewhat arbitrary whether one supposes that I can choose x(δ) before or after D. We use the latter because then it is easy to construct the unique Nash equilibrium of the subgame that begins after I rejects D. This is a sequential equilibrium in which I chooses x(δ) with probability 1/[1 + δ] and otherwise chooses A and then d; and, if A occurs, then II chooses D with probability 2δ − 1, and otherwise A and then d. Consistent with Bayes’ rule, II’s strategy is supported by beliefs at her first and second information sets that the conditional probabilities are, respectively, 1/2 and 1 that I chose x(δ). As δ approaches 1/2, II’s strategy in this equilibrium converges to Ad and, as δ approaches 1, to D, which correspond to the two strategies by II that Reny considered. (When δ = 1/2, the Nash equilibria of the subgame require only that I’s probability of x(1/2) is at least 2/3, and II’s belief changes accordingly; when δ = 1, the game is essentially the same as the original game since x(1) is a duplicate of Aa.) Thus we find that II might be indifferent between her backward induction strategy D and her strategy Ad that Reny concluded is implied by best response motivated inferences. Therefore, Reny’s conclusion that II’s backward induction strategy is rendered irrational depends on rejecting either sequential equilibrium or invariance as decision-theoretic principles.
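The two conditional probabilities quoted above can be verified directly from the stated strategies; the following display is our own check, writing II1 and II2 for II's first and second information sets. II1 is reached either because I chooses A outright (and will later play d) or because I chooses x(δ) and Nature then plays A, and under the stated mixture 1/[1 + δ] these two routes are equally likely; II2 is reached on the equilibrium path only through the x(δ) branch, since after a direct choice of A player I plays d.

```latex
\[
\Pr\bigl[x(\delta)\mid \mathrm{II}_1\bigr]
  \;=\; \frac{\frac{1}{1+\delta}\cdot\delta}
             {\frac{1}{1+\delta}\cdot\delta \;+\; \frac{\delta}{1+\delta}\cdot 1}
  \;=\; \frac{1}{2},
\qquad
\Pr\bigl[x(\delta)\mid \mathrm{II}_2\bigr] \;=\; 1 .
\]
```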
8. CONCLUSION

Theorem 6.1 offers an explanation of why forward induction is a desirable refinement of sequential equilibrium in two-player games with generic payoffs. If an outcome does not satisfy forward induction—that is, depends on one player believing the other is using an irrelevant strategy—then there is an equivalent game in which this outcome results only from Nash equilibria and not from any sequential equilibrium. Failure of an economic model to predict an outcome that satisfies forward induction could motivate reconsideration of whether the essential features of the strategic situation are well represented by the specific extensive form used in the model, or if one has confidence in the model, then this prediction might be rejected because for the same model there necessarily exists another prediction that does satisfy forward induction.

Because the theorem is restricted to games with two players and generic payoffs, it does not establish that our definitions of relevant strategies and forward induction outcomes are surely the right ones for general games. But it suggests that similar definitions might enable "forward induction reasoning" to be justified by decision-theoretic criteria.

APPENDIX A: TECHNICAL LEMMA

Given an extensive form with two players, for each player n let Sn and Σn be n's sets of pure and mixed strategies, and let S = S1 × S2 and Σ = Σ1 × Σ2 be the product sets of profiles. Let G be the Euclidean space of games generated by assigning payoffs to the players at the terminal nodes of the given extensive form.

LEMMA A.1: There exists a closed, lower-dimensional, semi-algebraic set G1 of G such that G\G1 has finitely many connected components. For each connected component C, the following holds: if for some game Γ ∈ C and profile σ ∈ Σ the set of profiles of pure strategies that are the players' optimal replies to σ is T = T1 × T2 ⊂ S, then for every game Γ′ ∈ C there exists a profile σ′ ∈ Σ with the same support as σ and such that in Γ′ the set of pure optimal replies to σ′ is T.

PROOF: Let X = G × Σ and let p : X → G be the natural projection. For each pair R = R1 × R2 and T = T1 × T2 of subsets of S, let X(R, T) be the set of (Γ, σ) in X such that, for each n, Rn is the support of σn and Tn is the set of n's pure optimal replies in Γ to the mixed strategy σm of the other player. By the generic local triviality theorem in Bochnak, Coste, and Roy (1998) there exists a closed, lower-dimensional, semi-algebraic subset G1 of G such that G\G1 has finitely many connected components. Moreover, for each connected component C of G\G1 there exist (i) a semi-algebraic fiber F, (ii) for each pair (R, T), a subset F(R, T) of F, and (iii) a homeomorphism h : C × F → p−1(C)
with the properties that (a) p ◦ h(Γ, f) = Γ for all Γ ∈ C, f ∈ F and (b) h maps C × F(R, T) homeomorphically onto p−1(C) ∩ X(R, T) for each (R, T). Suppose T is the set of profiles of pure optimal replies to σ in a game Γ ∈ C. Let R be the support of σ. Then (Γ, σ) belongs to X(R, T). Therefore, there exists f ∈ F(R, T) such that h(Γ, f) = (Γ, σ). For each Γ′ ∈ C, let σ′(f) be the unique mixed strategy in Σ for which h(Γ′, f) = (Γ′, σ′(f)). Then the support of σ′(f) is R and the set of profiles of pure optimal replies in Γ′ to σ′(f) is T. Q.E.D.

APPENDIX B: FORWARD INDUCTION IN THE NORMAL FORM

The classical view in game theory is that the normal form of a game is sufficient to capture all strategically significant aspects. Hence the question arises as to whether we can state a comparable version of forward induction for a game in normal form. Here we provide one such definition. The following three components of Definition 3.5 for a game in extensive form need to be rephrased in terms of the normal form: (i) weakly sequential equilibria, (ii) relevant strategies, and (iii) restriction of beliefs to those induced by relevant strategies whenever possible. As will be seen below, if the sequential rationality requirement in the definition of weakly sequential equilibria is strengthened slightly (and only for nongeneric games), then the corresponding definition of forward induction has a normal-form counterpart.

Given a game G in normal form, let σ be a profile of players' mixed strategies and let b be an equivalent profile in behavioral strategies for an extensive-form game Γ with that normal form. Reny (1992, Proposition 1) showed that σ is a normal-form perfect equilibrium of G iff in Γ there exists a sequence bε of completely mixed profiles converging to b such that for each player n and each information set hn that bn does not exclude, the action prescribed by bn at hn is optimal against bε−n for all small ε. Thus the difference between weakly sequential equilibrium and normal-form perfect equilibrium is analogous to that between sequential equilibrium and extensive-form perfect equilibrium: one requires optimality only in reply to the limit, while the other requires optimality in reply to the sequence as well. Reny also showed that weakly sequential equilibria coincide with normal-form perfect equilibria for generic extensive-form games. Therefore, a perfect equilibrium seems to be the right normal-form analog of a weakly sequential equilibrium.

Suppose Σ∗ is a set of Nash equilibria of G. (To fix ideas, Σ∗ could be the set Σ(P) of equilibria inducing an outcome P in an extensive-form version of the game, but to allow applications to nongeneric games, we allow multiple outcomes.) In the extensive-form case, we said that a strategy was relevant if it was optimal against a pair of profiles of strategies and beliefs inducing the given outcome. But as noted above, if we insist on optimality along the sequence, then the appropriate normal-form definition of a relevant strategy becomes: a strategy is relevant if it is optimal against a sequence of ε-perfect equilibria converging to an equilibrium in Σ∗.
Finally, we turn to belief restrictions. The idea in the extensive-form case is that if an information set hn of player n is reached by a profile of relevant strategies of his opponents, then he assigns zero probability to continuations that are enabled only by profiles that contain an irrelevant strategy for one of the other players. Let R−n(hn) be the set of profiles of relevant strategies of n's opponents that reach such an hn. If we use a sequence σε of normal-form profiles to generate players' beliefs and their continuation strategies, then the belief restriction says that n's belief at hn and the continuation strategies of his opponents should be obtained from the limit of the sequence of conditional distributions over R−n(hn) induced by the sequence σε. That is, the beliefs at all information sets of all players that are reached by relevant strategies can be generated from the sequence of conditional distributions confined to relevant strategies. Because we insist on optimality along the sequence, what we obtain is a perfect equilibrium with a restriction on the form of its representation as a lexicographic probability system, as in Blume, Brandenburger, and Dekel (1991, Propositions 4 and 7). The restriction is that any profile that includes an irrelevant strategy for some player should occur later in the lexicographic sequence than those profiles that include only relevant strategies. This implements the basic requirement that each player believes the other is using a relevant strategy as long as that hypothesis is tenable. Thus, we are led to the following definition:

DEFINITION B.1—Normal-Form Forward Induction: A set of Nash equilibria satisfies normal-form forward induction if it contains a perfect equilibrium whose lexicographic representation has all profiles of relevant strategies occurring before all profiles that include irrelevant strategies.

In general, this is a stronger requirement than the one in the text, but for a generic two-player game in extensive form with perfect recall it can be shown that the set of weakly sequential equilibria inducing an outcome P satisfies the above definition iff P satisfies forward induction as defined in the text. The reason for this equivalence is similar to the reason that weakly sequential equilibria and normal-form perfect equilibria coincide for generic extensive-form games as established by Reny (1992, Proposition 1). An implication is that the analog of Theorem 6.1 is true with this definition of forward induction, that is, the set of Nash equilibria resulting in an invariant sequential equilibrium outcome of a two-player game in extensive form with perfect recall and generic payoffs satisfies normal-form forward induction.

REFERENCES

BANKS, J., AND J. SOBEL (1987): "Equilibrium Selection in Signaling Games," Econometrica, 55, 647–661. [3]
BATTIGALLI, P., AND M. SINISCALCHI (2002): “Strong Belief and Forward Induction Reasoning,” Journal of Economic Theory, 106, 356–391. [4,11] BLUME, L., A. BRANDENBURGER, AND E. DEKEL (1991): “Lexicographic Probabilities and Equilibrium Refinements,” Econometrica, 59, 81–98. [26] BOCHNAK, J., M. COSTE, AND M.-F. ROY (1998): Real Algebraic Geometry. Berlin: Springer Verlag. [24] CHEN, Y., N. KARTIK, AND J. SOBEL (2008): “Selecting Cheap-Talk Equilibria,” Econometrica, 76, 117–136. [20] CHO, I., AND D. KREPS (1987): “Signalling Games and Stable Equilibria,” Quarterly Journal of Economics, 102, 179–221. [3,7,8,12,20] CHO, I., AND J. SOBEL (1990): “Strategic Stability and Uniqueness in Signaling Games,” Journal of Economic Theory, 50, 381–413. [4] FUDENBERG, D., AND J. TIROLE (1993): Game Theory. Cambridge, MA: MIT Press. [4,7,13] GOVINDAN, S., AND R. WILSON (2001): “Direct Proofs of Generic Finiteness of Nash Equilibrium Outcomes,” Econometrica, 69, 765–769. [17,20] (2006): “Sufficient Conditions for Stable Equilibria,” Theoretical Economics, 1, 167–206. [14,23] (2007): “On Forward Induction,” Research Report 1955, Business School, Stanford, CA. [5] HILLAS, J. (1994): “How Much of Forward Induction Is Implied by Backward Induction and Ordinality,” Mimeo, University of Auckland, New Zealand. [6] HILLAS, J., AND E. KOHLBERG (2002): “Conceptual Foundations of Strategic Equilibrium,” in Handbook of Game Theory, Vol. III, ed. by R. Aumann and S. Hart. New York: Elsevier, 1597–1663. [1,3,4,20] KOHLBERG, E. (1990): “Refinement of Nash Equilibrium: The Main Ideas,” in Game Theory and Applications, ed. by T. Ichiishi, A. Neyman, and Y. Tauman. San Diego: Academic Press. [3] KOHLBERG, E., AND J.-F. MERTENS (1986): “On the Strategic Stability of Equilibria,” Econometrica, 54, 1003–1037. [3,7,13,14] KOHLBERG, E., AND P. RENY (1997): “Independence on Relative Probability Spaces and Consistent Assessments in Game Trees,” Journal of Economic Theory, 75, 280–313. [1] KREPS, D. (1990): A Course in Microeconomic Theory. Princeton, NJ: Princeton University Press. [9] KREPS, D., AND G. RAMEY (1987): “Structural Consistency, Consistency, and Sequential Rationality,” Econometrica, 55, 1331–1348. [10] KREPS, D., AND J. SOBEL (1994): “Signalling,” in Handbook of Game Theory, Vol. II, ed. by R. Aumann and S. Hart. New York: Elsevier, 849–868. [4] KREPS, D., AND R. WILSON (1982): “Sequential Equilibria,” Econometrica, 50, 863–894. [1,2,10] KUHN, H. (1953): “Extensive Games and the Problem of Information,” in Contributions to the Theory of Games, Vol. II, ed. by H. Kuhn and A. Tucker. Princeton, NJ: Princeton University Press, 193–216. Reprinted in H. Kuhn (ed.) (1997): Classics in Game Theory. Princeton, NJ: Princeton University Press. [2,12] MCLENNAN, A. (1985): “Justifiable Beliefs in Sequential Equilibrium,” Econometrica, 53, 889–904. [4] MERTENS, J.-F. (1989): “Stable Equilibria—A Reformulation, Part I: Definition and Basic Properties,” Mathematics of Operations Research, 14, 575–625. [14] PEARCE, D. (1984): “Rationalizable Strategic Behavior and the Problem of Perfection,” Econometrica, 52, 1029–1050. [4] RENY, P. (1992): “Backward Induction, Normal Form Perfection and Explicable Equilibria,” Econometrica, 60, 627–649. [3,4,10,12,21,22,25,26] SPENCE, A. M. (1974): Market Signaling. Cambridge, MA: Harvard University Press. [9] VAN DAMME, E. (1989): “Stable Equilibria and Forward Induction,” Journal of Economic Theory, 48, 476–496. [3]
(2002): “Strategic Equilibrium,” in Handbook of Game Theory, Vol. III, ed. by R. Aumann and S. Hart. New York: Elsevier, 1523–1596. [3,4,20]
Dept. Economics, University of Iowa, Iowa City, IA 52242, U.S.A.;
[email protected] and Business School, Stanford University, Stanford, CA 94305-5015, U.S.A.;
[email protected]. Manuscript received February, 2007; final revision received May, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 29–69
PUBLIC VS. PRIVATE OFFERS IN THE MARKET FOR LEMONS

BY JOHANNES HÖRNER1 AND NICOLAS VIEILLE

We study the role of observability in bargaining with correlated values. Short-run buyers sequentially submit offers to one seller. When previous offers are observable, bargaining is likely to end up in an impasse. In contrast, when offers are hidden, agreement is always reached, although with delay.

KEYWORDS: Bargaining, delay, impasse, observability, lemons problem.
1. INTRODUCTION WE STUDY THE ROLE of observability in bargaining with correlated values. More precisely, we study how the information available to potential buyers affects the probability of reaching an agreement. Our main result is that if discounting is low and the static incentive constraints preclude first-best efficiency, agreement is always reached when previous offers are kept hidden, while agreement is unlikely to be reached when they are made public. The information and payoff structures are as in Akerlof’s (1970) market for “lemons.” One seller is better informed than the potential buyers about the value of the single unit for sale. It is common knowledge that trade is mutually beneficial. All potential buyers share the same valuation for the unit, which strictly exceeds the seller’s cost. The game is dynamic. The seller bargains sequentially with potential buyers until agreement is reached, if ever, and delay is costly. When it is his turn, a buyer makes a take-it-or-leave-it offer to the seller. That is, the setting is formally a search model: the seller may rationally choose to reject available offers in return for the opportunity to wait for higher prospective offers. This search process is without recall. To take a specific example, consider the sale of a residential property. In most countries, houses are sold through bilateral bargaining. Potential buyers come and go, engaging in private negotiations with the seller, until either an agreement with one of them is reached or the house is withdrawn from the market. Typically, potential buyers know how long the house has been for sale, which provides a rough estimate of the number of rejected offers. However, past offers remain hidden and “only a bad agent would reveal them,” in the words of one broker. Similarly, in most labor markets, employers do not observe the actual offers that the applicant may have previously rejected, but they can infer how long he has been unemployed from the applicant’s vita. In contrast, in other bargaining settings, such as corporate acquisition via tender offer, previous offers are commonly observed. Remarkably, our analysis supports the broker’s point of view. With public offers, the equilibrium outcome is unique. Bargaining typically ends up in an 1
Hörner’s work on this project began when he was affiliated with Northwestern University.
impasse: only the first buyer submits an offer that has any chance of being accepted. If this offer is rejected, no further serious offer is submitted on the equilibrium path. This is rather surprising, since it is common knowledge that no matter how low the quality of the unit may be, it is still worth more to the potential buyers than to the seller. Indeed, the second buyer, for instance, would submit a serious offer if he were the last. So it is precisely the competition from future buyers that deters him from making such an offer. Yet on the equilibrium path, all these buyers also submit losing offers. Why can a buyer not break the deadlock by making an offer above the seller’s lowest cost but below the buyer’s lowest possible valuation? As we show, such an offer necessarily triggers an aggressive offer by the next buyer. Because the seller can wait, this dramatically increases the price that a given type of the seller would find acceptable. Gains from trade between the current buyer and the seller no longer exist once we account for this seller’s outside option, rendering such a deviation futile.2 This result provides an explanation for impasses in bargaining. While standard bargaining models are often able to explain delay, agreement is always reached eventually. Exceptions either rely on behavioral biases (see Babcock and Loewenstein (1997)) or Pareto-inefficient commitments (see Crawford (1982)). Here, it is precisely the inability of the seller not to solicit another offer that discourages potential buyers from submitting serious offers. In contrast, agreement is always reached when offers are private. Because the seller cannot use his rejection of an unusually high offer as a signal to elicit an even higher offer by the following buyer, buyers are not deterred from submitting serious offers. To put it differently, the unique equilibrium outcome with public offers cannot be an equilibrium outcome with private offers. Suppose, per impossibile, that such an equilibrium were to exist. Then consider a deviation in which a potential buyer submits an offer that is both higher than the seller’s lowest possible cost yet lower than the buyer’s lowest possible valuation. Future potential buyers would be unaware of the specific value of this out-of-equilibrium offer. Hence, turning it down would not change their beliefs about the unit’s value. Thus, given that the seller expects to receive losing offers thereafter, he should accept the offer if his cost is low enough. This, in turn, means that the offer is a profitable deviation for the buyer. Our main result may appear surprising in light of one of the “linkage” principles in auction theory, stating that disclosure of additional information increases the seller’s expected revenue. In our dynamic setup, it is important to distinguish between how much information can possibly be revealed given the information structure and the information that is actually revealed in equilibrium. While finer information could be transmitted with public offers than with 2 Academic departments are well aware of this problem when considering making senior offers. As clearly this example cannot fail any of the underlying rationality assumptions, the prevalence of such offers raises an interesting puzzle.
private offers, this is not what happens in equilibrium: because all offers but the first one are losing offers, no further information about the seller is ever revealed, so that, somewhat paradoxically, more information is communicated with private offers.

As already mentioned, this is a search model. However, unlike most of the search literature, the distribution of offers is not fixed, but endogenously derived. The analysis shows that random offers can, indeed, be part of the equilibrium strategies. In addition, it shows that the offer distribution depends on the information available to the offerers. Therefore, it also suggests that it is not always innocuous to treat the offer distribution as fixed while considering variants of the standard search models.

The general setup is described in Section 2. Section 3 solves the case of public offers and addresses issues of robustness. Private offers are considered in Section 4. Related literature, results, and extensions are discussed in Section 5. Proofs are in the Appendix. Extensions to the two-type case are provided in the Supplemental Material (Hörner and Vieille (2009)).

2. THE MODEL

We consider a dynamic game between a single seller with one unit for sale and a countably infinite number of potential buyers, or buyers for short. Time is discrete and indexed by n = 1, …, ∞. At each time or period n, one buyer makes an offer for the unit. Each buyer makes an offer only at one time, and we refer to buyer n as the buyer who makes an offer in period n, provided the seller has accepted no previous offer. After receiving the offer, the seller either accepts or rejects the offer. If the offer is accepted, the game ends. If the offer is rejected, a period elapses and it is another buyer's turn to submit an offer.

The quality q of the good is determined by Nature and is uniformly distributed over the interval [q1, 1] for some q1 < 1. The value of q is the seller's private information, but its distribution is common knowledge. We refer to q as the seller's type. Given q, the seller's cost of providing the unit is c(q). The valuation of the unit to buyers is common to all of them and is denoted by v(q). We assume that c(·) is (strictly) positive, (strictly) increasing, and twice differentiable, with bounded derivatives. We also assume that v(·) is positive, increasing, and continuously differentiable. Moreover, we assume that v′ is positive. We set Mc = max |c′|, Mc′ = sup |c″|, Mv = max |v′|, M = max{Mc, Mc′, Mv}, and mv = min |v′| > 0. Observe that the assumption that q is uniformly distributed is made with little loss of generality, given that few restrictions are imposed on the functions v and c.3

3 In particular, our results are still valid if the distribution of q has a bounded density, bounded away from zero.
We assume that gains from trade are always positive: v(q) − c(q) > 0 for all q ∈ [q1, 1]. The seller is impatient, with discount factor δ < 1. We are particularly interested in the case in which δ is sufficiently large. To be specific, we set δ̄ := 1 − mv/3M and assume throughout that δ > δ̄.

Buyer n submits an offer pn that can take any real value. An outcome of the game is a triple (q, n, pn), with the interpretation that the realized type is q and that the seller accepts buyer n's offer of pn (which implies that he rejected all previous offers). The case n = ∞ corresponds to the outcome in which the seller rejects all offers (set p∞ equal to zero). The seller's von Neumann–Morgenstern utility function over outcomes is his net surplus δn−1(pn − c(q)) when n < ∞, and zero otherwise. An alternative formulation that is equivalent to the one above is that the seller incurs no production cost but derives a per-period gross surplus, or reservation value, of (1 − δ)c(q) from owning the unit. It is immediate to verify that this interpretation yields the same utility function. Buyer n's utility is v(q) − pn if the outcome is (q, n, pn), and zero otherwise (discounting is irrelevant since buyers make only one offer). We define the players' expected utility over lotteries of outcomes, or payoff for short, in the standard fashion.

We consider both the case in which offers are public and the case in which previous offers are private. It is worth pointing out that the results for the case in which offers are public also hold for any information structure (about previous offers) in which each buyer n > 1 observes the offer of buyer n − 1.

A history (of offers) hn−1 ∈ Hn−1 in case no agreement has been reached at time n is a sequence (p1, …, pn−1) of offers that were submitted by the buyers and rejected by the seller (we set H0 equal to ∅). A behavior strategy for the seller is a sequence {σSn}, where σSn is a probability transition from [q1, 1] × Hn−1 × R into {0, 1}, mapping the realized type q, the history hn−1, and buyer n's price pn into a probability of acceptance. In the public case, a strategy for buyer n is a probability transition σBn from Hn−1 to R.4 In the private case, a strategy for buyer n is a probability distribution σBn over R. Observe that whether offers are public or private, the seller's optimal strategy must be of the cutoff type. That is, if σSn(q, hn−1, pn) assigns a positive probability to accepting for some q, then σSn(q′, hn−1, pn) assigns probability 1 to it for all q′ < q. The proof of this skimming property is standard and can be found in, for example, Fudenberg and Tirole (1991, Chapter 10, Lemma 10.1). The infimum over types q accepting a given offer is called the indifferent type (at history (hn−1, pn) given the strategy profile). Since the specification of the

4 That is, for each hn−1 ∈ Hn−1, σBn(hn−1) is a probability distribution over R, and the probability σBn(·)[A] assigned to any Borel set A ⊂ R is a measurable function of hn−1, and similarly for σSn.
action of the seller’s indifferent type does not affect payoffs, we also identify equilibria which only differ in this regard. For definiteness, in all formal statements, we shall follow the convention that a seller’s type that is indifferent accepts the offer. For conciseness, we shall usually omit to specify that some statements only hold “with probability 1.” For instance, we shall say that the seller accepts the offer when he does so with probability 1. We use the perfect Bayesian equilibrium (PBE) concept as defined in Fudenberg and Tirole (1991, Definition 8.2).5 In both the public and the private case, this implies that upon receiving an out-of-equilibrium offer, the continuation strategy of the seller is optimal. In the public case, this also implies that after any history on or off the equilibrium path along which all offers submitted by buyers were smaller than c(1), the belief (over seller’s types) of the remaining buyers is common to all of them and computed on the assumption that the seller’s reasons for rejecting previous offers were rational. Thus, in the public case, after any such history, the belief of buyer n over q is the uniform distribution over some interval [qn 1], where qn may depend on the sequence of earlier offers. In the private case, the only nontrivial information sets that are reached with probability 0 occur in periods such that, along the equilibrium path, the probability is 1 that the seller accepts some earlier offer. The specification of beliefs after such information sets turns out to be irrelevant. Given some (perfect Bayesian) equilibrium, we follow standard terminology in calling a buyer’s offer serious if it is accepted by the seller with positive probability. An offer is losing if it is not serious. Clearly, the specification of losing offers in an equilibrium is, to a large extent, arbitrary. Therefore, statements about uniqueness are understood to be made up to the specification of the losing offers. Finally, an offer is a winning offer if it is accepted with probability 1. We briefly sketch here the static version with one buyer, similar to Wilson (1980). The unique buyer submits a take-it-or-leave-it offer. The game then ends whether the offer is accepted or rejected, with payoffs specified as before (with n = 1). Clearly, the seller accepts any offer p provided p ≥ c(q). Therefore, the buyer offers c(q∗ ), where q∗ ∈ [q1 1] is the maximizer of
∫_{q1}^{q} (v(t) − c(q)) dt
with respect to q. Observe that the corresponding payoff of the buyer must be positive, because the buyer can always submit an offer in the interval (c(q1 ) v(q1 )). 5
Formally speaking, Fudenberg and Tirole defined perfect Bayesian equilibria for finite games of incomplete information only. The suitable generalization of their definition to infinite games is straightforward and is omitted.
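As a concrete illustration of this static benchmark (ours, not the authors'), the following short script evaluates the objective above on a grid and reports the maximizing type q∗. It assumes the linear specification c(q) = q, v(q) = (1 + α)q that the paper introduces in Section 3.2, together with the illustrative values α = 1/3 and q1 = 0.2; for this case the first-order condition gives q∗ = q1/(1 − α) = 0.3.

```python
# Static benchmark with a single buyer: the buyer offers c(q*), where q* maximizes
# the integral of (v(t) - c(q)) over t in [q1, q] (the objective displayed above).
# Illustrative linear specification (from Section 3.2): c(q) = q, v(q) = (1 + alpha)*q.

alpha = 1.0 / 3.0   # illustrative value
q1 = 0.2            # illustrative lower bound of the quality support

def c(q):
    return q

def v(q):
    return (1.0 + alpha) * q

def objective(q, steps=500):
    """Riemann-sum approximation of the buyer's expected surplus from offering c(q)."""
    dt = (q - q1) / steps
    return sum((v(q1 + (i + 0.5) * dt) - c(q)) * dt for i in range(steps))

grid = [q1 + i * (1.0 - q1) / 1600 for i in range(1601)]   # candidate values of q
q_star = max(grid, key=objective)

print(f"q* = {q_star:.3f}, optimal offer c(q*) = {c(q_star):.3f}")
# For these numbers the first-order condition gives q* = q1/(1 - alpha) = 0.3.
```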
3. OBSERVABLE OFFERS

Throughout this section, we maintain the assumption that offers are public. We prove that the market breaks down in that case.

3.1. General Statement

Our main result, Proposition 1, is paradoxical: there is a unique equilibrium outcome, according to which the first buyer's offer is rejected with positive probability, and all subsequent offers are rejected with probability 1.6 We first present the equilibrium outcome, then provide some intuition.

Given a type q and since v is increasing, there exists at most one value of x ∈ [q1, q) that solves

c(q) = (1/(q − x)) ∫_{x}^{q} v(t) dt.
That is, for this value of x, the expected valuation over types in the interval [x, q] is equal to the highest cost in this interval, c(q). We denote by f(q) := x ∈ [q1, q) this value, as a function of q, if it exists. The value f(q) is well defined whenever asymmetric information is severe enough that the buyer's expected value does not exceed the highest seller's cost. This is the case if either the range of existing qualities is large or valuations are close to costs.

Define the strictly decreasing sequence q^k as follows: q^0 = 1 and q^{k+1} = f(q^k) as long as f(q^k) is defined (that is, as long as f(q^k) ≥ q1). Because minq{v(q) − c(q)} > 0, it must be that q − f(q) > κ for some κ > 0 and all q. Hence the sequence (q^k) must be finite, and we denote the last and smallest element of this sequence by q^K ≥ q1. Note that this sequence is always well defined, as q^0 = 1 > q1.

The mapping f that is used in the definition of (q^k) plays a key role in the analysis of the static model of adverse selection between one seller and two or more buyers. Indeed, consider the "static" game between n ≥ 2 buyers simultaneously announcing prices and the seller who then either keeps the unit or sells it at the highest price, p. Obviously, the seller only sells the unit if p ≥ c(q) or q ≤ c−1(p). Therefore, at equilibrium it must be that p′ ≥ p implies that p′ ≥ E[v(q)|q ≤ c−1(p′)] (where c−1(p) = 1 for p ≥ c(1)), with equality for p′ = p, so that the winner barely breaks even and no higher price yields a positive payoff. That is, q1 = f(c−1(p)) (unless a winning offer yields a positive payoff).7 Note that any such offer p strictly exceeds the optimal offer that a lone buyer would submit. 6
The probability of rejection can be arbitrarily close to 1, depending on q1 . More generally, in all equilibria, there are at least two buyers whose offers all have the property that higher offers cannot yield a positive payoff. Because f −1 need not be uniquely defined, the equilibrium need not be unique, pure, or symmetric. 7
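To make the construction of f and of the sequence (q^k) concrete, here is a small sketch (our illustration, under the linear specification of Section 3.2 with the illustrative values α = 1/3 and q1 = 0.2). It solves the defining equation of f by bisection, which is valid because the average of v over [x, q] is increasing in x, and then iterates q^0 = 1, q^{k+1} = f(q^k); the output matches the closed form q^k = β^k with β = (1 − α)/(1 + α) = 1/2.

```python
# Construction of f and of the sequence q^0 = 1, q^{k+1} = f(q^k) from the text.
# f(q) is the x in [q1, q) at which the average of v over [x, q] equals c(q).
# Illustrative linear specification (Section 3.2): c(q) = q, v(q) = (1 + alpha)*q.

alpha = 1.0 / 3.0
q1 = 0.2                 # illustrative lower bound of the support

def c(q):
    return q

def v(q):
    return (1.0 + alpha) * q

def avg_v(x, q, steps=1000):
    """Average of v over [x, q], by a midpoint Riemann sum."""
    dt = (q - x) / steps
    return sum(v(x + (i + 0.5) * dt) for i in range(steps)) * dt / (q - x)

def f(q, tol=1e-10):
    """Return f(q), or None when it is not defined (the solution would fall below q1)."""
    if avg_v(q1, q) > c(q):          # even x = q1 gives too high an average
        return None
    lo, hi = q1, q                   # avg_v(., q) is increasing in x, so bisection applies
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if avg_v(mid, q) < c(q):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

seq = [1.0]
while True:
    nxt = f(seq[-1])
    if nxt is None:
        break
    seq.append(nxt)

beta = (1.0 - alpha) / (1.0 + alpha)
print("q^k   :", [round(x, 4) for x in seq])
print("beta^k:", [round(beta ** k, 4) for k in range(len(seq))])
```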
While it is possible that q^K = q1, our result is stated here, for simplicity, for the generic case in which q^K > q1.

PROPOSITION 1: Assume that q^K > q1 and δ > δ̄. There is a unique equilibrium outcome, which is independent of δ. On the equilibrium path, the first buyer submits the offer c(q^K), which the seller accepts if and only if q ≤ q^K. If this offer is rejected, all buyers n > 1 submit a losing offer, not exceeding c(q^K).

This proposition, proved in the Appendix, both asserts that a bargaining impasse can happen in equilibrium and that it must happen in equilibrium. To shed some light on the first of these assertions, we describe one strategy profile and argue that it is an equilibrium.

The seller's acceptance rule is stationary and continuous. If p ≥ c(q^0), the seller accepts p, independently of his type. Fix now an offer p and consider the interval [c(q^k), c(q^{k−1})), k ≥ 1, that contains p:8
• If p belongs to the interval [c(q^k), (1 − δ)c(q^k) + δc(q^{k−1})), a seller with type q accepts the offer if and only if q ≤ q^k.
• If p belongs to the interval [(1 − δ)c(q^k) + δc(q^{k−1}), c(q^{k−1})), a seller with type q accepts the offer if and only if p − c(q) ≥ δ(c(q^{k−1}) − c(q)).

This strategy is optimal given the following buyers' strategies. Given some stage n and a history up to n, let qn denote the lowest type willing to reject all previous offers, so that buyer n's belief is uniform over the interval [qn, 1]. The offer of buyer n depends both on qn and on the previous offer, pn−1:
• If qn ∈ (q^{k+1}, q^k), for some k, the buyer offers c(q^k).
• If qn = q^k for some k, the buyer randomizes between c(q^k) and c(q^{k−1}), so that, given pn−1, the seller's type q^k was indeed indifferent in the previous stage between accepting and rejecting pn−1; that is, the probability π assigned to the offer c(q^{k−1}) solves pn−1 − c(q^k) = δπ(c(q^{k−1}) − c(q^k)).

Let us now provide some intuition on why this is an equilibrium, by focusing on, say, the second buyer. On the equilibrium path, this buyer is supposed to make the losing offer c(q^K). Why can he not do any better? Consider Figure 1 as an illustration. The price P(q) denotes the lowest price that the second buyer must offer so that the seller's type q is indifferent between accepting or not. If q ≤ q^K, offering the price c(q^K) trivially suffices, since this is a losing offer. If q ∈ (q^K, q^{K−1}], then the second buyer must offer the price (1 − δ)c(q) + δc(q^{K−1}), since the next price offered will be c(q^{K−1}). Similarly, if q ∈ (q^{K−1}, q^{K−2}], then the second buyer must offer the price (1 − δ)c(q) + δc(q^{K−2}), since the next price offered will be c(q^{K−2}), etcetera. There are three features to notice. First, the

8 If p belongs to the interval [c(q1), c(q^K)), a seller with type q accepts the offer if and only if p − c(q) ≥ δ(c(q^K) − c(q)).
FIGURE 1.
function P is discontinuous at q^k for all k ≤ K. Second, its graph goes through the points (q^k, c(q^k)). Finally, as δ → 1, it becomes arbitrarily flat on every subinterval (q^k, q^{k−1}].

Because there are known gains from trade, the cost c(q) falls short of the conditional expected value E[v(q̃)|q̃ ≤ q] whenever q ≤ q^{K−1} (recall that q^K = f(q^{K−1})). So offers for which the seller's indifferent type exceeds q^{K−1} are necessarily unprofitable (they would be unprofitable even if no future buyer ever came). By offering c(q^{K−1}), the second buyer just breaks even. Finally, for δ close enough to 1, the function P, being arbitrarily flat, must exceed the conditional expected value E[v(q̃)|q̃ ≤ q] on the entire interval (q^K, q^{K−1}), so that offers for which the seller's indifferent type belongs to this range are unprofitable as well. This exhausts all possibilities. As a result, there are no offers that the buyer could make that would ensure him positive profits, and, in fact, he is indifferent between making a losing offer and the offer c(q^{K−1}).9

This is not the only PBE, but other PBEs differ only in irrelevant ways (see the Appendix). As for uniqueness, there is no easy argument. In Section 3.2, we provide a sketch of the proof that relies on the following insights. The first insight is that, provided there is little heterogeneity—that is, if qn is close enough to 1—making a winning offer is optimal (as doing so is optimal in the static case).

9 The discontinuities of the function P are the cause of nonexistence of (strong) Markov equilibrium; if a buyer makes an offer in such an interval, the next buyer must randomize accordingly.
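The argument can be checked numerically in the linear case. The following sketch is our own illustration; it assumes c(q) = q, v(q) = (1 + α)q with α = 1/3, a support for which the sequence stops at q^K = β^2 (for instance q1 = 0.2), and an arbitrary δ = 0.95 close to 1. On (q^K, q^{K−1}] the price floor P(q) = (1 − δ)q + δc(q^{K−1}) never falls below the conditional expected valuation (1 + α)(q^K + q)/2 of the types the second buyer would attract, with equality only at q = q^{K−1}, where he just breaks even.

```python
# Why the second buyer cannot profitably deviate, in the linear case.
# Assumptions (ours, for illustration): c(q) = q, v(q) = (1 + alpha)*q, alpha = 1/3,
# a support for which q^K = beta**2 and q^{K-1} = beta, and delta = 0.95.

alpha = 1.0 / 3.0
delta = 0.95                             # any delta close enough to 1 works
beta = (1.0 - alpha) / (1.0 + alpha)     # = 1/2
qK, qK1 = beta ** 2, beta                # q^K and q^{K-1}

def price_floor(q):
    # Lowest price accepted by type q in (q^K, q^{K-1}], given that the next offer
    # would be c(q^{K-1}) = q^{K-1}: the function P(q) of Figure 1.
    return (1.0 - delta) * q + delta * qK1

def expected_value(q):
    # Expected valuation of the attracted types, the belief being uniform on [q^K, 1].
    return (1.0 + alpha) * (qK + q) / 2.0

grid = [qK + i * (qK1 - qK) / 1000 for i in range(1, 1001)]
worst = min(price_floor(q) - expected_value(q) for q in grid)
print(f"min of P(q) - E[v | acceptance] over (q^K, q^(K-1)]: {worst:.6f}")
# The minimum is (numerically) zero, attained at q = q^{K-1}: at best the deviating
# buyer breaks even, exactly as argued in the text.
```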
Next, if the next offer is independent of the current offer, at least in some range, the indifferent type becomes very sensitive to the current offer and, within that range, only the highest or lowest offers need be considered. These two observations trigger some type of unraveling, with the result that a winning offer is submitted whenever doing so yields a positive payoff, that is, when qn > q^1.

The second main insight is that when q1 < q^1, no type beyond q^1 can possibly trade at equilibrium. Indeed, when qn < q^1, buyer n does not make an offer with indifferent type strictly above q^1. Hence, full trade would require that some buyer n make an offer that type q^1 accepts, eventually followed by a winning offer. Such a buyer could profitably deviate: by making instead an offer with indifferent type slightly below q^1, buyer n would delay the timing of the winning offer. In turn, this implies that the corresponding price is bounded below his "equilibrium" price, while the loss in the probability of acceptance is arbitrarily small. As a consequence, an offer equal to c(q^1) is accepted by all types up to q^1 and is followed by losing offers. Hence, and this is the last insight, it is in a sense as if the highest existing type is q^1, rather than q^0 = 1, and the previous arguments can be repeated, first on the interval [q^2, q^1], next on [q^3, q^2], and so on.

It is now possible to draw a comparison between the dynamic version with public offers and the static version with one buyer. Observe that, depending on the exact value of q1, q∗ could be anywhere in the interval (q1, q^{K−1}), so that both q^K > q∗ and q^K ≤ q∗ may occur, where c(q∗) is the optimal offer in the static version.10 This means that from the seller's point of view, the comparison between the dynamic version and the static version with a unique buyer is ambiguous. The probability of sale and the expected revenue could be larger in either format depending on q1. However, it makes more sense to compare this equilibrium outcome with the equilibria in the static game with multiple buyers. This comparison is immediate, as the offer in the static case must be at least as large as c(q^K). Thus, the seller is better off in the static version, having the different buyers compete simultaneously for the unit, rather than one at a time. The probability of trade is higher in the static version. Only the first buyer is better off in the dynamic version, while all other buyers are indifferent.

As an immediate consequence, the bargaining outcome generically fails to be ex ante efficient. That is, there exists an incentive-compatible and individually rational mechanism that yields higher expected gains from trade. Indeed, with a single buyer, consider the mechanism in which the seller must accept or reject the fixed price c(q), where q is the largest root of f(q) = q1.

10 In the linear case that we introduce in the next section, one has q1 = (1 − α)q∗ while the only constraint on q1 is ((1 − α)/(1 + α)) q^K < q1 ≤ q^K. For low (resp. high) values of q1 in this interval, the dynamic (resp. static) version yields a higher payoff to the seller.
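The linear-case relation q1 = (1 − α)q∗ quoted in the footnote follows from a short calculation, which we include as a check (assuming, as in Section 3.2, c(q) = q and v(q) = (1 + α)q):

```latex
\[
\int_{q_1}^{q}\bigl(v(t)-c(q)\bigr)\,dt
  \;=\; (q-q_1)\Bigl[\tfrac{1+\alpha}{2}(q_1+q)-q\Bigr]
  \;=\; \tfrac12\,(q-q_1)\bigl[(1+\alpha)q_1-(1-\alpha)q\bigr],
\]
\[
\frac{d}{dq}\Bigl[\tfrac12\,(q-q_1)\bigl[(1+\alpha)q_1-(1-\alpha)q\bigr]\Bigr]
  \;=\; q_1-(1-\alpha)q \;=\; 0
  \;\Longleftrightarrow\;
  q^{*}=\frac{q_1}{1-\alpha}.
\]
```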
As mentioned, it is surprising that trade does not always take place. In a sense, there is infinite delay as soon as the lowest type reaches qk . The most recent contribution to this literature, Deneckere and Liang (2006), considered the case of a single long-run buyer, who had the same discount factor as the seller, rather than a sequence of short-run buyers. They defined a finite sequence (qk ) similar (but not identical) to ours and found that in the unique stationary equilibrium, as the lowest type approaches qk from below, the (single) buyer repeatedly submits offers accepted with small probability, so that delay ensues. As the time period between successive offers vanishes, this delay remains finite and bounded away from zero. In the limit, bursts of trade alternate with long periods of delay. This implies that the equilibrium acceptance function of the seller becomes increasingly similar to the equilibrium acceptance of the seller in Proposition 1. Note, however, that because delay remains finite in their model, the price accepted by type qk exceeds c(qk ), so that the actual values of the two sequences differ. Furthermore, this limit comparison ¯ is only illustrative, since the conclusion of Proposition 1 is valid for every δ > δ. Proposition 1 remains valid in the case of a single long-run buyer, provided the buyer is much less patient than the seller. Hence, when combined, these two results point out that the possibility of trade depends on the relative patience of the buyer relative to the seller, an insight already hinted at by Evans (1989) in the case of binary values. Just as Proposition 1 remains valid with a single buyer who is sufficiently more impatient than the seller, it is also valid if the number of buyers is finite, as long as the probability that each of them is selected to make the offer in each of the countably many periods is sufficiently small. The results of Vincent (1989) and Deneckere and Liang (2006) rely on the screening of types that bargaining over time affords. Because delay is costly for the seller, buyers become more optimistic over time, so that uncertainty is progressively eroded. Our no-trade result points to another familiar force in dynamic games; namely, the absence of commitment. Indeed, if the horizon were finite, the last buyer would necessarily submit a serious offer. However, since Coase’s (1972) original insight, the inability to commit has always been associated with an increase in the probability of trade. To quote Deneckere and Liang (2006, p. 1313), the “absence of commitment power implies that bargaining agreement will eventually be reached.” This is because the traditional point of view emphasizes the inability of the buyer to commit to not making another offer. Instead, the driving force here is the inability of the seller not to solicit another offer. This leads to a collapse in the probability of trade and an increase in the inefficiency. 3.2. Sketch of the Proof We here provide a sketch of why all equilibria result in an impasse. For simplicity, here we let the cost and the valuation functions be given by c(q) = q
and v(q) = (1 + α)q, where α ∈ (0, 1). With such a parametrization, the average valuation of types in any interval [a, b] is (1 + α)(a + b)/2, so that the values q^k defined in the previous section are given by q^k = β^k, where β := (1 − α)/(1 + α) ∈ (0, 1). For concreteness, we assume here that β^3 < q1 < β^2.

We let an equilibrium be given. Following any history, we let q stand for the currently lowest type. We shall solve the game "backwards." Starting with high values and moving then to lower values of q, we shall prove that the equilibrium offers are uniquely determined.

The first and main insight of the proof is that a gradual resolution of uncertainty ending with full trade cannot be an equilibrium outcome. To be specific, there is a threshold, q̃, such that any equilibrium offer is either rejected by all types in (q̃, 1] or accepted by all types in (q̃, 1].

We establish this claim in several steps. Note first that no equilibrium offer exceeds c(1) = 1 and that an offer of c(1) = 1 is winning. Indeed, following any history, any offer above v(1) = 1 + α is strictly dominated. Hence, no such offer is ever made at equilibrium. Therefore, following any history, any offer above 1 + δα is accepted by all types, since the discounted benefit from turning it down is at most δα. This implies in turn that any offer which exceeds 1 + δ^2 α is accepted by all types, etcetera.

Next, we observe that when q is close enough to 1, the equilibrium offer of the current buyer is winning. Indeed, if this buyer were the last one, an offer q′ ≤ 1 would be accepted by all types in [q, q′]. The buyer's expected payoff would then be given by the expected benefit conditional on trade, (1 + α)(q + q′)/2 − q′, times the probability of this offer being accepted, (q′ − q)/(1 − q). This payoff is strictly increasing on [q, 1], as soon as q ≥ 1 − α. Hence, a winning offer would be submitted. The competition from future buyers only makes the winning offer relatively more attractive, since the offer c(1) = 1 yields the same payoff with or without competition, while any other offer cannot possibly yield more with competition than without.
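Two small computations used in this subsection can be made explicit (our verification, under the stated linear parametrization). First, the defining equation of f from Section 3 reduces to a linear equation, which yields f(q) = βq and hence q^k = β^k. Second, the last buyer's payoff Π(q′) from an offer q′ is increasing in q′ whenever q ≥ 1 − α:

```latex
\[
c(q)=\frac{1}{q-x}\int_x^q v(t)\,dt
\;\Longleftrightarrow\;
q=(1+\alpha)\,\frac{x+q}{2}
\;\Longleftrightarrow\;
x=\frac{1-\alpha}{1+\alpha}\,q=\beta q ,
\]
\[
\Pi(q')=\Bigl[(1+\alpha)\,\frac{q+q'}{2}-q'\Bigr]\frac{q'-q}{1-q}
       =\frac{(q'-q)\bigl[(1+\alpha)q-(1-\alpha)q'\bigr]}{2(1-q)},
\qquad
\Pi'(q')=\frac{q-(1-\alpha)q'}{1-q}\;\ge\;0
\;\Longleftrightarrow\;
q\ge(1-\alpha)q' .
\]
```

Since q′ ≤ 1, the derivative is nonnegative on the whole interval [q, 1] exactly when q ≥ 1 − α, which is the condition used in the text.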
The fact that no equilibrium offer is accepted by only some of the types in (q̃, 1] allows us to uniquely pin down q̃.11 Whenever q > q̃, the equilibrium offer is winning. On the other hand and by definition, there are histories following which q < q̃ is arbitrarily close to q̃ and following which the equilibrium offer p is rejected by all types beyond q̃. Such an offer is accepted with a probability arbitrarily close to 0 and hence yields an expected payoff close to 0. By a limiting argument, it follows that a winning offer must yield an expected payoff of 0 when q = q̃. Equivalently, the average valuation of all types in [q̃, 1] is equal to the winning offer, 1, so that q̃ = β.

The second insight of the proof is that no type above q̃ = β can possibly trade on the equilibrium path. We argue by contradiction. If not, and in the light of our earlier claims, there must be some history such that q < β, and for which (i) the equilibrium offer of the current buyer is accepted by all types up to β and (ii) a winning offer is eventually submitted with positive probability. This equilibrium offer of the current buyer must exceed c(β); otherwise, seller types close to β would reject it. Among such histories, choose one for which the equilibrium offer p of the buyer is highest or close to being highest. We claim that if the current buyer were to offer slightly less, the lower offer p′ would still be accepted by all types up to β and would, therefore, do better—a contradiction. The logic behind this claim is as follows. If the indifferent type q′ for the lower offer p′ were below β, the next offer would at most be accepted by all types up to β, and type q′ would eventually accept an offer not exceeding p. But, provided p′ is close enough to p, type q′ would then strictly prefer to accept the lower offer p′ immediately. This shows that the indifferent type for p′ is indeed β.

The last insight in the proof is that this scenario can be repeated for q ∈ (β², β]. Let us elaborate on this. A consequence of the previous argument is that an offer of c(β) is accepted by all types up to β (and all offers after that are losing). As in the first part of the sketch, we claim that the equilibrium offer is c(β) whenever q < β is close enough to β. To see this, we mimic the earlier argument. If q ≥ β(1 − α), if the current buyer were the last, and if he were constrained not to make an offer above c(β), his optimal offer would be c(β). This implies that it is also optimal to offer c(β) given the subsequent buyers. The end of the proof follows similar lines. One first shows that the equilibrium offer is c(β) = β whenever q > β² and that equilibrium offers are rejected by all types above β² whenever q < β². We next argue that no type above β² trades on the equilibrium path. In turn, this implies that an offer of c(β²) = β² is accepted by all types up to β². Finally, we rely again on the comparison with the fictional scenario in which the current buyer would be last, to claim that c(β²) = β² is the equilibrium offer whenever q < β² is close enough
11 We formally define q̃ as the supremum over all histories of those values of q for which a winning offer need not immediately follow.
to β². In the unique equilibrium outcome, all buyers offer c(β²) = β², and the first offer is accepted if and only if the seller's type does not exceed β².

4. PRIVATE OFFERS

4.1. General Properties

Throughout this section, we maintain the assumption that offers are private. As we are unable to construct equilibria in general, we first argue that an equilibrium exists. We will apply a fixed-point argument on the space of buyers' strategies. Given a profile of buyers' strategies σB, the buyers' payoffs are computed using the optimal response of the seller to the profile σB. If no later buyer sets a price exceeding c(1), it is suboptimal for a given buyer to set such a price. Hence, for the purpose of equilibrium existence, we can limit the set of buyers' mixed (or behavior) strategies to the set M([c(q1), c(1)]) of probability distributions over the interval [c(q1), c(1)], endowed with the weak-∗ topology. The set of strategy profiles for buyers is thus the countable product M([c(q1), c(1)])^N. It is compact and metric when endowed with the product topology. Since the random outcome of buyer n's choice is not known to the seller unless he has rejected the first n − 1 offers, buyer n's payoff function is not the usual multilinear extension of the payoff induced by pure profiles. Given a period n, we denote by qn(p, σB) the indifferent type when the offer p is submitted in stage n, given the strategy profile σB. It is jointly continuous in p and σB. As a result, the belief of any given buyer (viewed as a probability distribution over [q1, 1]) is continuous with respect to σB, in the weak-∗ topology. Hence, the set Bn(σB) ⊂ M([c(q1), c(1)]) of best replies of buyer n to σB is both convex-valued and upper hemicontinuous with respect to σB. The existence of a (Nash) equilibrium follows from Glicksberg's fixed-point theorem. To any such equilibrium, there corresponds a perfect Bayesian equilibrium, because all buyers' histories are on the equilibrium path until trade occurs, if ever.

While an equilibrium exists, it need not be unique. As an example, for c(q) = q, v(q) = (1 + α)q, α = 1/3, δ = 3/4, and q1 = .4249, it can be shown that the following two (and probably more) equilibria exist.
• The first buyer offers p∗1 ≈ .61 and attracts all types up to β = 1/2 for sure. The second buyer makes a losing offer. Buyer n ≥ 3 mixes between a winning and a losing offer, offering the winning price 1 with probability 3/20.
• The first buyer randomizes between a losing offer and the offer p∗1 ≈ .63 with indifferent type q∗1 ≈ .51, assigning a probability μ1 ≈ .40 to the losing offer. The second buyer mixes between a losing offer and the offer p∗2 ≈ .72 with indifferent type q∗2 ≈ .62, assigning a probability μ2 ≈ .76 to the losing offer. Buyer n ≥ 3 mixes between a winning and a losing offer, offering the losing price with probability μ ≈ .94.
We summarize the discussion so far in the following lemma.
LEMMA 1: An equilibrium exists. For some parameters, the equilibrium is not unique. More precisely, the equilibrium is unique if and only if q1 > q̄1, as defined in Section 3.

The two equilibria described above differ quantitatively in terms of delay, revenue, and payoffs. Nevertheless, there are some qualitative similarities. Both equilibria involve mixed strategies. Also, trade occurs with probability 1. This is no coincidence.

PROPOSITION 2: In all equilibria, trade occurs with probability 1.

PROOF: Fix some equilibrium. Given q ∈ [q1, 1], let Fn(q) denote the unconditional probability that the seller is of type t ≤ q and has rejected all offers submitted by buyers i = 1, …, n − 1, and denote by F the pointwise limit of the nonincreasing sequence (Fn). Suppose that F(q) = 0 for some q < 1. In particular, the probability that the seller accepts buyer n's offer, conditional on having rejected the previous ones, converges to 0 as n increases. Hence, the successive buyers' payoffs also converge to 0. Choose q such that F(q) > 0 and
(1) ∫_{q1}^{q} ( v(t) − c(q) − ν/2 ) dF(t) > 0,
where ν := min_{x∈[q1,1]} {v(x) − c(x)}. Note that (F(q) − Fn(q))/Fn(q) is the probability that type q accepts an offer from a buyer after n, conditional on having rejected all previous offers. Since F(q) > 0, this probability converges to 0 and the offer pn(q) with indifferent type q tends to no more than c(q). Thus, pn(q) ≤ c(q) + ν/2 for n large and, using (1), buyer n's payoff is bounded away from 0—a contradiction. Q.E.D.

Observe that Proposition 2 holds independently of δ and establishes that offers arbitrarily close to c(1) are eventually submitted. According to Proposition 2, agreement is always reached in finite time. This raises the question of delay. That is, let τ(1) denote the random period in which a winning offer is first submitted. The next proposition places bounds on E[δ^{τ(1)}], the expected delay until agreement. In particular, it implies that τ(1) is finite almost surely (a.s.): a winning offer is submitted in finite time, with probability 1.

PROPOSITION 3: Assume that δ > δ̄. There exist constants 0 < c1 < c2 < 1 such that, in all equilibria,
c1 ≤ E[δ^{τ(1)}] ≤ c2.
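As a purely numerical illustration of Propositions 2 and 3, the following sketch evaluates the first equilibrium described in Section 4.1. The computation is ours, not the paper's: it simply applies the seller's indifference condition under the stated strategies, and all variable names are our own.

```python
# Illustrative check of the first equilibrium example (c(q) = q, v(q) = (1 + alpha) q,
# alpha = 1/3, delta = 3/4): buyer 2 makes a losing offer, and every buyer n >= 3
# offers the winning price 1 with probability lam.
delta = 3 / 4          # common discount factor
lam = 3 / 20           # probability of a winning offer from each buyer n >= 3
beta = 1 / 2           # cutoff type attracted by the first buyer

# Expected discount factor at which the first winning offer arrives: it can only
# arrive in periods n >= 3, with a geometric arrival time.
expected_discount = sum(delta ** (n - 1) * lam * (1 - lam) ** (n - 3)
                        for n in range(3, 200))

# Price leaving type beta indifferent between accepting now and waiting:
# p1 - c(beta) = E[delta^(tau(1) - 1)] * (c(1) - c(beta)), with c(q) = q.
p1 = beta + expected_discount * (1 - beta)

print(f"E[delta^(tau(1)-1)] ~ {expected_discount:.3f}")   # roughly 0.23
print(f"first-period offer p1 ~ {p1:.3f}")                # roughly 0.62
```

The expected discount factor at which the winning offer arrives is strictly between 0 and 1, in line with Proposition 3, and the computed first-period price is close to the value reported above (any small discrepancy reflects rounding in the reported figures).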
The proof of this and all remaining results can be found in the Appendix. Delay (c2 < 1) should not come as a surprise. Since the seller can wait until the first winning offer is submitted and serious offers until then must yield a nonnegative payoff to the buyers submitting them, delay must make waiting for the winning offer a costly alternative to the seller's lower types. Slightly less obvious is the second conclusion; namely, that delay does not dissipate all gains from trade (c1 > 0).

In the first example of this section, the first buyer enjoys a positive payoff, but all other buyers get a zero payoff. More complicated examples of equilibria can be constructed in which more than one buyer gets a positive payoff. However, in all equilibria, all buyers' payoffs are small.

PROPOSITION 4: There exists a constant M1 > 0 such that, for every δ > δ̄ and every equilibrium, the payoff of any buyer n is at most
(1 − δ)² M1.

According to the next proposition, buyers with a positive equilibrium payoff are infrequent in the sense that two such buyers must be sufficiently far apart in the sequence of buyers.

PROPOSITION 5: There exists a constant M2 > 0 such that, for every δ ≥ δ̄ and every equilibrium, the following statement holds: If buyer n1 and buyer n2 both get a positive payoff, then
|n2 − n1| ≥ M2/(1 − δ).

4.2. Equilibrium Strategies
As mentioned, we do not provide an explicit characterization of an equilibrium for general parameters. Nevertheless, all equilibria share common features. Given some equilibrium, let Fn(q) denote the (unconditional) probability that the seller's type t is less than or equal to q and that all offers submitted by buyers i = 1, …, n − 1 are rejected. Set qn := inf{q : Fn(q) > 0}. Buyer n's strategy is a probability distribution over offers in [c(q1), c(1)]. We denote by Pn its support and by Tn the corresponding (closed) set of indifferent types. That is, if buyer n's strategy has finite support, q ∈ Tn if it is an equilibrium action for buyer n to submit some pn with indifferent type q. The following proposition complements Proposition 5, as together they imply that the number of buyers with a positive payoff is bounded above, uniformly in the discount factor and the equilibrium.

PROPOSITION 6: Assume that δ > δ̄. Given some equilibrium, let N0 := inf{n ∈ N ∪ {∞} : 1 ∈ Tn}. There exists a constant M3 > 0 such that, in all equilibria, N0 ≤ M3/(1 − δ).
Further, given some equilibrium, Tn ⊂ {qN0, 1} for all n ≥ N0. For all n > N0, buyer n's equilibrium payoff is zero.

Observe that, by Propositions 4, 5, and 6, the (undiscounted) sum of all buyers' payoffs is at most (1 − δ)² M3 M1/M2, and therefore vanishes as the seller becomes more patient. Thus, from period N0 on, buyers only make winning or losing offers, and all but the first of these have a payoff of zero. In fact, it follows readily from the proof that the equilibrium payoff of buyer N0 is zero as well, as long as q1 ≤ q̄1. If q1 > q̄1, it follows from Proposition 6 that in the unique equilibrium outcome, the first buyer offers c(1), which the seller accepts. Indeed, provided that he is called upon to submit an offer, any buyer is guaranteed a positive payoff, because he can always offer c(1). In all other cases, there exist multiple equilibria. In particular, there always exists an equilibrium in which agreement is reached in bounded time, as well as an equilibrium in which agreement is reached in unbounded, yet finite, time. Indeed, given any equilibrium profile σ, the probabilities assigned by buyers n > N0 to the winning offer can be modified so that the modified profile is still an equilibrium with the desired property. While the equilibria obtained in this way are payoff-equivalent, the first example in this section shows that this is not true across all equilibria. For q1 < q̄1, the next proposition formalizes the idea that all equilibria are in mixed strategies.

PROPOSITION 7: Assume that δ > δ̄ and q1 < q̄1. No buyer n ≤ N0 uses a pure strategy except possibly buyer 1. All buyers n ≤ N0 submit a serious offer with positive probability.

Indeed, buyer 1 need not use a mixed strategy, as the first example given in this section illustrates. Without further assumptions, it is difficult to establish additional structural properties on equilibrium strategies. However, it can be shown that if v is concave and c is convex over (q1, 1), with either v or c being strictly so, then each buyer's strategy is a distribution with finite support, so that each buyer randomizes over finitely many offers only.12 Propositions 6 and 7 allow us to circumscribe the equilibrium strategies as follows. During a first phase of the game (until period N0 − 1), buyers' strategies assign positive probability to more than one offer (with the possible exception of the first buyer's strategy); in particular, they all assign positive probability to serious, but not winning, offers. Some of these buyers may enjoy a small positive payoff, while all others have zero payoff; in fact, the number of those not submitting a losing offer with positive probability is finite as well. In
12
See Hörner and Vieille (2007).
a second phase (from period N0 on), all buyers' payoffs are zero, and they randomize between the winning offer and a losing offer, with relative probabilities that are to a large extent free variables. Thus, as long as offers are rejected, the unit's expected value increases until N0 and is constant thereafter. It is tempting to investigate the existence of equilibria in which all but the first buyers' strategies assign positive probability to two offers, as in the second example given at the beginning of this section. Such equilibria need not exist, suggesting that either some buyers' strategies assign positive probability to more than two offers or that the lower offer in the support of some buyers' strategies is serious as well. It is possible to construct equilibria of the second kind, but doing so generally is intractable, even numerically.

4.3. Comparison With the Public Case

The main difference between the two information structures has been emphasized throughout: trade occurs with probability 1 with private offers, but not generally so with public offers. Other comparisons are less clear-cut. In particular, we have been unable to rank the two structures according to efficiency. However, in all examples that we have been able to solve, including those presented in Section 4, welfare is higher under private offers than under public offers. Further, it can be shown that this result holds whenever there are two types only.13 It is straightforward to show that no equilibrium is second-best efficient. In terms of interim efficiency, the comparison is ambiguous. Considering the second example in Section 4, it is easy to check that very low types prefer the outcome under observable offers, while very high types prefer the outcome under hidden offers. From the buyers' point of view, since it is possible that q∗ = qK, Samuelson's (1984) Proposition 1 implies that the outcome with public offers is the preferred one among the outcomes of all bargaining procedures from the first buyer's perspective. In particular, since eventual agreement in the hidden case implies that serious (but not winning) offers involve prices higher than the cost of the corresponding indifferent type, the first buyer prefers the outcome of the game with public offers to the outcome of the game with private offers whenever q∗ is sufficiently close to qK. The same argument applies to the aggregate buyers' payoff. Buyers n ≥ 2 prefer the outcome with hidden offers, although any difference disappears as the discount factor tends to 1 (see Proposition 4).

5. RELATED LITERATURE AND CONCLUDING COMMENTS

Our contribution is related to the literature in three ways. First, several authors have already considered dynamic versions of Akerlof's model. Second,
13
The proof can be found in the supplemental material (Hörner and Vieille (2009)).
several papers in the bargaining literature considered interdependent values. Third, several papers in the literature on learning addressed the conditions under which learning occurs. In particular, two papers have investigated the difference between public and private offers in the framework of Spence’s signaling model. Janssen and Roy (2002) considered a dynamic, competitive durable good setting with a fixed set of sellers. They proved that trade for all qualities of the good occurs in finite time. The critical difference lies in the market mechanism. In their model, the price in every period must clear the market. That is, by definition, the market price must be at least as large as the good’s expected value to the buyer conditional on trade, with equality if trade occurs with positive probability (this is condition (ii) of their equilibrium definition). This expected value is derived from the equilibrium strategies when such trade occurs with positive probability and it is assumed to be at least as large as the lowest unsold value even when no trade occurs in a given period (this is condition (iv) of their definition). This implies that the price exceeds the valuation to the lowest quality seller, so that trade must occur eventually. Also related works are Taylor (1999), Hendel and Lizzeri (1999), House and Leahy (2004), and Hendel, Lizzeri, and Siniscalchi (2005). In the bargaining literature, Evans (1989), Vincent (1989), and Deneckere and Liang (2006) considered bargaining with interdependent values. Evans (1989) considered a model in which the seller’s unit can have one of two values, and assumed that there is no gain from trade if the value is low. He showed that the bargaining may result in an impasse when the buyer is too impatient relative to the seller. In his Appendix, Vincent (1989) provided another example of equilibrium in which bargaining breaks down. As in Evans, the unit can have one of two values. It follows from Deneckere and Liang (2006) that his example is generically unique. Deneckere and Liang (2006) generalized these findings by considering an environment in which the unit’s quality takes values in an interval. They characterized the (stationary) equilibrium of the game between a buyer and a seller with equal discount factors, in which, as in ours, the uninformed buyer makes all the offers. When the static incentive constraints preclude first-best efficiency, the limiting bargaining outcome involves agreement but delay, and fails to be second-best efficient. Other related contributions include Riley and Zeckhauser (1983), Cramton (1984), and Gul and Sonnenschein (1988). There is a large literature on learning. Some papers have examined the conditions under which full learning occurs as time passes and under rather general conditions (see Aghion, Bolton, Harris, and Jullien (1991)). These models, however, are typically cast as decision problems in which Nature’s response is exogenously specified. Here instead, the information that is being revealed is a function of the seller’s best reply, which is endogenous. Nöldeke and van Damme (1990) and Swinkels (1999) developed an analogous distinction in Spence’s signaling model. Both considered a discrete-time version of the
model, in which education is acquired continuously and a sequence of short-run firms submit offers that the worker can either accept or reject. Nöldeke and van Damme considered the case of public offers, while Swinkels focused mainly on private offers. Nöldeke and van Damme showed that there is a unique equilibrium outcome that satisfies the never-a-weak-best-response requirement and that the equilibrium outcome converges to the Riley outcome as the time interval between consecutive periods shrinks. For private offers, Swinkels proved that the sequential equilibrium outcome is unique and that it involves pooling in the limit. The logic driving these results is similar to ours, at least for public offers. In both papers, firms (buyers) are deterred from submitting mutually beneficial offers because rejecting such an offer sends a strong signal to future firms and elicits wage offers so attractive that only very low types would accept the current offer.

There are many conceivable variations to the simple setup considered here, and the reader may wonder how far the results extend to more complex environments. For instance, one may wish to model buyers as being unaware of the number of prior offers that have been submitted. One may want to allow for multiple buyers in each period. One could also allow the seller to choose how much information to disclose, to make offers, or to send messages that are more or less committing, such as list prices in the case of real estate. These extensions are further discussed in the working paper (see Hörner and Vieille (2007)). One issue which is not addressed there is the impact of potential liquidity shocks on the seller.14 For instance, assume that the seller is subject to random temporary liquidity shocks, independent across periods, upon which the seller would accept any offer that meets his cost. If these costs are unobserved by the buyers, then the beliefs of successive buyers are no longer uniform.15

Finally, it might also have occurred to the reader that the impasse result for the public case cannot possibly hold if the horizon is finite, as the situation of the last buyer then reduces to the static case, for which the unique optimal offer is serious. However, it is not hard to see that, for a fixed horizon T, there exists a discount factor δ̄_T < 1 such that if δ > δ̄_T, all offers but the last one are necessarily losing (provided q1 < q̄1): because the last buyer's offer is serious, the objective function of the penultimate buyer inherits the convexity property stressed in Section 3.1 that compels him to submit a losing offer, and the result follows by backward induction.16

A crucial assumption throughout the analysis has been that buyers do not receive private signals. Allowing for private information is both economically
14 We thank two referees for raising this issue.
15 As a result, the analysis of this variant seems to be beyond reach. However, in the case where cost and valuations may take only two different values, results are qualitatively similar, with or without liquidity shocks. The proof is available upon request.
16 One could also fix the discount factor δ > δ̄ and let T tend to infinity. The complete analysis of this case is quite intricate and we have not been able to resolve it satisfactorily.
relevant and likely to drastically affect the results in the public case. In this case, later buyers learn about the seller's type not only through the seller's earlier actions, but also through the offers that were made by previous buyers, as in the literature on cascades (see Bikhchandani, Hirshleifer, and Welch (1992)). A unit might remain on the market for a longer time period either because the seller is particularly reluctant to give up a good that he knows to be of high quality or because all previous buyers' signals have been unfavorable. The possibility of the latter event is likely to depress later offers, but we conjecture that this might paradoxically help trade by prompting the seller to accept earlier offers.
APPENDIX A: PROOFS FOR PUBLIC OFFERS

A.1. Proof of Proposition 1

We first prove the uniqueness of the PBE outcome: On the equilibrium path of any PBE, all buyers offer c(qK), which the seller accepts if and only if his type does not exceed qK; otherwise, the seller rejects all offers. We next discuss the existence issue.
We prove the uniqueness of the PBE outcome by induction over K. The proof for K = 0 is, in most respects, identical to the proof of the induction step and we, therefore, provide only the latter. We let a (perfect, Bayesian) equilibrium be given. Recall that qn stands for the lowest type which rejected all offers up to stage n (according to the seller's strategy). Assume that for some k < K, the following statements hold for each l < k:
• For every period n ≥ 1 and after any history hn−1 such that qn ∈ (ql+1, ql), all subsequent buyers, including buyer n, offer c(ql).
• For every period n ≥ 1 and after any history hn−1 such that qn ≤ ql+1, none of the subsequent equilibrium offers ever exceeds c(ql).
We will prove that the same conclusion holds for k + 1. The proof is broken into the following four steps:
Step 1. Whenever qn < qk, no equilibrium offer of buyer n is accepted by some type q > qk.
Step 2. Whenever qn < qk, none of the following equilibrium offers ever exceeds c(qk).
Step 3. If qn < qk is close enough to qk, the unique equilibrium offer of buyer n is c(qk), which the seller accepts if and only if his type does not exceed qk.
Step 4. More generally, if qn ∈ (qk+1, qk), the unique equilibrium offer of buyer n is c(qk), which the seller accepts if and only if his type does not exceed qk.
Step 1: We let a stage n and a history hn−1 be given, such that q̲ := qn < qk. Fix l < k − 1 and let q ∈ (ql+1, ql]. We claim that an offer p by buyer n is accepted by type q if and only if
(2) p − c(q) ≥ δ(c(ql) − c(q)).
Indeed, first let p be an offer that would be accepted by type q and denote by q′ ≥ q the indifferent type of p. If q′ ≥ ql, then p ≥ c(ql); hence (2) holds. If, on the other hand, q′ ∈ [q, ql), the offer following p would be c(ql) by the induction hypothesis. Since type q is willing to accept p, (2) also holds in that case.
Conversely, assume that there is some offer p that satisfies (2) and that would be rejected by type q. Denote by q′ < q the indifferent type of p. If q′ > ql+1, then the offer following p would be c(ql) and would be accepted by type q by the induction hypothesis. If q′ ≤ ql+1, by the induction hypothesis again, none of the equilibrium offers following p would ever exceed c(ql). In both cases, type q would rather accept p in the first place.
For q ∈ (ql+1, ql], define p(q) by
p(q) − c(q) = δ(c(ql) − c(q)).
The indifferent type of the offer p(q) is q. When offering p(q), buyer n's payoff is therefore
[1/(1 − q̲)] ∫_{q̲}^{q} {v(t) − δc(ql) − (1 − δ)c(q)} dt.
As a function of q, the integral is twice differentiable over the interval (ql+1, ql], with first and second derivatives given by17
v(q) − δc(ql) − (1 − δ)c(q) − (1 − δ)c′(q)(q − q̲)
and
v′(q) − (1 − δ)[c″(q)(q − q̲) + 2c′(q)].
Since (2Mc + Mc″)(1 − δ) < mv, buyer n's payoff is strictly convex over (ql+1, ql]. Since buyer n's payoff is negative for q = ql, the claim follows.
Step 2: We argue by contradiction. Assume that, for some stage n and some history hn−1 with qn < qk, there is a positive probability that buyer n makes an equilibrium offer pn which exceeds c(qk). By Step 1, the indifferent type of any such equilibrium offer does not exceed qk. Let p̄ > c(qk) be the supremum of
17
Up to the constant 1 − q.
all such offers, where the supremum is taken over all such stages n and histories hn−1 . Define p by the equation p − c(qk ) = δ(p¯ − c(qk )). We claim that, for any stage n and following any history hn−1 such that qn < qk , any offer p ≥ p would be accepted by all types up to qk . Hence, no equilibrium offer would ever ex¯ this will imply the desired contradiction. ceed p. Since p < p, Assume to the contrary that, for some stage n and following some history hn−1 , the indifferent type q of some offer p ≥ p satisfies q < qk . Following the offer p , one has qn+1 = q < qk . Upon declining the offer p , type q would accept the first (if ever) offer whose indifferent type exceeds q. By definition, ¯ By definition of p, type q would strictly this offer cannot possibly exceed p. prefer to accept p in the first place. Step 3: Let a stage n and a history hn−1 ∈ H n−1 be given, with q := qn < qk . Given q ∈ (q qk ], denote by p(q) the infimum of the offers that type q would accept. Obviously, p(q) ≥ c(q). On the other hand, and by Step 2, any offer that is rejected by type q is followed by offers which do not exceed c(qk ); hence p(q) ≤ c(qk ). q q 1 1 {v(t) − p(q)} dt, is at most 1−q {v(t) − Hence, buyer n’s payoff, 1−q q q c(q)} dt, and the difference between the two integrals converges to 0 as q increases to qk . The latter integral, as a function of q, is differentiable, with derivative v(q) − c(q) − c (q)(q − q), which is positive whenever q − q < ν/Mc . q 1 {v(t) − c(q)} dt, is inThus, for q close enough to qk , the upper bound, 1−q q creasing over [q qk ). Hence, for such q, and since an optimal offer is assumed to exist, buyer n’s equilibrium offer is c(qk ), which the seller accepts if and only if his type does not exceed qk . Step 4: Again, we argue by contradiction. We assume that for some stage n and some history hn−1 with qn > qk+1 , buyer n’s strategy assigns a positive probability to serious offers with indifferent type below qk . Among all such n and hn−1 , let q˜ ∈ (qk+1 qk ] be the supremum of qn . By Step 3, q˜ < qk . Consider now any stage n and any history hn−1 such that q := qn satisfies ˜ Much as in Step 1, we claim that an offer p is accepted by type q ∈ (qk+1 q]. ˜ qk ) if and only if q ∈ (q (3)
p − c(q) ≥ δ(c(qk ) − c(q))
Indeed, let p be an offer that would be accepted by type q, and denote by q ≥ q the indifferent type of p. If q ≥ qk , then p ≥ c(qk ); hence (3) holds. On the other hand, if q ∈ [q qk ), then the offer following p would be c(qk ) by the induction hypothesis. Since type q is willing to accept p, (3) also holds in that case.
Conversely, assume that there is some offer p that satisfies (3) and that would be rejected by type q. By Step 2, none of the subsequent offers would ever exceed c(qk ). Hence type q would rather accept p in the first place. ˜ qk ), define p(q) as For q ∈ (q p(q) − c(q) = δ(c(qk ) − c(q)) As a function of q, and when offering p(q), buyer n’s payoff is q 1 {v(t) − δc(ql ) − (1 − δ)c(q)} dt 1−q q As in Step 1, the integral is a strictly convex function of q. Therefore, the indifferent type of any equilibrium offer is either equal to qk or lies in the ˜ In the former case, buyer n’s offer is c(qk ) and his payoff is interval [q q]. positive since q˜ > qk+1 . In the latter case, buyer n’s payoff is at most (q˜ − ˜ − c(q)), which is arbitrarily close to 0, provided q is close enough to q. ˜ q)(v(q) ˜ the unique equilibrium offer of buyer As a consequence, for q < q˜ close to q, ˜ n is c(qk ), with indifferent type qk . This contradicts the definition of q. This concludes the proof that all PBE have the same outcome: all buyers offer c(qK ), which the seller accepts if and only if his type does not exceed qK . It is instructive to derive more precise insights into equilibrium strategies. Let a stage n and a history hn−1 be given, with qn ∈ (ql+1 ql ) for some l ≤ K. We already know that the unique equilibrium offer following hn−1 is c(ql ), and all subsequent equilibrium offers are equal to c(ql ). What happens off equilibrium? If buyer n makes an offer p < c(ql ) with independent type q, one has q ∈ (ql+1 ql ); thus the offer p is followed by offers equal to c(ql ). Hence the indifferent type q is the solution of the equation p − c(q) = δ(c(ql ) − c(q)).18 If instead buyer n makes an offer p > c(ql ), say c(ql ) < p ≤ c(ql−1 ), the situation is a bit more subtle. We know that the indifferent type q associated with the offer p is at least equal to ql (for otherwise, all subsequent offers would be equal to c(ql ), and type q would strictly prefer to accept the current offer). If q > ql , then the offer p is followed by offers equal to c(ql−1 ), which type q accepts; hence q is given by p − c(q) = δ(c(ql−1 ) − c(q)) This equation has a solution in (ql ql−1 ] if and only if (1 − δ)c(ql ) + δc(ql−1 ) < p ≤ c(ql−1 ) 18
If it exists. Otherwise, if the offer p is so low that a solution does not exist, the offer is losing.
If instead, p lies in the interval (c(ql ) (1 − δ)c(ql ) + δc(ql−1 )], the indifferent type of the offer p is necessarily equal to ql . But this implies that the offer p is eventually followed by serious offers. Which serious offers? Since qn+1 = ql , the only possible serious equilibrium offer is c(ql−1 ), provided it is accepted by all types up to ql−1 . In other words, indifference of type ql relies on the fact that with positive probability, some subsequent buyer will make an offer equal to c(ql−1 ), followed by offers all equal to c(ql−1 ). These probabilities must be such that the benefit p − c(ql ) derived from the current offer is equal to the discounted benefit derived from declining the offer p. The latter is given by c(ql−1 ) − c(ql ) times the expectation of δτ−n , where τ is the first buyer who makes an offer equal to c(ql−1 ). As a consequence, there is some freedom in specifying the equilibrium behavior of buyers after the history hn−1 and an offer of p. (We may, for instance, assume that only buyer n does randomize, as done in the main body of the paper.) Similar arguments apply to pin down the equilibrium behavior of the buyers after any history hn−1 such that qn = qk for some k ≤ K. In this case, consider the most recent stage m < n with qm < qk , and assume without loss of generality that m = n − 1. If along hn−1 , buyer n − 1 made the offer c(qk ), then buyer n should make the offer c(qk ), as should all subsequent buyers as well. If instead buyer n − 1 made a higher offer, then buyer n (and subsequent buyers) should randomize between c(qk ) and c(qk−1 ) in a way such that the prospect of this randomization made type qk indifferent between accepting and declining the offer of buyer n − 1. The seller’s strategy is then uniquely determined (apart from the acceptance decision of the indifferent type): (i) an offer in the interval [c(qk ) (1 − δ)c(qk ) + δc(qk−1 )] is accepted by all types q ≤ qk and is rejected by all types q > qk , (ii) an offer p in the interval ((1 − δ)c(qk ) + δc(qk−1 ) c(qk−1 )) is accepted by all types q such that p − c(q) ≤ δ(c(qk−1 ) − c(q)) and is rejected otherwise.19 We proved that the equilibrium outcome is at most unique and that all PBE coincide, apart from some off-equilibrium randomizations. To conclude the proof, we briefly sketch the definition of an equilibrium profile. Define first the strategy σS of the seller according to the previous paragraph. Given σS , we define strategies σB of the buyers according to the paragraph before. It is straightforward, but somewhat tedious, to check that the profile σ = (σS σB ) is a PBE of the game. Q.E.D.
19 With the convention qK+1 = q1 and q−1 = q0 = 1.
APPENDIX B: PROOFS FOR PRIVATE OFFERS B.1. Preliminaries The remainder of the Appendix is organized as follows. As a preliminary, we set up some additional notation and state a few important facts. We then prove Propositions 3–7, though in a different order. We start with Propositions 4 and 5. We then need to prove Proposition 6—with the exception of the upper bound on N0 . Indeed, it is a logical preliminary to Proposition 7, which we prove next, and its proof is instrumental in the proof of Proposition 3. A strategy of buyer n is a probability distribution σBn over offers. Any profile σB of such distributions induces a probability distribution over sequences of offers, which we denote PσB . Expectation with respect to PσB is denoted by EσB . We denote by p˜ n the random offer of buyer n. If a seller with type q declines the first offer and plans to accept an offer at a (random) time τ > 1, his expected payoff is EσB [δτ−1 (p˜ τ − c(q))]. His optimal continuation payoff is thus supτ>1 EσB [δτ−1 (p˜ τ − c(q))], where the supremum is taken over all stopping times τ > 1, and the offer p1 (q) with indifferent type q is given by p1 (q) − c(q) = sup EσB δτ−1 (p˜ τ − c(q)) τ>1
We assume for concreteness that a seller accepts an offer whenever he is indifferent. Thus, a seller with type q accepts the offer from buyer τ(q) := inf{n ≥ 1 : p˜ k ≥ pk (q)}. Similarly, the offer pn (q) with indifferent type q is given in stage n by (4) pn (q) − c(q) = sup EσB δτ−n (p˜ τ − c(q)) τ>n
The function pn (·) is continuous and increasing. The stopping time τn (q) := inf{k > n : p˜ k ≥ pk (q)} is an optimal stopping time in (4); hence pn (·) also satisfies pn (q) − c(q) = EσB δτn (q)−n p˜ τn (q) − c(q) It follows that (5)
pn (q) − c(q) ≥ δ(pn+1 (q) − c(q))
with equality if and only if buyer n + 1 makes no offer above pn+1 (q): competition between successive buyers prevents pn from being much below pn+1 .20 20 On the other hand, it may happen that pn+1 is much below pn , as is, for example, the case if buyer n + 1 makes high offers with high probability, followed by losing offers.
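As a reading aid (our addition, which follows directly from the definition (4)), the inequality (5) is obtained by restricting the supremum in (4) to stopping times larger than n + 1:
pn(q) − c(q) = sup_{τ>n} EσB[δ^{τ−n}(p̃τ − c(q))] ≥ sup_{τ>n+1} EσB[δ^{τ−n}(p̃τ − c(q))] = δ(pn+1(q) − c(q)).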
Using a version of the envelope theorem, the function pn has a left derivative everywhere, given by
(6) D−pn(q) = c′(q)(1 − EσB[δ^{τn(q)−n}]).
It also admits a right derivative everywhere, given by
D+pn(q) = c′(q)(1 − EσB[δ^{τn+(q)−n}]),
where τn+(q) := inf{k > n : p̃k > pk(q)}. Note that EσB[1 − δ^{τn(q)−n}] is nondecreasing in q and, therefore, pn is convex if the cost function is convex.
The function pn plays a crucial role in the proofs. It may be interpreted as an (inverse) offer function faced by buyer n, and (6) provides a direct link between the slope of this offer function at q and the discounted time at which a seller with type q expects to receive an acceptable offer—the earlier the discounted time, the lower the slope of pn.
We now comment on the beliefs of the various buyers. Since offers are private, the belief of buyer n need not be a uniform distribution. Recall that Fn(q) is the (unconditional) probability that the seller is of type t ≤ q and rejects offers from buyers 1 through n − 1. Letting
fn(q) = [1/(1 − q1)] ∏_{k=1}^{n−1} PσB(p̃k < pk(q))
denote the (normalized) probability that a seller with type q rejects the first n − 1 offers, one has
Fn(q) = ∫_{q1}^{q} fn(t) dt.
Observe that fn is left-continuous and nondecreasing, so that Fn is nondecreasing, convex, and admits a left derivative D−Fn(q) = fn(q). We last introduce qn := max{q ∈ [q1, 1] : Fn(q) = 0}, the infimum of the types that reject the first n − 1 offers with probability 1. With these notations at hand, the expected payoff πn(q) of buyer n, when submitting the offer pn(q), is given by
(7) πn(q) := ∫_{qn}^{q} (v(t) − pn(q)) fn(t) dt.
We denote by vn(q) = [1/Fn(q)] ∫_{qn}^{q} v(t)fn(t) dt the average valuation of types below q, as seen by buyer n. Then (7) is rewritten as
πn(q) = Fn(q)(vn(q) − pn(q)),
which reads as the probability that the nth offer is accepted, Fn (q), times the conditional payoff, given that trade takes place. The payoff function πn has a left derivative everywhere, equal to (8)
D− πn (q) = (v(q) − pn (q))fn (q) − D− pn (q)Fn (q)
It also has a right derivative, given by (9)
D+ πn (q) = (v(q) − pn (q))fn+ (q) − D+ pn (q)Fn (q)
where fn+ is the right derivative of Fn . We stress that the quantities introduced so far, pn , Fn , fn , qn , πn , and vn all depend on the profile σ under consideration, although the notation does not indicate this. Throughout the Appendix, we let an equilibrium σ ∗ be given, and no confusion should arise. For conciseness, we will refer to the indifferent type associated with an offer p as a type offer. For instance, the statement buyer n submits a type offer q is logically equivalent to the statement buyer n submits an offer with indifferent type q. Since πn is continuous, the equilibrium payoff πn∗ of buyer n is equal to max[qn 1] πn , and one has πn (q) = πn∗ for every q ∈ Tn , where Tn is the support of the random type offer of buyer n.21 Finally, we state a preliminary observation that is used repeatedly below. We consider a buyer, say n + 1, who submits only type offers bounded away from qn+1 —the lowest remaining type. We prove that the previous buyer then makes no serious type offer below the lowest serious type offer of buyer n + 1. LEMMA 2: Assume qn+2 > qn+1 for some n ∈ N. Then buyer n submits no type offer in (qn qn+2 ).22 In particular, qn+1 = qn , buyer n submits a losing offer with positive probability, and πn∗ = 0. ∗ > 0, since πn+1 (qn+1 ) = The inequality qn+2 > qn+1 is satisfied whenever πn+1 0 and πn+1 (q) is therefore arbitrarily close to zero in a neighborhood of qn+1 . Lemma 2 thus implies that there are no two consecutive buyers with a positive equilibrium payoff.
PROOF OF LEMMA 2: By assumption, a seller with type q ∈ (qn qn+2 ) plans to accept buyer n + 1’s offer with probability 1, were he to decline buyer n’s offer. Thus, the seller’s continuation payoff is δEσB∗ [p˜ n+1 − c(q)] and, therefore, 21 22
That is, Tn is the smallest closed set of type offers that is assigned probability 1 by σBn . That is, σB∗n assigns probability 0 to type offers in (qn qn+2 ).
pn (q) = (1 − δ)c(q) + δEσB∗ [p˜ n+1 ]. Since δ > δ, this implies that v(q) − pn (q) is increasing. Set z := inf{q ∈ [qn 1] : v(q) ≥ pn (q)} (with inf ∅ = qn ). Note that D− πn (q) = (v(q) − pn (q))fn (q) − c (q)Fn (q) < 0 on (qn z]. On the other hand, on the interval (z qn+2 ], D− πn is lower semicontinuous since fn is nondecreasing and left-continuous. We now prove that D− πn is increasing. Since fn is nondecreasing, one has (v(q) − pn (q))fn (q) − (v(x) − pn (x))fn (x) xq q−x
lim inf
≥ (v (q) − D− pn (q))fn (q) thus, using (8), D− πn (q) − D− πn (x) ≥ (v (q) − (1 − δ)cn (q))fn (q) xq q−x
lim inf
− (1 − δ)(c (q)Fn (q) + c (q)fn (q)) ≥ v (q) − (1 − δ)(2c (q) + c (q))fn (q) > 0 where the second inequality holds since Fn (q) ≤ fn (q) and the third inequality ¯ holds since δ ≥ δ. Since D− πn is upper semicontinuous, this implies that D− πn is strictly increasing over (z qn+2 ], hence πn is strictly convex over [z qn+2 ]. To summarize, πn is continuous, decreasing over [qn z], and strictly convex over [z qn+2 ]. Therefore, it has no maximum over (qn qn+2 ). This proves the first claim. If buyer n does not submit a losing offer with positive probability, then his lowest type offer is at least qn+2 , which implies qn+1 ≥ qn+2 —a contradiction. In particular, πn∗ = πn (qn ) = 0. This concludes the proof of the lemma. Q.E.D. B.2. Proof of Proposition 4 We here prove that equilibrium payoffs are very small. Proposition 8 below implies Proposition 4. PROPOSITION 8: The equilibrium payoff of each buyer n is at most πn∗ ≤
2 (1 − δ)2 (v(qn+1 ) − c(qn+1 ))2 δmv (1 − q1 )
PROOF: Consider a buyer n with a positive equilibrium payoff, πn∗ > 0, so that qn+1 > qn and πn∗ = Fn (qn+1 )(vn (qn+1 ) − pn (qn+1 )) We bound below each of the two terms. ∗ = 0, which implies in turn Note that πn∗ > 0 implies πn+1 (10)
pn+1 (qn+1 ) ≥ v(qn+1 )
(for otherwise buyer n + 1 would get a positive payoff when making a type offer just above qn+1 ). Note also that vn (qn+1 ) < v(qn+1 ) and that πn∗ > 0 implies pn (qn+1 ) < vn (qn+1 ): (11)
pn (qn+1 ) < vn (qn+1 ) < v(qn+1 )
Recall finally (5): (12)
pn (qn+1 ) − c(qn+1 ) ≥ δ(pn+1 (qn+1 ) − c(qn+1 ))
We rely on (10), (11), and (12) to prove first that the expected payoff conditional on trade, vn (qn+1 ) − pn (qn+1 ), is at most of the order 1 − δ. By (11), then (10), then (12), one has 1 vn (qn+1 ) < v(qn+1 ) ≤ pn+1 (qn+1 ) ≤ {pn (qn+1 ) − (1 − δ)c(qn+1 )}; δ hence (13)
vn (qn+1 ) − pn (qn+1 ) ≤
1−δ (v(qn+1 ) − c(qn+1 )) δ
Next, we argue that Fn (qn+1 ) is at most of the order 1 − δ. Substituting (10) into (12) yields δv(qn+1 ) ≤ pn (qn+1 ) − (1 − δ)c(qn+1 ) which then implies, using the first half of (11), (14)
vn (qn+1 ) ≥ δv(qn+1 ) + (1 − δ)c(qn+1 )
The intuition now goes as follows. If the probability Fn (qn+1 ) is nonnegligible, then the computation of vn (qn+1 ) must involve a significant fraction of low types, and vn (qn+1 ) is therefore bounded away from v(qn+1 ), which stands
in contradiction to (14). To verify formally this claim, by compute the highest value for Fn (qn+1 ) which is consistent with (14) and compute the value Ω of the ¯ find auxiliary infinite-dimensional linear problem (P ): given q1 < q ≤ 1 and v, the supremum of q P: f (t) dt q
over the set F of nondecreasing, left-continuous q q functions with values in [0 1/(1 − q1 )] and such that q v(t)f (t) dt ≥ v q f (t) dt. The analysis of (P ) 1 1 is standard. When endowed with the Levy distance, F is compact and the objective of (P ) is continuous; hence there is an optimal solution, f ∗ . Since v(·) − v is strictly increasing, the solution f ∗ must be of the form f ∗ (t) = 1 ∗ × 1/(1 − q1 ) for some q∗ . The location of q∗ is dictated by the constraint t>q q v(t) dt = v(q − q∗ ). Plugging the inequality v(t) ≤ v(q) − mv (q − t) for all q∗ t ∈ [q∗ q] into the constraint yields q − q∗ ≤
2 (v(q) − v) mv
We apply this upper bound with q = qn+1 and v¯ = δv(qn+1 ) + (1 − δ)c(qn+1 ), and get (15)
Ω≤
2(1 − δ) (v(qn+1 ) − c(qn+1 )) mv (1 − q1 )
Collecting (13) and (15) then yields πn∗ ≤
2 (1 − δ)2 (v(qn+1 ) − c(qn+1 ))2 δmv (1 − q1 ) Q.E.D.
as desired. B.3. Proof of Proposition 5
The intuition for the proof is as follows. A seller with type qn +1 < qn +1 ex1 2 pects to receive an acceptable offer at stage n2 at the latest. Thus, the difference n2 − n1 is directly linked to the discounted time at which a seller with type qn +1 expects to trade. In Lemma 3, we first provide a lower bound on D+ pn (q). 1 We then rely on the relation between D+ pn (qn +1 ) and the discounted time at 1 which type qn +1 trades. 1
LEMMA 3: For any buyer n and any serious type offer q > qn in Tn, one has D+pn(q) ≥ mv/2(1 − q1).

The proof of Lemma 3 uses the following technical inequality.

LEMMA 4: Let h : [q1, 1] → R+ be nondecreasing. Then, for any [a, b] ⊆ [q1, 1], one has
(16) h(b) [∫_a^b v(t)h(t) dt / ∫_a^b h(t) dt] + (mv/2) ∫_a^b h(t) dt ≤ v(b)h(b)
(with the convention 0/0 = 0).
(v(q) − pn (q))fn+ (q) − D+ pn (q)Fn (q) ≤ 0
On the other hand, since q ∈ Tn , one has πn (q) = πn∗ which, since q is a serious type offer, implies v(q) ≥ vn (q) ≥ pn (q). Plugging these inequalities into (17), one obtains fn+ (q)(v(q) − vn (q)) − D+ pn (q)Fn (q) ≤ 0 or, equivalently, q v(t)fn+ (t) dt q q
n fn+ (q)v(q) ≤ fn+ (q)
q q
fn+ (t) dt
+ D+ pn (q)
fn+ (t) dt
q
n
n
The result then follows by applying Lemma 4.
Q.E.D.
PROOF OF PROPOSITION 5: Recall from (6) that D+ pn (q) = c (q)(1 − + EσB∗ [δτn (q)−n ]), where τn+ (q) = inf{k > n : p˜ k > pk (q)}. Consider buyer n := n1 + and q := qn +1 > qn . Since q ∈ Tn and by Lemma 3, one has EσB∗ [δτn (q)−n ] ≤ 1 1 1 − mv /2Mc (1 − q1 ). On the other hand, τn+ (q) ≤ n2 PσB∗ -a.s. This implies δn2 −n1 ≤ 1 − mv /2Mc (1 − q1 ) and thus n2 − n 1 ≥
mv (1 − q1 ) 1 × 1−δ mv + 2Mc
60
J. HÖRNER AND N. VIEILLE
Q.E.D.
as desired.
For later use, we note that the very same argument, applied to n = 1 and to any serious offer q ∈ T1 ensures that EσB∗ [δτ(1) ] ≤ 1 − mv /2Mc (1 − q1 ). This yields the upper bound in Proposition 3. B.4. Proof of Proposition 6 We here prove Proposition 9 below. It corresponds to Proposition 6 except for the upper bound on N0 , which is established in the proof of Proposition 3. PROPOSITION 9: There is a stage N0 such that the following statements hold: P1. Tn ⊆ {qN 1} for all n ≥ N0 . 0 P2. max Tn < 1 for all n < N0 . In addition, πn∗ = 0 for all n ≥ N0 . PROOF: Define N := 1 + max{n : max Tn < 1 and Fn (1) ≥ ν/Mc } (where ν = min[q1 1] (v − c)). Since limn Fn (1) = 0, the stage N is well defined and either FN (1) < ν/Mc or max TN = 1. To start with, assume that the latter holds. We prove that P1 and P2 hold with N0 = N. Since 1 ∈ TN0 , one has πN0 (1) = FN0 (1)(vN0 (1) − c(1)) ≥ 0, and thus πn (1) ≥ 0 for all n ≥ N0 . We argue by contradiction and assume that Tn ∩ (qN 1) = ∅ 0 for some n ≥ N0 , and we call n the first such stage. One thus has vn+1 (1) > ∗ vn (1); hence πn+1 (1) > 0 so that πn+1 > 0 and thus qn+2 > qn+1 . This implies that buyer n + 1 makes a winning offer with probability 1. (Otherwise indeed, the equilibrium payoff of buyer n + 2 would also be positive—a contradiction.) Put otherwise, qn+2 = 1. By Lemma 2, buyer n makes no serious type offer in (qn qn+2 ) = (qN 1). This is the desired contradiction. Therefore, 0 Tn ⊆ {qN 1}, for all n ≥ N0 . 0 Assume now that FN (1) < ν/Mc and that q := maxn
n
PUBLIC VS. PRIVATE OFFERS
= π˜ n (q) +
1 1 − qn
61
q
(v(t) − c(q)) dt q
This is the payoff that would accrue to buyer n if he were the last buyer or, alternatively, if all buyers following n would only submit losing offers. Thus, π˜ n (q) ≥ πn (q) with equality if q = 1. The derivative of π˜ n is given by π˜ n (q) = ≥
1 (v(q) − c(q)) − c (q)(Fn (q) − Fn (q)) 1 − q1 ν − Mc Fn (1) > 0 1 − q1
Therefore, π˜ n is increasing over the interval [q 1], so that πn (1) > πn (q) for every q ∈ [q 1): buyer n makes no offer in (q 1). It remains to prove that πn∗ = 0 for all n ≥ N0 . It suffices to prove that ∗ πN0 = 0. Assume to the contrary that πN∗ 0 > 0. Then buyer N0 would make a winning offer with probability 1, for otherwise πN∗ 0 +1 would also be positive. Therefore, by Lemma 2, buyer N0 − 1 would make no serious type offer in (qN −1 qN +1 ) = (qN −1 1). This would imply that buyer N0 − 1’s equilibrium 0 0 0 payoff is also positive—a contradiction. Q.E.D. There may be several (consecutive) stages consistent with P1 and P2. Without further notice, we choose N0 to be the first of those stages. B.5. Proof of Proposition 7 For convenience, we recall the statement of Proposition 7. PROPOSITION 10: All buyers n < N0 submit a serious offer with positive probability. No buyer n < N0 uses a pure strategy, with the possible exception of the first buyer. PROOF: By definition of N0 , buyer N0 − 1 makes a serious, nonwinning offer with positive probability. We start with the first statement. We argue by contradiction and assume that Tn = {qn } for some n < N0 . (In particular, πn∗ = 0.) Let n∗ > n be the first buyer following n who submits a serious offer with positive probability. Since n∗ ≤ N0 − 1, one has qn∗ := max Tn∗ < 1. Because of discounting and using (5) inductively, one has pn (qn∗ ) − c(qn∗ ) = δn∗ −n (pn∗ (qn∗ ) − c(qn∗ )); hence pn (qn∗ ) < pn∗ (qn∗ ).
62
J. HÖRNER AND N. VIEILLE
Since buyers n ≤ k < n∗ only submit losing offers, the distribution of types faced by buyers n and n∗ is the same, and vn (qn∗ ) = vn∗ (qn∗ ). It follows that πn (qn∗ ) > πn∗ (qn∗ ) = πn∗∗ ≥ 0—a contradiction. This concludes the proof of the first statement. Consider now an arbitrary buyer n, where 1 < n < N0 . If buyer n assigns probability 1 to a specific type offer, it must be to a serious type offer qn > qn , and then qn+1 = qn . On the other hand, pn−1 (qn ) − c(qn ) = δ(pn (qn ) − c(qn )); ∗ = 0 and buyer n − 1 makes no type hence pn−1 (qn ) < pn (qn ). By Lemma 2, πn−1 offer in (qn−1 qn ); hence vn−1 (qn ) = vn (qn ). As above, this implies πn−1 (qn ) > 0—a contradiction. Q.E.D. B.6. Equilibrium Delay Recall that τ(1) = inf{n : p˜ n = c(1)} is the first buyer who submits a winning offer. Recall that we proved that EσB∗ [δτ(1) ] ≤ 1 − mv /2Mc (1 − q1 ). We here proceed to provide a lower bound for EσB∗ [δτ(1) ], which is independent of δ and of the equilibrium σ ∗ . Let N0 be given by Proposition 9 and define N1 := inf{n : Fn (1) ≤ ν/Mc }. We proceed in three steps: Step 1. One has N1 ≤ C1 /(1 − δ), where C1 is independent of δ ≥ δ¯ and of the equilibrium σ ∗ . Let q := maxn
0 is independent of δ ≥ δ¯ and of the equilibrium σ ∗ . By Steps 1–3, one thus has EσB∗ [δτ(1) ] ≥ C2 × (δ1/(1−δ) )C0 +C1 , and the result ¯ ¯ follows since δ1/(1−δ) ≥ δ¯ 1/(1−δ) for every δ ≥ δ. Steps 1 and 2 make use of the following technical result, stated without proof, which links the discounted expectation of a random variable (hereafter, r.v.) to its tail distribution. LEMMA 5: Let τ be a random time with integer values and such that (18)
E[δτ−n |τ > n] ≥ a
for some a > 0 and all n ≥ 1. Then P(τ ≥
1 ab(1−δ)
) ≤ b for all b > 0.
Our computation of C1 and C0 involves three parameters. We choose φ ψ ψ−φ ). Next, we set K := 1 + (Mv (1 − such that 0 < φ < ψ < ν and η ∈ (0 2v(1)
63
PUBLIC VS. PRIVATE OFFERS
¯ − c(q1 )). We express C1 and C0 q1 ))/(2(ν − ψ)), ε := 12 ηK , and a := φδ/(c(1) as functions of these constants. We make no attempt to optimize the choice of φ ψ, and η. Step 1. Define C1 := 1 + (ln(Mv /ν)) a1 ε12 . We prove that N1 ≤ C1 /(1 − δ). For a given stage n, we let qn := max{q ∈ [q 1] : vn (q) > c(q) + φ}. Since vn (1) ≤ c(1), one has qn < 1. On the other hand, since vn (qn ) = v(qn ), one also has qn > qn . The proof is organized as follows. In Claim 1 below, we first prove that conditional on the seller having rejected all previous offers, the probability that the seller’s type does not exceed qn is bounded away from zero. That is, from buyer n’s viewpoint, types below qn have a significant probability. CLAIM 1: One has Fn (qn ) ≥ 2εFn (1). Next, we observe that, since vn (qn ) is bounded away from c(qn ), the offer pn (qn ) with independent type qn is also bounded away from c(qn ). Therefore, it is likely that type qn receives acceptable offers shortly after stage n (for otherwise, he would be willing to accept a price close to c(qn )). In Claim 2, we use this insight to prove that conditional on the seller having rejected all previous offers, it is likely that type qn accepts an offer within 1/aε(1 − δ) additional stages. CLAIM 2: One has Fn+Naε (qn ) < εFn (1), where Naε := the proof of Lemma 5.
1 aε(1−δ)
is defined as in
The assertion of Step 1 immediately follows from Claims 1 and 2. Indeed, observe that, for a given q, Fn (1)−Fn (q) is the probability that the seller rejects all offers from buyers 1 2 n − 1 and has a type t in [q 1]. This difference is nonincreasing in n; hence Fn+Naε (1) − Fn+Naε (qn ) ≤ Fn (1) − Fn (qn ) By Claims 1 and 2, this yields Fn+Naε (1) ≤ (1 − ε)Fn (1) and thus also F1+iNaε (1) ≤ (1 − ε)i for all i ≥ 1. In particular, Fn (1) < ν/Mc as soon as n ≥ 1 + C1 /(1 − δ), as desired. PROOF OF CLAIM 1: We introduce an auxiliary sequence of types which is defined by y0 = q1 and yj+1 = max q ∈ [0 1] : E v(t)|t ∈ [yj q] ≥ c(q) + ψ
64
J. HÖRNER AND N. VIEILLE
until yJ = 1. In particular, E[v(t)|t ∈ [yj yj+1 ]] = c(yj+1 ) + ψ for j < J − 1. On the other hand, observe that yj+1 1 E v(t)|t ∈ [yj yj+1 ] = v(t) dt yj+1 − yj yj ≥ v(yj+1 ) −
Mv (yj+1 − yj ) 2
Since v(yj+1 ) − c(yj+1 ) ≥ ν, this implies that yj+1 − yj ≥
2(ν − ψ) Mv
for j < J − 1
hence J ≤ K. For a given stage n, let jn := min{j : Fn (yj ) ≥ ηK−j Fn (1)}. We now check that qn ≥ yjn , which yields Fn (qn ) ≥ Fn (yjn ) ≥ 2εFn (1), as desired. There is nothing to prove if jn = 0; hence assume jn > 0. By definition of jn , one has Fn (yjn −1 ) < ηFn (yjn ): conditional on t ≤ yjn , it is very likely that buyer n faces a type in [yjn −1 yjn ]. Hence23
EFn v(t)|t ≤ yjn − EFn v(t)|t ∈ yjn −1 yjn ≤ 2ηv(1) (19) since |E(X) − E(X1A )| ≤ 2P(A) sup |X| for every bounded r.v. X and every event A. The first expectation in (19) is vn (yjn ), while the second one is at least c(yjn ) + ψ. Therefore, vn (yjn ) ≥ c(yjn ) + φ and thus qn ≥ yjn . Q.E.D. PROOF OF CLAIM 2: For clarity, we abbreviate qn to q. Recall that τn (q) = inf{m > n : p˜ m ≥ pm (q)} denotes the first buyer after n who submits an offer that is acceptable to type q. For any given stage m ≥ n and since pm (q)−c(q) = EσB∗ [δτn (q)−m (p˜ τn (q) − c(q))|τn (q) > m], one has (20)
pm (q) − c(q) ≤ (c(1) − c(q))EσB∗ δτn (q)−m |τn (q) > m
On the other hand, pm (q) = vm (q) ≥ vn (q) if πm∗ = 0, and then pm (q) − c(q) ≥ ∗ φ, while pm (q) − c(q) ≥ δ(pm+1 (q) − c(q)) ≥ δφ if πm∗ > 0, since then πm+1 = 0. Using (20), this implies EσB∗ δτn (q)−m |τn (q) > m ≥
δφ ≥ a c(1) − c(q)
Apply now Lemma 5 to obtain PσB∗ (τ(q) ≥ n + Naε |τ(q) ≥ n) ≤ ε. 23
Denote by EFn the expectation under the belief held by buyer n.
PUBLIC VS. PRIVATE OFFERS
65
Finally, observe that

P_{σ_B^∗}(τ(q) ≥ n + N_{aε}) ≥ P_{σ_B^∗}(τ(q) ≥ n + N_{aε}, t ∈ [q_1, q]) ≥ P_{σ_B^∗}(τ(t) ≥ n + N_{aε}, t ∈ [q_1, q]) = F_{n+N_{aε}}(q),

whereas

P_{σ_B^∗}(τ(q) ≥ n) = P_{σ_B^∗}(τ(q) ≥ n, t ∈ [q_1, 1]) ≤ P_{σ_B^∗}(τ(1) ≥ n, t ∈ [q_1, 1]) = F_n(1).

Therefore,

F_{n+N_{aε}}(q)/F_n(1) ≤ P_{σ_B^∗}(τ(q) ≥ n + N_{aε} | τ(q) ≥ n) ≤ ε,

as desired.  Q.E.D.
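As a numerical illustration of the counting behind Step 1 (an editorial sketch, not part of the paper; the values of ε, a, δ, M_c, M_v, and ν below are arbitrary), the following Python snippet iterates the contraction F_{1+iN_{aε}}(1) ≤ (1 − ε)^i and reports the first stage at which the bound falls below ν/M_c, which for these values occurs within 1 + C_1/(1 − δ) stages as claimed.

```python
import math

# Hypothetical parameter values (for illustration only).
eps, a, delta = 0.05, 0.2, 0.9
M_c, M_v, nu = 2.0, 2.0, 0.1

N_ae = 1.0 / (a * eps * (1 - delta))                    # block length N_{a eps}
C_1 = 1 + math.log(M_v / nu) * (1 / a) * (1 / eps**2)   # constant from Step 1

# Iterate the contraction F_{1+i*N_ae}(1) <= (1 - eps)^i until the bound
# drops below nu / M_c, and record the corresponding stage.
i = 0
while (1 - eps) ** i >= nu / M_c:
    i += 1
stage = 1 + i * N_ae

print(f"bound drops below nu/M_c by stage {stage:.0f}")
print(f"claimed bound 1 + C_1/(1 - delta) = {1 + C_1 / (1 - delta):.0f}")
assert stage <= 1 + C_1 / (1 - delta)
```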
Step 2. Define C_0 := (2M_c/((1 − q_1)m_v)) × (1/a)(1/ε²). We prove that N_0 − N_1 ≤ C_0/(1 − δ). Denote q^∗ := max_n q̄_n. Since buyer N_0 − 1 makes a serious, nonwinning offer with positive probability, one has v_{N_0−1}(1) ≥ c(q^∗); thus N_0 − N_1 is bounded by the time it takes for v_n(1) to increase from c(q^∗) to c(1). Between stages N_1 and N_0, and using the proof of Proposition 9, no buyer ever submits a serious offer beyond q^∗. Hence, v_n(1) increases steadily with time, at a speed which is related to the probability with which successive buyers do trade. Lemma 6 below provides a precise estimate of this relationship.

LEMMA 6: Let n < m ≤ N_0 be any two stages and denote by π_{nm} := (F_n(1) − F_m(1))/F_n(1) the probability that the seller accepts an offer from some buyer n, n + 1, ..., m − 1, conditional on having declined all previous offers. Then

(21)  v_m(1) − v_n(1) ≥ (m_v(1 − q_1)/2) × (1 − q^∗) × π_{nm}/F_m(1).

The proof of Lemma 6 is tedious and somewhat lengthy. It is postponed to the end of the section. By Lemma 5, one has, as in Step 1, π_{n,n+N_{aε}} ≥ ε. By
Lemma 6, one thus has

v_{n+N_{aε}}(1) − v_n(1) ≥ (m_v(1 − q_1)/2) × (1 − q^∗)ε ≥ (m_v(1 − q_1)/2) × ((c(1) − c(q^∗))/M_c) ε.

Hence, in any block of N_{aε} consecutive stages before N_0, the average quality increases by at least (m_v(1 − q_1)ε)/(2M_c) times c(1) − c(q^∗). In particular, it takes no more than 2M_c/(m_v(1 − q_1)ε) such blocks for v_n(1) to increase from c(q^∗) to c(1). The result follows.

Step 3. In light of the results obtained so far, this last step is straightforward. Observe first that p_{N_0}(q_{N_0}) ≥ v(q_{N_0}) ≥ c(q_{N_0}) + ν, for otherwise buyer N_0 would get a positive payoff when submitting an offer slightly above the reservation price of type q_{N_0}. Since no buyer n ≥ N_0 ever submits a serious offer below 1, one has

p_{N_0}(q_{N_0}) − c(q_{N_0}) = (c(1) − c(q_{N_0})) E_{σ_B^∗}[δ^{τ(1)−N_0}],

which yields

E_{σ_B^∗}[δ^{τ(1)−N_0}] ≥ ν/(c(1) − c(q_{N_0})).

This concludes the proof of Lemma 5.  Q.E.D.
PROOF OF LEMMA 6: Fix the distribution f_n of types faced by buyer n and the value of π_{nm}. We minimize v_m(1) over all distributions of types that buyer m may possibly be facing. It is convenient to parameterize such distributions by g(t), the probability that a seller with type t would reject all offers from buyers n, n + 1, ..., m − 1, so that f_m(t) = g(t)f_n(t).

Hence, v_m(1) is minimal when ∫_{q_1}^{1} g(t)f_n(t)v(t) dt is minimal. The minimum is computed over all nondecreasing functions g, with values in [0, 1] and such that
(i) g(t) = 1 over [q^∗, 1] (since there is no serious offer beyond q^∗);
(ii) ∫_{q_1}^{1} g(t)f_n(t) dt = (1 − π_{nm})F_n(1).
Since v is increasing, the minimum is obtained when g is constant over the interval [q_1, q^∗], that is, g(t) = ω if t < q^∗ and g(t) = 1 if t ≥ q^∗. The value of ω is deduced from (ii) and is given by (1 − ω)F_n(q^∗) = π_{nm}F_n(1). Thus,

(22)  v_m(1) − v_n(1) ≥ (1/F_m(1)) ∫_{q_1}^{1} v(t)g(t)f_n(t) dt − (1/F_n(1)) ∫_{q_1}^{1} v(t)f_n(t) dt.
The rest of the proof consists of showing that the right-hand side of (22) is at least equal to the right-hand side in (21). Plugging g into (22), one has

v_m(1) − v_n(1)
 ≥ (1/F_m(1)) [∫_{q_1}^{1} v(t)f_n(t) dt − (1 − ω) ∫_{q_1}^{q^∗} v(t)f_n(t) dt] − v_n(1)
 = (π_{nm}F_n(1)/F_m(1)) [(1/F_n(1)) ∫_{q_1}^{1} v(t)f_n(t) dt − ((1 − ω)F_n(q^∗)/(π_{nm}F_n(1))) (1/F_n(q^∗)) ∫_{q_1}^{q^∗} v(t)f_n(t) dt]
 = (π_{nm}/(F_m(1)F_n(q^∗))) [F_n(q^∗) ∫_{q_1}^{1} v(t)f_n(t) dt − F_n(1) ∫_{q_1}^{q^∗} v(t)f_n(t) dt]
 = (π_{nm}/(F_m(1)F_n(q^∗))) [F_n(q^∗) ∫_{q^∗}^{1} v(t)f_n(t) dt − (1 − q^∗) ∫_{q_1}^{q^∗} v(t)f_n(t) dt],
where the first equality follows from the identity a/b − a′/b′ = ((b − b′)/b)((a − a′)/(b − b′) − a′/b′), the second follows from the value of ω, and the third follows from F_n(1) = F_n(q^∗) + (1 − q^∗). We now use the inequality v(t) ≥ v(q^∗) + m_v(t − q^∗) for t ∈ [q^∗, 1] to bound the first integral, and the inequality v(t) ≤ v(q^∗) + m_v(t − q^∗) for t ∈ [q_1, q^∗] to bound the second integral. After simplification, this yields

(23)  v_m(1) − v_n(1) ≥ (π_{nm}(1 − q^∗)m_v/(F_m(1)F_n(q^∗))) × [((1 − q^∗)/2) F_n(q^∗) + ∫_{q_1}^{q^∗} (q^∗ − t)f_n(t) dt].
Consider finally the right-hand side of (23). For a given value of F_n(q^∗), the integral is minimized when f_n is constant over [q_1, q^∗] and equal to F_n(q^∗)/(q^∗ − q_1). The integral is then equal to (1/2)(q^∗ − q_1)F_n(q^∗). Substituting into (23), this yields

v_m(1) − v_n(1) ≥ (π_{nm}(1 − q^∗)m_v/F_m(1)) × (1 − q_1)/2,

as desired.  Q.E.D.
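The algebraic identity invoked in the first equality of the chain above is easy to misread in print; the following display is a short editorial verification (not part of the original text):

```latex
% Editor's check of the identity used in the proof of Lemma 6.
% Claim: a/b - a'/b' = ((b - b')/b) * ( (a - a')/(b - b') - a'/b' ).
\[
\frac{b-b'}{b}\left(\frac{a-a'}{b-b'}-\frac{a'}{b'}\right)
 = \frac{a-a'}{b}-\frac{a'(b-b')}{b\,b'}
 = \frac{a-a'}{b}-\frac{a'}{b'}+\frac{a'}{b}
 = \frac{a}{b}-\frac{a'}{b'}.
\]
```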
Dept. of Economics, Yale University, 30 Hillhouse Avenue, New Haven, CT 06520-8281, U.S.A.; [email protected]
and Département Economie et Finance, HEC Paris, 78351 Jouy-en-Josas, France; [email protected]. Manuscript received January, 2007; final revision received May, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 71–92
ALL-PAY CONTESTS BY RON SIEGEL1 This paper studies a class of games, “all-pay contests,” which capture general asymmetries and sunk investments inherent in scenarios such as lobbying, competition for market power, labor-market tournaments, and R&D races. Players compete for one of several identical prizes by choosing a score. Conditional on winning or losing, it is weakly better to do so with a lower score. This formulation allows for differing production technologies, costs of capital, prior investments, attitudes toward risk, and conditional and unconditional investments, among others. I provide a closed-form formula for players’ equilibrium payoffs and analyze player participation. A special case of contests is multiprize, complete-information all-pay auctions. KEYWORDS: All-pay, contests, auctions, rent-seeking, lobbying.
1. INTRODUCTION IN MANY SETTINGS, economic agents compete by making irreversible investments before the outcome of the competition is known. Lobbying activities, research and development races, and competitions for promotions, to name a few, all have this property. This type of competition has been widely studied in the literature. In the classic all-pay auction with complete information (henceforth: all-pay auction), for example, rivals incur a cost of bidding that is the same whether they win or lose, but may differ in their valuation for winning the single prize. The all-pay auction has been used to model rent-seeking and lobbying activities (Hillman and Samet (1987), Hillman and Riley (1989), Baye, Kovenock, and de Vries (1993)), competitions for a monopoly position (Ellingsen (1991)), waiting in line (Clark and Riis (1998)), sales (Varian (1980)), and R&D races (Dasgupta (1986)). Variations of the all-pay auction have been used to model competitions for multiple prizes (Clark and Riis (1998) and Barut and Kovenock (1998)), the effect of lobbying caps (Che and Gale (1998, 2006) and Kaplan and Wettstein (2006)), and R&D races with endogenous prizes (Che and Gale (2003)). While this literature has produced interesting results, the models considered are often restrictive in some or all of the following dimensions: the types of asymmetries across players, the number of prizes, the number of players, and the degree of irreversibility of the investments. 1
I am indebted to Ed Lazear and Ilya Segal for their continuous guidance and encouragement. I thank Jeremy Bulow for many beneficial discussions. I thank Larry Samuelson and four referees for very detailed comments. I thank Julio González-Díaz, Matthew Jackson, Sunil Kumar, Jonathan Levin, Stephen Morris, Marco Ottaviani, Kareen Rozen, Yuval Salant, Andrzej Skrzypacz, Balázs Szentes, Michael Whinston, Robert Wilson, Eyal Winter, and seminar participants at Chicago GSB, CSIO-IDEI, Cornell, Hebrew University, Northwestern, NYU Stern, Penn State, Stanford, Tel-Aviv, Toronto, and Yale for helpful comments and suggestions. © 2009 The Econometric Society
DOI: 10.3982/ECTA7537
The goal of this paper is to better understand competitions in which contestants are asymmetrically positioned and make irreversible investments. To this end, I investigate all-pay contests (henceforth: contests). In a contest, each player chooses a costly “score” and the players with the highest scores obtain one prize each (relevant ties can be resolved using any tie-breaking rule). Thus, ex post, each player can be in one of two states: winning or losing. Conditional on winning or losing, a player’s payoff decreases weakly and continuously with his chosen score; choosing a higher score entails a player-specific cost, which may differ across the two states. The primitives of the contest are commonly known. This captures players’ knowledge of the asymmetries among them. Consequently, equilibrium payoffs represent players’ “economic rents,” in contrast to “information rents” that arise in models of competition with private information.2 Contests are defined in Section 2. The generality of players’ cost functions allows for differing production technologies, costs of capital, and prior investments, among others. Moreover, contests allow for nonordered cost functions. These arise when different competitors are disadvantaged relative to others in different regions of the competition (see the example in Section 1.1 below). In addition, state-dependent costs accommodate both sunk and conditional investments, player-specific risk attitudes, and player- and score-dependent valuations for a prize.3 When all investments are unconditional, each player is characterized by his valuation for a prize, which is the payoff difference between the two states, and a weakly increasing, continuous cost function that determines his cost of choosing a score independently of the state. I refer to such contests as separable contests. Separable contests nest many models of competition that assume a deterministic relation between effort and prize allocation.4 For example, the models of Che and Gale (2006) and Kaplan and Wettstein (2006) are two-player, single-prize separable contests.5 Single-prize and multiprize all-pay 2
Such models typically assume ex ante identical players (for example, Moldovanu and Sela (2001, 2006) and Kaplan, Luski, Sela, and Wettstein (2002)). An exception is the work by Parreiras and Rubinchik (2006), who allowed for asymmetry between players and provided a partial characterization of equilibrium. 3 González-Díaz (2007) also allowed for nonlinear state-dependent costs, but accommodated only a single prize and ordered costs. His techniques and results do not generalize to nonordered costs or multiple prizes. 4 Other models assume a probabilistic relation between players’ efforts and prize allocation. The classic examples are Tullock’s (1980) “lottery model,” in which players are symmetric and each player’s probability of winning the single prize is proportional to the player’s share of the total expenditures, and Lazear and Rosen’s (1981) two-player tournaments. Recent contributions to this large, single-prize literature that accommodate a degree of asymmetry include Cornes and Hartley (2005) and Szymanski and Valletti (2005). The analysis of probabilistic models typically focuses on pure-strategy equilibria by using first-order conditions. Functional form assumptions are made “[. . . ] to ensure the existence of pure-strategy equilibria and first-order conditions characterizing these equilibria” (Szymanski and Valletti (2005)). 5 Although Che and Gale (2006) have N ≥ 2 players, they assumed strictly ordered costs so only two players participate in equilibrium.
auctions are separable contests with linear costs, in which asymmetries among players are captured only by differences in valuations for a prize. Section 3 begins the analysis by identifying an unambiguous ranking of players. Such a ranking is not immediately obvious, because players’ costs may not be ordered. Players are ranked in decreasing order of their reach, where a player’s reach is the highest score he can choose without obtaining a negative payoff if he wins a prize with certainty. The key result of the paper is Theorem 1, which provides a full characterization of players’ expected equilibrium payoffs in a “generic” contest for m prizes. Expected payoffs are determined by reaches, powers, and the threshold. The threshold is the reach of player m + 1. A player’s power equals his payoff from winning a prize with certainty when choosing a score equal to the threshold. Theorem 1 shows that under “generic” conditions—checked using players’ powers—each player’s expected payoff equals the higher of his power and zero. Thus, a generic contest has the same payoffs in all equilibria. Theorem 1 also shows that the number of players who obtain strictly positive expected payoffs equals the number of prizes. The derivation of the payoff characterization does not rely on solving for an equilibrium. The payoff result implies that a player’s expected payoff does not depend on his cost when he loses. For example, a player’s expected payoff in a modified complete-information first-price auction in which he pays a strictly positive fraction of his bid if he loses does not depend on the size of the fraction. When the fraction equals 1 for all players, we have an all-pay auction. Section 3.2 discusses contests that are not generic. In such contests, players’ payoffs in at least one equilibrium are specified by the payoff characterization, but payoffs in other equilibria may be different. Perturbing a contest that is not generic leads to a generic contest. Section 4 examines equilibrium participation. Participation is related to the ordering of players’ costs. Theorem 2 shows that when players’ costs (appropriately normalized) are strictly ordered, at most m + 1 players participate in any equilibrium. This is why precisely m + 1 players participate in an all-pay auction with distinct valuations. In contrast, nonordered costs may lead more than two players to participate even when there is only one prize.6 Section 5 concludes by briefly discussing implications of the analysis for rent dissipation and comparative statics. The Appendix contains an example of a contest with multiple equilibria, the proofs of Corollary 1 and Theorem 2, and a technical lemma. 6 When more than two players participate, general asymmetric costs significantly complicate equilibrium analysis. For this reason, a noncrossing property is the standard assumption in the literature. The few papers that analyze mixed-strategy equilibria with more than two participants make assumptions, in addition to noncrossing, that lead to limited asymmetry among participating players (see Baye, Kovenock, and de Vries (1996), Clark and Riis (1998), and González-Díaz (2007), who showed that the analysis of Baye, Kovenock, and de Vries (1996) generalizes to nonlinear costs).
I begin with an example that illustrates the payoff result and some of its implications. 1.1. An Example Three risk-neutral firms compete for one monopoly position allocated by a government official. Each firm chooses how much to invest in lobbying activities, and this investment leads to a score, which can be interpreted as the amount of influence the firm has achieved over the government official. The firm with the highest score obtains the monopoly position, which carries a monetary value of 1, but all firms pay the lobbying costs associated with their respective scores. Firms 1 and 2 have better “lobbying technologies” than firm 3, because they have better lobbyists, are located closer to the government official, or have lower costs of capital. Firm 3 has an initial advantage, due to prior investments or reputation. Figure 1 depicts players’ cost functions, where player i ∈ {1 2 3} corresponds to firm i. Firm 3’s initial advantage is captured by its initial marginal cost, γ ≥ 0, which is low relative to the other firms’ marginal costs. In contrast, firm 3’s cost for high scores is high relative to those of the other two firms. Thus, players’ costs are not ordered. Cost functions are commonly known, and relevant ties are broken randomly. The costs of choosing 1 for players 1, 2, and 3 are K <1, 1, and L > 1, respectively. Consequently, players 2 and 3 would never choose a score higher than 1, since that would cost them more than the value of the prize. Player 1 can therefore guarantee himself a payoff arbitrarily close to 1 − K by choosing a score slightly higher than 1, so 1 − K is a lower bound on his expected payoff in any equilibrium. Player 1 does not, however, choose scores greater or equal to 1 with certainty in equilibrium, since players 2 and 3 would best-reply by choosing scores lower than 1, in which case player 1 would be better off choosing a lower score. In fact, player 1 must employ a mixed strategy in any equilibrium. It may therefore
FIGURE 1.—Players’ costs.
seem plausible that such a strategy could give player 1 an equilibrium payoff higher than 1 − K. Theorem 1 shows that the expected equilibrium payoff of player 1 is exactly 1 − K. Similarly, players 2 and 3 can guarantee themselves no more than 0; the payoff characterization shows that this is exactly their equilibrium payoff. This implies that aggregate equilibrium expenditures equal K. For low, strictly positive values of γ, all three players must participate (invest) in equilibrium. This is shown in Section 4. Each player contributes to aggregate expenditures and wins the prize with strictly positive probability. This participation behavior results from the nonordered nature of players’ cost functions.7 Now consider a variant of the contest, in which player 3 has 0 marginal cost up to a score whose cost for player 1 is at least 1. This represents a very large initial advantage for player 3. In this case, there is a pure-strategy equilibrium in which player 3 wins with certainty and no player invests. Thus, it may be that no player invests in a contest for a valuable prize. Regardless of the value of γ, precisely one player receives a strictly positive expected payoff. This too follows from the payoff characterization, since there is only one prize. Section 5 discusses the effects of changes in competition structure. For example, the addition of player 3 from Figure 1 to a contest that includes only players 1 and 2 changes neither expected payoffs nor expected aggregate expenditures, but changes individual expenditures for low, strictly positive values of γ, because all three players participate. Thus, the addition of a player may change equilibrium behavior without changing players’ payoffs or aggregate expenditures. In contrast, lowering the prize’s value can lead to a positive payoff for player 3, making him the only player who obtains a positive expected payoff. 2. THE MODEL In a contest, n players compete for m homogeneous prizes, 0 < m < n. The set of players {1 n} is denoted by N. Players compete by each choosing a score simultaneously and independently. Player i chooses a score si ∈ Si = [ai ∞), where ai ≥ 0 is his initial score. Positive initial scores capture starting advantages, or “head starts,” without allowing players to choose lower scores. This may eliminate equilibria involving weakly dominated strategies (see Example 4 in Siegel (2007)). Each of the m players with the highest scores wins one prize. In case of a relevant tie, any procedure may be used to allocate the tie-related prizes among the tied players. 7 Unlike in the all-pay auctions of Baye, Kovenock, and de Vries (1996), participation by more than two players in a contest for one prize does not rely on players’ valuations being identical.
Player i has preferences over lotteries whose outcomes are pairs (s_i, W_i), where s_i is the player's score and W_i indicates whether he obtains a prize (W_i = 1) or not (W_i = 0). These preferences are represented by a Bernoulli utility function. Because W_i equals 0 or 1, this function can be written as W_i v_i(s_i) − (1 − W_i)c_i(s_i), where v_i : S_i → R is player i's valuation for winning and c_i : S_i → R is player i's cost of losing. The primitives of the contest are commonly known.

Thus, given a profile of scores s = (s_1, ..., s_n), s_i ∈ S_i, player i's payoff is

u_i(s) = P_i(s)v_i(s_i) − (1 − P_i(s))c_i(s_i),

where P_i : ∏_{j∈N} S_j → [0, 1], player i's probability of winning, satisfies

P_i(s) = 0 if s_j > s_i for m or more players j ≠ i;  1 if s_j < s_i for n − m or more players j ≠ i;  any value in [0, 1] otherwise,

such that ∑_{j=1}^{n} P_j(s) = m. Note that a player's probability of winning depends on all players' scores, but his valuation for winning and cost of losing depend only on his chosen score. I make the following assumptions.

ASSUMPTION A1: v_i and −c_i are continuous and nonincreasing.

ASSUMPTION A2: v_i(a_i) > 0 and lim_{s_i→∞} v_i(s_i) < c_i(a_i) = 0.

ASSUMPTION A3: c_i(s_i) > 0 if v_i(s_i) = 0.

Assumption A1 means that, conditional on winning or losing, a lower score is weakly preferable.8 This represents an "all-pay" component. Assumption A2 means that with the initial score, winning is better than losing, so prizes are valuable, but losing with the initial score is preferable to winning with sufficiently high scores. Assumption A3 means that if winning with score s_i is as good as losing with the initial score, then winning with score s_i is strictly better than losing with score s_i. This last condition stresses the all-pay nature of contests. It is not satisfied by complete-information first-price auctions, for example, since a player pays nothing if he loses and is therefore indifferent between losing and winning with a bid that equals his valuation for the prize. But the condition is met when an all-pay element is introduced; for example, when

8 The "nonincreasing" part of Assumption A1 can be relaxed to requiring that (i) v_i be nonincreasing where it is nonnegative, that is, if y > x, then v_i(x) ≥ 0 and v_i(y) ≥ 0 imply v_i(x) ≥ v_i(y), (ii) v_i cross 0 once from above, that is, if y > x, v_i(x) < 0 implies v_i(y) < 0, and (iii) c_i ≥ 0 and if y > x, then c_i(x) > 0 implies c_i(y) > 0.
every bidder pays some strictly positive fraction of his bid whether he wins or not, and only the winner pays the balance of his bid. The formulation allows the difference between a player’s valuation for winning and his cost of losing to depend on his chosen score. For example, in a competition for promotions in which a higher score is achieved by investing in managerial skills, such skills are costly to acquire and may increase the value associated with a promotion. Alternatively, it may be that the value of the prize for player i is fixed at Vi but some costs are only borne if the player wins, so his costs are ciW when he wins and ciL when he loses. In this case, vi (si ) = Vi − ciW (si ) and ci = ciL , so ui (s) = Pi (s)(Vi − ciW (si )) − (1 − P(s))ciL (si ). When thinking about the outcome of a contest in monetary terms, contests can capture players’ risk attitudes. In the previous setting, for example, we can let vi (si ) = f (Vi − ciW (si )) and ci = f (ciL ) for some strictly increasing f such that f (0) = 0. One subclass of contests that is of particular interest is separable contests. In a separable contest, every player i’s preferences over lotteries with outcomes (si Wi ) depend only on the marginal distributions of the lotteries.9 This implies that the effect of winning or losing on a player’s Bernoulli utility is additively separable from that of the score. That is, vi (si ) = Vi − ci (si ) and ui (s) = Pi (s)Vi − ci (si ) for Vi = vi (ai ) > 0.10 If we interpret payoffs as money, the value ci (si ) could be thought of as player i’s cost of choosing score si , which does not depend on whether he wins or loses, and Vi could be thought of as player i’s valuation for a prize, which does not depend on his chosen score. All expenditures are unconditional, and players are risk neutral. The example of Section 1.1 depicts a separable contest with three players and nonlinear costs. The lobbying games of Kaplan and Wettstein (2006) and Che and Gale (2006) are two-player separable contests. Separable contests with linear costs are the single- and multi-prize complete-information all-pay auctions (Hillman and Samet (1987), Hillman and Riley (1989), Clark and Riis (1998)).11
9 For example, in a separable contest, player i is indifferent between a lottery in which (a_i, 1) occurs with probability 1/2 and (s_i, 0) occurs with probability 1/2, and a lottery in which (a_i, 0) occurs with probability 1/2 and (s_i, 1) occurs with probability 1/2 (for any s_i ≥ a_i), because the marginal distribution of the first coordinate is the same in both lotteries, and the marginal distribution of the second coordinate is the same in both lotteries.
10 Indeed, if player i's Bernoulli utility is not additively separable, then v_i(a_i) + c_i(a_i) ≠ v_i(s_i) + c_i(s_i) for some s_i > a_i. But then the player is not indifferent between the two lotteries described in the previous footnote, even though their marginal distributions are the same. The converse is also true: if player i's Bernoulli utility is additively separable, it is immediate that his preferences over lotteries with outcomes (s_i, W_i) depend only on the marginal distributions of the lotteries.
11 Formally, v_i(s_i) = V_i − s_i, c_i(s_i) = s_i, a_i = 0, and ties are resolved by randomizing uniformly, where V_i is bidder i's valuation for a prize.
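As a concrete rendering of the model just defined, here is a small sketch (an editorial illustration, not the author's code; the function and variable names are my own) that computes payoffs u_i(s) = P_i(s)v_i(s_i) − (1 − P_i(s))c_i(s_i) for a given profile of scores, with the m highest scores winning and relevant ties broken uniformly at random.

```python
import random

def contest_payoffs(scores, m, v, c, rng=random.Random(0)):
    """Payoffs for one realized tie-break: the m highest scores win one prize
    each, and a relevant tie at the m-th highest score is broken uniformly."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    cutoff = scores[order[m - 1]]                     # m-th highest score
    sure_winners = [i for i in range(n) if scores[i] > cutoff]
    tied = [i for i in range(n) if scores[i] == cutoff]
    winners = set(sure_winners + rng.sample(tied, m - len(sure_winners)))
    return [v[i](scores[i]) if i in winners else -c[i](scores[i]) for i in range(n)]

# Example: a 3-player separable contest for one prize of value 1 with linear costs.
v = [lambda s: 1 - s] * 3
c = [lambda s: s] * 3
print(contest_payoffs([0.4, 0.4, 0.1], m=1, v=v, c=c))
```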
3. PAYOFF CHARACTERIZATION The following concepts are key in analyzing the payoffs of players in equilibrium. DEFINITIONS: (i) Player i’s reach ri is the highest score at which his valuation for winning is 0. That is, ri = max{si ∈ Si |vi (si ) = 0}. Re-index players in (any) decreasing order of their reach, so that r1 ≥ r2 ≥ · · · ≥ rn . (ii) Player m + 1 is the marginal player. (iii) The threshold T of the contest is the reach of the marginal player: T = rm+1 . (iv) Players i’s power wi is his valuation for winning at the threshold. That is, wi = vi (max{ai T }). In particular, the marginal player’s power is 0. In a separable contest, a player’s reach is the highest score he can choose by expending no more than his valuation for a prize. In the example of Section 1.1, players are indexed in decreasing order of their reach, player 2 is the marginal player, and the threshold is 1. Player 1’s power is 1 − K > 0, player 2’s power is 0, and player 3’s power is 1 − L < 0. In an m-prize all-pay auction, player i is the player with the ith highest valuation, and his power equals his valuation less that of the marginal player. Theorem 1 below characterizes players’ equilibrium payoffs in contests that meet the following two conditions. GENERIC CONDITIONS: (i) Power Condition—The marginal player is the only player with power 0. (ii) Cost Condition—The marginal player’s valuation for winning is strictly decreasing at the threshold. That is, for every x ∈ [am+1 T ), vm+1 (x) > vm+1 (T ) = 0.12 I refer to a contest that meets the Generic Conditions as a generic contest. The separable contest in the example of Section 1.1 is generic, because the marginal player’s costs are strictly increasing at the threshold and only he has power 0. An m-prize all-pay auction meets the Cost Condition because costs are strictly increasing. If the Power Condition is met, that is, the valuation of the marginal player is different from those of all other players, the all-pay auction is generic. Contests that do not meet the Generic Conditions can be perturbed slightly to meet them. Perturbing the marginal player’s valuation for winning around the threshold leads to a contest that meets the Cost Condition. Doing the same for all players with power 0 generates a contest 12 In a separable contest, because vm+1 (x) = Vm+1 − cm+1 (x), the cost condition is that for every x ∈ [am+1 T ), cm+1 (x) < cm+1 (T ) = Vm+1 .
that meets the Power Condition. Note that in a generic contest, players in NW = {1 m} (“winning players”) have strictly positive powers, and players in NL = {m + 1 n} (“losing players”) have nonpositive powers. I now state the main result of the paper. THEOREM 1: In any equilibrium of a generic contest, the expected payoff of every player equals the maximum of his power and 0. An immediate implication of Theorem 1 is that in a generic contest players in NW have strictly positive expected payoffs and players in NL have expected payoffs of 0. Equivalently, a player obtains a strictly positive expected payoff in a generic contest if and only if his reach is strictly higher than the threshold. Players’ equilibrium strategies may be mixed, so players in NW may obtain a prize with probability smaller than 1 and players in NL may obtain a prize with strictly positive probability. It is only expected payoffs that are positive for players in NW and 0 for players in NL . In the example of Section 1.1, NW = {1} and NL = {2 3}. The contest is generic, so player 1’s payoff is 1 − K, and those of players 2 and 3 are 0. In an m-prize generic all-pay auction in which player i’s value is Vi , the payoff of every player i is max{Vi − Vm+1 0}.13 I use the following notation in the proof of Theorem 1. Pi (·), player i’s probability of winning, and ui (·), player i’s utility, are expanded to mixed strategies. A mixed strategy Gi of player i is a cumulative probability distribution that assigns probability 1 to his set of pure strategies Si . When a strategy profile G = (G1 Gn ) is specified, Pi (x) is shorthand for player i’s probability of winning when he chooses x ≥ ai with certainty and all other players play according to G, and similarly for ui (x). For an equilibrium (G1 Gn ), denote by ui = ui (Gi ) player i’s equilibrium payoff. Note that in equilibrium best responses are chosen with probability 1. The phrase “player i beats player j” refers to player i choosing a strictly higher score than player j does. For a set I, denote by |I| the cardinality of I. PROOF OF THEOREM 1: Choose a generic contest and an equilibrium G = (G1 GN ) of the contest. LEAST LEMMA: A player’s expected payoff in G is at least the maximum of his power and 0. PROOF: Every player i can guarantee himself a payoff of 0 by choosing his initial score, ai (recall that vi (ai ) > 0 and ci (ai ) = 0). It therefore suffices to consider players with strictly positive power, all of whom are in NW . In equilibrium, no player chooses scores higher than his reach with a strictly positive 13 Players’ payoffs in a multiprize all-pay auction in which all players have different valuations were first derived by Clark and Riis (1998).
probability, since choosing such scores leads to a negative payoff (by Assumptions A1 and A3). So, by choosing max{a_i, T + ε} for ε > 0, a player i in N_W beats all n − m players in N_L with certainty. This means that for every player i in N_W,

u_i ≥ v_i(max{a_i, T + ε}) →_{ε→0} v_i(max{a_i, T}) = w_i

by continuity of v_i.  Q.E.D.
TIE LEMMA: Suppose that in G two or more players have an atom at a score x, that is, choose x with strictly positive probability. Then players who have an atom at x either all win with certainty or all lose with certainty when choosing x.

PROOF: Denote by N′ the set of players who have an atom at x, with |N′| ≥ 2. Denote by E the strictly positive-probability event that all players in N′ choose x. Denote by D ⊆ E the event in which a relevant tie occurs at x, that is, the event in which m′ prizes are divided among the |N′| players in N′, with 1 ≤ m′ < |N′|. Suppose D has strictly positive probability. Then, conditional on D, at least one player i in N′ can strictly increase his probability of winning to 1 by choosing a score slightly higher than x, regardless of the tie-breaking rule. Since i chooses x with strictly positive probability, x ≤ r_i, so v_i(x) > −c_i(x) (by Assumptions A1–A3). Thus, by continuity of v_i and c_i, player i would be strictly better off by choosing a score slightly higher than x. Therefore, D has probability 0. This implies that P(E) = P(E^L) + P(E^W), where E^L ⊆ E is the event that at least m players in N\N′ choose scores strictly higher than x, E^W ⊆ E is the event that at most m − |N′| players in N\N′ choose scores strictly higher than x, and P(A) denotes the probability of event A. By independence of players' strategies, either E^L or E^W has probability 0; otherwise, D would have strictly positive probability. Suppose that P(E) = P(E^L). Independence of players' strategies now implies that, without conditioning on E, at least m players in N\N′ choose scores strictly higher than x with probability 1, so P_i(x) = 0 for every player i in N′. Similarly, if P(E) = P(E^W), then P_i(x) = 1 for every player i in N′.  Q.E.D.

Several players may have an atom at the same score in equilibrium. The Tie Lemma only rules out ties in which at least one player wins with a strictly positive probability that is less than 1. That no such ties arise in equilibrium helps establish which players have an expected payoff of 0.

ZERO LEMMA: In G, at least n − m players have best responses with which they win with probability 0 or arbitrarily close to 0. These players have an expected payoff of at most 0.

PROOF: Denote by J a set of some m + 1 players. Denote by S̃ the union of the best-response sets of the players in J and denote by s_inf the infimum
of S̃. Consider three cases: (i) two or more players in J have an atom at s_inf, (ii) exactly one player in J has an atom at s_inf, and (iii) no player in J has an atom at s_inf.

Case i. Denote by N′ ⊆ J the set of players in J who have an atom at s_inf. It cannot be that P_i(s_inf) = 1 for every player i in N′: because any player in J\N′ chooses scores strictly higher than s_inf with probability 1, even if the players in N\J choose scores strictly lower than s_inf with probability 1, only m − |J\N′| = m − (m + 1 − |N′|) = |N′| − 1 > 0 prizes are divided among the |N′| players in N′. Thus, the Tie Lemma shows that P_i(s_inf) = 0 for every player i in N′.

Case ii. Denote by i the only player in J with an atom at s_inf. P_i(s_inf) = 0, since all m players in J\{i} choose scores strictly higher than s_inf with probability 1.

In Cases i and ii, P_i(s_inf) = 0 for some player i in J who has an atom at s_inf, so s_inf is a best response for this player at which he wins with probability 0.

Case iii. By definition of s_inf, there exists a player i in J with best responses {x_n}_{n=1}^∞ that approach s_inf. Since 1 ≥ 1 − P_i(x_n) ≥ ∏_{j∈J\{i}} (1 − G_j(x_n)), no player in J has an atom at s_inf, and G is right-continuous, P_i(x_n) approaches 0 as n tends to infinity.

Because J was a set of any m + 1 players, at least n − m players have best responses with which they win with probability 0 or arbitrarily close to 0, and therefore have an expected payoff of at most 0.  Q.E.D.

The Least Lemma, the Tie Lemma, and the Zero Lemma hold regardless of the Generic Conditions. The Least Lemma and the Power Condition show that the m players in N_W have strictly positive expected payoffs. Therefore, the Least Lemma and the Zero Lemma imply that under the Power Condition the n − m players in N_L obtain expected payoffs of 0. Using this fact, I show that players in N_W obtain at most their power.

THRESHOLD LEMMA: The players in N_W have best responses that approach or exceed the threshold and, therefore, have an expected payoff of at most their power.

PROOF: By the Power Condition, players in N_L\{m + 1} have strictly negative powers. Their reaches, and therefore the suprema of their best responses, are strictly below the threshold. Consequently, there is some s_sup < T such that G_i(x) = 1 for every player i in N_L\{m + 1} and every score x > s_sup. This implies that every player i in N_W chooses scores that approach or exceed the threshold, that is, has G_i(x) < 1 for every x < T. Otherwise, for some s in (s_sup, T), G_i(s) = 1 for all but at most m − 1 players in N\{m + 1}. But then the marginal player could win with certainty by choosing a score in (max{a_{m+1}, s}, T) (note that a_{m+1} < T); because of the Cost Condition, this would give him a strictly positive payoff, a contradiction (recall that the marginal player is in N_L and therefore has an expected payoff of 0).
Take a player i in N_W. Because G_i(x) < 1 for every x < T, there exists a sequence {x_n}_{n=1}^∞ of best responses for player i that approach some z_i ≥ T. Since x_n is a best response for player i, who has a strictly positive payoff by the Least Lemma and the Power Condition, v_i(x_n) > 0. So, by Assumptions A1 and A2, v_i(x_n) > −c_i(x_n). By continuity of v_i, we have

u_i = u_i(x_n) = P_i(x_n)v_i(x_n) − (1 − P_i(x_n))c_i(x_n) ≤ v_i(x_n) →_{x_n→z_i} v_i(z_i) ≤ v_i(T) = w_i,

so every player in N_W obtains at most his power.  Q.E.D.
The Least Lemma and the Threshold Lemma, which relies on the Generic Conditions, show that players in NW have expected payoffs equal to their power. We have seen that players in NL have expected payoffs of 0. Since NL ∪ NW = N, the expected payoff of every player equals the maximum of his power and 0. Q.E.D. 3.1. Discussion of the Payoff Characterization Equilibrium payoffs in generic contests depend only on players’ valuations for winning at the threshold, even though the equilibria generally depend on players’ valuations for winning and costs of losing at all scores up to the threshold.14 From an applied perspective, only the reach of each player and valuations for winning at a single score—the threshold—need to be computed. In particular, players’ costs of losing do not affect payoffs. This means, for example, that a player’s expected payoff in an all-pay auction does not change if, instead of paying his entire bid, he pays only a strictly positive fraction of his bid in advance and the rest only if he wins (as long as the fraction is specified in advance). The payoff result implies that the number of players who obtain positive expected payoffs equals the number of prizes. That no more than m players obtain positive payoffs for every realization of players’ strategies (and tie-breaking randomization, if necessary) follows from the definition of contests. The payoff result shows that this is also true in expectation. As the example of Section 1.1 illustrates, contests do not, in general, have pure-strategy equilibria.15 Existence of an equilibrium (in pure or mixed strategies) is not immediately obvious, since payoffs are discontinuous in pure strategies, of which there is a continuum. Simon and Zame’s (1990) result shows 14 Equilibria may also include multiple atoms at various scores and multiple gaps in players’ best-response sets (see Example 4 and page 23 of Siegel (2007)). 15 Pure-strategy equilibria arise when players with positive power have head starts sufficiently large to dissuade weaker players from participating. Such equilibria do not arise in all-pay auctions, regardless of the difference in players’ valuations. The payoff result applies to both pureand mixed-strategy equilibria.
that an equilibrium exists for some tie-breaking rule. The following corollary of their result and the Tie Lemma above, the proof of which is in the Appendix, shows that an equilibrium exists for any tie-breaking rule. COROLLARY 1: Every contest has a Nash equilibrium. The payoff result does not rely on equilibrium uniqueness: Example 3 in the Appendix describes a generic separable contest and two equilibria of the contest. Moreover, these equilibria lead to different allocations of the prize and different aggregate expenditures, so standard revenue equivalence techniques cannot be used to compare players’ payoffs across equilibria. 3.2. Contests That Are Not Generic Although the payoff result does not apply to contests that are not generic, it implies the following statement. COROLLARY 2: Every contest (generic or not) has at least one equilibrium in which every player’s payoff is the maximum of his power and 0. This “upper-hemicontinuity” result can be proved by considering a sequence of generic contests that “approach” the original contest and an equilibrium for each contest in the sequence. Every limit point (in the weak∗ topology) of the resulting sequence of equilibria is an equilibrium of the original contest in which payoffs are given by the payoff result.16 The payoff result also holds for contests in which all players are identical, even though such contests do not meet the Power Condition (all players have power 0). This is because in any equilibrium of any contest, identical players have identical payoffs,17 and the Zero Lemma, which does not require the Generic Conditions, shows that at least one player has payoff 0. Therefore, we can state another corollary: COROLLARY 3: In any equilibrium of a contest in which all players are identical, all players have a payoff of 0. 16 Probability measures on the Borel subsets of the compact metric set [ai ri + ε] are regular (for any ε > 0), so the set of these probability measures is weak∗ compact and has at least one limit point (see, for example, Dunford and Schwartz (1988)). 17 Suppose players i and j = i are identical, and consider an equilibrium G. Player i can choose scores slightly above the supremum of player j’s best responses, beating player j for sure and beating the other players at least as often as player j does with any best response. This implies that player i’s payoff in G is at least as high as that of player j by applying a continuity argument and a reasoning similar to that used in the proof of Theorem 2.
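Before turning to examples in which the payoff result fails, here is a minimal sketch of how it operates when the contest is generic (an editorial illustration, not from the paper; the valuations are made up). In an m-prize all-pay auction, which is a separable contest with c_i(x) = x, a player's reach equals his valuation, the threshold is V_{m+1}, and the payoff formula gives max{V_i − V_{m+1}, 0}.

```python
# Editorial sketch: payoff characterization for an m-prize all-pay auction
# (separable contest with c_i(x) = x); the valuations below are hypothetical.
V = [10.0, 8.0, 5.0, 4.0]           # players indexed in decreasing order of reach
m = 2                                # number of prizes

reach = V[:]                         # in an all-pay auction, reach_i = V_i
threshold = reach[m]                 # reach of the marginal player (player m + 1)
power = [v - threshold for v in V]   # w_i = V_i - c_i(T) = V_i - V_{m+1}
payoff = [max(w, 0.0) for w in power]

print("threshold:", threshold)       # 5.0
print("powers:   ", power)           # [5.0, 3.0, 0.0, -1.0]
print("payoffs:  ", payoff)          # [5.0, 3.0, 0.0, 0.0] = max{V_i - V_{m+1}, 0}
```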
When players are not identical and the contest is not generic, the payoff of a player in some equilibrium may be very close to his valuation for winning at his initial score, even if his power is very low. Example 1 below shows this when the Power Condition fails; Example 2 below shows this when the Cost Condition fails.

EXAMPLE 1—The Power Condition Fails and the Payoff Result Does Not Hold: Consider the following three-player separable contest for one prize of common value 1, which is a modification of the example from Section 1.1. Players' costs are

c_1(x) = (1 − α)x if 0 ≤ x ≤ h;  (1 − α)h + (1 + αh/(1 − h))(x − h) if x > h,
c_2(x) = (1 − ε)x if 0 ≤ x ≤ h;  (1 − ε)h + (1 + εh/(1 − h))(x − h) if x > h,
c_3(x) = γx if 0 ≤ x ≤ h;  γh + L(x − h) if x > h,

for some small α, ε in (0, 1), small γ ≥ 0, h in (0, 1), and L > 0. Regardless of the value of L, the threshold is 1 and the Power Condition is violated (since at least two players have power 0). Costs are strictly increasing at 1 for all players, so the Cost Condition is met. It is straightforward to verify that for any h in (0, 1), there exist some β > 0 and M > 0 such that if α, ε, γ < β and L > M, then (G_1, G_2, G_3) is an equilibrium, for

G_1(x) = 0 if x < 0;  (1 − ε)h + (γ/(1 − α))(x/h − 1) if 0 ≤ x ≤ h;  (1 − ε)h + (1 + εh/(1 − h))(x − h) if h < x ≤ 1;  1 if x > 1,
G_2(x) = 0 if x < 0;  (1 − α)h if 0 ≤ x ≤ h;  (1 − α)h + (1 + αh/(1 − h))(x − h) if h < x ≤ 1;  1 if x > 1,
G_3(x) = x/h if x ≤ h;  1 if x > h.

The top part of Figure 2 depicts players' costs. The bottom part depicts players' atoms and densities in the equilibrium (G_1, G_2, G_3).
FIGURE 2.—Cost functions and the equilibrium (G1 G2 G3 ) of Example 1.
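The failure of the payoff result here can be checked numerically. The following editorial sketch (not part of the paper; the parameter values are arbitrary choices of small α, ε, γ and large L) evaluates player 3's payoff from any score x in (0, h] against the strategies G_1 and G_2 written above, and compares it with max{power, 0}.

```python
# Editorial check of Example 1; alpha, eps, gamma, h, L are arbitrary illustrative values.
alpha, eps, gamma, h, L = 0.01, 0.01, 0.02, 0.9, 20.0

def G1(x):          # player 1's strategy on [0, h], as written above
    return (1 - eps) * h + (gamma / (1 - alpha)) * (x / h - 1)

def G2(x):          # player 2's strategy on [0, h]
    return (1 - alpha) * h

def u3(x):          # player 3's payoff from x in (0, h]: win prob (times prize 1) minus cost
    return G1(x) * G2(x) - gamma * x

payoffs = [round(u3(x), 6) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
claimed = (1 - eps) * (1 - alpha) * h**2 - gamma * h      # payoff value given in the example
power3 = 1 - (gamma * h + L * (1 - h))                    # player 3's power: 1 - c_3(1)

print(payoffs)                         # constant and equal to `claimed`
print(round(claimed, 6), max(power3, 0))  # strictly positive payoff despite max{power, 0} = 0
```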
Player 3’s power is 1 − γh − L(1 − h) (his valuation for the prize less his cost of choosing the threshold), so as L increases, player 3’s power becomes arbitrarily low. But as h tends to 1 and ε α and γ tend to 0, for any value of L > M, player 3 wins with near certainty and his payoff (1 − ε)(1 − α)h2 − γh approaches the value of the prize. While the equilibrium of Figure 2 may seem robust, a slight change in player 1’s or 2’s valuation for the prize leads to a generic contest and destroys the equilibrium (since Theorem 1 then applies). EXAMPLE 2—The Cost Condition Fails and the Payoff Result Does Not Hold: When the Cost Condition fails, the marginal player cannot obtain a strictly positive payoff on some interval of scores leading up to the threshold, so competition may stop before the threshold is reached. Consider the following two-player separable contest for one prize of common value 1. Players’ costs are c1 (x) = bx for some b < 1 and ⎧x ⎪ ⎨ c2 (x) = d ⎪ ⎩ 1 2x − 1
if 0 ≤ x < d if d ≤ x ≤ 1 if x > 1
for some d in (0 1). Player 1’s reach is b1 > 1 player 2’s reach is 1, the threshold is 1, and players’ powers are w1 = 1 − b > 0 and w2 = 0, so the Power Condition holds, but the Cost Condition fails, because c2−1 (c2 ( r2 )) = [d 1]. As a result,
(G_1, G_2) is an equilibrium in which player 1 has a payoff of 1 − bd > w_1, for

G_1(x) = 0 if x < 0;  x/d if 0 ≤ x ≤ d;  1 if x > d,
G_2(x) = 0 if x < 0;  1 − bd + bx if 0 ≤ x ≤ d;  1 if x > d.

As b approaches 1, player 1's power approaches 0, but for any value of b, as d approaches 0, player 1's payoff approaches 1, the value of the prize. Note, however, that (G_1, G̃_2) is an equilibrium in which both players' payoffs equal their powers, for

G̃_2(x) = 0 if x < 0;  1 − b + bx if 0 ≤ x ≤ 1;  1 if x > 1.
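A quick numerical check of this example (an editorial sketch; the values of b and d are arbitrary): against G_2, player 1's payoff is constant and equal to 1 − bd at every score in [0, d], which exceeds his power 1 − b, while against G̃_2 it is constant and equal to 1 − b.

```python
# Editorial sketch for Example 2; b and d are arbitrary illustrative values.
b, d = 0.8, 0.25

def c1(x):                      # player 1's cost
    return b * x

def G2(x):                      # player 2's strategy in the first equilibrium
    return 1 - b * d + b * x if 0 <= x <= d else 1.0

def G2_tilde(x):                # player 2's strategy in the second equilibrium
    return 1 - b + b * x if 0 <= x <= 1 else 1.0

def u1(x, G):                   # player 1's payoff: win prob (G has no atoms on (0, 1]) minus cost
    return G(x) - c1(x)

print([round(u1(x, G2), 6) for x in (0.05, 0.1, 0.2, 0.25)])        # all equal 1 - b*d = 0.8
print([round(u1(x, G2_tilde), 6) for x in (0.1, 0.5, 0.9, 1.0)])    # all equal 1 - b = 0.2
```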
4. PARTICIPATION

A player participates in an equilibrium of a contest if with strictly positive probability he chooses scores associated with strictly positive costs of losing. Players with strictly negative powers who are disadvantaged everywhere with respect to the marginal player do not participate in any equilibrium. This is the content of Theorem 2, which is proved in the Appendix.

THEOREM 2: In a generic contest, if the normalized costs of losing and valuations for winning for the marginal player are, respectively, strictly lower and weakly higher than those of player i > m + 1, that is,

c_{m+1}(max{a_{m+1}, x})/v_{m+1}(a_{m+1}) < c_i(x)/v_i(a_i)   for all x ∈ S_i such that c_i(x) > 0

and

v_{m+1}(max{a_{m+1}, x})/v_{m+1}(a_{m+1}) ≥ v_i(x)/v_i(a_i)   for all x ∈ S_i,

then player i does not participate in any equilibrium. In particular, if these conditions hold for all players in N_L\{m + 1}, then in any equilibrium only the m + 1 players in N_W ∪ {m + 1} may participate.
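The two conditions can be checked mechanically. The sketch below (an editorial illustration, not from the paper; the valuations are hypothetical) verifies them on a grid for an all-pay auction, a separable contest with c_i(x) = x and a_i = 0, where both conditions reduce to comparing the normalized costs x/V_i.

```python
# Editorial sketch: grid check of the Theorem 2 conditions for an all-pay auction
# (separable contest with c_i(x) = x, a_i = 0) with hypothetical valuations.
V = [10.0, 8.0, 5.0, 4.0, 3.0]   # players already ordered by reach
m = 1                            # number of prizes; the marginal player has index m

def cost(i, x):                  # cost of losing
    return x

def val(i, x):                   # valuation for winning
    return V[i] - x

xs = [j * max(V) / 500 for j in range(501)]
for i in range(m + 1, len(V)):   # players ranked below the marginal player
    cond_costs = all(cost(m, x) / val(m, 0) < cost(i, x) / val(i, 0)
                     for x in xs if cost(i, x) > 0)
    cond_vals = all(val(m, x) / val(m, 0) >= val(i, x) / val(i, 0) for x in xs)
    print(f"player {i + 1}: does not participate -> {cond_costs and cond_vals}")
```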
Theorem 2 shows that in a generic all-pay auction only players 1 m + 1 may participate,18 and because players’ cost functions in all-pay auctions are strictly increasing and all initial scores equal 0, players 1 m + 1 do indeed participate.19 This explains why in all-pay auctions with distinct valuations precisely the m + 1 players with the highest valuations place strictly positive bids with strictly positive probability. In contrast, the example of Section 1.1 shows that a player in NL \{m + 1} may participate if he has a local advantage with respect to the marginal player. Indeed, the proof of the Threshold Lemma shows that players 1 and 2 choose scores that approach or exceed the threshold, and so participate in any equilibrium. Suppose player 3 did not participate. Then players 1 and 2 would have to play strategies that make all scores in (0 T ) best responses for both of them.20 For low values of γ > 0, player 3 could then obtain a strictly positive payoff by choosing a low score, contradicting the payoff result. Thus, player 3 must also participate in any equilibrium, even though his expected equilibrium payoff is 0. 5. CONCLUDING REMARKS All-pay contests capture general asymmetries among contestants, and allow for both sunk and conditional investments. This paper has provided a closedform formula for players’ expected payoffs in generic contests and has analyzed players’ participation. The main insight is that reach and power are the right variables to focus on when examining contests. Additional, seemingly complicated questions become simple when this insight in put to use. Consider, for example, the issue of rent dissipation, which is central to the rent-seeking literature. In a separable contest for m prizes of value V , aggregate equilibrium expenditures are simply mV less players’ payoffs. As winning players’ powers approach 0, which happens when their costs at the threshold approach V , rent dissipation is complete. As winning players’ powers approach V , which happens when their costs at the threshold approach 0, no rent is dissipated. The addition of a player to a contest never lowers the threshold and, therefore, makes existing players weakly worse off. If the new player’s reach is below the existing threshold, existing players’ payoffs do not change and the new 18 Baye, Kovenock, and de Vries (1996) showed that more than two players may participate in certain nongeneric, single-prize all-pay auctions. 19 The m players in NW participate because, as shown in the proof of the Threshold Lemma, they choose scores that approach or exceed the threshold. If the marginal player does not choose scores that approach or exceed the threshold, any player in NW can win with certainty and increase his payoff by choosing a score strictly below the threshold, contradicting the Threshold Lemma. 20 Apply Lemma 1 in the Appendix to the contest that includes only players 1 and 2.
player has a payoff of 0. The addition of a prize makes player m + 2 the marginal player, and this lowers the threshold and makes existing players better off. In contrast, making prizes more valuable may make players worse off, because it raises the threshold. Further analysis of these issues, which have implications for contest design, is left for future work. APPENDIX EXAMPLE 3: This example depicts a separable contest and two equilibria of the contest, in which different players participate and in which aggregate expenditures differ. Let n = 4 and m = 1. I extend the example of Section 1.1 by adding player 4, without constructing the contest explicitly. I will demonstrate the existence of two equilibria, G and G , such that only players 1, 2, and 3 participate in G, and only players 1, 2, and 4 participate in G . Different player participation implies different allocations, since a player chooses a costly score only if he has a strictly positive probability of winning by doing so. Begin with an equilibrium G = (G1 G2 G3 ) of the example of Section 1.1. When γ is small and positive, all three players participate in G as shown in Section 4. Let t3 = inf{x : G3 (x) = 1}. Add player 4 with V4 = 1 and continuous, increasing costs c4 that are lower than those of player 3 below t3 and equal to them starting from t3 . That is, ∀x ∈ (0 t3 ): c4 (x) ∈ (0 c3 (x)) and ∀x ≥ t3 : c4 (x) = c3 (x). There exist such functions c4 for which an equilibrium is for players 1, 2, and 3 to play G, and for player 4 not to participate. Indeed, assume that player 4 does not participate. Since player 3’s equilibrium payoff is 0 (his power is negative), he obtains at most 0 by choosing any score when players 1 and 2 play G1 and G2 , respectively, and he wins a prize under G if and only if he beats players 1 and 2. By Lemma 1 below, there are no atoms in (0 T ) in G, so ∀x ∈ (0 T ): P3 (x) = G1 (x)G2 (x). Player 4, who considers joining in when the others are playing G, must beat players 1, 2, and 3 to win, that is, ∀x ∈ (0 T ): P4 (x) = G1 (x)G2 (x)G3 (x). Let x ∈ (0 t3 ) (note that t3 < T ), which implies that G3 (x) < 1. If u3 (x) = 0, then P3 (x) > 0 because c3 (x) > 0. Thus, P4 (x) < P3 (x) and P4 (x) − c3 (x) < P3 (x) − c3 (x) = u3 (x) = 0. If u3 (x) < 0, since P4 (x) ≤ P3 (x) again P4 (x) − c3 (x) < 0. Thus, there exist continuous, increasing functions c4 such that ∀x ∈ (0 t3 ): c4 (x) ∈ (0 c3 (x)) and P4 (x) − c4 (x) < 0. For such functions, it is a best response for player 4 not to participate when the other players play G. Since G is an equilibrium of the contest that includes only players 1, 2, and 3, we have an equilibrium. Maintaining the same cost functions, consider now an equilibrium G = (G1 G2 G4 ) of the contest that includes only players 1, 2, and 4. As in the example of Section 1.1, all three players must participate in G . When player 3 is added to the contest and does not participate, this remains an equilibrium. Indeed, player 4’s payoff is zero in G (his power is negative) and at every score, player 3’s costs are weakly higher than those of player 4 whereas his probability of winning is weakly lower than that of player 4.
If aggregate expenditures under the two equilibria are the same, multiply the valuation and cost of player 4 by some strictly positive d ≠ 1. This does not change the equilibria of the contest, but changes aggregate expenditures in G′.

PROOF OF COROLLARY 1: Consider a contest C and the restricted contest C′, in which every player i chooses scores in S_i′ = [a_i, K] for K = max_{i∈N} r_i < ∞. Any equilibrium of C′ is an equilibrium of C, since scores higher than K are strictly dominated by a_i for every player i. Thus, it suffices to show that C′ has an equilibrium.

To do this, consider S^∗ = ∏_{i∈N} S_i′ \ {(s_1, ..., s_n) | ∃i ≠ j : s_i = s_j}. That is, S^∗ is the set of n-tuples of distinct strategies. Players' payoffs are bounded and continuous on S^∗, which is dense in ∏_{i∈N} S_i′. So, following Simon and Zame (1990, p. 864), there exists some tie-breaking rule, which may be score dependent, such that C′ has a mixed-strategy equilibrium G when this tie-breaking rule is used. Denote the game in which this tie-breaking rule is used by C̃ and denote player i's payoff in the equilibrium G by ũ_i. To complete the proof, it suffices to show that G is an equilibrium of C′. I do this in two steps.

Step 1. A G_i-measure 1 of best responses for player i in C̃ gives player i the same payoff ũ_i in C′ as it does in C̃ when all other players play according to G. Indeed, choosing a score s_i at which no player has an atom gives player i the same payoff regardless of the tie-breaking rule. Since the number of atoms in G is countable, it is enough to show that when choosing G_i-atoms in C′, player i obtains ũ_i. Consider a G_i-atom s_i. If only i has an atom at s_i, then a tie occurs with probability 0, so player i obtains ũ_i in C′. If there are multiple atoms at s_i, by the Tie Lemma the tie is never binding regardless of the tie-breaking rule, so again player i obtains ũ_i in C′.

Step 2. No score gives player i a payoff in C′ higher than ũ_i. The only scores s_i to check are those at which player i does not have an atom and another player does. Consider such a score s_i, and denote player i's payoff in C′ when choosing s_i by u_i(s_i). By choosing scores slightly higher than s_i in C̃, player i can obtain a payoff of at least u_i(s_i) − ε for any ε > 0. Thus, u_i(s_i) ≤ ũ_i.  Q.E.D.
PROOF OF THEOREM 2: Since dividing a player's Bernoulli utility by vi(ai) > 0 does not change his strategic behavior, it suffices to prove the result for contests in which vi(ai) = 1 for every player i. Choose an equilibrium G of such a contest and suppose that a player i > m + 1 who meets the conditions of the theorem participated in G. Let ti = inf{x : Gi(x) = 1} < T and let t̃i = max{am+1, ti}. Then t̃i < T, Pi(ti) < 1 (because the m players in NW choose scores that approach or exceed the threshold, as shown in the proof of the Threshold Lemma), and for every δ > 0: Pm+1(t̃i + δ) ≥ Pi(ti), since by choosing t̃i + δ, player m + 1 beats player i for sure and beats the other players
at least as often as player i does. Therefore, since for every δ > 0 such that t̃i + δ < rm+1 = T we have vm+1(t̃i + δ) > 0 ≥ −cm+1(t̃i + δ), we obtain

um+1 ≥ Pm+1(t̃i + δ)vm+1(t̃i + δ) − (1 − Pm+1(t̃i + δ))cm+1(t̃i + δ)
     ≥ Pi(ti)vm+1(t̃i + δ) − (1 − Pi(ti))cm+1(t̃i + δ).

Now, by definition of participation, ci(ti) > 0, so ci(ti) > cm+1(t̃i). Since Pi(ti) < 1 and vm+1(max{am+1, x}) ≥ vi(x) for all x ∈ Si, by continuity of vm+1 and cm+1, player m + 1 can choose t̃i + δ for sufficiently small δ > 0 such that

Pi(ti)vm+1(t̃i + δ) − (1 − Pi(ti))cm+1(t̃i + δ) > Pi(ti)vi(ti) − (1 − Pi(ti))ci(ti) = ui(ti) ≥ 0,

so um+1 > 0, which contradicts the payoff result because wm+1 = 0.
Q.E.D.
LEMMA 1: In any equilibrium G of a contest with strictly decreasing valuations for winning, strictly increasing costs of losing, and initial scores of 0, (i) G is continuous on (0, T) and (ii) every score in (0, T) is a best response for at least two players.

PROOF: Suppose that a player i had an atom at x ∈ (0, T). Because players 1, ..., m + 1 choose scores that approach or exceed the threshold (see the argument in footnote 19), Pj(x) < 1 for every player j and, in particular, Pi(x) < 1. If Pi(x) = 0, then player i would be better off by choosing 0. Therefore, Pi(x) ∈ (0, 1). Now consider a player j ≠ i. Because player i has an atom at x and Pi(x) ∈ (0, 1), player j would be better off choosing scores slightly above x than choosing scores slightly below x. To see this, suppose player j chose x and all other players played according to G, with some randomization as the tie-breaking rule at x. Then player i would still win with a probability in (0, 1) when choosing x, so j would be better off choosing a slightly higher score by reasoning similar to that of the Tie Lemma. Thus, no player j ≠ i chooses scores in some region below x (regardless of the tie-breaking rule) and, by the Tie Lemma, no player j ≠ i has an atom at x. Therefore, player i would be better off by choosing a score slightly below x. This shows that G is continuous on (0, T). For (ii), note that if x ∈ (0, T) is not a best response for player i, then continuity of G implies that the same is true for scores in some neighborhood of x. Therefore, if only one player had a best response at x, he could choose scores slightly lower than x and win with the same probability, making him better off. Suppose no player had a best response at x. Then there would be a gap in the union of players' best response sets. Continuity of G
on (0 T ) implies that the top of the gap cannot be below the threshold, and because valuations for winning are strictly decreasing and Gi (T ) = 1 for every player i in NL , players in NW do not have best responses above the threshold, so Gi (T ) = 1 for every player i. Now every player i in {1 m + 1} has Gi (x) < 1 because he chooses scores that approach or exceed the threshold. So the top of the gap must be the threshold, and players {1 m + 1} must each have an atom there, contradicting the Tie Lemma. This shows that x is a best response for at least two players. Q.E.D. REFERENCES BARUT, Y., AND D. KOVENOCK (1998): “The Symmetric Multiple Prize All-Pay Auction With Complete Information,” European Journal of Political Economy, 14, 627–644. [71] BAYE, M. R., D. KOVENOCK, AND C. G. DE VRIES (1993): “Rigging the Lobbying Process: An Application of All-Pay Auctions,” American Economic Review, 83, 289–294. [71] (1996): “The All-Pay Auction With Complete-Information,” Economic Theory, 8, 291–305. [73,75,87] CHE, Y.-K., AND I. GALE (1998): “Caps on Political Lobbying,” American Economic Review, 88, 643–651. [71] (2003): “Optimal Design of Research Contests,” American Economic Review, 93, 646–671. [71] (2006): “Caps on Political Lobbying: Reply,” American Economic Review, 96, 1355–1360. [71,72,77] CLARK, D. J., AND C. RIIS (1998): “Competition Over More Than One Prize,” American Economic Review, 88, 276–289. [71,73,77,79] CORNES, R., AND R. HARTLEY (2005): “Asymmetric Contests With General Technologies,” Economic Theory, 26, 923–946. [72] DASGUPTA, P. (1986): “The Theory of Technological Competition,” in New Developments in the Analysis of Market Structure, ed. by J. E. Stiglitz and G. F. Mathewson. Cambridge, MA: MIT Press, 519–547. [71] DUNFORD, N., AND J. T. SCHWARTZ (1988): Linear Operators Part I: General Theory. New York: Wiley. [83] ELLINGSEN, T. (1991): “Strategic Buyers and the Social Cost of Monopoly,” American Economic Review, 81, 648–657. [71] GONZÁLEZ-DÍAZ, J. (2007): “A Unifying Model for First-Price Winner-Takes-All Contests,” Mimeo. [72,73] HILLMAN, A. L., AND J. G. RILEY (1989): “Politically Contestable Rents and Transfers,” Economics and Politics, 1, 17–39. [71,77] HILLMAN, A. L., AND D. SAMET (1987): “Dissipation of Contestable Rents by Small Numbers of Contenders,” Public Choice, 54, 63–82. [71,77] KAPLAN, T. R., AND D. WETTSTEIN (2006): “Caps on Political Lobbying: Comment,” American Economic Review, 96, 1351–1354. [71,72,77] KAPLAN, T. R., I. LUSKI, A. SELA, AND D. WETTSTEIN (2002): “All-Pay Auctions With Variable Rewards,” Journal of Industrial Economics, 50, 417–430. [72] LAZEAR, E. P., AND S. ROSEN (1981): “Rank-Order Tournaments as Optimum Labor Contracts,” Journal of Political Economy, 89, 841–864. [72] MOLDOVANU, B., AND A. SELA (2001): “The Optimal Allocation of Prizes in Contests,” American Economic Review, 91, 542–558. [72] (2006): “Contest Architecture,” Journal of Economic Theory, 126, 70–96. [72] PARREIRAS, S., AND A. RUBINCHIK (2006): “Contests With Many Heterogeneous Agents,” Mimeo. [72]
SIEGEL, R. (2007): “All-Pay Contests,” Ph.D. Thesis, Stanford Graduate School of Business. [75, 82] SIMON, L. K., AND W. R. ZAME (1990): “Discontinuous Games and Endogenous Sharing Rules,” Econometrica, 58, 861–872. [82,89] SZYMANSKI, S., AND T. M. VALLETTI (2005): “Incentive Effects of Second Prizes,” European Journal of Political Economy, 21, 467–481. [72] TULLOCK, G. (1980): “Efficient Rent Seeking,” in Toward a Theory of the Rent Seeking Society, ed. by J. M. Buchanan, R. D. Tollison, and G. Tullock. College Station, TX: Texas A&M University Press, 269–282. [72] VARIAN, H. (1980): “A Model of Sales,” American Economic Review, 70, 651–658. [71]
Dept. of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208, U.S.A.; [email protected]. Manuscript received November, 2007; final revision received July, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 93–105
THE COMPLEXITY OF FORECAST TESTING BY LANCE FORTNOW AND RAKESH V. VOHRA1 Consider a weather forecaster predicting a probability of rain for the next day. We consider tests that, given a finite sequence of forecast predictions and outcomes, will either pass or fail the forecaster. Sandroni showed that any test which passes a forecaster who knows the distribution of nature can also be probabilistically passed by a forecaster with no knowledge of future events. We look at the computational complexity of such forecasters and exhibit a linear-time test and distribution of nature such that any forecaster without knowledge of the future who can fool the test must be able to solve computationally difficult problems. Thus, unlike Sandroni’s work, a computationally efficient forecaster cannot always fool this test independently of nature. KEYWORDS: Forecast testing, prediction, bounded rationality.
1. INTRODUCTION SUPPOSE ONE IS ASKED to forecast the probability of rain on successive days. Sans knowledge of the distribution that governs the change in weather, how should one measure the accuracy of the forecast? One criterion for judging the effectiveness of a probability forecast, called calibration, has been an object of interest. Dawid (1982) offered the following intuitive definition of calibration: Suppose that, in a long (conceptually infinite) sequence of weather forecasts, we look at all those days for which the forecast probability of precipitation was, say, close to some given value ω and (assuming these form an infinite sequence) determine the long run proportion p of such days on which the forecast event (rain) in fact occurred. The plot of p against ω is termed the forecaster’s empirical calibration curve. If the curve is the diagonal p = ω, the forecaster may be termed (empirically) well calibrated.
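To make the definition concrete, the following short sketch (our own illustration, not part of the original article) computes an empirical calibration curve from a finite record of forecasts and outcomes; the binning into ten intervals and all names are assumptions made for this example only.

```python
# Illustrative sketch (not from the paper): empirical calibration curve.
# Forecasts are probabilities of rain; outcomes are 0 (dry) or 1 (wet).
from collections import defaultdict

def calibration_curve(forecasts, outcomes, n_bins=10):
    """Group days by binned forecast value omega and return, for each bin,
    the average forecast and the empirical frequency p of rain in that bin.
    A well calibrated forecaster has p close to omega in every bin."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # bin -> [sum forecast, sum outcome, count]
    for f, o in zip(forecasts, outcomes):
        b = min(int(f * n_bins), n_bins - 1)
        sums[b][0] += f
        sums[b][1] += o
        sums[b][2] += 1
    return {b: (s[0] / s[2], s[1] / s[2]) for b, s in sorted(sums.items())}

# Example: a constant 0.3 forecast is calibrated against any sequence whose
# long-run frequency of rain is 0.3.
print(calibration_curve([0.3] * 10, [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]))
```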
Foster and Vohra (1993) exhibited a randomized forecasting algorithm that with high probability will be calibrated on all sequences of wet–dry days. Thus, a forecaster with no meteorological knowledge would be indistinguishable from one who knew the distribution that governs the change in weather.2 This has inspired the search for tests of probability forecasts that can distinguish between a forecaster who knows the underlying distribution of the process being forecast and one who simply “games” the test. A test takes as input a forecasting algorithm and a sequence of outcomes, and after some period accepts the forecast (PASS) or rejects it (FAIL). Sandroni (2003) proposed two properties that such a test should have. The first is that the test should declare PASS/FAIL after a finite number of periods. This seems unavoidable for a practical test. Second, suppose the forecast is indeed 1 Research supported in part by NSF grant ITR IIS-0121678. We thank A. Sandroni and W. Olszewski and the referees for useful comments. 2 Lehrer (2001), Sandroni, Smorodinsky, and Vohra (2003), and Vovk and Shafer (2005) gave generalizations of this result.
correct, that is, accurately gives the probability of nature in each round. Then the test should declare PASS with high probability. We call this second condition passing the truth. Call a test that satisfies these two conditions a good test. A test based on calibration is an example of a good test. A forecaster with no knowledge of the underlying distribution who can pass a good test with high probability on all sequences of data is said to have ignorantly passed the test. For every good test, Sandroni (2003) showed that there exists a randomized forecasting algorithm that will ignorantly pass the test. Therefore, no good test can distinguish between a forecaster who knows the underlying distribution of the process being forecast and one who simply games the test. Since this randomized forecast can pass the test for all distributions, it must be independent of the underlying distribution (if any) being forecast. Hence, in some sense these forecasts provide no information at all about the process being forecast. Dekel and Feinberg (2006) as well as Olszewski and Sandroni (2007) got around the impossibility result of Sandroni (2003) by relaxing the first property of a good test; for example, allowing the test to declare PASS/FAIL at infinity, allowing the test to declare FAIL in a finite number of periods but PASS at infinity, or relaxing the condition that the test always passes the truth. These tests can often be made efficient in the sense that they can run in time linear in the length of the current sequence, but the number of forecasts before a bad forecaster is failed could be extremely large as a function of the forecaster. Olszewski and Sandroni (2008) have noted that the tests considered by Dekel and Feinberg (2006) and Olszewski and Sandroni (2007) rely on counterfactual information. Specifically, the test can use the predictions the forecaster would have made along sequences that did not materialize, because the test has access to the forecasting algorithm itself. As noted by Olszewski and Sandroni (2008), this is at variance with practice. For this reason they considered tests that are not permitted to make use of counterfactual predictions on the part of the forecaster, but relaxed the condition that the test must decide in finite time. Formally, two different forecasting algorithms that produce the same forecast on a realization must be treated in the same way. If such tests pass the truth with high probability, they show that for each such test, there is a forecasting algorithm that can ignorantly pass the test. It is natural to ask if a test using a proper scoring rule,3 like log-loss, can circumvent these difficulties. Here one penalizes the forecaster −log p if the forecaster predicts a probability p of rain and it rains, and −log(1 − p) if it does not rain. The lowest possible score that can be obtained is the long-run average entropy of the distribution. One could imagine the test passing the forecaster if the log-loss matches the entropy. However, such a test would need

3 Assuming the forecaster is compensated on the basis of the scores associated with the rule, a proper scoring rule gives the forecaster the incentive to reveal his/her true beliefs. See Good (1952).
to know the entropy of the distribution. As noted in the Introduction, we are concerned with tests which operate without any prior knowledge of the distribution. Proper scoring rules are good methods to compare two forecasters, but are not useful for testing the validity of a single forecaster against an unknown distribution of nature. 1.1. Computationally Bounded Forecasters This paper will examine the consequences of imposing computational limits on both the forecaster and the test. We measure the complexity as a function of the length of the history so far. Most practical tests have a complexity that is polynomial in the length of the history, so it seems reasonable to restrict attention to good tests that have a complexity that is polynomial in the length of the history. Restricting the test in this way should make it easier to be ignorantly passed. It seems natural to conjecture that for every polynomial-time good test, there exists a polynomial-time randomized forecasting algorithm that will ignorantly pass the test. Remarkably, this is not the case. We exhibit a good linear-time test that would require the forecaster to factor numbers under a specific distribution or fail the test. The existence of an efficient (i.e., probabilistic polynomialtime) algorithm for factoring composite numbers is considered unlikely. Indeed, many commercially available cryptographic schemes are based on just this premise. This result suggests that the “ignorant” forecaster of Sandroni (2003) must have a complexity at least exponential in n. Hence, the ignorant forecaster must be significantly more complex than the test. In particular, its complexity may depend on the complexity of nature’s distribution. To prove this result, we interpret the observed sequence of 0–1’s as encoding a number followed by a list of its possible factors. A sequence that correctly encodes a list of factors is called correct. The test fails any forecaster who does not assign high probability to these correct sequences when they are realized. Consider now the distribution that puts most of its weight on correct sequences. If the forecaster can ignorantly pass the test, he/she must be able to identify sequences that correspond to correct answers to our computational question. The factoring proof does not generalize to all NP problems, because we need a unique witness so as to guarantee that the test always passes the truth. Witness reduction techniques as in Valiant and Vazirani (1986) do not appear to help. Our second result strengthens the previous one by exhibiting a good test that requires the forecaster to solve PSPACE-hard problems4 by building on the structure of specific interactive proof systems. While the latter result is strictly stronger than the factoring result, we present both since the factoring proof 4 See Section 5 of this paper for a definition. This class contains the more well known class of NP-hard problems.
is much simpler and illustrates some of the techniques used in the PSPACE proof. In both cases the tests are deterministic. Furthermore, they use only the realized outcomes and forecasts to render judgement. We can modify the factoring proof to use efficiently sampleable distributions of nature. Whether we can do the same for the more general PSPACE result remains open. In addition we also consider the possibility that the test may have more computational power than the forecaster. If we restrict ourselves to forecasters using time O(t(n)), there is a test T using time O(n^O(1) t(n)) with the following properties.
• For all distributions of nature μ, T will pass, with high probability, a forecaster forecasting μ.
• For some distribution τ of nature, for every forecaster F running in time O(t(n)), T will fail F with high probability.
If one takes a highly noncomputable distribution τ, a forecaster would not be able to forecast τ well, but T could not test this in general if T is also required to always pass the truth. In a nutshell, no forecasting algorithm can ignorantly pass a good test that is more complex than itself. In Section 2 we give formal definitions of forecasters and tests. Section 3 shows that the forecaster needs to be nearly as powerful as the test. Section 4 gives a simple test where a successful ignorant forecaster must be able to factor. In Section 5 we sketch a more complicated test that checks if the forecaster successfully acts like a prover in an interactive proof system. This section will also provide a brief description of an interactive proof system and its implications.

2. DEFINITIONS

Let N be the set of natural numbers. Let S = {0, 1} be the state space.5 An element of S is called an outcome. Let S^n, for n ∈ N, be the n-fold Cartesian product of S. An n-sequence of outcomes is denoted s = (s1, s2, ..., sn) ∈ S^n, where si denotes the state realized in period i. Given s ∈ S^n and r < n, let s^r = (s1, s2, ..., sr) ∈ S^r be the prefix of length r of s. An element of [0, 1] is called a forecast of the event 1. A forecast made at period r refers to outcomes that will be observed in period r + 1. Let Δ∗ be the set of probability distributions over [0, 1]. A forecasting algorithm is a function

F : ⋃ from r = 0 to n−1 of (S^r × [0, 1]^r) → Δ∗.

At the end of each stage r < n, an r-history (s^r, f0, f1, ..., fr−1) ∈ S^r × [0, 1]^r is observed. Here fj ∈ [0, 1] is the forecast made by F in period j.
The results can easily be extended to more than two states.
Let f^r = (f0, ..., fr). Based on this r-history, the forecaster must decide which forecast fr ∈ [0, 1] to make in period r. The forecaster is allowed to randomize, so fr ∈ [0, 1] can be selected (possibly) at random, using a probability distribution in Δ∗. An n-outcome sequence s ∈ S^n and a forecasting algorithm F determine a probability measure F̄s on [0, 1]^n, where, conditional on (s^r, f^(r−1)), the probabilities of forecasts next period are given by F(s^r, f^(r−1)). The vector of realized forecasts associated with F on a sequence s will be denoted f(s). Denote the unknown data generating process by P. Given P and s^r ∈ S^r, let Psr ∈ [0, 1] be the probability that sr+1 = 1 conditional on s^r. Given P, let F^P(s) ∈ [0, 1]^n be the forecast sequence such that frP(s) = Psr. A test is a function T : S^n × [0, 1]^n → {0, 1}. After a history of n forecasts and outcomes is observed, a test must either accept (PASS) or reject (FAIL) the forecast. When the test returns a 0, the test is said to FAIL the forecast based on the outcome sequence. When the test returns a 1, the test is said to PASS the forecast based on the outcome sequence. A test is said to pass the truth with probability 1 − ε if

PrP({s : T(s, F^P(s)) = 1}) ≥ 1 − ε

for all P. A test T can be ignorantly passed by a forecasting algorithm F with probability 1 − ε if for all P,

PrP({s : T(s, f(s)) = 1}) ≥ 1 − ε.

Equivalently, for all s ∈ S^n,

Pr[T(s, f(s)) = 1] ≥ 1 − ε,

where the probability is with respect to the forecaster's randomness. Therefore, F can ignorantly pass T if on any sequence of outcomes, the realized forecast sequence will not be failed with probability at least 1 − ε (under the distribution induced by the forecasting scheme). A test T is said to fail the forecasting algorithm F on the distribution Q with probability 1 − ε if

PrQ({s : PrF̄s[T(s, f(s)) = 1] ≥ 1 − ε}) ≤ ε.
THEOREM 1—Sandroni’s Theorem: Suppose test T passes the truth with probability 1 − ε. Then there is a forecasting algorithm F that can ignorantly pass T with probability 1 − ε.
3. TESTS MORE COMPLEX THAN THE FORECAST

We show that no forecasting algorithm can ignorantly pass a test with probability 1 − ε that is more complex than itself. The basic idea of the proof can be found in Dawid (1985). For notational simplicity our exposition will be limited to deterministic forecasts. The extension to randomized forecasts is straightforward and we outline it at the end of the section. Let t(·) be a time-constructible function and let p(·) be a polynomial.6 The results here and later in the paper hold only for sufficiently large n, that is, n > n0, where n0 can depend on the forecaster and ε. For small n, a forecaster could be hard-wired with enough information to fool any test that passes the truth.

LEMMA 1: For every deterministic forecaster F using time t(n) there is a distribution P, polynomial p, and test T, using time p(n)t(n), so that for all ε > 0 and n sufficiently large:
(i) T passes the truth with probability at least 1 − ε.
(ii) T fails F on P with probability 1 − ε.

PROOF: Let n be sufficiently large and let a sequence s∗ = (s1∗, s2∗, s3∗, ..., sn∗) of outcomes be such that the probability F assigns to seeing sj∗ given (s1∗, ..., s∗j−1) and f^(j−2) is at most 1/2 for all j ≤ n. To describe the test, let G be any forecasting algorithm and let (s, g(s)) be an n-history.
• If s ≠ s∗, then T(s, g(s)) = 1.
• If s = s∗ and the probability that G assigns to s∗ is at least ε, then T(s, g(s)) = 1.
• If s = s∗ and the probability that G assigns to s∗ is less than ε, then T(s, g(s)) = 0.
Observe that the truth is failed if and only if s∗ is realized and the probability assigned by the truth to this event is less than ε. Since this can only happen with probability less than ε, it follows that the test passes the truth with probability 1 − ε. Now we exhibit a distribution P such that PrP({s : T(s, f(s)) = 1}) = 0. Let P be the distribution that puts measure 1 on s∗. The probability that F assigns to s∗ is less than (1/2)^n ≤ ε. Hence, the test fails F on P with probability 1 − ε. Q.E.D.

6 A time-constructible function t(n) is one which can be constructed from n by a Turing machine in time order t(n). Formally, there exists a Turing machine M which, given a string of n 1's, returns the binary representation of t(n) in O(t(n)) steps. All natural functions are time constructible.
(i) T passes the truth with probability 1 − ε.
(ii) T fails all of F1, F2, ..., Fm on P with probability 1 − ε.

PROOF: Consider the sequence of integers 1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, ... Let h(j) be the jth element of the sequence. Let n be sufficiently large and let a sequence s∗ = (s1∗, s2∗, s3∗, ..., sn∗) of outcomes be such that the probability Fh(j) assigns to seeing sj∗, given (s1∗, ..., s∗j−1) and its own past forecasts, is at most 1/2 for all j ≤ n. If h(j) > m, let sj∗ = 0. To describe the test, let G be any forecasting algorithm and let (s, g(s)) be an n-history.
• If s ≠ s∗, then T(s, g(s)) = 1.
• If s = s∗ and the probability that G assigns to s∗ is at least ε, then T(s, g(s)) = 1.
• If s = s∗ and the probability that G assigns to s∗ is less than ε, then T(s, g(s)) = 0.
Observe that the truth is failed if and only if s∗ is realized and the probability assigned by the truth to this event is less than ε. Since this can only happen with probability less than ε, it follows that the test passes the truth with probability 1 − ε. Let P be the distribution that puts measure 1 on s∗. Choose any Fk. Consider the subsequence {sj∗}h(j)=k of s∗. The probability that Fk assigns to this subsequence is at most (1/2)^|{j : h(j)=k}| < ε. Hence, the failure condition in the third part of the test holds. Q.E.D.

THEOREM 2: For any t(n) there is a polynomial p and a test T of complexity p(n)t(n) with the following properties.
(i) For all n sufficiently large the test passes the truth with probability 1 − ε.
(ii) There is a distribution P on infinite 0–1 sequences such that each forecaster F of complexity t(n) fails the test on P with probability 1 − ε for all n ≥ nF, where nF depends on the forecaster F.
(iii) P is independent of the forecaster and n.

PROOF: Let F1, F2, ... be an enumeration of all t(n)-computable forecasts. Note that the sequence s∗ defined in Lemma 2 does not depend on n. Let P be the distribution that puts all its weight on the sequence s∗. For each n the test we require coincides with the test as defined in Lemma 2. This test will pass the truth with high probability for every n. For any t(n)-computable forecaster Fk, by Lemma 2, for sufficiently large n, the test will fail Fk on P with probability 1 − ε. Q.E.D.

To extend the results to randomized forecasts it suffices to show how the proof of Lemma 1 can be modified. Recall that the sequence s∗ was chosen so that the forecasted probability assigned by the forecast to sj∗ (given the previous history) is less than 1/2. For a randomized forecast, we choose sj∗ so as to
maximize the probability of choosing a forecast of less than 1/2 of observing event sj∗.

4. FORECASTS THAT FACTOR

In this section we describe a test that always passes the truth and any forecaster who could ignorantly pass this test must be able to factor any number. We prove our results for deterministic forecasters, but one can extend, in a manner similar to Section 3, the results to randomized forecasters. Given an integer k and a 0–1 sequence s, we can interpret the sequence as encoding an arbitrary tuple of numbers. The first number in the tuple we shall call the prefix of the sequence. The remaining numbers (the suffix) will be interpreted as a potential factorization of the prefix. A sequence that encodes the tuple (m, π1, e1, π2, e2, ..., πk, ek) is said to encode the unique factorization of m if (i) π1, π2, ..., πk are primes, (ii) π1 > π2 > ··· > πk > 1, and (iii) m = π1^e1 π2^e2 ··· πk^ek. To describe the test, let G be any forecasting algorithm and let (s, g(s)) be an n-history.
• Let S∗ be the set of all sequences that encode the factorization of a number.
• If s ∉ S∗, then T(s, g(s)) = 1.
• If s ∈ S∗, then the prefix of s corresponds to some number m. Determine the probability, p, that G assigns to s conditional on the prefix being m.
• If p ≥ ε, then T(s, g(s)) = 1; otherwise T(s, g(s)) = 0.
Call this the factoring test. It can be implemented in polynomial time since we have efficient algorithms for multiplication and primality testing (see Agrawal, Kayal, and Saxena (2004)). We can make a linear-time test by appropriately padding the sequence.

THEOREM 3: For any ε > 0, for sufficiently large m, the factoring test passes the truth with probability 1 − ε and any forecasting algorithm that can ignorantly pass the test with probability 1 − ε can be used to factor m.

PROOF: Consider any distribution of nature P and the forecaster F^P. Let pm be the probability that P assigns a sequence whose prefix encodes m. Let qm be the probability that a sequence encodes the factorization of m conditional on its prefix encoding m. The factoring test will fail F^P if for some m a sequence s ∈ S∗ with prefix m is realized and qm < ε. The probability of this happening is

∑m:qm<ε pm qm < ∑m pm ε = ε.
Hence, the factoring test will pass the truth with high probability. Now, fix an m and consider the distribution that puts its full weight on the sequence that encodes the unique factorization of m. If the forecaster passes the
test for this distribution, the conditional probability that the suffix of the sequence reveals the prime factors of m is at least ε. Use the forecasted probabilities of the suffix as a distribution to generate a sequence. With probability at least ε, we generate the factors of m. If we repeat the process O(1/ε) times, we determine the factors with high probability. Q.E.D.

4.1. Efficiently Sampleable Distributions

Theorem 3 makes use of the fact that the test must be passed for all distributions, and the distribution given in the proof is one where nature must know how to factor. We can easily modify the proof to nicer distributions, particularly polynomial-time sampleable distributions. A distribution μ is polynomial-time sampleable (see Ben-David, Chor, Goldreich, and Luby (1992)) if there is a polynomial-time computable function h and a polynomial q such that h maps strings of length q(n) to strings of length n, and μ(x) = Prr(h(r) = x), where r is chosen uniformly from strings of length q(|x|). Specifically, we sample the distribution μ by randomly choosing two large primes p and q and outputting the sequence corresponding to the tuple (pq, p, q). Using exactly the same test as in Theorem 3, we get the following result.

THEOREM 4: For any ε > 0, for sufficiently large n, if we use a distribution that creates factors of length n, the factoring test passes the truth with probability at least 1 − ε, but any forecast that passes the test with the distribution μ described above will be able to factor, on average, numbers generated by μ projected on the first coordinate (the value pq).

Most cryptographic work based on factoring assumes that factoring numbers even on average is very difficult. This proof can be generalized in a straightforward way to invert any cryptographic one-way function with unique inverses.

5. PSPACE HARDNESS

It is natural to suspect that the proof of Theorem 3 should generalize to all NP problems. The obstacle to such a generalization is the absence of unique witnesses. This means we cannot create a test that will pass the truth for all distributions of nature and still force the forecaster to put heavy weight on a witness. Witness reducing techniques as in Valiant and Vazirani (1986) do not appear to help. In this section we take a different approach to create a linear-time test that can be ignorantly passed by a forecaster only if he/she can solve NP-hard and even PSPACE-hard problems, by using the theory of interactive proof systems. Once again we assume deterministic forecasters, though the results extend to probabilistic forecasters as well. In complexity theory the class PSPACE is the set of decision problems that can be solved by a deterministic or nondeterministic Turing machine using a
polynomial amount of memory and unlimited time. They contain the more well known class of NP problems. A logical characterization of PSPACE is that it is the set of problems expressible in second-order logic. A major result of complexity theory is that PSPACE can be characterized as all the languages recognizable by a particular interactive proof system. The hardest problems within the class PSPACE are called PSPACE-complete. An interactive proof system is an abstraction that models computation as an exchange of messages between a prover (P) and a verifier (V). P would like to convince V that a given x belongs to a given set L.7 P has unlimited computational resources, while V's computational resources are bounded. Messages are passed, in rounds, between P and V until V reaches a conclusion about the correctness of the statement x ∈ L. The parties take turns sending a message to the other party. A strategy for P (or V) is a function that specifies in each round what message is to be sent as a function of the history of messages exchanged in prior rounds. A pair of strategies, sP for P and sV for V, is called a protocol. Membership in L admits an interactive proof system if there is a protocol (sP, sV) with the following two properties:
Completeness: If x ∈ L, and both P and V follow the protocol (sP, sV), then V will be convinced that x ∈ L.
Soundness: Suppose x ∉ L. If V follows sV but P is allowed to deviate from sP, V can be convinced that x ∈ L with some small probability.
Shamir (1992), building on the work of Lund, Fortnow, Karloff, and Nisan (1992), exhibited an interactive proof system for any PSPACE language L. The proof system has the following properties for an input x of length n.
• Let F be a field GF(q) for a prime q exponential in n.
• The protocol alternates between the prover (P) giving a polynomial of degree m over F and the verifier (V) choosing a random element of F. Both m and the number of rounds r of the protocol are bounded by a polynomial in n.
• V takes all of the messages and decides whether to accept or reject with a deterministic polynomial-time algorithm.
• In each round for V there are at most m bad choices from the possible q elements of F that depend only on the previous messages. The other choices we call good. It can be computationally difficult to decide whether a choice is good or bad, but since m ≪ q the verifier will pick good choices with very high probability.
• If x ∉ L, then for any strategy of P, V will reject if all of its choices are good.
• If x ∈ L, there is a strategy for P that will cause V to accept always. Moreover, if all of V's choices are good, the messages of P that cause V to accept are unique.
More formally, that a given string x belongs to a certain language L.
Fix a PSPACE-complete language L. Consider a sequence of outcomes interpreted as the tuple (x, ρ1, v1, ..., ρr, vr), where the ρi's are P's messages and the vi's are V's messages in an interactive proof system for showing x ∈ L. We call the tuple accepting if the verifier would accept if the protocol played out with these messages. Let pi be the probabilities that the realized forecasts assign to each ρi and let bi be the probabilities for each vi. Given a sequence interpreted as such a tuple and the realized forecasts, our test will declare FAIL if all of the following occur:
(i) (x, ρ1, v1, ..., ρr, vr) is an accepting tuple.
(ii) Each bi is at most 1/√q.
(iii) The product of the pi's is at most 1/n.
Otherwise the test declares PASS. The test runs in polynomial time and we can pad the sequence so that the test can run in linear time. Call this the PSPACE test.

THEOREM 5: For any ε > 0, for sufficiently large n, the PSPACE test passes the truth on inputs of length n with probability at least 1 − ε. A forecaster who can ignorantly pass the PSPACE test with probability at least 1 − ε can solve all inputs of PSPACE problems of length at least n.

PROOF: Consider any distribution of nature P and the associated forecaster F^P. Let the pi's and bi's be as described above for F^P. Let px be the probability that P puts on choosing a sequence starting with x. By the properties of the proof system, we have that the probability that the test fails is bounded by

∑x px(ux + wx),

where ux is the probability that a sequence beginning with x fails the test when all the elements are good and wx is the probability that a sequence beginning with x fails the test when some element is bad. The probability wx is bounded by the probability that some element is bad, which is bounded by

∑i mbi ≤ mr/√q = o(1).

If the vi's are good, then there is at most one choice of the ρi's that leads to an accepting tuple. So

ux ≤ ∏i pi ≤ 1/n = o(1).
So the probability that the test fails is

∑x px(ux + wx) = o(1) · ∑x px = o(1).
For any x in L we define the following distribution. Fix a strategy for the prover in the interactive proof system for L that guarantees acceptance for all verifier's messages. For each ρi play that strategy; for each vi pick an element from F uniformly. Output (x, ρ1, v1, ..., ρr, vr). Suppose we have a forecaster who passes the test on this distribution with high probability. The first condition of the test is always true. Fix an i. There are q possible vi, of which the forecaster can put weight greater than 1/√q on at most √q. So with probability at least 1 − r√q/q = 1 − o(1/n), all of the bi's will be less than 1/√q and the second condition of the test will be satisfied. By a similar argument, with high probability all of the vi's are good. Since the forecaster passes the test with high probability, then with high probability the product of the pi's must be at least 1/n. So if we use the forecaster as a distribution to generate the prover's responses, the verifier will accept with probability close to 1/n. If we repeat the process O(n) times, the verifier will accept at least once with high probability, whereas if x were not in L, repeating the protocol a polynomial number of times will rarely cause the verifier to accept, no matter what the prover's strategy. Thus the forecaster gives us a probabilistic algorithm for determining whether x is in L. Q.E.D.

6. CONCLUSION

Sandroni (2003) showed that any reasonable forecast tester can be passed by an ignorant forecaster. However, his proof requires the forecaster to run in exponential time. We show this exponential blowup is necessary by exhibiting efficient forecast testers that any forecaster ignorant of nature requires to solve PSPACE-complete problems, generally believed to require exponential time. We showed that for efficiently sampleable distributions of nature, an ignorant forecaster would still be required to factor random numbers. It remains open whether one can prove a similar result for NP-hard or PSPACE-hard problems. In the future we would also like to consider what happens when nature is drawn from some smaller set of simpler distributions (see, for example, Al-Najjar, Sandroni, Smorodinsky, and Weinstein (2008)). Note that this may change results in two directions: the forecaster need only succeed on these distributions of nature, but also the tester need only pass the truth on that set of distributions as well.
REFERENCES AGRAWAL, M., N. KAYAL, AND N. SAXENA (2004): “PRIMES Is in P,” Annals of Mathematics, 160, 781–793. [100] AL -NAJJAR, N., A. SANDRONI, R. SMORODINSKY, AND J. WEINSTEIN (2008): “Testing Theories With Learnable and Predictive Representations,” Manuscript, Northwestern University. [104] BABAI, L., L. FORTNOW, AND C. LUND (1991): “Nondeterministic Exponential Time Has TwoProver Interactive Protocols,” Computational Complexity, 1, 3–40. BEN-DAVID, S., B. CHOR, O. GOLDREICH, AND M. LUBY (1992): “On the Theory of Average Case Complexity,” Journal of Computer and System Sciences, 44, 193–219. [101] DAWID, A. P. (1982): “The Well Calibrated Bayesian,” Journal of the American Statistical Association, 77, 605–613. [93] (1985): “The Impossibility of Inductive Inference,” Journal of the American Statistical Association, 80, 340–341. [98] DEKEL, E., AND Y. FEINBERG (2006): “Non-Bayesian Testing of a Stochastic Prediction,” Review of Economic Studies, 73, 893–906. [94] FOSTER, D. P., AND R. V. VOHRA (1993): “Asymptotic Calibration,” Biometrika, 85, 379–390. [93] GOOD, I. J. (1952): “Rational Decisions,” Journal of the Royal Statistical Society, Series B, 14, 107–114. [94] LEHRER, E. (2001): “Any Inspection Rule Is Manipulable,” Econometrica, 69, 1333–1347. [93] LUND, C., L. FORTNOW, H. KARLOFF, AND N. NISAN (1992): “Algebraic Methods for Interactive Proof Systems,” Journal of the Association for Computing Machinery, 39, 859–868. [102] OLSZEWSKI, W., AND A. SANDRONI (2009): “A Non-Manipulable Test,” Annals of Statistics (forthcoming). [94] (2008): “Manipulability of Future-Independent Tests,” Econometrica, 76, 1437–1480. [94] SANDRONI, A. (2003): “The Reproducible Properties of Correct Forecasts,” International Journal of Game Theory, 32, 151–159. [93-95,104] SANDRONI, A., R. SMORODINSKY, AND R. VOHRA (2003): “Calibration With Many Checking Rules,” Mathematics of Operations Research, 28, 141–153. [93] SHAMIR, A. (1992): “IP = PSPACE,” Journal of the Association for Computing Machinery, 39, 869–877. [102] VALIANT, L., AND V. VAZIRANI (1986): “NP Is as Easy as Detecting Unique Solutions,” Theoretical Computer Science, 47, 85–93. [95,101] VOVK, V., AND G. SHAFER (2005): “Good Sequential Probability Forecasting Is Always Possible,” Journal of the Royal Statistical Society, Series B, 67, 747–763. [93]
Dept. of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, U.S.A. and Dept. of Managerial Economics and Decision Sciences, Kellogg Graduate School of Management, Northwestern University, Evanston, IL 60208, U.S.A. Manuscript received May, 2007; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 107–133
DECISION THEORY APPLIED TO A LINEAR PANEL DATA MODEL BY GARY CHAMBERLAIN AND MARCELO J. MOREIRA1 This paper applies some general concepts in decision theory to a linear panel data model. A simple version of the model is an autoregression with a separate intercept for each unit in the cross section, with errors that are independent and identically distributed with a normal distribution. There is a parameter of interest γ and a nuisance parameter τ, a N × K matrix, where N is the cross-section sample size. The focus is on dealing with the incidental parameters problem created by a potentially high-dimension nuisance parameter. We adopt a “fixed-effects” approach that seeks to protect against any sequence of incidental parameters. We transform τ to (δ ρ ω), where δ is a J × K matrix of coefficients from the least-squares projection of τ on a N × J matrix x of strictly exogenous variables, ρ is a K × K symmetric, positive semidefinite matrix obtained from the residual sums of squares and cross-products in the projection of τ on x, and ω is a (N − J) × K matrix whose columns are orthogonal and have unit length. The model is invariant under the actions of a group on the sample space and the parameter space, and we find a maximal invariant statistic. The distribution of the maximal invariant statistic does not depend upon ω. There is a unique invariant distribution for ω. We use this invariant distribution as a prior distribution to obtain an integrated likelihood function. It depends upon the observation only through the maximal invariant statistic. We use the maximal invariant statistic to construct a marginal likelihood function, so we can eliminate ω by integration with respect to the invariant prior distribution or by working with the marginal likelihood function. The two approaches coincide. Decision rules based on the invariant distribution for ω have a minimax property. Given a loss function that does not depend upon ω and given a prior distribution for (γ δ ρ), we show how to minimize the average—with respect to the prior distribution for (γ δ ρ)—of the maximum risk, where the maximum is with respect to ω. There is a family of prior distributions for (δ ρ) that leads to a simple closed form for the integrated likelihood function. This integrated likelihood function coincides with the likelihood function for a normal, correlated random-effects model. Under random sampling, the corresponding quasi maximum likelihood estimator is consistent for γ as N → ∞, with a standard limiting distribution. The limit results do not require normality or homoskedasticity (conditional on x) assumptions. KEYWORDS: Autoregression, fixed effects, incidental parameters, invariance, minimax, correlated random effects.
1. INTRODUCTION THIS PAPER APPLIES some general concepts in decision theory to a linear panel data model. An example of the model is an autoregression with a separate intercept for each unit in the cross section, with errors that are independent and identically distributed with a normal distribution. There is a parameter of interest γ and a nuisance parameter τ, an N × K matrix, where N is the crosssection sample size. The focus is on dealing with the incidental parameters problem created by a potentially high-dimension nuisance parameter. 1 We thank two referees and a co-editor for helpful comments. Financial support was provided by the National Science Foundation (SES-0819761).
In our general model, the observation is the realized value of an N × M matrix Y of random variables. We shall be conditioning on the value of an N × J matrix x, which is observed and has rank J. Our model specifies a conditional distribution for Y given x, as a function of the parameter of interest γ and the nuisance parameter τ:

(1)    Y | x =ᵈ xa(γ) + τb(γ) + W c(γ)

(the superscript d denotes equality in distribution), where τ is N × K, W is N × p, and J + K ≤ N, J + M ≤ N, and M ≤ p. The components of W, conditional on x, are independently and identically distributed N(0, 1), which we shall denote by

L(W) = N(0, IN ⊗ Ip).

The functions a, b, and c are given. (For a random matrix V, the notation L(V) = N(μ, Λ) indicates that the vector formed by joining the rows of V has a multivariate normal distribution with covariance matrix Λ and mean vector formed by joining the rows of the matrix μ.) All distributions throughout the paper are conditional on x. A simple version of our model arises from the reduced form of the autoregression

Yit = ψYi,t−1 + αi + Uit    (i = 1, ..., N; t = 1, ..., T̄),

where the Uit are independent and identically distributed N(0, σ²). We observe the realized value of the random variable Yit for i = 1, ..., N and t = 1, ..., T̄. We do not observe Yi0. The reduced form is

Yi1 = ψYi0 + αi + Ui1,
Yit = ψ^t Yi0 + (1 + ψ + ··· + ψ^(t−1))αi + Uit + ψUi,t−1 + ··· + ψ^(t−1)Ui1    (t = 2, ..., T̄).

Conditional on Yi0 = yi0, we can write this as

(2)    Y = τb(γ) + W c(γ),

where γ = (ψ, σ) and

(3)    Y is the N × T̄ matrix with (i, t) element Yit, τ is the N × 2 matrix with ith row (yi0, αi), and W is the N × T̄ matrix with (i, t) element Wit,
109
DECISION THEORY APPLIED TO PANEL DATA
(4)
¯
ψ ψ2 ψT ¯ 1 (1 + ψ) (1 + ψ + · · · + ψT −1 ) ⎛ 1 ψ ψT¯ −1 ⎞ ¯ ⎜ 0 1 ψT −2 ⎟ ⎜ c(γ) = σ ⎝ ⎟ ⎠ 0 0 1
b(γ) =
and the Wit are independent and identically distributed N (0 1). The observation is the realized value of Y . The parameters are γ and τ. We shall focus on inference for γ, and treat the initial conditions and individual effects in τ as nuisance parameters. We shall try to deal with the large number of incidental parameters in τ that arise when N is large. We shall adopt a “fixed-effects” approach that seeks to protect against any sequence of incidental parameters in τ. There are recent discussions of incidental parameters and panel data in Lancaster (2000, 2002) and Arellano (2003). An alternative analysis could be based on the distribution of (Yi2 YiT¯ ) conditional on the observed value Yi1 = yi1 . This can fit into our framework by removing the first column of Y and including yi1 in the ith row of x. We prefer to work with the full distribution of the observed Y to avoid possible loss of information from conditioning. Now consider a second-order autoregression with time-varying coefficients on the individual effect (a factor model) and time-varying variances for the innovations: (t = 1 T¯ )
Yit = ψ1 Yit−1 + ψ2 Yit−2 + αi ζt + Uit
where Yi0 = yi0 and Yi−1 = yi−1 are not observed, and the Uit are mutually independent with Uit ∼ N (0 σt2 ). With Y and W defined as above, we can write this as ˜ ˜ Y d(ψ) = τb(ψ ζ) + W c(σ) where
⎛1
−ψ1 1 0
−ψ2 −ψ1 1
0 0 0
⎞
⎟ ⎜0 ⎟ ⎜ ⎜ ⎟ y10 y1−1 α1 ⎜0 ⎟ ⎜ ⎟ ⎠ d(ψ) = ⎜ τ = ⎝ ⎟ ⎜ ⎟ ⎜ yN0 yN−1 αN 0 0 −ψ2 ⎟ ⎜0 ⎟ ⎝0 0 0 −ψ1 ⎠ 0 0 0 1 ⎞ ⎛ ψ1 ψ2 0 0 ˜ ˜ = diag(σ1 σT¯ ) b(ψ ζ) = ⎝ ψ2 0 0 0 ⎠ c(σ) ζ1 ζ2 ζ3 ζT¯ ⎛
⎞
We can impose a normalization such as model is
T¯ t=1
ζt2 = 1. The reduced form of the
Y = τb(γ) + W c(γ) with γ = (ψ ζ σ) and ˜ b(γ) = b(ψ ζ)d(ψ)−1
−1 ˜ c(γ) = c(σ)d(ψ)
We can include strictly exogenous variables xit , Yit = xit ξ + ψ1 Yit−1 + ψ2 Yit−2 + αi ζt + Uit where xit and ξ are L × 1 matrices, ⎞ ⎛ x11 · · · x1T¯ ⎜ ⎟ x = ⎝ ⎠ xN1 · · · xN T¯ and conditional on x, the Uit are mutually independent with Uit ∼ N (0 σt2 ). The reduced form of this model is Y = xa(γ) + τb(γ) + W c(γ) ˜ with γ = (ξ ψ ζ σ), a(ξ) = IT¯ ⊗ ξ, and −1 ˜ a(γ) = a(ξ)d(ψ)
˜ b(γ) = b(ψ ζ)d(ψ)−1
−1 ˜ c(γ) = c(σ)d(ψ)
Note that if ψ1 or ψ2 is not equal to zero, then the reduced form has a distributed lag: the conditional expectation of Yit given x depends upon xi1 xit . An alternative model has Yit = xit ξ + αi ζt + Uit where, conditional on x, the vector (Ui1 UiT¯ ) is independent and identically distributed with a multivariate normal distribution: iid
(Ui1 UiT¯ ) ∼ N (0 Λ(χ)) The function Λ is given, and specifies the variances and serial correlations of the errors Uit as a function of the parameter vector χ with fixed dimension. We can write this as ˜ ˜ ˜ Y = xa(ξ) + τb(ψ ζ) + W c(χ) 2 ˜ ˜ where c(χ) is the symmetric square root of Λ(χ): c(χ) = Λ(χ).
In our general model, the observation is the realized value of an N × M matrix Y of random variables. For example, in a vector autoregression involving the variables Y (1) Y (k) , the ith row of Y could be
(Yi1(1), ..., Yi1(k), ..., YiT̄(1), ..., YiT̄(k)),
so that M = kT¯ . We only consider linear, complete systems whose reduced forms match equation (1). See Arellano’s (2003, p. 144) discussion of incomplete systems with unspecified feedback processes. The next section shows that the model is invariant under the actions of a group. This group is isomorphic to O(N − J), the group of orthogonal matrices with N − J rows and columns. This isomorphism suggests a canonical form for the model, based on a one-to-one transformation, which simplifies the subsequent analysis. We transform τ to (δ ρ ω), where δ is a J × K matrix of coefficients from the least-squares projection of τ on x, ρ is a K × K symmetric, positive semidefinite matrix obtained from the residual sums of squares and cross-products in the projection of τ on x, and ω is an (N − J) × K matrix whose columns are orthogonal and have unit length. Only ω has a dimension that increases with N. Section 3 finds a maximal invariant statistic. The distribution of the maximal invariant statistic does not depend upon ω. Section 4 obtains the unique, invariant distribution for ω. We use this invariant distribution as a prior distribution to obtain an integrated likelihood function. It depends upon the observation only through the maximal invariant statistic. We use the maximal invariant statistic to construct a marginal likelihood function, so we can eliminate ω by integration with respect to the invariant prior distribution or by working with the marginal likelihood function. The two approaches coincide. Section 5 shows that decision rules based on the invariant distribution for ω have a minimax property. Given a loss function that does not depend upon ω and given a prior distribution for (γ δ ρ), we show how to minimize the average—with respect to the prior distribution for (γ δ ρ)—of the maximum risk, where the maximum is with respect to ω. Section 6 shows that there is a family of prior distributions for (δ ρ) that leads to a simple closed form for the integrated likelihood function. This integrated likelihood function coincides with the likelihood function for a normal, correlated random-effects model. Section 7 develops the example of a simple autoregression and relates our results to the literature. Under random sampling, the quasi maximum likelihood (ML) estimator for the correlated random-effects model is consistent for γ as N → ∞, with a standard limiting distribution. The limit results do not require normality or homoskedasticity (conditional on x) assumptions.
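Before turning to the canonical form, a small simulation sketch may help fix ideas for the simple first-order autoregression. The code below is our own illustration, with arbitrary parameter values and variable names (none of it comes from the paper): it builds b(γ) and c(γ) as in equations (2)–(4) and checks that Y = τb(γ) + W c(γ) reproduces the recursion Yit = ψYi,t−1 + αi + Uit with Uit = σWit.

```python
# Illustrative simulation (our own, with arbitrary values): the AR(1) panel
# model Y_it = psi*Y_{i,t-1} + alpha_i + U_it and its reduced form
# Y = tau b(gamma) + W c(gamma).
import numpy as np

rng = np.random.default_rng(0)
N, T, psi, sigma = 5, 4, 0.5, 1.0

y0 = rng.normal(size=N)            # unobserved initial conditions
alpha = rng.normal(size=N)         # individual effects
W = rng.normal(size=(N, T))        # standard normal innovations
tau = np.column_stack([y0, alpha])  # N x K with K = 2

# b(gamma): first row carries y_i0, second row carries alpha_i
b = np.vstack([[psi ** t for t in range(1, T + 1)],
               [sum(psi ** j for j in range(t)) for t in range(1, T + 1)]])
# c(gamma): upper-triangular moving-average weights sigma * psi^(t-s)
c = np.array([[sigma * psi ** (t - s) if s <= t else 0.0
               for t in range(T)] for s in range(T)])

Y_reduced = tau @ b + W @ c

# Direct recursion for comparison
Y_direct = np.zeros((N, T))
prev = y0
for t in range(T):
    Y_direct[:, t] = psi * prev + alpha + sigma * W[:, t]
    prev = Y_direct[:, t]

print(np.allclose(Y_reduced, Y_direct))  # True
```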
2. MODEL INVARIANCE AND CANONICAL FORM This section shows that the model is invariant under the actions of a group. This group invariance implies a maximal invariant statistic and an invariant prior distribution. Their derivation is simplified by working with a canonical form for the model, based on a one-to-one transformation. We shall briefly describe invariance in the original form of the model and then provide more detail in the canonical form, where most of the subsequent analysis takes place. Let O(N) denote the group of N × N orthogonal matrices (gg = g g = IN ). ˜ is the subgroup of O(N) that preserves x: The group G ˜ = {g˜ ∈ O(N) : gx ˜ = x} G ˜ on the sample space maps y to gy. ˜ Note that The action of G d
˜ |x = xa(γ) + (gτ)b(γ) ˜ gY + W c(γ) ˜ on the parameter space maps ˜ ) = L(W )). The action of G (because L(gW ˜ on the sample ˜ so the model is invariant under the actions of G (γ τ) to (γ gτ), space and the parameter space. ˜ is in fact isoThe canonical form follows from recognizing that the group G morphic to O(N − J). To see this, use the polar decomposition of x to obtain s x=q 0 where q ∈ O(N) and s is the unique symmetric, positive semidefinite square root of x x: s = (x x)1/2 , with ss = x x. (See Golub and Van Loan (1996, p. 149).) The J ×J matrix s is positive definite because x has full column rank J. ˜ = x is equivalent to Then gx s s ˜ (q gq) = 0 0 Let ˜ = q gq
h11 h21
h12 h22
Then we have h11 s = s
⇒
h11 = IJ
h21 s = 0
⇒
h21 = 0
DECISION THEORY APPLIED TO PANEL DATA
113
˜ ∈ O(N) implies that and so q gq IJ 0 ˜ = (5) q gq 0 g ˜ → G by where g ∈ O(N − J). Define G = O(N − J) and define the map ι : G using (5) to map g˜ to g. It is straightforward to check that this map is bijective. In addition, q (g˜ 1 g˜ 2 )q = (q g˜ 1 q)(q g˜ 2 q) IJ 0 IJ 0 IJ = = 0 g1 0 g2 0
0 g 1 g2
so that ι is a group homomorphism: ι(g˜ 1 g˜ 2 ) = g1 g2 = ι(g˜ 1 )ι(g˜ 2 ) ˜ and G are isomorphic. Since ι is a bijective homomorphism, the groups G ˜ Because of this isomorphism, the action of G on the sample space implies an action of G on the sample space: IJ 0 z1 z1 ˜ = q(q gq)q ˜ =q y =q gy z2 gz2 0 g where z = q y. This applies the orthogonal transformation q to y, multiplies the last N − J rows by the orthogonal matrix g, and then uses q to transform back. So we can simplify notation by working with the one-to-one transformation Z ≡ q Y , with d s Z|x = a(γ) + τb(γ) ˜ + W c(γ) 0 and τ˜ = q τ. Let Z = RN×M denote the sample space and partition a point z ∈ Z as z1 z= z2 where z1 is J × M and z2 is (N − J) × M. Then the action of G on the sample space is given by IJ 0 z1 z= m1 : G × Z → Z m1 (g z) = gz2 0 g
We shall abbreviate m1 (g z) = g · z. This defines a group action because for all g1 g2 ∈ G and z ∈ Z , we have e · z = z and (g1 g2 ) · z = g1 · (g2 · z), where e = IN−J is the identity element in G. Partition τ˜ as τ˜ 1 τ˜ = τ˜ 2 where τ˜ 1 is J × K and τ˜ 2 is (N − J) × K. Note that τ˜ 1 d s g · Z|x = (6) b(γ) + W c(γ) a(γ) + 0 gτ˜ 2 (because L(g · W ) = L(W )). It is convenient to define δ = s−1 τ˜ 1 and write (6) as d s 0 b(γ) + W c(γ) π(γ δ) + g · Z|x = 0 gτ˜ 2 where π(γ δ) = a(γ) + δb(γ) Note that the least-squares projection coefficient of τ on x is (x x)−1 x τ = s−1 τ˜ 1 = δ so δ captures a linear relationship between the individual effects τ and x. Let FKN−J denote the set of (N − J) × K matrices whose columns are orthogonal and have unit length:
FKN−J = d ∈ R(N−J)×K : d d = IK (FKN−J is the Stiefel manifold of ordered sets of K orthonormal vectors in RN−J ; see Bishop and Crittenden (1964, p. 137).) The matrix τ˜ 2 has polar decomposition τ˜ 2 = ωρ
ω ∈ FKN−J
ρ = (τ˜ 2 τ˜ 2 )1/2
where ρ is the unique, symmetric positive semidefinite square root of τ˜ 2 τ˜ 2 . Partition q = ( q1 q2 ), where q1 is N × J and q2 is N × (N − J). Note that x = q1 s implies that x(x x)−1 x = q1 q1 and qq = IN implies that q2 q2 = IN − x(x x)−1 x ρ2 = τ˜ 2 τ˜ 2 = τ q2 q2 τ = τ (IN − x(x x)−1 x )τ
so ρ2 is formed from the residual sums of squares and cross-products in the least-squares projection of τ on x. Then we can write the model in (1) as d s 0 Z|x = (7) π(γ δ) + ρb(γ) + W c(γ) 0 ω
L(W ) = N (0 IN ⊗ Ip ) Let θ = (β ω) denote the parameter, with β = (γ δ ρ). The parameter space is Θ = Θ1 × Θ2
with
Θ2 = FKN−J
(and Θ1 is a subset of some Euclidean space). We shall let Pθ denote the distribution of Z (conditional on x) when the parameter takes on the value θ: L(Z) = Pθ . The action of the group G on the parameter space is given by m2 : G × Θ → Θ
m2 (g θ) = m2 (g (β ω)) = (β gω)
We shall abbreviate m2 (g θ) = g · θ. This defines a group action because for all g1 g2 ∈ G and θ ∈ Θ, we have e · θ = θ and (g1 g2 ) · θ = g1 · (g2 · θ). Then d s 0 π(γ δ) + ρb(γ) + W c(γ) g · Z|x = 0 gω
L(W ) = N (0 IN ⊗ Ip ) and so
L(Z) = Pθ
implies L(g · Z) = Pg·θ
and the model is invariant under the actions of G on the sample space and the parameter space. 3. MAXIMAL INVARIANT STATISTIC A statistic S is a (measurable) function defined on the sample space. S is invariant if S(g · z) = S(z) for all g ∈ G and z ∈ Z . Let PθS denote the distribution of S(Z) when L(Z) = Pθ . If S is an invariant statistic, then for all g ∈ G and θ ∈ Θ, S PθS = L(S(Z)) = L(S(g · Z)) = Pg·θ
The orbit of a point θ ∈ Θ under the action of G is the set {g · θ : g ∈ G}. Note that for any ω1 ω2 ∈ Θ2 , there exists a g ∈ G such that gω1 = ω2 , and hence
116
G. CHAMBERLAIN AND M. J. MOREIRA
for any β ∈ Θ1 , the points (β ω) are in the same orbit for all ω ∈ Θ2 (The action of G on Θ2 , defined by m(g ω) = gω, is transitive.) So the distribution of an invariant statistic does not depend upon ω. Let T (z) = (T1 (z) T2 (z)) = (z1 z2 z2 ) Then T (g · (z1 z2 )) = T (z1 gz2 ) = (z1 z2 g gz2 ) = (z1 z2 z2 ) and so T is an invariant statistic. We shall show that T is a maximal invariant statistic: if S is an invariant statistic, then for any z z˜ ∈ Z , T (z) = T (z˜ ) implies that S(z) = S(˜z). This result is a consequence of the following proposition: PROPOSITION 1: If T (z) = t = (t1 t2 ), then there exists a gz ∈ G such that z = gz · r(t), where ⎛ ⎞ t1 r(t) = ⎝ t21/2 ⎠ ∈ Z 0 PROOF: The matrix z2 can be decomposed as 1/2 (z2 z2 ) where h ∈ O(N − J) z2 = h 0 Set gz = h. Then gz−1 · z =
IJ 0
0 gz−1
z1 z2
⎛
t1
⎞
= ⎝ t21/2 ⎠ = r(t) 0
Q.E.D.
COROLLARY: T is a maximal invariant statistic. ˜ = t, then PROOF: Suppose that S is an invariant statistic. If T (z) = T (z) Proposition 1 implies that ˜ = r(t) gz−1 · z = gz−1 ˜ ·z
with
gz gz˜ ∈ G
Hence z and z˜ are in the same orbit, g · z = z˜
for g = gz˜ gz−1 ∈ G
so S(z) = S(g · z) = S(˜z )
Q.E.D.
The orbit of a point z ∈ Z under the action of G is the set {g · z : g ∈ G}. The maximal invariant T indexes the orbits in the sample space: if T (z1 ) = T (z2 ) =
DECISION THEORY APPLIED TO PANEL DATA
117
t, then z1 and z2 are in the orbit of r(t). The set {r(T (z)) : z ∈ Z } contains one point from each orbit. It is a measurable cross section; see Eaton (1989, p. 58). In the parameter space, for any point β ∈ Θ1 , the points (β ω) are in the same orbit for all ω ∈ Θ2 , so we can fix some point ω0 ∈ Θ2 and then the set {(β ω0 ) : β ∈ Θ1 } contains one point from each orbit in the parameter space. It is a measurable cross section in the parameter space. 4. INVARIANT PRIOR DISTRIBUTION Since G is a compact group, there is a unique invariant distribution μ on G that is Haar measure normalized so that μ(G) = 1. Let U denote a random variable taking on values in G. The invariance property is that
L(U) = μ implies L(gU) = L(Ug) = μ for all g ∈ G We shall refer to the invariant distribution μ as the uniform distribution on G. This invariant distribution on G implies a unique invariant distribution λ on the compact set Θ2 = FKN−J ; see Eaton (1989, Example 2.10, p. 27). This distribution can be obtained from μ by fixing some point ω0 ∈ Θ2 and setting λ = L(Uω0 ), where L(U) = μ. The distribution λ does not depend upon the point ω0 , since if ω1 is some other point in Θ2 , with ω1 = gω0 for some g ∈ G, then
L(Uω1 ) = L(U(gω0 )) = L((Ug)ω0 ) = L(Uω0 ) = λ Let V be a random variable taking on values in Θ2 . Then the invariance property of λ is that
L(V ) = λ implies L(gV ) = L(g(Uω0 )) = L((gU)ω0 ) = L(Uω0 ) = λ for all g ∈ G. We shall refer to the invariant distribution λ as the uniform distribution on Θ2 . Define Ω(γ) = c(γ) c(γ) and assume that c(γ) has full column rank for all β = (γ δ ρ) ∈ Θ1 . Let f (z|β ω) denote the likelihood function, f (z|β ω) = (2π)−NM/2 det(Ω(γ))−N/2 1 −1 × exp − trace[Ω(γ) k(z β ω) k(z β ω)] 2
118
G. CHAMBERLAIN AND M. J. MOREIRA
where s 0 k(z β ω) = z − π(γ δ) − ρb(γ) 0 ω We can use the uniform distribution on Θ2 as a prior distribution to obtain an integrated likelihood function: f (z|β ω)λ(dω) fλ (z|β) = Θ2
The next proposition shows that this integrated likelihood function depends upon z only through the maximal invariant T (z). PROPOSITION 2: For all z ∈ Z and β ∈ Θ1 , fλ (z|β) = fλ (r(T (z))|β) PROOF: Note that for any g ∈ G, IJ 0 s 0 −1 z− k(g · z β ω) = π(γ δ) − ρb(γ) 0 gω 0 g−1 0 IJ k(z β gω) = 0 g−1 and so, for all z ∈ Z and (β ω) ∈ Θ, f (g−1 · z|β ω) = f (z|β gω) (See Eaton (1989, p. 44) for a general discussion of this point.) As in Proposition 1, z = gz · r(T (z)), so f (z|β ω)λ(dω) = f gz · r(T (z))|β ω λ(dω) Θ2
Θ2
f r(T (z))|β gz−1 ω λ(dω)
=
Θ2
f r(T (z))|β ω λ(dω)
= Θ2
Q.E.D.
We can use the maximal invariant statistic T to construct a marginal likelihood function, based on a density for the distribution of T . The next proposition uses Proposition 2 to show that this marginal likelihood function can be obtained from the integrated likelihood function. Let PβT denote the distribution of T (Z) when L(Z) = P(βω) ; the value of ω does not matter since T is
DECISION THEORY APPLIED TO PANEL DATA
119
an invariant statistic. Let ζ denote Lebesgue measure on RN × RM and let ν = ζT −1 denote the measure ν(A) = ζ(T −1 (A)) for (measurable) sets A in a Euclidean space containing T (Z ). Define f T (t|β) = fλ (r(t)|β)
for t ∈ T (Z ) β ∈ Θ1
Proposition 3 shows that f T (t|β) provides a density function for PβT : T Pβ (A) = f T (t|β)ν(dt) A
PROPOSITION 3: fλ (r(t)|β) is a density for PβT with respect to the measure ν. PROOF: For all ω ∈ Θ2 , PβT (A) = P(βω) (T −1 (A)) and so
P(βω) (T −1 (A))λ(dω)
P (A) = T β
Θ2
= Θ2
=
T −1 (A)
T −1 (A)
=
T −1 (A)
f (z|β ω)ζ(dz) λ(dω) f (z|β ω)λ(dω) ζ(dz)
Θ2
fλ r(T (z))|β ζ(dz)
fλ (r(t)|β)ζT −1 (dt)
= A
Q.E.D.
We can eliminate the parameter ω by integration with respect to the invariant prior distribution to obtain the integrated likelihood function fλ (z|β). Alternatively, we can eliminate ω by working with the marginal likelihood function f T (t|β), based on the maximal invariant statistic T . Propositions 2 and 3 show that these likelihood functions coincide: fλ (z|β) = fλ r(T (z))|β = f T (T (z)|β) Having eliminated ω, we can ask whether γ is identified in these likelihood functions. This will depend on the particular specifications for a(γ), b(γ), and
120
G. CHAMBERLAIN AND M. J. MOREIRA
c(γ), and one can examine the following moment conditions based on T (Z): E[(x x)−1 x Y ] = s−1 E(Z1 ) = π(γ δ) E(Y Y ) = E(Z1 Z1 + Z2 Z2 ) = π(γ δ) x xπ(γ δ) + b(γ) ρ2 b(γ) + Nc(γ) c(γ) 5. OPTIMALITY Using the likelihood function of an invariant statistic has the advantage of eliminating dependence on the parameter ω. The concern is that, even using the maximal invariant statistic, we are not using all of the data. This concern can be addressed in our case, since the marginal likelihood function based on T coincides with the integrated likelihood function when we use the invariant prior distribution for ω. Suppose the loss function does not depend upon ω, L : Θ1 × A → R where A is the action space. The corresponding risk function is R((β ω) d) = L(β d(z))f (z|β ω)ζ(dz) Z
where d : Z → A is in the set D of feasible decision rules; D is unrestricted except for regularity conditions. Let η be some prior distribution on Θ1 and consider the average risk with respect to the prior distribution η × λ on Θ: R((β ω) d)λ(dω)η(dβ) R∗ (η × λ d) = Θ1
Θ2
Θ1
Z
=
L(β d(z))fλ (z|β)ζ(dz)η(dβ)
So, choosing d to minimize this average risk function can be based on the integrated likelihood function. Under regularity conditions, we have the standard result that the optimal d is obtained by minimizing posterior expected loss, L(β a)fλ (z|β)η(dβ) d(z) = arg min a∈A
Θ1
= arg min a∈A
L(β a)f T (T (z)|β)η(dβ) Θ1
so we can obtain an optimal decision rule using the marginal likelihood function. This optimal decision rule is a function of the maximal invariant statistic— it depends upon z only through T (z)—but this was not imposed as a constraint
DECISION THEORY APPLIED TO PANEL DATA
121
on D in the optimization. See Eaton (1989, Chapter 6) for a general discussion of invariant decision rules. Suppose that dη×λ minimizes average risk, dη×λ = arg min R∗ (η × λ d) d∈D
and depends upon z only through T (z), dη×λ (z) = d˜η×λ (T (z)) The next proposition establishes a minimax property for this decision rule. The argument is based on Chamberlain (2007, Theorem 6.1). PROPOSITION 4: dη×λ solves the following problem, which combines the average risk and maximum risk criteria: dη×λ = arg min sup R((β ω) d) η(dβ) d∈D
Θ1 ω∈Θ2
PROOF: Let L(Z) = P(βω) . Then R((β ω) dη×λ ) = E L β d˜η×λ (T (Z)) which does not depend upon ω since T is an invariant statistic. So, we can fix ˜ a point ω0 ∈ Θ2 , define R(β dη×λ ) = R((β ω0 ) dη×λ ), and then, for all β ∈ Θ1 ˜ and ω ∈ Θ2 , we have R((β ω) dη×λ ) = R(β dη×λ ). For any d ∈ D , sup R((β ω) d) η(dβ) Θ1 ω∈Θ2
R((β ω) d)λ(dω) η(dβ)
≥ Θ1
Θ2
Θ1
Θ2
≥ = Θ1
=
R((β ω) dη×λ )λ(dω) η(dβ)
˜ R(β dη×λ )η(dβ)
sup R((β ω) dη×λ ) η(dβ)
Θ1 ω∈Θ2
Q.E.D.
The use of minimax here does not eliminate the choice of a prior distribution: the average risk criteria on the parameter space Θ1 for β requires that we specify a prior distribution η. But we can replace the choice of a prior distribution on the parameter space FKN−J for ω by the maximum risk criterion. The
122
G. CHAMBERLAIN AND M. J. MOREIRA
solution to the minimax problem calls for a particular, least favorable, distribution on FKN−J : the uniform distribution λ. This minimax treatment of the incidental parameters can be obtained using the marginal likelihood function f T (·|β) based on the maximal invariant statistic. Recall from Section 2 that δ = (x x)−1 x τ
ρ2 = τ (IN − x(x x)−1 x )τ
Define the set-valued function B(· ·) by
B(δ ρ) = τ ∈ RN×K : (x x)−1 x τ = δ τ (IN − x(x x)−1 x )τ = ρ2 The minimax result in Proposition 4 can be related to the original parametrization by replacing the sup over ω in Θ2 by the sup over τ in B(δ ρ). 6. A CLOSED FORM INTEGRATED LIKELIHOOD AND CORRELATED RANDOM EFFECTS
Our finite sample optimality result uses a prior distribution η for β, where β = (γ δ ρ). This section develops a family of prior distributions for (δ ρ) that leads to a simple, explicit form for the integrated likelihood. The basic idea is that the uniform distribution λ on FKN−J can be combined with a central Wishart distribution to obtain a multivariate normal distribution. We start with a family of prior distributions for ρ that is indexed by a parameter Φ, which is a K × K symmetric, positive semidefinite matrix. Let
L(Q) = N (0 IN−J ⊗ Φ) Let κΦ = L (Q Q)1/2 be the prior distribution for ρ with parameter Φ. Then the corresponding integrated likelihood function is fλκ (z|γ δ Φ) = fλ (z|(γ δ ρ))κΦ (dρ) =
f (z|(γ δ ρ) ω)λ(dω)κΦ (dρ)
Suppose that V is independent of Q Q, with L(V ) = λ. Then L(Q) = L V (Q Q)1/2 ; see Eaton (1989, Example 4.4, p. 61). So the distribution for τ˜ 2 = ωρ implied by λ × κΦ is N (0 IN−J ⊗ Φ). This implies that the log of the integrated likelihood
DECISION THEORY APPLIED TO PANEL DATA
123
function is log[fλκ (z|γ δ Φ)] J NM log(2π) − log det(Ω(γ)) 2 2 N −J − log det[b(γ) Φb(γ) + Ω(γ)] 2 1 − trace Ω(γ)−1 (z1 − sπ(γ δ)) (z1 − sπ(γ δ)) 2 1 − trace [b(γ) Φb(γ) + Ω(γ)]−1 z2 z2 2
=−
Fix a value for the parameter in (7): β∗ = (γ ∗ δ∗ ρ∗ ) ∈ Θ1
ω∗ ∈ Θ2
Let L(Z) = P(β∗ ω∗ ) and define l(γ δ Φ) = E log[fλκ (Z|γ δ Φ)] Note that this expectation does not depend upon the normality assumption for W in (7); only the first and second moments of Z are used, and so l(γ δ Φ) depends upon L(W ) only through its first and second moments. Evaluating E(Z1 ), E(Z1 Z1 ), and E(Z2 Z2 ) gives l(γ δ Φ) J NM log(2π) − log det(Ω(γ)) 2 2 N −J − log det[b(γ) Φb(γ) + Ω(γ)] 2 1 − trace Ω(γ)−1 (π(γ ∗ δ∗ ) − π(γ δ)) x x 2 × (π(γ ∗ δ∗ ) − π(γ δ)) + JΩ(γ)−1 Ω(γ ∗ )
=−
−
1 trace [b(γ) Φb(γ) + Ω(γ)]−1 2
× [b(γ ∗ ) ρ∗ 2 b(γ ∗ ) + (N − J)Ω(γ ∗ )] The maximum of l(γ δ Φ) is attained at γ = γ∗
δ = δ∗
Φ = ρ∗ 2 /(N − J)
124
G. CHAMBERLAIN AND M. J. MOREIRA
This result is useful for obtaining asymptotic properties of the estimator that maximizes the integrated (quasi) log-likelihood function. To make connections between a correlated random-effects model and our fixed-effects approach, it is convenient to introduce a prior distribution for δ, in addition to the prior distribution for ρ that was chosen to obtain a closed form for the integrated likelihood. The family of prior distributions for (δ ρ) is indexed by the parameter (ι Φ), where ι is a J × K matrix and Φ is a K × K symmetric, positive semidefinite matrix. Let Q1 = N (0 IN ⊗ Φ) L Q2 where Q1 is J × K and Q2 is (N − J) × K. Let κιΦ = L ι + s−1 Q1 (Q2 Q2 )1/2 be the prior distribution for (δ ρ). The distribution for ωρ implied by λ × κιΦ is N (0 IN−J ⊗ Φ) (as above) and the distribution for (sπ(γ δ) ωρ) is
N (sπ(γ ι) IJ ⊗ b(γ) Φb(γ)) × N (0 IN−J ⊗ Φ) The corresponding integrated likelihood function is ¯ fλκ (z|γ ι Φ) = fλ (z|(γ δ ρ))κιΦ (dδ dρ) Evaluating the log of this integrated likelihood function gives (8)
log[f¯λκ (z|γ ι Φ)] N NM log(2π) − log det[b(γ) Φb(γ) + Ω(γ)] 2 2 1 − trace [b(γ) Φb(γ) + Ω(γ)]−1 2 × (z1 − sπ(γ ι)) (z1 − sπ(γ ι)) + z2 z2
=−
As above, fix a value (β∗ ω∗ ) for the parameter, let L(Z) = P(β∗ ω∗ ) , and define ¯ ι Φ) = E log[f¯λκ (Z|γ ι Φ)] l(γ As before, this expectation does not depend upon the normality assumption for W in (7); only the first and second moments of Z are used. Evaluating E(Z1 ), E(Z1 Z1 ), and E(Z2 Z2 ) gives (9)
¯ ι Φ) = − NM log(2π) − N log det[b(γ) Φb(γ) + Ω(γ)] l(γ 2 2
DECISION THEORY APPLIED TO PANEL DATA
125
1 trace [b(γ) Φb(γ) + Ω(γ)]−1 2 × (π(γ ∗ δ∗ ) − π(γ ι)) x x(π(γ ∗ δ∗ ) − π(γ ι)) + b(γ ∗ ) ρ∗ 2 b(γ ∗ ) + NΩ(γ ∗ ) −
¯ ι Φ) is attained at The maximum of l(γ γ = γ∗
ι = δ∗
Φ = ρ∗ 2 /N
Consider the following correlated random-effects specification for the incidental parameters: (10)
d
τ|x = N (xι IN ⊗ Φ)
Combining this with the model in (1), the implied distribution for the observation is d Y |x = N xπ(γ ι) IN ⊗ [b(γ) Φb(γ) + Ω(γ)] (11) and the log-likelihood function is (12)
log[f re (y|γ ι Φ)] NM N log(2π) − log det[b(γ) Φb(γ) + Ω(γ)] 2 2 1 − trace [b(γ) Φb(γ) + Ω(γ)]−1 2 × (y − xπ(γ ι)) (y − xπ(γ ι))
=−
We shall refer to this as the normal, correlated random-effects model. As in Section 2, let s x=q z = q y 0 where q is an N × N orthogonal matrix; then we can write the log-likelihood function as (13)
log[f re (qz|γ ι Φ)] NM N log(2π) − log det[b(γ) Φb(γ) + Ω(γ)] 2 2 1 − trace [b(γ) Φb(γ) + Ω(γ)]−1 2
=−
126
G. CHAMBERLAIN AND M. J. MOREIRA
× (z1 − sπ(γ ι)) (z1 − sπ(γ ι)) + z2 z2 = log[f¯λκ (z|γ ι Φ)] So, the log of the normal, correlated random-effects likelihood function coincides with the log of the integrated likelihood function in equation (8). This connection with a correlated random-effects model helps to relate our results to the literature on panel data. 7. CONNECTIONS WITH THE LITERATURE A simple version of our model arises from the reduced form of the autoregression (14)
(i = 1 N; t = 1 T¯ )
Yit = ψYit−1 +αi +Uit
where the Uit are independent and identically distributed N (0 σ 2 ). We observe the realized value of the random variable Yit for i = 1 N and t = 1 T¯ . We do not observe Yi0 . This specification implies a likelihood function for {Yi1 YiT¯ }Ni=1 conditional on {yi0 αi }Ni=1 . Our framework allows for conditioning on time-varying covariates xit , but in this simple version we shall just use x = 1N , where 1N denotes an N × 1 matrix of ones. To obtain our canonical form, use an N√× N orthogonal matrix q whose first column is proportional to 1N : q = ( 1N / N q2 ) so that √ N x=q 0 Note that qq = IN implies that q2 q2 = IN − 1N 1N /N. Then our transformation of the N × T¯ matrix Y is √ N Y¯ Z = q Y = where q2 Y N N Y¯ = Yi1 /N · · · YiT¯ /N i=1
i=1
Our transformation of the parameters uses √ N τ¯ τ˜ = q τ = τ˜ 2 with τ˜ 2 = q2 τ and (15)
¯ (τ − 1N τ) ¯ τ˜ 2 τ˜ 2 = τ q2 q2 τ = (τ − 1N τ)
127
DECISION THEORY APPLIED TO PANEL DATA
⎛
⎞ N ¯ (y − y )(α − α) ¯ i0 0 i ⎜ ⎟ ⎜ ⎟ i=1 i=1 ⎜ ⎟ =⎜ N ⎟ N ⎝ ⎠ 2 (yi0 − y¯0 )(αi − α) ¯ (αi − α) ¯ N (yi0 − y¯0 )2
i=1
Then
δ = τ¯ =
N i=1
i=1
yi0 /N
N
αi /N
ρ = (τ˜ 2 τ˜ 2 )1/2
i=1
The maximal invariant statistic T (Z) = (T1 (Z) T2 (Z)) has √ T1 (Z) = N Y¯ T2 (Z) = (Y − 1N Y¯ ) (Y − 1N Y¯ ) The distribution of this statistic depends only upon (ψ σ δ ρ), which has dimension 7; the distribution does not depend upon ω, which has dimension 2N − 5. Let γ = (ψ σ). The distribution of the maximal invariant statistic has density f T (·|γ δ ρ), which is based on a normal distribution for Y¯ and an independent noncentral Wishart distribution for (Y − 1N Y¯ ) (Y − 1N Y¯ ). Our optimality result requires a prior distribution for (ψ σ δ ρ). The dimension reduction shows up in not requiring a prior distribution for ω—that is, where the minimax result is used. There is a particular family of prior distributions for (δ ρ) that connects to the literature on random-effects models. The family is indexed by the parameter (ι Φ), where ι is 1 × 2 and Φ is a 2 × 2 symmetric, positive semidefinite matrix. Let Q1 = N (0 IN ⊗ Φ) L Q2 where Q1 is 1 × 2 and Q2 is (N − 1) × 2. The prior for (δ ρ) is κιΦ = L ι + N −1/2 Q1 (Q2 Q2 )1/2 Combining f T (T (z)|γ δ ρ) with this family of prior distributions for (δ ρ) gives the integrated likelihood f¯λκ (z|γ ι Φ), as in equation (8) in Section 6. Now consider the following normal random-effects model: the specification in (14) plus (16)
iid
(yi0 αi ) ∼ N ((ι1 ι2 ) Φ)
(i = 1 N)
The likelihood function is f re (y|γ ι Φ), as in equation (12) in Section 6. Using the transformation z = q y, our result in (13) shows that f re (qz|γ ι Φ) = f¯λκ (z|γ ι Φ)
128
G. CHAMBERLAIN AND M. J. MOREIRA
A prior distribution for (ψ σ ι Φ), of dimension 7, is needed to obtain minimum average risk in finite samples. We do not have a specific recommendation for this prior. Our point is that the normal random-effects likelihood function can be obtained from the likelihood function for the maximal invariant, in which the dimension of the parameter space has already been reduced to a number (7) that does not depend upon N. In this sense, the incidental parameter problem has been dealt with in the original fixed-effects model in (14), which conditions on {yi0 αi }Ni=1 , without relying on the specification of a random-effects distribution in (16). Our paper has a finite sample perspective, but in connecting with the literature, we shall consider limits as N → ∞ in the context of our general model. In equation (9) of Section 6, we fix a “true value” θ∗ = (γ ∗ δ∗ ρ∗ ω∗ ) for the parameter in equation (7), and evaluate ¯ ι Φ) = E log[f¯λκ (Z|γ ι Φ)] l(γ with Z distributed according to Pθ∗ . This expectation does not depend upon the ¯ ι Φ) is attained normality assumption for W in (7), and the maximum of l(γ ∗ ∗ ∗2 at γ = γ , ι = δ , and Φ = ρ /N. If there is a unique maximizing value for γ, it should be feasible to go from here to a consistency result for γ ∗ (as N → ∞), without the normality assumption for W . In fact, if we add an assumption of random sampling over the cross-section dimension i, then the asymptotics are straightforward and well known. Let Y(i) , x(i) , and τ(i) denote the ith rows of Y , x, and τ. Assume that (i = 1 N) Y(i) x(i) τ(i) are independent and identically distributed from a joint distribution F , and let EF denote expectation with respect to this distribution. We shall assume that the (unconditional) second moments of (Y(i) x(i) ) correspond to the normal, correlated random-effects model, but we shall not make normality or homoskedasticity (conditional on x) assumptions in obtaining the limit distribution of the estimator. We shall refer to this semiparametric model simply as the correlated random-effects model. Assume that EF x(i) Y(i) − x(i) π(γ ∗ ι∗ ) = 0 so that x(i) π(γ ∗ ι∗ ) is the minimum mean-square-error linear predictor of Y(i) given x(i) . Define ε(i) = Y(i) − x(i) π(γ ∗ ι∗ ) and assume that EF ε(i) ε(i) = b(γ ∗ ) Φ∗ b(γ ∗ ) + Ω(γ ∗ )
DECISION THEORY APPLIED TO PANEL DATA
129
Let υ denote the column vector formed from γ, ι, and the lower triangle of Φ, and let 1 h Y(i) x(i) υ = − log det[b(γ) Φb(γ) + Ω(γ)] 2 1 − trace [b(γ) Φb(γ) + Ω(γ)]−1 2 × Y(i) − x(i) π(γ ι) Y(i) − x(i) π(γ ι) Then it is straightforward to show that max EF h Y(1) x(1) υ υ
is attained at υ∗ , which is formed from the distinct elements of (γ ∗ ι∗ Φ∗ ). The quasi ML estimator is υˆ N = arg max υ
N 1 h Y(i) x(i) υ N i=1
Standard method-of-moments arguments, as in Hansen (1982), MaCurdy (1982), and White (1982), provide regularity conditions under which υˆ N has a limiting normal distribution as N → ∞ (with J, K, and M fixed): √ d N(υˆ N − υ∗ ) → N (0 Λ∗ ) where
∂2 h Y(1) x(1) υ∗ −1 Λ = EF ∂υ ∂υ ∂h Y(1) x(1) υ∗ ∂h Y(1) x(1) υ∗ × EF ∂υ ∂υ ∂2 h Y(1) x(1) υ∗ −1 × EF ∂υ ∂υ
∗
Since f¯λκ (q y|γ ι Φ) equals f re (y|γ ι Φ), this limit distribution result applies to a quasi maximum likelihood estimator based on the integrated likelihood f¯λκ from Section 6. The quasi maximum likelihood estimator is asymptotically equivalent to a minimum distance estimator that imposes the restrictions on the second moments. An optimal minimum distance estimator uses a weight matrix based on the covariance matrix of the sample second moments. The minimum distance
130
G. CHAMBERLAIN AND M. J. MOREIRA
estimator that corresponds to quasi ML uses a weight matrix that would be optimal under normality but not in general. See Chamberlain (1984, Section 4.4) and Arellano (2003, Sections 5.4.3 and 7.4.2). Returning to the example in equation (14), we can use the reduced form from equations (2)–(4) in Section 1 to calculate moment conditions based on T (Z), conditional on {yi0 αi }Ni=1 . This gives ¯ E(Y¯ ) = N −1/2 E(Z1 ) = τb(γ) E(Y Y ) = E(Z1 Z1 + Z2 Z2 ) = b(γ) (N τ¯ τ¯ + τ˜ 2 τ˜ 2 )b(γ) + Nc(γ) c(γ) where γ = (ψ σ), τ˜ τ˜ = ρ2 is displayed in (15), and b(γ) and c(γ) are displayed in (4). These moments can be used to examine the identification of ψ and σ, treating τ¯ and τ˜ 2 τ˜ 2 as unrestricted. Now add the normal random-effects specification in (16), and calculate moment conditions without conditioning on {yi0 αi }Ni=1 : E(Y¯ ) = ιb(γ) E(Y Y ) = b(γ) (Nι ι + NΦ)b(γ) + Nc(γ) c(γ) Working with these moments leads to the same identification analysis for γ, because ι and Φ are unrestricted. Bhargava and Sargan (1983) considered maximum likelihood estimation in a model with lagged dependent variables and strictly exogenous variables. They used a normal, correlated random-effects model for the initial conditions. Their model is discussed by Arellano (2003, Sections 7.4.1 and 7.4.2). Arellano (2003, Section 7.4.3) considered a normal, correlated random-effects specification for the individual effects in the Bhargava–Sargan model. Chamberlain (1980, pp. 234–235) and Blundell and Smith (1991) considered maximum likelihood estimation, conditional on the first observation, in normal, correlated random-effects models. Alvarez and Arellano (2003, Section 3.5) obtained limiting results for inference in these models as N and T¯ tend to infinity.2 Alvarez and Arellano (2004) considered quasi maximum likelihood estimators in correlated random-effects models, with a stress on allowing for time-series heteroskedasticity. Lancaster (2002) dealt with incidental parameters by first reparametrizing so that the information matrix is block diagonal, with the common parameters in one block and the incidental parameters in the other. In his application to a nonstationary dynamic regression model (Lancaster (2002, p. 653)), the parameter space for the reparametrized incidental parameters is RN . Then he 2 Hahn and Kuersteiner (2002) considered maximum likelihood estimation in fixed-effects models, and obtained bias corrections as N and T¯ tend to infinity.
DECISION THEORY APPLIED TO PANEL DATA
131
formed an integrated likelihood function, integrating with respect to Lebesgue measure on RN . He showed that maximizing this integrated likelihood function provides a consistent estimator of the common parameters. Note that the information matrix block diagonality would be preserved by a smooth bijective transformation of the incidental parameters, so the use of Lebesgue measure does not by itself provide a unique prior measure. Our approach is similar in that it uses an integrated likelihood function. The prior measure, however, is different. Our reparametrization is motivated by the invariance of the model under the actions of the orthogonal group, and this determines a unique invariant distribution for ω on the compact space FKN−J . This distribution is least favorable in our minimax optimality result. Another difference is that the use of Lebesgue measure on RN for a prior measure does not correspond to the normal, correlated random-effects model. It amounts to specifying that the (reparametrized) individual effects have very large variances, instead of treating the individual effects as draws from a distribution whose variance is a parameter to be estimated. Sims (2000) used a likelihood perspective in his analysis of dynamic panel data models. He dealt with incidental parameters by treating the individual effects and initial conditions as draws from a bivariate normal distribution (Sims (2000, p. 454)). Our approach has a different starting point, since our model treats the individual effects and initial conditions as parameters (fixed effects). But our minimax optimality argument calls for a particular least favorable distribution for ω. We have seen that this unique distribution can be combined with a particular family of prior distributions for (δ ρ) to obtain a normal, correlated random-effects model, which corresponds to Sims’s specification. 8. CONCLUSION We started with a fixed-effects model. After reparametrizing, only the parameter ω has dimension depending on the cross-section sample size N. The model is invariant under the actions of the orthogonal group, and we obtained a maximal invariant statistic, T , whose distribution does not depend upon ω. So we can solve the incidental parameters problem by working with a marginal likelihood, based on the sampling distribution of T . This approach has a finite sample, minimax optimality. The argument is based on expressing the marginal likelihood as an integrated likelihood for a particular prior distribution for ω. The prior distribution is the unique, invariant distribution under the group action on that part of the parameter space. In addition to ω, the nuisance parameter consists of (δ ρ), whose dimension does not depend upon N. A convenient way to implement our approach is to use a particular family of prior distributions for (δ ρ), indexed by the parameter (ι Φ). This leads to an integrated likelihood function with a closed form expression. It is a function of (γ ι Φ), where γ is the original parameter of interest, which is not affected by the reparametrization. It turns out that
132
G. CHAMBERLAIN AND M. J. MOREIRA
this integrated likelihood function coincides with the likelihood function for a normal, correlated random-effects model. So our finite sample optimality arguments take us from the initial fixedeffects model to a normal, correlated random-effects model. The normal distribution for the effects is not part of our model in equations (1) and (7); the model only specifies a normal distribution for the errors. The normal distribution for the effects arises from two sources: the unique uniform distribution for ω on the compact manifold FKN−J , whose dimension depends upon N, and the convenient choice of prior distribution for (δ ρ), whose dimension does not depend upon N. The first source is motivated by our invariance and minimax arguments. The second source lacks this motivation, but since the dimension of (δ ρ) does not depend upon N, the particular choice made here may not be so important when N is large. In fact, using the integrated likelihood function as a quasi likelihood, the large N asymptotics of the quasi ML estimator are covered by standard arguments, under random sampling. These large N arguments do not require the assumption of normal errors in (1) and (7). So one way to view our finite sample results is that, starting with a fixedeffects model, they provide motivation for a normal, correlated random-effects model. At that point, robustness concerns can lead to dropping the normality assumption. Our quasi ML estimator can still provide the basis for large N inference, but it would not be (semiparametric) efficient, so one may prefer to use a different weighting scheme for the moment restrictions implied by the correlated random-effects model. This leads to standard optimal minimum distance and generalized method of moments estimators for the correlated random-effects model. REFERENCES ALVAREZ, J., AND M. ARELLANO (2003): “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71, 1121–1159. [130] (2004): “Robust Likelihood Estimation of Dynamic Panel Data Models,” Unpublished Manuscript, CEMFI. [130] ARELLANO, M. (2003): Panel Data Econometrics. Oxford, U.K.: Oxford University Press. [109, 111,130] BHARGAVA, A., AND J. D. SARGAN (1983): “Estimating Dynamic Random Effects Models From Panel Data Covering Short Time Periods,” Econometrica, 51, 1635–1659. [130] BISHOP, R., AND R. CRITTENDEN (1964): Geometry of Manifolds. New York: Academic Press. Reprinted with corrections, 2001. Providence, RI: AMS Chelsea Publishing. [114] BLUNDELL, R., AND R. SMITH (1991): “Initial Conditions and Efficient Estimation in Dynamic Panel Data Models: An Application to Company Investment Behaviour,” Annales d’Économie et de Statistique, 20/21, 109–123. [130] CHAMBERLAIN, G. (1980): “Analysis of Covariance With Qualitative Data,” The Review of Economic Studies, 47, 225–238. [130] (1984): “Panel Data,” in Handbook of Econometrics, Volume II, ed. by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, 1247–1318. [130] (2007): “Decision Theory Applied to an Instrumental Variables Model,” Econometrica, 75, 609–652. [121]
DECISION THEORY APPLIED TO PANEL DATA
133
EATON, M. (1989): Group Invariance Applications in Statistics, Regional Conference Series in Probability and Statistics, Vol. 1. Hayward, CA: Institute of Mathematical Statistics. [117,118, 121,122] GOLUB, G., AND C. VAN LOAN (1996): Matrix Computations (Third Ed.). Baltimore, MD: The Johns Hopkins University Press. [112] HAHN, J., AND G. KUERSTEINER (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model With Fixed Effects When Both n and T Are Large,” Econometrica, 70, 1639–1657. [130] HANSEN, L. P. (1982): “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50, 1029–1054. [129] LANCASTER, T. (2000): “The Incidental Parameter Problem Since 1948,” Journal of Econometrics, 95, 391–413. [109] (2002): “Orthogonal Parameters and Panel Data,” Review of Economic Studies, 69, 647–666. [109,130] MACURDY, T. (1982): “The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis,” Journal of Econometrics, 18, 83–114. [129] SIMS, C. (2000): “Using a Likelihood Perspective to Sharpen Econometric Discourse: Three Examples,” Journal of Econometrics, 95, 443–462. [131] WHITE, H. (1982): “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50, 1–25. [129]
Dept. of Economics, Harvard University, Cambridge, MA 02138, U.S.A.; [email protected] and Dept. of Economics, Harvard University, Cambridge, MA 02138, U.S.A. and FGV/EPGE, Rio de Janeiro, RJ 22250, Brazil; [email protected]. Manuscript received December, 2006; final revision received November, 2007.
Econometrica, Vol. 77, No. 1 (January, 2009), 135–175
NONPARAMETRIC IDENTIFICATION OF FINITE MIXTURE MODELS OF DYNAMIC DISCRETE CHOICES BY HIROYUKI KASAHARA AND KATSUMI SHIMOTSU1 In dynamic discrete choice analysis, controlling for unobserved heterogeneity is an important issue, and finite mixture models provide flexible ways to account for it. This paper studies nonparametric identifiability of type probabilities and type-specific component distributions in finite mixture models of dynamic discrete choices. We derive sufficient conditions for nonparametric identification for various finite mixture models of dynamic discrete choices used in applied work under different assumptions on the Markov property, stationarity, and type-invariance in the transition process. Three elements emerge as the important determinants of identification: the time-dimension of panel data, the number of values the covariates can take, and the heterogeneity of the response of different types to changes in the covariates. For example, in a simple case where the transition function is type-invariant, a time-dimension of T = 3 is sufficient for identification, provided that the number of values the covariates can take is no smaller than the number of types and that the changes in the covariates induce sufficiently heterogeneous variations in the choice probabilities across types. Identification is achieved even when state dependence is present if a model is stationary first-order Markovian and the panel has a moderate time-dimension (T ≥ 6). KEYWORDS: Dynamic discrete choice models, finite mixture, nonparametric identification, panel data, unobserved heterogeneity.
1. INTRODUCTION IN DYNAMIC DISCRETE CHOICE ANALYSIS, controlling for unobserved heterogeneity is an important issue. Finite mixture models, which are commonly used in empirical analyses, provide flexible ways to account for it. To date, however, the conditions under which finite mixture dynamic discrete choice models are nonparametrically identified are not well understood. This paper studies nonparametric identifiability of finite mixture models of dynamic discrete choices when a researcher has access to panel data. Finite mixtures have been used in numerous applications, especially in estimating dynamic models. In empirical industrial organization, Crawford and Shum (2005) used finite mixtures to control for patient-level unobserved heterogeneity in estimating a dynamic matching model of pharmaceutical demand. Gowrisankaran, Mitchell, and Moro (2005) estimated a dynamic model of voter behavior with finite mixtures. In labor economics, finite mixtures are a popular choice for controlling for unobserved person-specific effects when 1 The authors are grateful to the co-editor and three anonymous referees whose comments greatly improved the paper. The authors thank Victor Aguirregabiria, David Byrne, Seung Hyun Hong, Hidehiko Ichimura, Thierry Magnac, and the seminar participants at Hitotsubashi University, University of Tokyo, University of Toronto, New York Camp Econometrics II, and 2006 JEA Spring Meeting for helpful comments. The financial support from SSHRC is gratefully acknowledged.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6763
136
H. KASAHARA AND K. SHIMOTSU
dynamic discrete choice models are estimated (e.g., Keane and Wolpin (1997), Cameron and Heckman (1998)). Heckman and Singer (1984) used finite mixtures to approximate more general mixture models in the context of duration models with unobserved heterogeneity. In most applications of finite mixture models, the components of the mixture distribution are assumed to belong to a parametric family. The nonparametric maximum likelihood estimator (NPMLE) of Heckman and Singer (1984) treats the distribution of unobservables nonparametrically but assumes parametric component distributions. Most existing theoretical work on identification of finite mixture models either treats component distributions parametrically or uses training data that are from known component distributions (e.g., Titterington, Smith, and Makov (1985), Rao (1992)). As Hall and Zhou (2003) stated, “very little is known of the potential for consistent nonparametric inference in mixtures without training data.” This paper studies nonparametric identifiability of type probabilities and type-specific component distributions in finite mixture dynamic discrete choice models. Specifically, we assess the identifiability of type probabilities and typespecific component distributions when no parametric assumption is imposed on them. Our point of departure is the work of Hall and Zhou (2003), who proved nonparametric identifiability of two-type mixture models with independent marginals: (1)
F(y) = π
T
Ft1 (yt ) + (1 − π)
t=1
T
Ft2 (yt )
t=1
where F(y) is the distribution function of a T -dimensional variable Y , and j Ft (yt ) is the distribution function of the tth element of Y conditional on type j. Hall and Zhou showed that the type probability π and the type-specific compoj nents Ft are nonparametrically identifiable from F(y) and its marginals when T ≥ 3, while they are not when T = 2. The intuition behind their result is as follows. Integrating out different elements of y from (1) gives lower-dimensional submodels, (2)
F yi1 yi2 yil = π
l s=1
l F yis + (1 − π) Fi2s yis 1 is
s=1
where 1 ≤ l ≤ T , 1 ≤ i1 < · · · < il ≤ T , and F(yi1 yi2 yil ) is the l-variate marginal distribution of F(y). Each lower-dimensional submodel implies a difj ferent restriction on the unknown elements, that is, π and the Ft ’s. F and T its marginals imply 2 − 1 restrictions, while there are 2T + 1 unknown elements. When T = 3, the number of restrictions is the same as the number of unknowns, and one can solve these restrictions to uniquely determine π and j the Ft ’s.
MODELS OF DYNAMIC DISCRETE CHOICES
137
While Hall and Zhou’s analysis provides the insight that lower-dimensional submodels (2) provide important restrictions for identification, it has limited applicability to the finite mixture models of dynamic discrete choices in economic applications. First, it is difficult to generalize their analysis to three or more types.2 Second, their model (1) does not have any covariates, while most empirical models in economics involve covariates. Third, the assumption that elements of y are independent in (1) is not realistic in dynamic discrete choice models. This paper provides sufficient conditions for nonparametric identification for various finite mixture models of dynamic discrete choices used in applied work. Three elements emerge as the important determinants of identification: the time-dimension of panel data, the number of the values the covariates can take, and the heterogeneity of the response of different types to changes in the covariates. For example, in a simple case where the transition function is type-invariant, a time-dimension of T = 3 is sufficient for identification, provided that the number of values the covariates can take is no smaller than the number of types and that the changes in the covariates induce sufficiently heterogeneous variations in the choice probabilities across types. The key insight is that, in models with covariates, different sequences of covariates imply different identifying restrictions in the lower-dimensional submodels; in fact, if d is the number of support points of the covariates and T is the time-dimension, then the number of restrictions becomes on the order of d T . As a result, the presence of covariates provides a powerful source of identification in panel data even with a moderate time-dimension T . We study a variety of finite mixture dynamic discrete choice models under different assumptions on the Markov property, stationarity, and typeinvariance in the transition process. Under a type-invariant transition function and conditional independence, we analyze the nonstationary case that conditional choice probabilities change over time because time-specific aggregate shocks are present or agents are finitely lived. We also examine the case where state dependence is present (for instance, when the lagged choice affects the current choice and/or the transition function of state variables is different across types), and show that identification is possible when a model is stationary first-order Markovian and the panel has a moderate time-dimension T ≥ 6. This result is important since distinguishing unobserved heterogeneity and state dependence often motivates the use of finite mixture models in empirical studies. On the other hand, our approach has a limitation in that it does not simultaneously allow for both state dependence and nonstationarity. 2
When the number of types, M, is more than three, Hall, Neeman, Pakyari, and Elmore (2005) showed that for any number of types, M, there exists TM such that type probabilities and typespecific component distributions are nonparametrically identifiable when T ≥ TM , and that TM is no larger than (1 + o(1))6M ln(M) as M increases. However, such a TM is too large for typical panel data sets.
138
H. KASAHARA AND K. SHIMOTSU
We also study nonparametric identifiability of the number of types, M. Under the assumptions on the Markov property, stationarity, and type-invariance used in this paper, we show that the lower bound of M is identifiable and, furthermore, M itself is identified if the changes in covariates provide sufficient variation in the choice probabilities across types. Nonparametric identification and estimation of finite mixture dynamic discrete choice models are relevant and useful in practical applications for, at least, the following reasons. First, choosing a parametric family for the component distributions is often difficult because of a lack of guidance from economic theory; nonparametric estimation provides a flexible way to reveal the structure hidden in the data. Furthermore, even when theory offers guidance, comparing parametric and nonparametric estimates allows us to examine the validity of the restrictions imposed by the underlying theoretical model. Second, analyzing nonparametric identification helps us understand the identification of parametric or semiparametric finite mixture models of dynamic discrete choices. Understanding identification is not a simple task for finite mixture models even with parametric component distributions, and formal identification analysis is rarely provided in empirical applications. Once type probabilities and component distributions are nonparametrically identified, the identification analysis of parametric finite mixture models often becomes transparent as it is reduced to the analysis of models without unobserved heterogeneity. As we demonstrate through examples, our nonparametric identification results can be applied to check the identifiability of some parametric finite mixture models. Third, the identification results of this paper will open the door to applying semiparametric estimators for structural dynamic models to models with unobserved heterogeneity. Recently, by building on the seminal work by Hotz and Miller (1993), computationally attractive semiparametric estimators for structural dynamic models have been developed (Aguirregabiria and Mira (2002), Kasahara and Shimotsu (2008a)), and a number of papers in empirical industrial organization have proposed two-/multistep estimators for dynamic games (e.g., Bajari, Benkard, and Levin (2007), Pakes, Ostrovsky, and Berry (2007), Pesendorfer and Schmidt-Dengler (2008), Bajari and Hong (2006), and Aguirregabiria and Mira (2007)). To date, however, few of these semiparametric estimators have been extended to accommodate unobserved heterogeneity. This is because these estimators often require an initial nonparametric consistent estimate of type-specific component distributions, but it has not been known whether one can obtain a consistent nonparametric estimate in finite mixture models.3 The identification results of this paper provide an apparatus 3 It is believed that it is not possible to obtain a consistent estimate of choice probabilities. For instance, Aguirregabiria and Mira (2007) proposed a pseudo maximum likelihood estimation algorithm for models with unobserved heterogeneity, but stated that (p. 15) “for [models with unobservable market characteristics] it is not possible to obtain consistent nonparametric
MODELS OF DYNAMIC DISCRETE CHOICES
139
that enables researchers to apply these semiparametric estimators to the models with unobserved heterogeneity. This is important since it is often crucial to control for unobserved heterogeneity in dynamic models (see Aguirregabiria and Mira (2007)). In a closely related paper, Kitamura (2004) examined nonparametric identifiability of finite mixture models with covariates. Our paper shares his insight that the variation in covariates may provide a source of identification; however, the setting as well as the issues we consider are different from Kitamura’s. We study discrete choice models in a dynamic setting with panel data, while Kitamura considered regression models with continuous dependent variables with cross-sectional data. We address various issues specific to dynamic discrete choice models, including identification in the presence of state dependence and type-dependent transition probabilities for endogenous explanatory variables. Our work provides yet another angle for analysis that relates current and previous work on dynamic discrete choice models. Honoré and Tamer (2006) studied identification of dynamic discrete choice models, including the initial conditions problem, and suggested methods to calculate the identified sets. Rust (1994), Magnac and Thesmar (2002), and Aguirregabiria (2006) studied the identification of structural dynamic discrete choice models. Our analysis is also related to an extensive literature on identification of duration models (e.g., Elbers and Ridder (1982), Heckman and Singer (1984), Ridder (1990), and Van den Berg (2001)). The rest of the paper is organized as follows. Section 2 discusses our approach to identification and provides the identification results using a simple “baseline” model. Section 3 extends the identification analysis of Section 2, and studies a variety of finite mixture dynamic discrete choice models. Section 4 concludes. The proofs are collected in the Appendix. 2. NONPARAMETRIC IDENTIFICATION OF FINITE MIXTURE MODELS OF DYNAMIC DISCRETE CHOICES
Every period, each individual makes a choice at from the discrete and finite set A, conditioning on (xt xt−1 at−1 ) ∈ X × X × A, where xt is observable individual characteristics that may change over time and the lagged choice at−1 is included as one of the conditioning variables. Each individual belongs to one of M types, and his/her type attribute is unknown. The probability of belonging to type m is π m , where the π m ’s are positive and sum to 1. Throughout this paper, we impose a first-order Markov property on the conditional choice probability of at and denote type m’s conditional choice probability by P m (at |xt xt−1 at−1 ). The initial distribution of (x1 a1 ) and the transition probability function of xt are also different across types. For each type estimates of [choice probabilities].” Furthermore, Geweke and Keane (2001, p. 3490) wrote that “the [Hotz and Miller] methods cannot accommodate unobserved state variables.”
140
H. KASAHARA AND K. SHIMOTSU
m, we denote them by p∗m (x1 a1 ) and ftm (xt |{xτ aτ }t−1 τ=1 ), respectively. With a slight abuse of notation, we let p∗m (x1 a1 ) and ftm (xt |{xτ aτ }t−1 τ=1 ) denote the density of the continuously distributed elements of xt and the probability mass function of the discretely distributed elements of xt . Suppose we have a panel data set with time-dimension equal to T . Each individual observation, wi = {ait xit }Tt=1 , is drawn randomly from an M-term mixture distribution, (3)
P({at xt }Tt=1 ) =
M
π m p∗m (x1 a1 )
m=1
=
M
T
m t−1 ftm (xt |{xτ aτ }t−1 τ=1 )Pt (at |xt {xτ aτ }τ=1 )
t=2
π m p∗m (x1 a1 )
m=1
T
m ftm (xt |{xτ aτ }t−1 τ=1 )Pt (at |xt xt−1 at−1 )
t=2
where the first equality presents a general mixture model, while the second equality imposes the Markovian assumption on the conditional choice probam bilities, Ptm (at |xt {xτ aτ }t−1 τ=1 ) = Pt (at |xt xt−1 at−1 ). This is the key identifying assumption of this paper. The left-hand side of (3) is the distribution function of the observable data, while the right-hand side of the second equality contains the objects we would like the data to inform us about. REMARK 1: In models where at and xt follow a stationary first-order Markov process, it is sometimes assumed that the choice of the distribution of the initial observation, p∗m (x1 a1 ), is the stationary distribution that satisfies the fixed point constraint p∗m (x1 a1 ) = (4) P m (a1 |x1 x a )f m (x1 |x a )p∗m (x a ) x ∈X a ∈A
when all the components of x have finite support. When x is continuously distributed, we replace the summation over x with integration. Our identification result does not rely on the stationarity assumption of the initial conditions. The model (3) includes the following examples as special cases. EXAMPLE 1—Dynamic Discrete Choice Model With Heterogeneous Coefficients: Denote a parameter vector specific to type m’s individual by θm = (βm ρm ) . Consider a dynamic binary choice model for individual i who belongs to type m: (5)
m P m (ait = 1|xit {xiτ aiτ }t−1 τ=1 ) = P (ait = 1|xit ait−1 )
= Φ(xit βm + ρm ait−1 )
141
MODELS OF DYNAMIC DISCRETE CHOICES
where the first equality imposes the Markovian assumption and the second follows from the parametric restriction with Φ(·) denoting the standard normal cumulative distribution function (c.d.f.). The distribution of xit conditional on (xit−1 ait−1 ) is specific to the value of θm . Since the evolution of (xit ait ) in the presample period is not independent of random coefficient θm , the initial distribution of (xi1 ai1 ) depends on the value of θm (cf. Heckman (1981)). Browning and Carro (2007) estimated a continuous mixture version of (5) for the purchase of milk using a Danish consumer “long” panel (T ≥ 100), and provided evidence for heterogeneity in coefficients. Their study illustrates that allowing for such heterogeneity can make a significant difference for outcomes of interest such as the marginal dynamic effect. In practice however, researchers quite often only have access to a short panel. The results of this paper are therefore useful to understand the extent to which unobserved heterogeneity in coefficients is identified in such a situation. Our identification results are not applicable, however, to a parametric dynamic discrete choice model with serially correlated idiosyncratic shocks; for example, ait = 1(xit βm + ρm ait−1 + εit ), where εit is serially correlated. EXAMPLE 2—Structural Dynamic Discrete Choice Models: ∞ Type m’s agent maximizes the expected discounted sum of utilities, E[ j=0 βj {u(xt+j at+j ; θm ) + εt+j (at+j )}|at xt ; θm ], where xt is an observable state variable and εt (at ) is a state variable that are known to the agent but not to the researcher. The Bellman equation for this dynamic optimization problem is m m V (x) = max u(x a; θ ) + ε(a) + β (6) V (x )f (x |x a; θ ) a∈A
x ∈X
× g(dε|x) where g(ε|x) is the joint distribution of ε = {ε(j) : j ∈ A} and f (x |x a; θm ) is a type-specific transition function. The conditional choice probability is
Pθm (a|x) = 1 a = arg max u(x j; θm ) + ε(j) (7) j∈A
+β
Vθm (x )f (x |x j; θ ) m
x ∈X
× g(dε|x) where Vθm is the fixed point of (6). Let Ptm (at |xt xt−1 at−1 ) = Pθm (at |xt ) m and ftm (xt |{xτ aτ }t−1 τ=1 ) = f (xt |xt−1 at−1 ; θ ) in (3). The initial distribution of (x1 a1 ) is given by the stationary distribution (4). Then the likelihood function for {at xt }Tt=1 is given by (3) with (4).
142
H. KASAHARA AND K. SHIMOTSU
We study the nonparametric identifiability of the type probabilities, the initial distribution, the type-specific conditional choice probabilities, and the type-specific transition function in equation (3), which we denote by θ = {π m p∗m (·) {Ptm (·|·) ftm (·|·)}Tt=2 }M m=1 . Following the standard definition of nonparametric identifiability, θ is said to be nonparametrically identified (or identifiable) if it is uniquely determined by the distribution function P({at xt }Tt=1 ), without making any parametric assumption about the elements of θ. Because the order of the component distributions can be changed, θ is identified only up to a permutation of the components. If no two of the π’s are identical, we may uniquely determine the components by assuming π 1 < π 2 < · · · < π M . 2.1. Our Approach and Identification of the Baseline Model The finite mixture models studied by Hall and Zhou (2003) have no covariates as discussed in the Introduction. In this subsection, we show that the presence of covariates in our model creates a powerful source of identification. First, we impose the following simplifying assumptions on the general model (3) and analyze the nonparametric identifiability of the resulting “baseline model.” Analyzing the baseline model helps elucidate the basic idea of our approach and clarifies the logic behind our main results. In the subsequent sections, we relax Assumption 1 in various ways and study how it affects the identifiability of the resulting models. ASSUMPTION 1: (a) The choice probability of at does not depend on time. (b) The choice probability of at is independent of the lagged variable (xt−1 at−1 ) t−1 t conditional on xt . (c) ftm (xt |{xτ aτ }t−1 τ=1 ) > 0 for all (xt {xτ aτ }τ=1 ) ∈ X × t−1 A and for all m. (d) The transition function is common across types; t−1 ftm (xt |{xτ aτ }t−1 τ=1 ) = ft (xt |{xτ aτ }τ=1 ) for all m. (e) The transition function is t−1 stationary; ft (xt |{xτ aτ }τ=1 ) = f (xt |xt−1 at−1 ) for all m. Under Assumptions 1(a) and (b), the choice probabilities are written as Ptm (at |xt xt−1 at−1 ) = P m (at |xt ), where at−1 is not one of the elements of xt . Under Assumption 1(b), the lagged variable (xt−1 at−1 ) affects the current choice at only through its effect on xt via ftm (xt |{xτ aτ }t−1 τ=1 ). Assumption 1(c) implies that, starting from any combinations of the past state and action, any state x ∈ X is reached in the next period with positive probability. With Assumption 1 imposed, the baseline model is (8)
P({at xt }Tt=1 ) =
M m=1
π m p∗m (x1 a1 )
T
f (xt |xt−1 at−1 )P m (at |xt )
t=2
Since f (xt |xt−1 at−1 ) is nonparametrically identified directly from the observed data (cf. Rust (1987)), we may assume f (xt |xt−1 at−1 ) is known without affecting the other parts of the argument. Divide P({at xt }Tt=1 ) by the transition
143
MODELS OF DYNAMIC DISCRETE CHOICES
functions and define (9)
˜ t xt }T ) = P({a t=1
P({at xt }Tt=1 ) T f (xt |xt−1 at−1 ) t=2
=
M
π m p∗m (x1 a1 )
m=1
T
P m (at |xt )
t=2
which can be computed from the observed data. Assumption 1 guarantees that ˜ t xt }T ) is well defined for any possible sequence of {at xt }T ∈ (A×X)T . P({a t=1 t=1 Let I = {i1 il } be a subset of the time indices, so that I ⊆ {1 T }, where 1 ≤ l ≤ T and 1 ≤ i1 < · · · < il ≤ T . Integrating out different elements ˜ t xt }T ), which we call from (9) gives the l-variate marginal version of P({a t=1 lower-dimensional submodels (10)
M l
π m p∗m x1 a1 P m (ais |xis ) P˜ ais xis is ∈I = m=1
when {1} ∈ I
s=2
and (11)
M l
P˜ ais xis is ∈I = πm P m ais |xis m=1
when
{1} ∈ / I
s=1
In model (9), a powerful source of identification is provided by the difference in each type’s response patterns to the variation of the covariate (x1 xT ). The key insight is that for each different value of (x1 xT ), (10) and (11) imply different restrictions on the type probabilities and conditional choice probabilities. Let |X| denote the number of elements in X. The variation of (x1 xT ) generates different versions of (10) and (11), providing restrictions whose number is on the order of |X|T , while the number of the parameters {π m p∗m (a x) P m (a|x) : (a x) ∈ A × X}M m=1 is on the order of |X|. This identification approach is much more effective than one without covariates, in particular, when T is small.4 In what follows, we assume that the support of the state variables is discrete and known. This is assumed for the sake of clarity: our identification results are easier to understand in the context of a discrete state space, although they hold more generally. We also focus on the case where A = {0 1} to simplify notation. It is straightforward to extend our analysis to the case with a multinomial For example, when T = 3 and A = {0 1}, (10) and (11) imply at least tions while there are 3M|X| − 1 parameters. 4
|X|+2 3
different restric-
144
H. KASAHARA AND K. SHIMOTSU
choice of a, but with heavier notation. Note also that Chandra (1977) shows that a multivariate finite mixture model is identified if all the marginal models are identified. It is convenient to collect notation first. Define, for ξ ∈ X, (12)
∗m λ∗m and ξ = p ((a1 x1 ) = (1 ξ))
m λm ξ = P (a = 1|x = ξ)
Let ξj , j = 1 M − 1, be elements of X. Let k be an element of X. Define a matrix of type-specific distribution functions and type probabilities as ⎤ ⎡ 1 λ1ξ1 · · · λ1ξM−1 ⎢ ⎥ (13) L = ⎣ ⎦ (M×M) M M 1 λξ1 · · · λξM−1 ∗M Dk = diag(λ∗1 k λk )
V = diag(π 1 π M )
The elements of L, Dk , and V are parameters of the underlying mixture models to be identified. Now we collect notation for matrices of observables. Fix at = 1 for all t in ˜ t xt }3t=1 ), and define the resulting function as P({a (14)
˜ Fx∗1 x2 x3 = P({1 xt }3t=1 ) =
M
m m π m λ∗m x1 λx2 λx3
m=1
and λm where λ∗m x x are defined in (12). Next, integrate out (a1 x1 ) from 3 ˜ P({at xt }t=1 ), fix a2 = a3 = 1, and define the resulting function as (15)
˜ Fx2 x3 = P({1 xt }3t=2 ) =
M
m π m λm x2 λx3
m=1
Similarly, define the following “marginals” by integrating out other elements ˜ t xt }3t=1 ) and setting at = 1: from P({a (16)
˜ Fx∗1 x2 = P({1 xt }2t=1 ) =
M
m π m λ∗m x1 λx2
m=1
˜ Fx∗1 x3 = P({1 x1 1 x3 }) =
M m=1
˜ x1 }) = Fx∗1 = P({1
M m=1
π m λ∗m x1
m π m λ∗m x1 λx3
MODELS OF DYNAMIC DISCRETE CHOICES
˜ x2 }) = Fx2 = P({1
M
145
π m λm x2
m=1
˜ x3 }) = Fx3 = P({1
M
π m λm x3
m=1
Note that F·∗ involves (a1 x1 ) while F· does not contain (a1 x1 ). In fact, Fx∗1 x2 = Fx∗1 x3 if x2 = x3 because P m (a|x) does not depend on t, but we keep separate notation for the two because later we analyze the case where the choice probability depends on t. Evaluate Fx∗1 x2 x3 Fx2 x3 , and their marginals at x1 = k, x2 = ξ1 ξM−1 , and x3 = ξ1 ξM−1 , and arrange them into two M × M matrices: ⎡ ⎤ 1 Fξ1 ··· FξM−1 ⎢ Fξ Fξ1 ξ1 ··· Fξ1 ξM−1 ⎥ 1 ⎢ ⎥ P =⎢ (17) ⎥ ⎣ ⎦ FξM−1 FξM−1 ξ1 · · · ⎡ ∗ Fk∗ Fkξ 1 ∗ ∗ ⎢ Fkξ F kξ1 ξ1 ⎢ 1 Pk = ⎢ ⎣ ∗ ∗ FkξM−1 FkξM−1 ξ1
FξM−1 ξM−1 ··· ··· ···
⎤ ∗ Fkξ M−1 ∗ ⎥ Fkξ 1 ξM−1 ⎥ ⎥ ⎦ ∗ Fkξ M−1 ξM−1
The following proposition and corollary provide simple and intuitive sufficient conditions for identification under Assumption 1. Proposition 1 extends the idea of the proof of nonparametric identifiability of finite mixture models from Anderson (1954) and Gibson (1955) to models with covariates.5 Proposition 1 gives a sufficient condition for identification in terms of the rank of the matrix L and the type-specific choice probabilities evaluated at k. In practice, however, it may be difficult to check this rank condition because the elements of L are functions of the component distributions. Corollary 1 provides a sufficient condition in terms of the observable quantities P and Pk . The proofs are constructive. PROPOSITION 1: Suppose that Assumption 1 holds and assume T ≥ 3. Suppose further that there exist some {ξ1 ξM−1 } such that L is nonsingular and that ∗n ∗m there exists k ∈ X such that λ∗m k > 0 for all m and λk = λk for any m = n. Then 5 Anderson (1954) and Gibson (1955) analyzed nonparametric identification of finite mixture models similar to (9) but without covariates and derived a sufficient condition for nonparametric identifiability under the assumption T ≥ 2M − 1. Madansky (1960) extended their analysis to obtain a sufficient condition under the assumption 2(T −1)/2 ≥ M. When T is small, the number of identifiable types by their method is quite limited.
146
H. KASAHARA AND K. SHIMOTSU
3 3 m M ˜ {π m {λ∗m ξ λξ }ξ∈X }m=1 is uniquely determined from {P({at xt }t=1 ) : {at xt }t=1 ∈ 3 (A × X) }.
COROLLARY 1: Suppose that Assumption 1 holds, and assume T ≥ 3. Suppose further that there exist some {ξ1 ξM−1 } and k ∈ X such that P is of full rank and that all the eigenvalues of P −1 Pk take distinct values. Then 3 3 m M ˜ {π m {λ∗m ξ λξ }ξ∈X }m=1 is uniquely determined from {P({at xt }t=1 ) : {at xt }t=1 ∈ (A × X)3 }. REMARK 2: (i) The condition of Proposition 1 implies that all columns in L must be linearly independent. Since each column of L represents the conditional choice probability of different types for a given value of x, the changes in x must induce sufficiently heterogeneous variations in the conditional choice probabilities across types. In other words, the covariate must be relevant, and different types must respond to its changes differently. (ii) When λ∗m k = 0 for some m, its identification fails, because we never ∗n observe (x1 a1 ) for such type. The condition that λ∗m k = λk for some k ∈ X is satisfied if the initial distributions are different across different types. If either of these conditions is violated, then the initial distribution cannot be used as a source of identification and, as a result, the requirement on T becomes T ≥ 4 instead of T ≥ 3. (iii) One needs to find only one set of M − 1 points to construct a nonsingular L. The identification of choice probabilities at all other points in X follows without any further requirement. (iv) When X has |X| < ∞ support points, the number of identifiable types is at most |X|+1. When x is continuously distributed, we may potentially identify as many types as we wish. (v) By partitioning X into M − 1 disjoint subsets (Ξ1 Ξ2 ΞM−1 ), we may characterize a sufficient condition in terms of the conditional choice probabilities given a subset Ξj of X rather than an element ξj of X. (vi) We may check the conditions of Corollary 1 empirically by computing the sample counterpart of P and Pk for various {ξ1 ξM−1 }’s and/or for various partitions Ξj ’s. The latter procedure is especially useful when x is continuously distributed. The foundation for our identification method lies in the following relationship between the observables, P and Pk , and the parameters L, Dk , and V , which we call the factorization equations: (18)
P = L V L
Pk = L Dk V L
M Note that the (1 1)th element of P = L V L is 1 = m=1 π m . These two equations determine L Dk , and V uniquely. The first equation of (18) alone does
MODELS OF DYNAMIC DISCRETE CHOICES
147
not give a unique decomposition of P in terms of L and V , because this equation provides M(M + 1)/2 restrictions due to the symmetry of P, while there are M 2 − M + M = M 2 unknowns in L and V . Indeed, when M = 2, there are three restrictions and four unknowns, and L and V are just not identified. To shed further light on our identification method, we provide a sketch of how we constructively identify L, Dk , and V from P and Pk . Suppose P is invertible or, equivalently, L is invertible. As is apparent from equation (18), Pk is similar to P except that Pk contains an extra diagonal matrix Dk . Since P −1 Pk = L−1 Dk L, the eigenvalues of P −1 Pk identify the elements of Dk . Furthermore, multiplying both sides of P −1 Pk = L−1 Dk L by L−1 , we have (P −1 Pk )L−1 = L−1 Dk , suggesting that the columns of L−1 are identified with the eigenvectors of P −1 Pk . Finally, once L is identified, V is identified since V = (L )−1 PL−1 . By applying the above algorithm to a sample analogue of P and Pk , we may m M construct an estimator for {π m {λ∗m ξ λξ }ξ∈X }m=1 , which will have the same rate of convergence as the estimates of P and Pk . Alternatively, once identification is established, we may use various nonparametric estimation procedures, such as a series-based mixture likelihood estimator. Magnac and Thesmar (2002, Proposition 6) studied a finite mixture dynamic discrete choice model similar to our baseline model and showed that their model is not nonparametrically identified. They assumed that the transition probability is common across types and that the initial distribution is independent of the types. Hence, we may express their model in terms of our notation as (19)
P({at xt }Tt=1 ) =
M m=1
π m f (x1 )P m (a1 |x1 )
T
f (xt |xt−1 at−1 )P m (at |xt )
t=2
Setting p∗m (x1 a1 ) = f (x1 )P m (a1 |x1 ) gives our baseline model (3). Our results differ from those of Magnac and Thesmar in two ways: the length of the periods considered and the variation of xt . First, Magnac and Thesmar considered a two-period model, whereas our identifiability result requires at least three periods.6 Second, Magnac and Thesmar restricted the variation of xt by assuming that there are only two states, one of which is absorbing. In terms of our notation, this restriction is the same as assuming xt ∈ {0 1}, and xt = 1 with probability 1 if xt−1 = 1. This reduces the possible variation of the sequences of xt substantially, making identification difficult because only T + 1 different sequences of xt are observable. For example, when T = 3, the only possible sequences of (x1 x2 x3 ) are (1 1 1), (0 1 1), (0 0 1), and (0 0 0). 6 The possibility of identifying many types using the variation of |X| under T = 2 is currently under investigation. A related study on nonidentifiability of multivariate mixtures by Kasahara and Shimotsu (2008b) suggests, however, that T ≥ 3 is necessary for identification even when |X| ≥ 2.
148
H. KASAHARA AND K. SHIMOTSU
If we assume T ≥ 3 and that there are more than two nonabsorbing states, then we can apply Proposition 1 and Corollary 1 to (19). Alternately, identifying the single nonabsorbing state model is still possible if T ≥ 2M − 1 by applying Remark 3 below with x¯ = 0, although Remark 3 uses the stationarity of P m (a|x). For the sake of brevity, in the subsequent analysis we provide sufficient conditions only in terms of the rank of the matrix of the type-specific component distributions (e.g., L). In each of the following propositions, sufficient conditions in terms of the distribution function of the observed data can easily be deduced from the conditions in terms of the type-specific component distributions. The identification method of Proposition 1 uses a set of restrictions implied by the joint distribution of only (a1 x1 a2 x2 a3 x3 ). When the variation of (x1 x2 xT ) for T ≥ 5 is available, we may adopt the approach of Madansky (1960) to use the information contained in all xt ’s. Define u = (T − 1)/2, and write the functions corresponding to (14) and (15) as (20)
˜ Fx∗1 ···xT = P({1 xt }Tt=1 ) =
M
m m π m λ∗m x1 λx2 · · · λxT
m=1
=
M
m m m m π m λ∗m x1 λx2 · · · λxu+1 λxu+2 · · · λxT
m=1
and (21)
Fx2 ···xT
˜ = P({1 xt }Tt=2 ) =
M
m m m π m λm x2 · · · λxu+1 λxu+2 · · · λxT
m=1
Equations (20) and (21) have the same form as (14) and (15) if we view m m m u λm x2 · · · λxu+1 and λxu+2 · · · λxT as marginal distributions with |X| support points. Consequently, we can construct factorization equations similar to (18), in which the elements of a matrix corresponding to the matrix L are based on m m m λm x2 · · · λxu+1 and λxu+2 · · · λxT and their subsets. This extends the maximum number of identifiable types from on the order of |X| to on the order of |X|(T −1)/2 . Despite being more complex than Proposition 1, the following proposition is useful when T is large, making it possible to identify a large number of types even if |X| is small. For notational simplicity, we assume |X| is finite and X = {1 2 |X|}. PROPOSITION 2: Suppose that Assumption 1 holds. Assume T ≥ 5 is odd and define u = (T − 1)/2. Suppose X = {1 2 |X|} and define ⎤ ⎡ 1 ⎡ ⎤ λ1 · · · λ1|X| 1 ⎢ ⎥ Λ0 = ⎣ ⎦ Λ1 = ⎣ ⎦ M M 1 λ ··· λ 1
|X|
MODELS OF DYNAMIC DISCRETE CHOICES
149
For l = 2 u, define Λl to be a matrix, each column of which is formed by choosing l columns (unordered, with replacement) from the columns of Λ1 and taking their Hadamard product. There are |X|+l−1 ways to choose such columns; l |X|+l−1 thus the dimension of Λl is M × . For example, Λ2 and Λ3 take the form l ⎤ ⎡ 1 1 λ1 λ1 · · · λ11 λ1|X| λ12 λ12 · · · λ12 λ1|X| · · · λ1|X| λ1|X| ⎥ ⎢ Λ2 = ⎣ ⎦ M M M M M M M M M M λ1 λ1 · · · λ1 λ|X| λ2 λ2 · · · λ2 λ|X| · · · λ|X| λ|X| ⎡ 1 1 1 λ1 λ1 λ1 · · · λ11 λ11 λ1|X| ⎢ Λ3 = ⎣ M M M M M λM λ λ · · · λ λ 1 1 1 1 1 λ|X| λ12 λ11 λ12 · · · λ12 λ11 λ1|X| M M M M M M λ2 λ1 λ2 · · · λ2 λ1 λ|X| u |X|+l−1 ) matrix Λ as Define an M × ( l=0 l
⎤ · · · λ1|X| λ1|X| λ1|X| ⎥ ⎦ M M · · · λM |X| λ|X| λ|X|
Λ = [Λ0 Λ1 Λ2 Λu ] u Suppose (a) l=0 |X|+l−1 ≥ M, (b) we can construct a nonsingular M ×M matrix l L by setting its first column as Λ0 and choosing other M − 1 columns from the columns of Λ other than Λ0 , and (c) there exists k ∈ X such that λ∗m k > 0 for all m ∗n m ∗m m |X| M and λ∗m = λ for any m = n. Then {π {λ λ } } is uniquely determined k k j j j=1 m=1 T T T ˜ from {P({at xt }t=1 ) : {at xt }t=1 ∈ (A × X) }. REMARK 3: In a special case where there are no covariates and |X| = 1, the matrix Λ becomes ⎤ ⎡ 1 λ11 (λ11 )2 · · · (λ11 )u ⎢ ⎥ Λ = ⎣ ⎦ 1 λM 1
2 (λM 1 )
u · · · (λM 1 )
and the sufficient condition of Proposition 2 reduces to (a) T ≥ 2M − 1, n ∗n ∗m ∗m (b) λm 1 = λ1 for any m = n, and (c) λ1 > 0 and λ1 = λ1 for any m = n. Not surprisingly, the condition T ≥ 2M − 1 coincides with the sufficient condition of nonparametric identification of finite mixtures of binomial distributions (Blischke (1964)). This set of sufficient condition also applies to the case where the covariates have no time variation (x1 = · · · = xT ), such as race and/or sex. In this case, because of the stationarity, the time-series variation of at substitutes for the variation of xt .
150
H. KASAHARA AND K. SHIMOTSU
Houde and Imai (2006) studied nonparametric identification of finite mix¯ ture dynamic discrete choice models by fixing the value of the covariate x (to x, for instance) and derived a sufficient condition for T . They also considered a model with terminating state. If the conditional choice probabilities of different types are heterogeneous and the column vectors (λ1x λM x ) for x = 1 |X| are linearly independent, the rank condition of this proposition is likely to be satisfied, since the Hadamard products of these column vectors are unlikely to be linearly dependent, unless by chance. Since the construction of the matrices in Proposition 2 is rather complex, we provide a simple example with T = 5 to illustrate its connection to the L matrix in Proposition 1. EXAMPLE 3—An Example for Proposition2: Suppose that T = 5 and X = 2 {1 2}. In this case, we can identify M = l=0 1+l = 6 types. Consider a matrix l ⎡
λ11
λ12
λ11 λ11
λ11 λ12
1 λM 1
λM 2
M λM 1 λ1
M λM 1 λ2
1 ⎢
L =Λ=⎣
⎤ λ12 λ12 ⎥ ⎦ M M λ2 λ2
Then the factorization equations that correspond to (18) are given by (22)
P = (L ) V L
Pk = (L ) Dk V L
∗M where V = diag(π 1 π M ) and Dk = diag(λ∗1 k λk ), as defined in (13).
We can verify that the elements of P and Pk can be constructed from the distribution function of the observed data. For instance,
⎡
1 ⎢ F1 ⎢ ⎢F P = ⎢ 2 ⎢ F11 ⎣F 12 F22
F1 F11 F21 F111 F121 F221
F2 F12 F22 F112 F122 F222
F11 F111 F211 F1111 F1211 F2211
F12 F112 F212 F1112 F1212 F2212
⎤ F22 F122 ⎥ ⎥ F222 ⎥ ⎥ F1122 ⎥ F1222 ⎦ F2222
M M M m m m m m m m where Fi = m=1 π m λm i , Fij = m=1 π λi λj , Fijk = m=1 π λi λj λk , and M m m m Fijkl = m=1 π m λm i λj λk λl for i j k l ∈ {1 2} are identifiable from the population. Once the factorization equations (22) are constructed, we may apply the argument following Corollary 1 to determine L , V , and Dk uniquely from P and Pk .
MODELS OF DYNAMIC DISCRETE CHOICES
151
2.2. Identification of the Number of Types So far, we have assumed that the number of mixture components M is known. How to choose M is an important practical issue because economic theory usually does not provide much guidance. We now show that it is possible to nonparametrically identify the number of types from panel data with two periods.7 Assume T ≥ 2 and X = {1 |X|}. Define a (|X| + 1) × (|X| + 1) matrix which is analogous to P in (17) but uses the first two periods and all the support points of X: ⎡ 1 ∗ ⎢ F1 ∗ ⎢ P = ⎣ ∗ F|X|
F1 ∗ F11 ∗ F|X|1
··· ··· ···
F|X| ⎤ ∗ F1|X| ⎥ ⎥ ⎦ ∗ F|X||X|
M m ∗m ˜ where, as defined in (16), Fi∗ = P({(a 1 x1 ) = (1 i)}) = m=1 π λi , Fi = M m m ∗ ˜ ˜ P({(a 2 x2 ) = (1 i)}) = m=1 π λi , and Fij = P({(a1 x1 a2 x2 ) = (1 i 1 M m ∗m m ∗ j)}) = m=1 π λi λj . The matrix P contains information on how different types react differently to the changes in covariates for all possible x’s. The following proposition shows that we may nonparametrically identify the number of types from P ∗ under Assumption 1. PROPOSITION 3: Suppose that Assumption 1 holds. Assume T ≥ 2 and X = {1 |X|}. Then M ≥ rank(P ∗ ). Furthermore, in addition to Assumption 1, suppose that the two matrices L∗1 and L∗2 defined below both have rank M: ⎤ ⎡ 1 λ∗1 · · · λ∗1 1 |X| ⎢ ⎥ L∗1 = ⎣ ⎦ (M×(|X|+1)) ∗M ∗M · · · λ|X| 1 λ1 ⎤ ⎡ 1 1 λ1 · · · λ1|X| ⎢ ⎥ L∗2 = ⎣ ⎦ (M×(|X|+1))
1
λM 1
···
λM |X|
Then M = rank(P ∗ ). REMARK 4: (i) The rank condition of L∗1 and L∗2 is not empirically testable from the observed data. The rank of P ∗ , which is observable, gives the lower bound of the number of types. 7
We thank the co-editor and a referee for suggesting that we investigate this problem.
152
H. KASAHARA AND K. SHIMOTSU
(ii) Surprisingly, two periods of panel data, rather than three periods, may suffice for identifying the number of types. (iii) The rank condition on L∗1 implies that no row of L∗1 can be expressed as a linear combination of the other rows of L∗1 . The same applies to the rank condition on L∗2 . Since the mth row of L∗1 or L∗2 completely summarizes type m’s conditional choice probability within each period, this condition requires that the changes in x provide sufficient variation in the choice probabilities across types and that no type is “redundant” in one-dimensional submodels. (iv) The rank condition on L∗2 is equivalent to the rank condition on L in Proposition 1 when X = {1 |X|}. In other words, rank(L∗2 ) = M if and only if rank(L) = M. (v) We may partition X into disjoint subsets and compute P ∗ with respect to subsets of X rather than elements of X. (vi) When T ≥ 4, we use an approach similar to Proposition 2 to con may u u |X|+l−1 struct a ( l=0 |X|+l−1 × ) matrix P ∗ (similar to P in Example 3 l=0 l l but using (x1 x2 ) and (x3 x4 )) and increase the number of identifiable M to the order of |X|(T −1)/2 . 3. EXTENSIONS OF THE BASELINE MODEL In this section, we relax Assumption 1 of the baseline model in various ways to accommodate real-world applications. In the following subsections, we relax Assumption 1(a) and (e) (stationarity), Assumption 1(b) and (d) (typeinvariant transition), and Assumption 1(c) (unrestricted transition) in turn and analyze nonparametric identifiability of resulting models. In all cases, identification is achieved by constructing a version of the factorization equation similar to (18), specific to the model under consideration, and then applying an argument that follows the one presented in (18). The differences arise solely from the ways in which the factorization equations are constructed across the various models. 3.1. Time-Dependent Conditional Choice Probabilities The baseline model (8) assumes that conditional choice probabilities and the transition function do not change over periods. However, the agent’s decision rules may change over periods in some models, such as a model with timespecific aggregate shocks or a model of finitely lived individuals. In this subsection, we keep the assumption of the common transition function, but relax Assumption 1(a) and (e) to extend our analysis to models with time-dependent choice probabilities. When Assumption 1(a) and (e) (stationarity) are relaxed but Assumption 1(b) and (d) (conditional independence and type-invariant transition) are maintained, the choice probabilities and the transition function are written as
MODELS OF DYNAMIC DISCRETE CHOICES
153
t−1 Ptm (at |xt xt−1 at−1 ) = Ptm (at |xt ) and ftm (xt |{xτ aτ }t−1 τ=1 ) = ft (xt |{xτ aτ }τ=1 ), respectively, where at−1 is not an element of xt . Equation (9) then becomes
(23)
˜ t xt }T ) = P({a t=1
P({at xt }Tt=1 ) T
ft (xt |{xτ aτ }t−1 τ=1 )
t=2
=
M
π m p∗m (x1 a1 )
m=1
T
Ptm (at |xt )
t=2
The next proposition states a sufficient condition for nonparametric identification of the mixture model (23). In the baseline model (8), the sufficient condition is summarized to the invertibility of a matrix consisting of the conditional choice probabilities. In the time-dependent case, this matrix of conditional choice probabilities becomes time-dependent, and hence its invertibility needs to hold for each period. We consider the case of A = {0 1}. Define, for ξ ∈ X, ∗m λ∗m ξ = p ((a1 x1 ) = (1 ξ)) m λm tξ = Pt (at = 1|xt = ξ)
and
t = 2 T
PROPOSITION 4: Suppose that Assumptions 1(b)–(d) hold and assume T ≥ 3. For t = 2 T − 1, let ξjt , j = 1 M − 1, be elements of X and define ⎡
···
1 λ1tξt 1 ⎢ ⎢ Lt = ⎣ (M×M) M 1 λtξt
···
1
⎤ λ1tξt M−1 ⎥ ⎥ ⎦ λM tξt M−1
Suppose there exist {ξ ξ } such that Lt is nonsingular for t = 2 T and ∗n there exists k ∈ X such that λ > 0 for all m and λ∗m k = λk for any m = n. Then m T M T ˜ {π m {λ∗m ξ {λtξ }t=2 }ξ∈X }m=1 is uniquely determined from {P({at xt }t=1 ) : {at T T xt }t=1 ∈ (A × X) }. t 1
t M−1 ∗m k
When the choice probabilities are time-dependent, the factorization equations (that correspond to (18)) are also time-dependent: Pt = Lt V Lt+1
and
Ptk = Lt Dk V Lt+1
for
t = 2 T − 1
where V and Dk are defined as before. The elements of Pt and Ptk are the ˜ t xt }T ) in equation (23) with at = 1 for all t. Specifically, “marginals” of P({a t=1
154
H. KASAHARA AND K. SHIMOTSU
Pt and Ptk are defined as ⎡
···
Fξt+1 t+1
1
⎤
Fξt+1 t+1
⎥ ⎢ t ⎢ Fξt ⎥ Fξtt+1 ··· Fξtt+1 t ξt+1 t ξt+1 ⎢ 1 1 1 1 M−1 ⎥ Pt = ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ tt+1 tt+1 t Fξt Fξt ξt+1 · · · Fξt ξt+1 M−1 M−1 1 M−1 M−1 ⎡ F∗ ∗t+1 ∗t+1 Fkξ ··· Fkξ t+1 t+1 k 1 M−1 ⎢ ∗t ∗tt+1 ∗tt+1 ⎢ Fkξt Fkξ ··· Fkξ t t+1 t t+1 ⎢ 1 1 ξ1 1 ξM−1 Ptk = ⎢ ⎢ ⎣ 1
∗t Fkξ t
M−1
∗tt+1 Fkξ t
t+1 M−1 ξ1
M−1
···
∗tt+1 Fkξ t
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
t+1 M−1 ξM−1
M tt+1 t m m ˜ where, similar to (15), Fξt t = P({(a t xt ) = (1 ξ )}) = m=1 π λtξt , Fξt ξt+1 = M m ∗ t t+1 ˜ ˜ P({(a )}) = m=1 π m λm t xt at+1 xt+1 ) = (1 ξ 1 ξ tξt λt+1ξt+1 , Fk = P({(a1 M ∗t m ∗m t ˜ x1 ) = (1 k)}) = m=1 π λk , Fkξt = P({(a 1 x1 at xt ) = (1 k 1 ξ )}) = M ∗tt+1 m ∗m m t ˜ m=1 π λk λtξt , and Fkξt ξt+1 = P({(a1 x1 at xt at+1 xt+1 ) = (1 k 1 ξ 1 M m m ξt+1 )}) = m=1 π m λ∗m k λtξt λt+1ξt+1 . Since Pt and Ptk are identifiable from the data, we may construct V , Dk , Lt , and Lt+1 from Pt and Ptk for t = 2 T − 1 by applying an argument that follows the one presented in (18) to each period. The following proposition corresponds to Proposition 2 and relaxes the identification condition of Proposition 4 when T ≥ 5 by utilizing all the marginals ˜ t xt }T ). The proof is omitted because it is similar to that of Propoof P({a t=1 sition 2. The difference from Proposition 2 is (i) the conditions are stated in m terms of λm tξ instead of λξ because of time-dependence and (ii) the number of restrictions implied by the submodels, analogously defined to (10) and (11) but with time subscripts, is larger because the order of the choices becomes relevant. As a result, the condition on |X| of Proposition 5 is weaker than that of Proposition 2. PROPOSITION 5: Suppose Assumption 1(b)–(d) hold. Assume T ≥ 5 is odd and define u = (T − 1)/2. Suppose X = {1 |X|} and further define ⎤ ⎡ 1 ⎡ ⎤ λ21 · · · λ12|X| 1 ⎢ ⎥ Λ¯ 0 = ⎣ ⎦ Λ¯ 11 = ⎣ ⎦ M M 1 ··· λ λ 21
2|X|
For l = 2 u, define Λ¯ 1l to be a matrix whose elements consist of the l-variate m m product of the form λm 2j2 λ3j3 · · · λljl+1 , covering all possible l ordered combinations
MODELS OF DYNAMIC DISCRETE CHOICES
155
(with replacement) of (j2 j3 jl+1 ) from (1 |X|). For example, ⎡ 1 1 λ21 λ31 · · · λ121 λ13|X| ⎢ Λ¯ 12 = ⎣ M M M λ · · · λ λ λM 21 31 21 3|X| λ122 λ131 M M λ22 λ31
· · · λ122 λ13|X| M M · · · λ22 λ3|X|
λ12|X| λ131 M λ2|X| λM 31
⎤ · · · λ12|X| λ13|X| ⎥ ⎦ M · · · λM 2|X| λ3|X|
Similarly, for l = 1 u, define Λ¯ 2l to be a matrix whose elements consist of m m the l-variate product of the form λm u+1j2 λu+2j3 · · · λu+ljl+1 , covering all possible l ordered combinations (with replacement) of (j2 j3 jl+1 ) from (1 |X|). Let Λ¯ 1 = [Λ¯ 0 Λ¯ 11 Λ¯ 12 Λ¯ 1u ]
and
Λ¯ 2 = [Λ¯ 0 Λ¯ 21 Λ¯ 22 Λ¯ 2u ]
Define L¯ 1 to be an M × M matrix whose first column is Λ¯ 0 and whose other M − 1 columns are from the columns of Λ¯ 1 other than Λ¯ 0 . Define L¯ 2 to be an M × M matrix whose first is Λ¯ 0 and whose other columns are from Λ¯ 2 . column u Suppose (a) l=0 |X|l ≥ M, (b) L¯ 1 and L¯ 2 are nonsingular, and (c) there ∗n ∗m exists k ∈ X such that λ∗m k > 0 for all m and λk = λk for any m = n. |X| M m T T ˜ Then {π m {λ∗m j {λtj }t=2 }j=1 }m=1 is uniquely determined from {P({at xt }t=1 ) : {at T T xt }t=1 ∈ (A × X) }. 3.2. Lagged Dependent Variable and Type-Specific Transition Functions In empirical applications, including the lagged choice in explanatory variables for the current choice is a popular way to specify dynamic discrete choice models. Furthermore, we may encounter a case where the transition pattern of state variables is heterogeneous across individuals, even after controlling for other observables. In such cases, the transition function of both at and xt becomes type-dependent. In this subsection, we relax Assumption 1(b) and (d) of the baseline model (8) to accommodate type-specific transition functions as well as the dependence of current choice on lagged variables. In place of Assumption 1(b) and (d), we impose stationarity and a first-order Markov property on the transition process of xt . Assumption 2(a) and (c) are identical to Assumption 1(a) and (c). ASSUMPTION 2: (a) The choice probability of at does not depend on time. (b) xt follows a stationary first-order Markov process; ftm (xt |{xτ aτ }t−1 τ=1 ) = f m (xt |xt−1 at−1 ) for all t and m. (c) f m (x |x a) > 0 for all (x x a) ∈ X × X × A and for all m.
156
H. KASAHARA AND K. SHIMOTSU
Under Assumption 2, the model is (24)
P({at xt }Tt=1 ) =
M
π m p∗m (x1 a1 )
m=1
T
f m (xt |xt−1 at−1 )P m (at |xt xt−1 at−1 )
t=2
and the transition process of (at xt ) becomes a stationary first-order Markov process. Define st = (at xt ), q∗m (s1 ) = p∗m (x1 a1 ), and Qm (st |st−1 ) = f m (xt | xt−1 at−1 )P m (at |xt xt−1 at−1 ), and rewrite the model (24) as (25)
P({st }Tt=1 ) =
M
π m q∗m (s1 )
m=1
T
Qm (st |st−1 )
t=2
Unlike the transformed baseline model (9), st appears both in Qm (st |s t−1 ) and Qm (st+1 |s t ), and creates the dependence between these terms. Consequently, the variation of st affects P({st }Tt=1 ) via both Qm (st |s t−1 ) and Qm (st+1 |s t ). This dependence makes it difficult to construct factorization equations that correspond to (18), which is the key to obtaining identification. We solve this dependence problem by using the Markov property of st . The idea is that if st follows a first-order Markov process, looking at every other period breaks the dependence of st across periods. Specifically, consider the sequence (st−1 st st+1 ) for various values of st , while fixing the values of st−1 and st+1 . Once st−1 and st+1 are fixed, the variation of st does not affect the state variables in other periods because of the Markovian structure of Qm (st |st−1 ). As a result, we can use this variation to distinguish different types. Let s¯ ∈ S = A × X be a fixed value of s and define (26)
πs¯m = π m q∗m (¯s)
m λm s|s)Qm (s|¯s) s¯ (s) = Q (¯
m λ∗m s) s¯ (sT ) = Q (sT |¯
Assume T is even and consider P({st }Tt=1 ) with st = s¯ for odd t: T −2 M ∗m (27) πs¯m λm P({st }Tt=1 |st = s¯ for t odd) = s¯ (st ) λs¯ (sT ) m=1
t=24
This conditional mixture model shares the property of independent marginals with (9). Consequently, we can construct factorization equations similar to (18) and, hence, can identify the components of the mixture model (27) for each s¯ ∈ S. Assume T = 6. Let ξj , j = 1 M − 1, be elements of S and let k ∈ S. Define ⎡ ⎤ 1 λ1s¯ (ξ1 ) · · · λ1s¯ (ξM−1 ) ⎢ ⎥ Ls¯ = ⎣ ⎦ (M×M)
1 λM s¯ (ξ1 )
···
λM s¯ (ξM−1 )
MODELS OF DYNAMIC DISCRETE CHOICES
Vs¯ = diag(πs¯1 πs¯M )
157
∗M Dk|¯s = diag(λ∗1 s¯ (k) λs¯ (k))
Then, from (27), the factorization equations that correspond to (18) are (28)
Ps¯ = Ls¯ Vs¯ Ls¯
Ps¯k = Ls¯ Dk|¯s Vs¯ Ls¯
where the elements of Ps¯ and Ps¯k are various marginals of the left-hand side of (27) and are identifiable from the data. Then we can construct Vs¯ , Dk|¯s , and Ls¯ uniquely from Ps¯ and Ps¯k by applying the argument following Corollary 1. The following proposition establishes a sufficient condition for nonparametric identification of model (27). Because of the temporal dependence in st , the requirement on T becomes T ≥ 6 instead of T ≥ 3. PROPOSITION 6: Suppose Assumption 2 holds and assume T ≥ 6. Suppose that q∗m (¯s) > 0 for all m, there exist some {ξ1 ξM−1 } such that Ls¯ is nonsin∗m ∗n gular, and there exists k ∈ S such that λ∗m s¯ (k) > 0 for all m and λs¯ (k) = λs¯ (k) m m ∗m M for any m = n. Then {πs¯ {λs¯ (s) λs¯ (s)}s∈S }m=1 is uniquely determined from {P({st }Tt=1 ) : {st }Tt=1 ∈ S T }. REMARK 5: (i) The assumption of stationarity and a first-order Markov property is crucial. When st follows a second-order Markov process (e.g., P m (at |{xτ aτ }t−1 τ=1 ) = P m (at |xt−1 at−1 xt−2 at−2 )), the requirement on T becomes T ≥ 9 instead of T ≥ 6 because we need to look at every two other periods so as to obtain the “independent” variation of st across periods. (ii) The transition functions and conditional choice probabilities are idenm tified from λ∗m s) = λ∗m s¯ (s) as follows. Recall Q (s|¯ s¯ (s) by definition (see (26)) m m m ¯ a)P ¯ ¯ a) ¯ with (a ¯ x) ¯ = s¯ . Summing Qm (s|¯s) over (a|x x and Q (s|¯s) = f (x|x ¯ a), ¯ and we then identify the conditional choice probabilia ∈ A gives f m (x|x ¯ a) ¯ = Qm (s|¯s)/f m (x|x ¯ a). ¯ ties by P m (a|x x (iii) If |S| M and the transition pattern of s is sufficiently heterogeneous across different types, the sufficient conditions in Proposition 6 are likely to hold for all s¯ ∈ S, and we may therefore identify the primitive parameters π m , p∗m (a x), f m (x |x a), and P m (a |x x a). Specifically, repeating Proposition 6 for all s¯ ∈ S, we obtain π m q∗m (s) = π m p∗m (a x) for all (a x) ∈ A × X. Then m m π is determined by π = (ax)∈A×X π m p∗m (a x) and we identify p∗m (a x) = (π m p∗m (a x))/π m . EXAMPLE 4 —An Example for Proposition 6: Consider a case in which T = 6, A = {0 1}, and X = {0 1}. Then M = |S| + 1 = 5 types can be identified. Fix s1 = s3 = s5 = s¯ ∈ A × X. Then Ls¯ is given by ⎡ ⎤ 1 λ1s¯ (0 0) λ1s¯ (0 1) λ1s¯ (1 0) λ1s¯ (1 1) ⎢ ⎥ Ls¯ = ⎣ ⎦ (M×M)
1
λM λM λM λM s¯ (0 0) s¯ (0 1) s¯ (1 0) s¯ (1 1)
158
H. KASAHARA AND K. SHIMOTSU
For example, the (5 5)th elements of Ps¯ and Ps¯k in (28) are given by P({s1 = 5 2 s3 = s5 = s¯ s2 = s4 = (1 1)}) = m=1 πs¯m (λm s¯ (1 1)) and P({s1 = s3 = s5 = 5 2 ∗m s¯ s2 = s4 = (1 1) s6 = k}) = m=1 πs¯m (λm s¯ (1 1)) λs¯ (k), respectively. When T ≥ 8, we can relax the condition |S| ≥ M − 1 of Proposition 6 by m s), λm applying the approach of Proposition 2. Define λ∗m s¯1 (s1 ) = s¯ (s) = Q (s|¯ m Qm (¯s|s1 )Qm (s1 |¯s), and λs¯2 (s1 s2 ) = Qm (¯s|s1 )Qm (s1 |s2 )Qm (s2 |¯s), and similarly m define λm s¯l (s1 sl ) for l ≥ 3 as an (l + 1)-variate product of Q (s |s)’s of l m m m l the form Q (¯s|s1 ) · · · Q (sl−1 |sl )Q (sl |¯s) for {st }t=1 ∈ S . PROPOSITION 7: Suppose Assumption 2 holds. Assume T ≥ 8 and is even, and define u = (T − 4)/2. Suppose that S = {1 2 |S|} and further define ⎡ 1 ⎤ ⎡ ⎤ λs¯1 (1) · · · λ1s¯1 (|S|) 1 ⎢ ⎥ Λ˜ 0 = ⎣ ⎦ Λ˜ 1 = ⎣ ⎦ M M 1 λs¯1 (1) · · · λs¯1 (|S|) For l = 2 u, define Λ˜ l to be a matrix whose elements consists of λm s¯l (s1 sl ), covering all possible unordered combinations (with replacement) of (s1 sl ) from S l . For example, ⎡ 1 λs¯2 (1 1) · · · λ1s¯2 (1 |S|) ⎢ =⎣ Λ˜ 2 |S|+1 (M×( 2 )) M λM (1 1) · · · λ (1 |S|) s¯2 s¯2 ⎤ λ1s¯2 (2 2) · · · λ1s¯2 (2 |S|) · · · λ1s¯2 (|S| |S|) ⎥ ⎦ · · · λM · · · λM λM s¯2 (2 2) s¯2 (2 |S|) s¯2 (|S| |S|) u |S|+l−1 Define an M × l=0 matrix Λ˜ as Λ˜ = [Λ˜ 0 Λ˜ 1 Λ˜ 2 Λ˜ u ] and define L s¯ l to be a M × M matrix consisting of M columns from Λ˜ but with the first column unchanged. u Suppose (a) l=0 |S|+l−1 ≥ M, (b) q∗m (¯s) > 0 for all m, (c) L s¯ is nonsingular, l ∗m ∗n and (d) there exists k ∈ S such that λ∗m s¯ (k) > 0 for all m and λs¯ (k) = λs¯ (k) |S| l u ∗m M for any m = n. Then {πs¯m {λm s¯l (s1 sl ) : (s1 sl ) ∈ S }l=1 {λs¯ (s)}s=1 }m=1 is T T T uniquely determined from {P({st }t=1 ) : {st }t=1 ∈ (A × X) }. The identification of the primitive parameters π m , p∗m (a x), f m (x |x a), P (a|x) follows from Remark 5(ii) and (iii). m
EXAMPLE 5 —An Example for Proposition 7: Browning and Carro (2007, Section 4) considered a stationary first-order Markov chain model of ait ∈
MODELS OF DYNAMIC DISCRETE CHOICES
159
{0 1} without covariates and showed that their model is not nonparametrically identified when T = 3 and M = 9. In our notation, Browning and Carro’s model is written as P(a1 aT ) =
M
π m p∗m (a1 )
m=1
T
P m (at |at−1 )
t=2
Note s = a because there are no covariates. If T = 8, we can identify M = u that 2+l−1 = 6 types, provided that, for s¯ = {0 1}, p∗m (¯s) > 0 for all m, L s¯ is l=0 l nonsingular, and P m (1|¯s) = P n (1|¯s) for any m = n. Here, L s¯ is given by ⎡ ⎤ 1 λ1s¯1 (0) λ1s¯1 (1) λ1s¯2 (0 0) λ1s¯2 (0 1) λ1s¯2 (1 1) ⎢ ⎥ L s¯ = ⎣ ⎦ λM λM λM λM 1 λM s¯1 (0) s¯1 (1) s¯2 (0 0) s¯2 (0 1) s¯2 (1 1) and the elements of L 0 are given by, for example, m m λm 01 (0) = P (0|0)P (0|0)
m m m λm 02 (1 1) = P (0|1)P (1|1)P (1|0)
The factorization equations that correspond to (18) are Ps¯ = (L s¯ ) Vs¯ L s¯ and Ps¯ k = (L s¯ ) Dk|¯s Vs¯ L s¯ , where V˜s¯ and Dk|¯s are defined as before. Ps¯ and Ps¯ k are identifiable from the data, and we can construct V˜s¯ , Dk|¯s , and L s¯ from these factorization equations. Similarly, if T = 10 12 14, then the maximum number of identifiable types by Proposition 7 is 10 15 21, respectively. The following example demonstrates that nonparametric identification of component distributions may help us understand the identification of parametric finite mixture models of dynamic discrete choices. EXAMPLE 6 —Identification of Models With Heterogeneous Coefficients: Consider the model of Example 1. For an individual who belongs to type m, P m (at = 1|xt at−1 ) = Φ(xt βm + ρm at−1 ) and the initial observation, (a1 x1 ), is randomly drawn from p∗m (a1 x1 ) while the transition function of xt is given by f m (xt |xt−1 at−1 ). If the conditions in Proposition 6 including T ≥ 6, |S| ≥ M − 1, and the rank of Ls¯ are satisfied, then p∗m (a1 x1 ), f m (xt |xt−1 at−1 ), and P m (at = 1|xt at−1 ) are identified for all m. Once P m (at = 1|xt at−1 ) is identified, taking an inverse mapping gives xt βm + ρm at−1 = Φ−1 (P m (at = 1|xt at−1 )). Evaluating this at all the points in A × X gives a system of |A||X| linear equations with dim(βm ) + 1 unknown parameters (βm ρm ), and solving this system for (βm ρm ) identifies (βm ρm ). m m For instance, consider a model P m (at = 1|xt at−1 ) = Φ(βm 0 + β1 xt + ρ at−1 ) m with A = X = {0 1}. If P (at = 1|x a) is identified for all (a x) ∈ A × X, then
160
H. KASAHARA AND K. SHIMOTSU
m m the type-specific coefficient (βm 0 β1 ρ ) is identified as the unique solution to the linear system ⎤ ⎡ −1 m ⎡ ⎤ Φ (P (at = 1|0 0)) 1 0 0 ⎡ βm ⎤ 0 ⎢ 1 1 0 ⎥ ⎣ m ⎦ ⎢ Φ−1 (P m (at = 1|0 1)) ⎥ ⎣ 1 0 1 ⎦ β1 = ⎣ Φ−1 (P m (a = 1|1 0)) ⎦ t ρm Φ−1 (P m (a = 1|1 1)) 1 1 1 t
The next example shows that the degree of underidentification in structural dynamic models with unobserved heterogeneity can be reduced to that in models without unobserved heterogeneity. Furthermore, a researcher can now apply various two-step estimators for structural models developed by Hotz and Miller (1993) (and others listed in the Introduction) to models with unobserved heterogeneity since, with our identification results, one can obtain an initial nonparametric consistent estimate of type-specific component distributions. Kasahara and Shimotsu (2008a) provided an example of such an application.8 EXAMPLE 7 —Dynamic Discrete Games (Aguirregabiria and Mira (2007, Section 3.5)): Consider the model of dynamic discrete games with unobserved market characteristics studied by Aguirregabiria and Mira (2007, Section 3.5). There are N ex ante identical “global” firms competing in H local markets. There are M market types and each market’s type is common knowledge to all firms, but unknown to a researcher. In market h ∈{1 2 H}, firm i maximizes the expected discounted sum of profits ∞ E[ s=t βs−t {Πi (xhs ahs ahs−1 ; θh ) + εhis (ahis )}|xht aht aht−1 ; θh ], where xht is a state variable that is common knowledge for all firms, θh ∈ {1 2 M} is the type attribute of market h, ahs = (ah1s ahNs ) is the vector of firms’ decisions, and εhit (ahit ) is a state variable that is private information to firm i. The profit function may depend on, for example, the past entry/exit decision of firms. The researcher observes xht and aht , but neither θh nor εhit . There is no interaction across different markets. Let a−1 denote the vector of firms’ decision in the preceding period. Assume that the εi ’s are independent from x and independent and identically distributed (i.i.d.) across firms. Let σ ∗ (θh ) = {σi∗ (x a−1 εi ; θh ) : i = 1 N} denote a set of strategy functions in a stationary Markov perfect equilibrium (MPE). Then, theequilibrium conditional choice probabilities are given ∗ by Piσ (ai |x a−1 ; θh ) = 1{ai = σi∗ (x a−1 εi ; θh )}g(εi ) dεi , where g(εi ) is the density function for ε = {ε(a) : a ∈ A}. A MPE induces a transition function ∗ of x, which we denote by f σ (xt |xt−1 a−1 ; θh ). 8 Kasahara and Shimotsu (2008a) showed that in structural discrete Markov decision models with unobserved heterogeneity, it is possible to obtain an estimator that is higher-order equivalent to the maximum likelihood estimator (MLE) by iterating the nested pseudo-likelihood (NPL) algorithm of Aguirregabiria and Mira (2002) sufficiently many, but finite times.
MODELS OF DYNAMIC DISCRETE CHOICES
161
Suppose that panel data {{aht xht }Tt=1 }H h=1 are available. As in Aguirregabiria and Mira (2007), consider the case where H → ∞ with N and T fixed. The initial distribution of (a x) differs across market types and is given by p∗m (a x). N σ ∗ Let P m (aht |xht aht−1 ) = and f m (xht |xht−1 i=1 Pi (ahit |xht aht−1 ; m) σ∗ aht−1 ) = f (xht |xht−1 aht−1 ; m). Then the likelihood function for market h becomes a mixture across different unobserved market types, P({aht xht }Tt=1 ) =
M m=1
m
∗m
π p (ah1 xh1 )
T
P m (aht |xht aht−1 )f m (xht |xht−1 aht−1 )
t=2
for which Propositions 6 and 7 are applicable. EXAMPLE 8—An Empirical Example of Aguirregabiria and Mira (2007, Section 5): Aguirregabiria and Mira (2007, Section 5) considered an empirical model of entry and exit in local retail markets based on the model of Example 7. Each market is indexed by h. The firms’ profits depend on the logarithm of the market size, xht , and their current and past entry/exit decisions, aht and aht−1 . The profit of a nonactive firm is zero. When active, firm i’s profit is Πio (xht aht aht−1 ) + ωh + εhit (ahit ), where the function Πio is common across all the markets and εhit is i.i.d. across markets. The parameter ωh captures the unobserved market characteristics and has a discrete distribution with 21 points of support. The logarithm of market size follows an exogenous firstorder Markov process. Their panel covers 6 years at annual frequency (i.e., T = 6). This satisfies the requirement for T in Proposition 6. Given the nonlinear nature of the dynamic games models, the rank condition on the Ls¯ matrix in Proposition 6 is likely to be satisfied. In their specification however, the transition function for xht , fh (xht |xht−1 ), is market-specific, so that the number of types is equal to the number of markets. Consequently, Proposition 6 does not apply to this case, and the type-specific conditional choice probabilities may not be nonparametrically identified.9 If we limit the number of types for transition functions, then we may apply Proposition 6 to identify the type-specific conditional choice probabilities and the type-specific transition probabilities. For the mth type market, the joint conditional choice probabilities across all firms are P m (aht |xht aht−1 ) = 9 Aguirregabiria and Mira estimated the market-specific transition function using 14 years of data on market size from other data sources. Given the relatively long time length, the identification of market-specific transition functions may come from the time variation of each market. When the transition function is market-specific, however, the conditional choice probabilities also become market-specific, leading to the incidental parameter problem. Even if the market-specific transition function is known, nonparametrically identifying the market-specific conditional choice probabilities is not possible given a short panel.
162
H. KASAHARA AND K. SHIMOTSU
N
Pi (ahit |xht aht−1 ; θm ), where aht = (ah1t ahNt ) . The market size is discretized with 10 support points (i.e., |X| = 10) and A = {0 1}N . Consequently, the size of the state space of sht = (aht xht ) for this model is |A||X| = 2N × 10, and we may identify up to M = 2N × 10 + 1 types. For instance, their Table VI reports that 63.5 percent of markets have no less than five potential firms (i.e., N ≥ 5) in the restaurant industry. Hence, M = 25 × 10 + 1 = 321 market types can be identified for these markets.10 i=1
The following proposition extends Proposition 3 for identification of M under Assumption 2. Because of the state dependence, the required panel length becomes T = 5. We omit the proof because it is essentially the same as that of Proposition 3. PROPOSITION 8: Suppose that Assumption 2 holds. Assume T ≥ 5 and S = {1 |S|}. Fix s1 = s3 = s5 = s¯ ∈ S. Define, for s s ∈ S, Ps¯ (s) = P(s2 = s s1 = s3 = s¯) Ps¯ (s s ) = P (s2 s4 ) = (s s ) s1 = s3 = s5 = s¯ and define a (|S| + 1) × (|S| + 1) matrix ⎡ 1 P (1) ··· s¯
⎢ Ps¯ (1) Ps¯∗ = ⎢ ⎣ Ps¯ (|S|)
Ps¯ (1 1) P˜s¯ (|S| 1)
··· ···
Ps¯ (|S|) Ps¯ (1 |S|)
⎤ ⎥ ⎥ ⎦
Ps¯ (|S| |S|)
Suppose q∗m (¯s) > 0 for all m. Then M ≥ rank(Ps¯∗ ). Furthermore, if the matrix L∗s¯ defined below has rank M, then M = rank(Ps¯∗ ): ⎡ ⎤ 1 λ1s¯ (1) · · · λ1s¯ (|S|) ⎢ ⎥ L∗s¯ = ⎣ ⎦ (M×(|S|+1))
1 λM s¯ (1)
···
λM s¯ (|S|)
In some applications, the model has two types of covariates, zt and xt , where the transition function of xt depends on types, while the transition function of zt is common across types. In such a case, we may use the variation of zt as a main source of identification and relax the requirement on T in Proposition 6. We impose an assumption analogous to Assumption 2, as well as the conditional independence assumption on the transition function of (x z ): 10 Even if the type-specific conditional choice probabilities are nonparametrically identified, it is generally not possible to nonparametrically identify the primitive objects such as the discount factor β, profit functions Πi , and the distribution of shocks, εiht , in structural dynamic models (Rust (1994), Magnac and Thesmar (2002)).
MODELS OF DYNAMIC DISCRETE CHOICES
163
ASSUMPTION 3: (a) The choice probability of at does not depend on time and is independent of zt−1 . (b) The transition function of (xt zt ) conditional m on {xτ zτ aτ }t−1 τ=1 takes the form g(zt |xt−1 zt−1 at−1 )f (xt |xt−1 at−1 ) for all t. m (c) f (x |x a) > 0 for all (x x a) ∈ X × X × A and g(z |x z a) > 0 for all (z x z a) ∈ Z × X × Z × A and for m = 1 M. Under Assumption 3, consider a model P({at xt zt }Tt=1 ) =
M
π m p∗m (x1 z1 a1 )
m=1
T
g(zt |xt−1 zt−1 at−1 )f m (xt |xt−1 at−1 )
t=2
× P m (at |xt xt−1 at−1 zt ) Assuming g(zt |xt−1 zt−1 at−1 ) is known and defining st = (at xt ), q˜ ∗m (s1 z1 ) = p∗m (x1 z1 a1 ), and Q˜ m (st |st−1 zt ) = f m (xt |xt−1 at−1 )P m (at |xt xt−1 at−1 zt ), we can write this equation as (29)
˜ t zt }T ) = P({s t=1
P({at xt zt }Tt=1 ) T
g(zt |xt−1 zt−1 at−1 )
t=2
=
M
π m q˜ ∗m (s1 z1 )
m=1
T
Q˜ m (st |st−1 zt )
t=2
We fix the value of {st }Tt=1 and use the “independent” variation of zt to identify unobserved types. The next proposition provides a sufficient condition for nonparametric identification of the model (29). Define, for s¯ ∈ S and h ξ ∈ Z, π˜ s¯mh = π m q˜ ∗m (¯s h)
˜ m s|¯s ξ) λ˜ m s¯ (ξ) = Q (¯
PROPOSITION 9: Suppose that Assumption 3 holds and assume T ≥ 4. Define ⎡ ⎤ 1 λ˜ 1s¯ (ξ1 ) · · · λ˜ 1s¯ (ξM−1 ) ⎢ ⎥ L¯ s¯ = ⎣ ⎦ (M×M)
1
λ˜ M s¯ (ξ1 )
···
λ˜ M s¯ (ξM−1 )
Suppose that q˜ ∗m (¯s h) > 0 for all m, there exist some {ξ1 ξM−1 } such that L¯ s¯ is nonsingular, and there exist (r k) ∈ S × Z such that Q˜ m (r|¯s k) > 0 for all m and Q˜ m (r|¯s k) = Q˜ n (r|¯s k) for any m = n. Then {π˜ s¯mh {λm s¯ (ξ)}ξ∈Z , m M T ˜ t zt } ) : {st zt }T ∈ {Q˜ (s|¯s ξ)}(sξ)∈S×Z }m=1 is uniquely determined from {P({s t=1 t=1 T (S × Z) }.
164
H. KASAHARA AND K. SHIMOTSU
We may identify the primitive parameters π m , p∗m (a x), f m (x |x a), P (a|x) using an argument analogous to those of Remark 5(ii) and (iii). The requirement of T = 4 in Proposition 9 is weaker than that of T = 6 in Proposition 6 because the variation of zt , rather than (xt at ), is used as a main source of identification. When T > 4, we may apply the argument of Proposition 2 to relax the sufficient condition for identification in Proposition 9, but we do not pursue it here; Proposition 2 provides a similar result. m
3.3. Limited Transition Pattern This section analyzes the identification condition of the baseline model when Assumption 1(c) is relaxed. In some applications, the transition pattern of x is limited, as not all x ∈ X are reachable with a positive probability. In such instances, the set of sequences {at xt }Tt=1 that can be realized with a positive probability also becomes limited and the number of restrictions from a set of the submodels falls, making identification more difficult. EXAMPLE 9—Bus Engine Replacement Model (Rust (1987)): Suppose a ∈ {0 1} is the replacement decision for a bus engine, where a = 1 corresponds to replacing a bus engine. Let x denote the mileage of a bus engine with X = {1 2 }. The transition function of xt is ⎧ θf1 for xt+1 = (1 − at )xt + at , ⎪ ⎪ ⎨ for xt+1 = (1 − at )xt + at + 1, θf2 f (xt+1 |xt at ; θ) = ⎪ − θ for xt+1 = (1 − at )xt + at + 2, 1 − θ f1 f2 ⎪ ⎩ 0 otherwise, and not all x ∈ X can be realized from (x a). Henceforth, we assume the transition function of x is stationary and takes the form f (x |x a) to simplify the exposition. If f (x |x a) = 0 for some (x x a) and not all x ∈ X can be reached from (a x), then some values of ˜ t xt }T ) in (9) and its lower{at xt } are never realized. For such values, P({a t=1 dimensional submodels in (10) and (11) are not well defined. Hence, their restrictions cannot be used for identification. Thus, we fix the values of (a1 x1 ) and (aτ xτ ), and focus on the values of (at xt ) that are realizable between (a1 x1 ) and (aτ xτ ). The difference in response patterns between (a1 x1 ) and (aτ xτ ) provides a source of identification. To fix the idea, assume T = 4, and fix at = 0 for all t, x1 = h, and xτ = k. Of course, it is possible to choose different sequences of {at }Tt=1 . Let Bh and C h be subsets of X, of which elements are realizable between (a1 x1 ) and (aτ xτ ). We use the variations of x within Bh and C h as a source of identification. Define, for h ξ ∈ X, (30)
π˜ hm = π m p∗m (a1 = 0 x1 = h) and
m λ˜ m ξ = P (a = 0|x = ξ)
MODELS OF DYNAMIC DISCRETE CHOICES
165
˜ k = diag(λ˜ 1 λ˜ M ). We identify V˜h , D ˜ k, and V˜h = diag(π˜ h1 π˜ hM ) and D k k m and λ˜ ξ ’s from the factorization equations corresponding to (18): (31)
P h = L˜ b V˜h L˜ c
and
˜ k V˜h L˜ c Pkh = L˜ b D
where L˜ b and L˜ c are defined analogously to L in (13), but using λ˜ m ξ , and with ξ ∈ Bh and ξ ∈ C h , respectively. As we discuss below, we choose Bh , C h , and k so that P h and Pkh are identifiable from the data. Each equation of these factorization equations (31) represents a submodel in (9) and (10) for a sequence of at ’s and xt ’s that belongs to one of the sets (32)
A1 = {x1 = h (x2 x3 ) ∈ Bh × C h x4 = k; at = 0 for all t} A2 = {x1 = h x2 ∈ Bh x3 = k; at = 0 for all t} A3 = {x1 = h x2 ∈ C h x3 = k; at = 0 for all t} A4 = {x1 = h x2 = k; at = 0 for all t}
For instance, a submodel for a sequence q1 ∈ A1 in (32) is ˜ 1) = P(q =
P(q1 ) f (k|x3 0)f (x3 |x2 0)f (x2 |h 0) M
π m p∗m (h 0)P m (0|x2 )P m (0|x3 )P m (0|k)
m=1
for (x2 x3 ) ∈ Bh × C h ˜ k V˜h L˜ c in (31). which represents one of the equations of Pkh = L˜ b D For all the submodels implied by (31) to provide identifying restrictions, all the sequences of xt ’s in A1 –A4 in (32) must have positive probability; otherwise, some elements of P h and Pkh in (31) cannot be constructed from the data, and our identification strategy fails. This requires that all the points in Bh must be reachable from h, while all the points in C h must be reachable from h and all the points in Bh . Finally, k must be reachable from h and all the points in Bh and C h . EXAMPLE 9 —Continued: In Example 9, assume the initial distribution p∗m (x a) is defined as the type-specific stationary distribution. Set at = 0 for t = 1 4 and x1 = h. Choose Bh = {h h + 1} and C h = {h + 1 h + 2}, and k = h + 2. For this choice of Bh , C h , and k, the corresponding transition probabilities are nonzero, and we may construct all the elements of P h and Pkh in (31) from the observables. For each h ∈ X, these submodels provide 4 + 3 + 1 = 8 restrictions for identification.
166
H. KASAHARA AND K. SHIMOTSU
We now state the restrictions on Bh and C h formally. First we develop useful notation. For a singleton {x} ⊂ X, let Γ (a {x}) = {x ∈ X : f (x |x a) > 0} denote a set of x ∈ X that can be reached from (a x) in the next period with a positive probability. For a subset W ⊆ X, define Γ (a W ) as the intersection of Γ (a {x})’s across all x’s in W : Γ (a W ) = x∈W Γ (a {x}). We summarize the assumptions of this subsection including the restrictions on Bh and C h : ASSUMPTION 4: (a) The choice probability of at does not depend on time. (b) The choice probability of at is independent of the lagged variable (xt−1 at−1 ) conditional on xt . (c) P m (a|x) > 0 for all (a x) ∈ A × X and m = 1 M. h h (d) ftm (xt |{xτ aτ }t−1 τ=1 ) = f (xt |xt−1 at−1 ) for all m. (e) h k ∈ X, B and C sat∗m isfy p (a1 = 0 x1 = h) > 0 for all m, and Bh ⊆ Γ (0 {h})
C h ⊆ Γ (0 Bh ) ∩ Γ (0 {h})
{k} ⊆ Γ (0 C h ) ∩ Γ (0 Bh ) ∩ Γ (0 {h}) Assumption 4(a) and (b) are identical to Assumption 1(a) and (b). Assumption 4(c) is necessary for the submodels to be well defined. Assumption 4(d) strengthens Assumption 1(d) by imposing stationarity and a first-order Markov property. It may be relaxed, but doing so would add substantial notational complexity. Assumption 4(e) guarantees that all the sequences we consider in the subsets in (32) have nonzero probability. Note that the choice of C h is affected by how Bh is chosen. If Assumption 1(c) holds, it is possible to set Bh = C h = X. The next proposition provides a sufficient condition for identification under Assumption 4. PROPOSITION 10: Suppose that Assumption 4 holds T = 4, and |Bh |, |C h | ≥ b c } and {ξ1c ξM−1 } be elements of Bh and C h , respecM − 1. Let {ξ1b ξM−1 m m tively. Define π˜ h and λ˜ ξ as in (30), and define ⎤ ⎡ 1 λ˜ 1ξb λ˜ 1ξb · · · λ˜ 1ξb 1 2 M−1 ⎢ ⎥ ⎥ L˜ b = ⎢ ⎦ ⎣ (M×M) 1 λ˜ M · · · λ˜ M λ˜ M b ξ1b ξ2b ξM−1 ⎡ 1 λ˜ 1 λ˜ 1 · · · λ˜ 1 ⎤ L˜ c (M×M)
⎢ = ⎣
ξ1c
˜ 1 λM ξc
1
ξ2c
˜λMc ξ
2
c ξM−1
···
⎥ ⎦
λ˜ M ξc
M−1
b } and {ξ1c Suppose that L˜ b and L˜ c are nonsingular for some {ξ1b ξM−1 c ˜m ˜n }, and that λ˜ m ˜ hm λ˜ m ξM−1 k > 0 for all m and λk = λk for any m = n. Then {π ξ :ξ ∈ h h M T T ˜ t xt } ) : {at xt } ∈ (A × X)T }. B ∪C } is uniquely determined from {P({a m=1
t=1
t=1
MODELS OF DYNAMIC DISCRETE CHOICES
167
Assuming that all the values of x can be realized in the initial period, we may repeat the above argument for all possible values of x1 to identify λ˜ m ξ for any ξ ∈ h∈X Bh . Furthermore, we can repeat the argument for different sequences of {at }4t=1 to increase the identifiable elements of P m (a|x). For instance, by choosing Bh = Γ (a {h}), λ˜ m l is identified for all l ∈ X if the union of Γ (a {h}) across different (a h) ∈ A × X includes all the elements of X so that X = Γ (a {h}). This is a weak condition and is satisfied if X is an ergodic (ah)∈A×X set. However, setting Bh = Γ (a {h}) may lead to a small number of identifiable types. EXAMPLE 9—Continued: Setting at = 0 for t = 1 4, we have Γ (0 {h}) = {h h + 1 h + 2} for any h ∈ X. To satisfy Assumption 4(e), choose Bh = {h h + 1}, C h = {h + 1 h + 2}, and k = h + 2. If the other assumptions of Proposition 10 are satisfied, we can identify M = 3 types. From the factoriza˜ k , L˜ b , and L˜ c , and idention equations (31), we can uniquely determine V˜h , D m ∗m m tify {π p (0 x) P (0|x) : x = h h + 1 h + 2}m=123 . Repeating for all h ∈ X, we identify P m (a|x) for all (a x) ∈ A × X. We then identify p∗m (x a) using P m (a|x), f (x |x a), and the fixed point constraint, while π m is determined as π m p∗m (0 x)/p∗m (0 x). The sufficient condition of Proposition 10 does not allow one to identify many types when the size of Bh or C h is small. It is possible to identify more types when we can find a subset D of X that is reachable from itself, namely D ⊆ Γ (0 D). For example, if the transition pattern is such that Γ (0 {x}) = {x − 2 x − 1 x x + 1 x + 2} for some x ∈ X, then the set {x − 1 x x + 1} serves as D. In such cases, we can apply the logic of Proposition 2 to identify many types if T ≥ 5. ASSUMPTION 5: (a) Assumptions 4(a)–(d) hold. (b) A subset D of X satisfies D ⊆ Γ (0 D). m ∗m Set D = {d1 d|D| }, and define λ∗m d = p ((a x) = (1 d)) and λd = P (a = 1|x = d) for d ∈ D. Under Assumption 5, replacing X with D and simply repeating the proof of Proposition 2 gives the following proposition: m
PROPOSITION 11: Suppose Assumption 5 holds. Assume T ≥ 5 is odd and define u = (T − 1)/2. Define Λr , r = 0 u, analogously to Proposition 2 except u |D|+l−1 ∗m m m (X λ∗m λ ) is replaced with (D λ λ ). Define an M × ( ) matrix dj dj ξj ξj l=0 l Λ Λ ]. Λ as Λ = [Λ0 Λ 1 2 u u Suppose (a) l=0 |D|+l−1 ≥ M, (b) we can construct a nonsingular M × M l matrix L by setting its first column as Λ0 and choosing the other M − 1 columns from the columns of Λ but Λ0 , and (c) there exists dk ∈ D such that λ∗m dk > 0 ∗n ∗m ∗m m |D| M m for all m and λdk = λdk for any m = n. Then {π {λdj λdj }j=1 }m=1 is uniquely ˜ t xt }T ) : {at xt }T ∈ (A × X)T }. determined from {P({a t=1
t=1
168
H. KASAHARA AND K. SHIMOTSU
4 if |D| = 3 and T = 5, the number of identifiable types becomes 2For3example, + + 2 = 10. Identifying more types is also possible when the model has 0 1 an additional covariate zt whose transition pattern is not limited and there is a ¯ > 0 for some sequence of at . Then, for state x¯ such that P(x1 = · · · = xT = x) ¯ we can use the variation of zt and apply Proposition 9. This increases x = x, the number of identifiable types to |Z| + 1. 4. CONCLUDING REMARK This paper studies dynamic discrete choice models with unobserved heterogeneity that is represented in the form of finite mixtures. It provides sufficient conditions under which such models are identified without parametric distributional assumptions. While we emphasize that the variation in the covariate and in time provides important identifying information, our identification approach does require assumptions on the Markov property, stationarity, and type-invariance in transition processes. To clarify our identification results, consider a general nonstationary finite mixture model of dynamic discrete choices: (33)
P({at xt }Tt=1 ) =
M
π m p∗m (x1 a1 )
m=1
T
m t−1 ftm (xt |{xτ aτ }t−1 τ=1 )Pt (at |xt {xτ aτ }τ=1 )
t=2
Such a general mixture model (33) cannot be nonparametrically identified without imposing further restrictions.11 One possible nonparametric restriction is a first-order Markovian assumption on (xt at ), that yields a less general nonstationary model: (34)
P({at xt }Tt=1 ) =
M m=1
m
∗m
π p (x1 a1 )
T
ftm (xt |xt−1 at−1 )Ptm (at |xt xt−1 at−1 )
t=2
We do not know whether this model is nonparametrically identified without additional assumptions. Section 3.1 shows that the identification of the nonstaThe model (33) is equivalent to a mixture model P({at xt }Tt=1 ) = because it is always to possible to decompose 11
P m ({at xt }Tt=1 ) = p∗m (x1 a1 )
T
M m=1
π m P m ({at xt }Tt=1 ),
t−1 m ftm (xt |{xτ aτ }t−1 τ=1 )Pt (at |xt {xτ aτ }τ=1 )
t=2
The number of restrictions implied by P({at xt }Tt=1 ) is (|A||X|)T − 1, while the number of un T m m T knowns in M m=1 π P ({at xt }t=1 ) is M − 1 + M((|A||X|) − 1).
MODELS OF DYNAMIC DISCRETE CHOICES
169
tionary model (34) is possible under the assumptions of type-invariant transition processes and conditional independence of discrete choices. In Section 3.2, we provide identification results under the stationarity assumption on the transition function and choice probabilities in (34). Relaxing these identifying assumptions as well as investigating identifiability, or perhaps nonidentifiability, of finite mixture model (34) is an important future research area. Estimation and inference on the number of components (types), M, is an important topic because of the lack of guidance from economic theory. It is known that the likelihood ratio statistic has a nonstandard limiting distribution when applied to testing the number of components of a mixture model (see, for example, Liu and Shao (2003)). Leroux (1992) considered a maximumpenalized-likelihood estimator for the number of components, which includes the Akaike information criterion and Bayesian information criterion as a special case. McLachlan and Peel (2000, Chapter 6) surveyed the methods for determining the number of components in parametric mixture models. To our best knowledge, all of these existing methods assume that the component distributions belong to a parametric family. Developing a method for testing and selecting the number of components without imposing any parametric assumption warrants further research. A statistical test of the number of components may be possible by testing the rank of matrix P ∗ in Proposition 3. When the covariate has a large number of support points, we may test the number of components by testing a version of matrix P in (17) across different partitions of X. In Kasahara and Shimotsu (2008b), we pursued this idea, and proposed a selection procedure for the number of components by sequentially testing the rank of matrices. APPENDIX: PROOFS PROOF OF PROPOSITION 1 AND COROLLARY 1: Define V = diag(π 1 ∗M π ) and Dk = diag(λ∗1 k λk ) as in (13). Define P and Pk as in (17). Then P and Pk are expressed as (see (14)–(16)) M
P = L V L
Pk = L V Dk L
We now uniquely determine L, V , and Dk from P and Pk constructively. Since L is nonsingular, we can construct a matrix Ak = P −1 Pk = L−1 Dk L. Because Ak L−1 = L−1 Dk , the eigenvalues of Ak determine the diagonal elements of Dk while the right eigenvectors of Ak determine the columns of L−1 up to multiplicative constants; denote the right eigenvectors of Ak by L−1 K, where K is some diagonal matrix. Now we can determine V K from the first row of PL−1 K because PL−1 K = L V K and the first row of L is a vector of ones. Then L is determined uniquely by L = (PL−1 K)(V K)−1 = (L V K)(V K)−1 .
170
H. KASAHARA AND K. SHIMOTSU
Having obtained L , we may determine V from the first column of (L )−1 P because (L )−1 P = V L and the first column of L is a vector of ones. Therefore, M−1 M we identify {π m {λm ξj }j=1 }m=1 as the elements of V and L. Once V and L are determined, we can uniquely determine Dζ = diag(λ∗1 ζ ∗M λζ ) for any ζ ∈ X by constructing Pζ in the same way as Pk and using the relationship Dζ = (L V )−1 Pζ L−1 . Furthermore, for arbitrary ζ ξj ∈ X, evaluate Fx2 x3 , Fx2 , and Fx3 defined in (15) and (16) at (x2 x3 ) = (ζ ξj ), and define ⎤ ⎡ 1 λ1ζ
1 Fξ1 FξM−1 ⎥ ⎢ ζ ζ ⎦ (35) P = L =⎣ (M×2) (2×M) Fζ Fζξ1 FζξM−1 M 1 λζ Since P ζ = (Lζ ) V L, we can uniquely determine (Lζ ) = P ζ (V L)−1 . Therefore, M m M {λ∗m ζ }m=1 and {λζ }m=1 are identified for any ζ ∈ X. This completes the proof of Proposition 1, and Corollary 1 follows immediately. Q.E.D. PROOF OF PROPOSITION 2: The proof is similar to the proof of Proposition 1. Let T = (τ2 τp ), 2 ≤ p ≤ T , be a subset of {2 T }. Let X (T ) be a subset of {xt }Tt=2 with t ∈ T . For example, if T = {2 4 6}, then X (T ) = ˜ t xt }T ), integrating out (at xt ) if t ∈ {x2 x4 x6 }. Starting from P({a / T , and t=1 evaluating it at (a1 x1 ) = (1 k) and at = 1 for t ∈ T gives a “marginal” M ∗ m ∗m m ˜ Fk X (T ) = P({a1 x1 } = {1 k} {1 xt }τ∈T ) = m=1 π λk t∈T λxt . For example, M ∗ m ∗m m m m if T = {2 4 6}, then Fk X (T ) = m=1 π λk λx2 λx4 λx6 . Integrating out (a1 x1 ) ˜ xt }τ∈T ) = additionally and proceeding in a similar way gives FX (T ) = P({1 M m m t∈T λxt . m=1 π ∗M Define V = diag(π 1 π M ) and Dk = diag(λ∗1 k λk ) as in (13). Define
P = (L ) V L and Pk = (L ) V Dk L . Then the elements of P take the form M m m |T | . m=1 π t∈T λxt and can be expressed as FX (T ) for some T and {xt }t∈T ∈ X
∗ Similarly, the elements of Pk can be expressed as FkX (T ) . For instance, if u = 3, T = 7, and both Λ and L are M × M, then P is given by ⎡
1 ⎢ F1 ⎢ ⎢ ⎢ F ⎢ |X| ⎢ F 11 ⎢ ⎢ ⎢ ⎢ ⎢ F|X||X| ⎢ ⎢ F111 ⎢ ⎣ F|X||X||X|
F1
· · · F|X|
F11
···
F|X|11
F|X||X|
F111
···
F|X||X||X|
F|X||X||X||X|
F111|X||X||X|
F|X||X||X|11
···
F|X||X||X||X||X||X|
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
MODELS OF DYNAMIC DISCRETE CHOICES
171
where the (i j)th element of P is Fσ , where σ consists of the combined subscripts of the (i 1)th and (1 j)th element of P . For example, the (|X| + 1 2)th ∗ and element of P is F|X|1 (= F1|X| ). Pk is given by replacing Fσ in P with Fkσ ∗ setting the (1 1)th element to Fk . Consequently, P and Pk can be computed from the distribution function of the observed data. By repeating the argument of the proof of Proposition 1, we determine L , V , and Dk uniquely from P and Pk first, and then Dζ = ∗M ζ
ζ ζ diag(λ∗1 ζ λζ ) and L for any ζ ∈ X from P , Pζ , L , and P , where L and P ζ are defined in (35). Q.E.D. PROOF OF PROPOSITION 3: Let V = diag(π 1 π M ). Then P ∗ = (L∗1 ) × V L∗2 . It follows that rank(P ∗ ) ≤ min{rank(L∗1 ) rank(L∗2 ) rank(V )}. Since rank(V ) = M, it follows that M ≥ rank(P ∗ ), where the inequality becomes strict when rank(L∗1 ) or rank(L∗2 ) is smaller than M. When rank(L∗1 ) = rank(L∗2 ) = M, multiplying both sides of P ∗ = (L∗1 ) V L∗2 from the right by (L∗2 ) (L∗2 (L∗2 ) )−1 gives P ∗ (L∗2 ) (L∗2 (L∗2 ) )−1 = (L∗1 ) V . There are M linearly independent columns in (L∗1 ) V , because (L∗1 ) has M linearly independent columns while V is a diagonal matrix with strictly positive elements. Therefore, rank(P ∗ (L∗2 ) (L∗2 (L∗2 ) )−1 ) = M. It follows that rank(P ∗ ) = M because M ≤ min{rank(P ∗ ) rank(L∗2 ) rank(L∗2 (L∗2 ) )−1 }, and rank(L∗2 ) = M imQ.E.D. ply rank(P ∗ ) ≥ M. PROOF OF PROPOSITION 4: The proof is similar to the proof of Proposition 1. Define Pt and Ptk analogously to P and Pk but with λx2 and λx3 replaced with λtxt and λt+1xt+1 in the definition of F· and F·∗ . Define V and Dk as before. Then Pt and Ptk are expressed as Pt = Lt V Lt+1 and Ptk = Lt V Dk Lt+1 . Since Lt and Lt+1 are nonsingular, we have Ak = Pt−1 Ptk = L−1 t+1 Dk Lt+1 . Be−1 cause Ak L−1 = L D , the eigenvalues of A determine the diagonal elek k t+1 t+1 ments of Dk while the right eigenvectors of Ak determine the columns of L−1 t+1 up to multiplicative constants; denote the right eigenvectors of Ak by L−1 t+1 K, where K is some diagonal matrix. Now we can determine V K from the first row −1 of Pt L−1 t+1 K because Pt Lt+1 K = Lt V K and the first row of Lt is a vector of ones. Then Lt is determined uniquely by Lt = (Lt V K)(V K)−1 . Having obtained Lt , we may determine V and Lt+1 from V Lt+1 = (Lt )−1 P because the first column of V Lt+1 equals the diagonal of V and Lt+1 = V −1 (V Lt+1 ). Therefore, we de λm }M−1 }M as elements of V , Lt , and Lt+1 . Once V , Lt termine {π m {λm tξt t+1ξt+1 j=1 m=1 j
j
∗M and Lt+1 are determined, we can uniquely determine Dζ = diag(λ∗1 ζ λζ ) for any ζ ∈ X by constructing Ptζ in the same way as Ptk and using the relationship Dζ = (Lt V )−1 Ptζ (Lt+1 )−1 . Furthermore, for arbitrary ζ ∈ X, define ⎤ ⎡ 1 λ1tζ ⎢ ⎥ Lζt = ⎣ ⎦ (M×2)
1
λM tζ
172
H. KASAHARA AND K. SHIMOTSU
Then Ptζ = (Lζt ) V Lt+1 is a function of the distribution function of the observable data, and we can uniquely determine (Lζt ) for 2 ≤ t ≤ T − 1 as Ptζ (V Lt+1 )−1 . For t = T , we can use the fact that (LT −1 ) V LζT is also a function of the distribution function of the observable data and proceed in the m M−1 same manner. Therefore, we can determine {λ∗m ζ λtζ }j=1 for any ζ ∈ X and 2 ≤ t ≤ T. Q.E.D. PROOF OF PROPOSITION 6: Without loss of generality, set T = 6. Integrating out st ’s backward from P({st }6t=1 ) and fixing s1 = s3 = s5 = s¯ gives the “marginals” F˜s∗2 s4 s6 =
M
m ∗m πs¯m λm s¯ (s2 )λs¯ (s4 )λs¯ (s6 )
F˜s∗2 s6 =
m=1
F˜s∗6 =
M
M
∗m πs¯m λm s¯ (s2 )λs¯ (s6 )
m=1
πs¯m λ∗m s¯ (s6 )
F˜s2 s4 =
m=1
F˜s2 =
M
M
m πs¯m λm s¯ (s2 )λs¯ (s4 )
m=1
πs¯m λm s¯ (s2 )
F˜ =
m=1
M
πs¯m
m=1
As in the proof of Proposition 1, evaluate these F˜·’s at s2 = ξ1 ξM−1 , s4 = ξ1 ξM−1 , and s6 = r, and arrange them into two M × M matrices: ⎡ ˜ ⎤ F F˜ξ1 ··· F˜ξM−1 ⎢ F˜ξ ··· F˜ξ1 ξM−1 ⎥ F˜ξ1 ξ1 ⎢ ⎥ 1 Ps¯ = ⎢ ⎥ ⎣ ⎦ F˜ξM−1 F˜ξM−1 ξ1 · · · F˜ξM−1 ξM−1 ⎡ ˜∗ ⎤ Fk ··· F˜ξ∗M−1 k F˜ξ∗1 k ⎢ ˜∗ ⎥ ⎢ Fξ1 k ··· F˜ξ∗1 ξM−1 k ⎥ F˜ξ∗1 ξ1 k ⎢ ⎥ Ps¯k = ⎢ ⎥ ⎣ ⎦ ∗ ∗ ∗ ˜ ˜ ˜ F ··· F F ξM−1 k
ξM−1 ξ1 k
ξM−1 ξM−1 k
∗M Define Vs¯ = diag(πs¯1 πs¯M ) and Dk|¯s = diag(λ∗1 s¯ (k) λs¯ (k)). Then Ps¯ and Ps¯k are expressed as Ps¯ = Ls¯ Vs¯ Ls¯ and Ps¯k = Ls¯ Vs¯ Dk|¯s Ls¯ . Repeating the argument of the proof of Proposition 1 shows that Ls¯ , Ls¯ , Vs¯ , and Dk|¯s are uniquely determined from Ps¯ and Ps¯k , and that Ds|¯s and λm s¯ (s) can be uniquely determined for any s ∈ S and m = 1 M. Q.E.D.
PROOF OF PROPOSITION 7: Define Vs¯ = diag(πs¯1 πs¯M ) and Dk|¯s = ∗M diag(λ∗1 s¯ (k) λs¯ (k)). Applying the argument of the proof of Proposition 6
MODELS OF DYNAMIC DISCRETE CHOICES
173
with Ls¯ replaced by L s¯ , we can identify L s¯ , Vs¯ , and Dk|¯s , and then Ds|¯s and λm s¯ (s) for any s ∈ S and m = 1 M. The stated result immediately follows. Q.E.D. PROOF OF PROPOSITION 9: The proof uses the logic of the proof of Proposition 6. Consider a sequence {st zt }4t=1 with (s1 s2 s3 s4 ) = (¯s s¯ s¯ r) and (z1 z4 ) = (h k). Summarize the value of s4 and z4 into ζ = (r k). For M ˜ m s k) and F˜ h = ˜m (z2 z3 ) ∈ Z 2 , define F˜zh∗2 z3 ζ = m=1 π˜ s¯mh λ˜ m s¯ (z2 )λs¯ (z3 )Q (r|¯ z2 z3 M M m ˜m m ˜m m h∗ m ˜ ˜ ˜ ˜ s¯h λs¯ (z2 )λs¯ (z3 ). Define Fz2 ζ = m=1 π˜ s¯h λs¯ (z2 )Q (r|¯s k), and define m=1 π F˜ζh∗ , F˜zh2 , and F˜ h analogously to the proof of Proposition 6. As in the proof of Proposition 6, arrange these marginals into two matrices ∗ replaced P¯ h and P¯ζh . P¯ h and P¯ζh are the same as Ps¯ and Ps¯k , but F˜· and F˜·k h h∗ ˜ ˜ with F· and F·ζ and subscripts are elements of Z instead of S. Define V˜s¯h = ˜ ζ|¯s = diag(Q˜ 1 (r|¯s k) Q˜ M (r|¯s k)). It then follows diag(π˜ s¯1h π˜ s¯Mh ) and D h h ˜ ζ|¯s L¯ s¯ . By repeating the argument of the that P¯ = L¯ s¯ V˜s¯ L¯ s¯ and P¯ζh = L¯ s¯ V˜s¯h D ˜ ζ|¯s from P¯ h proof of Proposition 1, we can uniquely determine L¯ s¯ , V˜s¯h , and D h ˜ (sz)|¯s for any (s z) ∈ S × Z. and P¯ζ , and, having determined L¯ s¯ , determine D Q.E.D. PROOF OF PROPOSITION 10: For (x2 x3 ) ∈ Bh × C h and xc ∈ Bh ∪ C h , deM M M ˜ m ˜ m h∗ ˜ m h∗ fine Fxh∗2 x3 k = m=1 π˜ hm λ˜ m ˜ hm λ˜ m ˜ hm λ˜ m k, x2 λx3 λk , Fxc k = xc λk , Fk = m=1 π m=1 π M M M m˜m ˜m m˜m m h h h Fx2 x3 = m=1 π˜ h λx2 λx3 , Fxc = m=1 π˜ h λxc , and F = m=1 π˜ h . They can be constructed from sequentially integrating out P({at xt }4t=1 ) backward and then dividing them by a product of f (xt |xt−1 0). Note that Assumption 4(b) guarantees f (xt |xt−1 0) > 0 for all xt and xt−1 in the subsets of X considered. As in the proof of Proposition 1, arrange these marginals into two matrices ∗ are replaced P h and Pkh . P h and Pkh are the same as P and Pk but F· and Fk· h∗ ˜ k = diag(λ˜ 1 λ˜ M ). By . Define V˜h = diag(π˜ h1 π˜ hM ) and D with F·h and F·k k k applying the argument in the proof of Proposition 4, we may show that L˜ b , L˜ c , ˜ k are uniquely determined from P({a ˜ t xt }4 ) and its marginals, and V˜h , and D t=1 m M ˜ then show that {λξ }m=1 is determined for ξ ∈ Bh ∪ C h . Q.E.D. REFERENCES AGUIRREGABIRIA, V. (2006): “Another Look at the Identification of Dynamic Discrete Decision Processes,” Unpublished Manuscript, University of Toronto. [139] AGUIRREGABIRIA, V., AND P. MIRA (2002): “Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models,” Econometrica, 70, 1519–1543. [138,160] (2007): “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75, 1–54. [138,139,160,161]
174
H. KASAHARA AND K. SHIMOTSU
ANDERSON, T. W. (1954): “On Estimation of Parameters in Latent Structure Analysis,” Psychometrika, 19, 1–10. [145] BAJARI, P., AND H. HONG (2006): “Semiparametric Estimation of a Dynamic Game of Incomplete Information,” Technical Working Paper 320, NBER. [138] BAJARI, P., C. L. BENKARD, AND J. LEVIN (2007): “Estimating Dynamic Models of Imperfect Competition,” Econometrica, 75, 1331–1370. [138] BLISCHKE, W. R. (1964): “Estimating the Parameters of Mixtures of Binomial Distributions,” Journal of the American Statistical Association, 59, 510–528. [149] BROWNING, M., AND J. CARRO (2007): “Heterogeneity and Microeconometrics Modelling,” in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society, Vol. 3, ed. by R. Blundell, W. Newey, and T. Persson. Cambridge, U.K.: Cambridge University Press, 47–74. [141,158] CAMERON, S. V., AND J. J. HECKMAN (1998): “Life Cycle Schooling and Dynamic Selection Bias: Models and Evidence for Five Cohorts of American Males,” Journal of Political Economy, 106, 262–333. [136] CHANDRA, S. (1977): “On the Mixtures of Probability Distributions,” Scandinavian Journal of Statistics, 4, 105–112. [144] CRAWFORD, G. S., AND M. SHUM (2005): “Uncertainty and Learning in Pharmaceutical Demand,” Econometrica, 73, 1137–1173. [135] ELBERS, C., AND G. RIDDER (1982): “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model,” Review of Economic Studies, 49, 403–409. [139] GEWEKE, J., AND M. KEANE (2001): “Computationally Intensive Methods for Integration in Econometrics,” in Handbook of Econometrics, Vol. 5, ed. by J. Heckman and E. Leamer. Amsterdam: North-Holland. [139] GIBSON, W. A. (1955): “An Extension of Anderson’s Solution for the Latent Structure Equations,” Psychometrika, 20, 69–73. [145] GOWRISANKARAN, G., M. F. MITCHELL, AND A. MORO (2005): “Why Do Incumbent Senators Win? Evidence From a Dynamic Selection Model,” Unpublished Manuscript, Washington University in St. Louis. [135] HALL, P., AND X.-H. ZHOU (2003): “Nonparametric Estimation of Component Distributions in a Multivariate Mixture,” Annals of Statistics, 31, 201–224. [136,142] HALL, P., A. NEEMAN, R. PAKYARI, AND R. ELMORE (2005): “Nonparametric Inference in Multivariate Mixtures,” Biometrika, 92, 667–678. [137] HECKMAN, J. J. (1981): “The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time-Discrete Data Stochastic Process,” in Structural Analysis of Discrete Panel Data With Econometric Applications, ed. by C. F. Manski and D. McFadden. Cambridge, MA: MIT Press, 179–195. [141] HECKMAN, J. J., AND B. SINGER (1984): “A Method of Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52, 271–320. [136,139] HONORÉ, B. E., AND E. TAMER (2006): “Bounds on Parameters in Panel Dynamic Discrete Choice Models,” Econometrica, 73, 611–629. [139] HOTZ, J., AND R. A. MILLER (1993): “Conditional Choice Probabilities and the Estimation of Dynamic Models,” Review of Economic Studies, 60, 497–529. [138,160] HOUDE, J.-F., AND S. IMAI (2006): “Identification and 2-Step Estimation of DDC Models With Unobserved Heterogeneity,” Unpublished Manuscript, Queen’s University. [149,150] KASAHARA, H., AND K. SHIMOTSU (2008a): “Pseudo-Likelihood Estimation and Bootstrap Inference for Structural Discrete Markov Decision Models,” Journal of Econometrics, 146, 92–106. 
[138,160] (2008b): “Nonparametric Identification and Estimation of Multivariate Mixtures,” Unpublished Manuscript, Queen’s University. [147,169] KEANE, M. P., AND K. I. WOLPIN (1997): “The Career Decisions of Young Men,” Journal of Political Economy, 105, 473–522. [136]
MODELS OF DYNAMIC DISCRETE CHOICES
175
KITAMURA, Y. (2004): “Nonparametric Identifiability of Finite Mixtures,” Unpublished Manuscript, Yale University. [139] LEROUX, B. G. (1992): “Consistent Estimation of a Mixing Distribution,” Annals of Statistics, 20, 1350–1360. [169] LIU, X., AND Y. SHAO (2003): “Asymptotics for Likelihood Ratio Tests Under Loss of Identifiability,” Annals of Statistics, 31, 807–832. [169] MADANSKY, A. (1960): “Determinantal Methods in Latent Class Analysis,” Psychometrika, 25, 183–198. [145,148] MAGNAC, T., AND D. THESMAR (2002): “Identifying Dynamic Discrete Decision Processes,” Econometrica, 70, 801–816. [139,147,162] MCLACHLAN, G. J., AND D. PEEL (2000): Finite Mixture Models. New York: Wiley. [169] PAKES, A., M. OSTROVSKY, AND S. BERRY (2007): “Simple Estimators for the Parameters of Discrete Dynamic Games (With Entry/Exit Examples),” RAND Journal of Economics, 38, 373–399. [138] PESENDORFER, M., AND P. SCHMIDT-DENGLER (2008): “Asymptotic Least Squares Estimators for Dynamic Games,” Review of Economic Studies, 75, 901–928. [138] RAO, P. (1992): Identifiability in Stochastic Models. San Diego: Academic Press. [136] RIDDER, G. (1990): “The Non-Parametric Identification of Generalized Accelerated FailureTime Models,” Review of Economic Studies, 57, 167–181. [139] RUST, J. (1987): “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher,” Econometrica, 55, 999–1033. [142,164] (1994): “Estimation of Dynamic Structural Models, Problems and Prospects: Discrete Decision Processes,” in Advances in Econometrics: Sixth World Congress of the Econometric Society, ed. by C. Sims. Cambridge, U.K.: Cambridge University Press, 119–170. [139,162] TITTERINGTON, D. M., A. F. M. SMITH, AND U. E. MAKOV (1985): Statistical Analysis of Finite Mixture Distributions. New York: Wiley. [136] VAN DEN BERG, G. J. (2001): “Duration Models: Specification, Identification and Multiple Durations,” in Handbook of Econometrics, Vol. 5, ed. by J. Heckman and E. Leamer. Amsterdam: North-Holland. [139]
Dept. of Economics, University of Western Ontario, London, Ontario, N6A 5C2 Canada; [email protected] and Dept. of Economics, Queen’s University, 94 University Avenue, Kingston, Ontario, K7L 3N6 Canada; [email protected]. Manuscript received October, 2006; final revision received April, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 177–234
LONG-TERM RISK: AN OPERATOR APPROACH BY LARS PETER HANSEN AND JOSÉ A. SCHEINKMAN1 We create an analytical structure that reveals the long-run risk-return relationship for nonlinear continuous-time Markov environments. We do so by studying an eigenvalue problem associated with a positive eigenfunction for a conveniently chosen family of valuation operators. The members of this family are indexed by the elapsed time between payoff and valuation dates, and they are necessarily related via a mathematical structure called a semigroup. We represent the semigroup using a positive process with three components: an exponential term constructed from the eigenvalue, a martingale, and a transient eigenfunction term. The eigenvalue encodes the risk adjustment, the martingale alters the probability measure to capture long-run approximation, and the eigenfunction gives the long-run dependence on the Markov state. We discuss sufficient conditions for the existence and uniqueness of the relevant eigenvalue and eigenfunction. By showing how changes in the stochastic growth components of cash flows induce changes in the corresponding eigenvalues and eigenfunctions, we reveal a long-run riskreturn trade-off. KEYWORDS: Risk-return trade-off, long run, semigroups, Perron–Frobenius theory, martingales.
1. INTRODUCTION WE STUDY LONG-RUN NOTIONS of a risk-return relationship that feature the pricing of exposure to uncertainty in the growth rates of cash flows. In financial economics, risk-return trade-offs show how expected rates of return over small intervals are altered in response to changes in the exposure to the underlying shocks that impinge on the economy. In continuous-time modeling, the length of the interval is driven to zero to deduce a limiting local relationship. In contrast to the local analysis, we focus on what happens as the length of time between valuation and payoff becomes large. In a dynamic setting, asset payoffs depend on both the future state and on the date when the payoff will be realized. Risk-averse investors require compensation for their risk exposure, giving rise to risk premia in equilibrium pricing. There is a term structure of risk premia to consider. There are many recent developments in asset pricing theory that feature the intertemporal composition of risk. The risk premia for different investment horizons are necessarily related, just as long-term interest rates reflect a compounding of short-term 1
Comments from Jaroslav Boroviˇcka, Rene Carmona, Vasco Carvalho, Junghoon Lee, Angelo Melino, Eric Renault, Chris Rogers, Mike Stutzer, Grace Tsiang, and Yong Wang were very helpful in preparing this paper. We also benefited from valuable feedback from the co-editor, Larry Samuelson, and the referees of this paper. This material is based on work supported by the National Science Foundation under award numbers SES-05-19372, SES-03-50770, and SES-0718407. Portions of this work were done while José Scheinkman was a Fellow of the John Simon Guggenheim Memorial Foundation. © 2009 The Econometric Society
DOI: 10.3982/ECTA6761
178
L. P. HANSEN AND J. A. SCHEINKMAN
rates. The risk counterpart to this compounding is most transparent in lognormal environments with linear state dynamics and constant volatility (e.g., see Hansen, Heaton, and Li (2008)). Our aim, however, is to support the analysis of a more general class of models that allow for nonlinearity in the state dynamics. Risk premia depend on both risk exposure and the price of that exposure. The methods we develop in this paper are useful for representing the exposure of cash flows and the price of that exposure in long horizons. While we are interested in the entire term structure of risk prices, the aim of this paper is to establish limiting behavior as investment horizons become large. There are two reasons for such an investigation. First, it provides a complementary alternative to the local pricing that is familiar from the literature on asset pricing. Comparison of the two allows us to understand the (average) slope of the term structure of risk prices. Second, economics is arguably a more reliable source of restrictions over longer time horizons. Thus it is advantageous to have tools that allow us to characterize how equilibrium risk prices are predicted to behave in the long run and how the prices depend on ingredients of the underlying model of the economy. The continuous-time local analysis familiar in financial economics is facilitated by the use of stochastic differential equations driven by a vector Brownian motion and jumps. An equilibrium valuation model gives the prices of the instantaneous exposure of payoffs to these risks. Values over alternative horizons can be inferred by integrating appropriately these local prices. Such computations are nontrivial when there are nonlinearities in the evolution of state variables or valuations. This leads us to adopt an alternative approach based on an operator formulation of asset pricing. As in previous research, we model asset valuation using operators that assign prices today to payoffs on future dates. Since these operators are defined for each payoff date, we build an indexed family of such pricing operators. This family is referred to as a semigroup because of the manner in which the operators are related to one another.2 We show how to modify valuation operators in a straightforward way to accommodate stochastic cash-flow growth. It is the evolution of such operators as the payoff date changes that interests us. Long-run counterparts to risk-return trade-offs are reflected in how the limiting behavior of the family of operators changes as we alter the stochastic growth of the cash flows. We study the evolution of the family of valuation operators in a continuoustime framework, although important aspects of our analysis are directly applicable to discrete-time specifications. Our analysis is made tractable by assuming the existence of a Markov state that summarizes the information in the economy pertinent for valuation. The operators we use apply to functions of this Markov state and can be represented as Mt ψ(x) = E[Mt ψ(Xt )|X0 = x] 2 See Garman (1984) for an initial contribution featuring the use of semigroups in modeling asset prices.
LONG-TERM RISK
179
for some process M appropriately restricted to ensure intertemporal consistency and to guarantee that the Markov structure applies to pricing. The principal restriction we impose is that M has a “multiplicative” property, resulting in a family of operators indexed by t that is a semigroup. A central mathematical result that we establish and exploit is a multiplicative factorization, (1)
ˆ t φ(X0 ) Mt = exp(ρt)M φ(Xt )
ˆ is a martingale whose logarithm has stationary increments, and φ is where M a positive function.3 While this decomposition is typically not unique, we show that there is at most one such decomposition that is of value to study longterm approximation. Intuitively, we may think of the scalar ρ as a deterministic growth rate and of the ratio of positive random variables (φ(X0 ))/(φ(Xt )) as a transitory contribution. We construct this representation using a function φ which is a principal (that is positive) eigenfunction of every operator Mt in the semigroup, that is, (2)
Mt φ(x) = exp(ρt)φ(x)
ˆ to change the probability measure In our analysis, we use the martingale M prior to our study of approximation. The principal eigenfunction φ used in constructing this change characterizes the limiting state dependence of the semigroup. We use the multiplicative factorization (1) to study two alternative long-run counterparts to risk-return trade-offs. It allows us to isolate enduring components to cash-flows or prices and to explore how these components are valued. For instance, cash flows with different stochastic growth components are valued differently. One notion of a long-run risk-return trade-off characterizes how the asymptotic rate of return required as compensation depends on the long-run cash-flow risk exposure. A second notion cumulates returns that are valued in accordance to a local risk-return trade-off. A corresponding long-run trade-off gives the asymptotic growth rates of alternative cumulative returns over long time horizons as a function of the risk exposures used to construct the local returns. Previous papers have explored particular characterizations of the term structure of risk pricing of cash flows. (For instance, see Hansen, Heaton, and Li 3
Alvarez and Jermann (2005) proposed the use of such a decomposition to analyze the longrun behavior of stochastic discount factors and cited an early version of our paper for proposing the link between this decomposition and principal eigenvalues and functions. We develop this connection, formally, and establish existence and uniqueness results for a general class of multiplicative functionals.
180
L. P. HANSEN AND J. A. SCHEINKMAN
(2008) and Lettau and Wachter (2007).) In this regard, local pricing characterizes one end of this term structure and our analysis characterizes the other end. Hansen, Heaton, and Li (2008) characterized the long-run cash-flow risk prices for discrete-time log-normal environments. Their characterization shares our goal of pricing the exposure to stochastic growth risk, but to obtain analytical results, they excluded potential nonlinearity and temporal variation in volatility. Hansen, Heaton, and Li (2008) examined the extent to which the long-run cash-flow risk prices from a family of recursive utility models can account for the value heterogeneity observed in equity portfolios with alternative ratios of book value to market value. Our paper shows how to produce such pricing characterizations for more general nonlinear Markov environments. There is a corresponding equation to (2) that holds locally, obtained essentially by differentiating with respect to t and evaluating the derivative at t = 0. More generally, this time derivative gives rise to the generator of the semigroup. By working with the generator, we exploit some of the well known local characterizations of continuous-time Markov models from stochastic calculus to provide a solution to equation (2). While continuous-time models achieve simplicity by characterizing behavior over small time increments, operator methods have promise for enhancing our understanding of the connection between short-run and long-run behavior. The remainder of the paper is organized as follows. In Sections 3 and 4 we develop some of the mathematical preliminaries pertinent for our analysis. Specifically, in Section 3 we describe the underlying Markov process and introduce the reader to the concepts of additive and multiplicative functionals. Both type of functionals are crucial ingredients to what follows. In Sections 3.3, 3.4, and 3.5 we consider three alternative multiplicative functionals that are pertinent in intertemporal asset pricing. In Section 3.3 we use a multiplicative functional to model a stochastic discount factor process, in Section 3.4 we introduce valuation functionals that are used to represent returns over intervals of any horizon, and in Section 3.5 we use multiplicative functionals to represent the stochastic growth components to cash flows. In Section 4 we study semigroups. Semigroups are used to evaluate contingent claims written on the Markov state indexed by the elapsed time between trading date and the payoff date. In Section 5 we define an extended generator associated with a multiplicative functional, and in Section 6 we introduce principal eigenvalues and functions and use these objects to establish a representation of the form (1). We present approximation results that justify formally the long-run role of a principal eigenfunction and eigenvalue, and show that there is at most one appropriate eigenfunction in Section 7. Applications to financial economics are given in Section 8. Among other things, we derive long-term counterparts to risk-return trade-offs by making the valuation horizon arbitrarily large. Finally, in Section 9 we discuss sufficient conditions for the existence of the principal eigenvectors needed to support our analysis.
LONG-TERM RISK
181
2. STOCHASTIC DISCOUNT FACTORS AND PRICING Consider a continuous-time Markov process {Xt : t ≥ 0} and the (completed) filtration Ft generated by its histories. A strictly positive stochastic discount factor process {St : t ≥ 0} is an adapted (St is Ft measurable) positive process that is used to discount payoffs. The date zero price of a claim to the payoff Πt at t is E[St Πt |F0 ] For each date t, this representation follows from the representation of positive linear functionals on appropriately constructed payoff spaces. The stochastic discount factor process is positive with probability 1 and satisfies S0 = 1 because of the presumed absence of arbitrage. Let τ ≤ t be an intermediate trading date between date zero and date t. The time t + u payoff could be purchased at date zero or alternatively it could be purchased at date τ with a prior date zero purchase of a claim to the date τ purchase price. The law of one price guarantees that these two ways to acquire the payoff Πt must have the same initial cost. The same must be true if we scale the Πt by a bounded random variable available to the investor at date τ. This argument allows us to infer the date τ prices from the date zero prices. Specifically, for τ ≤ t, St (3) E Πt Fτ Sτ is the price at time τ of a claim to the payoff Πt at t. Thus once we have a date zero representation of prices at alternative investment horizons via a stochastic discount factor process, we may use that same process to represent prices at subsequent dates by forming the appropriate ratios of the date zero stochastic discount factors. This representation imposes temporal consistency in valuation enforced by the possibility of trading at intermediate dates.4 We add a Markov restriction to this well known depiction of asset prices. Expression (3) can then be used to define a pricing operator St . In particular, if ψ(Xt ) is a random payoff at t that depends only on the current Markov state, its time zero price is (4)
St ψ(x) = E[St ψ(Xt )|x0 = x]
expressed as a function of the initial Markov state. The fact that the price depends only on the current state restricts the stochastic discount factor process. While St can depend on the Markov process between dates 0 and t, we do not 4 This temporal consistency property follows formally from a “consistency axiom” in Rogers (1998).
182
L. P. HANSEN AND J. A. SCHEINKMAN
allow it to depend on previous history of the Markov state; otherwise, this history would be reflected in the date zero prices. As first remarked by Garman (1984), the temporal consistency in valuation insures the that family of such linear pricing operators {St : t ≥ 0} satisfies a semigroup property: S0 = I and St+u ψ(x) = St Su ψ(x). In our asset pricing setting, the semigroup property is thus an iterated value property that connects pricing over different time horizons.5 Consider again trading at the intermediate date τ. Then Markov valuation between dates τ and t can be depicted using the operator St−τ or it can be depicted using (3). As a consequence, St /Sτ should depend only on the Markov process between dates t and τ. Moreover, (5)
St = St−τ (θτ ) Sτ
where θτ is a shift operator that moves the time subscript of the Markov process used in the construction of St−τ forward by τ time units. Property (5) is a restriction on functionals built from an underlying Markov process, and we will call functionals that satisfy property (5) and have initial value 1 multiplicative functionals. Later we will give convenient representations of such functionals. Our approach is motivated by this multiplicative property of the stochastic discount factor and uses the connection between this multiplicative property and the semigroup property of the family of pricing operators. We will also use this multiplicative property to study the valuation of payoffs with stochastic growth components. To accommodate these other processes, we set up a more general framework in the next couple of sections. 3. MARKOV AND RELATED PROCESSES We first describe the underlying Markov process, and then build other convenient processes from this underlying process. These additional processes are used to represent stochastic discounting, stochastic growth, and returns over long horizons. 3.1. Baseline Process Let {Xt : t ≥ 0} be a continuous-time, strong Markov process defined on a probability space {Ω F Pr} with values on a state space D0 ⊂ Rn The sample paths of X are continuous from the right and with left limits, and we will sometimes also assume that this process is stationary and ergodic. Let Ft be completion of the sigma algebra generated by {Xu : 0 ≤ u ≤ t}. 5 Garman (1984) allowed for non-Markov environments. In this case, the family of operators forms an “evolution semigroup.” We adopt a Markov formulation of the law of one price for tractability.
LONG-TERM RISK
183
One simple example is the following: EXAMPLE 3.1—Finite-State Markov Chain: Consider a finite-state Markov chain with states xj for j = 1 2 N. The local evolution of this chain is governed by an N × N intensity matrix U. An intensity matrix encodes all of the transition probabilities. The matrix exp(tU) is the matrix of transition probabilities over an interval t. Since each row of a transition matrix sums to unity, each row of U must sum to zero. The diagonal entries are nonpositive and represent minus the intensity of jumping from the current state to a new one. The remaining row entries, appropriately scaled, represent the conditional probabilities of jumping to the respective states. When treating infinite-state spaces, we restrict the Markov process X to be a semimartingale. As a consequence, we can extract a continuous component X c and what remains is a pure jump process X j . The evolution of the jump component is given by j dXt = yζ(dy dt) Rn
where ζ = ζ(· ·; ω) is a random counting measure. That is, for each ω, ζ(b [0 t]; ω) gives the total number of jumps in [0 t] with a size in the Borel set b in the realization ω. In general, the associated Markov stochastic process X may have an infinite number of small jumps in any time interval. In what follows we will assume that this process has a finite number of jumps over a finite time interval. This rules out most Lévy processes, but greatly simplifies the notation. In this case, there is a finite measure η(dy|x) dt that is the compensator of the random measure ζ. It is the (unique) predictable random measure, such that for each predictable stochastic function f (x t; ω), the process t t f (y s; ω)ζ(dy ds; ω) − f (y s; ω)η[dy|Xs− (ω)] ds 0
Rn
0
Rn
is a martingale. The measure η encodes both a jump intensity and a distribution of the jump size given that a jump occurs. The jump intensity is the implied conditional measure of the entire state space D0 , and the jump distribution is the conditional measure divided by the jump intensity. We presume that the continuous sample path component satisfies the stochastic evolution dXtc = ξ(Xt− ) dt + Γ (Xt− ) dBt where B is a multivariate Ft -Brownian motion and Γ (x) Γ (x) is nonsingular. Given the rank condition, the Brownian increment can be deduced from the sample path of the state vector via dBt = [Γ (Xt− ) Γ (Xt− )]−1 Γ (Xt− ) [dXtc − ξ(Xt− ) dt]
184
L. P. HANSEN AND J. A. SCHEINKMAN
EXAMPLE 3.2—Markov Diffusion: In what follows we will often use the following example. Suppose the Markov process X has two components, X f and X o , where X f is a Feller square root process and is positive, and X o is an Ornstein–Uhlenbeck process and ranges over the real line, f f f f dXt = ξf (x¯ f − Xt ) dt + Xt σf dBt dXto = ξo (x¯ o − Xto ) dt + σo dBto f with ξi > 0, x¯ i > 0 for i = f o, and 2ξf x¯ f ≥ (σf )2 , where B = BBo is a bivariate standard Brownian motion. The parameter restrictions guarantee that there is a stationary distribution associated with X f with support contained in R+ .6 For convenience, we make the two processes independent. We use X o to model predictability in growth rates and X f to model predictability in volatility. We normalize σo to be positive and σf to be negative. We specify σf to be negative f so that a positive increment to Bt reduces volatility. 3.2. Multiplicative Functionals A functional is a stochastic process constructed from the original Markov process: DEFINITION 3.1: A functional is a real-valued process {Mt : t ≥ 0} that is adapted (Mt is Ft measurable for all t). We will assume that Mt has a version with sample paths that are continuous from the right with left limits. Recall that θ denotes the shift operator. Using this notation, write Mu (θt ) for the corresponding function of the X process shifted forward t time periods. Since Mu is constructed from the Markov process X between dates zero and u, Mu (θt ) depends only on the process between dates t and date t + u. DEFINITION 3.2: The functional M is multiplicative if M0 = 1 and Mt+u = Mu (θt )Mt . Products of multiplicative functionals are multiplicative functionals. We are particularly interested in strictly positive multiplicative functionals. In this case, one may define a new functional A = log(M) that will satisfy an additive property. It turns out that it is more convenient to parameterize M using its logarithm A. The functional A will satisfy the following definition: 6 We could accommodate the case where Bf or Bo is each multidimensional by considering a filtration {Ft } larger than the one generated by X. In effect, we would enlarge the state space in ways that were inconsequential to the computations that interest us. However, for simplicity we have assumed throughout this paper that {Ft } is the filtration generated by X.
LONG-TERM RISK
185
DEFINITION 3.3: A functional A is additive if A0 = 0 and At+u = Au (θt ) + At for each nonnegative t and u.7 Exponentials of additive functionals are strictly positive multiplicative functionals. While the joint process {(Xt At ) : t ≥ 0} is Markov, by construction the additive functional does not Granger cause the original Markov process. Instead it is constructed from that process. No additional information about the future values of X are revealed by current and past values of A. When X is restricted to be stationary, an additive functional can be nonstationary but it has stationary increments. The following are examples of additive functionals: EXAMPLE 3.3: Given any continuous function ψ, At = ψ(Xt ) − ψ(X0 ). EXAMPLE 3.4: Let β be a Borel measurable function on D0 and construct t At = β(Xu ) du 0
where
t 0
β(Xu ) du < ∞ with probability 1 for each t.
EXAMPLE 3.5: Form t γ(Xu− ) dBu At = 0
where
t 0
|γ(Xu )|2 du is finite with probability 1 for each t.
EXAMPLE 3.6: Form
κ(Xu Xu− ) At = 0≤u≤t
where κ : D0 × D0 → R, κ(x x) = 0. Sums of additive functionals are additive functionals. We may thus use Examples 3.4, 3.5, and 3.6 as building blocks in a parameterization of additive functionals. This parameterization uses a triple (β γ κ) that satisfies the following situations: t (a) β : D0 → R and 0 β(Xu ) du < ∞ for every positive t. t (b) γ : D0 → Rm and 0 |γ(Xu )|2 du < ∞ for every positive t. 7 Notice that we do not restrict additive functionals to have bounded variation as, for example, Revuz and Yor (1994).
186
L. P. HANSEN AND J. A. SCHEINKMAN
(c) κ : D0 × D0 → R, κ(x x) = 0 for all x ∈ D0 , exp(κ(y x))η(dy|x) < ∞ for all x ∈ D0 . Form t t
β(Xu ) du + γ(Xu− ) dBu + κ(Xu Xu− ) At = 0
0
0≤u≤t
t
=
β(Xu ) du 0
+
t
γ(Xu− ) [Γ (Xu− ) Γ (Xu− )]−1 Γ (Xu− ) [dXuc − ξ(Xu− ) du]
0
+
κ(Xu Xu− )
0≤u≤t
This additive functional is a semimartingale. While we will use extensively these parameterizations of an additive functional, they are not exhaustive as the following example illustrates. EXAMPLE 3.7: Suppose that {Xt : t ≥ 0} is a standard scalar Brownian motion, suppose b is a Borel set in R, and define the occupation time of b up to time t as t At = 1{Xu ∈b} du 0
At is an additive functional. As a consequence, the local time at a point r, defined as 1 t Lt = lim 1{Xu ∈(r−r+)} du ↓0 2 0 is also an additive functional. Since the logarithm of a strictly positive multiplicative process is an additive process, we will consider parameterized versions of strictly positive multiplicative processes by parameterizing the corresponding additive process. For instance, if M = exp(A) when A is parameterized by (β γ κ), we will say that the multiplicative process M is parameterized by (β γ κ). Notice that Ito’s lemma guarantees that dMt |γ(Xt− )|2 dt + γ(Xt− ) dBt = β(Xt− ) + Mt− 2 + exp[κ(Xt Xt− )] − 1
187
LONG-TERM RISK
The multiplicative process {Mt : t ≥ 0} of this form is a local martingale if and only if |γ|2 β+ (6) + exp[κ(y ·)] − 1 η(dy|·) = 0 2 For the remainder of this section we describe three types of multiplicative functionals that we use in our subsequent analysis. 3.3. Stochastic Discount Factors In this section we write down two parameterized examples of multiplicative stochastic discount factors that we will use to illustrate our results. EXAMPLE 3.8—Breeden Model: Using the Markov process given in Example 3.2, we consider a special case of Breeden’s (1979) consumption-based asset pricing model. Suppose that equilibrium consumption evolves according to f f dct = Xto dt + Xt ϑf dBt + ϑo dBto (7) where ct is the logarithm of consumption Ct . Given our previous sign convention that σo > 0, when ϑo > 0 a positive shock dBto is unambiguously good news because it gives a favorable movement for consumption and its growth f rate. Similarly, since σf < 0, when ϑf > 0 a positive shock dBt is unambiguously good news because it reduces volatility while increasing consumption. Suppose also that investors’ preferences are given by
∞
E
exp(−bt) 0
Ct 1−a − 1 dt 1−a
for a and b strictly positive. The implied stochastic discount factor is St = exp(Ast ), where t t t f s o f Xs ds − bt − a ϑo dBso Xs ϑf dBs − a At = −a 0
0
0
EXAMPLE 3.9—Kreps–Porteus Model: When investors have time-separable logarithmic utility and perfect foresight, the continuation value process W ∗ for the consumption process satisfies the differential equation (8)
dWt ∗ = b(Wt ∗ − ct ) dt
where b is the subjective rate of time discount. This equation is solved forward with an appropriate “terminal” condition. In constructing this differential
188
L. P. HANSEN AND J. A. SCHEINKMAN
equation, we have scaled the logarithm of consumption by b for convenience. Let Wt =
1 exp[(1 − a)Wt ∗ ] 1−a
for a > 1 and notice that Wt is an increasing transformation of Wt ∗ . Thus for the purposes of representing preferences, W can be used as an ordinally equivalent continuation value process. The process W satisfies the differential equation
dWt 1 = b(1 − a)Wt (9) log[(1 − a)Wt ] − ct dt 1−a = bWt (a − 1)ct + log[(1 − a)Wt ] Next suppose that investors do not have perfect foresight. We may now think of the right-hand sides of (8) and (9) as defining the drift or local means of the continuation values. As we know from Kreps and Porteus (1978) and Duffie and Epstein (1992), the resulting preferences cease to be ordinally equivalent. The first gives the recursive equation for continuation values that are expectations of the discounted logarithmic utility. Instead we use the counterpart to the second differential equation: lim ↓0
E(Wt+ − Wt |Ft ) = bWt (a − 1)ct + log[(1 − a)Wt ]
where a > 1. The resulting preferences can be viewed as a special case of the continuous-time version of the preferences suggested by Kreps and Porteus (1978) and of the stochastic differential utility model of Duffie and Epstein (1992) and Schroder and Skiadas (1999). If we were to take the continuation value process W as a starting point in a stochastic environment and transform back to the utility index W ∗ using Wt ∗ =
1 log[(1 − a)Wt ] 1−a
the resulting drift would include a contribution of the local variance as an application of Ito’s lemma. For these preferences the intertemporal composition of risk matters. Bansal and Yaron (2004) have used this feature of preferences in conjunction with predictable components in consumption and consumption volatility as a device to amplify risk premia. This particular utility recursion we use imposes a unitary elasticity of intertemporal substitution as in the original preference specification with logarithmic utility. The parameter a alters risk prices as we will illustrate.8 8 Epstein and Zin (1989) used a more general discrete-time version of these preferences as a way to distinguish risk aversion from intertemporal substitution.
189
LONG-TERM RISK
Suppose again that consumption evolves according to equation (7). Conjecture a continuation value process of the form Wt =
1 f exp[(1 − a)(wf Xt + wo Xto + ct + w¯ )] 1−a
The coefficients satisfy −ξf wf +
(1 − a)σf2 2
(wf )2 + (1 − a)ϑf σf wf +
(1 − a)ϑ2f 2
= bwf
−ξo wo + 1 = bwo (1 − a)σo2 (wo )2 2 (1 − a)ϑ2o = bw¯ + (1 − a)ϑo σo wo + 2
ξf x¯ f wf + ξo x¯ o wo +
The equation for wf is quadratic and there are potentially two solutions. The solution that interests us is wf = (a − 1)σf ϑf + b + ξf − [(a − 1)σf ϑf + b + ξf ]2 − (a − 1)2 σf2 ϑ2f /((1 − a)σf2 ) See Appendix A. Furthermore, w0 > 0 and, as we show in Appendix A, wf < 0. The stochastic discount factor is the product of two multiplicative functionals. One has the same form as the Breeden model with a logarithmic instantaneous utility function. It is the exponential of t t t f B o f At = − Xs ds − bt − Xs ϑf dBs − ϑo dBso 0
0
0
The other functional is a martingale constructed from the forward-looking continuation value process. It is the exponential of t t f Xs (ϑf + wf σf ) dBsf + (1 − a) (ϑo + wo σo ) dBso Awt = (1 − a) 0
−
(1 − a) 2
2
0
t
Xsf (ϑf + wf σf )2 ds − 0
(1 − a)2 (ϑo + wo σo )2 t 2
We next consider a variety of ways in which multiplicative functionals can be used to build models of asset prices and to characterize the resulting implications.
190
L. P. HANSEN AND J. A. SCHEINKMAN
3.4. Valuation Functionals and Returns We use a special class of multiplicative functionals called valuation functionals to characterize local pricing. The result of this analysis will be the Markov version of a local risk-return trade-off. A valuation functional is constructed to have the following property. If the future value of the process is the payout, the current value is the price of that payout. For instance, a valuation process could be the result of continually reinvesting dividends in a primitive asset. Equivalently, it can be constructed by continually compounding realized returns to an investment. To characterize local pricing, we use valuation processes that are multiplicative functionals. Recall that the product of two multiplicative functionals is a multiplicative functional. The following definition is motivated by the connection between the absence of arbitrage and the martingale properties of properly normalized prices. DEFINITION 3.4: A valuation functional {Vt : t ≥ 0} is a multiplicative functional such that the product functional {Vt St : t ≥ 0} is a martingale. Provided that V is strictly positive, the associated gross returns over any horizon u can be calculated by forming ratios: Rtt+u =
Vt+u Vt−
This increment in the value functional scaled by the current (pre-jump) value gives an instantaneous net return. The martingale property of the product V S gives a local pricing restriction for returns. To deduce a convenient and familiar risk-return relation, consider the multiplicative functional M = V S, where V is parameterized by (βv γv κv ) and {St : t ≥ 0} is parameterized by (βs γs κs ). In particular, the implied net return evolution is dVt |γv (Xt− )|2 dt + γv (Xt− ) dBt = βv (Xt− ) + Vt− 2 + exp[κv (Xt Xt− )] − 1 Thus the expected net rate of return is |γv |2 + exp[κv (y ·)] − 1 η(dy|·) εv = βv + 2 Since both V and S are exponentials of additive processes, their product is the exponential of an additive process and is parameterized by β = βv + βs
LONG-TERM RISK
191
γ = γv + γ s κ = κ v + κs PROPOSITION 3.1: A valuation functional parameterized by (βv γv κv ) satisfies the pricing restriction |γv + γs |2 βv + βs = − − exp[κv (y ·) + κs (y ·)] − 1 η(dy|·) (10) 2 PROOF: The proof follows from the definition of a valuation functional and the martingale restriction (6). Q.E.D. This restriction is local and determines the instantaneous risk-return relation. The parameters (γv κv ) determine the Brownian and jump risk exposure. The following corollary gives the required local mean rate of return: COROLLARY 3.1: The required mean rate of return for the risk exposure (γv κv ) is |γs |2 εv = −βs − γv · γs − 2 − exp[κv (y ·) + κs (y ·)] − exp[κv (y ·)] η(dy ·) The vector −γs contains the factor risk prices for the Brownian motion components. The function κs is used to price exposure to jump risk. Then εv is the required expected rate of return expressed as a function of the risk exposure. This local relation is familiar from the extensive literature on continuous-time asset pricing.9 In the case of Brownian motion risk, the local risk price vector of the exposure to risk is given by −γs . A valuation functional is typically constructed from the values of a selffinancing strategy. Not every self-financing strategy results in a valuation which can be written as a multiplicative functional, but the class of (multiplicative) valuation functionals is sufficiently rich to extract the implied local risk prices. For this reason, we restrict ourselves in this paper to (multiplicative) valuation functionals. EXAMPLE 3.10—Breeden Example (Continued): Consider again the Markov diffusion Example 3.2 with the stochastic discount factor given in Exam9
Shaliastovich and Tauchen (2005) presented a structural model of asset prices in discrete time with a Lévy component to the risk exposure. The continuous-time counterpart would include Markov processes with an infinite number of jumps expected in any finite time interval.
192
L. P. HANSEN AND J. A. SCHEINKMAN
ple 3.8. This is a Markov version of Breeden’s model. The local risk price for exposure to the vector of Brownian motion increments is −γs =
√ a xf ϑf aϑo
and the instantaneous risk-free rate is b + axo −
a2 (xf (ϑf )2 + (ϑo )2 )
2
Consider a family of valuation processes parameterized by (β γ), where √ γ(x) = ( xf γf γo ). To satisfy the martingale restriction, we must have 1 β(x) = b + axo − [xf (γf − aϑf )2 + (γo − aϑo )2 ] 2 EXAMPLE 3.11 —Kreps–Porteus Model (Continued): Consider again the Markov diffusion Example 3.2 with the stochastic discount factor given in Example 3.9. The local risk price for exposure to the vector of Brownian motion increments is √ √ a xf ϑf + (a − 1) xf wf σf −γs = aϑo + (a − 1)wo σo and the instantaneous risk-free rate is 1 [xf (ϑf )2 + (ϑo )2 ] − (a − 1)xf ϑf (ϑf + wf σf ) 2 − (a − 1)ϑo (ϑo + wo σo )
b + xo −
In particular, the local risk prices are larger than for their counterparts in the Breeden (1979) model when ϑo and ϑf are both positive.10 As we have seen, alternative valuation functionals reflect alternative risk exposures. The examples we just discussed show how the required expected rate of return (βv ) for a given local risk exposures (γv κv ) depends on the underlying economic model and the associated parameter values. The methods we will describe allow us to characterize the behavior of expectations of valuation functionals over long horizons. 10 When ϑf is positive, Kleshchelski and Vincent (2007) showed that the real term structure will be often downward sloping.
LONG-TERM RISK
193
3.5. Growth Functionals In our analysis of valuation, we will investigate the pricing of cash flows as we extend the payoff horizon. To investigate the value implications for cash flows that grow stochastically, we introduce a reference growth process {Gt : t ≥ 0} that is a positive multiplicative functional. Consider a cash flow that can be represented as (11)
Dt = Gt ψ(Xt )D0
for some initial condition D0 , where G is a multiplicative functional. The initial condition is introduced to offset the restriction that multiplicative functionals are normalized to be unity at date zero. Heuristically, we may think of ψ(X) as the stationary component of the cash flow and of G as the growth component.11 As we will illustrate, however, the covariance between components sometimes makes this interpretation problematic. The fact that the product of multiplicative functionals is a multiplicative functional implies that the product of a stochastic discount factor functional and a growth functional is itself multiplicative. This property facilitates the construction of valuation operators designed for cash-flow processes that grow stochastically over time. In Section 2 we emphasized the connection between the multiplicative property of stochastic discount factors and the semigroup property of pricing operators. In the next section we discuss how multiplicative functionals give rise to semigroups. This development lays the groundwork for considering a variety of ways in which multiplicative functionals and their implied semigroups can be used to characterize the implications of asset pricing models over long horizons. 4. MULTIPLICATIVE FUNCTIONALS AND SEMIGROUPS Given a multiplicative functional M, our aim is to establish properties of the family of operators (12)
Mt ψ(x) = E[Mt ψ(Xt )|X0 = x] 4.1. Semigroups
Let L be a Banach space with norm · and let {Tt : t ≥ 0} be a family of operators on L. The operators in these family are linked according to the following property: 11 One can easily write down securities with a payout that cannot be represented as in equation (11), but we are interested in deriving properties of the pricing of securities with a payout as in (11) to evaluate alternative models and parameter configurations.
194
L. P. HANSEN AND J. A. SCHEINKMAN
DEFINITION 4.1: A family of linear operators {Tt : t ≥ 0} is a one-parameter semigroup if T0 = I and Tt+s = Tt Ts for all s t ≥ 0. One possibility is that these operators are conditional expectations operators, in which case this link typically follows from the law of iterated expectations restricted to Markov processes. We will also use such families of operators to study valuation and pricing. As we argued in Section 2, from a pricing perspective, the semigroup property follows from the Markov version of the law of iterated values, which holds when there is frictionless trading at intermediate dates. We will often impose further restrictions on semigroups such as follows: DEFINITION 4.2: The semigroup {Tt : t ≥ 0} is positive if for any t ≥ 0, Tt ψ ≥ 0 whenever ψ ≥ 0. 4.2. Multiplicative Semigroup The semigroups that interest us are constructed from multiplicative functionals. PROPOSITION 4.1: Let M be a multiplicative functional such that for each ψ ∈ L, E[Mt ψ(Xt )|X0 = x] ∈ L. Then Mt ψ(x) = E[Mt ψ(Xt )|X0 = x] is a semigroup in L. PROOF: For ψ ∈ L, M0 ψ = ψ and Mt+u ψ(x) = E E[Mt+u ψ(Xt+u )|Ft ]X0 = x = E E Mt Mu (θt )ψ(Xu (θt ))|Ft X0 = x = E Mt E Mu (θt )ψ(Xu (θt ))|X0 (θt ) X0 = x = E[Mt Mu ψ(Xt )|X0 = x] = Mt Mu ψ(x) which establishes the semigroup property.
Q.E.D.
In what follows we will refer to semigroups constructed from multiplicative functionals as in this proposition as multiplicative semigroups. If the multiplicative process is a stochastic discount factor, we will refer to the corresponding multiplicative semigroup as the pricing semigroup. Other semigroups also interest us.
LONG-TERM RISK
195
4.3. Valuation Semigroups Associated with a valuation functional V is a semigroup {Vt : t ≥ 0}. For any such valuational functional, we will derive the asymptotic growth rates of the implied cumulative return over long time horizons. The limiting growth rate expressed as a function of the risk exposures (γv κv ) gives one version of longterm risk-return trade-off. While measurement of long-horizon returns in loglinear environments has commanded much attention, operator methods can accommodate volatility movements as well. (See Bansal, Dittmar, and Kiku (2008) for a recent addition to this literature.) Our characterization of the long-run expected rate of return is motivated by our aim to quantify a risk-return relation. In contrast, Stutzer (2003) used the conditional expectation of a valuation functional raised to a negative power to develop a large deviation criterion for portfolio evaluation over large horizons. He also related this formulation to the familiar power utility model applied to terminal wealth appropriately scaled. Since a multiplicative functional raised to a negative power remains multiplicative, the limits we characterize are also germane to his analysis. In what follows we will suggest another way to represent a long-term risk return trade-off. 4.4. Semigroups Induced by Cash-Flow Growth We study cash flows with a common growth component using the semigroup Qt ψ(x) = E[Gt St ψ(Xt )|X0 = x] instead of the pricing semigroup {St } constructed previously. The date zero price assigned to Dt is D0 Qt ψ(X0 ). More generally, the date τ price assigned to Dt+τ is D0 Gτ Qt ψ(Xτ ). Thus the date τ price to (current period) payout ratio is D0 Gτ Qt ψ(Xτ ) Qt ψ(Xτ ) = Dτ ψ(Xτ ) provided that ψ(Xτ ) is different from zero. For a security such as an equity with a perpetual process of cash payouts or dividends, the price–dividend ratio is the integral of all such terms for t ≥ 0. Our subsequent analysis will characterize the limiting contribution to this value. The rate of decay of Qt ψ(Xτ ) as t increases will give a measure of the duration of the cash flow as it contributes to the value of the asset. This semigroup assigns values to cash flows with common growth component G but alternative transient contributions ψ. To study how valuation is altered when we change stochastic growth, we will be led to alter the semigroup.
196
L. P. HANSEN AND J. A. SCHEINKMAN
When the growth process is degenerate and equal to unity, the semigroup is identical to the one constructed previously in Section 2. This semigroup is useful in studying the valuation of stationary cash flows including discount bonds and the term structure of interest rates. It supports local pricing and generalizations of the analyses in Backus and Zin (1994) and Alvarez and Jermann (2005) that use fixed income securities to make inferences about economic fundamentals. This semigroup offers a convenient benchmark for the study of long-term risk just as a risk-free rate offers a convenient benchmark in local pricing. The decomposition (11) used in this semigroup construction is not unique. For instance, let ϕ be a strictly positive function of the Markov state. Then ϕ(Xt ) ψ(Xt ) [D0 ϕ(X0 )] Dt = Gt ψ(Xt )D0 = Gt ϕ(X0 ) ϕ(Xt ) Since (ψ(Xt ))/(ϕ(Xt )) is a transient component, we can produce (infinitely) many such decompositions. For decomposition (11) to be unique, we must thus restrict the growth component. ˆ t , where G ˆ is a marA convenient restriction is to require that Gt = exp(δt)G tingale. With this choice, by construction G has a constant conditional growth ˆ from a large rate δ. Later we show how to extract martingale components, G’s, class of multiplicative functionals G. In this way we will establish the existence of such a decomposition. Even with this restriction, the decomposition will not necessarily be unique, but we will justify a particular choice. We investigate long-term risk by changing the reference growth functionals. These functionals capture the long-term risk exposure of the cash flow. Our approach extends the analysis of Hansen, Heaton, and Li (2008) beyond loglinear environments. As we will demonstrate, the valuation of cash flows with common reference growth functionals will be approximated by a single dominant component when the valuation horizon becomes long. Thus the contributions to value that come many periods into the future will be approximated by a single pricing factor that incorporates an adjustment for risk. Changing the reference growth functional alters the long-term risk exposure with a corresponding adjustment in valuation. Each reference growth functional will be associated with a distinct semigroup. We will characterize long-term risk formally by studying the limiting behavior of the corresponding semigroup. As we have just seen, semigroups used for valuing growth claims are constructed by forming products of two multiplicative functionals: a stochastic discount factor functional and a growth functional. Pricing stationary claims and constructing cumulative returns lead to the construction of alternative multiplicative functionals. Table I gives a summary of the alternative multiplicative functionals and semigroups. For this reason, we will study the behavior of a general multiplicative semigroup: Mt ψ(x) = E[Mt ψ(Xt )|X0 = x]
LONG-TERM RISK
197
TABLE I ALTERNATIVE SEMIGROUPS AND MULTIPLICATIVE FUNCTIONALS Object
Stochastic discount factor Cumulated return Stochastic growth Valuation with stochastic growth
Multiplicative Functional
Semigroup
S V G Q = GS
{St } {Vt } {Gt } {Qt }
for some strictly positive multiplicative functional M. The next three sections establish some basic representation and approximation results for multiplicative semigroups that are needed for our subsequent economic analysis of long-term risk. An important vehicle in this study is the extended generator associated with a multiplicative process. This generator is a local (in time) construct. We develop its properties in Section 5. In Section 6 we show how to use a principal eigenfunction of this extended generator to construct our basic multiplicative decomposition (1). As we demonstrate in Section 7, an appropriately chosen eigenfunction and its associated eigenvalue dictate the long-term behavior of a multiplicative semigroup and the corresponding multiplicative functional. After establishing these basic results, we turn to the featured application in our paper: How do we characterize the longterm risk-return trade-off? 5. GENERATORS In this section we define a notion of an extended generator associated with a multiplicative functional. The definition parallels the definition of an extended (infinitesimal) generator associated with Markov processes as in, for example, Revuz and Yor (1994). Our extended generator associates to each function ψ a function χ such that Mt χ(Xt ) is the “expected time derivative” of Mt ψ(Xt ). DEFINITION 5.1: A Borel function ψ belongs to the domain of the extended generator A of the multiplicative functional M if there exists a Borel function t χ such that Nt = Mt ψ(Xt ) − ψ(X0 ) − 0 Ms χ(Xs ) ds is a local martingale with respect to the filtration {Ft : t ≥ 0}. In this case, the extended generator assigns the function χ to ψ and we write χ = Aψ. For strictly positive multiplicative processes M, the extended generator is (up to sets of measure zero) single valued and linear. In the remainder of the paper, if the context is clear, we often refer to the extended generator simply as the generator. Our first example deals with Markov chains:
198
L. P. HANSEN AND J. A. SCHEINKMAN
EXAMPLE 5.1—Markov Chain Generator: Recall the finite-state Markov chain Example 3.1 with intensity matrix U. Let uij denote entry (i j) of this matrix. Consider a multiplicative functional that is the product of two components. The first component decays at rate βi when the Markov state is xi . The second component only changes when the Markov process jumps from state i to state j, in which case the multiplicative functional is scaled by exp[κ(xj xi )]. From this construction we can deduce the generator A for the multiplicative semigroup depicted as a matrix with entry (i j): uii − βi if i = j, aij = uij exp[κ(xj xi )] if i = j. This formula uses the fact that in computing the generator we are scaling probabilities by the potential proportional changes in the multiplicative functional. The matrix A is not necessarily an intensity matrix. The row sums are not necessarily zero. The reason for this is that the multiplicative functional can include pure discount effects or pure growth effects. These effects can be present even when the βi ’s are zero, since it is typically the case that
uij exp[κ(xj xi )] = −uii j =i
The unit function is a trivial example of a multiplicative functional. In this case the extended generator is exactly what is called in the literature the extended generator of the Markov process X. When X is parameterized by (η ξ Γ ), Ito’s formula shows that the generator has the representation
∂2 φ(x) ∂φ(x) 1 (13) + trace Σ(x) Aφ(x) = ξ(x) · ∂x 2 ∂x ∂x + [φ(y) − φ(x)]η(dy|x) where Σ = Γ Γ provided φ is C 2 and the integral in (13) is finite. Recall our earlier parameterization of an additive functional A in terms of the triple (β γ κ). The process M = exp(A) is a multiplicative functional. We now display how to go from the extended generator of the Markov process X, that is, the generator associated with M ≡ 1, to the extended generator of the multiplicative functional M. The formulas below use the parameterization for the multiplicative process to transform the generator of the Markov process into the generator of the multiplicative semigroup and are consequences of Ito’s lemma: (a) Jump measure: exp[κ(y x)]η(dy|x). (b) First derivative term: ξ(x) + Γ (x)γ(x). (c) Second derivative term: Σ(x).
LONG-TERM RISK
199
(d) Level term: β(x) + |γ(x)|2 /2 + (exp[κ(y x)] − 1)η(dy x). The Markov chain example that we discussed above can be seen as a special case where γ ξ and Γ are all null. There are a variety of direct applications of this analysis. In the case of the stochastic discount factor introduced in Section 3.3, the generator encodes the local prices reflected in the local risk-return trade-off of Proposition 3.1. The level term that arises gives the instantaneous version of a risk-free rate. In the absence of jump risk, the increment to the drift gives the factor risk prices. The function κ shows us how to value jump risk in small increments in time. In a further application, Anderson, Hansen, and Sargent (2003) used this decomposition to characterize the relation among four alternative semigroups, each of which is associated with an alternative multiplicative process. Anderson, Hansen, and Sargent (2003) featured models of robust decision making. In addition to the generator for the original Markov process, a second generator depicts the worst-case Markov process used to support the robust equilibrium. There is a third generator of an equilibrium pricing semigroup, and a fourth generator of a semigroup used to measure the statistical discrepancy between the original model and the worst-case Markov model. 6. PRINCIPAL EIGENFUNCTIONS AND MARTINGALES As stated in the Introduction, we use a decomposition of the multiplicative functional to study long-run behavior. We construct this decomposition using an appropriate eigenfunction of the generator associated to the multiplicative functional. DEFINITION 6.1: A Borel function φ is an eigenfunction of the extended generator A with eigenvalue ρ if Aφ = ρφ. Intuitively, if φ is an eigenfunction, the “expected time derivative” of Mt φ(Xt ) is ρMt φ(Xt ). Hence the expected time derivative of exp(−ρ × t)Mt φ(Xt ) is zero. The next proposition formalizes this intuition. PROPOSITION 6.1: Suppose that φ is an eigenfunction of the extended generator associated with the eigenvalue ρ. Then exp(−ρt)Mt φ(Xt ) is a local martingale. t PROOF: Nt = Mt ψ(Xt ) − ψ(X0 ) − ρ 0 Ms ψ(Xs ) ds is a local martingale that is continuous from the right with left limits and thus is a semimartingale (Protter (2005, Chap. 3, Corollary to Theorem 26)); hence Yt = Mt φ(Xt ) is also a
200
L. P. HANSEN AND J. A. SCHEINKMAN
semimartingale. Since dNt = dYt − ρYt− dt, integration by parts yields t t exp(−ρt)Yt − Y0 = − ρ exp(−ρs)Ys ds + exp(−ρs) dYs =
0
0
t
exp(−ρs) dNs 0
Q.E.D.
It is the strictly positive eigenfunctions that interest us. DEFINITION 6.2: A principal eigenfunction of the extended generator is an eigenfunction that is strictly positive. COROLLARY 6.1: Suppose that φ is a principal eigenfunction with eigenvalue ρ for the extended generator of the multiplicative functional M. Then this multiplicative functional can be decomposed as ˆ t φ(X0 ) Mt = exp(ρt)M φ(Xt ) ˆ t = exp(−ρt)Mt (φ(Xt ))/(φ(X0 )) is a local martingale and a multiwhere M plicative functional. ˆ be the local martingale from Corollary 6.1. Since M ˆ is bounded from Let M below, the local martingale is necessarily a supermartingale and thus for t ≥ u, ˆ u ˆ t |Fu ) ≤ M E(M We are primarily interested in the case in which this local martingale is actually a martingale: ˆ is a martingale. ASSUMPTION 6.1: The local martingale M For Assumption 6.1 to hold, it suffices that the local martingale N introduced in the proof of Proposition 6.1 is a martingale. In Appendix C we give primitive conditions that imply Assumption 6.1. ˆ to find a new probability on the When Assumption 6.1 holds, we may use M sigma algebra generated by Ft for each t. Later, we will use this new probability to establish approximation results that hold for long horizons. We did not restrict φ to belong to the Banach space L where the semigroup ˆ is a supermartingale, Mt φ := exp(ρt)φ(x) × {Mt : t ≥ 0} was defined. Since M ˆ t |X0 = x] is always well defined. In addition, the semigroup {M ˆ t : t ≥ 0} is E[M well defined at least on the Banach space of bounded Borel measurable functions. Moreover:
LONG-TERM RISK
201
PROPOSITION 6.2: If φ is a principal eigenfunction with eigenvalue ρ for the extended generator of the multiplicative functional M, then for each t ≥ 0, exp(ρt)φ ≥ Mt φ. If, in addition, Assumption 6.1 holds then, for each t ≥ 0, (14)
Mt φ = exp(ρt)φ
Conversely, if φ is strictly positive, Mt φ is well defined for t ≥ 0, and (14) holds, ˆ is a martingale. then M PROOF: ˆ t |X0 = x] = 1 ≥ E[M
exp(−ρt) E[Mt φ(Xt )|X0 = x] φ(x)
ˆ is a martingale. Conversely, using (14) and the multiwith equality when M plicative property of M one obtains, E exp(−ρt)Mt φ(Xt )|Fs = exp(−ρt)Ms E Mt−s (θs )φ(Xt )|Xs = exp(−ρs)Ms φ(Xs )
Q.E.D.
Proposition 6.2 guarantees that under Assumption 6.1 a principal eigenfunction of the extended generator also solves the principal eigenvalue problem given by (14). Conversely, a strictly positive solution to the principal eigenvalue ˆ problem (14) yields a decomposition as in Corollary 6.1, where the process M is actually a martingale. In light of the decomposition given by Corollary 6.1, when the local marˆ is a martingale, we will sometimes refer to ρ as the growth rate tingale M ˆ as its martingale component, and of the multiplicative functional M, to M to (φ(X0 ))/(φ(Xt )) as its transient or stationary component. This decomposition is typically not unique, however. As we have defined them, there may be multiple principal eigenfunctions even after a normalization. Each of these principal eigenfunctions implies a distinct decomposition, provided that we establish that the associated local martingales are martingales. Since the martingale and the stationary components are correlated, it can happen that ˆ t (φ(X0 ))/(φ(Xt ))|X0 = x] diverges exponentially, challenging the interE[M pretation that ρ is the asymptotic growth rate of the semigroup. We take up this issue in the next section. REMARK 6.1: There are well known martingale decompositions of additive functionals with stationary increments used to deduce the central limit approximation and to characterize the role of permanent shocks in time series. The nonlinear, continuous-time Markov version of such a decomposition is At = ωt + mt − υ(Xt ) + υ(X0 )
202
L. P. HANSEN AND J. A. SCHEINKMAN
where {mt : t ≥ 0} is a martingale with stationary increments (see Bhattacharya (1982) and Hansen and Scheinkman (1995)). Exponentiating this decomposition yields another decomposition similar to the one in Corollary 6.1 except that the exponential of a martingale is not a martingale. When the martingale increments are constant functions of Brownian increments, then exponential adjustment has simple consequences.12 In particular, the exponential adjustment is offset by changing ω. With state dependent volatility in the martingale approximation, however, there is no longer a direct link between the additive and the multiplicative decompositions. In this case, the multiplicative decomposition of Corollary 6.1 is the one that is valuable for our purposes. EXAMPLE 6.1—Markov Chain Example: Recall that for a finite-state space, we can represent the Markov process in terms of a matrix U that serves as its generator. Previously we constructed the corresponding generator A of the multiplicative semigroup. For this example, the generator is a matrix. A principal eigenvector is found by finding an eigenvector of A with strictly positive entries. Standard Perron–Frobenius theory implies that if the chain is irreducible, since the multiplicative functional is strictly positive, there is such an eigenvector which is unique up to scale. While there is uniqueness in the case of an irreducible finite-state chain, there can be multiple solutions in more general settings. EXAMPLE 6.2—Markov Diffusion Example (Continued): Consider a multiplicative process M = exp(A), where (15)
¯ + At = βt
t
t
t
βf Xsf ds + 0
0
t
f
βo Xso ds +
Xs γf dBsf + 0
γo dBso 0
where X f and X o are given in Example 3.2. Guess an eigenfunction of the form exp(cf xf + co xo ). The corresponding eigenvalue equation is γf2
γo2 2 2 + cf [ξf (x¯ f − xf ) + xf γf σf ] + co [ξo (x¯ o − xo ) + γo σo ]
ρ = β¯ + βf xf + βo xo +
+ (cf )2 xf
12
σf2 2
+ (co )2
xf +
σo2 2
This is the case studied by Hansen, Heaton, and Li (2008).
203
LONG-TERM RISK
This generates two equations: one that equates the coefficients of xf to zero and another that equates the coefficients of xo to zero: 0 = βf +
γf2
+ cf (γf σf − ξf ) + (cf )2
2 0 = βo − co ξo
σf2 2
The solution to the first equation is (ξf − γf σf ) ± (ξf − γf σf )2 − σf2 (2βf + γf2 ) (16) cf = (σf )2 provided that (ξf − γf σf )2 − σf2 (2βf + γf2 ) ≥ 0 The solution to the second equation is (17)
co =
βo ξo
The resulting eigenvalue is ρ = β¯ +
σ2 γo2 + cf ξf x¯ f + co (ξo x¯ o + γo σo ) + (co )2 o 2 2
Write f
o ˆ t = exp(−ρt)Mt exp(cf Xt + co Xt ) M f exp(cf X0 + co X0o )
ˆ is multiplicative, we can express it as M ˆ t = exp(Aˆ t ), where Since M t t f Aˆ t = Xs (γf + cf σf ) dBsf + (γo + co σo ) dBso 0
(γf + cf σf )2 − 2
0
t 0
(γo + co σo )2 t Xsf ds − 2
ˆ is always a martingale. Hence we may use this Also, it can be shown that M martingale to define a new probability measure. Using this new measure entails appending an extra drift to the law of motion of X. The resulting distorted or twisted drift for X f is ξf (x¯ f − xf ) + xf σf (γf + cf σf )
204
L. P. HANSEN AND J. A. SCHEINKMAN
and the drift for X o is ξo (x¯ o − xo ) + σo (γo + co σo ) Later we will argue that only one of these solutions interests us. We will select a solution for cf so that the implied distorted process for X f remains stationary. Notice that ξf (x¯ f − xf ) + xf σf (γf + cf σf ) = ξf x¯ f ± xf (ξf − γf σf )2 − σf2 (2βf + γf2 ) For mean reversion to exist, we require that the coefficient on xf be negative. REMARK 6.2: At the cost of an increase in notational complexity, we could add an “affine” jump component as in Duffie, Pan, and Singleton (2000). Suppose that the state variable X o , instead of being an Ornstein–Uhlenbeck process, satisfies dXto = ξo (x¯ o − Xto ) dt + σo dBto + dZt where Z is a pure jump process whose jumps have a fixed probability distribution ν on R and arrive with intensity 1 xf + 2 with 1 ≥ 0, 2 ≥ 0. Suppose that the additive functional A has an additional jump term modeled using κ(y x) = κ(y ¯ o − xo ) for y = x and exp[κ(z)] ¯ dν(z) < ∞. The generator A has now an extra term given by (1 xf + 2 ) [φ(xf xo + z) − φ(xf xo )] exp[κ(z)] ¯ dν(z) Hence when φ(x) = exp(cf xf + co xo ), the extra term reduces to ¯ dν(z) (1 xf + 2 ) exp(cf xf + co xo ) [exp(co z) − 1] exp[κ(z)] As before we must have co =
βo ξo
and hence cf must solve σf2 + cf (γf σf − ξf ) + (cf )2 2 2 βo + 1 exp z − 1 exp[κ(z)] ¯ dν(z) ξo
0 = βf +
γf2
LONG-TERM RISK
205
The resulting eigenvalue is σ2 γ2 ρ = β¯ + o + cf ξf x¯ f + co (ξo x¯ o + γo σo ) + (co )2 o 2 2 βo + 2 exp z − 1 exp[κ(z)] ¯ dν(z) ξo 7. LONG-RUN DOMINANCE In this section we establish approximation results for semigroups that apply over long time horizons. The limiting result we justify is ψ (18) d ς ˆ lim exp(−ρt)Mt ψ = φ t→∞ φ where the limit is expressed in terms of principal eigenvalue ρ and eigenfunction φ for a collection of functions ψ and a measure ςˆ that we will characterize. We have illustrated that there may be multiple principle eigenfunctions. We show that at most one of these principle eigenfunctions is the one germane for establishing this limiting behavior. In light of (18), the eigenvalue ρ governs the growth (or decay) of the semigroup. When we rescale the semigroup to eliminate this growth (decay), the limiting state dependence is proportional to the dominant eigenfunction φ (which is itself only determined up to a scale factor) for alternative functions ψ. The precise characterization of this limiting behavior of the semigroup provides the fundamental inputs to our characterization of valuation over long time horizons. It provides us with a measure of longterm growth rates is asset payoffs and of long-term decay rates in the values assigned to these payoffs. It gives us ways to formalize long-term risk-return trade-offs for nonlinear Markov models as we will show in the next section. Prior to our more general investigation, we first illustrate the results in the case of a Markov chain. 7.1. Markov Chain Consider the finite-state Markov chain example with intensity matrix U. In this section we will study the long-run behavior of the semigroup by solving the eigenvalue problem Aφ = ρφ for an eigenvector φ with strictly positive entries and a real eigenvalue ρ. This solution exists whenever the chain is irreducible and the multiplicative functional is strictly positive. Given this solution, then Mt φ = exp(tA)φ = exp(ρt)φ
206
L. P. HANSEN AND J. A. SCHEINKMAN
The beauty of Perron–Frobenius theory is that ρ is the eigenvalue that dominates in the long run. Its real part is strictly larger than the real parts of all of the other eigenvalues. This property dictates its dominant role. To see this, suppose for simplicity that the matrix A has distinct eigenvalues, A = TDT−1 where T is a matrix with eigenvectors in each column and D is a diagonal matrix of eigenvalues. Then exp(tA) = T exp(tD)T−1 Let the first entry of D be ρ and let the first column of T be φ. Scaling by exp(−ρt) and taking limits, lim exp(−ρt) exp(tA)ψ = (φ∗ · ψ)φ
t→∞
where φ∗ is the first row of T −1 . Thus ρ determines the long-run growth rate of the semigroup. After adjusting for this growth, the semigroup has an approximate one factor structure in the long run. Provided that φ∗ · ψ is not zero, exp(−ρt)Mt ψ is asymptotically proportional to the dominant eigenvector φ. 7.2. General Analysis To establish this dominance more generally, we use the martingale construction as in the decomposition of Corollary 6.1 to build an alternative family of distorted Markov transition operators and apply known results about Markov operators to this alternative family. ˆ denote the exIn what follows we will maintain Assumption 6.1 and let A ˆ ˆ astended generator of the martingale M. We will also call the semigroup M ˆ the principal eigenfunction semigroup. This semigroup is well sociated with M defined at least on the space L∞ , and it maps constant functions into constant functions. Consistent with the applications that interest us, we consider only multiplicative functionals that are strictly positive. ASSUMPTION 7.1: The multiplicative functional M is strictly positive with probability 1. ˆ can be used to define a new meaAs we mentioned earlier, the martingale M sure on sets f ∈ Ft for any t. We are interested in the case where we may initialize the process X such that, under the new probability measure, the process X is stationary. The next assumption guarantees that this is possible:
LONG-TERM RISK
207
ASSUMPTION 7.2: There exists a probability measure ςˆ such that ˆ d ςˆ = 0 Aψ ˆ for all ψ in the L∞ domain of the generator A. ˆ for the expectation operator and the probability measure We write Eˆ and Pr obtained when we use ςˆ as the distribution of the initial state X0 and the marˆ to distort the transition probabilities, that is, for each event f ∈ Ft , tingale M ˆ t 1f |X0 = x] d ς(x) ˆ ˆ Pr(f ) = E[M ˆ For each x, the probability Pr(·|X 0 = x) is absolutely continuous with respect to probability measure implied by Pr conditioned on X0 = x when restricted to Ft for each t ≥ 0. Assumption 7.2 guarantees that ςˆ is a stationary distribution for the distorted Markov process. (For example, see Proposition 9.2 of Ethier and Kurtz (1986).) Furthermore, ψ(Xt ) (19) E[Mt ψ(Xt )|X0 = x] = exp(ρt)φ(x)Eˆ X0 = x φ(Xt ) If we treat exp(−ρt)φ(Xt ) as a numeraire, equation (19) is reminiscent of the familiar risk-neutral pricing in finance. Note, however, that the numeraire depends on the eigenvalue–eigenfunction pair, and equation (19) applies even when the multiplicative process does not define a price.13 Let Δˆ > 0 and consider the discrete time Markov process obtained by samˆ for j = 0 1 This discrete process is often referred pling the process at Δj to as a skeleton. In what follows we assume that the resulting discrete time process is irreducible. ASSUMPTION 7.3: There exists a Δˆ > 0 such that the discretely sampled process {XΔjˆ : j = 0 1 } is irreducible. That is, for any Borel set Λ of the state space D0 with ς(Λ) ˆ > 0, ∞
1{X ∈Λ} X0 = x > 0 Eˆ ˆ Δj
j=0
for all x ∈ D0 . 13 The idea of using an appropriately chosen eigenfunction of an operator to construct and analyze a twisted probability measure is also featured in the work of Kontoyiannis and Meyn (2003).
208
L. P. HANSEN AND J. A. SCHEINKMAN
Under Assumption 7.1 it is equivalent to assume that this irreducibility restriction holds under the original probability measure.14 We establish approximation results by imposing a form of stochastic stability under the distorted probability measure. We assume that the distorted Markov process satisfies the next assumption: ˆ ASSUMPTION 7.4: The process X is Harris recurrent under the measure Pr. That is, for any Borel set Λ of the state space D0 with positive ςˆ measure, ∞ ˆ 1{Xt ∈Λ} = ∞X0 = x = 1 Pr 0
for all x ∈ D0 . Among other things, this assumption guarantees that the stationary distribution ςˆ is unique. Under these assumptions, we characterize the role of the principal eigenvalue and function on the long-run behavior of the semigroup. The proof is given in Appendix B. ˆ satisfies Assumptions 6.1 and 7.1–7.4, and PROPOSITION 7.1: Suppose that M let Δ > 0. (a) For any ψ for which (|ψ|/φ) d ςˆ < ∞, ψ lim exp(−ρΔj)MΔj ψ = φ d ςˆ j→∞ φ for almost all (ς) ˆ x. (b) For any ψ for which ψ/φ is bounded, ψ d ςˆ lim exp(−ρt)Mt ψ = φ t→∞ φ for x ∈ D0 . The approximation implied by Proposition 7.1, among other things, gives a formal sense in which ρ is a long-run growth rate. It also provides more precise information, namely that after eliminating the deterministic growth, application of the semigroup to ψ is approximately proportional to φ, where the ˆ Subsequently, we will consider other versions of this scale coefficient is φψ d ς. approximation. We will also impose additional regularity conditions that will guarantee convergence without having to sample the Markov process. 14 Irreducibility and Harris recurrence are defined relative to a measure. This claim uses the ςˆ measure when verifying irreducibility for the original probability measure. Since irreducibility depends only on the probability distribution conditioned on X0 , it does not require that the X process be stationary under the original measure.
LONG-TERM RISK
209
7.2.1. Uniqueness As we mentioned earlier, there may exist more than one principal eigenfunction of the extended generator even after a scale normalization is imposed. To be of interest to us, a principal eigenfunction must generate a twisted probaˆ must be a martingale. As we showed in Example 6.2, bility measure, that is, M this requirement is not enough to guarantee uniqueness—there may exist more ˆ t is a martingale. than one principal eigenfunction for which the implied M However, in that example, only one of the two solutions we exhibited implies a Markov evolution for X that is stochastically stable. The other solution will also result in a Markov process, but it fails to be stationary. Recall that the two candidate drift distortions are ξf x¯ f ± xf (ξf − γf σf )2 − σf2 (2βf + γf2 ) Only when we select the solution associated with the negative root do we obtain a process that has a stationary density. This approach to uniqueness works much more generally. The next proposition establishes that stochastic stability requirements will typically eliminate the multiplicity of principal eigenvectors that generate appropriate twisted probabilities. More generally, it states that the eigenvalue of interest to us is always the smallest one. PROPOSITION 7.2: Assume that Assumption 7.1 is satisfied and that there exists a sampling interval Δ such that {XΔj : j = 0 1 } is irreducible. Suppose φ is a principal eigenfunction of the extended generator A of a multiplicative process M ˆ t : t ≥ 0} satisfies Assumptions 6.1, 7.2 with for which the associated process {M a stationary distribution ς, ˆ 7.3, and 7.4. Then the associated eigenvalue ρ is the smallest eigenvalue associated with a principal eigenfunction. Furthermore, if φ˜ is another positive eigenfunction associated with ρ, then φ˜ is proportional to φ (ςˆ almost surely). The proof can be found in Appendix B. This proposition guarantees that once we find a positive eigenfunction that generates a martingale that satisfies the required stochastic stability restriction, then we have found the only eigenfunction of interest (up to a constant scale factor). For instance, in Example 6.2 we only examined candidate eigenfunctions of a particular functional form, but found one that satisfies the assumptions of Proposition 7.2. Hence there exist no other eigenfunctions that satisfy these assumptions. 7.2.2. Lp Approximation When there exists a stationary distribution, it follows from Nelson (1958) ˆ t : t ≥ 0} can be extended to Lˆ p for any p ≥ 1 constructed that the semigroup {M
210
L. P. HANSEN AND J. A. SCHEINKMAN
using the measure d ς. ˆ The semigroup is a weak contraction. That is, for any t ≥ 0, ˆ t ψ p ≤ ψ p M where · p is the Lˆ p norm. PROPOSITION 7.3: Under Assumption 7.2, for p ≥ 1,
|Mt ψ|
p
provided that
1/p
1/p 1 1 p d ςˆ d ςˆ ≤ exp(ρt) |ψ| p φ φp
|ψ|p (1/φp ) d ςˆ < ∞.
PROOF: This follows from the weak contraction property established by Nelson (1958) together with the observation that
1 ψ ˆ exp(−ρt) Mt ψ = M t Q.E.D. φ φ REMARK 7.1: This proposition establishes an approximation in an Lp space ˆ Notice that φ itself is constructed using the transformed measure (1/φp ) d ς. always in this space. In particular, we may view the semigroup {Mt : t ≥ 0} as operating on this space. Proposition 7.3 shows that when the distorted Markov process constructed using the eigenfunction is stationary, ρ can be interpreted as an asymptotic growth rate of the multiplicative semigroup. The eigenfunction is used to characterize the space of functions over which the bound applies. We now produce a more refined approximation. ˆp Let Z pdenote the set of Borel measurable functions ψ such that ψ d ςˆ = 0 and |ψ| d ςˆ < ∞. Make the following supposition: ASSUMPTION 7.5: For any t > 0, sup ψ∈Zˆ p : ψ ≤1
ˆ t ψ p < 1 M
In the case of p = 2, Hansen and Scheinkman (1995) gave sufficient conditions for Assumption 7.5 to be satisfied.15 15 Assumption 7.5 for p = 2 is equivalent to requiring that the distorted Markov process be rho-mixing.
LONG-TERM RISK
211
7.4: Under Assumptions 7.2 and 7.5, for any ψ such that PψROPOSITION | φ |p d ςˆ < ∞, p 1/p exp(−ρt)Mt ψ − φ ψ d ςˆ 1 d ςˆ ≤ c exp(−ηt) φ φp for some rate η > 0 and positive constant c. 7.2.3. Lyapunov Functions Meyn and Tweedie (1993a) established, under an additional mild continuity condition, sufficient conditions for the assumptions in this section using a “Lyapunov function” method. In this subsection we will make the following assumption: ASSUMPTION 7.6: The process X is a Feller process under the probability meaˆ 16 sure associated with M. We use Lyapunov functions that are restricted to be norm-like. DEFINITION 7.1: A continuous function V is called norm-like if the set {x : V (x) ≤ r} is precompact for each r > 0. A norm-like function converges to +∞ along any sequence {xj } that conˆ verges to ∞. We will consider here only norm-like functions V for which AV is continuous. A sufficient condition for the existence of a stationary distribution (Assumption 7.2) and for Harris recurrence (Assumption 7.4) is that there exists a norm-like function V for which A(φV ) ˆ ≤ −1 − ρV = AV φ outside a compact subset of the state space. (See Theorem 4.2 of Meyn and Tweedie (1993b).) In Section 7.2.2 we established Lp approximations results. The space Lˆ p is largest for p = 1. It is of interest to ensure that the constant functions are in the corresponding domain for the semigroup {Mt : t ≥ 0}. This requires that 16 By a Feller process we presume that the implied conditional expectation operators map continuous functions on the one-point compactification of D into continuous functions. In fact, Meyn and Tweedie (1993b) permitted more general processes. The restriction that the process be Feller implies that all compact subsets are what Meyn and Tweedie (1993b) referred to as petite sets.
212
L. P. HANSEN AND J. A. SCHEINKMAN
1/φ have a finite first moment under the stationary distribution ς. ˆ A sufficient condition for this is the existence of a norm-like function V such that ˆ ) ≤ − max{1 φ} A(φV ) − ρφV = φA(V for x outside a compact set. (Again see Theorem 4.2 of Meyn and Tweedie (1993b).) Finally, Proposition 7.4 only applies when the process is weakly dependent under the stationary distribution.17 By weakening the sense of approximation, we can expand the range of applicability. Consider some function ψˆ ≥ 1. For any t, we use ˆ sup Mt ψ − ψ d ςˆ |ψ|≤ψˆ
for each x as a measure of approximation. When ψˆ = 1 this is equivalent to ˆ t ψ and ψ d ςˆ applied to what is called the total variation norm by viewing M indicator functions as measures for each x. It follows from Meyn and Tweedie (1993b, Theorem 5.3) that if there exists a norm-like function V and a real number a such that (20)
A(φV ) ˆ ≤ −ψ ˆ − ρV = AV φ ˆ A(φψ) ˆ ψˆ ≤ aψˆ − ρψˆ = A φ
outside a compact set, then ψ ψ ψ ˆt − d ςˆ = lim sup exp(−ρt)Mt ψ − φ d ςˆ φ lim sup M t→∞ t→∞ φ φ φ |ψ|≤φψˆ |ψ|≤φψˆ = 0 Note that in inequality (20) the constant a can be positive. Hence this inequality only requires the existence of an upper bound on rate of growth of the conditional expectation of the function ψˆ under the distorted probability. While ˆ it is pointwise in the approximation is uniform in functions dominated by φψ, the Markov state x. The approximation results obtained in this section have a variety of applications depending on our choice of the multiplicative functional M. In these applications M is constructed using stochastic discount factor functionals, growth functionals, or valuation functionals. These applications are described in the next section. 17
In contrast, Proposition 7.1 applies more generally.
LONG-TERM RISK
213
8. LONG-TERM RISK A familiar result from asset pricing is the characterization of the short-term risk-return trade-off. The trade-off reflects the compensation, expressed in terms of expected returns, from being exposed to risk over short time horizons. Continuous-time models of financial markets are revealing because they give a sharp characterization of this trade-off by looking at the instantaneous limits. Our construction of valuation functionals in Section 3.4 reflects this trade-off in a continuous-time Markov environment. Formally, the trade-off is given in Corollary 3.1. In this section we explore another extreme: the trade-off pertinent for the long run. In the study of dynamical systems, a long-run analysis gives an alternative characterization that reveals features different from the short-run dynamics. For linear systems it is easy to move from the short run to the long run. Nonlinearity makes this transformation much less transparent. This is precisely why operator methods are of value. Specifically, we study growth or decay rates in semigroups constructed from alternative multiplicative functionals. Asset values are commonly characterized in terms of growth rates in the cash flows and risk-adjusted interest rates. By using results from the previous section, we have a way to provide long-term counterparts to growth rates and risk-adjusted interest rates. By changing the long-term cash-flow exposure to risk, we also have a way to study how long-term counterparts to risk-adjusted interest rates change with cash-flow risk exposure. In this section we show how to apply the methods of Section 7 to support characterizations of long-term growth and valuation. It has long been recognized that steady state analysis provides a useful characterization of a dynamical system. For Markov processes the counterpart to steady state analysis is the analysis of a stationary distribution. We are led to a related but distinct analysis for two reasons. First, we consider economic environments with stochastic growth. Second, our interest is in the behavior of valuation, including valuation of cash flows with long-run risk exposure. These differences lead us to study stochastic steady distributions under alternative probability measures. As we have seen, these considerations lead naturally to the study of multiplicative semigroups that display either growth in expectation or decay in value. The counterpart to steady state analysis is the analysis of the principal eigenvalues and eigenfunctions, the objects that characterize the long-run behavior of multiplicative semigroups. We use appropriately chosen eigenvalues and eigenfunctions to change probability measures. Changing probability measures associated with positive martingales are used extensively in asset pricing. Our use of this tool is distinct from the previous literature because of its role in long-run approximation. We now explore three alternative applications of the methods developed in this paper.
214
L. P. HANSEN AND J. A. SCHEINKMAN
8.1. Decomposition of Stochastic Discount Factors (M = S) Alvarez and Jermann (2005) characterized the long-run behavior of stochastic discount factors. Their characterization is based on a multiplicative decomposition on a permanent and a transitory component (see their Proposition 1). Corollary 6.1 delivers this decomposition, which we write as (21)
ˆt St = exp(ρt)M
φ(X0 ) φ(Xt )
ˆ The eigenvalue ρ is typically negative. We illustrated for some martingale M. that such a decomposition is not unique. For such a decomposition to be useful ˆ in long-run approximation, the probability measure implied by martingale M must imply that the process X remains stationary. Proposition 7.2 shows that only one such representation implies that the process X remains recurrent and stationary under the change of measure. Decomposition (21) of a stochastic discount factor functional shows how to extract a deterministic growth component and a martingale component from the stochastic discount factor functional. Long-run behavior is dominated by these two components vis á vis a transient component. Building in part on representations in Bansal and Lehmann (1997), Hansen (2008) showed that the transient component can often include contributions from habit persistence or social externalities as modeled in the asset pricing literature. This stochastic discount factor decomposition can be used to approximate prices of long-term discount bonds,
1 ˆ exp(−ρt)E(St |X0 = x) = E X0 = x φ(x) φ(Xt ) 1 ˆ ≈E φ(x) φ(Xt ) where the approximation on the right-hand side becomes arbitrarily accurate as the horizon t becomes large. Prices of very long-term bonds depend on the current state only through φ(x). Thus φ is the dominant pricing factor in the long run. This approximation result extends more generally to stationary cash flows as characterized by Proposition 7.1.18 8.2. Changing Valuation Functionals (M = V ) Alternative valuation functionals imply alternative risk exposures and growth trajectories. For one version of a long-term risk-return frontier, we 18 Alvarez and Jermann (2005) referred to an earlier version of our paper for the link to eigenfunctions.
215
LONG-TERM RISK
change the risk exposure of the valuation functional subject to pricing restriction (10). This gives a family of valuation functionals that are compatible with a single stochastic discount factor. We may then apply the decomposition in Corollary 6.1, restricted so that the distorted Markov process is stationary, to find a corresponding growth rate associated with each of these valuation functionals. Thus alternative valuation functionals as parameterized by the triple (βv γv κv ) and restricted by the pricing restriction of Proposition 3.1 imply return processes with different long-run growth rates. The principal eigenvalues of the corresponding semigroups give these rates. In effect, the valuation functionals can be freely parameterized by their risk exposure pair (γv κv ) with βv determined by the local pricing restriction. The vector γv gives the exposure to Brownian risk and κv gives the exposure to jump risk. Thus a long-run risk-return frontier is given by the mapping from the risk exposure pair (γv κv ) to the long-run growth rate of the valuation process. The growth rate may be computed by solving an eigenvalue problem that exploits the underlying Markovian dynamics. This characterizations allows us to move beyond the log-linear–log-normal specification implicit in many studies of long-horizon returns. The dominant eigenvalue calculation allows for conditional heteroskedasticity with long-run consequences and it allows jumps that might occur infrequently. The principal eigenfunction (along with the eigenvalue) can be used to construct the martingale component as in Corollary 6.1. EXAMPLE 8.1—Application to the Markov Diffusion Example: Recall that in the Breeden model and the Kreps–Porteus model, the implied stochastic discount factor is St = exp(Ast ), where (22)
A = β¯ s t +
t
t
β X ds +
s t
s f
f s
0
t t f s f β X ds + Xs γf dBs + γos dBso s o
o s
0
0
0
where the alternative models give rise to alternative interpretations of the parameters. To parameterize a valuation functional V = exp(Av ), we construct A = β¯ v t +
t
v t
βvf Xsf ds 0
t
t
+
X γ dB +
o s
0
f s
β X ds + v o
0
v f
f s
t
γov dBso 0
where β¯ v + βvf xf + βvo xo xf 1 = −β¯ s − βsf xf − βso xo − (γfs + γfv )2 − (γos + γov )2 2 2
216
L. P. HANSEN AND J. A. SCHEINKMAN
This equation imposes the local risk-return relation and determines β¯ v , βvf , and βvo as a function of the stochastic discount factor parameters and the risk exposure parameters γfv and γov . To infer the growth rates of valuation processes parameterized by (γfv γov ), we find the principal eigenvalue for the multiplicative semigroup formed by setting M = V . Applying the calculation in Example 6.2, this eigenvalue is given by σ2 (γ v )2 ρ = β¯ v + o + cvf ξf x¯ f + cvo (ξo x¯ o + γov σo ) + (cvo )2 o 2 2 s 2 ) σ2 (γ = −β¯ s − o − γos γov + cvf ξf x¯ f + cvo (ξo x¯ o + γov σo ) + (cvo )2 o 2 2 where cvf and cvo are given by formulas (16) and (17), respectively. The terms on the right-hand side exclusive of cvf ξf x¯ f give the continuous-time log-normal adjustments, while cvf ξf x¯ f adjusts for the stochastic volatility in the cumulative return. A long-run risk-return trade-off is given by mapping of (γfv γov ) into the eigenvalue ρ. Note that ρv is a linear function of γov . One notion of a long-run risk price is obtained by imputing the marginal change in the rate of return given a marginal change in the risk exposure as measured by ρ: (23)
∂ρ σo = −γos + cvo σo = −γos − βso ∂γov ξo
In contrast, ρ depends nonlinearly on γfv , although risk prices can still be constructed by computing marginal changes in the implied rates of return at alternative values of γfv . 8.3. Changing Cash Flows (M = G, M = S, and M = GS) Consider next a risky cash flow of the form Dt = Gt ψ(Xt )D0 where G is a multiplicative functional. This cash flow grows over time. We could parameterize the multiplicative functional as the triple (βg γg κg ), but this over-parameterizes the long-term risk exposure. The transient components to cash flows will not alter the long-run risk calculation. One attractive possibility is to apply Corollary 6.1 and Propositions 7.1 and 7.2 with (M = G), and use the martingale from that decomposition for our choice of G. Thus we could impose the following restriction on the parametrization of G: |γg |2 βg + + exp[κg (y ·)] − 1 η(dy ·) = δ 2
LONG-TERM RISK
217
for some positive growth rate δ. Given δ, this relation determines a unique βg . In addition, we restrict these parameters so that the distorted probability measure associated with an extended generator built from (a) jump measure exp[κg (y x)]η(dy|x), (b) first derivative term ξ(x) + Γ (x)γg (x), and (c) second derivative term Σ(x) implies a semigroup of conditional expectation operator that converges to the corresponding unconditional expectation operator. Hansen, Heaton, and Li (2008) explored the valuation consequences by constructing a semigroup using M = GS, where S is a stochastic discount factor functional. They only considered the log-linear–log-normal model, however. Provided that we can apply Proposition 7.1 for this choice of M and ψ, the negative of the eigenvalue −ρ is the overall rate of decay in value of the cash flow. Consider an equity with cash flow D. For appropriate specifications of ψ, the values of the cash flows far into the future are approximately proportional to 1 as the limiting contribution to the the eigenfunction φ. Thus we may view −ρ price dividend ratio. The decay rate ρ reflects both a growth rate effect and a discount rate effect. To net out the growth rate effect, we compute −ρ + δ as an asymptotic rate of return that encodes a risk adjustment. Heuristically, this is linked to the Gordon growth model because −ρ is the difference between the asymptotic rate of return −ρ + δ and the growth rate δ. Following Hansen, Heaton, and Li (2008), we explore the consequences of altering the cash-flow risk exposure. Such alterations induce changes in the asymptotic decay rate in value (−ρ), and hence in the long-run dividend price 1 ratio −ρ and the asymptotic rate of return −ρ + δ. The long-run cash-flow riskreturn relation is captured by the mapping from the cash-flow risk exposure pair (γg κg ) to the corresponding required rate of return −ρ + δ. Hansen, Heaton, and Li (2008) used this apparatus to produce such a tradeoff using empirical inputs in a discrete-time log-linear environment. The formulation developed here allows for extensions to include nonlinearity in conditional means, heteroskedasticity that contributes to long-run risk, and large shocks modeled as jump risk. EXAMPLE 8.2—Application to the Markov Diffusion Example: Returning to the Breeden model or the Kreps–Porteus model, suppose the growth process G is the exponential of the additive functional: t t t f g 2 Xs (γf ) + (γog )2 f g f g o ds Xs γf dBs + γo dBs − A = δt + 2 0 0 0 g t
g
The parameters γf and γfo parameterize the cash-flow risk exposure. We limit the cash-flow risk exposure by the inequality g
2(ξf + σf γf )x¯ f ≥ σf2
218
L. P. HANSEN AND J. A. SCHEINKMAN
This limits the martingale component so that it induces stationarity under the ˆ associated with M = G. probability measure induced by the M We use again the parameterization St = exp(Ast ), where Ast is given by (22). Hence A = As + Ag is given by t t t t f f o f ¯ At = βt + βf Xs ds + βo Xs ds + γo dBso Xs γf dBs + 0
0
0
0
where β¯ = δ − (γog )2 /2 + β¯ s , βf = −(γ ) /2 + βsf , βo = βso , γf = γ + γfs , and γo = γog + γos . The formulas given in Example 6.2, discussed previously, give us an asymptotic, risk-adjusted rate of return g 2 f
−ρ + δ = −
g f
(γos )2 − γos γog − β¯ s − cf ξf x¯ f 2
− co [ξo x¯ o + (γog + γos )σo ] − (co )2
σo2 2
Recall that co = βso /ξo and cf is a solution to a quadratic equation (16). This allows us to map exposures to the risks Bo and Bf into asymptotic rates of return. For instance, the long-run risk price for the exposure to the Bo risk is dρ βso s σo g = −γo − ξo dγo This risk price vector coincides with the one imputed from valuation functionals (see (23)). The long-run contribution is reflected in the parameter ξo that governs the persistence of the Ornstein–Uhlenbeck process. This limit gives a continuous-time counterpart to the discrete-time log-normal model studied by Hansen, Heaton, and Li (2008). The cash-flow risk exposure to Bf is encoded in the coefficient cf of the g eigenfunction. Since this coefficient depends on γf in a nonlinear manner, there is nonlinearity in the long-run risk price; the marginal prices depend on the magnitude of the exposure. The prices of the cash-flow exposure to Bf risk differ from their counterparts from valuation functions. The cash-flow prices feature exposure at a specific horizon, while the valuation prices value the cumulative exposure over the horizon when payoffs are reinvested. In general, these prices will differ even though they happen to agree for log-normal specifications. Hansen (2008) gave some other continuous-time examples, drawing on alternative contributions from the asset pricing literature. Hansen, Heaton, and Li (2008) also decomposed the one-period return risk to equity into a portfolio of one-period holding period returns to cash flows for log-linear models. To extend their analysis, consider a cash flow of the form Dt = D0 Gt ψ(Xt )
LONG-TERM RISK
219
where G is a multiplicative growth functional. The limiting gross return is given by lim
t→∞
E(St Dt /S1 |F1 ) φ(X1 ) = exp(−ρ)G1 E(St Dt |F0 ) φ(X0 )
where ρ and φ are the principal eigenvalue and eigenfunction of the semiˆ group constructed using M = GS. This limit presumes that Eψ(X t ) is positive. The limiting holding period return has a cash-flow growth component G1 , an eigenvalue component exp(−ρ), and an eigenfunction component (φ(X1 ))/(φ(X0 )). The limit is independent of the transient contribution to the cash flow provided that the assumptions of Proposition 7.1 are satisfied. 9. EXISTENCE In our analysis thus far we supposed that we could find solutions to the principal eigenvalue problem and then proceeded to check the alternative solutions. We also exhibited solutions to this problem for specific examples. We now discuss some sufficient conditions for the existence of a solution to the eigenvalue problem. We return to our study of a generic semigroup represented with a multiplicative functional M. As we know from Table I, there is a variety of constructions of a multiplicative functional depending on which of the applications described in Section 8 is the focal point of the investigation. Our analysis in this section builds on work of Nummelin (1984) and Kontoyiannis and Meyn (2005). We impose the following drift condition: ASSUMPTION 9.1: There exists a function V ≥ 1 and constant a such that for x ∈ D0 , AV (x) ≤ a V (x) Since M is a multiplicative functional, so is {(Mt V (Xt ))/(V (X0 ))}. We show in Appendix D that when Assumption 9.1 holds, the operator ∞ V (Xt ) ψ(Xt )X0 = x exp(−at)E Mt Fψ(x) = V (X0 ) 0 is bounded on L∞ whenever a > a. As an alternative, we could have constructed an analogous operator applied to V ψ, in which case we should also scale the outcome: V Fψ. In what follows it will be convenient to work directly with the operator F.19 We impose the following positivity condition on F: 19 The operator F is a resolvent operator associated with the semigroup built with the multiplicative functional {(Mt V (Xt ))/(V (X0 ))}.
220
L. P. HANSEN AND J. A. SCHEINKMAN
ASSUMPTION 9.2: There exists a measure ν on the state space D0 such that for every Borel set Λ for which ν(Λ) > 0, F1Λ (x) > 0 This irreducibility assumption on F can be obtained from more primitive hypotheses as verified in Appendix D. If Assumption 9.2 holds, it follows from Theorem 2.1 of Nummelin (1984) that there exists a function s ≥ 0 with s dν > 0 such that for any ψ ≥ 0, (24) Fψ ≥ s ψ dν The function s is necessarily bounded and hence we may scale it to have an L∞ norm equal to unity by adjusting the measure ν accordingly. Next form
∞ −j j (25) r F s dν j=0
for alternative values of the real number r. Theorem 3.2 of Nummelin (1984) states that is a critical value λ ≥ F for which the quantity in (25) is finite for r > λ and infinite for r < λ, and that such a λ is independent of the particular function s chosen. When (rI − F)−1 does not exist as a bounded operator, r is in the spectrum of F. The spectrum is closed and thus λ is necessarily an element of the spectrum. Our goal is to state sufficient conditions under which λ is an eigenvalue of F associated with a nonnegative eigenfunction φ. This result is of interest because, as we show in Appendix D, when φ is a nonnegative eigenfunction of F, then V φ is a positive eigenfunction of the semigroup {Mt : t ≥ 0}. Proposition 6.2 thus guarantees that we can produce the desired decomposition of the multiplicative functional M. Construct the nonnegative operator Gψ =
∞
λ−j (F − s ⊗ ν)j ψ
j=0
where s ⊗ ν is the operator (s ⊗ ν)ψ = s ψ dν Following Nummelin (1984), our candidate eigenfunction is the nonnegative function Gs. Provided that G is a bounded operator, Gs is an eigenfunction of F. (See Appendix D.)
LONG-TERM RISK
221
This leaves open how to verify that G is a bounded operator.20 Instead of assuming that G is bounded, we may suppose that the following statement holds: ASSUMPTION 9.3: φ = Gs is bounded.21 In addition, we strengthen Assumption 9.1. ASSUMPTION 9.4: There exists a function V ≥ 1 such that for any r > 0, there exists a positive number c such that AV ≤ −r + cs V for all x ∈ D0 . In Appendix D we show that these two assumptions guarantee that the operator G is bounded using an argument that follows in part the proof of Proposition 4.11 in Kontoyiannis and Meyn (2003). REMARK 9.1: We may apply the multiplicative mean ergodic theorem (Theorem 4.16) of Kontoyiannis and Meyn (2003) if we decompose the generator AφV AV = Bφ + φ V V where Bφ = ( AφV −φ AV ). Notice that B1 = 0. Typically, B will be the generator V V of a semigroup for a Markov process. Kontoyiannis and Meyn (2003) imposed restrictions on this process and bounds on their counterpart to AV to establish V the existence of a positive eigenfunction.22 (26)
REMARK 9.2: Alternatively, we may establish the existence of an eigenfunction by showing that F is a compact operator on an appropriately weighted L2 space by using the approach of Chen, Hansen, and Scheinkman (2007). Chen, Hansen, and Scheinkman (2007) focused on the case of a multivariate diffusion, implying that B is a second-order differential operator. 20 It follows from Nummelin (1984, Proposition 4.7) that Gs is finite except on the set of ν measure zero, and follows from Propositions 4.7 and 2.1 of Nummelin (1984) (applied to the kernel λ−1 (F − s ⊗ ν)) that Fφ ≤ λφ. 21 a function 0 ≤ s∗ ≤ 1 such that Fs∗ ≤ λs∗ and ∗ Alternatively, we could assume that there exists ∗ s dν > 0. It may then be shown that φ ≤ λ/( s dν) as in, say, Proposition 4.11 of Kontoyiannis and Meyn (2003). 22 The Kontoyiannis and Meyn (2003) established more refined results motivated by their interest in large deviation theory.
222
L. P. HANSEN AND J. A. SCHEINKMAN
10. CONCLUSIONS In this paper we characterized the long-run risk-return relationship for nonlinear continuous-time Markov environments. This long-term relationship shows how alternative cash-flow risk exposures are encoded in asymptotic riskadjusted discount rates. To achieve this characterization we decomposed a multiplicative functional built from the Markov process into the product of three components: (i) a deterministic exponential trend, (ii) a martingale, and (iii) a transitory component. The martingale and transitory components are constructed from a principal eigenfunction associated with the multiplicative functional, and the rate of growth of the exponential trend is given by the corresponding eigenvalue. The multiplicative functional represents a semigroup of valuation operators that accommodate stochastic growth in consumption or cash flows. Thus the decomposition of the multiplicative functional allows us to characterize transitory and permanent components to valuation. Specifically, the martingale component gives an alternative distorted or twisted probability that we used to characterize approximation over long time horizons. This long-horizon apparatus is a complement to the short-term risk-return trade-offs familiar from asset pricing and is tailored to accommodate stochastic growth. It supports an analysis of the term structure of risk prices. We explore this term structure for two reasons. First, a variety of recent theories of asset prices feature investor preferences in which the intertemporal decomposition of risk is an essential ingredient, as in models in which separability over states of nature or time is relaxed. A second motivation is that the arguably simplified models that we use to construct evidence are likely to be misspecified when pricing over short intervals of time. Pricing models can be repaired by appending ad hoc dynamics, but then it becomes valuable to understand which repairs have long-run consequences. There are several natural extensions of this work. First, while we presented results concerning the existence and uniqueness of principal eigenvalues and eigenfunctions, it remains important to develop methods for computing these objects. There is only a limited array of examples for which quasi-analytical solutions are currently available. Second, while we have focused on dominant eigenvalues, more refined characterizations are needed to understand how well long-run approximation works and how it can be improved. Results in Chen, Hansen, and Scheinkman (2007) and Kontoyiannis and Meyn (2003) could be extended and applied to achieve more refined characterizations.23 Third, we considered only processes with a finite number of jumps in any finite interval of time. Extending the results presented in this paper to more general Lévy processes may add new insights into characterizing long-term risk. 23
Lewis (1998), Linetsky (2004), and Boyarchenko and Levendorskii (2007) used spectral methods of this type to study the term structure of interest rates and option pricing by applying and extending quasi-analytical characterizations of eigenfunctions.
Shaliastovich and Tauchen (2005) motivated such extensions when building structural models of asset pricing.

APPENDIX A: VALUE FUNCTION IN THE EXAMPLE ECONOMY

Recall the equation

−ξ_f w_f + ((1 − a)σ_f²/2)(w_f)² + (1 − a)ϑ_f σ_f w_f + (1 − a)ϑ_f²/2 = b w_f
that we solved when constructing the value function for the Kreps and Porteus (1978) example with a > 1. This quadratic equation has two solutions:

w_f = ( (a − 1)σ_f ϑ_f + b + ξ_f ± √{ [(a − 1)σ_f ϑ_f + b + ξ_f]² − (a − 1)²σ_f²ϑ_f² } ) / ((1 − a)σ_f²).

Solutions will exist provided

|ξ_f + b + (a − 1)σ_f ϑ_f| > (a − 1)|σ_f ϑ_f|,

which will always be satisfied when ϑ_f σ_f > 0. Solutions will also exist when ϑ_f σ_f < 0 and

ξ_f + b ≥ 2(a − 1)|ϑ_f σ_f|.

In both cases ξ_f + b + (a − 1)σ_f ϑ_f ≥ 0 and thus both zeros are negative. As claimed in the text, the solution that interests us is

w_f = ( (a − 1)σ_f ϑ_f + b + ξ_f − √{ [(a − 1)σ_f ϑ_f + b + ξ_f]² − (a − 1)²σ_f²ϑ_f² } ) / ((1 − a)σ_f²).

To see why, we note that a finite time horizon solution is given by a value function with the same functional form but coefficients that depend on the gap of time between the terminal period and the current period. This leads us to study the slope of the quadratic function

−ξ_f w_f + ((1 − a)σ_f²/2)(w_f)² + (1 − a)ϑ_f σ_f w_f + (1 − a)ϑ_f²/2 − b w_f
at the two zeros of this function. This function is concave. We pick the zero associated with a negative slope, which will always be the rightmost zero since this is the only “stable” solution.
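The root selection can be checked numerically. The sketch below is an added illustration; all parameter values (with a > 1) are assumptions, not calibrations from the paper.

```python
# Numerical check of the root selection in Appendix A under assumed parameter values.
import numpy as np

a, sigma_f, vartheta_f, b, xi_f = 2.0, 0.1, 0.3, 0.05, 0.2

A = (1 - a) * sigma_f**2 / 2                    # coefficient on w**2 (negative since a > 1)
B = (1 - a) * vartheta_f * sigma_f - xi_f - b   # coefficient on w
C = (1 - a) * vartheta_f**2 / 2                 # constant term

disc = B**2 - 4 * A * C
w_plus  = (-B + np.sqrt(disc)) / (2 * A)
w_minus = (-B - np.sqrt(disc)) / (2 * A)        # the solution selected in the text

slope = lambda w: 2 * A * w + B                 # derivative of the concave quadratic
print(w_plus, w_minus)                          # both zeros are negative
print(w_minus > w_plus, slope(w_minus) < 0)     # the selected zero is the rightmost one, with negative slope
```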
APPENDIX B: APPROXIMATION

In this appendix we present additional proofs for some of the propositions in Section 7.

PROOF OF PROPOSITION 7.1: Note that

exp(−ρt) M_t ψ(x) = φ(x) M̂_t (ψ/φ)(x).

It follows from Theorem 6.1 of Meyn and Tweedie (1993a) that

lim_{t→∞} sup_{0≤ψ≤φ} | M̂_t (ψ/φ) − ∫ (ψ/φ) dς̂ | = 0,

which proves (b). Consider any sample interval Δ > 0. Then

lim_{j→∞} sup_{0≤ψ≤φ} | M̂_{Δj} (ψ/φ) − ∫ (ψ/φ) dς̂ | = 0.

From Proposition 6.3 of Nummelin (1984), the sampled process {X_{Δj} : j = 0, 1, ...} is aperiodic and Harris recurrent with stationary density ς̂. Hence if ∫ |(ψ/φ)(x)| dς̂(x) < ∞,

lim_{j→∞} M̂_{Δj} (ψ/φ)(x) = ∫ (ψ/φ) dς̂

for almost all (ς̂) x, which proves (a). (See, for example, Theorem 5.2 of Meyn and Tweedie (1992).) Q.E.D.

PROOF OF PROPOSITION 7.2: Consider another principal eigenfunction φ^* with associated eigenvalue ρ^*. By Proposition 6.2, the eigenfunction–eigenvalue pairs must solve

M_t φ(x) = exp(ρt) φ(x),    M_t φ^*(x) ≤ exp(ρ^* t) φ^*(x).

If M̂ is the martingale associated with the eigenvector φ, then

E[ M̂_t φ^*(X_t)/φ(X_t) | X_0 = x ] ≤ exp[(ρ^* − ρ)t] φ^*(x)/φ(x).
Since the discrete-time sampled Markov process associated with M̂ is Harris recurrent, aperiodic, and has a unique stationary distribution, the left-hand side converges to

Ê[ φ^*(X_0)/φ(X_0) ] > 0

for t = Δj as the integer j tends to ∞, whenever this expected value is finite. If this expected value is not finite, then since φ^*/φ > 0, the left-hand side must diverge to +∞. In any case, this requires that ρ ≤ ρ^*. If ρ^* = ρ, then

Ê[ φ^*(X_0)/φ(X_0) ] ≤ φ^*(x)/φ(x).

Hence the ratio of the two eigenfunctions is constant (ς̂ almost surely). Q.E.D.

PROOF OF PROPOSITION 7.4: Notice that

exp(−ρt) M_t ψ − φ ∫ (ψ/φ) dς̂ = φ [ M̂_t (ψ/φ) − ∫ (ψ/φ) dς̂ ].

Moreover,
∫ | φ [ M̂_t (ψ/φ) − ∫ (ψ/φ) dς̂ ] |^p (1/φ^p) dς̂ = ∫ | M̂_t (ψ/φ) − ∫ (ψ/φ) dς̂ |^p dς̂.

Assumption 7.5 implies that the right-hand side converges to zero as t gets large. By the semigroup property, this convergence is necessarily exponentially fast. Q.E.D.

APPENDIX C: MARTINGALES AND ABSOLUTE CONTINUITY

In this appendix we state some conditions that insure that Assumption 6.1 holds. Our result is inspired by the approach developed in Chapter 7 of Liptser and Shiryaev (2000). Let M̂ denote a multiplicative functional parameterized by (β̂, γ̂, κ̂) that is restricted to be a local martingale. Thus M̂ = exp(Â), where

Â_t = ∫_0^t β̂(X_u) du + ∫_0^t γ̂(X_{u−})′ [Γ(X_{u−})′ Γ(X_{u−})]^{-1} Γ(X_{u−})′ [dX_u^c − ξ(X_{u−}) du] + Σ_{0≤u≤t} κ̂(X_u, X_{u−}),

and the following assumption holds.
ASSUMPTION C.1:

−β̂ − |γ̂|²/2 − ∫ ( exp[κ̂(y, x)] − 1 ) η(dy|x) = 0.
The extended generator for M̂ is given by

Âφ(x) = [ξ(x) + Γ(x)γ̂(x)] · ∂φ(x)/∂x + (1/2) trace( Σ(x) ∂²φ(x)/∂x∂x′ ) + ∫ [φ(y) − φ(x)] exp[κ̂(y, x)] η(dy|x).

ASSUMPTION C.2: There exists a probability space (Ω̌, F̌, P̌r), a filtration {F̌_t}, an n-dimensional F̌_t Brownian motion B̌, and a semimartingale X̌ = X̌^c + X̌^j, where

(27)    dX̌_t^c = [ξ(X̌_{t−}) + Γ(X̌_{t−})γ̂(X̌_{t−})] dt + Γ(X̌_{t−}) dB̌_t

and X̌^j is a pure jump process with a finite number of jumps in any finite interval that has a compensator exp[κ̂(y, X̌_{t−})] η(dy|X̌_{t−}) dt.

In this case,

dB̌_t = [Γ(X̌_{t−})′ Γ(X̌_{t−})]^{-1} Γ(X̌_{t−})′ [dX̌_t^c − ξ(X̌_{t−}) dt − Γ(X̌_{t−})γ̂(X̌_{t−}) dt].

Use the process X̌ to construct a multiplicative functional M̌ = exp(Ǎ), where

Ǎ_t = −∫_0^t β̂(X̌_u) du − ∫_0^t γ̂(X̌_{u−})′ [Γ(X̌_{u−})′ Γ(X̌_{u−})]^{-1} Γ(X̌_{u−})′ [dX̌_u^c − ξ(X̌_{u−}) du] − Σ_{0≤u≤t} κ̂(X̌_u, X̌_{u−})

    = −∫_0^t [ β̂(X̌_u) + |γ̂(X̌_{u−})|² ] du − ∫_0^t γ̂(X̌_{u−})′ [Γ(X̌_{u−})′ Γ(X̌_{u−})]^{-1} Γ(X̌_{u−})′ [dX̌_u^c − ξ(X̌_{u−}) du − Γ(X̌_{u−})γ̂(X̌_{u−}) du] − Σ_{0≤u≤t} κ̂(X̌_u, X̌_{u−}).
The multiplicative functional M̌ is parameterized by

β̌ = −β̂ − |γ̂|²,    γ̌ = −γ̂,    κ̌ = −κ̂.

ASSUMPTION C.3: The parameterization (β̌, γ̌, κ̌) of the multiplicative functional M̌ satisfies the following statements:
(a) ∫_0^t β̌(X̌_u) du < ∞ for every positive t.
(b) ∫_0^t |γ̌(X̌_u)|² du < ∞ for every positive t.

Notice that ∫ exp[κ̌(y, x)] η̂(dy|x) = ∫ η(dy|x) < ∞ for all x ∈ D_0. Moreover,

β̌ + |γ̌|²/2 + ∫ ( exp[κ̌(y, x)] − 1 ) η̂(dy|x) = −β̂ − |γ̂|²/2 − ∫ ( exp[κ̂(y, x)] − 1 ) η(dy|x) = 0.

Thus the multiplicative functional M̌ is a local martingale.

PROPOSITION C.1: Suppose that Assumptions C.1, C.2, and C.3 are satisfied. Then the local martingale M̂ is a martingale.

PROOF: We show that M̂ is a martingale in three steps:
(i) Since M̌ is a local martingale, there is an increasing sequence of stopping times {τ̌_N : N = 1, 2, ...} that converge to ∞ such that

M̌_t^N = M̌_t for t ≤ τ̌_N,    M̌_t^N = M̌_{τ̌_N} for t > τ̌_N,

is a martingale and Ě(M̌_t^N | X̌_0 = x) = 1 for all t ≥ 0.
(ii) Next we obtain an alternative formula for Ě(1_{t≤τ̌_N} | X̌_0 = x) represented in terms of the original X process. The stopping time τ̌_N can be represented as a function of X̌. Let τ_N be the corresponding function of X and construct

M̂_t^N = M̂_t for t ≤ τ_N,    M̂_t^N = M̂_{τ_N} for t > τ_N.

Recall that M̂_t^N = Φ_t(X) for some Borel measurable function Φ_t. By construction,

Φ_t(X̌) M̌_t^N = 1.

Then

E( M̂_t 1_{t≤τ_N} | X_0 = x ) = E( M̂_t^N 1_{t≤τ_N} | X_0 = x )
    = Ě( M̌_t^N (1/M̌_t^N) 1_{t≤τ̌_N} | X̌_0 = x )
    = Ě( 1_{t≤τ̌_N} | X̌_0 = x ),

where the second equality follows from the Girsanov theorem.

(iii) Note that

lim_{N→∞} Ě( 1_{t≤τ̌_N} | X̌_0 = x ) = 1

by the dominated convergence theorem. Thus

E(M̂_t | X_0 = x) ≥ lim_{N→∞} E( M̂_t 1_{t≤τ_N} | X_0 = x ) = lim_{N→∞} Ě( 1_{t≤τ̌_N} | X̌_0 = x ) = 1.

Since M̂ is a nonnegative local martingale, we know that

E(M̂_t | X_0 = x) ≤ 1.

Therefore E(M̂_t | X_0 = x) = 1 for all t ≥ 0 and M̂ is a martingale.
Q.E.D.
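For intuition, the martingale property established in Proposition C.1 can be checked by simulation in its simplest special case: constant γ̂, no jumps, and X a Brownian motion, so that Assumption C.1 reduces to β̂ = −|γ̂|²/2. The sketch below is an added illustration; the parameter values are assumptions.

```python
# Monte Carlo sanity check that M_hat_t = exp(gamma_hat * B_t - gamma_hat**2 * t / 2)
# has unit expectation, consistent with E[M_hat_t | X_0 = x] = 1.
import numpy as np

rng = np.random.default_rng(0)
gamma_hat, t, n_paths = 0.8, 1.0, 200_000

B_t = rng.normal(0.0, np.sqrt(t), n_paths)                   # Brownian motion at time t
M_hat_t = np.exp(gamma_hat * B_t - 0.5 * gamma_hat**2 * t)   # the exponential local martingale

print(M_hat_t.mean())                                        # close to 1
```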
APPENDIX D: EXISTENCE We next establish the existence results discussed in Section 9. We divide our analysis into four lemmas. The first lemma states that under Assumption 9.1,
F is bounded. The second lemma verifies that if Assumptions 9.1 and 9.2 are satisfied, and F has a nonnegative eigenfunction, there exists a strictly positive solution for the principal eigenvalue problem for the semigroup M and as a consequence of Proposition 6.2 we obtain the desired decomposition of the multiplicative functional M. The third lemma shows that if G is bounded, Gs is an eigenfunction of F. The fourth lemma shows that the boundedness of G follows from Assumptions 9.2, 9.3, and 9.4.

LEMMA D.1: Suppose Assumption 9.1 is satisfied. Then F is bounded.

PROOF: Let ā = a + ε with ε > 0. Construct the multiplicative process

M_t^* = exp(−āt) M_t V(X_t)/V(X_0).

Then

N_t^* = M_t^* − 1 − ∫_0^t M_u^* [ AV(X_u)/V(X_u) − ā ] du

is a local martingale, as we now verify. Note that

N_t = M_t V(X_t) − V(X_0) − ∫_0^t M_u AV(X_u) du

is a local martingale. Thus (1/V(X_0)) ∫_0^t exp(−āu) dN_u is also a local martingale and

(1/V(X_0)) ∫_0^t exp(−āu) dN_u = exp(−āt) M_t V(X_t)/V(X_0) − 1 + ā ∫_0^t exp(−āu) M_u V(X_u)/V(X_0) du − ∫_0^t exp(−āu) M_u [V(X_u)/V(X_0)] [AV(X_u)/V(X_u)] du = N_t^*.

Since N^* is a local martingale, Fatou's lemma implies that

E(M_t^* | X_0 = x) + ε ∫_0^t E[M_u^* | X_0 = x] du ≤ 1.

Since this holds for any t,

(28)    ∫_0^∞ exp(−āt) E[ M_t V(X_t)/V(X_0) | X_0 = x ] dt ≤ 1/ε.
Inequality (28) guarantees that

Fψ = ∫_0^∞ exp(−āt) E[ M_t V(X_t)/V(X_0) ψ(X_t) | X_0 = x ] dt

defines a bounded operator in L∞.
Q.E.D.
REMARK D.1: The irreducibility Assumption 9.2 on F that we use in our next lemma can be obtained from more primitive hypotheses. Write K(x, Λ) = ∫_0^∞ exp(−āt) E[1_Λ(X_t) | X_0 = x] dt and suppose that K satisfies the counterpart to Assumption 9.2. Then since V ≥ 1, F will satisfy Assumption 9.2 whenever M is bounded below by a positive number. Another set of sufficient conditions is obtained by first assuming that there exists a function p(t, x, y) such that p(t, x, ·) is the conditional density (with respect to ν) of X_t given X_0 = x and that p satisfies the following restriction. Let {Λ̃_k : k = 1, 2, ...} be an increasing sequence of compact subsets of the state space D_0 whose union is the entire space. Suppose that for each integer k and x in the Markov state space, there exists a T such that for t ≥ T and y ∈ Λ̃_k, p(t, x, y) > 0. In this case we may define for each t ≥ T and each y ∈ Λ̃_k, f(t, x, y) = E[M_t | X_0 = x, X_t = y]. If we further assume that there exists a version of this conditional expectation that is a continuous function of (t, y) and that M > 0, then Assumption 9.2 must hold. To see this, notice that if ν(Λ) > 0 and Λ_k = Λ̃_k ∩ Λ, then there must exist a positive integer k for which ν(Λ_k) > 0. Choose a T′ > T and set

c = inf_{T ≤ t ≤ T′, y ∈ Λ_k} f(t, x, y) > 0.

Then

∫_T^{T′} exp(−āt) E[ M_t 1_Λ(X_t) | X_0 = x ] dt ≥ ∫_T^{T′} exp(−āt) E[ M_t 1_{Λ_k}(X_t) | X_0 = x ] dt = ∫_T^{T′} exp(−āt) E( 1_{Λ_k}(X_t) E[M_t | X_0 = x, X_t] | X_0 = x ) dt > 0

since ν(Λ_k) > 0. Given our restriction that V ≥ 1, Assumption 9.2 must hold.

LEMMA D.2: Suppose Assumptions 9.1 and 9.2 are satisfied and Fφ = λφ for some nonnegative bounded φ. Then Vφ is a strictly positive eigenfunction for the semigroup {M_t : t ≥ 0}.
PROOF: Assumption 9.2 guarantees that φ is strictly positive. Moreover,

λ M̃_t φ(x) = M̃_t Fφ(x) = ∫_0^∞ exp(−ās) M̃_{t+s} φ(x) ds,

where the right side follows from Tonelli's theorem. Hence

λ M̃_t φ(x) = exp(āt) Fφ(x) − exp(āt) ∫_0^t exp(−ās) M̃_s φ(x) ds
            = exp(āt) λ φ(x) − exp(āt) ∫_0^t exp(−ās) M̃_s φ(x) ds.

For a fixed x, define the function of t:

g(t) = exp(−āt) M̃_t φ(x).

Then

λ g(t) = λ φ(x) − ∫_0^t g(s) ds

and g(0) = φ(x). The unique solution to this integral equation is

g(t) = exp(−t/λ) φ(x).

Hence φ solves the principal eigenvalue problem for M̃_t and Vφ solves the principal eigenvalue problem for M. Q.E.D.

LEMMA D.3: Suppose F and G are bounded and Assumption 9.2 is satisfied. Then FGs = λGs.24

PROOF: Since G is bounded, for any bounded ψ,

ψ = (λI − F + ν ⊗ s)Gψ = (λI − F)Gψ + s ∫ Gψ dν.

Furthermore, since λ is in the spectrum of F, choose a sequence {ψ_j : j = 1, 2, ...} such that Gψ_j has L∞ norm 1 and that

lim_{j→∞} (λI − F)Gψ_j = 0.

24 This result is essentially a specialization of Proposition 4.6 of Kontoyiannis and Meyn (2003) and is closely related to Proposition 5.2 of Nummelin (1984). We include a proof for sake of completeness.
The sequence {∫ Gψ_j dν} is bounded and hence has a convergent subsequence with a limit r. Thus there is a subsequence of {ψ_j : j = 1, 2, ...} that converges to rs and, in particular, r ≠ 0. As a consequence, Gs = φ is an eigenfunction associated with λ. Q.E.D.

LEMMA D.4: Under Assumptions 9.2, 9.3, and 9.4, the operator G is bounded on L∞.

PROOF: Assumption 9.4 implies that

F1 ≤ 1/(r + a) + c̃ Fs,

where c̃ = c/(r + a). Moreover,

GFs ≤ ( ∫ s dν + λ ) Gs

and, in particular, since Assumption 9.3 is satisfied, GFs is a bounded function. Given that Assumption 9.4 applies for any r, it applies for an r such that 1/(r + a) is less than λ. Moreover,

λ^{-1}(F − s ⊗ ν)1 ≤ 1/[(r + a)λ] + (c̃/λ) Fs.

Thus

λ^{-n}(F − s ⊗ ν)^n 1 ≤ 1 − ε Σ_{j=0}^{n−1} λ^{-j-1}(F − s ⊗ ν)^j 1 + c̃ Σ_{j=0}^{n−1} λ^{-j-1}(F − s ⊗ ν)^j Fs,

where ε = 1 − 1/[(r + a)λ]. Rearranging terms and using the fact that λ^{-n}(F − s ⊗ ν)^n 1 ≥ 0,

ε Σ_{j=0}^{n−1} λ^{-j-1}(F − s ⊗ ν)^j 1 ≤ 1 + c̃ Σ_{j=0}^{n−1} λ^{-j-1}(F − s ⊗ ν)^j Fs.

Therefore,

Σ_{j=0}^{∞} λ^{-j-1}(F − s ⊗ ν)^j 1 ≤ 1/ε + (c̃/ε) GFs,

and hence G is a bounded operator on L∞. Q.E.D.
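A finite-dimensional sketch may clarify the role of G. Below, F is an assumed positive matrix, ν an assumed probability vector, and s an assumed nonnegative function with F ≥ s ⊗ ν; the Neumann series defining G converges for these values, and Gs is a principal eigenfunction, as in Lemma D.3. This is an added illustration, not the paper's infinite-dimensional setting.

```python
# Finite-dimensional illustration of Lemma D.3 under assumed numerical values.
import numpy as np

F = np.array([[0.6, 0.3], [0.2, 0.7]])    # assumed positive operator (matrix)
nu = np.array([0.5, 0.5])                 # assumed probability measure
s = np.array([0.2, 0.2])                  # assumed minorizing function: F >= np.outer(s, nu)

lam = float(np.max(np.linalg.eigvals(F).real))   # principal eigenvalue of F
K = F - np.outer(s, nu)
# For these values the series sum_j lam**(-j-1) K**j converges and equals (lam*I - K)^{-1}.
G = np.linalg.inv(lam * np.eye(2) - K)

Gs = G @ s
print(np.allclose(F @ Gs, lam * Gs))      # FGs = lambda * Gs, so Gs is a principal eigenfunction
```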
REFERENCES ALVAREZ, F., AND U. JERMANN (2005): “Using Asset Prices to Measure the Persistence in the Marginal Utility of Wealth,” Econometrica, 73, 1977–2016. [179,196,214] ANDERSON, E. W., L. P. HANSEN, AND T. J. SARGENT (2003): “A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk and Model Detection,” Journal of the European Economic Association, 1, 68–123. [199] BACKUS, D., AND S. ZIN (1994): “Reverse Engineering the Yield Curve,” Working Paper 4676, NBER. [196] BANSAL, R., AND B. N. LEHMANN (1997): “Growth-Optimal Portfolio Restrictions on Asset Pricing Models,” Macroeconomic Dynamics, 1, 333–354. [214] BANSAL, R., AND A. YARON (2004): “Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles,” Journal of Finance, 59, 1481–1509. [188] BANSAL, R., R. DITTMAR, AND D. KIKU (2008): “Cointegration and Consumption Risks in Asset Returns,” Review of Financial Studies (forthcoming). [195] BHATTACHARYA, R. N. (1982): “On the Functional Central Limit Theorem and the Law of the Iterated Logarithm,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 60, 185–201. [202] BOYARCHENKO, N., AND S. LEVENDORSKII (2007): “The Eigenfunction Expansion Method in Multi-Factor Quadratic Term Structure Models,” Mathematical Finance, 17, 503–540. [222] BREEDEN, D. (1979): “An Intertemporal Asset Pricing Model With Stochastic Consumption and Investment Opportunities,” Journal of Financial Economics, 7, 265–296. [187,192] CHEN, X., L. P. HANSEN, AND J. SCHEINKMAN (2007): “Spectral Decomposition of Forms,” Report, Yale University, University of Chicago, and Princeton University. [221,222] DUFFIE, D., AND L. EPSTEIN (1992): “Stochastic Differential Utility,” Econometrica, 60, 353–394. [188] DUFFIE, D., J. PAN, AND K. J. SINGLETON (2000): “Transform Analysis and Asset Pricing for Affine Jump-Diffusions,” Econometrica, 68, 1343–1376. [204] EPSTEIN, L., AND S. ZIN (1989): “Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57, 937–969. [188] ETHIER, S. N., AND T. G. KURTZ (1986): Markov Processes: Characterization and Convergence. New York: Wiley. [207] GARMAN, M. B. (1984): “Towards a Semigroup Pricing Theory,” Journal of Finance, 40, 847–861. [178,182] HANSEN, L. P. (2008): “Modeling the Long Run: Valuation in Dynamic Stochastic Economies,” Fisher–Schultz Lecture, Presented at the 2006 European Meetings of the Econometric Society. [214,218] HANSEN, L. P., AND J. A. SCHEINKMAN (1995): “Back to the Future: Generating Moment Implications for Continuous-Time Markov Processes,” Econometrica, 63, 767–804. [202,210] HANSEN, L. P., J. C. HEATON, AND N. LI (2008): “Consumption Strikes Back?: Measuring Long Run Risk,” Journal of Political Economy, 116, 260–302. [178-180,196,202,217,218] KLESHCHELSKI, I., AND N. VINCENT (2007): “Robust Equilibrium Yield Curves,” Report, Northwestern University and HEC Montreal. [192] KONTOYIANNIS, I., AND S. P. MEYN (2003): “Spectral Theory and Limit Theorems for Geometrically Ergodic Markov Processes,” Annals of Applied Probability, 13, 304–362. [207,221,222, 231] (2005): “Large Deviations Asymptotics and the Spectral Theory of Multiplicatively Regular Markov Processes,” Electronic Journal of Probability, 10, 61–123. [219] KREPS, D. M., AND E. L. PORTEUS (1978): “Temporal Resolution of Uncertainty and Dynamic Choice,” Econometrica, 46, 185–200. [188,223] LETTAU, M., AND J. A. WACHTER (2007): “Why Is Long-Horizon Equity Less Risky? 
A Duration-Based Explanation of the Value Premium,” Journal of Finance, 62, 55–92. [180] LEWIS, A. L. (1998): “Applications of Eigenfunction Expansions in Continuous-Time Finance,” Mathematical Finance, 8, 349–383. [222]
LINETSKY, V. (2004): “Lookback Options and Diffusion Hitting Times: A Spectral Approach,” Finance and Stochastics, 8, 373–398. [222] LIPTSER, R. S., AND A. N. SHIRYAEV (2000): Statistics of Random Processes: I. General Theory, Applications of Mathematics: Stochastic Modelling and Applied Probability (Second Ed.). Berlin: Springer Verlag. [225] MEYN, S. P., AND R. L. TWEEDIE (1992): “Stability of Markovian Processes I: Criteria for Discrete-Time Chains,” Advances in Applied Probability, 24, 542–574. [224] (1993a): “Stability of Markovian Processes II: Continuous-Time Processes and Sampled Chains,” Advances in Applied Probability, 25, 487–517. [211,224] (1993b): “Stability of Markovian Processes III: Foster–Lyapunov Criteria for Continuous Time Processes,” Advances in Applied Probability, 25, 518–548. [211,212] NELSON, E. (1958): “The Adjoint Markov Process,” Duke Mathematical Journal, 25, 671–690. [209,210] NUMMELIN, E. (1984): General Irreducible Markov Chains and Non-Negative Operators. Cambridge, U.K.: Cambridge University Press. [219-221,224,231] PROTTER, P. E. (2005): Stochastic Integration and Differential Equations. Berlin: Springer Verlag. [199] REVUZ, D., AND M. YOR (1994): Continuous Martingales and Brownian Motion (Second Ed.). Berlin: Springer Verlag. [185,197] ROGERS, L. C. G. (1998): “The Origins of Risk-Neutral Pricing and the Black–Scholes Formula,” in Risk Management and Analysis, ed. by C. O. Alexander. New York: Wiley, Chap. 2, 81–94. [181] SCHRODER, M., AND C. SKIADAS (1999): “Optimal Consumption and Portfolio Selection With Stochastic Differential Utility,” Journal of Economic Theory, 89, 68–126. [188] SHALIASTOVICH, I., AND G. TAUCHEN (2005): “Pricing Implications of Stochastic Volatility, Business Cycle Time Change and Non-Gaussianity,” Report, Duke University. [191,223] STUTZER, M. (2003): “Portfolio Choice With Endogenous Utility: A Large Deviations Approach,” Journal of Econometrics, 116, 365–386. [195]
Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A. and NBER; [email protected] and Dept. of Economics, Princeton University, 26 Prospect Avenue, Princeton, NJ 08540-5296, U.S.A. and NBER; [email protected]. Manuscript received October, 2006; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 235–247
VIRTUAL DETERMINACY IN OVERLAPPING GENERATIONS MODELS BY JONATHAN L. BURKE1 We reappraise the significance and robustness of indeterminacy in overlapping-generations models. In any of Gale’s example economies with an equilibrium that is not locally unique, for instance, perturbing the economy by judiciously splitting each of Gale’s goods into two close substitutes restricts that indeterminacy to each period’s allocation of consumption between those substitutes. In particular, prices, interest rates, the commodity value of nominal savings (including money), and utility levels become determinate. Any indeterminacy of equilibrium consumption in the perturbed economy is thus insignificant to consumers, and some forecasting and comparative-statics policy exercises become possible. KEYWORDS: Indeterminacy, robustness, overlapping generations, forecasting, comparative statics.
1. INTRODUCTION THE GENERAL -EQUILIBRIUM LITERATURE motivates an appraisal of determinacy in rational-expectations models: If an equilibrium is determinate, then forecasting and comparative-statics exercises are conclusive, and rational expectations follow from underlying assumptions of perfect rationality and perfect knowledge of the environment. Debreu (1972, Section 2) interpreted any general-equilibrium model with at least one locally unique equilibrium allocation as determinate insofar as only small changes from that allocation are considered. Kehoe and Levine (1985) described a stronger standard for determinacy in stationary overlapping-generations models. They supposed exogenous, past consumption achieved a steady state and required for determinacy that the particular equilibrium replicating past steady-state consumption from period 1 onward be locally unique among all (stationary and nonstationary) equilibria. One interpretation of the latter standard is consumers have only local information about other consumers’ preferences near past consumption, and so only small changes from past consumption are considered. For simple indeterminacy examples, consider stationary, pure-exchange overlapping-generations models with one consumer per generation, two periods per lifetime, and one (aggregate) good available per period. Under standard conditions, Gale (1973) showed almost every such model has two steady states (autarky and the golden rule), and one steady state is locally unique and 1 Preliminary versions of this paper were presented at the North American Summer Meetings of the Econometric Society at UCLA, and at the first annual CARESS–Cowles conference on general equilibrium and its applications at Yale University. Thoughtful critiques by seminar participants and university colleagues provoked a significant revision. My thanks to one especially thorough referee of this journal for his clarification of my central proof and to Levon Goukasian for help polishing the final draft.
© 2009 The Econometric Society
DOI: 10.3982/ECTA5675
the other is not.2 For the latter steady state, there is a continuum of nearby nonstationary equilibrium allocations. Under Kehoe and Levine’s standard, the model is thus determinate if exogenous, past consumption were the locally unique steady state, but indeterminate if past consumption were the other state. The literature accepts such indeterminacy as robust since it remains in one-good models after any suitably small perturbation of preferences. However, indeterminacy is no longer robust after making two changes to standard analysis. First, allow perturbations of preferences to disaggregate individual goods into substitutes. Specifically, consider a one-good-per-period model to be equivalent to a disaggregated version with two perfect substitutes (say, early and late consumption) available per period, and consider a one-good model to be perturbed by any sequence of two-good models that converge to the twogood disaggregated version of the one-good model. Second, weaken Kehoe and Levine’s standard for determinacy to require that the particular equilibrium replicating past steady-state consumption be virtually locally unique, rather than locally unique in the standard sense. “Virtual” local uniqueness allows there to be a continuum of nonstationary equilibrium allocations in any neighborhood of the particular equilibrium, but prices, rates of return, savings values, and utility levels are constant across all those equilibria when the neighborhood is sufficiently small. (For example, the first consumer born in the economy achieves the same utility level in each equilibrium, as does the second consumer and so on.) In particular, any indeterminacy of equilibrium consumption in the neighborhood is thus insignificant to consumers. Here is the plan of the rest of the paper. Sections 2 and 3 begin precise analysis with a definition of economies and equilibria with two goods available per period and discrete time, starting in period 1. The definition of economies is standard, and includes all the two-good economies we need to perturb away the significant indeterminacy in Gale’s one-good models. Following Kehoe and Levine (1985), equilibria are defined and interpreted as the possible outcomes of a particular comparative-statics exercise.3 Sections 4–6 define virtual local uniqueness and contain the main results (Theorem 5.1 and Theorem 5.2), which prove a representative example of 2
In particular, having every steady state locally unique is exceptional; it only happens when there is a unique equilibrium. Overlapping-generations analysis thus differs from Debreu’s (1972, Section 2) analysis for a finite number of consumers and goods, where the sufficient conditions found for guaranteeing at least one equilibrium is locally unique are the same as the conditions guaranteeing every equilibrium is locally unique. 3 Specifically, suppose exogenous, past consumption achieved a steady state. Then an unexpected sunspot shock occurs (which does not change endowments or preferences), and consumers may revise their expectations and so follow a new perfect-foresight equilibrium path from period 1 onward.
Gale’s indeterminacy models can be perturbed so that the particular equilibrium replicating past consumption becomes virtually locally unique. Thus significant indeterminacy is not robust. Finally, Section 7 considers further implications of Theorem 5.1 and Theorem 5.2, and discusses known extensions. 2. ECONOMIES This section defines a typical universe of pure-exchange overlappinggenerations economies with endowments and preferences that are stationary across generations. There are a countable number of time periods, indexed t ∈ N := {1 2 }, two nonstorable goods available per period, indexed i = 1 2, and a countable number of consumers. Consumer 0 lives in period 1 (his old age), and consumer t ∈ N in periods t (youth) and t + 1 (old age). Throughout the paper, fix a positive endowment e = (a1 a2 b1 b2 ) for consumer t ∈ N of the two goods in youth, ai > 0, and the two goods in old age, bi > 0. (The endowment is the same for every consumer t ∈ N.) Consumer 0 receives the old-age endowment, (b1 b2 ). Let xti τ denote typical consumption of good i in period t ∈ N by consumer τ = t − 1 t in our first definition: DEFINITION 1: A consumption vector x0 = (x011 x012 ) ∈ 2+ for consumer 0 denotes the old-age (period 1) consumption x1i 0 of each good i, and a cont2 t+11 t+12 4 sumption vector xt = (xt1 x x x ) ∈ t t t t + for consumer t ∈ N denotes ti t+1i consumption xt in youth and xt in old age. t2 t+11 Parameterize an economy by a utility function u(xt ) = u(xt1 t xt xt x ) representing preferences for consumer t ∈ N over consumption vectors (Definition 1). (The utility function is the same for every consumer t ∈ N.) The definition of equilibrium (Definition 3) in Section 3 represents consumer 0’s preferences over old-age consumption x0 = (x011 x012 ) by the same utility function u(x001 x002 x011 x012 ) above, after restricting young-age consumption (x001 x002 ) to a particular steady state. The universe of economies are those utility functions u : 4+ → [−∞ +∞) satisfying these assumptions: t+12 t
ASSUMPTION 1: The function u is concave and nondecreasing over 4+ . ASSUMPTION 2: The function u is finite-valued (u > −∞) and increasing over t2 t+11 xt+12 ) ≥ u(e). those vectors at least as good as the endowment, u(xt1 t xt xt t t2 t+11 For example, the modified sum-of-logs function ln(xt1 + t + xt ) + ln(xt ) satisfies both assumptions. x t+12 t
3. EQUILIBRIA

This section defines a typical class of stationary and nonstationary equilibria for each economy u satisfying Assumptions 1 and 2.

DEFINITION 2: An allocation is a path x = (x_t)_{t∈{0}∪N} of consumption vectors, with x_0 = (x_0^{1,1}, x_0^{1,2}) ∈ ℝ²₊ and x_t = (x_t^{t,1}, x_t^{t,2}, x_t^{t+1,1}, x_t^{t+1,2}) ∈ ℝ⁴₊ (Definition 1), that balances materials

(1)    x_{t−1}^{t,i} + x_t^{t,i} = b^i + a^i    (t ∈ N, i = 1, 2).
In the following definition of equilibrium, good 1 is numéraire, p^t > 0 is the relative price of good 2, R^t > 0 is the gross rate of return on savings from period t to period t + 1 (principal plus interest in terms of the numéraire), and M is the period 1 numéraire value of consumer 0’s nominal savings.

DEFINITION 3: For any young-age consumption (x̄^{01}, x̄^{02}) ∈ ℝ²₊ by consumer 0, an allocation x = (x_t)_{t∈{0}∪N} is an equilibrium (or equilibrium continuation of (x̄^{01}, x̄^{02})) if it is supported by some price sequence p = (p^t)_{t∈N} ∈ ℝᴺ₊₊, rate-of-return sequence R = (R^t)_{t∈N} ∈ ℝᴺ₊₊, and savings value M ∈ ℝ. Specifically, old-age consumption x_0 = (x_0^{1,1}, x_0^{1,2}) ∈ ℝ²₊ by consumer 0 maximizes utility u(x̄^{01}, x̄^{02}, x_0^{1,1}, x_0^{1,2}) over all vectors x_0 in ℝ²₊ satisfying the budget constraint

(2)    (x_0^{1,1} − b^1) + p^1 (x_0^{1,2} − b^2) = M,

and lifetime consumption x_t = (x_t^{t,1}, x_t^{t,2}, x_t^{t+1,1}, x_t^{t+1,2}) by consumer t ∈ N maximizes utility u(x_t) over all vectors x_t in ℝ⁴₊ satisfying

(3)    (x_t^{t,1} − a^1) + p^t (x_t^{t,2} − a^2) + [ (x_t^{t+1,1} − b^1) + p^{t+1} (x_t^{t+1,2} − b^2) ] / R^t = 0.

DEFINITION 4: An equilibrium continuation x = (x_t)_{t∈{0}∪N} of (young-age) consumption (x̄^{01}, x̄^{02}) ∈ ℝ²₊ is a stationary equilibrium if consumption is stationary across consumers. That is, writing (old-age) consumption x_0 = (x̄^{11}, x̄^{12}) for consumer 0, then x_t = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}) for consumer t ∈ N.

DEFINITION 5: A steady state is the consumption vector x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}) ∈ ℝ⁴₊ received by every consumer in a stationary equilibrium.
Since utility is assumed increasing over those vectors at least as good as the endowment (Assumption 2), the set of equilibria would be unchanged 4
Section 2 fixed individual consumer endowments e = (a1 a2 b1 b2 ) throughout the paper.
if prices p^t were allowed to be zero or negative. Likewise, the set of equilibria would be unchanged if budget constraints were defined by a sequence (q^{1,1}, q^{1,2}), (q^{2,1}, q^{2,2}), ... of present-value prices. For example, the budget constraint (3) of consumer t ∈ N evidently has the same feasible solutions x_t = (x_t^{t,1}, x_t^{t,2}, x_t^{t+1,1}, x_t^{t+1,2}) as the alternative constraint

q^{t,1}(x_t^{t,1} − a^1) + q^{t,2}(x_t^{t,2} − a^2) + q^{t+1,1}(x_t^{t+1,1} − b^1) + q^{t+1,2}(x_t^{t+1,2} − b^2) = 0

defined by positive present-value prices.5

Kehoe and Levine (1985) interpreted the set of all equilibrium continuations of a fixed steady state (Definitions 3 and 5) as the possible outcomes of a particular comparative-statics exercise. They supposed exogenous, past consumption before period 1 achieved a steady state x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}), and then an unexpected shock occurs. In the current setting, just consider an unexpected sunspot, which does not change endowments or preferences. After the shock, consumers may revise their expectations and so follow any equilibrium continuation (Definition 3) of the steady-state young-age consumption (x̄^{01}, x̄^{02}). Specifically, consumers can buy and sell commodities on complete spot markets at perfectly foreseen prices, and consumer 0 carries over nominal savings from the past.6

4. VIRTUAL LOCAL UNIQUENESS

This section compares a typical definition of local uniqueness of equilibria to virtual local uniqueness for each economy u satisfying Assumptions 1 and 2. Following Gale (1973) and Kehoe and Levine (1985), endow the ambient space (ℝ²₊ × ℝ⁴₊ × · · ·) of consumption paths (Definition 2) with the uniform (or sup) topology (Munkres (1975, pp. 119, 122)).7

DEFINITION 6: A stationary equilibrium x = (x_t)_{t∈{0}∪N}, with steady state x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}), is locally unique if, for some neighborhood X ⊂ (ℝ²₊ × ℝ⁴₊ × · · ·) of x, the path x is the only allocation in X that is an equilibrium continuation (Definition 3) of young-age consumption (x̄^{01}, x̄^{02}).

DEFINITION 7: A stationary equilibrium x = (x_t)_{t∈{0}∪N}, with steady state x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}), is virtually locally unique if, for some neighborhood X ⊂

5 Specifically, q^{t,1} := 1/∏_{τ=1}^{t−1} R^τ (with null product q^{1,1} := 1) and q^{t,2} := p^t q^{t,1} derive present-value prices from the p^t and R^t, while p^t := q^{t,2}/q^{t,1} and R^t := q^{t,1}/q^{t+1,1} derive the p^t and R^t from present-value prices.
6 Such savings is called “money” in the literature when its commodity value is nonnegative.
7 The uniform topology is generated by the metric sup_{t∈{0}∪N} min{‖x_t − x̂_t‖, 1}, defined over pairs of paths x = (x_t)_{t∈{0}∪N} and x̂ = (x̂_t)_{t∈{0}∪N}.
(ℝ²₊ × ℝ⁴₊ × · · ·) of x, there is a unique triple (p, R, M) of a price sequence p = (p^t)_{t∈N}, a rate-of-return sequence R = (R^t)_{t∈N}, and a savings value M that support ((2), (3)) every allocation in X that is an equilibrium continuation (Definition 3) of young-age consumption (x̄^{01}, x̄^{02}).

Kehoe and Levine (1985) interpreted local uniqueness of a stationary equilibrium (Definition 6) by supposing exogenous, past consumption achieved a steady state x̄ and that consumers have only local information about other consumers’ preferences near x̄. Hence, only allocations in the neighborhood X are considered. Local uniqueness of the particular equilibrium x replicating past steady-state consumption from period 1 onward thus implies determinacy, and forecasting and comparative-statics policy exercises are possible. “Virtual” local uniqueness (Definition 7) has virtually the same interpretation and implications: there may be many nonstationary equilibrium allocations in the neighborhood X, but prices, rates of return, and savings values are constant across all those equilibria. Any nonuniqueness of consumption in the neighborhood is thus insignificant to consumers, and forecasting and comparative-statics policy exercises are still possible.

5. PERTURBING INDETERMINACY EXAMPLES

This section perturbs away significant indeterminacy for a representative example of Gale’s indeterminacy models. Gale’s models have only one good per period. To fix an example, fix the endowment at 6 for the good in youth and 2 in old age, and fix the sum-of-logs utility function

(4)
ln(x0 ) + ln(x1 )
to represent preferences over youth x0 and old-age x1 consumption. Gale (1973) proved there are two stationary equilibria (and steady states): the golden rule and autarky. The golden rule is locally unique (and virtually locally unique), but autarky is not locally unique (and not virtually locally unique). The model is thus indeterminate, under Kehoe and Levine’s standard, if exogenous, past consumption was autarkic. The first part of our perturbation of Gale’s example requires there be at least one way to disaggregate goods (say, into early and late moments within each period) so that there is a complete spot market for each of two goods available per period. Although we could accommodate any disaggregation and any individual consumer endowments (a1 a2 b1 b2 ) such that a1 + a2 = 6 and b1 + b2 = 2, for the rest of the paper just consider the disaggregation for which individual consumer endowments are (5)
e = (a1 a2 b1 b2 ) = (3 3 1 1)
That is, ai = 3 units of each good in youth, and bi = 1 of each in old age.
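The two steady states of this one-good example can be checked directly. The short sketch below is an added illustration (not part of the original text); the autarky and golden-rule allocations follow from the endowment (6, 2) and utility (4).

```python
# Check of Gale's one-good example: autarky and the golden rule for u = ln(x0) + ln(x1)
# with endowment (6, 2).  The supporting gross return equals the marginal rate of substitution.
import numpy as np

def mrs(c_young, c_old):
    """Marginal rate of substitution u1/u2 for u = ln(c_young) + ln(c_old)."""
    return (1.0 / c_young) / (1.0 / c_old)

# Autarky: consume the endowment; the supporting gross return is the MRS at (6, 2).
print(mrs(6.0, 2.0))                                   # 1/3

# Golden rule: stationary allocation with gross return 1 and balanced materials.
c_young, c_old = 4.0, 4.0
print(np.isclose(mrs(c_young, c_old), 1.0), c_young + c_old == 6.0 + 2.0)
```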
One underlying implicit assumption in any general-equilibrium model with a finite number of goods available is that if any one of the goods were disaggregated into two (or more) goods, then those disaggregated goods must be perfect substitutes.8 Thus Gale’s model implicitly assumes his one-good-perperiod function (4) to be equivalent to this disaggregated version v with two perfect substitutes per period (6)
v(x01 x02 x11 x12 ) := ln(x01 + x02 ) + ln(x11 + x12 )
There is an evident equivalence of steady states and of equilibria between the one-good version (4) and the two-good version v (6) of Gale’s example. In particular, the autarkic steady state in the one-good version is equivalent, in the two-good version v, to a barter steady state—that is, a steady state x¯ = (x¯ 01 x¯ 02 x¯ 11 x¯ 12 ) in which savings across periods have zero value ((2), (5)) (7)
(x̄^{01} − 3) + p^t (x̄^{02} − 3) = 0,    (x̄^{11} − 1) + p^t (x̄^{12} − 1) = 0    (t ∈ N)
for the price sequence (pt )t∈N supporting the stationary equilibrium associated with the steady state (Definition 5). THEOREM 5.1: Gale’s economy v (6) has an infinite number of barter steady states indexed (8)
x(λ) := λ(4, 2, 0, 2) + (1 − λ)(2, 4, 2, 0)    for λ ∈ [0, 1].
The stationary equilibrium associated with any one of those barter steady states is not locally unique and not virtually locally unique.9 Theorem 5.1 is proved in the Appendix. Theorem 5.1 implies that the significant indeterminacy in Gale’s one-good model (4) remains in the two-good version v (6) if exogenous, past consumption were one of the barter steady states (8). The remainder of our perturbation is to consider Gale’s one-good example (4) to be perturbed by a particular sequence of two-good economies uk that converge to the two-good disaggregated version v (6) of the one-good example. 8
That assumption is commonly recognized in applied general-equilibrium models. If one only has enough data to define “hats” as a good, then one must assume “black hats” and “white hats” are perfect substitutes. 9 There are also an infinite number of nonbarter steady states, but the stationary equilibrium associated with any one of those is virtually locally unique and so is not central to our analysis.
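The barter property of the steady states x(λ) in (8) can be verified numerically. The sketch below is an added check, not part of the proof; it uses the endowment (3, 3, 1, 1) from (5) and the two-good utility v from (6).

```python
# Check that each x(lambda) in (8) is a barter steady state of v: with p_t = 1 both young-age
# and old-age savings are zero, and the gradient of v at x(lambda) is proportional to
# (1, p, 1/R, p/R) with R = 1/3.
import numpy as np

def x(lam):
    return lam * np.array([4.0, 2.0, 0.0, 2.0]) + (1 - lam) * np.array([2.0, 4.0, 2.0, 0.0])

def grad_v(z):                                   # v = ln(z1 + z2) + ln(z3 + z4)
    g_young = 1.0 / (z[0] + z[1])
    g_old = 1.0 / (z[2] + z[3])
    return np.array([g_young, g_young, g_old, g_old])

for lam in np.linspace(0.0, 1.0, 5):
    z = x(lam)
    young_savings = (z[0] - 3) + (z[1] - 3)      # zero savings (7) at p_t = 1
    old_savings = (z[2] - 1) + (z[3] - 1)
    R = grad_v(z)[0] / grad_v(z)[2]              # gross return implied by the gradient ratio
    print(np.isclose(young_savings, 0), np.isclose(old_savings, 0), np.isclose(R, 1/3))
```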
THEOREM 5.2: There exists a concave and increasing C∞ function u : ℝ⁴₊ → ℝ such that, for k = 1, 2, ..., the perturbed economy10

(9)    u_k := v + (1/k) u
has the same set of barter (8) steady states x(λ), λ ∈ [0 1] as the Gale economy v. But the stationary equilibrium associated with any one of those barter steady states x(λ) with λ > 0 is now virtually locally unique for the perturbed economy (9). Likewise, there exists another perturbation function u satisfying all the properties above, but the stationary equilibrium associated with any barter steady state x(λ) with λ < 1 is now virtually locally unique for the perturbed economy. Theorem 5.2 shows each perturbed economy overturns the significant indeterminacy (lack of virtual local uniqueness) Theorem 5.1 finds in the Gale economy v (6) if exogenous, past consumption were one of the barter steady states x(λ). Additionally, because the perturbation function u is C ∞ , the sequence of perturbed economies (9) evidently converges to the Gale economy v for any of the topologies typically considered in the literature. 6. PROOF OF THEOREM 5.2 By symmetry, it is sufficient to prove the first paragraph of Theorem 5.2. STEP 1: Construct the perturbation function u. Consider the multivariate polynomial (10)
φ := (1/6)(x^{01} + x^{02}) + (1/2)(x^{11} + x^{12}) − ε(x^{01} + x^{02} − 6)²(108 + (x^{12} + 1)²)
over all vectors x = (x01 x02 x11 x12 ) ∈ 4+ . First-order partial derivatives (11)
φ₁ = φ₂ = 1/6 − 2ε(x^{01} + x^{02} − 6)(108 + (x^{12} + 1)²),

(12)    φ₃ = 1/2,    φ₄ = 1/2 − 2ε(x^{01} + x^{02} − 6)²(x^{12} + 1),
and second-order derivatives prove that if parameter ε is chosen to be positive and sufficiently small, then φ is concave and increasing over all vectors (x^{01}, x^{02}, x^{11}, x^{12}) ≤ (5, 5, 5, 5). Fix such a parameter ε > 0. Hence, there ex-

10 Evidently, the function u_k := v + (1/k)u satisfies Assumptions 1 and 2 for economies because the Gale economy v (6) satisfies those assumptions, and the function u is everywhere finite-valued, concave, and increasing.
ists11 a concave and increasing C∞ function u : ℝ⁴₊ → ℝ that equals φ over those vectors in the feasible set

(13)    F := { (x^{01}, x^{02}, x^{11}, x^{12}) ∈ ℝ⁴₊ | (x^{01}, x^{02}, x^{11}, x^{12}) ≤ (4, 4, 4, 4) }.

Material balance ((1), (5)) implies the set F (13) contains every steady state and every consumption for consumer t ∈ N that is part of any equilibrium. By construction, functions u and φ have the same first-order partial derivatives ((11), (12)) over F. Those derivatives ((11), (12)) for function u evidently have the following properties for each vector (x^{01}, x^{02}, x^{11}, x^{12}) ∈ F:

PROPERTY 1: u₁ = u₂.
PROPERTY 2: u₃ ≥ u₄ over F with equality if and only if x^{01} + x^{02} = 6.
PROPERTY 3: u₁ = (1/3)u₄ if both x^{01} + x^{02} = 6 and x^{11} + x^{12} = 2.
Q.E.D.
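The sign claims for the first-order derivatives (11)-(12) are easy to confirm numerically. The sketch below is an added check under an assumed value of ε (Step 1 only requires ε to be positive and small enough); it does not verify the second-order (concavity) part of the claim.

```python
# Grid check that the first-order partial derivatives (11)-(12) of phi are strictly positive
# over vectors <= (5, 5, 5, 5) for an assumed small epsilon.
import numpy as np

eps = 1e-4   # assumed small positive parameter

def partials(x01, x02, x11, x12):
    d1 = 1/6 - 2*eps*(x01 + x02 - 6)*(108 + (x12 + 1)**2)   # phi_1 = phi_2
    d3 = 1/2                                                # phi_3
    d4 = 1/2 - 2*eps*(x01 + x02 - 6)**2*(x12 + 1)           # phi_4
    return d1, d3, d4

grid = np.linspace(0.0, 5.0, 6)
ok = all(min(partials(a, b, c, d)) > 0
         for a in grid for b in grid for c in grid for d in grid)
print(ok)   # True: phi is increasing on the box for this epsilon
```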
STEP 2: For k = 1, 2, ..., prove the perturbed economy u_k = v + (1/k)u ((6), (9)) has the same set of barter steady states as the Gale economy v (8).

PROOF: The perturbed economy ((6), (9))

u_k(x^{01}, x^{02}, x^{11}, x^{12}) = ln(x^{01} + x^{02}) + ln(x^{11} + x^{12}) + (1/k) u(x^{01}, x^{02}, x^{11}, x^{12})
evidently inherits Properties 1, 2, and 3 from economy u. That is, for each vector (x^{01}, x^{02}, x^{11}, x^{12}) ∈ F:

PROPERTY 1′: u_1^k = u_2^k.
PROPERTY 2′: u_3^k ≥ u_4^k over F with equality if and only if x^{01} + x^{02} = 6.
PROPERTY 3′: u_1^k = (1/3)u_4^k if both x^{01} + x^{02} = 6 and x^{11} + x^{12} = 2.

To compute the barter steady states of the perturbed economy u_k, consider consumption x(λ) = λ(4, 2, 0, 2) + (1 − λ)(2, 4, 2, 0) for some λ ∈ [0, 1]. That vector satisfies the hypotheses of Properties 2′ and 3′. Hence, u_1^k = u_2^k = (1/3)u_3^k = (1/3)u_4^k. That is, for the stationary prices p^t = 1 and rates of return
R^t = 1/3, the utility gradient ∂u_k = (u_1^k, u_2^k, u_3^k, u_4^k) at x(λ) is proportional to (1, p^t, 1/R^t, p^{t+1}/R^t), and for p^t = 1, x(λ) evidently has zero savings across periods (7). Hence, x(λ) is a barter steady state (7) with stationary prices p^t = 1, rates of return R^t = 1/3, and savings value M = 0. Thus, the perturbed economy u_k has all of the barter steady states x(λ), λ ∈ [0, 1], of economy v.

To prove the converse, consider any barter steady state x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}) of the perturbed economy u_k, and consider any price sequence (p^t)_{t∈N} supporting the stationary equilibrium associated with the steady state x̄. Zero young-age savings (7), (x̄^{01} − 3) + p^t(x̄^{02} − 3) = 0, implies price p^t = 1 since, otherwise, budget-constrained utility maximization ((3), (5)) by consumer t ∈ N and u_1^k = u_2^k over the feasible set F (Property 1′) implies either x̄^{01} > 6 or x̄^{02} > 6, either of which contradicts x̄ ∈ F (13). But p^t = 1 and zero savings (7) imply x̄^{01} + x̄^{02} = 6, which with material balance ((1), (5)) equations x̄^{01} + x̄^{11} = 4 and x̄^{02} + x̄^{12} = 4 imply (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}) = (2 + 2λ, 4 − 2λ, 2 − 2λ, 2λ) for λ := (x̄^{01} − 2)/2. That is, the barter steady state x̄ = (x̄^{01}, x̄^{02}, x̄^{11}, x̄^{12}) equals x(λ) (8). Finally, the nonnegativity of consumption x̄ implies λ ∈ [0, 1], as required. Q.E.D.

STEP 3: For k = 1, 2, ..., prove the stationary equilibrium associated with any one of the barter steady states x(λ) with λ > 0 is virtually locally unique for the perturbed economy u_k = v + (1/k)u (9).

PROOF: Since λ > 0, consumption x(λ) is positive in its first, second, and fourth components (8). Hence, define the required neighborhood12 X ⊂ (ℝ²₊ × ℝ⁴₊ × · · ·) of the stationary equilibrium with steady state x(λ) to be those paths x = (x_t)_{t∈{0}∪N} with consumption positive in their first, second, and fourth components

(14)
x_t^{t,1} > 0,    x_t^{t,2} > 0,    and    x_t^{t+1,2} > 0
for each consumer t ∈ N. It remains to consider any equilibrium continuation ((2), (3)) x = (xt )t∈{0}∪N of steady state x(λ) that is in the neighborhood (14), and show there is a unique 12 The set X is a neighborhood of the stationary equilibrium because the interior of X contains the stationary equilibrium.
value for supporting prices p^t, rates of return R^t, and savings values M. Specifically, we will show p^t = 1, R^t = 1/3, and M = 0 for each equilibrium continuation x = (x_t)_{t∈{0}∪N} in the neighborhood (14). Lagrangian first-order conditions for budget-constrained utility maximization (3) for consumer t ∈ N with positive consumption in first, second, and fourth components (14) imply

(15)    p^t = u_2^k(x_t) / u_1^k(x_t),

(16)    u_4^k(x_t) / u_3^k(x_t) ≥ p^{t+1},

(17)    R^t = p^{t+1} u_4^k(x_t) / u_1^k(x_t).
Hence, Property 1′ and the first equality (15) determine prices p^t = 1 for each period t ∈ N. Hence, for consumer t ∈ N, price p^{t+1} = 1 and the inequality (16) imply u_4^k(x_t) ≥ u_3^k(x_t), which with Property 2′ implies the young-age consumption aggregation

(18)    x_t^{t,1} + x_t^{t,2} = 6.

That young-age restriction (18) holding in each period combines with material balance ((1), (5)) to restrict old-age consumption

(19)    x_t^{t+1,1} + x_t^{t+1,2} = 2.
Hence, the aggregation restrictions ((18), (19)) and Property 3′ imply u_1^k(x_t) = (1/3)u_4^k(x_t), which with price p^{t+1} = 1 and the second equality (17) determines the rate of return, R^t = 1/3. Finally, price p^1 = 1 and the old-age restriction x_0^{1,1} + x_0^{1,2} = 2 (19) substitute into consumer 0’s budget constraint ((2), (5)) to determine the savings value, M = 0. Q.E.D.

7. CONCLUSION

Theorems 5.1 and 5.2 were specialized to perturb away the significant indeterminacy for just a representative example of Gale’s indeterminacy models. That should provoke generalizations and extensions. One known generalization is that the proof of Theorem 5.2 in Section 6 can be adapted from log-linear utility (4) with a particular disaggregation of endowments (5) to accommodate any of Gale’s indeterminacy models and any disaggregation of endowments. One known extension is that the conclusion that the equilibrium in
the perturbed economies is virtually locally unique can be reformulated and strengthened to a conclusion that the correspondence u → S(u) of economies u into the set S(u) of supporting prices, rates of return, and savings values is single-valued and continuous at the perturbed economies uk . Roughly, continuity means that significant indeterminacy returns slowly (the support set S(u) remains small) as one moves away from the perturbed economies. To guide further generalizations and extensions, note that our perturbation of indeterminant overlapping-generations models into virtually determinate models required both of our nonconventional steps: (1) perturbing models with one good into models with two perfect or close substitutes, and (2) interpreting indeterminacy in the division of goods among substitutes as insignificant. In any model, taking one step alone is insufficient and can mislead one to think that determinate models can be perturbed into indeterminate models. For example, consider a partial-equilibrium model with a single consump¯ tion variable and determinate equation: f (x) = 0 with a single solution at x. Step 1 alone considers f (x) = 0 to be equivalent to the bivariate equation f (x1 + x2 ) = 0, and the latter equation seems to be indeterminate because ¯ But f (x1 + x2 ) = 0 has an infinite number of solutions, defined by x1 + x2 = x. step 2 interprets that indeterminacy in the division of x¯ between x1 and x2 to be insignificant. Finally, both step 1 and step 2 make sense when variables measure economic consumption or production. It remains to be seen whether there are other applications. APPENDIX: PROOF OF THEOREM 5.1 Step 2 in Section 6 proves any economy satisfying Properties 1 and 3 and u3 (x(λ)) = u4 (x(λ)) has the required set of barter steady states (8). The Gale economy v (6) satisfies those properties, and so it has the required barter steady states. It remains to prove the stationary equilibrium associated with any one of those barter steady states x(λ) = λ(4 2 0 2) + (1 − λ)(2 4 2 0), λ ∈ [0 1], is not locally unique and not virtually locally unique. For any sequence of scalers z t ∈ [0 1], consider the path x = (xt )t∈{0}∪N of consumption vectors (20)
x_0 := λ(0, 2) + (1 − λ)(2, 0) + (z^0, 0)    and    x_t := x(λ) + (−z^{t−1}, 0, z^t, 0),

where the pair λ(0, 2) + (1 − λ)(2, 0) consists of the third and fourth components of x(λ). The path x is evidently nonnegative and balances materials ((1), (5), and (20)), and so is an allocation (Definition 2). Compute marginal utilities

v₁(x_t) = v₂(x_t) = 1/(6 − z^{t−1})    and    v₃(x_t) = v₄(x_t) = 1/(2 + z^t)
along the consumption path (20), and consider the price sequence and rate-of-return sequence

(21)    p^t := 1    and    R^t := (2 + z^t)/(6 − z^{t−1}).

Since the utility gradient ∂v = (v₁, v₂, v₃, v₄) at x_t is proportional to (1, p^t, 1/R^t, p^{t+1}/R^t), the Lagrangian first-order conditions for budget-constrained utility maximization by consumer t ∈ N at x_t ((3), (20)) reduce to the budget constraint

(22)    −z^{t−1} + [(6 − z^{t−1})/(2 + z^t)] z^t = 0.
Evidently, that constraint defines the first-order difference equation z t = f (z t−1 ) := z t−1 /(3 − z t−1 ), which satisfies 0 = f (0) and 0 < f (z t−1 ) < 1 for z t−1 ∈ [0 1]. Therefore, for any neighborhood X ⊂ (2+ × 4+ × · · ·) of the stationary equilibrium associated with barter steady state x(λ), any sufficiently small initial value z 0 > 0 generates a unique solution to the difference equation z t = f (z t−1 ), and a unique equilibrium (20), x = (xt )t∈T ∈ X. The stationary equilibrium associated with x(λ) is thus not locally unique, since there is an equilibrium (20) in neighborhood X for each sufficiently small z 0 > 0. The stationary equilibrium is also not virtually locally unique because the rates of return Rt (21) vary with z 0 ; in particular, R1 := (2 + f (z 0 ))/(6 − z 0 ). Q.E.D. REFERENCES DEBREU, G. (1972): “Smooth Preferences,” Econometrica, 40, 603–612. [235,236] GALE, D. (1973): “Pure Exchange Equilibrium of Dynamic Economic Models,” Journal of Economic Theory, 6, 12–36. [235,239,240] KEHOE, T., AND D. LEVINE (1985): “Comparative Statics and Perfect Foresight in Infinite Horizon Economies,” Econometrica, 53, 433–453. [235,236,239,240] MAS-COLELL, A. (1985): The Theory of General Economic Equilibrium: A Differentiable Approach. New York: Cambridge University Press. [243] MUNKRES, J. (1975): Topology: A First Course. Englewood Cliffs, NJ: Prentice-Hall. [239]
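The difference equation and the dependence of the rates of return on z^0 can be illustrated numerically. The sketch below is an added illustration of the argument just given; the initial values z^0 are assumptions.

```python
# Iterate z^t = f(z^{t-1}) = z^{t-1} / (3 - z^{t-1}) from several small z^0 > 0 and report
# R^1 = (2 + f(z^0)) / (6 - z^0), which varies with z^0 across the nearby equilibria.
def f(z):
    return z / (3.0 - z)

for z0 in (0.0, 0.05, 0.10):
    z, path = z0, [z0]
    for _ in range(5):
        z = f(z)
        path.append(z)
    R1 = (2 + f(z0)) / (6 - z0)
    print([round(v, 4) for v in path], round(R1, 4))
```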
Business Administration Division, Pepperdine University, 24255 Pacific Coast Hwy., Malibu, CA 90263-4237, U.S.A.; [email protected]. Manuscript received February, 2005; final revision received August, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 249–278
TWO NEW CONDITIONS SUPPORTING THE FIRST-ORDER APPROACH TO MULTISIGNAL PRINCIPAL–AGENT PROBLEMS BY JOHN R. CONLON1 This paper presents simple new multisignal generalizations of the two classic methods used to justify the first-order approach to moral hazard principal–agent problems, and compares these two approaches with each other. The paper first discusses limitations of previous generalizations. Then a state-space formulation is used to obtain a new multisignal generalization of the Jewitt (1988) conditions. Next, using the Mirrlees formulation, new multisignal generalizations of the convexity of the distribution function condition (CDFC) approach of Rogerson (1985) and Sinclair-Desgagné (1994) are obtained. Vector calculus methods are used to derive easy-to-check local conditions for our generalization of the CDFC. Finally, we argue that the Jewitt conditions may generalize more flexibly than the CDFC to the multisignal case. This is because, with many signals, the principal can become very well informed about the agent’s action and, even in the one-signal case, the CDFC must fail when the signal becomes very accurate. KEYWORDS: Principal–agent model, moral hazard, first-order approach, multiple signals.
1. INTRODUCTION MORAL HAZARD PRINCIPAL –AGENT PROBLEMS have always presented surprising technical challenges. In particular, a great deal of attention has focused on determining conditions under which the principal can predict the agent’s behavior using the agent’s first-order conditions alone. This issue was first raised by Mirrlees ((1999); originally circulated in 1975) and Guesnerie and Laffont (1978), and was further discussed by Grossman and Hart (1983). Rogerson (1985) gave a rigorous set of sufficient conditions for the first-order approach in the one-signal case, based on the “convexity of the distribution function condition” (CDFC). Jewitt (1988) gave a different set of sufficient conditions, not using the CDFC, also for the one-signal case. In addition, Jewitt (1988) gave two sets of sufficient conditions for the multisignal case, one of which also avoided the CDFC. However, both sets of Jewitt’s multisignal conditions assumed that signals were independent of one another. 1 This paper benefitted greatly from a faculty seminar at the University of Mississippi. Discussions with Richard Boylan, Gerard Buskes, Don Cole, Ron Harstad, David Marshall, Frank Page, Hyun Park, Paul Pecorino, Marcin P¸ eski, William Rogerson, Bernard Sinclair-Desgagné, Jeroen Swinkels, and Min-hung Tsay were also extremely helpful. In particular, especially perceptive questions by Bernard Sinclair-Desgagné led to the results in Section 8 and contributed significantly to Section 10. Ian Jewitt also suggested numerous major improvements, only a few of which are explicitly acknowledged below. Finally, extensive and thoughtful suggestions by the referees and by the co-editor, Larry Samuelson, have greatly streamlined and improved the paper. This work was made possible, in part, by a sabbatical from the University of Mississippi in the fall of 2005. All remaining errors are mine.
© 2009 The Econometric Society
DOI: 10.3982/ECTA6688
Six years later, Sinclair-Desgagné (1994) developed a generalization of the CDFC, the GCDFC, for the multisignal case. This allowed him to avoid Jewitt’s independence assumption. However, the GCDFC requires the marginal distribution on all but one of the principal’s signals to be an affine (i.e., constant sloped) function of the agent’s action (see Section 3). Also, a second less essential condition of his makes the marginal distribution on all but one of the principal’s signals independent of the agent’s action. In addition, the CDFC and GCDFC do not capture the standard diminishing marginal returns intuition. As Rogerson (1985) explained, “if output is determined by a stochastic production function with diminishing returns to scale in each state of nature, the implied distribution function over output will not, in general, exhibit the CDFC.” Thus, neither Rogerson’s one-signal results nor Sinclair-Desgagné’s multisignal results build on the usual economic notion of diminishing marginal returns. Jewitt’s (1988) conditions, by contrast, do build on the standard economic sense of diminishing returns, so his conditions are closer to ordinary economic intuition. However, the profession seems to have regarded his conditions as overly technical. Thus, while most principal–agent textbooks present the Rogerson CDFC, none presents the Jewitt conditions. Also, little theoretical work has built on these conditions. In particular, no multisignal generalizations of Jewitt’s conditions are available beyond those in Jewitt’s original paper, which assume independent signals. The goal of this paper is to clarify and extend both of the above methods of justifying the first-order approach. First we suggest that the Rogerson conditions arise most naturally from the Mirrlees formulation of the agent’s problem, while the Jewitt conditions arise naturally from the older state-space perspective of Spence and Zeckhauser (1971) and Ross (1973). In addition we discuss limitations of previous multisignal generalizations of the Jewitt and Rogerson one-signal approaches. We next present new multisignal extensions of each set of conditions, which are more general than previous extensions. The multisignal generalization of Jewitt’s approach yields essentially Jewitt’s main set of multisignal conditions, but without his restrictive independence assumption. The multisignal extension of the CDFC approach involves a new condition, the concave increasing-set probability (CISP) condition, which generalizes the CDFC, and is more flexible than previous extensions. In addition, if we want to allow for a risk averse principal, we need an additional condition, the nondecreasing increasing-set probability (NISP) condition. Finally, we argue that the Jewitt approach may generalize more flexibly to the multisignal case than the CDFC approach. This is because, with many signals, the principal tends to become very well informed about the agent’s action and, even in the one-signal case, the CDFC must fail when the signal becomes very accurate.
The multisignal case is of practical interest since, in most actual environments, the principal sees a wide range of signals of an agent’s effort. The multisignal case is also especially interesting because it is often used to underscore the central role of information in principal–agent problems (Holmström (1979), Shavell (1979)). The next section introduces the basic framework. Section 3 discusses previous results, and Section 4 presents the new multisignal generalization of Jewitt’s conditions. Sections 5 and 6 derive simple new multisignal generalizations of the CDFC approach, for risk neutral and risk averse principals, respectively. Section 7 reduces the key NISP and CISP conditions to the existence of probability density flows having certain local properties. Section 8 shows how to construct these density flows for a given distribution function, and so check NISP and CISP. Section 9 briefly discusses other extensions. Section 10 compares the two main approaches and concludes.2

2. BASIC FRAMEWORK

This section presents the general framework for multisignal principal–agent problems, describing both the state-space and the Mirrlees formulations of the problem. Assume one principal and one agent. The agent chooses an effort level a ≥ 0. This is combined with a random vector of state variables ϑ̃, with probability density function p(ϑ), to generate a random vector of signals

(1SS)
˜ x˜ = x(a ϑ)
with x(· ·) taking values in Rn . Equation (1SS ) is a state-space representation of the technology. Assume, until Section 6, that the principal is risk neutral. The agent has a von Neumann–Morgenstern utility function u(s) − a, where s is the agent’s monetary payoff and u(·) is a strictly increasing function.3 Let s = s(x) be the function, chosen by the principal, specifying her payment to the agent as a function of the signal x. Let the value of output be given by the function π(x), as in Sinclair-Desgagné (1994). Then the principal’s expected payoff is (2SS ) V (s(·) a) = π(x(a ϑ)) − s(x(a ϑ)) p(ϑ) dϑ 2 This paper is mostly self-contained. The main exceptions are that (a) I have not derived the principal’s standard first-order conditions, (7) and (15) below, and (b) I have not included Jewitt’s (1988) argument that μ ≥ 0 in (7). 3 This function could easily be replaced by u(s) − c(a) with c(·) convex. However, the curvature ˆ but (1SS ) becomes of c(·) can be important. For if we let aˆ = c(a), then utility becomes u(s) − a, ˜ Thus, if c(·) is concave, then c −1 (·) is convex, which can alter the production ˆ ϑ). x˜ = x(c −1 (a) technology in ways that overturn some of the conditions below. See Brown, Chiang, Ghosh, and Wolfstetter (1986) on the importance of the curvature of c(a).
252
JOHN R. CONLON
and the agent’s expected payoff is (3SS ) U(s(·) a) = u s(x(a ϑ)) p(ϑ) dϑ − a The principal’s problem is then to choose a payment schedule, s∗ (·), and target action, a∗ , by the agent, to maximize (2SS ), given two constraints: (4)
a∗ maximizes the expected payoff U(s∗ (·) a) to the agent
and (5)
the resulting expected payoff to the agent U(s∗ (·) a∗ ) ≥ U0
where U0 is the agent’s reservation utility. Here (4) and (5) are the usual incentive compatibility and participation (individual rationality) constraints. To rephrase this in terms of a Mirrlees representation, let (1M )
˜ ≤ x) F(x|a) = Prob(x(a ϑ)
be the cumulative distribution function of x˜ implied by (1SS ) (where x ≤ y for x, y ∈ Rn means xi ≤ yi , i = 1 2 n) and let f (x|a) be the corresponding density function. Assume as usual that the support of f (x|a) is compact and independent of a, and that f (x|a) is sufficiently smooth and bounded between two positive constants on its support. Using the Mirrlees notation, the principal’s expected payoff becomes (2M ) V (s(·) a) = [π(x) − s(x)]f (x|a) dx and the agent’s expected payoff becomes (3M ) U(s(·) a) = u(s(x))f (x|a) dx − a The principal’s problem is still to maximize V (s(·) a) subject to (4) and (5). Our primary focus is (3SS ) and (3M ). These are equivalent ways to represent the agent’s expected payoff in terms of the payment function s(·) and the agent’s action a. However, (3SS ) explicitly separates out the production technology x(a ϑ) from the density p(ϑ), while (3M ) folds them together in f (x|a). On the other hand, the state-space representation contains extraneous information since many different state-space representations correspond to any given Mirrlees representation (see Section 4). The first-order approach assumes that one can replace the incentive compatibility constraint (4) by a “relaxed” constraint—the agent’s first-order condition (6)
Ua (s∗ (·) a∗ ) = 0
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
253
where subscripts denote partial derivatives. To ensure that (6) implies (4), it is sufficient for the agent’s utility, U(s∗ (·) a), to be a concave function of her effort a. The goal of the first-order approach is to find conditions which yield this concavity. If one does replace (4) by (6) in the principal’s problem, then the principal’s optimal contract, s∗ (x), for inducing the agent to choose some action a∗ at minimum cost, satisfies the principal’s usual first-order condition (7)
1 fa (x|a∗ ) = λ + μ u (s∗ (x)) f (x|a∗ )
where λ and μ are the Lagrange multipliers for the participation and relaxed incentive compatibility constraints, (5) and (6), respectively (see Jewitt, Kadan, and Swinkels (2008) for an elegant treatment). In addition, as shown by Jewitt (1988, p. 1180), if (6) and (7) hold, then μ ≥ 0. Jewitt’s argument also applies to the multisignal case (Jewitt (1988, p. 1184)). With this setup, conditions that support the first-order approach are generally derived using the following clever argument. Assume that s∗ (·) solves the principal’s relaxed problem, maximizing V (s(·) a), subject to (5) and (6). Thus s∗ (·) satisfies (7). Suppose also that, if (7) holds, then one can use this to show that U(s∗ (·) a) is concave in a. Then this same s∗ (·) also solves the principal’s original problem, maximizing V (s(·) a) subject to the more restrictive unrelaxed constraints (4) and (5). Any solution to the relaxed problem is therefore also an optimal solution to the principal’s original unrelaxed problem, so the first-order approach is justified. 3. PREVIOUS RESULTS As indicated above, the first-order approach looks for conditions to ensure that the agent’s utility, U(s∗ (·) a), is a concave function of her effort a, if s∗ (·) satisfies (7). Rogerson (1985) and Jewitt (1988) obtained conditions which ensure this concavity in the one-signal case. Jewitt also presented some multisignal generalizations of a slightly more restrictive version of his one-signal conditions. Sinclair-Desgagné (1994) obtained a multisignal generalization of the Rogerson conditions. Whether we obtain the Rogerson/Sinclair-Desgagné conditions or the Jewitt conditions depends on whether we represent U(s∗ (·) a) using (3M ) or (3SS ). If we represent U(s∗ (·) a) using (3M ) (i.e., u(s∗ (x))f (x|a) dx − a), we are naturally led to impose conditions on f (x|a), since that is where a appears. This yields the approach of Rogerson (1985) and Sinclair-Desgagné (1994), as is well known. On the other hand, if we represent U(s∗ (·) a) using the state-space formu lation, (3SS ) (i.e., u(s∗ (x(a ϑ)))p(ϑ) dϑ − a), then it becomes obvious that U(s∗ (·) a) is concave in a if x(a ϑ) is concave in a, s∗ (x) is nondecreasing
254
JOHN R. CONLON
concave in x, and u(s) is nondecreasing concave in s. This leads naturally to the main set of Jewitt’s (1988) multisignal conditions. Note that Jewitt himself also used both the state-space and Mirrlees formulations in his derivations. However, both the Jewitt and the Sinclair-Desgagné multisignal generalizations remain more restrictive than necessary. First, Jewitt’s multisignal results assume that the signals are independent from one another, though his proof of his main result does not require this independence. Nevertheless, the Mirrlees formulation, which Jewitt used to present his results, makes it difficult to even state his multisignal results without assuming independence. Section 4 below therefore initially uses the state-space formulation to state and prove our generalization of Jewitt’s main multisignal results. A second major goal of that section is then to show how to restate these results using the Mirrlees formulation. Sinclair-Desgagné’s generalization of the Rogerson conditions is also more restrictive than necessary. As we show next, one of Sinclair-Desgagné’s conditions, his Assumption (A8), requires the marginal distribution of all but one of the signals to be constant in a, and so, independent of the agent’s action. That is, it requires all but one of the signals to be “ancillary” to the agent’s action (Lehmann (1983, pp. 45–46)). This assumption can be avoided if the principal is risk neutral.4 However, a second key assumption, his Assumption (A9), cannot be so easily avoided. Also, Assumption (A9) implies that the marginal distribution of all but one of the signals is an affine (i.e., constant sloped) function of the agent’s choice variable, a (see below). To obtain these results, translate Sinclair-Desgagné’s key assumptions into the continuous signal-space case of the current paper. For any index h = 1 2 n, let x−h = (x1 x2 xh−1 xh+1 xn ) and let ∞ Q(x0h x−h |a) = f (xh x−h |a) dxh x0h
Intuitively Q(x0h x−h |a) is the probability density that the random point x˜ is on the upper ray, starting from (x0h x−h ), pointing in the direction of increasing xh . Thus, Q(x0h x−h |a) is a generalization of the upper cumulative distribution function to the multisignal case, where this upper cumulative focuses on a one-dimensional ray in the n-dimensional space of signals. With this notation, Assumptions (A8) and (A9) are as follows: 4 This can be seen as follows. First, Sinclair-Desgagné’s other key assumption, his Assumption (A9) (which we are about to discuss), implies the CISP condition of Section 5 below. Thus, by Proposition 4 in that section, (A9) plus the monotone likelihood ratio property and u(·) strictly concave, justify the first-order approach. By contrast, if the principal is risk averse, then Proposition 5 in Section 6 is needed. This therefore requires the additional assumption, NISP, introduced below. Sinclair-Desgagné’s Assumption (A8) plays the same role as NISP, but is more restrictive.
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
255
ASSUMPTION (A8)—Generalized Stochastic Dominance: For at least one index h, Q(x0h x−h |a) is nondecreasing in a for all x0h and x−h . ASSUMPTION (A9) —Generalized Concave (Upper) Distribution Function Condition: For at least one index h, Q(x0h x−h |a) is concave in a for all x0h and x−h . PROPOSITION 1: If Assumption (A8) holds, then f−h (x−h |a), the marginal density of x˜ on x−h , is independent of a for almost all x−h . If instead Assumption (A9) holds, then f−h (x−h |a) is an affine function of a for almost all x−h , so f−h (x−h |a) = η1 (x−h ) + η2 (x−h )a, for some functions η1 (·) and η2 (·).5 PROOF: Assume (A8). Then Q(x0h x−h |a) is nondecreasing in a for all x0h and x−h . Thus its limit as x0h → −∞ is also nondecreasing in a for all x−h , so (8)
f−h (x−h |a) = Q(−∞ x−h |a)
is nondecreasing in a for all x−h . However, f−h (x−h |a) dx−h = 1. Thus, f−h (x−h |a) is constant in a for almost all x−h . In the same way, if Assumption (A9) holds, then (8) is concave in a for all x−h . Thus, again using f−h (x−h |a) dx−h = 1, it now follows that f−h (x−h |a) is an affine function of a for almost all x−h . Q.E.D. Note that this proposition does not imply that the x˜ −h signals are useless when (A8) holds, as Sinclair-Desgagné (2009) pointed out. For example, when a principal uses the output of other agents as a benchmark in “yardstick competition” (Holmström (1982), Mookherjee (1984)), this involves signals which are ancillary to the agent’s action, but may nevertheless be quite important. As a second example, suppose the principal uses a scale to weigh output. If the principal does not know how accurate the scale is, she may first put an object of known weight on the scale to calibrate it, before weighing the agent’s output. In this case, the calibrating measurement is ancillary, but is nevertheless clearly relevant to the optimal payment schedule. It may therefore figure prominently in an optimal contract. Multiple calibrating signals are also possible. One may then ask, for example, how the optimal contract changes as the calibration becomes more and more precise. Thus, although Assumption (A8) is restrictive, it does allow the x˜ −h signals to play an important role in helping the principal to interpret x˜ h . Alternatively, 5 Note that these conditions are necessary but not sufficient. In addition, even though f−h (x−h |a) constant in a implies f−h (x−h |a) affine in a, it does not follow that (A8) implies (A9). Thus, suppose x˜ −h is ancillary, while the conditional cumulative distribution function, Fh (xh |x−h a), is decreasing, but not convex, in a. Then x˜ satisfies (A8) but not (A9).
256
JOHN R. CONLON
once the principal sees x˜ h , the x˜ −h signals may then contain a great deal of additional information. Similarly, if Assumption (A9) holds, it may nevertheless be the case that the distribution of x˜ −h given x˜ h depends on a in a highly nonlinear manner. A related way to see this is that the contract ultimately depends on the likelihood ratio and, as Sinclair-Desgagné (2009) showed, this likelihood ratio can depend, in interesting ways, even on ancillary signals. Thus, Assumptions (A8) and (A9) do permit a wide range of interesting contracts. Nevertheless, Proposition 1 shows that these assumptions are fairly restrictive. The following sections therefore derive less restrictive versions of both the Jewitt and the Sinclair-Desgagné results. 4. A NEW MULTISIGNAL EXTENSION OF JEWITT’S CONDITIONS This section uses the state-space formulation to give a simple generalization of Jewitt’s (1988) main set of multisignal conditions (for a brief discussion of his slightly different one-signal conditions, see footnote 7 below). First, it is obvious from (3SS ) that concavity of U(s∗ (·) a) in a follows from two standard conditions plus one highly nonstandard condition: (i) The coordinates of x(a ϑ) should be concave in a. (ii) s∗ (x) should be nondecreasing concave in x. (iii) u(s) should be nondecreasing concave in s. Conditions (i) and (iii) are completely standard: (i) is diminishing marginal productivity—at least when the signal is output—and (iii) is diminishing marginal utility, that is, risk aversion. Condition (ii), however, is problematic, since it depends on the endogenous payment schedule, s∗ (·). The condition therefore depends on the solution. Grossman and Hart’s (1983) Proposition 9, however, gives conditions—in the one-signal case—which make the payment function s∗ (x) nondecreasing concave in x, so (ii) is met. Moreover, these conditions generalize easily to the multisignal case. Using the Mirrlees formulation, the principal’s cost minimizing schedule, s∗ (·), for inducing the agent to choose a∗ , given the participation and relaxed incentive compatibility constraints (5) and (6), solves (7) above. Also, if (6) and (7) hold, then μ ≥ 0, as mentioned in the discussion after (7). In addition, even though the principal’s relaxed constraint, (6), does not include the agent’s complementary slackness condition at a = 0, (7) nevertheless holds at that point as well, with μ = 0. Given this setup, the Grossman–Hart conditions for s∗ (·) to be nondecreasing concave are the following: (iia) fa (x|a)/f (x|a) is nondecreasing concave in x for each a. (iib) 1/u (s) is increasing convex in s so ν(z) = (u )−1 (1/z) increasing concave in z. Condition (iia) strengthens the monotone likelihood ratio property. It says that the information in x˜ grows more and more slowly as x˜ rises, which tends
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
257
to make s∗ (x) concave. Condition (iib) says that the marginal utility of income falls quickly so, for s large, further increases in s have less effect in terms of incentives. This also tends to make s∗ (x) concave. Thus, (iia) and (iib) are clearly relevant for s∗ (·) to be concave. The Grossman–Hart conditions, plus μ ≥ 0, ensure that the function s∗ (x) = ν λ + μ[fa (x|a∗ )/f (x|a∗ )] is nondecreasing concave in x. This, combined with conditions (i) and (iii), ensures that the first-order approach is valid. Thus, let La∗ (x) = λ+μ[fa (x|a∗ )/ f (x|a∗ )] be the likelihood ratio, evaluated at a∗ , and adjusted by the multipliers λ and μ. Then U(s∗ (·) a) is the expectation of the composition of functions (9) u ν La∗ (x(ϑ a)) = u ◦ ν ◦ La∗ ◦ x(a ϑ) Conditions (i), (iia), (iib), and (iii) then clearly imply that (9) is concave in a ˜ for each ϑ. Thus, U(s∗ (·) a) = E[u(ν(La∗ (x(a ϑ))))] − a is concave in a as well. Moreover, thinking of u ◦ ν ◦ La∗ ◦ x as (u ◦ ν) ◦ La∗ ◦ x shows that conditions (iib) and (iii) can be combined and generalized, since what matters is for the composition of ν and u to be concave, not for each to be concave separately. But this composition is simply Jewitt’s function (10)
ω(z) = u(ν(z)) = u((u )−1 (1/z))
so we can replace (iib) and (iii) with Jewitt’s condition that ω(z) is concave in z.6 PROPOSITION 2: Assume (a) the coordinates of x(a ϑ) are concave in a, (b) fa (x|a)/f (x|a) is nondecreasing concave in x, and (c) ω(z) is increasing concave in z. Then any solution to the relaxed problem, maximizing V (s(·) a) subject to (5) and (6), also solves the full problem of maximizing V (s(·) a) subject to (4) and (5). PROOF: Conditions (a), (b), and (c) imply that any solution to the relaxed problem yields an agent objective function U(s∗ (·) a), from (3SS ), which is concave in a. Since U(s∗ (·) a) is concave and Ua (s∗ (·) a∗ ) = 0, a∗ also satisfies 6 This condition makes considerable sense if viewed through the dual representation of the principal–agent problem. That is, if we let ψ(z) = maxs zu(s) − s be a profit function from “selling utility” at shadow price z, then concavity of ω(z) can be expressed as ψ (z) ≤ 0 (Jewitt (2007, footnote 18); see also Fagart and Sinclair-Desgagné (2007), who explored further economic implications of ψ (z) positive or negative). Intuitively, s∗ = ν(z) solves zu (s∗ ) = 1, so it maximizes zu(s) − s. Thus, by Hotelling’s lemma, ψ (z) = u(s∗ ) = u(ν(z)) = ω(z) is the “supply” of utility as a function of its shadow price z, so ω (z) = ψ (z) ≤ 0 implies that this supply curve is concave in the shadow price. Then, if the shadow price z = La∗ (x(a ϑ)) is concave in a for each ϑ, U(s∗ (·) a) is concave in a also.
258
JOHN R. CONLON
the full incentive compatibility constraint (4), so it is a solution to the fully constrained problem. Q.E.D. Proposition 2 is similar to Jewitt’s (1988) Theorem 3, but without his independence assumption. The proof is also quite similar. Jewitt (1988, footnote 6), mentioned that “[i]t is possible to derive some conditions without [independence], but we do not pursue the issue here.” Thus, Jewitt could clearly have obtained something like our Proposition 2. However, it is not immediately obvious how to express the condition, that x(a ϑ) is concave in a, using the Mirrlees notation. To translate this condition into the Mirrlees notation, we must begin with a Mirrlees representation, f (x|a), and find a state-space model that yields this density. We then impose concavity of x(a ϑ) in the state-space model and translate back to the Mirrlees notation. Thus, for the random vector x˜ with ˜ i = 1 2 n, where ϑ ˜ has density f (x|a), we need a representation xi (a ϑ), some joint density p(ϑ) independent of a, such that the random variables ˜ have joint density f (x|a). xi (a ϑ) There are many such representations, and each provides different sufficient conditions. This is somewhat inelegant, since there is no one canonical representation. It may, however, supply some flexibility. If one representation does not work, the researcher can try another. The representation in Proposition 3 builds on what is known as the standard construction (see Müller and Stoyan (2002, p. 88)). Consider the two-signal case for specificity (the approach clearly generalizes), let F 1 (x1 |a) be the marginal cumulative distribution function of x˜ 1 as a function of a, and let F 2 (x2 |x1 a) be the conditional cumulative distribution function of x˜ 2 , given x˜ 1 = x1 , as a function of a. We construct one representation and then indicate how others might be obtained. PROPOSITION 3: Suppose that F 1 and F 2 are continuous, and that the support of the distribution of x˜ is compact and convex. Then there exist continuous functions x1 (a θ1 θ2 ) and x2 (a θ1 θ2 ) for (θ1 θ2 ) ∈ [0 1] × [0 1] that solve the system of equations (11)
F 1 (x1 |a) = θ1
F 2 (x2 |x1 a) = θ2
˜ = (θ˜ 1 θ˜ 2 ) is uniformly distributed on the square [0 1] × [0 1], then Also, if ϑ ˜ = (x1 (a ϑ) ˜ x2 (a ϑ)) ˜ has the same joint distribution as x˜ . x(a ϑ) PROOF: First, F 1 (x1 |a) is continuous and also increasing in x1 on its (convex) support. Thus the function x1 (a θ1 θ2 ) = (F 1 )−1 (θ1 |a), solving the first half of (11), exists and is continuous (and depends only on θ1 and a). Next, plug x1 = (F 1 )−1 (θ1 |a) into the second half of (11). Then one can similarly solve F 2 (x2 |(F 1 )−1 (θ1 |a) a) = θ2 for (F 2 )−1 (θ2 |(F 1 )−1 (θ1 |a) a) = x2 (a θ1 θ2 ).
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
259
Now, Prob(x1 (a θ˜ 1 θ˜ 2 ) ≤ x1 ) = Prob((F 1 )−1 (θ˜ 1 |a) ≤ x1 ) = Prob(θ˜ 1 ≤ F (x1 |a)) = F 1 (x1 |a), since θ˜ 1 is U[0 1]. A similar logic shows that Prob(x2 (a θ˜ 1 θ˜ 2 ) ≤ x2 |x1 (a θ˜ 1 θ˜ 2 ) = x1 ) = F 2 (x2 |x1 a). Q.E.D. 1
Proposition 3 lets us check concavity of x(a ϑ) in the context of a Mirrlees framework. Thus, suppose that (F 1 )−1 (θ1 |a) is concave in a, and that (F 2 )−1 (θ2 |x1 a) is nondecreasing in x1 and concave in x1 and a. Then x(a ϑ) from Proposition 3 is concave in a. This, plus fa (x|a)/f (x|a) nondecreasing concave in x and ω(z) increasing concave in z, is sufficient to support the firstorder approach, by Proposition 2. Other representations are also possible. For example, the roles of x1 and x2 could be switched above. More generally, if one considers a densitypreserving smooth deformation, Φa (ϑ) = ϑ , of the square [0 1] × [0 1], then ˜ gives a new state-space representation of the same Mirrlees x˜ = x(a Φa (ϑ)) model f (x|a). Jewitt (1988), in his Theorem 3, assumes x˜ 1 and x˜ 2 are independent from one another, so the functions in Proposition 3 reduce to xi (a θi ) = (F i )−1 (θi |a). He also assumes that the F i (xi |a) are quasiconvex in xi and a. This is how he imposes concavity of the xi (a θi ) in a. His Theorem 3 then assumes this plus fa (x|a)/f (x|a) nondecreasing concave in x and ω(z) concave in z, as in our Proposition 2. To check concavity of x(a ϑ) more generally, one can implicitly differentiate (11) twice to obtain ∂2 xi /∂a2 , and check concavity through the analogue of bordered Hessians. However, this may be computationally messy. It might therefore be easier to use a shortcut to check concavity of x(a ϑ). For example, one might begin with a state-space representation or solve (11) explicitly (see Conlon (2009)). 5. A MULTISIGNAL EXTENSION OF THE CDFC As suggested in Section 3 above, if we express the agent’s payoff function using the Mirrlees formalism, (3M ) (i.e., U(s∗ (·) a) = u(s∗ (x))f (x|a) dx − a), then we are naturally led to impose concavity through the density function, f (x|a), since that is where the agent’s choice variable a appears. Consider, for ˜ and let example, the one-signal case, so x˜ reduces to the univariate signal x, ¯ Then integrating by parts gives the support of f (x|a) be [x x]. (12)
∗
x¯
∗
U(s (·) a) = u(s (x)) + x
du(s∗ (x)) [1 − F(x|a)] dx − a dx
Thus, suppose u(s∗ (x)) is nondecreasing in x (applying, e.g., the monotone likelihood ratio property), so du(s∗ (x))/dx ≥ 0. Rogerson’s convexity of the distribution function condition (CDFC) states that F(x|a) is convex in a. This
260
JOHN R. CONLON
implies that 1 − F(x|a) is concave in a, so (12) clearly implies that U(s∗ (·) a) is concave in a.7 The natural multisignal generalization of (12) would be to choose one of the n signals, say xh , and integrate (3M ) by parts with respect to xh . This yields essentially the GCDFC of Sinclair-Desgagné (1994). To obtain a more flexible generalization of the CDFC, begin by returning to the one-signal case and note that if u(s∗ (x)) is nondecreasing in x, then it can clearly be approximated by a sum of nondecreasing step functions. Specifically, it can be approximated as a sum, u(s∗ (x)) ≈ α0 + j≥1 αj h(x; bj ), where the αj ’s are positive for j ≥ 1 and h(x; b) is the step function which is 0 for x < b and 1 for x ≥ b. The integral u(s∗ (x))f (x|a) dx in U(s∗ (·) a) can then be approximated as u(s∗ (x))f (x|a) dx ≈ α0 + αj h(x; bj )f (x|a) dx j≥1
= α0 +
αj [1 − F(bj |a)]
j≥1
By the CDFC this sum is concave in a, so U(s∗ (·) a) is concave in a, as desired. To obtain a new multisignal generalization of the CDFC approach, we must therefore simply construct a family of multi-variable generalizations of the functions h(x; b) above, capable of generating all multi-variable nondecreasing functions. Thus, consider the n signal case and say that the set E ⊆ Rn is an increasing set (Nachbin (1965), Milgrom and Weber (1982)) if x ∈ E and y ≥ x (i.e., yi ≥ xi , i = 1 2 n), implies y ∈ E. Note that if a function is nondecreasing, then its upper level sets are increasing sets. Given an increasing set E, let h(x; E) be the characteristic function of E, so
0 if x ∈ / E, h(x; E) = 1 if x ∈ E. 7
Note that a new derivation of Jewitt’s (1988) full one-signal conditions can be obtained similarly, by performing a second integration by parts on (12). Thus, if we let x ˆ F(x|a) = x [1 − F(z|a)] dz, then a second integration by parts yields U(s∗ (·) a) = u(s∗ (x)) + x¯ ˆ ˆ x|a) ¯ − x (d 2 u(s∗ (x))/dx2 )F(x|a) (du(s∗ (x))/dx)|x¯ F( dx − a. Jewitt assumes that fa (x|a)/f (x|a) is nondecreasing concave in x and that ω(z) is increasing concave in z, which imply du(s∗ (x))/dx ≥ 0 and d 2 u(s∗ (x))/dx2 ≤ 0, as in Proposition 2 above. An additional condition x ˆ of his, that x F(z|a) dz is convex in a, yields F(x|a) concave in a. This implies that U(s∗ (·) a) above is concave in a, so the first-order approach applies. In the current one-signal case, Jewitt’s x condition, x F(z|a) dz convex in a, generalizes Proposition 2’s concavity in a of x(a ϑ). Howx ever, there is no simple multisignal version of x F(z|a) dz convex in a (see Conlon (2009)).
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
261
Then every nondecreasing function can be approximated using positive linear combinations of these h(x; E) functions, yielding a multisignal extension of the above approach. This suggests the following generalization of the CDFC: DEFINITION 1: The distribution f (x|a) satisfies the concave increasing-set probability (CISP) condition if, for every increasing set E, the probability Prob(˜x ∈ E|a) is concave in a. Now the CISP condition, plus the monotone likelihood ratio (MLR) property (fa (x|a)/f (x|a) nondecreasing in x) and strict concavity of u(·), can be used to justify the first-order approach. To see this, begin with a lemma. LEMMA 1: The density f (·|a) satisfies CISP if and only if the transformation (13) ϕT (a) = ϕ(x)f (x|a) dx = E[ϕ(˜x)|a] satisfies (14)
for any nondecreasing function ϕ(x) ϕT (a) is concave in a
The proof builds on a standard approximation (see, e.g., Müller and Stoyan (2002, Theorem 3.3.4)) and is provided in the Appendix. With this lemma in hand, it is easy to give conditions that justify the firstorder approach in the multisignal case. This is done in the following proposition. PROPOSITION 4: Suppose that u(·) is increasing and strictly concave, and that the MLR and CISP conditions hold. Then any solution to the relaxed principal– agent problem also solves the fully constrained problem. PROOF: First, the solution to the relaxed problem has s∗ (x) solving (7). Also, since u(·) is strictly concave, 1/u (s) is increasing in s. This, plus μ ≥ 0 (by the Jewitt (1988) argument) and fa (x|a)/f (x|a) nondecreasing in x, implies that the payment schedule s∗ (x) is also nondecreasing in x. Thus u(s∗ (x)) is nondecreasing in x. Also, letting ϕ(x) ≡ u(s∗ (x)) gives U(s∗ (·) a) = ϕT (a) − a, by (3M ) and (13). Using this and CISP in Lemma 1 then implies that U(s∗ (·) a) is concave in a, so any solution to the relaxed problem is also a solution to the fully constrained problem. Q.E.D. 6. A RISK AVERSE PRINCIPAL Up to now, the principal has been assumed to be risk neutral. This was primarily because the Jewitt conditions go through more smoothly in this case.
262
JOHN R. CONLON
However, suppose now that the principal is risk averse, with von Neumann– Morgenstern utility function v(·). Then the principal’s first-order condition for s∗ (·), (7), changes to (15)
v (π(x) − s∗ (x)) fa (x|a∗ ) = λ + μ u (s∗ (x)) f (x|a∗ )
Thus, even if we assume that the likelihood ratio fa (x|a∗ )/f (x|a∗ ) is concave in x, it is difficult to impose the nonstandard condition s∗ (x) concave in x, because it is no longer enough to assume that ν(z) = (u )−1 (1/z) is concave. Therefore Jewitt’s condition, concavity of ω(z), is also inadequate. On the other hand, the CDFC approach and its generalization in Section 5 are easily adapted to a risk averse principal. This requires two new assumptions—that the function π(x) is nondecreasing in x and that f (x|a) satisfies the NISP condition, defined as follows: DEFINITION 2: The distribution f (x|a) satisfies the nondecreasing increasingset probability (NISP) condition if, for every increasing set E, the probability Prob(˜x ∈ E|a) is nondecreasing in a. Now, Jewitt’s (1988) proof of μ ≥ 0 does not work if the principal is risk averse. One must therefore replace (6) with Rogerson’s (1985) doubly relaxed constraint (16)
Ua (s∗ (·) a∗ ) ≥ 0
The principal’s “doubly relaxed” problem is then to maximize V (s(·) a) given the participation constraint (5) and the doubly relaxed incentive compatibility constraint (16). This leads to the following proposition: PROPOSITION 5: Assume π(·) is nondecreasing, u(·) and v(·) are increasing and strictly concave, and the MLR, CISP, and NISP conditions hold. Then any solution to the principal’s doubly relaxed problem is also a solution to her unrelaxed problem. PROOF: Let s∗ (·) and a∗ be solutions to the doubly relaxed problem. As usual, (15) still holds. Also, since (16) is an inequality constraint, μ ≥ 0. This, plus u(·) and v(·) strictly concave, MLR, and π(x) nondecreasing, implies s∗ (x) nondecreasing. If Ua (s∗ (·) a∗ ) > 0, then a∗ is not incentive compatible. Thus, we must show that this contradicts the assumption that s∗ (·) and a∗ solve the doubly relaxed problem. Suppose therefore that Ua (s∗ (·) a∗ ) > 0. Then μ = 0, so (15) shows that v (π(x) − s∗ (x))/u (s∗ (x)) is constant. Thus s∗ (x) nondecreasing in x implies that π(x) − s∗ (x) is nondecreasing in x, so ϕ(x) ≡ v(π(x) − s∗ (x)) is
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
263
also nondecreasing in x. Therefore, by an argument like that in the proof ∗ = of Lemma 1 above, ∗NISP implies that the principal’s payoff, V (s (·) a) T ϕ (a) = v(π(x) − s (x))f (x|a) dx, is nondecreasing in a. This, plus Ua (s∗ (·) a∗ ) > 0, implies that s∗ (·) and a∗ are not solutions to the doubly relaxed problem, since increasing a and shifting the s∗ (·) schedule down slightly can increase the principal’s payoff, while still satisfying the constraints. Thus Ua (s∗ (·) a∗ ) > 0 leads to a contradiction, so Ua (s∗ (·) a∗ ) = 0. Next, since s∗ (·) is nondecreasing, u(s∗ (·)) is nondecreasing, so CISP, plus Lemma 1 above, implies that the agent’s objective function is concave. Thus s∗ (·) and a∗ solve the unrelaxed problem and the first-order approach is valid. Q.E.D. As Ian Jewitt has pointed out to me, NISP can be checked by adapting a version of Milgrom and Weber’s (1982) notion of affiliated random variables (equivalently MPT2 ; see Müller and Stoyan (2002, pp. 126–127)). Thus, assume f (x|a) is smooth in (x a), and that a ∈ [0 1]. Though a is not random, imagine it is a random variable, with marginal distribution uniform on [0 1]. Then ˜ with joint density f # (x a) = f (x|a), is affiliated (or the random vector (˜x a), MPT2 ) if the cross partials ∂2 ln f # /∂xi ∂a and ∂2 ln f # /∂xi ∂xj are all nonnegative. NISP then follows immediately from Milgrom and Weber’s Theorem 5. Note, incidentally, that ∂2 ln f # /∂xi ∂a ≥ 0 is the MLR property. Unfortunately, it is not clear how this approach could be used to check CISP. The next two sections therefore use a different approach to derive conditions that ensure CISP. While affiliation is sufficient to ensure NISP, the approach in the following sections also gives alternative conditions that ensure NISP, with little extra effort. Also, these alternative conditions are less restrictive than affiliation. 7. REDUCING NISP AND CISP TO DENSITY FLOWS The previous sections developed the CISP condition, which generalizes the CDFC. CISP is also less restrictive than the GCDFC. As a referee has pointed out, GCDFC implies that there is an index h such that Prob(˜x ∈ E|a) is concave in a for all sets E that satisfy the condition (xh x−h ) ∈ E and xh ≥ xh imply (xh x−h ) ∈ E. Since this class of sets is larger than the class of increasing sets, GCDFC is more restrictive than CISP. However, it remains to see more clearly just how flexible CISP is. To construct distributions that satisfy CISP, one can think of f (x|a) in terms of probability density flows as a increases. Analyzing these flows, in turn, requires tools from vector calculus (Taylor and Mann (1983, Chap. 15); Lang (1987, Chaps. 10 and 12)). For expositional purposes, focus on the two-signal case, though the approach generalizes easily to n signals. ˜ y) ˜ on the square S = [0 1] × Thus, consider the random vector x˜ = (x [0 1], with density f (x y|a) = f (x|a). Also, consider a vector field v(x a) =
264
JOHN R. CONLON
(u(x a) v(x a)) on S. If the mass of f (x|a) moves along the vector field v(x a) as a grows, then (17)
fa (x y|a) = −[ux (x y a) + vy (x y a)] = − div v(x a)
This is the usual “divergence” formula for the flow of a compressible fluid, where v(x a) is the “flux” (density × velocity of flow). Thus if, e.g., ux (x0 y0 a) > 0, then u(x0 − x y0 a) < u(x0 + x y0 a) so, in the x direction, less mass is moving toward (x0 y0 ) than away from it. That is, the mass does diverge. This should make f (x0 y0 |a) fall in a, as in (17). A similar argument applies for the y direction. This therefore does represent f (x y|a) in terms of a density flow, as expected. Next let A be a simply connected two-dimensional subset of S with boundary ∂A, and let ∂A be traced out counterclockwise by the continuous piecewise differentiable simple closed curve x(t) as t rises from 0 to 1. Let n(t) = (y (t) −x (t)) be an outward pointing “normal” (perpendicular) vector to ∂A at x(t), rotated clockwise 90◦ from the tangent vector x (t) = (x (t) y (t)) (see Figure 1). Then d (18) f (x|a) dx = − [ux (x y a) + vy (x y a)] dx dy da A A 1 = − [u(x(t) a)y (t) − v(x(t) a)x (t)] dt
0 1
=−
v(x(t) a) · n(t) dt 0
denotes a double integral over the area A. In addition, the first step Here A of (18) follows from (17) and the second step follows from the two-dimensional “divergence theorem” (Taylor and Mann (1983, p. 488), Lang (1987, pp. 285, 345)). Finally, v(x(t) a) · n(t) = u(x(t) a)y (t) − v(x(t) a)x (t) is the “dot product” of the vectors v(x(t) a) and n(t). This dot product is positive if v(x(t) a) points out from the boundary of A and is negative if it points into A. For example, at the point x(t) in Figure 1, v(x(t) a) is pointing into A, so v(x(t) a) · n(t) < 0, which makes a positive contribution to (18), increasing the mass inside A, as it should. Thus the last integral in (18) is the flow of the vector field v(x(t) a) across the boundary ∂A, so (18) gives the density flow of v(x a) into A. The equation thus makes intuitive sense. To illustrate (18) we show that f (x|a) from (17) is a probability density function on the square S for all a if it is a probability density function on S for a = 0, and if in addition f (x|a) stays nonnegative and the boundary conditions (19)
u(0 y a) = u(1 y a) = v(x 0 a) = v(x 1 a) = 0
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
265
FIGURE 1.—The vector flow into region A. The normal vector n(t), at the point x(t), is the vector pointing out from A, perpendicular to the tangent vector x (t). At the point shown, there is an obtuse angle between n(t) and the density flux v(x(t) a). The dot product v(x(t) a) · n(t) is therefore negative, so the contribution to the aggregate flow into A, given in Equation (18), is positive, as it should be.
hold. The conditions in (19) ensure that no mass is flowing across the boundaries of the square S. To see that this works, translate into ordinary multiple integrals: (20)
d da
1
1
1
1
f (x y|a) dx dy = − 0
0
[ux (x y a) + vy (x y a)] dx dy 0
0 1
=−
[u(1 y a) − u(0 y a)] dy 0
1
[v(x 1 a) − v(x 0 a)] dx = 0
− 0
Here the first step uses (17), the second step uses the fundamental theorem of calculus, and the last step uses (19). Thus f (x|a) dx is constant in a, so if S f (x|0) dx = 1, then f (x|a) dx = 1 for all a and, assuming f (x|a) ≥ 0, S S f (x|a) really is a probability density function on S. The third expression in (20), incidentally, is the analogue for the square, S, of the last two expressions
266
JOHN R. CONLON
in (18). Thus (20) gives a proof of a special case of (18). Note that this calculation suggests that the divergence theorem is a generalization of the fundamental theorem of calculus. Integrating (17) finally gives a (21) [ux (x y α) + vy (x y α)] dα f (x y|a) = f (x y|0) − 0
With this machinery, it is easy to construct densities which satisfy NISP and CISP. We show how to do this in the following lemma. LEMMA 2—The Density Flow Lemma: Let the vector field v(x a) satisfy (19). Then if its coordinates are nonnegative, the density function f (x y|a) in (21) satisfies NISP. Similarly, if its coordinates are nonincreasing in a, then f (x y|a) satisfies CISP. PROOF: Let E be an increasing set. Using (18) gives (22)
d d Prob(˜x ∈ E|a) = Prob(˜x ∈ E ∩ S|a) = − da da
1
v(x(t) a) · n(t) dt 0
with x(t) tracing ∂(E ∩ S). Since E is an increasing set, the boundary ∂(E ∩ S) consists of a downward sloping curve on its southwest, whose normal vector n(t) has nonpositive coordinates, and other lines from the boundary of S. Now, if the vector field v(x a) has nonnegative coordinates, then it flows into E ∩ S through its southwest boundary (so v(x(t) a) · n(t) ≤ 0 there), but does not flow out through the other parts of its boundary (since v(x(t) a) · n(t) = 0 there, by (19)). Thus the derivative in (22) is nonnegative, so Prob(˜x ∈ E|a) is nondecreasing in a. Similarly, if the coordinates of v(x(t) a) are nonincreasing in a, then the rate of flow in (22) is nonincreasing in a. Thus Prob(˜x ∈ E|a) is Q.E.D. concave in a and f (x|a) satisfies CISP. It is now easy to construct densities f (x|a) which satisfy NISP, CISP, and also the MLR property. To do this choose v(x a) as in Lemma 2 so that we also have fa (x y|a)/f (x y|a) = −[ux (x y a) + vy (x y a)]/f (x y|a) nondecreasing in x and y. As an example, let f (x y|0) be uniform on S, let u(x y a) = (x − x2 )(1 − a)ε, and let v(x y a) = (y − y 2 )(1 − a)ε. Then (21) gives f (x y|a) = 1 + (x + y − 1)(2a − a2 )ε
for (x y) ∈ S
If ε < 1, then this is a strictly positive probability density function on S for all a ∈ [0 1], and it satisfies CISP, NISP, and the MLR property. Also, this
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
267
density satisfies neither Jewitt’s (1988) independence assumption nor SinclairDesgagné’s (1994) GCDFC. Clearly it would be easy to generate an enormous range of examples like this. Of course, if one uses affiliation to get NISP, as in the end of Section 6, then MLR follows automatically. On the other hand, to use this approach to confirm NISP and CISP for a prespecified f (x|a), one must construct an appropriate probability density flow v(x a) corresponding to f (x|a). We show how to do this in the next section. Finally, the conditions in Lemma 2, for some given vector flow v(x a), are sufficient for NISP and CISP, but not necessary. However, it seems reasonable to conjecture that if f (x y|a) satisfies NISP and/or CISP, then there exists a vector flow v(x a) as in Lemma 2. I have not been able to prove this, however. 8. LOCAL CONDITIONS SUFFICIENT FOR NISP AND CISP The previous section reduced NISP and CISP to the existence of certain probability density flows. This allowed us to construct densities that satisfy NISP and CISP. However, to check NISP and CISP for a given density, f (x y|a), we must construct vector flows that satisfy equations (17) and (19), and then check the conditions in Lemma 2. This is relatively straightforward, and reduces the hard-to-check global NISP and CISP conditions to convenient local conditions. Let g(x|a) and G(x|a) be the marginal density and cumulative distribu˜ and let h(y|x a) and H(y|x a) be the conditional density tion function of x, and cumulative distribution function of y˜ given x˜ = x. Thus the joint density f (x y|a) = g(x|a)h(y|x a). With this notation, the following lemma allows us to check NISP and CISP. LEMMA 3: Suppose there are functions φ(x y a) and ψ(x y a) with (23)
φx (x y a) = −ψy (x y a)
and (24)
φ(0 y a) = φ(1 y a) = ψ(x 0 a) = ψ(x 1 a) = 0
and such that the vector field (25) v(x a) = − Ga (x|a)h(y|x a) + φ(x y a) g(x|a)Ha (y|x a) − Ga (x|a)Hx (y|x a) + ψ(x y a)
has nonnegative coordinates for all (x y) ∈ [0 1] × [0 1]. Then the NISP condition holds. Similarly, if there are functions φ(x y a) and ψ(x y a) that satisfy (23) and (24) and such that the coordinates of (25) are nonincreasing in a, then CISP holds.
268
JOHN R. CONLON
PROOF: Using the product rule to calculate fa (x y|a) shows that v(x a) satisfies (17). The coordinates also satisfy (19) since Ga (0|a) = Ga (1|a) = Ha (0|x a) = Ha (1|x a) = Hx (0|x a) = Hx (1|x a) = 0. Now apply Lemma 2. Q.E.D. The density flow in Lemma 3 builds on the basic density flow given by (26) vb (x a) = −Ga (x|a)h(y|x a) −g(x|a)Ha (y|x a) + Ga (x|a)Hx (y|x a) This density flux may seem somewhat ad hoc. To make sense of it, note that this flux can be broken up into three separate parts. First, the flux −Ga (x|a)h(y|x a) in the first coordinate ensures that the marginal density g(x|a) evolves appropriately as a rises. Dividing this density flux by the density, g(x|a)h(y|x a), gives the flow velocity in the x direction, −Ga (x|a)/g(x|a). The specification of the basic density flow in (26) is chosen to make this velocity independent of y. Of course, other specifications are possible and may yield more useful conditions in some applications. The second coordinate involves two flows. First, the −g(x|a)Ha (y|x a) term represents a flux in the y direction with corresponding velocity −Ha (y|x a)/ h(y|x a). This causes the conditional distribution, h(y|x a), to evolve appropriately as a rises. If the two signals x˜ and y˜ are independent, then these are the only two effects. However, if x˜ and y˜ are not independent, then there is a third flow, with flux in the y direction given by Ga (x|a)Hx (y|x a). To understand this flow, suppose Ga (x|a) < 0, so the first flow above is in the direction of increasing x. This first flow will then influence the distribution of y˜ given x˜ = x by mixing the distribution of y˜ given x˜ = x − dx into its distribution given x˜ = x, where dx > 0. The role of the third flow is to counteract this mixing effect. Thus, if, for example, x˜ and y˜ are positively correlated, the mixing effect tends to shift the distribution of y˜ at x˜ = x down, since the distribution of y˜ at x˜ = x − dx is lower than its distribution at x˜ = x. However, in this case, Hx (y|x a) is negative. This, together with Ga (x|a) negative, means that the flux, Ga (x|a)Hx (y|x a), in the y direction is positive. This shifts the distribution of y˜ given x˜ = x back up, as required. The following propositions give some illustrative applications of Lemma 3. PROPOSITION 6: Suppose G(x|a) is nonincreasing in a, and H(y|x a) is nonincreasing in x and a. Then NISP holds. PROOF: Let φ ≡ ψ ≡ 0 in (25). Then since G(x|a) is nonincreasing in a, the first coordinate of v(x a) is nonnegative. Similarly, the second coordinate is nonnegative since g(x|a)Ha (y|x a) is nonpositive and Ga (x|a)Hx (y|x a) is nonnegative. Q.E.D.
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
269
SECOND PROOF: This result can also be proven without using the vector calculus machinery from the previous section. To do this, let E be an increasing set and let y = b(x) be the curve that describes the southwest boundary of E ∩ S. Let P = Prob(˜x ∈ E) be the probability that the signal x˜ is in E. Then 1 1 g(x|a)h(y|x a) dy dx P=
x=0
y=b(x)
1
g(x|a)[1 − H(b(x)|x a)] dx
= 0
Thus, assuming one can differentiate under the integral sign, 1 dP/da = ga (x|a)[1 − H(b(x)|x a)] dx 0
1
g(x|a)Ha (b(x)|x a) dx = I1 − I2
− 0
where I1 and I2 are the obvious integrals. Here I2 is nonpositive since Ha (b(x)|x a) is, so the second term, −I2 , is nonnegative. For the first term, integrate I1 by parts, yielding 1 1 Ga (x|a)Hx (b(x)|x a) dx I1 = Ga (x|a)[1 − H(b(x)|x a)]|x=0 + +
0 1
Ga (x|a)h(b(x)|x a)b (x) dx = 0 + I1a + I1b
0
where the first term is zero since Ga (0|a) = Ga (1|a) = 0. Here I1a is nonnegative since Ga (x|a) and Hx (b(x)|x a) are both nonpositive, and I1b is nonnegQ.E.D. ative since Ga (x|a) and b (x) are both nonpositive. Note that H(y|x a) nonincreasing in x says that when x˜ increases, the conditional distribution of y˜ is nondecreasing in the sense of first-order stochastic dominance. This means that x˜ and y˜ are positively related, but is weaker than affiliation. Similarly, G(x|a) and H(y|x a) nonincreasing in a imply that the ˜ and the conditional distribution of y, ˜ respectively, marginal distribution of x, are nondecreasing in a in the sense of first-order stochastic dominance. These are weaker than the corresponding MLR (and so, affiliation) properties.8 Thus, 8 ˜ To see these results, note that Milgrom and Weber’s (1982) Theorem 5 implies that for x, ˜ and a affiliated, the expectation E[Φ(x)|a] ˜ ˜ x˜ = x a] is y, is nondecreasing in a, and E[Ψ (y)| nondecreasing in x and a, for any nondecreasing functions Φ(·) and Ψ (·). Taking Φ(·) to be χ(x∞) (·), the characteristic function of (x ∞), and taking Ψ (·) = χ(y∞) (·), shows that affiliation implies the conditions in Proposition 6.
270
JOHN R. CONLON
the conditions in Proposition 6 are more general than the affiliation conditions mentioned at the end of Section 6 above.9 PROPOSITION 7: Suppose Ga (x|a) and Ha (y|x a) are negative (this follows ˜ Assume from strict versions of the corresponding MLR properties for x˜ and y). also that g(x|a) and h(y|x a) are strictly positive on [0 1]. Finally assume Hx (y|x a) < 0 (so x˜ and y˜ are positively related). Then the conditions (27)
ha (y|x a)/ h(y|x a) ≤ −Gaa (x|a)/Ga (x|a)
(28)
ga (x|a)/g(x|a) ≤ −Haa (y|x a)/Ha (y|x a)
(29)
Hax (y|x a)/Hx (y|x a) ≤ −Gaa (x|a)/Ga (x|a)
are sufficient to ensure CISP. PROOF: Let φ ≡ ψ ≡ 0 in (25) and consider the first coordinate of v(x a). This is nonincreasing in a if (30)
∂[Ga (x|a)h(y|x a)]/∂a = Ga (x|a)ha (y|x a) + Gaa (x|a)h(y|x a) ≥ 0
Using Ga (x|a) < 0, h(y|x a) > 0, and (27) shows (30) holds. Similarly, (28) and (29) show that the second coordinate of (25) is nonincreasing in a. Q.E.D. Just as was the case for Proposition 6, there is a second proof of Proposition 7 which avoids the vector calculus machinery of Section 7 (see Conlon (2009)). However, this alterative proof is as opaque as the alternative proof of Proposition 6. Note that, for all x, the left hand side of (27) is nonnegative for some values of y. Thus, since Ga (x|a) < 0, (27) implies Gaa (x|a) ≥ 0 for all x, and so, ˜ Similarly (28) implies the CDFC for y, ˜ conditional implies the CDFC for x. on x˜ = x. Of course, (27) and (28) put additional restrictions on the x˜ and y˜ distributions, beyond the CDFC. This is not surprising since, as Jewitt (1988, ˜ sepp. 1184) pointed out, the CDFC for each of the random variables x˜ and y, arately, is not enough to justify the first-order approach, even if the signals are 9 The need for H(y|x a) to be nonincreasing in x also shows why the MLR property was not, by itself, a sufficiently strong monotonicity condition for Sinclair-Desgagné (1994) to use in justifying the first-order approach. He therefore required a stronger assumption, such as (A8), discussed in Section 3 above. Indeed, recognizing that MLR was insufficient by itself for the multisignal case is one of the important insights in Sinclair-Desgagné (1994). For example, I doubt that I could have obtained the results for a risk averse principal, in Section 6 above, without the benefit of this insight. Proposition 6, however, suggests that one can replace (A8) by MLR and H(y|x a) nonincreasing in x, and so avoid the ancillarity result for Sinclair-Desgagné’s approach in Proposition 1 above, even if the principal is risk averse.
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
271
independent. This is because the product of two concave functions is often not concave (see the discussion of Equation (31) below). Condition (29), finally, says that an increase in a should not make Hx (y|x a) fall too quickly, that is, should not increase the correlation between x˜ and y˜ too much. Under independence, Hax (y|x a) = 0, so (29) follows from Gaa (x|a) ≥ 0, and so, from (27). To illustrate these conditions, assume x˜ and y˜ are independent for specificity and let them belong to the first class of distributions in LiCalzi and Spaeter (2003), so G(x|a) = x + β1 (x)γ1 (a)
and H(y|a) = y + β2 (y)γ2 (a)
for x y ∈ [0 1], where the βi (·) are nonnegative and concave on [0 1], with βi (0) = βi (1) = 0 and |βi (·)| ≤ 1, and the γi (·) are decreasing and convex with |γi (·)| < 1. Conditions (27) and (28) then impose the additional restrictions γ (a) β2 (y)γ2 (a) ≤ − 1 1 + β2 (y)γ2 (a) γ1 (a)
and
β1 (x)γ1 (a) γ (a) ≤ − 2 1 + β1 (x)γ1 (a) γ2 (a)
respectively. These are stronger than CDFC in x˜ and y˜ separately, as expected. The conditions in Proposition 7 are meant to be illustrative and are by no means canonical. One may nevertheless ask how close these conditions are to best possible and whether uniformly best, canonical results are possible. First, the conditions are, in a sense, a point on the Pareto frontier of best possible conditions. To see this, suppose there is an increasing set, E0 , such that two of the conditions hold as equalities in E0 ∩ S, while the third is violated throughout this set. Then CISP is violated for E0 . This can be seen since the vector vb (x a) in (26) will have coordinates which are nondecreasing in a, with at least one coordinate increasing in a. Thus, by the logic of the proof of Lemma 2, Prob(˜x ∈ E0 ) is strictly convex in a, violating CISP. In other words, it is not possible to weaken any of the conditions in Proposition 7 without strengthening one of the other conditions. Second, these conditions can be compared to various necessary conditions for CISP. For example, let x˜ and y˜ be independent for specificity, and consider the increasing set, Ex = {ˆx : xˆ ≥ x}, for a fixed x = (x y). Thus, Ex ∩ S is a rectangle with lower left hand corner x. CISP then requires not just that (1 − G(x|a)) and (1 − H(y|a)) be concave in a separately, but that the product, Prob(˜x ∈ Ex ) = (1 − G(x|a))(1 − H(y|a)), be concave in a as well. This in turn requires that (31)
−d 2 Prob(˜x ∈ Ex )/da2 = (1 − H(y|a))Gaa (x|a) − Ga (x|a)Ha (y|a) + (1 − G(x|a))Haa (y|a) − Ga (x|a)Ha (y|a) = T1 + T2 ≥ 0
272
JOHN R. CONLON
where T1 and T2 represent the first and second bracketed terms, respectively. Now, in the independence case, we can ignore (29) in Proposition 7. Also (allowing for independence), sufficient conditions (27) and (28) were derived from essentially (32)
Gaa (x|a)h(y|a) + Ga (x|a)ha (y|a) ≥ 0
and
Haa (y|a)g(x|a) + Ha (y|a)ga (x|a) ≥ 0 respectively. Integrating the left-hand side of the first inequality in (32) gives 1 ˆ + Ga (x|a)ha (y|a)] ˆ [Gaa (x|a)h(y|a) d yˆ y
= (1 − H(y|a))Gaa (x|a) − Ga (x|a)Ha (y|a) = T1 and similarly for T2 . Thus, for T1 ≥ 0 and T2 ≥ 0, the inequalities in (32) must hold on average. The sufficient conditions in (32) are then more restrictive than the necessary condition in (31) in exactly two ways: (a) they impose their constraints pointwise, rather than on average, and (b) they require T1 ≥ 0 and T2 ≥ 0 separately, rather than just T1 + T2 ≥ 0. Conditions (27) and (28) are then essentially convenient rearrangements of those in (32). Of course, it is in the nature of easy-to-check local conditions that they should be imposed pointwise, so drawback (a) is inevitable in this type of result. On the other hand, drawback (b) suggests that (27) can be weakened if we are willing to strengthen (28) and visa versa. This is the role of the φ and ψ functions in (25), and specific applications will presumably require clever choices of these functions. For example, the conditions in Proposition 7 do not imply, and are not implied by, the GCDFC, but a careful choice of the φ and ψ functions in (25) shows that the GCDFC implies CISP. The GCDFC requires either that g(x|a)[1 − H(y|x a)] be concave in a for all (x y) ∈ [0 1] × [0 1] or that h(y|a)[1 − G(x|y a)] be concave in a for all (x y) ∈ [0 1] × [0 1] (with h(y|a) and G(x|y a) the obvious marginal density and conditional cumulative distribution functions). While these conditions cannot be obtained from Proposition 7, they can be derived from Lemma 3, as shown in the following proposition. PROPOSITION 8: The GCDFC implies the CISP condition. PROOF: For specificity, assume g(x|a)[1 − H(y|x a)] concave in a for all (x y) ∈ [0 1]×[0 1]. Apply Lemma 3 with φ(x y a) = [1−h(y|x a)]Ga (x|a) and ψ(x y a) = Ga (x|a)Hx (y|x a) + [H(y|x a) − y]ga (x|a). These functions satisfy (23) and (24). In addition, with these functions, the vector field in (25) becomes (33) v(x a) = − Ga (x|a) g(x|a)Ha (y|x a) + [H(y|x a) − y]ga (x|a)
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
273
Now, g(x|a)[1 − H(y|x a)] is concave in a for all (x y) ∈ [0 1] × [0 1] by GCDFC. Plugging y = 0 into this shows that g(x|a) is concave in a, 1 so gaa (x|a) ≤ 0 for all x ∈ [0 1]. Integrating gives 0 ≥ 0 gaa (x|a) dx = 1 (d 2 /da2 ) 0 g(x|a) dx = d 2 1/da2 = 0, so gaa (x|a) = 0 almost everywhere (this also follows from Proposition 1 above). Thus, since Gaa (0|a) = 0, it follows that Gaa (x|a) = 0 for all x, so the first coordinate of v(x a) in (33) is constant in a, and so, trivially nonincreasing. Next, concavity of g(x|a)[1 − H(y|x a)] in a plus gaa (x|a) = 0 yields (34)
g(x|a)Haa (y|x a) + 2ga (x|a)Ha (y|x a) ≥ 0
Also, the second coordinate in (33) is nonincreasing in a if (35)
g(x|a)Haa (y|x a) + 2ga (x|a)Ha (y|x a) + [H(y|x a) − y]gaa (x|a) ≥ 0
Now GCDFC implies both gaa (x|a) = 0 and (34), and these in turn imply (35). Both coordinates of (33) are therefore nonincreasing, so Lemma 3 yields CISP. Q.E.D. Of course, it may be more natural to derive the GCDFC directly, as in Sinclair-Desgagné (1994). However, Proposition 8 illustrates the flexibility of Lemma 3. This flexibility may be especially useful in applications with asymmetries between signals, such as contingent monitoring problems (e.g., Fagart and Sinclair-Desgagné (2007)). On the other hand, I do not believe that it is possible to find “canonical” local conditions that imply NISP or CISP. Since NISP and CISP are global conditions, it seems doubtful that one can find local conditions which would be both necessary and sufficient for them. For any local conditions, there should exist some function, F(x y|a), which violates these conditions only in some small region, while NISP and/or CISP still hold. Of course, this does not deny that there may well be conditions which, while imperfect, are nevertheless more general than those in Propositions 6 and 7. For an attempt along these lines which, however, I believe is of limited value, see Conlon (2009). The above can all be extended to the n-signal case. For example, consider the three-signal case with density f (x y z|a) = g(x|a) h(y|x a) k(z|x y a). Then, leaving off the arguments for brevity, (25) should be replaced by the vector field v = −(Ga hk + φ1 gHa k − Ga Hx k + φ2 ghKa − Ga hKx − gHa Ky + Ga Hx Ky + φ3 ) where G, H, and K are the obvious cumulative distribution functions, and the functions φi (x y z a) satisfy φ1x + φ2y + φ3z = 0 and the analogue of (24).
274
JOHN R. CONLON
One can then obtain results like Propositions 6 and 7 above. The approach is therefore quite flexible.10 9. OTHER EXTENSIONS The above method allows other extensions. The basic approach is to find conditions which imply that the function u(s∗ (x)) is in some restricted class, and then find conditions on the density f (x|a) such that the mapping (13), from ϕ(x) to ϕT (a), maps this restricted class into concave functions (see also Jewitt (1988, pp. 1189–1190)). ˜ Suppose For example, consider a two-signal case, with signals x˜ and y. the likelihood ratio, fa (x y|a)/f (x y|a), is nondecreasing and submodular in (x y) (so its cross partial is nonpositive), and that ω(z) in (10) above is nondecreasing concave. Then it can be shown that ϕ(x) ≡ u(s∗ (x)) is also nondecreasing and submodular. Next suppose Prob(x˜ ≥ x0 or y˜ ≥ y0 |a) is concave in a for all (x0 y0 ). Then it can be shown that the agent’s payoff, U(s∗ (·) a) = ϕT (a) − a, is concave and the first-order approach is valid (this result was suggested to me by Ian Jewitt; for details see Conlon (2009)). This generalizes Jewitt’s other multisignal result, his Theorem 2. However, it is not clear how to extend this beyond the two-signal case. Next (as Ian Jewitt has also pointed out to me), if one is willing to assume that the n − 1 signals in x˜ −h are ancillary (see Proposition 1 above), then one can extend the Sinclair-Desgagné (1994) results by imposing conditions on the remaining dimension h. For example, using the Rogerson (1985) approach leads to the original Sinclair-Desgagné conditions. The Jewitt (1988) approach could also be used here. It would also be interesting to extend the approach in Brown et al. (1986), building on increasing marginal cost of effort and nonseparabilities between effort and income. Finally, Ian Jewitt has also suggested a third result: Assume f (x|a) is affiliated (i.e., MPT2 ) in x and let faa (x|a)/f (x|a) be nonincreasing in x. Then, for any nondecreasing function ϕ(x), ϕT (a) = ϕ(x)f (x|a) dx is concave in a. To see this, use [faa (x|a)/f (x|a)]f (x|a) dx = (d 2 /da2 ) f (x|a) dx = 0
10
One can also use the vector flows from Section 7 to generate state-space representations of the technology. Specifically, divide the flux v(x a) by the density f (x|a) to get the flow velocity [1/f (x|a)]v(x a). Suppose the differential system ∂x/∂a = [1/f (x|a)]v(x a) has a unique solution, x(a x0 ), for each initial point x0 in the support of f (x|a = 0). Then the set of these initial points x0 can be treated as the state space, with density p(x0 ) = f (x0 |a = 0). If this construction is applied to vb (x a) in (26), one gets roughly the state-space representation in Proposition 3 above (see Conlon (2009)). This approach may also help obtain other state-space representations.
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
275
so d2 da2
faa (x|a) f (x|a) dx f (x|a) faa (˜x|a) = − Cov ϕ(˜x) − f (˜x|a)
ϕ(x)f (x|a) dx =
ϕ(x)
This is nonpositive, since the covariance of the two nondecreasing functions, ϕ(x) and −faa (x|a)/f (x|a), of the affiliated vector x˜ , is nonnegative. Thus ϕT (a) is concave. By Lemma 1, this implies CISP. 10. CONCLUSION: COMPARISON OF THE TWO APPROACHES This paper has derived new multisignal extensions of the two major sets of conditions that justify the first-order approach to moral hazard principal– agent problems. In the past, one of these sets of conditions, the one developed by Rogerson (1985), seems to have been considerably more popular than the other set, developed by Jewitt (1988). However, the Jewitt approach may be more general in some ways. For example, suppose the principal sees a number of signals, x˜ 1 , x˜ 2 x˜ n , and suppose that the signals are independent. If the Jewitt conditions in Proposition 2 hold for each signal separately, then they clearly also hold for the vector of signals. However, if the CDFC holds for each signal separately, CISP need not hold, as shown in Section 8 above. Furthermore, suppose the signals x˜ 1 x˜ 2 x˜ n are independent and identically distributed (iid). Then CISP must break down as the number of signals n → ∞. For if F(x|a) is the common cumulative distribution function of the x˜ i , then the probability of the increasing set Exˆ = {x : xi ≥ xˆ for all i} is n ˆ . This ceases to be concave in a as n → ∞, so Prob(˜x ∈ Exˆ ) = [1 − F(x|a)] CISP breaks down. This is related to a more general point. Even in the one-signal case, the CDFC must fail if the signal x˜ is sufficiently accurate. For example, let x˜ have mean χ(a) for some increasing function χ(·). If χ(·) is concave, one would expect to find the sort of decreasing returns that CDFC is intended to capture. However, suppose x˜ has a standard deviation much less than some small ε. Then F(x|a) will be near one for a < χ−1 (x) − ε, then fall rapidly for a near χ−1 (x), and be near zero for a > χ−1 (x) + ε. This is a reverse s-shaped function in a, and so, cannot be convex in a, even if the function χ(·) is extremely concave.11 11 This also explains a seeming contradiction in the contingent monitoring literature, between Proposition 1 in Fagart and Sinclair-Desgagné (2007), and Theorem 1(a) in Dye (1986). Fagart and Sinclair-Desgagné assumed that the first-order approach is valid, and showed that an optimal contingent monitoring policy is “upper tailed” if Jewitt’s function, ω(·) in (10) above (∗ (·)
276
JOHN R. CONLON
Now, in the iid case considered just above, the vector of signals, x˜ , becomes almost perfectly informative about a as n → ∞. Thus, since the CDFC fails if signals approach perfect accuracy, it is not surprising that CISP also fails in this case.12 In most actual applications, of course, the vector of signals does not approach perfect accuracy, so the breakdown of CISP is not inevitable. However, the above discussion suggests that alternative conditions, such as Jewitt’s results and their generalization in Section 4, are of significant practical use.13 Unfortunately, the Jewitt conditions are tied to the concavity, not only of the technology, but also of the payment schedule s∗ (·). Thus, there are interesting cases where the Jewitt conditions should fail, since the payment schedule s∗ (·) is often not concave. For example, managers often receive stock options, face liquidity constraints, or are subject to contingent monitoring, and these can introduce nonconcavities into their payment functions. However, I believe that it is natural to expect the first-order approach itself to fail in such cases, since the agent’s overall objective function will tend to be nonconcave if s∗ (·) is not concave. The fact that CDFC and CISP imply concavity of the agent’s payoff, regardless of the curvature of s∗ (·), then suggests that the CDFC/CISP conditions are very restrictive, not that the first-order approach is widely applicable. Nevertheless, if researchers want to ignore such problems and use the firstorder approach even when the payment schedule may not be concave, then the generalizations of the CDFC approach from Sections 5 and 6 above can be used. APPENDIX: PROOF OF LEMMA 1 First, (14) clearly implies CISP. For h(x; E) is nondecreasing in x when E is an increasing set, so by (14), Prob(˜x ∈ E|a) = hT (a; E) is concave in a. Thus, assume CISP and let ϕ(·) be nondecreasing. For α < β, let ϕ[αβ] (x) = max α min(β ϕ(x)) in Fagart and Sinclair-Desgagné’s notation), is convex. Dye, by contrast, without assuming the first-order approach, showed that the optimal contingent monitoring policy is always lower tailed if the monitoring technology is sufficiently accurate, regardless of ω(·). The resolution of this contradiction is that if the monitoring technology is very accurate, then the CDFC is violated. Thus we must resort to the Jewitt conditions to justify the first-order approach in Fagart and Sinclair-Desgagné (2007), and these conditions require ω(·) to be concave. 12 Note, in particular, that this breakdown does not imply that the CISP is a poor generalization of the CDFC. As Lemma 1 shows, CISP is necessary and sufficient for u(s∗ (x)) nondecreasing in x to imply U(s∗ (·) a) concave in a. Thus, given the general Rogerson approach, CISP is best possible. 13 In addition, as Jewitt (1988) pointed out, the CDFC, and so, its multisignal generalizations such as CISP, are not satisfied by most standard statistical distributions.
MULTISIGNAL PRINCIPAL–AGENT PROBLEMS
277
This equals ϕ(x) when α ≤ ϕ(x) ≤ β, but equals α when ϕ(x) < α and equals β when ϕ(x) > β. Now ϕ[αβ] (·) can be approximated uniformly by the sum (A1)
ϕN[αβ] (x) = α +
N
[(β − α)/N]h(x; Ei )
i=1
where the sets Ei = {x : ϕ(x) ≥ α + (β − α)(i/N)} are increasing sets (note that (A1) essentially expresses the convex cone of increasing functions in terms of its extreme rays). In addition, the transformation in (13) is linear and also continuous under the uniform norm. Thus, applying this transformation to (A1) indicates that ϕT[αβ] (a) is approximated uniformly by
ϕ
T N [αβ]
N (a) = α + [(β − α)/N]hT (a; Ei ) i=1
Here hT (a; Ei ) = h(x; Ei )f (x|a) dx = Prob(˜x ∈ Ei |a) is concave in a by CISP, so (ϕN[αβ] )T (a) is concave in a for all N. Thus ϕT[αβ] (a) must also be concave in a. Next, letting α → −∞ and using the monotone convergence theorem shows that ϕT(−∞β] (a) is concave in a. Similarly, ϕT(−∞∞) (a) = ϕT (a) is concave in a. Q.E.D. REFERENCES BROWN, M., S. H. CHIANG, S. GHOSH, AND E. WOLFSTETTER (1986): “A New Class of Sufficient Conditions for the First-Order Approach to the Principal–Agent Problem,” Economics Letters, 21, 1–6. [251,274] CONLON, J. R. (2009): “Supplement to ‘Two New Conditions Supporting the First-Order Approach to Multisignal Principal–Agent Problems’,” Econometrica Supplementary Material, 77, http://www.econometricsociety.org/ecta/Supmat/6688_proofs.pdf. [259,260,270,273,274] DYE, R. A. (1986): “Optimal Monitoring Policies in Agencies,” Rand Journal of Economics, 17, 339–350. [275] FAGART, M.-C., AND B. SINCLAIR-DESGAGNÉ (2007): “Ranking Contingent Monitoring Systems,” Management Science, 53, 1501–1509. [257,273,275,276] GROSSMAN, S. J., AND O. D. HART (1983): “An Analysis of the Principal–Agent Problem,” Econometrica, 51, 7–45. [249,256] GUESNERIE, R., AND J.-J. LAFFONT (1978): “Taxing Price Makers,” Journal of Economic Theory, 19, 423–455. [249] HOLMSTRÖM, B. (1979): “Moral Hazard and Observability,” Bell Journal of Economics, 10, 74–91. [251] (1982): “Moral Hazard in Teams,” Bell Journal of Economics, 13, 324–340. [255] JEWITT, I. (1988): “Justifying the First-Order Approach to Principal–Agent Problems,” Econometrica, 56, 1177–1190. [249-251,253,254,256,258-262,267,270,274-276]
278
JOHN R. CONLON
(2007): “Information Order in Decision and Agency Problems,” Working Paper, Nuffield College, Oxford. [257] JEWITT, I., O. KADAN, AND J. M. SWINKELS (2008): “Moral Hazard With Bounded Payments,” Journal of Economic Theory (forthcoming). [253] LANG, S. (1987): Calculus of Several Variables (Third Ed.). New York: Springer-Verlag. [263,264] LEHMANN, E. L. (1983): Theory of Point Estimation. New York: Wiley. [254] LICALZI, M., AND S. SPAETER (2003): “Distributions for the First-Order Approach to Principal– Agent Problems,” Economic Theory, 21, 167–173. [271] MILGROM, P. R., AND R. J. WEBER (1982): “A Theory of Auctions and Competitive Bidding,” Econometrica, 50, 1089–1122. [260,263,269] MIRRLEES, J. A. (1999): “The Theory of Moral Hazard and Unobservable Behavior: Part I,” Review of Economic Studies, 66, 3–21. [249] MOOKHERJEE, D. (1984): “Optimal Incentive Schemes With Many Agents,” Review of Economic Studies, 51, 433–446. [255] MÜLLER, A., AND D. STOYAN (2002): Comparison Methods for Stochastic Models and Risks. West Sussex, U.K.: Wiley. [258,261,263] NACHBIN, L. (1965): Topology and Order. Princeton, NJ: Van Nostrand. [260] ROGERSON, W. P. (1985): “The First-Order Approach to Principal–Agent Problems,” Econometrica, 53, 1357–1367. [249,250,253,262,274,275] ROSS, S. A. (1973): “The Economic Theory of Agency: The Principal’s Problem,” American Economic Review, 63, 134–139. [250] SHAVELL, S. (1979): “Risk Sharing and Incentives in the Principal and Agent Relationship,” Bell Journal of Economics, 10, 55–73. [251] SINCLAIR-DESGAGNÉ, B. (1994): “The First-Order Approach to Multi-Signal Principal–Agent Problems,” Econometrica, 62, 459–465. [250,251,253,260,267,270,273,274] (2009): “Ancillary Statistics in Principal–Agent Models,” Econometrica, 77, 279–281. [255,256] SPENCE, M., AND R. ZECKHAUSER (1971): “Insurance, Information, and Individual Action,” American Economic Review, 61, 380–387. [250] TAYLOR, A. E., AND W. R. MANN (1983): Advanced Calculus (Third Ed.). New York: Wiley. [263,264]
Dept. of Economics, University of Mississippi, University, MS 38677, U.S.A.; [email protected]. Manuscript received September, 2006; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 279–281
NOTES AND COMMENTS ANCILLARY STATISTICS IN PRINCIPAL–AGENT MODELS BY BERNARD SINCLAIR-DESGAGNÉ1 IN THIS ISSUE Conlon (2009, Proposition 1) remarks that someone using the first-order approach in multisignal principal–agent analysis will often encounter “ancillary” statistics, namely signals which are uncorrelated to the agent’s effort. Take, for instance, the convenient family of conditional probability distributions such that the probability of observing the vector of signals s given effort level a is given by p(s | a) = α(a)f (s) + (1 − α(a))g(s) where α(·) is a twice continuously differentiable concave function from R+ to [0 1] with α(0) = 0 and lima→∞ α(a) = 1, and the probabilities f (s) and g(s) K are strictly positive for all s ∈ S = k=1 Sk with Sk = {0 1 2 Sk }. To ensure the validity of the first-order approach, the maximum likelihood ratio property is met if (1)
f (s) g(s)
is nondecreasing on S
while Sinclair-Desgagné’s (1994) Assumptions A8 (generalized stochastic dominance) and A9 (generalized concavity of the distribution function) are simultaneously verified when (2)
Sh
[f (sh s−h ) − g(sh s−h )] ≥ 0
for some index h and all jh s−h
sh =jh
The latter condition entails that the signals in s−h must be ancillary. Ancillary signals, however, can still constitute valuable information. In his seminal paper (although he does not use the terminology), Holmstrom (1979, Section 4) already gave examples of ancillary items that frequently appear in contracts: contingencies that are outside the agent’s control (such as strikes, accidents, or natural disasters), for instance, or the performance levels achieved independently by other people doing similar work. Since the notion of ancillary 1 I am particularly grateful to John Conlon for many enlightening exchanges and discussions on this topic. I also thank the co-editor, Larry Samuelson, and an anonymous reviewer for their comments and suggestions. This note was mostly written while I was visiting the London School of Economics and the Judge Business School of the University of Cambridge in academic year 2007–2008.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7812
280
BERNARD SINCLAIR-DESGAGNÉ
statistics was introduced by Fisher (1934), it is, in fact, well known that such statistics, say the s−h above, can affect the likelihood ratio (pa (s | a))/(p(s | a)); when this happens under the first-order approach, s−h will have to be part of an optimal contract.2 To illustrate this, let K = 2, Sk = {0 1}, and f (0 0) = 01
f (1 0) = 02
f (0 1) = 03
f (1 1) = 04
g(0 0) = 02
g(1 0) = 01
g(0 1) = 06
g(1 1) = 01
so that the inequalities in (2) are now satisfied for the first coordinate and s2 is ancillary. In this example, the likelihood ratio takes the forms pa (0 0 | a) pa (0 1 | a) −α (a) = = p(0 0 | a) p(0 1 | a) 2 − α(a) pa (1 0 | a) α (a) pa (1 1 | a) 3α (a) = < = p(1 0 | a) 1 + α(a) p(1 1 | a) 1 + 3α(a) The principal will thus find it optimal to set a wage schedule w(0 0) = w(0 1) < w(1 0) < w(1 1) which does depend on s2 . Note that this occurs, not because observing a larger value of s2 makes a higher effort by the agent more likely, as the standard interpretation of the monotone likelihood ratio property would suggest (see, e.g., Milgrom (1981)), but due to the fact that the relative weight of α(a) in the outcome distribution α(a)f (s1 s2 ) + (1 − α(a))g(s1 s2 ) increases at s1 = 1 when s2 = 1 instead of s2 = 0; this raises the agent’s return on effort and allows the principal to then use the signal s1 more aggressively. By and large, it can be shown that an ancillary statistic s−h can contribute valuable extra information unless sh is a sufficient statistic for the agent’s effort a.3 When this is not the case, some additional ancillary signals s−h will often supplement sh to make (sh s−h ) sufficient (see Ghosh, Reid, and Fraser (2007)). REFERENCES BOOS, D. D., AND J. M. HUGHES-OLIVER (1998): “Applications of Basu’s Theorem,” The American Statistician, 52, 218–221. [280] 2 The key role of likelihood ratios in principal–agent analysis under the first-order approach was observed early on by Holmstrom (1979) and spelled out later by Kim (1995). 3 As John Conlon pointed out to me, this is true in this principal–agent setting because what matters is the likelihood ratio and the latter only depends on the sufficient statistic. In general, no ancillary statistic will play a role for inference when a sufficient statistic is also complete (meaning that it contains no irrelevant information about a). This is the celebrated “Basu theorem” of mathematical statistics (see Boos and Hughes-Oliver (1998) for a precise statement). As Spanos (2007) just remarked, however, even when sh is completely sufficient in this sense, an ancillary statistic might still serve to validate the current representation p(s | a) of the agent’s production function.
ANCILLARY STATISTICS IN PRINCIPAL–AGENT MODELS
281
CONLON, J. R. (2009): “Two New Conditions Supporting the First-Order Approach to Multisignal Principal–Agent Problems,” Econometrica, 77, 249–278. [279] FISHER, R. A. (1934): “Two New Properties of Mathematical Likelihood,” Proceedings of the Royal Statistical Society, Series A, 144, 285–307. [280] GHOSH, M., N. REID, AND D. A. S. FRASER (2007): “Ancillary Statistics: A Review,” Mimeo, Department of Statistics, University of Toronto. [280] HOLMSTROM, B. (1979): “Moral Hazard and Observability,” Bell Journal of Economics, 10, 74–91. [279,280] KIM, S. K. (1995): “Efficiency of an Information System in an Agency Model,” Econometrica, 63, 89–102. [280] MILGROM, P. R. (1981): “Good News and Bad News: Representation Theorems and Applications,” Bell Journal of Economics, 12, 380–391. [280] SINCLAIR-DESGAGNÉ, B. (1994): “The First-Order Approach to Multi-Signal Principal–Agent Problems,” Econometrica, 62, 459–465. [279] SPANOS, A. (2007): “Sufficiency and Ancillarity Revisited: Testing the Validity of a Statistical Model,” Mimeo, Department of Economics, Virginia Tech. [280]
HEC Montréal and CIRANO, 3000 Chemin de la Côte-Sainte-Catherine, Montréal, Québec, Canada H3T 2A7 and École Polytechnique, 91128 Palaiseau Cedex, France; [email protected]. Manuscript received March, 2008; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 283–306
BOOTSTRAPPING REALIZED VOLATILITY BY SÍLVIA GONÇALVES AND NOUR MEDDAHI1 We propose bootstrap methods for a general class of nonlinear transformations of realized volatility which includes the raw version of realized volatility and its logarithmic transformation as special cases. We consider the independent and identically distributed (i.i.d.) bootstrap and the wild bootstrap (WB), and prove their first-order asymptotic validity under general assumptions on the log-price process that allow for drift and leverage effects. We derive Edgeworth expansions in a simpler model that rules out these effects. The i.i.d. bootstrap provides a second-order asymptotic refinement when volatility is constant, but not otherwise. The WB yields a second-order asymptotic refinement under stochastic volatility provided we choose the external random variable used to construct the WB data appropriately. None of these methods provides thirdorder asymptotic refinements. Both methods improve upon the first-order asymptotic theory in finite samples. KEYWORDS: Realized volatility, i.i.d. bootstrap, wild bootstrap, Edgeworth expansions.
1. INTRODUCTION THE INCREASING AVAILABILITY of high frequency financial data has contributed to the popularity of realized volatility as a measure of volatility in finance. Realized volatility is simple to compute (it is equal to the sum of squared high frequency returns) and is a consistent estimator of integrated volatility under general conditions (see Andersen, Bollerslev, and Diebold (2002) for a survey of realized volatility). Recently, a series of papers, including Barndorff-Nielsen and Shephard (henceforth BNS) (2002) and Barndorff-Nielsen, Graversen, Jacod, and Shephard (BNGJS) (2006) have developed an asymptotic theory for measures of variation such as realized volatility. In particular, for a rather general stochastic volatility model, these authors establish a central limit theorem (CLT) for 1 We would like to thank participants at the 2005 North American Winter Meeting of the Econometric Society, the SBFSIF II conference, Québec (April 2005), the CIREQ Montréal Financial Econometrics (May 2005), the SETA conference, Taipei (May 2005), the 2005 CEA meetings, the Princeton–Chicago High Frequency Conference (June 2005), and the NBER Summer Institute 2005, as well as seminar participants at Concordia University, Université de Toulouse I, the St. Louis Fed, and Universidade Nova de Lisboa. We also thank Torben Andersen, António Antunes, Christian Brownlees, Rui Castro, Valentina Corradi, Peter Hansen, Emma Iglesias, Atsushi Inoue, Lutz Kilian, and especially Per Mykland and Neil Shephard for helpful comments on the first version of the paper. In addition, we are grateful to three anonymous referees and a co-editor for many valuable suggestions. This work was supported by grants from FQRSC, SSHRC, MITACS, NSERC, and Jean-Marie Dufour’s Econometrics Chair of Canada. Parts of this paper were completed while Gonçalves was visiting the Banco de Portugal, Lisboa, and the Finance Department at Stern Business School and Meddahi was visiting Toulouse University and CREST–Paris.
© 2009 The Econometric Society
DOI: 10.3982/ECTA5971
284
S. GONÇALVES AND N. MEDDAHI
realized volatility over a fixed interval of time, for example, a day, as the number of intraday returns increases to infinity. In this paper, we propose bootstrap methods for realized volatility-like measures. Our main motivation is to improve upon the existing asymptotic mixed normal approximations. The bootstrap can be particularly valuable in the context of high frequency data-based measures. Current practice is to use a moderate number of intraday returns in computing realized volatility to avoid microstructure biases.2 Sampling at long horizons may limit the value of the asymptotic approximations derived under the assumption of an infinite number of returns. In particular, the Monte Carlo results in BNS (2005) showed that the feasible asymptotic theory for realized volatility can be a poor guide to the finite sample distribution of the studentized realized volatility. BNS (2005) also showed that a logarithmic version of the raw statistic has improved finite sample properties. Here we focus on a general class of nonlinear transformations of realized volatility which includes the raw realized volatility and its log transform as special cases. For this class of statistics, we ask whether we can improve upon the existing first-order asymptotic theory by relying on the bootstrap for inference on integrated volatility in the absence of microstructure noise. Since the effects of microstructure noise are more pronounced at very high frequencies, we expect the bootstrap to be a useful tool of inference based on realized volatility when sampling at moderate frequencies such as 30 minute horizons (as in Andersen, Bollerslev, Diebold, and Labys (2003)) or at 10–15 minute horizons for liquid asset returns (see Hansen and Lunde (2006)). We propose and analyze two bootstrap methods for realized volatility: an independent and identically distributed (i.i.d.) bootstrap and a wild bootstrap (WB). The i.i.d. bootstrap generates bootstrap intraday returns by resampling with replacement the original set of intraday returns. It is motivated by a benchmark model in which volatility is constant and therefore intraday returns are i.i.d. In practice, volatility has components which are highly persistent, especially over a daily horizon, implying that it is at least locally nearly constant. Hence we may expect the i.i.d. bootstrap to provide a good approximation even under stochastic volatility. The WB observations are generated by multiplying each original intraday return by an i.i.d. draw from a distribution that is independent of the data. The WB was introduced by Wu (1986), and further studied by Liu (1988) and Mammen (1993) in the context of cross-section linear regression models subject to unconditional heteroskedasticity in the error term. We summarize our main contributions as follows. First, we prove the firstorder asymptotic validity of both bootstrap methods under very general as2 Recently, a number of papers have studied the impact of microstructure noise on realized volatility; these include Zhang, Mykland, and Aït-Sahalia (2005b), Hansen and Lunde (2006), Bandi and Russell (2008), and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008). In particular, these papers proposed alternative estimators of integrated volatility that are robust to microstructure noise and that differ from realized volatility.
BOOTSTRAPPING REALIZED VOLATILITY
285
sumptions which allow for drift and leverage effects. Second, for a simpler model that rules out these effects, we derive formal second- and third-order Edgeworth expansions of the distribution of realized volatility-based t statistics as well as of their bootstrap analogues. Third, we use our Edgeworth expansions to compare the accuracy of the first-order asymptotic theory for realized volatility and for its log transform. Last, we use our Edgeworth expansions and Monte Carlo simulations to compare the finite sample accuracy of bootstrap confidence intervals for integrated volatility with the existing CLT-based intervals. Our results are as follows. The Edgeworth expansions for the raw and log statistics provide a theoretical explanation for the superior finite sample performance of the log statistic. For both types of statistics, the simulated bootstrap (one-sided and two-sided symmetric) intervals are more accurate in finite samples than the CLT-based intervals. The second-order Edgeworth expansions show that the i.i.d. bootstrap provides a second-order refinement over the normal approximation when volatility is constant but not otherwise. When volatility is time-varying and the rate of convergence of both approximations is the same, we use the asymptotic relative bootstrap error as a criterion of comparison (see Shao and Tu (1995) and Davidson and Flachaire (2001) for a similar argument). We show that the i.i.d. bootstrap is better than the normal approximation under this criterion for the raw statistic. These results are consistent with the good finite sample properties of the i.i.d. bootstrap one-sided confidence intervals. The WB provides a second-order asymptotic refinement when we choose the external random variable appropriately. We provide an optimal choice for the raw statistic. Our Monte Carlo simulations show that the WB implemented with this choice outperforms the first-order asymptotic normal approximation. The comparison between this WB and the i.i.d. bootstrap favors the i.i.d. bootstrap, which is the preferred method in the context of our study. Motivated by the good finite sample performance of the bootstrap for twosided symmetric intervals, we also investigate the ability of the bootstrap to provide a third-order asymptotic refinement for the raw realized volatility statistic. We show that none of our bootstrap methods gives third-order refinements. This is true for the i.i.d. bootstrap even when volatility is constant, a surprising result given that returns are i.i.d. in this case. A distinctive feature of our i.i.d. bootstrap t statistic is that it uses the (unscaled) sample variance estimator of the bootstrap squared returns and not the bootstrap analogue of the variance estimator proposed by BNS (2002) (which relies on the conditional local Gaussianity of intraday returns and cannot be used with the bootstrap). Under constant volatility, an alternative consistent variance estimator to BNS (2002) is the (unscaled) sample variance of squared returns, which mimics the i.i.d. bootstrap variance estimator. In this case, the i.i.d. bootstrap is third-order accurate when used to estimate the distribution of the alternative t statistic based on the sample variance of squared returns.
286
S. GONÇALVES AND N. MEDDAHI
Thus, the lack of third-order asymptotic refinements for the i.i.d. bootstrap under constant volatility is explained by the fact that the bootstrap statistic is not of the same form as the original statistic. The remainder of this paper is organized as follows. In Section 2, we describe the setup and briefly review the existing theory. Section 3 introduces the bootstrap methods and establishes their first-order asymptotic validity. Section 4 contains the second-order accuracy results, whereas Section 5 discusses third-order results. Section 6 contains simulations and Section 7 concludes. In Appendix A we state and prove the cumulant asymptotic expansions. Appendix B collects some of the proofs of the results that appear in Sections 3–5. Supplementary proofs and technical results appear in the web supplement to this paper (Gonçalves and Meddahi (2009), hereafter GM09). 2. SETUP, NOTATION, AND EXISTING THEORY We follow BNGJS (2006) and assume that the log-price process {log St : t ≥ 0} is defined on some filtered probability space (Ω F (Ft )t≥0 P) and follows the continuous time process (1)
d log St = μt dt + σt dWt
where Wt denotes a standard Brownian motion, μ is an adapted predictable locally bounded drift term, and σ is an adapted cadlag volatility process. These assumptions are very general, allowing for jumps, intraday seasonality, and long memory in both μ and σ. In addition, we do not assume Wt to be independent of σt , allowing for the presence of leverage effects. The parameter of interest is the integrated volatility over a fixed time interval [0 1] and is de1 fined as σ 2 ≡ 0 σu2 du. A consistent estimator of σ 2 is the realized volatility 1/ h R2 = i=1 ri2 , where ri ≡ log Sih − log S(i−1)h denotes the high frequency return measured over the period [(i − 1)h ih] for i = 1 1/ h. 1 1/ h q For any q > 0, define σ q ≡ 0 σuq du and σh ≡ h−q/2+1 i=1 (σi2 )q/2 , where ih σi2 ≡ (i−1)h σu2 du. BNGJS (2006) showed that for any q > 0, as h → 0, Rq ≡ 1/ h P h−q/2+1 i=1 |ri |q → μq σ q , where μq ≡ E|Z|q , with Z ∼ N(0 1). When q = 2, we obtain the consistency result for realized volatility. BNGJS (2006) also showed that √ h−1 (R2 − σ 2 ) d Th ≡ (2) → N(0 1) Vˆ where Vˆ = 23 R4 , under very general conditions, including drift and leverage effects. In particular, a sufficient assumption is (1) and t t t σt = σ0 + (3) a#u du + σu# dWu + vu# dVu 0
0
0
BOOTSTRAPPING REALIZED VOLATILITY
287
with a# , σ # , and v# adapted cadlag processes, a# predictable and locally bounded, and V a Brownian motion independent of W . Equation (3) does not allow for jumps in the volatility, but this can be relaxed (see Assumption H1 of BNGJS (2006) for a more general assumption on σ). An earlier statement of the CLT result for realized volatility under stronger conditions appeared in Jacod and Protter (1998) and BNS (2002). The log transformation of realized volatility is often used in empirical applications due to its improved finite sample properties. Here we consider a general class of nonlinear transformations that satisfy the following assumption. Throughout we let g (z) and g (z) denote the first and second derivatives of g with respect to z, respectively. ASSUMPTION G: Let g : R → R be twice continuously differentiable with g (σ 2 ) = 0 for any path of σ.
Assumption G contains the log transform for realized volatility (when g(z) = log z) and the raw statistic (when g(z) = z) as special cases. The corresponding t statistic is √ h−1 (g(R2 ) − g(σ 2 )) Tgh ≡ g (R2 ) Vˆ For the raw statistic, Tgh = Th . By the delta method, it follows from (2) that d Tgh → N(0 1). 3. THE BOOTSTRAP Under stochastic volatility, intraday returns are independent but heteroskedastic, conditional on the volatility path, which motivates a WB in this context. The i.i.d. bootstrap is motivated by a benchmark model in which μt = 0 and σt = σ > 0 for all t. In this case, intraday returns at horizon h are i.i.d. N(0 σ 2 h). As we show here, the i.i.d. bootstrap remains asymptotically valid for general stochastic volatility models described by (1) and (3). We denote the bootstrap intraday h-period returns as ri∗ . For the i.i.d. bootstrap, ri∗ is i.i.d. from {ri : i = 1 1/ h}. For the WB, ri∗ = ri ηi , where ηi are i.i.d. with moments given by μ∗q = E ∗ |ηi |q . In the following, P ∗ denotes the probability measure induced by the bootstrap, conditional on the original sample. Similarly, we let E ∗ (and Var∗ ) denote expectation (and variance) with respect to the bootstrap data, conditional on the original sample. 1/ h ∗2 The bootstrap realized volatility is equal to R∗2 = i=1 r√ i . For the i.i.d. bootstrap, we can show that E ∗ (R∗2 ) = R2 and V ∗ ≡ Var∗ ( h−1 R∗2 ) = R4 − R22 .
288
S. GONÇALVES AND N. MEDDAHI
We propose the following consistent estimator of the i.i.d. bootstrap variance V ∗ : 1/ h 2 1/ h (4) r ∗4 − r ∗2 ≡ R∗ − R∗2 Vˆ ∗ = h−1 i
i=1
i
4
2
i=1
1/ h where for any q > 0 we let R∗q ≡ h−q/2+1 i=1 |ri∗ |q . The i.i.d. bootstrap analogue of Tgh is given by √ h−1 (g(R∗2 ) − g(R2 )) ∗ (5) Tgh ≡ g (R∗2 ) Vˆ ∗ Note that although we center the (transformed) bootstrap realized volatility around the (transformed) sample realized volatility (since E ∗ (R∗2 ) = R2 ), the bootstrap standard error estimator is not of the same form as that used to studentize Tgh . In particular, Vˆ ∗ is not given by 23 R∗4 , which would be the bootstrap analogue of Vˆ . The naive estimator 23 R∗4 is not consistent for V ∗ because it relies on a local Gaussianity assumption that does not hold for the i.i.d. nonparametric bootstrap. In contrast, Vˆ ∗ given in (4) is a consistent estimator of V ∗ . √ For the WB, we can show that E ∗ (R∗2 ) = μ∗2 R2 and V ∗ ≡ Var∗ ( h−1 R∗2 ) = ∗ (μ∗4 − μ∗2 2 )R4 . We propose the following consistent estimator of V ,
∗ μ4 − μ∗2 2 ∗ ˆ R∗4 (6) V = μ∗4 ∗ as and define the WB studentized statistic Tgh √ h−1 (g(R∗2 ) − g(μ∗2 R2 )) ∗ (7) ≡ Tgh g (R∗2 ) Vˆ ∗ ∗ Note that Tgh is invariant to multiplication of η by a constant when g(z) = z and when g(z) = log(z), the two leading choices of g. ∗ THEOREM 3.1: Suppose (1) and (3) hold. Let Tgh denote either the i.i.d. bootstrap statistic defined in (4) and (5), or the WB statistic defined in (6) and (7). For the WB, let ηi ∼ i.i.d. such that μ∗8 = E ∗ |ηi |8 < ∞. Under Assumption G, as P
∗ ≤ x) − P(Tgh ≤ x)| → 0. h → 0, supx∈R |P ∗ (Tgh
This result provides a theoretical justification for using the i.i.d. bootstrap or the WB to consistently estimate the distribution of Tgh for any function g satisfying Assumption G. The conditions under which the i.i.d. bootstrap and WB work are those of BNGJS (2006), which allow for the presence of drifts and leverage effects. As the proof of Theorem 3.1 shows, the asymptotic validity of
BOOTSTRAPPING REALIZED VOLATILITY
289
the bootstrap depends on the availability of a CLT result for R2 and a law of large numbers for Rq , which hold under the general assumptions of BNGJS (2006). 4. SECOND-ORDER ACCURACY OF THE BOOTSTRAP We investigate the ability of the bootstrap to provide a second-order asymptotic refinement over the standard normal approximation when estimating P(Tgh ≤ x). We make the following assumption. ASSUMPTION H: The log price process follows (1) with μt = 0 and σt is independent of Wt , where σ is a cadlag process, bounded away from zero, and satisfies 1/ h limh→0 h1/2 i=1 |σηr i − σξri | = 0 for some r > 0, and for any ηi and ξi such that 0 ≤ ξ1 ≤ η1 ≤ h ≤ ξ2 ≤ η2 ≤ 2h ≤ · · · ≤ ξ1/ h ≤ η1/ h ≤ 1. Assumption H restricts considerably our previous assumptions √by ruling out drift and leverage effects. The effect of the drift on Tgh is OP ( h) (see, e.g., Meddahi (2002)). While this is asymptotically negligible at the first order, it is not at higher orders. Thus, our higher order results do not allow for μt = 0. One could in principle bootstrap the centered returns to account for the presence of a constant drift, but we do not explore this possibility here. The noleverage assumption is mathematically convenient to derive the asymptotic expansions because it allows us to condition on the path of volatility when computing higher order cumulants. Relaxing this assumption is beyond the scope of this paper. To describe the Edgeworth expansions, we need to introduce some additional notation. We write κj (Tgh ) to denote the jth-order cumulant of Tgh ∗ and write κ∗j (Tgh ) to denote the corresponding bootstrap cumulant. For j = 1 √ and 3, κjg denotes the coefficient of the terms of order O( h) of the asymptotic expansion of κj (Tgh ), whereas for j = 2 and 4, κjg denotes the coefficients of the terms of order O(h). The bootstrap coefficients κ∗jgh are defined similarly. For the raw statistic, we omit the subscript g, and write κj and κ∗jh to denote the corresponding cumulants. We follow this convention throughout, for instance, when referring to q1g (x) and q2g (x). See Appendix A for a precise definition of κjg and κ∗jgh . Finally, we let σqp ≡ σ q /(σ p )q/p for any q p > 0. Note that under constant volatility, σqp = 1. Similarly, we let Rqp = Rq /Rq/p p . The formal3 second-order Edgeworth expansion of the distribution of Tgh can be written as √ P(Tgh ≤ x) = (x) + hq1g (x)φ(x) + O(h) (8) 3 We do not prove the validity of our Edgeworth expansions. Such a result would be a valuable contribution in itself, which we defer for future research. Here our focus is on using formal expansions to explain the superior finite sample properties of the bootstrap theoretically. See Mammen (1993) and Davidson and Flachaire (2001) for a similar approach.
290
S. GONÇALVES AND N. MEDDAHI
uniformly over x ∈ R, where (x) and φ(x) are the standard normal cumulative and partial distribution functions, respectively. Following Hall (1992, p. 48), q1g (x) = −(κ1g + 16 κ3g (x2 − 1)). Given (8), the error of the normal approximation is √ P(Tgh ≤ x) − (x) = hq1g (x)φ(x) + O(h) (9) uniformly in x ∈ R. The Edgeworth expansion for the bootstrap is √ ∗ ∗ P ∗ (Tgh (10) ≤ x) = (x) + hq1g (x)φ(x) + OP (h) ∗ (x) = −(κ∗1gh + 16 κ∗3gh (x2 − 1)). where q1g
PROPOSITION 4.1: Under Assumptions G and H, conditionally on σ, we have that the following statements: (a) q1g (x) = q1 (x) + 12 (g (σ 2 ))/(g (σ 2 )) 2σ 4 x2 , where q1 (x) ≡ ((4(2x2 + √ 1))/6 2)σ64 . ∗ (b) For the i.i.d. bootstrap, q1g (x) = q1∗ (x) + 12 (g (R2 ))/(g (R2 )) R4 − R22 x2 , where 1 R6 − 3R4 R2 + 2R32 q1∗ (x) ≡ (2x2 + 1) 6 (R4 − R22 )3/2
∗ 2 (c) For the WB, q1g (x) = q1∗ (x) + 12 (g (μ∗2 R2 ))/(g (μ∗2 R2 )) (μ∗4 − μ∗2 2 )R4 x , where
A∗1 1 ∗ ∗ ∗ 2 + (B1 − 3A1 )(x − 1) R64 q1 (x) ≡ − − 2 6 A∗1 =
μ∗6 − μ∗2 μ∗4 1/2 μ∗4 (μ∗4 − μ∗2 2 )
B1∗ =
μ∗6 − 3μ∗2 μ∗4 + 2μ∗3 2 3/2 (μ∗4 − μ∗2 2 )
Proposition 4.1(a) shows that the magnitude of q1g (x) depends on σ (except when volatility is constant) and on g. When g(z) = z, q1g (x) = q1 (x) and 1 when g(z) = log z, q1log (x) ≡ q1g (x) = q1 (x) − 2 2σ42 x2 . The following result compares |q1g (x)| for these two leading choices of g. PROPOSITION 4.2: Under Assumption H, conditionally on σ, for any x = 0, |q1log (x)| < |q1 (x)| and |q1log (0)| = |q1 (0)|. Given (9), supx |q1log (x)|/|q1 (x)| is a measure of the relative asymptotic error of the normal when approximating the distribution of the log transformed
291 √ statistic as compared to the raw statistic (to O( h)). Proposition 4.2 implies that the error of the normal approximation is larger for the raw statistic than for its log version. This theoretical result explains the finite sample improvements of the log statistic found in the simulations (see BNS (2005) and Section 6). Gonçalves and Meddahi (2007) applied the results of Proposition 4.1(a) to the class of Box–Cox transforms to show that there are other choices of nonlinear transformations within this class that dominate the log. Similarly, Gonçalves and Meddahi (2008) use q1 (x) to build improved confidence intervals for σ 2 . Although these outperform the CLT-based intervals, they are dominated by the i.i.d. bootstrap intervals proposed here. Recently, Zhang, Mykland, and Aït-Sahalia (2005a) also derived Edgeworth expansions for test statistics based on realized volatility measures. Zhang, Mykland, and Aït-Sahalia (2005a) allowed for microstructure noise (from which we abstract here) and therefore studied a variety of estimators including realized volatility as well as other microstructure noise robust estimators. Nevertheless, their results apply only to normalized statistics based on the true variance of realized volatility (which is unknown in practice), whereas we provide results for the feasible studentized statistics. As Gonçalves and Meddahi (2008) showed, confidence intervals based on Edgeworth expansions for normalized statistics have poor finite sample properties when compared to the Edgeworth-based intervals derived from the correct expansions for the feasible statistics. For the raw statistic, the second-order Edgeworth expansion for the i.i.d. bootstrap can be obtained as a special case of Liu’s (1988) work. She showed that the i.i.d. bootstrap is not only asymptotically valid, but also second-order correct for studentized statistics based on the sample mean of independent but heterogeneous observations. Liu’s (1988) results apply to t and bootstrap t statistics that are both studentized by the sample variance. Crucial to Liu’s (1988) results is a homogeneity condition on the population means that ensures consistency of the sample variance estimator context. Specifn in the heterogeneous −1 2 ically, Liu (1988) assumed that n (μ − μ) ¯ → 0, where μi ≡ E(Xi ), i i=1 n μ¯ ≡ n−1 i=1 μi , and n is the sample size. Letting Xi ≡ ri2 / h, where n ri = σi ui , −1 with ui ∼ N(0 1), and letting n ≡ 1/ h, we can write R2 = n i=1 Xi . Conditionally on σ, Xi is independently distributed with mean μi ≡ σi2 / h and variance 2σi4 / h2 . We can show that q1∗ (x) can be obtained from (2.7) in Liu (1988) as a special case. In our context, Liu’s (1988) homogeneity condition n is n−1 i=1 (μi − μ) ¯ 2 = σh4 − (σh2 )2 → 0, which is not satisfied under stochastic volatility. Thus, we cannot use R4 − R22 to studentize realized volatility. Tgh is the statistic of interest here and this is not covered by the results in Liu (1988). Hence the results in Proposition 4.1(a) are new (and so are the results for the WB, as well as the results for nonlinear functions g for the i.i.d. bootstrap). BOOTSTRAPPING REALIZED VOLATILITY
292
S. GONÇALVES AND N. MEDDAHI
Given (10), the bootstrap error in estimating P(Tgh ≤ x) is (11)
∗ P ∗ (Tgh ≤ x) − P(Tgh ≤ x) √ √ ∗ (x) − q1g (x) φ(x) + oP ( h) = h plim q1g h→0
∗ (x) − q1g (x) for our two uniformly in x ∈ R. Next we characterize plimh→0 q1g bootstrap methods.
4.1. The i.i.d. Bootstrap Error PROPOSITION 4.3: Under Assumptions G and H, conditionally on σ, we have that the following statements: ∗ (a) plimh→0 q1g (x)−q1g (x) = plimh→0 q1∗ (x)−q1 (x)+ 12 (g (σ 2 ))/(g (σ 2 ))×
( 3σ 4 − (σ 2 )2 − 2σ 4 )x2 , where plim q1∗ (x) − q1 (x) h→0
σ6 15σ 6 − 9σ 4 σ 2 + 2(σ 2 )3 4 1 − = (2x2 + 1) √ 3/2 6 2 (σ 4 )3/2 3σ 4 − (σ 2 )2
∗ (x) − q1g (x) = 0. (b) If σt = σ for all t, then plimh→0 q1g ∗ (c) | plimh→0 q1 (x) − q1 (x)| ≤ |q1 (x)| uniformly in x. ∗ Proposition 4.3(a) shows that under Assumptions G and H, plimh→0 q1g (x)− √ q1g (x) = 0, implying that the bootstrap error is of the same order, OP ( h), as the normal approximation error. The i.i.d. bootstrap does not match the cumulants of the original statistic when volatility is time-varying, explaining the lack of asymptotic refinements (although it is asymptotically valid, as we showed in Section 3 under more general assumptions than Assumption H). When volatility √ that the i.i.d. bootstrap √ is constant, Proposition 4.3(b) implies error is oP ( h), smaller than the normal error O( h). In this case, ri is i.i.d. N(0 hσ 2 ) and the i.i.d. bootstrap provides a second-order refinement. This result holds for any choice of g, including the raw statistic and the log-based statistic. When the two approximations have the same convergence rate, an alternative bootstrap accuracy measure is the relative asymptotic error of the bootstrap. See Shao and Tu (1995, Section 3.3) and Davidson and Flachaire (2001) for more on alternative measures of accuracy of the bootstrap. The asymp√ totic relative bootstrap error can be approximated to O( h) by the ratio ∗ r1g (x) = | plimh→0 q1g (x) − q1g (x)|/|q1g (x)| for any x ∈ R. An approximation
BOOTSTRAPPING REALIZED VOLATILITY
293
to this order of the relative error for i.i.d. bootstrap critical values is r1g (zα ), where zα is such that (zα ) = α. For the raw statistic, Proposition 4.3(c) proves that r1g (x) ≡ r1 (x) ≤ 1 uniformly in x. Thus, r1 (zα ) ≤ 1, showing that the bootstrap critical values are more accurate than the normal critical values for the raw statistic under our assumptions. In this case, it is easy to see that r1 (x) is a random function that depends on σ, but not on x. This not only simplifies the proof that supx∈R r1 (x) ≤ 1, but also allows us to evaluate easily by simulation the magnitude of this ratio for different stochastic volatility models. In particular, we show that this ratio is very small and close to zero for the generalized autoregression conditional heteroskedasticity GARCH(1 1) diffusion (with a mean of 0.0025 and a maximum of 0.024 across 10,000 simulations), and slightly larger for the two-factor diffusion model (the mean is 0.089 and the maximum is 0.219). See Section 6 for details on the simulation design. For nonlinear functions g, r1g (x) is a more complicated function, depending on both σ and x. Proving that supx∈R r1g (x) ≤ 1 is therefore more challenging. Although we do not provide a proof of this analytical result, we evaluated by simulation the value of r1g (x) on a grid of values of x in the interval [0 10] for g(z) = log z. For the GARCH(1 1) model, the maximum (over x) mean value (over σ) of r1log (x) was 0.0074, with an overall maximum (over σ and x) equal to 0.043. For the two-factor model, these numbers were 0.097 and 0.289 respectively. We take this as evidence of the superior accuracy of the bootstrap critical values for the GARCH(1 1) and two-factor diffusions, consistent with the good performance of the i.i.d. bootstrap for these models for one-sided intervals based on the log transform (see Section 6). 4.2. The Wild Bootstrap Error PROPOSITION 4.4: Under Assumptions G and H, conditionally on σ, ∗ plim q1g (x) − q1g (x) h→0
1 = − plim κ∗1gh − κ1g + plim κ∗3gh − κ3g (x2 − 1) 6 h→0 h→0
where
5 ∗ 1 4 plim κ − κ1g = − σ64 √ A1 − √ 2 3 h→0 2 ∗ 2
g (σ 2 ) 4 1 g (μ2 σ ) 3σ 4 (μ∗4 − μ∗2 ) − 2σ − 2 2 g (μ∗2 σ 2 ) g (σ 2 )
5 ∗ 4 ∗ ∗ plim κ3gh − κ3g = 6 plim κ1gh − κ1g + σ64 √ B1 − √ 3 h→0 h→0 2 ∗ 1gh
294
S. GONÇALVES AND N. MEDDAHI
with A∗1 and B1∗ as in Proposition 4.1. Proposition 4.4 shows that the ability of the WB to match κ1g and κ3g (and hence provide a second-order asymptotic refinement) depends on g, A∗1 , and B1∗ . The constants A∗1 and B1∗ are a function of μ∗q for q = 2 4 6, and therefore depend on the choice of ηi . For instance, if we choose4 ηi ∼ N(0 1), then A∗1 = A1 = B1 = B1∗ . This implies that for the raw statistic plimh→0 κ∗1h − κ1 = ( √53 − 1)κ1 = 0, and plimh→0 κ∗3h − κ3 = ( √53 − 1)κ3 = 0. In this case, plimh→0 q1∗ (x) − q1 (x) ≈ 189q1 (x), showing that this choice of ηi does not deliver√an asymptotic refinement. It also shows that the contribution of the term O( h) to the bootstrap error is almost twice as large as the contribution of q1 (x) to the normal error. Thus ηi ∼ N(0 1) is not a good choice for the WB, which is confirmed by our simulations in Section 6. A sufficient condition for the WB to provide a second-order asymptotic refinement is that μ∗2 , μ∗4 , and μ∗6 solve plimh→0 κ∗1gh = κ1g and plimh→0 κ∗3gh = κ3g . For the raw statistic, as Proposition 4.4 shows, this is equivalent to solving √53 A∗1 = √42 and √53 B1∗ = √42 . We can show that for any γ = 0, the solution 37 6 is of the form μ∗2 = γ 2 , μ∗4 = 31 γ 4 , and μ∗6 = 31 γ . Since Th∗ is invariant to 25 25 25 the choice of γ, we choose γ = 1 without loss of generality, implying μ∗2 = 1, 37 μ∗4 = 31 = 124, and μ∗6 = 31 = 18352. Next, we propose a two-point distribu25 25 25 tion for ηi that matches these three moments and thus implies a second-order asymptotic refinement for the WB for the raw statistic. PROPOSITION 4.5: Let Th∗ be defined as in (6) and (7) with g(z) = z, and let ηi be i.i.d. such that ⎧ √ 1 3 1 ⎪ ⎪ ≈ 028, 31 + 186 ≈ 133 with prob p = − √ ⎨ 5 2 186 ηi = √ ⎪ ⎪ ⎩ − 1 31 − 186 ≈ −083 with prob 1 − p. 5 ∗ ∗ Under Assumption √ H, conditionally on σ, as h → 0, supx∈R |P (Th ≤ x) − P(Th ≤ x)| = oP ( h).
The choice of ηi in Proposition 4.5 is not optimal for other choices of g, including the log statistic. In this case, the solution to plimh→0 κ∗1gh = κ1g and plimh→0 κ∗3gh = κ3g depends on g and on the volatility path through σ q . Although we could replace these unknowns by consistent estimates, the Edgeworth expansions derived here would likely change because they do not take into account the randomness of the estimates. In addition, these estimates are 4 Given that returns are (conditionally on σ) normally distributed, choosing ηi ∼ N(0 1) could be a natural choice.
BOOTSTRAPPING REALIZED VOLATILITY
295
very noisy and it is unclear whether such an approach would be useful in practice. See Gonçalves and Meddahi (2007) for more on a related issue. For these reasons, we do not pursue this approach here. 5. THIRD-ORDER ACCURACY OF THE BOOTSTRAP Here we develop Edgeworth expansions through O(h) and use these to evaluate the accuracy of the bootstrap for estimating P(|Th | ≤ x). For brevity, we only give results for the raw statistic. The third-order Edgeworth expansion of the distribution of Th is √ (12) P(Th ≤ x) = (x) + hq1 (x)φ(x) + hq2 (x)φ(x) + o(h) for any x ∈ R, where q1 is defined in Section 4 and q2 is an odd polynomial of degree 5 whose coefficients depend on κj for j = 1 4. The third-order bootstrap Edgeworth expansion is similar to (12), with q1∗ (x) and q2∗ (x) denoting the bootstrap analogues of q1 (x) and q2 (x), respectively. In particular, q2∗ (x) is of the same form as q2 (x) but replaces the coefficients κj with bootstrap analogues κ∗jh . The error in estimating P(|Th | ≤ x) made by the normal approximation is given by P(|Th | ≤ x) − (2 (x) − 1) = 2hq2 (x)φ(x) + o(h), which is O(h). The bootstrap error can be written as (13) P ∗ (|Th∗ | ≤ x) − P(|Th | ≤ x) = 2h plim q2∗ (x) − q2 (x) φ(x) + oP (h) h→0
The bootstrap provides a third-order asymptotic refinement when plimh→0 q2∗ (x) = q2 (x) or, equivalently, when plimh→0 κ∗jh = κj for j = 1 4. Our findings are as follows. The i.i.d. bootstrap does not provide thirdorder asymptotic refinements. This is true even when volatility is constant, which is a surprising result. Under constant volatility, plimh→0 κ∗jh = κj for j = 1 and 3 (implying that plimh→0 q1∗ (x) = q1 (x); cf. Proposition 4.3(b)), but this is not true for j = 2 and 4. Note that this does not mean that the i.i.d. bootstrap provides inconsistent estimates of the asymptotic value (as h → 0) of the second and fourth cumulants of Th . Since κ∗2 (Th∗ ) = 1 + hκ∗2h + oP (h) and κ∗4 (Th∗ ) = hκ∗4h + oP (h), it follows that plimh→0 κ∗2 (Th∗ ) = 1 = plimh→0 κ2 (Th ) and plimh→0 κ∗4 (Th∗ ) = 0 = plimh→0 κ4 (Th ), independently of the value of plimh→0 κ∗jh and κj ; these terms are multiplied by h, which goes to zero, and only play a role in proving bootstrap refinements. The reason why the i.i.d. bootstrap does not provide a third-order asymptotic refinement under constant volatility is related to the fact that Th∗ uses a variance estimator Vˆ ∗ which is not the bootstrap analogue of the variance estimator Vˆ ≡ 2 R used in Th . Under constant volatility, an alternative consistent variance 3 4 estimator of the asymptotic variance of R2 is V˜ = R4 − R22 , which is of the
296
S. GONÇALVES AND N. MEDDAHI
same form as Vˆ ∗ . We can show that for a t statistic based on V˜ , we get secondand third-order asymptotic refinements for the i.i.d. bootstrap under constant volatility. Using Vˆ instead of V˜ does not have an impact at the second order, but it does at the third order. Because V˜ is only consistent for V under constant volatility, we cannot use it in the general context of stochastic volatility. Our main finding for the WB is that there is no choice of ηi for which the WB gives a third-order asymptotic refinement. In particular, it is not possible to find ηi such that plimh→0 κ∗jh = κj for j = 1 4. As discussed in Section 4, to match the first- and third-order cumulants, we need to choose ηi 37 6 with moments μ∗2 = γ 2 , μ∗4 = 31 γ 4 , and μ∗6 = 31 γ . Since the WB statistic 25 25 25 is invariant to the choice of γ, we set γ = 1. We are left with two equations (plimh→0 κ∗jh = κj for j = 2 4) and one free parameter μ∗8 . The two-point distribution proposed in Proposition 4.5 gives a second-order refinement, implying μ∗8 = 3014. We can also choose ηi to solve plimh→0 κ∗jh = κj for j = 1 2 3 by 37 , μ∗6 = 31 , and μ∗8 = ( 31 )2 ( 251 )( 1739 ) = 3056.5 Because it setting μ∗2 = 1, μ∗4 = 31 25 25 25 25 35 ∗ solves plimh→0 κjh = κj for j = 2 (in addition to j = 1 and 3), this choice may perform better than the two-point choice of ηi in Proposition 4.5. Given the absence of third-order bootstrap asymptotic refinements, we rely on the asymptotic relative error of the bootstrap as the criterion of comparison. To O(h), this error is equal to r2 (x) = | plimh→0 q2∗ (x) − q2 (x)|/|q2 (x)|, with x > 0. In the general stochastic volatility case, r2 (x) is a random function of x as it depends on σ through the ratios σ64 and σ84 . When σ is constant, these ratios equal 1 and r2 (x) becomes a deterministic function of x. Figure 1 plots r2 (x) against x when σ is constant. Four methods are considered: the i.i.d. bootstrap, the WB based on ηi ∼ N(0 1), the WB based on ηi chosen according to Proposition 4.5, and a third WB whose moments μ∗q solve plimh→0 κ∗jh = κj for j = 1 2 3. Figure 1 shows that supx r2 (x) < 1 for the i.i.d. bootstrap, suggesting that it is better than the normal approximation under this criterion. Instead, Figure 1 shows that for the WB, r2 (x) can be larger or smaller than 1, depending on x, except for the WB based on N(0 1), for which it is always well above 1. We also evaluated r2 (x) by simulation when σ is stochastic, as we did for r1log (x). The results show that r2 (x) can be smaller or larger than 1, depending on x. Overall, Figure 1 suggests that the asymptotic relative bootstrap error criterion is not a good indicator of the accuracy of our WB methods for two-sided distribution functions. Although Edgeworth expansions are the main theoretical tool for proving bootstrap asymptotic refinements, it has already been pointed out in the bootstrap literature (see, e.g., Härdle, Horowitz, and Kreiss (2003)) that Edgeworth expansions can be imperfect guides to the relative accuracy of the bootstrap methods. The same comment applies here to the asymptotic relative bootstrap error criterion for two-sided distribution functions. 5 Matching κj for j = 1 2 4 is not possible because the solution for the μ∗q ’s does not satisfy Jensen’s inequality.
BOOTSTRAPPING REALIZED VOLATILITY
297
FIGURE 1.—The function r2 (x) when σ is constant.
6. MONTE CARLO RESULTS We compare the finite sample performance of the bootstrap with the firstorder asymptotic theory for confidence intervals of integrated volatility. Our Monte Carlo design follows that of Andersen, Bollerslev, and Meddahi (2005). In particular, we consider the stochastic volatility model
d log St = μ dt + σt ρ1 dW1t + ρ2 dW2t + 1 − ρ21 − ρ22 dW3t where W1t , W2t , and W3t are three independent standard Brownian motions. For σt , we consider a GARCH(1 1) diffusion (cf. Andersen and Bollerslev (1998)), where dσt2 = 0035(0636 − σt2 ) dt + 0144σt2 dW1t , and a two-factor diffusion (see Huang and Tauchen (2006) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008)) where σt = exp(−12 + 004σ1t2 + 15σ2t2 ), with dσ1t2 = −000137σ1t2 dt + dW1t and dσ2t2 = −1386σ2t2 dt + (1 + 025σ2t2 ) dW2t . Our baseline models let μ = 0 and ρ1 = ρ2 = 0, consistent with Assumption H. We also allow for drift and leverage effects by setting μ = 00314, ρ1 = −0576, and ρ2 = 0 for the GARCH(1 1) model, and μ = 0030 and ρ1 = ρ2 = −030 for the two-factor diffusion model, for which our results in Section 3 apply. We consider one- and two-sided symmetric 95% confidence intervals based on the raw and on the log statistics. We use the normal distribution (CLT), the i.i.d. bootstrap (iidB), and two WB methods, one based on
298
S. GONÇALVES AND N. MEDDAHI
ηi ∼ N(0 1) (WB1) and another based on the two-point distribution proposed in Proposition 4.5 (WB2) to compute critical values. Table I gives the actual coverage rates of all the intervals across 10,000 replications for four different sample sizes: 1/ h = 1152, 288, 48, and 12, corresponding to 1.25-minute, 5-minute, half-hour, and 2-hour returns. Bootstrap intervals use 999 bootstrap replications. For all models, both one-sided and two-sided asymptotic intervals tend to undercover. The degree of undercoverage is especially large for larger values of h, when sampling is not too frequent, and it is larger for one-sided than for two-sided intervals. It is also larger for the raw statistics than for the log-based statistics. The two-factor model implies overall lower coverage rates (hence larger coverage distortions) than the GARCH(1 1) model. The bootstrap methods outperform the feasible asymptotic theory for both one- and two-sided intervals, and for the raw and the log statistics. The i.i.d. bootstrap does very well across all models and intervals, even though there is stochastic volatility. It essentially eliminates the distortions associated with the asymptotic intervals for small values of 1/ h for the GARCH(1 1). Its performance deteriorates for the two-factor model, but it remains very competitive relative to the other methods. The WB intervals based on the normal distribution tend to overcover across all models. The WB based on the two-point distribution tends to undercover, but significantly less than the feasible asymptotic theory intervals. This is true for both the raw and the log versions of R2 , although its relative performance is worse for the log case, for which this choice is not optimal. The i.i.d. and the WB based on the two-point distribution outperform the normal approximation for symmetric intervals, despite the fact that these bootstrap methods do not theoretically provide an asymptotic refinement for two-sided symmetric confidence intervals. The i.i.d. bootstrap is the preferred method overall, followed by the WB based on the proposed two-point distribution. Finally, the results are robust to leverage and drift effects. 7. CONCLUSIONS The results presented here justify using the i.i.d. bootstrap and the wild bootstrap for a class of nonlinear transformations of realized volatility that contains the log transform as a special case. We show that these methods are asymptotically valid under the assumptions of BNGJS (2006), which allow for drift and leverage effects. In simulations, the bootstrap is more accurate than the standard normal asymptotic theory for two popular stochastic volatility models. We provide higher order results that explain these findings under a stricter set of assumptions that rules out drift and leverage effects. Establishing higher order refinements of the bootstrap under the conditions of BNGJS (2006) is a promising extension of this work. Another important extension is to prove the validity of the Edgeworth expansions derived here. Finally, one interesting application of the bootstrap is to realized beta, where the Monte Carlo results of
TABLE I

COVERAGE RATES OF NOMINAL 95% CONFIDENCE INTERVALS FOR σ²ᵃ

One-sided and two-sided symmetric intervals, based on the raw and on the log statistic, with CLT, iidB, WB1, and WB2 critical values; sample sizes 1/h = 12, 48, 288, and 1152; baseline GARCH(1,1) and two-factor diffusions (no leverage and no drift) and both models with leverage and drift.

ᵃNotes: CLT—intervals based on the Normal; iidB—intervals based on the i.i.d. bootstrap; WB1—WB based on ηᵢ ∼ N(0,1); WB2—WB based on Proposition 4.5. 10,000 Monte Carlo trials with 999 bootstrap replications each.
BNS (2004a) show that there are important finite sample distortions. Dovonon, Gonçalves, and Meddahi (2007) considered this extension.

APPENDIX A: CUMULANT EXPANSIONS

This Appendix contains the cumulant expansions used in the paper. Auxiliary lemmas and proofs appear in the Supplemental Material (see GM09). Recall that $\sigma_{q,p} \equiv \sigma^q/(\sigma^p)^{q/p}$ for any $q, p > 0$. In some results, $\sigma^q$ is replaced with $\sigma_h^q$ in this definition and we write $\sigma_{q,p,h}$. Finally, we let $R_{q,p} \equiv R_q/(R_p)^{q/p}$.

THEOREM A.1—Cumulants of $T_{g,h}$: Suppose Assumptions G and H hold. For any $q > 0$, $\sigma_h^q - \sigma^q = o_P(\sqrt{h})$, and, conditionally on σ, as $h \to 0$,
(a) $\kappa_1(T_h) = \sqrt{h}\,\kappa_1 + o(h)$, with $\kappa_1 \equiv -(A_1/2)\sigma_{6,4}$;
(b) $\kappa_1(T_{g,h}) = \sqrt{h}\,\kappa_{1,g} + O(h)$, with $\kappa_{1,g} \equiv \kappa_1 - \frac{1}{2}\bigl(g''(\sigma^2)/g'(\sigma^2)\bigr)\sqrt{2\sigma^4}$;
(c) $\kappa_2(T_h) = 1 + h\kappa_2 + o(h)$, with $\kappa_2 \equiv (C_1 - A_2)\sigma_{8,4} + \frac{7}{4}A_1^2\sigma_{6,4}^2$;
(d) $\kappa_3(T_h) = \sqrt{h}\,\kappa_3 + o(h)$, with $\kappa_3 \equiv (B_1 - 3A_1)\sigma_{6,4}$;
(e) $\kappa_3(T_{g,h}) = \sqrt{h}\,\kappa_{3,g} + O(h)$, with $\kappa_{3,g} \equiv \kappa_3 - 3\bigl(g''(\sigma^2)/g'(\sigma^2)\bigr)\sqrt{2\sigma^4}$;
(f) $\kappa_4(T_h) = h\kappa_4 + o(h)$, with $\kappa_4 \equiv (B_2 + 3C_1 - 6A_2)\sigma_{8,4} + (18A_1^2 - 6A_1B_1)\sigma_{6,4}^2$, and
$$A_1 = \frac{\mu_6 - \mu_2\mu_4}{\mu_4(\mu_4 - \mu_2^2)^{1/2}} = \frac{4}{\sqrt{2}}, \qquad
A_2 = \frac{\mu_8 - \mu_4^2 - 2\mu_2\mu_6 + 2\mu_2^2\mu_4}{\mu_4(\mu_4 - \mu_2^2)} = 12,$$
$$B_1 = \frac{\mu_6 - 3\mu_2\mu_4 + 2\mu_2^3}{(\mu_4 - \mu_2^2)^{3/2}} = \frac{4}{\sqrt{2}}, \qquad
B_2 = \frac{\mu_8 - 4\mu_2\mu_6 + 12\mu_2^2\mu_4 - 6\mu_2^4 - 3\mu_4^2}{(\mu_4 - \mu_2^2)^2} = 12, \qquad
C_1 = \frac{\mu_8 - \mu_4^2}{\mu_4^2} = \frac{32}{3}.$$
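For concreteness (a check added here, not part of the original appendix), these numerical values follow from the moments of the standard normal distribution, $\mu_2 = 1$, $\mu_4 = 3$, $\mu_6 = 15$, $\mu_8 = 105$:
$$A_1 = \frac{15 - 1\cdot 3}{3(3-1)^{1/2}} = \frac{4}{\sqrt{2}}, \qquad A_2 = \frac{105 - 9 - 30 + 6}{3(3-1)} = 12, \qquad B_1 = \frac{15 - 9 + 2}{(3-1)^{3/2}} = \frac{4}{\sqrt{2}},$$
$$B_2 = \frac{105 - 60 + 36 - 6 - 27}{(3-1)^2} = 12, \qquad C_1 = \frac{105 - 9}{9} = \frac{32}{3}.$$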
THEOREM A.2—i.i.d. Bootstrap Cumulants: Under Assumptions G and H, conditionally on σ, as $h \to 0$,
(a) $\kappa_1^*(T_h^*) = \sqrt{h}\,\kappa_{1,h}^* + o_P(h)$, with $\kappa_{1,h}^* \equiv -\tilde A_1/2$;
(b) $\kappa_1^*(T_{g,h}^*) = \sqrt{h}\,\kappa_{1,g,h}^* + O_P(h)$, with $\kappa_{1,g,h}^* \equiv \kappa_{1,h}^* - \frac{1}{2}\bigl(g''(R_2)/g'(R_2)\bigr)\sqrt{R_4 - R_2^2}$;
(c) $\kappa_2^*(T_h^*) = 1 + h\kappa_{2,h}^* + o_P(h)$, with $\kappa_{2,h}^* \equiv \tilde C - \tilde A_2 - \frac{1}{4}\tilde A_1^2$;
(d) $\kappa_3^*(T_h^*) = \sqrt{h}\,\kappa_{3,h}^* + o_P(h)$, with $\kappa_{3,h}^* \equiv -2\tilde A_1$;
(e) $\kappa_3^*(T_{g,h}^*) = \sqrt{h}\,\kappa_{3,g,h}^* + O_P(h)$, with $\kappa_{3,g,h}^* \equiv \kappa_{3,h}^* - 3\bigl(g''(R_2)/g'(R_2)\bigr)\sqrt{R_4 - R_2^2}$;
(f) $\kappa_4^*(T_h^*) = h\kappa_{4,h}^* + o_P(h)$, with $\kappa_{4,h}^* \equiv (\tilde B_2 - 2\tilde D + 3\tilde E) - 6(\tilde C - \tilde A_2) - 4\tilde A_1^2$,
where
$$\tilde A_1 = \frac{R_6 - 3R_4R_2 + 2R_2^3}{(R_4 - R_2^2)^{3/2}}, \qquad
\tilde A_2 = \frac{R_8 - 4R_4^2 - 4R_6R_2 + 14R_4R_2^2 - 7R_2^4}{(R_4 - R_2^2)^2},$$
$$\tilde B_2 = \frac{R_8 - 4R_6R_2 + 12R_4R_2^2 - 6R_2^4 - 3R_4^2}{(R_4 - R_2^2)^2},$$
$$\tilde C = \frac{R_8 - R_4^2}{(R_4 - R_2^2)^2} + \frac{2(R_6 - R_4R_2)^2}{(R_4 - R_2^2)^3} - \frac{12(R_6 - R_4R_2)R_2}{(R_4 - R_2^2)^2} + \frac{12R_2^2}{R_4 - R_2^2},$$
$$\tilde D = \frac{4(R_6 - 3R_4R_2 + 2R_2^3)(R_6 - R_4R_2)}{(R_4 - R_2^2)^3} + \frac{6(R_8 - R_4^2 - 2R_6R_2 + 2R_4R_2^2)}{(R_4 - R_2^2)^2} - 15 - \frac{20R_2(R_6 - 3R_4R_2 + 2R_2^3)}{(R_4 - R_2^2)^2},$$
$$\tilde E = \frac{3(R_8 - R_4^2)}{(R_4 - R_2^2)^2} + \frac{12(R_6 - R_4R_2)^2}{(R_4 - R_2^2)^3} - \frac{60(R_6 - R_4R_2)R_2}{(R_4 - R_2^2)^2} + \frac{60R_2^2}{R_4 - R_2^2}.$$
THEOREM A.3—WB Cumulants: Under Assumptions G and H, conditionally on σ, as $h \to 0$,
(a) $\kappa_1^*(T_h^*) = \sqrt{h}\,\kappa_{1,h}^* + o_P(h)$, with $\kappa_{1,h}^* \equiv -(A_1^*/2)R_{6,4}$;
(b) $\kappa_1^*(T_{g,h}^*) = \sqrt{h}\,\kappa_{1,g,h}^* + O_P(h)$, with $\kappa_{1,g,h}^* \equiv \kappa_{1,h}^* - \frac{1}{2}\bigl(g''(\mu_2^*R_2)/g'(\mu_2^*R_2)\bigr)\sqrt{(\mu_4^* - \mu_2^{*2})R_4}$;
(c) $\kappa_2^*(T_h^*) = 1 + h\kappa_{2,h}^* + o_P(h)$, with $\kappa_{2,h}^* \equiv (C_1^* - A_2^*)R_{8,4} + \frac{7}{4}A_1^{*2}R_{6,4}^2$;
(d) $\kappa_3^*(T_h^*) = \sqrt{h}\,\kappa_{3,h}^* + o_P(h)$, with $\kappa_{3,h}^* \equiv (B_1^* - 3A_1^*)R_{6,4}$;
(e) $\kappa_3^*(T_{g,h}^*) = \sqrt{h}\,\kappa_{3,g,h}^* + O_P(h)$, with $\kappa_{3,g,h}^* \equiv \kappa_{3,h}^* - 3\bigl(g''(\mu_2^*R_2)/g'(\mu_2^*R_2)\bigr)\sqrt{(\mu_4^* - \mu_2^{*2})R_4}$;
(f) $\kappa_4^*(T_h^*) = h\kappa_{4,h}^* + o_P(h)$, with $\kappa_{4,h}^* \equiv (B_2^* + 3C_1^* - 6A_2^*)R_{8,4} + (18A_1^{*2} - 6A_1^*B_1^*)R_{6,4}^2$, where
$$A_1^* = \frac{\mu_6^* - \mu_2^*\mu_4^*}{\mu_4^*(\mu_4^* - \mu_2^{*2})^{1/2}}, \qquad
A_2^* = \frac{\mu_8^* - \mu_4^{*2} - 2\mu_2^*\mu_6^* + 2\mu_2^{*2}\mu_4^*}{\mu_4^*(\mu_4^* - \mu_2^{*2})},$$
$$B_1^* = \frac{\mu_6^* - 3\mu_2^*\mu_4^* + 2\mu_2^{*3}}{(\mu_4^* - \mu_2^{*2})^{3/2}}, \qquad
B_2^* = \frac{\mu_8^* - 4\mu_2^*\mu_6^* + 12\mu_2^{*2}\mu_4^* - 6\mu_2^{*4} - 3\mu_4^{*2}}{(\mu_4^* - \mu_2^{*2})^2}, \qquad
C_1^* = \frac{\mu_8^* - \mu_4^{*2}}{\mu_4^{*2}}.$$
PROOF OF THEOREM A.1: We sketch the proofs for the raw statistic. The proofs of (b) and (e) for nonlinear g follow by a second-order Taylor expansion of $K(R_2, \hat V)$ around $(\sigma^2, V_h)$, where $K(x, y) = (g(x) - g(\sigma^2))/(g'(x)\sqrt{y})$ and $g(\cdot)$ is as in Assumption G. We let $V_h = \mathrm{Var}(\sqrt{h^{-1}}\,R_2) = 2\sigma_h^4$, and let $S_h \equiv \sqrt{h^{-1}}\,(R_2 - \sigma^2)/\sqrt{V_h}$ and $U_h \equiv \sqrt{h^{-1}}\,(\hat V - V_h)/V_h$. We can write $T_h = S_h(1 + \sqrt{h}U_h)^{-1/2}$. The first four cumulants of $T_h$ are given by (e.g., Hall (1992, p. 42))
$$\kappa_1(T_h) = E(T_h), \qquad \kappa_2(T_h) = E(T_h^2) - [E(T_h)]^2,$$
$$\kappa_3(T_h) = E(T_h^3) - 3E(T_h^2)E(T_h) + 2[E(T_h)]^3,$$
$$\kappa_4(T_h) = E(T_h^4) - 4E(T_h^3)E(T_h) - 3[E(T_h^2)]^2 + 12E(T_h^2)[E(T_h)]^2 - 6[E(T_h)]^4.$$
We identify the terms of order up to $O(h)$. For a fixed k, we can write
$$T_h^k = S_h^k(1 + \sqrt{h}U_h)^{-k/2} = S_h^k - \frac{k}{2}\sqrt{h}\,S_h^kU_h + \frac{k}{4}\Bigl(\frac{k}{2} + 1\Bigr)h\,S_h^kU_h^2 + O(h^{3/2}).$$
For $k = 1, \ldots, 4$, the moments of $T_h^k$ up to $O(h^{3/2})$ are given by
$$E(T_h) = -\frac{1}{2}\sqrt{h}\,E(S_hU_h) + \frac{3}{8}h\,E(S_hU_h^2),$$
$$E(T_h^2) = 1 - \sqrt{h}\,E(S_h^2U_h) + h\,E(S_h^2U_h^2),$$
$$E(T_h^3) = E(S_h^3) - \frac{3}{2}\sqrt{h}\,E(S_h^3U_h) + \frac{15}{8}h\,E(S_h^3U_h^2),$$
$$E(T_h^4) = E(S_h^4) - 2\sqrt{h}\,E(S_h^4U_h) + 3h\,E(S_h^4U_h^2),$$
where we used $E(S_h) = 0$ and $E(S_h^2) = 1$. By Lemma S.3 in GM09, we have that
$$E(T_h) = \sqrt{h}\Bigl(-\frac{A_1}{2}\sigma_{6,4,h}\Bigr) + O(h^{3/2}),$$
$$E(T_h^2) = 1 + h\bigl[(C_1 - A_2)\sigma_{8,4,h} + C_2\sigma_{6,4,h}^2\bigr] + O(h^2),$$
$$E(T_h^3) = \sqrt{h}\Bigl(B_1 - \frac{3}{2}A_3\Bigr)\sigma_{6,4,h} + O(h^{3/2}),$$
$$E(T_h^4) = 3 + h\bigl[(B_2 - 2D_1 + 3E_1)\sigma_{8,4,h} + (3E_2 - 2D_2)\sigma_{6,4,h}^2\bigr] + O(h^2).$$
Thus $\kappa_1(T_h) = \sqrt{h}(-(A_1/2)\sigma_{6,4,h}) + O(h^{3/2}) = \sqrt{h}(-(A_1/2)\sigma_{6,4}) + O(h^{3/2})$, since under Assumption H, BNS (2004b) showed that $\sigma_h^q - \sigma^q = o(h^{1/2})$. This proves the first result. The remaining results follow similarly. Q.E.D.

PROOF OF THEOREM A.2: We follow the proof of Theorem A.1 and use Lemma S.7 in GM09 instead of Lemma S.3. The cumulant expansions follow by noting that $\tilde A_3 = 3\tilde A_1$ and $\tilde B_1 = \tilde A_1$. Q.E.D.

See the proof of Theorem A.1 and Remark 1 in GM09 for the proof of Theorem A.3.
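As a side note added here (not in the original proofs), the coefficients 3/8, 1, 15/8, and 3 multiplying the $hE(S_h^kU_h^2)$ terms in the proof of Theorem A.1 come from the binomial series
$$(1 + x)^{-k/2} = 1 - \frac{k}{2}x + \frac{k}{4}\Bigl(\frac{k}{2} + 1\Bigr)x^2 + O(x^3)$$
evaluated at $x = \sqrt{h}U_h$: for $k = 1, 2, 3, 4$ the second-order coefficient $\frac{k}{4}(\frac{k}{2}+1)$ equals $\frac{3}{8}$, $1$, $\frac{15}{8}$, and $3$, respectively.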
APPENDIX B: PROOFS OF RESULTS IN SECTIONS 3–5

PROOF OF THEOREM 3.1: Given that $T_{g,h} \stackrel{d}{\longrightarrow} N(0,1)$, it suffices that $T_{g,h}^* \stackrel{d^*}{\longrightarrow} N(0,1)$ in probability. We prove this for $g(z) = z$; the delta method implies the result for nonlinear g. The proof contains two steps: 1. show the desired result for $S_h^* \equiv \sqrt{h^{-1}}\,(R_2^* - E^*(R_2^*))/\sqrt{V^*}$; 2. show $\hat V^* \stackrel{P^*}{\longrightarrow} V^*$ in probability. We start with the i.i.d. bootstrap.

Step 1. Let $S_h^* = \sum_{i=1}^{1/h} z_i^*$, where $z_i^* \equiv (r_i^{*2} - E^*(r_i^{*2}))/\sqrt{hV^*}$ are (conditionally) i.i.d. with $E^*(z_i^*) = 0$ and $\mathrm{Var}^*(z_i^*) = h^2V^*/(hV^*) = h$, such that $\mathrm{Var}^*(\sum_{i=1}^{1/h} z_i^*) = 1$. Thus, by the Berry–Esseen bound, for some small $\varepsilon > 0$ and some constant K,
$$\sup_{x\in\mathbb{R}}\biggl|P^*\biggl(\sum_{i=1}^{1/h} z_i^* \leq x\biggr) - \Phi(x)\biggr| \leq K\sum_{i=1}^{1/h} E^*|z_i^*|^{2+\varepsilon},$$
which converges to zero in probability as $h \to 0$. We have
$$\sum_{i=1}^{1/h} E^*|z_i^*|^{2+\varepsilon} = h^{-1-(2+\varepsilon)/2}|V^*|^{-(2+\varepsilon)/2}E^*\bigl|r_1^{*2} - E^*|r_1^*|^2\bigr|^{2+\varepsilon}$$
$$\leq 2|V^*|^{-(2+\varepsilon)/2}h^{-1-(2+\varepsilon)/2}E^*|r_1^*|^{2(2+\varepsilon)} = 2|V^*|^{-(2+\varepsilon)/2}h^{\varepsilon/2}R_{2(2+\varepsilon)} = O_P(h^{\varepsilon/2}) = o_P(1),$$
since $V^* \stackrel{P}{\longrightarrow} 3\sigma^4 - (\sigma^2)^2 > 0$ and $R_{2(2+\varepsilon)} \stackrel{P}{\longrightarrow} \mu_{2(2+\varepsilon)}\sigma^{2(2+\varepsilon)} = O(1)$.

Step 2. Use Lemma S.5 in GM09 to show that $\mathrm{Bias}^*(\hat V^*) \stackrel{P}{\longrightarrow} 0$ and $\mathrm{Var}^*(\hat V^*) \stackrel{P}{\longrightarrow} 0$. The proof for the WB follows similarly. Q.E.D.
PROOF OF PROPOSITION 4.1: The results follow from the definition of $q_{1,g}(x)$ and $q_{1,g}^*(x)$ given the cumulant expansions in Theorems A.1, A.2, and A.3. Q.E.D.

The proof of Proposition 4.2 appears in GM09.

PROOF OF PROPOSITION 4.3: (a) We compute $\mathrm{plim}_{h\to 0}\,\kappa_{j,g,h}^*$ for $j = 1, 3$ using Theorem A.2 and the fact that $R_q \stackrel{P}{\longrightarrow} \mu_q\sigma^q$, as shown by BNGJS (2006). (b) Follows trivially when σ is constant because $(\sigma^q)^p = \sigma^{qp}$ for any $q, p > 0$. The proof of (c) appears in GM09. Q.E.D.

PROOF OF PROPOSITION 4.4: This follows from Theorems A.1 and A.3, given that $R_q \to \mu_q\sigma^q$ in probability for any $q > 0$, by BNGJS (2006). Q.E.D.

PROOF OF PROPOSITION 4.5: Let $\eta_i = a_1$ with probability p and let $\eta_i = a_2$ with probability $1 - p$. We can show that $a_1 = \frac{1}{5}\sqrt{31 + \sqrt{186}}$, $a_2 = -\frac{1}{5}\sqrt{31 - \sqrt{186}}$, and $p = \frac{1}{2} - \frac{3}{\sqrt{186}}$ solve $E(\eta_i^2) = a_1^2p + a_2^2(1-p) = 1$, $E(\eta_i^4) = a_1^4p + a_2^4(1-p) = \frac{31}{25}$, and $E(\eta_i^6) = a_1^6p + a_2^6(1-p) = \frac{31\cdot 37}{625}$. Q.E.D.
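As a verification added here (not part of the original proof), substituting $a_1^2 = (31 + \sqrt{186})/25$, $a_2^2 = (31 - \sqrt{186})/25$, and $2p - 1 = -6/\sqrt{186}$ gives
$$a_1^2p + a_2^2(1-p) = \frac{31 + \sqrt{186}\,(2p-1)}{25} = \frac{31 - 6}{25} = 1,$$
$$a_1^4p + a_2^4(1-p) = \frac{1147 + 62\sqrt{186}\,(2p-1)}{625} = \frac{1147 - 372}{625} = \frac{31}{25},$$
$$a_1^6p + a_2^6(1-p) = \frac{47089 + 3069\sqrt{186}\,(2p-1)}{15625} = \frac{47089 - 18414}{15625} = \frac{31\cdot 37}{625},$$
so the three moment conditions hold.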
REFERENCES ANDERSEN, T. G., AND T. BOLLERSLEV (1998): “Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts,” International Economic Review, 39, 885–905. [297] ANDERSEN, T. G., T. BOLLERSLEV, AND F. X. DIEBOLD (2002): “Parametric and Nonparametric Measurements of Volatility,” in Handbook of Financial Econometrics, ed. by Y. Aït-Sahalia and L. P. Hansen. Amsterdam: North Holland (forthcoming). [283] ANDERSEN, T. G., T. BOLLERSLEV, F. X. DIEBOLD, AND P. LABYS (2003): “Modeling and Forecasting Realized Volatility,” Econometrica, 71, 529–626. [284]
ANDERSEN, T. G., T. BOLLERSLEV, AND N. MEDDAHI (2005): “Correcting the Errors: Volatility Forecast Evaluation Using High-Frequency Data and Realized Volatilities,” Econometrica, 73, 279–296. [297] BANDI, F. M., AND J. R. RUSSELL (2008): “Microstructure Noise, Realized Variance, and Optimal Sampling,” Review of Economic Studies, 75, 339–369. [284] BARNDORFF-NIELSEN, O., AND N. SHEPHARD (2002): “Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models,” Journal of the Royal Statistical Society, Ser. B, 64, 253–280. [283,285,287] (2004a): “Econometric Analysis of Realized Covariation: High Frequency Based Covariance, Regression, and Correlation in Financial Economics,” Econometrica, 72, 885–925. [300] (2004b): “Power and Bipower Variation With Stochastic Volatility and Jumps,” Journal of Financial Econometrics, 2, 1–48. [303] (2005): “How Accurate Is the Asymptotic Approximation to the Distribution of Realised Volatility?” in Inference for Econometric Models. A Festschrift for Tom Rothenberg, ed. by D. W. K. Andrews and J. H. Stock. Cambridge, U.K.: Cambridge University Press, 306–331. [284,291] BARNDORFF-NIELSEN, O., S. E. GRAVERSEN, J. JACOD, AND N. SHEPHARD (2006): “Limit Theorems for Bipower Variation in Financial Econometrics,” Econometric Theory, 22, 677–719. [283,286-289,298,304] BARNDORFF-NIELSEN, O., P. R. HANSEN, A. LUNDE, AND N. SHEPHARD (2008): “Designing Realized Kernels to Measure the ex post Variation of Equity Prices in the Presence of Noise,” Econometrica, 76, 1481–1536. [284,297] DAVIDSON, R., AND E. FLACHAIRE (2001): “The Wild Bootstrap, Tamed at Last,” Working Paper Darp58, STICERD, London School of Economics. [285,289,292] DOVONON, P., S. GONÇALVES, AND N. MEDDAHI (2007): “Bootstrapping Realized Multivariate Volatility Measures,” Manuscript, Université de Montréal. [300] GONÇALVES, S., AND N. MEDDAHI (2007): “Box–Cox Transforms for Realized Volatility,” Manuscript, Université de Montréal. [291,295] (2008): “Edgeworth Corrections for Realized Volatility,” Econometric Reviews, 27, 139–162. [291] (2009): “Supplement to ‘Bootstrapping Realized Volatility’,” Econometrica Supplemental Material, 77, http://www.econometricsociety.org/ecta/Supmat/5971_Miscellaneous.pdf. [286] HALL, P. (1992): The Bootstrap and Edgeworth Expansion. New York: Springer Verlag. [290,302] HANSEN, P. R., AND A. LUNDE (2006): “Realized Variance and Market Microstructure” (With Discussion), Journal of Business and Economic Statistics, 24, 127–218. [284] HÄRDLE, W., J. HOROWITZ, AND J.-P. KREISS (2003): “Bootstrap Methods for Time Series,” International Statistical Review, 71, 435–459. [296] HUANG, X., AND G. TAUCHEN (2006): “The Relative Contribution of Jumps to Total Price Variance,” Journal of Financial Econometrics, 3, 456–499. [297] JACOD, J., AND P. PROTTER (1998): “Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations,” Annals of Probability, 26, 267–307. [287] LIU, R. Y. (1988): “Bootstrap Procedure Under Some Non-i.i.d. Models,” Annals of Statistics, 16, 1696–1708. [284,291] MAMMEN, E. (1993): “Bootstrap and Wild Bootstrap for High Dimensional Linear Models,” Annals of Statistics, 21, 255–285. [284,289] MEDDAHI, N. (2002): “A Theoretical Comparison Between Integrated and Realized Volatility,” Journal of Applied Econometrics, 17, 475–508. [289] SHAO, J., AND D. TU (1995): The Jackknife and Bootstrap. New York: Springer Verlag. [285,292] WU, C. F. J. 
(1986): “Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis,” Annals of Statistics, 14, 1261–1295. [284] ZHANG, L., P. MYKLAND, AND Y. AÏT-SAHALIA (2005a): “Edgeworth Expansions for Realized Volatility and Related Estimators,” Working Paper, Princeton University. [291]
(2005b): “A Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data,” Journal of the American Statistical Association, 100, 1394–1411. [284]
Département de Sciences Économiques, CIREQ and CIRANO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada; [email protected] and Finance and Accounting Group, Tanaka Business School, Imperial College London, Exhibition Road, London SW7 2AZ, U.K.; [email protected]. Manuscript received July, 2005; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 307–316
CHARACTERIZATION OF REVENUE EQUIVALENCE BY BIRGIT HEYDENREICH,1 RUDOLF MÜLLER, MARC UETZ, AND RAKESH V. VOHRA2 The property of an allocation rule to be implementable in dominant strategies by a unique payment scheme is called revenue equivalence. We give a characterization of revenue equivalence based on a graph theoretic interpretation of the incentive compatibility constraints. The characterization holds for any (possibly infinite) outcome space and many of the known results are immediate consequences. Moreover, revenue equivalence can be identified in cases where existing theorems are silent. KEYWORDS: Revenue equivalence, mechanism design, incentive compatibility.
1. INTRODUCTION ONE OF THE MOST IMPORTANT RESULTS of auction theory is the revenue equivalence theorem. Subject to certain reasonable assumptions, it concludes that a variety of different auctions generate the same expected revenue for the seller. Klemperer (1999) wrote that “much of auction theory can be understood in terms of this theorem. . . ”; hence the long line of papers that have attempted to relax the sufficient conditions under which revenue equivalence holds. The present paper provides necessary and sufficient conditions on the underlying primitives for revenue equivalence to hold. We consider direct revelation mechanisms for agents with multidimensional types. Such mechanisms consist of an allocation rule and a payment scheme. The allocation rule selects an outcome depending on the agents’ reported types, whereas the payment scheme assigns a payment to every agent. We focus attention on allocation rules that are implementable in dominant strategies.3 Call such rules implementable. In this environment we characterize the uniqueness of the relevant payment scheme in terms of conditions that are easily verified in potential applications. The property of an allocation rule to be implementable in dominant strategies by a unique payment scheme is called revenue equivalence. Our characterization of revenue equivalence is based on a graph theoretic interpretation of the incentive compatibility constraints. This interpretation has been used before by Rochet (1987), Gui, Müller, and Vohra (2004), Saks and Yu (2005), and Müller, Perea, and Wolf (2007) to identify allocation rules that are implementable in dominant strategies or in Bayes–Nash. The characterization holds for any (possibly infinite) outcome space. 1 The first author is supported by NWO Grant 2004/03545/MaGW, “Local Decisions in Decentralised Planning Environments.” 2 We thank the editor and three anonymous referees for very useful comments and suggestions. We also thank participants of Dagstuhl Seminars 07271 and 07471 for their valuable feedback. 3 With appropriate adjustments, our characterization of revenue equivalence holds for ex post as well as Bayes–Nash incentive compatibility.
© 2009 The Econometric Society
DOI: 10.3982/ECTA7168
The bulk of prior work on revenue equivalence (Green and Laffont (1977), Holmström (1979), Myerson (1981), Krishna and Maenner (2001), Milgrom and Segal (2002)) identifies sufficient conditions on the type space that ensure all allocation rules from a given class satisfy revenue equivalence. We know of only two papers that identify necessary as well as sufficient conditions—that is, characterizing conditions—for revenue equivalence to hold. If the outcome space is finite, Suijs (1996) characterized type spaces and valuation functions for which utilitarian maximizers satisfy revenue equivalence. Chung and Olszewski (2007) characterized type spaces and valuation functions for which every implementable allocation rule satisfies revenue equivalence, again under the assumption of a finite outcome space. Our characterization identifies, for general outcome spaces, a joint condition on the type space, valuation function, and implementable allocation rule that characterizes revenue equivalence. Given agents' type spaces and valuation functions, several allocation rules may be implementable in dominant strategies, some of which satisfy revenue equivalence and some of which do not. In such cases, all previous results have no bite. However, our characterization can be used to determine which of the allocation rules do satisfy revenue equivalence.

The remainder of the paper is organized as follows. In Section 2, we introduce notation and basic definitions. In Section 3, we derive our graph theoretic characterization of revenue equivalence. In Section 4, we briefly discuss how our characterization applies in various settings.

2. SETTING AND BASIC CONCEPTS

Denote by $\{1, \ldots, n\}$ the set of agents and let A be the set of possible outcomes. Outcome space A is allowed to have infinitely many, even uncountably many, elements. Denote the type of agent $i \in \{1, \ldots, n\}$ by $t_i$. Let $T_i$ be the type space of agent i. Type spaces $T_i$ can be arbitrary sets. Agent i's preferences over outcomes are modeled by the valuation function $v_i: A \times T_i \to \mathbb{R}$, where $v_i(a, t_i)$ is the valuation of agent i for outcome a when he has type $t_i$. A mechanism $(f, \pi)$ consists of an allocation rule $f: \prod_{i=1}^n T_i \to A$ and a payment scheme $\pi: \prod_{i=1}^n T_i \to \mathbb{R}^n$. In a direct revelation mechanism, the allocation rule chooses for a vector t of aggregate type reports of all agents an outcome f(t), whereas the payment scheme assigns a payment $\pi_i(t)$ to each agent i. Let the vector $(t_i, t_{-i})$ denote the aggregate type report vector when i reports $t_i$ and the other agents' reports are represented by $t_{-i}$. We assume quasi-linear utilities, that is, the utility of agent i when the aggregate report vector is $(t_i, t_{-i})$ is $v_i(f(t_i, t_{-i}), t_i) - \pi_i(t_i, t_{-i})$.

DEFINITION 1—Dominant Strategy Incentive Compatible: A direct revelation mechanism $(f, \pi)$ is called dominant strategy incentive compatible if for
every agent i, every type $t_i \in T_i$, all aggregate type vectors $t_{-i}$ that the other agents could report, and every type $s_i \in T_i$ that i could report instead of $t_i$,
$$v_i(f(t_i, t_{-i}), t_i) - \pi_i(t_i, t_{-i}) \geq v_i(f(s_i, t_{-i}), t_i) - \pi_i(s_i, t_{-i}).$$
If for allocation rule f there exists a payment scheme π such that $(f, \pi)$ is a dominant strategy incentive compatible mechanism, then f is called implementable in dominant strategies, in short implementable. In this paper we assume that the allocation rule is implementable in dominant strategies and study the uniqueness of the corresponding payment scheme. We refer to the latter as revenue equivalence.⁴

DEFINITION 2—Revenue Equivalence: An allocation rule f, implementable in dominant strategies, satisfies the revenue equivalence property if for any two dominant strategy incentive compatible mechanisms $(f, \pi)$ and $(f, \pi')$ and any agent i there exists a function $h_i$ that only depends on the reported types of the other agents $t_{-i}$ such that $\forall t_i \in T_i$:
$$\pi_i'(t_i, t_{-i}) = \pi_i(t_i, t_{-i}) + h_i(t_{-i}).$$
3. CHARACTERIZATION OF REVENUE EQUIVALENCE

We give a necessary and sufficient condition for revenue equivalence with the aid of a graph theoretic interpretation used by Rochet (1987), Gui, Müller, and Vohra (2004), and Saks and Yu (2005) to characterize implementable allocation rules. We also adopt some of their notation.

Fix agent i and the reports, $t_{-i}$, of the other agents. For simplicity of notation we write T and v instead of $T_i$ and $v_i$. Similarly, for any mechanism $(f, \pi)$, we regard f and π as functions of i's type alone, that is, $f: T \to A$ and $\pi: T \to \mathbb{R}$. If $(f, \pi)$ is dominant strategy incentive compatible, it is easy to see that for any pair of types $s, t \in T$ such that $f(t) = f(s) = a$ for some $a \in A$, the payments must be equal, that is, $\pi(t) = \pi(s) =: \pi_a$. Hence, the payment of agent i is completely defined if the numbers $\pi_a$ are defined for all outcomes $a \in A$ such that $f^{-1}(a)$ is nonempty. For ease of notation, we let A denote the set of outcomes that can be achieved for some report of agent i.

For an allocation rule f, let us define two different kinds of graphs. The type graph $T_f$ has node set T and contains an arc from any node s to any other node t of length⁵
$$\ell_{st} = v(f(t), t) - v(f(s), t).$$

⁴We choose the term revenue equivalence in accordance with Krishna (2002). In our setting it is equivalent to payoff equivalence as used in Krishna and Maenner (2001). See Milgrom (2004, Section 4.3.1) for settings where it is not equivalent.
Here $\ell_{st}$ represents the gain in valuation for an agent truthfully reporting type t instead of lying type s. This could be positive or negative. The allocation graph $G_f$ has node set A. Between any two nodes $a, b \in A$, there is a directed arc with length⁵
$$\ell_{ab} = \inf_{t \in f^{-1}(b)}\bigl(v(b, t) - v(a, t)\bigr).$$
The arc lengths $\ell_{ab}$ in the allocation graph represent the least gain in valuation for an agent with any type $t \in f^{-1}(b)$ for reporting truthfully, instead of misreporting so as to get outcome a (instead of b). The type graph and allocation graph are complete, directed, and possibly infinite graphs.⁶ We introduce our main results in terms of allocation graphs. Analogous results hold for type graphs as well.

A path from node a to node b in $G_f$, or the (a, b) path for short, is defined as $P = (a = a_0, a_1, \ldots, a_k = b)$ such that $a_i \in A$ for $i = 0, \ldots, k$. Denote by length(P) the length of this path. A cycle is a path with $a = b$. For any a, we regard the path from a to a without any arcs as an (a, a) path and define its length to be 0. Define $\mathcal{P}(a, b)$ to be the set of all (a, b) paths.

DEFINITION 3—Node Potential: A node potential p is a function $p: A \to \mathbb{R}$ such that for all $x, y \in A$, $p(y) \leq p(x) + \ell_{xy}$.

OBSERVATION 1: Let f be an allocation rule. Payment schemes π such that $(f, \pi)$ is a dominant strategy incentive compatible mechanism exactly correspond to node potentials in each of the allocation graphs $G_f$ that are obtained from a combination of an agent and a report vector of the other agents.

PROOF: Assume f is implementable. Fix agent i and the reports $t_{-i}$ of the other agents. Consider the corresponding allocation graph $G_f$. For any pair of types $s, t \in T$ such that $f(t) = f(s) = a$ for some $a \in A$, the payments must be equal, that is, $\pi(t) = \pi(s) = \pi_a$. Therefore, π assigns a real number to every node in the graph. Incentive compatibility implies for any two outcomes $a, b \in A$ and all $t \in f^{-1}(b)$ that $v(b, t) - \pi_b \geq v(a, t) - \pi_a$; hence, $\pi_b \leq \pi_a + \ell_{ab}$. For the other direction, define the payment π for agent i as follows. For any report vector of the other agents $t_{-i}$, consider the corresponding allocation graph $G_f$ and fix a node potential p. At aggregate report vector $(t_i, t_{-i})$ with outcome $a = f(t_i, t_{-i})$, let the payment be $\pi_a := p(a)$. Incentive compatibility

⁵We assume that arc lengths are strictly larger than −∞. For allocation rules implementable in dominant strategies, this is no restriction, as the incentive compatibility constraints imply finiteness of the arc lengths.
⁶Clearly, type and allocation graphs depend on the agent i and reports $t_{-i}$ of the other agents. To keep the notation simple, we suppress the dependence on i and $t_{-i}$, and will simply write $T_f$ and $G_f$.
now follows from the fact that p is a node potential in $G_f$, similarly to the above. Q.E.D.

Let
$$\mathrm{dist}_{G_f}(a,b) = \inf_{P\in\mathcal{P}(a,b)} \mathrm{length}(P).$$
In general, $\mathrm{dist}_{G_f}(a,b)$ could be unbounded. However, if $G_f$ does not contain a negative cycle (the nonnegative cycle property), then $\mathrm{dist}_{G_f}(a,b)$ is finite, since the length of any (a, b) path is lower bounded by $-\ell_{ba}$.

LEMMA 1: Fix an agent and some report vector of the other agents. The corresponding allocation graph $G_f$ has a node potential if and only if it satisfies the nonnegative cycle property.

PROOF: Proofs can be found, for example, in Schrijver (2003) for finite A and in Rochet (1987) for infinite A. For completeness, we give a simple proof. If $G_f$ has no negative cycle, then for any $a \in A$, $\mathrm{dist}_{G_f}(a, \cdot)$ is well defined, that is, takes only finite values. The distances $\mathrm{dist}_{G_f}(a, \cdot)$ define a node potential, because $\mathrm{dist}_{G_f}(a, x) \leq \mathrm{dist}_{G_f}(a, y) + \ell_{yx}$ for all $x, y \in A$. On the other hand, given a node potential p, add up the inequalities $p(y) - p(x) \leq \ell_{xy}$ for all arcs (x, y) on a cycle to prove that the cycle has nonnegative length. Q.E.D.

Observation 1 together with Lemma 1 yields a characterization of allocation rules that are implementable in dominant strategies (see also, e.g., Rochet (1987)).

OBSERVATION 2: The allocation rule f is implementable in dominant strategies if and only if all allocation graphs $G_f$ obtained from a combination of an agent and a report vector of the other agents satisfy the nonnegative cycle property.

From Lemma 1 and Observations 1 and 2 it follows that for any allocation rule f implementable in dominant strategies, there exist node potentials in all allocation graphs $G_f$. The allocation rule f satisfies revenue equivalence if and only if in each allocation graph $G_f$, the node potential is uniquely defined up to a constant. Our main result states that this is the case if and only if distances are antisymmetric in every $G_f$.

THEOREM 1—Characterization of Revenue Equivalence: Let f be an allocation rule implementable in dominant strategies. f satisfies revenue equivalence if and only if in all allocation graphs $G_f$ obtained from a combination of an agent and a report vector of the other agents, distances are antisymmetric, that is, $\mathrm{dist}_{G_f}(a,b) = -\mathrm{dist}_{G_f}(b,a)$ for all $a, b \in A$.
PROOF: Suppose first that f satisfies revenue equivalence. Fix a combination of an agent and a report vector of the other agents, and let $G_f$ be the corresponding allocation graph. Let $a, b \in A$. The functions $\mathrm{dist}_{G_f}(a, \cdot)$ and $\mathrm{dist}_{G_f}(b, \cdot)$ are node potentials in $G_f$. As any two node potentials differ only by a constant, we have that $\mathrm{dist}_{G_f}(a, \cdot) - \mathrm{dist}_{G_f}(b, \cdot)$ is a constant function. In particular, for a and b we get that $\mathrm{dist}_{G_f}(a,a) - \mathrm{dist}_{G_f}(b,a) = \mathrm{dist}_{G_f}(a,b) - \mathrm{dist}_{G_f}(b,b)$. Clearly, $\mathrm{dist}_{G_f}(a,a) = \mathrm{dist}_{G_f}(b,b) = 0$ and hence $\mathrm{dist}_{G_f}(a,b) = -\mathrm{dist}_{G_f}(b,a)$.

Now suppose that $\mathrm{dist}_{G_f}(a,b) = -\mathrm{dist}_{G_f}(b,a)$ for all $a, b \in A$. Let $a, b \in A$. Let $P_{ab}$ be an (a, b) path with nodes $a = a_0, a_1, \ldots, a_k = b$. For any node potential p, add up the inequalities $p(a_i) - p(a_{i-1}) \leq \ell_{a_{i-1}a_i}$ for $i = 1, \ldots, k$. This yields $p(b) - p(a) \leq \mathrm{length}(P_{ab})$. Therefore,
$$p(b) - p(a) \leq \inf_{P\in\mathcal{P}(a,b)} \mathrm{length}(P) = \mathrm{dist}_{G_f}(a,b).$$
Similarly, $p(a) - p(b) \leq \mathrm{dist}_{G_f}(b,a)$. Therefore, $-\mathrm{dist}_{G_f}(b,a) \leq p(b) - p(a) \leq \mathrm{dist}_{G_f}(a,b)$. Since $\mathrm{dist}_{G_f}(a,b) = -\mathrm{dist}_{G_f}(b,a)$, $p(b) - p(a) = \mathrm{dist}_{G_f}(a,b)$ for any node potential p. Hence, any potential is completely defined, once p(a) has been chosen for some outcome a. Thus, any two node potentials can only differ by a constant and f satisfies revenue equivalence. Q.E.D.

An analogous characterization holds for type graphs as well. One can check that all previous arguments still apply when using type graphs. On the other hand, note the following relation of node potentials in $G_f$ and node potentials in $T_f$. Given a node potential $p^G$ in $G_f$, we can define a node potential $p^T$ in $T_f$ by letting $p^T(t) := p^G(f(t))$ for any $t \in T$. In fact, let $\ell^G$ and $\ell^T$ denote the arc lengths in $G_f$ and $T_f$, respectively, and observe that $\ell^G_{ab} = \inf\{\ell^T_{st} \mid s \in f^{-1}(a),\, t \in f^{-1}(b)\}$. Then, for any $s, t \in T$, $p^T(t) = p^G(f(t)) \leq p^G(f(s)) + \ell^G_{f(s)f(t)} \leq p^T(s) + \ell^T_{st}$, and $p^T$ is a node potential. On the other hand, given a node potential $p^T$ in $T_f$, let $p^G(a) := p^T(s)$ for any $s \in f^{-1}(a)$. Note that $p^G$ is well defined as $f(s) = f(t) = a$ implies $\ell^T_{st} = 0$ and hence $p^T(s) = p^T(t)$. Furthermore, for any $a, b \in A$ and any $s \in f^{-1}(a)$, $t \in f^{-1}(b)$, $p^G(a) = p^T(s) \leq p^T(t) + \ell^T_{ts} = p^G(b) + \ell^T_{ts}$. Hence, $p^G(a) \leq p^G(b) + \ell^G_{ba}$ and $p^G$ is a node potential in $G_f$. Consequently, there is a one-to-one relationship between node potentials in $G_f$ and node potentials in $T_f$. This insight together with a proof similar to the one of Theorem 1 yields the following corollary.

COROLLARY 1—Characterization of Revenue Equivalence on Type Graphs: Let f be an allocation rule that is implementable in dominant strategies. Then f satisfies revenue equivalence if and only if in all type graphs $T_f$ obtained from a combination of an agent and a report vector of the other agents, distances are antisymmetric, that is, $\mathrm{dist}_{T_f}(s,t) = -\mathrm{dist}_{T_f}(t,s)$ for all $s, t \in T$.
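To make these conditions easy to experiment with, here is a small sketch (ours; the finite grid, the function names, and the use of Floyd–Warshall are illustrative assumptions, since the paper allows infinite outcome and type spaces) that builds the allocation graph of a single agent, checks the nonnegative cycle property of Observation 2, and checks the antisymmetry of distances from Theorem 1:

```python
# Illustrative check of Observation 2 and Theorem 1 on a finite example.
# Fix one agent and the reports of the others; f maps this agent's type to an
# outcome and v(a, t) is the agent's valuation. All names here are our own.
import itertools
import math

def allocation_graph(types, outcomes, f, v):
    """Arc lengths l[a][b] = min over t with f(t) = b of v(b, t) - v(a, t)."""
    l = {a: {b: math.inf for b in outcomes} for a in outcomes}
    for a, b in itertools.product(outcomes, repeat=2):
        cands = [v(b, t) - v(a, t) for t in types if f(t) == b]
        if cands:
            l[a][b] = min(cands)
    return l

def shortest_distances(l, outcomes):
    """Floyd-Warshall all-pairs shortest path lengths in the allocation graph."""
    d = {a: dict(l[a]) for a in outcomes}
    for a in outcomes:
        d[a][a] = min(0.0, d[a][a])       # the empty (a, a) path has length 0
    for k in outcomes:
        for a in outcomes:
            for b in outcomes:
                if d[a][k] + d[k][b] < d[a][b]:
                    d[a][b] = d[a][k] + d[k][b]
    return d

def implementable_and_revenue_equivalent(types, outcomes, f, v, tol=1e-9):
    l = allocation_graph(types, outcomes, f, v)
    d = shortest_distances(l, outcomes)
    reached = [a for a in outcomes if any(f(t) == a for t in types)]
    no_negative_cycle = all(d[a][a] >= -tol for a in reached)          # Observation 2
    antisymmetric = all(abs(d[a][b] + d[b][a]) <= tol                  # Theorem 1
                        for a in reached for b in reached
                        if d[a][b] < math.inf and d[b][a] < math.inf)
    return no_negative_cycle, no_negative_cycle and antisymmetric

# Toy example: two outcomes, linear valuations; f assigns outcome "hi" to high types.
types = [0.0, 0.25, 0.5, 0.75, 1.0]
outcomes = ["lo", "hi"]
value = {"lo": 0.0, "hi": 1.0}
f = lambda t: "hi" if t >= 0.5 else "lo"
v = lambda a, t: value[a] * t
print(implementable_and_revenue_equivalent(types, outcomes, f, v))
```

For the toy threshold rule at the bottom of the sketch this prints (True, False): the rule is implementable, but on a finite type grid the distances between the two outcomes in the two directions do not sum to zero, so the payment difference between the outcomes is pinned down only up to the gap between the grid points 0.25 and 0.5, and revenue equivalence fails on that grid.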
4. DISCUSSION

In settings with multidimensional type spaces, finite A, and valuation functions that are linear in types, implementability implies that the allocation rule f, viewed from a single agent perspective as a vector field that maps multidimensional types on lotteries over outcomes, has a potential function F. One can easily verify that this property has the following interpretation on type graphs: the length of a shortest path in $T_f$ from some type s to some type t is upper bounded by a path integral of the vector field f or, equivalently, $F(t) - F(s)$. From this it follows easily that $\mathrm{dist}_{T_f}(s,t) = -\mathrm{dist}_{T_f}(t,s)$, that is, revenue equivalence holds. In particular, $\mathrm{dist}_{T_f}(s,t) = F(t) - F(s)$ for any potential function F. This connection between implementability, potential functions, and revenue equivalence is also established in Jehiel, Moldovanu, and Stacchetti (1996, 1999), Jehiel and Moldovanu (2001), and Krishna and Maenner (2001).

It is interesting to compare our result with the characterization by Chung and Olszewski (2007). First, we introduce the notation used by Chung and Olszewski (2007) and restate their characterization theorem. Let A be countable. As before, regard everything from the perspective of a single agent. Let $A_1$, $A_2$ be disjoint subsets of A and let $r: A_1 \cup A_2 \to \mathbb{R}$. For every $\varepsilon > 0$, let
$$T_1(\varepsilon) = \bigcup_{a_1 \in A_1}\{t \in T \mid \forall a_2 \in A_2: v(a_1, t) - v(a_2, t) > r(a_1) - r(a_2) + \varepsilon\}$$
and
$$T_2(\varepsilon) = \bigcup_{a_2 \in A_2}\{t \in T \mid \forall a_1 \in A_1: v(a_1, t) - v(a_2, t) < r(a_1) - r(a_2) - \varepsilon\}.$$
Finally, let $T_i = \bigcap_{\varepsilon > 0} T_i(\varepsilon)$, $i = 1, 2$. Observe that $T_1 \cap T_2 = \emptyset$. Call the type space T splittable if there are $A_1$, $A_2$, and r such that $T = T_1 \cup T_2$ and $T_i \neq \emptyset$ for $i = 1, 2$.

THEOREM 2—Chung and Olszewski (2007): If A is finite, the following two statements are equivalent.
(i) All f that are implementable in dominant strategies satisfy revenue equivalence.
(ii) For all agents, $T_i$ is not splittable.
If A is not finite, but countable, (ii) implies (i).

To see the connection between the allocation graph defined in Section 3 and the notion of splittable, we outline a proof that (ii) ⇒ (i). Suppose there is an allocation rule f, implementable in dominant strategies, that fails revenue equivalence. Since f is implementable, the allocation graphs satisfy the nonnegative cycle property. Since revenue equivalence is violated, Theorem 1 implies an agent
i and reports of the other agents $t_{-i}$ such that in the corresponding allocation graph $G_f$, $\mathrm{dist}_{G_f}(a^*, b^*) + \mathrm{dist}_{G_f}(b^*, a^*) > 0$ for some $a^*, b^* \in A$. Assume the perspective of agent i. We show that this implies that $T_i$ is splittable. Define $d(a) = \mathrm{dist}_{G_f}(a^*, a) + \mathrm{dist}_{G_f}(a, a^*)$ for all $a \in A$. Since the function d takes only countably many values, there exists $z \in \mathbb{R}$ such that the following sets form a nontrivial partition of A: $A_1 = \{a \in A \mid d(a) > z\}$, $A_2 = \{a \in A \mid d(a) < z\}$. Observe that for every $a_1 \in A_1$, there exists $\varepsilon(a_1) > 0$ such that $d(a_1) > z + \varepsilon(a_1)$. Similarly, for every $a_2 \in A_2$, there exists $\varepsilon(a_2) > 0$ such that $d(a_2) < z - \varepsilon(a_2)$. It is now straightforward to verify that the sets $A_i$ "split" the type space.

Notice that in Theorem 1 no assumption on the cardinality of A is made, whereas in Theorem 2, A is assumed finite or countable. On the other hand, Theorem 1 imposes a condition on the allocation rule, whereas Theorem 2 characterizes T and v such that all allocation rules that are implementable in dominant strategies satisfy revenue equivalence. The principal difference between these settings is illustrated by the following example.

A principal has one unit of a perfectly divisible good to be distributed among n agents. The type of agent i is his demand $t_i \in (0,1]$. Given the reports $t \in (0,1]^n$ of all agents, an allocation rule $f: (0,1]^n \to [0,1]^n$ assigns a fraction of the good to every agent such that $\sum_{i=1}^n f_i(t) \leq 1$. If an agent's demand is met, he incurs a disutility of 0; otherwise, his disutility is linear in the amount of unmet demand. More precisely, agent i's valuation if he is assigned quantity $q_i$ is
$$v_i(q_i, t_i) = \begin{cases} 0 & \text{if } q_i \geq t_i, \\ q_i - t_i & \text{if } q_i < t_i. \end{cases}$$
In this context, payments are reimbursements from the principal for unmet demand. This valuation function appears in Holmström (1979) as an example to demonstrate that his smooth path-connectedness assumption cannot be weakened. Likewise, the example can be used to show that the convexity assumption of the valuation function in Krishna and Maenner (2001) cannot be relaxed. Call an allocation rule f dictatorial if there is an agent i who always gets precisely his demanded quantity, $f_i(t_i, t_{-i}) = t_i$ for all $t_{-i}$. This rule is implementable and, as shown in Holmström (1979), fails revenue equivalence. However, there are implementable rules in this setting that satisfy revenue equivalence:

THEOREM 3: For the demand rationing problem, the proportional allocation rule f with $f_i(t) = t_i/\sum_{j=1}^n t_j$ for $i = 1, \ldots, n$ is implementable and satisfies revenue equivalence.

The proof uses the type graph for any agent i and any fixed report $t_{-i}$ of other agents, and verifies implementability by using Lemma 1. A fixed point
argument is used to show that the distance function is antisymmetric. Thus revenue equivalence holds due to Theorem 1.7 As this setting of T and v allows for allocation rules that satisfy revenue equivalence as well as for rules that do not, any theorem that describes sufficient conditions for all implementable f to satisfy revenue equivalence is necessarily silent. REFERENCES CHUNG, K.-S., AND W. OLSZEWSKI (2007): “A Non-Differentiable Approach to Revenue Equivalence,” Theoretical Economics, 2, 469–487. [308,313] GREEN, J., AND J.-J. LAFFONT (1977): “Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public Goods,” Econometrica, 45, 427–438. [308] GUI, H., R. MÜLLER, AND R. VOHRA (2004): “Dominant Strategy Mechanisms With Multidimensional Types,” Discussion Paper 1392, The Center for Mathematical Studies in Economics and Management Sciences, Northwestern University, Evanston, IL. [307,309] HEYDENREICH, B., R. MÜLLER, M. UETZ, AND R. VOHRA (2008): “Characterization of Revenue Equivalence,” Meteor Research Memorandum RM/08/01, Maastricht University. [315] HOLMSTRÖM, B. (1979): “Groves’ Scheme on Restricted Domains,” Econometrica, 47, 1137–1144. [308,314] JEHIEL, P., AND B. MOLDOVANU (2001): “Efficient Design With Interdependent Valuations,” Econometrica, 69, 1237–1259. [313] JEHIEL, P., B. MOLDOVANU, AND E. STACCHETTI (1996): “How (Not) to Sell Nuclear Weapons,” American Economic Review, 86, 814–829. [313] (1999): “Multidimensional Mechanism Design for Auctions With Externalities,” Journal of Economic Theory, 85, 258–293. [313] KLEMPERER, P. (1999): “Auction Theory: A Guide to the Literature,” Journal of Economic Surveys, 13, 227–286. [307] KRISHNA, V. (2002): Auction Theory. San Diego, CA: Academic Press. [309] KRISHNA, V., AND E. MAENNER (2001): “Convex Potentials With an Application to Mechanism Design,” Econometrica, 69, 1113–1119. [308,309,313,314] MILGROM, P. (2004): Putting Auction Theory to Work. New York: Cambridge University Press. [309] MILGROM, P., AND I. SEGAL (2002): “Envelope Theorems for Arbitrary Choice Sets,” Econometrica, 70, 583–601. [308] MÜLLER, R., A. PEREA, AND S. WOLF (2007): “Weak Monotonicity and Bayes–Nash Incentive Compatibility,” Games and Economic Behavior, 61, 344–358. [307] MYERSON, R. (1981): “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73. [308] ROCHET, J.-C. (1987): “A Necessary and Sufficient Condition for Rationalizability in a QuasiLinear Context,” Journal of Mathematical Economics, 16, 191–200. [307,309,311] SAKS, M., AND L. YU (2005): “Weak Monotonicity Suffices for Truthfulness on Convex Domains,” in Proceedings of the 6th ACM Conference on Electronic Commerce. New York: Association for Computing Machinery, 286–293. [307,309] SCHRIJVER, A. (2003): Combinatorial Optimization, Vol. A. Berlin: Springer. [311] SUIJS, J. (1996): “On Incentive Compatibility and Budget Balancedness in Public Decision Making,” Economic Design, 2, 193–209. [308] 7
A complete proof can be found in Heydenreich, Müller, Uetz, and Vohra (2008).
Dept. of Quantitative Economics, Maastricht University, P.O. Box 616, 7500 AE Enschede, The Netherlands; [email protected], Dept. of Quantitative Economics, Maastricht University, P.O. Box 616, 7500 AE Enschede, The Netherlands; [email protected], Dept. of Applied Mathematics, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands; [email protected], and Dept. of Managerial Economics and Decision Sciences, Kellogg Graduate School of Management, Northwestern University, Evanston, IL 60208, U.S.A.; [email protected]. Manuscript received May, 2007; final revision received July, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 317–318
CORRIGENDUM TO "EXISTENCE AND UNIQUENESS OF SOLUTIONS TO THE BELLMAN EQUATION IN THE UNBOUNDED CASE"
ECONOMETRICA, VOL. 71, NO. 5 (SEPTEMBER, 2003), 1519–1555

BY JUAN PABLO RINCÓN-ZAPATERO AND CARLOS RODRÍGUEZ-PALMERO

WE THANK JANUSZ MATKOWSKI AND ANDRZEJ S. NOWAK for pointing out an error in Proposition 1 of our paper (Rincón-Zapatero and Rodríguez-Palmero (2003)) and providing a counterexample.¹ In the aforementioned result, the metric
$$d(f,g) = \sum_{j=1}^{\infty} 2^{-j}\,\frac{d_j(f,g)}{1 + d_j(f,g)}$$
considered is not a contraction on A, a bounded subset of C(X). However, a slight modification of d restores the result once the condition $\sup_{j\geq 1}\beta_j = \beta < 1$ is imposed. This last condition is harmless in our setting, since all the applications we give satisfy this assumption. Given a bounded set A with $\sup_{f,g\in A} d_j(f,g) \leq m_j$ (without loss of generality it can be considered $m_j > 0$ for each j), consider the metric
$$d_A(f,g) = \sum_{j=1}^{\infty} 2^{-j}\,\frac{d_j(f,g)}{m_j + d_j(f,g)}.$$
This metric turns A into a complete metric space. The correct statement of Proposition 1 is then as follows.

PROPOSITION 1: Let $T: C(X) \to C(X)$ be a 0-local contraction relative to A, a bounded subset of C(X) with bounds $\{m_j\}$, such that $\sup_j \beta_j = \beta < 1$. Then there exists $\alpha \in [0,1)$ such that
$$d_A(Tf, Tg) \leq \alpha\, d_A(f,g) \qquad \text{for all } f, g \in A.$$
PROOF: For $f, g \in A$, it follows that
$$d_A(Tf, Tg) \leq \sum_{j=1}^{\infty} 2^{-j}\,\frac{\beta_j d_j(f,g)}{m_j + \beta_j d_j(f,g)} \leq \sum_{j=1}^{\infty} 2^{-j} a_j\,\frac{d_j(f,g)}{m_j + d_j(f,g)},$$
1 Matkowski and Nowak (2008) applied our concept of k-local contraction to stochastic dynamic programming.
where $a_j = \beta_j(m_j + d_j(f,g))/(m_j + \beta_j d_j(f,g))$. Obviously, $a_j \leq 2\beta_j/(1+\beta_j)$ for each j, since A is bounded and $(m_j + x)/(m_j + \beta_j x)$ is increasing with respect to x. Thus, $a_j \leq 2\beta/(1+\beta) < 1$ for each j, and hence $d_A(Tf, Tg) \leq \alpha\, d_A(f,g)$ for $\alpha = 2\beta/(1+\beta)$. Q.E.D.

In the reading of the proofs of Theorems 1, 3, and 6 and Proposition 3, the metric d should be replaced by the metric $d_A$ for the results to hold. None of the remaining results is affected.

¹Matkowski and Nowak (2008) applied our concept of k-local contraction to stochastic dynamic programming.

REFERENCES

MATKOWSKI, J., AND A. S. NOWAK (2008): "On Discounted Dynamic Programming With Unbounded Returns," Unpublished Manuscript, Department of Mathematics, Computer Science, and Econometrics, University of Zielona Góra. [317]

RINCÓN-ZAPATERO, J. P., AND C. RODRÍGUEZ-PALMERO (2003): "Existence and Uniqueness of Solution to the Bellman Equation in the Unbounded Case," Econometrica, 71, 1519–1555. [317]
Departamento de Economía, Universidad Carlos III de Madrid, C/Madrid 126, E-28903 Getafe, Madrid, Spain; [email protected] and Departamento de Economía Aplicada, Universidad de Valladolid, Avda de Valle Esgueva 6, E-47011 Valladolid, Spain; [email protected]. Manuscript received March, 2008; final revision received June, 2008.
Econometrica, Vol. 77, No. 1 (January, 2009), 319–323
ANNOUNCEMENTS 2009 NORTH AMERICAN WINTER MEETING
THE 2009 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in San Francisco, CA, from January 3 to 5, 2009, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. It is hoped that the research presented will represent a broad spectrum of applied and theoretical economics and econometrics. The program committee will be chaired by Steven Durlauf of University of Wisconsin–Madison. Details on registration, housing and the preliminary program are now available on the ASSA Meeting website at http://www.vanderbilt.edu/AEA/Annual_ Meeting/index.htm. The 2009 North American Winter Meeting will feature the following Invited Lectures: January 3, 10:15 am DECISION MAKING IN COMPLEX ENVIRONMENTS Presiding: NOAH WILLIAMS, University of Wisconsin CHARLES F. MANSKI, Northwestern University THOMAS J. SARGENT, New York University and Hoover Institution, Stanford University January 3, 2:30 pm PREFERENCES AND DECISION THEORY Presiding: PHILIP J. RENY, University of Chicago LAWRENCE E. BLUME, Cornell University CHRIS SHANNON, University of California–Berkeley January 4, 10:15 am EMPIRICAL MICROECONOMICS Presiding: PETRA ELISABETH TODD, University of Pennsylvania MOSHE BUCHINSKY, University of California–Los Angeles CHRISTOPHER R. TABER, University of Wisconsin–Madison January 5, 10:15 am THE ECONOMICS AND PSYCHOLOGY OF INEQUALITY AND HUMAN DEVELOPMENT
Presiding: STEVEN N. DURLAUF, University of Wisconsin–Madison DENNIS EPPLE, Carnegie Mellon University JAMES J. HECKMAN, University of Chicago Program Committee: Steven Durlauf, University of Wisconsin–Madison (Program Chair) © 2009 The Econometric Society
DOI: 10.3982/ECTA771ANN
David Austen-Smith, Northwestern University (Political Economy) Dirk Bergemann, Yale University (Information Economics) Lawrence Blume, Cornell University (Game Theory) Moshe Buchinsky, University of California, Los Angeles (Applied Econometrics) Dennis Epple, Carnegie Mellon University (Public Economics) Oded Galor, Brown University (Economic Growth) Jinyong Hahn, University of California, Los Angeles (Econometric Theory) Caroline Hoxby, Stanford University (Social Economics) Guido Kuersteiner, University of California, Davis (Time Series) Jonathan Levin, Stanford University (Industrial Organization) Shelly Lundberg, University of Washington (Labor Economics) James Rauch, University of California, San Diego (International Trade) Hélène Rey, Princeton University (International Finance) Manuel Santos, University of Miami (Computational Economics) Christina Shannon, University of California, Berkeley (Mathematical Economics) Steven Tadelis, University of California, Berkeley (Market Design) Petra Todd, University of Pennsylvania (Microeconometrics/Empirical Microeconomics) Toni Whited, University of Wisconsin (Finance) Noah Williams, Princeton University (Macroeconomics) Justin Wolfers, Wharton (Behavioral Economics/Experimental Economics) Tao Zha, Federal Reserve Bank of Atlanta (Macroeconomics) Lin Zhou, Arizona State University (Social Choice Theory/Microeconomic Theory) 2009 NORTH AMERICAN SUMMER MEETING
THE 2009 NORTH AMERICAN SUMMER MEETING of the Econometric Society will be held June 4–7, 2009, hosted by the Department of Economics, Boston University, in Boston, MA. The program will be composed of a selection of invited and contributed papers. The program co-chairs are Barton Lipman and Pierre Perron of Boston University. The local arrangements chair is Marc Rysman of Boston University. Program Committee: Daron Acemoglu, Massachusetts Institute of Technology (Macroeconomics: Growth, and Political Economy) John Campbell, Harvard University (Financial Economics) Yeon-Koo Che, Columbia University (Auctions and Contracts)
Francis X. Diebold, University of Pennsylvania (Financial Econometrics) Jean-Marie Dufour, McGill University (Theoretical Econometrics) Jonathan Eaton, New York University (International Trade) Glenn Ellison, Massachusetts Institute of Technology (Theoretical Industrial Organization) Charles Engel, University of Wisconsin (International Finance) Larry Epstein, Boston University (Plenary Sessions) Hanming Fang, Duke University (Theoretical Public Economics) Jesus Fernandez-Villaverde, University of Pennsylvania (Macroeconomics: Dynamic Models and Computational Methods) Simon Gilchrist, Boston University (Plenary Sessions) Wojciech Kopczuk, Columbia University (Empirical Public Economics) Thomas Lemieux, University of British Columbia (Empirical Microeconomics) Dilip Mookherjee, Boston University (Plenary Sessions) Kaivan Munshi, Brown University (Development) Muriel Niederle, Stanford University (Experimental Economics and Market Design) Edward O’Donoghue, Cornell University (Behavioral Economics) Claudia Olivetti, Boston University (Empirical Labor/Macroeconomics) Christine Parlour, University of California, Berkeley (Corporate Finance/Microeconomic Foundations of Asset Pricing) Zhongjun Qu, Boston University (Plenary Sessions) Lucrezia Reichlin, London School of Economics (Applied Macroeconomics/Factor Models: Theory and Application) Marc Rysman, Boston University (Empirical Industrial Organization) Uzi Segal, Boston College (Decision Theory) Chris Shannon, University of California, Berkeley (General Equilibrium and Mathematical Economics) Balazs Szentes, University of Chicago (Economic Theory) Julia Thomas, Ohio State University (Macroeconomics: Business Cycles) Timothy Vogelsang, Michigan State University (Time Series Econometrics) Adonis Yatchew, University of Toronto (Micro-Econometrics and Non-Parametric Methods) Muhammet Yildiz, Massachusetts Institute of Technology (Game Theory) 2009 AUSTRALASIAN MEETING
THE ECONOMETRIC SOCIETY AUSTRALASIAN MEETING in 2009 (ESAM09) will be held in Canberra, Australia, from July 7th to July 10th. ESAM09 will be hosted by the College of Business and Economics at the Australian National University, and the program committee will be co-chaired by Heather Anderson and Maria Racionero. The program will include plenary, invited and contributed sessions in all fields of economics.
Prospective contributors are invited to submit titles and abstracts of their papers in both theoretical and applied economics and econometrics by March 6th 2009 via the conference website at http://esam09.anu.edu.au. Each person may submit only one paper, or be a co-author on others providing that they will present no more than one paper. At least one co-author must be a member of the Society or must join prior to submission. The ESAM09 conference website contains details about the program, invited speakers, the paper submission process, and conference registration. 2009 EUROPEAN MEETING
THE 2009 EUROPEAN MEETING of the Econometric Society (ESEM) will take place in Barcelona, Spain from 23 to 27 August, 2009. The Meeting is jointly organized by the Barcelona Graduate School of Economics and it will run in parallel with the Congress of the European Economic Association (EEA). The Program Committee Chairs are Prof. Juuso Välimäki (Helsinki School of Economics) for Theoretical and Applied Economics; Prof. Gerard J. van der Berg (Free University Amsterdam) for Econometrics and Empirical Economics. The Local Arrangements Committee: Albert Carreras, Chairman—Universitat Pompeu Fabra and Barcelona GSE Carmen Bevià—Universitat Autònomade Barcelona and Barcelona GSE Jordi Brandts—Institute for Economic Analysis-CSIC and Barcelona GSE Eduard Vallory, Secretary—Barcelona GSE Director-General All details regarding the congress can be found on the website http:// eea-esem2009.barcelonagse.eu/. 2010 NORTH AMERICAN WINTER MEETING
THE 2010 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Atlanta, GA, from January 3 to 5, 2010, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. The program committee will be chaired by Dirk Bergemann of Yale University. Each person may submit only one paper, or be a co-author on others providing that they will present no more than one paper. At least one co-author must be a member of the Society or must join prior to submission. You may join the Econometric Society at http://www.econometricsociety.org.
Program Committee: Dirk Bergemann, Yale University, Chair Marco Battaglini, Princeton University (Political Economy) Roland Benabou, Princeton University (Behavioral Economics) Markus Brunnermeier, Princeton University (Financial Economics) Xiahong Chen, Yale University (Theoretical Econometrics, Time Series) Liran Einav, Stanford University (Industrial Organization) Luis Garicano, University of Chicago (Organization, Law and Economics) John Geanakoplos, Yale University (General Equilibrium Theory, Mathematical Economics) Mike Golosov, MIT (Macroeconomics) Pierre Olivier Gourinchas, University of California (International Finance) Igal Hendel, Northwestern (Empirical Microeconomics) Johannes Hoerner, Yale University (Game Theory) Han Hong, Stanford University (Applied Econometrics) Wojcich Kopczuk, Columbia University (Public Economics) Martin Lettau, University of California, Berkeley (Finance) Enrico Moretti, University of California, Berkeley (Labor) Muriel Niederle, Stanford University (Experimental Game Theory, Market Design) Luigi Pistaferri, Stanford University (Labor) Esteban Rossi-Hansberg, Princeton University (International Trade) Marciano Siniscalchi, Northwestern University (Decision Theory) Robert Townsend, Massachusetts Institute of Technology (Development Economics) Oleg Tsyvinski, Yale University (Macroeconomics, Public Finance) Harald Uhlig, University of Chicago (Macroeconomics, Computational Finance) Ricky Vohra, Northwestern University (Auction, Mechanism Design)
Econometrica, Vol. 77, No. 1 (January, 2009), 325
FORTHCOMING PAPERS THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica. ALVAREZ, FERNANDO, AND FRANCESCO LIPPI: “Financial Innovation and the Transactions Demand for Cash.” ARELLANO, MANUEL, AND STEPHANE BONHOMME : “Robust Priors in Nonlinear Panel Data Models.” CHIAPPORI, PIERRE-ANDRE, AND IVAR EKELAND: “The Micro Economics of Efficient Group Behavior: Identification.” KARNI, EDI: “A Mechanism for Eliciting Probabilities.” KEANE, MICHAEL P., AND ROBERT M. SAUER: “Classification Error in Dynamic Discrete Choice Models: Implications for Female Labor Supply Behavior.” LAGOS, RICARDO, AND GUILLAUME G. ROCHETEAU: “Liquidity in Asset Markets With Search Frictions.” SEO, KYOUNGWON: “Ambiguity and Second-Order Belief.”
© 2009 The Econometric Society
DOI: 10.3982/ECTA771FORTH
Econometrica, Vol. 77, No. 1 (January, 2009), 327–333
THE ECONOMETRIC SOCIETY ANNUAL REPORTS REPORT OF THE SECRETARY
MILAN, ITALY AUGUST 26–27, 2008
THIS REPORT STARTS by describing the evolution of the Society’s membership and of the number of institutional subscribers. Information is provided on both a midyear and an end-of-year basis. The latest information available, as of June 30 of the current year and of selected previous years, is provided in the top panel of Table I. The bottom panel of Table I reports the final number of members and subscribers as of the end of 2007 and selected previous years. For any given year, the figures in the bottom half of Table I are larger than in the top half, reflecting those memberships and subscriptions that are initiated between the middle and the end of that calendar year. The membership of the Society has continued its upward trend, reaching a total of 4,691 ordinary and 1,019 student members at the end of 2007. This represents an increase of 51.8 percent and 57.3 percent with respect to the 2000 figures. In the last two years there has been a decline in the number of student members that has been more than compensated by the increase in ordinary members. However, the latest data indicate that the membership may decline slightly in 2008. The number of institutional subscribers has continued its downward trend, reaching a total of 1,842 subscribers at the end of 2007, which represents a decrease of 4.6 percent with respect to the figure in 2006 and of 24.4 percent with respect to the figure in 2000. Moreover, the latest data indicate that further declines in the number of institutional subscribers will be registered in 2008. Table II displays the division between print and online and online only subscriptions. The comparison between 2007 and 2008 shows a continued shift toward online only. This is especially significant for student members, 74.8 percent of whom chose the online only option as of June 2008. Table III compares the Society’s membership and the number of institutional subscribers with those of the American Economic Association. (For the membership category these figures include ordinary, student, free, and life members for both the ES and the AEA.) The steady reduction of the AEA membership stands in marked contrast to the sharp increase in the ES membership, with the ES/AEA ratio for members increasing from 19.7 percent in 2000 to a record 34.1 percent in 2007. However, the decline in the number of institutional subscribers has been similar for both organizations, with the ES/AEA ratio ranging between 50.8 percent in 2000 to 46.0 percent in 2005. The ratio went up to 48.9 percent in 2006 and down to 47.1 percent in 2007. The geographic distribution of members (including students) by countries and regions as of June 30 of the current year and of selected previous years is shown in Table IV. The format of this table has been slightly changed with respect to that in previous reports. The table now shows individual data on countries with more than 10 members in 2008. Previously some countries were grouped together, so their individual membership data are not available. In comparison with the 2005 figures, membership has increased in Australasia, Europe and Other Areas, and the Far East, and has declined in North America, Latin America, and South and South East Asia (partly © 2009 The Econometric Society
TABLE I
INSTITUTIONAL SUBSCRIBERS AND MEMBERS

                                      Members
Year   Institutions   Ordinary   Student   Soft Currency   Free(a)   Life   Total Circulation

1. Institutional subscribers and members at the middle of the year
1980      2,829        1,978       411          53            45       74        5,390
1985      2,428        2,316       536          28            55       71        5,434
1990      2,482        2,571       388          57            73       69        5,643
1995      2,469        2,624       603          46            77       66        5,885
2000      2,277        2,563       437           —           112       62        5,471
2001      2,222        2,456       363          71             —       62        5,174
2002      2,109        2,419       461         103             —       61        5,153
2003      1,971        2,839       633         117             —       60        5,620
2004      1,995        2,965       784         111             —       60        5,915
2005      1,832        3,996     1,094         106             —       57        7,085
2006      1,776        4,020     1,020         110             —       58        6,984
2007      1,786        4,393       916          97             —       58        7,250
2008      1,691        4,257       759          89             —       56        6,852

2. Institutional subscribers and members at the end of the year
1980      3,063        2,294       491          49            47       74        6,018
1985      2,646        2,589       704          53            61       70        6,123
1990      2,636        3,240       530          60            74       68        6,608
1995      2,569        3,072       805          43            96       66        6,651
2000      2,438        3,091       648           —            77       62        6,316
2001      2,314        3,094       680          87             —       61        6,233
2002      2,221        3,103       758         105             —       60        6,247
2003      2,218        3,360       836         112             —       60        6,586
2004      2,029        3,810     1,097         101             —       58        7,095
2005      1,949        4,282     1,222         110             —       58        7,621
2006      1,931        4,382     1,165          93             —       58        7,629
2007      1,842        4,691     1,019          86             —       56        7,694

(a) Includes free libraries.
Table V shows the percentage distribution of members (including students) by regions as of June 30 of the current year and of selected previous years. The share of North America in total membership fell below 50 percent in 2005 and it is now at 42.7 percent.

Table VI displays the geographic distribution of Fellows as of June 30, 2008. As noted in previous reports, this distribution is very skewed, with 69.2 percent of the Fellows based in North America and 26.5 percent in Europe and Other Areas.

Table VII provides information on the nomination and election of Fellows. Since 2006, the election has been conducted with an electronic ballot system. This has led to a very significant increase in the participation rate, which was 72.7 percent in 2006 and 70.5 percent in 2007 (jumping from an average of 55.8 percent in the previous five years and a historical minimum of 45.5 percent in 2005).
TABLE II
INSTITUTIONAL SUBSCRIBERS AND MEMBERS BY TYPE OF SUBSCRIPTION (MIDYEAR)

                            2007                2008
                       Total   Percent     Total   Percent
Institutions           1,786    100.0       1,691    100.0
  Print + Online       1,519     85.1       1,388     82.1
  Online only            267     14.9         303     17.9
Ordinary members       4,393    100.0       4,257    100.0
  Print + Online       2,573     58.6       2,233     52.5
  Online only          1,820     41.4       2,024     47.5
Student members          916    100.0         759    100.0
  Print + Online         323     35.3         191     25.2
  Online only            593     64.7         568     74.8
The number of nominees in 2007 was 50 and the number of new Fellows elected was 16, which happens to be the average number of Fellows elected during the period 1974–2006. This outcome is very different from the one in 2006, when only 5 new Fellows were elected. The increase in the number of Fellows elected may be related to the change in the electronic ballot agreed by the Executive Committee in 2006 and implemented in 2007 that added the possibility of selecting by a single click all the candidates nominated by the Nominating Committee. In fact, of the 16 new Fellows elected in 2007, 8 had been nominated by the Committee (which also nominated 2 other candidates that were not elected).

TABLE III
INSTITUTIONAL SUBSCRIBERS AND MEMBERS: ECONOMETRIC SOCIETY AND AMERICAN ECONOMIC ASSOCIATION (END OF YEAR)

              Institutions                      Members
Year      ES      AEA     ES/AEA (%)       ES       AEA      ES/AEA (%)
1975    3,207    7,223      44.4         2,627    19,564       13.4
1980    3,063    7,094      43.2         2,955    19,401       15.2
1985    2,646    5,852      45.2         3,416    20,606       16.0
1990    2,636    5,785      45.6         3,972    21,578       18.4
1995    2,569    5,384      47.7         4,082    21,565       18.9
2000    2,438    4,780      50.8         3,878    19,668       19.7
2001    2,314    4,838      47.8         3,919    18,761       20.9
2002    2,221    4,712      47.1         4,026    18,698       21.5
2003    2,218    4,482      49.5         4,368    19,172       22.8
2004    2,029    4,328      46.9         5,066    18,908       26.8
2005    1,949    4,234      46.0         5,672    18,067       31.4
2006    1,931    3,945      48.9         5,698    17,811       32.0
2007    1,842    3,910      47.1         5,852    17,143       34.1
Comparing the 2006 with the 2007 election, the number of votes needed to be elected (30 percent of the number of ballots submitted) went down from 96 to 90, while the number of votes per ballot went up from 11.4 to 12.8.

TABLE IV
GEOGRAPHIC DISTRIBUTION OF MEMBERS(a) (MIDYEAR)

Region and Country            1980    1985    1990    1995    2000    2005    2008
Australasia                     57      60      95      98      90     162     201
  Australia                     52      57      84      88      78     137     167
  New Zealand                    5       3      11      10      12      25      34
Europe and Other Areas         625     716     803   1,031     992   2,092   2,106
  Austria                       15      21      25      27      24      49      44
  Belgium                       23      21      30      31      32      61      41
  Czech Republic                 —       —       —       —       —       —      17
  Denmark                       19      22      27      38      22      47      43
  Finland                       19      26      17      15      13      27      48
  France(b)                     53      36      56      81      73     188     206
  Germany                       92     106     112     135     153     354     390
  Greece(c)                     12      12       6      14      15      18      22
  Hungary                       34      30      30       5       5      13      19
  Ireland                        4       5       5       6       6      15      18
  Israel                         0      16      25      32      37      56      38
  Italy(d)                      16      43      48      57      59     126     167
  Netherlands                   75      68      90     103      86     130     151
  Norway                        24      26      23      29      21      52      51
  Poland                         4       6      20      27      27      22      18
  Portugal                       5       5      11      11      19      32      33
  Russia(e)                      5       2       4       4       5      11      13
  Spain                         34      43      36      88      81     171     184
  Sweden                        27      31      25      45      42      72      57
  Switzerland                   26      27      25      34      25      79      91
  Turkey                         0       1       3       8       9      21      12
  United Kingdom               135     145     162     210     207     509     398
  Other Europe                   3       6      10      17      19      23      29
  Other Asia                     0       4       2       5       7       6       5
  Other Africa                   0      14      11       9       5      10      11
Far East                       105     134     144     228     189     315     391
  China                          —       —       —       —       —       —      25
  Hong Kong(f)                   —       —       —       —       —       —      27
  Japan                         83     114     101     143     130     203     248
  Korea                          —       —       —       —       —       —      47
  Taiwan                         —       —       —       —       —       —      43
  Other Far East                22      20      43      85      59     112       1
North America                1,645   2,059   2,150   1,989   1,498   2,409   2,187
  Canada                       159     192     194     200     127     208     226
  United States              1,486   1,867   1,956   1,789   1,371   2,201   1,961
Latin America                   53      39      30      87     105     180     162
  Argentina                      —       —       —       —       —       —      21
  Brazil                         —       —       —       —       —       —      69
  Chile                          —       —       —       —       —       —      21
  Colombia                       —       —       —       —       —       —      13
  Mexico                        10       5       1      16      15      33      25
  Other Latin America           43      34      29      71      90     147      13
South and South East Asia       27      49      42      49      31     105      74
  India                          6      30      18      10      14      22      22
  Singapore                      —       —       —       —       —       —      36
  Other South and South East Asia(f)  21  19    24      39      17      83      16
Total                        2,512   3,057   3,264   3,482   2,905   5,263   5,121

(a) Only countries with more than 10 members in 2008 are listed individually. Until 2005 some countries were grouped together, so their individual membership data are not available.
(b) Until 2005 the data for France include Luxembourg.
(c) Until 2005 the data for Greece include Cyprus.
(d) Until 2005 the data for Italy include Malta.
(e) Until 2005 the data for Russia correspond to the Commonwealth of Independent States or the USSR.
(f) Until 2005 Hong Kong was included in South and South East Asia.
TABLE V
PERCENTAGE DISTRIBUTION OF MEMBERS (MIDYEAR)

                             1980    1985    1990    1995    2000    2005    2008
Australasia                   2.3     2.0     2.9     2.8     3.1     3.1     3.9
Europe and Other Areas       24.9    23.4    24.6    29.6    34.1    39.7    41.1
Far East                      4.2     4.4     4.4     6.5     6.5     6.0     7.6
North America                65.5    67.4    65.9    57.1    51.6    45.8    42.7
Latin America                 2.1     1.3     0.9     2.5     3.6     3.4     3.2
South and Southeast Asia      1.1     1.6     1.3     1.4     1.1     2.0     1.4
Total                       100.0   100.0   100.0   100.0   100.0   100.0   100.0
In 2008, all six regions of the Society are organizing meetings, according to the following timetable:

North American Winter Meeting, New Orleans, Louisiana, January 4–6, 2008
North American Summer Meeting, Pittsburgh, Pennsylvania, June 19–22, 2008
Australasian Meeting, Wellington, New Zealand, July 9–11, 2008
Far Eastern and South Asian Meeting, Singapore, July 16–18, 2008
European Summer Meeting, Milan, Italy, August 27–31, 2008
European Winter Meeting, Cambridge, United Kingdom, October 31–November 1, 2008
Latin American Meeting, Rio de Janeiro, Brazil, November 20–22, 2008
TABLE VI
GEOGRAPHICAL DISTRIBUTION OF FELLOWS, 2008

Australasia: 6
  Australia 6
Europe and Other Areas: 162
  Austria 2, Belgium 9, Denmark 2, Finland 2, France 31, Germany 9, Hungary 6, Ireland 1, Israel 22, Italy 4, Netherlands 6, Norway 1, Poland 2, Russia 4, Spain 6, Sweden 3, Switzerland 2, Turkey 1, United Kingdom 49
Far East: 15
  Japan 14, Korea 1
North America: 423
  Canada 10, United States 413
Latin America: 2
  Brazil 2
South and Southeast Asia: 3
  India 3
Total (as of June 30, 2008): 611
TABLE VII
FELLOWS' VOTING STATISTICS

        Total              Eligible  Returned  Percent Returning  Number of  Number   Percent Ratio         Late Ballots Returned
Year    Fellows  Inactive  to Vote   Ballots   Ballots            Nominees   Elected  Elected to Nominees   but Not Counted
1975      197       26       171       100         58.5              63         21         33.3                  n.a.
1980      299       49       251       150         59.8              73         18         24.7                  n.a.
1985      354       57       301       164         54.4              60         13         21.7                   17
1990      422       47       375       209         55.7              44         23         52.3                    5
1995      499      119       380       225         59.2              52         15         28.8                    2
2000      546      147       399       217         54.4              59         14         23.7                   10
2001      564      170       394       245         62.2              55         10         18.2                    0
2002      577      189       388       236         60.8              45         17         37.8                    2
2003      590      200       390       217         55.6              53         20         37.7                   10
2004      582      145       437       239         54.7              51         15         29.4                    8
2005      604      140       464       211         45.5              50         14         28.0                   16
2006      601      154       447       325         72.7              55          5          9.1                    —
2007      599      166       433       305         70.4              50         16         32.0                    —
The North American Winter Meetings have traditionally taken place within the meetings of the Allied Social Sciences Association (ASSA). Since 2003, the European Summer Meeting has run in parallel with the Annual Congress of the European Economic Association, and since 2006, the Latin American Meeting has run in parallel with the Annual Meeting of the Latin American and Caribbean Economic Association (LACEA).

The 2008 South and South East Asian Meeting was scheduled to take place in Islamabad, Pakistan, but the political situation in this country led to the decision to cancel the meeting, and have a joint Far Eastern and South Asian Meeting in Singapore.

In August 2007, the Executive Committee initiated a discussion of the regional structure of the Society, partly driven by the fact that the 2008 Far Eastern Meeting was taking place in Singapore, which was formally part of the South and South East Asian region. The Committee decided to consult the Council members of the two regions about a possible merger into a single Asian region. The reaction was positive, with some members advocating a period of cooperation before the merger. In line with this approach, the two Standing Committees agreed in Singapore to have another joint Far Eastern and South Asian Meeting next year in Tokyo.

The 2010 World Congress will take place in Shanghai, China, August 17–22, and will be organized by the Shanghai Jiaotong University in cooperation with the Shanghai University of Finance and Economics, Fudan University, the China Europe International Business School, and the Chinese Association of Quantitative Economics. The Organization Committee appointed Daron Acemoglu, Manuel Arellano, and Eddie Dekel as Program Chairs.

To conclude, I would like to thank the members of the Executive Committee, and in particular Lars Peter Hansen, for their help and support during 2007. I am also very grateful to Claire Sashi, the Society's General Manager in charge of the office at New York University, for her excellent work during this year.

RAFAEL REPULLO
Econometrica, Vol. 77, No. 1 (January, 2009), 335–340
THE ECONOMETRIC SOCIETY REPORTS REPORT OF THE TREASURER MILAN, ITALY AUGUST 26–27, 2008
THE 2007 ACCOUNTS of the Econometric Society show a surplus of $178,015, which represents a 39.8% decrease with respect to the record figure in 2006 (Table III, Line H). The 2007 surplus is lower than the estimate of $196,000 at this time last year. Although revenues were $42,225 higher than expected, expenses were $60,210 higher than expected. The higher revenues are mainly explained by the increase in investment income, with membership and subscriptions plus other revenues being roughly in line with the estimate at this time last year (Table II, Line C). The higher expenses are mainly explained by higher than expected increases in publishing and Executive Committee expenses (Table III, Lines A and D).

The net worth of the Society on 12/31/2007 reached $1,425,219 (Table I, Line C). Consequently, the ratio of net worth to total expenses on 12/31/2007 was 142 percent, a figure which is in the middle of the target range between 120 and 160 percent agreed by the Executive Committee in August 2007.

Table I shows the balance sheets of the Society for the years 2003–2007, distinguishing between unrestricted assets and liabilities, whose difference gives the Society's net worth, and five restricted accounts: the World Congress Fund, which is a purely bookkeeping entry that serves to smooth the expenses every five years on travel grants to the World Congress; the Jacob Marschak Fund, devoted to support the Marschak lectures at regional meetings outside Europe and North America; and the Far Eastern, Latin American, and European Funds, which are held in custody for the convenience of the corresponding regional Standing Committees. Tables IV and V show the movements in the World Congress Fund and the other restricted accounts for the years 2003–2007.

Table II shows the actual revenues for 2006, the estimated and actual revenues for 2007, and the estimated revenues for 2008 and 2009. Membership and subscription revenues in 2007 were roughly in line with the estimate at this time last year, while other revenues were higher than expected due to increases in permissions and the revenue from the North American Winter Meeting. Investment income for 2007 was also significantly above the estimate at this time last year. The situation is likely to be drastically reversed for 2008, due to the fall in stock markets since the beginning of the year (the S&P 500 has fallen 13.7 percent from January 1 to July 31). Thus total revenues are expected to go down to $820,000, which implies a 30.6 percent reduction relative to the figure in 2007.
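As a cross-check (a worked restatement added here, using only amounts that appear in Tables II and III), the 2007 surplus and the net-worth ratio quoted above are

\[
1{,}182{,}225 - 1{,}004{,}210 = 178{,}015,
\qquad
\frac{1{,}425{,}219}{1{,}004{,}210}\approx 1.42 .
\]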
TABLE I
ECONOMETRIC SOCIETY BALANCE SHEETS, 2003–2007

                                    12/31/03 $   12/31/04 $   12/31/05 $   12/31/06 $   12/31/07 $
A. Unrestricted Assets               1,372,807    1,771,179    1,707,036    1,801,710    2,203,312
   1. Short Term Assets                364,781      453,857      123,106       63,854      117,574
   2. Investments                      776,278      855,848    1,114,981    1,548,878    1,753,807
   3. Accounts Receivable              201,281      436,678      450,198      167,360      308,662
   4. Back Issue Inventory              11,203        7,285        7,067        1,884        7,913
   5. Furniture and Equipment           12,214       10,448        5,791        2,459        3,161
   6. Other Assets                       7,050        7,063        5,893       17,275       12,195
B. Unrestricted Liabilities            794,112      882,989      755,569      554,504      778,093
   1. Accounts Payable                  52,082       19,470       68,293       37,861       99,103
   2. Deferred Revenue                 502,030      563,519      607,276      356,643      438,990
   3. World Congress Fund              240,000      300,000       80,000      160,000      240,000
C. Unrestricted Fund Balance           578,695      888,190      951,467    1,247,206    1,425,219
D. World Congress Fund Balance         240,000      300,000       80,000      160,000      240,000
E. Jacob Marschak Fund Balance          27,490       27,876       29,011       26,560       24,926
F. Far Eastern Fund Balance             61,036       61,710       63,576       66,624       70,016
G. Latin American Fund Balance          24,292       14,489       22,046       23,103       21,941
H. European Fund Balance                     —            —            —            —       64,903
TABLE II
ECONOMETRIC SOCIETY REVENUES, 2006–2009

                                        Actual      Estimate    Actual      Estimate    Budget
                                        2006 $      2007 $      2007 $      2008 $      2009 $
A. Membership and Subscriptions         975,680     980,000     964,367     920,000     950,000
B. Other Revenues                        34,250      40,000      50,629      50,000      50,000
   1. Back Issues                        14,613      15,000      15,660      15,000      15,000
   2. Reprints                              939       1,000         831       1,000       1,000
   3. Advertising                         7,623       7,500       4,496       7,000       7,000
   4. List Rentals                        1,560       1,500       1,366       2,000       2,000
   5. Permissions                         4,842       5,000      12,085      10,000      10,000
   6. North American Meetings (net)       4,673      10,000      16,191      15,000      15,000
C. Investment Income                    236,593     120,000     167,229    (150,000)    100,000
   1. Interest and Dividends             53,133      40,000      64,998      50,000      50,000
   2. Capital Gains                     183,460      80,000     102,231    (200,000)     50,000
D. Total Revenues                     1,246,524   1,140,000   1,182,225     820,000   1,100,000
TABLE III
ECONOMETRIC SOCIETY EXPENSES, 2006–2009

                                        Actual      Estimate    Actual      Estimate    Budget
                                        2006 $      2007 $      2007 $      2008 $      2009 $
A. Publishing                           329,283     330,000     372,450     360,000     360,000
   1. Composition                        47,934      50,000      71,226      60,000      60,000
   2. Printing                           76,943      80,000      67,907      70,000      70,000
   3. Inventory (net)                     5,184           0      (6,029)          0           0
   4. Circulation                        89,061      90,000     102,556     100,000     100,000
   5. Postage                           110,161     110,000     136,790     130,000     130,000
B. Editorial                            233,008     277,000     278,828     339,000     339,000
   1. Editors                           145,375     170,000     173,875     232,000     240,000
   2. Editorial Assistants               80,937      96,000      92,637      92,000      82,000
   3. Software                            1,000       1,000       3,000       3,000       4,000
   4. Meetings                            5,696      10,000       9,316      12,000      12,000
C. Administrative                       170,443     202,000     206,964     195,000     191,000
   1. Salaries and Honoraria            127,300     145,000     141,598     146,000     142,000
   2. Office                             16,447      12,000       3,429       5,000       5,000
   3. Accounting and Auditing            18,200      22,000      38,040      34,000      34,000
   4. IRS                                 1,243       1,500        (635)      1,000       1,000
   5. Website                             6,653      21,500      23,494       8,000       8,000
   6. Other                                 600           0       1,038       1,000       1,000
D. Executive Committee                   24,630      30,000      54,479      52,000      52,000
E. Meetings                             119,259     105,000      91,489     124,000     128,000
   1. World Congress                     88,500      80,000      80,000      80,000      80,000
   2. Regional Meetings                  30,759      25,000      11,489      44,000      48,000
F. Special Expenses                      74,162           0           0           0           0
   1. Transition                         74,162           0           0           0           0
G. Total Expenses                       950,785     944,000   1,004,210   1,070,000   1,070,000
H. Surplus                              295,739     196,000     178,015    (250,000)     30,000
I. Unrestricted Fund Balance          1,247,206   1,443,206   1,425,219   1,175,219   1,205,219
J. Ratio of Unrestricted Fund Balance
   to Total Expenses                       1.31        1.53        1.42        1.10        1.13
Table III shows the actual expenses for 2006, the estimated and actual expenses for 2007, and the estimated expenses for 2008 and 2009. Publishing expenses in 2007 have been significantly higher than the estimate at this time last year, while editorial and administrative expenses have been in line with the estimate. Total expenses for 2008 and 2009 are expected to increase by about 7 percent due to higher editorial expenses (because of the increase in payments to the Co-Editors of Econometrica) and meetings expenses (because of the increase in the grants for activities involving “young economists” in regions other than Europe and North America). This together with the expected behavior of total revenues implies a deficit of $250,000 for 2008 and a surplus of $30,000 for 2009 after allocating $80,000 each year to the World Congress Fund. Thus the ratio of net worth to total expenses is expected to go down to 110 percent in 2008 and to increase to 113 percent in 2009, slightly below the lower bound of the target range agreed by the Executive Committee in August 2007.
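In the same way (a worked restatement added here, using only the figures in Table III, Lines G–I), the projected ratios follow from

\[
\frac{1{,}425{,}219 - 250{,}000}{1{,}070{,}000}\approx 1.10,
\qquad
\frac{1{,}175{,}219 + 30{,}000}{1{,}070{,}000}\approx 1.13 .
\]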
TABLE IV
WORLD CONGRESS FUND, 2003–2007

                                    2003 $     2004 $     2005 $     2006 $     2007 $
A. Income                           60,000     60,000    180,000     80,000     80,000
   1. Transfer from General Fund    60,000     60,000    180,000     80,000     80,000
B. Expenses                              0          0    400,000          0          0
   1. Travel Grants                      0          0    326,385          0          0
   2. Transfer to General Fund           0          0     73,615          0          0
C. Fund Balance                    240,000    300,000     80,000    160,000    240,000
TABLE V
RESTRICTED ACCOUNTS, 2003–2007

                                        2003 $    2004 $    2005 $    2006 $    2007 $
A. Jacob Marschak Fund
   1. Investment Income                    316       387     1,135     1,393     1,393
   2. Expenses                           1,002         0         0     3,844     3,029
   3. Fund Balance                      27,490    27,877    29,012    26,561    24,926
B. Far Eastern Fund
   1. Investment Income                    552       674     1,866     3,048     3,392
   2. Expenses                               0         0         0         0         0
   3. Fund Balance                      61,036    61,710    63,576    66,624    70,016
C. Latin American Fund
   1. Investment Income                    212       197       558     1,057     1,162
   2. Expenses (net)                         0    10,000    (7,000)        0     2,324
   3. Fund Balance                      24,292    14,489    22,046    23,103    21,941
D. European Fund
   1. Transfer from European Region          —         —         —         —    62,612
   2. Investment Income                      —         —         —         —     2,291
   3. Fund Balance                           —         —         —         —    64,903
The Executive Committee decided by e-mail in June 2008 to adjust the institutional subscription rates in the following manner:
                                      2008      2009
Print + Online (High income)          $520      $550
Online only (High income)             $480      $500
Print + Online (Concessionary)         $40       $50
Online only (Concessionary)           Free      Free
High income rates are applied to economies classified as high income by the World Bank. Income classifications are set each year on July 1. In the latest classification high income economies are those with 2007 gross national income per capita (calculated using the World Bank Atlas method) higher than $11,456. Concessionary rates are applied to economies that are not classified as high income.

Print + Online subscribers receive paper copies of the six issues of Econometrica for the corresponding year and have online access to volumes back to 1999. Online only subscribers do not get the paper copies of Econometrica. Since 2006, institutional subscribers have perpetual online access to the volumes to which they subscribed.

Following the recommendation of the Bergstrom report on “Pricing and Access to Econometrica,” individual membership rates should be gradually adjusted so that the difference between print and online only rates covers the marginal cost of printing and postage, estimated to be about $40. My proposal for 2009 (agreed by the Executive Committee) is the following:

                                                               2008      2009
Ordinary member (High income), Print + Online                  $55       $60
Ordinary member (High income), Online only                     $25       $25
Student member (High income), Print + Online                   $40       $45
Student member (High income), Online only                      $10       $10
Ordinary or student member (Concessionary), Print + Online     $40       $45
Ordinary or student member (Concessionary), Online only        $10       $10
The Society's Investments Committee, whose members are John Campbell, Rafael Repullo, and Hyun Shin, met in January 2008 during the ASSA meetings in New Orleans to review the Society's unrestricted portfolio on 12/31/2007 (Table VI, Column 2). The Committee decided to overweight the share of cash in the portfolio relative to the reference asset allocation of 20 percent cash, 10 percent bonds, and 70 percent equities, of which 45 percent correspond to U.S. equities, 45 percent to international equities, and 10 percent to emerging market equities. On 7/31/2008, the breakdown by type of asset was 30.2 percent cash, 6.9 percent bonds, 29.0 percent U.S. equities, 28.8 percent international equities, and 5.2 percent emerging markets equities (Table VI, Column 3). All investments are in no-load Fidelity mutual funds. The return of the unrestricted portfolio in the year ending July 31, 2008 was −5.82 percent, as compared to the return of the S&P 500 stock market index of −12.91 percent.

The 2007 financial statements have been compiled by E. C. Ortiz & Co., LLP, 333 S. Desplaines St., Chicago, IL 60661, and will be audited by Rothstein, Kass & Company, 1350 Avenue of the Americas, New York, NY 10019.
TABLE VI
ECONOMETRIC SOCIETY INVESTMENT PORTFOLIO

                                        Market Value          Market Value          Market Value
                                        7/31/2007             12/31/2007            7/31/2008
Name of Fund                            $           %         $           %         $           %
Unrestricted Investment Portfolio       1,968,879   100.0     1,753,807   100.0     1,946,055   100.0
  Fidelity Money Market                   520,306    26.4       368,988    21.0       587,956    30.2
  Spartan Interm. Treasury Bond           178,761     9.1       192,077    11.0       133,328     6.9
  Spartan 500 Index                       567,265    28.8       517,344    29.5       563,438    29.0
  Spartan International Index             571,630    29.0       524,084    29.9       560,915    28.8
  Fidelity Emerging Markets               130,917     6.7       151,315     8.6       100,418     5.2
Restricted Investment Portfolio           119,764   100.0       181,785   100.0       174,979   100.0
  Fidelity Money Market                   119,764   100.0       116,882    64.3       119,000    68.0
  Spartan International Index                   —       —        64,903    35.7        55,979    32.0
Total Investment Portfolio              2,088,643             1,935,592             2,121,034
I would like to thank Stella Santos of E. C. Ortiz & Co. for her help with the Society's accounts.

RAFAEL REPULLO
Econometrica, Vol. 77, No. 1 (January, 2009), 341–345
THE ECONOMETRIC SOCIETY ANNUAL REPORTS
REPORT OF THE EDITORS 2007–2008

WHILE THIS YEAR HAS SEEN a significant turnover in personnel (discussed below), Econometrica's mission continues to be to publish original articles in all branches of economics—theoretical and empirical, abstract and applied—providing wide-ranging coverage across the subject area. It promotes studies that aim at the unification of the theoretical-quantitative and the empirical-quantitative approaches to economic problems and that are penetrated by constructive and rigorous thinking. It explores a unique range of topics each year—from the frontier of theoretical developments in many new and important areas to research on current and applied economic problems to methodologically innovative, theoretical and applied studies in econometrics.

The Editorial Board continues to seek to broaden the pool of top quality submissions from traditional areas of strength, such as microeconomic theory and econometrics, across all areas of economics. The Society has shown its commitment to this goal through the expansion of the Editorial Board with new Co-Editors in more applied areas.

The three tables below provide summary statistics on the editorial process in the form presented in previous editors' reports. Table I indicates that we received 744 new submissions this year, the highest number ever. The number of revisions received (146) is high relative to recent years and is the second highest ever. The number of papers accepted was 57. This represents a return to the level of recent years, after an unexplained dip in 2007, which led to some shorter issues in 2007 and early 2008.

Table III gives data on the time to first decision for decisions made in this fiscal year, with 47% of papers decided within 3 months and 81% decided within 6 months. There has been a disappointing increase in the time to decision in the bottom tail of the distribution. In particular, while our decision times on revisions are faster than on new submissions, earlier improvements on revision decision times have not been sustained and a disappointing 27% of revisions took more than 6 months. Although not reported in the tables, we can report that a final decision was reached on 48% of incoming first-round revisions and on 78% of incoming second-round revisions.
TABLE I
STATUS OF MANUSCRIPTS

                                               03/04   04/05   05/06   06/07   07/08
In process at beginning of year                  218     156     158     165     236
New papers received                              589     617     615     691     744
Revisions received                               122     130     161     127     146
Papers accepted                                   61      50      57      45      57
Papers conditionally accepted                      —       —       —      16      32
Papers returned for revision                     138     153     190      95     156
Papers rejected or active withdrawals            574     542     520     591     656
[Of these, rejected without full refereeing]    [194]   [199]   [146]   [163]   [154]
Papers in process at end of year                 156     158     165     236     216
ANNUAL REPORTS TABLE II DISTRIBUTION OF NEW PAPERS AMONG CO-EDITORS 03/04
05/06
06/07
121
129
71 127
113
105 110
12
7
12
116 115 90 3
Previous Editors Dekel Horowitz Meghir Postlewaite
193 93 56 117
192
184
169
83 101
75
Total
589
617
615
Current Editors Acemoglu Berry Levine Morris Newey Samuelson Uhlig Others
04/05
118
691
07/08
84 70 110 170 107 102 91 10
744
TABLE III
TIME TO DECISION

                 Decisions on New Submissions     Decisions on Revisions         Decisions on All Papers
                 Number  Percentage  Cumulative%  Number  Percentage Cumulative% Number  Percentage  Cumulative%
In ≤1 months       181      24%         24%         50      34%        34%         231      25%         25%
In 2 months         50       7%         30%         11       7%        41%          61       7%         32%
In 3 months        128      17%         47%         15      10%        52%         143      16%         48%
In 4 months        108      14%         61%         13       9%        61%         121      13%         61%
In 5 months         89      12%         73%         11       7%        68%         100      11%         72%
In 6 months         64       8%         81%          7       5%        73%          71       8%         80%
In 7 months         46       6%         87%         14      10%        82%          60       7%         86%
In 8 months         24       3%         90%          6       4%        86%          30       3%         90%
In >8 months        73      10%        100%         20      14%       100%          93      10%        100%
Total              763                             147                             910
These numbers are comparable with the performance in recent years, although a dramatic increase in final decisions on first-round revisions (to 68%) last year was not sustained. We have started to collect data on the processing time of published papers and the number of “rounds” that papers go through. Papers published during 2007–2008 spent an average of 11 months in the hands of the journal (throughout the whole process) and 9 months in the hands of the authors (carrying out revisions). We are pleased that we have been able to increase in breadth while maintaining the quality we strive for. We hope that this success and the expansion of the Editorial Board to include more applied strength will translate into a broader pool of top quality submissions. There seems to be small progress in this direction and we are currently investigating ways to track the submission pool more systematically. The Co-Editors meet on an annual basis to discuss policy for the journal. The journal has a longstanding tradition of avoiding conflicts of interest in the editorial process, although there has not been a written policy. We decided to formalize the policy. Co-Editors will not handle papers from their current colleagues, their thesis advisors, their active co-authors, and their Ph.D. students for whom they were the main advisor; in addition, they will not handle the papers of Ph.D. students they advised even in a less central way within two years of graduation. Finally, Co-Editors will not have access to the referee reports or names of referees on papers written by authors with whom they have one of the conflict issues described above. Academic publication is in flux as experiments are made with new methods of publication (e.g., electronic) and peer review. We remain committed to the principle that we evaluate work for publication in Econometrica by assessing the marginal contribution over previously published work by the authors and others. While what constitutes “prior publication” for this purpose is not always easy to articulate (as discussed in our “Instructions for Authors” on our website), we decided to require—as part of the electronic submission process—that authors acknowledge their awareness of our policy on prior publications and their obligation to report any related work that might be relevant to our decision making process. Balancing our reputation for careful evaluation of papers with the need to provide authors with faster turnaround is a continuing priority. As a (small) part of this process, we decided on a new policy on late revisions. In the ordinary course of events, we expect authors who are offered the opportunity to submit a revised version of their paper to return it within one year. If a paper is returned after two years, there will be no presumption that it will be sent to the original Co-Editor (especially if he/she has left the board) and, given the passage of time, the Co-Editor will be able to use his/her discretion in deciding how to reflect past referee reports and decisions in his/her decision process. There have been many changes in personnel. Eddie Dekel was an Associate Editor of Econometrica from 1996 to 2000, a Co-Editor from 2000 to 2003 and
Editor from 2003 to 2007. We are grateful to him for the extraordinary service that he has brought to the journal over the years and his excellent stewardship over the last four years. Stephen is particularly grateful for the organizational changes that he has made, putting the journal fully on the web and otherwise leaving it in perfect order. He has also gone beyond the call of duty in helping with the transition of editors and the editorial office.

Stephen is very grateful to David Levine for agreeing to extend his term for the 2007–2008 year. This also greatly assisted in the transition. The journal has benefitted from his breadth and wisdom during this year and his previous four-year term as Co-Editor. Daron Acemoglu and Stephen Morris joined the board in July 2007. Daron's appointment, and the implied expansion of the Editorial Board, signals our determination to strengthen the journal in more applied areas and particularly in applied theory. Stephen has greatly benefitted from his energy, breadth, and knowledge in pursuing this mission. Wolfgang Pesendorfer is replacing David Levine in July 2008. His depth and perspective will help compensate for some of the board members we have lost. We are very grateful to Whitney Newey for agreeing to extend his term for one year until June 30, 2008. Jim Stock has agreed to replace Whitney as Co-Editor in July 2009. Jim's combination of economic insight and econometric skills embodies the best Econometric Society traditions.

It is hard to overestimate the role played by the Associate Editors of Econometrica. This group maintains a tradition of writing referee reports for Econometrica of a consistently high standard, and referee reports by the Associate Editors play a disproportionate role in our decision making. We thank Bruce Hansen (University of Wisconsin–Madison), Guido Imbens (Harvard University), Michael Keane (University of Technology, Sydney), Benny Moldovanu (University of Bonn), and Thomas Palfrey (California Institute of Technology), who will not be continuing on the board; we are very grateful for all they have done for the journal. We are delighted that James Andreoni (University of California, San Diego), Marco Battaglini (Princeton University), Pierpaolo Battigalli (Università Bocconi), Dirk Bergemann (Yale University), Victor Chernozhukov (MIT), Haluk Ergin (Washington University in St. Louis), Susanne Schennach (University of Chicago), Jeroen Swinkels (Washington University in St. Louis), and Iván Werning (MIT) will be joining us. We are very grateful to those who have agreed to extend their service for another term: Jushan Bai (New York University), Darrell Duffie (Stanford University), Faruk Gul (Princeton University), Jinyong Hahn (University of California, Los Angeles), Per Krusell (Princeton University), Steven Matthews (University of Pennsylvania), Philip Reny (University of Chicago), Neil Shephard (Oxford University), and Elie Tamer (Northwestern University).

Our referees also maintain a tradition of writing referee reports to a remarkably high standard. We offer them our sincere gratitude for their willingness to invest their time to offer us their insightful views on the submissions we receive. Following this report we list those who advised us this year; we apologize to anyone who we have mistakenly omitted.
Editors would like to concentrate on editing. We are lucky to be able to do so because we have excellent professionals managing the editorial and publication process. On the editorial side, we set up a new editorial office for Econometrica at Princeton University with a new editorial assistant, Mary Beth Bellando. Mary Beth has quickly learned how Econometrica works better than the rest of us and has perfected the combination of firm resolve and charm that is crucial to keeping everything moving forward. We are very grateful for her skill and enthusiasm in making everything run smoothly. We were greatly assisted in setting up the editorial office by the previous editorial assistant in Tel Aviv, Yael Leshem, who not only left everything running smoothly, but has continued to provide remarkable backup assistance to us in the editorial office. We thank Princeton University for providing us with invaluable facilities and backup services, and, in particular, Matthew Parker (for technical support), Barbara Radvany (the Economics Department Manager), and Bo Honore (the Economics Department chair when we set up the office). We benefit from the help of all the Co-Editors’ assistants: Emily Gallagher, Susann Roethke, Susan Olmsted, Sharline Samuelson, Lauren Fahey, and Patricia Wong. John Rust and Sarbartha Bandyopadhyay of Editorial Express® continue to assist us by developing and maintaining the software we use for running the journal. On the publication side, the Managing Editor Geri Mattson and her staff at Mattson Publishing Services supervise a very smooth process. We appreciate the assistance of Elisabetta O’Connell and Zoë Cumming from Blackwell with the journal website. Vytas Statulevicius and his staff at VTEX do a superb job typesetting the journal. The Econometric Society in the form of its General Manager, Claire Sashi, and its executive Vice President, Rafael Repullo, oversee the production process and the management of our editorial process. We thank them for their efficiency in doing this as well as for their input and advice on running the journal. STEPHEN MORRIS DARON ACEMOGLU STEVE BERRY DAVID LEVINE WHITNEY NEWEY LARRY SAMUELSON HARALD UHLIG
Econometrica, Vol. 77, No. 1 (January, 2009), 347–355
THE ECONOMETRIC SOCIETY ANNUAL REPORTS ECONOMETRICA REFEREES 2007–2008
A Abadie, A. Abbink, K. Abdulkadiroglu, A. Abreu, D. Abrevaya, J. Ackerberg, D. Aguirregabiria, V. Ahn, D. Ahn, S. Ai, C. Akcigit, U. Akyol, A. Alant, Y. Albanesi, S.
Alesina, A. Ali, S. Allen, F. Al-Najjar, N. Alvarez, F. Amador, M. Anderlini, L. Andersen, T. Anderson, E. Anderson, S. Andreoni, J. Angeletos, G. Antras, P. Aoyagi, M.
Aradillas-Lopez, A. Arellano, M. Arkolakis, C. Asheim, G. Ashraf, Q. Atakan, A. Atakan, A. Athey, S. Attanasio, O. Austen-Smith, D. Ausubel, L. Azariadis, C.
B Bachmann, R. Back, K. Baeurle, G. Bagwell, K. Bahrman, J. Bajari, P. Bandi, F. Banerjee, A. Barbera, S. Barberis, N. Barlevy, G. Barndorff-Nielsen, O. Baruch, S. Battagini, M. Battigalli, P. Bellemare, C. Benabou, R. Benaim, M. Bencivenga, V. Benhabib, J.
Benkard, L. Ben-Porath, E. Bergemann, D. Bergstrom, T. Berk, J. Bertola, G. Besley, T. Bester, A. Bester, H. Bhattacharya, S. Bhattacharya, U. Biasis, B. Bikhchandani, S. Bisin, A. Blackorby, C. Board, O. Bolvin, J. Boldrin, M. Bollerslev, T. Bolton, G.
Bond, P. Borgers, T. Bossaerts, P. Bound, J. Bowles, S. Boylan, R. Brambilla, I. Brandenburger, A. Brandts, J. Brocas, I. Broner, F. Brown, C. Bruegemann, B. Brunnermeier, M. Brusco, S. Buchinsky, M. Buera, F. Burda, M. Burnside, C.
C Cabrales, A. Caillaud, B. Callander, S. Calsamiglia, C. Camera, G. Camerer, C. Campbell, J. Campello, M. Cao, C. Carbonell-Nicolau, O. Carlier, G. Carneiro, P. Carranza, J. Carrasco, M. Caselli, F. Celentani, M. Chade, H. Chamberlain, G. Chamley, C.
Chaney, T. Chao, J. Chari, V. Charness, G. Chassang, S. Chatterjee, K. Che, Y. Chen, S. Chen, X. Chernozhukov, V. Chesher, A. Chetty, R. Chew, S. Chiappori, P. Chintagunta, P. Chiu, W. Chone, P. Christiano, L. Chung, K.
Cocco, J. Cochrane, J. Cogley, T. Cole, H. Coles, M. Collar-Wexler, A. Conlon, J. Cooper, D. Cordoba, J. Corsetti, G. Costa-Gomes, M. Cox, J. Crampton, P. Crawford, V. Creane, A. Cremer, J. Cripps, M. Crocker, K. Cvitanic, J.
D Dahl, G. Dal Bo, E. Dal Bo, P. Danthine, J. Davidson, J. Davidson, R. Dayanik, S. de Castro, L. Deck, C. De Fiore, F. de Jong, R. Deleire, T.
Delgado, M. Della Vigna, S. De Loecker, J. Deng, Y. Detemple, J. Diebold, F. Diewert, W. Diks, C. Dillenberger, D. Dionne, G. Dominitz, J. Donald, S.
Doppelhofer, G. Doraszelski, U. Duclos, J. Dueker, M. Duffee, G. Dufour, J. Dufwenberg, M. Duggan, J. Durham, G. Dutta, B.
E Easley, D. Eaton, J. Eberly, J. Echenique, F. Eckel, C.
Eckwert, B. Egorov, G. Egulia, J. Ehlers, L. Eichengreen, B.
Eissa, N. Eizenberg, A. Ekmekci, M. Ellingsen, T. Elliott, G.
Ellison, G. Engel, E.
Eraslan, H.
Ergin, H.
F Faingold, E. Falk, A. Fama, E. Fan, Y. Fang, H. Farmer, R. Faust, J. Feddersen, T. Fehr-Duda, H. Feinberg, Y. Fiaschi, D.
Firpo, S. Fishman, A. Fishman, M. Flache, A. Fleurbaey, M. Flinn, C. Foelimer, H. Forchini, G. Forges, F. Fostel, A. Foster, D.
Foucault, T. Fox, J. Franco, A. Frankel, D. Frazer, G. French, E. Friedenberg, A. Friedman, D. Fuchs, W. Fuchs-Schundein, N. Fudenberg, D.
G Gajdos, T. Galenianos, M. Gallien, J. Gancia, G. Gavazza, A. Gavin, W. Gentzkow, M. Gerardi, D. Gerber, A. Gerlach, S. Gersbach, H. Gershkov, A. Gersovitz, M. Geweke, J. Ghirardato, P. Ghysels, E.
Giesecke, K. Giboa, I. Giraitis, L. Glacomini, R. Glaeser, E. Gneiting, T. Goeree, J. Goettler, R. Gollier, C. Gollsbee, A. Golosov, M. Gonzalez, F. Gosling, A. Gossner, O. Gourieroux, C. Gourinchas, P.
Govindan, S. Goyal, S. Graham, B. Grant, S. Green, E. Greenstone, M. Greenwood, J. Grinblatt, M. Grosskopf, B. Gruber, J. Guembel, A. Guerrieri, V. Guggenberger, P. Guvenen, F.
H Haider, S. Haile, P. Hakevy, Y. Hall, A. Hall, R. Haltiwanger, J.
Ham, J. Hamilton, B. Han, C. Hansen, C. Hansen, L. Hansen, P.
Harding, M. Harrington, J. Harrison, G. Harstad, R. Hart, S. Haruvy, E.
Hastings, J. Hautsch, N. Hawkins, W. Hayashi, T. Heifetz, A. Hellwig, C. Hellwig, M. Hendel, I. Hendricks, K. Hennig-Schmidt, H. Hey, J. Hill, J. Hillier, G.
Hirano, K. Hirshleifer, D. Ho, T. Hobijn, B. Hoderlein, S. Holden, R. Holmes, T. Hong, H. Hong, Y. Horner, J. Horowitz, J. Horst, U. Hortacsu, A.
House, C. Houser, D. Howitt, P. Hu, Y. Huck, S. Huddart, S. Huggett, M. Hummel, P. Hunt, J. Hurst, E. Hyslop, D.
I Iaryczower, M. Imkeller, P.
Ioannides, Y. Iriberri, N.
Ishii, J.
J Jackson, M. Jacod, J. Jaffee, D. Jagannathan, R. James, D. Jansson, M. Jerez, B.
Jermann, U. Jewitt, I. Jia, P. Jiang, G. Johansen, S. Johnson, C. Jones, B.
Jones, C. Jordan, J. Jordan, P. Jouini, E. Jovanovic, B. Judd, K. Juillard, M.
K Kagel, J. Kahn, C. Kalai, G. Kamihigashi, T. Kaniel, R. Kanwar, S. Kapetanios, G. Kariv, S. Kartik, N. Kasa, K. Katok, E. Kaya, A. Kehoe, P.
Kehoe, T. Keller, G. Kelsey, D. Keursteiner, G. Khan, S. Kim, D. Kim, J. Kimball, M. Kinnan, C. Kircher, P. Kiyotaki, N. Kleibergen, F. Klein, P.
Kocherlakota, N. Koenker, R. Kogan, L. Kojima, F. Kopulov, I. Koszegi, B. Krammarz, F. Kranton, R. Krasnokutskaya, E. Kremer, I. Kremer, M. Krishna, V. Krueger, D.
Kubler, F.
Kultti, K.
Kuruscu, B. L
Lagos, R. Laibson, D. Lambson, V. Landsberger, M. Lange, A. Lange, F. Laroque, G. Laslier, J. Lau, M. Leahy, J. Lee, I. Lee, S. Legros, P. Lehrer, E.
Leitner, Y. Lettau, M. Levin, A. Levin, D. Levin, J. Levy, G. Lewbel, A. Lewis, G. Li, N. Li, T. Ligon, E. Linn, J. Lise, J. List, J.
Liu, H. Liu, Q. Ljungqvist, L. Llanes, G. Loewenstein, M. Longstaff, F. Lorenzoni, G. Lugosi, G. Luo, Y. Lustig, H. Lustig, J. Luttmer, E.
M Maccheroni, F. Machina, M. MacLeod, W. Mahajan, A. Mammen, E. Mandler, M. Manelli, A. Manning, A. Manovskii, I. Manski, C. Mariano, R. Marinacci, M. Mariotti, M. Mariotti, T. Mark, N. Markellos, R. Masatlioglu, Y. Matouschek, N. Matsushima, H. Matsuyama, K. McAdams, D.
McGrattan, E. Mclean, R. Mclennan, A. Mechaelides, A. Meddahi, N. Meikisz, J. Meirowitz, A. Melitz, M. Menager, L. Meng, Q. Merlo, A. Merz, M. Meyer, B. Meyer-ter-Vehn, M. Mezzetti, C. Midrigan, V. Mikusheva, A. Milgrom, P. Miller, D. Miller, R. Miyagawa, E.
Moav, O. Mobius, M. Moen, E. Molinari, F. Mookherjee, D. Moon, H. Moreira, M. Moreno-Ternero, J. Moscarni, G. Moser, D. Moulin, H. Mroz, T. Mueller, U. Mukerji, S. Munro, A. Munshi, K. Murto, P. Mykland, P. Mylovanov, T.
N Nachbar, J. Nau, R. Neeman, Z. Nehring, K.
Neilson, W. Neumark, D. Nevo, A. Newman, A.
Ng, S. Noor, J. Nousair, C. Nowak, A.
O Obara, I. O’Donoghue, T. Ohanian, L. Okui, R. Olken, B. Onatski, A.
O’Neill, B. Oomen, R. Oreopoulos, P. Ortalo-Magne, F. Osborne, M. Oster, E.
Ostroy, J. Otsu, R. Ottaviani, M. Otter, P. Ozbay, E.
P Pakes, A. Palacios-Huerta, I. Palfrey, T. Palomino, F. Pande, R. Pavan, A. Pavcnik, N. Pearce, D. Peck, J. Peeters, R. Peng, L. Pepper, J. Peretto, P. Perez-Castrillo, D.
Peri, G. Perni, F. Perrigne, I. Persico, N. Pesaran, H. Pesendorfer, M. Peski, M. Peters, M. Petrin, A. Phelan, C. Pinkse, J. Piqueira, N. Pirlot, M. Pissarides, C.
Ploberger, W. Podczeck, K. Poirer, D. Polak, B. Porter, J. Porter, R. Posterl-Vinay, F. Potscher, B. Powell, J. Preston, I. Pritsker, M. Prucha, I. Puppe, C.
Q Quadrini, V. Quah, J.
Quiggin, J.
Quinzii, M.
R Rady, S. Rahbek, A. Ramey, G. Ramey, V.
Rampini, A. Rangel, A. Ray, D. Rayo, L.
Razin, R. Reinganum, J. Reiter, M. Repullo, R.
Restuccia, D. Rey, P. Ridder, G. Riedel, F. Rios-Rull, J. Ritschl, A. Ritzberger, K. Robin, J.
Robins, J. Robinson, J. Robinson, P. Robson, A. Rochet, J. Roemer, J. Rogers, B. Rosen, A.
Rottenstreich, Y. Roughgarden, T. Rubinstein, A. Rudanko, L. Rust, J. Rutstrom, E. Ryan, S.
S Sadowski, P. Sahin, A. Saks, R. Salanie, B. Salant, S. Salmon, T. Sandholm, W. Santos, M. Sappington, D. Sargent, T. Sarver, T. Sattinger, M. Sauer, R. Scheinkman, J. Schenk-Hoppe, K. Schennach, S. Scherbina, A. Schlee, E. Schmedders, K. Schmeidler, D. Schmidt, K. Schmidt, P. Scholl, A. Schotter, A. Schram, A. Schuendeln, M. Schulhofer-Wohl, S. Schultz, P. Schummer, J.
Schwarz, M. Seater, J. Segal, I. Semenov, A. Semmler, W. Sentana, E. Seo, K. Serizawa, S. Serrano, R. Seshadri, A. Sethuraman, J. Shearer, B. Shi, S. Shimotsu, K. Shin, H. Shmaya, E. Shneyerov, A. Shore, S. Shorrocks, A. Shotts, K. Shum, M. Siconolfi, P. Sieg, H. Sinai, R. Sinclair-Desgagne, B. Singleton, K. Sjostrom, T. Skeie, D. Sleet, C.
Smith, L. Smorodinsky, R. Sobel, J. Sonmez, T. Sonsino, D. Sorensen, A. Sorensen, P. Sorger, G. Sowell, F. Spiegel, Y. Spiegler, R. Sprumont, Y. Stacchetti, E. Steel, M. Stewart, C. Stinchombe, M. Stock, J. Stoker, T. Stoye, J. Strausz, R. Strzalecki, T. Sturmfels, B. Sun, Y. Swanson, N. Swinkels, J. Syverson, C. Szeidl, A. Szentes, B.
T Tabellini, G. Tabor, C. Tadelis, S. Tahman, D. Takahashi, S. Tallon, J. Taub, B. Tchistyi, A.
Thisse, J. Thoenig, M. Thomas, J. Tian, G. Titman, S. Todd, P. Towe, C. Townsend, R.
Toxvaerd, F. Trebbi, F. Trefler, D. Tripathi, G. Troger, T. Tsyvinski, A. Turner, S.
U Ui, T.
Unver, U.
Urzua, S. V
Valente, S. Valimaki, J. Van Biesebroeck, J. van der Klaauw, B. van Soest, A. Van Zandt, T. Vega-Redondo, F.
Veldkamp, L. Veronesi, P. Vetta, A. Viceira, L. Vieille, N. Vincent, D. Violante, G.
Vives, X. Vogel, J. Vogelsang, T. Vohra, R. Vovk, V. Vuong, Q. Vytlacil, E.
W Wachter, J. Waldman, M. Wang, J. Wang, R. Washington, E. Watanabe, M. Watson, M. Watts, A. Weale, M. Weber, R.
Weibull, J. Weinstein, J. Werner, J. Werning, I. Wettstein, D. Weymark, J. Whinston, M. Wieland, V. Wilcox, N. Wilson, A. X
Xiao, Z.
Xiong, W.
Winter, S. Wiseman, T. Wolak, F. Wolfers, J. Wooders, M. Woodland, A. Worrall, T. Woutersen, T. Wright, R.
Y Yan, J. Yannelis, N. Yared, P.
Yaron, A. Yatchew, A. Yechiam, E.
Yildiz, M. Yogo, M. Yokoo, M.
Z Zaffaroni, P. Zame, W. Zapatero, F.
Zhang, B. Zhang, J. Zhang, L.
Zhu, S. Zinde-Walsh, V.
Econometrica, Vol. 77, No. 1 (January, 2009), 357–359
THE ECONOMETRIC SOCIETY ANNUAL REPORTS REPORT OF THE EDITORS OF THE MONOGRAPH SERIES THE GOAL OF THE SERIES is to promote the publication of high-quality research works in the fields of economic theory, econometrics, and quantitative economics more generally. Publications may range from more or less extensive accounts of the state of the art in a field to which the authors have made significant contributions to shorter monographs that represent important advances on more specific issues. In addition to the usual promotion by the publisher (Cambridge University Press) in their advertising and displays at conferences, it also arranges for members of the Econometric Society to receive monographs at a special discount. The publishing arrangement with Cambridge University Press specifies that the reviewing process and the decision to publish a monograph in the series rests solely in the hands of the Editors appointed by the Society, in the same way as for papers submitted to Econometrica. Our experience shows that this procedure generates quite valuable services to the authors. Referee reports are usually very professional, and contain detailed and specific suggestions on how to improve the manuscript. Such services, which are not normally offered by private publishing companies, are among the features that distinguish the Monograph Series of the Society from others. The complete list of publications in the series follows; the original publication dates of the hardcover (HC) and paperback (PB) versions are given, and an asterisk (*) indicates that a particular paper version is now out of print. All 44 monographs are now available for electronic purchase, and are available online to Econometric Society members free of charge. 1. W. Hildenbrand (ed.), Advances in Economic Theory, *HC:2/83, *PB:8/85. 2. W. Hildenbrand (ed.), Advances in Econometrics, *HC:2/83, *PB:8/85. 3. G. S. Maddala, Limited Dependent and Qualitative Variables in Econometrics, HC:3/83, PB:6/86. 4. G. Debreu, Mathematical Economics, HC:7/83, PB:10/86. 5. J.-M. Grandmont, Money and Value, *HC:11/83, *PB:9/85. 6. F. M. Fisher, Disequilibrium Foundations of Equilibrium Economics, HC:11/83, PB:3/89. 7. B. Peleg, Game Theoretic Analysis of Voting in Committees, *HC:7/84. 8. R. J. Bowden and D. A. Turkington, Instrumental Variables, *HC:1/85, *PB:1/90. Second edition in process. 9. A. Mas-Colell, The Theory of General Economic Equilibrium: A Differentiable Approach, HC:8/85, PB:1/90. 10. J. Heckman and B. Singer, Longitudinal Analysis of Labor Market Data, *HC:10/85. 11. C. Hsiao, Analysis of Panel Data, *HC:7/86, *PB:11/89. 12. T. Bewley (ed.), Advances in Economic Theory: Fifth World Congress, *HC:8/87, *PB:7/89. 13. T. Bewley (ed.), Advances in Econometrics: Fifth World Congress, Vol. I, HC:11/ 87, PB:4/94. 14. T. Bewley (ed.), Advances in Econometrics: Fifth World Congress, Vol. II, HC:11/87, PB:4/94. © 2009 The Econometric Society
15. H. Moulin, Axioms of Cooperative Decision-Making, HC:11/88, PB:7/91.
16. L. G. Godfrey, Misspecification Tests in Econometrics, HC:2/89, PB:7/91.
17. T. Lancaster, The Econometric Analysis of Transition Data, HC:9/90, PB:6/92.
18. A. Roth and M. Sotomayor, Two-Sided Matching, HC:9/90, PB:6/92.
19. W. Härdle, Applied Nonparametric Regression Analysis, HC:10/90, PB:1/92.
20. J.-J. Laffont (ed.), Advances in Economic Theory: Sixth World Congress, Vol. I, HC:12/92, PB:2/95.
21. J.-J. Laffont (ed.), Advances in Economic Theory: Sixth World Congress, Vol. II, HC:12/92, PB:2/95.
22. H. White, Inference, Estimation and Specification Analysis, HC:9/94, PB:6/96.
23. C. Sims (ed.), Advances in Econometrics: Sixth World Congress, Vol. I, HC:3/94, PB:3/96.
24. C. Sims (ed.), Advances in Econometrics: Sixth World Congress, Vol. II, HC:10/94, PB:3/96.
25. R. Guesnerie, A Contribution to the Pure Theory of Taxation, HC:9/95, PB:9/98.
26. D. M. Kreps and K. F. Wallis (eds.), Advances in Economics and Econometrics: Theory and Applications (Seventh World Congress), Vol. I, HC:2/97, PB:2/97.
27. D. M. Kreps and K. F. Wallis (eds.), Advances in Economics and Econometrics: Theory and Applications (Seventh World Congress), Vol. II, HC:2/97, PB:2/97.
28. D. M. Kreps and K. F. Wallis (eds.), Advances in Economics and Econometrics: Theory and Applications (Seventh World Congress), Vol. III, HC:2/97, PB:2/97.
29. D. Jacobs, E. Kalai, and M. Kamien (eds.), Frontiers of Research in Economic Theory—The Nancy L. Schwartz Memorial Lectures, HC & PB:11/98.
30. A. C. Cameron and P. K. Trivedi, Regression Analysis of Count Data, HC & PB:9/98.
31. S. Strøm (ed.), Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium, HC & PB:2/99.
32. E. Ghysels, N. Swanson, and M. Watson (eds.), Essays in Econometrics—Collected Papers of Clive W. J. Granger, Vol. I, HC & PB:7/01.
33. E. Ghysels, N. Swanson, and M. Watson (eds.), Essays in Econometrics—Collected Papers of Clive W. J. Granger, Vol. II, HC & PB:7/01.
34. M. Dewatripont, L. P. Hansen, and S. J. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications (Eighth World Congress), Vol. I, HC:2/03, PB:2/03.
35. M. Dewatripont, L. P. Hansen, and S. J. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications (Eighth World Congress), Vol. II, HC:2/03, PB:2/03.
36. M. Dewatripont, L. P. Hansen, and S. J. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications (Eighth World Congress), Vol. III, HC:2/03, PB:2/03.
37. C. Hsiao, Analysis of Panel Data: Second Edition, HC & PB:2/03.
38. R. Koenker, Quantile Regression, HC & PB:5/05.
39. C. Blackorby, W. Bossert, and D. Donaldson, Population Issues in Social Choice Theory, Welfare Economics and Ethics, HC & PB:11/05.
40. J. Roemer, Democracy, Education, and Equality, HC & PB:1/06.
41. R. Blundell, W. Newey, and T. Persson, Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. I, HC & PB:8/06.
42. R. Blundell, W. Newey, and T. Persson, Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. II, HC & PB:8/06.
43. R. Blundell, W. Newey, and T. Persson, Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. III, HC & PB:3/07.
44. F. Vega-Redondo, Complex Social Networks, HC & PB:3/07.

The editors welcome submissions of high-quality manuscripts, as well as inquiries from prospective authors at an early stage of planning their monographs. Note that the series now includes shorter, more focussed manuscripts on the order of 100 to 150 pages, as well as the traditional longer series of 200 to 300 or more pages. Information on submissions can be found on the society website. Andrew Chesher is the Co-Editor responsible for econometrics and George Mailath is the Co-Editor responsible for the economic theory side of the series.

ANDREW CHESHER
GEORGE MAILATH
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA 1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society’s website: http://www.econometricsociety.org/submissions.asp Only electronic submissions will be accepted. In exceptional cases for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A. 2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing’s website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees. 3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or if he does so during the term in which Econometrica is considering the manuscript, then it is the author’s responsibility to provide Econometrica with details. There is no page fee; nor is any payment made to the authors. 4. Econometrica has the policy that all empirical and experimental results as well as simulation experiments must be replicable. For this purpose the Journal editors require that all authors submit datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so such users must acknowledge all authors as the original creators and Econometrica as the original publishers. If you have compelling reason we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that detailed data description and the programs used to generate the estimation datasets are deposited, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission. 
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage should be stated clearly when the paper is first submitted for review. In each case it will be at the editors’ discretion whether the paper can be reviewed. 5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less. 6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand that some papers must be longer, if the main body of a manuscript (excluding appendices) exceeds this length, it will typically be rejected without review.

7. Additional information that may be of use to authors is contained in the “Manual for Econometrica Authors, Revised,” written by Drew Fudenberg and Dorothy Hodges and published in the July 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftsmanship. One change from the procedures discussed in that document is that authors are not immediately told which co-editor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc., need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5- or double-spaced and clearly organized.

8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords.

9. If you plan to submit a comment on an article that has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if the author does not respond to you after a reasonable amount of time, indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment.

10. Manuscripts on experimental economics should adhere to the “Guidelines for Manuscripts on Experimental Economics” written by Thomas Palfrey and Robert Porter and published in the July 1991 issue of Econometrica.

Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA.

Copyright © 2009 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Posting of an article on the author’s own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email: [email protected].

Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331.
Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org

Membership, Subscriptions, and Claims
Membership, subscriptions, and claims are handled by Blackwell Publishing, P.O. Box 1269, 9600 Garsington Rd., Oxford, OX4 2ZE, U.K.; Tel. (+44) 1865-778171; Fax (+44) 1865-471776; Email [email protected]. North American members and subscribers may write to Blackwell Publishing, Journals Department, 350 Main St., Malden, MA 02148, USA; Tel. 781-3888200; Fax 781-3888232. Credit card payments can be made at www.econometricsociety.org. Please make checks/money orders payable to Blackwell Publishing. Memberships and subscriptions are accepted on a calendar-year basis only; however, the Society welcomes new members and subscribers at any time of the year and will promptly send any missed issues published earlier in the same calendar year.

Individual Membership Rates
                                                                     $a      €b      £c   Concessionaryd
Ordinary Member 2009, Print + Online, 1933 to date                 $60     €40     £32        $45
Ordinary Member 2009, Online only, 1933 to date                    $25     €18     £14        $10
Student Member 2009, Print + Online, 1933 to date                  $45     €30     £25        $45
Student Member 2009, Online only, 1933 to date                     $10      €8      £6        $10
Ordinary Member—3 years (2009–2011), Print + Online, 1933 to date  $175    €115    £92
Ordinary Member—3 years (2009–2011), Online only, 1933 to date     $70     €50     £38

Subscription Rates for Libraries and Other Institutions
                                                                     $a      €b      £c   Concessionaryd
Premium 2009, Print + Online, 1999 to date                          $550    €360    £290      $50
Online 2009, Online only, 1999 to date                              $500    €325    £260      Free

a All countries, excluding U.K., Euro area, and countries not classified as high income economies by the World Bank (http://www.worldbank.org/data/countryclass/classgroups.htm), pay the US$ rate. High income economies are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Canadian customers will have 6% GST added to the prices above.
b Euro area countries only.
c UK only.
d Countries not classified as high income economies by the World Bank only.

Back Issues
Single issues from the current and previous two volumes are available from Blackwell Publishing; see address above. Earlier issues from 1986 (Vol. 54) onward may be obtained from Periodicals Service Co., 11 Main St., Germantown, NY 12526, USA; Tel. 518-5374700; Fax 518-5375899; Email [email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics
Founded December 29, 1930
Website: www.econometricsociety.org
Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487
General Manager: Claire Sashi ([email protected])

2008 OFFICERS
ROGER B. MYERSON, University of Chicago, PRESIDENT
JOHN MOORE, University of Edinburgh and London School of Economics, FIRST VICE-PRESIDENT
BENGT HOLMSTRÖM, Massachusetts Institute of Technology, SECOND VICE-PRESIDENT
TORSTEN PERSSON, Stockholm University, PAST PRESIDENT
RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2008 COUNCIL
(*)DARON ACEMOGLU, Massachusetts Institute of Technology
MANUEL ARELLANO, CEMFI
SUSAN ATHEY, Harvard University
ORAZIO ATTANASIO, University College London
(*)TIMOTHY J. BESLEY, London School of Economics
KENNETH BINMORE, University College London
TREVOR S. BREUSCH, Australian National University
DAVID CARD, University of California, Berkeley
JACQUES CRÉMER, Toulouse School of Economics
(*)EDDIE DEKEL, Tel Aviv University and Northwestern University
MATHIAS DEWATRIPONT, Free University of Brussels
DARRELL DUFFIE, Stanford University
HIDEHIKO ICHIMURA, University of Tokyo
MATTHEW O. JACKSON, Stanford University
LAWRENCE J. LAU, The Chinese University of Hong Kong
CESAR MARTINELLI, ITAM
HITOSHI MATSUSHIMA, University of Tokyo
MARGARET MEYER, University of Oxford
PAUL R. MILGROM, Stanford University
STEPHEN MORRIS, Princeton University
ADRIAN R. PAGAN, Queensland University of Technology
JOON Y. PARK, Texas A&M University and Sungkyunkwan University
CHRISTOPHER A. PISSARIDES, London School of Economics
ROBERT PORTER, Northwestern University
ALVIN E. ROTH, Harvard University
LARRY SAMUELSON, Yale University
ARUNAVA SEN, Indian Statistical Institute
MARILDA SOTOMAYOR, University of São Paulo
JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editor, and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES
Australasia: Trevor S. Breusch, Australian National University, CHAIR; Maxwell L. King, Monash University, SECRETARY.
Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER.
Far East: Joon Y. Park, Texas A&M University and Sungkyunkwan University, CHAIR.
Latin America: Pablo Andres Neumeyer, Universidad Torcuato Di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY.
North America: Roger B. Myerson, University of Chicago, CHAIR; Claire Sashi, New York University, SECRETARY.
South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.