CONTENTS MICHAEL PETERS: Noncontractible Heterogeneity in Directed Search . . . . . . . . . . . . . . . . .
1173
MARTIN F. HELLWIG: Incentive Problems With Unidimensional Hidden Characteristics:
A Unified Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ELHANAN HELPMAN, OLEG ITSKHOKI, AND STEPHEN REDDING: Inequality and Unemployment in a Global Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . HALUK ERGIN AND TODD SARVER: A Unique Costly Contemplation Representation . . . . KAREEN ROZEN: Foundations of Intrinsic Habit Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . ADRIAN BRUHIN, HELGA FEHR-DUDA, AND THOMAS EPPER: Risk and Rationality: Uncovering Heterogeneity in Probability Distortion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . STEVEN D. LEVITT, JOHN A. LIST, AND DAVID H. REILEY: What Happens in the Field Stays in the Field: Exploring Whether Professionals Play Minimax in Laboratory Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1201 1239 1285 1341 1375
1413
NOTES AND COMMENTS: ASEN IVANOV, DAN LEVIN, AND MURIEL NIEDERLE: Can Relaxation of Beliefs Ratio-
nalize the Winner’s Curse?: An Experimental Study . . . . . . . . . . . . . . . . . . . . . . . ANNOUNCEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FORTHCOMING PAPERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FELLOWS OF THE ECONOMETRIC SOCIETY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SUBMISSION OF MANUSCRIPTS TO THE ECONOMETRIC SOCIETY MONOGRAPH SERIES . . . . . . . . . .
VOL. 78, NO. 4 — July, 2010
1435 1453 1455 1457 1489
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org EDITOR STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] MANAGING EDITOR GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; mattsonpublishingservices@ comcast.net CO-EDITORS DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 021421347, U.S.A.;
[email protected] PHILIPPE JEHIEL, Dept. of Economics, Paris School of Economics, 48 Bd Jourdan, 75014 Paris, France; University College London, U.K.;
[email protected] WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] JEAN-MARC ROBIN, Dept. of Economics, Sciences Po, 28 rue des Saints Pères, 75007 Paris, France; University College London, U.K.;
[email protected] JAMES H. STOCK, Dept. of Economics, Harvard University, Littauer M-24, 1830 Cambridge Street, Cambridge, MA 02138, U.S.A.;
[email protected] ASSOCIATE EDITORS YACINE AÏT-SAHALIA, Princeton University JOSEPH G. ALTONJI, Yale University JAMES ANDREONI, University of California, San Diego JUSHAN BAI, Columbia University MARCO BATTAGLINI, Princeton University PIERPAOLO BATTIGALLI, Università Bocconi DIRK BERGEMANN, Yale University YEON-KOO CHE, Columbia University XIAOHONG CHEN, Yale University VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology DARRELL DUFFIE, Stanford University JEFFREY ELY, Northwestern University HALUK ERGIN, Duke University JIANQING FAN, Princeton University MIKHAIL GOLOSOV, Yale University FARUK GUL, Princeton University JINYONG HAHN, University of California, Los Angeles PHILIP A. HAILE, Yale University JOHANNES HÖRNER, Yale University MICHAEL JANSSON, University of California, Berkeley PER KRUSELL, Stockholm University FELIX KUBLER, University of Zurich OLIVER LINTON, London School of Economics
BART LIPMAN, Boston University THIERRY MAGNAC, Toulouse School of Economics (GREMAQ and IDEI) DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse STEVEN A. MATTHEWS, University of Pennsylvania ROSA L. MATZKIN, University of California, Los Angeles SUJOY MUKERJI, University of Oxford LEE OHANIAN, University of California, Los Angeles WOJCIECH OLSZEWSKI, Northwestern University NICOLA PERSICO, New York University JORIS PINKSE, Pennsylvania State University BENJAMIN POLAK, Yale University PHILIP J. RENY, University of Chicago SUSANNE M. SCHENNACH, University of Chicago ANDREW SCHOTTER, New York University NEIL SHEPHARD, University of Oxford MARCIANO SINISCALCHI, Northwestern University JEROEN M. SWINKELS, Northwestern University ELIE TAMER, Northwestern University EDWARD J. VYTLACIL, Yale University IVÁN WERNING, Massachusetts Institute of Technology ASHER WOLINSKY, Northwestern University
EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.;
[email protected] Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA 1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society’s website: http://www.econometricsociety.org/submissions.asp Only electronic submissions will be accepted. In exceptional cases for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A. 2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing’s website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees. 3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or if he does so during the term in which Econometrica is considering the manuscript, then it is the author’s responsibility to provide Econometrica with details. There is no page fee; nor is any payment made to the authors. 4. Econometrica has the policy that all empirical and experimental results as well as simulation experiments must be replicable. For this purpose the Journal editors require that all authors submit datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so such users must acknowledge all authors as the original creators and Econometrica as the original publishers. If you have compelling reason we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that detailed data description and the programs used to generate the estimation datasets are deposited, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission. 
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage should be stated clearly when the paper is first submitted for review. In each case it will be at the editors’ discretion whether the paper can be reviewed. 5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less. 6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand some papers must be longer, if the main body of a manuscript (excluding appendices) is more than the aforementioned length, it will typically be rejected without review. 7. Additional information that may be of use to authors is contained in the “Manual for Econometrica Authors, Revised” written by Drew Fudenberg and Dorothy Hodges, and published in the July, 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftmanship. One change from the procedures discussed in this document is that authors are not immediately told which coeditor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized. 8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords. 9. If you plan to submit a comment on an article which has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, then indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment. 10. Manuscripts on experimental economics should adhere to the “Guidelines for Manuscripts on Experimental Economics” written by Thomas Palfrey and Robert Porter, and published in the July, 1991 issue of Econometrica. Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA. Copyright © 2010 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Posting of an article on the author’s own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email:
[email protected]. Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Membership Joining the Econometric Society, and paying by credit card the corresponding membership rate, can be done online at www.econometricsociety.org. Memberships are accepted on a calendar year basis, but the Society welcomes new members at any time of the year, and in the case of print subscriptions will promptly send all issues published earlier in the same calendar year. Membership Benefits • Possibility to submit papers to Econometrica, Quantitative Economics, and Theoretical Economics • Possibility to submit papers to Econometric Society Regional Meetings and World Congresses • Full text online access to all published issues of Econometrica (Quantitative Economics and Theoretical Economics are open access) • Full text online access to papers forthcoming in Econometrica (Quantitative Economics and Theoretical Economics are open access) • Free online access to Econometric Society Monographs, including the volumes of World Congress invited lectures • Possibility to apply for travel grants for Econometric Society World Congresses • 40% discount on all Econometric Society Monographs • 20% discount on all John Wiley & Sons publications • For print subscribers, hard copies of Econometrica, Quantitative Economics, and Theoretical Economics for the corresponding calendar year Membership Rates Membership rates depend on the type of member (ordinary or student), the class of subscription (print and online or online only) and the country classification (high income or middle and low income). The rates for 2010 are the following:
Ordinary Members Print and Online Online only Print and Online Online only
1 year (2010) 1 year (2010) 3 years (2010–2012) 3 years (2010–2012)
Student Members Print and Online Online only
1 year (2010) 1 year (2010)
High Income
Other Countries
$90 / €65 / £55 $50 / €35 / £30 $216 / €156 / £132 $120 / €84 / £72
$50 $10 $120 $24
$50 / €35 / £30 $10 / €7 / £6
$50 $10
Euro rates are for members in Euro area countries only. Sterling rates are for members in the UK only. All other members pay the US dollar rate. Countries classified as high income by the World Bank are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Croatia, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Institutional Subscriptions Information on Econometrica subscription rates for libraries and other institutions is available at www.econometricsociety.org. Subscription rates depend on the class of subscription (print and online or online only) and the country classification (high income, middle income, or low income). Back Issues and Claims For back issues and claims contact Wiley Blackwell at
[email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487 General Manager: Claire Sashi (
[email protected]) 2010 OFFICERS JOHN MOORE, University of Edinburgh and London School of Economics, PRESIDENT BENGT HOLMSTRÖM, Massachusetts Institute of Technology, FIRST VICE-PRESIDENT JEAN-CHARLES ROCHET, Toulouse School of Economics, SECOND VICE-PRESIDENT ROGER B. MYERSON, University of Chicago, PAST PRESIDENT RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2010 COUNCIL DARON ACEMOGLU, Massachusetts Institute of Technology MANUEL ARELLANO, CEMFI SUSAN ATHEY, Harvard University ORAZIO ATTANASIO, University College London DAVID CARD, University of California, Berkeley JACQUES CRÉMER, Toulouse School of Economics (*)EDDIE DEKEL, Tel Aviv University and Northwestern University MATHIAS DEWATRIPONT, Free University of Brussels DARRELL DUFFIE, Stanford University GLENN ELLISON, Massachusetts Institute of Technology HIDEHIKO ICHIMURA, University of Tokyo (*)MATTHEW O. JACKSON, Stanford University MICHAEL P. KEANE, University of Technology Sydney LAWRENCE J. LAU, Chinese University of Hong Kong
CESAR MARTINELLI, ITAM ANDREW MCLENNAN, University of Queensland ANDREU MAS-COLELL, Universitat Pompeu Fabra and Barcelona GSE AKIHIKO MATSUI, University of Tokyo HITOSHI MATSUSHIMA, University of Tokyo MARGARET MEYER, University of Oxford PAUL R. MILGROM, Stanford University STEPHEN MORRIS, Princeton University JUAN PABLO NICOLINI, Universidad Torcuato di Tella CHRISTOPHER A. PISSARIDES, London School of Economics (*)ROBERT PORTER, Northwestern University JEAN-MARC ROBIN, Sciences Po and University College London LARRY SAMUELSON, Yale University ARUNAVA SEN, Indian Statistical Institute JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editors of Econometrica (Stephen Morris), Quantitative Economics (Orazio Attanasio), and Theoretical Economics (Martin J. Osborne), and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES Australasia: Andrew McLennan, University of Queensland, CHAIR; Maxwell L. King, Monash University, SECRETARY. Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER. Far East: Hidehiko Ichimura, University of Tokyo, CHAIR. Latin America: Juan Pablo Nicolini, Universidad Torcuato di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY. North America: Bengt Holmström, Massachusetts Institute of Technology, CHAIR; Claire Sashi, New York University, SECRETARY. South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.
Econometrica, Vol. 78, No. 4 (July, 2010), 1173–1200
NONCONTRACTIBLE HETEROGENEITY IN DIRECTED SEARCH BY MICHAEL PETERS1 This paper provides a directed search model designed to explain the residual part of wage variation left over after the impact of education and other observable worker characteristics have been removed. Workers have private information about their characteristics at the time they apply for jobs. Firms value these characteristics differently and can observe them once workers apply. They hire the worker they most prefer. However, the characteristics are not contractible, so firms cannot condition their wages on them. This paper shows how to extend arguments from directed search to handle this, allowing for arbitrary distributions of worker and firm types. The model is used to provide a functional relationship that ties together the wage distribution and the wage– duration function. This relationship provides a testable implication of the model. This relationship suggests a common property of wage distributions that guarantees that workers who leave unemployment at the highest wages also have the shortest unemployment duration. This is in strict contrast to the usual (and somewhat implausible) directed search story in which high wages are always accompanied by higher probability of unemployment. KEYWORDS: Directed search, heterogeneous workers, limits of equilibrium.
1. INTRODUCTION THIS PAPER PROVIDES a directed search model in which worker and firm characteristics differ, but where firms cannot condition the wages they pay on worker characteristics as they can in papers like Shi (2002) or Shimer (2005). An example of such characteristics might be reference letters that convey a lot of information about an applicant’s skill as long as they are not contractible. Another example might be connections and friendships that workers have with managers or just with other workers in the industry. These connections typically cannot be verified in any way that would be satisfactory in a formal contract. Alternatively, firms may care a lot about worker characteristics on which they are not allowed to condition wages, for example, whether or not the worker has a criminal record or a history of union activism. Workers know their own characteristics at the time they apply for jobs, but they do not know the characteristics of other workers who might apply. Firms value these characteristics differently and can observe these characteristics once workers apply. Once they have collected a bunch of applications, they hire the worker they most prefer. The paper shows how to extend directed search to handle this, allowing for arbitrary distributions of worker and firm types. More broadly, this approach provides a method to understand the variation in wages that cannot be attributed to observable characteristics. The basic logic of the model ties together 1 I am grateful to Daron Acemoglu, Rob Shimer, and a number of referees for substantive and expositional help. The work on this paper was funded by SSHRC.
© 2010 The Econometric Society
DOI: 10.3982/ECTA8379
1174
MICHAEL PETERS
the wage distribution and the unemployment duration function, that is, the relationship between the wage at which a worker leaves unemployment and his duration. This relationship provides a potentially testable implication of the model. The relationship between the wage–duration function and the wage distribution developed in this paper makes it possible to examine one of the key predictions of directed search in models where wages cannot be conditioned on worker type—workers who submit applications to high wage firms should expect to be hired by those firms with low probability.2 There is not a lot of evidence about this central prediction, however, what there is does not seem to support it. Addison, Centeno, and Portugal (2004), for example, provided some evidence to suggest that the wages at which workers leave unemployment and the duration of their unemployment spell are negatively correlated. The evidence is not strong, but it certainly provides no support at all for the classical prediction of directed search.3 The argument below illustrates that the relationship between wage and unemployment duration is driven by two considerations. The first is completely intuitive: higher wage firms will tend to hire workers whose (externally unobservable) quality is higher, and these workers will tend to be more likely to find jobs no matter where they apply. This creates a positive relationship between quality, employment probability, and wage. This is confounded by the fact that higher quality workers will tend to use different application strategies than low quality workers. In particular, they will tend to apply at high wage jobs along with a lot of other high quality workers. This effect leads in the opposite direction, lowering the probability with which high quality workers will be hired. This is where the directed search model plays a role, since it ties down the application strategy for workers of different qualities. The characterization of the 2 For example, Peters (2000), Lang, Manove, and Dickens (1999), Eeckhout and Kircher (2008), Acemoglu and Shimer (2000), or Shi (2009) all generate equilibrium wage dispersion for which wage and employment probability are related in this way. Of course, when wages can be made conditional on type, as in Shi (2002) or Shimer (2005), workers who receive higher wages are being compensated for having a more valuable type, not for bearing risk, so no such relationship would be expected in these models. 3 Many models of directed search assume that workers and firms are identical, so this assertion is not so much a prediction as it is a statement about what happens out of equilibrium. However, there are many models that support distributions of wages in equilibrium. For example, Shi (2002), Shimer (2005), and Lang, Manove, and Dickens (1999) all allowed for a finite number of different types of workers. In Peters (2000), workers are identical, but there is a continuum of different types of firms. In Delacroix and Shi (2006) and Shi (2009), workers equilibrium wages (and search strategies) depend on their employment history. Albrecht, Gautier, and Vroman (2006) or Galenianos and Kircher (2005) supported wage distributions by allowing multiple applications. Finally, other than the model described in this paper, the only one I am familiar with that allows a continuum of both firm and worker types is Eeckhout and Kircher (2008). In all these models, the standard trade-off between high wages and low employment probabilities occurs on the equilibrium path.
NONCONTRACTIBLE HETEROGENEITY
1175
equilibrium application strategy provides a testable connection between wage offer distribution and the duration function. One implication of this relationship is that it provides a relatively simple (and apparently normally satisfied) restriction on the wage distribution that ensures that workers who leave unemployment at high wages tend to have shorter unemployment spells. This paper begins with an analysis of the case where there is a continuum of workers and firms. The actions of individual firms support a wage offer distribution. Workers’ decisions support a joint distribution of applications across wages and types. The payoff functions resemble those in a standard directed search model, yet there is an important difference. Instead of being concerned with the expected number of competitors he will face when he applies at a given wage, a worker is instead concerned with the expected number of competitors with higher types. For this reason, firms who set high wages do not necessarily get more applicants than low wage firms. However, the average quality of the applicants they receive is higher and this compensates the firm for the higher wages they commit themselves to pay. If firms differ in the way they value worker quality on the margin, then firms who like worker quality will set higher wages. This will induce an imperfect matching of high quality workers with firms who value that quality. However, we show that this matching will be imperfect. The reason is that equilibrium application strategies will have workers acting as if they were following a “reservation wage rule” in which they apply with equal probability to all firms that set wages above their reservation wage. For this reason, mismatches will continue, with high value employees remaining unemployed simply because they tried to compete with other high quality employees and high productivity firms filled vacancies with low quality workers simply because no high quality workers applied. We provide a pair of functional equations that can be used to characterize the equilibrium wage distribution and the equilibrium application strategy of workers. We use these to illustrate the equilibrium with a number of examples and to provide a condition that can be used to test the model. We also revisit the relationship mentioned above between unemployment duration and exit wage. From the arguments above, it should be apparent that exit wages induce a selection bias: workers who leave unemployment at high wages tend to be higher type workers who do not compete with the lower quality applicants who also apply to high wage firms. As a consequence, there is no reason for them to experience long unemployment spells. We use the functional equations to compute the relationship between exit wage and average duration. In general, this relationship is not systematic. However, we provide a restriction on the wage distribution that ensures that workers who leave unemployment at higher wages actually have higher employment probabilities. In the final part of this paper, we focus on a microfoundation for the model. This argument justifies the particular payoffs used in the main part of the paper. It also illustrates the result that workers should follow a reservation wage
1176
MICHAEL PETERS
strategy in choosing where to apply. We consider a finite array of firms and wages, and imagine a finite number of workers with privately knows types who apply to these firms. We characterize the unique symmetric Bayesian equilibrium application strategy for workers. As mentioned, it involves a reservation wage and then a set of application probabilities with which workers apply to higher wage firms. As might be expected, the higher the wage (or the farther away it is from the worker’s reservation wage), the lower the probability that the worker applies. At that point, we explain why it is that low type workers have to apply to high wage firms with some probability so as to support the equilibrium. This, of course, rules out assortative matching. One useful consequence of this argument is to distinguish models like this one, where firms care about the type of worker they hire, from models like Eeckhout and Kircher (2008), where they do not. Finally, we show explicitly how the payoff functions in the Bayesian equilibrium of the finite game converge to the payoff functions we described for the continuum model in the first part of the paper as the number of workers and firms grows large. In particular, we show how large numbers appear to equalize the application probabilities across firms. The limit theorems make it straightforward to show that pure strategy equilibrium of the finite search game support allocations that converge to the equilibrium allocations described in the first part of the paper. A few papers from the literature are worth mentioning to put the arguments here in context. Papers by Shi (2002) and Shimer (2005) resemble this one in that matching of worker and firm types is not assortative in equilibrium. They differ from this paper in that they allow firms to set wages that are conditional on the type of worker they ultimately hire. As a consequence, the logic that breaks down assortative matching is much different than it is here. The exact differences are easier to explain once the details of the model have been made clearer, so the discussion of the differences is deferred to Section 8. The papers by Lang, Manove, and Dickens (1999) and Eeckhout and Kircher (2008) provide directed search models that support assortative matching. In Eeckhout and Kircher (2008) this is accomplished by having firms declare the profits they require rather than the wages they pay.4 This has the effect of making the wage a worker receives from a firm depend on the worker’s type. However, firms cannot control how the wage varies with worker type, and this prevents them from adjusting wages to ensure all types will want to apply. Lang, Manove, and Dickens (1999) are concerned with racial discrimination, so there are only two different types of workers, which permits assortative matching. Again, we defer discussion of this point until the model in this paper has been described in more detail. 4 Of course, they do not model it in this crude way; they consider a model in which a firm sets the price that it wants from consumers.
NONCONTRACTIBLE HETEROGENEITY
1177
2. FUNDAMENTALS A labor market consists of measurable sets M and N of firms and workers, respectively. We normalize the measure of the set of firms to 1. The measure of the set of workers is assumed to be τ. Each worker has a characteristic y contained in a closed connected interval Y = [y y] ⊂ R+ . These characteristics are observable to firms once workers apply, but initially, they are private information to workers. Let F be a differentiable and monotonically increasing distribution function. Assume that τF(y) is the measure of workers whose type is less than or equal to y. These characteristics are assumed to be noncontractible. Firms are not able to make wages vary directly with these characteristics. A worker’s payoff if he finds a job is simply the wage he receives. If he fails to find a job, his payoff is zero. Workers are risk neutral. Firms’ characteristics are drawn from a set X = [x x]. Let H(x) be the measure of the set of firms whose characteristics are less than or equal to x, with H(x) = 1 as described above. H is assumed to be integrable with convex support. Each firm has a single job that it wants to fill. It chooses the wage that it wishes to pay the worker who fills this job. Each firm’s wage is chosen from a compact interval W ⊂ R+ . Payoffs for firms depend on the wage they offer and on the characteristic of the worker they hire, and, of course, on their own characteristic. The payoff for every firm that hires a worker is v : W × Y × X → R, where x ∈ X is the firm’s type and y ∈ Y is the characteristic of the worker who they hire. It is assumed that v is continuously differentiable in all its arguments, concave in w, and bounded. To maintain an order on firm types, it is assumed that for any pair (w y) and (w y ) with (w y) ≥ (w y ), if v(w y x) ≥ v(w y x) for some type x, then v(w y x ) ≥ v(w y x ) for any higher type x ≥ x. In words, this single crossing condition says that higher type firms assign a higher value to higher type workers. Finally, a firm that does not hire receives payoff 0. We use the assumption that v(· 0 ·) = 0 uniformly, which means that not hiring is treated the same way as hiring a worker with type 0. 3. THE MARKET Each firm in the market commits to a wage. As is typically assumed in directed search, each worker applies to one and only one firm. The firm is assumed to hire the highest type applicant who applies. It is assumed that a firm that advertises a job is committed to hire a worker as long as some worker applies, even if the firm’s perceived ex post payoff from the best worker who applies is negative.5 This assumption could be defended by observing that all the applicants who apply to the firm are verifiably qualified for the job being 5 A high type firm that sets a high wage to attract high type workers might not be willing to pay a low type worker that wage.
1178
MICHAEL PETERS
offered. A refusal to hire any applicant might be problematic for legal reasons, though, of course, no firm is required to hire every qualified applicant who applies. However, the assumption is primarily intended to simplify the limit arguments given in the last section of the paper. The payoff functions in the large game described in this section can be modified in a straightforward way to allow firms to decide not to hire when they have applicants. The payoffs that players receive depend on their own actions, and on the distributions of actions taken by the other players. We specify these payoffs using standard arguments from directed search and then provide a microfoundation in Section 8. Let G be the wage offer distribution and let P be the joint distribution of applications, where P(w y) is understood to be the measure of the set of workers of type y or less who apply at wage w or less. We let pw (y) refer to the conditional distribution function that describes the measure of the set of applicants of type less than or equal to y who apply at wage w A worker of type y is always chosen over workers with lower types wherever he applies. As a consequence, he is concerned not with the total number of applicants expected to apply at the firm where he applies (the queue size), but with the measure of the set of applicants whose type is as least as large as his. This number is given by
y
˜ dpw (y) y y
We use the familiar formula e− y dpw (y)˜ to give the probability of trade.6 This provides worker payoffs for wages in the support of G as (3.1)
U(w y G P) ≡ we−
y
˜ y dpw (y)
For firms, let w be a wage in the support of G. The payoff from employing such a worker is v(w x y). Integrating over the set of worker types gives expected profit (3.2)
V (w x G P) ≡
y
v(w y x)e−
y
y dpw (y )
dpw (y)
y
for any wage in the support of G. To describe equilibrium, we need to define payoffs for both workers and firms for wages that lie outside the support of G. In what follows, let G mean the support of G. The notation w means the highest wage in the support and w 6 We show below that this expression is the limit value of the matching probability in a conventional urn–ball matching game.
NONCONTRACTIBLE HETEROGENEITY
1179
means the lowest wage in the support. We can now define the payoffs using an argument similar to Acemoglu and Shimer (2000). Define ω(y) ≡ max we−
y
y dpw (y )
w∈G
The function ω(y) is monotonic and continuous, so it is differentiable almost everywhere. We refer to ω(y) henceforth as the market payoff. We assume, as do Acemoglu and Shimer (2000) (among others), that application strate/ G, all workers whose market payoff is gies adjust so that for any wage w ∈ less than w receive exactly their market payoff when they apply at w , while all other workers receive payoff w . In particular, we assume U(w y G P) = min[ω(y) w ]. For each wage w , we use the marginal distribution pw that supports this property to compute the firm’s profit when it offers such a wage. To find this marginal distribution, we solve the functional equation (3.3)
w e−
y
y dpw (y )
= ω(y)
on the support of pw . The solution for pw depends on whether w is above or below the support of G.7 We discuss this briefly here to illustrate the method and because the payoff for wages above the support is surprising. In the final part of the paper, we show that these payoff functions are limits of payoff functions in finite versions of the game. Begin with the case where w < w (i.e., below the support). If w ≤ ω(y), then (3.3) has no solution for any y and pw should be uniformly zero. Otherwise, (3.3) implies that y∗ dpw (y ) = log(w ) − log(ω(y)) y
The difference between the logs can be written as the integral of its derivative. Using this fact and changing the variable in the integration gives y∗ y∗ ˜ ω (y) ˜ (3.4) d y dpw (y ) = ˜ ω(y) y y The firm’s payoff V (w x G P) is then given by substituting (3.4) into (3.2). When w strictly exceeds the highest wage in the support of G, (3.3) implies that y y ˜ ω (y) ˜ d y dpw (y ) = log(w ) − log(w) + ˜ ω( y) y y 7
We leave out the case where G has a nonconvex support since it is very similar.
1180
MICHAEL PETERS
This means that to satisfy the market payoff condition, the distribution pw must have an atom at y of size log(w /w). Heuristically, this means that when a firm sets a wage that is strictly higher than every other wage in the support of the distribution of wages, it expects a set of applications from workers with the highest possible type. From this group, the firm will select an applicant randomly. This is just the usual matching problem in directed search. It is well known that in this case, if k is the measure of the set of applicants of the highest possible type, then each applicant is −k offered the job with probability 1−ek . The profit that the seller earns from this atom of applicants of the highest type is the measure of this set of applications, log(w /w), times the probability with which each of them is awarded the job, 1−e− log(w /w) , times the profit per applicant, v(w y x). So the profit associated log(w /w) with the wage w is given by y w y ω (y) (3.5) dy v(w y x) e− y (ω (y ) dy )/ω(y ) V (w x G P) ≡ w ω(y) y w + 1 − v(w y x) w With these definitions for profits associated with wages outside the support of G, the payoff functions given by (3.1) and (3.2) now define a large game in the sense that each player’s payoff depends on his own action and type as well as the distribution of actions of the other players.8 Equilibrium is now defined in the usual way by requiring that the distribution of best replies to a distribution coincides with the distribution itself. DEFINITION 3.1: An equilibrium of this game is a pair of distributions G and P such that (i) if BF = {w : ∃x; V (w x G P) ≥ V (w x G P) ∀w ∈ W }, then (3.6) dG(w) = 1 BF
and (ii) if BW = ((w y) : U(w y G P) ≥ U(w y G P) ∀w ∈ G), then (3.7) dP(w y) = τ BW
4. RESERVATION WAGES We first demonstrate that equilibrium application strategies must satisfy a reservation wage property. Workers will effectively pick the lowest wage to 8
For example, see Mas-Colell (1975), Ali-Khan and Sun (2002), or Schmeidler (1973).
1181
NONCONTRACTIBLE HETEROGENEITY
which they will apply and then apply with equal probability to all firms whose wages are higher.9 PROPOSITION 4.1: In any equilibrium and for every wage in the support of G, ⎧ y τ dF(y ) ⎪ ⎪ if w ≥ ω(y), ⎪ ⎪ ⎨ y 1 − G− (ω(y )) pw (y) = y ∗ (w) ⎪ τ dF(y ) ⎪ ⎪ otherwise, ⎪ ⎩ 1 − G− (ω(y )) y where G− (ω(y)) = limx↑ω(y) G(x) and y ∗ (w) = supy {y : ω(y ) ≤ w}. y
PROOF: First observe that by definition, ω(y) ≥ we− y dpw (y ) for all w. Then if w < ω(y), there is no nonnegative distribution function pw for which (3.7) could be satisfied. So pw (y) is constant for all y for which w < ω(y). To prove the reservation wage property, we want to show that if y is in the support of pw , then it is in the support of pw for all w > w. To accomplish this, we show the stronger result that w > ω(y) implies that y is in the support of pw . Suppose w > ω(y) for some pair (w y). Observe first that there must be some set B of F positive measure such that y > y and y is in the support of pw . If that were not true, then worker y would be hired for sure at wage w, which would contradict w > ω(y). By condition (3.7) in the definition of equilibrium, −
ω(y ) = we
y
y
˜ dpw (y)
for almost all y ∈ B. Now suppose there is also a subset B− which has strictly positive F -measure and contains only types y ≥ y who have market payoff ω(y ) > w, but which is not contained in the support of pw . Then workers whose types are in B do not compete against workers whose types are in B− when they apply at wage w Yet for the same reason, workers whose types are in B− do not compete against other workers whose types are in B− when they apply at wage w. Since almost all workers in B− are supposed to apply at wage w with probability 0, there must be a pair of distinct worker types, say y0 < y1 , such that y0 ∈ B− , y1 ∈ B, and ˜ − yy dpw (y)
we
0
− yy dpw (w)
= we
1
≥ ω(y1 ) > ω(y0 )
Since this contradicts the definition of ω(y0 ), we conclude that almost all y for which w > ω(y) are in the support of pw . 9
The proof of the following proposition was suggested by a referee.
1182
MICHAEL PETERS
Since pw is absolutely continuous with respect to F , we can write ω(y) = we−
y
˜ dF(y) ˜ y p(wy)τ
˜ dF(y) ˜ is the Radon–Nikodym derivative of pw . Differentiating where p(w y)τ with respect to y gives ω (y) = we−
y
˜ dF(y) y p(wy)τ
p(w y)
or p(w y) =
ω (y) ω(y)
w almost everywhere, which is independent of w. Since ω(y) p(w y) dG(w) = 1 for each y, we have p(w y) = 1−G−1(ω(y)) , which gives the result that workers apply equally to all firms whose wage is as high as their market payoff. Q.E.D. We can now use Proposition 4.1 to provide a characterization of equilibrium. It implies that we can interpret the market payoff ω(y) as the lowest wage to which a worker of type y applies. Then the inverse function ω−1 (w) ≡ y ∗ (w) has a natural interpretation as the highest type who applies with positive probability at a wage w.10 When a worker of type y applies at wage w, he does not need to worry about workers whose types are lower than his or workers whose types are such that their reservation wages exceed w. By Proposition 4.1, the workers whose types are between y and y ∗ (w) are using a relatively simple application rule that has them applying with equal probability at all firms whose wage is above their reservation wage. Substituting the result in Proposition 4.1 into (3.1) gives the worker’s expected payoff when he applies at wage w as (4.1)
we−
y ∗ (w) y
k(y ) dF(y )
where we substitute k(y) ≡
τ 1 − G (ω(y)) −
to simplify the formula slightly. Firms’ payoffs can be similiarly simplified. A firm that offers wage w in the support of G will attract workers whose types are between y and y ∗ (w). Each such worker is hired conditional on applying with probability e− 10
In case ω is not monotonic, use ω−1 (w) = supy {y : ω(y ) ≤ w}
y ∗ (w) y
k(y ) dF(y )
NONCONTRACTIBLE HETEROGENEITY
1183
as just described. The probability that such a worker applies at wage w is 1 − G− (ω(y)), which gives the expected revenue to a firm of type x from a worker of type y as y ∗ (w)
v(w y x)e− y k(y ) dF(y ) 1 − G− (ω(y)) Adding this up over all the worker types who apply with positive probability gives the firm’s profit function
y ∗ (w)
(4.2) y
y ∗ (w)
e− y k(y ) dF(y ) τf (y) dy v(w y x) 1 − G− (ω(y))
y ∗ (w)
=
k(y)v(w y x)e−
y ∗ (w) y
k(y ) dF(y )
dF(y)
y
for each wage in the support of G. To simplify the case where w < w, begin with type y ∗ (w), which is the highest type of worker who will apply at wage w. Each worker whose type is below y ∗ (w) has a market value below w. The measure of the set of firms whose wage is at least as high as their reservation wage is then 1, and k(y) is simply τ. So this market value is given by v(y) = we−
y ∗ (w0 ) y
τ dF(y )
Then from (3.3), the probability that worker y is hired at wage w must be equal to we−
y ∗ (w0 ) y
w
τ dF(y )
This simple multiplication makes all workers whose types are below y ∗ (w) indifferent between the deviator and the lowest wage in the support of G. Using this matching probability gives the firm’s profit when it offers w as y(w ) w y ∗ (w) (4.3) τv(w y x) e− y τ dF(y ) F (y) dy w y Finally, substituting the results of Proposition 4.1 into (3.5) gives the profits for a firm that offers a wage that is strictly higher than any wage in the support of G as w y w − yy k(y ) dF(y ) (4.4) k(y)v(w y x)e F (y) dy + v(w y x) 1 − w y w
1184
MICHAEL PETERS
5. EQUILIBRIUM We can now present the main characterization theorem. and its PROPOSITION 5.1: Suppose that both the function ι(w y x) ≡ v(wyx) w derivative with respect to w are nondecreasing in x. Then a pair (G P) is an equilibrium if and only if there is a point y0 and a pair of functions ω(y) and h(y) satisfying ω(y0 ) = w and G− (ω(y)) = H(h(y)) such that (5.1)
ω(y)
τ F (y) = ω (y) 1 − H(h(y))
and (5.2)
v[ω(y) y h(y)] y
v(ω(y) y h(y)) =− vw (ω(y) y h(y)) − ω (y ) dy ω(y) y
This proposition characterizes the equilibrium in a manner that is familiar in the directed search literature. The function ω is the market payoff function of workers or the market utility function. The worker of type y0 breaks the set of worker types into two parts. Workers whose types are at or above y0 have some wage where they can apply and be hired for sure. For them, the function ω(y) represents their reservation wage. The worker of type y0 , in particular, is sure to be hired if he applies at the lowest wage w offered by any firm. Worker types below y0 have a chance of losing out on the job even if they apply at the lowest wage w. The function h(y) identifies the type of the firm that offers worker y’s reservation wage when y ≥ y0 , while h(y) = x if y < y0 . The two conditions can be interpreted in the usual way as tangency conditions. For example, the payoff when a worker applies to a firm is a function of the wage that the firm offers and the highest worker type who applies to that firm. Each worker type should attain a payoff that maximizes this function across all wage–highest-type pairs that provide the market payoff to some worker type. Equation (5.1) then expresses the fact that a worker of type y has an indifference curve that is tangent to the market payoff function ω at y. Similarly, interpret the firm’s profit function (4.2) as a function of the wage that it pays and the highest worker type it attracts. The firm’s problem is then to choose a wage and highest worker type that maximizes its profit conditional on providing some worker type his market payoff. The equation (5.2) expresses the requirement that an isoprofit curve for a firm of type h(y) is tangent to the market payoff function ω(y) at the point (ω(y) y). It may be worth noting here that despite the tangency, this equilibrium will not be efficient. The reason is that this allocation does not do a good job matching worker and firm types. To see this, observe that from Shimer (2005), an
NONCONTRACTIBLE HETEROGENEITY
1185
efficient allocation is attained when firms can pay workers a wage that depends on their types. Generally, a low type worker who applies to a high type firm in Shimer’s model is rewarded with a lower wage than that paid to the high type workers. This focuses low type applications at low type firms, which limits mismatching. Here a low type worker is provided the same wage as a high type worker, providing the low type worker a much bigger incentive to apply at high wage firms. It is this concentration of applications with the high type firms which precludes efficiency. PROOF OF PROPOSITION 5.1: Start with an equilibrium pair P and G. Let w be the lowest wage in the support of G and let y0 = sup{y : ω(y) ≤ w}. From Proposition 4.1, each worker type y must attain his market payoff ω(y) by applying at any wage above his reserve price. So (5.3)
we−
y ∗ (w) y
k(y ) dF(y )
= constant
for each w ≥ ω(y). For each worker type y, the derivative of this expression with respect to w should then be zero at every wage above ω(y). That is, we−
y ∗ (w) y
k(y ) dF(y )
k(y ∗ (w))F (y ∗ (w))
y ∗ (w) dy ∗ (w) = e− y k(y ) dF(y ) dw
giving (5.4)
w
1 τ F (y ∗ (w)) = ∗ − 1 − G (w) dy (w)/dw
Since y ∗ (w) is the inverse function of ω(y) at each point at which ω is increasing, (5.5)
ω(y)
dω(y) τ F (y) = 1 − G (ω(y)) dy −
at each y ≥ y0 . Fix h(y) to be the solution to G− (ω(y)) = H(h(y)). The result (5.1) then follows from the definition of h. Now use the market payoff function defined by (5.1) and its extension to types below y0 to simplify the firm’s profit function. The firm’s profit function when it sets wage w ≥ ω(y0 ) is given by y ∗ (w) ω(y) F (y) dy V (w x G ω) = k(y)v(w y x) w y Now substituting (5.1) into the firm’s profit function gives y ∗ (w) ω (y) dy v(w y x) V (w x G ω) = w y
1186
MICHAEL PETERS
Observe that choosing w and then figuring out what worker types will apply by finding the worker type who has reservation wage w is equivalent to choosing the highest worker type who will apply, then setting the wage equal to that worker type’s reservation wage. Firm profits can then be written as functions of the highest worker type who applies as y ω (y ) dy V (ω(y) x G ω) = (5.6) v(ω(y) y x) ω(y) y Maximizing (5.6) with respect to y gives the first order condition y
v(ω(y) y x) v[ω(y) y x] = − vw (ω(y) y x) − ω (y ) dy ω(y) y When G is an equilibrium, the firm that offers a wage equal to worker y’s reservation wage is using a best reply. Hence this condition must hold when evaluated at x = h(y) for each y ≥ y, and this gives (5.2). To make the argument in the other direction, observe that a solution to (5.1) holds worker payoff constant as required by the first condition of equilibrium, provide the wage distribution is given by G(ω(y)) = H(h(y)). Hence (3.7) holds for the density p(w y) = 1−G1− (w) . The aggregate distribution P can then be constructed by integrating this density. From equation (4.4), it is straightforward to verify that local profit maximization conditions hold at ω(y) whenever (5.2) holds. It is then straightforward to verify by brute force that second order conditions are guaranteed by the assumption that v(w y x) is concave in w for every y. The same approach verifies the second order condition at w0 . Inside the support, the condition (5.2) guarantees that the first order conditions for profit maximization hold when every firm of type h(y) offers wage ω(y). If the second order condition fails, then the isoprofit curve in (w y) space for firm h(y) must cross ω at another wage. The argument is similar whether the wage at this crossing point is higher or lower than ω(y), so suppose the crossing occurs at a higher wage. Then firm h(y)’s isoprofit curve is strictly steeper than ω at this higher wage w . Yet by (5.2), there is some higher type firm x > h(y) whose isoprofit curve is tangent to ω at w . Since the slope of the isoprofit curve is ι(w y x)
−
y
ιw (w y x)ω (y ) dy
y
and this ratio is nondecreasing in x by assumption, this leads to a contradiction. Q.E.D.
NONCONTRACTIBLE HETEROGENEITY
1187
In Proposition 5.1, y0 is the highest worker type who applies to the lowest wage in the support of the equilibrium wage distribution. Condition (5.1) ensures that payoffs are constant above at all wages above the reservation wage. The advantage of the exponential matching function is that a reservation wage rule that satisfies this single functional equation ensures the constant payoff condition for all worker types. The condition (5.2) is the first order condition for profit maximization, that is, Vw (w x G ω) = 0. The term on the left hand side of the condition is the marginal gain associated with attracting a higher type when wage is increased. The term on the right hand side is the marginal cost of paying a higher wage to all the lower types. The functional h(y) has a natural interpretation as the lowest type of the firm that offers the reservation wage of worker y. From the type y0 and the functional equations ω and h, the wage distribution is readily constructed. The lowest wage in the support of the distribution is ω(y0 ). Then for each y > y0 , G− (ω(y)) = H(h(y)). 6. EXAMPLES WITH IDENTICAL FIRMS To begin, we will assume that all firms have the same profit function. The reason this case is amenable to analysis is because the first order condition (5.2) reduces to a constant profit condition that involves only the function ω and a boundary condition. If we begin with a function ω that satisfies this constant profit condition, we can then find the function G that ensures that (5.1) is satisfied. If this function is a distribution function, then we have an equilibrium with a nondegenerate wage distribution. EXAMPLE 6.1: If v(w y x) = 1 + αy − w for all x ∈ [x x], then there is a y y − τF (t) dt degenerate single wage equilibrium with w = 1 + αy − y (1 + αy ) de y . PROOF: Proposition 5.1 applies to the degenerate case since we can set y0 = y and h(y) = x. To see this, observe that if the wage distribution is degenerate at some wage w0 , then all workers apply at this wage. Since y0 is the highest type worker who applies to the lowest wage in the support, the conclusion y0 = y follows. The function h(y) is supposed to be the lowest firm type that offers worker y his reservation wage. Since all firms offer the same wage, this is x. We then have trivially from (5.1) that ω(y)τF (y) = wτF (y) = ω (y). The first order condition (5.2) then becomes (since G− (ω(y)) = G− (w) = 0) v[w y x] = −
y
v(w y x) vw (w y x) − ω (y ) dy w y
Rewriting (see the online Appendix for details) and substituting the specific profit function 1 + αy − w then gives the first order condition

(6.1)    1 + α\bar{y} - w_0 = \int_{\underline{y}}^{\bar{y}} (1 + αy')\, de^{-\int_{y'}^{\bar{y}} τ\, dF(t)},
which gives the result.
Q.E.D.
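As a check on the formula in Example 6.1, the following Python sketch evaluates the right hand side of (6.1), as reconstructed above, for a hypothetical uniform type distribution on [0, 1] and confirms that setting α = 0 recovers the single wage e^{-τ} discussed below. The uniform F, the grid size, and the parameter values are illustrative assumptions, not part of the model.

```python
import numpy as np

def single_wage(alpha, tau, grid=20001):
    """Right-hand side of (6.1) for a hypothetical uniform type distribution on [0, 1],
    so that dF(t) = dt and the inner integral is tau * (1 - y')."""
    y = np.linspace(0.0, 1.0, grid)
    integrand = (1.0 + alpha * y) * tau * np.exp(-tau * (1.0 - y))
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y))
    return 1.0 + alpha * 1.0 - integral   # 1 + alpha*ybar - integral, with ybar = 1

tau = 1.3
print(single_wage(0.0, tau), np.exp(-tau))   # alpha = 0 recovers w0 = e^{-tau}
print(single_wage(0.5, tau))                 # the single wage when firms also value worker type
```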
This equilibrium looks very much like any other generic directed search equilibrium in that a common wage is chosen so that no firm has any incentive to increase its wage offer. Lang and Manove (2003) proved a similar single wage result with a continuum of worker types under the assumption that firms' profits are independent of worker types. Their model is a special case of this example in which α = 0. Substituting α = 0 into (6.1) and solving gives w_0 = e^{-τ}. The result is derived here by simplifying the functional equations in Proposition 5.1. To see the argument in a more conventional way,11 note that since firms do not care who they hire, they simply want to maximize expected profit. These expected profits are given by 1 − w times the probability that at least one worker applies, say 1 − e^{-l}. Since firms always choose the highest type worker who applies, this means that e^{-l} is the probability that the lowest type worker is hired if he applies to the firm. Since all the firm cares about is the trading probability, the equilibrium can be described using the usual market utility story: the firm should choose a wage to maximize (1 − w)(1 − e^{-l}) subject to the constraint that we^{-l} = w_0 e^{-τ}, where w_0 e^{-τ} represents the expected payoff to the lowest worker type when he applies to one of the other firms. It is straightforward to check that w can only be a solution to this problem if w = e^{-τ} as above. When firms care about worker types directly, the argument is more subtle since firms also care about the highest type who applies at any wage. It is this feature that is captured by (5.2) in Proposition 5.1.

EXAMPLE 6.2: Suppose that v(w, y, x) = αy − w for all x. Then a wage distribution can be supported in equilibrium only if F'(y) is decreasing.

PROOF: Fix the lowest wage \underline{w} in the support of the equilibrium distribution and let y_0 be the type for whom ω(y_0) = \underline{w}. When firms have the same profit function, all wages in the support must yield the same profit. This is guaranteed by condition (5.2), which, after substituting the special profit function, becomes

αy - ω(y) = -\int_{\underline{y}}^{y} \Bigl[-1 - \frac{αy' - ω(y)}{ω(y)}\Bigr]\, ω'(y')\, dy'.
11 I am grateful to a referee for pointing this out.
Rewriting slightly gives

(αy - ω(y))\, ω(y) = \int_{\underline{y}}^{y} αy'\, ω'(y')\, dy'.
Since this must hold uniformly in y, the derivatives of both sides with respect to y must also be the same, that is,

(αy - ω(y))\, ω'(y) + ω(y)(α - ω'(y)) = αy\, ω'(y).

This gives the simple condition ω'(y) = α/2. This is the condition that the market payoff function must satisfy when ω(y) is in the support of the equilibrium wage distribution so that firms' profits are constant on the support of this distribution. From condition (5.1), it must be that

\frac{ω(y)\, τF'(y)}{1 - G(ω(y))} = \frac{α}{2}

along the support of the equilibrium wage distribution. Since \frac{ω(y)}{1 - G(ω(y))} is strictly increasing in y, this condition cannot be fulfilled unless F'(y) is decreasing.    Q.E.D.
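A small numerical sketch of the argument just given, under the reconstruction of conditions (5.1) and (5.2) used above: with ω'(y) = α/2, condition (5.1) pins down the candidate 1 − G(ω(y)) = 2ω(y)τF'(y)/α, and one can check directly whether this candidate G is nondecreasing in y. The parameter values, the slice of types, and the boundary level of ω used here are purely illustrative assumptions.

```python
import numpy as np

alpha, tau = 1.0, 0.8
y = np.linspace(0.4, 0.9, 200)               # a slice of types inside the support
omega = 0.3 + (alpha / 2.0) * (y - y[0])     # market payoff with slope alpha/2; the level 0.3 is arbitrary

def candidate_G(density):
    # From (5.1) with omega'(y) = alpha/2:  1 - G(omega(y)) = 2 * omega(y) * tau * F'(y) / alpha
    return 1.0 - 2.0 * omega * tau * density(y) / alpha

uniform = lambda t: np.ones_like(t)          # uniform worker types: F'(y) constant
decreasing = lambda t: 2.0 - 2.0 * t         # F(y) = y(2 - y) as in Example 6.3: F'(y) decreasing

for name, dens in [("uniform F'", uniform), ("decreasing F'", decreasing)]:
    G = candidate_G(dens)
    print(name, "-> candidate G nondecreasing:", bool(np.all(np.diff(G) >= -1e-12)))
```

With a constant density the candidate G falls as y rises, so no nondegenerate wage distribution can be supported, which is exactly the implication stated next; with the decreasing density of Example 6.3 the candidate is increasing on the chosen slice.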
One implication of this result is that if worker types are uniformly distributed, then the only equilibrium that can be supported with identical firms has all firms offering the same wage.

EXAMPLE 6.3: Suppose that v(w, y, x) = y − w and that F(y) = y(2 − y) with \underline{y} = 0 and \bar{y} = 1. Then there is a worker–firm ratio τ_0 < \frac{3}{2} such that a nondegenerate distribution of wages can be supported in equilibrium for the economy where the ratio of workers to firms is τ_0. The equilibrium wage distribution is convex and has support [\frac{1}{2τ_0}, \frac{1}{2τ_0} + \frac{1}{4}].
The proof of the assertions in Example 6.3 is given in the Appendix, which comprises the Supplemental Material (Peters (2010)). This example illustrates a difference between this paper and Lang and Manove (2003), for which only a single wage equilibrium exists with identical firms. The difference is that in the latter, firms do not care which worker they hire. The difference in matching probabilities associated with higher wages will not by itself support a distribution of wages. Here, higher wages can also bring improvements in worker quality, which is what supports the distribution.

7. OFFER DISTRIBUTIONS AND DURATION

These examples illustrate how the functional equations can be used to analyze equilibrium. They illustrate that equilibrium might not be unique. With
identical firms, nondegenerate wage distributions may or may not exist in equilibrium, depending both on equilibrium selection and on primitives. In this section, we return to the case where firms differ, so that equilibrium wage offer distributions will generally be nondegenerate. In particular, the focus here is on the relationship between employment probability and wage. Employment probabilities at different wages are unobservable. However, some insight into this can be gleaned from unemployment duration.

For this section, we imagine the equilibrium wage distribution G to be a steady state distribution associated with a repeated version of the model in which a worker of type y who fails to find a job goes into the next period with the same type, and faces the same wage and worker type distribution, so he plays the same mixed strategy again in the next period. To do this properly, we should model all the dynamics. However, for the interpretive results in this section, this informal interpretation should be sufficient.

If the probability Q(y) with which a worker finds a job is the same in each period when he is unemployed, then the average number of periods that will elapse before he finds a job is just \frac{1}{Q(y)}. This observation makes it possible to work out the relationship between duration of unemployment and the wage at which a worker leaves unemployment. Suppose this duration function is Φ(w), that is, Φ(w) is the average unemployment duration for workers who find jobs at wage w.

Since workers apply with equal probability at all openings where the wage is above their reservation wage, it is possible to work out the probability that a worker of type y is hired by some firm. There are G'(w) firms that offer wage w and 1 − G(ω(y)) firms that offer wages above the worker's reservation wage. So the probability that a worker applies at wage w is \frac{G'(w)}{1 - G(ω(y))}. The probability of matching with such a firm is e^{-\int_y^{y^*(w)} k(y')\, dF(y')}, so the probability that a worker matches if he follows his equilibrium application strategy is

Q(y) = \int_{ω(y)}^{w_G} e^{-\int_y^{y^*(w)} k(y')\, dF(y')}\, \frac{G'(w)}{1 - G(ω(y))}\, dw,

where w_G is the highest wage in the support of G.12 Since the expected wage is constant for a worker of type y at every wage above ω(y), this can be written as

Q(y) = \int_{ω(y)}^{w_G} \frac{ω(y)}{w}\, \frac{G'(w)}{1 - G(ω(y))}\, dw.
12 In this expression, the function ω(y) has to be interpreted as the lowest wage to which a worker of type y applies instead of as his market payoff. These two things can differ for a worker whose type is low enough that there is no wage at which he will surely be hired.
It is not immediate that better workers will have higher matching probabilities or lower durations. The reason is that higher types have higher reservation wages. So despite the fact that they are more likely to be hired at any particular firm than low type workers, they tend to apply at firms where there is a lot of high type competition. From the last equation and the fact that workers' reservation wages increase in their types, employment probability will be an increasing function of type if

ψ(w) = \int_{w}^{w_G} \frac{w}{w'}\, \frac{G'(w')}{1 - G(w)}\, dw'

is an increasing function of the wage w.13,14 This function represents the expectation of the ratio of any wage to the harmonic mean of higher wages in the distribution G. This function is not particularly simple conceptually; neither is it easy to deduce distributions for the unobservables that will support this property. However, it is relatively easy to check. For example, it is straightforward to check that the equilibrium distribution given in closed form in Example 6.3 has the property that this function is increasing. We explain below how to check this condition using the accepted wage distribution (which is easier to observe).

Workers' types cannot be observed. What is observable is the actual duration of workers hired at different wages. To establish the final connection, we simply have to show that firms that set high wages and hire a worker actually end up with better workers, that is, workers with higher matching probabilities. The probability \tilde{F}(y_0|w) that a worker hired by the firm that offers wage w has a type less than or equal to y_0 is given by

\tilde{F}(y_0|w) = \frac{\displaystyle\int_{\underline{y}}^{y_0} k(y)\, e^{-\int_y^{y^*(w)} k(y')\, dF(y')}\, F'(y)\, dy}{\displaystyle\int_{\underline{y}}^{y^*(w)} k(y)\, e^{-\int_y^{y^*(w)} k(y')\, dF(y')}\, F'(y)\, dy}.

Note that this probability is conditional on some worker being hired by the firm, which explains the denominator. Substituting for k(y), and using (5.3) and (5.5), gives an even simpler formulation

(7.1)    \tilde{F}(y_0|w) = \int_{\underline{y}}^{y_0} \frac{ω'(y)}{w - ω(y)}\, dy.
13 See the previous footnote to see why the lower bound of the integration is w.
14 Notice that the function ψ must be increasing somewhere. It is obviously less than 1 when w is at the bottom of the support and equal to 1 at w_G.
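Because the monotonicity of ψ is what Proposition 7.1 below turns on, it is worth noting how easily it can be checked numerically from a candidate wage offer distribution. The sketch below does this for a hypothetical convex offer distribution on an arbitrary support, using the expression for ψ(w) as reconstructed above; the distribution, its support, and the grid are illustrative assumptions only.

```python
import numpy as np

w_lo, w_hi = 0.4, 0.9            # hypothetical support of the wage offer distribution
w = np.linspace(w_lo, w_hi, 400)

def G(x):                        # hypothetical convex offer distribution on [w_lo, w_hi]
    return ((x - w_lo) / (w_hi - w_lo)) ** 2

def g(x):                        # its density
    return 2.0 * (x - w_lo) / (w_hi - w_lo) ** 2

def psi(wi):
    """psi(w) = int_w^{wG} (w / w') * g(w') / (1 - G(w)) dw' (reconstructed form)."""
    grid = np.linspace(wi, w_hi, 400)
    integrand = (wi / grid) * g(grid) / (1.0 - G(wi))
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))

values = np.array([psi(x) for x in w[:-1]])   # skip w_hi, where 1 - G(w) = 0
print("psi increasing on the support:", bool(np.all(np.diff(values) >= -1e-9)))
```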
This expression is readily seen to be declining in w. The interpretation is that an increase in the wage moves the distribution function for the type hired by the firm to one that first order stochastically dominates the original distribution. The following proposition is a consequence of computing the duration as Φ(w) = \int \frac{d\tilde{F}(y|w)}{Q(y)}.

PROPOSITION 7.1: If ψ(w) is monotonically increasing, then the expected duration of unemployment for a worker hired by a firm is a decreasing function of the wage offered by that firm.

When ψ(w) is increasing, workers who are hired at high wage firms will tend, on average, to have spent less time searching for jobs than workers who are hired by low wage firms. This is quite unlike standard directed search, where high wages and long duration must go together. This prediction is not a particularly strong test of the model, since the function ψ(w) may not be monotonic. Notice, however, that it is a testable consequence of the model that does not rely on any knowledge about the distributions of the unobservables. The expected duration for a worker hired by a firm offering a wage w is given by the reciprocal of

\int_{\underline{y}}^{y^*(w)} ψ(ω(y))\, \frac{ω'(y)}{w - ω(y)}\, dy,

using the expression for the density of the type of worker hired by the firm that was derived above. It is apparent from this expression that when ψ(w) is nonmonotonic, then there will be no systematic relationship between the wage at which a worker is hired and his probability of matching measured as his expected duration. Even in this dimension, the result is quite different from the standard directed search model, where wage and employment probability must be inversely related.

Finally, whether duration and wage are inversely related or not, a simple change of variable in the expression \int Q(y)\, d\tilde{F}(y|w) gives the following result.15

PROPOSITION 7.2: In every equilibrium of the large directed search game,

\int_{ω(\underline{y})}^{w} \frac{1}{w - w'}\, ψ(w')\, dw' = \frac{1}{Φ(w)}.

The expression on the right hand side is the expected duration function; the expression on the left hand side is the condition derived from the equilibrium
15 I am grateful to Vadim Marmer for pointing out this connection.
conditions. The function ψ is a simple function of the wage offer distribution, while Φ(w) is the observed relationship between duration and exit wage.

One problem with these results is that they are based on the wage offer distribution, which is not observable. The relationship between the wage offer distribution and the accepted wage distribution is relatively straightforward, since the accepted wage distribution can be computed from the wage offer distribution using the equilibrium application strategy and the probabilities of being hired. Let G^* be the observed (or accepted) wage distribution. Then we can state the following proposition.

PROPOSITION 7.3: The wage offer distribution G and the accepted wage distribution are related by

G(w) = G^*(w) + \int_{\underline{w}}^{w} \frac{G^*(w')}{w'}\; \frac{ω(\underline{y})/w'}{1 - ω(\underline{y})/w'}\, dw'.

The proof of this proposition is in the (online) Appendix. On the right hand side of this expression, the only unobservable is the payoff to the lowest type worker, ω(\underline{y}). This could be estimated from survey data by using the worst observed experience or possibly by using an outside option like unemployment insurance to define the lowest attainable payoff.16 Up to this identifying assumption, this formula can be used to convert the results of Propositions 7.1 and 7.2 into statements about the observable accepted wage distribution.

8. EQUILIBRIUM OF THE WORKER APPLICATION SUBGAME

Assertions about payoffs in large games are ultimately ad hoc. In this section, we analyze a finite game to show how the payoffs in the large game come about. The finite game also makes it possible to illustrate how the model discussed above differs from some of the other well known papers involving different worker types. In particular, in the finite model, it is easy to see how workers' mixed application strategies differ from the mixing that occurs in models like Shimer (2005) and Shi (2002), where wages can be conditioned on worker type. Furthermore, the distinction between two type models like Lang, Manove, and Dickens (1999) and continuous type models can be made clear. Finally, the finite type model illustrates why assortative matching of the sort that occurs in Eeckhout and Kircher (2008) cannot occur here.

For this section, there are n workers and m firms. Each worker's type is an independent draw from some common distribution F. Worker and firm payoffs are as described above. Firm types are assumed to be common knowledge.
16 The lowest observed wage will typically exceed the payoff of the lowest worker type if the worst worker's type is so low that he will not be hired for sure even at the lowest wage.
Firms set wages; then workers apply. Finally, firms hire the best worker who applies. The solution concept is perfect Bayesian equilibrium.

To begin, focus on the second part of the process in which workers make their applications. A strategy for worker i in the application subgame is a function π^i : W^m × Y → S^{m-1}, where S^{m-1} = \{π ∈ \mathbb{R}^m_+ : \sum_{j=1}^m π_j = 1\}.17 This section analyzes symmetric equilibria in which every worker uses an application strategy that is a common function of his or her type. The idea that is fundamental to directed search is that these application strategies depend on the array of wages being offered. For the purposes of characterizing the equilibrium in the application subgame associated with a fixed set of wages, the notation that captures this will be suppressed and we write π_j(y) to denote the probability with which each worker whose type is y applies to firm j.

Since firms always hire the worker with the highest type who applies, worker i will match with firm j in equilibrium as long as every other worker in the market either has a lower type than he does or applies to some other firm. To calculate this, suppose worker i's type is y. The probability that some other worker has type y' > y and chooses to come to firm j is π_j(y')\, dF(y'). So the probability that this worker will come and take the job away from worker i is \int_y^{\bar{y}} π_j(y')\, dF(y'). The probability that no other worker comes and takes the job away is

(8.1)    q(y, w_j) = \Bigl(1 - \int_y^{\bar{y}} π_j(y')\, dF(y')\Bigr)^{n-1}.
So q(y, w_j) is the probability that worker i gets the job at wage w_j. His expected payoff when he applies to firm j is q(y, w_j) multiplied by the wage w_j that the firm offers.

This logic can also be used to derive the firm's profit function. The firm hires the best type who applies. So the firm's expected profits are determined by the probability distribution of the highest type who applies. Fix a type y. The probability that any particular worker either has a type below y or applies at some other firm, using the logic above, is 1 - \int_y^{\bar{y}} π_j(y')\, dF(y'). The probability that all the workers either have types below y or apply to another firm is

\Bigl(1 - \int_y^{\bar{y}} π_j(y')\, dF(y')\Bigr)^{n}.

To say this a different way, this is the probability that the highest type who applies to firm j is less than or equal to y. The probability distribution function
17 We ignore the possibility that a worker might not apply to any firm since that is a strictly dominated strategy given the assumptions about payoffs.
has a density given by n\, q(y, w_j)\, π_j(y)\, F'(y). Integrating over possible values for this highest type gives the expected payoff function for firm j as

ρ(w_j, x) = \int_{\underline{y}}^{\bar{y}} v(w_j, y, x)\, n\, q(y, w_j)\, π_j(y)\, dF(y).
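The matching probability (8.1) is easy to sanity-check by simulation. The sketch below compares the closed form with a Monte Carlo estimate for an arbitrary application strategy; the particular F, strategy, and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # number of workers
y0 = 0.6                               # type of the worker whose hiring probability we check

def pi_j(t):
    # hypothetical application probability to firm j as a function of type (uniform F on [0, 1])
    return np.where(t < 0.8, 0.3, 1.0)

# Closed form (8.1): q(y, w_j) = (1 - int_y^1 pi_j(y') dF(y'))^(n - 1)
grid = np.linspace(y0, 1.0, 2001)
mass_above = np.sum(0.5 * (pi_j(grid)[1:] + pi_j(grid)[:-1]) * np.diff(grid))
q_formula = (1.0 - mass_above) ** (n - 1)

# Simulation: worker i is displaced if some rival has a higher type and applies to firm j
trials = 200_000
rival_types = rng.uniform(0.0, 1.0, size=(trials, n - 1))
rival_applies = rng.uniform(size=(trials, n - 1)) < pi_j(rival_types)
displaced = np.any((rival_types > y0) & rival_applies, axis=1)
q_sim = 1.0 - displaced.mean()

print(q_formula, q_sim)                # the two numbers should agree up to simulation error
```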
Notice that the functions π and q implicitly depend on the wages offered by all the firms, so these payoff functions describe a game in the usual way.

It will simplify the argument in this section to assume that wages are ordered in such a way that w_1 ≤ w_2 ≤ · · · ≤ w_m. In a slight abuse of notation, refer to an array \{y_K, \ldots, y_m\} with y_K ≤ y_{K+1} ≤ · · · ≤ y_m as a partition of the set of types. The collection of intervals [y_k, y_{k+1}), along with [\underline{y}, y_K), constitutes the sets in this partition. The unique (symmetric) equilibrium for the application subgame is given by the following lemma.

LEMMA 8.1: For any array of wages w_1, \ldots, w_m offered by firms for which w_1 > 0, there is a partition \{y_K, \ldots, y_m\} containing no more than m intervals, and a set \{π_j^k\}_{k ≥ K;\, j ≥ k} of probabilities satisfying π_j^k > 0 and \sum_{j=k}^{m} π_j^k = 1 for each k, such that the strategy

π_j(y) = \begin{cases} π_j^k & \text{if } j ≥ k \text{ and } y ∈ [y_k, y_{k+1}), \\ 0 & \text{otherwise}, \end{cases}

is almost everywhere the unique (symmetric) continuation equilibrium application strategy. The probabilities π_j^i satisfy

(8.2)    \Bigl(\frac{π_j^i}{π_i^i}\Bigr)^{n-1} = \frac{w_i}{w_j}
for each j > i. Furthermore, the numbers \{y_k\} and \{π_j^k\} depend continuously on the wages offered by firms.

There are many indices to keep track of, but the logic is simple enough. Suppose there are only two firms offering wages w_1 < w_2. The highest worker types apply only to firm 2. If y is close enough to \bar{y}, then even if the other worker is expected to apply at wage w_2 for sure, the first worker will get the job with probability F(y). If F(y)w_2 > w_1, then there is no point applying at wage w_1. This immediately describes the cutoff point y_m = y_2 to be the point where F(y_2)w_2 = w_1. This gives the constant π_2^2 = 1.

The main content of the lemma comes in the description of what happens to the types below y_2. The lemma says that all types below y_2 use exactly the
same application probabilities π_1^1 and π_2^1, which are readily derived from the condition

π_2^1 = \frac{w_1}{w_2}\, π_1^1

and the fact that π_1^1 and π_2^1 must sum to one (so π_2^1 = \frac{w_1}{w_1 + w_2}). The two application strategies are then

π_1(y) = \begin{cases} \dfrac{w_2}{w_1 + w_2} & y < y_2, \\ 0 & \text{otherwise}, \end{cases}

and

π_2(y) = \begin{cases} \dfrac{w_1}{w_1 + w_2} & y < y_2, \\ 1 & \text{otherwise}. \end{cases}
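To see that these mixing probabilities really do the work claimed for them, the following sketch computes the cutoff y_2 and verifies numerically that every type below y_2 is exactly indifferent between the two firms, while a type above y_2 weakly prefers firm 2. It assumes two workers and a uniform type distribution on [0, 1]; both assumptions are illustrative only.

```python
import numpy as np

w1, w2 = 0.5, 0.8                        # posted wages, w1 < w2
y2 = w1 / w2                             # cutoff from F(y2) * w2 = w1 with uniform F(y) = y
pi1 = w2 / (w1 + w2)                     # low types' application probability to firm 1
pi2 = w1 / (w1 + w2)                     # low types' application probability to firm 2

def payoff(y, firm):
    """Expected payoff of a type-y worker at the given firm, against one rival
    (n = 2 workers) who plays the equilibrium strategy."""
    p_mid = max(y2 - y, 0.0)             # rival in (y, y2): applies to this firm with prob pi_firm
    p_top = 1.0 - max(y2, y)             # rival above y2: applies to firm 2 for sure
    if firm == 1:
        return w1 * (1.0 - p_mid * pi1)
    return w2 * (1.0 - p_mid * pi2 - p_top)

low_types = np.linspace(0.0, y2 - 1e-6, 50)
gaps = [abs(payoff(y, 1) - payoff(y, 2)) for y in low_types]
print("max indifference gap below y2:", max(gaps))          # ~0: low types are exactly indifferent
print("type 0.9 prefers firm 2:", payoff(0.9, 2) >= payoff(0.9, 1))
```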
The complete proof is included in the Appendix. The lemma is hard to state because each worker type has to assign a different probability of applying to every different wage. The lemma shows that Bayesian equilibrium puts three kinds of structure on these strategies. First, it says that if a worker sends an application with positive probability to a firm offering a wage wk , then he or she must send applications to every higher wage as well. This is the “reservation wage” part of the story, since the worker has to decide what is the lowest wage to which he will send an application. Second, the formula (8.2) along with the requirement that the application probabilities sum to 1, then determines the entire application strategy from the reservation wage. Third, and critically important for most of the technical results in the paper, the lemma partitions the worker type space into intervals and then says that all workers whose type is in the same interval will use exactly the same application probabilities. To see why mixing has to occur, start with the elite types in the interval [ym y]. As in the two firm example described above, they apply only to the highest wage firm. They might as well, since their types are so high they are very likely to be hired at every wage no matter what the other workers do. The “marginal” worker type in this elite group has type ym . He is just indifferent between applying at the highest wage firm and getting the job if it happens to be the case that there are no other elite workers, and applying at the second highest wage and getting that job for sure. Now consider a worker whose type is just slightly below ym . If the only workers who apply to the highest wage firm are workers in the elite group, then this inframarginal type has the same chance of finding a job with the highest wage firm as does the worker of type ym ; he will get the job if none of the other workers has an elite type. Yet if he applies at the second highest wage, there is always a chance that he will lose out to a worker whose type is in between
his type and y_m. Since the worker of type y_m is indifferent between the highest and second highest wage, the worker with the lower type must strictly prefer to apply at the highest wage. So to support an equilibrium, these inframarginal workers whose types are slightly below y_m must face the same competition from other inframarginal workers at the highest wage firm as they do at the second highest wage firm. In other words, inframarginal workers must apply with positive probability at the highest wage firm. Exactly the same logic extends down through all the lower types.

This explains the difference between models with a continuum of types and models like Lang, Manove, and Dickens (1999), where there are only two types.18 As explained above, it is the inframarginal workers who break up any potential equilibrium where workers sort by type. When there are only two types, as there are in Lang, Manove, and Dickens (1999), there are no inframarginal types, so complete sorting can be supported when the lower type is just indifferent between applying at the low wage and the high wage. The addition of a third worker type in their model, without a corresponding firm type to hire it, would lead to an equilibrium in the application subgame that more closely resembles the equilibrium described here.

The uniqueness of the mixed equilibrium also explains why assortative matching cannot be supported in equilibrium, as it is in Eeckhout and Kircher (2008). The difference between the two models is that Eeckhout and Kircher (2008) assumed that the "proposer" (or wage setter here) does not care directly about the types of the parties involved in a transaction. This same property could be accomplished here by changing the offer that the firm makes from a wage to a demand for profit, that is, whomever it hires, the firm requires the same profit. The firm then no longer cares which worker it hires, and workers will naturally match assortatively (under the payoff restrictions that they provide). Their approach is similar to the Shimer (2005) and Shi (2002) approach, which makes the wage contingent on worker type. Yet feasible contracts in Eeckhout and Kircher (2008) are more restricted than in Shimer (2005) and Shi (2002) since firms cannot vary wages arbitrarily across types.

Models like Shimer (2005) or Shi (2002), where firms can condition wages on type, also support mixed application strategies, but for very different reasons. As in Shi (2002), suppose there are only two types of firms and two types of workers. High type firms offer high wages to attract high type workers. However, by the nature of directed search, there is a chance they will nonetheless end up without applications. Firms can raise their profits by setting a higher wage to attract lower quality applicants, whom they will hire only in the event that no high quality applicants apply. The higher wage for lower quality workers is needed to compensate them for the low chance that they will be hired. So mixing among firms offering different wages is used to support the equilibrium in the wage setting part of the game.
18 Two types is a very natural assumption for their problem.
9. LIMIT PAYOFFS

Finally, we show the sense in which the large game payoffs we described in the continuum model in the first part of the paper are limits of payoffs in finite games. There are a couple of reasons for doing this. First, the matching probabilities and payoff functions used in the continuum model are subtly different from those used in the conventional directed search literature. For instance, matching probabilities are not based on the queue size at the firm in the usual way. Furthermore, payoffs associated with wage offers that exceed all wages in the support of the existing distribution of wages involve outcomes in which firms sometimes choose the best applicant and sometimes select randomly from a group of highest quality applicants. Whether or not these functions seem to be plausible descriptions of payoffs, they are essentially ad hoc. The limit theorem provided here justifies these payoff functions.

Furthermore, the symmetric continuation equilibrium in worker application strategies is unique. This limit theorem shows not only that the heuristic descriptions of payoffs given in the continuum model are reasonable approximations to payoffs in large finite games, but also that these payoff functions are the only ones that can be used to approximate payoffs in large finite versions of the game.

Last, the limit results are designed to provide payoffs for all distributions of wages, not just those associated with equilibrium. So they do not directly address sequences of equilibria in the wage setting part of the game. However, it is straightforward to use these results to show that every sequence of pure strategy equilibria in finite versions of the game converges to an equilibrium in the continuum game as we have described it above. All this requires is a restriction on firms' payoff functions that ensures that they do not vary in "unreasonable" ways with worker types.19 The proof is by contradiction. If convergence fails, then the limit distribution of wages will have the property that a measurable set of firms will want to deviate. Then, provided payoffs are not too sensitive to type and wage distributions, firms will want to deviate in large finite versions of the game as well. The details of this straightforward but lengthy argument are left out in the interest of brevity.

THEOREM 9.1: Let G be a distribution of wages and let w be a wage in the support of G offered by a firm of type x. Let G^n be a sequence of distributions with finite support that converges weakly to G. Then worker and firm payoffs in the continuation equilibrium in which other firms offer wages given by the mass points in G^n converge to the payoff functions given by (4.1), (4.2), (4.3), and (4.4) in Section 4.

The complete proof of this theorem is in the (online) Appendix.
19 For example, if the family of payoff functions for a firm determined by the set of worker types is equicontinuous.
10. CONCLUSION

This paper illustrates how a directed search model can be used to model wage competition among firms that cannot condition wage payments on worker type. Part of this involves adjusting the directed search model to allow for rich variation in the types of workers and firms. This improves on existing models that use extensive symmetry assumptions that sometimes force the models to behave in counterfactual ways. In the variant proposed here, rich distributions of firm and worker characteristics can be incorporated.

The directed search model does impose some structure on the data. Surprisingly, it restricts the relationship between the wage distribution and the function relating unemployment duration and exit wage. Some wage distributions (the uniform being an example) have the property that workers who leave unemployment at high wages must also have shorter unemployment durations. This prediction is distinctly different from standard directed search models, where unemployment duration and wage must be positively related.

The driving force in the model presented here is the equilibrium of the workers' application subgame. Contrary to what one might expect, low quality workers do not restrict their applications to low wage firms. On the contrary, low quality workers make applications at all kinds of different wages. The higher the unobservable quality of the worker, the more discriminating the worker is in the wages at which he applies. It is this property that breaks the strong relationship between wage and unemployment probability. Higher quality workers are more likely—all else being constant—to be hired by firms. High quality workers also apply to higher wage firms on average. In this sense, high wages and short duration should be related. This relationship is not unambiguous, however. As a worker's quality rises, he is more likely to be hired at any given firm, but he will also restrict his applications to firms whose wages are higher. This by itself reduces the probability of employment, because high wage firms have bigger queues—the usual directed search story. Finally, this paper suggests how observable data on wages and duration can be used to provide a testable implication of the model.

REFERENCES

ACEMOGLU, D., AND R. SHIMER (2000): "Wage and Technology Dispersion," Review of Economic Studies, 67, 585–607. [1174,1179]
ADDISON, J. T., M. CENTENO, AND P. PORTUGAL (2004): "Reservation Wages, Search Duration and Accepted Wages in Europe," Discussion Paper 1252, IZA. [1174]
ALBRECHT, J., P. A. GAUTIER, AND S. VROMAN (2006): "Equilibrium Directed Search With Multiple Applications," Review of Economic Studies, 73, 869–891. [1174]
ALI-KHAN, M., AND Y. SUN (2002): "Non-Cooperative Games With Many Players," in Handbook of Game Theory With Economic Applications, Vol. 2. Amsterdam: Elsevier. [1180]
DELACROIX, A., AND S. SHI (2006): "Directed Search on the Job and the Wage Ladder," International Economic Review, 47, 651–699. [1174]
EECKHOUT, J., AND P. KIRCHER (2008): "Sorting and Decentralized Price Competition," Working Paper, University of Pennsylvania. [1174,1176,1193,1197]
GALENIANOS, M., AND P. KIRCHER (2005): "Directed Search With Multiple Applications," Working Paper, University of Pennsylvania. [1174]
LANG, K., AND M. MANOVE (2003): "Wage Announcements With a Continuum of Worker Types," Annales d'Economie et de Statistique, 10, 71–72. [1188,1189]
LANG, K., M. MANOVE, AND W. DICKENS (1999): "Racial Discrimination in Labour Markets With Announced Wages," Working Paper, Boston University. [1174,1176,1193,1197]
MAS-COLELL, A. (1975): "A Model of Equilibrium With Differentiated Commodities," Journal of Mathematical Economics, 2, 263–295. [1180]
PETERS, M. (2000): "Limits of Exact Equilibria for Capacity Constrained Sellers With Costly Search," Journal of Economic Theory, 95, 139–168. [1174]
(2010): "Supplement to 'Noncontractible Heterogeneity in Directed Search'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8379_proofs.pdf. [1189]
SCHMEIDLER, D. (1973): "Equilibrium Points of Nonatomic Games," Journal of Statistical Physics, 4, 295–300. [1180]
SHI, S. (2002): "A Directed Search Model of Inequality With Heterogeneous Skills and Skill-Based Technology," Review of Economic Studies, 69, 467–491. [1173,1174,1176,1193,1197]
(2009): "Directed Search for Equilibrium Wage Tenure Contracts," Econometrica, 77, 561–584. [1174]
SHIMER, R. (2005): "The Assignment of Workers to Jobs in an Economy With Coordination Frictions," Journal of Political Economy, 113, 996–1025. [1173,1174,1176,1184,1193,1197]
Dept. of Economics, University of British Columbia, 997-1873 East Mall, Vancouver, BC V6T 1Z1, Canada;
[email protected]. Manuscript received January, 2009; final revision received January, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1201–1237
INCENTIVE PROBLEMS WITH UNIDIMENSIONAL HIDDEN CHARACTERISTICS: A UNIFIED APPROACH

BY MARTIN F. HELLWIG1

This paper develops a technique for studying incentive problems with unidimensional hidden characteristics in a way that is independent of whether the type set is finite, the type distribution has a continuous density, or the type distribution has both mass points and an atomless part. By this technique, the proposition that optimal incentive schemes induce no distortion "at the top" and downward distortions "below the top" is extended to arbitrary type distributions. However, mass points in the interior of the type set require pooling with adjacent higher types and, unless there are other complications, a discontinuous jump in the transition from adjacent lower types.

KEYWORDS: Incentive problems, principal–agent models, hidden characteristics, general type distributions.
1. INTRODUCTION INCENTIVE PROBLEMS with unidimensional hidden characteristics have usually been analyzed under the assumption that either the type set is finite or the type set is an interval and the type distribution has a continuous, strictly positive density. These assumptions permit the application of standard optimization techniques: calculus if the type set is finite; control theory if the type set is a continuum. Both assumptions are special. In the space of distributions on the real line, distributions with finite supports and distributions with positive densities form a meager set. These distributions have the special property that the weights given to different types are commensurate in the sense that no one type is infinitely more important than any other type. This commensurateness property has played an important role in the analysis of such problems. It is, therefore, of interest to know to what extent the results and insights that have been obtained remain valid when commensurateness fails, in particular, when the type distribution has both mass points and a continuous part, so that some types (the mass points) are infinitely more important than others (the points at which the distribution has a positive density). As an example, consider the theory of optimal utilitarian income taxation. In this theory, the positivity of the optimal marginal income tax for all but the very highest types is usually explained in terms of a local equity-efficiency trade-off2 : If, for some type, the marginal income tax were zero, then this type’s labor–consumption pair would be efficient. At the margin, therefore, the efficiency loss induced by a small reduction of this type’s labor and consumption 1
1 For helpful comments and suggestions, I thank Christoph Engel, Alia Gizatulina, Hendrik Hakenes, and Klaus Ritzberger, as well as a co-editor and three referees.
2 See Mirrlees (1971, 1976, 1986), Seade (1977, 1982), and Hellwig (2007a).
would be negligible relative to the gains from the additional redistribution from higher types to lower types that is made possible by the induced slackening of incentive constraints for the higher types. This argument presumes that the different types are commensurate. If the type under consideration was a mass point of the type distribution, it would be “infinitely” more important than immediately adjacent higher types that are continuity points of the type distribution. Therefore, one could not presume that the efficiency loss induced by a small reduction of this type’s labor and consumption is negligible relative to the gains from the additional redistribution.3 Commensurateness of neighboring types is also presumed in elasticities interpretations of optimal income tax formulae in models with a continuum of types.4 The analysis of incentive problems with arbitrary type distributions requires a new technique. This paper develops such a technique and uses it to provide a complete characterization of optimal contracts in a principal–agent problem with unidimensional hidden characteristics, with a single-crossing condition on preferences, when no restriction is imposed on the type distribution. The principal–agent problem is chosen because its simplicity facilitates the exposition. The new technique can, however, be applied to any incentive problem with unidimensional hidden characteristics and with a single-crossing condition on preferences. In particular, it can be applied to the optimal income tax problem.5 For distributions with mass points and a continuous part, two new properties of optimal incentive schemes are obtained, both of them illustrated in Figure 1. First, any mass point below the top of the type set must be pooled with adjacent higher types. Second, a mass point between the top and the bottom of the type set is likely to give rise to a discontinuity in the mapping from types to outcomes.6 Both findings are due to the lack of commensurateness between the mass point and any adjacent higher or lower types that are continuity points of the type distribution. In terms of local trade-offs between allocative and distributive concerns (e.g., the equity-efficiency trade-off that is discussed in the 3 On this point, a referee has commented: “you distort downward to reduce the information rents of all types (not just the type that is immediately above).” While it is true that the weight of the mass point is commensurate with the weight of the set of all higher types, this observation is of little help when it comes to assessing whether one wants to use a distortion for type t or a distortion for type t + δt to reduce the information rents of types above t + δt Assessments of this sort underlie the Mirrlees formula for the optimal marginal income tax and the interpretations that this formula has been given. 4 Roberts (2000), Saez (2001), and Hellwig (2004/2008). 5 Because the optimal income tax problem is encumbered by the difficulties involved in characterizing the welfare weights of different types (Hellwig (2007a)), it is not so well suited for purposes of illustration of the new technique for dealing with problems involving arbitrary type distributions. 6 The effect can be neutralized by other, nonlocal considerations requiring the mass point to be pooled with lower as well as higher types. In the absence of such concerns, however, there must be an upward jump.
FIGURE 1.—An optimal incentive scheme when there is a mass point at t.
literature on optimal income taxation), the difference in relative weights given to a mass point and to adjacent continuity points of the type distribution implies that, at the mass point, efficiency concerns are much more important than at adjacent continuity points of the type distribution. In an income tax model, this would suggest that the mass point should work significantly more and consume significantly more than the neighboring types. With preferences satisfying a single-crossing condition, however, outcomes must be nondecreasing in types. Therefore, there cannot be a downward jump above the mass point. Instead, the monotonicity constraint is binding, and there is pooling of the mass point with adjacent higher types. By contrast, monotonicity does not preclude an upward jump as one moves from adjacent lower types to the mass point.7 This paper also considers those properties of optimal incentive schemes that have been discussed in the literature. Regardless of the structure of the type distribution, optimal incentive schemes involve no distortion “at the top” and downward distortions “below the top” of the type distribution. In the more general setting of this paper, the latter result requires a new argument. Whereas the desirability of downward distortions below the top has traditionally been derived from trade-offs imposed by the first-order conditions for incentive compatibility, at a mass point of the type distribution, the argument must rely on the second-order conditions. As explained above, the lack of commensurateness between a mass point and adjacent higher or lower continuity points of the type distribution implies that monotonicity conditions, that is, second-order conditions for incentive compatibility, are binding. The mass point must be pooled 7
However, there is no discontinuity in payoffs. Incentive compatibility precludes any discontinuity in the dependence of payoffs on types. At the discontinuity point, left-hand and right-hand limits of optimal outcomes lie on the same indifference curve for the critical type.
with adjacent higher types and the analysis must show that, for all types in the pool, outcomes are distorted downward from efficiency. The paper builds on two technical innovations. First, for an arbitrary model with unidimensional hidden characteristics, a change of variables can be used to redefine the notion of “type” in such a way that the original incentive problem is transformed into a new one, where the distribution of the “redefined types” has a density. This density need not be continuous. However, from Clarke’s (1976, 1983) version of the maximum principle under minimal hypotheses, we know that this is not a problem. The application of controltheoretic methods does not require continuity of the Hamiltonian with respect to the exogenous parameter, that is, the agent’s type. It is important, however, to verify that the change of variables has no material effect on the solution to the incentive problem under consideration. Second, for control problems with monotonicity constraints, a version of the maximum principle holds even if the map from types to outcomes is not continuous. According to this result, which is established in Hellwig (2008), one may think of the “slope” of the map from types to outcomes as a control variable even though this map may have a nontrivial singular component and its slope may be unbounded. The maximum principle requires that, regardless of whether this slope is finite or infinite, it should not be possible to raise the value of the Hamiltonian by changing it. Thus, whenever the map from types to outcomes is strictly increasing, the associated co-state variable must be zero. Previous work on incentive problems with unidimensional hidden characteristics has assumed that the map from types to outcomes is piecewise continuously differentiable. This assumption facilitates the control-theoretic treatment of monotonicity constraints. With piecewise continuous differentiability, the slope of the map from types to outcomes can be treated as a control variable; monotonicity of outcomes is equivalent to requiring this control variable to take nonnegative values.8 Because the map from types to outcomes is endogenous, however, the assumption of piecewise continuous differentiability is problematic. The technique developed in this paper provides a way to do without it. The generalization of the analysis to allow for type distributions with mass points as well as a continuous part is not just a matter of mathematical generality. Such type distributions arise naturally in quasilinear models in which the agent can get information about his type before he signs a contract. In such models, being uninformed is equivalent to having a type equal to the mean of the type distribution. Thus, if there is a positive probability that the agent does not learn his type at all, then, from the principal’s perspective, the contracting problem can be treated as an incentive problem with hidden characteristics in which the type distribution has a mass point at its mean.9 8
This approach was pioneered by Guesnerie and Laffont (1984). In the literature on information acquisition and incentive contracting, Szalay (2005) considered the very technology in the text, but assumed that information is only acquired after the 9
In the following article, Section 2 formulates the agency problem with hidden characteristics and states the main results. Section 3 uses a change of variables to make the problem amenable to control-theoretic methods. Section 4 uses the maximum principle for control problems with monotonicity constraints to characterize the solutions to the agency problem and prove the main results. 2. A PRINCIPAL–AGENT PROBLEM WITH HIDDEN CHARACTERISTICS 2.1. Statement of the Problem A principal wants an agent to produce some output y ≥ 0 in return for a wage payment w ≥ 0 The payoffs from the pair (w y) are y −w for the principal and u(w y t) for the agent, where t ∈ R is a productivity parameter. The function u is assumed to be twice continuously differentiable, nondecreasing in w and t, nonincreasing in y and strictly quasiconcave in w and y jointly. The agent’s utility function also satisfies u(0 0 t) = 0 for all t limy→0 uy (w y t) = 0, and limy→∞ uy (w y t) = −∞ uniformly in w for all t as well as (2.1)
u_w(w, y, t) > 0,    u_y(w, y, t) < 0,

and

(2.2)    \frac{∂}{∂t}\, \frac{|u_y(w, y, t)|}{u_w(w, y, t)} ≤ 0
for all w > 0 y > 0 and t.10 The single-crossing condition (2.2) is imposed as a weak rather than a strict inequality. The principal is assumed to have all the bargaining power. If he offers the agent a contract (w y), the agent can only accept or reject the offer. The agent’s payoff from rejecting the principal’s offer is zero. Thus, under complete information, the principal would hire the agent at a wage that just compensates him for the disutility from working, without letting him share in the surplus from production. However, there is incomplete information. Whereas the agent knows t the principal thinks of t as the realization of a random variable t˜, to which he attributes a probability distribution F The support T of the distribution F is assumed to be compact, with minimum t0 and maximum t1 contract has been signed. Crémer, Khalil, and Rochet (1998a, 1998b) assumed that information is acquired before the contract is offered, but they also had a further stage at which the agent learns and uses the information anyway; in their analysis, therefore, being uninformed is not the same as having a type equal to the mean of the type distribution. 10 This utility specification encompasses the commonly used quasilinear specification u = w − g(y)/t; for the quasilinear specification, the assumptions reduce to the conditions that g(0) = g (0) = 0 g (y) > 0, and g (y) > 0 for y > 0, and limy→∞ g (y) = ∞
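As a quick illustration of the single-crossing condition (2.2), the following sketch verifies it symbolically for one concrete member of the quasilinear family mentioned in footnote 10, using the hypothetical choice g(y) = y^2; this choice of g is an assumption made only for this example.

```python
import sympy as sp

w, y, t = sp.symbols('w y t', positive=True)
u = w - y**2 / t                           # quasilinear example u = w - g(y)/t with g(y) = y^2

ratio = -sp.diff(u, y) / sp.diff(u, w)     # |u_y| / u_w (here u_y < 0, so |u_y| = -u_y)
print(sp.simplify(ratio))                  # 2*y/t
print(sp.simplify(sp.diff(ratio, t)))      # -2*y/t**2, which is <= 0, so (2.2) holds
```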
Given his lack of information, the principal offers a menu of contracts and lets the agent choose a contract from the menu or reject the principal’s offer altogether. A contract menu is a pair (w(·) y(·)) of integrable functions on T such that, for any t ∈ T (w(t) y(t)) is the contract, that is, the wage/output combination that is chosen by the agent when his productivity parameter is t The principal’s problem is to choose the contract menu (w(·) y(·)) so that his expected net payoff, (2.3) [y(t) − w(t)] dF(t) is maximized subject to the incentive-compatibility condition that (IC)
u(w(t), y(t), t) ≥ u(w(t'), y(t'), t)
for all t and t' in T, and subject to the individual-rationality condition that (IR)
u(w(t) y(t) t) ≥ 0
for all t in T11 A contract menu that satisfies the incentive-compatibility and individual-rationality conditions is said to be admissible. A contract menu that maximizes the principal’s expected net payoff (2.3) subject to the incentivecompatibility and individual-rationality conditions is said to be optimal. 2.2. Distortions in Optimal Contracts Conceptually, the principal’s problem is a standard incentive problem with hidden characteristics. Textbook treatments are provided by Fudenberg and Tirole (1991), Mas-Colell, Whinston, and Green (1995), or Laffont and Martimort (2001) under the assumptions that T is a finite set or that T is an interval and F has a density that is strictly positive and continuous on T . Here, I only assume that T is compact. Let (w(·) y(·)) be an optimal contract menu and let (2.4)
v(·) := u(w(·) y(·) ·)
be the associated indirect utility function for the agent. To assess the efficiency properties of (w(t) y(t)) the literature compares (w(t) y(t)) to the 11 Condition (IR) presumes that the principal does not prefer to make an offer which, for some t, the agent wants to reject. This presumption involves no loss of generality: Under the given assumptions, a contract menu (w(·) y(·)) with the property that, for some t ∈ T the agent rejects the principal’s offer is payoff-equivalent to the contract menu that is obtained if, for the rejecting types, the contract offers (w(t) y(t)) are replaced by (0 0); this latter contract menu satisfies the participation constraint (IR) for all t.
pair (w∗ (t v(t)) y ∗ (t v(t))) that provides the person with productivity parameter t with the utility v(t) at the lowest net resource cost; formally, for any t and v (2.5)
(w^*(t, v), y^*(t, v)) = \arg\max_{y ≥ 0,\, w ≥ 0}\, \{\, y - w \mid u(w, y, t) ≥ v \,\}
The pair (w∗ (t v) y ∗ (t v)) is fully characterized by the first-order condition (2.6)
uw (w∗ (t v) y ∗ (t v) t) + uy (w∗ (t v) y ∗ (t v) t) = 0
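For concreteness, the sketch below computes the efficient pair defined by (2.5) and checks it against the first-order condition (2.6) for the same hypothetical quasilinear specification used in the earlier sketch, u = w − y²/t: there (2.6) gives y^*(t, v) = t/2 and the binding constraint then gives w^*(t, v) = v + t/4. The functional form is an assumption for illustration, not part of the general model.

```python
from scipy.optimize import minimize

def efficient_pair(t, v):
    """Solve (2.5): maximize y - w subject to u(w, y, t) >= v, with u = w - y**2 / t."""
    u = lambda w, y: w - y**2 / t
    res = minimize(lambda z: -(z[1] - z[0]),        # maximize y - w
                   x0=[1.0, 1.0],
                   constraints=[{'type': 'ineq', 'fun': lambda z: u(z[0], z[1]) - v}],
                   bounds=[(0, None), (0, None)])
    return res.x

t, v = 2.0, 0.3
w_star, y_star = efficient_pair(t, v)
print(w_star, y_star)              # numerical solution of (2.5)
print(v + t / 4.0, t / 2.0)        # closed form implied by (2.6): w* = v + t/4, y* = t/2
```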
In those cases that have been treated in the literature, the optimal (w(·) y(·)) and v(·) have been shown to exhibit the following properties. PROPERTY A: There is no distortion at the top: If F({t1 }) > 0 then (2.7) (w(t1 ) y(t1 )) = w∗ (t1 v(t1 )) y ∗ (t1 v(t1 )) If F({t1 }) = 0 then (2.8)
\lim_{k→∞}\, (w(t^k), y(t^k)) = \bigl(w^*(t_1, v(t_1)),\, y^*(t_1, v(t_1))\bigr)
for any sequence {t k } in T that converges to t1 from below. PROPERTY B: There are downward distortions below the top: For any t ∈ T ∩ [t0 t1 ), (2.9) (w(t) y(t)) w∗ (t v(t)) y ∗ (t v(t)) The following theorem extends the results in the literature to the more general setting considered here. Some care must be taken with the formulation because the choice of an optimal contract menu involves a certain arbitrariness, due to the fact that the principal’s objective function (2.3) is unchanged if the contract menu (w(·) y(·)) is modified on a set of probability zero. The arbitrariness is inessential, however, because the joint distribution of wage/output combinations and types is unaffected by such a modification. I will say that two contract menus (w(·) y(·)) and (w (·) y (·)) are equivalent if (w(t) y(t)) = (w (t) y (t)) for F -almost all t Two contract menus (w(·) y(·)) and (w (·) y (·)) are said to be strongly equivalent if they are equivalent and, in addition, u(w(t) y(t) t) = u(w (t) y (t) t) for all t, that is, they yield the same payoff to every type of the agent. THEOREM 2.1: For any optimal contract menu (w(·) y(·)) with associated indirect utility function v(·) for the agent, there exists a strongly equivalent contract ¯ ¯ menu (w(·) y(·)) (which is also optimal) and there exists tˆ ∈ [t0 t1 ] such that ¯ ¯ (w(·) y(·)) is nondecreasing and, moreover, the following conditions hold:
(a.1) If F([tˆ t1 ]) > 0 then, for all t ∈ [tˆ t1 ] ¯ 1 ) y(t ¯ 1 )) ¯ ¯ (2.10) (w(t) y(t)) = w∗ (t v(t)) y ∗ (t v(t)) = (w(t (a.2) If F([tˆ t1 ]) = 0 that is, if tˆ = t1 and F({t1 }) = 0 then ¯ k ) y(t ¯ k )) = w∗ (t1 v(t1 )) y ∗ (t1 v(t1 )) (2.11) lim (w(t k→∞
for any sequence {t k } in T that converges to t1 from below. (b) For any t ∈ [t0 tˆ), ¯ ¯ (2.12) (w(t) y(t))
w∗ (t v(t)) y ∗ (t v(t)) COROLLARY 2.2: If, at the point (w y t) = (w∗ (t1 v(t1 )) y ∗ (t1 v(t1 )) t1 ) the single-crossing condition (2.2) holds with a strict inequality, then the critical tˆ in Theorem 2.1 coincides with t1 and the contract menus (w(·) y(·)) and ¯ ¯ (w(·) y(·)) exhibit Properties A and B. For utility functions that satisfy the single-crossing condition (2.2) with a strict inequality, Corollary 2.2 shows that, regardless of the form of the type distribution, optimal contract menus must have Properties A and B, no distortion at the top, and downward distortions below the top of the type distribution. If the single-crossing condition (2.2) holds only as a weak inequality, optimal contract menus need not literally have Properties A and B. In this case, however, one still gets a decomposition of the type set into an upper part, T ∩[tˆ t1 ] where the optimal contract is efficient, and a lower part, T ∩ [t0 tˆ) where the optimal contract is distorted downward from efficiency. We still have no distortion at the top, but the top now can be an entire interval [tˆ t1 ] If this is the ¯ ¯ case, then, by statement (a.1) of the theorem, the contract (w(t) y(t)) is the same for all t in the interval [tˆ t1 ] Because this contract is efficient for all these types, the single-crossing condition (2.2) must locally hold as an equation, that is, one must have (2.13)
\frac{∂}{∂t}\, \frac{|u_y(w, y, t)|}{u_w(w, y, t)}\Bigl(w^*(t_1, v(t_1)),\, y^*(t_1, v(t_1)),\, t\Bigr) = 0
for all t ∈ [tˆ t1 ] Conversely, if, locally, at (w∗ (t1 v(t1 )) y ∗ (t1 v(t1 )) t) the single-crossing condition (2.2) is strict, one must have tˆ = t1 so that the optimal contract menu satisfies Properties A and B as specified. The corollary makes this converse explicit. Theorem 2.1 and Corollary 2.2 provide a positive answer to the question, which has been raised in the literature on optimal taxation,12 whether Property A holds if the type distribution has a density and the value of the density 12
Brett and Weymark (2003).
at the top is equal to zero. There is no need to assume that the density is strictly positive at t1 . To understand the underlying logic, consider the case where F has a continuous density f and u takes the form y (2.14) u(w y t) = w − γ t which was used by Mirrlees (1971). If a first-order approach to incentive compatibility is valid, an optimal contract menu must satisfy the optimality condition ¯ ¯ ¯ ¯ (2.15) f (t) 1 + uy (w(t) y(t) t) = (1 − F(t))uyt (w(t) y(t) t) that is,
(2.16)    f(t)\Bigl(1 - \frac{1}{t}\, γ'\Bigl(\frac{y}{t}\Bigr)\Bigr) = (1 - F(t))\, \frac{1}{t^2}\Bigl(γ'\Bigl(\frac{y}{t}\Bigr) + \frac{y}{t}\, γ''\Bigl(\frac{y}{t}\Bigr)\Bigr)
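Before turning to the general argument, a short numerical illustration of how (2.16) forces the distortion to vanish at the top of a compact type set. It uses hypothetical primitives chosen only for tractability: γ(x) = x², so γ'(x) = 2x and γ''(x) = 2, and a uniform type distribution on [1, 2]; with these choices (2.16) is linear in y and can be solved in closed form at each t.

```python
t1 = 2.0                                   # top of the (hypothetical) type set [1, 2]

def wedge(t):
    # With gamma(x) = x**2 and F uniform on [1, 2] (f = 1, 1 - F(t) = 2 - t),
    # (2.16) reads  1 - 2*y/t**2 = (2 - t) * 4*y / t**3,  which is linear in y:
    y = t**3 / (8.0 - 2.0 * t)
    return 1.0 - 2.0 * y / t**2            # the term whose vanishing means "no distortion"

for t in [1.5, 1.9, 1.99, 1.999, t1]:
    print(t, wedge(t))                     # the wedge shrinks to 0 as t approaches t1
```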
For any sequence {t k } that converges to t1 from below, the ratio (1 − F(t k ))/ (f (t k )) converges to zero.13 Along such a sequence, therefore, (1/t k )γ (y/t k ) must converge to 1, that is, the wedge distorting the output level of type t k vanishes. If one cannot just rely on a first-order approach, the argument is more complicated, but the economic logic is the same: Near t1 distortions are of the weight of the set of types above t from kept small because the ratio 1−F(t) f (t) whom the principal can extract more rents as a result of a distortion at t and the density of the type t that is affected by the distortion is close to zero. As t converges to t1 , therefore, the trade-off between the efficiency loss and the rent extraction gain from the distortion at t becomes degenerate. A referee has commented that Property A depends on the assumption that t1 is known and finite. If t1 = ∞ and F({t1 }) = 0 the efficient pair in (2.8) is not well defined, but in the formulation of Property A, (2.8) can be replaced by the requirement that (2.17)
\lim_{t ↑ t_1}\, \frac{|u_y(\bar{w}(t), \bar{y}(t), t)|}{u_w(\bar{w}(t), \bar{y}(t), t)} = 1
Condition (2.17) is equivalent to (2.8) if t1 < ∞ and can also be applied when t1 = ∞ However, with t1 = ∞ the ratio 1−F(t) need not go to zero as t becomes large. f (t) Adapting an example of Diamond (1998), suppose that, above some threshold, F is a Pareto distribution, that is, 1 − F(t) = t −α for some α > 0 so that the 13 Because 1 − F(t1 ) = 0 this claim is trivial if f (t1 ) > 0 More generally, 1 − F(t1 ) = 0 implies = 0. limt↑t1 ln(1 − F(t)) = −∞ Therefore, limt↑t1 dtd ln(1 − F(t)) = −∞ and, hence, limt↑t1 1−F(t) f (t)
ratio 1−F(t) = αt goes out of bounds with t. If u takes the form (2.14) with γ( yt ) = f (t) y q ( t ) for some q > 1 (2.16) can be shown to imply α 1 y (2.18) γ = t t q+α ¯ For the given utility specification, the marginal rate of substitution |uy (w(t) ¯ ¯ ¯ y(t) t)|/(uw (w(t) y(t) t)) is just equal to the term 1t γ ( yt ) on the left-hand side. Condition (2.18) requires this term to be constant and less than 1, which is incompatible with (2.17). In this case, Property A does not hold. The crucial difference between this example and Theorem 2.1 is in the bewhen t converges to t1 . In the example, this ratio goes havior of the ratio 1−F(t) f (t) out of bounds; in the setting of Theorem 2.1, with a compact type set, it necessarily goes to zero as t converges to t1 For the utility specification (2.14), the arguments of Werning (2007) imply that even with t1 = ∞, Property A, with (2.17) replacing (2.8), is obtained if 1−F(t) goes to zero as t → ∞ and if tf (t)
the relative curvature γ (y/t) of the effort cost function is bounded I conjecture γ goes that this is generally true whenever uyt /uy is uniformly bounded and 1−F(t) f (t) to zero as t becomes large. Property A is properly understood as the statement that, near the top of the type distribution, the trade-off between distributive is close to zero. Relaand allocative concerns is degenerate if the ratio 1−F(t) f (t) tive to the density of the type t that is affected by a distortion at t the weight of the set of types above t from whom the principal can extract more rents as a result of the distortion at t is close to zero. Therefore, it is undesirable to have any significant distortion. 2.3. Mass Points, Pooling, and Discontinuities in Optimal Contract Menus The following results establish some additional properties of optimal contract menus. These properties arise only when the type distribution has both mass points and a continuous part. THEOREM 2.3: Let (w(·) y(·)) be any optimal contract menu, and let ¯ ¯ (w(·) y(·)) and tˆ ∈ [t0 t1 ] be the associated strongly equivalent contract menu and critical type as given by Theorem 2.1. If F({t}) > 0 for some t ∈ [t0 tˆ) there ¯ ¯ exists t¯ ∈ (t tˆ] such that, on the interval [t t¯) the functions w(·), y(·), and ¯ − w(·) ¯ y(·) are constant. In particular, if F((t t¯)) > 0 the menus (w(·) y(·)) ¯ ¯ and (w(·) y(·)) both provide for pooling of type t with adjacent higher types. The rationale for this result was sketched in the Introduction: If there was no pooling with higher types, the usual trade-off between the distributive effects and the allocative effects of a downward distortion in the contract ¯ ¯ (w(t) y(t)) for a type t that has positive mass would be degenerate. Type t
would be deemed to have so much weight that a distortion in (w̄(t), ȳ(t)) away from efficiency would seem undesirable. However, for a type t′ just above t that does not have positive mass, standard arguments imply that the contract (w̄(t′), ȳ(t′)) is distorted downward from efficiency. The resulting contract menu, though, with an efficient outcome for t and a downward distortion for t′ > t, would be decreasing and would violate incentive compatibility. The assumption that type t is not pooled with higher types thus leads to a contradiction.

By contrast, the monotonicity requirement does not preclude upward jumps in the optimal contract menu. The observation that a mass point is incommensurately more important than any continuity point of the type distribution implies that, as one moves from immediately adjacent lower types to the mass point, the weights given to losses from distortions of efficiency and to gains from alleviating incentive constraints change discontinuously. As illustrated in Figure 1 in the Introduction, this induces an upward jump in the contract menu.

PROPOSITION 2.4: Assume that for any w and y the function

    t → ln( |u_y(w, y, t)| / u_w(w, y, t) )

is convex.¹⁴ Assume also that the density f_a of the absolutely continuous component F_a of the type distribution F is strictly positive and nondecreasing on [t_0, t_1]. Let (w(·), y(·)) be any optimal contract menu, and let (w̄(·), ȳ(·)) and t̂ ∈ [t_0, t_1] be the associated strongly equivalent contract menu and critical type as given by Theorem 2.1. If F({t}) > 0 for some t ∈ [t_0, t̂), consider the lowest type that gets the same outcome as t, that is,

    t̲ := inf{t′ | (w̄(t′), ȳ(t′)) = (w̄(t), ȳ(t))}.

If t̲ > t_0, the contract menus w̄(·) and ȳ(·) are discontinuous at t̲. If t̲ > t_0 and F((t̲ − Δ, t̲)) > 0 for all Δ > 0, the contract menu (w(·), y(·)) is also discontinuous at t̲.

In Proposition 2.4, the additional assumptions—log-convexity of |u_y(w, y, t)|/u_w(w, y, t) in t and monotonicity of the density f_a—are introduced to eliminate the possibility that the mass point might belong to the interior of an ironing interval à la Guesnerie and Laffont (1984). Such an interval would provide for a pooling of types even when the type distribution has a continuous density.

¹⁴ This assumption is satisfied, for example, by the quasilinear specification u(w, y, t) = w − (1/t)g(y). For this specification, ln(|u_y(w, y, t)|/u_w(w, y, t)) = ln g′(y) − ln t, which is strictly convex in t.
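The convexity claim in footnote 14 is immediate to verify: for u(w, y, t) = w − (1/t)g(y), the log of the marginal rate of substitution is ln g′(y) − ln t, whose second derivative in t is 1/t² > 0. A short symbolic confirmation (with g an arbitrary increasing cost function) follows.

```python
# Verify footnote 14: for u(w, y, t) = w - g(y)/t, ln(|u_y|/u_w) is strictly convex in t.
import sympy as sp

w, y, t = sp.symbols('w y t', positive=True)
g = sp.Function('g')                               # any increasing cost function g
u = w - g(y) / t
log_mrs = sp.log(-sp.diff(u, y) / sp.diff(u, w))   # ln(|u_y|/u_w) = ln g'(y) - ln t
print(sp.simplify(sp.diff(log_mrs, t, 2)))         # t**(-2) > 0: strictly convex in t
```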
The additional assumptions imply that the contracts offered to higher types necessarily involve strictly greater outcomes than the contracts offered to lower types unless the higher types are pooled with mass points. Without mass points, these additional assumptions imply that there is no pooling. This is formally stated as follows.

PROPOSITION 2.5: Assume that for any w and y the function t → ln(|u_y(w, y, t)|/u_w(w, y, t)) is convex. Assume also that the type distribution has a density, and that this density is strictly positive and nondecreasing on [t_0, t_1]. If (w(·), y(·)) is any optimal contract menu, and if (w̄(·), ȳ(·)) and t̂ are the associated strongly equivalent contract menu and critical type as given by Theorem 2.1, then both menus (w(·), y(·)) and (w̄(·), ȳ(·)) are strictly increasing on [t_0, t̂], where t̂ is the critical type above which outcomes involve no distortion away from efficiency.

Whereas log-convexity of |u_y(w, y, t)|/u_w(w, y, t) in t and monotonicity of the density provide for monotonicity of optimal contract menus, continuity requires a different set of assumptions. Adapting an argument of Mirrlees (1986, pp. 1231f.), one obtains the following proposition.

PROPOSITION 2.6: Assume that for any v and t the function y → u_t(c(v, y, t), y, t) is convex, where c(v, y, t) is defined so that u(c(v, y, t), y, t) = v for any v, y, t. If the type distribution has a continuous density, any optimal contract menu is continuous.

Convexity of the function y → u_t(c(v, y, t), y, t) is implied by convexity of the function (v, y) → u_t(c(v, y, t), y, t). As discussed in Hellwig (2004, 2007b), this latter condition is equivalent to the assumption that consumption-specific risk aversion is weakly decreasing in t; that is, that for any random pair (w̃, ỹ), the amount of consumption that the agent is willing to sacrifice to eliminate the uncertainty in (w̃, ỹ) is nonincreasing in t. This assumption also implies that, up to equivalence, the optimal contract menu is unique. Moreover, the optimal deterministic contract menu remains optimal if randomization is allowed; randomization is undesirable.

2.4. Three Preliminary Lemmas

The remainder of the paper provides formal proofs of the theorems and propositions. I begin by stating three lemmas that allow me to replace incentive-compatibility and individual-rationality constraints by analytically tractable conditions on the indirect utility function v(·) and by a monotonicity condition on outcomes, along the lines of Mirrlees (1976). These lemmas are proved in the Supplemental Material (Hellwig (2010)). The first result shows that even if the support of the type distribution is a strict subset of the interval [t_0, t_1], one can always formulate the principal's problem in terms of contract menus that are defined on the interval [t_0, t_1].
Refer to a contract menu with domain X as incentive compatible on X if condition (IC) holds for all t and t′ in X; refer to it as individually rational on X if condition (IR) holds for all t in X. Then one obtains the following lemma.

LEMMA 2.7: A contract menu (w(·), y(·)) that is defined on T ⊂ [t_0, t_1] is incentive-compatible and individually rational on T if and only if there exists an extension of (w(·), y(·)) to the interval [t_0, t_1] that is incentive-compatible and individually rational on [t_0, t_1].

This lemma implies that there is no loss of generality in assuming that all contract menus are defined on the interval [t_0, t_1] and have to satisfy incentive-compatibility and individual-rationality conditions on this interval. For t ∈ [t_0, t_1] \ T, contracts (w(t), y(t)) can always be chosen so that conditions (IC) and (IR) hold. Incentive-compatibility and individual-rationality requirements for such types do not add materially to the principal's constraints. These types' contracts are, of course, irrelevant for the principal's payoff expectations.

For contract menus that are defined on [t_0, t_1], the arguments of Mirrlees (1976) are easily adapted to yield the following characterization of incentive compatibility.

LEMMA 2.8: A nondecreasing contract menu (w(·), y(·)) is incentive-compatible and individually rational on [t_0, t_1] if and only if the induced indirect utility function v(·) satisfies the integral equation

(2.19)    v(t) = v(t_0) + ∫_{t_0}^{t} u_t(w(τ), y(τ), τ) dτ

for t ∈ [t_0, t_1] and, moreover,

(2.20)    v(t_0) ≥ 0.

In Lemma 2.8, weak monotonicity of the contract menu is assumed. Under a strict single-crossing condition, weak monotonicity is, in fact, known to be necessary for incentive compatibility. Here, with only a weak single-crossing condition, this is not the case. However, with strictly convex indifference curves of the agent, the principal does not want to implement a nonmonotonic contract menu.¹⁵ This is the point of the following lemma.

¹⁵ The logic is the same as in Figure 2 below.

LEMMA 2.9: For any incentive-compatible contract menu (w(·), y(·)), there exists a nondecreasing incentive-compatible contract menu (w̄(·), ȳ(·)) that
provides the agent with the same payoff v(t) = u(w(t), y(t), t) for all t and that satisfies

(2.21)    ∫ [ȳ(t) − w̄(t)] dF(t) ≥ ∫ [y(t) − w(t)] dF(t);

moreover, the inequality in (2.21) is strict unless the contract menus (w(·), y(·)) and (w̄(·), ȳ(·)) are equivalent.

3. A REFORMULATION OF THE PRINCIPAL'S PROBLEM

Lemmas 2.7 and 2.8 imply that the principal's problem is equivalent to the problem of choosing w(·), y(·), and v(·) so as to maximize (2.3) under the constraints that y(·) is nondecreasing and that v(·) = u(w(·), y(·), ·) satisfy the integral equation (2.19) and the boundary condition (2.20). With a slight abuse of language, I will refer to this problem also as the principal's problem.

The integral equation (2.19) is equivalent to the requirement that v(·) be absolutely continuous, with Radon–Nikodym derivative

(3.1)    v′(t) = u_t(w(t), y(t), t)

for almost all t. One is therefore tempted to treat the principal's problem as a problem of optimal control with state variable v and control variables w and y. This would be the natural way to proceed if y(·) was not required to be nondecreasing and if F had a density. Here, however, a direct application of control-theoretic methods is precluded by the monotonicity requirement on y(·) and the lack of any structure on F. I will therefore reformulate the principal's problem so as to circumvent these difficulties.

For this purpose, I change the variable of integration in (2.3), using a new variable x rather than t as the argument of the functions that are to be chosen. In a sense, this amounts to a redefinition of the notion of type. The new "pseudotype" is constructed so that its distribution has a density. This density, however, need not be continuous. For any t ∈ [t_0, t_1], set

(3.2)    ξ(t) := t + F(t).

The function ξ is strictly increasing and has an inverse τ = ξ^{−1}. The inverse is defined on the range of ξ, a subset of the interval [x_0, x_1] := [t_0, t_1 + 1]. Using the fact that F(·), and therefore also ξ(·), is right-continuous as well as increasing, one can extend the inverse τ to the entire interval [x_0, x_1] by setting

(3.3)    τ(x) = sup{s | ξ(s) ≤ x}.
The distribution of x̃ is G := F ∘ ξ^{−1}. If one sets x̃ = ξ(t̃), one has t̃ = τ(x̃), and the distribution F of t̃ satisfies F = G ∘ τ^{−1} = G ∘ ξ. By the change-of-variables formula, it follows that, for any function h on [t_0, t_1],

(3.4)    ∫_{t_0}^{t_1} h(t) dF(t) = ∫_{x_0}^{x_1} h(τ(x)) dG(x).

The following lemma shows that G has a density so that (3.4) can actually be written in the form

(3.5)    ∫_{t_0}^{t_1} h(t) dF(t) = ∫_{x_0}^{x_1} h(τ(x)) g(x) dx.
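The construction can be illustrated numerically. The sketch below uses an arbitrary example distribution (a uniform continuous part mixed with a single mass point) chosen purely for illustration; it shows how ξ, τ, and g behave around an atom and checks the change-of-variables formula (3.5).

```python
# Numerical sketch of the pseudotype construction (3.2)-(3.3).
# Example distribution (illustrative only): F = 0.7 * Uniform[0,1] plus an atom
# of mass 0.3 at t = 0.5, so t0 = 0, t1 = 1, x0 = 0, x1 = 2.
import numpy as np

t0, t1 = 0.0, 1.0
atom_t, atom_mass = 0.5, 0.3

def F(t):
    return 0.7 * np.clip(t, t0, t1) + atom_mass * (t >= atom_t)

s_grid = np.linspace(t0, t1, 200001)
xi_grid = s_grid + F(s_grid)               # xi(t) = t + F(t), strictly increasing, right-continuous

def tau(x):
    """tau(x) = sup{ s : xi(s) <= x }, computed on the s-grid (cf. (3.3))."""
    idx = np.searchsorted(xi_grid, x, side='right') - 1
    return s_grid[np.clip(idx, 0, len(s_grid) - 1)]

x0, x1 = t0, t1 + 1.0
xs = np.linspace(x0, x1, 20001)
taus = tau(xs)
g = 1.0 - np.gradient(taus, xs)            # density of the pseudotype, as in (3.8)

# Around the atom, tau is constant on an interval of length F({0.5}) = 0.3
# (from xi(0.5-) = 0.85 to xi(0.5) = 1.15) and g = 1 there: the atom is
# spread out over the pseudotype axis.
inside = (xs > 0.9) & (xs < 1.1)
print(taus[inside].min(), taus[inside].max(), g[inside].mean())   # ~0.5, ~0.5, ~1.0

# Change of variables (3.5): integral of h dF equals integral of h(tau(x)) g(x) dx.
h = lambda t: t ** 2
lhs = 0.7 * (1.0 / 3.0) + atom_mass * atom_t ** 2    # computed directly from F
vals = h(taus) * g
rhs = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(xs)))
print(lhs, rhs)                                      # agree up to grid error
```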
LEMMA 3.1: The function τ(·) that is defined by (3.2) and (3.3) is absolutely continuous. Its derivative τ′(·) satisfies

(3.6)    τ′(x) = 1 / (1 + f(τ(x)))

if, at t = τ(x), the derivative F′(t) = f(t) is well defined;

(3.7)    τ′(x) = 0

otherwise. The distribution function G = F ∘ ξ^{−1} is also absolutely continuous. Its density g(x) = G′(x) satisfies

(3.8)    g(x) = 1 − τ′(x)

for all x.

PROOF: From (3.2), (3.3), and the definition of G, one has
(3.9)    x = τ(x) + F(τ(x)) = τ(x) + G(x)

for all x ∈ [x_0, x_1]. Since τ(·) and G(·) are both nondecreasing, it follows that both are Lipschitz continuous, hence absolutely continuous. Moreover, their slopes must add to 1. For any x, τ(x + Δ) > τ(x − Δ) for all Δ > 0 implies

    1 = [τ(x + Δ) − τ(x − Δ)]/(2Δ) + [F(τ(x + Δ)) − F(τ(x − Δ))]/(2Δ);

hence

    lim_{Δ→0} [τ(x + Δ) − τ(x − Δ)]/(2Δ) = 1 / ( 1 + lim_{Δ→0} [F(τ(x + Δ)) − F(τ(x − Δ))]/[τ(x + Δ) − τ(x − Δ)] ),
which yields (3.6) if

    lim_{Δ→0} [F(τ(x + Δ)) − F(τ(x − Δ))]/[τ(x + Δ) − τ(x − Δ)] = f(τ(x))

is well defined, and yields (3.7) if

    lim_{Δ→0} [F(τ(x + Δ)) − F(τ(x − Δ))]/[τ(x + Δ) − τ(x − Δ)] = ∞.

Trivially, (3.7) holds also if τ(x + Δ) = τ(x − Δ) for some Δ > 0. For this case, (3.3) indicates that ξ and F are discontinuous at t = τ(x + Δ) = τ(x − Δ). The derivative F′(t) is then not well defined at t.    Q.E.D.

For any contract menu (w(·), y(·)), (3.3) and (3.5) imply that the principal's payoff (2.3) can be rewritten as

(3.10)    ∫_{x_0}^{x_1} [y(τ(x)) − w(τ(x))] g(x) dx.

Moreover, if û is defined so that

(3.11)    û(w, y, x) := u(w, y, τ(x))

for all w, y, x, then (2.4), (2.19), and (2.20) are equivalent to the conditions

(3.12)    v(τ(x)) = û(w(τ(x)), y(τ(x)), x),

(3.13)    v′(τ(x)) τ′(x) = û_x(w(τ(x)), y(τ(x)), x),

and

(3.14)    v(τ(x_0)) ≥ 0.

Now (3.10)–(3.14) can be rewritten as

(3.15)    ∫_{x_0}^{x_1} [ŷ(x) − ŵ(x)] g(x) dx,

(3.16)    v̂(x) = û(ŵ(x), ŷ(x), x),

(3.17)    v̂′(x) = û_x(ŵ(x), ŷ(x), x),

and

(3.18)    v̂(x_0) ≥ 0,
where

(3.19)    ŵ := w ∘ τ,    ŷ := y ∘ τ,    and    v̂ := v ∘ τ.
Consider the problem of maximizing (3.15) under the constraints that ŷ(·) be nondecreasing and that v̂(·) satisfy (3.16)–(3.18). For lack of a better term, I call this the principal's modified problem. The following proposition shows that this problem is actually equivalent to the principal's problem.

PROPOSITION 3.2: A contract menu (w(·), y(·)) with associated indirect utility function v(·) solves the principal's problem if and only if the functions ŵ(·), ŷ(·), and v̂(·) that are given by (3.2), (3.3), and (3.19) solve the principal's modified problem.

PROOF: The principal's problem has been shown to be equivalent to the problem of choosing w(·), y(·), and v(·) so as to maximize (3.10) under the constraints that y(·) be nondecreasing and that v(·) satisfy conditions (3.11)–(3.14). This problem is equivalent to the problem of choosing functions ŵ(·), ŷ(·), and v̂(·) to maximize (3.15) under the constraints that ŷ(·) be nondecreasing, that v̂(·) satisfy (3.16)–(3.18) with û given by (3.11), and that ŵ(·), ŷ(·), and v̂(·) can be represented in the form (3.19) for some functions w(·), y(·), and v(·). This latter problem is the same as the principal's modified problem with the added constraint that ŵ(·), ŷ(·), and v̂(·) take the form (3.19) for some functions w(·), y(·), and v(·). To prove the proposition, it is therefore sufficient to show that the added constraint is redundant, because, up to modifications on null sets, any solution ŵ, ŷ, v̂ to the principal's modified problem satisfies (3.19) for some functions w, y, v.

For this purpose, I will show that, for almost all x_1 and x_2, τ(x_1) = τ(x_2) implies ŵ(x_1) = ŵ(x_2), ŷ(x_1) = ŷ(x_2), and v̂(x_1) = v̂(x_2). From (3.11) and (3.17), one has

    v̂′(x) = u_t(ŵ(x), ŷ(x), τ(x)) τ′(x).

Since τ(x_1) = τ(x_2) implies τ′(x) = 0 for almost all x ∈ [x_1, x_2], it follows that τ(x_1) = τ(x_2) implies v̂(x_1) = v̂(x_2). By standard arguments,¹⁶ it follows that there exists a function v such that v̂(x) = v(τ(x)) for all x. Next, consider the function w* such that

(3.20)    w*(t) = E[ŵ(x̃) | τ(x̃) = t]

for all t, where x̃ is distributed as G. By the definition of the conditional expectation, one has

(3.21)    ∫_{x_0}^{x_1} w*(τ(x)) dG(x) = ∫_{x_0}^{x_1} ŵ(x) dG(x).

¹⁶ See, for example, Result (8) in Hildenbrand (1974, p. 43).
Given w*, consider also the function y* such that

(3.22)    u(w*(t), y*(t), t) = v(t)

for all t. By the definition of v, one also has

(3.23)    u(ŵ(x), ŷ(x), τ(x)) = v(τ(x))

for all x. By the strict quasiconcavity of u, (3.20), (3.22), and (3.23) imply

(3.24)    y*(t) ≥ E[ŷ(x̃) | τ(x̃) = t]

for all t; hence

(3.25)    ∫_{x_0}^{x_1} y*(τ(x)) dG(x) ≥ ∫_{x_0}^{x_1} ŷ(x) dG(x).

Moreover, the inequality in (3.25) is strict unless one has w*(τ(x)) = ŵ(x) for G-almost all x. If the inequality in (3.25) is strict, one has

(3.26)    ∫_{x_0}^{x_1} [y*(τ(x)) − w*(τ(x))] dG(x) > ∫_{x_0}^{x_1} [ŷ(x) − ŵ(x)] dG(x).
Given that, trivially, the triple (w* ∘ τ, y* ∘ τ, v̂) has y* ∘ τ nondecreasing and v̂ = v ∘ τ satisfying (3.16)–(3.18), (3.26) is incompatible with the assumption that ŵ, ŷ, v̂ maximizes (3.15) subject to the constraints that ŷ be nondecreasing and that v̂ satisfy (3.16)–(3.18). Therefore, the inequality in (3.25) cannot be strict. It follows that w*(τ(x)) = ŵ(x) and, by (3.22), y*(τ(x)) = ŷ(x) for G-almost all x, as claimed in the proposition.    Q.E.D.

The argument is illustrated in Figure 2. If contracts are conditioned on x rather than t, the principal has room to offer a richer contract menu. In particular, if t is a mass point of the distribution F, the function ξ(·) is discontinuous at t, and the principal can assign different contracts to different pseudotypes x ∈ (ξ(t−), ξ(t)]. Thus, he might offer different contracts (ŵ(x), ŷ(x)) for x ∈ (ξ(t−), ξ(t)] so that ŵ(x) is uniformly distributed between w_1 and w_2 in Figure 2. However, such an arrangement cannot be optimal for him. Because all pseudotypes x ∈ (ξ(t−), ξ(t)] correspond to the same real type τ(x) = t, incentive compatibility requires that all the contracts (ŵ(x), ŷ(x)) for x ∈ (ξ(t−), ξ(t)] provide the agent of type t with the same utility. Thus, in Figure 2, the contract offers (ŵ(x), ŷ(x)) for x ∈ (ξ(t−), ξ(t)] all lie on the same indifference curve I(t) for type t. Strict quasiconcavity of u implies that the indifference curve I(t) is strictly convex. If the principal replaces the wage offers ŵ(x) for x ∈ (ξ(t−), ξ(t)] by their (conditional) expectation w* = (w_1 + w_2)/2, he can ask for an output y* that is strictly greater than the (conditional) expectation of ŷ(x), x ∈ (ξ(t−), ξ(t)]. By introducing heterogeneity into the contract offers to people with the same "real" type t, the principal can only harm himself.
FIGURE 2.—Multiple contracts for different pseudotypes with the same type.
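The averaging argument behind Figure 2 can be checked numerically. The sketch below assumes the quasilinear specification u(w, y, t) = w − (1/t)g(y) with g(y) = y² (an illustrative choice; footnote 14 mentions the quasilinear form, but the particular g and the numbers here are assumptions). Two contracts on the same indifference curve of type t are replaced by the average wage, and the output that keeps the agent on that indifference curve is computed.

```python
# Sketch of the Figure 2 argument: averaging wage offers along a strictly convex
# indifference curve lets the principal demand strictly more output on average.
import numpy as np

t = 2.0                                    # the type at the mass point (illustrative)
g = lambda y: y ** 2                       # strictly convex effort cost (illustrative)
g_inv = lambda z: np.sqrt(z)               # inverse of g on [0, infinity)

v = 1.0                                    # utility level of the indifference curve I(t)
y1, y2 = 1.0, 3.0                          # two outputs offered to pseudotypes in (xi(t-), xi(t)]
w1, w2 = v + g(y1) / t, v + g(y2) / t      # wages on the indifference curve: u(w_i, y_i, t) = v

w_star = 0.5 * (w1 + w2)                   # replace the wage lottery by its (conditional) expectation
y_star = g_inv(t * (w_star - v))           # largest output with u(w_star, y_star, t) = v

print(y_star, 0.5 * (y1 + y2))             # y_star > (y1 + y2)/2: averaging wages buys strictly more output
print(w_star - y_star < 0.5 * ((w1 - y1) + (w2 - y2)))   # the principal's net payment falls: True
```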
4. ANALYSIS OF THE PRINCIPAL'S MODIFIED PROBLEM

4.1. Preliminaries

The principal's modified problem has the same formal structure as the principal's problem itself when F has a density. However, the density g(·) in (3.15) is not, in general, continuous. Moreover, the functions û, û_w, and û_y are not generally continuously differentiable with respect to x. From (3.16), however, one easily verifies that û, like u, is twice continuously differentiable and strictly quasiconcave in w and y jointly, as well as increasing in w and decreasing in y. In particular, one has

(4.1)    û_w(w, y, x) = u_w(w, y, τ(x))

and

(4.2)    û_y(w, y, x) = u_y(w, y, τ(x))

for any w, y, and x. However, from (4.1) and (4.2), one obtains

(4.3)    (∂/∂x)[|û_y(w, y, x)| / û_w(w, y, x)] = τ′(x) · (∂/∂t)[|u_y(w, y, τ(x))| / u_w(w, y, τ(x))],
which has a discontinuity with respect to x whenever τ′ and the density g = 1 − τ′ have a discontinuity. By (2.2) and Lemma 3.1, (4.3) yields

(4.4)    (∂/∂x)[|û_y(w, y, x)| / û_w(w, y, x)] ≤ 0

for all w, y, and x. Thus, û inherits the weak single-crossing property from u.

4.2. Optimality Conditions

If the function ŷ(·) was known to be absolutely continuous, the principal's modified problem would be a standard control problem with v̂(·) and ŷ(·) as state variables, and with ŵ(·) and q(·) := ŷ′(·) as control variables, with the constraint that q(x) ≥ 0 for all x. The Hamiltonian function for this control problem would take the form

(4.5)    Ĥ(v̂, ŷ, ŵ, q, μ̂, ϕ̂_v, ϕ̂_q, x) = (ŷ − ŵ)g(x) + μ̂(û(ŵ, ŷ, x) − v̂) + ϕ̂_v û_x(ŵ, ŷ, x) + ϕ̂_q q.
However, this Hamiltonian is not, in general, continuous in x In particular, if τ(x) is a mass point of the type distribution F and x < x implies τ(x ) < τ(x) the density g(·) will exhibit a discontinuity at x and so will the Hamiltonian ˆ If such a discontinuity arises, there is no reason for y(·) ˆ H to be absolutely continuous. Even so, the principal’s modified problem can be handled by controltheoretic methods. In Hellwig (2008), I formulated a maximum principle for optimal control problems with monotonicity constraints on the controls. When applied to the principal’s modified problem, Theorem 3.1 in Hellwig (2008) yields the following theorem. ˆ ˆ ˆ solve the principal’s modiy(·) and v(·) THEOREM 4.1: If the functions w(·), fied problem, then there exists a measurable function μˆ from [x0 x1 ] into R+ and there exist absolutely continuous functions ϕˆ v and ϕˆ q from [x0 x1 ] into R such that the following statements hold: (a) For almost all x ∈ [x0 x1 ] ˆ ˆ ˆ ˆ ϕˆ v (x) = −Hˆ v v(x) (4.6) y(x) w(x) q(x) μ(x) ˆ ϕˆ v (x) ϕˆ q (x) x (b) Moreover, (4.7)
ϕˆ v (x0 ) ≤ 0
and (4.8)
ϕˆ v (x1 ) = 0
ˆ 0 ) = 0 ϕˆ v (x0 )v(x
(c) For almost all x ∈ [x0 x1 ] ˆ ˆ ˆ ˆ y(x) w(x) q(x) μ(x) ˆ ϕˆ v (x) ϕˆ q (x) x (4.9) ϕˆ q (x) = −Hˆ y v(x) (d) Moreover, (4.10)
ϕˆ q (x0 ) ≤ 0
ˆ 0 ) = 0 ϕˆ q (x0 ) · y(x
and (4.11)
ϕˆ q (x1 ) = 0
(e) For almost every x ∈ [x0 x1 ] (4.12)
ˆ ˆ ˆ ˆ y(x) x) + ϕˆ v (x)uˆ xw (w(x) y(x) x) = 0 −g(x) + μ(x) ˆ uˆ w (w(x)
(f) For almost every x ∈ [x0 x1 ] (4.13)
ϕˆ q (x) ≤ 0
ˆ is strictly increasing at x.17 Moreover, ϕˆ q (x) = 0 if y(·) PROOF: Reformulate the principal’s modified problem by substituting for ˆ ˆ ˆ w(x) = c(v(x) y(x) x) for all x where c(· · ·) is defined so that for any y and t c(· y t) is the inverse of the section u(· y t) of u that is determined by y and t The modified problem then satisfies the assumptions of Theorem 3.1 in Hellwig (2008). The associated Hamiltonian is (4.14)
ˆ y ˆ x) y ˆ x) + ϕˆ q q ˆ y ˆ x))g(x) + ϕˆ v (x)uˆ x (c(v H = (yˆ − c(v
Theorem 3.1 in Hellwig (2008) implies the existence of ϕˆ v , ϕˆ q and M such that, for almost all x ∈ [x0 x1 ] one has (4.15)
ϕˆ v (x) = −Hv
(4.16)
ϕˆ q (x) = −Hy
and, in addition, the transversality conditions (b) and (d) as well as statements (e) and (f) hold. I also introduce (4.17)
μ(x) ˆ = (g(x) − ϕˆ v (x)uˆ xw )
1 uˆ w
17 A real-valued, nondecreasing function f is said to be strictly increasing at t if f (t + ε) − f (t − ε) > 0 for all ε > 0
defined so that (4.12) holds. From (4.14)–(4.17), one obtains ∂c ∂c ˆ ˆ ˆ ˆ (v(x) y(x) x) − ϕˆ v (x)uˆ xw (v(x) y(x) x) ∂v ∂v 1 = (g(x) − ϕˆ v (x)uˆ xw ) = μ(x) ˆ uˆ w
ϕˆ v (x) =
and (4.18)
∂c ∂c ˆ ˆ ˆ ˆ y(x) x) g(x) − ϕˆ v (x)uˆ xy (v(x) y(x) x) ϕˆ (x) = − 1 − (v(x) ∂y ∂y uˆ y uˆ y =− 1+ g(x) + ϕˆ v (x)uˆ xy uˆ w uˆ w q
ˆ ˆ ˆ ˆ = −g(x) − μ(x) ˆ uˆ y (w(x) y(x) x) − ϕˆ v (x)uˆ xy (w(x) y(x) x) for almost all x ∈ [x0 x1 ] Statements (a) and (c) follow immediately. Q.E.D. ˆ ˆ ˆ provide a solution In the following, I will suppose that w(·), y(·) and v(·) to the principal’s modified problem, and that μ(·), ˆ ϕˆ v (·) and ϕˆ q (·) are the associated Lagrange multiplier and co-state variables. Given the Inada condition uy (w 0 t) = 0 for all w and t one does not have to worry about boundary solutions. ˆ LEMMA 4.2: Any solution to the principal’s modified problem satisfies w(x) > ˆ 0 and y(x) > 0 for all x ∈ (x0 x1 ]. ˆ > 0 for all x ∈ (x0 x1 ] Suppose that this claim PROOF: I first show that y(x) ˆ is nondecreasing, there exists x¯ ∈ (x0 x1 ] such that is false. Then, because y(·) ¯ and y(x) ˆ ¯ Because y(·) ˆ is strictly increasˆ > 0 for x > x y(x) = 0 for x ∈ [x0 x) ¯ statement (f) of Theorem 4.1 yields ing at x, (4.19)
¯ = 0 ϕˆ q (x)
ˆ ¯ also implies = 0 for x ∈ [x0 x) Because uy (w 0 t) = 0 for all w and t y(x) ˆ ˆ ˆ uˆ y (w(x) y(x) x) = uy (w(x) 0 τ(x)) = 0 and ˆ ˆ ˆ uˆ xy (w(x) y(x) x) = uyt (w(x) 0 τ(x))τ (x) = 0 ¯ By (4.18), therefore, for all x ∈ [x0 x) ψ (x) = −g(x)
¯ By integration, using (4.19), one obtains for all x ∈ (x0 x) (4.20)
x¯
¯ − ϕˆ q (0) = ϕˆ q (x)
¯ ϕˆ q (x ) dx = G(x)
0
¯ = F(τ(x)) ¯ > 0 Because x¯ > x0 by the definitions of G and x0 one has G(x) ˆ Therefore, (4.20) is incompatible with (4.13). The assumption that y(x) = 0 for somex ∈ (x0 x1 ] thus leads to a contradiction and must be false. ˆ Given that y(x) > 0 for x ∈ (x0 x1 ] the individual-rationality condition ˆ w(x) ˆ ˆ ˆ Q.E.D. u( y(x) x) ≥ 0 also yields w(x) > 0 for x ∈ (x0 x1 ] Using (4.6), one can rewrite (4.12) and (4.18) as (4.21)
    ϕ̂_v′(x) û_w + ϕ̂_v(x) û_xw = g(x)

and

(4.22)    ϕ̂_q′(x) = −g(x) − ϕ̂_v′(x) û_y − ϕ̂_v(x) û_xy,

where û_w, û_y, û_wx, and û_yx are all evaluated at (ŵ(x), ŷ(x), x). If one uses (4.21) to substitute for ϕ̂_v′(x) in (4.22), one obtains

(4.23)    ϕ̂_q′(x) = −(1 + û_y/û_w) g(x) − ϕ̂_v(x) [û_yx − (û_y/û_w) û_wx]

or, equivalently,

(4.24)    ϕ̂_q′(x) = −[(û_w + û_y)/û_w] g(x) + ϕ̂_v(x) û_w (∂/∂x)[|û_y|/û_w].
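The algebra leading from (4.21) and (4.22) to (4.24) is routine but easy to get wrong; the following symbolic check confirms it. The only ingredient supplied from outside is the calculus expansion (∂/∂x)[|û_y|/û_w] = (û_y û_xw − û_xy û_w)/û_w², which uses |û_y| = −û_y.

```python
# Symbolic check that substituting phi_v'(x) from (4.21) into (4.22) yields (4.24).
import sympy as sp

uw, uy, uxw, uxy, g, phi_v = sp.symbols('u_w u_y u_xw u_xy g phi_v')

phi_v_prime = (g - phi_v * uxw) / uw                 # from (4.21)
rhs_422 = -g - phi_v_prime * uy - phi_v * uxy        # right-hand side of (4.22)

# (4.24): -((u_w + u_y)/u_w) g + phi_v * u_w * d/dx(|u_y|/u_w),
# with d/dx(|u_y|/u_w) = (u_y*u_xw - u_xy*u_w)/u_w**2 since |u_y| = -u_y.
d_mrs_dx = (uy * uxw - uxy * uw) / uw ** 2
rhs_424 = -((uw + uy) / uw) * g + phi_v * uw * d_mrs_dx

print(sp.simplify(rhs_422 - rhs_424))                # 0: the two expressions coincide
```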
4.3. Analysis of the Optimality Conditions

Equation (4.24) is the key to assessing the efficiency properties of the contracts (ŵ(x), ŷ(x)), x ∈ (x_0, x_1]. A contract (ŵ(x), ŷ(x)) ∈ R^2_{++} is efficient for x if

(4.25)    û_w(ŵ(x), ŷ(x), x) + û_y(ŵ(x), ŷ(x), x) = 0,

distorted downward from efficiency for x if

(4.26)    û_w(ŵ(x), ŷ(x), x) + û_y(ŵ(x), ŷ(x), x) > 0,

and distorted upward from efficiency for x if

(4.27)    û_w(ŵ(x), ŷ(x), x) + û_y(ŵ(x), ŷ(x), x) < 0.
By (4.24), therefore, (ŵ(x), ŷ(x)) is efficient for x if

(4.28)    ϕ̂_q′(x) − ϕ̂_v(x) û_w (∂/∂x)[|û_y|/û_w] = 0,

distorted downward from efficiency for x if

(4.29)    ϕ̂_q′(x) − ϕ̂_v(x) û_w (∂/∂x)[|û_y|/û_w] < 0,

and distorted upward from efficiency for x if

(4.30)    ϕ̂_q′(x) − ϕ̂_v(x) û_w (∂/∂x)[|û_y|/û_w] > 0.
The function ϕ̂_v(·) is given as the solution to the differential equation (4.21) that satisfies the transversality condition ϕ̂_v(x_1) = 0. This solution is computed as

(4.31)    ϕ̂_v(x) = − ∫_x^{x_1} [g(x′)/û_w(ŵ(x′), ŷ(x′), x′)] · exp( ∫_x^{x′} [û_xw(ŵ(x″), ŷ(x″), x″)/û_w(ŵ(x″), ŷ(x″), x″)] dx″ ) dx′,

so that one obtains the following lemma.

LEMMA 4.3: The co-state variable ϕ̂_v(·) satisfies ϕ̂_v(x) < 0 for all x ∈ [x_0, x_1).

At this point, a standard argument along the lines of Mirrlees (1971, 1976) or Seade (1982) might go as follows: If there is no pooling of types, so that ϕ̂_q(x) = 0, and if the single-crossing condition (4.4) is strict, that is, if (∂/∂x)[|û_y|/û_w] < 0, then for x ∈ [x_0, x_1), Lemma 4.3 implies (4.29). Therefore, the contract (ŵ(x), ŷ(x)) is distorted downward from efficiency if x ∈ [x_0, x_1). However, the presumptions that ϕ̂_q(x) = 0 and that the single-crossing condition (4.4) is strict are both unjustified. Even if the original single-crossing condition (2.2) is strict, the inequality in (4.4) cannot be strict if τ(x) is a mass point of the distribution F(·). Moreover, if τ(x) is a mass point of the distribution F(·), it turns out that one must have ϕ̂_q(x) < 0.

Given the insufficiency of the traditional argument focussing on the sign of the co-state variable ϕ̂_v(·), the following argument focusses on the co-state variable ϕ̂_q(·) that corresponds to the monotonicity constraint. From (4.24), one finds that

(4.32)    ϕ̂_q(x̂) − ϕ̂_q(x) = ∫_x^{x̂} [ ϕ̂_v(x′) û_w (∂/∂x)[|û_y|/û_w] − ((û_w + û_y)/û_w) g(x′) ] dx′

for any x and x̂.
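The sign restriction in Lemma 4.3 is particularly transparent in the quasilinear case of footnote 14, where û_w ≡ 1 and û_xw ≡ 0: formula (4.31) then collapses to ϕ̂_v(x) = −(1 − G(x)), so that −ϕ̂_v(ξ(t)) = 1 − F(t) at any continuity point of F, the familiar weight of the types above t. The sketch below checks this numerically for an arbitrary smooth F (uniform on [0, 1], an illustrative choice).

```python
# Numerical check of (4.31) in the quasilinear case u(w, y, t) = w - g(y)/t,
# where u_w = 1 and u_xw = 0, so the exponential factor equals 1 and
# phi_v(x) = -(1 - G(x)).  Illustrative F: uniform on [0, 1], so that
# xi(t) = 2t, tau(x) = x/2, g(x) = 1/2, and G(x) = x/2 on [0, 2].
import numpy as np

xs = np.linspace(0.0, 2.0, 2001)
g = np.full_like(xs, 0.5)                      # pseudotype density
G = 0.5 * xs                                   # pseudotype distribution

# (4.31) with u_w = 1, u_xw = 0: phi_v(x) = -integral from x to x1 of g(x') dx'
tail = np.cumsum(g[::-1]) * (xs[1] - xs[0])
phi_v = -tail[::-1]

print(np.max(np.abs(phi_v - (-(1.0 - G)))))    # ~0 up to the grid step: phi_v(x) = -(1 - G(x))
# Along the real type axis: -phi_v(xi(t)) = 1 - F(t), the weight of the types above t.
t = 0.3
print(-phi_v[np.searchsorted(xs, 2 * t)], 1.0 - t)
```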
The following lemma relates the behavior of ϕ̂_q(·) to the efficiency properties of the contracts (ŵ(x), ŷ(x)).

LEMMA 4.4: For any x ∈ [x_0, x_1], ϕ̂_q(x) ≠ 0 implies (ŵ(x), ŷ(x)) ≠
w∗ (τ(x)) y ∗ (τ(x)) ; that is, if the monotonicity constraint is binding, there must be a downward distortion from efficiency. ˆ 0 ) > 0 this follows from the PROOF: I first show that ϕˆ q (x0 ) = 0 If y(x ˆ 0 ) = 0 Lemma 4.2 implies that the functransversality condition (4.10). If y(x ˆ is strictly increasing at x0 ; in this case, ϕˆ q (x0 ) = 0 follows from statetion y(·) ment (f) in Theorem 4.1. ˆ = 0 for some xˆ so that (w( ˆ x) ˆ y( ˆ x)) ˆ is not distorted Suppose that ϕˆ q (x) downward from efficiency. Since ϕˆ q (x0 ) = 0 and ϕˆ q (·) is continuous, there exˆ such that ϕˆ q (x) ¯ = 0 and ϕˆ q (x) = 0 for all x ∈ (x ¯ x] ˆ By stateists x¯ ∈ [x0 x) ˆ ˆ x) ˆ for all ment (f) in Theorem 4.1, one actually has ϕˆ q (x) < 0 and y(x) = y( ¯ x] ˆ By (3.17), one also has w(x) ˆ ˆ x) ˆ for all x ∈ (x ¯ x] ˆ that is, all x ∈ (x = w( ¯ x] ˆ must get the same contract. types t with ξ(t) ∈ (x ¯ x] ˆ the By the single-crossing condition (4.4), it follows that for any x ∈ (x ˆ ) y(x ˆ )) = (w( ˆ x) ˆ y( ˆ x)) ˆ is not distorted downward from efficontract (w(x ciency for x. By (4.25)–(4.27), it follows that ˆ ) y(x ˆ ) x ) + uˆ y (w(x ˆ ) y(x ˆ ) x ) ≤ 0 uˆ w (w(x ¯ x] ˆ From (4.32), one therefore obtains for all x ∈ (x (4.33)
ˆ − ϕˆ q (x) ¯ ≥− ϕˆ q (x)
xˆ
−ϕˆ v (x)uˆ w
x
∂ |uˆ y | dx ∂x uˆ w
ˆ − ϕˆ q (x) ¯ ≥ 0 which is inBy (4.4) and Lemma 4.3, (4.33) in turn yields ϕˆ q (x) ˆ < 0 and ϕˆ q (x) ¯ = 0 The assumption compatible with the assumption that ϕˆ q (x) ˆ = 0 for some xˆ so that (w( ˆ x) ˆ y( ˆ x)) ˆ is not distorted downward from that ϕˆ q (x) efficiency has thus led to a contradiction and must be false. Q.E.D. ˆ ˆ Lemma 4.4 implies that for any x for which (w(x) y(x)) is not downward ˆ ≤ 0, distorted, the term ϕˆ q (x) in (4.32) vanishes. Because, by (4.13), ϕˆ q (x) it follows that for any such x and any xˆ > x the left-hand side of (4.32) is nonpositive, and one must have (4.34) x
xˆ
ϕˆ v (x )uˆ w
∂ |uˆ y | dx ≤ ∂x uˆ w
xˆ x
uˆ w + uˆ y g(x ) dx uˆ w
Since uˆ w > 0 and uˆ y ≤ 0 this is equivalent to the requirement that (4.35) x
xˆ
ϕˆ v (x )uˆ w
∂ |uˆ y | dx ≤ ∂x uˆ w
xˆ
1−
x
|uˆ y | g(x ) dx uˆ w
By Lemma 4.3 and the single-crossing condition (4.4), the integrand on the left-hand side is everywhere nonnegative. What about the integrand on the ˆ ˆ right-hand side? If (w(x) y(x)) is not downward distorted, then at x the integrand on the right-hand side is zero or negative. For x > x the integrand |uˆ | on the right-hand side of (4.35) may change, first, because uˆ wy depends on x ˆ ) y(x ˆ )) depends on x The directly and, second, because the contract (w(x ˆ and the quasiconcavity of u following lemma exploits the monotonicity of y(·) and uˆ to provide a bound on the change that depends only on the direct effect of x on the marginal rate of substitution. LEMMA 4.5: For any x ∈ [x0 x1 ) and any x ∈ (x x1 ] (4.36)
ˆ ) y(x ˆ ) x )| |uˆ y (w(x) ˆ ˆ |uˆ y (w(x y(x) x)| ≥ + ˆ ) y(x ˆ ) x ) ˆ ˆ y(x) x) uˆ w (w(x uˆ w (w(x)
x
x
∂ |uˆ y | dx ∂x uˆ w
ˆ ˆ ) > y(x) Moreover, the inequality is strict if y(x ˆ ˆ PROOF: Incentive compatibility implies that the functions y(·) and w(·) are co-monotonic. Indeed, by (3.16) and the incentive-compatibility condiˆ ) + (uˆ y /uˆ w ) d y(x ˆ ) = 0 for almost all x ∈ tion (3.17), one must have d w(x [x0 x1 ]. By standard calculus, one therefore has (4.37)
ˆ ) y(x ˆ ) x )| |uˆ y (w(x) ˆ ˆ |uˆ y (w(x y(x) x)| − − ˆ ) y(x ˆ ) x ) ˆ ˆ y(x) x) uˆ w (w(x uˆ w (w(x) x uˆ y ∂ |uˆ y | ∂ |uˆ y | ˆ ) − + d y(x = ∂w uˆ w ∂y uˆ w uˆ w x
x
x
∂ |uˆ y | dx ∂x uˆ w
for all x ∈ [x0 x1 ) and all x ∈ (x x1 ] The right-hand side of (4.37) is computed as x 1 ˆ ) − 3 [uˆ 2y uˆ ww − uˆ y uˆ w (uˆ wy + uˆ yw ) + uˆ 2w uˆ yy ] d y(x (4.38) uˆ w x ˆ is nondecreasBecause u and uˆ are quasiconcave in w and y, and because y(·) ˆ ) > y(x) ˆ ing, expression (4.38) is nonnegative. Moreover, if y(x then, because the quasiconcavity of u and uˆ in w and y is strict, expression (4.38) is strictly positive. The left-hand side of (4.37) is, therefore, nonnegative. It is positive if ˆ ˆ ) > y(x) Q.E.D. y(x
Upon combining (4.35) and (4.36), one finds that, for any x for which ˆ ˆ (w(x) y(x)) is not downward distorted, one must have xˆ xˆ ˆ ˆ y(x) x)| |uˆ y (w(x) ∂ |uˆ y | (4.39) ϕˆ v (x )uˆ w dx ≤ 1 − g(x ) dx ˆ ˆ ∂x uˆ w y(x) x) uˆ w (w(x) x x xˆ x ∂ |uˆ y | dx g(x ) dx − ˆ ∂x u w x x for all xˆ > x Upon combining (4.4) with (4.39) and Lemma 4.3, one obtains the next lemma. ˆ ˆ LEMMA 4.6: None of the contracts (w(x) y(x)) for x ∈ [x0 x1 ) is distorted upward from efficiency, that is, all of these contracts satisfy ˆ ˆ ˆ ˆ uˆ w (w(x) y(x) x) + uˆ y (w(x) y(x) x) ≥ 0 PROOF: If the lemma is false, then, by (4.27), one has (4.40)
ˆ ˆ |uˆ y (w(x) y(x) x)| >1 ˆ ˆ y(x) x) uˆ w (w(x)
for some x ∈ [x0 x1 ) Moreover, Lemmas 4.4 and 4.5 imply that (4.39) must hold for all xˆ ∈ (x x1 ] However, (4.40) implies that for xˆ sufficiently close to x the right-hand side of (4.39) is negative. By (4.4) and Lemma 4.3, the left-hand side of (4.39) is nonnegative. The assumption that the lemma is false thus leads to a contradiction. Q.E.D. ˆ ˆ By contrast, the possibility that (w(x) y(x)) might be efficient for x cannot be entirely ruled out. The following lemma and its corollary show that if this ˆ ) y(x ˆ )) must also be is the case, then for any x ∈ [x x1 ] the contract (w(x efficient for x Moreover, one must have pooling of all types between τ(x) and t1 , the top of the type set. ˆ ˆ LEMMA 4.7: If, for some x ∈ [x0 x1 ) the contract (w(x) y(x)) is efficient ˆ x) ˆ y( ˆ x)) ˆ is efficient for x ˆ for x, then, for every xˆ ∈ [x x1 ] the contract (w( PROOF: I will prove that (4.41)
ˆ ˆ |uˆ y (w(x) y(x) x)| =1 ˆ ˆ y(x) x) uˆ w (w(x)
implies (4.42)
ˆ x) ˆ y( ˆ x) ˆ x)| ˆ |uˆ y (w( =1 ˆ x) ˆ y( ˆ x) ˆ x) ˆ uˆ w (w(
for all xˆ ∈ [x x1 ] If the claim is false, there exist x ∈ [x0 x1 ) and xˆ ∈ [x x1 ] ˆ ˆ ˆ x) ˆ y( ˆ x)) ˆ is distorted downward from such that (w(x) y(x)) is efficient and (w( efficiency. Let x¯ ≥ x be the infimum of the set of xˆ ∈ [x x1 ) for which (4.42) fails to hold. I first show that (4.43)
ˆ x) ¯ y( ˆ x) ¯ x) ¯ + uˆ y (w( ˆ x) ¯ y( ˆ x) ¯ x) ¯ = 0 uˆ w (w(
If x¯ = x the claim is trivial. If x¯ > x the definition of x¯ implies that (4.42) ¯ By Lemma 4.5, it follows that holds for all x ∈ [x x) x¯ ˆ x) ¯ y( ˆ x) ¯ x)| ¯ |uˆ y (w( ∂ |uˆ y | ≥1+ dx ˆ x) ¯ y( ˆ x) ¯ x) ¯ ˆw uˆ w (w( x ∂x u ¯ Upon taking limits as x converges to x¯ from below one for all x ∈ [x x) ˆ x) ¯ y( ˆ x) ¯ x| ¯ ≥ uˆ w (w( ˆ x) ¯ y( ˆ x) ¯ x) ¯ By Lemma 4.6, one also has obtains |uˆ y (w( ˆ x) ¯ y( ˆ x) ¯ x| ¯ ≤ uˆ w (w( ˆ x) ¯ y( ˆ x) ¯ x) ¯ (4.43) follows immediately. |uˆ y (w( ˆ x) ¯ y( ˆ x)) ¯ satisfies (4.43), Lemmas 4.4 and 4.5 imply Because the contract (w( ¯ x1 ]; moreover, by (4.43), the that (4.39) must hold for x = x¯ and any xˆ ∈ (x first term on the right-hand side of (4.39) is zero. Thus, one must have xˆ xˆ x ∂ |uˆ y | ∂ |uˆ y | (4.44) ϕˆ v (x )uˆ w dx ≤ − dx g(x ) dx ˆ ∂x ∂x u uˆ w w x¯ x¯ x¯ for all xˆ ∈ (x x1 ] By (3.8) and (4.4), the right-hand side of (4.44) is no greater than xˆ xˆ ∂ |uˆ y | dx dx − ˆw x¯ x¯ ∂x u Therefore, (4.44) implies that xˆ xˆ ∂ |uˆ y | ∂ |uˆ y | ¯ ϕˆ v (x )uˆ w dx ≤ −(xˆ − x) dx ˆw ∂x uˆ w x¯ x¯ ∂x u or, equivalently, that xˆ ∂ |uˆ y | (4.45) [ϕˆ v (x )uˆ w + xˆ − x] dx ≤ 0 ∂x uˆ w x¯ ¯ x1 ] By Lemma 4.3, there exists A > 0 such that ϕˆ v (x )uˆ w ≤ −A for all xˆ ∈ (x ¯ If xˆ − x¯ < A the integrand in (4.45) is everywhere if x is sufficiently close to x nonnegative; moreover, it is strictly positive if (∂/∂x)|uˆ y |/uˆ w < 0. For (4.45) ¯ x] ˆ By to hold, one must therefore have (∂/∂x)|uˆ y |/uˆ w = 0 for all x ∈ [x Lemma 4.5, it follows that (4.46)
ˆ ) y(x ˆ ) x )| |uˆ y (w( ˆ x) ¯ y( ˆ x) ¯ x)| ¯ |uˆ y (w(x ≥ ˆ ) y(x ˆ ) x ) ˆ x) ¯ y( ˆ x) ¯ x) ¯ uˆ w (w(x uˆ w (w(
¯ x] ˆ By (4.43), this implies |uˆ y (w(x ˆ ) y(x ˆ ) x )| ≥ uˆ w (w(x ˆ ) yˆ for all x ∈ [x ˆ ) y(x ˆ ) x )| ≤ uˆ w (w(x ˆ ) y(x ˆ ) (x ) x ) By Lemma 4.6, one also has |uˆ y (w(x ¯ x] ˆ contrary to the assumption that x ) Therefore, (4.42) holds for all x ∈ [x ˆ x) ˆ y( ˆ x)) ˆ is distorted downward from efficiency The assumption that one (w( ˆ ˆ y(x)) is efficient and can have x ∈ [x0 x1 ) and xˆ ∈ [x x1 ] such that (w(x) ˆ x) ˆ y( ˆ x)) ˆ is distorted downward from efficiency has thus led to a contradic(w( tion. Q.E.D. ˆ ˆ y(x)) is efficient LEMMA 4.8: If for some x ∈ [x0 x1 ) the contract (w(x) for x, then for every xˆ ∈ [x x1 ] (4.47)
ˆ x) ˆ y( ˆ x)) ˆ = (w(x) ˆ ˆ (w( y(x))
(4.48)
ˆ ˆ ∂ |uˆ y (w(x) y(x) x )| ˆ = 0 (x) ˆ ˆ ∂x uˆ w (w(x) y(x) x )
ˆ ˆ y(x)) is efficient for x, PROOF: If for some x ∈ [x0 x1 ) the contract (w(x) ˆ = 0 for all xˆ ∈ [x x1 ] Hence, then by Lemmas 4.4 and 4.7, one must have ϕˆ q (x) ˆ = 0 for all xˆ ∈ [x x1 ] By the optimality condition (4.24), it follows also, ϕˆ q (x) that ˆ uˆ w ϕˆ v (x)
∂ |uˆ y | =0 ∂x uˆ w
for all xˆ ∈ [x x1 ] By Lemma 4.3, therefore, (4.49)
ˆ x) ˆ y( ˆ x) ˆ x )| ∂ |uˆ y (w( ˆ =0 (x) ˆ x) ˆ y( ˆ x) ˆ x ) ∂x uˆ w (w(
for all xˆ ∈ [x x1 ) By Lemma 4.5, it follows that (4.50)
ˆ x) ˆ y( ˆ x) ˆ x)| ˆ ˆ ˆ |uˆ y (w(x) y(x) x)| |uˆ y (w( ≥ ˆ x) ˆ y( ˆ x) ˆ x) ˆ ˆ ˆ y(x) x) uˆ w (w( uˆ w (w(x)
ˆ x) ˆ > y(x) ˆ for all xˆ ∈ [x x1 ] and the inequality is strict if y( Because Lemma 4.7 implies that the two sides of (4.50) are both equal to 1, it follows ˆ x) ˆ ≯ y(x) ˆ ˆ x) ˆ = y(x) ˆ ˆ x) ˆ = that y( Hence y( and, by incentive compatibility, w( ˆ w(x) This establishes (4.47). (4.48) follows from (4.47) and (4.49). Q.E.D. As a last step on the way toward proving Theorem 2.1, I show that even if ˆ ˆ y(x)) is efficient, there is there is no x ∈ [x0 x1 ) for which the contract (w(x) still no distortion at the top. LEMMA 4.9: For any sequence {xk } that converges to x1 from below, one has
ˆ k ) y(x ˆ k ) xk ) + uˆ y (w(x ˆ k ) y(x ˆ k ) xk ) = 0 lim uˆ w (w(x (4.51) k→∞
PROOF: Let {xk } be any sequence that converges to x1 from below. Without loss of generality, one may assume that the sequence is nondecreasing. The ˆ k ) y(x ˆ k ))} is then also nondecreasing. Because this contract sequence {(w(x ˆ 1 )) it must have a limit (w ¯ y) ¯ ˆ 1 ) y(x sequence is bounded by (w(x Recall that, from the transversality condition (4.11) and the optimality condition (4.13), one has ϕˆ q (x1 ) = 0 and ϕˆ q (x) ≤ 0 For any k therefore, condition (4.32) yields (4.52)
1 1 − G(xk )
x1
xk
∂ |uˆ y | 1 ϕˆ v (x)uˆ w dx ≥ ∂x uˆ w 1 − G(xk )
x1
xk
uˆ w + uˆ y g(x) dx uˆ w
I claim that as k → ∞ the left-hand side of this inequality converges to zero. To establish this claim, I first observe that, by (4.4) and (4.31), the lefthand side of (4.52) is nonnegative. Also, by (4.4) and (4.31), the left-hand side of (4.52) can be written in the form
x1 x1
∂ |uˆ y | 1
dx (4.53) h(x x )g(x ) dx uˆ w 1 − G(xk ) xk x ∂x uˆ w where, for any x and x (4.54)
h(x x ) :=
x ˆ ) y(x ˆ ) x ) 1 uˆ xw (w(x exp dx ˆ ) y(x ˆ ) x ) ˆ ) y(x ˆ ) x ) uˆ w (w(x uˆ w (w(x x
For x ∈ [xk x1 ] one has 1 − G(xk ) ≥ 1 − G(x) Expression (4.53) is therefore bounded above by x1
x1 h(x x ) dG(x )
∂ |uˆ y |
x
(4.55) dx uˆ w 1 − G(x) ∂x uˆ w xk which converges to zero as k → ∞ and xk converges to x1 from below. Given that the left-hand side of (4.52) is bounded between zero and (4.53), it must also converge to zero as k → ∞ By Lemma 4.6, the right-hand side of (4.52) is nonnegative for all k Because the left-hand side of (4.52) converges to zero, it follows that the right-hand side also converges to zero as k → ∞ and xk converges to x1 from below. Therefore, one must have (4.56)
¯ y ¯ x1 ) + uˆ y (w ¯ y ¯ x1 ) uˆ w (w = 0 ¯ y ¯ x1 ) uˆ w (w
The lemma follows immediately.
Q.E.D.
4.4. Proofs of Theorems 2.1 and 2.3 PROOF OF THEOREM 2.1: Given the optimal contract menu (w(·) y(·)) let ¯ ¯ (w(·) y(·)) be the associated strongly equivalent contract menu that is given by Lemmas 2.8 and 2.9, and let v(·) be the associated indirect utility function. ˆ ˆ ˆ Furthermore, let w(·) y(·) and v(·) be given by (3.2), (3.3), and (3.19). By Proposition 3.2, these functions solve the principal’s modified problem. ˆ ˆ If, for all x ∈ (x0 x1 ] the contract (w(x) y(x)) is efficient for x, set tˆ = t0 Statement (a.1) of the theorem is then trivially true, and statements (a.2) and (b) are moot. Suppose, therefore, that the set of x ∈ [x0 x1 ] for which the contract ˆ ˆ (w(x) y(x)) is distorted downward from efficiency is nonempty, let xˆ > x0 ˆ Lemma 4.7 implies that, for be the supremum of this set, and let tˆ = τ(x). ˆ the contract (w(x) ˆ ˆ y(x)) is distorted downward from efficiency. x ∈ [x0 x) ¯ ¯ By (3.19), it follows that, for t ∈ [t0 tˆ) the contract (w(·) y(·)) is distorted downward from efficiency. This confirms statement (b) of the theorem. Statement (a.1) of the theorem follows from Lemma 4.8; statement follows (a.2) from Lemma 4.9. Q.E.D. ¯ ¯ y(·)), tˆ, and t ∈ [t0 tˆ) be as specified PROOF OF THEOREM 2.3: Let (w(·) ˆ ˆ in the theorem, and let w(·) y(·) be given by (3.2), (3.3), and (3.19). Let x = ξ(t) and x = supt
x that is sufficiently close to x Therefore, there exists ˆ )) = (w(x) ˆ ˆ ¯ ¯ ¯ ˆ ) y(x y(x)) = (w(t) y(t)) for all x ∈ (x x) x¯ > x such that (w(x ˆ )) = ¯ Because x¯ > x one must have t¯ > t Since (w(x ˆ ) y(x Set t¯ = τ(x) ¯ one must ¯ ¯ (w(t) y(t)) is distorted downward from efficiency for all x ∈ (x x) also have t¯ ≤ tˆ Q.E.D. 4.5. Proofs of Propositions 2.4, 2.5, and 2.6 PROOF OF PROPOSITION 2.4: Let t and t be as specified in the proposition. ¯ ¯ By Theorem 2.3, there exists t¯ ∈ (t tˆ) such that the functions w(·) y(·) are constant on (t t¯) If t = t0 there is nothing to prove. Suppose, therefore, that
¯ ¯ ˆ ˆ t > t0 and that the contract menu (w(·) y(·)) is continuous at t. Let (w(·) y(·)) be the associated solution to the principal’s modified problem, and let x := ξ(t) ¯ ¯ ˆ ˆ and x¯ := ξ(t¯) The constancy of (w(·) y(·)) on (t t¯) implies that (w(·) y(·)) is ¯ The continuity of (w(·) ¯ ¯ ˆ ˆ y(·)) at t implies that (w(·) y(·)) is constant on (x x) ˆ y) ˆ is the common value of (w(·) ˆ ˆ ¯ then, y(·)) on (x x) continuous at x. If (w ˆ ˆ ˆ y) ˆ for x < x. y(x))
(w by the definitions of x and t, one has (w(x) By statement (f) in Theorem 4.1, it follows that ϕˆ q (x) = 0 One must also have ϕˆ q (xk ) = 0 for all k along some sequence {xk } that converges to x from below. Suppose that ϕˆ q (x) < 0 for all x below x and sufficiently close to x. Then ˆ ˆ (w(·) y(·)) must be constant on an interval which has x as its supremum. Given ˆ ˆ ˆ y) ˆ for x < x ˆ this would contradict the continuity of that (w(x) y(x))
(w ˆ ˆ ˆ (w(·) y(·)) at x Given that ϕˆ q (xk ) = 0 for all k along some sequence {xk } that converges to x from below, one must also have ϕˆ q (χk ) ≤ 0 for all k along some sequence {χk } that converges to x from below. By (4.24) and the definition (3.11) of the ˆ one then has function u uky ∂ |uky | k (4.57) − 1 + k g(χk ) + ϕˆ v (χk )ukw · τ (χ ) ≤ 0 uw ∂t ukw ˆ k ) y(χ ˆ k ) τ(χk )) Withfor all k where uky ukw are all evaluated at (w(χ out loss of generality, one may suppose that for any k at τ(χk ) the distribution function F has a derivative. The value of this derivative is fa (τ(χk )) By Lemma 3.1, one then has τ (χk ) =
1 1 + fa (τ(χk ))
and
g(χk ) =
fa (τ(χk )) 1 + fa (τ(χk ))
for all k Thus, (4.57) implies that (4.58)
uky ϕˆ v (χk )ukw ∂ |uky | ≤ 1+ k fa (τ(χk )) ∂t ukw uw
for all k. Upon taking limits as k goes out of bounds, using the presumed conˆ ˆ tinuity of (w(·) y(·)) at x, one infers that ˆ y ˆ τ(x)) ∂ |uy | ˆ y ˆ τ(x)) ϕˆ v (x)uw (w uy (w ˆ y ˆ τ(x)) ≤ 1 + (4.59) (w ˆ y ˆ τ(x)) ∂t uw uw (w f¯a where f¯a := limk→∞ fa (τ(χk )); this limit exists because fa (·) is a nondecreasing function. Without loss of generality, one may also suppose that x¯ is the supremum ˆ ˆ ˆ y) ˆ Thus, (w(·) ˆ ˆ of the set on which (w(·) y(·)) takes the value (w y(·)) is ¯ = 0 By statement (f) in Theorem 4.1, ¯ Therefore, ϕˆ q (x) strictly increasing at x
¯ Therefore, there exists (a nonnull set of) it follows that ϕˆ q is maximal at x ζ < x¯ close to x¯ such that ϕˆ q (ζ) ≥ 0 Because t < tˆ Theorem 2.1 implies that ˆ y ˆ τ(x)) + uy (w ˆ y ˆ τ(x)) > 0 By the single-crossing condition, it follows uw (w ˆ y ˆ τ(ζ)) + uy (w ˆ y ˆ τ(ζ)) > 0 for ζ close to x ¯ By (4.24), therefore, that uw (w ϕˆ q (ζ) ≥ 0 implies τ (ζ) > 0 so that, at τ(ζ), the type distribution F again has a density, with the value fa (τ(ζ)) Upon using (3.11), (4.24), and Lemma 3.1, as before, one infers that ˆ y ˆ τ(ζ)) ˆ y ˆ τ(ζ)) ∂ |uy | uy (w ϕˆ v (ζ)uw (w ˆ y ˆ τ(ζ)) ≥ 1 + (4.60) (w ˆ y ˆ τ(ζ)) fa (τ(ζ)) ∂t uw uw (w ˆ ˆ ˆ y) ˆ for all x ∈ (x ζ), one also By (4.21) and the fact that (w(x) y(x)) = (w ˆ y ˆ τ(ζ)) > ϕˆ v (x)uw (w ˆ y ˆ τ(x)) Since τ(ζ) > τ(χk ) for all k has ϕˆ v (ζ)uw (w and fa is a nondecreasing function, one also has fa (τ(ζ)) ≥ fa (τ(χk )) for all k and, therefore, fa (τ(ζ)) ≥ f¯a Because, by Lemma 4.3, ϕˆ v (ζ) < 0 and ϕˆ v (x) < 0 it follows that (4.61)
ˆ y ˆ τ(ζ)) ϕˆ v (x)uw (w ˆ y ˆ τ(x)) ϕˆ v (ζ)uw (w > fa (τ(ζ)) f¯a
By the log-convexity of |uy (w y t)|/(uw (w y t)) in t one also has (4.62)
ˆ y ˆ τ(ζ)) ∂ |uy | uw (w ˆ y ˆ τ(ζ)) (w ˆ y ˆ τ(ζ))| ∂t uw |uy (w ≥
ˆ y ˆ τ(x)) ∂ |uy | uw (w ˆ y ˆ τ(x)) (w ˆ y ˆ τ(x))| ∂t uw |uy (w
Upon combining (4.59)–(4.62), using the fact that by Lemma 4.3 and the singlecrossing condition, ϕˆ v and (∂/∂t)|uy |/uw are negative, one obtains ˆ y ˆ τ(ζ)) ˆ y ˆ τ(ζ)) uy (w uw (w 1+ ˆ y ˆ τ(ζ))| ˆ y ˆ τ(ζ)) |uy (w uw (w ≤
ˆ y ˆ τ(ζ)) uw (w ˆ y ˆ τ(ζ)) ∂ |uy | ϕˆ v (ζ)uw (w ˆ y ˆ τ(ζ)) (w ˆ y ˆ τ(ζ))| ∂t uw fa (τ(ζ)) |uy (w
ˆ y ˆ τ(x)) uw (w ˆ y ˆ τ(x)) ∂ |uy | ϕˆ v (x)uw (w ˆ y ˆ τ(x)) (w ˆ y ˆ τ(x))| ∂t uw |uy (w f¯a ˆ y ˆ τ(x)) ˆ y ˆ τ(x)) uy (w uw (w 1+ ≤ ˆ y ˆ τ(x))| ˆ y ˆ τ(x)) |uy (w uw (w <
Thus, (4.63)
ˆ y ˆ τ(ζ)) ˆ y ˆ τ(x)) uw (w uw (w −1< − 1 ˆ y ˆ τ(ζ))| ˆ y ˆ τ(x))| |uy (w |uy (w
which is incompatible with the single-crossing condition. The assumption that tˆ > t0 and that the contract menu (w(·) y(·)) is continuous at tˆ has thus led to a contradiction and must be false. This completes the proof of Proposition 2.4. Q.E.D. PROOF OF PROPOSITION 2.5—Sketch: The argument is similar to the one in the proof of Proposition 2.4: If there were a pooling interval [t t¯] the co-state variable ϕˆ q (·) would have to be nonincreasing just above x = ξ(t) and nondecreasing just below x¯ = ξ(t¯) If the type distribution has no singular component, one has τ (x) = 1/(1 + f (τ(x))) for all x By the same rea¯ soning as before, therefore, ϕˆ q (x+) ≤ 0 and ϕˆ q (x−) ≥ 0 would yield (4.59) and (4.60), with f¯a and fa (τ(ζ)) replaced by f (t) = f (τ(x)) and f (t¯) = ¯ f (τ(x)) With strict positivity and monotonicity of f , and with log-convexity of |uy (w y t)|/(uw (w y t)) in t one then again obtains (4.63). Because this is incompatible with the single-crossing condition, the assumption that there is a pooling interval must be false. Q.E.D. PROOF OF PROPOSITION 2.6: Proceeding indirectly, suppose that the proposition is false. Then there is an optimal contract menu (w(·) y(·)) that exhibits ˆ ˆ a discontinuity at some t¯ ∈ [t0 t1 ] If (w(·) y(·)) is the associated solution to ˆ ˆ the principal’s modified problem, then (w(·) y(·)) exhibits a discontinuity at ˆ ˆ ¯ This implies ϕˆ q (x) ¯ = 0 x¯ = ξ(t¯) Then (w(·) y(·)) is strictly increasing at x ¯ Therefore, By statement (f) in Theorem 4.1, it follows that ϕˆ q is maximal at x there exist sequences {ζk } converging to x¯ from below and {χ } converging to x¯ from above such that ϕˆ q (ζk ) ≥ 0 and ϕˆ q (χ ) ≤ 0 for all k and By (4.24), one then has uky ∂ |uky | g(ζk ) (4.64) ϕˆ v (ζk )ukw · τ (ζ ) ≥ 1 + k ∂t ukw ukw and (4.65)
uy ∂ |uy | · τ (χ ) ≤ 1 + g(χ ) ϕˆ v (χ )u ∂t uw uw
w
ˆ k ) y(ζ ˆ k ) τ(ζk )) and for all k and where uky ukw are evaluated at (w(ζ ˆ ) y(χ ˆ ) τ(χ )). uy uw are evaluated at (w(χ Because the type distribution has a continuous density, Lemma 3.1 yields τ (x) =
1 1 + f (τ(x))
and
g(x) =
f (τ(x)) 1 + f (τ(x))
for all x Conditions (4.64) and (4.65) can therefore be rewritten as uky ϕˆ v (ζk ) k ∂ |uky | u (4.66) ≥ 1+ k f (τ(ζk )) w ∂t ukw uw
and (4.67)
uy ϕˆ v (χ ) ∂ |uy | u ≤ 1+ f (τ(χ )) w ∂t uw uw
Upon taking limits as k and become large, using the continuity of f and τ and the monotonicity of the contract menu, one obtains (4.68)
¯ ϕˆ v (x) ∂ |uy | ˆ x−) ¯ ˆ x−) ¯ ¯ ˆ x−) ¯ ˆ x−) ¯ ¯ uw (w( y( τ(x)) (w( y( τ(x)) ¯ f (τ(x)) ∂t uw ˆ x−) ¯ ˆ x−) ¯ ¯ y( τ(x)) uy (w( ≥ 1+ ˆ x−) ¯ ˆ x−) ¯ ¯ uw (w( y( τ(x))
and (4.69)
¯ ∂ |uy | ϕˆ v (x) ˆ x+) ¯ ˆ x+) ¯ ¯ ˆ x−) ¯ ˆ x−) ¯ ¯ uw (w( y( τ(x)) (w( y( τ(x)) ¯ f (τ(x)) ∂t uw ˆ x+) ¯ ˆ x+) ¯ ¯ y( τ(x)) uy (w( ≤ 1+ ˆ x+) ¯ ˆ x+) ¯ ¯ uw (w( y( τ(x))
ˆ x+) ¯ ˆ x+)) ¯ ˆ x−) ¯ Because u is strictly quasiconcave in w and y (w( y( (w( ˆ x−)) ¯ y( implies that the right-hand side of (4.68) is strictly greater than the ¯ ¯ (τ(x))) < 0 and |uy | = −uy it right-hand side of (4.69). Because (ϕˆ v (x))/(f follows that ∂ uy ˆ x−) ¯ ˆ x−) ¯ ¯ (w( y( τ(x)) ∂t uw ∂ uy ˆ x+) ¯ ˆ x+) ¯ ¯ ˆ x−) ¯ ˆ x−) ¯ ¯ y( τ(x)) (w( y( τ(x)) > uw (w( ∂t uw
ˆ x−) ¯ ˆ x−) ¯ ¯ uw (w( y( τ(x))
or, equivalently, (4.70)
uyt (−) −
uy (−) uy (+) uwt (−) > uyt (+) − uwt (+) uw (−) uw (+)
ˆ x−) ¯ ˆ x−) ¯ where − and + in the arguments stand in for evaluation at (w( y( ¯ and (w( ˆ x+) ¯ ˆ x+) ¯ ¯ However, since (IC) implies τ(x)) y( τ(x)) ˆ x−) ¯ ˆ x−) ¯ ¯ = u(w( ˆ x+) ¯ ˆ x+) ¯ ¯ u(w( y( τ(x)) y( τ(x)) one easily verifies that (4.70) is incompatible with the convexity of the function ¯ y τ(x)). ¯ The assumption that the proposition is false has y → ut (c(v y τ(x)) thus led to a contradiction. Q.E.D.
REFERENCES
BRETT, C., AND J. A. WEYMARK (2003): “Financing Education Using Optimal Redistributive Taxation,” Journal of Public Economics, 87, 2549–2569. [1208] CLARKE, F. H. (1976): “The Maximum Principle Under Minimal Hypotheses,” SIAM Journal of Control and Optimization, 14, 1078–1091. [1204] (1983): Optimization and Nonsmooth Analysis. New York: Wiley. [1204] CRÉMER, J., F. KHALIL, AND J.-C. ROCHET (1998a): “Strategic Information Gathering Before a Contract Is Offered,” Journal of Economic Theory, 81, 163–200. [1205] (1998b): “Contracts and Productive Information Gathering,” Games and Economic Behavior, 25, 174–193. [1205] DIAMOND, P. A. (1998): “Optimal Income Taxation: An Example With a U-Shaped Pattern of Optimal Marginal Income Tax Rates,” American Economic Review, 88, 83–95. [1209] FUDENBERG, D., AND J. TIROLE (1991): Game Theory. Cambridge, MA: MIT Press. [1206] GUESNERIE, R., AND J.-J. LAFFONT (1984): “A Complete Solution to a Class of Principal–Agent Problems With an Application to the Control of a Self-Managed Firm,” Journal of Public Economics, 25, 329–369. [1204,1211] HELLWIG, M. F. (2004): “Risk Aversion in the Small and in the Large When Outcomes Are Multidimensional,” Discussion Paper 04-06, Max Planck Institute for Research on Collective Goods, Bonn, Germany. Available at http://www.coll.mpg.de/pdf_dat/2004_06online.pdf. [1212] (2004/2008): “Optimal Income Taxation, Public-Goods Provision and Public-Sector Pricing: A Contribution to the Foundations of Public Economics,” 2008 revision of Preprint 14/2004, Max Planck Institute for Research on Collective Goods, Bonn, Germany. Available at http://www.coll.mpg.de/pdf_dat/2004_14online.pdf. [1202] (2007a): “A Contribution to the Theory of Optimal Utilitarian Income Taxation,” Journal of Public Economics, 91, 1449–1477. Available at http://www.coll.mpg.de/pdf_dat/2007_ 02online.pdf. [1201,1202] (2007b): “The Undesirability of Randomized Income Taxation Under Decreasing Risk Aversion,” Journal of Public Economics, 91, 791–816. [1212] (2008): “A Maximum Principle for Control Problems With Monotonicity Constraints,” Preprint 04/2008, Max Planck Institute for Research on Collective Goods, Bonn, Germany. Available at http://www.coll.mpg.de/pdf_dat/2008_04online.pdf. [1204,1220,1221] (2010): “Supplement to ‘Incentive Problems With Unidimensional Hidden Characteristics: A Unified Approach’,” Econometrica Supplemental Material, 78, http://www. econometricsociety.org/ecta/Supmat/7726_Proofs.pdf. [1212] HILDENBRAND, W. (1974): Core and Equilibria of a Large Economy. Princeton, NJ: Princeton University Press. [1217] LAFFONT, J.-J., AND D. MARTIMORT (2001): The Theory of Incentives: The Principal–Agent Model. Princeton, NJ: Princeton University Press. [1206] MAS-COLELL, A., M. D. WHINSTON, AND J. R. GREEN (1995): Microeconomic Theory. Oxford: Oxford University Press. [1206] MIRRLEES, J. M. (1971): “An Exploration in the Theory of Optimum Income Taxation,” Review of Economic Studies, 38, 175–208. [1201,1209,1224] (1976): “Optimal Tax Theory: A Synthesis,” Journal of Public Economics, 6, 327–358. [1201,1212,1213,1224] (1986): “The Theory of Optimal Taxation,” in Handbook of Mathematical Economics, Vol. 3, ed. by K. J. Arrow and M. D. Intriligator. Amsterdam: North-Holland, Chapter 24, 1198–1249. [1201,1212] ROBERTS, K. (2000): “A Reconsideration of the Optimal Income Tax,” in Incentives, Organization and Public Economics: Papers in Honour of Sir James Mirrlees. Oxford: Oxford University Press, 171–189. [1202] SAEZ, E. 
(2001): “Using Elasticities to Derive Optimal Income Tax Rates,” Review of Economic Studies, 68, 205–229. [1202]
SEADE, J. (1977): “On the Shape of Optimal Tax Schedules,” Journal of Public Economics, 7, 203–236. [1201] (1982): “On the Sign of the Optimum Marginal Income Tax,” Review of Economic Studies, 49, 637–643. [1201,1224] SZALAY, D. (2005): “The Economics of Extreme Options and Clear Advice,” Review of Economic Studies, 72, 1173–1198. [1204] WERNING, I. (2007): “Pareto Efficient Income Taxation,” Working Paper, MIT. Available at http: //econ-www.mit.edu/files/1281. [1210]
Max Planck Institute for Research on Collective Goods, Kurt-SchumacherStrasse 10, D-53113 Bonn, Germany; [email protected]. Manuscript received February, 2008; final revision received April, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1239–1283
INEQUALITY AND UNEMPLOYMENT IN A GLOBAL ECONOMY BY ELHANAN HELPMAN, OLEG ITSKHOKI, AND STEPHEN REDDING1 This paper develops a new framework for examining the determinants of wage distributions that emphasizes within-industry reallocation, labor market frictions, and differences in workforce composition across firms. More productive firms pay higher wages and exporting increases the wage paid by a firm with a given productivity. The opening of trade enhances wage inequality and can either raise or reduce unemployment. While wage inequality is higher in a trade equilibrium than in autarky, gradual trade liberalization first increases and later decreases inequality. KEYWORDS: Wage inequality, international trade, risk, unemployment.
1. INTRODUCTION TWO CORE ISSUES IN INTERNATIONAL TRADE are the allocation of resources across economic activities and the distribution of incomes across factors of production. In this paper, we develop a new framework for examining the determinants of resource allocation and income distribution in which both wage inequality and unemployment respond to trade. Our framework encompasses a number of important features of product and labor markets, as a result of which it generates predictions that match features of the data. This framework is rich, flexible, and tractable, as we demonstrate by deriving a number of interesting results on trade, inequality, and unemployment. In addition, we show how this framework can be extended in various ways and how it can accommodate different general equilibrium structures. Moreover, our framework fits squarely into the new view of foreign trade that emphasizes firm heterogeneity in differentiated product sectors. We introduce standard Diamond–Mortensen–Pissarides search and matching frictions into a Melitz (2003) model, but unlike previous work in this area, such as Helpman and Itskhoki (2010), we also introduce ex post match-specific heterogeneity in a worker’s ability. Because a worker’s ability is not directly observable by his employer, firms screen workers to improve the composition of their employees. Complementarities between workers’ abilities and firm productivity imply that firms have an incentive to screen workers to exclude those 1 This paper is a combined version of Helpman, Itskhoki, and Redding (2008a, 2008b). Work on these papers started when Redding was a Visiting Professor at Harvard University. We thank the National Science Foundation for financial support. Redding thanks the Centre for Economic Performance at the London School of Economics and the Yale School of Management for financial support. We are grateful to a co-editor, four anonymous referees, Pol Antràs, Matilde Bombardini, Arnaud Costinot, Gilles Duranton, Gene Grossman, James Harrigan, Larry Katz, Marc Melitz, Guy Michaels, Steve Pischke, Esteban Rossi-Hansberg, Peter Schott, Dan Trefler, and conference and seminar participants at AEA, Berkeley, CEPR, Chicago, Columbia, Harvard, LSE, NBER, NYU, Northwestern, Penn State, Princeton, Stanford, Stockholm, Tel Aviv, UCLA, and Yale for helpful comments. The usual disclaimer applies.
with lower abilities. As larger firms have higher returns to screening and the screening technology is the same for all firms, more productive firms screen more intensively and have workforces of higher average ability than less productive firms. Search frictions induce multilateral bargaining between a firm and its workers, and since higher-ability workforces are more costly to replace, more productive firms pay higher wages. When the economy is opened to trade, the selection of more productive firms into exporting increases their revenue relative to less productive firms, which further enhances their incentive to screen workers to exclude those of lower ability. As a result, exporters have workforces of higher average ability than nonexporters and hence pay higher wages. This mechanism generates a wage-size premium and implies that exporting increases the wage paid by a firm with a given productivity. Both features of the model have important implications for wage inequality within sectors and within groups of workers with the same ex ante characteristics.

Our first main result is that the opening of a closed economy to trade raises wage inequality. The intuition for this result is that larger firms pay higher wages and the opening of trade increases the dispersion of firm revenues, which in turn increases the dispersion of firm wages. This result is more general than our model in the sense that it holds in a wider class of models in which firm wages are increasing in firm revenue and there is selection into export markets. We provide a proof that the opening of trade raises wage inequality for any inequality measure that respects second-order stochastic dominance, and this result holds for a class of models satisfying the following three sufficient conditions: firm wages and employment are power functions of firm productivity, exporting increases the wage paid by a firm with a given productivity, and firm productivity is Pareto distributed.2,3

2 While the assumption that firm productivity is Pareto distributed is strong, this assumption is standard in the literature on firm heterogeneity and trade, provides a reasonable approximation for the observed distribution of firm sizes (e.g., Axtell (2001)), and provides a reasonable approximation for the upper tail of the observed distribution of worker wages (e.g., Saez (2001)).

3 A number of recent studies examine similar issues in models of fair wages (e.g., Egger and Kreickemeier (2009a, 2009b) and Amiti and Davis (2008)). In these models the wage distribution depends on the formulation of the fair wage hypothesis and this formulation differs across studies: wages are assumed to rise with either a firm's productivity, its revenue, or its profits. For example, Egger and Kreickemeier (2009a) assumed that the fair wage is a power function of a firm's productivity. In other words, more productive firms pay higher wages by assumption. This implies that the relative wage of two firms with different productivity levels is the same, independently of whether one or both or neither of them exports. The mechanism in our model is quite different and our inequality results hold for all standard measures of inequality.

Our second main result is that once the economy is open to trade, the relationship between wage inequality and trade openness is at first increasing and later decreasing. As a result, a given change in trade frictions can either raise or reduce wage inequality, depending on the initial level of trade openness. The intuition for this result stems from the increase in firm wages that occurs
at the productivity threshold above which firms export, which is only present when some but not all firms export. When no firm exports, a small reduction in trade costs increases wage inequality, because it induces some firms to export and raises the wages paid by these exporting firms relative to domestic firms. When all firms export, a small rise in trade costs increases wage inequality, because it induces some firms to cease exporting and reduces the wages paid by these domestic firms relative to exporting firms. Another key prediction of our framework is that these two results hold regardless of general equilibrium effects. To demonstrate this, we derive these results from comparisons across firms that hold in sectoral equilibrium irrespective of how the sector is embedded in general equilibrium. It follows that our results for sectoral wage inequality do not depend on the impact of trade on aggregate variables and variables in other sectors. We use our framework to derive closed-form expressions for the sectoral wage distribution. This distribution depends on an extensive margin of trade openness (the fraction of exporting firms) and an intensive margin of trade openness (relative revenue in the export and domestic markets). We characterize the relationship between these extensive and intensive margins of trade openness and the exogenous parameters of the model such as fixed and variable trade costs. Since workers are ex ante homogeneous, wage inequality in our model is within-group inequality. Our theoretical results are therefore consistent with empirical findings of increased within-group wage inequality following trade liberalization (see, for example, Attanasio, Goldberg, and Pavcnik (2004) and Menezes-Filho, Muendler, and Ramey (2008)). As these theoretical results hold for asymmetric countries, they are also consistent with empirical findings of increased wage inequality following trade liberalization in both developed and developing countries (see, for example, the survey by Goldberg and Pavcnik (2007)). While our focus is within-group inequality, we also develop an extension in which there are multiple types of workers with different observable ex ante characteristics. We show that our results for the impact of trade on withingroup wage inequality hold in this more general framework; in particular, trade raises wage inequality within every group of workers. While betweengroup inequality of wages can rise or fall, the rise in within-group inequality can dominate when between-group inequality falls, so that overall inequality rises. In our framework, the opening of trade results in endogenous changes in workforce composition and measured firm productivity. The increase in revenue at exporters induces them to screen workers more intensively, while the decrease in revenue at nonexporters causes them to screen workers less intensively. It follows that more productive firms experience larger increases in average worker ability and wages following the opening of trade, which strengthens the correlation between firm productivity and average worker ability and
echoes the empirical findings of Verhoogen (2008). To the extent that empirical measures of productivity do not completely control for differences in workforce composition, these endogenous changes in workforce composition also result in endogenous changes in measured firm productivity. As more productive firms experience the largest increases in average worker ability following the opening of trade, they also exhibit the largest increases in measured firm productivity.

Another distinctive feature of our framework is the interaction between wage inequality and unemployment. The unemployment rate depends on the fraction of workers searching for employment that are matched (the tightness of the labor market) and the fraction of these matched workers that are hired (the hiring rate). While the more intensive screening of more productive firms implies that they pay higher wages, it also implies that they hire a smaller fraction of the workers with whom they are matched. As a result, the reallocation of resources toward more productive firms that occurs following the opening of trade reduces the hiring rate and increases the unemployment rate. In contrast, the tightness of the labor market can either remain constant following the opening of trade (as in Helpman and Itskhoki (2010)) or can rise (as in Felbermayr, Prat, and Schmerer (2008) and Felbermayr, Larch, and Lechthaler (2009)) depending on what happens to expected worker income, which in turn depends on how the model is closed in general equilibrium. Therefore, the net effect on the unemployment rate of opening a closed economy to trade is ambiguous. In contrast to the unambiguous results for wage inequality, our analysis suggests that unemployment can either rise or fall following the opening of trade, which is consistent with the lack of a clear consensus on the empirical relationship between trade and unemployment (see, for example, the discussion in Davidson and Matusz (2009)).

Worker ability admits two possible interpretations within our framework. One interpretation is that ability is match-specific and independently distributed across matches. Another interpretation is that ability is a general talent of a worker that does not depend on his match, but is unobservable to both workers and firms. In the static model that we develop here, the analysis is the same irrespective of which interpretation is taken. In both cases, workers do not know their ability and have no incentive to direct their search across firms or sectors. However, our preferred interpretation is that ability is match-specific, because this interpretation makes our static model consistent with the steady state of a dynamic search and matching model, since the screening of a worker for one match reveals no information about their ability for other potential matches.4

4 This type of dynamic analysis can be done along the lines of Helpman and Itskhoki (2009), whose model has, however, no worker heterogeneity and no screening.

Under the alternative interpretation that ability is a general talent of workers, wage inequality in the model has a worker component as well as a firm component. Since more productive firms screen more intensively
and pay higher wages, only more able workers receive the higher wages paid by more productive firms.5

5 Embedding this alternative interpretation in a dynamic framework would be more complicated, because screening for one match reveals information about a worker's productivity for other potential matches. As a result, the ex post distribution of worker ability among the unemployed would no longer be equal to the ex ante distribution of worker ability. Additionally, as workers gradually learn about their ability, more able workers would have an incentive to direct their search toward more productive firms.

Our paper is related to a large literature on trade, wage inequality, and unemployment. One broad area of research has explored the relationship between trade and wage inequality in models with neoclassical labor markets. Yeaple (2005) and Bustos (2009) developed models of monopolistic competition in which firms make endogenous choices about production technology and observable skill composition. Ohnsorge and Trefler (2007) and Costinot and Vogel (2009) examined the relationship between trade and wage inequality in competitive assignment models. Burstein and Vogel (2009) developed a model in which both comparative advantage and skill-biased technology play a role in determining the relationship between trade and wage inequality.

Another broad area of research has examined the implications of labor market frictions for the impact of trade on unemployment and wage inequality. One strand of this literature has considered models of efficiency or fair wages. Davis and Harrigan (2007) developed a model of firm heterogeneity and efficiency wages in which wages vary across firms because of differences in monitoring technology, and equilibrium unemployment exists to induce workers to supply effort. Egger and Kreickemeier (2009a, 2009b) and Amiti and Davis (2008) developed models of firm heterogeneity and fair wages in which the fair wage at which workers supply effort is assumed to vary with either firm productivity, revenue, or profits.

6 Seminal models of search and matching include Mortensen (1970), Pissarides (1974), Diamond (1982a, 1982b), and Mortensen and Pissarides (1994). One line of research follows Burdett and Mortensen (1998) in analyzing wage dispersion in models of wage posting and random search. Another line of research examines wage dispersion when both firms and workers are heterogeneous, including models of pure random search, such as Acemoglu (1999) and Shimer and Smith (2000), and models incorporating on-the-job search, such as Postel-Vinay and Robin (2002), Cahuc, Postel-Vinay, and Robin (2006), and Lentz (2010).

Yet another strand of this literature has considered models of search and matching, which provide natural microfoundations for labor market frictions.6 In important research, Davidson, Martin, and Matusz (1988, 1999) showed that the introduction of search and matching frictions into competitive models of international trade has predictable implications for the relationship between relative goods and factor prices. Using models of firm heterogeneity with search and matching frictions, Felbermayr, Prat, and Schmerer (2008), Felbermayr, Larch, and Lechthaler (2009), and Helpman and Itskhoki (2010) examined the relationship between trade and unemployment. None of these
papers, however, features wage dispersion across firms, because more productive firms expand on the extensive margin of matched workers until the bargained wage rate equals the replacement cost of a worker. Our main point of departure from these studies is the introduction of ex post heterogeneity in worker ability and screening of workers by firms, which generates wage dispersion across firms that is influenced by both search frictions and trade liberalization.

Our modelling of labor market frictions is also related to the one-period search models of Acemoglu (1999) and Acemoglu and Shimer (1999), in which firms make irreversible investments in capacity or technology before being matched one-to-one with workers. Davidson, Matusz, and Shevchenko (2008) examined the impact of international trade in a model of this form, where firms choose either a high or low technology and can be matched with either a high- or low-skill manager. In equilibria where high-skill managers are willing to accept jobs at low-technology firms and only high-technology firms export, the model features an exporter wage premium and trade liberalization increases the wage gap between high- and low-skill managers. One key difference between our approach and these models is that we allow for an endogenous measure of matched workers for each firm rather than assuming one-to-one matching between firms and workers. Modelling endogenous variation in firm size enables our framework to speak in a meaningful way to empirically observed correlations between productivity, employment, and wages, which is important for the within-industry reallocations induced by trade liberalization. Once this endogenous variation in firm size is introduced, differences in workforce composition play a central role in generating differences in wages across firms, as discussed above.

The remainder of the paper is structured as follows. Section 2 outlines the model and its sectoral equilibrium. Section 3 presents our results on sectoral wage inequality. Section 4 presents our results on sectoral unemployment, and Section 5 extends our analysis to incorporate observable ex ante heterogeneity between multiple types of workers. Section 6 examines alternative ways of closing the model in general equilibrium. Section 7 concludes. The online Technical Appendix (Helpman, Itskhoki, and Redding (2010)) contains technical details, including proofs of propositions and other results.

2. SECTORAL EQUILIBRIUM

The key predictions of our model relate to the distribution of wages and employment across firms and workers within a sector. In this section, we derive these distributions from comparisons across firms that hold in sectoral equilibrium for any value of a worker's expected income outside the sector, that is, his outside option. An important implication of this result is that the model's predictions for sectoral equilibrium hold regardless of general equilibrium effects. Throughout this section, all prices, revenues, and costs are measured in
terms of a numeraire, where the choice of this numeraire is specified when we embed the sector in general equilibrium.

2.1. Model Setup

We consider a world of two countries, home and foreign, where foreign variables are denoted by an asterisk. In each country there is a continuum of workers who are ex ante identical. Initially, we assume workers are risk neutral, but we consider risk aversion in Section 6.

Demand within the sector is defined over the consumption of a continuum of horizontally differentiated varieties and takes the constant elasticity of substitution (CES) form. The real consumption index for the sector (Q) is therefore defined as

Q = [ ∫_{j∈J} q(j)^β dj ]^{1/β},   0 < β < 1,

where j indexes varieties, J is the set of varieties within the sector, q(j) denotes consumption of variety j, and β controls the elasticity of substitution between varieties. To simplify notation, we suppress the sector subscript except where important, and while we display expressions for home, analogous relationships hold for foreign. The price index dual to Q is denoted by P and depends on the prices p(j) of individual varieties j. Given this specification of sectoral demand, the equilibrium revenue of a firm is

(1)   r(j) = p(j) q(j) = A q(j)^β,

where A is a demand shifter for the sector.7 Each firm takes the demand shifter as given when making its decisions, because it supplies one of a continuum of varieties within the sector and is, therefore, of measure zero relative to the sector as a whole. In this section we show how A is determined in sectoral equilibrium, and we show in Section 6 how it is related to prices and expenditures in general equilibrium.

7 As is well known, the demand function for a variety j can be expressed as q(j) = A^{1/(1−β)} p(j)^{−1/(1−β)}, where A = E^{1−β} P^β, and E is total expenditure on varieties within the sector, while P is the sector's ideal price index.

The product market is modelled in the same way as in Melitz (2003). There is a competitive fringe of potential firms that can choose to enter the differentiated sector by paying an entry cost of fe > 0. Once a firm incurs the sunk entry cost, it observes its productivity θ, which is independently distributed and drawn from a Pareto distribution Gθ(θ) = 1 − (θmin/θ)^z for θ ≥ θmin > 0 and z > 1. The Pareto distribution is not only tractable, but together with our other assumptions implies a Pareto firm-size distribution, which provides a reasonable approximation to observed data (see Axtell (2001)). Since in equilibrium
all firms with the same productivity behave symmetrically, we index firms by θ from now onward. Once firms observe their productivity, they decide whether to exit, produce solely for the domestic market, or produce for both the domestic and export markets. Production involves a fixed cost of fd > 0 units of the numeraire. Similarly, exporting involves a fixed cost of fx > 0 units of the numeraire and an iceberg variable trade cost, such that τ > 1 units of a variety must be exported for one unit to arrive in the foreign market.

Output of each variety (y) depends on the productivity of the firm (θ), the measure of workers hired (h), and the average ability of these workers (ā):

(2)   y = θ h^γ ā,   0 < γ < 1.

This production technology can be interpreted as capturing either human capital complementarities (e.g., production in teams where the productivity of a worker depends on the average productivity of her team) or a managerial time constraint (e.g., a manager with a fixed amount of time who needs to allocate some time to each worker). In the Technical Appendix, we derive the production technology under each of these interpretations. A key feature of the production technology is complementarities in worker ability, where the productivity of a worker is increasing in the abilities of other workers employed by the firm.8

8 The existence of these production complementarities is the subject of a long line of research in economics, including Lucas (1978), Rosen (1982), and Garicano (2000). For empirical evidence, see, for example, Moretti (2004).

Worker ability is assumed to be independently distributed and drawn from a Pareto distribution, Ga(a) = 1 − (amin/a)^k for a ≥ amin > 0 and k > 1. Under our preferred interpretation, worker ability is match-specific and hence a worker's ability draw for a given match conveys no information about ability draws for other potential matches. The labor market is characterized by search and matching frictions which are modelled following the standard Diamond–Mortensen–Pissarides approach. A firm that pays a search cost of bn units of the numeraire can randomly match with a measure n of workers, where the search cost b is endogenously determined by the tightness of the labor market as discussed below.

9 For example, Altonji and Pierret (2001) found that as employers learn about worker productivity, the wage equation coefficients on easily observed characteristics, such as education, fall relative to the coefficients on hard-to-observe correlates of worker productivity.

Consistent with a large empirical literature in labor economics, we assume that worker ability cannot be costlessly observed when firms and workers are matched.9 Instead, we assume that firms can undertake costly investments in worker screening to obtain an imprecise signal of worker ability, which is in line with a recent empirical literature on firm screening and other recruitment
policies.10

10 For empirical evidence on the resources devoted by firms to the screening of job applicants, see, for example, Barron, Bishop, and Dunkelberg (1985), Barron, Black, and Loewenstein (1987), Pellizzari (2005), and Autor and Scarborough (2008).

To capture the idea of an imprecise signal in as tractable a way as possible, we assume that by paying a screening cost of c ac^δ/δ units of the numeraire, where c > 0 and δ > 0, a firm can identify workers with an ability below ac.11 Screening costs are increasing in the ability threshold ac chosen by the firm, because more complex and costlier tests are required for higher ability cutoffs.12

11 In this formulation, there is a fixed cost of screening, even when the screening is not informative, that is, when ac = amin. We focus on interior equilibria in which firms of all productivities choose screening tests that are informative, ac > amin, and so the fixed cost of screening is always incurred. As we show below, this is the case when the screening cost, c, is sufficiently small.

12 All results generalize immediately to the case where the screening costs are separable in ac and n, and linear in n, so that we can allow screening costs to rise with the measure of matched workers.

This specification of worker screening is influenced by empirical evidence that more productive firms not only employ more workers, but also screen more intensively, have workforces of higher average ability, and pay higher wages. Each of these features emerges naturally from our specification of production and screening, as demonstrated below, because production complementarities imply a greater return to screening for more productive firms and the screening technology is the same for all firms. Our formulation also ensures that the multilateral bargaining game between firms and workers over the surplus from production remains tractable. As the only information revealed by screening is which workers have match-specific abilities above and below ac, neither the firm nor the workers know the match-specific abilities of individual workers, and hence bargaining occurs under conditions of symmetric information.

The key feature of our analysis is not the precise formulation of screening, which is chosen partly for tractability, but the variation in workforce composition across firms after screening. Since more productive firms have workforces of higher average ability after screening, they pay higher wages as the outcome of the bargaining game. Other formulations in which more productive firms have workforces of higher average ability after screening would also generate the prediction that more productive firms pay higher bargained wages.

2.2. Firm's Problem

The complementarities between workers' abilities in the production technology provide the incentive for firms to screen workers. By screening and not employing workers with abilities less than ac, a firm reduces output (and hence revenue and profits) by decreasing the measure of workers hired (h), but raises output by increasing average worker ability (ā). Since there are diminishing returns to the measure of workers hired (0 < γ < 1), output can be increased by
screening as long as there is sufficient dispersion in worker ability (sufficiently low k).13

13 Since production complementarities provide the incentive for firms to screen, the marginal product of workers with abilities below ac is negative for a firm with screening threshold ac, as shown in the Technical Appendix. Note that in this production technology, the marginal product of a worker depends not only on his ability, but also on the ability of his co-workers. Therefore, a worker with a given ability can have a positive or negative marginal product, depending on the ability of his co-workers. While worker screening is a key feature of firms' recruitment policies, and production complementarities provide a tractable explanation for it, other explanations are also possible, such as fixed costs of maintaining an employment relationship (e.g., in terms of office space or other scarce resources).

With a Pareto distribution of worker ability, a firm that chooses a screening threshold ac hires a measure h = n(amin/ac)^k of workers with average ability ā = k ac/(k − 1). Therefore, the production technology can be rewritten as

(3)   y = κy θ n^γ ac^{1−γk},   κy ≡ [k/(k − 1)] amin^{γk},

where we require 0 < γk < 1 for a firm to have an incentive to screen.14

14 In contrast, when γ > 1/k, no firm screens and the model reduces to the model of Helpman and Itskhoki (2010), which has no screening or ex post worker heterogeneity. We do not discuss this case here. While for simplicity, we assume a unit exponent on average ability in the production technology (2), a more general specification is y = θ h^γ ā^ξ, in which case the condition for firms to screen is 0 < γ < ξ/k.

Given consumer love of variety and a fixed production cost, no firm will ever serve the export market without also serving the domestic market. If a firm exports, it allocates its output (y(θ)) between the domestic and export markets (yd(θ) and yx(θ), respectively) to equate its marginal revenues in the two markets, which from (1) implies [yx(θ)/yd(θ)]^{1−β} = τ^{−β}(A∗/A). Therefore, a firm's total revenue can be expressed as

(4)   r(θ) ≡ rd(θ) + rx(θ) = Υ(θ)^{1−β} A y(θ)^β,

where rd(θ) ≡ A yd(θ)^β is revenue from domestic sales and rx(θ) ≡ A∗[yx(θ)/τ]^β is revenue from exporting. The variable Υ(θ) captures a firm's "market access," which depends on whether it chooses to serve both the domestic and foreign markets or only the domestic market:

(5)   Υ(θ) ≡ 1 + Ix(θ) τ^{−β/(1−β)} (A∗/A)^{1/(1−β)},

where Ix(θ) is an indicator variable that equals 1 if the firm exports and 0 otherwise.15

15 Note that [yx(θ)/yd(θ)]^{1−β} = τ^{−β}(A∗/A) and yd(θ) + yx(θ) = y(θ) imply yd(θ) = y(θ)/Υ(θ) and yx(θ) = y(θ)[Υ(θ) − 1]/Υ(θ), and hence rd(θ) = r(θ)/Υ(θ) and rx(θ) = r(θ)[Υ(θ) − 1]/Υ(θ).
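As a purely illustrative aside (not part of the original analysis), the following Python sketch evaluates the composition formulas behind equations (2) and (3): with Pareto-distributed ability, a cutoff ac leaves h = n(amin/ac)^k hired workers with average ability ā = k ac/(k − 1), and output rises with the cutoff as long as γk < 1. The parameter values and the helper name firm_output are assumptions chosen only for illustration.

```python
# Illustrative parameter values (assumptions, not calibrated estimates):
# Pareto ability distribution G_a(a) = 1 - (a_min / a)^k for a >= a_min,
# with k > 1 and gamma * k < 1 so that screening raises output.
k, a_min = 1.5, 1.0        # ability dispersion and lower bound
gamma, theta = 0.5, 2.0    # diminishing returns and firm productivity
n = 100.0                  # measure of workers matched (sampled)

def firm_output(a_c):
    """Output y = theta * h^gamma * a_bar for a screening cutoff a_c >= a_min."""
    h = n * (a_min / a_c) ** k       # measure of hired workers (ability >= a_c)
    a_bar = k * a_c / (k - 1.0)      # average ability of a Pareto truncated at a_c
    return theta * h ** gamma * a_bar, h, a_bar

for a_c in (1.0, 1.5, 2.0):
    y, h, a_bar = firm_output(a_c)
    print(f"a_c={a_c:.1f}  hired h={h:7.2f}  avg ability={a_bar:.2f}  output y={y:8.2f}")
```

Raising the cutoff shrinks employment but raises average ability, and with γk < 1 the net effect on output is positive, which is the screening incentive described in the text.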
After having observed its productivity, a firm chooses whether or not to produce, whether or not to export, the measure of workers to sample, and the screening ability threshold (and hence the measure of workers to hire). Once these decisions have been made, the firm and its hired workers engage in strategic bargaining with equal weights over the division of revenue from production in the manner proposed by Stole and Zwiebel (1996a, 1996b). The only information known by firms and workers at the bargaining stage is that each hired worker has an ability greater than ac. Therefore, the expected ability of each worker is ā = [k/(k − 1)] ac and each worker is treated as if they have an ability of ā. Combining (2) and (4), firm revenue can be written as r = Υ(θ)^{1−β} A (θā)^β h^{βγ}, which is continuous, increasing, and concave in h. As the fixed production, and fixed exporting, search, and screening costs have all been sunk before the bargaining stage, all other arguments of firm revenue are fixed. Furthermore, the outside option of hired workers is unemployment, whose value we normalize to 0. As a result, the solution to the bargaining game is that the firm receives the fraction 1/(1 + βγ) of revenue (4), while each worker receives the fraction βγ/(1 + βγ) of average revenue per worker.16

16 See Acemoglu, Antràs, and Helpman (2007) and the Technical Appendix for the derivation of the solution to the bargaining game. Stole–Zwiebel bargaining is a natural generalization of Nash bargaining to the multiple workers case: the firm bargains bilaterally with every worker, but unlike in Nash bargaining, it internalizes the effect of a worker's departure on the wages of the remaining workers. As a result, the equilibrium wage as a function of employment is the solution to the differential equation ∂(r − wh)/∂h = w, which equalizes the marginal surplus of the firm and the surplus of the worker from employment.

Anticipating the outcome of the bargaining game, the firm maximizes its profits. Combining (3), (4), and (5), this profit maximization problem can be written as

(6)   π(θ) ≡ max_{n≥0, ac≥amin, Ix∈{0,1}}  { [1/(1 + βγ)] [1 + Ix τ^{−β/(1−β)} (A∗/A)^{1/(1−β)}]^{1−β} A (κy θ n^γ ac^{1−γk})^β − bn − (c/δ) ac^δ − fd − Ix fx }.

The firm's decision whether or not to produce and whether or not to export takes a standard form. The presence of a fixed production cost implies that there is a zero-profit cutoff for productivity, θd, such that a firm drawing a productivity below θd exits without producing. Similarly, the presence of a fixed exporting cost implies that there is an exporting cutoff for productivity, θx, such that a firm drawing a productivity below θx does not find it profitable to serve the export market. Given that a large empirical literature finds evidence of
selection into export markets, where only the most productive firms export, we focus on values of trade costs for which θx > θd > θmin.17

17 For empirical evidence of selection into export markets, see, for example, Bernard and Jensen (1995) and Roberts and Tybout (1997).

The firm market access variable is, therefore, determined as

(7)   Υ(θ) = 1 for θ < θx,   Υ(θ) = Υx for θ ≥ θx,   where Υx ≡ 1 + τ^{−β/(1−β)} (A∗/A)^{1/(1−β)} > 1.

The firm's first-order conditions for the measure of workers sampled (n) and the screening ability threshold (ac) are

[βγ/(1 + βγ)] r(θ) = b n(θ),
[β(1 − γk)/(1 + βγ)] r(θ) = c ac(θ)^δ.

These conditions imply that firms with larger revenue sample more workers and screen to a higher ability threshold. While the measure of workers hired, h = n(amin/ac)^k, is increasing in the measure of workers sampled, n, it is decreasing in the screening ability threshold, ac. Under the assumption δ > k, firms with larger revenue not only sample more workers, but also hire more workers.

Finally, from the division of revenue in the bargaining game, the total wage bill is a constant share of revenue, which implies that firm wages are monotonically increasing in the screening ability cutoff:

w(θ) = [βγ/(1 + βγ)] r(θ)/h(θ) = b n(θ)/h(θ) = b [ac(θ)/amin]^k.

By adjusting employment, firms are able to push their bargained wage down to the replacement cost of a worker. As larger firms have workers of higher average ability, which are more costly to replace, they pay higher wages. Thus, firms with larger revenue have higher screening ability cutoffs and pay higher wages, but the expected wage conditional on being sampled is the same across all firms,

w(θ) h(θ)/n(θ) = b,
which implies that workers have no incentive to direct their search.18 Combining the measure of workers hired, h = n(amin/ac)^k, with the first-order conditions above yields the following relationship between firm wages and the measure of workers hired:

ln w(θ) = constant + [k/(δ − k)] ln h(θ).
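A minimal numerical sketch of the two first-order conditions and the bargained wage, under assumed parameter values, confirms the wage–employment elasticity k/(δ − k) stated above; the function name firm_choices and the chosen numbers are ours, not the paper's.

```python
import numpy as np

# Assumed parameter values for illustration only (note delta > k and gamma*k < 1).
beta, gamma, k, delta = 0.75, 0.5, 1.5, 2.5
b, c, a_min = 1.0, 0.2, 1.0          # search cost, screening cost, ability bound

def firm_choices(r):
    """Sampling n, screening cutoff a_c, hiring h, and wage w implied by the FOCs
    beta*gamma/(1+beta*gamma)*r = b*n  and  beta*(1-gamma*k)/(1+beta*gamma)*r = c*a_c^delta."""
    n = beta * gamma / (1 + beta * gamma) * r / b
    a_c = (beta * (1 - gamma * k) / (1 + beta * gamma) * r / c) ** (1 / delta)
    h = n * (a_min / a_c) ** k       # hired workers
    w = b * (a_c / a_min) ** k       # bargained wage
    return n, a_c, h, w

r_lo, r_hi = 100.0, 400.0
(_, _, h_lo, w_lo), (_, _, h_hi, w_hi) = firm_choices(r_lo), firm_choices(r_hi)

# Elasticity of wages with respect to employment: should equal k/(delta - k).
elasticity = (np.log(w_hi) - np.log(w_lo)) / (np.log(h_hi) - np.log(h_lo))
print(f"wage-employment elasticity: {elasticity:.3f}  (k/(delta-k) = {k/(delta-k):.3f})")
```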
Therefore, under the assumption δ > k, the model exhibits an employer-size wage premium, where firms that employ more workers pay higher wages. To match empirical findings of such an employer-size wage premium, we focus on parameter values satisfying this inequality. Using the firms’ first-order conditions, firm revenue (4), and the production technology (3), we can solve explicitly for firm revenue as a function of firm productivity (θ), the demand shifter (A), the search cost (b), and parameters:
(8)   r(θ) = [κr c^{−β(1−γk)/δ} b^{−βγ} Υ(θ)^{1−β} A θ^β]^{1/Γ},

where Γ ≡ 1 − βγ − β(1 − γk)/δ > 0 and the constant κr is defined in the Technical Appendix. An implication of this expression is that the relative revenues of any two firms depend solely on their relative productivities and relative market access: r(θ′)/r(θ″) = (θ′/θ″)^{β/Γ} [Υ(θ′)/Υ(θ″)]^{(1−β)/Γ}. Finally, using the two first-order conditions in the firm's problem (6), firm profits can be expressed in terms of firm revenue and the fixed costs of production and exporting:

(9)   π(θ) = [Γ/(1 + βγ)] r(θ) − fd − Ix(θ) fx.
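The sketch below evaluates (8) and (9) for a few productivity levels under assumed parameter values; since κr is only defined in the paper's Technical Appendix, it is treated here as an unspecified positive constant, and the market access value assigned to exporters is likewise an arbitrary illustration.

```python
# Minimal sketch of equations (8) and (9); parameter values are assumptions.
beta, gamma, k, delta = 0.75, 0.5, 1.5, 2.5
b, c, A = 1.0, 0.2, 1.0
f_d, f_x = 1.0, 1.5
kappa_r = 1.0                                  # placeholder for the Technical Appendix constant
Gamma = 1 - beta * gamma - beta * (1 - gamma * k) / delta
assert Gamma > 0

def revenue(theta, Upsilon=1.0):
    """Equation (8): firm revenue given productivity theta and market access Upsilon."""
    return (kappa_r * c ** (-beta * (1 - gamma * k) / delta) * b ** (-beta * gamma)
            * Upsilon ** (1 - beta) * A * theta ** beta) ** (1 / Gamma)

def profit(theta, exporter=False, Upsilon_x=1.4):
    """Equation (9): profit net of fixed production and (if exporting) exporting costs."""
    Upsilon = Upsilon_x if exporter else 1.0
    return Gamma / (1 + beta * gamma) * revenue(theta, Upsilon) - f_d - (f_x if exporter else 0.0)

for theta in (1.0, 1.5, 2.0):
    print(theta, round(profit(theta, exporter=False), 3), round(profit(theta, exporter=True), 3))
```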
2.3. Sectoral Variables

To determine sectoral equilibrium, we use the recursive structure of the model. In a first block of equations, we solve for the tightness of the labor market (x, x∗) and search costs (b, b∗) in each country. In a second block of equations, we solve for the zero-profit productivity cutoffs (θd, θd∗), the exporting productivity cutoffs (θx, θx∗), and sectoral demand shifters (A, A∗). A third and final block of equations, to be described in Section 6, determines the remaining components of the sector's variables: the dual price index (P, P∗), the real consumption index (Q, Q∗), the mass of firms (M, M∗), and the size of the labor force (L, L∗).

18 We note that search frictions and wage bargaining alone are not enough to generate wage variation across firms in our model. From the firm's first-order condition for the measure of workers sampled, each firm equates workers' share of revenue per sampled worker to the common search cost. In the special case of our model without worker heterogeneity and screening, all sampled workers are hired, which implies that each firm's wage is equal to the common search cost.
2.3.1. Labor Market Tightness and Search Costs

Following the standard Diamond–Mortensen–Pissarides approach, the search cost (b) is assumed to be increasing in labor market tightness (x):

(10)   b = α0 x^{α1},   α0 > 1, α1 > 0,

where labor market tightness equals the ratio of workers sampled (N) to workers searching for employment in the sector (L): x = N/L.19

19 As shown by Blanchard and Galí (2010) and in the Technical Appendix, this relationship can be derived from a constant returns to scale Cobb–Douglas matching function and a cost of posting vacancies. The parameter α0 is increasing in the cost of posting vacancies and decreasing in the productivity of the matching technology, while α1 depends on the weight of vacancies in the Cobb–Douglas matching function. Other static models of search and matching include Acemoglu (1999) and Acemoglu and Shimer (1999).

Under the assumption of risk neutrality, the supply of workers searching for employment in the sector depends on their expected income outside the sector, that is, their outside option, ω. In particular, workers are indifferent between searching for employment inside and outside the sector if their expected income in the sector, which equals the probability of being sampled (x) times the expected wage conditional on being sampled (w(θ)h(θ)/n(θ) = b from the analysis above), is equal to ω:

(11)   ω = x b.

We discuss in Section 6 how this condition is modified when workers are risk averse. Together (10) and (11) determine the search cost and the labor market tightness (b, x) for a given value of expected income (ω):

(12)   b = α0^{1/(1+α1)} ω^{α1/(1+α1)}   and   x = (ω/α0)^{1/(1+α1)},

where we assume α0 > ω so that 0 < x < 1, as discussed in Section 6. Analogous relationships determine search costs and labor market tightness (b∗, x∗) for a given value of expected income (ω∗) in foreign.

The search cost in (12) depends solely on parameters of the search technology (α0, α1) and expected income (ω). In particular, we can make the following statement.

LEMMA 1: The search cost b and the measure of labor market tightness x are both increasing in expected worker income ω.

When we subsequently embed the sector in general equilibrium, we specify conditions under which expected worker income (ω) is constant and conditions under which it changes with the other endogenous variables of the model.
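As a quick check on (10)–(12) and Lemma 1, the following sketch computes b and x for several values of ω under assumed search-technology parameters; the assertions verify (10) and (11), and the printed values rise with ω.

```python
# Minimal sketch of equations (10)-(12); alpha0 and alpha1 are illustrative assumptions.
alpha0, alpha1 = 2.0, 1.0          # search technology (alpha0 > omega keeps x < 1)

def labor_market(omega):
    """Equation (12): b = alpha0^(1/(1+alpha1)) * omega^(alpha1/(1+alpha1)),
    x = (omega/alpha0)^(1/(1+alpha1))."""
    b = alpha0 ** (1 / (1 + alpha1)) * omega ** (alpha1 / (1 + alpha1))
    x = (omega / alpha0) ** (1 / (1 + alpha1))
    return b, x

for omega in (0.5, 1.0, 1.5):
    b, x = labor_market(omega)
    # Consistency checks: (10) b = alpha0 * x^alpha1 and (11) omega = x * b.
    assert abs(b - alpha0 * x ** alpha1) < 1e-9 and abs(omega - x * b) < 1e-9
    print(f"omega={omega:.2f}  ->  b={b:.3f}, x={x:.3f}")   # both increase in omega (Lemma 1)
```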
2.3.2. Productivity Cutoffs and Demand

The two productivity cutoffs can be determined using firm revenue (8) and profits (9). The productivity cutoff below which firms exit (θd) is determined by the requirement that a firm with this productivity makes zero profits:

(13)   [Γ/(1 + βγ)] [κr c^{−β(1−γk)/δ} b^{−βγ} A θd^β]^{1/Γ} = fd.

Similarly, the exporting productivity cutoff above which firms export (θx) is determined by the requirement that at this productivity a firm is indifferent between serving only the domestic market and serving both the domestic and foreign markets:

(14)   [Γ/(1 + βγ)] [κr c^{−β(1−γk)/δ} b^{−βγ} A θx^β]^{1/Γ} [Υx^{(1−β)/Γ} − 1] = fx.

These two conditions imply the following relationship between the productivity cutoffs:

(15)   (θx/θd)^{β/Γ} [Υx^{(1−β)/Γ} − 1] = fx/fd.

In equilibrium, we also require the free entry condition to hold, which equates the expected value of entry to the sunk entry cost. Using the zero-profit and exporting cutoff conditions, (13) and (14), respectively, and the relationship between revenues for firms with different productivities from (8), the free entry condition can be written as

(16)   fd ∫_{θd}^{∞} [(θ/θd)^{β/Γ} − 1] dGθ(θ) + fx ∫_{θx}^{∞} [(θ/θx)^{β/Γ} − 1] dGθ(θ) = fe.

Equations (13), (14), and (16) can be used to solve for home's productivity cutoffs and the demand shifter (θd, θx, A) for a given value of the foreign demand shifter (A∗), which only influences home sectoral equilibrium through exporter market access (Υx > 1).20 Three analogous equations can be used to solve for foreign variables (θd∗, θx∗, A∗) for a given value of A. Together these six equations allow us to solve for the productivity cutoffs and demand shifters in the two countries (θd, θx, A, θd∗, θx∗, A∗) for given values of search costs (b, b∗), which were determined in the previous block of equations. Having solved for the productivity cutoffs and demand shifters, firm market access in each country (Υ(θ), Υ∗(θ)) follows immediately from (5).

20 In a symmetric equilibrium A = A∗ and Υx = 1 + τ^{−β/(1−β)}, which implies that the ratio of the two productivity cutoffs is pinned down by (15) alone.
The productivity cutoffs and demand shifter depend on two dimensions of trade openness in (13), (14), and (16). First, they depend on an extensive margin of trade openness, as captured by the ratio of the productivity cutoffs ρ ≡ θd/θx ∈ [0, 1], which determines the fraction of exporting firms [1 − Gθ(θx)]/[1 − Gθ(θd)] = ρ^z. Second, they depend on an intensive margin of trade openness, as captured by the market access variable, Υx > 1, which determines the ratio of revenues from domestic sales and exporting, as discussed in footnote 15. These two dimensions of trade openness are linked through the relationship between the productivity cutoffs (15).

2.4. Firm-Specific Variables

In this section, we use the solutions for sectoral equilibrium to solve for firm-specific variables. We show that the model's predictions for firm-specific variables are consistent with empirically observed relationships between wages, employment, workforce composition, productivity, and export participation.

To solve for firm-specific variables, we use two properties of the model. First, from firm revenue (8), the relative revenue of any two firms depends solely on their relative productivities and relative market access. Second, from firm profits (9), the lowest productivity firm with productivity θd makes zero profits. Combining these two properties with the firm's first-order conditions above allows all firm-specific variables to be written as functions of firm productivity (θ), firm market access (Υ(θ)), the zero-profit productivity cutoff (θd), search costs (b), and parameters,
(17)   r(θ) = Υ(θ)^{(1−β)/Γ} · rd · (θ/θd)^{β/Γ},   rd ≡ (1 + βγ) fd/Γ,

h(θ) = Υ(θ)^{(1−β)(1−k/δ)/Γ} · hd · (θ/θd)^{β(1−k/δ)/Γ},   hd ≡ [βγ fd/(Γ b)] [β(1 − γk) fd/(Γ c amin^δ)]^{−k/δ},

w(θ) = Υ(θ)^{k(1−β)/(δΓ)} · wd · (θ/θd)^{βk/(δΓ)},   wd ≡ b [β(1 − γk) fd/(Γ c amin^δ)]^{k/δ},
where market access (Υ (θ)) is determined as a function of firm productivity in (7). Combining these expressions with the Pareto productivity distribution, firm revenue and employment are also Pareto distributed, with shape parameters that depend on the dispersion of firm productivity, the dispersion of
[Figure 1 plots the wage rate w(θ) and the schedule labeled wc(θ) against firm productivity θ, with the cutoffs θd and θx marked on the horizontal axis.]
FIGURE 1.—Wages as a function of firm productivity.
worker ability, and product and labor market parameters that influence workforce composition.21 From the firm-specific solutions (17), more productive firms not only have higher revenue, profits, and employment, as in the benchmark model of firm heterogeneity of Melitz (2003), but also pay higher wages as shown in Figure 1. These results are consistent with empirical evidence on rent sharing, whereby higher firm revenue and profits are shared with workers through higher wages (e.g., Van Reenen (1996)), and with the large empirical literature that finds an employer-size wage premium (see the survey by Oi and Idson (1999)). Additionally, the differences in wages across firms are driven by differences in workforce composition. More productive firms have workforces of higher average ability, which are more costly to replace in the bargaining game, and, therefore, they pay higher wages. These features of the model are consistent with empirical findings that the employer-size wage premium is in part explained by differences in the unobserved characteristics of workers across firms.22 The reason more productive firms have workforces of higher average ability in the model is that they screen more intensively, which also receives empirical support. An emerging literature on firm recruitment policies provides 21
See Helpman, Itskhoki, and Redding (2008a) for further discussion. As the shape parameters of these distributions depend on ratios of parameters in the model, the observed distributions can be used to discipline the model’s parameters. 22 See, for example, Abowd, Kramarz, and Margolis (1999), Abowd, Creecy, and Kramarz (2002), and De Melo (2008).
evidence of more intensive screening policies for larger firms and higher-wage matches.23

23 For example, Barron, Black, and Loewenstein (1987) found that expenditures on screening workers are positively and significantly related to employer size, while Pellizzari (2005) found that matches created through more intensive screening pay higher wages.

Finally, firm characteristics are systematically related to export participation in the model. As a result of fixed costs of exporting, there is a discrete increase in firm revenue at the productivity threshold for entry into exporting (θx), where Υ(θ) increases from 1 to Υx > 1, which implies a discrete increase in all other firm variables except for profits. Therefore, exporters not only have higher revenue and employment than nonexporters, as in the benchmark model of firm heterogeneity of Melitz (2003), but also pay higher wages, as found empirically by Bernard and Jensen (1995, 1997). While exporting increases the wage paid by a firm with a given productivity, so that the model features an exporter wage premium conditional on firm productivity, it does not feature an exporter wage premium conditional on firm size, because both firm wages and firm size increase discretely at the productivity threshold for entry into export markets.24 The wage differences between exporters and nonexporters in the model are accompanied by differences in workforce composition, as found empirically by Schank, Schnabel, and Wagner (2007), Munch and Skaksen (2008), and Frías, Kaplan, and Verhoogen (2009).

24 One potential explanation for an exporter wage premium conditional on firm size emerges from the extension of our model (discussed below) to incorporate multiple types of workers with different observed characteristics. Differences in workforce composition across these types between exporters and nonexporters that are imperfectly controlled for in empirical studies can give rise to such an exporter wage premium conditional on firm size.

3. SECTORAL WAGE INEQUALITY

While workers are ex ante identical and have the same expected income, there is ex post wage inequality because workers receive different wages depending on the employer with whom they are matched. In this section, we consider the within-sector distribution of wages across employed workers. This sectoral wage distribution is a weighted average of the distributions of wages for workers employed by domestic firms, Gwd(w), and for workers employed by exporters, Gwx(w), with weights equal to the shares of employment in the two groups of firms:

(18)   Gw(w) = Shd Gwd(w)   for wd ≤ w ≤ wd/ρ^{βk/(δΓ)},
       Gw(w) = Shd   for wd/ρ^{βk/(δΓ)} ≤ w ≤ wd Υx^{k(1−β)/(δΓ)}/ρ^{βk/(δΓ)},
       Gw(w) = Shd + (1 − Shd) Gwx(w)   for w ≥ wd Υx^{k(1−β)/(δΓ)}/ρ^{βk/(δΓ)},
where ρ and Υx are the extensive and intensive margins of trade openness defined above, wd = w(θd) is the wage paid by the least productive firm in (17), wd/ρ^{kβ/(δΓ)} = w(θx−) is the wage paid by the most productive nonexporter, and wd Υx^{k(1−β)/(δΓ)}/ρ^{kβ/(δΓ)} = w(θx+) is the wage paid by the least productive exporter. Note that wd depends solely on parameters and search costs (b), which in turn depend on expected worker income (ω).

The share of workers employed by domestic firms, Shd, can be evaluated using the Pareto productivity distribution and the solution for firm-specific variables (17) as

Shd = [1 − ρ^{z−β(1−k/δ)/Γ}] / {1 + ρ^{z−β(1−k/δ)/Γ} [Υx^{(1−β)(1−k/δ)/Γ} − 1]},

which depends on the extensive and intensive margins of trade openness.

The distributions of wages across workers employed by domestic and exporting firms can also be derived from the solutions for firm-specific variables (17). Given that productivity is Pareto distributed, and both wages and employment are power functions of productivity, the distribution of wages across workers employed by domestic firms is a truncated Pareto distribution

Gwd(w) = [1 − (wd/w)^{1+1/μ}] / [1 − ρ^{z−β(1−k/δ)/Γ}]   for wd ≤ w ≤ wd/ρ^{kβ/(δΓ)}.

Similarly, the distribution of wages across workers employed by exporters, Gwx(w), is an untruncated Pareto distribution

Gwx(w) = 1 − [(wd/w) Υx^{k(1−β)/(δΓ)} ρ^{−kβ/(δΓ)}]^{1+1/μ}   for w ≥ wd Υx^{k(1−β)/(δΓ)}/ρ^{kβ/(δΓ)}.

The wage distributions for workers employed by domestic firms and by exporters have the same shape parameter, 1 + 1/μ, where μ is defined as

μ ≡ (βk/δ)/(zΓ − β),   where Γ ≡ 1 − βγ − (β/δ)(1 − γk).

For the mean and variance of the sectoral wage distribution to be finite, we require 0 < μ < 1 and hence zΓ > 2β, which is satisfied for sufficiently large z (a not too dispersed productivity distribution).25

25 While we concentrate on the wage distribution, as this is typically the subject of the economic debate over the impact of trade liberalization, the income distribution could also be influenced by profits. The model can also be used to determine the distribution of revenue (and hence profits) across firms as discussed above.

The dispersion of firm wages
is systematically related to the dispersion of firm revenue in the model, because larger firms have workforces of higher average ability and hence pay higher wages. While this mechanism is more general than our distributional assumptions, the assumption of Pareto distributions of firm productivity and worker ability enables closed-form solutions for the wage distribution to be derived. While the log normal distribution is generally believed to provide a closer approximation to the empirical wage distribution, the Pareto distribution provides a close approximation for the upper tail.

3.1. Sectoral Wage Inequality in the Closed Economy

The closed economy wage distribution can be obtained by considering the case of arbitrarily large values of trade costs, which imply ρ → 0 in (18). In the closed economy, the share of employment in domestic firms is equal to 1, and the sectoral wage distribution across workers employed by domestic firms is an untruncated Pareto distribution with lower limit wd and shape parameter 1 + 1/μ.

Given an untruncated Pareto distribution, all scale-invariant measures of inequality, such as the coefficient of variation, the Gini coefficient, and the Theil index, depend solely on the distribution's shape parameter. None of these measures depends on the lower limit of the support of the wage distribution (wd), and they therefore do not depend on search costs (b) and expected worker income (ω). While these variables affect the mean of the wage distribution, they do not affect its dispersion. An important implication of this result is that the model's predictions for wage inequality do not depend on the equilibrium value of expected worker income (ω) and hence are robust to alternative ways of closing the model in general equilibrium.

PROPOSITION 1: In the closed economy, μ is a sufficient statistic for sectoral wage inequality. In particular, (i) the coefficient of variation of wages is μ/√(1 − μ²); (ii) the Lorenz curve is represented by sw = 1 − (1 − sh)^{1/(1+μ)}, where sh is the fraction of workers and sw is the fraction of their wages when workers are ordered from low- to high-wage earners; (iii) the Gini coefficient is μ/(2 + μ); and (iv) the Theil index is μ − ln(1 + μ).

The existence of a sufficient statistic for wage inequality in the closed economy is more general than our model in the sense that it holds for a wider class of models in which firm wages and employment are power functions of productivity, and productivity is Pareto distributed. Together these features imply that the wage distribution in the closed economy is an untruncated Pareto distribution and, hence, the shape parameter of this distribution is a sufficient statistic for wage inequality. In our model, this shape parameter is linked to the underlying structural parameters of the model that influence workforce composition and, hence, enter the derived parameter μ. Evidently, sectoral wage inequality is monotonically increasing in μ (the lower the shape parameter of the wage distribution 1 + 1/μ, the greater the wage inequality).
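The closed-form expressions in Proposition 1 are straightforward to evaluate; the sketch below computes them for an arbitrary illustrative value of μ (an assumption, not an estimate) and cross-checks the coefficient of variation by simulating wages from a Pareto distribution with shape parameter 1 + 1/μ. The function and variable names are ours.

```python
import numpy as np

def closed_economy_inequality(mu):
    """Closed-form inequality measures from Proposition 1 for a Pareto wage
    distribution with shape parameter 1 + 1/mu (requires 0 < mu < 1)."""
    cv = mu / np.sqrt(1 - mu ** 2)            # coefficient of variation
    gini = mu / (2 + mu)                      # Gini coefficient
    theil = mu - np.log(1 + mu)               # Theil index
    lorenz = lambda s_h: 1 - (1 - s_h) ** (1 / (1 + mu))   # Lorenz curve s_w(s_h)
    return cv, gini, theil, lorenz

mu = 0.4                                       # illustrative value (assumption)
cv, gini, theil, lorenz = closed_economy_inequality(mu)
print(f"CV={cv:.3f}  Gini={gini:.3f}  Theil={theil:.4f}  bottom half's wage share={lorenz(0.5):.3f}")

# Monte Carlo cross-check: draw wages from a Pareto distribution with shape 1 + 1/mu and w_d = 1.
rng = np.random.default_rng(0)
wages = (1 - rng.random(1_000_000)) ** (-1 / (1 + 1 / mu))   # inverse-CDF sampling
print(f"simulated CV = {wages.std() / wages.mean():.3f}")
```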
PROPOSITION 2: In the closed economy, inequality in the sectoral distribution of wages is increasing in firm productivity dispersion (lower z) and increasing in worker ability dispersion (lower k) if and only if z^{−1} + δ^{−1} + γ > β^{−1}.

Since more productive firms pay higher wages, greater dispersion in firm productivity (lower z) implies greater sectoral wage inequality. In contrast, greater dispersion in worker ability (lower k) has an ambiguous effect on sectoral wage inequality because of two counteracting forces. On the one hand, a reduction in k increases relative employment in more productive firms (from (17)) that pay higher wages, which increases wage inequality. On the other hand, a reduction in k decreases relative wages paid by more productive firms (from (17)), which reduces wage inequality. When the parameter inequality in the proposition is satisfied, the change in relative employment dominates the change in relative wages, and greater dispersion in worker ability implies greater sectoral wage inequality.

The model's prediction that sectoral wage inequality is closely linked to the dispersion of firm productivity receives empirical support. In particular, Davis and Haltiwanger (1991) showed that wage dispersion across plants within sectors accounts for a large share of overall wage dispersion and is responsible for more than one-third of the growth in overall wage dispersion in U.S. manufacturing between 1975 and 1986. Additionally, they found that between-plant wage dispersion is strongly related to between-plant size dispersion, which in our model is driven by productivity dispersion. Similarly, Faggio, Salvanes, and Van Reenen (2007) showed that a substantial component of the increase in individual wage inequality in the United Kingdom in recent decades has occurred between firms within sectors and is linked to increased productivity dispersion between firms within sectors.

While greater firm productivity dispersion (associated, for example, with innovations such as information and communication technologies (ICTs)) is one potential source of increased wage inequality in the model, another potential source is international trade as considered in the next section. Indeed, both greater firm productivity dispersion and international trade raise wage inequality through the same mechanism of greater dispersion in firm revenue and wages within industries, and both raise measured productivity at the industry level through reallocations of resources across firms.

3.2. Open versus Closed Economy

The sectoral wage distribution in the open economy depends on the sufficient statistic for wage inequality in the closed economy (μ) and the extensive and intensive measures of trade openness (ρ and Υx, respectively). In the two limiting cases where trade costs are sufficiently high that no firm exports (ρ = 0) and trade costs are sufficiently low that all firms export (ρ = 1), the open economy wage distribution is an untruncated Pareto distribution with
shape parameter 1 + 1/μ. From Proposition 1, all scale-invariant measures of inequality for an untruncated Pareto distribution depend solely on the distribution’s shape parameter. Therefore, the same level of wage inequality exists in the open economy when all firms export as in the closed economy. To characterize sectoral wage inequality in the open economy when 0 < ρ < 1 (only some firms export), we compare the actual open economy wage distribution (Gw (w)) to a counterfactual wage distribution (Gcw (w)). For the counterfactual wage distribution, we choose an untruncated Pareto distribution with the same shape parameter as the wage distribution in the closed economy (1 + 1/μ), but the same mean as the wage distribution in the open economy. An important feature of this counterfactual wage distribution is that it has the same level of inequality as the closed economy wage distribution. Therefore, if we show that there is more inequality with the open economy wage distribution than with the counterfactual wage distribution, this will imply that there is more wage inequality in the open economy than in the closed economy. The counterfactual wage distribution has two other important properties, as shown formally in the Technical Appendix. First, the lowest wage in the counterfactual wage distribution (wdc ) lies strictly in between the lowest wage paid by domestic firms (wd ) and the lowest wage paid by exporters (w(θx+ )) in the actual open economy wage distribution. Otherwise, the counterfactual wage distribution would have a mean either lower or higher than the actual open economy wage distribution, which contradicts the requirement that the two distributions have the same mean. Second, the counterfactual wage distribution has a smaller slope than the actual wage distribution at w(θx+ ). Otherwise, the counterfactual wage distribution would have a greater density than the actual wage distribution for w ≥ w(θx+ ), and would, therefore, have a higher mean than the actual wage distribution. Together, these two properties imply that the relative location of the cumulative distribution functions for actual and counterfactual wages is as shown in Figure 2.26 The actual and counterfactual cumulative distributions intersect only once, and the actual distribution lies above the counterfactual distribution for low wages and below it for high wages.27 This pattern provides a sufficient condition for the counterfactual wage distribution to second-order stochastically dominate the wage distribution in the open economy. Therefore, for all measures of inequality that respect second-order stochastic dominance, the open economy wage distribution exhibits greater inequality than the counterfactual wage distribution. It follows that the wage distribution in the open economy exhibits more inequality than the wage distribution in the 26
To generate Figures 1–3, we set the parameters of the model to match some of the salient features of the data. For details see Helpman, Itskhoki, and Redding (2008b). 27 Note that the actual and counterfactual distributions can intersect either above the wage at the most productive nonexporter, w(θx− ) (as shown in Figure 2), or below it. In both cases, the actual and counterfactual distributions have the properties discussed in the text.
[Figure 2 plots the actual open economy cumulative wage distribution Gw(w) and the counterfactual distribution Gcw(w) against the wage rate w, with wd, wdc, w(θx−), and w(θx+) marked on the horizontal axis.]
FIGURE 2.—Cumulative distribution function of wages.
closed economy. This result holds independently of whether the opening of trade affects expected worker income (ω), because ω affects the lower limit of the actual open economy wage distribution (and hence the lower limit of the counterfactual wage distribution), but does not affect the comparison of levels of inequality between the two distributions.

PROPOSITION 3: (i) Sectoral wage inequality in the open economy when some but not all firms export is strictly greater than in the closed economy and (ii) sectoral wage inequality in the open economy when all firms export is the same as in the closed economy.

Proposition 3 highlights a new mechanism for international trade to affect wage inequality that is absent from neoclassical trade theories such as the Heckscher–Ohlin model; namely, the participation of some but not all firms in exporting. This generic mechanism applies in any heterogeneous firm model in which firm wages are related to firm revenue and there is selection into export markets. As a result of this mechanism, Proposition 3 holds in models in which the following three conditions are satisfied: (i) firm wages and employment are power functions of firm productivity, (ii) there is firm selection into export markets and exporting increases wages for a firm with a given productivity, and (iii) firm productivity is Pareto distributed.

An important implication of Proposition 3, which applies for symmetric and asymmetric countries alike, is that the opening of trade can increase wage inequality in all countries. In contrast, the Stolper–Samuelson theorem of the Heckscher–Ohlin model predicts rising wage inequality in developed countries
but falling wage inequality in developing countries. Proposition 3 is, therefore, consistent with empirical findings of increased wage inequality in developing countries following trade liberalization, as reviewed by Goldberg and Pavcnik (2007). Similarly, Proposition 3 is consistent with empirical evidence that much of the observed reallocation in the aftermath of trade liberalization occurs across firms within sectors and is accompanied by increases in within-group wage inequality.28

28 To account for reallocation within industries, however, the Stolper–Samuelson theorem can be reinterpreted as applying at a more disaggregated level within industries as, for example, in Feenstra and Hanson (1996).

Since sectoral wage inequality when all firms export is the same as in the closed economy, but sectoral wage inequality when only some firms export is higher than in the closed economy, the relationship between sectoral wage inequality and the fraction of exporters is at first increasing and later decreasing.

COROLLARY TO PROPOSITION 3: An increase in the fraction of exporting firms raises sectoral wage inequality when the fraction of exporting firms is sufficiently small and reduces sectoral wage inequality when the fraction of exporting firms is sufficiently large.

The intuition for this result is that the increase in firm wages that occurs at the productivity threshold above which firms export is present only when some, but not all, firms export. When no firm exports (ρ = 0), a small reduction in trade costs that induces some firms to start exporting raises sectoral wage inequality because of the higher wages paid by exporters. When all firms export (ρ = 1), a small increase in trade costs that induces some firms to stop exporting raises sectoral wage inequality because of the lower wages paid by domestic firms. Furthermore, as the proof follows from that for Proposition 3 above, this corollary holds for any measure of wage inequality that respects second-order stochastic dominance and for asymmetric countries.

One important implication of these results is that the initial level of trade openness is a relevant control for empirical studies examining the relationship between wage inequality and trade. Our closed-form expression for the wage distribution (18) in terms of the extensive and intensive margins of trade openness (ρ and Υx, respectively) holds both in the extreme cases of autarky and frictionless trade, as well as in a trade equilibrium where only a fraction of firms export. While both fixed and variable trade costs influence sectoral wage inequality, they do so through slightly different mechanisms, because they have different effects on ρ and Υx. This can be seen most clearly for symmetric countries, where the intensive margin depends on variable trade costs alone (Υx = 1 + τ^{−β/(1−β)}), and changes in the fixed costs of exporting affect only the extensive margin (ρ). To illustrate the relationship between sectoral wage inequality and trade openness in the
[Figure 3 appears here. Vertical axis: Theil index of sectoral wage inequality, Tw (approximately 0.015 to 0.02). Horizontal axis: trade openness, ρ = θd/θx (from 0 to 1). A horizontal reference line marks the level μ − ln(1 + μ).]
FIGURE 3.—Trade openness and sectoral wage inequality.
model, Figure 3 graphs the variation in the Theil index of wage inequality with symmetric countries as we vary the fixed cost of exporting (fx ) and hence the extensive margin of trade openness (ρ). A similar pattern of at first increasing and then later decreasing wage inequality can emerge as we vary variable trade costs (τ). While we have not been able to show analytically that the relationship between wage inequality and the fraction of exporting firms is single peaked, as shown in Figure 3, this pattern emerged from all of our simulations of the model for a wide range of different parameter configurations. In our framework, the relationship between firm wages and revenue arises from differences in average worker ability across firms. As the opening of trade changes the dispersion of firm revenue, this in turn changes the dispersion of average worker ability across firms and, hence, changes the dispersion of wages. PROPOSITION 4: The opening of the closed economy to trade amplifies differences in workforce composition across firms. As a result of the opening of trade, the revenue of exporters increases, which induces them to screen more intensively, while the revenue of domestic firms decreases, which induces them to screen less intensively. Hence a worker with a given ability who would be hired by a high productivity firm in the closed economy may not be hired by this firm in the open economy if it becomes an exporter. The opening of trade, therefore, strengthens the correlation between firm productivity and average worker ability, which echoes the empirical findings of greater wage and skill upgrading at more productive exporting firms
in Verhoogen (2008). To the extent that empirical measures of productivity do not adequately control for worker ability, changes in average worker ability are reflected in changes in measured firm productivity. As the opening of trade amplifies differences in workforce composition across firms, it, therefore, also magnifies differences in measured firm productivity. 4. SECTORAL UNEMPLOYMENT In the model workers can be unemployed either because they are not matched with a firm or because their match-specific ability draw is below the screening threshold of the firm with which they are matched. Both components of unemployment are frictional in the sense that workers cannot immediately achieve another match. The sectoral unemployment rate u includes both of these components and can be written as 1 minus the product of the hiring rate σ and the tightness of the labor market x, (19)
u = (L − H)/L = 1 − (H/N)(N/L) = 1 − σx,
where σ ≡ H/N, H is the measure of hired workers, N is the measure of matched workers, and L is the measure of workers seeking employment in the sector. The sectoral tightness of the labor market (x) in (12) depends on the search friction parameter (α0 ) and expected worker income (ω). Therefore, the tightness of the labor market is not directly affected by trade openness and is only indirectly affected insofar as trade openness influences ω. In contrast, the sectoral hiring rate (σ) depends directly on trade openness, which influences firm revenues and hence screening ability thresholds. Using the Pareto productivity distribution, the sectoral hiring rate can be expressed as a function of the extensive and intensive margins of trade openness (ρ and Υx , respectively), the sufficient statistic for wage inequality (μ), and other parameters, as shown in the Technical Appendix: (20)
σ = ϕ(ρ, Υx) σ^A,
σ^A = (1/(1 + μ)) [Γ c amin^δ / (β(1 − γk) fd)]^{k/δ},
where σ^A is the hiring rate in autarky, the term in square brackets is the hiring rate of the least productive firm (hd/nd), and
ϕ(ρ, Υx) ≡ [1 + (Υx^{(1−β)(1−k/δ)/Γ} − 1) ρ^{z−β(1−k/δ)/Γ}] / [1 + (Υx^{(1−β)/Γ} − 1) ρ^{z−β/Γ}].
Evidently, we have ϕ(0, Υx) = 1 and 0 < ϕ(ρ, Υx) < 1 for 0 < ρ ≤ 1, since Υx > 1 and δ > k.
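As a purely numerical illustration of how (19) and (20) interact, the short sketch below evaluates ϕ(ρ, Υx), the implied hiring rate σ, and the unemployment rate u = 1 − σx. All parameter values, as well as the autarky hiring rate σ^A and the tightness x, are hypothetical and chosen only to respect the stated restrictions (Υx > 1, δ > k, zΓ > β); the sketch is not a solution of the full equilibrium.

```python
# Illustrative sketch of equations (19)-(20); all parameter values are hypothetical.
beta, gamma, k, delta, z, tau = 0.75, 0.5, 1.5, 3.0, 4.0, 1.3
Gamma = 1 - beta * gamma - beta * (1 - gamma * k) / delta  # composite parameter
Upsilon_x = 1 + tau ** (-beta / (1 - beta))                # intensive margin of trade openness

def phi(rho, Ups):
    """phi(rho, Upsilon_x) from equation (20); rho in [0, 1], with rho = 0 in autarky."""
    num = 1 + (Ups ** ((1 - beta) * (1 - k / delta) / Gamma) - 1) * rho ** (z - beta * (1 - k / delta) / Gamma)
    den = 1 + (Ups ** ((1 - beta) / Gamma) - 1) * rho ** (z - beta / Gamma)
    return num / den

sigma_A = 0.6  # hypothetical autarky hiring rate (the bracketed term in (20))
x = 0.8        # hypothetical labor market tightness, held fixed here

for rho in (0.0, 0.3, 0.6, 1.0):
    sigma = phi(rho, Upsilon_x) * sigma_A  # equation (20)
    u = 1 - sigma * x                      # equation (19)
    print(f"rho={rho:.1f}  phi={phi(rho, Upsilon_x):.3f}  sigma={sigma:.3f}  u={u:.3f}")
```

For any ρ strictly between 0 and 1, ϕ falls below 1, so the hiring rate is lower and the unemployment rate higher than in autarky for a given x, consistent with the discussion that follows.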
Search and screening costs have quite different effects on the closed economy unemployment rate. For a given expected worker income, a rise in the search friction α0 raises search costs (b), which reduces the sectoral tightness of the labor market (x) and increases the sectoral unemployment rate. In contrast, as the screening cost (c) increases, firms screen less intensively, which increases the sectoral hiring rate (σ) and thereby reduces the sectoral unemployment rate. The opening of the closed economy to trade affects the sectoral unemployment rate through two channels. The first channel is through expected worker income (ω) and the tightness of the labor market (x). When we embed the sector in general equilibrium, we show that expected worker income can either remain constant or rise following the opening of trade. As a result, the tightness of the labor market is either unaffected by the opening of trade (as in Helpman and Itskhoki (2010)) or rises (as in Felbermayr, Prat, and Schmerer (2008), Felbermayr, Larch, and Lechthaler (2009)). The second channel is through the hiring rate (σ), which depends on firms’ screening decisions and is distinctive to our approach. As firms’ screening decisions determine both firm wages and hiring rates, this second channel introduces a two-way dependence between wage inequality and unemployment. The opening of trade results in an expansion in the revenue of exporters and a contraction in the revenue of nonexporters, which changes industry composition toward more productive firms that screen more intensively. Therefore, the opening of trade reduces the hiring rate, which increases sectoral unemployment. PROPOSITION 5: The opening of the closed economy to trade has an ambiguous overall effect on the sectoral unemployment rate: (a) The tightness of the labor market can either remain constant or rise following the opening of trade, which leaves unchanged or reduces the rate of unemployment. (b) The hiring rate is strictly lower in the open economy than in the closed economy, which raises the rate of unemployment. While the model’s predictions for the impact on sectoral wage inequality of the opening of the closed economy to trade are unambiguous irrespective of how expected worker income is determined in general equilibrium, its predictions for sectoral unemployment are ambiguous and depend on general equilibrium effects. This ambiguity of the results for unemployment is consistent with the absence of a clear empirical consensus on the relationship between trade and unemployment, as discussed, for example, in Davidson and Matusz (2009). The sectoral distribution of income depends on both the sectoral distribution of wages and the unemployment rate, where unemployed workers all receive the same income of zero. As the opening of trade raises wage inequality and has an ambiguous effect on unemployment, income and wage inequality can
move in opposite directions in the model. Therefore, our framework highlights that conclusions based on wage inequality can be misleading if the ultimate concern is income inequality. 5. OBSERVABLE WORKER HETEROGENEITY In this section, we introduce ex ante heterogeneity across workers. We consider a setting in which there are multiple occupations and occupation-specific supplies of workers, where workers from one occupation cannot perform the tasks of workers from another occupation. There are observable differences in ex ante worker characteristics across the occupations, which introduces a distinction between within-group wage inequality (among workers with the same ex ante characteristics) and between-group wage inequality (across workers with different ex ante characteristics). Although, for expositional simplicity, we confine the discussion to two occupations only, it will become clear how the main specification can be generalized to any number of occupations. To demonstrate in a simple way the robustness of our results to the introduction of ex ante heterogeneity, we concentrate on a Cobb–Douglas production technology. We show that the opening of trade raises within-group wage inequality for each group of workers, whereas between-group wage inequality can rise or decline with trade. When between-group wage inequality declines, the rise of within-group wage inequality can dominate, so that overall wage inequality still rises. To illustrate the flexibility of our framework, we also briefly discuss at the end of this section some implications of technology–skill complementarity. 5.1. Main Specification There are two types of labor, = 1 2, with h denoting a firm’s employment of labor of type and n denoting a firm’s measure of matches with labor of this type. Labor markets are occupation-specific and each one of them is similar to the labor market specified above. In particular, search and matching occurs separately for every occupation. We allow the expected income of a type- worker ω , the resulting hiring costs b , and tightness in the labor market x to vary across occupations. The ability of every group is Pareto distributed with shape parameter k and lower bound amin . The generalized production function is (21)
y = θ (ā1 h1^{γ1})^{λ1} (ā2 h2^{γ2})^{λ2},   λ1 + λ2 = 1.
This is a Cobb–Douglas extension of the production function (2) that allows for two occupation-specific tasks (e.g., engineers and managers). As before, the revenue function is r = Ay β for nonexporting firms and r = Υx1−β Ay β for exporting firms. The wage rates are determined in a Stole–Zwiebel bargaining
game. This results in a wage bill for every type of worker which is a constant fraction of revenue,29
wℓ hℓ = [βγℓ λℓ / (1 + βγ̄)] r    for ℓ = 1, 2,
where γ̄ ≡ λ1γ1 + λ2γ2. Therefore, the problem of the firm, which is a generalization of (6), yields the following wage and employment schedules for the two occupational groups (see the Technical Appendix for details)30:
hℓ(θ) = hdℓ (θ/θd)^{β(1−kℓ/δ)/Γ} Υ(θ)^{(1−β)(1−kℓ/δ)/Γ},
wℓ(θ) = wdℓ (θ/θd)^{βkℓ/(δΓ)} Υ(θ)^{kℓ(1−β)/(δΓ)},
where now
Γ ≡ 1 − βγ̄ − (β/δ)[1 − (λ1γ1k1 + λ2γ2k2)].
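The sketch below evaluates these occupational schedules for a firm that serves only the domestic market, taking Υ(θ) = 1. The cutoff values hdℓ and wdℓ are placeholders set to 1 rather than the expressions given in footnote 30 below, and all parameter values are hypothetical; the example only illustrates that, with k1 < k2, relative employment h1/h2 rises and relative wages w1/w2 fall with firm productivity, as derived next.

```python
# Hypothetical illustration of the two-occupation schedules (non-exporters, Upsilon(theta) = 1).
beta, delta = 0.75, 3.0
gamma1, gamma2, lam1, lam2 = 0.5, 0.5, 0.6, 0.4
k1, k2 = 1.2, 1.8                    # k1 < k2: group-1 ability is more dispersed
gamma_bar = lam1 * gamma1 + lam2 * gamma2
Gamma = 1 - beta * gamma_bar - (beta / delta) * (1 - (lam1 * gamma1 * k1 + lam2 * gamma2 * k2))

theta_d = 1.0
h_d = {1: 1.0, 2: 1.0}               # placeholder cutoff-firm values (see footnote 30 below)
w_d = {1: 1.0, 2: 1.0}

def schedules(theta, ell):
    k = {1: k1, 2: k2}[ell]
    h = h_d[ell] * (theta / theta_d) ** (beta * (1 - k / delta) / Gamma)
    w = w_d[ell] * (theta / theta_d) ** (beta * k / (delta * Gamma))
    return h, w

for theta in (1.0, 2.0, 4.0):
    (h1, w1), (h2, w2) = schedules(theta, 1), schedules(theta, 2)
    print(f"theta={theta:.1f}  h1/h2={h1 / h2:.3f}  w1/w2={w1 / w2:.3f}")
```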
We next use this generalized solution to discuss wage dispersion within firms, and wage inequality within and between groups. It is evident from the equations for the firm-specific variables above that employment and wages are rising with firm productivity in every occupation (provided that kℓ < δ for both groups). However, the relative wage bills of the two types of workers are the same in every firm. Under these circumstances, relative wages are inversely proportional to relative employment across firms. In particular, we have
h1(θ)/h2(θ) = (hd1/hd2) (θ/θd)^{β(k2−k1)/(δΓ)} Υ(θ)^{(1−β)(k2−k1)/(δΓ)}
29 This results from a solution to the system of differential equations ∂(r − w1h1 − w2h2)/∂hℓ = wℓ for ℓ = 1, 2. See the Technical Appendix for details.
30 The constants hdℓ and wdℓ (ℓ = 1, 2) are generalizations of those provided in equation (17) for the baseline model without ex ante heterogeneity. They are
hdℓ ≡ [λℓβγℓ fd/(Γ bℓ)] [Γ c amin^δ/(λℓβ(1 − γℓkℓ) fd)]^{kℓ/δ},
wdℓ ≡ bℓ [Γ c amin^δ/(λℓβ(1 − γℓkℓ) fd)]^{−kℓ/δ}.
and β(k1 −k2 )/(δΓ ) θ w1 (θ) wd1 (1−β)(k1 −k2 )/(δΓ ) = Υ (θ) w2 (θ) wd2 θd It follows that more productive firms employ relatively more of type-1 workers if and only if k1 < k2 , that is, if and only if the ability of type-1 workers is more dispersed than the ability of type-2 workers. In what follows, we assume this to be the case. Under these circumstances, more productive firms pay relatively higher wages to type-2 workers.31 The intuition for this relationship between relative wages and employment across firms of different productivities is as follows. Since more productive firms employ relatively more of type-1 workers, this weakens their bargaining position relative to type-2 workers, and hence more productive firms pay relatively higher wages to type-2 workers. Note that the occupation-specific degree of decreasing returns γ only affects the relative wages and employment of firms through the composite derived parameter Γ . Note also that the relationship between relative wages and employment across firms of different productivities does not depend on the levels of human capital of workers in each group. High human capital of workers in group (high amin ) only affects relative employment and wages of the two groups through the cutoffs hd and wd , which are common to all firms. 5.1.1. Within-Group Inequality We can use the above solutions to calculate the distribution of wages within every occupation, as described in Section 3. As before, the distribution of wages within an occupation is an untruncated Pareto in the closed economy. This distribution now has an occupation-specific shape parameter 1 + 1/μ , where μ = βk /[δ(zΓ − β)]. It follows that k1 < k2 implies μ1 < μ2 , so that there is more wage dispersion in group 2. Group 1 has a steeper employment schedule, but a flatter wage schedule across firms. The second effect dominates, leading to less wage inequality within group 1. As in Section 3, the open economy wage distribution within an occupation has two components: a truncated Pareto among workers employed by firms that serve only the domestic market and an untruncated Pareto among workers employed by exporters, with 1 + 1/μ being the common shape parameter of these distributions. As a result, we can use the same method as in Section 3 to prove that within every occupation, there is more wage inequality in the trade regime than in autarky, as long as some, but not all, firms export (see Proposition 3). Similarly, within every occupation, wage inequality rises with trade 31
If the screening technology were also to differ across occupations, then the relative comparison would be between k1 /δ1 and k2 /δ2 instead of k1 and k2 , while differences in c have no effect on these variations across firms.
openness initially and later declines (see Corollary to Proposition 3). Therefore our results for within-group inequality naturally generalize to the case of multiple occupations with ex ante differences in worker characteristics across occupations. 5.1.2. Between-Group Inequality Next consider the impact of trade on wage inequality across occupations. Average wages for an occupation are given by32
w̄ℓ = (1 + μℓ) wdℓ · [1 + ρ^{z−β/Γ} (Υx^{(1−β)/Γ} − 1)] / [1 + ρ^{z−β(1−kℓ/δ)/Γ} (Υx^{(1−β)(1−kℓ/δ)/Γ} − 1)],   ℓ = 1, 2.
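To see how this expression behaves, the following sketch evaluates w̄2/w̄1 as ρ increases from zero, holding the cutoff wages wdℓ fixed (which is appropriate when expected worker income is unchanged by the opening of trade, as assumed in the discussion that follows). All numerical values are hypothetical.

```python
# Hypothetical check that w2_bar/w1_bar rises with trade openness rho when k1 < k2.
beta, delta, z = 0.75, 3.0, 4.0
k1, k2 = 1.2, 1.8
Gamma = 0.555            # hypothetical composite parameter (z * Gamma > beta)
Upsilon_x = 1.4          # intensive margin of trade openness
mu = {1: beta * k1 / (delta * (z * Gamma - beta)),
      2: beta * k2 / (delta * (z * Gamma - beta))}
w_d = {1: 1.0, 2: 1.0}   # placeholder cutoff wages, held fixed across rho

def avg_wage(ell, rho):
    k = {1: k1, 2: k2}[ell]
    num = 1 + rho ** (z - beta / Gamma) * (Upsilon_x ** ((1 - beta) / Gamma) - 1)
    den = 1 + rho ** (z - beta * (1 - k / delta) / Gamma) * (Upsilon_x ** ((1 - beta) * (1 - k / delta) / Gamma) - 1)
    return (1 + mu[ell]) * w_d[ell] * num / den

for rho in (0.0, 0.25, 0.5, 1.0):
    ratio = avg_wage(2, rho) / avg_wage(1, rho)
    print(f"rho={rho:.2f}  w2_bar/w1_bar={ratio:.4f}")
```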
In the case where k1 < k2 and expected worker income is unchanged as a result of the opening of trade, it is straightforward to show that trade causes the average wage in occupation 2 to rise relative to occupation 1 (i.e., w¯ 2 /w¯ 1 increases).33 As discussed earlier, there are two effects of trade openness on the average wage for an occupation: the higher wages paid by exporting firms and the reallocation of employment toward high-wage exporting firms. The first effect is stronger for the high-k group, while the second effect is stronger for the low-k group. The first effect dominates and the relative average wage of the high-k occupation rises. How do these results affect between-group inequality? With two occupations, different measures of between-group inequality (including the Theil index and the Gini coefficient) achieve their minimum when w¯ 2 /w¯ 1 = 1, and inequality rises in w¯ 2 /w¯ 1 if and only if w¯ 2 > w¯ 1 . It follows that trade raises between-group inequality when w¯ 2 > w¯ 1 in autarky. This occurs, for example, when wd1 = wd2 , because k1 < k2 implies μ1 < μ2 . In contrast, trade reduces between-group inequality when w¯ 1 > w¯ 2 in both autarky and the trade equilibrium. This happens when labor market tightness for group 1 is sufficiently larger than for group 2, so that b1 /b2 and, hence, wd1 /wd2 are sufficiently large, where b1 /b2 depends on relative expected incomes for the two groups of workers (ω1 /ω2 ). From this discussion, it is clear that between-group inequality can either rise or fall as a result of the opening to trade. But even if between-group inequality falls, its decline can be dominated by the rise in within-group inequality, so that the opening of trade raises overall wage inequality. This analysis of between-group inequality has so far abstracted from effects of trade on the relative expected incomes of the two groups of workers 32 Note that w¯ A = (1 + μ )wd is the autarky average wage rate. The derivation of the average wage can be found in the Technical Appendix in the proof of Proposition 3. 33 The formal argument is the following. Given k1 < k2 , an increase of ρ from zero to a positive value increases w¯ 2 /w¯ 1 . Moreover, the partial derivative of w¯ 2 /w¯ 1 with respect to ρ is positive. Both arguments assume b2 /b1 and, hence, wd2 /wd1 are constant, which is the case when expected worker income is unchanged as a result of the opening of trade.
(ω1 /ω2 ), as analyzed in the section on general equilibrium below. A movement in ω1 /ω2 can, however, be a dominant force behind the change in betweengroup inequality by affecting wd1 /wd2 through relative search costs b1 /b2 . Our framework can be nested in a two-sector Heckscher–Ohlin model in which trade and the relative supply of the two types of workers determine the relative expected incomes of the two groups (see the Technical Appendix). In this specification, within-group inequality responds to trade according to Proposition 3 while the response of between-group inequality is shaped by the standard Stolper–Samuelson forces. Our predictions for within-group inequality are, therefore, robust to these extensions of our model to multifactor multisector environments. 5.2. Technology–Skill Complementarity While the Cobb–Douglas production technology provides a tractable framework within which to demonstrate the robustness of our results to the introduction of ex ante heterogeneity, it implies that the relative wage bills of the two types of workers are constant across firms. This feature imposes a tight link between differences in ability dispersion between the two groups of workers, k2 − k1 , and variation in the relative wages and employment of the two groups of workers across firms of different productivities. It, therefore, also imposes a tight link between k2 − k1 and differences in wage dispersion between the two groups of workers. To break this tight link, the model can be generalized to a more flexible CES production technology34 : y = [λ1 (θ1 a¯ 1 hγ1 )ν + λ2 (θ2 a¯ 2 hγ2 )ν ]1/ν
0 < ν ≤ β,   λ1 + λ2 = 1.
Interpreting ℓ = 1 as skilled labor and ℓ = 2 as unskilled labor, we can treat θ1 = θ as the productivity level and set θ2 ≡ 1. This specification exhibits technology–skill complementarity, which is a known feature of the data. To focus on this feature, we also assume k1 = k2 = k. The limit ν → 0 results in the Cobb–Douglas case studied above, while we have imposed ν ≤ β to insure that employment of both types of labor increases with firm productivity. In the Technical Appendix, we show that in this case, the Stole–Zwiebel bargaining game yields the equilibrium wages
wℓ hℓ = χℓ [βγ / (1 + βγ)] r,   χℓ ≡ λℓ(θℓ āℓ hℓ^γ)^ν / ∑_{j=1}^{2} λj(θj āj hj^γ)^ν,   χ1 + χ2 ≡ 1,
34 Alternatively, to address this issue, one can study a model where firms of different productivity choose different technologies which use the two types of labor with different intensities. We discuss this extension in the Technical Appendix.
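The following sketch implements the CES aggregator and the wage-bill shares χℓ for hypothetical input values, verifying that the shares sum to 1 and that the aggregator approaches its geometric-average (Cobb–Douglas) limit as ν → 0. The input levels hℓ and abilities āℓ are treated as given numbers rather than solved from the firm's problem.

```python
# Hypothetical illustration of the CES technology and the wage-bill shares chi_ell.
gamma = 0.5
lam1, lam2 = 0.6, 0.4
theta1, theta2 = 2.0, 1.0            # theta1 = theta (productivity), theta2 normalized to 1
a1, a2, h1, h2 = 1.3, 1.0, 5.0, 8.0  # hypothetical average abilities and employment levels

def ces_output(nu):
    t1 = lam1 * (theta1 * a1 * h1 ** gamma) ** nu
    t2 = lam2 * (theta2 * a2 * h2 ** gamma) ** nu
    return (t1 + t2) ** (1 / nu)

def chi(nu):
    t1 = lam1 * (theta1 * a1 * h1 ** gamma) ** nu
    t2 = lam2 * (theta2 * a2 * h2 ** gamma) ** nu
    return t1 / (t1 + t2), t2 / (t1 + t2)

chi1, chi2 = chi(0.5)
print(chi1, chi2, chi1 + chi2)       # the two shares sum to 1

# As nu -> 0 this CES aggregator approaches the geometric-average (Cobb-Douglas) form.
cobb_douglas = (theta1 * a1 * h1 ** gamma) ** lam1 * (theta2 * a2 * h2 ** gamma) ** lam2
print(ces_output(1e-6), cobb_douglas)
```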
As before, the aggregate wage bill is a fraction βγ/(1 + βγ) of revenue r, and now the wage bill of input is a fraction χ of the total wage bill. Using this wage structure, we show in the Technical Appendix that in this case, the solution to the firm’s problem, which is a generalization of (6), yields h1 (θ) = κh θ(ν/Λ)(1−k/δ) h2 (θ) w1 (θ) = κw θkν/(δΛ) w2 (θ) where κh and κw are constants that depend on parameters and the relative equilibrium search costs for the two groups of workers (b1 /b2 ), and Λ ≡ 1 − νγ − ν(1 − γk)/δ > 0. In words, more productive firms employ relatively more skilled workers and skilled workers are paid relatively more by more productive firms. That is, the share of group-1 workers in the total wage bill increases with firm productivity, in contrast to the constant share imposed by the Cobb– Douglas specification. As in the single input case, we show in the Technical Appendix that more productive firms pay higher wages to both types of workers and they employ more workers of each type. More productive firms also select into exporting, and there is a discontinuous upward jump in revenue and, hence, wages and employment for each group of workers at the productivity cutoff θx above which firms export. This jump in wages contributes to wage inequality in the trade equilibrium when not all firms export, although we cannot directly apply the arguments from Section 3 so as to extend Proposition 3 to this case.35 An interesting special case arises when ν = β. In this case, all nonexporters pay the same wages to the unskilled workers and employ the same number of these workers, and, similarly, all exporters pay the same wages to the unskilled and employ the same number of these workers, except that exporters pay higher unskilled wages than nonexporters. Additionally, in this case, wages and employment of skilled workers are power functions of firm productivity θ. Therefore, Proposition 3 applies. It follows that in the closed economy, wage dispersion is greater for skilled workers than unskilled workers, and the opening of the closed economy to trade increases within-group wage inequality for both skilled and unskilled workers. 6. GENERAL EQUILIBRIUM In this section, we return to our benchmark model with a single type of worker and embed the sector in general equilibrium to determine expected 35 The reason that the previous proof does not directly apply is that wages are no longer power functions of θ. The Technical Appendix provides closed-form solutions for employment and wages of the two skill groups as a function of firm productivity.
worker income (ω), prices, and aggregate income. We use this characterization of general equilibrium to examine the impact on worker welfare of opening the closed economy to trade. Individual workers in the differentiated sector experience idiosyncratic income risk as a result of the positive probability of unemployment and wage dispersion. We assume that preferences are defined over an aggregate consumption index (C ) and exhibit constant relative risk aversion (CRRA): (22)
U = E[C^{1−η}] / (1 − η),   0 ≤ η < 1,
where E is the expectation operator. Expected indirect utility is therefore
V = [1/(1 − η)] E[(w/P)^{1−η}],
where P is the price index of the aggregate consumption index C. While we initially assume that workers are risk neutral (η = 0), we discuss the implications of introducing risk aversion below (0 < η < 1). To demonstrate the robustness of our sectoral equilibrium results, we consider several approaches to embedding the sector in general equilibrium. First, we adopt a standard approach from international trade of introducing an outside good, which is homogeneous and produced without search frictions.36 This approach allows for the possibility that some sectors of the economy are best characterized by neoclassical assumptions and is particularly tractable. Expected worker income is pinned down by the wage in the outside sector as long as countries are incompletely specialized across sectors. As a result, the opening of trade leaves expected worker income unchanged as long as countries remain incompletely specialized. Second, we consider a single-sector economy where expected worker income responds to the opening of trade. While the model's predictions for sectoral wage inequality are the same for both approaches to closing the model, the impact of opening the closed economy to trade on sectoral unemployment depends on what happens to expected worker income, as discussed above. 6.1. Expenditure, Mass of Firms, and the Labor Force Before considering the above two approaches to closing the model in general equilibrium, we first provide some additional conditions for sectoral equilibrium that can be used to solve for other sectoral variables of interest.
36 While we assume no search frictions in the outside sector, Helpman and Itskhoki (2010) showed in a model without worker heterogeneity or screening that introducing search frictions in the outside sector generates an expected income ω0 that is independent of features of the differentiated sector. Augmenting the model here to incorporate search frictions in the outside sector would generate a similar result.
While
these variables have not been needed for the analysis so far, they are used below when we embed the sector in general equilibrium. Recall that the system of equilibrium conditions discussed in Section 2 allowed us to solve (θd θx A θd∗ θx∗ A∗ ) for each differentiated product sector. From utility maximization across varieties within each sector, the demand shifter (A) depends on total expenditure on the sector’s varieties (E) and the sector’s ideal price index (P) as A = E 1−β P β . Additionally, utility maximization across sectors implies that total expenditure on the sector’s varieties depends on the sectoral price indices and aggregate income Ω. Therefore, utility maximization implies that the demand shifter for each sector i is a function of sectoral price indices and aggregate income: (23)
Ai = Ãi(P, Ω),
where the function Ã(·) is derived from the structure of preferences, P is a vector of sectoral price indexes Pi, and Ω is aggregate income. Given the sectoral demand shifters for each sector determined above, equation (23) for each sector provides a system of equations that can be used to solve for the sectoral price indices as a function of aggregate income. Next consider a particular sector i, for which we now have both Ai and Pi; we now drop the subscript i. In this sector, the real consumption index (Q) follows from utility maximization across varieties, which implies (24)
Q = (A/P)^{1/(1−β)}
and yields total expenditure within the sector E = PQ. Similar relationships determine the foreign price index, real consumption index, and total expenditure within the sector (P*, Q*, E*, respectively). The mass of firms within the sector (M) can be determined from the market clearing condition that total domestic expenditure on differentiated varieties equals the sum of the revenues of domestic and foreign firms that supply varieties to the domestic market:
(25)   E = M ∫_{θd}^{∞} rd(θ) dGθ(θ) + M* ∫_{θx*}^{∞} rx*(θ) dGθ(θ).
From rd (θ) = r(θ)/Υ (θ), rx (θ) = r(θ)(Υ (θ) − 1)/Υ (θ),37 and total firm revenue (8), domestic and foreign revenue can be expressed in terms of variables that have already been determined (θd , θx , Υ (θ)). Therefore, we can solve for the mass of firms in each country (M M ∗ ) from (25) and a similar equation for foreign. 37
See footnote 15.
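Concretely, (25) and its foreign counterpart form a two-equation linear system in (M, M*) once the expenditure levels and the average revenues per firm (the integrals above) have been computed. The sketch below solves such a system for hypothetical values of these inputs; the numbers and variable names are illustrative only.

```python
import numpy as np

# Hypothetical inputs: expenditures and average revenues per firm on each market,
# i.e., the integrals in (25) and its foreign counterpart, assumed already computed.
E, E_star = 100.0, 80.0
r_dd = 2.0       # average domestic revenue of a home firm
r_xf = 0.8       # average export revenue of a foreign firm
r_dd_star = 1.8  # average domestic revenue of a foreign firm
r_xh = 0.9       # average export revenue of a home firm

# E  = M * r_dd + M* * r_xf        (equation (25))
# E* = M * r_xh + M* * r_dd_star   (foreign analogue)
A = np.array([[r_dd, r_xf],
              [r_xh, r_dd_star]])
b = np.array([E, E_star])
M, M_star = np.linalg.solve(A, b)
print(M, M_star)
```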
The mass of workers searching for employment in the sector (L) can be determined by noting that total labor payments are a constant fraction of total revenue from the solution to the bargaining game:
ωL = M ∫_{θd}^{∞} w(θ)h(θ) dGθ(θ) = [βγ/(1 + βγ)] M ∫_{θd}^{∞} r(θ) dGθ(θ),
where we have solved for the mass of firms (M) and total firm revenue (r(θ)) above, and where a similar equation determines the sectoral labor force in foreign (L*). Finally, we also require that the sectoral labor force is less than or equal to the supply of labor (L ≤ L̄), as discussed below. 6.2. Economy With an Outside Sector Having solved for the remaining components of sectoral equilibrium, we now turn to our first approach to embedding the sector in general equilibrium. The aggregate consumption index (C) is defined over consumption of a homogeneous outside good (q0) and a real consumption index of differentiated varieties (Q):
C = [ϑ^{1−ζ} q0^{ζ} + (1 − ϑ)^{1−ζ} Q^{ζ}]^{1/ζ},   0 < ζ < β,
where Q is modelled as in Section 2 and workers are assumed to be risk neutral. The parameter ϑ determines the relative weight of the homogeneous and differentiated sectors in consumer preferences.38 While for simplicity we consider a single differentiated sector, the analysis generalizes in a straightforward way to the case of multiple differentiated sectors. In the homogeneous sector, the product market is perfectly competitive and there are no labor market frictions. In this sector, one unit of labor is required to produce one unit of output and there are no trade costs. Therefore, as we choose the homogeneous good as the numeraire (p0 = 1), the wage in this sector is equal to 1 in both countries. To determine expected worker income in the differentiated sector, we use an indifference condition between sectors, which equates the expected utility of entering each sector in an equilibrium where both goods are produced. Under risk neutrality, this Harris–Todaro condition implies that expected worker income in the differentiated sector equals the certain wage of 1 in the homogeneous sector (see (11)), (26) 38
xb = ω = 1
While in the analysis here we assume that workers have CRRA-CES preferences and experience income risk, Helpman, Itskhoki, and Redding (2008a, 2008b) considered an alternative specification with quasilinear preferences and income insurance within families.
where incomplete specialization can be ensured by appropriate choice of labor endowments (L̄, L̄*), and relative preferences for the homogeneous and differentiated goods (ϑ). Positive unemployment occurs in the differentiated sector for a sufficiently large search friction α0, such that α0 > ω = 1 and, hence, 0 < x < 1 in (12). Given an expected income of 1 in each sector, each country's aggregate income is equal to its labor endowment:
(27)   Ω = L̄.
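The search technology (10) is not reproduced in this section. As a purely illustrative stand-in, the sketch below assumes a constant-elasticity form b = α0 x^{α1}, so that ω = xb = α0 x^{1+α1}; this assumed form delivers the property noted above that 0 < x < 1 whenever α0 > ω = 1, but it should not be read as the paper's specification, and all numbers are hypothetical.

```python
# Illustrative only: an assumed search technology b = alpha0 * x**alpha1 (a stand-in
# for equation (10), which is not reproduced here), combined with omega = x * b = 1.
alpha0, alpha1 = 2.0, 1.5   # hypothetical search-cost parameters, with alpha0 > omega = 1
omega = 1.0                 # Harris-Todaro condition (26): expected income equals the outside wage

x = (omega / alpha0) ** (1 / (1 + alpha1))  # implied labor market tightness
b = alpha0 * x ** alpha1                    # implied per-match search cost
print(x, b, x * b)                          # 0 < x < 1, and x * b reproduces omega
```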
To determine the price index in the differentiated sector (P), we use the functional relationship (23), which, with CES preferences between the homogeneous and differentiated sector, takes the form (28)
A^{1/(1−β)} = (1 − ϑ) P^{(β−ζ)/((1−β)(1−ζ))} Ω / [ϑ + (1 − ϑ) P^{−ζ/(1−ζ)}],
where the right-hand side is monotonically increasing in P. Therefore, this relationship uniquely pins down P given the demand shifter (A) and aggregate income (Ω). To determine general equilibrium, we use the conditions for sectoral equilibrium in Sections 2 and 6.1 (where (28) replaces (23)), and combine them with the Harris–Todaro condition (26) and aggregate income (27). Together these relationships determine the equilibrium vector (x b θd θx A Q P M L ω Ω). The model has a recursive structure and the equilibrium vector is unique as shown in the Technical Appendix. Having determined this equilibrium vector, the price index P —dual to the aggregate consumption index C — and consumption of the homogeneous good q0 follow from CES demand. Finally, equilibrium employment in the homogeneous sector follows from labor ¯ market clearing (L0 = L¯ − L, where incomplete specialization requires L < L). Having characterized general equilibrium, we are now in a position to examine the impact of the opening of trade on ex ante expected welfare. Note that differentiated sector workers receive the same expected indirect utility as workers in the homogeneous sector when both goods are produced: V=
1/P    for η = 0.
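The sketch below solves (28) for P by bisection at two hypothetical values of the demand shifter A, holding Ω fixed, and reports V = 1/P. It is meant only to illustrate how a lower value of A maps into a lower differentiated-sector price index and hence higher expected welfare; all parameter values are assumptions.

```python
# Hypothetical illustration: solve (28) for P by bisection and evaluate V = 1/P.
beta, zeta, vartheta, Omega = 0.75, 0.5, 0.5, 100.0

def rhs(P):
    # Right-hand side of (28); monotonically increasing in P.
    num = (1 - vartheta) * P ** ((beta - zeta) / ((1 - beta) * (1 - zeta))) * Omega
    den = vartheta + (1 - vartheta) * P ** (-zeta / (1 - zeta))
    return num / den

def solve_P(A, lo=1e-6, hi=1e6):
    target = A ** (1 / (1 - beta))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for A in (2.0, 1.8):   # a lower demand shifter, as after the opening of trade
    P = solve_P(A)
    print(f"A={A:.2f}  P={P:.4f}  V=1/P={1.0 / P:.4f}")
```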
Therefore, the change in expected welfare as a result of the opening of trade depends solely on the change in the aggregate price index (P ), which, with our choice of numeraire, depends solely on the change in the price index for the differentiated sector (P). These comparative statics are straightforward to determine. From the free entry condition (16), the opening of trade raises the zero-profit productivity cutoff (θd ). Using the Harris–Todaro condition (26) and labor market tightness (12), search costs (b) remain constant as long as both goods are produced, because expected worker income equals 1. Therefore, from the zero-profit cutoff condition (13), the rise in θd implies a lower
value of the demand shifter (A). Given constant aggregate income (Ω) and a lower value of A, CES demand (28) implies that the opening of trade reduces the price index for the differentiated sector (P), which implies higher expected welfare in the open economy than in the closed economy. We, therefore, can make the following statement. PROPOSITION 6: Let η = 0. Then in the two-sector economy, every worker’s ex ante welfare is higher in the open economy than in the closed economy. With constant expected worker income (ω), labor market tightness (x) is unchanged as a result of the opening of trade and, hence, the change in the sectoral unemployment rate (u) depends solely on the change in the hiring rate (σ). 6.3. Single-Sector Economy We now turn to our second approach to embedding the sector in general equilibrium. The aggregate consumption index (C ) is defined over consumption of a continuum of horizontally differentiated varieties,
C = Q where Q again takes the same form as in Section 2. All search, screening, fixed production, and fixed exporting costs are denominated in terms of the aggregate consumption index, which implies that these activities use the output of each differentiated variety in the same way as is demanded by final consumers. While for simplicity we again focus on a single differentiated sector, the analysis generalizes in a straightforward way to the case where the aggregate consumption index is defined over the consumption of many sectors, each of which contains a continuum of horizontally differentiated varieties. To determine general equilibrium, we follow the same approach as for sectoral equilibrium in Sections 2 and 6.1, where these equations now apply to the economy as a whole and we also solve for expected worker income (ω) and aggregate income (Ω). With a single sector, the analysis in Section 6.1 is slightly modified. Expenditure and the mass of firms in the two countries are determined by three equations for each country: total expenditure is the sum of expenditure on home and foreign varieties, total revenue is the sum of domestic expenditure on home varieties plus foreign expenditure on home varieties, and total expenditure equals total revenue. Labor market clearing requires that total employment in the sector equals the economy’s supply of labor, and expected worker income is determined from the requirement that total labor payments are a constant share of total revenue. Aggregate income follows immediately from expected worker income times the economy’s supply of labor. Finally, we choose the aggregate
consumption index in one country as the numeraire (P = 1) and solve for the dual price index in the other country (P*) using the sectoral demand shifter (A*), total expenditure (E*), and CES demand (A* = (E*)^{1−β}(P*)^{β}). In the case of symmetric countries, the single-sector model can be solved in closed form, as shown in the Technical Appendix. The closed-form solution for expected worker income is
ω = α0^{γβ/((1−β)Δ)} c^{β(1−γk)(1+α1)/(δ(1−β)Δ)} θd^{−β(1+α1)/((1−β)Δ)} κb^{(1+α1)/Δ} L̄^{−(1+α1)/Δ},
Δ ≡ (1 + α1) − βγα1/(1 − β) < 0,
where κb is defined in the Technical Appendix.39 As the opening of trade increases the zero-profit cutoff productivity below which firms exit (θd), it increases expected worker income and, hence, ex ante welfare.
PROPOSITION 7: Let η = 0. Then in the one-sector economy the opening of trade (i) increases expected worker income (ω) and, hence, ex ante welfare, and (ii) increases labor market tightness (x) and search costs (b).
While the predictions of the model without the outside sector for wage inequality are the same as those of the model with the outside sector, there is a new general equilibrium effect for unemployment through the tightness of the labor market (x). 6.4. Risk Aversion Income risk in the differentiated sector arises from random search, which implies a positive probability of unemployment and uncertainty over a worker's employer, and, hence, uncertainty over the wage received in employment. In this section, we introduce risk aversion by considering the case where utility is concave in the aggregate consumption index (0 < η < 1). In the model with an outside sector, this implies that workers will require a risk premium to enter the differentiated sector rather than receiving a certain wage of 1 in the outside sector. To determine general equilibrium in the model with the outside sector, we follow a similar approach as in Section 6.2 above.40
39 The stability of the equilibrium requires [βγ/(1 − β)][α1/(1 + α1)] > 1, which is satisfied for sufficiently convex search costs (sufficiently high α1) and sufficiently high elasticities of substitution between varieties (β sufficiently close to but less than 1). For parameter values satisfying this inequality, the equilibrium vector is again unique.
40 Introducing risk aversion in the model with a single differentiated sector has little effect, because there is no riskless activity to or from which resources can move.
Risk aversion changes the
equilibrium share of revenue received by workers in the bargaining game, but does not otherwise affect the determination of sectoral equilibrium within the differentiated sector.41 Additionally, risk aversion modifies the Harris–Todaro condition for indifference between sectors, which equates expected utility in the differentiated sector to the certain wage of 1 in the outside sector:
xσE[w^{1−η}] = xσ ∫_{wd}^{∞} w^{1−η} dGw(w) = 1,
where expected utility in the differentiated sector equals the probability of being matched (x) times the probability of being hired conditional on being matched (σ) times expected utility conditional on being hired.42 To determine expected worker income in the differentiated sector (ω = xb), we use the modified Harris–Todaro condition above and the search technology (10), which together imply (29)
ω = α0^{η/(1+(1−η)α1)} [(1 + μη) φw^{η}]^{(α1+1)/(1+(1−η)α1)} Λ(ρ, Υx)^{−(1+α1)/(1+(1−η)α1)},
where Λ(ρ Υx ) and φw are defined in the Technical Appendix, and satisfies Λ(0 Υx ) = 1 and 0 < Λ(ρ Υx ) < 1 for 0 < ρ ≤ 1. A sufficiently large search friction (α0 ) ensures positive unemployment (0 < x < 1 in (12)) and a positive risk premium in the differentiated sector (ω − 1 > 0). As shown in the analysis of sectoral equilibrium in Sections 2–4 above, the opening of trade increases sectoral wage inequality and unemployment for a given value of ω. This increase in wage inequality and unemployment enhances income risk in the differentiated sector, which implies that risk averse workers require a higher risk premium to enter the differentiated sector. This “risk effect” raises expected worker income (ω) following the opening of trade (since Λ(0 Υx ) = 1 and 0 < Λ(ρ Υx ) < 1 for 0 < ρ ≤ 1 in (29)). This increase in expected worker income in turn increases labor market tightness (x) and search costs (b). PROPOSITION 8: Let 0 < η < 1. Then in the two-sector economy, the opening of trade (i) increases expected worker income (ω), and (ii) increases labor market tightness (x) and search costs (b). 41 In the Technical Appendix, we derive the solution to the Stole and Zwiebel (1996a, 1996b) bargaining game when workers are risk averse. We show that with CRRA-CES preferences, the solution takes a similar form as when there are differences in bargaining weight between the firm and its workers. 42 The terms in the price index (P ) and 1/(1 − η) cancel from the Harris–Todaro condition equating expected utility in the two sectors.
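The sketch below evaluates (29) for hypothetical parameter values, treating Λ(ρ, Υx) and φw as given numbers since both are defined only in the Technical Appendix. It illustrates that moving Λ below its autarky value of 1 raises ω, in line with the risk effect described above.

```python
# Hypothetical evaluation of (29): Lambda and phi_w are treated as given inputs here
# (both are defined in the Technical Appendix and not reproduced in this section).
alpha0, alpha1 = 2.0, 1.5
eta, mu = 0.3, 0.4      # risk aversion and the wage-inequality statistic (hypothetical)
phi_w = 1.2             # placeholder value

def omega(Lambda):
    e = 1 + (1 - eta) * alpha1
    return (alpha0 ** (eta / e)
            * ((1 + mu * eta) * phi_w ** eta) ** ((alpha1 + 1) / e)
            * Lambda ** (-(1 + alpha1) / e))

print(omega(1.0))   # autarky: Lambda(0, Upsilon_x) = 1
print(omega(0.9))   # open economy: 0 < Lambda < 1 raises omega
```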
While the predictions of the model with risk aversion for wage inequality are the same as those of the model with risk neutrality, there is again a new general equilibrium effect for unemployment through the tightness of the labor market (x). 7. CONCLUSION The relationship between international trade and earnings inequality is one of the most hotly debated issues in economics. Traditionally, research has approached this topic from the perspective of neoclassical trade theory with its emphasis on specialization across industries and changes in the relative rewards of skilled and unskilled labor. In this paper, we propose a new framework that emphasizes firm heterogeneity, Diamond–Mortensen– Pissarides search and matching frictions, and ex post heterogeneity in worker ability. In this framework, the participation of some, but not all, firms in international trade provides a new mechanism for trade to affect wage inequality. We derive two results that hold in a class of models for which firm wages and employment are power functions of firm productivity, exporting increases the wage paid by a firm with a given productivity, and firm productivity is Pareto distributed. We show that the opening of the closed economy to trade raises sectoral wage inequality for any measure of inequality that respects secondorder stochastic dominance, because the opening of trade increases the dispersion of firm revenue, which in turn increases the dispersion of firm wages. Once the economy is open to trade, wage inequality is at first increasing and later decreasing in trade openness. Therefore, a given change in trade openness can either raise or reduce wage inequality, depending on the initial level of trade openness. The relationship between firm wages and revenue in our framework is derived from search and matching frictions and heterogeneity in ex post worker ability. Larger firms screen workers more intensively and have workforces of higher average ability, who are more costly to replace in the bargaining game and are, therefore, paid higher wages. As the opening of the closed economy to trade reallocates resources toward more productive firms that screen more intensively, it also affects unemployment. While the fraction of matched workers that are hired necessarily falls, the fraction of workers searching for employment that are matched can either remain unchanged or rise. Therefore, in contrast to the model’s unambiguous predictions for wage inequality, its predictions for unemployment are more nuanced, and measures of wage and income inequality can yield quite different pictures of the impact of trade liberalization. While for most of our analysis we concentrate on within-group wage inequality among ex ante identical workers, we show that our results on within-group wage inequality are robust to introducing multiple worker types with different
ex ante observable characteristics. While trade raises wage inequality within every group of workers, it may increase or reduce wage inequality between different groups of workers. But even when between-group wage inequality falls, the rise of within-group inequality can dominate so that the opening of trade raises overall wage inequality. Our model thus provides a unified framework for analyzing the complex interplay between wage inequality, unemployment and income inequality, and their relation to international trade. In popular discussion, the effects of international trade on income distribution are sometimes viewed as largely transitory, because it takes time for resources to be reallocated. In contrast, our framework identifies a systematic mechanism through which the opening of trade can raise equilibrium wage inequality across workers within sectors. This mechanism is founded in the new view of foreign trade that emphasizes firm heterogeneity in differentiated product markets and is consistent with observed features of firm and worker data. The tractability and flexibility of our framework lend themselves to a variety of further applications. REFERENCES ABOWD, J. M., R. H. CREECY, AND F. KRAMARZ (2002): “Computing Person and Firm Effects Using Linked Longitudinal Employer–Employee Data,” Mimeograph, Cornell University. [1255] ABOWD, J. M., F. KRAMARZ, AND D. N. MARGOLIS (1999): “High Wage Workers and High Wage Firms,” Econometrica, 67, 251–333. [1255] ACEMOGLU, D. (1999): “Changes in Unemployment and Wage Inequality; An Alternative Theory and Some Evidence,” American Economic Review, 89, 1259–1278. [1243,1244,1252] ACEMOGLU, D., AND R. SHIMER (1999): “Efficient Unemployment Insurance,” Journal of Political Economy, 107, 893–928. [1244,1252] ACEMOGLU, D., P. ANTRÀS, AND E. HELPMAN (2007): “Contracts and Technology Adoption,” American Economic Review, 97, 916–943. [1249] ALTONJI, J. G., AND C. R. PIERRET (2001): “Employer Learning and Statistical Discrimination,” Quarterly Journal of Economics, February, 313–350. [1246] AMITI, M., AND D. R. DAVIS (2008): “Trade, Firms, and Wages: Theory and Evidence,” Working Paper 14106, NBER. [1240,1243] ATTANASIO, O., P. GOLDBERG, AND N. PAVCNIK (2004): “Trade Reforms and Wage Inequality in Colombia,” Journal of Development Economics, 74, 331–366. [1241] AUTOR, D. H., AND D. SCARBOROUGH (2008): “Does Job Testing Harm Minority Workers? Evidence From Retail Establishments,” Quarterly Journal of Economics, 123, 219–277. [1247] AXTELL, R. L. (2001): “Zipf Distribution of U.S. Firm Sizes,” Science, 293, 1818–1820. [1240, 1245] BARRON, J. M., J. BISHOP, AND W. C. DUNKELBERG (1985): “Employer Search: The Interviewing and Hiring of New Employees,” Review of Economics and Statistics, 67, 43–52. [1247] BARRON, J. M., D. BLACK, AND M. LOEWENSTEIN (1987): “Employer Size: The Implications for Search, Training, Capital Investment, Starting Wages, and Wage Growth,” Journal of Labor Economics, 5, 76–89. [1247,1256] BERNARD, A. B., AND J. B. JENSEN (1995): “Exporters, Jobs, and Wages in U.S. Manufacturing: 1976–1987,” Brookings Papers on Economic Activity: Microeconomics, 1995, 67–112. [1250,1256] (1997): “Exporters, Skill Upgrading and the Wage Gap,” Journal of International Economics, 42, 3–31. [1256]
BLANCHARD, O., AND J. GALÍ (2010): “Labor Markets and Monetary Policy: A New-Keynesian Model With Unemployment,” American Economic Journal: Macroeconomics, 2, 1–30. [1252] BURDETT, K., AND D. MORTENSEN (1998): “Wage Differentials, Employer Size and Unemployment,” International Economic Review, 39, 257–273. [1243] BURSTEIN, A., AND J. VOGEL (2009): “Globalization, Technology and the Skill Premium,” Mimeograph, Columbia University. [1243] BUSTOS, P. (2009): “Trade Liberalization, Exports and Technology Upgrading: Evidence on the Impact of MERCOSUR on Argentinean Firms,” Mimeo, CREI. [1243] CAHUC, P., F. POSTEL -VINAY, AND J.-M. ROBIN (2006): “Wage Bargaining With On-the-Job Search: Theory and Evidence,” Econometrica, 74, 323–364. [1243] COSTINOT, A., AND J. VOGEL (2009): “Matching and Inequality in the World Economy,” Mimeograph, MIT. [1243] DAVIDSON, C., AND S. MATUSZ (2009): International Trade With Equilibrium Unemployment. Princeton, NJ: Princeton University Press. [1242,1265] DAVIDSON, C., L. MARTIN, AND S. MATUSZ (1988): “The Structure of Simple General Equilibrium Models With Frictional Unemployment,” Journal of Political Economy, 96, 1267–1293. [1243] (1999): “Trade and Search Generated Unemployment,” Journal of International Economics, 48, 271–299. [1243] DAVIDSON, C., S. MATUSZ, AND A. SHEVCHENKO (2008): “Globalization and Firm-Level Adjustment With Imperfect Labor Markets,” Journal of International Economics, 75, 295–309. [1244] DAVIS, D., AND J. HARRIGAN (2007): “Good Jobs, Bad Jobs, and Trade Liberalization,” Working Paper 13139, NBER. [1243] DAVIS, S. J., AND J. HALTIWANGER (1991): “Wage Dispersion Between and Within U.S. Manufacturing Plants, 1963–1986,” Brookings Papers on Economic Activity: Microeconomics, 1991, 115–200. [1259] DE MELO, R. L. (2008): “Sorting in the Labor Market: Theory and Measurement,” Mimeograph, Yale University. [1255] DIAMOND, P. A. (1982a): “Demand Management in Search Equilibrium,” Journal of Political Economy, 90, 881–894. [1243] (1982b): “Wage Determination and Efficiency in Search Equilibrium,” Review of Economic Studies, 49, 217–227. [1243] EGGER, H., AND U. KREICKEMEIER (2009a): “Firm Heterogeneity and the Labour Market Effects of Trade Liberalization,” International Economic Review, 50, 187–216. [1240,1243] (2009b): “Fairness, Trade, and Inequality,” Mimeograph, University of Nottingham. [1240,1243] FAGGIO, G., K. G. SALVANES, AND J. VAN REENEN (2007): “The Evolution of Inequality in Productivity and Wages: Panel Data Evidence,” Discussion Paper 821, CEPR. [1259] FEENSTRA, R. C., AND G. H. HANSON (1996): “Foreign Investment, Outsourcing and Relative Wages,” in The Political Economy of Trade Policy, ed. by R. C. Feenstra, G. M. Grossman, and D. A. Irwin. Cambridge, MA: MIT Press, Chapter 6. [1262] FELBERMAYR, G., M. LARCH, AND W. LECHTHALER (2009): “Unemployment in an Interdependent World,” Mimeograph, University of Stuttgart-Hohenheim. [1242,1243,1265] FELBERMAYR, G., J. PRAT, AND H.-J. SCHMERER (2008): “Globalization and Labor Market Outcomes: Wage Bargaining, Search Frictions and Firm Heterogeneity,” Discussion Paper 3363, IZA. [1242,1243,1265] FRÍAS, J., D. KAPLAN, AND E. VERHOOGEN (2009): “Exports and Wage Premia: Evidence From Mexican Employer–Employee Data,” Mimeograph, Columbia University. [1256] GARICANO, L. (2000): “Hierarchies and the Organization of Knowledge in Production,” Journal of Political Economy, 108, 874–904. [1246] GOLDBERG, P., AND N. 
PAVCNIK (2007): “Distributional Effects of Globalization in Developing Countries,” Journal of Economic Literature, 45, 39–82. [1241,1262]
HELPMAN, E., AND O. ITSKHOKI (2009): “Labor Market Rigidity, Trade and Unemployment: A Dynamic Model,” Mimeograph, Harvard University. [1242] (2010): “Labor Market Rigidities, Trade and Unemployment,” Review of Economic Studies, 77, 1100–1137. [1239,1242,1243,1248,1265,1272] HELPMAN, E., O. ITSKHOKI, AND S. J. REDDING (2008a): “Wages, Unemployment and Inequality With Heterogeneous Firms and Workers,” Working Paper 14122, NBER. [1239,1255,1274] (2008b): “Inequality and Unemployment in a Global Economy,” Working Paper 14478, NBER. [1239,1260,1274] (2010): “Supplement to ‘Inequality and Unemployment in a Global Economy’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/ 8640_extensions.pdf. [1244] LENTZ, R. (2010): “Sorting by Search Intensity,” Journal of Economic Theory (forthcoming). [1243] LUCAS JR., R. E. (1978): “On the Size Distribution of Business Firms,” Bell Journal of Economics, 9, 508–523. [1246] MELITZ, M. J. (2003): “The Impact of Trade on Intra-Industry Reallocations and Aggregate Industry Productivity,” Econometrica, 71, 1695–1725. [1239,1245,1255,1256] MENEZES-FILHO, N. A., M.-A. MUENDLER, AND G. RAMEY (2008): “The Structure of Worker Compensation in Brazil, With a Comparison to France and the United States,” Review of Economics and Statistics, 90, 324–346. [1241] MORETTI, E. (2004): “Workers’ Education, Spillovers and Productivity: Evidence From PlantLevel Production Functions,” American Economic Review, 94, 656–690. [1246] MORTENSEN, D. T. (1970): “A Theory of Wage and Employment Dynamics,” in The Microeconomic Foundations of Employment and Inflation Theory, ed. by E. S. Phelps et al. New York: Norton. [1243] MORTENSEN, D. T., AND C. A. PISSARIDES (1994): “Job Creation and Job Destruction in the Theory of Unemployment,” Review of Economic Studies, 61, 397–415. [1243] MUNCH, J. R., AND J. R. SKAKSEN (2008): “Human Capital and Wages in Exporting Firms,” Journal of International Economics, 75, 363–372. [1256] OHNSORGE, F., AND D. TREFLER (2007): “Sorting It Out: International Trade With Heterogeneous Workers,” Journal of Political Economy, 115, 868–892. [1243] OI, W. Y., AND T. L. IDSON (1999): “Firm Size and Wages,” in Handbook of Labor Economics, Vol. 3, ed. by O. Ashenfelter and D. Card. Amsterdam: Elsevier, Chapter 33, 2165–2214. [1255] PELLIZZARI, M. (2005): “Employers’ Search and the Efficiency of Matching,” Discussion Paper 1862, IZA. [1247,1256] PISSARIDES, C. A. (1974): “Risk, Job Search, and Income Distribution,” Journal of Political Economy, 82, 1255–1268. [1243] POSTEL -VINAY, F., AND J.-M. ROBIN (2002): “Equilibrium Wage Dispersion With Worker and Employer Heterogeneity,” Econometrica, 70, 2295–2350. [1243] ROBERTS, M. J., AND J. TYBOUT (1997): “The Decision to Export in Colombia: An Empirical Model of Entry With Sunk Costs,” American Economic Review, 87, 545–564. [1250] ROSEN, S. (1982): “Authority, Control and the Distribution of Earnings,” Bell Journal of Economics, 13, 311–323. [1246] SAEZ, E. (2001): “Using Elasticities to Derive Optimal Income Tax Rates,” Review of Economic Studies, 68, 205–229. [1240] SCHANK, T., C. SCHANBEL, AND J. WAGNER (2007): “Do Exporters Really Pay Higher Wages? First Evidence From German Linked Employer–Employee Data,” Journal of International Economics, 72, 52–74. [1256] SHIMER, R., AND L. SMITH (2000): “Assortative Matching and Search,” Econometrica, 68, 343–369. [1243] STOLE, L. A., AND J. 
ZWIEBEL (1996a): “Organizational Design and Technology Choice Under Intrafirm Bargaining,” American Economic Review, 86, 195–222. [1249,1278]
(1996b): “Intra-Firm Bargaining Under Non-Binding Contracts,” Review of Economic Studies, 63, 375–410. [1249,1278] VAN REENEN, J. (1996): “The Creation and Capture of Economic Rents: Wages and Innovation in a Panel of UK Companies,” Quarterly Journal of Economics, 111, 195–226. [1255] VERHOOGEN, E. (2008): “Trade, Quality Upgrading and Wage Inequality in the Mexican Manufacturing Sector,” Quarterly Journal of Economics, 123, 489–530. [1242,1264] YEAPLE, S. R. (2005): “A Simple Model of Firm Heterogeneity, International Trade, and Wages,” Journal of International Economics, 65, 1–20. [1243]
Dept. of Economics, Harvard University, 217 Littauer Center, Cambridge, MA 02138, U.S.A. and CIFAR; [email protected], Dept. of Economics, Princeton University, Fisher Hall 306, Princeton, NJ 08544-1021, U.S.A.; [email protected], and Dept. of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, United Kingdom; [email protected]. Manuscript received June, 2009; final revision received February, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1285–1339
A UNIQUE COSTLY CONTEMPLATION REPRESENTATION BY HALUK ERGIN AND TODD SARVER1 We study preferences over menus which can be represented as if the individual is uncertain of her tastes, but is able to engage in costly contemplation before selecting an alternative from a menu. Since contemplation is costly, our key axiom, aversion to contingent planning, reflects the individual’s preference to learn the menu from which she will be choosing prior to engaging in contemplation about her tastes for the alternatives. Our representation models contemplation strategies as subjective signals over a subjective state space. The subjectivity of the state space and the information structure in our representation makes it difficult to identify them from the preference. To overcome this issue, we show that each signal can be modeled in reduced form as a measure over ex post utility functions without reference to a state space. We show that in this reduced-form representation, the set of measures and their costs are uniquely identified. Finally, we provide a measure of comparative contemplation costs and characterize the special case of our representation where contemplation is costless. KEYWORDS: Costly contemplation, aversion to contingent planning, subjective state space.
1. INTRODUCTION IN MANY PROBLEMS OF INDIVIDUAL CHOICE, the decision-maker faces some uncertainty about her preferences over the available alternatives. In many cases, she may be able to improve her decision by first engaging in some form of introspection or contemplation about her preferences. However, if this contemplation is psychologically costly for the individual, then she will not wish to engage in any unnecessary contemplation. This will lead a rational individual to exhibit what we will refer to as an aversion to contingent planning. To illustrate, consider a simple example. We will take an individual to one of two restaurants. The first one is a seafood restaurant that serves a tuna (t) and a salmon (s) dish, which we denote by A = {t s}. The second one is a steak restaurant that serves a filet mignon (f ) and a ribeye (r) dish, which we denote by B = {f r}. We will flip a coin to determine to which restaurant to go. If it comes up heads, then we will buy the individual the meal of her choice in A, and if it comes up tails, then we will buy her the meal of her choice in B. We consider presenting the individual with one of the two following decision problems: DECISION PROBLEM 1: We ask the individual to make a complete contingent plan listing what she would choose conditional on each outcome of the coin flip. 1 We thank Eddie Dekel, Faruk Gul, Bart Lipman, Massimo Marinacci, numerous seminar participants, the editor, and the three anonymous referees for helpful comments and suggestions.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7801
1286
H. ERGIN AND T. SARVER
DECISION PROBLEM 2: We first flip the coin and let the individual know its outcome. She then selects the dish of her choice from the restaurant determined by the coin flip. It is conceivable that the individual prefers facing the second decision problem rather than the first one. In this case, we say that her preferences (over decision problems) exhibit an aversion to contingent planning (ACP). Our explanation of ACP is that the individual finds it psychologically costly to figure out her tastes over meals. Because of this cost, she would rather not contemplate an inconsequential decision: She would rather not contemplate about her choice out of A were she to know that the coin came up tails and her actual choice set is B. In particular, she prefers to learn which choice set (A or B) is relevant before contemplating her choice. Our main results are a representation and a uniqueness theorem for preferences over sets of lotteries. We interpret the preferences as arising from a choice situation where the individual initially chooses from among sets (or menus) of lotteries and subsequently chooses a lottery from that set. The only primitive of the model is the preference over sets of lotteries, which corresponds to the individual’s choice behavior in the first period; we do not explicitly model the second-period choice out of the sets. The key axioms in our analysis are aversion to contingent planning (ACP) and independence of degenerate decisions (IDD). These axioms allow for costly contemplation, but impose enough structure to rule out the possibility that the individual’s beliefs themselves are changing. Before stating the ACP axiom formally, note that in our restaurant example, Decision Problem 1 corresponds to a choice out of A × B = {(t f ) (t r) (s f ) (s r)}, where, for instance, (s f ) is the plan where the individual indicates that she will have the salmon dish from the seafood restaurant if the coin comes up heads and she will have the filet mignon from the steak restaurant if the coin comes up tails. Also, note that each choice of a contingent plan eventually yields a lottery over meals. For example, if the individual chooses (s f ), then she will face the lottery 12 s + 12 f that yields either salmon or filet mignon, each with one-half probability. Hence, Decision Problem 1 is identical to a choice out of the set of lotteries 12 A + 12 B = { 12 t + 12 f 12 t + 12 r 12 s + 12 f 12 s + 12 r}. In general, we can represent the set of contingent plans between any two menus as a convex combination of these menus, with the weight on each menu corresponding to the probability that it will be the relevant menu. The individual’s preference of Decision Problem 2 to Decision Problem 1 is thus equivalent to preferring the half–half lottery over A and B (resolving prior to her choice from the menus) to the convex combination of the two menus, 12 A + 12 B. Although we do not analyze preferences over lotteries over menus explicitly, it is intuitive that the individual would prefer the better menu, say A, to any lottery over the two menus. Under this assumption, aversion to contingent planning implies that the individual will prefer choosing from the better of the two menus to making a contingent plan from the two menus. Our ACP axiom is
CONTEMPLATION REPRESENTATION
1287
precisely the formalization of this statement: If A B, then A αA+(1−α)B for any α ∈ [0 1]. To motivate our IDD axiom, consider the situation in which the individual makes a contingent choice from a menu A and with probability α her contingent choice is carried out; with probability 1 − α she is instead given a fixed lottery p. As argued above, this choice problem corresponds to the menu αA + (1 − α){p}. If the probability α that her contingent choice from A will be implemented decreases, then her benefit from contemplation decreases. However, if α is held fixed, then replacing the lottery p in the convex combination αA + (1 − α){p} with another lottery q does not change the probability that the individual’s contingent choice from A will be implemented. Therefore, although replacing p with q could affect the individual’s utility through its effect on the final composition of lotteries, it will not affect the individual’s optimal level of contemplation. This observation motivates our IDD axiom, which states that for any fixed α, if αA + (1 − α){p} is preferred to αB + (1 − α){p}, then αA + (1 − α){q} is also preferred to αB + (1 − α){q}. We present our model in detail in Section 2. Section 2.1 contains a detailed description of our axioms. Along with ACP and IDD, we consider three standard axioms in the setting of preferences over menus: (i) weak order, which states that the preference is complete and transitive, (ii) continuity, and (iii) monotonicity, which states that adding alternatives to any menu is always (weakly) better for the individual. Our representation theorem is contained in Section 2.2. Letting p denote a lottery over some set of alternatives Z and letting A denote a menu of such lotteries, Theorem 1 shows that any preference over menus satisfying our axioms can be represented by the costly contemplation (CC) representation V (A) = max E max E[U|G ] · p − c(G ) (1) G ∈G
p∈A
We interpret Equation (1) as follows. The individual is uncertain regarding her tastes over alternatives in Z. This uncertainty is modeled by a probability space (Ω F P) and a state-dependent expected-utility function U over (Z). Before making a choice out of a menu A, the individual is able to engage in contemplation so as to resolve some of this uncertainty. Contemplation strategies are modeled as a collection of signals about the state or, more compactly, as a collection G of σ-algebras generated by these signals. If the individual carries out the contemplation strategy G ∈ G, she is able to update her expectedutility function using her information G and chose a lottery p in A maximizing her conditional expected utility E[U|G ] · p = z∈Z pz E[Uz |G ]. Faced with the menu A, the individual chooses her contemplation strategy optimally by maximizing the ex ante value E[maxp∈A E[U|G ] · p] minus the cost c(G ) of contemplation, giving Equation (1). Note that this representation closely resembles a standard costly information acquisition problem. The difference is that the parameters ((Ω F P) G U c) of the CC representation are subjective in the
1288
H. ERGIN AND T. SARVER
sense that they are not directly observable, but instead must be elicited from the individual’s preferences. In Section 3, we discuss the extent to which we are able identify contemplation strategies and their costs from the preference. Due to the subjectivity of the state space and information structure in the CC representation, it is not possible to pin down the parameters of the representation from the preference. After providing an example to illustrate the nonuniqueness of the CC representation, we show that contemplation strategies can be uniquely identified when they are put into a reduced form using measures over ex post utility functions. To motivate this reduced form, suppose the individual selects a signal (i.e., contemplation strategy), the realization of which gives her some information about her tastes for the different alternatives in a menu. The information contained in a realization of the signal results in some ex post utility function, and hence the distribution of the signal translates into a distribution over ex post utility functions. Following this approach of transforming contemplation strategies into measures, Theorem 2 shows that any CC representation is equivalent to the reduced-form costly contemplation (RFCC) representation2 (2) V (A) = max max u(p)μ(du) − c(μ) μ∈M
U p∈A
The set U is a collection of ex post expected-utility functions, and M is a set of measures on U .3 Each measure μ ∈ M determines a particular weighting of the ex post utility functions. It is important to note that we do not require the measures in M to be probability measures. Although such a requirement seems natural given our motivation for the representation, to identify the parameters in the RFCC representation, we will impose a normalization on the utility functions in U . Under this normalization, the measures in M are used to capture both the likelihood of an ex post utility function and the “magnitude” of that utility function, which requires the use of measures that are not probabilities. When costly contemplation is modeled in this reduced form, parameters can be uniquely identified from the preference. Theorem 4 establishes the 2 This representation bears some similarity to the representation for “variational preferences” considered by Maccheroni, Marinacci, and Rustichini (2006) in the Anscombe–Aumann setting. There is also a technical connection between the two representations since we apply similar results from convex analysis to establish our representation theorems, although the setting of our model requires us to develop a stronger version of these results in Section S.1 of the Supplemental Material (Ergin and Sarver (2010)). 3 The RFCC representation also imposes the following consistency
condition on the measures in M: For every μ ν ∈ M and every lottery p, U u(p)μ(du) = U u(p)ν(du). This condition implies that even though the individual’s tastes after contemplation can be very different from her tastes before contemplation, the contemplation process should not affect the individual’s tendencies on average.
CONTEMPLATION REPRESENTATION
1289
uniqueness of the set of measures M and cost function c in an RFCC representation. The uniqueness of the parameters in the RFCC representation makes it possible to conduct meaningful comparisons of contemplation costs between two representations. In Section 4, we introduce a measure of comparative contemplation costs and show the implications for our RFCC representation. In Section 5.1, we introduce a variation of our model in which the individual has limited resources to devote to contemplation. That is, the cost of contemplation does not directly affect the utility of the individual, but instead enters indirectly by being constrained to be below some fixed upper bound. We show that such a representation is in fact a special case of our model, and we introduce the additional axiom needed to obtain this representation for limited contemplation resources. Our work relates to several other papers in the literature on preferences over menus. This literature originated with Kreps (1979), who considered preferences over menus taken from a finite set of alternatives. Dekel, Lipman, and Rustichini (2001) (henceforth DLR) extended Kreps’ analysis to the current setting of preferences over menus of lotteries and used the additional structure of this domain to obtain an essentially unique representation. In Section 5.2, we discuss a version of the independence axiom for preferences over menus of lotteries which was used by DLR in one of their representation results. We illustrate how our axioms relax the independence axiom and why such a relaxation of independence is necessary to model costly contemplation. A model of costly contemplation was also considered by Ergin (2003), whose primitive was the same as that of Kreps (1979)—a preference over menus taken from a finite set of alternatives. The costly contemplation representation in Equation (1) is similar to the functional form of his representation. However, the parameters in Ergin’s (2003) representation are not pinned down by the preference. The richer domain of our preferences, menus of lotteries, combined with the reduced form of our RFCC representation enables us to uniquely identify the parameters of our representation. Moreover, our richer domain yields additional behavioral implications of costly contemplation, such as ACP and IDD. We conclude in Section 6 with a brief overview of the so-called infinite-regress issue for models of costly decision-making. We discuss how our model relates to the issue and explain how our representation result provides an as if solution to the issue. Unless otherwise indicated, all proofs are contained in the Appendix. 2. A MODEL OF COSTLY CONTEMPLATION Let Z be a finite set of alternatives, and let (Z) denote the set of all probability distributions on Z, endowed with the Euclidean metric d.4 Let A denote 4
Since Z is finite, the topology generated by d is equivalent to the weak* topology on (Z).
1290
H. ERGIN AND T. SARVER
the set of all closed subsets of (Z), endowed with the Hausdorff metric, which is defined by dh (A B) = max max min d(p q) max min d(p q) p∈A q∈B
q∈B p∈A
Elements of A are called menus or option sets. The primitive of our model is a binary relation on A, representing the individual’s preferences over menus. We maintain the interpretation that after committing to a particular menu A, the individual chooses a lottery out of A in an unmodeled second stage. For any A B ∈ A and α ∈ [0 1], define the convex combination of these two menus by αA + (1 − α)B ≡ {αp + (1 − α)q : p ∈ A and q ∈ B}. Let co(A) denote the convex hull of the set A. 2.1. Axioms We impose the following order and continuity axioms. AXIOM 1—Weak Order: is complete and transitive. AXIOM 2—Strong Continuity: (i) Continuity. For all A ∈ A, the sets {B ∈ A : B A} and {B ∈ A : B A} are closed. (ii) L-continuity. There exist p∗ p∗ ∈ (Z) and M > 0 such that for every A B ∈ A and α ∈ (0 1) with α > Mdh (A B), (1 − α)A + α{p∗ } (1 − α)B + α{p∗ } The weak order axiom is entirely standard, as is the first part of the strong continuity axiom. The added assumption of L-continuity is used to obtain Lipschitz continuity of our representation in much the same way that the continuity axiom is used to obtain continuity.5 To interpret L-continuity, first note that {p∗ } {p∗ }.6 For any A B ∈ A, continuity therefore implies that there exists α ∈ (0 1) such that (1 − α)A + α{p∗ } (1 − α)B + α{p∗ }. L-continuity implies that such a preference holds for any α > Mdh (A B), so as A and B get closer, the minimum required weight on p∗ and p∗ converges to 0 at a smooth rate. The constant M can be thought of as the sensitivity of this minimum α to the distance between A and B. The next axiom captures an important aspect of our model of costly contemplation: 5 Similar L-continuity axioms are used in Dekel, Lipman, Rustichini, and Sarver (2007) (henceforth DLRS) and Sarver (2008). There is also a connection between our L-continuity axiom and the properness condition proposed by Mas-Colell (1986). 6 Let α = 12 . Applying L-continuity with A = B = {p∗ } implies {p∗ } { 12 p∗ + 12 p∗ }, and applying L-continuity with A = B = {p∗ } implies { 12 p∗ + 12 p∗ } {p∗ }.
CONTEMPLATION REPRESENTATION
1291
AXIOM 3—Aversion to Contingent Planning: For any α ∈ [0 1], AB
⇒
A αA + (1 − α)B
To interpret ACP, suppose we were to extend the individual’s preferences to lotteries over menus. Let α ◦ A ⊕ (1 − α) ◦ B denote the lottery that yields the menu A with probability α and the menu B with probability 1 − α. We interpret this lottery as resolving prior to the individual making her choice of alternative from the menus. If instead the individual is asked to make her decision prior to the resolution of the lottery, then she must make a contingent choice. The situation in which the individual makes a contingent choice, p if A and q if B, prior to the resolution of the lottery over menus is equivalent to choosing the alternative αp + (1 − α)q ∈ αA + (1 − α)B. Thus, any contingent choice from A and B corresponds to a unique lottery in αA + (1 − α)B. As discussed in the Introduction, if contemplation is costly for the individual, then she will prefer that a lottery over menus is resolved prior to her choosing an alternative so that she can avoid contingent planning. Hence, (3)
α ◦ A ⊕ (1 − α) ◦ B αA + (1 − α)B
If in addition this extended preference satisfies stochastic dominance, then A B implies A α ◦ A ⊕ (1 − α) ◦ B. Together with Equation (3), this implies ACP.7 The following axiom allows for the possibility that the individual contemplates to obtain information about her ex post utility, but it rules out the possibility that she changes her beliefs by becoming more optimistic about the utility she will obtain from a given lottery. AXIOM 4 —Independence of Degenerate Decisions: For any A B ∈ A, p q ∈ (Z), and α ∈ [0 1], αA + (1 − α){p} αB + (1 − α){p} ⇒
αA + (1 − α){q} αB + (1 − α){q}
Suppose the individual is asked to make a contingent plan, and she is told that she will be choosing from the menu A with probability α and from the menu {p} with probability 1 − α. We refer to a choice from the singleton menu {p} as a degenerate decision. When faced with a degenerate decision, there is no benefit or loss to the individual from contemplating. Therefore, if the probability α that her contingent choice from A will be implemented decreases, then her benefit from contemplation decreases. Hence, we should expect the 7 All the results in the text continue to hold if one replaces ACP with the following weaker condition: A ∼ B ⇒ A αA + (1 − α)B ∀α ∈ [0 1].
1292
H. ERGIN AND T. SARVER
individual to choose a less costly level of contemplation as α decreases. However, if α is held fixed, then replacing the degenerate decision {p} in the convex combination αA + (1 − α){p} with another degenerate decision {q} does not change the probability that the individual’s contingent choice from A will be implemented. Therefore, although replacing p with q could affect the individual’s utility through its effect on the final composition of lotteries, it will not affect the individual’s optimal level of contemplation.8 Our last axiom is a monotonicity axiom introduced by Kreps (1979). AXIOM 5—Monotonicity: If A ⊂ B, then B A. If additional alternatives are added to a menu A, the individual can always “ignore” these new alternatives and engage in the same contemplation as with the menu A.9 Therefore, the utility from a menu B ⊃ A must be at least as great as the utility from the menu A. Although at first glance it may seem that costly contemplation alone could lead to a preference for smaller menus to avoid “overanalyzing” the decision, this argument overlooks the fact that the individual chooses her contemplation strategy optimally and, in particular, can ignore any options. The possibility of overanalysis could arise if the individual experiences some disutility from not selecting the ex post optimal choice from a menu, for example, because of regret. Therefore, regret could lead the individual to sometimes prefer a smaller menu, which we refer to as a preference for commitment. Other factors, such as temptation, could also lead to a preference for commitment. Regret is studied in a related framework by Sarver (2008), and temptation is studied by Gul and Pesendorfer (2001) and Dekel, Lipman, and Rustichini (2008). We leave the study of how to incorporate regret or temptation into our model of costly contemplation as an open question for future research. 2.2. Representation Result We now define our costly contemplation representation. DEFINITION 1: A costly contemplation (CC) representation is a tuple ((Ω F P) G U c), where (Ω F P) is a probability space, G is a collection of subσ-algebras of F , U is a Z-dimensional, F -measurable, and integrable random vector, and c : G → R is a cost function such that V : A → R defined by V (A) = max E max E[U|G ] · p − c(G ) (4) G ∈G
p∈A
8 Our IDD axiom is similar in spirit to the weak certainty independence axiom used by Maccheroni, Marinacci, and Rustichini (2006) in the Anscombe–Aumann setting. In their axiom, arbitrary acts play the role of the menus A and B, and constant acts play the role of the singleton menus {p} and {q}. 9 Note that we are assuming it is costless for the individual to “read” the alternatives on the menu. What is costly for the individual is analyzing her tastes for these alternatives.
CONTEMPLATION REPRESENTATION
1293
represents , where the outer maximization in Equation (4) has a solution for every A ∈ A and there exist p q ∈ (Z) such that E[U] · p > E[U] · q.10 The costly contemplation representation above is a generalized version of the costly contemplation representation in Ergin (2003).11 The interpretation of Equation (4) is as follows: The individual has a subjective state space Ω representing her tastes over alternatives, endowed with a σ-algebra F . She does not know the realization of the subjective state ω ∈ Ω but has a prior P on (Ω F ). Her tastes over lotteries in (Z) are summarized by the random vector U representing her state-dependent expected-utility function. Her utility from a lottery p ∈ (Z) conditional on the subjective state ω ∈ Ω is therefore given by U(ω) · p = z∈Z pz Uz (ω). Before making a choice out of a menu A ∈ A, the individual may engage in contemplation. A contemplation strategy is modeled as a signal about the subjective state, which corresponds to a sub-σ-algebra G of F . The contemplation strategies available to the individual are given by the collection of σ-algebras G. If the individual carries out the contemplation strategy G , she incurs a psychological cost of contemplation c(G ). However, she can then condition her choice out of A on G and pick an alternative that yields the highest expected utility conditional on the signal realization. Faced with the menu A, the individual chooses an optimal level of contemplation by maximizing the value minus the cost of contemplation. This yields V (A) in Equation (4) as the ex ante value of the option set A. The CC formulation is similar to an optimal information acquisition formula. The difference from a standard information acquisition problem is that the parameters ((Ω F P) G U c) are subjective. Therefore, they are not directly observable, but need to be derived from the individual’s preference.
10
Two notes are in order regarding this definition: (i) We show in Appendix A that the integrability of U implies that the term E[maxp∈A E[U|G ] · p] is well defined and finite for every A ∈ A and G ∈ G. (ii) For simplicity, we directly assume that the outer maximization in Equation (4) has a solution instead of making topological assumptions on G to guarantee the existence of a maximum. An alternative approach that does not require this indirect assumption on the parameters of the representation would be to replace the outer maximization in Equation (4) with a supremum, in which case all of our results would carry over. 11 Ergin (2003) works in the framework introduced by Kreps (1979), where the primitive of the model is a preference over subsets of Z rather than subsets of (Z). He shows that a preference over sets of alternatives is monotone (A ⊂ B ⇒ B A) if and only if there exists a costly contemplation representation with finite Ω such that is represented by the ex ante utility function V in Equation (4). The formulation of costly contemplation in this paper allows for infinite subjective state space Ω and extends the formulation to menus of lotteries assuming that the state-dependent utility is von Neumann–Morgenstern.
1294
H. ERGIN AND T. SARVER
Finally, note that V ({p}) = E[U] · p − minG ∈G c(G ) for any p ∈ (Z). Therefore, the requirement in Definition 1 that E[U] · p > E[U] · q for some p q ∈ (Z) is equivalent to assuming V ({p}) > V ({q}).12 We now present our first representation theorem. THEOREM 1: The preference has a CC representation if and only if it satisfies weak order, strong continuity, ACP, IDD, and monotonicity. We will not provide a direct proof for this result since it follows from two results presented in the next section (Theorems 2 and 3). 3. IDENTIFYING CONTEMPLATION STRATEGIES AND COSTS The following example shows that two different CC representations can lead to the same value function V for menus and, hence, represent the same preference. EXAMPLE 1: Let Z = {z1 z2 z3 } and Ω = {ω1 ω2 ω3 }. Let F be the discrete algebra and let P be the uniform distribution on Ω. For each i ∈ {1 2 3}, let Gi be the algebra generated by the partition {{ωi } {ωj ωk }}, and let the collection of contemplation strategies be G = {G1 G2 G3 }. Let c : G → R be any cost function, and define U : Ω → R3 and Uˆ : Ω → R3 by13
2 −1 −1 U(ω1 ) = −1 U(ω2 ) = 2 U(ω3 ) = −1 −1 −1 2
−2 1 1 ˆ 2 ) = −2 U(ω ˆ 3) = 1 ˆ 1 ) = 1 U(ω U(ω 1 1 −2 Then (5)
1 E max E[U|G1 ] · p = max U(ω1 ) · p p∈A 3 p∈A 1 1 2 + max U(ω2 ) + U(ω3 ) · p 3 p∈A 2 2
If we take p∗ and p∗ as in the definition of L-continuity, then {p∗ } {p∗ }, which gives rise to this condition. This “singleton-nontriviality” implication of L-continuity is not accidental, as it plays an important role in the proof of our representation theorem. 13 ˆ = 0, and hence Under this specification of the random vectors, we have E[U] = 0 and E[U] the singleton-nontriviality condition in Definition 1 is not satisfied. However, we allow for this violation purely for expositional simplicity. The representations can be modified to satisfy singletonˆ 4 ) = (1 0 0), and let Gi be nontriviality as follows: Add a fourth state ω4 to Ω, let U(ω4 ) = U(ω the algebra generated by the partition {{ωi } {ωj ωk } {ω4 }}. 12
CONTEMPLATION REPRESENTATION
1295
⎛ 2 ⎞ ⎛ 2⎞ − ⎜ 3 ⎟ ⎜ 3⎟ ⎜ ⎜ ⎟ ⎟ ⎜ 1⎟ ⎜ 1 ⎟ = max ⎜ − ⎟ · p + max ⎜ ⎟·p p∈A ⎜ p∈A ⎜ 3 ⎟ ⎟ ⎝ 3⎠ ⎝ ⎠ 1 1 − 3 3 1 ˆ 1 ˆ 2 = max U(ω2 ) + U(ω3 ) · p 3 p∈A 2 2 1 ˆ 1) · p max U(ω 3 p∈A ˆ G1 ] · p = E max E[U| +
p∈A
Similar arguments can be made for each algebra in Gi ∈ G. Therefore, defining V and Vˆ as in Equation (4) for each of the representations ((Ω F P) G U c) ˆ c), respectively, we have V (A) = Vˆ (A) for any menu and ((Ω F P) G U A ∈ A. Although the CC representation is not unique, we will show that the contemplation strategies in the representation can be put into a “reduced form” which will allow them to be uniquely identified by the preference. Equation (5) in Example 1 illustrates the motivation for this reduced form. First, note that for any G ∈ G, the distributions over ex post utility functions induced by these two representations are not the same. For instance, for G1 , the distribution over ex post utility functions induced by the first representation puts weight 13 on (2 −1 −1) and weight 23 on (−1 12 12 ), whereas the second representation puts weight 23 on (1 − 12 − 12 ) and weight 13 on (−2 1 1). However, the product of these ex post utility functions with their probabilities is the same for both representations, yielding the vectors ( 23 − 13 − 13 ) and (− 23 13 13 ). We now generalize these observations to show that contemplation strategies can be uniquely identified when they are represented using measures over ex post expected-utility functions, where each measure captures the combination of the likelihood and the magnitude of ex post utilities for the corresponding contemplation strategy. Since expected-utility functions on (Z) are equivalent to vectors in RZ , we will use the notation u(p) and u · p interchangeably. Define the set of normalized (nonconstant) expected-utility functions on (Z) to be Z 2 (6) U = u∈R : uz = 0 uz = 1 z∈Z
z∈Z
1296
H. ERGIN AND T. SARVER
For any v ∈ RZ (i.e., any expected-utility function), there exist α ≥ 0, β ∈ R, and u ∈ U such that v(p) = αu(p) + β for all p ∈ (Z). Therefore, modulo an affine transformation, U contains all possible ex post expected-utility functions. The following lemma shows that in any CC representation, each contemplation strategy corresponds to a unique measure over U . LEMMA 1: Let ((Ω F P) G U c) be any costly contemplation representation. For each G ∈ G, there exists a unique finite Borel measure μG on U and scalar βG ∈ R such that for all A ∈ A, E max E[U|G ] · p = max u(p)μG (du) + βG p∈A
U p∈A
In particular, it must be that βG =
1 |Z|
z∈Z
E[Uz ] for all G ∈ G.
Note that the normalization of the ex post utility functions in U is necessary for obtaining the uniqueness of the measure μG in this result. For instance, as we illustrated in the context of Example 1, different distributions over nonnormalized ex post utility functions may correspond to the same measure μ on the set of normalized utility functions U .14 Note also that while the normalization of the utility functions in U necessitates the use of measures that may not be probabilities, we can interpret any positive measure μ on U as a normalized version of a distribution over ex post utility functions. Specifically, let λ = μ(U ) > 0, and consider the probability measure π on V = λU which (heuristically) puts
μ(u)/λ weight on each v = λu ∈ V . Then, by a simple change of variables, U maxp∈A u(p)μ(du) = V maxp∈A v(p)π(dv). We now sketch the proof of Lemma 1 for the case of a costly contemplation representation ((Ω F P) G U c) where the state space Ω is finite (and F is the discrete algebra). For each event E ⊂ Ω, one can think of ω∈E P(ω)U(ω) as an expected-utility function over (Z). By the definition of U , there exist αE ≥ 0, βE ∈ R, and uE ∈ U such that αE uE (p) + βE = P(ω)U(ω) · p ∀p ∈ (Z) ω∈E
For simplicity, suppose that βE = 0 for each event E ⊂ Ω. 14 The impossibility of uniquely identifying distributions over (nonnormalized) ex post utility functions in our model is similar to the issue common to most models of state-dependent utility that the probability distribution over states cannot be identified separately from the utility function (see Karni (1993) for a more detailed discussion of state-dependent utility within the context of the Anscombe–Aumann model). For example, this observation motivated Kreps (1979) to impose the implicit normalization that the expectation of the state-dependent utility function be taken with respect to the uniform distribution. We adopt the alternative approach of normalizing utilities and using measures to represent the product of the probability and ex post utility.
CONTEMPLATION REPRESENTATION
1297
Let G ∈ G. Finiteness of Ω implies that there is a partition πG of Ω that generatesG . We define a measure μG over U which has finite support by μG (u) = E∈πG :uE =u αE for each u ∈ U . Note that we sum over all E for which uE = u since it is possible to have multiple elements of the partition that lead to the same ex post expected-utility preference. Then E max E[U|G ] · p = P(E) max P(ω|E)U(ω) · p p∈A
E∈πG
=
E∈πG
=
p∈A
ω∈E
max P(ω)U(ω) · p p∈A
ω∈E
αE max uE (p)
E∈πG
p∈A
=
max u(p)μG (du)
U p∈A
We show in Appendix the assumption that βE = 0 for all E ⊂ B that without 1 Ω, the term βG = E∈πG βE = |Z| z∈Z E[Uz ] would be added to the above expression. We also show that the uniqueness of μG can be established using the uniqueness results for the additive expected-utility (EU) representation of DLR.15 Going back to Example 1, define u1 u2 u3 ∈ U by
2 −1 −1 1 1 1 1 2 3 u =√ −1 u = √ 2 u =√ −1 6 −1 6 −1 6 2 Then, by the same arguments as those given in Equation (5), we see that the measures induced by √the partition {{ω√i } {ωj ωk }} in the two representations are identical, giving 36 weight to ui , 36 weight to −ui , and 0 weight to U \ {ui −ui }. Motivated by the equivalence obtained in Lemma 1, we now define our reduced-form representation.16 DEFINITION 2: A reduced-form costly contemplation (RFCC) representation is a pair (M c) consisting of a compact set of finite Borel measures M on U 15
We discuss the relationship between our model and the additive EU representation of DLR in more detail in Section 5.2. 16 Note that we endow the set of all finite Borel measures on U with the weak* topology, that
is, the topology where a net {μd }d∈D converges to μ if and only if U f μd (du) → U f μ(du) for every continuous function f : U → R.
1298
H. ERGIN AND T. SARVER
and a lower semicontinuous function c : M → R such that V : A → R defined by (7) max u(p)μ(du) − c(μ) V (A) = max μ∈M
U p∈A
represents and the following conditions hold: (i) The set M is consistent: For each μ ν ∈ M and p ∈ (Z), u(p)μ(du) = u(p)ν(du) U
U
(ii) The set M is minimal: For any compact proper subset M of M, the function V obtained by replacing M with M in Equation (7) no longer represents . (iii) There exist p q ∈ (Z) such that V ({p}) > V ({q}). The following result shows that the RFCC representation can be interpreted as a reduced form of the CC representation. THEOREM 2: Let V : A → R. Then there exists a CC representation such that V is given by Equation (4) if and only if there exists an RFCC representation such that V is given by Equation (7). We now sketch the construction of an equivalent RFCC representation for a 1 E[U ], we given CC representation ((Ω F P) G U c). Letting β = |Z| z z∈Z showed in Lemma 1 that for any G ∈ G, there exists a unique finite Borel measure μG on U such that for all A ∈ A, E max E[U|G ] · p = max u(p)μG (du) + β p∈A
U p∈A
Let M = {μG : G ∈ G} and, for each μ ∈ M, let ˜ c(μ) = inf{c(G ) : G ∈ G and μ = μG } − β ˜ for any A ∈ A, By the construction of M and c, (8)
˜ max E max E[U|G ] · p − c(G ) = max max u(p)μ(du) − c(μ) G ∈G
p∈A
μ∈M
U p∈A
Also, for any G ∈ G and p ∈ (Z), u(p)μG (du) = E[E[U|G ] · p] − β = E[U] · p − β U
CONTEMPLATION REPRESENTATION
1299
by the law of iterated expectations. This implies that the measures in M must satisfy the consistency condition in Definition 2. Also, condition (iii) in Definition 2 corresponds to the requirement in the definition of the CC representation that E[U] · p > E[U] · q for some p q ∈ (Z). The minimality condition in the definition of the RFCC representation is needed to uniquely identify the parameters in the representation. To see this, note that it is always possible to add a measure μ ∈ / M to the set M and assign it a cost c(μ) high enough to guarantee that this measure is never a maximizer in Equation (7). The minimality condition requires that all such unnecessary measures be dropped from the representation. In contrast, a CC representation may include contemplation strategies that are never optimal. In the construction of an equivalent RFCC representation from a CC representation, it is therefore necessary to remove measures from M that are not strictly optimal in Equation (8) for some A ∈ A.17 Thus, the minimal set M in an RFCC representation may not include all possible contemplation strategies available to the individual, but it identifies all of the “relevant” ones. Theorem 2 also asserts that for any RFCC representation, there exists a CC representation giving rise to the same value function V for menus. The construction used to prove this part of the theorem is more involved, so we refer the reader to Appendix D for the details. Using Theorem 2, we establish our CC representation result (Theorem 1) by proving the following RFCC representation theorem. THEOREM 3: The preference has an RFCC representation if and only if it satisfies weak order, strong continuity, ACP, IDD, and monotonicity. The proof of Theorem 3 is contained in Appendix C and is divided into two parts18 : In Appendix C.1, we construct a function V that represents and satisfies certain desirable properties: Lipschitz continuity, convexity, and a type of “translation linearity” which is closely related to the consistency condition for the measures in our representation. Then, in Appendix C.2, we apply duality results from convex analysis to establish that this function V satisfies Equation (7) for some pair (M c). We claimed that contemplation strategies can be uniquely identified from the preference once they are put into the reduced form of measures over utility functions. The following uniqueness result for the RFCC representation formalizes this claim. 17 If G is finite, then the set M obtained by the construction above is also finite. In this case, it can be shown that sequentially removing measures from M that are not strictly optimal in ˜ ⊂ M. Although this approach Equation (8) for some A ∈ A leads to a minimal set of measures M can be generalized to the case of infinite G, in Appendix D we instead give a simpler indirect proof of this direction that does not use Lemma 1. 18 In Appendix C, we also prove a related representation result for nonmonotone preferences, which may be useful in future applications.
1300
H. ERGIN AND T. SARVER
THEOREM 4: If (M c) and (M c ) are two RFCC representations for , then there exist α > 0 and β ∈ R such that M = αM and c (αμ) = αc(μ) + β for all μ ∈ M. An RFCC representation (M c) in which M is a singleton corresponds to a monotone additive EU representation of DLR. Since DLR did not impose a normalization on the ex post expected-utility functions in their representation, their uniqueness result appears weaker than the implication of our Theorem 4 for singleton M. However, our uniqueness result for singleton M is not actually stronger than theirs since the same normalization also gives a unique belief in DLR. For the intuition behind this theorem, note that the V defined by Equation (7) for an RFCC representation is a convex function. Although the nonlinearity of this function prevents the use of standard arguments from expectedutility theory, it can still be shown that V is unique up to a positive affine transformation (see Proposition 1 in Appendix C.1). From this it can then be shown that the parameters of an RFCC representation (M c) are themselves unique up to the positive affine transformation described in Theorem 4. For a simple example of how the type of transformation described in Theorem 4 could arise, consider a CC representation ((Ω F P) G U c), and let ˜ be the corresponding RFCC representation as described in Theorem 2. (M c) If we replace the state-dependent utility function U in this CC representation with the utility function U = αU − β and replace the cost function c with c = αc, where α > 0 and β ∈ R, then the underlying preference is the same. This new representation corresponds to the RFCC representation (M c˜ ), ˜ α1 μ) + β for all μ ∈ M . However, due to the where M = αM and c˜ (μ) = αc( nonuniqueness of the CC representation, there are many other changes to the parameters of a CC representation that could also result in such a transformation of the corresponding RFCC representation (e.g., changes to the probability distribution or information structure, or other types of changes to the utility or cost function). In particular, as illustrated in Example 1, two sets of CC parameters can correspond to precisely the same RFCC representation. Given the sharp uniqueness result that is obtained for the RFCC representation (Theorem 4), the equivalence result established above (Theorem 2) allows the nonuniqueness issue associated with CC representations to be overcome by working with equivalent RFCC representations.1920 19 Note that in the model of Ergin (2003), the preference being over finitely many menus presents an additional, more basic source of nonuniqueness. In his framework, even a reducedform representation would not be uniquely identified. 20 In an alternative approach to modeling costly information acquisition, Hyogo (2007) studied preferences over pairs consisting of an action and a menu of Anscombe–Aumann acts. In his representation, each action yields a distribution over posteriors over the objective state space, and the individual anticipates that she will choose an ex post optimal act from the menu. Since in his framework the state space is objective and utility is not state dependent, he is able to uniquely identify the prior over the state space and the distribution of posteriors induced by each action.
CONTEMPLATION REPRESENTATION
1301
4. COMPARING CONTEMPLATION COSTS By identifying contemplation strategies with measures over ex post utility functions as described in the previous section, it is possible to conduct meaningful comparisons of contemplation costs between two representations. In this section, we consider one such comparative measure of the cost of contemplation. Our measure will apply to preferences that are bounded above by singleton menus in the sense that there exists an alternative z ∈ Z such that {δz } A for all A ∈ A, where δz denotes the lottery that puts full probability on z. For example, such an alternative z could be a very large monetary prize that is known with certainty to be better than any other alternative z ∈ Z. DEFINITION 3: Suppose that the preferences 1 and 2 satisfy Axioms 1–5 and are bounded above by singleton menus. We say that 1 has lower cost of contemplation than 2 if for every A ∈ A and p ∈ (Z), A 2 {p}
⇒
A 1 {p}
In this comparative measure, individuals face a trade-off between a menu A that may offer some flexibility and a lottery p that may be better on average. For example, consider any menu A and lottery p. If there is some q ∈ A such that {q} i {p} for i = 1 2, then A i {p} for i = 1 2 by the monotonicity of the preferences. In this case, the condition in Definition 3 holds vacuously. Alternatively, suppose {p} i {q} for i = 1 2 for all q ∈ A. Then the menu A may offer flexibility if it contains more than one alternative, while p is better than the alternatives in A on average. Definition 3 formalizes the intuition that an individual is more likely to favor A over {p} as her cost of contemplation becomes smaller since flexibility is more valuable when information about the alternatives is available at a lower cost. Assume that the preference i has the RFCC representation (Mi ci ) for i = 1 2. If the sets of measures M1 and M2 are different, then it is not clear what the statement “the cost function c1 is lower than the cost function c2 ” means. In this case, there are measures μ ∈ M1 ∪ M2 for which either c1 (μ) or c2 (μ) is not defined. To overcome this problem, we will extend the cost function in an RFCC representation to the set of all measures. DEFINITION 4: Let M denote the set of all finite Borel measures on U and let V : A → R be continuous. The minimum rationalizable cost of contemplation for V is the function c ∗ : M → R defined by ∗ (9) max u(p)μ(du) − V (A) c (μ) = max A∈A
U p∈A
Suppose V is defined by Equation (7) for some RFCC representation (M c). Then the function c ∗ defined by Equation (9) agrees with the cost
1302
H. ERGIN AND T. SARVER
21 for any A ∈ A and μ ∈ M, we have V (A) ≥
function c on M. Moreover, ∗ max u(p)μ(du) − c (μ), with equality for some A ∈ A. Thus, c ∗ is the p∈A U minimal extension of c to M that does not alter the function V . Recall from the discussion in Section 3 that if an individual has an RFCC representation (M c), then μ ∈ / M is not a statement that the contemplation strategy corresponding to μ is not available to the individual. Rather, the exclusion of this measure from M implies that it is never strictly optimal and, hence, is not needed to represent the individual’s preference. In this sense, it is natural to consider what contemplation costs could be attributed to measures not contained in M. The function c ∗ indicates the minimum rationalizable cost of contemplation for all measures in M, which makes it possible to compare contemplation costs between different RFCC representations. For the following result, we use S ≡ {{p} : p ∈ (Z)} to denote the set all of singleton menus, and we write V2 |S ≈ V1 |S to indicate that the restriction of V2 to S is a positive affine transformation of the restriction of V1 to S (i.e., there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z)).
THEOREM 5: Assume that for i = 1 2, the preference i has an RFCC representation (Mi ci ) and is bounded above by singleton menus. Define Vi by Equation (7) and ci∗ by Equation (9) for i = 1 2. Then the following statements are equivalent: (i) 1 has lower cost of contemplation than 2 . (ii) V2 |S ≈ V1 |S and V2 ≤ V1 (provided V2 |S = V1 |S ). (iii) V2 |S ≈ V1 |S and c2∗ ≥ c1∗ (provided V2 |S = V1 |S ).22 To interpret condition (iii) in this theorem, first note that if V2 |S ≈ V1 |S , then by Theorem 4 it is without loss of generality to assume that V2 |S = V1 |S . In this case, we have U u(p)μ(du) = U u(p)ν(du) for all μ ∈ M2 , ν ∈ M1 , and p ∈ (Z). In other words, the average utility of any lottery is the same for both representations. However, c1∗ (μ) ≤ c2∗ (μ) for all μ ∈ M implies that information is less costly for the first individual. In particular, consider any μ ∈ M2 . If μ ∈ M1 ∩ M2 , then c1 (μ) = c1∗ (μ) ≤ c2∗ (μ) = c2 (μ), that is, the contemplation strategy corresponding to μ is less costly for the first individual. Alternatively, if μ ∈ M2 \ M1 , then c1∗ (μ) ≤ c2∗ (μ) = c2 (μ). Thus, if the measure μ were added to the representation (M1 c1 ) at a cost c1∗ (μ), where c1∗ (μ) ≤ c2 (μ), the value function for menus V1 would not be altered. Although we cannot infer from the 21 This result is obtained as part of the proof of Theorem 4 in Appendix E. Note that although it is immediate that c ∗ (μ) ≤ c(μ) for all μ ∈ M, the minimality requirement on M is important for obtaining the opposite inequality. For example, consider a measure μ ∈ M that is strictly suboptimal for every menu in the sense that V (A) > U maxp∈A u(p)μ(du) − c(μ) for all A ∈ A. The minimality requirement rules out the possibility of having such a measure in M, but if it were permitted, we would obtain c(μ) > c ∗ (μ). 22 This theorem continues to hold if that assumption that c2∗ ≥ c1∗ in condition (iii) is replaced with the weaker assumption that c2∗ (μ) ≥ c1∗ (μ) for all μ ∈ M2 .
CONTEMPLATION REPRESENTATION
1303
/ M1 is preference 1 whether the contemplation strategy corresponding to μ ∈ available to the first individual or not, this contemplation strategy can be rationalized by the preference at a cost c1∗ (μ) ≤ c2 (μ). In this sense, all of the contemplation strategies available to the second individual can be thought of as being available to the first individual at a lower cost. Using the mapping from contemplation strategies G to measures μG described in Lemma 1, we obtain the following corollary for CC representations. COROLLARY 1: Assume that for i = 1 2, the preference i has a CC representation ((Ωi Fi Pi ) Gi Ui ci ) and is bounded above by singleton menus. Define Vi by Equation (4) and ci∗ by Equation (9) for i = 1 2. Then the following statements are equivalent: (i) 1 has lower cost of contemplation than 2 . (ii) V2 |S ≈ V1 |S and V2 ≤ V1 (provided V2 |S = V1 |S ). (iii) V2 |S ≈ V1 |S and c2∗ (μG2 ) ≥ c1∗ (μG2 ) for all G2 ∈ G2 (provided V2 |S = V1 |S ). The interpretation of this corollary is similar to that of Theorem 5, with the following caveat: In a CC representation ((Ω F P) G U c), the interpretation of the function c ∗ as an extension of the cost function c is a little more subtle than in the case of the RFCC representation. Aside from the obvious distinction that the domain of c is not actually a subset of the domain of c ∗ , it is possible to have c ∗ (μG ) < c(G ) for some G ∈ G. In particular, Lemma 14 in Appendix F.2 shows that c ∗ (μG ) ≤ c(G ), with equality if and only if G solves Equaˆ denote the subset of contemtion (4) for some A ∈ A.23 Therefore, if we let G plation strategies that solve Equation (4) for some menu, then c ∗ (μG ) = c(G ) ˆ Thus, c ∗ can be thought of as the minimal extension of c|Gˆ to M for any G ∈ G. that does not alter the function V . 5. SPECIAL CASES 5.1. Limited Contemplation Resources In this section, we consider an alternative model of costly contemplation in which the cost of contemplation does not directly affect the utility of the individual. Instead, the cost of contemplation enters indirectly when it is constrained to be below some bound k. Such a model may be appropriate in instances where the only cost of contemplation is time and the individual has a limited amount of time to devote to her decision. Formally, we continue to model contemplation in the reduced form of a compact set of finite Borel measures M over the set of ex post utility functions U , with the requirement that M be consistent and minimal. Let c : M → R be a 1 Throughout this discussion, we assume for expositional simplicity that β ≡ |Z| z∈Z E[Uz ] = ∗ 0. As we show in Lemma 14, without this assumption, the inequality would be c (μG ) ≤ c(G ) − β. 23
1304
H. ERGIN AND T. SARVER
lower semicontinuous cost function and let k ∈ R be the maximum allowable contemplation cost. A representation for limited contemplation resources then takes the form of a function V : A → R defined by (10) max u(p)μ(du) subject to c(μ) ≤ k V (A) = max μ∈M
U p∈A
If we let M = {μ ∈ M : c(μ) ≤ k}, then this representation is equivalent to max u(p)μ(du) V (A) = max μ∈M
U p∈A
Moreover, since c is lower semicontinuous, M is also compact. Thus, the limited contemplation resources representation in Equation (10) is equivalent to an RFCC representation with a zero cost function, (M 0). Since the cost function in an RFCC representation is only unique up to an affine transformation, we see that a preference has a representation as in Equation (10) if and only if it has an RFCC representation (M c ), where c is constant. We now introduce an axiom that characterizes a constant cost of contemplation for all available contemplation strategies. AXIOM 6—Strong IDD: For any A B ∈ A, p ∈ (Z), and α ∈ (0 1), AB
⇐⇒
αA + (1 − α){p} αB + (1 − α){p}
As the name suggests, strong IDD is a strengthening of IDD. Suppose αA + (1 − α){p} αB + (1 − α){p} for some A B ∈ A, p ∈ (Z), and α ∈ (0 1). Strong IDD then implies A B, and applying strong IDD again, we have βA + (1 − β){q} βB + (1 − β){q} for any q ∈ (Z) and β ∈ (0 1). In contrast, IDD only guarantees that the above preference holds for β = α. Thus, strong IDD implies an independence of degenerate decisions (IDD) and, in addition, independence of the weights on these degenerate decisions.24 24 Strong IDD is similar in spirit to the certainty independence axiom used by Gilboa and Schmeidler (1989) in the Anscombe–Aumann setting. In their axiom, arbitrary acts play the role of the menus A and B, and a constant act plays the role of the singleton menu {p}. Our discussion of the relationship between strong IDD and IDD parallels the comparison of certainty independence and weak certainty independence found in Section 3.1 of Maccheroni, Marinacci, and Rustichini (2006).
CONTEMPLATION REPRESENTATION
1305
For intuition, recall that the menu αA + (1 − α){p} represents the decision problem in which the individual makes a contingent choice from A, this choice is implemented with probability α, and with probability 1 − α the individual instead receives p. We argued in Section 2.1 that as α decreases, the individual’s benefit from contemplation decreases, causing her to choose a less costly contemplation strategy. However, if the cost of all available contemplation strategies is the same, then her optimal contemplation strategy when choosing from the menu A will be the same as her optimal contemplation strategy when choosing from αA + (1 − α){p} for any α ∈ (0 1). Therefore, if A B, then taking the convex combination of these menus with some singleton menu {p} could affect the individual’s utility through its effect on the final composition of lotteries, but it will not affect her optimal contemplation strategy for each of the respective menus. Hence, her ranking of the menus will not change. The following theorem formalizes the connection between strong IDD and constant contemplation costs. THEOREM 6: Suppose the preference has an RFCC representation (M c). Then satisfies strong IDD if and only if c is constant. Given the relationship between the CC representation and the RFCC representation, we obtain the following corollary. COROLLARY 2: For a preference on A, the following statements are equivalent: (i) The preference satisfies weak order, strong continuity, ACP, strong IDD, and monotonicity. (ii) There exists a probability space (Ω F P), a collection G of sub-σ-algebras of F , a Z-dimensional, F -measurable, and integrable random vector U, a cost function c : G → R, and a constant k ∈ R such that the preference is represented by V (A) = max E max E[U|G ] · p subject to c(G ) ≤ k (11) G ∈G
p∈A
where the outer maximization in Equation (11) has a solution for every A ∈ A and there exist p q ∈ (Z) such that E[U] · p > E[U] · q. 5.2. Connection to the Independence Axiom In this section, we discuss the special case of our model in which the fullinformation contemplation strategy is available and no more costly than any other (less informative) contemplation strategy. This special case will be closely related to the analysis of DLR, who introduced the following independence axiom for sets of lotteries.
1306
H. ERGIN AND T. SARVER
AXIOM 7—Independence: For any A B C ∈ A and α ∈ (0 1), AB
⇒
αA + (1 − α)C αB + (1 − α)C
It is easily verified that under weak order and continuity, independence implies ACP and strong IDD. Note also that under weak order and continuity, independence implies a form of indifference to contingent planning: For any A B ∈ A and α ∈ [0 1], A ∼ B implies A ∼ αA + (1 − α)B. Intuitively, this suggests that independence rules out the possibility of multiple contemplation strategies. For a simple example of why multiple contemplation strategies are inconsistent with the independence axiom, let A and B be the restaurant menus described in the Introduction, that is, A = {t s} and B = {f r}. Suppose the individual has two contemplation strategies, both of which have zero cost: (i) contemplate which seafood dish she would like and (ii) contemplate which steak dish she would like. In particular, it is not possible for the individual to contemplate both restaurant menus. This could occur if, as discussed in Section 5.1, the individual’s contemplation is constrained due to limited time and there is not sufficient time to think about both restaurant menus. Then, when faced with either menu A or B, the individual can choose a contemplation strategy that allows her to pick the ex post optimal alternative with probability 1. However, since she cannot contemplate both menus simultaneously, it is not possible for her to choose the ex post optimal alternative with certainty when asked to make a contingent plan from αA + (1 − α)B. Therefore, if the items on these menus are such that A ∼ B, it follows that A αA + (1 − α)B, in violation of the independence axiom.25 The following result generalizes these observations by showing that the independence axiom is equivalent to an RFCC representation with a single contemplation strategy. THEOREM 7: The preference satisfies weak order, strong continuity, independence, and monotonicity if and only if it has an RFCC representation (M c) in which M is a singleton. 25
It is well known that independence may be violated if the individual takes a payoff-relevant action prior to the resolution of uncertainty. In the context of our CC representation, the individual facing the complete contingent plan αA + (1 − α)B chooses her contemplation strategy before the uncertainty regarding the menu (A or B) is resolved. In the context of choices among lotteries, Mossin (1969) gave the example of an individual who has expected-utility preferences over two-period consumption vectors and makes a savings decision in period 1. Mossin argued that the individual’s induced preferences over second-period income distributions may violate independence if the savings decision precedes the resolution of uncertainty regarding the second-period income. Such induced preferences over lotteries naturally satisfy a quasiconvexity property analogous to our ACP axiom: p q ⇒ p αp + (1 − α)q. Quasiconvexity of preferences over monetary prizes has also been studied for entirely different purposes in economics. For instance, Green (1987) showed that an individual who has fixed, time-independent, continuous, and monotone preferences over lotteries over monetary prizes is prone to “money pumps” starting from nonrandom wealth levels if and only if her preferences are quasiconvex.
CONTEMPLATION REPRESENTATION
1307
We will not provide a proof of this result since it is simply a restatement of the additive EU representation theorem of DLR and DLRS.26 The following corollary shows the implications of independence for the CC representation. COROLLARY 3: The preference satisfies weak order, strong continuity, independence, and monotonicity if and only if it has a CC representation ((Ω F P) G U c) such that F ∈ G and c(F ) = minG ∈G c(G ). Corollary 3 states that a preference that satisfies independence (and our other axioms) can be represented with a CC representation in which the fullinformation contemplation strategy is available and no more costly than any other contemplation strategy. However, due to the nonuniqueness of the CC representation, there are also other CC representations for this preference in which the full-information contemplation strategy is not the least costly. Indeed, an individual’s preference will satisfy independence whenever there is a single optimal contemplation strategy, even if it is not the most informative. Therefore, it is not possible to determine from the preference whether or not the full-information contemplation strategy is the least costly; the independence axiom simply indicates that the preference can be represented as if the full-information contemplation strategy is the least costly. In the remainder of this section, we provide graphical intuition for our main axioms (ACP and IDD) and illustrate how these axioms relax the independence axiom. Consider preferences over menus of lotteries over two alternatives. That is, suppose Z = {z1 z2 }. In this case, the set of lotteries over Z can be represented as the unit interval [0 1], with p ∈ [0 1] being the probability of alternative z2 . Under weak order and continuity, ACP and monotonicity imply that the individual is indifferent between any menu and its convex hull (see Lemma 2 in Appendix C.1). We can therefore restrict attention to convex menus. Closed and convex menus from [0 1] are simply closed intervals, and hence we are considering preferences over menus of the form [p q] ⊂ [0 1] where p q ∈ [0 1]. 26 Although DLR do not impose a normalization on the set of ex post expected-utility functions in the definition of their representation, the proof of their representation result uses a set of ex post utility functions that is precisely U as defined in Equation (6). In particular, it is shown in the Supplemental Material of DLRS that the preference satisfies weak order, strong continuity, independence, and monotonicity if and only if there exists a finite Borel measure μ on U such that is represented by the functional form max u(p)μ(du) V (A) = U p∈A
Since DLRS used a slightly weaker L-continuity axiom, their representation need not satisfy singleton nontriviality (condition (iii) in Definition 2). However, under the strong continuity axiom of the current paper, singleton nontriviality will be satisfied. Hence, the pair (M c), where M = {μ} and c = 0, is an RFCC representation for .
1308
H. ERGIN AND T. SARVER
FIGURE 1.—Representing convex menus.
The set of all menus of this form is illustrated in Figure 1.27 Consider any interval A = [p q]. This interval corresponds to the point in the triangle whose first coordinate is p and whose second coordinate is q, that is, the point whose horizontal distance from the left side of the graph is p and whose vertical distance from the bottom of the graph is q. In particular, the set of all singleton menus (i.e., menus of the form {p} = [p p]) is represented by the diagonal of the triangle in this figure. Note that we abuse notation slightly and let z1 denote the lottery that gives z1 with probability 1, and likewise for z2 . Thus, the corners of the triangle labeled {z1 }, {z2 }, and [z1 z2 ] correspond to the menus {0}, {1}, and [0 1], respectively. When the set of closed and convex menus is represented as in Figure 1, a convex combination of two menus corresponds to the convex combination of the points representing these menus. Therefore, the implication of ACP is simply that the lower contour sets for the preference are convex sets. Before illustrating the implications of IDD, we make a few observations about “translations” of menus. Consider the menu A = [p q] indicated in Figure 1, and take some real number θ. Adding the translation θ to the menu A yields a new menu A + θ = [p + θ q + θ]. Figure 1 illustrates that translating a menu results in a movement in a direction parallel to the diagonal of the triangle. Figure 2 builds on these observations to show that IDD implies a type of translation invariance.28 That is, we will show that if the individual is indifferent between two menus, then she is also indifferent between the new menus 27
A similar depiction of menus of lotteries can be found in Olszewski (2007). This property is defined formally in Appendix C.1 and plays an important role in the proof of Theorem 3. 28
CONTEMPLATION REPRESENTATION
1309
FIGURE 2.—Translation invariance.
obtained by translating them both the same distance in a direction parallel to the diagonal of the triangle. Consider any two menus A and B such that A ∼ B. Therefore, as illustrated in Figure 2, A and B both lie on the same indifference curve I1 . Note that for this preference to satisfy ACP, the lower contour sets of the preference must be convex, and hence the points above I1 must be preferred to the points below I1 . Figure 2 illustrates that the menus A and B can be written as convex combinations of the singleton menu {p} with the menus A and B , respectively. That is, there exists α ∈ (0 1) such that A = αA + (1 − α){p} and B = αB + (1 − α){p}. Fix any lottery q. Then by IDD, we have αA + (1 − α){p} ∼ αB + (1 − α){p} ⇒
αA + (1 − α){q} ∼ αB + (1 − α){q}
Thus, the menus αA + (1 − α){q} and αB + (1 − α){q} must also be on the same indifference curve, which is indicated by I2 in Figure 2. However, letting θ = (1 − α)(q − p), it is easily seen that A + θ = αA + (1 − α){q} and B + θ = αB + (1 − α){q}. In other words, if the menus A and B are both translated by θ, then the individual remains indifferent between them. More generally, it can be shown that IDD implies that when the same translation is applied to any two menus, the individual’s ranking of these menus is not altered (see Lemma 3). These figures show that although ACP and IDD allow for “kinks” in indifference curves, these axioms require that lower contour sets be convex and that indifference curves be translations of each other. Note that the kinks in the indifference curves in Figure 2 indicate a change in the optimal contemplation
1310
H. ERGIN AND T. SARVER
strategy, and our model allows for a possibly infinite number of kinks. In contrast, the independence axiom requires that indifference curves be linear and does not allow for such kinks. These observations illustrate why it is necessary to relax independence so as to allow for nondegenerate costly contemplation, that is, costly contemplation with more than one contemplation strategy.29 6. INFINITE REGRESS We conclude by discussing the infinite-regress problem of bounded rationality (see Lipman (1991) and Conlisk (1996)) within the context of our main representation theorem. Let D stand for some collection of abstract decision problems. In theoretical economic analysis, standard rational agents are assumed to solve any decision problem D ∈ D optimally without any constraints. One may think that this is not a realistic assumption when the decision problem D is difficult in some sense, and be tempted to make the model more realistic by explicitly taking into account the costs of solving D. Let F be a correspondence that associates with every decision problem D ∈ D a set of new decision problems F(D) ⊂ D obtained by incorporating into D the costs of solving D. Typically, the decision problems in F(D) are even more “difficult” than D, in the same sense in which D is difficult to start with. Therefore, assuming that the individual solves the problems in F(D) optimally is no more reasonable than assuming that she solves D optimally. Explicitly including the costs of solving the decision problems in F(D) leads to a new class of decision problems F 2 (D) = F(F(D)) = D ∈F(D) F(D ). This argument can be iterated ad infinitum. Since the classes of problems D F(D) F 2 (D) F n (D) become progressively more complicated, assuming that the individual solves any one of them optimally defeats the initial purpose of building a more realistic model. This is the infinite-regress problem. To state the infinite-regress problem within the context of our model, assume that each decision problem in D ∈ D specifies a set of actions and payoffs from these actions. The augmented decision problems in F(D) introduce uncertainty about the individual’s payoffs from the actions in D, but allow her to acquire costly information about this uncertainty and condition her choice of action from D on her information. As a result, the augmented actions in a decision problem D ∈ F(D) are pairs consisting of (i) the choice of information and (ii) the choice of action from D contingent on the realized information. The payoff function corresponding to D is obtained by taking the expected payoff from the augmented action minus the cost of acquired information. 29 A relaxation of the independence axiom in the setting of preferences over menus of lotteries was also considered by Epstein, Marinacci, and Seo (2007), who interpreted their axiom as the behavior of an individual with an incomplete (or coarse) conception of the future. This coarse conception entails a degree of pessimism on the part of the individual, and their resulting representations are intuitively similar to the maxmin representation of Gilboa and Schmeidler (1989).
CONTEMPLATION REPRESENTATION
1311
The most basic type of decision problem D0 we consider specifies a menu A of lotteries interpreted as actions and an expected-utility function u : (Z) → R. The optimization problem corresponding to D0 is to find the utility-maximizing lottery out of the given menu. Let D0 denote the set of such basic decision problems. The (once) augmented decision problems D1 ∈ F(D0 ) consist of all maximization problems of type (#)
max E[U · f ] − c(G ) (G f )
where, as in the CC formulation, (Ω F P) is a probability space and U is a state-dependent utility function representing the individual’s uncertainty about her payoff from actions in A, G is a collection of sub-σ-algebras of F specifying the information that the individual can acquire, and c(G ) denotes the cost of information G . The maximization is done over all augmented actions (G f ), where G ∈ G and f : Ω → A determines a plan of actions measurable with respect to the acquired information G .30 We can define F(D1 ) by introducing further uncertainty about the augmented decision problem (#) above. More specifically, we can introduce uncertainty about the probability P, the state dependent utility function U, and the cost function c, and allow the individual to acquire costly information about this additional uncertainty and condition her choice of action (G f ) in Equation (#) on this information. It is straightforward to see how this construction can be iterated an arbitrary of times to construct F n (D0 ) for an arbi∞ number n trary n ≥ 1. We let D = n=0 F (D0 ). Although one can also argue within the context of our model that the classes of decision problems D0 F(D0 ) F 2 (D0 ) F n (D0 ) become progressively more complicated because they involve solving higher-order information acquisition problems, our representation result is immune to this criticism. To see this, consider an excerpt from Lipman (1995, p. 59), who explained why axiomatic approaches to bounded rationality are not susceptible to the infinite-regress criticism: Roughly, the axiomatic approach begins with a description of the agent and then translates this into a model of information processing. Clearly, it then makes no sense to ask whether the agent can carry out this information processing accurately. If the processing is simply a representation of what the agent is doing, the question boils down to asking whether an agent is able to do whatever it is that he does!
30
The one-shot maximization problem corresponding to (#) is equivalent to the two-stage maximization problem in the CC formulation where the individual first chooses her information G and then chooses a lottery maximizing her ex post expected utility E[U|G ] conditional on the realized information. We are using the one-shot maximization formulation in Equation (#) because it is more explicit about the action space for D1 .
1312
H. ERGIN AND T. SARVER
In particular, to the extent that one finds ACP and IDD to be convincing behavioral aspects of bounded rationality arising from contemplation costs, there is no loss of generality from restricting attention to the case where the decision maker optimally solves the problem of learning her preferences subject to costs, that is, to the case where she optimally solves F(D0 ). Therefore, our representation result may be seen as giving an as if solution to the infinite-regress problem. On a final note, it is standard that every nth-level problem Dn ∈ F n (D0 ) where n ≥ 1 can be collapsed to a first-level problem D1 ∈ F(D0 ) by rewriting the dynamic information acquisition problem in Dn as a one-shot augmented costly information acquisition problem. In particular, if we do not observe the individual’s sequence of information acquisition choices, then we cannot distinguish between first-level and nth-level decision problems. Since the only observable in our model is the individual’s preferences over menus, the representations where the individual solves a higher-order subjective information acquisition problem in F n (D0 ) for n ≥ 2 are behaviorally indistinguishable from those where she solves a first-order problem in F(D0 ).
APPENDIX A: SHOWING THE CC REPRESENTATION IS WELL DEFINED Consider any CC representation ((Ω F P) G U c). In this section, we show that the term E[maxp∈A E[U|G ] · p] is well defined and finite for every A ∈ A and G ∈ G. This in particular implies that V (A) is finite whenever the outer maximization in Equation (4) has a solution. Let U˜ be an arbitrary version of E[U|G ]. The existence and integrability of U˜ z follow from integrability of Uz for each z ∈ Z (see Billingsley (1995, p. 445)). Let B be a countable ˜ dense subset of A. At each ω ∈ Ω, maxp∈A U(ω) · p exists and is equal to ˜ ˜ supp∈B U(ω) · p. For each p ∈ B, U · p is F -measurable as a convex combination of F -measurable random variables. Hence, maxp∈A U˜ · p = supp∈B U˜ · p is an F -measurable random variable as the pointwise supremum of countably many F -measurable random variables (see Billingsley (1995, p. 184, Theo rem 13.4(i))). Note also that for any p ∈ (Z), |U˜ · p| ≤ z∈Z |U˜ z |, and hence | maxp∈A U˜ · p| ≤ z∈Z |U˜ z |. Therefore, integrability of maxp∈A U˜ · p follows ˜ from integrability of U. APPENDIX B: PROOF OF LEMMA 1 Fix a CC representation ((Ω F P) G U c) and fix G ∈ G. Let 1 ∈ RZ denote the vector whose coordinates are all equal to 1. It is easy to show that there exist G -measurable and integrable functions α : Ω → R+ , β : Ω → R, and
CONTEMPLATION REPRESENTATION
1313
u : Ω → U such that E[U|G ] = αu + β1
P-almost surely.31
Define a positive finite measure m on (Ω G ) via its Radon–Nikodym derivative dm (ω) = α(ω), and define a finite Borel measure μG on U via μG = m ◦ u−1 .32 dP Let βG = E[β]. Then, for any menu A ∈ A, α(ω) max u(ω) · p P(dω) + E[β] E max E[U|G ] · p = p∈A
Ω
=
Ω
=
U
p∈A
max u(ω) · p m(dω) + βG p∈A
max u · p μG (du) + βG p∈A
where the final equality follows from the change of variables formula. In addition, taking p = (1/|Z| 1/|Z|), we have u · p = 0 for all u ∈ U . Thus, letting A = {p} in the above equation, we have 1 1 βG = E[E[U|G ] · p] = E E[Uz |G ] = E[Uz ] |Z| z∈Z |Z| z∈Z To show that the μG and βG defined above are unique, consider any other μG and βG such that for all A ∈ A, max(u · p)μG (du) + βG = max(u · p)μG (du) + βG U p∈A
U p∈A
Taking p = (1/|Z| 1/|Z|) and letting A = {p}, the above equation implies βG = βG . This in turn implies that for any A ∈ A, max(u · p)μG (du) = max(u · p)μG (du) U p∈A
U p∈A
31 For example, fix any version U˜ of E[U|G ]. Letting u¯ be any element of U and letting · de 1 ˜ ˜ note the standard Euclidean norm on RZ , take β(ω) = |Z| z∈Z Uz (ω), α(ω) = U(ω)−β(ω)1, and ⎧ ˜ ⎨ U(ω) − β(ω)1 if α(ω) = 0, u(ω) = α(ω) ⎩ u¯ if α(ω) = 0.
It is a standard exercise
to check that α, β, and u so defined are G -measurable and integrable. 32 That is, m(E) = E α(ω)P(dω) for any E ∈ G , and μG (F) = m ◦ u−1 (F) = u−1 (F) α(ω)P(dω) for any Borel measurable set F ⊂ U .
1314
H. ERGIN AND T. SARVER
Since each side of this equality is what DLR referred to as an additive EU representation, we can apply their uniqueness result to conclude that μG = μG .33 APPENDIX C: PROOF OF THEOREM 3 In this section, we prove two results. We first prove a general representation theorem for preferences that may violate monotonicity and subsequently establish Theorem 3 as a special case. The following definition is a generalization of the RFCC representation to allow for signed measures. DEFINITION 5: A signed RFCC representation is a pair (M c) consisting of a compact set of finite signed Borel measures M on U and a lower semicontinuous function c : M → R such that V : A → R defined by Equation (7) represents and (i)–(iii) in Definition 2 are satisfied. The signed RFCC representation is of interest since it can be used to model a preference for commitment in conjunction with costly contemplation. A preference for commitment could arise if an individual experiences regret or temptation. See, for example, Sarver (2008) for a model of regret and Gul and Pesendorfer (2001) or Dekel, Lipman, and Rustichini (2008) for models of temptation and self-control. The representations considered in those papers are special cases of the singleton signed RFCC representation (i.e., the signed RFCC representation with a single measure). We conjecture that models that combine regret or temptation with costly contemplation could be represented in reduced form as special cases of the general signed RFCC representation. We leave the investigation of such models as an open question for future research. To allow for signed measures, we replace the monotonicity axiom with the following axiom introduced by DLR. AXIOM 8—Indifference to Randomization: For every A ∈ A, A ∼ co(A). Indifference to randomization (IR) is justified if the individual choosing from the menu A can also randomly select an alternative from the menu, for example, by flipping a coin. In that case, the menus A and co(A) offer the same set of options, and hence they are identical from the perspective of the individual. In this section, we prove the following theorem. THEOREM 8: (A) The preference has a signed RFCC representation if and only if it satisfies weak order, strong continuity, ACP, IDD, and IR. (B) The preference has an RFCC representation if and only if it satisfies weak order, strong continuity, ACP, IDD, and monotonicity. 33 This particular version of the uniqueness result for the additive EU representation can be found in Sarver (2008, Lemma 18) for the case where μG and μG are Borel probability measures. Extending the result to arbitrary finite Borel measures is trivial.
CONTEMPLATION REPRESENTATION
1315
Theorem 8(B) is simply a restatement of Theorem 3, and Theorem 8(A) characterizes the signed RFCC representation. Note also that the IR axiom is not included in Theorem 8(B) because it is implied by the other axioms (see Lemma 2 in Appendix C.1). The remainder of this section is devoted to the proof of Theorem 8. With the exception of L-continuity, the necessity of the axioms in Theorem 8 is straightforward and left to the reader. The proof of the necessity of L-continuity is contained in Section S.2 of the Supplemental Material. For the sufficiency direction, let Ac ⊂ A denote the collection of all convex menus. In both parts (A) and (B) of Theorem 8, satisfies IR. In part (A), IR is directly assumed, whereas in part (B) it is implied by the other axioms. Therefore, for all A ∈ A, A ∼ co(A) ∈ Ac . Note that for any u ∈ U , we have max u · p = max u · p p∈A
p∈co(A)
Thus, if we establish the representations in Theorem 8 for convex menus and then apply the same functional form to all of A, then by IR the resulting function represents on A. Note also that A is a compact metric space since (Z) is a compact metric space (see, e.g., Munkres (2000, pp. 280–281) or Theorem 1.8.3 in Schneider (1993, p. 49)). It is a standard exercise to show that Ac is a closed subset of A, and hence Ac is also compact (see Theorem 1.8.5 in Schneider (1993, p. 50)). In Section C.1, we construct a function V with certain desirable properties. In Section C.2, we apply the duality results from Section S.1 of the Supplemental Material to the function V , which completes the proof of the sufficiency part of Theorem 8. C.1. Construction of V We start by establishing a simple implication of the axioms introduced in the text. LEMMA 2: If satisfies weak order, continuity, ACP, and monotonicity, then it also satisfies IR. PROOF: Let A ∈ A. Monotonicity implies that co(A) A, and hence we only need to prove that A co(A). Let us inductively define a sequence of sets via A0 = A and Ak = 12 Ak−1 + 12 Ak−1 for k ≥ 1. ACP implies that Ak−1 Ak and therefore, by transitivity, A Ak for any k. It is straightforward to verify Q.E.D. that dh (Ak co(A)) → 0, so we have A co(A) by continuity.
1316
H. ERGIN AND T. SARVER
For proving our representation theorem, it will be useful to derive an alternative formulation of our IDD axiom. Before introducing this new axiom, we define the set of translations to be (12) θz = 0 Θ ≡ θ ∈ RZ : z∈Z
Any θ ∈ Θ can be thought of as a signed measure on Z such that θ(Z) = 0. For A ∈ A and θ ∈ Θ, define A + θ ≡ {p + θ : p ∈ A}. Intuitively, adding θ to A in this sense simply “shifts” A. Also, note that for any p q ∈ (Z), we have p − q ∈ Θ. We now give a formulation of IDD in terms of translations. AXIOM 9—Translation Invariance: For any A B ∈ A and θ ∈ Θ such that A + θ B + θ ∈ A, AB
⇒
A + θ B + θ34
LEMMA 3: The preference satisfies IDD if and only if it satisfies translation invariance (TI). PROOF: To see that TI implies IDD, assume that A B ∈ A, p q ∈ (Z) are such that λA + (1 − λ){q} λB + (1 − λ){q}. Let A = λA + (1 − λ){q}, B = λB + (1 − λ){q}, and θ = (1 − λ)(p − q). Note that θ ∈ Θ, A + θ = λA + (1 − λ){p} ∈ A, and B + θ = λA + (1 − λ){p} ∈ A. Hence, by TI, λA + (1 − λ){p} λB + (1 − λ){p}. To see that IDD implies TI, assume that A B ∈ A and θ ∈ Θ are such that A + θ B + θ ∈ A and A B. If θ = 0, the conclusion of TI holds triv− ially, so assume that θ = 0. Let Z − = {z ∈ Z : θz < 0}. Define θ+ , θ ∈ RZ by + − θ = max {0 θz } and θz = max {0 −θz } for any z ∈ Z. Then let κ ≡ z∈Z θz+ = z − z∈Z θz > 0. We will first show that for any r ∈ A ∪ B, (13)
0 ≤ rz − θz− ≤ 1 − κ
for all z ∈ Z
Note that for any z ∈ Z − , rz − θz− = rz + θz ≥ 0 since r + θ ∈ (Z). Note also that if z ∈ / Z − , then rz − θz− = rz ≥ 0 since θz− = 0. So for any z ∈ Z, − − − rz − θz ≤ 1 − θz − θz− 0 ≤ rz − θz ≤ 1 − z ∈Z − \{z}
z ∈Z − \{z}
= 1 − κ 34 Note that TI implies its converse. Suppose A + θ B + θ. Then by TI, A = (A + θ) + (−θ) (B + θ) + (−θ) = B.
CONTEMPLATION REPRESENTATION
1317
establishing Equation (13). Therefore, since θ = 0, we have 0 < κ ≤ 1. Then p ≡ κ1 θ+ , q ≡ κ1 θ− are in (Z), and θ = κ(p − q). There are two cases to consider: First, consider the case of κ < 1. Define subsets A and B of RZ by 1 (r − θ− ) for some r ∈ A A ≡ r ∈ R Z : r = 1−κ 1 (r − θ− ) for some r ∈ B B ≡ r ∈ RZ : r = 1−κ By Equation (13) and the definition of κ, we have that A B ∈ A and (14)
(1 − κ)A + κ{q} = A B = (1 − κ)B + κ{q}
Next, consider the κ = 1 case. By Equation (13) we have r = θ− = q for any r ∈ A ∪ B. Therefore, A = B = {q}, and hence Equation (14) holds for any choice of A B ∈ A. Since Equation (14) holds in each of the two cases above, we conclude by IDD that A + θ = (1 − κ)A + κ{p} (1 − κ)B + κ{p} = B + θ Therefore, TI is satisfied.
Q.E.D.
In light of Lemma 3, we will use IDD and TI interchangeably. Before proceeding, we define the following important subset of Ac : (15)
A◦ ≡ {A ∈ Ac : ∀θ ∈ Θ ∃α > 0 such that A + αθ ∈ Ac }
Thus A◦ contains menus that can be translated at least a little bit in the direction of any vector in Θ. It is easily verified that A◦ is convex. In addition, the following result gives an alternative characterization of A◦ along with some other important properties. LEMMA 4: The set A◦ has the following properties: (i) A◦ = {A ∈ Ac : ∃ε > 0 such that ∀p ∈ A ∀z ∈ Z pz ≥ ε}. (ii) Suppose p ∈ (Z) is such that pz > 0 for all z ∈ Z. Then for any A ∈ Ac and λ ∈ [0 1), λA + (1 − λ){p} ∈ A◦ . (iii) A◦ is dense in Ac . PROOF: (i) Let Aˆ ◦ ≡ {A ∈ Ac : ∃ε > 0 such that ∀p ∈ A ∀z ∈ Z pz ≥ ε}. To see that Aˆ ◦ ⊂ A◦ , take any A ∈ Aˆ ◦ and θ ∈ Θ. Let ε > 0 be such that pz ≥ ε for all p ∈ A and z ∈ Z. Choose α > 0 sufficiently small to ensure that α · maxz∈Z |θz | ≤ ε. Then pz + αθz ≥ pz − ε ≥ 0 for all p ∈ A and z ∈ Z, so A + αθ ∈ Ac . Thus A ∈ A◦ .
1318
H. ERGIN AND T. SARVER
To see that A◦ ⊂ Aˆ ◦ , take any A ∈ A◦ . Fix any z ∈ Z and take any θ ∈ Θ such that θz = −1. Then let αz > 0 be such that A + αz θ ∈ Ac , so for any p ∈ A, pz + αz θ = pz − αz ≥ 0. We obtain such an αz > 0 for every z ∈ Z, so let ε ≡ minz∈Z αz > 0. Then for any p ∈ A and z ∈ Z, pz ≥ αz ≥ ε, so A ∈ Aˆ ◦ . (ii) Let ε ≡ (1 − λ)(minz∈Z pz ) > 0. Then for any q ∈ A and z ∈ Z, λqz + (1 − λ)pz ≥ ε. Thus λA + (1 − λ){p} ∈ A◦ by part (i). (iii) It is easily verified that for any A ∈ Ac , (1 − 1/n)A + (1/n){p} → A as Q.E.D. n → ∞. Hence A◦ is dense in Ac by part (ii). We next define Lipschitz continuity. DEFINITION 6: Given a metric space (X d), a function f : X → R is Lipschitz continuous if there is some real number K such that for every x y ∈ X, |f (x) − f (y)| ≤ Kd(x y). The number K is called a Lipschitz constant of f . We will construct a function V : Ac → R that represents on Ac and has certain desirable properties. We next define the notion of translation linearity so as to present the main result of this section. Recall that the set of translations, denoted by Θ, is defined in Equation (12). DEFINITION 7: Suppose that V : Ac → R. Then V is translation linear if there exists v ∈ RZ such that for all A ∈ Ac and θ ∈ Θ with A + θ ∈ Ac , we have V (A + θ) = V (A) + v · θ. PROPOSITION 1: If the preference satisfies weak order, strong continuity, ACP, and IDD, then there exists a function V : Ac → R with the following properties: (i) For any A B ∈ Ac , A B ⇐⇒ V (A) ≥ V (B). (ii) V is Lipschitz continuous, convex, and translation linear. (iii) There exist p q ∈ (Z) such that V ({p}) > V ({q}). Moreover, if V and V are two functions that satisfy (ii)–(iii) and are ordinally equivalent in the sense that for any A B ∈ Ac , V (A) ≥ V (B) ⇐⇒ V (A) ≥ V (B), then there exist α > 0 and β ∈ R such that V = αV + β. First note that by taking the p∗ and p∗ from the L-continuity axiom, it follows that {p∗ } {p∗ }. Thus part (iii) of Proposition 1 follows from part (i). The proof of the rest of the proposition is in Section S.3 of the Supplemental Material. In the remainder of the current section, we present an outline of the proof. Intuitively, the assumptions of strong continuity, ACP, and IDD (equivalently TI) on play key roles in establishing Lipschitz continuity, convexity, and translation linearity of V , respectively. Let S ≡ {{p} : p ∈ (Z)} be the set all of singleton sets in Ac . Lemma S.5 in the Supplemental Material shows that given the assumptions of Proposition 1, satisfies the von Neumann–Morgenstern axioms on S . Therefore, there ex-
CONTEMPLATION REPRESENTATION
1319
ists v ∈ RZ such that for all p q ∈ (Z), {p} {q} if and only if v · p ≥ v · q. We will abuse notation and also treat v as a function v : S → R naturally defined by v({p}) = v · p. Note that v is translation linear since v({p} + θ) = v({p}) + v · θ whenever p ∈ (Z), θ ∈ Θ, and p + θ ∈ (Z). We want to extend v to a function V on Ac that represents and is translation linear. The outline of the construction of the desired extension is the following: We first restrict attention to menus in A◦ , as defined in Equation (15). This restriction allows us to make extensive use of the translation invariance (TI) property. We construct a sequence of subsets of A◦ , starting with A◦ ∩ S , such that each set is contained in its successor set. We then extend v sequentially to each of these domains, while still representing and preserving translation linearity (with respect to the vector v). The domain will grow to eventually contain all of the sets in A◦ , and we show how to extend it to all of Ac by continuity. Then we prove that the resulting function is translation linear, Lipschitz continuous, and convex. As above, take p∗ and p∗ from the L-continuity axiom, and let θ∗ ≡ p∗ − p∗ . Define a sequence A0 A0 A1 A1 of subsets of A◦ inductively as follows: Let A0 ≡ A◦ ∩ S . By part (i) of Lemma 4, we have that A0 = {{p} : p ∈ (Z) and ∀z ∈ Z pz > 0}. Define Ai for all i ≥ 0 by
Ai ≡ {A ∈ A◦ : A ∼ B for some B ∈ Ai } and define Ai for all i ≥ 1 by
Ai ≡ {A ∈ A◦ : A = B + αθ∗ for some α ∈ R B ∈ Ai−1 } Intuitively, we first extend A0 by including all A ∈ A◦ that are viewed with indifference to some B ∈ A0 . Then we extend to all translations by multiples of θ∗ . We repeat the process, alternating between extension by indifference and extension by translation. Note that A0 ⊂ A0 ⊂ A1 ⊂ A1 ⊂ · · ·. Figure 3 illustrates this construction for the special case of Z = {z1 z2 }. In this case, closed and convex menus of lotteries over Z can be represented as ordered pairs in the triangle in Figure 3 (see the discussion in Section 5.2). In this figure, the set A0 is the diagonal of the triangle, and the set A0 is the region labeled I. The combination of regions I and II is the set A1 , and the combination of regions I, II, and III is the set A1 . One could continue in this fashion to obtain the remaining sets A2 A2 We also define a sequence of functions V0 V0 V1 V1 from these domains. That is, for all i ≥ 0, Vi : Ai → R and Vi : Ai → R. Define these functions recursively as follows: (i) Let V0 ≡ v|A0 . (ii) For i ≥ 0, if A ∈ Ai , then A ∼ B for some B ∈ Ai , so define Vi by Vi (A) ≡ Vi (B). (iii) For i ≥ 1, if A ∈ Ai , then A = B + αθ∗ for some α ∈ R and B ∈ Ai−1 , so (B) + α(v · θ∗ ). define Vi by Vi (A) ≡ Vi−1
1320
H. ERGIN AND T. SARVER
FIGURE 3.—Construction of Ai and Ai .
In a series of lemmas in Section S.3 in the Supplemental Material, we show that these are well defined functions which represent on their domains and are translation linear. C.2. Application of Duality Results In this section, we apply the duality results from Section S.1 of the Supplemental Material to the function V constructed in Section C.1 to obtain the desired signed RFCC representation. Thus, in the remainder of this section assume that V satisfies (i)–(iii) from Proposition 1. Note that if also satisfies monotonicity, then V is monotone in the sense that for all A B ∈ Ac such that A ⊂ B, we have V (A) ≤ V (B). We explicitly assume monotonicity of V at the end of this section to prove the stronger representation of Theorem 8(B). We follow a construction similar to that in DLR to obtain from V a function W whose domain is the set of support functions. Let U be defined as in Equation (6). For any A ∈ Ac , the support function σA : U → R of A is defined by σA (u) = maxp∈A u · p. For a more complete introduction to support functions, see Rockafellar (1970) or Schneider (1993). Let C(U ) denote the set of continuous real-valued functions on U . When endowed with the supremum norm · ∞ , C(U ) is a Banach space. Define an order ≥ on C(U ) by f ≥ g if f (u) ≥ g(u) for all u ∈ U . Let Σ = {σA ∈ C(U ) : A ∈ Ac }. For any σ ∈ Σ, let p ∈ (Z) : u · p = uz pz ≤ σ(u) Aσ = u∈U
z∈Z
LEMMA 5: (i) For all A ∈ Ac and σ ∈ Σ, A(σA ) = A and σ(Aσ ) = σ. Hence, σ is a bijection from Ac to Σ.
CONTEMPLATION REPRESENTATION
1321
(ii) For all A B ∈ Ac , σλA+(1−λ)B = λσA + (1 − λ)σB (iii) For all A B ∈ Ac , dh (A B) = σA − σB ∞ PROOF: These are standard results that can be found in Rockafellar (1970) or Schneider (1993).35 For instance, in Schneider (1993), part (i) follows from Theorem 1.7.1, part (ii) follows from Theorem 1.7.5, and part (iii) follows from Theorem 1.8.11. Q.E.D. LEMMA 6: Σ is convex and compact, and 0 ∈ Σ. PROOF: The set Σ is convex by the convexity of Ac and part (ii) of Lemma 5. As discussed above, the set Ac is compact, and hence by parts (i) and (iii) of Lemma 5, Σ is a compact subset of the Banach space C(U ). Also, if we take q = (1/|Z| 1/|Z|) ∈ (Z), then q · u = 0 for all u ∈ U . Thus σ{q} = 0, and hence 0 ∈ Σ. Q.E.D. Define the function W : Σ → R by W (σ) = V (Aσ ). Then, by part (i) of Lemma 5, V (A) = W (σA ) for all A ∈ Ac . We say the function W is monotone if for all σ σ ∈ Σ such that σ ≤ σ , we have W (σ) ≤ W (σ ). LEMMA 7: W is convex and Lipschitz continuous with the same Lipschitz constant as V . If V is monotone, then W is monotone. PROOF: To see that W is convex, let A B ∈ Ac . Then W (λσA + (1 − λ)σB ) = W σλA+(1−λ)B = V (λA + (1 − λ)B) ≤ λV (A) + (1 − λ)V (B) = λW (σA ) + (1 − λ)W (σB ) by parts (i) and (ii) of Lemma 5 and convexity of V . The function W is Lipschitz continuous with the same Lipschitz constant as V by parts (i) and (iii) of Lemma 5. The function W inherits monotonicity from V because of the following fact which is easy to see from part (i) of Lemma 5: For all A B ∈ Ac , Q.E.D. A ⊂ B if and only if σA ≤ σB . We denote the set of continuous linear functionals on C(U ) (the dual space of C(U )) by C(U )∗ . It is well known that C(U )∗ is the set of finite signed Borel measures on U , where the duality is given by f μ = f (u)μ(du) U
35 The standard setting for support functions is the set of nonempty closed and convex subsets of Rn . However, by imposing our normalizations on the domain of the support functions U , the standard results are easily adapted to our setting of nonempty closed and convex subsets of (Z).
1322
H. ERGIN AND T. SARVER
for any f ∈ C(U ) and μ ∈ C(U )∗ .36 For σ ∈ Σ, the subdifferential of W at σ is defined to be ∂W (σ) = {μ ∈ C(U )∗ : σ − σ μ ≤ W (σ ) − W (σ) for all σ ∈ Σ} The conjugate (or Fenchel conjugate) of W is the function W ∗ : C(U )∗ → R ∪ {+∞} defined by W ∗ (μ) = sup[σ μ − W (σ)] σ∈Σ
There is an important duality between a convex function and its conjugate. We discuss this duality in detail in Section S.1 of the Supplemental Material. Lemma 8 summarizes certain properties of W ∗ that will be used in the sequel. Lemma S.1 provides a proof of these properties for general convex functions. LEMMA 8: (i) W ∗ is lower semicontinuous in the weak* topology. (ii) W (σ) ≥ σ μ − W ∗ (μ) for all σ ∈ Σ and μ ∈ C(U )∗ . (iii) W (σ) = σ μ − W ∗ (μ) if and only if μ ∈ ∂W (σ). We next define ΣW , NW , and MW as in Equations (S.3), (S.4), and (S.5), respectively, from Section S.1 of the Supplemental Material: ΣW = {σ ∈ Σ : ∂W(σ) is a singleton}
NW = {μ ∈ C(U )∗ : μ ∈ ∂W(σ) for some σ ∈ ΣW } M W = NW where the closure is taken with respect to the weak* topology. We now apply Theorem S.1 in the Supplemental Material to the current setting. LEMMA 9: MW is weak* compact, and for any weak* compact M ⊂ C(U )∗ ,
MW ⊂ M
⇐⇒
W (σ) = max[σ μ − W ∗ (μ)] μ∈M
∀σ ∈ Σ
PROOF: We simply need to verify that C(U ), Σ, and W satisfy the assumptions of Theorem S.1, that is, (i) C(U ) is a separable Banach space, (ii) Σ is a closed and convex subset of C(U ) containing the origin such that span(Σ) is dense in C(U ), and (iii) W : Σ → R is Lipschitz continuous and convex. Since U 36 Since U is a compact metric space, by the Riesz representation theorem (see Royden (1988, p. 357)), each continuous linear functional on C(U ) corresponds uniquely to a finite signed Baire measure on U . Since U is a locally compact separable metric space, the Baire sets and the Borel sets of U coincide (see Royden (1988, p. 332)). Hence the sets of Baire and Borel finite signed measures also coincide.
CONTEMPLATION REPRESENTATION
1323
is a compact metric space, C(U ) is separable (see Theorem 8.48 of Aliprantis and Border (1999)). By Lemma 6, Σ is a closed and convex subset of C(U ) containing the origin. Although the result is stated slightly differently, it is shown in Hörmander (1954) that span(Σ) is dense in C(U ). This result is also proved in DLR. Finally, W is Lipschitz continuous and convex by Lemma 7. Q.E.D. One consequence of Lemma 9 is that for all σ ∈ Σ, W (σ) = max [σ μ − W ∗ (μ)] μ∈MW
Therefore, for all A ∈ Ac , ∗ max(u · p)μ(du) − W (μ) V (A) = max μ∈MW
U p∈A
The function W ∗ is lower semicontinuous by part (i) of Lemma 8, and MW is compact by Lemma 9. It remains only to show that MW is consistent and minimal, and that monotonicity of W implies each μ ∈ MW is positive. Since V is translation linear, there exists v ∈ RZ such that for all A ∈ Ac and θ ∈ Θ with A + θ ∈ Ac , we have V (A + θ) = V (A) + v · θ. The following result shows that a certain subset of MW must agree with v in a way that will imply the consistency of this subset. In what follows, let q = (1/|Z| 1/|Z|) ∈ (Z) and let A◦ ⊂ Ac be defined as in Equation (15). LEMMA 10: If A ∈ A◦ and μ ∈ ∂W(σA ), then σ{p} μ = v · (p − q) for all p ∈ (Z). PROOF: Fix any A ∈ A◦ and μ ∈ ∂W(σA ). We can apply the definition of the support function to θ ∈ Θ, so that σ{θ} (u) = u · θ for u ∈ U . It is easily verified that for any A ∈ Ac and θ ∈ Θ, σA+θ = σA + σ{θ} . We first prove that σ{θ} μ = v · θ for all θ ∈ Θ. Fix any θ ∈ Θ. Since A ∈ A◦ , there exists α > 0 such that A + αθ A − αθ ∈ Ac . By the translation linearity of V , we have α(v · θ) = V (A + αθ) − V (A) = W (σA+αθ ) − W (σA ) Since μ ∈ ∂W(σA ), by part (iii) of Lemma 8, W (σA ) = σA μ − W ∗ (μ). Also, by part (ii) of the same lemma, W (σA+αθ ) ≥ σA+αθ μ − W ∗ (μ). Therefore, we have ! " ! " α(v · θ) ≥ σA+αθ μ − σA μ = σ{αθ} μ = α σ{θ} μ A similar argument can be used to show that ! " −α(v · θ) = W (σA−αθ ) − W (σA ) ≥ −α σ{θ} μ
1324
H. ERGIN AND T. SARVER
Hence, we have α(v · θ) = ασ{θ} μ or, equivalently, v · θ = σ{θ} μ. We now prove that σ{p} μ = v · (p − q) for all p ∈ (Z). Since z uz = 0 for u ∈ U , we have u · q = 0 for all u ∈ U . Clearly, this implies that σ{q} = 0, so that σ{q} μ = 0. For any p ∈ (Z), p − q ∈ Θ, so the above results imply " ! " ! " ! " ! σ{p} μ = σ{p−q} μ + σ{q} μ = σ{p−q} μ = v · (p − q) which completes the proof.
Q.E.D.
By part (ii) of Lemma 4, if q = (1/|Z| 1/|Z|), then λA + (1 − λ){q} ∈ A◦ for any A ∈ Ac and λ ∈ (0 1). Therefore, we can use Lemma 10 and the continuity of W to prove the consistency of MW . LEMMA 11: If μ ∈ MW , then σ{p} μ = v · (p − q) for all p ∈ (Z). PROOF: Define M ⊂ MW by # ! " $ M ≡ μ ∈ MW : σ{p} μ = v · (p − q) for all p ∈ (Z) It is easily verified that M is a closed subset of MW and is therefore compact. We want to show MW ⊂ M, which would imply M = MW . By Lemma 9, we only need to verify that W (σ) = maxμ∈M [σ μ − W ∗ (μ)] for all σ ∈ Σ. Let σ ∈ Σ be arbitrary. For all λ ∈ (0 1), we have λAσ + (1 − λ){q} ∈ A◦ . Note that σλAσ +(1−λ){q} = λσ(Aσ ) + (1 − λ)σ{q} = λσ. Therefore, Lemma 10 implies that for all λ ∈ (0 1), MW ∩ ∂W(λσ) ⊂ M. By Lemma 9, there exists μ ∈ MW such that W (λσ) = λσ μ − W ∗ (μ), which implies μ ∈ ∂W(λσ) by part (iii) of Lemma 8. Thus, MW ∩ ∂W(λσ) = ∅. Take any net {λd }d∈D such that λd → 1, and let σd ≡ λd σ, so that σd → σ. From the above arguments, for all d ∈ D there exists μd ∈ MW ∩ ∂W(σd ) ⊂ M. Without Since M is weak* compact, every net in M has a convergent subnet. w∗ loss of generality, suppose the net itself converges, so that μd → μ for some μ ∈ M. By the definition of the subdifferential and the continuity of W , for any σ ∈ Σ, σ − σ μ = limσ − σd μd d
≤ lim[W (σ ) − W (σd )] d
= W (σ ) − W (σ) which implies μ ∈ ∂W(σ).37 Hence, W (σ) = σ μ − W ∗ (μ) by part (iii) of Lemma 8. Since σ ∈ Σ was arbitrary, this completes the proof. Q.E.D. 37 To establish the first equality in this equation, note that {μd }d∈D is norm bounded by the compactness of M and Alaoglu’s theorem (see Theorem 6.25 in Aliprantis and Border (1999)).
CONTEMPLATION REPRESENTATION
1325
The consistency of MW follows immediately from Lemma 11 since for any μ μ ∈ MW and p ∈ (Z), we have ! " (u · p)μ(du) = σ{p} μ = v · (p − q) U ! " = σ{p} μ = (u · p)μ (du) U
We now prove the minimality of MW . LEMMA 12: MW is minimal. PROOF: Suppose M ⊂ MW is compact and (M W ∗ |M ) still represents . We will show that this implies M = MW . Define V : Ac → R as in Equation (7) for the representation (M W ∗ |M ), and define W : Σ → R by W (σ) = V (Aσ ). Then W (σ) = max [σ μ − W ∗ (μ)] μ∈M
for all σ ∈ Σ. Note that V satisfies (i)–(iii) from Proposition 1. Lipschitz continuity and translation linearity follow from Lemma S.2 in the Supplemental Material, and the other properties are immediate. Therefore, by the uniqueness part of Proposition 1, there exist α > 0 and β ∈ R such that V = αV + β, which implies W = αW + β. By singleton nontriviality, there exist p∗ p∗ ∈ (Z) such that {p∗ } {p∗ }. Therefore, by Lemma 11, for any μ ∈ MW , ! " ! " ! " σ{p∗ } − σ{p∗ } μ = σ{p∗ } μ − σ{p∗ } μ = v · (p∗ − p∗ ) > 0 We can therefore apply Proposition S.1 from the Supplemental Material with Q.E.D. x¯ = σ{p∗ } − σ{p∗ } to conclude that M = MW . Thus, MW is minimal. We have now completed the proof of Theorem 8(A). To complete the proof of Theorem 8(B), note that C(U ) is a Banach lattice (see Aliprantis and Border (1999, p. 302)) and Σ has the property that σ ∨ σ ∈ Σ for all σ σ ∈ Σ. Therefore, by Theorem S.2 from the Supplemental Material, if W is monotone, then each μ ∈ MW is positive. Thus, there exists K > 0 such that μd ≤ K for all d ∈ D. Therefore, |σ − σ μ − σ − σd μd | ≤ |σ − σ μ − σ − σ μd | + |σ − σ μd − σ − σd μd | = |σ − σ μ − μd | + |σd − σ μd | ≤ |σ − σ μ − μd | + σd − σμd ≤ |σ − σ μ − μd | + σd − σK w∗
The right side of this inequality converges to zero since μd → μ and σd → σ.
1326
H. ERGIN AND T. SARVER
APPENDIX D: PROOF OF THEOREM 2 D.1. CC ⇒ RFCC Assume there exists a CC representation ((Ω F P) G U c) such that V is given by Equation (4). Then the restriction of V to Ac is monotone and satisfies (ii) and (iii) in Proposition 1 in Appendix C.1. It is easy to see that V is monotone, convex, and translation linear, and that there exist p q ∈ (Z) such that V ({p}) > V ({q}). It remains only to show that V is Lipschitz continuous. Note that K = z∈Z E[|Uz |] > 0 is finite since U is integrable. Let · denote the usual Euclidean norm in RZ . Let G ∈ G and define fG : A → R by fG (A) = E max E[U|G ] · p − c(G ) p∈A
Let A B ∈ A. Given a state ω ∈ Ω, let p∗ be a solution of maxp∈A E[U|G ](ω) · p. By definition of Hausdorff distance, there exists q∗ ∈ B such that p∗ −q∗ ≤ dh (A B). Then max E[U|G ](ω) · p − max E[U|G ](ω) · q q∈B
p∈A
∗
= E[U|G ](ω) · p − max E[U|G ](ω) · q q∈B
∗
≤ E[U|G ](ω) · p − E[U|G ](ω) · q∗ ≤ E[U|G ](ω) × p∗ − q∗ ≤ E[U|G ](ω) × dh (A B) Taking the expectation of the above inequality we obtain % & fG (A) − fG (B) ≤ E E[U|G ] dh (A B) where
& E E[U|G ] ≤ E |E[Uz |G ]| ≤ E E[|Uz ||G ] %
z∈Z
=
z∈Z
E[|Uz |] = K
z∈Z
Hence fG is Lipschitz continuous with a Lipschitz constant K that does not depend on G . Since V is the pointwise maximum of fG over G ∈ G, it is also Lipschitz continuous with the same Lipschitz constant K. Since the restriction of V to Ac is monotone and satisfies (ii) and (iii) in Proposition 1 in Appendix C.1, the construction in Appendix C.2 implies that there exists an RFCC representation such that V (A) is given by Equation (7)
CONTEMPLATION REPRESENTATION
1327
for all A ∈ Ac . Since V (A) = V (co(A)) for all A ∈ A (which follows immediately from Equation (4)), this implies that V (A) is given by Equation (7) for all A ∈ A. D.2. RFCC ⇒ CC We begin by establishing a result in probability theory that will be useful laterin the proof. Given a finite set N = {1 n}, let (N) = {α ∈ [0 1]N : i∈N αi = 1} denote the simplex over N. In the following discussion, we will always assume without explicit mention that N is endowed with its discrete algebra consisting of all subsets of N and that (N) is endowed with the Borel σ-algebra B induced by its Euclidean metric. The integral of an ndimensional variable is used as a shorthand for the n-tuple of integrals of each dimension of the variable. Suppose for a moment that the set N is a state space. Consider an individual who has uncertainty about the state i ∈ N and observes a noisy signal that gives her additional information about i (a statistical experiment). Blackwell (1951, 1953) conveniently represented such a signal through the distribution over posterior beliefs over N that it induces.38 The next result establishes the converse of this approach by representing a collection of probability measures over beliefs over N satisfying a certain consistency condition as conditional probabilities resulting from statistical experiments. More specifically, Lemma 13 shows that for any collection of probability measures {πd }d∈D on (N) having the same mean, there exists a probability space (Ω F P) with the properties that (i) the state space is of the form Ω = N × Λ and (ii) for each d ∈ D there exists a sub-σ-algebra Gd of F such that the random vector (P({i} × Λ|Gd ))i∈N , denoting the posterior over N conditional on Gd , is distributed according to πd .39 LEMMA 13: Let N = {1 n} and let Λ = [(N)]D for some an arbitrary index set D. Let F denote the product σ-algebra on N × Λ and let G = {Gd : d ∈ D} where each Gd denotes the sub-σ-algebra of F consisting of events measurable with respect to the dth coordinate only, that is, # $ Gd = N × E × [(N)]D\{d} ∈ F : E ∈ B for each d ∈ D. Let {πd }d∈D be any collection of probability measures on (N) that satisfies the following consistency condition for some α ∈ (N): βπd (dβ) = α ∀d ∈ D (N)
38 This approach is also used extensively in papers on mechanism design with information acquisition. For instance, see Bergemann and Välimäki (2002, 2006) and Persico (2000). 39 Blackwell (1951) gave a proof of this result for the special case where there is only a single measure π and where α = ( n1 n1 ).
1328
H. ERGIN AND T. SARVER
Then there exists a probability measure P on (N × Λ F ) such that the following statements hold: (i) The marginal of P on N agrees with α, that is, P({i} × Λ) = αi for all i ∈ N. (ii) The marginal of P on the dth coordinate of Λ agrees with πd , that is, P N × E × [(N)]D\{d} = πd (E) ∀E ∈ B (iii) For any d ∈ D, the random vector X d : N × Λ → (N) defined by X (j λ) = λ(d) for all (j λ) ∈ N × Λ satisfies d
P({i} × Λ|Gd ) = Xid P-almost surely for all i ∈ N.40 PROOF: We first define a probability measure Pd (·|i) on (N) for each i ∈ N and d ∈ D. If αi = 0, fix the probability measure Pd (·|i) arbitrarily. If αi > 0, then let 1 βi πd (dβ) Pd (E|i) = αi E for all E ∈ B . The consistency condition on {πd }d∈D and α implies that each Pd (·|i) is a probability measure. By Theorem 4.4.6 in Dudley (2002), for each i ∈ N and nonempty finite subset D ⊂ D, there exists a unique product proba' bility measure d∈D Pd (·|i) on [(N)]D and its associated product σ-algebra. By the Kolmogorov extension theorem (see, e.g., Corollary 14.27 in Aliprantis and Border (1999)), there exists a unique extension P(·|i) of these finite product probability measures to Λ = [(N)]D and its associated product σ-algebra. Define the probability measure P on (N × Λ F ) by P(F) = αi P {λ ∈ Λ : (i λ) ∈ F}|i i∈N
for all F ∈ F . The marginal of P on N agrees with α by definition. Also, for any d ∈ D, i ∈ N, and E ∈ B , P {i} × E × [(N)]D\{d} = αi P E × [(N)]D\{d} |i (16) = αi Pd (E|i) = βi πd (dβ) E
Note that Gd is the σ-algebra generated by the signal X d . Thus, conditional on observing the signal X d , the posterior over the first dimension of the state space is almost surely equal to the realization of the signal itself. For the case where α = ( n1 n1 ), Blackwell (1951, 1953) refered to the distribution of such a signal as a standard measure. 40
CONTEMPLATION REPRESENTATION
1329
Summing Equation (16) over i ∈ N implies that the marginal of P on the dth coordinate of Λ agrees with πd : P {i} × E × [(N)]D\{d} P N × E × [(N)]D\{d} = i∈N
= =
βi πd (dβ)
i∈N
E
E
i∈N
βi πd (dβ) = πd (E)
To verify the final claim of the lemma, fix any d ∈ D and i ∈ N. Then, for any G = N × E × [(N)]D\{d} ∈ Gd , Xid (j λ)P(dj dλ) = λi (d)P(dj dλ) = βi πd (dβ) G
G
E
= P {i} × E × [(N)] = P ({i} × Λ) ∩ G
D\{d}
where the second equality follows from the second claim of the lemma and the third equality follows from Equation (16). Hence, the claim holds by definition of conditional probability.41 Q.E.D. Using these results, we now complete the proof of the RFCC ⇒ CC part of Theorem 2. Let N = {1 n} for n = |Z|. Assume that there exists an RFCC representation (M c) such that V is given by Equation (7). Since M is compact, there is κ > 0 such that μ(U ) ≤ κ for all μ ∈ M. The set κU is compact and (n − 1)-dimensional, which implies there exist affinely independent vectors v1 vn ∈ RZ such that κU ⊂ co({v1 vn }). By affine independence of v1 vn , for all u ∈ co({v1 vn }), there exist unique coefficients (barycentric coordinates) γ(u) = (γ1 (u) γn (u)) ∈ (N) such that u = γ1 (u)v1 + · · · + γn (u)vn . The mapping γ : co({v1 vn }) → (N) is a continuous bijection. In the first step of the proof, we transform each measure μ ∈ M into a probability measure πμ over (N) such that the following statements hold: (i) For every μ ∈ M and A ∈ A, i (17) max u(p)μ(du) = max βi v · pπμ (dβ) U p∈A
(N) p∈A
i∈N
41 By the definition of conditional probability, a random
variable Y is a version of P(F|Gd ) for F ∈ F if (i) Y is Gd -measurable and integrable and (ii) G Y (j λ)P(dj dλ) = P(F ∩ G) for all G ∈ Gd (see, e.g., Billingsley (1995, p. 430)).
1330
H. ERGIN AND T. SARVER
(ii) There exists α ∈ (N) such that for every μ ∈ M, (18) βπμ (dβ) = α (N)
To interpret Equation (17), suppose that vi is the individual’s expected-utility function over (Z) conditional on state i ∈ N. In period 1, the individual is uncertain about her posterior belief β = (β1 βn ) over N. In period 2, she chooses p ∈ A, maximizing her ex post expected utility i∈N βi vi determined by her posterior belief β. She believes that β is distributed according to πμ , and hence the term on the right-hand side of the first condition is her expected utility before β is realized. Equation (18) corresponds to the consistency requirement that her prior belief, given by the expected value of the posterior belief, is the same for any probability measure in the collection {πμ }μ∈M . Take any μ ∈ M. To define πμ , first consider the probability measure μ˜ on μ(U )U defined by μ(E) ˜ = μ(1U ) μ( μ(1U ) E) for any measurable E ⊂ μ(U )U .42 By a simple change of variables, we have (19) max(u · p)μ(du) = max(v · p)μ(dv) ˜ U p∈A
μ(U )U p∈A
The above equation reinterprets the integral expression in the RFCC representation in a probabilistic sense by rescaling the utility functions in U , where the rescaling coefficient depends on the particular measure μ. Recall that μ(U )U ⊂ co(κU ) ⊂ co({v1 vn }). By affine independence of v1 vn , each point in co({v1 vn }) can be uniquely expressed as a convex combination of the vertices v1 vn . We can therefore interpret each such point as a probability measure on N where the probability of i ∈ N is given by the coefficient of vi in the unique convex combination. Hence, the probability measure μ˜ can be identified with a probability measure πμ over (N) defined by πμ = μ˜ ◦ γ −1 . Figure 4 illustrates this construction for the case where n = 3. It is easy to see that Equation (17) is satisfied for every A ∈ A.43 In addition, letting α = (N) βπμ (dβ), we have44 (20) αi v i = βi vi πμ (dβ) = uμ(du) i∈N
(N)
i∈N
U
42 Note that μ(U ) > 0 since μ is positive and the RFCC representation satisfies consistency and singleton nontriviality. 43 To see this, define a continuous function g : (N) → R by g(β) = maxp∈A ( i∈N βi vi ) · p.
˜ by πμ = μ˜ ◦ γ −1 and the change of variables Then (N) g(β)πμ (dβ) = μ(U )U (g ◦ γ)(v)μ(dv) formula. This implies Equation (17) by g(γ(v)) = maxp∈A v · p and Equation (19). 44 To see the second equality, consider the continuous function g : (N) → RZ defined by
˜ by πμ = μ˜ ◦ γ −1 and the g(β) = i∈N βi vi . Then (N) g(β)πμ (dβ) = μ(U )U (g ◦ γ)(v)μ(dv) change of variables formula. This implies Equation (20) by g(γ(v)) = v and Equation (19).
CONTEMPLATION REPRESENTATION
1331
FIGURE 4.—Construction of the distribution over posteriors.
In particular, α = γ( U uμ(du)) is independent of the particular choice of μ by the consistency of the measures in M. Therefore, Equation (18) is also satisfied for all μ ∈ M. We next use Lemma 13 to express the probability measures {πμ }μ∈M over (N) as distributions over posteriors resulting from statistical experiments. Let Ω = N × Λ, where Λ, F , and G are as in Lemma 13 with D = M and N = {1 n} for n = |Z|. That is, Λ = [(N)]M , F is the product σ-algebra on Ω = N × [(N)]M , and G = {Gμ : μ ∈ M}, where # $ Gμ = N × E × [(N)]M\{μ} ∈ F : E ∈ B for each μ ∈ M. By Equation (18), the collection {πμ }μ∈M satisfies the consistency condition of Lemma 13, so there exists a probability measure P on (Ω F ) such that the following statements hold: (i) P({i} × Λ) = αi for all i ∈ N. (ii) P(N × E × [(N)]M\{μ} ) = πμ (E) for all E ∈ B and μ ∈ M. (iii) For any μ ∈ M, the random vector X μ : Ω → (N) defined by μ X (j λ) = λ(μ) for all (j λ) ∈ Ω satisfies P({i} × Λ|Gμ ) = Xiμ P-almost surely for all i ∈ N. Let U : Ω → RZ be defined by U(i λ) = vi for every i ∈ N and λ ∈ Λ. Fix any μ ∈ M. Defining X μ : Ω → (N) by X μ (j λ) = λ(μ) for all (j λ) ∈ Ω, condition (iii) on the measure P implies that (21)
E[U|Gμ ] =
i∈N
P({i} × Λ|Gμ )vi =
i∈N
Xiμ vi
1332
H. ERGIN AND T. SARVER
P-almost surely for (j λ) ∈ Ω.45 Therefore, for any A ∈ A, i E max E[U|Gμ ] · p = (22) max λi (μ)v · p P(dj dλ) p∈A
Ω
p∈A
= =
(N)
i∈N
i max βi v · p πμ (dβ) p∈A
i∈N
max u(p)μ(du)
U p∈A
where the first equality follows from Equation (21), the second equality follows from condition (ii) on the measure P, and the third equality follows from ˜ Gμ ) = c(μ), we have estabEquation (17). By Equation (22) and defining c( lished that V can be expressed as ˜ G) V (A) = max E max E[U|G ] · p − c( G ∈G
p∈A
giving the desired CC representation. APPENDIX E: PROOF OF THEOREM 4 In this section, we show that the uniqueness asserted in Theorem 4 applies not only to the RFCC representation, but to any signed RFCC representation (see Definition 5 in Appendix C). Throughout this section, we will continue to use the notation for support functions that was introduced in Appendix C.2. Suppose (M c) and (M c ) are two signed RFCC representations for . Let V : Ac → R and V : Ac → R be defined as in Equation (7) for these respective representations, and define W : Σ → R and W : Σ → R by W (σ) = V (Aσ ) and W (σ) = V (Aσ ). We first show that M = MW and c = W ∗ |MW , and likewise for (M c ). To see this, first note that by the definitions of V and W , we have W (σ) = max[σ μ − c(μ)] μ∈M
∀σ ∈ Σ
Therefore, by Theorem S.3 in the Supplemental Material and the compactness of Σ, W is Lipschitz continuous and convex, MW ⊂ M, and W ∗ (μ) = c(μ) for all μ ∈ MW . By Lemma 9 and the minimality of M, this implies M = MW and c = W ∗ |MW . It is easily verified that both V and V satisfy (i)–(iii) from Proposition 1. Therefore, by the uniqueness part of Proposition 1, there exist α > 0 and β ∈ R 45 The first equality can be seen by applying Example 34.2 of Billingsley (1995, p. 446) to each coordinate of U, since Uz is the simple function i∈N vzi I{i}×Λ for each z ∈ Z.
CONTEMPLATION REPRESENTATION
1333
such that V = αV − β. This implies that W = αW − β. For any μ ∈ C(U )∗ and σ σ ∈ Σ, note that W (σ ) − W (σ) ≥ σ − σ μ ⇐⇒
W (σ ) − W (σ) ≥ σ − σ αμ
and hence ∂W (σ) = α ∂W (σ). In particular, ΣW = ΣW and NW = αNW . Taking closures, we also have that MW = αMW . Since from our earlier arguments M = MW and M = MW , we conclude that M = αM. Finally, let μ ∈ M. Then c (αμ) = sup[σ αμ − W (σ)] = α sup[σ μ − W (σ)] + β σ∈Σ
σ∈Σ
= αc(μ) + β where the first and last equalities follow from our earlier findings that c = W ∗ |MW and c = W ∗ |MW . This concludes the proof of the theorem. APPENDIX F: PROOFS OF RESULTS FROM SECTION 4 F.1. Proof of Theorem 5 (i) ⇒ (ii) Suppose 1 has a lower cost of contemplation than 2 . For any p q ∈ (Z), taking A = {q} in Definition 3 yields V2 ({q}) ≥ V2 ({p}) ⇒ V1 ({q}) ≥ V1 ({p}). Since the restrictions of V1 and V2 to singleton menus are nonconstant affine functions, it is a standard result that this condition implies there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z) (see, e.g., Corollary B.3 of Ghirardato, Maccheroni, and Marinacci (2004)). The preference 2 was assumed to be bounded above by singletons. Thus, there exists z ∈ Z such that {δz } 2 A for all A ∈ A. It is also easy to verify that V2 being affine on singletons implies there exists some z ∈ Z such that V2 ({p}) ≥ V ({δz }) for all p ∈ (Z). Combined with monotonicity, this implies A 2 {δz } for all A ∈ A. Fix any A ∈ A. Since {δz } 2 A 2 {δz }, continuity implies there exists λ ∈ [0 1] such that A ∼2 {λδz + (1 − λ)δz }. Since 1 has a lower cost of contemplation than 2 , this implies A 1 {λδz + (1 − λ)δz }. Therefore, V2 (A) = V2 {λδz + (1 − λ)δz } = αV1 {λδz + (1 − λ)δz } + β ≤ αV1 (A) + β (ii) ⇒ (i) Suppose there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z), and V2 ≤ αV1 + β. Then A 2 {p} implies αV1 (A) + β ≥ V2 (A) ≥ V2 ({p}) = αV1 ({p}) + β
1334
H. ERGIN AND T. SARVER
which implies A 1 {p}. Thus, 1 has a lower cost of contemplation than 2 . (ii) ⇒ (iii) Suppose there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z), and V2 ≤ αV1 + β. Then, for any μ ∈ M, c2∗ (αμ) = max(ασA μ − V2 (A)) A∈A
≥ max(ασA μ − αV1 (A) − β) A∈A
= α max(σA μ − V1 (A)) − β = αc1∗ (μ) − β A∈A
(iii) ⇒ (ii) Suppose there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z), and c2∗ (αμ) ≥ αc1∗ (μ) − β for all μ ∈ M. Fix any A ∈ A. Since c2∗ (μ) = c2 (μ) for all μ ∈ M2 ,46 it follows from the definition of V2 that there exists μ ∈ M2 ⊂ M such that V2 (A) = σA μ − c2∗ (μ) ∗ 1 μ +β ≤ σA μ − αc1 α ( ) 1 ∗ 1 = α σA μ − c1 μ +β α α ≤ αV1 (A) + β where the last inequality follows from the definition of c1∗ . F.2. Proof of Corollary 1 Define V by LEMMA 14: Let ((Ω F P) G U c) be any CC representation. 1 Equation (4), define c ∗ by Equation (9), and let γ = |Z| z∈Z E[Uz ]. Then the following statements hold: (i) c ∗ (μG ) ≤ c(G ) − γ for any G ∈ G. (ii) c ∗ (μG ) = c(G ) − γ if and only if G solves Equation (4) for some A ∈ A. PROOF: (i) Note first that for any A ∈ A and G ∈ G, V (A) ≥ E max E[U|G ] · p − c(G ) = σA μG + γ − c(G ) p∈A
where the inequality follows from Equation (4) and the equality follows from Lemma 1. Therefore, σA μG − V (A) ≤ c(G ) − γ, implying by the definition of c ∗ that c ∗ (μG ) ≤ c(G ) − γ. 46 That c = c ∗ |M for any RFCC representation (M c) follows from the observations made in Appendix E regarding conjugate convex functionals since c ∗ is precisely the restriction of the functional W ∗ to the set M.
CONTEMPLATION REPRESENTATION
1335
(ii) Suppose that G ∈ G solves Equation (4) for some A ∈ A. Then, by Equation (4) and Lemma 1, V (A) = σA μG + γ − c(G ). Along with the definition of c ∗ , this implies that c ∗ (μG ) ≥ σA μG − V (A) = c(G ) − γ. By part (i), we have c ∗ (μG ) = c(G ) − γ. Conversely, suppose that c ∗ (μG ) = c(G ) − γ. Then taking A ∈ A such that σA μG − V (A) = c ∗ (μG ), Lemma 1 implies V (A) = σA μG + γ − c(G ) = E max E[U|G ] · p − c(G ) p∈A
Thus, G solves Equation (4) for the menu A.
Q.E.D.
We now complete the proof of Corollary 1. Note for each i = 1 2, by Theorem 2, there exists an RFCC representation such that Vi is given by Equation (7). Therefore, the implications (i) ⇔ (ii) and (ii) ⇒ (iii) follow from Theorem 5. To see (iii) ⇒ (ii), suppose there exist α > 0 and β ∈ R such that V2 ({p}) = αV1 ({p}) + β for all p ∈ (Z), and c2∗ (μG2 ) ≥ αc1∗ ( α1 μG2 ) − β for all G2 ∈ G2 . Let 1 P2 A ∈ A and define β2 = |Z| z∈Z E [U2z ]. By Equation (4), there exists G2 ∈ G2 such that V2 (A) = EP2 max EP2 [U2 |G2 ] · p − c2 (G2 ) p∈A
" ! = σA μG2 + β2 − c2 (G2 ) ! " ≤ σA μG2 − c2∗ μG2 ! " 1 μG2 + β ≤ σA μG2 − αc1∗ α ( ) 1 1 = α σA μG2 − c1∗ μG2 +β α α ≤ αV1 (A) + β where the second equality follows from Lemma 1, the first inequality follows from part (i) of Lemma 14, the second inequality follows from our assumptions on c1∗ and c2∗ , and the last inequality follows from the definition of c1∗ . APPENDIX G: PROOFS OF RESULTS FROM SECTION 5 G.1. Proof of Theorem 6 In this section, we show that for any signed RFCC representation (see Definition 5 in Appendix C)—in particular, for any RFCC representation—strong IDD is equivalent to a constant cost function. The necessity of strong IDD is straightforward and left to the reader. For sufficiency, suppose V is defined by
1336
H. ERGIN AND T. SARVER
Equation (7) for a signed RFCC representation (M c) for the preference and that satisfies strong IDD. Lemma S.15 in the Supplemental Material shows that for any A ∈ A, p ∈ (Z), and α ∈ [0 1],47 (24)
V (αA + (1 − α){p}) = αV (A) + (1 − α)V ({p})
As in Appendix C.2, define W : Σ → R by W (σ) = V (Aσ ). By Equation (24) and parts (i) and (ii) of Lemma 5, for any A ∈ A, p ∈ (Z), and α ∈ [0 1], W ασA + (1 − α)σ{p} = αW (σA ) + (1 − α)W σ{p} It was shown in Appendix E that for any signed RFCC representation (M c), defining W as we have here gives M = MW and c = W ∗ |MW . In particular, W satisfies (25)
W (σ) = max [σ μ − W ∗ (μ)] μ∈MW
Therefore, it suffices to show that W ∗ is constant on MW . Let w¯ = minμ∈MW W ∗ (μ). Note that this minimum is well defined since W ∗ is lower semicontinuous and MW is compact. Let μ¯ ∈ MW be a minimizing measure, ¯ so that W ∗ (μ) ¯ = w. We first show that W ∗ (μ) = w¯ for all μ ∈ NW . Let μ ∈ NW be arbitrary. By the definition of NW and Lemma 8, there exists some A ∈ A such that μ is the unique maximizer of Equation (25) at σA . That is, W (σA ) = σA μ − W ∗ (μ) > σA μ − W ∗ (μ ) for any μ ∈ MW , μ = μ. Now, for any p ∈ (Z) and α ∈ (0 1), choose μ ∈ MW that maximizes Equation (25) at ασA + (1 − α)σ{p} . Then αW (σA ) + (1 − α)W σ{p} = W ασA + (1 − α)σ{p} " ! = ασA + (1 − α)σ{p} μ − W ∗ (μ ) 47 To have an intuition for Equation (24), suppose there exist alternatives z z ∈ Z such that {δz } A {δz } for any A ∈ A. It is easy to see that under this simplifying assumption, every menu is indifferent to a singleton menu. It is also easily verified that for any signed RFCC representation, the consistency of the measures implies that V is affine on singleton menus:
(23)
V (α{q} + (1 − α){p}) = αV ({q}) + (1 − α)V ({p}) ∀p q ∈ (Z)
Let A ∈ A, p ∈ (Z), and α ∈ [0 1]. By the simplifying assumption, there exists q ∈ (Z) such that A ∼ {q}. Then V (αA + (1 − α){p}) = V (α{q} + (1 − α){p}) (by strong IDD) = αV ({q}) + (1 − α)V ({p}) (by Equation (23)) = αV (A) + (1 − α)V ({p})
CONTEMPLATION REPRESENTATION
1337
= α[σA μ − W ∗ (μ )] %! & " + (1 − α) σ{p} μ − W ∗ (μ ) Since σA μ −W ∗ (μ ) ≤ W (σA ) and σ{p} μ −W ∗ (μ ) ≤ W (σ{p} ), the above equation implies that we must in fact have σA μ − W ∗ (μ ) = W (σA ) and σ{p} μ − W ∗ (μ ) = W (σ{p} ). By the choice of A, the former implies μ = μ. Therefore, the latter implies " " ! ! ¯ σ{p} μ − W ∗ (μ) = W σ{p} ≥ σ{p} μ¯ − w ¯ Since w¯ is Consistency implies σ{p} μ = σ{p} μ ¯ and therefore W ∗ (μ) ≤ w. ∗ ¯ the minimum of W on MW , we have W ∗ (μ) = w. The proof is completed by showing that W ∗ (μ) = w¯ for all μ ∈ MW . If w∗ μ ∈ MW , then there exists a net {μd }d∈D in NW such that μd → μ. Since each ¯ Since W ∗ is lower μd is in NW , our previous arguments imply that W ∗ (μd ) = w. ∗ ∗ ¯ Since w¯ is minisemicontinuous, it follows that W (μ) ≤ lim infd W (μd ) = w. ¯ mal, we have W ∗ (μ) = w. G.2. Proof of Corollary 2 (ii) ⇒ (i) Let G = {G ∈ G : c(G ) ≤ k} and let c : G → R be any constant function. Then ((Ω F P) G U c ) is a CC representation for . Hence, Theorem 1 implies that satisfies weak order, strong continuity, ACP, and monotonicity. It is easily verified that since c is constant, also satisfies strong IDD. (i) ⇒ (ii) First, apply Theorem 3 to conclude that has an RFCC representation (M c). Then, by Theorem 6, strong IDD implies that c is constant. The˜ that gives the orem 2 implies there is a CC representation ((Ω F P) G U c) same value function for menus V as the RFCC representation (M c). Moreover, since c is constant, it is immediate from the construction in the proof of Theorem 2 that c˜ can be taken to be constant. If we choose k ∈ R larger than this constant value, then the function V defined by Equation (11) for these parameters represents . G.3. Proof of Corollary 3 If : The necessity of weak order, strong continuity, and monotonicity follows from Theorem 1. Since c(F ) = minG ∈G c(G ), F is an optimal contemplation strategy for any menu. Thus, for any A ∈ A, V (A) = E max E[U|F ] · p − c(F ) p∈A
This implies that V is affine, and hence satisfies independence.
1338
H. ERGIN AND T. SARVER
Only if : By Theorem 7, has an RFCC representation (M c) in which M = {μ} for some finite Borel measure μ. Since μ is positive, we can define a Borel probability measure on U by P = μ(μU ) . Let Ω = U , let F be the Borel σ-algebra on U , and let G = {F }. If we define U : U → RZ by U(u) = u, then for any A ∈ A, 1 max u(p)μ(du) E max E[U|F ] · p = E max U · p = p∈A p∈A μ(U ) U p∈A Therefore, taking c(F ) to be any real number, we have the desired CC representation for . REFERENCES ALIPRANTIS, C., AND K. BORDER (1999): Infinite Dimensional Analysis. Berlin, Germany: Springer-Verlag. [1323-1325,1328] BERGEMANN, D., AND J. VÄLIMÄKI (2002): “Information Acquisition and Efficient Mechanism Design,” Econometrica, 70, 1007–1033. [1327] (2006): “Information in Mechanism Design,” in Proceedings of the 9th World Congress of the Econometric Society, ed. by R. Blundell, W. Newey, and T. Persson. Cambridge: Cambridge University Press, 186–221. [1327] BILLINGSLEY, P. (1995): Probability and Measure. New York: Wiley. [1312,1329,1332] BLACKWELL, D. (1951): “Comparisons of Experiments,” in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley and Los Angeles: University of California Press, 93–102. [1327,1328] (1953): “Equivalent Comparisons of Experiments,” The Annals of Mathematical Statistics, 24, 265–272. [1327,1328] CONLISK, J. (1996): “Why Bounded Rationality,” Journal of Economic Literature, 34, 669–700. [1310] DEKEL, E., B. LIPMAN, AND A. RUSTICHINI (2001): “Representing Preferences With a Unique Subjective State Space,” Econometrica, 69, 891–934. [1289] (2008): “Temptation-Driven Preferences,” Review of Economic Studies, 76, 937–971. [1292,1314] DEKEL, E., B. LIPMAN, A. RUSTICHINI, AND T. SARVER (2007): “Representing Preferences With a Unique Subjective State Space: Corrigendum,” Econometrica, 75, 591–600. [1290] DUDLEY, R. (2002): Real Analysis and Probability. Cambridge: Cambridge University Press. [1328] EPSTEIN, L. G., M. MARINACCI, AND K. SEO (2007): “Coarse Contingencies and Ambiguity,” Theoretical Economics, 2, 355–394. [1310] ERGIN, H. (2003): “Costly Contemplation,” Mimeo, Washington University in Saint Louis. [1289, 1293,1300] ERGIN, H., AND T. SARVER (2010): “Supplement to ‘A Unique Costly Contemplation Representation’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ ecta/Supmat/7801_proofs.pdf. [1288] GHIRARDATO, P., F. MACCHERONI, AND M. MARINACCI (2004): “Differentiating Ambiguity and Ambiguity Attitude,” Journal of Economic Theory, 118, 133–173. [1333] GILBOA, I., AND D. SCHMEIDLER (1989): “Maxmin Expected Utility With Non-Unique Priors,” Journal of Mathematical Economics, 18, 141–153. [1304,1310] GREEN, J. (1987): “‘Making Book Against Oneself,’ the Independence Axiom and Nonlinear Utility Theory,” The Quarterly Journal of Economics, 102, 785–796. [1306]
CONTEMPLATION REPRESENTATION
1339
GUL, F., AND W. PESENDORFER (2001): “Temptation and Self-Control,” Econometrica, 69, 1403–1435. [1292,1314] HÖRMANDER, L. (1954): “Sur la Fonction d’Appui des Ensembles Convexes dans un Espace Localement Convexe,” Arkiv för Matematik, 3, 181–186. [1323] HYOGO, K. (2007): “A Subjective Model of Experimentation,” Journal of Economic Theory, 133, 316–330. [1300] KARNI, E. (1993): “A Definition of Subjective Probabilities With State-Dependent Preferences,” Econometrica, 61, 187–198. [1296] KREPS, D. (1979): “A Representation Theorem for Preference for Flexibility,” Econometrica, 47, 565–578. [1289,1292,1293,1296] LIPMAN, B. (1991): “How to Decide How to Decide How to : Modeling Limited Rationality,” Econometrica, 59, 1105–1125. [1310] (1995): “Information Processing and Bounded Rationality: A Survey,” The Canadian Journal of Economics, 28, 42–67. [1311] MACCHERONI, F., M. MARINACCI, AND A. RUSTICHINI (2006): “Ambiguity Aversion, Robustness, and the Variational Representation of Preferences,” Econometrica, 74, 1447–1498. [1288,1292, 1304] MAS-COLELL, A. (1986): “The Price Equilibrium Existence Problem in Topological Vector Lattices,” Econometrica, 54, 1039–1054. [1290] MOSSIN, J. (1969): “A Note on Uncertainty and Preferences in a Temporal Context,” The American Economic Review, 59, 172–174. [1306] MUNKRES, J. (2000): Topology. Upper Saddle River, NJ: Prentice Hall. [1315] OLSZEWSKI, W. (2007): “Preferences Over Sets of Lotteries,” Review of Economic Studies, 74, 567–595. [1308] PERSICO, N. (2000): “Information Acquisition in Auctions,” Econometrica, 68, 135–148. [1327] ROCKAFELLAR, R. T. (1970): Convex Analysis. Princeton, NJ: Princeton University Press. [1320, 1321] ROYDEN, H. L. (1988): Real Analysis. Englewood Cliffs, NJ: Prentice Hall. [1322] SARVER, T. (2008): “Anticipating Regret: Why Fewer Options May Be Better,” Econometrica, 76, 263–305. [1290,1292,1314] SCHNEIDER, R. (1993): Convex Bodies: The Brunn–Minkowski Theory. Cambridge: Cambridge University Press. [1315,1320,1321]
Dept. of Economics, Washington University in Saint Louis, Campus Box 1208, Saint Louis, MO 63130, U.S.A.; [email protected] and Dept. of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208, U.S.A.; [email protected]. Manuscript received March, 2008; final revision received February, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1341–1373
FOUNDATIONS OF INTRINSIC HABIT FORMATION BY KAREEN ROZEN1 We provide theoretical foundations for several common (nested) representations of intrinsic linear habit formation. Our axiomatization introduces an intertemporal theory of weaning a decision-maker from her habits using the device of compensation. We clarify differences across specifications of the model, provide measures of habit-forming tendencies, and suggest methods for axiomatizing time-nonseparable preferences. KEYWORDS: Time-nonseparable preferences, linear habit formation, weaning, compensated separability, gains monotonicity.
But now I’m just about fed up to the teeth with relativity! Even such a thing pales when one is too occupied with it. Albert Einstein2
1. INTRODUCTION DOES ONE’S VALUATION for a good depend on the amount or frequency in which it is consumed? Is an increase in consumption always beneficial, even if it is only temporary? Because questions such as these cannot be properly addressed in the standard time-separable model of choice, the economic literature has seen a surge in models of habit formation, which permit an individual’s consumption history (her intrinsic habit) to affect her enjoyment of present and future consumption.3 While these models have had success in accounting for notable phenomena, the literature has been unable to come to a consensus on how to formulate habit formation, and in some cases, the predictions of the most commonly used models disagree.4 Related to this difficulty is the scarcity 1 I am grateful to a co-editor and anonymous referees for advice that greatly improved this paper. I am indebted to Roland Benabou, Wolfgang Pesendorfer, and especially Eric Maskin for their guidance during this paper’s development. I also thank Dilip Abreu, Dirk Bergemann, Faruk Gul, Giuseppe Moscarini, Jonathan Parker, Ben Polak, Michael Rothschild, Larry Samuelson, Ron Siegel, and numerous seminar participants for helpful comments. This paper is based on the first chapter of my doctoral dissertation at Princeton University. 2 In a 1921 letter, published in Buchwald et al. (2009, p. 13). 3 We distinguish intrinsic habits from extrinsic habits such as Abel (1990)’s “catching up with the Joneses” effect, which captures a form of consumption externality or envy. 4 Variants of the intrinsic linear habit formation model we axiomatize have been used, for example, in Constantinides (1990) to shed light on equity-premium puzzle data indicating individuals are far more averse to risk than expected; in Boldrin, Christiano, and Fisher’s (2001) model of real business cycles with habit formation and intersectoral inflexibilities, to explain why consumption growth is connected strongly to income but only weakly to interest rates; and in Uribe (2002), to explain the consumption contractions seen before exchange rate stabilization programs collapse. While intrinsic linear habits is the most common model, other researchers posit habits that affect the discount factor, as in Shi and Epstein (1993), or that are nonlinear. In Carroll, Overland, and Weil (2000), a linear habit aggregator divides consumption; this model was criticized by Wendner (2003) for having counterintuitive implications for consumption growth.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7302
1342
KAREEN ROZEN
of theoretical work examining the underpinnings of such preferences. Though there is a rich literature on static reference dependence, there is little understanding of dynamic settings where the reference point changes endogenously, as is the case in habit formation.5 By clarifying the implications for choice behavior, such work would help illuminate why one model of habit formation might be more reasonable than another or why the commonly used incarnations are reasonable at all. We contribute to the literature in that theoretical vein. We formulate a theory of history-dependent intertemporal choice, describing a decision-maker (DM) by a family of continuous preference relations over future consumption, each corresponding to a possible consumption history. Our representative DM is dynamically consistent given her consumption history, can be weaned from her habits using special streams of compensation, and satisfies a separability axiom appropriate for time-nonseparable preferences. Though she is fully rational, her history-dependent behavior violates the axioms of Koopmans (1960), upon which the standard theory of discounted utility rests. Our theory lays the foundation for the model of linear habit formation, in which a DM evaluates consumption at each point in time with respect to a reference point that is generated linearly from her consumption history. Suppose the DM’s time-0 habit is h = ( h3 h2 h1 ), where hk denotes consumption k periods ago. If she consumes the stream c = (c0 c1 c2 ), her time-t habit will be h(t) = (h c0 c1 ct−1 ), where h(t) k denotes consumption k periods prior to time t. The DM then evaluates the stream c using the utility function ∞ ∞ (t) t Uh (c) = δ u ct − λk hk t=0
k=1
In this model, the time-t habit (h c0 c1 ct−1 ) that results from consuming c under initial habit h is aggregated into the DM’s period-t reference point by taking a weighted average using the habit formation coefficients {λk }k≥1 . These coefficients satisfy a geometric decay property ensuring that the influence of past consumption fades over time. Several variations of this model are prevalent in the applied literature. We provide axiomatic foundations for this general formulation and some common specializations, clarifying the behavioral differences across the nested models and providing measures of habit-forming tendencies. This paper is related to a growing literature on forward-looking habit formation, beginning with the seminal work of Becker and Murphy (1988) on rational 5
We contribute to this axiomatic literature, particularly Neilson (2006), which specifies the first component of a bundle as the reference point. By contrast, we do not assume a particular reference point, but derive an infinite sequence of endogenously changing reference points.
1343
INTRINSIC HABIT FORMATION
addiction. Although Koopmans (1960) uncovered foundations for intertemporally separable preferences, this literature has not found axiomatic foundations for a structured model of habit-forming preferences over consumption streams, such as those used in applied work. Rustichini and Siconolfi (2005) proposed axiomatic foundations for a model of dynamically consistent habit formation which, unlike this paper, does not offer a particular structure for the utility or form of habit aggregation. Gul and Pesendorfer (2007) studied self-control problems by considering preferences over menus of consumption streams of addictive goods, rather than over the streams themselves. Shalev (1997) provided a foundation for a special case of loss aversion, which, like the classical representation in Tversky and Kahneman (1991), is time-inconsistent. Our representation can accommodate a dynamically consistent model of loss aversion where the period utility takes the well known S-shaped form. Such a model resolves various anomalies of intertemporal choice: as Camerer and Loewenstein (2004) noted, many effects “are consistent with stable, uniform, time discounting once one measures discount rates with a more realistic utility function.” We present the framework in Section 2 and the main axioms in Section 3. We discuss the main representation theorem in Section 4. In Sections 5–7 we examine the behavioral implications of some common restrictions on the model. 2. THE FRAMEWORK The DM faces an infinite-horizon decision problem in which a single habitforming good is consumed in every period t ∈ N = {0 1 2 } from the same set Q = R+ . A consumption level q ∈ Q may be interpreted as a choice of either quantity or quality of the good. The DM chooses an infinite consumption stream c = (c0 c1 c2 ) from the ∞ set of bounded consumption streams C = {c ∈ t=0 Q| supt ct < ∞} where ct is the consumption level prescribed for t periods into the future. Date 0 is always interpreted to be the current date. We consider C as a metric subspace ∞ ∞ of t=0 Q endowed with the product metric ρC (c c ) = t=0 1/2t · |ct − ct |/(1 + |ct − ct |).6 The DM’s preferences over the space of consumption streams C depend on her consumption history, her habit. The set of possible habits is time-invariant 1 and given by the space of bounded streams H = {h ∈ k=∞ Q| supk hk < ∞}; each habit h ∈ H is an infinite stream denoting prior consumption and is written as h = ( h3 h2 h1 ), where hk denotes the consumption level of the
×
×
×
×
∞
Since t=0 Q endowed with ρC is a topologically separable metric space, so is C when viewed as a metric subspace. Ensuring C is separable allows us to concentrate on the structural elements of habit formation. Alternatively, we could impose separability directly as in Rustichini and Siconolfi (2005). Bleichrodt, Rohde, and Wakker (2008) is representative of a literature that concentrates on relaxing assumptions about the consumption space, including separability. 6
1344
KAREEN ROZEN
DM k periods ago. We endow the space H with the sup metric ρH (h h ) = supk |hk − hk |. The DM knows that her future tastes will be influenced by her consumption history. Starting from any given initial habit h ∈ H, consuming the stream c ∈ C results in the date-t habit (h c0 c1 ct−1 ). Consequently, the DM’s habit, and therefore her preferences, may undergo an infinite succession of changes endogenously induced from her choice of consumption stream. The DM’s preferences given a habit h ∈ H are denoted by h and are defined on the consumption space C. Each such preference is a member of the family = {h }h∈H . We assume the DM’s preference depends on consumption history but not on calendar time. Our setup explicitly presumes histories are infinite because this assumption is commonly invoked in the literature. Alternatively, one may assume that the DM’s preferences are affected only by her last K ≥ 3 consumption levels.7 The notation in our analysis would remain the same as long as current and future habits are truncated after K components; that is, (h c0 ) would denote the habit (hK−1 h2 h1 c0 ). Finally, while our framework is one of riskless choice, the analysis can be extended immediately to lotteries over consumption streams by imposing the axioms of von Neumann and Morgenstern (1944) on lotteries and our axioms on the degenerate lotteries. We collect here some useful notation. We reserve k ∈ {1 2 3 } to signify a period of previous consumption and reserve t ∈ {0 1 2 } to signify a period of impending consumption. The notation c + c (or h + h ) refers to usual vector addition. As is customary, t c denotes (ct ct+1 ct+2 ) and c t denotes (c0 c1 ct ). If c ∈ C, we write (c t t+1 c + c ) to denote (c0 c1 ct ct+1 + c0 ct+2 + c1 ). For α ∈ R, we use the shorthand αt to signify the t-period repetition (α α α) and use (c t t+1 c + α) to compactly denote (c0 c1 ct ct+1 + α ct+2 + α ) whenever the resulting stream is in C. At times it will be convenient to let hq denote the habit (h q) that forms after consuming q under habit h (similarly for hc t ). The zero habit ( 0 0) ¯ Finally, h ≥ h (or c ≥ c ) means hk ≥ h for all k (or ct ≥ c for is denoted 0. k t all t), with at least one strict inequality. 3. THE MAIN AXIOMS This section presents axioms of choice behavior that are necessary and sufficient for a linear habit formation representation. The following axioms are imposed for all h ∈ H. The first three are familiar in the theory of rational choice over consumption streams, and the fourth is a simple technical condition to ensure that the DM’s preferences are nondegenerate. As usual, h denotes the asymmetric part of the binary relation h . 7
K ≥ 3 is required only for the proof of time-additivity.
1345
INTRINSIC HABIT FORMATION
AXIOM PR—Preference Relation: h is complete and transitive. AXIOM C—Continuity: For all c ∈ C, {c : c h c} and {c : c h c } are open. AXIOM DC—Dynamic Consistency: For any q ∈ Q and c c ∈ C, (q c) h (q c ) if and only if c hq c . AXIOM S—Sensitivity: There exist c ∈ C and α > 0 such that c + α ∼h c. Axioms PR and C together require that the DM’s choices are derived from a continuous preference relation, ensuring a continuous utility representation on our separable space. Axiom DC further assumes that the DM’s preferences are dynamically consistent in a history-dependent manner, in the sense that given the relevant histories, she will not change her mind tomorrow about the consumption stream she chooses today. Axiom DC is weak enough to accommodate a number of observed time-discounting anomalies, but strong enough to ensure that dynamic programming techniques can be used to solve the DM’s choice problem and that her welfare can be analyzed unambiguously.8 Axiom S is a nondegeneracy condition that is much weaker than monotonicity, which we address in Section 5. It allows for the possibility that due to habit formation, the DM is worse off under a uniform increase in consumption. Our main structural axiom, habit compensation, provides a revealed preference theory of “weaning” a DM from her habits. Define the set of ordered pairs of consumption histories H = {(h h) ∈ H ×H|h ≤ h}. We say that habits (h h) ∈ H agree on k if hk = hk . Similarly, the habits (h h) ∈ H agree on a subset of indices K ⊂ {1 2 } if they agree on each k ∈ K. The axiom has three parts, two of which play central roles. The first says that for any ordered pair of habits, there is a decreasing “compensating stream” that compensates the DM for having the higher habit. The second says that if a compensating stream received in the future compensates for variations in prior consumption, then preferences over current consumption are independent of future consumption.
AXIOM HC—Habit Compensation: There is a mapping (h h) → d h h from ordered pairs (h h) ∈ H to strictly positive consumption streams satisfying: (i) Weaning. Each d h h is a weakly decreasing stream and uniquely satisfies c h c
if and only if
c + d h h h c + d h h
∀c c ∈ C
8 Without Axiom DC, it becomes more difficult to interpret the DM’s choices for the future and discuss welfare implications of her choices. The DM’s choice may need to be modeled using an equilibrium concept, as in Köszegi and Rabin’s (2009) model of dynamic reference dependence: the utility over sequences of consumption and beliefs there is technically consistent, but beliefs must be determined rationally in a personal equilibrium (see also Köszegi and Rabin (2006)).
1346
KAREEN ROZEN
ˆ c¯ ∈ C, t ≥ 0, and h ≤ hc t hcˆt , (ii) Compensated Separability. For any c c t h hct t c d h cˆt d h hcˆ if and only if t t t c c¯ + d h hc h cˆt c¯ + d h hcˆ ˆ (h h) ∈ H that agree on k, ˆ (iii) Independence of Irrelevant Habits. For any k, and q ∈ Q, if ˆ ˆ hk if k = k, hk if k = k, ˆ ˆ and hk = hk = ˆ q if k = k, q if k = kˆ
ˆ ˆ
then d h h = d h h .
Formally, Axiom HC(i) says there exists a unique compensating stream d h h such that when the DM is endowed with d h h at the higher habit h, her choice behavior at h is identical to her choice behavior at the lower habit h , without this endowment.9 As illustrated in Figure 1, the indifference curves for the lower habit h are thus translated up by the stream d h h into indifference curves for the higher habit h.10 An increase in habit has similar effects as a
FIGURE 1.—HC(i) applied to an h indifference curve on (c0 c1 ), for given 2 c. 9 Given the existence of compensating streams, uniqueness corresponds to a regularity or nondegeneracy condition on preferences for any fixed habit: if compensation is not unique for some ˆ then for every h ≥ h, ˆ there are nonzero c¯ = c¯¯ ∈ C such that for any c c ∈ C, we have pair (hˆ h), c + c¯ h c + c¯ if and only if c + c¯¯ h c + c¯¯ . As the representation theorem shows, this rules out period utilities that are essentially periodic functions. 10 While not evident in Figure 1, the pictured indifference curves correspond to the same utility levels under their respective habits; hence the analogy to Hicksian wealth compensation.
INTRINSIC HABIT FORMATION
1347
change in intertemporal prices, and just as Hicks (1939) uses wealth compensation to identify income and substitution effects, HC(i) uses intertemporal consumption compensation to disentangle habit formation and time preference. In particular, it allows us to elicit the DM’s changing reference points by identifying the baseline consumption stream inducing her to behave as if she has a lower habit. Because she is compensated with the habit-forming good, the DM’s need for compensation could theoretically grow over time. The re quirement that each stream d h h is weakly decreasing formalizes the sense in which the DM can be weaned: she receives the highest compensation today, because the effect of her habit today is sufficiently stronger than it will be tomorrow. Axiom HC(ii) is a generalization of separability for time-nonseparable preferences. Suppose a DM with habit h compares streams having one of two possible consumption paths for periods 0 through t: c t or cˆt . Suppose further that whichever initial consumption path is chosen, she will be appropriately t compensated starting in t + 1 to behave as if she has habit h (with d h hc or t d h hcˆ , respectively). Then she will evaluate any t + 1-continuation path c¯ from a habit-h perspective. Axiom HC(ii) says that with such compensation starting in period t + 1, the DM’s choice among streams that share a common t + 1continuation path is determined only by the consumption path from 0 to t. Hence the future becomes “separable” from the past. Axiom HC(iii) ensures that if (h h) ∈ H agree on some k, then the compensation needed to wean the DM from h to h is independent of the period-k habit level. Thus, an element of a habit that is unchanged does not affect weaning. Finally, we require two additional technical conditions on the DM’s initial level of compensation. These conditions concern the strength of the DM’s memory and rule out degenerate representations of the preferences we seek. First, we require that the initial compensation needed for a habit goes to zero as that habit becomes more distant in memory: that is, for any habit h ∈ H t ¯ we have limt→∞ d00h0 = 0.11 In counterpoint, the second condition states that for any fixed prior date of consumption, we can find two habits that differ widely enough on that date to generate any initial level of compensation: that is, for any q > 0 and k, there exist (h h) ∈ H that agree on N \ {k} and satisfy d0h h = q. We say the DM’s memory is nondegenerate if these two conditions hold. AXIOM NDM—Nondegenerate Memory: The DM’s memory is nondegenerate. 11
This is required only for histories of infinite length.
1348
KAREEN ROZEN
4. THE MAIN REPRESENTATION THEOREM We now present our main theorem, which characterizes the preferences that satisfy our axioms of habit formation. The utility representation obtained is a dynamically consistent model of intrinsic linear habit formation that has featured prominently in the applied literature. The representation theorem requires an acyclicity condition on period utilities. We say that u : R → R is cyclic if there is γ > 0 such that u(x + γ) = u(x) for all x ∈ R, and is acyclic otherwise. THEOREM 1—Main Representation: The family of preference relations satisfies Axioms PR, C, DC, S, HC, and NDM if and only if there exist a discount factor δ ∈ (0 1), habit formation coefficients {λk }k≥1 ∈ R, and a period utility u : R → R such that for every h ∈ H, h can be represented by ∞ ∞ (t) with h(t) = (h c0 c1 ct−1 ) (1) δt u c t − λk hk Uh (c) = t=0
k=1
where the habit formation coefficients {λk }k≥1 are unique and satisfy (2)
λk ∈ (0 1) and
λk+1 ≤ 1 − λ1 λk
for all k ≥ 1
∞ which implies that k=1 λk ≤ 1. The period utility u(·) is continuous, acyclic, bounded, and unique up to positive affine transformation. The proof is given in Appendix A. In Section 4.1, we examine why the representation satisfies Axiom HC, which provides insight into our constructive proof. The representation may be seen as a model of dynamic reference dependence: the linear habit aggregator ϕ : H → R, which is defined by ˆ = ϕ(h)
∞
λk hˆ k
k=1
determines the reference point against which date-t consumption is evaluated. The representation has two main features. First, given a history h, the DM transforms each consumption stream c into a habit-adjusted consumption stream (c0 − ϕ(h) c1 − ϕ(h c0 ) c2 − ϕ(h c0 c1 ) ) We denote this transformation ∞by g(h c). The DM then applies a discounted “outer utility” U ∗ , given by t=0 δt u(·), to evaluate the habit-adjusted stream. The DM’s utility Uh over consumption streams is then U ∗ (g(h ·)). Because the habit formation coefficients in Theorem 1 are positive, the representation implies that utility is history dependent. If the DM’s history is assumed to be
INTRINSIC HABIT FORMATION
1349
finite and of length K, only the first K habit formation coefficients would be positive. A standard discounted utility maximizer, for whom all the habit formation coefficients would equal zero, would satisfy all our axioms if the compensating streams were identically zero. We may include the standard model by relaxing Axiom HC to include the possibility that all the compensating streams are identically zero, but we avoid doing so to simplify exposition. The other restrictions in this representation are the boundedness and acyclicity requirements on the period utility. Boundedness ensures that product continuity is preserved and acyclicity ensures the uniqueness of compensation. If the period utility is cyclic (for example, the sine function), then we cannot pin down the transformation of the DM’s preferences from one habit to another and, therefore, we lose uniqueness of compensation.12 However, cyclic functions would not fall into the class of period utilities regularly considered in economic models. Theorem 1 may also be viewed as a foundation for a log-linear representation
∞ ct δt u Uh (c) = ψ(h(t) ) t=0 where
ˆ = ψ(h)
∞ k=1
λ hˆ kk
and
λk+1 ≤ 1 − λ1 λk
if we reinterpret the framework so that the DM cares about, and forms habits over, consumption growth rates instead of consumption levels.13 That is, the DM forms habits over logarithms of past consumption levels ( log h2 log h1 ) and has preferences over streams of logarithms of consumption (log c0 log c1 ), under the assumption that consumption is bounded away from zero. The DM must then be compensated in terms of rates of consumption growth rather than in consumption levels. 4.1. Why the Representation Satisfies Axiom HC Consider a DM who can be represented as in Theorem 1. Why does this DM satisfy Axiom HC and how do her compensating streams look? 12 This is also true for a more general class of quasicyclic functions (for which there exist α and β γ > 0 such that u(x + γ) = βu(x) + α for all x), but they are ruled out by boundedness unless β = 1 and α = 0. For example, if the period utility is linear, then we cannot uniquely identify compensation either: indeed, the DM’s choice behavior would be observationally equivalent to that in a model without habit formation. 13 Such a model was proposed by Kozicki and Tinsley (2002) and is appealing in light of Wendner (2003), who showed the counterintuitive implications of a common model in which the argument of the period utility is current consumption divided by a linear habit stock.
1350
KAREEN ROZEN
Consider any ordered pair of habits (h h) ∈ H. At time t, the DM’s period utility is u(ct − ϕ(h c t−1 )) if she has habit h , while it is u(ct − ϕ(h c t−1 )) if she has habit h. A simple relationship between these period utilities is obtained by adding and subtracting ϕ(h c t−1 ): (3) u(ct − ϕ(h c t−1 )) = u ct + [ϕ(h c t−1 ) − ϕ(h c t−1 )] − ϕ(h c t−1 ) Since the habit aggregator ϕ(·) is strictly increasing and linear, the bracketed term ϕ(h c t−1 ) − ϕ(h c t−1 ) is strictly positive and equal to ϕ(h − h 0t ). Axiom HC(i) says that whenever the DM is endowed with d h h at habit h, her utility from any stream c is the same as her utility from c under the lower habit h , without compensation. We use (3) to construct d h h as follows. At h h time 0, we provide the DM with the amount d0 = ϕ(h − h ). As seen from (3), the DM’s period utility from consuming c0 + d0h h under habit h at time 0 is the same as her period utility from consuming c0 under habit h . To construct d1h h , we must take into account that the DM was compensated with the habit forming good: her actual time-0 consumption level under h in (3) is c0 + d0h h . The bracketed term in (3) is then d1h h = ϕ(h − h ϕ(h − h )). Continuing in this manner, at time t the compensating stream d h h compensates for the original difference in habits as well as for compensation provided prior to t. Formally, d h h has the recursive form (4) ϕ(h − h ) ϕ(h − h ϕ(h − h )) ϕ h − h ϕ(h − h ) ϕ(h − h ϕ(h − h )) where ϕ is linear. In the Appendix we prove the characterization of compensation in (4) from the axioms. In the special case that the habits involved differ only by the most recent element, (4) takes a particularly simple form: (5)
hq hq
= λ1 (q − q )
hq hq
= λ2 (q − q ) + λ1 d0
d0 d1
hq hq
d2
hq hq hq hq
= λ3 (q − q ) + λ2 d0
hq hq
+ λ1 d1
It is easy to see from (5) that d hq hq is weakly decreasing if λk+1 /λk ≤ 1 − λ1 . Moreover, given d hq hq , this triangular linear system recovers all the {λk }∞ k=1 . Because the argument of the period utility is linear, the construction of d h h above delivers a compensating stream that is independent of the consumption stream c being evaluated. That is, linearity of the habit adjustment is related to the order of the quantifiers in Axiom HC(i). Indeed, HC(i) would be nearly unrestrictive if the compensation were allowed to depend on the choices in-
INTRINSIC HABIT FORMATION
1351
volved without specifying further properties. By itself, Axiom HC(i) does not require the manner of habit dependence to be homogenous across habits. Our construction above still works if the habit formation coefficients depend on tail elements of the habit; for example, if α + lim sup hk λkh = λk
k
β + lim sup hk
k
where β > α > 0. Tail dependence would violate only Axiom HC(iii), which requires homogeneity. Furthermore, the discounted form of the “outer utility” was irrelevant to our construction; it remains valid as long as the DM evaluates a consumption stream c through any utility U ∗ : R∞ → R over habit-adjusted consumption streams. The special feature of our time-additive utility is that it satisfies Axiom HC(ii), which is a generalized separability axiom that restricts the “outer ∗ ∗ utility” ∞ ∗ U above to be additively separable (that is, U (x0 x1 x2 ) = s=0 us (xs )). To see why HC(ii) is implied by time-additivity, notice that if the t t DM receives compensation d h hc after consuming c t and d h hcˆ after consumt t t ¯ and (cˆ c) ¯ reduces to comparing the the streams (c c) ing cˆ , then comparing t t values of s=0 u∗s (cs − ϕ(h c s−1 )) and s=0 u∗s (cˆs − ϕ(h cˆs−1 )). This argument does not depend on stationarity or dynamic consistency (i.e., u∗s (·) = δs u(·)). If the DM naively used β − δ quasihyperbolic discounting, HC(ii) would still be satisfied. Moreover, HC(ii) does not require the habit adjustment to be linear. It would also be satisfied using a more general notion of compensation that permits dependence on the consumption streams being evaluated, as long as the outer utility is time-additive. 5. DESIRABLE HABIT-FORMING GOODS If the consumption good is a desirable one, we can strengthen the previous representation to one in which the period utility is monotonic, as is typically assumed in the applied literature on habit formation. Standard monotonicity says the DM is better off whenever consumption in any period is increased. This seemingly innocuous assumption may not be satisfied in a time-nonseparable model: a consumption increase also raises the DM’s habit. We suggest a weakening that accommodates the possibility that a short-term consumption “gain” might not suffice to overcome the long-term utility loss. Our axiom considers a gain to be an indefinite consumption increase.14 14 By constrast, Shalev’s (1997) constant-tail monotonicity says (restricted to deterministic streams) that if a stream gives q from time t onward, then raising q to some q > q from t onward improves the stream. This is equivalent to saying that a weakly increasing (decreasing) consumption stream is at least as good (bad) as getting its worst (best) element constantly.
1352
KAREEN ROZEN
AXIOM GM—Gains Monotonicity: If α > 0, (c t t+1 c + α) c for all c t. Replacing Axiom S with GM ensures that the period utility in Theorem 1 is increasing. The proof requires additional results found in the Supplemental Material (Rozen (2010)). THEOREM 2 —Main Representation With Monotonic Period Utility: The family of preference relations satisfies Axioms PR, C, DC, GM, HC, and NDM if and only if each h can be represented as in Theorem 1 usingan increas∞ ing period utility u(·) which is (i) strictly increasing on (0 ∞) if k=1 λk < 1 and (ii) strictly increasing on either (−a ∞) or (−∞ a) for some a > 0 if ∞ k=1 λk = 1. Experimental evidence indicates that individuals may prefer receiving an increasing stream of consumption over one that is larger but fluctuates more (see Camerer and Loewenstein (2004) for a comprehensive survey). Axiom GM suggests a guideline for when a larger stream should be preferred. Consider two consumption streams, c and c , with c ≥ c . We say c >GD c , or c gainsdominates c , if c has larger period-to-period gains and smaller period-to ∀t ≥ 1. The following result characterizes period losses: ct − ct−1 ≥ ct − ct−1 GM in terms of a preference for gains-dominating streams. PROPOSITION 1—Respect of Gains-Domination: A preference relation continuous in the product topology satisfies GM if and only if it respects gains domination; that is, if and only if for any c c ∈ C, c >GD c implies that c c . The proof is immediate after noting that a stream will gains-dominate another if and only if the difference between the two streams is positive and increasing; the result follows from repeatedly applying Axiom GM to build the gains-dominating stream forward and using continuity in the product topology. 6. THE AUTOREGRESSIVE MODEL AND HABIT DECAY A frequently used specification of the linear habit formation model posits that the habit aggregator satisfies the autoregressive law of motion ϕ(hq) = αϕ(h) + βq for all h ∈ H and q ∈ Q, where the parameters α β are positive and satisfy α + β ≤ 1.15 In this section, we show that this autoregressive structure corresponds to an additional axiom which elicits the habit decay parameter α. Suppose a DM faces two consumption scenarios for period 0, high and low. In the former, she consumes very much at t = 0; in the latter, she consumes very little. If the DM consumes very little for some time after scenario high 15 Such a model appears in Boldrin, Christiano, and Fisher (1997), as well as in Sundaresan (1989), Constantinides (1990), and Schroder and Skiadas (2002) in continuous time.
INTRINSIC HABIT FORMATION
1353
and very much for some time after scenario low, could the opposing effects on her habit cancel so that her preferences following each scenario eventually coincide? The next axiom describes a choice behavior for which such equilibration is possible. AXIOM IE—Immediate Equilibration: For all c0 cˆ0 ∈ Q, there exist c1 cˆ1 ∈ ¯¯ if and only if (cˆ0 cˆ1 c) ¯¯ for all ¯ h (c0 c1 c) ¯ h (cˆ0 cˆ1 c) Q such that (c0 c1 c) ¯ ¯ c¯ ∈ C. c This says we can undo by tomorrow the effect of a difference in consumption today. Together, Axioms DC and IE imply that hc0 c1 and hcˆ0 cˆ1 are identical. Axiom IE offers a comparative measure of habit decay. To see this, fix any period-0 consumption levels cˆ0 > c0 and consider the period-1 consumption levels cˆ1 c1 given by Axiom IE. If the DM’s habits decay slowly, then the effects of prior consumption linger strongly. Hence c1 will have to be quite large and cˆ1 will have to be quite small to offset the initial difference. For fixed cˆ0 > c0 , one expects the difference c1 − cˆ1 to be larger for those DM’s whose habits decay more slowly. This intuition is confirmed by the following theorem, which shows that Axiom IE corresponds to the autoregressive specification of habits and that habits decay at the constant rate (c1 − cˆ1 )/(cˆ0 − c0 ).16 THEOREM 3—Autoregressive Habit Formation: The family of preference relations satisfies Axioms PR, C, DC, S, HC, NDM, and IE if and only if each ∞ h can be represented by Uh (c) = t=0 δt u(ct − ϕ(h c0 c1 ct−1 )) as in Theorem 1 and there exist α β > 0 with α + β ≤ 1 such that the linear habit aggregator ϕ(·) satisfies the autoregressive law of motion (6)
ϕ(hq) = αϕ(h) + βq
∀h ∈ H q ∈ Q
Moreover, for arbitrary choice of c0 cˆ0 in Axiom IE, the values of c1 cˆ1 given by Axiom IE calibrate the habit decay parameter: α = (c1 − cˆ1 )/(cˆ0 − c0 ).17 The proof of Theorem 3, given in Appendix B, suggests a more general result. It can similarly be shown that a generalization of the autoregressive model that has n habit parameters corresponds to a generalized n − 1 period version of equilibration in which it takes n − 1 periods to equilibrate preferences after a single difference in consumption. For the simplest autoregressive model, the geometric coefficients model, where the aggregator satisfies the law of motion ϕ(hq) = (1 − λ)ϕ(h) + λq, the 16
The following alternative to IE would also yield the representation in Theorem 3, but would ¯¯ ¯ c¯¯ ∈ C, c¯ h c¯¯ if and only if (q c) ¯ h (q c). not calibrate α: ∀h, ∃q ∈ Q such that for all c 17 For finite histories of length K ≥ 3, the habit aggregator cannot be written in the form (6), but the result of Theorem 3 is unchanged: the ratio of successive habit formation coefficients λk+1 /λk is constant and given by (c1 − cˆ1 )/(cˆ0 − c0 ).
1354
KAREEN ROZEN
choice experiment in Axiom IE immediately recovers the single parameter λ. Since this model corresponds to the special case α + β = 1, the parameter λ is given by 1 − (c1 − cˆ1 )/(cˆ0 − c0 ). While the autoregressive model and its geometric specialization appear similar, we show in the next section that choice behavior can depend critically on whether α + β is equal to or smaller than 1. 7. PERSISTENT VERSUS RESPONSIVE HABITS In this section we distinguish between two types of preferences that satisfy our axioms, those whose habits are responsive to weaning and those whose habits are persistent. As illustrated in Figure 1, the compensating stream d h h measures the distance between the indifference curves of h and h . Whether the DM can be weaned using a quickly fading stream of compensation or must be weaned using possibly high levels of consumption that fade slowly—or never at all—determines the extent to which consumption affects her preferences. Consider the following simple characterization of habit-forming tendencies. The DM is responsive to weaning if she can always be weaned using a finite ∞ amount of compensation: for every (h h) ∈ H, t=0 dth h is finite. The DM has persistent habits if she can never be weaned using a finite amount of com∞ ∞ pensation: for every (h h) ∈ H, t=0 dth h = ∞. We show that k=1 λk characterizes a DM’s habits as responsive or persistent and can profoundly affect the manner in which indifference curves are translated from one habit to another.18 PROPOSITION 2—Dichotomy: Suppose the DM satisfies our axioms. Then two possibilities exist: ∞ (i) The DM’s habits are persistent if k=1 λk = 1. Moreover, for every (h h) ∈ H, the compensating stream d h h is constant. ∞ (ii) The DM’s habits are responsive to weaning if k=1 λk < 1. Moreover, if k∗ is such that λk∗ +1 /λk∗ < 1 − λ1 , then for every (h h) ∈ H, the stream d h h decays at least at the geometric rate 1 − λk∗ (1 − λ1 − λk∗ +1 /λk∗ ) for all t ≥ k∗ ; and if h > h , then d h h is strictly decreasing for all t. ∞ To understand Proposition 2, suppose that k=1 λk = 1 − γ, where γ > 0 may be attributed to λk∗ +1 /λk∗ falling below 1 − λ1 for some small k∗ (observe that ∞ k=1 λk = 1 if λk+1 /λk = 1 − λ1 for every k). Even if γ is small, the effect of habits on choice behavior is quite different from that under persistent habits, ceteris paribus. Compensation rapidly decreases early on and the translation of the indifference map between two habits (h h) ∈ H is milder than it would be if habits were persistent (in which case the translation would be constant). This difference is particularly pronounced in the class of autoregressive models discussed in Section 6. In the autoregressive model, λk+1 /λk is given by the 18
The proof follows from Lemma 8.
INTRINSIC HABIT FORMATION
1355
constant α, and β = λ1 . If α + β < 1, compensation declines immediately; if α + β = 1, compensation never declines. The choice of α + β should thus be taken with care and, as the following result shows, should be made in conjunction with the choice of period utility.19 PROPOSITION 3—Persistent Habits: Suppose the DM’s habits are persistent. Then for any ε > 0, there do not exist c ∈ C and h ∈ H such that the argument of the DM’s period utility, ct − ϕ(hc t−1 ), is at least as large as ε for every t. To facilitate dynamic programming, applied work typically uses a period utility satisfying an Inada condition limx→0 u (x) = ∞. For such a period utility, Proposition 3 implies that a persistent DM will have infinite marginal utility infinitely often from any bounded consumption stream. Moreover, a persistent DM cannot perfectly smooth her habit-adjusted consumption if her consumption is bounded. 8. DISCUSSION In this paper we have introduced the device of compensating a DM for giving up her habits to provide axiomatic foundations for intrinsic linear habit formation. This approach has allowed us to clarify the differences across some prevalent specifications of this model in the applied literature. Our axiomatization can be modified to accommodate other models of history dependence. For example, it is easy to extend our axioms to generate a multidimensional version of intrinsic linear habit formation (e.g., with one standard good and two habit-forming ones). By specifying compensation to be independent across goods, one may obtain the representation ∞
2 3 δt u(ct1 ct2 − ϕ2 (h2 c02 ct−1 ) ct3 − ϕ3 (h3 c03 ct−1 ))
t=0
where the habit aggregator ϕi (·) for good i = 2 3 is given by ϕi (hˆ i ) = ∞ i ˆ i k=1 λk hk . Although consumption histories for each good are evaluated separately, the curvature of u(·) may imply that a DM’s desire for a habit-forming good she has not tried before is intensified when another good for which she has a high habit is unavailable. In addition, if the definition of weaning is generalized so that compensation may depend on the DM’s choice set, then the critical assumption that generates linearity is relaxed. One may potentially place the appropriate axioms on compensation to axiomatize models of nonlinear habit formation. 19
This result follows from Lemma 26 in the Supplemental Material.
1356
KAREEN ROZEN
APPENDIX A: PROOF OF THEOREM 1 Combined with the results in the Supplemental Material, this also proves Theorem 2. A.1. Sufficiency Axioms PR, C, DC, S, HC, and NDM are implicit in all hypotheses.
Results About the Sequences d h h LEMMA 1—Zero: For each h there is no nonzero c¯ ∈ C such that c + c¯ h c + c¯ if and only if c h c for all c c ∈ C. Consequently we may define d hh = (0 0 ).
PROOF: If there were, then for any h ≥ h both c¯ + d h h and d h h would compensate from h to h, violating uniqueness. Q.E.D. h
LEMMA 2—Triangle Equality: Let h ≥ h ≥ h . Then d h
h
= dh
+ d h h .
PROOF: By application of Axiom HC(i), c h c if and only if c + d h h h c + d h h . Using HC(i) again, c + d h h h c + d h h if and only if c + d h h + d h h h c + d h h + d h h . Therefore, c h c if and only if c + d h h + d h h h h h h h c +d +d for arbitrary c c ∈ C. The result follows from uniqueness of d h h . Q.E.D.
¯
¯
By the triangle equality, d h h = d 0h − d 0h . We abuse notation by writing d h ¯ whenever d 0h is intended. For any h ∈ H, q ∈ Q, and k ∈ N, the habit hkq ∈ H kq kq is defined by hk = q and hj = hj for every j = k. In particular, 0¯ kq is the habit which has q as the kth element and 0 everywhere else.
LEMMA 3—Additive Separability: d h h =
∞
¯ khk
k=1
(d 0
¯ khk
− d0
).
PROOF: Let h0 = h and for every k inductively define hk by hkk = hk and hki = hk−1 for all i = k. We prove the lemma in three steps: (i) for i ∞ k−1 k K any (h h) ∈ H, we may write d h h = k=1 d h h + limK→∞ d h h ; (ii) each k−1
k
¯ kh
¯ kh
K
d h h = d 0 k − d 0 k ; and (iii) limK→∞ d h h = (0 0 ). Step (i). Apply Lemma 2 iteratively to observe that if habits (h h) ∈ H eventually agree (without loss, suppose they agree on {t t + 1 }), then t k−1 k d h h = k=1 d h h . Now consider arbitrary (h h) ∈ H. For any K ∈ N and any c c ∈ C, c h c
if and only if
c+
K k=1
k−1 hk
dh
hK c +
K k=1
k−1 hk
dh
1357
INTRINSIC HABIT FORMATION
But again by HC(i), c+
K
d
hk−1 hk
hK c +
k=1
c+
K
K
k−1 hk
dh
if and only if
k=1
d
hk−1 hk
+d
hK h
h c +
K
k=1
k−1 hk
dh
K h
+ dh
k=1
K k−1 k K Therefore, for arbitrary K, d h h = k=1 d h h + d h h . k−1 k Step (ii). We show that each d h h is independent of the values of h and h ¯ ∈ H that on N \ {k}. In particular, we show for arbitrary q ≤ q and (h h) (7)
kq hkq
dh
¯ kq h¯ kq
= dh
kq h ¯ kq
dh
if and only if
kq h ¯ kq
= dh
By Lemma 2, using h¯ kq as an intermediate habit, kq h ¯ kq
dh
kq h ¯ kq
= dh
¯ kq h¯ kq
+ dh
and also, using hkq as an intermediate habit, kq h ¯ kq
dh
kq hkq
= dh
kq h ¯ kq
+ dh
Combining these two expressions yields kq h ¯ kq
dh
kq h ¯ kq
− dh
kq hkq
¯ kq h¯ kq
= dh
− dh kq
This proves (7). By Axiom HC(iii), d h k
¯ kq
h
k+1
kq h ¯ kq
= dh
kh 0¯ k 0¯ khk
. Since hk and hk+1
agree on N \ {k}, (7) implies that d h h = d . Now use the triangle equality. K Step (iii). Now we show that limK→∞ d h h = (0 0 ). Since the habits hK and h agree on {1 2 K}, iterated application of Axiom HC(iii) implies that K K K for each K, d h h = d h 0 h0 . But by the triangle equality, d h h is decreasing K K K ¯ in h . Hence d h 0 h0 ≤ d 0h0 . Therefore, K h
(0 0 ) ≤ lim d h K→∞
K h0K
= lim d h 0 K→∞
¯
K
≤ lim d 0h0 = (0 0 ) K→∞
where the last equality is due to Axiom NDM and d h h decreasing in h . Q.E.D. LEMMA 4—Weak Invariance: For any q qˆ ∈ Q and k, 0¯ kqˆ
kq+d 0 0¯ kq 0¯ 0
d
=d
0¯ kqˆ
¯ 0¯ kd0 0 0
1358
KAREEN ROZEN
PROOF: Take any c c ∈ C such that (c0 c1 ck−1 ) and (c0 c1 ck−1 ) are both equal to (q 0 0 0). According to HC(i),
(8)
c 0¯ c
if and only if
¯ kqˆ
c + d0
¯ kqˆ
0¯ kqˆ c + d 0
¯ kqˆ ¯ kqˆ To simplify notation, define f : Q → H by f (q) = (0¯ kqˆ q + d00 d10 ˆ k q 0¯ ). Applying DC to the right-hand side of (8) yields dk−1
(9)
¯ kqˆ
c + d0
¯ kqˆ
0¯ kqˆ c + d 0
k
if and only if
¯ kqˆ
c + k d0
¯ kqˆ
f (q) k c + k d 0
But again by DC, this time applied to the left-hand side of (8), (10)
c 0¯ c
if and only if
k
c 0¯ kq k c
Combining expressions (9) and (10) using (8) yields (11)
k
c 0¯ kq k c
if and only if
k
¯ kqˆ
c + k d0
¯ kqˆ
f (q) k c + k d 0
Since both have a q in the kth place, 0¯ kq ≤ f (q). As k c and k c are arbitrary, ¯ kqˆ ¯ kq d 0 = d 0 f (q) . In particular, the choice of c c (which depended on q) does ¯ kqˆ ¯ kqˆ not affect d 0 . Then (11) means k d 0 = d f (0) as well. Canceling parts using Lemma 3 gives the desired conclusion. Q.E.D. k
Construction of the Habit Aggregator ¯ kq
0 For each k define ϕk : Q → R by ϕk (q) = d∞0 if q > 0 and ϕk (0) = 0. We h naturally define ϕ : H → R by ϕ(h) = d0 = k=1 ϕk (hk ).
LEMMA 5—Linearity: ϕk (q) = λk q for some λk > 0 and for all q ∈ Q, and ¯ d h h = d 0h−h for every (h h) ∈ H. This implies that ϕ(h − h ) = d0h−h = d0h h . PROOF: By Lemma 2 and the definition of ϕk (·), 0¯ kqˆ
¯ kq 0¯ kq+d0
ˆ = ϕk (q) + d00 ϕk (q + ϕk (q))
ˆ because of Lemma 4. Then, by construcBut the last term above is ϕk (ϕk (q)) tion, ϕk (·) is additive on its image, that is, for every k, (12)
ˆ + q) = ϕk (ϕk (q)) ˆ + ϕk (q) ∀q qˆ ∈ Q ϕk (ϕk (q)
By Axiom NDM, ϕk (·) is onto Q.20 Hence (12) is identical to a nonnegativity restricted Cauchy equation (i.e., ϕk (q + q ) = ϕk (q) + ϕk (q ) for all 20 The solution of functional equation (12) is not fully characterized. Jarczyk (1991, pp. 52–61) proved continuous solutions are affine. Without NDM, we know ϕk is increasing.
1359
INTRINSIC HABIT FORMATION
ˆ We know ϕk (·) is strictly q q ≥ 0) under the reparametrization q = ϕk (q). monotone, so by Aczél and Dhombres (1989, Corollary 9), ϕk (q) = λq for some λ > 0. Q.E.D. h h
h
LEMMA 6—Recursive Structure: For any t ≥ 0 and h ∈ H, t d h = d hd0 d1 ···dt−1 ; hence d1h = ϕ(hϕ(h)), d2h = ϕ(hϕ(h)ϕ(hϕ(h))), and so forth. PROOF: The proof is by strong induction. The lemma is true for t = 0: d h = h h h d h . Assume that t d h = d hd0 d1 ···dt−1 for all t smaller than some tˆ. This implies hd h d h ···d h that tˆ+1 d h = 1 d 0 1 tˆ−1 . Using the inductive hypothesis with hd0h d1h · · · dthˆ−1 as the habit yields 1
d
hd0h d1h ···d hˆ
t −1
=d
hd h d h ···d h 0 1 tˆ−1
hd0h d1h ···d hˆ d0 t −1
hd0h d1h ···d hˆ
Once more by the inductive hypothesis, dthˆ = d0 equal to d
hd0h d1h ···d hˆ t
t −1
. Therefore,
tˆ+1
d h is
Q.E.D.
as desired.
LEMMA 7—Geometric Decay: For all $h \in H$, $d^h$ is decreasing if and only if $\lambda_1 \in (0, 1)$ and
(13) $\dfrac{\lambda_{k+1}}{\lambda_k} \le 1 - \lambda_1$ for all $k \ge 1$.
We remark that (13) clearly implies $\sum_{k=1}^{\infty} \lambda_k \le 1$.
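To spell the remark out: since $\lambda_k > 0$, iterating (13) bounds the weights by a geometric sequence,
$$\lambda_k \;\le\; \lambda_1 (1-\lambda_1)^{k-1} \quad\Longrightarrow\quad \sum_{k=1}^{\infty} \lambda_k \;\le\; \lambda_1 \sum_{j=0}^{\infty} (1-\lambda_1)^{j} \;=\; \frac{\lambda_1}{1-(1-\lambda_1)} \;=\; 1.$$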
PROOF OF LEMMA 7: Lemmas 3, 5, and 6 together prove that each $d_t^{\bar 0^k q}$ may be written
(14) $d_t^{\bar 0^k q} \;=\; q\lambda_{t+k} + \sum_{i=0}^{t-1} d_i^{\bar 0^k q}\,\lambda_{t-i}.$
Therefore, for $t \ge 1$,
(15) $d_{t-1}^{\bar 0^k q} - d_t^{\bar 0^k q} \;=\; \sum_{i=0}^{t-2} d_i^{\bar 0^k q}\,\lambda_{t-i-1} + q\lambda_{t-1+k} - \sum_{i=0}^{t-1} d_i^{\bar 0^k q}\,\lambda_{t-i} - \lambda_{t+k}\,q.$
When $t = 1$, only the term $q(\lambda_k - \lambda_k\lambda_1 - \lambda_{k+1})$ remains in (15) for each $k$. Hence, the condition (13) holds if and only if $d_0^{\bar 0^k q} \ge d_1^{\bar 0^k q}$ for every $k$. Note that this also has the effect of implying $\lambda_1 < 1$, since $\lambda_k > 0$ for every $k$ by Lemma 5. Now, we show that (13) guarantees that $d_{t-1}^{\bar 0^k q} \ge d_t^{\bar 0^k q}$ for every $t$.
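For the $t = 1$ case explicitly: (14) gives $d_0^{\bar 0^k q} = q\lambda_k$, so
$$d_0^{\bar 0^k q} - d_1^{\bar 0^k q} \;=\; q\lambda_k - \lambda_1 d_0^{\bar 0^k q} - \lambda_{k+1} q \;=\; q\bigl(\lambda_k - \lambda_k\lambda_1 - \lambda_{k+1}\bigr),$$
which is nonnegative for all $q$ exactly when $\lambda_{k+1} \le (1 - \lambda_1)\lambda_k$.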
Indeed, rearranging (15) and plugging in from (14), we obtain
$$d_{t-1}^{\bar 0^k q} - d_t^{\bar 0^k q} \;=\; \sum_{i=0}^{t-2} d_i^{\bar 0^k q}\bigl[\lambda_{t-i-1} - \lambda_{t-i}\bigr] + q\bigl[\lambda_{t-1+k} - \lambda_{t+k}\bigr] - \lambda_1 d_{t-1}^{\bar 0^k q}$$
$$=\; \sum_{i=0}^{t-2} d_i^{\bar 0^k q}\bigl[\lambda_{t-i-1}(1-\lambda_1) - \lambda_{t-i}\bigr] + q\bigl[\lambda_{t-1+k}(1-\lambda_1) - \lambda_{t+k}\bigr].$$
Hence $d_{t-1}^{\bar 0^k q} \ge d_t^{\bar 0^k q}$ follows from condition (13). Q.E.D.
LEMMA 8—Persistent or Responsive: For any $h \in H$, there are two possibilities:
(i) If $\sum_{k=1}^{\infty} \lambda_k < 1$, $d^h$ is infinitely summable. In particular, if for some $\varepsilon > 0$ there is $k^*$ such that $\lambda_{k^*+1}/\lambda_{k^*} = 1 - \lambda_1 - \varepsilon$, then $d_t^h / d_{t-1}^h \le 1 - \varepsilon\lambda_{k^*}$ for all $t \ge k^*$.
(ii) If $\sum_{k=1}^{\infty} \lambda_k = 1$, then $d^h$ is a constant sequence.

PROOF: For (i), let $\varepsilon = 1 - \lambda_1 - \lambda_{k^*+1}/\lambda_{k^*}$ and
$$x_{t k^*} = \begin{cases} d_{t-1-k^*}^h & \text{if } t > k^*,\\ h_{k^*+1-t} & \text{if } t \le k^*.\end{cases}$$
Using the recursive construction of Lemma 6 and the fact that $\varphi(h0^t) = \sum_{k=t+1}^{\infty} \lambda_{k-t} h_{k-t}\,(\lambda_k/\lambda_{k-t})$ yields
$$\frac{d_t^h}{d_{t-1}^h} \;=\; \frac{\varphi(h d_0^h \cdots d_{t-2}^h 0) + \lambda_1 d_{t-1}^h}{d_{t-1}^h} \;\le\; \frac{(1-\lambda_1) d_{t-1}^h - \varepsilon x_{t k^*}\lambda_{k^*} + \lambda_1 d_{t-1}^h}{d_{t-1}^h},$$
with equality if $k^*$ uniquely satisfies $\lambda_{k+1}/\lambda_k < 1 - \lambda_1$. Since $d_{t-1-k^*}^h \ge d_{t-1}^h$ for all $t > k^*$,
$$\frac{d_t^h}{d_{t-1}^h} \;\le\; \frac{(1-\lambda_1) d_{t-1}^h - \varepsilon d_{t-1-k^*}^h \lambda_{k^*} + \lambda_1 d_{t-1}^h}{d_{t-1}^h} \;=\; (1-\lambda_1) - \varepsilon\,\frac{d_{t-1-k^*}^h}{d_{t-1}^h}\,\lambda_{k^*} + \lambda_1 \;\le\; 1 - \varepsilon\lambda_{k^*}.$$
For (ii), note that for all q ∈ Q, ϕ(hq) = (1 − λ1 )ϕ(h) + λ1 q. Therefore, ϕ(hϕ(h)) = ϕ(h). The claim easily follows from induction and Lemma 6. Q.E.D.
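A small numerical illustration of the recursion behind Lemmas 6–8 (this sketch is not part of the paper; habits and weights are truncated to finitely many lags, and the weights are chosen to satisfy (13) with equality):

```python
# Illustrative sketch: the compensating stream d^h of Lemma 6,
# d_t^h = phi(h d_0^h ... d_{t-1}^h), with phi(h) = sum_k lambda_k h_k.

def phi(lam, habit):
    """Habit aggregator: weighted sum of past consumption; habit[0] is the most recent lag."""
    return sum(l * x for l, x in zip(lam, habit))

def compensating_stream(lam, habit, T):
    """First T terms of d^h: each period's compensation is phi of the habit updated by it."""
    d, hist = [], list(habit)
    for _ in range(T):
        d_t = phi(lam, hist)
        d.append(d_t)
        hist = [d_t] + hist      # habit after consuming the compensation itself
    return d

# lambda_k = 0.3 * 0.7**(k-1): lambda_{k+1}/lambda_k = 1 - lambda_1 for every k.
lam = [0.3 * 0.7 ** k for k in range(60)]
print(compensating_stream(lam, [1.0] + [0.0] * 59, 5))
```

With these weights $\sum_k \lambda_k = 1$ (up to truncation), so the printed compensation stream is constant, as Lemma 8(ii) asserts; weights summing to less than one make it decay geometrically instead.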
Construction of the Continuous Preference Relation $\succsim^*$

We use Axiom HC to construct a continuous map $g$ from $H \times C$ into an auxiliary space $C^*$, as well as a continuous preference relation $\succsim^*$ on $C^*$ preserving $\succsim$. We endow the space $\prod_{i=0}^{\infty}\mathbb{R}$ with the product topology and define the transformation $g : H \times C \to \prod_{i=0}^{\infty}\mathbb{R}$ by
$$g(h, c) = \bigl(c_0 - \varphi(h),\ c_1 - \varphi(hc_0),\ c_2 - \varphi(hc_0c_1),\ \ldots\bigr).$$
Let $C^* = g(H \times C)$ and $C_h^* = g(\{h\} \times C)$, for any $h \in H$, be the image and projected image under $g$, respectively. We shall consider $C^*$ to be a metric subspace of $\prod_{t=0}^{\infty}\mathbb{R}$, implying that $C^*$ is a metric space in its own right. Recall that the spaces $H$ and $C$ are metrized by the sup metric $\rho_H$ and the product metric $\rho_C$, respectively. We metrize $H \times C$ by $\rho_{H\times C}((h, c), (h', c')) = \rho_H(h, h') + \rho_C(c, c')$.
LEMMA 9—Continuous Mapping: g(· ·) is a continuous mapping. Moreover, for any given h ∈ H, g(h ·) is a homeomorphism into Ch∗ . PROOF: The map is continuous in the topology if every component is. Linearity of ϕ implies that the tth component can be written as ct − ϕ(h0t ) − t k=1 λk ct−k ; because only there is only a finite sum of elements of c in each component, the map is continuous with respect to C. Using the sup metric, it is clear that ϕ(·) is continuous with respect to H. The desired joint continuity is evident under the respective metrics. Finally, for any h ∈ H, we can directly exhibit the clearly continuous inverse g−1 (h ·) : Ch∗ → C defined by g−1 (h c ∗ ) = c0∗ + ϕ(h) c1∗ + ϕ(h c0∗ + ϕ(h)) c2∗ + ϕ h c0∗ + ϕ(h) c1∗ + ϕ(h c0∗ + ϕ(h)) Q.E.D. LEMMA 10—Nestedness: Ch∗ ⊆ Ch∗ for any (h h) ∈ H. PROOF: Take any c ∗ ∈ Ch∗ , so that by definition c ∗ has the form (c0 − ϕ(h ) c1 − ϕ(h c0 ) c2 − ϕ(h c0 c1 ) )
where c = (c0 c1 c2 ) ∈ C. For any (h h) ∈ H, c + d h h ∈ C. By Lemma 6 we know that d h h = d h−h = ϕ(h − h ) ϕ(h − h ϕ(h − h))
Moreover, since ϕ is affine by Lemma 5, c0 + ϕ(h − h ) − ϕ(h)
c1 + ϕ(h − h ϕ(h − h )) − ϕ(h c0 + ϕ(h − h )) = c0 + ϕ(h − h − h) c1 + ϕ(h − h − h ϕ(h − h ) − c0 − ϕ(h − h )) = (c0 − ϕ(h ) c1 − ϕ(h c0 ) c2 − ϕ(h c0 c1 ) ) ∈ Ch∗
Q.E.D.
LEMMA 11—Topological Properties: C ∗ is separable, connected, and convex. PROOF: Connectedness follows from being the continuous image of a connected space. Convexity follows from convexity of C and H and linearity of g(· ·). To see separability, the sequence {hn }n∈Z by hn = ( n n n).
construct ∗ ∗ By Lemma 10, C = n∈Z Chn . Since each g(hn ·) is continuous, each Ch∗n is separable, being the continuous image of a separable space. Let C˜ h∗n denote
the countable dense subset of each Ch∗n . Then n∈Z C˜ h∗n is a countable dense Q.E.D. subset of C ∗ .21 We define a binary relation ∗ on C ∗ × C ∗ by (16)
$g(h, c) \succsim^* g(h, \dot c)$ if and only if $c \succsim_h \dot c$.
Note that the definition of ∗ can be rewritten as c ∗ ∗ c˙∗ if and only if c ∗ c˙∗ ∈ Ch∗ and g−1 (h c ∗ ) h g−1 (h c˙∗ ) for some h ∈ H. LEMMA 12—Well-Definedness: The relation ∗ is well defined. PROOF: Suppose there are h h and c ∗ c˙∗ ∈ Ch∗ Ch∗ with g−1 (h c ∗ ) h g (h c˙∗ ) and g−1 (h c˙∗ ) h g−1 (h c ∗ ). We apply HC(i) to both relationships: ¯ ¯ ¯ g−1 (h c ∗ ) + d hh h¯ g−1 (h c˙∗ ) + d hh and g−1 (h c˙∗ ) + d h h h¯ g−1 (h c ∗ ) + ¯ ¯ ¯ ¯ c∗) d h h . But both g−1 (h c ∗ ) + d hh and g−1 (h c ∗ ) + d h h are equal to g−1 (h ∗ ˙ (similarly for c ). Hence the statements above are contradictory. Q.E.D. −1
LEMMA 13—Continuous Preference: ∗ is a continuous preference relation. PROOF: The Ch∗ are nested by Lemma 10. Thus for any c ∗ c˙∗ cˆ∗ ∈ C ∗ , there is h ∈ H large enough so that c ∗ c˙∗ cˆ∗ ∈ Ch∗ . Hence ∗ inherits completeness and transitivity over {c ∗ c˙∗ cˆ∗ } from h , which suffices since c ∗ c˙∗ , and cˆ∗ were arbitrary. 21 Note that H is not separable under the sup metric; if we were to make H separable by endowing it with the product topology instead, then g would not be continuous with respect to h.
To prove that ∗ is continuous in the product topology, we will show that the weak upper contour sets are closed; the argument for the weak lower contour sets is identical. Consider any sequence of streams {c ∗n }n∈Z ∈ C ∗ converging to some c ∗ ∈ C ∗ and suppose there is cˆ∗ ∈ C ∗ such that c ∗n ∗ cˆ∗ for all n. Take any h and c such that g(h c) = c ∗ . By Lemma 9, g is continuous: for any εball Y ⊂ C ∗ around c ∗ there is a δ-ball X ⊂ H × C around (h c) such that g(X) ⊂ Y . Because the sequence {c ∗n } converges to c ∗ , there is a subsequence {c ∗m } ∈ Y also converging to c ∗ . By our use of the sup metric on H we know that any (h c ) ∈ X must satisfy h ≤ h + (δ δ ). Then Lemma 10 ensures that ∗ . Take h¯ ≥ h + ( δ δ) and large enough that for every m, c ∗m ∈ Ch+(δδ) ∗ ∗ cˆ ∈ Ch¯ . We may compare the corresponding streams in C under h¯ . Using ¯ ·) as defined in the proof of Lemma 9, take c¯m = g−1 (h ¯ c ∗m ) for each g−1 (h −1 ¯ ∗ −1 ∗ ¯ cˆ ). Using the hypothesis and the definition m, c¯ = g (h c ), and cˆ¯ = g (h ∗ m ¯ ·) is ˆ of , we know that c¯ h¯ c¯ for every m. Lemma 9 asserts that g−1 (h m ¯ Since h¯ is continuous by Axiom C, we know continuous, so c¯ converges to c. ˆ¯ proving that c ∗ ∗ cˆ∗ . that c¯ h¯ c, Q.E.D. Standard results then imply ∗ has a continuous representation U ∗ : C ∗ → R. LEMMA 14—Koopmans Sensitivity: There exist q∗ qˆ ∗ ∈ R, c ∗ ∈ C ∗ , and t ∈ N such that (c ∗t−1 q∗ t+1 c ∗ ) ∗ (c ∗t−1 qˆ ∗ t+1 c ∗ ). PROOF: Let α > 0, h ∈ H, and c ∈ C be such that c + α ∼h c. Since the com¯ pensating streams are (weakly) decreasing and d00α < α for all α > 0, we can write any positive constant stream as a staggered sum of streams of the form ¯ (α d 0α ). Formally, for any α > 0, we can find a sequence of times t 1 < t 2 < · · · and positive numbers α > α1 > α2 > · · · such that the stream (α α ) can ¯ be written as the consumption stream given by (α d 0α ) starting at time 0, plus 1 2 ¯ ¯ (α1 d 0α ) starting at time t 1 , plus (α2 d 0α ) starting at time t 2 , and so forth. Suppose by contradiction that ∀q∗ qˆ ∗ ∈ R, c ∗ ∈ C ∗ , and t ∈ N, (c ∗t−1 q∗ t+1 c ∗ ) ∼∗ (c ∗t−1 qˆ ∗ t+1 c ∗ ). Let g(h c) = c ∗ , where h c are given as initially stated. Then (c ∗t−1 ct∗ + α t+1 c ∗ ) ∼∗ c ∗ by hypothesis. By definition, this means that ¯ g−1 (h (c ∗t−1 ct∗ + α t+1 c ∗ )) ∼h g−1 (h (c ∗ )) or (c t−1 ct + α t+1 c + d 0α ) ∼h c. It1 2 erative application of the indifference for α α and product continuity imply that c + (α α ) ∼h c, violating Axiom S. Q.E.D. Separability Conditions for ∗ We now prove that compensated separability suffices for the required additive separability conditions to hold by showing that the following mapping from C into C ∗ is surjective, so the needed conditions apply for all elements of C ∗ . For each t, define the “compensated consumption” map ξt : H × C → C ∗ by t t−1 ξt (h c) = g h c t−1 t c + d h0 hc (17)
To show that ξt is surjective, we first prove the following auxiliary result. LEMMA 15—Containment: For any h ∈ H, t ≥ 0, and c t ∈ Qt+1 , there exists ∗ ∗ ˆ h ∈ H large enough that Chc t ⊆ C ˆ t+1 . h0 PROOF: Since ϕ is linear and strictly increasing, we may choose hˆ > h such that (18)
$$\varphi(\hat h 0^{t+1}) - \varphi(h c^t) \;\ge\; \sum_{s=0}^{t} (1-\lambda_1)^{s+1} c_s.$$
∗ ˙ ∈ C such that g(hc t c) ˙ = c ∗ . For it to Choose any c ∗ ∈ Chc t . Then there is a c ∗ ∗ ˆ ∈ C, also be true that c ∈ Ch0 ˆ t+1 , it must be that for some c
(19) $\hat c_s - \varphi(\hat h 0^{t+1} \hat c^{s-1}) \;=\; c_s^* \;=\; \dot c_s - \varphi(h c^t \dot c^{s-1})$ for all $s \ge 0$,
where c −1 and c˙−1 are ignored for the case s = 0. We claim that we can construct a cˆ ∈ C (nonnegative, bounded) by using (19) to recursively define ˆ t+1 cˆs−1 ) + c˙s − ϕ(hc t c˙s−1 ) for every s ≥ 0. cˆs = ϕ(h0 ˙ For s = 0, it is clearly true Step (i)—cˆ is nonnegative. It suffices to show cˆ ≥ c. t+1 t ˆ that cˆ0 ≥ c˙0 , since we have chosen ϕ(h0 ) − ϕ(hc ) ≥ 0 in (18). We proceed by strong induction, assuming cˆsˆ−1 ≥ c˙sˆ−1 for every sˆ ≤ s. From (19), to prove cˆs ≥ ˆ t+1 cˆs−1 ) − ϕ(hc t c˙s−1 ) ≥ 0. By the inductive hypothesis, c˙s we must show ϕ(h0 (20)
$$\varphi(\hat h 0^{t+1} \hat c^{s-1}) - \varphi(h c^t \dot c^{s-1})$$
$$=\; \varphi\bigl((\hat h - h)0^{t+s+1}\bigr) + \varphi\bigl(\bar 0(\hat c_1 - \dot c_1)\cdots(\hat c_{s-1} - \dot c_{s-1})\bigr) + \varphi\bigl(\bar 0(\hat c_0 - \dot c_0)0^{s-1}\bigr) - \varphi\bigl(\bar 0 c^t 0^s\bigr)$$
$$\ge\; \varphi\bigl(\bar 0(\hat c_0 - \dot c_0)0^{s-1}\bigr) - \varphi\bigl(\bar 0 c^t 0^s\bigr) \;=\; \varphi\bigl(\bar 0(\varphi(\hat h 0^{t+1}) - \varphi(h c^t))0^{s-1}\bigr) - \varphi\bigl(\bar 0 c^t 0^s\bigr)$$
$$\ge\; \lambda_s \sum_{i=0}^{t} (1-\lambda_1)^{i+1} c_i - \sum_{i=0}^{t} \lambda_{s+1+i}\, c_i,$$
where the first inequality comes from the nonnegativity of ϕ, the equality comes from plugging in for cˆ0 − c˙0 from (19), and the second inequality comes from (18) and Lemma 5. By Lemma 7, λs+1+i /λs ≤ (1 − λ1 )i+1 , hence (20) is positive. Step (ii)—cˆ remains bounded. Since c˙ ∈ C, it is bounded, so it will suffice to show that the difference between cˆ and c˙ is bounded. Let us set y = ϕ((hˆ − ¯ cˆ0 − c˙0 )0). By construction, for all s ≥ 1, cˆs − c˙s equals the sum h)0t+2 ) + ϕ(0(
on the right-hand side of the first equality in (20). Because compensation is ¯ cˆ1 − c˙1 ) · · · (cˆs − c˙s )) converge to 0 as s tends to decreasing, all terms but ϕ(0( infinity. In fact, for any h and t, ϕ(h0t ) ≤ (1 − λ1 )t ϕ(h). Consequently, the ¯ cˆ0 − c˙0 )0s−1 ) is no bigger than (1 − λ1 )s−1 y for sum ϕ((hˆ − h)0t+s+1 ) + ϕ(0( ¯ t 0s ) in (20) to obtain an upper any s. Let us drop the negative term −ϕ(0c bound. By the definition of y, we see that cˆ1 − c˙1 ≤ y. We claim that for all s ≥ 1, cˆs − c˙s ≤ y. The proof proceeds by strong induction. Using the inductive s−1 hypothesis, cˆs − c˙s ≤ y(1 − λ1 )s−1 + y k=1 λs. But s−1 k=1
$$\sum_{k=1}^{s-1} \lambda_k \;\le\; \lambda_1 \sum_{k=0}^{s-2} (1-\lambda_1)^k \;=\; 1 - (1-\lambda_1)^{s-1},$$
so cˆs − c˙s ≤ y as claimed.
Q.E.D.
LEMMA 16—Surjectivity: Each ξt as defined in (17) is surjective. PROOF: Fix any c ∗ ∈ C ∗ and t ≥ 1. By definition, there is h ∈ H and c ∈ C such that g(h c) = c ∗ . That is, for every s, cs − ϕ(hc0 c1 · · · cs−1 ) = cs∗ . Fix this h and c. ˆ c) ˆ = c∗ , We wish to show that there exist hˆ ∈ H and cˆ ∈ C such that ξt (h that is, (21)
$$\bigl(\hat c_0 - \varphi(\hat h),\ \hat c_1 - \varphi(\hat h \hat c_0),\ \ldots,\ \hat c_{t-1} - \varphi(\hat h \hat c_0 \cdots \hat c_{t-2}),\ \hat c_t - \varphi(\hat h 0^t),\ \hat c_{t+1} - \varphi(\hat h 0^t \hat c_t),\ \ldots\bigr) \;=\; c^*.$$
∗ Because c ∗ ∈ Ch∗ then t c ∗ ∈ Chc t−1 . Equation (21) suggests that we must show ∗ ˆ ∈ H. Lemma 15 provides a c¯ and hˆ > h such that that t c ∗ ∈ Ch0 for some h ˆ t t t ∗ ˆ ¯ = c . Moreover, since hˆ > h then c ∗ ∈ C ∗ . Therefore, there exists g(h0 c) hˆ
ˆ c) ˆ c) ¯¯ t−1 = c ∗t−1 . Setting cˆ = ¯¯ = c ∗ and, in particular, g(h c¯¯ ∈ C such that g(h t−1 t ∗ ˆ c) ˆ =c . ¯ we have ξt (h Q.E.D. (c¯¯ c),
LEMMA 17—Separability: ∗ satisfies the following separability conditions: (i) Take any c ∗ cˆ∗ ∈ C ∗ with c0∗ = cˆ0∗ . Then, for any c¯0∗ such that (c¯0∗ 1 c ∗ ), ∗ 1 ∗ (c¯0 cˆ ) ∈ C ∗ , (c0∗ 1 c ∗ ) ∗ (c0∗ 1 cˆ∗ ) if and only if
(c¯0∗ 1 c ∗ ) ∗ (c¯0∗ 1 cˆ∗ )
(ii) For any t ≥ 0 and c ∗ cˆ∗ c¯∗ c¯¯∗ ∈ C ∗ such that (c ∗t c¯∗ ), (cˆ∗t c¯∗ ), (c ∗t c¯¯∗ ), (cˆ c¯¯∗ ) ∈ C ∗ , ∗t
(22)
(c ∗t c¯∗ ) ∗ (cˆ∗t c¯∗ ) if and only if
(c ∗t c¯¯∗ ) ∗ (cˆ∗t c¯¯∗ )
PROOF: To see (i), note that (c0∗ 1 c ∗ ) ∗ (c0∗ 1 c˙∗ ) if and only if ∗ (23) c0 + ϕ(h) c1∗ + ϕ(h c0∗ + ϕ(h)) h c0∗ + ϕ(h) c˙1∗ + ϕ(h c0∗ + ϕ(h)) for some h ∈ H such that c ∗ c˙∗ ∈ Ch∗ . Since c ∗ ∈ Ch∗ , then c0∗ + ϕ(h) ∈ Q. Because h satisfies DC, (23) holds if and only if ∗ c1 + ϕ(h c0∗ + ϕ(h)) c2∗ + ϕ h c0∗ + ϕ(h) c1∗ + ϕ(h c0∗ + ϕ(h)) h(c0∗ +ϕ(h)) c˙1∗ + ϕ(h c0∗ + ϕ(h)) c˙2∗ + ϕ h c0∗ + ϕ(h) c˙1∗ + ϕ(h c0∗ + ϕ(h)) By definition, then, 1 c ∗ ∗ 1 c˙∗ . To prove (ii), find h large enough so that (c ∗t c¯∗ ), (cˆ∗t c¯∗ ), (c ∗t c¯¯∗ ), (cˆ∗t c¯¯∗ ) ˜˜ c, ˜˜ = ˜ c ˙ and c¨ such that g(h c) ˜ = (c ∗t c¯∗ ), g(h c) ∈ Ch∗ . Hence, there exist c ∗t ∗ ∗t ¯ ∗ ∗t ¯ ∗ ˙ = (c c¯ ), and g(h c) ¨ = (cˆ c¯ ). Moreover, we must have (cˆ c¯ ), g(h c) c˜t = c˙t and c˜˜t = c¨t . ˆ c ¯ c¯¯ ∈ C so that By Lemma 16, ξt is surjective. We claim there are hˆ and c c (24)
$$\xi_t(\hat h, (c^t, \bar c)) = (c^{*t}, \bar c^*), \qquad \xi_t(\hat h, (\hat c^t, \bar c)) = (\hat c^{*t}, \bar c^*),$$
$$\xi_t(\hat h, (c^t, \bar{\bar c})) = (c^{*t}, \bar{\bar c}^*), \qquad \xi_t(\hat h, (\hat c^t, \bar{\bar c})) = (\hat c^{*t}, \bar{\bar c}^*).$$
Recalling the construction in Lemma 15, choose $\hat h > h$ large enough so that
$$\varphi(\hat h 0^{t+1}) \;\ge\; \max\Biggl\{ \sum_{s=0}^{t} (1-\lambda_1)^{s+1} \tilde c_s + \varphi(h\tilde c^t),\ \sum_{s=0}^{t} (1-\lambda_1)^{s+1} \tilde{\tilde c}_s + \varphi(h\tilde{\tilde c}^t) \Biggr\}.$$
Now that we have an hˆ that will work uniformly for these four streams in C ∗ , note again from the construction in Lemma 15 that the required continuation streams depend only on c˜t = c˙t and c˜˜t = c¨t . Therefore, c¯ and c¯¯ may be constructed as desired in (24). From the construction at the end of Lemma 16 and the fact that hˆ has been chosen to work uniformly, c and cˆ may be chosen to satisfy (24). Consequently, using (24), the desired result (22) holds if and only if ˆ (c t c)) ˆ (cˆt c)) ¯ ∗ ξt (h ¯ if and only if ξt (h ˆ (c t c)) ˆ (cˆt c)) ¯¯ ∗ ξt (h ¯¯ ξt (h
which, using the definitions of ξt in (17) and ∗ , holds if and only if t t−1 hˆ cˆt−1 c¯ + d h0 hcˆ if and only if t−1 t t−1 t t−1 hˆ cˆt−1 c¯¯ + d h0 hcˆ c c¯¯ + d h0 hc
t
c t−1 c¯ + d h0 hc
t−1
But this is immediately true by compensated separability.
Q.E.D.
For each subset of indices $K \subset \mathbb{N}$, we will define the projection correspondences $\iota_K : C^* \rightrightarrows \prod_{i \in K} \mathbb{R}$ by
$$\iota_K(\hat C^*) = \Bigl\{ x \in \prod_{i \in K} \mathbb{R} \;\Big|\; \exists\, c^* \in \hat C^* \text{ s.t. } c^*|_K = x \Bigr\},$$
where c ∗ |K denotes the restriction of the stream c ∗ to the indices in K (e.g., c ∗ |{34} = (c3∗ c4∗ )). For any t ≥ 0 we will use Ct∗ and t C ∗ to denote the projected spaces ι{t} (C ∗ ) and ι{tt+1} (C ∗ ), respectively. Since g(· ·) is continuous, the projected image Ct∗ is connected for every t. Moreover, each Ct∗ is separable. It is evident by the arbitrariness of histories used to construct these spaces that for any t, t C ∗ = C ∗ . LEMMA 18—Product of Projections: Choose some t and cˆ∗ ∈ t C ∗ , and take c ∈ Cs∗ for every 0 ≤ s ≤ t. Then (c0∗ c1∗ ct∗ cˆ∗ ) ∈ C ∗ . ∗ s
ˆ c)| ˆ {01t} . PROOF: Pick hˆ ∈ H and cˆ ∈ C such that cˆ∗ ∈ Ch∗ˆcˆt and let c˜t∗ = g(h ∞ ∗ ∗ ˆ Choose any ε ≥ max{0 max0≤i≤t (c˜i −ci )/ k=i+1 λk } and set h = h+( ε ε). Recall the inverse function g−1 (h ·), which takes an element of C ∗ and returns an element of C. We do not know that (c0∗ c1∗ ct∗ cˆ∗ ) ∈ C ∗ , but we demonstrate that applying the transformation used in g−1 (h ·) to (c0∗ c1∗ ct∗ cˆ∗ ) returns a nonnegative stream. Let us take c t = g−1 (h (c0∗ c1∗ ct∗ cˆ∗ ))|{01t} ˆ it will suffice to prove that c t ≥ cˆt , for then Since the Ch∗ are nested and h ≥ h, ∗ ∗ ¯ = (c0∗ c1∗ ct∗ cˆ∗ ). Using cˆ ∈ Chct and there is a c¯ ∈ C such that g(h (c t c)) ˆ c ∗ + ϕ(hc0 ) ≥ the transformation, c t ≥ cˆt if and only if c0∗ + ϕ(h) ≥ c˜0∗ + ϕ(h), 1 c˜1∗ + ϕ(hˆ cˆ0 ), up through ct∗ + ϕ(hc0 · · · ct−1 ) ≥ c˜t∗ + ϕ(hˆ cˆ0 · · · cˆt−1 ). But this can be seen using induction, the choices of ε and h, and the fact that ϕ is linear and strictly increasing. Q.E.D. We have proved that C ∗ = C0∗ × C1∗ × C2∗ × C ∗ and that ∗ is continuous and sensitive (stationarity implies essentiality of all periods). Hence C ∗ is a product of separable and connected spaces. We now use the result of Gorman
(1968, Theorem 1), which requires that each of C0∗ C1∗ C2∗ , and C ∗ be arcconnected and separable. We have shown separability; arc-connectedness follows from being a path-connected Hausdorff space (a convex space is pathconnected and a metric space is Hausdorff). Gorman’s Theorem 1 asserts that the set of separable indices is closed under unions, intersections, and differences. Condition (22) implies separability of {(0) (1)}, stationarity implies separability of {(1 2 3 )} and {(2 3 4 )}, and so on.22 Repeated application of Gorman’s theorem implies Debreu’s additive separability conditions for n = 4 and we may conclude from Fishburn (1970, Theorem 5.5) that there exist u0 u1 u2 : R → R and U3 : C ∗ → R (all continuous and unique up to a similar positive linear transformation) such that c ∗ ∗ cˆ∗ if and only if u0 (c0∗ ) + u1 (c1∗ ) + u2 (c2∗ ) + U3 (3 c ∗ ) ≥ u0 (cˆ0∗ ) + u1 (cˆ1∗ ) + u2 (cˆ2∗ ) + U3 (3 cˆ∗ ) h Can Be Represented as in (1) LEMMA 19—Representation: There → is∞ a continuous, ∞bounded utility u : R (t) ), where h = R and δ ∈ (0 1) such that Uh (c) = t=0 δt u(ct − k=1 λk h(t) k (h c0 c1 ct−1 ). PROOF: ∗ is a continuous, stationary, and sensitive preference relation, and can be represented in the form u0 (·) + u1 (·) + u2 (·) + U3 (·) on the space C ∗ = C0∗ × C1∗ × C2∗ × C ∗ , with the additive components continuous and unique up to a similar positive affine transformation. There is also additive representability on C ∗ = C0∗ × C1∗ × C ∗ , with the additive components again unique up to a similar positive linear transformation. By stationarity, u0 (·) + u1 (·) + [u2 (·) + U3 (·)] and u1 (·) + u2 (·) + U3 (·) are both additive representations on C ∗ = C0∗ × C1∗ × C ∗ . Thus, ∃δ > 0 and β1 β2 β3 ∈ R such that u1 (·) = δu0 (·) + β1 , u2 (·) = δu1 (·) + β2 = δ2 u0 (·) + δβ1 + β2 , and, for any c ∗ ∈ C ∗ , U3 (c ∗ ) = δ[u2 (c0∗ ) + U3 (1 c ∗ )] + β3 , which means U3 (c ∗ ) = ∗ ) + δU3 (1 c ∗ ) + β3 + δβ2 + δ2 β1 . Each c ∈ C and h ∈ H is bounded δ3 u0 (c 0∞ ¯ x ∈ R such that x ≤ ct∗ ≤ x¯ ∀t. By and k=1 λk ≤ 1, so for each c ∗ ∈ C ∗ , ∃x ∞ ¯ ∞ is compact in i=0 R and therefore [x x] ¯ ∞ ∩ C∗ Tychonoff’s theorem, [x x] ∗ ¯ continuity of u0 (·) and U3 (·) ensures they is compact in C . Given x and x, ¯ and [x x] ¯ ∞ ∩ C ∗ , respectively. Using itremain uniformly bounded on [x x] ∞ ∗ ∗ t ∗ erative substitution, U (c ) = t=0 δ u(ct ), where to ensure product continuity, δ ∈ (0 1) and u(·) = u0 (·) must be continuous and bounded. To represent h as in (1) we then transform each c ∈ C by g(h ·) into an argument Q.E.D. of U ∗ .
×
22 Because (22) holds for all t, it is an even stronger hypothesis than necessary; also, for any t, {(t t + 1 t + 2 )} is strictly sensitive by dynamic consistency.
The Felicity u Is Acyclic

We first prove the following auxiliary result.$^{23}$

LEMMA 20—Rewriting: Consider any sequence $\{\gamma_t\}_{t \in \mathbb{N}}$ and $h \in H$. If $\bar c \in \prod_{t=0}^{\infty} \mathbb{R}$ satisfies $\bar c_t = \varphi(h\bar c^{t-1}) + \gamma_t$ for every $t$, then each $\bar c_t$ may be alternatively written as
(25) $\bar c_t \;=\; \gamma_t + d_t^h + \sum_{s=0}^{t-1} d_s^{\bar 0 \gamma_{t-s-1}}.$
PROOF: It is clearly true for $t = 0$. Suppose (25) holds for every $t \le T - 1$. Then
$$\bar c_T = \gamma_T + \varphi(h \bar c^{T-1})$$
$$= \gamma_T + \varphi\Bigl(h\,\bigl(\gamma_0 + d_0^h\bigr)\,\bigl(\gamma_1 + d_1^h + d_0^{\bar 0\gamma_0}\bigr)\cdots\Bigl(\gamma_{T-1} + d_{T-1}^h + \sum_{s=0}^{T-2} d_s^{\bar 0\gamma_{T-s-2}}\Bigr)\Bigr)$$
$$= \gamma_T + \varphi(h d_0^h \cdots d_{T-1}^h) + \sum_{s=0}^{T-1} \varphi\bigl(\bar 0\,\gamma_s\, d_0^{\bar 0\gamma_s} \cdots d_{T-2-s}^{\bar 0\gamma_s}\bigr)$$
$$= \gamma_T + d_T^h + \sum_{s=0}^{T-1} d_s^{\bar 0\gamma_{T-s-1}},$$
where the second-to-last equality follows from using the recursive characterization given in Lemma 6 and reversing the order of summation. Q.E.D. LEMMA 21—Acyclicity: u(·) is acyclic. consider two cases. PROOF: We ∞ Case (i). k=1 λk < 1. Suppose that u is cyclic. Apply Lemma 20 with γt = γ for every t and recall the summability of per-period compensation from Lemma 8. These results imply that c¯ as defined in Lemma 20 remains bounded, that is, c¯ ∈ C. Moreover, c¯0 = γ, so c is nonzero. We claim this c¯ is exactly ruled out in Lemma 1, a contradiction. By the representation, c + c¯ h c + c¯ if and 23
For technical convenience, the statement of this lemma allows an extension of the definition ¯ ¯ of compensation to negative “histories”; hence if γ < 0, then d (0γ) = −d (0−γ) .
only if
$$\sum_{t=0}^{\infty} \delta^t u\bigl(c_t + \bar c_t - \varphi(hc^{t-1}) - \varphi(\bar 0\bar c^{t-1})\bigr) \;\ge\; \sum_{t=0}^{\infty} \delta^t u\bigl(c'_t + \bar c_t - \varphi(hc'^{t-1}) - \varphi(\bar 0\bar c^{t-1})\bigr).$$
¯ Consider the tth term u(ct + c¯t − ϕ(hc t−1 ) − ϕ(0¯ c¯t−1 )). By construction of c, this term is u(ct − ϕ(hc t−1 ) + γ) = u(ct − ϕ(hc t−1 )) It becomes evident that c + c¯ h c + c¯ if and only if c h c for any c c ∈ C. ∞ Case (ii). k=1 λk = 1. Suppose that u is cyclic. Then there exists γ > 0 such that u(x + γ) = u(x) for every x ∈ R. In this case, simply choose c¯0 = γ and c¯t = ϕ(0¯ c¯t−1 ) for every t ≥ 1. Clearly c¯ ∈ C. It is easy to check that c + c¯ h c + c¯ if and only if c h c for any c c ∈ C, violating Lemma 1. Q.E.D. A.2. Necessity The constructive proof of sufficiency has proved all but uniqueness of compensation. LEMMA 22—Unique Compensation: Given the representation, for every (h h) ∈ H there is a unique d ∈ C satisfying c + d h c + d if and only if c h c for every c c ∈ C. PROOF: To prove this, we define a period utility as quasicyclic if there exist β γ > 0 and α such that u(x + γ) = βu(x) + α for all x ∈ R. Note that a quasicyclic function which is not cyclic is necessarily unbounded, either from above or below, and ruled out by the boundedness requirement.24 Suppose both d h h h h as constructed earlier and some satisfy the condition. By the ∞d ∈t C, d = d , t−1 representation for , both δ u(c − ϕ(h c ) + dt − ϕ((h − h )d t−1 )) h t t=0 ∞ t t−1 and t=0 δ u(ct − ϕ(h c )) represent h . Using the uniqueness of the additive representation, there exist β > 0 and a sequence {αt }t≥0 such that for any c ∈ C, u ct − ϕ(h c t−1 ) + dt − ϕ((h − h )d t−1 ) = βu(ct − ϕ(h c t−1 )) + αt 24
Using this definition, note that the proof of Lemma ∞21 is easily modified to prove a stronger result: unique compensation implies that (i) when k=1 λk < 1, u cannot be quasicyclic, and (ii) when ∞ k=1 λk = 1, u cannot be quasicyclic with β = 1. Here we effectively prove a converse.
Let γt = dt − ϕ((h − h )d t−1 ) for every t; we must show γt = 0 for all t. For any x ∈ R and any t, there is c ∈ C such that ct − ϕ(h c t−1 ) = x. Indeed, if x ≥ 0, choose cs = 0 for s < t and ct = ϕ(h 0t ) + x; if x < 0, choose cs = 0 for s < t − 1, t ct−1 = x/λ1 , and c t = ϕ(h 0 ). Hence u(x + γt ) = βu(x) + αt for all x t. ∞ Suppose that k=1 λk < 1. Consider the first nonzero γt . If it is positive, then u is quasicyclic, a contradiction either to acyclicity or boundedness. If γt < 0, then rearranging and changing variables gives u(x + |γt |) = u(x)/β − αt /β. Hence ∞ u is quasicyclic, again a contradiction. Now suppose k=1 λk = 1. If some γt = 0, then u(x)(1 − β) = αt for all x, implying that u is quasicyclic with β = 1, again a contradiction. Hence γt = 0 for every t. We aim to show there exist t tˆ such that γt = γtˆ . If instead γt = γ for every t, then we know that γ > 0 from Lemma 26 in the Supplemental Material. That lemma says that for any γ < 0, there does not exist a stream ˆ c) ≤ (γ γ ) (apply the lemma with c ∈ C and history hˆ ∈ H such that g(h ˆ h = h − h and c = d). But if γ > 0, then dt = ϕ((h − h )d t−1 ) + γ cannot be in ¯ 0γ C, ∞a contradiction. To see this, observe by Lemma 8 that dt−1 = λ1 γ > 0 when k=1 λk = 1; then apply Lemma 20 to see d grows unboundedly. Hence there exist nonzero γt = γtˆ such that u(x + γt ) = βu(x) + αt and u(x + γtˆ) = βu(x) + αtˆ for all x. Plugging x + γtˆ into the first equation and x + γt into the second implies βu(x + γt ) + αtˆ = u(x + γt + γtˆ) = βu(x + γtˆ) + αt
for all x
Suppose without loss of generality that γt > γtˆ . By changing variables, we see that for all x, u(x + γ) ˜ = u(x) + α, ˜ where γ˜ = γt − γtˆ and α˜ = (αt − αtˆ)/β. But then u is quasicyclic with β = 1, a contradiction. Q.E.D.
∞
APPENDIX B: PROOF OF THEOREM 3
If k=1 λk = 1, then λk+1 /λk = 1 − λ1 for every k. Then ϕ(hq) = (1 − λ1 )ϕ(h) + λ1 q. For the particular h and c0 cˆ0 ∈ Q from Axiom IE, find the corresponding c1 cˆ1 . Axioms IE and DC together imply that hc0 c1 and hcˆ0 cˆ1 are equivalent preferences, both representable as in (1) according to Theorem 1. By the uniqueness of additive representations up to positive affine transformation, there exist ρ > 0 and σi for every i ≥ 0 such that for each c¯ ∈ C, (26)
u(c¯ − ϕ(h00c¯i−1 ) − λi+1 c1 − λi+2 c0 ) = ρu(c¯ − ϕ(h00c¯i−1 ) − λi+1 cˆ1 − λi+2 cˆ0 ) + σi
For each ∞ i, let γi = λi+1 c1 + λi+2 c0 − λi+1 cˆ1 − λi+2 cˆ0 . If k=1 λk < 1, then γi = 0 for every i since u cannot be quasicyclic. For the ∞ case k=1 λk = 1, we note that ρ = 1 must hold. Since λi+1 /λi ≤ 1 − λ1 ∈ (0 1),
both the values |λi+1 cˆ1 + λi+2 c0 | and |λi+1 cˆ1 + λi+2 cˆ0 | tend to zero as i goes to infinity. As previously noted, for any i and x ∈ R, we may find a c¯ ∈ C such that x = c¯ − ϕ(h00c¯i−1 ). Then, by (26) and continuity of u(·), limi→∞ σi = (1 − ρ)u(x) for any x ∈ R. Since the right-hand side depends on x while the lefthand side does ∞not, we must have ρ = 1. Since u cannot be quasicyclic with β = 1 when k=1 λk = 1, we have γi = 0 for every i in that case too. Since γi = 0 for every i, we have λi+1 /λi = (c1 − cˆ1 )/(cˆ0 − c0 ) for all i ≥ 1. Then ∞
$$\varphi(hq) \;=\; \sum_{k=2}^{\infty} \frac{\lambda_k}{\lambda_{k-1}}\,\lambda_{k-1} h_{k-1} + \lambda_1 q \;=\; \frac{c_1 - \hat c_1}{\hat c_0 - c_0}\sum_{k=2}^{\infty} \lambda_{k-1} h_{k-1} + \lambda_1 q \;=\; \frac{c_1 - \hat c_1}{\hat c_0 - c_0}\,\varphi(h) + \lambda_1 q.$$
Define α = (c1 − cˆ1 )/(cˆ0 − c0 ) and β = λ1 . Since λi+1 /λi ≤ 1 − λ1 , then α + β ≤ 1. Q.E.D. REFERENCES ABEL, A. (1990): “Asset Prices Under Habit Formation and Catching Up With the Joneses,” American Economic Review, 80, 38–42. [1341] ACZÉL, J., AND J. DHOMBRES (1989): Functional Equations in Several Variables. New York: Cambridge University Press. [1359] BECKER, G., AND K. MURPHY (1988): “A Theory of Rational Addiction,” Journal of Political Economy, 96, 675–700. [1342] BLEICHRODT, H., K. ROHDE, AND P. WAKKER (2008): “Koopmans’ Constant Discounting: A Simplification and a Generalization,” Journal of Mathematical Psychology, 52, 341–347. [1343] BOLDRIN, M., L. CHRISTIANO, AND J. FISHER (1997): “Habit Persistence and Asset Returns in an Exchange Economy,” Macroeconomic Dynamics, 1, 312–332. [1352] (2001): “Habit Persistence, Asset Returns, and the Business Cycle,” The American Economic Review, 91, 149–166. [1341] BUCHWALD, D., Z. ROSENKRANZ, T. SAUER, J. ILLY, AND V. HOLMES (EDS.) (2009): The Collected Papers of Albert Einstein, Vol. 12. Princeton, NJ: Princeton University Press. [1341] CAMERER, C., AND G. LOEWENSTEIN (2004): “Behavioral Economics: Past, Present, and Future,” in Advances in Behavioral Economics, ed. by C. Camerer, G. Loewenstein, and M. Rabin. Princeton, NJ: Princeton University Press. [1343,1352] CARROLL, C., J. OVERLAND, AND D. WEIL (2000): “Saving and Growth With Habit Formation,” American Economic Review, 90, 341–355. [1341] CONSTANTINIDES, G. (1990): “Habit Formation: A Resolution of the Equity Premium Puzzle,” Journal of Political Economy, 98, 519–543. [1341,1352] FISHBURN, P. (1970): Utility Theory for Decisionmaking. New York: Wiley. [1368] GORMAN, W. (1968): “The Structure of Utility Functions,” Review of Economic Studies, 35, 367–390. [1367,1368] GUL, F., AND W. PESENDORFER (2007): “Harmful Addiction,” Review of Economic Studies, 74, 147–172. [1343] HICKS, J. (1939): Value and Capital. Oxford: Clarendon Press. [1347] JARCZYK, W. (1991): A Recurrent Method of Solving Iterative Functional Equations. Prace Naukowe ´ aski. [1358] Uniwersytetu Slaskiego w Kawtowicach, Vol. 1206. Katowice: Uniwersytet Sl¸
KOOPMANS, T. (1960): “Stationary Ordinal Utility and Impatience,” Econometrica, 28, 287–309. [1342,1343] KÖSZEGI, B., AND M. RABIN (2006): “A Model of Reference Dependent Preferences,” Quarterly Journal of Economics, 121, 1133–1166. [1345] (2009): “Reference-Dependent Consumption Plans,” American Economic Review, 99, 909–936. [1345] KOZICKI, S., AND P. TINSLEY (2002): “Dynamic Specifications in Optimizing Trend-Deviation Macro Models,” Journal of Economic Dynamics and Control, 26, 1585–1611. [1349] NEILSON, W. (2006): “Axiomatic Reference-Dependence in Behavior Towards Others and Toward Risk,” Economic Theory, 28, 681–692. [1342] ROZEN, K. (2010): “Supplement to ‘Foundations of Intrinsic Habit Formation’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7302_Proofs. pdf. [1352] RUSTICHINI, A., AND P. SICONOLFI (2005): “Dynamic Theory of Preferences: Taste for Variety and Habit Formation,” Mimeo. [1343] SCHRODER, M., AND C. SKIADAS (2002): “An Isomorphism Between Asset Pricing Models With and Without Linear Habit Formation,” The Review of Financial Studies, 15, 1189–1221. [1352] SHALEV, J. (1997): “Loss Aversion in a Multi-Period Model,” Mathematical Social Sciences, 33, 203–226. [1343,1351] SHI, S., AND L. EPSTEIN (1993): “Habits and Time Preference,” International Economic Review, 34, 61–84. [1341] SUNDARESAN, S. (1989): “Intertemporally Dependent Preferences and the Volatility of Consumption and Wealth,” Review of Financial Studies, 2, 73–89. [1352] TVERSKY, A., AND D. KAHNEMAN (1991): “Loss Aversion in Riskless Choice: A ReferenceDependent Model,” Quarterly Journal of Economics, 106, 1039–1061. [1343] URIBE, M. (2002): “The Price-Consumption Puzzle of Currency Pegs,” Journal of Monetary Economics, 49, 533–569. [1341] VON NEUMANN, J., AND O. MORGENSTERN (1944): Theory of Games and Economic Behavior. Princeton: Princeton University Press. [1344] WENDNER, R. (2003): “Do Habits Raise Consumption Growth?” Research in Economics, 57, 151–163. [1341,1349]
Cowles Foundation and Dept. of Economics, Yale University, Box 208281, New Haven, CT 06520-8281, U.S.A.; [email protected]. Manuscript received July, 2007; final revision received March, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1375–1412
RISK AND RATIONALITY: UNCOVERING HETEROGENEITY IN PROBABILITY DISTORTION BY ADRIAN BRUHIN, HELGA FEHR-DUDA, AND THOMAS EPPER1 It has long been recognized that there is considerable heterogeneity in individual risk taking behavior, but little is known about the distribution of risk taking types. We present a parsimonious characterization of risk taking behavior by estimating a finite mixture model for three different experimental data sets, two Swiss and one Chinese, over a large number of real gains and losses. We find two major types of individuals: In all three data sets, the choices of roughly 80% of the subjects exhibit significant deviations from linear probability weighting of varying strength, consistent with prospect theory. Twenty percent of the subjects weight probabilities near linearly and behave essentially as expected value maximizers. Moreover, individuals are cleanly assigned to one type with probabilities close to unity. The reliability and robustness of our classification suggest using a mix of preference theories in applied economic modeling. KEYWORDS: Individual risk taking behavior, latent heterogeneity, finite mixture models, prospect theory.
1. INTRODUCTION RISK IS A UBIQUITOUS FEATURE of social and economic life. Many of our everyday choices, and often the most important ones, such as what trade to learn and where to live, involve risky consequences. While it has long been recognized that individuals differ in their risk taking attitudes, comparatively little is known about the distribution of risk preferences in the population.2 Since preferences are one of the ultimate drivers of behavior, knowledge of the composition of risk attitudes is paramount to predicting economic behavior. Economic models often allow for heterogeneity, but this heterogeneity is usually defined by the boundaries of the standard model of preferences, expected utility theory (EUT). The empirical evidence, however, reveals that heterogeneity in risk taking behavior is of a substantive kind, that is, some people evaluate risky prospects consistently with EUT, whereas other people depart substantially from expected utility maximization (Hey and Orme (1994)). Moreover, it seems to be the case that rational decision makers, revealing EUT preferences, constitute only a minority of the population (Lattimore, Baker, and Witte (1992)). 1
We thank Ernst Fehr, David Levine, Rainer Winkelmann, Michael Wolf, three anonymous referees, and the participants of the ESA World Meeting 2007 and the EEA-ESEM Meeting 2008 for helpful comments. The usual disclaimer applies. This research was supported by the Swiss National Science Foundation (Grant 100012-109907). 2 Exceptions include Dohmen, Falk, Huffman, Sunde, Schupp, and Wagner (2005), Eckel, Johnson, and Montmarquette (2005), Harrison, Lau, Rutström, and Sullivan (2005), and Harrison, Lau, and Rutström (2007). © 2010 The Econometric Society
DOI: 10.3982/ECTA7139
To improve descriptive performance, a plethora of alternative theories have been developed. Unfortunately, no single best fitting model has been identified so far (Harless and Camerer (1994), Starmer (2000)) and, depending on the individual, one or the other model fits better. This finding poses a serious problem for applied economics. What the modeler needs is a parsimonious representation of risk preferences that is empirically well grounded and robust, and not a host of different functionals. Providing such a parsimonious characterization of heterogeneity in risk taking behavior is the objective of this paper. Our method is based on a literature on classifying individuals which has been recently adopted by the social sciences. On the basis of statistical classification procedures, such as finite mixture models, investigators have tried to discover which decision rules people actually apply when playing games or dealing with complex decision situations (El-Gamal and Grether (1995), Stahl and Wilson (1995), Houser, Keane, and McCabe (2004), Houser and Winter (2004)). The finite mixture approach does not require fitting a model for each individual, which is—given the usual quality of choice data—frequently impossible and often not desirable in the first place. Instead, our method reveals latent heterogeneity by estimating the proportions of distinct behavioral types in the population and assigning each individual to one endogenously defined behavioral type, characterized by a unique set of parameter values. We apply such a finite mixture model to choice data from three different experiments, two of which were conducted in Zurich, Switzerland. The third experiment took place in Beijing, People’s Republic of China. We analyze 448 subjects’ decisions over real monetary gains and losses, which comprise a total of nearly 18,000 observations. All three experiments were designed in a similar manner and served to elicit certainty equivalents for binary lotteries. Using a flexible sign-dependent functional as the basic behavioral model, we show the following main results. First, the estimation procedure renders a robust classification of risk taking behavior across all three data sets. Moreover, the proportions of these distinct types in their respective populations are very similar. Second, almost all the experimental subjects are unambiguously assigned to one distinct type. Measuring the quality of classification by the normalized entropy criterion (Celeux and Soromenho (1996)), ambiguity of assignments turns out to be extremely low. Thus, we observe hardly any mixed types, that is, individuals with a high probability (of say 0.4) of being one type and a high probability (of say 0.6) of being another type. This clean segregation suggests that the classification procedure is able to capture the distinctive characteristics of each behavioral type. Third, without restricting parameter values a priori, we find that in all three data sets, the minority type, which constitutes about 20% of the population, weights probabilities and values monetary outcomes near linearly. Consequently, this group of individuals can essentially be characterized as expected
value maximizers. This result is particularly interesting in the light of Rabin’s (2000) calibration theorem, which shows that expected utility maximizers should be approximately risk neutral for small stakes, which typically are encountered in laboratory experiments, if behavior under high stakes is to remain within a plausible range of risk aversion. Therefore, we label subjects belonging to this group of nearly risk neutral people as EUT types. Moreover, the EUT group remains robust to increasing the number of types in the mixture. Fourth, the majority of individuals, labeled cumulative prospect theory (CPT) types, are characterized by significant departures from linear probability weighting, consistent with prospect theory. As three-group classifications show, this group’s behavior can be characterized as a mixture of two different types: In all three data sets a proportion of approximately 30% of the subjects display pronounced departures from linear probability weighting, whereas the relative majority of 50% differ less radically from linear probability weighting. Finally, within the class of CPT types, we find major differences between Swiss and Chinese behavior. Sensitivity to changes in probabilities is generally lower for the Chinese subjects than for the Swiss. While the majority CPT groups’ probability weighting curves do not differ dramatically between countries, the minority groups display diametrically opposed patterns of probability weighting. In particular, the minority Chinese CPT group weights probabilities extremely favorably, rendering them risk seeking over a considerable range of probabilities. The minority Swiss CPT group, however, is characterized by the opposite behavior. Thus, our analysis provides a deeper understanding for the finding that, on average, the Chinese tend to be more risk seeking than westerners (Kachelmeier and Shehata (1992)). Our results show that the classification procedure successfully uncovers latent heterogeneity in the population. If there is heterogeneity of a substantive kind, as the data suggest, basing predictions on a single preference theory is inappropriate and may lead to biased results (Wilcox (2006)). EUT preferences should be taken into account alongside prospect theory preferences, even if rational EUT individuals constitute only a minority in the population. As the literature on the role of bounded rationality under strategic complementarity and substitutability shows, the mix of rational and irrational actors may be decisive for aggregate outcomes (Haltiwanger and Waldman (1985, 1989), Fehr and Tyran (2005), Camerer and Fehr (2006), Fehr and Tyran (2008)). Depending on the nature of strategic interdependence, the behavior of even a minority of players may drive the aggregate outcome. Therefore, the mix of types in the population is a crucial variable in predicting market outcomes. Since the finite mixture model provides a robust and reliable classification of individuals, the resulting estimates of group sizes and group-specific parameters may serve as valuable inputs for applied economics. The finite mixture method has been used by others in the context of modeling risk taking. However, to the best of our knowledge, there is no previous study showing a nearly identical classification of risk preference types for three
independent data sets. Additionally, our analysis breaks new ground by showing that EUT types emerge endogenously and by extending classification to three groups. Related work by Harrison and colleagues (Andersen, Harrison, and Rutström (2006), Harrison, Humphrey, and Verschoor (2010), Harrison and Rutström (2009)) applies finite mixture models as well, but differs from our approach. Their estimation procedure is based on the a priori assumption that choices, irrespective of by whom they were taken, are either EUT consistent or CPT consistent, that is, it sorts choices by a predefined decision model. In contrast, we aim to classify individuals by endogenously defined type. Therefore, if there is a group of people whose behavior can best be described by EUT, they should get identified by the classification procedure. Furthermore, in certain decision situations, choices of EUT individuals and CPT individuals do not differ substantially from one another and, therefore, both decision models fit equally well. Consequently, depending on the data available, classification by EUT- and CPT-consistent decisions may differ markedly from classification by decision makers’ types. A recent study by Conte, Hey, and Moffatt (2010) is also dedicated to finite mixture modeling of risk taking behavior. Their results for British subjects corroborate our conclusions: Even though their work differs from ours in set of lotteries, elicitation method, and estimation procedure, and restricts one behavioral type to be EUT a priori, they also find that in the domain of gains, 80% of the individuals exhibit nonlinear probability weighting, whereas 20% are assigned to EUT. The paper is structured as follows. Section 2 describes the experimental design and procedures of the three experiments. The functional specification of the behavioral model and the finite mixture model are discussed in Section 3. Section 4 presents descriptive statistics of the data and the results of the classification procedure. Section 5 concludes. 2. EXPERIMENTAL DESIGN In the following section we describe the experimental setup and procedures. The experiments took place in Zurich in 2003 and 2006 as well as in Beijing in 2005. In Zurich, all subjects were recruited from the subject pool of the Institute for Empirical Research in Economics, which consists of students of all fields of the University of Zurich and the Swiss Federal Institute of Technology Zurich. In Beijing, subjects were recruited by flier distributed at the campuses of Peking University and Tsinhua University. Since all three experiments are based on the same design principles, we will present the prototype experiment Zurich 2003 in detail and describe the extent to which the other two experiments deviate. The main distinguishing features of the different experiments are summarized in Table I. We elicited certainty equivalents for a large number of two-outcome lotteries. One-half of the lotteries were framed as choices between risky and certain
TABLE I
DIFFERENCES IN EXPERIMENTAL DESIGN

                        Zurich 03                  Zurich 06         Beijing 05
Number of Subjects      179                        118               151
Lotteries               50                         40                28
Observations            8906                       4669              4225
Procedure               Computerized               Computerized      Paper and pencil
Framing                 Abstract and contextual    Contextual        Abstract and contextual
gains (“gain domain”); the other half were presented as choices between risky and certain losses (“loss domain”).3 For each decision in the loss domain, subjects were endowed with a specific monetary amount, which served to cover potential losses and equalized expected payoffs of corresponding gain and loss lotteries. In the Zurich 2003 and the Beijing experiments, 50% of the subjects were confronted with decisions framed in the standard gamble format. The other 50% of the subjects had to make choices framed in contextual terms, that is, gains were represented as risky or sure investment gains, and losses were represented as repair costs and insurance premiums, respectively. The Zurich 2006 experiment was based on contextually framed lotteries only. In Zurich, outcomes x1 and x2 ranged from 0 to 150 Swiss francs.4 The payoffs in the Beijing 2005 experiment were commensurate with the compensation in Zurich and varied between 4 and 55 Chinese yuan.5 Expected payoffs per subject amounted to approximately 31 Swiss francs and 20 Chinese yuan, respectively, which was considerably more than a local student assistant’s hourly compensation, plus a show up fee of 10 Swiss francs and 20 Chinese yuan, thus generating salient incentives. Probabilities p of the lotteries’ higher gain or loss x1 varied from 5% to 95%. The gain lotteries for Zurich 2003 are presented in Table II. The other two experiments essentially included a subset of these. The lotteries appeared in random order on a computer screen6 in the Swiss experiments and on paper in Beijing. In the computerized experiments, the screen displayed a decision sheet containing the specifics of the lottery under consideration and a list of 20 equally spaced certain outcomes, ranging from the lottery’s maximum payoff to the 3
There were no mixed lotteries involving both gains and losses. At the time of the experiments, 1 Swiss franc equaled about 0.76 and 0.84 U.S. dollars, respectively. 5 At the time of the experiment, 1 Chinese yuan equaled about 0.12 U.S. dollars. 6 The experiment was programmed and conducted with the software z-Tree (Fischbacher (2007)). 4
TABLE II
GAIN LOTTERIES (x1, p; x2), ZURICH 2003a

p      x1    x2        p      x1    x2        p      x1    x2
0.05    20     0       0.25    50    20       0.75    50    20
0.05    40    10       0.50    10     0       0.90    10     0
0.05    50    20       0.50    20    10       0.90    20    10
0.05   150    50       0.50    40    10       0.90    50     0
0.10    10     0       0.50    50     0       0.95    20     0
0.10    20    10       0.50    50    20       0.95    40    10
0.10    50     0       0.50   150     0       0.95    50    20
0.25    20     0       0.75    20     0
0.25    40    10       0.75    40    10

a Outcomes x1 and x2 are denominated in Swiss francs.
lottery’s minimum payoff, as shown in Figure 1.7 The subjects had to indicate in each row of the decision sheet whether they preferred the lottery or the certain payoff. The lottery’s certainty equivalent was calculated as the arithmetic mean of the smallest certain amount the subject preferred to the lottery and the subsequent certain amount on the list, when the subject had, for the first time, reported preference for the lottery. For example, if the subject had decided as indicated by the small circles in Figure 1, her certainty equivalent would amount to 13.5 Swiss francs. Before subjects were permitted to start working on the real decisions, they had to correctly calculate the payoffs for two hypothetical choices. In the computerized experiments, there were two trial rounds to familiarize the subjects with the procedure. At the end of the experiment, one row number of one decision sheet was randomly selected for each subject, and the subject’s choice in that row determined her payment. Subjects were paid in private afterward. The subjects could work at their own speed; the vast majority of them needed less than an hour to complete the experimental tasks as well as a socio-economic questionnaire. 3. ECONOMETRIC MODEL This section discusses the specification of the finite mixture model, which allows controlling for latent heterogeneity in risk taking behavior in a parsimonious way. For the purpose of classifying subjects according to risk taking type, we need to specify three ingredients of the mixture model: the basic theory of decision under risk, the functional form of the decision model, and the specification of the error term. 7
The format of the decision sheet for the Beijing experiment was identical to the Zurich one.
FIGURE 1.—Design of the decision sheet.
The underlying theory of decision under risk should be able to accommodate a wide range of different behaviors. Sign- and rank-dependent models capture reference dependence and nonlinear probability weighting. Therefore, a flexible approach in the spirit of cumulative prospect theory (CPT) lends itself to describing risk taking behavior. Moreover, CPT nests EUT as special case.8 If there is a group of people whose behavior can best be described by EUT, these individuals should be identified by the finite mixture estimation as a unique group exhibiting the predicted behavior. Suppose that there are C different types of individuals in the population. According to CPT, an individual belonging to a certain group c ∈ {1 C} values any binary lottery Gg = (x1g pg ; x2g ) g ∈ {1 G}, where |x1g | > |x2g |, by v(Gg ) = v(x1g )w(pg ) + v(x2g )(1 − w(pg )) 8
The bulk of previous research has been conducted under the tacit assumption that utility is defined over lottery outcomes rather than lottery outcomes integrated with total wealth. In Section 4.8.1, we extend the model to accommodate the possibility of integration.
The function v(x) describes how monetary outcomes x are valued, whereas the function w(p) assigns a subjective weight to every outcome probability p. The lottery's certainty equivalent $\hat{ce}_g$ can then be written as
$$\hat{ce}_g = v^{-1}\bigl( v(x_{1g})\,w(p_g) + v(x_{2g})\,(1 - w(p_g)) \bigr).$$
To make CPT operational, we have to assume specific functional forms for the value function v(x) and the probability weighting function w(p). A natural candidate for v(x) is a sign-dependent power function
$$v(x) = \begin{cases} x^{\alpha} & \text{if } x \ge 0,\\ -(-x)^{\beta} & \text{otherwise,} \end{cases}$$
which can be conveniently interpreted and has turned out to be the best compromise between parsimony and goodness of fit in the context of prospect theory (Stott (2006)). Our specification of the value function seems to lack a prominent feature of prospect theory, loss aversion, capturing that “[. . . ] most people find symmetric bets of the form (x 05; −x 05) distinctly unattractive” (Kahneman and Tversky (1979, p. 279)). In this interpretation, loss aversion measures a decision maker’s attitude toward mixed lotteries, encompassing both gains and losses.9 Our lottery design does not contain any mixed lotteries, however. When there are only single-domain lotteries and loss aversion is introduced into our model in the conventional way, that is, by assuming v(x) = −λ(−x)β for x < 0 and λ > 0 (Tversky and Kahneman (1992)), the parameter of loss aversion λ is not identifiable: λ cancels out in the definition of the certainty equivalent ce of a loss lottery (x1 p; x2 ) with x1 < x2 ≤ 0, as λ(−ce)β = λ(−x1 )β w(p) + λ(−x2 )β (1 − w(p)) holds for any value of λ. Consequently, when there are no mixed lotteries available, estimating such a parameter is neither feasible nor meaningful. Obviously, this argument rests on the assumption that subjects’ reference point with respect to which gains and losses are defined is equal to zero. However, subjects might encode positive payments as gains only if they exceed a certain threshold, which would turn some of the objectively given gain lotteries into mixed ones, containing both subjective gains and losses. While in principle this is possible, estimating this reference point is questionable when there are no mixed lotteries from the onset, which would provide valuable additional information for locating the reference point reliably. To complicate matters, near linear value functions, as is predominantly the case for our data, pose severe 9 Köbberling and Wakker (2005, p. 125) viewed loss aversion as a component of risk attitudes which is logically independent from basic utility: “Prospects [. . . ] will exhibit considerably less risk aversion if [. . . ] they are nonmixed than if [. . . ] they are mixed. [. . . ] [T]he difference in risk aversion between them is due to loss aversion.”
identification problems.10 For these reasons, we stick to common practice and assume a zero reference point. Turning to the second component of the model, a variety of functional forms for modeling probability weights w(p) have been proposed in the literature (Quiggin (1982), Tversky and Kahneman (1992), Prelec (1998)). We use the two-parameter specification suggested by Goldstein and Einhorn (1987) and Lattimore, Baker, and Witte (1992): w(p) =
$$\frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1-p)^{\gamma}}, \qquad \delta \ge 0,\ \gamma \ge 0.$$
We favor this specification because it has proven to account well for individual heterogeneity (Wu, Zhang, and Gonzalez (2004)) and the parameters are nicely interpretable. The parameter γ < 1 largely governs the slope of the curve and measures sensitivity toward changes in probability. The smaller the value of γ is, the more strongly the probability weighting function departs from linear weighting.11 The parameter δ largely governs curve elevation and measures the relative degree of optimism. The larger is the value of δ for gains, the more elevated is the curve, the higher is the weight placed on every probability, and, consequently, the more optimistically the prospect is valued, ceteris paribus. For losses, the opposite holds. Linear weighting is characterized by γ = δ = 1. In a sign-dependent model, the parameters may take on different values for gains and for losses. We now turn to the third step of model specification. In the course of the experiments, we measured risk taking behavior of individual i ∈ {1 N} by her certainty equivalents ceig for a set of different lotteries. Since CPT explains deterministic choice, we have to add an error term εig so as to estimate the parameters of the model based on the elicited certainty equivalents. The observed ˆ g + εig . There may be certainty equivalent ceig can then be written as ceig = ce different sources of error, such as carelessness, hurrying, or inattentiveness, that result in accidentally wrong answers (Hey and Orme (1994)). The central limit theorem supports our assumption that the errors are normally distributed and simply add white noise. Furthermore, we allow for three different sources of heteroskedasticity in the error variance. First, for each lottery, subjects had to consider 20 certain outcomes, which are equally spaced throughout the lottery’s range |x1g − x2g |. Since the observed certainty equivalent ceig is calculated as the arithmetic mean of the smallest certain amount preferred to the lottery and the subsequent amount on the list, the error is proportional to the lottery range.12 10 Previous attempts to estimate model parameters simultaneously with the reference point are extremely scarce and suggest that the reference point is of negligible magnitude (Harrison, List, and Towe (2007)); their experimental design included mixed lotteries, however. 11 If linear probability weighting is accepted as a standard of rationality, γ < 1 can be interpreted as an index of departure from rationality (Tversky and Wakker (1995)). 12 See Wilcox (2010) for a similar approach in the context of discrete choice under risk.
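As an illustration of how these two building blocks combine, the following sketch (in Python; it is not the authors' estimation code, and the parameter values are invented for illustration) computes the model's certainty equivalent for one of the gain lotteries in Table II:

```python
# Illustrative sketch: CPT certainty equivalent of a binary gain lottery
# (x1, p; x2) under the power value function and the Goldstein-Einhorn /
# Lattimore probability weighting function.  alpha, gamma, delta are made up.

def w(p, gamma, delta):
    """Probability weight attached to the larger outcome's probability p."""
    return delta * p**gamma / (delta * p**gamma + (1.0 - p)**gamma)

def certainty_equivalent(x1, p, x2, alpha=0.9, gamma=0.6, delta=0.8):
    """ce = v^{-1}( v(x1) w(p) + v(x2) (1 - w(p)) ) for gains x1 > x2 >= 0."""
    wp = w(p, gamma, delta)
    value = (x1 ** alpha) * wp + (x2 ** alpha) * (1.0 - wp)
    return value ** (1.0 / alpha)

# One of the Zurich 2003 gain lotteries from Table II: (50, 0.05; 20)
print(round(certainty_equivalent(50, 0.05, 20), 2))
```

With $\gamma < 1$ the small probability is overweighted, so the predicted certainty equivalent can lie above the lottery's expected value, which is the risk-seeking pattern for low-probability gains discussed in the descriptive statistics below.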
1384
A. BRUHIN, H. FEHR-DUDA, AND T. EPPER
Second, as the subjects may be heterogeneous with respect to their previous knowledge, their attention span, and their ability to find the correct certainty equivalent, we expect the error variance to differ by individual. Third, lotteries in the gain domain may be evaluated differently from those in the loss domain. Therefore, we allow for domain-specific variance in the error term. This yields the form σig = ξi |x1g − x2g | for the standard deviation of the error distribution, where ξi denotes an individual domain-specific parameter. Note that the model allows us to test for both individual-specific and domain-specific heteroskedasticity either by imposing the restriction ξi = ξ or by forcing all the ξi to be equal in both decision domains. Both types of restrictions are rejected by their corresponding likelihood ratio tests in all three samples with p-values close to zero. Therefore, we control for all three types of heteroskedasticity in the estimation procedure. Having discussed all the necessary ingredients, we now turn to the specification of the finite mixture model. The basic idea of the mixture model is assigning an individual’s risk taking choices to one of C types of behavior, each characterized by a distinct vector of parameters θc = (αc βc γc δc ) .13 When estimating the model parameters, the number of types C is held fixed. The optimum number of classes is determined by estimating mixture models with varying C and applying some suitable test to decide among them (see Section 4.2). We denote the proportions of the C different types in the population by πc . Given our assumptions on the distribution of the error term, the density of type c for the ith individual can be expressed as G ˆg ceig − ce 1 φ f (cei G ; θc ξi ) = σig σig g=1 where φ denotes the density of the standard normal distribution. Since we do not know a priori to which group a certain individual belongs, the proportions πc are interpreted as probabilities of group membership. Therefore, each individual density of type c has to be weighted by its respective mixing proportion πc , which, of course, is unknown and has to be estimated as well. Summing over all C components yields the individual’s contribution to the model’s likelihood L. The log likelihood of the finite mixture model is then given by ln L(Ψ ; ce G ) =
N i=1
ln
C
πc f (cei G ; θc ξi )
c=1
where the vector Ψ = (θ1 θC π1 πC−1 ξ1 ξN ) summarizes all the parameters of the model. 13 The vectors γc and δc contain the domain-specific parameters for the slope and the elevation of the probability weighting functions.
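A minimal sketch of this mixture log likelihood follows; all names and array layouts are ours, and the model-implied certainty equivalents ĉeg per type and the error standard deviations σig = ξi|x1g − x2g| are taken as given rather than derived from the CPT specification.

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(pi, ce, ce_hat_by_type, sigma):
    """ln L = sum_i ln sum_c pi_c * f(ce_i; theta_c, xi_i), where f is a product
    over the G lotteries of normal densities with sd sigma_ig.

    pi:              (C,)   mixing proportions
    ce:              (N, G) observed certainty equivalents
    ce_hat_by_type:  (C, G) model-implied certainty equivalents for each type
    sigma:           (N, G) error standard deviations, e.g. xi_i * |x1g - x2g|
    """
    N, C = ce.shape[0], len(pi)
    log_f = np.empty((N, C))
    for c in range(C):
        # log f(ce_i; theta_c, xi_i): sum of normal log-densities over lotteries
        log_f[:, c] = norm.logpdf(ce, loc=ce_hat_by_type[c], scale=sigma).sum(axis=1)
    # ln sum_c pi_c exp(log_f_ic), computed via log-sum-exp for numerical stability
    m = log_f.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_f - m) @ pi)).sum())
```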
The parameters are estimated by the iterative expectation maximization (EM) algorithm (Dempster, Laird, and Rubin (1977)),14 which provides an additional feature: In each iteration, the algorithm calculates by Bayesian updating an individual's posterior probability τic of belonging to group c. The final posterior probabilities represent a particularly valuable result of the estimation procedure. Not only do we obtain the probabilities of individual group membership, but we also have a method of judging the quality of classification at our disposal. If all the τic are either close to 0 or 1, all the individuals are unambiguously assigned to one specific group. The τic can be used to calculate a suitable measure of entropy, such as the normalized entropy criterion (Celeux and Soromenho (1996)), to gauge the extent of ambiguous assignments. If classification has been successful, that is, if genuinely distinct types have been identified, we should observe a low measure of entropy.

14 Various problems may be encountered when maximizing the likelihood function of a finite mixture model and, therefore, a customized estimation procedure was used that can adequately deal with these problems. Details of the estimation procedure, written in the R environment (R Development Core Team (2006)), are discussed in the Supplemental Material (Bruhin, Fehr-Duda, and Epper (2010)) available online.

4. RESULTS

In this section we present descriptive statistics of the raw data and the results of the finite mixture estimations.

4.1. Descriptive Statistics

At the level of observed data, risk taking behavior can be conveniently summarized by relative risk premia RRP = (ev − ce)/|ev|, where ev denotes the expected value of a lottery's payoff and ce stands for its certainty equivalent. RRP > 0 indicates risk aversion, RRP < 0 risk seeking, and RRP = 0 risk neutrality. In the context of EUT, risk preferences are captured solely by the curvature of the utility function, which in turn determines the sign of relative risk premia. Hence, the sign of RRP should be independent of p, the probability of the more extreme lottery outcome. In Figures 2–4, median risk premia sorted by p show a systematic relationship between RRP and p, however: In all three data sets subjects' choices display a fourfold pattern, that is, they are risk averse for low-probability losses and high-probability gains, and they are risk seeking for low-probability gains and high-probability losses. Therefore, at a first glance, average behavior is adequately described by a model such as CPT rather than EUT. As the following sections show, the median RRPs gloss over an important feature of the data as there is substantial latent heterogeneity in risk taking behavior.
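A small numerical illustration of the RRP measure (the lottery and the elicited certainty equivalent are made up for the example):

```python
# RRP = (ev - ce) / |ev| for a hypothetical lottery: win 50 with probability 0.1,
# otherwise 0, with an elicited certainty equivalent of 10.
p, x1, x2, ce = 0.1, 50.0, 0.0, 10.0
ev = p * x1 + (1 - p) * x2      # expected value = 5.0
rrp = (ev - ce) / abs(ev)       # = -1.0 < 0: risk seeking for this low-probability gain
print(ev, rrp)
```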
FIGURE 2.—Median relative risk premia, Zurich 2003.
4.2. Model Selection

So far we have not addressed the issues of whether a finite mixture model is actually to be preferred over a single-component model in the first place, and of what the number of groups C in the mixture model, often termed model size, should be. To deal with these questions, the researcher needs a criterion for assessing the correct number of mixture components. The literature on model selection in the context of mixture models is quite controversial, however, and
FIGURE 3.—Median relative risk premia, Zurich 2006.
FIGURE 4.—Median relative risk premia, Beijing 2005.
there is no best solution.15 For this reason, rather than relying on a single measure, we examine several criteria with differing characteristics to get a handle on the problem of model selection.

Obviously, the classical information criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are a natural starting point for our analysis. Unfortunately, the AIC is order inconsistent, that is, the probability that it is minimized at the true model size does not approach unity with increasing sample size, and it tends to overfit models (Atkinson (1981), Geweke and Meese (1981), Celeux and Soromenho (1996)). The BIC, on the other hand, has been proved to be consistent under suitable regularity conditions, but may suffer from over- or underestimating the number of mixture components (Biernacki, Celeux, and Govaert (2000)). Aside from these problems, both classical criteria share the principle of trading off model parsimony against goodness of fit, but do not directly measure the ability of the mixture to provide well separated and nonoverlapping components, which, ultimately, is the objective of estimating mixture models. Therefore, Celeux and Soromenho (1996) proposed the normalized entropy criterion (NEC), which is based on the posterior probability of group membership τic. Biernacki, Celeux, and Govaert (1999) argued that the NEC criterion appears to be less sensitive than AIC and BIC. However, the NEC focuses solely on the quality of classification and does not take model fit into account.

Ideally, what the researcher would like to have at her disposal is a criterion that delivers an assessment of both model fit, making allowance for parsimony,

15 "The problem of identifying the number of classes is one of the issues in mixture modeling with the least satisfactory treatment" (Wedel (2002, p. 364)). For example, a standard likelihood ratio test is not appropriate here (Cameron and Trivedi (2005, p. 624)).
and the quality of classification. Biernacki, Celeux, and Govaert (2000) therefore suggested modifying the BIC criterion by factoring in a penalty for mean entropy. When the mixture components are well separated, mean entropy is close to zero and its weight in their proposed integrated completed likelihood criterion (ICL) is negligible. In the one-component case, there is no entropy by definition, and therefore ICL coincides with BIC. While there is no theoretical justification for this approach, simulations seem to show a superior performance compared to other heuristic criteria, such as NEC (Biernacki, Celeux, and Govaert (2000)), as well as compared to AIC and BIC (McLachlan and Peel (2000)).

As different criteria may come up with conflicting results concerning the correct number of mixture components, model selection is a difficult problem. One way to deal with this issue is to use one's central research question as a guideline.16 Our concern here is twofold: First, given the vast heterogeneity in individual risk taking behavior, it is doubtful whether a single-component model is adequate. Therefore, the crucial question is whether C > 1 should be preferred to C = 1.17 Second, considering the heated dispute about the "right" model of choice under risk, another objective of our study is to identify relative group sizes of EUT and non-EUT types. Bearing these objectives in mind, we calculated values for four different criteria, AIC, BIC, NEC, and ICL, and three different model sizes, C ∈ {1, 2, 3}, which are presented in Table III. According to these criteria, the model size which minimizes the respective criterion value should be preferred.

TABLE III
MODEL SELECTION CRITERIA

                   AIC        BIC       NEC        ICL
Zurich 03
  C = 1        −38,398    −35,815      n.a.    −35,815
  C = 2        −39,629    −36,997    0.0099    −36,991
  C = 3        −40,504    −37,822    0.0131    −37,807
Zurich 06
  C = 1        −20,858    −19,297      n.a.    −19,297
  C = 2        −22,173    −20,568    0.0041    −20,566
  C = 3        −22,622    −20,971    0.0049    −20,968
Beijing 05
  C = 1        −18,485    −16,529      n.a.    −16,529
  C = 2        −19,585    −17,585    0.0061    −17,582
  C = 3        −19,965    −17,920    0.0114    −17,912

16 Cameron and Trivedi (2005, p. 622) argued in this context: "Therefore, it is very helpful in empirical application if the components have a natural interpretation."

17 Parameter estimates for C = 1 are presented in the Supplemental Material.
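As an illustration, the sketch below computes the four criteria from a fitted mixture's log likelihood, number of parameters, number of observations, and posterior probabilities τic. The definitions follow our reading of Celeux and Soromenho (1996) and Biernacki, Celeux, and Govaert (2000); the exact conventions behind Table III (for example, which sample size enters the BIC penalty) are assumptions, and the E-step formula for τic is included only to show where the entropy term comes from.

```python
import numpy as np

def posteriors(pi, log_f):
    """E-step: tau_ic proportional to pi_c * f(ce_i; theta_c, xi_i).
    log_f: (N, C) component log-densities for each individual."""
    log_tau = np.log(pi) + log_f
    log_tau -= log_tau.max(axis=1, keepdims=True)
    tau = np.exp(log_tau)
    return tau / tau.sum(axis=1, keepdims=True)

def entropy(tau):
    """Classification entropy E(C) = -sum_i sum_c tau_ic * ln(tau_ic)."""
    tau = np.clip(tau, 1e-300, 1.0)
    return float(-(tau * np.log(tau)).sum())

def selection_criteria(loglik, n_params, n_obs, tau, loglik_one_component):
    """AIC, BIC, NEC, and entropy-penalized ICL for one fitted mixture model."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    ent = entropy(tau)
    icl = bic + 2.0 * ent                                    # ICL-BIC variant
    C = tau.shape[1]
    nec = ent / (loglik - loglik_one_component) if C > 1 else float("nan")
    return {"AIC": aic, "BIC": bic, "NEC": nec, "ICL": icl}
```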
As AIC, BIC, and therefore also ICL, are highest at C = 1 for all three data sets, C > 1 is clearly favored over C = 1. As the NEC criterion is not defined for C = 1, Biernacki, Celeux, and Govaert (1999) argued in favor of a multicomponent model if there is a C > 1 with NEC(C) ≤ 1, which is the case here. We therefore conclude that a finite mixture model is superior to a single-component model, given the unanimous recommendation by all four criteria. With regard to the choice between C = 2 and C = 3, the three-group classifications seem to be favored by all criteria but NEC. Given the minimum level of NEC at C = 2, a two-group classification is preferable if the central issue is a parsimonious representation of risk taking types rather than best model fit. As entropy is generally extremely low for both the two-group and three-group classifications, both model sizes seem quite sensible, however.

Before we infer from these results that we should choose C = 3, we take a closer look at the difference between the two-group and the three-group classifications.18 What is of special interest here is whether one group remains fairly stable and the other group gets subdivided into two new ones when model size is increased, or whether the individuals get reshuffled to three new types. If the latter were the case, a two-group specification would clearly be misleading. To answer this question, we examine relative group sizes and transition patterns of individuals' type assignment. Table IV displays the estimated relative group sizes of the behavioral types for model sizes C = 2 and C = 3. As the percentages reveal, all the Type I groups remain stable with respect to relative group size. Moreover, with a few exceptions, Type I individuals remain Type I when model size is increased: Only a total of 2% of the individuals move into or out of Type I when an additional component is introduced into the finite mixture model.19 Increasing model size results in a decomposition of the original Type II groups into two subtypes, Type IIa and Type IIb, as there is still considerable heterogeneity within these groups. Thus, from the point of view of identifying Type I individuals, the

18 Since there is quite some heterogeneity within the majority group, it is to be expected that even finer segmentations improve model fit. However, when we extend the number of groups beyond three, multimodality of the log likelihood function becomes a severe problem as, depending on the randomly drawn start values, even a stochastic extension of the EM algorithm tends to converge to local maxima. For "poorly drawn" start values, the estimation algorithm diverges, with one group getting smaller in each iteration, which might indicate that the likelihood is unbounded (McLachlan and Peel (2000, p. 54)). Therefore, estimating larger models may ask too much of our data. See also the discussion of overparametrization in Cameron and Trivedi (2005, p. 625). Nevertheless, in the case of Zurich 06 we were able to estimate four- and five-group models: In both cases, the relative size of the minority group declines only slightly. This finding supports our conjecture that heterogeneity is particularly pronounced within the majority group, whereas the minority group is fairly homogeneous and robust to model size. Since we are not able to present results for all three data sets, we do not discuss these findings here.
19 Across all three data sets, only two individuals are newly assigned to Type I and seven individuals leave Type I when C is increased from 2 to 3.
TABLE IV
RELATIVE GROUP SIZES

                Type I    Type II/IIa    Type IIb
Zurich 03
  C = 2          17.1%        82.9%
  C = 3          16.7%        27.3%        56.0%
Zurich 06
  C = 2          22.3%        77.7%
  C = 3          22.0%        29.8%        48.2%
Beijing 05
  C = 2          20.3%        79.7%
  C = 3          19.9%        29.3%        50.8%
two-group classifications are informative by themselves and provide the most parsimonious classification, whereas three groups render a more detailed description of Type II individuals. To keep interpretation of graphs manageable, we will present results for C = 2 when contrasting Type I with Type II, and for C = 3 when discussing subtypes of Type II behavior.20

4.3. Clean and Robust Segregation of Behavioral Types

To be of value to applied economics, a classification of risk taking behavior should meet two conditions. First, it should be clean, that is, all the individuals should be clearly associated with one specific risk taking type. Second, the classification should be robust across different experiments based on the same design principles.

Regarding the first condition, entropy criteria, based on the posterior probabilities of group membership, can be used to evaluate the quality of classification. One such measure is the normalized entropy criterion introduced in the previous section. If all the individuals can be clearly assigned to one of the different behavioral groups, the posterior probabilities of group membership τic are close to 0 or 1, and NEC ≈ 0. A τic distinctly different from 0 or 1 indicates that the individual is classified as a "mixed" type belonging to group c with probability τic and to the other group(s) with probability 1 − τic. As Table III shows, NEC always lies in the vicinity of 0, irrespective of model size C being 1, 2, or 3, that is, there are hardly any mixed types with ambiguous group affiliation.

The high quality of classification can also be inferred directly from the distributions of the individuals' posterior probabilities of group membership. In Figure 5, based on C = 2, τEUT denotes the posterior probability of belonging

20 The interested reader is referred to Bruhin, Fehr-Duda, and Epper (2007) for an extensive discussion of C = 2.
FIGURE 5.—Distribution of posterior probability of assignment to EUT, τEUT (C = 2).
to the first type, which can indeed be characterized, as we will demonstrate below, as expected utility maximizers.21 As the distributions of τEUT show, the individuals' posterior probabilities of behaving consistently with EUT are either close to 1 or close to 0 for practically all the individuals in all three data sets, indicating an extremely clean segregation of subjects to types. Our result is quite remarkable as it substantiates that there are distinct types in the population—be they Swiss or Chinese—and it also shows that the underlying behavioral model provides a sound basis of discriminating between them.

With respect to the second criterion, robustness of classification, Figure 5 illustrates the probably most striking result of our study, namely similar distributions of types across all three data sets. In all three histograms of Figure 5, there are about four times as many individuals with τEUT close to 0, compared to individuals with τEUT close to 1. This finding is mirrored by the estimates of the relative group sizes, displayed in Table IV, which show a stable proportion of Type I of about 20%, irrespective of model size C. Moreover, it can be shown that the hypothesis of the same distribution of types prevailing in all three data sets cannot be rejected. Similarly, when model size is increased to C = 3, relative group sizes turn out to be of equal magnitudes in all three data sets and are statistically indistinguishable from one another. Therefore, classification is not only unambiguous, but also results in roughly equal mixing proportions, demonstrating that classification is robust across experiments.

This finding leads us to the next question. Do the respective types identified in each data set also exhibit similar patterns of behavior? This question will be addressed in the following sections, dedicated to the characterization of the endogenously defined types of behavior.

21 As group membership is stable, histograms of τEUT for C = 3 are qualitatively the same.
4.4. Characterization of the Minority Type

Irrespective of model size, the first type of individuals encompasses about 20% of the subjects in all three data sets, thus constituting the minority type. To characterize risk taking behavior, we examine the parameter estimates of the value functions and probability weighting functions. Table V displays, for C = 2, the type-specific parameter estimates of the finite mixture model and their standard errors, obtained by the bootstrap method with 4000 replications (Efron and Tibshirani (1993)).22 When model size is increased to three groups, parameter estimates, presented in Tables VI–IX, remain unchanged for the minority type, as group membership does not change substantially. Therefore, from the point of view of identifying this type of individuals, model size is not

TABLE V
CLASSIFICATION OF BEHAVIOR (C = 2)a

                          EUT Types                                  CPT Types
Parameters      ZH 03     ZH 06     BJ 05    Pooled       ZH 03     ZH 06     BJ 05    Pooled
π               0.171     0.223     0.203     0.193       0.829     0.777     0.797     0.807
               (0.026)   (0.025)   (0.020)   (0.013)
Gains
  α             0.978     0.988     1.083     0.981       1.054     0.901     0.389     0.941
               (0.014)   (0.018)   (0.102)   (0.011)     (0.021)   (0.026)   (0.107)   (0.013)
  γ             0.954     0.945     0.911     0.943       0.415     0.425     0.245     0.377
               (0.022)   (0.020)   (0.033)   (0.021)     (0.015)   (0.015)   (0.014)   (0.009)
  δ             0.910     0.909     0.889     0.911       0.845     0.862     1.315     0.926
               (0.015)   (0.019)   (0.052)   (0.012)     (0.022)   (0.028)   (0.074)   (0.013)
Losses
  β             1.007     1.013     1.023     1.015       1.107     1.122     1.144     1.139
               (0.018)   (0.023)   (0.084)   (0.013)     (0.028)   (0.047)   (0.107)   (0.019)
  γ             0.871     0.953     0.949     0.950       0.417     0.451     0.309     0.397
               (0.043)   (0.020)   (0.040)   (0.023)     (0.017)   (0.014)   (0.013)   (0.009)
  δ             0.967     1.049     1.066     1.072       1.025     1.059     0.937     0.991
               (0.062)   (0.033)   (0.065)   (0.026)     (0.028)   (0.044)   (0.053)   (0.016)

ln L           20,185    11,336    10,108    41,385
Parameters        371       249       315       909
Individuals       179       118       151       448
Observations     8906      4669      4225    17,800

a Standard errors (in parentheses) are based on the bootstrap method with 4000 replications. Parameters include additional estimates for ξi for domain- and individual-specific error variances. ZH stands for Zurich; BJ stands for Beijing.
22 “[U]nless the sample size is very large, the standard errors found by an information-based approach may be too unstable to be recommended” (McLachlan and Peel (2000, p. 68)).
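A minimal sketch of how bootstrap standard errors of this kind can be obtained is given below. Whether the authors resample individuals, single observations, or use a parametric bootstrap is not stated here, so the resampling scheme, the fit_mixture estimator, and the data layout are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_se(subjects, fit_mixture, n_boot=4000, seed=0):
    """Nonparametric bootstrap over subjects: resample individuals with
    replacement, refit the mixture model on each resample, and report the
    standard deviation of every parameter across replications.
    `subjects` is a list of per-subject choice records; `fit_mixture` is a
    placeholder for an EM estimator returning a flat parameter vector."""
    rng = np.random.default_rng(seed)
    n = len(subjects)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(fit_mixture([subjects[i] for i in idx]))
    draws = np.asarray(draws)
    return draws.std(axis=0, ddof=1), draws   # standard errors and raw replications
```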
TABLE VI
CLASSIFICATION OF BEHAVIOR WITH C = 3, ZURICH 2003a

                EUT              CPT-I            CPT-II
π               0.167 (0.016)    0.273 (0.015)    0.560 (0.022)
Gains
  α             0.954 (0.013)    1.007 (0.016)    1.075 (0.015)
  γ             0.944 (0.041)    0.302 (0.031)    0.467 (0.013)
  δ             0.930 (0.020)    0.622 (0.023)    0.944 (0.017)
Losses
  β             1.006 (0.020)    1.237 (0.044)    1.091 (0.015)
  γ             0.885 (0.042)    0.304 (0.029)    0.459 (0.015)
  δ             1.024 (0.043)    1.371 (0.075)    0.897 (0.016)

ln L 20,630     Parameters 378     Individuals 179     Observations 8906

a Standard errors (in parentheses) are based on the bootstrap method with 4000 replications. Parameters include estimates of ξi for domain- and individual-specific error variances.
a crucial issue and the two-group classifications nicely contrast the distinctive characteristics of the minority type with the majority type. Concerning probability weighting, Table V displays almost identical parameter estimates across all three data sets as well as the pooled data. Without any restrictions imposed on the parameters, we find that the minority types’ probability weighting functions are roughly linear, as the parameter estimates for both γ and δ are close to 1. Since the probability weights are a nonlinear combination of these parameters, inference needs to be based on γ and δ jointly. Therefore, we constructed the 95%-confidence bands for the probability weighting curves by the bootstrap method. Figures 6, 7, and 8 contain the graphs of the type-specific probability weighting functions for each decision domain. The gray solid lines correspond to the estimated curves for the first type, referred to as EUT type, and the gray dashed lines delimit their respective confidence bands. For both gains and losses, the confidence bands for the first type in fact include the diagonal over a wide range of probabilities, demonstrating high congruence with linear probability weighting. Where the confidence bands do not include the diagonal, the curves still lie extremely close to linear weighting. In sum, in all three data sets, we find the first behavioral type to exhibit near linear probability weighting. With respect to the valuation of monetary outcomes, the second element of the decision model, the estimated parameters α and β also display a high degree of conformity. As can be inferred from the standard errors in Table V, the 95%-confidence intervals of each single curvature estimate contain unity,
TABLE VII
CLASSIFICATION OF BEHAVIOR WITH C = 3, ZURICH 2006a

                EUT              CPT-I            CPT-II
π               0.220 (0.020)    0.298 (0.025)    0.482 (0.030)
Gains
  α             0.990 (0.024)    0.884 (0.042)    0.908 (0.031)
  γ             0.946 (0.084)    0.362 (0.081)    0.465 (0.022)
  δ             0.905 (0.042)    0.658 (0.054)    1.012 (0.043)
Losses
  β             1.012 (0.029)    1.100 (0.083)    1.141 (0.049)
  γ             0.952 (0.081)    0.393 (0.078)    0.491 (0.023)
  δ             1.054 (0.074)    1.460 (0.122)    0.878 (0.054)

ln L 11,567     Parameters 256     Individuals 118     Observations 4669

a Standard errors (in parentheses) are based on the bootstrap method with 4000 replications. Parameters include estimates of ξi for domain- and individual-specific error variances.
TABLE VIII
CLASSIFICATION OF BEHAVIOR WITH C = 3, BEIJING 2005a

                EUT              CPT-I            CPT-II
π               0.199 (0.017)    0.293 (0.026)    0.508 (0.027)
Gains
  α             1.083 (0.098)    0.032 (0.155)    0.489 (0.113)
  γ             0.911 (0.051)    0.244 (0.049)    0.254 (0.023)
  δ             0.889 (0.094)    2.194 (0.241)    1.085 (0.113)
Losses
  β             1.023 (0.070)    1.348 (0.149)    1.111 (0.102)
  γ             0.948 (0.053)    0.263 (0.046)    0.332 (0.019)
  δ             1.062 (0.057)    0.600 (0.093)    1.106 (0.075)

ln L 10,304     Parameters 322     Individuals 151     Observations 4225

a Standard errors (in parentheses) are based on the bootstrap method with 4000 replications. Parameters include estimates of ξi for domain- and individual-specific error variances. Estimates for CPT-I α statistically not distinguishable from logarithmic utility.
TABLE IX
CLASSIFICATION OF BEHAVIOR WITH C = 3, POOLEDa

                EUT              CPT-I            CPT-II
π               0.198 (0.010)    0.316 (0.011)    0.486 (0.013)
Gains
  α             0.960 (0.009)    0.901 (0.009)    0.957 (0.010)
  γ             0.915 (0.032)    0.309 (0.015)    0.451 (0.010)
  δ             0.935 (0.009)    0.726 (0.012)    1.063 (0.010)
Losses
  β             1.019 (0.008)    1.250 (0.010)    1.139 (0.009)
  γ             0.935 (0.027)    0.339 (0.013)    0.444 (0.011)
  δ             1.055 (0.013)    1.230 (0.013)    0.878 (0.011)

ln L 42,105     Parameters 916     Individuals 448     Observations 17,800

a Standard errors (in parentheses) are based on the bootstrap method with 4000 replications. Parameters include estimates of ξi for domain- and individual-specific error variances.
implying that the hypothesis of linear value functions cannot be rejected. Together with near linear probability weighting, this result justifies regarding the first type of individuals as largely consistent with expected value maximization, and therefore EUT.
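Because the probability weights combine γ and δ nonlinearly, the confidence bands described above are built from joint bootstrap draws of the two parameters rather than from their individual standard errors. A minimal sketch follows, again assuming the linear-in-log-odds weighting form used in the earlier sketch; the band here is pointwise, which may differ from the authors' exact construction.

```python
import numpy as np

def prob_weight(p, gamma, delta):      # same illustrative form as above
    return delta * p**gamma / (delta * p**gamma + (1 - p)**gamma)

def weighting_band(gamma_draws, delta_draws, probs, level=0.95):
    """Evaluate w(p) at each joint bootstrap draw of (gamma, delta) and take
    quantiles probability by probability to obtain a pointwise band."""
    curves = np.array([prob_weight(probs, g, d)
                       for g, d in zip(gamma_draws, delta_draws)])
    alpha = 1.0 - level
    return np.quantile(curves, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
```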
FIGURE 6.—Type-specific probability weighting functions, Zurich 2003.
FIGURE 7.—Type-specific probability weighting functions, Zurich 2006.
4.5. Characterization of the Majority Types

As the discussion on model selection revealed, model size makes a difference when characterizing the majority types. Due to the stability of the minority EUT groups in all three data sets, the behavior of the majority groups can be described by a mixture of two different subtypes. As the majority groups exhibit
FIGURE 8.—Type-specific probability weighting functions, Beijing 2005.
FIGURE 9.—Probability weights CPT-I versus CPT-II, Zurich 2003.
inverted S-shaped probability weighting curves, apparent in Figures 6, 7, and 8, we label them CPT types and label their corresponding subtypes CPT-I and CPT-II. CPT-I and CPT-II groups are characterized by specific varieties of nonlinear probability weighting as Figures 9–11 show. The difference between CPT-I
FIGURE 10.—Probability weights CPT-I versus CPT-II, Zurich 2006.
FIGURE 11.—Probability weights CPT-I versus CPT-II, Beijing 2005.
and CPT-II types manifests itself predominantly in relative strength of optimism: the elevation of the probability weighting curves, measured by δ,23 differs substantially between CPT-I and CPT-II. CPT-II individuals, who constitute the relative majority of approximately 50% in all three data sets, exhibit moderately S-shaped probability weighting curves with δ in the vicinity of 1. The remaining 30% of the individuals, however, are characterized by differing patterns of behavior. Swiss CPT-I individuals are systematically less optimistic than Swiss CPT-II types, whereas the Chinese CPT-I group encompasses highly optimistic individuals, overweighting gain probabilities and underweighting loss probabilities over a wide range of probabilities. This specific feature of Chinese CPT-I types might explain the prevalence of high risk tolerance in the Chinese population, documented by previous research (Kachelmeier and Shehata (1992)). The three-group classifications constitute a valuable piece of information when more disaggregate estimates of risk taking behavior are called for. When the focus of research lies on a parsimonious characterization of risk taking types, juxtaposing rational decision makers, not prone to probability distortions, with nonrational ones, two-group classifications are sufficiently informative due to the stability of EUT group membership.
23 Parameter estimates are presented in Tables VI–IX.
4.6. Observed Behavior by Type

So far we have characterized the different behavioral types by their estimated parameter values. The obvious question that arises is whether the discriminatory power of the classification procedure can also be traced at the behavioral level. After assigning the subjects to one of the types, EUT, CPT-I, or CPT-II, based on their posterior probability of group membership τic, the observed relative risk premia can be broken down by type as depicted in Figure 12, aggregated for the pooled data set. As can be seen, the median RRP of the EUT types is close to 0, reflecting near risk neutral behavior in accordance with expected value maximization. When we trace the behavior of the CPT-I and CPT-II types at the level of observed RRP in Figure 12, we find a fourfold pattern of risk attitudes, with distinctive departures from risk neutrality. Consistent with the characterizations before, CPT-I types exhibit more pronounced deviations. These findings document that individuals' type assignment is highly congruent with observed behavioral differences.

Obviously, the type-specific median relative risk premia do not differ greatly at p = 0.5. In decision situations when the more extreme reward materializes with a 50% chance, the typical CPT individual will not over- or underweight its probability significantly, and therefore her behavior will often not be distinguishable from a typical EUT type's behavior. This consideration can be illustrated by means of Figure 13, which displays the departures of average CPT behavior, aggregated over both subtypes CPT-I and CPT-II, from EUT, measured by the type-specific differences in median normalized certainty equivalents. Each circle in Figure 13 corresponds to one specific lottery played in any of the three experiments, encompassing a total of 59 gain and 59 loss lotteries, ordered by the probability of the more extreme lottery outcome. At a gain probability of 25%, for instance, CPT lottery evaluations, on average, exceed EUT by up to 30% of their corresponding expected values. The dashed lines in the graphs represent the case when median CPT behavior does not differ from median EUT behavior. Positive values in the graphs indicate that, on average, CPT types are relatively more risk seeking than EUT types. The opposite holds for negative values. As the graphs show, zero differences occur solely at the 0.5 probability level, where, in some cases, average CPT behavior is totally indistinguishable from EUT behavior. The bulk of type-specific differences in lottery evaluations lie in the range of about ±20% of expected values, but there are also a few observations with up to ±300% of expected value, where the more extreme outcomes materialize with a low probability. In these cases, CPT types tend to overreact pronouncedly to stated probabilities. To provide an overall measure, we conducted two-sided Mann–Whitney tests which indicate significant differences (at the 5% level) in the type-specific distributions of the certainty equivalents for 75% of the lotteries.
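A minimal sketch of the per-lottery comparison with two-sided Mann–Whitney tests; the input layout (one array of certainty equivalents per lottery and type) is an assumption made for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def share_significant(ce_eut_by_lottery, ce_cpt_by_lottery, alpha=0.05):
    """For each lottery, test whether the certainty-equivalent distributions of
    EUT-classified and CPT-classified subjects differ (two-sided Mann-Whitney);
    return the share of lotteries significant at level alpha and all p-values."""
    pvals = np.array([mannwhitneyu(a, b, alternative="two-sided").pvalue
                      for a, b in zip(ce_eut_by_lottery, ce_cpt_by_lottery)])
    return float((pvals < alpha).mean()), pvals
```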
FIGURE 12.—Median relative risk premia by type, pooled.
FIGURE 13.—Differences in median normalized certainty equivalents, pooled.
4.7. Demographics and Group Membership

The finite mixture model is a powerful tool to uncover latent heterogeneity in behavior. Given our clean and robust classification of types, an interesting question is whether we can characterize the composition of the different groups by demographic variables. In particular, can we explain who the EUT types are? To answer this question, we conducted two kinds of analyses.

First, we estimated a single-component model with demographic variables as covariates. This procedure uncovers systematic behavioral differences among groups defined by observable socio-economic characteristics. We included the following variables: a gender dummy female, the number of semesters enrolled at university semester, and a binary variable highincome for incomes above a certain threshold. Summary statistics for these variables are included in the Supplemental Material. It turns out that the only variable that consistently affects behavioral parameters across experiments is female24: Being female is associated with a substantially lower value of γ, the slope of the probability weighting function. This finding implies that women tend to be less sensitive to changes in probability than men, in line with the evidence in Fehr-Duda, de Gennaro, and Schubert (2006).25

24 Note that the percentage of females is approximately 50% in all three data sets. Parameter estimates for the single-component model are available in the Supplemental Material.

25 In our experience, in student subject pools we generally do not find socio-economic characteristics, other than gender, that are systematically correlated with the curvature of the probability weighting function. Factors other than demographics may be more important here, but this question is still underresearched.
FIGURE 14.—Type-specific probability weighting functions: men.
Second, given that only gender systematically influences parameter values, we estimated the finite mixture model separately for men and women. In the following text we limit discussion to the results for the pooled model with C = 3. The gender-specific probability weighting functions classified by types are presented in Figures 14 and 15.26 Whereas the distributions of types are quite similar, the probability weights display a striking gender difference.

The men's groups differ essentially by their degree of rationality, characterized by the magnitude of the slope parameter γ. As in the overall data, the EUT group's probability weights lie very close to the diagonal. The male CPT-I types deviate quite strongly from linear weighting, whereas the CPT-II types, who constitute the relative majority of 49% of the men, lie somewhere in between these two more extreme groups. The women's minority group, however, departs more strongly from linear weighting than does the men's. One may conclude from these findings that the overall EUT group is dominated by the behavior of male individuals exhibiting near rational probability weighting.

The female CPT-I and CPT-II curves differ predominantly in the size of the elevation parameter δ. Compared with its male counterpart, the female CPT-I type also exhibits quite pronounced probability distortions, albeit with a larger fraction of optimistically weighted probabilities. The largest gender difference is displayed by the CPT-II types. Women in this group are characterized by the widest region of pessimistically weighted probabilities. This group's behavior seems to have a decisive influence on women's greater average risk aversion, usually found in empirical studies. While previous research has typically

26 Parameter estimates are available in the Supplemental Material.
FIGURE 15.—Type-specific probability weighting functions: women.
centered on comparative risk aversion, our finite mixture estimations provide new, much more detailed, insights into gender-specific differences in risk taking behavior.

4.8. Extensions

4.8.1. Robustness to Model Specification

An additional part of our analysis concerns the robustness of classification results with respect to alternative specifications of the value function. For instance, people may not evaluate lotteries in isolation, but integrate prospective outcomes with their wealth or consumption spending. To account for the possibility that subjects integrate prospective outcomes with some background variable, we reestimated the model with the value function being defined over the sum of the prospective lottery outcome and an additional type-specific background parameter k, to be estimated, such that v(x) = (x + k)^α over gains and mutatis mutandis over losses, that is, v(x) = (x + ω + k)^β, where ω stands for the initial endowment.27

27 Estimating such an additional parameter comes at a cost, however. As Wakker (2008, p. 1338) noted, k represents an "anti-index of concavity" and therefore serves a similar function as the exponents of the value function α and β. For this reason, their respective contributions to utility curvature cannot be reliably separated unless one has observations over two distinct sets of lotteries (e.g., over low stakes and high stakes) at one's disposal (Heinemann (2008)). Moreover, k is not identifiable when functions v are near linear. Previous research suggests that under EUT, people generally do not integrate their wealth in their choices over risky lotteries (Binswanger
FIGURE 16.—Distribution of posterior probability of assignment to EUT τEUT .
Extending our model in such a manner yields the following insights. First, the stability of classification is not affected by the alternative model specification: For all three data sets, the distribution of the posterior probability of belonging to EUT is almost unaltered when background consumption is introduced into the model as Figure 16 shows. The stability of group assignment is also reflected in the estimated relative group sizes πEUT. Table X clearly shows

TABLE X
ESTIMATED MODEL-SPECIFIC PROPORTIONS OF EUT TYPES, πEUT

                    Zurich 03    Zurich 06    Beijing 05
k = 0                 0.171        0.223        0.203
k endogenous          0.163        0.227        0.203
(1981), Harrison, List, and Towe (2007), Heinemann (2008)). Since group affiliation of EUT types remains stable, we limit discussion to C = 2 here.
FIGURE 17.—Probability weights Zurich 2003, k endogenous.
that these values practically do not change. Moreover, not a single subject out of 448 is assigned to a different group, defined by τic ≥ 0.5, after allowing for integration with background consumption. Finally, the estimated probability weighting functions for both the EUT types and the CPT types remain stable as well, as Figures 17–19 confirm. In sum, our analysis attests that the distribution of types, individuals' type affiliations, and the estimated probability weighting functions are robust to inclusion of background consumption. This robustness result represents further evidence that decision makers' tendency to weight probabilities nonlinearly is the driving force of classification.

4.8.2. Heterogeneity in Error Variance

The finite mixture model supplied us not only with estimates of type-specific behavioral parameters, but also with estimates of the error parameters, ξi—the normalized standard deviations of the error distributions. These parameters are idiosyncratic to the individual and, thus, capture some of the heterogeneity across subjects. A high error variance does not necessarily stem from random behavior, however. In an aggregate model such as ours, individual errors also reflect the degree of congruence between individual and group behavior. The question then arises of how well average behavioral group parameters describe subjects with differing degrees of departure from average behavior.28 To investigate this matter, we classified individuals as either low- or high-variance

28 We are grateful to an anonymous referee who called our attention to this issue.
FIGURE 18.—Probability weights Zurich 2006, k endogenous.
type, depending on their estimated ξi being below or above the respective median value, and reestimated the behavioral parameters for each of the resulting six types (two types of variance × three types of behavior), pooled over all three data sets. The upper panel in Figure 20 displays the average probability weighting curves for the aggregate types estimated from the pooled data.
FIGURE 19.—Probability weights Beijing 2005, k endogenous.
FIGURE 20.—Probability weights by error variance, pooled.
The lower panel contains the curves for the variance-specific types where the solid lines mark the low-variance people’s curves and the dot-dashed lines denote the respective high-variance ones. Comparing the variance-specific curves with the overall averages reveals that low- and high-variance EUT probability weighting functions generally differ somewhat in degree of rationality, but largely remain within a comparatively narrow band around linear weighting. CPT individuals, however, exhibit a wide range of degrees of optimism. Typical high-variance individuals are either distinctly less optimistic (CPT-I) or more
optimistic (CPT-II) than their low-variance colleagues.29 Not surprisingly perhaps, decomposing behavior according to error variance widens the spectrum of emerging probability weighting types. These findings underscore that EUT types are a fairly homogeneous group, whereas CPT types display a much wider range of behaviors.

29 In the upper panel of Figure 20, comparatively more optimistic probability weighting represents CPT-II and comparatively less optimistic weighting represents CPT-I.

5. CONCLUDING REMARKS

We conducted three experiments based on the same design principles and applied a finite mixture model to the choice data. Our results provide novel insights: In all three data sets, the procedure renders a parsimonious characterization of risk taking behavior. Across experiments, we find an equal mix of distinct types, each characterized by a specific pattern of probability distortion. Almost every single individual is identified as one specific type, rendering segregation extremely clean. By and large, 20% of the population adhere to linear probability weighting and behave essentially as expected value maximizers, whereas majority preferences are more suitably represented by a model such as prospect theory, which can accommodate nonlinear probability weighting. In each data set, the overall CPT group is composed of a smaller group of 30% of the subjects who display substantial departures from linear probability weighting, and a relative majority of 50% who depart less radically from linear probability weighting. Moreover, classification is robust to an alternative model specification.

Whereas the distribution of types is the same in the Swiss and the Chinese data sets, there are substantial cultural differences in CPT-type behavior, the most prominent being the existence of a pronouncedly optimistic group of Chinese subjects who distort small- and medium-sized probabilities much more strongly than do the Swiss. This prevalence of Chinese optimism in lottery valuation may explain previous findings that, on average, Chinese respondents are relatively more risk seeking than westerners (Kachelmeier and Shehata (1992), Hsee and Weber (1999)). We also identify a gender difference in risk taking behavior: Women generally depart more strongly from linear probability weighting than men. This finding corroborates previous research (Fehr-Duda, de Gennaro, and Schubert (2006), Harrison and Rutström (2009)). Moreover, on average, female probability distortions vary predominantly in degree of optimism, whereas male probability distortions vary in degree of rationality.

Our findings demonstrate that the finite mixture approach is a powerful tool to identify and to characterize the distribution of risk taking types in the population. In this study, the individual is the unit of classification, that is, the entirety of an individual's choices governs group affiliation. As the low measures of entropy demonstrate, almost every individual got unambiguously assigned to one
endogenously defined behavioral type. Previous work by Harrison and collaborators tried to accomplish a different goal: They estimated the probability that any one lottery choice, irrespective of the identity of the decision maker, was consistent with EUT or CPT, respectively, and found that "each [specification] is equally likely for these data" (Harrison and Rutström (2009, p. 144)). Clearly, in certain decision situations CPT-consistent choices are indistinguishable from EUT-consistent ones. Our findings indicate, for example, that this is the case for outcome probabilities in the neighborhood of 0.5. Since a CPT individual's choices in this region are interlinked with all her other choices, the respective observations are categorized as CPT by our method, but may be interpreted as either CPT or EUT in the choice-based approach. Therefore, classification results may differ depending on the unit of classification and the type of data available.

When we started this project, we were quite confident that we would find a considerable percentage of expected utility maximizers. What really surprised us is the robust percentage of EUT individuals, even across two so different cultures as the Swiss and Chinese. Our findings were recently corroborated by Conte, Hey, and Moffatt (2010), who found a similar distribution of behavioral types for British subjects. Since the subject pools in all three of our experiments consisted of students, further research is needed to see whether the proportion of near rational EUT types changes significantly in a representative sample and whether the complexity of decision tasks greatly alters type assignment. If it can be ascertained that near rational actors constitute a nonnegligible proportion of the population, their behavior, depending on the nature of the strategic environment, may be decisive for aggregate outcomes. The existence of a robust share of near rational actors suggests using a mix of preference theories for modeling behavior rather than a single theory, which would yield systematically biased results. In our data, prospect theory adequately describes behavior of the majority of subjects, but the parameter estimates exhibit culture- as well as type-specific values. Researchers should take this evidence into account when constructing, estimating, and applying models of choice under risk.

REFERENCES

ANDERSEN, S., G. W. HARRISON, AND E. E. RUTSTRÖM (2006): "Choice Behavior, Asset Integration and Natural Reference Points," Working Paper 06-07, Department of Economics, College of Business Administration, University of Central Florida. [1378]
ATKINSON, A. (1981): "Likelihood Ratios, Posterior Odds and Information Criteria," Journal of Econometrics, 16, 15–20. [1387]
BIERNACKI, C., G. CELEUX, AND G. GOVAERT (1999): "An Improvement of the NEC Criterion for Assessing the Number of Clusters in a Mixture Model," Pattern Recognition Letters, 20, 267–272. [1387,1389]
(2000): "Assessing a Mixture Model for Clustering With the Integrated Completed Likelihood," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719–725. [1387,1388]
BINSWANGER, H. P. (1981): “Attitudes Toward Risk: Theoretical Implications of an Experiment in Rural India,” Economic Journal, 91, 867–890. [1403,1404] BRUHIN, A., H. FEHR-DUDA, AND T. F. EPPER (2007): “Risk and Rationality: Uncovering Heterogeneity in Probability Distortion,” Working Paper 0705, Socioeconomic Institute, University of Zurich. [1390] (2010): “Supplement to ‘Risk and Rationality: Uncovering Heterogeneity in Probability Discortion’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ ecta/Supmat/7139_tables.pdf; http://www.econometricsociety.org/ecta/Supmat/7139_data and programs.zip. [1385] CAMERER, C. F., AND E. FEHR (2006): “When Does Economic Man Dominate Social Behavior?” Science, 311, 47–52. [1377] CAMERON, A., AND P. TRIVEDI (2005): Microeconometrics. Methods and Applications. Cambridge: Cambridge University Press. [1387-1389] CELEUX, G., AND G. SOROMENHO (1996): “An Entropy Criterion for Assessing the Number of Clusters in a Mixture Model,” Journal of Classification, 13, 195–212. [1376,1385,1387] CONTE, A., J. D. HEY, AND P. G. MOFFATT (2010): “Mixture Models of Choice Under Risk,” Journal of Econometrics (forthcoming). [1378,1409] DEMPSTER, A., N. LAIRD, AND D. RUBIN (1977): “Maximum Likelihood From Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, Ser. B, 39, 1–38. [1385] DOHMEN, T., A. FALK, D. HUFFMAN, U. SUNDE, J. SCHUPP, AND G. G. WAGNER (2005): “Individual Risk Attitudes: New Evidence From a Large, Representative, Experimentally-Validated Survey,” Discussion Paper 1730, Institute for the Study of Labor (IZA), Bonn, Germany. [1375] ECKEL, C., C. JOHNSON, AND C. MONTMARQUETTE (2005): “Savings Decisions of the Working Poor: Short and Long-Term Horizons,” in Field Experiments in Econometrics. Research in Experimental Economics, Vol. 10, ed. by J. Carpenter, G. W. Harrison, and J. A. List. Greenwich, CT: JAI Press, 219–260. [1375] EFRON, B., AND R. J. TIBSHIRANI (1993): An Introduction to the Bootstrap. New York: Chapman & Hall. [1392] EL -GAMAL, M. A., AND D. M. GRETHER (1995): “Are People Bayesian? Uncovering Behavioral Strategies,” Journal of the American Statistical Association, 90, 1137–1145. [1376] FEHR, E., AND J. R. TYRAN (2005): “Individual Irrationality and Aggregate Outcomes,” Journal of Economic Perspectives, 19, 43–66. [1377] (2008): “Limited Rationality and Strategic Interaction: The Impact of the Strategic Environment on Nominal Inertia,” Econometrica, 76, 353–394. [1377] FEHR-DUDA, H., M. DE GENNARO, AND R. SCHUBERT (2006): “Gender, Financial Risk, and Probability Weights,” Theory and Decision, 60, 283–313. [1401,1408] FISCHBACHER, U. (2007): “z-Tree: Zurich Toolbox for Readymade Economic Experiments,” Experimental Economics, 10, 171–178. [1379] GEWEKE, J., AND R. MEESE (1981): “Estimating Regression Models of Finite but Unknown Order,” International Economic Review, 22, 55–70. [1387] GOLDSTEIN, W., AND H. EINHORN (1987): “Expression Theory and the Preference Reversal Phenomena,” Psychological Review, 94, 236–254. [1383] HALTIWANGER, J. C., AND M. WALDMAN (1985): “Rational Expectations and the Limits of Rationality: An Analysis of Heterogeneity,” American Economic Review, 75, 326–340. [1377] (1989): “Limited Rationality and Strategic Complements: The Implications for Macroeconomics,” Quarterly Journal of Economics, 104, 463–483. [1377] HARLESS, D. W., AND C. F. CAMERER (1994): “The Predictive Utility of Generalized Expected Utility Theories,” Econometrica, 62, 1251–1290. [1376] HARRISON, G. 
W., AND E. E. RUTSTRÖM (2009): “Representative Agents in Lottery Choice Experiments: One Wedding and a Decent Funeral,” Experimental Economics, 12, 133–158. [1378,1408,1409] HARRISON, G. W., S. J. HUMPHREY, AND A. VERSCHOOR (2010): “Choice Under Uncertainty: Evidence From Ethiopia, India and Uganda,” The Economic Journal, 120, 80–104. [1378]
HARRISON, G. W., M. I. LAU, AND E. RUTSTRÖM (2007): “Estimating Risk Attitudes in Denmark: A Field Experiment,” Scandinavian Journal of Economics, 109, 341–368. [1375] HARRISON, G. W., M. I. LAU, E. E. RUTSTRÖM, AND M. B. SULLIVAN (2005): “Eliciting Risk and Time Preferences Using Field Experiments: Some Methodological Issues,” in Field Experiments in Econometrics. Research in Experimental Economics, Vol. 10, ed. by J. Carpenter, G. W. Harrison, and J. A. List. Greenwich, CT: JAI Press, 125–218. [1375] HARRISON, G. W., J. A. LIST, AND C. TOWE (2007): “Naturally Occurring Preferences and Exogenous Laboratory Experiments: A Case Study of Risk Aversion,” Econometrica, 75, 433–458. [1383,1404] HEINEMANN, F. (2008): “Measuring Risk Aversion and the Wealth Effect,” in Risk Aversion in Experiments. Research in Experimental Economics, Vol. 12, ed. by J. Cox and G. W. Harrison. Bingley: Emerald Group Publishing Limited, 293–313. [1403,1404] HEY, J. D., AND C. ORME (1994): “Investigating Generalizations of Expected Utility Theory Using Experimental Data,” Econometrica, 62, 1291–1326. [1375,1383] HOUSER, D., AND J. WINTER (2004): “How Do Behavioral Assumptions Affect Structural Inference?” Journal of Business and Economic Statistics, 22, 64–79. [1376] HOUSER, D., M. KEANE, AND K. MCCABE (2004): “Behavior in a Dynamic Decision Problem: An Analysis of Experimental Evidence Using a Bayesian Type Classification Algorithm,” Econometrica, 72, 781–822. [1376] HSEE, C. K., AND E. U. WEBER (1999): “Cross-National Differences in Risk Preferences and Lay Predictions,” Journal of Behavioral Decision Making, 12, 165–179. [1408] KACHELMEIER, S. J., AND M. SHEHATA (1992): “Examining Risk Preferences Under High Monetary Incentives: Experimental Evidence From the People’s Republic of China,” American Economic Review, 82, 1120–1141. [1377,1398,1408] KAHNEMAN, D., AND A. TVERSKY (1979): “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47, 263–292. [1382] KÖBBERLING, V., AND P. WAKKER (2005): “An Index of Loss Aversion,” Journal of Economic Theory, 122, 119–131. [1382] LATTIMORE, P. K., J. R. BAKER, AND A. D. WITTE (1992): “The Influence of Probability on Risky Choice,” Journal of Economic Behavior and Organization, 17, 377–400. [1375,1383] MCLACHLAN, G., AND D. PEEL (2000): Finite Mixture Models. Wiley Series in Probabilities and Statistics. New York: Wiley. [1388,1389,1392] PRELEC, D. (1998): “The Probability Weighting Function,” Econometrica, 66, 497–527. [1383] QUIGGIN, J. (1982): “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3, 323–343. [1383] R DEVELOPMENT CORE TEAM (2006): R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. [1385] RABIN, M. (2000): “Risk Aversion and Expected-Utility Theory: A Calibration Theorem,” Econometrica, 68, 1281–1292. [1377] STAHL, D. O., AND P. W. WILSON (1995): “On Players’ Models of Other Players: Theory and Experimental Evidence,” Games and Economic Behavior, 10, 218–254. [1376] STARMER, C. (2000): “Developments in Non-Expected Utility Theory: The Hunt for a Descriptive Theory of Choice Under Risk,” Journal of Economic Literature, 38, 332–382. [1376] STOTT, H. P. (2006): “Cumulative Prospect Theory’s Functional Menagerie,” Journal of Risk and Uncertainty, 32, 101–130. [1382] TVERSKY, A., AND D. KAHNEMAN (1992): “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, 5, 297–323. [1382,1383] TVERSKY, A., AND P. 
WAKKER (1995): “Risk Attitudes and Decision Weights,” Econometrica, 63, 1255–1280. [1383] WAKKER, P. P. (2008): “Explaining the Characteristics of the Power (CRRA) Utility Family,” Health Economics, 17, 1329–1344. [1403] WEDEL, M. (2002): “Concomitant Variables in Finite Mixture Models,” Statistica Neerlandica, 56, 362–375. [1387]
WILCOX, N. (2006): “Theories of Learning in Games and Heterogeneity Bias,” Econometrica, 74, 1271–1292. [1377] (2010): “Stochastically More Risk Averse: A Contextual Theory of Stochastic Discrete Choice Under Risk,” Journal of Econometrics (forthcoming). [1383] WU, G., J. ZHANG, AND R. GONZALEZ (2004): “Decision Under Risk,” in The Blackwell Handbook of Judgment and Decision Making, ed. by D. Koehler and N. Harvey. Oxford: Oxford University Press, 399–423. [1383]
Institute for Empirical Research in Economics, University of Zurich, Bluemlisalpstrasse 10, 8006 Zurich, Switzerland; [email protected], Chair of Economics, ETH Zurich, Weinbergstrasse 35, 8092 Zurich, Switzerland; [email protected], and Chair of Economics, ETH Zurich, Weinbergstrasse 35, 8092 Zurich, Switzerland; [email protected]. Manuscript received May, 2007; final revision received February, 2010.
Econometrica, Vol. 78, No. 4 (July, 2010), 1413–1434
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD: EXPLORING WHETHER PROFESSIONALS PLAY MINIMAX IN LABORATORY EXPERIMENTS

BY STEVEN D. LEVITT, JOHN A. LIST, AND DAVID H. REILEY1

The minimax argument represents game theory in its most elegant form: simple but with stark predictions. Although some of these predictions have been met with reasonable success in the field, experimental data have generally not provided results close to the theoretical predictions. In a striking study, Palacios-Huerta and Volij (2008) presented evidence that potentially resolves this puzzle: both amateur and professional soccer players play nearly exact minimax strategies in laboratory experiments. In this paper, we establish important bounds on these results by examining the behavior of four distinct subject pools: college students, bridge professionals, world-class poker players, who have vast experience with high-stakes randomization in card games, and American professional soccer players. In contrast to Palacios-Huerta and Volij's results, we find little evidence that real-world experience transfers to the lab in these games—indeed, similar to previous experimental results, all four subject pools provide choices that are generally not close to minimax predictions. We use two additional pieces of evidence to explore why professionals do not perform well in the lab: (i) complementary experimental treatments that pit professionals against preprogrammed computers and (ii) post-experiment questionnaires. The most likely explanation is that these professionals are unable to transfer their skills at randomization from the familiar context of the field to the unfamiliar context of the lab.

KEYWORDS: Mixed strategy, minimax, laboratory experiments.
1. INTRODUCTION

JOHN VON NEUMANN'S (1928) minimax theorem preceded Nash equilibrium as the first general framework for understanding play in strategic situations. The underlying logic of the minimax argument has subsequently been applied broadly—from models of firm, animal, and plant competition to the optimal actions of nations at war. In zero-sum games with unique mixed-strategy equilibria, minimax logic has an intuitive appeal: one needs to randomize strategies to prevent exploitation by one's opponent. A nagging issue is that subjects in laboratory studies typically do not play near the predictions of minimax (see, e.g., Lieberman (1960, 1962), Brayer (1964), Messick (1967), Fox (1972), Brown and Rosenthal (1990), Rosenthal, Shachat, and Walker (2003)). Perhaps this finding should not have come as a surprise given that experimental

1 We would like to thank the editor and three anonymous referees for valuable comments. Ignacio Palacios-Huerta and Jesse Shapiro provided helpful conversations. Phil Gordon was instrumental in our efforts to recruit world-class poker players. Omar Al-Ubaydli, David Caballero, Dwyer Gunn, Bill Hessert, Ryan Johnson, Min Lee, Randall Lewis, Andrew Sherman, Alec Smith, Brittany Smith, Dean Strachan, and, especially, Lisandra Rickards provided fantastic research assistance.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7405
subjects are unable to produce a random series of responses, even when directed to do so (Budescu and Rapoport (1992)). O’Neill (1991, p. 506) aptly summarized these results by noting that “by the mid-1960s, non-cooperative theory had received so little support that laboratory tests ceased almost completely.” Recent evidence from field data has provided a renewed sense of optimism, however. Walker and Wooders (2001) analyzed serve choices in Grand Slam tennis matches and reported similar win rates across various strategies, a result consistent with the minimax equilibrium prediction. Yet, they found that the players switch from one strategy to another too often, a result at odds with minimax theory but consonant with laboratory experimental evidence. Hsu, Huang, and Tang (2007) analyzed a broader tennis data set, finding results even more consistent with theory: not only are win rates similar across strategies, but individual play is serially independent. Complementing these data are results from Chiappori, Levitt, and Groseclose (2002) and Palacios-Huerta (2003), who examined penalty kicks in professional soccer games. Both studies report that winning probabilities are identical across strategies and that choices are serially independent. Combined, these two strands of literature present an important puzzle: why do controlled laboratory tests of minimax systematically provide data far from minimax predictions, whereas less controlled tests using field data appear to confirm theory? Perhaps the tests using field data lack statistical power to reject minimax play. Alternatively, maybe the laboratory has not provided the appropriate environment—for example, crucial experience and context—for subjects to learn the gaming rules to produce equilibrium play. Palacios-Huerta and Volij (2008) provided a third possible resolution to this puzzle: typical subjects in laboratory experiments do not have uniformly high skill at playing games with mixed-strategy equilibria, but the subset of individuals who excel at tennis or penalty kicks do have uniformly high skill and they are able to transfer their skills from one setting to another. In support of this conjecture, using two standard zero-sum laboratory games with both amateur and professional soccer players as experimental subjects, they reported striking evidence that both subject pools use exact minimax strategies. To the best of our knowledge, this study is the first laboratory experiment to show that subjects can both (i) play strategies in the predicted equilibrium proportions and (ii) generate a sequence of choices that are serially independent. The authors attribute their result to skills learned in soccer that are subsequently transferred to laboratory card games. In an effort to understand when field behavior does and does not translate into lab performance, this paper begins by summarizing data from three distinct subject pools—undergraduate students, professional bridge players, and professional poker players—playing the same two zero-sum laboratory games as in Palacios-Huerta and Volij (2008). Members of both of our professional subject pools have extensive experience thinking analytically in card games.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1415
For our purposes, however, there is one crucial difference between bridge and poker players: there is virtually no role for mixed strategies in bridge. In contrast, randomization is an integral component of skillful poker, as noted, for example, by Friedman (1971, B764): “it is quite clear to those who have played much poker that some sort of mixed strategy must be used.”2 Empirical results from all three subject pools parallel those previously reported in the literature using laboratory experiments with student subjects: play is shown to deviate considerably from the theoretical predictions (see, e.g., Lieberman (1960, 1962), Brayer (1964), Messick (1967), Fox (1972), Brown and Rosenthal (1990), Rosenthal, Shachat, and Walker (2003), Palacios-Huerta and Volij (2008)). Importantly, professional poker players play no closer to minimax than students and bridge professionals, and far from minimax predictions. This finding holds when the professionals compete against other players, as well as when they are informed that they are playing against a computer preprogrammed to exploit individual deviations from optimal play. These results induced us to collect data from our own sample of professional soccer players drawn from three Major League Soccer (MLS) teams: the Los Angeles Galaxy, Chivas USA, and Real Salt Lake. Again, the empirical results mirror those found with other subject pools, both in our own experiments and in the previous literature: play is much farther from equilibrium than observed in Palacios-Huerta and Volij (2008). In light of the large body of psychological evidence that reports limited transfer of learning across tasks (Loewenstein (1999)), we suspect that our failure to find that play in the lab closely approximates minimax predictions is due to the fact that the two zero-sum games themselves are not ideal representations of what the subjects actually face in the field, or at least the players are not recognizing them as such. To dig a level deeper into this hypothesis, we conducted a post-experimental survey inquiring how the professional soccer players interpreted the experimental game. Consonant with the data patterns observed, not one soccer player who participated in the experiment spontaneously responded that the experiment reminded him of penalty kicks. Even when specifically prompted with a question about penalty kicks, many of the subjects saw little connection between the lab game and penalty kicks. 2. EXPERIMENTAL DESIGN We chose to follow Palacios-Huerta and Volij (2008) in using two different matrix games, which we described to subjects as Hide and Seek and Four-Card 2 Hirschberg, Levitt, and List (2008) provided further anecdotal evidence of the importance of randomization and empirical documentation of poker players randomizing. Using thousands of observations, they found that online poker players equalize payoffs across strategies when mixing. That paper also shows that mixing is an extremely common occurrence in the game examined (heads-up limit hold’em).
1416
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
FIGURE 1A.—Payoff matrix for the 2 × 2 game. The pursuer’s payoff is given first, followed by the evader’s payoff. If both cards match, a die roll determines who wins the round. Cells 1 and 4 show the payoffs for low values of the die. High values yield the opposite payoff.
Barry. Hide and Seek is a 2 × 2 matrix game taken from Rosenthal, Shachat, and Walker (2003); Four-Card Barry is a 4 × 4 matrix game developed by O’Neill (1987).3 As Figures 1a and 1b demonstrate, both games are two-player zero-sum games with non-uniform equilibrium mixing proportions. 2.1. Student Subjects To provide a baseline of comparison, our examination begins with an exploration of undergraduate student play in these two games. We included a total of 46 students from the University of Arizona—22 students participated as subjects in two sessions of Hide and Seek and 24 participated as subjects in three sessions of Four-Card Barry. No subject participated in more than one session. Following Rosenthal, Shachat, and Walker (2003), in the Hide and Seek treatment we had each pair of participants sit opposite each other, with a game conductor sitting at the side of the table. The conductor gave each participant the instructions for Hide and Seek (see the Supplementary Appendix
FIGURE 1B.—Payoff matrix for the 4 × 4 game. The row player’s payoff is given first, followed by the column player’s payoff. Row wins if two non-diamond cards are played and the suits match, or if a diamond card and a non-diamond card are played. Column wins if two non-diamond cards are played and the suits do not match, or if two diamond cards are played. 3 The O’Neill experiment was seminal in the sense that it moved experimental tests of minimax theory to an environment that required fewer assumptions on players’ utility functions. Prior to O’Neill (1987), previous experimenters assumed that utility depended only on the players’ own payoff and furthermore was a linear function of that payoff. By restricting the game to two outcomes—win or lose the same dollar amount—O’Neill was able to construct a matrix with the property that a players’ minimax strategy is invariant over reasonable utility functions. Mark Walker and John Wooders suggested the name “Four-Card Barry” to us.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1417
(Levitt, List, and Reiley (2010))) and read them aloud. Then the participants played several practice rounds until each was sure she was ready to play for real money. Participants made their decisions by playing either a red card or a black card,4 with both players’ decisions revealed simultaneously. As specified in the instructions, the game conductor sometimes rolled a six-sided die to determine the winner of a round. Participants played a total of 150 rounds, switching roles after 75 rounds. Each round, one player won a payoff of $0.25. Each session of Hide and Seek lasted less than an hour. The payoff matrix in our version of Hide and Seek follows that of Rosenthal, Shachat, and Walker (2003), and therefore differs slightly from the 2 × 2 penalty-kick game studied by Palacios-Huerta and Volij (2008).5 As in Hide and Seek, we implemented Four-Card Barry in a manner that closely followed the literature. Each pair of participants for Four-Card Barry sat opposite each other, with a conductor sitting at the side of the table. The conductor gave each participant the instructions for Four-Card Barry (see the Supplementary Appendix) and read them aloud. The participants then played as many practice rounds as they wished until both were ready to play for real money. Participants made their decisions by playing one of their four cards, with both players’ decisions revealed simultaneously. Participants played a total of 150 rounds, switching roles after the first 75 rounds. Each round produced a winner who received a $0.25 payoff; each session of Four-Card Barry typically lasted less than an hour. This variant of Four-Card Barry is identical to Palacios-Huerta and Volij (2008), with two minor exceptions. First, since we intended to play this game with professional card players, we chose to use regular playing cards with all four of our subject pools. That is, instead of the colored cards (red, brown, purple, green) used by Palacios-Huerta and Volij (2008), we gave each player one card of each suit (spade, heart, club, diamond).6 Second, to provide additional insights into the ability of subjects to transfer knowledge across tasks, we had the players switch roles halfway through the 150 rounds of the game. Given that 4 We used regular playing cards, typically with a ten of a red suit and a ten of a black suit for each player. 5 Palacios-Huerta and Volij (2008) created a payoff matrix based on the empirical success percentages from professional penalty kicks. Since we intended to play our game with multiple subject pools, we elected instead to implement a game that was easy for the subjects to understand, but was asymmetric and had equilibrium mixing proportions different from 50 : 50 for each player. Consequently, the equilibrium mixing proportions differ slightly from those of Palacios-Huerta and Volij (2008). Palacios-Huerta and Volij (2008) have equilibrium mixing proportions of approximately 36 : 64 for player 1 and 55 : 45 for player 2; Hide and Seek has equilibrium mixing proportions of 67 : 33 for each player. If anything, this change provides theory with a better chance to succeed, as a two-thirds mixing proportion might be cognitively easier to execute than a more complicated proportion like 36 : 64 or 55 : 45. 6 Each player received all four cards of the same rank, either four nines or four tens. We deliberately avoided using aces to avoid having the ace be focal, as most decks of playing cards make the ace of spades much larger than the aces of the other suits.
1418
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
previous results show little evidence that play changes over periods (Rosenthal, Shachat, and Walker (2003) and Palacios-Huerta and Volij (2008)), this change is likely innocuous. 2.2. Professional Subjects We used three types of professionals as subjects in our artefactual field experiment (Harrison and List (2004)).7 The first is professional poker players. Our poker treatments were conducted in 2006 at the World Series of Poker in Las Vegas, NV. There were 130 participants: 44 playing Hide and Seek, 52 playing Four-Card Barry, and 34 who played against a preprogrammed computer (see below). Among our sample of players, the self-reported average number of hours spent per week playing poker is 25. Over 87 percent of the players reported they had made money playing poker in the previous year, with average annual earnings of $120,111. The sample included 11 individuals who had won either a World Series of Poker Bracelet or a World Poker Tour event. Recruiting was done through the distribution of flyers and face-to-face solicitation at the World Series of Poker venue (the Rio Hotel). All treatments were carried out in our hotel suites at the Rio Hotel, which hosted the World Series of Poker. Subjects were paid $1 per successful play.8 The experiment lasted, in most cases, no longer than one hour. Our second professional subject pool, like Palacios-Huerta and Volij (2008), is professional soccer players. We obtained permission to run experiments with three MLS teams: the Los Angeles Galaxy, Chivas USA, and Real Salt Lake. Each of these clubs granted us access to their locker room for two hours or less. Given the time constraint, we limited our soccer player treatments to the 4×4 O’Neill game, which yielded the most striking results reported in PalaciosHuerta and Volij (2008). We played Four-Card Barry with a total of 32 players from the three MLS teams, typically with four or five game conductors simultaneously administering the game to different pairs. These 32 players included 30 roster players, plus one team trainer (who had previously played intercollegiate soccer) and one youth player (a goalkeeper) who trained with the team, but was not yet on the official roster. Five of the 32 players were goalkeepers, and following Palacios-Huerta and Volij (2008) we made sure to have all five goalkeepers play against non-goalkeepers. Again, we 7 For space purposes, we suppress further discussion of our world-class bridge players. As aforementioned, bridge players offer an interesting counterpoint to poker players because, unlike poker, there is virtually no role for mixed strategies in bridge. Bridge players, like our other subject pools described below, deviated systematically from minimax play. Full results on bridge players are available in the Supplementary Appendix and in an earlier version of this paper (Levitt, List, and Reiley (2007)). 8 In this regard, we followed Fehr and List (2004) and Haigh and List (2005) in using larger payoffs for the professionals than the students. This was done to provide more comparable payoffs on an opportunity–cost scale and maintain the professionals’ attention during the games.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1419
followed the identical protocol to that discussed above with poker players, except that the treatments were carried out in the locker room of the professional soccer clubs. 2.3. Human versus Computer Treatments With the poker players, we complemented the human–human treatments with two computer treatments.9 The first of these involved a computer programmed to play minimax for the first 15 periods and thereafter mixed between minimax and the action with the highest predicted payoff based on a logit/multinomial logit model that used as right-hand-side variables the previous moves by both the human and the computer.10 As more data accumulated, the program gave increasing weight to the model’s predictions.11 Given the nature of what we desired to learn from this exercise, the instructions to this game (included in the Supplementary Appendix) told the player that “ we have programmed the computer to play the theoretically correct strategy in this game. In addition, any deviations that your play has from this correct style of play will be taken advantage of by the computer.” These instructions reflect the fact that the computer plays minimax initially, but responds to human play that is far from minimax by attempting to exploit it. As a shorthand, we refer to this treatment as the “optimal computer.” Our second computer treatment involved programming the computer to play sub-optimally. In particular, we chose a simple algorithm whereby the computer randomly chose between the available actions with equal probabilities (whereas the optimal mix was 67 : 33 in the 2 × 2 game, and 40 : 20 : 20 : 20 9 This line of research originated with Messick (1967), who conducted a three choice, twoplayer repeated experiment where human subjects played against computer algorithms. See also Fox (1972), Coricelli (2004), Shachat and Swarthout (2004), and Spiliopoulos (2007). 10 It is important to build-in this learning component because if the computer played minimax equilibrium probabilities regardless of the response of the human opponent, the human should be indifferent between the choice of actions. By programming the computer to exploit play that is off the equilibrium path, we provided subjects with an incentive to play minimax proportions to avoid exploitation. 11 To be more specific, this involved a two step procedure: first, we estimated a predicted choice for the human player via a logit regression model (for Hide and Seek) or a multinomial logit model (Four-Card Barry) in which the only right-hand-side variables were dummies for (up to) three previous actions by the human and the computer, and calculated a best response to that predicted choice. Second, we averaged that best response together with the theoretical equilibrium ratios (1/3, 2/3 in Hide and Seek; 2/5, 1/5, 1/5, and 1/5 in Four-Card Barry) to provide the computer’s mixed-strategy proportions for the next round. We increased the weight given to the logit prediction over time as those predictions are based on more data. In Hide and Seek we used the simple average of the predicted logit best-response strategy and the theoretical equilibrium strategy for periods 16–35; in periods after 35, we used a weight of 3/4 on our predicted best response and 1/4 on the equilibrium play. Four-Card Barry was identical except that we used a 1/4 weight on our predicted best response and 3/4 weight on the equilibrium ratio for periods 16–25.
1420
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
in the 4 × 4 game). This was a static strategy, and no computer learning component was built into this treatment. We refer to this treatment as the naive computer program. The instructions for this treatment differed from those of the first computer treatment only by the omission of the sentence saying that the computer had been programmed to play optimally. In total, we had 34 participants in the computer treatments. To maximize sample sizes, we allocated the participants so that each would play both games. But we did not vary the computer algorithm condition: if a subject was randomly placed in the optimal condition, for example, then she was in that condition for both games. Additionally, to control for order effects, the computer program randomly decided whether Hide and Seek or Four-Card Barry would be played first. In aggregate, 21 subjects played both Hide and Seek or FourCard Barry against an optimally programmed computer, and 13 played both games against a computer programmed to play an exploitable strategy. 3. EMPIRICAL RESULTS As noted in the literature, if subjects are playing the unique minimax equilibrium, then the data generated should conform to three key predictions: (i) for all players combined, the aggregate marginal and joint distributions of actions should correspond to that predicted by equilibrium play; (ii) for each particular pair of players, the marginal and joint distribution of actions should correspond to that predicted by equilibrium play; (iii) actions should be serially uncorrelated.12 In what follows, we report our results parsed by subject pool. 3.1. Hide-and-Seek Results Table I summarizes our findings for Hide and Seek. Each column corresponds to a different subject pool. The first two columns provide our data on college students and poker players. For purposes of comparison, we also report the soccer player results obtained by Palacios-Huertas and Volij (2008) in their 2 × 2 game. The top two rows of the uppermost panel in Table I show 12
Palacios-Huerta and Volij also tested a fourth hypothesis that expected win rates across strategies should be equal to each other and to the predicted equilibrium win rate. This test is important for field studies such as Walker and Wooders (2001) and Palacios-Huerta (2003), where the true payoff matrix is unknown, but is unnecessary in this setting since it is superfluous once hypothesis (2) is tested, in that it follows mechanically from (2). Further, randomization to produce the winner, particularly in the 2 × 2 game, introduces more noise and lower power for a test of win rates versus a test of choice frequencies. We therefore conserve space and place these results in the Supplementary Appendix, but we should note that for over 60 percent of both students and poker players, we can reject equality of success rates at the p < 005 level, and many of the rejections show behavior far removed from the minimax prediction. In stark contrast, Palacios-Huerta and Volij (2008) cannot reject for a single professional soccer player.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1421
TABLE I SUMMARY OF RESULTS ACROSS SUBJECT POOLS IN THE 2 × 2 GAMEa Source: Test:
# of pairs # pairs of roles
Levitt, List, and Reiley
Palacios-Huerta and Volij
College Students
Poker Players
Soccer Pros
Soccer College
11 22
22 44
40 40
40 40
<0.001 0.374 0.001
<0.001 <0.001 N/A
68% 52% 75% 41%
5% 5% 0% 0%
5% 10% 5% N/A
18% 14% 5%
5% 0% 5%
8% N/A N/A
I. Minimax Play at Aggregate Level Chi-square test for minimax play: pursuer (or row player) <0.001 <0.001 evader (or column player) <0.001 <0.001 joint play <0.001 <0.001 II. Minimax Play at Individual Level Rejections at 5 percent: pursuer evader joint play neither player Rejections at 5 percent: for too few runs for too many runs
59% 55% 91% 27% III. Runs Tests 23% 10% 14%
a Table reports results for the 2 × 2 matrix game based on the game used by Rosenthal, Shachat, and Walker (2003). The columns correspond to the different subject pools tested, while the rows report results for each test. The last two columns report results for a similar experiment carried out by Palacios-Huerta and Volij (2008). Panel I shows p-values from Pearson’s chi-square test for goodness of fit of aggregate frequencies to minimax predictions. p-values for the marginal frequencies of the pursuer and evader are shown in the first two rows, while the third row shows p-values for combinations of plays by both players. The test uses one degree of freedom for the marginal distribution of play and three for the joint distribution. Panel II shows the percentage of individuals (or pairs) that we reject at the 5% level for this same chi-square test. Panel III presents the percentage of players for whom we can reject the null hypothesis of no serial correlation in actions, based on the runs test of Gibbons and Chakraborti (2003) which has the following distribution: ⎧ i i ni + n i
nB − 1 nR − 1 ⎪ B R ⎪ 2 if r is even ⎪ ⎪ ⎪ (r/2) − 1 (r/2) − 1 niB ⎪ ⎨ i i i i i ni + ni
f (r|nB nR ) = i nB − 1 nR − 1 nB − 1 nR − 1 ⎪ B R ⎪ ⎪ + ⎪ ⎪ (r − 1)/2 (r − 3)/2 (r − 3)/2 (r − 1)/2 ⎪ niB ⎩ if r is odd
where r is the number of runs, and niB and niR are the number of black and red choices. The serial independence hypothesis will be rejected at the 5 percent level if there are too few or too many runs, that is if F(r|niB niR ) < 0025 or if F(r − 1|niB niR ) > 0975, where F(r|niB niR ) = rk=1 f (k|niB niR ).
sample sizes. The lower panels provide tests of the relevant hypotheses described above. Readers interested in greater detail regarding these findings are directed to the tables in the Supplementary Appendix. Panel I in Table I reports p-values for rejecting the null hypothesis that the aggregate frequencies match those of minimax play. The first and second rows show results corresponding to the marginal distributions for those playing pur-
1422
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
suer (or seeker) and evader (or hider).13 The third row reports results for the joint distribution of play, showing whether combinations of actions (for example, black–black plays) match minimax predictions. For both of our subject groups, all three of these hypotheses are rejected at the p < 001 level, and, importantly, the magnitude of the aggregate deviations from equilibrium is substantial. Theory predicts that both players should adopt a 67 : 33 red to black ratio. Students played red 61 percent of the time; poker players only 56 percent. Note, however, that this is also one of the few tests on which the soccer players in Palacios-Huerta and Volij (2008) strayed somewhat from minimax, as shown in columns 3 and 4. Panel II in Table I continues to focus on action frequencies, but differs from the top panel in reporting results for individual pairs of players, rather than summarizing the aggregate data. Instead of reporting p-values, in this case we show the fraction of individual players for whom we can reject the null hypothesis that the player’s actions match minimax play when acting as the pursuer or evader at the p < 005 level.14 The third row in panel II reports a similar statistic, but for the joint play by each pair. In contrast to panel I, large numbers represent violations of minimax in this part of the table.15 We also provide the distribution of individual choice frequencies in Figures 2a and 2b, along with the theoretical distribution that these choice frequencies should have under minimax. For both students and poker players, we find considerable departures from minimax play. Whether in the role of pursuer or evader, more than half of the subjects engage in play that is inconsistent with minimax behavior at the p < 005 significance level.16 We are able to reject the hypothesis that both players are jointly following minimax in at least 75 percent of the pairs. In roughly one-third of the pairs, neither of the players’ actions is consistent with minimax. Most of the deviations take the form of playing red too infrequently: nearly one-third of the students play red less than half of the time (minimax predicts red 67 percent of the time); approximately one-fourth of the poker players chose red less than half of the time. Note that our findings in panel II differ starkly from Palacios-Huerta and Volij (2008), as revealed in columns 3 and 4. In their sample, rejections were no more frequent than chance would predict under the null. 13 These p-values are obtained from Pearson’s chi-square test for goodness of fit, using 1 degree of freedom for the test of marginal frequencies and 3 degrees of freedom for the test of joint frequencies. 14 As before, we use a Pearson chi-square test with 1 degree of freedom for the marginal frequencies and 3 degrees of freedom for the joint frequencies. 15 We also examined the frequency with which neither player follows minimax, which represents a stronger test of the theory since if one player is following minimax, both players receive the equilibrium payoffs, regardless of the strategy the other follows. The patterns generally follow those discussed above. 16 For purposes of comparison, when we program computers to naively play a 50 : 50 mix, we are able to reject the null at the 0.05 level in 65 percent of the cases.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1423
FIGURE 2A.—Distribution of choice frequencies in Hide and Seek (75 rounds).
FIGURE 2B.—Distribution of choice frequencies in PH&V 2 × 2 game (Soccer Pros, 150 rounds).
1424
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
Finally, panel III of Table I presents the percentage of players for whom we can reject the null hypothesis of no serial correlation in actions, based on the runs test of Gibbons and Chakraborti (2003). The students and poker players fare much better on this test than the other tests, with “only” about 20 percent of the players exhibiting play that significantly deviates from the noserial-correlation null, although this is still worse than Palacios-Huerta and Volij’s (2008) soccer players. When Palacios-Huerta and Volij do reject, particularly with college soccer players, it tends to be for too many runs, which means too frequent switching of strategies.17 By contrast, our rejections of serial independence are at least as likely to be for too few runs as for too many runs, indicating that players frequently fail to switch often enough.18 In sum, the results we obtain using either undergraduate students or professional card players parallel those previously reported in the literature (see, e.g., Brown and Rosenthal (1990), Rosenthal, Shachat, and Walker (2003)). Consistent with Rosenthal, Shachat, and Walker (2003), these results hold whether we examine early periods of play or later periods, suggesting that play is not converging to equilibrium.19 3.2. Four-Card Barry Results Table II presents results from Four-Card Barry. The structure is similar to Table I, except that Table II contains one additional column corresponding to our sample of professional soccer players. Thus, the first three columns of Table II contain our data, and columns 4 and 5 report the results of PalaciosHuerta and Volij (2008). Panel I presents results on aggregate frequencies.20 For college students and poker players, the aggregate proportions are closer to theory in Four-Card Barry than in the 2 × 2 game, although we continue to reject at the p < 001 17
This is consistent with Walker and Wooders’ (2001) professional tennis study. The serial correlation performance is perhaps better than one would expect based on prior individual-level studies in psychology that lead one to conclude that “producing a random series of responses is difficult, if not impossible task to human [subjects], even when they are explicitly instructed” (Wagenaar (1971, p. 78)). However, it is in line with the intuition that subjects competing in dyadic interactions are more likely to yield serially uncorrelated play than in parallel individual choice settings (Budescu and Rapoport (1992)). 19 See the Supplementary Appendix for the results split by the first and second halves of the treatment. Across all of our 2 × 2 treatments, the results for panels I–III are similar for the two halves of play. In all cases, however, the frequency of rejection for serially correlated play is lower as subjects gain experience with the game. The Supplementary Appendix also highlights the economic significance of these departures. 20 The chi-squared tests for the 4 × 4 games use 3 degrees of freedom for the marginal distributions and 15 degrees of freedom for the joint distributions in panels I and II. In panel III, play is broken down into diamond versus non-diamond plays, and the analysis proceeds as in the 2 × 2 game. 18
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1425
TABLE II SUMMARY OF RESULTS ACROSS SUBJECT POOLS IN THE 4 × 4 GAMEa Source:
Levitt, List, and Reiley
Palacios-Huerta and Volij
Test: College Students Poker Players Soccer Pros Soccer Pros Soccer College
# of pairs # pairs of roles
12 24
26 52
16 32
I. Minimax Play at Aggregate Level Chi-square test for minimax play: row player 0.320 0.253 0.001 column player 0.008 <0.001 <0.001 joint play 0.105 0.008 <0.001
40 40
40 40
0.956 0.932 0.910
0.956 0.932 N/A
II. Minimax Play at Individual Level Rejections at 5 percent: row player column player joint play
33% 46% 38%
27% 35% 31%
28% 16% 31%
5% 5% 10%
10% 10% 5%
Rejections at 5 percent: for too few runs for too many runs
III. Runs Tests 38% 35% 4% 12% 38% 23%
16% 6% 9%
5% 0% 5%
8% N/A N/A
a Table reports results for the 4 × 4 matrix game based on the game developed by O’Neill (1987). The columns correspond to the different subject pools tested, while the rows report results for each test. The last two columns report results for a similar experiment carried out by Palacios-Huerta and Volij (2008). Panel I shows p-values from Pearson’s chi-square test for goodness of fit of aggregate frequencies to minimax predictions. p-values for the marginal frequencies of the row and column players are shown in the first two rows, while the third row shows p-values for combinations of plays by both players. The test uses three degrees of freedom for the marginal distribution of play and fifteen for the joint distribution. Panel II shows the percentage of individuals (or pairs) that we reject at the 5% level for this same chi-square test. For panel III, play is divided into two—diamond plays and non-diamond plays— before being analyzed. Panel III presents the percentage of players for whom we can reject the null hypothesis of no serial correlation in actions, based on the runs test of Gibbons and Chakraborti (2003) which has the following distribution: ⎧ i i ni + n i
nB − 1 nR − 1 ⎪ B R ⎪ if r is even ⎪2 ⎪ ⎪ (r/2) − 1 (r/2) − 1 niB ⎪ ⎨ i i i i ni + ni
i f (r|nB nR ) = i nB − 1 nR − 1 nB − 1 ⎪ nR − 1 B R ⎪ ⎪ + ⎪ ⎪ (r − 3)/2 (r − 1)/2 (r − 3)/2 (r − 1)/2 ⎪ niB ⎩ if r is odd
where r is the number of runs, and niB and niR are the number of black and red choices. The serial independence hypothesis will be rejected at the 5 percent level if there are too few or too many runs, that is if F(r|niB niR ) < 0025 or if F(r − 1|niB niR ) > 0975, where F(r|niB niR ) = rk=1 f (k|niB niR ).
level that the column players have the minimax mixing proportions. The substantive magnitudes of these deviations, however, are small: in all cases, the aggregate proportions are within a few percentage points of the predicted values. Our professional soccer players provided the largest deviations we found from theory in the aggregate data, where row players played 43 percent diamonds and 17 percent clubs (instead of 40 percent and 20 percent), while column play-
1426
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
ers played 23 percent clubs and 17 percent hearts (instead of 20 percent each). These deviations are reflected in the especially low p-values for soccer players. We find it surprising that our soccer players deviate more from minimax than do students or poker players, given that the aggregate frequencies of soccer players—even intramural college soccer players—in Palacios-Huerta and Volij (2008) so closely matched the predictions of theory. Indeed, their data are stark in that only one time in 10 would such a result be obtained by chance, even if every player was following the minimax strategy.21 Panel II of Table II paints a similar picture when analyzing mixing proportions at the individual level. Across our various samples, we are able to reject at p < 005 the null that players are following minimax in roughly 20–45 percent of the cases, compared to 5 percent rejections for the soccer players in Palacios-Huerta and Volij (2008).22 Figures 3a and 3b complement these results by providing an ocular depiction of the distributions of individual choice frequencies, compared with the minimax binomial distribution. On the runs test reported in Panel III of Table II, our sample of soccer players perform reasonably well, but not quite as well as the players in PalaciosHuerta and Volij (2008); college students do quite poorly and poker players
FIGURE 3A.—Distribution of choice frequencies in Four-Card Barry (75 rounds). 21
Theory predicts mixing proportions of 40 : 20 : 20 : 20. In Palacios-Huerta and Volij (2008), the observed aggregate frequencies among professionals were 398 : 200 : 198 : 204. For more on this, see Wooders (2008). 22 In the worst cases, individuals are playing diamond over 60 percent of the time (minimax predicts 40 percent), and in one case a soccer player did not play heart even a single time.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1427
FIGURE 3B.—Distribution of choice frequencies in PH&V Four-Card Barry (Soccer Pros, 200 rounds).
are in between. In this case, rejections in our data are typically for too many runs (switching too infrequently), consonant with Palacios-Huerta and Volij’s rejections. Overall, as was the case with Hide and Seek, the data suggest that our subjects provide choices that are not as close to minimax predictions as found in Palacios-Huerta and Volij (2008). These results hold whether we examine early or late periods of play, as shown in the Supplementary Appendix. It appears that among our professionals, what happens in the field stays in the field, establishing important bounds on the generality of the results of Palacios-Huerta and Volij (2008). Thus far, we have largely focused on statistical testing, leaving the question of proximity to theory on the sidelines. As can be gleaned from Tables I and II, in aggregate minimax predictions perform relatively well when judged against other social science theories. To provide further descriptive evidence at the individual level, we construct Table III, which reports proportions of players who deviated from optimal play. The table classifies players from each of our subject pools (including bridge players) into “deviations from optimality” bins. For instance, a player who played 26 percent, 39 percent, 19 percent, and 18 percent in Four-Card Barry would be classified as a “±5 percent” player due to the 26 percent being 6 percent higher than the 20 percent optimal rate. For comparability purposes, in panel II of Table III, we compute similar descriptive statistics using the data from Palacios-Huerta and Volij (2008). The descriptive statistics provide an indication of the large deviations observed in our data. For example, among poker players in the Hide and Seek
1428
S. D. LEVITT, J. A. LIST, AND D. H. REILEY TABLE III SUMMARY OF DEVIATIONS ACROSS SUBJECT POOLSa Source:
Subject-Role
Levitt, List, and Reiley Game
Number of Subject-Roles
±5%
±10%
±15%
±20%
Poker evader Poker pursuer
I. Levitt, List, and Reiley Hide and Seek 44 Hide and Seek
73% 89%
57% 75%
27% 48%
9% 23%
Bridge evader Bridge pursuer
Hide and Seek Hide and Seek
14
64% 64%
43% 36%
21% 14%
14% 14%
Student evader Student pursuer
Hide and Seek Hide and Seek
22
86% 82%
68% 64%
36% 41%
32% 23%
Soccer row Soccer column
Four-Card Barry Four-Card Barry
32
97% 94%
63% 25%
16% 9%
9% 9%
Poker row Poker column
Four-Card Barry Four-Card Barry
52
87% 87%
46% 48%
17% 19%
4% 10%
Bridge row Bridge column
Four-Card Barry Four-Card Barry
22
95% 86%
55% 41%
27% 23%
14% 14%
Student row Student column
Four-Card Barry Four-Card Barry
24
96% 88%
42% 54%
13% 29%
0% 21%
Soccer row Soccer column
II. Palacios-Huerta and Volij Hide and Seek 20 Hide and Seek
25% 10%
0% 0%
0% 0%
0% 0%
Student row Student column
Hide and Seek Hide and Seek
20
70% 75%
5% 5%
0% 5%
0% 0%
Soccer row Soccer column
Four-Card Barry Four-Card Barry
20
25% 30%
5% 0%
0% 0%
0% 0%
Student Row Student Column
Four-Card Barry Four-Card Barry
20
50% 30%
10% 0%
0% 0%
0% 0%
a Table reports on proportions of players (from LLR and PHV respectively) who deviated from optimal play (20% clubs, 40% diamonds, 20% hearts, 20% spades for 4-card barry, or 1/3 black, 2/3 red for hide and seek) by various amounts in at least one of their options. For instance, a player who played 26/39/19/18 would enter into the proportion of players in “±5%” due to the 26% of the time they played clubs.
game, 57 percent (75 percent) of the evaders (pursuers) deviate by more than 10 percent from optimal play. Likewise, for soccer professionals, we observe a substantial fraction of players deviating significantly from optimal play in FourCard Barry: for row (column) players 63 percent (25 percent) deviate by more than 10 percent from optimal play. In contrast, the data in Palacios-Huerta and Volij (2008) are notable in the sense that very few individuals depart significantly from optimal play. For instance, in the Hide and Seek game, none of the soccer players deviate by more than 5 percent from optimal play; in Four-Card Barry, only 5 percent of the soccer professionals deviate by at least 10 percent from optimal play and none deviates by more than 15 percent.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1429
3.3. Understanding Why Play Deviates From Minimax There are several different plausible explanations as to why subjects in our sample fail to play the minimax strategy. A first explanation is that players would like to play minimax, but they are unable to do so because they cannot solve for the equilibrium, or they are cognitively able but deem the costs too prohibitive. A second, very different explanation is that they do not believe that their opponents will play minimax. If opponents systematically deviate from minimax (or are expected to deviate), then minimax is no longer the best response because the opponent’s strategy is exploitable. A third explanation lies at the foundation of the experimental environment: the nature and context of the constructed situation did not induce the professionals to retrieve the relevant cognitive tool kit to play optimally. We use two approaches to provide a deeper understanding of our results. The first is to use computers as opponents in similar lab games. In one treatment, we programmed the computer to play minimax (and then to exploit its competitor if possible), and in the other, the computer was programmed to persistently play suboptimally. Table IV reports results for the two computer treatments, with the optimally programmed opponent shown in columns 1 and 2 and the naive computer opponent shown in columns 3 and 4. Results are shown separately for the Hide and Seek and the Four-Card Barry games. The top portion of Table IV presents the same three tests included in the preceding tables, except that we restrict our tests to the behavior of the human player.23 Importantly for our purposes, even when faced with an opponent programmed to initially play minimax and to only deviate from that strategy in response to nonminimax play by the subject, poker players’ actions are not consistent with minimax theory.24 For a majority of the tests, empirical results are quite similar to the results obtained from the human–human interactions. Indeed, if anything, the runs test reveals that players are more likely to exhibit serially dependent play when competing against the computer, a result consonant with Budescu and Rapoport (1992) if subjects in this treatment interpreted the situation as an individual, noncompetitive, choice.25 These results indicate that the deviations we observe from minimax are not merely due to 23 As would be expected, the play of the naively programmed computer is nearly always rejected as being consistent with minimax. Less frequently, but often, the play of the optimally programmed computer is also rejected since it deviates from minimax in response to suboptimal play on the part of the human subject. These results are presented in the Supplementary Appendix. 24 When we divide each player’s actions into two equal size sets corresponding to the first 75 and last 75 plays, we find similar results (see the Supplementary Appendix). 25 In Four-Card Barry, the world-class poker players perform better than the other players in the sample, although the sample size is small. In Hide and Seek, world-class players do not play better than the others.
1430
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
TABLE IV SUMMARY OF RESULTS FOR SUBJECTS PLAYING AGAINST COMPUTERSa Source: Test: Type of Player:
# of players # pairs of roles
Computer Programmed for Optimal Play
Computer Programmed for Naive Play
2×2 All Players
4×4 All Players
2×2 All Players
4×4 All Players
21 42
21 42
13 26
13 26
1.000 <0.001
<0.001 <0.001
I. Minimax Play at Aggregate Level Chi-square test for minimax play: evader/row player <0.001 0.132 pursuer/column player <0.001 <0.001 II. Minimax Play at Individual Level Rejections at 5 percent: evader/row player pursuer/column player Rejections at 5 percent: for too few runs for too many runs Overall Rounds 1–25 Rounds 26–50 Rounds 51–75
52% 57%
48% 33%
77% 85%
92% 100%
IV. Runs Tests 38% 24% 14%
31% 24% 7%
62% 54% 8%
27% 19% 8%
51% 51% 50% 51%
58% 57% 60% 55%
62%
92%
V. Mean Player Payoff as a Fraction of Total Payoff 50% 49% 51% 53% 50% 47% 48% 46%
Proportion of players who beat the computer
57%
43%
a Table reports results for the computer-based experiments. The first two columns correspond to games played on the computer programmed for optimal play, while the last two columns correspond to games played on the computer programmed for naive play. Panel I shows p-values from Pearson’s chi-square test for goodness of fit of the human player’s aggregate frequencies to minimax predictions. p-values for the marginal frequencies of the human player as evader (or row) and pursuer (or column) are shown in the first and second rows. The test uses one (three) degree(s) of freedom for the marginal distribution of play and three (fifteen) for the joint distribution for the 2 × 2 (4 × 4) game. Panel II shows the percentage of humans that we reject at the 5% level for this same chi-square test. For panel III, play in the 4 × 4 game is divided into two—diamond plays and non-diamond plays—and then analyzed as in the 2 × 2 game. Panel III presents the percentage of players for whom we can reject the null hypothesis of no serial correlation in actions, based on the runs test of Gibbons and Chakraborti (2003) which has the following distribution: ⎧ i i ni + n i
nB − 1 nR − 1 ⎪ B R ⎪ if r is even ⎪2 ⎪ ⎪ (r/2) − 1 (r/2) − 1 niB ⎪ ⎨ i i i i i ni + ni
f (r|nB nR ) = i nB − 1 nR − 1 nB − 1 nR − 1 ⎪ B R ⎪ ⎪ + ⎪ ⎪ (r − 1)/2 (r − 3)/2 (r − 3)/2 (r − 1)/2 ⎪ niB ⎩ if r is odd
where r is the number of runs, and niB and niR are the number of black and red choices. The serial independence hypothesis will be rejected at the 5 percent level if there are too few or too many runs, that is if F(r|niB niR ) < 0025 or if F(r − 1|niB niR ) > 0975, where F(r|niB niR ) = rk=1 f (k|niB niR ). Panel IV gives the average player payoff relative to the maximum potential payoff. In equilibrium, the expected payoff is 50 percent.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1431
beliefs that the other player is not playing minimax and can therefore be exploited. The bottom panel of Table IV reports other results for these treatments, including average payoffs as well as the fraction of players who “beat” the computer in the sense of winning more than half of the trials.26 As expected, the computer programmed to play optimally slightly outperforms its human opponents: humans win 49.6 (48.5) percent of the payoffs in Hide and Seek (FourCard Barry). Alternatively, our subjects fared better against the computer programmed to play naive, nonminimax strategies, particularly in Four-Card Barry, where humans obtained 57.5 percent of the payoffs, and 12 of 13 humans earned more than the computer. This result accords with insights gained in Messick (1967), Fox (1972), Coricelli (2004), and Shachat and Swarthout (2004), who reported that subjects have some propensity in these games to exploit nonoptimal play. While our subjects were able to exploit effectively, they did not perform optimally. Given that the naive computer chose randomly between the four strategies with equal probability, the optimal human strategy is to always play diamond as the row player and never play diamond as the column player, yielding an expected payoff of 62.5 percent. None of our subjects realized this payoff level. In contrast to Four-Card Barry, humans did quite poorly against the naively programmed computer in Hide and Seek, winning only 50.9 percent of the total payoff, when the pure-strategy best response to the computer’s strategy would yield an expected payout of 75 percent. Only 8 of 13 humans beat the naive computer in Hide and Seek. Overall, both sets of results mirror Fox (1972), who found that subjects adjusted their play in the direction of a best response, but did not play optimally. Our second approach to learning why minimax met with limited success in our lab experiments is to use post-experimental surveys for the soccer players. We used two survey instruments, each given to a part of the professional soccer sample (both are contained in the Supplementary Appendix). One survey asks the question, “Does this game remind you of any other games?” None of the 12 soccer players who were asked this question spontaneously made the link between the experimental game and penalty kicks. Among the remaining soccer players, when prompted for a comparison between the lab game and penalty kicks, four players responded that they saw no comparison at all, two said that they only thought about penalty kicks after the question was asked, nine said that they were somewhat comparable, and five gave an outright yes.27 26
Similar to the human–human treatments, the computer and the player switched roles (row vs. column) halfway through the experiment so that each had the same expected value in terms of wins. 27 Interestingly, when asked their strategies on penalty kicks, 44 percent of the players reported playing pure strategies (e.g., always kick left), highlighting the fact that very few professional
1432
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
These results suggest that, overall, our experiment was not able to summon important field parallels for our subjects. 4. CONCLUSIONS Determining the conditions under which people play mixed strategies is a question of fundamental importance in economics. Indeed, O’Neill (1991, p. 503) asserts that “the most basic idea in game theory is the minimax argument.” Judged by past laboratory games with unique mixed-strategy equilibria, this most basic idea has been met with limited success. Palacios-Huerta and Volij (2008) have revived this area of research by generating data that is aligned with some of the main predictions of minimax theory. In contrast to their results, however, we find that neither professional poker players nor our own sample of professional soccer players actually produce play in the lab that closely approximates minimax predictions. The results of pitting humans against computers programmed to exploit deviations from minimax further demonstrates that the failure to play minimax does not seem to be due to players’ beliefs in suboptimal play by their opponents. Although these subjects might be able to randomize effectively in their chosen line of business, they seemingly had difficulty transferring their particular field situation to the specific lab task. Our post-experimental surveys for the soccer treatments highlight the fact that players did not make the connection between the lab experiments and naturally occurring situations they face in the field. We should stress that our results are meant to provide important bounds on Palacios-Huerta and Volij (2008), as we note several differences between the two experiments. First, our experiments involved fewer repetitions (75 rounds in each role rather than 200 rounds in a single role), and the subjects were told the number of rounds in advance. Second, we conducted the experiments in the soccer teams’ locker rooms, rather than in a university laboratory, and we did not employ screens to hide the backs of a player’s cards from his opponent. Third, the experiments were played between teammates, rather than across teams, and we were not able to obtain enough goalkeepers to ensure one goalkeeper per pair. Finally, the subjects were players from American professional teams rather than from Spanish professional teams. We do not know which, if any, of these differences might have caused the large behavioral discrepancies, but one point of the study is that the previous results on soccer players are not as robust as one might have hoped. Additional replications would be valuable.28 soccer players ever get the chance to take penalty kicks in games and suggesting that few subjects viewed our experimental environment as having a direct parallel to their field of expertise. 28 We attempted to reproduce our results with American college soccer players, but quickly realized that this would violate NCAA rules prohibiting payments to college athletes. One coach suggested that players could legitimately compete if the cash they earned would be donated to charity, but we felt this would not be a very good test of equilibrium behavior in a zero-sum game.
WHAT HAPPENS IN THE FIELD STAYS IN THE FIELD
1433
Clearly, subjects come to experiments with rules of behavior learned in the outside world. Depending on whether the specific context of the lab game cues the proper rules of thumb, radically divergent results can be obtained. Harrison and List (2010), for instance, examined the behavior of professional bidders in their naturally occurring environments. In their real-world bidding, such subjects do not constantly fall prey to the winner’s curse. When the expert bidders are placed in unfamiliar roles, however, they often fall prey to the winner’s curse, just as happens in the lab. Our results combined with their insights underscore an important methodological point: slight changes in context can have profound behavioral effects, whether students or professionals are the experimental participants. REFERENCES BRAYER, A. R. (1964): “An Experimental Analysis of Some Variables of Minimax Theory,” Behavioral Science, 9, 33–44. [1413,1415] BROWN, J. N., AND R. W. ROSENTHAL (1990): “Testing the Minimax Hypothesis: A ReExamination of O’Neill’s Game Experiment,” Econometrica, 58, 1065–1081. [1413,1415,1424] BUDESCU, D. V., AND A. RAPOPORT (1992): “Generation of Random Series in Two-Person Strictly Competitive Games,” Journal of Experimental Psychology, 121, 352–363. [1414,1424, 1429] CHIAPPORI, P. A., S. LEVITT, AND T. GROSECLOSE (2002): “Testing Mixed Strategy Equilibria When Players Are Heterogeneous: The Case of Penalty Kicks in Soccer,” American Economic Review, 92, 1138–1151. [1414] CORICELLI, G. (2004): “Strategic Interaction in Iterated Zero-Sum Games,” Unpublished Manuscript, University of Arizona. Available at http://economics.eller.arizona.edu/downloads/ working_papers/coricelli.pdf. [1419,1431] FEHR, E., AND J. A. LIST (2004): “The Hidden Costs and Returns of Incentives: Trust and Trustworthiness Among CEOs,” Journal of the European Economic Association, 2, 743–771. [1418] FOX, J. (1972): “The Learning of Strategies in a Simple, Two-Person Zero-Sum Game Without Saddlepoint,” Behavioural Science, 17, 300–308. [1413,1415,1419,1431] FRIEDMAN, L. (1971): “Optimal Bluffing Strategies in Poker,” Management Science, 17, B764–B771. [1415] GIBBONS, J. D., AND S. CHAKRABORTI (2003): Nonparametric Statistical Inference (Fourth Ed.). New York: Dekker. [1421,1424,1425,1430] HAIGH, M., AND J. A. LIST (2005): “Do Professional Traders Exhibit Myopic Loss Aversion? An Experimental Analysis,” Journal of Finance, 60, 523–534. [1418] HARRISON, G. W., AND J. A. LIST (2004): “Field Experiments,” Journal of Economic Literature, 42, 1009–1055. [1418] (2010): “Naturally Occurring Markets and Exogenous Laboratory Experiments: A Case Study of the Winner’s Curse,” Economic Journal (forthcoming). [1433] HIRSCHBERG, D., S. LEVITT, AND J. LIST (2008): “Poker Players Equate Mixed-Strategy Payoffs in the Field,” Unpublished Manuscript, University of Chicago. [1415] HSU, S.-H., C.-Y. HUANG, AND C.-T. TANG (2007): “Minimax Play at Wimbledon: Comment,” The American Economic Review, 97, 517–523. [1414] LEVITT, S. D., J. A. LIST, AND D. H. REILEY (2007): “What Happens in the Field Stays in the Field: Exploring Whether Professionals Play Minimax in Laboratory Experiments,” Unpublished Manuscript, University of Chicago. [1418] (2010): “Supplementary Appendix to ‘What Happens in the Field Stays in the Field: Exploring Whether Professionals Play Minimax in Laboratory Experiments’,” Econometrica
1434
S. D. LEVITT, J. A. LIST, AND D. H. REILEY
Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7405_instructions to experimental subjects.pdf. [1417] LIEBERMAN, B. (1960): “Human Behavior in a Strictly Determined 3 × 3 Matrix Game,” Behavioral Science, 5, 317–322. [1413,1415] (1962): “Experimental Studies of Conflict in Some Two-Person and Three-Person Games,” in Mathematical Models in Small Group Processes, ed. by J. H. Criswell, H. Solomon, and P. Suppes. Stanford, CA: Stanford University Press, 203–220. [1413,1415] LOEWENSTEIN, G. (1999): “Experimental Economics From the Vantage-Point of Behavioral Economics,” Economic Journal, 109, F23–F34. [1415] MESSICK, D. M. (1967): “Interdependent Decision Strategies in Zero-Sum Games: A ComputerControlled Study,” Behavioral Science, 12, 33–48. [1413,1415,1419,1431] O’NEILL, B. (1987): “A Non-Metric Test of the Minimax Theory of Two-Person Zerosum Games,” Proceedings of the National Academy of Sciences, 84, 2106–2109. [1416,1425] (1991): “Comments on Brown and Rosenthal’s Reexamination,” Econometrica, 59, 503–507. [1414,1432] PALACIOS-HUERTA, I. (2003): “Professionals Play Minimax,” Review of Economic Studies, 70, 395–415. [1414,1420] PALACIOS-HUERTA, I., AND O. VOLIJ (2008): “Experientia Docet: Professionals Play Minimax in Laboratory Experiments,” Econometrica, 76, 71–115. [1413-1415,1417,1418,1420-1422, 1424-1428,1432] ROSENTHAL, R. W., J. SHACHAT, AND M. WALKER (2003): “Hide and Seek in Arizona,” International Journal of Game Theory, 32, 273–293. [1413,1415-1418,1421,1424] SHACHAT, J., AND T. J. SWARTHOUT (2004): “Do We Detect and Exploit Mixed Strategy Play by Opponents?” Mathematical Methods of Operations Research, 59, 359–373. [1419,1431] SPILIOPOULOS, L. (2007): “Do Repeated Game Players Detect Patterns in Opponents? Revisiting the Nyarko and Schotter Belief Elicitation Experiment,” Paper 2179, Munich Personal RePEc Archive. Available at http://mpra.ub.uni-muenchen.de/2179. [1419] VON NEUMANN, J. (1928): “Zür Theorie der Gesellschaftsspiele,” Mathematische Annalen, 100, 295–320. [1413] WAGENAAR, W. A. (1971): “Serial Non-Randomness as a Function of Duration and Monotony of a Randomization Task,” Acta Psychologica, 35, 78–87. [1424] WALKER, M., AND J. WOODERS (2001): “Minimax Play at Wimbledon,” American Economic Review, 91, 1521–1538. [1414,1420,1424] WOODERS, J. (2008): “Does Experience Teach? Professionals and Minimax Play in the Lab,” Unpublished Manuscript, University of Arizona. [1426]
Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.; [email protected], Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.; [email protected], and Dept. of Economics, University of Arizona, 401 McClelland Hall, Tucson, AZ 85721, U.S.A.; [email protected]. Manuscript received September, 2007; final revision received July, 2009.
Econometrica, Vol. 78, No. 4 (July, 2010), 1435–1452
NOTES AND COMMENTS CAN RELAXATION OF BELIEFS RATIONALIZE THE WINNER’S CURSE?: AN EXPERIMENTAL STUDY BY ASEN IVANOV, DAN LEVIN, AND MURIEL NIEDERLE1 We use a second-price common-value auction, called the maximal game, to experimentally study whether the winner’s curse (WC) can be explained by models which retain best-response behavior but allow for inconsistent beliefs. We compare behavior in a regular version of the maximal game, where the WC can be explained by inconsistent beliefs, to behavior in versions where such explanations are less plausible. We find little evidence of differences in behavior. Overall, our study casts a serious doubt on theories that posit the WC is driven by beliefs. KEYWORDS: Common-value auctions, winner’s curse, beliefs, cursed equilibrium, level-k model.
1. INTRODUCTION

A WELL DOCUMENTED PHENOMENON in common-value auctions is the winner's curse (WC)—a systematic overbidding relative to Bayesian Nash equilibrium (BNE) which results in massive losses in laboratory experiments.2 Two recent papers, Eyster and Rabin (2005) and Crawford and Iriberri (2007), rationalize the WC within theories that retain the BNE assumption that players best respond to beliefs (hence, we refer to these theories as belief-based), but relax the requirement of consistency of beliefs. Eyster and Rabin introduced the concept of cursed equilibrium (CE) in which players' beliefs do not fully take into account the connection between others' types and bids. Crawford and Iriberri used the level-k model which was introduced by Stahl and Wilson (1995) and Nagel (1995). In this model, level-0 (L0) players bid in some prespecified way and level-k (Lk) players (k = 1, 2, ...) best respond to a belief that others are Lk−1.3
In response to Eyster and Rabin (2005) and Crawford and Iriberri (2007), we investigate experimentally whether the WC in common-value auctions is indeed driven by beliefs.4 We use a second-price common-value auction, called the maximal (or maximum) game, that was first introduced in Bulow and Klemperer (2002) and Campbell and Levin (2006).

1 We would like to thank David Harless and Oleg Korenok for useful discussions. We would also like to thank a co-editor and three anonymous referees for their comments and suggestions. An extended working-paper version is available at http://www.people.vcu.edu/~aivanov/.
2 See Bazerman and Samuelson (1983), Kagel and Levin (1986), Kagel, Harstad, and Levin (1987), Dyer, Kagel, and Levin (1989), Lind and Plott (1991), and the papers surveyed in Kagel (1995, Section II) and Kagel and Levin (2002).
3 CE and the level-k model can be applied to environments other than common-value auctions.
4 Our study applies to any belief-based explanation of the WC. This includes, for example, analogy-based expectation equilibrium (Jehiel (2005), Jehiel and Koessler (2008)). We focus on CE and the level-k model because they are the two most prominent belief-based explanations.
© 2010 The Econometric Society
DOI: 10.3982/ECTA8112
This game has the special property of being two-step dominance-solvable and our experimental design exploits this property. We focus on initial periods of play as this seems like a natural starting point for evaluating belief-based theories.
The paper most closely related to ours is Charness and Levin (2009). This study finds that the WC is alive and well in an individual-choice variant of the "acquiring a company" game, that is, in an environment where the WC cannot be rationalized by inconsistent beliefs about other players' behavior. Three concerns arise in interpreting the results in Charness and Levin (2009). First, one cannot reasonably expect CE or the level-k model to explain every aspect of behavior.5 Thus, even if Charness and Levin's setup rules out belief-based explanations, we should still expect some anomalies. The key question is, "Is the WC more pronounced in environments where it can be rationalized by belief-based explanations than in environments where such explanations are less plausible?" If the answer is "yes," the difference could be attributed to the level-k model or CE, whereas a negative answer casts doubt on the validity of such models. However, Charness and Levin (2009) cannot answer this question because it does not include a regular acquiring a company game against human opponents, that is, an environment of the former type. In contrast, our paper studies and compares behavior in both types of environments. Second, the acquiring a company game represents a lemons market (see Akerlof (1970)) and is not a common-value auction. Although both types of environments admit a WC, they are quite different and it is not obvious that Charness and Levin's conclusions readily extend to common-value auctions.6 Finally, Charness and Levin (2009) studied behavior in individual-choice settings. However, it is possible that subjects employ very different cognitive mechanisms in interactions with other players; such interactions may trigger all sorts of thought processes about others' reasoning, beliefs, and intentions. Thus, the conclusions from Charness and Levin (2009) do not necessarily extend to games against human opponents. In our study, subjects play against other people.7
Another related study is Costa-Gomes and Weizsäcker (2008), which finds a systematic inconsistency between chosen actions and stated beliefs in normal-form games. This study differs from ours in two important ways. First, it concerns an environment which is very different from common-value auctions. Second, it is based on eliciting subjects' beliefs. In addition, the study cannot distinguish between two possible interpretations: (i) that subjects do not best
5 For example, they do not explain bidding above values in private-value second-price auctions.
6 It is plausible that the WC in both types of environments is driven by the same forces. However, given that these are quite different environments, this cannot be taken for granted.
7 In one of our environments, each subject plays against the computer which, however, mimics the strategy of a person (actually, the subject's own past strategy).
respond to beliefs when choosing actions and (ii) that subjects form different beliefs when choosing actions and stating beliefs.8
We proceed as follows. In Section 2, we describe the maximal game and derive the relevant theoretical predictions. In Section 3, we describe our experimental design and in Section 4 we examine the experimental data. Section 5 concludes.

2. THEORETICAL CONSIDERATIONS

We begin by describing the maximal game. There are n bidders, each of whom privately observes a signal Xi that is independently and identically distributed (i.i.d.) according to a cumulative distribution function F(·) on [0, 10]. Let X max = max{X1, ..., Xn} be the highest of the n signals, and let xi and xmax denote particular realizations of Xi and X max, respectively. Given (x1, ..., xn), the ex post common value to the bidders is v(x1, ..., xn) = xmax. Bidders bid in a sealed-bid second-price auction where the highest bidder wins, earns the common value, xmax, and pays the second highest bid. In case of a tie, each tying bidder gets the object with equal probability. We will say that, given signal xi, a player bidding b underbids, bids her signal, overbids, or bids above 10 if b < xi, b = xi, xi < b ≤ 10, or b > 10, respectively. We now state our first result.

PROPOSITION 1: b(xi) = xi is the unique bid function remaining after two rounds of iterated deletion of weakly dominated bid functions.9 In the first round, all bid functions bi(·) with bi(xi) < xi or bi(xi) > 10 for some xi are deleted. In the second round, all bid functions bi(·) with xi < bi(xi) ≤ 10 for some xi are deleted.

The proof is given in Appendix A. Here, we give the intuition. It is obvious that bidding above 10 is weakly dominated. Underbidding is also weakly dominated since, under the second-price rule, one could lose the auction at a price below one's signal even though the value of the object is greater than or equal to one's signal. Given that no one underbids, bi(xi) > xi is weakly dominated for any xi, because, in case the highest bid among others is between xi and bi(xi), i makes nonpositive (and possibly negative) profits.
That bidding one's signal is a BNE follows directly from Proposition 1. In fact, we can say more than that (the proof is given in Appendix A):

8 Another related study is Pevnitskaya (2008). This study investigates whether deviations from the risk-neutral BNE in first-price private-value auctions are caused by inconsistent beliefs, risk aversion, or probability misperception. All components seem to be at work.
9 A bid function is weakly dominated if, for some signal, it prescribes a weakly dominated bid.
PROPOSITION 2: The bid function b(xi) = xi is the unique symmetric BNE (including mixed strategies).10

We now show that overbidding can arise within the level-k model and CE. First, let us consider the level-k model. In this model, level-0 (L0) players bid in some prespecified way and level-k (Lk) players (k = 1, 2, ...) best respond to a belief that others are Lk−1. For auction settings, Crawford and Iriberri (2007) considered a version of L0, called random L0 (RL0), which, regardless of its signal, bids uniformly over all bids between the minimal and maximal value of the object. RLk (k ≥ 1) best responds to RLk−1. The next proposition shows that RL1 can overbid.11

PROPOSITION 3: The bid function of RL1 is bRL1(xi) = E(X max |Xi = xi) ≥ xi. If F(xi) < 1, the inequality is strict.12

The proof is given in Appendix A. It hinges on the fact that, because RL0's bid is uninformative about its signal, RL1 cannot draw any inference about X max from winning the auction.13
Let us turn to CE. In a χ-CE (χ ∈ [0, 1]), players best respond to a belief that each other player j, with probability χ, chooses a bid that is type-independent and is distributed according to the ex ante distribution of j's bids and, with probability 1 − χ, chooses a bid according to j's actual type-dependent bid function. Thus, χ captures players' level of "cursedness": if χ = 0, we have a standard BNE, and if χ = 1, players are fully cursed and draw no inferences about other players' types. Based on Proposition 5 in Eyster and Rabin (2005), we can state the following proposition.

PROPOSITION 4: Assuming Xi has a strictly positive probability density function (p.d.f.),14 the following bid function constitutes a symmetric χ-CE: bCE(xi) = (1 − χ)xi + χE(X max |Xi = xi) ≥ xi. If χ > 0 and F(xi) < 1, the inequality is strict.15

10 In our experiment, matching of subjects is anonymous and there is no feedback, so it seems implausible that subjects should coordinate on an asymmetric BNE. For more on asymmetric equilibria, see our working paper at http://www.people.vcu.edu/~aivanov/.
11 Crawford and Iriberri (2007) also considered a version of the level-k model based on a so-called truthful L0 (TL0). In our settings, the behavior of TLk and RLk+1 coincides for k ≥ 0.
12 If signals have the discrete uniform distribution on the set {0, 1, 2, ..., 10} and there are two bidders (this is relevant for our experiment), then bRL1(xi) = E(X max |Xi = xi) = (xi² + xi + 110)/22.
13 The behavior of RLk for k ≥ 2 is not uniquely determined. The point, however, is that RL1 can rationalize overbidding.
14 Although the assumption of a strictly positive p.d.f. is not satisfied for the discrete distribution in our experiment, we suspect that the proposition nevertheless holds.
15 If signals have the discrete uniform distribution on the set {0, 1, 2, ..., 10} and there are two bidders, then bCE(xi) = (1 − χ)xi + χE(X max |Xi = xi) = (1 − χ)xi + χ(xi² + xi + 110)/22.
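The two-bidder, discrete-uniform case used in the experiment makes the formulas in footnotes 12 and 15 easy to check by direct enumeration. The following Python sketch is ours and is not part of the paper or its supplemental material; the function names and the choice of χ = 1/2 are purely illustrative.

```python
# Numerical check (ours) of footnotes 12 and 15: two bidders, signals i.i.d.
# discrete uniform on {0, 1, ..., 10}.
from fractions import Fraction

SIGNALS = range(11)

def expected_max_given_signal(x):
    """E(X^max | X_i = x) when the opponent's signal is uniform on {0, ..., 10}."""
    return Fraction(sum(max(x, y) for y in SIGNALS), len(SIGNALS))

def rl1_bid(x):
    """RL1 bid from Proposition 3: the expected value given one's own signal only."""
    return expected_max_given_signal(x)

def cursed_bid(x, chi):
    """chi-cursed-equilibrium bid from Proposition 4."""
    return (1 - chi) * x + chi * expected_max_given_signal(x)

for x in SIGNALS:
    closed_form = Fraction(x * x + x + 110, 22)                      # footnote 12
    assert rl1_bid(x) == closed_form
    assert cursed_bid(x, Fraction(1, 2)) == (x + closed_form) / 2    # footnote 15, chi = 1/2

print([float(rl1_bid(x)) for x in SIGNALS])
# Every RL1 bid weakly exceeds the signal; e.g. rl1_bid(0) = 5 and rl1_bid(10) = 10.
```

For low signals the RL1 bid exceeds the signal by several ECU, which is the sense in which Propositions 3 and 4 rationalize overbidding.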
3. EXPERIMENTAL DESIGN

3.1. Treatments and Procedures

The experiment consists of the Baseline, ShowBidFn, and MinBid treatments. The Baseline treatment consists of two parts. In part I, subjects play the maximal game for 11 periods. In each period, subjects are randomly and anonymously rematched in separate two-player auctions. Each subject's signals for the 11 auctions are drawn with equal probability and without replacement from the set {0, 1, 2, ..., 10}.16 Signals are independent across subjects. Subjects can bid anything between 0 and 1,000,000 experimental currency units (ECU). To minimize the effect of learning, we provide no feedback whatsoever during the experiment. This also ensures that, in any auction, each bidder's prior over the other bidder's signal is the discrete uniform distribution on {0, 1, 2, ..., 10}.
Part II is similar to part I. The only difference is that each subject i bids against the computer rather than against another subject. The computer, which "receives" a uniformly distributed signal, mimics i's behavior from part I by using the same bid function that i used in part I. For example, if the computer receives signal y, it makes the same bid that i made in part I when she received signal y. Effectively, in part II each subject is playing against herself from part I (and knows that this is the case).17
The ShowBidFn treatment is identical to the Baseline treatment except that in part II we explicitly show subjects their bid functions from part I. The MinBid treatment is identical to the Baseline treatment except that subjects are explicitly not allowed to underbid.
We conducted three sessions of the Baseline (62 subjects), two sessions of the ShowBidFn (46 subjects), and one session of the MinBid treatment (26 subjects). Subjects were students at The Ohio State University (OSU) who were enrolled in undergraduate economics classes. The sessions were held at the Experimental Economics Lab at OSU and lasted around 45 minutes. At the start of each session, the experimenter read the instructions for part I aloud as subjects read along. After that, subjects did a practice quiz. Experimenters walked around checking subjects' quizzes, answering questions, and explaining mistakes. After part I of the relevant treatment, the instructions for part II were read. After part II, subjects were paid. Subjects' earnings consisted of a $5 show-up fee, plus 10 ECU starting balances, plus their cumulative earnings from the 22 auctions,18 converted at a rate of $0.50 per ECU. Average earnings

16 Our design for part I ensures that each subject receives each signal from the set {0, 1, 2, ..., 10} exactly once. In effect, we are eliciting subjects' bid functions. This simplifies the design of part II.
17 Note that although in part II a subject bids against the computer, the bidding strategy of the opponent is that of a person. The fact that this person is herself from part I should only make the cognitive processes of the opponent all the more salient.
18 In case a subject incurred losses which could not be covered by the 10 ECU starting balances, she was paid just her $5 show-up fee.
were $18.53/$18.03/$15.53 in the Baseline/ShowBidFn/MinBid treatment. The instructions for the Baseline treatment are given in the Supplemental Material (Ivanov, Levin, and Niederle (2010)).19 The experiment was programmed and conducted with the software z-Tree (Fischbacher (2007)).

3.2. Possible Implications for Belief-Based Theories

In part I of the Baseline and ShowBidFn treatments, underbidding and bidding above 10 are weakly dominated and can hardly be explained by any belief-based theory. The most interesting behavior is overbidding because it leads to a WC (as long as others are also appropriately overbidding) and because it could potentially be explained by belief-based theories. Notice that, to explain overbidding, both the level-k model and CE require that beliefs place a positive weight on underbidding, that is, on weakly dominated bids. Although not implausible, this requirement puts some strain on belief-based explanations of overbidding.
However, the real test of belief-based theories comes from part II of each treatment and part I of the MinBid treatment. In particular, we argue below that if behavior is driven by beliefs, we should observe a reduction in overbidding (i) in part II of each treatment relative to part I and (ii) in part I of the MinBid treatment relative to part I of the Baseline and ShowBidFn treatments. The absence of any such reduction would cast serious doubt on belief-based theories.
Our argument is based on the assumption that if behavior is driven by beliefs, these beliefs are at least consistent with the objectively known features of the environment. That is, we assume that a subject's belief in part II is consistent with the fact that the computer uses her own bid function from part I20 and that a subject's belief in the MinBid treatment is consistent with the fact that the opponent cannot underbid. Later, we will consider alternative interpretations of belief-based theories under which beliefs can be at odds with the objectively known features of the environment.
Consider a subject i who overbids (for all signals) in part I of one of the three treatments.21 From Proposition 1, it follows that bidding her signal is a best response in part II. Although underbidding may not be a best response, it is at least a response in the right direction.22 If i continues to overbid but
19 The instructions in the other two treatments are very similar and are available upon request.
20 In the Baseline treatment, this assumption entails that subjects are able to recall their bidding behavior from part I (which was just a few minutes ago) or perhaps, at least, whether they tended to underbid, bid their signal, overbid, or bid above 10. In the ShowBidFn treatment, subjects do not need to recall anything because they are explicitly shown their bid functions.
21 To be precise, overbidding is not possible for signal 10: a subject can underbid (except in the MinBid treatment), bid her signal, or bid above 10. Therefore, the correct statement is "a subject i who overbids for all signals 0–9 and bids above 10 for signal 10."
22 Of course, underbidding is not possible in the MinBid treatment.
corrects her overbidding downward, this may or may not be a best response,23 but again it is a response in the right direction. On the other hand, if i continues overbidding without a downward correction or even starts bidding above 10 in part II, she is clearly not best responding to her behavior from part I. The bottom line is that if i's behavior is driven by beliefs, we should observe a downward correction of bids in part II relative to part I.
In part I of the MinBid treatment, anything other than bidding one's signal is weakly dominated. Thus, if behavior is driven by beliefs, we would expect a reduction in the frequency and (average) magnitude of overbidding relative to part I of the Baseline and ShowBidFn treatments.24
Let us turn to three interpretations of belief-based theories under which beliefs can be at odds with the objectively known features of the environment. The first interpretation is that subjects are using some simple rule of thumb which leads them to behave "as if" they were best responding to beliefs. For example, a player using a rule like "bid based on the expected value conditional on my signal and ignore everything else" would behave just like a fully cursed or an RL1 player. Because subjects do not deliberately form beliefs, the beliefs describing their behavior could be at odds with objectively known features of the environment. Thus, subjects in any of our environments could behave as if they had cursed or RL1 beliefs. Note that this interpretation requires that the rule of thumb be rigid across environments. For example, a subject using the above rule of thumb needs to ignore the opponent's bidding strategy just as much in part II as in part I, even though in part II it is her own past bidding strategy (which is even explicitly shown to her in the ShowBidFn treatment).
The second and third interpretations pertain to CE. According to the second interpretation, cursed players do not fully think through the connection between others' types and bids. As a result, they come up with cursed beliefs to which, however, they best respond by appropriately conditioning the expected value of the object on winning the auction. Under this interpretation, CE would explain behavior in our experiment only if players equally fail to realize the connection between others' types and bids when bidding against (i) other people whose bids are unrestricted, (ii) their own bidding strategy (even when it is shown to them), and (iii) other people who are explicitly not allowed to underbid.
23 For overbidding in part II to be a best response, i would need to shift her part II bid function, b_i^II(·), downward in a way that, for all signals xi, none of the bids she made in part I lies in (xi, b_i^II(xi)]. Otherwise, there is a positive probability that she wins the auction and loses money.
24 In the MinBid treatment, a subject's available bids depend on her type, so that CE is not formally defined. Nevertheless, our point remains valid: if subjects' behavior is driven by beliefs (whether these beliefs are appropriately redefined cursed beliefs or other beliefs), we should observe a reduction in overbidding.
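The best-response condition in footnote 23, and the kind of part II check reported in Section 4.1, can be made concrete with a short sketch. The code below is ours; the tie-splitting rule and the 0.25-ECU candidate grid are our assumptions rather than details taken from the paper. It computes the expected part II profit of a bid against the computer opponent, which replays the subject's own part I bid function at a uniformly drawn signal.

```python
# A hedged sketch of the part II best-response check behind footnote 23.
from fractions import Fraction

SIGNALS = range(11)

def expected_profit(own_signal, own_bid, part1_bids):
    """Expected part II profit of own_bid at own_signal when the computer opponent
    bids part1_bids[y] after drawing y uniformly from {0, ..., 10} (ties split)."""
    total = Fraction(0)
    for y in SIGNALS:
        other = part1_bids[y]
        value = max(own_signal, y)           # common value = highest signal
        if own_bid > other:                  # win and pay the second-highest bid
            total += value - other
        elif own_bid == other:               # tie: win with probability 1/2
            total += Fraction(value - other, 2)
    return total / len(SIGNALS)

def is_best_response(own_signal, own_bid, part1_bids, step=Fraction(1, 4)):
    """Compare own_bid with candidate bids on a 0.25-ECU grid between 0 and 10."""
    candidates = [step * k for k in range(int(10 / step) + 1)]
    best = max(expected_profit(own_signal, b, part1_bids) for b in candidates)
    return expected_profit(own_signal, own_bid, part1_bids) >= best

# Illustration: a hypothetical subject who overbid her signal by 2 ECU in part I
# and mechanically repeats the same bid function in part II.
part1 = {x: x + 2 for x in SIGNALS}
print([is_best_response(x, part1[x], part1) for x in SIGNALS])
```

In this illustration the check prints False at every signal: mechanically repeating the part I overbid forgoes expected profit in part II, which is why a downward correction counts as a response in the right direction.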
1442
A. IVANOV, D. LEVIN, AND M. NIEDERLE
According to the third interpretation, players are aware of others' type-contingent strategies but underappreciate the information content of winning the auction. Under this interpretation, CE could explain overbidding in any of the environments of our experiment.25 The problem with this interpretation is that rather than being about inconsistent beliefs, it is about a failure to properly update the expected value of the object conditional on winning. Such a failure is not at all part of the formal definition of CE according to which players perfectly update given their (albeit cursed) beliefs. Although theoretically awkward, this interpretation could have validity if CE can accurately capture behavior which is actually driven by improper updating. For example, in the special case of bidders who have correct beliefs but completely fail to condition on winning, fully cursed CE perfectly describes behavior. However, it is unclear to what extent CE can accurately capture the behavior of bidders who update, albeit incompletely. Thus, it is unclear to what extent this interpretation is generally valid.26

4. RESULTS

We start by studying and comparing behavior in parts I and II within each treatment. After that, we compare the part I behavior of the Baseline and ShowBidFn treatments with that of the MinBid treatment.

4.1. Behavior in Part I and Part II

We start by placing each bid b, given signal x, in one of the following categories: (i) b < x − 0.25, (ii) x − 0.25 ≤ b ≤ x + 0.25, (iii) x + 0.25 < b ≤ 10, and (iv) b > 10.27 That is, we count all bids within 0.25 ECU of one's signal as if they were precisely equal to the signal.28 Based on this, we classify subjects (separately for each part of each treatment) in the following way: Underbidders/Signal Bidders/Overbidders/Above-10 Bidders are those who make at least 6 (out of 11) bids in category (i)/(ii)/(iii)/(iv); subjects who fall in none of these four classes are classified as Indeterminate.29
We start the analysis with the Baseline treatment. Table I shows how many subjects were in each class in part I (last column) and part II (last row). The
25 The same holds for Charness and Levin's variant of the acquiring a company game.
26 To shed light on the issue, one would formally have to define an equilibrium concept in which players have correct beliefs but incompletely update their beliefs (conditional on winning).
27 Actually, for signal x = 10, a bid needs to be above 10.25 to fall into category (iv); a bid 9.75 ≤ b ≤ 10.25 falls into category (ii). We ignore this in our notation.
28 Counting only bids which are precisely equal to the signal in category (ii) (and adjusting the other categories appropriately) does not change any of our results.
29 Using 7 or 8 (instead of 6) class-consistent decisions as the cutoff for a player to be assigned to a class does not affect the analysis much (apart from increasing the number of Indeterminate subjects).
TABLE I
SUBJECT CLASSIFICATION IN PARTS I AND II OF THE BASELINE TREATMENT

Part I \ Part II     Underbidders   Signal Bidders   Overbidders   Above-10 Bidders   Indeterminate   Part I total
Underbidders               2              0               2               1                 0               5
Signal Bidders             0              5               3               1                 0               9
Overbidders                1              5              14               1                 4              25
Above-10 Bidders           2              1               1               6                 0              10
Indeterminate              2              2               3               5                 1              13
Part II total              7             13              23              14                 5
table also shows how subjects switched between classes from part I to part II. For example, the entry in the first row and third column shows that 2 subjects who were Underbidders in part I became Overbidders in part II.
Based on the table, we can make the following statements:

RESULT 1: (a) In part I, a large percentage of subjects make a weakly dominated bid (b < x − 0.25 or b > 10) in at least 6 (out of 11) auctions (30.7%).30
(b) In part I, Overbidders are the largest class (40.3%).
(c) Only a minority of Overbidders from part I become Signal Bidders or Underbidders in part II (24%).
(d) The majority of Overbidders from part I remain Overbidders in part II (56%).

For a large proportion of subjects, behavior can hardly be explained by belief-based theories (point (a)). However, the largest proportion of subjects in part I are Overbidders. These subjects' behavior is potentially driven by beliefs. However, only a minority of them (best) respond in part II by becoming Signal Bidders or Underbidders. The key question is whether those who remain Overbidders in part II are best responding in part II or are at least responding in the right direction by correcting their bids downward.31
For subjects who are Overbidders in parts I and II, we find that only 23% of bids in part II are best responses to part I behavior. These subjects are foregoing, on average, 5.62 ECU (median is 4.07 ECU) in expected profits by not behaving optimally in part II. Figure 1 plots, for each signal, the median bid in part I (circles) and part II (stars).32 Based on the figure, we see no downward correction of bids in part II. We can state the following result:
30 This percentage includes all Underbidders and Overbidders, as well as 4 Indeterminate subjects.
31 The one Overbidder from part I who becomes an Above-10 Bidder in part II is clearly not (best) responding to her behavior from part I.
32 We plot median, rather than average, bids because averages are distorted by bids above 10.
FIGURE 1.—Median bids in parts I (circles) and II (stars) for subjects who are Overbidders in parts I and II of the Baseline treatment.
RESULT 2: For subjects who are Overbidders in parts I and II, we make the following observations:
(a) In part II, they forego substantial expected profits.
(b) In part II, there is no evidence of a downward correction of bids.

Result 1 extends to the ShowBidFn and MinBid treatments; see Tables II and III in Appendix B which are the analogs of Table I.33 Result 2 also extends to the ShowBidFn and MinBid treatments. In the ShowBidFn/MinBid treatment, for subjects who are Overbidders in parts I and II, 15%/19% of bids in part II are best responses to part I behavior; these subjects are foregoing, on average, 6.60 ECU34/5.40 ECU (median is 7.19 ECU/3.84 ECU) in expected profits in part II. For analogs of Figure 1, see Figures 4 and 5 in Appendix B.

4.2. Baseline and ShowBidFn versus MinBid

If behavior is driven by beliefs, we would expect a reduction in the frequency and (average) magnitude of overbidding in part I of the MinBid treatment relative to part I of the Baseline and ShowBidFn treatments.

33 In the ShowBidFn/MinBid treatment, the percentage of subjects who make a weakly dominated bid in at least 6 (out of 11) auctions is 28.3%/7.8%. In the MinBid treatment, the percentage is smaller largely because subjects cannot underbid.
34 This average excludes one subject who bid very high both in parts I and II so that she incurred huge losses in part II.
FIGURE 2.—Average bids in part I of Baseline and ShowBidFn (circles) and MinBid (stars) (based on bids of form x + 0.25 < b ≤ 10).
The frequency of overbidding is 42.8% in the Baseline and ShowBidFn treatments35 and 60.5% in the MinBid treatment. Overbidding is probably more frequent in the MinBid treatment because underbidding is impossible, so all bids are distributed in three, rather than four, categories. Given this, the frequencies of overbidding seem quite comparable.
What about the magnitude of overbidding? Figure 2 shows, for each signal, the average bid of the form x + 0.25 < b ≤ 10 in part I of the Baseline and ShowBidFn treatments (circles) and in part I of the MinBid treatment (stars). Average bids are astonishingly close. We can now state our final result:

RESULT 3: Relative to part I of the Baseline and ShowBidFn treatments, we find no evidence in part I of the MinBid treatment of (a) a lower frequency of bids of the form x + 0.25 < b ≤ 10 or (b) a reduction in the average size of bids of the form x + 0.25 < b ≤ 10.
35 We pool the data from part I of the Baseline and ShowBidFn treatments because part I is the same in both treatments.
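The classification used in Sections 4.1–4.2 is simple enough to restate as code. The sketch below is ours and only illustrates the definitions in the text: bids are placed in categories (i)–(iv) using the 0.25-ECU band around the signal, and a subject is assigned to a class when at least 6 of her 11 bids fall in the corresponding category; the signal-10 edge case of footnote 27 is ignored, as in the text's notation, and the sample subject is hypothetical.

```python
# Illustrative sketch (ours) of the Section 4.1 bid and subject classification.
def bid_category(signal, bid):
    """Categories (i)-(iv) from Section 4.1; footnote 27's signal-10 caveat is ignored."""
    if bid > 10:
        return "above-10"
    if bid < signal - 0.25:
        return "underbid"
    if bid <= signal + 0.25:
        return "signal"
    return "overbid"                        # signal + 0.25 < bid <= 10

def classify_subject(bids):
    """bids maps each signal 0..10 to the bid made at that signal (11 bids in total)."""
    counts = {}
    for signal, bid in bids.items():
        c = bid_category(signal, bid)
        counts[c] = counts.get(c, 0) + 1
    label, n = max(counts.items(), key=lambda kv: kv[1])
    names = {"underbid": "Underbidder", "signal": "Signal Bidder",
             "overbid": "Overbidder", "above-10": "Above-10 Bidder"}
    return names[label] if n >= 6 else "Indeterminate"

# A hypothetical subject who always bids 1.5 ECU above her signal.
print(classify_subject({x: x + 1.5 for x in range(11)}))    # -> Overbidder
```

Overbid frequencies and average overbid sizes, as compared in Figure 2 and Result 3, then follow from tallying the category-(iii) bids.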
5. CONCLUDING REMARKS

We investigate experimentally whether belief-based theories can explain the WC in common-value auctions in initial periods of play.36 The main idea of our approach is to compare behavior in an environment where overbidding can be rationalized by belief-based theories with behavior in environments where belief-based explanations are less plausible. We observe no reduction in overbidding in the latter environments. We conclude that our results cast serious doubt on belief-based explanations of the WC in initial periods of play unless one is willing to accept one of the following statements: (i) Subjects use a rule of thumb which leads them to behave as if they were best responding to beliefs and which is fixed across the environments in our study. (ii) Subjects equally fail to realize the connection between others' types and bids in all environments in our study. (iii) CE, contrary to its formal definition, can be interpreted as being about improper updating rather than about inconsistent beliefs.

APPENDIX A: PROOFS

PROOF OF PROPOSITION 1: First round of deletion of weakly dominated bid functions. It is obvious that bidding above 10 is weakly dominated. Under the second-price rule, for any xi, any bid strictly below xi is also weakly dominated (by bidding xi) since one could lose the auction at a price below xi even though xmax ≥ xi. Therefore, we can delete all bid functions such that bi(xi) < xi or bi(xi) > 10 for some xi.
Second round of deletion of weakly dominated strategies. Suppose that bidder i with signal xi considers bidding b+ > xi. In the event that bidding xi wins, bidding b+ rather than xi does not matter. In the event that bidding b+ does not win, bidding b+ rather than xi also does not matter. Now consider the third possible event: that bidding xi does not win but bidding b+ does. Then bidder i pays the highest bid among the other n − 1 players, b̃, where b̃ ≥ xmax. The inequality holds because b̃ ≥ xi (otherwise xi would have won) and because none of the other bidders ever underbid (by the first round of deletion of weakly dominated bid functions). But then i would make nonpositive profits by bidding b+, whereas she would make zero profits by bidding xi. Moreover, if b̃ is strictly above xmax, then b+ makes strictly negative

36 Our study, and particularly behavior in part II of our treatments, may have implications for theories of learning, such as fictitious play (Brown (1951) and Robinson (1951)), in which players best respond to others' past actions. (We thank an anonymous referee for this point.) We do not emphasize this point because learning models are usually about multiple repetitions. Therefore, even if players in part II in our experiment do not best respond to their own past behavior, with experience, they may very well learn to best respond to others' past behavior.
profit. Therefore, b+ is weakly dominated and we can delete all bid functions such that bi(xi) > xi for some xi. Q.E.D.

PROOF OF PROPOSITION 2:37 A strategy for player i is a probability measure H on [0, 10] × [0, ∞) with marginal cumulative distribution function (c.d.f.) on the first coordinate equal to F(·). A pure strategy is a bid function b : [0, 10] → [0, ∞) such that H({(x, b(x)) : x ∈ [0, 10]}) = 1. That b(x) = x is a BNE follows directly from Proposition 1. Here, we prove uniqueness among all symmetric BNE.38
Assume that H is a symmetric BNE. Let L = {(x, b) | x ∈ [0, 10], b < x} and U = {(x, b) | x ∈ [0, 10], b > x}. That is, L and U are the sets in [0, 10] × [0, 10] strictly below and strictly above the 45° line, respectively. We need to show that H(L ∪ U) = 0 or, equivalently, that H(L) = 0 and H(U) = 0.
First, assume H(L) > 0. Let sk(·) be the step function defined by sk(x) = (10/k) int(kx/10), where int(·) gives the integer part of a real number (s3(·) is depicted in the left graph in Figure 3). Let Ak = {(x, b) | b ≤ sk(x)} ∩ L; that is, Ak is the area in L below the sk(·) function. Note that k < k′ implies A_{2^k} ⊂ A_{2^{k′}} and that L = ∪_{k≥1} A_{2^k}. Therefore, H(L) = lim_{k→∞} H(A_{2^k}) > 0.39 Therefore,
FIGURE 3.—s3(·) and S3(·).
37 Under standard assumptions on F(·), we could simply invoke Proposition 1 in Pesendorfer and Swinkels (1997) so that no proof would be necessary. However, these assumptions do not hold in the case of the discrete distribution in our experiment.
38 Of course, any bid function which differs from b(x) = x only on a set of measure zero will also be a symmetric BNE.
39 To see this, let B2 = A2 and Bl = Al \ Al−1 for l ≥ 3. Then H(L) = H(∪_{l≥2} Al) = H(∪_{l≥2} Bl) = Σ_{l≥2} H(Bl) = lim_{k→∞} Σ_{l=2}^{k} H(Bl) = lim_{k→∞} H(Ak) = lim_{k→∞} H(A_{2^k}). The third and fifth equalities follow from the (countable) additivity of probability measures.
for some k, H(A_{2^k}) > 0. Because A_{2^k} consists of finitely many rectangles like ABCD in Figure 3 (ABCD includes its boundaries, except for point D), it follows that at least one of these rectangles has positive measure. Assume, without loss of generality, H(ABCD) > 0.
We will show that, for a positive measure (with respect to (w.r.t.) H) of points (x, b) ∈ ABCD, bidding b given signal x is strictly worse than bidding x because there is a positive probability that one will lose the auction to a bid strictly below x. Let g(b̂) = H({(x, b) | (x, b) ∈ ABCD, b ≤ b̂}). Note that g(·) is a nondecreasing function and that g(b̲) ≥ 0 and g(b̄) > g(b̲), where b̲ = min({b | (x, b) ∈ ABCD}) and b̄ = max({b | (x, b) ∈ ABCD}). If g(b̲) > 0, then {(x, b) | (x, b) ∈ ABCD, b = b̲} has positive measure. For any point (x, b) in this set, bidding b given signal x is strictly worse than bidding x since there is a positive probability of a tie at b̲. Assume g(b̲) = 0. If g(·) is continuous, choose b* ∈ (b̲, b̄) such that 0 < g(b*) < g(b̄).40 Then {(x, b) | (x, b) ∈ ABCD, b ≤ b*} and {(x, b) | (x, b) ∈ ABCD, b > b*} each have positive measure. But then for a positive measure of points (x, b) (the points in the former set), bidding b given signal x is strictly worse than bidding x since there is a positive probability of losing the auction to a bid b̃ such that b* < b̃ < x. If g(·) is not continuous, then it has a jump point41 at, say, b**. Therefore, {(x, b) | (x, b) ∈ ABCD, b = b**} has positive measure. For any point (x, b) in this set, bidding b** given signal x is strictly worse than bidding x since there is a positive probability of a tie at b**. This proves that we cannot have H(L) > 0.
The proof that we cannot have H(U) > 0 is analogous, so only a brief outline is provided. Assume that H(U) > 0. Let Sk(·) be the step function defined by Sk(x) = sk(x + 10/k) (S3(·) is depicted in the right graph in Figure 3). Then we show analogously to the above discussion that a rectangle of the sort EFGK in Figure 3 has positive measure. Then, defining h(b̂) = H({(x, b) | (x, b) ∈ EFGK, b ≤ b̂}), we show that for a positive measure (w.r.t. H) of points (x, b) ∈ EFGK, bidding b given signal x is strictly worse than bidding x because there is a positive probability that one will win the auction at a price strictly above xmax. Q.E.D.

PROOF OF PROPOSITION 3: Let B̃ denote the highest bid among the n − 1 subjects other than i. Given Xi = xi, subject i chooses her bid, b, to maximize42

E(payoff|Xi = xi)
  = prob(B̃ < b|Xi = xi) E(X max − B̃ |Xi = xi, B̃ < b)
40 This can clearly be done by the intermediate value theorem.
41 Any nondecreasing function is either continuous or has countably many jump points.
42 Ties are ignored because they occur with zero probability.
  = prob(B̃ < b)[E(X max |Xi = xi) − E(B̃ | B̃ < b)]
  = (b^(n−1)/10^(n−1))[E(X max |Xi = xi) − ((n − 1)/n) b].

The second equality follows because (i) others' bids (and B̃ in particular) are not informative about X max and (ii) Xi is not informative about others' bids (and about B̃ in particular). The third equality uses facts about the distribution and expectation of the first-order statistic of n − 1 i.i.d. random variables which have the uniform distribution on [0, 10]. From the last expression, it is straightforward to verify that the unique optimal bid equals E(X max |Xi = xi). Q.E.D.
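The final step, which the proof describes as straightforward to verify, can be spelled out. Writing V = E(X max |Xi = xi), the expression above is, in our notation,

```latex
% Our verification of the last step in the proof of Proposition 3 (V denotes E(X^max | X_i = x_i)).
\[
  \pi(b) \;=\; \frac{b^{n-1}}{10^{n-1}}\Bigl(V-\tfrac{n-1}{n}\,b\Bigr),
  \qquad
  \pi'(b) \;=\; \frac{(n-1)\,b^{n-2}}{10^{n-1}}\Bigl(V-\tfrac{n-1}{n}\,b\Bigr)
            \;-\; \frac{b^{n-1}}{10^{n-1}}\cdot\frac{n-1}{n}
         \;=\; \frac{(n-1)\,b^{n-2}}{10^{n-1}}\,\bigl(V-b\bigr),
\]
```

so the objective is strictly increasing for 0 < b < V and strictly decreasing for b > V, and the unique optimal bid is b = V = E(X max |Xi = xi).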
APPENDIX B: FIGURES AND TABLES

TABLE II
SUBJECT CLASSIFICATION IN PARTS I AND II OF THE SHOWBIDFN TREATMENT

Part I \ Part II     Underbidders   Signal Bidders   Overbidders   Above-10 Bidders   Indeterminate   Part I total
Underbidders               3              0               0               0                 1               4
Signal Bidders             0              5               1               0                 1               7
Overbidders                4              0              10               2                 2              18
Above-10 Bidders           0              1               0               4                 0               5
Indeterminate              3              1               5               1                 2              12
Part II total             10              7              16               7                 6
TABLE III
SUBJECT CLASSIFICATION IN PARTS I AND II OF THE MINBID TREATMENT

Part I \ Part II     Underbidders   Signal Bidders   Overbidders   Above-10 Bidders   Indeterminate   Part I total
Underbidders               0              0               0               0                 0               0
Signal Bidders             0              3               0               1                 0               4
Overbidders                0              3              14               2                 0              19
Above-10 Bidders           0              1               1               0                 0               2
Indeterminate              0              1               0               0                 0               1
Part II total              0              8              15               3                 0
FIGURE 4.—Median bids in parts I (circles) and II (stars) for subjects who are Overbidders in parts I and II of the ShowBidFn treatment.
FIGURE 5.—Median bids in parts I (circles) and II (stars) for subjects who are Overbidders in parts I and II of the MinBid treatment.
REFERENCES

AKERLOF, G. (1970): "The Market for Lemons: Quality Uncertainty and the Market Mechanism," Quarterly Journal of Economics, 84, 488–500. [1436]
BAZERMAN, M. H., AND W. F. SAMUELSON (1983): "I Won the Auction But Don't Want the Prize," Journal of Conflict Resolution, 27, 618–634. [1435]
BROWN, G. (1951): "Iterative Solution of Games by Fictitious Play," in Activity Analysis of Production and Allocation, ed. by T. C. Koopmans. New York: Wiley. [1446]
BULOW, J., AND P. KLEMPERER (2002): "Prices and the Winner's Curse," RAND Journal of Economics, 33, 1–21. [1435,1436]
CAMPBELL, C. M., AND D. LEVIN (2006): "When and Why Not to Auction," Economic Theory, 27, 583–596. [1436]
CHARNESS, G., AND D. LEVIN (2009): "The Origin of the Winner's Curse: A Laboratory Study," American Economic Journal: Microeconomics, 1, 207–236. [1436]
COSTA-GOMES, M., AND G. WEIZSÄCKER (2008): "Stated Beliefs and Play in Normal-Form Games," Review of Economic Studies, 75, 729–762. [1436]
CRAWFORD, V. P., AND N. IRIBERRI (2007): "Level-k Auctions: Can a Non-Equilibrium Model of Strategic Thinking Explain the Winner's Curse and Overbidding in Private-Value Auctions?" Econometrica, 75, 1721–1770. [1435,1438]
DYER, D., J. H. KAGEL, AND D. LEVIN (1989): "A Comparison of Naive and Experienced Bidders in Common Value Offer Auctions: A Laboratory Analysis," Economic Journal, 99, 108–115. [1435]
EYSTER, E., AND M. RABIN (2005): "Cursed Equilibrium," Econometrica, 73, 1623–1672. [1435,1438]
FISCHBACHER, U. (2007): "z-Tree: Zurich Toolbox for Ready-Made Economic Experiments," Experimental Economics, 10, 171–178. [1440]
IVANOV, A., D. LEVIN, AND M. NIEDERLE (2010): "Supplement to 'Can Relaxation of Beliefs Rationalize the Winner's Curse?: An Experimental Study'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8112_instructions to experimental subjects.pdf. [1440]
JEHIEL, P. (2005): "Analogy-Based Expectation Equilibrium," Journal of Economic Theory, 123, 81–104. [1435]
JEHIEL, P., AND F. KOESSLER (2008): "Revisiting Games of Incomplete Information With Analogy-Based Expectations," Games and Economic Behavior, 62, 533–557. [1435]
KAGEL, J. H. (1995): "Auctions: A Survey of Experimental Research," in The Handbook of Experimental Economics, ed. by J. H. Kagel and A. E. Roth. Princeton, NJ: Princeton University Press. [1435]
KAGEL, J. H., AND D. LEVIN (1986): "The Winner's Curse and Public Information in Common Value Auctions," American Economic Review, 76, 894–920. [1435]
(2002): Common Value Auctions and the Winner's Curse. Princeton, NJ: Princeton University Press. [1435]
KAGEL, J., R. HARSTAD, AND D. LEVIN (1987): "Information Impact and Allocation Rules in Auctions With Affiliated Private Valuations: An Experimental Study," Econometrica, 55, 1275–1304. [1435]
LIND, B., AND C. R. PLOTT (1991): "The Winner's Curse: Experiments With Buyers and With Sellers," American Economic Review, 81, 335–346. [1435]
NAGEL, R. (1995): "Unraveling in Guessing Games: An Experimental Study," American Economic Review, 85, 1313–1326. [1435]
PESENDORFER, W., AND J. M. SWINKELS (1997): "The Loser's Curse and Information Aggregation in Common Value Auctions," Econometrica, 65, 1247–1281. [1447]
PEVNITSKAYA, S. (2008): "Effect of Beliefs and Risk Preferences on Bidding in Auctions," Working Paper, Florida State University. [1437]
ROBINSON, J. (1951): "An Iterative Method of Solving a Game," Annals of Mathematics, 54, 296–301. [1446]
STAHL, D., AND P. WILSON (1995): “On Players’ Models of Other Players: Theory and Experimental Evidence,” Games and Economic Behavior, 10, 218–254. [1435]
Virginia Commonwealth University, 301 West Main Street, Room B3149, Richmond, VA 23284, U.S.A.; [email protected], The Ohio State University, 1945 North High Street, Arps Hall 410, Columbus, OH 43210, U.S.A.; [email protected], and Stanford University, 579 Serra Mall, Stanford, CA 94305-6072, U.S.A.; [email protected]. Manuscript received September, 2008; final revision received December, 2009.
Econometrica, Vol. 78, No. 4 (July, 2010), 1453–1454
ANNOUNCEMENTS 2010 WORLD CONGRESS OF THE ECONOMETRIC SOCIETY
THE TENTH WORLD CONGRESS of the Econometric Society will be held in Shanghai from August 17th to August 21st, 2010. It is hosted by Shanghai Jiao Tong University in cooperation with Shanghai University of Finance and Economics, Fudan University, China Europe International Business School, and the Chinese Association of Quantitative Economics.
The congress is open to all economists, including those who are not now members of the Econometric Society. It is hoped that papers presented at the Congress will represent a broad spectrum of applied and theoretical economics and econometrics.
The Program Co-Chairs are:
Professor Daron Acemoglu, MIT Department of Economics, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.
Professor Manuel Arellano, CEMFI, Casado del Alisal 5, 28014 Madrid, Spain.
Professor Eddie Dekel, Department of Economics, Northwestern University, 2003 Sheridan Rd., Evanston, IL 60208-2600, U.S.A., and Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel.
The Chair of the Local Organizing Committee is:
Professor Lin Zhou, Department of Economics, Shanghai Jiao Tong University, Shanghai 200052, China, and Department of Economics, Arizona State University, Tempe, AZ 85287, U.S.A.
Detailed information on registration and housing will be sent by email to all members of the Econometric Society in due course and will be available at www.eswc2010.com.

THE 2011 NORTH AMERICAN WINTER MEETING
THE 2011 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Denver, CO, from January 7–9, 2011, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers.
More information on program details and registration will be sent by email to all members of the Econometric Society and posted on the website at http://www.econometricsociety.org.
© 2010 The Econometric Society
DOI: 10.3982/ECTA784ANN
Program Committee Chair: Markus K. Brunnermeier

2011 NORTH AMERICAN SUMMER MEETING
THE 2011 NORTH AMERICAN SUMMER MEETING of the Econometric Society will be held June 9–12, 2011, hosted by Washington University in Saint Louis, MO. The program committee will be chaired by Marcus Berliant of Washington University in Saint Louis. The program will include plenary, invited and contributed sessions in all fields of economics.

2011 AUSTRALASIA MEETING
THE 2011 AUSTRALASIA MEETING of the Econometric Society (ESAM11) will be held in Adelaide, Australia, from July 5 to July 8, 2011. ESAM11 will be hosted by the School of Economics at the University of Adelaide. The program committee will be co-chaired by Christopher Findlay and Jiti Gao. The program will include plenary, invited and contributed sessions in all fields of economics.

2011 EUROPEAN MEETING OF THE ECONOMETRIC SOCIETY ANNOUNCEMENT
THE 2011 EUROPEAN MEETING of the Econometric Society (ESEM) will take place in Oslo, Norway, from August 25 to 29, 2011. The Meeting is organized by the University of Oslo, and it will run in parallel with the Congress of the European Economic Association (EEA). Participants will be able to attend all sessions of both events.
The Program Committee Chairs are Professor John van Reenen, London School of Economics, for Econometrics and Empirical Economics, and Professor Ernst-Ludwig von Thadden, University of Mannheim, for Theoretical and Applied Economics.
The Local Arrangements Chair is Professor Asbjørn Rødseth, University of Oslo.
Econometrica, Vol. 78, No. 4 (July, 2010), 1455
FORTHCOMING PAPERS

THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica.

DILLENBERGER, DAVID: "Preferences for One-Shot Resolution of Uncertainty and Allais-Type Behavior."
FUDENBERG, DREW, AND YUICHI YAMAMOTO: "Repeated Games Where the Payoffs and Monitoring Structure Are Unknown."
GRANT, SIMON, ATSUSHI KAJII, BEN POLAK, AND ZVI SAFRA: "Generalized Utilitarianism and Harsanyi's Impartial Observer Theorem."
HOLMES, THOMAS J.: "The Diffusion of Wal-Mart and Economies of Density."
MANELLI, ALEJANDRO M., AND DANIEL R. VINCENT: "Bayesian and Dominant Strategy Implementation in the Independent Private Values Model."
SPRUMONT, YVES: "An Axiomatization of the Serial Cost-Sharing Method."
© 2010 The Econometric Society
DOI: 10.3982/ECTA784FORTH
Econometrica, Vol. 78, No. 4 (July, 2010), 1457–1487
FELLOWS OF THE ECONOMETRIC SOCIETY JULY, 2010 ANDREW ABEL, Department of Finance, Wharton School, University of Pennsylvania, 3620 Locust Walk, Philadelphia, PA 19104-6367 (1991). DILIP ABREU, Department of Economics, Princeton University, 210 Fisher Hall, Princeton, NJ 08544-1021 (1991). DARON ACEMOGLU, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (2005). IRMA ADELMAN, Department of Agricultural and Resource Economics, University of California, Berkeley, 207 Giannini Hall, Berkeley, CA 94720 (1968). ANAT R. ADMATI, Department of Finance, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (2004). SYDNEY N. AFRIAT, Dipartimento di Economia Politica, Università di Siena, Piazza S. Francesco, 7, Siena 53100, Italy (1976). ABEL GESEVICH AGANBEGYAN, Economics Department, Russian Academy of Sciences, Krasikova St. 51-32, Moscow 117418, Russia (1978). PHILIPPE AGHION, Department of Economics, Harvard University, Littauer Center 222, Cambridge, MA 02138 (1993). DENNIS J. AIGNER, Department of Economics, Paul Merage School of Business, University of California, Irvine, Irvine, CA 92697-3130 (1975). YACINE AÏT-SAHALIA, Department of Economics, Bendheim Center for Finance, Princeton University, 26 Prospect Avenue, Princeton, NJ 085405296 (2002). GEORGE AKERLOF, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94720-3880 (1979). ALBERTO ALESINA, Department of Economics, Harvard University, Littauer Center 210, Cambridge, MA 02138 (2002). MAURICE ALLAIS, Ecole Nationale Supérieure des Mines and the Centre National de la Recherche Scientific, Paris, France (1949). BETH E. ALLEN, Department of Economics, College of Liberal Arts, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (1983). JOSEPH G. ALTONJI, Department of Economics, Yale University, Box 208264, New Haven, CT 06520-8264 (1996). FERNANDO ALVAREZ, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60636 (2007). TAKESHI AMEMIYA, Department of Economics, Stanford University, Econ 250, Stanford, CA 94305-6072 (1974). TORBEN ANDERSEN, Department of Finance, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (2008). © 2010 The Econometric Society
DOI: 10.3982/ECTA784FES
ROBERT M. ANDERSON, Department of Economics, University of California, Berkeley, 508-1 Evans Hall #3880, Berkeley, CA 94720-3880 (1987). THEODORE W. ANDERSON, Department of Statistics, Stanford University, Sequioa Hall 236, 390 Serra Mall, Stanford, CA 94305-4065 (1950). DONALD W. K. ANDREWS, Department of Economics, Yale University, Box 208281, New Haven, CT 06520-8281 (1989). JOSHUA ANGRIST, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Building E52, Cambridge, MA 02142-1347 (1998). MASAHIKO AOKI, Department of Economics, Stanford University, Landau Economics Building, 579 Serra Mall, Stanford, CA 94305-6072 (1981). MASANAO AOKI, Department of Economics, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90095-1477 (1978). ALOISIO ARAUJO, Instituto Nacional de Matemática Pura e Aplicada and Fundação Getúlio Vargas, Estrada Dona Castorina 110, Rio de Janeiro, RJ 22460-320, Brazil (1987). MANUEL ARELLANO, CEMFI, Casado Del Alisal 5, Madrid 28014, Spain (2002). MARK ARMSTRONG, ESRC Centre for Economic Learning and Social Evolution, University College London, Gower Street, London, WC1E 6BT, England (2008). KENNETH J. ARROW, Department of Economics, Stanford University, Landau Economics Building 342, 579 Serra Mall, Stanford, CA 94305-6072 (1951). W. BRIAN ARTHUR, Intelligent Systems Lab, 3333 Coyote Hill Road, Palo Alto, CA 94304 (1994). ORLEY ASHENFELTER, Industrial Relations Section, Princeton University, Firestone Library A-18-J, Princeton, NJ 08544-2098 (1977). SUSAN ATHEY, Department of Economics, Harvard University, Cambridge, MA 02138-3001 (2004). ANDREW ATKESON, Department of Economics, University of California, Los Angeles, Bunche Hall 9381, Box 951477, Los Angeles, CA 90095-1477 (2005). TONY ATKINSON, Department of Economics, Nuffield College, University of Oxford, New Road, Oxford, OX1 1NF, England (1974). ORAZIO ATTANASIO, Department of Economics, University College London, Gower Street, London, WC1E 6BT, England (2001). ALAN J. AUERBACH, Department of Economics, University of California, Berkeley, 508-1 Evans Hall #3880, Berkeley, CA 94720-3880 (1986). MARIA AUGUSZTINOVICS, Institute of Economics, Hungarian Academy of Sciences, PO Box 262, Budapest, XI, H-1502, Hungary (1979). ROBERT AUMANN, Center for Rationality, The Hebrew University, Jerusalem 91904, Israel (1965). LAWRENCE AUSUBEL, Department of Economics, University of Maryland, Tydings Hall, College Park, MD 20742-7211 (2007).
COSTAS AZARIADIS, Department of Economics, Washington University in St. Louis, Campus Box 1208, St. Louis, MO 63130-4899 (1989). KYLE BAGWELL, Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305-6072 (2005). YVES BALASKO, Department of Economics and Related Studies, University of York, Heslington, York, YO10 5DD, England (1980). JAMES BALL, Department of Economics, London Business School, Regent’s Park, London, NW1 4SA, England (1973). ABHIJIT BANERJEE, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1995). SALVADOR BARBERA, Departament d’Economia i d’Història Econòmica and CODE, Universitat Autònoma de Barcelona, Edifici B, Bellaterra 08193, Spain (1988). DAVID P. BARON, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305 (1990). ROBERT J. BARRO, Department of Economics, Harvard University, Littauer Center 218, 1805 Cambridge Street, Cambridge, MA 02138 (1980). ANTON P. BARTEN, Centrum voor Economische Studien, Katholieke Universiteit Leuvan, Naamsestraat 69, Leuven B-3000, Belgium (1968). ROBERT L. BASMANN, Department of Economics, Binghamton University, PO Box 6000, Binghamton, NY 13902-6000 (1966). KAUSHIK BASU, Department of Economics, Cornell University, Uris Hall, Ithaca, NY 14853 (1991). WILLIAM J. BAUMOL, Berkley Center for Entrepreneurial Studies, New York University, Henry Kaufman Management Center, 44 West Fourth Street, New York, NY 10012 (1953). GARY S. BECKER, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1967). MARTIN J. BECKMANN, Department of Economics, Brown University, Box B, Providence, RI 02912 (1958). JERE R. BEHRMAN, Department of Economics, University of Pennsylvania, 229 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1980). ROLAND BENABOU, Department of Economics, Woodrow Wilson School of Public and International Affairs, Princeton University, 320 Bendheim Hall, Princeton, NJ 08544-1013 (1994). JEAN PASCAL BENASSY, CEPREMAP, 48 Boulevard Jourdan, Building E, Paris 75014, France (1981). JESS BENHABIB, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10013 (1992). DIRK BERGEMANN, Department of Economics, Yale University, Box 208281, New Haven, CT 06520-8281 (2007). TED BERGSTROM, Department of Economics, University of California, Santa Barbara, 2127 North Hall, Santa Barbara, CA 93106-9210 (2000). BEN S. BERNANKE, Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, NJ 08544 (1997).
ERNST R. BERNDT, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142 (1994). B. DOUGLAS BERNHEIM, Department of Economics, Stanford University, Stanford, CA 94305-6072 (1991). STEVEN T. BERRY, Department of Economics, Yale University, 37 Hillhouse Ave, PO Box 8264, New Haven, CT 06520-8254 (1999). TIMOTHY J. BESLEY, Department of Economics, London School of Economics, London, WC2A 2AE, England (2000). HELMUT BESTER, FB Wirtschaftswissenschaft, Freie Universität Berlin, Boltzmannstr. 20, Berlin 14195, Germany (2009). TRUMAN F. BEWLEY, Department of Economics, Yale University, Box 208281, New Haven, CT 06520-8281 (1978). JAGDISH BHAGWATI, Department of Economics, Columbia University, 828 International Affairs Building, MC 3308, 420 West 118th Street, New York, NY 10027 (1973). HERMAN J. BIERENS, Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802 (2005). KENNETH G. BINMORE, Department of Economics, University College London, Gower Street, London, WC1E 6BT, England (1987). CHARLES BLACKORBY, Department of Economics, University of Warwick, Gibbet Hill Road, Coventry, Warwickshire, CV4 7AL, England (1988). OLIVIER BLANCHARD, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1985). ALAN S. BLINDER, Department of Economics, Princeton University, 105 Fisher Hall, Princeton, NJ 08544-1021 (1981). CHRISTOPHER J. E. BLISS, Nuffield College, University of Oxford, New Road, Oxford, OX1 1NF, England (1975). LAWERENCE BLUME, Department of Economics, Cornell University, Uris Hall, Ithaca, NY 14850-7601 (1998). RICHARD W. BLUNDELL, Department of Economics, University College London, Gower Street, London, WC1E 6BT, England (1991). M. MARCEL BOITEUX, d’Electricité de France, 26 Rue de la Baume, Paris 75008, France (1953). MICHELE BOLDRIN, Department of Economics, Washington University in St. Louis, Campus Box 1208, St. Louis, MO 63130-4899 (2002). TIM BOLLERSLEV, Department of Economics, Duke University, 305 Social Sciences, Durham, NC 27708-0097 (1999). PATRICK BOLTON, Finance and Economics, Columbia Business School, Columbia University, 3022 Broadway, Uris Hall 804, New York, NY 10027 (1993). GEORGE J. BORJAS, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138 (1998). JOHN BOUND, Department of Economics, University of Michigan, Ann Arbor, MI 48109 (2004).
FRANCOIS BOURGUIGNON, Paris-Jourdan Sciences Economiques, 48 Boulevard Jourdan, Paris 75014, France (1986). WILLIAM C. BRAINARD, Department of Economics, Yale University, Box 208268, 28 Hillhouse Avenue, New Haven, CT 06520-8268 (1975). ADAM BRANDENBURGER, Department of Economics, Stern School of Business, New York University, 44 West Fourth Street, New York, NY 100121126 (2004). TIMOTHY F. BRESNAHAN, Department of Economics, Stanford University, Stanford, CA 94305-6072 (1990). TREVOR BREUSCH, IDEC Program, Crawford School of Economics and Government, Australian National University, J. G. Crawford Building, Canberra, ACT 0200, Australia (1991). WILLIAM A. BROCK, Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, WI 53706 (1974). ANDRAS BRODY, Institute of Economics, Hungarian Academy of Sciences, Budapest H-1361, Hungary (1971). DONALD J. BROWN, Economic Growth Center, Department of Mathematics, Yale University, Box 208269, New Haven, CT 06520 (1981). MARTIN BROWNING, Department of Economics, Nuffield College, Oxford University, Manor Road, Oxford, Oxfordshire, OX1 3UQ, England (1996). JEREMY I. BULOW, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305 (1990). KENNETH BURDETT, Department of Economics, University of Pennsylvania, 3718 Locust Walk, 439 McNeil Building, Philadelphia, PA 19104-6297 (1999). EDWIN BURMEISTER, Department of Economics, Duke University, 2200 West Main Street, Suite 210, Box 90097, Durham, NC 27708-0097 (1978). RICARDO J. CABALLERO, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1998). PHILIP D. CAGAN, Department of Economics, Columbia University, 1022 International Affairs Building, 420 West 118th Street, New York, NY 10027 (1975). GUILLERMO CALVO, Department of Economics, School of International and Public Affairs, Columbia University, 420 West 118th Street, 13th Floor, New York, NY 10027 (1995). COLIN CAMERER, Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125 (1999). JOHN Y. CAMPBELL, Department of Economics, Harvard University, Littauer Center 213, Cambridge, MA 02138 (1990). ANDREW CAPLIN, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10003 (1992). DAVID CARD, Department of Economics, University of California, Berkeley, 631A Evans Hall #3880, Berkeley, CA 94720-3880 (1991). ANNE CARTER, Department of Economics, Brandeis University, Sachar International Center, Waltham, MA 02454 (1973).
ANNE C. CASE, Department of Economics and Woodrow Wilson School, Princeton University, 367 Wallace Hall, Princeton, NJ 08544 (2009). GARY CHAMBERLAIN, Department of Economics, Harvard University, Littauer 123, Cambridge, MA 02138 (1981). CHRISTOPHE CHAMLEY, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (1993). PAUL CHAMPSAUR, Autorité de Régulation des Communications électroniques et des Postes, Paris, Cedex 15, 75730, France (1981). VARADARAJAN V. CHARI, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (1998). KALYAN CHATTERJEE, Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802-3306 (2003). YEON-KOO CHE, Department of Economics, Columbia University, 420 West 118th Street, New York, NY 10027 (2009). XIAOHONG CHEN, Department of Economics, Yale University, Box 208281, New Haven, CT 06520-8281 (2007). VICTOR CHERNOZHUKOV, Department of Economics and Operations Research Center, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142 (2009). ANDREW CHESHER, Department of Economics, University College London, Gower Street, London, WC1E 6BT, England (1999). PIERRE CHIAPPORI, Department of Economics, Columbia University, 1009A International Affairs Building, MC 3308, 420 West 118th Street, New York, NY 10027 (1995). JOHN S. CHIPMAN, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55419-5229 (1956). IN KOO CHO, Department of Economics, University of Illinois, 1206 S 6th Street, Champaign, IL 61820 (2002). GREGORY C. CHOW, Department of Economics, Princeton University, 205 Fisher Hall, Princeton, NJ 08544 (1967). CARL F. CHRIST, Department of Economics, Johns Hopkins University, 34th and Charles Streets, Baltimore, MD 21218 (1967). LAWRENCE CHRISTIANO, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (2001). ROBERT W. CLOWER, Department of Economics, University of California, Los Angeles, Bunche Hall 8283, Los Angeles, CA 90095-1477 (1978). STEPHEN COATE, Department of Economics, Cornell University, 476 Uris Hall, Ithaca, NY 14853-7601 (2004). JOHN COCHRANE, Booth School of Business, University of Chicago, 5807 S. Woodlawn, Chicago, IL 60637 (2001). JOHN CONLISK, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508 (2002). THOMAS F. COOLEY, Department of Economics, Stern School of Business, New York University, 44 West Fourth Street, New York, NY 10012 (1998).
RUSSELL COOPER, Department of Economics, European University Institute, Villa San Paolo, Via della Piazzuola 43, Florence 50133, Italy (1996). W. W. COOPER, Department of Information Risk and Operations Management, University of Texas at Austin, University Station, Austin, TX 78712 (1957). STEPHEN R. COSSLETT, Department of Economics, Ohio State University, 1945 North High Street, Columbus, OH 43210 (1994). JOHN C. COX, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142 (1990). JAN SALOMON CRAMER, Baambrugse Zuwe 194, Vinkeveen 3645 AM, Netherlands (1972). VINCENT P. CRAWFORD, Department of Economics, University of Oxford, Manor Road Building, Manor Road, Oxford, OX1 3UQ, United Kingdom (1990). JACQUES CREMER, Université des Sciences Sociales, IDEI, Place Anatole France, Toulouse, Cedex, 31042, France (1992). MARTIN CRIPPS, John M. Olin School of Business, Washington University in St. Louis, One Brookings Drive, St. Louis, MO 63130-4899 (2008). PARTHA DASGUPTA, Department of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (1975). CLAUDE D’ASPREMONT, Economics Core, Université catholique de Louvain, 34 Voie Du Roman Pays, Louvain-la-Neuve 1348, Belgium (1984). PAUL A. DAVID, Oxford Internet Institute, 1 St. Giles, Oxford, OX1 3JS, United Kingdom (1975). RUSSELL DAVIDSON, Department of Economics, McGill University, Leacock Building, Room 443, 855 Sherbrooke Street West, Montréal, Québec, H3A 2T7, Canada (1994). RICHARD H. DAY, Department of Economics, University of Southern California, Los Angeles, CA 90089 (1991). ANGUS S. DEATON, Woodrow Wilson School of Public and International Affairs, Princeton University, 328 Wallace Hall, Princeton, NJ 08544 (1978). MANFRED DEISTLER, Department of Econometrics, Vienna University of Technology, Argentinierstr. 8/E105-2, Vienna A 1040, Austria (1993). EDDIE DEKEL, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1996). GABRIELLE DEMANGE, Paris-Jourdan Sciences Economiques, 48 Boulevard Jourdan, Paris 75014, France (1991). RAYMOND DENECKERE, Department of Economics, University of Wisconsin– Madison, 1180 Observatory Drive, 6422 Social Science, Madison, WI 53706 (2002). MATHIAS DEWATRIPONT, European Centre for Advanced Research in Economics and Statistics, Universite Libre De Bruxelles, Avenue F. D. Roosevelt 50, CP114, Brussels 1050, Belgium (1993). PHOEBUS J. DHRYMES, Department of Economics, Columbia University, 1025 International Affairs Building, New York, NY 10027 (1970).
DOUGLAS W. DIAMOND, Booth School of Business, University of Chicago, 5807 S Woodlawn Avenue, Chicago, IL 60637-1610 (1990). PETER A. DIAMOND, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1968). FRANCIS X. DIEBOLD, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1998). EGBERT DIERKER, Reisnerstrasse 22 5, Wien 1030, Austria (1986). W. ERWIN DIEWERT, Department of Economics, University of British Columbia, #997-1873 East Mall, Vancouver, BC, V6T 1Z1, Canada (1975). AVINASH K. DIXIT, Department of Economics, Princeton University, Fisher Hall G 001, Princeton, NJ 08544-1021 (1977). JACQUES H. DREZE, Center for Operations Research and Econometrics, Université catholique de Louvain, 34 Voie du Roman Pays, B-1348, Louvain-la-Neuve 1348, Belgium (1965). PRADEEP DUBEY, Department of Applied Math and Statistics, State University of New York, Stony Brook, Stony Brook, NY 11794 (1990). J. DARRELL DUFFIE, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (1995). JEAN-MARIE DUFOUR, CIREQ/Département de sciences économiques, McGill University, Leacock Building, 855 Sherbrooke Street West, Montréal, Québec, H3A 2T7, Canada (1998). JAMES DURBIN, Centre for Microdata Methods and Practice, University College London, Gower Street, London, WC1E 6BT, England (1967). STEVEN DURLAUF, Department of Economics, University of Wisconsin, 1180 Observatory Road, Madison, WI 53706 (1997). BHASKAR DUTTA, Department of Economics, University of Warwick, Gibbet Hill Road, Coventry, Warwickshire, CV4 7AL, England (1996). DAVID EASLEY, Department of Economics, Cornell University, 450 Uris Hall, Ithaca, NY 14853 (1997). RICHARD A. EASTERLIN, Department of Economics, University of Southern California, KAP 300, Los Angeles, CA 90089-0253 (1982). JONATHAN EATON, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1995). ZVI ECKSTEIN, The Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel (2000). MARTIN EICHENBAUM, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1997). GLENN ELLISON, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (2000). JEFFREY ELY, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (2009). ROBERT F. ENGLE III, Center For Financial Econometrics, New York University, Stern School of Business, 44 West Fourth Street, New York, NY 10012 (1981).
DENNIS EPPLE, Graduate School of Industrial Administration, Carnegie Mellon University, Pittsburgh, PA 15213-3890 (2003). LARRY G. EPSTEIN, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (1989). WILFRED J. ETHIER, Department of Economics, University of Pennsylvania, 372 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1991). RAY C. FAIR, Department of Economics, Yale University, Cowles Foundation, New Haven, CT 06520-8281 (1977). EUGENE FAMA, Booth School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637 (1973). HENRY S. FARBER, Industrial Relations Section, Princeton University, Firestone Library A-19-H-1, Princeton, NJ 08544 (1988). ROGER FARMER, Department of Economics, University of California, Los Angeles, PO Box 951477, Los Angeles, CA 90095-1477 (2003). JOSEPH FARRELL, Department of Economics, University of California, Berkeley, 508-1 Evans Hall #3880, Berkeley, CA 94720-3880 (2002). ERNST FEHR, Institute for Empirical Research in Economics, University of Zurich, Blümlisalpstrasse 10, Zürich CH-8006, Switzerland (2008). MARTIN FELDSTEIN, National Bureau of Economic Research, Harvard University, 1050 Massachusetts Avenue, Cambridge, MA 02138-5398 (1970). STANLEY FISCHER, Bank of Israel, P.O. Box 780, Jerusalem 91007, Israel (1977). PETER C. FISHBURN, P.O. Box 309, Basking Ridge, NJ 07920-0309 (1974). FRANKLIN M. FISHER, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1963). ROBERT W. FOGEL, Booth School of Business, University of Chicago, 5807 S Woodlawn Ave, Chicago, IL 60637 (1971). FRANÇOISE FORGES, Ufr D’economie Appliquee, CEREMADE, Universite Paris IX–Dauphine, Place du Maréchal de Lattre de Tassigny, Paris, Cedex 16, 75775, France (1997). KARL A. FOX, 234 Parkridge Circle, Ames, IA 50010-3645 (1959). JACOB A. FRENKEL, Group of Thirty and Chairman of JPMorgan Chase International, 1726 M Street, Suite 200, Washington, DC 20036 (1982). JAMES W. FRIEDMAN, Department of Economics, University of North Carolina–Chapel Hill, Gardner 306A, Chapel Hill, NC 27599-3305 (1977). DREW FUDENBERG, Department of Economics, Harvard University, Cambridge, MA 02138 (1987). WAYNE A. FULLER, Department of Economics, Iowa State University, 260 Heady Hall, Ames, IA 50011-1070 (1993). JEAN J. GABSZEWICZ, Département des sciences économiques, Center for Operations Research and Econometrics, Université Catholique de Louvain, 34 Voie Du Roman Pays, Louvain-la-Neuve 1348, Belgium (1979). DOUGLAS GALE, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1987).
JORDI GALI, CREI, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Barcelona 08005, Spain (2003). A. RONALD GALLANT, Fuqua School of Business, Duke University, DUMC Box 90120, W425, Durham, NC 27708-0120 (1985). JOHN GEANAKOPLOS, Department of Economics, Yale University, Box 208281, 30 Hillhouse Avenue, New Haven, CT 06520-8281 (1989). MARK GERTLER, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10003 (1998). JOHN GEWEKE, Department of Economics, University of Iowa, 108 Pappajohn Building Suite 210, Iowa City, IA 52242-1000 (1982). ALLAN GIBBARD, Department of Philosophy, University of Michigan, Angell Hall, 435 South State Street, Ann Arbor, MI 48109-1003 (1984). ROBERT GIBBONS, Department of Economics, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (2002). ITZHAK GILBOA, Eitan Berglas School of Economics, Tel Aviv University, Tel Aviv 69978, Israel (2000). EDWARD L. GLAESER, Department of Economics, Harvard University, 315A Littauer Center, HKS Taubman Center 318, Cambridge, MA 02138 (2005). PINELOPI GOLDBERG, Department of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021 (2004). CLAUDIA GOLDIN, Department of Economics, Harvard University, 229 Littauer Center, Cambridge, MA 02138 (1991). RALPH E. GOMORY, Stern School of Business, New York University, 44 West Fourth Street, Kaufman Management Center, New York, NY 10012 (1972). ROBERT J. GORDON, Department of Economics, Northwestern University, 2001 Sheridan Road, Arthur Andersen Hall, Evanston, IL 60208-2600 (1977). ROGER H. GORDON, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508 (1995). CHRISTIAN GOURIEROUX, Center for Research in Economics and Statistics, Université Paris IX and ENSAE, Timbre J320–Bureau 11.06, 15 Boulevard Gabriel Péri, Malakoff, Cedex, 92400, France (1986). JEAN MICHEL GRANDMONT, CNRS and CREST, 15 Boulevard Gabriel Peri, Malakoff, Cedex, 92245, France (1974). EDWARD J. GREEN, Department of Economics, Pennsylvania State University, 415 Kern Graduate Building, University Park, PA 16802-1294 (1987). JERRY GREEN, Department of Economics, Harvard University, Littauer 326, 1805 Cambridge Street, Cambridge, MA 02138 (1975). JEREMY GREENWOOD, Department of Economics, University of Pennsylvania, McNeil 160, Philadelphia, PA 19104-6297 (2008). AVNER GREIF, Department of Economics, Stanford University, Stanford, CA 94305-6072 (1999).
REUBEN GRONAU, Department of Economics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 91905, Israel (1986). GENE M. GROSSMAN, Department of Economics, Woodrow Wilson School of Public and International Affairs, Princeton University, 300 Fisher Hall, Princeton, NJ 08544-1013 (1992). SANFORD GROSSMAN, Quantitative Financial Strategies, 10 Glenville Street, Greenwich, CT 06831 (1980). THEODORE GROVES, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508 (1977). ROGER GUESNERIE, Paris-Jourdan Sciences Économiques, 48 Boulevard Jourdan, Paris 75014, France (1980). G. T. GUILBAUD, Department of Mathematics, Ecole des Hautes Etudes en Sciences Sociales, 54 Boulevard Raspail, Paris 75006, France (1951). FARUK GUL, Department of Economics, Princeton University, Fisher Hall 214, Princeton, NJ 08544-1021 (1996). FRANK H. HAHN, Department of Economics, Cambridge University, Sidgwick Avenue, Cambridge, CB3 9DD, England (1961). JINYONG HAHN, Department of Economics, University of California, Los Angeles, Bunche Hall 8383, Box 951477, Los Angeles, CA 90095-1477 (2003). PHILIP HAILE, Department of Economics, Yale University, P.O. Box 208264, New Haven, CT 06520 (2008). ROBERT E. HALL, Hoover Institution, Stanford University, Stanford, CA 94305-6010 (1973). KOICHI HAMADA, Economic Growth Center, Yale University, Box 208269, 27 Hillhouse Avenue, New Haven, CT 06520-8268 (1978). DANIEL S. HAMERMESH, Department of Economics, University of Texas, Austin, TX 78712-1173 (1996). JAMES D. HAMILTON, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 (1996). PETER J. HAMMOND, Department of Economics, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, England (1977). GIORA HANOCH, Department of Economics, Hebrew University of Jerusalem, Jerusalem 91905, Israel (1975). BRUCE E. HANSEN, Department of Economics, University of Wisconsin– Madison, 1180 Observatory Drive, Madison, WI 53706 (2000). LARS PETER HANSEN, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1984). ARNOLD HARBERGER, Department of Economics, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90095-1477 (1967). CHRISTOPHER HARRIS, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (2004). MILTON HARRIS, Booth School of Business, University of Chicago, 5807 S Woodlawn Avenue, Chicago, IL 60637 (1988).
OLIVER HART, Department of Economics, Harvard University, Littauer 220, Cambridge, MA 02138 (1979). SERGIU HART, Center for Rationality, Hebrew University of Jerusalem, Feldman Building, Givat-Ram, Jerusalem 91904, Israel (1985). ANDREW C. HARVEY, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (1990). MICHIO HATANAKA, 30-12, 1-Chome, Kichijoji-Kitamachi, Musashino Shi, 180-0001, Japan (1974). JERRY A. HAUSMAN, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1979). FUMIO HAYASHI, Department of Economics, Graduate School of International Corporate Strategy (ICS), Hitotsubashi University, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8439, Japan (1988). GEOFFREY MARTIN HEAL, Graduate School of Business, Columbia University, 616 Uris Hall, New York, NY 10027 (1977). JOHN C. HEATON, Booth School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637 (2007). JAMES HECKMAN, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1980). MARTIN HELLWIG, Max Planck Institute for Research on Collective Goods, Kurt-Schumacher-Str. 10, Bonn D-53113, Germany (1981). ELHANAN HELPMAN, Department of Economics, Harvard University, Littauer 217, Cambridge, MA 02138 (1986). KENNETH HENDRICKS, Department of Economics, University of Texas at Austin, 1 University Station, Austin, TX 78712 (2004). DAVID HENDRY, Department of Economics, Oxford University, Nuffield College, Oxford, OX1 4NF, England (1975). CLAUDE HENRY, School of International and Public Affairs, Columbia University, 420 West 118th Street, New York, NY 10027 (1976). DONALD D. HESTER, Department of Economics, University of Wisconsin– Madison, 1180 Observatory Drive, Madison, WI 53706 (1976). BERT G. HICKMAN, Department of Economics, Stanford University, 518 Memorial Way, Stanford, CA 94305 (1977). WERNER HILDENBRAND, Institut für Gesellschafts- und Wirtschaftswissenschaften, Universität Bonn, Forschungsgruppe Hildenbrand, Lennéstr. 37, Bonn D-53113, Germany (1971). ALBERTO HOLLY, Ecole des HEC, Université de Lausanne, Quartier UNILDorigny, Bâtiment Extranef, Lausanne CH-1015, Switzerland (1985). BENGT R. HOLMSTROM, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1983). CHARLES HOLT, Department of Economics, University of Virginia, Box 400182, Charlottesville, VA 22904-4182 (1971). HAN HONG, Department of Economics, Stanford University, Landau Economics Building, 579 Serra Mall, Stanford, CA 94305-6072 (2009).
SEPPO HONKAPOHJA, Bank of Finland, PO Box 160, Helsinki FI-00101, Finland (1999). BO E. HONORÉ, Department of Economics, Princeton University, Fisher Hall 209, Princeton, NJ 08544-1021 (1995). WILLIAM C. HOOD, 601 Windermere Avenue, Ottawa, Ontario, Canada (1975). HUGO HOPENHAYN, Department of Economics, University of California, Los Angeles, Bunche Hall 9377, Los Angeles, CA 90095 (2000). JOEL HOROWITZ, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2600 (1996). V. JOSEPH HOTZ, Department of Economics, Duke University, 220B Social Sciences Building, P.O. Box 90097, Durham, NC 27708-0097 (2003). PETER HOWITT, Department of Economics, Brown University, Providence, RI 02912 (1994). CHENG HSIAO, Department of Economics, University of Southern California, University Park, Los Angeles, CA 90089 (1996). HIDEHIKO ICHIMURA, Faculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (2007). SHINICHI ICHIMURA, International Centre for the Study of East Asian Development, 11-4 Otemachi, Kokurakita, Kitakyushu, 803-0814, Japan (1962). GUIDO IMBENS, Department of Economics, Harvard University, 1805 Cambridge Street, Littauer M-24, Cambridge, MA 02138 (2001). MICHAEL INTRILIGATOR, Department of Economics, University of California, Los Angeles, Bunche Hall 8254, Los Angeles, CA 90095-1477 (1982). TAKATOSHI ITO, Faculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (1992). MATTHEW O. JACKSON, Department of Economics, Stanford University, 241 Landau, 579 Serra Mall, Stanford, CA 94305-6072 (1998). PHILIPPE JEHIEL, CERAS, Ecole Nationale des Ponts et Chaussées, 48 Boulevard Jourdan, Paris 75014, France (2004). IAN JEWITT, Nuffield College, Oxford University, Manor Road, Oxford, OX1 3UQ, England (2008). SOREN JOHANSEN, Department of Statistics and Operations Research, Institute of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, Copenhagen DK-02100, Denmark (2000). LARRY E. JONES, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (1995). RONALD W. JONES, Department of Economics, University of Rochester, Harkness Hall, Rochester, NY 14627 (1971). JAMES JORDAN, Department of Economics, Pennsylvania State University, 514 Kern Graduate Building, College Park, PA 16802-3306 (1980). DALE W. JORGENSON, Department of Economics, Harvard University, Littauer 122, 1805 Cambridge Street, Cambridge, MA 02138 (1964). PAUL L. JOSKOW, Alfred P. Sloan Foundation, 630 Fifth Avenue, Suite 2550, New York, NY 10111 (1988).
BOYAN JOVANOVIC, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1989). KENNETH L. JUDD, Hoover Institution, Stanford University, Stanford, CA 94305-6010 (1989). GEORGE G. JUDGE, Department of Economics, University of California, Berkeley, 207 Giannini Hall, Berkeley, CA 94720 (1986). JOHN H. KAGEL, Department of Economics, Ohio State University, 1945 North High Street, 473-A Arps Hall, Columbus, OH 43210-1172 (2003). DANIEL KAHNEMAN, Woodrow Wilson School of Public and International Affairs, Princeton University, 322 Wallace Hall, Princeton, NJ 08544-1013 (1993). EHUD KALAI, Managerial Economics & Decision Sciences, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2001 (1988). MORTON I. KAMIEN, Managerial Economics & Decision Sciences, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1996). MICHIHIRO KANDORI, Faculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (1999). MAMORU KANEKO, Institute of Policy and Planning Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305, Japan (2009). YAKAR KANNAI, Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel (1978). ARIE KAPTEYN, RAND, 1776 Main Street, P.O. Box 2138, Santa Monica, CA 90407-2138 (1994). EDI KARNI, Department of Economics, Johns Hopkins University, 440 Mergenthaler Hall, 3400 N. Charles Street, Baltimore, MD 21218 (2001). LAWRENCE KATZ, Department of Economics, Harvard University, Littauer 224, Cambridge, MA 02138 (1993). MICHAEL P. KEANE, Faculty of Business, University of Technology, Sydney, P.O. Box 123 Broadway, NSW 2007, Australia (2005). PATRICK KEHOE, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455-0430 (2000). TIMOTHY J. KEHOE, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455-0430 (1991). MURRAY C. KEMP, Division of Economic and Financial Studies, Macquarie University, Sydney, NSW 2109, Australia (1971). JOHN KENNAN, Department of Economics, University of Wisconsin, Madison, WI 53706 (2005). NICHOLAS M. KIEFER, Department of Economics, Cornell University, 490 Uris Hall, Ithaca, NY 14853-7601 (1989). RICHARD KIHLSTROM, Finance Department, Wharton School, University of Pennsylvania, 2300 Steinberg Hall–Dietrich Hall, 3620 Locust Walk, Philadelphia, PA 19104-6367 (1979).
MERVYN KING, Bank of England, Threadneedle Street, London, EC2R 8AH, England (1982). ALAN KIRMAN, Groupement de Recherche en Economie Quantitative d’Aix Marseille, 2 Rue de la Charité, Marseille 13236, France (1990). YUICHI KITAMURA, Department of Economics, Yale University, 30 Hillhouse Avenue, Box 208281, New Haven, CT 06520-8281 (2009). NOBUHIRO KIYOTAKI, Department of Economics, Princeton University, Fisher Hall 112, Princeton, NJ 08544-1021 (1997). LAWRENCE R. KLEIN, Department of Economics, University of Pennsylvania, 335 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1948). PAUL KLEMPERER, Nuffield College, Oxford University, Oxford, OX1 1NF, England (1994). TEUN KLOEK, Econometric Institute, Erasmus University Rotterdam, Postbus 1738, Rotterdam 3000 DR, Netherlands (1978). JAN KMENTA, CERGE-EI, P.O. Box 882, Politickych veznu 7, Praha 1, 111 21, Czech Republic (1980). NARAYANA KOCHERLAKOTA, The Federal Reserve Bank of Minneapolis, 90 Hennepin Avenue, Minneapolis, MN 55401 (2005). ROGER KOENKER, Department of Economics, University of Illinois at Urbana–Champaign, Urbana, IL 61801 (1998). ELON KOHLBERG, Harvard Business School, Soldiers Field, Boston, MA 02163 (1991). SERGE-CHRISTOPHE KOLM, L’Ecole des Hautes Etudes en Sciences Sociales, 20 Rue Henri-Heine, Paris 75016, France (1973). JANOS KORNAI, Institute for Advanced Study, Collegium Budapest, Budapest H-1014, Hungary (1968). LAURENCE J. KOTLIKOFF, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (1992). MICHAEL KREMER, Department of Economics, Harvard University, Littauer Center M-20, Cambridge, MA 02138 (2008). DAVID KREPS, Department of Economics, Stanford University, 518 Memorial Way, Stanford, CA 94305 (1981). VIJAY KRISHNA, Department of Economics, Pennsylvania State University, University Park, PA 16802 (2002). ALAN B. KRUEGER, Industrial Relations Section, Woodrow Wilson School of Public and International Affairs, Princeton University, Firestone Library, Princeton, NJ 08544 (1996). ANNE KRUEGER, School of Advanced International Studies, Johns Hopkins University, 1717 Massachusetts Avenue NW, Washington, DC 20036 (1981). PAUL R. KRUGMAN, Woodrow Wilson School, Princeton University, Princeton, NJ 08544-1013 (1986). PER KRUSELL, Institute for International Economic Studies (IIES), Stockholm University, Stockholm 106 91, Sweden (2006).
HAROLD W. KUHN, Department of Economics, Princeton University, 111 Fisher Hall, Princeton, NJ 08544 (1961). MORDECAI KURZ, Department of Economics, Stanford University, Landau 339, Stanford, CA 94305-6072 (1971). FINN E. KYDLAND, Department of Economics, University of California, Santa Barbara, 2127 North Hall, Santa Barbara, CA 93106-9210 (1992). ALBERT S. KYLE, Department of Finance, Smith School of Business, University of Maryland, 4433 Van Munching Hall, College Park, MD 20742 (2002). ANTHONY LANCASTER, Department of Economics, Brown University, Box B, Providence, RI 02912 (1991). GUY LAROQUE, Laboratoire de Macroéconomie J360, Centre de Recherche en Économie et Statistique, 15 Boulevard Gabriel Péri, Malakoff 92245, France (1979). LAWRENCE J. LAU, Department of Economics, Stanford University, Landau Economics Building, Room 340, 579 Serra Mall, Stanford, CA 94305-6072 (1976). RICHARD LAYARD, Centre for Economic Performance, London School of Economics, Houghton Street, London, WC2A 2AE, England (1986). EDWARD LAZEAR, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (1988). EDWARD E. LEAMER, UCLA Anderson Forecast, University of California, Los Angeles, 110 Westwood Plaza, C507 Entrepreneurs Hall, Los Angeles, CA 90095-1481 (1977). JOHN LEDYARD, Division of Humanities & Social Science, California Institute of Technology, MC 228-77, Pasadena, CA 91125 (1977). LUNG-FEI LEE, Department of Economics, Ohio State University, 410 Arps Hall, 1945 N High Street, Columbus, OH 43210-1172 (1990). EHUD LEHRER, School of Mathematical Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel (2001). JACQUES LESOURNE, Institut d'Economie Industrielle, Manufacture des Tabacs, 21 Allée de Brienne, Toulouse F-31000, France (1967). DAVID LEVHARI, Department of Economics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 91905, Israel (1971). JONATHAN LEVIN, Department of Economics, Stanford University, Stanford, CA 94305 (2008). DAVID K. LEVINE, Department of Economics, Washington University in St. Louis, Campus Box 1208, St. Louis, MO 63130-4899 (1989). STEVEN LEVITT, Department of Economics, Booth School of Business, University of Chicago, 5807 S Woodlawn Ave, Chicago, IL 60637 (2004). ARTHUR LEWBEL, Department of Economics, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467 (2003). ASSAR LINDBECK, Institute for International Economic Studies, University of Stockholm, Stockholm S-106 91, Sweden (1973).
OLIVER B. LINTON, Department of Economics, London School of Economics and Political Science, Houghton Street, Room S485/S486, London, WC2A 2AE, England (2007). BARTON L. LIPMAN, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (2005). RICHARD G. LIPSEY, Department of Economics, Simon Fraser University, 431 Sunset Road, RR#1 Q70, Bowen Island, BC, V0N 1G0, Canada (1972). NISSAN LIVIATAN, New Economic School, Nakhimovsky Prospect 47, Suite 1721, Moscow 117418, Russia (1966). ALESSANDRO LIZZERI, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (2007). ANDREW W. LO, MIT Laboratory for Financial Engineering, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (2002). GLENN C. LOURY, Department of Economics, Brown University, 64 Waterman Street, Providence, RI 02912 (1994). MICHAEL C. LOVELL, Economics Department, Wesleyan University, 264 High Street, Middletown, CT 06459-6067 (1980). ROBERT E. LUCAS, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1975). R. DUNCAN LUCE, University of California, Irvine, Social Science Plaza 2133, Irvine, CA 92697-5100 (2009). HAROLD F. LYDALL, University of East Anglia, Norwich, NR4 7TJ, England (1980). MARK MACHINA, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508 (1989). JAMES G. MACKINNON, Department of Economics, Queen’s University, Kingston, ON, K7L 3N6, Canada (1990). W. BENTLEY MACLEOD, Department of Economics, Columbia University, 1427A International Affairs Building, MC 3308, 420 West 118th Street, New York, NY 10027 (2005). THOMAS E. MACURDY, Hoover Institution, Stanford University, Stanford, CA 94305-6072 (1987). ALBERT MADANSKY, Graduate School of Business, University of Chicago, Chicago, IL 60637 (1977). MICHAEL MAGILL, Department of Economics, University of Southern California, Los Angeles, CA 90089 (2000). THIERRY MAGNAC, Universitè des Sciences Sociales (IDEI & GREMAQ), Toulouse School of Economics, 21 Allée de Brienne, Toulouse 31000, France (2009). GEORGE MAILATH, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1995).
JACQUES MAIRESSE, Centre de Recherche en Économie et Statistique, Institut National de la Statistique et des Études Économiques, 15 Boulevard Gabriel Péri, Malakoff 92245, France (1986). MUKUL MAJUMDAR, Department of Economics, Cornell University, 460 Uris Hall, Ithaca, NY 14853 (1976). VALERY L. MAKAROV, New Economic School, Nakhimovsky Prospect 47, Moscow 117418, Russia (1978). JAMES M. MALCOMSON, All Souls College, Oxford University, Manor Road Building, Oxford, OX1 3UQ, England (2005). EDMOND MALINVAUD, Institut National de la Statistique et des Études Économiques, 15 Boulevard Gabriel Péri, Malakoff 92245, France (1955). SVEN MALMQUIST, Institute of Statistics, University of Stockholm, Stockholm S-10691, Sweden (1955). BENOIT B. MANDELBROT, Mathematics Department, Yale University, New Haven, CT 06520-8283 (1967). CHARLES F. MANSKI, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2609 (1984). STEPHEN MARGLIN, Department of Economics, Harvard University, Littauer 221, 1805 Cambridge Street, Cambridge, MA 02138 (1976). ROBERTO S. MARIANO, School of Economics, Singapore Management University, 90 Stamford Road, 178903, Singapore (2009). HARRY M. MARKOWITZ, School of Management, University of California, San Diego, 9500 Gilman Drive, Otterson Hall, La Jolla, CA 92093 (1959). THOMAS MARSCHAK, Haas School of Business, University of California, Berkeley, 2220 Piedmont Avenue, Berkeley, CA 94720-1900 (1975). DAVID MARTIMORT, Institut d'Economie Industrielle, Manufacture des Tabacs, 21 Allée de Brienne, Toulouse F-31000, France (2005). CESAR MARTINELLI, Centro de Investigación Económica, Instituto Tecnológico Autónomo de México, Camino Santa Teresa 930, México DF 10700, Mexico (2009). BELA MARTOS, Institute of Economics, Hungarian Academy of Sciences, Budaorsi ut 43-45, Budapest H-1112, Hungary (1982). ANDREU MAS-COLELL, Department of Economics & Business, Universitat Pompeu Fabra, Ramon Trias Fargas, 25-27, Barcelona 08005, Spain (1977). ERIC S. MASKIN, School of Social Science, Institute for Advanced Study, Einstein Drive, Princeton, NJ 08540 (1981). AKIHIKO MATSUI, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (2008). HITOSHI MATSUSHIMA, Faculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (2003). KIMINORI MATSUYAMA, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1999). STEVEN MATTHEWS, Department of Economics, University of Pennsylvania, 466 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1992).
ROSA MATZKIN, Department of Economics, University of California, Los Angeles, 405 Hilgard Avenue, Bunche Hall 8283, Los Angeles, CA 90095 (1995). PRESTON MCAFEE, Yahoo! Research, 3333 Empire Boulevard, Burbank, CA 91504 (1995). JOHN J. MCCALL, Department of Economics, University of California, Los Angeles, Bunche Hall 8283, Los Angeles, CA 90095-1477 (1997). BENNETT T. MCCALLUM, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213-3890 (1992). DANIEL MCFADDEN, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94720-3880 (1970). LIONEL MCKENZIE, Department of Economics, University of Rochester, Harkness Hall 204, Rochester, NY 14627 (1958). ANDREW MCLENNAN, School of Economics, University of Queensland, 520 Colin Clark Building, St. Lucia, Brisbane, QLD 4072, Australia (2006). COSTAS MEGHIR, Department of Economics, University College London, Gower Street, London, WC1E 6BT, England (1999). MARC MELITZ, Harvard University, Littauer Center, Cambridge, MA 02138 (2008). JEAN-FRANCOIS MERTENS, Faculté des sciences, Département de mathématique, Université catholique de Louvain, Voie Du Roman Pays 34, Louvain-la-Neuve 1348, Belgium (1981). ROBERT C. MERTON, Graduate School of Business Administration, Harvard University, Soldiers Field, Boston, MA 02163 (1983). MARGARET A. MEYER, Nuffield College, University of Oxford, New Road, Oxford, OX1 1NF, England (1998). PAUL MILGROM, Department of Economics, Stanford University, Stanford, CA 94305 (1983). JEAN CLAUDE MILLERON, 9 Rue Jean-Baptiste Lully, Santeny 94440, France (1978). LEONARD J. MIRMAN, Department of Economics, University of Virginia, PO Box 400182, Monroe Hall, Charlottesville, VA 22904-4182 (1978). SIR JAMES MIRRLEES, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (1970). TAPAN MITRA, Department of Economics, Cornell University, 448 Uris Hall, Ithaca, NY 14853 (1997). ROBERT A. MOFFITT, Department of Economics, Johns Hopkins University, 429 Mergenthaler Hall, 3400 North Charles Street, Baltimore, MD 21218 (1997). BENNY MOLDOVANU, Department of Economics, University of Bonn, Lennéstr. 37, Bonn 53113, Germany (2004). ALAIN MONFORT, Centre de Recherche en Économie et Statistique, Institut National de la Statistique et des Études Économiques, 15 Boulevard Gabriel Péri, Malakoff 92245, France (1985).
PAOLO KLINGER MONTEIRO, Escola de Pós-Graduação em Economia, Fundação Getulio Vargas, Praia de Botafogo 190, Sala 1103, Rio de Janeiro 22253-900, Brazil (2009). DILIP MOOKHERJEE, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (2008). JOHN HARDMAN MOORE, Department of Economics, University of Edinburgh, 50 George Square, Edinburgh, EH8 9JY, Scotland (1989). CHIKASHI MORIGUCHI, Institute of Social and Economic Research, Osaka University, 1-1 Yamadaoka, Suita, Osaka, 567-0871, Japan (1986). KIMIO MORIMUNE, Graduate School of Economics, Kyoto University, Sakyo-ku, Kyoto, 606-8501, Japan (2009). STEPHEN MORRIS, Department of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021 (2002). DALE MORTENSEN, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1978). HERVÉ MOULIN, Department of Economics, Rice University, PO Box 1892, Houston, TX 77251 (1983). JOHN MUELLBAUER, Department of Economics, Nuffield College, Oxford University, Manor Road, Oxford, OX1 1NF, England (1979). YAIR MUNDLAK, Department of Agricultural Economics and Management, Hebrew University of Jerusalem, 18 Weizman St., Tel Aviv 1309, Israel (1970). KEVIN M. MURPHY, Graduate School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637 (1993). MICHAEL L. MUSSA, Institute for International Economics, 1750 Massachusetts Avenue, N.W., Washington, DC 20036-1903 (1986). ROGER B. MYERSON, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1983). JOHN NACHBAR, Department of Economics, Washington University in St. Louis, Campus Box 1208, One Brookings Drive, St. Louis, MO 63130-4899 (2009). A. L. NAGAR, Indian Institute of Finance, Ashok Vihar II, Delhi 110052, India (1970). ANDRAS NAGY, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 25 Orde Street, 5-1015-3, Toronto, Ontario, M5T 3H7, Canada (1978). JOHN FORBES NASH, Department of Mathematics, Princeton University, 910 Fine Hall, Washington Road, Princeton, NJ 08544 (1990). J. PETER NEARY, Department of Economics, University of Oxford, Manor Road Building, Manor Road, Oxford, OX1 3UQ, England (1987). TAKASHI NEGISHI, 1-3-1-2003 Motoazabu, Minato-ku, Tokyo, 106-0046, Japan (1966). CHARLES R. NELSON, Department of Economics, University of Washington, Box 353330, Seattle, WA 98195 (2003). MARC NERLOVE, Agricultural and Resource Economics Department, University of Maryland, 2200 Symons Hall, College Park, MD 20742-5535 (1960).
DAVID NEWBERY, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (1989). WHITNEY K. NEWEY, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1989). ABRAHAM NEYMAN, Institute of Mathematics, Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel (1989). STEPHEN J. NICKELL, Department of Economics, Nuffield College, Oxford University, New Road, Oxford, OX1 1NF, England (1980). JUAN PABLO NICOLINI, Research Department, Federal Reserve Bank of Minneapolis, 90 Hennepin Avenue, Minneapolis, MN 55480-0291 (2009). KAZUO NISHIMURA, Institute of Economic Research, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan (1992). WILLIAM D. NORDHAUS, Cowles Foundation, Yale University, 28 Hillhouse Avenue, New Haven, CT 06511-8268 (1984). MAURICE OBSTFELD, Department of Economics, University of California, Berkeley, 508-1 Evans Hall #3880, Berkeley, CA 94720-3880 (1996). WALTER OI, Department of Economics, University of Rochester, Harkness Hall 211, Rochester, NY 14627 (1975). MASAHIRO OKUNO-FUJIWARA, Faculty of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan (1990). GUY HENDERSON ORCUTT, Box 989, Grantham, NH 03753 (1956). MARTIN J. OSBORNE, Department of Economics, University of Toronto, 150 St George Street, Toronto, ON, M5S 3G7, Canada (2003). JOSEPH M. OSTROY, Department of Economics, University of California, Los Angeles, Box 951477, Bunche Hall 8339, Los Angeles, CA 90095-1477 (1984). ADRIAN R. PAGAN, School of Economics and Finance, Queensland University of Technology, GPO Box 4, Brisbane, QLD 4001, Australia (1985). ARIEL PAKES, Department of Economics, Harvard University, Littauer 117, Cambridge, MA 02138 (1988). THOMAS R. PALFREY, Division of Humanities and Social Sciences, California Institute of Technology, 228-77 Caltech, Pasadena, CA 91125 (1995). JOON Y. PARK, Department of Economics, Indiana University, 100 S. Woodlawn, Bloomington, IN 47405-7104 (2002). LUIGI L. PASINETTI, Università Cattolica del Sacro Cuore, Largo Gemelli, 1, Milano 20123, Italy (1978). PRASANTA K. PATTANAIK, Department of Economics, University of California, Riverside, 4105 Sproul Hall, Riverside, CA 92521-0427 (1978). DAVID PEARCE, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1991). BEZALEL PELEG, Centre for Rationality & Interactive Decision Theory, Hebrew University of Jerusalem, Feldman Building, Givat Ram, Jerusalem 91904, Israel (1977). JOHN H. PENCAVEL, Department of Economics, Stanford University, Economics Building 262, Stanford, CA 94305-6072 (1993).
PIERRE PERRON, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 (2007). MOTTY PERRY, Department of Economics, Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel (1999). TORSTEN PERSSON, Institute for International Economic Studies, Stockholm University, Stockholm S-106 91, Sweden (1997). M. HASHEM PESARAN, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (1989). WOLFGANG PESENDORFER, Department of Economics, Princeton University, 211 Fisher Hall, Princeton, NJ 08544 (2000). MICHAEL PETERS, Department of Economics, University of British Columbia, 997-1873 East Mall, Vancouver, V6T 1Z1, Canada (2007). EDMUND S. PHELPS, Department of Economics, Columbia University, 420 West 118th Street, International Affairs Building, New York, NY 10027 (1967). PETER C. B. PHILLIPS, Cowles Foundation for Research in Economics, Yale University, Box 208281, 30 Hillhouse Avenue, New Haven, CT 06520-8281 (1981). LOUIS PHLIPS, Department of Economics, European University Institute, Badia Fiesolana, Via dei Roccettini, 9, San Domenico di Fiesole I-50016, Italy (1980). MONIKA PIAZZESI, Stanford University, 579 Serra Mall, Stanford, CA 94305-6072 (2008). ROBERT S. PINDYCK, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02139 (1993). CHRISTOPHER PISSARIDES, Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, England (1997). CHARLES R. PLOTT, Division of the Humanities and Social Sciences, California Institute of Technology, MC 288-77, Pasadena, CA 91125 (1985). DALE J. POIRIER, Department of Economics, University of California–Irvine, 3151 Social Science Plaza, Irvine, CA 92697-5100 (1995). HERAKLES M. POLEMARCHAKIS, Department of Economics, University of Warwick, Coventry, CV4 7AL, United Kingdom (1991). ROBERT A. POLLAK, Department of Economics, Olin Business School, Washington University in St. Louis, Campus Box 1133, 1 Brookings Drive, St. Louis, MO 63130-4899 (1977). VICTOR POLTEROVICH, Central Economics and Mathematics Institute, RAS, Nakhimovski Prospect 47, Moscow 117418, Russia (1989). ROBERT PORTER, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2600 (1989). RICHARD D. PORTES, Department of Economics, London Business School, Sussex Place, London, NW1 4SA, England (1983). KRZYSZTOF PORWIT, Central School of Planning and Statistics, Al. Niepodleglosci 162, Warszawa, Poland (1973).
ANDREW W. POSTLEWAITE, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104 (1985). JAMES M. POTERBA, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1988). ALAN A. POWELL, Center of Policy Studies, Monash University, Eleventh Floor, Menzies Bldg, Wellington Road, Clayton, Victoria 3800, Australia (1988). JAMES L. POWELL, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94720-3880 (1990). JOHN W. PRATT, Harvard Business School, Soldiers Field, Boston, MA 02163 (1974). ANDRAS PREKOPA, Rutgers Center for Operations Research, Rutgers, The State University of New Jersey, 640 Bartholomew Road, Piscataway, NJ 08854-8003 (1978). EDWARD C. PRESCOTT, Research Department, Federal Reserve Bank of Minneapolis, 90 Hennepin Avenue, Minneapolis, MN 55480-0291 (1980). F. GRAHAM PYATT, Institute of Social Studies, P.O. Box 29776, NL-2502 LT, The Hague, Netherlands (1978). RICHARD E. QUANDT, Department of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021 (1968). MARTINE QUINZII, Department of Economics, University of California, Davis, 1 Shields Avenue, Davis, CA 95616-8578 (2000). MATTHEW RABIN, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94720-3880 (2000). ROY RADNER, Stern Business School, New York University, 44 West Fourth Street, New York, NY 10012 (1961). HOWARD RAIFFA, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Mailbox NR, Cambridge, MA 02138 (1975). C. RADHAKRISHNA RAO, Department of Statistics, Pennsylvania State University, 326 Thomas Building, University Park, PA 16802 (1972). DEBRAJ RAY, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1993). ASSAF RAZIN, Department of Economics, Cornell University, Ithaca, NY 14853 (1994). SERGIO REBELO, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2001 (2006). JENNIFER F. REINGANUM, Department of Economics, Vanderbilt University, 106 Calhoun Hall, Nashville, TN 37235 (1989). STANLEY REITER, Managerial Economics and Decision Sciences, Northwestern University, 2001 Sheridan Road, Jacobs Center, 5th Floor, Evanston, IL 60208-2001 (1970). ERIC RENAULT, Department of Economics, University of North Carolina, Chapel Hill, Gardner 300G, Chapel Hill, NC 27599-3305 (1998).
PHILIP J. RENY, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1996). RAFAEL REPULLO, CEMFI, Casado Del Alisal 5, Madrid 28014, Spain (2002). PATRICK REY, Institut d'Economie Industrielle, Université Toulouse 1, Toulouse 31042, France (1998). JEAN-FRANCOIS RICHARD, Department of Economics, University of Pittsburgh, 4917 W. W. Posvar Hall, Pittsburgh, PA 15260 (1980). MARCEL K. RICHTER, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (1974). GEERT RIDDER, Department of Economics, University of Southern California, Kaprielian Hall 310A, University Park Campus, Los Angeles, CA 90089 (2003). JOHN G. RILEY, Department of Economics, University of California, Los Angeles, 405 Hilgard Avenue, Los Angeles, CA 90095 (1983). MICHAEL RIORDAN, Department of Economics, Columbia University, 420 West 118th Street, 1014 International Affairs Building, New York, NY 10027 (1994). JOSÉ VÍCTOR RÍOS-RULL, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (2007). RAFAEL ROB, Department of Economics, University of Pennsylvania, 512 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1995). D. JOHN ROBERTS, Graduate School of Business, Stanford University, Littlefield 239, Stanford, CA 94305-5015 (1982). KEVIN W. S. ROBERTS, Department of Economics, Nuffield College, Oxford University, New Road, Oxford, OX1 1NF, England (1987). JEAN-MARC ROBIN, Department of Economics, Maison des Sciences Economiques, Université de Paris 1 Panthéon–Sorbonne, 106/112 Boulevard de l'Hôpital, Paris 75647, France (2006). PETER ROBINSON, Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, England (1989). ARTHUR J. ROBSON, Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia, V5A 1S6, Canada (2007). JEAN-CHARLES ROCHET, University of Zurich and Toulouse School of Economics, Plattenstrasse 14, 8032 Zürich, Switzerland (1995). JOHN E. ROEMER, Department of Political Science, Yale University, 124 Prospect Street, P.O. Box 208301, New Haven, CT 06520-8301 (1986). RICHARD ROGERSON, Department of Economics, School of Business, Arizona State University, PO Box 873806, Tempe, AZ 85287 (2005). WILLIAM R. ROGERSON, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1999).
KENNETH S. ROGOFF, Department of Economics, Harvard University, Littauer 216, 1805 Cambridge Street, Cambridge, MA 02138 (1991). RICHARD ROLL, Department of Finance, Anderson School of Management, University of California, Los Angeles, Box 951481, Los Angeles, CA 90095-1481 (1989). PAUL ROMER, Graduate School of Business, Stanford University, Stanford, CA 94305 (1990). HARVEY S. ROSEN, Department of Economics, Princeton University, Princeton, NJ 08544-1021 (1986). MARK R. ROSENZWEIG, Department of Economics, Yale University, PO Box 208269, 27 Hillhouse Avenue, New Haven, CT 06520-8269 (1994). STEPHEN ROSS, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02139 (1978). JULIO J. ROTEMBERG, Harvard Business School, Soldiers Field, Boston, MA 02163 (1990). ALVIN ROTH, Department of Economics, Harvard Business School, Soldiers Field, Baker Library 441, Boston, MA 02163 (1983). THOMAS ROTHENBERG, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94590 (1977). MICHAEL ROTHSCHILD, Woodrow Wilson School, Princeton University, Princeton, NJ 08544-1013 (1974). ARIEL RUBINSTEIN, School of Economics, Tel Aviv University, Tel Aviv 69978, Israel (1985). JOHN RUST, Department of Economics, University of Maryland, 4115 Tydings Hall, College Park, MD 20742 (1993). ALDO RUSTICHINI, Department of Economics, University of Minnesota, 4-101 Hanson Hall, 1925 S. 4th Street, Minneapolis, MN 55455 (2004). PAUL A. RUUD, Department of Economics, University of California, Berkeley, 549 Evans Hall #3880, Berkeley, CA 94720-3880 (2003). JEFFREY D. SACHS, Earth Institute, Columbia University, 314 Low Library, MC 4327, 535 West 116th Street, New York, NY 10027 (1986). WIESLAW SADOWSKI, Institute of Planning, Warszawa, Poland (1970). BERNARD SALANIE, Department of Economics, Columbia University, International Affairs Building MC 3308, 420 West 118th Street, New York, NY 10027 (2001). DOV E. SAMET, Faculty of Management, Tel Aviv University, Tel Aviv 69978, Israel (1998). LARRY SAMUELSON, Department of Economics, Yale University, 30 Hillhouse Avenue, New Haven, CT 06520-8281 (1994). JAN SANDEE, Pellenaerstraat 27, The Hague, Netherlands (1965). AGNAR SANDMO, Department of Economics, Norges Handelshøyskole, Helleveien 30, Bergen NO-5045, Norway (1976). MANUEL SANTOS, Department of Economics, School of Business, University of Miami, P.O. Box 248126, Coral Gables, FL 33124-6550 (2009).
THOMAS J. SARGENT, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1976). MARK A. SATTERTHWAITE, Department of Management and Strategy, Kellogg School of Management, Northwestern University, Evanston, IL 60208 (1986). N. EUGENE SAVIN, Department of Economics, Tippie College of Business, University of Iowa, W374 Pappajohn Business Building, Iowa City, IA 52242-1994 (1985). TAKAMITSU SAWA, Kyoto Institute of Economic Research, Kyoto University, Sakyoku, Kyoto, 606, Japan (1978). HERBERT E. SCARF, Cowles Foundation, Yale University, Box 208281, 30 Hillhouse Avenue, New Haven, CT 06520-8281 (1962). JOSÉ ALEXANDRE SCHEINKMAN, Department of Economics, Princeton University, 26 Prospect Avenue, Princeton, NJ 08540-5296 (1978). THOMAS C. SCHELLING, Department of Economics and School of Public Affairs, University of Maryland, 3105 Tydings Hall, College Park, MD 20742 (2007). RICHARD SCHMALENSEE, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142-1347 (1982). DAVID SCHMEIDLER, School of Mathematical Sciences, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel (1980). PETER SCHMIDT, Department of Economics, Michigan State University, 18-B Marshall–Adams Hall, East Lansing, MI 48824 (1988). PETER K. SCHOENFELD, Universität Bonn, Adenauerallee 24 42, Bonn D53113, Germany (1976). MYRON S. SCHOLES, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (1984). ILYA SEGAL, Department of Economics, Stanford University, 579 Serra Mall, Landau Economics Building, Room 242, Stanford, CA 94305-6072 (2003). REINHARD SELTEN, Department of Economics, Universität Bonn, Adenauerallee 24 42, Bonn D-53113, Germany (1973). AMARTYA K. SEN, Department of Economics, Harvard University, Littauer 205, 1805 Cambridge Street, Cambridge, MA 02138 (1968). ARUNAVA SEN, Planning Unit, Indian Statistical Institute, 7 S.J.S. Sansanwal Marg, New Delhi 110016, India (2003). WAYNE J. SHAFER, Department of Economics, University of Illinois at Urbana–Champaign, 410 David Kinley Hall, 1407 W. Gregory, Urbana, IL 61801 (1994). AVNER SHAKED, Universität Bonn, Adenauerallee 24 42, Bonn D-53113, Germany (1991). CHRISTINA SHANNON, Department of Economics, University of California, Berkeley, 508-1 Evans Hall #3880, Berkeley, CA 94720-3880 (2002). LLOYD S. SHAPLEY, Department of Economics, University of California, Los Angeles, Bunche Hall 8240, Los Angeles, CA 90095-1477 (1967).
STEVEN SHAVELL, Harvard Law School, 1563 Massachusetts Avenue, Cambridge, MA 02138 (1988). KARL SHELL, Department of Economics, Cornell University, 402 Uris Hall, Ithaca, NY 14853-7601 (1972). NEIL G. SHEPHARD, Department of Economics, Nuffield College, Oxford University, New Road, Oxford, OX1 1NF, England (2004). EYTAN SHESHINSKI, Department of Economics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 91905, Israel (1971). ROBERT J. SHILLER, Yale University, Box 208281, 30 Hillhouse Avenue, New Haven, CT 06511-8281 (1980). ROBERT SHIMER, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (2006). HYUN SONG SHIN, Bendheim Center for Finance, Princeton University, 26 Prospect Avenue, Princeton, NJ 08540-5296 (2004). ANDREI SHLEIFER, Department of Economics, Harvard University, Littauer M-9, Cambridge, MA 02138 (1993). ANTHONY F. SHORROCKS, UNU-WIDER, Katajanokanlaituri 6 B, Helsinki, FIN-00160, Finland (1996). JOHN B. SHOVEN, Stanford Institute for Economic Policy Research (SIEPR), Stanford University, Landau 132, Stanford, CA 94305-6072 (1984). MARTIN SHUBIK, Cowles Foundation for Research, Yale University, Box 208281, 30 Hillhouse Avenue, New Haven, CT 06520-8281 (1971). JOAQUIM SILVESTRE, Department of Economics, University of California, Davis, One Shields Avenue, Davis, CA 95616 (1991). CHRISTOPHER A. SIMS, Department of Economics, Princeton University, 104 Fisher Hall, Princeton, NJ 08544-1021 (1974). KENNETH J. SINGLETON, Graduate School of Business, Stanford University, Stanford, CA 94305 (1988). STEPHEN SMALE, Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720-3840 (1982). LONES SMITH, Department of Economics, University of Michigan, 611 Tappan Street, Ann Arbor, MI 48109-1220 (2009). RICHARD J. SMITH, Faculty of Economics, University of Cambridge, Sidgwick Avenue, Cambridge, CB3 9DD, England (2007). VERNON L. SMITH, Economic Science Institute, Chapman University, One University Drive, Orange, CA 92866 (1987). JOEL SOBEL, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 (1990). ROBERT SOLOW, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02139 (1957). DIETER SONDERMANN, Department of Economics, Universität Bonn, Adenauerallee 24 42, Bonn D-53113, Germany (1977). TAYFUN SÖNMEZ, Department of Economics, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467 (2003).
HUGO SONNENSCHEIN, Department of Economics, University of Chicago, 1126 East 59th St, Chicago, IL 60637 (1972). SYLVAIN SORIN, Université Pierre et Marie Curie, Paris 6, Barre 1516, 1er étage, bureau 104, 4 Place Jussieu, Paris 75013, France (2000). MARILDA SOTOMAYOR, Department of Economics, Universidade São Paulo, Av. Prof. Luciano Gualberto, 908, Sao Paulo 05508-900, Brazil (2003). A. MICHAEL SPENCE, Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (1976). T. N. SRINIVASAN, Economic Growth Center, Yale University, Box 208269, 27 Hillhouse Avenue, New Haven, CT 06520-8269 (1970). ENNIO STACCHETTI, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (2001). ROBERT STAIGER, Department of Economics, Stanford University, Stanford, CA 94305 (2008). DAVID STARRETT, Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305-6072 (1975). NICHOLAS H. STERN, Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, England (1978). JOSEPH E. STIGLITZ, Department of Economics, Columbia University, 3022 Broadway, Uris Hall, Room 814, New York, NY 10027 (1973). JAMES H. STOCK, Department of Economics, John F. Kennedy School of Government, Harvard University, Mailbox NR, 79 John F. Kennedy Street, Cambridge, MA 02138 (1992). THOMAS M. STOKER, Sloan School of Management, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142 (1991). NANCY L. STOKEY, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1987). DANIEL B. SUITS, Department of Economics, Michigan State University, East Lansing, MI 48823 (1966). LAWRENCE H. SUMMERS, Office of the President, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138 (1985). ROBERT SUMMERS, Department of Economics, Pennsylvania State University, 329 McNeil, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1989). JOHN SUTTON, Department of Economics, London School of Economics, Houghton Street, London, WC2A 2AE, England (1991). KOTARO SUZUMURA, Institute of Economic Research, Hitotsubashi University, Kunitachi City, Tokyo, 186-8603, Japan (1990). LARS SVENSSON, Deputy Governor, Sveriges Riksbank, Brunkebergstorg 11, Stockholm SE-103 37, Sweden (1990). JEROEN SWINKELS, Department of Management and Strategy, Kellogg School of Management, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (2004). GUIDO TABELLINI, IGIER, Bocconi University, Via Sarfatti, 25, Milano 20136, Italy (2001).
ELIE TAMER, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (2008).
GEORGE TAUCHEN, Department of Economics, Duke University, Mail Box 90097, 213 Social Sciences, Durham, NC 27708-0097 (1994).
JOHN B. TAYLOR, Department of Economics, Stanford University, Herbert Hoover Memorial Building, Stanford, CA 94305 (1984).
LESTER G. TELSER, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (1968).
JACQUES THISSE, Center for Operations Research and Econometrics, Université catholique de Louvain, 34 Voie du Roman Pays, B-1348 Louvain-la-Neuve, Belgium (1992).
JONATHAN P. THOMAS, Management School & Economics, University of Edinburgh, William Robertson Building, 50 George Square, Edinburgh, EH8 9JY, Scotland (2007).
WILLIAM THOMSON, Department of Economics, University of Rochester, Harkness Hall, Rochester, NY 14627 (1990).
JEAN TIROLE, Institut d'Économie Industrielle, Université des Sciences Sociales, Manufacture des Tabacs, 21 Allée de Brienne, Toulouse F-31000, France (1986).
PETRA E. TODD, Department of Economics, University of Pennsylvania, McNeil 160, 3718 Locust Walk, Philadelphia, PA 19104 (2009).
ROBERT TOWNSEND, Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142 (1985).
STEPHEN J. TURNOVSKY, Department of Economics, University of Washington, Box 353330, Savery 302, Seattle, WA 98195-3330 (1981).
CHRISTOPHER UDRY, Department of Economics, Yale University, Box 208269, New Haven, CT 06520-8269 (2005).
HARALD UHLIG, Department of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637 (2003).
HIROFUMI UZAWA, Department of Economics, University of Tokyo, 3-1 Hongo 7 Chome, Bunkyo-ku, Tokyo, 113, Japan (1960).
JUUSO VALIMAKI, Aalto University School of Economics, Helsinki School of Economics, P.O. Box 21210, Aalto FI-00076, Finland (2007).
ERIC VAN DAMME, Center for Economic Research, Tilburg University, PO Box 90153, Warandelaan 2, Tilburg 5000 LE, Netherlands (1993).
HAL R. VARIAN, Department of Economics, University of California, Berkeley, 102 South Hall, Berkeley, CA 94720-4600 (1983).
ANTHONY J. VENABLES, Department of Economics, University of Oxford, Manor Road, Oxford, OX1 3UQ, England (2003).
M. J. VERHULST, 180 Rue de Roubaix, Toufflers 59390, France (1957).
JOHN VICKERS, All Souls College, Oxford University, Oxford, OX1 4AL, England (1998).
XAVIER VIVES, IESE Business School and UPF, Avenida Pearson, 21, Barcelona 8034, Spain (1992).
C. CHRISTIAN VON WEIZSACKER, Max Planck Institute zur Erforschung von Gemeinschaftsgütern, Kurt-Schumacher-Str. 10, Bonn 53113, Germany (1968).
QUANG H. VUONG, Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802-3306 (1997).
JEAN WAELBROECK, Université Libre de Bruxelles, C.P. 135, Avenue F. D. Roosevelt 50, Bruxelles 1050, Belgium (1971).
PETER P. WAKKER, Econometric Institute, Erasmus University, P.O. Box 1738, Rotterdam 3000 DR, Netherlands (2003).
MARK WALKER, Economics Department, University of Arizona, McClelland Hall 401, PO Box 210108, Tucson, AZ 85721-0108 (2009).
NEIL WALLACE, Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802 (1981).
KENNETH F. WALLIS, Department of Economics, University of Warwick, Gibbet Hill Road, Coventry, Warwickshire, CV4 7AL, England (1975).
MARK W. WATSON, Department of Economics, Princeton University, Princeton, NJ 08544 (1992).
HAROLD W. WATTS, Department of Economics, Columbia University, 1022 International Affairs Building, 420 West 118th Street, New York, NY 10027 (1976).
JORGEN W. WEIBULL, Department of Economics, Stockholm School of Economics, Sveavägen 65, PO Box 6501, Stockholm 11383, Sweden (1999).
ANDREW WEISS, Founder, Weiss Asset Management, 29 Commonwealth Ave, Boston, MA 02216 (1989).
MARTIN L. WEITZMAN, Department of Economics, Harvard University, Littauer 313, Cambridge, MA 02138 (1976).
FINIS WELCH, Welch Consulting, 1716 Briarcrest Dr, Bryan, TX 77802 (1980).
KENNETH D. WEST, Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, WI 53706-1393 (1993).
JOHN WHALLEY, Department of Economics, University of Western Ontario, Social Science Centre, London, ON, N6A 5C2, Canada (1990).
MICHAEL D. WHINSTON, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208 (1993).
HALBERT L. WHITE, Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508 (1983).
T. M. WHITIN, Department of Economics, Wesleyan University, Middletown, CT 06457 (1958).
OLIVER E. WILLIAMSON, Walter A. Haas School of Business, University of California, Berkeley, S545 Student Services Bldg. #1900, Berkeley, CA 94720-1900 (1976).
ROBERT D. WILLIG, Woodrow Wilson School, Princeton University, 401 Robertson Hall, Princeton, NJ 08544 (1981).
CHARLES WILSON, Department of Economics, New York University, 19 West Fourth Street, New York, NY 10012 (1982).
ROBERT B. WILSON, Stanford Business School, Stanford University, 518 Memorial Way, Stanford, CA 94305-5015 (1976).
SIDNEY G. WINTER, Management Department, Wharton School of Business, University of Pennsylvania, 2000 Steinberg Hall-Dietrich Hall, 3620 Locust Walk, Philadelphia, PA 19104-6370 (1978).
DAVID A. WISE, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138-5801 (1986).
PHILIP S. WOLFE, 33-221 IBM Research Center, P.O. Box 218, Yorktown Heights, NY 10598 (1979).
ASHER WOLINSKY, Department of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208-2600 (1991).
MYRNA WOODERS, Department of Economics, Vanderbilt University, 402 Calhoun Hall, Nashville, TN 37235-1819 (2001).
MICHAEL WOODFORD, Department of Economics, Columbia University, 1009B International Affairs Building, 420 West 118th Street, New York, NY 10027 (1991).
ALAN D. WOODLAND, Department of Economics, Australian School of Business, University of New South Wales, West Wing 4th Floor, Australian School of Business Building, Sydney, NSW 2052, Australia (1988).
JEFFREY M. WOOLDRIDGE, Department of Economics, Michigan State University, 18D Marshall Hall, East Lansing, MI 48824-1038 (2002).
RANDALL WRIGHT, Department of Economics, University of Pennsylvania, 3718 Locust Walk, Philadelphia, PA 19104-6297 (1997).
MENAHEM E. YAARI, Center for the Study of Rationality, Hebrew University of Jerusalem, Givat Ram Campus, Feldman Building, Jerusalem 91904, Israel (1970).
H. PEYTON YOUNG, Department of Economics, Nuffield College, Oxford University, Manor Road, Oxford, OX1 3UQ, England (1994).
WILLIAM R. ZAME, Department of Economics, University of California, Los Angeles, 8359 Bunche Hall, Los Angeles, CA 90095-1477 (1994).
SHMUEL ZAMIR, Center for the Study of Rationality, Hebrew University of Jerusalem, Givat Ram Campus, Jerusalem 91904, Israel (1992).
RICHARD ZECKHAUSER, John F. Kennedy School of Government, Harvard University, 79 John F. Kennedy Street, Cambridge, MA 02138 (1989).
ARNOLD ZELLNER, Booth School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637 (1965).
SUBMISSION OF MANUSCRIPTS TO THE ECONOMETRIC SOCIETY MONOGRAPH SERIES

For monographs in economic theory, a PDF of the manuscript should be emailed to Professor George J. Mailath at [email protected]. Only electronic submissions will be accepted. In exceptional cases, authors who are unable to submit electronic files in PDF format may send three copies of the original manuscript to:

Professor George J. Mailath
Department of Economics
3718 Locust Walk
University of Pennsylvania
Philadelphia, PA 19104-6297, U.S.A.

For monographs in theoretical and applied econometrics, a PDF of the manuscript should be emailed to Professor Rosa Matzkin at [email protected]. Only electronic submissions will be accepted. In exceptional cases, authors who are unable to submit electronic files in PDF format may send three copies of the original manuscript to:

Professor Rosa Matzkin
Department of Economics
University of California, Los Angeles
Bunche Hall 8349
Los Angeles, CA 90095

All submissions must be accompanied by a letter of submission and be written in English. Authors submitting a manuscript are expected not to have submitted it elsewhere; it is the authors' responsibility to inform the Editors about these matters. There is no submission charge. The Editors will also consider proposals consisting of a detailed table of contents and one or more sample chapters, and can offer a preliminary decision contingent on the receipt of a satisfactory complete manuscript.

All submitted manuscripts should be double spaced on paper of standard size, 8.5 by 11 inches or European A4, and should have margins of at least one inch on all sides. Figures should be of publication quality. The manuscript should be prepared in the same manner as papers submitted to Econometrica. Manuscripts may be rejected, returned for specified revisions, or accepted.

Once a monograph has been accepted, the author will sign a contract with the Econometric Society on the one hand and with the publisher, Cambridge University Press, on the other. Currently, monographs usually appear no more than twelve months after the date of final acceptance.
© 2010 The Econometric Society
DOI: 10.3982/ECTA784SUM