Introduction
Stephane Hess and Andrew Daly
Human behaviour is characterised by choices, long term as well as short term. Many of the choices we make have significant implications on the demand for services and infrastructure as well as the consumption of goods. The efficient functioning of society relies on the provision of sufficient supply to meet that demand. Governments and industry need to make decisions on infrastructure developments, the introduction of new services and the development and configuration of consumer products. At the policy end, there is also scope for steering demand, for example encouraging more environmentally friendly behaviour or a spreading of energy use throughout the day. Many of these decisions concerning pricing, supply or regulation have important financial, environmental and societal implications and need to be based on an understanding of consumer preferences, notably in the form of monetary valuations, and accurate forecasts of consumer demand.

At the present time, potentially more than ever before, this need for reliable valuations and forecasts of demand is of crucial importance. Indeed, in the face of great economic uncertainty, coupled with increasing environmental concerns and ongoing security threats, the prioritisation between different major infrastructure developments is especially difficult, as recent examples of new high speed rail developments or the replacement of an ageing energy infrastructure have shown in a number of countries. Similarly, major policy decisions such as changes to the welfare system need to be informed at least in part by an understanding of the likely changes in work patterns and, longer term, the education and career choices of young people. Corresponding complexities arise in the commercial area with a need to understand the demand for new products and services. At the same time, important demographic changes relating to ageing and migration are likely to have major implications on the pattern of demand for services and products and their spatial location. Finally, the very nature of human choice behaviour is changing, with increasing use of information technology, the growing influence of (virtual) social networks and the role of societal and peer pressure, also in the context of various governments' nudge agendas.

Mathematical models of consumer choice play a key role in the process of understanding and predicting behaviour and are also used around the world to produce estimates of the valuations of services, environmental goods and product components. Their use in practice is truly multidisciplinary and while transport may still be the biggest area of activity, there exists a huge variety of applications in fields as diverse as energy, communications, health and a number of environmental areas. The methods used in these real-world applications largely have their source in academia, notwithstanding theoretical developments by leading practitioners. The academic community of choice modellers is vibrant and similarly cross-disciplinary as the real-world applications; in fact, it is researchers outside, for example, the transport and marketing disciplines that are driving the increasing use of choice models in new areas. Academic work in choice modelling is also evolving and recognising the real-world
changes to behaviour, for example, through improved representation of social networks and the role of soft factors such as attitudes and perceptions. Despite the exciting breadth of activity taking place in academia, it is also important to recognise that there is still a lack of transition of advanced methods from academia into practice and, as a community, choice modellers have a responsibility to better illustrate the advantages of their developments in real-world work and to make their methods more practical.

While this volume refers largely to developments in a microeconomic context, a large number of researchers in fields such as behavioural economics and mathematical psychology are also concerned with the understanding and modelling of choices, working largely in parallel and with little communication with traditional choice modellers. This work looks primarily at a deeper study of choices at the individual level and the treatment of rationality, and has gained added prominence in recent years with the publication of widely read books, for example by Daniel Kahneman and by Dan Ariely. To recognise this, and promote better collaboration between fields, the present volume contains two selected contributions from scholars in these disciplines as well as a number of chapters written by traditional choice modellers working at the interface with the behavioural fields.

The question of course arises: 'Why a handbook, and why now?' There are several excellent textbooks on choice modelling, in addition to choice modelling being a key technique in chapters in discipline-specific handbooks, notably in transport and resource economics. The success of the recently established International Choice Modelling Conference series highlights the cross-disciplinary nature of the topic, as well as the desire (and need) for cross-fertilisation. It is in this spirit that the present volume aims to provide a collection of authoritative chapters on what we feel are key research areas, invited from leading colleagues from the field and thoroughly peer-reviewed. The book aims to provide an overview of key topics, highlighting the major ongoing developments, as well as to indicate directions for future research. It is our hope that it will lead to improved communication and cross-fertilisation across fields, as well as stimulating better uptake of advanced methods in practice.

The book is divided into a number of parts, grouping together contributions that fall into the same general area. Part I following this introduction comprises three chapters looking at foundational issues. Nobel Prize winner Daniel McFadden, the leading architect of developments in choice modelling in the past 50 years, opens the book by discussing a move away from classical consumer choice theory and recognising the developments for example in psychology. This leads us directly to contributions by leading authors from mathematical psychology and behavioural economics. Busemeyer and Rieskamp highlight the need for an understanding and modelling of the dynamic nature of choice processes to arrive at reliable insights on beliefs and values. Loomes and Blackburn focus on an issue at the heart of research across different fields, namely, the modelling of choice under risk and uncertainty, and put forward a framework for a richer treatment of the key factors at play. The understanding and modelling of choices is entirely dependent on good observations, and this is the topic of Part II.
Goulias and Pendyala discuss the importance of context in decision-making and look at how this can be adequately accommodated in data elicitation processes. While a majority of studies, across fields, now rely on stated
preference data, revealed preference data still has much to offer, and Rieser-Schüssler and Axhausen discuss this in the specific context of transport studies where there is growing uptake of automatic data collection methods. This contribution is followed by three chapters looking at specific issues in a hypothetical choice context. Rose and Bliemer provide an extensive overview of experimental design for stated choice surveys, drawing on evidence from different fields and putting the developments into context. While stated preference surveys relied for many years on simple discrete choice data, a growing number of studies are aiming at a fuller elicitation of preference structures and Flynn and Marley look at the recently popular approach of best–worst scaling. Carson and Czajkowski next look at the use of experimental data in the vibrant area of environmental economics where revealed preference data is often not an option, given the hypothetical nature of the valuations of interest. The use of data on hypothetical choice has always led to some degree of criticism in relation to differences from behaviour in real choices, and Harrison closes this section of the book through a discussion of this potential bias and how it might vary across settings and approaches.

The interest in understanding and modelling variation in preferences across individual decision-makers is nearly as old as choice modelling itself. Part III looks at different approaches for modelling different types of heterogeneity. Fosgerau starts by discussing the use of nonparametric distributions in random coefficients models, and specifically looks at simple ways of capturing the unknown shapes. Hensher looks at attribute processing as a key source for heterogeneity across individual decision-makers, presents an overview of modelling approaches in that area and makes the link with the wider literature on heuristics. Chorus is specifically concerned with the assumptions on decision rules in models and the potential advantages but also pitfalls associated with using alternative decision paradigms in our analyses. Finally, Hess looks back at the use of latent class approaches as a tool for capturing heterogeneity, and places this in the context of recent work on the topics of attribute processing and decision rules covered in the preceding two chapters.

While a large majority of research continues to focus on simple discrete choice, there are departures from this in multiple directions. In the first chapter of Part IV (Chapter 15), Greene presents a thorough introduction to models for choosing among ordered alternatives. A different departure from simple discrete choice comes in the form of decisions made jointly by multiple individuals, and an in-depth discussion of appropriate models for such choices is given by de Palma, Picard and Inoa in Chapter 16. One of the most active areas of research in choice modelling in the last few years has been concerned with the development of hybrid choice models, most notably with a view to accommodating a range of soft factors such as plans, attitudes and perceptions in decision-making; Abou-Zeid and Ben-Akiva provide an overview of the advantages of such models over structures explaining choices alone. Harrison and Martínez-Correa next return to the issue of decision-making under risk and uncertainty and, specifically, how individuals manage these risks when making choices.
Finally, Bhat and Pinjari look at the important link between discrete choices and continuous consumption, providing an overview of existing work on multiple discrete–continuous choices and setting a research agenda for the field. The development of more powerful computers and estimation techniques has opened up the possibility of working with ever more complex model structures, but this in turn
poses new issues in terms of model specification and inference. Additionally, model complexity is at the very least keeping up with the developments in computational power, and model estimation of the most advanced structures remains a substantial challenge. In many areas, Bayesian techniques have been put forward as an alternative to classical approaches, and in the first chapter in Part V (Chapter 20), Lenk gives a historical overview as well as providing the readers with a range of techniques to use in practice. While sample-level estimation remains the standard approach for the majority of choice modelling applications, the ability to estimate models at the level of individual decision makers has major appeal, and Frischknecht, Eckert, Louviere and Ribeiro discuss simple ways of doing so, drawing also on developments on best–worst scaling, linking to Flynn and Marley's Chapter 8 in this volume. The growing popularity of hybrid choice models, as discussed earlier in Chapter 17 by Abou-Zeid and Ben-Akiva, has posed new issues notably in terms of identification and this is discussed in detail by Vij and Walker in the next chapter. In the final chapter in this part, Bunch gives a very thorough grounding in alternative estimation and inference techniques, with a strong focus on the inner workings of the approaches.

While model specification and estimation receive the majority of attention, especially in the academic literature, the real-world emphasis is on the analysis and use of results. Two different uses are covered in the penultimate part of the book, with Karlström focusing on use of model results in appraisal while Daly looks at using the models themselves in forecasting future choices.

Up to this point, the chapters have focused on individual topics of research, the majority of which cross the boundaries of different disciplines. The final part of the book recognises that the specific needs in terms of methodology and behavioural insights vary substantially across fields. To this extent, we include four chapters covering what we believe to be the most active disciplines in choice modelling, each time setting an agenda for future research. Swait and Feinberg cover marketing, Adamowicz, Glenk and Meyerhoff focus on environmental and resource economics, Lancsar and Burge do the same for health economics and, leaving our own discipline until the end, Ortúzar, Cherchi and Rizzi look at research needs in transport.

The topics covered in this book are a result of the editors' perceptions of key topics, also informed by discussions with leading colleagues, where we especially appreciated a substantial input from Moshe Ben-Akiva. A volume such as this can of course never be complete and readers will form their own opinion of where the gaps are – one day, there might be a second edition. Until then, it remains for us to thank all of the authors for producing the chapters that form this handbook, and for taking on board the feedback from a long list of reviewers whose efforts are also greatly appreciated. Without these two groups, a volume such as this would of course not be possible.
2 The new science of pleasure: consumer choice behavior and the measurement of well-being
Daniel McFadden
Illusion, Temperament, Succession, Surface, Surprise, Reality, Subjectiveness – these are threads on the loom of time, these are the threads of life. (Ralph Waldo Emerson, 1844)

Let there be granted to the science of pleasure what is granted to the science of energy, to imagine an ideally perfect instrument, a psychophysical machine, continually registering the height of pleasure experienced by an individual, exactly according to the verdict of consciousness, or rather diverging therefrom according to a law of errors. From moment to moment the hedonimeter varies; the delicate index now flickering with the flutter of the passions, now steadied by intellectual activity, low sunk whole hours in the neighbourhood of zero, or momentarily springing up toward infinity. (Edgeworth, 1881)
1 INTRODUCTION
At the base of economic analysis is the consumer, whose behavior and well-being motivate a whole gamut of questions spanning demand analysis, incentive theory and mechanism design, project evaluation, and the introduction and marketing of private and public goods and services. Understanding and modeling consumer welfare was central in early economics, and remains so, with a continuing tension between elements of illusion, temperament, and subjectivity in consumer behavior, and the need for stable, predictive indicators for choice and well-being. The neoclassical model of the individualistic utility-maximizing consumer that forms the basis of most economic analysis is largely a finished subject, but new studies of consumer behavior and interesting new measurements challenge this model. This behavioral revaluation suggests new directions for the continuing development of choice theory. This chapter surveys the history of the study of consumer behavior and well-being, with particular attention to the lessons and opportunities afforded by new measurements coming into economics from cognitive psychology, anthropology, market science and neurology. The chapter focuses on the perceptions, emotions and behavior of individual consumers, and touches only briefly on important related issues of interpersonal comparisons and economic policy evaluation. I start with the views of the classical economists on happiness and utility. I first discuss attempts at measurement, followed by the flowering of demand analysis in the age of Sir Richard Stone. I then turn to expansions of neoclassical demand measurement, particularly to the subjects of choice in nonlinear and discrete budget sets, and finally to the new frontiers of measurement shared by economics and other disciplines.
2 PLEASURE, PAIN, UTILITY
Systematic study of consumer motivation and well-being started with Jeremy Bentham, who still sits, stuffed, in University College London, and to this day is reputed to be the life of any party of economists that he joins. In Introduction to the Principles of Morals and Legislation, published in 1789, Bentham laid out the concept of consumers driven by self-interest to increase pleasure and reduce pain: 'My notion of man is that . . . he aims at happiness . . . in every thing he does.' Bentham and his successors explored the economic implications and moral content of utilitarianism, but despite their quantitative rhetoric, they were not much concerned with the actual measurement of happiness. It is not that they considered utility unmeasurable. Quite the opposite: by introspection utility existed, and its practical measurement was not needed for drawing out the broad principles of utilitarianism. Choice was viewed as an automatic consequence of self-interest, not as behavior that could put utilitarianism to test. Pursuit of happiness explained everything, and predicted nothing. A comment by Frank Taussig (1912), at the end of the classical era, summarizes nicely the utilitarian attitude:

An article can have no value unless it has utility. No one will give anything for an article unless it yield him satisfaction. Doubtless people are sometimes foolish, and buy things, as children do, to please a moment's fancy; but at least they think at the moment that there is a wish to be gratified. Doubtless, too, people often buy things which, though yielding pleasure for the moment, or postponing pain, are in the end harmful. But here . . . we must accept the consumer as the final judge. The fact that he is willing to give up something in order to procure an article proves once for all that for him it has utility – it fills a want.
The writings of the utilitarians provide insight into the nature and dimensions of well-being, and the problem of its measurement. Bentham thought about the pursuit of happiness in ways that did not fit into the later neoclassical synthesis, but which resonate with contemporary behavioral studies. Bentham's utility was attached to the experience or sensation that objects and actions produced, their pleasure-increasing or pain-reducing effect. Later, utility became identified with a state of being defined by the consequences of actions rather than the processes producing these consequences. The behavioral revaluation supports the earlier view that attaches utility to process rather than to consequence. Bentham almost always distinguished increased pleasure and reduced pain as two distinct sources of happiness. This may just have been his penchant to say anything worth saying more than once, but perhaps he recognized that people respond differently to perceived gains and losses, a view supported by contemporary brain science.

Bentham laid out four critical dimensions that determine the utility of an experience: intensity, duration, certainty or uncertainty, and propinquity or remoteness. Clearly, Bentham's first two dimensions anticipated the utility of an episode as an integral of intensities over some time interval, although formalization of that idea would not come until Francis Edgeworth a century later. The third dimension anticipated a utility theory for risky prospects, and the fourth, intertemporal preferences and discounting. Also clearly present in classical economics are allowances for reciprocity and altruism in the determination of happiness. Bentham (1789) stressed the role of reciprocity:
By the self-regarding principle, the more urgent the need a man feels himself to have of the kindness and good will of others, the more strenuous and steady will be his exertion for the obtaining it. . . . The stronger a man's need of the effective benevolence of others, the stronger the inducement he has for the manifesting effective benevolence as towards them.
Adam Smith (1759) noted the importance of altruism, particularly within families: How selfish soever man may be supposed, there are evidently some principles in his nature, which interest him in the fortune of others, and render their happiness necessary to him, though he derives nothing from it, except the pleasure of seeing it. Every man feels [after himself, the pleasures and pains] of the members of his own family. Those who usually live in the same house with him, his parents, his children, his brothers and sisters, are naturally the objects of his warmest affections.
Edgeworth (1881) noted ways in which such altruism is reflected in behavior: [E]fforts and sacrifices . . . are often incurred for the sake of one’s family rather than oneself. The action of the family affections ‘has always been fully reckoned with by economists’, especially in relation to the distribution of the family income between its various members, the expenses of preparing children for their future career, and the accumulation of wealth to be enjoyed after the death of him by whom it has been earned.
Classical economics came slowly to the problem of recovering utility from observed behavior. Adam Smith (1776) described how 'haggling and bargaining in the market' would achieve 'rough equality' between value in use and value in exchange. Working at the fringes of mainstream economics, Jules Dupuit (1844) and Hermann Gossen (1854) deduced that consumers exhibiting diminishing marginal utility would achieve maximum utility by equalizing the marginal utility per unit of expenditure across various goods. Dupuit was remarkably prescient, recognizing that an individual demand curve can be identified with a marginal utility curve for a good, provided the marginal utility of money remained constant, and showing that the area behind the demand curve then gave a measure of 'relative utility', or in Marshall's later terminology, consumer surplus. Dupuit's idea of solving the inverse problem (Figure 2.1), recovering utility from demand, was brought into the mainstream at the end of the nineteenth century by William Stanley Jevons (1871), Edgeworth (1881), Alfred Marshall (1895) and Vilfredo Pareto (1906); see Newman (2003). With the refinements introduced by John Hicks (1939) and Paul Samuelson (1947), it remains today the standard approach to measuring and predicting consumer welfare.

[Figure 2.1 Dupuit's inverse problem: a demand curve (demand = marginal utility) plotting bridge toll against trips, with the area behind the demand curve giving the utility difference.]

In this era, economists also began to step back from introspective explanations of utility, instead treating it as a black box whose inner workings were not their concern. Irving Fisher (1892) makes the argument:

● To fix the idea of utility, the economist should go no further than is serviceable in explaining economic facts. It is not his province to build a theory of psychology.
● Whether the necessary antecedent of desire is 'pleasure', or whether independently of pleasure it may sometimes be 'duty' or 'fear' concerns a phenomenon of the second remove from the economic act of choice.
The emphasis on characterizing utility solely in terms of the demand behavior it produced became the centerpiece of neoclassical consumer theory, perfected by Eugen
Slutsky (1915), John Hicks (1939) and Paul Samuelson (1947), and in its purest statement forming the theory of revealed preference. This was a great logical achievement, but the demands of the analysis also narrowed and stiffened the way economists thought about preferences. The cardinal, proto-physiological utility of Bentham and Edgeworth was weakened to an ordinal index of preference. The domain of utility moved from activities or processes to the commodity vectors that were the consequence of choice. Self-interest was defined narrowly to include only personally purchased and consumed goods; reciprocity and altruism were ignored. No allowance was made for ambiguities and uncertainties regarding tastes, budgets, the attributes of goods, or the reliability of transactions. The Hicks–Samuelson formulation was fundamentally static, with the consumer making a once-and-for-all utility-maximizing choice of market goods. Utility in this formulation is usually interpreted as the felicity produced by flows of non-durable goods and services from durable goods. However, from the time of Fisher (1930), there were also neoclassical models of inter-temporal utility and the dynamics of choice. I discuss these in more detail in section 6. The remainder of this section sets notation with an abbreviated restatement of the core of neoclassical demand analysis: introductory treatments are given in standard textbooks, for example, Varian (1992, chs 7 and 10), Mas-Colell et al. (1995, ch. 3E, F, G and I). I use the theory of duality, with indirect utility functions and expenditure functions linked to demands through Roy’s identity and Shephard’s identity, respectively. Major features of these dual functions follow from the envelope theorem, developed by Rudolph Auspitz and Richard Lieben (1889), and applied to consumer theory first by Irving Fisher (1892), and later by Harold Hotelling (1935), Rene Roy (1942), Paul
Samuelson (1947), Lionel McKenzie (1957) and Hirofumi Uzawa (1971). The full power of dual methods for derivation of demand systems or recovery of utility in econometric applications was not realized until the end of the 1950s, after the circulation of the unpublished lecture notes on convexity of Fenchel (1953), and the demonstration by Ron Shephard (1953) of the formal duality of input requirement sets and cost functions.

Let $p = (p_1, \ldots, p_n)$ denote a market good price vector in a positive cone $P \subset \mathbb{R}^n$ and $x = (x_1, \ldots, x_n)$ denote a vector of goods and services in a closed, bounded-below consumption set $X \subset \mathbb{R}^n$. Let Z denote a compact metric space of points z that are placeholders for later analysis of (1) attributes of market or non-market goods, or (2) the consumer's experience, information, social environment and predetermined choices. For example, z might characterize a state produced by experience and holdings of durables, or a predetermined location choice that determines the markets that are open to the consumer. Let R denote a compact metric space of points r interpreted as primitive characteristics of the individual (for example, genetic endowment) that shape tastes. The introduction of r will facilitate later analysis of unobserved taste heterogeneity. Suppose a consumer has a continuous utility index $U(x,z,r)$ defined on $X \times Z \times R$. The fundamental consumer sovereignty assumption of neoclassical theory requires that r not depend on opportunities or choice. The arguments (z,r) are suppressed in most textbook treatments, but are implicit in the neoclassical theory and can be developed to accommodate some important behavioral phenomena.

In the usual theory, the consumer seeks to maximize utility subject to a linear budget constraint $y \ge p \cdot x$, where y is an income level higher than the minimum necessary to make some vector in X affordable and lower than the expenditure needed to attain a bliss point in X. Make the standard assumption that in this range of income, local non-satiation holds, so that all income is spent; for example, at least one commodity is available in continuous amounts and always desired. In general, we do not require that X be a convex set, or that preferences be convex; that is, we do not require that U be a quasi-concave function of x. Define the Hicksian (compensated) demand function

$$x = H(p,u,z,r) \equiv \arg\min_{x \in X} \{\, p \cdot x \mid U(x,z,r) \ge u \,\} \tag{2.1}$$

and expenditure function

$$y = M(p,u,z,r) \equiv \min_{x \in X} \{\, p \cdot x \mid U(x,z,r) \ge u \,\}. \tag{2.2}$$

Income and prices in the expenditure function may be nominal values, or may be deflated to real values. For much of the following development, it is unnecessary to distinguish between nominal and real income and prices, but when the distinction matters, let (p,y) denote nominal values, $A(p)$ denote a price deflator that is a positive concave conical function of p, and write $(p/A(p), y/A(p))$ for the corresponding real values. Define the market demand function

$$x = D(p,y,z,r) \equiv \arg\max_{x \in X} \{\, U(x,z,r) \mid y \ge p \cdot x \,\} \tag{2.3}$$

and the indirect utility function

$$u = V(p,y,z,r) \equiv \max_{x \in X} \{\, U(x,z,r) \mid y \ge p \cdot x \,\}. \tag{2.4}$$
With local non-satiation, the expenditure function and indirect utility function satisfy the identities

$$\begin{gathered} y \equiv M(p,V(p,y,z,r),z,r) \equiv p \cdot H(p,V(p,y,z,r),z,r) \\ D(p,y,z,r) \equiv H(p,V(p,y,z,r),z,r), \qquad H(p,u,z,r) \equiv D(p,M(p,u,z,r),z,r) \\ V(p,y,z,r) \equiv U(D(p,y,z,r),z,r). \end{gathered} \tag{2.5}$$

Shephard's identity establishes that when M is differentiable in p,

$$H(p,u,z,r) \equiv \nabla_p M(p,u,z,r), \tag{2.6}$$

while Roy's identity establishes that when V is differentiable in p and in y,

$$D(p,y,z,r) \cdot \nabla_y V(p,y,z,r) \equiv -\nabla_p V(p,y,z,r). \tag{2.7}$$

When U(x,z,r) is quasi-concave and non-decreasing in x, the dual mappings

$$U(x,z,r) = \min_{p \in P} V(p, p \cdot x, z, r) = \max \{\, u \mid p \cdot x \ge M(p,u,z,r) \text{ for } p \in P \,\} \tag{2.8}$$

recover the direct utility function; otherwise, they recover the closed quasi-concave free-disposal hull of the direct utility function. Substituting the direct or indirect utility function into an expenditure function gives a monotone increasing transformation that is again a utility function, now denominated in dollars and termed a money-metric direct or indirect utility function,

$$u = h(p',z';x,z,r) \equiv M(p',U(x,z,r),z',r), \qquad u = \mu(p',z';p,y,z,r) \equiv M(p',V(p,y,z,r),z',r), \tag{2.9}$$

where (p′,z′) determine a benchmark metric and (x,z) or (p,y,z) determine the utility level. The function μ behaves like an expenditure function in p′ and an indirect utility function in (p,y), and satisfies μ(p,z;p,y,z,r) ≡ y; see Hurwicz and Uzawa (1971), Hammond (1994), McFadden (1999b).
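To make these dual objects concrete, here is a minimal numerical sketch (not from the chapter): it assumes a two-good Cobb–Douglas utility, for which V, M, D and H have closed forms, and checks the identities (2.5)–(2.7) by finite differences. The parameter values and function names are invented for illustration.

```python
import numpy as np

a = 0.3  # illustrative Cobb-Douglas share parameter

def V(p, y):
    """Indirect utility for U(x) = a ln x1 + (1 - a) ln x2."""
    return a * np.log(a * y / p[0]) + (1 - a) * np.log((1 - a) * y / p[1])

def M(p, u):
    """Expenditure function: minimum income attaining utility u at prices p."""
    return np.exp(u) * (p[0] / a) ** a * (p[1] / (1 - a)) ** (1 - a)

def D(p, y):
    """Marshallian (market) demand."""
    return np.array([a * y / p[0], (1 - a) * y / p[1]])

def H(p, u):
    """Hicksian demand: Marshallian demand at compensated income, as in (2.5)."""
    return D(p, M(p, u))

p, y = np.array([2.0, 5.0]), 100.0
u = V(p, y)

assert np.isclose(M(p, u), y)            # y = M(p, V(p,y))        (2.5)
assert np.allclose(D(p, y), H(p, u))     # D = H at u = V(p,y)     (2.5)

eps = 1e-6
dM_dp1 = (M(p + np.array([eps, 0.0]), u) - M(p, u)) / eps
assert np.isclose(dM_dp1, H(p, u)[0], rtol=1e-4)           # Shephard (2.6)

dV_dy = (V(p, y + eps) - V(p, y)) / eps
dV_dp1 = (V(p + np.array([eps, 0.0]), y) - V(p, y)) / eps
assert np.isclose(-dV_dp1 / dV_dy, D(p, y)[0], rtol=1e-4)  # Roy      (2.7)
```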
A concern of neoclassical demand analysis, and a first question for measurement of well-being, is whether preferences or an indirect utility function can be recovered from an individual's observed market demand function D(p,y,z,r), provided it satisfies the necessary conditions implied by utility maximization subject to budgets y ≥ p·x. With qualifications, affirmative answers have been provided by two different lines of argument. The first, originating in the integrability analysis of Antonelli (1886) and Samuelson (1950), can be characterized as giving sufficient conditions under which Roy's identity (2.7), treated as a partial differential equation in V, has a solution. Hurwicz and Uzawa (1971) give local and global sufficient conditions for recovery of money-metric indirect utility when market demand functions are single-valued and smooth; a summary of their argument is given by Katzner (1970). The second, originating in the revealed preference analysis of Samuelson (1948), Houthakker (1950), and Richter (1966), gives necessary and sufficient conditions for recovery of a preference order whose maximization yields the observed demand function; Afriat (1967) and Varian (2006) provide constructive methods for recovery of utility under some conditions. Qualifications are required because quite strong smoothness and curvature conditions on utility are needed to assure smoothness properties on market demand, and preferences recovered from upper hemicontinuous demand functions are not necessarily continuous; see Peleg (1970), Rader (1973), Conniffe (2007).

An important caution is that even when consumer behavior is formulated in terms of money-metric utility, and discussed using phrases like 'marginal utility of money' and 'diminishing marginal utility', the indices U(x,z,r) and V(p,y,z,r) that can be recovered from observed demand are purely ordinal. Suppose z can be partitioned into z = (z₁,z₂), where z₁ has a direct identifiable effect on market demand and z₂ does not influence market demand; that is, D(p,y,z₁,z₂,r) is independent of z₂, but if z₁ ≠ z₁′, then there exist (p,y,r) such that D(p,y,z₁,z₂,r) ≠ D(p,y,z₁′,z₂,r). Let V(p,y,z₁,r) denote an 'economical' ordinal representation of preferences for market goods that satisfies Roy's identity but does not depend on z₂. Suppose there exists a true neurologically-determined hedonic index V*(p,y,z₁,z₂,r) that would be ideal for the assessment of consumer welfare, and suppose that it does depend on z₂. Because V and V* both represent the preferences that determine market demand, they are linked by a transformation V*(p,y,z₁,z₂,r) ≡ f(V(p,y,z₁,r),z₁,z₂,r), where f(v,z,r) is a smooth function that is increasing in v. Now, V and V* are equally legitimate utility functions from the standpoint of economic demand analysis. However, even though the variables z₂ influence pleasure or pain, they have no influence on market demand behavior, and within neoclassical demand analysis have no identifiable or econometrically recoverable effect on well-being. Section 7 discusses contemporary attempts to go outside the neoclassical model to measure such effects by 'making a market' for z₂ via incentive-compatible mechanisms for eliciting values.
3 FIRST MEASUREMENTS
In the days before digital computers, data on consumer behavior was limited and statistical computation was laborious. Consequently, empirical measurement of utility came slowly. One of the first serious attempts was made by Ragnar Frisch (1926, 1932), specializing a framework initially proposed by Irving Fisher (1892, 1918, 1927). Frisch used 31 monthly observations from Paris starting in 1920 on income, and the price and consumption of sugar. Frisch's formulation now seems restrictive and a little awkward, but it was suited to the computational limits of the day and contained the important ideas of separable utility and composite commodities. In modern terminology, Frisch postulated that the demand for sugar could be written as

$$x = D(p,y) \equiv f'(p)/g'(y), \tag{2.10}$$

where p was the real price of sugar and y was real income, with deflation to real values using a price index for a composite of the remaining commodities, g′ was a decreasing
function interpreted as the marginal utility of money, and f′ was a decreasing function interpreted as the inverse of the marginal utility of sugar. This demand function has an associated indirect utility function that is additively separable in real income and the real price of sugar, so that the marginal utility of money is independent of the price of sugar,

$$u = V(y,p) = g(y) - f(p). \tag{2.11}$$

The quasi-convexity requirement for indirect utility functions is met if g is convex and f is concave, but somewhat weaker requirements suffice on a restricted (p,y) domain.
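As an illustration of the separable structure (2.10)–(2.11), the following sketch picks particular functional forms, g(y) = ln y and f(p) = β ln p; these are invented for the example and are not Frisch's actual specification. It checks that Roy's identity applied to V(y,p) = g(y) − f(p) reproduces x = f′(p)/g′(y).

```python
import numpy as np

beta = 0.02  # invented taste parameter for sugar

def g(y): return np.log(y)          # assumed utility-of-income component
def f(p): return beta * np.log(p)   # assumed (concave) price component

def demand(p, y):
    """Frisch-style demand (2.10): x = f'(p) / g'(y)."""
    f_prime = beta / p  # decreasing in p, as the text requires
    g_prime = 1.0 / y   # the marginal utility of money, independent of p
    return f_prime / g_prime

# Roy's identity check against V(y,p) = g(y) - f(p), eq. (2.11)
p, y, eps = 1.5, 200.0, 1e-6
V = lambda p, y: g(y) - f(p)
Vp = (V(p + eps, y) - V(p, y)) / eps
Vy = (V(p, y + eps) - V(p, y)) / eps
assert np.isclose(-Vp / Vy, demand(p, y), rtol=1e-4)
```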
4 THE STONE AGE
Econometric demand analysis flowered in the 1960s, as improved data and digital computers made serious empirical work possible. The real starting point was the contribution of Richard Stone (1954), who estimated expenditure systems linear in income that were derived from Cobb–Douglas demands, translated to allow committed expenditures,

$$x_i = c_i + q_i (y - p \cdot c)/p_i. \tag{2.12}$$

Here, $i = 1, \ldots, n$ indexes the commodities, y is income, p is a vector of commodity prices, $c = (c_1, \ldots, c_n)$ is a vector of committed demands, and $q_1, \ldots, q_n$ are positive parameters that sum to one. The Stone system is a special case of the polar form of Terence Gorman (1953, 1961),

$$x_i = C_i(p,z,r) + (y - C(p,z,r))\, A_i(p,z,r)/A(p,z,r), \tag{2.13}$$

where C and A are concave, non-decreasing, conical functions of prices that may depend on experience and tastes through the arguments (z,r), derived from an indirect utility function

$$u = (y - C(p,z,r))/A(p,z,r), \tag{2.14}$$

and $C_i$ and $A_i$ denote derivatives with respect to $p_i$. The Gorman polar form can be generalized to allow more flexible Engel curves by introducing a monotone transformation of deflated income,

$$u = g(y/A(p,z,r)) - C(p,z,r)/A(p,z,r), \tag{2.15}$$

with quasi-convexity of the indirect utility function restricting the curvature of g and/or the domain of (p,y). The corresponding demand function is

$$x_i = C_i(p,z,r)/g'(y/A(p,z,r)) + \bigl(y - C(p,z,r)/g'(y/A(p,z,r))\bigr)\, A_i(p,z,r)/A(p,z,r). \tag{2.16}$$

Frisch's original demand function for sugar is of this generalized Gorman polar form, with sugar excluded from the price index A(p).
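A minimal sketch of the Stone system (2.12), with invented parameter values: it computes the linear expenditure system demands and checks the adding-up condition p·x = y, which follows from the q_i summing to one.

```python
import numpy as np

# Invented parameters for a three-good Stone system (2.12)
p = np.array([2.0, 1.0, 4.0])  # prices
c = np.array([5.0, 8.0, 1.0])  # committed quantities
q = np.array([0.5, 0.3, 0.2])  # marginal budget shares; sum to one
y = 100.0                      # income, above the committed spend p.c

def stone_demand(p, y, c, q):
    """Linear expenditure system: x_i = c_i + q_i (y - p.c) / p_i."""
    supernumerary = y - p @ c  # income left after committed purchases
    return c + q * supernumerary / p

x = stone_demand(p, y, c, q)
assert np.isclose(p @ x, y)    # adding-up: all income is spent
```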
In the 1960s and 1970s, a variety of econometric demand systems were proposed, many derived from specifications of expenditure or indirect utility functions. Important early contributions were the direct and indirect addilog systems of Houthakker (1960), and the Constant Elasticity of Substitution (CES) form of Arrow et al. (1961). A number of econometric demand systems were developed at Berkeley by students working with me and my colleagues Dale Jorgenson and Robert Hall. In 1963, Erwin Diewert proposed a Generalized Leontief cost function that was quadratic in square roots of prices; see Diewert (1971) and Blackorby and Diewert (1979). I pointed out that this system could be interpreted as a second-order Taylor's expansion of any smooth cost function, so that it had the nice property that at the approximation point it could reproduce all the own and cross-price elasticities of the original. We named this the flexible functional form property, and it became one of the criteria guiding subsequent developments. Dale Jorgenson and Larry Lau devised the translog system, another flexible functional form that generalized the Houthakker indirect addilog system; see Christensen et al. (1975). Another major contribution to the specification of demand systems, influenced by both the Berkeley tradition and by Terence Gorman, was the Almost Ideal Demand System proposed by Angus Deaton and John Muellbauer (1980a, 1980b), with the indirect utility function

$$u = \Bigl[\ln y - \alpha_0 - \sum_{k=1}^{n} \alpha_k \ln p_k - \tfrac{1}{2} \sum_{k=1}^{n} \sum_{j=1}^{n} \gamma_{kj} \ln p_k \ln p_j\Bigr] \Big/ \; \beta_0 \prod_{k=1}^{n} p_k^{\beta_k}, \tag{2.17}$$

and demand functions whose expenditure shares are linear in logs of income and prices. This is a Gorman generalized polar form with translog committed expenditures and a Cobb–Douglas price index. Zero degree homogeneity of (2.17) in income and prices requires the parameter restrictions $\gamma_{kj} = \gamma_{jk}$, $\sum_{k=1}^{n} \alpha_k = 1$, $\sum_{k=1}^{n} \beta_k = 0$, and $\sum_{k=1}^{n} \gamma_{kj} = 0$, and quasi-convexity restricts the (p,y) domain. In general, the parameters in (2.17) depend on r and can be functions of z.

While the demand systems (2.12)–(2.17) were derived from the theory of the individual consumer, they were typically applied to observations on cross-sections of individuals, or to market aggregates, by assuming a representative consumer. Except under special circumstances (see for example Chipman and Moore, 1980, 1990), this presumed homogeneous preferences, or in the later work of Jorgenson et al. (1980, 1997) and Arthur Lewbel (1992), preference heterogeneity parameterized as a function of observables. The utility-consistent demand systems mentioned above generally worked well to explain demand at the market level despite the representative consumer restriction. Lester Taylor (2005) estimates neoclassical demand systems using US Consumer Expenditure Survey quarterly expenditure data and ACCRA Cost of Living indices across urban areas in six expenditure categories. Table 2.1 gives price and expenditure elasticities for an Almost Ideal Demand System fitted to these data. An example of a use for these results is calculation of excise tax structures that maximize well-being subject to budget and distributional constraints. Taylor points out that there are substantive aggregation, quality, and taste heterogeneity issues in the use of such data, but his results are generally consistent with other studies. He finds that Stone, indirect addilog, and direct addilog systems give qualitatively similar results.
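The expenditure shares implied by (2.17) are the familiar Deaton–Muellbauer share equations, $w_i = \alpha_i + \sum_j \gamma_{ij} \ln p_j + \beta_i \ln(y/a(p))$, where $\ln a(p)$ is the translog term in square brackets; this is a standard result for the model, stated here for illustration rather than taken from the chapter. A minimal sketch with invented parameters satisfying the stated restrictions; adding-up of the shares follows directly from those restrictions.

```python
import numpy as np

# Invented parameters satisfying the restrictions stated under (2.17)
alpha0 = 1.0
alpha = np.array([0.6, 0.4])          # sum to one
beta = np.array([0.1, -0.1])          # sum to zero
gamma = np.array([[0.05, -0.05],
                  [-0.05, 0.05]])     # symmetric, columns sum to zero

def ln_a(p):
    """Translog price index: the bracketed committed-cost part of (2.17)."""
    lp = np.log(p)
    return alpha0 + alpha @ lp + 0.5 * lp @ gamma @ lp

def shares(p, y):
    """AIDS budget shares: w_i = alpha_i + sum_j gamma_ij ln p_j + beta_i ln(y/a(p))."""
    return alpha + gamma @ np.log(p) + beta * (np.log(y) - ln_a(p))

w = shares(np.array([2.0, 3.0]), 50.0)
assert np.isclose(w.sum(), 1.0)  # adding-up follows from the restrictions
```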
Table 2.1 Price and total expenditure elasticities, Almost Ideal Demand System, 1995 CES-ACCRA Surveys, from Lester Taylor (2005). Rows are expenditure categories; columns give elasticities with respect to each price and to total expenditure.

            Food      Shelter   Utilities  Trans.    Health    Misc.     Total exp.
Food       −0.2981    0.6644    0.0599    −0.0013    0.1400   −0.5044    0.4469
Shelter    −0.1105   −0.8285    0.1909     0.1902    0.2782   −0.5777    0.8876
Utilities  −0.1071    0.1638   −0.7222     0.0523   −0.0669    0.1783    0.4612
Trans.     −0.6134   −0.2520   −0.2471    −1.3739   −0.7627    1.5824    1.7250
Health     −0.7813    0.0023    0.4260    −0.0129   −0.9375    0.8318    0.6338
Misc.       0.4395   −0.2179   −0.2267    −0.0154    0.0470   −1.1448    1.2150

5 CONSUMER WELL-BEING
How does a change in a consumer's economic environment from (p′,y′,z′) to (p″,y″,z″) affect her indirect utility, the neoclassical measure of well-being? The concept of consumer surplus from Dupuit, Marshall, and Hicks can be interpreted as a money metric for changes in indirect utility, an adjustment to income that equates indirect utilities before and after a change in the economic environment. The compensating variation or willingness-to-pay (WTP) for this change is the net reduction in final income that makes the consumer indifferent to the change, V(p″,y″ − WTP,z″,r) = V(p′,y′,z′,r). Substituting these values into M(p″,u,z″,r) gives the money-metric equality μ(p″,z″;p′,y′,z′,r) = μ(p″,z″;p″,y″ − WTP,z″,r) ≡ y″ − WTP, or

$$\begin{aligned} \mathrm{WTP} &= \mu(p'',z'';p'',y'',z'',r) - \mu(p'',z'';p',y',z',r) \equiv y'' - \mu(p'',z'';p',y',z',r) \\ &\equiv \{y'' - y'\} - \{\mu(p',z'';p',y',z',r) - \mu(p',z';p',y',z',r)\} \\ &\quad - \{\mu(p'',z'';p',y',z',r) - \mu(p',z'';p',y',z',r)\}. \end{aligned} \tag{2.18}$$

The last identity decomposes WTP into the observed net increase in money income, less the net increase in income necessary at initial prices and utility level to offset the change in non-market attributes, less the net increase in income necessary to offset the change in prices at the final non-market attributes and initial utility level. The last bracketed term in (2.18) can be written

$$\mu(p'',z'';p',y',z',r) - \mu(p',z'';p',y',z',r) = \int_{p'}^{p''} H(p,u',z'',r) \cdot dp, \tag{2.19}$$

the Hicksian net consumer surplus from the price change from p′ to p″, where u′ = V(p′,y′,z′,r) is the initial utility level. This integral is independent of path and can be taken over any rectifiable path from p′ to p″. When z″ = z′ and y″ = y′, so that only price changes, WTP equals the Hicksian consumer surplus.
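A numerical check of (2.19) under the same illustrative Cobb–Douglas setup used earlier (all values invented): when only one price changes with income fixed, the compensated income adjustment M(p″,u′) − y equals the Hicksian consumer-surplus integral along the price path.

```python
import numpy as np

a = 0.3  # Cobb-Douglas share, as in the earlier sketch

def V(p1, p2, y):  # indirect utility
    return np.log(y) - a * np.log(p1 / a) - (1 - a) * np.log(p2 / (1 - a))

def M(p1, p2, u):  # expenditure function
    return np.exp(u) * (p1 / a) ** a * (p2 / (1 - a)) ** (1 - a)

def H1(p1, p2, u):
    """Hicksian demand for good 1, from Shephard's identity (2.6)."""
    return a * M(p1, p2, u) / p1

y, p2 = 100.0, 1.0
p1_old, p1_new = 2.0, 3.0          # only the price of good 1 changes
u_old = V(p1_old, p2, y)

# With z and y fixed, WTP is minus the extra income needed to restore u_old
wtp = y - M(p1_new, p2, u_old)

# Hicksian consumer-surplus integral (2.19), trapezoidal rule along the path
grid = np.linspace(p1_old, p1_new, 100_001)
h = H1(grid, p2, u_old)
cs = np.sum((h[:-1] + h[1:]) / 2 * np.diff(grid))
assert np.isclose(-wtp, cs, rtol=1e-6)   # surplus loss equals compensation
```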
The Equivalent Variation or Willingness-to-Accept (WTA) the change is the net addition to initial income that makes the consumer indifferent to the change, V(p″,y″,z″,r) = V(p′,y′ + WTA,z′,r), or

$$\begin{aligned} \mathrm{WTA} &= \mu(p',z';p'',y'',z'',r) - \mu(p',z';p',y',z',r) \equiv \mu(p',z';p'',y'',z'',r) - y' \\ &\equiv \{y'' - y'\} - \{\mu(p'',z'';p'',y'',z'',r) - \mu(p'',z';p'',y'',z'',r)\} \\ &\quad - \{\mu(p'',z';p'',y'',z'',r) - \mu(p',z';p'',y'',z'',r)\}. \end{aligned} \tag{2.20}$$

Again, the final decomposition is the observed net increase in money income, less the net increase in income necessary at final prices and utility level to offset the change in non-market attributes, less the net increase in income necessary to offset the change in prices at the initial non-market attributes and final utility level, with the last term expressible as a Hicksian consumer surplus integral similar to (2.19), but with Hicksian demand evaluated at the final utility level u″ = V(p″,y″,z″,r).

Putting income and prices in real terms and using the first identity in (2.5), the definition (2.9), and the theorem of the mean, WTP and WTA can be written

$$\begin{aligned} \mathrm{WTP} &= [V(p'',y'',z'',r) - V(p',y',z',r)]/\nabla_y V(p'',y^a,z'',r) \\ &\approx \mathrm{MWTP} \equiv [V(p'',y'',z'',r) - V(p',y',z',r)]/\nabla_y V(p'',y'',z'',r), \\ \mathrm{WTA} &= [V(p'',y'',z'',r) - V(p',y',z',r)]/\nabla_y V(p',y^b,z',r) \\ &\approx \mathrm{MWTA} \equiv [V(p'',y'',z'',r) - V(p',y',z',r)]/\nabla_y V(p',y',z',r), \end{aligned} \tag{2.21}$$

where $y^a$ and $y^b$ are points in the line segment between y′ and y″, marginal willingness-to-pay (MWTP) approximates WTP when y′ and y″ are close, and marginal willingness-to-accept (MWTA) approximates WTA when y′ and y″ are close. An implication of (2.21) is that WTP, MWTP, WTA, and MWTA all have the same sign, and can be interpreted as alternative scalings of the indirect utility difference V(p″,y″,z″,r) − V(p′,y′,z′,r). If before and after prices are scaled so that y′ = y″, then WTP and MWTP coincide, as do WTA and MWTA. This result also holds if the marginal utility of income is independent of y, a case that corresponds, at least locally, to an indirect utility function V(p,y,z,r) = V₁(p,z,r)·y + V₂(p,z,r) that is linear in real income. The Gorman polar form has this structure, and allows easy welfare calculations, with WTP and WTA differing only by a scale factor, WTP·A(p″,z″,r) = WTA·A(p′,z′,r) = [y″ − y′ − C(p″,z″,r) + C(p′,z′,r)]. If the marginal utility of income is independent not only of y, but also of p and z, then WTP and WTA coincide, and the indirect utility function reduces to V(p,y,z,r) = V₁(r)·y + V₂(p,z,r).

Neoclassical measurement of well-being starts from the assumption that one can identify and recover the market demand functions x = D(p,y,z,r) of individuals, and infer from these the features of money-metric utility necessary to do the consumer surplus calculation. Examine this question in the formula (2.18) for the WTP of an individual consumer. The first term in the final decomposition of (2.18) is just an observed income difference. The final term in the decomposition of (2.18), the Hicksian consumer surplus, can be recovered or bounded by first recovering the demand function using observations on choice at different prices and incomes from the same preferences, and it is well known that with sufficient variation in budgets, one can bound or recover exactly this consumer surplus; see Hendrik Houthakker (1950), Robert Willig (1976), Hal Varian (1982). This leaves the middle term in (2.18) to be identified. McFadden (2008) argues that this requires
either that choice be observed in situations where the environment z is determinative at an active margin, for example because z and p influence utility in a known interaction, or because discrete choices are made that select the environment; or that some non-market information on well-being be collected and used. McFadden (1986, 1994, 1999b, 2008) gives detailed discussions of identifying or bounding WTP and WTA using both revealed and hypothetical choice data.

In practice, the identification of demand for individual consumers is a challenging task. The market rarely provides natural experiments in which the same individual reveals demand in repeated choice situations that span a full domain (p,y) of prices and incomes, and a consumer's choice history modifies the experience vector z systematically, so that it will often be difficult to identify the separate effects of (p,y) and z. Only components of z that have active margins, in the sense that changing z changes market behavior, have neoclassically identifiable effects. Tacit in most applications of neoclassical welfare analysis is an assumption that market good prices and non-market attributes interact in such a way that changes in z can be translated into changes in effective market prices, and rolled into the consumer surplus calculation. For example, suppose direct utility is U(x*,r), where x*_j = x_j·f_j(z) is the 'quality-corrected' amount of good j, with an associated indirect utility function V(p*,y,r) of income and effective prices p*_j = p_j/f_j(z). Then z influences indirect utility only through effective prices, and the contribution to WTP from changes in p and z is given by a consumer surplus integral of the form (2.19) between initial and final effective prices, with a Hicksian demand integrand that also depends on z only through the effective prices.

The expressions (2.18)–(2.21) represent the full neoclassical elaboration of Dupuit's characterization of changes in well-being for consumers facing linear budget constraints, incorporating Hicks's refinement of compensating for the effect of income on marginal utility. However, as noted at the end of section 2, the effect on well-being of changes in non-market attributes z may not be identified from observable neoclassical demand behavior.
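A sketch of (2.18)–(2.21) in the same illustrative Cobb–Douglas setup (all values invented): it computes WTP, WTA and their marginal approximations, and illustrates the effective-price device for a quality change described above.

```python
import numpy as np

a = 0.3  # Cobb-Douglas share, as in the earlier sketches

def V(p, y): return np.log(y) - a * np.log(p[0] / a) - (1 - a) * np.log(p[1] / (1 - a))
def M(p, u): return np.exp(u) * (p[0] / a) ** a * (p[1] / (1 - a)) ** (1 - a)

p1, y1 = np.array([2.0, 1.0]), 100.0   # initial (p', y')
p2, y2 = np.array([2.5, 1.0]), 105.0   # final (p'', y'')
u1, u2 = V(p1, y1), V(p2, y2)

wtp = y2 - M(p2, u1)   # from V(p'', y'' - WTP) = V(p', y')
wta = M(p1, u2) - y1   # from V(p'', y'') = V(p', y' + WTA)

# Marginal approximations (2.21); for this V, grad_y V(p, y) = 1/y
mwtp, mwta = (u2 - u1) * y2, (u2 - u1) * y1
assert np.sign(wtp) == np.sign(wta) == np.sign(mwtp) == np.sign(mwta)

# Quality change via effective prices p*_j = p_j / f_j(z): a 20% quality
# gain in good 1 is worth the surplus between the two effective price vectors
f_new = np.array([1.2, 1.0])
wtp_quality = y1 - M(p1 / f_new, V(p1, y1))
assert wtp_quality > 0
```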
6 EXPANSIONS
As microdata on individuals and computational capacity have expanded over the last half-century, neoclassical econometric demand systems predicated on linear budget sets and representative consumers have proven uncomfortably restrictive. These systems could not deal easily with preference heterogeneity, acquired tastes, shifting hedonic attributes of commodities, nonlinear budget sets, time, space, or uncertainty, and the frequent cases of zero and lumpy purchases. It was necessary to expand the domain of the theory. This was done initially by retaining the central elements of standard neoclassical consumer theory, and bringing forward some of the broader components of utilitarianism in a way that was consistent with the neoclassical core, as illustrated in Figure 2.2. This meant preserving the tenets of consumer sovereignty and preference maximization, but admitting the influence of (observed and unobserved) experience and memory on perceptions and on current preferences, leading to heterogeneity across consumers. These extensions also allowed household production, nonlinear budget constraints, and utility maximization with strategic optimization and recalculation as events unfold. The following subsections describe each of these extensions.
[Figure 2.2 The extended neoclassical model: experience, information and memory shape perceptions/beliefs (statistical information processing, rational expectations) and preferences (memory-dependent utility of outcomes); together with the budget constraint (linear or nonlinear) and household production, these determine choice through utility maximization (static or dynamic).]
6.1 Preference Heterogeneity

The extension of neoclassical consumer theory to handle both tastes acquired as the result of observable experience and history and unobserved preference heterogeneity is a reaffirmation of circumstances allowed in the neoclassical model, but pushed aside to facilitate exposition and econometric estimation. In the summary given in the previous sections, I wrote utility U(x,z,r) as a function of observed experience z and unobserved tastes r, and these effects carried into the demand functions as arguments. A family of utility functions U(·,·,r) on R is termed a preference field, and a distribution on r determines a distribution of demands given z and market variables (p,y). Observed demand distributions then restrict or identify the underlying distributions of unobserved tastes. The primary problems in application are practical: how to measure and fold into the utility function all the varied experiences of consumers, and how to embed within the system and characterize the distribution of unobservable components of tastes. My original treatment of discrete choice as a result of random utility maximization (Domencich and McFadden, 1975; McFadden, 1974a, 1974b) illustrates a parametric solution. Modern developments allow both flexible parametric and nonparametric estimation; see, for example, Ichimura and Lee (1991), Horowitz (1992), Matzkin (1992, 1993), Ichimura and Thompson (1998), Pagan and Ullah (1999), Huang and Nychka (2000), Horowitz and Savin (2001), Signorini and Jones (2004) and Blundell et al. (2008, 2012).

There are advantages to shifting the focus of consumer theory from individual preferences to distributions of preferences. Both market demand and social welfare are
functions of these distributions, and do not require detailed preference information at the individual level. This can substantially reduce the requirements for information and experimental variation relative to those needed to identify individual utilities. Random and fixed effects panel data models in econometrics are an instructive analogy – random effects models require much less data for identification, but also require independence assumptions that are not needed in fixed-effects estimation.

However, preference heterogeneity raises conceptual issues. Is unobserved taste variation a permanent individual effect, or is there a component that varies with time or choice opportunity? The neoclassical presumption is that tastes within an individual are fixed. This is the setup of revealed preference theory, which envisions a sequence of budgets offered to an individual whose tastes are uninfluenced by the experience of previous offers, or by whims. Alternately, individual tastes may have an unobserved time-varying component. This is in itself not inconsistent with classical utilitarianism, which left room for utility to reflect 'a moment's fancy'. However, the presence of taste variations or hysteresis across a revealed preference sequence undermines the main revealed preference result that the convex hull of preferences can be recovered from observed demands. If instead preferences are treated as stochastic, interesting possibilities open for models with both intra-individual and cross-individual heterogeneity. Employing the theory of stochastic revealed preference (Block and Marschak, 1960; Marschak, 1960; Luce and Suppes, 1965; McFadden and Richter, 1990; McFadden, 2005; Fosgerau and McFadden, 2012), and panel data on demand, one could ask for conditions under which the distributions of the unobserved taste heterogeneity can be nonparametrically identified. Is it possible to untangle state-dependence and unobserved individual effects in consumer panels, the Heckman initial-values problem? Is it possible to separate heterogeneity in perceptions from heterogeneity in tastes when choice alternatives are risky or ambiguous? Is it possible to identify the distribution of preferences from market-level demand observations?
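A minimal simulation of the random-utility treatment mentioned above (illustrative systematic utilities, not from the chapter): with i.i.d. extreme-value (Gumbel) taste shocks, utility maximization reproduces the closed-form logit choice probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.array([1.0, 0.5, -0.2])  # invented systematic utilities

# Simulate random-utility maximization with i.i.d. Gumbel taste shocks
n = 200_000
eps = rng.gumbel(size=(n, v.size))
choices = np.argmax(v + eps, axis=1)
sim_p = np.bincount(choices, minlength=v.size) / n

logit_p = np.exp(v) / np.exp(v).sum()  # closed-form logit probabilities
assert np.allclose(sim_p, logit_p, atol=0.01)
```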
6.2 Nonlinear Budget Sets
The neoclassical focus on linear budgets and convex preferences neglected a range of consumer behavior that is apparent at the level of the individual, the lumpiness and mutual exclusivity of many consumer choices such as school, job and brand of automobile. It also neglected the important economic area of nonlinear pricing, arising from two-part and nonlinear tariffs, and progressive taxes. Extending econometric consumer theory to handle these applications required attention to the role of taste heterogeneity, and to the characterization of budgets. The duality methods that are so useful in linear budget problems are hampered here, but still valuable, for example in Dubin and McFadden (1984) and Hausman (1985). One important observation for measurement of consumer well-being is that nonlinear budget sets can be helpful in identifying neoclassical preferences. For example, preferences are recovered directly when budgets are restricted to binary comparisons. A useful tool for analyzing nonlinear budget sets within the framework of the neoclassical utility model, introduced by Matzkin and McFadden (2011) and developed by Fosgerau and McFadden (2012), considers a preference field formed by taking a base money-metric direct utility function h(p′,z′;x,z,r) as in (2.9) that is continuous in its arguments and embedding it in a family formed by additive linear perturbations q of marginal utility, h(p′,z′;x,z,r) + q·x. The perturbations introduced in this analysis can
be treated as a technical device and set to a fixed value at the end, but the full power of the approach is attained when these are true unobserved preference perturbations q that have an absolutely continuous distribution in the population. Fix the benchmark (p′,z′) and define the money-metric indirect utility function V(B,z,r,q) = max_{x∈B} {h(p′,z′;x,z,r) + q·x} and demand function D(B,z,r,q) = argmax_{x∈B} {h(p′,z′;x,z,r) + q·x} for any nonempty compact budget set B that intersects the consumption set X. Then, V is a convex function of q, the convex hull of D(B,z,r,q) equals the subdifferential of V with respect to q, and for almost all q, D(B,z,r,q) is a singleton. Thus, the perturbation vector q plays the same role for general budget sets that prices play in a standard expenditure function for linear budget sets. Fosgerau and McFadden give necessary and sufficient conditions for V to be a money-metric indirect utility function for a family of nonlinear budget sets. These conditions can be used in applications to generate generalizations of neoclassical expenditure systems and construct WTP measures for nonlinear budget sets.
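The geometry of this construction is easy to verify numerically. In the Python sketch below, the base utility h and the finite budget set B are arbitrary stand-ins of my own choosing; the two checked properties, V convex in q and D almost surely a singleton, are the ones asserted above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite illustrative budget set B in R^2 and an arbitrary concave base
# utility standing in for the money-metric h(p', z'; x, z, r) of the text.
B = rng.uniform(0.0, 5.0, size=(8, 2))

def h(x):
    return np.sqrt(x[:, 0]) + np.log1p(x[:, 1])

def V(q):
    """Perturbed indirect utility: V(q) = max over x in B of h(x) + q.x."""
    return np.max(h(B) + B @ q)

def D(q):
    """Demand: the bundle(s) attaining the maximum."""
    vals = h(B) + B @ q
    return B[np.isclose(vals, vals.max())]

# V is a pointwise maximum of functions affine in q, hence convex in q:
# spot-check the midpoint inequality for random pairs of perturbations.
for _ in range(1000):
    q1, q2 = rng.normal(size=2), rng.normal(size=2)
    assert V(0.5 * (q1 + q2)) <= 0.5 * (V(q1) + V(q2)) + 1e-12

# For q drawn from an absolutely continuous distribution, ties among
# bundles have probability zero, so D(q) is almost surely a singleton.
print(D(rng.normal(size=2)))
```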
6.3 Hedonic Goods and Household Production
Economists moved in the 1970s from treating commodities as objects with fixed attributes to hedonic models in which consumers care about generic attributes that can be met through various quantities and combinations of market goods. The simplest hedonic model, dating to Andrew Court (1939), Kevin Lancaster (1966) and John Muth (1998), allowed the hedonic content of a unit of a market good to vary with the design of its manufacturer, and assumed in implementation that these dimensions of content could be measured. Extending this approach, consumers may be thought of as obtaining various observed and unobserved hedonic quantities through a combination of the direct hedonic content of market goods and household production of hedonic content. For example, an automobile contains as direct hedonic content 'horsepower' and 'cargo capacity', and requires the household production activities of driving and parking to facilitate foraging for food and satisfying hunger. Household production is a feature of economic life whose presence influences consumers' economic behavior; it enriches the interpretation but complicates the measurement of utility, and also offers additional measurement opportunities. Economists invoke household production ideas to explain time allocation, and facilitating activities like travel. Nevertheless, household production is given little attention in economics textbooks. I think one reason for this is that unless one has observations on household production activities or hedonic products of the household production process, one cannot distinguish household technology from tastes. To illustrate, let w = (w1, . . . ,wK) denote hedonic quantities, z denote the consumer's environment, x = (x1, . . . ,xN) denote market goods, y denote income, and p = (p1, . . . ,pN) denote market good prices. Let F(w,x,z) ≤ 0 denote the household technology, and U(w,z,r) denote the direct utility function. Then, the consumer's indirect utility satisfies

V(y,p,z,r) = max_{w,x} U(w,z,r) subject to F(w,x,z) ≤ 0, p·x ≤ y.   (2.22)

Given this indirect utility function, apply the duality mapping

U*(x,z,r) = min_p V(p·x, p, z, r)   (2.23)
to obtain a reduced form utility function of the market goods. Then U* has the conventional properties of a neoclassical utility function. This construction does not require convex preferences or household production possibilities, and leaves household production implicit. Then, by Occam's razor, if only market purchases are observed, one might as well model only U*, and treat household production as outside the province of economics. However, there is potentially a great deal to be learned when it is possible to measure some post-household-production hedonic quantities. Variation in household technologies may be a source of apparent taste variation in utility, or may attenuate the impact of taste variations on market transactions. Structural models of household production and consumption can provide simple explanations of behavior that may otherwise be difficult to interpret, such as demand for education, exercise, work, and household durable equipment that has both consumption and production aspects. The hedonic measures w may be conventional economic ones, like horsepower and cargo space, or may be proximate to the organism; for example, calorie intake or allostatic load. Careful analysis of household production, augmented by hedonic measurements, is in my opinion one of the promising and relatively neglected frontiers in econometric study of consumer behavior. The maximization in (2.22) is a problem of utility maximization subject to a nonlinear budget constraint, as discussed in the previous section; in this case, the hedonic content of market goods and the household technology define the nonlinear budget in w. Then, linear additive perturbations in marginal utility and/or linear additive perturbations in the cost of meeting household output requirements can be used in dual characterizations of production and utility. Hedonic regressions of product prices on attributes were introduced by Zvi Griliches and Irma Adelman (1961) as a method of adjusting price indices to control for product quality.[13] These regressions were later connected by Sherwin Rosen (1974) to the theory of utility maximization and market equilibrium in hedonic space; see also Makoto Ohta (1971), Ohta and Griliches (1986), Andreu Mas-Colell (1996), McFadden (2008), Ekeland (2010) and Heckman et al. (2010). In summary, this literature finds that the existence of stable equilibrium in markets with differentiated hedonic commodities is problematic, that structural identification is difficult even when equilibrium is well defined, and that an equilibrium mapping from hedonic content to price is in general a nonparametric reduced form that reflects technology and market structure as well as consumer preferences. Nevertheless, both McFadden and Heckman et al. give conditions under which it is possible to recover the distribution of hedonic preferences when consumers operate at active margins, locating at observable points in hedonic space in response to tradeoffs between hedonic factors. I will give one example in which a linear regression of the log of market good price on a vector of observed hedonic content identifies consumer tastes for hedonic attributes. Assume that consumers have utility functions U(w,r) of a vector w = (w1, . . . ,wK) of hedonic quantities, and associated indirect utility functions V(p*,y,r) of the effective prices p* of hedonic units. Assume the first hedonic quantity is a sum of quantities of m market goods, weighted by their hedonic content, w1 = x1 exp(z1b + e1) + . . . + xm exp(zmb + em), where b is a vector of taste weights, zj describes the measured hedonic content of good j, and ej summarizes unmeasured hedonic attributes of good j. Note that the hedonic attributes enter in a 'factor augmenting' form, so that p*1 = min_{1≤j≤m} pj exp(−zjb − ej) is the effective price of good 1, and V(min_{1≤j≤m} pj exp(−zjb − ej), p*2, . . . ,p*K, y, r)
is the consumer's indirect utility expressed in terms of the market prices and hedonic content of the market goods 1, . . . ,m. Suppose the consumer faces a consumption set X in which she can buy the market goods 1, . . . ,m in continuous quantities. The optimizing consumer will then purchase only market goods in argmin_{1≤j≤m} pj exp(−zjb − ej). If all consumers are identical in their hedonic taste weights b and perceptions of the unobserved attributes ej, then all goods observed in the market will have effective prices achieving the minimum, so that log pj = a + zjb + ej for j = 1, . . . ,m, where a is a (random) value common to all the goods. Then, with these strong assumptions on preferences, market prices for the goods 1, . . . ,m are determined independently of production and market structure, and the hedonic regression parameters are preference weights. A discrete choice variant on the setup above also allows econometric recovery of hedonic taste weights. Suppose now that the consumption set X requires the consumer to choose a unit purchase from mutually exclusive alternatives 1, . . . ,m, and consumers have common taste weights b but are heterogeneous in their perceptions (e1, . . . ,em) of the unmeasured attributes. Then, the share of consumers choosing alternative j is given by a discrete choice model,

Prob(j | p1, . . . ,pm; z1, . . . ,zm) = ∫_{−∞}^{+∞} Fj((zj−z1)b − log(pj/p1) + ej, . . . , (zj−zm)b − log(pj/pm) + ej) dej,   (2.24)
where F is the cumulative distribution function of (e1, . . . , em) and Fj denotes its derivative with respect to its jth argument. If, for example, F is i.i.d. extreme value, then (2.24) is multinomial logit, the specification used in my initial formulation of discrete choice models (Domencich and McFadden, 1975; McFadden, 1974a, 1974b). Both hedonic regression and hedonic discrete choice have had wide application, and have been generalized to nonlinear, semiparametric, and nonparametric specifications; see, for example, Anderson et al. (1992), McFadden and Train (2000), Yatchew (1998, 2003) and Heckman et al. (2010). An econometric issue, tacit in both hedonic regression and discrete choice models, is that the orthogonality or independence of observed hedonic attributes and unobserved disturbances is problematic. Traditional instrumental variables methods usually suffice for linear hedonic regression, but nonlinear models are more challenging, and have been the subject of a large literature; see, for example, Berry et al. (1995, 2004a, 2004b), Blundell and Powell (2004) and Matzkin (2005, 2008, 2012).
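As a numerical check on the identification argument, the Python sketch below simulates the log-price relation log pj = a + zjb + ej with made-up attributes and taste weights (all values are illustrative assumptions) and shows that ordinary least squares recovers b.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate m goods whose log prices follow log p_j = a + z_j b + e_j,
# the equilibrium relation derived in the text (values illustrative).
m, k = 500, 3
b_true = np.array([0.8, -0.4, 0.25])    # hedonic taste weights
z = rng.normal(size=(m, k))             # measured hedonic content
e = 0.1 * rng.normal(size=m)            # unmeasured hedonic attributes
a = 1.0                                 # (random) value common to goods
log_p = a + z @ b_true + e

# Under the strong assumptions in the text (common taste weights,
# attributes independent of disturbances), OLS recovers a and b.
X = np.column_stack([np.ones(m), z])
coef, *_ = np.linalg.lstsq(X, log_p, rcond=None)
print("estimated [a, b]:", np.round(coef, 3))
```

With heterogeneous perceptions ej and unit demand, the same simulated attributes fed through (2.24) with i.i.d. extreme value disturbances would instead be estimated by multinomial logit.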
6.4 Consumer Dynamics
When consumer behavior is considered over time, it is necessary to clarify what utility and utility-maximization mean. One concept is that of instant utility or felicity, a hedonic index of the sensation of well-being at a moment. Another is decision utility, an index of the desirability of choices that may be available at the moment and that determine current consumption and future options. A third is remembered utility, an index of current satisfaction with experiences in the past. Neoclassical economics focuses on decision utility as the operative driver of market behavior, and emphasizes that only its ordinal properties matter. In this view, instant utility and remembered utility are relevant
to economic behavior only through their influence on decision utility, even if they have independent psychological content. The major issues in neoclassical modeling of consumer behavior over time were the intertemporal structure of decision utility, and the event timing, information sets, and calculus involved in utility maximization. Consumer theory has handled these in two ways. A framework introduced by Fisher (1908, 1930), Edmund Malinvaud (1953) and Gerard Debreu (1959) dated commodities and made their delivery contingent on uncertain events. In Debreu’s interpretation, utility spanned the lifetime of the consumer, with a single decision-utility-maximizing choice specifying in advance the response to the realization of each contingency, and determining the entire life course. This was a complete, logically elegant and instructive implementation of consumer theory, with utility incorporating a complete system of perceptions and subjective probabilities, and including in the life plan of the consumer full allowance for the strategic impact of choice on later options and preferences. Nevertheless, the approach has severe limitations, first because its full articulation requires the existence of a spanning set of contingent markets that in practice do not exist, but more fundamentally because it is clear from behavioral evidence that life plans are ‘incomplete contracts’ that ignore many contingencies and are subject to continual updating and revision. The limits of the approach are obvious when one asks at what point in time the consumer’s once-and-for-all life choice is made – at birth, the time of preparation for A-level examinations, voting age? The second approach to handling time and uncertainty in neoclassical consumer theory was to treat the utility of a life as the integral of discounted instant utilities, an idea that dates back to Bentham’s depiction of utility as depending on intensity, duration, and propinquity or remoteness, and to Edgeworth’s description in 1881 of the level of happiness associated with an experience as the integral of the intensity of pleasure over the duration of the event: ‘The continually indicated height [of felicity] is registered by photographic or other frictionless apparatus upon a uniformly moving vertical plane. Then, the quantity of happiness between two epochs is represented by the area contained between the zero-line . . . and the curve traced by the index.’ Edgeworth viewed felicity as a cardinal measure of sensation, with levels that were comparable across time and allowed utility to be expressed as an integral. The later neoclassical formulation instead deduces felicities as a feature induced by a separability property of preferences (see Debreu, 1986). The formulation of decision utility as an integral of felicities is usually extended to decompose the utility of uncertain prospects into the expected utility of their outcomes under the axioms of von Neumann and Morgenstern (1953) and Savage (1954); see Arrow (1971). To complete the theory, it is necessary to describe how the utility function depends on memory and learning, how perceptions and subjective probabilities are formed and updated, and how choices are made and revised as time passes and events unfold. 
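For concreteness, this second approach can be written compactly; the notation below is my own illustrative shorthand, not the chapter's:

```latex
% Lifetime decision utility as a discounted integral of felicities,
% with the expectation taken over uncertain events at time 0
% (illustrative notation): c(t) consumption path, u instant utility
% (felicity), rho subjective discount rate, T length of life.
U_0 = \mathbb{E}_0\!\left[\int_0^T e^{-\rho t}\, u\bigl(c(t)\bigr)\, dt\right]
```

The von Neumann and Morgenstern axioms then justify decomposing the utility of uncertain prospects into this expectation over outcomes, which is the structure the dynamic stochastic program described next makes operational.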
A typical implementation assumes that the consumer solves a dynamic stochastic program to maximize the expected present value of a discounted integral of future instant utilities, with subjective probabilities that satisfy the Muth-Lucas axiom of rational expectations, requiring that subjective probabilities of different consumers agree with objective frequencies, and hence with each other; see Lucas (1975) and Muth (1992, 1994). The approach can accommodate experience and learning through state variables that enter instant utility, but often these effects are omitted or admitted in very restrictive form. The dynamic stochastic programming approach is again an elegant
and instructive logical solution to the problem of consumer dynamics. However, the strongest form of the model, with a representative consumer and rational expectations, is vulnerable to behavioral rejection, because the solution of these programs involves levels of complexity and computation that fairly clearly exceed human cognitive capacity, because it is unrealistic to assume that historical experience and market information and discipline are sufficient to homogenize subjective expectations, particularly for rare events, and because the axiomatic foundations for utility jointly additively separable in time and uncertain outcomes are not persuasive; see Pollak (1970). Intertemporally separable decision utility has difficulty explaining the smoothness of consumption in the presence of observed income shocks; for example, Hall (1978), Campbell and Deaton (1989), Sundaresan (1989), Okubo (2008) and Attanasio and Pavoni (2011). This is most easily addressed within the neoclassical framework by letting felicity depend on state variables that summarize consumer history. In addition to observed states, such as holdings of consumer durables, allow unobserved or hidden states that carry the effects of intertemporal substitutability, one interpretation of Browning (1991). By expanding the dimensionality of the state description, the utility maximization model can be represented as a dynamic stochastic program with Markov dynamics. A final generalization would be to reintroduce the idea of Jevons (1871) and Edgeworth (1894) that the same objective time may correspond to different rates of thought and feeling in different periods, so that two dimensions are required to characterize the elements of the utility of an episode, its felicity and subjective time. Then, decision utility at moment t would have the form

u = E_{t|z(t),s(t)} ∫_t^{+∞} U(x(τ), z(τ), s(τ), r) d(dτ, τ, z(t)),   (2.25)

where x(τ) is the vector of market goods purchased at time τ, z(τ) is the consumer's environment, s(τ) is a vector of observed and hidden state variables, and r indexes tastes. The function U(x(τ),z(τ),s(τ),r) is felicity at τ, and d(dτ,τ,z(t)) measures a subjective time interval at τ as viewed from the current moment t. In this formulation, subjective time may depend on the environment of the consumer. The measure d also incorporates time and risk discounting, which arises in the utilitarian view because, in the words of Edgeworth, 'the bird in the bush may never come to hand'. The operator E_{t|z(t),s(t)} denotes subjective expectation at t, conditioned on the consumer's environment and experience at that moment. The state s has an equation of motion

ds(τ)/dτ = h(x(τ), z(τ), s(τ)).   (2.26)
This formulation of decision utility, embedded in a dynamic stochastic program, and allowing heterogeneity in preferences and perceptions, and interactions between perceptions, tastes, and experience, is an extension of the neoclassical consumer model that can accommodate phenomena such as ‘time-inconsistent discounting’, ‘time-inconsistent perceptions’, and differences between a direct integral of felicities and either remembered or decision utility. If the state s(t) includes time averages of x, and these time averages establish reference points or aspiration levels for the consumer, then the utility function
can capture asymmetric hypersensitivity to gains and losses from these reference points. This setup risks explaining too much, including Taussig’s purchases made on a whim, but it can be given content by restricting the structure of felicity, subjective time, and subjective expectations.
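A discrete-time caricature of (2.25) makes the role of the subjective-time measure d tangible; the two weighting schemes below, and all parameter values, are illustrative assumptions of mine rather than specifications from the chapter.

```python
import numpy as np

# Discrete-time caricature of decision utility (2.25): u is a sum of
# felicities weighted by a subjective-time/discounting measure d(tau).
tau = np.arange(60)
felicity = np.ones_like(tau, dtype=float)     # constant felicity stream

beta = 0.95
d_exponential = beta ** tau                   # standard separable model

k = 0.25
d_hyperbolic = 1.0 / (1.0 + k * tau)          # 'stretched' subjective time

# Present bias: relative to exponential weights, hyperbolic weights fall
# faster at short delays but dominate at long delays, so the ranking of
# a near versus a distant reward can reverse as both recede in time.
ratio = d_hyperbolic / d_exponential
print("weight ratio at tau = 1, 10, 50:", np.round(ratio[[1, 10, 50]], 2))

print("decision utility, exponential:", round((felicity * d_exponential).sum(), 2))
print("decision utility, hyperbolic: ", round((felicity * d_hyperbolic).sum(), 2))
```

The U-shaped weight ratio is the signature of the 'time-inconsistent discounting' the formulation is intended to accommodate.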
7 NEW FRONTIERS: A BEHAVIORAL REVALUATION OF CONSUMER DECISION-MAKING
Neoclassical consumer theory implies that with rational calculation, we cannot be harmed by choice and trade. Then people should relish choice, and welcome all the alternatives offered by markets. Yet, people are challenged by choice. In the words of a Dutch proverb, 'He who has choice has trouble'. We find choice uncomfortable, and often use procrastination, rules, pre-commitment, habit, suspicion and imitation to avoid 'rational' decision-making and trade. The psychiatrists even have a word for it – agoraphobia, or fear of the market. There are two possible reasons for this behavior. First, while trade is calculated to advance our self-interest, the calculation may be burdensome and the cost of mistakes substantial. We may simply be too lazy or timid to trade. Second, trade involves social interaction and the emotions that go with this. Choice alternatives and trades may be misrepresented in the market game, and suspicions may be justified. As a result, we evaluate economic activities not only cognitively, but also strategically and viscerally; see Mellers (2000), Lowenstein et al. (2003) and McFadden (2006). This emotional aspect not only explains why economic choices can make us uncomfortable, but also why we sometimes make systematic mistakes – we do not approach economic decisions with a single mind. A schematic for behavioral models of choice, given in Figure 2.3, differs from the neoclassical schematic primarily by adding affect and motivation as factors in choice, relaxing the rigid requirement that preferences are sovereign and king of the sentiments, and adding possible feedbacks. However, there is a more fundamental difference. Neoclassically trained economists think of these behavioral elements as arising from the limits of memory and cognitive capacity that bound rationality, slips or anomalies that the individual will detect and correct if they become obvious. Many psychologists and biologists think of this instead as a product of evolution, the result of a rough correspondence between generalized self-interest and survival, a hodge-podge of rules, processes, and strategies that mimic rationality in circumstances where rationality increases survival value. Day-to-day economic choices are explained by either paradigm, but perception and choice in novel situations test the neoclassical premise, and challenge easy transitions between conventional demand analysis and the effect of novel economic policy on consumer well-being. Measurement of economic consumer behavior will continue to center on studies of revealed market behavior, with traditional consumer expenditure surveys augmented by electronic tracking of consumer purchases through scanner data, high frequency sampling through Internet panels, and increasing exploitation of natural experiments. These measurements will be supplemented by analysis of stated choices in hypothetical markets, and a great deal more data from microeconomic surveys, experimental economics, marketing science and cognitive psychology. Perhaps the most interesting and
challenging new measurements come from fields not commonly allied with economics: sociology, anthropology, evolutionary and cellular biology, and neurology. I give an overview of this research, starting with more traditional psychological measurements and experiments in cognitive psychology, then measurements and experiments in sociology and anthropology, and concluding with findings and experiments in biology and neurology.

[Figure 2.3  The behavioral choice model: a schematic linking Experience, Information, Memory, Stated Perceptions, Perceptions/Beliefs, Affect (emotion), Motivation, Preferences, Stated Preferences, Process, Budget Constraint (linear or nonlinear), Household Production, Memory-dependent utility of outcomes, and Choice.]

7.1 Stated Preferences and Conjoint Analysis
Lack of observed variation in attributes of market goods, and issues of exogeneity, have led economists to consider information obtained from hedonic preference experiments with hypothetical market choices. This is the method of conjoint analysis, adapted in market research from its psychophysical roots (Thurstone, 1931; Luce and Tukey, 1964; Johnson, 1974; Green et al., 1981, 2001; McFadden, 1986), and tied to models of stochastic preferences as a result of econometric work on discrete choice models (McFadden, 1986; Morikawa et al., 2002). In a review of consumer demand experiments, Ivan Moscati (2007) gives a remarkable bit of intellectual history. The first conjoint experiment on consumer demand was done by the iconic psychologist Leon Thurstone at the urging of his University of Chicago colleague Henry Schultz. Thurstone presented his paper at the 1932 meeting of the Econometric Society, with Ragnar Frisch and Harold Hotelling commenting from the audience on the critical differences between hypothetical and real choices. Thurstone’s method was noted and dismissed by Nicholas
Georgescu-Roegen (1936) and by Allen Wallis and Milton Friedman (1942) for three compelling reasons: the hypothetical nature of the offered choices, the difficulty of detecting indifference, and the difficulty of controlling experimentally for the effect of income and prices. Thurstone is not mentioned in the neoclassical treatises of Hicks and Samuelson, and there were no economists involved in the initial applications of conjoint analysis in marketing. However, abbreviated versions of conjoint analysis, termed contingent valuation, vignette analysis or self-reported preference, later became popular among some applied economists and political scientists; see Rossi (1979), McFadden (1986, 1994), Diamond and Hausman (1994), Green et al. (1998), Carson et al. (2001), Frey et al. (2002a, 2002b) and King et al. (2004). The use of hypothetical market choice data remains controversial among economists, with some reason, as it is difficult to achieve the verisimilitude of real markets in the laboratory, and cognitive inconsistencies that are not obvious in low-frequency real market choices may be glaring in repeated laboratory choices. A number of mechanisms have been developed for incentive-compatible elicitation of preferences; McFadden (2012) shows, for example, how the Clarke–Groves mechanism can be used in an economic jury drawn at random from the population to decide on public projects. However, in practice many stated preference elicitations are either not formatted to be incentive-compatible, or fail to carry through to the payoffs required in an incentive-compatible mechanism. Consequently, responses are likely to be distorted by inattention, risk perceptions, and careless opinion; see Johansson-Stenman and Svedsater (2003). Worse, conjoint responses are manipulable through the framing of elicitations, an impediment to their reliability and reproducibility. Despite these weaknesses, stated preference methods have become a proven tool in marketing for designing and positioning new products. For example, experiments on automobile brand choice can determine with considerable predictive accuracy the distributions of preference weights that consumers give to various vehicle features; see Urban et al. (1990, 1997), Toubia et al. (2003) and Train and Winston (2007). In overview, experience seems to be that these methods work best when the task is choice among a small number of realistic, relatively familiar and fully described alternative products, ideally with the incentive that with some probability the offered transaction will be executed and the stated choice delivered. Stated preference methods are less reliable and less directly useful for predicting behavior when the task is to rate products on some scale, or to adjust some attribute (for example, price) to make alternatives indifferent. They are also less reliable when the products are unfamiliar or incompletely described, or involve public good aspects that induce respondents to make social welfare judgments. Methods that require cardinal utility judgments, such as those of the Leiden school (van Praag and Kapteyn, 1994) and Frey and Stutzer (2002a, 2002b), have intuitive validity, but require strong behavioral axioms to be consistently predictive for choice; see Dagsvik et al. (2005). A neglected area related to stated preferences is the elicitation of stated perceptions.
Manski (1991, 2004) and others have developed elicitation methods that avoid some obvious distortions in stated personal probabilities, and appear to explain some risk-taking behavior. A useful extension of current conjoint methods would be to incorporate and measure subjective perceptions and other psychological dimensions that appear to influence decision-making.
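Econometrically, stated choice data of this kind are commonly analysed with a conditional (multinomial) logit model. The Python sketch below is self-contained and entirely simulated (task design, coefficients and sample size are my assumptions) and recovers taste weights from hypothetical choices by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulated conjoint-style data: n choice tasks, each offering J
# alternatives described by k attributes (all values illustrative).
n, J, k = 2000, 3, 2
X = rng.normal(size=(n, J, k))           # attribute levels per alternative
beta_true = np.array([1.0, -0.5])        # taste weights to recover

# Random-utility choices: U_ij = X_ij . beta + Gumbel noise => logit.
utilities = X @ beta_true + rng.gumbel(size=(n, J))
chosen = utilities.argmax(axis=1)        # stated choice in each task

def neg_loglik(beta):
    v = X @ beta                                  # systematic utilities
    v = v - v.max(axis=1, keepdims=True)          # numerical stability
    log_prob = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), chosen].sum()

fit = minimize(neg_loglik, x0=np.zeros(k), method="BFGS")
print("estimated taste weights:", np.round(fit.x, 2))
```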
7.2 Measurements from Cognitive Psychology
There are now extensive experiments and insights from cognitive psychology that contradict a narrowly defined neoclassical model of rational choice, many originally conducted by Amos Tversky and Danny Kahneman. These suggest that preferences are malleable and context-dependent, that memory and perceptions are often biased and statistically flawed, and that decision tasks are often neglected or misunderstood. Table 2.2 is a summary of major cognitive anomalies that appear in psychological experiments and surveys; for more details, see Rabin (1998), Camerer (1999) and McFadden (1999a). I give four examples of anomalies that challenge the neoclassical model.

7.2.1 The endowment effect is consumer aversion to trade from any given status quo. The endowment effect was beautifully illustrated in a classical experiment by Jack Knetsch (1989) in which a random assignment of coffee cups produced a large gap between WTP and WTA, with far less trading than should be needed to move from a random allocation to a Pareto efficient allocation; see also Kahneman et al. (1990, 1991) and Camerer and Thaler (1995). I conducted a comparable experiment in an introductory microeconomics course at Berkeley, using pencils embossed with the course name. About half of the 345 students, 172, were randomly endowed with a chit redeemable for a pencil. Then, a Vickrey sealed-bid uniform price double auction (Yoon, 2001) was held to reallocate the chits: if the participants had true values v1 ≥ . . . ≥ v345, then the optimal strategy for each player was to bid her true value, and chits awarded to people with values less than v172 should trade, with buyers paying v173, sellers receiving v172, and the market manager covering the difference. The income effect of being endowed with a pencil is negligible, so that with random assignment the distributions of money marginal utilities of a pencil should be the same for buyers and sellers. Then, if there was no endowment effect, one would expect about one-half of the pencils to trade at a price close to the median values for both buyers and sellers. More precisely, the number of chits awarded among the 172 students with the lowest values, which we expect to trade, has a hypergeometric distribution with mean 86.2 and standard deviation 4.6. The experimental results are summarized in Figure 2.4. The market cleared with 32 chits traded, at a price of 35 cents. The median of the sealed bid prices was 10 cents, while the median of the sealed ask prices was 100 cents, a gap between WTP and WTA similar to that found in the cup experiment. The probability of 32 transactions or fewer when there is no endowment effect is on the order of 10^−16. Further, a runs test confirms (T-stat = 12.5) that buyers and sellers do not have the same value distribution. Thus, there is a strong, trade-suppressing endowment effect, generated instantaneously by a random allocation of pencil chits. Either values are changing endogenously, with almost instantaneous habituation to the status quo, or agoraphobia is real – consumers find trade an edgy experience, instinctively mistrust the market and resist trading for small gains.

7.2.2 Choice among lotteries often deviates from rationality; see Langer and Weber (2001) and Gilovich et al. (2002). A stylized summary is that consumers display: (i) an endowment effect, evaluating lotteries as changes from a reference point
Table 2.2  Cognitive anomalies

COMPREHENSION
Completion/substitution: Missing or ambiguous parts of question are reconstructed
Disjunction: Failure to reason through or accept the logical consequences of choices
Engagement/awareness: Limited attention to and engagement in the cognitive task
Format/mode: Availability influenced by format, visual or auditory presentation
Construal: Question interpreted as one the subject is able (or prefers) to answer
Translation: Question terminology translated into subject's personal vocabulary

RETRIEVAL OF FACTUAL AND AFFECTIVE MEMORY
Affective attenuation: Affective memories are recalled with diminished intensity
Availability: Memory reconstruction is tilted toward the most available and salient information
Primacy/recency: Initial and recent experiences are the most available
Reconstructed memory: Imperfect memories rebuilt using contemporary cues and context, historical exemplars, commonly employed search criteria
Selective memory: Coincidences are more available than non-coincidences
Telescoping/temporal: Compression and attenuation of history, inconsistent time discounting

JUDGMENT AND THE FORMATION OF PERCEPTIONS AND BELIEFS
Anchoring: Judgments are influenced by quantitative cues contained in the decision task
Context/framing: History and framing of the decision task influence perception and motivation
Endowment: No action is the 'safe' choice. 'The devil you know is better than the devil you don't'
Extension: Representative rates are more available than integrated experience
Prominence/order: The format or order of decision tasks influences the weight given to different aspects
Prospect: Inconsistent probability calculus, asymmetry in gains and losses
Regression: Attribution of causal structure to fluctuations; failure to anticipate regression to mean
Representativeness: Frequency neglect in exemplars

TASK DEFINITION, AND THE DECISION AND REPORTING PROCESSES
Awareness: Recognition of choices, subjective definition of choice set
Construal/constructive: Cognitive task misconstrued, preferences constructed endogenously
Prevarication/projection: Misrepresentation for real or perceived strategic advantage or to project self-image
Subitic/automatic: Sudden decisions made unconsciously by habit
Suspicion/superstition: Subjects mistrust offers and question motives of others in unfamiliar situations, avoid choices that 'tempt fate'
Rule-driven: Choice guided by principles, analogies, and exemplars rather than utilitarian calculus; rules induce pro forma, focal responses
that may be sensitive to framing; (ii) an asymmetric loss aversion effect, in which the consumer is more sensitive to losses than to gains, displaying risk aversion for small gains and risk seeking for small losses; and (iii) a certainty effect in which sure outcomes are overvalued relative to lotteries. In addition, there are (iv) an
isolation or cancellation effect in which common aspects of alternative lotteries are ignored when they are compared; (v) a segregation effect in which a riskless component of a lottery is evaluated separately from the risky component; and (vi) a mode effect in which pricing a lottery is treated as a qualitatively different task than choosing between lotteries.

[Figure 2.4  Bids in the pencil experiment: buyers' bid prices and sellers' asking prices, in cents, plotted against the number of pencils.]

One of the consequences of these effects is that consumers will often refuse to take any share of either side of an offered lottery, a result consistent with the observed paucity of real-world wagers. Kahneman and Tversky attribute these effects to an editing process that determines the reference point and the perception of lottery outcomes as gains or losses, and to systematic misperception of probabilities. An additional reason that individuals are ambiguous about lotteries, and often avoid them, is the superstitious belief that there are hidden causal forces at work, interventions that place the lottery in an ambiguous relationship to the rest of life. People often have strong beliefs that they are lucky, or unlucky, or that their luck has to change. We have selective memory for coincidences. You remember running into a friend at a surprising place, or a particularly good night at poker, but you forget all the times you did not encounter a friend or had an unremarkable night. Chance jolts the harmony of conscious belief; relief from this dissonance is gained by imposing an order over chaos, weaving a fabric of apparent cause and effect out of jumbled coincidences. The mind accepts and emphasizes those coincidences which reaffirm one's perceived order of the universe, ignores and forgets inconsistent data, and shrouds each offered lottery in
ambiguity regarding hidden effects. Superstition can arise and persist even when people are consistently Bayesian. Start with a prior that admits the possibility of complex, hidden causal paths. The experiments that life offers, and selective memory of outcomes, allow these cognitive 'castles in the air' to survive; see McFadden (1974c) and Hastie and Dawes (2001). There is experimental evidence that endowment effects are attenuated when traders are experienced; see Myagkov and Plott (1997) and List (2004). Thus, the observed paucity of trades in lotteries may occur primarily for novel events and inexperienced traders. These facts are consistent with a proposition that learning by observing and by doing may be effective in selecting rational market behavior rules in arenas with sufficient repetitiveness to allow these effects to operate.

7.2.3 Hyperbolic discounting occurs when individuals systematically underweight future consequences relative to contemporaneous consequences, and make choices that gratify now and leave lasting regret, in patterns that cannot be explained by maximization of a consistently discounted present value of instantaneous utility. If you think of the current instant as a reference point in time, then this phenomenon resembles those surrounding the endowment effect, with the future neglected because it is ambiguous and difficult to anticipate, and lacks saliency. As discussed in section 6.4, a utilitarian rationalization of hyperbolic discounting, dating back to Jevons, is that the experience of time is subjective, so that a ten-minute interval now is subjectively longer than a ten-minute interval a week in the future.

7.2.4 A remembered utility effect occurs when memory of a painful or pleasurable episode is dominated by sensation at the peak and the end of the episode, rather than being determined as an integral of experienced intensities over the duration of the episode. A related phenomenon in psychology is labeled the primacy/recency effect. We remember the first and last instances of some significant experience, less well the intermediate and integrated experience. An implication of these features of recall is extension neglect – the comparison of two episodes that differ in duration will tend to neglect duration. For example, a study by Redelmeier and Kahneman (1996) of experienced pain during colonoscopies, and recall of the episode, finds that adding pain of reduced intensity at the end of an episode improves overall recall of the experience; see also Varey and Kahneman (1992) and Huber et al. (1997). Kahneman et al. (1997) document in a number of experimental settings this phenomenon of duration neglect and concentration on recent experience, what one might call hyperbolic memory. A deeper reason for the phenomena of hyperbolic discounting and remembered utility is given by the psychologist George Lowenstein (1996) – it is difficult to recall or anticipate affective or emotional states. We may remember being in pain, and have a strong aversion to the antecedents of a painful experience, but we cannot relive the experience itself. Consequently, we may forget affective history, and fail to adequately protect ourselves against repeating it. Duration neglect can be recast in a neoclassical model with subjective time. Whether this leads to parsimonious, predictive models, or experiments on these effects can be designed that give results inconsistent with any intertemporal utility model, remains an open question.
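The sampling benchmark in the pencil experiment of 7.2.1 can be checked directly; in the scipy sketch below I take the tradeable chits to be those landing on the 173 students ranked below v172, a reading of the text that reproduces the reported mean and standard deviation.

```python
from scipy.stats import hypergeom

# Pencil experiment benchmark (7.2.1): 345 students, 172 chits assigned
# at random; absent an endowment effect, the chits expected to trade are
# those held by the low-value group (taken here as 173 students).
M, n_chits, n_low = 345, 172, 173
dist = hypergeom(M, n_chits, n_low)

print(f"mean trades: {dist.mean():.2f}")   # ~86.25, reported as 86.2
print(f"std dev:     {dist.std():.2f}")    # ~4.65, reported as 4.6

# Tail probability of 32 trades or fewer under the no-endowment-effect
# null: vanishingly small, consistent with the strong effect in the text.
print(f"P(trades <= 32): {dist.cdf(32):.2e}")
```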
7.3 The Sociality of Choice
Man is a social animal, identified with family and kin, and with clubs, troupes, tribes, ethnicities and nationalities. This has several consequences for economic choice behavior. First, individuals may look to their social networks for information. Second, they may look to social networks for approval, and use accountability to limit choice. Third, they may out of pure self-interest engage in mutually beneficial reciprocity; simple when the acts are synchronous, involving more complex elements of reputation and trust when they are not. Pursuing comparative advantage, with division of labor and trade, is a form of reciprocity. Fourth, they may engage in genetic altruism, making choices that are in the interest of their progeny rather than themselves as individuals. Fifth, they may exhibit altruistic behavior that does not obviously serve their personal or genetic self-interest, such as incurring costs to sanction greedy behavior. There is a large literature in economics about the sociality of consumption, from Duesenberry (1949) on relative consumption and the sensitivity of savings behavior to relative income within a society, and its relative insensitivity to the absolute level of income, to conspicuous consumption, fads and bandwagon effects; see Veblen (1899), Leibenstein (1950) and Schelling (1978). However, while sociality has been recognized as important, the mechanisms of its operation have been obscure, and it has not led to a simple formalization comparable to that for conventional demand theory; these questions are explored further in McFadden (2010).

7.3.1 One major way sociality may work is simply through transmission of information, learning by imitation rather than learning by doing. People constantly make interpersonal comparisons, judging the desirability of options from the apparent satisfaction and advice of others. While personal experience is the proximate determinant of the utility of familiar objects, and may be extrapolated to similar objects, our primary sources of information on new objects come from others, through observation, advice, and association. McFadden and Train (1996) show that in innovation games with uncertain payoffs, it may pay to wait, and learn by observing rather than learn by doing. Manski (1993) has explored the possibility that individuals faced with dynamic stochastic decision problems that pose immense computational challenges may simply look to others to infer valuation functions to be used to judge the future payoff of current acts, or to infer satisfactory policies. An objection to such copycat behavior is that it fails to take account of the individual's idiosyncratic tastes, and correcting this quickly gets the individual back into the computational difficulties that imitation was intended to circumvent. But if tastes as well as perceptions are modified socially, the relevance and value of the lessons from others increases.

7.3.2 Economic demographer Hans Peter Kohler (2001) has investigated the effect of word-of-mouth communication from friends on choice of contraceptive. He studies Korean peasant women, who have access to relatively little public information on efficacy, costs, and side effects of new contraceptives. Choices within villages show little diversity, but there is substantial, persistent diversity across villages. This pattern is not explained by income, education, or price differences. Word-of-mouth communication from friends was found to be the important
explanation of most women's choices. Lack of inter-village mobility explained multiple equilibria. Thus, some apparent taste heterogeneity is due to the boundedly rational practice of imitation in Balkanized social networks. The moral is that any complete measurement system for consumer behavior must account for social network effects. Suggestions for measurement are that stated perceptions and preferences should be conditioned on the behavior of members in an individual's social network, and the distribution of consumption in social equilibrium should be modeled as the (often non-unique) solution to a game in which choices of peers matter.

7.3.3 In addition to providing information, social networks may discipline the behavior of members through consensus on social norms, accountability for choices, and sanctions for behavior that violates norms. The individual gains from affiliation with such networks if imitation and conformity save energy, if the 'expectation that one will be called upon to justify one's beliefs, feelings, or actions, to others' improves decision-making, and if approval is itself a source of pleasure. We engage in a great deal of automatic or intuitive thinking, or one might say semi-conscious or background thinking, in daily decisions. For example, an experienced driver does not go through a conscious process of deciding to change lanes. Automatic thinking saves energy and time. The classical idea of herd mentality is that social animals find it easier and more comfortable to adhere to a group, accept group roles, and mimic group behavior than to act independently. Accountability reinforces herd mentality in fixed groups, and promotes safety in numbers. Individual membership may be voluntary, as in the peloton of a bicycle race, with riders tightly clustered and constrained in order to save energy in preparation for 'breakaways'. The lack of well-defined measures for social norms and accountability is a significant barrier to modeling their influence on utility and on social equilibria, but clearly natural or laboratory experiments in which the social environment of market behavior is manipulated can be used to test for the effect of social pressures in various contexts.

7.3.4 Reciprocity is a form of social interaction present in economic trade and explained by self-interest. Reciprocity is simple to establish when it is synchronous, as in bilateral barter. However, asynchronous reciprocity requires reputation and trust. Kenneth Arrow (1972) argues that trust is an element in every commercial transaction. Norms for fair practice, and sanctions for violations, may evolve in social networks to facilitate asynchronous reciprocity, and individuals may by habit or internalization conform to these norms even in novel situations where the normal cycle of approval and reputation is suspended. Consider the single-shot ultimatum game with anonymous players: player 1 proposes a division of a prize of 100 units. If player 2 accepts, the players get the proposed shares; otherwise, they get nothing. It is neoclassically rational for player 2 to accept any positive amount, and thus neoclassically rational for player 1 to offer the minimum positive amount. However, if the probability of acceptance p(s) by player 2 is less than one when the share s offered by player 1 is low, then player 1's optimal strategy is to maximize p(s)·(1 − s). Students in a cross-section of developed countries play similarly, but not rationally.
Offers are usually 42 to 50 percent of the prize, and offers less than 20 percent are rejected about half the time. These results are
consistent with neoclassical rationality for the first player if, for example, half of the second players are neoclassically rational and the remainder adopt reciprocity norms to punish 'unfair' behavior, with p(s) = ½ + ½(ln(2s)/s) and s between 0.42 and 0.46. Whether stated beliefs of player 1 regarding player 2 would be consistent with this p(s), or another function that rationalizes player 1 behavior, is an open question.

7.3.5 Isolated cultures offer natural experiments for testing the impact of social norms on trust and reciprocity. Sam Bowles and a team of experimental economists and ethnographers have conducted anonymous ultimatum game experiments in 15 isolated societies; see Henrich et al. (2004) and Bowles and Gintis (2011). Four of these are the Lamalera, a cooperative whale-hunting culture in Indonesia; the Ache, seasonal foraging bands in Paraguay that have some exposure to markets; the Hadza, hunter-gatherer bands in Tanzania; and the Machiguenga, horticultural family groups in Peru. The research finds strong cultural differences, shown in Table 2.3, with large mean offers among the Lamalera, who have ritualized rules for cooperation and sharing, and low mean offers among the Machiguenga, who have little experience in interaction outside the family.

Table 2.3  Ultimatum game outcomes

Society                                        Mean offer (%)    Rejection rate (%)
Lamalera (communal hunting village)            57                –
Ache (seasonal foraging band)                  48                0.0
Hadza (foraging band)                          40                19.2
Machiguenga (subsistence farming families)     26                4.8

Within a culture, lower offers generate more rejections, but willingness to incur the cost of rejecting an offer differs substantially across cultures. The research concludes that violation of the selfishness axiom is common across cultures, but with differences that are a product of the social and economic lives of the subjects. The more integrated and market-oriented the contacts between individuals, influenced by the technologies available for subsistence, the stronger a norm for 'fair play', and the more willing respondents are to punish selfish behavior at a cost to themselves.

7.3.6 Genetic altruism is the phenomenon of self-sacrifice for the good of your family or kinship group. Genetic altruism appears to explain cooperation in most species, and appears to have a convincing evolutionary basis. In an entertaining account of the evolution of sociality, Matt Ridley (1996, p. 17) paraphrases the pioneers of sociobiology, William Hamilton, George Williams and Richard Dawkins: 'The force of evolution favors "selfish" genes, those that promote their own reproduction. Individuals do not consistently do things for the good of their group, their family, or even themselves. They consistently do things for the good of their genes, because they are all inevitably descended from those that did the same. None of your ancestors died celibate.' The principles of selection and genetic altruism infuse classical economics. However, despite their recognized importance, particularly in economic models of the family and of intergenerational transfers, they were not systematically
studied as determinants of economic behavior; see Becker (1976) and Koszegi (2004). The operation of genetic selection in promoting a disposition toward altruism could be very indirect. Thus, the acquisition of language, the exploitation of comparative advantage, the formation of successful defenses against marauders and disease, and a disposition to 'fair play' that reduces interpersonal conflict, may all arise from the selective advantage to group traits that promote sociality. Then altruistic behavior, including gifts to unrelated individuals with no possibility of personal gain, might be explainable as an indirect consequence of genetic self-interest. If so, the center of the original utilitarian concept of relentless pursuit of pleasure could still hold, with group selection leading to the real, selfish pleasure we get from altruism. Paul Samuelson (1993) demonstrated that group selection works if the advantages of altruism are sufficient to offset a Gresham's law of individual selection, in which altruistic traits are driven out by antagonistic selfish traits. However, experimental studies of altruistic punishment collected and carefully interpreted by Ernst Fehr and colleagues (Fehr, 2002; Fehr and Fischbacher, 2002; Fehr et al., 2005) suggest that evolutionary pressure for group selection is not consistent enough, and the costs of altruistic punishment in large groups are too high, to explain the pervasive and distinguishing level of altruism in large human groups. Fehr's conclusion is that human altruism is a mystery that selfish genes and selection cannot fully explain, something about our wiring that may not fit the notion of utility calibrated to experience pleasure from genetic survival. What is important for a discussion of the measurement of well-being is to understand that, whatever its roots, our perceptions of the well-being of others do affect our own behavior and well-being, in ways that may be explained in part by genetic altruism and group selection even if other causes are buried deeper in the human makeup; see Zamagni (1995).
7.4 Sensation and Neuroeconomics
Brain science offers a new frontier for consumer measurements, through identification of reward structures and neurotransmitters in the brain, and study of the impact of choice problems on the brain in the presence of experimental treatments. Brain measurements include maps of energy consumption (functional magnetic resonance imaging, fMRI, and positron-emission tomography, PET), electrochemistry (probes, peptides and radionuclides), and physical intervention (gene manipulation, structural manipulation in animals and natural experiments in brain-damaged humans). In tandem with behavior intervention (manipulation of the choice environment and measurement of response), brain measurements provide information on the cognitive processing structure, perceptions, and sensations associated with choice. They fall considerably short of Edgeworth's wistful call for a hedonimeter to measure pleasure, but they provide some functionality and insight into the sensations that economists call utility. The early biologists observed that as the human embryo developed, it seemed to go through stages of evolution, from a simple one-celled creature to its complex final form. That view was superficial, but it does seem to be the case that human physiology and, in particular, the structure of the brain is consistent with a layering of added functionality
over a simpler and more primitive core. The aspects of brain function that we identify with being human – language, the cognitive processes of deduction and induction, and the ability to empathize and interact with others – are primarily sited in the frontal lobe of the cerebrum, the outer layer of the brain whose relative size and complexity in humans differentiate us from most other species. The more basic limbic system, buried at the base of the cerebrum, is heavily involved in emotion and the reward pathways that are associated with sensations of pain and pleasure. This system includes the amygdala, sometimes termed the 'switchboard of the brain', which is particularly rich in reward pathways and is active in animal behavior at a visceral level: approach and avoidance, foraging, territory and reproduction. The brain is a potent chemical factory, producing peptides that act as neurotransmitters and neuromodulators that bind to receptors on neurons and act to either excite or inhibit neuron firing. A few examples of natural peptides and related molecules are dopamine, a pleasure/reward transmitter and pain suppressor; epinephrine, a stress or threat transmitter; bradykinin, a pain transmitter; and oxytocin, a regulator of approach-avoidance behavior, promoting 'tend and befriend' rather than 'fight or flight'. Oxytocin is sometimes called the 'trust' or 'love' hormone because it plays a primary role in sexual and maternal bonding. Most people think of economic activity as quite cerebral, learned through lengthy education and shaped by culture. If the brain is the hardware, then the utilitarian calculus might be pictured as software, an operating system that is stored and run at various, possibly relocatable, hardware sites, and modified by experience and selection. In this view, monitoring the brain can tell you something about the burden the software places on the hardware, but relatively little about what the software is doing. However, the picture that is now emerging is that economic behavior, like the brain itself, has layers, and high-level cognitive activities may appropriate primitive reward pathways to control behavior. Working a spreadsheet to balance a retirement portfolio is indeed a high-level, learned skill. However, economic trading also seems to involve relatively primitive circuits in the limbic system. An evolutionary tale, adapted from Ridley (1996) and Barrett and Fiddick (1999), suggests why this may be so. A few million years ago, the great apes had established family structures that were successful in the essentials: obtaining food, protecting themselves from predators and reproducing. In common with other animals, they evolved a sense of personal space sufficient to provide some defense against attack, and a system of trust that allowed them to get close to family members. These spatial, interactive activities had a physiological basis – neuromodulators and reward pathways in the brain that facilitated these interactions. Some of these apes discovered that through division of labor, specialization and trade, they could be more successful in surviving and reproducing. But trade, particularly outside the family, was a risky business. To get close enough to a stranger to trade flints for hides, one had to risk being attacked. The apes who were able to form bonds of trust over larger social groups than the family were the most successful at this. These interactions were facilitated by adapting the brain's visceral reward pathways that already functioned in family units.
In addition, these apes developed analytic and communication skills, such as language and empathetic attribution of sentiment, which allowed them to operate in larger social and economic groups. These were cerebral activities, and evolution selected the apes with more cerebral capacity.
Among these apes were our ancestors. They gave us large brains, with the capacity to explore the corners of our universe and to engage in sophisticated economic activities. They also gave us an emotional reward system that processes economic actions in much the same way that it processes personal interactions: when to trust, when to form personal or professional bonds. You should therefore not be surprised to learn that brain hardware, the limbic system and its reward pathways, is associated with economic decisions in a substantial and relatively direct way. In particular, the ventral tegmental dopamine reward pathway in the amygdala qualifies as the brain's primary center for recording pleasure, and appears to be active when we are involved in matters of threat, trust, sex and economic trade.

Much of the information on the neurological foundations of economic behavior comes from measuring brain activity through levels of cellular energy consumption, using imaging techniques such as functional MRI and PET scans. Combined with experimental treatments using electrical probes, neurotransmitters and neuromodulators, and with experimental presentation of economic decision-making tasks in games or markets, these techniques provide a powerful tool for detecting the links between choice and sensations of pleasure or pain. Brain-damaged humans and animals allow imaging under conditions in which some brain pathways are blocked. However, the linkages from physiological sensation to conscious interpretation and reasoning may be complex, and physiology may give an incomplete picture, just as computer hardware monitoring gives an incomplete picture of what software is doing. Nevertheless, it should be clear that any ability to measure directly in the brain the impact of economic choice tasks on reward pathways is potentially an immensely powerful tool for linking economic activities and consumer well-being. I will outline a scattering of results from human and animal studies that provide an intriguing picture of how sensation is directly influenced by economic tasks.

7.4.1 How do organisms process sensations of pleasure and pain? The answer goes directly to the question of whether there is a single, absolute physiological scale of well-being, and whether the organism consciously or unconsciously acts out of self-interest to maximize this quantity; see Bozarth (1994), Berridge (2003), Bhatt and Camerer (2005), Camerer (2005) and Damasio (2005). First, both behavioral observation and brain studies indicate that organisms seem to be on a hedonic treadmill, quickly habituating to homeostasis and experiencing pleasure from gains and pain from losses relative to the reference point that homeostasis defines; see Sanfey et al. (2003). People quickly grow to accept the city in which they are located, their job, their mate and their health status. They may recognize and complain about unfavorable absolute states, but their levels of satisfaction by various measures are not nearly as differentiated as they would have to be if their sensation of well-being were experienced on an absolute scale. For example, Inglehart (2004) plots country means of self-rated happiness against income.
There are obviously major measurement issues associated with such a study, beginning with the difficulty of rendering semantic scales comparable across different languages, but the study's conclusion that money does not buy proportionate happiness is consistent both with the hedonic treadmill and with the proposition that effects other than market goods enter utility. Second, the picture that emerges from brain studies is that the ventral tegmental dopamine pathways in the limbic/amygdala region play a central role in experiencing pleasure, and also mitigate, with a lag, the sensation of pain; see Becerra et al. (1999), McCabe et al. (2001), Rustichini et al. (2003), McClure et al. (2004), Dickhaut et al. (2005), Camerer (2005) and Glimcher et al. (2005, 2009). Adaptation to homeostasis and the differentiation between the pleasure and pain circuits coincide with the powerful endowment and loss aversion effects, and the sensitivity to framing and context, found in behavioral studies, and suggest that these phenomena are tied fundamentally to brain structure. This is good news and bad news for utilitarians: the limbic system reward pathways seem to correspond to a utility pump, but specialized brain circuitry processes experience in ways that are not necessarily consistent with relentless maximization of hedonic experience.

7.4.2 Ivan Diamond, a neurologist at the University of California, San Francisco, who studied ethanol addiction, found that this and other substance addictions work primarily by stimulating ventral tegmental dopamine pathways, although addiction once established has other physiological effects; see Diamond and Gordon (1997) and Appel et al. (2004). His laboratory engineered neuromodulators that block the D2 dopamine receptors in this reward pathway; these may lead to effective therapies for ethanol addiction. I cite this work because it shows, indirectly, the close relationship between these reward pathways and economic behavior: Diamond and his colleagues operated an experimental bar in which the spending rate was observed for alcoholics treated with various blockers; this rate was a very good predictor of the efficacy of the blocker.

7.4.3 David Laibson and colleagues (McClure et al., 2004) have investigated the processing of intertemporal choices. They find that choices involving delayed gratification are primarily processed in the frontal system, and those involving immediate gratification are primarily processed in the limbic system. Thus, eating a candy bar now activates the limbic pleasure center of the brain; deciding to delay gratification requires thought. Unless these systems work together in harmony, time-inconsistent behavior results.

7.4.4 One of the interesting developments in contemporary biology has been the establishment, for a variety of species, of simple direct links from particular genes to the production of and receptors for specific neurotransmitters, and from these to specific social behavior. Specific genes control the production and efficacy of the peptide oxytocin in the brain, and this in turn appears to control sexual attraction and behavior in everything from fruit flies to voles to humans. One may ask why these biological findings have any relevance to our discipline. The answer is that sexual reproduction requires close interaction between organisms, and achieving such interaction requires a suspension of distrust. The oxytocin peptide appears to have the genetic role of promoting trust and bonding between the sexes. This is relevant to economics because trade, and more generally interactions in economic games, also involve elements of trust; see Eisenberger and Lieberman (2004) and Kosfeld et al. (2005). Thus, at their foundations, economic behavior and sexual behavior may rely on the same neurotransmitters and reward pathways in the brain – shopping and sex share the same dopamine reward pathways.

In a study that strikes at the heart of consumer sovereignty, Ernst Fehr and
associates (2005) administer oxytocin or a placebo to subjects, and then ask them to play the trust game. In this game, an investor is given 100 MU and has the option of placing Y MU with an anonymous trustee, who then receives triple this amount and chooses to send Z MU back to the investor. The trustee's game is a dictator game in which norms of fairness and reputation matter, but the rational response in a single-shot anonymous game is to return nothing. By backward induction, the investor should then send nothing. In fact, both the investment and the return are usually positive, with the level of investment higher in subjects who are administered the 'trust' peptide oxytocin. However, oxytocin has no effect on play of the dictator sub-game, where trust does not matter. The conclusion is that economic perceptions and decisions are sensitive to brain chemistry, and susceptible to chemical manipulation.
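To make the incentives concrete, the following minimal Python sketch computes the payoffs of the trust game just described, using the endowment and tripling rule given in the text; the function name and the example transfers are illustrative, not taken from the Fehr study.

    # Trust game payoffs as described above: the investor starts with 100 MU,
    # sends y MU, the experimenter triples the transfer, and the trustee
    # returns z MU. Names and example values are illustrative only.
    def trust_game_payoffs(y, z, endowment=100, multiplier=3):
        """Return (investor_payoff, trustee_payoff) for transfer y and return z."""
        assert 0 <= y <= endowment, "investor cannot send more than the endowment"
        assert 0 <= z <= multiplier * y, "trustee cannot return more than received"
        investor = endowment - y + z
        trustee = multiplier * y - z
        return investor, trustee

    # Subgame-perfect prediction: a purely self-interested trustee returns
    # z = 0, so by backward induction the investor sends y = 0 and keeps 100.
    print(trust_game_payoffs(0, 0))    # (100, 0)
    # Observed behavior: positive transfers; for instance, sending 50 and
    # receiving 75 back leaves both parties better off than the prediction.
    print(trust_game_payoffs(50, 75))  # (125, 75)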
8 THE FUTURE
What are the challenges and measurement opportunities in the future of research on consumers' economic behavior and well-being? Even from a neoclassical perspective, the role of experience and memory in perceptions and preferences, nonlinear budget sets, household production and hedonics complicate the identification of utility and well-being, but they also offer new measurement opportunities, through the added information contained in choice from nonlinear budget sets, and through natural and designed experiments that alter household production possibilities. New results challenge the standard assumption of maximization of individualistic utility, indicating that social networks as information sources, reciprocity and altruism enter human behavior and cannot be ignored. There are new opportunities to study the sociality of choice through experiments that manipulate the information provided through social networks and the effect of approval, and, through comparative study of isolated societies, the role of cultural and social norms. Finally, the striking ties between brain physiology and behavior in economic decisions, and new methods for measuring and manipulating brain activity, offer the possibility of powerful experiments in which economic, social and physiological treatments are employed to identify and isolate the causal foundations of economic choice behavior. In particular, the 'warm glow' attached to bonding and trust in family and social groups seems to be tied to reward pathways in the limbic system that we experience as pleasure. It may be this chemistry that has worked with selection to promote social cognition and empathy in humans, giving them the mental capacity to function as social animals in large groups, to organize complex and productive economic systems, and to internalize cultural norms for reciprocity, trust, fairness and altruism. The challenge facing economic consumer theory is to utilize the disparate measurements and experimental methods that have become available to synthesize a new behavioral science of pleasure: one that retains the quantitative, predictive features of neoclassical theory in the economic settings where it works well, and extends these features into areas of individual sensation of well-being and of choice in the context of social network information and approval, so that the theory can better predict the impact of novel economic policies on consumer well-being.
NOTES

1. An initial version of this paper was presented as the Frisch Lecture, Econometric Society World Congress, London, 2005. This research was supported by the E. Morris Cox endowment at the University of California, Berkeley, and by the National Institute on Aging of the National Institutes of Health. I am indebted to Sam Bowles, Colin Camerer, John Dagsvik, Ernst Fehr, Mogens Fosgerau, James Heckman, Danny Kahneman, David Laibson, Charles Manski, Rosa Matzkin, and Joachim Winter for useful comments.
2. I first learned from John Chipman, Leonid Hurwicz, Marc Nerlove, and Hirofumi Uzawa how dual methods could be used to develop demand systems and implement econometric models of production and utility; see Hurwicz and Uzawa (1971) and Fuss and McFadden (1978).
3. When it is useful to make X compact in the finite-dimensional case, this can be accomplished by imposing a bound that is not economically restrictive.
4. Most of the results of duality theory continue to hold when prices p are points in a convex cone P in a locally convex linear topological space S, and X is a compact subset of the conjugate space S* of S. This extension is useful for applications where the consumer is making choices over continuous time, over risky prospects with a continuum of uncertain events, and/or over objects in physical or hedonic space.
5. In the case of discrete or mutually exclusive alternatives, one can also write x = (x1, . . . , xJ), where xj is a subvector of commodities purchased under discrete choice j. If xj includes a dummy variable, its price is interpreted as the direct cost of alternative j. Exclusivity of alternatives is specified through the consumption set X. In this setup, utility maximization may be treated as a joint discrete-continuous decision, or the maximization can be done in stages, typically with discrete choice in the first stage assuming optimal continuous conditional choice in the second stage.
6. The existence of a continuous utility index is somewhat more than is needed for most duality and demand analysis purposes, but is useful for welfare analysis. Consider the following preference continuity axiom: suppose consumers with tastes defined by points r in a compact metric space R have preferences over objects (x,z) in a compact metric space X × Z, with (x′,z′) ≿r (x″,z″) meaning that (x′,z′) is at least as good as (x″,z″) for a consumer with tastes r. Suppose ≿r is a complete, transitive preorder on X × Z, and has the continuity property that if a sequence (x′k,z′k,x″k,z″k,rk) converges to a limit (x′0,z′0,x″0,z″0,r0) and satisfies (x′k,z′k) ≿rk (x″k,z″k), then (x′0,z′0) ≿r0 (x″0,z″0). McFadden and Train (2000), Lemma 1, establishes that if this axiom holds, then there exists a utility function U(x,z,r), continuous in its arguments, that represents ≿r for r ∈ R; see also Bridges (1988).
7. H is a homogeneous-of-degree-zero, upper hemicontinuous correspondence in p ∈ P for each (z,r) ∈ Z × R; see McFadden (1966) and Diewert (1974, 1982).
8. M is strictly increasing in u, and concave and conical (i.e., linearly homogeneous) in p, and consequently, when p is finite-dimensional, almost everywhere twice continuously differentiable in p with symmetric second derivatives. The epigraph {(p,y) ∈ R^(n+1) | y ≤ M(p,u,z,r)} is a closed cone, and a vector x is a support of this cone at p (i.e., q·x ≥ M(q,u,z,r) for all q ∈ P(u,z,r), with equality for q = p) if and only if x is in the convex hull of H(p,u,z,r).
9. V is quasi-convex and homogeneous of degree zero in (p,y), and increasing in y, while D is a homogeneous-of-degree-zero, upper hemicontinuous correspondence in (p,y). When f and g are twice continuously differentiable, a necessary and sufficient condition for quasi-convexity is f″(p)/f′(p)² ≤ g″(y)/g′(y)². For example, if g(y) = y^(1−a)/(1−a) and f(p) = p^(1−b)/(1−b), quasi-convexity holds for (p,y) satisfying a·y^(a−1) ≤ b·p^(b−1).
10. The polar form reduces to the Stone form when C(p,z,r) = p·c and A(p,z,r) = p1^a1 · . . . · pn^an.
11. Realized distributions of demands are obtained with incomes drawn from a distribution that may be conditioned on (z,r) through ecological correlation and the influence of tastes on work history. In some applications, z is exogenous to the consumer, and thus independent of income and tastes. For example, a product attribute such as durability may be uniform for all consumers. In other applications, the z are an endogenous part of consumer choice, such as congestion levels, or residential location in response to air pollution levels, and thus have a distribution that is dependent upon income and tastes. A satisfactory model for WTP in the presence of endogenously determined environmental attributes requires specification of the structure of supply as well as demand, and determination of an equilibrium allocation in both market goods and the non-market environments. Willingness-to-pay is then defined on an equilibrium trajectory from old to new environmental, income, and market management policies. In the terminology of the statistical and econometric literature on treatment effects, the final state in the welfare comparison is a fully consistent equilibrium counterfactual.
12. The answer to this question depends on what one knows about the resources available to individuals; see Debreu (1974), McFadden et al. (1974) and Matzkin (2005).
13. The typical hedonic regression for a commodity like housing is log p = zb + e, where p is price, z includes observed attributes such as square footage, age, number of baths, and proximity to schools, jobs, and environmental nuisances, and unobserved attributes combine into a disturbance e.
REFERENCES

Afriat, S.N. (1967), 'The construction of utility functions from expenditure data', International Economic Review, 8 (1), 67–77.
Anderson, S., A. de Palma and J. Thisse (1992), Discrete Choice Theory of Product Differentiation, Cambridge, MA: MIT Press.
Antonelli, G. (1886), 'Sulla teoria matematica della economia politica' ('On the mathematical theory of political economy'), Nella tipografia del Folchetto, Pisa.
Appel, W., M. McBride, M. Diana, I. Diamond and A. Bonci (2004), 'Ethanol effects on dopaminergic "reward" neurons in the ventral tegmental area and the mesolimbic', Alcoholism: Clinical & Experimental Research, 26 (11), 1768–78.
Arrow, K. (1971), 'Exposition of the theory of choice under uncertainty', Essays on the Theory of Risk-Bearing, New York: Macmillan, pp. 44–89.
Arrow, K. (1972), 'Gifts and exchanges', Philosophy and Public Affairs, 1 (4), 343–62.
Arrow, K., H. Chenery, B. Minhas and R. Solow (1961), 'Capital-labor substitution and economic efficiency', Review of Economics and Statistics, 43 (3), 225–50.
Attanasio, O. and N. Pavoni (2011), 'Risk sharing in private information models with asset accumulation: explaining the excess smoothness of consumption', Econometrica, 79 (4), 1027–68.
Auspitz, R. and R. Lieben (1889), Untersuchungen über die Theorie des Preises (Investigations on the Theory of Prices), Leipzig: Verlag von Duncker & Humblot.
Barrett, C. and L. Fiddick (1999), 'Evolution and risky decisions', Trends in Cognitive Science, 4, 251–2.
Becerra, L., H.C. Breiter, M. Stojanovic, S. Fishman, A. Edwards, A.R. Comite, R.G. Gonzalez and D. Borsook (1999), 'Human brain activation under controlled thermal stimulation and habituation to noxious heat: an fMRI study', Magnetic Resonance in Medicine, 41 (5), 1044–57.
Becker, G. (1976), 'Altruism, egoism, and genetic fitness: economics and sociobiology', Journal of Economic Literature, 14 (3), 817–26.
Bentham, J. (1789), An Introduction to the Principles of Morals and Legislation, reprinted 1876, Oxford: Clarendon Press.
Berridge, K. (2003), 'Pleasures of the brain', Brain and Cognition, 52 (1), 106–28.
Berry, S., J. Levinsohn and A. Pakes (1995), 'Automobile prices in market equilibrium', Econometrica, 63 (4), 841–90.
Berry, S., J. Levinsohn and A. Pakes (2004a), 'Differentiated products demand systems from a combination of micro and macro data: the new car market', Journal of Political Economy, 112 (1), 68–105.
Berry, S., O. Linton and A. Pakes (2004b), 'Limit theorems for estimating the parameters of differentiated product demand systems', Review of Economic Studies, 71 (3), 613–54.
Bhatt, M. and C.F. Camerer (2005), 'Self-referential thinking and equilibrium as states of mind in games: fMRI evidence', Games and Economic Behavior, 52 (2), 424–59.
Blackorby, C. and E. Diewert (1979), 'Expenditure functions, local duality, and second order approximations', Econometrica, 47 (3), 579–601.
Block, H. and J. Marschak (1960), 'Random orderings and stochastic theories of response', in I. Olkin (ed.), Contributions to Probability and Statistics, Stanford, CA: Stanford University Press.
Blundell, R. and J. Powell (2004), 'Endogeneity in semiparametric binary response models', Review of Economic Studies, 71 (3), 655–79.
Blundell, R., M. Browning and I. Crawford (2008), 'Best nonparametric bounds on demand responses', Econometrica, 76 (6), 1227–62.
Blundell, R., M. Browning, L.J.H. Cherchye, I. Crawford, B. de Rock and F.M.P. Vermeulen (2012), 'Sharp for SARP: nonparametric bounds on the behavioural and welfare effects of price changes', Working Paper 2012.1214, Institute for Fiscal Studies, London.
Bowles, S. and H. Gintis (2011), A Cooperative Species: Human Reciprocity and Its Evolution, Princeton, NJ: Princeton University Press.
Bozarth, M. (1994), 'Pleasure systems in the brain', in D.M. Warburton (ed.), Pleasure: The Politics and the Reality, Chichester: Wiley, pp. 5–14.
Bridges, D. (1988), 'The Euclidean distance construction of order homomorphism', Mathematical Social Sciences, 15 (2), 179–88.
Browning, M. (1991), 'A simple nonadditive preference structure for models of household behavior over time', The Journal of Political Economy, 99 (3), 607–37.
Camerer, C. (1999), 'Behavioral economics: reunifying psychology and economics', Proceedings of the National Academy of Sciences, 96 (19), 10575–77.
Camerer, C. (2005), 'Strategizing in the brain', Science, 300 (5626), 1673–5.
Camerer, C. and R. Thaler (1995), 'Anomalies: ultimatums, dictators, and manners', Journal of Economic Perspectives, 9 (2), 209–19.
Campbell, J. and A. Deaton (1989), 'Why is consumption so smooth?', The Review of Economic Studies, 56 (3), 357–74.
Carson, R., N. Flores and N. Meade (2001), 'Contingent valuation: controversies and evidence', Environmental and Resource Economics, 19 (2), 173–210.
Chipman, J. and J. Moore (1980), 'Compensating variation, consumer's surplus, and welfare', American Economic Review, 70 (5), 933–49.
Chipman, J. and J. Moore (1990), 'Acceptable indicators of welfare change, consumer's surplus analysis, and the Gorman polar form', in D. McFadden and M. Richter (eds), Preferences, Uncertainty, and Optimality: Essays in Honor of Leonid Hurwicz, Boulder, CO, and Oxford: Westview Press, pp. 68–120.
Christensen, L., D. Jorgenson and L. Lau (1975), 'Transcendental logarithmic utility functions', American Economic Review, 65 (3), 367–83.
Conniffe, D. (2007), 'A note on generating globally regular indirect utility functions', B.E. Journal of Theoretical Economics, 7 (1), 1–11.
Court, A. (1939), 'Hedonic price indexes with automobile examples', in General Motors Corporation, The Dynamics of Automobile Demand, New York: General Motors Corporation, pp. 99–117.
Dagsvik, J., S. Strom and Z. Zia (2005), 'Utility of income as a random function: behavioral characterization and empirical evidence', working paper, Statistics Norway.
Damasio, A. (2005), 'Brain trust', Nature, 435 (2 June), 571–2.
Deaton, A. and J. Muellbauer (1980a), 'An almost ideal demand system', American Economic Review, 70 (3), 312–26.
Deaton, A. and J. Muellbauer (1980b), Economics and Consumer Behavior, Cambridge: Cambridge University Press.
Debreu, G. (1959), Theory of Value, New Haven, CT: Yale University Press.
Debreu, G. (1974), 'Excess demand functions', Journal of Mathematical Economics, 1 (1), 15–24.
Debreu, G. (1986), 'Topological methods in cardinal utility theory', Mathematical Economics: Twenty Papers of Gerard Debreu, Econometric Society Monograph, Cambridge; New York and Melbourne: Cambridge University Press.
Diamond, I. and A. Gordon (1997), 'Cellular and molecular neuroscience of alcoholism', Physiological Review, 77 (1), 1–20.
Diamond, P. and J. Hausman (1994), 'Contingent valuation: is some number better than no number?', Journal of Economic Perspectives, 8 (4), 45–64.
Dickhaut, J., K. McCabe, J. Nagode, A. Rustichini, K. Smith and J. Pardo (2005), 'The impact of certainty context on the process of choice', working paper, University of Minnesota.
Diewert, E. (1971), 'An application of the Shephard duality theorem, a generalized Leontief production function', Journal of Political Economy, 79 (3), 481–507.
Diewert, E. (1974), 'Applications of duality theory', in M. Intriligator and D. Kendrick (eds), Frontiers of Quantitative Economics, vol. 2, Amsterdam: North-Holland, pp. 106–71.
Diewert, E. (1982), 'Duality approaches to microeconomic theory', in K.J. Arrow and M.D. Intriligator (eds), Handbook of Mathematical Economics, vol. 2, Amsterdam: North-Holland, pp. 535–99.
Domencich, T. and D. McFadden (1975), Urban Travel Demand: A Behavioral Analysis, Amsterdam: North-Holland.
Dubin, J. and D. McFadden (1984), 'An econometric analysis of residential electric appliance holdings and consumption', Econometrica, 52 (2), 345–62.
Duesenberry, J.S. (1949), Income, Saving and the Theory of Consumer Behaviour, Cambridge, MA: Harvard University Press.
Dupuit, J. (1844), 'On the measurement of the utility of public works', Annales des ponts et chaussées, trans. 1952 by R.H. Barback, International Economic Papers, 2, 83–110.
Edgeworth, F.Y. (1881), Mathematical Psychics; an Essay on the Application of Mathematics to the Moral Sciences, London: C.K. Paul & Company.
Edgeworth, F.Y. (1894), 'Distance in time as an element of value', Palgrave's Dictionary of Political Economy; republished 2003 in P. Newman, F.Y. Edgeworth's Mathematical Psychics and Further Papers on Political Economy, Oxford: Oxford University Press.
Eisenberger, N. and M. Lieberman (2004), 'Why rejection hurts: a common neural alarm system for physical and social pain', Trends in Cognitive Science, 8, 294–300.
Ekeland, I. (2010), 'Existence, uniqueness, and efficiency of equilibrium in hedonic markets with multidimensional types', Economic Theory, 42 (2), 275–315.
Emerson, R.W. (1844), 'Experience', Essays: Second Series, Boston, MA: John Munroe.
Fehr, E. and S. Gächter (2002), 'Altruistic punishment in humans', Nature, 415 (10 January), 137–40.
Fehr, E. and U. Fischbacher (2002), 'Why social preferences matter – the impact of non-selfish motives on competition, cooperation, and incentives', Economic Journal, 112 (March), 1–33.
Fehr, E., U. Fischbacher and M. Kosfeld (2005), 'Neuroeconomic foundations of trust and social preferences', American Economic Review, 95 (2), 346–51.
Fenchel, W. (1953), 'Convex cones, sets, and functions: lecture notes', mimeograph, Princeton University.
Fisher, I. (1892), Mathematical Investigations in the Theory of Value and Prices, New Haven, CT: Yale University Press.
Fisher, I. (1908), 'Are savings income?', American Economic Association Quarterly, 3rd Series, 9, 21–47.
Fisher, I. (1918), 'Is "utility" the most suitable term for the concept it is used to denote?', American Economic Review, 8 (2), 335–7.
Fisher, I. (1927), 'A statistical method for measuring marginal utility and testing for the justice of a progressive income tax', in J. Hollander (ed.), Economic Essays in Honor of John Bates Clark, New York: Macmillan.
Fisher, I. (1930), The Theory of Interest as Determined by Impatience to Spend Income and Opportunity to Invest It, New York: The Macmillan Company.
Fosgerau, M. and D. McFadden (2012), 'A theory of the perturbed consumer with general budgets', working paper, Institute for Transport, Technical University of Denmark.
Frey, B. and A. Stutzer (2002a), 'The economics of happiness', World Economics, 3 (1), 1–17.
Frey, B. and A. Stutzer (2002b), Happiness and Economics, Princeton, NJ: Princeton University Press.
Frisch, R. (1926), 'Sur un problème d'économie pure', Norsk Matematisk Forenings Skrifter, 16, 1–40; trans. in J. Chipman, L. Hurwicz, M. Richter and H. Sonnenschein (eds) (1971), Preferences, Utility, and Demand, New York: Harcourt.
Frisch, R. (1932), New Methods of Measuring Marginal Utility, Tübingen: Mohr.
Fuss, M. and D. McFadden (1978), Production Economics: A Dual Approach to Theory and Applications, Amsterdam: North-Holland.
Georgescu-Roegen, N. (1936), 'The pure theory of consumer's behavior', Quarterly Journal of Economics, 50 (August), 545–93.
Gilovich, T., D. Griffin and D. Kahneman (2002), Heuristics and Biases: The Psychology of Intuitive Judgment, New York: Cambridge University Press.
Glimcher, P., M. Dorris and H. Bayer (2005), 'Physiological utility theory and the neuroeconomics of choice', Games and Economic Behavior, 52 (2), 213–56.
Glimcher, P., E. Fehr, C. Camerer and R. Poldrack (eds) (2009), Neuroeconomics: Decision Making and the Brain, New York: Academic Press.
Gorman, W. (1953), 'Community preference fields', Econometrica, 21, 63–80.
Gorman, W. (1961), 'On a class of preference fields', Metroeconomica, 13 (2), 53–6.
Gossen, H. (1854), Die Entwickelung der Gesetze des menschlichen Verkehrs, English translation 1983, The Laws of Human Relations, Cambridge, MA: MIT Press.
Green, D., K. Jacowitz, D. Kahneman and D. McFadden (1998), 'Referendum contingent valuation, anchoring, and willingness to pay for public goods', Resource and Energy Economics, 20 (2), 85–116.
Green, P., D. Carroll and S. Goldberg (1981), 'A general approach to product design optimization via conjoint analysis', Journal of Marketing, 45 (2), 17–37.
Green, P., A. Krieger and Y. Wind (2001), 'Thirty years of conjoint analysis: reflections and prospects', Interfaces, 31 (3), S56–S73.
Griliches, Z. and I. Adelman (1961), 'On an index of quality change', Journal of the American Statistical Association, 56 (295), 535–48.
Hall, R.E. (1978), 'Stochastic implications of the permanent income hypothesis: theory and evidence', Journal of Political Economy, 86 (6), 971–87.
Hammond, P. (1994), 'Money metric measures of individual and social welfare allowing for environmental externalities', in W. Eichhorn (ed.), Models and Measurement of Welfare and Inequality, Heidelberg: Springer-Verlag, pp. 694–724.
Hastie, R. and R. Dawes (2001), Rational Choice in an Uncertain World, Thousand Oaks, CA: Sage.
Hausman, J. (1985), 'The econometrics of nonlinear budget sets', Econometrica, 53 (6), 1255–82.
Heckman, J.J., R. Matzkin and L. Nesheim (2010), 'Nonparametric identification and estimation of nonadditive hedonic models', Econometrica, 78 (5), 1569–91.
Henrich, J., R. Boyd, S. Bowles, C. Camerer, E. Fehr and H. Gintis (2004), Foundations of Human Sociality, Oxford: Oxford University Press.
Hicks, J. (1939), Value and Capital, Oxford: Clarendon Press.
Horowitz, J. and N. Savin (2001), 'Binary response models: logits, probits and semiparametrics', Journal of Economic Perspectives, 15 (4), 43–56.
Horowitz, J. (1992), 'A smoothed maximum score estimator for the binary response model', Econometrica, 60 (3), 505–31.
Hotelling, H. (1935), 'Demand functions with limited budgets', Econometrica, 3 (January), 66–78.
Houthakker, H. (1950), 'Revealed preference and the utility function', Economica, 17 (66), 159–74.
Huang, J.C. and D. Nychka (2000), 'A nonparametric multiple choice method within the random utility framework', Journal of Econometrics, 97 (2), 207–25.
Huber, J., J. Lynch, K. Corfman, J. Feldman, M. Holbrook, D. Lehman, B. Munier, D. Schkade and I. Simonson (1997), 'Thinking about values in prospect and retrospect: maximizing experienced utility', Marketing Letters, 8 (3), 323–34.
Hurwicz, L. and H. Uzawa (1971), 'On the integrability of demand functions', in J. Chipman, L. Hurwicz, M. Richter and H. Sonnenschein (eds), Preferences, Utility, and Demand, New York: Harcourt, pp. 114–48.
Ichimura, H. and L.F. Lee (1991), 'Semiparametric least squares estimation of multiple index models: single equation estimation', in W. Barnett, J. Powell and G. Tauchen (eds), Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, Cambridge: Cambridge University Press, pp. 3–49.
Ichimura, H. and T.S. Thompson (1998), 'Maximum likelihood estimation of a binary choice model with random coefficients of unknown distribution', Journal of Econometrics, 86 (2), 269–95.
Inglehart, R. (2004), European and World Value Surveys Integrated Data File, 1999–2002, Ann Arbor, MI: Institute for Social Research, University of Michigan.
Jevons, W. (1871), Theory of Political Economy, reprinted 1931, London: Macmillan.
Johansson-Stenman, O. and H. Svedsäter (2003), 'Self-image and choice experiments: hypothetical and actual willingness to pay', working paper, Göteborg University.
Johnson, R. (1974), 'Trade-off analysis of consumer values', Journal of Marketing Research, 11 (2), 121–7.
Jorgenson, D., L. Lau and T. Stoker (1980), 'Welfare comparison under exact aggregation', American Economic Review, 70 (2), 268–72.
Jorgenson, D., L. Lau and T. Stoker (1997), 'The transcendental logarithmic model of aggregate consumer behavior', in D. Jorgenson (ed.), Welfare, Volume 1: Aggregate Consumer Behavior, Cambridge, MA and London: MIT Press, pp. 203–356; previously published 1982.
Kahneman, D., E. Diener and N. Schwartz (1999), Well-Being: The Foundations of Hedonic Psychology, New York: Russell Sage Foundation.
Kahneman, D., J. Knetsch and R. Thaler (1990), 'Experimental tests of the endowment effect and the Coase theorem', Journal of Political Economy, 98 (6), 1325–48.
Kahneman, D., J. Knetsch and R. Thaler (1991), 'The endowment effect, loss aversion, and status quo bias', Journal of Economic Perspectives, 5 (1), 193–206.
Kahneman, D., P. Wakker and R. Sarin (1997), 'Back to Bentham? Explorations of experienced utility', Quarterly Journal of Economics, 112 (2), 375–405.
Katzner, D. (1970), Static Demand Theory, New York: Macmillan.
King, G., C. Murray, J. Salomon and A. Tandon (2004), 'Enhancing the validity and cross-cultural comparability of measurement in survey research', American Political Science Review, 98 (1), 191–207.
Knetsch, J. (1989), 'The endowment effect and evidence of nonreversible indifference curves', American Economic Review, 79 (5), 1277–84.
Kohler, H.-P. (2001), Fertility and Social Interactions, Oxford: Oxford University Press.
Kosfeld, M., M. Heinrichs, P. Zak, U. Fischbacher and E. Fehr (2005), 'Oxytocin increases trust in humans', Nature, 435 (June), 673–6.
Koszegi, B. (2004), 'Ego utility, overconfidence, and task choice', Journal of the European Economic Association, 4 (4), 673–707.
Lancaster, K. (1966), 'A new approach to consumer theory', Journal of Political Economy, 74 (2), 132–57.
Langer, T. and M. Weber (2001), 'Prospect theory, mental accounting, and differences in aggregated and segregated evaluation of lottery portfolios', Management Science, 47 (5), 716–33.
Leibenstein, H. (1950), 'Bandwagon, snob, and Veblen effects in the theory of consumers' demand', Quarterly Journal of Economics, 64 (2), 183–207.
Lewbel, A. (1992), 'Aggregation with log-linear models', Review of Economic Studies, 59 (3), 635–42.
List, J. (2004), 'Neoclassical theory versus prospect theory: evidence from the marketplace', Econometrica, 72 (2), 615–25.
Loewenstein, G. (1996), 'Out of control: visceral influences on behavior', Organizational Behavior and Human Decision Processes, 65 (3), 272–92.
Loewenstein, G., D. Read and R. Baumeister (2003), Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice, New York: Russell Sage Foundation.
Lucas, R. (1975), 'An equilibrium model of the business cycle', The Journal of Political Economy, 83 (December), 1113–44.
Luce, D. and J. Tukey (1964), 'Simultaneous conjoint measurement: a new type of fundamental measurement', Journal of Mathematical Psychology, 1 (1), 1–27.
Luce, R. and P. Suppes (1965), 'Preference, utility, and subjective probability', in R. Luce, R. Bush and E. Galanter (eds), Handbook of Mathematical Psychology, vol. 3, New York: Wiley, pp. 249–410.
Malinvaud, E. (1953), 'Capital accumulation and efficient allocation of resources', Econometrica, 21 (2), 233–68.
Manski, C. (1991), 'Nonparametric estimation of expectations in the analysis of discrete choice under uncertainty', in W. Barnett, J. Powell and G. Tauchen (eds), Nonparametric and Semiparametric Methods in Econometrics and Statistics, Cambridge: Cambridge University Press, pp. 259–75.
Manski, C. (1993), 'Dynamic choice in social settings: learning from the experiences of others', Journal of Econometrics, 58 (1–2), 121–36.
Manski, C. (2004), 'Measuring expectations', Econometrica, 72 (5), 1329–76.
Marschak, J. (1960), 'Binary choice constraints on random utility indicators', in K. Arrow (ed.), Stanford Symposium on Mathematical Methods in the Social Sciences, Stanford, CA: Stanford University Press.
Marshall, A. (1895), Principles of Economics, London and New York: Macmillan.
Mas-Colell, A. (1996), 'A model of equilibrium with differentiated commodities', in G. Debreu (ed.), General Equilibrium Theory, vol. 2, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 462–94.
Mas-Colell, A., M. Whinston and J. Green (1995), Microeconomic Theory, Oxford: Oxford University Press.
Matzkin, R. (1992), 'Nonparametric and distribution-free estimation of the binary threshold crossing and the binary choice models', Econometrica, 60 (2), 239–70.
Matzkin, R. (1993), 'Nonparametric identification and estimation of polychotomous choice models', Journal of Econometrics, 58 (1–2), 137–68.
Matzkin, R. (2005), 'Identification of consumers' preferences when their choices are unobservable', Economic Theory, 26 (2), 423–44.
Matzkin, R. (2008), 'Identification in nonparametric simultaneous equations models', Econometrica, 76 (5), 945–78.
Matzkin, R. (2012), 'Identification in nonparametric limited dependent variable models with simultaneity and unobserved heterogeneity', Journal of Econometrics, 166 (1), 106–15.
Matzkin, R. and D. McFadden (2011), 'Trembling payoff market games', working paper, University of California, Berkeley.
McCabe, K., D. Houser, L. Ryan, V. Smith and T. Trouard (2001), 'A functional imaging study of cooperation in two-person reciprocal exchange', Proceedings of the National Academy of Sciences, 98 (20), 11832–5.
McClure, S.M., D. Laibson, G. Loewenstein and J. Cohen (2004), 'Separate neural systems value immediate and delayed monetary rewards', Science, 306 (5695), 503–7.
McFadden, D. (1966), 'Cost, revenue, and profit functions', lecture notes, University of California.
McFadden, D. (1974a), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers of Econometrics, New York: Academic Press, pp. 105–42.
McFadden, D. (1974b), 'The measurement of urban travel demand', Journal of Public Economics, 3 (4), 303–28.
McFadden, D. (1974c), 'On some facets of betting', in M.S. Balch, D. McFadden and S.Y. Wu (eds), Essays on Economic Behavior under Uncertainty, Amsterdam: North-Holland, pp. 126–37.
McFadden, D. (1986), 'The choice theory approach to market research', Marketing Science, 5 (4), 275–97.
McFadden, D. (1994), 'Contingent valuation and social choice', American Journal of Agricultural Economics, 76 (4), 689–708.
McFadden, D. (1999a), 'Rationality for economists?', Journal of Risk and Uncertainty, 19 (1–3), 73–105.
McFadden, D. (1999b), 'Computing willingness-to-pay in random utility models', in J. Moore, R. Riezman and J. Melvin (eds), Trade, Theory, and Econometrics: Essays in Honour of John S. Chipman, London: Routledge, pp. 253–74.
McFadden, D. (2005), 'Revealed stochastic preference: a synthesis', Economic Theory, 26 (2), 245–64.
McFadden, D. (2006), 'Free markets and fettered consumers', American Economic Review, 96 (1), 5–29.
McFadden, D. (2008), 'Environmental valuation of environmental projects', working paper, University of California.
McFadden, D. (2010), 'Sociality, rationality, and the ecology of choice', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Bingley: Emerald.
McFadden, D. (2012), 'Economic juries and public project provision', Journal of Econometrics, 166 (1), 116–26.
McFadden, D. and M.K. Richter (1990), 'Stochastic rationality and revealed stochastic preference', in J. Chipman, D. McFadden and M.K. Richter (eds), Preferences, Uncertainty, and Optimality: Essays in Honor of Leonid Hurwicz, Boulder, CO: Westview Press, pp. 161–86.
McFadden, D. and K. Train (1996), 'Consumers' evaluation of new products: learning from self and others', Journal of Political Economy, 104 (4), 683–703.
McFadden, D. and K. Train (2000), 'Mixed MNL models for discrete response', Journal of Applied Econometrics, 15 (5), 447–70.
McFadden, D., R. Mantel, A. Mas-Colell and M.K. Richter (1974), 'A characterization of community excess demand functions', Journal of Economic Theory, 9 (4), 361–74.
McKenzie, L. (1957), 'Demand theory without a utility index', The Review of Economic Studies, 24 (3), 185–9.
Mellers, B. (2000), 'Choice and the relative pleasure of consequences', Psychological Bulletin, 126 (6), 910–24.
Morikawa, T., M. Ben-Akiva and D. McFadden (2002), 'Discrete choice models incorporating revealed preferences and psychometric data', in P. Franses and A. Montgomery (eds), Econometric Models in Marketing, Amsterdam, London and New York: Elsevier Science, pp. 29–55.
Moscati, I. (2007), 'Early experiments in consumer demand theory: 1930–1970', History of Political Economy, 39 (3), 359–401.
Muth, J. (1992), 'Rational expectations and the theory of price movements', in K. Hoover (ed.), The New Classical Macroeconomics, vol. 1, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 3–23; previously published 1961.
Muth, J. (1994), 'Optimal properties of exponentially weighted forecasts', in A. Harvey (ed.), Time Series, vol. 1, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 121–8; previously published 1960.
Muth, R. (1998), 'Household production and consumer demand functions', in K. Lancaster (ed.), Consumer Theory, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 302–11; previously published 1966.
Myagkov, M. and C. Plott (1997), 'Exchange economies and loss exposure: experiments exploring prospect theory and competitive equilibria in market environments', American Economic Review, 87 (5), 801–28.
Newman, P. (2003), F.Y. Edgeworth: Mathematical Psychics and Further Papers on Political Economy, Oxford: Oxford University Press.
Ohta, M. (1971), 'Hedonic price index for boiler and turbo-generator: a cost function approach', PhD thesis, University of California, Berkeley.
Ohta, M. and Z. Griliches (1986), 'Automobile prices and quality: did gasoline price increases change consumer tastes in the U.S.?', Journal of Business and Economic Statistics, 4 (2), 187–98.
Okubo, M. (2008), 'Intertemporal substitution and nonhomothetic preferences', Economics Letters, 98 (1), 41–7.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge: Cambridge University Press.
Pareto, V. (1906), Manual of Political Economy, English translation 1971, New York: Augustus M. Kelley.
Peleg, B. (1970), 'Utility functions for partially ordered topological spaces', Econometrica, 38 (1), 93–6.
Pollak, R. (1970), 'Habit formation and dynamic demand functions', Journal of Political Economy, 78 (4), 745–63.
Rabin, M. (1998), 'Psychology and economics', Journal of Economic Literature, 36 (1), 11–46.
Rader, T. (1973), 'Nice demand functions', Econometrica, 41 (5), 913–35.
Redelmeier, D. and D. Kahneman (1996), 'Patients' memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures', Pain, 66 (1), 3–8.
Richter, M.K. (1966), 'Revealed preference theory', Econometrica, 34 (3), 635–45.
Ridley, M. (1996), The Origins of Virtue, Harmondsworth: Penguin.
Rosen, S. (1974), 'Hedonic prices and implicit markets: product differentiation in pure competition', Journal of Political Economy, 82 (1), 34–55.
Rossi, P. (1979), 'Vignette analysis: uncovering the normative structure of complex judgments', in R.K. Merton, J.S. Coleman and P.H. Rossi (eds), Qualitative and Quantitative Social Research: Papers in Honor of Paul Lazarsfeld, New York: Macmillan, pp. 175–88.
Roy, R. (1942), Eléments d'économétrie, reprinted 1970, Paris: Presses universitaires de France.
Rustichini, A., J. Dickhaut, P. Ghirardato, K. Smith and J. Pardo (2003), 'A brain imaging study of the choice process', working paper, University of Minnesota.
Samuelson, P. (1947), Foundations of Economic Analysis, reprinted 1983, Cambridge, MA: Harvard University Press.
Samuelson, P. (1948), 'Consumption theory in terms of revealed preference', Economica, 15 (60), 243–53.
Samuelson, P. (1950), 'The problem of integrability in utility theory', Economica, 17 (November), 355–85.
Samuelson, P. (1993), 'Altruism as a problem involving group versus individual selection in economics and biology', American Economic Review, 83 (2), 143–8.
Sanfey, A.G., J.K. Rilling, J.A. Aronson, L.E. Nystrom and J.D. Cohen (2003), 'The neural basis of economic decision-making in the ultimatum game', Science, 300 (5626), 1755–8.
Savage, L.J. (1954), The Foundations of Statistics, New York: Wiley.
Schelling, T.C. (1978), Micromotives and Macrobehavior, New York: Norton.
Shephard, R.W. (1953), Cost and Production Functions, Princeton, NJ: Princeton University Press.
Signorini, D. and M. Jones (2004), 'Kernel estimators for univariate binary regression', Journal of the American Statistical Association, 99 (465), 119–26.
Slutsky, E. (1915), 'Sulla teoria del bilancio del consumatore', Giornale degli Economisti; English translation, 'On the theory of the budget of the consumer', reprinted 1952 in G. Stigler and K. Boulding (eds), Readings in Price Theory, Homewood, IL: Irwin.
Smith, A. (1753), The Theory of Moral Sentiments, reprinted 1984, Oxford: Oxford University Press.
Smith, A. (1776), An Inquiry into the Nature and Causes of the Wealth of Nations, London: W. Strahan and T. Cadell.
Stone, R. (1954), 'Linear expenditure systems and demand analysis', Economic Journal, 64, 511–27.
Sundaresan, S. (1989), 'Intertemporally dependent preferences and the volatility of consumption and wealth', Review of Financial Studies, 2, 75–89.
Taussig, F. (1912), Principles of Economics, New York: Macmillan.
Taylor, L. (2005), 'Estimation of theoretically plausible demand functions from U.S. Consumer Expenditure Survey data', working paper, University of Arizona.
Thurstone, L.L. (1931), 'The indifference function', Journal of Social Psychology, 2 (2), 139–67.
Toubia, O., D. Simester, J. Hauser and E. Dahan (2003), 'Fast polyhedral adaptive conjoint estimation', Marketing Science, 22 (3), 273–303.
Train, K. and C. Winston (2007), 'Vehicle choice behavior and the declining market share of U.S. automakers', International Economic Review, 48 (4), 1469–98.
Urban, G.L., J.R. Hauser and J.H. Roberts (1990), 'Prelaunch forecasting of new automobiles: models and implementation', Management Science, 36 (4), 401–21.
Urban, G.L., J.R. Hauser, W.J. Qualls, B.D. Weinberg, J.D. Bohlmann and R.A. Chicos (1997), 'Validation and lessons from the field: applications of information acceleration', Journal of Marketing Research, 34 (1), 143–53.
Uzawa, H. (1971), 'Preference and rational choice in the theory of consumption', in J. Chipman, L. Hurwicz, M. Richter and H. Sonnenschein (eds), Preferences, Utility, and Demand, New York: Harcourt, pp. 7–28.
Van Praag, B. and A. Kapteyn (1994), 'How sensible is the Leyden individual welfare function of income? A reply', European Economic Review, 38 (9), 1817–25.
Varey, C. and D. Kahneman (1992), 'Experiences extended across time: evaluation of moments and episodes', Journal of Behavioral Decision Making, 5 (3), 169–86.
Varian, H. (1982), 'The nonparametric approach to demand analysis', Econometrica, 50 (4), 945–73.
Varian, H. (1992), Microeconomic Analysis, New York: Norton.
Varian, H. (2006), Revealed Preference, New York: Oxford University Press.
Veblen, T. (1899), The Theory of the Leisure Class: An Economic Study in the Evolution of Institutions, New York: Macmillan.
Von Neumann, J. and O. Morgenstern (1953), Theory of Games and Economic Behavior, Princeton, NJ: Princeton University Press.
Wallis, W.A. and M. Friedman (1942), 'The empirical derivation of indifference functions', in O. Lange and others (eds), Studies in Mathematical Economics and Econometrics, Chicago: University of Chicago Press, pp. 175–89.
Willig, R. (1976), 'Consumer's surplus without apology', American Economic Review, 66 (4), 589–97.
Yatchew, A. (1998), 'Nonparametric regression techniques in economics', Journal of Economic Literature, 36 (2), 669–721.
Yatchew, A. (2003), Semiparametric Regression for the Applied Econometrician, Cambridge and New York: Cambridge University Press.
Yoon, K. (2001), 'The modified Vickrey double auction', Journal of Economic Theory, 101 (2), 572–84.
Zamagni, S. (1995), The Economics of Altruism, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
3
Psychological research and theories on preferential choice Jerome R. Busemeyer and Jörg Rieskamp1
Understanding human preferential choice behavior is challenging because humans change their preferences across time and contexts. This chapter summarizes the basic behavioral findings from research on human preferential choice and reviews the psychological theories that have been proposed to account for the puzzling findings. The main theme that we attempt to convey to the reader is that a coherent view of an individual's underlying beliefs and values can only be recovered by carefully modeling the dynamic nature of the choice process through which these beliefs and values operate to produce observed behavior.

When examining people's choice behavior it becomes apparent that it varies substantially. For instance, Hey (2001) conducted a study in which 53 people repeatedly chose between pairs of simple gambles in five different sessions. In every session the same set of 100 pairs was presented. If preferences were stable and deterministic, all people should have made identical choices in every session. However, it turned out that no single person always made the same choices across all five sessions. Instead, participants changed their preferences for, on average, 10 percent of the pairs between two consecutive sessions, and for, on average, 23 percent of the pairs they did not make identical choices in all five sessions. The seminal work by Mosteller and Nogee (1951) discovered early on that people's choice behavior varies and has a probabilistic character. More surprising is that the consequences of this discovery for theory-building are still not fully acknowledged. Axiomatic approaches to human choice behavior imply deterministic behavior and do not contain an error theory that could explain people's inconsistencies. The undeniable variability in people's behavior is often implicitly acknowledged by assuming that inconsistencies can be explained by unsystematic errors or 'white noise'. This implies that people's behavior varies around the theories' deterministic predictions. However, when people's inconsistencies are systematic, such 'tremble error' theories (Loomes et al., 2002) are not sufficient to explain human behavior. Moreover, without an explicit error theory it appears almost impossible to separate unsystematic from systematic inconsistencies and to unravel the mechanisms that underlie the systematic inconsistencies.

In this chapter we first describe the two standard approaches to explaining the probabilistic character of choice behavior, represented by fixed and random utility models. Second, we summarize the key empirical findings violating essential principles of utility theories. Next we present psychological models of choice behavior that have been suggested to explain some of the observed behavioral regularities, starting with static choice models and moving to dynamic models that describe how people's preferences evolve over time. Then we summarize some additional and new directions, and finally we conclude by briefly comparing the different approaches and models against each other.
1 CLASSIC PROBABILISTIC UTILITY MODELS
Before delving into the behavioral research on preferential choice, it is useful to first spell out two classic utility theory approaches: fixed versus random utility models. These theories served as a primary guide for past research on preferential choice by implying specific choice principles that were tested empirically. These empirical tests revealed the inadequacies of the theories in describing people's behavior and led to the development of more descriptive and cognitively driven models. In the following we describe and define these models using the notation and terms employed in Rieskamp et al. (2006).

Assume that there exists a complete set of choice options X = {A1, . . . , An} under consideration. The person may be presented with a subset Y = {B1, . . . , Bm} ⊆ X, m ≤ n, of this complete set. The probability that an individual chooses option Bi from the set Y is denoted p(Bi|Y), with the constraints p(Bi|Y) ≥ 0 and Σi∈Y p(Bi|Y) = 1.0. Note that psychological models normally assume that an individual's behavior is probabilistic, and so the theory needs to be defined with the probabilities at an individual level (pooled across replications within an individual) rather than an aggregate level (pooled across individuals).

1.1 Fixed Utility Approach
According to the fixed utility model, a real value u(Ai) can be assigned to each option Ai ∈ X that remains fixed across choice sets Y. When presented with the subset Y, the probability of choosing option Bi equals:

p(Bi|Y) = f(u(Bi), u(B1), . . . , u(Bi−1), u(Bi+1), . . . , u(Bm))   (3.1)

where the function f is strictly increasing in the first coordinate, u(Bi), and strictly decreasing in each of the remaining coordinates (which can be permuted without changing the predictions). For example, according to a Luce (1959) ratio-of-strength model:

p(Bi|Y) = exp(u(Bi)) / Σj∈Y exp(u(Bj)).   (3.2)
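As a concrete illustration, the following short Python sketch computes equation (3.2) for a hypothetical three-option set; the utility values are invented for illustration.

    import math

    def luce_choice_probabilities(utilities):
        # Equation (3.2): p(Bi|Y) = exp(u(Bi)) / sum over j in Y of exp(u(Bj)).
        weights = [math.exp(u) for u in utilities]
        total = sum(weights)
        return [w / total for w in weights]

    # Invented utilities for a three-option set Y = {B1, B2, B3}.
    print(luce_choice_probabilities([1.0, 0.5, 0.0]))
    # -> roughly [0.51, 0.31, 0.19]; increasing in u(Bi) and summing to 1.

Because the utilities do not change with the choice set, dropping an option simply rescales the remaining probabilities while preserving their ratios, which is the context independence property discussed next.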
A key idea of this class of models is that choice is inherently probabilistic and fundamentally unpredictable. Even with fixed utilities, a person's choice on each occasion remains probabilistic. The main simplifying assumption of this model is that the utilities assigned to the choice options do not depend on the choice set Y. In other words, utilities are context independent, a property also called simple scalability (Luce and Suppes, 1965). This context independence property produces a shortcoming of this class of models. Of course the property can be relaxed, but how to do this in a coherent and parsimonious manner is quite challenging.

1.2 Random Utility Approach

According to random utility models, on any choice occasion the person samples an evaluation point w from a sample space W that determines the n real-valued utilities
Psychological research and theories on preferential choice 51 5 [U1(w), . . . ,Un(w)]. This sampling process produces n random variables Ui , i 5 1, . . . , n that determine an n dimensional random utility distribution function denoted by FX 5 Pr[U1 # u1, . . . , Un # un]; when presented with a choice set Y, the person chooses the option that has the maximum randomly sampled value, max {U(B1), . . . ,U(Bm)}. The choice probabilities for the set Y is based on the marginal m dimensional distribution FY, which is obtained by integrating FX over the values of the options in X that are not presented in Y. The probability that option Bi is chosen from the presented set Y equals the probability that the randomly sampled value for the random variable Ui is the maximum: p(Bi|Y) 5 Pr[ Ui 5 max{Uj, Uj [ Y}].
(3.3)
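The random utility story can be sketched just as briefly: sample an evaluation point, read off the utility vector, and choose the maximum; choice probabilities are then Monte Carlo estimates. The multivariate normal distribution and the parameter values below are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

def ru_choice_probs(mean, cov, n=100_000):
    """Estimate p(Bi|Y) = Pr[Ui = max_j Uj] by sampling evaluation points w."""
    u = rng.multivariate_normal(mean, cov, size=n)  # each row: one occasion
    winners = u.argmax(axis=1)                      # deterministic given the draw
    return np.bincount(winners, minlength=len(mean)) / n

mean = [1.0, 0.8, 0.5]           # hypothetical mean utilities for B1, B2, B3
cov = [[1.0, 0.3, 0.0],          # hypothetical correlations across options
       [0.3, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
print(ru_choice_probs(mean, cov))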
For example, the Thurstone (1959) model assumes that the distribution function for the random utility vector U = [U1, . . . , Un] is multivariate normal.2 The key idea behind this class of models is that once the evaluation point w is selected by the person, choice becomes deterministic. Behavior is only probabilistic because we do not know the point w used to evaluate the options; we only know the probabilities of sampling these points. The main assumption of this class of models is that the same n-dimensional distribution function FX is used for all choice sets Y. In other words, the distribution function is context independent. This context independence turns out to produce shortcomings for explaining certain choice phenomena with this class of models. The property can be relaxed, but how to do this in a coherent and parsimonious manner presents a major challenge.

1.3 Comparison of the Two Utility Approaches
The ideas motivating these two approaches are fundamentally different: fixed utility models imply that choice is fundamentally indeterministic, whereas random utility models imply that choice is completely deterministic. Despite these conceptual differences, it is often difficult to discriminate the two ideas empirically. For example, the Luce (1959) model can be mathematically derived from a random utility model with independent and identically distributed extreme value errors (Yellott, 1977). More general extreme value random utility models form the basis of many economic choice models (McFadden, 1981), and they are also very popular among marketing researchers (Louviere et al., 2000). In sum, fixed and random utility models represent the predominant approach (in particular, in economics) to predicting human choice behavior. Both approaches should be evaluated empirically against the major behavioral findings.
2 BASIC CHOICE BEHAVIOR FINDINGS
This section reviews the basic empirical findings from preferential choice that have accumulated across the last 50 or so years from behavioral economics, consumer research and psychology. As mentioned earlier, much of this research was targeted at basic properties implied by the two classic probabilistic utility approaches described above. The research examines whether human choice behavior obeys basic properties such as transitivity, independence, regularity, and stationarity.
2.1 Transitivity
One of the most basic properties of choice to examine empirically is transitivity. If a person prefers Beethoven to Mozart and Mozart to Chopin, then transitivity implies that the person will also prefer Beethoven to Chopin. It appears unreasonable to violate this principle repeatedly, and in general the transitivity property is considered to be one of the main axioms of rational choice. Formally, transitivity is defined over a mathematical relation called a 'preference' relation, denoted ≽, so that Bi ≽ Bj means that option Bi is preferred or indifferent to option Bj. Transitive preferences must satisfy Bi ≽ Bj ∧ Bj ≽ Bk ⟹ Bi ≽ Bk for all i, j, k in X. This is required by deterministic utility models in order to postulate a real valued utility function for ordering preferences, u: X → ℝ. Transitivity is often justified as an axiom of choice behavior by arguing that violations of this axiom permit a person to be turned into a money pump. If a person prefers B to C then the person should be willing to pay money to exchange C for B; likewise if the person prefers A to B then the person should be willing to pay money to exchange B for A; finally if the person also prefers C to A then the person should be willing to pay to exchange A for C, thus returning to their original position, but after losing money on three exchanges. However, it is unlikely that these intransitive preferences could be exploited by building a money pump (compare Chu and Chu, 1990). Instead, people will presumably notice their intransitive cycles at some point, making the money pump a 'bogeyman' that only demonstrates in principle the irrationality of intransitive choices but one that would never be observed (Lopes, 1996). The concept of transitivity is difficult to apply to probabilistic choice behavior because it is difficult to define the preference relation ≽ when choices are inconsistent. One way is to define Bi ≽ Bj ⟺ p(Bi|{Bi,Bj}) ≥ .50 (Luce, 2000). This immediately leads to a definition of weak stochastic transitivity:

p(Bi|{Bi,Bj}) ≥ .50 ∧ p(Bj|{Bj,Bk}) ≥ .50 ⟹ p(Bi|{Bi,Bk}) ≥ .50   (3.4)
for all i, j, k in X. The fixed utility class of models must satisfy weak stochastic transitivity; moreover, this class must also satisfy a stronger version called strong stochastic transitivity:

p(Bi|{Bi,Bj}) ≥ .50 ∧ p(Bj|{Bj,Bk}) ≥ .50 ⟹ p(Bi|{Bi,Bk}) ≥ max{p(Bi|{Bi,Bj}), p(Bj|{Bj,Bk})}   (3.5)
for all i, j, k in X. This follows from the fact that p(Bi|{Bi,Bj}) ≥ .50 ⟹ u(Bi) ≥ u(Bj) and p(Bj|{Bj,Bk}) ≥ .50 ⟹ u(Bj) ≥ u(Bk); together these imply u(Bi) ≥ u(Bk), so that p(Bi|{Bi,Bk}) ≥ p(Bj|{Bj,Bk}) and p(Bi|{Bi,Bk}) ≥ p(Bi|{Bi,Bj}). The random utility class of models does not need to satisfy weak stochastic transitivity. The decision-maker can be transitive from each point of view w, but averaging across the different preference orders produced by different points of view can violate weak stochastic transitivity (Regenwetter et al., 2011). For instance, suppose that across different occasions a person experiences the following three transitive preference orders: (B1 ≻ B2 ≻ B3), (B2 ≻ B3 ≻ B1), (B3 ≻ B1 ≻ B2), where A ≻ B indicates strict preference. If we assume that these preferences occur equally often, then we observe that p(B1|{B1,B2}) = 2/3, p(B2|{B2,B3}) = 2/3, but p(B1|{B1,B3}) = 1/3, violating weak stochastic transitivity; this is the 'Condorcet paradox'. Although random utility models do not need to satisfy weak stochastic transitivity, they need to satisfy another transitivity property called the triangle inequality for binary choices (Becker et al., 1963), if we assume no indifference, p(Ui = Uj) = 0:

p(Bi|{Bi,Bj}) + p(Bj|{Bj,Bk}) − p(Bi|{Bi,Bk}) ≤ 1.   (3.6)
This can be shown as follows. For brevity, define p(xyz) as the probability of the strict transitive order x ≻ y ≻ z. Then p(x|{x,y}) + p(y|{y,z}) − p(x|{x,z}) ≤ 1 ⟺ p(z|{x,z}) ≤ p(z|{y,z}) + p(y|{x,y}) ⟺ p(zxy) + p(zyx) + p(yzx) ≤ [p(zyx) + p(zxy) + p(xzy)] + [p(yxz) + p(yzx) + p(zyx)]. Given the importance of the transitivity property, one would expect a firm resolution with regard to its empirical status. Beginning with May (1954), a long series of investigations has appeared reporting violations of transitivity (see the review by Rieskamp et al., 2006). Many of these studies were designed to replicate the well-known experiment by Tversky (1969), who reported violations of weak stochastic transitivity. Recently, however, the results regarding transitivity have been called into question because of inadequate methods for statistically testing this property (Iverson and Falmagne, 1985; Regenwetter et al., 2011). In response, several new statistical methods have been developed for testing transitivity, and these new results send a more mixed message. Tsai and Böckenholt (2006) used a random utility mixture model to compare a parametric formulation of an intransitive model versus a more constrained transitive model, and they found that chi-square difference tests rejected the constrained transitive model in favor of the more general intransitive model. Regenwetter et al. (2011) developed nonparametric statistical tests of the triangle inequality and found that most participants did not produce statistically significant violations. Birnbaum and Schmidt (2008) developed statistical models for choice patterns based on true preferences plus response errors, and they found that most participants produced patterns consistent with a model of transitive true preferences plus response error. In sum, in spite of the large body of research it remains unclear whether people violate weak stochastic transitivity; recent evidence using adequate statistical tests suggests that such violations are the exception rather than the rule.

More systematic, reliable, and robust evidence has been found for violations of strong stochastic transitivity (Mellers and Biagini, 1994; Rumelhart and Greeno, 1971; Tversky and Russo, 1969). The psychological reason for these violations is the following: when a person is faced with choices between multidimensional options, the context produced by a pair of options is used to single out some dimensions for making comparisons while ignoring other dimensions. For example, one pair of options A and B might differ largely in a tradeoff among the quality features but seem similar in price – for this pair the person may tend to ignore the price difference and focus more on the quality features. In a comparison of B and C, the price difference may again appear negligible once the quality dimension is given more attention. However, when comparing A and C the difference in price might become substantial, so that the person may tend to refocus more heavily on price for that comparison. Thus the dimensions used to make the comparison change across choice pairs, and this change in the basis for comparison results in violations of strong stochastic transitivity.
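The Condorcet example above is easy to verify numerically. The sketch below (an illustration, not code from the chapter) recovers the binary choice probabilities implied by mixing the three transitive orders and checks them against weak stochastic transitivity and the triangle inequality (3.6).

# The three transitive strict orders from the text, each occurring equally often;
# earlier in the list means more preferred.
orders = [["B1", "B2", "B3"], ["B2", "B3", "B1"], ["B3", "B1", "B2"]]

def p_binary(x, y):
    """p(x|{x,y}): fraction of preference orders in which x precedes y."""
    return sum(o.index(x) < o.index(y) for o in orders) / len(orders)

print(p_binary("B1", "B2"), p_binary("B2", "B3"), p_binary("B1", "B3"))
# 2/3, 2/3, 1/3: weak stochastic transitivity is violated (Condorcet paradox).

lhs = p_binary("B1", "B2") + p_binary("B2", "B3") - p_binary("B1", "B3")
print(lhs <= 1.0)  # True: 2/3 + 2/3 - 1/3 = 1, so the bound (3.6) is still met.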
2.2 Independence
The next most basic property of choice to be examined empirically is independence, which concerns the invariance of the preference relation between two options as the choice set containing these options changes. More formally, suppose Y ⊆ X and Z ⊆ X are two choice sets that both contain the options Bi and Bj; then independence states (Tversky, 1972b)

p(Bi|Y) ≥ p(Bj|Y) ⟺ p(Bi|Z) ≥ p(Bj|Z).   (3.7)

In particular, taking Y = {Bi, Bj}: if p(Bi|{Bi,Bj}) ≥ .50, then Bi should be chosen at least as often as Bj in any larger set containing both options.
This property is also required by fixed utility theories: because the utilities are context independent, p(Bi|Y) ≥ p(Bj|Y) ⟺ u(Bi) ≥ u(Bj) ⟺ p(Bi|Z) ≥ p(Bj|Z) for any sets Y and Z containing both options. Empirically, however, independence is violated in systematic ways. The best known violation is the similarity effect (Tversky, 1972a): adding a new option that is similar to one of the existing options takes choice share disproportionately from the similar option, so that the preference relation between the original options can reverse in the larger set. A second violation is the compromise effect (Simonson, 1989): when two options A and B each take extreme values on different attributes, and an intermediate option C that represents a compromise between the extremes is included into the choice set, then the compromise becomes favored, so that p(C|{A,B,C}) ≥ p(B|{A,B,C}). This work has been replicated and extended in many subsequent studies (see the review by Kivetz et al., 2004).

2.3 Regularity
Another very basic property of choice to be examined empirically is regularity, which asserts that adding an option to a choice set should never increase the probability of selecting an option from the original set. More formally, suppose Z ⊂ Y ⊆ X are presented for choice. Then regularity states

p(Bi | {Bi} ∪ Z) ≥ p(Bi | {Bi} ∪ Y).   (3.8)
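Regularity can be checked by simulation for any random utility model. In this sketch the multivariate normal utilities and their parameters are illustrative assumptions; the estimated share of B1 in the larger set should not exceed its share in the smaller set, up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(1)

# Joint utilities for the complete set X = {B1, B2, B3, B4} (invented parameters).
mean = np.array([1.0, 0.9, 0.7, 0.6])
u = rng.multivariate_normal(mean, np.eye(4), size=200_000)

def p_choose(i, subset):
    """Estimate p(Bi | subset): Bi holds the max sampled utility in the subset."""
    sub = u[:, subset]
    return float(np.mean(sub.argmax(axis=1) == subset.index(i)))

p_small = p_choose(0, [0, 1, 2])     # p(B1 | {B1, B2, B3})
p_large = p_choose(0, [0, 1, 2, 3])  # p(B1 | {B1, B2, B3, B4})
print(p_small, p_large, p_small >= p_large)  # regularity: should print True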
This property holds for random utility theories, since it is less likely that Bi has the highest utility in a larger set Y than in a smaller set Z:

p(Bi|Z) = Pr[U(Bi) = max{U(Bj): Bj ∈ Z}] ≥ Pr[U(Bi) = max{U(Bj): Bj ∈ Z}] × Pr[U(Bi) = max{U(Bk): Bk ∈ Y−Z} | U(Bi) = max{U(Bj): Bj ∈ Z}] = p(Bi|Y).

Violations of regularity were first found by Huber et al. (1982) and are called asymmetric dominance effects or attraction effects. The essential idea is to examine the preferences for two options, say A versus B, with or without the context of another deficient option D. As in the similarity effect, options A and B are designed to be qualitatively different or dissimilar – for example, option A could be high on quality and price whereas option B is low on quality and price. Also as in the similarity effect, option D is designed to be very similar to option A; but the key difference needed to produce an attraction effect is to make D deficient or defective compared with option A – for example, D is of slightly lower quality but the same price as A, so that D is dominated by A. In this situation D is rarely chosen, but adding D to the choice set increases the probability of choosing option A, so that p(A|{A,B}) ≤ p(A|{A,B,D}). The same result can be obtained even when D is not dominated by A but is just much less attractive than A (Huber and Puto, 1983). These findings have been replicated in many studies (Wedell, 1991; for a review see Heath and Chatterjee, 1995).

2.4 Stationarity

All choices take time, and the time taken to make a decision can change the choice that is finally made. For example, suppose you wake one morning to find two emails, each of which is an invitation to present a keynote speech at an attractive venue, but unfortunately on the same day. Choosing between these mutually exclusive offers may take substantial time to think through the advantages and disadvantages, and deadlines for making the decision could affect the final decision by preventing you from thinking through all the consequences. If choice probabilities do not change as a function of deliberation time (excluding time to read the choices), then they are stationary. Define the probability of choosing an option A from a set Y ⊆ X conditioned on deliberating for a period of time t as pt(A|Y). Stationarity states that pt(A|Y) = p(A|Y) for all t > t0, where t0 is the time necessary to read the choice options. Fixed and random utility models are static models that provide no mechanisms for predicting the effects of decision time on choice probability. This oversimplification becomes a problem for these theories when stationarity is violated. On the one hand, one could argue that fixed and random utility models apply only to choices without time constraints, assuming that the probabilities converge to some asymptote pt(A|Y) → p∞(A|Y). On the other hand, even when there is no explicit deadline, there is a cost for taking time to deliberate that puts time pressure on the decision-maker. The effects of decision time on choice probability are now well established by several different lines of experimental research (Svenson and Maule, 1993). Consumer choices systematically reverse under time pressure (Svenson and Edland, 1987). Choices between uncertain actions systematically change as a function of deliberation time (Busemeyer, 1985) and even reverse under time pressure (Diederich, 2003). Compromise and attraction effects become even larger when decision-makers are encouraged to deliberate longer (Dhar et al., 2000; Pettibone, 2012). Decision-makers also tend to switch strategies for making decisions when pressed for time (Ben Zur and Bresnitz, 1981; Rieskamp and Hoffrage, 2008). Time is an important factor in choice behavior because information about the choice options must be accumulated across time, and the type of information ultimately entering the choice process depends on how much time is allocated to making the decision (Wallsten and Barton, 1982). Under short time constraints, the decision-maker may have time only to focus on the most important dimension and ignore a number of other relevant aspects of the decision problem. When given more time to process all of the dimensions, the initial preference established by the first dimension can be overcome by competing information accumulated on many other relevant dimensions. Alternatively, decision strategies could change from optimal to heuristic under time pressure (Payne et al., 1993). Clearly, decision time plays a major role in emergency types of decisions, such as medical or military decisions (Janis and Mann, 1977). However, decision time also plays an important role in day-to-day economic choices by consumers because of the well-known tradeoff between effort and accuracy when choosing strategies for decision-making (Payne et al., 1993). Time has a cost, and consumers are often unwilling to spend a large amount of costly time to make decisions (Wright, 1972). There is another way to think about stationarity, one that concerns changes in choice probabilities caused by learning from experience across a sequence of repeated decisions. There is now a large experimental literature on experience-based choice (compare Hertwig and Erev, 2009), and a growing theoretical effort to develop integrated models of learning and decision-making (Erev and Barron, 2005; Gonzalez and Dutt, 2011). However, the topic of learning goes beyond the intended range of this chapter.
2.5 Conclusions from Behavioral Findings
This overview of some of the main findings on human choice behavior illustrates that people often do not adhere to simple principles such as strong stochastic transitivity, independence, regularity and stationarity. People's preferences and choice behavior change as a function of the choice context as well as the amount of time taken to process the information for making a decision. However, standard fixed and random utility models are insufficiently sensitive to context or time pressure to capture these systematic changes in preference behavior. For instance, fixed utility theories assign values to options that are independent of the choice set, and random utility theories do not have any mechanism to explain the impact of available time on decision-making. Of course, one can change utilities in an ad hoc way for every context to fit these effects post hoc. For instance, the utility function could be defined to include a context effect term (for example, an extra utility term for a compromise option), and different random utilities could be assumed for different deliberation times (for example, by changing the means for each time period). But these ad hoc fixes do not provide a scientific explanation that allows a priori predictions in new contexts and under new time constraints. Psychological choice models, which are described next, tend to be more complex than the basic fixed and random utility models. They are designed to provide mechanisms to account for context effects, time pressure effects, or both.
3 STATIC PSYCHOLOGICAL MODELS

3.1 Thurstone Model
One of the earliest choice theories was put forth by Thurstone (1927). The theory was extended by Thurstone (1959) and later by Bock and Jones (1968); see Böckenholt (2006) for a recent review of the developments of this theory. Thurstone's theory is a random utility type of model that assumes a multivariate normal distribution for the random utility vector, U ~ Normal(μ, Σ), where μ is the mean vector and Σ is the variance–covariance matrix of the random utilities U. This model is closely related to the probit model used in economics and marketing (Hausman and Wise, 1978). Parsimonious representations of the variance–covariance matrix Σ are obtained by formulating specific covariance structure models (Takane, 1987). Being a random utility model, Thurstone's theory must satisfy the triangle inequality as well as the regularity principle presented earlier. Furthermore, because it assumes a multivariate normal distribution, it must also satisfy weak stochastic transitivity, but it can violate strong stochastic transitivity (Halff, 1976). When used in conjunction with multidimensional scaling methods, Thurstone's theory provides an effective way to model similarity effects that produce violations of independence and strong stochastic transitivity (Carroll and De Soete, 1991). In one version of this model, called the wandering ideal point (WIP) model, each choice object is represented as a fixed point lying within a multidimensional attribute space. Each person has an ideal point, which also lies within this same attribute space, but this ideal point randomly changes (wanders) across replications according to a multivariate normal distribution. On any trial, the choice option having the shortest distance to the sampled ideal point is chosen. Multidimensional scaling based on similarity judgments can be used to identify the dimensions of the attribute space as well as the coordinates of the choice objects. The similarity relations between the choice objects can then be used to account for violations of independence and strong stochastic transitivity. Thurstone's theory does not provide any mechanisms for explaining attraction effects and compromise effects; in particular, being a random utility model, it must satisfy regularity. Another deficiency is that Thurstone's theory is a static model, and so it cannot explain changes in choice probability as a function of deliberation time. However, below we will describe ways that the model can be reformulated as a diffusion model that allows the mean and covariance to change across time.
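A minimal sketch of the wandering ideal point idea, with invented coordinates and covariance: the options are fixed points in a two-dimensional attribute space, the ideal point is redrawn on every trial, and the nearest option is chosen.

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-D attribute coordinates for three choice objects.
options = np.array([[0.8, 0.2],
                    [0.5, 0.5],
                    [0.2, 0.8]])
ideal_mean = np.array([0.6, 0.4])  # mean ideal point (illustrative)
ideal_cov = 0.05 * np.eye(2)       # how much the ideal point wanders

def wip_choice_probs(n_trials=100_000):
    ideals = rng.multivariate_normal(ideal_mean, ideal_cov, size=n_trials)
    # Distance from each sampled ideal point to every option; the nearest wins.
    d = np.linalg.norm(ideals[:, None, :] - options[None, :, :], axis=2)
    winners = d.argmin(axis=1)
    return np.bincount(winners, minlength=len(options)) / n_trials

print(wip_choice_probs())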
3.2 Elimination by Aspects Model
One of the earliest formalizations of heuristic choice is Tversky's (1972a) elimination by aspects (EBA) theory. This theory was built on earlier ideas presented by Restle (1961) in which each option is described by a set of aspects (or properties) and choice is based on the selection of aspects. The basic idea is that when presented with a choice set Y ⊆ X, the person samples an aspect with a probability equal to its weight or importance, and then eliminates any option that does not contain this aspect. This elimination process continues until only one option remains, which is then chosen. This heuristic process leads to the following formulation for choice probability. Define Z as the power set of X, that is, the set of all subsets of X, and define Zj ∈ Z as one of the nonempty subsets, which is assigned a weight u(Zj) > 0. Then the probability of choosing option Bi ∈ Y ⊆ X equals

p(Bi|Y) = Σ_j u(Zj) · p(Bi|Zj) / Σ_j u(Zj),   (3.9)

where each sum extends across all subsets Zj that are strictly contained in Y, and it is assumed that the denominator is greater than zero.
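The elimination process itself is straightforward to simulate. In the sketch below the aspects and weights are hypothetical; aspects shared by all remaining options are skipped, since sampling them would eliminate nothing.

import random

random.seed(3)

# Hypothetical aspects possessed by each option, with importance weights.
aspects = {"A": {"cheap", "fast"},
           "B": {"cheap", "comfortable"},
           "C": {"fast", "comfortable", "stylish"}}
weights = {"cheap": 3.0, "fast": 2.0, "comfortable": 1.5, "stylish": 0.5}

def eba_trial(choice_set):
    remaining = set(choice_set)
    while len(remaining) > 1:
        # Aspects that discriminate: possessed by some, but not all, remaining options.
        live = [a for a in weights
                if 0 < sum(a in aspects[o] for o in remaining) < len(remaining)]
        if not live:  # no discriminating aspect left: pick at random
            return random.choice(sorted(remaining))
        a = random.choices(live, weights=[weights[x] for x in live])[0]
        remaining = {o for o in remaining if a in aspects[o]}
    return remaining.pop()

trials = [eba_trial("ABC") for _ in range(50_000)]
print({o: trials.count(o) / len(trials) for o in "ABC"})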
Tversky (1972b) proved that EBA satisfies weak stochastic transitivity, and he also proved that EBA can be re-expressed as a random utility model, so that it also satisfies regularity and the triangle inequality. However, the EBA model does not satisfy strong stochastic transitivity, and it also violates independence. In fact, the EBA model was originally designed to provide an elegant explanation of similarity effects on choice (Tversky, 1972a), but it cannot explain attraction and compromise effects. Mathematically, the EBA model is a Markov process (see Tversky, 1972b), and from this process one can derive choice response times (Busemeyer et al., 1988). However, the EBA model has not yet been used to explain changes in choice probability as a function of deliberation time.

3.3 Context Dependent Preference Model

After the discovery of attraction and compromise effects, Tversky and Simonson (1993) proposed the context dependent preference (CDP) model to account for these two new effects.3 This model assigns different values to an option depending on the other options in the choice set. The value of an option A presented in the context of choice set Y is defined as

V(A|Y) = Σk βk·v(Ak) + θ·ΣZ∈Y R(A,Z),   (3.10)
where the index k refers to the k-th attribute. The first term on the right, Σk βk·v(Ak), is a standard weighted sum of values for option A across the k attributes; this is a context free contribution assigned to option A. In the second term, R(A,Z) is the relative advantage of option A over option Z, which is defined as

R(A,Z) = Σk advk(A,Z) / [Σk advk(A,Z) + Σk disk(A,Z)],   (3.11)
with advk(A,Z) = v(Ak) − v(Zk) if this advantage is positive (otherwise the advantage is zero), and disk(A,Z) = d[v(Zk) − v(Ak)] if this disadvantage is positive (otherwise the disadvantage is zero). The function d[x] is assumed to be a convex function that reflects loss aversion, and θ is a constant that moderates the contextual effect. This contextual model provides a straightforward account of attraction and compromise effects based on the concept of loss aversion. Surprisingly, this same loss aversion principle prevents it from accounting for similarity effects (see Roe et al., 2001; Rieskamp et al., 2006, app. A). Furthermore, being a static model, it cannot explain the empirical fact that these effects grow larger with longer deliberation times. In fact, when fit to choices exhibiting both attraction and similarity effects, the model turns out to fit poorly and cannot fit the similarity effects (Soltani et al., 2012).
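To see how the relative advantage term operates, the sketch below evaluates equations (3.10)–(3.11) for hypothetical two-attribute options, with d[x] = x² standing in for the convex loss aversion function; all parameter values are invented.

def cdp_value(option, choice_set, values, betas, theta=1.0):
    """Context dependent preference value V(option|choice_set), eqs (3.10)-(3.11),
    with the convex disadvantage function d[x] = x**2 as an illustrative choice."""
    v = values[option]
    base = sum(b * vk for b, vk in zip(betas, v))  # context free weighted sum
    context = 0.0
    for other in choice_set:
        if other == option:
            continue
        w = values[other]
        adv = sum(max(vk - wk, 0.0) for vk, wk in zip(v, w))
        dis = sum(max(wk - vk, 0.0) ** 2 for vk, wk in zip(v, w))
        if adv + dis > 0:
            context += adv / (adv + dis)  # relative advantage R(option, other)
    return base + theta * context

# Hypothetical attribute values (say quality, economy) on a 0-1 scale:
# A and B are extremes, C is the compromise.
values = {"A": (0.9, 0.1), "B": (0.1, 0.9), "C": (0.5, 0.5)}
betas = (1.0, 1.0)
for Y in (["A", "B"], ["A", "B", "C"]):
    print(Y, {o: round(cdp_value(o, Y, values, betas), 3) for o in Y})
# With these illustrative numbers, C receives the highest value in the triple,
# in line with the compromise effect.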
3.4 Summary of Static Psychological Models
As mentioned earlier, the goal of these psychological choice models was to formulate psychological principles or mechanisms to capture context effects in a predictable manner. For example, the WIP Thurstone model uses the spatial locations of options in the attribute space to derive predictions for similarity effects, the EBA model derives predictions for similarity effects from its attribute sampling and elimination mechanism, and the CDP model derives predictions for attraction and compromise effects using the principle of loss aversion. However, there are several serious limitations of these static models. First, there is no single model that accounts for all three effects (similarity, compromise, attraction) simultaneously: some models (WIP, EBA) account for similarity effects only, and others (CDP) account only for attraction and compromise effects. Second, these models fail to describe how these context effects change across deliberation time. These deficiencies motivate the development of the dynamic models presented next.4
4 DYNAMIC PSYCHOLOGICAL MODELS
The static psychological models do not explicitly describe how people's preferences change as a function of deliberation time when there is a specific deadline, nor do they describe the random time, T, taken by the decision-maker to make a decision when there is no specific deadline. In contrast, dynamic models of choice behavior explicitly model how a decision deadline affects people's choices, as well as the time T required to make a decision when no deadline is imposed.
4.1 ‘Horse Race’ Choice Models
One of the first attempts to generalize the classic random utility family to include choice response time was a class called ‘horse race’ random utility models (Marley and Colonius, 1991). The underlying assumption is that when presented with a choice from set Y, each option is assigned a non-negative arrival time, and option Bi ∈ Y is chosen if its ‘arrival event’ occurs first. It is assumed that the random arrival times for all options in the entire set X form a joint multivariate distribution, and the distribution for a subset Y ⊆ X is obtained by marginalizing over the joint multivariate distribution of arrival times for the complete set X. The probability of choosing option Bi from set Y with this choice occurring after time t equals

p(Bi ∧ (T > t) | Y) = Pr([T(Bi) = min{T(Bj): Bj ∈ Y}] ∩ [T(Bi) > t]).   (3.12)
The ‘horse race’ choice model reduces to a random utility model when marginalizing over the time to make the decision; however, it extends the traditional random utility model by also providing a model of the distribution of the choice time T. Building on earlier ideas presented in Townsend and Ashby (1983), Otter et al. (2008) formulated an ‘independent’ version of the ‘horse race’ model. Otter et al. (2009) assumed that each choice option Bi in a set Y is associated with an independent counter Ni(t) that counts up events favoring that option by time t. The first counter to reach a threshold wins the race and determines the choice. Events favoring an option occur at times distributed according to a Poisson process, and the rate assigned to each option depends on the attribute values of the choice option. The model was fit to choice and response time data from a survey involving 422 people, with each person providing stated choices for 18 different choice sets, and each choice set containing five television sets described by six attributes. The researchers found that by fitting both choice and response time (as compared to fitting only choice), they obtained a better marginal fit to the choice part of the data, which suggests that response times are informative about the cognitive processes underlying the observed choices. The early versions of the ‘horse race’ choice model were random utility models, and so they inherited the triangle inequality and regularity properties. Therefore they could not explain violations of regularity such as the attraction effect, nor were they used to account for similarity or compromise effects. More recently, however, a linear ballistic accumulator (LBA) type of horse race model has been proposed that accounts for all three (similarity, attraction, compromise) context effects. The LBA model accomplishes this by introducing two new features into the horse race model – a front-end component that transforms choice stimuli into contextualized values and a back-end process that transforms these contextualized values into overt choices (Trueblood et al., 2014). The independent Poisson ‘horse race’ model and the LBA model do account for changes in choice probability as a function of deliberation time. However, the Poisson model also predicts that the distribution of choice times becomes normal as the counter threshold increases (Otter et al., 2009). This prediction is problematic because, in fact, the choice response time distribution tends to become more positively skewed with longer mean deliberation times (Ratcliff and Smith, 2004). This problem is corrected by the next class of dynamic models.
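The independent counter race is simple to simulate directly, since the time for a Poisson counter with a given rate to register its threshold count is gamma distributed. The rates and threshold below are invented for illustration.

import numpy as np

rng = np.random.default_rng(4)

def poisson_race(rates, threshold=10, n_trials=20_000):
    """Independent Poisson counters race to a common count threshold.
    Time for counter i to reach the threshold = sum of `threshold` exponential
    inter-event times, i.e. Gamma(threshold, 1/rate_i)."""
    rates = np.asarray(rates, dtype=float)
    finish = rng.gamma(shape=threshold, scale=1.0 / rates,
                       size=(n_trials, len(rates)))
    winners = finish.argmin(axis=1)           # first counter to finish wins
    probs = np.bincount(winners, minlength=len(rates)) / n_trials
    return probs, finish.min(axis=1).mean()   # choice shares, mean decision time

print(poisson_race([3.0, 2.5, 2.0]))  # hypothetical event rates per option
# Raising the threshold slows decisions and pushes the finishing-time
# distribution toward normality, the problematic prediction noted above.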
4.2 Sequential Sampling Choice Models
Sequential sampling models of decision-making were originally developed for Bayesian inference (DeGroot, 1970). Cognitive psychologists applied these models to a variety of cognitive tasks including sensory detection (Smith, 1995), perceptual discrimination (Laming, 1968; Link and Heath, 1975; Usher and McClelland, 2001; Vickers, 1979), memory recognition (Brown and Heathcote, 2005; Ratcliff, 1978), categorization (Ashby, 2000; Nosofsky and Palmeri, 1997), and probabilistic inference (Wallsten and Barton, 1982). Several sequential sampling models for preferential choice have also been proposed (Aschenbrenner et al., 1984; Bhatia, 2013; Fehr and Rangel, 2011; Glöckner and Betsch, 2008; Guo and Holyoak, 2002; Roe et al., 2001; Usher and McClelland, 2004). The basic idea of many sequential sampling models is that when presented with a subset of m choice options Y ⊆ X from a larger set X, a choice from this set takes some amount of deliberation time T. The decision process starts with an m × 1 state vector P(0), where each coordinate corresponds to the preference for one of the options in Y. This initial state P(0) reflects preferences for status quo options or biases from past experience. The initial state vector then evolves across time t by accumulating evaluations of the advantages and disadvantages of each option. After deliberating for a time t, the cumulative evaluations have evolved to a new state vector P(t) according to the linear stochastic difference equation (technically an Ornstein–Uhlenbeck, or OU, process)

dP(t+h) = P(t+h) − P(t) = −Γ·P(t)·h + μ·h + dB(t+h),   (3.13)
where dB(t+h) = B(t+h) − B(t) is a Brownian motion increment with mean zero and variance–covariance matrix Φ·h. The term μ·h + dB(t+h) is the new sample input into the process during the time step h, and −Γ·P(t)·h is a feedback process that can be used to maintain preference stability. An important special case is the Wiener process, which is obtained by setting Γ = 0; however, this allows preferences to grow without bound, and it also reduces the capability of accounting for context effects. The preference accumulation process continues until one of two stopping rules is satisfied – either a fixed or a variable stopping rule. According to a fixed time stopping rule, the preference state evolves until some externally determined deadline time t occurs. In this case, the process evolves in an unconstrained manner until the deadline, at which point the option with the strongest preference state is chosen (that is, the option corresponding to the maximum coordinate of the P(t) vector). In the case of the fixed time stopping rule, the theory is equivalent to a dynamic version of a Thurstone model in which the state vector P(t) serves as an evolving vector of random utilities. However, unlike the static Thurstone model, the mean and the variance–covariance matrix of P(t) evolve across time, which accounts for changes in preferences as a function of deliberation time. Furthermore, if the feedback matrix is non-zero (Γ ≠ 0), then both the mean and variance–covariance matrix of P(t) are context dependent; that is, the multivariate distribution of P(t) changes depending on the presented choice set Y. More specifically, if Γ ≠ 0, then the joint distribution for a set of options Y cannot be derived by marginalization from a complete joint distribution defined over all possible options in X. Consequently, these theories do not have to satisfy regularity.5 If the initial state P(0) starts out unbiased (i.e., P(0) = 0) then these theories satisfy both weak stochastic transitivity and the triangle inequality. However, they generally violate strong stochastic transitivity because of the changes in the variance–covariance matrix Φ across pairs of options. According to the variable time stopping rule, preferences continue to evolve until the preference strength for one of the coordinates, corresponding to one of the options, exceeds a positive threshold. The first option exceeding the threshold is chosen, and the deliberation time equals the time it takes for an option to first cross the threshold. Figure 3.1 illustrates the process using the variable stopping rule for three options {A, B, C}. The threshold determines how strong the preference must be in order to make a decision. Increasing the threshold increases the average time to make a decision, which allows more evaluations of all of the options' advantages and disadvantages; decreasing the threshold decreases the average time to make a decision, which limits the ability to evaluate all the advantages and disadvantages of each option. The threshold itself is determined by the cost of sampling new information as compared to the gains and losses expected from the final decision after sampling more information.
Figure 3.1  The preference state for each option/action changes over time by sampling information about the choice options

Note: The horizontal axis represents time, the vertical axis represents the preference state for each option, and the flat horizontal line represents the decision threshold that has to be crossed by one of the preference states so that a choice occurs. In this case, option A first crosses the threshold at a time equal to 2 seconds and is chosen by the decision-maker.
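The accumulation-to-threshold process in Figure 3.1 can be sketched with an Euler discretization of equation (3.13) under the variable stopping rule. The drift vector, feedback matrix, noise level and threshold below are illustrative assumptions, not parameters from any fitted model.

import numpy as np

rng = np.random.default_rng(5)

def ou_race(mu, gamma, noise_sd=0.3, threshold=1.0, h=0.001, max_t=10.0):
    """Euler scheme for dP = -Gamma.P.h + mu.h + dB; the first coordinate to
    cross the threshold gives the choice, the crossing time the decision time."""
    mu = np.asarray(mu, dtype=float)
    P = np.zeros_like(mu)  # unbiased initial state P(0) = 0
    t = 0.0
    while t < max_t:
        dB = rng.normal(0.0, noise_sd * np.sqrt(h), size=mu.shape)
        P = P - (gamma @ P) * h + mu * h + dB
        t += h
        if P.max() >= threshold:
            break
    return int(P.argmax()), t

mu = [0.8, 0.7, 0.5]                   # mean input favoring option A
gamma = np.array([[0.20, 0.05, 0.05],  # self-decay on the diagonal,
                  [0.05, 0.20, 0.05],  # lateral inhibition off it
                  [0.05, 0.05, 0.20]])
choices, rts = zip(*(ou_race(mu, gamma) for _ in range(200)))
print(np.bincount(choices, minlength=3) / len(choices), np.mean(rts))
# A higher threshold lengthens deliberation and lets more evaluations accrue.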
When the variable stopping time rule is used, sequential sampling models can be used to predict both the choice probabilities and the mean decision times. In fact, a strong test of the model is obtained by fitting the model parameters to the choice probabilities, and then using these same parameters to predict mean decision time. Sequential sampling models provide very accurate accounts of the skewed shape of choice response time distributions (Ratcliff and Smith, 2004). In addition, they are designed to account for similarity, compromise and attraction effects using the exact same parameters for all three effects (see Bhatia, 2013; Roe et al., 2001; Usher and McClelland, 2004). Furthermore, they make the strong a priori prediction that both compromise and attraction effects become larger with longer deliberation times, a prediction that has been confirmed by experimental evidence (Dhar et al., 2000; Pettibone, 2012). As mentioned above, there are several versions of this general type of sequential sampling theory of preferential choice. One of the earliest was decision field theory (DFT) (Busemeyer and Townsend, 1993; Roe et al., 2001). Decision field theory shares exactly the same parameters as the traditional Thurstone model with respect to the mean vector μ and the covariance matrix Φ of the OU process. According to DFT, the mean μ and covariance matrix Φ are determined by the expectation and variance–covariance, respectively, of the advantages or disadvantages of each option along a randomly sampled attribute (Roe et al., 2001). Different attributes are sampled across time by switching attention from one attribute to another during deliberation. Like the EBA model, attention-switching provides the mechanism used to account for similarity effects and violations of independence and strong stochastic transitivity. Also key to DFT, the coefficients in the feedback matrix Γ are determined by the distances between options in a multi-attribute space (Hotaling et al., 2010); three additional parameters are required to determine the feedback matrix Γ on the basis of the distances between the m options in the set Y (see Hotaling et al., 2010, for details). According to DFT, this feedback matrix is the key mechanism for predicting attraction and compromise effects. Importantly, DFT predicts all three effects (similarity, attraction and compromise), as well as other violations of independence, using the same fixed set of parameters. The feedback mechanism of DFT also explains why attraction and compromise effects become larger with longer deliberations. Decision field theory has been shown to predict choices between gambles better than popular models of decision-making under risk such as cumulative prospect theory (Rieskamp, 2008), and it also fits choices between consumer products better than another recently introduced cognitive model called the proportional difference choice model (Gonzalez-Vallejo, 2002; Scheibehenne et al., 2009).
Finally, DFT provides accurate cross-validation predictions for mean decision time based on parameters derived from choice probabilities (Diederich and Busemeyer, 2003; Dror et al., 1999). The leaky competing accumulator (LCA) model is another important version that shares many assumptions with DFT but differs on some key mechanisms (Usher and McClelland, 2004).6 This model also assumes that people switch their attention during the decision process to the different attributes and accumulate advantages or disadvantages for each option across time. This attention-switching mechanism allows LCA, similarly to DFT, to explain violations of independence (such as the similarity effect) and violations of strong stochastic transitivity. The LCA model is also able to explain the attraction and compromise effects, but it uses a different mechanism than DFT: it does not assume distance dependence in the feedback matrix Γ (all off-diagonal elements are the same), but instead incorporates the same loss aversion principle as the context dependent preference model (Tversky and Simonson, 1993) to predict the compromise and attraction effects. The accumulation mechanism of LCA also enables the model to explain how attraction and compromise effects increase with longer deliberations. Decision field theory and the leaky competing accumulator model remain competitive by partly using different mechanisms to account for the main behavioral findings, but they are still quite similar and share many assumptions in contrast to more traditional random utility theories (Tsetsos et al., 2010). Krajbich et al. (2010) developed a version of sequential sampling called the attention modulated drift-diffusion model (AM-DDM). Unlike DFT and LCA, it does not assume attention-switching across attributes; instead it assumes attention shifting across choice alternatives. The AM-DDM assumes that each option accumulates random utilities according to a Wiener process (Γ = 0). The mean drift rate μ is simply determined by a traditional weighted sum of the attribute values, and Φ = σ²·I, where σ² is the variance (a scalar). However, the mean drift rate is not constant during deliberation: it changes depending on the option to which the decision-maker is attending. Using eye-movement recordings, it is possible to track the location of the gaze of the decision-maker at each moment during the decision process. The gaze is observed to shift from one option to another when they are spatially separated in a display. The mean drift rate of the Wiener process is assumed to change depending on the gaze of the decision-maker: the option currently being viewed is given more weight, so that if the option under gaze is advantageous then its advantage is enhanced during the gaze, and if it is disadvantageous then this disadvantage is enhanced. This model provided accurate quantitative predictions for complex relationships between eye fixation patterns and choices across time, as well as several fixation-driven decision biases. Unlike DFT, it does not make use of the feedback matrix (Γ = 0), and unlike LCA, it does not make use of loss aversion; consequently, it has not been used to predict similarity, attraction, and compromise effects or other context effects and violations of independence. The newest version of this class of sequential sampling models is the associative accumulation model (AAM) (Bhatia, 2013). Once again this model shares many assumptions with decision field theory, such as sequential accumulation of advantages and disadvantages for options based on attention-switching across attributes to account for similarity effects. However, unlike decision field theory, the AAM does not make use of the feedback matrix Γ to account for attraction and compromise effects (Γ = 0).
Instead it assumes that the average attention weight given to an attribute is mainly determined by the total absolute value assigned to that attribute summed across alternatives. Attributes present in extreme quantities in some alternatives, attributes present in many alternatives, and attributes present in especially salient alternatives are more accessible relative to their competitors. This mechanism can account not only for compromise and attraction effects, but also for several other context effects such as alignment effects (Markman and Medin, 1995) and reference point effects (Tversky and Kahneman, 1991).

4.3 Summary of Dynamic Models
The dynamic psychological models are more complex than the static psychological models, but they substantially expand their explanatory power. First, they are capable of predicting both choice probability and the distribution of the time taken to make a choice. Second, they are capable of predicting the effects of time limits or time pressure on choice probabilities. Third, their dynamics provide mechanisms for simultaneously explaining all three types of context effects (similarity, attraction and compromise) as well as many other violations of independence and strong stochastic transitivity. None of the previously developed static models was found to be capable of predicting all these effects. This does not mean that it is impossible to discover a new static model that can do the job, but such a model remains to be developed. Dynamic models, and sequential sampling models in particular, make strong predictions about the sampling process used to make decisions. One of the basic assumptions (except for the AM-DDM) is that information sampling is not affected by the person's current preference (formally, the input μ·h + dB(t+h) does not depend on the current state of preference P(t)). However, recent studies using information search (Willemsen et al., 2011) and eye movements (Fiedler and Glöckner, 2012) indicate that the sampling process becomes biased towards attributes that support the currently favored alternative. This finding suggests that the current preference state may feed back and modify the input into the accumulation process (see, for example, Glöckner and Betsch, 2008; Guo and Holyoak, 2002).
5 ADDITIONAL AND NEW DIRECTIONS
This section briefly reviews some additional important ideas for choice modeling coming from cognitive psychology and neuroscience.

5.1 Decision by Sampling

One limitation of the previously mentioned sequential sampling models is that they do not specify the details of the sampling process other than assuming a mean input vector μ and a covariance matrix Φ for the Brownian motion process. Another cognitive theory, called decision by sampling, provides more detailed mechanisms for the sequential evaluations entering the accumulation process (Stewart and Simpson, 2008; Stewart et al., 2006). Stewart et al. (2006) postulate that the subjective value of an attribute is derived on the fly from comparisons with samples of other attribute values drawn from long-term memory, and that this long-term memory reflects the values experienced from real-world distributions. Stewart et al. (2006) showed that the shapes of utility functions, decision weighting functions and delay discounting functions estimated in decision research can be explained by the real-world distributions of gains, losses, probabilities and delays that people experience.
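The core computation of decision by sampling is easy to sketch: the subjective value of a target attribute value is its relative rank within a small comparison sample retrieved from long-term memory. The memory distribution below is invented; its positive skew mimics a real-world price distribution.

import random

random.seed(6)

# Hypothetical long-term memory of experienced prices (positively skewed).
memory_prices = [2, 3, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 30, 50, 80]

def subjective_value(target, memory, sample_size=8):
    """Decision by sampling: value = fraction of a random memory sample that the
    target beats (for prices, lower is better)."""
    sample = random.sample(memory, sample_size)
    return sum(target <= s for s in sample) / sample_size

# Averaging over many samples traces out the induced, rank-based value function.
for price in (3, 10, 40):
    est = sum(subjective_value(price, memory_prices) for _ in range(5_000)) / 5_000
    print(price, round(est, 3))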
5.2 Quantum Choice Theory
Here we wish to return to one of the most fundamental issues concerning the probabilistic nature of choice: why is choice probabilistic? One idea is that choice is fundamentally deterministic – each choice is determined by a real preference order or preference state existing at the moment immediately before the choice is made, but we (the theorists) do not have full information about this precise state. Another idea is that choice is fundamentally indeterministic – there is no real preference order or preference state existing at the moment immediately before a choice is made. Both the random utility and the sequential sampling classes of models are deterministic in the sense that choice is based on a specific state that exists at the moment before the choice is made; although the EBA model was originally cast as a probabilistic process, it can also be interpreted as a random utility model. Is there any empirical reason for questioning the deterministic underpinning of choice? Recently, interest has been growing in the use of principles based on quantum probability theory, which posits that choice is essentially indeterministic in nature (Lambert-Mogiliansky et al., 2009). The reason for considering a quantum approach is that preferences sometimes violate the law of total probability that lies at the heart of classic probabilistic models built upon deterministic mechanisms (Khrennikov, 2010; Busemeyer and Bruza, 2012; Pothos and Busemeyer, 2009; Yukalov and Sornette, 2011).

5.3 Decision Neuroscience

More recently, sequential sampling models of decision-making have attracted the interest of decision neuroscientists (Fehr and Rangel, 2011). Neuroscientists have examined the neural basis for decision-making in the brains of macaque monkeys using single cell recording techniques (Gold and Shadlen, 2001, 2002; Platt, 2002; Schall, 2003). Both the choice and the decision time of the monkeys were accurately predicted, trial by trial, based on where and when the neural activation crossed a threshold bound. Moreover, sequential sampling models have been fit to choice/response time data and then used to make a priori predictions for electrophysiological activation (Smith and Ratcliff, 2004). Although this research was based on single or multiple cell recordings of saccadic eye movements in monkeys, converging evidence has been reported using cognitive tasks with humans. Researchers have recorded electrophysiological potentials from humans during a categorization task, called the lateralized readiness potential (Gratton et al., 1988). These potentials were recorded from the scalp above the premotor cortex, which signals preparation for left- or right-hand movements to a cue, and choice and response times were found to be determined by accumulation of the lateralized readiness potential to a threshold criterion. More direct evidence for accumulation to threshold as the basis for perceptual decision-making has been obtained from functional magnetic resonance imaging (Liu and Pleskac, 2011). Another imaging study by Gluth et al. (2012) explored the predictive accuracy of sequential sampling models for preferential choices. In their experiment the participants received sequential information about a stock company that could be bought or rejected. The results showed that a time-variant sequential sampling model using a decreasing rather than a fixed decision threshold was best in predicting the decision time and choice behavior.
Furthermore, the option's value as assumed by the sequential sampling model correlated with brain activity in the ventromedial prefrontal cortex, the right and left orbitofrontal cortex, and the ventral striatum. These results indicate that the brain accumulates samples of information by forming an updated value representation in dopaminoceptive areas including the ventromedial prefrontal cortex and the ventral striatum. The conclusion from this research is that decisions in the brain are based on the accumulation of noisy activation until a threshold is reached, which forms the basis for the sequential sampling models in cognitive science.
6 GENERAL CONCLUSIONS
In this chapter we have reviewed some of the main behavioral findings in behavioral economics, consumer research and psychology. This list of findings provides a basis for evaluating the different approaches to predicting and explaining human choice behavior. On the one hand, we have shown that standard utility approaches, such as fixed and random utility models, have difficulties in explaining why people violate principles such as strong stochastic transitivity, independence or regularity. On the other hand, we have shown that more cognitive models of decision-making, which also aim to describe how preferential choices evolve dynamically over time, can provide explanations for these findings. These models are the result of a long and rich history of choice modeling in psychology, a history that has produced increasingly sophisticated, and increasingly complex, models. The increase in theoretical complexity has been driven by two sources: the first is the attempt to explain the highly context-sensitive nature of human preferential choice behavior; the second is the attempt to account for more details of choice behavior, including decision time, eye movements, information search and choice confidence (Pleskac and Busemeyer, 2010; Ratcliff and Starns, 2009; Van Zandt, 2000). Beyond that, decision neuroscientists have begun to explain the neurological underpinnings of decision-making using electrophysiological recordings and/or functional magnetic resonance imaging during choice. The psychological history of choice modeling began over 50 years ago with Luce's choice model (Luce, 1959), which was essentially a context-independent theory. Next came developments of Thurstone's (1959) theory to account for the effects of similarity between alternatives on choice, and aspect-based theories by Restle (1961) and Tversky (1972a) were also devised to account for similarity effects. However, these earlier models turned out to be inadequate to explain more complicated findings such as compromise and attraction effects. Dynamic models such as the sequential sampling models (DFT, LBA, LCA and AAM) are the first to be able to account for the full range of context effects in a coherent manner and to predict how these effects change as a function of deliberation time. These models also add explanatory power by accounting for the strong effects of deliberation time on choice probability, and they gain empirical testability by making new predictions regarding the relations between choice probability and choice response time. More recent models, such as the decision by sampling model, provide a new capability for deriving the evaluations entering the sequential sampling accumulation process on the basis of long-term memory retrievals, where the memory is built from experience with real-world distributions. When developing sophisticated models, it is of course also important to examine to what extent the increased complexity of a model can be justified by its increased goodness of fit in predicting human behavior: only when the qualitative and quantitative increase in predictive accuracy is large enough can the increased complexity be justified.

Although the goals of psychologists and economists with respect to choice modeling overlap to some extent, there are also some clear differences. On the one hand, economists are often primarily interested in applying simple choice models to large samples of people with relatively few data per person, and the primary goal is to obtain efficient estimates of economically relevant parameters. For this goal, robust choice models are required that provide statistically efficient and computationally practical parameter estimates for further economic analysis. On the other hand, psychologists are primarily interested in explaining the cognitive processes behind choice, using models that can simultaneously account for various measurements of choice, including choice selection, decision time, confidence and brain activation, and that provide predictions across a wide range of puzzling findings. For this goal, more complex models are required that provide explanatory coherence and the power to predict new findings. But these are only two extreme positions, and more often the researcher is interested in both of these goals. For example, a marketing researcher may want to predict the market share of a new product to be introduced to the market. The market share will depend on the context of the market, that is, the other products available. Given the huge costs of introducing a new product to the market (for example, product development and marketing), the precision with which the future share of the product can be predicted becomes essential. In this situation, using a complex choice model that requires higher methodological effort to estimate can pay off in terms of obtaining better predictions of people's behavior.
NOTES

1. This work was supported by NIDA grant 5R01DA030551 to the first author and by a SNSF research grant 100014_130149 to the second author.
2. The Thurstone model has always been a theory of choice. Initially it was applied to psychophysics (Thurstone, 1927), but later (Thurstone, 1959) it was applied to values and preferences (Bock and Jones, 1968).
3. For some reason, similarity effects are ignored in this article.
4. Recently, Soltani et al. (2012) proposed a range renormalization model to account for both attraction and similarity effects. However, it is a static choice model: it has not been shown to predict compromise effects, and it fails to describe how the effects change with deliberation time.
5. Tony Marley (personal communication, 9 November 2011) pointed out to the first author of this chapter that if G = 0 then this model becomes a special case of the 'horse race' model described in Marley and Colonius (1991) and satisfies regularity.
6. Strictly speaking, the LCA is not an OU process, because a lower threshold is imposed on the preferences so that they reflect off a bound at zero. This bound is used to make the model more similar to neural activations, which must always be positive.
REFERENCES

Aschenbrenner, K.M., D. Albert and F. Schmalhofer (1984), 'Stochastic choice heuristics', Acta Psychologica, 56 (1–3), 153–66.
Ashby, F.G. (2000), 'A stochastic version of general recognition theory', Journal of Mathematical Psychology, 44 (2), 310–29.
Becker, G.M., M.H. DeGroot and J. Marschak (1963), 'Stochastic models of choice behavior', Behavioral Science, 8 (1), 41–55.
Ben Zur, H. and S.J. Bresnitz (1981), 'The effect of time pressure on risky choice behavior', Acta Psychologica, 47 (2), 89–104.
Bhatia, S. (2013), 'Associations and the accumulation of preference', Psychological Review, 120 (3), 522–43.
Birnbaum, M.H. and U. Schmidt (2008), 'An experimental investigation of violations of transitivity in choice under uncertainty', Journal of Risk and Uncertainty, 37 (1), 77–91.
Bock, R.D. and L.V. Jones (1968), The Measurement and Prediction of Judgment and Choice, Oxford: Holden-Day.
Böckenholt, U. (2006), 'Thurstonian-based analyses: past, present, and future utilities', Psychometrika, 71 (4), 615–29.
Brown, S. and A. Heathcote (2005), 'A ballistic model of choice response time', Psychological Review, 112 (1), 117–28.
Busemeyer, J.R. (1985), 'Decision making under uncertainty: a comparison of simple scalability, fixed-sample, and sequential-sampling models', Journal of Experimental Psychology: Learning, Memory, & Cognition, 11 (3), 538–64.
Busemeyer, J.R. and J.T. Townsend (1993), 'Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment', Psychological Review, 100 (3), 432–59.
Busemeyer, J.R., B.H. Forsyth and G. Nozawa (1988), 'The use of choice response time to distinguish between elaborated versions of the elimination by aspects and suppression of aspects choice models', Journal of Mathematical Psychology, 32 (3), 341–9.
Carroll, J.D. and G. De Soete (1991), 'Toward a new paradigm for the study of multiattribute choice behavior', American Psychologist, 46 (4), 342–52.
Chu, Y.P. and R.L. Chu (1990), 'The subsidence of preference reversals in simplified and marketlike experimental settings – a note', American Economic Review, 80 (4), 902–11.
DeGroot, M.H. (1970), Optimal Statistical Decisions, New York: McGraw-Hill.
Dhar, R., S.M. Nowlis and S.J. Sherman (2000), 'Trying hard or hardly trying: an analysis of context effects in choice', Journal of Consumer Psychology, 9 (4), 189–200.
Diederich, A. (2003), 'MDFT account of decision making under time pressure', Psychonomic Bulletin & Review, 10 (1), 157–66.
Diederich, A. and J.R. Busemeyer (2003), 'Conflict and the stochastic-dominance principle of decision making', Psychological Science, 10 (4), 353–9.
Dror, I.E., J.R. Busemeyer and B. Basola (1999), 'Decision making under time pressure: an independent test of sequential sampling models', Memory & Cognition, 27 (4), 713–25.
Erev, I. and G. Barron (2005), 'On adaptation, maximization, and reinforcement learning among cognitive strategies', Psychological Review, 112 (4), 912–31.
Fehr, E. and A. Rangel (2011), 'Neuroeconomic foundations of economic choice – recent advances', Journal of Economic Perspectives, 25 (4), 3–30.
Fiedler, S. and A. Glöckner (2012), 'The dynamics of decision making in risky choice: an eye-tracking analysis', Frontiers in Psychology, 3, 335, doi: 10.3389/fpsyg.2012.00335.
Glöckner, A. and T. Betsch (2008), 'Modeling option and strategy choices with connectionist networks: towards an integrative model of automatic and deliberate decision making', Judgment and Decision Making, 3 (3), 215–28.
Gluth, S., J. Rieskamp and C. Büchel (2012), 'Deciding when to decide: time-variant sequential sampling models explain the emergence of value-based decisions in the human brain', Journal of Neuroscience, 32 (31), 10686–98.
Gold, J.I. and M.N. Shadlen (2001), 'Neural computations that underlie decisions about sensory stimuli', Trends in Cognitive Sciences, 5 (1), 10–16.
Gold, J.I. and M.N. Shadlen (2002), 'Banburismus and the brain: decoding the relationship between sensory stimuli, decisions, and reward', Neuron, 36 (October), 299–308.
Gold, J.I. and M.N. Shadlen (2007), 'The neural basis of decision making', Annual Review of Neuroscience, 30 (July), 535–74.
Gonzalez, C. and V. Dutt (2011), 'Instance-based learning: integrating decisions from experience in sampling and repeated choice experiments', Psychological Review, 118 (4), 523–51.
Gonzalez-Vallejo, C. (2002), 'Making trade-offs: a probabilistic and context-sensitive model of choice behavior', Psychological Review, 109 (1), 137–54.
Gratton, G., M.G.H. Coles, E.J. Sirevaag, C.W. Eriksen and E. Donchin (1988), 'Pre- and post-stimulus activation of response channels: a psychophysiological analysis', Journal of Experimental Psychology: Human Perception and Performance, 14 (3), 331–44.
Guo, F.Y. and K.J. Holyoak (2002), 'Understanding similarity in choice behavior: a connectionist model', in W.D. Gray and C.D. Schunn (eds), Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society, Mahwah, NJ: Erlbaum, pp. 393–8.
Halff, H.M. (1976), 'Choice theories for differentially comparable alternatives', Journal of Mathematical Psychology, 14 (3), 244–6.
Hausman, J.A. and D.A. Wise (1978), 'A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences', Econometrica, 46 (2), 403–26.
Heath, T.B. and S. Chatterjee (1995), 'Asymmetric decoy effects on lower quality versus higher quality brands: meta-analytic and experimental evidence', Journal of Consumer Research, 22 (3), 268–84.
Hertwig, R. and I. Erev (2009), 'The description-experience gap in risky choice', Trends in Cognitive Sciences, 13 (12), 517–23.
Hey, J.D. (2001), 'Does repetition improve consistency?', Experimental Economics, 4 (1), 5–54.
Hotaling, J.M., J.R. Busemeyer and J. Li (2010), 'Theoretical developments in decision field theory: a comment on K. Tsetsos, N. Chater and M. Usher', Psychological Review, 117 (4), 1294–8.
Huber, J. and C. Puto (1983), 'Market boundaries and product choice: illustrating attraction and substitution effects', Journal of Consumer Research, 10 (1), 31–44.
Huber, J., J.W. Payne and C. Puto (1982), 'Adding asymmetrically dominated alternatives: violations of regularity and the similarity hypothesis', Journal of Consumer Research, 9 (1), 90–98.
Iverson, G.J. and J.-C. Falmagne (1985), 'Statistical issues in measurement', Mathematical Social Sciences, 10 (2), 131–53.
Janis, I.L. and L. Mann (1977), Decision Making: A Psychological Analysis of Conflict, Choice, and Commitment, New York: Free Press.
Khrennikov, A.Y. (2010), Ubiquitous Quantum Structure: From Psychology to Finance, Berlin: Springer.
Kivetz, R., O. Netzer and V. Srinivasan (2004), 'Alternative models for capturing the compromise effect', Journal of Marketing Research, 41 (3), 237–57.
Krajbich, I., C. Armel and A. Rangel (2010), 'Visual fixations and the computation and comparison of value in simple choice', Nature Neuroscience, 13 (10), 1292–8.
Lambert-Mogiliansky, A., A. Zamir and H. Zwirn (2009), 'Type indeterminacy: a model of the KT (Kahneman–Tversky) man', Journal of Mathematical Psychology, 53 (5), 349–61.
Laming, D.R. (1968), Information Theory of Choice-Reaction Times, New York: Academic Press.
Link, S.W. and R.A. Heath (1975), 'A sequential theory of psychological discrimination', Psychometrika, 40 (1), 77–111.
Liu, T. and T.J. Pleskac (2011), 'Neural correlates of evidence accumulation in a perceptual decision task', Journal of Neurophysiology, 106 (5), 2383–98.
Loomes, G., P.G. Moffatt and R. Sugden (2002), 'A microeconometric test of alternative stochastic theories of risky choice', Journal of Risk and Uncertainty, 24 (2), 103–30.
Lopes, L.L. (1996), 'When time is of the essence: averaging, aspiration, and the short run', Organizational Behavior & Human Decision Processes, 65 (3), 179–89.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Applications, Cambridge: Cambridge University Press.
Luce, R.D. (1959), Individual Choice Behavior, New York: Wiley.
Luce, R.D. (2000), Utility of Gains and Losses: Measurement-Theoretical and Experimental Approaches, Mahwah, NJ: Lawrence Erlbaum.
Luce, R.D. and P. Suppes (1965), 'Preference, utility, and subjective probability', in R.D. Luce, R.R. Bush and E. Galanter (eds), Handbook of Mathematical Psychology, vol. 3, New York: Wiley, pp. 249–410.
Markman, A. and D.L. Medin (1995), 'Similarity and alignment in choice', Organizational Behavior and Human Decision Processes, 63 (2), 117–30.
Marley, A.A.J. and H. Colonius (1991), 'The "horse race" random utility model and its competing risks interpretation', Journal of Mathematical Psychology, 36 (1), 1–20.
May, K. (1954), 'Intransitivity, utility, and the aggregation of preference patterns', Econometrica, 22 (1), 1–13.
McFadden, D. (1981), 'Econometric models of probabilistic choice', in C.F. Manski and D. McFadden (eds), Structural Analysis of Discrete Data with Econometric Applications, Cambridge, MA: MIT Press, pp. 198–272.
Mellers, B.A. and K. Biagini (1994), 'Similarity and choice', Psychological Review, 101 (3), 505–18.
Mosteller, F. and P. Nogee (1951), 'An experimental measurement of utility', Journal of Political Economy, 59 (5), 371–404.
Nosofsky, R.M. and T.J. Palmeri (1997), 'An exemplar-based random walk model of speeded classification', Psychological Review, 104 (2), 266–300.
Otter, T., G.M. Allenby and T. van Zandt (2008), 'An integrated model of discrete choice and response time', Journal of Marketing Research, 45 (5), 593–607.
Payne, J.W., J.R. Bettman and E.J. Johnson (1993), The Adaptive Decision Maker, New York: Cambridge University Press.
Pettibone, J. (2012), 'Testing the effect of time pressure on asymmetric dominance and compromise in choice', Judgment and Decision Making, 7 (4), 513–23.
Platt, M.L. (2002), 'Neural correlates of decisions', Current Opinion in Neurobiology, 12 (2), 141–8.
Pleskac, T.J. and J.R. Busemeyer (2010), 'Two-stage dynamic signal detection: a theory of choice, decision time, and confidence', Psychological Review, 117 (3), 864–901.
Pothos, E.M. and J.R. Busemeyer (2009), 'A quantum probability explanation for violations of "rational" decision theory', Proceedings of the Royal Society B, 276 (1665), 2171–8.
Ratcliff, R. (1978), 'A theory of memory retrieval', Psychological Review, 85 (2), 59–108.
Ratcliff, R. and P. Smith (2004), 'A comparison of sequential sampling models for two-choice reaction time', Psychological Review, 111 (2), 333–67.
Ratcliff, R. and J. Starns (2009), 'Modeling confidence and response time in recognition memory', Psychological Review, 116 (1), 59–83.
Regenwetter, M., J. Dana and C.P. Davis-Stober (2011), 'Transitivity of preferences', Psychological Review, 118 (1), 42–56.
Restle, F. (1961), Psychology of Judgment and Choice, New York: Wiley.
Rieskamp, J. (2008), 'The probabilistic nature of preferential choice', Journal of Experimental Psychology: Learning, Memory, & Cognition, 34 (6), 1446–65.
Rieskamp, J. and U. Hoffrage (2008), 'Inferences under time pressure: how opportunity costs affect strategy selection', Acta Psychologica, 127 (2), 258–76.
Rieskamp, J., J.R. Busemeyer and B.A. Mellers (2006), 'Extending the bounds of rationality: evidence and theories of preferential choice', Journal of Economic Literature, 44 (3), 631–61.
Roe, R.M., J.R. Busemeyer and J.T. Townsend (2001), 'Multialternative decision field theory: a dynamic connectionist model of decision making', Psychological Review, 108 (2), 370–92.
Rumelhart, D.E. and J.G. Greeno (1971), 'Similarity between stimuli: an experimental test of the Luce and Restle choice models', Journal of Mathematical Psychology, 8 (3), 370–81.
Schall, J.D. (2003), 'Neural correlates of decision processes: neural and mental chronometry', Current Opinion in Neurobiology, 13 (2), 182–6.
Scheibehenne, B., J. Rieskamp and C. Gonzalez-Vallejo (2009), 'Cognitive models of choice: comparing decision field theory to the proportional difference model', Cognitive Science, 33 (5), 911–39.
Simonson, I. (1989), 'Choice based on reasons: the case of attraction and compromise effects', Journal of Consumer Research, 16 (2), 158–74.
Smith, P.L. (1995), 'Psychophysically principled models of visual simple reaction time', Psychological Review, 102 (3), 567–93.
Smith, P.L. and R. Ratcliff (2004), 'The psychology and neurobiology of simple decisions', Trends in Neurosciences, 27 (3), 161–8.
Soltani, A., B. De Martino and C. Camerer (2012), 'A range-normalization model of context-dependent choice: a new model and evidence', PLoS Computational Biology, 8 (7), e1002607.
Stewart, N. and K. Simpson (2008), 'A decision-by-sampling account of decision under risk', in N. Chater and M. Oaksford (eds), The Probabilistic Mind: Prospects for Bayesian Cognitive Science, Oxford: Oxford University Press, pp. 261–76.
Stewart, N., N. Chater and G.D.A. Brown (2006), 'Decision by sampling', Cognitive Psychology, 53 (1), 1–26.
Svenson, O. and A. Edland (1987), 'Changes of preference under time pressure: choices and judgements', Scandinavian Journal of Psychology, 28 (4), 322–30.
Svenson, O. and A.J. Maule (1993), Time Pressures and Stress in Judgment and Decision Making, New York: Plenum.
Takane, Y. (1987), 'Analysis of covariance structures and binary choice data', Cognition and Communication, 20, 45–62.
Thurstone, L.L. (1927), 'A law of comparative judgment', Psychological Review, 34 (4), 273–86.
Thurstone, L.L. (1959), The Measurement of Values, Chicago, IL: University of Chicago Press.
Townsend, J.T. and F.G. Ashby (1983), Stochastic Modeling of Elementary Psychological Processes, Cambridge: Cambridge University Press.
Train, K.E. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Trueblood, J.S., S.D. Brown and A. Heathcote (2014), 'The multi-attribute linear ballistic accumulator model of context effects in multi-alternative choice', Psychological Review, forthcoming.
Tsai, R.C. and U. Böckenholt (2006), 'Modelling intransitive preferences: a random-effects approach', Journal of Mathematical Psychology, 50 (1), 1–14.
Tsetsos, K., M. Usher and N. Chater (2010), 'Preference reversal in multialternative choice', Psychological Review, 117 (4), 1275–93.
Tversky, A. (1969), 'Intransitivity of preferences', Psychological Review, 76 (1), 31–48.
Tversky, A. (1972a), 'Elimination by aspects: a theory of choice', Psychological Review, 79 (4), 281–99.
Tversky, A. (1972b), 'Choice by elimination', Journal of Mathematical Psychology, 9 (4), 341–67.
Tversky, A. and D. Kahneman (1991), 'Loss aversion in riskless choice: a reference-dependent model', Quarterly Journal of Economics, 106 (4), 1039–61.
Tversky, A. and J.E. Russo (1969), 'Similarity and substitutability in binary choices', Journal of Mathematical Psychology, 6 (1), 1–12.
Tversky, A. and S. Sattath (1979), 'Preference trees', Psychological Review, 86 (6), 542–72.
Tversky, A. and I. Simonson (1993), 'Context-dependent preferences', Management Science, 39 (10), 1179–89.
Usher, M. and J.L. McClelland (2001), 'The time course of perceptual choice: the leaky, competing accumulator model', Psychological Review, 108 (3), 550–92.
Usher, M. and J.L. McClelland (2004), 'Loss aversion and inhibition in dynamic models of multialternative choice', Psychological Review, 111 (3), 757–69.
Van Zandt, T. (2000), 'ROC curves and confidence judgments in recognition memory', Journal of Experimental Psychology: Learning, Memory, & Cognition, 26 (3), 582–600.
Vickers, D. (1979), Decision Processes in Visual Perception, New York: Academic Press.
Wallsten, T.S. and C. Barton (1982), 'Processing probabilistic multidimensional information for decisions', Journal of Experimental Psychology: Learning, Memory, and Cognition, 8 (5), 361–84.
Wedell, D.H. (1991), 'Distinguishing among models of contextually induced preference reversals', Journal of Experimental Psychology: Learning, Memory, & Cognition, 17 (4), 767–78.
Willemsen, M.C., U. Böckenholt and E.J. Johnson (2011), 'Choice by value encoding and value construction: processes of loss aversion', Journal of Experimental Psychology: General, 140 (3), 303–24.
Wright, P. (1972), 'The harassed decision maker: time pressures, distractions, and the use of evidence', Journal of Applied Psychology, 59 (5), 555–61.
Yellott, J.I. (1977), 'The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution', Journal of Mathematical Psychology, 15 (2), 109–44.
Yukalov, V.I. and D. Sornette (2011), 'Decision theory with prospect interference and entanglement', Theory and Decision, 70 (3), 283–328.
4 Towards a more complex model of risky choice
Graham Loomes1 and Simone Blackburn
INTRODUCTION

Throughout the history of modelling choice under risk and uncertainty there has been a productive interaction between modelling and experimental testing of those models. Sometimes the experiments have (initially, at least) been 'thought experiments', or experiments conducted informally with small samples of workshop/seminar participants – see Allais (1953) and Ellsberg (1961). Subsequently, experiments have been conducted more formally with much larger samples, many (although not all) of which have generated patterns of response which appear to contravene at least one basic standard axiom or postulate of expected utility theory (EUT) and have thereby stimulated the development of new models. Other experimental phenomena – preference reversals (Lichtenstein and Slovic, 1971) and disparities between willingness to pay and willingness to accept (Knetsch and Sinden, 1984), for example – have also been highly influential.

Experiments have looked not only at choices under risk and uncertainty, but more broadly at choices between various kinds of goods and services, as well as choices with a temporal dimension; and they have investigated not just choices but other methods of eliciting people's preferences – for example, different kinds of scoring systems and equivalence techniques. However, the focus of this chapter is primarily on risky choices of the kind that have been most commonly used in 'laboratory' experiments; namely, choices between simple lotteries involving various probabilities of no more than three possible outcomes in total. Some remarks about the implications of this body of data for other kinds of choices and preference elicitation procedures are made in the concluding section.

The history of model development has been characterized by a tendency to progress in an incremental way. In particular, the proponents of new alternative models have often been reluctant to modify more than one or two standard assumptions at a time or to let the number of free parameters rise much above two or three. However, in this chapter we take a different line. We consider three different kinds of influence upon the ways in which people perceive and process choices, and we offer a modelling framework to incorporate all three. Our argument for doing so is that actual decision-making may well involve our brains responding in various ways to different dimensions of the stimuli that are presented to us, so that the end result is likely to be the product of a mixture of influences and tendencies, some of which may reinforce one another under certain circumstances but may offset and possibly cancel one another out under different circumstances. So when trying to model the kinds of risky choices that have dominated experimental studies to date, we consider three broad types of influence: (1) interactions of components within each lottery, (2) interactions of components between lotteries and (3) variability/noisiness/imprecision in people's responses. We take those three elements in that
order and build a modelling framework which combines them into a complex – but we hope not too complicated – structure. In order to set the scene for this endeavour, we turn now to the standard model and the analytical paradigm within which we are operating.
1 SETTING THE SCENE
As noted above, the great majority of experimental studies of risky choice have used very simple tasks – most often, asking respondents to choose between pairs of alternatives involving up to three different payoffs overall. Of course, it could be objected that many 'real-world' decisions often involve larger numbers of options with multiple possible outcomes where issues of task complexity and considerations of attitudes to skewness and kurtosis may be influential. Indeed, we shall discuss some of these questions in section 6. However, using much simpler tasks may be defended on the grounds that the decision theories being investigated have been proposed as general theories and so should apply to any choice scenario; and it may be argued that if they fail to accommodate even the simplest choices, their claims to generality must be seriously questioned. So binary choices involving no more than three payoffs have come to constitute a large and influential part of the body of experimental evidence and thus are our primary focus.

Such simple stimuli have at least two advantages. First, they are quite easy for participants to understand, so that large-scale systematic departures from any particular theory cannot be explained away on the grounds that the tasks were too complicated and respondents just could not process them. Second, they lend themselves to a diagrammatic representation which many researchers have found useful when explaining their designs and comparing their results to the implications of the theories being investigated. We introduce that diagram in conjunction with a couple of binary choices of the kind that have stimulated many alternative models.

Consider first2 a choice between the following alternatives:

A: The certainty of £30 – which can be written as (£30, 1)
B: An 80 per cent chance of £40 and a 20 per cent chance of 0 – that is, (£40, 0.8; 0, 0.2)

These might be linked to an explicit random device – say, an opaque bag containing a hundred small discs each bearing a different integer from 1 to 100 inclusive. The two alternatives might then be presented as follows:

Discs:  1 ............................................ 100
A:      £30 (100%)

Discs:  1 ...................... 80 | 81 .......... 100
B:      £40 (80%)                  | 0 (20%)
where the numbers along the top refer to the numbers on the discs. So participants are told that if they choose B, they will be asked to pick a numbered disc at random from the bag: if that disc bears a number from 1 to 80 inclusive, they will receive £40; and if the number on the disc is between 81 and 100 inclusive, they will receive nothing. The percentage chance of receiving each payoff is shown just underneath the payoff to which it relates.

At some point the respondent is also asked to choose between:

C: A 25 per cent chance of £30 and a 75 per cent chance of 0 – (£30, 0.25; 0, 0.75)
D: A 20 per cent chance of £40 and an 80 per cent chance of 0 – (£40, 0.2; 0, 0.8)

which might be displayed as follows:

Discs:  1 ......... 25 | 26 .................... 100
C:      £30 (25%)      | 0 (75%)

Discs:  1 ...... 20 | 21 ....................... 100
D:      £40 (20%)   | 0 (80%)
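To make the disc mechanism concrete, here is a minimal Python sketch (our illustration, not part of the original experimental materials) that resolves one of these lotteries by drawing a numbered disc:

    import random

    def draw_disc(lottery):
        """Resolve a lottery given as [(payoff, number_of_discs), ...]
        whose disc counts sum to 100."""
        disc = random.randint(1, 100)          # pick one of the 100 discs
        upper = 0
        for payoff, n_discs in lottery:
            upper += n_discs
            if disc <= upper:                  # disc falls in this payoff's range
                return disc, payoff

    B = [(40, 80), (0, 20)]                    # discs 1-80 pay £40; 81-100 pay 0
    D = [(40, 20), (0, 80)]                    # discs 1-20 pay £40; 21-100 pay 0
    print(draw_disc(B), draw_disc(D))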
Conventional EUT as developed by von Neumann and Morgenstern (1947) does not say which of A or B an individual will choose – someone who is risk seeking or risk neutral will choose the higher expected value offered by B, whereas someone who is sufficiently averse to risk may prefer the lower sum offered by A with certainty. However, anyone behaving according to EUT who prefers A in the first choice should prefer C in the second, while anyone who prefers B should also prefer D.

Essentially, this is because the combination of axioms underpinning EUT implies that an individual acts as if she assigns to each payoff some utility index u(.) – which is invariant from one choice to another – and chooses the alternative which gives the highest expectation when each utility index is weighted by the probability that the payoff in question will be received. Thus (if we assign a zero utility index to a zero payoff) an individual's choice between A and B depends on whether the certainty of u(£30) is greater than 0.8 × u(£40) for that individual. If she strictly prefers A, EUT supposes that her subjective values for the sums involved are such that u(£30)/u(£40) > 0.8; if she strictly prefers B, we infer u(£30)/u(£40) < 0.8; and if she is indifferent between A and B, the implication is that u(£30)/u(£40) = 0.8.

Likewise, the choice between C and D depends on whether 0.25 × u(£30) is greater than, less than or equal to 0.2 × u(£40): i.e., whether u(£30)/u(£40) is greater than, less than or equal to 0.2/0.25 = 0.8. This is exactly the same condition as in the choice between A and B. So if u(£30)/u(£40) > 0.8, A and C are chosen; whereas if u(£30)/u(£40) < 0.8, B and D are preferred. It is only when u(£30)/u(£40) is exactly equal to 0.8 that the individual is indifferent in both choices: but we should expect such cases to be rare; and when they do occur, if people choose at random in each pair, we should expect to see as many cases of A and D as of B and C.
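As a numerical check on this algebra, the short sketch below (ours; the power-law utility is purely an illustrative assumption) computes expected utilities for A, B, C and D and confirms that an EU maximizer always picks either {A, C} or {B, D}, never the asymmetric {A, D} pattern discussed next:

    A = [(30, 1.0)]
    B = [(40, 0.8), (0, 0.2)]
    C = [(30, 0.25), (0, 0.75)]
    D = [(40, 0.2), (0, 0.8)]

    def u(x, r):
        """Illustrative power utility with u(0) = 0; smaller r = more risk averse."""
        return x ** r

    def eu(lottery, r):
        return sum(p * u(x, r) for x, p in lottery)

    for r in (0.3, 0.5, 0.9):
        first = "A" if eu(A, r) > eu(B, r) else "B"
        second = "C" if eu(C, r) > eu(D, r) else "D"
        print(f"r = {r}: chooses {first} and {second}")
    # r = 0.3 or 0.5 -> A and C; r = 0.9 -> B and D. The switch occurs where
    # u(30)/u(40) = 0.75 ** r crosses 0.8, exactly the condition in the text.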
Yet the evidence from many experiments is that for parameters like those above, such cases are neither as rare as would conventionally be supposed, nor are they as symmetrical: a substantial proportion of any sample is likely to choose the safer option A in the first pair but also prefer the riskier option D in the second pair, whereas relatively few choose B and C. Kahneman and Tversky (1979) drew attention to this asymmetrical pattern, which they labelled the 'common ratio effect' (CRE). Since their paper, the CRE has been replicated dozens if not hundreds of times. The magnitude and reliability of the asymmetry is such that it has come to be regarded as perhaps the most significant challenge to the descriptive validity of EUT.3

However, it is by no means the only pattern of behaviour that cannot be explained by EUT. Machina (1982) reviewed a number of other such patterns and he proposed a diagrammatic representation which made it easier to see how they related to each other and to EUT within a two-dimensional picture. The 'Marschak–Machina' (M–M) triangle below allows us to depict graphically any well-specified risky lottery involving no more than three payoffs x3 > x2 > x1. The probability p3 of the highest payoff x3 is given on the vertical axis; the probability p1 of the lowest payoff x1 is given by the horizontal axis; and the probability p2 of the middle payoff x2 is then 1 − p3 − p1. So alternative B in the example above is given by the point on the hypotenuse whose co-ordinates are p3 = 0.8 and p1 = 0.2, while A is at the point where p2 = 1 and hence p3 = p1 = 0. D and C are then generated by scaling down p3 and p2 by a quarter and making up the deficit via p1.

Machina pointed out that for lotteries involving no more than three payoffs in total, the preferences of an EU maximizer could be represented by a map of linear and parallel indifference curves. Figure 4.1 shows the case of an individual who is indifferent between A and B and thus also between C and D, with dark straight lines connecting the members of each pair. Three other parallel grey lines indicate three other indifference curves in the map. The lotteries at the extremes of these lines – that is, lying on the edges of the triangle – can either be obtained by scaling down the A and B probabilities by some factor and correspondingly increasing the probability of x1 (to produce pairs to the lower-right of the A–B line, which become progressively less desirable as the probability of the worst outcome is increased) or else by increasing the probability of x3, thereby generating progressively more desirable pairs as the probability of the best outcome is increased. Thus E and F are obtained by scaling down A and B by a quarter and adding a 0.75 chance of x3 to both, so that E gives a 0.75 chance of £40 and a 0.25 chance of £30, while F gives a 0.95 chance of £40 and a 0.05 chance of 0.

The straightness of the indifference curves reflects the betweenness property of EUT whereby if an individual is indifferent between any two lotteries, he is also indifferent between any linear combination of the two. So in Figure 4.1, G can be generated as a mixture of A and B in the proportions 0.75:0.25. Another EU maximizer with different preferences to the individual depicted in Figure 4.1 would be represented by a different set of indifference curves – again, all parallel straight lines.
But these lines will either be steeper (if he is more risk averse than the individual depicted above) so that he has a strict preference for A over G, G over B, C over D, E over F; or else they will all be less steep, so that he has the opposite preference
[Figure 4.1 EUT depicted in the Marschak–Machina triangle: parallel linear indifference lines connect A–B (with G between them), C–D and E–F]
in all those pairs. Thus the M–M triangle provides a simple visual way of representing EU preferences and a number of studies have used it as a template for their designs to investigate EUT and compare its performance with various alternative models: by devising an assortment of binary choices where lotteries are connected by lines of different lengths and slopes, one can hope to build up a basic picture of different individuals’ patterns of response – see, for example, Hey and Orme (1994) and Loomes et al. (2002). We shall therefore use this diagram in subsequent sections to depict and overlay the implications of various alternative models.
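For readers who want to reproduce such designs numerically, the slope of an EU maximizer's indifference lines in the triangle follows directly from the algebra above: along an indifference line, dp3/dp1 = [u(x2) − u(x1)]/[u(x3) − u(x2)]. A small sketch of ours, using the utility indices implied by indifference between A and B:

    def mm_gradient(u1, u2, u3):
        """Gradient dp3/dp1 of an EU indifference line in the M-M triangle."""
        return (u2 - u1) / (u3 - u2)

    # Indifference between A and B implies u(30)/u(40) = 0.8; take u(0) = 0.
    u0, u30, u40 = 0.0, 0.8, 1.0
    print(mm_gradient(u0, u30, u40))   # 4.0: the slope of the A-B line,
                                       # and of every parallel line in Figure 4.1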
2 WITHIN-LOTTERY INTERACTIONS
The effect of the set of axioms that underpin EUT is to produce a model where it is as if each payoff can be assigned a utility which is completely independent of the number and nature of the other available payoffs and their associated probabilities. So £30 is assigned the same utility index whether it is offered with certainty (as in lottery A above) or with probability 0.25, as in C or E. Moreover, the fact that the other payoff offered by C is 0 while the other payoff offered by E is £40 has no effect on the subjective evaluation of the 0.25 chance of £30 as far as EUT is concerned. This is the force of the within-lottery independence assumptions made by EUT.
The largest number of alternative models account for departures from EUT in terms of some within-lottery interactions that breach such assumptions in one way or another. Some of these alternative models allow for interactions which modify the utilities of each payoff and/or the lottery as a whole, depending on the size and distribution of the other payoffs (see, for example, the 'three moments' model of Hagen, 1979; the 'alpha-nu' model of Chew and MacCrimmon, 1979; and the forms of disappointment theory proposed by Bell, 1985 and by Loomes and Sugden, 1986). Other alternatives to EUT work on the basis of some non-linear transformation of probabilities into decision weights (see Kahneman and Tversky, 1979; and later rank-dependent models such as Quiggin, 1982, Starmer and Sugden, 1989, and Tversky and Kahneman, 1992).

Abstracting from the detail of the different functional forms of each of these alternatives, what they have in common might be summed up as an 'attitude to dispersion'. Consider again the choice between A and B. The expected monetary values (EVs) are £30 and £32 respectively, but B has considerable variance and is skewed: in particular, the 'extreme' payoff – the one much further from the EV – is 0, so that to the extent that attention to dispersion entails extreme payoffs being more heavily weighted,4 this counts against B and makes A relatively more attractive. Next, consider the choice between C and D. Here the respective EVs are much lower – £7.50 and £8 – and the 0 is no longer the extreme payoff in either lottery: now it is the £30 and especially the £40 that are more extreme relative to their EVs and therefore arguably more heavily weighted, tipping the balance towards D.

The simplest way of representing such an attitude to dispersion in relation to the triangle diagram is along the lines proposed by Chew and MacCrimmon (1979), entailing indifference curves that are straight lines fanning out as if from a point to the south-west of the right angle. Three such lines are shown in Figure 4.2. These three lines depict the preferences of someone who is indifferent between A and G and B, but who strictly prefers the riskier option D to C (that is, the indifference curve through C passes below D) while at the same time strictly preferring the safer option E over F (that is, the indifference curve through E passes above F). This not only accommodates the CRE but also the other form of Allais paradox which Kahneman and Tversky (1979) called the common consequence effect (CCE).5

Of course, modelling attitude to dispersion in terms of linear indifference curves radiating from a single point is a simplification, and most of the alternatives to EUT listed above (and others not listed here – see Sugden, 2004, for a more comprehensive list) would not be so restrictive. In particular, they would allow non-linear indifference curves that could accommodate certain violations of betweenness. However, as we shall see shortly, violations of betweenness might be accounted for in other ways. So our modelling strategy is to try to capture the main features of attitude to within-lottery dispersion in terms of a formulation that requires just two parameters – namely, the coordinates of the point from which the linear indifference curves radiate – and leave non-linearities to another source of influence, to which we now turn.
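Before doing so, the dispersion argument above is easy to verify numerically; a quick sketch of ours, showing how the 'extreme' payoff (the one furthest from the EV) switches between the two pairs:

    def ev(lottery):
        return sum(p * x for x, p in lottery)

    def extreme_payoff(lottery):
        """The payoff furthest from the lottery's expected value."""
        m = ev(lottery)
        return max(lottery, key=lambda pair: abs(pair[0] - m))[0]

    B = [(40, 0.8), (0, 0.2)]
    D = [(40, 0.2), (0, 0.8)]
    print(ev(B), extreme_payoff(B))   # 32.0, 0  -> the zero counts against B
    print(ev(D), extreme_payoff(D))   # 8.0, 40  -> the £40 now favours D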
[Figure 4.2 Dispersion attitude as fanning out in the M–M triangle: three linear indifference lines, through {A, G, B}, through C and through E, radiate from a point to the south-west of the right angle]

3 BETWEEN-LOTTERY INTERACTIONS
Consider again the CRE pairs:

A: (£30, 1)
B: (£40, 0.8; 0, 0.2)

and

C: (£30, 0.25; 0, 0.75)
D: (£40, 0.2; 0, 0.8)

One way of thinking about choosing between alternatives is to suppose that people act as if they evaluate each option separately and independently of the other available option(s) and then pick whichever option scores highest in terms of its subjective value to the decision-maker. This is the force of the standard transitivity axiom, which effectively rules out between-lottery interactions that might cause the value assigned to a given option to be influenced by the characteristics of the other options in the choice set. However, much evidence, including eye-tracking studies (for example, Arieli et al., 2011), suggests that, during the process of arriving at their decisions, people do not
process each option entirely separately, but are likely to make a number of between-lottery comparisons. In itself, this is not incompatible with EUT. As we shall explain below in conjunction with expression (4.2), as long as the differences between payoffs are all evaluated according to the same u(.) and as long as each difference between utilities is weighted by its probability, decisions will conform with EUT. However, making between-lottery comparisons allows the possibility that the evaluation of a given option may be influenced by particular respects in which it contrasts with, or is similar to, the option(s) with which it is being compared. And that, in turn, opens up the prospect that different contrast effects and similarity judgements may produce systematic departures from EUT.

The potential importance of similarity judgements as causes of intransitive choice was discussed by Tversky (1969). Subsequently, Rubinstein (1988) and Leland (1994) developed the idea in order to explain the CRE and other departures from EUT. Loomes (2010) attempted to take the modelling a stage further by allowing similarity judgements to change smoothly as the parameters changed. The essence of that model is as follows.

Let the safer and riskier alternatives be represented as S and R, and let bR and bS represent, respectively, the probabilities associated with the payoff advantage of R over S (in this case, £40 compared with £30) and with the payoff advantage of S over R (that is, £30 compared with 0). When bR = 0.8 and bS = 0.2 so that bR + bS = 1, we have the most 'scaled-up' case – the choice between A and B above. When bR = 0.2 and bS = 0.05, we have the choice between C and D, where the scaling down by a factor of ¼ is reflected by the fact that the sum bR + bS is now 0.25.

        bR      bS      1 − bR − bS
S:      £30     £30     0
R:      £40     0       0
For ease of exposition, let us set aside for the moment any within-lottery effects of the kind discussed in the previous section and focus upon similarity judgements as if they were the only influences at work. Deciding between S and R can then be seen as a matter of balancing the argument favouring R over S against the argument favouring S over R and choosing according to whichever argument is stronger. If we denote the subjective value of the payoff advantage of R over S by yR, and denote the subjective value of the payoff advantage of S over R by yS, then a conventional formulation would say that the decision will depend on whether the weighted advantage of R, that is, bR × yR, is greater than, equal to or less than the weighted advantage of S, namely bS × yS. Using ≻ to denote strict preference and ∼ to denote indifference, we can rearrange to give:

S ≻ (∼, ≺) R  according as  bS/bR > (=, <) yR/yS   (4.1)
To be more general, Loomes (2010) reformulated (4.1) in terms of the perceived relative arguments for S over R and for R over S on the probability and on the payoff dimensions, denoting these respectively by f(bS, bR) and by j(yR, yS). So (4.1) becomes:

S ≻ (∼, ≺) R  according as  f(bS, bR) > (=, <) j(yR, yS)   (4.2)
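Expression (4.2) is straightforward to operationalize: supply any pair of functions f(·) and j(·) and compare them. A sketch of ours (the EUT specification below uses the utility indices of a mildly risk-averse individual, an illustrative assumption):

    def choose(bS, bR, yR, yS, f, j):
        """Expression (4.2): S is chosen when f(bS, bR) exceeds j(yR, yS)."""
        lhs, rhs = f(bS, bR), j(yR, yS)
        return "S" if lhs > rhs else ("R" if lhs < rhs else "indifferent")

    # EUT: f is the raw probability ratio, j the utility-difference ratio.
    # With u(0), u(30), u(40) = 0, 0.85, 1: yR = 0.15 and yS = 0.85.
    f_eut = lambda bS, bR: bS / bR
    j_eut = lambda yR, yS: yR / yS
    print(choose(0.20, 0.80, 0.15, 0.85, f_eut, j_eut))   # A vs B: 'S'
    print(choose(0.05, 0.20, 0.15, 0.85, f_eut, j_eut))   # C vs D: 'S' again --
    # scaling bS and bR by a common factor leaves both sides unchanged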
Different decision theories can then be identified in terms of the ways in which they specify f(bS, bR) and j(yR, yS). For EUT, the weights are the probabilities themselves so that f(bS, bR) is simply bS/bR – that is, (p2 – p3)/p3; and j(yR, yS) is the ratio of the differences in utilities – that is, [u(£40) – u(£30)]/[u(£30) – u(0)]. Thus for any M–M triangle, the right-hand side of (4.2) stays constant. So too does the left-hand side if bS and bR are scaled by the same factor. Hence we get the result that under EUT the choice between S and R should be the same at every level of scaling.

Tversky and Kahneman's (1992) cumulative prospect theory (CPT) specifies this differently: f(bS, bR) = (w2 – w3)/w3, where the wi are the subjective probability weights derived by the CPT process; and j(yR, yS) = [v(£40) – v(£30)]/[v(£30) – v(0)], where v(.) maps payoffs to the individual's value function. So once again, for any M–M triangle the right-hand side of (4.2) stays constant; but now the left-hand side changes as bS and bR are scaled down because each pi is transformed non-linearly into wi. For CPT, the transformations are made on a within-lottery basis; but the effect of the usual assumptions about the shape of the transformation function is generally to reduce (w2 – w3)/w3 as p2 and p3 are scaled down, so that f(bS, bR) falls and the balance of arguments tends to shift in favour of the risky lottery.

However, if similarity effects operate in the direction that Rubinstein and Leland have proposed, then as bR and bS are scaled down, the smaller of the two – in this case, bS – carries less and less relative weight. The perceived ratio f(bS, bR) might then be expressed as

f(bS, bR) = (bS/bR)^[(bS + bR)^a]   (4.3)

where a ≤ 0. The particular functional form in (4.3) is not the only possible way of capturing the similarity effect, but it has several convenient and pertinent properties. First, for the 'baseline' pair – the pair where (bS + bR) = 1 – we have f(bS, bR) = bS/bR, which we take as a benchmark. Second, the degree of any scaling down is given by (bS + bR) and thus similarity effects become progressively stronger as (bS + bR) becomes steadily smaller. Third, the formulation allows that for any particular pair at some level of scaling down, the strength of the similarity effect may vary from one individual to another: some people may not be susceptible at all, while others are greatly influenced by differences and pay little attention to ratios. In (4.3), the individual's degree of susceptibility to similarity judgements of this kind is represented by the parameter a: someone who is not at all susceptible and therefore behaves like an EU maximizer is modelled as if a = 0; the more negative the value of a, the more the individual is susceptible to similarity effects.
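A direct computation of (4.3) (a sketch of ours) makes these properties concrete and reproduces the figures used in the example that follows:

    def f_sim(bS, bR, a):
        """Perceived probability ratio under expression (4.3), a <= 0."""
        return (bS / bR) ** ((bS + bR) ** a)

    a = -0.4
    print(f_sim(0.20, 0.80, a))   # baseline A vs B: 0.25 (exponent is 1 ** a = 1)
    print(f_sim(0.05, 0.20, a))   # C vs D, scaled to 0.25: ~0.089
    print(f_sim(0.15, 0.60, a))   # G vs B, scaled to 0.75: ~0.211
    # With j(yR, yS) = 0.25 throughout, these values generate the A-G-B
    # intransitivity worked through in the text below.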
For example, suppose a = −0.4. Then in the most scaled-up case of the choice between A and B, where bR + bS = 1, the perceived ratio is simply 0.25. So if j(yR, yS) < 0.25, (4.2) entails that the safer option A will be chosen. However, in the choice between C and D, where bR + bS = 0.25, the perceived ratio f(bS, bR) is (0.25)^1.741, which is approximately 0.089. So if j(yR, yS) > 0.089, the riskier option D will be chosen. In short, for an individual for whom a = −0.4, there is a range of values of j(yR, yS) between 0.25 and 0.089 that give the A and D pattern typical of the CRE.

One might wonder how similarity effects can be represented in a M–M triangle. When addressing that question, it is important to bear in mind that once we allow for similarity effects of the kind modelled in (4.3), we can no longer operate in terms of indifference curves. The notion of an indifference curve is that all lotteries on a given curve are regarded as being exactly as desirable as all others on that same curve, so that we can take any pair of such lotteries and expect that the individual will be indifferent between them. But similarity effects preclude this possibility, because the degree of similarity is a function of the particular pair being considered and changes with the distance between them.

To illustrate this with a numerical example, consider an individual who is indifferent between A and B. Since f(bS, bR) = 0.25, indifference entails j(yR, yS) = 0.25. Now consider the choice between A and G, remembering that G is a linear combination of A and B. In this pair, G is the riskier option and A is the safer one. So here bR = 0.2 and bS = 0.05, so that the degree of scaling down, (bS + bR), is 0.25. If, as in the previous example, we let a = −0.4, then, just as for C and D, f(bS, bR) ≈ 0.089. However, if we set any attitudes to dispersion aside, the three payoffs and the relative arguments between them are unchanged, so j(yR, yS) = 0.25. Since f(bS, bR) is less than j(yR, yS), the riskier G is strictly preferred to the safer option A.

Next, consider a choice between G and B. In this comparison, B is the riskier option and G is the safer one. Here bR = 0.6 and bS = 0.15, so that bS + bR = 0.75. For an individual whose a = −0.4, f(bS, bR) ≈ 0.211. Since j(yR, yS) remains unchanged at 0.25, the riskier B is strictly preferred to the safer G.

So B is strictly preferred to G, and G is strictly preferred to A. But by initial construction, A is indifferent to B. Clearly, this result is not compatible with transitivity and therefore not compatible with any indifference curve representation. Instead, in the M–M triangle below, the dashed lines connecting pairs of lotteries on the edges indicate indifference between those lotteries but do not entail indifference between any other lotteries on the line connecting them. Indeed, as seen in the example involving A, G and B, any linear combination of two such 'edge' lotteries will be more similar to each of them than they are to each other: if a < 0, it will be preferred to the safer of the two and less preferred than the riskier of the two. As well as allowing intransitivities of this kind, it is possible to construct other cases which involve violations of betweenness.6, 7 Some other implications of the functional form of (4.3) are also depicted in Figure 4.3.
As in earlier figures, the pairs {C, D} and {E, F} are scaled-down variants of the most scaled-up pair, {A, B}, all of which involve bS/bR = 0.25, so that straight lines connecting these pairs all have a gradient of 4. This is also the gradient of the straight dashed line between A and B which signifies the indifference between those two lotteries that we are here assuming.
[Figure 4.3 Similarity effects on indifference between 'edge' pairs: the dashed lines connecting edge lotteries become shorter and flatter towards the bottom-right corner of the triangle]
As we have seen, the similarity effect modelled in (4.3) means that as p3 and p2 are steadily scaled down, f(bS, bR) falls – which translates into the gradients of the dashed lines tending towards 1 as the lines get shorter and shorter. Thus as we move from {A, B} in the direction of {C, D}, the lengths of the lines between bottom edge and hypotenuse become shorter and their gradients become flatter. By the time we reach D, this change in the gradient due to the similarity effect has resulted in D being regarded as indifferent to a lottery on the bottom edge which strictly dominates C, and so we can infer that D is strictly preferred to C. Over this area of the triangle, therefore, we see that similarity effects work in the same general direction as the fanning out pattern associated with the attitude to dispersion discussed in the previous section. For pairs of lotteries where one option is on the bottom edge and the other is on the hypotenuse, the two influences are entirely compatible and mutually reinforcing – which is perhaps why the CRE is such a strong and easily replicable pattern in experiments which use pairs like {A, B} and {C, D}. However, Figure 4.3 also illustrates the property that all lines of the same length have the same gradient, wherever they are in the triangle. So the line connecting E to its counterpart on the hypotenuse is parallel to the line connecting C to its hypotenuse counterpart. And the line from F to the vertical edge is parallel to the line from G to the vertical edge and to the line of the same length in the bottom right hand corner. Two interesting implications follow from this.
First, if we focus exclusively on similarity effects, there should be no difference between responses to {A, G} and to {C, D}. However, as explained in note 5, these two pairs constitute the CCE form of Allais paradox, and we know that there is often some difference between the two patterns of response. But as also noted in that footnote, the CCE is often rather weaker than the CRE. Both observations may be understood if we think of the CCE as being the product of just the fanning out produced by attitude to dispersion, whereas the CRE is the product of both attitude to dispersion and similarity.

Second, notice that as we move from {A, B} in the direction of {E, F} rather than in the direction of {C, D}, the progressive shortening and flattening of the lines produces the reverse pattern of fanning – and in this area, a pattern which is in the opposite direction to the fanning out associated with the attitude to dispersion. If both influences are at work, they may partially (or perhaps completely) offset one another in this area of the triangle. And indeed the evidence – and the theory – is mixed for patterns in this area of the triangle: sometimes behaviour seems to accord with fanning out, but sometimes fanning in predominates, and sometimes no strong effect of either kind is found (see Gul, 1991; Loomes, 1991).

There is a further implication of (4.3). The great majority of experiments investigating the CRE and CCE have presented choices which, when mapped to the triangle diagram, involve gradients considerably greater than 1. However, consider replacing the 80 per cent chance of £40 in B by, say, a 20 per cent chance of £160 (otherwise 0), and label this risky lottery B*. Then D* and F* are derived as before by scaling down B* by a quarter and adding a 75 per cent chance either of 0 or of £160. In this case, bS is (four times) bigger than bR so that, as the lotteries are scaled down, it is bR that becomes relatively lower weighted. In other words, as the distance between lotteries decreases, the gradient of the connecting line signifying indifference between them increases towards 1. As illustrated in Figure 4.4, the similarity effect now produces patterns between the edge lotteries that fan out in the area above the A–B line and fan in below that line. If these tendencies interact with the tendency for dispersion aversion to produce fanning out right across the triangle, then we should expect mutual reinforcement in the upper part of the triangle, but offsetting in the lower area that produces weaker, or possibly mixed, effects there.

Although there have been fewer experiments conducted with such lotteries, there is at least some evidence of this kind. For example, Battalio et al. (1990) reported a study where bS/bR = 2.33: scaling down by one-fifth resulted in 16 departures from EUT (out of a sample of 33), of which 6 were in the 'usual' direction while 10 moved in the opposite direction consistent with the similarity effects depicted in Figure 4.4. Section 5 of Loomes (2010) reports an experiment specifically designed to compare behaviour among pairs derived from a gradient 4 baseline with corresponding pairs derived from a baseline with a gradient of ¼. The kinds of similarity effects predicted by (4.3) were clearly in evidence.
But equally clearly, similarity alone was not enough to explain the full sets of results: a mixture of similarity and attitude to dispersion, as proposed in this chapter, appears to provide a more comprehensive account of the data from that experiment. This raises the question of just how to integrate these two tendencies into a single model. It is to this question that we now turn.
[Figure 4.4 Similarity effects when baseline gradient is less than 1]

4 COMBINING ATTITUDE TO DISPERSION WITH SIMILARITY JUDGEMENTS
In this section, our approach will be to start with the within-lottery effects discussed in section 2 and then overlay that basic structure with between-lottery considerations of the kind discussed in section 3. As noted in section 2, attitude to dispersion can be modelled either in terms of modifications to utilities while taking the given probabilities as weights, or else in terms of non-linear transformations of the probabilities into weights which are then used in conjunction with a given utility/value function. Of these two modelling approaches, the first seems psychologically no less plausible than the second and allows a relatively simple way of integrating attitude to dispersion and similarity effects.

Consider again an individual whose attitude to dispersion for some set of three payoffs is as depicted in Figure 4.2, reproduced below for ease of reference as the basis of Figure 4.5. In terms of expression (4.2) above, each dashed line represents a set of lotteries for which f(bS, bR) = j(yR, yS). Since we are, for the moment, setting to one side any similarity effects8 and taking probabilities as given, we have f(bS, bR) = bS/bR, which is simply the inverse of the gradient of a particular dashed line. This then gives us the j(yR, yS) for any pair of lotteries on that line, where any modification of utilities/values due to attitude to dispersion is incorporated into the computation of j(yR, yS).
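Mechanically, with linear lines fanning out from a point (x0, y0) to the south-west of the triangle, the line through a lottery at (p1, p3) has gradient (p3 − y0)/(p1 − x0), and j(yR, yS) is its inverse. The sketch below is ours; the origin is our own reverse-engineered illustration, chosen so that the line through A has gradient 4 and the line through C has gradient 1.5 as in the text (the published figure is schematic, so other gradients need not match it exactly):

    def implied_j(p1, p3, x0=-0.45, y0=-1.8):
        """Gradient of the fan line through (p1, p3), and the implied j (its inverse)."""
        gradient = (p3 - y0) / (p1 - x0)
        return gradient, 1.0 / gradient

    print(implied_j(0.00, 0.00))   # A: gradient 4.0, j = 0.25
    print(implied_j(0.75, 0.00))   # C: gradient 1.5, j ~ 0.667
    print(implied_j(0.00, 0.75))   # E: a steeper line, hence a lower j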
To illustrate, suppose that the story behind attitude to dispersion is one of anticipated disappointment and elation.9 For {A, B}, any anticipated 'elation' associated with the £40 outcome is factored into the subjective evaluation of yR, while any anticipated disappointment associated with the zero outcome enters into the subjective evaluation of yS, so that j(yR, yS) is assumed to take any such considerations into account. On that basis, indifference between A and B implies that this j(yR, yS), incorporating any and all such increments and decrements, is equal to f(bS, bR), which we read off as bS/bR = 0.2/0.8 = 0.25. However, when considering hypotenuse lotteries in the vicinity of D, the scaling down of p3 greatly reduces expectations, so that there is considerably less disappointment associated with the zero and much more elation associated with the £40. When the corresponding considerations are taken into account in evaluating alternative lotteries on the bottom edge, the net effect is that the payoff-based relative arguments favouring the hypotenuse lotteries have strengthened, increasing j(yR, yS). Lotteries indifferent to C must therefore exhibit a correspondingly higher f(bS, bR) – which, being the inverse of the gradient of the dashed line through C, means that this dashed line must be flatter. As a result, the individual is indifferent between C and the lottery where the dashed line through C cuts the hypotenuse – in Figure 4.5, that is the lottery offering a 0.15 chance of £40 and a 0.85 chance of 0.

[Figure 4.5 Overlaying similarity on attitude to dispersion: the triangle of Figure 4.2 with p1 on the horizontal axis and p3 on the vertical axis, lotteries A, B, C, D, E, F and G, dashed lines fanning out from a point to the south-west, and dotted lines connecting C and E to the hypotenuse lotteries derived in the text.]

Now consider the top left of the triangle in Figure 4.5. E offers a 0.75 chance of £40 and a 0.25 chance of £30, so expectations are high (EV = £37.50); but the worst that can
happen is a payoff of £30, so that the potential for disappointment is only modest. On the other hand, with expectations in this region, a zero payoff would be very much more aversive and F offers too large a chance of that outcome for this individual's tastes. The lower j(yR, yS) entails a lower f(bS, bR), which in turn means that the dashed line through E must be steeper. In Figure 4.5, the dashed line through E indicates that the required hypotenuse lottery would have to give a 0.975 chance of £40 and only a 0.025 chance of 0. In summary, once we set the coordinates of the point to the south-west of the triangle from which the dashed lines emanate, we can compute the gradient of the line through any lottery in the triangle and thus compute the relevant j(yR, yS) as the inverse of that gradient. This is the first step in mapping the preferences of the individual in a way which incorporates the kind of within-lottery interactions that are characteristic of much of the experimental data. The next step is to allow for similarity effects on top of the dispersion effects. To illustrate this with an example, consider again the dashed line passing through C to (£40, 0.15; 0, 0.85). This line has a gradient of 1.5, corresponding with j(yR, yS) = ⅔. If similarity effects as modelled in expression (4.3) are now admitted with a < 0, we need to find the lottery on the hypotenuse such that f(bS, bR) = ⅔, which will be a first approximation to the hypotenuse lottery judged indifferent to C.10 That is, we need to find bS/bR such that this ratio, when raised to the power of (bS + bR)^a, is equal to ⅔. If we continue to consider an individual whose a = –0.4, and apply this to the example in hand where (bS + bR) = 0.25, we have (bS/bR)^1.741 ≈ ⅔, from which (bS/bR) ≈ 0.792. This in turn allows us to derive bR ≈ 0.14 and bS ≈ 0.11. This corresponds with the hypotenuse lottery (£40, 0.14; 0, 0.86) which is connected to C by the dotted line in Figure 4.5. We can apply the same procedure to E. The gradient of the dashed line through E is 9, corresponding with j(yR, yS) = 0.11. Here too (bS + bR) = 0.25, so that (bS + bR)^a = 1.741. In this case, indifference requires (bS/bR)^1.741 ≈ 0.11, from which (bS/bR) ≈ 0.28. Thus p3 ≈ 0.945. In other words, in this case, the first approximation is to a hypotenuse lottery very slightly to the south-east of F, as shown by the dotted line from E in Figure 4.5. Although this is only a first approximation, it illustrates the point made earlier; namely, that when the gradient for the baseline pair is greater than 1, similarity effects tend to reinforce dispersion effects in the lower right area of the triangle but offset them – possibly to the point of cancelling or outweighing them – in the upper left area of the triangle. And one could produce the 'mirror image' result – the effects cancelling out in the lower part of the triangle but working in the same direction in the upper part – for pairs of lotteries whose baseline gradient is less than 1. More elaborate ways of modelling attitude to dispersion and similarity judgements could no doubt be found, but with the simplifications made above, the two effects can be modelled and combined on the basis of three person-specific parameters: the coordinates of a point to the south-west of the triangle from which the dashed lines emanate, reflecting an individual's attitude to dispersion; and the value of a which represents an individual's susceptibility to similarity judgements.
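The arithmetic in the two worked examples above is easy to mislay, so here is a minimal sketch that reproduces it; the function name and layout are ours, not the chapter's.

```python
def ratio_at_indifference(j, b_total, a):
    """Solve (b_S/b_R) ** ((b_S + b_R) ** a) = j for b_S/b_R,
    following expression (4.3), with b_S + b_R = b_total."""
    exponent = b_total ** a        # 0.25 ** -0.4 is approximately 1.741
    return j ** (1.0 / exponent)   # b_S / b_R at indifference

a, b_total = -0.4, 0.25

# Dashed line through C has gradient 1.5, so j(yR, yS) = 2/3.
r = ratio_at_indifference(2.0 / 3.0, b_total, a)   # about 0.792
b_R = b_total / (1.0 + r)                          # about 0.14
b_S = b_total - b_R                                # about 0.11
print(round(r, 3), round(b_R, 2), round(b_S, 2))

# Dashed line through E has gradient 9, so j(yR, yS) = 1/9.
print(round(ratio_at_indifference(1.0 / 9.0, b_total, a), 2))   # about 0.28
```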
However, as specified above, the model is deterministic. Yet we know, on the basis of many experiments dating back at least to Mosteller and Nogee (1951), that most people’s responses have a substantial stochastic component. In other words, when asked to consider exactly the same pair of lotteries presented in exactly the same format on two
or more separated occasions within the same experiment, many participants have been observed to make different choices on different occasions. How might we incorporate the stochastic nature of individuals’ decision-making into the modelling framework outlined above?
5 A PROBABILISTIC FORMULATION
An early review of different approaches to modelling the probabilistic nature of choice under risk was provided by Luce and Suppes (1965) and more recently by Rieskamp et al. (2006), and expositions/discussions of some of these approaches in various contexts feature in other chapters in this handbook. In this section, we focus on one approach which is consistent with a plausible (if simplified) story of the neuropsychological nature of the variability/imprecision of people's judgements in even seemingly simple binary choice tasks. We do not claim that it is the only or best approach; but we leave it to others to explore alternatives, if they are so minded. For those who are so minded, Wilcox (2008) provides a very useful overview and discussion. The underlying idea is that the neuronal activity involved in people's judgemental processes is inherently 'noisy' (see, for example, Deco and Rolls, 2006). Moreover, the ways in which choices are processed – for example, the order in which we move from one piece of information to another and the amounts of time we spend considering different items – allow that the attention and weight given to the various components and dimensions may vary somewhat from one occasion to another. So, although a particular individual may generally be influenced by similarity considerations, the degree of that influence may vary from one occasion to another. It is as if the value of a used on one occasion is somewhat different from the value of a used by that same individual on another occasion. In short, rather than modelling an individual as if she operates with a single a that is applied to every decision, she might be characterized as if there is a distribution of a values from which she draws a particular value at random that is then applied to the decision problem in front of her at that moment. This is a concept of judgemental noise in the tradition of the random preference model of Becker et al. (1963). The same idea may be applied to the convergence point to the south-west of the triangle. The location of this point may be regarded as embodying two characteristics. First, the gradient of the line through the right angle of the triangle reflects what is generally regarded as the individual's risk attitude towards the baseline pair: the more risk averse an individual is, the better the hypotenuse lottery must be in order to be considered as good as the certainty of the intermediate payoff. So some distribution over the gradient of the line between the right angle of the triangle and the point of convergence can encompass the variability in an individual's risk attitude.11 Let us denote this gradient by g > 0. Second, the degree of fanning will vary according to the distance of the convergence point from the right angle of the triangle: the further away that point is from the right angle, the less variability there will be between the gradients of the lines across the triangle, signifying less impact from higher moment considerations of the skewness/disappointment/elation kind: as the distance between the point of convergence and the right
angle tends towards infinity, the lines tend towards being parallel, which is what EUT assumes. So some distribution over the length of the line between the right angle of the triangle and the point of convergence can capture the variability in an individual’s attitude to any higher moment influences. Let us denote this length variable (where the unit of measurement is the length of the vertical/horizontal side of the triangle) by l. For any pair of {S, R} lotteries in a given triangle, a particular set of {g, l, a} values will determine, to a first approximation, which alternative is preferred. The probability of the S alternative being chosen is therefore the probability that sampling from the distributions g, l and a will yield a set of {g, l, a} values that entails a preference for S in that pair. Implementing such a model requires further assumptions, of course, about the distributions of each of g, l and a and about any covariance between them. We have no strong intuitions about the appropriate assumptions to make: it seems sensible to start with some simple formulations – for example, independent bounded distributions – and see whether the data suggest modifications in the course of attempts to fit them.
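As an illustration of how such a model might be implemented, the sketch below estimates the choice probability by brute-force sampling. Everything not stated in the text is an assumption on our part: the uniform ranges for g, l and a, the independence of the draws, the placement of the convergence point on the ray of gradient g through the right angle, and the decision rule that S is chosen whenever f(bS, bR) ≥ j(yR, yS), with j taken (as a first approximation) to be the inverse of the gradient of the dashed line through S.

```python
import math
import random

def prob_choose_safer(p1, p3, b_S, b_R, n=100_000, seed=1):
    """Monte Carlo estimate of the probability that the safer lottery S
    is chosen over R. (p1, p3) are S's coordinates in the triangle;
    b_S and b_R are the probabilities that S (respectively R) yields
    the strictly better payoff."""
    rng = random.Random(seed)
    chose_S = 0
    for _ in range(n):
        g = rng.uniform(3.0, 5.0)    # gradient through the right angle
        l = rng.uniform(1.0, 3.0)    # distance to the convergence point
        a = rng.uniform(-0.6, -0.2)  # susceptibility to similarity
        theta = math.atan(g)         # convergence point lies on this ray
        cx, cy = -l * math.cos(theta), -l * math.sin(theta)
        grad = (p3 - cy) / (p1 - cx)            # dashed line through S
        j = 1.0 / grad                          # payoff judgement (first approx.)
        f = (b_S / b_R) ** ((b_S + b_R) ** a)   # expression (4.3)
        if f >= j:
            chose_S += 1
    return chose_S / n

# C = (£30, 0.25; 0, 0.75) against the hypotenuse lottery (£40, 0.14; 0, 0.86)
print(prob_choose_safer(p1=0.75, p3=0.0, b_S=0.11, b_R=0.14))
```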
6 EXTENSIONS AND LIMITATIONS
So far, we have set out a modelling framework intended to allow for three sources of influence in choices between lotteries: interactions within each lottery; interactions between the different lotteries; and noise/variability in individuals' judgements. However, this framework has been developed in the context of a rather specific – and arguably, narrow – domain: binary choice between lotteries involving three outcomes in total. The justification for this focus is that a large proportion of the experimental data in this area has been generated by decisions of this type. But there are many other types of decision task. The rest of this section touches briefly upon some of the issues raised when we broaden the domain somewhat.
6.1 Larger Numbers of Payoffs Overall
Suppose we add a third lottery J: (£100, 0.3; 0, 0.7) to the set consisting of A and B. If we continue for the moment to restrict ourselves to binary choices, there is now the potential for three such choices: {A, B}, {A, J} and {B, J}. Each pair involves three payoffs, but it is a different set of three in each case. This raises the question of how to model the relationships between different subsets of payoffs. The proposal made in Loomes (2010), building on earlier literature, was to assign a 'basic utility'12 c(xi) to every payoff, and to allow for the possibility of non-linear interactions between the different payoffs within a particular pair. Within any particular pair, the basic utility representation of the payoff difference favouring the R option over the S option will be the difference between c(x3) and c(x2), more compactly denoted by (cx3 – cx2). Call this cR. Correspondingly, the payoff difference in basic utility terms favouring S over R will be (cx2 – cx1), which we call cS. Then the ratio cR/cS is the counterpart to bS/bR in the specification of f(bS, bR). To allow for the possibility of non-linear perceptual effects entering into the payoff judgements in an analogous way to the similarity effects incorporated into f(bS, bR), Loomes (2010) suggested specifying j(yR, yS) as:
j(yR, yS) = (cR/cS)^d    (4.4)
where d ≥ 1. Under EUT and a number of alternatives to EUT, d = 1 (that is, there are no non-linear perceptual effects). However, when d > 1, whichever is the bigger of cR and cS receives 'disproportionate' attention to an increasing extent as cR and cS become more and more different. So in the choice between A and B, j(yR, yS) = [(c40 – c30)/(c30 – c0)]^d, which gives additional relative weight to cS; whereas in the choice between A and J, j(yR, yS) = [(c100 – c30)/(c30 – c0)]^d, shifting disproportionately greater weight to cR over and above the differences between basic utilities. In Loomes (2010) it is shown how such non-linear effects can contribute to choice cycles in the direction consistent with regret theory (Bell, 1982; Loomes and Sugden, 1982) but in the opposite direction to those arising from similarity effects, offering an explanation of why the same individuals may violate transitivity in the regret direction for some sets of choices but also violate transitivity in the similarity direction for other choice sets (see Day and Loomes, 2010, for experimental data of this kind). Clearly, extending the framework from one specific binary choice to encompass a range of binary choices adds further elements and greater complexity to the model. First, we require some specification of c(.). As a first approximation, and appealing to the idea of diminishing marginal basic utility, one might use a simple functional form, possibly involving just one parameter (for example, a power function). Second, if d has similar status to a – that is, if d allows for non-linear comparative judgements on the payoff dimension analogous to the propensity for non-linear comparative judgements on the probability dimension represented by a – then a distribution for d as well as a is required. However, despite the additional complexity, we have still not moved outside of the domain of binary choices involving only three payoffs per pair. In the following subsections, we consider ways in which the domain might be broadened – and the issues thereby raised. We are not offering resolutions to all those issues, but merely some observations.
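A few lines of code make the direction of the distortion concrete. The power form for c(.) and the particular values r = 0.9 and d = 1.2 are purely illustrative choices of ours; the text commits only to d ≥ 1 and to some simple one-parameter specification of basic utility.

```python
def c(x, r=0.9):
    # Illustrative one-parameter 'basic utility' (diminishing marginal utility)
    return x ** r

def j(x_low, x_mid, x_high, d=1.2):
    """Expression (4.4): the payoff judgement favouring R over S."""
    cR = c(x_high) - c(x_mid)  # basic-utility argument favouring the riskier option
    cS = c(x_mid) - c(x_low)   # basic-utility argument favouring the safer option
    return (cR / cS) ** d

# {A, B}: cR/cS < 1, so d > 1 pushes j down, adding relative weight to cS.
print(j(0, 30, 40))
# {A, J}: cR/cS > 1, so d > 1 pushes j up, adding relative weight to cR.
print(j(0, 30, 100))
```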
6.2 Larger Numbers of Payoffs within Each Option
Although it is natural to want to keep experimental objects simple, most options outside of the laboratory are liable to involve more than two or three possible outcomes: indeed, it is hard to think of many alternatives in the non-experimental environment that are so simple, while it is easy to think of a wide variety of options where there are many levels of outcome. In principle, this might be modelled in terms of multiple pair-wise interactions between all of the various outcomes (see Cillo and Delquie, 2006, for an exploration of this approach) but in practice, especially when people are making relatively quick evaluations of options with many possible outcomes and uncertainties, it may be that the main moments – mean, median and skewness – and a limited number of particularly prominent or extreme outcomes will receive particular weight. Options with more than three outcomes defy simple visual representation such as the M–M triangle, and the idea of representing attitude to dispersion in terms of a point of
convergence to the south-west of such a triangle is no longer tenable in that form. On the other hand, confining interest to some measure of mean subjective value alone is likely to be too severe a restriction to be descriptively appropriate in a world where there is much evidence of susceptibility to at least some degree of within-lottery interactions. One challenge for future modelling is to identify the counterparts (and possibly, supplements) to the distributions of g and l in Section 5 above when options have multiple possible outcomes. Extending the domain to more than three outcomes raises a further issue, this time for between-lottery interactions. Consider a choice between A and a new lottery H below. Here the probability that the safer A will turn out better is 30 per cent, giving a figure for bS; but the 70 per cent probability favouring the riskier H is in two parts, one of which is a third the size of bS while the other part is double bS. So how should we conceive of bR? Analogously, on the payoff dimension, there are two levels of argument in favour of the riskier option: (c80 – c30) and (c40 – c30): so how should the overall perceived magnitude of yR be computed?

In the original display the options are shown as strips of lottery tickets numbered 1–100:
A: £30 for tickets 1–100 (100%)
H: £80 for tickets 1–10 (10%); £40 for tickets 11–70 (60%); 0 for tickets 71–100 (30%)
It is possible that, as we scale down the probabilities of the non-zero payoffs and correspondingly increase the probabilities of zero, similarity effects might come into play. For example, if scaling those probabilities down to a fifth of their size produces A*: (£30, 0.2; 0, 0.8) and H*: (£80, 0.02; £40, 0.12; 0, 0.86), the decision-maker might judge both options to offer similar substantial chances of 0 and therefore be more inclined to choose the option with the higher payoffs. But one could also imagine that he might be inclined to judge the 0.02 to be relatively so small that it is virtually disregarded, thereby removing the weight associated with the most attractive payoff favouring H* and increasing the chance that A* will be picked. In short, increasing the number of outcomes beyond a total of three may considerably complicate the operation and modelling of between-lottery interactions.
6.3 Larger Numbers of Options in the Choice Set
Perhaps because of the powerful attraction of transitivity for early (and many current) decision theorists, it has been commonplace to suppose that preferences in larger choice sets can be deduced quite straightforwardly from binary subsets. However, once we allow the possibility of interactions between the probabilities and/or the payoffs of the kind entailed in expressions (4.3) and (4.4) above, transitivity is liable to be undermined. This raises the question: if there is a cycle of binary choices where A is chosen over B, B is chosen over C, and C is chosen over A, how can we deduce what choice will be made from the set {A, B, C}?
The essence of the difficulty is that if a third option is added to an initial pair, interactions between the components of that initial pair may be altered. Consider the case where option J defined above is offered alongside A and B. Perhaps the choice is set out as strips of lottery tickets numbered 1–100:

A: £30 for tickets 1–100 (100%)
B: £40 for tickets 1–80 (80%); 0 for tickets 81–100 (20%)
J: £100 for tickets 1–30 (30%); 0 for tickets 31–100 (70%)
Now there are multiple between-option comparisons that might be made on both the probability and the payoff dimensions, and to the extent that some of them pull in opposite directions, it is not obvious how to weight them – or indeed, whether some of them will be neglected altogether or eliminated at an early stage as part of a process of trying to simplify the problem. There is now considerable evidence that adding options to a choice set can substantially affect the relative perceived attractiveness of the initial elements of the set. Various forms of such influences – asymmetric dominance/attraction, compromise and even, perhaps, repulsion effects – have been documented and continue to be the subject of investigation and controversy (see, for example, Malkoc, 2006). This speaks both to the potential importance of between-option interactions and to the difficulty of providing a modelling strategy for encompassing them all, even when the choice set is extended just from two to three, as is the case in much of the work on attraction and compromise effects. When the choice set becomes larger still, further effects may be produced . . . but at least one of the effects discussed earlier may be attenuated or largely disappear. Bateman et al. (2007) report two experiments where participants were asked to rank a large number of options (20 in one experiment, 25 in the other) consisting of a range of sure amounts and a number of two-outcome lotteries which, between them, spread along the spectrum of probabilities and provided quite a variety of payoffs between 0 and £60. With a range of sure amounts incorporated into the rankings, it is possible to infer a certainty equivalent value for each two-outcome lottery – at least, within a narrow interval.13 By varying the composition of the non-degenerate lotteries in different ranking exercises, the inferred values of particular lotteries were substantially and systematically altered. When a particular lottery was presented in a choice set consisting of other lotteries that were mostly regarded as inferior to it, it was ranked more highly against sure amounts than when it was part of a choice set where most of the other non-degenerate lotteries were regarded as superior. This appears to be consistent with the kind of range-frequency effect discussed by Parducci (1965), again suggesting the potential importance of comparative and contextual influences in people's judgements and evaluations.
In a related working paper, Bateman et al. (2006) reported some other features of those same experiments which allowed the orderings over particular pairs of lotteries within the larger ranking exercises to be compared with straight binary choices between each pair. One striking result was that strong CRE patterns of the standard kind were frequently observed on the basis of the binary choice tasks; but when those same pairs were embedded in ranking exercises involving larger choice sets, the effects were substantially reduced and quite often became statistically insignificant (or were occasionally reversed). One possible explanation for this result is that when people are focused upon just two alternatives, particular contrasts or similarities may attract especial attention and weight, promoting the sorts of patterns of response typical of the CRE; whereas when the lotteries in question are incorporated into a larger set with a number of other probabilities and payoffs, the multiple comparisons greatly attenuate any one particular contrast or comparison. This is not to say that interactions of all kinds become unimportant – the range-frequency evidence referred to in the previous paragraph may still be influential – but rather that the nature and relative strength of different interactions may vary with the context.
7 CONCLUDING REMARKS
In many non-experimental environments involving some degree of risk and uncertainty, individuals are faced with multiple alternatives, each of which might result in a number of different outcomes. Ideally, an adequate descriptive model of choice in such environments would allow for the fact that human decision-making involves a process of judgement in the course of which diverse considerations and partial comparisons may vie with one another and interact in ways that may vary from one context to another and even from one occasion to another within the same context. In short, such decision-making is a complex cognitive process involving within-option interactions, between-option interactions, and judgemental noise/variability. By contrast, many formal decision theories propose relatively simple deterministic functional forms, with noise often modelled as some form of extraneous 'error' term. The assumptions underpinning these functional forms can be seen as restrictions on the kinds of interactions that any particular model allows. In the case of EUT, the standard axioms, collectively, have the effect of suppressing virtually all within- and between-lottery interactions of the kind discussed in this chapter. The result is a model which operates as if the subjective value/utility assigned to any outcome is purely a matter of how the characteristics of each outcome map to the tastes of the decision-maker, independent of any and all other outcomes available within the choice set and their respective probabilities, and as if the weight given to any probability is the probability itself, independent of the distribution of which it is a part and independent of all other features of the choice set. However, to the extent that certain kinds of interactions are typical of actual decision processes and are common to many individuals, the restrictions imposed by those axioms are liable to be substantially and systematically breached. Such breaches may most easily be observed in controlled and highly simplified experimental conditions; and especially so in circumstances where two or more interactions reinforce one another, as we have
suggested might be the case, say, for classic CRE choices where both similarity effects and attitudes to dispersion may work in the same direction. But it may be unwise to extrapolate too far from rather specific regularities found in a fairly limited domain. If we are correct in suggesting that the classic CRE is the product of at least two tendencies, then models built on the supposition that just one kind of interaction is responsible (and hence that just one axiom needs to be relaxed) may work in some cases, but may fail systematically in other cases where different interactions operate more powerfully. For example, the class of rank-dependent models that relax the independence axiom but hold fast to transitivity, thereby ruling out between-lottery interactions, will be unable to accommodate cycles or attraction or compromise effects. Likewise, a model such as regret theory, which posits certain sorts of interactions between the payoffs of different lotteries, may thereby be able to account for some kinds of choice cycles; but by precluding analogous interactions between probabilities, it misses the kinds of similarity effects that may in some circumstances outweigh regret effects and actually reverse the direction of cycles. We have explored some possibilities for allowing, within each individual, for both attitude to dispersion (as represented by 'fanning out') and also similarity judgements responding to some combination of ratios and differences (as reflected in expressions 4.3 and 4.4). We have also tried to allow for within-person variability via distributions of key parameters. Others, who acknowledge that no single 'simple' model will do, have tackled the challenge in a rather different way. Instead of allowing for a number of different influences operating within each individual's decision process, the 'mixture model' approach makes the simplifying assumption that each individual within a sample behaves according to one specific model (EUT, rank-dependent EU, and so on) but that different subsets behave according to different models, with the proportions of each type regarded as variables to be estimated in econometric analysis – see Conte et al. (2011). The conceptual difficulty with such an approach is that if some individuals always respect all the axioms of EUT while some behave according to models which respect transitivity but allow violations of independence and others violate independence but conform with transitivity, why should there not also be people who depart from both axioms? And why should there not be many such people? And if there are many such 'internal' mixtures which are assigned to the 'nearest' more restricted 'type', how much distortion arises from such misspecification? These are open questions involving interesting problems for modellers in this field. Of course, a 'meta-model' which allows the possibility of almost any form of internal non-linear interaction is unlikely to be very useful. But advances in complexity science and the huge expansion of computing capacity make it possible to contemplate the construction of models which allow at least some within- and between-option interactions to co-exist, perhaps sometimes reinforcing and sometimes offsetting one another, with variability and probabilistic choice made intrinsic to the model rather than seen as an add-on. The challenge is to identify the kinds of interactions which do most work in most people's decision processes and find ways of incorporating them into a single framework.
The particular suggestions we have made in this chapter may very well turn out not to be the best ways of meeting that challenge, but the general approach does seem to us to be the way forward for decision modelling.
NOTES
1. Graham Loomes acknowledges the financial support of the UK's Economic and Social Research Council (grant no. RES-051-27-0248).
2. In actual experiments, this choice does not necessarily precede the next. Ideally, the order in which choices are presented will be randomized.
3. Although, as we shall see, there may be parameters for which the asymmetry is less strong – or indeed, may disappear altogether.
4. In models such as disappointment theory, the extra weight takes the form of modifying the overall utility of the 0 payoff downwards in anticipation of the fact that receiving so much less than the EV will generate a further decrement of utility due to the disappointment resulting from expectations being dashed.
5. Lottery A offers £30 with certainty while G offers a 20 per cent chance of £40, a 5 per cent chance of 0 and a 75 per cent chance of £30. So they have in common a 75 per cent chance of £30. If that is replaced in both alternatives by a 75 per cent chance of 0, A is transformed into C and G is transformed into D. Under the independence assumptions of EUT, replacing one common consequence by a different common consequence should make no difference; but Allais (1953) suggested that there would be violations of this implication, and that there would be a strong tendency for people to choose A and D but not the opposite departure, G and C. Some experimental evidence has suggested that this is the case, although for the kinds of sums offered in incentivized experiments, the CCE tends to be rather weaker than the CRE and in some cases fails to manifest significantly – see, for example, Conlisk (1989).
6. This is a difference that we shall say more about shortly.
7. For further details of the conditions under which this similarity formulation entails violations of transitivity and betweenness, see Loomes (2010, s. 2.2). Notice that those models which take transitivity for granted (and therefore assume that there are indifference maps in the M–M triangle) will suppose that such violations of betweenness entail indifference curves which are especially curved in the region closer to the bottom edge of the triangle – see, for example, Figure 4 in Tversky and Kahneman (1992).
8. Or to put it slightly differently, suppose we are here considering an individual who is immune to similarity effects – that is, whose a = 0.
9. Much the same idea might be formulated in other terms – the particular account of the source/nature of the within-lottery interactions is not the main point here – the disappointment/elation story is simply to provide an illustration of the more general idea.
10. This is a first approximation because if the hypotenuse lottery is a lottery other than (£40, 0.15; 0, 0.85), it will have slightly different elation and disappointment components so that, in relation to C, the value of j(yR, yS) will not be exactly ⅔. However, just as we made the simplifying assumption that attitude to dispersion can be approximated by linear indifference curves emanating from the same point to the south-west of the triangle, we now look at the first approximation of the impact of similarity by supposing that the value of j(yR, yS) is given by the inverse of the gradient of the dashed line through C. Other modelling assumptions are possible and it may be that future empirical analysis will show that other approximations provide a better fit to data.
11. Of course, if there are skewness/disappointment/elation influences at work, these will also enter into the judgement about which hypotenuse lottery is exactly as good as the sure intermediate amount, so this gradient is not influenced solely by risk attitude, as would be supposed under EUT. Still, for any given attitude to higher moment factors, a steeper gradient reflects a greater degree of what is conventionally regarded as risk aversion.
12. The idea of 'basic utility' as outlined in Loomes and Sugden (1982) is that it represents the subjective value of receiving the payoff (or more generally, experiencing any outcome) in cases where that payoff/outcome is received/experienced other than as a consequence of choice. Later the idea was modified to try to capture the idea of it being the 'core' component, representing the essence of the subjective value yielded purely by the way in which the characteristics of xi itself interact with the tastes of the individual, independent of any comparison with any other xi within or between lotteries.
13. The sure amounts were whole pounds, so the typical interval was between two integer numbers of pounds.
REFERENCES
Allais, M. (1953), 'Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'ecole Americaine' ('The behaviour of rational man under conditions of risk: criticisms of postulations and axioms of the American school'), Econometrica, 21 (4), 503–46.
Arieli, A., Y. Ben-Ami and A. Rubinstein (2011), 'Tracking decision makers under uncertainty', American Economic Journal: Microeconomics, 3 (4), 68–76.
Bateman, I., B. Day, G. Loomes and R. Sugden (2006), 'Ranking versus choice in the elicitation of preferences', working paper, University of East Anglia.
Bateman, I., B. Day, G. Loomes and R. Sugden (2007), 'Can ranking techniques elicit robust values?', Journal of Risk and Uncertainty, 34 (1), 49–66.
Battalio, R., J. Kagel and K. Jiranyakul (1990), 'Testing between alternative models of choice under uncertainty: some initial results', Journal of Risk and Uncertainty, 3 (1), 25–50.
Becker, G., M. DeGroot and J. Marschak (1963), 'Stochastic models of choice behavior', Behavioral Science, 8 (1), 41–55.
Bell, D. (1985), 'Disappointment in decision making under uncertainty', Operations Research, 33 (1), 1–27.
Chew, S. and K. MacCrimmon (1979), 'Alpha-Nu choice theory: a generalization of expected utility theory', Working Paper No. 669, July, University of British Columbia Faculty of Commerce and Business Administration.
Cillo, A. and P. Delquie (2006), 'Disappointment without prior expectation: a unifying perspective on decision under risk', Journal of Risk and Uncertainty, 33 (1), 197–215.
Conlisk, J. (1989), 'Three variants on the Allais example', American Economic Review, 79 (3), 392–407.
Conte, A., J. Hey and P. Moffatt (2011), 'Mixture models of choice under risk', Journal of Econometrics, 162 (1), 79–88.
Day, B. and G. Loomes (2010), 'Conflicting violations of choice and where they may lead us', Theory and Decision, 68 (2), 233–42.
Deco, G. and E.T. Rolls (2006), 'A neurophysiological model of decision-making and Weber's law', European Journal of Neuroscience, 24 (3), 90–116.
Ellsberg, D. (1961), 'Risk, ambiguity and the Savage axioms', Quarterly Journal of Economics, 75 (4), 643–69.
Gul, F. (1991), 'A theory of disappointment aversion', Econometrica, 59 (3), 667–86.
Hagen, O. (1979), 'Towards a positive theory of preferences under risk', in M. Allais and O. Hagen (eds), Expected Utility Hypotheses and the Allais Paradox, Dordrecht: Reidel, pp. 271–302.
Kahneman, D. and A. Tversky (1979), 'Prospect theory: an analysis of decision under risk', Econometrica, 47 (2), 263–91.
Knetsch, J. and J. Sinden (1984), 'Willingness to pay and compensation demanded: experimental evidence of an unexpected disparity in measures of value', Quarterly Journal of Economics, 99 (3), 507–21.
Leland, J. (1994), 'Generalized similarity judgments: an alternative explanation for choice anomalies', Journal of Risk and Uncertainty, 9 (2), 151–72.
Lichtenstein, S. and P. Slovic (1971), 'Reversals of preference between bids and choices in gambling decisions', Journal of Experimental Psychology, 89 (1), 46–55.
Loomes, G. (1991), 'Testing decision theories by using value equivalences', Oxford Economic Papers, 43 (4), 644–66.
Loomes, G. (2010), 'Modelling choice and valuation in decision experiments', Psychological Review, 117 (3), 902–24.
Loomes, G. and R. Sugden (1982), 'Regret theory: an alternative theory of rational choice under uncertainty', Economic Journal, 92 (368), 805–24.
Loomes, G. and R. Sugden (1986), 'Disappointment and dynamic consistency in choice under uncertainty', Review of Economic Studies, 53 (2), 271–82.
Loomes, G., P. Moffatt and R. Sugden (2002), 'A microeconometric test of alternative stochastic theories of risky choice', Journal of Risk and Uncertainty, 24 (2), 103–30.
Luce, R. and P. Suppes (1965), 'Preference, utility and subjective probability', in R. Duncan Luce, R.R. Bush and E.H. Galanter (eds), Handbook of Mathematical Psychology, vol. 3, New York: Wiley, pp. 249–410.
Machina, M. (1982), '"Expected utility" theory without the independence axiom', Econometrica, 50 (2), 277–323.
Malkoc, S. (2006), 'Attraction and compromise effects revisited: the role of attribute characteristics and representation in context effects (Symposia summary)', Advances in Consumer Research, 35 (1), 122–4.
Mosteller, F. and P. Nogee (1951), 'An experimental measurement of utility', Journal of Political Economy, 59 (5), 371–404.
Parducci, A. (1965), 'Category judgment: a range-frequency model', Psychological Review, 72 (6), 407–18.
Quiggin, J. (1982), 'A theory of anticipated utility', Journal of Economic Behavior and Organization, 3 (4), 323–43.
Rieskamp, J., J. Busemeyer and B. Mellers (2006), 'Extending the bounds of rationality: evidence and theories of preferential choice', Journal of Economic Literature, 44 (3), 631–61.
Rubinstein, A. (1988), 'Similarity and decision making under risk: is there a utility theory resolution to the Allais paradox?', Journal of Economic Theory, 46 (1), 145–53.
Starmer, C. and R. Sugden (1989), 'Violations of the independence axiom in common ratio problems: an experimental test of some competing hypotheses', Annals of Operations Research, 19 (1), 79–102.
Sugden, R. (2004), 'Alternatives to expected utility: foundations', in S. Barbera, P. Hammond and C. Seidl (eds), Handbook of Utility Theory: Volume 2: Extensions, Dordrecht: Kluwer, pp. 685–755.
Tversky, A. (1969), 'Intransitivity of preferences', Psychological Review, 76 (1), 31–48.
Tversky, A. and D. Kahneman (1992), 'Advances in prospect theory: cumulative representation of uncertainty', Journal of Risk and Uncertainty, 5 (4), 297–323.
Von Neumann, J. and O. Morgenstern (1947), Theory of Games and Economic Behavior, 2nd edn, Princeton, NJ: Princeton University Press.
Wilcox, N. (2008), 'Stochastic models for binary discrete choice under risk: a critical primer and econometric comparison', in J. Cox and G. Harrison (eds), Risk Aversion in Experiments, Research in Experimental Economics Volume 12, New York: JAI Press, pp. 197–292.
5 Choice context
Konstadinos G. Goulias and Ram M. Pendyala
1 INTRODUCTION
Context in choice models is often treated in a narrow space with a focus on limited contextual dimensions of interest that may influence the choices under study. Over time, however, the field of choice modeling has matured, recognizing the need to consider the many behavioral facets where choice among alternatives is important and to identify the contexts within which predictive models need to be developed. In fact, today there is a need to develop behavioral models encompassing the entire life span of individuals, because the policies that models are asked to analyze require the development of model systems spanning a wider net of relationships that go far beyond the narrowly developed mode choice of the seventies. Context in this chapter is defined as the entire framework and set of factors describing the objective and subjective circumstances that surround and influence action by an individual and/or a group. To describe context, dimensions of interest include: (a) time, in terms of the life course of an individual, historical time and time scale; (b) space, that includes locations, groups of locations (for example, shopping centre), neighborhood, city, region and country; and (c) society, that includes the household and other social networks; and the entirety of laws, rules, and regulations. Factors emerging from these dimensions (for example, barriers) are sometimes separable and can be used as explanatory variables in behavioral equations or variables to manipulate in experiments (Oppewal and Timmermans, 1991; Swait et al., 2002). However, often they are inseparable and require a different machinery of detection than currently available (see the emotional and symbolic consumption in Elliott, 1998, and the non-positivist consumer behavior analysis reviewed by Pachauri, 2001). A life course perspective can offer a theoretical framework that includes many facets in the life of individuals and incorporates time in a natural way. Today this is needed more than ever because of an aging population with diverse attitudes and behaviors. Closer attention is also paid to children and their needs, and major changes continue to take place in labor relations and labor force participation accompanied by increased diversity in social institutions such as the household. Moreover, transportation policy analysis is expanding its scope to include land use, which is strongly influenced by residential location decisions that in turn require the study of other decisions such as household formation, labor force participation, fertility, schooling, and the variety of location decisions surrounding residence and job locations. In parallel, a shift in policies is seen, aiming at a betterment of quality of life instead of simple economic appraisals, as well as environmental assessments spanning long periods (2000 to 2050 and beyond in California legislative initiatives, for example). Recent attempts to create theories of travel behavior also expanded the sphere of consideration to include many aspects of social life (for example, time use, human relations and interaction, cognition and perception). It becomes natural
then to advocate a more comprehensive viewpoint that encompasses the entire life of individuals that also considers their biological and social nature. In this way, it becomes possible to place each person in a more complete spatio-temporal context. The specific behavioral facets considered in this chapter are time use and the sequencing and ordering of activities and travel in time and space (for example, Brög and Erl, 1982; Kitamura et al., 1997b; Pendyala, 2003; Arentze and Timmermans, 2003; Pendyala and Bhat, 2004; Ettema et al., 2007; Sener et al., 2008), spatial location and place choice (for example, Waddell et al., 2003; Billig, 2004; Waddell et al., 2007; Chow and Healey, 2008), and ownership and change in ‘mobility tools’ (for example, Mohammadian and Miller, 2003; Scott and Axhausen, 2006). In this chapter, travel behavior dynamics (particularly constancy and change) is viewed from a comprehensive conceptualization of travel behavior development. In the next section, a few basic principles of what is known today as the life course approach to human development are reviewed. Then, a conceptual approach to human development that emphasizes context in time is presented and examined in detail. This is linked in the following section with Bourdieu’s approach to praxis (Bourdieu, 1998). This theory-building section is followed by a review of the context(s) encountered in the travel behavior literature and, finally, the chapter concludes with a description of possible elicitation methods.
2 BIOECOLOGY AS CHOICE CONTEXT
The life course approach is a broad perspective that combines life-span psychology and life-course sociology (Giele and Elder, 1998). A life-centered approach considers the entire chain of events characterizing the lives of individuals in a contextual manner from conception to death. Closely related fields are also life history and evolutionary psychology as well as life story approaches. All of these approaches differ in ontology and methodology but they share an interest in explaining events in the life span of individuals considering the life span in its entirety. This more 'naturalistic' framework in social psychology of life span development is an 'emerging paradigm' that has a distinctive theory and methods. Elder and Giele (2009) advance core principles of life course theory as a field of inquiry with strong affinities to family development, individual development, family history, stress theory, demography, gerontology and Bronfenbrenner's (2005) ecological development perspective, which is briefly reviewed in this chapter. In an attempt to integrate different approaches in life course research, Elder and Giele (2009) describe the 'principles' of the life course approach reviewed in the remainder of this section.

Historical Time and Place

The individual life course is embedded in historical times and places that are experienced throughout a person's lifetime, and they shape the life course experience. Examples include geopolitical events and localized conflicts (for example, conflicts and wars, unification of Europe, collapse of the USSR, dismantling of Yugoslavia), economic fluctuations (for example, recessions and growth), major natural disasters (for example, floods, earthquakes and volcanic eruptions), social and cultural ideologies (for example,
patriarchy and transitions to democracy). On the one hand, these historical conditions and events shape people's dispositions (that is, norms, beliefs, attitudes, perceptions and choices) and, on the other hand, they alter the course of their development. Understanding behavior requires knowledge of the places and socio-historical circumstances that surround individual life histories. Under this principle, travel behavior is in part determined by the sequence of personal experiences with places (for example, moving from a small village to a big city) and major events (for example, opening of national borders in the European Union and increased availability of private cars) that changed this experience.

Linked Lives

The lives of individuals are linked with the lives of other individuals through multiple networks of relationships cast within the social and historical context. This is a multilevel interdependency centered in, by and around the family/household. In this way an event at a different level (for example, a war and recruitment of soldiers) influences relationships in the family and other social ties. Micro social or biological events such as a death or a variety of other life-changing events may trigger behavioral changes. In addition, family members also plan and organize for the timing of life-changing events such as marriage, having children, caring for an older parent, moving into a new residence, finding suitable schools, finding jobs and pursuing a new hobby. Under this principle, travel behavior is also partially determined by the sequence and type of events other people experience because of the relationships a person has with others. This is discussed further in later sections of this chapter.

Human Agency

Each individual builds a life course through a complex orchestration of actions within a physical (objective) and imaginary (subjective) stage of opportunities and constraints. Individuals are active agents who have the dual role of mediating the influence of social structure and actively shaping social structure. In fact, Elder and Giele (2009, p. 13) write: 'People's motives to satisfy personal needs result in decision making that organizes their lives around goals within options and pressures of their situations.' Implied in this is also the ability of families and individuals to adapt to new circumstances by modifying their expectations and behavior in response to internal and external events. Under this principle, travelers move around the physical network while maintaining a mental map of where opportunities and paths are located. At the same time, they take action to shape in many different ways the spatial distribution of these opportunities and the networks used to reach them (for example, voting for funding programs and city ordinances).

Timing of Lives

The impact on a person's development of a succession of life transitions or events is contingent on when they occur in a person's life. In life course approaches, time is considered as being of three fundamental types – individual, generational and historical. Individual time is also chronological age and often referred to as ontogenetic. It is very often used
to identify and characterize childhood, adolescence, young adulthood, old adulthood and end of life. They influence rights (for example, right to vote, ability to obtain a driver's license), positions (for example, student) and roles (for example, breadwinner). These are heavily based on culturally and socially defined age periods and expectations. Generational time is the age period used to define cohorts, such as the baby boom generation who are people born between 1946 and 1964 (the definition used in the USA). In fact, the baby boomer generation is an important cohort in transportation research and policy because of its size and distinct 'character' and 'habits' relative to past generations. The third, historical time, is a marker of large-scale changes that have a broad impact on civilizations of the earth. Examples are major conflicts and wars, the proliferation of the Internet, climate change, and global economic recession. The passing of time can be viewed as the orchestration of life course events visualized as a sequence of transitions. In this case, a transition is a discrete life change, which is an event within a trajectory (for example, from living at home to leaving the nest, from a single to a married state and from a working to a retired state). A trajectory in this context is a sequence of linked lockstep-like states within a range of behaviors and experiences that can be considered to be a single entity. A typical example is education from the kindergarten to elementary school, middle school and high school, and on to university. These transitions are accompanied and marked by culturally specific rituals that reinforce them and repeat across generations. Trajectories include and integrate many transitions of multiple people in a variety of interactions (George, 2009). Age-grading or age-structuring is also a typical characteristic, at least in western cultures. This is the sequence of transitions that are deemed appropriate at specific ages. In a family trajectory, this sequence may appear as follows: leave home, marry, enter parenthood, complete parenthood and enter grandparenthood. In an education and work trajectory this may be exit full-time schooling, enter full-time work, settle on a career, reach the peak of a career and retire. Violations of the sequence as well as of these age-graded transitions may occur; such violations are often associated with a social meaning or interpretation as well as positive and negative consequences.

Lifelong Process

An overarching principle of the life course approach(es) is the umbrella consideration that human development and progression along the age axis is a lifelong process in which the past shapes the future in different ways for individuals who may appear similar even when one controls for typical explanatory variables such as age, gender, education and employment. One can envision the past as having a wave-like impact on the future that not only influences the generation that experiences a wave-forming event, but crosses over to future generations (for example, becoming a refugee due to political or economic conflicts). The timing of the onset of these wave-forming events triggers chain reactions that, when considered together, lead to competitive advantages and disadvantages. Moreover, the combination of transitions and their timing may create waves of inhibition of aspirations or recovery from disasters.
Individual and group abilities to take advantage of opportunities and to fortify against negative impacts vary depending on material and immaterial resources available at specific periods. When resources are available where they are needed, recovery happens (even rapidly) but when they are not
available, inhibition is exacerbated and prolonged. The discussion that follows describes how these are expressed in terms of different forms of possessions (capital).
3 A CONCEPTUAL FRAMEWORK FOR CHOICE CONTEXT
A conceptual framework that is based on a developmental theory would fit very well in a coherent life course context definition. Bronfenbrenner's person-process-context-time (PPCT) model (Bronfenbrenner, 2005) is an excellent example with its core theory based on the zone of proximal development. In this organizing principle, human development in the life span is a journey through increasingly more complex reciprocal interaction between a human organism and other organisms, objects, and symbols in its environment (Vygotsky, 1978). Emerging from this is the activity theory with applications to human–computer interfaces and the bioecological model of Bronfenbrenner, which over time evolved into the PPCT model (Bronfenbrenner, 2005). Person here refers to the person factors representing individual differences in physiological and psychological states, tempo and biological intensity of reactions. Process is the stream of psychological acts that are called proximal processes and considered to be the primary engines of development. Context is the physical, socio-emotional and mental setting in which behavior takes place. Of key consideration here is the distinction of proximal versus distal interactions. The form, power, content, and direction of proximal processes vary systematically as a joint function of the characteristics of the developing person, of the immediate environment (proximal), of the remote environment (distal) and of the nature of the developmental outcomes under consideration. In this way, it is possible to differentiate the immediate setting (or immediate field or proximal arena) in which activities take place (such as the household, school, social network of friends or workplace) and the much broader context in which the immediate setting is embedded (for example, the city, social class, ethnic group, state or country). Time is considered in its three dimensions of ontogenetic (person development) time, cohort time, and historical time as discussed in the life course approaches above. Under this conceptual framework, the triggering of behavioral change is a joint function of the characteristics of the persons, their immediate environment(s), their remote environment(s) and the nature of the outcome(s). This is a key consideration and position of the PPCT model that not only recognizes the importance of the joint influences of person and environmental factors on behavior, but also allows for the behavioral 'mechanism' to depend on the behavioral outcome. Under a model of behavior of this type, there are multiple levels of intervening influences within an individual, among individuals, and from other sources that can change human development and behavior. Using this framework, travel behavior dynamics can be analyzed through the lens of change processes in the life span of interacting individuals. In this way, travel behavior can be placed in the context of a chain of events with the pattern of events characterized by antecedent and subsequent events. As Bronfenbrenner (2005) advocated, a transition is best analyzed as an ecological concept that comprises a series of nested structures (microsystems) linked together in a network (the mesosystem). To illustrate his conceptual apparatus, consider an example of the joint decision to become a driver and acquire a private automobile as a developmental process of a person that enters the world of
[Figure 5.1: The multilevel PPCT model for a young motorist. The diagram places the MOTORIST at the center of nested systems (microsystem, mesosystem, exosystem, macrosystem), surrounded by elements such as home/family, school, circle of friends, parental wealth, work, insurance, taxation, social institutions, social values, culture, ideologies, government and community policies, political institutions, spatial organization, employment opportunities, other institutions and global influences.]
At the center of PPCT is the motorist. The word motorist, instead of driver or automobilist, is selected to capture the intrinsic definition of a person who travels using a machine with relative independence. Figure 5.1 is a pictorial representation of the ecology of the developing motorist (described in more detail in Goulias, 2009). The person is characterized by age, gender, level of maturity, physical abilities and other attributes that are the result of personal development and interaction with the environment. This person develops in relation to her or his family members, schoolmates (and possibly co-workers), and friends. This ecology is composed of the settings in which this person lives, studies, works, and interacts. The nature of these settings is physical/material but also symbolic. To capture the proximal influences of these settings on individual development, Bronfenbrenner (2005) developed the idea of microsystems to depict the innermost region of the interaction between a person and the environment.
These include genetic transmission, physical and physiological states, interpersonal interactions, relations, attitudes and the characteristics of the immediate physical environment. The key emphasis here is on the interactions between person and environment, meaning that persons of the same age and gender may display different developmental outcomes of automobility because there are differences in the cars available at home, parents' dispositions toward teenage automobility, schoolmates driving to school, schools allowing parking on their grounds and workplace locations. At the center of this enterprise is the physical and cognitive ability to drive a car, which is itself the outcome of past interactions with parents, such as taking driving lessons with parents at earlier ages, as well as of biological growth. In addition, teenagers interact with others in microsystems of peer groups that are indicated in Figure 5.1 in a symbolic way as school and friends.

Bronfenbrenner's (2005) next level is the mesosystem, which describes how the different components of a person's microsystem work together for the developing motorist. For example, with the newly found freedom of movement, the new motorist on the way to school picks up a schoolmate with the blessing of the parents. In the same way, the motorist can go out in the evening with friends. All of this improves the person's skill, power in the social network, confidence of the parents and position of this empowered person in the different social networks in which he or she resides. The intersection, relationships, and interaction of two or more of these microsystems are referred to as mesosystems in this model. The examples above are of encouragement for automobility, but there could also be barriers and/or inhibitors, such as no driving at night, no parking at school, no friends allowed to ride with a teenage driver and driving limited to work and school purposes only. Of importance in this process is the consideration of change in all of these interactions through a variety of other phenomena including, but not limited to, accidents, the entry into automobility of other teenagers and a variety of events that change family composition, school settings and friendships.

Figure 5.1 shows yet another systemic level, the exosystem, which includes the other persons and places with which our motorist may not interact directly, but that play a substantial role in the developmental process of driving. These can be the parents' employment and wealth as well as local taxation and insurance regulations, labor structure for employment opportunities, the spatial organization of places, workplace and school support for automobility, and the community's policies about driving. The all-encompassing envelope that is more remote and yet exerts substantial influence on the motorist is the macrosystem. It includes society's values, rights and responsibilities, culture, and ideologies in favor of or opposed to automobility (for example, positions about climate change, environmental justice, equity and freedom). It also includes socio-political institutions defining the overall setting of policy and the government institutions that develop and implement the rules and regulations (for example, driving after the age of 15 in the USA and 18 in many European countries).
Global influences and other institutions include major historical circumstances and events as well as other organizations that influence automobility (for example, the automotive industry, and spatial development practices such as urbanization and urban sprawl). This example of process-person-context in Bronfenbrenner's terminology may be completed by adding time to make it a developmental model. Recall from the life course discussion above that time is individual (age), generational (cohort), and historical (period).
In this example of the developing motorist, age at the first experience of driving may be 14–15 years, with institutional recognition of driving rights at 16 years (in the USA) and 18 years in many other parts of the world. This construct progresses through later years to motorist maturation. Considering age, period, and cohort effects in analyzing human development allows one to account for common experiences shared by a group of persons in an age group and for the differences in experience among generations due to the specific periods in which they lived. Bronfenbrenner (2005), however, emphasizes the need for studying the three temporal dimensions in a system (called the chronosystem) 'to identify the impact of prior life events and experiences, singly or sequentially on subsequent development' (Bronfenbrenner, 2005, p. 83). These events and experiences may originate in the external environment or within the person, and they alter the relationships of the person with the environment. The origin of the events can be at any of the systemic levels in Figure 5.1. Bronfenbrenner (2005) also distinguishes the long-term impacts of life events from the impact of sequences of events/experiences of a person. He also advocates the need to avoid analyzing individual development paths as self-standing and to account for changes in the environment (in all its rich multilevel structure) in the form of context undergoing change via multiple processes surrounding and interacting with the individual.

There are many different types of events that can alter the individual and the context. These include physiological alterations (for example, hormonal changes that alter physical and social selves), transitions (for example, age-graded movement into and out of social roles such as school grades, or loss of a parent) and turning points (for example, events that cause reorientation of priorities and lasting alterations of a person's developmental trajectory). All types of events create barriers or offer new opportunities. They may also lead to changes in roles, self-concepts, lifestyles, worldviews, and dispositions towards other people (McLeod and Almazan, 2003; Rönkä et al., 2003). They also differ in their impact depending on their timing and duration, and on the socio-economic characteristics of the individual such as sex, ethnicity and social class. Examples of events include, but are not limited to, marriage, divorce, building a family and birth of children, entering a new intimate relationship, separation, entering school, choosing an occupation, engaging in non-occupational studies, graduation, continuing studies, dropping out of school, job seeking, job loss, retirement, starting a first job, starting a private enterprise/practice, declaring bankruptcy, moving to another community, leaving home, travelling somewhere far away, moving temporarily to another place, entering military or community service, loss due to death of a close family member, getting a new apartment, getting a vacation home, changes in leisure activities and hobbies, drug use and abuse, committing crime(s), religious engagement, psychological crises, own illness, illness of a close family member and accidents. Each of these events can be considered a factor of developmental continuity (for example, a teenage motorist continues to drive after moving to a new place) but also of discontinuity (for example, leaving the nest and losing access to the parental car).
Context plays a critical role because a move from a place where teenagers drive (for example, the USA) to a place where they are not allowed to drive will inhibit driving. Leaving the nest and moving to a place that does not offer alternative options may trigger the purchase of a car (with or without parental support). The PPCT model points out that location in different bioecologies renders the same experience, such as leaving home, as leading to different outcomes and choices on the part of the individual (the motorist in this example).
The model is less clear, however, about the dynamics of the process of negotiating with a changed context and the interactions with others in the microsystems. The discussion so far addresses three major ideas (personal development, context and time) in travel behavior from the viewpoint of Parsonian functionalism (Parsons, 1991). In fact, Giele (2009) provides a direct mapping of Parsons' latent pattern maintenance, integration, goal attainment and adaptation to the life course framework and other theoretical foundations, as themes that she extracts using life stories. Life course theory provides an overall conceptual framework and a more detailed list of aspects that require attention in the modeling and simulation of continuity and change. The zone of proximal development and the theoretical machinery of Vygotsky used by Bronfenbrenner's ecological model of development (Bronfenbrenner, 2005), and its later version as a structural model, are evidence of a more complex world of interactions that can lead to the development of behaviorally realistic models as in Spence and Lee (2003).

A rich theoretical framework that is able to meet the life course theory requirements and satisfy its principles, while at the same time offering a fundamental ecological model of behavior, is Bourdieu's approach (Bourdieu, 1984, 1998, 2005a, 2005b). A more appropriate word for this theoretical work is lens, because Bourdieu's analytical approach allows the identification of hidden structures underlying facts and complex situations, offering unique insights about the context and dynamics of human action. From a philosophy of science viewpoint, Bourdieu's approach is relational because the roles and actions of agents are considered in relation to other agents in the complex overlapping of different fields of action. From a philosophy of action viewpoint, Bourdieu's definition of agents' behavioral mechanisms is dispositional, 'which notes the potentialities inscribed in the body of agents and in the structure of the situations where they act or, more precisely, in the relations between them' (Bourdieu, 1998, p. vii). Examples of dispositions are lifestyles, tastes, posture, attitudes, and many other related concepts forming a system that determines and guides action. This is the core of Bourdieu's theory of action, which he named habitus. This is defined as a system of 'dispositions, that is of permanent manners of being, seeing, acting and thinking, or a system of long-lasting (rather than permanent) schemes or schemata or structures of perception, conception, and action' (Bourdieu, 2005a, p. 43, original emphasis). Habitus includes rituals and traditions, and economic sense in choices, as well as the practical sense characterized by practical schemes of perception and appreciation. All of this constitutes the 'feel for the game' (Bourdieu, 1998) with which existing opportunity structures are shaped and they, in turn, shape individual action. Swartz (1997, p. 104) notes that 'habitus, then, represents a sort of deep-structuring cultural matrix that generates self-fulfilling prophecies according to different class opportunities'. Habitus defined in this way is a structure of the practical skills and dispositions needed to negotiate life within different fields, and it provides the schemata for the choices of individuals without reducing them to mere behavioral rules. It is a dynamic concept because it is constantly changed by these negotiations and choices.
Fields of power are structured spaces organized around types of capital. The fields are spaces of relations of force among agents that possess power and dominate the specific field based on their position. They may be considered networks of objective positions and relationships. These positions are defined based on objective (as opposed to subjective) criteria that characterize the amount and distribution of power. Individuals and groups enhance their positions in these networks by building and using resources that are cultural, social and economic.
These resources constitute the three basic types of capital, namely, economic capital (for example, wealth, income, ownership of property and other financially productive situations), social capital (for example, social connections, tastes, knowledge and the ability to use these for advancement socially and professionally), and cultural capital (for example, verbal facility, cultural awareness and knowledge, aesthetic preferences, information about education, and credentials). In a field of power, relationships include antagonism, competition, cooperation, hostility and dominance. A firm may also constitute a power field that is relatively autonomous (for example, a large vertically integrated firm) and composed of agents. For the agents, a field is a structure of probabilities of rewards, gains, profits, and penalties, as well as a space that allows some degree of indeterminacy (Bourdieu, 2005b, p. 130). Bourdieu also defines symbolic capital as any 'ordinary property (for example, physical strength, wealth, warlike valor) which, perceived by social agents endowed with the categories of perception and appreciation permitting them to perceive, know and recognize it, becomes symbolically efficient, like a veritable magical power' (Bourdieu, 1998, p. 102).

Habitus can now be described as a system of actions and schemes leading to practices. It is also an expression of unconscious investment in power stakes. It is a 'practice-unifying' and 'practice-generating' principle in the form of class habitus as an internalized form of class condition (Bourdieu, 1984). Analysis in this context requires the identification of objective social classes (for example, the middle class) of agents that share conditionings, dispositions and practices. Using these ideas, it is possible to redefine context and agent behavior for the hypothetical motorist used as an example to illustrate Bronfenbrenner's bioecological model with added detail, and also to chart a course of inquiry to unravel the different layers of context. In travel behavior and in choice models we have not yet accomplished this. However, the examples below show how small steps have been made towards the goal of defining and describing context, and how elicitation methods are being developed to help understand and quantify the influence of context on choices.
4 THREE APPROACHES TO INCORPORATE CONTEXT
A review of the literature suggests that there is no implementation of a comprehensive bioecological context model designed to be used in transportation-related choice models. Context is recognized, however, as very important in early applications of stated preference and contingent valuation (Oppewal and Timmermans, 1991) as well as in reviews of (revealed preference) discrete choice approaches (Swait et al., 2002; Ben-Akiva et al., 2012). In this chapter, approaches to account for context are viewed in three relatively distinct ways. The first, and most likely the simplest, involves delimiting the choice (space) set that decision makers face. The second approach is the consideration of social interactions in one or more choice facets. The third is an attempt to extend (or radically reconceptualize) decision-making paradigms to integrate context in the choice act(s) depicted by model(s). The major thrust of context analysis and modeling appears to be in choice set identification, and in implicit as well as explicit models of choice set formation, availability and consideration (see the review by Pagliara and Timmermans, 2009, and the seminal paper by Manski, 1977).
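As a compact point of reference for the discussion that follows, Manski's (1977) formulation treats the consideration set as latent and writes the unconditional probability that individual n chooses alternative i as

\[
P_n(i) \;=\; \sum_{C \in \mathcal{G}_n} P_n(i \mid C)\, \pi_n(C),
\]

where \( \mathcal{G}_n \) is the collection of all non-empty subsets of the universal set of alternatives \( M_n \), \( \pi_n(C) \) is the probability that \( C \) is the true choice set, and \( P_n(i \mid C) \) is a conventional choice model (for example, a logit) restricted to \( C \). Because \( \mathcal{G}_n \) contains \( 2^{|M_n|} - 1 \) subsets, direct enumeration quickly becomes intractable, and the approaches reviewed below can be read as different strategies for approximating or restricting \( \pi_n(C) \).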
Undoubtedly, one of the most important frameworks and inspirational constructs for representing context in discrete choice behavior is Hägerstrand's (1970) time geography framework, which considers the temporal and spatial dimensions of behavior and a relatively comprehensive set of constraints on the movement of people. Based on this time geography framework, a variety of conceptual models have been developed (for example, Miller, 1991; Kwan, 1998; Weber and Kwan, 2002; Kim and Kwan, 2003), and the time-space prism construct has been computationally operationalized (for example, Kwan and Hong, 1998; Pendyala et al., 2002; Lee et al., 2010; Auld and Mohammadian, 2011a; Yoon and Goulias, 2010).

In parallel, in a spatial setting where the alternatives are many, Thill (1992) discusses choice set misspecification and the consequences of an ill-defined model, and points out that the elimination of regression coefficient biases requires the identification of the right choice set. It is well recognized that choice sets can be very large, and methods to deal with this large number of alternatives include deterministic approaches imposing thresholds of distance and time (for example, Black, 1984; Termansen et al., 2004; Scott, 2006), matching observed choices by similar sample members (Miller and O'Kelly, 1983), or combinations of activity type/trip purpose and distance to delimit an area with fewer options (Bowman and Bradley, 2006). However, following Manski's (1977) two-stage method for choice set formation and choice model estimation, more sophisticated and rigorous approaches include a two-stage spatial context setting method (Zheng and Guo, 2008), the joint estimation of choice set composition and alternative selection using dominance criteria as in Cascetta et al. (2007), and a process that implicitly considers choice set formation as in Bierlaire et al. (2009).

A related approach stems directly from Hägerstrand's ideas. Kwan (1998) developed a method for measuring point-based accessibility using the feasible/reachable opportunity set that is found using the time-space prism. Weber and Kwan (2002) then brought travel time variation and facility (business establishment) opening hours into the measurement method to account for the dynamics of congestion levels and the temporal availability of activity opportunities. Subsequently, Kim and Kwan (2003) also included the idea of a 'time window' during which each facility can be enjoyed, and the travelling environment, which is conditioned by the transportation network's operational characteristics (that is, one-way streets, turn prohibitions, congestion and segment-specific travel speeds). This idea was taken one step further by Chen et al. (2011) to develop automobile-based dynamic urban landscapes, and by Lei et al. (2012) to develop transit schedule-sensitive dynamic urban landscapes in a mega-region, illustrating that methods of this type are practical, feasible and readily available. In fact, Yoon et al. (2012) created a method that combines time windows of opportunities with space footprints and a detailed inventory of business establishments to enumerate feasible choice sets for every individual simulated in a large-scale microsimulation model system.
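To make the prism-based feasibility test concrete, the following is a minimal Python sketch of enumerating the opportunities that fit inside a time-space prism, clipped by facility opening hours in the spirit of the 'time window' idea above. All place names, travel times and thresholds are invented for illustration and are not taken from any of the systems cited.

from dataclasses import dataclass

# Illustrative travel times in hours between labeled places; in a real
# application these would come from network skims, possibly varying by
# time of day to reflect congestion (as in the dynamic landscapes above).
TRAVEL_TIME = {
    ("home", "mall"): 0.4, ("mall", "work"): 0.5,
    ("home", "park"): 0.7, ("park", "work"): 0.9,
}

@dataclass
class Opportunity:
    name: str
    open_time: float     # opening hour, e.g. 9.0 means 09:00
    close_time: float    # closing hour
    min_duration: float  # minimum useful activity duration, in hours

def feasible_set(origin, depart_after, destination, arrive_by, opportunities):
    """Opportunities that fit inside the time-space prism anchored at
    (origin, depart_after) and (destination, arrive_by)."""
    feasible = []
    for opp in opportunities:
        earliest_start = depart_after + TRAVEL_TIME[(origin, opp.name)]
        latest_end = arrive_by - TRAVEL_TIME[(opp.name, destination)]
        # Intersect the prism with the facility's opening-hours time window.
        start = max(earliest_start, opp.open_time)
        end = min(latest_end, opp.close_time)
        if end - start >= opp.min_duration:
            feasible.append(opp.name)
    return feasible

stops = [Opportunity("mall", 9.0, 21.0, 0.5), Opportunity("park", 6.0, 22.0, 0.5)]
print(feasible_set("home", 8.0, "work", 10.0, stops))  # -> ['mall']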
In this way, the choice set (for example, of a destination choice model for a meal with family) is the outcome of a myriad of conditions that encompass the residential, job, and school locations of an individual and his or her household, progress through the simulation of an entire day, and finally culminate in the creation of a spatial footprint. In parallel, the environment in which a person will act to select a destination shifts by time of day with the movement of everybody else, creating congested and uncongested locations. The representation of these phenomena has proven to be feasible even when it is desired to develop a model that updates network travel times on a minute-by-minute basis, as in SimTRAVEL, the Simulator of Transport, Routes, Activities, Vehicles, Emissions, and Land developed by Pendyala et al. (2011).
Such time-sensitive choice set simulation and identification models implicitly incorporate the contexts reviewed in the preceding sections of this chapter. However, even such methods are unable to capture the impact of changing values, regulations and cultural traits because these are not explicitly modeled. This, in turn, undermines the ability to transfer models across space, time and social settings.

The second major approach to context modeling focuses on the social context and the interactions involved in such a context. This is not necessarily a new area of research for travel behavior modeling, and many past contributions have recognized the influence of social networks in travel-related decision-making by including explanatory variables capturing household structure and composition in regression models. This approach is generally not sufficient to capture the direct influence of one person's actions on another person's actions, and a variety of advanced methods are being explored. Reflecting the importance of this topic area, the journal Transportation Research dedicated an entire issue to this topic in 2011, the Journal of Choice Modelling published numerous papers on this topic in 2011 and 2012, and the journal Transportation Letters has an entire issue dedicated to papers from a workshop that addressed social interactions. It is not possible to review in detail all contributions on this topic, but a few key directions of research are identified within this chapter.

People make choices in a social context which changes in space and time. People's choices are, with few exceptions, influenced by what others do, what others tell them that they do and what they perceive others to do. Interactions among people facilitated by social networks are therefore important to understanding context in choice making. Although there is widespread recognition of the importance of social networks and interactions in human decision-making and choice behaviors (for example, Páez and Scott, 2007; Arentze and Timmermans, 2008; Axhausen, 2008; Carrasco et al., 2008; Arentze et al., 2012a), work in this domain remains in its infancy. A key challenge in adequately capturing the social context is the difficulty associated with collecting information about people's social networks and the strength of the multitude of associations in a network. Axhausen (2008) articulates the challenges associated with collecting data on social networks in the context of understanding activity-travel choices. The number of social contacts that people have may run into the hundreds, and possibly thousands, and it is burdensome to expect survey respondents to list all social contacts and the strengths of their association with these contacts. Although Axhausen (2008) limits the intent of collecting such data to identifying the composition of the group participating in an activity or travel episode, the beneficiaries of an action or choice, the group dynamics that may have preceded the exercise of a choice and the distribution of costs across participants in an activity or travel episode, it is possible to envision an even wider use of social network data so as to better understand the flow of information across members of a network that in turn affects the choices that are subsequently made.
When neighbors talk to each other about the cars they drive, the shopping malls they visit, the recreational parks in which they play, the modes of transportation they use and the restaurants at which they eat, information is being exchanged; depending on the level of trust or closeness across the members of the network exchanging the information, choices may be influenced.
In addition to active social interactions, which involve a conscious effort on the part of one or more members of a social circle to pass on information to or obtain information from one or more other members of the social circle, it is entirely possible that choice behaviors are influenced through passive social interactions that occur in the time-space continuum. When people passively observe what their neighbors do (for example, the cars they drive, the amount of bicycling and walking in which they engage, the degree to which yards are maintained), they may be subliminally influenced with respect to the choices they exercise as well. Passive interactions, and the associated influences, are harder to measure, as it would be necessary to ask people to report their subjective beliefs on the extent to which they think they are influenced by the actions of others. This is in contrast to active interactions data collection efforts, where people can be asked to list or quantify their social circle and the communications that actually took place (although it may be difficult for respondents to accurately identify all members of their social network and the full extent of the communications in which they engage). Data collection efforts in the social network arena often focus on the individual, adopting an ego-centric approach that views individual social networks as the context in which data about the spatial distribution and generation of activity-travel episodes and the use of information and communications technology (ICT) may be obtained (Carrasco et al., 2008).

Modeling the formation and extent of social networks continues to be a challenge, largely due to the limited availability of data describing people's social networks. Arentze and Timmermans (2008) propose an agent-based simulation modeling framework to better estimate social activity-travel demand, but their work assumed that the social network for a synthetic population in the microsimulation is already given or known. More recently, Arentze et al. (2012b) propose an approach for modeling social networks in geographic space. Their paper offers an approach to construct a social network for every person in a synthetic population based on a friendship-formation model. The model considers the degree of similarity in socio-economic and other characteristics, geographic proximity, and the baseline preferences and opportunities for friendship creation in creating the social network. The model is based on a microeconomic utility formulation where the utility of forming a friendship connection must exceed a minimum threshold utility value for both individuals in the relationship. Skyrms and Pemantle (2000) offer a more dynamic model of social network formation, noting that social networks form over time with the strengths of associations reinforced or dissipated depending on the nature of the interaction among the agents comprising a relationship. Their model employs principles of game theory in which agents are assumed to play repeated games, learn from the outcomes of the games, and evolve their relationships and actions as a result of the learning process. Toivonen et al. (2006) present a model of social networks where communities are formed through a combination of random attachment as well as implicit preferential attachment. These processes facilitate the growth or evolution of a social network, and the models are capable of efficiently producing very large networks that can be used to study socio-dynamic phenomena.
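The reciprocal threshold idea in the friendship-formation model just described can be sketched as follows; the similarity and distance terms, the coefficients and the threshold are invented for illustration, and the utility here happens to be symmetric even though, in general, the two directions of a tie can differ (for example, through person-specific baseline preferences for friendship).

import math

def friendship_utility(a, b, beta_sim=1.0, beta_dist=0.1):
    """Illustrative pairwise utility of a friendship link: higher for
    socio-economically similar and geographically close persons. The
    coefficients are invented placeholders, not estimated values."""
    similarity = sum(a["traits"][k] == b["traits"][k] for k in a["traits"])
    distance = math.dist(a["xy"], b["xy"])
    return beta_sim * similarity - beta_dist * distance

def form_links(population, threshold=1.5):
    """Create a link only if the utility clears the threshold for BOTH
    individuals, mirroring the reciprocal formulation described above."""
    links = []
    for i, a in enumerate(population):
        for b in population[i + 1:]:
            if (friendship_utility(a, b) > threshold and
                    friendship_utility(b, a) > threshold):
                links.append((a["id"], b["id"]))
    return links

population = [
    {"id": "p1", "xy": (0, 0), "traits": {"age_group": "young", "edu": "uni"}},
    {"id": "p2", "xy": (1, 1), "traits": {"age_group": "young", "edu": "uni"}},
    {"id": "p3", "xy": (9, 9), "traits": {"age_group": "old", "edu": "school"}},
]
print(form_links(population))  # -> [('p1', 'p2')]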
Until the modeling of social networks matures, however, and yields operational platforms that can be used to simulate social interactions in their entirety across an entire synthetic population, models of choice need to recognize that the social context in which behaviors manifest themselves is largely unobserved. In the absence of explicit knowledge and observation of the social interactions in time and space that affect choice behaviors, considerable attention has been devoted to the modeling of choice phenomena while accounting for the endogeneity arising from unobserved social and spatial dependence effects.
Walker et al. (2011) incorporate a peer group social influence variable into a mode choice model with a view to capturing social interaction effects. However, they note that the introduction of such variables results in an endogeneity problem because the very (unobserved) factors that influence peers also influence the subject. To account for this, they present a methodology that corrects for the endogeneity arising from the introduction of social peer influence variables in choice models. Operational models accounting for unobserved social and spatial context effects have been developed in the context of modeling a range of choice behaviors. For example, Paleti et al. (2013) present a model of household vehicle type choice that accounts for spatial dependency effects, Ferdous et al. (2011) present a model of non-motorized transport mode use that accounts for family, spatial, and social context, and Sidharthan et al. (2011) present a model of school mode choice of children while accounting for spatial and social interaction effects. These model systems introduce correlations due to unobserved spatial and social interaction effects across decision-making units, with the strength of the interaction across agents often dependent on proximity.

Although progress has been limited in modeling the social network contexts in which choices are made, there have been notable developments in the modeling and understanding of interactions within households among family or household members. The immediate (closest) social network that often (but not always, as the review in the first part of this chapter attests) defines the context and constraints governing individual choice behavior is that of the household unit. Family members share resources, undertake activities together, allocate activities among one another, negotiate with one another when making choices that affect multiple members of the household, and sometimes depend on one another for mobility (for example, in the case of children who are dependent on adults for transportation). There has been considerable progress in the development of models of household interactions and group decision-making (for example, Bradley and Vovsha, 2005; Goulias and Kim, 2005; Srinivasan and Bhat, 2005; Goulias and Henson, 2006; Yoon and Goulias, 2010). However, even in this arena, there have been numerous challenges in the operational implementation of models capable of capturing the full range of interactions that may take place among household members. In a microsimulation model of activity-travel choice behavior recently developed for the Southern California Association of Governments (SCAG), the planning agency for the Greater Los Angeles Metropolitan Area in California, Goulias et al. (2011) have implemented a multiple discrete-continuous extreme value (MDCEV) modeling approach to account for intra-household interactions. When there are multiple household members and multiple activity-travel purposes that may be pursued, the possible number of combinations of activity engagement alternatives for the household can quickly explode. This renders the use of traditional single discrete choice models (where only one alternative is selected from a choice set) untenable.
The MDCEV model is capable of accommodating multiple choices simultaneously, wherein an individual may choose multiple alternatives from a choice set and allocate a continuous resource (such as time, mileage or money) to each alternative chosen (Bhat, 2008). In the household interactions context, an individual may (in the course of a day) choose an array of activity purposes in which to engage, and none, one or more than one other individual in the household with whom to pursue each activity chosen. For each activity episode, the individual allocates time, thus establishing the duration of the activity-travel event.
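For reference, one commonly used form of the MDCEV utility function (the notation below follows Bhat, 2008, although it varies across papers) is

\[
U(\mathbf{x}) \;=\; \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k}\, \psi_k \left[ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right],
\qquad \psi_k = \exp(\beta' z_k + \varepsilon_k),
\]

maximized subject to a budget constraint such as \( \sum_k x_k = T \), where \( x_k \ge 0 \) is the continuous resource (for example, time) allocated to alternative \( k \), \( \psi_k \) is its baseline marginal utility, \( \gamma_k \) permits corner solutions (alternatives not chosen at all) and governs satiation, and \( \alpha_k \) controls the rate of diminishing marginal utility. With independent extreme value error terms \( \varepsilon_k \), the probability of any observed combination of chosen alternatives and allocations has a closed form, which is what makes the approach practical in large microsimulations such as the SCAG application mentioned above.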
It should be noted that when multiple individuals are selected to participate in an activity, all of their activity-travel schedules have to be 'locked down' during that period, thus creating a more constrained choice context for activity engagement for all household members affected.

The social context underlying choice behavior has taken on added complexity in an era of ubiquitous mobile communications and computing. Pervasive ICT has made it possible for people to create social networks and interact in ways that would have been unimaginable prior to the advent of mobile communications technology (for example, Mokhtarian et al., 2006; Kenyon and Lyons, 2007; Krizek and Johnson, 2007; Wang and Law, 2007). A social network is no longer defined by geographical proximity or colleagues at work and school; a social network may now be virtual, with an individual's network extending across the globe. Interactions with this extensive and broad virtual social network may create opportunities that would otherwise not be available. Recent evidence in the USA suggests that teenagers are delaying or forgoing the acquisition of a driver's license at least in part because of interactions in virtual space that reduce the need for interactions in real space (Sivak and Schoettle, 2012). Household members and friends located at different points in time and space can use mobile communications technology to instantaneously arrange or cancel a meeting or social gathering. The spontaneity in activity-travel scheduling that technology allows, the ability to stay connected 24/7 from anywhere in the world, and the instantaneous access to information through social media and the Internet create a technology-driven context in which people now make choices. The ability to bank, shop, work, take classes, watch movies and play games online opens up new opportunities and releases traditional constraints that may have defined the context in which choices were made. Some technological enhancements, such as the advent of electric vehicles that have a shorter driving range or the ability to charge travelers variable tolls on a per-mile basis, may impose new costs or constraints that once again define a technology-driven context in which choices are made. The ability to account for technology penetration and adoption in the definition of choice context is important to understanding and modeling human choices in the modern era (Pendyala and Bhat, 2012).

A third key direction for incorporating context is one which recognizes that several facets of a context may remain unobserved, often leading to the development of rule-based behavioral paradigms that are built upon qualitative surveys and data (for example, Pendyala et al., 1998; Arentze et al., 2000). The choice set formation problem that was alluded to earlier in this chapter is a classic case of an unobserved contextual situation that influences the modeling of choice. When people report their choice behavior in a survey, data describing the choice set itself are often never collected. In a typical mode choice model development effort, survey data provide information about the chosen mode but no information about the non-chosen modes.
Not only does this create issues in the measurement of the attributes of the non-chosen alternatives (which have to be derived from secondary data sources, and may be subject to considerable measurement error), but it also yields no information about the composition of the choice set. For an alternative to be included in a choice set, an individual must be aware of the alternative and must actually consider it in the particular choice context of interest. For example, an individual may be completely unaware of a transit alternative, in which case it is impossible to consider it.
On the other hand, an individual may be aware of a transit alternative but simply not consider it in a particular mode choice context. In either of these two cases, the transit alternative would not be included in the choice set, which includes only those alternatives that were traded off against one another. Only when the individual is both aware of the transit alternative and has considered it would the alternative be included in the choice set. Surveys that purport to collect behavioral data to be used for choice modeling should therefore include questions that capture the awareness and consideration of alternatives (for example, Outwater et al., 2011). A number of studies have adopted two-step approaches to account for latency in choice set composition, with the first step dedicated to forming the choice set for each choice-maker and the second step modeling the choice itself given the consideration choice set (Castro et al., 2011).

The concept of latency may arise in a number of other dimensions as well. In models of multidimensional choice, where relationships across a multitude of choice variables are being investigated (for example, Pinjari et al., 2011), information on the choice process underlying the relationships is almost always lacking in surveys. In a joint model of residential location choice and worker location choice, the question may arise as to whether workers explore possible work locations based on their residential location or households explore residential locations based on the work locations of the workers in the household. In all likelihood, there are multiple segments in the population, with some households choosing residential location based on worker workplaces and others in which workers identify work locations based on the residential location of the household. However, it is not known from traditional survey data collection efforts which households fall into the different segments. Using a latent segmentation approach, Waddell et al. (2007) develop a joint model of residential and workplace location choice that accounts for the unobserved nature of the underlying choice phenomenon of interest. What is notable in this regard is that the underlying choice process (the structural relationship among multiple choice variables) is likely influenced by the context in which the choice process unfolds. In other words, it is likely that heterogeneity in behavioral choice-making may, at least in part, be a manifestation of heterogeneity in the choice contexts in which the choices are made. Reflecting heterogeneity in choice contexts in models of choice behavior continues to be a challenge across a wide variety of disciplinary domains, largely due to the paucity of contextual data typically collected in surveys.

Another dimension that often remains unobserved in choice modeling contexts is the cultural and attitudinal perspective of the choice makers. Most traditional surveys do not collect detailed information about the attitudes, perceptions, values, and cultural constructs within which people operate. Rather, these aspects of a person remain unobserved and are often relegated to the error term as unobserved factors that affect choice utilities. In this simple approach, the explicit recognition that cultural constructs and attitudes define a context in which a person is making choices is absent.
The cultural and attitudinal context has been shown to be critical in shaping location choices and activity-travel choices (for example, Kitamura et al., 1997a; Kuppam et al., 1999; Gatersleben and Appleton, 2007) as well as the consumption of goods (Moschis, 2007). Culture is defined by societal norms and expectations, communication protocols, and household roles and responsibilities, all of which define a context in which people make choices. These norms and expectations may change over time, bringing about cultural change or shifts that, in turn, lead to dynamics in choice behaviors (Baltes, 1987).
Reflecting the dynamics in context and the interplay (feedback) between choice and context remains a fruitful area for inquiry. The use of structural equation modeling techniques has made it possible to represent latent constructs as a function of measured variables (Fujii and Kitamura, 2000; Bagley and Mokhtarian, 2002; Golob, 2003; Deutsch et al., 2011). Such model systems are also able to better capture the influence of qualitative attributes or constructs on choice behaviors (for example, the influence of ride quality and comfort on the choice of public transit as a mode of transport), as described in Walker (2001). Understanding the influence of such variables, and their importance (as represented by attitudes and values), makes it possible to better connect context with choice. In a recent paper, Ben-Akiva et al. (2012) articulate a vision for the connection between process and context, recognizing the importance of latent constructs in realizing such a vision. They identify family, friends, and markets as three critical constructs that define the context for exercising extended choice models that purport to capture the behavioral processes underlying decision-making. Using three examples, namely subjective well-being (Abou-Zeid and Ben-Akiva, 2010), social interactions and transportation choices, and dynamic plans and actions (Ben-Akiva, 2010), they show how behavioral processes resulting in choices can be connected to information on context.

Context may also be interpreted or defined as a series of rules and rule-based heuristics that govern choice behavior. Computational process models (for example, Pendyala et al., 1998; Arentze et al., 2000) generally fall into the category of characterizing context by a set of rules and heuristics. Rules are derived or deduced from qualitative surveys, in-depth focus groups, logic or observations of behavior as measured in traditional surveys. Rule-based heuristics may take a variety of forms, including conditional if-then statements, logic checks, behavioral adjustment protocols and communication mechanisms. Recent work exploring the relationships between activity-travel demand and transportation network performance (Pendyala et al., 2011) underscores the importance of constructing rule-based behavioral processes capable of representing the choice mechanisms at work. When an individual arrives late at a destination owing to congestion on the network, how does the individual adjust his or her activity schedule subsequent to the late arrival? Are activity durations shortened, some activities completely eliminated from the schedule, other activities reassigned to other household members, or is the entire activity schedule simply shifted in time along the continuous time axis? Any number of rules may be incorporated to model the adjustments and conflict resolutions that take place in the crafting of a human activity-travel schedule (for example, Roorda and Miller, 2005). The identification of these rules, which may be considered as defining the context in which choices are made, remains a subject of inquiry; in-depth qualitative surveys and computer-aided process questionnaires such as CHASE and ADAPTS (Doherty and Miller, 2000; Arentze and Timmermans, 2003; Heinz, 2003; Auld and Mohammadian, 2011b) offer the ability to extract rule-based contextual considerations underlying choice behavior.
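As a purely illustrative sketch of such rule-based adjustment (the activities, thresholds and the three rules below are invented, not taken from any of the systems cited), a late arrival can be propagated through the remaining activities of a day as follows:

def adjust_schedule(schedule, delay, min_duration=15):
    """Illustrative rule-based adjustment of the remaining daily schedule
    after a late arrival of `delay` minutes; times are minutes after midnight."""
    END_OF_DAY = 24 * 60
    adjusted = []
    for act in schedule:
        start = act["start"] + delay          # rule 1: shift remaining activities by the delay
        duration = act["duration"]
        if act["flexible"]:
            overrun = max(0, start + duration - END_OF_DAY)
            duration -= overrun               # rule 2: shorten flexible activities first
        if duration >= min_duration:          # rule 3: drop activities that become too short
            adjusted.append({**act, "start": start, "duration": duration})
    return adjusted

remaining_day = [
    {"name": "work", "start": 9 * 60, "duration": 480, "flexible": False},
    {"name": "gym", "start": 18 * 60, "duration": 60, "flexible": True},
    {"name": "shopping", "start": 22 * 60 + 30, "duration": 60, "flexible": True},
]
for act in adjust_schedule(remaining_day, delay=45):
    print(act["name"], act["start"], act["duration"])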
5 ELICITATION (DATA COLLECTION) METHODS
A major factor that has limited the ability to adequately consider context in the modeling of choice is the lack of data that sufficiently describes the spatial, temporal, cultural, attitudinal, built environment and social context in which decisions are being made.
While surveys have traditionally collected information about socio-economic characteristics and the choices that people actually make in a variety of domains (for example, choice of transportation, choice of residential and work location, choice of food groups to consume, choice of brand for any number of household goods and choice of vehicle), they have provided little information about the contextual variables that describe the setting and environment in which the choice decisions were made. For example, in the public health domain, there are several surveys (for example, the National Health and Nutrition Examination Survey and the National Survey of Family Growth in the USA) that collect very detailed information about socio-economic characteristics and transitions (including fertility and mortality), what people eat, the amount of exercise that people undertake, and health indicators such as blood pressure and body mass index (Berrigan and Troiano, 2002). However, these surveys offer little to no information about the natural and built environment in which the household resides, thus offering no spatial context in which these health outcomes and food consumption patterns are being observed, leading some researchers to merge data across disparate sources (for example, Schenker and Raghunathan, 2007). The health surveys also provide limited information about cultural and attitudinal traits that may affect how people view the need to eat healthily and lead healthy lifestyles.

In the travel survey domain, it is generally the case that survey data sets include detailed information about socio-economic characteristics and travel choices over the course of a day (as part of a travel diary survey). However, these surveys have little information about the real or perceived built environment, the awareness or consideration of alternatives and the attributes of non-chosen alternatives. As a result, modelers of choice behavior have to create ad hoc rules and assumptions regarding the composition of the choice set, append variables describing the built environment and non-chosen alternatives from secondary data sources that often provide the information at a level of aggregation that is less than desirable, and resort to simplifications when specifying choice models. In addition, travel surveys contain no information about the behavioral process that leads to the choice decisions in question, thus offering no ability to decipher how the context influenced the choice process. Traditional surveys also offer virtually no information about the attitudes, values, perceptions and experiences that constitute an important part of the context in which people exercise choices (Clifton and Handy, 2003).

The recognition that there is a paucity of contextual data has motivated the collection of such information through a variety of experiments. It should be recognized that, at least in the transportation domain, there are examples of traditional travel surveys increasingly attempting to capture contextual information. The recent National Household Travel Survey in the USA included a series of questions on attitudes and perceptions towards walking and bicycling, thus offering a rich perspective on the attitudinal context underlying the use of non-polluting and healthy non-motorized modes of transportation (Seraj et al., 2012).
The survey also included questions about the importance of a number of factors that may have been considered when choosing a residential location, thus offering perspectives on the contextual considerations that motivated the choice of residence (Kortum et al., 2012). In yet another study, focusing on factors motivating transit usage, information is being collected explicitly about people's awareness and consideration of alternative modes of transit, thus offering rich contextual information for choice set formation models (Outwater et al., 2011).
Several studies have adopted more expansive survey data collection efforts to obtain information about attitudes, values, perceptions, and experiences with a view to better understanding the influence of latent constructs on choice behaviors (Kitamura et al., 1997b; Walker, 2001). In addition to the collection of contextual information within the purview of traditional survey data collection efforts, there are several specific experimental approaches that have been applied to obtain contextual data within the behavioral choice modeling domain (for example, Manski, 2002, 2004).

An approach that has been used extensively is that of stated choice experiments (Louviere et al., 2000; Hess et al., 2010). In stated choice experiments, respondents are presented with questionnaires containing a series of choices, each characterized by several attributes. Based on the attribute values in any given choice scenario, the respondent is expected to express a choice or provide a ranking of choices. By subjecting a respondent to a number of such choice scenarios, it is possible to understand the tradeoffs that decision makers exercise in their choice behaviors, the importance of different attributes in the choice process, and the decision-making rules that may be at play (Caussade et al., 2005). Stated choice experiments offer the ability to elicit information about choices that may be exercised in the event that a scenario (that does not currently exist in the real world) were to be implemented. Stated choice experiments are therefore rich sources of information on choice behaviors because they offer the analyst the ability to include contextual information of interest. Each stated choice experiment scenario can be prefaced with or placed in a contextual setting explaining the circumstances or conditions under which the respondent is to consider the choice alternatives and exercise a choice or express a ranking preference. For example, when offering a choice scenario involving vehicles of different types as alternatives (characterized by such attributes as purchase price, fuel economy, size and fuel type), it is possible to present a contextual backdrop. Contextual variables such as the price of fuel, commute distance, level of congestion and accessibility to destinations (built environment) may be presented and described, thus placing the choice scenario in context. The respondent would then identify a choice while considering the contextual situation described.

Variations in stated choice question design offer the ability to elicit information in different contextual settings (Lee-Gosselin, 1996). Stated preference questions ask respondents to identify a choice from among a set of alternatives described by a set of attributes. Stated tolerance questions ask respondents to identify the circumstances under which they would exhibit or make a specific choice. Stated adaptation questions ask respondents to articulate what they would do differently (how they would adapt) if faced with a set of circumstances (a context). Finally, stated prospect questions are more open-ended, asking respondents to identify the circumstances (context) under which they would change behavior and to explain how they would go about changing their choice behavior. Stated choice experiments allow the deployment of custom questionnaires with choice scenarios presented in such a way that respondents can relate to the contexts presented to them (see Polydoropoulou et al., 2012, for a combination with attitudes).
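In estimation, one simple and commonly used way to let a presented context of this kind enter the model is to interact the context descriptors with the alternative attributes in the systematic utility; the notation below is generic rather than taken from any specific study:

\[
V_{njt} \;=\; \sum_k \beta_k\, x_{njtk} \;+\; \sum_k \sum_m \gamma_{km}\, x_{njtk}\, c_{ntm},
\]

where \( x_{njtk} \) is attribute \( k \) of alternative \( j \) in scenario \( t \) for respondent \( n \) (for example, purchase price), \( c_{ntm} \) is the \( m \)-th contextual descriptor presented with the scenario (for example, fuel price or congestion level), the \( \beta_k \) are baseline attribute sensitivities, and the \( \gamma_{km} \) capture how the presented context moderates those sensitivities. A significant \( \gamma_{km} \) is then direct evidence that the context shifts the tradeoffs respondents make.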
Moreover, online versions of stated choice surveys make it possible to present follow-up choice scenarios in which the levels of attributes and contexts are customized based on the choices expressed by respondents in prior choice scenarios. By presenting a customized series of questions that evolve according to the responses provided by survey takers, it is possible to explore the tradeoffs that people make in different contextual situations and to elicit some information about the learning and adaptation process that may be at play.
[Figure 5.2: Working of experimental data collection process with subjects and computer agents. An experimenter operates a PCATS server program that communicates over Ethernet with several client programs, exchanging the problem, the computer agents, the subjects' inputs and the collective results. Source: Kitamura et al. (2008).]
However, a critical missing ingredient in such surveys is the social context (or dependencies) that often plays a major role in choice behavior. As noted earlier in this chapter, people's choices are influenced by the social context in which they are placed. In other words, having people respond to a stated choice experiment in isolation may not fully and accurately depict what people would do in a societal context where they are interacting with others and observing what others do. In view of this, a rather novel variation on the stated choice design is one whose roots may be traced to the field of behavioral economics (Gaker et al., 2010), which focuses on understanding the evolutionary process and key motivations underlying choice behavior in the context of what other agents in the system are doing. Controlled stated choice experiments of this nature can take place in a behavioral human-machine laboratory such as that depicted in Figure 5.2 (Kitamura et al., 2008). In this setup, subjects are confronted with a series of choice scenarios as in regular stated choice experiments. The nature of the experiment, however, can be controlled in this setup to provide each subject with as much or as little information as desired regarding the choices being made by other subjects in the experiment. The laboratory experiment therefore allows the researcher to provide information to the respondent in an effort to mimic a real-world social context. As subjects learn about the choices that others are making, the consequences of their own actions, and the experiences that may result, their responses to subsequent scenarios may be affected. In this way, the social context, which is characterized by numerous interaction effects, is captured within the stated choice experiment environment. For example, consider a choice experiment where choices among alternative fuel vehicle types are being explored. In the absence of any information about the vehicle types that others in society are purchasing or choosing, an individual may behave in a certain way and exercise a choice that is purely based on his or her personal preferences. However, when presented with partial or complete information about the choices that others are making, a person's choice processes may be altered or influenced sufficiently to result in different choices being made.
influenced sufficiently to result in different choices being made. For example, if a subject is informed that others are choosing environmentally friendly vehicles, then the subject may be inclined to exercise a similar choice so as not to appear too different in a social context. Eliciting the effect of context on choice behavior is possible in this way through controlled laboratory experiments that can be varied with respect to the extent of information sharing and interaction that is allowed across subjects. The interactive experiments can be further extended into full-fledged gaming environments where subjects must adopt strategies, identify threats and opportunities, collaborate and compete, and make choices with a view to achieving a certain goal (for example, Lee-Gosselin, 1990; Turrentine et al., 1992; Gärling et al., 1998; Manski, 2002). More recently, the proliferation of online gaming environments has created new opportunities to observe patterns of choices, interactions, and behaviors in a multitude of contexts characterized by rules regarding collaboration, interaction, and competition (Mahmassani et al., 2010). By following players within online gaming environments and examining their choices and strategies in various contexts, the processes at play may be deciphered. While stated choice experiments constitute a key methodology for eliciting choices in hypothetical contexts, there may be opportunities to study choice behaviors within real-world contexts as well. In stated choice experiments, the analyst presents and defines the context; in many real-world experiments, the analyst can still control and choose the context in which data will be gathered, but subjects are actually experiencing the context in their lives. Real-world experimental settings can take various forms. In a study of land use–travel behavior relationships, Kitamura et al. (1997b) examined travel choices in five different neighborhoods characterized by a variety of built environment attributes. Studies of that nature examine cross-sectional variations in behavior across spatial, social or situational contexts. On the other hand, travel choice studies may examine how people behave when subjected to a stimulus and take the form of before-and-after experiments where changes in behavior attributable to the stimulus are measured and isolated (for example, Burris and Pendyala, 2002; Fujii and Kitamura, 2003; Olszewski and Xie, 2005; Fujii and Taniguchi, 2006; Washbrook et al., 2006; Ettema et al., 2010). In the cross-sectional studies, differences in behavior are observed at one point in time across a variety of contexts; in the latter before-and-after approach, differences in behavior are observed over time, often in conjunction with a change in context (for example, before and after a change in the price of a good). When examining context, there is inevitably an interest in understanding how people consider the attributes of the context in which they operate to arrive at a choice. For such insights to be realized, it is necessary to elicit information about the behavioral process that the individual followed in exercising a choice. The behavioral process would generally involve considering the many opportunities and constraints presented by a context, the influences of other agents in the system, and personal attitudes and preferences before settling on a choice.
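The feedback loop in such interactive laboratory experiments can be sketched in a few lines. The sketch below assumes a simple conformity effect, with the utility of a 'green' vehicle shifted by the share of other subjects observed choosing it in the previous round; all parameter values are invented for illustration and are not taken from the studies cited above.

```python
import math
import random

def chooses_green(base_taste, observed_green_share, conformity=1.5, rng=random):
    """Logistic (random utility) choice between a green and a reference option."""
    utility_green = base_taste + conformity * observed_green_share
    p_green = 1.0 / (1.0 + math.exp(-utility_green))  # reference utility = 0
    return rng.random() < p_green

def run_experiment(tastes, n_rounds=5, seed=1):
    rng = random.Random(seed)
    green_share = 0.0  # round 1: no information about others is revealed
    for r in range(1, n_rounds + 1):
        choices = [chooses_green(t, green_share, rng=rng) for t in tastes]
        green_share = sum(choices) / len(choices)  # shown before the next round
        print(f"round {r}: share choosing green = {green_share:.2f}")

# Six subjects with heterogeneous baseline tastes for the green alternative.
run_experiment(tastes=[-1.0, -0.5, 0.0, 0.3, 0.8, -0.2])
```

Varying the `conformity` parameter, or withholding the revealed share from some subjects, corresponds to the experimental control over information sharing described in the text.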
As circumstances change and contexts evolve (Laub and Sampson, 1993), people continue processing the information – past and present – to alter or adjust their choices in ways that are not necessarily rational or optimal (McFadden, 1999; Manski, 2010). An example of an experiment where behavioral process information was elicited is that of CHASE, a computerized household activity scheduling experiment (Doherty
and Miller, 2000). In this experiment, all subjects were provided with a laptop computer with a spreadsheet program in which they could enter their planned activity schedules for the week ahead. Subjects were free to alter schedules and adjust activity engagement patterns at will, and the measurement device captured and recorded every change that was made, the timing of the change and the nature of the changes in activity schedules. As the week progressed and people encountered varying contexts in their lives, they altered their activity schedules, inserted new activities, deleted activities and moved activities around in time and space. Whenever people made a change in the activity schedule, they were explicitly asked why they made the change and how they arrived at the particular alteration decision. The result is a dataset with detailed information about the extent of planning that goes into activity scheduling, the types of adjustments people make to their schedules, the extent of spontaneity in activity-travel engagement and, most importantly, the evolutionary process that led to the choices they recorded. By obtaining rich information about the adjustment processes and the underlying motivations, it is possible to study activity-travel choices in the context in which they occurred. The notion of time-dependency is central to the study of context in choice modeling. Context changes over time and these changes in context in turn contribute to changes in behavioral choices. It would therefore be of considerable value to measure choices, and the contexts in which they occur, over time while controlling for a variety of unobserved individual-specific factors (Kitamura, 1990). The adoption of longitudinal data collection efforts such as repeated cross-sectional surveys and panel surveys would provide information (over time) that can be used to derive the evolutionary processes that lead to choice behaviors. However, tracing the evolutionary path of behaviors and contexts cannot be done by merely measuring choices at different points in time; often, such longitudinal data collection efforts must be enriched with in-depth qualitative data collection mechanisms that are capable of eliciting the connections between context and choice (Pendyala and Bricka, 2006). Obtaining longitudinal data would also provide the ability to stitch together the story of a life course, where long-, medium- and short-term decisions are all interconnected, and the nature of the connections evolves with the context (Elder and Giele, 2009; Yoon and Goulias, 2010; Goulias and Yoon, 2011). Large-scale microsimulation models that purport to model urban systems and societal dynamics attempt to capture the full range of choices across a variety of contexts over time (for example, Pendyala et al., 2012). The development of such models calls for the collection of a myriad of datasets using a variety of survey techniques to obtain information on the dynamics of choice behaviors as contexts evolve. Goulias et al. (2013) recently presented a total survey design, inspired by earlier work by Brög and Erl (1980), that would aid in the development of large-scale microsimulation models in the urban transportation and land use arena.
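The kind of event log a CHASE-style instrument keeps can be sketched as a small data structure. The field names below are invented for illustration and are not Doherty and Miller's actual schema; the point is that every edit, its timestamp and its stated reason are retained, so the evolutionary path of the schedule is preserved.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ScheduleChange:
    timestamp: datetime   # when the respondent edited the schedule
    activity: str         # which activity was affected
    change_type: str      # 'insert', 'delete', 'move' or 'modify'
    stated_reason: str    # answer to 'why did you make this change?'

@dataclass
class ActivityScheduleDiary:
    person_id: str
    changes: List[ScheduleChange] = field(default_factory=list)

    def record(self, change: ScheduleChange) -> None:
        """Append an edit; nothing is ever overwritten, so the full
        evolutionary path of the week's schedule is preserved."""
        self.changes.append(change)

    def spontaneity(self) -> float:
        """Crude placeholder indicator: the share of edits that insert a
        previously unplanned activity."""
        if not self.changes:
            return 0.0
        return sum(c.change_type == "insert" for c in self.changes) / len(self.changes)

diary = ActivityScheduleDiary("p001")
diary.record(ScheduleChange(datetime(2013, 5, 6, 18, 30), "grocery shopping",
                            "insert", "ran out of milk"))
diary.record(ScheduleChange(datetime(2013, 5, 7, 8, 10), "dentist",
                            "move", "clinic rescheduled the appointment"))
print(f"{len(diary.changes)} recorded edits, "
      f"spontaneity index = {diary.spontaneity():.2f}")
```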
The total survey design, depicted graphically in Figure 5.3, comprises a number of surveys that support a core household activity-travel survey, which collects complete information about the socio-economic characteristics and activity-travel schedules and choices of households and the individuals comprising them.

[Figure 5.3 The total survey design data collection schema. Around the core survey (household, person and base diary) sit: a panel of households and persons with a 2-day diary; a one-week activity diary; an in-depth mode supplement with active living questions; GPS and GPS-OBD components (verification, special days, emissions); an expenditures and budgeting survey; residence, workplace and school location; in-depth car ownership and use; location choice and activity satisfaction; a long-distance component with toll willingness to pay; and retrospective and prospective location choices. Source: Goulias et al. (2013).]

A survey focusing on land use and accessibility would provide detailed information on the mental maps of individuals and how they perceive space and build a context around the theme of 'sense of place' (Deutsch et al., 2011). A survey that involves a wearable global positioning system (GPS) unit provided to survey respondents
would give detailed information regarding non-motorized mode usage and route choice along the continuous time axis. An attitudinal survey would provide information about cultural constructs, attitudes, perceptions, values and priorities. A stated choice survey would ask people to identify choices they would exercise when placed in new hypothetical contexts. A special mode choice and usage survey would provide information about the use of alternative modes of transportation such as transit in the context of the built environment. A part of the survey design may be dedicated to a rotating panel survey where the same individuals are tracked through time, but with a part of the sample rotating out of the panel on a periodic basis to accommodate new entrants into the sample. By examining changes in behavior over time for a rotating panel sample, it will be possible to capture dynamics in contexts (and therefore choices) while controlling for individual-specific effects. Real-world experiments can be conducted with stimuli of various kinds injected into the lives of people to see how they react, adapt, learn and evolve their choices over the short, medium and long term. Behavioral process-oriented qualitative
surveys can provide information on the underlying paradigms that explain choices in various contexts. A host of other surveys such as long-distance travel surveys, special event and visitor surveys, monetary expenditure surveys, workplace surveys and external surveys can further shed light on the full spectrum of activity-travel demand that takes place in urban environments. The collection of data on context cannot be separated from the collection of data on choices. Virtually all choices are made within a context, and the two entities are therefore two sides of the same coin. Data collection protocols should ideally focus on collecting information about three fundamental entities – process, context and choices – so that behaviorally accurate models depicting relationships across all three constructs may be built.
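The rotation logic of the rotating panel mentioned above can be made concrete with a small sketch. It assumes a design in which each household stays for four waves and the sample is topped back up with fresh recruits at every wave; the parameters and identifiers are illustrative only.

```python
def rotate_panel(panel, recruits, waves_in_panel=4):
    """Advance the panel by one wave: age members, retire the oldest cohort,
    and top the sample back up with new entrants."""
    aged = [(hh, waves + 1) for hh, waves in panel]
    retained = [(hh, waves) for hh, waves in aged if waves < waves_in_panel]
    n_needed = len(panel) - len(retained)
    return retained + [(hh, 0) for hh in recruits[:n_needed]]

# Six households with staggered entry cohorts (second element = waves served).
panel = [(f"hh{i}", i % 4) for i in range(6)]
fresh = [f"new{i}" for i in range(10)]
for wave in range(3):
    panel = rotate_panel(panel, fresh)
    n_new = sum(1 for _, w in panel if w == 0)  # entrants have zero waves served
    fresh = fresh[n_new:]                        # consume the recruits just used
    print(f"wave {wave + 1}: {[hh for hh, _ in panel]}")
```

Keeping the sample size constant while cohorts rotate out is what allows the design to separate individual-specific effects from genuine changes in context over time.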
6 CONCLUSION
The consideration of context in choice modeling is not new, and there is undoubtedly a rich body of evidence on the role of context in explaining choice, as well as theoretical frameworks to guide us. In fact, virtually all choice models include a host of contextual variables that capture the socio-economic, cultural, attitudinal, familial, social, spatial, temporal, and built and natural environment within which people make long-, medium- and short-term choices. The key is to uncover new ways to capture a greater level of detail, breadth and depth of context in choice models through innovative survey methods that elicit insightful information about behavioral decision processes under different circumstances. The practice of relying on the random error term to account for the many unknown contextual variables that explain choice limits the potential of choice models to explain, understand, predict and forecast household and personal choices under a wide variety of scenarios; more importantly, the underlying processes that explain how and why scenarios play out in different ways can never be understood when context is not adequately represented in model specifications. To this end, choice modelers should strive to collect a rich set of contextual data through appropriate survey protocols that bring together information of different types from a variety of domains. In fact, in choice modeling we have the underlying theory to develop laboratory and real-life experiments and begin filling the empirical gaps in the many facets of the PPCT model. We also have model advances that aid in developing increasingly richer specifications of model structures used in behavioral simulation and forecasting applications.
ACKNOWLEDGMENTS

The Office of the President UC Lab Fees research program, the Multicampus Research Program Initiative on Sustainable Transportation, the University of California Transportation Center, and the Green Transport in the Islands Area project of the European Union partially supported the preparation of this chapter for the first author, and they are gratefully acknowledged.
REFERENCES

Abou-Zeid, M. and M. Ben-Akiva (2010), 'A model of travel happiness and mode switching', in S. Hess and A. Daly (eds), Choice Modeling: The State-of-the-Art and the State-of-Practice, Bingley: Emerald, pp. 289–306. Arentze, T. and H. Timmermans (2003), 'Modeling learning and adaptation processes in activity-travel choice: a framework and numerical experiment', Transportation, 30 (1), 37–62. Arentze, T. and H. Timmermans (2008), 'Social networks, social interactions, and activity-travel behavior: a framework for microsimulation', Environment and Planning B, 35 (6), 1012–27. Arentze, T., F. Hofman, H. van Mourik and H. Timmermans (2000), 'ALBATROSS: multiagent, rule-based model of activity pattern decisions', Transportation Research Record: Journal of the Transportation Research Board (1706), 136–44. Arentze, T., P. van den Berg and H. Timmermans (2012a), 'Modeling social networks in geographic space: approach and empirical application', Environment and Planning A, 44 (5), 1101–20. Arentze, T.A., M. Kowald and K.W. Axhausen (2012b), 'A method to model population-wide social networks for large scale activity-travel micro-simulations', paper presented at the 91st Annual Meeting of the Transportation Research Board, Washington, DC, January. Auld, J. and A. Mohammadian (2011a), 'Framework for the development of the agent-based dynamic activity planning and travel scheduling (ADAPTS) model', Transportation Letters: The International Journal of Transportation Research, 1 (3), 245–55. Auld, J. and A. Mohammadian (2011b), 'Planning constrained destination choice in the ADAPTS activity-based model', paper presented at the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January. Axhausen, K.W. (2008), 'Social networks, mobility biographies, and travel: survey challenges', Environment and Planning B, 35 (6), 981–96. Bagley, M.N. and P.L. Mokhtarian (2002), 'The impact of residential neighborhood type on travel behavior: a structural equations modeling approach', The Annals of Regional Science, 36 (2), 279–97. Baltes, P.B. (1987), 'Theoretical propositions of life-span developmental psychology: on the dynamics between growth and decline', Developmental Psychology, 23 (5), 611–26. Ben-Akiva, M. (2010), 'Planning and action in a model of choice', in S. Hess and A. Daly (eds), Choice Modeling: The State-of-the-Art and the State-of-Practice, Bingley: Emerald, pp. 19–34. Ben-Akiva, M., A. Palma, D. McFadden, M. Abou-Zeid, P.-A. Chiappori, M. Lapparent, S.N. Durlauf, M. Fosgerau, D. Fukuda, S. Hess, C. Manski, A. Pakes, N. Picard and J. Walker (2012), 'Process and context in choice models', Marketing Letters, 23 (2), 439–56. Berrigan, D. and R.P. Troiano (2002), 'The association between urban form and physical activity in U.S. adults', American Journal of Preventive Medicine, 23 (2), 74–9. Bhat, C.R. (2008), 'The multiple discrete-continuous extreme value (MDCEV) model: role of utility function parameters, identification considerations, and model extensions', Transportation Research B, 42 (3), 274–303. Bierlaire, M., R. Hurtubia and G. Flötteröd (2009), 'An analysis of the implicit choice set generation using the constrained multinomial logit model', Transportation Research Record: Journal of the Transportation Research Board (2175), 92–7. Billig, M. (2004), 'The residential-environment climate sense of place in locations of urban revitalization', Dela (21), 581–92, accessed 23 January 2014 at http://journaldatabase.org/articles/residential-environment_climate_sense.html.
Black, W.C. (1984), 'Choice set definition in patronage modeling', Journal of Retailing, 60 (2), 63–85. Bourdieu, P. (1984), Distinction: A Social Critique of the Judgement of Taste, Cambridge, MA: Harvard University Press. Bourdieu, P. (1998), Practical Reason, Stanford, CA: Stanford University Press. Bourdieu, P. (2005a), 'Habitus', in J. Hillier and E. Rooksby (eds), Habitus: A Sense of Place, Aldershot: Ashgate. Bourdieu, P. (2005b), The Social Structures of the Economy, Cambridge, MA: Polity Press. Bowman, J. and M. Bradley (2006), 'Activity based travel forecasting model for SACOG: intermediate stop location models', technical memo, Sacramento Association of Governments, Sacramento, CA, July. Bradley, M. and P. Vovsha (2005), 'A model for joint choice of daily activity pattern types of household members', Transportation, 32 (5), 545–71. Brög, W. and E. Erl (1980), 'Interactive measurement methods: theoretical bases and practical applications', Transportation Research Record 765, TRB, National Research Council, Washington, DC, pp. 1–6. Brög, W. and E. Erl (1982), 'Application of a model of individual behavior (situational approach) to explain household activity patterns in an urban area and to forecast behavioral changes', Working Paper 145, SOCIALDATA, Institut für Verkehrs- und Infrastrukturforschung GmbH, Munich.
Bronfenbrenner, U. (2005), ‘The bioecological theory of human development’, in U. Bronfenbrenner (ed.), Making Human Beings Human: Bioecological Perspectives on Human Development, Thousand Oaks, CA: Sage, pp. 793–828. Burris, M.W. and R.M. Pendyala (2002), ‘Discrete choice models of traveler participation in differential time of day pricing programs’, Transport Policy, 9 (3), 241–51. Carrasco, J.A., B. Hogan, B. Wellman and E.J. Miller (2008), ‘Collecting social network data to study social activity-travel behavior: an egocentric approach’, Environment and Planning B, 35 (6), 961–80. Cascetta, E., F. Pagliara and K. Axhausen (2007), ‘Dominance attributes for alternatives’ perception in choice set formation: an application to spatial choices’, DVD Proceedings of the 86th Annual Meeting of the Transportation Research Board, Washington, DC, January. Castro, M., N. Eluru, C.R. Bhat and R.M. Pendyala (2011), ‘Joint model of participation in nonwork activities and time-of-day choice set formation for workers’, Transportation Research Record: Journal of the Transportation Research Board (2254), 140–50. Caussade, S., J.D. Ortúzar, L.I. Rizzi and D.A. Hensher (2005), ‘Assessing the influence of design dimensions on stated choice experiment estimates’, Transportation Research B, 39 (7), 621–40. Chen, Y., S. Ravulaparthy, K. Deutsch, P. Dalal, S.Y. Yoon, T. Lei, K.G. Goulias, R.M. Pendyala, C.R. Bhat and H-H. Hu (2011), ‘Development of opportunity-based accessibility indicators’, Transportation Research Record, Journal of the Transportation Research Board, 2255, 58–68. Chow, K. and M. Healy (2008), ‘Place attachment and place identity: first-year undergraduates making the transition from home to university’, Journal of Environmental Psychology, 28 (4), 362–72. Clifton, K. and S. Handy (2003), ‘Qualitative methods in travel behavior research’, in P.R. Stopher and P. Jones (eds), Transport Survey Quality and Innovation, Oxford: Elsevier, pp. 283–302. Deutsch, K.E., S.Y. Yoon and K.G. Goulias (2011), ‘Modeling sense of place using a structural equation model’, DVD Proceedings of the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January. Doherty, S.T. and E.J. Miller (2000), ‘A computerized household activity scheduling survey’, Transportation, 27 (1), 75–97. Elder Jr, G.H. and J.Z. Giele (2009), ‘Life course studies: an evolving field’, in G.H. Elder Jr and J.Z. Giele (eds), The Craft of Life Course Research, New York: Guilford Press, pp. 1–24. Elliott, R. (1998), ‘A model of emotion-driven choice’, Journal of Marketing Management, 14 (1–3), 95–108. Ettema, D., J. Knockaert and E. Verhoef (2010), ‘Using incentives as traffic management tool: empirical results of the “peak avoidance” experiment’, Transportation Letters, 2 (1), 39–51. Ettema, D., T. Schwanen and H. Timmermans (2007), ‘The effect of location, mobility and socio-demographic factors on task and time allocation of households’, Transportation, 34 (1), 89–105. Ferdous, N., R.M. Pendyala, C.R. Bhat and K.C. Konduri (2011), ‘Modeling the influence of family, social context, and spatial proximity on use of nonmotorized transport mode’, Transportation Research Record: Journal of the Transportation Research Board (2230), 111–20. Fujii, S. and R. Kitamura (2000), ‘Evaluation of trip-inducing effects of new freeways using a structural equations model system of commuters’ time use and travel’, Transportation Research B, 34 (5), 339–54. Fujii, S. and R. 
Kitamura (2003), 'What does a one-month free bus ticket do to habitual drivers? An experimental analysis of habit and attitude change', Transportation, 30 (1), 81–95. Fujii, S. and A. Taniguchi (2006), 'Determinants of the effectiveness of travel feedback programs – a review of communicative mobility management measures for changing travel behavior in Japan', Transport Policy, 13 (5), 339–48. Gaker, D., Y. Zheng and J. Walker (2010), 'Experimental economics in transportation: focus on social influences and provision of information', Transportation Research Record: Journal of the Transportation Research Board (2156), 47–55. Gärling, T., R. Gillholm and A. Gärling (1998), 'Reintroducing attitude theory in travel behavior research: the validity of an interactive interview procedure to predict car use', Transportation, 25 (2), 129–46. Gatersleben, B. and K.M. Appleton (2007), 'Contemplating cycling to work: attitudes and perceptions in different stages of change', Transportation Research A, 41 (4), 302–12. George, L.K. (2009), 'Conceptualizing and measuring trajectories', in G.H. Elder Jr and J.Z. Giele (eds), The Craft of Life Course Research, New York: Guilford Press, pp. 163–86. Giele, J.Z. and G.H. Elder Jr (1998), Methods of Life Course Research: Qualitative and Quantitative Approaches, Thousand Oaks, CA: Sage. Giele, J.Z. (2009), 'Life stories to understand diversity: variations by class, race, and gender', in G.H. Elder Jr and J.Z. Giele (eds), The Craft of Life Course Research, New York: Guilford Press, pp. 236–57. Golob, T.F. (2003), 'Structural equation modeling for travel behavior research', Transportation Research B, 37 (1), 1–25. Goulias, K.G. (2009), 'Travel behavior dynamics from a lifespan development perspective', paper presented at the 12th International Conference on Travel Behavior Research, Jaipur, 13–18 December.
Goulias, K.G. and K. Henson (2006), 'On altruists and egoists in activity participation and travel: who are they and do they live together?', Transportation, 33 (5), 447–62. Goulias, K.G. and T. Kim (2005), 'An analysis of activity type classification and issues related to the with whom and for whom questions of an activity diary', in H. Timmermans (ed.), Progress in Activity-Based Analysis, Oxford: Elsevier, pp. 309–34. Goulias, K.G. and S.Y. Yoon (2011), 'On the relationship among travel behavior, time use investment and expenditures in social networks', paper presented at the 16th HKSTS International Conference, Hong Kong, 17–20 December, and published in the conference proceedings. Goulias, K.G., C.R. Bhat, R.M. Pendyala, Y. Chen, R. Paleti, K.C. Konduri, T. Lei, D. Tang, S.Y. Yoon, G. Huang and H. Hu (2011), 'Simulator of activities, greenhouse emissions, networks, and travel (SimAGENT) in Southern California', technical paper, GeoTrans and Department of Geography, University of California, Santa Barbara, July. Goulias, K.G., R.M. Pendyala and C.R. Bhat (2013), 'Total design data needs for the new generation large scale activity microsimulation models', in J. Zmud, M. Lee-Gosselin, J.A. Carrasco and M. Munizaga (eds), Transport Survey Methods: Best Practice for Decision Making, Bingley: Emerald, pp. 21–49. Hägerstrand, T. (1970), 'What about people in regional science?', Papers and Proceedings of the Regional Science Association, 24, 7–24. Heinz, W.R. (2003), 'Combining methods in life-course research: a mixed blessing?', in W.R. Heinz and V.W. Marshall (eds), Social Dynamics of the Life Course, New York: Aldine De Gruyter, pp. 73–90. Hess, S., J.M. Rose and J. Polak (2010), 'Non-trading, lexicographic and inconsistent behaviour in stated choice data', Transportation Research D, 15 (7), 405–17. Kenyon, S. and G. Lyons (2007), 'Introducing multitasking to the study of travel and ICT: examining its extent and assessing its potential importance', Transportation Research A, 41 (2), 161–75. Kim, H.-M. and M.-P. Kwan (2003), 'Space-time accessibility measures: a geocomputational algorithm with focus on the feasible opportunity set and possible activity duration', Journal of Geographical Systems, 5 (1), 71–91. Kitamura, R. (1990), 'Panel analysis in transportation planning: an overview', Transportation Research A, 24 (6), 401–15. Kitamura, R., A. Kikuchi and R.M. Pendyala (2008), 'Integrated, dynamic activity-network simulator: current state and future directions of PCATS-DEBNetS', paper presented at the Second Transportation Research Board Conference on Innovations in Travel Modeling, Portland, Oregon, June. Kitamura, R., T. van der Hoorn and F. van Wijk (1997a), 'A comparative analysis of daily time use and the development of an activity-based traveler benefit measure', in D.F. Ettema and H. Timmermans (eds), Activity-Based Approaches to Travel Analysis, Oxford: Pergamon, Elsevier Science, pp. 171–87. Kitamura, R., P.L. Mokhtarian and L. Laidet (1997b), 'A micro-analysis of land use and travel in five neighborhoods in the San Francisco Bay Area', Transportation, 24 (2), 125–58. Kortum, K., R. Paleti, C.R. Bhat and R.M. Pendyala (2012), 'A joint model of residential relocation choice and underlying causal factors', Transportation Research Record: Journal of the Transportation Research Board (2303), 28–37. Krizek, K.J. and A. Johnson (2007), 'Mapping the terrain of information and communications technology (ICT) and household travel', Essays on Transport Economics, pt 5, pp. 363–81.
Kuppam, A.R., R.M. Pendyala and S. Rahman (1999), ‘Analysis of the role of traveller attitudes and perceptions in explaining mode choice behaviour’, Transportation Research Record: Journal of the Transportation Research Board (1676), 68–76. Kwan, M. and X. Hong (1998), ‘Network-based constraints-oriented choice set formation using GIS’, Geographical Systems, 5 (1), 139–62. Kwan, M.-P. (1998), ‘Space-time and integral measures of individual accessibility: a comparative analysis using a point-based framework’, Geographical Analysis, 30 (3), 191–217. Laub, J.H. and R.J. Sampson (1993), ‘Turning points in the life course: why change matters to the study of crime’, Criminology, 31 (3), 301–25. Lee, B.H.Y., P. Waddell, L. Wang and R.M. Pendyala (2010), ‘Re-examining the influence of work and non-work accessibility on residential location choices with a micro-analytic framework’, Environment and Planning A, 42 (4), 913–30. Lee-Gosselin, M.E.H. (1990), ‘The dynamics of car use patterns under different scenarios: a gaming approach’, in P.M. Jones (ed.), Developments in Dynamic and Activity-Based Approaches to Travel Analysis, Aldershot: Gower, pp. 250–71. Lee-Gosselin, M.E.H. (1996), ‘Scope and potential of interactive stated response data collection methods’, in Proceedings of the Conference on Household Travel Surveys: New Concepts and Research Needs, Washington, DC: Transportation Research Board, pp. 115–33. Lei, T., Y. Chen and K.G. Goulias (2012), ‘Opportunity-based dynamic transit accessibility in Southern California: measurement, findings, and comparison with automobile accessibility’, Transportation Research Record, 2276, 26–37.
Louviere, J.J., D. Hensher and J. Swait (2000), Stated Choice Methods: Analysis and Application, Cambridge: Cambridge University Press. Mahmassani, H.S., R. Chen, Y. Huang, N. Contractor and D. Williams (2010), 'Time to play? Activity engagement in multiplayer online role playing games', Transportation Research Record: Journal of the Transportation Research Board (2157), 129–37. Manski, C. (1977), 'The structure of random utility models', Theory and Decision, 8 (3), 229–54. Manski, C. (2002), 'Identification of decision rules in experiments on simple games of proposal and response', European Economic Review, 46 (4–5), 880–91. Manski, C. (2004), 'Measuring expectations', Econometrica, 72 (5), 1329–76. Manski, C. (2010), 'When consensus choice dominates individualism: Jensen's inequality and collective decisions under uncertainty', Quantitative Economics, 1 (1), 187–202. McFadden, D. (1999), 'Rationality for economists?', Journal of Risk and Uncertainty, 19 (1–3), 73–105. McLeod, J.D. and E.P. Almazan (2003), 'Connections between childhood and adulthood', in J. Mortimer and M. Shanahan (eds), Handbook of the Life Course, New York: Kluwer Academic, Plenum, pp. 391–411. Miller, E.J. and M.E. O'Kelly (1983), 'Estimating shopping destination models from travel diary data', Professional Geographer, 35 (4), 440–49. Miller, H.J. (1991), 'Modeling accessibility using space-time prism concepts within geographical information systems', International Journal of Geographical Information Systems, 5 (3), 287–301. Mohammadian, A. and E.J. Miller (2003), 'Dynamic modeling of household automobile transactions', Transportation Research Record: Journal of the Transportation Research Board (1831), 98–105. Mokhtarian, P.L., I. Salomon and S.L. Handy (2006), 'The impacts of ICT on leisure activities and travel: a conceptual exploration', Transportation, 33 (3), 263–89. Moschis, G.P. (2007), 'Life course perspectives on consumer behavior', Journal of the Academy of Marketing Science, 35 (2), 295–307. Olszewski, P. and L. Xie (2005), 'Modeling the effects of road pricing on traffic in Singapore', Transportation Research A, 39 (7–9), 755–72. Oppewal, H. and H. Timmermans (1991), 'Context effects and decompositional choice modeling', Papers in Regional Science, 70 (2), 113–31. Outwater, M., G. Spitz, J. Lobb, M. Campbell, B. Sana, R.M. Pendyala and W. Woodford (2011), 'Characteristics of premium transit services that affect mode choice', Transportation, 38 (4), 605–23. Pachauri, M. (2001), 'Consumer behaviour: a literature review', The Marketing Review, 2 (3), 319–55. Páez, A. and D.M. Scott (2007), 'Social influence on travel behavior: a simulation example of the decision to telecommute', Environment and Planning A, 39 (3), 647–65. Pagliara, F. and H.J.P. Timmermans (2009), 'Choice set generation in spatial contexts: a review', Transportation Letters, 1 (1), 181–96. Paleti, R., C.R. Bhat, R.M. Pendyala and K.G. Goulias (2013), 'The modeling of household vehicle type choice accommodating spatial dependence effects', Transportation Research Record, 2343, 86–94. Parsons, T. (1991), The Social System, London: Routledge. Pendyala, R.M. (2003), 'Time use and travel behavior in space and time', in K.G. Goulias (ed.), Transportation Systems Planning: Methods and Applications, Boca Raton, FL: CRC Press, pp. 2-1–2-37. Pendyala, R.M. and C.R. Bhat (2004), 'An exploration of the relationship between timing and duration of maintenance activities', Transportation, 31 (4), 429–56. Pendyala, R.M. and C.R.
Bhat (2012), 'Moving travel behaviour research forward in a rapidly evolving world', in C.R. Bhat and R.M. Pendyala (eds), Travel Behaviour Research in an Evolving World, North Carolina: Lulu.com, pp. 3–12. Pendyala, R.M. and S. Bricka (2006), 'Defining and collecting behavioral process data for travel analysis: challenges and issues', in P.R. Stopher and C. Stecher (eds), Travel Survey Methods: Quality and Future Directions, Oxford: Elsevier, pp. 511–30. Pendyala, R.M., R. Kitamura and D.V.G.P. Reddy (1998), 'Application of an activity-based travel-demand model incorporating a rule-based algorithm', Environment and Planning B, 25 (5), 753–72. Pendyala, R.M., K.C. Konduri, Y-C. Chiu, M. Hickman, H. Noh, P. Waddell, L. Wang, D. You and B. Gardner (2012), 'Integrated land use-transport model system with dynamic time-dependent activity-travel microsimulation', Transportation Research Record: Journal of the Transportation Research Board (2303), 19–27. Pendyala, R.M., T. Yamamoto and R. Kitamura (2002), 'On the formulation of time space prisms to model constraints on personal activity-travel engagement', Transportation, 29 (1), 73–94. Pinjari, A.R., R.M. Pendyala, C.R. Bhat and P.A. Waddell (2011), 'Modeling the choice continuum: an integrated model of residential location, auto ownership, bicycle ownership, and commute tour mode choice decisions', Transportation, 38 (6), 933–58. Polydoropoulou, A., M. Kamargianni and A. Tsirimpa (2012), 'Car use addiction versus ecological consciousness: which prevails in mode choice behavior for young people?', paper presented at the 2012 IATBR Conference, Toronto, 15–19 July. Rönkä, A., S. Oravala and L. Pulkkinen (2003), 'Turning points in adults' lives: the effects of gender and the amount of choice', Journal of Adult Development, 10 (3), 203–15. Roorda, M.J. and E.J. Miller (2005), 'Strategies for resolving activity scheduling conflicts: an empirical analysis', in H.J.P. Timmermans (ed.), Progress in Activity-Based Analysis, Oxford: Elsevier, pp. 203–22. Schenker, N. and T.E. Raghunathan (2007), 'Combining information from multiple surveys to enhance estimation of measures of health', Statistics in Medicine, 26 (8), 1802–11. Scott, D.M. (2006), 'Constrained destination choice set generation: a comparison of GIS-based approaches', CD Proceedings of the 85th Annual Meeting of the Transportation Research Board, Washington, DC, January. Scott, D.M. and K.W. Axhausen (2006), 'Household mobility tool ownership: modeling interactions between cars and season tickets', Transportation, 33 (4), 311–22. Sener, I.N., R.B. Copperman, R.M. Pendyala and C.R. Bhat (2008), 'An analysis of children's leisure activity engagement: examining the day of week, location, physical activity level, and fixity dimensions', Transportation, 35 (5), 673–96. Seraj, S., R. Sidharthan, C.R. Bhat, R.M. Pendyala and K.G. Goulias (2012), 'Parental attitudes towards children walking and bicycling to school: multivariate ordered response analysis', Transportation Research Record: Journal of the Transportation Research Board (2323), 46–55. Sidharthan, R., C.R. Bhat, R.M. Pendyala and K.G. Goulias (2011), 'Model for children's school travel mode choice: accounting for effects of spatial and social interaction', Transportation Research Record: Journal of the Transportation Research Board, 2213, 78–86. Sivak, M. and B. Schoettle (2012), 'Update: percentage of young persons with a driver's license continues to drop', Traffic Injury Prevention, 13 (4), 341. Skyrms, B. and R. Pemantle (2000), 'A dynamic model of social network formation', Proceedings of the National Academy of Sciences, 97 (16), 9340–46. Spence, J.C. and R.E. Lee (2003), 'Toward a comprehensive model of physical activity', Psychology of Sport and Exercise, 4 (1), 7–24. Srinivasan, S. and C.R. Bhat (2005), 'Modeling household interactions in daily in-home and out-of-home maintenance activity participation', Transportation, 32 (5), 523–44. Swait, J., W. Adamowicz, M. Hanemann, A. Diederich, J. Krosnick, D. Layton, W. Provencher, D. Schkade and R. Tourangeau (2002), 'Context dependence and aggregation in disaggregate choice analysis', Marketing Letters, 13 (3), 195–205. Swartz, D. (1997), Culture and Power: The Sociology of Pierre Bourdieu, Chicago, IL: University of Chicago Press. Termansen, M., C.J. McClean and H. Skov-Petersen (2004), 'Recreational site choice modeling using high-resolution spatial data', Environment and Planning A, 36 (6), 1085–99. Thill, J.C. (1992), 'Choice set formation for destination choice modeling', Progress in Human Geography, 16 (3), 361–82. Toivonen, R., J-P. Onnela, J. Saramäki, J. Hyvönen and K. Kaski (2006), 'A model for social networks', Physica A, 371 (2), 851–60. Turrentine, T., M. Lee-Gosselin, K. Kurani and D.
Sperling (1992), 'A study of adaptive and optimizing behavior for electric vehicles based on interactive simulation games and revealed behavior of electric vehicle owners', unpublished paper, University of California Transportation Center, University of California, Davis, accessed 23 January 2014 at www.its.ucdavis.edu/wp-content/themes/ucdavis/pubs/download_pdf.php?id=935. Vygotsky, L.S. (1978), Mind and Society: The Development of Higher Psychological Processes, Cambridge, MA: Harvard University Press. Waddell, P., C.R. Bhat, N. Eluru, L. Wang and R.M. Pendyala (2007), 'Modeling the interdependence in household residence and workplace choices', Transportation Research Record: Journal of the Transportation Research Board (2003), 84–92. Waddell, P., A. Borning, N.M. Freier, M. Becke and G. Ulfarsson (2003), 'Microsimulation of urban development and location choices: design and implementation of UrbanSim', Networks and Spatial Economics, 3 (1), 43–67. Walker, J.L. (2001), 'Extended discrete choice models: integrated framework, flexible error structures, and latent variables', PhD thesis, Massachusetts Institute of Technology, Cambridge, MA. Walker, J.L., E. Ehlers, I. Banerjee and E.R. Dugundji (2011), 'Correcting for endogeneity in behavioral choice models with social influence variables', Transportation Research A, 45 (4), 362–74. Wang, D. and F.Y.T. Law (2007), 'Impacts of information and communication technologies (ICT) on time use and travel behavior: a structural equations analysis', Transportation, 34 (4), 513–27.
Washbrook, K., W. Haider and M. Jaccard (2006), ‘Estimating commuter mode choice: a discrete choice analysis of the impact of road pricing and parking charges’, Transportation, 33 (6), 621–39. Weber, J. and M.-P. Kwan (2002), ‘Bringing time back in: a study on the influence of travel time variations and facility opening hours on individual accessibility’, The Professional Geographer, 54 (2), 226–40. Yoon, S.Y. and K.G. Goulias (2010), ‘Impact of time-space prism accessibility on time use behavior and its propagation through intra-household interaction’, Transportation Letters, 2 (4), 245–60. Yoon, S.Y., K. Deutsch and K.G. Goulias (2012), ‘Feasibility of using time-space prism to represent available opportunities and choice sets for destination choice models in the context of dynamic urban environments’, Paper 12-2707 presented at the 91st Annual Meeting of the Transportation Research Board, Washington, DC, 22–26 January, published in Transportation, 39 (4), 807–23. Zheng, J. and J. Guo (2008), ‘A destination choice model incorporating choice set formation’, DVD Proceedings of the 87th Annual Meeting of the Transportation Research Board, Washington, DC, January.
6 Self-tracing and reporting: state of the art in the capture of revealed behaviour
Nadine Rieser-Schüssler and Kay W. Axhausen

1 INTRODUCTION
The measurement of travel behaviour is based on the traces that travellers leave willingly or unwillingly. This chapter discusses and describes the range of these traces. They range from participation in travel diary surveys to the technical records of mobile phone providers: some of them are recent, such as tracing by smart phones; some are driven by the curiosity of the travellers themselves, such as the dollar bill tracing website www.wheresgeorge.com;1 some by national accounting or policy making, such as the various national travel diary studies. Each of the available forms has well-known biases during the various phases of data collection and processing, which we try to highlight and discuss. It is necessary to point out that the object of the data collection varies between the approaches. Traditional transport planning and national statistics driven approaches are interested in identifiable movements, for example stages, trips or journeys, which can be described with an origin, a destination, a purpose and a (main) mode, and associated with the person undertaking this movement. This movement has social meaning, as it has been undertaken to satisfy some need or task of the person reporting it. New tracing technologies, such as the interaction records of mobile phones with their localized infrastructure or the geo-location stamps of Twitter, provide movement information as a by-product, but at random intervals and, when tracing for short periods, without socially meaningful information about the movement (see Table 6.1). Without the social content of the report, that is, trip identification by the reporting person, purpose, mode, company and cost allocation, records can only be used to construct a movement field. This field will describe the lower bounds of the probability of a movement of a certain distance from the last location or, if available from post-processing or from a suitable record in the user profile, from the home location.2 Alternatively, they can be used to construct a lower bound of the probability of relocations between locations in space, given the accuracy of the geocodes they obtain. In the case of the random tracing of objects, for example dollar or euro bills, one has to remember that these are also transported in bulk without a meaningful person-movement, for example when rental cars get repositioned or dollar bills are transported for cleaning and recirculation. Equally, the social context, that is the information about the traveller, is available at different levels of depth. While some information can be added through imputation given the records available, this will add additional error and uncertainty to any analysis. From among this range of methods, three are used extensively at this time: travel diaries, self-tracing by Global Positioning System (GPS) and the secondary analysis of global system for mobile communications (GSM) or mobile phone records (for example, Ahas et al., 2010). Payment cards, such as those for public transport (for example, Chakirov and Erath, 2011), are getting more popular.
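Before turning to the formats themselves (Table 6.1), the movement-field notion can be made concrete with a short sketch: from sparse, socially uninterpretable geocodes (for example, cell-tower pings or tweet locations) one can compute an empirical distribution of displacement distances. The coordinates and binning below are invented for illustration.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def movement_field(pings, bin_km=1.0):
    """Histogram of consecutive-ping displacements. Because pings arrive at
    random intervals, each observed displacement is only a lower bound on the
    distance actually travelled in between, as discussed in the text."""
    bins = Counter()
    for (lat1, lon1), (lat2, lon2) in zip(pings, pings[1:]):
        d = haversine_km(lat1, lon1, lat2, lon2)
        bins[int(d // bin_km)] += 1
    total = sum(bins.values())
    return {b * bin_km: n / total for b, n in sorted(bins.items())}

# Sparse pings for one anonymous device (invented coordinates near Zurich).
pings = [(47.37, 8.54), (47.39, 8.51), (47.37, 8.54), (47.41, 8.60), (47.37, 8.54)]
for dist, share in movement_field(pings).items():
    print(f"[{dist:.0f}, {dist + 1:.0f}) km: {share:.2f} of displacements")
```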
Table 6.1 Formats of capturing movements of individuals

Type | Source | Reporting period | Raw content | Location accuracy
Self-reported diary | Description at the level of stages, or trips or journeys | 1 day – multiple weeks | Movement with its social content and context | Coordinates at the level requested, or answered
Self-tracing GPS | Voluntary sharing of GPS records (via geo-logger, smartphone) | 1+ days | Movement without its social content or social context | Coordinates
Prompted recall self-tracing GPS | Voluntary sharing of GPS records and self-reported diary | 1 to 14 days | Movement with its social context and content | Coordinates
GSM mobile phone | Administrative record-keeping | 1+ months | Movement field with its social context | Coordinates of cell tower
Payment cards | Administrative record-keeping | 1+ months | Movement field | Coordinates of the points of sale
Number plate tracing | Administrative record-keeping | 1+ days | Movement field | Coordinates of observation points
Where's George? | Matching of voluntary recorded locations of dollar bills | – | Movement field without its social context | Coordinates at the municipal level
Credit card bills | Accounting information | 30+ days | Movement field with social context | Coordinates at the municipal level
Photography sharing sites | Voluntary publication | – | Movement field without its social context | Coordinates of the identified objects
Twitter | Voluntary publication of partial diaries | 1+ days | Movement field without its social content, with social context | Coordinates
Social networking websites | Voluntary publication of partial diaries | 1+ days | Movement field with its social context | Coordinates of the identified objects, locations
This chapter looks at the differences between these methods with a focus on those methods that are able to obtain socially meaningful movements. The capture and analysis of movement fields has different challenges and offers different possibilities, which cannot be done justice here; see Gonzalez et al. (2008) for an example. The choice between these methods is a trade-off between four main factors: completeness of the record, cost per recorded day and person reporting, desired length of the reporting period and, finally, response rate. The question of the completeness of the record is sometimes also referred to as the question of observing the ground truth. Establishing this ground truth is challenging
for all survey methods. Diaries suffer from recall limitations and strategic soft and hard refusal or answering; harvested data, such as GSM-based traces or public transport payment card data, is limited in its resolution and/or the segment of travel behaviour that is covered; GPS is usually outstanding for times and places, but the technology can fail, and using the data relies on post-processing and imputation for the other facets, whereas prompted recall has all the issues of the diary. Since these issues lead to differences in the observed behaviour, this has to be taken into account in the modelling, all the more since recent research (Bricka and Bhat, 2006; Stopher and Greaves, 2010; Bricka et al., 2012) has found that the differences in the observed behaviour depend on the participants' socio-economics and their general travel habits. Another big difference between the survey methods is the structure of the cost associated with each. At one end of the spectrum are traditional diaries, for which the largest part of the cost is caused by the actual survey conduct and for which the additional cost of one or more additional survey days per person is usually non-negligible. The other end of the spectrum comprises data harvested from GSM or public transport payment cards, where the cost of the data collection itself is usually borne by the firm collecting the data and the main cost for the modeller lies in developing the processing routines and purchasing the traces. The cost of GPS data collection contains both elements, but the structure of the survey conduct costs differs from traditional diaries: the marginal costs of additional survey days are so much lower than the initial deployment costs that studies conducted using new technologies make longer observation periods very appealing, and indeed necessary to improve the imputation results. Longer observation periods have the benefit of opening up new research and analysis areas, such as the study of behavioural patterns over the course of a week. However, for use in traditional models the availability of repeated observations from the same person leads to a shift away from inter- and towards intra-person variability. First studies comparing intra- and inter-person variability (for example, Schlich, 2003; Stopher et al., 2008; Chikaraishi et al., 2009, 2011; Xu and Guensler, 2011) indicate that intra-person variability is substantially higher than inter-person variability, but the consequences this has for modelling practice have not yet been studied sufficiently. Moreover, the length of the survey period influences the response rates and, combined with the respondent burden and the survey recruiting and execution costs, determines the sample sizes that can be achieved with each survey method for a given budget. However, not only will the size of the sample differ between survey methods but also its composition. While only little is known about the impact of other modern technologies, some studies (Bricka, 2008; Marchal et al., 2008; Stopher, 2009) have addressed the differences in the composition of samples in GPS diaries and traditional diaries. They found that these differences can be significant, particularly with respect to age, income, education and household size of the participants, but also with regard to their travel behaviour, with persons who are more mobile being more likely to participate in GPS studies than in traditional travel diaries.
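The intra- versus inter-person comparison rests on a standard variance decomposition, which the following sketch illustrates for a daily indicator such as trips per day. The multi-day observations are invented, not results from the studies cited above.

```python
from statistics import mean

def decompose(trips_by_person):
    """Split total variance into (between-person, within-person) components."""
    all_days = [d for days in trips_by_person.values() for d in days]
    grand = mean(all_days)
    n = len(all_days)
    between = sum(len(days) * (mean(days) - grand) ** 2
                  for days in trips_by_person.values()) / n
    within = sum((d - mean(days)) ** 2
                 for days in trips_by_person.values() for d in days) / n
    return between, within

obs = {  # trips per day over a one-week observation period, per person
    "p1": [3, 5, 2, 6, 4, 1, 3],
    "p2": [4, 4, 7, 2, 5, 3, 6],
    "p3": [2, 3, 3, 8, 1, 4, 2],
}
b, w = decompose(obs)
print(f"between-person variance: {b:.2f}, within-person variance: {w:.2f}")
```

A within-person component that dominates the between-person component is exactly the pattern reported in the cited multi-day studies.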
The researcher has to weigh all these issues against the aims of his or her study. As a starting point, Table 6.2 compares these and other aspects for the three currently most prominent formats. The next section discusses the travel and activity diary, while the following sections address the challenges arising from self-traced GPS records.
Table 6.2 Factors in the choice of the formats for capturing movements of individuals

Dimension | Self-reported diary | GSM | Self-tracing GPS
Identification of trips | Easy | Impossible | Post-processing required
Completeness of the trip record | Dependent on the respondent | Impossible | Yes, given no signal loss during the reporting period
Precision of trip start or end times | 5/15 min intervals depending on the rounding of the respondents | – | By second
Locations | Within the precision of the report | Geocodes of the cell tower (a) | Geocodes
Trip purpose | Yes | None | By imputation
Accompanying persons/persons met | If requested | No | By imputation if group data is available
Route | Yes, but extra response burden | Not reliably, and impossible for short routes | Precisely, after map-matching
Recruitment | Function of the response burden (number of items, length of reporting period) | Easy, if cooperation of the GSM operator is available | Function of the response burden (length of reporting period)
Reporting period | 1 day up to multiple weeks | Unlimited | (1–) 7 days (up to multiple weeks)

Note: (a) Assuming that the study has no access to finer triangulation methods, which are available for emergencies or to the police.
2 SELF-REPORTED TRAVEL BEHAVIOUR
Travel diaries have been used since the 1930s to obtain statistical information about the movement of the population. The change from the early origin-destination (OD) surveys to fully fledged travel and activity diaries mirrors the changing challenges of transport modelling as a tool of policy-making (see Axhausen, 1995, for a description of these changes, but without a detailed discussion of their motivation and interaction with practice). Their conduct and contents have been subjects of professional discussion since their beginning. The current state of this discussion is summarized for daily movements in the typical daily activity space in Richardson et al. (1995) and Centre d'études sur les réseaux, les transports, l'urbanisme et les constructions publiques (CERTU, 2008), and for long-distance movements in Axhausen et al. (2002a). Alternatively, the latest national travel diary survey is always a reference despite the differences between the countries, especially to obtain a frame for reweighting any local or national results (for example, USA: nhts.ornl.gov/; Germany: www.mobilitaet-in-deutschland.de/; UK: http://data.gov.uk/dataset/national_travel_survey; Switzerland: www.bfs.admin.ch/bfs/portal/de/index/themen/11/07/01/02/05.html).
Each study will have to make a series of choices which interact (see the sketch after this list):

● Reporting unit: stage, trip, journey or activity (see Axhausen, 2008, for the definitions of these units).
● Spatial range: local movement, long-distance movement or both.
● Key information about each movement: origin and destination location, start and end time, (main) mode, estimated length of each movement, company during the movement and/or for the ensuing activity, out-of-pocket expenditure for each movement and activity.
● Range of additional socio-demographic and attitudinal items.
● Duration of the reporting period: normally one day, but seven-day surveys or surveys combining a weekday and one weekend day are common. Exceptions, such as the 1973 five-week diary in Uppsala (Hanson and Huff, 1982) or the 1999 six-week Mobidrive survey (Axhausen et al., 2002b), require special care to maintain the response quality.
● Channel to raise the questions with the respondents: paper-and-pencil, computer-based or web-based diaries, face-to-face or computer-aided telephone interviews.
● Channel to record the answers: self-completion, interviewer assisted.
● Channel to retrieve the answers: postal return, computer/web capture by the respondent or interviewer.
● Number and channel of any reminders.
● Target population and the associated sampling frame.
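These interacting choices can be collected into a single configuration object, as in the sketch below. The option strings are those named in the list; the class itself and its defaults are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiaryDesign:
    reporting_unit: str = "trip"               # stage | trip | journey | activity
    spatial_range: str = "local"               # local | long-distance | both
    movement_items: List[str] = field(default_factory=lambda: [
        "origin", "destination", "start_time", "end_time", "main_mode"])
    extra_items: List[str] = field(default_factory=list)  # socio-demographics, attitudes
    reporting_days: int = 1
    question_channel: str = "paper-and-pencil" # or computer-based, web-based, CATI
    answer_channel: str = "self-completion"    # or interviewer assisted
    retrieval_channel: str = "postal return"   # or computer/web capture
    reminders: int = 2

    def items_to_report(self, movements_per_day: float = 4.0) -> float:
        """The reported item count scales with the reporting unit (movements
        per day) and the length of the period, which is the first interaction
        discussed in the text."""
        return (movements_per_day * len(self.movement_items) * self.reporting_days
                + len(self.extra_items))

# A stage-based seven-day diary implies roughly ten movements per day,
# so the reported item count grows quickly with these two choices.
design = DiaryDesign(reporting_unit="stage", reporting_days=7)
print(f"items to report: {design.items_to_report(movements_per_day=10):.0f}")
```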
The first main interaction is between the response burden for the participants and the reporting unit: the basic items per movement remain the same for each unit, but their number per day varies by nearly an order of magnitude. In industrialized countries the average person will undertake about four trips per day, that is, ten stages with one means of transport (walk, bicycle, car, bus, tram, train, and so on) or 1.5 journeys (tours) from home back to home per day. The response burden should be assessed a priori in the design phase, so that the survey designer can trade off the range of items against the response rate. The response rate will drive the survey costs (and possibly the survey period) but, equally importantly, might cause a suspicion that non-random non-response behaviour has biased the results of the survey. There is substantial literature assessing the response burden qualitatively, but hardly any quantitative assessments. One exception is Axhausen and Weis (2010), who analysed the response rates of the surveys conducted by one Swiss research group, which gave them a social-contextual uniformity reducing the variance otherwise bedevilling such analyses. Figure 6.1 shows that the response rate falls with the response burden, highlighting the need to be parsimonious in the selection of the questions. The second main interaction to consider is the form of the protocol of the interactions with the respondents. One can think of any survey as a particular and very formal interaction between strangers. The survey, represented by an interviewer, if any, has to gain the collaboration of the respondents up to the last contact with them: by taking them seriously, by conveying the importance of the subject to the organization sponsoring the survey, by matching the expectations between subject matter and sponsor, by choosing materials (design and 'printed' quality of the forms, number and types of contacts,
The results in Figure 6.1 and the analysis of the response behaviour (here the average number of trips reported) over a series of reminders and recontacts have shown that non-random non-response is prevalent in travel diaries (Wermuth, 1985; Richardson, 2003). Respondents who have been asked to report for days on which they have been (very) active, due to the normal variance in behaviour, will, all things being equal, try to reduce their response burden: by not participating at all, by falsely reporting a day without movements (Madre et al., 2007), or by omitting whole tours or individual trips (see Hubert et al., 2008, comparing travel and time use diaries, or Bricka and Bhat, 2006, comparing travel diaries and GPS self-tracing studies). Letting respondents choose the reporting day is likely to reduce the number of reported movements.
It is clear that all the usual survey design issues apply to travel diaries as well (see Sudman et al., 1996, or Dillman, 2000): for example, the questions must be formulated in the everyday language of the respondents; the questions must be clear and unambiguous; the ability of the respondent to recall has to be kept in mind; and the protocol has to enable non-literate persons to participate by offering the appropriate answering channels (see OECD and Statistics Canada, 1995, for an idea of the substantial share of non-literate persons in Organisation for Economic Co-operation and Development countries).
The recall limitations are particularly important for short movements in surveys of daily travel, which are often not very salient for the respondents, and even more so in surveys of long-distance travel, which generally have reporting periods of two or more months (see Axhausen et al., 2002b).
The potentially selective non-response requires careful post-processing of the results of a travel diary. The omission of short movements leads to incomplete activity chains, but generally not to a large underestimation of travel times and distances (see, for example, Schüssler, 2010). The inflated share of days without any movements requires careful monitoring already during the survey period, as it is very hard to treat through imputation or reweighting (see Polak and Han, 1997, on expectation maximization approaches). Selective non-participation can be addressed by reweighting the sample, if one assumes that it is randomly distributed through the population.
In summary, the travel diary enables the study to obtain a reasonably complete measurement of the movements of the target population, in combination with a record of the selected socio-demographic and attitudinal variables. The unity of these three elements is its strength, but also its weakness, as the respondent’s burden is correspondingly high. Fortunately, transport and travel is a topic of public interest, motivating many residents to participate in such a survey. Still, the selective non-response, in particular of younger, male respondents, is worrying.
3 PASSIVE TRACING USING MODERN TECHNOLOGIES
The development of new technologies in the past decades has led to an increasing interest in the travel survey community in using these technologies to trace travel behaviour. This section briefly summarizes the major developments in this research area; for a more extensive discussion the reader is referred to Rieser-Schüssler (2012). First, the most prominent technologies are discussed. Then, the processing routines required to transform the raw data into usable observations of travel behaviour are introduced. Finally, some of the main differences between these observations and those obtained from traditional self-reporting surveys are elaborated.

3.1 Data Sources
A wide variety of data sources are available that can be used for tracking the travel behaviour of individuals. In the following, five of the currently most prevalent data sources are summarized:

● Global Positioning System (GPS) traces;
● GSM records;
● public transport operations data (PTOD);
● automatic number plate recognition (ANPR); and
● smart-phones and other combined data sources.
These data sources have very different characteristics with regard to their spatial and temporal resolution but also with respect to the amount of participant interaction they
require. The participant interaction can range from anonymous data that was collected for other purposes – for example, billing of public transport users or mobile phone customers – to full-scale GPS travel diaries where the participants carry dedicated GPS devices with them and answer extensive prompted recall questions. Another differentiating aspect is the extent to which the person’s daily travel behaviour is covered by the tracking technology. Person-based GPS or GSM allows the observation of complete daily patterns, whereas other methods are restricted to certain modes, for example, ANPR or PTOD.

3.1.1 Global Positioning System
Since the first GPS studies were conducted in the late 1990s (for example, Wagner, 1997; Casas and Arce, 1999; Murakami and Wagner, 1999; Yalamanchili et al., 1999; Draijer et al., 2000; Wolf, 2000), this survey technology has gained a lot of research attention and is now the most prominent self-tracing alternative for surveying travel behaviour. The raw data consist of a stream of positions with their timestamps and potentially quality and speed information. From these raw data, travel diaries – containing stages, trips, activities and a series of their attributes – are reconstructed with the help of post-processing routines.
The first studies were mainly vehicle based and concentrated on issues such as understanding the technology (de Jong and Mensonides, 2003), the accuracy of the measurements (for example, Wolf et al., 1999; Ogle et al., 2002) and verifying self-reported trip rates (Murakami and Wagner, 1999; Wolf et al., 2003; Du and Aultman-Hall, 2007). Recently, the focus has shifted towards person-based studies that allow the researcher to observe the entire travel behaviour of a person regardless of the mode used, including the activity locations. As a result, large-scale representative GPS household surveys at regional (for example, Washington, DC, metropolitan area: Wolf and Oliveira (2008); Greater Cincinnati area: Giaimo et al. (2010); or Jerusalem: Oliveira et al. (2011)) or even national (for example, France: Marchal et al., 2011, or the UK: Rofique et al., 2011) level are becoming increasingly common.
However, there are still several open research issues. First, the currently available processing routines can still be improved, especially regarding trip purpose detection. Second, some travel characteristics, for example the number of accompanying persons and trip costs, are currently not deducible from GPS traces without participant interaction, and it has to be investigated how those characteristics can be collected with minimum participant burden.

3.1.2 GSM records
Most GSM studies do not involve participant interaction. Analysts typically use the data that are originally recorded by mobile phone companies for billing purposes, that is, the phone’s position when it is used or when it moves to another mobile phone antenna or service area. The location is recorded in terms of the serving antenna, the network cell or the area code. One ethical issue with these data is that the mobile phone users are usually not asked for their permission to share them. To address this privacy concern, mobile phone network providers typically make no or very little information about the mobile phone owner available. From a modeller’s perspective, one of the main disadvantages of this data source – compared with GPS – is the fairly low spatial and random temporal
Self-tracing and reporting 139 resolution. However, GSM tracks have the advantage of potentially huge sample sizes because no participants have to be recruited and the penetration rate of mobile phones is high, especially in developing countries. Interesting GSM applications have been presented by Schlaich et al. (2010), who investigated route choice on high-order networks, and Ahas et al. (2010) and Bekhor et al. (forthcoming) who were interested in commuting behaviour and anchor points and longdistance travel patterns, respectively. However, since there is no standard application yet, the processing routines have so far been customized to the specific research questions. 3.1.3 Automatic number plate recognition Automatic number plate recognition is an increasingly popular component in traffic monitoring as well as in the enforcement of road-pricing schemes but the resulting data can also be used for travel behaviour modelling. Automatic number plate recognition cameras are installed along roads and take (infra-red) pictures of passing vehicles’ number plates. The actual plate numbers are then extracted from the pictures using optical character recognition algorithms and stored in a database together with the identity and position of the recording camera, the timestamp and the photo. Past applications include the estimation of origin-destination matrices and route choice models (Friedrich et al., 2008), the calibration of simulation models (Choudhury et al., 2011) and enforcement of the London road-pricing scheme. However, because APNR systems are expensive, their coverage of the road network varies greatly between countries, and behavioural modellers will depend on these systems to be installed for different purposes. 3.1.4 Public transport operations data A data source that is currently growing fast is PTOD. Public transport operators around the world have started to use automatic passenger counting (APC), vehicle location (AVL) and automatic fare collection (AFC) systems to optimize their daily operations and simplify the payment process for customers and operators alike. The resulting abundance of data can be used for a wide variety of analyses. The most common application is the estimation of OD matrices based on AFC and – if available – AVL data, and usually a spatial representation of the public transport network (Farzin, 2008; Wilson et al., 2009; Chakirov and Erath, 2011; Chapleau et al., 2011; Munizaga and Palma, 2012). Other applications are the observation of public transport connection choice (Wilson et al., 2009; Chapleau et al., 2011) and the investigation of transfer patterns and passenger flows between intersecting routes (Seaborn et al., 2009). The huge amount of data is both a blessing and a curse. On the one hand, it allows a detailed insight into the system and captures the behaviour of substantially more passengers than traditional (on-board) surveys will ever be able to. On the other hand, the large data volumes are a challenge for the analyst, and well-designed data storage systems and filtering techniques are crucial. 3.1.5 Smart-phones and other combined data sources The newest player on the survey technology market is the smart-phone that offers the opportunity to combine GPS tracks with other data from sources such as accelerometer, WiFi, Bluetooth or GSM. Additionally, the smart-phone can log the phone activities
conducted by the participant (Chen et al., 2009), transactions with public transport fare systems, and so on. The different types of information can then be used to improve the reconstruction of travel diary elements such as modes or activity purposes (Hurtubia et al., 2009).
In the past, there were two main obstacles to large-scale smart-phone transport surveys: inaccurate GPS measurements and short battery run-times caused by the energy-hungry GPS measurements. The accuracy issue has been solved in the newest generation of smart-phones, but the battery run-times are still not long enough to observe entire travel days. Therefore, a few teams of researchers (for example, Pereira et al., 2011) are currently working on better battery management strategies.
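As a concrete illustration of how combined sensors can stretch battery life, one family of battery management strategies keeps the GPS receiver off while the accelerometer indicates that the phone is not moving. The sketch below shows the gating logic only; the thresholds and the sensor interface are assumptions for illustration, not the approach of Pereira et al. (2011).

```python
# Hedged sketch: accelerometer-gated GPS duty cycling on a smart-phone.
# Thresholds and the sensor interface are illustrative assumptions.
STILL_VARIANCE = 0.05   # (m/s^2)^2; below this the phone is assumed still
STILL_PERIOD_S = 120    # switch GPS off after this long without movement

def gps_should_run(accel_variances, sample_interval_s=10):
    """accel_variances: recent accelerometer variance readings, newest last.
    Returns True if the GPS receiver should be kept (or switched) on."""
    n_recent = max(1, STILL_PERIOD_S // sample_interval_s)
    recent = accel_variances[-n_recent:]
    # Keep GPS running unless every recent reading indicates no movement.
    return any(v > STILL_VARIANCE for v in recent)
```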
3.2 Processing the Raw Data
The processing routines required to transform raw data into data-sets usable for model estimation strongly depend on the modelling purpose and the data source. However, the following set of processing steps can be seen as a standard:

● cleaning and smoothing;
● detection of trips, stages and activities;
● mode detection;
● activity purpose imputation;
● spatial matching; and
● usage of user input.
Each of these steps has its own challenges, which are discussed below.

3.2.1 Cleaning and smoothing
Cleaning the data is one of the most important processing steps for all data sources, because measurement errors will occur with any survey technology. The two major types of error are inaccurate measurements and missing measurements. While the recording gaps resulting from missing measurements have to be accounted for in the subsequent imputation steps, the cleaning step detects and removes or corrects inaccurate measurements, because they can cause misleading imputations.
The correction of inaccurate measurements is only possible for minor deviations and high sampling frequencies, and is usually done by smoothing the positioning data (for example, Ogle et al., 2002; Chung and Shalaby, 2005; Jun et al., 2007; Schüssler and Axhausen, 2009b). For the removal of wrong measurements, a variety of filtering mechanisms have been developed for the different measurement technologies. For GPS measurements, the number of satellites in view and the dilution of precision (DOP) values are the most efficient filtering criteria (for example, Wolf et al., 1999; Ogle et al., 2002; Rieser-Schüssler et al., 2011). If these values do not suffice or are not available, other measures such as a filter for jumps in position (Schüssler and Axhausen, 2009b) can be employed. Similarly, erroneous GSM measurements are detected by searching for short-interval switches between neighbouring antennas or unreasonable switches between antennas located too far apart within short periods of time (Bekhor et al., forthcoming). A minimal filtering sketch along these lines is given below.
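The sketch combines the criteria just listed: a satellite-count and DOP filter, a jump filter based on an implied-speed bound, and a simple moving-average smoother. The thresholds and the point layout are illustrative assumptions rather than values from the cited studies.

```python
# Hedged sketch of GPS cleaning: quality filter, jump filter, smoothing.
# All thresholds are assumed for illustration.
def clean_and_smooth(points, max_hdop=5.0, min_sats=4,
                     max_speed_ms=70.0, window=3):
    """points: time-ordered dicts with keys t (s), x, y (m), hdop, sats."""
    # 1. Drop low-quality fixes (few satellites or high dilution of precision).
    pts = [p for p in points if p["sats"] >= min_sats and p["hdop"] <= max_hdop]
    # 2. Drop positional jumps, i.e. fixes implying an impossible speed.
    kept = pts[:1]
    for p in pts[1:]:
        q = kept[-1]
        dt = max(p["t"] - q["t"], 1e-6)
        speed = ((p["x"] - q["x"]) ** 2 + (p["y"] - q["y"]) ** 2) ** 0.5 / dt
        if speed <= max_speed_ms:
            kept.append(p)
    # 3. Moving-average smoothing of the surviving coordinates.
    smoothed = []
    for i, p in enumerate(kept):
        lo, hi = max(0, i - window // 2), min(len(kept), i + window // 2 + 1)
        xs = [q["x"] for q in kept[lo:hi]]
        ys = [q["y"] for q in kept[lo:hi]]
        smoothed.append({**p, "x": sum(xs) / len(xs), "y": sum(ys) / len(ys)})
    return smoothed
```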
3.2.2 Detection of trips, stages and activities
The best procedure for segmenting the time-space trajectories into trips, stages and activities strongly depends on the data source and on the spatial and temporal resolution of the data. For sparse data, such as GSM or AFC without alighting transactions, only approximations are possible. Thus, currently the only survey technology for which more precise methods have been developed is GPS. Typical criteria for detecting the end of a stage in GPS tracks are signal loss for a certain amount of time (for example, Doherty et al., 2001; Wolf et al., 2004), speeds close to zero (for example, Schönfelder et al., 2006; Tsui and Shalaby, 2006; Schüssler and Axhausen, 2009b) or a high density of GPS points (for example, Doherty et al., 2001; Stopher et al., 2005; Schüssler and Axhausen, 2009b). Another end-of-stage indicator is a mode transfer, which can be characterized either by one of the phenomena above or by a change between walking and another mode detectable in the speed and acceleration patterns (Tsui and Shalaby, 2006; Rieser-Schüssler et al., 2011).
Although many procedures have been developed and tested during the past decade, some issues remain open and are not easy to solve. On the one hand, the procedures often depend on parameters that have to be set by the analyst and calibrated for each new device type, because the measurement accuracy of the devices varies substantially. Moreover, there is a trade-off between being able to detect very short activities and not detecting an activity at every traffic light or other short wait. On the other hand, there is the cold start problem, that is, the problem that GPS devices need some time after a longer recording gap to reacquire the positions of the satellites; this has been addressed by only a few researchers (for example, Stopher et al., 2005).

3.2.3 Mode detection
Performing satisfactory mode detection is so far only possible for data with a high spatial and temporal resolution, that is, person-based GPS. A variety of methods and evaluation criteria have been proposed over recent years. Many of the approaches are rule-based (for example, de Jong and Mensonides, 2003; Chung and Shalaby, 2005; Stopher et al., 2005; Bohte and Maat, 2009; Marchal et al., 2011) and use criteria such as average or maximum speed, duration of the stage, data quality or proximity to certain network elements (for example, roads, bus stops or train stations) to derive deterministically the best-fitting mode. Other approaches employ fuzzy logic (Tsui and Shalaby, 2006; Schüssler and Axhausen, 2009b) or Bayesian inference models (Zheng et al., 2008; Moiseeva et al., 2010) with similar criteria, but accounting for the fact that many modes have overlapping characteristics, particularly in urban settings, and can therefore only be distinguished with a certain probability.
A special challenge is the detection of underground travel modes. Since there is no or only very poor GPS reception in tunnels and subway systems, little usable information is available for these stages. Chen et al. (2010) tested a subway detection for New York City that takes into account the location of the available points in relation to subway links and stops and the distance between the start and the end of the recording gap. They found, however, that the detection rates for subway travel were poor compared with those of other modes.
One approach to filling this gap might be the use of additional data such as accelerometer, Bluetooth, WiFi or GSM data collected, for example, with smart-phones.
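To make the detection logic of the last two subsections concrete, the sketch below first groups a cleaned GPS track into stages and activities with a low-speed dwell criterion, and then assigns each stage a mode with deterministic speed rules. All thresholds are illustrative assumptions; operational routines calibrate them per device, combine several criteria, or replace the rules with probabilistic classifiers.

```python
# Hedged sketch: dwell-based segmentation plus rule-based mode detection.
# Thresholds are assumptions for illustration, not calibrated values.
DWELL_SPEED_MS = 0.5    # below this speed a point counts as "stopped"
MIN_ACTIVITY_S = 120    # stops shorter than this are treated as waits

def split_segments(points):
    """points: time-ordered dicts with t (s) and speed (m/s). Returns a
    list of (kind, points) with kind in {'stage', 'activity'}."""
    # 1. Group consecutive points by moving/stopped state.
    runs = []
    for p in points:
        moving = p["speed"] >= DWELL_SPEED_MS
        if runs and runs[-1][0] == moving:
            runs[-1][1].append(p)
        else:
            runs.append((moving, [p]))
    # 2. Long stops become activities; short stops (traffic lights, brief
    #    waits) are merged into the surrounding stage.
    segments = []
    for moving, pts in runs:
        duration = pts[-1]["t"] - pts[0]["t"]
        kind = "stage" if moving or duration < MIN_ACTIVITY_S else "activity"
        if segments and kind == "stage" and segments[-1][0] == "stage":
            segments[-1][1].extend(pts)
        else:
            segments.append((kind, pts))
    return segments

def detect_mode(stage_points):
    """Deterministic speed rules; real systems add network proximity and
    acceleration patterns, or use fuzzy/Bayesian classifiers instead."""
    speeds = [p["speed"] for p in stage_points]
    mean, peak = sum(speeds) / len(speeds), max(speeds)
    if peak < 2.5:
        return "walk"
    if peak < 8.0:
        return "bicycle"
    if mean > 25.0:
        return "rail"
    return "car/bus"
```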
3.2.4 Activity purpose imputation
Activity purpose detection has been addressed by considerably fewer researchers than mode detection, probably because it is even more challenging. Most approaches presented so far (for example, Stopher et al., 2007; Bohte and Maat, 2009; Moiseeva et al., 2010) use general land-use information and frequently visited activity locations, for example, home, work or regularly visited grocery stores. Additionally, some analysts (for example, Wolf et al., 2001, 2004; Kawasaki and Axhausen, 2009; Marchal et al., 2011; Shen and Stopher, 2012) evaluate temporal patterns such as the duration and time of day of the activity, the purpose of surrounding activities or repetitiveness. However, the results of these procedures are often not yet satisfactory, and participant input or manual checks are required. The main reasons are either the unavailability of land-use information or a high share of mixed land uses in the study area. The latter aspect is an even bigger challenge for multi-storey buildings with overlapping uses, such as malls or mixed-use commercial buildings. Moreover, temporal patterns may vary from person to person. Thus, learning personal temporal patterns and frequently visited activity locations might be required to improve the activity purpose detection results.

3.2.5 Spatial matching
Spatial matching entails the matching of trips to networks and of activities to specific locations or points of interest. The spatial matching of activities is required for the activity purpose detection described above or for the derivation of origin-destination matrices. However, the research focus has been on the matching of trips to the network, called map-matching. The state of the art has moved from simple but error-prone procedures such as nearest-node or nearest-link detection (for example, White et al., 2000; Nielsen et al., 2004), to algorithms that ensure topological consistency for one candidate path (for example, Doherty et al., 2001; Greenfeld, 2002; Quddus et al., 2003; Ochieng et al., 2004), to, finally, algorithms that develop a set of candidate paths and select the most likely candidate only after the whole GPS track, or large shares of it, has been evaluated (Pyo et al., 2001; Marchal et al., 2005; Schüssler, 2010; Bierlaire et al., 2013). A minimal illustration of the candidate-scoring idea is sketched below.
However, all these map-matching procedures require detailed and – most importantly – spatially correct networks, which are not always available even though topological navigation networks are widely used nowadays. Moreover, the treatment of longer recording gaps in the sequence of position observations remains an issue. Most procedures fill these gaps using assumptions such as the shortest path or the fewest turns; where such assumptions are used, they have to be accounted for in the modelling, though this has frequently been neglected so far. Another topic, which will become increasingly important, is the identification of the public transport connection used without having AFC data available. A first attempt at this is presented by Rieser-Schüssler and Axhausen (2013).
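The candidate-path idea referred to above can be illustrated in its simplest whole-track form: every topologically feasible candidate path is scored by how close the complete GPS track lies to its links, and the best-scoring path is selected only at the end. Real implementations (for example, Marchal et al., 2005; Schüssler, 2010) generate and prune candidates incrementally; the data structures here are illustrative assumptions.

```python
# Hedged sketch of whole-track candidate-path scoring for map-matching.
# The network representation and scoring rule are simplified assumptions.
def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment a-b (all 2-tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

def match_track(track, candidate_paths, node_xy):
    """track: list of (x, y) GPS points; candidate_paths: list of node-id
    sequences, each already topologically connected in the network;
    node_xy: node id -> (x, y). Returns the candidate whose links stay
    closest to the whole track (mean point-to-path distance)."""
    def score(path):
        segs = [(node_xy[u], node_xy[v]) for u, v in zip(path, path[1:])]
        return sum(min(point_segment_dist(p, a, b) for a, b in segs)
                   for p in track) / len(track)
    return min(candidate_paths, key=score)
```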
4 MODELLING TRAVEL BEHAVIOUR FROM PASSIVE TRACING VERSUS SELF-REPORTED DATA SOURCES
The shift from self-reported data sources to passive tracing technologies brings a number of new challenges and issues that modellers have to consider. The phenomena that can be observed are different, as is the composition of the observed population sample.
Some information that is routinely asked for in self-reporting surveys is difficult to retrieve with passive tracing technologies, while other information is available more accurately and in more detail. This implies that not all tracing technologies are adequate for all modelling purposes. Person-based GPS diaries are currently the data source closest to traditional self-reported diaries and are therefore preferred by modellers building the new generation of activity-based models. However, modellers have to take into account that not only the level of detail and accuracy differs between traditional diaries and GPS diaries, but also the observed behaviour can be quite different, as discussed in the next subsection. The differences between traditional self-reporting surveys and the other tracing technologies are even bigger. This challenges modellers to think outside their traditional modelling applications and to explore completely new modelling areas, such as the modelling of transfer patterns and passenger flows based on public transport operations data, or of crowd phenomena recorded by GSM.
The remainder of this section focuses on four of the most important differences between traditional self-reporting surveys and passive tracing technologies that every modeller has to consider before using data from passive tracing technologies for her models. First, there are differences in the socio-economic characteristics of the observed population sample, which originate from dissimilar recruiting strategies but also from the fact that the willingness to participate in a certain type of study varies between population segments. The change in participant characteristics is also a contributing factor to the second major difference: the difference in observed behavioural patterns, for example, trip rates, trip distances, and so on. Another important reason is that the type and source of observation errors vary substantially between survey methods. Third, there is the question of which information is obtainable through which data source, as discussed already above. Finally, there is the issue of participant burden for survey techniques that require participant interaction. Despite the long-standing promise that it can be reduced through the use of new technologies, participants are currently often faced with a larger rather than a smaller burden.
4.1 Differences in Population Sample Characteristics
Obtaining a sample that is representative of the study area is a major goal of all transport behaviour studies, since it is essential for representative transport behaviour models. However, even traditional self-reporting surveys struggle to reach this goal, because certain population segments are harder to reach than others. For surveys based on passive tracing technologies that actively recruit participants this issue remains; however, the population segments that are hard to reach are different. Studies addressing this issue (for example, Bricka, 2008; Marchal et al., 2008; Stopher, 2009) found that these differences can be significant, particularly with respect to the age, income, education and household size of the participants. Modellers need to account for these differences in the population sample characteristics, for example by reweighting the observations.
Not as easily solvable is the difference in self-selection bias with regard to transport behaviour between self-reporting studies and passive tracing studies. A few studies (for example, Bricka and Bhat, 2006; Stopher and Greaves, 2010; Bricka et al., 2012) have recently looked at this issue and found that, even after controlling for socio-demographic characteristics, persons who are very mobile – both in the sense of trip frequency and trip
distance – are more likely to participate in technology-based studies than in traditional travel diaries. Accounting for this in our models imposes a new challenge on modellers. Therefore, more research is needed to investigate the magnitude of this effect as well as ways to properly correct for it.
Another important issue with respect to technology-based studies based on active recruiting has received little attention so far: the missing observation of children’s travel behaviour. In traditional travel diary studies, a diary is required for each household member aged 5 (or 6) and over. However, due to ethical considerations and concerns about how a younger child will handle a GPS device (for example, lend it to friends, and so on), usually only household members over the age of 12 (or 14) are equipped with GPS devices. Some studies include paper diaries for younger children but, as Stopher and Prasad (2012) showed in a recent paper, this raises new issues due to the differences between travel behaviour observed through diaries and through passive tracing technologies, which are discussed more extensively in the next section.
For studies without active recruitment of participants, such as PTOD, ANPR or most GSM studies, the issue of representativeness becomes even harder to solve. First, there is usually no or only little information available about the socio-demographic characteristics of the observed individuals, because the data provider either does not collect this information or is not willing to share it due to privacy concerns. Second, the modeller often has to suspect a certain bias in the observed population sample, based either on travel-related characteristics (for example, frequent public transport users are more likely to own a smart card) or on characteristics not related to travel behaviour, such as the choice of a mobile phone network provider. How to deal with this potential bias as a modeller is still an open research issue that has been largely ignored so far.
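A standard correction for the compositional problems described above is to reweight the observations so that the weighted sample reproduces known population margins (for example, age and sex distributions from a census). The sketch below implements basic iterative proportional fitting (raking); the variable names and margins are illustrative assumptions, and raking cannot, by itself, correct for self-selection on unobserved mobility.

```python
# Hedged sketch: raking (iterative proportional fitting) of survey weights
# to known population margins. Margins below are purely illustrative.
def rake(records, margins, n_iter=50):
    """records: list of dicts with categorical fields; margins: dict
    field -> {category: population share}. Returns one weight per record."""
    weights = [1.0] * len(records)
    for _ in range(n_iter):
        for field, target in margins.items():
            # Current weighted share of each category of this field.
            totals = {c: 0.0 for c in target}
            for w, r in zip(weights, records):
                totals[r[field]] += w
            wsum = sum(weights)
            # Scale each weight so weighted shares match the target margin.
            for i, r in enumerate(records):
                weights[i] *= target[r[field]] / (totals[r[field]] / wsum)
    return weights

# Example with hypothetical margins:
sample = [{"sex": "m", "age": "young"}, {"sex": "f", "age": "old"},
          {"sex": "f", "age": "young"}, {"sex": "m", "age": "old"}]
margins = {"sex": {"m": 0.49, "f": 0.51}, "age": {"young": 0.30, "old": 0.70}}
print([round(w, 2) for w in rake(sample, margins)])
```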
4.2 Differences in Observed Travel Behaviour
In addition to the behavioural differences resulting from the differences in sample composition and self-selection bias discussed above, several studies (for example, Wolf et al., 2003; Bricka and Bhat, 2006; Stopher and Greaves, 2010; Bricka et al., 2012) have shown that the different survey techniques themselves lead to differences in the observed travel behaviour. These studies compare the self-reported travel diaries of the participants with their GPS traces and with the travel behaviour imputed from those traces. The differences mainly originate from the fact that no method for observing travel behaviour is free from errors, but the type of error differs substantially between survey methods. Errors in surveys based on passive tracing are mainly due to inaccurate position data, wrong imputation or device failures, whereas errors in traditional self-reporting surveys spring from the participants’ inability or unwillingness to report their behaviour correctly and completely. Thus, trips or (short) activities are more frequently missing in self-reporting surveys because participants forget about them; travel dates might be mixed up; and travel distances and times are rounded more or less correctly. The errors associated with passive tracing technologies follow different patterns: while device failures usually occur randomly, other errors might be spatially correlated, for example because GPS reception can be difficult in central business districts. These different error patterns require different correction techniques, so modellers should be aware of the particular characteristics of their survey technique.
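Quantifying such differences requires linking each self-reported trip to a GPS-derived trip before under-reporting can be counted. A simple and common rule is to match trips whose time windows overlap sufficiently; the sketch below uses that rule with an assumed overlap threshold.

```python
# Hedged sketch: match self-reported trips to GPS-derived trips by
# temporal overlap; the 50 per cent threshold is an assumption.
def match_trips(reported, gps, min_overlap=0.5):
    """reported, gps: lists of (start, end) times in seconds. Returns the
    matched index pairs and the indices of unmatched GPS trips, which are
    candidates for under-reported movements."""
    pairs, used = [], set()
    for i, (rs, re) in enumerate(reported):
        best, best_frac = None, 0.0
        for j, (gs, ge) in enumerate(gps):
            if j in used:
                continue
            overlap = max(0.0, min(re, ge) - max(rs, gs))
            frac = overlap / max(ge - gs, 1e-9)
            if frac > best_frac:
                best, best_frac = j, frac
        if best is not None and best_frac >= min_overlap:
            pairs.append((i, best))
            used.add(best)
    unmatched_gps = [j for j in range(len(gps)) if j not in used]
    return pairs, unmatched_gps
```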
4.3 Differences in the Obtainable Information
In addition to the differences in observed travel behaviour discussed in the previous section, the different survey approaches and technologies lead to differences in the observable attributes and in the level of detail. The new passive tracing technologies offer some advantages over traditional self-reporting studies, but also have some shortcomings.
The advantage of passive tracing technologies is that they provide a very high level of spatial and temporal detail that is difficult to achieve with traditional self-reporting surveys. A prominent example is the observation of the actual route choice of drivers or public transport users. Obtaining route descriptions from participants that are detailed enough that the analyst does not have to impute large parts of the route using assumptions, without overburdening the respondents, is challenging if not impossible, whereas some passive tracing technologies, for example GPS traces, deliver this information without any respondent input. However, estimating and applying models at this high spatial and temporal resolution implies new challenges. One of the challenges that has recently received some attention in the research community is the correct composition, structure and size of choice sets generated on high-resolution navigation networks. As Schüssler and Axhausen (2009a) show, generating choice sets of traditional size is not sufficient for these types of networks, because relevant routes are often missing from the choice set. But generating larger choice-set sizes leads to computational challenges, as discussed by Rieser-Schüssler et al. (2012) and Pillat et al. (2011). While Rieser-Schüssler et al. (2012) present a choice-set generation algorithm that generates well-composed choice sets in reasonable computation time, more research should be done in this area to further increase the behavioural realism of the choice sets without substantially increasing computation time.
A shortcoming of passive tracing technologies is that they are usually not able to provide many of the situational characteristics that are traditionally requested in self-reported studies and that impact travel behaviour. Imputation methods have been developed for some of these technologies. However, the achievable accuracy of these imputations greatly depends on the survey technology. Characteristics such as trip purpose, activity location and access to the observed transport system are more difficult to derive from PTOD or ANPR observations than from person-based GPS traces, whereas GSM observations allow only rough guesses about modes. For other attributes, such as the number of accompanying persons, the imputation is even more difficult and less explored. For GPS observations, Stopher et al. (2011) were the first to infer the number of accompanying household members by matching their trips. However, this provides only a subset of potentially accompanying persons. For more representative results, the integration of additional data, for example Bluetooth measurements via smart-phones (Hurtubia et al., 2009), might be necessary.
Another way to add attributes that cannot be observed through passive tracing technology is to ask the participants for their input. In principle, there are three main types of user input that can be collected: additional information about the movements, additional information about the participant, and corrections of any automatic processing results. Examples of additional information about the movements are the cost of public transport tickets, parking, and so on.
These items can also be added straightforwardly as additional attributes to the observed trips or activities, as the sketch below illustrates.
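As a minimal illustration, such prompted answers can be joined onto the processed trips through a shared trip identifier; the field names here are assumptions.

```python
# Hedged sketch: attach user-supplied attributes (e.g. ticket cost,
# parking fee) to automatically detected trips. Field names are assumed.
def attach_user_input(trips, answers):
    """trips: list of dicts each carrying a 'trip_id'; answers: dict
    trip_id -> dict of additional attributes from the prompted recall."""
    return [{**trip, **answers.get(trip["trip_id"], {})} for trip in trips]

trips = [{"trip_id": 1, "mode": "bus"}, {"trip_id": 2, "mode": "car"}]
answers = {1: {"ticket_cost": 2.40}, 2: {"parking_fee": 5.00}}
print(attach_user_input(trips, answers))
```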
Additional information about the participants can, for instance, comprise socio-economic information and attitudes, but also frequently visited locations, such as the home or work location, or the availability of a car. On the one hand, this information can be used in the choice modelling in the same way as in traditional studies. On the other hand, it can serve as input to the processing: frequently visited locations can be useful in trip purpose detection, whereas car ownership can help in mode detection.
The biggest challenge with regard to user input is the integration of corrections to the processing results collected in so-called prompted recall surveys. Until recently, the assumption was that the corrections made by the participants on the basis of the processing results provided the ground truth. However, a few recent studies (Stopher et al., 2011; Rieser-Schüssler et al., 2011) in which the participants were not guided by an interviewer have cast doubt on this. The changes made by some of the participants are implausible, and how to deal with this is a pending issue.
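One pragmatic response is to screen participant corrections automatically before accepting them. The sketch below flags edited trips whose implied speed is impossible for the stated mode or whose times overlap with an adjacent trip of the same person; the speed caps and the record layout are illustrative assumptions.

```python
# Hedged sketch: plausibility screening of prompted-recall corrections.
# Mode speed caps (m/s) and field names are assumptions for illustration.
MAX_SPEED = {"walk": 3.0, "bicycle": 12.0, "car": 60.0, "rail": 90.0}

def flag_implausible(trips):
    """trips: one person's time-ordered dicts with start, end (s), dist_m
    and mode. Returns indices of corrections to review, not accept."""
    flagged = set()
    for i, t in enumerate(trips):
        duration = t["end"] - t["start"]
        if duration <= 0:
            flagged.add(i)
            continue
        # Implied speed incompatible with the stated mode.
        if t["dist_m"] / duration > MAX_SPEED.get(t["mode"], 60.0):
            flagged.add(i)
        # Temporal overlap with the preceding trip.
        if i > 0 and t["start"] < trips[i - 1]["end"]:
            flagged.update({i - 1, i})
    return sorted(flagged)
```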
4.4 Differences in Participant Burden
One major expectation for technology-based surveys – besides more accurate observation – was always a reduction of the participant burden. This would have benefited participants and researchers alike, and researchers were eager to translate some of the burden reduction into longer observation periods. On the one hand, longer observation periods would compensate for the costs caused by dropping off and picking up the devices; on the other hand, they allow multi-day or even multi-week patterns to be studied. However, currently only surveys without participant interaction meet this expectation, and they are not able to provide the full set of data required by many modern travel behaviour models.
For new technology surveys that require participant interaction, the participant burden strongly depends on the extent and design of the prompted recall survey and on the quality of the processed results. Frignani et al. (2010) report that it took their respondents about 13 minutes per recording day to fill out the prompted recall survey, which implies a significant burden, as confirmed by 64 per cent of their respondents in an exit interview. Little has been published about the completion times of other prompted recall surveys, but one can assume that they are in a similar range, since the prompted recall survey by Frignani et al. (2010) was not particularly cumbersome and asked only for basic information. Thus, a real reduction in participant burden can only be achieved when the prompted recall element can be dropped or at least scaled down substantially. For this, however, several aspects of the processing routines still need to be improved.
ACKNOWLEDGEMENTS

We would like to thank our collaborators and the reviewer for their critical comments and suggestions. In particular, we thank Claude Weis, Alex Erath and Lara Montini. All remaining errors are our own. Parts of this chapter, in particular the description of the modern technologies, have already been published in similar form in Rieser-Schüssler (2012).
NOTES

1. Or its equivalent http://en.eurobilltracker.com/. Note that the American site is currently blocked to non-US IP addresses.
2. Assumed here as the location where the person is most often observed at night. For the discussion of multi-locality and the vagueness of the term ‘home’ see, for example, Petzold (2011).
REFERENCES

Ahas, R., A. Aasa, S. Silm and M. Tiru (2010), ‘Daily rhythms of suburban commuters’ movements in the Tallinn metropolitan area: case study with mobile positioning data’, Transportation Research Part C: Emerging Technologies, 18 (1), 45–54.
Axhausen, K.W. (1995), ‘Travel diaries: an annotated catalogue’, working paper, Institut für Straßenbau und Verkehrsplanung, Leopold-Franzens-Universität, Innsbruck.
Axhausen, K.W. (2008), ‘Definition of movement and activity for transport modelling’, in D.A. Hensher and K.J. Button (eds), Handbook of Transport Modelling, 2nd edn, Oxford: Elsevier, pp. 329–44.
Axhausen, K.W. and C. Weis (2010), ‘Predicting response rate: a natural experiment’, Survey Practice, 3, accessed 20 January 2014 at http://surveypractice.org/2010/04/14/predicting-response-rate/.
Axhausen, K.W., J.-L. Madre, J.W. Polak and P.L. Toint (eds) (2002a), Capturing Long Distance Travel, Baldock, Herts: Research Science Press.
Axhausen, K.W., A. Zimmermann, S. Schönfelder, G. Rindsfüser and T. Haupt (2002b), ‘Observing the rhythms of daily life: a six-week travel diary’, Transportation, 29 (2), 95–124.
Bekhor, S., Y. Cohen and C. Solomon (forthcoming), ‘Evaluating long-distance travel patterns in Israel by tracking cellular phone positions’, Journal of Advanced Transportation.
Bierlaire, M., J. Chen and J.P. Newman (2013), ‘A probabilistic map matching method for smartphone GPS data’, Transportation Research Part C: Emerging Technologies, 26 (January), 78–98.
Bohte, W. and K. Maat (2009), ‘Deriving and validating trip purposes and travel modes for multi-day GPS-based travel surveys: a large-scale application in the Netherlands’, Transportation Research Part C: Emerging Technologies, 17 (3), 285–97.
Bricka, S. (2008), ‘Non-response challenges in GPS-based surveys’, paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, May.
Bricka, S. and C.R. Bhat (2006), ‘A comparative analysis of GPS-based and travel survey-based data’, Transportation Research Record: Journal of the Transportation Research Board (1972), 9–20.
Bricka, S., S. Sen, R. Paleti and C.R. Bhat (2012), ‘A comparative analysis of GPS-based and travel survey-based data’, Transportation Research Part C: Emerging Technologies, 21 (1), 67–88.
Casas, J. and C. Arce (1999), ‘Trip reporting in household travel diaries: a comparison to GPS-collected data’, paper presented at the 78th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Centre d’études sur les réseaux, les transports, l’urbanisme et les constructions publiques (CERTU) (2008), ‘L’enquête ménage déplacements “méthode standard”’, Collections du CERTU, éditions du CERTU, Lyon.
Chakirov, A. and A. Erath (2011), ‘Use of public transport smart card fare payment data for travel behaviour analysis in Singapore’, paper presented at the 16th International Conference of the Hong Kong Society for Transportation Studies, Hong Kong, December.
Chapleau, R., K.K.A. Chu and B. Allard (2011), ‘Synthesizing AFC, APC, GPS and GIS data to generate performance and travel demand indicators for public transit’, paper presented at the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Chen, C., H. Gong, C. Lawson, E. Bialostozky and J. Muckell (2010), ‘Evaluating the feasibility of a passive travel survey data collection in a complex urban environment: a case study in New York City’, paper presented at the 89th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Chen, J., J. Newman and M. Bierlaire (2009), ‘Modeling route choice behavior from smart-phone GPS data’, paper presented at the 12th International Conference on Travel Behaviour Research (IATBR), Jaipur, December.
Chikaraishi, M., A. Fujiwara, J. Zhang and K.W. Axhausen (2009), ‘Exploring variation properties of departure time choice behavior using multilevel analysis approach’, Transportation Research Record: Journal of the Transportation Research Board (2134), 10–20.
Chikaraishi, M., J. Zhang, A. Fujiwara and K.W. Axhausen (2011), ‘Identifying variations and co-variations in discrete choice models’, Transportation, 38 (6), 993–1016.
Choudhury, C.F., S.S. Rajiwade, S.R. Rapolu, M.E. Ben-Akiva and A. Emmonds (2011), ‘Evaluating the impact of interventions on network capacity’, paper presented at the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Chung, E.-H. and A. Shalaby (2005), ‘A trip reconstruction tool for GPS-based personal travel surveys’, Transportation Planning and Technology, 28 (5), 381–401.
De Jong, R. and W. Mensonides (2003), ‘Wearable GPS device as a data collection method for travel research’, working paper, ITS-WP-03-02, Institute of Transport Studies, University of Sydney, Sydney.
Dillman, D.A. (2000), Mail and Internet Surveys: The Tailored Design Method, 2nd edn, New York: Wiley.
Doherty, S.T., C. Noel, M.E.H. Lee-Gosselin, C. Sirois, M. Ueno and F. Theberge (2001), ‘Moving beyond observed outcomes: integrating Global Positioning Systems and interactive computer-based travel behaviour surveys’, Transportation Research E-Circular, C026 (March), 449–66.
Draijer, G., N. Kalfs and J. Perdok (2000), ‘Global Positioning System as data collection method for travel research’, Transportation Research Record: Journal of the Transportation Research Board (1719), 147–53.
Du, J. and L. Aultman-Hall (2007), ‘Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: automatic trip end identification issues’, Transportation Research Part A: Policy and Practice, 41 (3), 220–32.
Farzin, J.M. (2008), ‘Constructing an automated bus origin-destination matrix using farecard and Global Positioning System data in São Paulo, Brazil’, Transportation Research Record: Journal of the Transportation Research Board (2072), 30–37.
Friedrich, M., P. Jehlicka and J. Schlaich (2008), ‘Automatic number plate recognition for the observance of travel behavior’, paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, May.
Frignani, M.Z., J. Auld, A.K. Mohammadian, C. Williams and P. Nelson (2010), ‘Urban travel route and activity choice surveys (UTRACS): an internet-based prompted recall activity travel survey using GPS data’, paper presented at the 89th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Giaimo, G., R. Anderson, L. Wargelin and P.R. Stopher (2010), ‘Will it work? Pilot results from the first large-scale GPS-based household travel survey in the United States’, Transportation Research Record: Journal of the Transportation Research Board (2176), 26–34.
González, M.C., C.A. Hidalgo and A.-L. Barabási (2008), ‘Understanding individual human mobility patterns’, Nature, 453 (5 June), 779–82.
Greenfeld, J.S. (2002), ‘Matching GPS observations to locations on a digital map’, paper presented at the 81st Annual Meeting of the Transportation Research Board, Washington, DC, January.
Hanson, S. and J.O. Huff (1982), ‘Assessing day-to-day variability in complex travel patterns’, Transportation Research Record: Journal of the Transportation Research Board (891), 18–24.
Hubert, J.-P., J. Armoogum, K.W. Axhausen and J.-L. Madre (2008), ‘Immobility and mobility seen through trip-based versus time use surveys’, Transport Reviews, 28 (5), 641–58.
Hurtubia, R., G. Flötteröd and M. Bierlaire (2009), ‘Inferring the activities of smartphone users from context measurements using Bayesian inference and random utility models’, paper presented at the European Transport Conference, Leeuwenhorst, October.
Jun, J., R. Guensler and J. Ogle (2007), ‘Smoothing methods to minimize impact of Global Positioning System random error on travel distance, speed, and acceleration profile estimates’, Transportation Research Record: Journal of the Transportation Research Board (1972), 141–50.
Kawasaki, T. and K.W. Axhausen (2009), ‘Choice set generation from GPS data set for grocery shopping location choice modelling in canton Zurich: the comparison to Swiss Microcensus 2005’, working paper, 595, IVT, ETH Zurich, Zurich.
Madre, J.-L., K.W. Axhausen and W. Brög (2007), ‘Immobility in travel diary surveys’, Transportation, 34 (1), 107–28.
Marchal, F., J.K. Hackney and K.W. Axhausen (2005), ‘Efficient map matching of large Global Positioning System data sets: tests on speed-monitoring experiment in Zürich’, Transportation Research Record: Journal of the Transportation Research Board (1935), 93–100.
Marchal, P., J.-L. Madre and S. Yuan (2011), ‘Post-processing procedures for person-based GPS data collected in the French National Travel Survey 2007–2008’, Transportation Research Record: Journal of the Transportation Research Board (3397), 47–54.
Marchal, P., S. Roux, S. Yuan, J.-P. Hubert, J. Armoogum, J.-L. Madre and M.E.H. Lee-Gosselin (2008), ‘A study of non-response in the GPS sub-sample of the French National Travel Survey 2007–2008’, paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, May.
Moiseeva, A., J. Jessurun and H.J.P. Timmermans (2010), ‘Semi-automatic imputation of activity travel diaries using GPS-traces, prompted recall and context-sensitive learning algorithms’, Transportation Research Record: Journal of the Transportation Research Board (2183), 60–68.
Munizaga, M.A. and C. Palma (2012), ‘Estimation of disaggregate multimodal public transport OD matrix from passive SmartCard data from Santiago, Chile’, Transportation Research Part C: Emerging Technologies, 24 (1), 9–18.
Murakami, E. and D.P. Wagner (1999), ‘Can using Global Positioning System (GPS) improve trip reporting?’, Transportation Research Part C: Emerging Technologies, 7 (2–3), 149–65.
Nielsen, O.A., C. Würtz and R.M. Jurgensen (2004), ‘Improved map-matching algorithms for GPS-data – methodology and test on data from the AKTA road pricing experiment in Copenhagen’, paper presented at the 19th European Conference for ESRI Users, Copenhagen, November.
Ochieng, W.Y., M.A. Quddus and R.B. Noland (2004), ‘Map matching in complex urban road networks’, Brazilian Journal of Cartography, 55 (2), 1–18.
Ogle, J., R. Guensler, W. Bachman, M. Koutsak and J. Wolf (2002), ‘Accuracy of Global Positioning System for determining driver performance parameters’, Transportation Research Record: Journal of the Transportation Research Board (1818), 12–24.
Oliveira, M., P. Vovsha, J. Wolf, Y. Birotker, D. Givon and J. Paasche (2011), ‘Global Positioning System-assisted prompted recall household travel survey to support development of advanced travel model in Jerusalem, Israel’, Transportation Research Record: Journal of the Transportation Research Board (2246), 16–23.
Organisation for Economic Co-operation and Development (OECD) and Statistics Canada (1995), Literacy, Economy and Society: Results of the First International Adult Literacy Survey, Paris: OECD.
Pereira, F.C., C.D. Cottrill, M. Abou-Zeid, Y. Xiang, I. Dias, J. Santos, M.E. Ben-Akiva and J.A. Silva (2011), ‘Integrated transportation activity-travel smartphone survey’, paper presented at the 9th International Conference on Survey Methods in Transport, Termas de Puyehue, November.
Petzold, K. (2011), ‘Measuring “job-related” multi-locality – overview and conceptual framework’, in C. Larsen, R. Hasberg, A. Schmid, M. Bittner and F. Clément (eds), Measuring Geographical Mobility in Regional Labour Market Monitoring – State of the Art and Perspectives, München: Rainer Hampp Verlag, pp. 235–46.
Pillat, J., E. Mandir and M. Friedrich (2011), ‘Dynamic choice set generation based on a combination of GPS trajectories and stated preference data’, paper presented at the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Polak, J.W. and X.L. Han (1997), ‘Iterative imputation based methods for unit and item nonresponse in travel diary surveys’, paper presented at the 8th Meeting of the International Association for Travel Behaviour Research, Austin, September.
Pyo, J.-S., D.-H. Shin and T.-K. Sung (2001), ‘Development of a map matching method using the multiple hypothesis technique’, paper presented at the Intelligent Transportation Systems Conference (ITSC), Oakland, August.
Quddus, M.A., W.Y. Ochieng, L. Zhao and R.B. Noland (2003), ‘A general map matching algorithm for transport telematics applications’, GPS Solutions, 7 (3), 157–67.
Richardson, A.J. (2003), ‘Behavioral mechanisms of nonresponse in mail-back travel surveys’, Transportation Research Record: Journal of the Transportation Research Board (1855), 191–9.
Richardson, A.J., E.S. Ampt and A.H. Meyburg (1995), Survey Methods for Transport Planning, Melbourne: Eucalyptus Press.
Rieser-Schüssler, N. (2012), ‘Capitalising modern data sources for observing and modelling transport behaviour’, Transportation Letters, 4 (2), 115–28.
Rieser-Schüssler, N. and K.W. Axhausen (2013), ‘Identifying chosen public transport connections from GPS observations’, paper presented at the 92nd Annual Meeting of the Transportation Research Board, Washington, DC, January.
Rieser-Schüssler, N., M. Balmer and K.W. Axhausen (2012), ‘Route choice sets for very high-resolution data’, Transportmetrica, doi:10.1080/18128602.2012.671383.
Rieser-Schüssler, N., L. Montini and C. Dobler (2011), ‘Improving automatic post-processing routines for GPS observations using prompted-recall data’, paper presented at the 9th International Conference on Survey Methods in Transport, Termas de Puyehue, November.
Rofique, J., A. Humphrey and C. Killpack (2011), ‘National Travel Survey 2011 GPS pilot field report’, research report prepared for the Department for Transport, London.
Schlaich, J., T. Otterstätter and M. Friedrich (2010), ‘Generating trajectories from mobile phone data’, paper presented at the 89th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Schlich, R. (2003), ‘Homogenous groups of travellers’, paper presented at the 10th International Conference on Travel Behaviour Research, Lucerne, August.
Schönfelder, S., H. Li, R. Guensler, J. Ogle and K.W. Axhausen (2006), ‘Analysis of commute Atlanta instrumented vehicle GPS data: destination choice behavior and activity spaces’, paper presented at the 85th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Schüssler, N. (2010), ‘Accounting for similarities between alternatives in discrete choice models based on high-resolution observations of transport behaviour’, PhD thesis, ETH Zurich, Zurich.
Schüssler, N. and K.W. Axhausen (2009a), ‘Accounting for route overlap in urban and suburban route choice decisions derived from GPS observations’, paper presented at the 12th International Conference on Travel Behaviour Research (IATBR), Jaipur, December.
Schüssler, N. and K.W. Axhausen (2009b), ‘Processing GPS raw data without additional information’, Transportation Research Record: Journal of the Transportation Research Board (2105), 28–36.
Seaborn, C., J. Attanucci and N.H.M. Wilson (2009), ‘Analyzing multimodal public transport journeys in London with smart card fare payment data’, Transportation Research Record: Journal of the Transportation Research Board (2121), 55–62.
Shen, L. and P.R. Stopher (2012), ‘An improved process for trip purpose imputation from GPS travel data’, paper presented at the 13th International Conference on Travel Behaviour Research (IATBR), Toronto, July.
Stopher, P.R. (2009), ‘Collecting and processing data from mobile technologies’, in P. Bonnel, J. Zmud, M.E.H. Lee-Gosselin and J.-L. Madre (eds), Transport Survey Methods: Keeping Up with a Changing World, Bingley: Emerald, pp. 361–91.
Stopher, P.R. and S. Greaves (2007), ‘Guidelines for samplers: measuring a change in behaviour from before and after surveys’, Transportation, 34 (1), 1–16.
Stopher, P.R. and S. Greaves (2010), ‘Missing and inaccurate information from travel surveys: pilot results’, working paper, ITS-WP-10-07, Institute of Transport Studies, University of Sydney, Sydney.
Stopher, P.R. and C. Prasad (2012), ‘Analysis of child diaries – can GPS traces of parents’ movements provide sufficient travel data for children?’, paper presented at the Australasian Transport Research Forum, Perth, September.
Stopher, P.R., Q. Jiang and C. FitzGerald (2005), ‘Processing GPS data from travel surveys’, paper presented at the 2nd International Colloquium on the Behavioural Foundations of Integrated Land-use and Transportation Models: Frameworks, Models and Applications, Toronto, June.
Stopher, P.R., Q. Jiang and C. FitzGerald (2007), ‘Deducing mode and purpose from GPS data’, paper presented at the 11th TRB National Transportation Planning Applications Conference, Daytona Beach, May.
Stopher, P.R., K. Kockelman, S. Greaves and E. Clifford (2008), ‘Sample size requirements for multi-day travel surveys: some findings’, paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, May.
Stopher, P.R., J. Zhang and C. Prasad (2011), ‘Evaluating and improving software for identifying trips, occupancy, mode and purpose from GPS traces’, paper presented at the 9th International Conference on Survey Methods in Transport, Termas de Puyehue, November.
Sudman, S., N.M. Bradburn and N. Schwarz (1996), Thinking about Answers, San Francisco, CA: Jossey-Bass.
Tsui, S.Y.A. and A. Shalaby (2006), ‘An enhanced system for link and mode identification for GPS-based personal travel surveys’, Transportation Research Record: Journal of the Transportation Research Board (1972), 38–45.
Wagner, D.P. (1997), ‘Lexington area travel data collection test: GPS for personal travel surveys’, final report, Office of Highway Policy Information and Office of Technology Applications, Federal Highway Administration, Battelle Transport Division, Columbus, OH, September.
Wermuth, M.J. (1985), ‘Non-sampling errors due to non-response in written household travel surveys’, in E.S. Ampt, A.J. Richardson and W. Brög (eds), New Survey Methods in Transport, Utrecht: VNU Science, pp. 349–65.
White, C.E., D. Bernstein and A.L. Kornhauser (2000), ‘Some map matching algorithms for personal navigation assistants’, Transportation Research Part C: Emerging Technologies, 8 (1–6), 91–108.
Wilson, N.H.M., J. Zhao and A. Rahbee (2009), ‘The potential impact of automated data collection systems on urban public transport planning’, in N.H.M. Wilson and A. Nuzzolo (eds), Schedule-Based Modeling of Transportation Networks: Theory and Applications, New York: Springer, pp. 75–99.
Wolf, J. (2000), ‘Using GPS data loggers to replace travel diaries in the collection of travel data’, PhD thesis, Georgia Institute of Technology, Atlanta.
Wolf, J. and M. Oliveira (2008), ‘Metropolitan Washington, D.C., household travel survey Global Positioning System pretest: results and applications for large-scale regional survey’, paper presented at the 87th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Wolf, J., R. Guensler and W. Bachman (2001), ‘Elimination of the travel diary – experiment to derive trip purpose from Global Positioning System travel data’, Transportation Research Record: Journal of the Transportation Research Board (1768), 125–34.
Wolf, J., S. Hallmark, M. Oliveira, R. Guensler and W. Sarasua (1999), ‘Accuracy issues with route choice data collection by using Global Positioning System’, Transportation Research Record: Journal of the Transportation Research Board (1660), 66–74.
Wolf, J., M. Oliveira and M. Thompson (2003), ‘Impact of underreporting on mileage and travel time estimates – results from Global Positioning System-enhanced household travel survey’, Transportation Research Record: Journal of the Transportation Research Board (1854), 189–98.
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:16:08AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Self-tracing and reporting 151 Positioning System traces’, Transportation Research Record: Journal of the Transportation Research Board (1870), 46–54. Xu, Y. and R. Guensler (2011), ‘Effective GPS-based panel survey sample size analysis for before-and-after studies using generalized estimating equation method’, paper presented at the 90th Annual Meeting of the Transportation Research Board, Washington, DC, January. Yalamanchili, L., R.M. Pendyala, N. Prabaharan and P. Chakravarty (1999), ‘Analysis of Global Positioning System-based data collection methods for capturing multistop trip-chaining behavior’, Transportation Research Record: Journal of the Transportation Research Board (1660), 58–65. Zheng, Y., L. Liu, L. Wang and X. Xie (2008), ‘Learning transportation mode from raw GPS data for geographic applications on the web’, paper presented at the 17th World Wide Web Conference, Bejing, April.
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:16:08AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
7 Stated choice experimental design theory: the who, the what and the why
John M. Rose and Michiel C.J. Bliemer
1 INTRODUCTION
Unlike most survey data, where information on both the dependent and independent variables is captured directly from respondents, stated preference data, where respondents make decisions in hypothetical markets and of which stated choice (SC) data is a subset, is unique in that typically only the dependent variable is provided by the respondent. With the exception of covariate information, which is often ignored in analysis, the primary variables of interest, consisting of attributes and their associated levels, are designed in advance and presented to the respondent in the form of competing alternatives in SC studies. However, increasing evidence of both an empirical (for example, Bliemer and Rose, 2011; Louviere et al., 2008) and theoretical nature (for example, Burgess and Street, 2005; Sándor and Wedel, 2001, 2002, 2005) suggests that the allocation of the attribute levels over the experiment may impact upon the model outputs obtained, particularly when small samples are involved. As such, rather than simply randomly assigning the attribute levels shown to respondents over the course of an experiment, experimental design theory has traditionally been applied to allocate the attribute levels to the alternatives in some systematic manner. A review of the literature, however, suggests that little consensus exists as to what specific experimental design theory, or aspects thereof, is appropriate for SC studies. Indeed, over the past three decades, research in the area of experimental design construction specifically related to SC studies has resulted in a number of what appear to be competing paradigms, many of which claim superiority in their approach. Unfortunately, claims of superiority require objective criteria, and over the years different researchers have selectively chosen criteria that support their chosen paradigm to the detriment of competing methods.
The objective of this chapter is therefore twofold. First, it is argued that, despite the disparate nature of the existent literature leading to multiple claims of methodological superiority, there exists only a single underlying theory related to the construction of SC experiments; the only thing that differs is the set of underlying assumptions made by various analysts. Secondly, in presenting this theory, it is shown how the various researchers in this field have actually relied on it, either implicitly or explicitly, but under very different sets of assumptions. It is these assumptions that define the different approaches to constructing SC experimental designs.
The remainder of this chapter is set out as follows. Section 2 discusses what exactly an experimental design is and why it is important. Section 3 outlines a number of decisions that are required prior to generating the experimental design. Section 4 then provides a discussion of the theory of experimental design as it relates to SC studies. Section 5 provides a selective historical overview of key research groups working in the area of experimental design theory for SC studies. Section 6 comprises brief concluding comments.
2 WHAT IS AN EXPERIMENTAL DESIGN?
Conceptually, an experimental design is simply a matrix of values that is used to determine what goes where in an SC survey. These values may be either numbers or labels, depending on how the analyst wishes to relate the information of the experiment to the respondents. The values that populate the matrix represent the attribute levels that will be used in the SC survey, whereas the rows and columns of the matrix represent the choice tasks (alternatively called choice sets or choice situations, depending on the literature cited), attributes (sometimes referred to as factors) and alternatives (referred to as profiles in the marketing literature and treatment combinations in the mainstream experimental design literature) of the experiment.
The layout of the design matrix is typically presented in one of two ways. Some researchers set up the experimental design matrix such that each row represents a different choice task and each column a different attribute within the experiment (see, for example, Bliemer and Rose, 2009; Rose and Bliemer, 2009). In such cases, groups of columns form different alternatives within each choice task. Other researchers, however, set up the design matrix such that each row of the matrix represents an individual alternative and each column a different attribute (see, for example, Carlsson and Martinsson, 2003; Huber and Zwerina, 1996; Kanninen, 2002; Sándor and Wedel, 2001, 2002). In these cases, multiple rows are grouped together to form individual choice tasks. Independent of how the matrix is set out, the experimental design performs the same function: the allocation of attribute levels to choice tasks.
The distribution of the levels of the design attributes over the experiment, via the underlying experimental design, may play a large part in whether or not an independent assessment of each attribute's contribution to the observed choices of sampled respondents can be determined. Further, the allocation of the attribute levels within the experimental design will also impact upon the statistical power of the experiment, in so far as it affects the experiment's ability to detect statistical relationships that may exist within the data. This ability is related to the sample size of the study; thus, given a large enough sample, the statistical power of an experimental design may not matter. Nevertheless, for the sample sizes more commonly used in practice, the ability to retrieve statistically significant parameter estimates may be compromised given the selection of a relatively poor design (sample size issues are discussed in section 5) or the need to detect particular effects, such as preference heterogeneity, using more complex or advanced econometric models that may require larger sample sizes than simpler model types. The experimental design chosen by the analyst may therefore play a significant role in SC studies.
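To make the two layouts concrete, the short sketch below builds the same two-alternative, two-attribute design in both formats. It is an illustration only: the attribute names and level values are invented, not drawn from any study cited above.

```python
import numpy as np

# Hypothetical design: 3 choice tasks, 2 alternatives (A, B), 2 attributes
# (price, time); all names and level values are invented for illustration.

# Layout 1 (e.g. the Bliemer and Rose convention): one row per choice task,
# with groups of columns forming the alternatives.
wide = np.array([
    # A.price  A.time  B.price  B.time
    [2.0,      10.0,   4.0,     5.0],
    [4.0,      5.0,    2.0,     10.0],
    [3.0,      7.5,    3.0,     7.5],
])

# Layout 2 (e.g. the Huber and Zwerina convention): one row per alternative,
# with rows grouped into choice tasks via a task index.
long = np.array([
    # task  alt  price  time
    [1,     1,   2.0,   10.0],
    [1,     2,   4.0,   5.0],
    [2,     1,   4.0,   5.0],
    [2,     2,   2.0,   10.0],
    [3,     1,   3.0,   7.5],
    [3,     2,   3.0,   7.5],
])

# Both layouts carry identical information: the allocation of attribute
# levels to alternatives within choice tasks.
assert wide.shape[0] * 2 == long.shape[0]
```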
3 CONSIDERATIONS BEFORE GENERATING THE DESIGN
For any SC study, there exist many experimental designs that can be constructed and used in practice. The aim for the analyst is to choose a particular method and construct
the design. How best to do this depends upon many different considerations. In this section, we discuss a number of decisions that have to be made prior to the design being generated. As such, these decisions are independent of the design generation process used; however, they may influence this process significantly.
The first decision required of the analyst is whether the experiment will be labelled (that is, the names of the alternatives have substantive meaning to the respondent beyond indicating their relative order of appearance; for example, the alternatives might be labelled Sherwood Forest, Cork Bark Forest, The Tongass) or unlabelled (that is, the names of the alternatives convey only their relative order of appearance; for example, Policy A, Policy B, Policy C). Aside from affecting what results can appropriately be derived from the study (for example, elasticities have no substantive meaning in unlabelled experiments), from the perspective of design generation this decision is important, as it might directly impact upon the type and number of parameters that can or will be estimated as part of the study. Typically, unlabelled experiments will involve the estimation of generic parameters only, whereas labelled experiments may involve the estimation of alternative-specific and/or generic parameter estimates, hence potentially resulting in more parameter estimates than with an identical, though unlabelled, experiment.
Alongside considerations as to how the alternatives are to be presented to respondents (that is, labelled or unlabelled), advance knowledge of the response mechanism to be used and the number of alternatives (including no-choice or status quo alternatives) to be shown to each respondent is also crucial. To date, most discrete choice studies use a response mechanism that captures first preference rankings only, via a 'pick one' response. Recent developments, however, have seen an increase in obtaining either partial or full rankings data via, say, a best/worst response (for example, Flynn et al., 2008). Aside from psychological issues surrounding various response mechanisms, the type of response and the number of alternatives shown may impact upon the dimensions of the design in terms of the minimum number of choice tasks required for estimation purposes. Unlike most other data types, where an observation typically represents information captured about a specific respondent or agent, in discrete choice data each alternative j represents a unique observation. This is because each alternative is observed to be chosen or not, hence providing information down to this level of detail. In grouping the alternatives together in choice tasks, there therefore exist J − 1 independent choice probabilities within each choice task, which will be estimated. As such, for first preference (pick one) tasks, the total number of independent choice probabilities obtained from any given design with S choice tasks will be equal to $S \times (J - 1)$, with the maximum number of parameters, K, that can be estimated from that design having to be less than or equal to this number. However, when rankings data are captured, it becomes possible to rank-explode the data such that new choice tasks are formed, consisting of subsets of the original J alternatives that remain after iteratively removing previously more highly preferred alternatives (in terms of the rank r assigned).
Instead of J − 1, the total number of modelled choice observations per choice task will now be equal to $\sum_{r=1}^{R}(J - r)$, where r = 1, 2, . . ., R is an index of the rankings, up to the total number of rankings, R, provided for a given choice task. For example, assume a respondent provides two ranking points (best and worst), hence R = 2. The maximum number of parameters that can be estimated is then given as
$S \times \sum_{r=1}^{R}(J - r)$. If only the first preference is captured, R = 1, and if the full ranking is captured, R = J − 1. As can be seen from this discussion, the size of the design (the number of choice tasks generated) is therefore a function of the number of parameters to be estimated, the number of alternatives shown and how the response mechanism relates to the estimation process to be used.
The number of attribute levels used is also an important pre-design decision, as is how the attributes are to be modelled, because both will influence the number of parameters that can be estimated. If non-linear effects are expected for a certain attribute, then more than two levels need to be used for this attribute in order to be able to estimate these non-linearities. Typically, dummy and/or effects coding is used to estimate such non-linear effects, in which case the number of parameters needed for each attribute is the number of levels minus one. Hence, the more levels used, the higher the minimum number of choice tasks required will be, as each additional level requires an additional parameter to be estimated. However, if the attribute is a continuous variable, one can also decide to estimate linear effects with only a single parameter per attribute, even though multiple levels are used in the design. In this case, the minimum number of required choice tasks does not increase as the number of levels increases.
Another decision required prior to generating the design is whether attribute level balance will be imposed. Attribute level balance occurs when each attribute level appears an equal number of times for each attribute over the design. We discuss this concept in greater detail later, as various researchers have applied it differently. The choice is important, as it too may impact on the size of the final design. Mixing the numbers of attribute levels across attributes may also yield a higher number of choice tasks or alternatives required for a given design (due to attribute level balance). For example, if there are three attributes with 2, 3 and 5 levels, respectively, then the minimum number of choice tasks for an attribute level balanced design will be 30 (since this is the smallest number divisible by 2, 3 and 5 without remainder). On the other hand, if one were to use 2, 4 and 6 levels, a minimum of only 12 choice tasks would be required. Therefore, it is often suggested not to mix too many different numbers of attribute levels within a design, or at least to have all even or all odd numbers of attribute levels, if attribute level balance is a design criterion (a small numerical sketch of these design-size calculations follows below).
The attribute level range of quantitative attributes is another important decision that needs to be made before the experimental design can be determined. Research suggests that using a wide range (for example, $1–$6) is statistically preferable to using a narrow range (for example, $3–$4), as this will theoretically lead to better parameter estimates (that is, parameter estimates with a smaller standard error), although using too wide a range may also be problematic (see Bliemer and Rose, 2010a). The reason for this is that the attribute level range will impact upon the likely choice probabilities obtained from the design, which will in turn impact upon the expected standard errors from that design (see section 4).
Having too wide a range will likely result in choice tasks with dominated alternatives (at least for some attributes), whereas too narrow a range will result in alternatives that are largely indistinguishable. We must emphasize that this is a purely statistical property and that one should take into account the practical limitations of the attribute levels, in as much as the attribute levels shown to the respondents have to make sense. There is therefore a trade-off between the statistical preference for a wide range and the practical considerations that may limit that range.
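To illustrate the arithmetic above, the following sketch computes the lower bound on the number of choice tasks implied by attribute level balance (the least common multiple of the level counts) and the parameter capacity $S \times \sum_{r=1}^{R}(J - r)$ of a design. The dimensions used are invented for illustration.

```python
from math import lcm

def min_tasks_for_balance(level_counts):
    # Attribute level balance requires S to be divisible by every level count,
    # so the smallest balanced design has lcm(level counts) choice tasks.
    return lcm(*level_counts)

def max_estimable_parameters(S, J, R=1):
    # Each choice task contributes sum_{r=1}^{R} (J - r) independent choice
    # probabilities (R = 1 for pick-one, R = J - 1 for a full ranking).
    return S * sum(J - r for r in range(1, R + 1))

# Three attributes with 2, 3 and 5 levels -> 30 tasks needed for balance,
# whereas 2, 4 and 6 levels -> only 12 tasks.
print(min_tasks_for_balance([2, 3, 5]))   # 30
print(min_tasks_for_balance([2, 4, 6]))   # 12

# A pick-one design with S = 12 tasks and J = 3 alternatives supports at
# most 24 parameters; best/worst responses (R = 2) raise this to 36.
print(max_estimable_parameters(12, 3, R=1))  # 24
print(max_estimable_parameters(12, 3, R=2))  # 36
```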
The number of choice tasks, as previously mentioned, is bounded from below by the number of degrees of freedom, which is influenced by the number of parameters to be estimated, the number of alternatives that will be shown to respondents and the response mechanism, as well as by other considerations such as the number of choice tasks required to achieve attribute level balance. Additionally, the design type may restrict the number of choice tasks. A (fractional factorial) orthogonal design sometimes needs (many) more choice tasks than the minimum determined by the number of degrees of freedom and attribute level balance, simply because an orthogonal design may not exist, or may be unknown, for those dimensions. A full factorial design has a predetermined number of choice tasks, determined solely by the numbers of attributes and attribute levels.
A final consideration when generating experimental designs for SC studies is what model type is likely to be estimated on the data collected using the design. At a minimum, the model type may influence the number of parameters to be estimated, which will in turn influence the number of choice tasks required of the design. More complex discrete choice models may also require larger sample sizes to detect particular effects hidden within the data. While most research has focused on generating designs for the multinomial logit model, more advanced models such as the mixed logit model may require different design characteristics and larger sample sizes in order to detect heterogeneity that may exist within the data. Given this background, the next section discusses experimental design theory in detail.
4 EXPERIMENTAL DESIGN THEORY
Independent of the specific application area, experimental design theory has at its core two common objectives: (i) the ability to detect independently the effects of multiple variables on some observable outcome; and (ii) improving the statistical efficiency of the experiment. In many cases, these two objectives are not themselves independent, with designs that allow for an independent assessment of the effects of various variables on some dependent variable often being the same as those considered statistically efficient. Statistical efficiency in experimental design terms relates to an increased precision of the parameter estimates for a fixed sample size. Statistical efficiency is therefore linked to the standard errors likely to be obtained from the experiment (and, to a lesser extent, to the covariances), with designs that can be expected to (i) yield lower standard errors for a given sample size, or (ii) yield the same standard errors at a smaller sample size, being deemed more statistically efficient. Fortunately, as will be argued in section 5, while different criteria have been applied at various times to measure statistical efficiency, the underlying definition of statistical efficiency has remained constant, even if it has not been appropriately applied in all cases.
Once it is understood that, independent of the specific problem being examined, experimental design theory (as applied in the SC context) is concerned with the standard errors (and covariances) of the parameters of models to be estimated from data collected using a generated design, it is clear that the prime consideration is the relationship between the design and the resulting model variance-covariance matrix (from which the standard errors are derived).
As highlighted, experimental design theory originated in fields other than SC experiments and hence developed specifically to address models tailored to particular data types corresponding to the problems being addressed at the time. Indeed, the original theories dealt specifically with experimental problems where the dependent variable was continuous in nature. As such, the resulting design theory was developed specifically for models capable of handling such data; hence, much of the work on experimental design theory has concentrated on the use of analysis of variance (ANOVA) and linear regression type models (see Peirce, 1876). From a historical perspective this has had a significant impact upon the SC literature. The original SC studies, unsurprisingly, were used to test specific economic effects, such as the existence of indifference curves (May, 1954; MacCrimmon and Toda, 1969; Mosteller and Nogee, 1951; Rousseas and Hart, 1951; Thurstone, 1931), and even the more recent studies concentrated on introducing and promoting the benefits of the new modelling method rather than specifically on the issue of experimental design (see, for example, Louviere and Hensher, 1983; Louviere and Woodworth, 1983). As such, these earlier works understandably borrowed from the early theories on experimental design without considering whether or not they were appropriate for use with the models applied to such data. Over time, as is often the case in academia, the designs used in these earlier SC studies became the norm and have largely remained so ever since.
Sporadic research over the years, however, has looked at the specific problem of experimental design as related to econometric models estimated on discrete choice data. In order to calculate the statistical efficiency of an SC design, Fowkes and Wardman (1988), Bunch et al. (1996), Huber and Zwerina (1996), Sándor and Wedel (2001) and Kanninen (2002), among others, have shown that the common use of logit models to analyse discrete choice data requires a priori information about the parameter estimates, as well as about the final econometric model form that will be used in estimation, although the final design may be more or less robust to deviations of model specification (see Bliemer and Rose, 2010a). Specifically, information on the expected parameter estimates, in the form of priors, is required in order to calculate the expected utilities of each of the alternatives present within the design. Once known, these expected utilities can in turn be used to calculate the likely choice probabilities. Hence, given knowledge of the attribute levels (the design), the expected parameter values and the resultant choice probabilities, it becomes a straightforward exercise to calculate the asymptotic variance-covariance (AVC) matrix of the design, from which the expected standard errors can be obtained. The AVC matrix of the design, $W_N$, can be determined as the inverse of the Fisher information matrix, $I_N$, which is computed as the negative expected second derivatives of the log-likelihood function, considering N respondents, of the discrete choice model to be estimated (see Train, 2009).
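To make this chain concrete — priors to utilities, utilities to choice probabilities, probabilities to the Fisher information and its inverse — the sketch below computes the single-respondent AVC matrix $W_1$ for a small MNL design. It is a minimal illustration only: the design, the prior values and the helper name avc_mnl are all invented, and a generic-parameter MNL specification is assumed.

```python
import numpy as np

def avc_mnl(X, beta):
    """Expected AVC matrix W1 of an MNL model for a single respondent.

    X    : array of shape (S, J, K) -- attribute levels of J alternatives
           over S choice tasks (the experimental design).
    beta : length-K vector of assumed prior parameters.
    """
    S, J, K = X.shape
    info = np.zeros((K, K))  # Fisher information, summed over choice tasks
    for s in range(S):
        V = X[s] @ beta                    # utilities of the J alternatives
        P = np.exp(V) / np.exp(V).sum()    # MNL choice probabilities
        # Contribution of task s to the information: X_s' (diag(P) - PP') X_s
        info += X[s].T @ (np.diag(P) - np.outer(P, P)) @ X[s]
    return np.linalg.inv(info)             # W1 = I1^{-1}

# Invented example: 4 choice tasks, 2 alternatives, 2 generic attributes.
X = np.array([
    [[2.0, 10.0], [4.0, 5.0]],
    [[4.0, 5.0], [2.0, 10.0]],
    [[2.0, 5.0], [4.0, 10.0]],
    [[4.0, 10.0], [2.0, 5.0]],
])
beta = np.array([-0.4, -0.1])  # assumed (local) priors
W1 = avc_mnl(X, beta)
print(np.sqrt(np.diag(W1)))    # expected asymptotic standard errors (N = 1)
```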
By manipulating the attribute levels of the alternatives, for known (assumed) parameter values, the analyst is able to minimize the elements within the AVC matrix, which in the case of the diagonal elements means lower standard errors and hence greater reliability in the estimates at a fixed sample size (or even at a reduced sample size). In taking this approach, these authors have remained consistent with the underlying theory of experimental design as defined previously. Indeed, the theory for generating SC experimental designs has the same objective as that for linear models; that is, the minimization of the variances and covariances of the parameter
estimates. What is different, however, are the econometric models to which the theory is being applied. As discussed above, other differences have also emerged related to various assumptions that are required to be made when dealing with data specifically generated for logit type models. How various researchers have dealt with these assumptions is discussed in the next section.
5 THE WHO, THE WHAT AND THE WHY
In this section, we provide an abridged historical overview of research into experimental design theory as applied to SC-type data. It is acknowledged that a vast number of papers and researchers have examined this issue, making it impossible to discuss all the developments in detail. For example, a group of researchers consisting of Grossmann, Grasshoff, Holling and Schwabe, among others, has been extremely active since 2001 and continues to be so. The design strategies explored by these researchers, while arguably superior to some of those discussed herein, are less well known outside of the mainstream statistical literature. Hence, while inclusion of the research undertaken by this group, and others, would greatly benefit a full understanding of experimental design theory for SC data, such material lies beyond what is familiar to most individuals working within this field, and examining it, as well as other notable omitted research streams, would represent a separate paper in and of itself. As such, we concentrate on what are perceived to be the major contributions commonly known within the wider discrete choice community. This, however, is not meant to detract from the research of those not discussed at length herein.

Post Louviere and Hensher (1983) and Louviere and Woodworth (1983) to 1988

As discussed in section 4, the first SC studies focused mainly on introducing the method and promoting its benefits over the standard stated preference techniques used at the time (such as traditional conjoint methods). These early studies, therefore, did not concern themselves specifically with experimental design issues and simply borrowed design construction methods from elsewhere. As it turned out, elsewhere happened to be the very methods that SC methods sought to replace: traditional conjoint design methodology. Traditional conjoint studies involve respondents ranking or rating alternatives (rather than picking one) constructed from either a full factorial or a fractional factorial design; the alternatives are not grouped together in choice tasks but presented all at once, and the data are estimated using linear models such as linear regression (multivariate analysis of variance – MANOVA – was also popular at one stage). As such, the experimental design theory at the time focused largely on the linear regression type models used for this type of data. This is not to suggest that research into aspects associated with the specific use of orthogonal designs as applied to discrete choice data was not undertaken in the early years of SC studies. For example, Anderson and Wiley (1992) and Lazari and Anderson (1994) looked at orthogonal designs capable of addressing problems of availability of alternatives. See Louviere et al. (2000) for a review of orthogonal design theory as applied to SC methods.
The variance-covariance (VC) matrix of a linear regression model is given in equation (7.1):

$$ VC = \sigma^2 (X'X)^{-1}, \qquad (7.1) $$

where $\sigma^2$ is the model variance and X is the matrix made up of attributes and other variables, including interaction terms, to be used in estimation. Fixing the model variance for the present (it simply acts as a scaling factor), the elements of the VC matrix for linear regression models will generally be minimized when the columns of the X matrix are orthogonal. As such, when such models are estimated, orthogonality of the data is considered important, as this property ensures that (a) the model will not suffer from multicollinearity, and (b) the variances (and covariances) of the parameter estimates are minimized. Orthogonal designs, at least in relation to linear models, therefore meet the two criteria for a good design mentioned earlier: they allow for an independent determination of each attribute's contribution to the dependent variable, and they maximize the power of the design to detect statistically significant relationships (that is, they maximize the t-ratios at any given sample size). Of course, the role that $\sigma$ plays may be important and, as such, cannot always be ignored as suggested above. This is because it may be possible to locate a non-orthogonal design which produces non-zero covariances and slightly larger variances, but has smaller elements overall when scaled by sigma (sigma being the error variance of the model, which is not independent of the betas). Nevertheless, orthogonal designs will tend to perform well overall for this type of model.
Despite the fact that non-linear models are commonly used to analyse discrete choice data, the question as to whether designs generated for linear models might be appropriate for such data remained surprisingly uncommented upon for a number of years. Where an examination of the problem was made, often inappropriate analysis was conducted, resulting in the unsurprising conclusion that orthogonal designs are preferred to non-orthogonal designs. For example, Kuhfeld et al. (1994) compared attribute level balanced and unbalanced orthogonal designs with non-orthogonal designs using the information matrix associated with linear models (specifically equation (7.1) above without $\sigma$), despite applying the designs to (non-linear) logit models. It is hardly surprising that, while they concluded that 'preserving orthogonality at all costs can lead to decreased efficiency', particularly when a balanced orthogonal design was not available, they also concluded that 'nonorthogonal designs will never be more efficient than balanced orthogonal designs, when they exist' (Kuhfeld et al., 1994, p. 548). Such misconceptions continue to this day. To demonstrate, consider the frequent practice of either (i) reporting the following design statistic in SC studies or (ii) using the statistic itself as the objective function to be maximized when generating an SC design (for example, Kuhfeld et al., 1994; Lusk and Norwood, 2005):

$$ \text{D-efficiency} = \frac{100}{S \cdot \det\left[(X'X)^{-1}\right]^{1/K}}, \qquad (7.2) $$

where S is the number of observations (that is, choice sets), K is the number of parameters in the design, X the design matrix, and det(.) refers to the determinant operator.
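A brief sketch of this linear-model measure, using invented design matrices, shows that equation (7.2) returns 100 per cent for an orthogonal design and less for a correlated one, which is precisely why, as argued next, it says nothing about efficiency for non-linear logit models.

```python
import numpy as np

def d_efficiency_linear(X):
    # Equation (7.2): 100 / (S * det[(X'X)^{-1}]^{1/K}); a linear-model measure.
    S, K = X.shape
    return 100.0 / (S * np.linalg.det(np.linalg.inv(X.T @ X)) ** (1.0 / K))

# Invented two-attribute designs in effects-coded form (+1/-1 levels).
orthogonal = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
correlated = np.array([[1, 1], [1, 1], [-1, 1], [-1, -1]], dtype=float)

print(d_efficiency_linear(orthogonal))  # 100.0
print(d_efficiency_linear(correlated))  # < 100
```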
This measure is uninformative with respect to the operating conditions of discrete choice modelling under random utility theory, because equation (7.2) is derived under the assumption that the model to be estimated is linear in the parameters, as the relationship between this equation and the VC matrix of the homoscedastic linear regression model, $\sigma^2 (X'X)^{-1}$, clearly demonstrates. Indeed, equation (7.2) will return a value of 100 per cent for an orthogonal design and lower values for non-orthogonal designs. However, design orthogonality of this type does not necessarily imply statistical efficiency for discrete choice models, which are intrinsically non-linear in the parameters. Nevertheless, it is important to note that the apparent success (or rather lack of failure) of studies applying such designs meant that the use of orthogonal designs remained mostly unchallenged. Even with increasing evidence that non-orthogonal designs might be more appropriate for discrete choice models, orthogonal designs remain largely entrenched within the literature and continue to be the most common design method used.

Fowkes, Toner and Wardman et al. (1988–2000)

In the late 1980s, writing in the transport literature, researchers from the University of Leeds began to question the appropriateness of using orthogonal designs for discrete choice type data. In a series of research papers spanning more than a decade, Fowkes, Toner and Wardman (among others) questioned the use of fractional factorial designs based on orthogonal arrays and discussed the importance of experiments that are realistic and make sense to the respondents, as well as of improving the robustness of the parameter estimates. Dealing specifically with binary choice tasks, the Leeds group designs were generated under the assumption of non-zero priors for both generic and alternative-specific parameters. Such designs are referred to as locally optimal, as the parameter priors are assumed to be known with certainty and the designs are optimized precisely for these parameter values. If the true parameters differ from the assumed parameters (that is, the parameter priors were misspecified), then the design will lose efficiency. Rather than concern themselves directly with minimizing the standard errors of the parameter estimates, the Leeds group sought to minimize the variances of ratios of parameters (that is, they were concerned with willingness to pay – WTP – issues), which they were able to calculate from the model AVC matrix using the Delta method.
It is worth noting that, even if a prior parameter is perfectly specified, the design need not be optimal. The most likely reason for this is the constraints placed on the design by the user, which may prevent the design from being truly optimal. Even if a design is optimal within the bounds of any such constraints, it is possible that smaller standard errors for one or more parameters might be achievable were a different prior parameter chosen, due to the non-linearity of the model. Likewise, misspecification of the prior does not necessarily imply that the efficiency of the design will be zero. This is because the efficiency of a design depends upon all parameters, and failure to correctly specify one parameter does not necessarily render impossible the ability to estimate other effects.
Indeed, only in the case of perfect multicollinearity (or very nearly so) could the efficiency of the design be considered zero, as this would render impossible the estimation of one or more parameters of the design. Further, it is important to acknowledge that, even if the parameter prior is misspecified, the efficiency of the design might improve, as the true parameter estimate might result in the design producing better sets of choice probabilities (that is, the magic Ps discussed later). That is, given analyst-imposed constraints, such as attribute level balance, the elements of the AVC matrix may actually be smaller under a different set of priors than under the exact set of parameter priors used during the design generation process.
Rather than assume pre-defined fixed attribute levels, this class of designs was generated so as to allow the attribute levels of the design to take any value, including non-integers, and hence be continuous. In letting go of specific predetermined attribute levels and allowing the attributes to take any value, it is possible to locate a design that will optimize the objective function of interest using a number of mathematical techniques, such as non-linear programming. The Leeds group was able to utilize such methods to locate designs that minimized the variance of the ratio of two parameters and, as such, to generate designs which can be considered optimal under the assumptions for which they were generated.
Careful examination of the designs generated by this group led to the observation that many of the resulting choice tasks were not realistic from the perspective of the respondent (that is, the particular combinations of levels were not deemed realistic, nor was the precision of the values shown, for example, a price of $5.37). For this reason, additional requirements were imposed on the generated designs, whereby a reasonable coverage of so-called 'boundary values' was sought and obtained (see Fowkes and Wardman, 1998; Fowkes et al., 1993; Toner et al., 1998, 1999; and Watson et al., 2000, for further discussion of these designs). Further examination of these designs by the Leeds group found that they tended to retrieve very specific choice probabilities, which they referred to as 'magic Ps'. This finding was later independently rediscovered by other researchers working in other discipline areas, in particular by Kanninen in 2002, and later expanded upon by Johnson et al. in 2006.

Bunch, Louviere and Anderson (1996)

In 1996, two similar papers appeared simultaneously in the marketing literature dealing with experimental design theory as related to SC data (Bunch et al., 1996 and Huber and Zwerina, 1996). The first discussed here (Bunch et al., 1996) appears only in the form of a working paper. This paper dealt specifically with strategies for generating designs for multinomial logit (MNL) models assuming either zero or non-zero local priors for generic parameter estimates. Unlike the earlier work coming out of the Leeds group, Bunch et al. assumed fixed attribute levels, using orthogonal polynomial coding in estimation. It is worth noting that the specific coding structure can impact upon the efficiency of a design in a number of ways. First, non-linear coding will generally require more parameters and hence not only larger designs, but will also produce different AVC matrices than if linear codes are assumed. Secondly, the coding structure may influence the choice probabilities if the parameter priors are not adjusted to account for differences in the values taken over the design. Independent of the coding structure, the objective function used by Bunch et al.
involved minimizing the elements of the resulting AVC matrix rather than the variances of ratios of parameter estimates. Bunch et al. (1996) promoted the use of the D-error statistic, applying it to the expected
AVC matrix of the design as constructed for the MNL model. The D-error statistic, not to be confused with the D-efficiency measure (equation (7.2)) suggested by Kuhfeld et al. (1994), is calculated by taking the determinant of the AVC matrix assuming a single respondent, $W_1$, and normalizing this value by the number of parameters, K. Minimizing the D-error statistic corresponds to minimizing, on average, the elements contained within the expected AVC matrix. Designs which minimize the D-error statistic are therefore called D-optimal designs. The D-error statistic is shown as equation (7.3):

$$ \text{D-error} = \det(W_1)^{1/K}. \qquad (7.3) $$
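In code, equation (7.3) is a one-liner; the sketch below uses an invented 2 × 2 AVC matrix (such as the $W_1$ returned by the earlier hypothetical avc_mnl helper) so that it runs standalone.

```python
import numpy as np

# W1 is an invented single-respondent AVC matrix for a 2-parameter design.
W1 = np.array([[0.08, 0.01],
               [0.01, 0.02]])
K = W1.shape[0]

d_error = np.linalg.det(W1) ** (1.0 / K)   # equation (7.3)
print(d_error)
```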
Keeping in line with earlier empirical work in SC, Bunch et al. (1996) searched only among orthogonal designs. In doing so, they considered both simultaneously and sequentially constructed orthogonal designs in the generation process. A simultaneous orthogonal design is one where the attributes of the design are orthogonal not only within alternatives but also between them; this requires that the design be generated simultaneously for all alternatives. A sequentially constructed orthogonal design is one where the attributes of the design may be orthogonal within an alternative, but not necessarily between alternatives (see Louviere et al., 2000). As such, Bunch et al. (1996) retained the properties of orthogonal designs, including attribute level balance constraints.
Unlike for the designs generated by the Leeds group, the use of pre-specified fixed attribute levels makes it generally difficult to locate the design matrix that is optimal. As such, algorithms are required which search over the design space by re-arranging the attribute levels of the design and testing the efficiency measure after each change. Only if all possible designs are tested can one conclude that a design is optimal. For designs with large dimensions this is often not possible, and for this reason such designs are more correctly referred to as efficient designs rather than optimal designs. Given that Bunch et al. (1996) considered only orthogonal designs, merely a subset of all possible designs was examined. For this reason, the class of designs generated by Bunch et al. is more correctly referred to as locally optimal orthogonal D-efficient designs, as opposed to D-optimal designs. Although algorithms for locating SC designs are important, and formed a central part of the Bunch et al. paper, for reasons of space we do not discuss this aspect of the design generation process in detail here (see Kessels et al., 2006 for an excellent discussion of design algorithms); a deliberately naive sketch of such a swap-based search is given below.

Huber and Zwerina (1996)

At the same time as Bunch et al. (1996), a paper by Huber and Zwerina (1996) appeared in the marketing literature. This paper covered much of the same material as Bunch et al. (1996); however, a number of important and often subtle differences exist between the two papers. Before discussing these differences, it is worth noting the many similarities between the two papers. As with the work of Bunch et al., Huber and Zwerina concerned themselves with optimal designs specifically generated for the MNL model assuming non-zero local priors for generic parameter estimates, although they assumed effects coded variables as opposed to orthogonal polynomial coding. Further, as with Bunch et al., they assumed fixed attribute levels drawn from the underlying experimental design. Finally, they also concluded that minimizing the D-error statistic provided the best designs (in terms of generating statistically efficient designs).
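The swap-based search mentioned above can be sketched as follows. This is not any of the algorithms used by these authors, only a naive random-swap illustration under invented dimensions, levels and priors; the D-error function assumes the generic-parameter MNL case.

```python
import numpy as np

def d_error(X, beta):
    # D-error (equation (7.3)) of design X (S tasks x J alts x K attributes)
    # under assumed priors beta; np.inf is returned for singular designs.
    S, J, K = X.shape
    info = np.zeros((K, K))
    for s in range(S):
        P = np.exp(X[s] @ beta)
        P /= P.sum()
        info += X[s].T @ (np.diag(P) - np.outer(P, P)) @ X[s]
    try:
        return np.linalg.det(np.linalg.inv(info)) ** (1.0 / K)
    except np.linalg.LinAlgError:
        return np.inf

def swap_search(X, beta, iters=2000, seed=0):
    # Naive search: swap the levels of two choice tasks within one attribute
    # column (preserving attribute level balance) and keep any improvement.
    rng = np.random.default_rng(seed)
    best, best_err = X.copy(), d_error(X, beta)
    S, J, K = X.shape
    for _ in range(iters):
        cand = best.copy()
        j, k = rng.integers(J), rng.integers(K)
        s1, s2 = rng.integers(S, size=2)
        cand[s1, j, k], cand[s2, j, k] = cand[s2, j, k], cand[s1, j, k]
        err = d_error(cand, beta)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Invented attribute-level-balanced starting design: 6 tasks, 2 alternatives,
# a 3-level attribute (1, 2, 3) and a 2-level attribute (5, 10).
X0 = np.array([
    [[1, 5], [3, 10]],
    [[2, 10], [1, 5]],
    [[3, 5], [2, 10]],
    [[1, 10], [2, 5]],
    [[2, 5], [3, 10]],
    [[3, 10], [1, 5]],
], dtype=float)
beta = np.array([-0.5, -0.1])        # assumed local priors
X_best, err = swap_search(X0, beta)
print(err)
```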
(a)
        Option A             Option B
    A1    A2    A3       A1    A2    A3
    10     3     2       10     3     2
    20    −5     4       30    −5     6
    30     3     6       20    −5     4
    10    −5     6       20     3     4
    30    −5     2       10    −5     2
    20     3     4       30     3     6

(b)
        Option A             Option B
    A1    A2    A3       A1    A2    A3
    10     3     2       10     3     2
    20    −5     4       20    −5     4
    30     3     6       30     3     4
    30    −5     6       10     3     4
    30    −5     2       10    −5     2
    20    −5     6       20     3     6

Figure 7.1  Different definitions of attribute level balance
It is the differences between the two papers, and the reaction of the literature to these differences, however, which are more telling. Unlike Bunch et al. (1996), Huber and Zwerina (1996) did not confine the design space to orthogonal designs only. In letting go of orthogonality as a design criterion (or, more precisely, a constraint), they also relaxed the concept of attribute level balance. Whereas previously attribute level balance required that each level appear an equal number of times within every column of the design (for a strict definition of attribute level balance, see Figure 7.1(a)), a new definition was adopted under which each level must appear an equal number of times for an attribute independent of the alternative in which that attribute appears (hence levels must appear an equal number of times across the columns of the design, but not necessarily within each column; see Figure 7.1(b)). To demonstrate, consider the design in panel (a), in which the levels 10, 20 and 30 each appear twice for attribute A1 in both options A and B. Likewise, the levels of A2 and A3 appear an equal number of times in each column, independent of the alternative to which they belong. In the second design, shown in panel (b), the levels do not appear an equal number of times within each column; however, over both options A and B, each level appears exactly four times within the corresponding attributes.
Two important and far-reaching findings were obtained from this paper. First, Huber and Zwerina found that, under non-zero local priors, non-orthogonal designs produced statistically more efficient outcomes. While important, it is the second finding which has had a larger, if less desirable, impact on the literature. In their paper, Huber and Zwerina concluded that designs that produce roughly equal choice probabilities among the J alternatives are more statistically efficient than designs that result in less equal distributions of the choice probabilities. This finding unfortunately contradicts the earlier findings of the Leeds group, as well as later work of other researchers in this field, who found that under certain conditions (actually the same conditions assumed by Huber and Zwerina) optimal designs are obtained not when the choice shares of the alternatives are equal to 1/J; rather, optimal designs result from constructing alternatives that produce certain choice probabilities which are not probability (or utility) balanced (the magic Ps discovered by the Leeds group). Unfortunately, it was the message of utility or probability balance which gained wider traction within the literature, and not that letting go of orthogonality
may result in statistical gains when applied to SC models. To this day, a number of papers continue to generate probability balanced designs when assuming non-zero priors, which will typically lead to inefficient designs.

Sándor and Wedel (2001, 2002, 2005)

A number of years passed before the next significant breakthrough occurred within the literature, again within marketing. In 2001, Sándor and Wedel introduced Bayesian efficient designs to the SC design field. Assuming an MNL model, with generic parameters applied to effects coded variables and fixed attribute levels, Sándor and Wedel (2001) relaxed the assumption of perfect a priori knowledge of the parameter priors by adopting a Bayesian-like approach to the design generation process. Rather than assuming a single fixed value for each parameter prior, the efficiency of the design is calculated over a number of simulated draws taken from prior parameter distributions assumed by the analyst. Different distributions may be associated with different population moments, representing different levels of uncertainty with regard to the true parameter values. In this way, by optimizing the efficiency of the design over a range of possible parameter prior values (drawn from the assumed parameter prior distributions), the design is made more robust, at least within the range of the assumed distributions. Such designs will generally be less efficient than an equivalent locally optimal design, but will be more robust to prior parameter misspecification. As with Huber and Zwerina (1996), they found that non-orthogonal designs outperformed orthogonal designs based on the Bayesian equivalent of the D-error statistic. In subsequent research, Sándor and Wedel (2002, 2005), assuming generic parameter estimates for effects coded variables and fixed attribute levels, as well as local (fixed) prior parameter estimates, derived the AVC matrix for the cross-sectional version of the mixed MNL (MMNL) model. As such, they were the first to generate designs for a model other than the MNL model. In doing so, they retained the use of the D-error statistic as their design criterion and, despite differences in the assumptions made in terms of model type and how the prior parameters are generated, retained as their design objective the desire to locate a design that results in smaller standard errors (and covariances).

Street and Burgess (2001 to Current)

An independent stream of research on generating designs for SC studies began to appear in 2001 within the statistics and marketing literatures, centring on what can be referred to as Street and Burgess type designs (see Burgess and Street, 2005; Street and Burgess, 2004, 2007; Street et al., 2001, 2005). Like earlier researchers, Street and Burgess assume an MNL model specification in deriving the AVC matrix for their designs; however, the mathematical derivations used to obtain the AVC matrix are performed in a somewhat different manner. Whereas other researchers derive the second derivatives with respect to the parameters $\beta$, such that

$$ W = I^{-1}, \quad \text{with } I = -E\!\left( \frac{\partial^2 \log L}{\partial \beta \, \partial \beta'} \right), \qquad (7.4) $$

where $E(\cdot)$ is used to express the large sample population mean, Street and Burgess calculate the second derivatives with respect to the total utilities V, such that

$$ W = I^{-1}, \quad \text{with } I = -E\!\left( \frac{\partial^2 \log L}{\partial V \, \partial V'} \right). \qquad (7.5) $$
This difference in the mathematical derivations of the AVC matrix has resulted in significant confusion within the literature, with claims that the Street and Burgess approach is unrelated to the more mainstream SC experimental design literature discussed herein. This view has been further reinforced by the fact that the matrix algebra used to generate the AVC matrices under the two derivations appears to be very different. However, Bliemer and Rose (2010b) were able to show that Street and Burgess designs are a special case of the more general methods used by other researchers, as described earlier. Aside from the assumption of an MNL model specification, Bliemer and Rose (2010b) were able to reproduce Street and Burgess type designs using the same methods used by the other researchers discussed above, if they assumed the data were coded using an orthonormal coding structure. Beginning with a sequentially generated orthogonal design using the methods first described by Bunch et al. (1996), Street and Burgess designs can be constructed by first converting the design (see Table 7.1(a)) into orthogonal contrast codes (Table 7.1(b)) (see Keppel and Wickens, 2004). The orthogonal contrast codes are then converted into the orthonormal coding structure by computing the sum of the squares of each column (shown at the base of Table 7.1(b)) and dividing each column of the orthogonal contrast codes by the square root of this number (see Table 7.1(c)). The AVC matrix of the design is then computed under the assumption of zero local priors, assuming that the parameters are generic across alternatives. The elements of the resulting AVC matrix are then normalized by dividing each value by the product of the numbers of levels, $L_k$, of each attribute k of the original design (that is, by $\prod_{k=1}^{K} L_k$). The design can then be optimized using the same D-error measure promoted by other researchers.
The major contribution of Street and Burgess, however, has been to derive a method that can be used to locate the optimal design under the above set of assumptions without having to resort to complex (iterative) algorithms. They showed that by using so-called design generators, which are sequences of numbers used to create the attribute levels for the second alternative (and third, and so on), they can ensure that attribute levels will have minimum overlap (that is, are mostly different for each alternative). They also showed that the best (minimum) D-error can be determined analytically, such that the D-efficiency of a design can be computed relative to this minimum D-error. Letting I be the Fisher information matrix of a certain design (assuming some design generator) and $I_{\max}$ that of the optimal design, D-efficiency can be computed as

$$ \text{D-efficiency} = \frac{\det(I)}{\det(I_{\max})} \times 100. \qquad (7.6) $$
Equation (7.6) provides a measure, as a percentage, of how efficient a design is under the specific assumptions outlined above; that is, the design is constructed assuming an MNL model specification, using orthonormal coding with local priors for generic parameters equal to zero.
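A sketch of this relative measure, using invented placeholder information matrices (the real $I_{\max}$ comes from the Street and Burgess analytical results):

```python
import numpy as np

# Invented Fisher information matrices: I for a candidate design and
# I_max for the analytically derived optimal design.
I_design = np.array([[6.0, 0.5], [0.5, 3.0]])
I_max = np.array([[8.0, 0.0], [0.0, 4.0]])

d_efficiency = np.linalg.det(I_design) / np.linalg.det(I_max) * 100  # (7.6)
print(d_efficiency)   # per cent of the optimum
```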
Table 7.1  Design codes to orthogonal contrast codes

(a) Design codes

  S    A1   A2   B1   B2
  1     0    0    2    1
  2     1    1    0    0
  3     2    1    1    0
  4     2    0    1    1
  5     0    0    2    1
  6     1    1    0    0
  7     1    0    0    1
  8     2    0    1    1
  9     0    1    2    0
 10     1    0    0    1
 11     2    1    1    0
 12     0    1    2    0

(b) Orthogonal contrast codes

  S    A1a   A1b   A2a   B1a   B1b   B2a
  1     −1     1    −1     1     1     1
  2      0    −2     1    −1     1    −1
  3      1     1     1     0    −2    −1
  4      1     1    −1     0    −2     1
  5     −1     1    −1     1     1     1
  6      0    −2     1    −1     1    −1
  7      0    −2    −1    −1     1     1
  8      1     1    −1     0    −2     1
  9     −1     1     1     1     1    −1
 10      0    −2    −1    −1     1     1
 11      1     1     1     0    −2    −1
 12     −1     1     1     1     1    −1

Sum of
squares:  8    24    12     8    24    12

(c) Orthonormal coding

  S     A1a     A1b     A2a     B1a     B1b     B2a
  1   −0.35    0.20   −0.29    0.35    0.20    0.29
  2    0.00   −0.41    0.29   −0.35    0.20   −0.29
  3    0.35    0.20    0.29    0.00   −0.41   −0.29
  4    0.35    0.20   −0.29    0.00   −0.41    0.29
  5   −0.35    0.20   −0.29    0.35    0.20    0.29
  6    0.00   −0.41    0.29   −0.35    0.20   −0.29
  7    0.00   −0.41   −0.29   −0.35    0.20    0.29
  8    0.35    0.20   −0.29    0.00   −0.41    0.29
  9   −0.35    0.20    0.29    0.35    0.20   −0.29
 10    0.00   −0.41   −0.29   −0.35    0.20    0.29
 11    0.35    0.20    0.29    0.00   −0.41   −0.29
 12   −0.35    0.20    0.29    0.35    0.20   −0.29
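The conversion in Table 7.1 is mechanical, as the short sketch below illustrates for the two three-level attributes (A1, B1) and two two-level attributes (A2, B2) of panel (a); the contrast values are the standard linear/quadratic polynomial contrasts (see Keppel and Wickens, 2004).

```python
import numpy as np

# Panel (a): design codes for attributes A1, A2, B1, B2 over 12 choice sets.
design = {
    'A1': [0, 1, 2, 2, 0, 1, 1, 2, 0, 1, 2, 0],
    'A2': [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    'B1': [2, 0, 1, 1, 2, 0, 0, 1, 2, 0, 1, 2],
    'B2': [1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0],
}

# Orthogonal polynomial contrasts: linear and quadratic for 3-level
# attributes, linear only for 2-level attributes.
contrasts3 = {0: (-1, 1), 1: (0, -2), 2: (1, 1)}   # (linear, quadratic)
contrasts2 = {0: (-1,), 1: (1,)}

columns = []
for name, levels in design.items():
    table = contrasts3 if max(levels) == 2 else contrasts2
    coded = np.array([table[l] for l in levels], dtype=float)   # panel (b)
    # Panel (c): divide each contrast column by the square root of its
    # sum of squares, yielding orthonormal columns.
    coded /= np.sqrt((coded ** 2).sum(axis=0))
    columns.append(coded)

orthonormal = np.hstack(columns)
print(np.round(orthonormal, 2))                   # reproduces Table 7.1(c)
print(np.round(orthonormal.T @ orthonormal, 10))  # identity: orthonormal
```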
Unfortunately, in a case of selectively choosing criteria to promote one design paradigm over another, this measure has been incorrectly applied by some to infer that designs generated under different sets of assumptions are not optimal. That is, these equations should only be used to optimize designs under the specific conditions for which they were derived, and not to infer anything about designs generated under other sets of assumptions.

Kanninen (2002, 2006)

In 2002, Kanninen independently rediscovered the fact that D-optimal designs generated under the assumption of an MNL model specification with non-zero local priors
for generic parameter estimates tend to retrieve specific, non-balanced choice probabilities. As with the work of the Leeds group, Kanninen was able to show analytically that utility or probability balance in choice tasks represents an undesirable property and, in doing so, suggested rules that minimize the variances of the estimates in an optimal manner, based on desirable choice probabilities, or what the Leeds group referred to as magic Ps. Kanninen (2002) proposed a design approach where the levels of the first K − 1 attributes are first generated for each of the alternatives, typically using an orthogonal design or a Street and Burgess design. The level of the last (Kth) attribute for each alternative is then generated as a continuous variable (usually a price attribute). The values of these continuous variables are chosen such that the choice probabilities take certain values that minimize the elements of the AVC matrix under the assumption of non-zero prior parameters.
Although the boundary value method of the Leeds group is somewhat different in derivation, the implications remain the same, and similar results are achieved. The main difference between the two methods lies in the fact that the designs generated by the Leeds group were constructed so as to minimize the variances of the ratios of two parameters (that is, dealing with WTP issues), whereas the approach adopted by Kanninen works directly with the variances of the parameter estimates using the D-error measure. Another difference between the two approaches is that the Leeds group allowed all attributes of the design to be treated as continuous (given transport applications, where the attributes considered are often times and costs, this is possible), while Kanninen designs tend to allow only one attribute to be treated as continuous (again, this is in line with the literature in which these designs have been applied – marketing and environmental economics – where many of the attributes are qualitative in nature and it makes little to no sense to show them to respondents as continuous attribute levels). Kanninen (2002) and Johnson et al. (2006) have determined the desirable probabilities for a limited number of designs with two alternatives (see Table 7.2); unlike the Leeds group, however, Kanninen's earlier work was not constrained to designs with only two alternatives.
One concern with these designs is that they are optimal only for the parameter priors assumed. While this is true of other locally D-optimal designs, the values that the continuous variables take are particularly sensitive to the priors assumed and may change markedly given different parameter priors.

Table 7.2  Optimal choice probability values for specific designs with two alternatives

 Number of        Number of      Optimal choice-percentage split
 attributes (K)   choice sets    for two-alternative model
 2                2              0.82 / 0.18
 3                4              0.77 / 0.23
 4                4              0.74 / 0.26
 5                8              0.72 / 0.28
 6                8              0.70 / 0.30
 7                8              0.68 / 0.32
 8                8              0.67 / 0.33

Source:  Adapted from Johnson et al. (2006).
Thus, Kanninen (2002) recommends a continual process of updating the design once data is collected and more likely parameter estimates are obtained. Further, it is worth stressing that the results of Kanninen (and the Leeds group) hold only for designs assuming generic parameter estimates. Once alternative specific parameter priors are assumed, it is no longer possible to calculate desirable choice probabilities even for the MNL model, as these will differ over each choice task.

Bliemer, Rose and Scarpa (2004 to Current)

Writing in the transportation and environmental economics literature, Bliemer, Rose and Scarpa have sought to extend the theory of experimental design to more advanced discrete choice models, as well as to address issues related to the sample size requirements for these types of studies. In 2004, Rose and Bliemer began by examining the impact that relaxing the assumption of orthogonality has on the performance of logit models, in particular MNL models assuming non-zero local priors (Rose and Bliemer, 2004). In line with earlier work, they concluded that orthogonality as a design principle did not appear to be a desirable property for non-linear logit models. Bliemer and Rose (2005a) and Rose and Bliemer (2005a) next sought to extend the methods advocated by Bunch et al. (1996) for MNL model specifications assuming non-zero local priors to allow for alternative specific and generic parameter estimates. Concurrently, Bliemer and Rose (2005b) and Rose and Bliemer (2005b) turned their attention towards issues of sample size requirements for SC experiments. Throughout this work, fixed attribute levels were assumed in the design generation process. Bliemer and Rose pointed out that the AVC matrix for discrete choice models is inversely related to the number of times a design is replicated within the data, which, assuming all choice tasks are answered by each respondent, is equivalent to the number of respondents N.1 As such, the analyst can calculate the values contained within the AVC matrix for any sample size simply by determining the AVC matrix for a single respondent and then dividing the resulting matrix by N. This means that the standard errors decrease at a rate of 1/√N. Using this relationship, Figure 7.2(a) reveals the consequences of investing in larger sample sizes for a given design X^I. While initial gains can be achieved in terms of improvements to the expected asymptotic standard errors by adding more respondents, such improvements occur at a diminishing rate, until each additional respondent has only a marginal impact on the expected asymptotic standard errors. Hence, increasing the sample size beyond a certain limit will typically have little impact upon the statistical significance of the parameter estimates achieved from SC studies. Figure 7.2(b) reveals the impact, for a given set of population parameters, of investing in a better (that is, more efficient) design X^II. Typically, larger decreases in the standard errors can be achieved by investing in finding a more efficient design than by investing in a larger sample. Given this, Bliemer and Rose were able to use this relationship to provide insight into the sample size requirements for SC experiments.
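As a minimal illustration of this scaling (assuming a hypothetical single-respondent AVC matrix; the numbers below are invented purely for demonstration and are not taken from any design discussed in this chapter), the expected standard errors for any sample size follow directly from dividing the single-respondent matrix by N:

    import numpy as np

    # Hypothetical single-respondent AVC matrix for a two-parameter design
    # (illustrative values only).
    avc_1 = np.array([[0.40, 0.05],
                      [0.05, 0.25]])

    for n in (1, 10, 50, 200):
        avc_n = avc_1 / n                 # AVC for N respondents = AVC_1 / N
        se = np.sqrt(np.diag(avc_n))      # standard errors fall at a rate 1/sqrt(N)
        print(f"N = {n:3d}: se = {se.round(3)}")

The diminishing returns in the printed values mirror the pattern sketched in Figure 7.2(a): quadrupling the sample size only halves the standard errors.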
Figure 7.2 Comparison of investing in larger sample sizes versus more efficient designs: (a) investing in larger samples; (b) investing in better design

Seeing that the square roots of the diagonal elements of the AVC matrix represent the asymptotic standard errors for the parameter estimates, and the asymptotic t-ratios are simply the parameter estimates
divided by the asymptotic standard errors (equation (7.7)), it is possible to determine the likely asymptotic t-ratios for a design assuming a set of prior parameter estimates:

\[ t_k = \frac{\hat{\beta}_k}{se_k / \sqrt{N_k}}. \qquad (7.7) \]

Rearranging equation (7.7),

\[ N_k = \frac{t_k^2 \, se_k^2}{\beta_k^2}. \qquad (7.8) \]
Equation (7.8) allows for a determination of the sample size required for each parameter to achieve a minimum asymptotic t-ratio, assuming a set of non-zero prior parameter values. To use these equations, the analyst might use the prior parameters used in generating the design, or test the sample size requirements under various prior parameter misspecifications. Once the sample size is determined for all parameters, the analyst can then select the sample size that will be expected to result in all asymptotic t-ratios taking a minimum pre-specified value (for example, 1.96). Such designs are called S-efficient designs. Bliemer and Rose noted, however, that sample sizes calculated using this method should be considered an absolute theoretical minimum. The method assumes certain asymptotic properties that may not hold in small samples. Further, the method does not consider the stability of the parameter estimates, nor at what sample size parameter stability is likely to be achieved (that is, how many respondents are required before the parameter estimates reflect the population values). Comparing sample sizes calculated using equation (7.8) for different parameters may also give an indication of which parameters will be more difficult to estimate (at a certain level of significance) than others.
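To make the calculation concrete, the sketch below applies equation (7.8) to a set of assumed prior parameters and single-respondent standard errors (all values are hypothetical, chosen purely for illustration); the S-efficient sample size is then the largest of the per-parameter requirements:

    import numpy as np

    # Assumed prior parameters and single-respondent asymptotic standard errors
    # (hypothetical values for illustration only).
    beta_prior = np.array([-0.8, -0.4, 0.6])
    se_1 = np.array([1.9, 1.2, 1.5])
    t_min = 1.96                        # minimum acceptable asymptotic t-ratio

    # Equation (7.8): N_k = t^2 * se_k^2 / beta_k^2, computed per parameter
    n_k = (t_min ** 2) * (se_1 ** 2) / (beta_prior ** 2)

    # S-efficient sample size: the smallest N at which ALL t-ratios reach t_min
    n_required = int(np.ceil(n_k.max()))
    print(n_k.round(1), n_required)

Comparing the entries of n_k immediately shows which parameters will be hardest to estimate at the chosen significance level, exactly as noted above.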
Rose and Bliemer (2006) next extended the theory of SC designs to include covariates, such as socio-demographics, in the utility functions and hence also in the AVC matrices of the designs (Jaeger and Rose (2008) later looked at including contextual covariates in the design, such as situational factors usually held constant within the survey preamble). Assuming an MNL model with non-zero local priors and combinations of alternative specific and generic parameters, they were able to demonstrate a method capable of jointly minimizing the elements of the AVC matrix while determining the optimal number of respondents to sample from different segments. This was accomplished by determining optimal weights to apply to different segments of the Fisher information matrix based on how many respondents belong to each segment. Rose et al. (2008) next looked at SC studies requiring pivot (or customized) designs, where the levels of the design alternatives are computed as percentage differences from some pre-specified, respondent-specific status quo alternative, rather than as specific predefined levels chosen by the analyst. Again assuming an MNL model specification with non-zero local priors and combinations of alternative specific and generic parameters, they explored a number of design procedures capable of optimizing designs at the individual respondent level. Meanwhile, Ferrini and Scarpa (2007), writing in the environmental economics literature, extended optimal design theory to the panel version of the error components model, assuming non-zero local priors and fixed attribute levels. In considering the panel version of the model, Ferrini and Scarpa's paper represents a significant leap forward in the theory of SC experimental design, as it was the first to consider the within-respondent preference correlations which theoretically exist over repeated choice tasks, and which cannot be detected, and indeed are ignored, in models such as the MNL. Unlike earlier papers, however, Ferrini and Scarpa used simulation to derive the AVC matrix of the model, rather than the more common analytical derivations. Scarpa and Rose (2008) looked at various design strategies assuming an MNL model specification with non-zero local priors and generic parameters. Unaware of the earlier work of the Leeds group, they also used the delta method to derive the variance of the ratio of two parameters, and advocated optimizing on this measure if WTP was the primary concern of the study. Bliemer et al. (2009) next applied experimental design theory assuming non-zero local priors and Bayesian priors, with designs assuming fixed attribute levels, in order to estimate nested logit (NL) models. They were able to show that an optimal design derived for an MNL model does not necessarily perform well for a nested model with correlated error terms within the different nests of alternatives. In 2010, Bliemer and Rose were able to analytically derive the AVC matrix for the panel version of the MMNL model. Via a number of case studies in which both non-zero local priors and Bayesian priors were combined with both alternative specific and generic parameters, they were able to compare the design efficiencies for designs generated assuming MNL, cross-sectional MMNL and panel MMNL model specifications. They found that efficient designs generated specifically for the MNL model are mostly also quite efficient for the panel MMNL model, and vice versa.
However, designs generated for a cross-sectional MMNL specification were very different in terms of the resulting statistical efficiency and sample size requirements (see Bliemer and Rose, 2010a).
One criticism often levelled at those advocating the efficient design approach to SC studies is the need to know in advance the precise econometric model that will be estimated once data has been collected. There unfortunately exist many forms of discrete choice model that analysts may wish to estimate once SC data has been collected (for example, MNL, NL, GEV, MMNL). The log-likelihood function differs across model types and hence, given that the AVC matrix is mathematically given as the inverse of the (negative expected) matrix of second derivatives of the log-likelihood function, the AVC matrix for each type of model will also be different, even if the same design is used. As such, the construction of efficient designs requires not only an assumption as to the parameter priors, but also as to which AVC matrix the analyst is attempting to optimize. Given differences in the AVC matrices of different discrete choice models, attempts at minimizing the elements of the AVC matrix assuming one model may not necessarily minimize the elements of the AVC matrix of another model. Similar to the problem of assuming knowledge of the parameter estimates with certainty (given by local priors), the analyst is unlikely to know precisely in advance what model will be estimated. The problem then becomes one of having to select the most likely model that will be estimated once data has been collected. To address this specific issue, Rose et al. (2009) advocated the use of a model averaging approach, where different weights could be applied to the Fisher information matrices obtained assuming different model specifications given a common design. Included in the model averaging process were MNL, cross-sectional error components and MMNL, and panel error components and MMNL model specifications. Bliemer and Rose (2010c) looked at sequential designs, in which the design is optimized after collecting data from a respondent, based on the current parameter estimates as priors, and given to the next respondent. Such adaptive designs are often said to be prone to endogeneity bias. By only adapting the design between respondents, however, and not within respondents, this endogeneity bias can be mostly avoided. They found that starting with zero local priors and then sequentially generating more efficient designs, based on the parameter estimates after n respondents, is a very efficient way of collecting data. However, as designs found at the early stage of data collection are sensitive to likely unstable priors, parameter biases could be found in small sample sizes, though these disappear in larger sample sizes. Starting with an orthogonal design for the first, say, 20 respondents, and then sequentially generating optimal designs for the following respondents, avoids these biases in small sample sizes while still significantly reducing the standard errors. Most recently, Rose et al. (2011) sought to extend the earlier research originating from both the Leeds group and Kanninen to a wider range of SC problems. Unfortunately, they found that it was only possible to derive optimal choice probabilities for designs generated under the assumption of an MNL model specification with non-zero local priors and generic parameter estimates.
To overcome these limitations, they demonstrated how the Nelder–Mead algorithm could be used to locate the optimal choice probabilities for any model type with any number of alternatives and any type of prior parameters, including non-zero Bayesian priors. In contrast to the designs with generic parameters, fixed magic Ps do not seem to exist for this more general case.
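As a sketch of the general idea (not the authors' implementation), a Kanninen-style search can be set up for a simple binary logit design with generic parameters: the first attribute's level differences are held fixed, the last attribute is treated as continuous, and Nelder–Mead minimizes the D-error. All priors and design values below are assumptions chosen for illustration.

    import numpy as np
    from scipy.optimize import minimize

    beta = np.array([-0.8, -0.4])                 # assumed non-zero local priors (generic)
    x1_diff = np.array([1.0, 1.0, -1.0, -1.0])    # fixed first-attribute differences, 4 tasks

    def d_error(x2_diff):
        # Attribute-difference matrix for a two-alternative design (tasks x K)
        D = np.column_stack([x1_diff, x2_diff])
        p = 1.0 / (1.0 + np.exp(-(D @ beta)))     # binary logit choice probabilities
        info = (D * (p * (1.0 - p))[:, None]).T @ D   # Fisher information, one respondent
        det = np.linalg.det(info)
        if det < 1e-12:                           # guard against degenerate designs
            return np.inf
        return det ** (-1.0 / len(beta))          # D-error = det(AVC)^(1/K) = det(info)^(-1/K)

    start = np.array([0.5, -0.5, 0.5, -0.5])      # non-degenerate starting levels
    res = minimize(d_error, x0=start, method="Nelder-Mead")

    D_opt = np.column_stack([x1_diff, res.x])
    p_opt = 1.0 / (1.0 + np.exp(-(D_opt @ beta)))
    print(p_opt.round(2))                          # probabilities settle well away from 0.5

The optimized probabilities echo the non-balanced splits of Table 7.2; for richer models, the same search can in principle be run with the appropriate (analytical or simulated) information matrix in place of the binary logit one.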
Kessels, Goos, Vandebroek and Yu (2006 to Current)

Research groups centred at the University of Antwerp and the Catholic University of Leuven have also been active in promoting the application of experimental design theory in the generation of SC experiments. A significant proportion of the work originating from this group deals specifically with algorithms for generating these types of designs and is hence beyond the scope of this chapter (see, for example, Kessels et al., 2006, 2009; Yu et al., 2010, 2012). This group has actively examined a wide range of design criteria beyond the D-error statistic, including G- and V-error measures, which are designed to minimize the prediction error variance (see, for example, Kessels et al., 2006). However, this group has also been actively looking at other areas of SC experimental design theory. In particular, they have examined designs for the cross-sectional MMNL model under non-zero Bayesian priors and generic parameters (see Yu et al., 2009), as well as designs generated for MNL models with no-choice alternatives under the assumption of non-zero Bayesian priors (Vermeulen et al., 2008) and NL models including no-choice alternatives under the assumption of non-zero Bayesian priors (see Goos et al., 2010). Yu et al. (2008) also looked at efficient designs assuming non-zero priors for both main and interaction effects; this was later extended to allow for interaction effects assuming an MNL model specification and non-zero Bayesian priors. More recently, the theory related to designs optimized for WTP has been extended to include non-zero Bayesian priors (Vermeulen et al., 2011a). This group has also been researching sequentially generated designs which adapt at the individual level, both for models optimized for measuring preferences (Yu et al., 2011) and for WTP (Danthurebandara et al., 2011a). Two further areas of innovation recently researched by this group are the inclusion of an entropy measure, as a measure of complexity, in generating SC designs for MNL models with non-zero priors (Danthurebandara et al., 2011b), and designs for rank order experiments involving rank-exploded data (Vermeulen et al., 2011b).
6 CONCLUDING COMMENTS
The chapter has sought to provide a brief overview of experimental design theory as related to SC studies and, in doing so, make clear that the construction of experimental designs for SC problems is not independent of other wider decisions that are necessary when constructing these types of questionnaires. The central thesis of the chapter, however, has been to discuss how key research streams, as emanating from different research groups, have progressed knowledge in this field. The chapter has not sought to provide any new information on the design of SC experiments as everything discussed herein has already been reported elsewhere. However, in examining research from several different disciplines we have attempted to bring together, in one document, the substantive frontier of knowledge as held at this time. In writing the chapter we have not attempted to provide an in-depth discussion or analysis of the specific methods used for generating SC designs. By reviewing the key research streams, references have been provided that the interested reader can use to learn the specifics of the various methods. Nevertheless, where necessary, we have pointed out key misunderstandings of certain research streams and attempted to resolve
these within a general theory of design. For example, we have argued that Street and Burgess sought specifically to optimize designs under specific assumptions which can be replicated using the same methods advocated by other researchers, and that these are a specific case of optimal designs rather than designs which will be optimal independent of the assumptions made. One important point we hope the reader does take away from this chapter concerns the inappropriate use of a number of statistical measures which have come to be prevalent within the literature. In particular, we deliberately point out two such measures: the first designed specifically to optimize designs for linear models, and the second used to optimize designs assuming an MNL model specification with all generic parameters and priors equal to zero under orthonormal codes (that is, the specific case examined by Street and Burgess). While use of these measures is perfectly valid if one wishes to optimize a design under these expressed assumptions, applying them to judge the optimality of designs generated under other assumptions (including different model specifications, prior parameter assumptions and coding structures) is both incorrect and misleading (we are not implying that those who devised these measures have applied them incorrectly; however, we can attest to the fact that a number of reviewers have over time applied them inappropriately to infer that designs are not optimal, even when generated under different sets of assumptions). To apply a sporting analogy highlighting how inappropriate the blanket use of such measures is: it is equivalent to attempting to play golf by the rules of hockey simply because both sports use sticks. In this chapter, it has been argued that the use of orthogonal designs for non-linear models, such as the logit model, will be inefficient under most, but not all, assumptions made during the design generation phase (for example, the specific case examined by Street and Burgess shows that orthogonal designs are optimal under some assumptions). Nevertheless, orthogonal designs remain to this day the most widely used design type. Such prevalence is the result of the fact that orthogonal designs appear to (and actually do) work well in most cases, and it is important to understand why this is the case. Designs of all types, whether orthogonal or non-orthogonal, are generated under assumptions about the true population parameters (that is, the priors that are assumed). These assumptions are either explicitly acknowledged by those generating the design or implicitly made without their knowledge. Perhaps unknown to many, an orthogonal design will be the optimal design for an unlabelled experiment under the assumption of local priors set to zero (see Bliemer and Rose, 2005a). If the true population parameters differ from those assumed in the design generation phase, then the design will generally lose statistical efficiency. If a prior parameter is incorrectly specified, the standard error obtained for the true population parameter will typically increase, all else being equal. Note that this does not mean that the true parameter cannot be estimated using the design, but simply that a larger sample size would be required to detect statistical significance of the parameter estimate than would have been the case had the assumed prior been correct.
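The zero-prior result mentioned above is easy to verify in a few lines (a sketch under binary logit, generic-parameter assumptions; the design values are illustrative): with all priors at zero, every choice probability equals 0.5, the information matrix collapses to 0.25 · DᵀD, and D-optimality reduces to the linear-model criterion that an orthogonal difference matrix D maximizes.

    import numpy as np

    D = np.array([[ 1.0,  1.0],
                  [ 1.0, -1.0],
                  [-1.0,  1.0],
                  [-1.0, -1.0]])               # orthogonal attribute differences (illustrative)

    beta0 = np.zeros(2)                          # local priors set to zero
    p = 1.0 / (1.0 + np.exp(-(D @ beta0)))       # every probability is exactly 0.5
    info = (D * (p * (1.0 - p))[:, None]).T @ D  # equals 0.25 * D.T @ D
    print(p, "\n", info)                         # diagonal information: zero parameter covariance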
It is for this precise reason that orthogonal designs have appeared to work well in the past, and will likely continue to work well into the future. That is, the sample sizes used in practice have, in most cases reported in the literature, been such that they have sufficiently compensated for
the loss of efficiency in the design as the true parameters diverge from those assumed in generating the design. Nevertheless, as econometric models become more sophisticated, such an argument may become less valid, and larger samples may be required to detect particular effects within the data. This may be particularly true of models using random parameters to detect preference or scale heterogeneity, or models using complex non-linear functions. Still, the point of those advocating non-orthogonal designs generated under non-zero prior parameter estimates is that, in undertaking SC experiments, one would assume that the attributes chosen will have some influence on the choices made by the respondents, and hence that the true population parameters will be non-zero. In such cases, the argument is that these designs will outperform orthogonal designs given similar sample sizes, or produce the same results as an orthogonal design but with smaller sample sizes. If the parameters are unknown but their signs are known, one can still find designs more efficient than an orthogonal one. It is important to note that the above discussion is predicated on the assumption of all else being equal. That is, it assumes that there exists no link between the population parameter estimates and the design itself. Several articles have convincingly argued that the design may result in unintended biases in the parameter estimates (for example, Louviere and Lancsar, 2009). In theory, however, this should not be the case. McFadden (1974) showed that, asymptotically, the parameter estimates should converge to the population parameters independent of the data matrix (that is, the design in this instance). Using Monte Carlo simulations, McFadden further showed that this was the case in quite small finite samples, with as few as 50 choice observations. Numerous studies using simulation have led to the same conclusions (for example, see Ferrini and Scarpa, 2007). However, the arguments put forward by Louviere and Lancsar (2009), drawing on research derived from the psychology literature on demand characteristics or demand-induced effects (see Orne, 1959, 1969), remain compelling. They posit that if the design attributes correlate with unobserved omitted covariates or latent constructs, such as personality profiles or other such characteristics, then the resulting parameters obtained from different designs will indeed be influenced by the specific design used. Such biases will not exist in simulated data unless they are assumed in the data generation process, which makes empirical studies far more important in determining whether these biases are real or not. Thus, this represents an important area of research that is urgently required, as the existence of any such biases may require a different line of enquiry in terms of generating designs than has occurred in the past, as outlined in this chapter. Similarly, the impact of designs upon scale also represents an important research area. Louviere et al. (2008) and Bliemer and Rose (2011) found scale differences across various designs relating to how 'easy' the questions generated from the design are. Both Louviere et al. and Bliemer and Rose found, for example, that orthogonal designs tended to lead to lower error variances than efficient designs, possibly as a result of the presence of dominated alternatives.
Given that efficient designs are less likely to have dominated alternatives than orthogonal designs, as choice probabilities are optimized, the questions arising from the use of orthogonal designs will be easier to answer, resulting in lower error variance. As such, there exists the very real possibility that any move away from orthogonal designs to other designs represents a trade-off between capturing more information per question versus lowering error variance. Once more, further research is required to address this specific issue.
ACKNOWLEDGEMENTS

We would like to thank Matthew Beck and Kirsten Howard for their comments on earlier drafts of this chapter. We would also like to acknowledge the anonymous reviewer who provided useful comments and feedback resulting in a much better chapter. All mistakes and errors remain ours, however.
NOTE

1. If a design is blocked, then each design replication will require several respondents, and hence the total number of respondents will be N × b, where b is the number of blocks.
REFERENCES

Anderson, D.A. and J.B. Wiley (1992), 'Efficient choice set designs for estimating cross-effects models', Marketing Letters, 3 (4), 357–70.
Bliemer, M.C.J., J.M. Rose and D.A. Hensher (2009), 'Constructing efficient stated choice experiments allowing for differences in error variances across subsets of alternatives', Transportation Research Part B, 43 (1), 19–35.
Bliemer, M.C.J. and J.M. Rose (2005a), 'Efficient designs for alternative specific choice experiments', working paper, ITS-WP-05-04, Institute of Transport Studies, University of Sydney.
Bliemer, M.C.J. and J.M. Rose (2005b), 'Efficiency and sample size requirements for stated choice studies', working paper, ITS-WP-05-08, Institute of Transport Studies, University of Sydney.
Bliemer, M.C.J. and J.M. Rose (2009), 'Designing stated choice experiments: the state of the art', in R. Kitamura, T. Yoshi and T. Yamamoto (eds), The Expanding Sphere of Travel Behaviour Research, Selected Papers from the 11th International Conference on Travel Behaviour Research, Bingley: Emerald, pp. 495–8.
Bliemer, M.C.J. and J.M. Rose (2010a), 'Construction of experimental designs for mixed logit models allowing for correlation across choice observations', Transportation Research Part B, 46 (3), special issue, 720–34.
Bliemer, M.C.J. and J.M. Rose (2010b), 'A unified theory for stated choice designs', Institute of Transport and Logistics Studies working paper series, University of Sydney.
Bliemer, M.C.J. and J.M. Rose (2010c), 'Serial choice conjoint analysis for estimating discrete choice models', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-the-Practice, Bingley: Emerald, pp. 139–61.
Bliemer, M.C.J. and J.M. Rose (2011), 'Experimental design influences on stated choice outputs: an empirical study in air travel choice', Transportation Research Part A, 45 (1), 63–79.
Bunch, D.S., J.J. Louviere and D.A. Anderson (1996), 'A comparison of experimental design strategies for multinomial logit models: the case of generic attributes', working paper, Graduate School of Management, University of California at Davis.
Burgess, L. and D.J. Street (2005), 'Optimal designs for choice experiments with asymmetric attributes', Journal of Statistical Planning and Inference, 134 (1), 288–301.
Carlsson, F. and P. Martinsson (2003), 'Design techniques for stated preference methods in health economics', Health Economics, 12 (4), 281–94.
Danthurebandara, V., J. Yu and M. Vandebroek (2011a), 'Sequential choice designs to estimate the distribution of willingness-to-pay', Quantitative Marketing and Economics, 9 (4), 429–48.
Danthurebandara, V., J. Yu and M. Vandebroek (2011b), 'Effect of choice complexity on design efficiency in conjoint choice experiments', Journal of Statistical Planning and Inference, 141 (7), 2276–86.
Ferrini, S. and R. Scarpa (2007), 'Designs with a priori information for nonmarket valuation with choice experiments: a Monte Carlo study', Journal of Environmental Economics and Management, 53 (3), 342–63.
Flynn, T.N., J.J. Louviere, T.J. Peters and J. Coast (2008), 'Estimating preferences for a dermatology consultation using best-worst scaling: comparison of various methods of analysis', BMC Medical Research Methodology, 8 (76), 1–12.
Fowkes, A.S. and M.R. Wardman (1988), 'The design of stated preference travel choice experiments with particular regard to inter-personal taste variations', Journal of Transport Economics and Policy, 22 (1), 27–44.
Fowkes, A.S., M. Wardman and D.G.P. Holden (1993), 'Non-orthogonal stated preference design', Proceedings of the PTRC Summer Annual Meeting, London: PTRC Education Research Services Limited, pp. 91–7.
Goos, P., B. Vermeulen and M. Vandebroek (2010), 'D-optimal conjoint choice designs with no-choice options for a nested logit model', Journal of Statistical Planning and Inference, 140 (4), 851–61.
Hensher, D.A. and J.J. Louviere (1983), 'Using discrete choice models with experimental design data to forecast consumer demand for a unique cultural event', Journal of Consumer Research, 10 (3), 348–61.
Huber, J. and K. Zwerina (1996), 'The importance of utility balance and efficient choice designs', Journal of Marketing Research, 33 (3), 307–17.
Jaeger, S.R. and J.M. Rose (2008), 'Stated choice experimentation, contextual influences and food choice: a case study', Food Quality and Preference, 19 (6), 539–64.
Johnson, F.R., B.J. Kanninen and M. Bingham (2006), 'Experimental design for stated choice studies', in B.J. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Approach to Theory and Practice, Dordrecht: Springer, pp. 159–202.
Kanninen, B.J. (2002), 'Optimal design for multinomial choice experiments', Journal of Marketing Research, 39 (2), 214–17.
Keppel, G. and D.W. Wickens (2004), Design and Analysis: A Researcher's Handbook, 4th edn, Englewood Cliffs, NJ: Pearson Prentice Hall.
Kessels, R., B. Jones, P. Goos and M. Vandebroek (2009), 'An efficient algorithm for constructing Bayesian optimal choice designs', Journal of Business and Economic Statistics, 27 (2), 279–91.
Kessels, R., P. Goos and M. Vandebroek (2006), 'A comparison of criteria to design efficient choice experiments', Journal of Marketing Research, 43 (3), 409–19.
Kuhfeld, W.F., R.D. Tobias and M. Garratt (1994), 'Efficient experimental design with marketing research applications', Journal of Marketing Research, 31 (4), 545–57.
Lazari, A.G. and D.A. Anderson (1994), 'Designs of discrete choice set experiments for estimating both attribute and availability cross effects', Journal of Marketing Research, 31 (3), 375–83.
Louviere, J.J. and D.A. Hensher (1983), 'Using discrete choice models with experimental design data to forecast consumer demand for a unique cultural event', Journal of Consumer Research, 10 (3), 348–61.
Louviere, J.J. and E. Lancsar (2009), 'Choice experiments in health: the good, the bad, the ugly and toward a brighter future', Health Economics, Policy and Law, 4 (4), 527–46.
Louviere, J.J. and G. Woodworth (1983), 'Design and analysis of simulated consumer choice or allocation experiments: an approach based on aggregate data', Journal of Marketing Research, 20 (4), 350–67.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, Cambridge: Cambridge University Press.
Louviere, J.J., T. Islam, N. Wasi, D. Street and L. Burgess (2008), 'Designing discrete choice experiments: do optimal designs come at a price?', Journal of Consumer Research, 35 (2), 360–75.
Lusk, J.L. and F.B. Norwood (2005), 'Effect of experimental design on choice-based conjoint valuation estimates', American Journal of Agricultural Economics, 87 (3), 771–85.
MacCrimmon, K.R. and M. Toda (1969), 'The experimental determination of indifference curves', Review of Economic Studies, 36 (4), 433–51.
May, K.O. (1954), 'Intransitivity, utility, and the aggregation of preference patterns', Econometrica, 22 (1), 1–13.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice behaviour', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
Mosteller, F. and P. Nogee (1951), 'An experimental measurement of utility', Journal of Political Economy, 59 (5), 371–404.
Orne, M.T. (1959), 'The demand characteristics of an experimental design and their implications', paper presented at the American Psychological Association, Cincinnati, September.
Orne, M.T. (1969), 'Demand characteristics and the concept of quasi-controls', in R. Rosenthal, R.L. Rosnow and A.E. Kazdin (eds), Artifact in Behavioral Research, New York: Academic Press, pp. 143–79.
Peirce, C.S. (1876), 'Note on the theory of the economy of research', Coast Survey Report, pp. 197–201.
Rose, J.M. and M.C.J. Bliemer (2004), 'The design of stated choice experiments: the state of practice and future challenges', working paper, ITS-WP-04-09, Institute of Transport Studies, University of Sydney.
Rose, J.M. and M.C.J. Bliemer (2005a), 'Constructing efficient choice experiments', working paper, ITS-WP-05-07, Institute of Transport Studies, University of Sydney.
Rose, J.M. and M.C.J. Bliemer (2005b), 'Sample optimality in the design of stated choice experiments', working paper, ITS-WP-05-13, Institute of Transport Studies, University of Sydney.
Rose, J.M. and M.C.J. Bliemer (2006), 'Designing efficient data for stated choice experiments', 11th International Conference on Travel Behaviour Research, Kyoto, 16–20 August.
Rose, J.M. and M.C.J. Bliemer (2009), 'Constructing efficient stated choice experimental designs', Transport Reviews, 29 (5), 587–617.
Rose, J.M., S. Bain and M.C.J. Bliemer (2011), 'Experimental design strategies for stated preference studies dealing with non-market goods', in J. Bennett (ed.), International Handbook on Non-Marketed Environmental Valuation, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 273–9.
Rose, J.M., M.C.J. Bliemer, D.A. Hensher and A.C. Collins (2008), 'Designing efficient stated choice experiments involving respondent based reference alternatives', Transportation Research Part B, 42 (4), 395–406.
Rose, J.M., R. Scarpa and M.C.J. Bliemer (2009), 'Incorporating model uncertainty into the generation of efficient stated choice experiments: a model averaging approach', International Choice Modelling Conference, Harrogate, 30 March–1 April.
Rousseas, S.W. and A.G. Hart (1951), 'Experimental verification of a composite indifference map', Journal of Political Economy, 59 (4), 288–318.
Sándor, Z. and M. Wedel (2001), 'Designing conjoint choice experiments using managers' prior beliefs', Journal of Marketing Research, 38 (4), 430–44.
Sándor, Z. and M. Wedel (2002), 'Profile construction in experimental choice designs for mixed logit models', Marketing Science, 21 (4), 455–75.
Sándor, Z. and M. Wedel (2005), 'Heterogeneous conjoint choice designs', Journal of Marketing Research, 42 (2), 210–18.
Scarpa, R. and J.M. Rose (2008), 'Design efficiency for non-market valuation with choice modelling: how to measure it, what to report and why', Australian Journal of Agricultural and Resource Economics, 52 (3), 253–82.
Street, D.J. and L. Burgess (2004), 'Optimal and near optimal pairs for the estimation of effects in 2-level choice experiments', Journal of Statistical Planning and Inference, 118 (1–2), 185–99.
Street, D.J. and L. Burgess (2007), The Construction of Optimal Stated Choice Experiments: Theory and Methods, Hoboken, NJ: Wiley.
Street, D.J., D.S. Bunch and B.J. Moore (2001), 'Optimal designs for 2^k paired comparison experiments', Communications in Statistics: Theory and Methods, 30 (10), 2149–71.
Street, D.J., L. Burgess and J.J. Louviere (2005), 'Quick and easy choice sets: constructing optimal and nearly optimal stated choice experiments', International Journal of Research in Marketing, 22 (4), 459–70.
Thurstone, L. (1931), 'The indifference function', Journal of Social Psychology, 2 (2), 139–67.
Toner, J.P., S.D. Clark, S.M. Grant-Muller and A.S. Fowkes (1998), 'Anything you can do, we can do better: a provocative introduction to a new approach to stated preference design', WCTR Proceedings, Antwerp, 3, 107–20.
Toner, J.P., M. Wardman and G. Whelan (1999), 'Testing recent advances in stated preference design', Proceedings of the European Transport Conference, Cambridge.
Train, K. (2009), Discrete Choice Methods with Simulation, 2nd edn, Cambridge: Cambridge University Press.
Vermeulen, B., P. Goos and M. Vandebroek (2008), 'Models and optimal designs for conjoint choice experiments including a no-choice option', International Journal of Research in Marketing, 25 (2), 94–103.
Vermeulen, B., P. Goos, R. Scarpa and M. Vandebroek (2011a), 'Bayesian conjoint choice designs for measuring willingness to pay', Environmental and Resource Economics, 48 (1), 129–49.
Vermeulen, B., P. Goos, R. Scarpa and M. Vandebroek (2011b), 'Rank-order conjoint experiments: efficiency and design', Journal of Statistical Planning and Inference, 141 (8), 2519–31.
Watson, S.M., J.P. Toner, A.S. Fowkes and M.R. Wardman (2000), 'Efficiency properties of orthogonal stated preference designs', in J. de D. Ortuzar (ed.), Stated Preference Modelling Techniques, London: PTRC Education and Research Services Limited, pp. 91–101.
Yu, J., P. Goos and M. Vandebroek (2008), 'Model-robust design of conjoint choice experiments', Communications in Statistics: Simulation and Computation, 37 (8), 1603–21.
Yu, J., P. Goos and M. Vandebroek (2009), 'Efficient conjoint choice designs in the presence of respondent heterogeneity', Marketing Science, 28 (1), 122–35.
Yu, J., P. Goos and M. Vandebroek (2010), 'Comparing different sampling schemes for approximating the integrals involved in the efficient design of stated choice experiments', Transportation Research Part B, 44 (10), 1268–89.
Yu, J., P. Goos and M. Vandebroek (2011), 'Individually adapted sequential Bayesian designs for conjoint choice experiments', International Journal of Research in Marketing, 28 (4), 378–88.
Yu, J., P. Goos and M. Vandebroek (2012), 'A comparison of different Bayesian design criteria to compute efficient conjoint choice experiments', Transportation Research Part B, 46 (7), 789–807.
8 Best-worst scaling: theory and methods
T.N. Flynn and A.A.J. Marley
1 INTRODUCTION
Best-Worst Scaling (BWS) can be a method of data collection, and/or a theory of how respondents provide top and bottom ranked items from a list. We begin with a brief history, followed by motivations for the use of BWS. The three types ('cases') of BWS will then be described in detail, before issues in the conceptualisation (modelling) and analysis of best-worst data are discussed. We present the simplest models of BWS, both for expository reasons and because they have interesting (score) properties that have been found useful in data analyses. At various places, especially in section 5, we cite some more complex models that can handle intra-option dependencies, preference heterogeneity across decision makers, context effects, and so on. Busemeyer and Rieskamp (Chapter 3 of this volume) discuss parallel models for such effects in best choice; there is no principled reason why, as data warrant, those models for best choice cannot be extended to best-worst choice.

1.1 Brief History
In 1987, whilst working at the University of Alberta, Jordan Louviere became interested in what he could do with information about the 'least preferred item' from a choice set, in addition to the traditional 'most preferred' item. He was primarily interested in whether a PhD student could do the task, and in what extra information about her utility function could be elicited. His initial focus was on 'objects', such as attitudes, general public policy goals, brands, or anything that did not require description in terms of attributes and levels. As such, his first peer-reviewed journal article examined the degree of concern the general public had for each of a set of food safety goals, including irradiation of foods and pesticide use on crops (Finn and Louviere, 1992). Figure 8.1 contains a BWS question similar to ones used in that study.

  Most   Issue                             Least
         Pesticides used on crops
         Hormones given to livestock
         Irradiation of foods
         Excess salt, fat, cholesterol
         Antibiotics given to livestock

  Please consider the food safety issues in the table above and tick which concerns you most and which concerns you least.

Figure 8.1 A completed example BWS 'object case' question
Finn and Louviere proposed using BWS in place of category rating scales for several reasons. First, rating scales do not force respondents to discriminate between items, allowing them to state that multiple items are of similarly high importance. Second, interpreting what the rating scale values mean is difficult. Third, the reliability and validity of rating scales are frequently unknown and unknowable. BWS addresses these issues by valuing items within a random utility framework (Thurstone, 1927; McFadden, 1974): choice frequencies provide the metric by which to compare the importance of items, and the use of a model with an error theory enables predictions to be made as to how often one item might be picked over any other. Such inferences gave the method real-life significance and avoided key problems with rating scales, such as 'what does 7 out of 10 mean in terms of real-life choices?' The 1992 paper modelled best and worst choices among relatively simple items, such as goals or attitudes, which Louviere generically referred to as objects. However, he had already begun applying BWS to more complex items. These were either attribute-levels describing a single alternative (profile), or complete alternatives (profiles) of the type familiar to choice modellers. The former case, requiring respondents to identify the best attribute-level and worst attribute-level within an alternative, was a relatively unfamiliar task to choice modellers. However, the latter case had the potential to become the most widely accepted case of BWS, being 'merely' an extension of the method to a discrete choice experiment (DCE: Louviere et al., 2000; Hensher et al., 2005). In practice, only the first case of BWS (considering objects) received any sustained interest in academia before 2005, principally in marketing among practitioners who were unhappy with rating scales. In particular, Steve Cohen won a number of best paper awards at ESOMAR conferences for his application of BWS to objects (Cohen and Neira, 2004). Since the mid-2000s, there has been increasing interest in the other two cases of BWS, particularly within the fields of health and economics. This prompted Louviere et al. (2013) to try to standardize terminology across the fields and provide cross-disciplinary guides to BWS that included motivations for its use.

1.2 Motivation
During the 1980s Louviere and colleagues (Hensher and Louviere, 1982; Louviere and Woodworth, 1983) pioneered discrete choice experiments. These had advantages over traditional conjoint measurement techniques in terms of sound theoretical underpinnings (random utility theory) and the need to make fewer and weaker assumptions about human decision-making (Louviere et al., 2010): for example, assumptions about how people deal with numbers to answer rating scale questions were no longer required. This move away from direct quantitative valuation towards discrete choice models came at a cost: there was usually no longer enough information to estimate models for single individuals and valid inferences were typically only possible after aggregating across groups of respondents. Therefore Louviere’s initial motivation for inventing BWS was to obtain more data from a given respondent. Louviere could have obtained more information from a respondent by simply asking her to answer more choice sets. The motivation for BWS was the following: if the respondent has already become familiar with a choice set containing 3 or more items by choosing ‘best’, why not simply exploit this, together with the human skill at identifying
extremes (Helson, 1964), and ask her for 'worst'? This would shorten the length of the DCE, certainly in terms of the number of choice sets administered, and potentially in terms of the total time taken. Obtaining more information about a list of items was not a new idea. Ranking models were first formulated by Luce, Marley and others (Luce, 1959; Luce and Suppes, 1965; Marley, 1968). However, with the exception of the rank ordered logit regression model (Beggs et al., 1981; Hausman and Ruud, 1987), closed-form random utility models that could be estimated using the computers available at the time were unavailable. Marley (1968) had put forward ideas on partial and complete ranking in his PhD thesis but had not taken them forward; Louviere credits Marley's concepts of the 'superior' and 'inferior' items in a list as his inspiration for BWS. Indeed, Louviere believed models of best-worst choices could be axiomatized, which would help establish their credentials with the wider research community, and he began collaborating with Marley to achieve this goal (Marley and Louviere, 2005; Marley et al., 2008). In parallel to this, Louviere continued research into the use of BWS merely as a convenient way (via repeated application) to collect a complete ranking of all items in a choice set. That work was conducted primarily with respect to multi-attribute consumer goods such as mobile phones, although it could easily have been used in studies examining simpler objects of the types mentioned above. Finally, as already mentioned briefly, Louviere was motivated to use another type of BWS (that requiring respondents to pick the best and worst attribute-levels) to address a major stumbling block encountered in discrete choice models: the confound between attribute weight and level scale (Anderson, 1970; Marley et al., 2008). Section 2.2 describes this issue in more detail, but suffice it to note that this confound is not addressed in traditional economic theory, and is different from the mean-variance confound that arises for all limited dependent variable (including logit and probit) models (Yatchew and Griliches, 1985).

1.3 BWS as a Method of Data Collection versus BWS as a Theory
Best-worst scaling was initially used mainly as a method of collecting data in a cost-efficient manner, though Finn and Louviere (1992) did introduce the maximum difference (maxdiff) model for best-worst choice (see below). However, the work with Marley spurred research into BWS as a theory, explaining the processes that individuals might follow in providing best and worst data. The 2005 paper introduced sequential and maxdiff models of best-worst choices (Marley and Louviere, 2005). The former assumes the individual provides best and worst in a particular order, whilst the latter is a well-established model that assumes a simultaneous choice of the pair of items that maximises the difference between them on a latent (usually utility) scale. Some applied practitioners may regard process issues as esoteric, but there is increasing evidence that these may matter when interest is in choice models for particular individuals. Moreover, what are called 'maxdiff scaling' models in some estimation software are sequential, not maxdiff, ones. Recent work by Louviere has returned to the use of BWS as a method of data collection (Louviere et al., 2008): its purpose, via repeated rounds of best-worst choices, is simply to obtain a full ranking of items in a manner that is 'easy' for respondents. Analysis proceeds by expanding the data to all the implied choice sets, and then using
statistical models based on first choices (either conditional or binary logistic regression). That work is not covered here as it is the subject of Frischknecht et al. (Chapter 21 of this volume); also see the class of weighted utility ranking models (Marley and Islam, 2012), which are motivated, in part, by those expansion methods. The next section describes the three types (‘cases’) of BWS in detail; it is important to note that BWS can be used as a data collection method and/or a theory of process for all three cases.
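To make the maxdiff model mentioned above concrete, here is a minimal sketch (the utilities are invented for illustration) of the standard form in which the probability of picking item b as best and item w as worst from a set is proportional to exp(u_b − u_w):

    import numpy as np
    from itertools import permutations

    # Assumed latent utilities for three objects (illustrative values only)
    u = {"pesticides": 1.2, "hormones": 0.4, "irradiation": 0.9}

    # Maxdiff: P(b best, w worst) proportional to exp(u_b - u_w), over ordered pairs b != w
    pairs = list(permutations(u, 2))
    weights = np.array([np.exp(u[b] - u[w]) for b, w in pairs])
    probs = dict(zip(pairs, weights / weights.sum()))

    for (b, w), pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"best={b:12s} worst={w:12s} P={pr:.3f}")

The sequential models mentioned above instead factorize the same outcome into a best choice followed by a worst choice (or vice versa), rather than a single simultaneous pair choice.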
2 BWS: THE THREE CASES
Louviere developed three cases of BWS, which differ in the nature and complexity of the items being chosen. Case 1 (the Object case) is the simplest, whilst Cases 2 and 3 (the Profile and Multi-profile cases) involve an attributes-and-levels structure that should be familiar to choice modellers. The frequent lack of clarification in many published articles as to which case is being used reflects the fact that different disciplines have tended to embrace different cases. Academic researchers from marketing, food science and personality assessment tend to be familiar with Case 1, whilst those working in health are familiar with Case 2 (and, increasingly, Case 3), and marketing industry practitioners tend to use Case 3. This section concentrates on the principles and design of the three cases; section 3 presents details on the related models.

2.1 Case 1 (Object Case)
Case 1 BWS is appropriate when the researcher is interested in the relative values associated with each of a list of objects. These might be brands, public policy goals, or any set of objects that can be meaningfully compared. Generally, these will not be described in terms of an attribute and level structure. However, if the researcher is interested in valuing items such as brands, (s)he must recognize that respondents might infer particular levels of key attributes when considering these: instructions to respondents must be carefully worded to standardise any such inferences if estimates of (for example) airline carrier are not to be confounded with estimates of assumed levels of service. As mentioned above, the first peer-reviewed Case 1 study investigated food safety. It made clear that the latent scale of interest does not have to be 'utility'; degree of concern was key there, and other metrics may be of relevance, depending on the application. Indeed, many applications of category rating scales are amenable to replacement by best-worst questions. Once the researcher has chosen the list of objects, (s)he must present choice sets of these to respondents to obtain best and worst data. Choice sets here serve a similar purpose to those in traditional DCEs: statistical designs are implemented that include (some or all) subsets of all possible items which, with suitable assumptions, facilitate inferences about the value associated with the wider list of objects. More specifically, choice frequencies across all sets are used to estimate the relative values associated with objects. Since there is no attribute and level structure to consider, Case 1 designs are typically less complex (and less problematic) than those for DCEs. Early work by Louviere utilized 2^J designs, extending his seminal work on DCEs (Hensher and Louviere, 1982; Louviere and Woodworth, 1983). Such designs are so called because, for J objects, there
are 2^J distinct choice sets possible. Table 8.1 gives all 16 choice sets for 4 objects – there is one full set of four, four triples, six pairs, four singletons and the null (empty) set.

Table 8.1 Choice sets from a 2^J expansion for four objects

           Object W   Object X   Object Y   Object Z
  Set 1       ✓          ✓          ✓          ✓
  Set 2       ✓          ✓          ✓          –
  Set 3       ✓          ✓          –          ✓
  Set 4       ✓          –          ✓          ✓
  Set 5       –          ✓          ✓          ✓
  Set 6       ✓          ✓          –          –
  Set 7       ✓          –          ✓          –
  Set 8       ✓          –          –          ✓
  Set 9       –          ✓          ✓          –
  Set 10      –          ✓          –          ✓
  Set 11      –          –          ✓          ✓
  Set 12      ✓          –          –          –
  Set 13      –          ✓          –          –
  Set 14      –          –          ✓          –
  Set 15      –          –          –          ✓
  Set 16      –          –          –          –

Fractions of a 2^J design can be used to keep the number of choice sets small, using similar principles to those used in DCEs (for example, main effects designs). The potential problems with these designs are psychological rather than statistical in origin. The size of the choice set is not constant, and respondents may infer things that have no relevance for the outcome of interest: they might decide that objects in small choice sets 'must be somehow important to the researcher so I'll pay more attention to those'. Today, balanced incomplete block designs (BIBDs) are more common. A BIBD ensures that each object occurs, and each pair of objects co-occurs, a constant number of times, helping minimize the chance that respondents make unintended assumptions about the objects based on aspects of the design. For example, a study of the importance of nine short-term investment priorities in Sydney's public transport systems used a BIBD which presented 12 sets of size three; each object appeared four times across the design and each pair of objects appeared once. BIBDs are available from design catalogues, such as that in Street and Street (1987). Unfortunately, there are some numbers of objects for which no BIBD exists, whilst for other numbers two or more BIBDs are possible (varying in the number and size of choice sets). In the former case, it is best to include some 'realistic but irrelevant' objects to make the number up to one for which there is a BIBD; the alternative strategy of using a statistical algorithm to produce a 'nearly' balanced design risks problems similar to those above in terms of what the respondents are assuming. Furthermore, some of the attractive analysis methods to be discussed become problematic. An attractive feature of BIBDs is that the number of choice sets is often not markedly different from the number of objects being valued. This, together with the fact they force respondents to discriminate between objects, makes Case 1 BWS attractive in comparison
with category rating scale surveys. Those BIBDs that ensure that every object appears in every possible position the same number of times are called Youden designs, and represent the most robust defence against any propensity of the respondent to 'read too much into' the size or composition of the choice sets on offer.
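A brief sketch of the 2^J expansion behind Table 8.1, together with a check of the occurrence and co-occurrence balance that defines a BIBD (the BIBD tested at the end is the classic seven-object design in triples; the code is illustrative rather than a design tool):

    from itertools import combinations
    from collections import Counter

    objects = ["W", "X", "Y", "Z"]

    # The 2^J expansion: every subset of the J objects (16 sets for J = 4, as in Table 8.1)
    all_sets = [set(c) for r in range(len(objects) + 1)
                for c in combinations(objects, r)]
    usable = [s for s in all_sets if len(s) >= 2]   # best-worst needs at least two items
    print(len(all_sets), len(usable))                # 16 subsets, 11 usable as choice sets

    def is_bibd(design):
        # BIBD balance: every object occurs equally often and
        # every pair of objects co-occurs equally often
        occ = Counter(o for s in design for o in s)
        co = Counter(frozenset(p) for s in design for p in combinations(sorted(s), 2))
        return len(set(occ.values())) == 1 and len(set(co.values())) == 1

    fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
            {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
    print(is_bibd(fano))   # True: 7 objects in 7 triples, each pair co-occurring once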
2.2 Case 2 (Profile Case)
Case 2 BWS is largely unknown outside of the field of health economics, to which it was introduced by Szeinbach et al. (1999) and by McIntosh and Louviere in a conference paper (2002). It is easiest to describe using an example (Figure 8.2) based on a dermatology study (Coast et al., 2006; Flynn et al., 2008). The set looks like a single profile (alternative) from a DCE or conjoint study. However, the choices the respondent is asked to make do not require him/her to consider the value of the profile as a whole. Instead, (s)he must consider the attribute levels that describe it, choosing the one that is best (most attractive) and the one that is worst (least attractive). Case 2 BWS is most popular in health because (1) the systems of many industrialized countries do not typically give opportunities for patients to become 'experienced consumers' and (2) health care goods/services can be complicated, and even pairs of specifications (in a simple DCE) may lead to unacceptable cognitive burden, particularly among vulnerable patient groups. In some respects, Case 2 is merely Case 1 with the objects grouped into an attribute and level structure. However, what makes Case 2 unique is that attribute-levels are only meaningful when forming a profile. Thus, if meaningful profiles are to be presented, two levels of the same attribute cannot compete with one another; each level competes against a level from every other attribute. This means that designs for Case 1 are generally inappropriate for Case 2. However, Case 2 design is relatively easy for those researchers who would generate a DCE design in the following way:

1. Use a 'starting' design to produce (for example) the 'left-hand side' profile in every choice set, then
2. Use some statistical procedure to produce the other profiles in each choice set from the 'left-hand side' ones.
In particular, Case 2 design involves only step (1): there are no 'other' profiles in each choice set since the choice set is the profile.

[Figure 8.2: A completed example BWS 'profile case' question. Respondents see a single appointment profile ('Appointment #1': you will have to wait two months for your appointment; the specialist has been treating skin complaints part-time for 1–2 years; getting to your appointment will be quick and easy; the consultation will be as thorough as you would like) and are asked to imagine being offered the appointment described and to tick which feature would be best ('most') and which would be worst ('least').]

[Figure 8.3: An example BWS 'profile case' question based on the EQ-5D instrument. Respondents imagine living in the health state described (some problems walking about; no problems with self-care; some problems with performing usual activities; extreme pain or discomfort; moderately anxious or depressed) and tick which aspect would be best to live with and which would be worst to live with.]

Whilst this makes Case 2 design easy in some
respects, it has potential problems, which can be serious depending on the issue of interest. These generally arise when the attributes all have ordered levels. We use a profile from the EQ-5D health state classification system (Figure 8.3) to illustrate this. All five attributes of the EQ-5D have ordered levels, generally representing 'no problems' through to 'severe problems'. Presenting this particular profile to respondents would be unwise since the best and worst choices are obvious (to anyone who isn't a masochist). Conceptually, the random utility term is likely identically equal to zero (not a sampling zero), violating random utility theory and thereby biasing regression estimates (if they are estimable at all). It is the researcher's responsibility to code with care, so as to minimize the number of choice sets with this property. Unfortunately, as the design becomes larger (so as to estimate interaction terms), this becomes impossible. The first author recently advised a company that at least 75 per cent of its Case 2 data were worthless, thanks to a major choice modelling company providing it with a design that was almost the full factorial. Thus, for many Case 2 applications it may be difficult, or impossible, to estimate interaction terms.

Louviere originally anticipated that Case 2 BWS would allow decomposition of attribute weight and level scale values (McIntosh and Louviere, 2002), a 40-year-old problem in mathematical psychology (Anderson, 1970). That is, it is assumed that there is a multiplicative relationship between the importance of an attribute per se – which might vary depending on the context of the choice task – and the level scale of an attribute level – which, conceptually, should be fixed in value no matter what the context of the choice. McIntosh and Louviere were partly right: Marley et al. (2008) proved that although Case 2 BWS does not enable estimation of attribute importance, it does enable the direct estimation of attribute impact, a weaker concept that represents the average utility of an attribute across all its levels. Also, as shown in section 3.4.1, a Case 2 study, in combination with a Case 3 study on the same (or suitably related) profiles, in principle may allow the separate measurement of attribute weight and level scale value; should such a study be successful, it would solve this classic measurement problem.
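The 'obvious choice' problem described above can be screened for programmatically when all attributes share a common severity ordering. The sketch below is a minimal illustration under assumed coding (integer severity codes, 1 = mildest up to the top level = most severe; the five-attribute, three-level layout is hypothetical, loosely echoing the EQ-5D structure): it flags profiles that combine the mildest possible level with the most severe one, for which best and worst choices are close to deterministic.

```python
from itertools import product

def near_deterministic(profile, worst_level):
    # profile: tuple of severity codes, 1 = 'no problems';
    # worst_level = most severe code of the (common) ordering.
    # Flag profiles pairing the mildest with the most severe level:
    # best and worst picks are then obvious to the respondent.
    return min(profile) == 1 and max(profile) == worst_level

n_attributes, n_levels = 5, 3
full_factorial = list(product(range(1, n_levels + 1), repeat=n_attributes))
flagged = [p for p in full_factorial if near_deterministic(p, n_levels)]
print(f"{len(flagged)} of {len(full_factorial)} profiles flagged")  # 180 of 243
```

Note that roughly three-quarters of this hypothetical full factorial is flagged, consistent with the anecdote above about near-full-factorial Case 2 designs producing largely worthless data.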
2.3 Case 3 (Multi-profile Case)
Case 3 BWS is perhaps the most accessible (conceptually at least) to DCE practitioners. It ‘merely’ requires respondents to choose the worst (least attractive) profile/alternative
as well as the best (most attractive) one in a DCE. Therefore, virtually all (non-pairwise) DCEs administered by The Institute for Choice at the University of South Australia are now Case 3 BWS studies. Figure 8.4 provides an example task from a mobile phone study.

[Figure 8.4: An example BWS 'multi-profile case' question. Respondents choose the best and the worst of four phone profiles (Phones 1–4), described on nine attributes:
Phone Style: Clam or flip phone; Candy Bar or straight phone; Swivel flip; PDA phone with touch screen input
Handset Brand: A; B; C; D
Price: $49.00; $199.00; $249.00; $129.00
Built-in Camera: No camera; 5 megapixel camera; 2 megapixel camera; 3 megapixel camera
Wireless Connectivity: No Bluetooth or WiFi connectivity; Bluetooth and WiFi connectivity; WiFi connectivity; Bluetooth connectivity
Video Capability: No video recording; Video recording (up to 1 hour); Video recording (more than 1 hour); Video recording (up to 15 minutes)
Internet Capability: Internet access; Internet access; No internet access; No internet access
Music Capability: No music capability; MP3 Music Player only; FM Radio only; MP3 Music Player and FM Radio
Handset Memory: 64 MB built-in memory; 2 GB built-in memory; 512 MB built-in memory; 4 GB built-in memory
Note: Reprinted from the Journal of Mathematical Psychology, 56, A.A.J. Marley and D. Pihlens, 'Models of best-worst choice and ranking among multiattribute options', 24–34, © 2011, with permission from Elsevier.]

The increasing use of web-based administration makes expansion of DCEs into Case 3 BWS studies easy and cost-efficient. The additional data provided are valuable in many marketing applications: the extra information obtained about the consumer's utility function is valuable both for its own sake and in identifying attribute levels that make a good 'unacceptable'. The consumer who trades off attributes in a manner predicted by traditional economic theory when choosing most preferred might demonstrate lexicographic preferences when answering least preferred. Such information is valuable to the marketer who wishes to ensure that a product passes an initial 'consideration' test by consumers: having an attractive price and a set of desirably valued attributes is of no use if a level on another attribute rules it out of consideration. Availability of data and such 'real life' marketing issues have caused Case 3 BWS studies to be the primary vehicle for empirical
investigations of choice process issues to date. However, process issues apply to all three cases of BWS and these issues are dealt with next.

Models of Case 3 BWS have to date assumed additive representations of profiles, in large part because that is the 'natural' representation of the options being studied (for example, mobile phones). However, as more data become available on single individuals, it will be desirable to consider other representations (viz. rules, strategies) that an individual might apply to such choices – for instance, lexicographic ones, and/or different representations (rules, strategies) for best versus worst choices; such rules/strategies are also relevant to understanding data that show differences between accepting versus rejecting an option in certain situations (Shafir, 1993, and others – see section 3.2). Also, there is growing interest in applying DCEs for best choice (and, eventually, BWS) to other domains such as choice between uncertain or risky options (gambles) – see Johnson and Marley (2012) and Viney et al. (2005) for best choice in a medical context. The basic extension of BWS to gambles is routine – one 'simply' replaces an additive representation of the options with a weighted additive representation such as expected utility or rank-dependent utility; however, extensive evaluation will be needed to test such representations and to decide whether, say, different (probability) weights are used for worst versus best choices.
3 MULTINOMIAL LOGIT MODELS OF BEST AND/OR WORST CHOICE
We begin with notation that applies to all Cases (1, 2, and 3) and talk of 'choice options' (or 'options') without distinguishing between objects (Case 1), attribute-levels of a profile (Case 2), and profiles (Case 3). For later results, we need additional notation for Case 2 and Case 3. We also present the results in terms of a numeric 'utility' value associated with each choice option (and, as relevant, with each of its attribute-levels), rather than in terms of the utility coefficients ('beta weights') that are standard in the discrete choice literature; we do this because various theoretical results on BWS can only be stated and proved in the former notation (for example, those in Marley and Louviere, 2005; Marley et al., 2008; Marley and Pihlens, 2012). However, for the reader's benefit, we do introduce the utility coefficient notation when discussing Case 3.

Let $S$ with $|S| \ge 2$ denote the finite set of potentially available choice options, and let $D(S)$ denote the design, that is, the set of (sub)sets of choice alternatives that occur in the study. For example, participants might be asked about their preferences for mobile phones by repeatedly asking them for choices amongst a set of four different options: $S$ represents the collection of mobile phones in the study, and each element of the set $D(S)$ represents the set of options provided on one particular choice occasion. For any $Y \in D(S)$ with $|Y| \ge 2$, $B_Y(x)$ denotes the probability that alternative $x$ is chosen as best in $Y$, $W_Y(y)$ the probability that alternative $y$ is chosen as worst in $Y$, and $BW_Y(x, y)$ the probability that alternative $x$ is chosen as best in $Y$ and the alternative $y \ne x$ is chosen as worst in $Y$. Most previous work using similar mathematical notation has used $P_Y(x)$ or $P(x \mid Y)$ where we use $B_Y(x)$. We use the latter for best, and $W_Y(y)$ for worst, to distinguish clearly between such Best and Worst choice probabilities.
3.1 Multinomial Models of Best (Respectively, Worst) Choice
Many models of choice, especially those involving best-worst scaling, are based on extensions of the multinomial logit (MNL) model. The best choice MNL model assumes there is a scale $u$ such that for all $y \in Y \in D(S)$,

\[ B_Y(y) = \frac{e^{u(y)}}{\sum_{z \in Y} e^{u(z)}}. \tag{8.1} \]
The value $u(y)$ for an option $y$ is interpreted as the utility for that option. The representation restricted to $Y \subseteq S$, $|Y| = 2$, is the binary MNL model. Various results show that $u$ can be assumed to be a difference scale – that is, one that is unique up to an origin. For instance, the Luce choice model corresponds to the MNL model when the latter is written in terms of $b = e^u$; $b$ is shown to be a ratio scale in Luce (1959), which implies that $u$ is a difference scale. Parallel observations hold for the representations throughout this chapter that are written in terms of $u$, where the quoted theoretical results were obtained using $b = e^u$. The parallel MNL model for worst choices assumes there is a scale $v$ such that for all $y \in Y \in D(S)$,

\[ W_Y(y) = \frac{e^{v(y)}}{\sum_{z \in Y} e^{v(z)}}. \tag{8.2} \]
Now assume that both (8.1) and (8.2) hold, and that the choice probabilities on 2-element sets satisfy the plausible condition: for all distinct pairs $x, y \in Y \in D(S)$,

\[ B_{\{x,y\}}(x) = W_{\{x,y\}}(y), \tag{8.3} \]

that is, we have

\[ W_Y(y) = \frac{e^{-u(y)}}{\sum_{z \in Y} e^{-u(z)}}. \tag{8.4} \]
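As a quick numerical illustration of (8.1) and (8.4), the following minimal Python sketch (with hypothetical utility values) computes best and worst MNL probabilities; note that the worst probabilities are simply a softmax of the negated utilities.

```python
import numpy as np

def mnl_best(u):
    """Best choice MNL probabilities, equation (8.1)."""
    e = np.exp(u - u.max())        # subtract max for numerical stability
    return e / e.sum()

def mnl_worst(u):
    """Worst choice MNL probabilities, equation (8.4): softmax of -u."""
    return mnl_best(-u)

u = np.array([0.5, 1.2, -0.3, 0.0])   # hypothetical utilities, four options
print(mnl_best(u))    # the highest-utility option is most likely best
print(mnl_worst(u))   # the lowest-utility option is most likely worst
```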
3.2 The Assumption of Multinomial Logits

A significant portion of the theoretical and empirical research on BWS assumes the MNL representation of (8.1) for best choices and the MNL representation (8.4) for worst choices, or a closely related (MNL) model for best-worst choice – see (8.6) in section 3.3. These assumptions are somewhat surprising for two reasons. First, the assumption that an MNL model holds for best choices means that those probabilities satisfy the independence of irrelevant alternatives (IIA) condition, a constraint that is generally considered unsatisfactory for various (best, first) choice data due to dependencies (correlations) between the (scale) values of different multiattribute options (Louviere et al., 2000; Train, 2009). Second, there are between-subject binary choice data that can be interpreted as showing that the probability of choosing (accepting) a particular option x (under an instruction to accept one of the options) plus the probability of
rejecting that same option x (under the instruction to reject one of the options) can be greater than one (Laran and Wilcox, 2011; Shafir, 1993; Wedell, 1997; and numerous others). Then, using the notation of (8.3), these between-subject data imply that $B_{\{x,y\}}(x) + W_{\{x,y\}}(x) > 1$, that is, that $B_{\{x,y\}}(x) > [1 - W_{\{x,y\}}(x)] = W_{\{x,y\}}(y)$, and so $B_{\{x,y\}}(x) > W_{\{x,y\}}(y)$, which contradicts (8.3), and thus the assumption that $v = -u$.

A possible reason that the MNL assumption has been seen to be useful in BWS is that until recently much of the best-worst data has been for Case 1 or Case 2, which are simpler tasks for respondents than Case 3, and it is the latter that has been mainly used in the classic applications to first (best) choice. More recently, Case 3 BWS data have been fitted by generalizations of the MNL models that include scale (variance) factors that depend on various aspects of the design and/or the respondents. An example of such a generalization of the MNL model for best choices, (8.1) – with a similar generalization for worst choices – is the following: there is a non-negative scale factor $s$ defined for each integer $2, 3, \ldots, |T|$, and a difference scale $u$ such that for all $y \in Y \subseteq T$,

\[ B_Y(y) = \frac{e^{s(|Y|)\,u(y)}}{\sum_{z \in Y} e^{s(|Y|)\,u(z)}}. \tag{8.5} \]
The form (8.5) is a special case of what Vermunt and Magidson (2005, Section 2.4) call an MNL with replication-specific scale factor and of Fiebig et al.'s (2010) generalized multinomial logit model (GMNL); the latter model also includes scale (variance) heterogeneity across individuals. Scarpa et al. (2011) collected ranking data by repeated best, then worst, choices and fitted those data quite successfully with a model based on repeated best choices, with those choices satisfying a generalization of the model in (8.5) that included dependencies of the scale factor $s$ on aspects of the design; this took account of the difference between the data collection method (repeated best, then worst) and the model (repeated best). Collins and Rose (2011) fit related models to stated preference data on dating choices and Marley and Islam (2012) fit what they called a generalized rank ordered logit model (GROL) to rank order data where the choices at each stage are best choice probabilities satisfying (8.5). Also, as required by the data, the MNL in the above can be replaced by other models, such as the GMNL for best choices, also adapted to worst choices (Fiebig et al., 2010), and one can consider latent class extensions of these models.

Turning to the assumption that the worst scale is the negative of the best scale, Bednarz (2006) fitted 11 BWS Case 1 data sets in various ways, including fitting the (marginal) best choices with an MNL model with scale values $u$, (8.1), and the marginal worst choices with an independent MNL model with scale values $v$, (8.4). For 10 of the 11 data sets there was a high correlation between the estimates of the scale values $u$ and $-v$. Bednarz did not present the values of the slopes for these plots, but they appear to be close to one; if this were the case, then the variability in the best and worst judgments would be approximately equal.

In summary, we assume MNL models in the following material as those models form the basis for more complex models involving preference heterogeneity, including latent classes; different scales in best and worst choices, and so on. Also, the MNL-based models have simple (score) properties that have been found to be useful in exploratory data analysis.
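The effect of the scale factor in (8.5) is easy to see numerically: larger values of $s$ sharpen the choice probabilities, smaller values flatten them towards uniformity. A minimal sketch, with hypothetical utilities and hypothetical scale values:

```python
import numpy as np

def scaled_mnl_best(u, s):
    """Equation (8.5): MNL best probabilities with the utilities
    multiplied by a set-size-dependent scale factor s = s(|Y|)."""
    z = s * u
    e = np.exp(z - z.max())
    return e / e.sum()

u = np.array([0.5, 1.2, -0.3])
for s in (0.5, 1.0, 2.0):          # hypothetical values of s(|Y|)
    print(s, np.round(scaled_mnl_best(u, s), 3))
# As s grows, probability mass concentrates on the highest-utility option.
```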
3.3 Multinomial Models of Best-Worst Choice
We now present three MNL-based models for best-worst choice. Perhaps the most natural generalization of the above MNL models to best-worst choice is the maxdiff model. This model makes the strong assumption that the utility of a choice alternative in the selection of a best option is the negative of the utility of that option in the selection of a worst option, and this utility scale $u$ is such that for all $x, y \in Y \in D(S)$, $x \ne y$,

\[ BW_Y(x, y) = \frac{e^{[u(x) - u(y)]}}{\sum_{\substack{p, q \in Y \\ p \ne q}} e^{[u(p) - u(q)]}}. \tag{8.6} \]
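A minimal numerical sketch of (8.6), with hypothetical utilities, is given below; it also checks by simulation that the re-sampling process described in the next paragraph reproduces the maxdiff probabilities.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
u = np.array([0.5, 1.2, -0.3])            # hypothetical utilities

# Maxdiff probabilities, equation (8.6), over ordered (best, worst) pairs.
pairs = list(permutations(range(len(u)), 2))
w = np.array([np.exp(u[b] - u[q]) for b, q in pairs])
maxdiff = dict(zip(pairs, w / w.sum()))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Process description: draw best via (8.1) and worst via (8.4)
# independently, re-sampling whenever the two draws coincide.
n = 100_000
counts = {p: 0 for p in pairs}
for _ in range(n):
    b = rng.choice(len(u), p=softmax(u))
    q = rng.choice(len(u), p=softmax(-u))
    while b == q:
        b = rng.choice(len(u), p=softmax(u))
        q = rng.choice(len(u), p=softmax(-u))
    counts[(b, q)] += 1

for p in pairs:   # simulated frequencies approximate the (8.6) values
    print(p, round(counts[p] / n, 3), round(maxdiff[p], 3))
```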
The maxdiff model can be given by the following process description (Marley and Louviere, 2005): assume that the best (respectively, worst) choice probabilities satisfy (8.1) (respectively, (8.4)). A person chooses a best and a worst option, independently, according to those models; if the resultant best and worst options are different, these form the best-worst choice pair; otherwise, the person re-samples both the best and the worst option until the selected options are different. Marley and Louviere (2005) show that, under the above assumptions, this process gives the maxdiff representation, (8.6). The above process description leads quite naturally to the question of how one might extend MNL-type models for choice to models of choice and response time. We consider this question in section 5 as ongoing and planned research is involved in its answer.

Once one begins thinking of best and worst choices as possibly being made sequentially, there are several other plausible model forms for the combined best-worst choices. Assuming the best choices satisfy (8.1) and the worst choices satisfy (8.4), the best then worst MNL model states: for all $x, y \in Y \in D(S)$, $x \ne y$,

\[ BW_Y(x, y) = B_Y(x)\, W_{Y - \{x\}}(y). \]

Similarly, the worst then best MNL model states: for all $x, y \in Y \in D(S)$, $x \ne y$,

\[ BW_Y(x, y) = W_Y(y)\, B_{Y - \{y\}}(x). \]

Repeated best-worst choices satisfying a common one of the above models lead naturally to models of rank order data. Various authors are exploring such models, and their generalizations to include heterogeneity – see Collins and Rose (2011), Scarpa and Marley (2011), Scarpa et al. (2011), Marley and Pihlens (2012), and Marley and Islam (2012).

For the additional material on Case 2 and Case 3, we limit consideration to the maxdiff model (for best-worst choice). Parallel notation applies to models based on repeated best and/or worst choices. The notation already introduced, where $x$, $y$, and so on, denoted generic objects, is all we need to state later theoretical results for Case 1. However, for Case 2 and Case 3 we need the following additional notation. There are $m$ attributes, usually with $m \ge 2$, and we let $M = \{1, \ldots, m\}$. Attribute $i$, $i = 1, \ldots, m$, has $q(i)$ levels; we call these attribute-levels and sometimes let $p, q$ denote typical attribute-levels, with the context making clear which attribute is involved.
A profile (traditionally called a multiattribute option) is an $m$-component vector with each component $i$ taking on one of the $q(i)$ levels for that component. Given a set $P$ of such profiles, let $D(P)$ denote the design, that is, the set of (sub)sets of profiles that occur in the study. We denote a typical profile by

\[ \mathbf{z} = (z_1, \ldots, z_m), \tag{8.7} \]

where $z_i$, $i = 1, \ldots, m$, denotes the level of attribute $i$ in profile $\mathbf{z}$. For Case 1, we assume that each object $x$ has a scale value $u(x)$, so the representation of the maxdiff model is as in (8.6); it follows from the results in Marley and Louviere (2005) that $u$ is a difference scale, that is, unique up to an origin.1 For Case 2, we assume that attribute-level $p$ has a scale value $u(p)$, and the representation of the maxdiff model is as in (8.6); it follows from the results in Marley and Louviere (2005) that $u$ is a difference scale, that is, unique up to an origin. For Case 3, we assume that each profile $\mathbf{z}$ has a scale value $u^{(m)}(\mathbf{z})$ with $u^{(m)}$ a difference scale. We also assume the additive representation

\[ u^{(m)}(\mathbf{z}) = \sum_{i=1}^{m} u_i(z_i), \]
where each $u_i$ is a separate (different) difference scale.

For Case 2, we use the following revised notation: for a typical profile $\mathbf{z} \in P$, let $Z = \{z_1, \ldots, z_m\}$ and let $BW_Z(z_i, z_j)$ denote the probability that, jointly, the attribute-level $z_i$ is chosen as best in $Z$ and the attribute-level $z_j$ is chosen as worst in $Z$.

Definition 1 (Adapted from Marley et al., 2008, def. 12) A set of best-worst choice probabilities on a finite set of profiles $P$ satisfies an attribute-level maxdiff model (on single profiles) iff there exists a real-valued scale $u$ on the attributes such that for every profile $\mathbf{z} \in P$ [equivalently, for every such $Z = \{z_1, \ldots, z_m\}$] and $i, j \in M$, $i \ne j$,

\[ BW_Z(z_i, z_j) = \frac{e^{[u(z_i) - u(z_j)]}}{\sum_{\substack{k, l \in M \\ k \ne l}} e^{[u(z_k) - u(z_l)]}} \quad (i \ne j). \tag{8.8} \]
Marley et al. (2008) show that, under reasonable mathematical assumptions, $u$ is a difference scale, that is, unique up to an origin.

For Case 3, we use the following revised notation: for typical profiles $\mathbf{x}, \mathbf{y} \in P$, $BW_X(\mathbf{x}, \mathbf{y})$ is the probability that, jointly, the profile $\mathbf{x}$ is chosen as best in $X$ and the profile $\mathbf{y}$ is chosen as worst in $X$. For completeness, we first present the maxdiff model on profiles, written with the options in boldface to represent those profiles.

Definition 2 A set of best-worst choice probabilities for a design $D(P)$, $P \subseteq Q$, $|P| \ge 2$, satisfies a maxdiff model on profiles iff there exists a real-valued scale $u^{(m)}$ on $P$ such that for every $\mathbf{x}, \mathbf{y} \in X \in D(P)$, $\mathbf{x} \ne \mathbf{y}$, $|X| \ge 2$,

\[ BW_X(\mathbf{x}, \mathbf{y}) = \frac{e^{[u^{(m)}(\mathbf{x}) - u^{(m)}(\mathbf{y})]}}{\sum_{\substack{\mathbf{r}, \mathbf{s} \in X \\ \mathbf{r} \ne \mathbf{s}}} e^{[u^{(m)}(\mathbf{r}) - u^{(m)}(\mathbf{s})]}} \quad (\mathbf{x} \ne \mathbf{y}). \tag{8.9} \]
Marley et al. (2008, theorem 8) show that, under reasonable mathematical assumptions, $u^{(m)}$ is a difference scale, that is, unique up to an origin. In the case where

\[ u^{(m)}(\mathbf{z}) = \sum_{i=1}^{m} u_i(z_i), \]

we have the following representation of the maxdiff model on profiles.

Definition 3 (Adapted from Marley and Pihlens, 2012, def. 2) A set of best-worst choice probabilities for a design $D(P)$, $P \subseteq Q$, $|P| \ge 2$, satisfies a preference independent maxdiff model iff there exists a separate (different) non-negative scale $u_i$ on each attribute $i$, $i = 1, \ldots, m$, such that for every $\mathbf{x}, \mathbf{y} \in X \in D(P)$, $\mathbf{x} \ne \mathbf{y}$, $|X| \ge 2$,

\[ BW_X(\mathbf{x}, \mathbf{y}) = \frac{\exp \sum_{i=1}^{m} [u_i(x_i) - u_i(y_i)]}{\sum_{\substack{\mathbf{r}, \mathbf{s} \in X \\ \mathbf{r} \ne \mathbf{s}}} \exp \sum_{i=1}^{m} [u_i(r_i) - u_i(s_i)]} \quad (\mathbf{x} \ne \mathbf{y}). \tag{8.10} \]
Marley and Pihlens (2012, Theorem 6) show that, under reasonable mathematical assumptions, each $u_i$ is a difference scale, that is, unique up to an origin, with different origins for each $i$. The more standard notation in the discrete choice literature for (8.10) assumes that there is a vector $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_m)$, with the $i$th component sometimes called the utility coefficient for attribute $i$, such that

\[ BW_X(\mathbf{x}, \mathbf{y}) = \frac{\exp \boldsymbol{\beta}'(\mathbf{x} - \mathbf{y})}{\sum_{\substack{\mathbf{r}, \mathbf{s} \in X \\ \mathbf{r} \ne \mathbf{s}}} \exp \boldsymbol{\beta}'(\mathbf{r} - \mathbf{s})} \quad (\mathbf{x} \ne \mathbf{y}). \]
However, that notation requires the attribute-levels either to be numerical or to be coded in some numerical fashion (for example, with dummy codes), and is not suitable for proving results of the kind described next.

3.4 Properties of Scores for the Maxdiff Model

We now present theoretical results for best minus worst scores (defined below) for the maxdiff model of best-worst choice. Some of these results were proved in Marley and Pihlens (2012); the rest were proved in Marley and Islam (2012). The proofs in Marley and Islam were for (partial or full) ranking probabilities that belong to the class of weighted utility ranking models. This class includes the maxdiff model of best-worst choice as a special case, along with the MNL model for best choice and the MNL model for worst choice; however, it also includes many interesting ranking models, such as the reversible ranking model (Marley, 1968). For simplicity and relevance, we state the results for the
maxdiff model – that is, for Case 1 we have (8.6); for Case 2, we have (8.8); and for Case 3 we have (8.9) – or (8.10) when we assume additivity. Nonetheless, we know that, empirically, the score measures used in these results are useful for preliminary analyses of the data, independent of the model that is eventually fit to the data – see section 4.2.

We first state results that hold in common for Cases 1, 2, and 3; these are results that do not depend on the actual structure (type) of the 'choice options' – where a (choice) option is to be interpreted as an object (Case 1), an attribute-level (Case 2), or a profile (Case 3). Using notation paralleling that in Marley and Louviere (2005) for Case 1, for each option $x$ in the design, let $\hat{b}(x) - \hat{w}(x)$ denote the number of times option $x$ is chosen as best in the study minus the number of times option $x$ is chosen as worst in the study. We call this the score for $x$ (in this particular design) and refer to 'the scores' for these values across the options in the design.

Scores: Property 1 (for Cases 1, 2, and 3) Using general language (with undefined terms in quotation marks), the following states a result due to Huber (1963) in such a way that it applies to the maxdiff model for options; Marley and Islam (2012) state the terms and results exactly. Assume that one is interested in the rank order, only, of the (utility) scale values in the maxdiff model. An acceptable loss function is a 'penalty' function that: depends only on the order of the scale values and the order of the scores – that is, the loss remains the same if both the scale values and the scores are reordered in the same way; and that increases if the ranking is made worse by misordering a pair of scale values. Let $S$ be a master set with $n \ge 2$ elements and assume that, for some $k$ with $n \ge k \ge 2$, every subset of $S$ with exactly $k$ elements appears in the design2 $D(S)$. Then, given the maxdiff model, ranking the scale values in descending order of the (best minus worst) scores, breaking ties at random, has 'minimal average loss' amongst all ('permutation invariant') ranking procedures that depend on the data only through the set of scores.

Comment 1 The above result actually holds for the class of weighted utility ranking models, a class that includes the MNL for best, the MNL for worst, and the maxdiff model for best-worst choice (Marley and Islam, 2012).

Comment 2 Given the above property of the scores, they are likely useful starting values in estimating the maximum likelihood values of the utilities $u(x)$, $x \in S$. In fact, various empirical work on the maxdiff model gives a linear relation between the (best minus worst) scores and the (maximum likelihood) estimates of the utilities3 (Louviere et al., 2008, 2014). Also, Marley and Islam (2012) show similar results for weighted utility ranking models applied to the ranking data of a Case 3 study of attitudes toward the microgeneration of electricity.

Scores: Property 2 (for Cases 1, 2 and 3) The set of (best minus worst) scores is a sufficient statistic.

Comment 1 The above result actually holds for the class of weighted utility ranking models, a class that includes the MNL for best, the MNL for worst, and the maxdiff model for best-worst choice (Marley and Islam, 2012, Theorem 3).

The following result shows that, in the sense stated, the best minus worst scores reproduce the difference between the best and the worst choice probabilities given by the maximum likelihood estimates of the maxdiff model.
Scores: Property 3 (for Cases 1, 2 and 3) For the maxdiff model, the (best minus worst) score for an option $x$ equals the sum over $X \in D(P)$ of the weighted difference between the (marginal) best and the (marginal) worst probability of choosing option $x$ in set $X$, with
those probabilities evaluated at the values of the maximum likelihood parameter estimates; the weight for a set $X$ is the number of times $X$ occurs in the design $D(P)$ (Marley and Pihlens, 2012, prove this for Case 3; the result for Case 1 and Case 2 is then immediate).

The following result applies to the attribute levels in Case 3; there is no equivalent result in Case 1 or Case 2 as there are no 'implied' choices of attribute-levels in those cases.

Scores: Property 4 (for Case 3, with the preference independent maxdiff model, Definition 3) Using notation from Marley and Pihlens (2012) for choice among profiles (Case 3), $\hat{b}_i(p) - \hat{w}_i(p)$, $p \in Q(i)$, $i \in M$, denotes the number of times attribute-level $p$ is 'chosen'4 as best minus the number of times $p$ is 'chosen' as worst. Then Marley and Pihlens show that, for the preference independent maxdiff model (def. 3),

(i) The set of values $\hat{b}_i(p) - \hat{w}_i(p)$ is a sufficient statistic.
(ii) $\hat{b}_i(p) - \hat{w}_i(p)$ equals the sum over all $X \in D(P)$ of the weighted difference between the (marginal) best and (marginal) worst probability of 'choosing' attribute-level $p$ in set $X$, with those probabilities evaluated at the values of the maximum likelihood parameter estimates; the weight for a set $X$ is the number of times $X$ occurs in the design $D(P)$.

3.4.1 Relations between the scales in Case 2 and Case 3

Note that in Case 2, we have the same difference scale $u$ on each attribute, whereas in Case 3, we have a different difference scale $u_i$ on each attribute $i$. Since these scales are derived in different (types of) experiments, there is no necessary empirical relation between them. However, one would hope that such (Case 2 and Case 3) experiments are not measuring totally different properties of the attributes. One theoretical (and, potentially, empirical) property one could hope for is that for each attribute $i$, separately, the scale $u_i$ of Case 3 is strictly monotonic increasing with respect to the scale $u$ of Case 2. But we are also assuming that the $u_i$, $i = 1, \ldots, m$, and $u$ are separate (different) difference scales. Then, under weak mathematical conditions,5 the relation between $u_i$ and $u$ has to be linear: that is, there are constants $a_i > 0$ and $b_i$ such that, for each attribute-level $z_i$, we have $u_i(z_i) = a_i u(z_i) + b_i$ (see Aczél et al., 1986; and Marley and Pihlens, 2012, for related results).6 The interested reader can get a feeling for why this result is true by trying to make a strictly monotonic increasing function, other than a linear one, 'work' in preserving the difference scale properties of the $u_i$ and $u$.

Thus, if we have data from both a Case 2 and a Case 3 experiment where the above results hold, then substituting the expressions $u_i(z_i) = a_i u(z_i) + b_i$ in the Case 3 representation of the preference independent maxdiff model, (8.10), gives: for $\mathbf{x}, \mathbf{y} \in X$, $\mathbf{x} \ne \mathbf{y}$,

\[ BW_X(\mathbf{x}, \mathbf{y}) = \frac{e^{\sum_{i=1}^{m} a_i [u(x_i) - u(y_i)]}}{\sum_{\substack{\mathbf{r}, \mathbf{s} \in X \\ \mathbf{r} \ne \mathbf{s}}} e^{\sum_{i=1}^{m} a_i [u(r_i) - u(s_i)]}} \quad (\mathbf{x} \ne \mathbf{y}). \]
In this expression, the 'importance weights' $a_i$ are known (from the relation between the data in the two experiments). However, from a Case 3 (or Case 2) experiment alone, such weights are not identifiable (see Marley et al., 2008).
The above results relating the Case 2 scale to the Case 3 scales were based on the assumptions that the preference independent maxdiff model (def. 3) holds and that, for each attribute $i$, separately, the scale $u_i$ of Case 3 is strictly monotonic increasing with respect to the scale $u$ of Case 2. Although this is an interesting, and possibly desirable, result, the data may be otherwise. Weaker relations between the scales in Case 2 and Case 3 are obtained under weaker assumptions (Appendix B).
4 DATA ANALYSIS

Analysis of best-worst data can be conducted in a variety of ways (Louviere et al., 2008; Flynn et al., 2007; Flynn, 2010). Two broad methods will be explored here: maximum-likelihood-based methods that are familiar to DCE practitioners, including conditional logistic regression and its extensions, and the best-minus-worst scores introduced in section 3. It will be demonstrated that although the scores are not part of the typical academic 'toolbox' of methods, they have advantages in understanding preference heterogeneity.

4.1 Maximum Likelihood (ML) Based Methods
There is increasing support in statistical packages for 'partial ranking' models, for use when the researcher only has data from certain ranking depths, for example, the top two ranks, or the top and bottom ranks. These can be used to analyse best-worst data, but researchers are strongly recommended first to conduct simpler analyses on best and worst data separately to understand whether the assumptions made in these models are satisfied. Basic ranking models assume the data at various ranking depths (1) come from a common underlying utility function and (2) have the same variance scale factor. These assumptions can be tested by plotting conditional logit (that is, multinomial logit) estimates for the best data against those for the worst data: a negative linear relationship indicates assumption (1) holds, and a slope with absolute magnitude of 1 indicates (2) holds.

Indeed, multiplying all independent variables for the worst data by −1 is a 'trick' which allows the researcher to stack the 'worst choice sets' below the 'best choice sets' and estimate a conditional logistic regression which treats these as 'just more best choice sets'. The intuition behind this is as follows: for plausible models (including the conditional logit for best, the conditional logit for worst, and maxdiff for best-worst) choosing worst from four items with latent utilities of 2, 4, 7, 8 is observationally equivalent to choosing best from four items with latent utilities of −2, −4, −7, −8. A final adjustment to worst choice data is to delete the item already chosen as best from each choice set. This assumes a sequential model of best, then worst. Knowledge of how the respondent answered each choice set allows the analyst to make the appropriate deletion (whether best or worst), and web-surveys can force the respondent to provide a best item from $J$, followed by a worst from $J - 1$ (or vice versa).

Increasing evidence suggests the variance scale factor for best and worst data is often not equal to one and future studies should estimate it, using (minus) the estimated value as the factor by which to multiply the independent variables for worst choice sets in order that they can be treated as best sets. This reflects practice in data pooling techniques proposed
by Swait and Louviere (1993) and formal testing of data pooling should ideally be conducted using methods such as those they propose.

Flynn et al. (2008) compared sequential models of the type just described with the maxdiff model. The maxdiff model requires all possible pairs to be modelled in each choice set, making datasets very large. The dermatology Case 2 study (with four attributes) required 12 possible pairs to be set up per choice set. The maxdiff model's advantages are largely theoretical: the trick used in sequential models to turn worst into best choices introduces a very small error in the likelihood function due to the asymmetry in the EV1 distribution (Marley and Louviere, 2005; Flynn et al., 2008). The maxdiff model has no such error, and the properties of the scores presented in section 3.4 are based on that model. However, in practice, estimates from maxdiff models are generally indistinguishable from those from sequential models and data parsimony means the latter are generally preferred. Finally, the reader should ask themselves whether, when providing best and worst data, they would really consider all possible pairs before giving their answer.
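The data 'stacking' trick described above amounts to a simple mechanical transformation of a long-format dataset. The sketch below is a minimal illustration under assumed conventions (a hypothetical long-format table with columns set_id, x1, x2, is_best and is_worst; the file name, column names and the two attributes are invented for the example), preparing worst choice sets for estimation as pseudo-best sets under a sequential best-then-worst model:

```python
import pandas as pd

# Hypothetical long-format BWS data: one row per alternative per choice set.
df = pd.read_csv("bws_long.csv")          # assumed file and layout

best = df.copy()
best["chosen"] = best["is_best"]          # best sets are used as-is

worst = df[df["is_best"] == 0].copy()     # drop the item chosen as best
worst[["x1", "x2"]] *= -1                 # the minus-one trick
worst["chosen"] = worst["is_worst"]
worst["set_id"] = worst["set_id"].astype(str) + "_w"  # keep sets distinct

stacked = pd.concat([best, worst], ignore_index=True)
# 'stacked' can now be passed to any conditional logit routine as if
# every choice set were a best choice set.
```

If a non-unit variance scale factor has been estimated, the multiplication by −1 above would be replaced by multiplication by minus that estimated value, as described in the text.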
4.2 The Best-Minus-Worst Scores
Marley and Louviere (2005) show that the best-minus-worst scores are not unbiased estimates of the true utilities when the maxdiff model holds. However, they have been found to be linearly related to the ML estimates of the conditional logit model in virtually every empirical study to date. This is probably a manifestation of the linear portion of the logistic (cumulative distribution) function; thus, a non-linear relationship is likely only when the researcher is plotting the scores for a single highly consistent respondent, or for a sample of respondents each of whom is highly consistent and the choices are highly consistent across the sample. In other words, whilst the analyst should be wary of inferring cardinality in the scores for a given respondent, (s)he does not have to aggregate many respondents to obtain scores that are highly linearly related to the conditional logit estimates. Thus, researchers who are not confident of implementing limited dependent variable models such as logit and probit regression can obtain good estimates using a spreadsheet.

The scores also enable considerable insights to be drawn at the level of the individual respondent. For example, taxonomic (clustering) methods of analysis have been applied to the scores (Auger et al., 2007). Since the scores are a function of choice frequencies there is no need for any prior rescaling of the data in attempts to eliminate or standardise any respondent-level response styles: two people who agree on the rank order of a list of items but who use different parts of a rating scale will provide identical best-worst data. Flynn and colleagues have also used the scores to help evaluate solutions from latent class analyses, which can give spurious solutions (Flynn et al., 2010). This use of the scores to judge and guide analyses of the choice (0,1) data that choice modellers traditionally use represents an important aid in decomposing mean and variance heterogeneity. It is well known that the perfect confound between means and variances on the latent scale holds for all limited dependent variable (including logit and probit) models (Yatchew and Griliches, 1985), which means that technically there are an infinite number of solutions to any DCE. Judicious use of the scores can help rule out many outcomes.
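Computing the scores requires nothing more than counting. A minimal sketch, using invented responses (respondent, best item, worst item; one triple per choice set):

```python
from collections import Counter

# Hypothetical responses: (respondent, best_item, worst_item) per choice set.
responses = [
    ("r1", "A", "C"), ("r1", "B", "D"), ("r1", "A", "D"),
    ("r2", "C", "A"), ("r2", "C", "D"), ("r2", "B", "A"),
]

scores = {}
for resp, best, worst in responses:
    person = scores.setdefault(resp, Counter())
    person[best] += 1      # +1 each time the item is chosen as best
    person[worst] -= 1     # -1 each time the item is chosen as worst

for resp, s in scores.items():
    # Ranking items by descending score is, under the maxdiff model,
    # the minimal-average-loss ranking (section 3.4, Property 1).
    print(resp, sorted(s.items(), key=lambda kv: -kv[1]))
```

Because the scores are computed per respondent, they feed directly into the individual-level uses described above, such as clustering respondents or sanity-checking latent class solutions.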
5 SUMMARY AND FUTURE RESEARCH
Best-worst scaling offers a cost-efficient way of obtaining more information from a respondent, and/or a way of evaluating models of choice processes. It is important to note that it is a theory of how an individual makes choices; aggregating across individuals requires assumptions to be made (and tested, where possible). It is for this reason that the BIBDs proposed here are attractive: a single design means there is no potential for any confounding of properties of the design with any given individual's preferences.

There are a number of fruitful areas for future research. Researchers interested in keeping within a traditional choice modelling paradigm would welcome work to further understand how and to what extent best-worst data can be pooled. Issues such as the size of the variance scale factor at different ranking depths and the conditions under which worst models have a different functional form to best models are important. Indeed, work to further understand process issues generally is welcome: early work suggests that the class of models with natural process interpretations is different from the class of models with useful score properties. Yet use of the scores may give the average applied researcher more confidence, not least in terms of better understanding heterogeneity in preferences and/or scale in their data. Data pooling has normative issues that are pertinent to health economists in particular: for best-worst choice, it is only if all individuals satisfy the maxdiff model that the average of their utility estimates represents the preferences of the 'representative individual' used in economic evaluation.

In section 3.3, we presented a process description of the maxdiff model of best-worst choice, (8.6), based on the MNL model for separate best and worst choices. That process description leads quite naturally to the question of how one might extend MNL-type models for choice to models of choice and response time; this is important as researchers and practitioners are realizing the potential added value of response time measurements (for example, Otter et al., 2008). Roe et al. (2001, p. 385) extend their multialternative decision field theory to the case of two thresholds, one for acceptance, one for rejection, which gives one such model for best-worst choice and response time. Also, it is known that the three models (8.1), (8.4), and (8.6), with a common scale $u$, satisfy a common random utility model, based on the extreme value distribution7 – it is the inverse extreme value maximum8 random utility model (Marley and Louviere, 2005, def. 11, and Appendix A of this chapter). This maximum random utility model is particularly interesting because it can be rewritten in such a way as to also predict response time – the key step is to convert the above maximum random utility model of best and/or worst choice into an equivalent 'horse race' minimum random utility model of best and/or worst choice and response time (Marley and Colonius, 1992); Hawkins et al. (2014) develop and test such extensions. The extended models remain random utility models, and thus are unable to handle various context effects that have arisen in the study of best choices (see the reviews by Busemeyer and Rieskamp, Chapter 3 in this volume, and Rieskamp et al., 2006). Since Roe et al.'s (2001, p. 385) acceptance-rejection model is based on decision field theory, it can handle such context effects for best choices (see Chapter 3 of this volume), and likely can handle similar effects in best-worst choice (when they are studied).
Another natural next step is to develop similar extensions to the best and/or worst random utility models in Hawkins et al. (2014) so that they can also handle context effects.
ACKNOWLEDGEMENTS

This research has been supported by Natural Sciences and Engineering Research Council Discovery Grant 8124–98 to the University of Victoria for Marley. The work was carried out, in part, whilst Marley was a Research Professor (part time) at the Institute for Choice, University of South Australia Business School.
NOTES

1. That paper uses a representation in terms of $b = e^u$, and $b$ is shown to be a ratio scale, that is, unique up to a multiplicative scale factor. This implies that $u$ is a difference scale, that is, unique up to an additive constant (or origin). This relation holds for all the results stated in this chapter as having been demonstrated in Marley and Louviere (2005), Marley et al. (2008) or Marley and Pihlens (2012).
2. Further work is needed to extend the theoretical result to, say, balanced incomplete block designs (BIBDs). See Marley and Pihlens (2012) for related discussions of connected designs.
3. Assume that the maxdiff model holds, and a balanced incomplete block design (BIBD) is used for the survey. If the utility (scale) values are in a small range – say, [−1/2, 1/2] – then a linear relation holds under a first-order Taylor expansion of the maxdiff choice probabilities (unpublished, available from Marley).
4. Of course, in this Case 3, attribute-level $p$ is (only) 'chosen' as a consequence of its being a component of the profile chosen on a given choice opportunity.
5. The function is continuous; alternatively, it is bounded from above on an interval.
6. In both cases, the results were developed in terms of ratio scales $b = e^u$, $b_i = e^{u_i}$, rather than the equivalent difference scales $u$, $u_i$.
7. This means that: for $-\infty < t < \infty$, $\Pr(\epsilon_z \le t) = \exp(-e^{-t})$ and $\Pr(\epsilon_{p,q} \le t) = \exp(-e^{-t})$.
8. We have added 'maximum' to Marley and Louviere's definition to emphasize that the random utility models of choice are written in terms of maxima, whereas the equivalent ('horse race', accumulator) models of response time are written in terms of minima. See Hawkins et al. (2014) for such response time models for best-worst choice.
REFERENCES

Aczél, J., F.S. Roberts and Z. Rosenbaum (1986), 'On scientific laws without dimensional constants', Journal of Mathematical Analysis and Applications, 119 (1–2), 389–416.
Anderson, N.H. (1970), 'Functional measurement and psychophysical judgement', Psychological Review, 77 (3), 153–70.
Auger, P., T.M. Devinney and J.J. Louviere (2007), 'Using best-worst scaling methodology to investigate consumer ethical beliefs across countries', Journal of Business Ethics, 70 (3), 299–326.
Bednarz, A. (2006), 'Best-worst scaling and its relationship with multinomial logit', Bachelor of Applied Science (Maths) thesis, University of South Australia.
Beggs, S., S. Cardell and J. Hausman (1981), 'Assessing the potential demand for electric cars', Journal of Econometrics, 16 (1), 1–19.
Coast, J., C. Salisbury, D. de Berker, A. Noble, S. Horrocks, T.J. Peters and T.N. Flynn (2006), 'Preferences for aspects of a dermatology consultation', British Journal of Dermatology, 155 (2), 387–92.
Cohen, S. and L. Neira (2004), 'Measuring preference for product benefits across countries: overcoming scale usage bias with maximum difference scaling', in F. Nauckhoff (ed.), Excellence in International Research, Amsterdam: ESOMAR, pp. 1–22.
Collins, A.T. and J.M. Rose (2011), 'Estimation of a stochastic scale with best-worst data', paper presented at the Second International Choice Modelling Conference, Leeds, 4–6 July.
Fiebig, D.G., M.P. Keane, J. Louviere and N. Wasi (2010), 'The generalized multinomial logit model: accounting for scale and coefficient heterogeneity', Marketing Science, 29 (3), 393–421.
Finn, A. and J.J. Louviere (1992), 'Determining the appropriate response to evidence of public concern: the case of food safety', Journal of Public Policy & Marketing, 11 (1), 12–25.
Flynn, T.N. (2010), 'Valuing citizen and patient preferences in health: recent developments in three types of best-worst scaling', Expert Review of Pharmacoeconomics & Outcomes Research, 10 (3), 259–67.
Flynn, T.N., J.J. Louviere, T.J. Peters and J. Coast (2007), 'Best-worst scaling: what it can do for health care research and how to do it', Journal of Health Economics, 26 (1), 171–89.
Flynn, T.N., J.J. Louviere, T.J. Peters and J. Coast (2008), 'Estimating preferences for a dermatology consultation using best-worst scaling: comparison of various methods of analysis', BMC Medical Research Methodology (online journal), 8 (76).
Flynn, T.N., J.J. Louviere, T.J. Peters and J. Coast (2010), 'Using discrete choice experiments to understand preferences for quality of life. Variance scale heterogeneity matters', Social Science & Medicine, 70 (1), 1957–65, doi: 10.1016/j.socscimed.2010.03.008.
Hausman, J.A. and P.A. Ruud (1987), 'Specifying and testing economic models for rank-ordered data', Journal of Econometrics, 34 (1–2), 83–104.
Hawkins, G.E., S.D. Brown, A.A.J. Marley, A. Heathcote, T.N. Flynn and J.J. Louviere (2014), 'Integrating cognitive process and descriptive models of attitudes and preferences', Cognitive Science, 38 (4), 701–35.
Helson, H. (1964), Adaptation-Level Theory, New York: Harper & Row.
Hensher, D.A. and J.J. Louviere (1982), 'Design and analysis of simulated choice or allocation experiments in travel choice modelling', Transportation Research Record, 890 (1), 11–17.
Hensher, D.A., J.M. Rose and W.H. Greene (2005), Applied Choice Analysis: A Primer, Cambridge: Cambridge University Press.
Huber, P.J. (1963), 'Pairwise comparison and ranking: optimum properties of the row sum procedure', Annals of Mathematical Statistics, 34 (2), 511–20.
Johnson, J.G. and A.A.J. Marley (2012), 'A process model of the cognitive deliberations underlying health utility assessments', manuscript, Department of Psychology, Miami University.
Laran, J. and K. Wilcox (2011), 'Choice, rejection, and elaboration on preference-inconsistent alternatives', Journal of Consumer Research, 38 (2), 229–41.
Louviere, J.J. and G. Woodworth (1983), 'Design and analysis of simulated consumer choice or allocation experiments: an approach based on aggregate data', Journal of Marketing Research, 20 (4), 350–67.
Louviere, J.J., T.N. Flynn and R.T. Carson (2010), 'Discrete choice experiments are not conjoint analysis', Journal of Choice Modelling, 3 (3), 57–72.
Louviere, J.J., T.N. Flynn and A.A.J. Marley (2014), Best-Worst Scaling: Theory, Methods and Applications, Cambridge: Cambridge University Press, forthcoming.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Applications, Cambridge: Cambridge University Press.
Louviere, J.J., D.J. Street, L. Burgess, N. Wasi, T. Islam and A.A.J. Marley (2008), 'Modelling the choices of single individuals by combining efficient choice experiment designs with extra preference information', Journal of Choice Modelling, 1 (1), 128–63.
Luce, R.D. (1959), Individual Choice Behavior, New York: John Wiley & Sons.
Luce, R.D. and P. Suppes (1965), 'Preference, utility, and subjective probability', in R.D. Luce, R.R. Bush and E. Galanter (eds), Handbook of Mathematical Psychology, vol. 3, New York: John Wiley & Sons, pp. 235–406.
Marley, A.A.J. (1968), 'Some probabilistic models of simple choice and ranking', Journal of Mathematical Psychology, 5 (2), 333–55.
Marley, A.A.J. and H. Colonius (1992), 'The "horse race" random utility model for choice probabilities and reaction times, and its competing risks interpretation', Journal of Mathematical Psychology, 36 (1), 1–20.
Marley, A.A.J. and T. Islam (2012), 'Conceptual relations between expanded rank data and models of the unexpanded rank data', Journal of Choice Modelling, 5 (1), 38–80.
Marley, A.A.J. and J.J. Louviere (2005), 'Some probabilistic models of best, worst, and best-worst choices', Journal of Mathematical Psychology, 49 (6), 464–80.
Marley, A.A.J. and D. Pihlens (2012), 'Models of best-worst choice and ranking among multi-attribute options (profiles)', Journal of Mathematical Psychology, 56 (1), 24–34.
Marley, A.A.J., T.N. Flynn and J.J. Louviere (2008), 'Probabilistic models of set-dependent and attribute-level best-worst choice', Journal of Mathematical Psychology, 52 (5), 281–96.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
McIntosh, E. and J.J. Louviere (2002), 'Separating weight and scale value: an exploration of best-attribute scaling in health economics', Health Economists' Study Group, Brunel University.
Otter, T., G.M. Allenby and T. van Zandt (2008), 'An integrated model of discrete choice and response time', Journal of Marketing Research, 45 (5), 593–607.
Rieskamp, J., J.R. Busemeyer and B.A. Mellers (2006), 'Extending the bounds of rationality: evidence and theories of preferential choice', Journal of Economic Literature, 44 (3), 631–61.
Roe, R.M., J.R. Busemeyer and J.T. Townsend (2001), 'Multialternative decision field theory: a dynamic connectionist model of decision making', Psychological Review, 108 (2), 370–92.
Scarpa, R. and A.A.J. Marley (2011), 'Exploring the consistency of alternative best and/or worst ranking procedures', paper presented at the Second International Choice Modelling Conference, Leeds, 4–6 July.
Scarpa, R., S. Notaro, J. Louviere and R. Raffelli (2011), 'Exploring scale effects of best/worst rank ordered choice data to estimate benefits of tourism in alpine grazing commons', American Journal of Agricultural Economics, 93 (3), 813–28.
Shafir, E. (1993), 'Choosing versus rejecting: why some options are both better and worse than others', Memory & Cognition, 21 (4), 546–56.
Street, D. and A.P. Street (1987), Combinatorics of Experimental Design, Oxford: Clarendon Press.
Swait, J. and J.J. Louviere (1993), 'The role of the scale parameter in the estimation and comparison of multinomial logit models', Journal of Marketing Research, 30 (3), 305–14.
Szeinbach, S.L., J.H. Barnes, W.F. McGhan, M.M. Murawski and R. Corey (1999), 'Using conjoint analysis to evaluate health state preferences', Drug Information Journal, 33 (3), 849–58.
Thurstone, L.L. (1927), 'A law of comparative judgment', Psychological Review, 34 (2), 273–86.
Train, K.E. (2009), Discrete Choice Methods with Simulation, 2nd edn, Cambridge: Cambridge University Press.
Vermunt, J.K. and J. Magidson (2005), Technical Guide for Latent GOLD Choice 4.0: Basic and Advanced, Belmont, MA: Statistical Innovations.
Viney, R., E. Savage and J.J. Louviere (2005), 'Empirical investigation of experimental design properties of discrete choice experiments in health care', Health Economics, 14 (4), 349–62.
Wedell, D.H. (1997), 'Another look at reasons for choosing and rejecting', Memory & Cognition, 25 (6), 873–87.
Yatchew, A. and Z. Griliches (1985), 'Specification error in probit models', Review of Economics and Statistics, 67 (1), 134–9.
APPENDIX A: A MAXIMUM RANDOM UTILITY MODEL FOR BEST, WORST, AND BEST-WORST CHOICES SATISFYING A 'COMMON' MNL MODEL

When treated as a single model, the three models (8.1), (8.4), and (8.6) satisfy an inverse extreme value maximum random utility model (Marley and Louviere, 2005, def. 11). That is, for $z \in S$ and $p, q \in S$, $p \ne q$, there are independent random variables $\epsilon_z$, $\epsilon_{p,q}$ with the extreme value distribution such that for all $y \in Y \in D(S)$,

\[ B_Y(y) = \Pr\bigl(u(y) + \epsilon_y = \max_{z \in Y} [u(z) + \epsilon_z]\bigr), \tag{8.11} \]

\[ W_Y(y) = \Pr\bigl(-u(y) + \epsilon_y = \max_{z \in Y} [-u(z) + \epsilon_z]\bigr), \tag{8.12} \]

and for all $x, y \in Y \in D(S)$, $x \ne y$,

\[ BW_Y(x, y) = \Pr\Bigl(u(x) - u(y) + \epsilon_{x,y} = \max_{\substack{p, q \in Y \\ p \ne q}} [u(p) - u(q) + \epsilon_{p,q}]\Bigr). \tag{8.13} \]
Standard results (summarized by Marley and Louviere, 2005) show that the expression for the choice probabilities given by (8.11) (respectively, (8.12), (8.13)) agrees with that given by (8.1) (respectively, (8.4), (8.6)). These maximum random utility models are particularly interesting because they can be rewritten in such a way as to also predict response time. The key step is to convert the above maximum random utility model of best and/or worst choice into an equivalent 'horse race' minimum random utility model of best and/or worst choice and response time (Marley and Colonius, 1992). See Hawkins et al. (2014) for such extensions.
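The agreement between the random utility form (8.11) and the closed form (8.1) is easy to confirm by simulation. A minimal sketch, with hypothetical utilities:

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.array([0.5, 1.2, -0.3, 0.0])   # hypothetical utilities
n = 200_000

# Draw independent standard Gumbel (extreme value) errors and record the
# option attaining the maximum of u + eps, as in equation (8.11).
eps = rng.gumbel(size=(n, len(u)))
best = np.argmax(u + eps, axis=1)
freq = np.bincount(best, minlength=len(u)) / n

closed_form = np.exp(u) / np.exp(u).sum()   # equation (8.1)
print(np.round(freq, 4))
print(np.round(closed_form, 4))             # the two agree closely
```

The same exercise with $-u$ reproduces (8.4) from (8.12), and a draw per ordered pair with utility differences reproduces (8.6) from (8.13).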
APPENDIX B: GENERAL RELATIONS BETWEEN THE SCALE VALUES IN CASE 2 AND CASE 3

The result in the text relating the Case 2 scale to the Case 3 scales was based on the assumption that for each attribute $i$, the scale $u_i$ of Case 3 is, separately, strictly monotonic increasing with respect to the scale $u$ of Case 2. Although this is an interesting, and possibly desirable, result, the data may be otherwise. A weaker relation is obtained with the following weaker assumption. Assume that there is a function $F$ that maps a typical vector of scale values $(u(r_1), \ldots, u(r_m))$ of the attribute-level maxdiff model on single profiles to the overall scale value $u^{(m)}(r_1, \ldots, r_m)$ of the maxdiff model on profiles (def. 2), that is,

\[ F(u(r_1), \ldots, u(r_m)) = u^{(m)}(r_1, \ldots, r_m). \]

Similar to the previous case, we assume that $u$ and $u^{(m)}$ are separate (different) difference scales, and we also assume that the mapping $F$ is invariant under admissible
transformations (here, changes of origin) in the following sense: there is a function $G$ on the non-negative real numbers such that for each $a > 0$,

$$F\big( u(r_1) + a, \ldots, u(r_m) + a \big) = F\big( u(r_1), \ldots, u(r_m) \big) + G(a).$$

The mathematical question then becomes: what are the possible solutions of the above equation for $F$ (and $G$)? Under quite weak regularity conditions,¹ the general solution has the form (Aczél et al., 1986): for some $i \in \{1, \ldots, m\}$ and a constant $c_i$,

$$F\big( u(r_1), \ldots, u(r_m) \big) =
\begin{cases}
c_1 u(r_1) + f\big( u(r_2) - u(r_1), \ldots, u(r_m) - u(r_1) \big) & \text{if } i = 1, \\
c_i u(r_i) + f\big( u(r_1) - u(r_i), \ldots, u(r_{i-1}) - u(r_i), u(r_{i+1}) - u(r_i), \ldots, u(r_m) - u(r_i) \big) & \text{if } 1 < i < m, \\
c_m u(r_m) + f\big( u(r_1) - u(r_m), \ldots, u(r_{m-1}) - u(r_m) \big) & \text{if } i = m,
\end{cases}$$

where $f$ is an arbitrary non-negative function with $f = \text{const}$ if $m = 1$. It is important to understand that a specific solution is one such function $f$, for a particular $i$. Also, for the present application, we would expect $F$ to be strictly increasing in each variable; a sufficient condition for this is that $c_i > 0$ for each $i$ and $f$ is strictly increasing in each variable. Clearly, the above functional relation between the maxdiff scale $u^{(m)}$ and the attribute-level scale $u$ is quite general – for instance, the following is a possible solution, where, for simplicity, we set $m = 3$: there are constants $A, B, C > 0$ such that

$$u^{(3)}(r_1, r_2, r_3) = C\,u(r_1) + A\big( u(r_2) - u(r_1) \big) + B \log\big( u(r_3) - u(r_1) \big)$$

(a quick check of this example against the invariance condition is sketched after the note below). Thus, if the above assumptions hold (and the previous linear ones do not), then it is a challenging empirical task to explore possible relations between the scale $u$ of Case 2 and the scales $u_i$, $i = 1, \ldots, m$, of Case 3.

Note

1. That certain functions are bounded on an arbitrarily small open m-dimensional interval.
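As a quick sanity check (our sketch, not part of the original derivation), one can verify directly that the example above satisfies the shift-invariance condition with $G(a) = Ca$:

```latex
% Shifting every scale value by a > 0 leaves the differences unchanged:
\begin{align*}
u^{(3)}(r_1, r_2, r_3)\big|_{u \mapsto u + a}
  &= C\,(u(r_1) + a) + A\big((u(r_2)+a) - (u(r_1)+a)\big)
     + B \log\big((u(r_3)+a) - (u(r_1)+a)\big) \\
  &= C\,u(r_1) + A\big(u(r_2) - u(r_1)\big) + B \log\big(u(r_3) - u(r_1)\big) + C\,a \\
  &= u^{(3)}(r_1, r_2, r_3) + C\,a,
\end{align*}
% so F(u + a) = F(u) + G(a) holds with G(a) = C a, matching the general
% solution with i = 1, c_1 = C, and f(d_2, d_3) = A d_2 + B log d_3.
```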
9 The discrete choice experiment approach to environmental contingent valuation
Richard T. Carson and Mikołaj Czajkowski
1 INTRODUCTION¹
Assessing the economic desirability of environmental policies requires estimating the value of non-market commodities. In response, several valuation techniques have been developed since the 1960s. They utilize two general data sources – revealed and stated preference data. The former refers to situations where people's choices are observed in actual market situations. Conversely, stated preference data refers to situations in which choices are observed in a survey context. Stated preference (SP) methods allow collection of information about respondent preferences for the environmental amenities of interest by observing choices in hypothetical situations presented in a survey. Observed choices are contingent on scenarios posed in the survey, and the environmental economics literature commonly uses the term contingent valuation (CV) to describe the process of utilizing stated preference data for valuation. Additionally, there are many different ways to elicit preference information in a CV study, and the ones most commonly used are discrete choice experiments (DCE).

Contingent valuation is an inherently more flexible tool than revealed preference (RP) techniques, such as hedonic pricing, averting behavior and the travel cost method, because in principle it is possible to use it to examine preferences for provision levels of goods that are substantially different from those currently observed or those observed in the past. Revealed preference data from market behavior tied to the environmental good is frequently unavailable or of limited usefulness owing to a limited range of observed variation in the key variables of interest. As such, CV is often the only approach available for providing the economic valuation inputs needed for cost–benefit analysis.2 Contingent valuation is also usually the only approach to obtain another distinctive property of many environmental goods – the passive use component of their economic value (Krutilla, 1967; Carson et al., 1999).3 Passive use value is sometimes referred to as non-use, existence value or stewardship value, and often is defined to include bequest value and option/quasi-option value when uncertainty comes into play. For many environmental goods, consumers may hold positive value for them even though they are not directly or indirectly using them. Passive use is thought to lie behind much of the total economic value for some environmental goods; therefore, its measurement is crucial for policymaking.

It is hard to overestimate the central importance of CV to modern welfare economics. This can be illustrated by the over 7500 papers and studies referenced in Carson (2011) utilizing the method, with the largest group focused on environmental valuation. Estimates derived from SP data constitute almost 60 percent of the estimates in Environment Canada's very large Environmental Valuation Reference Inventory (EVRI) database maintained
in conjunction with environmental protection agencies in several other countries.4 Discussion of CV is now standard in almost all textbooks on environmental economics. Estimates from CV studies are used for assessing very large policies such as the US Clean Water Act (US Environmental Protection Agency, 1994) and for a wide range of policy decisions (Griffiths et al., 2012). Contingent valuation studies are often done explicitly to look at specific policy issues, and the results from older studies are frequently used to evaluate new policy issues in the form of benefits-transfer exercises, including serving as inputs to meta-analyses.

It is impossible to 'review' the CV literature per se, or even cover all of the major papers in the area in any detail. Instead, given the scope of this handbook, we provide an overview of selected CV and DCE issues which appear the most relevant and, therefore, receive relatively more attention in environmental economics than in other fields. Readers are referred to the other sections of this handbook for the treatment of issues such as experimental design theory, econometric treatment of discrete choice data, survey administration or methodological frontiers in the use of choice modeling.

We first provide a brief history of CV with a view toward the increasing dominance of DCE as the preferred elicitation format. We then take up the relationship between different elicitation formats used in CV, with an emphasis on the distinction between different DCE formats. From there we move on to a closely related issue – the incentive structure of different elicitation formats. This issue has been examined in some detail in environmental economics, but is only now starting to receive attention in other applied fields. After this we look at neoclassical welfare theory, with an emphasis on the quantity changes which characterize many environmental goods rather than the price changes which apply in other fields such as marketing. In this section, we address issues related to decomposing maximum willingness to pay (WTP) into different components, such as direct and passive use, and the role of motives. After our brief theoretical tour, we turn to its predictions on quantities such as the difference between WTP and minimum willingness to accept (WTA) compensation, the size of the income elasticity of WTP and the magnitude of sequence effects, where it was first thought that empirical CV results violated neoclassical theory.5 Many of these results are now thought to be generally consistent with neoclassical economic theory, while others have been shown to be behavioral effects that also characterize RP data. From there we turn to a discussion of external and internal tests of scope, an issue which has been at the heart of the debate over the validity of CV. Issues of criterion and convergent validity are then briefly considered. In the last section, we provide some thoughts on where we think contingent valuation using DCE is headed in the future.
2 A SHORT HISTORY OF CONTINGENT VALUATION
Economists have long realized that much of an individual’s utility might be provided by goods for which market prices did not exist. Bowen (1943) and Ciriacy-Wantrup (1947) were the first to propose the use of specially structured public opinion surveys to value what Bowen called ‘social goods’ and Ciriacy-Wantrup called ‘collective, extra-market goods’. The first area where the lack of monetary units for a key output of government projects was considered to be a major problem was outdoor recreation. Once policy makers recognized the need to know what people wanted and how much they were
willing to pay for it, they considered surveying the public to find out what people wanted with respect to the national parks. Building the major water projects of the 1950s and 1960s brought into sharp focus the need to value an array of different project inputs and outputs at different points in time, which eventually led to modern cost–benefit analysis. The tradeoffs between water-based recreation, electricity and flood control could not be tackled without taking the value of recreation into account. This led to considerable interest in developing both RP and SP methods to reliably place a monetary value on different types of outdoor recreation.

The first application of CV in the academic literature was Davis's 1963 Harvard dissertation, which used surveys to estimate the value hunters and tourists placed on a particular wilderness area. He reasoned that it may be possible to 'approximate a market' in a survey by describing alternative kinds of areas and facilities that could be made available to the public, and then simulate market bidding behavior. Davis joined Resources for the Future, and a comparison (Knetsch and Davis, 1966) between CV and the new travel cost method showed both approaches produced similar estimates for an outdoor recreation example. Over the next several years, other economists started to follow Davis's lead and used CV to value an ever increasing array of public goods.6

There were two major developments that are of interest here. The first was the initial applications of CV to value goods that were thought to comprise mostly existence value as defined by Krutilla's seminal 1967 American Economic Review paper. The key paper here is Randall et al. (1974), who looked at changes in visibility related to air pollution in the southwestern United States. Soon researchers were valuing the protection of endangered species (for example, Samples et al., 1986) and preserving wild and scenic rivers (Walsh et al., 1985). From the policymaking perspective, the US Water Resources Council (1979) published guidelines in the Federal Register for participation in project evaluation which specified those methods that were acceptable for use in determining project benefits. The inclusion of CV as one of the three recommended methods (the other two were the travel cost and the unit day value method) was a sign of CV's growing respectability. The US Environmental Protection Agency had a major research program focused on developing CV as an approach to non-market valuation in the 1970s and 1980s (for example, Cummings et al., 1986). Mitchell and Carson (1989) provided a comprehensive treatment of a wide range of issues in the design and analysis of CV surveys that was highly influential.

The second major development was the exploration of different ways of eliciting information on preferences from survey respondents. Davis's work, as well as many early CV studies, used a 'bidding game'. In a bidding game, a respondent is asked if he or she is willing to pay a specific amount for the program to supply the good; if he or she says yes, a higher amount, often $1 more, is asked and, if no, a lower amount is asked. This is repeated until an initial yes changes to a no, or vice versa. This format is a binary discrete choice question that is iterated until it effectively obtains a continuous estimate of the respondent's WTP (a minimal sketch of this iteration appears below). A difficulty, though, was soon discovered with the bidding game: the initial amount asked influenced the final amount.
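The bidding game is essentially an iterated binary search over payment amounts. The sketch below is illustrative only – the $1 step, the starting bid, the stopping rule and all names are assumptions made for the example, not a prescription from the literature.

```python
def bidding_game(respondent_says_yes, start_bid, step=1.0):
    """Iterate a bidding game: raise the bid after a 'yes', lower it after a
    'no', and stop once the answer flips. Returns the bracket containing WTP.
    `respondent_says_yes` is a callback: amount -> bool (the survey response)."""
    bid = start_bid
    first_answer = respondent_says_yes(bid)
    while True:
        next_bid = bid + step if first_answer else bid - step
        if respondent_says_yes(next_bid) != first_answer or next_bid <= 0:
            # Answer flipped (or the bid hit zero): WTP lies between the bids
            low, high = sorted([bid, next_bid])
            return low, high
        bid = next_bid

# Illustrative use: a respondent with true WTP of $12.40 answering truthfully
print(bidding_game(lambda amount: amount <= 12.40, start_bid=10.0))  # (12.0, 13.0)
```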
This anchoring of the final amount on the initial bid was called starting-point bias (Thayer, 1981). Some researchers tried simply asking for WTP using an open-ended question, which became known as a direct question. To overcome the high non-response rate often associated with the direct question, Mitchell and
Carson (1981) proposed a payment card, whereby respondents were shown an array of numbers on a card and asked to pick the amount on the card, or any amount in between, that best represented their WTP. This elicitation format, while not without some of its own problems, performs well in a variety of settings and is the only non-DCE format currently receiving widespread use.7

Bishop and Heberlein (1979) put forward an application using goose-hunting permits in which they compared WTP and WTA, with the WTA comparison further featuring a comparison between a survey context and actual re-purchase of the permits. They offered to buy (or sell) goose permits at several prices which were randomly assigned and fit a logit model to the data. The application's single binary discrete choice question immediately caught the attention of other researchers because of its simplicity and close connection with choice behavior in markets with posted prices. Discrete choice elicitation formats for CV got a major boost when Hanemann (1984a, 1984b), starting from McFadden's (1974) random utility model, put forward a comprehensive statistical framework for estimating standard neoclassical Hicksian welfare measures based on the indirect utility function. Cameron and James (1987) and Cameron (1988), for the probit and logit model respectively, put forward an alternative framework based on expenditure functions that exploits the fact that cost is randomly assigned.8 McConnell (1990) showed the two approaches to be duals of the same utility maximization problem when there is no random component, but to differ in how a random component enters. Hanemann and Kanninen (1999), Haab and McConnell (2002) and Hensher et al. (2005) provided comprehensive overviews of econometric issues related to estimating WTP measures.

The CV study done for the damage assessment for the Exxon Valdez oil spill (Carson et al., 1992, 2003) used a discrete choice elicitation format with randomly assigned cost. It quickly defined the state of the art.9 In response to a critique of CV put forward at an Exxon-sponsored symposium contained in the Hausman (1993) volume, the US government convened a panel co-chaired by Nobel Laureates Kenneth Arrow and Robert Solow to assess the CV method and its use in measuring passive use values. The Panel (Arrow et al., 1993, p. 4610) concluded that 'CV studies can produce estimates reliable enough to be the starting point of a judicial process of damage assessment, including lost passive values'. The Panel made a number of recommendations to enhance the reliability of CV studies, including using a single binary discrete choice question, noting its desirable incentive properties when cast in a voting context.

Contingent valuation researchers had always wanted to extract as much information as possible about the range of policy options, with early studies (for example, Randall et al., 1974) asking about multiple programs using bidding games, and later studies (for example, Carson and Mitchell, 1993) using payment cards. With WTP estimates from multiple programs it was possible to estimate valuation functions where one or more attributes of the environmental good had been varied, even going as far as tracing out the response surface with an experimental design in Carson et al. (1990b).
Contingent valuation researchers started to become aware of incentive problems with bidding games and open-ended-type CV survey questions (Hoehn and Randall, 1987; Mitchell and Carson, 1989), and with offering multiple levels of a pure public good when only one level can be provided. This growing awareness of incentive issues, though, did not initially discourage researchers, starting with Carson (1985) and Hanemann et al. (1991), from exploring how to obtain more information from each
respondent about one level of a good. This was done by asking a double-bounded binary choice question, which asked respondents who said yes to the amount in an initial binary choice question whether they would pay a higher amount, and those who said no about a lower amount. This format still involved only one good and binary choice questions. The response, if taken at face value, reduces the length of the interval in which the respondent's WTP lies and, hence, decreases the confidence interval around WTP estimates given a fixed sample size.

To get even more information from respondents, researchers began implementing more complex DCE formats involving either multiple choice sets, multinomial choice questions, or both. The first environmental economics paper using such an approach was Carson et al. (1990a), which asked about options involving Kenai King salmon fishing and fit a nested logit model that included a status quo no-purchase option.10 The paper, though, that generated attention in the environmental economics community was Adamowicz et al. (1994), which gathered what was then thought to be an impossible amount of data: 16 choice sets, each with three alternatives. The paper also showed how to combine choice data from a DCE with choice data involving recreational trip destinations, allowing for differences in the variance of the error component in the utility function estimated for the two types of data. A later paper by Adamowicz et al. (1998) demonstrated that more complex DCE formats could also potentially be applied to situations involving passive use. An influential paper by Hanley et al. (1998) summarized the quickly growing number of studies using DCE to explicitly value program attributes. Adamowicz wrote a chapter on using DCE for environmental valuation that appeared in the seminal Louviere et al. (2000) book.

At this point confusion in the environmental economics literature starts to set in as to what was CV and what was a DCE (often then referred to as a choice experiment – CE – or choice modeling – CM). Adamowicz et al. (1998, p. 64), one of the first papers to draw a distinction between CV and DCE, was careful to note:

the most common elicitation approach is the two-alternative (referendum) contingent valuation method (CVM) (see Mitchell and Carson [1989]). Other stated preference presentations are also possible. In this article we examine an extension or variant of this traditional approach, the choice experiment, which employs a series of questions with more than two alternatives to elicit responses that allow estimation of preferences over attributes of an environmental state.
Thus, to some researchers CV was defined to mean a single binary discrete choice question, while DCE were defined to mean a sequence of multinomial choice questions. In contrast, Hanley et al. (1998) used a series of open-ended direct questions as the definition for CV and compared this with a set of multinomial choice questions they labeled as 'a choice experiment'. A sequence of binary choice questions was sometimes called CV and also sometimes referred to as being a choice experiment rather than CV. In part, an ill-defined distinction between CV and DCE took hold in the environmental economics literature because arguments were put forward: (a) that DCE with more than two alternatives or multiple choice sets overcame some of the problems critics (for example, Diamond and Hausman, 1994) asserted existed with CV, (b) that DCE represented a new approach taken from marketing (which fostered journal publication), and (c) that there was a large demand from policy makers for valuing changes in attributes.
Discrete choice experiments, as distinct from CV, were ill defined because CV was never a particular elicitation method; rather, it is the use of SP data to estimate well-defined economic welfare measures for public goods (Mitchell and Carson, 1989).11 Early CV studies had focused on valuing multiple levels of environmental quality, and some DCE simply expanded to look simultaneously at varying a substantial number of non-cost attributes rather than one or two. Over time it also became clear that DCE more complex than a single binary choice question could never overcome the problems critics perceived with that format. That is because, with enough statistically equivalent samples, a single binary discrete choice format can be used to estimate anything that a more complex DCE can, but without order effects. The advantage of more complex DCE was that they gathered considerably more preference information per respondent and, as such, had considerable cost advantages. The ability to easily vary multiple attributes made them popular with policy makers who needed to consider a wide range of options.12 One currently sees a variety of different DCE elicitation formats in use, with the choice of the particular elicitation format being driven by the specific characteristics of the good being valued, available funding and the range of policy options that need to be evaluated.
3 A TYPOLOGY OF STATED PREFERENCE ELICITATION FORMATS USED IN CV
Carson and Louviere (2011) provide a nomenclature for organizing the different elicitation formats used in CV. The first aspect of this nomenclature is to note that, while there are many ways to ask stated preference questions, the only two approaches that result in data consistent with neoclassical welfare economics are those that involve either matching formats or choice formats.13

A matching question essentially solves the problem of what quantity makes the consumer indifferent between two situations, typically the status quo and an alternative. The quantity that makes the agent indifferent effectively equates utility in the two situations. When the quantity is some standard numeraire like money, the truthful response to a matching question corresponds to one of the Hicksian welfare measures. This, of course, would have great advantages because it would short-circuit much of the need for making assumptions about the structure of utility functions and error components. Obtaining a reliable answer to a matching question has proven difficult for two reasons. First, a direct question (DQ) for WTP tends to yield high non-response rates, most likely because agents in many countries are used to making decisions in markets with posted prices. Second, the incentive structure of a DQ is not conducive to truthful preference revelation.

Contemporary neoclassical consumer theory is built on the ability to make choices between options. Through a series of such choices, the indifference point can eventually be arrived at even though agents did not originally know their WTP or WTA.14 If one assumes truthful preference revelation and an absence of anchoring effects, this naturally leads to Davis's bidding game (BGAME) as the way to obtain an estimate arbitrarily close to the desired matching response. Recognition that it was costly for respondents to exert the effort to formulate the matching response, and that this could induce anchoring effects, leads to the payment card (PCARD) as a way to get reasonably
close to the desired matching measure. Bidding-game and PCARD formats move away from a DQ toward having respondents answer one or more simpler choice questions. Direct-question, BGAME and PCARD elicitation formats can all be used for valuing both a single (SINGLE) good and a sequence (SEQ) of goods.

After Bishop and Heberlein's seminal work, a single binary choice (SBC) question seemed to be a more natural way for environmental economists to have respondents provide preference information. They were happy to replace matching elicitation formats with the SBC as long as it was possible to efficiently convert that information into the standard Hicksian welfare measures they needed. The SBC is the canonical form of a DCE, and it may be useful to step back and formally define the two essential elements of any DCE. The first is that the agent is asked to make a discrete choice between two or more alternatives in a choice set. The second is that the alternatives presented for choice are constructed by means of an experimental design that varies one or more attributes so as to be able to estimate economic quantities tied to preference parameters. The SBC format randomly assigns the cost of the second alternative, with the cost of the first good (typically the status quo) held constant (often at zero), which allows for the estimation of the relevant Hicksian welfare measure.

Policy makers, however, often want to value more than one possible change from the status quo. This is straightforward to do with the SBC format. If there are j possible variants of the good that policy makers are interested in, ask j statistically equivalent subsamples an SBC question involving the first/status quo good versus one of the j alternatives. The data from each subsample can be stacked, and alternative specific constants (ASCs) for each of the j goods can be estimated along with different slope parameters for the variants of the good. This will produce estimates for each of the j goods. If the j goods differ from each other in some systematic way, this can be accommodated by replacing or augmenting the ASCs with a set of attribute variables describing those differences. While this approach can have desirable properties from the perspective of both survey design and providing desirable incentives for truthful preference revelation, it can also be quite expensive, and prohibitively so if j is sizeable, because SBC responses reveal little preference information.

There are two ways to enhance the amount of preference information obtained from a single respondent within a discrete choice framework: asking a sequence of questions or enlarging the choice set. A variant of the first has already been noted – ask about a second cost amount that conditions on the first response. This elicitation format, known as the double-bounded binary choice (DBBC), is sometimes called a double-bounded dichotomous choice question. It yields interval-censored data (the implied response probabilities are sketched below) and reduces the sample size needed for a given confidence interval. This reduction can be quite large with a good experimental design if the response to the second question is consistent with the WTP distribution implied by the first question, and substantial research has revolved around the veracity of this assumption (Alberini et al., 1997). This is the simplest of the sequential elicitation approaches, as it introduces a second choice set without changing any attribute of the good other than cost.
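To make the interval censoring concrete, here is a minimal sketch of the DBBC response probabilities. It assumes an initial bid $b$, a higher follow-up $b^H > b$, a lower follow-up $b^L < b$, and a WTP distribution function $F$ – notation introduced here for illustration, not taken from the chapter.

```latex
\begin{align*}
\Pr(\text{yes, yes}) &= 1 - F(b^{H}),    \\
\Pr(\text{yes, no})  &= F(b^{H}) - F(b), \\
\Pr(\text{no, yes})  &= F(b) - F(b^{L}), \\
\Pr(\text{no, no})   &= F(b^{L}).
\end{align*}
% Each response pair places WTP in a known interval, so the log-likelihood is
% that of interval-censored data; the narrower intervals are what shrink the
% confidence interval around the estimated WTP distribution.
```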
The second way to obtain more preference information is to ask for the choice to be made from a larger set of alternatives. A single multinomial choice (SMC) with $j \geq 3$ alternatives effectively generates $j - 1$ binary comparisons, between the alternative chosen and each of the other alternatives. In principle, an enormous amount of preference informa-
tion could be obtained from a single agent by using a choice set with an extremely large number of alternatives. The belief that there are severe limits to the number of choice alternatives that agents will seriously entertain leads one in the direction of quite strict limits on j. This in turn implies asking the same agent to choose their most preferred alternative from more than one choice set, or asking different agents to make choices in SMC questions. The belief that different agents may have heterogeneous preferences, and that these may play a role in policy decisions, leads researchers to sample multiple agents. Efficiency concerns coupled with cost constraints tend to lead researchers to ask respondents a sequence of choice sets. The two most common elicitation techniques using a sequence of choice sets are a sequence of binary choice questions (BC-SEQ) and a sequence of multinomial choice questions (MC-SEQ). The key issue with all sequential formats is whether respondents answer the choice sets independently.

Other sequential formats that divide alternatives into more and less preferred subsets are also possible. An early example is a complete ranking (RANK) of a set of j alternatives. A more recent example, known as best-worst choice (BWCHOICE), asks respondents to indicate their best alternative and their worst alternative among a set of $j > 2$ alternatives (Marley et al., 2008), and can be seen as a less burdensome version of ranking that extracts more information than an SMC.
4 ELICITATION FORMATS AND INCENTIVE COMPATIBILITY
Environmental economists have paid considerable attention to the incentive properties of CV elicitation formats. This appears to be due to two factors. First, the use of multiple elicitation formats in CV studies led researchers to explore the properties of responses to individual formats (for example, Thayer, 1981) and to compare responses from different formats (for example, Smith and Desvousges, 1986). Finding that responses to specific formats did not behave as expected and that different formats yielded different estimates of WTP dismayed researchers. This suggested to some that respondents had problems valuing unfamiliar public goods, and the divergence between these estimates is a major reason why some CV critics argue estimates from SP questions should be rejected (Hausman, 1993; McFadden, 1994). The second was the recognition that underlying the belief that different elicitation formats should yield statistically equivalent estimates of WTP was the assumption that respondents always truthfully reveal their preferences irrespective of the economic incentives to do so. This, of course, is an unusual assumption for an economist to make. Indeed, Paul Samuelson (1954), in his classic article defining the properties of pure public goods, saw strategic behavior, in the form of a downward bias, as the main problem in using surveys to determine public goods provision and taxation.

Carson and Groves (2007) have put forward a comprehensive neoclassical framework that uses mechanism design theory to examine the incentive and informational properties of stated preference elicitation formats. The first result is that if respondents view the stated preference question(s) as consequential, then their responses can be analyzed as revealed economic behavior. To be consequential, a survey question needs to meet two conditions.
Agents need to (a) view responses as having a positive probability of influencing the agency's actions, and (b) care about the outcomes. Stated preference questions that do not meet these two conditions are inconsequential. The difficulty with inconsequential questions is that a random response is as good as any other response, because the response cannot influence the agent's utility. For consequential survey questions, mechanism design theory can be used to analyze the optimal responses.

Carson and Groves's (2007) starting point is the well-known Gibbard–Satterthwaite theorem, which says that no mechanism with more than a binary message space can be incentive compatible without restricting the space of allowable preference functions. This means that all elicitation formats other than an SBC can present situations where it is optimal for some respondents to reveal preference information inconsistent with true WTP. Further, conditions must be imposed for an SBC question to be incentive compatible. The basic conditions are well known from the voting literature on a single binary vote (Farquharson, 1969). To be incentive compatible, a referendum on a pure public good needs to be a take-it-or-leave-it offer, where the vote does not influence any other offers that may be made to agents and where the payment mechanism is coercive, in the sense that each agent can be required to pay independently of how the individual agent voted.

These conditions are quite restrictive. For instance, they rule out private goods, because the government cannot force individual agents to buy private goods but can force agents to pay taxes. For a quasi-public good like a recreational fishing lake, it is possible to structure an SBC between the current status quo quality level and price, and an alternative quality level and price, that is incentive compatible with respect to revealing which of the two configurations the agent prefers. However, as with private goods, this choice will not reveal information about the number of trips that would be made under either configuration. It is straightforward to show that a question that can influence multiple outcomes cannot be incentive compatible with respect to one outcome, because agents should take into account the influence of their response on all possible outcomes when formulating it.

If a binding referendum vote on a single issue, meeting the auxiliary conditions noted above, is incentive compatible, what about an SBC question meeting the same auxiliary conditions? Carson and Groves (2007) show that the binding nature of a referendum vote is not needed. All that is necessary is that the government be more likely to undertake the action if the vote meets a specific plurality (typically a simple majority or two thirds in favor). Such advisory referenda are common in many places. Is the plurality voting rule necessary? The answer is no. Carson and Groves (2007) show that a weaker condition works: that the probability of the alternative being implemented is weakly monotonically increasing in the percentage in favor. This condition includes plurality voting as a special case. Green and Laffont (1978) have shown that any economic mechanism of the type being considered here can be implemented using a statistical sample. Putting these together yields an SBC in an advisory survey that has the same properties as a binding referendum vote.

Carson and Groves (2007, 2011) show that it is possible to examine the incentive structure of other CV elicitation formats.
Beginning with the BGAME, if the starting point used is thought by a respondent to convey information about the possible cost of the program, then it is optimal to condition his response on it. Thus, WTP estimates
from bidding games should be correlated with the starting points used, which is what is empirically observed. Direct-question and PCARD responses should display a concentration of responses at zero coupled with relatively few very small WTP amounts. This should happen because the optimal response for respondents whose WTP is less than the expected cost is zero.15 Another theoretical prediction is that the two WTP distributions revealed by a DBBC should not be perfectly correlated, as the standard interval-censored estimator assumes. It is possible to generate more specific predictions about the properties of particular elicitation formats with stronger assumptions about respondent beliefs.

In more complex DCE, such as the SMC, BC-SEQ and MC-SEQ formats, it is often harder to predict the impacts of incentive and information effects without more structure, beyond noting that they should generate violations of the independence of irrelevant alternatives (IIA) assumption, but some flavor of the nature of the problems likely to be encountered can be illustrated.16 In an SMC (and by extension both the BC-SEQ and MC-SEQ formats), the fundamental problem with pure public goods is that only one level can be supplied and it is the same for all agents. From the theoretical literature on voting, it has long been known that truthful preference revelation is often not optimal. If the agent has non-uniform priors over the two alternatives most likely to be chosen by other people, then it is generally optimal for the agent to choose the more preferred of these two alternatives, even though their true preference may be for another alternative.17 In contrast, the SBC question has a dominant strategy that does not depend on beliefs about the preferences of other agents. There may be cases where respondents have flat priors over their beliefs about the preferences of other agents for all of the available alternatives, in which case truthful preference revelation is optimal, but this is a very strong assumption.

What is true in all of these cases is that in a consequential survey the agent should not be picking alternatives randomly; rather, they should be picking a 'good' choice, indeed the optimal choice given the elicitation format and belief structure. The difficulty is that this choice may not reflect the agent's unconditionally most preferred alternative, as is typically assumed in most analyses. This sort of behavior can, under very strong conditions, yield marginal estimates of WTP for changes in attributes which are consistent even though estimates of total WTP for a good may be biased. Agents want the agency to supply their most desired mix of attribute levels (not a random mix), with most of the strategic behavior focused on influencing the agency's pricing or overall provision decision, which tends to show up in the estimates for the cost parameter or the ASC on the status quo option. More generally, strategic effects can mimic or reinforce learning and behavioral effects, as well as influence scale, in sequential DCE.
5 ECONOMIC THEORY AND CONTINGENT VALUATION

5.1 Economic Welfare Measures
The goal of a CV study is to measure an individual’s monetary value for a change in one or more goods of interest. No good can be valued in an economic context independent of how it is delivered and paid for.18 As such, the first key concept is that it is the policy (plan) to deliver the good that the agent is valuing, and we continue to talk about
providing a good, except when it is useful to explore how the policy perspective might be important. Sometimes there is only one policy alternative to the status quo under consideration, in which case standard neoclassical economic theory, looking at the utility gained from the addition of a single additional good to the utility function, is applicable. In other instances, policy makers are interested in exploring options that differ from each other along one or more well-defined dimensions or attributes. When this is the case, Lancaster's theory of consumer choice (Lancaster, 1966) is often brought to bear, with the standard goods of neoclassical economic theory replaced with bundles of attributes. Policy discussions are often interested in how the value of the good changes as one or more of a good's attributes are varied in either a continuous or discrete manner. The monetary value of a marginal change in only one of the good's attributes is often referred to as the 'implicit price' of the attribute. In this sense, attribute-based DCE models bear some resemblance to hedonic pricing models.

Let us begin by denoting the item being valued (be it the change in the amount of a composite good or the change in one of its attributes) by $q$, and assume the individual has a utility function $u(x, q)$ defined over $q$ and the quantities of various market commodities denoted by the vector $x$. Corresponding to this direct utility function, we can write an indirect utility function, $v(p, q, y)$, where $p$ is the vector of the prices of the market commodities and $y$ is the person's income. We make the conventional assumption that $u(x, q)$ is increasing and quasi-concave in $x$, which implies that $v(p, q, y)$ satisfies the standard properties with respect to $p$ and $y$.19 We make no assumptions regarding $q$. If the agent regards $q$ as a 'good', $u(x, q)$ and $v(p, q, y)$ will both be increasing in $q$; if he regards it as a 'bad', they will both be decreasing in $q$; and if he is indifferent to $q$, they will both be independent of $q$. We also make no assumption regarding quasi-concavity with respect to $q$.

The act of valuation implies a contrast between two situations – a situation with the item, and one without it. We interpret what is being valued as a change in $q$. Specifically, suppose that $q$ changes from $q^0$ to $q^1$; the person's utility thus changes from $u^0 \equiv v(p, q^0, y)$ to $u^1 \equiv v(p, q^1, y)$. If he regards this change as an improvement, $u^1 > u^0$; if he regards it as a change for the worse, $u^1 < u^0$; and if he is indifferent, $u^1 = u^0$. The value of the change to him in monetary terms is represented by the two Hicksian measures: the compensating measure, $C$, which satisfies

$$v(p, q^1, y - C) = v(p, q^0, y),$$

(9.1)

and the equivalent measure, $E$, which satisfies

$$v(p, q^1, y) = v(p, q^0, y + E).$$

(9.2)

Observe that

$$\operatorname{sign}(C) = \operatorname{sign}(E) = \operatorname{sign}(u^1 - u^0).$$

(9.3)
If the change is an improvement, $C > 0$ and $E > 0$, and $C$ measures the individual's WTP to secure the change, while $E$ measures his WTA to forego it. If the change is regarded as being for the worse, $C < 0$ and $E < 0$; in this case, $C$ measures the individual's WTA
to endure the change, while $E$ measures his WTP to avoid it. If he is indifferent to the change, $C = E = 0$. To emphasize the dependence of the compensating and equivalent measures on (i) the starting value of $q$, (ii) the terminal value of $q$, and (iii) the value of $(p, y)$ at which the change in $q$ occurs, we sometimes write them as functions: $C = C(q^0, q^1, p, y)$ and $E = E(q^0, q^1, p, y)$. To simplify things, we can define the WTP function

$$\mathrm{WTP}(q^0, q^1, p, y) = \begin{cases} C(q^0, q^1, p, y) & \text{if } C \geq 0, \\ -E(q^0, q^1, p, y) & \text{if } C \leq 0. \end{cases}$$
(9.4)
The WTA function, $\mathrm{WTA}(q^0, q^1, p, y)$, is defined analogously. The goal of a CV survey is to measure one or another of these valuation functions – either the entire function, or one or more particular points on the function (a small worked example follows).
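As a simple illustration (our own, using an assumed quasi-linear utility, not a specification from the chapter), suppose $v(p, q, y) = \alpha q + y$ with $\alpha > 0$. Then (9.1) and (9.2) can be solved explicitly:

```latex
% Equation (9.1):  alpha q^1 + y - C = alpha q^0 + y  =>  C = alpha (q^1 - q^0).
% Equation (9.2):  alpha q^1 + y = alpha q^0 + y + E  =>  E = alpha (q^1 - q^0).
\[
C = E = \alpha\,(q^1 - q^0),
\]
% so with utility quasi-linear in income (no income effects) WTP and WTA
% coincide; more generally, when income enters nonlinearly, C and E - and
% hence WTP and WTA - can diverge.
```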
5.2 Random Utility Framework
If one could reliably observe WTP and WTA in matching questions, analysis in CV surveys would be straightforward in the case of a single policy change of interest, and would involve a standard response surface regression approach in cases where it was of interest to trace out the implicit price curves for individual attributes. For reasons noted earlier, DCE are used to elicit preference information. Because choices, rather than the Hicksian welfare measures themselves, are observed, one must turn to a model that translates choices into preference parameters related to underlying utility. Typically this is done in a random utility model (RUM) framework.

The standard economic version of the RUM assumes that, while an individual knows his preferences with certainty and does not consider them stochastic or otherwise behaving in a random manner, his preferences contain some components that are unobservable to the econometric investigator and, thus, are treated by the investigator as random (McFadden, 1974; Manski, 1977). In contrast, the version of the RUM from psychology, starting with Thurstone's original 1927 presentation in the context of a probit model, envisions a true stochastic component related to choice, couched in the inability to perfectly discriminate between alternatives. The economic version of the RUM does not rule out respondent uncertainty. Indeed, in most decision contexts, including market purchases, respondents face some uncertainty about various aspects of the good.

The difference between these two perspectives can best be seen by considering data generated by the DBBC elicitation format in the absence of any information or incentive effects. In the Manski formulation, because there is no true random component, the WTP distributions implied by the two binary choice questions should have the same mean and variance, and be perfectly correlated. In the Thurstone formulation, the means and the variances of the two WTP distributions should be the same, but the correlation between the responses will be less than one.20 Empirical evidence from fitting bivariate probit models to DBBC data tends to reject the hypothesis that the correlation coefficient is one, but there is considerable debate over the cause of the rejection, ranging from anchoring to strategic behavior to being an artifact of fitting the wrong distribution for WTP to the data (Versonsi et al., 2011).

Representing the stochastic component of the utility function by $\varepsilon$, we write the indirect utility function as $v(p, q, y; \varepsilon)$. It is usually assumed that the stochastic component
enters the utility function additively, so the utility function can be decomposed into a part which is deterministic and a part which is random: $v(p, q, y; \varepsilon) = \bar{v}(p, q, y) + \varepsilon$. Empirical implementation of random utility theory requires making an assumption about the statistical properties of the random term, typically that the random component of the utility function is independently and identically distributed (IID) across individuals and alternatives; for computational reasons, the Extreme Value Type 1 distribution is commonly used. The resulting multinomial (conditional) logit model (MNL) has a convenient closed-form expression for the probability of individual $i$ choosing alternative $j$ from the set of all available alternatives $J$:

$$P(j \mid J) = \frac{\exp\big( \bar{v}(p_{ij}, q_{ij}, y_{ij}) \big)}{\sum_{k=1}^{J} \exp\big( \bar{v}(p_{ik}, q_{ik}, y_{ik}) \big)}.$$

(9.5)
The structure of the conditional logit model leads to the restrictive IIA property which, while having intuitively desirable features at the individual level, frequently does not hold empirically at the sample level (the sketch below illustrates the property numerically). There are many ways to relax this restriction by allowing agents to differ with respect to their taste parameters and/or their random component. We refer interested readers to the other chapters of this handbook that are focused on statistical modeling issues for relevant discussions.
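A minimal numerical sketch of (9.5) and the IIA property follows; the utility values are arbitrary illustrative numbers, not estimates from any study.

```python
import numpy as np

def mnl_probs(v):
    """Multinomial logit choice probabilities from deterministic utilities v,
    as in equation (9.5): P(j) = exp(v_j) / sum_k exp(v_k)."""
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

v = np.array([1.0, 0.4, -0.2])   # illustrative utilities for three alternatives
p3 = mnl_probs(v)                # probabilities over the full choice set
p2 = mnl_probs(v[:2])            # probabilities after dropping alternative 3

# IIA: the ratio of choice probabilities for alternatives 1 and 2 is the same
# whether or not alternative 3 is in the choice set.
print(p3[0] / p3[1])  # exp(1.0 - 0.4) ~= 1.822
print(p2[0] / p2[1])  # identical ratio
```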
5.3 Individuals' Motivation and Disaggregation of Total Economic Value
So far we have made no assumption about the individual's motive for valuing $q$, or the nature of this value. This is because motives are essentially irrelevant for the neoclassical economic theory of value – economists simply accept consumer sovereignty. Economists do not judge whether consumers' preferences are right or wrong. Our objective here is to observe choices and from them infer consumers' relative preferences for different goods in monetary units. Whatever the reason why an individual cares for $q$, if he does care, this is reflected in his direct and indirect utility functions $u(x, q)$ and $v(p, q, y)$. However, this is not to say that motives are entirely irrelevant. In fact, understanding the motives may help policy makers to interpret the estimated economic value correctly.

There is a large literature on motives and their utility representation (Mitchell and Carson, 1989; Carson et al., 1999). Originally researchers defined total value as one of the four Hicksian welfare measures and then distinguished between use and non-use value (Mitchell and Carson, 1989), but the tendency now is to distinguish between use and passive use (Carson et al., 1999). The difference involves how to treat uncertainty, which invokes option value. The District of Columbia (DC) Court of Appeals, in the 1989 Ohio v. U.S. Department of the Interior decision, ruled that the government should be measuring passive use values in natural resource damage assessments, and it effectively defined passive use values to include Krutilla's existence value and other types of value that were unlikely to leave a trace in market behavior. While it is potentially possible for option value to manifest itself in market transactions, such markets rarely, if ever, exist. Passive use value then comprises existence value, including variants such as stewardship value, bequest value, option value and quasi-option value.21 Each of these can be explained in terms of a conceptualization of what is being valued and a specific formulation of an individual's preference structure.
5.3.1 Use value

The key characteristic of use value is some type of technical relationship with some marketed good. One classic case is where consumption of some marketed good is required to enjoy the environmental good of interest. Travel cost analysis is one technique that exploits such a relationship. The other classic case is where the environmental amenity of interest is not directly sold but is bundled into some marketed good, where hedonic pricing can be used. Some environmental goods examined by environmental economists consist mostly, if not exclusively, of use value. Analysis of DCE in this instance more closely resembles their counterparts in other areas of applied microeconomics.

5.3.2 Existence and stewardship values

Existence value was a term coined by Krutilla (1967), who articulated the current conceptual framework and, in doing so, fundamentally changed environmental economics. Krutilla (1967) observed that people might be willing to pay for an environmental resource – say a remote national park – even though they knew they would never visit it because, as he famously put it, they 'obtain satisfaction for the mere knowledge that part of the wilderness in North America remains, even though they would be appalled by the prospect of being exposed to it' (p. 781). While earlier economists had grappled with the notion of public goods that were truly pure public goods, in the sense that people could not be excluded from using them, nor did their use create congestion externalities, Krutilla's key insight was that gaining utility from such goods would not leave any behavioral traces in market activity. This means that existence value cannot be measured with techniques like the household production function or hedonic pricing that rely on some type of technical relationship, such as weak complementarity, between the non-marketed good and one or more marketed goods. It is possible for an agent to value a public good for both direct and passive use considerations, as they are not mutually exclusive and, indeed, may be positively correlated. Krutilla's emphasis on potentially gaining utility from the simple existence of a good starkly sets out why relying on inference from market data might substantially undervalue some environmental amenities.

Other motivations lead to the same utility specification. Stewardship is one of these and occurs if people believe the government should be a 'good steward' of a resource and are willing to pay to see the government undertake costly actions consistent with being a good steward. Expressed another way, some people perceive a duty to protect some entity from harm imposed by human activity.22 Good stewardship can, among other things, involve preservation of endangered species, setting aside land rather than developing it, or ensuring that water quality in rivers is maintained at a high level.23

Several ways have been proposed to represent existence value in utility theoretic terms. We use the most common representations here, and start by assuming the direct utility function takes the specific form of:

$$u = u(x, q) = T[\, f(x), q \,],$$
(9.6)
where $T[\cdot, \cdot]$ is a bivariate function, and $f(\cdot)$ is a sub-function that aggregates the $x$'s. Expression (9.6) involves weak separability between the $x$'s and $q$, since the marginal rate
of substitution between the consumption of any pair of market commodities, $x_n$ and $x_m$, is independent of the level of $q$. A consequence of (9.6) is that the ordinary demand functions for the $x$'s are each independent of $q$; they take the form:

$$x_n = h_n(p, q, y) = h_n(p, y), \quad n = 1, \ldots, N,$$
(9.7)
where the functions $h_n(\cdot)$ are the ordinary demand functions associated with maximization of the sub-utility function $f(\cdot)$ alone: maximizing $u(x, q)$ and $f(x)$ subject to a budget constraint on the $x$'s leads to exactly the same solution. The corresponding indirect utility function is:

$$u = v(p, q, y) = T[\, \psi(p, y), q \,],$$
(9.8)
where $\psi(p, y)$ is the indirect utility function that corresponds to maximization of the sub-utility function $f(\cdot)$ alone. While the person cares for $q$, it enters his direct and indirect utility functions via the $T[\cdot, \cdot]$ function, and the presence of $q$ in his utility function has no effect on his utility-maximizing choice of the $x$'s. It is in this sense that one could say that this individual values $q$ for reasons that are unconnected with his valuation of the market goods $x$. In this case, $q$ would be said to have a pure existence value.

Now consider a modified version of the above utility function:

$$u = u(x, q) = T[\, f(x, q), q \,],$$
(9.9)
where, as before, $T[\cdot, \cdot]$ is a bivariate function and $f(\cdot)$ is a sub-function. In this case, $q$ enters the utility function twice, once through its appearance in $f(\cdot)$ and once as the second argument of $T[\cdot, \cdot]$. Here, the individual values $q$ for two reasons: the first is connected with his consumption of the $x$'s and is represented by the interaction of $x$ and $q$ in $f(x, q)$; the second is unconnected with his consumption of the $x$'s and is represented by the appearance of $q$ as the second argument of $T[\cdot, \cdot]$. In this case, the ordinary demand functions do depend on $q$; they take the form:

$$x_n = h_n(p, q, y), \quad n = 1, \ldots, N,$$
(9.10)
where the functions $h_n(p, q, y)$ are in fact the ordinary demand functions associated with the maximization of the sub-utility function $f(x, q)$ alone. The crucial implication of this fact is that revealed preferences based purely on estimation of the demand functions for market goods $h_n(p, q, y)$, $n = 1, \ldots, N$, will recover only the sub-utility function $f(x, q)$, but not the function $T[\cdot, \cdot]$. The indirect utility function associated with (9.9) is:

$$u = v(p, q, y) = T[\, \psi(p, q, y), q \,],$$
(9.11)
where $\psi(p, q, y)$ is the indirect utility function that corresponds to maximization of the sub-utility function $f(x, q)$ alone. The total value that the individual places on a change in $q$, denoted $C_{TOT}$, is given by:

$$T[\, \psi(p, q^1, y - C_{TOT}), q^1 \,] = T[\, \psi(p, q^0, y), q^0 \,].$$
(9.12)
This has both a use value component, associated with $\psi(p, q, y)$, and a passive use component, associated with $T[\cdot, q]$; the use component, $C_U$, satisfies:

$$\psi(p, q^1, y - C_U) = \psi(p, q^0, y),$$
(9.13)
while the passive use component $C_{PU}$ would be defined as the difference:

$$C_{PU} \equiv C_{TOT} - C_U.$$
(9.14)
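To see the decomposition in action, consider a deliberately simple specification (ours, for illustration only): $T[\psi, q] = \psi + \gamma q$ and $\psi(p, q, y) = \alpha q + y$, so that use value enters through $\psi$ and passive use through the second argument of $T$.

```latex
% Total value, from (9.12):
%   (alpha q^1 + y - C_TOT) + gamma q^1 = (alpha q^0 + y) + gamma q^0
%     =>  C_TOT = (alpha + gamma)(q^1 - q^0).
% Use value, from (9.13):
%   alpha q^1 + y - C_U = alpha q^0 + y  =>  C_U = alpha (q^1 - q^0).
% Passive use value, from (9.14):
\[
C_{PU} = C_{TOT} - C_U = \gamma\,(q^1 - q^0),
\]
% i.e. exactly the part of the value that enters utility outside the
% market-goods sub-utility, and so leaves no trace in the demand functions.
```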
From a CV perspective, the distinction between the use and passive use components of the individual's total value for a change in $q$ could, in principle, be captured by formulating a model based on a specification of a utility function that conforms to the structure of (9.11), where the $\psi(p, q, y)$ and $T[\cdot, \cdot]$ functions could be separately identified. The crucial feature of this structure is that prices and income interact in a manner that is partially separable from $q$. This has not usually been done, as commodity prices are typically not used as explicit variables in utility specifications and identification would rest on the functional form assumption. As such, the only information elicited in CV surveys relates to $C_{TOT}$.

Some early CV researchers (for example, Walsh et al., 1984) approached the problem of trying to separate $C_{TOT}$ into components by first asking respondents for their total value and then requesting that they allocate this total among several specific motives. The problem is that decomposition questions are difficult for respondents to answer if they think holistically about goods, and that decompositions are sequence dependent and not generally unique (Mitchell and Carson, 1989; Cummings and Harrison, 1995; Carson et al., 1999).

Given the structure of (9.11), which generates the decomposition of the total value for the change in $q$ into a use value component based on $\psi(p, q, y)$ and a passive use value component based on $T[\cdot, q]$, there are two more fruitful approaches to identifying the two separate components of value. The first is to accept a different type of decomposition, where total value is defined in the usual way using a question where the respondent is able to use the resource, and another question is asked where the resource will not be available for the respondent to use. The difference between the WTP from these two questions is use value, under the assumption that the resource must first exist and then the ability to use it is added. This second question may not always be plausible. The second approach is to collect two sets of information, one being total value from a CV survey and the other RP data on the demand functions for one or more of the $x$'s. Suppose, for simplicity, there is sufficient RP data to estimate a complete demand system for the $x$'s. This would be combined with SP data on total economic value, using an assumed specification of the indirect utility function given above, to estimate (via Roy's identity for the demand equations) a system consisting of:

$$\begin{cases} x_1 = h_1(p, q, y) = -\big[ \partial \psi(p, q, y) / \partial p_1 \big] \big/ \big[ \partial \psi(p, q, y) / \partial y \big] \\ \quad \vdots \\ x_N = h_N(p, q, y) = -\big[ \partial \psi(p, q, y) / \partial p_N \big] \big/ \big[ \partial \psi(p, q, y) / \partial y \big] \\ T[\, \psi(p, q^1, y - C_{TOT}), q^1 \,] = T[\, \psi(p, q^0, y), q^0 \,] \end{cases}$$
(9.15)
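The first block of (9.15) is just Roy's identity applied to $\psi$, which can be verified symbolically for any assumed parametric form. The sketch below uses a deliberately simple single-good specification, $\psi(p, q, y) = q^{\gamma}\ln y - \beta \ln p$, chosen for illustration only; in practice the symbolic step would be replaced by fitting the implied demand to the RP data, with T pinned down by the SP total-value equation in the last row.

```python
# Sketch: deriving the h_n of (9.15) from an assumed psi via Roy's identity.
# The functional form is an illustrative assumption, not the chapter's.
import sympy as sp

p, q, y, beta, gamma = sp.symbols('p q y beta gamma', positive=True)

psi = q**gamma * sp.log(y) - beta * sp.log(p)  # assumed sub-indirect utility

# Roy's identity: x = h(p, q, y) = -(d psi / d p) / (d psi / d y)
h = sp.simplify(-sp.diff(psi, p) / sp.diff(psi, y))
print(h)  # beta*y/(p*q**gamma): the demand function estimable from RP data
```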
The advantage of this approach is that the RP data enrich the SP data; the two sets of data are analyzed, conditional on the functional form assumption, in a mutually
consistent manner, and they permit separate estimation of use and existence value as well as total value. Pioneering studies following this approach include Cameron (1992) and Adamowicz et al. (1994), with Azevedo et al. (2003) providing a recent review.

5.3.3 Bequest value
Krutilla (1967) also introduced the concept of 'bequest value', where some agents are willing to pay to protect a wilderness area or national park in order to preserve it for their children or grandchildren. Bequest value represents concern for a future generation: the current generation wants the future generation to have access to a particular resource. A utility function can be written with this motive entering in a similar fashion to q, impacting welfare but exogenous to current consumption. As such, we could think of q as a vector with two separate components, q = (q_1, q_2), where q_1 captures the current period existence of the wilderness area, which affects the agent either through being a current period park visitor (use value) and/or through the park's current period existence value, and q_2 represents an increase in the agent's well-being due to future generations having the wilderness area available. Because of the bequest motive, the protection of the wilderness now involves a shift in both elements of q. With this, the formalism above carries over, with the change in q_1 and q_2 defining the individual's WTP and WTA to protect the wilderness area. As such, a bequest motive results in a potential re-specification of the agent's utility function, but it does not otherwise change the formal theory of the CV response probability function.
6 THEORETICAL VALIDITY OF CV RESULTS
Almost from its inception, CV has turned up behavior thought to be potentially at odds with standard neoclassical economic theory. This is the main thrust of attack by some economic critics of CV (for example, Hausman, 1993). With 50 years of experience conducting CV studies, these potential anomalies fall into three categories. The first comprises instances where our understanding of what neoclassical theory predicted turned out to be faulty. Much of the problem here stems from using intuition concerning demand and welfare theory for price changes, where agents are free to adjust quantities, to infer what sort of behavior should be observed for imposed quantity changes, the situation that characterizes many environmental goods. The second is the presence of various behavioral effects. Here it is useful to point out that some of the best known behavioral effects were first seen in CV surveys. These were later shown to be quite robust across a wide range of economic behavior. CV surveys cannot eliminate behavioral effects, but they can provide respondents with choice questions that facilitate careful decision making. The third is anomalous behavior induced by poor survey design and administration. CV studies that have invested considerable time and effort into understanding what people believe, into presenting a credible choice scenario with a well-defined good and a coercive payment mechanism, and into a survey administration that enhances the survey's consequentiality tend to produce results that are well-behaved. To keep this section focused, we concentrate on the main anomalies that have been put forward in the literature.
6.1 Different Elicitation Formats Yield Different WTP Estimates
Early on (for example, Smith and Desvousges, 1986), CV researchers had clearly documented that different elicitation formats yielded different estimates of WTP.24 This was troubling if one thought respondents truthfully revealed their preferences, and these differences were often used by CV critics (McFadden, 1994) as an indication that people did not have well-formed preferences for the environmental goods. There are three difficulties, though, with this view. The first is that these differences appear to be systematic, not random. Second, their existence was seen by psychologists as a function of framing, not as a survey artifact. Indeed, Tversky et al. (1990) pointed to 'matching' and 'choice' frames being consistent with different behavior as perhaps the major problem with neoclassical theory. The third was that Carson and Groves (2007) showed that if respondents are standard rational maximizing economic agents facing a consequential survey, then they should exploit all of the incentive and information characteristics of the particular elicitation format used. Their neoclassical model yields the strong prediction that different elicitation formats should generally produce different estimates of WTP. Further, they showed that their framework predicts the fairly complex pattern of typically observed differences between elicitation formats. In recent years, there has been a dramatic change in the literature, from differences in WTP estimates across elicitation formats suggesting major problems with CV to an acknowledgement that the direction of the observed divergences is predicted by neoclassical theory.

6.2 Consistency of CV Results with Theoretical Predictions
Results from CV surveys have raised many interesting questions about neoclassical economic theory, which have prompted extensions or clarifications of that theory. The key to understanding these economic questions is recognizing that there is a considerable difference in how monetized utility differences are defined between a situation where an agent faces a price and determines the quantity to purchase to maximize utility, which characterizes private goods, and a situation where the agent faces a choice between having or not having access to a public good. Most textbook discussions of welfare economics are based on the first situation, as is the intuition of most economists. Hicks (1943), however, in his classic formulation of current welfare economics, showed that there are four consumer surplus measures. Two of these were routinely ignored because they involved 'rationed' goods, a Second World War phenomenon of interest to Hicks.25 Cornes and Sandler (1996) were later to show that pure public goods are just a special case of rationed (that is, imposed quantity change) goods. Much of the inherited wisdom about the price change case comes from Willig's (1976) seminal paper, which showed the difference between WTP and WTA was sufficiently small to be ignorable for small price changes. Willig, though, was careful not to include quantity changes in his analysis. The unraveling of the structure of demand for imposed quantity changes was set in motion by Hanemann (1991), who showed that, for this case, divergences between WTP and WTA could be arbitrarily large.
6.2.1 WTP versus WTA
Total value from an economic perspective can always be cast in terms of the maximum amount that the agent would pay for the good or the minimum amount the agent would take in exchange for selling the good. Property rights determine whether WTP or WTA is the right welfare measure. The appropriate property right is generally well defined for private goods, and it would not matter for environmental goods if WTP and WTA were close. If WTP and WTA are not close from a theoretical perspective, then it matters how property rights are assigned. This is often a complicated legal and political question for environmental goods. Willig's (1976) result provided considerable comfort to early researchers working on environmental valuation since it meant that a WTP measure could be substituted for WTA where needed with little loss of precision. Initial efforts, though, to measure WTA (for example, Hammack and Brown, 1974; Bishop and Heberlein, 1979) suggested that WTA was considerably larger than WTP. This was initially taken as a sign of problems with CV or, at least, with using CV to measure WTA. Subsequent developments suggested that the problem was with using Willig's results to make inferences about the imposed quantity change case. Hanemann (1991) showed that the key difference between the price and quantity cases is that, in the price case, the difference between WTP and WTA is governed by one parameter, an income effect, while in the case of an imposed quantity change the difference is driven by the ratio of the income effect to a substitution effect. As the substitutability of the public good with the available private goods becomes small, which is likely to be the case for many environmental goods, the difference between WTP and WTA grows large. A competing explanation was prospect theory (Kahneman and Tversky, 1979), which predicted a sizeable divergence between WTP and WTA for behavioral reasons that clashed with neoclassical economic theory; and there are richer variants of neoclassical economic theory, incorporating dynamic effects, uncertainty, strategic behavior and bargaining, that predict sizeable divergences (Kling et al., 2012). Experiments using actual transactions in a variety of settings showed large divergences between WTP and WTA (for example, Knetsch et al., 1990). It soon became clear that the divergence between WTP and WTA first identified in CV studies was not a survey artifact and, indeed, routinely manifested itself in markets. Even traded financial assets did not seem to be immune, with junk bonds and thinly traded stocks showing much larger bid (WTP)–ask (WTA) spreads than predicted using Willig's result. Horowitz and McConnell (2002), in a widely cited review of the literature, show that the ratio of WTA to WTP estimates is roughly the same for surveys and actual transactions. The key implications for CV are twofold: first, divergences between WTP and WTA should be expected – this is not a sign that CV is invalid – and, second, this divergence can be important in some policy contexts (Knetsch, 1990).

6.2.2 Income elasticity of WTP
A claim often made by CV critics (Hausman, 1993) is that environmental goods should be luxury goods and, as such, one would expect the income elasticity from a CV study to be above one. As before, this claim also has problems. First, there is no good reason to believe that the environment is a luxury good. The typical result from CV studies is that most goods are necessary (that is, normal but not luxury) goods. While income
elasticities are typically significant, they do strike many researchers as smaller than might be expected, and the occasional, but not infrequent, lack of significance is troubling. Two very distinct reasons lie behind the empirical results. The first is that the income elasticity of WTP, the quantity estimated in CV studies, is a very different economic quantity from the income elasticity of demand upon which the usual economic definition of luxury goods is based. Flores and Carson (1997) show the two elasticities are functionally related. In a world with two goods, the relationship is given by:

$\begin{bmatrix} \eta^v_1 \\ \eta^v_2 \end{bmatrix} = -\begin{bmatrix} \sigma^d_{11} & \sigma^d_{12} \\ \sigma^d_{21} & \sigma^d_{22} \end{bmatrix} \begin{bmatrix} \eta^d_1 \\ \eta^d_2 \end{bmatrix} \dfrac{y}{e^v},$  (9.16)
where the $\eta^v_i$ are the (virtual) income elasticities of WTP, the $\sigma^d_{ij}$ are the cross-price elasticities of demand, the $\eta^d_i$ are the ordinary income elasticities, and $y/e^v$ is income divided by $e^v$, which is equal to income plus the value of all public goods consumed expressed in monetary terms. This result shows the income elasticity of demand and the income elasticity of WTP can differ significantly in magnitude and even sign. Because the term $y/e^v$ is less than one, a researcher should usually expect to see the income elasticity of WTP being less than the corresponding income elasticity of demand. The second reason for expecting the income elasticity of WTP to be smaller than what intuition might suggest is measurement error, as income is generally very difficult to measure. Indeed, the relevant income measure that should be used is open to question, ranging from wealth, to some definition of permanent income, to mental accounts, to some measure of discretionary income. These measurement problems will tend to attenuate the statistical estimate of the income elasticity of WTP toward zero.

6.2.3 Sequence effects and the adding-up test
An early CV finding was that the order in which two goods were valued matters, and could matter a lot. Consider a stylized example taken from some of the early work on air pollution valuation (Randall et al., 1981). Respondents from Chicago have an average WTP of about $100 for a specified air quality change in Chicago. When offered an additional specified air quality improvement in the Grand Canyon, they are willing to pay $30 more. A different sample of respondents is given the reversed sequence. Their average WTP is $60 for the Grand Canyon improvement and $70 for the Chicago improvement. Policy makers are troubled to see such a result, believing there should be only one 'true' monetary value. By training, an economist does not believe that there is only one true monetary value: there should be substitution and income effects that come into play in different ways depending on the order of the sequence. The magnitude of the difference in this stylized example, however, is larger than what many economists would expect. The question is: what does neoclassical economic theory predict? Carson et al. (1998) show that if the goods being valued are normal goods and Hicksian substitutes for each other, which is the typical case, then the value of a particular public good should get progressively smaller the later it is valued in a WTP sequence. This finding says that a package of goods should be valued less than the sum of its independently valued constituents, comporting with the argument put forward in Hoehn and Randall (1989) that the benefit estimates used for policy purposes might be considerably
too large if all goods were valued individually and the government provides more than one. Carson et al. (1998) show the opposite effect occurs in a WTA sequence. With fewer substitute goods and more income as the sequence progresses, the later in a sequence the good is valued, the larger its value. Further, they show that the WTP for a good valued first in a sequence is always less than or equal to WTA for the good valued in any order in the sequence. Turning now to what determines the magnitude of the sequence effect, consider the set of compensated, cross-price elasticities of demand ($\sigma^d_{ij}$) from Deaton's (1974) well-known analysis of UK consumer demand for two goods: food and clothing. Own-price elasticities are −0.28 for food and −0.17 for clothing, and cross-price elasticities are 0.08 for the effect on food demand of a price increase in clothing and 0.21 for the effect on clothing demand of a price increase in food. All of these elasticities are reasonably small and they yield the following well-behaved demand system:

$\begin{bmatrix} \sigma^d_{11} & \sigma^d_{12} \\ \sigma^d_{21} & \sigma^d_{22} \end{bmatrix} = \begin{bmatrix} -0.28 & 0.08 \\ 0.21 & -0.17 \end{bmatrix}.$  (9.17)
If, however, one restricts the quantities in the choice that agents face, then the situation involves partial rationing and hence the inverse demand system with quantity restrictions. To do this we need to look at how the virtual (shadow) prices for the rationed goods (food and clothing) respond to changes in the rationed levels of both of these goods. The virtual price substitution elasticities ($\sigma^v_{ij}$), measuring this responsiveness, are related inversely to the compensated price elasticities (Madden, 1991). For our food and clothing example, the virtual price matrix of the substitution terms is:

$\begin{bmatrix} \sigma^v_{11} & \sigma^v_{12} \\ \sigma^v_{21} & \sigma^v_{22} \end{bmatrix} = \begin{bmatrix} \sigma^d_{11} & \sigma^d_{12} \\ \sigma^d_{21} & \sigma^d_{22} \end{bmatrix}^{-1} = \begin{bmatrix} -5.60 & -2.55 \\ -7.19 & -9.33 \end{bmatrix}.$  (9.18)
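The inversion in (9.18) is easy to check numerically. Using the rounded elasticities quoted above, a direct 2×2 inversion gives entries close to, though not identical with, those reported; the published figures were presumably computed from Deaton's unrounded estimates.

```python
# Numerical check of the matrix inversion behind (9.18), using the rounded
# elasticities quoted in the text.
import numpy as np

sigma_d = np.array([[-0.28,  0.08],
                    [ 0.21, -0.17]])   # compensated price elasticities (9.17)
sigma_v = np.linalg.inv(sigma_d)       # virtual price substitution elasticities
print(np.round(sigma_v, 2))            # [[-5.52 -2.6 ] [-6.82 -9.09]]
```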
The same demand system whose cross-price elasticities imply relatively small increases in demand of one good when the price of the other good increases (an 8 percent increase in food demand in response to a 100 percent price increase in clothing, and a 21 percent increase in clothing demand in response to a 100 percent price increase in food) implies very large reductions in WTP (255 percent and 719 percent, respectively) if a unit of the other good has already been provided first in the WTP sequence. Deaton's demand system involves only two common private goods, so there is no need to resort to explanations involving inconsistent preferences or even peculiar characteristics of public goods to get large sequence effects. While substitution effects are clearly sufficient to drive the sequence effects observed in many CV studies, income effects may also be important. Budget constraints are often thought relatively unimportant unless the cost is fairly sizeable, but some households may not have much discretionary income, particularly if payment is required over a short time period. Contingent valuation critics have argued that if sequence effects are large then the value of goods can be manipulated by choosing the sequence order. There is some truth to this statement. While goods that people do not care about do not magically become valuable by virtue of placing them first in a sequence, it is possible to drive down the
value of a particular public good by embedding it far out in a sequence of other desirable public goods. It is not surprising that supporters of a project want it considered first and rivals for available funds want it considered after their projects. This is the agenda control problem in politics, and it would be peculiar if CV, or benefit–cost analysis more generally, were able to circumvent this issue. Another context-related consistency test, termed an adding-up test, was proposed in the Hausman (1993) volume and expanded upon in Diamond (1996). The test requires that the estimated monetary value of a bundle of two (or more) goods [WTP(A, B)] be equal to a properly constructed sequence of WTP for the individual goods that takes account of payment and provision [WTP(A) + WTP(B | A, −c)], where A and B are the two goods and c is the payment for A. At an abstract level, the test follows from satisfying duality properties commonly assumed in microeconomics. There are several problems with the test, though. It is difficult to implement in a survey because it asks respondents to pretend they have received a specific good and paid a specific amount for it, when they have not. Closer examination of the test's theoretical underpinnings shows it relies on a strong functional form assumption, which can best be seen by noting that it requires people to be indifferent between a program that prevents some number of existing birds from being killed and a hatchery program producing the same number of new birds (Smith and Osborne, 1996). The last problem with the test is that people generally do not conform to the adding-up test predictions in real markets. Bateman et al. (1997) ran an experiment with students and two of their favorite commodities, pizza and coffee, and showed that they violate it. This should come as no surprise: many marketing activities, such as upselling, exploit violations of the test. When the bundle is relevant, a good CV study should value it.
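A stylized illustration of what the test demands, with entirely hypothetical numbers:

```python
# Stylized adding-up test of Diamond (1996); all numbers are hypothetical.
wtp_bundle = 130.0      # elicited WTP(A, B) for the bundle
wtp_A = 100.0           # elicited WTP(A)
wtp_B_given_A = 25.0    # elicited WTP(B | A provided and paid for at c = 100)

gap = wtp_bundle - (wtp_A + wtp_B_given_A)  # zero under the adding-up null
print(gap)  # 5.0 here: a violation of the adding-up prediction
```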
6.3 Scope Tests
An issue involving the validity of CV that has received considerable attention after being put forward as a major concern by Kahneman and Knetsch (1992) and Hausman (1993) is whether CV estimates are sensitive to the scope of the good being valued.26 The Arrow et al. (1993) panel largely adapted the methodology used in the Carson et al. (1992) Exxon Valdez study as its recommended practice, except that it diverged on the scope issue by recommending that CV surveys being done for litigation purposes pass a scope test. The proximate cause for this recommendation was the striking results of Desvousges et al. (1993), in which respondents gave roughly the same WTP for preventing 2000, 20 000, and 200 000 birds from being killed by oil, and the contention of the CV critics that CV researchers had not previously examined the issue of scope insensitivity. Mitchell and Carson (1989) had earlier raised the possibility of scope insensitivity, calling it 'part-whole' bias, and argued that the problem was caused by a failure of the survey designer to clearly communicate the characteristics of the good(s) in the CV survey. This alternative helped to frame the debate. Was scope insensitivity a generic characteristic of CV surveys, occurring because respondents were giving generalized answers not tied to the specific characteristics of the good being valued, or was the phenomenon linked to specific, poorly designed CV surveys that offered vaguely described goods or payment obligations? To examine the issue of scope sensitivity it was necessary to sort out the relationships
between sequencing, nesting and scope (Carson and Mitchell, 1995).27 Nesting occurs when one good is a proper subset of another, which can happen in one of two ways. One way is where different goods can be valued in different combinations, while the other is where different numbers of units of the same attribute are provided. An example of the first way is where a respondent values a particular wildlife refuge, a particular forest and a particular beach. This composite good, in turn, nests the same forest and beach, which in turn is a composite good that nests the same beach. The second way has two goods differing along one quantitative attribute, so that a larger good nests a smaller one. When one good nests the other, the theoretical prediction (if the nested good and its complement are both desirable) is that WTP should be the same or greater for the larger good. Thus, the scope sensitivity hypothesis can be tested by having respondents value two goods that differ in scope. This test can be external, using two independent, statistically equivalent subsamples, each of which values one of the two goods, or internal, by having the same respondents value both goods. The question of whether CV surveys suffered from generic insensitivity to scope was soon answered in the negative. Carson (1997) reviews the large body of empirical evidence on split sample external scope tests that quickly developed after the Arrow et al. (1993) panel report. This evidence came from four sources. First, there were a number of existing tests of the scope insensitivity hypothesis. They had not been labeled as such because, prior to Kahneman and Knetsch (1992), the hypothesis had not been of much interest, and because these tests occurred in studies that had been done for policy purposes, where the reason for using a split sample was to avoid giving respondents contradictory information about the goods being valued. These tests almost uniformly rejected the scope insensitivity hypothesis. The second source was new studies that contained explicit tests of scope insensitivity. These included two large CV studies (Carson et al., 1994a, 1994b) which involved goods with primarily passive use considerations and in-person interviews, and they, too, overwhelmingly rejected scope insensitivity. The third source was meta-analyses (for example, Smith and Osborne, 1996) done across CV studies valuing the same class of good, which showed that estimates systematically varied with the characteristics of the good being valued. The fourth source was re-analyses of some of the key studies CV critics put forward, suggesting that the claimed results were not robust and that many of the surveys used were far from state-of-the-art CV surveys (Carson, 1997). The conclusion of the Carson (1997) review has held up well over time. There have been a number of additional tests of scope insensitivity, with most rejecting the hypothesis. When problems are found they tend to fall into a few areas. The first involves the use of small risk probabilities. Many people do not process small probabilities well. This issue is now well known not to be isolated to surveys but rather to be widely reflected in financial and insurance decisions. The second involves endangered species when the protection plan involves land. Here respondents often appear to correctly perceive that protecting the land is likely to protect everything living on it. The third involves the use of voluntary payments, where theoretical analysis of the incentive structure suggests that no scope effect may be seen.
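As a sketch of how such an external test works in practice (all counts hypothetical), split-sample SBC responses at a common bid can be compared with a one-sided two-proportion test; scope sensitivity predicts a higher 'yes' share for the larger, nesting good.

```python
# Minimal external scope test on split-sample SBC data; counts are hypothetical.
import numpy as np
from scipy import stats

yes_small, n_small = 142, 400   # 'yes' votes at the common bid, smaller good
yes_large, n_large = 198, 400   # 'yes' votes at the common bid, larger good

p_small, p_large = yes_small / n_small, yes_large / n_large
p_pool = (yes_small + yes_large) / (n_small + n_large)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_small + 1 / n_large))
z = (p_large - p_small) / se
p_value = 1 - stats.norm.cdf(z)    # small p-value rejects scope insensitivity
print(f"z={z:.2f}, one-sided p={p_value:.4f}")
```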
There are other important aspects of conducting scope tests that are likely to be useful in thinking about evaluating CV results more generally. Subsequent research has shown that how the overall choice task is conveyed (for example, Bateman et al., 2004) or how a particular attribute like low-level risk is presented (for example, Corso et al., 2001)
can influence sensitivity to scope. This suggests researchers need to be careful about how goods are presented when sensitivity to scope is likely to be an issue. With public goods, respondents are often skeptical that the government can deliver the good described and adjust their WTP accordingly. With respect to scope tests, there is evidence that suggests that respondents may be more skeptical about the government's ability to provide the larger version of the good than the smaller version. This can create the seemingly perverse result that respondents are willing to pay more for less. It is also possible to conduct internal scope tests. The most common way to do this is with a DCE that offers respondents more than one alternative to the status quo.28 When these differ by one or more attributes, it is possible to test whether changes in the attribute influence WTP in a statistically significant manner. Such tests are routinely passed, though, and are thought to be weaker tests than the external version of a scope test because respondents typically can easily see the difference between goods.29 However, it should be noted that side-by-side comparisons are a typical characteristic of markets.

6.4 Criterion and Convergent Validity30
Economists suspicious of survey responses often ask the question: 'How do CV results correspond with actual behavior?' This question, to some degree, is ill-posed in two ways. First, it assumes actual behavior represents a criterion validity comparison in the sense that the criterion contains no systematic error. This is in contrast to a convergent validity test, which assumes that both measures may be influenced by various factors that may cause divergences with respect to the particular theoretical construct of interest. What has been shown consistently is that a criterion standard is rare if not non-existent. Almost all comparisons of CV estimates to some other type of measurement should be treated as tests of convergent validity (Mitchell and Carson, 1989). Second, it typically suggests an underlying belief that if there is a divergence then there is a problem with CV. However, a more sophisticated view is that CV estimates should be consistent with actual behavior only when theory and the conditions under which it is implemented suggest that there should be consistency; divergence should be seen under conditions that predict divergence. The Carson and Groves (2007) framework has been particularly useful in thinking about this situation. It is instructive to take up the three cases most often invoked by CV critics. The first involves comparing what people actually give in response to requests for voluntary contributions for an environmental good to an SP-based estimate of what respondents say they will contribute.31 The survey estimates tend to be considerably higher than actual contributions, but this is to be expected. Economic theory predicts that people should free-ride with respect to letting others provide public goods, and there is a large body of empirical evidence supporting this prediction, although free-riding behavior does not seem to be as complete as theory suggests. What was less obvious until the Carson and Groves (2007) paper is that if respondents took a survey asking about voluntary contributions as consequential, then indicating a willingness to contribute would increase the likelihood that an actual voluntary fundraising drive would be mounted, which in turn can provide the good to a free-riding agent. The only prediction that arises is that the survey should over-estimate WTP for the good and that the voluntary
contributions should under-estimate WTP for it. Even this bound, though, is subject to caveats (Chilton and Hutchinson, 1999), because this often involves a shift in who will provide the good, which can influence perceptions about the nature of the good, including its likelihood of being provided. The second case involves private goods, where it is possible to observe actions in actual markets (including economic experiments) as well as asking survey respondents about them. Sometimes the 'revealed' behavior is taken from actual markets and sometimes from economic experiments. It is often argued (for example, Neil et al., 1994) that private markets represent the best situation for SP estimates to perform well because the goods are more familiar to people, but familiarity is not a relevant issue. What is relevant is the survey's incentive structure (Carson and Groves, 2007). If a respondent has a positive probability of wanting to buy a new good at the price stated, he should say 'yes' because it increases the likelihood that the good will be produced and made available for sale. Not surprisingly, marketing surveys tend to over-estimate the percentage of the public who actually buy. This is the expected result, though, because the survey, which may still be useful if calibrated correctly, traces out potential demand, not actual demand.32 In some environmental contexts, being able to measure potential demand for one-time actions, such as hooking up to a new water system in developing countries, has proven useful because the survey-based estimates tend to be reasonable predictions of how many households hooked up over a longer time horizon (World Bank, Water Demand Research Team, 1993). The opposite situation to introducing a new good often happens when the survey asks respondents about existing private goods. Respondents appear to be less likely to buy at a given price in the survey than they are in an actual market context if they believe the survey is being used for pricing purposes.33 These surveys can still be useful, as responses are often very informative about the relative attractiveness of different attribute levels, since the respondent's incentive is to get the firm to supply the preferred configuration of the good at a low price. When both availability and price are at issue, the incentive structure can become complex, and respondents with different beliefs are likely to rely on different heuristics. The marketing and transportation literatures have often seen surveys as producing useful but biased estimates that need to be calibrated with revealed preference behavior. Without a coherent theory of how people respond to the surveys, however, over-estimates for new goods and a lower proclivity to purchase existing goods in pricing exercises have coexisted in the form of 'forecast' problems attributed to hypothetical bias in surveys (Hensher, 2010). Looking at the incentive structure provides the missing link. What should be clear from this brief discussion is that, contrary to the initial belief that private goods should represent the best case for survey estimates, private goods represent difficult situations where one should expect to see systematic deviations between survey and market behavior. The third place that researchers have looked (and there are a number of these tests) is how well estimates from SP surveys perform relative to actual transactions in laboratory experiments.
The typical experiment asks respondents about WTP for a good in a ‘real’ treatment where payment is required and a ‘hypothetical’ treatment where it is explicitly made clear that the response given will have no effect on either provision or payment. Murphy et al. (2005) provide a meta-analysis of both treatments. They find a median ratio of estimated WTP in purely hypothetical treatments to estimated WTP in the actual
The discrete choice experiment approach 227 payment treatments of 1.35.34 While this upward bias is troubling, it is well within the range of uncertainty that characterizes much economic analysis. Perhaps more importantly, though, it is not clear what relevance these studies have with respect to judging how well CV studies are likely to perform. Most are done with students who are paid to participate. The survey instruments used often bear little resemblance to real CV studies and it is likely those receiving the ‘hypothetical’ treatment are suspicious as to why they are being paid. The most important difference, however, is that a good CV survey is not purely hypothetical; it is consequential in the sense of potentially having a policy impact. Carson and Groves (2007) show that under a purely hypothetical treatment, a random response has the same impact on the agent’s utility as any other response. As such, it is not clear that anything useful from the perspective of a consequential CV survey can be learned from these tests. Let us now turn to the first case that can provide some information on the performance of CV surveys. With quasi-public goods such as outdoor recreation that require some type of complementary purchase or which are bundled into private goods, it is possible to compare WTP from both SP and RP (for example, averting behavior, hedonic pricing and travel cost analysis) approaches. Carson et al. (1996) provide a meta-analysis of the ratio of CV to RP estimates based on 616 comparisons from 83 studies. They estimate the mean CV/RP ratio to be .89 with a confidence interval of [0.81–0.96]. This suggests that CVM estimates in the case of quasi-public goods are on average a bit lower than RP estimates and highly correlated (0.78).35 Carson et al. (1996) do identify a publication bias – studies are most likely to get published if the CV/RP ratio is close to one or quite large. The literature has expanded dramatically so recent comparisons have focused on particular classes of goods. Rosenberger and Loomis (2000) perform a meta-analysis using 682 estimates from 131 CV and travel cost studies. They find the CV-based estimates are on average significantly lower than RP-based estimates controlling for the characteristics of the type of outdoor recreation being valued. Kochi et al. (2006) find that CV studies valuing statistical lives tend to produce smaller estimates than those obtained from hedonic wage studies. Carson and Groves (2007) show that consequential SBC questions featuring a take-itor-leave-it offer not tied to any other decision with a coercive payment should result in truth preference revelations so it is useful to look for comparisons to votes on binding referendum. Carson et al. (1987) performed the first such test with a ballot proposition involving a water quality bond and a CV survey that mimicked the ballot proposition administered as part of the Field Institute’s California Poll. Close correspondence was found between the percentage in favor in the survey and the actual vote.36 Four subsequent US comparisons of estimates involving CV surveys have been done. Champ and Brown (1997) look at a referendum involving road maintenance in Colorado, Vossler, et al. (2003) look at an open-space bond issue in Oregon, Vossler and Kerkvliet (2003) look at a different Oregon open space bond issue, and Johnston (2006) looks at a water supply bond issue in a small Rhode Island town. 
All find that the survey results and referenda votes are statistically indistinguishable, and conservative if all 'do not know' responses are treated as 'no' votes, which is common CV practice. This is not surprising, since surveys on two-candidate races and referenda taken close to elections have long been known to be quite good predictors of election outcomes.
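The conservative recoding mentioned above is a one-line computation; with hypothetical counts:

```python
# Conservative 'do not know' treatment in SBC/referendum-style CV data;
# counts are hypothetical.
yes, no, dont_know = 412, 388, 75

share_dropping_dk = yes / (yes + no)            # 0.515: drops 'do not know'
share_dk_as_no = yes / (yes + no + dont_know)   # 0.471: common CV practice
```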
7 CONCLUDING REMARKS
There are now a vast number of CV studies and papers focused on various aspects of environmental valuation, and 50 years of empirical experience implementing variants of the approach. A great deal has been learned over the course of this process, and CV has substantively influenced how environmental economists view many issues. Nowhere is this clearer than with respect to the potential importance of passive use considerations and the development of a deeper understanding of the pure public goods nature of many environmental goods. At best, this chapter provides a broad overview of some of the main CV results and points the interested reader toward relevant literature for further exploration. One of the main areas where CV has matured is in understanding the implications of the underlying neoclassical economic theory. Particularly for pure public goods and passive use values, these implications are often quite different from predictions concerning welfare measures based on price changes, upon which most economists' intuition is based. Once these are taken into account, CV results tend to be consistent with theoretical predictions. Results from CV studies can be used to show agents violate some of the standard tenets of neoclassical economics, but none of these violations are specific to surveys. Agents routinely violate them in markets, but it is possible to design CV surveys to help facilitate careful decision making by respondents. Contingent valuation moved toward the SBC elicitation format as a way of helping to ensure incentive compatibility when there was a pure public good for which only one level could be provided. On the other hand, CV often utilizes more complex DCE as a way of obtaining more information from each respondent than an SBC question provides, and as a way of being able to value a much larger range of policy options at reasonable cost. There is some tension here. Carson and Groves (2007) show that while an SBC question can be incentive compatible if a set of auxiliary conditions is met, more complex DCE are not. Once one knows where to look for the violations, they are readily apparent. But these can often be dealt with without substantially compromising results. In this sense, it is another factor to be cognizant of when analyzing choice data. The assumption that respondents truthfully answer all questions and treat choice sets independently should never have been considered plausible. At first, more complex DCE seemed different in substantive ways from other variants of CV because of their focus on attributes and multiple choice sets. These differences were sometimes seen as overcoming problems critics had attacked CV for, but this was not to be the case. Any problem seen in a CV study using the SBC elicitation format generally has a counterpart when looking at more complex DCE. Protest responses and nay/yea-saying with an SBC look like status quo effects in a DCE with three or more alternatives. More complex DCE, if anything, require even more skill on the part of researchers applying them. Conveying more levels of attributes in a way that is understandable to respondents is a difficult task, as is making the bundling of them together in a wide array of configurations plausible. Asking multiple choice sets also takes up scarce interview time in a survey relative to asking a single SBC question.
Experimental design issues also loom considerably larger, as the design used can influence which parameters are statistically identified and the efficiency of the choice sets in collecting relevant information on preferences.
Still, the gains from high-quality implementation of more complex DCE can be very large. They allow exploration of a much larger set of policy options and can help identify the marginal value of changing the attributes of particular plans. This is close to the holy grail of environmental economics: being able to set marginal benefits equal to marginal costs.
NOTES

1. This work draws heavily upon our earlier work, and in particular: Carson et al. (2001), Carson and Hanemann (2005), Carson and Groves (2007), Carson and Louviere (2011) and Carson (2011, 2012).
2. The public character and free access aspect of many environmental goods often requires government funding or mandated provision. This in turn creates the problem of finding the socially optimal level of their provision – a problem that requires estimates of the value consumers place on the goods. As a result, cost–benefit analysis (and hence CV) has become an essential tool for policy makers concerned with environmental goods.
3. Environmental goods are not unique with respect to having a passive use component. For instance, passive use is thought to be an important component of many cultural resources (Navrud and Ready, 2002) and a public good like national defense is a classic case of a passive value.
4. www.evri.ca/.
5. Obtaining reliable estimates of WTA tends to be more difficult than for WTP. Throughout we usually refer to obtaining WTP estimates unless there is need to conceptually distinguish between WTP and WTA.
6. Contingent valuation also started to spread outside of environmental economics. See for instance Acton's (1973) pioneering work on valuing health effects and work by Throsby (1984) on government support for the arts. Unrelated work in transportation would start to have a synergistic effect with environmental economics at a later date with recognition that valuing travel time (for example, Hensher and Truong, 1985) had similarities to valuing some types of environmental amenities and that transportation policies such as those involving alternative fuel vehicles (for example, Brownstone et al., 2000) had large environmental implications.
7. One of Louviere's early papers (1974) looked at how distance (travel time) and stream quality (trout per mile) influenced stream preference.
8. Even this format can be cast in a choice context as it can be modeled as interval censored data with the two amounts from the card defining a lower and upper bound on WTP (Cameron and Huppert, 1989). Responses to a binary discrete choice question can also be easily cast as interval censored data. Metcalfe et al. (2012) show that it is possible to combine data from the two elicitation formats in a common statistical model. Efforts to estimate models for DCE in WTP space (for example, Train and Weeks, 2005) exploit Cameron's approach.
9. The study estimated that the US population was willing to pay approximately $3 billion to prevent a spill similar to the Exxon Valdez in the future. Natural resource damage assessment rules require compensation to come in the form of restoration and of expenditures to provide similar resources to those injured where restoration is not possible. Exxon spent about $2 billion on response and restoration and provided the government with about $1 billion for a fund to buy additional natural resources. Subsequent to the Exxon Valdez CV study, the government implemented a prevention plan similar to that described in the study, and key components of that plan have been deployed several times to avert a spill.
10. An early effort (Rae, 1982) involving air pollution had respondents completely rank order a set of programs but was shown to have substantial econometric problems given the techniques of the time. Other periodic efforts to implement conjoint related techniques, including ratings of alternatives (for example, Mackenzie, 1993), have not been widely copied in the environmental economics literature owing to the lack of a well-accepted way of exploiting any extra preference information beyond that contained in choice data. In spite of frequent reference to the term 'conjoint' to refer to DCE, Louviere et al. (2010) recommend not using this term. Conjoint measurement as proposed in the psychological literature is substantively different from the random utility theory and Lancasterian attribute framework that underlies DCE. Choice-based conjoint was a term used to try to move marketers accustomed to using ad hoc approaches based on conjoint measurement theory over to using DCE. It would appear to have long outgrown its usefulness even in marketing.
11. The term 'contingent' refers to the estimated values obtained using the data collected being contingent on the features of the survey's scenario, or constructed market as it is sometimes referred to, for providing the good(s), including the specific description of the good(s) and the manner of their provision.
12. Overviews include Louviere et al. (2000), Bennett and Blamey (2001), Hanley et al. (2001), Bateman et al. (2002), Champ et al. (2003), Kanninen (2007) and Hoyos (2010).
13. For some commonly used formats such as ratings questions, it is sometimes possible to throw away 'extra' information on the intensity of preferences and assume that they can be translated into the choices that consumers would make. Likewise, rankings data can typically be translated into choice data.
14. Some work in psychology and behavioral economics implicitly assumes that neoclassical economic theory requires agents to carry around a complete vector of WTP and WTA for all possible situations and that they can effortlessly retrieve those numbers, but this has never been required by neoclassical economic theory.
15. An interesting aside is that the popular Becker et al. (1964) mechanism, and other similar mechanisms that in principle can elicit an incentive compatible matching response with weak restrictions on preferences, cannot be implemented in a survey context. This is because they rely on the agency not being able to exploit the extra information beyond a binary response, which cannot be guaranteed in a survey context.
16. There are, of course, other reasons why IIA violations may occur.
17. In a three-candidate race where the candidate with the largest number of votes wins, the candidate perceived to be in last place tries to convince voters that their chance of winning is higher and more uncertain than thought and that there is no effective difference in the positions of the two leading candidates. Success in this strategy makes voters who truly favor the third place candidate more likely to vote for that candidate.
18. In a marketing context, imagine the difference in demand for the same product in a store with and without a sign that said the product had been made by exploiting workers under terrible conditions.
19. v(p, q, y) is homogeneous of degree zero in p and y, increasing in y, non-increasing in p, and quasi-convex in p.
20. In the Thurstone framework, different variances could be generated if different parts of the stimulus space being explored involved differential ability to discriminate between alternatives.
21. Option value and quasi-option value involve the introduction of uncertainty over the future state of the world and the role of potential learning. They are beyond the scope of this chapter. Carson and Hanemann (2005) and Zhao and Kling (2009) provide detailed discussions.
22. Being a good steward can mean being willing to spend money to keep animals from being killed by human related causes but not natural causes. This conceptually causes no problem from a theoretical perspective, but utility is not defined on the number of animals but rather on the animals and the source of the threat to them.
23. Pure altruism and an impure form known as warm glow (Andreoni, 1989) are sometimes advanced as other motives for environmental values. From the perspective of economic theory, the source of the motivation at the individual level is irrelevant. From the perspective of aggregating welfare measures, interdependence of utility across agents, of which altruism is a special case, can be an issue, but not in most cases considered in environmental valuation. Carson and Hanemann (2005) provide a detailed discussion of issues related to altruism.
24. If different elicitation formats produced statistically similar WTP estimates, the choice of format to use would likely have stayed focused on issues of respondent burden and statistical efficiency.
25. Hicks's four consumer surplus measures, compensating surpluses and variations and equivalent surpluses and variations, might be better seen today as WTP and WTA crossed with price and imposed quantity changes.
26. A good that is larger in scope than another good can have more of one or more desirable attributes. The attribute(s) in question can be quantitative or qualitative.
27. Kahneman and Knetsch (1992) originally used the term 'embedding' in their well-known paper to refer to two distinct phenomena. The first involved sequencing of goods, where standard theory predicts that WTP for the same good should differ depending upon the order in which it is valued. The second involves the relationship between the values of two goods, where one of the goods nests another. Carson and Mitchell (1995) recommend that the term 'embedding' not be used because of ambiguity in its meaning and relationship to theoretical predictions.
28. Other variants of internal scope tests are possible where there are covariates, such as distance to a recreation site, on which a difference in WTP is expected if respondents are paying careful attention to a good's details.
29. It is possible to fail such a test with respect to a particular attribute, which is sometimes referred to as attribute non-attendance (for example, Scarpa et al., 2009).
30. There are other forms of validity and reliability that can be examined. One of these is construct validity, which asks if factors that might reasonably be expected to predict higher WTP do so (Mitchell and Carson, 1989). This is part of most CV studies and the results here strongly suggest that responses to CV questions are not random. Another is to ask about temporal reliability. This has been examined less
often, but major comparisons such as the replication of the Exxon Valdez study (Carson et al., 1997) are encouraging.
31. For examples see Duffield and Patterson (1991), Seip and Strand (1992) and Champ et al. (1997).
32. There are obviously other factors at work in comparing survey results with market behavior, including informational differences and marketing efforts.
33. This is typically reflected in greater price sensitivity in the survey than in actual markets, in the good's ASC, or both. Brownstone and Small (2005) provide a discussion in the context of variable toll road pricing. Louviere and Hensher (2001) suggest the broader attribute range in DCE, including price, may be a factor in decreasing marginal WTP estimates. Hensher (2010) discusses factors that may influence the nature of the divergence.
34. Murphy et al. (2005) also find that there are a small number of very large outliers that can have a dramatic effect on the mean since one is dealing with a ratio. It is not surprising that these studies are prominently featured in claims that CV studies grossly over-estimate.
35. Some studies valued multiple goods or used a different statistical approach to value the same good. Giving equal weight to each study rather than each comparison results in the estimate of the average CV/RP ratio being 0.92 with the confidence interval [0.81, 1.03].
36. This measure was put on the ballot by the state legislature and did not have any organized group who spent substantial amounts supporting or opposing it. Surveys of support for a ballot measure are incentive compatible if respondents believe that there is any type of bandwagon effect, such that other voters are more likely to favor a measure the higher the percentage in favor when the survey results are released to the public.
REFERENCES

Acton, J.P. (1973), 'Evaluating public programs to save lives: the case of heart attacks', Rand Research Report R-73-02, Santa Monica, CA: RAND Corporation.
Adamowicz, W.L., J.J. Louviere and M. Williams (1994), 'Combining revealed and stated preference methods for valuing environmental amenities', Journal of Environmental Economics and Management, 26 (3), 271–92.
Adamowicz, W., P. Boxall, M. Williams and J.J. Louviere (1998), 'Stated preference approaches for measuring passive use values: choice experiments and contingent valuation', American Journal of Agricultural Economics, 80 (1), 64–75.
Alberini, A., B.J. Kanninen and R.T. Carson (1997), 'Modeling response incentives in dichotomous choice contingent valuation data', Land Economics, 73 (3), 309–24.
Andreoni, J. (1989), 'Giving with impure altruism: applications to charity and Ricardian equivalence', Journal of Political Economy, 97 (6), 1447–58.
Arrow, K., R. Solow, P.R. Portney, E.E. Leamer, R. Radner and H. Schuman (1993), 'Report of the NOAA panel on contingent valuation', Federal Register, 58 (10), 4601–14.
Azevedo, C.D., J.A. Herriges and C.L. Kling (2003), 'Combining revealed and stated preferences: consistency tests and their interpretations', American Journal of Agricultural Economics, 85 (3), 525–37.
Bateman, I.J., R.T. Carson, B. Day, W.M. Hanemann, N.D. Hanley, T. Hett, M.W. Jones-Lee, G. Loomes, S. Mourato, E. Özdemiroglu, D.W. Pearce, R. Sugden and J. Swanson (2002), Economic Valuation with Stated Preference Techniques, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Bateman, I.J., M. Cole, P. Cooper, S. Georgiou, D. Hadley and G. Poe (2004), 'On visible choice sets and scope sensitivity', Journal of Environmental Economics and Management, 47 (1), 71–93.
Bateman, I.J., A. Munro, B. Rhodes, C. Starmer and R. Sugden (1997), 'Does part-whole bias exist? An experimental investigation', Economic Journal, 107 (1), 322–32.
Becker, G.M., M.H. DeGroot and J. Marschak (1964), 'Measuring utility by a single-response sequential method', Behavioral Science, 9 (3), 226–32.
Bennett, J. and R. Blamey (2001), The Choice Modelling Approach to Environmental Evaluation, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Bishop, R.C. and T.A. Heberlein (1979), 'Measuring values of extra market goods', American Journal of Agricultural Economics, 61 (5), 926–30.
Bowen, H.R. (1943), 'The interpretation of voting in the allocation of economic resources', Quarterly Journal of Economics, 58 (1), 27–48.
Brownstone, D. and K. Small (2005), 'Valuing time and reliability: assessing the evidence from road pricing demonstrations', Transportation Research A, 39 (4), 279–93.
Brownstone, D., D.S. Bunch and K. Train (2000), 'Joint mixed logit models of stated and revealed preferences for alternative-fuel vehicles', Transportation Research B, 34 (5), 315–38.
Cameron, T.A. (1988), 'A new paradigm for valuing non-market goods using referendum data: maximum likelihood estimation by censored logistic regression', Journal of Environmental Economics and Management, 15 (3), 355–79.
Cameron, T.A. (1992), 'Combining contingent valuation and travel cost data for the valuation of nonmarket goods', Land Economics, 68 (3), 302–17.
Cameron, T.A. and D.D. Huppert (1989), 'OLS versus ML estimation of non-market resource values with payment card interval data', Journal of Environmental Economics and Management, 17 (3), 230–46.
Cameron, T.A. and M.D. James (1987), 'Efficient estimation methods for "closed-ended" contingent valuation surveys', Review of Economics and Statistics, 69 (2), 269–76.
Carson, R.T. (1985), Three Essays on Contingent Valuation, PhD dissertation, University of California, Berkeley.
Carson, R.T. (1997), 'Contingent valuation and tests of insensitivity to scope', in R.J. Kopp, W. Pommerehne and N. Schwartz (eds), Determining the Value of Non-Marketed Goods: Economic, Psychological, and Policy Relevant Aspects of Contingent Valuation Methods, Boston, MA: Kluwer, pp. 127–64.
Carson, R.T. (2011), Contingent Valuation: A Comprehensive Bibliography and History, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Carson, R.T. (2012), 'Contingent valuation: a practical alternative when prices aren't available', Journal of Economic Perspectives, 26 (4), 27–42.
Carson, R.T. and T. Groves (2007), 'Incentive and informational properties of preference questions', Environmental and Resource Economics, 37 (1), 181–210.
Carson, R.T. and T. Groves (2011), 'Incentive and information properties of preference questions: commentary and extensions', in J. Bennett (ed.), International Handbook of Non-Market Environmental Valuation, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Carson, R.T. and W.M. Hanemann (2005), 'Contingent valuation', in K.G. Mäler and J.R. Vincent (eds), Handbook of Environmental Economics, Amsterdam: Elsevier, pp. 821–936.
Carson, R.T. and J. Louviere (2011), 'A common nomenclature for stated preference elicitation approaches', Environmental and Resource Economics, 49 (4), 539–59.
Carson, R.T. and R.C. Mitchell (1993), 'The value of clean water: the public's willingness to pay for boatable, fishable, and swimmable quality water', Water Resources Research, 29 (5), 2445–54.
Carson, R.T. and R.C. Mitchell (1995), 'Sequencing and nesting in contingent valuation surveys', Journal of Environmental Economics and Management, 28 (2), 155–73.
Carson, R.T., N.E. Flores and W.M. Hanemann (1998), 'Sequencing and valuing public goods', Journal of Environmental Economics and Management, 36 (3), 314–23.
Carson, R.T., N.E. Flores and N.F. Meade (2001), 'Contingent valuation: controversies and evidence', Environmental and Resource Economics, 19 (2), 173–210.
Carson, R.T., N.E. Flores and R.C. Mitchell (1999), 'The theory and measurement of passive use value', in I.J. Bateman and K.G. Willis (eds), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method, Oxford: Oxford University Press, pp. 97–130.
Carson, R.T., N.E. Flores, K.M. Martin and J.L. Wright (1996), 'Contingent valuation and revealed preference methodologies: comparing the estimates for quasi-public goods', Land Economics, 72 (1), 80–99.
Carson, R.T., W.M. Hanemann and R.C. Mitchell (1987), 'The use of simulated political markets to value public goods', Discussion Paper 87-7, Department of Economics, University of California, San Diego.
Carson, R.T., W.M. Hanemann and D. Steinberg (1990), 'A discrete choice contingent valuation estimate of the value of Kenai king salmon', Journal of Behavioral Economics, 19 (1), 53–68.
Carson, R.T., W.M. Hanemann, R.J. Kopp, J.A. Krosnick, R.C. Mitchell, S. Presser, P.A. Ruud and V.K. Smith (1994), 'Prospective interim lost use value due to DDT and PCB contamination in the Southern California Bight', report to the National Oceanic and Atmospheric Administration.
Carson, R.T., W.M. Hanemann, R.J. Kopp, J.A. Krosnick, R.C. Mitchell, S. Presser, P.A. Ruud and V.K. Smith (1997), 'Temporal reliability of estimates from contingent valuation', Land Economics, 73 (2), 151–63.
Carson, R.T., W.M. Hanemann, R.J. Kopp, R.C. Mitchell, S. Presser and P.A. Ruud (1992), A Contingent Valuation Study of Lost Passive Use Values Resulting from the Exxon Valdez Oil Spill, Anchorage: State of Alaska.
Carson, R.T., W.M. Hanemann, R.J. Kopp, R.C. Mitchell, S. Presser and P.A. Ruud (2003), 'Contingent valuation and lost passive use: damages from the Exxon Valdez oil spill', Environmental and Resource Economics, 25 (3), 257–86.
Carson, R.T., R.C. Mitchell and P.A. Ruud (1990), 'Valuing air quality improvements: simulating a hedonic equation in the context of a contingent valuation scenario', in C.V. Mathai (ed.), Visibility and Fine Particles, Pittsburgh, PA: Air & Waste Management Association, pp. 639–46.
Carson, R.T., L. Wilks and D. Imber (1994), 'Valuing the preservation of Australia's Kakadu Conservation Zone', Oxford Economic Papers, 46 (5), 727–49.
Champ, P.A., K.J. Boyle and T.C. Brown (eds) (2003), A Primer on Non-Market Valuation, Boston, MA: Kluwer Academic Press.
Champ, P.A. and T.C. Brown (1997), 'A comparison of contingent and actual voting behavior', W-133, Proceedings: Benefits and Cost Transfer in Natural Resource Planning, 10th Interim Report, Rocky Mountain Research Station, USDA Forest Service.
Champ, P.A., R.C. Bishop, T.C. Brown and D.W. McCollum (1997), 'Using donation mechanisms to value nonuse benefits from public goods', Journal of Environmental Economics and Management, 33 (2), 151–62.
Chilton, S.M. and W.G. Hutchinson (1999), 'Some further implications of incorporating the warm glow of giving into welfare measures: a comment on the use of donation mechanisms by Champ et al.', Journal of Environmental Economics and Management, 37 (2), 202–9.
Ciriacy-Wantrup, S.V. (1947), 'Capital returns from soil-conservation practices', Journal of Farm Economics, 29 (4), 1181–96.
Cornes, R. and T. Sandler (1996), The Theory of Externalities, Public Goods and Club Goods, New York: Cambridge University Press.
Corso, P.S., J.K. Hammitt and J.D. Graham (2001), 'Valuing mortality-risk reduction: using visual aids to improve the validity of contingent valuation', Journal of Risk and Uncertainty, 23 (2), 165–84.
Cummings, R.G. and G.W. Harrison (1995), 'The measurement and decomposition of nonuse values: a critical review', Environmental and Resource Economics, 5 (3), 225–47.
Cummings, R.G., D.S. Brookshire and W.D. Schulze (1986), Valuing Environmental Goods: An Assessment of the Contingent Valuation Method, Totowa, NJ: Rowman and Allanheld.
Davis, R.K. (1963), 'The value of outdoor recreation: an economic study of the Maine woods', dissertation, Harvard University.
Deaton, A.S. (1974), 'The analysis of consumer demand in the United Kingdom', Econometrica, 42 (2), 341–67.
Desvousges, W.H., K.J. Boyle, R.W. Dunford, S.P. Hudson, F.R. Johnson and K.N. Wilson (1993), 'Measuring natural resource damages with contingent valuation: tests of validity and reliability', in J.A. Hausman (ed.), Contingent Valuation: A Critical Assessment, Amsterdam: North-Holland, pp. 91–164.
Diamond, P.A. (1996), 'Testing the internal consistency of contingent valuation surveys', Journal of Environmental Economics and Management, 30 (3), 337–47.
Diamond, P.A. and J.A. Hausman (1994), 'Contingent valuation: is some number better than no number?', Journal of Economic Perspectives, 8 (4), 45–64.
Duffield, J.W. and D.A. Patterson (1991), 'Field testing existence values: an instream flow trust fund for Montana rivers', paper presented at the Allied Social Sciences Conference, New Orleans, January.
Farquharson, R. (1969), Theory of Voting, New Haven, CT: Yale University Press.
Flores, N.E. and R.T. Carson (1997), 'The relationship between the income elasticities of demand and willingness to pay', Journal of Environmental Economics and Management, 33 (3), 287–95.
Green, J.R. and J.J. Laffont (1978), 'A sampling approach to the free rider problem', in A. Sandmo (ed.), Essays in Public Economics, Lexington, MA: Lexington Books, pp. 95–114.
Griffiths, C.W., H. Klemick, M. Massey, C. Moore, S. Newbold, D. Simpson, P. Walsh and W. Wheeler (2012), 'U.S. Environmental Protection Agency's valuation of surface water quality improvements', Review of Environmental Economics and Policy, 6 (1), 130–46.
Haab, T.C. and K.E. McConnell (2002), Valuing Environmental and Natural Resources: The Econometrics of Non-Market Valuation, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Hammack, J. and G.M. Brown, Jr (1974), Waterfowl and Wetlands: Toward Bioeconomic Analysis, Baltimore, MD: Johns Hopkins University Press.
Hanemann, W.M. (1984a), 'Discrete/continuous models of consumer demand', Econometrica, 52 (3), 541–61.
Hanemann, W.M. (1984b), 'Welfare evaluations in contingent valuation experiments with discrete responses', American Journal of Agricultural Economics, 66 (3), 332–41.
Hanemann, W.M. (1991), 'Willingness to pay and willingness to accept: how much can they differ?', American Economic Review, 81 (3), 635–47.
Hanemann, W.M. and B.J. Kanninen (1999), 'The statistical analysis of discrete-response CVM data', in I.J. Bateman and K.G. Willis (eds), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU and Developing Countries, Oxford: Oxford University Press, pp. 302–441.
Hanemann, W.M., J.B. Loomis and B.J. Kanninen (1991), 'Statistical efficiency of double-bounded dichotomous choice contingent valuation', American Journal of Agricultural Economics, 73 (4), 1255–63.
Hanley, N.D., S. Mourato and R.E. Wright (2001), 'Choice modelling approaches: a superior alternative for environmental valuation?', Journal of Economic Surveys, 15 (3), 435–62.
Hanley, N.D., R.E. Wright and W.L. Adamowicz (1998), 'Using choice experiments to value the environment – design issues, current experience and future prospects', Environmental and Resource Economics, 11 (3), 413–28.
Hausman, J. (ed.) (1993), Contingent Valuation: A Critical Assessment, Amsterdam: Elsevier.
Hensher, D.A. (2010), 'Hypothetical bias, choice experiments and willingness to pay', Transportation Research Part B, 44 (6), 735–44.
Hensher, D.A. and P. Truong (1985), 'Valuation of travel time savings: a direct experimental approach', Journal of Transport Economics and Policy, 19 (3), 237–61.
Hensher, D.A., J.J. Louviere and J. Swait (1999), 'Combining sources of preference data', Journal of Econometrics, 89 (1–2), 197–221.
Hensher, D.A., J.M. Rose and W.H. Greene (2005), Applied Choice Analysis: A Primer, New York: Cambridge University Press.
Hicks, J.R. (1943), 'The four consumer's surpluses', Review of Economic Studies, 11 (1), 31–41.
Hoehn, J.P. and A. Randall (1987), 'A satisfactory benefit cost indicator from contingent valuation', Journal of Environmental Economics and Management, 14 (3), 226–47.
Hoehn, J.P. and A. Randall (1989), 'Too many proposals pass the benefit cost test', American Economic Review, 79 (3), 544–51.
Horowitz, J.K. and K.E. McConnell (2002), 'A review of WTA/WTP studies', Journal of Environmental Economics and Management, 44 (3), 426–47.
Hoyos, D. (2010), 'The state of the art of environmental valuation with discrete choice experiments', Ecological Economics, 69 (8), 1595–603.
Johnston, R.J. (2006), 'Is hypothetical bias universal? Validating contingent valuation responses using a binding public referendum', Journal of Environmental Economics and Management, 52 (1), 469–81.
Kahneman, D. and J.L. Knetsch (1992), 'Valuing public goods: the purchase of moral satisfaction', Journal of Environmental Economics and Management, 22 (1), 57–70.
Kahneman, D. and A. Tversky (1979), 'Prospect theory: an analysis of decision under risk', Econometrica, 47 (2), 263–91.
Kanninen, B. (ed.) (2007), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Approach to Theory and Practice, Dordrecht: Springer.
Kling, C.L., D.J. Phaneuf and J. Zhao (2012), 'From Exxon to BP: has some number become better than no number?', Journal of Economic Perspectives, 26 (4), 3–26.
Knetsch, J.L. (1990), 'Environmental policy implications of disparities between willingness to pay and compensation demanded measures of values', Journal of Environmental Economics and Management, 18 (3), 227–38.
Knetsch, J.L. and R.K. Davis (1966), 'Comparisons of methods for resource evaluation', in A.V. Kneese and S.C. Smith (eds), Water Research, Baltimore, MD: Johns Hopkins University Press, pp. 125–42.
Knetsch, J.L., R.H. Thaler and D. Kahneman (1990), 'Experimental tests of the endowment effect and the Coase theorem', Journal of Political Economy, 98 (6), 1325–49.
Kochi, I., B. Hubbell and R. Kramer (2006), 'An empirical Bayes approach to combining and comparing estimates of the value of a statistical life for environmental policy analysis', Environmental and Resource Economics, 34 (3), 385–406.
Krutilla, J.V. (1967), 'Conservation reconsidered', American Economic Review, 57 (4), 777–86.
Lancaster, K. (1966), 'A new approach to consumer theory', Journal of Political Economy, 74 (2), 132–57.
Louviere, J.J. (1974), 'Predicting the evaluation of real stimulus objects from an abstract evaluation of their attributes: the case of trout streams', Journal of Applied Psychology, 59 (5), 572–77.
Louviere, J.J. and D.A. Hensher (2001), 'Combining sources of preference data', in D.A. Hensher (ed.), Travel Behaviour Research: The Leading Edge, Oxford: Pergamon, pp. 125–44.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, New York: Cambridge University Press.
Louviere, J.J., T. Flynn and R.T. Carson (2010), 'Discrete choice experiments are not conjoint analysis', Journal of Choice Modelling, 3 (3), 57–72.
Mackenzie, J. (1993), 'A comparison of contingent preference models', American Journal of Agricultural Economics, 75 (3), 593–603.
Madden, P. (1991), 'A generalization of Hicksian substitutes and complements with application to demand rationing', Econometrica, 59 (5), 1497–508.
Manski, C. (1977), 'The structure of random utility models', Theory and Decision, 8 (3), 229–54.
Marley, A.A.J., T.N. Flynn and J.J. Louviere (2008), 'Probabilistic models of set-dependent and attribute-level best–worst choice', Journal of Mathematical Psychology, 52 (5), 281–96.
McConnell, K.E. (1990), 'Models for referendum data: the structure of discrete choice models for contingent valuation', Journal of Environmental Economics and Management, 18 (1), 19–34.
McFadden, D.L. (1974), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
McFadden, D.L. (1994), 'Contingent valuation and social choice', American Journal of Agricultural Economics, 76 (4), 689–708.
Metcalfe, P.J., W. Baker, K. Andrews, G. Atkinson, I.J. Bateman, S. Butler, R.T. Carson, J. East, Y. Gueron, R. Sheldon and K. Train (2012), 'An assessment of the nonmarket benefits of the water framework directive for households in England and Wales', Water Resources Research, 48 (3), W03526, doi: 10.1029/2010WR009592.
Mitchell, R.C. and R.T. Carson (1981), An Experiment in Determining Willingness to Pay for National Water Quality Improvements, report to the US Environmental Protection Agency, Washington, DC: Resources for the Future.
Mitchell, R.C. and R.T. Carson (1989), Using Surveys to Value Public Goods: The Contingent Valuation Method, Baltimore, MD: Resources for the Future.
Murphy, J.J., P.G. Allen, T.H. Stevens and D. Weatherhead (2005), 'A meta-analysis of hypothetical bias in stated preference valuation', Environmental and Resource Economics, 30 (3), 313–25.
Navrud, S. and R.C. Ready (eds) (2002), Valuing Cultural Heritage: Applying Environmental Valuation Techniques to Historic Buildings, Monuments and Artifacts, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Neill, H.R., R. Cummings, P. Ganderton, G.W. Harrison and T. McGuckin (1994), 'Hypothetical surveys and real economic commitments', Land Economics, 70 (2), 145–54.
Rae, D.A. (1982), Benefits of Visual Air Quality in Cincinnati, report to the Electric Power Research Institute, Boston, MA: Charles River Associates.
Randall, A., J.P. Hoehn and G.S. Tolley (1981), 'The structure of contingent markets: some experimental results', paper presented at the American Economic Association meeting, December.
Randall, A., B.C. Ives and C. Eastman (1974), 'Bidding games for the valuation of aesthetic environmental improvements', Journal of Environmental Economics and Management, 1 (2), 132–49.
Rosenberger, R.S. and J.B. Loomis (2000), 'Using meta-analysis for benefit transfer: in-sample convergent validity tests of an outdoor recreation database', Water Resources Research, 36 (4), 1097–107.
Samples, K.C., J.A. Dixon and M.M. Gower (1986), 'Information disclosure and endangered species valuation', Land Economics, 62 (3), 306–12.
Samuelson, P.A. (1954), 'The pure theory of public expenditure', Review of Economics and Statistics, 36 (4), 387–89.
Scarpa, R., T.J. Gilbride, D. Campbell and D.A. Hensher (2009), 'Modelling attribute non-attendance in choice experiments for rural landscape valuation', European Review of Agricultural Economics, 36 (2), 151–74.
Seip, K. and J. Strand (1992), 'Willingness to pay for environmental goods in Norway: a CVM study with real payment', Environmental and Resource Economics, 2 (1), 91–106.
Smith, V.K. and W.H. Desvousges (1986), Measuring Water Quality Benefits, Boston, MA: Kluwer.
Smith, V.K. and L. Osborne (1996), 'Do contingent valuation estimates pass a scope test? A meta analysis', Journal of Environmental Economics and Management, 31 (3), 287–301.
Thayer, M.A. (1981), 'Contingent valuation techniques for assessing environmental impacts: further evidence', Journal of Environmental Economics and Management, 8 (1), 27–44.
Throsby, C.D. (1984), 'The measurement of willingness to pay for mixed goods', Oxford Bulletin of Economics and Statistics, 46 (4), 279–89.
Thurstone, L. (1927), 'A law of comparative judgment', Psychological Review, 34 (4), 273–86.
Train, K. and M. Weeks (2005), 'Discrete choice models in preference space and willingness-to-pay space', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 1–16.
Tversky, A., P. Slovic and D. Kahneman (1990), 'The causes of preference reversals', American Economic Review, 80 (1), 204–17.
US Environmental Protection Agency (1994), President Clinton's Clean Water Initiative: Analysis of Benefits and Costs, EPA 800-R-94-002, Washington, DC: US EPA.
US Water Resources Council (1979), 'Procedures for evaluation of National Economic Development (NED) benefits and costs in water resources planning (level C), final rule', Federal Register, 44 (242), 72892–976.
Veronesi, M., A. Alberini and J.C. Cooper (2011), 'Implications of bid design and willingness-to-pay distribution for starting point bias in double-bounded dichotomous choice contingent valuation surveys', Environmental and Resource Economics, 49 (2), 199–215.
Vossler, C.A. and J. Kerkvliet (2003), 'A criterion validity test of the contingent valuation method: comparing hypothetical and actual voting behavior for a public referendum', Journal of Environmental Economics and Management, 45 (3), 631–49.
Vossler, C.A., J. Kerkvliet, S. Polasky and O. Gainutdinova (2003), 'Externally validating contingent valuation: an open-space survey and referendum in Corvallis, Oregon', Journal of Economic Behavior and Organization, 51 (2), 261–77.
Walsh, R.G., J.B. Loomis and R.A. Gillman (1984), 'Valuing option, existence and bequest demands for wilderness', Land Economics, 60 (1), 14–29.
Walsh, R.G., L.D. Sanders and J.B. Loomis (1985), 'Wild and scenic river economics: recreation use and preservation values', report to the American Wilderness Alliance.
Willig, R. (1976), 'Consumer's surplus without apology', American Economic Review, 66 (4), 589–97.
World Bank, Water Demand Research Team (1993), 'The demand for water in rural areas: determinants and policy implications', World Bank Research Observer, 8 (1), 47–70.
Zhao, J. and C. Kling (2009), 'Welfare measures when agents can learn', Economic Journal, 119 (540), 1560–85.
10 Real choices and hypothetical choices
Glenn W. Harrison
The distinction between real choices and hypothetical choices is one that is either completely ignored or is the focal point of intense inter-disciplinary controversy. In some quarters the terminology distinguishes 'stated preferences' and 'revealed preferences', where the former means preferences revealed by choices when there are no consequences for the decision maker and the latter means preferences revealed when there are consequences for the decision maker. The issues are the same: does it matter if choices are hypothetical or real and, if so, what can be done about it?

There are many variants of 'choice experiments' in use, and the distinction between real and hypothetical choices affects them all. In the context in which the expression is used in this volume, it refers to any situation in which a decision maker is asked to rank or choose from two or more alternatives and where there are several choices to be made in which one or more attributes of the alternatives are varied. In general there are many more attributes than prices that are varied. There appears to be no logical reason to restrict the term 'choice experiments' to hypothetical tasks, although that is common in the areas of environmental valuation and marketing. In fact, the comparison of hypothetical responses and real responses lies at the heart of tests for incentive compatibility, where the expression 'real responses' is then a shorthand for any task for which the choices of the decision maker are related in a salient manner to real outcomes. Choices may also be rewarded in a non-salient manner, such as when someone is paid $10 to complete a survey, irrespective of the responses to the survey.

In many social policy settings, the connection between hypothetical and real choices may be more probabilistic and tenuous than in the crisp experiments that have been the focus of the academic literature. A survey may have some ill-defined 'advisory' role in terms of influencing policy, in some manner that is often maddeningly vague to experimental economists. But there are sometimes good reasons for such ambiguity, such as when it honestly reflects the true state of scientific knowledge or the political and legal process. We know very little about the effects of these types of ill-defined social consequences for incentive compatibility. We therefore focus here on the crisp light of experiments that involve real and transparent consequences, but we also consider how lessons about incentive compatibility drawn from the harsh contrasts of the laboratory can be transferred to more practical settings in which choice studies are applied.

In section 1 we review the concept of incentive compatibility, which is at the heart of the passion that some have for considering real choices and dismissing hypothetical choices. The practical lesson, however, is that incentive compatibility means more than providing real consequences of the choices respondents make. The connection between different choices and different consequences has to make it in the best interests of the respondent to respond truthfully. Further, this connection has to be behaviorally transparent and credible, so that the respondent does not start to second-guess the incentive to respond truthfully.
In sections 2 and 3 the importance of making responses incentive compatible is evaluated. The most directly relevant evidence comes from laboratory experiments, where one can crisply compare environments in which the responses are incentive compatible and those in which they are not. This distinction has typically been examined by just looking at choices made when the consequences are hypothetical or imagined, and comparing them to choices made when the consequences are real. There is systematic evidence of differences in responses across a wide range of elicitation procedures. The evidence is not universal, and there are some elicitation procedures and contexts in which the problem of incentive compatibility does not appear to be so serious. But there is no 'magic bullet' procedure or question format that reliably produces the same results in hypothetical and real settings.

Section 4 changes gear. The evidence from sections 2 and 3 establishes that there is a problem to be solved: one cannot just assume the problem of incentive compatibility away, at least if one wants to cite the literature in a systematic way. But there are several constructive ways in which one can mitigate hypothetical bias, or correct for it. One way is by 'instrument calibration', which is the use of controlled experiments with a particular survey population, scenario and valuation task to identify the best way to ask the question. In effect, this represents the use of experiments to put operationally meaningful teeth in the 'focus group' activities that many choice researchers undertake already, at least for large-scale choice studies used for policy or litigation. The other calibration approach is applied ex post the survey, and uses 'statistical calibration' procedures to try to correct for any biases in responses. Again, experiments are used to complement the survey, in this case to identify possible differences in hypothetical and real choices that might be systematically correlated with observable characteristics. These statistical methods can then be used to correct for biases, and also to better identify the appropriate standard errors to attach to estimates derived from choice studies.

Section 5 discusses a number of open issues that have been ignored in previous work. Section 6 draws some conclusions for the practical application of recognition of the difference between hypothetical and real choices. These conclusions might seem harsh, but the objective is to force hypothetical choice researchers to confess to the potential problem they face, and do something constructive about it. The current practice is simply to quote the literature selectively, which allows the low-level policy applications of the hypothetical choice method to survive casual scrutiny. Higher-level applications are another matter, where the academic, adversarial and policy stakes are substantial enough to force more scrutiny. In those settings the reputation of the hypothetical choice approach, as currently practiced, is frankly appalling. But that could change quickly if the problem of incentive compatibility is addressed.
1 WHAT IS INCENTIVE COMPATIBILITY?
To illustrate the concept of incentive compatibility in relation to choice behavior, we focus initially on voting behavior in referenda. Apart from the popularity of advisory referenda in non-market valuation settings, the context of voting matches the history of thought on these matters. It is then easy to see the implications for choice experiments defined in a non-voting context.
238
Handbook of choice modelling
Specifically, consider the design of voting mechanisms for referenda that are incentive compatible and non-dictatorial.1 In the case of voting mechanisms involving the selection of an alternative among k alternatives, k ≥ 3, it is well known that, in fact, no such voting procedure exists.2 It is, however, easier to devise a voting mechanism involving choice among only two alternatives (k = 2) that is incentive compatible. One such voting mechanism is simple majority rule. Typically, incentive compatibility for this mechanism requires, in addition to the restriction to two alternatives, the assumption that individuals perceive that their utilities are affected by the outcome of the vote. Thus, if the voter thinks that his behavior will have some impact on the chance that one or the other alternative will be implemented, and that his utility will be affected by the outcome, the voter has a positive incentive to behave truthfully and vote honestly.

Recent work on institution design using the revelation principle employs incentive compatibility as a formal constraint. This formulation uses a much stronger assumption, called epsilon truthfulness: 'If the agent is indifferent between lying and telling the truth, assume he tells the truth.'3 It is important that one recognizes epsilon truthfulness for what it is: an assertion or assumption that is regarded by many as excessively strong and that does not enjoy an empirical foundation. The validity of epsilon truthfulness remains an open empirical question.

In the literature concerned with the use of hypothetical choices for valuing environmental goods the epsilon truthfulness assumption is often applied to hypothetical referenda. For example, Mitchell and Carson (1989, p. 151) state that:

We also showed that the discrete-choice referendum model was incentive-compatible in the sense that a person could do no better than vote yes if her WTP [willingness to pay] for a good being valued by this approach was at least as large as the tax price, and to vote no if this was not the case. This finding offers the possibility of framing contingent valuation questions so that they possess theoretically ideal and truthful demand-revelation properties.
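The referendum property just quoted can be stated compactly. In our notation (not Mitchell and Carson's): for a binary referendum that levies a tax price t on everyone if it passes, truthful voting by a respondent i with willingness to pay WTP_i means

\[
\text{vote}_i =
\begin{cases}
\text{yes} & \text{if } WTP_i \ge t, \\
\text{no} & \text{if } WTP_i < t,
\end{cases}
\]

and this is a dominant strategy provided that i believes the vote is consequential, that is, that a higher share of yes votes raises the chance the program is implemented at price t.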
Since one cannot know a priori whether or not subjects in a choice study will feel that their utilities will be affected by the outcome of a hypothetical vote, such assertions of incentive compatibility require that one assume that subjects will behave as they do in real referenda; that is, one invokes a form of the epsilon truthfulness assumption.

The question as to whether or not a hypothetical referendum using majority rule is incentive compatible has become an important policy issue, given its prominence in proposed guidelines for applications of contingent valuation (CV) for estimating environmental damages using stated choice methods. In proposed rules for using the CV method, both the Department of the Interior (DOI, 1994, p. 23102) and the National Oceanic and Atmospheric Administration (NOAA, 1994, p. 1144) assert that, in applications of CV, 'the voting format is incentive compatible. If respondents desire the program at the stated price, they must reveal their preferences and vote for the program'.4 This proposed prescription for public policy is based on an assumption that presupposes acceptance of the hypothesis that a voter's behavior is independent of the use of a real or hypothetical referendum mechanism. This hypothesis, and therefore the credibility of the incentive compatibility assumption for hypothetical referenda, has been empirically tested by Cummings et al. (1997).

Our focus is on one reason for the lack of incentive compatibility of stated choice
Real choices and hypothetical choices 239 experiments: hypothetical bias. This bias is said to occur whenever there is a difference between the choices made when the subjects face real consequences from their actions compared with the choices made where they face no real consequences from their actions. However, in many settings of interest to stated choice researchers in environmental economics who deal with public goods, there may be another source deriving from the propensity to free ride on the provision of others. The propensity to free ride5 has been shown to be alive and well in the laboratory, as the survey by Ledyard (1995), documents. Harrison and Hirshleifer (1989) also show that it varies theoretically and behaviorally with the nature of the production process used to aggregate private contributions into a public good, such as one finds with threshold effects in many public goods (for example, health effects of pollutants, species extinction). It is difficult to say a priori if the free-riding bias is greater than the hypothetical bias problem. In fact, there is a dearth of studies of the interaction of the two biases. To answer the question posed at the outset, incentive compatibility will be measured in terms of differences in responses between hypothetical and real environments, and where the real environment has been designed to encourage truthful responses. A ‘real environment’ is one in which the respondent bears some real consequences from making one response or the other. This will normally mean that the scenario is not imaginary, but it is the consequence that is the behavioral trace that we use to identify deviations from incentive compatibility. Knowledge that the respondent will answer truthfully normally comes from a priori reasoning about rational responses to known incentives. But we will also want to be cognizant of the need to ensure that the respondent sees what is a priori obvious to the (academic) analyst.6 For example, we prefer mechanisms for which it is a dominant strategy to tell the truth, where this can be explained to the respondent in a non-technical manner, and where the verification of this fact is a simple matter for the subject. Sometimes we cannot have this ideal behavioral environment. Rational responses may be truthful only in some strategic Nash equilibrium, so the respondent has to make some guess as to the rationality of other players. Or the respondent might not understand the simple explanation given or suspect the surveyor of deception, in which case ‘all bets are off’ when it comes to claims of incentive compatibility.
2 EVIDENCE OF HYPOTHETICAL BIAS FROM STYLIZED CHOICE TASKS
We begin the review of previous evidence by considering the simple cases in which one elicits choices over two alternatives, or where the only attribute that is varied is the cost of the alternative. If we cannot say whether choices are incentive compatible in these settings, we had better give up trying to do so in the more complex settings in which there are more than two alternatives varying in terms of some non-monetary dimension. We simplify things even further by considering elicitation over a private good, for which it is easy to exclude non-purchasers. A dichotomous choice (DC) elicitation in this setting is just a ‘take it or leave it’ offer, much like the posted-offer institution studied by experimental economists for many years. The difference is that the experimenter presents the subjects with a price,
and the subject responds 'yes' or 'no' according to whether she is willing to pay that amount. The subject gets the commodity if and only if she says 'yes' and then parts with her money. The consequences of a 'yes' response are real, and not imagined. Incentive compatibility is apparent, at least in the usual partial-equilibrium settings in which such things are discussed.7

Cummings et al. (1995b) designed some of the simplest experiments that have probably ever been run, just to expose the emptiness of the claims of those who would simply assert that hypothetical responses are the same as real responses in a DC setting. Subjects were randomly assigned to one of two rooms, the only difference being the use of hypothetical or real language in the instructions. An electric juicer was displayed, and passed around the room with the price tag removed or blacked out. The display box for the juicer had some informative blurb about the product, as well as pictures of it 'in action'. Subjects were asked to say whether or not they would be willing to pay some stated amount for the good. The hypothetical subjects responded much more positively than the real subjects. Since the private sources funding these experiments did not believe that 'students were real people', the subjects were non-student adults drawn from church groups. The same qualitative results were obtained with students, with the same commodity and with different commodities. Comparable results have been obtained in a willingness to accept setting by Nape et al. (2003).

In response to the experimental results of Cummings et al. (1995b), some proponents of hypothetical surveys argued that their claims for the incentive compatibility of the DC approach actually pertained to simple majority rule settings in which there was a referendum over just two social choices. Somehow that setting provides the context that subjects need to spot the incentive compatibility, or so it was argued. Again, it is apparent that this context is incentive compatible if subjects face real consequences. Cummings et al. (1997) therefore undertook simple majority rule experiments for an actual public good. After earning some income, in addition to their show-up fee, subjects were asked to vote on a proposition that would have each of them contribute a specified amount towards this public good. If the majority said 'yes', all had to pay. The key treatments were again the use of hypothetical or real payments, and again there was significant evidence of hypothetical bias.
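To see mechanically how a gap of this kind shows up in DC data, here is a minimal simulation sketch; the lognormal WTP distribution, the posted price of $8 and the assumed 1.5x inflation of stated WTP are our illustrative assumptions, not estimates from the studies above.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    price = 8.0

    # Latent 'true' WTP for a private good (illustrative lognormal).
    true_wtp = rng.lognormal(mean=1.5, sigma=0.8, size=n)

    # Assumed hypothetical bias: stated WTP is inflated by a constant factor.
    stated_wtp = 1.5 * true_wtp

    real_yes = (true_wtp >= price).mean()
    hypothetical_yes = (stated_wtp >= price).mean()
    print(f"real 'yes' rate: {real_yes:.2f}; hypothetical 'yes' rate: {hypothetical_yes:.2f}")

The point of the sketch is only that a shift in the latent WTP distribution maps directly into a higher 'yes' rate at any posted price, which is the pattern these experiments detect.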
3 EVIDENCE OF HYPOTHETICAL BIAS FROM CHOICE EXPERIMENTS
We now reconsider more closely the evidence for hypothetical bias from several studies that are closer to the choice modeling environment considered in this volume. In each case we evaluate the raw data using comparable statistical methods. Overall, the evidence is that hypothetical bias exists and needs to be worried about: hypothetical choices are not reliably incentive compatible. But there is a glimmer or two of good news, and certain settings in which the extent of hypothetical bias might be minimal. The task is to try to understand this variation in the behavioral extent of the bias, not just document it. Only by understanding it can one design stated choice studies that mitigate it reliably.
3.1 Multiple Price Lists
A direct extension of the DC choice task is to implicitly offer the subject three choices: buy the good at one stated price, buy the good at another stated price, or keep your money. In this case, known in the experimental literature as a multiple price list (MPL) auction, the subject is actually asked to make two choices: say 'yes' or 'no' to whether the good would be purchased at the first price, and make a similar choice at the second price. The subject can effectively make a third choice by saying 'no' to both of these two initial choices. The MPL can be made incentive compatible by telling the subject that one of the choices will be picked at random for implementation, as sketched below. The MPL design has been demonstrated to exhibit hypothetical bias in the elicitation of risk attitudes by Holt and Laury (2002, 2005) and Harrison (2005), and in the elicitation of individual discount rates by Coller and Williams (1999).
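A minimal sketch, in Python, of the randomly implemented MPL just described; the two prices, the WTP value and the function name run_mpl are illustrative assumptions, not part of any of the cited designs.

    import random

    def run_mpl(true_wtp, prices=(4.0, 7.0), real=True):
        # One row per price: the subject says 'yes' iff her WTP covers that price.
        # Truthful row-by-row answers are assumed; the mechanism is meant to induce them.
        answers = [true_wtp >= p for p in prices]
        if real:
            # One row is drawn at random and played out for real money. Because the
            # subject cannot influence which row counts, answering every row
            # truthfully is a dominant strategy.
            row = random.randrange(len(prices))
            payment = prices[row] if answers[row] else 0.0
            return answers, row, payment
        # Hypothetical treatment: same questions, no consequences.
        return answers, None, 0.0

    print(run_mpl(5.5))  # for example: ([True, False], 0, 4.0)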
3.2 Conjoint Choice Experiments

Conjoint choice tasks involve several choices being posed to subjects, in the spirit of the revealed preference logic. Each choice involves the subject reporting a preference over two or more bundles, where a bundle is defined by a set of characteristics of one or more commodities. The simplest example would be where the commodity is the same in all bundles, but price is the only characteristic varied. This special case is just the MPL discussed above, in which the subject may be constrained to just pick one of the prices (if any). The most popular variant is where price and non-price characteristics are allowed to vary across the choices. For example, one bundle might be a lower-quality version of the good at some lower price, one bundle might be a higher-quality version at a higher price, and one bundle is the status quo in which nothing is purchased. The subject might be asked to pick one of these three bundles in one choice task (or to provide a ranking). Typically there are several such choices. To continue the example, the qualities might be varied and/or the prices on offer varied. By asking the subject to make a series of such choices, and picking one at random for playing out,8 the subject's preferences over the characteristics can be 'captured' in the familiar revealed preference manner. Since each choice reflects the preferences of the subject, if one is selected for implementation independently9 of the subject's responses, the method is obviously incentive compatible.10 Furthermore, the incentive to reveal true preferences is relatively transparent.

This set of variants goes by far too many names in the literature. The expression 'choice experiments' is popular, but too generic to be accurate. A reference to 'conjoint analysis' helps differentiate the method, but at the cost of semantic opacity. In the end, the expression 'revealed preference methods' serves to describe these methods well, and connect them to a long and honorable tradition in economics since Samuelson (1938), Afriat (1967) and Varian (1982, 1983). Several studies examine hypothetical bias in this revealed preference elicitation method, at least as it is applied to valuation and ranking.

Allocating money to environmental projects
Carlsson and Martinsson (2001) allow subjects to allocate real money to two environmental projects, varying three characteristics: the amount of money the subject personally receives, the amount of money donated to an environmental project by the researchers, and the specific World Wildlife Fund project that the donation should go to. They conclude that the real and hypothetical responses are statistically indistinguishable, using statistical models commonly used in this literature.

However, several problems with their experiment make it hard to draw reliable inferences. First, and most seriously, the real treatments were all in-sample: each subject gave a series of hypothetical responses, and then gave real responses. There are obvious ways to test for order effects in such designs, as used by Cummings et al. (1995b) for example, but they are an obvious confound here. Second, the subjects were allocating 'house money' with respect to the donation, rather than their own. This made it hard to implement a status quo decision, since it would have been dominated by the donation options if the subject had even the slightest value for the environmental project. On the other hand, there is a concern that these are all artificial, forced decisions that might not reflect how subjects allocate monies according to their true preferences (unless one makes strong separability assumptions). Third, all three environmental projects were administered by the same organization, which leads the subject to view them as perfect substitutes. This perception is enhanced by a (rational) belief that the organization was free to reallocate untied funds residually, such that there is no net effect on the specific project. Thus the subjects may well have rationally been indifferent over this characteristic.11

Valuing beef
Lusk and Schroeder (2004) conduct a careful test of hypothetical bias for the valuation of beef using revealed preference methods. They consider five different types of steak, and vary the relative prices of each steak type over 17 choices. For the subjects facing a real task, one of the 17 choices was to be selected at random for implementation. Subjects also considered a 'none of these' option that allowed them not to purchase any steak. Each steak type was a 12 oz steak, and subjects were told that the baseline steak, a 'generic steak' with no label, had a market price of $6.07 at a local supermarket. Each subject received a $40 endowment at the outset of the experiment, making payment feasible for those in the real treatment. Applying the statistical methods commonly used to analyze these data, they find significant differences between hypothetical and real responses. Specifically, they find that the marginal values of the attributes are identical between the hypothetical and real treatments, but that the propensity to purchase, attributes held constant, is higher in the hypothetical case.

More experimental tests of the revealed preference approach are likely. I conjecture that the experimental and statistical treatment of the 'no buy' option will be critical to the evaluation of this approach. It is plausible that hypothetical bias will manifest itself in the 'buy something' versus 'buy nothing' stage in decision-making, and not so much in the 'buy this' or 'buy that' stage that conditionally follows.12 Indeed, this hypothesis has been one of the implicit attractions of the method. The idea is that one can then focus on the second stage to ascertain the value placed on characteristics. But this promise may be illusory if one of the characteristics varied is price and separability in decisions is not appropriate.
In this case the latent utility specification implies that changes in price spill over from the ‘buy this or buy that’ nest of the utility function and influence the ‘buy or no-buy’ decision.
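One hedged way to put that spillover in symbols is a generic two-level latent utility structure of the nested logit form, which we introduce for illustration rather than as Lusk and Schroeder's specification:

\[
I = \ln \sum_{j \in \text{buy}} \exp\left(x_j'\beta - \alpha p_j\right), \qquad
\Pr(\text{buy}) = \frac{\exp(\lambda I)}{\exp(\lambda I) + \exp(\gamma)},
\]

where the \(x_j\) are the non-price attributes, the \(p_j\) the prices, \(\gamma\) the utility of the no-buy option and \(I\) the inclusive value of the 'buy this or buy that' nest. Any change in the \(p_j\) moves \(I\), and hence the buy/no-buy probability whenever \(\lambda > 0\), unless separability is imposed.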
Ranking mortality risks
Harrison and Rutström (2006a) report the results of a conjoint choice ranking experiment in which there was a marked lack of hypothetical bias. Their task involved subjects ranking the 12 major causes of death in the USA. The task was broken down for each subject according to broad age groups. Thus a subject aged 25 was asked to state 12 rankings for deaths in the age group 15 to 24, 12 more rankings for deaths in the age group 25 to 44, 12 more rankings for the age group 45 to 64 and, finally, 12 rankings for those 65 and over. In the real rewards treatment the subject was simply paid $1 for every correct ranking. Thus the subject could earn up to $48 in the session. The hypothetical versions of the survey instrument replaced the text in the original versions which described the salient reward for accuracy. The replacement text was very simple:

You will be paid $10 for your time. We would like you to try to rank these as accurately as you can, compared to the official tabulations put out by the U.S. Department of Health. When you have finished please check that all cells in the table below are filled in.
The experiment was otherwise administered identically to the others with salient rewards, using a between-subjects design. There were 95 subjects in the hypothetical rewards experiments13 and 45 subjects in the salient rewards experiments.

The rank errors for the hypothetical (H) sessions are virtually identical to those in the real (R) sessions. The average rank error in the H sessions is 2.15, compared to 2.00 in the R sessions. Moreover, the standard deviation in the H sessions is 1.95, which is also close to the 1.90 for the R sessions. Although there has been some evidence to suggest that average H responses might be the same as R responses in some settings, it is common to see a significantly higher variance in H responses, as noted earlier. A regression analysis with appropriate controls confirms the conclusion from the raw descriptive statistics.
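A sketch of the kind of raw comparison just reported, run on simulated data with the published moments rather than the study's raw data; the normal distributions and the Welch t-test are our illustrative choices.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Simulated rank errors matching the reported moments (H: 2.15/1.95, R: 2.00/1.90).
    h = rng.normal(2.15, 1.95, size=95)   # hypothetical sessions
    r = rng.normal(2.00, 1.90, size=45)   # real (salient) sessions

    t, p = stats.ttest_ind(h, r, equal_var=False)  # Welch's unequal-variance t-test
    print(f"H mean {h.mean():.2f}, R mean {r.mean():.2f}, t = {t:.2f}, p = {p:.2f}")

With moments this close and samples of this size, such a test unsurprisingly fails to reject equality, which is the sense in which the H and R responses are 'statistically identical'.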
This conclusion from the hypothetical survey variant is a surprise, given the extensive literature on the extent of hypothetical bias: the responses obtained in this hypothetical setting are statistically identical to those found in a real setting. The hypothetical setting implemented here should perhaps be better referred to as a non-salient experiment. Subjects were rewarded for participating, with a fixed show-up fee of $10. The hypothetical surveys popular in the field rarely reward subjects for participating, although this has occurred in some cases. There could be a difference between our non-salient experiment and 'truly hypothetical' experiments.

One feature of the vast literature on hypothetical bias is that it deals almost exclusively with valuation tasks and binary choice tasks, rather than ranking tasks.14 The experimental task of Harrison and Rutström (2006a) is a ranking task. It is possible that the evidence on hypothetical bias in valuation settings simply does not apply so readily to ranking tasks. This conjecture is worth expanding on, since it suggests some important directions for further research.

One account of hypothetical bias that is consistent with these data runs as follows. Assume that subjects come into an experimental task and initially form some beliefs as to the 'range of feasible responses', and that they then use some heuristic to 'narrow down' a more precise response within that range. It is plausible that hypothetical bias could affect the first step, but not be so important for the second step. If that were the case, then a task that constrains the range of feasible responses, such as our ranking task that restricts the subjects to choose ranks between 1 and 12, might not suffer from hypothetical bias. On the other hand, a valuation task might plausibly elicit extreme responses in a hypothetical setting, as subjects note that they could just as easily say that they would pay nothing as say that they would pay $1 million. In this setting there is no natural constraint, such as comparing to one's budget, to constrain feasible responses. Hence the second stage of the posited decision process would be applied to different feasible ranges, and even if the second stage were roughly the same for hypothetical and real tasks, if the first stage were sufficiently different then the final response could be very different. This is speculation, of course. The experiment considered here does not provide any evidence for this specific thought process, but it does serve to rationalize the results.
4 MITIGATING HYPOTHETICAL BIAS
There are two broad ways in which one can try to mitigate hypothetical bias: by means of instrument calibration before the survey (trying out different 'wordings' to generate less biased hypothetical responses) or by means of statistical calibration after the survey (estimating hypothetical bias functions that can be used to then correct for that bias). Harrison (2006b) surveys these two calibration methods in greater detail.

4.1 Instrument Calibration
The idea of instrument calibration has already generated two important innovations in the way in which hypothetical questions have been posed: recognition of some uncertainty in the subject's understanding of what a 'hypothetical yes' means (Blumenschein et al., 1998, 2001), and the role of 'cheap talk' scripts directly encouraging subjects to avoid hypothetical bias (Cummings et al., 1995a; Cummings and Taylor, 1998; List, 2001; Aadland and Caplan, 2003; Brown et al., 2003). The evidence for these procedures is mixed. Allowing for some uncertainty can allow one to adjust hypothetical responses to better match real responses, but presumes that one knows ex ante what threshold of uncertainty is appropriate to apply. Simply showing that there exists a threshold that can make the hypothetical responses match the real responses, once you look at the hypothetical and real responses, is not particularly useful unless that threshold provides some out-of-sample predictive power. Similarly, the effects of 'cheap talk' appear to be context-specific, which simply means that one has to test its effect in each context rather than assume it works in all contexts.
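A minimal sketch of the certainty-threshold adjustment just described, in the spirit of the Blumenschein et al. innovation; the ten-point certainty scale, the threshold of 8 and the function name recode_yes are illustrative assumptions, and the caveat in the text applies: any such threshold must be validated out of sample.

    def recode_yes(responses, threshold=8):
        # Treat a hypothetical 'yes' as a real 'yes' only when the stated
        # certainty (on a 1-10 scale) meets the threshold.
        return [answer == "yes" and certainty >= threshold
                for answer, certainty in responses]

    survey = [("yes", 9), ("yes", 5), ("no", 10), ("yes", 8)]
    print(recode_yes(survey))  # [True, False, False, True]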
4.2 Statistical Calibration

The essential idea underlying the statistical calibration approach, developed by Blackburn et al. (1994), is that the hypothetical survey provides an informative, but statistically biased, indicator of the subject's true willingness to pay for the environmental good. The trick is how to estimate and apply such bias functions. They propose doing so with the complementary use of field elicitation procedures that use hypothetical surveys, laboratory elicitation procedures that use hypothetical and non-hypothetical surveys, and laboratory elicitation procedures that use incentive-compatible institutions.15

The upshot of the statistical calibration approach is a simple comparison of the original responses to the hypothetical survey and a set of calibrated responses that the same subjects would have made if asked to make a real economic commitment in the context of an incentive-compatible procedure. This approach does not predetermine the conclusion that the hypothetical survey is 'wrong'. If the hypothetical survey is actually eliciting what its proponents say that it is, then the calibration procedure should say so. In this sense, calibration can be seen as a way of validating 'good hypothetical surveys' and correcting for the biases of 'bad hypothetical surveys'.16

The statistical calibration approach can do more than simply point out the possible bias of a hypothetical choice survey. It can also evaluate the confidence with which one can infer statistics such as the population mean from a given survey. In other words, a decision maker is often interested in the bounds for a damage assessment that fall within prescribed confidence intervals. Existing hypothetical surveys often convey a false sense of accuracy in this respect. A calibration approach might indicate that the population mean inferred from a hypothetical survey is reliable in the sense of being unbiased, but that the standard deviation is much larger than the hypothetical survey would directly suggest. This type of extra information can be valuable to a risk-averse decision maker.

The analogy of a watch that is always 10 minutes slow helps to introduce the idea of a statistical bias function for hypothetical surveys. The point of the analogy is that hypothetical responses can still be informative about real responses if the bias between the two is systematic and predictable. The watch that is always 10 minutes slow can be informative, but only if the error is known to the decision maker and if it is transferable to other instances (that is, the watch does not slow further over time). Blackburn et al. (1994) define a 'known bias function' as one that is a systematic statistical function of the socio-economic characteristics of the sample. If this bias is not mere noise then one can say that it is 'knowable' to a decision maker. They then test if the bias function is transferable to a distinct sample valuing a distinct good, and conclude that it is. In other words, they show that one can use the bias function estimated from one instance to calibrate the hypothetical responses in another instance, and that the calibrated hypothetical responses statistically match those observed in a paired real elicitation procedure. Johannesson et al. (1999) extend this analysis to consider responses in which subjects report the confidence with which they would hypothetically purchase the good at the stated price, and find that information on that confidence is a valuable predictor of hypothetical bias.
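A hedged sketch of the two-step logic just described: estimate a bias function on a paired hypothetical/real sample, then transfer it to a fresh hypothetical-only sample. The linear bias specification, the covariates and all numbers are illustrative assumptions, not Blackburn et al.'s estimates.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200

    # Paired calibration sample: covariates (constant, income, age, standardized),
    # with real and hypothetical WTP simulated for illustration only.
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    real = rng.lognormal(1.0, 0.5, size=n)
    true_bias = 0.5 + 0.2 * X[:, 1]                     # assumed systematic bias
    hypo = real + true_bias + rng.normal(0.0, 0.1, n)   # stated responses

    # Step 1: estimate the bias function by regressing (hypo - real) on covariates.
    beta, *_ = np.linalg.lstsq(X, hypo - real, rcond=None)

    # Step 2: transfer it, subtracting the predicted bias from a new
    # hypothetical-only sample with its own covariates.
    X_new = np.column_stack([np.ones(5), rng.normal(size=5), rng.normal(size=5)])
    hypo_new = rng.lognormal(1.3, 0.5, size=5)
    calibrated = hypo_new - X_new @ beta
    print(np.round(calibrated, 2))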
Pooling responses from different mechanisms
Building on long-standing approaches in marketing, a different statistical calibration tradition seeks to recover similarities and differences in preferences from data drawn from various institutions. The original objective was 'data enrichment', which is a useful way to view the goal of complementing data from one source with information from another source.17 Indeed, the exercise was always preceded by a careful examination
of precisely what one could learn from one data source that could not be learned from another, and those insights were often built into the design. For example, attribute effects tend to be positively correlated in real life: the good fishing holes have many of the positive attributes fishermen want. This makes it hard to tease apart the effects of different attributes, which may be important for policy evaluation. Adroit combination of survey methods can mitigate such problems, as illustrated by Adamowicz et al. (1994). Relatively few applications of this method have employed laboratory data, such that there is at least one data generating mechanism with known incentive compatibility. One exception is Cameron et al. (2002). They implement six different hypothetical surveys, and one actual DC survey. All but one of the hypothetical surveys considered the same environmental good as the actual DC survey; the final hypothetical survey used a ‘conjoint analysis’ approach to identify attributes of the good. Their statistical goal was to see if they could recover the same preferences from each data generation mechanism, with allowances for statistical differences necessitated by the nature of the separate responses (for example, some were binary, and some were open-ended). They develop a mixture model, in which each data generation mechanism contributes to the overall likelihood function defined over the latent valuation. Although they conclude that they were generally able to recover the same preferences from most of the elicitation methods, their results depend strikingly on the assumed functional forms.18 Their actual DC response was only at one price, so the corresponding latent WTP function can only be identified if one is prepared to extrapolate from the hypothetical responses. The upshot is a WTP function for the actual response that has a huge standard error, making it hard to reject the null that it is the ‘same’ as the other WTP functions. The problems are clear when one recognizes that the only direct information obtained is that just 27 percent of the sample would purchase the environmental good at $6 when asked for real, whereas 45 percent would purchase the good when asked hypothetically.19 The only information linking the latent WTP functions is the reported income of respondents, along with a raft of assumptions about functional form. A popular approach to combining data from different sources has been proposed in the stated choice literature: see Louviere et al. (2000, chs 8 and 13) for a review. One concern with this approach is that it relies on differences in an unidentified ‘scale parameter’ to implement the calibration. Consider the standard probit model of binary choice, to illustrate. One common interpretation of this model is that it reflects a latent and random utility process in which the individual has some cardinal number for each alternative that can be used to rank alternatives. This latent process is assumed to be composed of a deterministic core and an idiosyncratic error. The ‘error story’ varies from literature to literature,20 but if one further assumes that it is normally distributed with zero mean and unit variance then one obtains the standard probit specification in which the likelihood contribution of each binary choice observation is the cumulative distribution function of a standard normal random variable evaluated at the deterministic component of the latent process. 
Rescaling the assumed variance only scales the estimated coefficients up or down, since the contribution to the likelihood function depends only on the cumulative distribution below the deterministic component. In the logit specification a comparable normalization is used, in which the variance is set to $\pi^2/3$. Most of the 'data enrichment' literature in marketing assumes that the two data sources have the same deterministic component, but allows the scale parameter to vary. This has nothing to say about calibration, as conceived here.
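A small simulation may help fix ideas about this normalization. The sketch below (illustrative values, assuming the statsmodels package is available) shows that doubling the latent error scale simply halves the estimated probit coefficients, since only the ratio of coefficients to scale is identified:

```python
# A minimal sketch of the scale normalization: only beta/sigma is
# identified in a probit, so rescaling the latent error rescales the
# estimated coefficients one-for-one.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
X = sm.add_constant(x)
beta = np.array([0.5, 1.0])            # true intercept and slope

for sigma in (1.0, 2.0):               # latent error scale
    y_star = X @ beta + sigma * rng.normal(size=n)
    y = (y_star > 0).astype(int)
    fit = sm.Probit(y, X).fit(disp=0)
    print(sigma, fit.params)           # estimates approximate beta / sigma
```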
But an extension of this approach does consider the problem of testing if the deterministic components of the two data sources differ, and this nominally has more to do with calibration. The methods employed here were first proposed by Swait and Louviere (1993), and are discussed in Louviere et al. (2000, s. 8.4). They entail estimation of a model based solely on hypothetical responses, and then a separate estimation based solely on real responses. In each case the coefficients on the explanatory variables (for example, sex and age) conditioning the latent process are allowed to differ, including the intercept on the latent process. Then they propose estimation of a 'pooled' model in which there is a dummy variable for the data source. Implicitly the pooled model assumes that the coefficients on the explanatory variables other than the intercept are the same for the two data sources.21 The intercepts implicitly differ, if one thinks of there being one latent process for the hypothetical data and one latent process for the real data. Since the data are pooled, the same implicit normalization of variance is applied to the two data sources. Thus one effectively constrains the variance normalizations to be the same, but allows the intercept to vary according to the data source. The hypothesis of interest is then tested by means of an appropriate comparison of likelihood values (a sketch of this test is given below).

In effect, this procedure can test if hypothetical and real responses are affected by covariates in the same manner, but not if they differ conditionally on the covariates. Thus if respondents have the same propensity to purchase a good at some price, this method can identify that. But if men and women each have the same elevated propensity to 'purchase' when the task is hypothetical, this method will not identify that.22 And the overall likelihood tests will indicate that the data can be pooled, since the method allows the intercepts to differ across the two data sources. Hence claims in Louviere et al. (2000, ch. 13) of widespread 'preference regularity' across disparate data sources and elicitation methods should not be used as the basis for dismissing the need to calibrate hypothetical and real responses.23

On the other hand, the tests of preference regularity from the marketing literature are capable of being applied more generally than the methods of pooling preferences from different sources. The specifications considered by Louviere et al. (2000, pp. 233–6) clearly admit the possibility of marginal valuations differing across hypothetical and real settings.24 In fact, it is possible to undertake tests where some coefficients are the same while others are different, as illustrated by Louviere et al. (2000, s. 8.4.2). This is a clear analogue to some parameters in a real/hypothetical experiment being similar (for example, some marginal effects) but others being quite different (for example, purchase intention), as illustrated by Lusk and Schroeder (2004). The appropriate pooling procedures then allow some coefficients to be estimated jointly while others are estimated separately, although there is an obvious concern with such specification tests leading to reported standard errors that understate the uncertainty over model specification.
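The following sketch implements the pooled test just described for a binary logit: separate models for each data source, then a pooled model with common slopes and a source dummy shifting the intercept, compared by a likelihood ratio test. The data are synthetic and the names illustrative.

```python
# Hedged sketch of a Swait-Louviere-style pooling test: intercepts are
# free to differ across data sources, slope equality is what the LR tests.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)

def simulate(n, intercept):
    x = rng.normal(size=(n, 2))          # e.g. price and age
    u = intercept + x @ np.array([-1.0, 0.5]) + rng.logistic(size=n)
    return (u > 0).astype(int), x

y_h, x_h = simulate(2000, 1.0)           # hypothetical: inflated intercept
y_r, x_r = simulate(2000, 0.2)           # real

ll = {}
for name, (y, x) in {"hyp": (y_h, x_h), "real": (y_r, x_r)}.items():
    ll[name] = sm.Logit(y, sm.add_constant(x)).fit(disp=0).llf

# Pooled model: common slopes, dummy shifting the intercept by data source
y = np.concatenate([y_h, y_r])
d = np.concatenate([np.ones(len(y_h)), np.zeros(len(y_r))])  # 1 = hypothetical
X = np.column_stack([np.ones_like(d), d, np.vstack([x_h, x_r])])
ll["pooled"] = sm.Logit(y, X).fit(disp=0).llf

# LR test of equal slopes across sources (intercepts already differ)
lr = 2 * (ll["hyp"] + ll["real"] - ll["pooled"])
print("LR =", lr, "p =", chi2.sf(lr, df=2))   # df = 2 constrained slopes
```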
Calibrating responses within sample
Fox et al. (1994) and List and Shogren (1998, 2002) propose a method of calibration which uses hypothetical and real responses from the same subjects for the same good.25 But if one is able to elicit values in a non-hypothetical manner, then why bother eliciting hypothetical responses that one has to calibrate in the first place? The answer is that the relative cost of collecting data may be very different in some settings. It is possible in marketing settings to construct a limited number of 'mock ups' of the potential product
to be taken to market, but these are often expensive to build due to the lack of scale economies. Similarly, one could imagine in the environmental policy setting that one could actually implement policies on a small scale at some reasonable expense, but that it is prohibitive to do so more widely without some sense of aggregate WTP for the wider project. The local implementation could then be used as the basis for ascertaining how one must adjust hypothetical responses for the wider implementation. These considerations aside, the remaining substantive challenge for calibration is to demonstrate feasibility and utility for the situation of most interest in stated choice valuation, when the underlying target good or project is non-deliverable and one must by definition consider cross-commodity calibration. Again, the work that needs to be done is to better understand when statistical calibration works and why, not just to document an occasional 'success here' or 'failure there'. The literature is replete with selective citations to studies that support one position or another; the greater challenge is to explain this disparity in terms of operationally meaningful hypotheses.
5 OPEN ISSUES

5.1 Advisory Referenda and Realism
One feature of hypothetical choice surveys in the field is not well captured by most experiments: the chance that the subject's hypothetical response might influence policy or the level of damages in a lawsuit. To the extent that we are dealing with a subjective belief, such things are intrinsically difficult to control perfectly. In some field surveys, however, there is a deliberate use of explicit language which invites the subject to view their responses as having some chance of affecting real decisions. If one accepts that field surveys are successful in encouraging some subjects to take the survey for real in a subjectively probabilistic sense, then the natural question to ask is 'how realistic does the survey have to be, in the eyes of respondents, before they respond as if it were actually real?' In other words, if one can encourage respondents to think that there is some chance that their responses will have an impact, at what point do the subjects behave the way they do in a completely real survey? Obviously this question is well posed, since we know by construction that they must do so when the chance of the survey being real is 100 percent. The interesting empirical question, which we examine, is whether any smaller chance of the survey being real will suffice. This question takes on some significance if one can show that the subject will respond realistically even when the chance of the payment and provision being real is small. Harrison (2006a) reviews evidence to show that just making surveys 'realistic' is not the panacea for hypothetical bias that one might hope.

5.2 Salient Rewards

Experimental economics differentiates between non-salient rewards and salient rewards. The former refer to rewards that do not vary with performance in the task; for example, an initial endowment of cash, or perhaps the show-up fee.26 The latter refer to rewards that vary with performance in the task. In parallel to the distinction between fixed and
variable costs, these might be called fixed rewards and variable rewards. The hypothetical setting for virtually all of the experiments considered here should better be referred to as an experiment with non-salient rewards, since subjects were typically rewarded for participating. The hypothetical surveys popular in the field rarely reward subjects for participating, although it has occurred in some cases. There could be a difference between the non-salient experiments which are called 'hypothetical' and 'truly hypothetical' experiments in which there are no rewards (salient or non-salient). More systematic variation in the non-salient rewards provided in CV studies would allow examination of these effects.27
5.3 A Common Defense
One common defense for ignoring hypothetical bias is the claim that an influential survey by Camerer and Hogarth (1999) concludes that there is no evidence of hypothetical bias in simple risky lottery choices. What Camerer and Hogarth (1999) conclude, quite clearly, is that the use of hypothetical rewards makes a difference to the choices observed, but that it does not generally change the inference that they draw about the validity of expected utility theory (EUT). Since the latter typically involve paired comparisons of response rates in two lottery pairs (for example, in common ratio tests), it is logically possible for there to be (i) differences in choice probabilities in a given lottery depending on whether one uses hypothetical or real responses, and (ii) no difference in the effect of the EUT treatment on lottery pair response rates depending on whether one uses hypothetical or real responses. Furthermore, Camerer and Hogarth (1999) explicitly exclude from their analysis the mountain of data from experiments on valuation28 that show hypothetical bias. Their rationale for this exclusion was that economic theory did not provide any guidance as to which set of responses was valid. This is an odd rationale, since there is a well-articulated methodology in experimental economics that is quite precise about the motivational role of salient financial incentives (Smith, 1982). In addition, the experimental literature has generally been careful to consider elicitation mechanisms that provide dominant strategy incentives for honest revelation of valuations, and indeed in most instances this is explained to subjects since it is not being tested. Thus economic theory clearly points to the real responses as having a stronger claim to represent true valuations. In any event, the mere fact that hypothetical and real valuations differ so much tells us that at least one of them is wrong! Thus one does not actually need to identify one as reflecting true preferences, even if that were an easy task a priori, in order to recognize that there are differences in behavior between hypothetical and real choices.
6 CONCLUSIONS
There is no reliable way to trick subjects into thinking that something is in their best interests when it is not. Nonetheless, the literature on hypothetical choice is littered with assertions that one can somehow trick people into believing something that is not true. One probably can, if deception is allowed, but such devices cannot be reliable more than once. The claims tend to take the form, ‘if we frame the hypothetical task the same way as some real-world task that is incentive compatible, people will view it as incentive
compatible'. The same view tends to arise in the stated choice literature, but is just a variant on a refrain that has a longer history. There are some specifications which do appear to mitigate hypothetical bias in some settings, but such instances do not provide a general behavioral proof that can be used as a crutch in other instances. For example, there is some evidence that one can isolate hypothetical bias to the 'buy or no-buy' stage of a nested purchase decision, and thereby mitigate the effects on demand for a specific product. Similarly, there is some evidence that one can avoid hypothetical bias by using ranking tasks rather than choice or valuation tasks. In each case there are interesting conjectures about the latent decision-making process that provide some basis for believing that the specific results might generalize. But we simply do not know yet, and the danger of generalizing is both obvious and habitually neglected in the stated choice literature. These possibilities should be explored, and evaluated in other settings, before being relied on casually to justify avoiding the issue.

The only recommendation that can be made from experiments designed to test for incentive compatibility and hypothetical bias is that one has to address the issue head on. If one can deliver the commodity, which is the case in many stated choice applications in marketing, do so. If it is expensive, such as a beta product, then do so for a sub-sample to check for hypothetical bias and correct it statistically. If it is prohibitive or impossible, which is the case in many stated choice applications in environmental economics, use controlled experiments for a surrogate good as a complementary tool. That is, find some deliverable private or public good that has some of the attributes of the target good, conduct experiments to measure hypothetical bias using samples drawn from the same population, and use the results to calibrate the instrument and/or the responses. And explore the task specifications that appear to mitigate hypothetical bias. Above all, read with great suspicion any study that casually sweeps the problem under the rug.
NOTES

1. A dictatorial mechanism is one in which the outcome always reflects the preferences of one specific agent, independent of the preferences of others.
2. See Gibbard (1973) and Satterthwaite (1975) for the original statements of this theorem, and Moulin (1988) for an exposition.
3. See Rasmussen (1989, p. 161). The epsilon truthfulness assumption is used in formal mechanism design problems when the incentive constraints are defined so as to ensure that the expected utility to each agent from a truthful report is greater than or equal to the expected utility from any other feasible report.
4. The adoption of this assertion by DOI and NOAA is apparently based on a reference to the following statement that appears in an appendix to the NOAA Panel report of Arrow et al. (1993, p. 4612): 'As already noted, such a question form [a dichotomous choice question posed as a vote for or against a level of taxation] also has advantage in terms of incentive compatibility.' This reference ignores, however, the text of the NOAA Panel's report, which includes a lengthy discussion of the advantages and disadvantages of the referendum format used in the hypothetical setting of an application of the CV method (NOAA, 1993, pp. 4606–7), discussions which belie the later assertion of incentive compatibility. Among the disadvantages discussed by them are respondents' reactions to a hypothetical survey, and the fact that there can be no real implication that a tax will actually be levied and the damage actually repaired or avoided. Thus, the NOAA Panel suggests that 'considerable efforts should be made to induce respondents to take the question seriously, and that the CV instrument should contain other questions designed to detect whether the respondent has done so' (1993, p. 4606). The NOAA Panel also notes a further problem that could detract from the reliability of CV responses: 'A feeling that one's vote will have no significant effect on the outcome of the hypothetical referendum, leading to no reply or an unconsidered one' (1993, p. 4607).
5. Free riding is said to occur when a subject does not make any contribution to the provision of a public good that is valued by the subject.
6. This point can be stated more formally by thinking of the choice study as a game between the surveyor and the respondent. There is a difference between complete information and common knowledge in strategic games that captures this distinction. Surveyors can tell subjects something that is true, but that is not the same thing as knowing that subjects believe those things to be true.
7. Linguistics has rich traditions that help us think about the everyday transition to common knowledge in these settings. Statistical approaches to the linguistic issue of how people resolve ambiguous sentences in natural languages are becoming quite standard. See, for example, Allen (1995, chs 7 and 10) and Schütze (1997, ch. 2), and the references cited there. Non-statistical approaches, using axioms of conversation to disambiguate sentences, are proposed in another line of linguistic research by Grice (1989) and Clark (1992).
8. Carson et al. (2001, p. 191) appear to take issue with this claim, but one simply has to parse carefully what they say to understand it as actually in agreement: 'For provision of private or quasi-public goods, a yes response increases the likelihood that the good will be provided, however, the actual decision to purchase the good need not be made until later. Thus, a yes response increases the choice set at no expense.' They are not clear on the matter, so one has to fill in the blanks to make sense of this. If the DC involves a real commitment, such that the subject gets the private good if private money is given up, then the yes response does not increase the choice set for free. So they cannot be referring to a real DC response. In the case of a hypothetical DC for private goods, it does not follow that the yes response increases the likelihood of the good being provided. Of course, subjects are entitled to hold whatever false expectations they want, but the explicit script in Cummings et al. (1995b) contained nothing intended to lead them to that belief.
9. Carson, Flores and Meade (2001, p. 191) then suggest how one can make this setting, which can only be interpreted as referring to a hypothetical DC, incentive compatible: 'The desirable incentive properties of a binary discrete choice question can be restored in instances where the agent is asked to choose between two alternatives, neither of which represents a strict addition to the choice set.' Their footnote 44 then explains what they mean: 'It can be shown that what a coercive payment vehicle does is to effectively convert a situation whereby an addition to the choice set (e.g., a new public good) looks like a choice between two alternatives, neither of which is a subset of the other, by ensuring the extraction of payment for the good' (emphasis added). So this is just saying that one can make a hypothetical DC incentive compatible by requiring real payment, which is the point that Cummings et al. (1995b) viewed as apparent and hardly in need of notation and proof. The words 'look like' are problematic to an experimental economist. They suggest that one must rely on subjects misunderstanding the hypothetical nature of the task in order for it to be incentive compatible. But if subjects misunderstand part of the instructions, how does one know that they have understood all of the rest? Circular 'logic' of this kind is precisely why one needs stark experiments.
10. That is, one task is selected after all choices have been made, and the subject plays it out and receives the consequences. This avoids the potentially contaminating effects of changes in real income if one plays out all choices sequentially.
11. As a procedural matter, experimental economists generally rely on physical randomizing devices, such as dice and bingo cages, when randomization plays a central role in the mechanism. There is a long tradition in psychology of subjects second-guessing computer-generated random numbers, and the unfortunate use of deception in many fields from which economists recruit subjects makes it impossible to rely on the subject trusting the experimenter in such things.
12. The manner in which survey proponents quickly shift ground when confronted by uncomfortable evidence of hypothetical bias is well illustrated by Carson (1997, fn. 7): 'Once the strategic incentives in the single-private-good case are grasped, it should not be surprising that the marketing research literature evolved away from the single-good case to the multiple-good case, where it is possible to restore some of the incentives for truthful preference revelation.' This assertion is hard to understand. There are incentives for truthful revelation if the single DC question for private goods involves real consequences; otherwise, there are simply no incentives without untenable assumptions. The same is true if there are multiple DC questions, provided the real consequences apply to only one of them.
13. When subjects are indifferent over options, it does not follow that they will choose at random. They might use other heuristics to pick choices, which can exhibit systematic biases. For example, concern with a possible left–right bias leads experimental economists looking at lottery choice behavior to randomize the order of presentation. See List et al. (2006) for some evidence consistent with this conjecture.
14. After removing subjects that failed to complete the survey in some respect, there are 91 remaining subjects.
15. See Harrison and Rutström (2006b) for one review. Related work on statistical calibration functions includes Fox et al. (1994), Johannesson et al. (1999) and List and Shogren (1998, 2002).
16. Mitchell and Carson (1989) provide a popular and detailed review of many of the traits of 'bad hypothetical surveys'. One might question the importance of some of these traits, but that debate is beyond the scope of this review.
17. See Hensher, Louviere and Swait (1999) and Louviere, Hensher and Swait (2000, chs 8 and 13) for reviews.
18. Unfortunately the data from this study are not available for public evaluation (Trudy Cameron, personal communication), so one cannot independently assess the effects of alternative specifications.
19. This compares the 0-ACT and 1-PDC treatments, which are as close as possible other than the hypothetical nature of the response elicited.
20. In the stated choice literature they refer to unobserved individual idiosyncrasies of tastes (for example, Louviere et al., 2000, p. 38), and in the stochastic choice literature they also refer to trembles or errors by the individual (for example, Hey, 1995).
21. This is particularly clear in the exposition of Louviere et al. (2000, pp. 237, 244), since they use the notation a^RP and a^SP for the intercepts from data sources RP and SP, and a common b for the pooled estimates.
22. Interactions may or may not be identified, but they only complicate the already-complicated picture.
23. Despite this negative assessment of the potential of this approach for constructive calibration of differences between hypothetical and real responses, the 'data enrichment' metaphor that originally motivated this work in marketing is an important and fundamental one for economics.
24. Louviere et al. (2000, p. 233) use the notation a^RP and a^SP for the intercepts from data sources RP and SP, and b^RP and b^SP for the coefficient estimates.
25. Fox et al. (1998, p. 456) offer two criticisms of the earlier calibration approach of Blackburn et al. (1994). The first is that it is 'inconclusive', since one of the bias functions has relatively large standard errors. But such information on the imprecision of valuations is just as important as information on the point estimates if it correctly conveys the uncertainty of the elicitation process. In other words, it is informative to convey one's imprecision in value estimation if the decision maker is not neutral to risk. The second criticism is that Blackburn et al. (1994) only elicit a calibration function for one price on a demand schedule in their illustration of their method, and that the calibration function might differ for different prices. This is certainly correct, but hardly a fundamental criticism of the method in general.
26. The show-up fee is fixed conditional on the subject turning up and participating. It is definitely presumed to be salient with respect to the participation decision.
27. A conjecture: if subjects are brought in and given a substantial non-salient reward for participating, and given certain '(not so) cheap talk', would they behave as if facing salient rewards? The '(not so) cheap talk' would be something along these lines: 'we have given you a large fee for just filling out this hypothetical survey because we value your responses. We are unable to make this a survey with real consequences. But we would like you to consider your responses as if it were real. We are giving you this large fee to encourage you to do that, because we value your careful consideration.' The rationale for this treatment is that the payment might set up a 'social contract' between the experimenter and subject, leading to a 'gift exchange' of cognitive effort in return for the fixed participation fee. The quotation marks flag our fears as to what might happen, but these are easy things to test behaviorally.
28. The term 'valuation' subsumes open-ended elicitation procedures, as well as DC, binary referenda and stated choice tasks.
REFERENCES

Aadland, D. and A.J. Caplan (2003), 'Willingness to pay for curbside recycling with detection and mitigation of hypothetical bias', American Journal of Agricultural Economics, 85 (2), 492–502.
Adamowicz, W., J. Louviere and M. Williams (1994), 'Combining revealed and stated preference methods for valuing environmental amenities', Journal of Environmental Economics and Management, 26 (3), 271–92.
Afriat, S. (1967), 'The construction of a utility function from expenditure data', International Economic Review, 8 (1), 67–77.
Allen, J. (1995), Natural Language Understanding, 2nd edn, Redwood City, CA: Benjamin/Cummings.
Arrow, K., R. Solow, P. Portney, E.E. Leamer, R. Radner and H. Schuman (1993), 'Report of the NOAA Panel on Contingent Valuation', Federal Register, 58 (10), 4602–14.
Blackburn, M., G.W. Harrison and E.E. Rutström (1994), 'Statistical bias functions and informative hypothetical surveys', American Journal of Agricultural Economics, 76 (5), 1084–8.
Blumenschein, K., M. Johannesson and K. Yokoyama (2001), 'Hypothetical vs. real willingness to pay in the health sector: results from a field experiment', Journal of Health Economics, 20 (3), 441–57.
Blumenschein, K., M. Johannesson, G.C. Blomquist, B. Liljas and R.M. O'Conor (1998), 'Experimental results on expressed certainty and hypothetical bias in contingent valuation', Southern Economic Journal, 65 (1), 169–77.
Brown, T.C., I. Ajzen and D. Hrubes (2003), 'Further tests of entreaties to avoid hypothetical bias in referendum contingent valuation', Journal of Environmental Economics and Management, 46 (2), 353–61.
Camerer, C.F. and R.M. Hogarth (1999), 'The effects of financial incentives in experiments: a review and capital-labor-production framework', Journal of Risk and Uncertainty, 19 (December), 7–42.
Cameron, T.A., G.L. Poe, R.G. Ethier and W.D. Schulze (2002), 'Alternative non-market value-elicitation methods: are the underlying preferences the same?', Journal of Environmental Economics and Management, 44 (3), 391–425.
Carlsson, F. and P. Martinsson (2001), 'Do hypothetical and actual marginal willingness to pay differ in choice experiments?', Journal of Environmental Economics and Management, 41 (2), 179–92.
Carson, R.T. (1997), 'Contingent valuation: theoretical advances and empirical tests since the NOAA Panel', American Journal of Agricultural Economics, 79 (5), 1501–7.
Carson, R.T., N.E. Flores and N.F. Meade (2001), 'Contingent valuation: controversies and evidence', Environmental and Resource Economics, 19 (2), 173–210.
Clark, H.H. (1992), Arenas of Language Use, Chicago, IL: University of Chicago Press.
Coller, M. and M.B. Williams (1999), 'Eliciting individual discount rates', Experimental Economics, 2 (2), 107–27.
Cummings, R.G. and L.O. Taylor (1998), 'Does realism matter in contingent valuation surveys?', Land Economics, 74 (2), 203–15.
Cummings, R.G., S. Elliott, G.W. Harrison and J. Murphy (1997), 'Are hypothetical referenda incentive compatible?', Journal of Political Economy, 105 (3), 609–21.
Cummings, R.G., G.W. Harrison and L.L. Osborne (1995a), 'Can the bias of contingent valuation be reduced? Evidence from the laboratory', Economics Working Paper B-95-03, Division of Research, College of Business Administration, University of South Carolina.
Cummings, R.G., G.W. Harrison and E.E. Rutström (1995b), 'Homegrown values and hypothetical surveys: is the dichotomous choice approach incentive compatible?', American Economic Review, 85 (1), 260–66.
Department of the Interior (DOI) (1994), 'Proposed rules for valuing environmental damages', Federal Register, 59 (85), 23098–111.
Fox, J.A., J.F. Shogren, D.J. Hayes and J.B. Kliebenstein (1998), 'CVM-X: calibrating contingent values with experimental auction markets', American Journal of Agricultural Economics, 80 (August), 455–65.
Gibbard, A. (1973), 'Manipulation of voting schemes: a general result', Econometrica, 41 (4), 587–601.
Grice, P. (1989), Studies in the Way of Words, Cambridge, MA: Harvard University Press.
Harrison, G.W. (2005), 'Hypothetical bias over uncertain outcomes', in J.A. List (ed.), Using Experimental Methods in Environmental and Resource Economics, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, ch. 3.
Harrison, G.W. (2006a), 'Making choice studies incentive compatible', in B. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Guide to Theory and Practice, Boston, MA: Kluwer, pp. 65–108.
Harrison, G.W. (2006b), 'Experimental evidence on alternative environmental valuation methods', Environmental and Resource Economics, 34 (1), 125–62.
Harrison, G.W. and J. Hirshleifer (1989), 'An experimental evaluation of weakest-link/best-shot models of public goods', Journal of Political Economy, 97 (1), 201–25.
Harrison, G.W. and E.E. Rutström (2006a), 'Eliciting subjective beliefs about mortality risk orderings', Environmental and Resource Economics, 33 (3), 325–46.
Harrison, G.W. and E.E. Rutström (2006b), 'Experimental evidence on the existence of hypothetical bias in value elicitation methods', in C.R. Plott and V.L. Smith (eds), Handbook of Experimental Economics Results, Amsterdam: North-Holland, pp. 752–67.
Hensher, D., J. Louviere and J.D. Swait (1999), 'Combining sources of preference data', Journal of Econometrics, 89 (3), 197–221.
Hey, J.D. (1995), 'Experimental investigations of errors in decision making under risk', European Economic Review, 39 (3–4), 633–40.
Holt, C.A. and S.K. Laury (2002), 'Risk aversion and incentive effects', American Economic Review, 92 (5), 1644–55.
Holt, C.A. and S.K. Laury (2005), 'Risk aversion and incentive effects: new data without order effects', American Economic Review, 95 (3), 902–12.
Johannesson, M., G.C. Blomquist, K. Blumenschein, P.-O. Johansson, B. Liljas and R.M. O'Connor (1999), 'Calibrating hypothetical willingness to pay responses', Journal of Risk and Uncertainty, 8 (1), 21–32.
Ledyard, J.O. (1995), 'Public goods: a survey of experimental research', in J. Kagel and A. Roth (eds), Handbook of Experimental Economics, Princeton, NJ: Princeton University Press, pp. 111–94.
List, J.A. (2001), 'Do explicit warnings eliminate the hypothetical bias in elicitation procedures? Evidence from field auctions for sportscards', American Economic Review, 91 (5), 1498–507.
List, J.A. and J.F. Shogren (1998), 'Calibration of the differences between actual and hypothetical valuations in a field experiment', Journal of Economic Behavior and Organization, 37 (November), 193–205.
List, J.A. and J.F. Shogren (2002), 'Calibration of willingness-to-accept', Journal of Environmental Economics and Management, 43 (2), 219–33.
List, J.A., P. Sinha and M. Taylor (2006), 'Using choice experiments to value non-market goods and services: evidence from the field', Advances in Economic Analysis and Policy, 6 (2), art. 2, accessed 6 January 2014 at www.bepress.com/bejeap/advances/vol6/iss2/art2.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, New York: Cambridge University Press.
Lusk, J.L. and T.C. Schroeder (2004), 'Are choice experiments incentive compatible? A test with quality differentiated beef steaks', American Journal of Agricultural Economics, 86 (2), 467–82.
Mitchell, R.C. and R.T. Carson (1989), Using Surveys to Value Public Goods: The Contingent Valuation Method, Baltimore, MD: Johns Hopkins University Press.
Moulin, H. (1988), Axioms of Cooperative Decision Making, New York: Cambridge University Press.
Nape, S.W., P. Frykblom, G.W. Harrison and J.C. Lesley (2003), 'Hypothetical bias and willingness to accept', Economics Letters, 78 (3), 423–30.
National Oceanic and Atmospheric Administration (NOAA) (1994), 'Proposed rules for valuing environmental damages', Federal Register, 59 (5), 1062–191.
Rasmussen, E. (1989), Games and Information: An Introduction to Game Theory, New York: Basil Blackwell.
Samuelson, P.A. (1938), 'A note on the pure theory of consumer's behavior', Economica, 5 (17), 61–71.
Satterthwaite, M.A. (1975), 'Strategy-proofness and Arrow's conditions: existence and correspondence theorems for voting procedures and social welfare functions', Journal of Economic Theory, 10 (2), 187–217.
Schütze, H. (1997), Ambiguity Resolution in Language Learning, Stanford, CA: CLSI Publications.
Smith, V.L. (1982), 'Microeconomic systems as an experimental science', American Economic Review, 72 (5), 923–55.
Swait, J.D. and J.J. Louviere (1993), 'The role of the scale parameter in the estimation and comparison of multinomial logit models', Journal of Marketing Research, 30 (3), 305–14.
Varian, H.R. (1982), 'The nonparametric approach to demand analysis', Econometrica, 50 (4), 945–74.
Varian, H.R. (1983), 'Non-parametric tests of consumer behaviour', Review of Economic Studies, 50 (1), 99–110.
11 Nonparametric approaches to describing heterogeneity1

Mogens Fosgerau
1 INTRODUCTION
This chapter considers the estimation of binomial and multinomial discrete choice models that contain a random preference parameter with an unknown distribution, focusing on simple approaches where this unknown distribution is directly estimated. The unknown distribution is possibly multivariate.2 We talk about approaches that are nonparametric in the sense that the description of some unknown distribution is nonparametric. This unknown distribution may be embedded in an otherwise parametric model, and the combination would then be called semiparametric.

In a discrete choice model, the random preference parameter may enter some function describing the indirect utilities associated with alternatives. Let us say the model prescribes the probability of choosing alternative $y \in \{1, \ldots, J\}$ to be $P(y = j \mid x, \boldsymbol{\beta})$, where $y$ is the choice, $j$ indexes alternatives, $x$ is a vector of observed variables and $\boldsymbol{\beta}$ is a random parameter vector with cumulative distribution function (CDF) $F$. Depending on the circumstances, $\boldsymbol{\beta}$ may be univariate or multivariate. We use bold letters to indicate vectors (that may still be univariate) while variables in plain font must be univariate. We shall maintain a random effect assumption, namely that the distribution of $\boldsymbol{\beta}$ is independent of $x$. The random effect assumption is very convenient, but it is not always credible and it is by no means an innocuous assumption. If, for example, the population is divided into men and women, distinguished by $x = 1$ or $x = 2$, then we have to be able to believe that the distribution of the random preference parameter is the same for men and for women, $F(\boldsymbol{\beta}) = F(\boldsymbol{\beta} \mid x = 1) = F(\boldsymbol{\beta} \mid x = 2)$. In some circumstances it is sufficient to use a fixed effect assumption, under which some parameters can be random but not necessarily independent of the variables $x$ or the random parameters $\boldsymbol{\beta}$.3

If we can accept the random effect assumption, then we obtain a very useful simplification, namely that the choice probability $P(y = j \mid x)$ may be expressed, integrating out the distribution of $\boldsymbol{\beta}$, as $P(y = j \mid x) = \int P(y = j \mid x, \boldsymbol{\beta})\, dF(\boldsymbol{\beta})$. If $F$ is known, this integration can generally be carried out, either analytically or numerically. This is routinely done in the many applications of the mixed logit model, where random parameters are given some distribution and the integration is carried out using simulation (Train, 2009). But in most situations we actually have very little idea what $F$ should be, except possibly for restrictions such as bounds on the possible values of $\boldsymbol{\beta}$, including sign restrictions. In some applications the precise form of $F$ is not essential, and in such situations it may be unproblematic to impose a specific form. There are, however, situations where it is not desirable to impose a specific functional form on $F$; this may be when the shape of $F$ has significant impact on the object of interest for the investigation or when $F$ itself is the object of interest. For example, many applications of discrete
choice models aim to estimate a distribution of willingness-to-pay in some population, where the willingness-to-pay may concern travel time or an environmental good. It is then highly desirable to be able to infer the functional form of the distribution of willingness-to-pay from the data. Section 2 discusses the method of sieves, which uses a series of functions to approximate an unknown function. Section 3 discusses regression-based approaches.
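Before turning to these methods, it may help to fix ideas with a minimal sketch of the simulation-based evaluation of the mixing integral mentioned above, here for a binary logit with a single coefficient given an assumed, known $F$ (all values illustrative):

```python
# Hedged sketch of simulating P(y=j|x) = integral of P(y=j|x,b) dF(b)
# by averaging over draws from an assumed normal mixing distribution F.
import numpy as np

rng = np.random.default_rng(0)
b = rng.normal(0.5, 1.0, size=10_000)    # draws from the assumed F
x = 2.0
p_cond = 1 / (1 + np.exp(-b * x))        # P(y=1|x, b) for each draw
print(p_cond.mean())                     # simulated P(y=1|x)
```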
2 THE METHOD OF SIEVES
This section discusses the method of sieves, which is a way to construct families of functions that may approximate an unknown function arbitrarily well.4 The underlying observation is that an arbitrary (sufficiently well-behaved) real function $F$ (with domain and codomain on the real line) may be written as a series in terms of basis functions via $F(x) = \sum_{k=0}^{\infty} \gamma_k L_k(x)$, where the $L_k(\cdot)$ are known basis functions and the $\gamma_k \in \mathbb{R}$ are coefficients. A number of convenient bases exist when the domain of $F$ is the real line or a compact interval. While $F$ has a representation in terms of coefficients, there are, in general, infinitely many coefficients. $F$ may be approximated (in some appropriate sense) by a truncated series $F_K(x) = \sum_{k=0}^{K} \gamma_k L_k(x)$ that has a finite number of coefficients. The choice of $K$ determines the degree of flexibility in the approximating $F_K$. The optimal $K$ will depend on the shape of $F$ and on the size of the available data set.

Even though essentially any function can be approximated by a series expression, it may sometimes require very many terms to achieve a reasonable approximation. It may be that features that are present in the data are only captured by approximations having many terms. This can be problematic because many parameters must then be estimated. In such cases it may be useful to modify the series by adding a leading term. Thus one can choose $L_0$ to be a certain function thought to be a good first approximation to $F$ and thereby economize on the number of parameters to be estimated.
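As a simple illustration of truncation, the following sketch (using numpy's Legendre utilities; the target function is arbitrary and purely illustrative) shows the approximation error falling as the cut-off $K$ grows:

```python
# Hedged sketch: truncated Legendre-series approximation of a smooth
# function on [0, 1], with the maximum error reported for increasing K.
import numpy as np
from numpy.polynomial import legendre as npleg

x = np.linspace(0.0, 1.0, 1001)
f = np.sin(3.0 * x) + x**2               # arbitrary smooth target function

for K in (2, 4, 8):
    # least-squares fit of a degree-K Legendre series; map [0,1] -> [-1,1]
    coef = npleg.legfit(2.0 * x - 1.0, f, K)
    approx = npleg.legval(2.0 * x - 1.0, coef)
    print(K, np.max(np.abs(f - approx)))  # error shrinks as K grows
```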
2.1 Fosgerau and Bierlaire Approach
The method of sieves is able to approximate arbitrary functions. In our case we have more information, since we are concerned with the approximation of a CDF. Fosgerau and Bierlaire (2007) provide a way to use the method of sieves to approximate a CDF in a discrete choice setting. Let $F$ be a univariate CDF having a corresponding density $f$ and let $H$ be an absolutely continuous distribution with density $h$. We use $F$ as a base for estimating the true distribution $H$ and therefore it is appropriate to choose an $F$ that is a priori thought to be a likely candidate for the true distribution. We require that the support of $F$ contains the support of $H$; this means, loosely, that a random variable with distribution $F$ may attain all values that a random variable with distribution $H$ can attain. Defining $Q(u) = H(F^{-1}(u))$, we have that $Q(F(\beta)) = H(\beta)$. Furthermore, $Q$ is monotonically increasing and ranges from 0 to 1 on the unit interval. Thus, $Q$ is a cumulative distribution function for a random variable on the unit interval. Denote by $q$ the density of this variable, which exists since $H$ is absolutely continuous. Then we can express the true density as $h(\beta) = q(F(\beta)) f(\beta)$.
Consider now a discrete choice model $P(y \mid x, \beta)$ conditional on the univariate random parameter $\beta$, which has the true distribution $H$. Then the unconditional model is

$$P(y = j \mid x) = \int P(y = j \mid x, \beta)\, h(\beta)\, d\beta = \int P(y = j \mid x, F^{-1}(u))\, q(u)\, du. \qquad (11.1)$$

Thus the problem of finding the unknown density $h$ is reduced to that of finding $q$, an unknown density on the unit interval. The probability may be simulated using $R$ standard uniform draws $u_r$ and computing

$$P(y = j \mid x) \approx \frac{1}{R} \sum_r P(y = j \mid x, F^{-1}(u_r))\, q(u_r).$$
Note that the terms $F^{-1}(u_r)$ are just draws from the distribution $F$. When $q = 1$, we obtain the standard numerical simulation of the likelihood (cf. Train, 2009), and so the only difference from a model in which $F$ is the true distribution is the modification of the likelihood through the term $q(u)$. Now, let $L_k$ be the $k$th Legendre polynomial on the unit interval (cf. Bierens, 2007; Fosgerau and Bierlaire, 2007). Legendre polynomials have a convenient recursive definition that is easily implemented on a computer. It states that

$$L_k(u) = \frac{\sqrt{4k^2 - 1}}{k}\,(2u - 1)\,L_{k-1}(u) - \frac{k-1}{k}\sqrt{\frac{2k+1}{2k-3}}\,L_{k-2}(u).$$

The first four polynomials are $L_0(u) = 1$, $L_1(u) = \sqrt{3}(2u - 1)$, $L_2(u) = \sqrt{5}(6u^2 - 6u + 1)$ and $L_3(u) = \sqrt{7}(20u^3 - 30u^2 + 12u - 1)$. These functions constitute a basis for functions on the unit interval. Furthermore, it is an orthonormal basis, which means that $\int L_k(u) L_{k'}(u)\, du$ is equal to 1 when $k = k'$ and zero otherwise. This is useful when defining the following density:
$$q(u) = \frac{\bigl(1 + \sum_k \gamma_k L_k(u)\bigr)^2}{1 + \sum_k \gamma_k^2}. \qquad (11.2)$$
Squaring the numerator ensures positivity, while the normalization in the denominator ensures that $q(u)$ integrates to 1. Thus this expression is in fact a density. Bierens (2007) proves that any density on the unit interval can be written in this way. To implement the estimator, select a cut-off $K$ for $k$, such that only the first $K$ terms of (11.2) are used. Thus we have a flexible $q_K$ with $K$ parameters $(\gamma_1, \ldots, \gamma_K)$ and a corresponding cumulative distribution function $Q_K$. This is inserted into equation (11.1) to enable estimation by maximum likelihood. One way to use this setup is to test the hypothesis that $(\gamma_1, \ldots, \gamma_K) = 0$. Then $q = 1$, so this amounts to testing whether $Q_K$ is different from the uniform distribution or, equivalently, whether $H = F$. Alternatively, it is possible just to use the flexibility such that the random parameter has distribution $Q_K(F(\beta))$. To make the concepts concrete, let us suppose as an example that we are considering a mixed logit model with probabilities
$$P(y = j \mid x, \beta) = \frac{\exp(\alpha x_j + \beta x_j^0)}{\sum_{j'} \exp(\alpha x_{j'} + \beta x_{j'}^0)},$$

where $x$ is composed of vectors $x_j$ and univariate variables $x_j^0$. The parameter vector $\alpha$ is supposed to be constant, but we could have specified it to be random such that we would have mixing over those parameters too. The parameter $\beta$ is supposed to be random with CDF $Q_K(F(\beta))$. We would then approximate the likelihood using

$$P(y = j \mid x) \approx \frac{1}{R} \sum_r \frac{\exp(\alpha x_j + F^{-1}(u_r)\, x_j^0)}{\sum_{j'} \exp(\alpha x_{j'} + F^{-1}(u_r)\, x_{j'}^0)} \cdot \frac{\bigl(1 + \sum_{k=1}^{K} \gamma_k L_k(u_r)\bigr)^2}{1 + \sum_{k=1}^{K} \gamma_k^2}.$$
Note that the terms $F^{-1}(u_r)$ and $L_k(u_r)$ need only be evaluated once after the draws $u_r$ have been computed; they do not have to be recalculated during maximization of the likelihood. Fosgerau and Nielsen (2010) consider the binary panel data model $y_{it} = 1\{\alpha_i + \beta x_{it} + e_{it} < v_{it}\}$, where $y_{it}$, $x_{it}$, $v_{it}$ are observed, $\alpha_i$, $e_{it}$ are unobserved and i.i.d. with unknown distributions, and $\beta$ is a fixed parameter vector. They show under weak assumptions that the method of sieves can provide consistent estimates of $\beta$ as well as the distributions of $\alpha_i$ and $e_{it}$. Consistency then applies also when the distribution of $e_{it}$ is taken as known, and particularly in the binary logit model with a random effect $\alpha_i$.
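A minimal sketch of this estimator for a binary mixed logit may make the steps concrete. It is illustrative only: the data are synthetic, the base $F$ is taken to be standard normal, and only the $\gamma$ parameters are estimated. Note how the draws and the Legendre terms are computed once, outside the likelihood function, as just described.

```python
# Hedged sketch of the Fosgerau-Bierlaire simulated likelihood for a
# binary logit with one random coefficient b and base distribution F = N(0,1).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def legendre(K, u):
    """Orthonormal Legendre polynomials L_1..L_K on [0,1], shape (K, len(u))."""
    L = np.empty((K + 1, len(u)))
    L[0] = 1.0
    if K >= 1:
        L[1] = np.sqrt(3.0) * (2.0 * u - 1.0)
    for k in range(2, K + 1):
        L[k] = (np.sqrt(4.0 * k**2 - 1.0) / k * (2.0 * u - 1.0) * L[k - 1]
                - (k - 1.0) / k * np.sqrt((2.0 * k + 1.0) / (2.0 * k - 3.0)) * L[k - 2])
    return L[1:]

rng = np.random.default_rng(0)
R, K = 200, 3
u = rng.uniform(size=R)          # draws u_r, fixed across iterations
b_draws = norm.ppf(u)            # F^{-1}(u_r): draws from the base F
Lk = legendre(K, u)              # L_k(u_r), also fixed across iterations

# Synthetic data: binary choice with a non-normal random coefficient
n = 1000
x0 = rng.normal(size=n)
true_b = 0.5 + np.abs(rng.normal(size=n))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_b * x0))).astype(int)

def neg_loglik(gamma):
    q = (1 + gamma @ Lk) ** 2 / (1 + gamma @ gamma)     # equation (11.2)
    p1 = 1 / (1 + np.exp(-np.outer(x0, b_draws)))       # P(y=1|x, b_r)
    p = np.where(y[:, None] == 1, p1, 1 - p1)
    return -np.log((p * q).mean(axis=1)).sum()          # equation (11.1)

res = minimize(neg_loglik, np.zeros(K), method="BFGS")
print(res.x)   # gamma = 0 would mean the base F already fits the data
```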
2.2 Mixtures of Distributions Approach
Fosgerau and Hess (2009) compare the Fosgerau–Bierlaire approach to a mixtures of distributions approach (MOD), described in this section. One way to approximate an unknown distribution is as a collection of point masses, but this results in a distribution that is not absolutely continuous. The MOD approach therefore uses smooth bumps rather than just point masses; see, for example, Coppejans (2001).5 Many different functions could be used to create smooth bumps; we shall use the normal distribution to create the smooth bumps, but this choice is not essential. Define pairs $(\mu_k, \sigma_k)$, $k = 1, \ldots, K$, of means and standard deviations and corresponding weights $\pi_k$ that are positive and sum to 1. We may then approximate some unknown CDF as a discrete mixture of smooth distributions using

$$F(\beta) = \sum_{k=1}^{K} \pi_k \Phi\left(\frac{\beta - \mu_k}{\sigma_k}\right),$$

where $\Phi$ is the standard normal CDF. Every term $\pi_k \Phi((\beta - \mu_k)/\sigma_k)$ is a smooth bump: it gives a part of a distribution that is centered at $\mu_k$, with a dispersion controlled by $\sigma_k$ and a mass of $\pi_k$. In estimation, the standard deviations may approach zero such that point masses result. Coppejans (2001) enforces a lower bound on the variance of the normally distributed components in order to ensure that the estimated distribution is smooth, and this enables him to prove asymptotic convergence to the true distribution as the number of terms $K$ increases with sample size.
Using again our mixed logit example with a univariate mixing distribution, one would create standard uniform draws $u_{rk}$ for every replication $r$ and bump $k$. We would then approximate the likelihood using

$$P(y = j \mid x) \approx \frac{1}{R} \sum_{r,k} \pi_k\, \frac{\exp(\alpha x_j + (\mu_k + \sigma_k \Phi^{-1}(u_{rk}))\, x_j^0)}{\sum_{j'} \exp(\alpha x_{j'} + (\mu_k + \sigma_k \Phi^{-1}(u_{rk}))\, x_{j'}^0)}.$$

This way of simulating the likelihood has the advantage that it is only necessary to compute the standard normal draws $\Phi^{-1}(u_{rk})$ once, and not during maximization of the likelihood.
2.3 Combining Sieves with a Copula
When dealing with multivariate distributions, one is confronted by the curse of dimensionality: it arises from the fact that the volume of a space rises exponentially in the number of dimensions, and then so does, roughly speaking, the number of parameters required to achieve a given degree of precision. If $K$ parameters are used to describe a univariate distribution with some given degree of precision, then something like $K^D$ parameters are required in the $D$-dimensional case to obtain the same precision. Of course, if one is content with letting the random parameters be independent, then only $K \cdot D$ parameters are required. This section discusses how to use a parametric form to describe a dependence structure using a small number of parameters, while allowing marginal distributions to be arbitrary. This is achieved through the use of copulas (Joe, 1997; Nelsen, 2006).

Consider a random vector $(X_1, \ldots, X_D)$ distributed according to a multivariate CDF $F$ having continuous marginal distributions $F_d$. In general, any such multivariate CDF may be written in the form

$$F(\boldsymbol{\beta}) = C(F_1(\beta_1), \ldots, F_D(\beta_D)), \qquad (11.3)$$

where

$$C(\mathbf{u}) = F(F_1^{-1}(u_1), \ldots, F_D^{-1}(u_D)) \qquad (11.4)$$

is the CDF of the random vector $(F_1(X_1), \ldots, F_D(X_D))$. Such a $C$ is called a copula. It is a CDF on the unit cube with univariate marginal distributions being uniform, and any such CDF is a copula. The copula in (11.3) captures precisely the dependence structure of $F$, and does not depend on the marginal distributions of $F$. The simplest copula is the independence copula, which is the product $C(\mathbf{u}) = u_1 \cdot \ldots \cdot u_D$.

Through (11.4), it is possible to create a copula based on any given continuous CDF. For example, it is straightforward to create a copula based on the multivariate normal distribution that has any desired correlation matrix. This is called a Gaussian copula and it is completely defined in terms of the correlation matrix. In $D$ dimensions, this has $D(D - 1)/2$ parameters. Another popular class of copulas may generate dependence with just a few parameters. Archimedean copulas have the form

$$C(\mathbf{u}) = \psi(\psi^{-1}(u_1) + \ldots + \psi^{-1}(u_D)), \qquad (11.5)$$
where $\psi$ is a generator function having certain specific properties (Nelsen, 2006). An example of a generator function is the Gumbel with $\psi(t) = \exp(-t^{1/\theta})$, $\theta \geq 1$, which leads to an Archimedean copula determined by a single parameter $\theta$:

$$C(\mathbf{u}) = \frac{1}{\exp\Bigl(\bigl((\log(u_1^{-1}))^{\theta} + \ldots + (\log(u_D^{-1}))^{\theta}\bigr)^{1/\theta}\Bigr)}.$$
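Evaluating this copula is straightforward, as the short sketch below shows (values illustrative); at $\theta = 1$ it reduces to the independence copula:

```python
# Hedged sketch: the Gumbel (Archimedean) copula CDF with generator
# psi(t) = exp(-t^(1/theta)), as in the display above.
import numpy as np

def gumbel_copula(u, theta):
    """C(u) for a vector u in (0,1)^D; requires theta >= 1."""
    t = np.sum((-np.log(u)) ** theta)
    return np.exp(-t ** (1.0 / theta))

print(gumbel_copula(np.array([0.3, 0.7]), theta=1.0))  # independence: 0.21
print(gumbel_copula(np.array([0.3, 0.7]), theta=3.0))  # positive dependence
```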
It is possible to generalize Archimedean copulas using the logit family of models. Any multivariate extreme value distribution with EV1 marginals has the form $\exp(-G(e^{-\beta_1}, \ldots, e^{-\beta_D}))$, where $G$ is a choice probability generating function with certain properties (Fosgerau et al., 2012). Such choice probability generating functions may be viewed as generalizations of summation, and it turns out that replacing the sum in (11.5) by $G$ does in fact lead to a generalized Archimedean copula, $C(\mathbf{u}) = \psi(G(\psi^{-1}(u_1), \ldots, \psi^{-1}(u_D)))$ (Fosgerau et al., 2012). An attraction of this form is that complex dependence structures may be handled using nesting as in the nested or cross-nested logit models (Daly and Bierlaire, 2006). Bhat (2009) describes a way to generate copulas based on such multivariate extreme value distributions.

In general, it is difficult to construct multivariate copulas. The generalized Archimedean copula allows only positive dependence (Joe, 1997) but is otherwise very flexible. The Gaussian copula may also describe negative dependence, but its dependence structure is given by a correlation matrix and cannot be made more flexible than that. If only a bivariate copula is needed, then a multitude of forms are known (Joe, 1997; Nelsen, 2006).

Copulas are convenient to use in combination with simulation methods. Let us say we want to evaluate $P(y = j \mid x) = E(P(y = j \mid x, \boldsymbol{\beta}) \mid x)$, where the distribution of $\boldsymbol{\beta}$ is given in (11.3) in terms of marginal distributions and a copula. Then

$$P(y = j \mid x) = \int P(y = j \mid x, \boldsymbol{\beta})\, dF(\boldsymbol{\beta}) = \int P(y = j \mid x, (F_1^{-1}(u_1), \ldots, F_D^{-1}(u_D)))\, dC(\mathbf{u}).$$

Using a sample of random draws $\{\mathbf{u}_r\}$ from the distribution $C$, this probability can be approximated by the average of $P(y = j \mid x, (F_1^{-1}(u_{r1}), \ldots, F_D^{-1}(u_{rD})))$. In case $C$ has a density $c$, it is possible to use

$$P(y = j \mid x) = \int_{\mathbf{u} \in [0,1]^D} P(y = j \mid x, (F_1^{-1}(u_1), \ldots, F_D^{-1}(u_D)))\, c(\mathbf{u})\, d\mathbf{u},$$
meaning that $\mathbf{u}$ can simply be drawn from the uniform distribution on the unit cube.
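As an illustration of such simulation, the sketch below draws from a Gaussian copula and maps the uniforms through assumed marginals (lognormal and normal; all choices illustrative) to average the conditional choice probability:

```python
# Hedged sketch: simulate P(y=1|x) for a binary logit whose two random
# coefficients are linked by a Gaussian copula with arbitrary marginals.
import numpy as np
from scipy.stats import norm, lognorm

rng = np.random.default_rng(0)
R = 10_000
corr = np.array([[1.0, 0.6], [0.6, 1.0]])

z = rng.multivariate_normal(np.zeros(2), corr, size=R)
u = norm.cdf(z)                              # draws u_r from the Gaussian copula C
b1 = lognorm.ppf(u[:, 0], s=0.5)             # F_1^{-1}(u_1): lognormal marginal
b2 = norm.ppf(u[:, 1], loc=0.0, scale=1.0)   # F_2^{-1}(u_2): normal marginal

x = np.array([1.0, -0.5])                    # attributes with random coefficients
p = 1 / (1 + np.exp(-(b1 * x[0] + b2 * x[1])))
print(p.mean())                              # average of P(y=1|x, b_r) over draws
```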
3 REGRESSION-BASED APPROACHES
3.1 Binary Choice and No Covariates

Cross-sectional binary choice data are particularly amenable to nonparametric analysis. The simplest relevant model arises when we observe whether an unobserved random
variable w is smaller or greater than some observed variable v; that is, we observe y = 1{w < v}, where 1{·} is the indicator function. We observe (y, v) for a range of values of v and we are concerned with finding the CDF of w. This model is natural in a contingent valuation context, where subjects are asked whether or not they are willing to pay v for some good under investigation and the object of interest is the distribution of willingness-to-pay in the population. The model is also relevant in more complicated settings. Consider, for example, a binary choice involving a trade-off between two goods (or bads). For concreteness, let us say that subjects choose between two travel options characterized by travel time t and travel cost c, and let us say that alternative 1 is both slower and cheaper than alternative 2. Let us furthermore assume that subjects evaluate alternatives by the cost function wt + c, where w is an individual-specific value of time, treated as random in the population, and that they choose the alternative with the lowest cost. Then the cheap and slow alternative 1 is chosen when the value of time is less than the unit price of time implicit in the choice situation, that is, when w < −(c_1 − c_2)/(t_1 − t_2). Thus we obtain the same model again, with the unit price of time playing the role of the bid. Say now that w has CDF F and note that E(y | v) = P(w < v) = F(v). This means that the mean of y conditional on a value of v is an estimate of F at the point v. In practice, one might estimate F(v) as the average of y_i for observations (y_i, v_i) where v_i is close to v. In order to estimate F, we thus need to observe (y, v) many times for a range of values of v. Thinking about it in this way also makes it clear that it is not always possible to identify F for all values of v; it is only possible for those values of v where we have sufficient observations.6 The identification problem is deadly serious in situations where it is desired to estimate the mean of w. To see this, consider the following example. Let us say that we know F for values of v up to 100, but that F(100) = 0.9. What can we then say about E(w)? The problem here is that we have no information about the distribution of w above the value of 100. The lower bound for the mean is reached if the residual mass is concentrated at 100. In this case the mean of w would be at the lower bound

∫_{−∞}^{100} w F(dw) + (1 − 0.9) · 100.

The upper bound for the mean, on the other hand, is infinity, since there is no upper bound for where the residual mass could be located. This is the underlying reason why Fosgerau (2006) found that various parametric assumptions for F could lead to estimates of E(w) that differed by an arbitrarily large factor. There are two important lessons here. One is that it is important to verify that it is in fact possible to identify the distribution of interest from the data at hand. Another is that imposing parametric assumptions runs the risk of introducing errors that are extremely large. Assume now that we have data with values of v that cover the support of w. It is then possible to estimate F by local averaging. One convenient way of doing this is by kernel regression. A basic element of kernel regression is the kernel. An easy choice is the density of a normal distribution φ, but other densities could be used as well, and kernels do not have to be densities. A density like the normal is, however, easy to understand as a kernel: consider the function

(1/h) φ((x − x_0)/h).

This is a density that places a smooth bump of mass at the point x_0; the concentration of the mass is determined by the bandwidth parameter h, with small values of h corresponding to the mass being concentrated near x_0.
Let us now consider how to estimate F(·) at some fixed point v_0. If we had many observations (y_i, v_i) with v_i = v_0, then we could just average y_i for those observations. But in most situations we have a scatter of observations with different values of v. We therefore use the kernel to produce a weighted average, assigning more weight to observations near v_0:

F̂(v_0) = Σ_i ω_i y_i = Σ_i y_i φ((v_i − v_0)/h) / Σ_i φ((v_i − v_0)/h).
Note that ω_i = (1/h) φ((v_i − v_0)/h) / Σ_j (1/h) φ((v_j − v_0)/h) is the weight for the i'th observation and that these weights sum to 1. The weight is large when v_i is close to v_0; it decreases as |v_i − v_0| increases, and the rate of decrease is governed by h. Note also that F̂(·) is a smooth function of v. It is a general finding that the choice of kernel is less important for results, but that the choice of bandwidth is very important. If the bandwidth is very large, then all observations receive almost the same weight and F̂(·) becomes almost flat, approximating the mean of the y_i. If the bandwidth is very small, then F̂(·) will jump up and down, tracking each observation quite closely, having the right mean, but probably being quite far from the true F at most places. There is thus a trade-off involved in choosing the 'optimal' bandwidth. Various approaches exist to assist in this choice; the reader is referred, for example, to Pagan and Ullah (1999) for a discussion. Here we shall only briefly indicate some possibilities. The most computationally expensive approach is cross-validation, where the bandwidth is found by minimizing some appropriate function of the error in predicting each observation, using all other observations. A straightforward approach is to choose the bandwidth to minimize the sum of squared errors over all observations. This is feasible with samples that are not too large, but with a sample size of N the criterion involves N squared errors, each itself computed from a sum of N terms, and that can pose problems when N is large. An easier approach is to employ a plug-in bandwidth. This provides a bandwidth as a function of some sample statistics, in particular the sample size; a number of suggestions for doing this exist in the literature. Finally, there is eye-balling. This consists of plotting the function of interest for a range of bandwidths and choosing a bandwidth that produces an estimated F̂(·) with an appropriate number of features (say, number of modes). Even though this is informal, it may not be bad, given that we are able to form an opinion regarding, say, the likely number of modes for F. In either case, the bandwidth will depend on the sample size; the optimal bandwidth is smaller for larger samples, and in the limit the optimal bandwidth approaches zero. Theory exists to provide confidence bands around a kernel regression estimate. Confidence bands are either pointwise or uniform: a 95 per cent pointwise confidence band covers the true value of F with probability 0.95 at each value of v; a 95 per cent uniform confidence band covers the true function F at all values of v simultaneously with a probability of 0.95 (Pagan and Ullah, 1999). These methods are applied in, for example, Fosgerau (2006) and Fosgerau (2007).
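The following is a minimal sketch of the kernel regression estimator F̂ and a leave-one-out cross-validation bandwidth search on simulated data; the simulated design (normal w, uniform bids) and the bandwidth grid are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf_estimate(v0, v, y, h):
    """Kernel regression estimate of F(v0) = E(y | v = v0) with a normal kernel."""
    weights = norm.pdf((v - v0) / h)          # unnormalized kernel weights
    return np.sum(weights * y) / np.sum(weights)

def loo_cv_score(h, v, y):
    """Leave-one-out cross-validation: sum of squared prediction errors."""
    n = len(v)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i              # predict observation i from the rest
        errors.append(y[i] - kernel_cdf_estimate(v[i], v[mask], y[mask], h))
    return np.sum(np.square(errors))

# simulated example: w ~ N(50, 15^2), bids v uniform on [0, 100], y = 1{w < v}
rng = np.random.default_rng(42)
v = rng.uniform(0, 100, size=500)
y = (rng.normal(50, 15, size=500) < v).astype(float)

# pick the bandwidth with the smallest cross-validation score from a grid
grid = [2.0, 5.0, 10.0, 20.0]
h_best = min(grid, key=lambda h: loo_cv_score(h, v, y))
print(h_best, kernel_cdf_estimate(50.0, v, y, h_best))   # estimate of F(50), about 0.5
```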
3.2 Binary Choice Including Covariates
Consider now a model where we observe

y = 1{w + bx ≤ v},   (11.6)
with w being independent of x and v. This is the same model as before, except that now a term bx has been added to the unobserved w. We take the vector x as observed and the vector b to be estimated. Such a model arises, for example, if v is the log of a bid and the willingness-to-pay is exp(w + bx). Then the willingness-to-pay is always positive, regardless of bx and the distribution of w. If b were known, then we could just regress y against v − bx in order to estimate F using the method just described. Conversely, if F were known, then we could estimate b by maximum likelihood, since P(y = 1 | v, x) = F(v − bx). These observations are the basis for the Klein and Spady (1993) estimator. It works iteratively, producing an estimate of F given starting values for b, then estimating a new b given this F, then estimating a new F given the new b. This continues until convergence. There are alternatives to Klein-Spady; see Manski (1985), Horowitz (1992) and Cosslett (1983). Lee (1995) generalizes the Klein-Spady estimator to multinomial choice. Lewbel et al. (2011) consider the estimation of moments and quantiles of F in the more general setting where y = 1{w ≤ v} but where the distribution of w may depend on x in a general way, such that P(y = 1 | v, x) = F(v | x). The assumption used above, that the influence of x takes place through bx, is called an index assumption, and it yields a special case of the model considered by Lewbel et al.
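A stylized sketch of the alternating scheme just described is given below; it is purely illustrative: a faithful Klein and Spady (1993) implementation uses leave-one-out kernel estimates, trimming and specific bandwidth rules, none of which are reproduced here, and the simulated data and fixed bandwidth are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def kernel_reg(z0, z, y, h):
    """Kernel estimate of E(y | z = z0) with a normal kernel."""
    w = norm.pdf((z - z0) / h)
    return np.sum(w * y) / np.sum(w)

def klein_spady_sketch(y, v, x, h=0.5, n_outer=5):
    """Alternate between a kernel estimate of F and a pseudo-ML update of b."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_outer):
        z = v - x @ beta                     # index values under the current b

        def neg_loglik(b):
            zb = v - x @ b
            # evaluate the current nonparametric F at each candidate index value
            p = np.clip([kernel_reg(zi, z, y, h) for zi in zb], 1e-6, 1 - 1e-6)
            return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

        beta = minimize(neg_loglik, beta, method="Nelder-Mead").x
    return beta

# simulated data: y = 1{w + b*x <= v} with true b = 1.5 and standard normal w
rng = np.random.default_rng(1)
n = 150
x = rng.normal(size=(n, 1))
v = rng.uniform(-3, 3, size=n)
y = ((rng.normal(size=n) + 1.5 * x[:, 0]) <= v).astype(float)
print(klein_spady_sketch(y, v, x))           # should be roughly [1.5]
```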
4 SUMMARY OF NOTATION

Symbol   Meaning
a, b     Random parameters
b        Random parameter vectors
K        Number of terms in univariate expression for a distribution
x        Vector of independent variables
y        Observed choice
i        Index for individuals
j        Index for alternatives
t        Index for time periods
v        Bid, univariate
P        Probability
D        Number of dimensions
F, H     General CDF
G        MEV exponent, choice probability generating function
Q, q     CDF on the unit interval
NOTES

1. I am grateful to Elisabetta Cherchi for comments. The work is supported by the Danish Strategic Research Council.
2. It is intended as an introduction, and readers wishing to apply these techniques should consult the original sources. The textbooks by Härdle (1990), Horowitz (1998), Pagan and Ullah (1999), Yatchew (2003) and Li and Racine (2007) are very useful introductions to the vast literature on nonparametric techniques in general.
3. Fixed effects models are discussed in most econometrics textbooks. Other points of entry are Andersen (1973), Lancaster (2000), Arellano (2003) and Magnac (2004, 2008).
4. The handbook chapter by Chen (2007) is a good starting point for readers who want to pursue the subject in more depth.
5. The books by Fruhwirth-Schnatter (2006) and McLachlan and Peel (2000) provide general introductions to finite mixture modelling.
6. This is an instance of the general identification problem of econometrics, concerning when it is possible to infer the value of parameters from data. In the present case, the function F is a parameter and it is infinite-dimensional. Had we been able to restrict F to belong to a certain class of distributions, say normal distributions, then identification would have been much easier.
REFERENCES

Andersen, E. (1973), Conditional Inference and Models for Measuring, Copenhagen: Mentalhygiejnisk Forlag.
Arellano, M. (2003), 'Discrete choices with panel data', Investigaciones Economicas, 27 (3), 423–58.
Bhat, C.R. (2009), 'A new generalized gumbel copula for multivariate distributions', technical report, University of Texas at Austin.
Bierens, H. (2008), 'Semi-nonparametric interval-censored mixed proportional hazard models: identification and consistency results', Econometric Theory, 24 (3), 749–94.
Chen, X. (2007), 'Large sample sieve estimation of semi-nonparametric models', in J.J. Heckman and E.E. Leamer (eds), Handbook of Econometrics, vol. 6, Amsterdam: Elsevier, ch. 76.
Coppejans, M. (2001), 'Estimation of the binary response model using a mixture of distributions estimator (MOD)', Journal of Econometrics, 102 (2), 231–69.
Cosslett, S.R. (1983), 'Distribution-free maximum likelihood estimator of the binary choice model', Econometrica, 51 (3), 765–82.
Daly, A. and M. Bierlaire (2006), 'A general and operational representation of generalised extreme value models', Transportation Research Part B: Methodological, 40 (4), 285–305.
Fosgerau, M. (2006), 'Investigating the distribution of the value of travel time savings', Transportation Research Part B: Methodological, 40 (8), 688–707.
Fosgerau, M. (2007), 'Using nonparametrics to specify a model to measure the value of travel time', Transportation Research Part A: Policy and Practice, 41 (9), 842–56.
Fosgerau, M. and M. Bierlaire (2007), 'A practical test for the choice of mixing distribution in discrete choice models', Transportation Research Part B: Methodological, 41 (7), 784–94.
Fosgerau, M. and S. Hess (2009), 'A comparison of methods for representing random taste heterogeneity in discrete choice models', European Transport, (42), 1–25.
Fosgerau, M. and S.F. Nielsen (2010), 'Deconvoluting preferences and errors: a model for binomial panel data', Econometric Theory, 26 (6), 1846–54.
Fosgerau, M., D.L. McFadden and M. Bierlaire (2012), 'Choice probability generating functions', Journal of Choice Modelling, 8, 1–18.
Fruhwirth-Schnatter, S. (2006), Finite Mixture and Markov Switching Models, Springer Series in Statistics, New York: Springer.
Härdle, W. (1990), Applied Nonparametric Regression, Econometric Society Monograph Series, Cambridge: Cambridge University Press.
Horowitz, J. (1998), Semiparametric Methods in Econometrics, New York: Springer.
Horowitz, J.L. (1992), 'A smoothed maximum score estimator for the binary response model', Econometrica, 60 (3), 505–31.
Joe, H. (1997), Multivariate Models and Dependence Concepts, London: Chapman & Hall.
Klein, R. and R. Spady (1993), 'An efficient semiparametric estimator for binary response models', Econometrica, 61 (2), 387–421.
Lancaster, T. (2000), 'The incidental parameter problem since 1948', Journal of Econometrics, 95 (2), 391–413.
Lee, L.F. (1995), 'Semiparametric maximum likelihood estimation of polychotomous and sequential choice models', Journal of Econometrics, 65 (2), 381–428.
Lewbel, A., D.L. McFadden and O. Linton (2011), 'Estimating features of a distribution from binomial data', Journal of Econometrics, 162 (2), 170–88.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton and Oxford: Princeton University Press.
Magnac, T. (2004), 'Panel binary variables and sufficiency: generalizing conditional logit', Econometrica, 72 (6), 1859–76.
Magnac, T. (2008), 'Logit models of individual choice', in S.N. Durlauf and L.E. Blume (eds), The New Palgrave Dictionary of Economics, Basingstoke: Palgrave Macmillan.
Manski, C.F. (1985), 'Semiparametric analysis of discrete response: asymptotic properties of the maximum score estimator', Journal of Econometrics, 27 (3), 313–33.
McLachlan, G. and D. Peel (2000), Finite Mixture Models, New York: John Wiley & Sons.
Nelsen, R.B. (2006), An Introduction to Copulas, 2nd edn, New York: Springer.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge: Cambridge University Press.
Train, K. (2009), Discrete Choice Methods with Simulation, 2nd edn, Cambridge: Cambridge University Press.
Yatchew, A. (2003), Semiparametric Regression for the Applied Econometrician, Cambridge: Cambridge University Press.
12 Attribute processing as a behavioural strategy in choice making

David Hensher

1 INTRODUCTION
Choosing is a complex process that is typically simplified by human beings in many ways, in order to ensure that the expected benefits outweigh the assumed costs of an outcome. Regardless of whether the context entails habitual or variety-seeking behaviour, individuals draw on decision rules, often referred to as heuristics, to provide guidance on making choices. Such rules might be associated with an accumulation of overt experience; but whatever the basis of rule selection, there are many forces at play, often called cognitive processes, conscious or unconscious, that dictate responses in settings that researchers use to study choice making. Despite the recognition in behavioural research, as long ago as the 1950s (see Simon, 1955; Svenson, 1992; also Busemeyer and Rieskamp, Chapter 3 in this volume), that cognitive processes have a key role in preference revelation, and the reminders throughout the choice literature (see McFadden, 2001) about rule-driven behaviour, we still see relatively little of the decision-processing literature incorporated into mainstream discrete choice modelling, which is increasingly becoming the preferred empirical context for individual preference measurement and willingness-to-pay derivatives. There is an extensive literature outside of discrete choice modelling focusing on these matters, broadly described as heuristics and biases, which is crystallized in the notion of process, in contrast to outcome. Choice has elements of both process and outcome, which in combination represent the endogeneity of choice in choice studies. The failure to recognize process, and the maintenance of a utility expression that is linear in parameters and additive in attributes (including allowance for attribute interactions) under full attribute and parameter preservation, is an admission, by default, that individuals faced with a choice situation deem all attributes (and alternatives) relevant, and that a fully compensatory decision rule is used by all agents to arrive at a choice. In recent years we have started to see a growing interest in alternative processing strategies at the attribute, alternative and choice set levels, with empirical evidence suggesting that the inclusion of process matters, in a non-marginal way, in the determination of important behavioural outputs such as estimates of willingness to pay, elasticities and predicted choice outcomes. Research contributions such as Swait (2001), Cantillo et al. (2006), Hensher (2006b, 2008), Beharry and Scarpa (2010), Cameron and DeShazo (2010), Puckett and Hensher (2008), Scarpa et al. (2009, 2012), Hensher and Rose (2009), Hensher et al. (2009), Hensher and Layton (2010), Hess and Hensher (2010) and Layton and Hensher (2010), among others, are examples of a growing interest in the way that individuals evaluate a package of attributes associated with mutually exclusive alternatives in real or hypothetical
markets, and make choices.1 The accumulating empirical evidence, in part represented in the references above, suggests that individuals use a number of strategies, derived from heuristics, to represent the way that information embedded within the attributes defining alternatives is used to process the context under assessment and arrive at a choice outcome. These include cancellation or attribute exclusion; degrees of attention paid to attributes in a package of attributes; referencing of new or hypothetical attribute packages around a recent or past experience; imposing thresholds on attribute levels to represent acceptable levels (for example, Swait, 2001; Hensher and Rose, 2012); and attribute aggregation where attributes are in common units (for example, Layton and Hensher, 2010). Gilovich et al. (2002) synthesize the evidence under the theme of heuristics and biases. Importantly, the heuristics are likely to be context specific, such that the design, and hence the nature of the information shown in stated choice experiments, for example, conditions in part the choice of rules adopted. The broad multidisciplinary literature on behavioural decision-making (see Gilovich et al., 2002) argues that individuals appear to adopt a range of 'coping' or editing strategies in hypothetical choice settings that are consistent with how they normally process information in real markets. Choice experiments have varying amounts of information to process; importantly, though, aligning 'choice complexity' with the amount of information to process is potentially misleading. Relevancy is what matters (Hensher, 2006a, 2006b, 2008), and the heuristics adopted by individuals to evaluate a circumstance are what need to be captured, through frameworks that can empirically identify the rules adopted by individuals, which may or may not be conditioned by the instrument being used to capture evidence. There are at least two ways in which the information used in processing might be empirically identified. One involves direct questioning of respondents after each choice scenario (what is increasingly referred to as self-stated intentions); the other involves probabilistic conditions imposed on the model form through specification of the utility expressions associated with each alternative, enabling inference on the way that specific attributes are processed. The two may be complementary, as recently investigated by Scarpa et al. (2012). The purpose of this chapter is to review some of the findings and models that have emerged from the literature that might be used to gain an understanding of choice making and hence improve the choice-modelling process. This chapter focuses on the role of attribute processing in stated choice experiments, the dominant discrete choice setting within which attribute processing has been studied, but we note that the heuristics also apply in the context of revealed preference data.2 The chapter draws on both direct questioning and inferential methods to synthesize what is known about the role of mixtures of processing rules, in order to establish the behavioural implications for key outputs such as marginal willingness to pay. The functional forms presented herein, as well as responses to self-stated intention questions, enable the analyst to infer, up to a probability, the presence of some very specific attribute processing strategies, such as attribute non-attendance in the presence or otherwise of attribute thresholds and referencing.
We restrict the scope of this chapter, given the extensive literature on heuristics and biases, to attribute processing strategies that researchers have found to be behaviourally appealing, to date, in the context of discrete choice analysis studies. We focus on the
following five themes: attribute non-attendance, attribute thresholds, the majority of confirming dimensions, reference point revision and value learning. We suggest, however, that it is too early to offer a view as to whether one decision-making process is gaining greater empirical support in 'explaining' real choice making. What we can say is that there is a need for much more research on process to complement outcome in choice modelling. The current disproportionate interest in attribute non-attendance may in large measure reflect the relative ease of studying this phenomenon, in contrast to the much more complex non-linear propositions associated with the other attribute processing rules presented in this review.
2 ATTRIBUTE NON-ATTENDANCE
A behavioural rule which is attracting particular attention in stated choice studies is the extent to which respondents attend to, or ignore, one or more attributes in processing the information on offer, resulting in a (stated) choice outcome. Some agents do not appear to put any weight on some attributes. The question then is whether the heterogeneity with respect to placing a zero weight on some attributes is effectively exogenous, that is, simply preference heterogeneity, whether it is a function of the characteristics of the choice sets agents faced, or whether, more likely, both factors play a role. One can probably never rule out that there is exogenous preference heterogeneity among agents with respect to placing a zero weight on one or more attributes, but it is here that running choice experiments can show that the nature of the choice sets that agents see influences the pattern and extent of particular attributes being given no weight. Given a continuum of relevance, distinguishing a zero weight from a very low level of relevance (approximating but not equal to zero) creates a research challenge (see below). We present below evidence on the contextual influence of the design of the choice experiment. In the popular stated choice approach it is assumed, in the main, that all attributes are processed, in what DeShazo and Fermo (2004) describe as the passive bounded rationality model. This model assumes that individuals attend to all information in the choice set, but increasingly make mistakes in processing that information as the volume of information increases. Contrasting this is the rationally adaptive model, which assumes that individuals recognize that their limited cognition has positive opportunity costs. Whether rationally adaptive behaviour is a product of the survey instrument and/or the nature of an individual's processing of any information is an empirical matter. In stated choice (SC) studies, respondents are typically asked to choose their preferred alternative among several hypothetical alternatives in a sequence of experimentally designed choice tasks (see Rose and Bliemer, Chapter 7 in this volume). The standard behavioural assumption underlying most SC studies is that respondents make trade-offs between all attributes describing each of the alternatives, and are expected to choose their most preferred alternative in a choice set. This rules out the possibility that respondents focus solely on a subset of offered attributes, ignoring all other differences between the alternatives (see Hensher, 2006a). Ignoring attributes in the choice task implies some form of non- (or semi-)compensatory behaviour, because no matter how much the level of a given attribute is improved, the improvement will fail to compensate for worsening in the levels of other attributes if the attribute itself is ignored by the respondent
(Lockwood, 1996; Spash, 2000; Sælensminde, 2002; Rekola, 2003), or what Rigby and Burton (2006) describe as 'disinterest'. There may be one exception, where choosing to ignore an attribute is influenced by the levels of the other attributes, and hence a switch between compensatory and non-compensatory behaviour may be legitimate as the attribute levels change within a choice experiment. This can be tested at the choice set level (see Puckett and Hensher, 2008), but is problematic if the test relates to the entire set of choice sets. Attribute non-attendance (henceforth AN-A) can have serious consequences for the derivation of prediction and welfare estimates, especially when the object of neglect is the monetary attribute, such as the cost of an alternative, although it applies equally to the numerator in any calculation of willingness to pay. The detection and statistical handling of AN-A raise technical issues for the practice of discrete choice modelling, especially where specific processing rules are observed or predicted for a sample, which then have to be applied to a population.

2.1 Two Emerging Approaches
Two choice methods are emerging to investigate the role of specific heuristics: one involving supplementary questions on whether specific attributes are ignored, referred to as self-stated intentions (see Hensher et al., 2005, for an initial contribution), and the other involving a specification of a model that can reveal the extent to which each attribute is preserved across a sample, without the need for supplementary data (for example, Hess and Hensher, 2010). Although it is not possible to say which method comes closer to the 'truth' in capturing the processing strategy, there is ongoing research designed to understand the behavioural implications of each method and, in time, to establish a mapping between the two methods (see Hess and Hensher, 2013; Scarpa et al., 2012).

2.2 Attribute Non-attendance on Supplementary Questions

Hensher and co-authors have initiated several explorations of attribute non-attendance within a standard multivariate discrete choice setting. For example, the attribute ranges in Hensher (2006a) were varied simultaneously across all attributes, but the self-stated AN-A response is available only at the level of the individual, not the choice set. In contrast, Cameron and DeShazo (2011) consider differences in the ranges of attributes within a single choice set as additional potential determinants of attention, and therefore of apparent marginal utilities, and ultimately of estimates of willingness to pay. Very relevant to Cameron and DeShazo (2011) is Hensher's finding that individuals' processing strategies depend on the nature of the attribute information in the choice set, not just the quantity of such information (that is, the number of attributes). Hensher et al. (2005) use a specific follow-up question about which attributes the respondent did not use in making their choices. Hensher et al. (2007) also use the same follow-up question to identify nine distinct attribute processing rules. Respondent adherence to these rules is modelled as stochastic. The authors then use a modified mixed logit model which conditions each parameter on whether a respondent included or excluded an attribute in their attribute processing strategy. In their conclusions, the authors acknowledge that there may be differences 'between what people say they think
and what they really think' (Hensher et al., 2005, p. 216), and they question whether the 'simply conscious statements' made by survey respondents, no matter how much detail is obtained, represent an adequate measure of information processing (IP). They emphasize that, regardless of the source of information on attribute processing, individuals' IP strategies 'should be built into the estimation of choice data from stated choice studies' (Hensher et al., 2005, p. 214). A related study is Puckett and Hensher (2008), which builds on Hensher et al. (2006a) in that it considers the effects of APSs utilized by respondents for every alternative in every choice set, including across choice tasks faced by a given respondent. This approach can accommodate cases where attribute level mixes are outside of the acceptable choice bounds for the individual. The wording of their debriefing question for each choice was: 'Is any of the information shown not relevant when you make your choice? If an attribute did not matter to your decision, please click on the label of the attribute below. If any particular attributes for a given alternative did not matter to your decision, please click on the specific attribute.' Subjective all-or-nothing attention to different attributes is thus elicited directly from each respondent, rather than being inferred from choice behaviour.

2.3 The Role of the Design of the Choice Experiment in AN-A Defined by Self-stated Intentions
The actual design of a choice experiment may itself be a source of heterogeneity in induced attribute processing, in recognition that the design can itself induce particular processing strategies. Hensher (2006a) was the first study to investigate the dimensionality of a stated choice experiment, using 16 different choice experiment designs, each varying according to the number of attributes (3, 4, 5, 6), the number of alternatives (2, 3, 4), the number of choice sets (6, 9, 12, 15) and the range of each attribute (narrower, base, wider). See Table 12.1 for a summary of the designs. Attribute non-attendance was defined by self-stated intentions. The key findings from this study, in terms of the number of attributes attended to, are:

1. The probability of considering more attributes increases dramatically as the number of levels per attribute decreases, ceteris paribus.
2. The probability of considering more attributes from the offered set decreases as an attribute's range narrows, ceteris paribus. That is, respondents ignore more attributes when the difference between attribute levels is small. This result is perhaps due to the fact that evaluation of small differences is more difficult than evaluation of large differences. An important implication is that if an analyst continues to include, in model estimation, an attribute across the entire sample that is ignored by a respondent, then there is a much greater likelihood of mis-specified parameter estimates in circumstances where the attribute range is narrower rather than wider.3
3. These two results can be combined and stated in the converse as follows: respondents tend to consider more attributes (that is, ignore fewer attributes) when the attributes have only a few levels that differ greatly, such that evaluation of each attribute is easier. Overall, the respondent seems to trade off effort spent on each attribute against the number of attributes considered.
Table 12.1 The sub-designs of the overall design

Number of      Number of      Number of     Number of levels   Range of
choice sets    alternatives   attributes    of attributes      attribute levels
15             3              4             3                  Base
12             3              4             4                  Wider than base
15             2              5             2                  Wider than base
9              2              5             4                  Base
6              2              3             3                  Wider than base
15             2              3             4                  Narrower than base
6              3              6             2                  Narrower than base
9              4              3             4                  Wider than base
15             4              6             4                  Base
6              4              6             3                  Wider than base
6              3              5             4                  Narrower than base
9              4              4             2                  Narrower than base
12             3              6             2                  Base
12             2              3             3                  Narrower than base
9              2              4             2                  Base
12             4              5             3                  Narrower than base

Note: Column 1 refers to the number of choice sets. The 16 rows represent the set of designs.

Source: Hensher (2006a).
4. As we increase the number of alternatives to evaluate, ceteris paribus, the importance of considering more attributes increases, as a way of making it easier to differentiate between the alternatives. This is an important finding that runs counter to some views, for example, that individuals will tend to ignore increasing amounts of attribute information as the number of alternatives increases. Our evidence suggests that the processing strategy is dependent on the nature of the attribute information, and not strictly on the quantity.
5. Overall, we see a picture emerging that design dimensionality seems to have less of an influence on the attribute processing strategy when we have fewer items to process. This makes good sense, but should not be taken to imply that designs with fewer items are preferred; rather, preference heterogeneity in invoking an attribute processing strategy appears to decline substantially as the information content declines, for real or spurious reasons. Contrariwise, individuals appear to increasingly invoke a relevancy strategy as the amount of information to process increases. The need to capture this growing heterogeneity in IP strategies is clear, and it should be accounted for in behavioural choice models.
6. The evidence on sources of influence on how many attributes are considered, relative to the full set offered, is important in revealing candidate influences on attribute processing, and the extent to which the empirical policy outputs, such as willingness to pay, vary as a consequence of the SC design and its context. Where we might find evidence of attribute reduction (through exclusion and/or aggregation), we might reasonably speculate that the selected attribute processing strategy has elements of coping and relevancy. This should not necessarily be interpreted as a response to complexity, but as part of the natural process of decision making.
2.4 The Role of Attribute Non-attendance through Model Inference
Attribute non-attendance on supplementary questions is designed to establish whether a respondent ignored an attribute or not: they could be asked either after each choice set or after completing all choice scenario assessments. However, as argued in a number of papers, such as Hensher and Rose (2009), Hess and Hensher (2010) and Hensher (2010), there is concern about the reliability of responses to such supplementary questions. Although the jury is still out on this issue, there is growing interest in identifying the role of attribute non-attendance through model inference, rather than by directly asking each respondent. The most recent examples are Scarpa et al. (2010), Hess and Hensher (2010), Hensher and Greene (2010) and Hole (2011). A growing research theme is the issue of how to incorporate this phenomenon in statistical models when data on self-reported AN-A are not available or are deemed problematic. There are some intuitive ways of addressing this issue, building on basic models that are commonly employed by practitioners. In particular, panel mixed logit models are an appealing setting within which to account for repeated attribute exclusion in the evaluation of proposed alternatives by a given respondent. What is intended here is that the identification of AN-A behaviour is achieved by analysing the observed response pattern using a statistical model with degenerate distributions of taste intensities at zero, which implies non-attendance. This contrasts with the approaches that rely on self-stated intentions (Carlsson et al., 2008; Hensher, 2007, 2008), which ask respondents which attributes they paid attention to or considered important. Methods are now available that do not require self-reported information on attendance (see Hess and Hensher, 2010, 2013; Hoyos et al., 2010; Hole, 2011). Hess and Hensher (2010) infer AN-A through the analysis of respondent-specific parameter distributions, obtained through conditioning on stated choices. Their results suggest that some respondents do indeed ignore a subset of explanatory variables. There is also some evidence that these inferred attribute processing strategies are not necessarily consistent with the responses given to supplementary questions about attribute attendance, when a mapping is available. This raises questions about how both types of data can be used to assist in improving behavioural relevance. The results in Hess and Hensher (2010), for example, show that respondents who indicate that they ignored a given attribute often still show non-zero sensitivity to that attribute, albeit one that is (potentially substantially) lower than that for the remainder of the population. A possible interpretation of these results is that respondents who indicate that they did not attend to a given attribute simply assigned it a lower importance, and that the probability of indicating that they ignored a given attribute increases as the perceived importance of that attribute is reduced, an argument put forward recently by Hess (2011). In a similar manner, Scarpa et al. (2009) implement two ways of modelling AN-A; the first involves constraining coefficients to zero in a latent class framework, while the second is based on stochastic attribute selection, and grounded in Bayesian estimation. In all studies, the results indicate that accounting for non-attendance significantly improves model fit in comparison to models that assume full attribute attendance, and yields estimates of willingness to pay for specific attributes that are typically different.

2.4.1 The growing popularity of the latent class framework to accommodate probabilistic decision processes

Much marketing research is designed to determine what people do not care about. What does cause a problem is heterogeneity in the sense that there is a substantive fraction of the population that places a zero weight on some attribute. Latent class models with a zero coefficient on some attribute for some classes are an ideal statistical modelling solution for this phenomenon. There is a large literature in economics (for example, Pudney, 1989) that looks at how to model corner solutions. Not allowing for corner solutions gets one trapped into absurd propositions, as suggested above, whereby, for example, increasing the contrast in the colour of the paint under the carpets in an automobile (which the agent does not see and does not care about) is a way to compensate for lower fuel economy. It is common to see Tobit models, hurdle/spike models and various types of count data models. Much of the research output reported here can be seen as the analogue of how to do this in choice models with respect to individual attributes. Hess and Rose (2007), Hensher and Greene (2010) and Campbell et al. (2011) use a latent class framework (without drawing on evidence from stated non-attendance questions) as a way of capturing a probabilistic decision rule process, in which specific restrictions are imposed on the utility expressions for each class, to represent hypotheses of pre-defined attribute processing strategies. A growing number of authors are showing that the constrained latent class model appears to outperform its competitors that are based on continuous mixing of taste, such as mixed logit. However, while a number of the classes relate to attribute non-attendance (and other candidate attribute processing rules), these studies excluded the possibility of combinations of more than one attribute non-attendance rule in a class. Investigating all combinations, while appealing, becomes increasingly complex and infeasible as the number of attributes (K) increases, given a 2^K rule for the combinations of attendance or non-attendance. With four attributes, for example, we have 16 possible combinations, and with eight attributes we have 256. Nevertheless the approach has appeal and is presented below as one way of recognizing attribute processing heterogeneity. Formally, assume respondent i = 1, 2, …, I is asked to select from amongst J alternatives, j = 1, 2, …, J. Assuming that the basic analytical framework is a standard MNL choice model, the probability that respondent i chooses alternative j is given as

Prob(i, j) = exp(β′x_{i,j}) / Σ_{j=1}^{J} exp(β′x_{i,j}),   (12.1)
where x_{i,j} represents the attributes associated with alternative j as observed by respondent i, and β is a vector of parameter weights related to the attributes. Non-attendance is accommodated by supposing that individuals sort themselves into one of 2^K (q = 1, …, Q) classes, distinguished by which of the attributes were considered in their choice process (see Hole, 2011). If the configuration chosen by the individual is not directly observed (as, for example, in a supplementary question), then
in the model this sorting can only be done probabilistically. In the context of (12.1), we can model this by writing equation (12.2):

Prob(i, j | q) = exp(β_q′x_{i,j}) / Σ_{j=1}^{J} exp(β_q′x_{i,j}).   (12.2)
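As a minimal sketch, the class-conditional probabilities in (12.1)-(12.2) amount to a softmax over attribute utilities; the attribute values and coefficients below are hypothetical.

```python
import numpy as np

def mnl_probs(X, beta):
    """MNL choice probabilities as in (12.1)-(12.2) for one respondent.

    X    : (J, K) matrix of attributes x_{i,j}
    beta : (K,) parameter vector (a class-specific beta_q in (12.2))
    """
    v = X @ beta
    ev = np.exp(v - v.max())     # stabilised exponentials
    return ev / ev.sum()

print(mnl_probs(np.array([[20.0, 4.0], [12.0, 6.5]]), np.array([-0.06, -0.35])))
```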
β_q is one of the 2^K possible vectors β in which m of the elements are zero and K − m are nonzero. Specifically, q can be thought of as a masking vector of the form (δ_1, δ_2, δ_3, δ_4, …), where each δ takes the possible values 0 or 1. β_q is then the element-for-element product of this masking vector with the standard coefficient vector β. For example, for two attributes (giving four classes), the parameter vectors would be β_1 = (0, 0), β_2 = (β_A, 0), β_3 = (0, β_B), β_4 = (β_A, β_B).4 An important part of the underlying theory is that the class q is not defined by the attribute taking the value zero within the class, but by the corresponding coefficient taking the value zero. Thus the 'random parameters' aspect of the model is a discrete distribution of preference structures across individuals, who are distinguished by whether they pay attention to the particular attribute or not. Since the sorting is not observable, we cannot directly construct the likelihood function for estimation of the parameters. In keeping with the latent class approach, we need to estimate a set of probabilities (π_q) that each individual i falls into class q. While this could be conditioned on individual characteristics, or indeed any exogenous information such as respondents' stated reasons for not attending to an attribute (for example, it is not important, it simplified choosing, or the attribute levels were out of an acceptable range), in this case we have assumed that the same set applies equally to all respondents, so that the probabilities reflect the class proportions. The marginal probability that individual i will choose alternative j is found by averaging over the classes, as in (12.3):

Prob(i, j) = Σ_{q=1}^{2^K} π_q [ exp(β_q′x_{i,j}) / Σ_{j=1}^{J} exp(β_q′x_{i,j}) ], where Σ_{q=1}^{2^K} π_q = 1.   (12.3)
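The following sketch constructs the 2^K masked coefficient vectors and evaluates the class-weighted probability in (12.3); the coefficients, class shares and attribute levels are hypothetical.

```python
from itertools import product

import numpy as np

def class_coefficient_vectors(beta, K):
    """All 2^K masked versions of beta: element-for-element products with
    masking vectors (d_1, ..., d_K), each d in {0, 1}."""
    masks = np.array(list(product([0, 1], repeat=K)))
    return masks * beta

def non_attendance_mnl_prob(X, pis, class_betas):
    """Eq. (12.3): class-probability-weighted MNL probabilities.

    X           : (J, K) attribute matrix for one choice set
    pis         : (2^K,) class membership probabilities, summing to 1
    class_betas : (2^K, K) masked coefficient vectors
    """
    prob = np.zeros(X.shape[0])
    for pi_q, b_q in zip(pis, class_betas):
        v = np.exp(X @ b_q)
        prob += pi_q * v / v.sum()
    return prob

# hypothetical two-attribute example (time, cost), giving four classes
beta = np.array([-0.06, -0.35])
class_betas = class_coefficient_vectors(beta, K=2)   # (0,0), (0,b_B), (b_A,0), (b_A,b_B)
pis = np.array([0.10, 0.15, 0.20, 0.55])
X = np.array([[20.0, 4.0], [12.0, 6.5]])             # two alternatives
print(non_attendance_mnl_prob(X, pis, class_betas))
```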
As formulated, this is a type of finite mixture, or latent class, model. It differs from more familiar formulations in that the classes have specific behavioural meaning, as opposed to merely being groupings defined on the basis of responses as in the strict latent class formulation, hence the reference to a probabilistic decision process model. In the presence of AN-A, the nonzero elements in β_q can be allowed to be the same or free across the classes. Estimation of the probabilistic decision process model is straightforward as a latent class MNL model with linear constraints on the coefficients, as suggested above, and can allow for random as well as fixed parameters within each class (as in Collins et al., 2012; Hensher et al., 2012; Hess et al., 2011).
As an example of this approach, Hensher et al. (2012), using data collected in Australia in the context of car commuters choosing between tolled and untolled roads, estimated a multinomial logit (MNL) model in which all attributes are assumed to be attended to, and then a probabilistic decision process model with 2^K possible attribute attendance 'rules'. The model that accounts for attribute non-attendance was a significant improvement on the model that assumes all attributes are attended to, in terms of log-likelihood and the Bayes information criterion (BIC). Although the probabilistic decision process model has additional parameters, namely the class probabilities π_q, the choice probability part of the model has the same number of parameters as MNL. For this application, mean values of travel time savings (VTTS) are obtained for the attribute non-attendance model. To calculate the overall VTTS across the attribute attendance rule classes, we have to weight each class by the membership probability. Under allowance for AN-A, a mean VTTS of $12.77 per person per hour is obtained, which is lower than the MNL model mean VTTS of $12.81 per person per hour; however, if we were to exclude the two classes where there is no time-cost trade-off, we would obtain $17.96 per person per hour. This suggests an under-estimate from the MNL model of the mean VTTS by 36 per cent. However, this implies that for some respondents (28.9 per cent in the sample) a VTTS does not exist, which is doubtful. This is a major concern for applications of VTTS, and indeed any WTP study (see Scarpa et al., 2009), since we can reasonably assume that everyone does in reality value travel time savings, despite the inability to measure this under certain AN-A rules. We believe that this situation has arisen as a result of the design of the stated choice experiment. In particular, the range and levels of specific attributes might be such that some respondents do not see merit in some of the levels of times and costs being traded, with one or both attributes having levels that do not matter.5 In real markets, it is not unreasonable to suggest that there exist levels of time and cost that do matter, implying that the empirical instrument might not be adequate to pick up the real behavioural response at work. However, there might be some individuals who would deem a specific attribute not relevant no matter how wide an attribute range was considered (for example, a very wealthy person who does not care about the running cost), and hence never trade off time with running cost. Furthermore, a very low level of an attribute might be processed in such a way that relevance only applies when a specific threshold level is reached (see the following section). This suggests that a more careful assessment of respondent-specific attribute ranges is called for in future choice experiment designs. We suspect this finding is not uncommon in choice experiments, but it is never known until an analyst undertakes the modelling exercise. Scarpa et al. (2009), for example, find that over 90 per cent of the sample ignore the cost attribute in the context of a stated preference survey designed to value landscapes in Ireland, where the cost attribute was specified as the value in euros that the respondent would personally have to pay per year through their income tax and value added tax contributions.
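As a small numerical illustration of this class weighting (with hypothetical class shares and class-specific VTTS values, not those of Hensher et al., 2012), note how excluding the classes with no time-cost trade-off raises the conditional mean:

```python
import numpy as np

# hypothetical class shares (pi_q) and class-specific VTTS ($/hr);
# classes where time or cost is not attended have no defined trade-off (NaN)
pi = np.array([0.40, 0.31, 0.17, 0.12])
vtts = np.array([15.0, 22.0, np.nan, np.nan])    # NaN: no time-cost trade-off

attended = ~np.isnan(vtts)
overall = np.nansum(pi * vtts)                    # treats no-trade-off classes as zero
conditional = np.sum(pi[attended] * vtts[attended]) / np.sum(pi[attended])
print(overall, conditional)                       # 12.82 versus 18.06 in this toy case
```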
Greene and Hensher (2013), Bujosa et al. (2010) and Hess et al. (2011) introduce a natural extension of the fixed parameter latent class model as a random parameter latent class model, which allows for another layer of preference heterogeneity within each class; however, to date only Hess et al. (2011), Hensher et al. (2013) and Collins et al. (2012)
have developed this model form in the context of AN-A. What we then have is a latent class model that allows for heterogeneity both within and across groups. To accommodate the two layers of heterogeneity, we allow for continuous variation of the parameters within classes. The latent class aspect of the model is given as (12.4) and (12.5):

f(y_i | x_i, class = q) = g(y_i | x_i, β_{i|q})   (12.4)

Prob(class = q) = π_q(θ), q = 1, …, Q.   (12.5)

The within-class heterogeneity is structured as:

β_{i|q} = β_q + w_{i|q}   (12.6)

w_{i|q} ~ E[w_{i|q} | X] = 0, Var[w_{i|q} | X] = Σ_q   (12.7)
where the X indicates that w_{i|q} is uncorrelated with all exogenous data in the sample. We typically assume that the underlying distribution for the within-class heterogeneity is normal, with mean 0 and covariance matrix Σ_q. In a given application, it may be appropriate to further assume that certain rows and corresponding columns of Σ_q equal zero, indicating that the variation of the corresponding parameter is entirely across classes. The contribution of individual i to the log-likelihood for the model is obtained for each individual in the sample by integrating out the within-class heterogeneity and then the class heterogeneity. We can allow for a panel data setting, hence the observed vector of outcomes is denoted y_i and the observed data on exogenous variables are collected in X_i = [X_{i1}, …, X_{iT_i}]. An individual is assumed to engage in T_i choice situations, where T_i > 1. The generic model is given in (12.8):

f(y_i | X_i, β_1, …, β_Q, θ, Σ_1, …, Σ_Q) = Σ_{q=1}^{Q} π_q(θ) ∫_{w_i} [ Π_{t=1}^{T_i} f(y_{it} | (β_q + w_i), X_{it}) ] h(w_i | Σ_q) dw_i   (12.8)
The model is called a latent class, mixed multinomial logit (LC-MMNL) model. Individual i chooses among J alternatives, with conditional probabilities given as (12.9):

f(y_{it} | (β_q + w_i), x_{it}) = exp(Σ_{j=1}^{J} y_{it,j} (β_q + w_i)′x_{it,j}) / Σ_{j=1}^{J} exp((β_q + w_i)′x_{it,j}), j = 1, …, J,   (12.9)

where y_{it,j} = 1 for the j corresponding to the alternative chosen and 0 for all others, and x_{it,j} is the vector of attributes of alternative j for individual i in choice situation t. Applications are given in Greene and Hensher (2013), Bujosa et al. (2010), Hess et al. (2011) and Hensher et al. (2013). Hess et al. (2011) show forcefully that particular modelling assumptions influence the appearance of how much attribute non-attendance there is, versus simply a substantial fraction of the sample placing only weak weight (in a mixed logit sense) on the attribute.
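A minimal simulation sketch of the probability in (12.8)-(12.9) follows; the array shapes, function names and toy data are assumptions for illustration, and the number of draws R would be far larger in practice.

```python
import numpy as np

def mnl_prob(V):
    """Multinomial logit probabilities from a vector of utilities V."""
    eV = np.exp(V - V.max())              # subtract max for numerical stability
    return eV / eV.sum()

def lcmmnl_person_prob(X, chosen, class_betas, class_pis, class_sds, R=200, seed=0):
    """Simulated probability of one person's T observed choices, eqs (12.8)-(12.9).

    X           : (T, J, K) array of attributes
    chosen      : (T,) chosen alternative indices
    class_betas : (Q, K) class-specific mean coefficients beta_q
    class_pis   : (Q,) class membership probabilities pi_q
    class_sds   : (Q, K) std devs of the normal within-class heterogeneity
    """
    rng = np.random.default_rng(seed)
    T, J, K = X.shape
    total = 0.0
    for pi_q, b_q, s_q in zip(class_pis, class_betas, class_sds):
        draws = b_q + s_q * rng.standard_normal((R, K))   # beta_q + w_i, R draws
        sims = np.empty(R)
        for r, beta in enumerate(draws):
            p = 1.0
            for t in range(T):
                p *= mnl_prob(X[t] @ beta)[chosen[t]]     # product over choice tasks
            sims[r] = p
        total += pi_q * sims.mean()       # integrate out w, then mix over classes
    return total

# toy usage: 2 classes, 3 alternatives, 2 attributes, 4 choice tasks
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 3, 2))
chosen = np.array([0, 2, 1, 0])
print(lcmmnl_person_prob(X, chosen,
                         class_betas=np.array([[0.0, 0.0], [1.0, -0.5]]),
                         class_pis=np.array([0.3, 0.7]),
                         class_sds=np.array([[0.0, 0.0], [0.5, 0.2]])))
```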
The more general point is that any estimate of the fraction of the sample found to be doing something like placing a zero weight on an attribute is conditional on other maintained assumptions. One could, for instance, introduce different types of non-linearities and also shift this fraction. Hensher et al. (2013) conclude that, despite the marginal influence of preference heterogeneity on the overall fit of the models, they find potentially important behavioural evidence to suggest that the inclusion of random parameters may be a way of accommodating small marginal disutilities (in contrast to AN-A set equal to zero marginal disutility), and small differences in marginal disutilities (in contrast to equal marginal disutilities under aggregated common-metric attributes), as observed by a 'move back' to full attribute attendance when fixed parameters become random parameters under attribute processing. If this argument has merit and can be confirmed using other data-sets, they suggest that they may have identified one way of recognizing what the broader literature (for example, Hess et al., 2011) refers to as low sensitivity, in contrast to zero sensitivity. An important output is willingness to pay (WTP) estimates, computed using the familiar result WTP = −β_x/β_cost. For the most general model, with random parameters within class, since there is heterogeneity of the parameters within the classes as well as across classes, the result is best averaged to produce an overall estimate. The averaging can be undertaken for the random parameters within each class and then again across classes, using the posterior probabilities as weights. Collecting the results, the procedure is given as (12.10) (from Hensher et al., 2013):

WTP_x = (1/N) Σ_{i=1}^{N} Σ_{q_APR=1}^{Q_APR} {π_{q_APR}(θ̂) | i} [ ((1/R) Σ_{r=1}^{R} L_{ir|q_APR} (−β̂_{x,ir|q_APR} / β̂_{cost,ir|q_APR})) / ((1/R) Σ_{r=1}^{R} L_{ir|q_APR}) ]
      = (1/N) Σ_{i=1}^{N} Σ_{q_APR=1}^{Q_APR} {π_{q_APR}(θ̂) | i} (1/R) Σ_{r=1}^{R} W_{ir|q_APR} WTP_{x,ir|q_APR},   (12.10)

where R is the number of draws in the simulation and r indexes the draws; β̂_{x,ir|q_APR} = β̂_{x|q_APR} + σ̂_{x|q_APR} w_{x,ir|q_APR}, and likewise for β̂_{cost,ir|q_APR}; L_{ir|q_APR} is the contribution of individual i to the class-specific likelihood; and π_{q_APR}(θ̂) | i is the estimated posterior class probability:

π_{q_APR}(θ̂) | i = π_{q_APR}(θ̂) (1/R) Σ_{r=1}^{R} Π_{t=1}^{T_i} f(y_{it} | (β̂_{q_APR} + w_{i,r}), X_{it}) / [ Σ_{q_APR=1}^{Q_APR} π_{q_APR}(θ̂) (1/R) Σ_{r=1}^{R} Π_{t=1}^{T_i} f(y_{it} | (β̂_{q_APR} + w_{i,r}), X_{it}) ].   (12.11)

θ is the vector of latent class parameters attached to candidate sources of systematic influence on class membership, and the within-class heterogeneity, w_i, is structured as:

β_{i|q_APR} = β_{q_APR} + w_{i|q_APR}   (12.12)

w_{i|q_APR} ~ E[w_{i|q_APR} | X] = 0, Var[w_{i|q_APR} | X] = Σ_{q_APR}   (12.13)
where the X indicates that w_{i|q_APR} is uncorrelated with all exogenous data in the sample. See Hensher et al. (2013) for more details.
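The following sketch mimics the weighting logic of (12.10)-(12.11) on simulated draw-level likelihoods and WTP values; all shapes and numbers are hypothetical assumptions, not estimates from the cited studies.

```python
import numpy as np

def posterior_weighted_wtp(pi, like_draws, wtp_draws):
    """Posterior-probability-weighted WTP in the spirit of eqs (12.10)-(12.11).

    pi         : (Q,) prior class shares pi_q(theta_hat)
    like_draws : (N, Q, R) likelihood contributions L_ir|q per draw
    wtp_draws  : (N, Q, R) draw-level WTP values -beta_x / beta_cost
    """
    sim_like = like_draws.mean(axis=2)                       # (N, Q): average over draws
    post = pi * sim_like
    post /= post.sum(axis=1, keepdims=True)                  # posterior class prob, eq. (12.11)
    w = like_draws / like_draws.sum(axis=2, keepdims=True)   # within-class draw weights
    person_wtp = (post * (w * wtp_draws).sum(axis=2)).sum(axis=1)
    return person_wtp.mean()                                 # average over the sample

# hypothetical shapes: 100 people, 2 classes, 50 draws
rng = np.random.default_rng(7)
pi = np.array([0.4, 0.6])
L = rng.uniform(0.01, 1.0, size=(100, 2, 50))
wtp = rng.normal(15.0, 4.0, size=(100, 2, 50))
print(posterior_weighted_wtp(pi, L, wtp))
```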
2.5 Recent New Approaches to Accommodating Attribute Non-attendance
Hess and Hensher (2013), building on the contribution of Hensher (2008), note that AN-A is treated as an exogenous rule in the majority of studies, when in fact it may be endogenous,6 just like the choice outcome. They explicitly recognize the endogeneity induced by attribute non-attendance and condition attribute parameters on underlying unobserved attribute importance ratings (or indeed any appropriate supplementary information that may be available). They develop a hybrid model system involving attribute processing and outcome choice models in which latent variables are introduced as explanatory variables in both parts of the model, explaining the answers to supplementary attribute processing questions and explaining heterogeneity in marginal sensitivities in the choice model. The resulting empirical model explains how lower latent attribute importance leads to a higher probability of indicating that an attribute was ignored or ranked as less important, as well as increasing the probability of a reduced value for the associated marginal utility coefficient in the choice model. The model does so by treating the answers to supplementary information processing (IP) questions as dependent rather than explanatory variables (which may themselves be conditioned on a series of supplementary questions focused on why an attribute was not attended to), hence avoiding the potential risk of endogeneity bias and measurement error. We refer the reader to this paper for further details.

2.6 Illustrative Empirical Implications of AN-A

To complete the discussion of AN-A, we provide some examples of mean estimates of WTP under full attribute attendance and AN-A. The indicative examples, summarized in Table 12.2, are drawn from transport, agricultural and environmental case studies. The directional change in mean WTP estimates varies across the studies, as well as within a single application area. The important point is that there are noticeable differences in mean estimates, which is enough evidence to raise questions about the implications of not considering AN-A. When translated into an aggregate measure of welfare benefits, these differences are sufficiently large to impact the net benefits of specific projects or policy initiatives.

Table 12.2 Illustrative impacts on willingness to pay of AN-A

Context                                          Full attribute attendance   Attribute non-attendance   Reference
Value of travel time savings (VTTS) car ($AUD)   10.17 (2.77)                12.42 (1.63)               Hensher et al. (2013)
Colour of beef ($US)                             4.93                        3.86                       Scarpa et al. (2012)
Animal welfare ($US)                             15.04                       11.69                      Scarpa et al. (2012)
Breed origin                                     5.98                        4.46                       Scarpa et al. (2012)
VTTS air travel ($AUD)                           25.14 (7.1)                 38.4 (11.8)                Rose et al. (in press)
VTTS free flow (car) ($AUD)                      25.87 (60.29)               22.45 (41.43)              Hess and Hensher (2010)
Stonewalls (Euros per annum)                     53.22                       101.27                     Campbell et al. (2011)
Farmyard tidiness (Euros per annum)              198.88                      60.59                      Campbell et al. (2011)

Note: The numbers in brackets are standard deviations where reported.
3 ATTRIBUTE THRESHOLDS
Unlike attribute non-attendance, which assumes in a strict binary fashion that an attribute is either ignored or not, research suggests that people consider the particular level of an attribute and make judgements and choices based on specific thresholds, possibly only attending to an attribute if it satisfies some threshold condition. In this sense, attribute thresholds are inherently linked to a possible explanation for attribute non-attendance. This section looks at the way that perceived attribute thresholds (or lower and upper cut-offs) are used by respondents to condition the role of an attribute in its contribution to the acceptability, and hence choice, of an alternative.

There is a growing literature on attribute thresholds, with some studies imposing analytical distributions on cut-offs (including just noticeable differences, as in Cantillo et al., 2006), and other studies asking supplementary questions (for example, Swait, 2001) prior to the stated choice questions to establish lower and upper bounds on acceptable attribute levels. Studies in transportation in the 1970s (for example, Hensher, 1976) highlighted the presence of asymmetric thresholds, but did not incorporate them into choice models.

Individuals are thought to adopt attribute thresholding in the way they process the offered attribute levels associated with each alternative. Attribute thresholds have lower and upper bounds, which may be subject to measurement error, and may also be revised depending on the levels offered by other attributes. That is, there is 'softness' (in the language of Swait, 2001) in the binding nature of the perceived threshold levels reported by the qth individual. To capture the notion of a threshold, we can define a lower cut-off and an upper cut-off. Accounting for attribute thresholds is equivalent to introducing functions that act as incremental effects on the linear attribute effect throughout an attribute's entire range, and that are only activated if the corresponding cut-off is in use. These cut-off penalties are typically defined as a linear function of the amount of constraint violation: {0 : max(0, X_lq^min − X_ljq)}, the lower cut-off effect, is the deviation of the attribute level from the minimum cut-off threshold where the attribute level is below the minimum cut-off (that is, the cut-off binds), and zero otherwise; and {0 : max(0, X_mjq − X_mq^max)}, the upper cut-off effect, is the deviation of the attribute level from the maximum cut-off threshold where the attribute level is above the maximum cut-off (that is, the cut-off binds), and zero otherwise. Defining X_kjq as the kth attribute associated with the jth alternative and qth individual, with l = K+1, . . . , L attribute lower cut-offs, m = L+1, . . . , M attribute upper cut-offs, q = 1, . . . , Q respondents, and β_l and β_m estimated penalty parameters, we can write the threshold penalty expression as equation (12.14):
\[
\sum_{l=K+1}^{L}\beta_l\,\{0:\max(0,\,X_{lq}^{min} - X_{ljq})\} + \sum_{m=L+1}^{M}\beta_m\,\{0:\max(0,\,X_{mjq} - X_{mq}^{max})\}
\tag{12.14}
\]
Both upper and lower bounds can be behaviourally meaningful. For example, some individuals might only be interested in six-cylinder cars and would not consider four- or eight-cylinder cars. Likewise, very low and very high prices might be rejected for different reasons, with purchasers often looking within a specific price range given their preferences.

Attribute thresholds can be introduced into a utility expression through the functional specification of an attribute, but also as a conditioning agent on an entire utility expression. To illustrate this, beginning with the standard utility expression associated with the jth alternative contained in a choice set of j = 1, . . . , J alternatives, we define R_hq as a dummy variable indicating whether the hth attribute level is in a perceived attribute threshold rejection region for the qth individual. This conditioning is a form of heteroscedasticity. An example of heteroscedastic conditioning, implemented in Hensher and Rose (2012), is A_jq = (1 + \sum_{h=1}^{H}\gamma_h R_{hq}), where R_hq is defined above and the γ_h are estimated parameters. The model form for the utility expression that encapsulates the elements presented above is given in equation (12.15):

\[
U_{jq} = \Big(1 + \sum_{h=1}^{H}\gamma_h R_{hq}\Big)\Big[\alpha_j + \sum_{k=1}^{K}\beta_{kj}X_{kjq} + \sum_{l=K+1}^{L}\beta_l\,\{0:\max(0,\,X_{lq}^{min} - X_{ljq})\} + \sum_{m=L+1}^{M}\beta_m\,\{0:\max(0,\,X_{mjq} - X_{mq}^{max})\}\Big] + \varepsilon_j
\tag{12.15}
\]
All terms are defined above except α_j, which are alternative-specific constants. Equation (12.15) is a non-linear utility function, with utility functions defined over the J_qt alternatives available to individual q in choice situation t, as given in equation (12.16):

\[
U_{jqt} = V_{jqt} + \varepsilon_{jqt}, \quad j = 1, \ldots, J_{qt};\ t = 1, \ldots, T_q;\ q = 1, \ldots, Q
\tag{12.16}
\]
The independent and identically distributed (iid) type I extreme value distribution is assumed for the random terms ε_jqt. Conditioned on V_jqt, the choice probabilities take the familiar multinomial logit (MNL) form (12.17):

\[
\mathrm{Prob}_{jqt} = \frac{\exp V_{jqt}}{\sum_{j'=1}^{J_{qt}} \exp V_{j'qt}}
\tag{12.17}
\]
When we allow for heteroscedasticity, equation (12.17) becomes equation (12.18):

\[
\mathrm{Prob}_{jqt} = \frac{\exp\Big[\Big(1 + \sum_{h=1}^{H}\gamma_h R_{hq}\Big) V_{jqt}\Big]}{\sum_{j'=1}^{J_{qt}} \exp\Big[\Big(1 + \sum_{h=1}^{H}\gamma_h R_{hq}\Big) V_{j'qt}\Big]}
\tag{12.18}
\]
Hensher and Rose (2012) have implemented this model form in the context of automobile purchases. They found a significant improvement in predictive power, as well as different mean direct elasticities, for the heteroscedastic Gumbel scale MNL (HG-SMNL) of the form in (12.18) compared with simple MNL models. This is due in large measure to the 'scaling' of the standard utility expression by a function that accounts for the acceptability of each alternative and perceived attribute thresholds, as well as accounting for scale heterogeneity. Hensher and Rose (2012) found that the relative disutility of the jth alternative decreases7 when this alternative is perceived to be acceptable (1) in contrast to not acceptable (0); when the price attribute is in the rejection range (the attached parameter estimate being −0.1848), this disutility is tempered further and increases. A negative parameter for the lower and upper cut-off penalties recognizes that a price level outside the lower and upper perceived thresholds of preference will add disutility, increasing the overall relative disutility. What we then have in this formulation is a way of recognizing and adjusting the marginal disutility of an attribute associated with an alternative in a particular choice set.
4 THE MAJORITY OF CONFIRMING DIMENSIONS: DIMENSIONAL VERSUS HOLISTIC PROCESSING STRATEGIES
The 'majority of confirming dimensions' (MCD) rule (Russo and Dosher, 1983) is another form of attribute processing strategy, concerned with the total count of superior attributes in each alternative. Under this rule, pairs of alternatives are compared attribute by attribute, with an alternative winning if it has the greater number of better attribute levels. The paired comparison continues until there is an overall winner.

Hensher and Collins (2011) used a choice experiment data-set to investigate the possibility of MCD. A total count of best attributes was generated for each alternative, and then entered into the utility expressions for all three alternatives. To contribute to the count for an alternative, an attribute had to be strictly better than that attribute in all other alternatives in the choice set. The distribution of the number of best attributes was calculated, both for the full relevance sample and accounting for attributes being ignored, with separate reporting for all alternatives and for the chosen alternative only. The distribution for the chosen alternative was found to be skewed towards a higher number of best attributes in both cases, with higher means observed, which is plausible. This alone does not suggest that MCD is being employed, as it would be expected that alternatives with a higher number of best attributes would also tend to have higher relative utilities. Hensher and Collins (2011) did find, however, that the percentage of alternatives with zero strictly best attributes was much higher when allowing for attributes not attended to than in the 'full relevance' group. This might suggest that respondents are more likely to ignore an attribute when at least one attribute is outranked. On this evidence, if found true in other data, there are important behavioural implications, since the analyst may wish to remove alternatives in model estimation where the number of best attributes is zero.
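The strictly-best tally described above is straightforward to compute; a minimal sketch follows (the attribute matrix is invented for illustration, and attributes are assumed to be oriented so that higher is better, with cost-type attributes sign-flipped beforehand):

```python
import numpy as np

# Rows: alternatives; columns: attributes.
X = np.array([[30.0, 5.0, 2.0],
              [25.0, 7.0, 2.0],
              [30.0, 6.0, 4.0]])

best_counts = np.zeros(X.shape[0], dtype=int)
for k in range(X.shape[1]):
    col = X[:, k]
    winners = np.flatnonzero(col == col.max())
    if winners.size == 1:               # strictly better than ALL other alternatives
        best_counts[winners[0]] += 1

print(best_counts)   # [0, 1, 1]: the tallies that enter the utility expressions
```

Under stated non-attendance, any ignored attribute column would simply be dropped from the loop before tallying.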
A series of choice models was estimated by Hensher and Collins (2011) to explore the potential for MCD both when all attributes are relevant and under stated attribute non-attendance. Under full relevance of all attributes, when they included a variable defined as 'the number of attributes in an alternative that are best', it was highly significant and positive in sign, so that as the number of best attributes increases, an alternative is more likely to be chosen, as would be expected. When only the number of best attributes and the alternative-specific constants are included, and the attribute levels are omitted, the model fit was considerably worse even though 'the number of best attributes' remained highly significant, suggesting that the number of best attributes cannot substitute for the attribute levels themselves. The same tests can be performed after accounting for attributes stated as being ignored; that is, any ignored attributes are not included in the count of the number of best attributes. The model fit was found to improve substantially when accounting for attributes not attended to, with MCD complementing the parameterization of the attended attributes. Hensher and Collins (2011) calculated values of travel time savings, which varied noticeably between full relevance and allowing for attributes being ignored, but not between models within each of these attribute processing settings when allowance was made for the number of attributes that are best. The evidence suggests that respondents simultaneously consider and trade between the attribute levels in a typical compensatory fashion (both under full relevance and after ignoring some attributes, if applicable) and the number of best attributes in each alternative.

However, to investigate whether there may be two classes of respondent, with heuristic application distinguishing between them, two latent class models8 were also estimated. The first class contained the attribute levels and alternative-specific constants, as per the base model, while the second class contained only the number of best attributes. A further improvement in model fit was obtained with this model. These results suggest that some respondents are employing the MCD heuristic. Under the heuristic, trading was not occurring on the absolute attribute levels; what appeared to matter instead is which alternative has the best level for each attribute, with tallies of the number of best attributes appearing to act as a supplementary step when determining the best alternative. Overall, the mean class membership probability in both models was over 80 per cent for the class processing the constituent attributes, and between 15 and 18 per cent for the class in which the number of best attributes was the determining influence.
5 REFERENCE POINT REVISION AND VALUE LEARNING
The final attribute processing strategy reviewed was proposed by DeShazo (2002), who suggested the idea of reference point revision, in which preferences may be well formed but respondents' value functions shift when a non-status-quo option is chosen (see also McNair et al., 2012). The shift occurs because the selection of a non-status-quo option is viewed, with some probability, as a transaction, and this causes a revision of the reference point around which the asymmetric value function predicted by prospect theory is centred (Kahneman and Tversky, 1979). There is an important distinction to be made between value learning, which in its broadest meaning implies that underlying preferences are changing, and reference revision, which can occur when preferences are stable but the objective is to maximize the likelihood of implementation of the most preferred alternative observed over the course of the sequence of questions. The latter is a special case of the former.

Consider a model in which we identify the chosen alternative from a previous choice set, and create a dummy variable equal to 1 for whatever alternative was chosen in the previous choice set, be it the initial reference alternative or one of the offered non-status-quo alternatives. Hensher and Collins (2011) introduced such a revised reference dummy variable into the utility expressions as a way of investigating the role of value learning. They found that when the reference alternative is revised, the utility of the new 'reference' alternative increases in the next choice scenario. This is an important finding, supporting the hypothesis of DeShazo; it is also a recognition of sequential interdependence between adjacent choice scenarios, which should be treated explicitly rather than only through a correlated error variance specification, where the latter captures many unobserved effects at the alternative level.

Another useful test relates to the relationship between the level of an attribute associated with the reference (or status quo) alternative and each of the other alternatives in a choice experiment. One might distinguish between cases where a reference alternative attribute level is better, equal or worse relative to choice experiment alternatives CE1 and CE2, defined as a series of attribute-specific dummy variables (for example, attribute_i better = 1 if reference attribute_i minus CE1 attribute_i is negative, and zero if reference attribute_i minus CE1 attribute_i is positive). The choice response variable refers to the alternative chosen. A simple logit model can be specified in which the better and worse attribute forms for all design attributes are included. Where an attribute dummy refers to a better level for the reference alternative (the attribute difference being negative, as illustrated above for attribute_i), a positive parameter estimate suggests that as the difference narrows towards zero, making the reference alternative relatively less attractive on that attribute, the probability of choosing a non-reference alternative (CE1 or CE2) increases. Hensher and Collins (2011), in their empirical inquiry, found that the parameter estimate was positive for 'better'. The opposite behavioural response was found when the reference alternative is worse: positive parameter estimates suggest that when the reference alternative becomes relatively less attractive (given it is worse), the probability of choosing CE1 or CE2 increases.
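The two dummy-variable constructions just described can be sketched as follows; the data frame, column names and attribute values are hypothetical, invented only to show the coding:

```python
import pandas as pd

df = pd.DataFrame({
    "task":   [1, 1, 1, 2, 2, 2],
    "alt":    ["REF", "CE1", "CE2", "REF", "CE1", "CE2"],
    "time":   [40, 35, 45, 40, 45, 30],   # travel time attribute
    "chosen": [0, 1, 0, 0, 0, 1],
})

# (a) Revised-reference dummy: equals 1 in task 2 for whichever alternative was
# chosen in task 1 (here CE1), capturing reference point revision.
prev_chosen = df[(df["task"] == 1) & (df["chosen"] == 1)]["alt"].iloc[0]
df["revised_ref"] = ((df["task"] == 2) & (df["alt"] == prev_chosen)).astype(int)

# (b) Attribute-specific 'reference better' dummy: 1 if reference time minus
# CE1 time is negative (reference faster), per the coding described above.
for task_id, grp in df.groupby("task"):
    ref_t = grp.loc[grp["alt"] == "REF", "time"].iloc[0]
    ce1_t = grp.loc[grp["alt"] == "CE1", "time"].iloc[0]
    print(task_id, "time_better_ref =", int(ref_t - ce1_t < 0))
```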
6 CONCLUSIONS
This chapter has selectively reviewed the growing literature on attribute processing, as well as its intersection with the broader literature on heuristics (the latter presented in other chapters); see also Leong and Hensher (2012). The link between attribute processing and heuristics can loosely be described by the role that attributes, as part of a package of attributes representing an alternative, play in the way that individuals process this information in arriving at a choice outcome. The connection between this chapter and work such as Chorus (2010) seems obvious, yet there is clear scope to treat the topics presented in this chapter as a subset of the heuristics literature.

What we do know is that attribute processing is part of a growing interest in returning to the study of the underlying behavioural assumptions that influence the way in which decision makers adopt coping strategies to assist in making what they believe are sensible (albeit rational) choices. The extent to which the revealed processing strategies, and subsequent choice outcomes, are truly independent of the survey context is a matter of continuing debate and research; however, it is generally accepted that the world is sufficiently complex that any additional imposition from a survey instrument may not be a cause of major concern in identifying the preference functions of individuals. It is further suggested in the growing literature on attribute processing that continued sophistication of econometric assumptions, essentially treatments of errors and parameters, cannot alone improve the behavioural fit of choice models. A number of chapters in this handbook reinforce this position.
NOTES

1. This chapter does not consider other aspects of process in choice experiments, such as uncertainty in the choice response. See Lundhede et al. (2010).
2. This chapter focuses on stated choice surveys. We recognize that, at some level, one might expect these attribute processing effects to be more prominent in revealed preference data given that, for example, advertising/branding is designed to encourage not paying attention to attributes, while in other instances, such as putting high-sugar cereals on low shelves in grocery stores or putting important detail in fine print, there are intentional efforts to obscure details. But it is also possible to make the case that survey respondents may pay less attention to details.
3. This finding has interesting implications for the growing evidence that mean WTP for an attribute tends to be higher under a wider range for the numerator attribute (Louviere and Hensher, 2001). Simply put, the greater relevance of preserving the attribute content under a wider range will mean that such an attribute is relatively more important to the outcome than it is under a narrow range specification, and hence a higher mean WTP is inferred.
4. In this example, there is one unrestricted parameter vector in the model, shown as b4 = (bA, bB). The other parameter vectors are constructed from the same two parameters, either by setting one or both elements to zero or by equating elements to those in b4. Thus, b3 = (0, bB) is obtained as a linear restriction on b4, namely that one element is equal to zero and a second element is equal to the corresponding element in b4.
5. Puckett and Hensher (2008) suggest that the range and relative equivalence of the price attribute levels among alternatives in a particular choice task may lead respondents to ignore the price attribute in some choice tasks and not in others.
6. Assuming endogeneity bias is present requires a definition and a test. Endogeneity bias can arise from a number of sources, such as measurement error, missing attributes and simultaneity, and is observed when a specific variable included in the observed effects is correlated with the error term associated with the utility expression containing the explanatory variable of interest. To test for endogeneity bias (that is, the part that is correlated with the random error component), analysts should undertake two tasks: first, testing the extent to which the variable has a systematic influence on the standard deviation of the error component and, second, identifying other exogenous variables that are correlated with the variable under consideration, but not with the error component, that could be used as instrumental variables, or simply as evidence of no endogeneity bias.
7. It should be noted that the overall utility expression is negative, and hence the heteroscedastic effect reduces the disutility when the alternative is acceptable, compared with not acceptable, as might be expected.
8. See Hensher and Greene (2010) for other examples of the identification of attribute processing heuristics with the latent class model.
REFERENCES

Beharry-Borg, N. and R. Scarpa (2010), 'Valuing quality changes in Caribbean coastal waters for heterogeneous beach visitors', Ecological Economics, 69 (5), 1124–39.
Bujosa, A., A. Riera and R. Hicks (2010), 'Combining discrete and continuous representation of preference heterogeneity: a latent class approach', Environmental and Resource Economics, 47 (4), 477–93.
Cameron, T.A. and J.R. DeShazo (2011), 'Differential attention to attributes in utility-theoretic choice models', Journal of Choice Modelling, 3 (3), 73–115.
Campbell, D., D.A. Hensher and R. Scarpa (2011), 'Non-attendance to attributes in environmental choice analysis: a latent class specification', Journal of Environmental Planning and Management, 54 (8), 1061–76.
Cantillo, V., B. Heydecker and J. de D. Ortúzar (2006), 'A discrete choice model incorporating thresholds for perception in attribute values', Transportation Research Part B, 40 (9), 807–25.
Carlsson, F., M. Kataria and E. Lampi (2008), 'Ignoring attributes in choice experiments', Proceedings of EAERE Conference, Gothenburg, 25–28 June.
Chorus, C.G. (2010), 'A new model of random regret minimization', European Journal of Transport and Infrastructure Research, 10 (2), 181–96.
Collins, A.T., J.M. Rose and D.A. Hensher (2012), 'The random parameters attribute non-attendance model', paper presented at the 13th International Conference on Travel Behavior Research, Toronto, July.
DeShazo, J.R. (2002), 'Designing transactions without framing effects in iterative question formats', Journal of Environmental Economics and Management, 43 (3), 360–85.
DeShazo, J.R. and G. Fermo (2004), 'Implications of rationally-adaptive pre-choice behaviour for the design and estimation of choice models', working paper, School of Public Policy and Social Research, University of California at Los Angeles, August.
Gilovich, T., D. Griffin and D. Kahneman (eds) (2002), Heuristics and Biases: The Psychology of Intuitive Judgment, Cambridge: Cambridge University Press.
Greene, W.H. and D.A. Hensher (2010), 'Ordered choice, heterogeneity, and attribute processing', Journal of Transport Economics and Policy, 44 (3), 331–64.
Greene, W.H. and D.A. Hensher (2013), 'Revealing additional dimensions of preference heterogeneity in a latent class mixed multinomial logit model', Applied Economics, 45 (14), 1897–902, doi: 10.1080/00036846.2011.650325.
Hensher, D.A. (1976), 'The value of commuter travel time savings: empirical estimation using an alternative valuation model', Journal of Transport Economics and Policy, 10 (2), 167–76.
Hensher, D.A. (2006a), 'How do respondents process stated choice experiments? Attribute consideration under varying information load', Journal of Applied Econometrics, 21 (6), 861–78.
Hensher, D.A. (2006b), 'Attribute processing in choice experiments', in B. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Approach to Theory and Practice, Dordrecht: Springer, pp. 135–58.
Hensher, D.A. (2007), 'Reducing sign violation for VTTS distributions through recognition of an individual's attribute processing strategy', International Journal of Transport Economics, 34 (3), 333–49.
Hensher, D.A. (2008), 'Joint estimation of process and outcome in choice experiments and implications for willingness to pay', Journal of Transport Economics and Policy, 42 (2), 297–322.
Hensher, D.A. (2010), 'Attribute processing, heuristics and preference construction in choice analysis', in S. Hess and A. Daly (eds), State-of-Art and State-of-Practice in Choice Modelling, Bingley: Emerald Press, pp. 35–70.
Hensher, D.A. and A. Collins (2011), 'Interrogation of responses to stated choice experiments: is there sense in what respondents tell us? A closer look at what respondents choose in stated choice experiments', Journal of Choice Modelling, 4 (1), 62–89.
Hensher, D.A. and W.H. Greene (2010), 'Non-attendance and dual processing of common-metric attributes in choice analysis: a latent class specification', Empirical Economics, 39 (2), 413–26.
Hensher, D.A. and D. Layton (2010), 'Parameter transfer of common-metric attributes in choice analysis and cognitive rationalisation: implications for willingness to pay', Transportation, 37 (3), 473–90.
Hensher, D.A. and J.M. Rose (2009), 'Simplifying choice through attribute preservation or non-attendance: implications for willingness to pay', Transportation Research Part E, 45 (4), 583–90.
Hensher, D.A. and J.M. Rose (2012), 'The influence of alternative acceptability, attribute thresholds and choice response certainty on automobile purchase preferences', Journal of Transport Economics and Policy, 46 (3), 451–68.
Hensher, D.A., A.T. Collins and W.H. Greene (2013), 'Accounting for attribute non-attendance and common-metric aggregation in a latent class mixed multinomial logit model: a warning on potential confoundment', Transportation, 40 (5), 1003–20.
Hensher, D.A., J. Rose and T. Bertoia (2007), 'The implications on willingness to pay of a stochastic treatment of attribute processing in stated choice studies', Transportation Research Part E, 43 (1), 73–89.
Hensher, D.A., J. Rose and W. Greene (2005), 'The implications on willingness to pay of respondents ignoring specific attributes', Transportation, 32 (3), 203–22.
Hensher, D.A., J.M. Rose and W.H. Greene (2012), 'Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design', Transportation, 39 (2), 235–54.
Hensher, D.A., J. Rose and S. Puckett (2009), 'Selective developments in choice analysis and a reminder about the dimensionality of behavioural analysis', in R. Kitamura, T. Yoshii and T. Yamamoto (eds), The Expanding Sphere of Travel Behaviour Research: Selected Papers from the 11th International Conference on Travel Behaviour Research, Bingley: Emerald Press, pp. 237–76.
Hess, S. (2011), 'Impact of unimportant attributes in stated choice surveys', ITS working paper, Institute for Transport Studies, University of Leeds.
Hess, S. and D.A. Hensher (2010), 'Using conditioning on observed choices to retrieve individual-specific attribute processing strategies', Transportation Research Part B, 44 (6), 781–90.
Hess, S. and D.A. Hensher (2013), 'Making use of respondent reported processing information to understand attribute importance: a latent variable scaling approach', Transportation, 40 (2), 397–412.
Hess, S. and J.M. Rose (2007), 'A latent class approach to modelling heterogeneous information processing strategies in SP studies', paper presented at the Oslo Workshop on Valuation Methods in Transport Planning, Oslo, 2–5 November.
Hess, S., A. Stathopoulos, D. Campbell, V. O'Neill and S. Caussade (2011), 'It's not that I don't care, I just don't care very much: confounding between attribute non-attendance and taste heterogeneity', ITS working paper, Institute for Transport Studies, University of Leeds.
Hole, A.R. (2011), 'A discrete choice model with endogenous attribute attendance', Economics Letters, 110 (3), 203–5.
Hoyos, D., P. Mariel and J. Meyerhoff (2010), 'Comparing the performance of different approaches to deal with attribute non-attendance in discrete choice experiments: a simulation experiment', BILTOKI 2010-01, Universidad del País Vasco – Departamento de Economía Aplicada III (Econometría y Estadística).
Kahneman, D. and A. Tversky (1979), 'Prospect theory: an analysis of decision under risk', Econometrica, 47 (2), 263–92.
Layton, D. and D.A. Hensher (2010), 'Aggregation of common-metric attributes in preference revelation in choice experiments and implications for willingness to pay', Transportation Research Part D, 15 (7), 394–404.
Leong, W. and D.A. Hensher (2012), 'Embedding decision heuristics in discrete choice models: a review', Transport Reviews, 32 (3), 313–31.
Lockwood, M. (1996), 'Non-compensatory preference structures in non-market valuation of natural area policy', Australian Journal of Agricultural Economics, 40 (2), 73–87.
Louviere, J.J. and D.A. Hensher (2001), 'Combining preference data', in D.A. Hensher (ed.), The Leading Edge of Travel Behaviour Research, Oxford: Pergamon Press, pp. 125–44.
Lundhede, T., S. Olsen, J. Jacobsen and B. Thorsen (2010), 'Handling respondent uncertainty in choice experiments: evaluating recoding approaches against explicit modelling of uncertainty', Journal of Choice Modelling, 2 (2), 118–47.
McFadden, D. (2001), 'Economic choices', American Economic Review, 91 (3), 351–78.
McNair, B., D.A. Hensher and J. Bennett (2012), 'Modelling heterogeneity in response behaviour towards a sequence of discrete choice questions: a probabilistic decision process model', Environmental and Resource Economics, 51 (4), 599–616.
Puckett, S.M. and D.A. Hensher (2008), 'The role of attribute processing strategies in estimating the preferences of road freight stakeholders under variable road user charges', Transportation Research Part E, 44 (5), 379–95.
Pudney, S. (1989), Modelling Individual Choice: The Econometrics of Corners, Kinks and Holes, New York: Basil Blackwell.
Rekola, M. (2003), 'Lexicographic preferences in contingent valuation: a theoretical framework with illustrations', Land Economics, 79 (2), 277–91.
Rigby, D. and M. Burton (2006), 'Modelling disinterest and dislike: a bounded Bayesian mixed logit model of the UK market for GM food', Environmental and Resource Economics, 33 (3), 485–509.
Rose, J., D. Hensher, W. Greene and S. Washington (in press), 'Attribute exclusion strategies in airline choice: accounting for exogenous information on decision maker processing strategies in models of discrete choice', Transportmetrica.
Russo, J.E. and B.A. Dosher (1983), 'Strategies for multiattribute binary choice', Journal of Experimental Psychology: Learning, Memory, and Cognition, 9 (4), 676–96.
Sælensminde, K. (2002), 'The impact of choice inconsistencies in stated choice studies', Environmental and Resource Economics, 23, 403–20.
Scarpa, R., T. Gilbride, D. Campbell and D.A. Hensher (2009), 'Modelling attribute non-attendance in choice experiments for rural landscape valuation', European Review of Agricultural Economics, 36 (2), 151–74.
Scarpa, R., M. Thiene and D.A. Hensher (2010), 'Monitoring choice task attribute attendance in non-market valuation of multiple park management services: does it matter?', Land Economics, 86 (4), 817–39.
Scarpa, R., R. Zanoli, V. Bruschi and S. Naspetti (2012), 'Inferred and stated attribute non-attendance in food choice experiments', American Journal of Agricultural Economics, 95 (1), 165–80, doi: 10.1093/ajae/aas073.
Simon, H.A. (1955), 'A behavioural model of rational choice', Quarterly Journal of Economics, 69 (1), 99–118.
Spash, C.L. (2000), 'Ecosystems, contingent valuation and ethics: the case of wetland recreation', Ecological Economics, 34 (2), 195–215.
Svenson, O. (1992), 'Differentiation and consolidation theory of human decision making: a frame of reference for the study of pre- and post-decision processes', Acta Psychologica, 80 (1–3), 143–68.
Swait, J. (2001), 'A non-compensatory choice model incorporating attribute cut-offs', Transportation Research Part B, 35 (10), 903–28.
13 Capturing alternative decision rules in travel choice models: a critical discussion
Caspar G. Chorus
1 INTRODUCTION
Since its inception 40 years ago (McFadden, 1974), the discrete choice paradigm has grown to become one of the most widely used paradigms for the study of travel demand (for example, Small and Verhoef, 2007; Ortúzar and Willumsen, 2011). Not only has the transportation field served as a very fertile application area for choice modellers; the travel demand research community has over the years provided a range of important contributions to the discrete choice modelling toolbox as well (for example, Small and Rosen, 1981; Ben-Akiva and Lerman, 1985; Brownstone and Train, 1998; Walker and Ben-Akiva, 2002; Bhat, 2005).1

When inspecting the literature on discrete (travel) choice modelling, it becomes clear that some aspects of choice modelling have over the years received much more attention in our field than others. For example, numerous research efforts have aimed at deriving realistic error term distributions, that is, distributions that imply realistic correlation structures and substitution patterns (for example, Ben-Akiva, 1974; McFadden, 1978; Small, 1987; McFadden and Train, 2000). As another example, during the past decade a rapidly growing interest has emerged in embedding 'psychological factors', such as latent attitudes, in travel choice models; see Van Acker et al. (2011), Prato et al. (2012) and Chorus (2012a) for recent discussions of this literature.

An aspect of discrete (travel) choice models that has received much less attention over the years is the assumed decision rule. This decision rule is a crucial component of any choice model, as it reflects the analyst's assumptions or hypotheses concerning how the decision-maker translates attributes of alternatives and his or her tastes (that is, parameters) into a choice for one of the alternatives in the set. Until quite recently, discrete travel choice models practically without exception used so-called linear-additive utility maximization-based decision rules. These models (from here on, standard models) assume that the decision-maker assigns a utility to each alternative, based on a weighted summation of the attributes of alternatives (where the weights are given by estimable parameters that represent the importance of the attribute to the decision-maker). Subsequently, the alternative with the highest utility is chosen. Although it is widely acknowledged that this decision rule is intuitive, elegant and formally tractable, it has regularly been mentioned in research papers and textbooks (for example, Ben-Akiva and Lerman, 1985) that this linear-additive utilitarian rule is not the only candidate for choice models, and that alternative decision rules may sometimes provide a more realistic account of behaviour. Despite this acknowledgement, the overwhelming majority of travel choice models has been, and continues to be, based on the linear-additive utility maximization decision rule.
However, more recently, and possibly inspired by the surge in so-called behavioural economics research (Kahneman, 2003; McFadden, 2007), there has been a growing interest among a small but expanding group of travel choice modellers in incorporating alternative decision rules in their models (for example, Arentze and Timmermans, 2007; Hess et al., 2008, 2012; Chorus et al., 2008; Hensher, 2010; Zhu and Timmermans, 2010; Chorus, 2012b; Prato, 2014). A comprehensive review of this emerging literature can be found in Leong and Hensher (2012a), and a brief discussion of some relevant examples is presented in the next section.

The expectations of the potential impact of capturing alternative decision rules in travel choice models are high, which is to be expected when an emerging topic gains popularity among scholars. Examples of expectations formulated in the literature include suggestions that these alternative models: (1) form a more realistic account of the actual decision-making process; (2) result in a better model fit with observed choices and a better predictive ability; (3) lead to new behavioural insights; and (4) as a result of these new insights, may ultimately lead to new and more effective travel demand management tools and transport policies.

This chapter contributes to this emerging literature by providing a critical discussion of recent progress in capturing alternative decision rules in travel choice models: the aim is to discuss to what extent the above-mentioned expectations are in fact warranted, and to discuss a number of pitfalls and challenges that relate to the alternative decision rule paradigm at a conceptual and operational level. Where possible, recommendations are presented that are aimed at helping choice modellers avoid or deal with these pitfalls and challenges, and realize the potential of travel choice models that are based on alternative decision rules.

The focus of this chapter is on travel choice models, although this does not imply that its observations, conclusions and recommendations are irrelevant for choice modellers in adjacent fields such as marketing research and environmental economics. Importantly, this chapter does not provide a (complete) overview of the literature; as mentioned earlier, a useful overview can be found in Leong and Hensher (2012a). Rather, the chapter provides food for thought and aims to stimulate a discussion among travel choice modellers about the pros and cons, and the potential and limitations, of capturing alternative decision rules in travel choice models. Furthermore, the chapter focuses on riskless choice, in keeping with the fact that the majority of travel choice models deal with riskless choices; see van de Kaa (2010) and Li and Hensher (2011) for recent overviews of alternative decision rules in risky choice contexts. For reasons of space, the chapter deals with one-stage decision processes only. As such, it does not discuss research into two-stage decision processes, which generally include a choice-set formation step based on alternative (semi-compensatory) decision rules; see Kaplan et al. (2012) for a recent example of a two-stage model, including a useful review of past work.

Section 2 presents a brief classification of decision rules applied in travel choice models, and discusses a few relatively often used decision rules in more depth. Note that the Appendix provides a mathematical representation of these models within one unified notational framework. Section 3 discusses pitfalls and challenges associated with capturing alternative decision rules in travel choice models. Section 4 provides recommendations aimed at helping realize the potential of travel choice models based on alternative decision rules.
2 ALTERNATIVE DECISION RULES INCORPORATED IN DISCRETE TRAVEL CHOICE MODELS2
Over the years and throughout the social sciences, numerous alternatives to the linear-additive utility maximization decision rule have been proposed; see, for example, Payne et al. (1993) and Gigerenzer and Selten (2002) for overviews. While some of these rules have recently been incorporated in discrete travel choice models, many have not (yet) received attention from the travel choice modelling community. It is, of course, impossible to classify all these decision rules in an unambiguous way, but it does appear that most alternative decision rules are predominantly inspired by one of two underlying behavioural observations. A first observation, backed by an impressive amount of empirical evidence, is that decision-makers aim to minimize cognitive effort (and time, and attention) when making choices. This observation has inspired behavioural scientists to propose and use decision rules such as satisficing (Simon, 1955), elimination by aspects (Tversky, 1972) and lexicographic choice (for example, Saelensminde, 2006). Presumably, these alternative rules consume less cognitive effort than the linear-additive utility maximization rule. A second observation (related to the first one, but conceptually different) is that decision-makers' preferences are context and (choice) situation dependent. A large body of empirical literature suggests that choices can be easily manipulated by seemingly irrelevant changes in the composition of the choice set, or by seemingly irrelevant 'cheap talk' related to, for example, reference points. Motivated by this literature, researchers have proposed models aimed at capturing these 'behavioural anomalies' in alternative decision rules. Examples include reference dependent or loss aversion models (Tversky and Kahneman, 1991), the relative advantage model (Tversky and Simonson, 1993), the contextual concavity model (Kivetz et al., 2004) and the random regret minimization model for riskless choice (Chorus, 2010).

In the following, some alternative decision rules that have recently been incorporated in travel choice models are discussed in more depth (the reader is again referred to Leong and Hensher, 2012a, or to the papers cited below, for a much more in-depth and formal presentation of the models). To start with models presumably inspired by cognitive effort minimization: the elimination-by-aspects (EBA) rule (Tversky, 1972), in short, amounts to an iterative process of selecting an attribute (more important attributes are more likely to be selected at a given decision stage) and subsequently eliminating alternatives that do not score 'well enough' on that attribute (in the case of continuous attributes) or that do not contain the aspect (in the case of dummy attributes). This process is repeated until one alternative is left, which is subsequently chosen; a simulation sketch is given below. This rule has been applied in a travel choice context by, amongst others, Hess et al. (2012). The notion of lexicographic choice is related to the EBA rule: it states that the decision-maker focuses on one attribute only, that is, the most important one, and picks the alternative with the best performance on that attribute. Applications of lexicographic decision rules in travel choice models include Killi et al. (2007), Zhu and Timmermans (2010) and Hess et al. (2012). Note that although Herbert Simon's ground-breaking paper introducing the satisficing heuristic (Simon, 1955) is very regularly cited as an inspiration for developing so-called boundedly rational discrete travel choice models (for example, Swait and Ben-Akiva, 1987; Ben-Akiva and Boccara, 1995), direct applications or translations of this heuristic in travel choice models are, to the best of the author's knowledge, absent.3
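The EBA simulation sketch referred to above is shown here; the weights, thresholds and attribute matrix are illustrative placeholders, and the function is one plausible reading of the iterative process, not a definitive implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def eba_choice(X, weights, thresholds, rng):
    """One pass of an elimination-by-aspects-style process (illustrative only).

    X: (J alternatives x K attributes), higher = better after any sign-flipping.
    weights: selection probabilities (more important attributes more likely picked).
    thresholds: minimum acceptable level per attribute.
    """
    alive = list(range(X.shape[0]))
    remaining = list(range(X.shape[1]))
    while len(alive) > 1 and remaining:
        probs = weights[remaining] / weights[remaining].sum()
        k = rng.choice(remaining, p=probs)           # pick an attribute
        remaining.remove(k)
        survivors = [j for j in alive if X[j, k] >= thresholds[k]]
        if survivors:                                # never eliminate everything
            alive = survivors
    return rng.choice(alive)                         # tie-break at random if needed

X = np.array([[3.0, 9.0], [7.0, 4.0], [6.0, 6.0]])
print(eba_choice(X, weights=np.array([0.7, 0.3]),
                 thresholds=np.array([5.0, 5.0]), rng=rng))
```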
When focusing on decision rules presumably inspired by notions of context dependent preferences, notable examples in a discrete travel choice context include the following. Reference dependent models are based on the assumption that, when evaluating alternatives, decision-makers focus on how an alternative's attribute levels perform relative to attribute-specific reference points, rather than on the attribute levels per se. Moreover, reference dependent models usually hypothesize that 'losses' compared with these reference points are given more weight than gains of similar magnitude (usually this is modelled by allowing the taste parameter to be larger in the domain of losses than in the domain of gains). Furthermore, these models usually allow for some form of decreasing or increasing sensitivity (by means of concavity-convexity parameters). Although originally proposed in the context of a theory for risky choice (prospect theory; Kahneman and Tversky, 1979), the notion of reference dependency readily applies to riskless choice as well, as argued by Tversky and Kahneman (1991). Riskless choice applications of this approach in a travel choice context can be found in, for example, Hess et al. (2008), De Borger and Fosgerau (2008) and Stathopoulos and Hess (2012). Most other decision rules aimed at capturing context dependent preferences are also based on some form of reference dependency, as will become clear in the next three paragraphs.

The relative advantage model (Tversky and Simonson, 1993) assumes that decision-makers evaluate an alternative not only in terms of its 'context-free' linear-additive utility, but also in terms of whether it comes with advantages or disadvantages when compared with other alternatives at the level of particular attributes. As such, the attribute levels of competing alternatives serve as reference points. The relative advantage model, when it allows for asymmetry between advantages and disadvantages (owing to a loss aversion parameter and a convexity parameter in the disadvantage function), is able to account for preferences for so-called compromise alternatives; these are alternatives that, when compared with competitors, have an intermediate performance on every attribute, rather than a strong performance on some attributes and a poor performance on others.4 Applications of the relative advantage model in the travel choice domain are, to the best of the author's knowledge, scarce; but see Leong and Hensher (2014) for an example, albeit one that focuses on a special model form that assumes symmetry between advantages and disadvantages.

The contextual concavity model (Kivetz et al., 2004) takes as attribute-specific reference points the least preferred values of every attribute available in the choice set. An alternative's performance on a particular attribute is then modelled as the difference between the alternative's attribute value and the reference value for that attribute. This difference in performance is then used as the base of a power function. Depending on the estimated parameter in the exponent of the power function (which allows for diminishing sensitivity, in other words a concave utility function), the contextual concavity model is able to capture preferences for compromise alternatives. Applications of the contextual concavity model in a travel choice setting can be found in Leong and Hensher (2012b) and Chorus and Bierlaire (2013).
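Before turning to the random regret minimization model, a minimal sketch of the asymmetric, reference-dependent value function common to the models above may help; the parameter values (gain weight, loss aversion factor, curvature) are hypothetical illustrations rather than estimates from any of the cited studies:

```python
import numpy as np

def ref_dependent_value(x, ref, beta_gain=0.5, loss_aversion=2.0, curvature=0.88):
    """Illustrative asymmetric value function around a reference point.

    Losses (x < ref) are weighted loss_aversion times more heavily than gains,
    and both branches exhibit diminishing sensitivity via a power transform.
    """
    diff = np.asarray(x, dtype=float) - ref
    gains = np.maximum(diff, 0.0) ** curvature
    losses = np.maximum(-diff, 0.0) ** curvature
    return beta_gain * gains - beta_gain * loss_aversion * losses

# A loss of 5 hurts roughly twice as much as an equal-sized gain helps.
print(ref_dependent_value([35, 40, 45], ref=40))
```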
The recently proposed random regret minimization (RRM) model (Chorus, 2010) also assumes that the attributes of competing alternatives serve as reference points: more specifically, it assumes that choices are determined by the wish to minimize anticipated regret, regret being conceptualized as the emotion that is felt when one or more non-chosen alternatives perform better than the chosen one in terms of one or more attributes. The model postulates that the regret associated with an alternative equals the sum of all so-called binary regrets associated with bilaterally comparing the considered alternative with each of the other alternatives in the choice set. The level of binary regret associated with comparing the considered alternative with another alternative is taken to be the sum of the regrets associated with comparing the two alternatives in terms of each of their attributes. This attribute-level regret in turn is either (close to) zero, when the other alternative performs worse than the considered alternative in terms of the attribute, or grows as an approximately linear function of the difference in attribute values, in case the considered alternative performs worse than the alternative with which it is compared. In that case, an estimable parameter (whose sign is also estimated) approximates the slope of the regret function for the attribute. The RRM model has been shown to capture preferences for compromise alternatives (for example, Chorus, 2010, 2012b; Chorus and Rose, 2012; Chorus and Bierlaire, 2013). Applications of the RRM approach in a travel choice context can be found in, for example, Chorus and de Jong (2011), Beck et al. (2013), Hess et al. (2012), Kaplan and Prato (2012), Thiene et al. (2012), Boeri and Masiero (2014), Hensher et al. (2013) and Prato (2014).
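A minimal sketch of the regret computation under the parameter-free logsum specification mentioned in section 3 follows; the attribute matrix and parameter values are illustrative placeholders:

```python
import numpy as np

def rrm_regret(X, beta):
    """Regret per alternative under a logsum RRM-style specification:
    R_i = sum_{j != i} sum_m ln(1 + exp(beta_m * (x_jm - x_im))).

    X: (J alternatives x M attributes); beta: attribute parameters (sign estimated).
    """
    J = X.shape[0]
    regret = np.zeros(J)
    for i in range(J):
        for j in range(J):
            if j != i:
                regret[i] += np.log1p(np.exp(beta * (X[j] - X[i]))).sum()
    return regret

X = np.array([[40.0, 6.0],     # e.g. travel time, cost for three route alternatives
              [35.0, 8.0],
              [45.0, 5.0]])
beta = np.array([-0.1, -0.4])  # negative: a rival being FASTER/CHEAPER causes regret

R = rrm_regret(X, beta)
probs = np.exp(-R) / np.exp(-R).sum()   # choose so as to minimize random regret
print(R, probs)
```

Because each attribute comparison contributes even when the considered alternative wins (a contribution close to, but not exactly, zero), intermediate alternatives accumulate little regret against anyone, which is the mechanism behind the compromise effect noted above.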
3 PITFALLS, DRAWBACKS AND CHALLENGES ASSOCIATED WITH EMBEDDING ALTERNATIVE DECISION RULES IN TRAVEL CHOICE MODELS
Whereas the previous section presented, in the form of a brief overview, some recent examples of discrete choice models that are based on alternative decision rules, this section highlights the pitfalls and challenges associated with capturing these alternative decision rules in (travel) choice models. The section focuses on two broad categories of pitfalls, challenges and drawbacks: first, the focus is on identification-related issues (section 3.1); section 3.2 then concludes by discussing the drawback of added complexity.

3.1 Identification Issues
An important category of pitfalls relates to identification issues. Many of the papers mentioned in the previous section that propose choice models based on alternative decision rules have reported difficulties with identifying (combinations of) parameters. Although estimated models are generally identified theoretically, it appears to be sometimes very difficult to empirically identify them based on the available choice data. In particular, the estimation of parameters in power functions, for example to induce increasing or diminishing sensitivity (in other words, convexity or concavity of value functions), has proven to be rather difficult on many occasions (for example, Kivetz et al., 2004; Avineri and Bovy, 2008; Chorus and Bierlaire, 2013). When inspecting the functional form of these models, such identification issues should come as no surprise: it is intrinsically difficult to assign an effect to either one (taste or loss aversion) parameter in the base of a power function, or to another (concavity-convexity) parameter in the exponent of that function.

Transportation researchers have come up with several approaches to deal with this type of identification issue. In some situations, parameters have been 'borrowed' from previous studies in- or outside transportation. It goes without saying that this is not to be recommended, as argued in, for example, Chorus (2012a). Another approach has been to restrict parameters to some base value (for example, 1 in the case of a parameter in the exponent of a power function). Take, for example, Leong and Hensher (2014), who estimate a so-called symmetric relative advantage model, in which no loss aversion or non-linear sensitivity is allowed for. This approach is possibly triggered by identification issues related to the generic form of the relative advantage model, as reported in Kivetz et al. (2004) and cited in Leong and Hensher (2014). The resulting symmetric relative advantage model achieves good levels of goodness of fit (see section 4). However, it loses some of the more generic model's behavioural foundations and implications; for example, the symmetric model form in most choice situations generates preferences for extreme alternatives rather than compromise alternatives, which appears to be at odds with much empirical evidence presented in other studies (see, for example, note 4). Another, related, approach has been to avoid identification difficulties by imposing parameter-free functional forms: take, for example, the newest version of the random regret minimization model (Chorus, 2010), which exhibits a convex regret function in the form of a parameter-free logsum specification. This solution also sacrifices some behavioural flexibility, but it helps avoid identification issues.

In the author's view, the kinds of identification issues discussed above are only examples of a more general identification issue related to the estimation of choice models based on alternative decision rules: it is inherently difficult to infer a choice process from a choice outcome alone. In other words, conceptually speaking, one and the same choice outcome may be equally 'compatible' with multiple different combinations of decision rules and parameter sets. It is the author's personal experience that, on many occasions, the choice probabilities generated by one particular combination of a decision rule and a parameter set can be very closely approximated by an arbitrarily chosen alternative decision rule and an appropriately chosen alternative parameter set. This anecdotal evidence is in line with results reported in Hess et al. (2012), who estimate models involving latent classes of random regret minimizers versus (linear-additive) utility maximizers. They find that the membership probability of the random regret class equals almost 40 per cent in the context of regular multinomial logit models, while dropping to just over 30 per cent when estimating random parameter models. These findings clearly indicate that there is a risk of confounding tastes and decision rules in discrete choice models.5

Another body of papers has focused on the fact that, on a number of occasions, choice models based on presumably different decision rules may in fact be very similar or even equivalent in terms of their mathematical formulation (see, for example, the appendix in Daly, 1982). See also Batley and Daly (2006), who discuss under what conditions the mathematical formulations of the EBA model and utility maximization-based generalized extreme value models are equivalent.
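A small numerical illustration (constructed for this discussion, not taken from the chapter or the cited studies) shows why the taste and curvature parameters of a power function are hard to separate empirically: over a narrow design range, a deliberately wrong curvature with a refitted scale (plus a constant, which would be absorbed by alternative-specific constants) tracks the 'true' function almost perfectly:

```python
import numpy as np

x = np.linspace(10.0, 20.0, 50)           # e.g. travel times in a design range
true_vals = 0.9 * x ** 0.5                # 'true' beta = 0.9, gamma = 0.5

gamma_alt = 0.8                           # deliberately wrong curvature
A = np.column_stack([np.ones_like(x), x ** gamma_alt])
coef, *_ = np.linalg.lstsq(A, true_vals, rcond=None)
print(np.max(np.abs(A @ coef - true_vals)))   # maximum gap is tiny on this range
```

With realistic noise in choice data, the two parameterizations are effectively indistinguishable, which mirrors the estimation difficulties reported in the papers cited above.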
Another challenge regarding identification of the EBA model is that in its original specification, attribute weights (taste parameters) govern the order in which attributes are considered in the elimination process. However, in the vast majority of situations, the analyst obviously does not observe this order of elimination (that is, the choice process) but only the choice outcome. This makes it intrinsically difficult to identify attribute weights (taste parameters) in the context of EBA models. The issue of identification also arises when there is no clear guidance as to where to
locate a reference point. While in some models (such as the relative advantage model or the random regret minimization model) the reference point is straightforward to locate (in both cases, the attribute values of competing alternatives function as reference points), other reference-dependent models leave more room for variation in terms of reference point location6 (for example, Stathopoulos and Hess, 2012). In particular, studies inspired by the most generic version of reference dependency advocated by Tversky and Kahneman (1991) may not be aware of the exact reference point location. Tversky and Kahneman mention that reference points are not necessarily limited to current wealth or the status quo, but may also be based on 'aspirations, expectations, norms and social comparisons' (1991, pp. 1046–7). Clearly, from this perspective many different attribute-specific reference points are conceivable when one models, for example, the choice between different travel modes. Here too, parameter estimates (including taste, loss aversion and concavity-convexity parameters) are likely to vary depending on the (sometimes arbitrarily chosen) location of the various reference points; in other words, reference points and tastes are very likely to be confounded to a considerable extent. See also Stathopoulos and Hess (2012), who find that willingness-to-pay measures differ significantly depending on which reference point is used.

In the context of stated preference studies, researchers (for example, Hess et al., 2008) have circumvented this reference point location issue by means of pivoting the hypothetical alternatives around the respondent's currently chosen alternative (for example, route or travel mode). This allows the researcher to use the status-quo alternative as a multidimensional reference point. There is, however, one caveat associated with this approach: by explicitly asking the respondent for his or her status-quo alternative and subsequently highlighting that alternative as the status-quo option in the choice task, a fair amount of salience is generated concerning the status-quo option. That is, the status-quo option is given quite a bit of artificial emphasis in the choice task. As such, there is a risk that the level of reference dependency that one observes in such a pivoted stated choice experiment is larger than would be the case in an otherwise equivalent real-life (or revealed preference) situation.

A somewhat special identification-related issue arises in the context of estimating lexicographic choice models. The usual way to identify lexicographic decision rules from choice data (for example, Hess et al., 2010) is to inspect multiple choices made by the same individual and to check if an individual, for example, always chooses the fastest travel mode irrespective of the associated cost levels. Such behaviour is subsequently labelled as non-trading or lexicographic behaviour. However, it seems very likely that any perception on the side of the analyst of non-trading behaviour among respondents to a stated preference survey is confounded with the range of the attribute levels specified by the analyst. It is hard to imagine, for example, that the individual observed to always choose the mode with the fastest travel time would also choose the fastest mode in an – admittedly hypothetical – choice situation where a further 1-minute time difference can be obtained at the cost of €100.
In other words, non-trading or lexicographic behaviour is best considered the projection of preferences that are in reality based on relatively steep indifference curves, in the context of a specific experimental setting. In the author's opinion, there is very little, if any, evidence of truly lexicographic behaviour in real life, although one may be tempted to think otherwise when inspecting the choice outcomes of stated preference surveys. In a conceptual sense, the potentially erroneous interpretation of choice outcomes as being the result of lexicographic behaviour is
nothing but an empirical identification problem, resulting from insufficient variation in explanatory variables. See Saelensminde (2006), Killi et al. (2007) and Börjesson et al. (2012) for more in-depth discussions of this topic, which arrive at a similar conclusion to the one presented here.
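As an illustration of the screening approach described above for flagging apparent non-trading behaviour, consider the following minimal sketch; the data layout, panel dimensions and variable names are hypothetical:

```python
# Sketch: flagging apparent non-trading ('lexicographic') respondents in a
# stated choice panel by checking whether each individual always chose the
# alternative with the best value on one attribute (here: shortest time).
import numpy as np

# times[n, t, i]: travel time of alternative i in task t for respondent n
rng = np.random.default_rng(1)
times = rng.uniform(10, 60, size=(50, 8, 3))
chosen = times.argmin(axis=2)                   # first half 'non-trades' by construction
chosen[25:] = rng.integers(0, 3, size=(25, 8))  # second half trades

always_fastest = (chosen == times.argmin(axis=2)).all(axis=1)
print(f"{always_fastest.mean():.0%} of respondents always chose the fastest alternative")
# Caveat from the text: such a flag is confounded with the attribute ranges in
# the design; it identifies steep indifference curves within the presented
# range, not truly lexicographic preferences.
```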
3.2 Added Complexity
Almost without exception, discrete choice models involving alternative decision rules are more complex than conventional linear-additive utility maximization-based models. The added complexity may take the form of additional parameters to be estimated, and/or more complicated functional forms (for example, involving piecewise functions, power functions and/or logarithms), and/or the requirement of additional data or additional assumptions to be made by the analyst (for example, regarding reference point location), and/or more involved data pre-processing requirements. Interestingly and somewhat ironically, even those decision rules that have been designed to capture the notion that decision-makers have limited computational resources and wish to save mental effort when choosing are often more complicated than their linear-additive utilitarian counterparts. This added complexity comes with a number of associated drawbacks.

First, it is well known that choice models based on alternative decision rules generally take (much) longer to estimate than their linear-additive utilitarian counterparts. This increase in runtimes stems from the need to perform additional or more complex arithmetic, possibly in combination with increases in the number of estimation steps needed, owing to convergence or identification issues. It is the author's personal experience that these differences between model types in terms of runtimes become especially notable (and sometimes prohibitive) in the context of mixed logit models. Note that runtimes of models that assume that individuals compare alternatives with every competing alternative in terms of every attribute level (such as RRM models and relative advantage models) grow more than linearly with choice-set size. For some route- and/or destination-choice situations involving very large choice sets, this may prove problematic. In this light it is worth mentioning that a methodology has recently been proposed to obtain unbiased random regret estimates on choice sets that are sampled from a very large universal set (Guevara et al., 2013).

Second, many of the choice models that are based on more mathematically involved alternative decision rules suffer from the fact that they are not included in canned software packages and instead rely on code written by the researcher himself or herself. For example, this applies to models using max-operators and piecewise functions, such as most models assuming context- (or reference-) dependent preferences (for example, Chorus et al., 2008; Hess et al., 2008; Leong and Hensher, 2012b).7 Given that most students, application-oriented researchers and practitioners rely on canned software for model estimation and application, this implies that many choice models based on alternative decision rules are likely to be used predominantly in a relatively small circle of methodologically oriented scholars (and some highly trained practitioners). From the perspective of policy relevance, this is potentially problematic. Note that a solution to the estimation problems posed by such kinks in utility functions may be obtained by means of constructing smooth approximations of, for example, max-operators. An example of such a remedy can be
found in the newest version of the random regret minimization model (Chorus, 2010), where smooth logsum-based approximations replaced the max-operators that were present in the older version of the model (Chorus et al., 2008). The logsum specification allowed the newest version of the random regret model to be estimated, without pre-processing the data, using canned software packages like Biogeme and NLOGIT. Note that Leong and Hensher (2014) have also adopted the smoothing function proposed in Chorus (2010) to enable estimation of their symmetric relative advantage model with conventional software.

A third complexity-related drawback of alternative travel choice models concerns the fact that their results are often relatively difficult to convey to practitioners, policy-makers and users of model outcomes in general. Partly this has to do with the fact that many scholars and practitioners active in the transportation arena have over the years become used to the conventional linear-additive utility maximization paradigm, and know how to interpret parameters, willingness-to-pay measures and elasticities associated with these models. However, aside from this aspect, it is easily seen that the outcomes of alternative choice models are often more difficult to interpret than those of standard models (for example, the interpretation of random regret parameters is less intuitive at first sight than the interpretation of parameters from a linear-additive utilitarian model). Although scholars have argued that this increase in difficulty of interpretation is likely to be offset by a possible increase in behavioural insights generated by these alternative models, it is the author's personal impression that the average practitioner or policy-maker does not necessarily share that opinion (see also Washington et al., 2003, section 11.4).

A fourth drawback related to the added complexity of choice models based on alternative decision rules is more fundamental, and refers to the widely acknowledged premise that scholars should always try to come up with the simplest model that is consistent with the facts (a maxim sometimes referred to as Occam's razor). If one considers observed choices to be the 'facts' that discrete choice models should represent, then the question becomes to what extent the added complexity often encountered in choice models based on alternative decision rules is warranted, in light of the presumably better representation of choices. Note that this issue has a strong relation to the identification issues discussed in section 3.1: there are many fundamental and practical difficulties attached to the task of determining which decision rule (or, more strictly speaking, which data-generating process) is the (most) correct one, when one only has the choice outcomes to use as a beacon. Of course, when the added complexity is purely measured in terms of additional parameters (or losses in degrees of freedom), statistical tests – such as the likelihood ratio test for nested models and Ben-Akiva and Swait's (1986) test for non-nested models – are available to provide a formal answer to this question. However, when one also takes into account added complexity in terms of other, harder to measure dimensions such as those discussed above, applying Occam's razor becomes much more difficult.
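The two formal tests just mentioned are straightforward to compute once final log-likelihoods are available; the sketch below uses illustrative numbers throughout and one common statement of the Ben-Akiva and Swait (1986) bound:

```python
# Sketch of the two formal comparisons mentioned above, assuming estimation
# has already produced the final log-likelihoods. All numbers are illustrative.
import numpy as np
from scipy.stats import chi2, norm

ll_zero = -1650.0            # log-likelihood with all parameters at zero
ll_1, k_1 = -1402.3, 5       # e.g. the linear-additive RUM model
ll_2, k_2 = -1398.1, 7       # e.g. a reference-dependent model, two extra parameters

# Likelihood-ratio test, valid when model 1 is nested in model 2:
lr = -2.0 * (ll_1 - ll_2)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=k_2 - k_1):.4f}")

# Ben-Akiva and Swait (1986) bound for non-nested models (as commonly stated):
# Pr(rho-bar^2_2 - rho-bar^2_1 >= z) <= Phi(-sqrt(-2 z LL(0) + (k_2 - k_1)))
rho_bar = lambda ll, k: 1.0 - (ll - k) / ll_zero
z = rho_bar(ll_2, k_2) - rho_bar(ll_1, k_1)
bound = norm.cdf(-np.sqrt(-2.0 * z * ll_zero + (k_2 - k_1)))
print(f"adj. rho-sq difference = {z:.4f}, upper bound on error prob. = {bound:.4f}")
```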
In these situations, it is the researcher who, in one way or another, needs to make a trade-off between the costs and benefits of using choice models based on alternative decision rules versus using the standard model. Note that, as stated in the introduction, alternative decision rules have sometimes been put forward solely on the basis of their presumably better representation of the behavioural process. Formally, this suggests that the researcher knows the actual choice process (presumably based on some form of introspection) and considers this process to
be the 'fact' that the model needs to describe. However, past research (see Nisbett and Wilson, 1977, for a classic paper, or Senk, 2010, for a recent discussion in the context of travel choice behaviour) convincingly shows that individuals' own perceptions and accounts of the processes used to arrive at choices are notoriously unreliable. As such, treating choice processes as facts does not seem a particularly promising idea. In the end, for the large majority of travel demand studies, choice outcomes are the only (or at least the most important) facts that choice modellers can and should rely on.
4 TRAVEL CHOICE MODELS BASED ON ALTERNATIVE DECISION RULES: THEIR POTENTIAL AND HOW TO FULFIL IT
Whereas the previous section highlighted pitfalls and challenges related to choice models based on alternative decision rules, this section adopts a more positive perspective: it discusses the potential benefits these models may bring to scholars and practitioners, and it presents recommendations concerning how to help fulfil this potential. In section 4.1, the focus is on benefits in terms of model fit and predictive ability. Section 4.2 discusses benefits in terms of increases in behavioural insights and related policy implications.
4.1 Potential of Alternative Decision Rules: Model Fit and Predictive Performance
When inspecting recent work, it appears that more often than not, choice models based on alternative decision rules achieve higher levels of goodness of fit than their conventional (linear-additive utilitarian) counterparts. For most alternative models, such as the majority of models using some form of reference dependency, these increases in fit come with losses in degrees of freedom (that is, additional parameters). However, statistical tests usually show that the improved model fit more than compensates, in a statistical sense, for these losses in parsimony. In most cases, the alternative choice model – when consuming more parameters than its linear-additive utilitarian counterpart – is specified in such a way that it reduces to the standard model for some values of its additional parameters; in those cases, as a rule, the estimates obtained for these additional parameters are in line with the behavioural hypotheses underlying the alternative decision rule (such as the hypotheses of loss aversion and increasing/decreasing sensitivity). In other words, the statistical hypothesis that the alternative model reduces to the standard model is usually rejected at conventional significance levels. Hess et al. (2012) find that latent class models – each class being characterized by a different decision rule and set of tastes – achieve notably higher levels of fit (when correcting for the large numbers of additional parameters) than models that assume one and the same decision rule for the entire population.

Some alternative models, such as the RRM model and the symmetric version of the relative advantage model, do not reduce to the standard linear-additive utilitarian model for particular parameter values; the two mentioned models consume the same number of parameters as the standard model. Empirical evidence for these models is mixed. In the case of the RRM model, it is the author's impression (based on published and unpublished work involving model estimations on over 30 data-sets – see Chorus et al., 2014, for a recent overview) that model fit comparisons can be summarized as follows:
in roughly one-third of cases (data-sets) a model where every attribute is processed in a regret minimization fashion achieves the best fit; in roughly one-third of cases a model where every attribute is processed in a linear-additive utility maximization fashion achieves the best fit; and in the remaining, roughly one-third, of cases a hybrid specification performs best (in such a specification, one or more, but not all, attributes are processed by means of regret minimization – see Chorus et al., 2013a, for details of this hybrid variant). Although generally significant, model fit differences between these three model types vary between very small and modest.

A notable exception is the situation where, in the context of stated choice experiments, a so-called 'opt out' or 'no choice' option exists. Hess et al. (2014) found that when this option is framed as 'none of these' (or a variant thereof), the regret model is likely to perform much worse than its utilitarian counterpart in terms of model fit, and might even produce biased parameters. But when the 'opt out' is framed as 'I am indifferent' (or a variant thereof), the regret model is likely to perform much better than its utilitarian counterpart, the latter generating biased parameters. As explained in Hess et al. (2014), these substantial differences in performance in the context of different formulations of the 'no choice' option can be directly and unambiguously related to the differences in behavioural premises underlying the two model types. However, note that Chorus and Rose (2012), using other data, found that even in the context of a 'none of these' opt out, the RRM model achieved a better fit than its random utility maximization (RUM) counterpart (albeit the difference, in favour of RRM, was much bigger when the opt out was not taken into account). More research is needed to find out to what extent these findings related to the impact of opt-out formulations on the performance of RUM and RRM models can be generalized.

Concerning the symmetric relative advantage model, Leong and Hensher (2014) show a good performance of the model, in terms of goodness of fit, compared with linear-additive utilitarian models and regret minimization models: in seven out of eight data-sets, the symmetric relative advantage model achieves a higher log-likelihood than competing models, while – as mentioned above – consuming the same number of parameters. The differences are small, however, and it should also be noted that the stated choice data-sets used are all very much alike in terms of choice context, choice-set composition and attributes used, which to some extent compromises the strength of the evidence. Notwithstanding this, the good performance of the symmetric (that is, restricted) version of the relative advantage model is noteworthy in light of the fact that previous research has highlighted the weak performance of the unrestricted or generic model variant on two consumer choice data-sets (Kivetz et al., 2004).

Whereas the large majority of studies proposing or testing travel choice models based on alternative decision rules report differences in goodness of fit (and the associated statistical tests), far fewer studies compare model forms in terms of their predictive performance on validation samples, for example, using hold-out tasks in stated choice surveys.
Performing these out-of-sample validation exercises is to be recommended, especially when there is a difference in the number of parameters consumed by the models being compared (which implies a potential risk of over-fitting on the estimation sample). A number of studies comparing the RRM model with its linear-additive utilitarian counterpart do report such comparisons in terms of predictive performance on a validation sample (for example, Chorus, 2010; Kaplan and Prato, 2012; Chorus et al., 2013a, 2013b). Results suggest that a similar picture in terms of the average performance
of regret-based, utility-based and hybrid models arises as was obtained for model fit. However, and more surprisingly, it is found that relative differences between models in terms of out-of-sample predictive performance are often not in line with differences in terms of model fit; only on very few occasions does one and the same model type perform better in model fit as well as in out-of-sample predictive performance. This result for random regret models can be considered further evidence of the need to look beyond model fit when comparing travel choice models based on different decision rules.

In relation to the above discussion of model comparisons in terms of goodness of fit and out-of-sample predictive ability, it is worth noting that the large majority of studies into alternative travel choice models use stated preference (SP) rather than revealed preference (RP) data to test and compare model specifications (for example, the author is aware of only five RP applications of the RRM model, compared with well over 20 SP studies; for other alternative travel choice models, the distribution across data types appears to be even more skewed in favour of the SP approach). This is worrying not only in light of the fact that RP data present a more convincing testing ground for choice models in general; as argued in the subsection on identification difficulties, SP data may also turn out to be less suitable for the application of some alternative choice models in particular – think of the above-mentioned difficulties associated with identifying 'lexicographic' behaviour in the context of SP data (Killi et al., 2007; Hess et al., 2010; Börjesson et al., 2012), or the difficulties associated with the artificially increased salience of reference alternatives in pivoted SP experiments. Although the use of RP data comes with a number of its own challenges, it is to be recommended that – more than is currently the case – alternative choice models are put to the empirical test using revealed choices, rather than stated ones.

To summarize: when one considers from a purely statistical perspective the question whether or not to embed alternative decision rules in choice models, the answer – which depends on the data used and on the type of models being compared – is usually 'yes'. However, in most cases model-fit differences have been found to be modest at best (latent class models appearing to be notable exceptions). Furthermore, for most alternative choice models it is as yet unclear whether, or to what extent, improvements in model fit translate into better predictive performance on validation samples. Moreover, as argued in the previous section, there are more perspectives on choice modelling than the purely statistical one. Less methodologically oriented researchers, and practitioners even more so, will ask a different kind of question when confronted with travel choice models based on alternative decision rules: will these models ultimately lead to new behavioural insights and/or more informed planning and policy-making?
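As a sketch of the out-of-sample comparison recommended above, the following snippet scores two parameter vectors (stand-ins for estimates obtained elsewhere, not results from any real study) on simulated hold-out choices via their predictive log-likelihood:

```python
# Sketch: comparing two already-estimated models on a hold-out sample via
# predictive log-likelihood. Data and 'estimated' parameters are synthetic.
import numpy as np

rng = np.random.default_rng(2)
X_hold = rng.normal(size=(500, 3, 2))      # hold-out tasks: 3 alternatives, 2 attributes

def logit_probs(beta, X):
    v = X @ beta
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

true_p = logit_probs(np.array([1.0, -0.7]), X_hold)
chosen = np.array([rng.choice(3, p=p) for p in true_p])   # simulated hold-out choices

def predictive_ll(beta, X, chosen):
    """Sum of log predicted probabilities of the observed choices."""
    p = logit_probs(beta, X)
    return np.log(p[np.arange(len(chosen)), chosen]).sum()

beta_a = np.array([0.95, -0.65])           # stand-ins for parameters estimated elsewhere
beta_b = np.array([0.40, -1.20])
print("hold-out LL, model A:", round(predictive_ll(beta_a, X_hold, chosen), 1))
print("hold-out LL, model B:", round(predictive_ll(beta_b, X_hold, chosen), 1))
# No parameters are estimated on the hold-out sample, so the higher predictive
# log-likelihood directly indicates the better out-of-sample performance.
```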
4.2 Potential of Alternative Decision Rules: Behavioural Insights and Policy Implications
Any claim that alternative choice models lead to increases in behavioural insights, and as such to better policy-making, necessarily implies that these models provide results (in terms of parameter estimates, willingness-to-pay measures, elasticities or market shares) that differ from those obtained from the standard model. A quick inspection of the literature suggests that this condition is met to some extent. For example, reference-dependent models of various types have been shown to lead to willingness-to-pay estimates which
substantially differ from those obtained from conventional, that is linear-additive, travel choice models (for example, De Borger and Fosgerau, 2008; Hess et al., 2008; Leong and Hensher, 2012b). A counter-example is the symmetric relative advantage model: Leong and Hensher (2014) report that differences between the symmetric relative advantage model and the standard model, in terms of willingness-to-pay measures, are without exception very small and insignificant. This suggests that the added value of that model for generating new and different behavioural insights and policy implications is limited at first sight, although more research is needed regarding whether or not this result is generic across outcome types (for example, elasticities and market shares) and choice contexts.

Looking at the RRM model, it appears that model outcomes such as elasticities and choice probabilities in particular choice situations can differ substantially from those generated by standard models, even when the aggregate model fit hardly differs between the two model types (for example, Kaplan and Prato, 2012; Thiene et al., 2012; Chorus and Bierlaire, 2013; Chorus et al., 2013b; Boeri and Masiero, 2014; Hensher et al., 2013). However, in a number of cases, these differences turn out not to be statistically significant. See Chorus and Bierlaire (2013), de Bekker-Grob and Chorus (2013) and Leong and Hensher (2014) for statistical tests of differences between regret- and utility-based model outcomes. For example, Chorus and Bierlaire (2013) find that differences in market shares in situations where compromise alternatives are present are significant, while differences in aggregate-level elasticities are not. To what extent these differences in model outcomes lead to new behavioural insights into traveller behaviour is to a large extent an open question. Perhaps the two most convincingly proven (and strongly related) behavioural effects are that (1) reference points matter, and (2) the composition of the choice set matters. In other words, travellers' preferences and their resulting choice behaviour appear to be sensitive to the presence of seemingly irrelevant reference points, and to seemingly irrelevant peculiarities of the choice set. These insights are in line with results obtained in the context of more general consumer choice studies performed in the adjacent field of marketing research (see, for example, Rooderkerk et al., 2011, for a recent contribution to that literature).

In terms of policy and planning implications, these results suggest that travel behaviour can be influenced by tuning (some would say manipulating) the choice context in which a traveller makes his or her decisions, without even changing the attributes of the 'target' alternative. Again, in the field of marketing such choice-set engineering approaches are well known and routinely applied; recent empirical work in transportation suggests that such approaches may hold benefits in terms of more effective travel demand management as well. An example of such a choice-set engineering approach would be the 'construction' of choice sets in such a way that an alternative that a travel demand manager wishes to see gain a large market share (for example, a sustainable mode of transport) is positioned as a compromise alternative.
More generally speaking, alternative travel choice models that capture the type of contextual preferences discussed directly above can be employed to assist travel demand managers in designing clever 'nudging' strategies (for example, Avineri, 2012) aimed at inducing more sustainable mobility behaviour. Another potential avenue for putting alternative choice models to use with the aim of arriving at more informed policy-making relates to forecasting studies. The idea here would be to simultaneously employ multiple choice model types (the standard model and one or more alternative models) for the analysis and prediction of travel behaviour.
The outcomes of the different models (in terms of, for example, elasticities and/or market-share forecasts) may then be used to obtain what may be called 'behavioural confidence intervals' and/or to perform what may be called 'behavioural sensitivity analyses'. That is, to the extent that different model types generate different outcomes, it makes sense to consider each model type (and associated outcome) a possible scenario, in roughly the same way as one would work with different scenarios concerning, for example, demographic developments. Confronted with these different behavioural scenarios, policy-makers and planners may then apply conventional techniques for dealing with multiple scenarios, with the aim of developing 'behaviourally robust' policies – that is, policies that are likely to turn out effective irrespective of which behavioural scenario (or which underlying travel choice model) in the end turns out to be the most correct one.

Two important steps, in terms of additional research beyond the issues mentioned earlier in this section, must be taken before the above-mentioned potential of alternative travel choice models is fully realized. First, there is a need to look beyond the decision rules currently used in travel choice research. So far, travel choice modellers have focused on a small set of alternative decision rules, while the broader decision-making literature suggests that there are many more decision rules that may form promising combinations with the discrete travel choice paradigm. Extensive surveys of alternatives to additive utilitarian decision rules can be found in Payne et al. (1993) and Gigerenzer and Selten (2002), to cite two prominent examples. Only after having diverged (in terms of exploring currently unexplored decision rules) can the travel choice modelling field start to converge by means of selecting one or a few particularly well-performing models.

A second research need refers to exploring and, where possible, broadening the applicability of each of the alternative decision rules. More specifically, when introducing and testing alternative travel choice models, their applicability beyond modelling choices must be explored before these models can become viable competitors of the linear-additive utility maximization-based travel choice model. For example, it is important that the travel choice modelling community tries to gain a more solid and deeper understanding of the welfare-economic implications of using alternative decision rules, by means of exploring the axiomatic foundation and behavioural interpretation of willingness-to-pay and user-benefit measures. See, for example, Stathopoulos and Hess (2012) for a discussion of willingness to pay in the context of reference-dependent models, and Chorus (2012c) for a discussion of user-benefit measurement in the context of regret minimization models. To cite another example, translating alternative choice models into viable approaches to predicting traffic equilibria is of great importance to many practitioners; see Delle Site and Filippi (2011) and Bekhor et al. (2012) for recent attempts to derive traffic equilibrium formulations in the context of reference dependency and regret minimization, respectively.
Related to this, it is very much an open question how alternative model types would fare in the context of large-scale transportation models; would differences between alternative models and the standard model cancel out at the aggregate (macro) level, or – on the contrary – is it possible that these differences are even further amplified by the introduction of, for example, supply-demand interactions? These are very important questions that are worthy of the attention of our research community. In short, when an alternative travel choice model, after rigorous testing on SP and RP data, achieves the status of a promising candidate for travel choice modelling, a necessary next step is to transform that choice model from being a mere 'tool' (for understanding choice
behaviour at the micro level) to becoming a 'toolbox' (that facilitates welfare-economic appraisal and large-scale transport modelling, for example). Only such an alternative toolbox can offer a full-fledged alternative to the very well-equipped toolbox that has been put together over the years for the standard model.
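Returning to the 'behavioural confidence interval' idea introduced in this section, the following minimal sketch (with purely illustrative forecast numbers) shows how the spread of forecasts across model types could be reported as a behavioural scenario range:

```python
# Sketch of the 'behavioural confidence interval' idea: treat each model
# type's forecast as a scenario and report the spread. The market-share
# forecasts below are placeholders, not results from any real study.
forecasts = {
    "linear-additive RUM": 0.31,
    "RRM": 0.27,
    "hybrid RUM-RRM": 0.29,
}
lo, hi = min(forecasts.values()), max(forecasts.values())
print(f"forecast share of the target alternative: {lo:.0%} to {hi:.0%} across behavioural scenarios")
```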
ACKNOWLEDGEMENTS

Support from the Netherlands Organization for Scientific Research (NWO), in the form of VENI grant 451.10.001 and VIDI grant 016-125-305, is gratefully acknowledged.
NOTES

1. See Hensher and Rose (2011) for a collection of ground-breaking contributions to the field of choice modelling and an extensive historical account of progress in that area.
2. Note again that the appendix provides a mathematical representation of the discussed models (those that have been used in travel demand studies) from within one unified notational framework.
3. A similar observation can be made concerning applications of the satisficing rule in choice models developed in other domains; note, though, that a very recent study presents a choice model based on a satisficing decision rule in a consumer choice context (Stüttgen et al., 2012).
4. This preference for compromise alternatives has been empirically very well documented in fields adjacent to transportation (for example, Simonson, 1989; Wernerfelt, 1995; Kivetz et al., 2004; Müller et al., 2010; Chorus and Rose, 2012), and a recent publication shows its potential relevance for travel behaviour research as well (Chorus and Bierlaire, 2013).
5. In a more general sense, these findings are in line with recent work in risky decision-making theory which suggests that the performance of different decision rules (embedded in discrete choice models) is to some extent dependent on the chosen error term structure (Blavatskyy and Pogrebna, 2010).
6. Practically without exception, discrete choice modellers locate reference points themselves (that is, reference point location is exogenous to the model), often based on a combination of intuition and an empirical process of trial and error. Recent theoretical work (Schmidt and Zank, 2012) provides directions for having reference point locations arise endogenously, something which is obviously to be preferred from a scientific viewpoint. However, the ideas put forward in that paper have not yet been tested empirically in a discrete choice context.
7. In some instances, researchers have succeeded in using canned software for the estimation of models relying on 'kinked' utility functions by means of pre-processing the data, for example by a priori specifying which attribute level is least preferred in a given choice set (this example refers to the estimation of a contextual concavity model). Besides the fact that this pre-processing is cumbersome, especially in situations where alternatives are specified in terms of many attributes, a more fundamental drawback of this approach is that it presupposes prior knowledge on the part of the analyst regarding the signs of the taste parameters (that is, knowledge relating to whether higher attribute values are preferred over lower attribute values, or vice versa). In some cases (such as travel times) parameter signs are obvious, but in other cases they may not be. Furthermore, it should be noted that when parameter signs are unconstrained in the estimation process (as they usually are), one may run into a situation where the signs of the parameters at some stage of the estimation process do not match the a priori assumptions that were used to pre-process the data.
REFERENCES

Arentze, T.A. and H.J.P. Timmermans (2007), 'Parametric action trees: incorporating continuous attribute variables into rule-based models of discrete choice', Transportation Research Part B, 41 (7), 772–83.
Avineri, E. (2012), 'On the use and potential of behavioural economics from the perspective of transport and climate change', Journal of Transport Geography, 24 (September), 512–21.
Avineri, E. and P.H.L. Bovy (2008), 'Identification of parameters for prospect theory model for travel choice analysis', Transportation Research Record, 2082, 141–7.
Batley, R. and A. Daly (2006), 'On the equivalence between elimination-by-aspects and generalized extreme value models of choice behaviour', Journal of Mathematical Psychology, 50 (5), 456–67.
Beck, M.J., C.G. Chorus and J.M. Rose (2013), 'Vehicle purchasing behaviour of individuals and groups: regret or reward?', Journal of Transport Economics and Policy, 47 (3), 475–92.
Bekhor, S., C.G. Chorus and T. Toledo (2012), 'Stochastic user equilibrium for route choice model based on random regret minimization', Transportation Research Record, 2284, 100–108.
Ben-Akiva, M. (1974), 'Structure of passenger travel demand models', Transportation Research Record, 526, 26–41.
Ben-Akiva, M. and B. Boccara (1995), 'Discrete choice models with latent choice-sets', International Journal of Research in Marketing, 12 (1), 9–24.
Ben-Akiva, M. and S.R. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Ben-Akiva, M.E. and J. Swait (1986), 'The Akaike likelihood ratio index', Transportation Science, 20 (2), 133–6.
Bhat, C.R. (2005), 'A multiple discrete-continuous extreme value model: formulation and application to discretionary time-use decisions', Transportation Research Part B, 39 (8), 679–707.
Blavatskyy, P.R. and G. Pogrebna (2010), 'Models of stochastic choice and decision theories: why both are important for analyzing decisions', Journal of Applied Econometrics, 25 (6), 963–86.
Boeri, M. and L. Masiero (2014), 'Regret minimization and utility maximization in a freight transport context: an application from two stated choice experiments', Transportmetrica, forthcoming.
Börjesson, M., M. Fosgerau and S. Algers (2012), 'Catching the tail: empirical identification of the distribution of the value of travel time', Transportation Research Part A, 46 (2), 378–91.
Brownstone, D. and K. Train (1998), 'Forecasting new product penetration with flexible substitution patterns', Journal of Econometrics, 89 (1–2), 109–29.
Chorus, C.G. (2010), 'A new model of random regret minimization', European Journal of Transport and Infrastructure Research, 10 (2), 181–96.
Chorus, C.G. (2012a), 'What about behaviour in travel demand modelling? An overview of recent progress', Transportation Letters, 4 (2), 93–104.
Chorus, C.G. (2012b), 'Random regret minimization: an overview of model properties and empirical evidence', Transport Reviews, 32 (1), 75–92.
Chorus, C.G. (2012c), 'Logsums for utility-maximizers and regret-minimizers, and their relation with desirability and satisfaction', Transportation Research Part A, 46 (7), 1003–12.
Chorus, C.G. and M. Bierlaire (2013), 'An empirical comparison of travel choice models that capture preferences for compromise alternatives', Transportation, 40 (3), 549–62.
Chorus, C.G. and G.C. de Jong (2011), 'Modeling experienced accessibility for utility-maximizers and regret-minimizers', Journal of Transport Geography, 19 (7), 1155–62.
Chorus, C.G. and J.M. Rose (2012), 'Selecting a date: a matter of regret and compromises', in S. Hess and A. Daly (eds), Choice Modelling: The State of the Art and the State of Practice, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, ch. 11.
Chorus, C.G., S. van Cranenburgh and T. Dekker (2014), 'Random Regret Minimization for consumer choice modeling: assessment of empirical evidence', Journal of Business Research, forthcoming.
Chorus, C.G., T.A. Arentze and H.J.P. Timmermans (2008), 'A random regret minimization model of travel choice', Transportation Research Part B, 42 (1), 1–18.
Chorus, C.G., J.M. Rose and D.A. Hensher (2013a), 'Regret minimization or utility maximization: it depends on the attribute', Environment and Planning Part B, 40 (1), 154–69.
Chorus, C.G., M.J. Koetse and A. Hoen (2013b), 'Consumer preferences for alternative fuel vehicles: comparing a utility maximization and a regret minimization model', Energy Policy, 61 (October), 901–8.
Daly, A. (1982), 'Applicability of disaggregate models of behaviour: a question of methodology', Transportation Research Part A, 16 (5–6), 363–70.
De Bekker-Grob, E.W. and C.G. Chorus (2013), 'Random regret-based discrete choice modelling: an application to health care', PharmacoEconomics, 31 (7), 623–34.
De Borger, B. and M. Fosgerau (2008), 'The trade-off between money and time: a test of the theory of reference dependent preferences', Journal of Urban Economics, 64 (1), 101–15.
Delle Site, P. and F. Filippi (2011), 'Stochastic user equilibrium and value-of-time analysis with reference-dependent route choice', European Journal of Transport and Infrastructure Research, 11 (2), 194–218.
Gigerenzer, G. and T. Selten (2002), Bounded Rationality: The Adaptive Toolbox, Cambridge, MA: MIT Press.
Guevara, C.A., C.G. Chorus and M.E. Ben-Akiva (2013), 'Sampling of alternatives in random regret minimization models', proceedings of the 92nd annual meeting of the Transportation Research Board, Washington, DC, 13–17 January.
Hensher, D.A. (2010), 'Attribute processing, heuristics and preference construction in choice analysis', in S. Hess and A. Daly (eds), State-of-the-Art and State-of-Practice in Choice Modelling, Bingley: Emerald Press, pp. 35–70.
Hensher, D.A. and J.M. Rose (2011), Choice Modelling: Foundational Contributions, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Hensher, D.A., W. Greene and C.G. Chorus (2013), 'Random regret minimisation or random utility maximisation: an exploratory analysis in the context of automobile choice', Journal of Advanced Transportation, 47 (7), 667–78.
Hess, S., M. Beck and C.G. Chorus (2014), 'Contrasts between utility maximisation and regret minimisation in the presence of opt out alternatives', Transportation Research Part A, forthcoming.
Hess, S., J.M. Rose and D.A. Hensher (2008), 'Asymmetric preference formation in willingness to pay estimates in discrete choice models', Transportation Research Part E, 44 (5), 847–63.
Hess, S., J.M. Rose and J. Polak (2010), 'Non-trading, lexicographic and inconsistent behaviour in stated choice data', Transportation Research Part D, 15 (7), 405–17.
Hess, S., A. Stathopoulos and A. Daly (2012), 'Allowing for heterogeneous decision-rules in discrete choice models: an approach and four case-studies', Transportation, 39 (3), 565–91.
Kahneman, D. (2003), 'Maps of bounded rationality: psychology for behavioural economics', American Economic Review, 93 (5), 1449–75.
Kahneman, D. and A. Tversky (1979), 'Prospect theory: an analysis of decision under risk', Econometrica, 47 (2), 263–91.
Kaplan, S. and G. Prato (2012), 'The application of the random regret minimization model to drivers' choice of crash avoidance maneuvers', Transportation Research Part F, 15 (6), 699–709.
Kaplan, S., Y. Shiftan and S. Bekhor (2012), 'Development and estimation of a semi-compensatory model with a flexible error structure', Transportation Research Part B, 46 (2), 291–302.
Killi, M., A. Nossum and K. Veisten (2007), 'Lexicographic answering in travel choice: insufficient scale extensions and steep indifference curves?', European Journal of Transport and Infrastructure Research, 7 (1), 39–62.
Kivetz, R., O. Netzer and V. Srinivasan (2004), 'Alternative models for capturing the compromise effect', Journal of Marketing Research, 41 (August), 237–57.
Leong, W. and D.A. Hensher (2012a), 'Embedding decision heuristics in discrete choice models: a review', Transport Reviews, 32 (3), 313–31.
Leong, W. and D.A. Hensher (2012b), 'Embedding multiple heuristics into choice models: an alternative approach', Journal of Choice Modelling, 5 (3), 131–44.
Leong, W. and D.A. Hensher (2014), 'Contrasts of relative advantage maximization with random utility maximization and regret minimization', Journal of Transport Economics and Policy, forthcoming.
Li, Z. and D. Hensher (2011), 'Prospect theoretic contributions in understanding traveller behaviour: a review and some comments', Transport Reviews, 31 (1), 97–115.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice-behaviour', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
McFadden, D. (1978), 'Modeling the choice of residential location', in A. Karlquist, L. Lundqvist, F. Snickers and J.W. Weibull (eds), Spatial Interaction Theory and Residential Location, Amsterdam: North Holland, pp. 75–96.
McFadden, D. (2007), 'The behavioural science of transportation', Transport Policy, 14 (4), 269–74.
McFadden, D. and K.E. Train (2000), 'Mixed MNL models for discrete response', Journal of Applied Econometrics, 15 (5), 447–70.
Müller, H., E.B. Kroll and B. Vogt (2010), 'Fact or artifact? Empirical evidence on the robustness of compromise effects in binding and non-binding choice contexts', Journal of Retailing and Consumer Services, 17 (5), 441–8.
Nisbett, R.E. and T. DeCamp Wilson (1977), 'Telling more than we can know: verbal reports on mental processes', Psychological Review, 84 (3), 231–59.
Ortúzar, J. and L.G. Willumsen (2011), Modelling Transport, 4th edn, Chichester: Wiley.
Payne, J.W., J.R. Bettman and E.J. Johnson (1993), The Adaptive Decision Maker, Cambridge: Cambridge University Press.
Prato, C.G. (2014), 'Estimating random regret minimization models in the route choice context', Transportation, 41 (2), 351–75.
Rooderkerk, R.P., H.J. van Heerde and T.H.A. Bijmolt (2011), 'Incorporating context effects into a choice model', Journal of Marketing Research, 48 (4), 767–80.
Saelensminde, K. (2006), 'Causes and consequences of lexicographic choice in stated choice studies', Ecological Economics, 59 (3), 331–40.
Schmidt, U. and H. Zank (2012), 'A genuine foundation for prospect theory', Journal of Risk and Uncertainty, 45 (2), 97–113.
Senk, P. (2010), 'Route choice under the microscope', Transportation Research Record, 2156, 56–63.
Simon, H.A. (1955), 'A behavioural model of rational choice', Quarterly Journal of Economics, 69 (1), 99–118.
Simonson, I. (1989), 'Choice based on reasons: the case of attraction and compromise effects', Journal of Consumer Research, 16 (September), 158–74.
Small, K.A. (1987), 'A discrete choice model for ordered alternatives', Econometrica, 55 (2), 409–24.
Small, K.A. and H.S. Rosen (1981), 'Applied welfare economics with discrete choice models', Econometrica, 49 (1), 105–30.
Small, K.A. and E.T. Verhoef (2007), The Economics of Urban Transportation, New York: Routledge.
Stathopoulos, A. and S. Hess (2012), 'Revisiting reference point formation, gain-loss asymmetry and non-linear sensitivities with an emphasis on attribute specific treatment', Transportation Research Part A, 46 (10), 1673–89.
Stüttgen, P., P. Boatwright and R.T. Monroe (2012), 'A satisficing choice model', Marketing Science, 31 (6), 878–99.
Swait, J. and M. Ben-Akiva (1987), 'Incorporating random constraints in discrete models of choice set generation', Transportation Research Part B, 21 (2), 91–102.
Thiene, M., M. Boeri and C.G. Chorus (2012), 'Random regret minimization: exploration of a new choice model for environmental and resource economics', Environmental and Resource Economics, 51 (3), 413–29.
Tversky, A. (1972), 'Elimination by aspects: a theory of choice', Psychological Review, 79 (4), 281–99.
Tversky, A. and D. Kahneman (1991), 'Loss aversion in riskless choice: a reference-dependent model', Quarterly Journal of Economics, 106 (4), 1039–61.
Tversky, A. and I. Simonson (1993), 'Context dependent preferences', Management Science, 39 (10), 1179–89.
Van Acker, V., P. Mokhtarian and F. Witlox (2011), 'Going soft: on how subjective variables explain modal choices for leisure travel', European Journal of Transport and Infrastructure Research, 11 (2), 115–46.
Van de Kaa, E. (2010), 'Prospect theory and choice behaviour strategies: review and synthesis of concepts from social and transport sciences', European Journal of Transport and Infrastructure Research, 10 (4), 299–329.
Walker, J.L. and M.E. Ben-Akiva (2002), 'Generalized random utility model', Mathematical Social Sciences, 43 (3), 303–43.
Washington, S.P., M.G. Karlaftis and F.L. Mannering (2003), Statistical and Econometric Methods for Transportation Data Analysis, New York: CRC Press.
Wernerfelt, B. (1995), 'A rational reconstruction of the compromise effect: using market data to infer utilities', Journal of Consumer Research, 21 (4), 627–33.
Zhu, W. and H.J.P. Timmermans (2010), 'Cognitive process model of individual choice behaviour incorporating principles of bounded rationality and heterogeneous decision-rules', Environment and Planning Part B, 37 (1), 59–74.
APPENDIX: MATHEMATICAL FORMULATIONS OF DECISION RULES DISCUSSED IN THE CHAPTER1

Notation

$C$: set of choice alternatives
$i, j$: choice alternatives in $C$
$V_i$: utility of alternative $i$
$R_i$: regret of alternative $i$
$x_{im}$: value of the $m$'th attribute2 of alternative $i$
$\tilde{x}_m$: aspiration level for the $m$'th attribute
$\bar{x}_m$: reference value of the $m$'th attribute
$m^*$: the most important of the $M$ attributes
$\beta_m$: taste parameter (attribute weight) for the $m$'th attribute
$\beta_m^-$: taste parameter for the $m$'th attribute (for $x_{im} \le \bar{x}_m$)
$\beta_m^+$: taste parameter for the $m$'th attribute (for $x_{im} > \bar{x}_m$)
$\phi_m, \mu_m$: concavity-convexity parameters for the $m$'th attribute
$\gamma_m$: loss-aversion parameter
$\delta$: relative advantage parameter
$y_i$: denotes whether (1) or not (0) alternative $i$ is chosen

A13.1 Standard Model: Linear-Additive Utility Maximization

$$V_i = \sum_m \beta_m x_{im}; \qquad y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.2 Elimination-by-Aspects3

$$y_i = 1 \Leftrightarrow x_{im} \ge \tilde{x}_m \ \forall m$$

A13.3 Lexicographic Choice

$$y_i = 1 \Leftrightarrow x_{im^*} = \max_{j \in C} \left[ x_{jm^*} \right]$$

A13.4 Reference-Dependent Utility Maximization I4

$$V_i = \sum_m \left( -\beta_m^- \max\left[0, \bar{x}_m - x_{im}\right] + \beta_m^+ \max\left[0, x_{im} - \bar{x}_m\right] \right); \qquad y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.5 Reference-Dependent Utility Maximization II5

$$V_i = \sum_m \left( -\gamma_m \max\left[0, \bar{x}_m - x_{im}\right]^{\phi_m} + \max\left[0, x_{im} - \bar{x}_m\right]^{\mu_m} \right); \qquad y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.6 Relative Advantage Model

$$V_i = \sum_m \beta_m x_{im} + \delta \sum_{j \ne i} \frac{\sum_m \max\left[0, \beta_m (x_{im} - x_{jm})\right]}{\sum_m \max\left[0, \beta_m (x_{im} - x_{jm})\right] + \sum_m \left( \max\left[0, \beta_m (x_{jm} - x_{im})\right] + \gamma_m \left\{ \max\left[0, \beta_m (x_{jm} - x_{im})\right] \right\}^{\phi_m} \right)}$$

$$y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.7 Symmetric Relative Advantage Model

$$V_i = \sum_m \beta_m x_{im} + \sum_{j \ne i} \frac{\sum_m \ln\left(1 + \exp\left[\beta_m (x_{im} - x_{jm})\right]\right)}{\sum_m \ln\left(1 + \exp\left[\beta_m (x_{im} - x_{jm})\right]\right) + \sum_m \ln\left(1 + \exp\left[\beta_m (x_{jm} - x_{im})\right]\right)}$$

$$y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.8 Contextual Concavity Model

$$V_i = \sum_m \left( \beta_m x_{im} - \beta_m \min_{j \in C} \left[ x_{jm} \right] \right)^{\phi_m}; \qquad y_i = 1 \Leftrightarrow V_i \ge V_j \ \forall j \in C$$

A13.9 Random Regret Minimization Model: 2008 Specification

$$R_i = \max_{j \ne i} \left[ \sum_m \max\left[0, \beta_m (x_{jm} - x_{im})\right] \right]; \qquad y_i = 1 \Leftrightarrow R_i \le R_j \ \forall j \in C$$

A13.10 Random Regret Minimization Model: 2010 Specification

$$R_i = \sum_{j \ne i} \sum_m \ln\left(1 + \exp\left[\beta_m (x_{jm} - x_{im})\right]\right); \qquad y_i = 1 \Leftrightarrow R_i \le R_j \ \forall j \in C$$

Notes
1. Note that this appendix adopts the perspective of the decision-maker, not that of the analyst. Therefore, it does not discuss the random error term (representing uncertainty on the side of the analyst) and its possible distributional assumptions (leading to, for example, logit, probit, nested logit or mixed logit model forms). Most of the decision rules presented here are compatible with various error term distributions. As a consequence, one may compose linear-additive utility maximization-based logit, probit, nested logit and mixed logit models as well as regret-based counterparts or relative advantage-based counterparts, to cite a few examples.
2. Note that the focus of this appendix is on continuous attributes. Note also that – for ease of communication and without loss of general applicability – it is assumed throughout this appendix that higher attribute values are preferred over lower ones.
3. Note that the original elimination-by-aspects models focused on dummy attributes, whereas most travel choice applications involve continuous attributes. As mentioned in note 2, this appendix focuses on the continuous attribute case. One possible translation (see Hess et al., 2012, for alternative translations) of the notion of aspect elimination in a continuous attribute context is to assume that the individual holds attribute-specific aspiration levels, which can be set by the analyst in a process of trial and error or – at least theoretically – estimated. If the alternative scores below that aspiration level for a given attribute, this is interpreted to imply that the alternative does not contain this 'aspect' and is therefore eliminated. For ease of communication, it is assumed that there is only one alternative that 'contains all aspects'/meets all aspiration levels. When multiple alternatives remain, it is assumed that an alternative is randomly chosen from that subset. Finally, note that, in line with the discussion in the chapter, the focus here is on the choice outcome, whereas the original elimination-by-aspects model also focused on the choice process (that is, the sequence, or order, of elimination).
4. This specification allows for loss aversion ($\beta_m^- > \beta_m^+$), but it does not allow for decreasing or increasing sensitivity. Note also that this specification hypothesizes reference dependency for all attributes.
5. This specification allows for loss aversion ($\gamma_m > 1$) as well as for decreasing or increasing sensitivity ($\phi_m \ne 1$, $\mu_m \ne 1$). Note that this is only one of several competing specifications (for example, De Borger and Fosgerau, 2008; Stathopoulos and Hess, 2012) that simultaneously allow for these two effects. Note furthermore that this specification hypothesizes reference dependency for all attributes.
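To make the formulations above concrete, the following sketch (illustrative data and parameter values) evaluates the linear-additive utility rule (A13.1) and both RRM specifications (A13.9, A13.10) on a single three-alternative choice set. Note how the middle ('compromise') alternative obtains the lowest regret even though it does not have the highest utility, and how the 2010 logsum form smoothly approximates the 2008 max-operator form:

```python
# Sketch implementing three of the decision rules formalized above for one
# choice set, in the appendix notation: x[i, m] is attribute m of alternative
# i and beta[m] its taste weight. Following appendix note 2, higher attribute
# values are preferred. All numbers are illustrative.
import numpy as np

x = np.array([[1.0, 7.0],      # 'extreme' alternative
              [4.0, 4.0],      # 'compromise' alternative
              [7.0, 1.0]])     # 'extreme' alternative
beta = np.array([0.3, 0.4])
n = len(x)

# A13.1: linear-additive utility V_i = sum_m beta_m x_im
V = x @ beta

# A13.9: RRM, 2008 specification (max-operator form)
R2008 = np.array([max(np.maximum(0.0, beta * (x[j] - x[i])).sum()
                      for j in range(n) if j != i) for i in range(n)])

# A13.10: RRM, 2010 specification; ln(1 + exp(z)) smoothly approximates
# max(0, z), which is what makes this form estimable in standard packages
R2010 = np.array([sum(np.log1p(np.exp(beta * (x[j] - x[i]))).sum()
                      for j in range(n) if j != i) for i in range(n)])

print("utility V:    ", V.round(3), "-> RUM choice:", V.argmax())
print("regret (2008):", R2008.round(3), "-> RRM choice:", R2008.argmin())
print("regret (2010):", R2010.round(3), "-> RRM choice:", R2010.argmin())
```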
14 Latent class structures: taste heterogeneity and beyond
Stephane Hess1
1 INTRODUCTION
The treatment of heterogeneity across individual decision makers is one of the key topics of research in choice modelling, as evidenced by many of the chapters in this book. While part of this heterogeneity can in many cases be linked to differences in key socio-demographic characteristics across agents, there has long been a recognition that often a non-trivial share of it cannot be explained in this manner. A number of reasons exist: on the one hand, an inability to capture all possible socio-demographic characteristics that may be relevant, and on the other hand, the existence of idiosyncratic differences in preferences across decision makers. Limiting ourselves to a purely deterministic treatment of taste heterogeneity can result in a loss of explanatory power, a lack of insights into the true extent of preference heterogeneity, and, depending on the shape and extent of the omitted heterogeneity, potential bias in key model outputs. With the significant increase in performance of personal computers and the availability of easy-to-use software, a majority of academic studies as well as a large share of applied work now allow for some degree of random preference heterogeneity in their models.

The key principle in any model aiming to capture random heterogeneity is to allow for a distribution in sensitivities across decision makers. Two main approaches exist, making use of either a discrete or a continuous distribution. The former generally relies on the notion of individual latent classes of decision makers, although this chapter also briefly looks at discrete mixtures at the level of individual coefficients. The latter relies on the specification of a multivariate continuous distribution for the coefficients in a choice model. In recent years especially, the continuous specification, often simply referred to as mixed logit (despite latent class and discrete mixtures also being mixtures of logit models), has come to dominate in many fields, notably in transport. It should be acknowledged that this chapter focuses solely on continuous mixed logit rather than probit (cf. Daganzo, 1979), which is seeing a resurgence thanks to the work by Bhat (2011) and Bhat and Sidharthan (2011); many of the same distinctions discussed here apply to probit, albeit that the distributional assumptions are stricter than in mixed logit. Similarly, the latent class discussions focus on a logit kernel, that is, not discussing for example latent class probit structures.

The theoretical differences between continuous mixed logit and latent class logit were set out in detail by Greene and Hensher (2003), with empirical comparisons for example in Andrews et al. (2002), Hanley et al. (2002), Scarpa et al. (2005), Provencher and Bishop (2004) and Shen (2009). Aside from providing further detail relating to the general structure, notably in terms of the correlation structure findings of Hess et al. (2009), a key
focus of the present chapter is to look at important developments in latent class models since the work by Greene and Hensher (2003). First, in the ten years that have passed, a number of analysts have sought to combine the relative advantages of the two structures in hybrid models. Second, a larger body of (mainly empirical) research has made use of latent class structures with a view to capturing patterns of heterogeneity going beyond taste coefficients, looking at information processing, heuristics and heterogeneity in decision rules. Finally, there have also been further advances in terms of estimation performance for continuous mixture models in the last ten years, developments relating to the flexibility of mixing distributions, and growing use of continuous mixtures for capturing phenomena going beyond simple taste heterogeneity. Throughout the chapter, we do not seek to come to clear conclusions as to one model being superior to others; rather, we highlight that the choice of an appropriate approach may be situation specific, in line with a number of past empirical comparisons.
2 CONTRASTS BETWEEN MODEL STRUCTURES

2.1 Background Methodology

Let $P_{nit}(\beta)$ give the probability of individual $n$ choosing alternative $i$ in choice situation $t$, conditional on a vector of taste coefficients $\beta$. In a multinomial logit (MNL) model (cf. McFadden, 1974), we have:

$$ P_{nit}(\beta) = \frac{e^{V_{nit}}}{\sum_{j=1}^{J} e^{V_{njt}}}, \qquad (14.1) $$
where $J$ is the total number of alternatives, and where the deterministic utility $V_{nit}$ is given by $f(x_{nit}, \beta, z_n)$, which is a function of the attributes of alternative $i$ as faced by individual $n$ in task $t$, $x_{nit}$, the vector of taste coefficients $\beta$, and the vector of socio-demographic characteristics $z_n$.2 With $i^*_{nt}$ referring to the alternative chosen by individual $n$ in choice task $t$, the contribution by this individual to the likelihood function (across his/her $T_n$ choices) is simply given by $L_n(\beta) = \prod_{t=1}^{T_n} P_{ni^*t}(\beta)$, where the aim is to find values of $\beta$ that maximise this function at the sample level, and where simple maximum likelihood (ML) estimation is the most commonly used approach. In this specification, deterministic heterogeneity is accommodated through the interaction between the vectors $\beta$ and $z_n$, allowing potentially for a mixture between continuous interactions and segmentations. We now look at the treatment of random heterogeneity in three different approaches.
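To fix ideas, here is a minimal numerical sketch of equation 14.1 and the resulting log-likelihood in Python/NumPy. The two-attribute linear-in-attributes utility, the data values and all names (`mnl_probabilities`, `log_likelihood`, `X`, `chosen`) are illustrative assumptions for this sketch, not constructs from the chapter; the later sketches in this chapter reuse these helpers and data.

```python
import numpy as np

def mnl_probabilities(V):
    """Equation 14.1: logit probabilities from a (J,) vector of utilities."""
    eV = np.exp(V - V.max())          # subtract max for numerical stability
    return eV / eV.sum()

def log_likelihood(beta, X, chosen):
    """Sum of log MNL probabilities of the chosen alternatives.

    X      : (T, J, K) attributes for T tasks, J alternatives, K attributes
    chosen : (T,) index of the chosen alternative in each task
    """
    ll = 0.0
    for t in range(X.shape[0]):
        V = X[t] @ beta               # linear-in-attributes utility
        ll += np.log(mnl_probabilities(V)[chosen[t]])
    return ll

# Illustrative data: 2 tasks, 3 alternatives, 2 attributes (e.g. time, cost)
X = np.array([[[10.0, 2.0], [15.0, 1.0], [12.0, 1.5]],
              [[ 8.0, 3.0], [ 9.0, 2.5], [20.0, 0.5]]])
chosen = np.array([1, 0])
print(log_likelihood(np.array([-0.1, -0.5]), X, chosen))
```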
2.1.1 Continuous mixed logit
The first applications mixing logit probabilities across an assumed continuous distribution of elements in $\beta$ are generally credited to Boyd and Mellman (1980) and Cardell and Dunbar (1980), though widespread use of the model was to take almost two more decades, largely owing to computational complexity. In-depth discussions of the resulting model structure are given for example in McFadden and Train (2000), Hensher and Greene (2003) and Train (2009).
We now allow the vector $\beta$ to follow a random distribution with parameters $\Omega$, and the choice probabilities are given by:

$$ P_{nit}(\Omega) = \int_{\beta} P_{nit}(\beta)\, f(\beta \mid \Omega)\, d\beta, \qquad (14.2) $$

where $P_{nit}$ is the MNL choice probability from equation 14.1 and where $f(\beta \mid \Omega)$ gives the density function for the vector of taste coefficients $\beta$, which could allow for some fixed elements as well as correlation between individual random elements. Clearly, there is also scope for still incorporating deterministic heterogeneity through interaction between $\beta$ and $z_n$, whether at the level of the means or the dispersion parameters (cf. Greene et al., 2006). Equation 14.2 would mean that the taste heterogeneity applies at the level of individual tasks. In the case of multiple observations per individual, we instead generally work with the assumption that sensitivities vary across individual decision makers, but stay constant across choices for the same individual, notwithstanding an interest in additional within-individual heterogeneity in some work (e.g. Hess and Rose, 2009). Following the work of Revelt and Train (1998), we then write the likelihood of the observed sequence of choices for decision maker $n$ as:

$$ L_n(\Omega) = \int_{\beta} \Bigg[ \prod_{t=1}^{T_n} P_{ni^*t}(\beta) \Bigg] f(\beta \mid \Omega)\, d\beta. \qquad (14.3) $$
The integral in equation 14.3 (and equation 14.2) does not have a closed form solution and the model is typically estimated using maximum simulated log-likelihood (MSL), that is, the simulated analog of the ML approach typically used for MNL, averaging $\prod_{t=1}^{T_n} P_{ni^*t}(\beta)$ across a sufficiently large number of draws from $f(\beta \mid \Omega)$. Improvements in computer performance, as well as in the way in which draws from $f(\beta \mid \Omega)$ can be generated to better represent the distribution (see, for example, Bhat, 2001, 2003; Hess et al., 2006), have led to widespread use of the model in many fields. A growing number of studies also rely on Bayesian techniques, which are especially useful when the dimensionality of $\beta$ is large (see Train, 2009, ch. 12, for an overview), though much of the work to date has been on datasets with limited sample size and limited numbers of alternatives. Before proceeding, it should be noted that this discussion has centred on using mixed logit to accommodate heterogeneity in sensitivities across respondents, often referred to as random parameters logit. A mathematically equivalent specification, referred to as error components logit (cf. Walker et al., 2007), uses the random terms to capture phenomena such as correlation between alternatives or choices, as well as heteroscedasticity. Capturing these effects in a latent class approach is less straightforward, if at all possible, and this is a motivation for combining the approaches, as discussed later in the chapter.
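As a hedged illustration of the MSL idea behind equation 14.3, the following sketch reuses `mnl_probabilities`, `X` and `chosen` from the earlier sketch and assumes a single independent normal distribution per coefficient; the draw count and parameter values are arbitrary illustrative choices:

```python
def simulated_likelihood(mu, sigma, X, chosen, n_draws=500, seed=0):
    """Equation 14.3 by simulation: average the product of MNL probabilities
    over draws of beta ~ N(mu, sigma^2), held constant across the T tasks."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(mu, sigma, size=(n_draws, len(mu)))
    total = 0.0
    for beta in draws:
        prob_seq = 1.0
        for t in range(X.shape[0]):
            prob_seq *= mnl_probabilities(X[t] @ beta)[chosen[t]]
        total += prob_seq
    return total / n_draws  # simulated L_n(Omega)

print(np.log(simulated_likelihood(np.array([-0.1, -0.5]),
                                  np.array([0.05, 0.2]), X, chosen)))
```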
2.1.2 Simple discrete mixtures
An alternative to the use of continuous distributions for individual elements in $\beta$ is to allow for a finite number of possible values for each element in $\beta$, with an associated probability. This gives rise to what is variably called a discrete mixture model or a mass point logit model, with discussions in Gopinath (1995); Dong and Koppelman (2003); Wedel et al. (1999); Hess et al. (2007); and Train (2008).
Let us assume that $\beta$ has $K$ different elements, where we allow for $S_k$ different values for $\beta_k$, where $S_k$ needs to be specified by the analyst. With different weights for the different possible values for $\beta_k$ given by $\pi_{s_k}$, we would then have that:

$$ L_n(\beta, \pi) = \sum_{s_1=1}^{S_1} \sum_{s_2=1}^{S_2} \cdots \sum_{s_K=1}^{S_K} \pi_{s_1} \pi_{s_2} \cdots \pi_{s_K} \prod_{t=1}^{T_n} P_{ni^*t}\big(\beta_{s_1}, \beta_{s_2}, \ldots, \beta_{s_K}\big), \qquad (14.4) $$

that is, a weighted average across all the possible combinations of values in $\beta$, with the weight for each combination being given by a product of the respective weights for the individual elements in $\beta$, with $\pi$ grouping together all individual weights, where $0 \le \pi_{s_k} \le 1\ \forall s_k$ and $\sum_{s_k=1}^{S_k} \pi_{s_k} = 1\ \forall k$. The likelihood for this model has a closed form solution and no simulation is thus required in estimation. However, it can be seen straightaway that even with a low number of elements ($K$) in $\beta$ and modest values for the number of possible values ($S_k$) for each $\beta_k$, the number of combinations rapidly becomes very large and leads to computational complexity not dissimilar from the estimation of a continuous mixed logit model. As an example, many applications using mixed logit rely on fewer than, say, 250 draws in simulation based estimation even with as many as five random coefficients. This would mean that $P_{ni^*t}(\beta)$ in equation 14.3 would need to be evaluated 250 times. If we estimated a discrete mixture analog with $S_k = 3\ \forall k$, we would need to evaluate $3^5 = 243$ terms in the weighted sum in equation 14.4. Choosing an appropriate value of $S_k\ \forall k$ is down to the analyst, and is a non-trivial task. A key component of this is that in the estimation of discrete mixture models, in common with latent class structures, we see a rapid explosion in the number of parameters and the often observed phenomenon of multiple elements for $\beta_k$ collapsing to the same value, which is especially likely in the case of strongly peaked distributions. The latter issue can be addressed to some extent by moving away from simple maximum likelihood estimation and making use of expectation-maximisation (EM) algorithms, with in-depth discussions in Train (2008). In terms of the explosion in the number of parameters and the question of improvements in fit justifying such increases, it is wise to move to model fit criteria which penalise the inclusion of additional parameters more strongly, typical approaches being the Akaike information criterion (AIC) or the Bayesian information criterion (BIC); see for example Mittelhammer et al. (2000, s. 18.5).
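The explosion in the number of terms in equation 14.4 is easy to see in code. The sketch below, again reusing the earlier helpers, enumerates all combinations of support points; the closing lines show the standard AIC and BIC formulas mentioned above. All names and values are illustrative assumptions:

```python
from itertools import product as cartesian

def discrete_mixture_likelihood(values, weights, X, chosen):
    """Equation 14.4: weighted sum over all combinations of support points.

    values  : list of K arrays, values[k] holding the S_k support points of beta_k
    weights : list of K arrays of matching shapes, each summing to one
    """
    L = 0.0
    for combo in cartesian(*[range(len(v)) for v in values]):
        beta = np.array([values[k][s] for k, s in enumerate(combo)])
        w = np.prod([weights[k][s] for k, s in enumerate(combo)])
        prob_seq = 1.0
        for t in range(X.shape[0]):
            prob_seq *= mnl_probabilities(X[t] @ beta)[chosen[t]]
        L += w * prob_seq
    return L

# With K=2 and S_k=3 each, the sum has 3**2 = 9 terms (243 for K=5).
values  = [np.array([-0.2, -0.1, 0.0]), np.array([-0.8, -0.5, -0.2])]
weights = [np.array([0.3, 0.5, 0.2]),   np.array([0.25, 0.5, 0.25])]
print(discrete_mixture_likelihood(values, weights, X, chosen))

# Fit criteria penalising extra parameters (LL = sample log-likelihood,
# P = number of estimated parameters, N = number of observations):
aic = lambda LL, P: -2 * LL + 2 * P
bic = lambda LL, P, N: -2 * LL + P * np.log(N)
```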
2.1.3 Latent class structures
Latent class models have a long tradition in choice modelling. Their development is often traced back to work by Kamakura and Russell (1989) and Gupta and Chintagunta (1994), with important work also in Swait (1994), Gopinath (1995) and Bhat (1997). The heterogeneity in sensitivities across individuals is now accommodated by making use of separate classes with different values for the vector of taste coefficients $\beta$ in each class. The distinction from a simple discrete mixture as discussed above is that the classes capture the joint distribution of the individual elements in $\beta$. Specifically, in a model with $S$ classes, we would have $S$ instances of the vector $\beta$, say $\beta_1$ to $\beta_S$, with a possibility of some of the elements in $\beta$ staying constant across some of the classes. As with discrete mixture models, the number of classes $S$ needs to be specified by the analyst. A latent class model uses a probabilistic class allocation model, where individual $n$
belongs to class $s$ with probability $\pi_{ns}$, and where $0 \le \pi_{ns} \le 1\ \forall n, s$ and $\sum_{s=1}^{S} \pi_{ns} = 1\ \forall n$. Latent class models are generally specified with an underlying MNL model, but can easily be adapted for more general underlying structures such as nested or cross-nested logit – the same clearly also applies to continuous mixtures (cf. Garrow, 2004; Hess et al., 2005a) or discrete mixtures. Let $P_{nit}(\beta_s)$ give the probability of individual $n$ choosing alternative $i$ in choice task $t$, conditional on $n$ falling into class $s$. The likelihood of the observed set of choices for $n$, working on the assumption of intra-individual homogeneity in sensitivities, is then given by:

$$ L_n(\beta, \pi) = \sum_{s=1}^{S} \pi_{ns} \Bigg( \prod_{t=1}^{T_n} P_{ni^*t}(\beta_s) \Bigg), \qquad (14.5) $$

with $P_{ni^*t}(\beta_s)$ again being given by equation 14.1. In common with the discrete mixture model, no simulation is required in the estimation of latent class models of the form above, so that for example simple ML estimation can be used. However, in contrast with the discrete mixture model, the number of combinations of values is a function only of $S$ and not of the number of elements ($K$) in $\beta$. The issue of choosing an appropriate value for $S$ remains. In the most basic version of a latent class logit model (Kamakura and Russell, 1989), the class allocation probabilities are constant across individuals such that $\pi_{ns} = \pi_s\ \forall n$. The real flexibility however arises when the class allocation probabilities are not constant across individuals but when a class allocation model is used to link these probabilities to characteristics of the individuals (Gupta and Chintagunta, 1994). Typically, these characteristics would take the form of socio-demographic variables, such as income, age and employment status. With $z_n$ giving the concerned vector of characteristics for individual $n$, and with the class allocation model taking on a logit form (this is a common specification rather than an absolute requirement), the probability of individual $n$ falling into class $s$ would be given by:

$$ \pi_{ns} = \frac{e^{\delta_s + g(\gamma_s, z_n)}}{\sum_{l=1}^{S} e^{\delta_l + g(\gamma_l, z_n)}}, \qquad (14.6) $$

where $\delta_s$ is a class-specific constant,3 $\gamma_s$ is a vector of parameters to be estimated and $g(\cdot)$ gives the functional form of the utility function for the class allocation model – appropriate normalisation is to be used for both $\delta$ and $\gamma$. The class allocation model allows us to probabilistically allocate individuals to different classes depending on their socio-demographic characteristics. We have already discussed the issue of the proliferation of parameters above in the context of discrete mixtures, and the same issues apply in latent class models. Similarly, estimation with larger numbers of classes can be problematic, with parameters collapsing to the same values across classes or some classes obtaining very small probabilities, and here, the EM algorithm can once again be one possible solution, discussed in Train (2008) but also earlier on by Bhat (1997). Nevertheless, it remains almost unavoidable that with a large number of classes, some of the coefficient values may not be significant across classes, or may not lend themselves to easy interpretation.
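A hedged sketch of equations 14.5 and 14.6, with a linear $g(\cdot)$ and the first class normalised to zero; the two classes, the single socio-demographic (age) and all names are illustrative assumptions, reusing the helpers from the earlier sketches:

```python
def class_allocation(delta, gamma, z):
    """Equation 14.6 with a linear g(): pi_ns for one individual.

    delta : (S,) class-specific constants (first fixed to zero for normalisation)
    gamma : (S, C) parameters on the C socio-demographics (first row zero)
    z     : (C,) characteristics of individual n
    """
    u = delta + gamma @ z
    eu = np.exp(u - u.max())
    return eu / eu.sum()

def latent_class_likelihood(betas, pi_n, X, chosen):
    """Equation 14.5: class-allocation-weighted products of MNL probabilities.

    betas : (S, K) taste coefficients, one row per class
    pi_n  : (S,) allocation probabilities for this individual
    """
    L = 0.0
    for s in range(betas.shape[0]):
        prob_seq = 1.0
        for t in range(X.shape[0]):
            prob_seq *= mnl_probabilities(X[t] @ betas[s])[chosen[t]]
        L += pi_n[s] * prob_seq
    return L

betas = np.array([[-0.05, -0.3], [-0.3, -0.9]])     # two illustrative classes
pi_n = class_allocation(np.array([0.0, -0.4]),
                        np.array([[0.0], [0.02]]), np.array([35.0]))  # z = age
print(latent_class_likelihood(betas, pi_n, X, chosen))
```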
2.2 Contrasts
This section provides some theoretical contrasts between model structures, focussing on continuous mixed logit models and latent class structures. This extends on work by Bhat (1997), who derived elasticity expressions, as well as on the discussions in Greene and Hensher (2003), and complements a substantial body of empirical comparisons between the structures, for example in Andrews et al. (2002); Hanley et al. (2002); Greene and Hensher (2003); Scarpa et al. (2005); Provencher and Bishop (2004) and Shen (2009). The evidence in these empirical comparisons is mixed, highlighting that both models have their advantages and that the choice of an appropriate structure will depend on the data at hand.

2.2.1 Taste heterogeneity
The main emphasis in discussing mixed logit and latent class logit is on their ability to capture random heterogeneity across individuals in addition to deterministic heterogeneity such as is also allowed for in simple MNL models. The two structures do this in very different ways, as already outlined in section 2.1. In the basic specification of the continuous mixed logit model, the random heterogeneity is entirely random, and while such a specification is common in most empirical work, it is clearly possible (and indeed desirable) to link the random heterogeneity to observable individual characteristics, typically through making the parameters of the random distribution a function of such characteristics (cf. Greene et al., 2006). A specification not linking the random heterogeneity to individual characteristics is similarly possible in a latent class framework (Kamakura and Russell, 1989), though here, the typical specification does rely on a parameterisation of the class allocation probabilities using socio-demographics such as in equation 14.6, meaning that the class allocation probabilities (and hence the implied sensitivities) vary also as a function of these individual characteristics. In both models, the assumptions made at the specification stage can have important influences on parameter estimates and substantive model results such as willingness-to-pay measures. It is well documented that the need to determine which coefficients should be allowed to vary across individuals, and what distributions are to be used, is a key issue facing analysts using continuous mixed logit models. There is a strong influence of these assumptions on model results (see, for example, Hess et al., 2005b), and while much progress has been made since the discussions by Greene and Hensher (2003) with flexible and nonparametric distributions (Fosgerau, 2006, 2007; Fosgerau and Bierlaire, 2007), numerous applications continue to rely on misguided specifications, also in relation to ensuring the existence of moments for ratios of coefficients (Daly et al., 2012), notwithstanding the possible solution of working in willingness-to-pay space (Train and Weeks, 2005). A key limitation of most parametric distributions is a strong shape assumption and general uni-modality. In theory, the same does not apply with latent class structures, as no assumptions are made on the relationship between the values for a given coefficient across classes, thus allowing for flexible shapes and multi-modality. This is often touted as an advantage of latent class models.
In practice however, the decision by the analyst on the number of classes to use has major implications for the shape of the distribution, and the shape of the true underlying distribution, for example in terms of the relative importance of different modes, will have impacts on the ability to retrieve sensitivities in
less well represented parts of the distribution. With both models, the ability to retrieve the true patterns of heterogeneity in the data thus depends both on the shape of that heterogeneity and the specification used by the analyst.

2.2.2 Posterior analysis
The estimation of either type of model provides information relating to the sample level patterns of heterogeneity. By making the parameters of the continuous distribution in mixed logit models a function of socio-demographics, or by incorporating socio-demographics in the class allocation model in a latent class structure, we can obtain further insights into the likely location of a given type of individual on that sample level distribution. This, however, treats two individuals who are identical on those socio-demographics as also having identical sensitivities, contrary to the notion of random heterogeneity. Further insights can be obtained post estimation in a Bayesian manner, by calculating information relating to a given individual's sensitivities on the basis of the sample level model estimates and that individual's observed choices. In a continuous mixed logit context, these calculations are straightforward, as discussed for example by Train (2009, ch. 12). Specifically, we have from equation 14.3 that the likelihood of the observed sequence of choices for person $n$ is given by:

$$ L_n(\Omega) = \int_{\beta} L_n(\beta)\, f(\beta \mid \Omega)\, d\beta, \qquad (14.7) $$

where $L_n(\beta) = \prod_{t=1}^{T_n} P_{ni^*t}(\beta)$. Using Bayes' rule, we can then rewrite this as:

$$ L(\beta_n \mid C_n) = \frac{L_n(\beta)\, f(\beta \mid \Omega)}{L_n(\Omega)}. \qquad (14.8) $$

This gives us the probability of given values for $\beta_n$, conditional on the observed choices ($C_n$) for individual $n$, where it is important to remember that $\beta_n$ is not observed but is distributed. It is then straightforward to, for example, calculate a conditional mean for $\beta_n$ as:

$$ \bar{\beta}_n = \int_{\beta_n} \beta_n\, L(\beta_n \mid C_n)\, d\beta_n, \qquad (14.9) $$

with similar calculations to obtain the corresponding variance or other measures. It is similarly possible to calculate a number of posterior measures from latent class models. A key example comes in the form of posterior class allocation probabilities, where the posterior probability of individual $n$ for class $s$ is given by:

$$ \hat{\pi}_{ns} = \frac{\pi_{ns}\, L_n(\beta_s)}{L_n(\beta, \pi_n)}, \qquad (14.10) $$

where $L_n(\beta_s)$ gives the likelihood of the observed choices for individual $n$, conditional on class $s$. To explain the benefit of these posterior class allocation probabilities, let us assume that we have calculated for each class in the model a given measure $w_s = \beta_{s1}/\beta_{s2}$, that is, the
ratio between the first two coefficients. Using $\bar{w}_n = \sum_{s=1}^{S} \pi_{ns} w_s$ simply gives us a sample level mean for the measure $w$ for an individual with the specific observed characteristics of person $n$. These characteristics (in terms of socio-demographics used in the class allocation probabilities) will, however, be common to a number of individuals who still make different choices, and the most likely value for $w$ for individual $n$, conditional on his/her observed choices, can now be calculated as $\hat{w}_n = \sum_{s=1}^{S} \hat{\pi}_{ns} w_s$. Finally, it might also be useful to produce a profile of the membership in each class. From the parameters in the class allocation probabilities, we know which class is more or less likely to capture individuals who possess a specific characteristic, but this is not taking into account the multivariate nature of these characteristics. Let us for example assume that a given socio-demographic characteristic $z_c$ is used in the class allocation probabilities, with associated parameter $\gamma_c$, and using a linear parameterisation in equation 14.6. We can then calculate the likely value for $z_c$ for an individual in class $s$ as:

$$ \hat{z}_{cs} = \frac{\sum_{n=1}^{N} \hat{\pi}_{ns}\, z_{cn}}{\sum_{n=1}^{N} \hat{\pi}_{ns}}, \qquad (14.11) $$

where we again use the posterior probabilities to take into account the observed choices. Alternatively, we can also calculate the probability of an individual in class $s$ having a given value $k$ for $z_c$ by using:

$$ P(\hat{z}_{cs} = k) = \frac{\sum_{n=1}^{N} \hat{\pi}_{ns}\, (z_{cn} = k)}{\sum_{n=1}^{N} \hat{\pi}_{ns}}. \qquad (14.12) $$
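The posterior calculations in equations 14.10 to 14.12 reduce to simple weighted averages once the class-conditional likelihoods are available. A sketch under the same illustrative assumptions as before, continuing the running example:

```python
def posterior_class_probs(betas, pi_n, X, chosen):
    """Equation 14.10: class probabilities conditional on observed choices."""
    probs = np.empty(betas.shape[0])
    for s in range(betas.shape[0]):
        prob_seq = 1.0
        for t in range(X.shape[0]):
            prob_seq *= mnl_probabilities(X[t] @ betas[s])[chosen[t]]
        probs[s] = pi_n[s] * prob_seq
    return probs / probs.sum()

pi_hat = posterior_class_probs(betas, pi_n, X, chosen)
w_s = betas[:, 0] / betas[:, 1]        # the ratio measure w_s from the text
print(pi_hat @ w_s)                    # individual-level posterior mean of w

# Equation 14.11 across a sample: posterior-weighted mean of a characteristic,
# with pi_hat_all an (N, S) array of posteriors and z_c an (N,) characteristic:
# z_hat_cs = (pi_hat_all[:, s] @ z_c) / pi_hat_all[:, s].sum()
```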
2.2.3 Correlation between coefficients
In models without random taste heterogeneity, any correlation in the distribution of individual coefficients can solely arise as a result of interactions with socio-demographic attributes, and specifically where multiple coefficients interact with the same socio-demographic characteristics. As an example, one could imagine a situation where cost sensitivity decreases with income while time sensitivity increases with income, resulting in negative correlation between the time and cost coefficients across the sample. In a continuous mixture model, additional correlation can be accommodated by specifying a joint distribution for the random taste coefficients. While most estimation packages allow users to specify multivariate normal distributions, the vast majority of continuous mixture applications continue to make use of independently distributed taste coefficients, despite the obvious simplification and likely loss in performance this engenders. Correlation is rarely introduced in models not based on the normal distribution, one exception being given in Walker (2001), while flexible correlation structures in continuous mixtures are also a benefit of the GMNL specification of mixed logit (Fiebig et al., 2010). In a latent class model, correlation between coefficients is an inherent characteristic of the model structure as long as the two coefficients in question take on more than one value across the $S$ classes. As highlighted repeatedly earlier in the chapter, the nature of the distribution of sensitivities in a latent class model is a function of both the estimates
of the class specific $\beta$ vectors as well as the individual specific class allocation probabilities. A characterisation of these distributions at the level of individuals should thus use the posterior probabilities to encompass the information gained from observed choices. Drawing on Hess et al. (2009), we can then easily see that:

$$ \operatorname{cov}(\beta_{n1}, \beta_{n2}) = E[(\beta_{n1} - E(\beta_{n1}))(\beta_{n2} - E(\beta_{n2}))] = E(\beta_{n1}\beta_{n2}) - E(\beta_{n1})E(\beta_{n2}) = \sum_{s=1}^{S} \hat{\pi}_{ns}\, \beta_{1,s}\, \beta_{2,s} - \Bigg( \sum_{s=1}^{S} \hat{\pi}_{ns}\, \beta_{1,s} \Bigg) \Bigg( \sum_{s=1}^{S} \hat{\pi}_{ns}\, \beta_{2,s} \Bigg). \qquad (14.13) $$

For ease of notation, let $\alpha = \beta_1$ and $\gamma = \beta_2$, in which case equation 14.13 can be written as:

$$ \operatorname{cov}(\alpha_n, \gamma_n) = \sum_{s=1}^{S} \hat{\pi}_{ns}\, \alpha_s \gamma_s - \Bigg( \sum_{s=1}^{S} \hat{\pi}_{ns}\, \alpha_s \Bigg) \Bigg( \sum_{s=1}^{S} \hat{\pi}_{ns}\, \gamma_s \Bigg). \qquad (14.14) $$

A special situation arises when $S = 2$, in which case the class allocation probabilities have no effect on the sign of the correlation. Indeed, with the notation from equation 14.14, we then have:

$$ \operatorname{cov}(\alpha_n, \gamma_n) = \hat{\pi}_{n1} \hat{\pi}_{n2} \big[ \alpha_1(\gamma_1 - \gamma_2) + \alpha_2(\gamma_2 - \gamma_1) \big] = \hat{\pi}_{n1} \hat{\pi}_{n2} \big[ (\alpha_1 - \alpha_2)(\gamma_1 - \gamma_2) \big], \qquad (14.15) $$
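Equation 14.14 is a one-line computation once the posterior class probabilities are available; a sketch continuing the running example:

```python
def posterior_covariance(pi_hat, a, g):
    """Equation 14.14: covariance of two coefficients across classes,
    weighted by an individual's posterior class probabilities."""
    return pi_hat @ (a * g) - (pi_hat @ a) * (pi_hat @ g)

a = betas[:, 0]                       # first coefficient across classes
g = betas[:, 1]                       # second coefficient across classes
print(posterior_covariance(pi_hat, a, g))

# With S = 2 the sign only depends on the coefficient differences (eq. 14.15):
# cov = pi_hat[0] * pi_hat[1] * (a[0] - a[1]) * (g[0] - g[1])
```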
where the sign of $\operatorname{cov}(\alpha_n, \gamma_n)$ only depends on the changes in the two elements in $\alpha$ and $\gamma$ across the two classes. It should be noted that, using equation 14.8, we also obtain individual specific distributions for the coefficients in a continuous mixed logit model, where any correlation between these will be a function of the choices (leading to the posterior distributions), the assumptions in relation to the sample level covariance structure, and any incorporation of socio-demographic characteristics in the specification of the distributions. Unlike with a latent class structure, a simple analytic solution such as shown here is not straightforward.

2.2.4 Disaggregate elasticities
As a final step, we briefly contrast disaggregate point elasticities in the different models (see also Bhat, 1997). With both types of mixtures having a logit kernel, it is worth restating the well known MNL elasticities (see, for example, Ben-Akiva and Lerman, 1985), with the direct elasticity in relation to attribute $x$ given by:

$$ E^{i}_{x_{ni}} = \frac{\partial V_{ni}}{\partial x_{ni}}\, x_{ni}\, (1 - P_{ni}(\beta)), \qquad (14.16) $$

where, with a linear in attributes specification, $\frac{\partial V_{ni}}{\partial x_{ni}} = \beta_x$. The corresponding cross-elasticity is given by:

$$ E^{i}_{x_{nj}} = -\frac{\partial V_{nj}}{\partial x_{nj}}\, x_{nj}\, P_{nj}(\beta), \qquad (14.17) $$
exhibiting the independence from irrelevant alternatives (IIA) characteristic at the disaggregate level – note this does not imply IIA in the aggregate elasticities (Louviere et al., 2000). In a continuous mixed logit model, the direct elasticity (see, for example, Train, 2009) is given by:
$$ E^{i}_{x_{ni}} = \frac{\int_{\beta} \frac{\partial V_{ni}}{\partial x_{ni}}\, x_{ni}\, (1 - P_{ni}(\beta))\, P_{ni}(\beta)\, f(\beta \mid \Omega)\, d\beta}{\int_{\beta} P_{ni}(\beta)\, f(\beta \mid \Omega)\, d\beta}, \qquad (14.18) $$

with the cross-elasticity being:

$$ E^{i}_{x_{nj}} = -\frac{\int_{\beta} \frac{\partial V_{nj}}{\partial x_{nj}}\, x_{nj}\, P_{nj}(\beta)\, P_{ni}(\beta)\, f(\beta \mid \Omega)\, d\beta}{\int_{\beta} P_{ni}(\beta)\, f(\beta \mid \Omega)\, d\beta}, \qquad (14.19) $$
where this varies across alternatives, such that it does not exhibit the IIA property. Here, it can be seen that the elasticities are given by integration of logit elasticities. In a latent class logit model, the direct elasticity is given by:

$$ E^{i}_{x_{ni}} = \frac{\partial P_{ni}(\beta)}{\partial x_{ni}} \frac{x_{ni}}{P_{ni}(\beta)} = \sum_{s=1}^{S} \pi_{ns} \Bigg( \frac{\partial P_{ni}(\beta_s)}{\partial x_{ni}} \frac{x_{ni}}{P_{ni}(\beta)} \Bigg) = \sum_{s=1}^{S} \pi_{ns} \Bigg( \frac{\partial V_{nis}}{\partial x_{ni}}\, x_{ni}\, P_{ni}(\beta_s) (1 - P_{ni}(\beta_s)) \frac{1}{P_{ni}(\beta)} \Bigg) = \sum_{s=1}^{S} \frac{\pi_{ns} P_{ni}(\beta_s)}{P_{ni}(\beta)} \Bigg[ \frac{\partial V_{nis}}{\partial x_{ni}}\, x_{ni}\, (1 - P_{ni}(\beta_s)) \Bigg]. \qquad (14.20) $$
It can be seen that the term in square brackets corresponds to a MNL direct elasticity for a specific class in the latent class model. This means that the direct elasticities are a weighted sum of MNL elasticities, with the weights being given by multiplying the class membership probability with the class-specific conditional probability and by dividing this product by the marginal probability. It can similarly be seen that the cross-elasticities are given by a weighted sum of MNL cross-elasticities, with:

$$ E^{i}_{x_{nj}} = \frac{\partial P_{ni}(\beta)}{\partial x_{nj}} \frac{x_{nj}}{P_{ni}(\beta)} = \sum_{s=1}^{S} \pi_{ns} \Bigg( -\frac{\partial V_{njs}}{\partial x_{nj}}\, x_{nj}\, \frac{P_{ni}(\beta_s)\, P_{nj}(\beta_s)}{P_{ni}(\beta)} \Bigg) = \sum_{s=1}^{S} \frac{\pi_{ns} P_{ni}(\beta_s)}{P_{ni}(\beta)} \Bigg[ -\frac{\partial V_{njs}}{\partial x_{nj}}\, x_{nj}\, P_{nj}(\beta_s) \Bigg]. \qquad (14.21) $$
The contrasts and similarities between the continuous mixed logit and latent class logit elasticities are clear. Both are a function of MNL elasticities, and both avoid the IIA assumption. The mixture in the continuous model means a reliance on integration/simulation, while the latent class model uses weighted summation. In all of the models, the elasticities vary as a function of the attribute levels of the alternatives and hence the probabilities, but also as a function of any socio-demographic interactions with $\beta$. In the latent class model, we have the additional influence of socio-demographics through the class allocation probabilities, where, in the continuous mixed logit model, the same is the case if the parameters of the distribution are a function of decision-maker characteristics.
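As an illustration of equation 14.20, the sketch below computes a latent class direct elasticity as a weighted sum of class-specific MNL elasticities, assuming a linear-in-attributes utility (so that $\partial V / \partial x$ is the relevant coefficient); it continues the running example and all names are illustrative assumptions:

```python
def lc_direct_elasticity(betas, pi_n, x_t, i, k):
    """Equation 14.20: latent class direct elasticity of alternative i
    with respect to attribute k, assuming linear-in-attributes utility."""
    P_s = np.array([mnl_probabilities(x_t @ b)[i] for b in betas])
    P_marg = pi_n @ P_s                                # marginal probability
    mnl_elast = betas[:, k] * x_t[i, k] * (1.0 - P_s)  # class-specific MNL elasticities
    weights = pi_n * P_s / P_marg                      # the weights in eq. 14.20
    return weights @ mnl_elast

print(lc_direct_elasticity(betas, pi_n, X[0], i=1, k=0))
```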
3 COMBINING CONTINUOUS MIXED LOGIT AND LATENT CLASS
The discussion in the previous section has highlighted the contrasts between continuous mixed logit and latent class logit models. Both structures have strengths and weaknesses and it should thus come as no surprise that a number of researchers have put forward structures that combine the two approaches. The first published such application seems to be the work of Walker and Li (2006), who add additional continuous variation into a latent class structure in the form of error component terms aimed at capturing correlation across alternatives and across choices for the same decision maker. Specifically, their model takes the general form of:

$$ L_n(\beta, \pi, \sigma) = \sum_{s=1}^{S} \pi_{ns} \int_{\eta} \prod_{t=1}^{T_n} P_{ni^*t}(\beta_s, \eta)\, f(\eta \mid \sigma)\, d\eta. \qquad (14.22) $$
In this specification, the continuous random components $\eta$ follow normal distributions with a mean of zero and with standard deviations captured in the vector $\sigma$. With a view to capturing correlation across alternatives as well as across choices for the same decision maker, these error components are generic across classes within the overarching latent class structure. A different direction in combining the two structures uses the continuous component to allow for additional heterogeneity in sensitivities within given classes, where this heterogeneity varies across classes. In effect, this can be described most straightforwardly as a latent class mixed logit, using a continuous mixed logit model inside each class to capture heterogeneity. In particular, we would write:

$$ L_n(\Omega, \pi) = \sum_{s=1}^{S} \pi_{ns} \int_{\beta_s} \prod_{t=1}^{T_n} P_{ni^*t}(\beta_s)\, f(\beta_s \mid \Omega_s)\, d\beta_s. \qquad (14.23) $$

In this model, we have that the vector of coefficients $\beta_s$ is specific to class $s$ and contains at least some components that are distributed randomly across decision makers within that class, according to $f(\beta_s \mid \Omega_s)$, where $\Omega = \langle \Omega_1, \ldots, \Omega_S \rangle$. Such a specification has been
used by Bujosa et al. (2010) on revealed preference data (with $T_n = 1\ \forall n$) and Greene and Hensher (2013) on stated preference data. In a different direction, there has in recent years been growing interest in allowing for intra-agent heterogeneity in addition to inter-agent heterogeneity (Bhat and Sardesai, 2006; Hess and Rose, 2009), making use of a specification such as:

$$ L_n(\Omega_\gamma, \Omega_\alpha) = \int_{\alpha} \prod_{t=1}^{T_n} \Bigg[ \int_{\gamma} P_{ni^*t}(\beta = \alpha + \gamma)\, f(\gamma \mid \Omega_\gamma)\, d\gamma \Bigg] h(\alpha \mid \Omega_\alpha)\, d\alpha, \qquad (14.24) $$
where $\beta = \alpha + \gamma$, with $\alpha$ distributed across decision makers and $\gamma$ distributed across individual choices for the same decision maker. Models of this type have proven to be very difficult to estimate due to the double layer of integration, and this raises the question whether replacing one layer with weighted summation through a latent class structure would be beneficial, in essence adapting equation 14.23 by moving the position of the integral to the level of an individual choice:

$$ L_n(\Omega, \pi) = \sum_{s=1}^{S} \pi_{ns} \prod_{t=1}^{T_n} \int_{\beta_s} P_{ni^*t}(\beta_s)\, f(\beta_s \mid \Omega_s)\, d\beta_s. \qquad (14.25) $$
This specification would now mean that the latent class structure captures the variation in sensitivities across individual decision makers through the class structure, while the integration over class specific random coefficients captures additional heterogeneity across choices for individual decision makers. Finally, the focus above has solely been on allowing for additional continuous random heterogeneity for the choice model parameters within individual latent classes. However, the drivers of the class allocation model could similarly include other latent factors (such as attitudes) that should be explicitly captured in the model specification. Such a specification, as discussed by Walker and Ben-Akiva (2002) and Hess et al. (2013a), relies on specifying a set of latent variables $\alpha_n = h(q, z_n) + \eta_n$, where $\eta_n$ is a vector of standard normal random variables. These $\alpha_n$ terms, which can, for example, represent underlying attitudes and perceptions, are then used in parameterising the class allocation probabilities, rewriting equation 14.6 to:

$$ \pi_{ns} = \frac{e^{\delta_s + g(\gamma_s, z_n) + \tau_s \alpha_n}}{\sum_{l=1}^{S} e^{\delta_l + g(\gamma_l, z_n) + \tau_l \alpha_n}}. \qquad (14.26) $$
At the same time, $\alpha_n$ is used to explain the answers by decision maker $n$ to a set of attitudinal questions, grouped together in $I_n$, with, for example, $I_n = \zeta \alpha_n + \upsilon_n$, where $\upsilon_n$ is a vector of random disturbances. The estimation then jointly maximises the likelihood of the observed choices and the answers to the attitudinal questions, through having:

$$ L_n(\beta, \gamma, q, \delta, \tau) = \int_{\eta_n} \sum_{s=1}^{S} \pi_{ns} \Bigg( \prod_{t=1}^{T_n} P_{ni^*t}(\beta_s) \Bigg) P(I_n \mid \alpha_n)\, f(\eta_n)\, d\eta_n, \qquad (14.27) $$

where $\pi_{ns}$ is now also a function of $\alpha_n$.
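Of the combined structures above, the latent class mixed logit of equation 14.23 is perhaps the most direct to sketch: simulate the within-class integral, then weight by the class allocation probabilities. A hedged sketch continuing the running example, with illustrative class-specific means and standard deviations:

```python
def lc_mixed_logit_likelihood(mus, sigmas, pi_n, X, chosen, n_draws=200, seed=1):
    """Equation 14.23: within each class, simulate the integral over the
    class-specific random coefficients, then weight by class allocation."""
    rng = np.random.default_rng(seed)
    L = 0.0
    for s in range(len(mus)):
        draws = rng.normal(mus[s], sigmas[s], size=(n_draws, len(mus[s])))
        sim = 0.0
        for beta in draws:
            prob_seq = 1.0
            for t in range(X.shape[0]):
                prob_seq *= mnl_probabilities(X[t] @ beta)[chosen[t]]
            sim += prob_seq
        L += pi_n[s] * sim / n_draws
    return L

mus = [np.array([-0.05, -0.3]), np.array([-0.3, -0.9])]
sigmas = [np.array([0.02, 0.1]), np.array([0.1, 0.3])]
print(lc_mixed_logit_likelihood(mus, sigmas, pi_n, X, chosen))
```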
4 CONFIRMATORY LATENT CLASS STRUCTURES: RECENT DEVELOPMENTS AND FUTURE RESEARCH NEEDS

The discussion of latent class models thus far has centred on a form of the model which is particularly accessible, as well-established software programs exist to estimate such models. This model can be referred to as an exploratory latent class model – the analyst merely specifies the number of classes and selects the attributes which are to be used in the class allocation model, and the rest is left to model estimation. This will, with a suitably robust estimation approach, lead to a well fitting structure for a model of the specified size, but there is no guarantee that it will lead to reasonable results or meaningful insights into behaviour, in much the same way as when just estimating a continuous mixed logit model with standard distributions. An alternative approach is to use what can be termed a confirmatory approach, imposing different a priori restrictions on the specifications of the class membership models and on the class-specific choice probabilities, and estimating parameters subject to these constraints. This applies, for example, when the latent classes are based on a priori behavioural hypotheses. An example of such a confirmatory approach is given in Gopinath (1995), while the work by Train (2008) in the context of estimating weights for fixed points in a distribution is also an example of a confirmatory approach. An added reason for discussing confirmatory approaches in the present chapter is a strong stream of research activity making use of such models in two related but distinct contexts in recent years, namely the domains of information processing and decision rule heterogeneity.

4.1 Attribute Processing Strategies
The field of information processing strategies (IPS) or attribute processing strategies (APS) is a burgeoning area of work, especially in the context of stated choice surveys. The main emphasis has been on the question whether some decision makers may actually make their choices based on only a subset of the attributes that describe the alternatives at hand. This phenomenon is typically referred to as attribute non-attendance or attribute ignoring, and an in-depth review of work in this area is given in Hensher (2010) and Hensher, Chapter 12 in the present volume. The interest in this topic in this chapter comes in the context of ways to accommodate attribute non-attendance in models. A key role in this area was played by the early discussions in Hess and Rose (2007), who proposed the use of a latent class approach to accommodate attribute non-attendance, a method since adopted by numerous other studies (for example, Scarpa et al., 2009; Campbell et al., 2010; Hensher and Greene, 2010; Hole, 2011; Hensher et al., 2012). With this approach, different latent classes relate to different combinations of attendance and non-attendance across attributes. For each attribute treated in this manner, there exists a non-zero coefficient (to be estimated), which is used in the attendance classes, while the attribute is not employed in the non-attendance classes, that is, the coefficient is set to zero. In a complete specification, covering all possible combinations, this would thus lead to $2^K$ classes, with $K$ being the number of attributes, where a given coefficient will take the same value in all classes where that attribute is included. A simplification so as to avoid estimating $2^K$ separate class allocation probabilities is to use a multiplicative
approach, that is, treating non-attendance as independent across attributes, much as in the discrete mixture discussions in section 2.1.2, and as discussed in Hole (2011). In addition to the vector $\beta$, we now have a $S \times K$ matrix $\Lambda$, in which each row contains a different combination of 0 and 1 elements, where $S = 2^K$. Next, let $A \circ B$ be the element-by-element product of two equally sized vectors $A$ and $B$, yielding a vector $C$ of the same size, where the $k$th element of $C$ is obtained by multiplying the $k$th element of $A$ with the $k$th element of $B$. Using this notation, the specific values used for the taste coefficients in class $s$ are then given by the vector $\beta_s = \beta \circ \Lambda_s$. The likelihood for decision maker $n$ is then given by:

$$ L_n(\beta, \pi) = \sum_{s=1}^{S} \pi_s \prod_{t=1}^{T} P_{ni^*t}(\beta_s = \beta \circ \Lambda_s). \qquad (14.28) $$
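A sketch of equation 14.28, building the $2^K$ attendance patterns as rows of $\Lambda$ and, in line with the multiplicative simplification just described, deriving the class probabilities from assumed per-attribute attendance probabilities; all names and values are illustrative, and the running example's helpers are reused:

```python
from itertools import product as cartesian

def attendance_likelihood(beta, pi, X, chosen):
    """Equation 14.28: latent classes defined by 2**K attendance patterns,
    with beta_s = beta * Lambda_s (element-by-element)."""
    K = len(beta)
    Lambda = np.array(list(cartesian([0, 1], repeat=K)))  # (2**K, K) rows of 0/1
    L = 0.0
    for s, row in enumerate(Lambda):
        beta_s = beta * row
        prob_seq = 1.0
        for t in range(X.shape[0]):
            prob_seq *= mnl_probabilities(X[t] @ beta_s)[chosen[t]]
        L += pi[s] * prob_seq
    return L

# Multiplicative simplification: pi_s as a product of per-attribute attendance
# probabilities q_k (here an illustrative q = [0.8, 0.9]):
q = np.array([0.8, 0.9])
Lambda = np.array(list(cartesian([0, 1], repeat=2)))
pi = np.prod(np.where(Lambda == 1, q, 1 - q), axis=1)
print(attendance_likelihood(np.array([-0.1, -0.5]), pi, X, chosen))
```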
The overall findings of the growing body of work using the latent class specification point towards a significant portion of people ignoring attributes, including cost variables. In later work, Hess et al. (2013b) argue that an important shortcoming of this simple latent class approach is the reliance on only two possible values for each coefficient, one of which is fixed to zero, where the latter might capture sensitivities close to (rather than equal to) zero, while the two class structure might simply be a proxy for more general taste heterogeneity. Hess et al. (2013b) put forward a model which combines the confirmatory latent class structure with additional continuous heterogeneity in the non-zero coefficient values, aiming to reduce the risk of the class at zero capturing low sensitivities. The likelihood function for decision maker $n$ is simply rewritten as:

$$ L_n(\Omega, \pi) = \sum_{s=1}^{S} \pi_s \int_{\beta} \prod_{t=1}^{T} P_{ni^*t}(\beta_s = \beta \circ \Lambda_s)\, f(\beta \mid \Omega)\, d\beta. \qquad (14.29) $$
Empirical evidence by Hess et al. (2013b) on multiple datasets reveals major improvements in fit by the specification in equation 14.29 over the model in equation 14.28, along with a reduction in the implied rates of non-attendance, which, crucially, however remain above zero for many attributes. Further work on this structure was subsequently conducted by Collins et al. (2013).

4.2 Decision Rule Heterogeneity and Other Mixtures of Models
Although structures belonging to the family of random utility models have come to dominate, it is important to recognise that alternative paradigms for decision making have been proposed, for example the elimination by aspects model of Tversky (1972), but also more recent work based on the concepts of happiness (Abou-Zeid and Ben-Akiva, 2010) and regret (Chorus et al., 2008). The evidence in the literature is that which paradigm works best is very much dataset specific. Hess et al. (2012) put forward the hypothesis that variations in decision rules may exist across decision makers within a single dataset, not just across datasets, and propose the use of a confirmatory latent class approach in this context. Specifically, let $L_n(\beta_m, m)$ give the probability of the observed sequence of choices for decision maker $n$, conditional on using a choice model identified as $m$, where this
uses a vector of parameters $\beta_m$. The framework is based on the idea that $M$ different behavioural processes are used in the data. The probability for the sequence of choices observed for decision maker $n$ is now given by:

$$ L_n(\beta, \pi) = \sum_{m=1}^{M} \pi_{nm}\, L_n(\beta_m, m), \qquad (14.30) $$
where we use different behavioural processes in different classes, with the probability of decision rule class $m$ for decision maker $n$ given by $\pi_{nm}$. Hess et al. (2012) additionally allow for random heterogeneity in parameters within individual decision rule classes, such that:

$$ L_n(\Omega, \pi) = \sum_{m=1}^{M} \pi_{nm} \int_{\beta_m} L_n(\beta_m, m)\, f(\beta_m \mid \Omega_m)\, d\beta_m, \qquad (14.31) $$

where $\beta_m \sim f(\beta_m \mid \Omega_m)$ and $\Omega = \langle \Omega_1, \ldots, \Omega_M \rangle$. Hess et al. (2012) use the model to allow for mixtures between random utility maximisation, random regret minimisation and elimination by aspects. In later work, Hess and Stathopoulos (2012) use an approach as in Walker and Ben-Akiva (2002) and Hess et al. (2013a), making the class allocation a function of a latent factor, which in this case also explains decision makers' real-world choices. At this stage, it should be noted that a latent class model mixing various decision rules is just one example of a wider set of structures that combine different models. A further possibility, for example, would be a model using different generalised extreme value (GEV) nesting structures in different latent classes, somewhat similar in aims to the work of Ishaq et al. (2013). Finally, a separate body of work looks at using different choice sets in different classes, in the context of choice set generation work (see, for example, Swait and Ben-Akiva, 1985; Ben-Akiva and Boccara, 1995; and Gopinath, 1995, s. 2.7).
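A hedged sketch of equation 14.30 with $M = 2$ decision rules. The random regret kernel used here is only one possible form, written in the spirit of Chorus et al. (2008); its exact specification, like all names and values below, is an assumption for illustration rather than the formulation used by Hess et al. (2012):

```python
def rrm_probabilities(X_t, beta):
    """A simple random-regret kernel: the regret of alternative i sums
    ln(1 + exp(beta_k * (x_jk - x_ik))) over rival alternatives j and
    attributes k; choice probabilities are logit in negative regret."""
    J = X_t.shape[0]
    R = np.zeros(J)
    for i in range(J):
        for j in range(J):
            if j != i:
                R[i] += np.log1p(np.exp(beta * (X_t[j] - X_t[i]))).sum()
    eR = np.exp(-R + R.min())         # shift for numerical stability
    return eR / eR.sum()

def decision_rule_mixture(beta_rum, beta_rrm, pi_nm, X, chosen):
    """Equation 14.30 with M = 2 rules: random utility and random regret."""
    L_rum = L_rrm = 1.0
    for t in range(X.shape[0]):
        L_rum *= mnl_probabilities(X[t] @ beta_rum)[chosen[t]]
        L_rrm *= rrm_probabilities(X[t], beta_rrm)[chosen[t]]
    return pi_nm[0] * L_rum + pi_nm[1] * L_rrm

print(decision_rule_mixture(np.array([-0.1, -0.5]),
                            np.array([-0.1, -0.5]),
                            np.array([0.6, 0.4]), X, chosen))
```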
5 SUMMARY AND CONCLUSIONS
This chapter has revisited the topic of contrasting continuous mixed logit models and latent class structures, ten years on from the work by Greene and Hensher (2003). The key distinction between the models clearly remains that the former uses continuous distributions of sensitivities while the latter uses a finite number of classes of sets of coefficient values. Both models allow for deterministic heterogeneity, along with an influence of observed components such as socio-demographics on the nature of the random heterogeneity, albeit that this is arguably done less frequently with continuous mixtures. While latent class models lead to reduced computational costs compared to continuous mixtures, they are characterised by a rapid increase in the number of parameters. Post-analysis calculations of measures of heterogeneity, correlation and elasticities are relatively straightforward in both models, again with the distinction between simulation and averaging across classes, where this chapter provides some additional insights for correlation in latent class models. A further point not touched on thus far is that of using the models in application/forecasting, where the computational cost of latent class models is lower, which is important especially in the case of micro-simulation uses.
The key motivation for extending on the discussions in Greene and Hensher (2003) can be found in the many methodological developments that have taken place in the last ten years. On the continuous mixed logit side, progress has been made in estimation capabilities, the flexibility of parametric and non-parametric distributions, and the treatment of phenomena such as inter-alternative correlation and heteroscedasticity. Especially the latter two are not as straightforward to capture in a latent class framework, and this, along with a desire for more flexible specifications of heterogeneity, has motivated work on combining the two approaches, for example in Walker and Li (2006); Bujosa et al. (2010); Greene and Hensher (2013); and Hess et al. (2013b). Similarly, the major interest in modelling attitudes and perceptions (cf. Ben-Akiva et al., 2002) has led to hybrid models in which the class allocation is in part driven by these latent psychological constructs (see, for example, Walker and Ben-Akiva, 2002; Hess et al., 2013a). The other key focus of the chapter has been the added interest in latent class structures in recent years in the context of attribute processing strategies (see the summary in Hensher, 2010) and decision rule heterogeneity (cf. Hess et al., 2012). A substantial number of studies now make use of confirmatory latent class approaches which estimate allocation probabilities for classes characterised by specific behavioural assumptions. With growing interest in ever richer specifications of heterogeneity, the uptake of latent class structures in this context is bound to increase further, likely in conjunction with continuous layers of heterogeneity, especially given the hive of activity on treatments of latent psychological factors such as attitudes and perceptions, as evidenced for example in Hess and Stathopoulos (2012). There remains substantial scope for future work in this area, both theoretical and empirical. A key avenue for work, especially with some of the most complex structures, is that of estimation. Notwithstanding the work on EM algorithms by Bhat (1997) and Train (2008), issues with dominant peaks in distributions persist, and the importance of starting values is not to be underestimated. Finally, on the empirical side, substantially more effort needs to go into the specification of the class allocation models and the search for appropriate observable and latent drivers of heterogeneity, be it in sensitivities, processing rules or decision rules. It remains up to the analyst to make an informed choice between the two structures, where hybrid approaches combining the benefits of both add an important further level of flexibility.
ACKNOWLEDGEMENTS

This chapter is partly based on work conducted during a stay as a visiting research scholar in the Department of Civil & Environmental Engineering at the Massachusetts Institute of Technology. The author is grateful to Moshe Ben-Akiva, Joan Walker and Dinesh Gopinath for inputs into the work on correlation and elasticities, and also wishes to thank John Rose and Charisma Choudhury for helpful comments on an earlier draft.
NOTES

1. The author wishes to acknowledge the role of Moshe Ben-Akiva, Joan Walker and Dinesh Gopinath in earlier work (Hess et al., 2009) which led to the developments in sections 2.2.3 and 2.2.4.
2. The inclusion of any alternative specific constants is not made explicit here.
3. In a model with generic class allocation probabilities, such as in Kamakura and Russell (1989), only these constants would be estimated.
REFERENCES

Abou-Zeid, M. and M. Ben-Akiva (2010), 'A model of travel happiness and mode switching', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Bingley: Emerald, pp. 289–305.
Andrews, R.L., A. Ainslie and I.S. Currim (2002), 'An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity', Journal of Marketing Research, 39 (4), 479–87.
Ben-Akiva, M. and B. Boccara (1995), 'Discrete choice models with latent choice sets', International Journal of Research in Marketing, 12 (1), 9–24.
Ben-Akiva, M. and S.R. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Ben-Akiva, M., J. Walker, A. Bernardino, D. Gopinath, T. Morikawa and A. Polydoropoulou (2002), 'Integration of choice and latent variable models', in H. Mahmassani (ed.), In Perpetual Motion: Travel Behaviour Research Opportunities and Application Challenges, Oxford: Pergamon, pp. 431–70.
Bhat, C. (1997), 'An endogenous segmentation mode choice model with an application to intercity travel', Transportation Science, 31 (1), 34–48.
Bhat, C.R. (2001), 'Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model', Transportation Research Part B, 35 (7), 677–93.
Bhat, C.R. (2003), 'Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences', Transportation Research Part B, 37 (9), 837–55.
Bhat, C.R. (2011), 'The maximum approximate composite marginal likelihood (MACML) estimation of multinomial probit-based unordered response choice models', Transportation Research Part B: Methodological, 45 (7), 923–39.
Bhat, C.R. and R. Sardesai (2006), 'The impact of stop-making and travel time reliability on commute mode choice', Transportation Research Part B, 40 (9), 709–30.
Bhat, C.R. and R. Sidharthan (2011), 'A simulation evaluation of the maximum approximate composite marginal likelihood (MACML) estimator for mixed multinomial probit models', Transportation Research Part B: Methodological, 45 (7), 940–53.
Boyd, J. and J. Mellman (1980), 'The effect of fuel economy standards on the US automotive market: a hedonic demand analysis', Transportation Research Part A, 14 (5–6), 367–78.
Bujosa, A., A. Riera and R. Hicks (2010), 'Combining discrete and continuous representation of preference heterogeneity: a latent class approach', Environmental & Resource Economics, 47 (4), 477–93.
Campbell, D., V. Lorimer, C. Aravena and W.G. Hutchinson (2010), 'Attribute processing in environmental choice analysis: implications for willingness to pay', 84th Annual Conference, Edinburgh, Scotland 91718, Agricultural Economics Society, 29–31 March.
Cardell, S. and F. Dunbar (1980), 'Measuring the societal impacts of automobile downsizing', Transportation Research Part A, 14 (5–6), 423–34.
Chorus, C., T. Arentze and H. Timmermans (2008), 'A random regret minimization model of travel choice', Transportation Research Part B, 42 (1), 1–18.
Collins, A.T., J.M. Rose and D.A. Hensher (2013), 'Specification issues in a generalised random parameters attribute nonattendance model', Transportation Research Part B: Methodological, 56, 234–53.
Daganzo, C.F. (1979), Multinomial Probit: The Theory and its Applications to Demand Forecasting, New York: Academic Press.
Daly, A., S. Hess and K. Train (2012), 'Assuring finite moments for willingness to pay estimates from random coefficients models', Transportation, 39 (1), 19–31.
Dong, X. and F.S. Koppelman (2003), 'Mass point mixed logit model: development and application', paper presented at the 10th International Conference on Travel Behaviour Research, Lucerne, August.
Fiebig, D.G., M. Keane, J.J. Louviere and N. Wasi (2010), 'The generalized multinomial logit: accounting for scale and coefficient heterogeneity', Marketing Science, 29 (3), 393–421.
Fosgerau, M. (2006), 'Investigating the distribution of the value of travel time savings', Transportation Research Part B, 40 (8), 688–707.
Fosgerau, M. (2007), 'Using nonparametrics to specify a model to measure the value of travel time', Transportation Research Part A, 41 (9), 842–56.
Fosgerau, M. and M. Bierlaire (2007), 'A practical test for the choice of mixing distribution in discrete choice models', Transportation Research Part B, 41 (7), 784–94.
Garrow, L. (2004), 'Comparison of models representing correlation and random taste variation: an application to airline passengers' rescheduling behavior', published doctoral dissertation, Northwestern University, Evanston, IL.
Gopinath, D. (1995), 'Modeling heterogeneity in discrete choice processes: application to travel demand', PhD thesis, MIT, Cambridge, MA.
Greene, W.H. and D.A. Hensher (2003), 'A latent class model for discrete choice analysis: contrasts with mixed logit', Transportation Research Part B, 37 (8), 681–98.
Greene, W.H. and D.A. Hensher (2013), 'Revealing additional dimensions of preference heterogeneity in a latent class mixed multinomial logit model', Applied Economics, 45 (14), 1897–902.
Greene, W.H., D.A. Hensher and J.M. Rose (2006), 'Accounting for heterogeneity in the variance of unobserved effects in mixed logit models', Transportation Research Part B, 40 (1), 75–92.
Gupta, S. and P. Chintagunta (1994), 'On using demographic variables to determine segment membership in logit mixture models', Journal of Marketing Research, 31 (1), 128–36.
Hanley, N., R. Wright and G. Koop (2002), 'Modelling recreation demand using choice experiments: climbing in Scotland', Environmental and Resource Economics, 22, 449–66.
Hensher, D.A. (2010), 'Attribute processing, heuristics and preference construction in choice analysis', in S. Hess and A.J. Daly (eds), State-of Art and State-of Practice in Choice Modelling: Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 35–70.
Hensher, D.A. and W.H. Greene (2003), 'The mixed logit model: the state of practice', Transportation, 30 (2), 133–76.
Hensher, D.A. and W.H. Greene (2010), 'Non-attendance and dual processing of common-metric attributes in choice analysis: a latent class specification', Empirical Economics, 39 (4), 413–26.
Hensher, D.A., J.M. Rose and W.H. Greene (2012), 'Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design', Transportation, 39 (2), 235–45.
Hess, S. and J.M. Rose (2007), 'A latent class approach to recognising respondents' information processing strategies in SP studies', paper presented at the Oslo Workshop on Valuation Methods in Transport Planning, Oslo, March.
Hess, S. and J.M. Rose (2009), 'Allowing for intra-respondent variations in coefficients estimated on repeated choice data', Transportation Research Part B, 43 (6), 708–19.
Hess, S. and A. Stathopoulos (2012), 'Linking the decision process to underlying attitudes and perceptions: a latent variable latent class construct', paper presented at the 13th International Conference on Travel Behaviour Research, Toronto, July.
Hess, S., M. Ben-Akiva, D. Gopinath and J. Walker (2009), 'Advantages of latent class models over continuous mixed logit', paper presented at the 12th International Conference on Travel Behaviour Research, Jaipur, December.
Hess, S., M. Bierlaire and J.W. Polak (2005a), 'Capturing taste heterogeneity and correlation structure with mixed GEV models', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 55–76.
Hess, S., M. Bierlaire and J.W. Polak (2005b), 'Estimation of value of travel-time savings using mixed logit models', Transportation Research Part A, 39 (2–3), 221–36.
Hess, S., M. Bierlaire and J.W. Polak (2007), 'A systematic comparison of continuous and discrete mixture models', European Transport, 37, 35–61, accessed 6 January 2014 at www.istiee.org/te/papers/N37/03%20bierlaire_35-61_.pdf.
Hess, S., J. Shires and A. Jopson (2013a), 'Accommodating underlying pro-environmental attitudes in a rail travel context: application of a latent variable latent class specification', Transportation Research Part D, 25, 42–8.
Hess, S., A. Stathopoulos and A.J. Daly (2012), 'Allowing for heterogeneous decision rules in discrete choice models: an approach and four case studies', Transportation, 39 (3), 565–91.
Hess, S., A. Stathopoulos, D. Campbell, V. O'Neill and S. Caussade (2013b), 'It's not that I don't care, I just don't care very much: confounding between attribute non-attendance and taste heterogeneity', Transportation, 40 (3), 583–607.
Hess, S., K. Train and J.W. Polak (2006), 'On the use of a Modified Latin Hypercube Sampling (MLHS) method in the estimation of a mixed logit model for vehicle choice', Transportation Research Part B, 40 (2), 147–63.
Hole, A.R. (2011), 'A discrete choice model with endogenous attribute attendance', Economics Letters, 110 (3), 203–5.
Ishaq, R., S. Bekhor and Y. Shiftan (2013), 'A flexible model structure approach for discrete choice models', Transportation, 40 (3), 609–24.
Kamakura, W.A. and G. Russell (1989), 'A probabilistic choice model for market segmentation and elasticity structure', Journal of Marketing Research, 26 (4), 379–90.
Louviere, J.J., D.A. Hensher and J. Swait (2000), Stated Choice Models: Analysis and Application, Cambridge: Cambridge University Press.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice behaviour', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
McFadden, D. and K. Train (2000), 'Mixed MNL models for discrete response', Journal of Applied Econometrics, 15 (5), 447–70.
Mittelhammer, R., R. Judge and D. Miller (2000), Econometric Foundations, New York: Cambridge University Press.
Provencher, B. and R.C. Bishop (2004), 'Does accounting for preference heterogeneity improve the forecasting of a random utility model? A case study', Journal of Environmental Economics and Management, 48 (1), 793–810.
Revelt, D. and K. Train (1998), 'Mixed logit with repeated choices: households' choices of appliance efficiency level', Review of Economics and Statistics, 80 (4), 647–57.
Scarpa, R., T. Gilbride, D. Campbell and D.A. Hensher (2009), 'Modelling attribute non-attendance in choice experiments for rural landscape valuation', European Review of Agricultural Economics, 36 (2), 151–74.
Scarpa, R., K. Willis and M. Acutt (2005), 'Individual-specific welfare measures for public goods: a latent class approach to residential customers of Yorkshire Water', in P. Koundouri (ed.), Econometrics Informing Natural Resource Management, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 316–37.
Shen, J. (2009), 'Latent class model or mixed logit model? A comparison by transport mode choice data', Applied Economics, 41 (22), 2915–24.
Swait, J. (1994), 'A structural equation model of latent segmentation and product choice for cross-sectional revealed preference choice data', Journal of Retailing and Consumer Services, 1 (2), 77–89.
Swait, J. and M. Ben-Akiva (1985), 'Incorporating random constraints in discrete models of choice set generation', Transportation Research Part B, 21 (2), 91–102.
Train, K. (2008), 'EM algorithms for nonparametric estimation of mixing distributions', Journal of Choice Modelling, 1 (1), 40–69.
Train, K. (2009), Discrete Choice Methods with Simulation, 2nd edn, Cambridge: Cambridge University Press.
Train, K. and M. Weeks (2005), 'Discrete choice models in preference space and willingness-to-pay space', in R. Scarpa and A. Alberini (eds), Application of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 1–16.
Tversky, A. (1972), 'Elimination by aspects: a theory of choice', Psychological Review, 79, 281–99.
Walker, J.L., M. Ben-Akiva and D. Bolduc (2007), 'Identification of parameters in normal error component logit-mixture (NECLM) models', Journal of Applied Econometrics, 22 (6), 1095–125.
Walker, J. (2001), 'Extended discrete choice models: integrated framework, flexible error structures, and latent variables', PhD thesis, MIT, Cambridge, MA.
Walker, J. and M. Ben-Akiva (2002), 'Generalized random utility model', Mathematical Social Sciences, 43 (3), 303–43.
Walker, J. and J. Li (2006), 'Latent lifestyle preferences and household location decisions', Journal of Geographical Systems, 9 (1), 77–101.
Wedel, M., W.A. Kamakura, N. Arora, A. Bemmaor, J. Chiang, T. Elrod, R. Johnson, P. Lenk, S. Neslin and C.S. Poulsen (1999), 'Discrete and continuous representations of unobserved heterogeneity in choice modeling', Marketing Letters, 10 (3), 219–32.
15 Models for ordered choices
William Greene
1 INTRODUCTION
Netflix (www.netflix.com) is an Internet company that rents movies to subscribers. After a customer rents a movie, the next time they log on to the website, they are invited to rate the movie on a five-point scale, where five is the highest, most favorable rating. The ratings of the many thousands of subscribers who rented that movie are averaged to provide a recommendation to prospective viewers. For example, as of 5 April 2009, the average rating of the 2007 movie National Treasure: Book of Secrets given by approximately 12 900 visitors to the site was 3.8. This rating process provides a natural application of the models and methods described in this survey. The model described here is an ordered choice model. Ordered choice models are appropriate for a wide variety of settings in the social and biological sciences. The essential ingredient is the mapping from an underlying, naturally ordered preference scale to a discrete ordered observed outcome, such as the rating scheme described above. The model of ordered choice pioneered by Aitchison and Silvey (1957) and Snell (1964) and articulated in its modern form by Zavoina and McKelvey (1969), McKelvey and Zavoina (1971, 1975) and McCullagh (1980) has become a widely used tool in many fields. The number of applications in the current literature is large and increasing rapidly. A search on just the 'ordered probit' model identified applications on:
● academic grades (Butler et al., 1994; Li and Tobias, 2006)
● bond ratings (Terza, 1985)
● Congressional voting on a Medicare bill (McKelvey and Zavoina, 1975)
● vehicle crash severity (Castro et al., 2012)
● credit ratings (Cheung, 1996; Metz and Cantor, 2006)
● driver injury severity in car accidents (Wang and Kockelman, 2005; Eluru et al., 2008)
● drug reactions (Fu et al., 2004)
● duration (Han and Hausman, 1990; Ridder, 1990)
● education (Machin and Vignoles, 2005; Carneiro et al., 2001, 2003; Cameron and Heckman, 1998; Cunha et al., 2007; Johnson and Albert, 1999)
● eye disease severity (Biswas and Das, 2002)
● financial failure of firms (Jones and Hensher, 2004; Hensher and Jones, 2007)
● happiness (Winkelmann, 2005; Zigante, 2007)
● health status (Greene, 2008a; Riphahn et al., 2003)
● insect resistance to insecticide (Walker and Duncan, 1967)
● job classification in the military (Marcus and Greene, 1983)
● job training (Groot and van den Brink, 2002)
● labor supply (Heckman and MaCurdy, 1981)
● life satisfaction (Clark et al., 2001; Groot and van den Brink, 2001)
● monetary policy (Eichengreen et al., 1985)
● nursing labor supply (Brewer et al., 2009)
● obesity (Greene et al., 2008)
● perceptions of difficulty making left turns while driving (Zhang, 2007)
● pet ownership (Butler and Chatterjee, 1997)
● political efficacy (King et al., 2004)
● pollution (Wang and Kockelman, 2009)
● product quality (Prescott and Visscher, 1977; Shaked and Sutton, 1982)
● promotion and rank in nursing (Pudney and Shields, 2000)
● self-assessed health (Greene et al., 2012)
● stock price movements (Tsay, 2005)
● tobacco use (Harris and Zhao, 2007; Kasteridis et al., 2010)
● toxicity in pregnant mice (Agresti, 2002)
● trip stops (Bhat, 1997)
● vehicle ownership (Train, 1986; Hensher et al., 1992; Bhat and Pulugurta, 1998)
● work disability (Kapteyn et al., 2007)
and hundreds more. This survey lays out some of the central features of ordered choice models. After developing the basic model, we describe some of the specification issues and model extensions that have appeared in recent studies. There are numerous surveys of ordered choice modeling in the received literature. This survey draws heavily on Greene and Hensher (2010). Some of the ideas developed in sections 4 and 5 are extended in Greene et al. (2013). Section 2 briefly discusses two foundational elements of the model, random utility models and the model for binary choices. The main development of the ordered choice model is given in section 3. Sections 4 to 6 detail a number of specification issues, including individual heterogeneity, functional form and panel data modeling.
2 BINARY CHOICE MODEL
The random utility model is one of two essential building blocks that form the foundation for modeling ordered choices. The second fundamental pillar is the model for binary choices. The ordered choice model that will be the focus of the rest of this survey is an extension of a model used to analyze the situation of a choice between two alternatives – whether the individual takes an action or does not, or chooses one of two elemental alternatives, and so on.

2.1 Random Utility Formulation of a Model for Binary Choice
An application that we will develop is based on a survey question in a large German panel data set, roughly, 'On a scale from zero to ten, how satisfied are you with your health?' The full data set consists of from one to seven observations – it is an unbalanced panel – on 7293 individuals for a total of 27 326 person-year observations. A histogram
of the responses appears in Figure 15.3 (see section 3.5). We might formulate a random utility/ordered choice model for the variable $R_i$ = 'Health Satisfaction' as

$$U_i^* = \beta'x_i + \varepsilon_i,$$
$$R_i = 0 \text{ if } -\infty < U_i^* < \mu_0,$$
$$R_i = 1 \text{ if } \mu_0 < U_i^* < \mu_1,$$
$$\ldots$$
$$R_i = 10 \text{ if } \mu_9 < U_i^* < +\infty,$$

where $x_i$ is a set of variables such as gender, income, age and education that are thought to influence the response to the survey question. (Note that at this point, we are pooling the panel data as if they were a cross section of $n = 27\,326$ independent observations and denoting by $i$ one of those observations.) The average response in the full sample is 6.78. Consider a simple response variable, $y_i$ = 'Healthy' (that is, better than average), defined by $y_i = 1$ if $R_i \geq 7$ and $y_i = 0$ otherwise. Then, in terms of the original variables, the model for $y_i$ is $y_i = 0$ if $R_i \in \{0, 1, 2, 3, 4, 5, 6\}$ and $y_i = 1$ if $R_i \in \{7, 8, 9, 10\}$. By adding the terms, we then find, for the two possible outcomes,

$$y_i = 0 \text{ if } U_i^* < \mu_6, \qquad y_i = 1 \text{ if } U_i^* > \mu_6.$$

Substituting for $U_i^*$, we find

$$y_i = 1 \text{ if } \beta'x_i + \varepsilon_i > \mu_6 \quad \text{or} \quad y_i = 1 \text{ if } \varepsilon_i > \mu_6 - \beta'x_i,$$

and $y_i = 0$ otherwise. We now assume that the first element of $\beta'x_i$ is a constant term, $\alpha$, so that $\beta'x_i - \mu_6$ is equivalent to $\gamma'x_i$, where the first element of $\gamma$ is a constant equal to $\alpha - \mu_6$ and the rest of $\gamma$ is the same as the rest of $\beta$. Then, the binary outcome is determined by $y_i = 1$ if $\gamma'x_i + \varepsilon_i > 0$ and $y_i = 0$ otherwise.
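As a tiny numerical illustration of this construction (a minimal sketch; the responses below are made up, not the survey data):

```python
import numpy as np

R = np.array([3, 7, 10, 6, 9, 2])   # hypothetical 0-10 health satisfaction responses
y = (R >= 7).astype(int)            # 'Healthy' = 1 if R is in {7, 8, 9, 10}
print(y)                            # -> [0 1 1 0 1 0]
```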
In general terms, we write the binary choice model in terms of the underlying utility as

$$y_i^* = \gamma'x_i + \varepsilon_i, \qquad y_i = 1[y_i^* > 0],$$

where the function $1[\text{condition}]$ equals one if the condition is true and zero if it is false.

2.2 Probability Models for Binary Choices
The observed outcome, $y_i$, is determined by a latent regression, $y_i^* = \gamma'x_i + \varepsilon_i$. The random variable $y_i$ takes two values, one and zero, with probabilities

$$\text{Prob}(y_i = 1 \mid x_i) = \text{Prob}(y_i^* > 0 \mid x_i) = \text{Prob}(\gamma'x_i + \varepsilon_i > 0) = \text{Prob}(\varepsilon_i > -\gamma'x_i).$$

The model is completed by the specification of a particular probability distribution for $\varepsilon_i$. In terms of building an internally consistent model, we require that the probabilities be between zero and one and that they increase when $\gamma'x_i$ increases. In principle, any probability distribution defined over the entire real line will suffice. The literature on binary choices is overwhelmingly dominated by two models: the standard normal distribution, which gives rise to the probit model,

$$f(\varepsilon_i) = \exp(-\varepsilon_i^2/2)/(2\pi)^{1/2},$$

and the standard logistic distribution,

$$f(\varepsilon_i) = \exp(\varepsilon_i)/[1 + \exp(\varepsilon_i)]^2,$$

which produces the logit model. The normal distribution can be motivated by an appeal to the central limit theorem and modeling human behavior as the sum of myriad underlying influences. The logistic distribution has proved to be a useful mathematical form for modeling purposes for several decades. These two are by far the most frequently used in applications. Other distributions, such as the complementary log-log and Gompertz distributions that are built into modern software such as Stata and NLOGIT, are sometimes specified as well, though without obvious motivation. The implication of the model specification is that $y_i \mid x_i$ is a Bernoulli random variable with

$$\text{Prob}(y_i = 1 \mid x_i) = \text{Prob}(y_i^* > 0 \mid x_i) = \text{Prob}(\varepsilon_i > -\gamma'x_i) = \int_{-\gamma'x_i}^{\infty} f(\varepsilon_i)\, d\varepsilon_i = 1 - F(-\gamma'x_i),$$

where $F(\cdot)$ denotes the cumulative distribution function (CDF) for $\varepsilon_i$. The standard normal and standard logistic distributions are both symmetric
distributions that have the property that $F(\gamma'x_i) = 1 - F(-\gamma'x_i)$. This produces the convenient result

$$\text{Prob}(y_i = 1 \mid x_i) = F(\gamma'x_i).$$

Standard notations for the normal and logistic distribution functions are $\Phi(\gamma'x_i)$ and $\Lambda(\gamma'x_i)$, respectively. The resulting probit model for a binary outcome is shown in Figure 15.1. Note that $y_i$ equals zero and one with probabilities $F(-\gamma'x_i)$ and $F(\gamma'x_i)$, so $E[y_i \mid x_i] = F(\gamma'x_i)$. Thus, the function plotted in Figure 15.1 is also the regression function of $y_i$ on $\gamma'x_i$, that is, $E[y_i \mid x_i]$.
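The two probability models are easy to compare numerically. A minimal sketch, with hypothetical coefficient and data values:

```python
import numpy as np
from scipy.stats import norm

gamma = np.array([0.5, -0.2, 0.1])      # hypothetical coefficients; first is the constant
x_i = np.array([1.0, 1.3, 2.0])         # leading 1.0 carries the constant term

index = gamma @ x_i                     # gamma'x_i
p_probit = norm.cdf(index)              # Prob(y = 1 | x) under the probit model
p_logit = 1.0 / (1.0 + np.exp(-index))  # Prob(y = 1 | x) under the logit model
print(p_probit, p_logit)
```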
3 A MODEL FOR ORDERED CHOICES
The ordered probit model in its contemporary, regression-based form was proposed by McKelvey and Zavoina (1971, 1975) and Zavoina and McKelvey (1969) for the analysis of ordered, categorical, nonquantitative choices, outcomes and responses. Their application concerned Congressional preferences on a Medicare bill. Familiar recent examples include bond ratings, discrete opinion surveys such as those on political questions, obesity measures, preferences in consumption, and satisfaction and health status surveys such as those analyzed by Boes and Winkelmann (2006a, 2006b) and other applications mentioned in the introduction. The model is used to describe the data-generating process for a random outcome that takes one of a set of discrete, ordered outcomes.

3.1 A Latent Regression Model for a Continuous Measure
The model platform is an underlying random utility model or latent regression model,

$$y_i^* = \beta'x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

in which the continuous latent utility or 'measure', $y_i^*$, is observed in discrete form through a censoring mechanism:

$$y_i = 0 \text{ if } \mu_{-1} < y_i^* < \mu_0,$$
$$\phantom{y_i} = 1 \text{ if } \mu_0 < y_i^* < \mu_1,$$
$$\phantom{y_i} = 2 \text{ if } \mu_1 < y_i^* < \mu_2,$$
$$\ldots$$
$$\phantom{y_i} = J \text{ if } \mu_{J-1} < y_i^* < \mu_J.$$

[Figure 15.1 Probit model for binary choice: probability distribution for random utility plotted against $\gamma'x$]
Note, for purposes of this introduction, that we have assumed that neither the coefficients, $\beta$, nor the thresholds, $\mu_j$, differ across individuals. These strong assumptions will be reconsidered and relaxed as the analysis proceeds. The vector $x_i$ is a set of $K$ covariates that are assumed to be strictly independent of $\varepsilon_i$; $\beta$ is a vector of $K$ parameters that is the object of estimation and inference. The $n$ sample observations are labeled $i = 1, \ldots, n$. The model contains the unknown marginal utilities, $\beta$, as well as $J + 2$ threshold parameters, $\mu_j$, all to be estimated using a sample of $n$ observations, indexed by $i = 1, \ldots, n$. The data consist of the covariates, $x_i$, and the observed discrete outcome, $y_i = 0, 1, \ldots, J$. The assumption of the properties of the 'disturbance', the random component of the utility, $\varepsilon_i$, completes the model specification. The conventional assumptions are that $\varepsilon_i$ is a continuous random disturbance with cumulative distribution function (cdf) $F(\varepsilon_i \mid x_i) = F(\varepsilon_i)$ with support equal to the real line, and that the density, $f(\varepsilon_i) = F'(\varepsilon_i)$, is likewise defined over the real line. The assumption of the distribution of $\varepsilon_i$ includes independence from, or exogeneity of, $x_i$.

3.2 Ordered Choice as an Outcome of Utility Maximization
The appearance of the ordered choice model in the transportation literature falls somewhere between a latent regression approach and a more formal discrete choice interpretation. Bhat and Pulugurta (1998) discuss a model for 'ownership propensity',

$$C_i = k \text{ if and only if } \psi_{k-1} < C_i^* < \psi_k, \quad k = 0, 1, \ldots, K, \quad \psi_{-1} = -\infty, \; \psi_K = +\infty,$$

where $C_i^*$ represents the latent auto ownership propensity of household $i$. The observable counterpart to $C_i^*$ is $C_i$, typically the number of vehicles owned.1 Agyemang-Duah and Hall (1997) apply the model to numbers of trips. Bhat (1997) models the number of non-work commute stops with work travel mode choice. From here, the model can move in several possible directions: a natural platform for the observed number of vehicles owned or the number of vehicle crashes (Castro et al., 2012) might seem to be the count data models (for example, Poisson) detailed in, for example, Cameron and Trivedi (1998, 2005), or even a choice model defined on a choice set of alternatives, 0, 1, 2, . . ..2 The Poisson model for $C_i$ would not follow from a model of utility maximization, though it would, perhaps, adequately describe the data-generating process. However, a looser interpretation of the vehicle ownership count as a reflection of the underlying preference intensity for ownership suggests an ordered choice model as a plausible alternative platform. Bhat and Pulugurta (1998) provide a utility maximization framework that produces an ordered choice model for the observed count. Their model departs from a random utility framework that assigns separate utility values to different states,
for example, zero car ownership versus some car ownership, less than or equal to one car owned versus more than one, and so on (presumably up to the maximum observed in the sample). A suitable set of assumptions about the ranking of utilities produces essentially an unordered choice model for the number of vehicles. A further set of assumptions about the parameterization of the model makes it consistent with the latent regression model above.3 A wide literature in this area includes applications by Kitamura (1987, 1988), Golob and van Wissen (1988), Kitamura and Bunch (1989), Golob (1990), Bhat and Koppelman (1993), Bhat (1996), Agyemang-Duah and Hall (1997), Bhat and Pulugurta (1998) and Bhat et al. (1999). One might question the strict ordering of the vehicle count. For example, the vehicles might include different mixtures of cars, SUVs and trucks. Though a somewhat fuzzy ordering might still seem natural, several authors have opted, instead, to replace the ordered choice model with an unordered choice framework, the multinomial logit model and variants.4 Applications include Bhat and Pulugurta (1998), Mannering and Winston (1985), Train (1986), Bunch and Kitamura (1990), Hensher et al. (1992), Purvis (1994) and Agostino et al. (1996). Groot and van den Brink (2003) encounter the same issue in their analysis of job training sessions. A count model for sessions seems natural; however, the length and depth of sessions differ enough to suggest that a simple count model will distort the underlying variable of interest, 'training'. While many applications appear on first consideration to have some 'natural' ordering, this is not necessarily the case when one recognizes that the ordering must have some meaning also in utility or satisfaction space (that is, a naturally ordered underlying preference scale) if it is assumed that the models are essentially driven by the behavioral rule of utility maximization. The number of cars owned is a good example: 0, 1, 2, . . . is a natural ordering in physical vehicle space, but it is not necessarily so in utility space.

3.3 The Observed Discrete Outcome
A typical social science application might begin from a measured outcome such as: 'Rate your feelings about the proposed legislation as Strongly oppose, Mildly oppose, Indifferent, Mildly support, Strongly support'. The latent regression model would describe an underlying continuous, albeit unobservable, preference for the legislation as $y_i^*$. The surveyed individual, even if they could, does not provide $y_i^*$, but rather a censoring of $y_i^*$ into five different ranges, one of which is closest to their own true preferences. By the laws of probability, the probabilities associated with the observed outcomes are

$$\text{Prob}[y_i = j \mid x_i] = \text{Prob}[\varepsilon_i < \mu_j - \beta'x_i] - \text{Prob}[\varepsilon_i < \mu_{j-1} - \beta'x_i], \quad j = 0, 1, \ldots, J.$$
It is worth noting that the 'model' describes, as do many other discrete choice models, probabilities of outcomes. It does not directly describe the relationship between $y_i$ and the covariates $x_i$; there is no obvious regression relationship at work between the observed random variable and the covariates. This calls into question the interpretation of $\beta$, an issue to which we will return at several points below. Though $y_i$ is not described by a regression relationship with $x_i$ – that is, $y_i$ is merely a label – one might consider examining the binary variables

$$m_{ij} = 1[y_i = j], \quad M_{ij} = 1[y_i < j], \quad \text{or} \quad M_{ij}' = 1[y_i > j].$$

The second and third of these, as well as $m_{i0}$, can be described by a simple binary choice (probit or logit) model, though these are usually not of interest. However, in general, there is no obvious regression (conditional mean) relationship between the observed dependent variable(s), $y_i$, and $x_i$. Several normalizations are needed to identify the model parameters. First, in order to preserve the positive signs of all of the probabilities, we require $\mu_j > \mu_{j-1}$. Second, if the support is to be the entire real line, then $\mu_{-1} = -\infty$ and $\mu_J = +\infty$. Since the data contain no unconditional information on the scaling of the underlying variable – if $y_i^*$ is scaled by any positive value, then scaling the unknown $\mu_j$ and $\beta$ by the same value preserves the observed outcomes – an unconditional, free variance parameter, $\text{Var}[\varepsilon_i] = \sigma_\varepsilon^2$, is not identified (estimable). It is convenient to make the identifying restriction that $\sigma_\varepsilon$ equals a known constant, $\sigma$. The usual approach to this normalization is to assume that $\text{Var}[\varepsilon_i \mid x_i] = 1$ in the probit case and $\pi^2/3$ in the logit model – in either case to eliminate the free structural scaling parameter. Finally, we will assume that $x_i$ contains a constant term, which, in turn, requires $\mu_0 = 0$. (If, with the other normalizations, and with a constant term present, this normalization is not imposed, then adding a constant to $\mu_0$ and the same constant to the intercept term in $\beta$ will leave the probabilities unchanged.)

3.4 Probabilities and the Log-likelihood
With the full set of normalizations in place, the likelihood function for estimation of the model parameters is based on the implied probabilities,

$$\text{Prob}[y_i = j \mid x_i] = F(\mu_j - \beta'x_i) - F(\mu_{j-1} - \beta'x_i) > 0, \quad j = 0, 1, \ldots, J.$$

Figure 15.2 shows the probabilities for an ordered choice model with three outcomes:

$$\text{Prob}[y_i = 0 \mid x_i] = F(0 - \beta'x_i) - F(-\infty - \beta'x_i) = F(-\beta'x_i),$$
$$\text{Prob}[y_i = 1 \mid x_i] = F(\mu_1 - \beta'x_i) - F(-\beta'x_i),$$
$$\text{Prob}[y_i = 2 \mid x_i] = F(+\infty - \beta'x_i) - F(\mu_1 - \beta'x_i) = 1 - F(\mu_1 - \beta'x_i).$$

[Figure 15.2 Underlying probabilities for an ordered choice model]

Estimation of the parameters is a straightforward problem in maximum likelihood estimation (MLE). (See, for example, Pratt, 1981; Greene, 2007, 2008a.) The log-likelihood function is

$$\log L = \sum_{i=1}^{n} \sum_{j=0}^{J} m_{ij} \log \left[ F(\mu_j - \beta'x_i) - F(\mu_{j-1} - \beta'x_i) \right],$$

where $m_{ij} = 1$ if $y_i = j$ and 0 otherwise. Maximization is done subject to the constraints $\mu_{-1} = -\infty$, $\mu_0 = 0$ and $\mu_J = +\infty$. The remaining constraints, $\mu_{j-1} < \mu_j$, can, in principle, be imposed by a reparameterization in terms of some underlying structural parameters, such as

$$\mu_j = \mu_{j-1} + \exp(\alpha_j) = \sum_{m=1}^{j} \exp(\alpha_m);$$

however, this is typically unnecessary. See, for example, Fahrmeir and Tutz (2001). Expressions for the derivatives of the log-likelihood can be found in McKelvey and Zavoina (1975), Maddala (1983), Long (1997), Stata (2008) and Econometric Software (2012). The estimator of the asymptotic covariance matrix for the MLE is computed by familiar methods, using the Hessian, outer products of gradients or, in some applications, a 'robust' sandwich estimator. The most recent literature (since 2005) includes several applications that use Bayesian methods to analyze ordered choices. Being heavily parametric in nature, they have focused exclusively on the ordered probit model.5 Some commentary on Bayesian methods and methodology may be found in Koop and Tobias (2006). Applications to the univariate ordered probit model include Kadam and Lenk (2008), Ando (2006), Zhang et al. (2007) and Tomoyuki and Akira (2006). In the most basic cases, with diffuse priors, the 'Bayesian' methods merely reproduce (with some sampling variability) the maximum likelihood estimator.6 However, the Markov chain Monte Carlo (MCMC) methodology is often useful in settings which extend beyond the basic model, for example, applications to a bivariate ordered probit model (Biswas and Das, 2002), a model with autocorrelation (Girard and Parent, 2001; Czado et al., 2005) and a model that contains a set of endogenous dummy variables in the latent regression (Munkin and Trivedi, 2008).
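To make the estimation concrete, the following is a minimal sketch of this maximum likelihood problem on simulated data, using the exp(·) reparameterization of the thresholds described above to keep them ordered. All names and values are illustrative, not the chapter's application:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, X, y):
    """Negative ordered probit log-likelihood; thresholds are built as
    mu_j = sum_{m<=j} exp(alpha_m), enforcing 0 = mu_0 < mu_1 < ..."""
    K = X.shape[1]
    beta, alpha = params[:K], params[K:]
    mu = np.concatenate(([-np.inf, 0.0], np.cumsum(np.exp(alpha)), [np.inf]))
    xb = X @ beta
    p = norm.cdf(mu[y + 1] - xb) - norm.cdf(mu[y] - xb)  # cell probabilities
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

# Simulated illustration: five outcomes (J = 4), true thresholds 0, 1, 2, 3.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
ystar = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
y = np.digitize(ystar, [0.0, 1.0, 2.0, 3.0])  # censor y* into 0..4

res = minimize(neg_loglik, np.zeros(X.shape[1] + 3), args=(X, y), method='BFGS')
beta_hat = res.x[:2]
mu_hat = np.cumsum(np.exp(res.x[2:]))  # estimated mu_1, mu_2, mu_3
print(beta_hat, mu_hat)
```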
3.5 Application of the Ordered Choice Model to Self-assessed Health Status
Riphahn et al. (2003) analyzed individual data on health care utilization (doctor visits and hospital visits) using various models for counts. The data set is an unbalanced panel of 7293 German individuals observed from 1 to 7 times, for a total of 27 326 observations, extracted from the German Socioeconomic Panel (GSOEP). (See Riphahn et al., 2003, and Greene, 2008a, for discussion of the data set in detail.) Among the variables in this data set is HSAT, a self-reported health assessment that is recorded with values 0, 1, . . ., 10 (so, J = 10). Figure 15.3 shows the distribution of outcomes for the full sample; the figure reports the variable NewHSAT, not the original variable. Forty of the 27 326 observations on HSAT in the original data were coded with noninteger values between 6.5 and 6.95. We have changed these 40 observations to 7s.

[Figure 15.3 Self-reported health satisfaction: histogram for NewHSAT, full sample, by gender]

In order to construct a compact example that is sufficiently general to illustrate the technique, we will aggregate the categories as follows: (0−2) = 0, (3−5) = 1, (6−8) = 2, (9) = 3 and (10) = 4. (One might expect collapsing the data in this fashion to sacrifice some information and, in turn, produce a less efficient estimator of the model parameters. See Murad et al., 2003, for some analysis of this issue.) Figure 15.4 shows the result, once again for the full sample, stratified by gender. The families were observed in 1984–88, 1991 and 1995. For purposes of the application, to maintain as closely as possible the assumptions of the model, at this point we have selected the most frequently observed year, 1988, for which there are a total of 4483 observations: 2313 males and 2170 females. We will use the following variables in the regression part of the model: x = (constant, Age, Income, Education, Married, Kids). In the original data set, Income is HHNINC (household monthly income) and Kids is HHKIDS (dummy variable for children under 16 present in the household). Married and Kids are binary variables.
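A small sketch of the category-collapsing step just described (hypothetical HSAT values; np.digitize is one convenient way to apply the mapping):

```python
import numpy as np

hsat = np.array([0, 3, 6, 9, 10, 7, 5])          # hypothetical HSAT responses, 0-10
health = np.digitize(hsat, bins=[3, 6, 9, 10])   # (0-2)->0, (3-5)->1, (6-8)->2, (9)->3, (10)->4
print(health)                                    # -> [0 1 2 3 4 2 1]
```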
[Figure 15.4 Health satisfaction with combined categories: histogram for the recoded variable, full sample, by gender]
3.5.1 Estimated ordered probit model
Estimates of the ordered probit model for the 1988 data set are as follows (estimated standard errors in parentheses):

y* = 1.97882 − 0.01806 Age + 0.03556 Educ + 0.25869 Income − 0.03100 Married + 0.06065 Kids + ε
     (0.116)   (0.002)       (0.007)        (0.104)          (0.042)            (0.038)

y = 0 if y* < 0
y = 1 if 0 < y* < 1.14835 (0.021)
y = 2 if 1.14835 < y* < 2.54781 (0.022)
y = 3 if 2.54781 < y* < 3.05639 (0.027)
y = 4 if y* > 3.05639

Log-likelihood = −5752.985.

As commonly observed, the counterparts for an ordered logit model are approximately 1.8 times the corresponding estimates for a probit model. The log-likelihood function for the logit model is −5749.157. Figure 15.5 shows the implied model for a person of average age (43.44 years), education (11.418 years) and income (0.3487) who is married (1) with children (1). The figure shows the implied probability distribution in the population for individuals with these characteristics. As we will examine in the next section, the force of the regression model is that the probabilities change as the characteristics (x) change. In terms of the figure, changes in the characteristics induce changes in the placement of the partitions in the distribution and, in turn, in the probabilities of the outcomes.

3.5.2 Interpretation of the model – partial effects and scaled coefficients
Interpretation of the coefficients in the ordered probit model is more complicated than in the ordinary regression setting.7 The outcome variable, y, is merely a label for the ordered, nonquantitative outcomes. As such, there is no conditional mean function,
$E[y \mid x]$, to analyze. In order to interpret the parameters, one typically refers to the probabilities themselves.

[Figure 15.5 Estimated ordered probit model: implied probabilities P(y = 0) = 0.0427, P(y = 1) = 0.2412, P(y = 2) = 0.5124, P(y = 3) = 0.1131, P(y = 4) = 0.0906; partition points $-\beta'x = -1.7195$, $\mu_1 - \beta'x = -0.5716$, $\mu_2 - \beta'x = 0.8283$, $\mu_3 - \beta'x = 1.3369$]

The partial effects in the ordered choice model are

$$\delta_j(x_i) = \frac{\partial \text{Prob}(y = j \mid x_i)}{\partial x_i} = \left[ f(\mu_{j-1} - \beta'x_i) - f(\mu_j - \beta'x_i) \right] \beta.$$

Neither the sign nor the magnitude of the coefficient is informative about the result above, so the direct interpretation of the coefficients is fundamentally ambiguous. A counterpart result for a dummy variable in the model would be obtained by using a difference of probabilities, rather than a derivative.8 That is, suppose $D$ is a dummy variable in the model (such as Married) and $\gamma$ is the coefficient on $D$. We would measure the effect of a change in $D$ from 0 to 1, with all other variables held at the values of interest (perhaps their means), using

$$\Delta_j(D) = \left[ F(\mu_j - \beta'x_i - \gamma) - F(\mu_{j-1} - \beta'x_i - \gamma) \right] - \left[ F(\mu_j - \beta'x_i) - F(\mu_{j-1} - \beta'x_i) \right],$$

where $\beta'x_i$ is evaluated with $D = 0$. Partial effects are thus computed using either the derivatives, for continuous variables, or first differences, for discrete variables. The received applications include both presentations of 'average partial effects' (APE), that is, partial effects computed by averaging the individually computed partial effects, and partial effects computed at the averages (PEA) of the data. Current practice leans toward the former. The estimated average partial effects for the model reported earlier are shown in Table 15.1.
Table 15.1 Estimated partial effects for ordered choice models

              Age        Education   Income     Married*   Kids*
Prob(y = 0)   0.00173    −0.00340    −0.02476    0.00293   −0.00574
Prob(y = 1)   0.00450    −0.00885    −0.06438    0.00771   −0.01508
Prob(y = 2)  −0.00124     0.00244     0.01774   −0.00202    0.00397
Prob(y = 3)  −0.00216     0.00424     0.03085   −0.00370    0.00724
Prob(y = 4)  −0.00283     0.00557     0.04055   −0.00491    0.00960

Note: * Binary variable; partial effects computed as first differences.
Since the estimated partial effects (APE and PEA) are functions of the estimated parameters, they are subject to sampling variability, and one might wish to obtain appropriate asymptotic covariance matrices and/or confidence intervals. The delta method is used to obtain the standard errors. Let $V$ denote the estimated asymptotic covariance matrix for the $(K + J - 2) \times 1$ parameter vector $(\hat{\beta}', \hat{\mu}')'$. Then, for example, the estimator of the asymptotic covariance matrix for each vector of partial effects at the means is

$$\hat{Q} = \hat{C} V \hat{C}', \quad \text{where } \hat{C} = \left[\, \frac{\partial \hat{\delta}_j(\bar{x})}{\partial \hat{\beta}'} \quad \frac{\partial \hat{\delta}_j(\bar{x})}{\partial \hat{\mu}'} \,\right].$$

The appropriate row of $\hat{C}$ is replaced with the derivatives of $\hat{\Delta}_j(d, \bar{x})$ when the effect is being computed for a discrete variable. For computing APEs instead, the Jacobian at the means, $\hat{C}(\bar{x})$, is replaced with the average estimated Jacobian, $(1/n)\sum_{i=1}^{n} \hat{C}(x_i)$. The implication of the preceding result is that the effect of a change in one of the variables in the model depends on all the model parameters, the data, and which probability (cell) is of interest; it can be negative or positive. To illustrate, we consider the effect of a change in the education variable on the implied probabilities in Figure 15.6. Since the changes in a probability model are typically 'marginal' (small), we will exaggerate the effect a bit so that it shows up in a figure. Consider, then, the average individual shown in the top panel of Figure 15.6, except now with a PhD (college plus four years of postgraduate work), that is, 20 years of education instead of the average 11.4 used earlier. The effect of an additional 8.6 years of education is shown in the lower panel of Figure 15.6. All five probabilities have changed. The two at the right-hand end of the distribution have increased, while the three at the left have decreased. The partial effects give the impacts on the specific probabilities per unit change in the stimulus or regressor. For example, for the continuous variable Educ, we find partial effects for the ordered probit model for the five cells of −0.00340, −0.00885, 0.00244, 0.00424 and 0.00557, respectively, which give the expected change in the probabilities per additional year of education. For the income variable, for the highest cell, the estimated partial effect is 0.04055. However, some care is needed in interpreting this in terms of a unit change. The income variable has a mean of 0.34874 and a standard deviation of 0.1632. A full unit change in income would put the average individual nearly six standard deviations above the mean. Thus, for the marginal impact of income, one might want to measure
[Figure 15.6 Partial effect in ordered probit model. Top panel, estimated ordered probit with average education (11.4 years): P0 = 0.0427, P1 = 0.2412, P2 = 0.5124, P3 = 0.1131, P4 = 0.0906. Bottom panel, estimated ordered probit with PhD (20 years of education): P0 = 0.0215, P1 = 0.1690, P2 = 0.5091, P3 = 0.1493, P4 = 0.1511.]
a change in standard deviation units. Thus, an assessment of the impact of a change in income on the probability of the highest cell might be 0.04055 × 0.1632 = 0.00662. Precisely how this computation should be done will vary from one application to another. There is typically a large difference between the coefficients obtained for the probit and logit models. The logit coefficients are roughly 1.8 times as large (though not uniformly so). This difference, which will always be observed, points up one of the risks in attempting to interpret the coefficients in the model directly. The difference reflects an inherent scaling of the underlying variable and the shapes of the two distributions, and can be traced back (at least in part) to the different underlying variances in the two models. In the probit model, $\sigma_\varepsilon = 1$; in the logit model, $\sigma_\varepsilon = \pi/\sqrt{3} \approx 1.81$. The models are roughly preserving the ratio $\beta/\sigma_\varepsilon$ in the estimates. The difference is greatly diminished in the partial effects reported in Table 15.1. The values computed for an ordered logit model are nearly the same. That is the virtue of the scaling done to compute the partial effects. The inherent characteristics of the model are essentially the same for the two functional forms.
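A minimal sketch of the $\delta_j(x_i)$ computation for continuous variables, using hypothetical parameter values rather than the chapter's estimates:

```python
import numpy as np
from scipy.stats import norm

def partial_effects(beta, mu, x):
    """delta_j(x) = [f(mu_{j-1} - b'x) - f(mu_j - b'x)] * beta, for j = 0..J.
    Returns a (J+1, K) array; the column for the constant term is not meaningful."""
    mu_full = np.concatenate(([-np.inf], mu, [np.inf]))  # mu_{-1}, mu_0=0, ..., mu_J
    f = norm.pdf(mu_full - x @ beta)                     # f(+/-inf) = 0
    return np.outer(f[:-1] - f[1:], beta)                # row j: effects on Prob(y = j)

# Hypothetical estimates: (constant, Age, Educ) and thresholds 0 < 1.1 < 2.5.
beta = np.array([1.5, -0.02, 0.04])
mu = np.array([0.0, 1.1, 2.5])
print(partial_effects(beta, mu, np.array([1.0, 43.4, 11.4])))
```

Note that each column sums to zero across the cells, reflecting the fact that the probabilities must continue to sum to one as a covariate changes.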
4 SPECIFICATION ISSUES AND GENERALIZED MODELS
It is useful to distinguish between two directions of the contemporary development of the ordered choice model: functional form and heterogeneity. Beginning with Terza (1985), a
number of authors have focused on the fact that the model does not account adequately for individual heterogeneity that is likely to be present in micro-level data. This section considers specification issues. Heterogeneity is examined in section 5.

4.1 Accommodating Individual Heterogeneity
For a subjective well-being (SWB) application, the right-hand side of the behavioral equation will include variables such as Income, Education, Marital Status, Children, Working Status, Health and a host of other measurable and unmeasurable, and measured and unmeasured, variables. In individual-level behavioral models, such as

$$SWB_{it} = \beta'x_{it} + \varepsilon_{it},$$

the relevant question is whether a zero mean, homoscedastic $\varepsilon_{it}$ can be expected to satisfactorily accommodate the likely amount of heterogeneity in the underlying data, and whether it is reasonable to assume that the same thresholds should apply to each individual. Beginning with Terza (1985), analysts have questioned the adequacy of the ordered choice model from this perspective. As shown below, many of the proposed extensions of the model, such as heteroskedasticity, parameter heterogeneity, and so on, mirror developments in other modeling contexts (such as binary choice modeling and modeling counts such as the number of doctor visits or hospital visits). The regression-based ordered choice model analyzed here does have a unique feature: the thresholds are part of the behavioral specification. This aspect of the specification has been considered as well.

4.2 Threshold Models – A Generalized Ordered Probit Model
The model analyzed thus far assumes that the thresholds $\mu_j$ are the same for every individual in the sample. Terza (1985), Pudney and Shields (2000), Boes and Winkelmann (2006a), Greene et al. (2008) and Greene and Hensher (2010) all present cases suggesting that individual variation in the set of thresholds is a degree of heterogeneity that is likely to be present in the data, but is not accommodated in the model. Terza's (1985) generalization of the model is equivalent to

$$\mu_{ij} = \mu_j + \delta'z_i.$$

This is the special case of the 'generalized' model used in his application – his fully general case allows $\delta$ to differ across outcomes. The model is reformulated later to assume that the $z_i$ in the equation for the thresholds is the same as the $x_i$ in the regression. For the moment, it is convenient to isolate the constant term from $x_i$. In Terza's application, in which there were three outcomes,

$$y_i^* = \alpha + \beta'x_i + \varepsilon_i,$$
and

$$y_i = 0 \text{ if } y_i^* < 0, \quad 1 \text{ if } 0 < y_i^* < \mu + \delta'x_i, \quad 2 \text{ if } y_i^* > \mu + \delta'x_i.$$

There is an ambiguity in the model as specified. In principle, the model for three outcomes has two thresholds, $\mu_0$ and $\mu_1$. With a nonzero overall constant, it is always necessary to normalize the first, $\mu_0 = 0$. Therefore, the model implies the following probabilities:

$$\text{Prob}(y = 0 \mid x) = F(-\alpha - \beta'x) = 1 - F(\alpha_0 + \beta_0'x),$$
$$\text{Prob}(y = 1 \mid x) = F(\mu + \delta'x - \alpha - \beta'x) - F(-\alpha - \beta'x) = F(\alpha_0 + \beta_0'x) - F(\alpha_1 + \beta_1'x),$$
$$\text{Prob}(y = 2 \mid x) = F(\alpha + \beta'x - \mu - \delta'x) = F(\alpha_1 + \beta_1'x),$$

where $\alpha_0 = \alpha$, $\beta_0 = \beta$, $\alpha_1 = \alpha - \mu$ and $\beta_1 = (\beta - \delta)$. This is precisely Williams's (2006) 'generalized ordered probit model'; that is, at this juncture, Terza's heterogeneous thresholds model and the 'generalized ordered probit' model are indistinguishable. For direct applications of Terza's approach, see, for example, Kerkhofs and Lindeboom (1995), Groot and van den Brink (1999) and Lindeboom and van Doorslaer (2003). Terza notes (1985, p. 6) that the model formulation does not impose an ordering on the threshold coefficients. He suggests an inequality constrained maximization of the log-likelihood, which is likely to be extremely difficult if there are many variables in $x$. As a 'less rigorous but apparently effective remedy', he proposes to drop from the model variables in the threshold equations that are insignificant in the initial (unconstrained) model. The analysis of this model continues with Pudney and Shields's (2000) 'generalized ordered probit model', whose motivation, like Terza's, was to accommodate observable individual heterogeneity in the threshold parameters as well as in the mean of the regression. Pudney and Shields studied an example in the context of job promotion in which the steps on the promotion ladder for nurses are somewhat individual specific. In their setting, in contrast to Terza's, at least some of the variables in the threshold equations are explicitly different from those in the regression. Their model is parameterized as

$$\Pr(y_i = g \mid x_i, q_i) = F[q_i'\beta_g - x_i'(\alpha + \delta_g)] - F[q_i'\beta_{g-1} - x_i'(\alpha + \delta_{g-1})].$$

The resulting equation is now a hybrid with outcome-varying parameters in both the thresholds and the regression. The test of threshold constancy is then carried out simply by testing (using an LM test) the null hypothesis that $\delta_g = 0$ for all $g$. (A normalization, $\delta_0 = \delta_m = 0$, is imposed at the outset.) Two features of Pudney and Shields's model are to be noted: first, the probabilities in their revised log-likelihood – their equation (8) – are not constrained to be positive. Second, the thresholds, $q_i'\beta_g$, are not constrained to be ordered. No restriction on $\beta_g$ will ensure that $q_i'\beta_g > q_i'\beta_{g-1}$ for all data vectors $q_i$. The equivalence of the Terza and Williams models is only a mathematical means to the end of estimation of the model. The Pudney and Shields model, itself, has constant parameters in the regression model and outcome-varying parameters in the thresholds,
and clearly stands on the platform of the latent regression, but they do note (using a more generic notation) a deeper problem of identification. However the model is originally formulated, it implies that

$$\text{Prob}[y_i \le j \mid x_i, z_i] = F(\mu_j + \delta'z_i - \beta'x_i) = F[\mu_j - (\delta^{*\prime}z_i + \beta'x_i)], \quad \delta^* = -\delta.$$

In their specification, they had a well-defined distinction between the variables $z_i$ that should appear only in the thresholds and $x_i$ that should appear in the regression. More generally, it is less than obvious whether the variables $z_i$ are actually in the thresholds or in the mean of the regression. Either interpretation is consistent with the estimable model. Pudney and Shields argue that the distinction is of no substantive consequence for their analysis. The consequence is at the theoretical end, not in the implementation. But this entire development is necessitated by the linear specification of the thresholds. Absent that, most of the preceding construction is of limited relevance.
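A minimal sketch of how the Terza-style shifted thresholds, $\mu_{ij} = \mu_j + \delta'z_i$, enter the cell probabilities (all values here are illustrative):

```python
import numpy as np
from scipy.stats import norm

def shifted_threshold_probs(beta, mu, delta, x, z):
    """Cell probabilities with individual thresholds mu_ij = mu_j + delta'z_i."""
    mu_i = np.concatenate(([-np.inf], mu + z @ delta, [np.inf]))
    return np.diff(norm.cdf(mu_i - x @ beta))  # Prob(y = 0), ..., Prob(y = J)

beta = np.array([0.8, -0.3])       # hypothetical regression coefficients
mu = np.array([0.0, 1.0, 2.2])     # baseline thresholds (mu_0 normalized to 0)
delta = np.array([0.25])           # hypothetical threshold shifter
print(shifted_threshold_probs(beta, mu, delta, np.array([1.0, 0.5]), np.array([1.2])))
```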
4.3 Random Parameters Models
Formal modeling of heterogeneity in the parameters, as representing a feature of the underlying data, appears in Greene (2002) (version 8.0), Bhat (1999), Bhat and Zhao (2002) and Boes and Winkelmann (2006a, 2006b). These treatments suggest a full random parameters (RP) approach to the model. Boes and Winkelmann's (2006a, 2006b) treatment appears as follows:

$$\beta_i = \beta + u_i, \quad \text{where } u_i \sim N[0, W].$$

Inserting the expression for $\beta_i$ in the latent regression model, we obtain

$$y_i^* = \beta_i'x_i + \varepsilon_i = \beta'x_i + \varepsilon_i + x_i'u_i.$$

They propose treating this as a heteroskedastic model – $\text{Var}[\varepsilon_i + x_i'u_i] = 1 + x_i'Wx_i$ – and maximizing the log-likelihood directly over $\beta$, $\mu$ and $W$. The observation mechanism is the same as earlier. Greene (2002, 2007, 2008a, 2008b) analyzes the same model, but estimates the parameters by maximum simulated likelihood. First, write the random parameters as

$$\beta_i = \beta + Dz_i + LDw_i,$$

where $w_i$ has a multivariate standard normal distribution and $LD^2L' = W$. The Cholesky matrix, $L$, is lower triangular with ones on the diagonal. The below-diagonal elements of $L$, $l_{mn}$, produce the nonzero correlations across parameters. The diagonal matrix, $D$, provides the scale factors, $d_m$, that is, the standard deviations of the random parameters. The end result is that $L(Dw_i)$ is a mixture, $Lw_i^*$, of random variables $w_{im}^*$ which have variances $d_m^2$. This is a two-level 'hierarchical' model (in the more widely used sense). The probability for an observation is

$$\text{Prob}(y_i = j \mid x_i, w_i) = F(\mu_j - \beta_i'x_i) - F(\mu_{j-1} - \beta_i'x_i) = F(\mu_j - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i) - F(\mu_{j-1} - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i).$$
In order to maximize the log-likelihood, we must first integrate out the elements of the unobserved $w_i$. Thus, the contribution to the unconditional log-likelihood for observation $i$ is

$$\log L_i = \log \int_{w_i} \left[ F(\mu_j - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i) - F(\mu_{j-1} - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i) \right] \phi(w_i)\, dw_i.$$
The log-likelihood for the sample is then the sum over the observations. Computing the integrals is an obstacle that must now be overcome. It has been simplified considerably already by decomposing $W$ explicitly in the log-likelihood, so that $\phi(w_i)$ is the multivariate standard normal density. The Stata routine GLLAMM (Rabe-Hesketh et al., 2005), that is used for some discrete choice models, does the computation using a form of Hermite quadrature. An alternative, generally substantially faster, method of maximizing the log-likelihood is maximum simulated likelihood. The integration is replaced with a simulation over $R$ draws from the multivariate standard normal population. The simulated log-likelihood is

$$\log L_S = \sum_{i=1}^{N} \log \frac{1}{R} \sum_{r=1}^{R} \left[ F(\mu_j - \beta'x_i - z_i'D'x_i - (LDw_{ir})'x_i) - F(\mu_{j-1} - \beta'x_i - z_i'D'x_i - (LDw_{ir})'x_i) \right].$$
The simulations are speeded up considerably by using Halton draws.9 Partial effects and predicted probabilities must be simulated as well. For the partial effects,

$$\frac{\partial \text{Prob}(y_i = j \mid x_i)}{\partial x_i} = \int_{w_i} \left[ f(\mu_{j-1} - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i) - f(\mu_j - \beta'x_i - z_i'D'x_i - (LDw_i)'x_i) \right] (\beta + Dz_i + LDw_i)\, \phi(w_i)\, dw_i.$$
We use simulation to compute

$$\text{Est.}\!\left[\frac{\partial \text{Prob}(y_i = j \mid x_i)}{\partial x_i}\right] = \frac{1}{R} \sum_{r=1}^{R} \left[ f(\hat{\mu}_{j-1} - \hat{\beta}'x_i - z_i'\hat{D}'x_i - (\hat{L}\hat{D}w_{ir})'x_i) - f(\hat{\mu}_j - \hat{\beta}'x_i - z_i'\hat{D}'x_i - (\hat{L}\hat{D}w_{ir})'x_i) \right] (\hat{\beta} + \hat{D}z_i + \hat{L}\hat{D}w_{ir}).$$
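A minimal sketch of the simulated cell probability with Halton draws, specialized, purely for brevity, to uncorrelated random parameters ($L = I$) and no observed heterogeneity in the means ($z_i$ dropped); all values are hypothetical:

```python
import numpy as np
from scipy.stats import norm, qmc

def simulated_prob(j, x, beta, mu_full, d, R=200):
    """(1/R) sum_r [F(mu_j - b_r'x) - F(mu_{j-1} - b_r'x)], with b_r = beta + D w_r."""
    halton = qmc.Halton(d=len(beta), scramble=True, seed=1)
    w = norm.ppf(halton.random(R))   # R x K standard normal Halton draws
    b_r = beta + w * d               # D = diag(d): uncorrelated random parameters
    xb = b_r @ x                     # simulated index, one value per draw
    return np.mean(norm.cdf(mu_full[j + 1] - xb) - norm.cdf(mu_full[j] - xb))

beta = np.array([0.5, -0.2])                     # hypothetical parameter means
d = np.array([0.3, 0.1])                         # hypothetical standard deviations
mu_full = np.array([-np.inf, 0.0, 1.2, np.inf])  # three outcomes
print(simulated_prob(1, np.array([1.0, 2.0]), beta, mu_full, d))
```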
A similar analysis provides an extension of the latent class model to ordered choice models. The latent class ordered choice model is developed in detail in Greene and Hensher (2010). The finite mixture, or latent class, approach is an alternative method of modeling parameter heterogeneity. A narrow view of the latent class model casts it in the form of
discrete parameter variation in the population – the population consists of a mixture of $Q$ 'types' or classes, indexed by the parameter vectors $(\beta_q, \mu_q)$, distributed discretely with nonparametric probability mass function $P = (\pi_1, \pi_2, \ldots, \pi_Q)$. The mixed model is then generated from the conditional probabilities

$$\text{Prob}(y_i = j \mid x_i, \text{class} = q) = F(\mu_{j,q} - \beta_q'x_i) - F(\mu_{j-1,q} - \beta_q'x_i).$$

Class membership is unknown (latent), so the observable contribution to the log-likelihood is found by integrating over the classes,

$$\text{Prob}(y_i = j \mid x_i) = \sum_{q=1}^{Q} \pi_q \left[ F(\mu_{j,q} - \beta_q'x_i) - F(\mu_{j-1,q} - \beta_q'x_i) \right].$$

The log-likelihood is employed in the usual fashion. The latent class ordered choice model appears early in Uebersax (1999) and Everitt (1988). A useful extension of the latent class model is to employ information about class membership where it can be found in the class probabilities. The modified model is built around

$$\text{Prob}(\text{class} = q \mid z_i) = \pi(\delta_q, z_i).$$

A probit or logit model for the class membership model is typical when there are two classes, as in Greene et al.'s (2008) study of obesity. Greene and Hensher (2010) suggest a multinomial logit form for more general cases. The latent class formulation provides a convenient platform for elaborate models with multiple equations and class-specific model specifications. Harris and Zhao's (2007) zero inflation model for tobacco consumption is an example (described in more detail below). Greene et al. (2013) propose a two-segment model (population) for analyzing responses to a question about 'Self-assessed Health' (SAH) in the HILDA data. Answers to the question take the standard range (0, 1, 2, 3, 4). Class '1' individuals reply in the form of the usual ordered probit model. The observed data on SAH, balanced against observed objective health measures such as the incidence of heart disease and diabetes, seem to display '2 and 3 inflation'; these cells seem too large. The authors hypothesize that a second class of individuals will always answer with one of these two categories regardless of other conditions. This gives rise to a second class, and a two-class model with a probit 'splitting' equation:

$$\text{Prob}(y = 0 \mid x_t, x_m, z) = F(\delta'z)\,[F(-\beta_t'x_t)]$$
$$\text{Prob}(y = 1 \mid x_t, x_m, z) = F(\delta'z)\,[F(\mu_1 - \beta_t'x_t) - F(-\beta_t'x_t)]$$
$$\text{Prob}(y = 2 \mid x_t, x_m, z) = F(\delta'z)\,[F(\mu_2 - \beta_t'x_t) - F(\mu_1 - \beta_t'x_t)] + [1 - F(\delta'z)]\,F(-\beta_m'x_m)$$
$$\text{Prob}(y = 3 \mid x_t, x_m, z) = F(\delta'z)\,[F(\mu_3 - \beta_t'x_t) - F(\mu_2 - \beta_t'x_t)] + [1 - F(\delta'z)]\,F(\beta_m'x_m)$$
$$\text{Prob}(y = 4 \mid x_t, x_m, z) = F(\delta'z)\,[1 - F(\mu_3 - \beta_t'x_t)].$$

For those who are 'true' reporters (merely a label), the five-outcome ordered probit model governs. For 'misreporters', a two-outcome simple probit model applies. The splitting probability is $F(\delta'z)$.
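A minimal sketch of the latent class mixture probability with fixed class probabilities (the class membership model, $\pi(\delta_q, z_i)$, is omitted for brevity; all values are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def latent_class_prob(j, x, betas, mus, pi):
    """Prob(y=j|x) = sum_q pi_q [F(mu_{j,q} - b_q'x) - F(mu_{j-1,q} - b_q'x)]."""
    p = 0.0
    for beta_q, mu_q, pi_q in zip(betas, mus, pi):
        mu_full = np.concatenate(([-np.inf], mu_q, [np.inf]))
        xb = x @ beta_q
        p += pi_q * (norm.cdf(mu_full[j + 1] - xb) - norm.cdf(mu_full[j] - xb))
    return p

# Two hypothetical classes over three outcomes (0, 1, 2).
betas = [np.array([0.5, 0.2]), np.array([-0.3, 0.6])]
mus = [np.array([0.0, 1.0]), np.array([0.0, 0.7])]
pi = [0.6, 0.4]
print(latent_class_prob(1, np.array([1.0, 1.5]), betas, mus, pi))
```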
5 ORDERED CHOICE MODELING WITH PANEL DATA
Development of models for panel data parallels that in other modeling settings. The departure point is the familiar fixed and random effects approaches. Some two-part extensions of the model are examined in section 6.

5.1 Ordered Choice Models with Fixed Effects
An ordered choice model with fixed effects formulated in the most familiar fashion would be

$$\text{Prob}[y_{it} = j \mid x_{it}] = F(\mu_j - \alpha_i - \beta'x_{it}) - F(\mu_{j-1} - \alpha_i - \beta'x_{it}) > 0, \quad j = 0, 1, \ldots, J.$$

At the outset, there are two problems that this model shares with other nonlinear fixed effects models. First, regardless of how estimation and analysis are approached, time invariant variables are precluded. Since social science applications typically include demographic variables, such as gender and, for some at least, education level, that are time invariant, this is likely to be a significant obstacle. (Several of the variables in the GSOEP analyzed by Boes and Winkelmann, 2006b, and others are time invariant.) Second, there is no sufficient statistic available to condition the fixed effects out of the model. That would imply that, in order to estimate the model as stated, one must maximize the full log-likelihood,

$$\log L = \sum_{i=1}^{N} \log \left\{ \prod_{t=1}^{T_i} \sum_{j=0}^{J} m_{ijt} \left[ F(\mu_j - \alpha_i - \beta'x_{it}) - F(\mu_{j-1} - \alpha_i - \beta'x_{it}) \right] \right\}.$$
If the sample is small enough, one may simply insert the individual group dummy variables and treat the entire pooled sample as a cross section. See, for example, Mora (2006) for a cross-country application in banking that includes separate country dummy variables. We are interested, instead, in the longitudinal data case in which this would not be feasible. The data set from which our sample used in the preceding examples is extracted comes from an unbalanced panel of 7293 households, observed from 1 to 7 times each. The full ordered probit model with fixed effects, including the individual specific constants, can be estimated by unconditional maximum likelihood using the results in Greene (2004a, 2004b, 2008a, s. 16.9.6.c). The likelihood function is globally concave, so despite its superficial complexity, the estimation is straightforward.10 The larger methodological problem with this approach would be, at least, the potential for the incidental parameters problem that has been widely documented for the binary choice case (see, for example, Lancaster, 2000), that is, the small $T$ bias in the estimated parameters when the full MLE is applied in panel data. For $T = 2$ in the binary logit model, it has been shown analytically (Abrevaya, 1997) that the full MLE converges to $2\beta$. (See, as well, Hsiao, 1986, 2003.) No corresponding results have been obtained for larger $T$ or for other models. In particular, no theoretical counterpart to the Hsiao (1986, 2003) and Abrevaya (1997) result on the small $T$ bias (incidental parameters problem) of the MLE in the presence of fixed effects has been derived for the ordered probit model, even for $T$ equal to 2. However, Monte Carlo results have strongly suggested that the
small sample bias persists for larger $T$ as well, although, as might be expected, it diminishes with increasing $T$. The Monte Carlo results in Greene (2004b) suggest that biases comparable to those in the binary choice models persist in the ordered probit model as well. The values given correspond to estimation of coefficients on a continuous variable ($\beta$) and a binary variable ($\delta$) in the equation. Ferrer-i-Carbonell and Frijters (2004) proposed a partial solution to building fixed effects into an ordered logit model. The central equation in the model with fixed effects is, as usual,

$$\text{Prob}[y_{it} = j \mid x_{it}] = F(\mu_j - \alpha_i - \beta'x_{it}) - F(\mu_{j-1} - \alpha_i - \beta'x_{it}) > 0, \quad j = 0, 1, \ldots, J.$$

The model implies that

$$\text{Prob}[y_{it} > j \mid x_{it}] = \Lambda(\alpha_i + \beta'x_{it} - \mu_j).$$

We now simply fold the invariant $\mu_j$ into $\alpha_i$ and obtain a familiar 'fixed effects binary logit model'. (See Rasch, 1960, and Chamberlain, 1980.) The parameters, $\beta$, can now be estimated by conditioning out the fixed effects. This leaves three estimation problems to be solved. First, the procedure does not produce estimates of $\mu_j$, so it is not possible to compute the probabilities or marginal effects. Second, it does not produce estimates of $\alpha_i$, which compounds the problem just noted. Finally, since this can be computed for $j = 1, \ldots, J - 1$, the procedure produces multiple estimates of $\beta$. A natural minimum distance (generalized method of moments – GMM) estimator would seem appropriate at this point, so at least this third obstacle is surmountable. The first two, however, do raise questions about the value of this exercise. Recent proposals for 'bias reduction' estimators for binary choice models, including Fernández-Val (2009), Fernández-Val and Vella (2007), Carro (2007), Hahn and Newey (2004) and Hahn and Kuersteiner (2003), suggest some directions for further research. Bester and Hansen (2009) have suggested an approach for three-outcome (low, middle, high) ordered choice models. We note, for this model, that the estimation of $\beta$, which is the focus of these estimators, is only a means to the end. As seen earlier, in order to make meaningful statements about the implications of the model for behavior, it will be necessary to compute probabilities and derivatives. These, in turn, will require estimation of the constants, or some surrogates. The problem remains to be solved. Fernández-Val (2009) suggests, however, that the unconditional, biased estimator of $(\alpha, \beta)$ does produce consistent estimators of the partial effects in these models. If so, given that the actual target of estimation is the partial effects, not the raw coefficients, the argument about the incidental parameters may be a moot point. Research on this subject continues.

5.2 Ordered Choice Models with Random Effects
Save for an ambiguity about the mixture of distributions in an ordered logit model, a random effects version of the ordered choice model is a straightforward extension of the binary choice case developed by Butler and Moffitt (1982). An interesting application which appears to replicate, but not connect to, Butler and Moffitt is Jansen (1990).
Jansen estimates the equivalent of the Butler and Moffitt model with an ordered probit model, using an iterated MLE with quadrature used between iterations. The structure of the random effects ordered choice model is

$$y_{it}^* = \beta'x_{it} + u_i + \varepsilon_{it}, \qquad y_{it} = j \text{ if } \mu_{j-1} < y_{it}^* < \mu_j,$$
$$\varepsilon_{it} \sim f(\cdot) \text{ with mean zero and constant variance, 1 or } \pi^2/3 \text{ (probit or logit)},$$
$$u_i \sim g(\cdot) \text{ with mean zero and constant variance, } \sigma^2, \text{ independent of } \varepsilon_{it} \text{ for all } t.$$

If we maintain the ordered probit form and assume as well that $u_i$ is normally distributed, then, at least superficially, we can see the implications for the estimator of ignoring the heterogeneity. Using the usual approach,

$$\text{Prob}(y_{it} = j \mid x_{it}) = \text{Prob}(\beta'x_{it} + u_i + \varepsilon_{it} < \mu_j) - \text{Prob}(\beta'x_{it} + u_i + \varepsilon_{it} < \mu_{j-1})$$
$$= F\!\left(\frac{\mu_j}{\sqrt{1+\sigma^2}} - \frac{\beta'x_{it}}{\sqrt{1+\sigma^2}}\right) - F\!\left(\frac{\mu_{j-1}}{\sqrt{1+\sigma^2}} - \frac{\beta'x_{it}}{\sqrt{1+\sigma^2}}\right)$$
$$= F(\tau_j - \gamma'x_{it}) - F(\tau_{j-1} - \gamma'x_{it}).$$
Unconditionally, then, the result is an ordered probit in the scaled threshold values and scaled coefficients. Evidently, this is what is estimated if the data are pooled and the heterogeneity is ignored.11 Wooldridge and Imbens (2009a) argue that since the partial effects are $[f(\tau_{j-1} - \gamma'x_{it}) - f(\tau_j - \gamma'x_{it})]\gamma$, the scaled version of the parameter is actually the object of estimation in any event.
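A minimal sketch of the Butler and Moffitt quadrature idea for one individual's likelihood contribution in the random effects ordered probit (Gauss–Hermite nodes; all data and values are hypothetical):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import norm

def re_ordered_probit_L_i(beta, mu_full, sigma, X_i, y_i, H=20):
    """Individual likelihood: integrate prod_t [F(mu_y - b'x - u) - F(mu_{y-1} - b'x - u)]
    over u ~ N(0, sigma^2), approximated by Gauss-Hermite quadrature with H nodes."""
    nodes, weights = hermgauss(H)
    u = np.sqrt(2.0) * sigma * nodes   # change of variables for the normal integral
    xb = X_i @ beta
    L_i = 0.0
    for u_h, w_h in zip(u, weights):
        cells = norm.cdf(mu_full[y_i + 1] - xb - u_h) - norm.cdf(mu_full[y_i] - xb - u_h)
        L_i += (w_h / np.sqrt(np.pi)) * np.prod(cells)
    return L_i

beta = np.array([0.5, -0.1])                          # hypothetical values throughout
mu_full = np.array([-np.inf, 0.0, 1.0, np.inf])       # three outcomes: 0, 1, 2
X_i = np.array([[1.0, 2.0], [1.0, 2.5], [1.0, 1.8]])  # T_i = 3 periods
y_i = np.array([1, 2, 1])
print(re_ordered_probit_L_i(beta, mu_full, 0.5, X_i, y_i))
```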
5.3 Spatial Autocorrelation
The treatment of spatially correlated discrete data presents several major complications. LeSage (1999, 2004) presents some of the methodological issues. A variety of received applications for binary choice include the geographic pattern of state lotteries (Coughlin et al., 2004), Children's Health Insurance Programs (CHIPS) (Franzese and Hays, 2007) and HYV rice adoption (Holloway et al., 2002). The extension to ordered choice models has begun to emerge as well, with applications including ozone concentration and land development (Wang and Kockelman, 2008, 2009) and trip generation (Roorda et al., 2009). Castro et al. (2012) use the ordered choice model and a link to the Poisson regression to model crash severity with spatial effects.
6 TWO-PART AND SAMPLE SELECTION MODELS
Two-part models describe situations in which the ordered choice is part of a two-stage decision process. In a typical situation, an individual decides whether or not to participate in an activity and then, if so, decides how much. The first decision is a binary choice. The intensity outcome can be of several types – what interests us here is an ordered choice. In
the example below, an individual decides whether or not to be a smoker. The intensity outcome is how much they smoke. The sample selection model is one in which the participation 'decision' relates to whether the data on the outcome variable will be observed, rather than whether the activity is undertaken. This chapter will describe several types of two-part and sample selection models.
6.1 Inflation Models
Harris and Zhao (2007) analyzed a sample of 28 813 Australian individuals' responses to the question 'How often do you now smoke cigarettes, pipes or other tobacco products?' (Data are from the Australian National Drug Strategy Household Survey [NDSHS], 2001.) Responses were 'zero, low, moderate, high', coded 0,1,2,3. The sample frequencies of the four responses were 0.75, 0.04, 0.14 and 0.07. The spike at zero shows a considerable excess of zeros compared with what might be expected in an ordered choice model. The authors reason that there are numerous explanations for a zero response: 'genuine nonsmokers, recent quitters, infrequent smokers who are not currently smoking and potential smokers who might smoke when, say, the price falls'. It is also possible that the zero response includes some individuals who prefer to identify themselves as nonsmokers. The question is ambiguously worded, but arguably, the group of interest is the genuine nonsmokers. This suggests a type of latent class arrangement in the population. There are (arguably) two types of zeros, the one of interest and another type generated by the appearance of the respondent in the latent class of people who respond zero when another response would actually be appropriate. The end result is an inflation of the proportion of zero responses in the data. A 'zero inflation' model is proposed to accommodate this failure of the base case model. In a recent application, Greene et al. (2012) have extended an ordered probit model of self-assessed health (on a zero to four scale) to accommodate '2s and 3s inflation'. Some further details of this model are given at the end of section 4.3.
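As a concrete illustration of the mechanism, the sketch below assembles the cell probabilities of an uncorrelated zero-inflated ordered probit in the spirit of Harris and Zhao (2007): a probit 'participation' equation splits the zeros into genuine nonsmokers and participant zeros, and an ordered probit governs participants' consumption levels. The independence of the two disturbances and all variable names are illustrative assumptions, not the authors' exact specification.

```python
# Sketch: zero-inflated ordered probit (ZIOP) cell probabilities.
import numpy as np
from scipy.stats import norm

def ziop_probs(x, z, beta, gamma, cuts):
    """Return an (n, J+1) matrix of Prob(y = j), j = 0..J, where the number
    of ordered categories is len(cuts) + 1."""
    p_part = norm.cdf(z @ gamma)                  # Prob(participant)
    lo = np.concatenate(([-np.inf], cuts))        # lower thresholds
    hi = np.concatenate((cuts, [np.inf]))         # upper thresholds
    xb = x @ beta
    p_op = norm.cdf(hi - xb[:, None]) - norm.cdf(lo - xb[:, None])
    probs = p_part[:, None] * p_op                # participants' outcomes
    probs[:, 0] += 1.0 - p_part                   # inflate the zero cell
    return probs

# The zero cell mixes nonparticipants with participants who consume zero.
rng = np.random.default_rng(2)
x, z = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
P = ziop_probs(x, z, beta=np.array([0.8, -0.3]),
               gamma=np.array([0.5, 0.2]), cuts=np.array([0.0, 1.0, 2.0]))
print(P.round(3))
print("row sums:", P.sum(axis=1).round(6))        # each row sums to 1
```

The log-likelihood for estimation is the log of the cell probability of each observed outcome, summed over the sample; the correlated variant replaces the product of marginals with a bivariate normal probability.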
6.2 Sample Selection Models
The familiar sample selection model was extended to binary choice models by Wynand and van Praag (1981) and Boyes et al. (1989). A variety of extensions have also been developed for ordered choice models, both as sample selection (regime) equations and as models for outcomes subject, themselves, to sample selectivity. We consider these two cases and some related extensions. The models of sample selectivity in this area are built as extensions of Heckman's (1979) canonical model, in which a probit selection equation determines whether the linear outcome $y_i = \beta' x_i + \varepsilon_i$ is observed, and correlation between the two disturbances is the source of the problem. Estimation of the regression equation by least squares while ignoring the selection issue produces biased and inconsistent estimators of all the model parameters. Estimation of this model by two-step methods is documented in a voluminous literature, including Heckman (1979) and Greene (2008a). The two-step method involves first estimating the participation equation using an ordinary probit model, then computing an estimate of $\lambda_i$, $\hat{\lambda}_i = \phi(\hat{\beta}' x_i)/\Phi(\hat{\beta}' x_i)$, for each individual in the selected sample. At the second step, an estimate of $(\beta, \theta)$ is obtained by linear regression of $y_i$ on $x_i$ and $\hat{\lambda}_i$; a sketch of this construction is given at the end of this section. Necessary corrections to the estimated standard errors are described in Heckman (1979), Greene (1981, 2008b) and, in general terms, in Murphy and Topel (2002). Consider a model of educational attainment or performance in a training or
vocational education program (for example, low, medium, high), with selection into the program as an observation mechanism. (Boes, 2007, examines a related case, that of a treatment, D, that acts as an endogenous dummy variable in the ordered outcome model.) In an ordered choice setting, the 'second step' model is nonlinear. The received literature contains many applications in which authors have 'corrected for selectivity' by following the logic of the Heckman two-step estimator, that is, by constructing $\lambda_i = \phi(\alpha' w_i)/\Phi(\alpha' w_i)$ from an estimate of the probit selection equation and adding it to the outcome equation.12 However, this is only appropriate in the linear model with normally distributed disturbances. An explicit expression, which does not involve an inverse Mills ratio, for the case in which the unconditional regression is $E[y \mid x, \varepsilon] = \exp(\beta' x + \varepsilon)$ is given in Terza (1998). A template for nonlinear single index function models subject to selectivity is developed in Terza (1998) and Greene (2006, 2008a, s. 24.5.7). Applications specifically to the Poisson regression appear in several places, including Greene (1995, 2005). The general case typically involves estimation either using simulation or quadrature to eliminate an integral involving $u$ in the conditional density for $y$. Cases in which both variables are discrete, however, are somewhat simpler. A near parallel to the model above is the bivariate probit model with selection developed by Boyes et al. (1989), in which the outcome equation above would be replaced with a second probit model. (Wynand and van Praag (1981) proposed the bivariate probit/selection model, but used the two-step approach rather than maximum likelihood.) The log-likelihood function for the bivariate probit model is given in Boyes et al. (1989) and Greene (2008a, p. 896). A straightforward extension of the result provides the log-likelihood for the ordered probit case. Essentially this model is applied in Popuri and Bhat (2003) to a sample of individuals who chose to telecommute (z = 1) or not (z = 0) and then, for those who do telecommute, the number of days that they do. We note two aspects of this application that depart subtly from the sample selection application: (1) the application would more naturally fall into the category of a hurdle model composed of a participation equation and an activity equation given the decision to participate; in the latter, it is known that the activity level is positive.13 Thus, unlike the familiar choice case, the zero outcome is not possible here. (2) The application would fit more appropriately into the sample selection or hurdle model frameworks for count data such as the Poisson model.14 Bricka and Bhat (2006) report a similar application, to a sample of individuals who did (z = 1) or did not (z = 0) underreport the number of trips in a travel-based survey. The activity equation is the number of trips underreported for those who did. This study, like its predecessor, could be framed in a hurdle model for counts, rather than an ordered choice model.
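For reference, the following minimal sketch (simulated data; statsmodels used for the probit and least squares steps; all names are illustrative) reproduces the linear-model two-step construction described above. It is exactly the device that, as just noted, does not carry over to the nonlinear ordered probit second step.

```python
# Sketch: Heckman's (1979) two-step estimator for a linear outcome equation.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 5000
w = sm.add_constant(rng.normal(size=(n, 1)))      # selection covariates
x = sm.add_constant(rng.normal(size=(n, 1)))      # outcome covariates
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
s = (w @ np.array([0.5, 1.0]) + e[:, 0]) > 0      # selection indicator
y = x @ np.array([1.0, 2.0]) + e[:, 1]            # latent outcome
y_obs, x_obs = y[s], x[s]                         # observed only when s = 1

# Step 1: probit for selection on the full sample, then the inverse Mills
# ratio lambda_i = phi(a'w_i) / Phi(a'w_i) for the selected observations.
a_hat = sm.Probit(s.astype(float), w).fit(disp=0).params
idx = w[s] @ a_hat
lam = norm.pdf(idx) / norm.cdf(idx)

# Step 2: least squares of y on x and lambda-hat over the selected sample.
# The raw OLS standard errors are wrong; see Heckman (1979), Greene (1981,
# 2008b) and Murphy and Topel (2002) for the corrections cited in the text.
ols = sm.OLS(y_obs, np.column_stack((x_obs, lam))).fit()
print(ols.params.round(3))    # approx. (1.0, 2.0, rho*sigma = 0.6)
```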
7 CONCLUSIONS
The preceding has developed the standard model for ordered choices as typically analyzed in social science applications (for example, Johnson and Albert, 1999). (There is a parallel, but markedly different stream of literature in biometrics discussed in some detail in Greene and Hensher, 2010, and references noted.) Several model extensions, such as outcomes inflation, and specification issues such as modeling heterogeneity are noted as well. These are developed in greater detail in recent surveys such as Boes
and Winkelmann (2006a), Greene and Hensher (2010) and Daykin and Moffatt (2002). Ongoing developments, such as nonparametric and Bayesian approaches, are noted with some pointers to recent literature suggested in Greene and Hensher (2010).
NOTES

1. See, for example, Hensher et al. (1992).
2. Hensher et al. (1992).
3. See Bhat and Pulugurta (1998, p. 64).
4. See, again, Bhat and Pulugurta (1998), who suggest a different utility function for each observed level of vehicle ownership.
5. See Congden (2005) for a brief Bayesian treatment of an ordered logit model.
6. In this connection, see Train (2003) and Wooldridge and Imbens (2009b) for discussion of the Bernstein–von Mises result.
7. See, for example, Daykin and Moffatt (2002).
8. See Boes and Winkelmann (2006a) and Greene (2008a, ch. E22).
9. See Halton (1970) for the general principle, and Bhat (2001, 2003) and Train (2003) for applications in the estimation of mixed logit models, rather than random draws. Further details on this method of estimation are also given in Greene (2007, 2008a).
10. See Pratt (1981) and Burridge (1981).
11. See Wooldridge (2002). Note that a 'robust' covariance matrix estimator does not redeem the estimator. It is still inconsistent.
12. See, for example, Greene (1994). Several other examples are provided in Greene (2008b).
13. See Cragg (1971) and Mullahy (1986).
14. See, again, Mullahy (1986), Terza (1998) and Greene (1995, 2007).
REFERENCES

Abrevaya, J. (1997), 'The equivalence of two estimators of the fixed effects logit model', Economics Letters, 55 (1), 41–3.
Agostino, A., C. Bhat and E. Pas (1996), 'A random effects multinomial probit model of car ownership choice', Proceedings of the Third Workshop on Bayesian Statistics in Science and Technology, Cambridge: Cambridge University Press.
Agresti, A. (2002), Categorical Data Analysis, 2nd edn, New York: John Wiley and Sons.
Agyemang-Duah, K. and F. Hall (1997), 'Spatial transferability of an ordered response model of trip generation', Transportation Research Part A, 31 (5), 389–402.
Aitchison, J. and S. Silvey (1957), 'The generalization of probit analysis to the case of multiple responses', Biometrika, 44 (1), 131–40.
Ando, T. (2006), 'Bayesian credit rating analysis based on ordered probit regression model with functional predictor', Proceedings of the Third IASTED International Conference on Financial Engineering and Applications, Tokyo, pp. 69–76.
Bester, C.A. and C. Hansen (2009), 'A penalty function approach to bias reduction in non-linear panel models with fixed effects', Journal of Business and Economic Statistics, 27 (2), 131–48.
Bhat, C. (1996), 'A generalized multiple durations proportional hazard model with an application to activity behavior during the work-to-home commute', Transportation Research Part B, 30 (6), 465–80.
Bhat, C. (1997), 'Work travel mode choice and number of nonwork commute stops', Transportation Research Part B, 31 (1), 41–54.
Bhat, C. (1999), 'An analysis of evening commute stop-making behavior using repeated choice observations from a multi-day survey', Transportation Research Part B, 33 (7), 495–510.
Bhat, C. (2001), 'Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model', Transportation Research Part B, 35 (7), 677–93.
Bhat, C. (2003), 'Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences', Transportation Research Part B, 37 (9), 837–55.
Bhat, C. and F. Koppelman (1993), 'An endogenous switching simultaneous equation system of employment, income and car ownership', Transportation Research Part A, 27 (6), 447–59.
Bhat, C. and V. Pulugurta (1998), 'A comparison of two alternative behavioral mechanisms for car ownership decisions', Transportation Research Part B, 32 (1), 61–75.
Bhat, C. and H. Zhao (2002), 'The spatial analysis of activity stop generation', Transportation Research Part B, 36 (6), 557–75.
Bhat, C., J. Carini and R. Misra (1999), 'Modeling the generation and organization of household activity stops', Transportation Research Record, 1676, 153–61.
Biswas, B. and H. Das (2002), 'A Bayesian analysis of bivariate ordinal data: Wisconsin epidemiologic study of diabetic retinopathy revisited', Statistics in Medicine, 21 (4), 549–59.
Boes, S. (2007), 'Nonparametric analysis of treatment effects in ordered response models', Socioeconomic Institute, Working Paper 0709, University of Zurich.
Boes, S. and R. Winkelmann (2006a), 'Ordered response models', Allgemeines Statistisches Archiv, 90 (1), 165–80.
Boes, S. and R. Winkelmann (2006b), 'The effect of income on positive and negative subjective well-being', manuscript, University of Zurich, Socioeconomic Institute, IZA Discussion Paper Number 1175.
Boyes, W., D. Hoffman and S. Low (1989), 'An econometric analysis of the bank credit scoring problem', Journal of Econometrics, 40 (1), 3–14.
Brewer, C., C. Kovner, W. Greene and Y. Cheng (2009), 'Predictors of RNs' intent to work and work decisions one year later in a U.S. national sample', The International Journal of Nursing, 46 (7), 940–56.
Bricka, S. and C. Bhat (2006), 'A comparative analysis of GPS-based and travel survey-based data', Transportation Research Record, 1972, 9–20.
Bunch, D. and R. Kitamura (1990), 'Multinomial probit estimation revisited: testing estimable model specifications, maximum likelihood algorithms and probit integral approximations for car ownership', Institute for Transportation Studies Technical Report, University of California, Davis.
Burridge, J. (1981), 'A note on maximum likelihood estimation of regression models using grouped data', Journal of the Royal Statistical Society, Series B, 43 (1), 41–5.
Butler, J., T. Finegan and J. Siegfried (1994), 'Does more calculus improve student learning in intermediate micro and macro economic theory?', American Economic Review, 84 (2), 206–10.
Butler, J. and P. Chatterjee (1997), 'Tests of the specification of univariate and bivariate ordered probit', Review of Economics and Statistics, 79 (2), 343–7.
Butler, J. and R. Moffitt (1982), 'A computationally efficient quadrature procedure for the one factor multinomial probit model', Econometrica, 50 (5), 761–64.
Cameron, S. and J. Heckman (1998), 'Life cycle schooling and dynamic selection bias: models and evidence for five cohorts of American males', Journal of Political Economy, 106 (2), 262–333.
Cameron, A. and P. Trivedi (1998), Regression Analysis of Count Data, New York: Cambridge University Press.
Cameron, A. and P. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press.
Carneiro, P., K. Hansen and J. Heckman (2001), 'Removing the veil of ignorance in assessing the distributional impacts of social policies', Swedish Economic Policy Review, 8 (3), 273–301.
Carneiro, P., K. Hansen and J. Heckman (2003), 'Estimating distributions of treatment effects with an application to schooling and measurement of the effects of uncertainty on college choice', International Economic Review, 44 (2), 361–422.
Carro, J. (2007), 'Estimating dynamic panel data discrete choice models with fixed effects', Journal of Econometrics, 140 (3), 503–28.
Castro, M., R. Paleti and C. Bhat (2012), 'A latent variable representation of count frequency models to accommodate spatial and temporal dependence: application to predicting crash frequency at intersections', Transportation Research Part B, 46 (1), 253–72.
Chamberlain, G. (1980), 'Analysis of covariance with qualitative data', Review of Economic Studies, 47 (1), 225–38.
Cheung, S. (1996), 'Provincial credit rating in Canada: an ordered probit analysis', Bank of Canada, Working Paper 96-6, accessed 17 January 2014 at www.bankofcanada.ca/en/res/wp/1996/wp96-6.pdf.
Clark, A., Y. Georgellis and P. Sanfey (2001), 'Scarring: the psychological impact of past unemployment', Economica, 68 (1), 221–41.
Congden, P. (2005), Bayesian Models for Categorical Data, New York: John Wiley and Sons.
Coughlin, C., T. Garrett and R. Hernandez-Murillo (2004), 'Spatial probit and the geographic patterns of state lotteries', Working Paper 2003-042b, Federal Reserve Bank of St Louis.
Cragg, J. (1971), 'Some statistical models for limited dependent variables with application to the demand for durable goods', Econometrica, 39 (5), 829–44.
Cunha, F., J. Heckman and S. Navarro (2007), 'The identification and economic content of ordered choice models with stochastic thresholds', Discussion Paper WP/26/2007, University College Dublin, Geary Institute.
Czado, C., H. Heyn and G. Muller (2005), 'Modeling migraine severity with autoregressive ordered probit models', Working Paper 463, Technische Universität München.
Daykin, A. and P. Moffatt (2002), 'Analyzing ordered responses: a review of the ordered probit model', Understanding Statistics, 1 (3), 157–66.
Econometric Software (2007), NLOGIT: Version 4.0, Plainview, New York.
Eichengreen, B., M. Watson and R. Grossman (1985), 'Bank rate policy under the interwar Gold Standard: a dynamic probit approach', Economic Journal, 95 (September), 725–45.
Eluru, N., C. Bhat and D. Hensher (2008), 'A mixed generalized ordered response model for examining pedestrian and bicyclist injury severity levels in traffic crashes', Accident Analysis and Prevention, 40 (3), 1033–54.
Everitt, B.S. (1988), 'A finite mixture model for the clustering of mixed-mode data', Statistics and Probability Letters, 6 (1), 305–9.
Fahrmeir, L. and G. Tutz (2001), Multivariate Statistical Modeling Based on Generalized Linear Models, Berlin: Springer Verlag.
Fernández-Val, I. (2009), 'Fixed effects estimation of structural parameters and marginal effects in panel probit models', Journal of Econometrics, 150 (1), 71–85.
Fernández-Val, I. and F. Vella (2007), 'Bias corrections for two-step fixed effects panel data estimators', IZA Working Paper Number 2690.
Ferrer-i-Carbonell, A. and P. Frijters (2004), 'How important is methodology for the estimates of the determinants of happiness?', Economic Journal, 114 (497), 641–59.
Franzese, R. and J. Hays (2007), 'The spatial probit model of interdependent binary outcomes: estimation, interpretation and presentation', accessed 7 April 2009 at polmeth.wustl.edu/retrieve.php?id=715.
Fu, A., M. Gordon, G. Liu, B. Dale and R. Christensen (2004), 'Inappropriate medication use and health outcomes in the elderly', Journal of the American Geriatrics Society, 52 (11), 1934–9.
Girard, P. and E. Parent (2001), 'Bayesian analysis of autocorrelated ordered categorical data for industrial quality monitoring', Technometrics, 43 (2), 180–91.
Golob, T. (1990), 'The dynamics of household travel time expenditures and car ownership decisions', Transportation Research Part A, 24 (3), 443–65.
Golob, T. and L. van Wissen (1998), 'A joint household travel distance generation and car ownership model', Working Paper WP-88-15, Institute of Transportation Studies, University of California, Irvine.
Greene, W. (1981), 'Sample selection bias as a specification error: comment', Econometrica, 49 (6), 795–8.
Greene, W. (1994), 'Accounting for excess zeros and sample selection in Poisson and negative binomial regression models', Working Paper 94-10, Department of Economics, Stern School of Business, New York University.
Greene, W. (1995), 'Sample selection in the Poisson regression model', Working Paper 95-06, Department of Economics, Stern School of Business, New York University.
Greene, W. (2002), LIMDEP Version 8.0, Reference Guide, Plainview, NY: Econometric Software.
Greene, W. (2004a), 'Fixed effects and bias due to the incidental parameters problem in the Tobit model', Econometric Reviews, 23 (2), 125–47.
Greene, W. (2004b), 'The behavior of the fixed effects estimator in nonlinear models', The Econometrics Journal, 7 (1), 98–119.
Greene, W. (2005), 'Functional form and heterogeneity in models for count data', Foundations and Trends in Econometrics, 1 (2), 113–218.
Greene, W. (2006), 'A general approach to incorporating selectivity in a model', Working Paper 06-10, Department of Economics, Stern School of Business, New York University.
Greene, W. (2007), LIMDEP Version 9.0: Reference Guide, Plainview, NY: Econometric Software.
Greene, W. (2008a), Econometric Analysis, 6th edn, Englewood Cliffs, NJ: Prentice Hall.
Greene, W. (2008b), 'A stochastic frontier model with correction for selection', Working Paper EC-08-09, Department of Economics, Stern School of Business, New York University.
Greene, W. and D. Hensher (2010), 'Ordered choices and heterogeneity in attribute processing', Journal of Transport Economics and Policy, 44 (3), 331–64.
Greene, W., M. Harris and B. Hollingsworth (2013), 'Inflated responses in self assessed health', manuscript, Curtin Business School, Curtin University, Perth.
Greene, W., M. Harris, B. Hollingsworth and P. Maitra (2008), 'A bivariate latent class correlated generalized ordered probit model with an application to modeling observed obesity levels', Working Paper 08-18, Department of Economics, Stern School of Business, New York University.
Greene, W., M. Harris, B. Hollingsworth and T. Weterings (2012), 'Heterogeneity in ordered choice models: a review with applications to self-assessed health', Journal of Economic Surveys, forthcoming.
Groot, W. and H. van den Brink (1999), 'Job satisfaction with preference drift', Economics Letters, 63 (3), 363–7.
Groot, W. and H. van den Brink (2001), 'Match specific gains to marriage: a random effects ordered response model', Quality and Quantity, 37 (2), 317–25.
Groot, W. and H. van den Brink (2002), 'Sympathy and the value of health', Social Indicators Research, 61 (1), 97–120.
Hahn, J. and G. Kuersteiner (2003), 'Bias reduction for dynamic nonlinear panel data models with fixed effects', manuscript, Department of Economics, University of California Los Angeles.
Hahn, J. and W. Newey (2004), 'Jackknife and analytical bias reduction for nonlinear panel models', Econometrica, 72 (4), 1295–319.
Halton, J.H. (1970), 'A retrospective and prospective survey of the Monte Carlo method', SIAM Review, 12 (1), 1–63.
Han, A. and J.A. Hausman (1990), 'Flexible parametric estimation of duration and competing risk models', Journal of Applied Econometrics, 5 (1), 1–28.
Harris, M. and X. Zhao (2007), 'Modeling tobacco consumption with a zero inflated ordered probit model', Journal of Econometrics, 141 (3), 1073–99.
Heckman, J. (1979), 'Sample selection bias as a specification error', Econometrica, 47 (1), 153–61.
Heckman, J. and T. MaCurdy (1981), 'New methods for estimating labor supply functions', in R. Ehrenberg (ed.), Research in Labor Economics, Greenwich, CT: JAI Press, pp. 65–102.
Hensher, D. and S. Jones (2007), 'Predicting corporate failure: optimizing the performance of the mixed logit model', ABACUS, 43 (3), 241–64.
Hensher, D., N. Smith, N. Milthorpe and P. Barnard (1992), 'Dimensions of automobile demand: a longitudinal study of household automobile ownership and use', in D. Hensher (ed.), Studies in Regional Science and Urban Economics, Amsterdam: Elsevier Science.
Holloway, G., B. Shankar and S. Rahman (2002), 'Bayesian spatial probit estimation: a primer and an application to HYV rice adoption', Agricultural Economics, 27 (3), 383–402.
Hsiao, C. (1986), Analysis of Panel Data, Cambridge: Cambridge University Press.
Hsiao, C. (2003), Analysis of Panel Data, 2nd edn, Cambridge: Cambridge University Press.
Jansen, J. (1990), 'On the statistical analysis of ordinal data when extravariation is present', Applied Statistics, 39 (1), 75–84.
Johnson, V. and J. Albert (1999), Ordinal Data Modeling, New York: Springer-Verlag.
Jones, S. and D. Hensher (2004), 'Predicting firm financial distress: a mixed logit model', The Accounting Review (American Accounting Association), 79 (4), 1011–38.
Kadam, A. and P. Lenk (2008), 'Bayesian inference for issuer heterogeneity in credit ratings migration', Journal of Banking and Finance, accessed 17 January 2014 at ssrn.com/abstract=1084006.
Kapteyn, A., J. Smith and A. van Soest (2007), 'Vignettes and self-reports of work disability in the United States and the Netherlands', American Economic Review, 97 (1), 461–73.
Kasteridis, P., M. Munkin and S. Yen (2010), 'A binary-ordered probit model of cigarette demand', Applied Economics, 42 (4), 413–26.
Kerkhofs, M. and M. Lindeboom (1995), 'Subjective health measures and state dependent reporting errors', Health Economics, 4 (3), 221–35.
King, G., C. Murray, A. Salomon and A. Tandon (2004), 'Enhancing the validity and cross cultural comparability of measurement in survey research', American Political Science Review, 98 (1), 567–83.
Kitamura, R. (1987), 'A panel analysis of household car ownership and mobility, infrastructure planning and management', Proceedings of the Japan Society of Civil Engineers, 383/IV-7, Tokyo, pp. 13–27.
Kitamura, R. (1988), 'A dynamic model system of household car ownership, trip generation and modal split, model development and simulation experiments', Proceedings of the 14th Australian Road Research Board Conference, Part 3, Vermont South, Victoria: Australian Road Research Board, pp. 96–111.
Kitamura, R. and D. Bunch (1989), 'Heterogeneity and state dependence in household car ownership: a panel analysis using ordered-response probit models with error components', Research Report UCD-TRG-RR89-6, Transportation Research Group, University of California at Davis.
Koop, G. and J. Tobias (2006), 'Semiparametric Bayesian inference in smooth coefficient models', Journal of Econometrics, 134 (1), 283–315.
Lancaster, T. (2000), 'The incidental parameters problem since 1948', Journal of Econometrics, 95 (2), 391–413.
LeSage, J. (1999), 'Spatial econometrics', accessed 27 January 2014 at www.rri.wvu.edu/WebBook/LeSage/spatial/spatial.htm.
LeSage, J. (2004), 'Lecture 5: spatial probit models', accessed 17 January 2014 at www4.fe.uc.pt/spatial/doc/lecture5.pdf.
Li, M. and J. Tobias (2006), 'Calculus attainment and grades received in intermediate economic theory', Journal of Applied Econometrics, 21 (6), 893–6.
Lindeboom, M. and E. van Doorslaer (2003), 'Cut point shift and index shift in self reported health', Ecuity III Project Working Paper No. 2, Tinbergen Institute, Free University of Amsterdam.
Long, S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, CA: Sage.
Machin, S. and A. Vignoles (2005), What's the Good of Education? The Economics of Education in the UK, Princeton, NJ: Princeton University Press.
Maddala, G. (1983), Limited Dependent and Qualitative Variables in Econometrics, Cambridge: Cambridge University Press.
Mannering, F. and C. Winston (1985), 'A dynamic analysis of household vehicle ownership and utilization', Rand Journal of Economics, 16 (2), 215–36.
Marcus, A. and W. Greene (1983), 'The determinants of rating assignment and performance', Working Paper CRC528, Center for Naval Analyses, Alexandria, VA.
McCullagh, P. (1980), 'Regression models for ordinal data', Journal of the Royal Statistical Society, Series B (Methodological), 42 (1), 109–42.
McKelvey, R. and W. Zavoina (1971), 'An IBM Fortran IV program to perform N-chotomous multivariate probit analysis', Behavioral Science, 16 (2), 186–7.
McKelvey, R. and W. Zavoina (1975), 'A statistical model for the analysis of ordered level dependent variables', Journal of Mathematical Sociology, 4 (1), 103–20.
Metz, A. and R. Cantor (2006), 'Moody's credit rating prediction model', Moody's, Inc., accessed 27 January 2014 at www.moodys.com/sites/products/DefaultResearch/2006200000425644.pdf.
Mora, N. (2006), 'Sovereign credit ratings: guilty beyond reasonable doubt?', Journal of Banking and Finance, 30 (7), 2041–62.
Mullahy, J. (1986), 'Specification and testing of some modified count data models', Journal of Econometrics, 33 (3), 341–65.
Munkin, M. and P. Trivedi (2008), 'Bayesian analysis of the ordered probit model with endogenous selection', Journal of Econometrics, 143 (2), 334–48.
Murphy, K. and R. Topel (2002), 'Estimation and inference in two step econometric models', Journal of Business and Economic Statistics, 20 (1), 88–97.
National Drug Strategy Household Surveys (NDSHS) (2001), computer files for the unit record data from the National Drug Strategy Household Surveys.
Popuri, Y. and C. Bhat (2003), 'On modeling choice and frequency of home based telecommuting', Transportation Research Record, 1858, 55–60.
Pratt, J. (1981), 'Concavity of the log likelihood', Journal of the American Statistical Association, 76 (1), 103–16.
Prescott, E. and M. Visscher (1977), 'Sequential location among firms with foresight', Bell Journal of Economics, 8 (2), 378–93.
Pudney, S. and M. Shields (2000), 'Gender, race, pay and promotion in the British nursing profession: estimation of a generalized ordered probit model', Journal of Applied Econometrics, 15 (4), 367–99.
Purvis, L. (1994), 'Using census public use micro data sample to estimate demographic and automobile ownership models', Transportation Research Record, 1443, 21–30.
Rabe-Hesketh, S., A. Skrondal and A. Pickles (2005), 'Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects', Journal of Econometrics, 128 (2), 301–23.
Rasch, G. (1960), 'Probabilistic models for some intelligence and attainment tests', monograph, Danish Institute for Educational Research, Copenhagen.
Ridder, G. (1990), 'The non-parametric identification of generalized accelerated failure-time models', Review of Economic Studies, 57 (1), 167–81.
Riphahn, R., A. Wambach and A. Million (2003), 'Incentive effects on the demand for health care: a bivariate panel count data estimation', Journal of Applied Econometrics, 18 (4), 387–405.
Roorda, M., A. Páez, C. Morency, R. Mercado and S. Farber (2009), 'Trip generation of vulnerable populations in three Canadian cities: a spatial ordered probit approach', manuscript, School of Geography and Earth Sciences, McMaster University.
Shaked, A. and J. Sutton (1982), 'Relaxing price competition through product differentiation', Review of Economic Studies, 49 (1), 3–13.
Snell, E. (1964), 'A scaling procedure for ordered categorical data', Biometrics, 20 (3), 592–607.
Stata (2008), Stata, Version 8.0, College Station, TX: Stata Corp.
Terza, J. (1985), 'Ordered probit: a generalization', Communications in Statistics – Theory and Methods, 14 (1), 1–11.
Terza, J. (1998), 'Estimating count data models with endogenous switching: sample selection and endogenous treatment effects', Journal of Econometrics, 84 (1), 129–54.
Tomoyuki, F. and F. Akira (2006), 'A quantitative analysis on tourists' consumer satisfaction via the Bayesian ordered probit model', Journal of the City Planning Institute of Japan, 41 (1), 2–10 (in Japanese).
Train, K. (1986), Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand, Cambridge, MA: MIT Press.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Tsay, R. (2005), Analysis of Financial Time Series, 2nd edn, New York: John Wiley and Sons.
Uebersax, J.S. (1999), 'Probit latent class analysis with dichotomous or ordered category measures: conditional independence/dependence models', Applied Psychological Measurement, 23 (4), 283–97.
Walker, S. and D. Duncan (1967), 'Estimation of the probability of an event as a function of several independent variables', Biometrika, 54 (1), 167–79.
Wang, X. and K. Kockelman (2005), 'Occupant injury severity using a heteroscedastic ordered logit model: distinguishing the effects of vehicle weight and type', Transportation Research Record, 1908, 195–204.
Wang, X. and K. Kockelman (2008), 'Application of the dynamic spatial ordered probit model: patterns of land development change in Austin, Texas', manuscript, Department of Civil Engineering, University of Texas, Austin (forthcoming in Papers in Regional Science).
Wang, X. and K. Kockelman (2009), 'Application of the dynamic spatial ordered probit model: patterns of ozone concentration in Austin, Texas', manuscript, Department of Civil Engineering, University of Texas, Austin.
Williams, R. (2006), 'Generalized ordered logit/partial proportional odds models for ordinal dependent variables', The Stata Journal, 6 (1), 58–82.
Winkelmann, R. (2005), 'Subjective well-being and the family: results from an ordered probit model with multiple random effects', Empirical Economics, 30 (3), 749–61.
Wooldridge, J. (2002), Econometric Analysis of Cross Section and Panel Data, Cambridge, MA: MIT Press.
Wooldridge, J. and G. Imbens (2009a), 'Lecture notes 4, summer 2007', accessed 27 January 2014 at www.nber.org/WNE/lect_4_nlpanel.pdf.
Wooldridge, J. and G. Imbens (2009b), 'Lecture notes 7, summer 2007', accessed 27 January 2014 at www.nber.org/WNE/lect_7_Bayes_Fig.pdf.
Wynand, P. and B. van Praag (1981), 'The demand for deductibles in private health insurance: a probit model with sample selection', Journal of Econometrics, 17 (2), 229–52.
Zavoina, W. and R. McKelvey (1969), 'A statistical model for the analysis of legislative voting behavior', presented at the meeting of the American Political Science Association, New York, September.
Zhang, J. (2007), 'Ordered probit modeling of user perceptions of protected left-turn signals', Journal of Transportation Engineering, 133 (3), 205–14.
Zhang, Y., F. Liang and Y. Yuanchang (2007), 'Crash injury severity analysis using a Bayesian ordered probit model', Transportation Research Board, Annual Meeting, Paper Number 07-2335.
Zigante, V. (2007), 'Ever rising expectations – the determinants of subjective welfare in Croatia', master's thesis, School of Economics and Management, Lund University, accessed 17 January 2014 at www.essays.se/about/Ordered+Probit+Model/.
Discrete Choice Decision-Making with Multiple Decision Makers within the Household
André de Palma, Nathalie Picard, Ignacio Inoa
To cite this version: André de Palma, Nathalie Picard, Ignacio Inoa, 'Discrete Choice Decision-Making with Multiple Decision Makers within the Household', cahier de recherche 2014-10, 2014.
HAL Id: hal-00969216 https://hal.archives-ouvertes.fr/hal-00969216 Submitted on 2 Apr 2014
Discrete Choice Decision-Making with Multiple Decision Makers within the Household

André de Palma¹,³, Nathalie Picard²,³, Ignacio Inoa²,*

1 École Normale Supérieure de Cachan, Centre d'économie de la Sorbonne
2 Université de Cergy-Pontoise, THEMA
3 École Polytechnique, Département d'Économie§

Abstract
There is still a long way to go to achieve the goal of providing a theoretical and empirical framework to model and apply the economics of the family. Decision-making within the family has been neglected for too long in transportation. Two special issues, by Bhat and Pendyala, 2005 [17] and by Timmermans and Zhang, 2009 [81], provide the most notable exceptions. The objective of this paper is to set up a flexible framework to discuss the development of integrated transportation models involving interacting and interdependent actors, updating previous reviews from the point of view of the economics of the family. Transportation is very keen to have access to this type of model, since their applications are numerous. Let us mention, for example, residential location choice, workplace choice, car ownership, choice of children's school, mode choice, departure time choice, activity patterns and the like. The (non-unitary) economics of the family models are totally different models, which do not merely extend existing discrete choice models. They introduce new concepts, which are specific to within-family interactions: negotiation, altruism, or repeated interaction and Pareto optimality. This review is completed with the study of different types of accessibility measures, including recent work on time-geography measures of accessibility.
* We would like to thank Stéphane Hess and one anonymous referee for their useful comments.
61 Av. du Président Wilson, 94230 Cachan Cedex, France; [email protected].
133 Bd. du Port, 95011 Cergy-Pontoise Cedex, France; [email protected]; [email protected].
§ Route de Saclay, 91128 Palaiseau, France.
1. Introduction
Different research streams concerned with household decision-making have developed independently in different disciplines. The corresponding papers consider topics such as labour supply, transportation decisions, time and task allocation, or residential and employment choices. Literature in these fields has been dominated by models in which the household is treated as a single decision-making unit, or 'unitary' models (see Timmermans, 2009 [79] for an extensive review of past research in the transportation literature, Vermeulen, 2002a [84], 2002b [85] for a literature review on unitary and collective household models, and Bianchi and Robinson, 1997 [18] for a sociological study of time use within the household). In collective models, the different household members are engaged in a joint decision process involving bargaining. Until recently, interactions within the household were not explicitly modelled, and the decision-making process outcome was considered as resulting from a representative individual (as if the household were a black box which need not be opened). In such models, household interactions were either introduced through explanatory variables defined at the household level, or simply disregarded in models of activity-travel demand (see Srinivasan and Bhat, 2005 [75]). Examples of household-level explanatory variables, such as the number of household members, of active members, of children, household income, and other household dummy variables (e.g. occupational status, property status, age), are provided in Townsend, 1987 [82] or in Golob and McNally, 1997 [40], among others. However, many household decision-making processes involve more than one decision maker, who cannot be reduced to a single agent. Note that even when the decision problem under scrutiny does not involve multiple decision makers, most individual decisions depend, directly or indirectly, on the choices made by other household members (Timmermans and Zhang, 2009 [81]). A growing body of research in different fields, ranging from transportation demand to labour economics, has recently started to explicitly take into account the interactions between household members, and to model the corresponding strategies. The distinction between discrete and continuous household decisions is at the core of these new theoretical and empirical developments. Van Soest, 1995 [83] proposed a discrete choice model of labour supply disregarding the negotiation process within the household, whereas collective models of time allocation and labour supply focus on this negotiation process in discrete
(Vermeulen, 2006 [86]) or in continuous-discrete applications (Blundell et al., 2007 [22]). The modelling of within-family interactions took off in the transportation literature with the special issues on modelling intra-household interactions edited by Bhat and Pendyala, 2005 [17] and by Timmermans and Zhang, 2009 [81]. In a nutshell, Bhat and Pendyala, 2005 [17] focus on contributions based on utility-maximizing models, whereas Timmermans and Zhang, 2009 [81] present work that adopts diverse methodologies such as group decision theory and micro-simulation approaches. There are a few contributions using experimental economics methodology to compare the decisions taken by the husband alone, the wife alone, and then by the spouses together (see Bateman and Munro, 2005 [5], Beharry-Borg, Hensher and Scarpa, 2009 [13] or de Palma, Picard and Ziegelmeyer, 2011 [65]). However, they do not analyse the decision mechanism within the couple. A careful review of the literature reveals two main streams of research using the discrete choice framework to model household decision-making processes in multi-person households (several decision makers), with an explicit modelling of within-family interactions. In these studies, household decision-making models are developed in a discrete choice framework or with discrete choice (econometric/empirical) applications. The first research stream is related to collective models and their discrete labour supply model applications. The second research stream covers the transportation, activity-demand and location literature. These streams will be described in the following sections. Section 2 explains the difference between individual and family models, and provides a transition from unitary to collective and negotiation models, with a special focus on labour supply models. The transportation and activity-travel demand literature with intra-household interactions is presented in Section 3. Section 4 describes the location and accessibility contributions considering multiple decision makers within the household, while Section 5 concludes.
2. Individual versus Family Models
There are two cases when it becomes relevant (and often necessary) to switch from individual decision-making models to family decision-making models. The first case is when the choices themselves are relevant to the family (marriage, divorce, number of children). The second case 3
is when individual choices generate externalities for the other members of the household (competition among family members for using a joint car, joint leisure, chores sharing, spatial mobility, retirement plans…). Current research on family economics aims at providing the best representation of the household decision-making process when the household is made up of several decision-makers. In the simplest (unitary) models, the household is considered as the unique decision unit, whereas in the most elaborate models, each household member is characterized by specific preferences, and the household decisions result from the confrontation (or aggregation) of these individual preferences.
2.1. Individual Choices about Consumption and Work

Traditionally, family economics describes household behaviour, focusing on choices concerning consumption and work. In order to examine these questions, the household is described as a small production unit combining domestic time with intermediary goods bought on the market in order to produce commodities, which are in turn consumed by household members. These developments have allowed economists to answer old but important questions related to major socio-demographic changes over the twentieth century, such as the change in domestic working hours (Gronau, 1977 [42]) and female labour supply, the growing divorce rate (Becker, Landes, and Michael, 1977 [11]), or the diminishing fertility rate (Becker and Lewis, 1973 [12]). The description of household behaviour may rely on contributions from various disciplines such as sociology, demography or ethnology (see Picard, 1999 [66]). The topics under study go far beyond consumption or work choices. However, the way of representing household decisions in family economics until the late 1980s is not very far from that of "traditional" models of
consumption and labour supply, since these models generally neglect the multiplicity of decision-makers. The methodology traditionally used in family economics is quite straightforward. A (unique) household utility function describes the household preferences, taking as arguments the quantity of goods consumed (including local amenities when location choices are at stake) and/or leisure time. This function is maximized under a unique budget constraint aggregating the resources and expenses of all household members (and possibly a unique aggregated time 4
constraint). The maximization of household utility determines the relevant demand functions. This procedure can be used for assessing the effects of economic policies on individual behaviour and welfare. For example, Hausman, 1981 [46] estimates the effect of a variation in marginal taxation rates on hours worked, and measures the cost of the US taxation system in terms of welfare.
2.2. Family Economics in Unitary Models

In the 'unitary model', the household is considered as a unique decision-maker, and no
attention is devoted to the complexity of the decision-making process and of the numerous transactions between family members. As pointed out by Nobel Prize laureate Amartya Sen, 1983 [72], in the unitary models literature the household members are incorporated into a "glued together family". Following the impulsion originating from the seminal work of Nobel Prize laureate Gary Becker (1965 [6]; 1973 [7]; 1974 [9]; 1991 [10]), recent developments in family economics broaden the classical research field to new questions such as the marriage decision or choices related to the number of children, their education, and the allocation of tasks and time among household members. In unitary models, any difference in the demographic structure (e.g. differences between a single person and a couple with or without children) is either totally ignored, or simply reflected in ad hoc equivalence scales. In addition, any conflicting situations that might arise among members (affecting the decision-making process) are disregarded. The unitary model indeed neglects the diverging interests that may arise among household members, and implicitly assumes that these members pursue consensual objectives, leading to a poor understanding (and thus modelling) of decision mechanisms (and therefore of resource allocation) within the household. This generates three kinds of drawbacks. The first drawback is a wrong interpretation of empirical results. For example, Lise and Seitz, 2011 [56] show that failing to consider changes in the intrafamilial distribution of consumption leads to a major overestimation of inequality growth over the last 40 years in the UK. This bias should be taken into account in any poverty-reducing policy, and its recognition would lead to a fundamental rethinking of the change in the intrafamilial distribution of consumption. The second drawback is a wrong assessment of the effects of economic policies
on the well-being of each household member, concerning for example a change in the income taxation system. Lundberg, Pollak, and Wales, 1997 [58] illustrate the importance of these questions for family policy, and show that a shift of family benefits from the father to the mother during the 1970s in the UK was followed by a rise in the demand for women's and children's clothes. This indisputable empirical result is inconsistent with the "unitary model" of the household, in which every member has the same objective function. The third drawback is a poor predictive power of unitary models and a biased evaluation of behavioural effects of economic policies. Unitary models implicitly or explicitly assume that the household is the basic decision unit, and acts as a unique decision-maker. In multi-person households, this implies that members are supposed to act as if the household preferences could be represented by a unique utility function, maximized under a unique budget (and time) constraint. The consensus model proposed by Samuelson, 1956 [70] could provide some justification of this unitary description, under very restrictive and unrealistic assumptions. Another attempt to legitimate the unitary approach was made by Becker, 1974 [8], with his famous "rotten kid" theorem. It basically states that, if there is a benevolent dictator in the family, then all family members, even if
they are selfish, will act to maximize the same utility function as the benevolent dictator. The key assumption is that the benevolent dictator transfers money to each family member. All members then want to please the benevolent dictator in order to receive a larger transfer. However, Bergstrom, 1989 [16] later stressed that this attempt to justify unitary models relies on overly arbitrary and unrealistic assumptions (especially the assumption about transfers). Moreover, theoretical predictions derived from unitary models, such as income pooling, are often in contradiction with empirical data. Income pooling means that household decisions depend on household total resources, but not on the distribution of these resources among family members. This prediction comes from the assumption of a unique budget constraint defined at the household level, and a unique household utility function to be maximized. Income pooling has been rejected, for example, by Thomas, 1990 [78], who showed that the relative contributions of men and women to the household income influence household decisions.
2.3. Family Economics in Collective and Other Bargaining Models

Collective models and other within-household bargaining models aim at answering the theoretical and empirical criticisms addressed to unitary models of family decision-making. They developed in two major directions. Strategic models rely directly on the theory of non-cooperative games (see, e.g. Ashworth and Ulph, 1981 [4], Leuthold, 1968 [55]), while collective models proposed by Chiappori, 1988 [25], 1992 [26] rely on the basic assumption that the household decision process leads to Pareto-efficient allocations. The bargaining process may then be either explicit (as in McElroy and Horney, 1981 [59], or in Lundberg and Pollak, 1993 [57]), or non-specified (as in Chiappori, 1988 [25], 1992 [26]). In the latter case, the bargaining process is very general and not restrictive, only assuming Pareto-optimality. The Pareto-optimality hypothesis seems natural for analysing household decisions, since family members, who interact over a long period, are probably able to find mechanisms leading to efficient decisions. Collective models can be used to study the welfare level of each household member, and therefore to analyse and measure in a consistent way the redistributive effects of any economic policy, not only at the household level but also at the individual level. Indeed, recent theoretical results show that, under some rather plausible conditions, individual utility functions can be recovered from household behaviour (and disentangled from bargaining power effects, whereas bargaining effects induce a bias in the measurement of preference parameters in unitary models). This evaluation of economic policies offers promising research avenues, especially in the context of urban development or transportation policies.
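A tiny numerical sketch (with invented utilities, purely for illustration) of the collective models' defining property: if the household maximizes a weighted sum of its members' utilities over a feasible set, with the Pareto weights reflecting bargaining power, then the chosen allocation is Pareto-efficient whatever the weights.

```python
# Sketch: weighted-utility household choice over a discrete feasible set is
# Pareto-efficient, for any bargaining weight mu in (0, 1).
import numpy as np

rng = np.random.default_rng(4)
U = rng.uniform(size=(10, 2))        # ten allocations: (U_husband, U_wife)

def household_choice(U, mu):
    """Collective rule: maximize mu * U_m + (1 - mu) * U_f."""
    return int(np.argmax(mu * U[:, 0] + (1.0 - mu) * U[:, 1]))

def is_pareto_efficient(U, k):
    """True if no allocation is weakly better for both members and strictly
    better for at least one."""
    dominates = (U >= U[k]).all(axis=1) & (U > U[k]).any(axis=1)
    return not dominates.any()

for mu in (0.2, 0.5, 0.8):           # different bargaining powers
    k = household_choice(U, mu)
    print(f"mu = {mu}: allocation {k}, Pareto-efficient: {is_pareto_efficient(U, k)}")
```

Changing the weight moves the household along the Pareto frontier, which is why, in collective models, bargaining power affects who gets what even though every predicted outcome is efficient.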
2.4. Labour Supply Models within the Family

Examples of applications of the two-adult model are provided by Hausman and Ruud, 1984 [45]; Ransom, 1987 [69]; Bloemen, 1989 [20]; Kapteyn, Kooreman, and van Soest, 1990 [48]. In these models, hours worked by the two spouses are treated as mixed discrete and continuous random variables, whereas in van Soest, 1995 [83] they are treated as discrete, which easily allows
incorporating nonlinear taxes, joint filing, fixed costs of working, unemployment benefits, hours restrictions, unobserved wage rates of non-workers, random preferences and the like, without challenging model tractability. The models are estimated using smooth simulated maximum likelihood, relying on Gourieroux and Monfort, 1993 [41]. The method is illustrated by several policy simulations. Unfortunately, it assumes a unitary model, which neglects the effect of the policy on respective bargaining powers, and may therefore lead to severe bias in labour supply reactions. In the collective models (Chiappori, 1988 [25]; Chiappori, 1992 [26]), the only assumption made about the decision-making process is Pareto efficiency (optimality of decisions). That is, the decision-making process involving more than one household member leads to allocations such that it would not be possible to make one household decision-maker better off without making at least another household decision-maker worse off. The collective models are very general in
the sense that they do not rely on a restrictive specific bargaining process, and they do not assume any restrictive functional form for each member's preferences. Collective models have proved on many occasions to perform better than the usual unitary models in explaining observed behaviour (Vermeulen, 2002a [84]). Collective models of labour supply have been developed for two-earner households (e.g., Fortin and Lacroix, 1997 [37]; Moreau and Donni, 2002 [62]; Chiappori, Fortin, and Lacroix, 2002 [28]) in a continuous framework. However, the interest here is limited to the models in which utility functions are directly estimated and household labour supply is considered as a discrete choice problem. The contributions of Van Soest, 1995 [83], Bingley and Walker, 1997 [19] and Keane and Moffitt, 1998 [49] are the first ones to use a discrete choice framework to study labour supply, but this was done in the context of a unitary model. Van Soest, 1995 [83] developed a joint labour supply model for two spouses, in which the budget set is discrete. The discretization of the budget set allowed for the incorporation of nonlinear taxation and non-convexities, which gave rise to a series of discrete collective labour supply models, starting with Laisney, 2002 [53] and subsequently with Vermeulen et al., 2006 [87], Vermeulen, 2006 [86], and Blundell et al., 2007 [22]. Laisney, 2002 [53] considers household labour supply as a discrete choice problem, and integrates non-participation and
nonlinear taxation. Vermeulen et al., 2006 [87], develop a discrete choice collective model and solve it using a procedure mixing calibration and estimation. Blundell et al., 2007 [22], consider a model in which the male labour supply is discrete, whereas the female labour supply is continuous. Vermeulen, 2006 [86], models female labour supply in a discrete choice framework, considering male labour supply as given, and including non-participation and nonlinear taxation. Other discrete collective models of labour supply include Callan, Van Soest, and Walsh, 2009 [24]; Bloemen, 2010 [21]; Haan, 2010 [43]; Michaud and Vermeulen, 2011 [60]; Pacifico, 2012 [64].
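To make the discretized budget set concrete, here is a toy example (wages, the tax schedule and the utility parameters are all invented for illustration; this is not van Soest's specification): each spouse chooses hours from a small grid, a piecewise-linear tax makes the budget set nonlinear, and the household selects the utility-maximizing pair. The unitary version below maximizes a single household utility; the collective variants replace it with a weighted sum of individual utilities.

```python
# Sketch: discrete labour supply with a nonlinear (piecewise-linear) tax.
import numpy as np

hours = np.array([0, 20, 40])            # weekly hours options per spouse
wage_m, wage_f = 15.0, 12.0              # illustrative hourly wages

def net_income(gross):
    """20% tax up to 500 per week, 40% above: a stand-in for the nonlinear,
    jointly filed schedules the discrete approach accommodates."""
    return np.where(gross <= 500.0, 0.8 * gross, 400.0 + 0.6 * (gross - 500.0))

def utility(c, hm, hf):
    """Unitary household utility over consumption and each spouse's leisure."""
    return np.log(c) + 0.8 * np.log(169.0 - hm) + 1.0 * np.log(169.0 - hf)

best, best_u = None, -np.inf
for hm in hours:                          # enumerate the discrete choice set
    for hf in hours:
        c = float(net_income(wage_m * hm + wage_f * hf)) + 100.0  # nonlabour income
        u = utility(c, hm, hf)
        if u > best_u:
            best, best_u = (hm, hf), u
print("chosen weekly hours (husband, wife):", best)
```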
3. Intra-Household Interaction and Group Decision-Making Models

The main research stream applying discrete choice models to household decisions involving multiple decision-makers has been developed in the transportation literature. It includes the so-called intra-household interaction and group decision-making models of transportation, activity-demand, and location choices. Intra-household interaction models study how the household decides in the long run and in the short run. More precisely, these models take into account the interdependencies between residential location and workplace of household members (long term), or between activities and travel patterns (short term). Intra-household interaction models are interested in studying, for example, which activities are conducted in a day or over several days by the household members (sharing maintenance responsibilities, household cars, and pick-up and drop-off); when, where, by whom and with whom the activities are performed (joint or independent engagement in activities); and how job types and job locations of the different household members affect professional and residential mobility and location decisions of the household. The general research stream concerned with intra-household interaction and group decision-making models of transportation, activity-demand, and location choices can be subdivided into two interconnected subfields of study. The first subfield is concerned with choices in a long-term decision context (i.e. residential location and mobility; job location and mobility; car ownership). Abraham and Hunt, 1997 [1]; Freedman and Kern, 1997 [38]; Sermons
and Koppelman, 2001 [73] and Waddell, 1996 [92] have analysed residential and professional location choices as resulting from a multiple-worker household decision-making process. The second subfield is made of studies concerned with choices in a short-term decision context: the activity-travel demand literature (i.e. mode choice, travel behaviour, car sharing, and task allocation and activity-based models). This literature develops models of task allocation, decisions related to joint travel and activity participation, mode choice, car sharing and so on. Discrete choice modelling on these topics has been studied by Wen and Koppelman, 1999 [94], 2000 [95]; Gliebe and Koppelman, 2002 [39]; Scott and Kanaroglou, 2002 [71]; Vovsha, Petersen, and Donnelly, 2003 [89]; 2004a [91]; 2004b [90]; Bradley and Vovsha, 2005 [23]; Srinivasan and Athuru, 2005 [74]; Srinivasan and Bhat, 2006 [76], among others. The work of Gliebe and Koppelman, 2002 [39]; Scott and Kanaroglou, 2002 [71]; Vovsha, Petersen, and Donnelly, 2003 [89]; and Srinivasan and Bhat, 2006 [76] concerns the decision to participate in an activity jointly or independently from other household members. Gliebe and Koppelman, 2002 [39] model independent activity participation, allocation of time to joint activities, and the interplay between individual and joint activities using a proportional share model. Scott and Kanaroglou, 2002 [71] develop a trivariate (by household type) ordered probit to model the number of non-work, out-of-home activity episodes for household heads. The work concerned with task allocation (of maintenance activities) is better represented by discrete choice model systems that are embedded in tour-based travel demand modelling systems. On the one hand, there is the discrete choice system of Vovsha, Petersen, and Donnelly, 2003 [89]; 2004a [91]; 2004b [90], which forms the joint travel model component of the Mid-Ohio Regional Planning Commission model. On the other hand, there is the discrete choice system of Bradley and Vovsha, 2005 [23], which is part of the activity-based model of the Atlanta region. Bradley and Vovsha, 2005 [23] survey contributions to the activity-travel demand literature in which either intra-household decision-making is not considered explicitly or discrete choice model techniques are not used. The attention is restricted here to theoretical or empirical developments that consider a discrete choice modelling strategy in activity-travel demand models accounting for
interpersonal dependencies in households with multiple decision-makers. However, the activity-travel demand literature has also used seemingly unrelated regressions (SUR) and structural equation modelling (SEM) to account for household interactions (see Srinivasan and Bhat, 2005 [75]). These approaches usually develop a SUR or SEM system of two or more equations corresponding to the time invested in activities by the household head and the other members in consideration (the household head's spouse and/or children or other active household members). Other classifications of activity-travel demand models that account for interpersonal dependencies in households with multiple decision makers have been proposed. For instance, Timmermans, 2009 [79] classifies activity-based travel demand models that explicitly consider interactions within households with multiple decision makers into three categories: micro-simulation, rule-based and utility-maximizing models. Micro-simulation models simulate a household member's daily activity-travel pattern using algorithms that replicate the observed patterns from data (including time constraints and actual decision-making outcomes), giving a timing and sequence of activity schedules that account for household and personal characteristics (see, e.g., Pribyl and Goulias, 2005 [68]). The second category of models is referred to as the rule-based models. They build multi-agent computational processes in which the individual activity-travel decisions reflect if-then decision tree structures regarding which activities, with whom, and for how long the activities are conducted (see, e.g., Arentze and Timmermans, 2004 [3]). Timmermans' last category of models corresponds to the utility-maximizing models. He further
subdivides utility-maximizing models into those using the discrete choice approach (based on random utility models) and those using the time allocation approach. Time allocation models are based on a group utility function. This function is a linear function of individual-specific terms and of interaction terms that reflect the interactions between different individuals in a multiplicative form. The household then allocates its time to activities such that its utility is maximized given individual time constraints (see, e.g., Zhang and Fujiwara, 2006 [98]).
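To make this concrete, here is a minimal numerical sketch of such a time allocation model for a two-member household, with a log-form individual utility of time and invented weights for the individual and multiplicative interaction terms. It illustrates the general idea rather than the specification of Zhang and Fujiwara, 2006 [98].

```python
# Group utility = weighted sum of member utilities plus a multiplicative
# interaction term; the household allocates time subject to each member's
# time budget. All parameter values are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

T = 16.0  # daily time budget per member (hours), an assumed value

def member_utility(t):
    # Assumed diminishing returns from joint-activity time t and from the
    # remaining time T - t spent on the member's own activities.
    return 0.7 * np.log(1.0 + t) + 0.5 * np.log(1.0 + T - t)

def group_utility(x):
    t1, t2 = x  # time each member allocates to the joint activity
    u1, u2 = member_utility(t1), member_utility(t2)
    w1, w2, w12 = 0.6, 0.4, 0.3  # assumed member and interaction weights
    return w1 * u1 + w2 * u2 + w12 * u1 * u2

# Maximize group utility subject to 0 <= t_i <= T for each member.
res = minimize(lambda x: -group_utility(x), x0=[1.0, 1.0],
               bounds=[(0.0, T), (0.0, T)])
print("optimal time allocation:", res.x)
```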
4. Location and Accessibility
Lee et al., 2010 [54] categorized accessibility measurement approaches into four groups: the proximity-based (measured in terms of travel time, distance, etc.), the gravity-based (derived from the denominator in the gravity model), the cumulative opportunities approach (as a special case of the gravity-based measure), and the utility-based approach (denominator of the MNL model). See Lee et al., 2010 [54] for more details on this classification and for further references. The utility-based approach allows for the development of disaggregated or individual-specific accessibility measures captured in the log-sum variable when the nested logit formulation is used. See Ben-Akiva and Lerman, 1979 [14]; Srour et al., 2002 [77]; Waddell and Nourzad, 2002 [93]; and Zondag and Pieters, 2005 [99], among others. A major determinant of household location is accessibility. Accessibility to jobs measures the spatial proximity of the residential location to the job location. In the absence of income effects, and with a Logit demand function, accessibility is measured by a log-sum term, which is a measure of consumer surplus. In the Logit case, accessibility corresponds to the expected maximum utility, which is equal to the logarithm of the denominator of the Logit demand function. It is easy to see that when the utility is additive in income, the derivative of the accessibility is the demand function. This is a direct application of Roy's identity (Anderson, de Palma, and Thisse, 1992 [2]). The same property is true for the Generalized Extreme Value case, provided that the utility is additive in income (and, as a consequence, there are no income effects). The reader is referred to Anderson, de Palma, and Thisse, 1992 [2]; and de Palma and Kilani, 2007 [31] for details. In the homogeneous case, all agents have the same preferences (this means, in particular, the same values of time and the same preferences for jobs), and therefore have the same measure of accessibility. In the heterogeneous case, the accessibility depends on household characteristics and in particular on the value of time. Research on residential location has commonly used accessibility as an aggregated measure of ease of access to jobs or people in choice models where the household is considered as a single decision-making unit (individual or unitary approach). By contrast, Chiappori, de Palma, Picard
and Inoa, 2013 [27] have studied the residential location of households including two active spouses. In the context of a collective model assuming Pareto-optimality, they measure simultaneously spouses' respective values of time and bargaining powers. They show that neglecting bargaining powers can lead to a bias in the estimated values of time which may reach 20%. The order of magnitude of this bias is comparable to the male-female difference in values of time. Similarly, Picard et al., 2013 [67] measure simultaneously spouses' respective values of
time and bargaining powers in a joint mode choice model. In the individual or unitary approach, accessibility has been (1) studied in single- and multiple-worker location choice models, (2) measured using different approaches, and (3) used as an indicator of non-work activities. The following subsections develop these three issues on the use and measurement of accessibility measures.
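Before turning to these subsections, the log-sum accessibility measure and its Roy's identity property discussed above can be illustrated numerically. The sketch below uses invented systematic utilities for three destinations and checks by finite differences that the derivative of the log-sum with respect to one alternative's cost equals minus that alternative's choice probability (times the cost coefficient).

```python
# Log-sum accessibility for a Logit demand function and a numerical check
# of the Roy's identity property. Utilities and the cost coefficient are
# invented for illustration.
import numpy as np

def logsum(v):
    # Accessibility = expected maximum utility = log of the MNL denominator.
    return np.log(np.exp(v).sum())

def probs(v):
    e = np.exp(v - v.max())  # numerically stabilized MNL probabilities
    return e / e.sum()

v = np.array([1.2, 0.4, -0.3])  # systematic utilities of three destinations
beta_cost = 1.0                  # marginal disutility of cost, assumed

# Increase the cost of destination 0 by a small amount dc.
dc = 1e-6
v_perturbed = v.copy()
v_perturbed[0] -= beta_cost * dc

dA_dc = (logsum(v_perturbed) - logsum(v)) / dc
print(dA_dc, -beta_cost * probs(v)[0])  # both approximately -0.60
```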
4.1. Accessibility Measures From Multiple-Worker Location Choice Models

The study of the interactions between household members within a household in residential location choice models has allowed the differentiation of accessibility measures by sociodemographic characteristics, identifying differences between females and males, and between multiple-worker households and one-worker households. The reference studies of multiple-worker residential location choice models are the works of Timmermans et al., 1992 [80], Abraham and Hunt, 1997 [1], Freedman and Kern, 1997 [38], and Sermons and Koppelman, 2001 [73]. Note that Timmermans et al., 1992 [80] studied the residential location choice of two-worker households but using a nine-step (decompositional) joint choice model. Abraham and Hunt, 1997 [1] used a logit model structure with a system for weighting the contributions of different workers to the household utility in a three-level nested logit (residential location, workplace, and mode choice). Freedman and Kern, 1997 [38] analyse residential location and workplace with a joint logit model where a two-worker household jointly chooses the residential location and both spouses' workplaces to maximize utility, subject to budget and time constraints. Sermons and Koppelman, 2001 [73] develop a multinomial logit model of residential location choice to study differences between males and females in sensitivity to commuting time for two-worker households.
In general, these studies showed that females are more sensitive to commuting time and accessibility measures than males. Demographic characteristics such as the presence of children, workplace status, and spouses' occupation and workplace location are determinants of commuting time and accessibility measures, and therefore of residential location choices in a multiple-worker household.
4.2. Individual-specific Accessibility Measures

Despite the variety of contributions to the study of residential location, little has been said regarding the influence of job type on the individual-specific accessibility to jobs, and therefore on the residential location and workplace choices when individuals are considered forward-looking. Household members choose a workplace conditional on their current residential location, while also considering future changes of workplace when choosing their residential location. Inoa, Picard, and de Palma, 2014 [47] have elaborated a three-level nested logit model that allows studying the interdependency of residential location and workplace, while accounting for variation of preferences for job types across individuals. Residential location is the upper-level choice, and workplace and job type are the middle- and lower-level choices, respectively. With this nested structure, an individual-specific accessibility measure is constructed, which corresponds to the expected maximum utility across all potential workplaces and job types. When considering accessibility to jobs, the choice of a particular workplace depends on the distribution of jobs by type, which are valued differently by different workers. Their modelling of the job type choice allows them to compute an individual-specific measure of attractiveness to job types (log-sum variable) and to use it in the workplace location choice model. Using data from the Paris Region Census, Inoa, Picard, and de Palma, 2014 [47] find that the individual-specific job type attractiveness measure is a more significant predictor of workplace location than the standard total number of jobs measure. Most importantly, the individual-specific accessibility measure is an important determinant of the residential location choice, and its impact on the residential location choice strongly depends on gender, fertility, age, and education. Some resulting individual-specific accessibility maps are displayed in the Appendix. They show that accessibility is more equally distributed over the region for the lowest education
level, whereas it is more concentrated in the central business district for the most educated workers.
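The bottom-up construction of such an individual-specific accessibility measure can be sketched as follows. The nesting (job types within workplaces within residential zones) follows the structure described above, but all utilities, scale parameters and dimensions are invented for illustration and are not taken from the cited study.

```python
# A hedged sketch of a three-level nested log-sum: a log-sum over job types
# gives the attractiveness of each workplace, and a log-sum over workplaces
# gives the accessibility of the residential zone for one individual.
import numpy as np

def logsum(v, mu):
    # Expected maximum utility of a nest with scale parameter mu.
    return (1.0 / mu) * np.log(np.exp(mu * np.asarray(v)).sum())

mu_job, mu_work = 1.5, 1.0  # assumed nest scales (lower nest at least as sharp)

# Utility of each job type at each workplace for one individual
# (2 workplaces, 3 job types); in the cited work these would depend on
# education, gender, age, etc.
v_job = np.array([[0.8, 0.2, -0.5],
                  [0.3, 0.6, 0.1]])

# Job-type attractiveness of each workplace (lower-level log-sum) ...
attractiveness = np.array([logsum(v_job[w], mu_job) for w in range(2)])

# ... enters the workplace utilities, net of an assumed commute disutility.
commute_cost = np.array([0.4, 0.9])
v_work = attractiveness - commute_cost

# Individual-specific accessibility of the residential zone.
print("accessibility:", logsum(v_work, mu_work))
```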
4.3. Time Geographic Measures of Accessibility

4.3.1. Activity Pattern Models

The literature on residential location has not restricted accessibility to the concept of proximity to jobs. It has also studied the accessibility to different (non-work) activity opportunities and measured their respective influence on residential location. Activity-travel demand and task allocation models are concerned with the activity patterns of households and individuals over a day (and even over a week, in the new activity-based time use data sets). Capturing non-work accessibility is therefore essential when modelling in-home and out-of-home activity patterns and trip chaining (Neutens et al., 2012 [63]). Accessibility measures adapted for these models can be found in the framework of time geographic measures of accessibility. Hägerstrand, 1970 [44] introduced the concept of the time-space prism (TSP) in order to describe the temporal and spatial constraints under which individuals travel to and participate in activities. Time-space prisms define the locations that an individual can reach given a time budget. The area shaped by the potential locations that the individual can reach in a given time gap is referred to as the potential path area (PPA). A thorough study of the time-geographic measures can be found in the work of Miller, 1991 [61] and Kwan, 1998 [52]. Kim and Kwan, 2003 [50] provide a review of accessibility measures used in empirical settings derived from the time-space prism. There exist only a few applications where time-geographic measures of accessibility have been used in the literature under consideration here, that is, discrete choice location models and activity-travel demand and time allocation models that consider intra-household interactions. Among these applications are the works of Lee et al., 2010 [54]; Yoon and Goulias, 2009 [97], 2010 [96]; Kitamura et al., 2001 [51]; and Ettema, 2006 [36]. Lee et al., 2010 [54] developed a discrete choice residential location model that includes a disaggregated accessibility measure to non-work activities (derived from the TSP framework), while also accounting for the accessibility to jobs. Yoon and Goulias, 2009 [97], 2010 [96]
developed a structural equations model of activity and time allocation that considers intra-household interactions, first in households without children only and then in households with and without children, where the accessibility measure used is based on time geography. Using a time-geographic accessibility measure, Kitamura et al., 2001 [51] studied the influence of travel patterns and residential location on car ownership; and Ettema, 2006 [36] developed a discrete-continuous Tobit model of activity participation and duration.
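The time-space prism and potential path area lend themselves to a compact computational illustration. The sketch below, under the simplifying assumptions of straight-line distances and a uniform travel speed (real applications use network travel times), tests whether candidate locations fall inside the PPA defined by two anchor points and a time budget.

```python
# A minimal sketch of Hägerstrand's time-space prism: a point is in the
# potential path area (PPA) if it can be visited between two anchors within
# the time budget. Coordinates, speed and budget are invented assumptions.
import numpy as np

origin = np.array([0.0, 0.0])        # anchor at the start of the time window
destination = np.array([8.0, 0.0])   # anchor at the end of the time window
speed = 1.0                          # km per unit time, assumed uniform
budget = 12.0                        # time available between the two anchors

def in_ppa(point):
    # Time to reach the point from the origin plus time to continue on to
    # the destination must fit within the time budget.
    t_out = np.linalg.norm(point - origin) / speed
    t_on = np.linalg.norm(destination - point) / speed
    return t_out + t_on <= budget

candidates = np.array([[4.0, 3.0], [4.0, 5.0], [10.0, 0.0], [-3.0, 0.0]])
for p in candidates:
    print(p, in_ppa(p))  # the PPA here is an ellipse with the anchors as foci
```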
4.3.2. Dynamic Transport Models

The time geographic models described above neglect the interactions between households. However, such interactions are important in the case of congested cities and they evolve across time and geography. Such externalities are analyzed in dynamic transport models. METROPOLIS is a dynamic model which describes mode choice, route choice and departure time choice (see de Palma and Marchal, 2002 [35]; de Palma, Kilani, and Lindsey, 2005 [32]; and de Palma and Lindsey, 2006 [33]). It is dynamic in the sense that congestion depends on the time of day. It uses a nested logit model, where the mode choice is made at step one, while the departure time choice is made at step two. The departure time choice is given by a continuous logit model, with the log-sum formula as the welfare measure. In this case, the mode choice model, at the upper level, depends on the accessibility at the lower stage. Consider a user going from origin i to destination j. The attractiveness for this car user (the formula is similar for public transportation users) is:

A_ij = ∫_0^T exp(−C_ij(u)/μ) du,

where C_ij(u) represents the generalized cost of an individual using private transportation, u denotes the departure time, T is the length of the departure-time window, and μ is the scale parameter of the continuous logit. Following Vickrey, 1969 [88], this generalized cost is given by:

C_ij(u) = α tt_ij(u) + β max(0, t* − u − tt_ij(u)) + γ max(0, u + tt_ij(u) − t*),

where α is the value of time, β is the unit schedule delay early parameter, γ is the unit schedule delay late parameter, t* is the desired arrival time at destination and tt_ij(u) is the travel time given the departure time u. Such a measure is potentially useful for other applications, such
as activity choice, or residential location. In the latter case, the log-sum aggregation over the destinations j provides a potential user benefit of residential location at i. In the case of couples, the value of time of the man may not be the same when his spouse is still at home at the time he leaves as when she leaves before him, and vice versa. In that case, the generalized cost function of the man depends on the departure time of the woman, and the generalized cost function of the woman depends on the departure time of the man. The resulting within-family externalities have been studied by de Palma, Picard, and Lindsey, 2012 [34]. They showed that, although cooperation is clearly beneficial for the couples themselves, it may exacerbate congestion and thus worsen negative externalities between families.
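A small numerical sketch of the attractiveness measure above may help. The α, β, γ values, the travel time profile tt_ij(u) and the scale μ are all invented, and the integral is evaluated by simple quadrature.

```python
# A numerical sketch combining Vickrey's alpha-beta-gamma generalized cost
# with the continuous logit attractiveness integral. All parameter values
# and the congestion profile are illustrative assumptions.
import numpy as np

alpha, beta, gamma = 10.0, 5.0, 20.0  # value of time, early and late penalties
t_star = 8.0                          # desired arrival time (hours)
mu = 1.0                              # assumed continuous logit scale

def tt(u):
    # Assumed bell-shaped congestion profile peaking near u = 7.5.
    return 0.5 + 0.4 * np.exp(-((u - 7.5) ** 2) / 0.5)

def cost(u):
    arrival = u + tt(u)
    early = max(t_star - arrival, 0.0)
    late = max(arrival - t_star, 0.0)
    return alpha * tt(u) + beta * early + gamma * late

# Attractiveness A_ij: integral over the departure-time window [0, T] of
# exp(-C_ij(u)/mu), evaluated here with the trapezoidal rule.
T, n = 24.0, 4801
grid = np.linspace(0.0, T, n)
integrand = np.exp(-np.array([cost(u) for u in grid]) / mu)
A = np.trapz(integrand, grid)
print("attractiveness:", A, "log-sum welfare:", mu * np.log(A))
```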
4.4. Interactions within families outside the household

Compton and Pollak, 2009 [29] have analysed interactions within larger families, living in different households. They describe and analyse the patterns of proximity and co-residence involving adult children and their mothers using data from the National Survey of Families and Households (NSFH) and the U.S. Census. Their idea is that the ability of family members to engage in intergenerational transfers of hands-on care requires close proximity or co-residence. They find that, in spite of the decline in intergenerational co-residence in the United States, most Americans still live within 25 miles of their mothers, and even closer for the lowest educational levels. Individual characteristics such as age, race and ethnicity affect both the probability of co-residence and close proximity, and their effect depends on gender and marital status, indicating the need to model the corresponding categories separately. Compton and Pollak, 2011 [30] further show that close geographical proximity to mothers or mothers-in-law has in turn a substantial positive effect on the labour supply of married women with young children. They argue that proximity increases labour supply through the availability of childcare. Their interpretation of availability is broad enough to include not only regular scheduled childcare during work hours but also an insurance aspect of proximity (e.g., a mother or mother-in-law can provide irregular or unanticipated childcare). Using large American datasets, they find that the predicted probability of employment and labour force participation is 4–10 percentage points higher for married women with young children living in close proximity to their mother or their mother-in-law compared to those living further away.
5. Conclusion and extensions
There is still a long way to go to achieve the goal of providing a theoretical and empirical framework to model and apply economics of the family models. Decision-making within the family has been neglected for too long in transportation. Two special issues, by Bhat and Pendyala, 2005 [17] and by Timmermans and Junyi Zhang, 2009 [81], provide the most notable exceptions. We also refer the reader to the discussion on group behaviour held at the 11th International Conference on Travel Behaviour Research (Kyoto), organized by J. Zhang and A. Daly (Timmermans, 2009 [79]). The objective of this paper was to set up a flexible framework to discuss the development of integrated transportation models involving interacting and interdependent actors. The transportation field is very keen to have access to this type of model, since its applications are numerous. Let us mention, for example, residential location choice, workplace choice, car ownership, choice of children's school, mode choice, departure time choice, activity patterns
and the like. The (non-unitary) economics of the family models are totally different models, which do not merely extend existing discrete choice models. They introduce new concepts specific to within-family interactions: negotiation, altruism, repeated interaction and Pareto optimality. It is our belief that activity patterns, mode choice, allocation of time, residential and job location choices, as well as departure time choice, cannot be analysed within the family without the idea that there are almost always conflicting interests, given the budget and time constraints of the different family members. Therefore, some coordination and cooperation is needed to achieve common goals, even if the cost borne by the spouses is often not identical. The transportation field has made large advances in this area, but the connections with the theoretical and econometric models are still open. The group behaviour discussion at the 11th International Conference on Travel Behaviour Research showed that cross-fertilization between the economics of the family and the transportation field is needed. We hope that this chapter will provide a first step towards filling this gap. Some preliminary discussion on how to integrate bargaining and collective models in transportation and urban economics can be found in Ben-Akiva et al., 2012 [15].
References [1]
John E. Abraham and John D. Hunt. Specification and estimation of nested logit model of
home, workplaces, and commuter mode choices by multiple-worker households. Transportation Research Record, 1606:17–24, 1997. [2]
Simon Anderson, André de Palma, and Jacques-Francois Thisse. Discrete Choice Theory of
Product Differentiation. Cambridge, MA: The MIT Press, 1992. [3]
Theo A. Arentze and Harry J.P Timmermans. A learning-based transportation oriented
simulation system. Transportation Research Part B: Methodological, 38(7):613 – 633, 2004. [4]
John S. Ashworth and D.T. Ulph. Household models. In C.V. Brown, editor, Taxation and
Labor Supply, George Allen and Unwin, London, pages 117–199. [5]
Ian Bateman and Alistair Munro. An experiment on risky choice amongst households.
The Economic Journal, 115(502):C176–C189, 2005. [6]
Gary Becker. A theory of the allocation of time. The Economic Journal, 75(299):493–517,
1965. [7]
Gary Becker. A theory of marriage: Part I. Journal of Political Economy, 81(4):813–846,
1973. [8]
Gary Becker. A theory of marriage: Part II. Journal of Political Economy, 82(2):S11–S26,
1974. [9]
Gary Becker. A theory of social interactions. Journal of Political Economy, 82(6):1063–
1093, 1974. [10]
Gary Becker. A Treatise on the Family. Harvard University Press, 1991.
[11]
Gary Becker, Elisabeth M. Landes, and Robert T. Michael. An economic analysis of
marital instability. Journal of Political Economy, 85(6):1141–1187, 1977. [12]
Gary Becker and H. Gregg Lewis. On the interaction between the quantity and quality of
children. Journal of Political Economy, 81(2):S279–S288, 1973.
[13]
Nesha Beharry-Borg, David A. Hensher, and Riccardo Scarpa. An analytical framework for
joint vs separate decisions by couples in choice experiments: The case of coastal water quality in Tobago. Environmental and Resource Economics, 43(1):95–117, 2009. [14]
Moshe Ben-Akiva and Steven R. Lerman. Disaggregate travel and mobility choice models
and measures of accessibility. In D. Hensher and P. Stopher, editors, Behavioral Travel Modelling, London: Croom Helm, pages 654–679, 1979. [15]
Moshe Ben-Akiva, André de Palma, Daniel McFadden, Maya Abou-Zeid, Pierre-André Chiappori, Matthieu de Lapparent, Steven N. Durlauf, Mogens Fosgerau, Daisuke Fukuda, Stephane Hess, Charles Manski, Ariel Pakes, Nathalie Picard, and Joan Walker. Process and context in choice models. Marketing Letters, 23(2):439–456, June 2012. [16]
Theodore C. Bergstrom. A fresh look at the rotten kid theorem - and other household
mysteries. Journal of Political Economy, 97(5):1138–1159, 1989. [17]
Chandra R. Bhat and Ram M. Pendyala. Modeling intra-household interactions and group
decision-making. Transportation, 32:443–448, 2005. [18]
S. M. Bianchi and J. Robinson. What did you do today? Children's use of time, family composition, and the acquisition of social capital. Journal of Marriage and Family, 59(2):332–344, 1997. [19]
Paul Bingley and Ian Walker. The labour supply, unemployment and participation of lone
mothers in in-work transfer programmes. The Economic Journal, 107(444):1375–1390, 1997. [20]
Hans G. Bloemen. The added worker effect in a microeconomic model of the family with
market rationing. Technical report, Working paper. Salt Lake city: Brigham Young University, 1989. [21]
Hans G. Bloemen. Income taxation in an empirical collective household labour supply
model with discrete hours. Technical report, Tinbergen Institute, January 2010. [22]
Richard Blundell, Pierre-André Chiappori, Thierry Magnac, and Costas Meghir. Collective
labour supply: Heterogeneity and non-participation. The Review of Economic Studies, 74(2):417–445, 2007.
[23]
Mark Bradley and Peter Vovsha. A model for joint choice of daily activity pattern types of
household members. Transportation, 32(5):545–571, 2005. [24]
Tim Callan, Arthur Van Soest, and John R. Walsh. Tax structure and female labour supply:
Evidence from Ireland. LABOUR, 23(1):1–35, 2009. [25]
Pierre-André Chiappori. Rational household labor supply. Econometrica, 56(1):63–90,
1988. [26]
Pierre-André Chiappori. Collective labor supply and welfare. Journal of Political Economy,
100(3):437–467, 1992. [27]
Pierre-André Chiappori, André de Palma, Nathalie Picard, and Ignacio A. Inoa. Couple
residential location and spouses' workplaces. Mimeo, Université de Cergy-Pontoise, THEMA, 2013. [28]
Pierre-André Chiappori, Bernard Fortin, and Guy Lacroix. Marriage market, divorce
legislation, and household labor supply. Journal of Political Economy, 110(1):37–72, 2002. [29]
Janice Compton and Robert A. Pollak. Proximity and coresidence of adult children and
their parents: Description and correlates. Working papers, University of Michigan, Michigan Retirement Research Center, October 2009. [30]
Janice Compton and Robert A. Pollak. Family proximity, childcare, and women's labor force attachment. Technical report, National Bureau of Economic Research, Inc, December 2011. [31]
André de Palma and Karim Kilani. Invariance of conditional maximum utility. Journal of
Economic Theory, 132(1):137–146, 2007. [32]
André de Palma, Moez Kilani, and Robin Lindsey. Congestion pricing on a road network:
A study using the dynamic equilibrium simulator metropolis. Transportation Research Part A, 39(7-9):588–611, 2005. [33]
André de Palma and Robin Lindsey. Modelling and evaluation of road pricing in Paris.
Transport Policy, 13(2):115–126, 2006.
[34]
André de Palma, Robin Lindsey, and Nathalie Picard. Departure time choice within the
family. Working papers, University of British Columbia, Canada, 2012. [35]
André de Palma and Fabrice Marchal. Real cases applications of the fully dynamic
metropolis tool-box: an advocacy for large-scale mesoscopic transportation systems. Networks and Spatial Economics, 2(4):347–369, 2002. [36]
Dick Ettema. Latent activities: Modeling the relationship between travel times and
activity participation. Transportation Research Record: Journal of the Transportation Research Board, 1926:171–180, 2006. [37]
Bernard Fortin and Guy Lacroix. A test of the unitary and collective models of household
labour supply. The Economic Journal, 107(443):933–955, 1997. [38]
Ora Freedman and Clifford R. Kern. A model of workplace and residence choice in two-
worker households. Regional Science and Urban Economics, 27(3):241–260, 1997. [39]
John P. Gliebe and Frank S. Koppelman. A model of joint activity participation between
household members. Transportation, 29(1):49–72, 2002. [40]
Thomas F. Golob and Michael G. McNally. A model of activity participation and travel
interactions between household heads. Transportation Research Part B: Methodological, 31(3):177–194, 1997. [41]
Christian Gourieroux and Alain Monfort. Simulation-based inference: A survey with
special reference to panel data models. Journal of Econometrics, 59(1/2):5 – 33, 1993. [42]
Reuben Gronau. Leisure, home production, and work– the theory of the allocation of
time revisited. Journal of Political Economy, 85(6):1099–1123, 1977. [43]
Peter Haan. A multi-state model of state dependence in labor supply: Intertemporal
labor supply effects of a shift from joint to individual taxation. Labour Economics, 17(2):323–335, 2010. [44]
Torsten Hägerstrand. What about people in regional science? Papers of the Regional Science Association, 24(1):7–21, 1970.
[45]
Jerry Hausman and Paul Ruud. Family labor supply with taxes. The American Economic
Review, 74(2):242–248, 1984. [46]
Jerry A. Hausman. Exact consumer's surplus and deadweight loss. The American
Economic Review, 71(4):662–676, 1981. [47]
Ignacio A. Inoa, Nathalie Picard, and André de Palma. Effect of an accessibility measure
in a model for choice of residential location, workplace, and type of employment. Mathematical Population Studies, forthcoming, 2014. [48]
Arie Kapteyn, Peter Kooreman, and Arthur van Soest. Quantity rationing and concavity in
a flexible household labor supply model. The Review of Economics and Statistics, 72(1):55–62, 1990. [49]
Michael Keane and Robert Moffitt. A structural model of multiple welfare program
participation and labor supply. International Economic Review, 39(3):553–589, 1998. [50]
Hyun-Mi Kim and Mei-Po Kwan. Space-time accessibility measures: A geocomputational
algorithm with a focus on the feasible opportunity set and possible activity duration. Journal of Geographical Systems, 5(1):71–91, 2003. [51]
Ryuichi Kitamura, Takamasa Akiyama, Toshiyuki Yamamoto, and Thomas F. Golob.
Accessibility in a metropolis: Toward a better understanding of land use and travel. Transport Research Record, 1780:64–75, 2001. [52]
Mei-Po Kwan. Space-time and integral measures of individual accessibility: a
comparative analysis using a point-based framework. Geographical Analysis, 30(3):191–217, 1998. [53]
François Laisney. Welfare analysis of fiscal and social security reforms in Europe: does the representation of family decision processes matter? Technical report, Final report on EU-project VS/2000/0778, 2002. [54]
Brian H Y Lee, Paul Waddell, Liming Wang, and Ram M Pendyala. Reexamining the
influence of work and nonwork accessibility on residential location choices with a microanalytic framework. Environment and Planning A, 42(4):913–930, 2010.
[55]
Jane H. Leuthold. An empirical study of female income transfers and the work decision of
the poor. Journal of Human Resources, 3(3):312–323, 1968. [56]
Jeremy Lise and Shannon Seitz. Consumption inequality and intra-household allocations.
The Review of Economic Studies, 78(1):328–355, 2011. [57]
Shelly Lundberg and Robert A Pollak. Separate spheres bargaining and the marriage
market. Journal of Political Economy, 101(6):988–1010, December 1993. [58]
Shelly J. Lundberg, Robert A. Pollak, and Terence J. Wales. Do husbands and wives pool
their resources? evidence from the united kingdom child benefit. The Journal of Human Resources, 32(3):463–480, 1997. [59]
Marjorie McElroy and Mary Horney. Nash-bargained decisions: Toward a generalization
of the theory of demand. International Economic Review, 22(2):333–349, 1981. [60]
Pierre-Carl Michaud and Frederic Vermeulen. A collective labor supply model with
complementarities in leisure: Identification and estimation by means of panel data. Labour Economics, 18(2):159 – 167, 2011. [61]
Harvey J. Miller. Modeling accessibility using space-time prism concepts within
geographical information systems. International Journal of Geographical Information Systems, 5(3):287–301, 1991. [62]
Nicolas Moreau and Olivier Donni. Estimation of a collective model of labour supply with
taxation. Annales d'Économie et de Statistique, 65:55–83, 2002. [63]
Tijs Neutens, Matthias Delafontaine, Darren M Scott, and Philippe De Maeyer. An
analysis of day-to-day variations in individual space-time accessibility. Journal of Transport Geography, 23:81–91, 2012. [64]
Daniele Pacifico. On the role of unobserved preference heterogeneity in discrete choice
models of labour supply. Empirical Economics, 45(2):1–35, 2012.
[65]
André de Palma, Nathalie Picard, and Anthony Ziegelmeyer. Individual and couple
decision behavior under risk: evidence on the dynamics of power balance. Theory and Decision, 70(1):45–64, 2011. [66]
Nathalie Picard. Démographie et économie de la famille dans les pays en développement
/ demography and family economics in the developing countries. Economie Publique / Public economics, 03-04(1-2):189–223, 1999. [67]
Nathalie Picard, André de Palma, and Sophie Dantan. Intra-household discrete choice
models of mode choice and residential location. International Journal of Transport Economics, XL(3):419–445, 2013. [68]
Ondrej Pribyl and Konstadinos G. Goulias. Simulation of daily activity patterns
incorporating interactions within households: Algorithm overview and performance. Transportation Research Record: Journal of the Transportation Research Board, 1926(1):135–141, 2005. [69]
Michael R. Ransom. An empirical model of discrete and continuous choice in family labor
supply. The Review of Economics and Statistics, 69(3):465–472, 1987. [70]
Paul A. Samuelson. Social indifference curves. The Quarterly Journal of Economics,
70(1):1–22, 1956. [71]
Darren M. Scott and Pavlos S. Kanaroglou. An activity-episode generation model that
captures interactions between household heads: development and empirical analysis. Transportation Research Part B: Methodological, 36(10):875–896, 2002. [72]
Amartya Sen. Economics and the family. Asian Development Review, 1:14–26, 1983.
[73]
M. William Sermons and Frank S. Koppelman. Representing the differences between
female and male commute behavior in residential location choice models. Journal of Transport Geography, 9(2):101–110, 2001. [74]
Karthik K. Srinivasan and Sudhakar R. Athuru. Analysis of within-household effects and
between-household differences in maintenance activity allocation. Transportation, 32(5):495–521, 2005.
[75]
Sivaramakrishnan Srinivasan and Chandra R. Bhat. Modeling household interactions in
daily in-home and out-of-home maintenance activity participation. Transportation, 32(5):523– 544, 2005. [76]
Sivaramakrishnan Srinivasan and Chandra R. Bhat. A multiple discrete-continuous model
for independent- and joint-discretionary-activity participation decisions. Transportation, 33(5):497–515, 2006. [77]
Issam M. Srour, Kara M. Kockelman, and Travis P. Dunn. Accessibility indices: Connection
to residential land prices and location choices. Transportation Research Record: Journal of the Transportation Research Board, 1805(1):25–34, 2002. [78]
Duncan Thomas. Intra-household resource allocation: An inferential approach. The
Journal of Human Resources, 25(4):635–664, 1990. [79]
Harry Timmermans. Household decision making in travel behaviour analysis. In Ryuichi
Kitamura, Toshio Yoshii, and Toshiyuki Yamamoto, editors, The Expanding Sphere of Travel Behaviour Research: Selected Papers from the 11th International Conference on Travel Behaviour Research, pages 159–186. Emerald Group Publishing Limited, Bingley, UK, 2009. [80]
Harry J.P. Timmermans, A. Borgers, J. Dijk, and H. Oppewal. Residential choice behavior
of dual earner households: a decompositional joint choice model. Environment and Planning A, 24(4):517–533, 1992. [81]
Harry J.P. Timmermans and Junyi Zhang. Modeling household activity travel behavior:
Examples of state-of-the-art modeling approaches and research agenda. Transportation Research Part B: Methodological, 43(2):187–190, 2009. [82]
T.A. Townsend. The effects of household characteristics on the multi-day time allocations
and travel/activity patterns of households and their members. Ph.D. Dissertation, Department of Civil Engineering, Northwestern University, Evanston, Illinois, 1987. [83]
Arthur Van Soest. Structural models of family labor supply: A discrete choice approach.
The Journal of Human Resources, 30(1):63–88, 1995.
[84]
Frederic Vermeulen. Collective household models: Principles and main results. Journal of
Economic Surveys, 16(4):533–564, 2002. [85]
Frederic Vermeulen. Where does the unitary model go wrong? Simulating tax reforms by means of unitary and collective labour supply models: The case for Belgium. In François Laisney, editor, Welfare analysis of fiscal and social security reforms in Europe: does the representation of family decision processes matter?, Final report on EU-project VS/2000/0778, Mannheim, Centre for European Economic Research, 2002.
Frederic Vermeulen. A collective model for female labour supply with non-participation
and taxation. Journal of Population Economics, 19(1):99–118, 2006. [87]
Frederic Vermeulen, Olivier Bargain, Miriam Beblo, Denis Beninger, Richard Blundell,
Raquel Carrasco, Maria-Concetta Chiuri, François Laisney, Valérie Lechene, Nicolas Moreau, Michal Myck, and Javier Ruiz-Castillo. Collective models of labor supply with nonconvex budget sets and nonparticipation: A calibration approach. Review of Economics of the Household, 4(2):113–127, 2006. [88]
William Vickrey. Congestion theory and transport investment. The American Economic
Review, 59(2):251–260, 1969. [89]
Peter Vovsha, Eric Petersen, and Robert Donnelly. Explicit modeling of joint travel by
household members: Statistical evidence and applied approach. Transportation Research Record: Journal of the Transportation Research Board, 1831(1):1–10, 2003. [90]
Peter Vovsha, Eric Petersen, and Robert Donnelly. Impact of intrahousehold interactions
on individual daily activity-travel patterns. Transportation Research Record: Journal of the Transportation Research Board, 1898(1):87–97, 2004. [91]
Peter Vovsha, Eric Petersen, and Robert Donnelly. Model for allocation of maintenance
activities to household members. Transportation Research Record: Journal of the Transportation Research Board, 1894(1):170–179, 2004.
[92]
Paul Waddell. Accessibility and residential location: The interaction of workplace,
housing tenure, residential mobility and location choices. October, 1996. Paper presented at the 1996 Lincoln Land Institute TRED Conference. [93]
Paul Waddell and Firouzeh Nourzad. Incorporating non-motorized mode and
neighborhood accessibility in an integrated land use and transportation model system. Transportation Research Record: Journal of the Transportation Research Board, 1805(1):119– 127, 2002. [94]
Chieh-Hua Wen and Frank Koppelman. Integrated model system of stop generation and
tour formation for analysis of activity and travel patterns. Transportation Research Record: Journal of the Transportation Research Board, 1676:136–144, 1999. [95]
Chieh-Hua Wen and Frank Koppelman. A conceptual and methodological framework for
the generation of activity-travel patterns. Transportation, 27(1):5–23, 2000. [96]
Seo Youn Yoon and Konstadinos G. Goulias. Constraint-based assessment of intra-
household bargaining on time allocation to activities and travel using individual accessibility measures. In Proceedings of the 89th Annual Meeting of the TRB. Transportation Research Board, Washington DC, 10-14 January 2010. [97]
Seo Youn Yoon and Konstadinos G. Goulias. Impact of individual accessibility on travel
behavior and its propagation through intra-household interaction. Transportation Letters, 2(4):245–260, 2010. [98]
Junyi Zhang and Akimasa Fujiwara. Representing household time allocation behavior by
endogenously incorporating diverse intra-household interactions: A case study in the context of elderly couples. Transportation Research Part B: Methodological, 40(1):54–74, 2006. [99]
Barry Zondag and Marits Pieters. Influence of accessibility on residential location choice.
Transportation Research Record: Journal of the Transportation Research Board, 1902(1):63–70, 2005.
Appendix: Accessibility measures, by education level
17 Hybrid choice models
Maya Abou-Zeid and Moshe Ben-Akiva
1 BACKGROUND

1.1 Introduction
The hybrid choice model (HCM) is a modeling framework that attempts to bridge the gap between discrete choice models and behavioral theories by representing explicitly unobserved elements of the decision-making process, such as the influence of attitudes, perceptions and decision protocols. It integrates discrete choice models with latent (or unobserved) variable models. Latent variable models, also known as structural equation models, will be presented later in this chapter. The origins of the HCM can be traced to several researchers including work by McFadden (1986), Ben-Akiva et al. (2002a, 2002b), Morikawa et al. (2002), Walker and Ben-Akiva (2002) and Ashok et al. (2002). Many applications in various contexts have followed, including vehicle type choice (Bolduc and Alvarez-Daziano, 2010; Choo and Mokhtarian, 2004), mode choice (Johansson et al., 2006), residential location choice (Kitrinou et al., 2010; Walker and Li, 2007), and so on. The purpose of this chapter is not to review this literature but rather to focus on the advantages of incorporating latent variables in discrete choice models through the HCM. We discuss four types of advantages. The first advantage is the ability to explicitly model unobserved heterogeneity, such as the dependence of taste parameters on underlying latent variables such as attitudes. The second advantage is a gain in statistical efficiency of the parameter estimates due to the additional information provided by indicators of latent variables. The third advantage is enhanced behavioral realism which means that the HCM represents more transparently how people make decisions compared to a ‘black-box’ discrete choice model whose utility functions depend on observable variables, such as by explicitly accounting for the effect of risk aversion on preferences. The fourth advantage is related to policy relevance, where latent variable models are expected to lead to more sensible predictions (especially where the values of the latent variables change over time) and to market segmentation by the latent variables to allow the design of policy measures that are better targeted. This chapter is organized as follows. Section 1.2 reviews the formulation of the standard discrete choice model based on random utility theory. Section 2 presents the framework of the HCM and its mathematical formulation. Section 3 presents the advantages of the HCM focusing on heterogeneity, efficiency, behavioral realism and policy relevance. The presentation of the advantages is illustrated through specific examples dealing with willingness to pay for travel time savings (Abou-Zeid et al., 2010), travel mode choice with latent choice sets (Ben-Akiva and Boccara, 1995) and airline itinerary choice (Theis, 2011). Section 4 concludes with a discussion of challenges for future work in this area.
1.2 Discrete Choice Model
The discrete choice model based on random utility theory is a widely used framework for modeling choices in many domains, such as marketing, transportation, energy, economics, and other fields (see Ben-Akiva and Lerman, 1985; Train, 2003). A representation of this modeling framework is shown in Figure 17.1.

[Figure 17.1 The standard discrete choice model: the explanatory variables (X) and the disturbance (ε) determine the utility (U), which is manifested in the observed choice (y).]

In this figure and other figures in this chapter, observed variables are shown in rectangles while latent (or unobserved) variables are shown in ellipses. Solid arrows represent structural (or causal or behavioral) relationships, while dashed arrows represent measurement relationships. A measurement equation refers to any relationship expressing an observed variable as a function, among others, of unobserved or latent variables. In this chapter, the words individual and decision maker are used interchangeably. The utility of every alternative is a function of attributes of the alternative, which may also be interacted with characteristics of the decision maker. The structural model for individual n is as follows:

U_n = U(X_n; β, ε_n) and ε_n ~ D(0, Σ_ε)    (17.1)

where U_n denotes a vector (of dimension J × 1, where J is the number of alternatives) of total utilities of all alternatives for individual n, X_n is a J × K matrix of observed explanatory variables where K is the number of explanatory variables, β is a vector of unknown parameters, ε_n is a vector of disturbances with a distribution function D which has zero mean and a variance-covariance matrix denoted as Σ_ε, and U(·) is a utility function. A common specification of the utility function is to use an additive disturbance as follows:
U_n = V(X_n; β) + ε_n    (17.2)
where V(X_n; β) denotes the vector of systematic utilities and is generally specified as a linear function of the parameter vector β. The observed choice is a manifestation of the underlying utility. The measurement equation expresses the relationship between the observed choice and the utility. If the choice is based on utility maximization, the measurement equation for alternative i for individual n is as follows:

y_in = 1 if U_in ≥ U_jn ∀j ∈ C_n, and 0 otherwise    (17.3)
where y_in is a choice indicator equal to one if individual n chose alternative i and is zero otherwise, and C_n is the choice set of individual n. y_n denotes the choice vector with elements y_in, i = 1, …, J. The probability of the choice vector for individual n depends on the distribution of the utilities and is expressed as follows:

P(y_n | X_n; β, Σ_ε) = ∫_ε P(y_n | X_n; β, ε) f(ε | Σ_ε) dε    (17.4)

where:

P(y_n | X_n; β, ε) = 1 if (ε, X_n, β) implies that the choice vector is y_n, and 0 otherwise    (17.5)

If the parameters β randomly vary across decision makers and are distributed according to a joint probability density function f(β) with mean μ_β and variance-covariance matrix Σ_β, the choice probability becomes:

P(y_n | X_n; μ_β, Σ_β, Σ_ε) = ∫_β P(y_n | X_n; β, Σ_ε) f(β | μ_β, Σ_β) dβ    (17.6)
If ε is assumed to be independently and identically extreme value type I distributed, equation (17.6) becomes the probability expression of a random coefficient logit model, which is one of the logit mixture models presented elsewhere in this handbook. The HCM choice probability presented in the next section is also expressed as a mixture where the mixing distribution is derived from a latent variable model.
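As an illustration of equation (17.6), the sketch below approximates the random coefficient logit probability by Monte Carlo, averaging conditional logit probabilities over draws of β. The attribute matrix and the normal mixing distribution are invented for illustration.

```python
# Simulated random coefficient logit: average logit probabilities over
# draws of beta from an assumed independent normal mixing distribution.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[1.0, 2.0],   # attributes of alternative 1 (K = 2)
              [0.5, 1.0],   # alternative 2
              [0.0, 0.0]])  # alternative 3

mu_beta = np.array([0.8, -0.5])    # mean of beta, assumed
sigma_beta = np.array([0.3, 0.2])  # std. dev. of beta, assumed diagonal

def logit_probs(v):
    e = np.exp(v - v.max())  # numerically stabilized logit kernel
    return e / e.sum()

R = 10000  # number of Monte Carlo draws
draws = mu_beta + sigma_beta * rng.standard_normal((R, 2))
# Average the conditional logit probabilities over the draws of beta.
P = np.mean([logit_probs(X @ b) for b in draws], axis=0)
print(P)  # simulated unconditional choice probabilities, summing to 1
```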
2 HYBRID CHOICE MODEL FRAMEWORK
This section presents the HCM framework, including the model, the inclusion of indicators of the latent variables, normalization, and estimation.
2.1 The Model
The aim of the HCM is to extend the standard discrete choice model of Figure 17.1 to account for the effects of latent variables such as knowledge, perceptions, attitudes, choice sets, decision protocols, and so on, on choice. The HCM framework is shown in Figure 17.2.

[Figure 17.2 The HCM without indicators of the latent variables: explanatory variables (X) and latent variables (X*) determine the utility (U), which determines the choice (y).]

Disturbances and error terms are not shown in this or subsequent figures. The latent variables X* influence the utility and are a function of explanatory variables X. The HCM therefore combines a choice model with a latent variable model (expressing the latent variables as a function of explanatory variables). The latent variables X* as depicted above are endogenous. Latent variables are modeled as endogenous when one is interested in using the model for prediction, where changes in the values of the explanatory variables may change the values of the latent variables as well. One may also use exogenous latent variables (with an assumed probability distribution) in case the model is to be used to describe behavior but not to make predictions, or where the explanatory variables cannot explain the latent variables well. We present next the model formulation with continuous latent variables and discrete latent variables separately.

2.1.1 Continuous latent variables

The model consists of structural equations for the utility and the latent variables and a measurement equation (choice model). The utility is expressed as a function of observed explanatory variables and latent variables, and the latent variables are expressed as a function of explanatory variables as follows:

U_n = U(X_n, X*_n; β, ε_n) and ε_n ~ D(0, Σ_ε)    (17.7)

X*_n = X*(X_n; α, ω_n) and ω_n ~ D(0, Σ_ω)    (17.8)
where X*_n is a vector of continuous latent variables (of dimension L × 1), α is a vector of unknown parameters, ω_n is a vector of disturbances with a distribution function D which has zero mean and a variance-covariance matrix denoted as Σ_ω, X*(·) is a function, and the other terms are as previously defined. A common specification is to use additive disturbances in equations (17.7) and (17.8) and functions U(·) and X*(·) that are linear in the unknown parameters. The choice model as before expresses the choice indicators as a function of the utilities based on utility maximization as follows:

y_in = 1 if U_in ≥ U_jn ∀j ∈ C_n, and 0 otherwise    (17.9)

The conditional choice probability (conditional on the latent variables) is denoted as:

P(y_n | X_n, X*_n; β, Σ_ε)    (17.10)

If the disturbances ε are i.i.d. extreme value type I, the above probability is given by the logit model. If they are multivariate normal, the conditional choice probability is given by the probit model. Since the latent variables are unobserved, the unconditional choice probability is obtained by integrating the conditional choice probability over the distribution of the latent variables as follows:

P(y_n | X_n; β, α, Σ_ε, Σ_ω) = ∫_{X*} P(y_n | X_n, X*_n; β, Σ_ε) f(X*_n | X_n; α, Σ_ω) dX*    (17.11)
where f(X*_n | X_n; α, Σ_ω) denotes the joint probability density function of the latent variables X* and the dimensionality of the integral is equal to the number of latent variables. Note that if the utility is additive in both ε and X*, equation (17.11) may pose identification issues. As discussed later, the addition of indicators of the latent variables eases the identification of the model. To see how f(X*_n | X_n; α, Σ_ω) is derived, consider for example the l-th latent variable X*_ln with an additive and normal disturbance as follows:

X*_ln = h(X_n; α) + ω_ln and ω_ln ~ N(0, σ²_ωl)    (17.12)

where σ²_ωl denotes the variance of the disturbance ω_ln and h(·) is a function. Given the observed variables, X*_ln is normally distributed with a mean equal to h(X_n; α) and a variance equal to σ²_ωl. Its probability density function f(X*_ln | X_n; α, σ_ωl) is that of a normal variable:

f(X*_ln | X_n; α, σ_ωl) = (1 / (√(2π) σ_ωl)) exp(−(X*_ln − h(X_n; α))² / (2σ²_ωl)) = (1/σ_ωl) φ((X*_ln − h(X_n; α)) / σ_ωl)    (17.13)
where φ(·) denotes the standard normal probability density function. The rightmost equality in equation (17.13) results from the fact that (X*_ln − h(X_n; α))/σ_ωl is a standard normal variable. The joint density function of the latent variables can be expressed given the joint distribution of the disturbances ω_n in their structural equations. For example, if the disturbances are additive and their joint distribution is multivariate normal with a diagonal variance-covariance matrix, this density function is given as follows:

f(X*_n | X_n; α, Σ_ω) = ∏_{l=1}^{L} (1/σ_ωl) φ((X*_ln − h(X_n; α)) / σ_ωl)    (17.14)
where L is the number of latent variables.
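Equation (17.11) can likewise be approximated by simulation. The sketch below assumes a single latent variable with the linear-in-parameters structural equation of equation (17.12), a logit kernel, and invented coefficients: draws of X* are taken from the structural equation and the conditional logit probabilities are averaged.

```python
# Simulated HCM choice probability with one continuous latent variable:
# draw X* from its structural equation and average the conditional logit
# probabilities over the draws. All parameter values are invented.
import numpy as np

rng = np.random.default_rng(1)

x_n = 1.0            # one observed explanatory variable for individual n
alpha = 0.5          # structural coefficient of the latent variable, assumed
sigma_w = 1.0        # std. dev. of the structural disturbance (normalized)
beta_x = np.array([0.2, -0.4])    # alternative-specific effects of x, assumed
beta_star = np.array([0.9, 0.0])  # effects of the latent variable, assumed

def logit_probs(v):
    e = np.exp(v - v.max())
    return e / e.sum()

R = 10000
x_star = alpha * x_n + sigma_w * rng.standard_normal(R)  # equation (17.12)
# Conditional on each draw, utilities are linear in x and the latent variable.
P = np.mean([logit_probs(beta_x * x_n + beta_star * xs) for xs in x_star],
            axis=0)
print(P)  # unconditional choice probabilities for the two alternatives
```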
2.1.2 Discrete latent variables

Latent variables can represent discrete constructs such as decision protocols, choice sets, levels of sensitivity to attributes, lifestyles, or generally different unobserved segments of the population or ‘latent classes’ (Gopinath, 1995; Gopinath and Ben-Akiva, 1997). Every latent class will have its own choice model to allow for different taste parameters, explanatory variables, choice sets, and so on across classes. The model, called a latent class choice model, consists of two models that are jointly estimated: a class membership model which predicts the probability of belonging to any given class, and class-specific choice models which predict the choice given membership in a certain class. The number of classes can be set a priori by the researcher (for example, based on behavioral considerations) or is based on empirical considerations. We focus here specifically on behavioral discrete mixtures. The specification of the class probability is based on the theory of unobserved criterion functions (Swait and Ben-Akiva, 1987a, 1987b) which map a vector of observed variables into a vector of continuous latent variables (or random criteria) which are then related to the discrete latent variables or classes. For example, if the latent class represents a choice set, it may be postulated that an alternative will be considered if a set of unobserved criteria or constraints related to that alternative are met. If every alternative i is associated with a set of K_i criteria, the expression of the value H_kin of the k-th criterion may be written as follows:

H_kin = H(X_in; α, ω_kin)    (17.15)

where H(·) is a function, X_in is a vector of explanatory variables including sociodemographic variables or attributes of the alternative, α is a vector of unknown parameters, and ω_kin is a disturbance. A choice set C will therefore be chosen by an individual if the set of unobserved criteria of each alternative in the choice set are met (for example, where H_kin ≥ 0, ∀k ∈ K_i, ∀i ∈ C). For example, in the case of choice sets representing modes, the constraints or criteria may relate to the number of cars at home relative to the number of drivers, the distance to transit, and so on, and the interpretation of the unobserved constraints is that the corresponding modes are available if the corresponding factors exceed certain unobserved individual-specific thresholds (for example, distance
to transit less than a certain threshold). Assumptions about the distribution of the disturbances ω in the criterion functions are needed to specify the probability of choosing a certain choice set. As another example, consider the choice of a decision protocol (such as utility maximization, random choice, satisficing, and so on). In this case, every decision protocol will have an unobserved criterion function (for example, related to time pressure, education, and so on), and the protocol with the highest value of the criterion function is the one chosen. Let Q(s | X_n; α, Σ_ω) denote the probability of belonging to class s (for example, choosing a certain decision protocol or a choice set). This probability is dependent on the explanatory variables X_n, unknown parameters α, and the variance-covariance matrix Σ_ω of the disturbances ω in the unobserved criterion functions. Conditional on belonging to class s, the choice probability is denoted as P(y_n | X_n; β_s, Σ_ε), where the parameters β_s are specific to class s. Since the actual class to which an individual belongs is unobserved, the unconditional choice probability for individual n is obtained by mixing the conditional choice probability over the probability distribution of the latent classes:

P(y_n | X_n; β, α, Σ_ε, Σ_ω) = Σ_{s=1}^{S} P(y_n | X_n; β_s, Σ_ε) Q(s | X_n; α, Σ_ω)    (17.16)

where S is the total number of classes.
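The mixture in equation (17.16) is straightforward to compute once the class membership and class-specific models are specified. The sketch below assumes a logit class membership model and two class-specific logit choice models, with all parameter values invented.

```python
# A minimal latent class choice probability: a logit class membership model
# Q(s|X) mixed with class-specific logit choice models P(y|X; beta_s).
import numpy as np

def logit_probs(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

x_n = np.array([1.0, 0.0])          # characteristics driving class membership
gamma = np.array([[0.5, -0.2],      # class membership coefficients, assumed
                  [0.0, 0.0]])      # class 2 normalized to zero
Q = logit_probs(gamma @ x_n)        # probability of belonging to each class

X = np.array([[2.0], [1.0]])        # one attribute for two alternatives
betas = [np.array([-0.8]),          # class 1: cost-sensitive, assumed
         np.array([-0.1])]          # class 2: nearly cost-insensitive

# Mix class-specific choice probabilities over the class membership model.
P = sum(Q[s] * logit_probs(X @ betas[s]) for s in range(2))
print(P)  # unconditional choice probabilities
```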
2.2 Introducing Indicators of the Latent Variables
While the latent variables are unobserved, the analyst can get indicators or measures of these latent variables from surveys. Examples of statements that measure the latent variables ‘perception of comfort’ and ‘perception of convenience’ of travel by public transportation are ‘Traveling by public transportation is comfortable’ or ‘Traveling by public transportation is inconvenient’, where respondents rate their level of agreement with these statements. The responses to these statements are thus manifestations of the individuals’ latent perceptions of the comfort and convenience of traveling by public transportation. Similarly, examples of statements that measure attitudes are ‘I wouldn’t mind having to make a transfer when traveling by public transportation’ or ‘We should raise the price of gasoline to reduce congestion and air pollution’. The indicators of the latent variables do not have a causal relationship with behavior. They are used only at the estimation stage to ease the identification of the model but not during model application. The HCM framework with indicators is shown in Figure 17.3. Thus, Figure 17.3 is the setting applied for estimation, while Figure 17.2 and the models presented in section 2.1 are used in application. Note that in Figure 17.3 the causality goes from the latent variables to the indicators, not the other way around, and the indicators are shown in rectangles because they are observed variables. See Ashok et al. (2002), McFadden (1986), Morikawa et al. (2002) and Walker (2001) for a discussion of statistical issues involved when the indicators are included directly in the utility equations (without latent variables), or when the extracted latent variables are used as explanatory variables in the utility equations but their distribution is not accounted for in the derivation of the choice probability.
[Figure: explanatory variables (X) and latent variables (X*) determine the utility (U), which determines the choice (y); the latent variables are additionally measured by the indicators (I).]
Figure 17.3 The HCM with indicators of the latent variables

We next discuss the changes in the model formulation with the addition of the indicators. We consider the case of continuous indicators and continuous latent variables. The example presented in section 3.2 shows how information from indicators can be incorporated in the formulation when the latent variables are discrete. For the case of discrete indicators, the reader is referred to Daly et al. (2012). The indicators are expressed as a function of the latent variables (and can also be a function of observed explanatory variables) as follows:

$$I_n = I(X_n, X^*_n; \lambda, u_n) \quad \text{and} \quad u_n \sim D(0, \Sigma_u) \qquad (17.17)$$
where I_n is a vector of indicators (of dimension R × 1), λ is a vector of unknown parameters, u_n is a vector of error terms with a distribution function D which has zero mean and a variance-covariance matrix denoted as Σ_u, and I(.) is a function usually specified as linear in the parameters and with an additive error term. Every indicator can be expressed as a function of one or more latent variables.

It is typically assumed that the disturbances ε (in the utility equations) are not correlated with the error terms u (in the measurement equations of the latent variables). This is the conditional independence assumption, which means that the correlation between the choice and the indicators arises from their dependence on the latent variables X*, but conditional on the latent variables, the choice and indicators are independent. It is also typically assumed that the disturbances w are not correlated with ε and u (see, for example, Everitt, 1984).

We next express the joint probability of the choice and the indicators. The indicators of the latent variables reveal information about the latent variables and ease the identification of the model. Therefore, it is advantageous to write the joint probability of the choice and the indicators (and then later in estimation to maximize the sample likelihood given both the choice and the indicators). As stated earlier, the choice and the indicators are correlated through their dependence on the latent variables X*_n. Therefore, the joint probability is not equal to the product of the unconditional probabilities of the choice and the indicators. However, when conditioned on the latent variables, the choice and the indicators are independent. Therefore, the joint probability of the choice and the indicators of the latent variables is expressed as the product of the conditional choice probability and the conditional density function of the indicators, integrated over the density of the latent variables, as follows:

$$P(y_n, I_n \mid X_n; \beta, \alpha, \lambda, \Sigma_e, \Sigma_w, \Sigma_u) = \int_{X^*} P(y_n \mid X_n, X^*_n; \beta, \Sigma_e)\, g(I_n \mid X_n, X^*_n; \lambda, \Sigma_u)\, f(X^*_n \mid X_n; \alpha, \Sigma_w)\, dX^* \qquad (17.18)$$

where g(I_n | X_n, X*_n; λ, Σ_u) denotes the joint density function of the indicators conditional on the latent variables. For example, if the measurement error terms in equation (17.17) are additive and normally distributed with zero covariances, the joint density function becomes a product of univariate normal density functions.
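The integral in equation (17.18) generally has no closed form. The following is a minimal Monte Carlo sketch for the simplest possible case, assuming a single normally distributed latent variable, a binary logit choice kernel and one continuous indicator; all functional forms and numbers are illustrative, not the chapter's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_prob(y, I, x, beta, alph, lam, kappa, sigma_u, n_draws=2000):
    """Simulate equation (17.18) for one latent variable X* ~ N(alph*x, 1)."""
    x_star = alph * x + rng.standard_normal(n_draws)   # draws from f(X* | X)
    v = beta[0] + beta[1] * x_star                      # utility with latent variable
    p_choice = 1.0 / (1.0 + np.exp(-v))                 # P(y=1 | X, X*)
    p_choice = p_choice if y == 1 else 1.0 - p_choice
    resid = I - kappa - lam * x_star                    # measurement residual
    g = np.exp(-0.5 * (resid / sigma_u) ** 2) / (sigma_u * np.sqrt(2 * np.pi))
    return np.mean(p_choice * g)                        # average over the draws

print(joint_prob(y=1, I=0.7, x=1.2, beta=(0.1, 0.8), alph=0.5,
                 lam=1.0, kappa=0.0, sigma_u=0.6))
```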
2.3 Normalization
To summarize, the unknown parameters in the model are: the coefficients and variance-covariance matrix of the disturbances in the utility equations (β and Σ_e), the coefficients and variance-covariance matrix of the disturbances in the structural equations of the latent variables (α and Σ_w), and the coefficients and variance-covariance matrix of the error terms in the measurement equations of the latent variables (λ and Σ_u). We discuss in this section restrictions that should be imposed to set the scale of the latent variables in the HCM.

A general rule for identification of the HCM is to ensure that both the choice model and the latent variable model are identified. However, no general conditions for the identification of the HCM have been established. Identification issues are discussed in more detail in another chapter in this handbook. Further references on identification include Bollen (1989) dealing with identification conditions for structural equation models; Bolduc et al. (2005), Raveau et al. (2012), Walker (2001) and Walker et al. (2007) dealing with identification conditions for mixture models; and Daly et al. (2012) for the case where the indicators of the latent variables are discrete.

Since the latent variables are unobserved, their scale needs to be fixed. The scale of the utility is set by normalizing the variance of its disturbance. For every other latent variable, the normalization can be done in one of two ways. The first method is to normalize the variance of the latent variable (that is, the variance terms in Σ_w in equation (17.8) for the case of continuous latent variables and equation (17.15) for the case of discrete latent variables), as is done for the utility, to a certain value such as 1. The second method is to set the scale of a latent variable to be the same as the scale of one of its indicators. Consider the latent variable X*_1n. If the measurement equation is linear in the unknown parameters and additive in the error term (for example, I_1n = κ_1 + λ_1 X*_1n + u_1n in the case of continuous indicators), this is typically done by fixing the factor loading (that is, λ_1) to 1. Both methods are often used in practice, but the advantage of fixing the factor loading to 1 is that it eases the interpretation of the scale of the latent variable in terms of a particular measurement. If the factor loading is normalized, it is preferable
to normalize the factor loading in the equation of the indicator that is believed to be the most reliable indicator of the latent variable out of all available indicators. Otherwise, there is the risk of normalizing to 1 a parameter whose true value may be close to zero in case of a weak association between the indicator and the latent variable.
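To see why such a normalization is needed, consider (as a sketch, using the notation above) rescaling the latent variable by an arbitrary constant c > 0 while adjusting the loading:

$$I_{1n} = \kappa_1 + \lambda_1 X^*_{1n} + u_{1n} = \kappa_1 + \frac{\lambda_1}{c}\,(c\,X^*_{1n}) + u_{1n}, \qquad c > 0$$

Scaling X*_1n by c (which multiplies α and σ_w by c) while dividing λ_1 by c leaves the likelihood unchanged, so the loading and the scale of the latent variable are only identified jointly; fixing either the disturbance variance or one factor loading removes this indeterminacy.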
2.4 Estimation
The HCM can be estimated using two approaches: sequential and simultaneous estimation. The sequential estimation method consists of two stages. In the first stage, the latent variable model is estimated, and the latent variables and their distribution are extracted. In the second stage, the fitted latent variables are used as explanatory variables in the choice model, and the choice probability conditional on the latent variables is integrated over the distribution of the fitted latent variables to obtain the unconditional choice probability (see McFadden, 1986; Morikawa et al., 2002). In the simultaneous estimation method, the latent variable model and the choice model are jointly estimated. Both methods result in consistent parameter estimates, but the parameter estimates obtained in the simultaneous estimation method are more efficient.

The estimation of the model is typically done through maximum likelihood. The probability expressions derived earlier (of the choice, or the choice and indicators of the latent variables) can be used to write the sample likelihood function as usual. When the latent variables are continuous, the dimension of the integral of the probability function in equations (17.11) and (17.18) is equal to the number of latent variables. When there are three or more latent variables, numerical integration becomes computationally burdensome and so the integral can be approximated through Monte Carlo integration by drawing from the distribution of the latent variables. Maximum simulated likelihood (MSL) then consists of maximizing the simulated log-likelihood function. For more details on estimation by simulation and the properties of the resulting estimator, the reader is referred to Train (2003).

After model estimation, the values of the latent variables may be extracted. If the purpose is to use the extracted values for prediction (for example, estimating change in behavior owing to a change in the value of a latent variable), the structural equation of the latent variable can be used. The systematic part of the structural equation can be used as the fitted value of the latent variable; a disturbance can also be simulated for every individual and added to the systematic part. If the purpose of the analysis is to extract the values of the latent variables not in a prediction context, the measurement equations can also be used with or without the structural equation of the latent variable. See Gopinath (1995) for details on this extraction method.

Measures of goodness-of-fit can be computed for the overall model as well as for specific model components. For the overall model, goodness-of-fit statistics include the log-likelihood over the choice and the indicators, the rho-squared and the Akaike criterion (see Ben-Akiva and Lerman, 1985). It is also useful to assess the goodness-of-fit of specific model components such as the likelihood of the observed choice and other measures of goodness-of-fit of the structural and measurement equations of the latent variables. For the observed choice, one can compute the choice log-likelihood (using equations (17.11) or (17.16) with the estimated parameter values to compute the choice probability for a given individual) and the
corresponding rho-squared and Akaike criterion. This is useful if one is interested in comparing the goodness-of-fit of the HCM to that of a choice-only model without latent variables and indicators, in which case the total log-likelihood (over the choice and the indicators) of the HCM cannot be directly compared to that of the choice-only model. For the structural equations of the latent variables and for every measurement equation of a latent variable, a measure of squared multiple correlation (‘pseudo’ R²) can be computed as follows, with a higher value indicating a better fit:

$$\text{Pseudo } R^2 = 1 - \frac{\text{error variance}}{\text{variance of the dependent variable of the equation}} \qquad (17.19)$$
For the measurement equations (assuming continuous indicators), the error variance is estimated and the variance of the dependent variable (that is, the indicator) can be obtained from the sample data. For the structural equation of a latent variable, the variance of the disturbance is also estimated or normalized, but the variance of the dependent variable (that is, the latent variable) needs to be computed in one of two ways: (1) either using the measurement equations, whereby the total variance of an indicator is expressed as the sum of the variance of the error term and the square of the factor loading multiplied by the variance of the latent variable; (2) or using the structural equation, where the variance of the latent variable can be computed given the variances of the explanatory variables based on the sample data, the variance of the disturbance in the structural equation, and the estimated parameters. Both methods should yield equivalent values of the variance of the latent variable. The pseudo R² of the measurement equations can give an indication as to which of the indicators provide good measurements of the latent variable, potentially leading to the removal of ‘weak’ indicators from the model. And the pseudo R² of the structural equation will indicate if the variables in the structural equation explain the latent variable adequately.
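A minimal sketch of how equation (17.19) and the two variance computations described above might be carried out for one latent variable; all values are hypothetical estimates, chosen so that the two methods agree:

```python
import numpy as np

# Hypothetical estimates for a structural equation X* = a1*x1 + a2*x2 + w
# and a measurement equation I1 = k1 + lambda1*X* + u1 (lambda1 fixed to 1).
a = np.array([0.6, -0.3])
var_w = 1.0                      # normalized disturbance variance
lambda1, var_u1 = 1.0, 0.8       # loading and measurement error variance
cov_x = np.array([[1.2, 0.2],    # sample covariance of the explanatory variables
                  [0.2, 0.9]])
var_I1 = 2.24                    # sample variance of the indicator

# Method (2): structural equation.
var_xstar_structural = a @ cov_x @ a + var_w
# Method (1): measurement equation, Var(I1) = lambda1^2 Var(X*) + Var(u1).
var_xstar_measurement = (var_I1 - var_u1) / lambda1**2

pseudo_r2 = 1.0 - var_w / var_xstar_structural   # equation (17.19)
print(var_xstar_structural, var_xstar_measurement, pseudo_r2)
```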
2.5 Prediction
Once a HCM is estimated, the model as shown in Figure 17.2 and the corresponding equations (17.11) and (17.16) (in the case of continuous or discrete latent variables, respectively) can be used to make predictions. The quality of the prediction will depend on the explanatory power of the structural equations of the latent variables. In this regard, there is also a concern about the stability over time of the relationship between the latent variables X* and the observed variables X. If this relationship changes over time, the analyst can judgmentally shift the distribution f(X* | X), for example representing a general attitudinal shift by adjusting the constant in the structural equation of the latent variable.
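As a sketch of such a judgmental shift, the toy model below (a hypothetical binary logit in which a single latent variable enters the utility) adds a constant delta to the structural equation and recomputes the average predicted probability:

```python
import numpy as np

rng = np.random.default_rng(1)

def predicted_share(delta, n_draws=10000):
    """Average choice probability after shifting the structural constant by delta."""
    x = 1.0                                                         # explanatory variable (hypothetical)
    x_star = 0.3 + 0.5 * x + delta + rng.standard_normal(n_draws)   # shifted f(X* | X)
    v = -0.2 + 0.9 * x_star                                         # utility of alternative 1
    return np.mean(1.0 / (1.0 + np.exp(-v)))

print(predicted_share(0.0))   # base prediction
print(predicted_share(0.4))   # after an assumed general attitudinal shift
```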
3 ADVANTAGES OF THE HYBRID CHOICE MODEL
This section presents the advantages of the HCM and illustrates them with specific case studies. These advantages are the ability to explicitly model unobserved heterogeneity, increased efficiency, enhanced behavioral realism and extended policy relevance. The examples cover both continuous and discrete latent variables.
3.1 Unobserved Taste Heterogeneity
One criticism of the standard discrete choice model is that it does not adequately capture taste heterogeneity, that is, the fact that different people have different sensitivities to attributes of the alternatives. In the standard model, most commonly the taste parameters (β in equation (17.1)) are specified as constants that do not vary over individuals, and taste heterogeneity is represented by interacting socio-demographic variables with alternative attributes. This method captures systematic taste heterogeneity. Another method, which captures random taste heterogeneity and has been increasingly adopted over the past few years, is the use of a mixture model (as in equation (17.6)) where the parameters are distributed across the population with means and variances that are estimated. The HCM provides an extension of mixture models to capture random heterogeneity through behavioral mixture models, whereby the distribution of a parameter is explained as a function of a behavioral latent variable such as an attitude (see also Walker and Ben-Akiva, 2011).

Consider, for example, different approaches for modeling the heterogeneity in the value of travel time savings (VTTS), which is equal to the ratio of the travel time coefficient to the travel cost coefficient in a mode choice model. Heterogeneity in VTTS arises from different sources. The standard model is able to accommodate systematic heterogeneity, that is, heterogeneity that is attributed to observed variables such as trip purpose, income, and so on. This is typically done by, for example, estimating different models by trip purpose or interacting the socio-demographic variable with time or cost. In equation (17.20), which expresses the utility of alternative i for individual n, cost is divided by income to capture the fact that individuals with higher income are less cost sensitive and have a higher value of time (or VTTS). In this case, individuals with the same income will have the same VTTS, which is given in equation (17.21) for individual n.

$$U_{in} = \beta_1 \text{Time}_{in} + \beta_2 \text{Cost}_{in}/\text{Income}_n + \ldots + \varepsilon_{in} \qquad (17.20)$$

$$VTTS_n = \frac{\beta_1}{\beta_2} \times \text{Income}_n \qquad (17.21)$$
Heterogeneity in VTTS may also arise from unobserved sources. One way to model unobserved heterogeneity is to use a mixture model where the coefficient of travel time or travel cost in a mode choice model is distributed across the population (for example, lognormal) and thus every individual in the population would have a different value of time, but the cause of the heterogeneity is left unexplained. For example, if the cost coefficient is randomly distributed (β_2n below), the utility function is:

$$U_{in} = \beta_1 \text{Time}_{in} + \beta_{2n} \text{Cost}_{in}/\text{Income}_n + \ldots + \varepsilon_{in} \qquad (17.22)$$
And the VTTS of individual n is:

$$VTTS_n = \frac{\beta_1}{\beta_{2n}} \times \text{Income}_n \qquad (17.23)$$
Finally, in the HCM, the heterogeneity in the cost coefficient β_2n is given meaning by introducing a latent variable that is responsible for this heterogeneity. For example,
one may postulate that the VTTS varies depending on the attitude an individual holds towards travel modes. The more an individual loves traveling by car, the less sensitive he/she is towards travel cost by car, all else being equal. Accordingly, the utility equation (17.20) of the standard model is modified by introducing an interaction of cost/income with the latent variable representing the attitude towards the modal alternative as follows:

$$U_{in} = \beta_1 \text{Time}_{in} + \beta_2 \text{Cost}_{in}/\text{Income}_n + \beta_3 (\text{Cost}_{in}/\text{Income}_n) \times \text{Attitude}_{in} + \ldots + \varepsilon_{in} \qquad (17.24)$$

where Attitude_in represents the attitude of individual n towards alternative i. In this formulation, the sensitivity to cost includes the value of the attitude, so that the overall cost coefficient is (β_2 + β_3 Attitude_in)/Income_n and the VTTS for individual n and alternative i is given as follows:

$$VTTS_{in} = \frac{\beta_1}{\beta_2 + \beta_3 \text{Attitude}_{in}} \times \text{Income}_n \qquad (17.25)$$
Comparing equations (17.21) and (17.25), we note that even individuals with the same income will have different values of travel time savings. While this was also the case when a mixture model was used (as in equation (17.23)), the heterogeneity in the VTTS is now attributed directly to variation in the attitude. This is called a behavioral mixture model.

The above methodology was applied to model heterogeneity in VTTS using a stated preferences (SP) experiment conducted in Stockholm, Sweden, which involved the choice between the current (or RP) car commute, an alternative car commute, or indifference between the two options. The alternatives varied by travel time, travel cost, and the presence of speed cameras along the route. A HCM was estimated, where the utilities of the car alternatives were a function of these three attributes (with cost divided by income included alone as well as interacted with a ‘car-loving’ attitude, which is a continuous latent variable). The structural equation of the attitude expressed the attitude as a function of socio-demographic variables including gender, income, age and education. Four ordinal measures of the attitude were used as indicators (measuring on a five-point scale perceptions of the safety and comfort of the car and attitudes towards speed limits, which together were taken as measuring an overall attitude towards the car). For comparison purposes, a choice-only base model was estimated without the latent attitude and its indicators. The full estimation results are available in Abou-Zeid et al. (2010).

Figure 17.4 shows the cumulative distribution function (cdf) of VTTS for three income groups for the case of the standard or base model without latent attitudes (dashed curve) and the case of the HCM with latent attitudes and indicators (solid curve). For a given income level, the cdf for the standard model is concentrated at one point, since all individuals with the same income have the same VTTS, while it is continuously increasing for every income level when the attitude is included. Moreover, including the attitude allows the model to capture greater variability in VTTS as income increases.
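A minimal sketch of the mechanics behind Figure 17.4, assuming hypothetical coefficient values and a standard normal ‘car-loving’ attitude; equation (17.25) then spreads the VTTS within a single income group:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical coefficients for equation (17.25): time (per min) and cost
# (with cost divided by income, income in units of 10,000 SEK/month).
b1, b2, b3 = -0.03, -0.05, -0.015
income = 20000.0 / 10000.0            # income group: 20,000 SEK/month

# Draws of the latent 'car-loving' attitude, clipped so that the overall
# cost coefficient b2 + b3*attitude stays negative in this toy example.
attitude = np.clip(rng.standard_normal(100000), -2.5, 2.5)

# Equation (17.25): VTTS varies with the attitude even at fixed income.
vtts_per_hr = b1 / (b2 + b3 * attitude) * income * 60.0

# Empirical cdf points of the kind plotted in Figure 17.4.
for q in (0.25, 0.50, 0.75):
    print(q, round(float(np.quantile(vtts_per_hr, q)), 1))
```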
[Figure: three panels of cdfs of VOT (SEK/hr, 0–300), one for each income group (7,500–15,000 SEK/month, 25,000–30,000 SEK/month, and above 50,000 SEK/month), each comparing the base model with the model with latent attitudes.]
Source: Abou-Zeid et al. (2010).
Figure 17.4 Cdf of VTTS for the standard (base) model and the model with latent attitudes for three income levels expressed in Swedish Kronor per month
3.2 Efficiency
Compared to the standard discrete choice mixture model, the behavioral mixture model enables the analyst to use indicators, which improve the efficiency of the model because they provide more information about the latent variables. In this section, we discuss efficiency in the context of a mode choice example with unobserved choice sets taken from Ben-Akiva and Boccara (1995), where the full model specification and further background are available. This example also differs from the previous example in that the latent variables representing choice sets are discrete (indicators are also available from a survey). In what follows, we describe the motivation, show the relevant equations including how measurement equations can be specified when the latent variables are discrete, present estimation results comparing models with and without the latent variables and with or without indicators, and compare the models in terms of efficiency.

3.2.1 Motivation
A standard discrete choice model assumes that the choice set can be predicted deterministically for every individual. However, choice sets are better represented as latent variables, because in addition to observed socio-demographic variables that determine the choice set (for example, car availability, driver’s license, and so on in the context of mode choice), the perceived availability of alternatives may depend on subjective factors like the individual’s travel attitudes and perceptions of the attributes of the modes. The approach used to model choice set generation is based on the concept of random constraints discussed earlier (Swait and Ben-Akiva, 1987a, 1987b). It is postulated that an individual perceives a certain alternative to be available only if a number of individual-specific constraints related to that alternative are satisfied (for example, in the case of transit, the constraints may be related to walking distance to the bus stop, travel time, and so on, and they are satisfied when the corresponding variables exceed certain individual-specific latent thresholds). Since different individuals may have different availability criteria, these constraints or criteria are latent, and so the availability of an alternative is also latent.

3.2.2 General formulation
Choice set (class membership) model
Let K_i denote the set of constraints related to the availability of alternative i, H_kin the value of the kth criterion or constraint for alternative i and individual n, and A*_in the latent availability of alternative i for individual n (A*_in is equal to 1 if the alternative is available and 0 otherwise). H_kin, which was given in equation (17.15) in a generic form, can be expressed as the difference of a systematic part h_kin and a random part w_kin as follows:

$$H_{kin} = h_{kin} - w_{kin} \qquad (17.26)$$
The probability that alternative i is available for individual n can then be expressed as the probability that all constraints related to alternative i are satisfied:

$$\Pr(A^*_{in} = 1) = \Pr(H_{kin} \geq 0,\ \forall k \in K_i) \qquad (17.27)$$
Conditional on having a non-empty choice set, the probability that choice set C is considered by individual n can then be expressed as follows (see Ben-Akiva and Boccara, 1995):

$$P_n(C) = \frac{\Pr\big(\{A^*_{in} = 1,\ \forall i \in C\} \cap \{A^*_{jn} = 0,\ \forall j \in M_n \setminus C\}\big)}{1 - \Pr(A^*_{ln} = 0,\ \forall l \in M_n)} \qquad (17.28)$$
where M_n represents the set of all deterministically feasible alternatives for individual n, and M_n\C contains the alternatives that are in M_n but not in C. In the latent class terminology of section 2.1.2, equation (17.28) represents the class membership model Q(s | X_n; α, Σ_w) (with a certain choice set C representing a class), which is given behavioral meaning using the random constraints approach.

Choice probability
The unconditional choice probability P_n(i) is obtained by mixing the conditional choice probability P_n(i | C) given a choice set over the probability distribution of the choice sets as follows:

$$P_n(i) = \sum_{C \in G_n} P_n(i \mid C)\, P_n(C) \qquad (17.29)$$
where G_n represents the set of all non-empty subsets of M_n.

Introducing indicators of alternative availabilities
Indicators of the availability of the alternatives can be obtained from a survey using an ordinal scale of availability (for example, ‘never available’ to ‘always available’) or a binary scale (available or not). Since perceived availability may be related to actual availability as well as to the desirability (for example, the utility or the choice probability) of an alternative, these indicators can then be expressed as follows:

$$I^*_n = I^*(H_n, U_n) + u_n \qquad (17.30)$$
where I*_n denotes a vector of latent response variables underlying the ordinal or binary observed availability indicators I_n, H_n is the matrix of all latent constraints considered by individual n, U_n is the vector of utilities of all alternatives considered by individual n, and u_n is a vector of error terms. The log-likelihood for individual n given both the choice and indicators is expressed by considering all possible combinations of responses to the availability questions and the actual choice:

$$L_n = \sum_{i=1}^{J} \Big[ I_{in} y_{in} \ln P_n(i) + I_{in} (1 - y_{in}) \ln \Pr(I_{in} = 1, y_{in} = 0) + (1 - I_{in})(1 - y_{in}) \ln \Pr(I_{in} = 0, y_{in} = 0) \Big] \qquad (17.31)$$
where J represents the number of alternatives in the universal choice set and Iin is equal to 1 if the individual stated that alternative i is available and 0 otherwise. The relevant expressions are available in Boccara (1989).
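To illustrate equations (17.27)–(17.29), here is a minimal sketch assuming one logistic random constraint per alternative (as in equation (17.32) below) and constraints that are independent across alternatives, unlike the DA/SR dependence used in the actual application; utilities and constraint values are made up:

```python
import numpy as np
from itertools import combinations

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Availability probabilities Pr(A*_i = 1) = Pr(h_i - w_i >= 0) with logistic w_i.
h = np.array([0.8, 1.5, -0.2])           # systematic parts of the constraints
p_avail = logistic(h)                     # equation (17.27), one constraint per mode

V = np.array([0.5, 0.1, -0.3])            # systematic utilities
M = range(3)                              # deterministically feasible alternatives

p_choice = np.zeros(3)
denom = 1.0 - np.prod(1.0 - p_avail)      # 1 - Pr(empty choice set)
for r in range(1, 4):
    for C in combinations(M, r):
        in_C = np.isin(np.arange(3), C)
        # Equation (17.28), assuming independent constraints across alternatives.
        p_C = np.prod(np.where(in_C, p_avail, 1.0 - p_avail)) / denom
        # Logit conditional on choice set C, then equation (17.29).
        expV = np.where(in_C, np.exp(V), 0.0)
        p_choice += p_C * expV / expV.sum()

print(p_choice, p_choice.sum())           # unconditional probabilities, sums to 1
```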
3.2.3 Estimation
The above framework was applied to model mode choice among drive alone (DA), shared ride (SR), and transit (T). The dataset included the following binary indicators related to alternative availabilities: (i) Is drive alone available for your trip? (ii) Is shared ride available for your trip? (iii) Is transit available for your trip? The following three models were estimated:
● a logit choice model with deterministic choice sets (DA unavailable if the individual has no driver’s license; SR and T always available);
● a probabilistic choice set model (PCS) combining a choice set model (choice sets including {DA, SR, T}, {DA, SR}, {SR}, {SR, T} and {T}) and a logit choice model conditional on the choice set;
● an integrated model which combines the PCS with measurement equations expressing the availability indicators as a function of the latent availabilities and desirability of the alternatives, that is, a PCS model with indicators.
All three models have the same utility specifications with the following attributes and characteristics: in-vehicle travel time, out-of-vehicle travel time divided by distance by auto, cost, number of cars divided by number of driving-age individuals in the household, and walking distance to transit. Overall, eight parameters in the utilities are estimated. In the PCS and integrated models, one random constraint is used for each mode i as shown in equation (17.32), where a_1i and a_2i are parameters to be estimated, and the disturbance w_1in is assumed logistic with a location of 0 and a scale of 1. It is assumed that DA is perceived to be available if the number of cars divided by the number of driving-age individuals in the household (x_1in in equation (17.32)) exceeds a certain threshold, while transit is available if the walking distance to transit (x_1in in equation (17.32)) is below a certain threshold. It is assumed that SR is available when DA is available; conditional on DA being unavailable, the systematic part of the SR constraint consists of a constant term only. Thus, five additional parameters are to be estimated in the random constraints equations (that is, a_1 and a_2 for each of DA and transit, and a_1 for SR).

$$H_{1in} = a_{1i} + a_{2i} x_{1in} - w_{1in} \qquad (17.32)$$
Finally, in the integrated (PCS + indicators) model, the measurement equation takes the following form:

$$I^*_{in} = (\lambda_{1i} + \lambda_{2i} P_{in})\, A^*_{in} + (\lambda_{3i} + \lambda_{4i} P_{in})(1 - A^*_{in}) - u_{in} \qquad (17.33)$$
where P_in denotes the probability that individual n chooses alternative i from the universal choice set, and λ_1i, λ_2i, λ_3i and λ_4i are parameters to be estimated. Thus, compared to the PCS model, the integrated model contains four additional parameters per alternative.

Table 17.1 shows the parameter estimates and t-statistics of the parameters in the structural equations of the three models: utility equations, and random constraints in the PCS and integrated models. The coefficients of the measurement equations are not shown.
Handbook of choice modelling
Table 17.1 Estimation results for the logit, PCS, and integrated models Variable
Logit
PCS
Integrated (PCS with indicators)
Parameter t-statistic Parameter t-statistic estimate estimate Utilities Constant for DA Constant for SR In-vehicle travel time (min) Out-of-vehicle travel time/ distance (min/miles) Cost (cents) Number of cars/number of driving-age individuals (specific to DA) Number of cars/number of driving-age individuals (specific to SR) Distance to transit (miles; specific to transit) Availability constraints DA: constant (a1) DA: Number of cars/ number of driving-age individuals (a2) SR: constant (a1) Transit: constant (a1) Transit: Distance to transit (miles) (a2)
Parameter t-statistic estimate
−1.61 −2.80 −0.48
−3.63 −6.75 −3.20
−4.83 −4.83 −0.23
−1.90 −2.15 −2.17
−7.85 −7.40 −0.25
−4.20 −4.53 −5.85
−0.84
−3.76
−5.66
−3.18
−7.29
−5.85
−0.18 4.25
−0.77 6.39
0.02 −0.44
0.29 −0.24
0.04 1.48
1.02 1.19
3.86
5.85
−1.06
−0.57
0.43
0.51
−1.07
−1.61
1.02
0.18
−2.90
−1.58
6.61 16.05
1.72 4.89
8.01 18.88
5.36 5.99
7.11 1.19 1.68
2.47 1.99 1.84
5.08 1.05 0.75
3.81 5.18 0.62
Source: Ben-Akiva and Boccara (1995).
There seems to be a difference in magnitude between the parameter estimates of the logit model and those of the PCS model, possibly reflecting a scale effect. The parameter estimates of the PCS and those of the integrated model are closer in magnitude. The signs are generally according to expectations, or if not, the corresponding variables are insignificant. One interesting result is that the car availability variable has a positive coefficient and is highly significant in the utility equation of the logit model, but not in the PCS and integrated models. In the latter two models, this variable is instead a highly significant predictor of the perceived availability of DA and SR, as one would expect.

3.2.4 Efficiency
Using indicators of the latent variables in a HCM adds to the information content of the data and is expected to result in a gain in efficiency if the measurement equations are
correctly specified. Referring to the example considered in this section, one can expect that the integrated model, which has indicators, is more efficient than the PCS model. This gain in efficiency can be demonstrated by comparing the variance-covariance matrices of the parameter estimators. Considering the estimators of the common set of 13 parameters in the structural equations of the PCS and integrated models, denoted as $\hat{\beta}_{PCS}$ and $\hat{\beta}_{Integrated}$, respectively, the difference in their variance-covariance matrices, that is, $\hat{\Sigma}_{\hat{\beta}_{PCS}} - \hat{\Sigma}_{\hat{\beta}_{Integrated}}$, turns out to be positive definite (has positive eigenvalues), which shows that the integrated model is more efficient.

In addition, one can compare the t-statistics of the parameter estimates. If both models are consistent (see below), their parameter estimates should be close to each other, yet the more efficient model will have lower standard errors and hence higher t-statistics. Referring to Table 17.1, the integrated model is more efficient as indicated by the higher t-statistics of the parameter estimates in the integrated model for those variables that are common and significant in both models. Finally, one can compare the standard errors of the predicted choice probabilities of the HCM and a mixture model without indicators. A more efficient model will result in smaller standard errors. For a given individual, the distribution of the choice probabilities can be simulated by drawing from the multivariate distribution of the parameter estimators (using the parameter estimates as their means and the estimated variance-covariance matrix).2

Having established that the integrated model is more efficient than the PCS, a Hausman specification test (Hausman, 1978) can be used to check for misspecification of the integrated model. The null hypothesis is that the difference in the parameter estimates of these two models is zero. The test is conducted using the common set of 13 parameters in the structural equations of the two models. The test statistic is:

$$(\hat{\beta}_{PCS} - \hat{\beta}_{Integrated})' \big(\hat{\Sigma}_{\hat{\beta}_{PCS}} - \hat{\Sigma}_{\hat{\beta}_{Integrated}}\big)^{-1} (\hat{\beta}_{PCS} - \hat{\beta}_{Integrated}) \qquad (17.34)$$
This test statistic is chi-squared distributed with 13 degrees of freedom. The value of the above statistic turns out to be 3.61, which is smaller than the critical value of 7.04 at the 90 percent level of confidence. Therefore, the null hypothesis cannot be rejected, indicating that the parameter estimators of the integrated model are consistent.
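A minimal sketch of the Hausman statistic in equation (17.34), using a hypothetical two-parameter example rather than the chapter's 13 common structural parameters:

```python
import numpy as np
from scipy.stats import chi2

b_pcs = np.array([1.10, -0.52])                   # hypothetical estimates
b_int = np.array([1.00, -0.50])
cov_pcs = np.array([[0.40, 0.02], [0.02, 0.25]])
cov_int = np.array([[0.10, 0.01], [0.01, 0.08]])  # more efficient: smaller variances

d = b_pcs - b_int
H = d @ np.linalg.inv(cov_pcs - cov_int) @ d      # equation (17.34)
print(H, chi2.ppf(0.90, df=len(d)))               # compare with the critical value
```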
3.3 Behavioral Realism
The standard discrete choice model has been criticized on the grounds that it is too simplistic to adequately model behavior. It can be viewed as a black box that maps observed inputs into observed choices through a preference function represented by the utility. The actual decision-making process involves several stages, including awareness/ knowledge of opportunities and attributes of alternatives, formation of perceptions and cognitive and affective attitudes, and plans or intentions for implementing a certain behavior (McFadden, 1999). It may also be affected by subjective norms (Ajzen, 1991) or other contextual factors related to the behavior of others (Ben-Akiva et al., 2012). These factors affecting individuals’ preferences are latent or unobserved by the analyst. The standard discrete choice model would gain behavioral richness by explaining observed behavior as a function of these latent factors. A recent example by Theis (2011) illustrates how the behavioral realism of airline
itinerary choice models can be enhanced by modeling preferences for connecting time between flights as a function of attitudes towards the risk of misconnection, rush aversion, and trust in airlines’ scheduling abilities, thus representing more transparently how people make such decisions. This section discusses the motivation, data, and model for this case study.

3.3.1 Motivation
Theis (2011) postulated that, contrary to the standard assumption, used by airlines as a basis for scheduling flights, that individuals prefer the minimum possible connection time, individuals may often prefer to have some additional buffer time (beyond the minimum connection time set by an airport). This preference is given behavioral meaning by explaining it as a function of passengers’ attitudes. In particular, if passengers fear the risk of missing their connecting flight, if they do not like to be rushed through an airport terminal, and if they have low trust in airlines’ abilities to provide reliable connections, they are more likely to prefer some buffer time. This hypothesis is tested by explicitly incorporating passengers’ attitudes towards risk, rush, and trust in a model predicting their itinerary choice.

3.3.2 Data
The dataset used in the study by Theis (2011) is obtained from an SP survey. Every respondent was presented with eight choice experiments, each involving the choice between the respondent’s recent US domestic air trip (on which information was obtained) and an alternative flight itinerary. The attributes presented for each itinerary include: airline, aircraft type, departure airport, departure time, arrival airport, arrival time, layover (or connection) time including information about the minimum connection time required by the airport, number of connections, on-time performance (percentage of similar flights that are on time) and round-trip fare. The on-time performance attribute was included to avoid bias towards choosing the recent trip itinerary. A snapshot of a choice experiment is shown in Figure 17.5. Individuals’ preferences regarding specific airports and airlines were also collected to help in the design of the SP survey. Socio-demographic characteristics, number of US domestic air trips made in the last year, membership in frequent flyer programs for all ranked airlines, and information on whether an individual missed a connecting flight in the past two years were also collected. Finally, respondents’ attitudes towards risk, rush and trust were measured by asking them to rate their level of agreement with the statements shown in Table 17.2 on a five-point Likert scale with response categories including ‘strongly disagree’, ‘somewhat disagree’, ‘neither agree nor disagree’, ‘somewhat agree’ and ‘strongly agree’.

3.3.3 Model framework and specification
Framework
The modeling framework is an integrated choice and latent variable model as shown in Figure 17.6. The utility of a flight itinerary alternative is a function of attributes of the itinerary such as the fare, the airline and the buffer time; characteristics of the traveler such as gender, age, income and trip purpose; and the attitudes of risk, rush and trust, interacted with attributes of the itinerary such as buffer time and number of connections.
[Figure: sample SP screen, ‘Which would you choose for a trip to Jacksonville, FL?’, comparing ‘your current flight’ (Delta regional jet departing Logan International Airport, Boston MA at 8:00 AM, arriving Jacksonville International at 12:00 PM, 1 hr layover where the connecting airport requires a minimum of 40 mins, 4 hrs total travel time, 1 connection, 80% of these flights on time, $250 round-trip fare) with an ‘alternate flight’ (Continental standard jet departing Burlington International Airport, Burlington VT at 5:00 PM, arriving Jacksonville International at 10:00 PM, 40 mins layover, 5 hrs total travel time, 1 connection, 90% on time, $188 round-trip fare).]
Source: Theis (2011).
Figure 17.5 SP experiment example

Table 17.2 Attitudinal statements

Indicator  Description
I1    I like to take my time when connecting between flights.
I2    It’s hard for me to find my way through airports.
I3    I don’t think time at airports is wasted because I can shop, eat, or work at airports.
I4    I don’t mind being rushed at a connecting airport if this means I’ll arrive at my final destination earlier.
I5    I enjoy having extra time at airports.
I6    I usually arrive at the check-in counter just before the check-in deadline.
I7    Catching my scheduled connecting flight is of great importance to me.
I8    I try to avoid short connections because of the risk of either me or my luggage missing the connecting flight.
I9    Given two itineraries that only differ in connecting time, I always choose the one with shorter connecting time.
I10   I’m willing to accept the risk of a missed connection if this gets me earlier to my destination most of the time.
I11   Airlines only sell connections that they expect passengers could make.
I12   Airlines sometimes underestimate the time needed to connect between flights.
I13   It is the passenger’s responsibility to plan for a sufficient transfer time when booking a connecting itinerary.
I14   I make sure that the planned connecting time is adequate for me when booking a connecting itinerary.
[Figure: path diagram in which characteristics of the traveler (employment, age, income, trip duration, party size, paid by, distribution channel, trip purpose, frequency of trips, FFP level, gender, missed connection) explain the latent variables RISK, RUSH and TRUST; the latent variables are measured by the attitudinal indicators I1–I14 (rating exercise responses) and interact with attributes of the alternative (bags, preferred airlines, most and second preferred airport, night departure, access time, elapsed time, minimum connecting time, buffer time, number of connections, fare, on-time performance) in the utility U, which determines the choice y.]
Source: Theis (2011).
Figure 17.6 Integrated choice and latent variable model of airline itinerary choice

The interaction of an attitude such as rush with an attribute such as buffer time captures the varying sensitivity to buffer time as a function of the degree of rush aversion. The latent variables are functions of the characteristics of the traveler. The attitudinal indicators collected in the survey are used as indicators of the latent variables. The selection of specific indicators to use for a given attitude depends on a combination of the researcher’s judgment of the correspondence of these statements with the attitude and the estimated factor loadings and their statistical significance; larger loadings correspond to a stronger relationship between the attitude and the corresponding indicator. Finally, the utility is measured by the choice. We show below how the latent variables enter the utility equations, the form of the measurement equations of the indicators, the distributions of the disturbances and error terms in the various equations, and the likelihood function expression. All the structural and measurement equations are linear in the parameters. Note that indicator I7 is excluded from the model formulation and estimation results shown below.

Formulation
The latent variables enter the utility equation of alternative i as follows (the reference to an individual n is implicit and omitted for simplicity):
$$U_i = \tilde{V}_i + \text{Rush} \times (\beta_{16}\,\text{Buffer time}_i^{<15\,\text{min}} + \beta_{17}\,\text{Buffer time}_i^{15-59\,\text{min}} + \beta_{18}\,\text{Buffer time}_i^{>60\,\text{min}}) + (\beta_{19}\,\text{Risk} + \beta_{20}\,\text{Trust}) \times \text{Number of connections}_i + \varepsilon_i \qquad (17.35)$$

where Ṽ_i denotes the systematic part of the utility excluding the latent variables, and Buffer time_i is the additional connecting time in minutes associated with itinerary i beyond the minimum connecting time. The disutility of buffer time is specified as a piecewise linear function with two breakpoints at 15 and 60 minutes, and the three ranges of buffer time are defined as follows:

$$\text{Buffer time}_i^{<15\,\text{min}} = \min(\text{Buffer time}_i, 15) \qquad (17.36)$$

$$\text{Buffer time}_i^{15-59\,\text{min}} = \max(0, \min(\text{Buffer time}_i - 15, 45)) \qquad (17.37)$$

$$\text{Buffer time}_i^{>60\,\text{min}} = \max(0, \text{Buffer time}_i - 60) \qquad (17.38)$$
The buffer time variables and the number of connections variable are additionally included in the systematic utility without interaction with the latent variables. The disturbances in the utility equations of the flight itineraries are i.i.d. extreme value type I (0,1). The disturbances in the structural equations of the attitudes are i.i.d. normal (0,1); their variances are fixed at 1 to set their scale. The indicators are modeled as continuous variables for simplicity, and every indicator I_r, r = 1,...,6, 8,...,14 is expressed as a function of one or more latent variables as follows:

$$I_r = \kappa_r + \lambda_{r1}\,\text{Risk} + \lambda_{r2}\,\text{Rush} + \lambda_{r3}\,\text{Trust} + u_r, \qquad u_r \sim N(0, \sigma^2_{u_r}) \qquad (17.39)$$
where κ_r is a constant and λ_r1, λ_r2 and λ_r3 are parameters to be estimated (some of which are fixed at 0). The error terms u are assumed to be multivariate normally distributed with a diagonal variance-covariance matrix Σ_u. For a given individual, the joint probability of the choice and the 13 indicators is expressed as the product of their conditional probabilities, integrated over the joint density function of the three latent variables as follows:

$$P(y, I_1,\ldots,I_6,I_8,\ldots,I_{14} \mid X; \beta,\alpha,\lambda,\kappa,\Sigma_e,\Sigma_w,\Sigma_u) = \iiint P(y \mid X, \text{Risk}, \text{Rush}, \text{Trust}; \beta, \Sigma_e)\; g(I_1,\ldots,I_6,I_8,\ldots,I_{14} \mid \text{Risk}, \text{Rush}, \text{Trust}; \lambda, \kappa, \Sigma_u)\; f(\text{Risk}, \text{Rush}, \text{Trust} \mid X; \alpha, \Sigma_w)\; d\,\text{Risk}\; d\,\text{Rush}\; d\,\text{Trust} \qquad (17.40)$$
The conditional choice probability is a logit model. The joint density function of the attitudinal indicators is expressed as follows:

$$g(I_1,\ldots,I_6,I_8,\ldots,I_{14} \mid \text{Risk}, \text{Rush}, \text{Trust}; \lambda, \kappa, \Sigma_u) = \prod_{r=1}^{14} \frac{1}{\sigma_{u_r}}\, \phi\!\left(\frac{I_r - \kappa_r - \lambda_{r1}\text{Risk} - \lambda_{r2}\text{Rush} - \lambda_{r3}\text{Trust}}{\sigma_{u_r}}\right) \qquad (17.41)$$
The product on the right-hand side of equation (17.41) does not include a term for the seventh indicator. The joint density function of the latent attitudes is expressed as follows:

$$f(\text{Risk}, \text{Rush}, \text{Trust} \mid X; \alpha, \Sigma_w) = \frac{1}{\sigma_{w_{\text{Risk}}}}\, \phi\!\left(\frac{\text{Risk} - h_{\text{Risk}}(X;\alpha)}{\sigma_{w_{\text{Risk}}}}\right) \frac{1}{\sigma_{w_{\text{Rush}}}}\, \phi\!\left(\frac{\text{Rush} - h_{\text{Rush}}(X;\alpha)}{\sigma_{w_{\text{Rush}}}}\right) \frac{1}{\sigma_{w_{\text{Trust}}}}\, \phi\!\left(\frac{\text{Trust} - h_{\text{Trust}}(X;\alpha)}{\sigma_{w_{\text{Trust}}}}\right)$$
$$= \phi(\text{Risk} - h_{\text{Risk}}(X;\alpha))\; \phi(\text{Rush} - h_{\text{Rush}}(X;\alpha))\; \phi(\text{Trust} - h_{\text{Trust}}(X;\alpha)) \qquad (17.42)$$

In the above expression, h_Risk(X;α), h_Rush(X;α) and h_Trust(X;α) represent the systematic parts of the structural equations of the attitudes Risk, Rush and Trust, respectively. The expression simplifies due to the normalization σ_w_Risk = σ_w_Rush = σ_w_Trust = 1.
3.3.4 Estimation
Table 17.3 shows the estimation results of the parameters related to the following variables in the utility equations: number of connections, minimum connecting time, buffer time, and the interactions with latent variables. While an increasing number of connections and an increasing minimum connecting time decrease the utility of a flight alternative, passengers who are rush averse may gain utility from the first 15 minutes of buffer time beyond the minimum connecting time, after which additional buffer time causes disutility. Figure 17.7 shows the utility of buffer time for two values of the latent variable Rush: a low value of −0.410 and a higher value of 0.0124 (the median in the sample3). Individuals with a rush value of −0.410 have zero utility from the first 15 minutes of buffer time and then the utility decreases monotonically; individuals with a rush value smaller than −0.410 have a monotonically decreasing utility function.

Table 17.3 Estimation results (part of the utility equations) for the airline itinerary choice model with latent variables

Variable                                          Parameter estimate   Standard error   t-statistic
Number of connections                             −0.418               0.132            −3.2
Minimum connecting time (in min)                  −0.00656             0.003            −2.1
Buffer time < 15 min (in min)                      0.0113              0.005             2.4
Buffer time 15–59 min (in min)                    −0.00397             0.002            −1.9
Buffer time > 60 min (in min)                     −0.00141             0.002            −0.9
Interactions
Buffer time < 15 min (in min) × rush aversion      0.0193              0.006             3.5
Buffer time 15–59 min (in min) × rush aversion    −0.00671             0.003            −1.9
Buffer time > 60 min (in min) × rush aversion      0.00117             0.001             0.9
Number of connections × risk tolerance             0.0107              0.065             1.7
Number of connections × trust                      0.0720              0.072             1.0

Source: Theis (2011).
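Combining the Table 17.3 estimates with the piecewise transforms reproduces the shape plotted in Figure 17.7; a minimal sketch (the buffer_pieces helper is the illustrative function sketched after equations (17.36)–(17.38), and the quantity computed is only the buffer-time part of equation (17.35)):

```python
import numpy as np

# Coefficients from Table 17.3: base buffer-time effects and rush interactions.
b_base = np.array([0.0113, -0.00397, -0.00141])
b_rush = np.array([0.0193, -0.00671, 0.00117])

def buffer_pieces(t):
    return np.array([min(t, 15), max(0, min(t - 15, 45)), max(0, t - 60)])

def buffer_utility(t, rush):
    """Buffer-time part of equation (17.35) for a given rush-aversion level."""
    return (b_base + rush * b_rush) @ buffer_pieces(t)

for t in (15, 60, 120):
    print(t, buffer_utility(t, rush=-0.410), buffer_utility(t, rush=0.0124))
```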
[Figure: utility of buffer time (0–140 minutes), ranging from about 0.2 down to −0.6, plotted for rush aversion = −0.410 and rush aversion = 0.0124.]
Source: Theis (2011).
Figure 17.7 Buffer time utility for different rush aversion levels

Individuals with a rush value of 0.0124 gain utility as buffer time increases from 0 to 15 minutes, but further increases in buffer time cause disutility. Higher levels of risk tolerance and trust in airlines’ schedule reliabilities decrease the disutility caused by the number of connections, as indicated by positive coefficients for the interactions between these latent variables and the number of connections. This makes sense since if passengers tolerate risk, they are less likely to be annoyed by having more connections compared to passengers who are risk-averse, and similarly for the level of trust. To conclude, including attitudes of rush, risk and trust helps explain the non-monotonicity of the disutility of buffer time through variation in the values of these attitudes. As for the attitudes themselves, the variables that were statistically significant in all three attitudinal structural equations were gender, having elite status on any airline, having missed a connection in the past 12 months and whether the trip is paid for by the individual’s company, but the explanatory power of the attitude equations was low (pseudo R² ranged from 0.07 to 0.14).

3.4 Policy Relevance
Modeling the influence of latent variables on behavior is likely to make a significant difference in the accuracy of predictions, the design of effective policies, and the appraisal of policies and projects, for several reasons. First, the HCM allows the analyst to segment people by latent variables such as attitudes and satisfaction; the importance of market segmentation by customer attitudes has long been recognized in the marketing literature as a policy tool for marketing products and services differently to different market segments (for example, Anable, 2005; Proussaloglou et al., 2001; Shiftan et al., 2008), thus allowing for greater customer satisfaction and potentially greater revenues. The airline itinerary choice example illustrates the advantages of explicitly modeling passengers’ attitudes both from the perspective of the passengers themselves (having better options) and the airline, which may be able to reduce its costs.
Second, explicitly modeling the latent variables is likely to lead to better predictions of the impacts of policies when the latent variables are important predictors of the choice and when there is significant heterogeneity in the latent variables across the population. This is illustrated by the value of time example and the latent choice set example. Ashok et al. (2002) also show that when the latent variables are important predictors of the choice yet are misspecified (for example, using the indicators directly in the utility equations or using fitted latent variables from a factor analysis model without accounting for their distribution), the results can be misleading from a policy perspective. Third, by explicitly modeling the determinants of the latent variables, one can test policies that may impact the choice indirectly through their influence on the latent variables. This effect is discussed below in the context of the latent choice set example. Another example may be a change in the transportation system, such as the introduction of a new rail system, which may influence people’s attitudes towards travel modes (see, for example, Yáñez et al., 2010).

3.4.1 Airline itinerary choice
The airline itinerary choice case study (Theis, 2011) illustrates that individuals have varying preferences for connecting time based on their level of rush aversion, which is itself a function of several socio-economic variables and past experiences. This finding has several policy implications that airlines can capitalize on to improve their flight schedules in a way that better aligns with passengers’ preferences, especially those of passengers who favor some extra buffer time beyond the minimum connecting time. First, airlines can enhance distribution channel displays by offering more choices (for example, with longer connecting times) to customers booking their itineraries online, possibly based on a customer’s socio-demographics, which influence attitudes, if the customer is identified, or by giving a warning if a customer selects a flight with a short connecting time. Second, airlines can change their default sorting of flights shown to a customer so that it is not necessarily in increasing order of elapsed time. Third, if airlines de-peak their timetables, which results in an increase in connecting times, they can save on operational costs owing to more effective use of resources (gates, ground equipment, and so on). Overall, airlines can benefit from longer connecting times (beyond the minimum possible) as this reduces irregularity costs (such as misconnection follow-up costs and passenger goodwill) for the airlines. Airlines can also increase their revenues if they are able to charge for additional connecting time, since at least a segment of the population has a positive willingness to pay for additional buffer time.

3.4.2 Value of travel time savings
In standard appraisal methods of transportation projects, travel time savings represent the major category of benefits. These savings are monetized by using an estimate of the value of time. A richer representation of the value of travel time savings as a function of attitudes would lead to better estimates of the VTTS and consequently of the benefits of new transportation projects.

3.4.3 Mode choice with latent choice sets
The mode choice with latent choice sets example (Ben-Akiva and Boccara, 1995) also illustrates that there are several advantages from the explicit representation of latent choice sets from a marketing perspective.
Table 17.4 Prediction results for the logit and PCS models

                                                                   Logit      PCS
Scenario 1: 100% increase in DA and SR in-vehicle travel time
  Change in share of DA                                            −34.4%     −7.1%
  Change in share of SR                                            −10.5%     −10.2%
  Change in share of T                                             +44.9%     +17.3%
Scenario 2: 100% increase in transit out-of-vehicle travel time
  Change in share of DA                                            +2.3%      +2.9%
  Change in share of SR                                            +1.8%      +0.8%
  Change in share of T                                             −4.1%      −3.7%

Source: Ben-Akiva and Boccara (1995).
One advantage has to do with the prediction of the impact of advertisements, promotions, and so on. Including these variables in the utility equations directly, as is typically done, is not desirable because such factors do not alter the utility of the product. The causality instead is at the level of the choice set through, for example, an increase in awareness about the products available in the market. Moreover, if the latent choice set model contains information capturing consumer captivity or loyalty to certain brands or products, specific marketing plans can then be customized to certain consumer segments.

Another advantage of explicitly modeling the choice set is greater predictive power when there is significant heterogeneity in the choice set across consumers in the market. An example of differences in predictive power is shown next. Since the integrated and PCS models are consistent with each other, as was shown in section 3.2.4, it is sufficient to compare the predictive power of one of them to that of the logit model. Table 17.4 shows the percentage change in modal shares for two scenarios: a 100 percent increase in DA and SR in-vehicle travel time, and a 100 percent increase in transit out-of-vehicle travel time. The logit model predicts larger changes in mode shares than the PCS model for changes in attributes that do not influence the choice set; this is because in the PCS model, any such changes in attributes for alternatives that are unavailable for a given individual make no difference in the individual’s choice probabilities, while they do in the logit model. On the other hand, if the scenario involves a variable that influences the alternative availabilities (for example, number of cars), the modal share changes predicted by the PCS model are expected to be larger than those predicted by the logit model. Thus, the predictive power of the PCS model seems to be stronger than that of the logit model.
4 CONCLUSION
The HCM, which integrates latent variable models with discrete choice models, has been in use for about a decade now. This chapter reviewed the framework and formulation of the HCM and discussed its four main advantages: ability to explicitly model unobserved heterogeneity, increased efficiency, enhanced behavioral realism and extended policy
relevance. These advantages were illustrated in the context of three applications: heterogeneity in the value of travel time savings arising from heterogeneity in attitudes towards travel modes; mode choice with latent choice sets; and airline itinerary choice incorporating attitudes towards risk of misconnection, rush aversion and trust in airlines. Despite these advantages and the growing number of applications employing the HCM as a modeling framework, there remain a number of difficulties that have hindered its widespread use. These are discussed below, along with directions for future research in this area.

First, there are estimation issues. Unlike a logit model, the likelihood function of the HCM is not globally concave, which makes the estimation process more complex and necessitates that the model be estimated from multiple starting values to check that convergence to the same set of 'behaviorally plausible values' is achieved (Ben-Akiva and Boccara, 1995); a minimal numerical sketch of this multiple-starting-values check is given at the end of this section. Moreover, since general conditions for the identification of these models have not been established, the researcher has to rely to some extent on empirical tests to ensure that the model is identified. Also, from a practical perspective, until recently there was no software that would allow the simultaneous estimation of the HCM without coding the likelihood function.

Second, the structural equations of the latent variables usually have low explanatory power in empirical applications, as indicated by insignificant variables and low pseudo-R2 values. This is because latent variables such as attitudes and perceptions are usually expressed as a function of socio-demographic variables. However, it is doubtful whether latent variables such as attitudes are actually a function of socio-demographic variables (see, for example, Anable, 2005). They are more likely to be shaped by people's life experiences, lifestyles, and so on. The challenge is in adequately collecting such data in surveys and incorporating them in the models.

Third, there is the issue of endogeneity. In the HCM framework, the latent variables are predictors of choice. But it may also be the case that the latent variables are affected by the choice, such as attitudes towards a travel mode being affected by repeated exposure to that mode. If there is an effect of the choice on the attitude which is not modeled, the parameter estimates will be biased. This is less of an issue when using data from stated preference experiments, where it may reasonably be assumed that people's attitudes (formed before the SP study) influence the choices they make in the SP experiment, but it should be tested for when using revealed preference data. Ideally, panel data would be needed to test these causalities.

Fourth, and from a practical perspective, the development of HCMs has mostly dealt with estimation as opposed to application. As discussed in this chapter, the structural part of the HCM (that is, the framework shown in Figure 17.2 without the indicators) can be used in application and generally does not require additional information beyond what is included in the model. More work on model application is needed to illustrate the potential of the HCM in leading to more sensible policy analysis.

Finally, in this chapter we presented the formulation and example applications of the static HCM.
When the latent variables evolve over time, the dynamics in the behavior or actions are driven by the dynamics in the underlying latent variables. The dynamic HCM is a discrete choice model integrated with a hidden Markov model. Formulations and examples of the dynamic HCM are available in Ben-Akiva (2010) and Choudhury et al. (2010), but this area is still under-researched.
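As flagged above, the standard safeguard against the non-concave HCM likelihood is estimation from many starting values. The sketch below (Python, using scipy; the objective is a deliberately multimodal stand-in for an HCM negative log-likelihood, not an actual one) illustrates the pattern of running a gradient-based optimizer from random starts and checking how often the best value is reached.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(theta):
    # Stand-in for the HCM negative log-likelihood; a real implementation
    # would simulate over the latent-variable distribution. This toy
    # function is multimodal, mimicking the lack of global concavity.
    return np.sum((theta**2 - 1.0)**2) + 0.1 * np.sum(theta)

rng = np.random.default_rng(42)
fits = []
for _ in range(20):  # estimate from many random starting values
    theta0 = rng.normal(scale=2.0, size=3)
    res = minimize(negloglik, theta0, method="BFGS")
    fits.append((res.fun, res.x))

best_val = min(f for f, _ in fits)
n_hits = sum(abs(f - best_val) < 1e-6 for f, _ in fits)
# If only one start reaches the best value, treat the 'optimum' with caution.
print(f"best value {best_val:.4f} reached from {n_hits} of {len(fits)} starts")
```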
NOTES

1. With exogenous latent variables, the formulation of the choice probability remains the same except that the distribution of the latent variables in the probability expression is no longer a function of explanatory variables (that is, it is not a behavioral model).
2. The data were unavailable to conduct this analysis.
3. A fitted value of the latent variable was computed for every individual in the sample as the systematic part of the structural equation of the latent variable evaluated at the estimated values of the parameters.
REFERENCES

Abou-Zeid, M., M. Ben-Akiva, M. Bierlaire, C. Choudhury and S. Hess (2010), 'Attitudes and value of time heterogeneity', in E. Van de Voorde and T. Vanelslander (eds), Applied Transport Economics: A Management and Policy Perspective, Antwerp: Uitgeverij De Boeck nv, pp. 523–45.
Ajzen, I. (1991), 'The theory of planned behavior', Organizational Behavior and Human Decision Processes, 50 (2), 179–211.
Anable, J. (2005), 'Complacent car addicts or aspiring environmentalists? Identifying travel behaviour segments using attitude theory', Transport Policy, 12 (1), 65–78.
Ashok, K., W.R. Dillon and S. Yuan (2002), 'Extending discrete choice models to incorporate attitudinal and other latent variables', Journal of Marketing Research, 39 (1), 31–46.
Ben-Akiva, M. (2010), 'Planning and action in a model of choice', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 19–34.
Ben-Akiva, M. and B. Boccara (1995), 'Discrete choice models with latent choice sets', International Journal of Research in Marketing, 12 (1), 9–24.
Ben-Akiva, M. and S. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Ben-Akiva, M., D. McFadden, K. Train, J. Walker, C. Bhat, M. Bierlaire, D. Bolduc, A. Boersch-Supan, D. Brownstone, D.S. Bunch, A. Daly, A. de Palma, D. Gopinath, A. Karlstrom and M.A. Munizaga (2002a), 'Hybrid choice models: progress and challenges', Marketing Letters, 13 (3), 163–75.
Ben-Akiva, M., A. de Palma, D. McFadden, M. Abou-Zeid, P.-A. Chiappori, M. de Lapparent, S.N. Durlauf, M. Fosgerau, D. Fukuda, S. Hess, C. Manski, A. Pakes, N. Picard and J. Walker (2012), 'Process and context in choice models', Marketing Letters, 23 (2), 439–56.
Ben-Akiva, M., J. Walker, A. Bernardino, D. Gopinath, T. Morikawa and A. Polydoropoulou (2002b), 'Integration of choice and latent variable models', in H. Mahmassani (ed.), Perpetual Motion: Travel Behaviour Research Opportunities and Application Challenges, Bingley: Emerald, pp. 431–70.
Boccara, B. (1989), 'Modeling choice set formation in discrete choice models', PhD dissertation, Massachusetts Institute of Technology.
Bolduc, D. and R. Alvarez-Daziano (2010), 'On estimation of hybrid choice models', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 259–87.
Bolduc, D., M. Ben-Akiva, J. Walker and M. Michaud (2005), 'Hybrid choice models with logit kernel: applicability to large scale models', in M. Lee-Gosselin and S. Doherty (eds), Integrated Land-Use and Transportation Models: Behavioural Foundations, Amsterdam: Elsevier Science, pp. 275–302.
Bollen, K.A. (1989), Structural Equations with Latent Variables, New York: John Wiley and Sons.
Choo, S. and P.L. Mokhtarian (2004), 'What type of vehicle do people drive? The role of attitude and lifestyle in influencing vehicle type choice', Transportation Research Part A, 38 (3), 201–22.
Choudhury, C., M. Ben-Akiva and M. Abou-Zeid (2010), 'Dynamic latent plan models', Journal of Choice Modelling, 3 (2), 50–70.
Daly, A., S. Hess, B. Patruni, D. Potoglou and C. Rohr (2012), 'Using ordered attitudinal indicators in a latent variable choice model: a study of the impact of security on rail travel behaviour', Transportation, 39 (2), 267–97.
Everitt, B.S. (1984), An Introduction to Latent Variable Models, London: Chapman and Hall.
Gopinath, D.A. (1995), 'Modeling heterogeneity in discrete choice processes: application to travel demand', PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Gopinath, D.A. and M. Ben-Akiva (1997), 'Estimation of randomly distributed value of time', working paper, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA.
Hausman, J. (1978), 'Specification tests in econometrics', Econometrica, 46 (6), 1251–71.
Johansson, M.V., T. Heldt and P. Johansson (2006), 'The effects of attitudes and personality traits on mode choice', Transportation Research Part A, 40 (6), 507–25.
Kitrinou, E., A. Polydoropoulou and D. Bolduc (2010), 'Development of integrated choice and latent variable (ICLV) models for the residential relocation decision in island areas', in S. Hess and A. Daly (eds), Choice Modelling: The State-of-the-Art and the State-of-Practice, Proceedings from the Inaugural International Choice Modelling Conference, Bingley: Emerald, pp. 593–618.
McFadden, D. (1986), 'The choice theory approach to market research', Marketing Science, 5 (4), 275–97.
McFadden, D. (1999), 'Rationality for economists?', Journal of Risk and Uncertainty, 19 (1–3), 73–105.
Morikawa, T., M. Ben-Akiva and D. McFadden (2002), 'Discrete choice models incorporating revealed preferences and psychometric data', in T.B. Fomby, R.C. Hill and I. Jeliazkov (eds), Advances in Econometrics, vol. 16, Bingley: Emerald, pp. 29–55.
Proussaloglou, K., K. Haskell, R. Vaidya and M. Ben-Akiva (2001), 'An attitudinal market segmentation approach to commuter mode choice and transit service design', paper presented at the 80th Annual Meeting of the Transportation Research Board, Washington, DC, January.
Raveau, S., M.F. Yáñez and J. de D. Ortúzar (2012), 'Practical and empirical identifiability of hybrid discrete choice models', Transportation Research Part B, 46 (10), 1374–83.
Shiftan, Y., M.L. Outwater and Y. Zhou (2008), 'Transit market research using structural equation modeling and attitudinal market segmentation', Transport Policy, 15 (3), 186–95.
Swait, J. and M. Ben-Akiva (1987a), 'Incorporating random constraints in discrete models of choice set generation', Transportation Research Part B, 21 (2), 91–102.
Swait, J. and M. Ben-Akiva (1987b), 'Empirical test of a constrained discrete choice model: mode choice in São Paulo, Brazil', Transportation Research Part B, 21 (2), 103–15.
Theis, G.W. (2011), 'Incorporating attitudes in airline itinerary choice: modeling the impact of elapsed time', PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Walker, J. and J. Li (2007), 'Latent lifestyle preferences and household location decisions', Journal of Geographical Systems, 9 (1), 77–101.
Walker, J.L. (2001), 'Extended discrete choice models: integrated framework, flexible error structures, and latent variables', PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Walker, J.L. and M. Ben-Akiva (2002), 'Generalized random utility model', Mathematical Social Sciences, 43 (3), 303–43.
Walker, J.L. and M. Ben-Akiva (2011), 'Advances in discrete choice: mixture models', in A. de Palma, R. Lindsey, E. Quinet and R. Vickerman (eds), A Handbook of Transport Economics, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 160–87.
Walker, J.L., M. Ben-Akiva and D. Bolduc (2007), 'Identification of parameters in normal error component logit-mixture (NECLM) models', Journal of Applied Econometrics, 22 (6), 1095–125.
Yáñez, M.F., S. Raveau and J. de D. Ortúzar (2010), 'Inclusion of latent variables in mixed logit models: modelling and forecasting', Transportation Research Part A, 44 (9), 744–53.
18 Choice modeling and risk management

Glenn W. Harrison and Jimmy Martínez-Correa¹
Choice models in economics often present decision makers with a fixed set of alternatives. This setting certainly mimics many naturally occurring decisions, such as the homely selection of products from shelves in a supermarket. But it does not reflect the type of decisions that are the center of attention in the field of risk management, where it is precisely the ability to 'fine tune' the choice alternative that is the behavior of interest. It is one thing to ask someone to select between a safe lottery and a risky lottery, where the return on the former is higher than the latter, and another thing to provide a menu of options to change the consequences of risk for each lottery. Of course, this can be viewed as just an expanded choice set, but it is central to understanding how decision makers mitigate risk. They can do so by simply choosing the safer options, or by engaging in a range of activities which alter the risks of those options or the consequences of the options. In many respects this is a natural domain of application for 'choice modeling', as the term is used here, since one can simultaneously discover how decision makers want to structure their choices as well as discover their attitudes to risk.

Although our focus is on the economics of risk management, many of the basic ideas have developed in other related disciplines. A good example is transport economics, where the alternatives may be fixed but the attributes are selected and adjusted endogenously. This possibility arises naturally in negotiation settings, as demonstrated by Hensher et al. (2007) and Marcucci et al. (2009), but applies equally well when considering the endogenous process as akin to the risk management choices discussed here. The same idea arises in marketing, where software-based 'configurators' allow mass customization of products to meet the preferences of customers (for example, Kamis et al., 2008). And the choice of one mode of transport implies tradeoffs between many attributes that need to be considered simultaneously, akin to the multivariate risk aversion discussed here. In fact, that literature derived from multi-attribute concerns of the kind that are standard in transport economics and marketing applications (for example, Louviere et al., 2000). It is also close to the literature that spans economics and transportation engineering on the value of a statistical life, when one tries to untangle the correlated effects of the risk of fatal and non-fatal injuries: what does not kill you likely injures you seriously (for example, Viscusi, 1993; Leeth and Ruser, 2003).

We review five major topics within this emerging domain of application of choice modeling, with an eye for how they change the nature of the choice modeling task. We particularly focus on a canonical choice task under risk, the purchase of insurance. Choices can be insured through market transactions with companies specializing in providing insurance coverage, but the same risk management can often be accomplished by the decision maker undertaking other types of costly activities. Certain transactions might change the probabilities that the decision maker faces, or change the final consequences of those choices. We introduce the basic elements of the vast literature on risk
management in section 1, and draw out implications for the modeling of choice behavior over risky outcomes.

Choices are often modeled by assuming that decision makers perfectly integrate them with all other choices they have made, or that decision makers completely ignore other choices. The issue here is whether one assumes perfect or imperfect asset integration in some settings, or perfect or imperfect capital markets in other settings. We review the implications for choice modeling of these assumptions in section 2.

Choices are characterized by several attributes, and indeed the complex manner in which multiple attributes trade off is one of the central concerns of most models of choice behavior. If one is to properly characterize risky choice over multiple attributes, then one needs to recognize the formal implications of dealing with multi-attribute risk aversion. We do that in section 3.

The major 'behavioral moving parts' in any choice involving risk and time are risk attitudes, broadly defined, time preferences, and subjective beliefs. We have theoretical, experimental and econometric tools to evaluate their confounding role in understanding observed choice, but they are rarely applied systematically in the existing literature. We examine these tools in section 4, and in section 5 discuss extensions to consider uncertainty and ambiguity.
1 RISK MANAGEMENT OF FINANCIAL AND NON-FINANCIAL RISKS
Ehrlich and Becker's (1972) paper is widely viewed as the first theoretical paper in risk management. They considered the role of financial and non-financial risk management tools, such as insurance and self-protection activities, to deal with hazards and their economic costs, and derived four important results.

First, when 'market' insurance is not available, individuals engage in self-insurance (for example, cash balances or savings) and self-protection activities (for example, theft alarms) according to the costs and benefits associated with those activities. Self-insurance is defined as an activity that affects the cost associated with a potential loss, whereas self-protection affects the probability of the loss itself.2 Each of these activities has to involve some cost to the decision maker, so that there is an economic tradeoff in choosing the risk management strategy (a numerical sketch of this tradeoff is given at the end of this section).

Second, market insurance and self-insurance are substitutes. Both risk management strategies are able to redistribute resources from good states of the world to bad states of the world, thus affecting the cost of the loss and smoothing consumption across states of nature.

Third, market insurance and self-protection can be substitutes or complements. For example, the installation of a theft alarm at home might be enough to deter robbery, so home insurance might not be necessary. On the contrary, a theft alarm might increase the demand for insurance if this self-protection strategy reduces the probability of a loss and the cost of insurance varies proportionally with this probability. An implication of this relationship is that if the price of insurance to cover certain risks is independent of expenditures on self-protection activities that reduce the probability of a loss, then the market will tend to consider those risks as uninsurable since moral hazard is more likely
to arise. However, in contrast to the moral-hazard intuition, this relationship may also imply that the presence of market insurance may increase self-protection activities.

Finally, the framework of Ehrlich and Becker (1972, p. 627) is able to explain the insurance-gambling puzzle that Friedman and Savage (1948) identified. They argued that the behavior of people buying insurance and gambling at the same time can be explained if the broader choice options available are sufficiently favorable: 'inferences about attitudes towards risk cannot be made independently of existing market opportunities: a person may appear to be "risk avoider" under one combination of prices and potential losses and a "risk taker" in another' (Ehrlich and Becker, 1972, p. 627).

Using the expected utility theory (EUT) framework, the analysis has been extended in several directions. Dionne and Eeckhoudt (1985) found that more risk aversion increases self-insurance activities but does not necessarily increase self-protection. Briys and Schlesinger (1990, p. 466) provided an explanation for these results: self-insurance unambiguously reduces risk, while self-protection does not. Consequently, it is no surprise that an increase in risk aversion unambiguously increases the level of self-insurance, but may sometimes decrease the level of self-protection. However, Hiebert (1989, pp. 300–301) showed that self-insurance activities do not necessarily increase with more risk aversion if the effectiveness of these activities in mitigating the loss is uncertain. Briys et al. (1991) take the analysis one step further and analyze the main tools of risk management when their reliability cannot be guaranteed; that is, when there is a chance that these tools might not work as expected in the case of a loss, just as an insurer might default on its clients because of solvency issues. In contrast to Ehrlich and Becker (1972), Briys et al. (1991, p. 47) find that the 'riskiness' of final wealth is not necessarily reduced by an increase in market insurance or self-insurance when they are not fully reliable, and that market insurance and self-insurance might be complements without full reliability. Finally, Sweeney and Beard (1992) study the comparative statics of self-protection when initial wealth and the size of the loss change.

From a non-EUT perspective, many of the results in Ehrlich and Becker (1972) hold. In fact, Quiggin (1991, p. 340) showed that a wide range of comparative static results under EUT can be extended to rank dependent utility, because this model 'may be regarded as expected utility with respect to a transformed probability distribution.' Konrad and Skaperdas (1993), Machina (1995, 2000) and Courbage (2001) showed that the main results in Ehrlich and Becker (1972) hold under a wide range of alternatives to EUT.

The framework of Ehrlich and Becker (1972) can also be used to study how policies or activities that change risks affect individuals' welfare. Shogren and Crocker (1991, pp. 6, 9 and 11), using the EUT model, analyze the ex ante value of reducing risk when there are self-protection possibilities and find three important results. First, when self-protection influences the probability and/or the severity of a loss, the ex ante valuation of a reduction in risk is a function of risk attitudes and the marginal rate of technical substitution between self-protection and hazard concentrations.
This implies that the willingness to pay for reductions in risk cannot be studied just by looking at observable expenditures on self-protection and exposure to risk. It also implies that one cannot simply sum the unweighted compensating or equivalent variation to study the societal impact of a policy that affects a risk individuals are exposed to. Second, even under intuitive conditions, such as individuals' pecuniary costs of hazards being convex in risk, increased exposure to a hazard does not necessarily mean that the individual needs to be compensated. The intuition behind this result is that a change in exposure to risk
induced by self-protection may have effects on both the probability and the severity of the loss. There are no ex ante reasons to believe that self-protection affects both probability and severity in the same direction. Finally, intuitive and simple conditions on the costs of risks are not sufficient to guarantee an unambiguous response of self-protection expenditures to changes in risk. This implies that observed expenditures on self-protection are not necessarily a lower bound on the subject's ex ante value of a reduction in risk. Intuitively, this may happen because these expenditures are not necessarily increasing in risk.

Quiggin (1992, p. 41) claims that, under certain intuitive conditions, the negative results in Shogren and Crocker (1991) will not hold and that the 'standard willingness-to-pay approach to valuing environmental hazards is valid under fairly general conditions'. These conditions are decreasing absolute risk aversion and a separability condition. The former is an assumption that deserves discussion but is widely accepted by economists; however, the latter implies that self-protection activities mitigate the individual's exposure to risk but do not affect the risk itself. Shogren and Crocker (1999) claim that this separability assumption is problematic, arguing that risk is endogenous in many situations. This implies that self-protection activities can mitigate the consequences of risk to the individual but can also affect the general level of risk itself. If this is in fact the case, then the welfare analysis of risk reduction cannot avoid the identification of risk attitudes. This is an important instance in which choice tasks and experiments can help identify risk attitudes in order to carry out welfare analysis.

A modern perspective on the analysis of financial decisions was formally introduced by Mayers and Smith (1983), but recognized earlier by Gould (1969, p. 151), who proposed that insurance decisions should be analyzed in the presence of traded and non-traded assets such as human capital. The introduction of a non-traded asset can significantly change some of the standard results in insurance economics, such as the presumption that a wealthier individual buys less insurance. A non-traded asset is closely related to the concept of uninsurable background risk, which is of general relevance to the analysis of decisions under risk. As pointed out by Schlesinger and Doherty (1985), incomplete markets induce the presence of uninsurable background risks that can affect the standard results in the analysis of insurance decisions. Doherty and Schlesinger (1983) studied the robustness of the standard results to the presence of an uninsurable background risk that is independent of the insurable loss. They found that under certain restrictive conditions the full insurance theorem of Mossin (1968) and Smith (1968) holds, and that more risk-averse people will choose a lower deductible. Turnbull (1983, p. 217) also showed that in the presence of many risks, the Arrow-Pratt measure of risk aversion is not a sufficient statistic to describe individuals' behavior in insurance purchasing. Doherty (1984, p. 209) showed in an EUT framework that the Mossin-Smith theorem only holds if the covariance between the non-traded asset and the insurable loss is negative.
The intuition is that even if the insurance premium is fair, the decision maker might still not be willing to fully insure if he or she can compensate high losses with high realizations of his or her human capital. However, if the covariance is negative, the individual might want to fully insure if a health shock negatively affects his or her productivity, which would undermine his or her human capital. Moreover, in the presence of a non-traded asset, circumstances may arise where a risk-averse individual prefers a coinsurance arrangement to an actuarially equivalent insurance contract with a straight deductible.
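As a numerical companion to the Ehrlich and Becker (1972) setting discussed at the start of this section, the sketch below (Python; all functional forms and numbers are our own illustrative choices, not from the paper) lets an expected utility maximizer split spending between self-protection, which lowers the loss probability, and self-insurance, which lowers the loss size.

```python
import numpy as np

w, loss0, p0 = 100.0, 50.0, 0.2  # wealth, initial loss size, loss probability

def utility(c, rho=2.0):
    """CRRA utility; consumption c must stay positive."""
    return c**(1.0 - rho) / (1.0 - rho)

def expected_utility(s, m):
    p = p0 * np.exp(-0.3 * s)        # self-protection lowers the probability
    loss = loss0 * np.exp(-0.2 * m)  # self-insurance lowers the loss size
    c_good, c_bad = w - s - m, w - s - m - loss
    return (1.0 - p) * utility(c_good) + p * utility(c_bad)

# Grid search over the two spending levels; both activities are costly,
# so the optimum trades off premium-like outlays against risk reduction.
grid = np.linspace(0.0, 10.0, 201)
eu = np.array([[expected_utility(s, m) for m in grid] for s in grid])
i, j = np.unravel_index(np.argmax(eu), eu.shape)
print(f"optimal self-protection {grid[i]:.2f}, self-insurance {grid[j]:.2f}")
```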
2 ASSET INTEGRATION AND PERFECT CAPITAL MARKETS
A common assumption in many models of insurance choice, even those that consider portfolios and related risk management strategies, is perfect asset integration within a given time period. This amounts to the assumption that there exist perfect markets that allow all assets to be traded and aggregated into one scalar value of wealth. This scalar wealth is then used as the sole argument of some utility function. The same issue arises in an intertemporal context: when imperfect capital markets exist, it is no longer possible for the individual to aggregate time-dated wealth or consumption into one aggregate.

When imperfect markets are assumed, things change fundamentally. Pye (1966) considered the implications for the optimal investment rule of a company, and demonstrated that imperfect capital markets implied that there no longer existed a 'utility free' investment rule, such as that implied by the Fisher separation theorem. That utility-free rule held that production and consumption choices could be separated, and that one does not need to know the utility function of the agent in order to identify optimal investment and production. Pye (1966) and Hirshleifer (1970, ch. 7) showed that when capital markets were imperfect, in general one could not define the intertemporal budget constraint without knowing the utility function of the individual.

More generally, imperfect markets force you to consider multivariate risk aversion when evaluating insurance demand. Because it is then no longer possible to aggregate to a scalar wealth measure, you must pay attention to the utility evaluation of two or more components of wealth with tools of multivariate risk aversion. Generalizations of the one-dimensional Arrow-Pratt measure of risk aversion have been proposed by Kihlstrom and Mirman (1974), Duncan (1977) and Karni (1979). Kihlstrom and Mirman (1974) posed the issue of multivariate risk aversion under the restrictive assumption that the ordinal preferences underlying two expected utility functions exhibit the same preferences over non-stochastic outcomes. In this case they propose a scalar measure of total risk aversion that allows one to make statements about whether one person is more risk averse than another in several dimensions, or whether the same person is more risk averse after some event than before. If one relaxes this assumption, which is not an attractive one in most applications, Duncan (1977) shows that the Kihlstrom and Mirman (1974) multivariate measure of risk aversion naturally becomes matrix-valued. Hence one has vector-valued risk premia, and this vector is now 'direction dependent' in terms of evaluation. Karni (1979) shows that you can define the risk premia in terms of the expenditure function, rather than the direct utility function, and then evaluate it 'uniquely' by further specifying an interesting statistic of the stochastic process. For example, if you are considering risk attitudes towards a vector of stochastic price shocks, then you could use the mean of those shocks.
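The direction dependence of multivariate risk premia can be seen numerically. In the sketch below (Python with scipy; the bivariate utility function and the joint risk are our own illustrative choices), the risk premium is the deterministic wealth reduction, taken along a chosen direction, that leaves the agent indifferent to bearing a 50/50 joint shock; the scalar premium differs by direction.

```python
import numpy as np
from scipy.optimize import brentq

def U(x, y):  # illustrative concave bivariate utility
    return np.sqrt(x) + np.sqrt(y)

w = np.array([10.0, 10.0])
shocks = [np.array([2.0, 2.0]), np.array([-2.0, -2.0])]  # 50/50 joint risk
eu = 0.5 * sum(U(*(w + e)) for e in shocks)              # EU of bearing risk

def premium(direction):
    """Scalar t such that U(w - t * direction) equals the expected utility."""
    d = np.asarray(direction, dtype=float)
    return brentq(lambda t: U(*(w - t * d)) - eu, 0.0, 4.0)

for d in ([1, 0], [0, 1], [1, 1]):
    print(f"direction {d}: premium {premium(d):.4f}")
```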
3 MULTI-ATTRIBUTE RISK ATTITUDES
A closely related literature considers multi-attribute risk aversion, where the utility function is defined over more than one attribute. In this context, Keeney (1973) first defined the concept of conditional risk aversion, Richard (1975) defined the same concept as
bivariate risk aversion, and Epstein and Tanny (1980) defined it as correlation aversion.3 There are several ways to extend these pairwise concepts of risk aversion over two attributes to more than two attributes, as reviewed by Dorfleitner and Krapp (2007).

One attraction of the concept of multi-attribute risk aversion is that it allows a relatively simple characterization of the functional forms for utility that rule out multi-attribute risk attitudes: additivity. To see the significance of this for insurance demand, consider time-dating as the attribute in question. If you assume the popular additive intertemporal utility function, you rule out correlation aversion. In this case, as is well known, atemporal risk preferences and the intertemporal elasticity of substitution cannot be estimated or calibrated independently: one is the inverse of the other. But with non-additive intertemporal utility functions, you can immediately separate 'risk preferences' and 'time preferences'. And you can then talk about individuals having preferences for how risk is resolved over time, the essence of any insurance contract. That is, preferences for how risk is resolved over time can be distinct from preferences for how risk is resolved at any given point of time, and hence can be a separate behavioral determinant of the demand for insurance. Controlled experiments provide a way to identify and estimate the degree of correlation aversion, and Andersen et al. (2011b) present evidence that it exists and is significant for the Danish population.
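The role of additivity can be checked with a two-line calculation. In the sketch below (Python; the utility forms are our own illustrative choices), both lotteries are 50/50 over two attributes: one resolves both attributes together, the other mixes high and low across attributes. The additive utility is indifferent between them, while the non-additive form with a negative interaction term prefers the mixed lottery, that is, it exhibits correlation aversion.

```python
import math

H, LO = 10.0, 2.0  # 'high' and 'low' outcomes on each of two attributes

def additive(x, y):
    return math.sqrt(x) + math.sqrt(y)

def nonadditive(x, y, k=-0.05):  # k < 0 induces correlation aversion
    return math.sqrt(x) + math.sqrt(y) + k * x * y

for u in (additive, nonadditive):
    eu_together = 0.5 * u(H, H) + 0.5 * u(LO, LO)  # attributes move together
    eu_mixed = 0.5 * u(H, LO) + 0.5 * u(LO, H)     # attributes offset
    print(f"{u.__name__}: together {eu_together:.4f}, mixed {eu_mixed:.4f}")
```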
4 ESTIMATING PREFERENCES AND BELIEFS
There are three fundamental, behavioral 'moving parts' in almost any decision of importance concerning the attributes of choice under risk: risk attitudes, time preferences and subjective beliefs. Experimental economists now have a robust set of tools to elicit each of these, although controversies remain, as expected with foundational concepts such as these. We consider the role of each 'moving part' in the effort to identify the determinants of choice over insurance products.

Risk attitudes refer to the risk premium that individuals place on lotteries. The familiar diminishing marginal utility explanation of EUT provides one characterization of the risk premium, and allows a wide range of flexible utility functions to be estimated. But it is a simple matter to also allow for probability weighting to explain the risk premium: 'pessimistic' attitudes towards probabilities can just as easily account for risk aversion.4 Similarly, it is possible to extend the estimation to allow for sign-dependent preferences, whereby 'losses' are evaluated differently than 'gains'. We add quotation marks for losses and gains because the Achilles heel of sign-dependent models is the specification of the reference point, and this is the subject of considerable debate. All of these approaches simply decompose and explain the risk premium in different ways, each building on the approach before it. Experimental and econometric methods for the estimation of risk attitudes using all of these approaches are relatively well developed: see Harrison and Rutström (2008) for an extensive survey.

There is also considerable evidence that behavior towards risky lotteries is not characterized by just one model of decision-making under risk. Mixture specifications in rich and poor countries, in the laboratory and the field, show a remarkable combination, close to 50:50, of both EUT and non-EUT characterizations (for example, Harrison and Rutström, 2009; Harrison et al., 2010). This finding is likely to vary from domain
to domain, and population to population, but it offers a much richer characterization of behavior than the usual approach favored by economists.5

Most of the effort that goes into estimating 'risk attitudes' is actually directed at estimating utility functions. So a trivial by-product of that effort is the ability to generate estimates of higher-order concepts such as 'prudence' and 'temperance', which under EUT reflect an aversion to skewness and kurtosis, respectively. Although it is possible to generate lotteries that identify preferences driven solely by prudence or temperance (for example, Ebert and Wiesen, 2011), these designs typically require that subjects satisfy reduction of compound lotteries, which is a strong assumption and appears to limit the generalization to non-EUT models such as RDU. Recent extensions include attention to the problem, noted earlier, of the presence of 'background risk' affecting decisions over foreground risk (for example, Harrison et al., 2007). For example, it makes little sense to evaluate the value of a statistical life without worrying about the confound of compensating differentials for non-fatal injuries: what does not kill often injures.

Time preferences are also now relatively well understood. The first generation of experiments used loose procedures by modern standards, often relying on the elicitation of present values using fill-in-the-blank (FIB) methods that have notoriously poor behavioral properties. This literature is characterized by the need to use scientific notation to summarize estimated astronomic discount rates, a sure sign that something was wrong with behavior, experimental design or inferential methods. Frederick et al. (2002) summarize the literature up to this point. The second generation of experiments moved towards binary choice tasks to ensure incentive compatibility, albeit at a loss of precision (relative to what FIB methods would deliver if they behaved the way theorists advertised them, which was not the case), and used stakes that were more substantial. Inferred discount rates were now at the level of consumer credit cards: high, but believable (for example, Coller and Williams, 1999; Harrison et al., 2002). The third generation of experiments recognized that discount factors equalize time-dated utility, and not time-dated money, so one needs to account for diminishing marginal utility when inferring discount factors. This is a simple matter of theory, following from the conceptual definition of a discount factor. Jensen's inequality does the rest theoretically: inferred discount rates must be lower if you have a concave utility function than if you assume a linear utility function. Appropriate experimental designs and econometric inferences then simply quantify this insight from theory, with a dramatic reduction in estimated discount rates down to 10 percent or even lower (for example, Andersen et al., 2008); a small numerical illustration of this curvature correction is given at the end of this section. Quite apart from the level of discount rates, there appears to be no support for 'hyperbolicky' specifications of the discounting function in field data (for example, Andersen et al., 2011a). This does not mean that exponential specifications are appropriate for all populations, just that the monolithic presumption in favor of non-exponential specifications is not supported by the data.

Subjective beliefs can be elicited using scoring rule procedures that have a venerable tradition, such as Savage (1971b).
These procedures do require that one correct for risk attitudes, and only directly elicit true subjective beliefs under the assumption of risk neutrality. But it is a relatively simple matter to condition inferences about beliefs on the estimated risk attitudes of individuals, by combining experimental tasks that allow you to identify the risk attitudes independently of the task that elicits subjective beliefs (for
example, Andersen et al., 2014). One can also use generalizations of these scoring rules to elicit whole subjective probability distributions, rather than just one subjective probability (for example, Mathieson and Winkler, 1976, for the theory). This area is the least developed of the three, but the experimental tools are in place for rigorous elicitation, and are being widely applied.

It should be stressed that there are also many loose claims about how you can elicit risk attitudes, time preferences and subjective beliefs 'on the cheap' with simpler methods. In some cases these are hypothetical survey methods, with no theoretical claim to be eliciting anything of interest. In other cases these are experimental methods that rely, as noted, on tasks that are simply not incentive compatible: subjects could exploit the experimenter, for gain, by deliberately misrepresenting their true preferences. Or experimenters use FIB elicitation methods that have known behavioral biases.6 The fact that experimenters assert that these problems did not arise says nothing about whether they do. The existence of relatively transparent, incentive compatible methods leads you to wonder why you would risk using other methods.7

It is appropriate that all of these methods were first developed in laboratory environments, and that the econometric procedures for estimation of preferences and beliefs were first refined in that setting. Laboratory experiments give us control, if designed and executed correctly. If we cannot identify the conceptually correct measure in that setting, we cannot hope to do so in more complicated field settings. But there is a relatively easy bridge between the laboratory and the field, as stressed by Harrison and List (2004), so that both are complementary ways to make inferences.
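As promised above, here is a small quantification of the curvature correction to inferred discount rates (Python; the amounts and the CRRA coefficient are illustrative, and background consumption is ignored for simplicity). The calculation uses the indifference condition u(m_now) = delta^t u(m_later), so that delta, and hence the annual rate, depends on the assumed utility curvature.

```python
import math

def implied_annual_rate(m_now, m_later, years=1.0, rho=0.0):
    """Discount rate from indifference u(m_now) = delta**years * u(m_later),
    with CRRA utility u(x) = x**(1-rho)/(1-rho); rho = 0 is linear utility."""
    if rho == 1.0:
        u = math.log
    else:
        u = lambda x: x**(1.0 - rho) / (1.0 - rho)
    delta = (u(m_now) / u(m_later)) ** (1.0 / years)
    return 1.0 / delta - 1.0

# Indifference between $100 today and $120 in one year:
print(f"linear utility (rho=0.0): {implied_annual_rate(100, 120):.1%}")       # 20.0%
print(f"concave utility (rho=0.5): {implied_annual_rate(100, 120, rho=0.5):.1%}")  # ~9.5%
```

Exactly as Jensen's inequality requires, the concave specification halves the inferred rate relative to the linear one.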
5 RISK AND UNCERTAINTY
The evaluation of naturally occurring choices involves more than just the evaluation of objective risk. Does anything change when we allow for subjective beliefs in the evaluation of a choice? Unfortunately, yes and no. Nothing changes if we assume, following Savage (1971a), that decisions are made as if one obeys the reduction of compound lotteries (ROCL) axiom. But things change radically if one does not make that assumption. This seemingly technical issue is actually of great significance for the evaluation of policy choices, and is worth explaining carefully.

Figure 18.1 illustrates the situation. Assume that the subjective beliefs are symmetric, with mean one-half as shown by the solid, vertical line. But they vary in terms of the underlying distribution, as shown in the four panels of Figure 18.1. Some are just more or less precise than others, and one is bimodal. Under ROCL, all would generate decisions with the same outcome, since all have the same (weighted) average. Something nags at us to say that behavior ought to be different under these different sets of beliefs, but ROCL begs to differ.

Figure 18.1 Symmetric subjective probability distributions (four panels of subjective density over subjective probability, each with mean 0.5)

Figure 18.2 raises the stakes by considering asymmetric distributions. Again, ROCL is a strong, identifying assumption.

Figure 18.2 Asymmetric subjective probability distributions (four panels of subjective density over subjective probability, with means 0.35, 0.23, 0.16 and 0.13)

Together, Figures 18.1 and 18.2 remind us that Savage (1971a) did not assume that people had degenerate subjective probabilities that they held with certainty; he only assumed that under ROCL they behaved as if they did. We often forget that linguistic methodological sidestep, and confuse the 'as if' behavior for what was actually assumed. In some cases the difference does not matter, but here it does. The reason
is that when we have to worry about the underlying non-degenerate distribution, when ROCL is not assumed, then we have moved from the realm of (subjective) risk to uncertainty. And when the individual does not even have enough information to form any subjective belief distribution, degenerate or non-degenerate, we are in the realm of ambiguity.

Figure 18.3 allows a simple illustration of how ROCL allows one to collapse these disparate, non-degenerate distributions into one degenerate weighted average. Figure 18.3 displays a three-point discrete, non-degenerate, subjective distribution over a binary event in which the individual holds subjective probability π = 0.6 with 'prior' probability 0.1, π = 0.7 with 'prior' probability 0.6, and π = 0.8 with 'prior' probability 0.3, for a weighted average π = 0.72.

Figure 18.3 ROCL at work (subjective density plotted against the subjective probability π, with mass at π = 0.6, 0.7 and 0.8)

Now consider a lottery in which one gets $X if the event occurs, and $x otherwise. Then the subjective expected utility (SEU) is

0.1 × 0.6 × U(X) + 0.1 × 0.4 × U(x) + 0.6 × 0.7 × U(X) + 0.6 × 0.3 × U(x) + 0.3 × 0.8 × U(X) + 0.3 × 0.2 × U(x), (18.1)

which collapses to

(0.1 × 0.6 + 0.6 × 0.7 + 0.3 × 0.8) × U(X) + (0.1 × 0.4 + 0.6 × 0.3 + 0.3 × 0.2) × U(x) (18.2)

and hence to

0.72 × U(X) + 0.28 × U(x) (18.3)

under ROCL. So the non-degenerate distribution in Figure 18.3 can be boiled down to a degenerate subjective probability of 0.72: an impressive identifying restriction!
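As a quick arithmetic check, the collapse from (18.1) to (18.3) can be verified numerically (the utility values below are arbitrary placeholders):

```python
priors = {0.6: 0.1, 0.7: 0.6, 0.8: 0.3}  # 'prior' weight on each pi
U_X, U_x = 1.0, 0.0                       # arbitrary utilities of $X and $x

# (18.1): weight each conditional lottery's expected utility by its prior.
seu = sum(w * (pi * U_X + (1 - pi) * U_x) for pi, w in priors.items())

# (18.3): collapse the priors to one weighted-average probability first.
pi_bar = sum(w * pi for pi, w in priors.items())
assert abs(seu - (pi_bar * U_X + (1 - pi_bar) * U_x)) < 1e-12
print(f"weighted-average pi = {pi_bar:.2f}, SEU = {seu:.2f}")  # pi = 0.72
```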
How we relax ROCL is a matter for important, foundational research. Although it has taken half a century for the implications of Ellsberg (1961) to be formalized in tractable ways, we are much closer to doing so. One popular approach is the 'smooth ambiguity model' of Klibanoff et al. (2005). Another popular approach is due to Ghirardato et al. (2004), generalizing Gilboa and Schmeidler (1989).

We can illustrate the smooth ambiguity model with a simple example. Let CE(π = 0.6) be the certainty equivalent of the lottery 0.6 × U(X) + 0.4 × U(x), CE(π = 0.7) be the certainty equivalent of the lottery 0.7 × U(X) + 0.3 × U(x), and CE(π = 0.8) be the certainty equivalent of the lottery 0.8 × U(X) + 0.2 × U(x). Then the evaluation of the lottery can be written

0.1 × f(U(CE(π = 0.6))) + 0.6 × f(U(CE(π = 0.7))) + 0.3 × f(U(CE(π = 0.8))), (18.4)

where f is a function defined over the certainty equivalent of the lottery that is conditional on a particular subjective probability value. Akin to the properties of U(·) defining risk attitudes under EUT or SEU, the properties of f(·) define attitudes towards the uncertainty over the particular subjective probability value.8 If f is concave, then the decision maker is uncertainty averse; if f is convex, then the decision maker is uncertainty loving; and if f is linear, then the decision maker is uncertainty neutral. The familiar SEU specification emerges if f is linear, since then ROCL applies after some irrelevant normalization. The overall evaluation of the lottery choice depends on risk attitudes and uncertainty attitudes, and there is no reason for the decision maker to be averse to both at the same time.

An important econometric corollary is that one cannot infer attitudes toward uncertainty from observed choice until attitudes toward risk are characterized. An equally important implication is that the very definition of a choice setting as involving risk, uncertainty or ambiguity is subjective. The propensity to make decisions using ROCL is a subjective one. The propensity to fill in ambiguous blanks with well-defined subjective belief distributions, whether or not you then apply ROCL to them, is also a subjective one. So you cannot speak a priori of any given environment being one of risk, uncertainty or ambiguity without knowing, or assuming, more about the decision maker.
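A minimal numerical sketch of (18.4) follows (Python; the CRRA utility and the concave exponential f are our own illustrative choices, not the specification of Klibanoff et al., 2005). The linear-f sum is included as the SEU benchmark.

```python
import math

priors = {0.6: 0.1, 0.7: 0.6, 0.8: 0.3}  # priors over subjective pi
X, x = 100.0, 10.0                        # lottery outcomes

def u(c):            # CRRA utility with coefficient 0.5
    return math.sqrt(c)

def u_inv(v):
    return v * v

def f(v, gamma=0.5): # concave f implies uncertainty aversion
    return -math.exp(-gamma * v)

# Certainty equivalent of the lottery conditional on each pi.
ce = {pi: u_inv(pi * u(X) + (1 - pi) * u(x)) for pi in priors}

smooth = sum(w * f(u(ce[pi])) for pi, w in priors.items())  # equation (18.4)
seu = sum(w * u(ce[pi]) for pi, w in priors.items())        # linear f: SEU
print({pi: round(c, 1) for pi, c in ce.items()})
print(f"smooth ambiguity value {smooth:.4f}; SEU benchmark {seu:.4f}")
```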
6 CONCLUSIONS
Choice modeling frameworks can be extended to consider the management and perception of risk, and there is already a rich literature on the implications for modeling and experimental inference. Working backwards from observed choice behavior to infer latent preferences is not easy when one allows decision makers to mitigate risk through their choices, but the theoretical, experimental and econometric tools are in place.
NOTES

1. We are grateful to a reviewer for constructive comments.
2. Many of the risk management strategies that people have at their disposal can potentially affect both the cost and the probability of the loss. However, for pedagogic reasons it is common to assume that there is a separation between self-insurance and self-protection activities.
3. Several studies note that the core concept appeared as early as de Finetti (1952), but this was written in Italian and we cannot verify that claim.
4. The logic is easy to see. Assume lotteries defined solely over gains, and a linear utility function just to remove the effect of diminishing marginal utility. Then if the weighted probability is always equal to or less than the actual (objective or subjective) probability, the EU based on these weighted probabilities will be less than the EV based on the actual probabilities, hence there is a risk premium.
5. In effect, the usual methodological approach is akin to running a horserace, declaring a winner, maybe by a nose, and shooting all of the losing horses. The fact that one of these losers might have done better on a different, wetter track is ignored.
6. See Harrison (1992, 2005, 2006a, 2006b) and Harrison and Rutström (2008) for examples and surveys of these biases.
7. This strong claim is supported by extensive discussion by Harrison (2006a) of the claims that one can easily generate incentive-compatible choices using appropriate hypothetical surveys.
8. In the original specifications f is said to characterize attitudes towards ambiguity, but the earlier definition of risk, uncertainty and ambiguity makes it apparent why one would not want to casually confound the two. One would only be dealing with ambiguity in the absence of well-defined prior probabilities over the three subjective probability values 0.6, 0.7 and 0.8.
REFERENCES

Andersen, S., J. Fountain, G.W. Harrison and E.E. Rutström (2014), 'Estimating subjective probabilities', Journal of Risk and Uncertainty, forthcoming.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2008), 'Eliciting risk and time preferences', Econometrica, 76 (3), 583–618.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2011a), 'Discounting behavior: a reconsideration', Working Paper 2011-03, Center for the Economic Analysis of Risk, Robinson College of Business, Georgia State University.
Andersen, S., G.W. Harrison, M.I. Lau and E.E. Rutström (2011b), 'Multiattribute utility, intertemporal utility and correlation aversion', Working Paper 2011-04, Center for the Economic Analysis of Risk, Robinson College of Business, Georgia State University.
Briys, E. and H. Schlesinger (1990), 'Risk aversion and the propensities for self-insurance and self-protection', Southern Economic Journal, 57 (2), 458–67.
Briys, E., H. Schlesinger and J.M.G. Schulenburg (1991), 'Reliability of risk management: market insurance, self-insurance and self-protection', The Geneva Papers on Risk and Insurance Theory, 16 (1), 45–58.
Coller, M. and M.B. Williams (1999), 'Eliciting individual discount rates', Experimental Economics, 2 (2), 107–27.
Courbage, C. (2001), 'Self-insurance, self-protection and market insurance within the dual theory of choice', The Geneva Papers on Risk and Insurance Theory, 26 (1), 43–56.
Dionne, G. and L. Eeckhoudt (1985), 'Self-insurance, self-protection and increased risk aversion', Economics Letters, 17 (1–2), 39–42.
Doherty, N.A. (1984), 'Portfolio efficient insurance buying strategies', Journal of Risk and Insurance, 51 (2), 205–24.
Doherty, N.A. and H. Schlesinger (1983), 'The optimal deductible for an insurance policy when initial wealth is random', Journal of Business, 56 (4), 555–65.
Dorfleitner, G. and M. Krapp (2007), 'On multiattributive risk aversion: some clarifying results', Review of Managerial Science, 1 (1), 47–63.
Duncan, G.T. (1977), 'A matrix measure of multivariate local risk aversion', Econometrica, 45 (4), 895–903.
Ebert, S. and D. Wiesen (2011), 'Testing for prudence and skewness seeking', Management Science, 57 (7), 1334–49.
Ehrlich, I. and G.S. Becker (1972), 'Market insurance, self-insurance and self-protection', Journal of Political Economy, 80 (4), 623–48.
Ellsberg, D. (1961), 'Risk, ambiguity, and the Savage axioms', Quarterly Journal of Economics, 75 (4), 643–69.
Epstein, L.G. and S.M. Tanny (1980), 'Increasing generalized correlation: a definition and some economic consequences', Canadian Journal of Economics, 13 (1), 16–34.
Frederick, S., G. Loewenstein and T. O'Donoghue (2002), 'Time discounting and time preference: a critical review', Journal of Economic Literature, 40 (2), 351–401.
Friedman, M. and L.J. Savage (1948), 'The utility analysis of choices involving risk', Journal of Political Economy, 56 (4), 279–304.
Ghirardato, P., F. Maccheroni and M. Marinacci (2004), 'Differentiating ambiguity and ambiguity attitude', Journal of Economic Theory, 118 (2), 133–73.
Gilboa, I. and D. Schmeidler (1989), 'Maxmin expected utility with a non-unique prior', Journal of Mathematical Economics, 18 (2), 141–53.
Gould, J.P. (1969), 'The expected utility hypothesis and the selection of optimal deductibles for a given insurance policy', Journal of Business, 42 (2), 143–51.
Harrison, G.W. (1992), 'Theory and misbehavior of first-price auctions: reply', American Economic Review, 82 (December), 1426–43.
Harrison, G.W. (2005), 'Hypothetical bias over uncertain outcomes', in J.A. List (ed.), Using Experimental Methods in Environmental and Resource Economics, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 41–69.
Harrison, G.W. (2006a), 'Making choice studies incentive compatible', in B. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Guide to Theory and Practice, Boston, MA: Kluwer, pp. 65–108.
Harrison, G.W. (2006b), 'Experimental evidence on alternative environmental valuation methods', Environmental and Resource Economics, 34 (1), 125–62.
Harrison, G.W. and J. List (2004), 'Field experiments', Journal of Economic Literature, 42 (4), 1009–55.
Harrison, G.W., S.J. Humphrey and A. Verschoor (2010), 'Choice under uncertainty: evidence from Ethiopia and Uganda', Economic Journal, 120 (543), 80–104.
Harrison, G.W., M. Lau and M.B. Williams (2002), 'Estimating individual discount rates in Denmark: a field experiment', American Economic Review, 92 (5), 1606–17.
Harrison, G.W., J. List and C.A. Towe (2007), 'Naturally occurring preferences and exogenous laboratory experiments: a case study of risk aversion', Econometrica, 75 (2), 433–58.
Harrison, G.W. and E.E. Rutström (2008), 'Experimental evidence on the existence of hypothetical bias in value elicitation experiments', in C.R. Plott and V.L. Smith (eds), Handbook of Experimental Economics Results, New York: Elsevier Press, pp. 41–196.
Harrison, G.W. and E.E. Rutström (2009), 'Expected utility theory and prospect theory: one wedding and a decent funeral', Experimental Economics, 12 (2), 133–58.
Hensher, D.A., S.M. Puckett and J.M. Rose (2007), 'Extending stated choice analysis to recognize agent-specific attribute endogeneity in bilateral group negotiation and choice: a think piece', Transportation, 34 (6), 667–79.
Hiebert, L.D. (1989), 'Optimal loss reduction and increases in risk aversion', Journal of Risk and Insurance, 56 (2), 300–305.
Hirshleifer, J. (1970), Investment, Interest and Capital, Englewood Cliffs, NJ: Prentice-Hall.
Kamis, A., M. Koufaris and T. Stern (2008), 'Using an attribute-based DSS for user-customized products online: an experimental investigation', MIS Quarterly, 32 (1), 159–77.
Karni, E. (1979), 'On multivariate risk aversion', Econometrica, 47 (6), 1391–401.
Keeney, R.L. (1973), 'Risk independence and multiattributed utility functions', Econometrica, 41 (1), 27–34.
Kihlstrom, R.E. and L.J. Mirman (1974), 'Risk aversion with many commodities', Journal of Economic Theory, 8, 361–88.
Klibanoff, P., M. Marinacci and S. Mukerji (2005), 'A smooth model of decision making under ambiguity', Econometrica, 73 (6), 1849–92.
Konrad, K.A. and S. Skaperdas (1993), 'Self-insurance and self-protection: a nonexpected utility analysis', The Geneva Papers on Risk and Insurance Theory, 18 (2), 131–46.
Leeth, J.D. and J. Ruser (2003), 'Compensating wage differentials for fatal and nonfatal injury risk by gender and race', Journal of Risk and Uncertainty, 27 (3), 257–77.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, New York: Cambridge University Press.
Machina, M. (1995), 'Non-expected utility and the robustness of the classical insurance paradigm', The Geneva Papers on Risk and Insurance Theory, 20 (1), 9–50.
Machina, M. (2000), 'The robustness of the classical insurance paradigm', in G. Dionne (ed.), Handbook of Insurance, Boston, MA: Kluwer, pp. 37–96.
Marcucci, E., L. Rotaris and G. Paglione (2009), 'A methodology to evaluate the prospects for the introduction of a Park&Buy service', European Transport, 42, 26–46.
Mathieson, J.E. and R.L. Winkler (1976), 'Scoring rules for continuous probability distributions', Management Science, 22 (10), 1087–96.
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:53:06AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
426
Handbook of choice modelling
Mayers, D. and C.W. Smith Jr (1983), ‘The interdependence of individual portfolio decisions and the demand for insurance’, Journal of Political Economy, 91 (2), 304–11. Mossin, J. (1968), ‘Aspects of rational insurance purchasing,’ Journal of Political Economy, 79 (4), 553–6. Pye, G. (1991), ‘Present values for imperfect capital markets’, Journal of Business, 39 (1), 45–51. Quiggin, J. (1991), ‘Comparative statics for rank-dependent expected utility theory’, Journal of Risk and Uncertainty, 4 (2), 338–50. Quiggin, J. (1992), ‘Risk, self-protection and ex ante economic value-some positive results’, Journal of Environmental Economics and Management, 23 (1), 40–53. Richard, S.F. (1975), ‘Multivariate risk aversion, utility independence and separable utility functions’, Management Science, 22 (1), 12–21. Savage, L.J. (1971a), The Foundations of Statistics, 2nd edn, New York: Dover Publications. Savage, L.J. (1971b), ‘Elicitation of personal probabilities and expectations’, Journal of the American Statistical Association, 66 (December), 783–801. Schlesinger, H. and N. Doherty (1985), ‘Incomplete markets for insurance: an overview’, Journal of Risk and Insurance, 52 (3), 402–23. Shogren, J.F. and T.D. Crocker (1991), ‘Risk, self-protection and ex ante economic value,’ Journal of Environmental Economics and Management, 20 (1), 1–15. Shogren, J.F. and T.D. Crocker (1999), ‘Risk and its consequences’, Journal of Environmental Economics and Management, 37 (1), 44–51. Smith, V.L. (1968), ‘Optimal insurance coverage’, Journal of Political Economy, 76 (1), 68–77. Sweeney, G.H. and T.R. Beard (1992), ‘The comparative statics of self-protection’, Journal of Risk and Insurance, 59 (2), 301–9. Turnbull, S.M. (1983), ‘Additional aspect of rational insurance purchasing’, Journal of Business, 56 (2), 217–29. Viscusi, W.K. (1993), ‘The value of risks to life and health’, Journal of Economic Literature, 11 (4), 1912–46.
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:53:06AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
19 Multiple discrete-continuous choice models: a reflective analysis and a prospective view Chandra Bhat and Abdul Pinjari
1 BACKGROUND
Several consumer choices are characterized by a discrete dimension as well as a continuous dimension. Examples of such choice situations include vehicle-type holdings and usage, appliance choice and energy consumption, housing tenure (rent or purchase) and square footage, brand choice and quantity, and activity-type choice and duration of participation. Two broad model structures may be identified in the literature to handle such discrete-continuous choice situations. The first structure (sometimes referred to as the 'reduced-form' structure) has a separate equation for the discrete choice and another separate equation for the continuous choice, with jointness introduced through statistical correlation between the stochastic components of the two equations. That is, a discrete choice model and a continuous regression model are specified separately, and then simply stitched together statistically through the stochastic terms. This first structure has seen extensive use and has proved useful in handling many empirical situations, but it is not based on an underlying (and unifying) theoretical economic model (this structure does not include the class of indirect utility function-based models that are consistent with utility maximization, as discussed in the next section). The second approach to discrete-continuous choice modeling, and the one of interest in this chapter, originates from the classical microeconomic theory of utility maximization. While much work in the context of consumer utility maximization has focused on the case of a single discrete-continuous (SDC) choice situation (where the choice involves the selection of one of many alternatives, along with the continuous dimension associated with the chosen alternative), there has been increasing interest recently in the multiple discrete-continuous (MDC) choice situation (where the choice situation involves the selection of one or more alternatives, along with a continuous quantity dimension associated with the consumed alternatives). Such MDC choices are pervasive in the social sciences, including transportation, economics, and marketing. Examples include individuals' time-use choices (decisions to engage in different types of activities and the time allocated to each activity), investment portfolios (where and how much to invest) and grocery purchases (brand choice and purchase quantity). Regardless of whether a choice situation belongs to an SDC case or an MDC case, at a basic level, the choice process faced by the consumer can be formulated using the theory of utility maximization, as described next.
1.1 The Random Utility Maximization (RUM) Approach to Modeling Discrete-Continuous Choices
Consumers are assumed to maximize a direct utility function $U(\mathbf{x})$ over a set of non-negative consumption quantities $\mathbf{x} = (x_1, \ldots, x_k, \ldots, x_K)$ subject to a budget constraint, as below:

$$\max\ U(\mathbf{x}) \quad \text{such that} \quad \mathbf{p}'\mathbf{x} = E \ \text{ and } \ x_k \geq 0\ \forall k \quad (19.1)$$
where $U(\mathbf{x})$ is a quasi-concave, increasing and continuously differentiable utility function with respect to the consumption quantity vector, $\mathbf{p}$ is the vector of unit prices for all goods, and $E$ is the total expenditure (or income). Note that we are suppressing the index for the consumer in equation (19.1) for presentation efficiency. The formulation above is equally applicable to cases with complete or incomplete demand systems (that is, the modeling of demand for all commodities that enter preferences or the modeling of demand for a subset of commodities that enter preferences).1 The vector $\mathbf{x}$ in equation (19.1) may or may not include an outside good. The outside good, when included, represents the part of the total budget (for example, income) that is not spent on the $K$ inside goods of interest to the analyst. Generally, the outside good is treated as a numeraire with unit price, implying that the prices and characteristics of all goods grouped into the outside category do not influence the choice and expenditure allocation among the inside goods (see Deaton and Muellbauer, 1980). The outside good allows for the overall demand for the inside goods to change due to changes in prices and other influential factors of the inside goods. Other assumptions typically made in the above utility maximization formulation are: (a) the direct utility contribution due to the consumption of different alternatives is additively separable, and (b) the constraint is linear in prices, and it is the only constraint governing consumers' decisions. We will return to these assumptions later. The form of the utility function $U(\mathbf{x})$ in equation (19.1) determines whether the formulation corresponds to an SDC model or an MDC model. The SDC case assumes that the choice alternatives are perfect substitutes; that is, the choice of one alternative precludes the choice of others. The MDC case accommodates imperfect substitution among goods, thus allowing for the possibility of consuming multiple alternatives. A linear utility form with respect to consumption characterizes the perfect substitutes (or SDC) case, while a non-linear utility form allowing diminishing marginal utility with increasing consumption characterizes the imperfect substitutes (or MDC) case. An example SDC framework is Hanemann's (1984) specification:

$$U(\mathbf{x}) = U^*\left(\sum_{k=2}^{K} \psi_k x_k,\; x_1\right), \quad (19.2)$$

where $U^*$ is a bivariate utility function and $\psi_k$ ($k = 2, \ldots, K$) represents the quality index (or baseline preference) specific to each inside good $k$, with the first good considered as the outside good. This functional form assures that, in addition to the outside good, exactly one inside good ($k = 2, 3, \ldots, K$) is consumed. Hanemann (1984) refers to this as the 'extreme corner solution'. Examples of MDC frameworks are discussed later.
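To see the implication of the linear-in-consumption (perfect substitutes) form concretely, the following short Python sketch (all numbers hypothetical) allocates a fixed inside-good budget: the entire amount goes to the single inside good with the highest quality-to-price ratio, which is precisely the extreme corner solution just described.

```python
import numpy as np

# Hypothetical example: with linear (perfect-substitute) utility for the
# inside goods, all inside-good expenditure goes to the single good with
# the highest quality-to-price ratio psi_k / p_k -- an extreme corner solution.
psi = np.array([1.2, 0.8, 1.5])    # quality indices of inside goods k = 2, 3, 4
p = np.array([1.0, 0.5, 2.0])      # unit prices of the inside goods
inside_budget = 10.0               # expenditure reserved for the inside goods

best = np.argmax(psi / p)          # good with the highest psi_k / p_k
x = np.zeros_like(psi)
x[best] = inside_budget / p[best]  # entire inside budget on that one good

print(f"chosen inside good: {best}, consumption vector: {x}")
```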
Two approaches have been used to derive demand functions for the consumption quantities in the utility maximization problem of equation (19.1). The first approach, due to Hanemann (1978) and Wales and Woodland (1983), solves the constrained utility maximization problem in equation (19.1) directly via a standard application of the Karush–Kuhn–Tucker (KKT) first-order necessary conditions of optimality. Considering the utility function $U(\mathbf{x})$ to be random over the population leads to stochastic KKT conditions, which form the basis for deriving probabilities for consumption patterns (including corner solutions). This approach is called the KKT approach due to the central role played by the KKT conditions (more popularly, the approach is referred to as the KT approach, but we use the label 'KKT' to give credit to Karush who, in an unpublished manuscript, derived the first-order optimality conditions in a constrained optimization setting even earlier than Kuhn and Tucker). The second approach, due to Hanemann (1984) and Lee and Pitt (1986), solves the maximization problem in equation (19.1) by using 'virtual prices' (a method that is dual to the KKT approach), which allows the analysis to start with the specification of a conditional indirect utility function. Subsequently, the implied Marshallian demand functions are obtained via Roy's identity (Roy, 1947).2 The vast majority of applications in the literature have involved single discrete or SDC choices. These use the indirect utility approach as opposed to the KKT approach (that is, the direct utility approach), mainly because the KKT approach was perceived as difficult to use until the past decade, primarily due to the absence of practical methods for estimating the structural parameters. In particular, the KKT conditions, in a stochastic setting, lead to a probability expression for the consumption vector that involves multidimensional integrals of the order of the number of goods in the analysis, as discussed in section 3.2 (and, until Bhat, 2005, this expression was thought to be analytically intractable). Further, simple and practically feasible prediction and welfare analysis methods were not available for models based on the KKT approach. However, recent interest in MDC problems has brought renewed attention to the KKT approach. Moreover, the use of direct utility functions has some advantages: the relationship of the utility function to behavioral theory is more transparent, offering more interpretable parameters and better insights into identification issues. This is true even for the SDC case. For example, Bunch (2009) shows that the indirect utility function used by Chintagunta (1993) is in fact from the linear expenditure system, so the direct utility function is known. Applying the KKT approach yields the correct analytical expression for the reservation price in terms of parameters from the direct utility function, which has a clear behavioral interpretation. Over the past decade, the field has witnessed significant strides in using the KKT approach for modeling MDC choices, both for estimation of the parameters of KKT models and for application of the models for forecasting and welfare analysis. Thus, in this chapter, we focus on the KKT approach to modeling MDC choices. Specifically, we review the recent advances and outline an agenda for future research.

1.2 Structure of the Chapter

The rest of this chapter is organized as follows.
The next section provides an overview of the utility forms used to model MDC choices. Section 3 outlines the econometric
structure and KKT conditions of optimality that form the basis for deriving the model structure and likelihood expressions. Section 4 outlines the specific model structures used in the literature based on different specifications of the utility form and the stochastic structure. Section 5 provides a brief discussion of the case where the choice alternatives comprise a combination of imperfect and perfect substitutes. Section 6 presents methods that enable the use of the KKT-based MDC models for forecasting and policy analysis purposes. Section 7 discusses several developments on the horizon and the challenges that lie beyond. Section 8 summarizes the chapter.
2 UTILITY FORMS FOR MODELING MDC CHOICES
As discussed earlier, non-linear utility forms that allow diminishing marginal utility with increasing consumption can be used to model 'multiple discreteness' in consumer choices. A number of different utility forms have been used in the literature. In this section, we discuss the following form used in Bhat (2008), as it subsumes a variety of utility forms used in previous studies as special cases:

$$U(\mathbf{x}) = \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k}\,\psi_k\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k}-1\right\} \quad (19.3)$$
In the above utility function, $U(\mathbf{x})$ is a quasi-concave, increasing, and continuously differentiable function with respect to the consumption quantity $(K \times 1)$-vector $\mathbf{x}$ ($x_k \geq 0$ for all $k$), and $\psi_k$, $\gamma_k$ and $\alpha_k$ are parameters associated with good $k$. The function in equation (19.3) is a valid utility function if $\psi_k > 0$ and $\alpha_k \leq 1$ for all $k$. Further, for presentation ease, we assume temporarily that there is no Hicksian composite outside good that is consumed by all decision makers, so that corner solutions (that is, zero consumptions) are allowed for all the goods $k$. The possibility of corner solutions implies that the term $\gamma_k$, which is a translation parameter, should be greater than zero for all $k$. The reader will note that there is an assumption of additive separability of preferences in the utility form of equation (19.3). More on this assumption later. The form of the utility function in equation (19.3) highlights the role of the various parameters $\psi_k$, $\gamma_k$ and $\alpha_k$, and explicitly indicates the inter-relationships between these parameters that relate to theoretical and empirical identification issues. The form also assumes weak complementarity (see Mäler, 1974), which implies that the consumer receives no utility from a non-essential good's attributes if he or she does not consume it (that is, a good and its quality attributes are weak complements). The functional form proposed by Bhat (2008) in equation (19.3) generalizes earlier forms used by Hanemann (1978), von Haefen et al. (2004), Phaneuf et al. (2000) and others. Specifically, the utility form of equation (19.3) collapses to the following linear expenditure system (LES) form when $\alpha_k \to 0\ \forall k$:

$$U(\mathbf{x}) = \sum_{k=1}^{K} \gamma_k\,\psi_k \ln\!\left(\frac{x_k}{\gamma_k}+1\right) \quad (19.4)$$
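A quick numerical check, shown below with made-up parameter values, confirms this limiting behavior: the value of equation (19.3) approaches the LES value of equation (19.4) as $\alpha_k \to 0$.

```python
import numpy as np

def utility_boxcox(x, psi, gamma, alpha):
    """Bhat (2008) utility form of equation (19.3)."""
    return np.sum((gamma / alpha) * psi * ((x / gamma + 1.0) ** alpha - 1.0))

def utility_les(x, psi, gamma):
    """LES limiting form of equation (19.4), obtained as alpha_k -> 0."""
    return np.sum(gamma * psi * np.log(x / gamma + 1.0))

x = np.array([2.0, 0.0, 5.0])        # a consumption bundle (corner for good 2)
psi = np.array([1.0, 0.7, 1.3])      # baseline marginal utilities
gamma = np.array([1.0, 2.0, 0.5])    # translation (satiation) parameters

for alpha in (0.5, 0.1, 0.01, 0.001):
    u = utility_boxcox(x, psi, gamma, np.full(3, alpha))
    print(f"alpha = {alpha:6.3f}: U = {u:.6f}")
print(f"LES limit       : U = {utility_les(x, psi, gamma):.6f}")
```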
2.1 Role of Parameters in the Utility Specification

2.1.1 Role of $\psi_k$
The role of $\psi_k$ can be inferred by computing the marginal utility of consumption with respect to good $k$, which is:

$$\frac{\partial U(\mathbf{x})}{\partial x_k} = \psi_k\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k - 1} \quad (19.5)$$
It is clear from above that $\psi_k$ represents the baseline marginal utility, or the marginal utility at the point of zero consumption of good $k$. Alternatively, the marginal rate of substitution between any two goods $k$ and $l$ at the point of zero consumption of both goods is $\psi_k/\psi_l$. This is the case regardless of the values of $\gamma_k$ and $\alpha_k$. Thus, a good $k$ with a higher baseline marginal utility is more likely to be consumed than a good $l$ with a lower baseline marginal utility.

2.1.2 Role of $\gamma_k$
An important role of the $\gamma_k$ terms is to shift the position of the point at which the indifference curves are asymptotic to the axes from $(0, 0, 0, \ldots, 0)$ to $(-\gamma_1, -\gamma_2, -\gamma_3, \ldots, -\gamma_K)$, so that the indifference curves strike the positive orthant with a finite slope. This, combined with the consumption point corresponding to the location where the budget line is tangential to the indifference curve, results in the possibility of zero consumption of good $k$. To see this, consider two goods 1 and 2 with $\psi_1 = \psi_2 = 1$, $\alpha_1 = \alpha_2 = 0.5$, and $\gamma_2 = 1$. Figure 19.1 presents the profiles of the indifference curves in this two-dimensional space for various values of $\gamma_1$ ($\gamma_1 > 0$). To compare the profiles, the indifference curves are all drawn to go through the point (0, 8). You will also note that all the indifference curve profiles strike the y-axis with the same slope. As can be observed from Figure 19.1, the positive values of $\gamma_1$ and $\gamma_2$ lead to indifference curves that cross the axes of the positive orthant, allowing for corner solutions. The indifference curve profiles are asymptotic to the x-axis at $y = -1$ (corresponding to the constant value of $\gamma_2 = 1$), while they are asymptotic to the y-axis at $x = -\gamma_1$.

Figure 19.1 Indifference curves corresponding to different values of $\gamma_1$

Figure 19.2 points to another role of the $\gamma_k$ term as a satiation parameter. Specifically, the figure plots the sub-utility function for alternative $k$ for $\alpha_k \to 0$ and $\psi_k = 1$, and for different values of $\gamma_k$. All of the curves have the same slope $\psi_k = 1$ at the origin point, because of the functional form used here. However, the marginal utilities vary for the different curves at $x_k > 0$. Specifically, the higher the value of $\gamma_k$, the less is the satiation effect in the consumption of $x_k$. Thus, different values of $\gamma_k$ lead to different satiation effects, provided $\alpha_k < 1$.

Figure 19.2 Effect of $\gamma_k$ value on good $k$'s sub-utility function profile

2.1.3 Role of $\alpha_k$
The express role of $\alpha_k$ is to reduce the marginal utility with increasing consumption of good $k$; that is, it represents a satiation parameter. When $\alpha_k = 1$ for all $k$, this represents the case of absence of satiation effects or, equivalently, the case of constant marginal utility. The utility function in equation (19.3) in such a situation collapses to $\sum_k \psi_k x_k$, which represents the perfect substitutes case. This is the case of single discreteness. As $\alpha_k$ moves downward from the value of 1, the satiation effect for good $k$ increases. When $\alpha_k \to 0$, the utility function collapses to the LES form, as discussed earlier. $\alpha_k$ can also take negative values and, when $\alpha_k \to -\infty$, this implies immediate and full satiation. Figure 19.3 plots the utility function for alternative $k$ for $\gamma_k = 1$ and $\psi_k = 1$, and for different values of $\alpha_k$. Again, all of the curves have the same slope $\psi_k = 1$ at the origin point, and accommodate different levels of satiation through different values of $\alpha_k$ for any given $\gamma_k$ value.
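The satiation profiles in Figures 19.2 and 19.3 are straightforward to reproduce numerically. The sketch below (illustrative values only) evaluates the sub-utility of a single good for different $\gamma_k$ (with $\alpha_k \to 0$) and for different $\alpha_k$ (with $\gamma_k = 1$), and verifies that every profile has slope $\psi_k = 1$ at the origin.

```python
import numpy as np

def sub_utility(x, psi=1.0, gamma=1.0, alpha=0.0):
    """Sub-utility of one good from equation (19.3); alpha = 0 gives the LES form."""
    if alpha == 0.0:
        return gamma * psi * np.log(x / gamma + 1.0)
    return (gamma / alpha) * psi * ((x / gamma + 1.0) ** alpha - 1.0)

x = np.linspace(0.0, 400.0, 401)
for gamma in (1, 5, 10, 20, 100):           # Figure 19.2: vary gamma, alpha -> 0
    u = sub_utility(x, gamma=gamma)
    print(f"gamma = {gamma:4d}: U(400) = {u[-1]:8.2f}")
for alpha in (1.0, 0.5, 0.25, -0.5, -2.0):  # Figure 19.3: vary alpha, gamma = 1
    u = sub_utility(x, alpha=alpha)
    print(f"alpha = {alpha:5.2f}: U(400) = {u[-1]:8.2f}")

# Every profile has (approximately) slope psi = 1 at the origin:
eps = 1e-6
print(sub_utility(np.array([eps]))[0] / eps)  # ~1.0
```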
Figure 19.3 Effect of $\alpha_k$ value on good $k$'s sub-utility function profile

2.2 Empirical Identification Issues Associated with Utility Form
The discussion in the previous section indicates that $\psi_k$ reflects the baseline marginal utility, which controls whether or not a good is selected for positive consumption (or the extensive margin of choice). The role of $\gamma_k$ is to enable corner solutions, though it also governs the level of satiation. The purpose of $\alpha_k$ is solely to allow satiation. The precise functional mechanisms through which $\gamma_k$ and $\alpha_k$ impact satiation are, however, different; $\gamma_k$ controls satiation by translating consumption quantity, while $\alpha_k$ controls satiation by exponentiating consumption quantity. Clearly, both these effects operate in different ways, and different combinations of their values lead to different satiation profiles. However, empirically speaking, and as discussed in detail in Bhat (2008), it is very difficult to disentangle the two effects separately, which leads to serious empirical identification problems and estimation breakdowns when one attempts to estimate both $\gamma_k$ and $\alpha_k$ parameters for each good. In fact, for a given $\psi_k$ value, it is possible to closely approximate a sub-utility function profile based on a combination of $\gamma_k$ and $\alpha_k$ values with a sub-utility function based solely on $\gamma_k$ or $\alpha_k$ values. In actual application, it would benefit the analyst to estimate models based on both the $\alpha_k$-profile (that is, a utility function based solely on $\alpha_k$ values) and the $\gamma_k$-profile (that is, a utility function based solely on $\gamma_k$ values, with the $\alpha_k$ values set to zero), and choose the specification that provides a better statistical fit. Alternatively, the analyst can stick with one functional form a priori, but experiment with various fixed values of $\alpha_k$ for the $\gamma_k$-profile and of $\gamma_k$ for the $\alpha_k$-profile.

2.3 Utility Form for Situations with an Outside Good

Thus far, the discussion has assumed that there is no outside numeraire good (that is, no essential Hicksian composite good). If an outside good is present, label it as the first
good, which now has a unit price of one. Then, the utility functional form needs to be modified as follows:

$$U(\mathbf{x}) = \frac{1}{\alpha_1}\,\psi_1 (x_1+\gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k}\,\psi_k\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k} - 1\right\} \quad (19.6)$$
In the above formula, we need $\gamma_1 \leq 0$, while $\gamma_k > 0$ for $k > 1$. Also, we need $x_1 + \gamma_1 > 0$. The magnitude of $\gamma_1$ may be interpreted as the required lower bound (or a 'subsistence value') for consumption of the outside good. The identification considerations discussed for the 'no-outside good' case carry over to the 'with outside good' case. For example, as in the 'no-outside good' case, the analyst will generally not be able to estimate both $\alpha_k$ and $\gamma_k$ for the outside and inside goods. Another important normalization necessary for parameter identification, regardless of the presence or absence of the outside good, is that the coefficients of explanatory variables (including the constants) in the baseline utility parameters $\psi_k$ ($k = 1, 2, \ldots, K$) should be normalized (for example, to zero) for at least one alternative. In situations with a Hicksian composite outside good, the natural candidate for such normalization is the baseline marginal utility parameter of the outside good. This identification condition is similar to that in the standard discrete choice model, though the origin of the condition differs between standard discrete choice models and multiple discrete-continuous models. In standard discrete choice models, individuals choose the alternative with the highest indirect utility, so that all that matters is relative utility. In multiple discrete-continuous models, the origin of this condition is the adding-up (or budget) constraint associated with the quantity of consumption of each good.
3 ECONOMETRIC STRUCTURE AND KARUSH–KUHN–TUCKER (KKT) CONDITIONS OF OPTIMALITY
The KKT approach employs a direct stochastic specification by assuming the utility function $U(\mathbf{x})$ to be random over the population. In all recent applications of the KKT approach for multiple discreteness, a multiplicative random element is introduced to the baseline marginal utility of each good as follows:

$$\psi(z_k, \varepsilon_k) = \psi(z_k)\cdot e^{\varepsilon_k}, \quad (19.7)$$

where $z_k$ is a set of attributes characterizing alternative $k$ and the decision maker, and $\varepsilon_k$ captures idiosyncratic (unobserved) characteristics that impact the baseline utility for good $k$. The exponential form for the introduction of the random term guarantees the positivity of the baseline utility as long as $\psi(z_k) > 0$. To ensure this latter condition, $\psi(z_k)$ is further parameterized as $\exp(\beta' z_k)$, which then leads to the following form for the baseline random utility associated with good $k$:

$$\psi(z_k, \varepsilon_k) = \exp(\beta' z_k + \varepsilon_k) \quad (19.8)$$
The $z_k$ vector in the above equation includes a constant term. The overall random utility function of equation (19.3) then takes the following form:

$$U(\mathbf{x}) = \sum_{k=1}^{K} \frac{\gamma_k}{\alpha_k}\,\exp(\beta' z_k + \varepsilon_k)\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k}-1\right\} \quad (19.9)$$
As indicated earlier, the part of $\beta'$ (that is, the coefficients of explanatory variables) corresponding to at least one alternative must be normalized to zero. In the presence of a Hicksian composite outside good, arbitrarily designating the first alternative as the outside good, the overall random utility function can be written as:

$$U(\mathbf{x}) = \frac{1}{\alpha_1}\exp(\varepsilon_1)\,(x_1+\gamma_1)^{\alpha_1} + \sum_{k=2}^{K}\frac{\gamma_k}{\alpha_k}\exp(\beta' z_k + \varepsilon_k)\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k}-1\right\} \quad (19.10)$$
Note that, for identification, $\psi(z_1, \varepsilon_1)$ is specified as $e^{\varepsilon_1}$, by normalizing the coefficients of $z_1$ to zero. But some studies (particularly those in the environmental economics literature) impose a stronger normalization by considering the utility of the outside good as being deterministic (that is, $\varepsilon_1 = 0$) and setting $\psi(z_1, \varepsilon_1) = 1$. Then the overall random utility function becomes:

$$U(\mathbf{x}) = \frac{1}{\alpha_1}(x_1+\gamma_1)^{\alpha_1} + \sum_{k=2}^{K}\frac{\gamma_k}{\alpha_k}\exp(\beta' z_k + \varepsilon_k)\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k}-1\right\} \quad (19.11)$$
While the above normalization is not theoretically inappropriate, it is unnecessary. Further, it is arbitrary to set a good's utility contribution to be deterministic. This is particularly a problem in situations with no Hicksian composite outside good, where the analyst has to arbitrarily choose the utility contribution of any one alternative to be deterministic. Further, as demonstrated in Bhat (2008), the probability expressions and probability values for the consumption pattern depend on which choice alternative is chosen for this normalization. Finally, in contexts with an outside good, including the stochastic term $\varepsilon_1$ on the outside good helps in capturing correlation among the random utilities of the inside goods. Such correlation helps in inducing greater competition among the consumptions of the inside goods, when compared with the competition between the inside goods and the outside good. Thus, we prefer the specification with stochasticity in the utility contribution of all choice alternatives, including that of the outside good in situations with an outside good.

3.1 Optimal Consumptions

The analyst can solve for the optimal expenditure allocations by forming the Lagrangian and applying the KKT conditions. For the utility form in equation (19.10), the Lagrangian function for the problem is:3

$$\mathcal{L} = \frac{1}{\alpha_1}\exp(\varepsilon_1)(x_1+\gamma_1)^{\alpha_1} + \sum_{k=2}^{K}\frac{\gamma_k}{\alpha_k}\exp(\beta' z_k + \varepsilon_k)\left\{\left(\frac{x_k}{\gamma_k}+1\right)^{\alpha_k}-1\right\} - \lambda\left[\sum_{k=1}^{K} p_k x_k - E\right], \quad (19.12)$$
where $\lambda$ is the Lagrangian multiplier associated with the budget constraint (that is, it can be viewed as the marginal utility of total expenditure or income). The KKT first-order conditions for the optimal consumptions (the $x_k^*$ values) are given by:

$$\frac{\exp(\varepsilon_1)\,(x_1^*+\gamma_1)^{\alpha_1-1}}{p_1} = \lambda, \ \text{ since } x_1^* > 0$$
$$\frac{\exp(\beta' z_k + \varepsilon_k)}{p_k}\left(\frac{x_k^*}{\gamma_k}+1\right)^{\alpha_k-1} = \lambda, \ \text{ if } x_k^* > 0,\ k = 2, \ldots, K \quad (19.13)$$
$$\frac{\exp(\beta' z_k + \varepsilon_k)}{p_k}\left(\frac{x_k^*}{\gamma_k}+1\right)^{\alpha_k-1} < \lambda, \ \text{ if } x_k^* = 0,\ k = 2, \ldots, K$$

In the above KKT conditions, the first condition is for the outside good, while the next two sets of conditions are for the inside goods ($k = 2, 3, \ldots, K$). Note that the price of the Hicksian outside numeraire good $p_1$ is unity. The optimal demand satisfies the conditions in equation (19.13) plus the budget constraint $\sum_k p_k x_k^* = E$. Substituting for the expression of $\lambda$ from the KKT condition for the outside good into the KKT conditions for the inside goods, and taking logarithms, one can rewrite the KKT conditions as:

$$V_k + \varepsilon_k = V_1 + \varepsilon_1 \ \text{ if } x_k^* > 0 \ (k = 2, 3, \ldots, K)$$
$$V_k + \varepsilon_k < V_1 + \varepsilon_1 \ \text{ if } x_k^* = 0 \ (k = 2, 3, \ldots, K), \quad (19.14)$$

where $V_1 = (\alpha_1 - 1)\ln(x_1^* + \gamma_1) - \ln p_1$, and $V_k = \beta' z_k + (\alpha_k - 1)\ln\!\left(\dfrac{x_k^*}{\gamma_k}+1\right) - \ln p_k$ ($k = 2, 3, \ldots, K$).
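As a small illustration of equation (19.14) (hypothetical inputs; this is not estimation code), one can compute $V_1$ and the $V_k$ terms for a candidate consumption bundle and check the equality/inequality conditions directly:

```python
import numpy as np

def v_outside(x1, gamma1, alpha1, p1=1.0):
    # V1 = (alpha_1 - 1) ln(x1* + gamma_1) - ln p1, from equation (19.14)
    return (alpha1 - 1.0) * np.log(x1 + gamma1) - np.log(p1)

def v_inside(x, z, beta, gamma, alpha, p):
    # Vk = beta'z_k + (alpha_k - 1) ln(xk*/gamma_k + 1) - ln pk
    return z @ beta + (alpha - 1.0) * np.log(x / gamma + 1.0) - np.log(p)

# Hypothetical data: 3 inside goods, 2 covariates each
beta = np.array([0.5, -0.2])
z = np.array([[1.0, 0.3], [1.0, 1.2], [1.0, -0.4]])
x = np.array([4.0, 0.0, 2.0])          # good 2 is at a corner (zero consumption)
gamma, alpha, p = np.ones(3), np.full(3, 0.5), np.array([1.0, 2.0, 1.5])

V1 = v_outside(x1=10.0, gamma1=0.0, alpha1=0.5)
Vk = v_inside(x, z, beta, gamma, alpha, p)
# At an optimum with epsilon_k = epsilon_1 = 0, consumed goods satisfy
# Vk = V1 and non-consumed goods satisfy Vk < V1.
print(V1, Vk)
```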
3.2 General Econometric Model Structure and Identification
To complete the model structure, the analyst needs to specify the error structure. In the general case, let the joint probability density function of the $\varepsilon_k$ terms be $f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$. Then, the probability that the individual consumes the first $M$ of the $K$ goods is:

$$P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1=-\infty}^{+\infty} \int_{\varepsilon_{M+1}=-\infty}^{V_1-V_{M+1}+\varepsilon_1} \int_{\varepsilon_{M+2}=-\infty}^{V_1-V_{M+2}+\varepsilon_1} \cdots \int_{\varepsilon_{K-1}=-\infty}^{V_1-V_{K-1}+\varepsilon_1} \int_{\varepsilon_K=-\infty}^{V_1-V_K+\varepsilon_1} f(\varepsilon_1,\ V_1-V_2+\varepsilon_1,\ V_1-V_3+\varepsilon_1,\ \ldots,\ V_1-V_M+\varepsilon_1,\ \varepsilon_{M+1},\ \varepsilon_{M+2},\ \ldots,\ \varepsilon_{K-1},\ \varepsilon_K)\; d\varepsilon_K\, d\varepsilon_{K-1} \cdots d\varepsilon_{M+2}\, d\varepsilon_{M+1}\, d\varepsilon_1, \quad (19.15)$$
where $J$ is the Jacobian whose elements are given by (see Bhat, 2005):

$$J_{ih} = \frac{\partial\,[V_1 - V_{i+1} + \varepsilon_1]}{\partial x_{h+1}^*} = \frac{\partial\,[V_1 - V_{i+1}]}{\partial x_{h+1}^*};\quad i, h = 1, 2, \ldots, M-1. \quad (19.16)$$
The probability expression in equation (19.15) is a $(K-M+1)$-dimensional integral. The dimensionality of the integral can be reduced by one by noticing that the KKT conditions can also be written in a differenced form. To do so, define $\tilde\varepsilon_{k1} = \varepsilon_k - \varepsilon_1$, and let the implied multivariate distribution of the error differences be $g(\tilde\varepsilon_{21}, \tilde\varepsilon_{31}, \ldots, \tilde\varepsilon_{K1})$. Then, equation (19.15) may be written in the equivalent $(K-M)$-integral form shown below:

$$P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\tilde\varepsilon_{M+1,1}=-\infty}^{V_1-V_{M+1}} \int_{\tilde\varepsilon_{M+2,1}=-\infty}^{V_1-V_{M+2}} \cdots \int_{\tilde\varepsilon_{K-1,1}=-\infty}^{V_1-V_{K-1}} \int_{\tilde\varepsilon_{K,1}=-\infty}^{V_1-V_K} g(V_1-V_2,\ V_1-V_3,\ \ldots,\ V_1-V_M,\ \tilde\varepsilon_{M+1,1},\ \tilde\varepsilon_{M+2,1},\ \ldots,\ \tilde\varepsilon_{K,1})\; d\tilde\varepsilon_{K,1}\, d\tilde\varepsilon_{K-1,1} \cdots d\tilde\varepsilon_{M+1,1} \quad (19.17)$$

The equation above indicates that the probability expression for the observed optimal consumption pattern of goods is completely characterized by the $(K-1)$ error terms in the differenced form. Thus, all that is estimable is the $(K-1) \times (K-1)$ covariance matrix of the error differences. In other words, it is not possible to estimate a full covariance matrix for the original error terms $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$ because there are infinite possible densities for $f(.)$ that can map into the same $g(.)$ density for the error differences (see Train, 2003, p. 27, for a similar situation in the context of standard discrete choice models). There are many possible ways to normalize $f(.)$ to account for this situation. For example, one can assume an identity covariance matrix for $f(.)$, which automatically accommodates the normalization that is needed. Alternatively, one can estimate $g(.)$ without reference to $f(.)$. In the general case when the unit prices $p_k$ vary across goods, it is possible to estimate $K(K-1)/2$ parameters of the full covariance matrix of the error differences, as just discussed (though the analyst might want to impose constraints on this full covariance matrix for ease in interpretation and stability in estimation). However, when the unit prices are not different among the goods, an additional scaling restriction needs to be imposed. A typical way to do this is by normalizing the scale of the random error terms (that is, the scale of the $\varepsilon_k$ terms) to one.
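The differencing argument can be made concrete with a few lines of linear algebra; the sketch below uses an arbitrary illustrative covariance matrix. If $\Sigma$ is the covariance of the original errors and $D$ is the $(K-1)\times K$ differencing matrix, the covariance of the error differences is $D\Sigma D'$, and it is unchanged by any common shift in the original errors, which is exactly why only the differenced covariance is estimable.

```python
import numpy as np

K = 4
Sigma = np.array([[1.0, 0.2, 0.1, 0.0],   # an arbitrary illustrative covariance
                  [0.2, 1.5, 0.3, 0.1],   # of the original errors (e1, ..., eK)
                  [0.1, 0.3, 2.0, 0.4],
                  [0.0, 0.1, 0.4, 1.2]])

# Differencing matrix: row k gives e_{k+1} - e_1
D = np.hstack([-np.ones((K - 1, 1)), np.eye(K - 1)])
Sigma_diff = D @ Sigma @ D.T   # (K-1) x (K-1) covariance of the error differences

# Adding a common component to every error leaves Sigma_diff unchanged:
Sigma_shifted = Sigma + 0.5 * np.ones((K, K))
print(np.allclose(D @ Sigma_shifted @ D.T, Sigma_diff))  # True
```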
4 SPECIFIC MODEL STRUCTURES

4.1 The Multiple Discrete-Continuous Extreme-Value (MDCEV) Model

Following Bhat (2005, 2008), consider an extreme value distribution for $\varepsilon_k$ and assume that $\varepsilon_k$ is independent of $z_k$ ($k = 1, 2, \ldots, K$). The $\varepsilon_k$'s are also assumed to be independently distributed across alternatives with a scale parameter of $\sigma$ ($\sigma$ can be normalized to one if there is no variation in unit prices across goods). With $\psi(z_k, \varepsilon_k) = \psi(z_k)\cdot e^{\varepsilon_k}$, the $V_k$ terms of equation (19.14) are defined as follows:
$$V_1 = (\alpha_1 - 1)\ln(x_1^* + \gamma_1) - \ln p_1$$
$$V_k = \beta' z_k + (\alpha_k - 1)\ln(x_k^* + 1) - \ln p_k \ (k = 2, 3, \ldots, K), \text{ when the } \alpha\text{-profile is used, and} \quad (19.18)$$
$$V_k = \beta' z_k - \ln\!\left(\frac{x_k^*}{\gamma_k}+1\right) - \ln p_k \ (k = 2, 3, \ldots, K), \text{ when the } \gamma\text{-profile is used.}$$
As discussed earlier, it is generally not possible to estimate the full $V_k$ form in equation (19.14), because the $\alpha_k$ terms and $\gamma_k$ terms serve a similar satiation role. From equation (19.17), the probability that the individual allocates expenditure to the first $M$ of the $K$ goods ($M \geq 1$) with a corresponding consumption vector $\mathbf{x}^* = (x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0)$ is:

$$P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1=-\infty}^{+\infty}\left\{\prod_{i=2}^{M}\frac{1}{\sigma}\,\lambda\!\left[\frac{V_1 - V_i + \varepsilon_1}{\sigma}\right]\right\}\left\{\prod_{s=M+1}^{K}\Lambda\!\left[\frac{V_1 - V_s + \varepsilon_1}{\sigma}\right]\right\}\frac{1}{\sigma}\,\lambda\!\left(\frac{\varepsilon_1}{\sigma}\right)d\varepsilon_1, \quad (19.19)$$
where $\lambda$ is the standard extreme value density function, $\Lambda$ is the standard extreme value cumulative distribution function, and $|J|$ is the determinant of the Jacobian matrix obtained from applying the change of variables calculus between the stochastic KKT conditions and the consumptions, given by the following expression (Bhat, 2008):

$$|J| = \frac{1}{p_1}\left(\prod_{i=1}^{M} f_i\right)\left(\sum_{i=1}^{M}\frac{p_i}{f_i}\right), \ \text{ where } f_i = \left(\frac{1-\alpha_i}{x_i^* + \gamma_i}\right) \quad (19.20)$$
The integral in equation (19.19) collapses to a surprisingly simple closed-form expression, providing the following overall expression (Bhat, 2008):

$$P(x_1^*, x_2^*, x_3^*, \ldots, x_M^*, 0, 0, \ldots, 0) = \frac{1}{p_1}\cdot\frac{1}{\sigma^{M-1}}\cdot\left(\prod_{i=1}^{M} f_i\right)\left(\sum_{i=1}^{M}\frac{p_i}{f_i}\right)\left[\frac{\prod_{i=1}^{M} e^{V_i/\sigma}}{\left(\sum_{k=1}^{K} e^{V_k/\sigma}\right)^{M}}\right](M-1)! \quad (19.21)$$
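Equation (19.21) is simple enough to evaluate directly. The following sketch (illustrative parameter values; the $V$ vector is assumed to have been computed from equation (19.18), and goods are pre-sorted so that the consumed goods come first with the outside good as good 1) computes the MDCEV likelihood contribution of one observed consumption pattern.

```python
import numpy as np
from math import factorial

def mdcev_prob(V, x, p, gamma, alpha, sigma=1.0):
    """MDCEV probability of equation (19.21) (a hypothetical helper, not
    library code).  Goods must be ordered so that consumed goods (x > 0)
    come first, with the outside good as good 1; V, x, p, gamma, alpha
    are K-vectors."""
    M = int(np.sum(x > 0))                    # number of consumed goods
    f = (1.0 - alpha[:M]) / (x[:M] + gamma[:M])
    jac = np.prod(f) * np.sum(p[:M] / f)      # Jacobian pieces of eq. (19.20)
    logit_part = (np.exp(np.sum(V[:M]) / sigma)
                  / np.sum(np.exp(V / sigma)) ** M)
    return (1.0 / p[0]) * sigma ** (1 - M) * jac * logit_part * factorial(M - 1)

# Hypothetical example: outside good plus 3 inside goods, two of them consumed.
V = np.array([0.0, 0.4, -0.1, 0.2])
x = np.array([6.0, 3.0, 1.0, 0.0])
p = np.array([1.0, 1.5, 2.0, 1.2])
gamma = np.array([0.0, 1.0, 1.0, 1.0])
alpha = np.full(4, 0.5)
print(mdcev_prob(V, x, p, gamma, alpha))
```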
The reader will note that the above probability expression can be used even in contexts without an essential Hicksian composite outside good. The only difference in the probability expressions between the two contexts is in how $V_1$ is defined. Specifically, in situations without an essential Hicksian composite outside good, $V_1$ is defined in the same fashion as the $V_k$ ($k = 2, 3, \ldots, K$) are defined in equation (19.18). Further, the expression in equation (19.21) is dependent on the unit price of the good that is used as the first one (see the $1/p_1$ term in front). In particular, different probabilities of the same consumption pattern arise depending on the good that is labeled as the first good (note that any good that is consumed may be designated as the first good).4 In terms of the likelihood function, the $1/p_1$ term can be ignored, since it is simply a constant in each individual's likelihood function. Thus, the same parameter estimates will result independent of the good designated as the first good for each individual. In the case when $M = 1$ (that is, only one alternative is chosen) and there are no satiation effects ($\alpha_k = 1$ for all $k$), the Jacobian term drops out (that is, the continuous component drops out, because all expenditure is allocated to good 1). Then, the model in equation (19.21) collapses to the standard multinomial logit (MNL) model. Thus, the MDCEV model is a multiple discrete-continuous extension of the standard MNL model.

4.2 Closed-Form Extensions of the Multiple Discrete-Continuous Extreme-Value (MDCEV) Model
Thus far, we have assumed that the ek terms are independently and identically extreme value distributed across alternatives k. The analyst can extend the model to allow correlation across alternatives using a generalized extreme-value (GEV) error structure. The advantage of the GEV structure is that it results in closed-form probability expressions for any and all consumption patterns. 4.2.1 The MDCNEV model Pinjari and Bhat (2010) formulate a special two-level nested case of the multiple discretecontinuous generalized extreme-value (MDCGEV) model with a nested extreme value distributed error structure that has the following joint cumulative distribution: (19.22) In the above expression, 5 (1, 2, . . . , Sk ) is the index to represent a nest of alternatives, SK is the total number of nests the K alternatives belong to, and qs (0 , qs # 1; 5 1, 2, . . .,SK) is the (dis)similarity parameter introduced to induce correlations among the stochastic components of the utilities of alternatives belonging to the th nest. This error structure assumes that the nests are mutually exclusive and exhaustive (that is, each alternative can belong to only one nest and all alternatives are allocated to one of the SK nests). Without loss of generality, let 1, 2,. . ., SM be the nests the M chosen alternatives belong to, and let q1, q2, . . . , qs be the number of chosen alternatives in each of the SM nests (thus, q1 1 q2 1 . . .1qs 5 M). Using the nested extreme value error distribution assumption specified in equation (19.22) (and the above-identified notation), Pinjari and Bhat (2010) derived the following expression for the multiple discrete-continuous nested extreme-value (MDCNEV) model:
s
s
s
M
M
(19.23)

In the above expression, sum$(X_{rs})$ is the sum of the elements of a row matrix $X_{rs}$ (see Pinjari and Bhat, 2010, app. A, for a description of the form of the matrix $X_{rs}$).

4.2.2 The MDCGEV model
More recently, Pinjari (2011) formally proved the existence of, and derived, closed-form probability expressions for MDC models with error structures based on McFadden's (1978) GEV structure. To do so, he expressed the probability expression in equation (19.15) as an integral of an $M$th-order partial derivative of the $K$-dimensional joint cumulative distribution function (CDF) of the error terms $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$:
0M 0J0 3 ce F (e1, e2, . . . ,eK) f 0 ei 5V1 2Vi 1e1, 4i51,2, . . . , K d de1 0e1. . . 0eM
(19.24)
e1 52`
where F (e1,e2, . . .,eK) is the joint CDF of the error terms (e1,e2, . . .,eK) specified based on McFadden’s (1978) GEV form as below: FGEV (e1, e2, . . ., eK) 5exp [ 2G (e2e , e2e , . . ., e2e ) ] 1
2
K
(19.25)
where G is a non-negative function with the following properties: 1.
G (y1,yi,. . .,yK) $ 0, 4yi . 0 (i 5 1, 2, . . ., K)
2.
G is homogeneous of degree m.0, that is G (ay1,. . .ayi. . .,ayK) 5 amG (y1,. . .yi,. . .,yK) ,
3.
lim y S 1`G (y1, . . . , yi, . . . , yK) 5 1`, 4i 5 1, 2, . . . , K, and
4.
(21) M
i
0MG (y1,. . ., yK) # 0, 4yi . 0 (i 5 1, 2,. . ., K) . 0y1 . . . 0yM
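As an illustration (an example of our choosing, not from Pinjari, 2011), the familiar nested-logit generating function $G(\mathbf{y}) = \sum_s \big(\sum_{k\in s} y_k^{1/\theta_s}\big)^{\theta_s}$ is one such $G$; the snippet below verifies its homogeneity (here of degree one) numerically.

```python
import numpy as np

def G_nested(y, nests, theta):
    """Nested-logit GEV generating function:
    G(y) = sum_s ( sum_{k in nest s} y_k^(1/theta_s) )^theta_s
    (a standard example of McFadden's (1978) G, not the general case)."""
    return sum(np.sum(y[idx] ** (1.0 / th)) ** th
               for idx, th in zip(nests, theta))

y = np.array([0.5, 1.2, 0.8, 2.0])
nests = [np.array([0, 1]), np.array([2, 3])]   # two mutually exclusive nests
theta = [0.6, 0.9]                             # dissimilarity parameters in (0, 1]

a = 3.0
# G is homogeneous of degree 1: G(a*y) equals a*G(y), matching property 2 above.
print(G_nested(a * y, nests, theta), a * G_nested(y, nests, theta))
```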
Pinjari (2011) then derived a general closed form for the probability expressions as below:

$$P(x_1^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \left(\prod_{i=1}^{M} e^{V_i}\right) \times \left[\frac{(M-1)!\,\{\Sigma(H_1 H_2 \ldots H_M)\}}{H^M} + \frac{(M-2)!\,\{\Sigma(H^2_{12}H_3 \ldots H_M) + \Sigma(H_1 H^2_{23} \ldots H_M) + \cdots + \Sigma(H_1 H_2 \ldots H^2_{(M-1)M})\}}{H^{M-1}} + \cdots + \frac{1!\,\{\Sigma(H^{M-1}_{123\ldots M-1}H_M) + \cdots + \Sigma(H_1 H^{M-1}_{234\ldots M})\}}{H^2} + \frac{0!\,\{\Sigma(H^M_{123\ldots M})\}}{H}\right] \quad (19.26)$$

where $H = G(e^{V_1}, e^{V_2}, \ldots, e^{V_K})$, $H_i = \dfrac{\partial H}{\partial e^{V_i}}$, $H^n_{123\ldots n} = \dfrac{\partial^n H}{\partial e^{V_1}\,\partial e^{V_2} \cdots \partial e^{V_n}}$, and all other terms are defined similarly.5 Recognizing that working with the above general form of probability expressions becomes difficult in situations with complex covariance structures and a large set of choice alternatives (because of the sheer number of terms in the expression), Pinjari (2011) derived compact probability expressions for a variety of cross-nested error structures. The reader is referred to that paper for further details.

4.3 The Mixed MDCEV Model
1
0J0≥ P (x*1,x*2, . . . , x* M, 0, . . . , 0) 5 3 sM 21 h
(Vi 1hi) /s
i51
a ae K
(Vk 1hk) /s
k 51
where F is the multivariate cumulative normal distribution.
b
M
¥ (M 2 1) ! dF ((hh) .
(19.27)
Other distributions may also be used for $\eta$. Note that the distribution of $\eta$ can arise from an error-components structure or a random-coefficients structure or a combination of the two, similar to the case of the usual mixed logit model. Thus, the model in equation (19.27) can be extended in a conceptually straightforward manner to also include random coefficients on the independent variables $z_k$, and random effects (or even random coefficients) in the $\alpha_k$ satiation parameters (if the $\alpha$-profile is used) or the $\gamma_k$ parameters (if the $\gamma$-profile is used).

4.4 The Multiple Discrete-Continuous Probit (MDCP) Model

The choice of extreme value (either EV or GEV) stochastic specification is driven by convenience (of analytical tractability) rather than theory. A multivariate normally (MVN) distributed stochasticity assumption leads to complex likelihood functions, one reason why the KKT approach did not gain traction for empirical analysis until recently. Attempts have been made to address this issue by using simulation methods such as the GHK simulator (see Kim et al., 2002) and Bayesian estimation methods. However, the GHK and other such simulators become computationally impractical as the dimensionality of integration increases with the number of alternatives. Bayesian estimation methods can also be computationally intensive and saddled with convergence-determination issues. Thus, no study has been able to estimate KKT demand systems with MVN distributions beyond a small number of alternatives. Notwithstanding the estimation difficulties, there are notable advantages of using an MVN error distribution. First, the MVN error kernel makes it easy to incorporate general covariance structures as well as random coefficients, as long as the number of choice alternatives is not too large. Second, an appealing feature of MVN errors is the possibility of negative correlations among the utilities of different alternatives (as opposed to MEV errors, which allow only positive dependency). This can potentially be exploited to capture situations where the choice of one alternative may reduce (if not preclude) the likelihood of choosing another, where the pattern of substitution is fundamentally different from the substitution due to satiation effects. Given these advantages, we show below that the probability expression of the MDCP model involves the evaluation of a multivariate normal cumulative distribution function (MVNCDF). Equation (19.17) provides the general expression for consumption probabilities for an MDC model based on KKT conditions of random utility maximization. One can rewrite the probability expression using a differenced-errors form as below:
(( | e 5 V1 2 V2) , (| e 3,1 5 V1 2 V3) ,. . ., (| e M,1 5 V1 2 VM) ) , c | 2,1 d (( e M 11,1 , V1 2 VM 11) , (| e K,1 , V1 2 VK) ) e M 12,1 , V1 2 VM 12) , . . ., (|
(19.28)
| ,. . ., | | | In the above expression, (| e 2,1,e e M,1,e 3,1 M 11,1,. . ., e K,1) is a K−1 dimensional vector of error differences following a multivariate normal distribution with a zero mean vector m (all elements in m are zeros), and a variance-covariance matrix S. For later
use, partition this $(K-1)$-dimensional vector into two smaller vectors $A$ and $B$, where $A = (\tilde\varepsilon_{2,1}, \tilde\varepsilon_{3,1}, \ldots, \tilde\varepsilon_{M,1})$ and $B = (\tilde\varepsilon_{M+1,1}, \tilde\varepsilon_{M+2,1}, \ldots, \tilde\varepsilon_{K,1})$. Thus, the $\mu$ and $\Sigma$ matrices can also be partitioned as:

$$\mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}.$$

In the partition of $\Sigma$, $\Sigma_{11}$ and $\Sigma_{22}$ are the variance-covariance matrices of $A$ and $B$, respectively, while $\Sigma_{12}$ and $\Sigma_{21}$ contain the covariance terms between the elements in $A$ and those in $B$. Now, express the MDCP probability expression in equation (19.28) as:

$$P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \times P[A = a, B < b], \quad (19.29)$$

where $a = \{V_1 - V_2, V_1 - V_3, \ldots, V_1 - V_M\}$ and $b = \{V_1 - V_{M+1}, V_1 - V_{M+2}, \ldots, V_1 - V_K\}$. One can express the above expression as a product of marginal and conditional probabilities:

$$P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \cdot P(A = a) \cdot P(B < b \mid A = a) \quad (19.30)$$

To simplify the conditional probability in the above expression, we utilize the property of the MVN distribution that the distribution of $B$ conditional on $A = a$ is another MVN distribution, as given below (Tong, 1990, p. 35):

$$(B \mid A = a) \sim N(\tilde\mu, \tilde\Sigma), \ \text{ where } \tilde\mu = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(a - \mu_1), \ \text{ and } \tilde\Sigma = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \quad (19.31)$$

In the above expression, since $\mu_1$ and $\mu_2$ are zero vectors, one can write $\tilde\mu = \Sigma_{21}\Sigma_{11}^{-1}a$. Using the above result, the conditional probability expression in equation (19.30) can be expressed as $\Pr(B < b \mid A = a) = \Pr(C < b)$, where $C$ is an MVN-distributed vector as described above. Then, the MDCP consumption probability can be expressed as:

$$P(x_1^*, x_2^*, \ldots, x_M^*, 0, \ldots, 0) = |J| \cdot P(A = a) \cdot P(C < b) \quad (19.32)$$

In the above joint probability expression, the marginal probability $P(A = a)$ is a multivariate normal probability density function (pdf) with a simple closed-form expression, whereas the MVNCDF $\Pr(C < b)$ does not have a closed form. Next, write the MVNCDF $\Pr(C < b)$ in standardized form as below:

$$\Pr(C < b) = P\left(\frac{C - \tilde\mu}{\tilde\sigma} < \frac{b - \tilde\mu}{\tilde\sigma}\right) = P(W_1 < w_1, W_2 < w_2, \ldots, W_{K-M} < w_{K-M}), \quad (19.33)$$

where $(W_1, W_2, \ldots, W_{K-M})$ is a vector of standardized, normally distributed random variables in $\frac{C - \tilde\mu}{\tilde\sigma}$ and $(w_1, w_2, \ldots, w_{K-M})$ is a vector of scalars in $\frac{b - \tilde\mu}{\tilde\sigma}$. Similarly, $\tilde\mu = (\tilde\mu_1, \ldots, \tilde\mu_i, \ldots, \tilde\mu_{K-M})$ is a vector of means and $\tilde\sigma = (\tilde\sigma_1, \ldots, \tilde\sigma_i, \ldots, \tilde\sigma_{K-M})$ is a vector of standard deviations6 of the normally distributed random variables in $C$. The problem now boils down to approximating the MVNCDF in equation (19.33). In the recent past, there has been some evidence that using analytical approximations (as opposed to simulation) for evaluating the MVN cumulative distribution function can help in easier estimation of single discrete choice models (for example, the multinomial
probit model; see Bhat and Sidharthan, 2011). Bhat et al. (2013) show that such analytical approximation methods can help in the estimation of MDCP models as well (that is, MDC models with MVN errors). The performance of different analytical approximation methods for evaluating the MVNCDF in the estimation of MDCP model parameters remains an open avenue for further research.
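The conditioning step in equation (19.31) is standard multivariate normal algebra and can be sketched directly; in the snippet below (illustrative numbers), scipy's numerically integrated MVNCDF stands in for the analytic approximations discussed above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def conditional_mvn(mu, Sigma, a, m):
    """Distribution of B | A = a for a partitioned MVN (equation (19.31)),
    where A is the first m components and B the remaining ones."""
    mu1, mu2 = mu[:m], mu[m:]
    S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
    S21, S22 = Sigma[m:, :m], Sigma[m:, m:]
    mu_c = mu2 + S21 @ np.linalg.solve(S11, a - mu1)
    Sigma_c = S22 - S21 @ np.linalg.solve(S11, S12)
    return mu_c, Sigma_c

# Illustrative numbers: K - 1 = 4 error differences, M - 1 = 2 equalities.
mu = np.zeros(4)
Sigma = np.array([[1.0, 0.3, 0.2, 0.1],
                  [0.3, 1.2, 0.4, 0.2],
                  [0.2, 0.4, 1.5, 0.3],
                  [0.1, 0.2, 0.3, 1.1]])
a = np.array([0.5, -0.2])      # V1 - Vk for the consumed goods
b = np.array([0.8, 1.1])       # V1 - Vk for the non-consumed goods

mu_c, Sigma_c = conditional_mvn(mu, Sigma, a, m=2)
# Pr(C < b): scipy evaluates the MVNCDF by numerical integration.
print(multivariate_normal(mean=mu_c, cov=Sigma_c).cdf(b))
```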
5 THE JOINT MDCEV-SINGLE DISCRETE CHOICE MODEL
The MDCEV model and its extensions discussed thus far are suited for the case when the alternatives are imperfect substitutes, as recognized by the use of a non-linear utility form that accommodates diminishing marginal utility as the consumption of any alternative increases. However, there are many instances where the choice situation is characterized by a combination of imperfect and perfect substitutes in the choice alternative set. The MDCEV model needs to be modified to handle such a combination of a multiple discrete-continuous choice among the imperfect substitutes and a single choice of one alternative from each subset of perfect substitutes. We do not discuss this case here due to space constraints; the reader is referred to Bhat et al. (2006, 2009) for such formulations. Both these studies by Bhat and co-authors assume the absence of price variation across the perfect substitutes. Formulation of KKT model systems that consider price variation across imperfect substitutes as well as perfect substitutes is a potentially fruitful avenue for further research.
6 PREDICTION AND WELFARE ANALYSIS
Thanks to the above advances, several empirical applications have appeared in the recent literature using the KKT approach to model MDC choices. These applications cover a wide range of empirical contexts, including individuals' time-use analysis, household expenditure patterns, household vehicle ownership and usage, household energy consumption, recreational demand choices and valuation of a variety of environmental goods (for example, fish stock, air quality and water quality). One reason why the KKT approach did not gain much attention until recently was the difficulty of estimating the model parameters. But we are now able to easily estimate KKT demand systems with a large number of choice alternatives (see Van Nostrand et al., 2013, for a model with 211 choice alternatives). Another reason why the KKT approach has not gained popularity is the lack of simple methods to apply the models for forecasting and policy analysis purposes. This section reviews the recent advances aimed at filling that gap. Once the model parameters are estimated, prediction exercises or welfare analyses with KKT-based MDC models involve solving the constrained, non-linear random utility maximization problem in equation (19.1) (or its dual form) for each consumer. In the presence of corner solutions (that is, multiple discreteness), there is no straightforward analytic solution to this problem. The typical approach is to adopt a constrained non-linear optimization procedure at each of several simulated values drawn from the distribution of the stochastic error terms (that is, the $\varepsilon_k$ terms). The constrained optimization procedure itself has been based on either enumerative or iterative techniques.
The enumerative technique (used by Phaneuf et al., 2000) involves an enumeration of all possible sets of alternatives that the consumer can potentially choose. This brute-force method becomes computationally impractical as the number of choice alternatives increases. Von Haefen et al. (2004) proposed a numerical bisection algorithm based on the insight that, with additively separable utility functions, the optimal consumptions of all goods can be derived if the optimal consumption of the outside good is known. Specifically, conditional on unobserved heterogeneity, they iteratively solve for the optimal consumption of the outside good (and that of the other goods) using a bisection procedure. They begin their iterations by setting the lower bound for the consumption of the outside good to zero and the upper bound to be equal to the budget. The average of the lower and upper bounds is used to obtain the initial estimate of the outside good consumption. Based on this, the amounts of consumption of all other inside goods are computed using the KKT conditions. Next, a new estimate of consumption of the outside good is obtained by subtracting the expenditure on the inside goods from the total budget available. If this new estimate of the outside good is larger (smaller) than the earlier estimate, the earlier estimate becomes the new lower (upper) bound of consumption for the outside good, and the iterations continue until the difference between the lower and upper bounds is within an arbitrarily designated threshold. To circumvent the need to perform predictions over the entire distribution of unobserved heterogeneity (which can be time-consuming), von Haefen et al. condition on the observed choices.

Pinjari and Bhat (2011) undertook analytic explorations with the KKT conditions of optimality that shed new light on the properties of Bhat's MDCEV model with additive utility functions. Specifically, they derive the property that the price-normalized baseline marginal utility (that is, $\psi_k/p_k$) of a chosen alternative must be greater than the price-normalized baseline marginal utility of an alternative that is not chosen. Further, they discuss a fundamental property of several KKT demand model systems in the literature with an additively separable utility form and a single linear binding constraint: the choice alternatives can always be arranged in the descending order of a specific measure that depends on the functional form of the utility function. Consequently, when all the choice alternatives are arranged in the descending order of their baseline marginal utility, and the number of chosen alternatives ($M$) is known, it is a trivial task to identify the chosen alternatives as the first $M$ alternatives in the arrangement. Based on this insight, Pinjari and Bhat (2011) propose computationally efficient prediction algorithms for different forms of the utility function in equation (19.3). One such forecasting algorithm, for the utility form with equal $\alpha_k$ parameters across all choice alternatives (that is, $\alpha_k = \alpha\ \forall k = 1, 2, \ldots, K$) in choice situations with an outside good, is outlined below. For prediction algorithms for other additively separable utility forms, the reader is referred to Pinjari and Bhat (2011).

Step 0: Assume that only the outside good is chosen and let the number of chosen goods be $M = 1$.
Step 1: Given the input data $(z_k, p_k)$, the model parameters $(\beta, \gamma_k, \alpha)$, and the simulated error term ($\varepsilon_k$) draws, compute the price-normalized baseline utility values $\psi_k/p_k$ for all alternatives. Arrange all the $K$ alternatives available to the consumer in the descending order of the $\psi_k/p_k$ values (with the outside good in the first place).
Step 2: Compute the value of $\lambda$ using the following equation, and go to step 3:

$$\lambda = \left[\frac{E + \sum_{k=2}^{M} p_k\gamma_k}{p_1\,(\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k\gamma_k\,(\psi_k/p_k)^{\frac{1}{1-\alpha}}}\right]^{\alpha-1} \quad (19.34)$$

Step 3: If $\lambda > (\psi_{M+1}/p_{M+1})$ (this condition represents the KKT condition for the $(M+1)$th alternative), compute the optimal consumptions of the first $M$ alternatives in the above descending order using the following expressions, set the consumptions of all other alternatives to zero, and stop:

$$x_1^* = \frac{(\psi_1/p_1)^{\frac{1}{1-\alpha}}\left(E + \sum_{k=2}^{M} p_k\gamma_k\right)}{p_1\,(\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k\gamma_k\,(\psi_k/p_k)^{\frac{1}{1-\alpha}}} \quad (19.35)$$

$$x_k^* = \gamma_k\left[\frac{(\psi_k/p_k)^{\frac{1}{1-\alpha}}\left(E + \sum_{k=2}^{M} p_k\gamma_k\right)}{p_1\,(\psi_1/p_1)^{\frac{1}{1-\alpha}} + \sum_{k=2}^{M} p_k\gamma_k\,(\psi_k/p_k)^{\frac{1}{1-\alpha}}} - 1\right]; \quad \forall k = 2, 3, \ldots, M \quad (19.36)$$

Otherwise (if $\lambda \leq \psi_{M+1}/p_{M+1}$), set $M = M + 1$ and go to step 4.

Step 4: If $M = K$, compute the optimal consumptions using equations (19.35) and (19.36) and stop. Otherwise (if $M < K$), go to step 2.

The algorithm outlined above can be applied a large number of times with different simulated values of the $\varepsilon_k$ terms to sufficiently cover the simulated distribution of unobserved heterogeneity and obtain the distributions of the consumption forecasts. The above discussion is primarily oriented toward using KKT-based MDC models for prediction, and does not extend to welfare analysis. For a discussion of how such prediction algorithms can be used for welfare analysis, see von Haefen and Phaneuf (2005).
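The full procedure is compact enough to sketch in code. The following is a minimal single-draw implementation of steps 0-4 (assuming the equal-$\alpha$ profile with an outside good, and hypothetical inputs; it is a sketch, not the authors' published code); repeating it over many error draws yields the forecast distribution just described.

```python
import numpy as np

def mdcev_forecast(psi, p, gamma, E, alpha):
    """One-draw forecast via steps 0-4 above (equal-alpha profile, outside
    good first).  psi, p, gamma are K-vectors with p[0] = 1; psi is assumed
    to already embed a draw of the error terms, as in equation (19.8)."""
    K = len(psi)
    inside = 1 + np.argsort(-(psi[1:] / p[1:]))  # step 1: sort inside goods
    order = np.concatenate(([0], inside))        # outside good stays first
    psi, p, gamma = psi[order], p[order], gamma[order]
    ex = 1.0 / (1.0 - alpha)
    M = 1                                        # step 0: outside good only
    while True:
        num = E + np.sum(p[1:M] * gamma[1:M])    # step 2: pieces of eq. (19.34)
        den = p[0] * (psi[0] / p[0]) ** ex + np.sum(
            p[1:M] * gamma[1:M] * (psi[1:M] / p[1:M]) ** ex)
        lam = (num / den) ** (alpha - 1.0)
        if M == K or lam > psi[M] / p[M]:        # steps 3-4: stopping rule
            scale = num / den
            x = np.zeros(K)
            x[0] = (psi[0] / p[0]) ** ex * scale                            # eq. (19.35)
            x[1:M] = gamma[1:M] * ((psi[1:M] / p[1:M]) ** ex * scale - 1.0)  # eq. (19.36)
            x_out = np.zeros(K)
            x_out[order] = x                     # undo the sorting
            return x_out
        M += 1                                   # KKT fails: add the next good

rng = np.random.default_rng(7)
beta_z = np.array([0.0, 0.3, -0.2, 0.1, 0.5])    # hypothetical beta'z_k values
psi = np.exp(beta_z + rng.gumbel(size=5))        # baseline utilities, eq. (19.8)
p = np.array([1.0, 1.3, 0.8, 2.0, 1.1])
gamma = np.array([0.0, 1.0, 1.0, 1.0, 1.0])
print(mdcev_forecast(psi, p, gamma, E=50.0, alpha=0.5))
```

By construction the returned consumption vector exhausts the budget ($\sum_k p_k x_k^* = E$), which provides a useful sanity check when experimenting with the sketch.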
7 FUTURE DIRECTIONS
In the recent past, there has been increasing recognition of the need to extend the basic formulation of the consumer's utility maximization in equation (19.1) in the following directions:

1. Flexible functional forms for the utility specification.
2. Flexible stochastic specifications for the utility functions.
3. Flexibility in the specification of constraints faced by the consumer.
Each of these directions is discussed next.
7.1 Flexible, Non-additive Utility Forms

Most KKT models in the literature assume that the direct utility contribution due to the consumption of different alternatives is additively separable. Mathematically, this assumption implies that U(x_1, …, x_K) = U_1(x_1) + … + U_K(x_K), and it greatly simplifies the task of model estimation and welfare analysis. However, the assumption imposes strong restrictions on preference structures and consumption patterns. First, the marginal utility of one alternative is independent of the consumption of another alternative; with an increasing and quasi-concave utility function, this implies that goods can be neither inferior nor complementary, only substitutes. Thus, for example, one cannot model a situation where the consumption of one good (for example, a new car) may increase the consumption of other goods (for example, gasoline). Second, even flexible substitution patterns in the consumption of different goods can be achieved only by correlating the stochastic utility components of different goods, not through an explicit functional form.

To overcome these restrictions, it is critical to develop tractable estimation methods with flexible, non-additively separable utility functions. There have been a handful of recent efforts in this direction. For example, building on Bhat's additively separable linear Box-Cox utility form, Vasquez-Lavin and Hanemann (2009) presented a general utility form with interaction terms between sub-utilities, as below:

U(x) = \sum_{k=1}^{K} \psi_k \frac{\gamma_k}{\alpha_k} \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} + \frac{1}{2} \sum_{k=1}^{K} \sum_{m=1}^{K} \theta_{km} \left[ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right] \left[ \left( \frac{x_m}{\gamma_m} + 1 \right)^{\alpha_m} - 1 \right] \qquad (19.37)
In the above expression, the second term induces interactions between pairs of goods (m, k) and includes quadratic terms (when m = k). These interaction terms allow the marginal utility of a good k to depend on the consumption of other goods m. Specifically, a positive (negative) value for θ_{mk} implies that m and k are complements (substitutes). However, the quadratic nature of the utility form does not maintain global consistency (over all consumption bundles) of the strictly increasing and quasi-concave property. Specifically, for certain parameter values and consumption patterns, the utility accrued can decrease with increasing consumption, or the marginal utility can increase with increasing consumption, both of which are theoretically inconsistent. Bhat and Pinjari (2010) show how a simple normalization, setting θ_{mk} = 0 when m = k in equation (19.37), can resolve these issues of theoretical (in)consistency and parameter (un)identification. Other efforts on accommodating complementarity in consumption include Lee et al. (2010), who propose simpler interaction terms using log(quantities), and Gentzkow (2007), who accommodates interactions in indirect utility functions.

Despite the above efforts, there are still unresolved conceptual and methodological issues pertaining to: (1) the form of non-additive utility functions; (2) the specification of stochasticity in non-additive utility functions; (3) the estimation of parameters with an increasing number of choice alternatives; and (4) the interpretation of the resulting dependency patterns in consumption. Resolving these issues will be a big step forward in enhancing the behavioral realism of KKT-based RUM MDC models. Further, within the context of non-additively separable preferences, it is important to recognize asymmetric dependencies in consumption. For example, the purchase of a new car may lead to increased gasoline consumption, but not the other way round.
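To fix ideas, the interaction structure in equation (19.37) and its implication for complementarity can be checked numerically. The sketch below is illustrative only: the function names are mine, the parameters ψ_k, γ_k, α_k and θ are supplied as arrays, and the cross-partial is computed by finite differences rather than analytically.

```python
import numpy as np

def vlh_utility(x, psi, gamma, alpha, theta):
    """Evaluate the Vasquez-Lavin/Hanemann-style utility of equation (19.37):
    Bhat's additive Box-Cox part plus pairwise interaction terms."""
    z = (x / gamma + 1.0) ** alpha - 1.0           # sub-utility building blocks
    additive = np.sum(psi * gamma / alpha * z)     # additively separable part
    interact = 0.5 * z @ theta @ z                 # (1/2) * sum_k sum_m theta_km z_k z_m
    return additive + interact

def cross_partial(x, k, m, psi, gamma, alpha, theta, h=1e-4):
    """Finite-difference d2U/(dx_k dx_m) for k != m; a positive value
    indicates complements, a negative value substitutes."""
    def u(xk, xm):
        y = x.astype(float).copy()
        y[k], y[m] = xk, xm
        return vlh_utility(y, psi, gamma, alpha, theta)
    return (u(x[k] + h, x[m] + h) - u(x[k] + h, x[m] - h)
            - u(x[k] - h, x[m] + h) + u(x[k] - h, x[m] - h)) / (4 * h * h)
```

The sign of the cross-partial tracks the sign of θ_{km}, which is the complementarity property discussed above.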
7.2 Flexible Stochastic Specifications

The above discussion was in the context of the form of the utility function, but there is potential for improving the stochastic specification as well. For example, most studies assume independent and identically distributed (iid) extreme value random error terms in the utility function. Recent advances on relaxing the iid assumption, specifically via employing MEV distributions, have been discussed in section 4.2. Although we are now able to estimate KKT-based RUM MDC models with general MEV stochastic distributions, no clear understanding exists of how different stochastic specifications and utility functional forms influence the properties of KKT models. Examining the substitution patterns implied by the different stochastic assumptions in KKT-based MDC models is a useful avenue for research. Further, the estimation of the MDCP model with MVN-distributed stochasticity (as discussed in section 4.4) is an important avenue for investigation.

7.3 Multiple Constraints

Most MDC model applications to date consider only a single linear binding constraint as governing the consumption decisions (for example, the linear constraint in equation 19.1). This stems from an implicit assumption that only a single resource is needed to consume goods. However, in numerous empirical contexts, multiple types of resources, such as time, money and space, need to be expended to acquire and consume goods. While the role of multiple constraints has long been recognized in microeconomic theory (see Becker, 1965), the typical approach to accommodating the different constraints has been to convert them all into a single effective constraint; for example, the time constraint has been collapsed into the money constraint using a monetary value of time. In many situations, however, it is important to consider the different constraints in their own right, because resources may not always be freely exchangeable with each other. To address this issue, a handful of recent studies (Satomura et al., 2011; Castro et al., 2012; Pinjari and Sivaraman, 2012) have provided model formulations to accommodate multiple linear constraints with additive utility functional forms. Satomura et al. (2011) provided a formulation to account for the role of money and space constraints in consumers' decisions on soft drink purchases. Castro et al. (2012) provide a general treatment of the issue by providing formulations for different scenarios, such as complete demand systems (that is, a case without the need for a Hicksian composite good) and incomplete demand systems (a case with the Hicksian composite good). Pinjari and Sivaraman (2012) provide a time- and money-constrained formulation in the context of households' annual vacation travel destination and mode choices.
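For intuition, a consumer problem with both a money and a time budget can be solved numerically. The sketch below is a stand-in, not the closed-form KKT systems developed in the cited papers: it maximizes a Bhat-type additive utility under two linear inequality constraints with a generic solver, and all names and values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_demand(psi, gamma, alpha, money_cost, time_cost, E, T):
    """Maximise an additive Box-Cox utility subject to a money budget E
    and a time budget T (both linear), in the spirit of Castro et al.
    (2012); solved numerically for illustration."""
    def neg_u(x):
        return -np.sum(psi * gamma / alpha * ((x / gamma + 1.0) ** alpha - 1.0))
    cons = [{'type': 'ineq', 'fun': lambda x: E - money_cost @ x},   # money constraint
            {'type': 'ineq', 'fun': lambda x: T - time_cost @ x}]    # time constraint
    x0 = np.full(len(psi), 1e-3)                 # small positive starting point
    res = minimize(neg_u, x0, bounds=[(0, None)] * len(psi),
                   constraints=cons, method='SLSQP')
    return res.x
```

Because the constraints are not collapsed into one, the solution can exhibit corner behaviour driven by either resource, which is the point the multi-constraint formulations are designed to capture.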
7.4 Beyond Simple, Linear Constraints
The above discussion suggests that we have just begun to move toward models with multiple constraints. It is worth noting, however, that most of the literature on MDC
modeling is geared toward simple, linear constraints that do not represent the complexity of the situations consumers face in reality. There are several reasons why linear constraints do not hold. First, linear constraints represent a constant price per unit of consumption (or a constant rate of resource use). In many situations, however, prices vary with the amount of consumption, leading to non-linear budget constraints. A classic example of such non-linear budgets is block pricing, typically used in energy markets (for example, electricity pricing). While the issue has long been recognized in the classical econometric literature on estimating demand functions, it is yet to be given due consideration in MDC choice studies. Second, linear constraints do not accommodate fixed costs (or set-up costs), which cannot be converted into a constant price per unit of consumption. For example, travel cost to a vacation destination is a fixed cost, unlike the lodging costs at the destination, which can be treated as variable with a constant price per night.

Solving the consumer's direct utility maximization problem with non-linear constraints can become rather tedious, because the KKT conditions alone may no longer be sufficient. In a recent study, Parizat and Shachar (2010) employ an enumeration approach to solve a direct utility maximization problem in the context of individuals' weekly leisure time allocation with fixed costs (for example, ticket costs of going to a movie, the price of a meal). They acknowledge rather large computation times to estimate the parameters for their 12-alternative case. Thus, an alternative approach to incorporating non-linear constraints may be to work with the dual problem using indirect utility functions. Lee and Pitt (1987) provide a methodological treatment of incorporating block pricing with the dual approach, and further studies exploring this approach may enhance our ability to incorporate block prices. Another approach is to econometrically 'treat' the inherent endogeneity between prices and consumption due to the dependency of prices on consumption, for example by estimating price functions simultaneously with the consumer preferences (that is, utility functions). This approach can potentially help in dealing with demand-supply interactions in the market as well (see Berry et al., 1995).
7.5 Prediction and Welfare Analysis with Flexible Model Structures
Thanks to recent advances, we now have simple and computationally efficient methods to apply KKT models with additive utility forms for forecasting and welfare analysis purposes. As the field moves forward with the specification and estimation of more flexible MDC models, it is important to develop methods to apply these models as well. The prediction procedures proposed by von Haefen et al. (2004) and Pinjari and Bhat (2011), based on the Karush-Kuhn-Tucker conditions of optimality, can potentially be extended to the case with multiple linear binding constraints, although with additional layers of computational effort (as many as the number of constraints). However, these procedures fall apart in situations with non-additive utility functions, as they hinge critically on the additive utility assumption. Similarly, the presence of non-linear constraints can make it difficult to apply the KKT conditions alone for solving the utility maximization problem. Resolving each of these issues is a welcome research direction.

Another useful direction of research is in the context of additive utility functions with a simple linear constraint. While we are now able to exploit the KKT conditions for
obtaining the conditional predictions (given specific values of the random terms), we have not been able to characterize the unconditional distributions of the demand functions. In the presence of corner solutions, it is difficult to arrive at closed-form expressions for the demand functions from equation (19.1). Perhaps this is why we are not aware of successful attempts to arrive at analytical expressions for price elasticities and sensitivities to explanatory variables. Moreover, the application of these models requires the simulation of the stochastic terms. In some cases, such as the case with MEV stochastic distributions, the stochastic terms themselves are difficult to simulate. Thus, the development of fast methods to simulate MEV distributions can aid in the application of KKT models with such stochastic specifications.
8 SUMMARY
There has been an increasing recognition of the MDC nature of consumer choices. Over the past decade the field has witnessed exciting developments in modeling MDC choices, especially with the advancement of the KKT approach to modeling consumer behavior based on RUM. Notable developments include:

1. Clever specifications with distributional assumptions that lead to closed-form probability expressions enabling easy estimation of the structural parameters (for example, the MDCEV model).
2. Application of the KKT approach to model MDC choices in a variety of empirical contexts.
3. Formulation of computationally efficient prediction/welfare analysis methods with KKT models.
4. Extension of the basic RUM specification in equation (19.1) to accommodate richer patterns of heterogeneity in consumer preferences and to allow flexibility in distributional assumptions. Most of these extensions have been 'econometrically' oriented, akin to the extensions of the multinomial logit model in the traditional discrete choice analysis literature.
In the recent past, there has been an increasing recognition of the need to extend the basic formulation of the consumer's utility maximization in equation (19.1) in the following directions:

1. Flexible functional forms for the utility specification, such as non-additive utility forms.
2. Flexible stochastic specifications for the utility functions, such as MVN distributions.
3. Flexibility in the specification of constraints faced by the consumer, including multiple inter-related constraints and non-linear constraints.
Given the pace of recent developments, we optimistically look forward to seeing model formulations, estimation methods and prediction/welfare analysis procedures for a general framework with non-additive utility forms, flexible stochastic distributional assumptions and general forms of constraints.
ACKNOWLEDGEMENTS

This chapter draws heavily in some places from papers by Bhat and colleagues, and a recent workshop report by Abdul Pinjari, Chandra Bhat and David Bunch.
NOTES

1. A complete demand system involves the modeling of the demands of all consumption goods that exhaust the consumption space of consumers. However, complete demand systems require data on the prices and consumptions of all commodity/service items, and can be impractical when studying consumptions in finely defined commodity/service categories. Thus, it is common to use an incomplete demand system, typically in the form of a two-stage budgeting approach or in the form of a Hicksian composite commodity assumption. In the two-stage budgeting approach, separability of preferences is invoked, and the allocation is pursued in two independent stages. The first stage entails allocation between a limited number of broad groups of consumption items, followed by the incomplete demand system allocation of the group expenditure to elementary commodities/services within the broad consumption group of primary interest to the analyst (the elementary commodities/services in the broad group of primary interest are commonly referred to as 'inside' goods). The plausibility of such a two-stage budgeting approach requires strong homothetic preferences within each broad group and strong separability of preferences, or the less restrictive conditions of weak separability of preferences and a price index for each broad group that is not too sensitive to changes in the utility function (see Menezes et al., 2005). In the Hicksian composite commodity approach, the analyst assumes that the prices of elementary goods within each broad group of consumption items vary proportionally. Then, one can replace all the elementary alternatives within each broad group (that is not of primary interest) by a single composite alternative representing the broad group. The analysis then proceeds by considering the composite goods as 'outside' goods and considering consumption in these outside goods as well as the 'inside' goods representing the consumption group of main interest to the analyst. It is common in practice in this Hicksian approach to include a single outside good with the inside goods. If this composite outside good is not essential, then the consumption formulation is similar to that of a complete demand system. If this composite outside good is essential, then the formulation needs minor revision to accommodate the essential nature of the outside good. Please refer to von Haefen (2010) for a discussion of the Hicksian approach and other incomplete demand system approaches, such as the one proposed by Epstein (1982), that we do not consider here. In this chapter, we consider incomplete demand systems in the form of the second stage of a two-stage incomplete demand system with a finite, positive total budget as obtained from the first stage (for presentation ease, we will refer to this case as the 'inside goods only' case, in which at least one 'inside' good has to be consumed and there are no essential outside goods) or in the form of a Hicksian composite approach with a single outside good that is essential and no requirement that at least one of the inside goods has to be consumed (for presentation ease, we will refer to this case simply as the 'essential outside good' case or, even more simply, as the outside good case; if the outside good is non-essential, the formulation becomes identical to the 'inside goods only' case, while if there are multiple outside goods, the situation is a very simple extension of the formulations presented here, depending on whether the outside goods are all essential, all non-essential, or some combination of essential and non-essential). Finally, a complete demand system takes the same formulation as the 'inside goods only' formulation.
2. Hanemann (1984) used this approach to derive a variety of SDC model forms consistent with equation (19.2). Chiang (1991) and Chintagunta (1993) extend Hanemann's SDC formulation to include the possibility of no inside goods being selected by introducing a 'reservation price'. In their approach, an inside good is selected only if the quality-adjusted price of at least one of the inside goods is below the reservation price. See Dubin and McFadden (1984) for another, slightly different, way of employing the (conditional) indirect utility approach for SDC choice analysis.
3. Note that the subsequent discourse is for the case with a Hicksian composite outside good that is essential. However, the derivations carry over to the case without an outside good in a straightforward manner.
4. This is not an issue in contexts with a numeraire Hicksian composite outside good because p_1 = 1.
5. G and H are similar functions, but with different arguments; G represents G(e^{−ε_1}, …, e^{−ε_n}), whereas H represents G(e^{V_1}, …, e^{V_i}, …, e^{V_n}). Note from the ± signs that the sign in front of each mixed partial derivative term depends on the number of partial derivatives in the term and the number of chosen alternatives M. Also note that the model structures for the MDCNEV and MDCGEV are derived for the case without price variation across choice alternatives. One can extend these structures to situations with price variation in a straightforward fashion.
6. σ_i is the square root of the ii-th element of the covariance matrix Σ.
REFERENCES

Becker, G.S. (1965), 'A theory of the allocation of time', Economic Journal, 75 (299), 493–517.
Berry, S., J. Levinsohn and A. Pakes (1995), 'Automobile prices in market equilibrium', Econometrica, 63 (4), 841–90.
Bhat, C.R. (2005), 'A multiple discrete-continuous extreme value model: formulation and application to discretionary time-use decisions', Transportation Research Part B, 39 (8), 679–707.
Bhat, C.R. (2008), 'The multiple discrete-continuous extreme value (MDCEV) model: role of utility function parameters, identification considerations, and model extensions', Transportation Research Part B, 42 (3), 274–303.
Bhat, C.R. and A.R. Pinjari (2010), 'The generalized multiple discrete-continuous extreme value (GMDCEV) model: allowing for non-additively separable and flexible utility forms', working paper, Department of Civil, Architectural and Environmental Engineering, University of Texas at Austin.
Bhat, C.R. and R. Sidharthan (2011), 'A simulation evaluation of the maximum approximate composite marginal likelihood (MACML) estimator for mixed multinomial probit models', Transportation Research Part B, 45 (7), 940–53.
Bhat, C.R., M. Castro and M. Khan (2013), 'A new estimation approach for the multiple discrete-continuous probit (MDCP) choice model', Transportation Research Part B, 55, 1–22.
Bhat, C.R., S. Sen and N. Eluru (2009), 'The impact of demographics, built environment attributes, vehicle characteristics, and gasoline prices on household vehicle holdings and use', Transportation Research Part B, 43 (1), 1–18.
Bhat, C.R., S. Srinivasan and S. Sen (2006), 'A joint model for the perfect and imperfect substitute goods case: application to activity time-use decisions', Transportation Research Part B, 40 (10), 827–50.
Bunch, D.S. (2009), 'Theory-based functional forms for analysis of disaggregated scanner panel data', working paper, Graduate School of Management, University of California-Davis.
Castro, M., C.R. Bhat, R.M. Pendyala and S.R. Jara-Diaz (2012), 'Accommodating multiple constraints in the multiple discrete-continuous extreme value (MDCEV) choice model', Transportation Research Part B, 46 (6), 729–43.
Chiang, J. (1991), 'The simultaneous approach to the whether, what, and how much to buy questions', Marketing Science, 10 (4), 297–315.
Chintagunta, P. (1993), 'Investigating purchase incidence, brand choice and purchase quantity decisions of households', Marketing Science, 12 (2), 184–208.
Deaton, A. and J. Muellbauer (1980), Economics and Consumer Behavior, Cambridge: Cambridge University Press.
Dubin, J. and D. McFadden (1984), 'An econometric analysis of electricity appliance holdings and consumption', Econometrica, 52 (2), 345–62.
Epstein, L.G. (1982), 'Integrability of incomplete systems of demand functions', Review of Economic Studies, 49 (3), 411–25.
Gentzkow, M. (2007), 'Valuing new goods in a model with complementarity: online newspapers', American Economic Review, 97 (3), 713–44.
Hanemann, W.M. (1978), 'A methodological and empirical study of the recreation benefits from water quality improvement', PhD dissertation, Department of Economics, Harvard University.
Hanemann, W.M. (1984), 'Discrete/continuous models of consumer demand', Econometrica, 52 (3), 541–61.
Kim, J., G.M. Allenby and P.E. Rossi (2002), 'Modeling consumer demand for variety', Marketing Science, 21 (3), 229–50.
Lee, L.F. and M.M. Pitt (1986), 'Microeconometric demand systems with binding nonnegativity constraints: the dual approach', Econometrica, 54 (5), 1237–42.
Lee, L.F. and M.M. Pitt (1987), 'Microeconometric models of rationing, imperfect markets, and nonnegativity constraints', Journal of Econometrics, 36 (1–2), 89–110.
Lee, S., J. Kim and G. Allenby (2010), 'A direct utility model for asymmetric complements', working paper, Ohio State University.
Mäler, K.-G. (1974), Environmental Economics: A Theoretical Inquiry, Baltimore, MD: Johns Hopkins University Press for Resources for the Future.
McFadden, D. (1978), 'Modelling the choice of residential location', in A. Karlquist, L. Lundqvist, F. Snickers and J. Weibull (eds), Spatial Interaction Theory and Residential Location, Amsterdam: North-Holland, pp. 75–96.
Menezes, T.A., F.G. Silveira and C.R. Azzoni (2005), 'Demand elasticities for food products: a two-stage budgeting system', NEREUS-USP, São Paulo (TD Nereus 09-2005).
Parizat, S. and R. Shachar (2010), 'When Pavarotti meets Harry Potter at the Super Bowl', working paper, Tel Aviv University.
Phaneuf, D.J., C.L. Kling and J.A. Herriges (2000), 'Estimation and welfare calculations in a generalized corner solution model with an application to recreation demand', Review of Economics and Statistics, 82 (1), 83–92.
Pinjari, A.R. (2011), 'Generalized extreme value (GEV)-based error structures for multiple discrete-continuous choice models', Transportation Research Part B, 45 (3), 474–89.
Pinjari, A.R. and C.R. Bhat (2010), 'A multiple discrete-continuous nested extreme value (MDCNEV) model: formulation and application to non-worker activity time-use and timing behavior on weekdays', Transportation Research Part B, 44 (4), 562–83.
Pinjari, A.R. and C.R. Bhat (2011), 'An efficient forecasting procedure for Kuhn-Tucker consumer demand model systems: application to residential energy consumption analysis', working paper, University of South Florida.
Pinjari, A.R. and V. Sivaraman (2012), 'A time and money budget constrained model of long-distance vacation travel demand', working paper, University of South Florida.
Roy, R. (1947), 'La distribution du revenu entre les divers biens' ['The distribution of income among various goods'], Econometrica, 15 (3), 205–25.
Satomura, S., J. Kim and G. Allenby (2011), 'Multiple constraint choice models with corner and interior solutions', Marketing Science, 30 (3), 481–90.
Tong, Y.L. (1990), The Multivariate Normal Distribution, New York: Springer-Verlag.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Van Nostrand, C., V. Sivaraman and A.R. Pinjari (2013), 'Analysis of annual, long-distance, vacation travel demand in the United States: a multiple discrete-continuous choice framework', Transportation, 40 (1), 151–71.
Vasquez Lavin, F. and M. Hanemann (2009), 'Functional forms in discrete/continuous choice models with general corner solution', working paper, University of California Berkeley.
Von Haefen, R.H. (2010), 'Incomplete demand systems, corner solutions, and welfare measurement', Agricultural and Resource Economics Review, 39 (1), 22–36.
Von Haefen, R.H. and D.J. Phaneuf (2005), 'Kuhn-Tucker demand system approaches to nonmarket valuation', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 135–58.
Von Haefen, R.H., D.J. Phaneuf and G.R. Parsons (2004), 'Estimation and welfare analysis with large demand systems', Journal of Business and Economic Statistics, 22 (2), 194–205.
Wales, T.J. and A.D. Woodland (1983), 'Estimation of consumer demand systems with binding non-negativity constraints', Journal of Econometrics, 21 (3), 263–85.
20 Bayesian estimation of random utility models Peter Lenk*
1 INTRODUCTION
Conjoint studies and their Bayesian estimation are remarkably intertwined. Luce and Tukey (1964) originated conjoint analysis for measuring judgment and perception in mathematical psychology. They proposed a system to measure constituent components of multi-attribute stimuli from subjects' ordering of the stimuli. Meanwhile, in economics, Lancaster (1966) proposed a theory of consumer choice that decomposed the utility of goods into the utility for their attributes. Green and Rao (1971) synthesized these two ideas to decompose the desirability of product1 attributes from subjects' rankings of the products. For example, three attributes for hotels are room comfort, business centres and swimming pools. Based on a subject's ranking of hotels, the researcher can measure the preferences for each attribute. A hotel chain can then use this information to design hotels for different segments of customers. For instance, business travellers may appreciate business centres but not swimming pools, while families travelling with children prefer swimming pools to business centres. Wind et al. (1989) conducted such a study to design Courtyard by Marriott.

The connection between Bayesian inference and conjoint analysis runs deeper than merely providing practical, effective estimation and measurement methods. They both have foundations in utility theory. Random utility theory (RUT), introduced by McFadden (1974) and foreshadowed by Aitchison and Bennet (1970), provides the economic foundation for conjoint analysis. RUT assumes that subjects select products that maximize their utility, or 'brand enjoyment' in Aitchison and Bennet, among a competitive set of alternatives. Bayesian analysis is also grounded in utility theory. Savage (1954) extended the von Neumann and Morgenstern (1944) axioms of rational preferences to endogenize probability: probability becomes a subjective measure of belief. Savage applied his theory to inference and derived decision rules that maximize expected utility (or minimize expected loss) with respect to the decision maker's subjective probability of the parameters. The decision maker updates his or her prior distributions by Bayes theorem as sample data become available. Bayesian analysis of conjoint models provides a unique setting where both the data-generating mechanism and the philosophy for inference share common theoretical roots.

Most conjoint analyses use hierarchical models with two or more levels. The subject-level model relates the observed responses to the products' attributes, often with the intermediate step of imputing unobserved or latent utilities. This model usually contains subject-specific parameters that allow subjects to have different preferences for product attributes. The population-level model describes the heterogeneity or distribution of the subject-specific parameters across the population. Variations of conjoint analysis alter the specifications of the subject-level or population-level models and the elicitation task. Subjects may rate, rank, or choose products. Different functional forms and
distributional assumptions for the random utilities imply qualitatively different behaviour, and different population-level models result in different policy recommendations. These variations produce a large model space and numerous estimation procedures that differ in their details. Unlike classical inference, which uses different approaches for these variations, Bayesian inference applies one method. Bayesians pay the price for this simplicity with complex numerical methods to approximate integrals. Fortunately, the reduction in computing cost and the development of numerical methods over the last 20 years have brought Bayesian inference within reach of anyone who owns a laptop. Sawtooth Software and SAS have commercial-grade implementations for Bayesian conjoint models, and there is ever-growing freeware, especially WinBUGS and R.

The goal of the chapter is to give readers a toolset for hierarchical Bayes (HB) analyses of a wide range of conjoint specifications. Hierarchical Bayes models are not specialized to conjoint analysis, and their application to conjoint models draws on all aspects of Bayesian theory. Hierarchical Bayes models have a long history: Hill (1965) introduced HB models for random-effects, one-way ANOVA; and Lindley and Smith (1972) and Smith (1973) proposed HB linear models. Lenk et al. (1996) applied HB analysis to metric conjoint, and Allenby and Lenk (1994, 1995) considered discrete-choice HB. Hierarchical Bayes models are often termed 'Bayesian random coefficient models', though the terminology can be misleading. Bayesians treat all unknown parameters as random, and they estimate random coefficients as though they are fixed coefficients. I will follow these operational definitions: 'fixed coefficients' have prior distributions, and 'random coefficients' have heterogeneity distributions.

Bayesian models specify a joint probability distribution for the data and all unknown quantities. The joint distribution includes the likelihood function for sample information, heterogeneity distributions for subject-level parameters, structural constraints and prior distributions. Bayesian inference derives posterior, predictive and marginal distributions from this joint distribution. Bayes estimators or Bayes rules minimize expected posterior loss for different loss functions. Bayesian inference is internally consistent and coherent (De Finetti, 1937) because all of the computations are obtained from the joint distribution by simple probability calculations. Bayesian analysis optimally combines all sources of information (Bernardo and Smith, 1994). Numerous studies have shown that Bayesian estimation also has desirable sampling properties, such as asymptotic normality and consistency (Berger, 1985). In very general settings, if the prior distributions do not rule out the truth, then Bayesian inference is consistent in probability (Doob, 1949). These theoretical properties, which are often overlooked in the rush to the computer, guarantee that researchers will be well treated by Bayesian analysis.

Conjoint studies often produce 'broad and shallow' data: many subjects and few observations per subject. Researchers need many subjects to estimate the distribution of subject-parameter heterogeneity, which allows subjects to have different preferences. If studies were also 'deep', with many observations per subject, then two-stage methods would work well because the small estimation error for subject-level parameter estimates would not greatly distort the heterogeneity distribution. However, 'broad and deep' studies (many subjects and many observations per subject) are prohibitively expensive and difficult for subjects to complete. With broad and shallow data (many subjects and few observations per subject), two-stage methods fail because individual-level estimators may not exist or may have high sampling variation, thus distorting the heterogeneity
distribution. Hierarchical Bayes inference introduces bias in the individual-level estimates to reduce their sampling error by shrinking them towards population-level estimators (Allenby and Rossi, 1998). Shrinkage estimators also appear in classical statistics to reduce sampling error: James-Stein estimation (Stein, 1956; James and Stein, 1961), ridge regression (Hoerl and Kennard, 1970) and penalized maximum likelihood (Good, 1971; de Montricher et al., 1975). These classical methods can be viewed as special cases of Bayesian inference. Bayesian shrinkage occurs automatically from combining different sources of information in the joint distribution. The amount of shrinkage depends on the estimation error and the explanatory power of the population-level model. In this way, HB analysis reliably estimates both the subject-level and population-level models with relatively few observations for each subject.

Researchers often conflate Bayesian analysis with its numerical methods, such as Markov chain Monte Carlo (MCMC). The next section identifies essential elements of Bayesian analysis. Section 3 then surveys numerical approximation methods for integration, starting with the well-known grid methods from high school calculus and ending with MCMC simulation algorithms, the workhorse of modern Bayesian computation. Section 4 presents a series of MCMC algorithms for HB regression models for continuous, ordinal and nominal data. Readers can skip these details without loss of continuity; if you decide to implement your own software, revisiting the details will be beneficial. Section 5 discusses Bayesian hypothesis testing and model selection, and section 6 concludes the chapter with a partial survey of extensions and elaborations of the basic random utility model. Recent texts on Bayesian inference or conjoint analysis are Louviere et al. (2000), Lancaster (2004), Orme (2006), Rossi et al. (2005), Koop et al. (2007) and Train (2009).
2 BASICALLY BAYES

Bayesian analysis rests on three pillars: the joint distribution of all random components to specify the model; probability calculus to derive marginal, posterior and predictive distributions; and loss functions to derive Bayes rules, which are decision rules for optimal estimation. Bayesian analysis consists of learning and summarization processes. Bayesians encode their prior beliefs about unknown parameters, such as attribute preferences, with probability distributions. When they obtain data from a conjoint study, they update these beliefs by computing posterior distributions in the learning step. They then estimate parameters with various statistics from the posterior distribution in the summarization step.

Conjoint studies use repeated measures where each subject provides more than one evaluation. To fix notation, there are n subjects, where subject i evaluates m_i products or options. The total number of evaluations is M = m_1 + … + m_n. Y_i is the vector of responses for subject i; X_i are exogenous variables, which can include attributes of the products, experimental manipulations and subject-level covariates. The entire observed data are (X, Y), where Y = {Y_1, …, Y_n} and X = {X_1, …, X_n}. Ω are the unknown parameters. The joint distribution of Y and Ω given X is:

f(Y, \Omega \mid X) = f(Y \mid X, \Omega)\, g(\Omega) \qquad (20.1)
where f(Y | X, Ω) is the distribution of the data given the parameters and X, and g is the prior distribution for Ω. If Y or Ω is a continuous random variable, then f or g is a density function.2 If Y or Ω is a discrete random variable, then f or g is a probability mass function.3 The likelihood function L(Ω) = f(Y | X, Ω) expresses the information in the fixed data Y about the unknown parameter Ω, and the prior distribution summarizes our knowledge about the parameters before obtaining the data. Because X is fixed and exogenous in conjoint studies, we suppress it in the following. If X were endogenous, we would have to expand the joint distribution to include its distribution. If the subjects' responses are conditionally independent given X and Ω, then the overall likelihood factors into n subject-specific likelihoods: f(Y | Ω) = \prod_{i=1}^{n} f(Y_i | Ω). Further, if the m_i evaluations within subject i are conditionally independent given Ω, then f(Y_i | Ω) = \prod_{j=1}^{m_i} f(y_{ij} | Ω), where y_{ij} is subject i's evaluation of the j-th stimulus or product.

The Bayesian learning process updates our prior knowledge about Ω after observing the sample information Y by using Bayes theorem. The updating process results in the posterior distribution of Ω given the data Y:

g(\Omega \mid Y) = \frac{f(Y, \Omega)}{f(Y)} = \frac{f(Y \mid \Omega)\, g(\Omega)}{f(Y)} \qquad (20.2)
where f(Y) is the marginal distribution of Y or the integrated likelihood:4

f(Y) = \int f(Y \mid \Omega)\, g(\Omega)\, d\Omega. \qquad (20.3)

Because Y is fixed at the observed data, the integrated likelihood f(Y) is a normalizing constant for the posterior distribution. Bayesians simply write g(Ω | Y) ∝ f(Y | Ω) g(Ω), due to laziness and not to profundity. This normalizing constant f(Y) adjusts the posterior so that it integrates to one, and it does not affect its shape or location.

The summarization process focuses on various aspects of the posterior distribution. It may be sufficient to graph the posterior distributions in one or two dimensions. Other summary measures are means, standard deviations, correlations and percentiles. These measures have decision-theoretic justifications based on different loss functions (negative utility for using an estimator). The loss function L[D, R(Ω)] measures the penalty for using the decision rule or estimator D for parameter R(Ω), where R is a function of Ω. R could be as simple as the identity function or as complex as market shares, profits, willingness-to-pay, consumer surplus or social welfare. For example, squared-error loss is L[D, R(Ω)] = [D − R(Ω)]'[D − R(Ω)]. The Bayes rule D_B minimizes the posterior expected loss over all possible D:

D_B = \arg\min_{D} \int L[D, R(\Omega)]\, g(\Omega \mid Y)\, d\Omega. \qquad (20.4)

The integral is the posterior expected loss for using decision rule D. D_B is the point estimator that gives minimal loss. Bayes rules are admissible: other estimators cannot uniformly dominate Bayes rules across all parameter values with respect to the loss function (DeGroot, 1970). The posterior mean E[R(Ω) | Y] = ∫ R(Ω) g(Ω | Y) dΩ is the Bayes rule for squared-error loss; the posterior median is the Bayes rule for absolute-error loss; and the posterior mode is the Bayes rule for 0/1 loss.5 The posterior Bayes risk measures the uncertainty in the Bayes rule: ∫ L[D_B, R(Ω)] g(Ω | Y) dΩ. Under squared-error loss, the Bayes risk is the posterior variance. Many software packages report the posterior mean as the point estimator and the posterior standard deviation as a measure of estimation uncertainty.

Bayesians use highest posterior density intervals (HPDI) as a substitute for confidence intervals. Conceptually, draw a horizontal line through the posterior density and compute the area under the density between the endpoints determined by the intersection of the horizontal line and the density. Find the highest horizontal line such that the area is a specified value, say 90 percent or 95 percent. The HPDI is the set of parameter values corresponding to this area. If the posterior density is approximately normal, a fast and dirty approximation to the 95 percent HPDI is the posterior mean ± 2 posterior standard deviations. The HPDI may be a union of disjoint subintervals if the posterior density is multi-modal.

The learning process for unknown parameters also extends to prediction: future values of Y can be viewed as unknown parameters. Conceptually, Bayesians do not make a major distinction between inference and prediction, unlike classical statistics. The 'posterior' distribution for future observations is the predictive distribution, which integrates the likelihood function for future Y_{n+1}, …, Y_{n+k} over the posterior distribution from past Y_1, …, Y_n. If the Y's are conditionally independent given Ω, then the predictive distribution is:
f(Y_{n+1}, \ldots, Y_{n+k} \mid Y_1, \ldots, Y_n) = \frac{f(Y_1, \ldots, Y_{n+k})}{f(Y_1, \ldots, Y_n)} = \int \left[ \prod_{j=1}^{k} f(Y_{n+j} \mid \Omega) \right] g(\Omega \mid Y_1, \ldots, Y_n)\, d\Omega. \qquad (20.5)
Loss functions also apply to prediction: the predictive mean is optimal for squared-error loss, and so on. For prediction, the equivalent of the HPDI uses the prediction distribution instead of a posterior distribution. These highest predictive density intervals indicate the range of most likely values for future Y variables. Bayesian summarization includes all sources of information, both sample information and prior information, by integrating over the posterior or predictive distributions. Unfortunately, integration is not easy. The next section briefly describes numerical approximation methods.
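Before turning to numerical methods, it helps to see the learning and summarization steps in a case where no approximation is needed. The toy sketch below uses a normal likelihood with known variance and a conjugate normal prior, so the posterior in equation (20.2), a quick HPDI approximation and the predictive variance are all available in closed form; the data and prior values are invented for illustration.

```python
import numpy as np

# Conjugate normal-normal updating: a closed-form instance of Bayesian
# learning (posterior) and summarisation (point estimate, interval,
# prediction).  All numbers are illustrative.
y = np.array([1.2, 0.7, 1.5, 0.9])        # observed responses
sigma2 = 1.0                               # known sampling variance
m0, v0 = 0.0, 10.0                         # prior mean and variance
n = len(y)
v1 = 1.0 / (1.0 / v0 + n / sigma2)         # posterior variance
m1 = v1 * (m0 / v0 + y.sum() / sigma2)     # posterior mean = Bayes rule, squared-error loss
hpdi = (m1 - 2 * np.sqrt(v1), m1 + 2 * np.sqrt(v1))  # quick approx. 95% HPDI
pred_var = sigma2 + v1                     # predictive variance for a new observation
```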
3 NUMERICAL APPROXIMATIONS

Except for a small number of special models, such as linear regression with conjugate priors,6 Bayesians rely on numerical approximations of posterior expectations. In general, if R is a functional of the parameters Ω, then the posterior expectation of R(Ω) is:

E[R(\Omega) \mid Y] = \int R(\Omega)\, g(\Omega \mid Y)\, d\Omega \qquad (20.6)
This section presents different approximation tactics in roughly their historical order. As computational resources have dramatically increased, the methods have become more
sophisticated, efficient and effective. However, the earlier techniques, which are easy to understand, provide insight into the recent methods, which are less intuitive.

Grid Methods

Grid methods have been around since the beginning of calculus in the seventeenth century. A definite integral is the area between a curve and the Ω-axis between two endpoints. This area can be approximated by a sequence of rectangles or other shapes with known areas. These methods are feasible for evaluating posterior expectations if the dimension of Ω is small. In one dimension, we break the range of Ω into T intervals determined by the points Ω_0, …, Ω_T. These grid points form the bases of approximating rectangles. A simple approximation is:

\int R(\Omega)\, g(\Omega \mid Y)\, d\Omega \approx_{\text{Grid}} \sum_{t=1}^{T} \text{Area of Rectangle}_t, \qquad \text{Area of Rectangle}_t = R(\psi_t)\, g(\psi_t \mid Y)\, \delta\Omega_t \qquad (20.7)
where ψ_t = (Ω_t + Ω_{t−1})/2 is the midpoint of the interval (Ω_{t−1}, Ω_t) and δΩ_t = Ω_t − Ω_{t−1} is the width of the interval. This approximation replaces the integrand R·g with step functions over the grid. Linear splines (trapezoidal rule) and polynomial splines (Simpson's rule) improve the performance by providing a better approximation of the integrand than step functions. Adaptive rules sequentially place the grid points where the integrand is the waviest. For smooth integrands in one or two dimensions, surprisingly few grid points are needed, and approximations generally become more accurate as the grid becomes finer.

Figure 20.1 illustrates grid methods where R(x) = sin(kπx) and f(x) is the normal density with mean 0.5 and standard deviation 0.15. Panel A plots R(x), f(x) and R(x)f(x) over 0 to 1 for k = 2. Panels B and C approximate R(x)f(x) with step functions and linear splines, respectively, where there are 10 equally spaced intervals on 0 to 1. Panel C gives a better approximation because the linear spline is a better approximation of R(x)f(x) than the step function. Panel D illustrates what can go wrong if the grid is too coarse to detect high-frequency features (k = 19) of the integrand.

[Figure 20.1: Grid methods for approximating E[sin(kπX)] where X ~ N(0.5, 0.0225); A: integrand for k = 2; B: step-function approximation; C: linear spline approximation; D: high oscillation with k = 19]

Grid methods have three limitations that make them impractical for conjoint analysis except in special cases. First, the number of grid points and evaluations of the integrand increases exponentially with the dimension of Ω: if a grid of 10 points works well in one dimension, then we may need 10^p points in p dimensions. Second, we must identify the regions where R(Ω)g(Ω | Y) is non-zero, and finding these regions can be challenging. Third, we should take into account the functional variation in the integrand to avoid smoothing over major features, as in Figure 20.1D. The latter two problems call for considerable skill in functional analysis. Simulation methods leverage knowledge about the statistical model to avoid these limitations.
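A minimal sketch of the step-function rule in equation (20.7), reproducing the setting of Figure 20.1; the grid size is the illustrative 10 intervals used there.

```python
import numpy as np
from scipy.stats import norm

# Midpoint (step-function) approximation of E[sin(2*pi*X)] for
# X ~ N(0.5, 0.15^2), mirroring panel B of Figure 20.1.
T = 10
edges = np.linspace(0.0, 1.0, T + 1)
mids = 0.5 * (edges[:-1] + edges[1:])       # midpoints psi_t
widths = np.diff(edges)                      # interval widths delta_t
f = norm(loc=0.5, scale=0.15).pdf
approx = np.sum(np.sin(2 * np.pi * mids) * f(mids) * widths)
# The true value is approximately 0 by symmetry of the integrand about x = 0.5.
```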
Monte Carlo Simulation

Since the 1960s, Monte Carlo (MC) simulation methods have mostly displaced grid methods in statistics because they are scalable to higher dimensions. It sounds too good to be true, and it is. The catch is that the researcher must have appropriate
pseudo-random number generators. Grid methods systematically place the grid points {Ω_t} over the domain of R(Ω)g(Ω | Y), while MC methods draw them from the posterior distribution, thus placing a 'flexible grid' in areas of large posterior probability. In the simplest case, we have a random number generator for the posterior distribution. Then we generate a random sample of T independent and identically distributed (iid) pseudo-random numbers Ω_1, …, Ω_T from the posterior distribution g(Ω | Y). The sample average of {R(Ω_t)} approximates the posterior mean:

\int R(\Omega)\, g(\Omega \mid Y)\, d\Omega \approx_{\text{MC}} \frac{1}{T} \sum_{t=1}^{T} R(\Omega_t). \qquad (20.8)
The expected value of the right-hand side is the posterior mean of R. The average converges to the integral as the number of random draws increases, by the strong law of large numbers, provided that ∫ [R(Ω)]² g(Ω | Y) dΩ < ∞. For the approximation to be accurate, the random draws {Ω_t} should span the support of the posterior distribution for R(Ω). The MC sampling variance of equation (20.8) is:

\mathrm{Var}\left[ \frac{1}{T} \sum_{t=1}^{T} R(\Omega_t) \right] = \frac{1}{T}\, \mathrm{Var}[R(\Omega) \mid Y] \qquad (20.9)
which declines with the number T of draws. The rate of convergence does not depend on the dimension of the parameter space, although the variance of R(Ω) does. We do not have to evaluate the posterior distribution in equation (20.8), which saves on computations. However, the hard work of numerical integration has shifted to coding appropriate random number generators, which usually do not exist for conjoint models.
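For comparison with the grid sketch above, the same expectation can be approximated by plain MC draws as in equation (20.8), with a standard error based on equation (20.9); the number of draws is arbitrary.

```python
import numpy as np

# Monte Carlo counterpart of the grid example: approximate
# E[sin(2*pi*X)], X ~ N(0.5, 0.15^2), by averaging over random draws.
rng = np.random.default_rng(0)
T = 100_000
draws = rng.normal(0.5, 0.15, size=T)
vals = np.sin(2 * np.pi * draws)
estimate = vals.mean()
mc_se = vals.std(ddof=1) / np.sqrt(T)   # MC standard error, equation (20.9)
```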
Importance Sampling

Declaring victory over posterior integrals due to MC is premature, because we seldom have random number generators for our posterior distributions, and good random number generators are extremely difficult to build. The natural question is how to use standard random number generators to approximate integrals for non-standard posterior distributions. Importance sampling (Hammersley and Handscomb, 1964) is a simple workaround that was popular in statistics in the 1970s and 1980s. It is motivated by the observation that:

\int R(\Omega)\, g(\Omega \mid Y)\, d\Omega = \frac{\int R(\Omega)\, [g(\Omega \mid Y)/h(\Omega)]\, h(\Omega)\, d\Omega}{\int [g(\Omega \mid Y)/h(\Omega)]\, h(\Omega)\, d\Omega} \qquad (20.10)

where h is a distribution with the same support as g. The dismayed reader probably noticed that I merely replaced a simple integral with two more complex integrals, which does not seem to move us any closer to solving the original problem. However, if we have a good random number generator for h, we can generate iid draws ψ_1, …, ψ_T from h and approximate the posterior mean by a weighted average:

\int R(\Omega)\, g(\Omega \mid Y)\, d\Omega \approx \sum_{t=1}^{T} w_t R(\psi_t), \qquad w_t = \frac{g(\psi_t \mid Y)/h(\psi_t)}{\sum_{s=1}^{T} g(\psi_s \mid Y)/h(\psi_s)}, \quad t = 1, \ldots, T \qquad (20.11)
A convenient feature of the sampling weights w_t is that they do not depend on the normalizing constants of g and h. The approximation works best if h is close to g; if they are equal, then w_t = 1/T, and the importance sampling estimator is the MC estimator in equation (20.8). The importance weights provide a diagnostic for the sampler: a poor choice of h results in a few large weights and many small or zero weights, and the effective sample size will be less than T. Hesterberg (1995) defined the effective sample size as ESS = [\sum_{t=1}^{T} w_t^2]^{-1}. In general, the importance sampler's accuracy depends on the tail behaviours of g and h. The tails of h should not be shorter than those of g: if h has shorter tails, random draws from h will not explore the tails of g, and the approximation can be biased. Conversely, importance sampling is inefficient when h has much longer tails than g: if g is zero at a draw from h, then its weight is zero, and the draw does not contribute to the approximation. Importance sampling has largely been displaced by MCMC, though it is making a comeback in particle filtering for dynamic models (Gordon et al., 1993; Gilks and Bersuini, 2001).
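A small sketch of the self-normalized estimator in equation (20.11), continuing the running example; the Student-t proposal is my choice to illustrate the heavier-tails advice, not one from the chapter.

```python
import numpy as np
from scipy.stats import norm, t

# Importance sampling for E[sin(2*pi*X)], X ~ N(0.5, 0.15^2), using a
# heavier-tailed Student-t proposal h; normalising constants cancel in
# the self-normalised weights.
rng = np.random.default_rng(1)
T = 100_000
prop = t(df=5, loc=0.5, scale=0.15)          # proposal h, heavier tails than g
x = prop.rvs(size=T, random_state=rng)
w = norm(0.5, 0.15).pdf(x) / prop.pdf(x)     # unnormalised weights g/h
w /= w.sum()                                  # self-normalise
estimate = np.sum(w * np.sin(2 * np.pi * x))
ess = 1.0 / np.sum(w ** 2)                    # Hesterberg's effective sample size
```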
Markov Chain Monte Carlo

MCMC increases the generality of sampling algorithms by relaxing the condition that the draws have to be independent. It is often called 'Metropolis–Hastings sampling' after Metropolis et al. (1953) and Hastings (1970). Gelfand and Smith (1990) reintroduced these methods to statistics, though they were foreshadowed by Geman and Geman (1984) and Tanner and Wong (1987), among others. Chib and Greenberg (1995) review the methods. Gelfand and Smith started a computational revolution in the Bayesian community. At that time, Bayesian theory was well developed; computational costs were falling; Bayesians were tackling sophisticated models where Monte Carlo and importance sampling were not effective; and top statistical journals began accepting papers where the main contribution was computational methods.

MCMC, as the name implies, creates a sequence of draws from a Markov chain7 whose stationary distribution is the desired posterior distribution. The principle of equivalent exchange implies that these advantages are not free: the draws from the MCMC are autocorrelated, not independent. If the autocorrelation is large, then the rate of convergence is slower than that of MC. The MCMC draws eventually converge to the posterior distribution after passing through an initial transitory period of unknown length. The user must guess the length of this initial period and decide how many draws B to drop from the approximations and how many draws T to use for estimation. There is a large literature (Roberts and Polson, 1994) on the choice of B and T. In most Bayesian models, the rate of convergence is exponential if the states in the parameter space communicate8 with each other. However, in any particular application, the theory is often too remote to be a practical guide for picking B and T. Different MCMC algorithms and
different parameterizations can have different mixing properties. Roughly, 'mixing' is the efficacy and speed with which the chain of simulated values tours the support of the posterior distribution; in the best case, the chain rapidly covers the support. The MCMC approximation of the posterior mean is similar to the MC approximation in equation (20.8), except that the first B random draws are excluded from the average:

\int R(\Omega)\, g(\Omega \mid Y)\, d\Omega \approx_{\text{MCMC}} \frac{1}{T - B} \sum_{t=B+1}^{T} R(\Omega_t) \qquad (20.12)
Its sampling variance is:

\mathrm{Var}\left[ \frac{1}{T - B} \sum_{t=B+1}^{T} R(\Omega_t) \,\Big|\, Y \right] = \frac{\mathrm{Var}[R(\Omega) \mid Y]}{T - B} \left\{ 1 + \frac{2}{T - B} \sum_{j=1}^{T-B-1} (T - B - j)\, \rho_j \right\} \qquad (20.13)
where ρ_j is the autocorrelation function ρ_j = Corr[R(Ω_t), R(Ω_{t+j}) | Y], assuming that the sampler has reached the stationary distribution by iteration B. The rate of decay in the autocorrelation function determines the effective sample size and accuracy of the MCMC estimator. Different sampling schemes can lead to different numerical accuracies for different parameters. If the autocorrelations are large for an MCMC sample and parameter R(Ω), then the chain has poor mixing properties and will require a large number of draws to span the support of the posterior distribution. An active area of research is designing MCMC samplers or reparameterizing models to improve the mixing properties of the MCMC.

Next, we describe two approaches to MCMC: Gibbs sampling and Metropolis–Hastings. The former is a special case of the latter, but Gibbs sampling is easier and applies to a surprisingly large number of models.

Gibbs Sampling

The name 'Gibbs' is obscure. Metropolis et al. (1953) simulated random numbers from the Gibbs distribution, which describes the energy states of atoms. The joint distribution of energy states for all atoms is very complex. However, Gibbs distributions have a spatial Markov property: the energy state of an atom only depends on the energy states of its contiguous neighbours. Gibbs sampling recursively generates the energy state for each atom conditional on the energy states of neighbouring atoms. The technique of recursive conditioning is often called 'Gibbs sampling' even when not sampling from the Gibbs distribution.

Gibbs sampling splits Ω into K mutually exclusive and exhaustive blocks: Ω = ∪_{k=1}^{K} Ω_k. The definition of the blocks is not arbitrary, but designed to make the analysis simple; usually, the model suggests the partition. The key concept is that if it is easy to generate draws from the 'full conditional' distributions for each block Ω_k, then we can create a Markov chain with the correct stationary distribution. Define Ω_{(k)} to be all of the parameters exclusive of block k: Ω_{(k)} = ∪_{j≠k} Ω_j. The full conditional distribution of Ω_k is g(Ω_k | Rest) = g(Ω_k | Y, Ω_{(k)}). An important detail in executing the algorithm is that Ω_{(k)} consists of the current values of the parameters, and these could be from the current iteration t or from the previous iteration t − 1, depending on the status of the recursive generation.
Gibbs sampling algorithm
1. Initialize the parameter at t = 0: Ω = Ω^0. (We temporarily use superscripts for the iteration to avoid confusion with the subscripts for the blocks.)
2. At iteration t, loop over the blocks k = 1, …, K:
   For k = 1, generate Ω_1^t ~ g(Ω_1^t | Y, Ω_(1)^{t−1}).
   For 1 < k < K, generate Ω_k^t ~ g(Ω_k^t | Y, Ω_1^t, …, Ω_{k−1}^t, Ω_{k+1}^{t−1}, …, Ω_K^{t−1}).
   For k = K, generate Ω_K^t ~ g(Ω_K^t | Y, Ω_(K)^t).
3. Repeat step 2 for T iterations. T is selected so that (a) the Markov process has converged to the stationary distribution g(Ω|Y) after B draws, and (b) T−B is sufficiently large that the MCMC approximation (20.12) is sufficiently accurate.
The Markov transition kernel (roughly, the probability of moving from Ω^{t−1} to Ω^t) for Gibbs sampling is:

F(Ω^{t−1}, Ω^t) = ∏_{k=1}^{K} g(Ω_k^t | Y, Ω_1^t, …, Ω_{k−1}^t, Ω_{k+1}^{t−1}, …, Ω_K^{t−1})   (20.14)
By carefully rewriting the conditional probabilities and by using the fact that conditional distributions integrate to one, we find that g(Ω|Y) is the stationary distribution:

∫ g(Ω^{t−1} | Y) F(Ω^{t−1}, Ω^t) dΩ^{t−1} = g(Ω^t | Y)   (20.15)
If you select Ω^{t−1} according to the posterior distribution and use the Markov transition kernel to move to state Ω^t via Gibbs sampling, then the marginal distribution of Ω^t is also the posterior distribution. The Markov chain {Ω^t} converges to the posterior distribution in probability.

Frequently asked questions
How often do the full conditional distributions for the blocks have convenient random number generators? Surprisingly often for statistical models.
Does it matter in which order the blocks are sampled? No.
Can I randomly select blocks to update? Yes.
Can I repeatedly sample from some of the blocks before sampling from others? Yes.
Do the initial values matter? In theory, no, if the starting values are in the support of the posterior distribution. In practice, yes, if the starting values are in a region of small posterior mass.
Does the blocking of the parameter space affect the algorithm's efficiency? Yes. Generally, more blocks result in larger autocorrelations, poorer mixing and larger variance in equation (20.13) than a choice with fewer blocks, though results can vary across applications.

We next illustrate Gibbs sampling with two simple cases that are building blocks for conjoint analysis.

Gibbs sampling for a bivariate normal distribution with order constraints
This example has two objectives: illustrating Gibbs sampling for a simple problem and developing tools for probit models. The example first considers the simpler case of Gibbs sampling from an unconstrained, bivariate normal distribution. In this case, standard random number generators are preferred because the draws are independent, and Gibbs sampling is inefficient.
Figure 20.2 MCMC for bivariate normal distribution; A: initial point and first iteration; B: two iterations; C: 20 iterations; D: 50 draws; E: trace of draws versus iteration number; F: 20 iterations with the constraint Ω_1 > Ω_2
Figure 20.2 illustrates Gibbs sampling of Ω = (Ω_1, Ω_2) from a bivariate normal distribution with E(Ω_1) = 5, E(Ω_2) = 3, Var(Ω_1) = 4, Var(Ω_2) = 1 and Cor(Ω_1, Ω_2) = −0.7. The contour lines in Figure 20.2 are from the bivariate normal density. The full conditional distributions are:
● Ω_1 | Ω_2 ~ N(9.20 − 1.40Ω_2, 2.04), the conditional normal distribution with mean 9.20 − 1.40Ω_2 and variance 2.04.
  ● The conditional mean is E(Ω_1 | Ω_2) = E(Ω_1) + Cor(Ω_1, Ω_2)[Var(Ω_1)/Var(Ω_2)]^{1/2}[Ω_2 − E(Ω_2)].
  ● The conditional variance is Var(Ω_1 | Ω_2) = Var(Ω_1)[1 − Cor(Ω_1, Ω_2)²].
● Ω_2 | Ω_1 ~ N(4.75 − 0.35Ω_1, 0.51).
  ● The conditional mean is E(Ω_2 | Ω_1) = E(Ω_2) + Cor(Ω_2, Ω_1)[Var(Ω_2)/Var(Ω_1)]^{1/2}[Ω_1 − E(Ω_1)].
  ● The conditional variance is Var(Ω_2 | Ω_1) = Var(Ω_2)[1 − Cor(Ω_2, Ω_1)²].
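A minimal sketch of this two-block Gibbs sampler in Python, using the conditional moments listed above (the seed and loop structure are illustrative):

```python
import numpy as np

# Gibbs sampler for the bivariate normal example: E(O1)=5, E(O2)=3,
# Var(O1)=4, Var(O2)=1, Cor(O1,O2)=-0.7, initialized at (0, 0).
rng = np.random.default_rng(0)
T = 50
draws = np.zeros((T + 1, 2))                           # row 0 is the initial value
for t in range(1, T + 1):
    o2 = draws[t - 1, 1]
    o1 = rng.normal(9.20 - 1.40 * o2, np.sqrt(2.04))   # O1 | O2
    o2 = rng.normal(4.75 - 0.35 * o1, np.sqrt(0.51))   # O2 | O1
    draws[t] = (o1, o2)
print(draws[1:].mean(axis=0))                          # MCMC estimates of E(O1), E(O2)
```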
The initial values are (0, 0). Panel A shows the path from (0, 0) to the first draw (Ω_1^1, Ω_2^1) for t = 1. Panels B and C show the paths for t = 2 and t = 20. Panel D shows how the MCMC tours the joint distribution for t = 50, and Panel E traces the draws versus t. The traces appear stationary after the first draw. With 50 draws the MCMC estimators are close to their true values: E(Ω_1) ≈_{MCMC} 5.12, E(Ω_2) ≈_{MCMC} 2.89, Var(Ω_1) ≈_{MCMC} 4.28, Var(Ω_2) ≈_{MCMC} 1.27 and Cor(Ω_1, Ω_2) ≈_{MCMC} −0.68.

The analysis of probit models generates random utilities from normal distributions subject to an order constraint. Panel F diagrams the path for 20 iterations subject to the order constraint Ω_1 > Ω_2. The inverse cumulative distribution function (cdf) transform is used to generate constrained draws. A general method of drawing X from a cdf F is based on the observation that U = F(X) has a uniform distribution U(0,1) on 0 to 1. Inverting this relation gives X = F^{−1}(U). If X is constrained between a and b, then the density is g(x) = f(x)/[F(b) − F(a)] for a < x < b. The inverse cdf transform is:

X = F^{−1}[(1 − U)F(a) + U F(b)]   (20.16)
where U ~ U(0,1). Special cases for −∞ < X < b or a < X < ∞ use F(−∞) = 0 or F(∞) = 1. The inverse cdf transform algorithm to generate Ω_1 > Ω_2 is:
1. Generate Ω_1 > ω_2 given Ω_2 = ω_2: Ω_1 = F^{−1}[(1 − u)F(ω_2) + u], where F is the normal cdf with mean 9.20 − 1.40ω_2 and variance 2.04, and u is a random draw from U(0,1).
2. Generate Ω_2 < ω_1 given Ω_1 = ω_1: Ω_2 = F^{−1}[uF(ω_1)], where F is the normal cdf with mean 4.75 − 0.35ω_1 and variance 0.51, and u is a random draw from U(0,1).
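A sketch of the inverse cdf draw in Python (the helper name is illustrative; scipy's norm.cdf and norm.ppf play the roles of F and F^{−1}):

```python
import numpy as np
from scipy.stats import norm

# Truncated normal draw via the inverse cdf transform of equation (20.16):
# X ~ N(mu, sd^2) restricted to (a, b); F(-inf) = 0 and F(inf) = 1 handle
# the one-sided cases automatically.
def rtruncnorm(mu, sd, a=-np.inf, b=np.inf, rng=np.random.default_rng()):
    u = rng.uniform()
    Fa, Fb = norm.cdf(a, mu, sd), norm.cdf(b, mu, sd)
    return norm.ppf((1 - u) * Fa + u * Fb, mu, sd)
```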
The 20 draws in Panel F 'walk' up to the boundary Ω_1 = Ω_2 but do not cross it. This algorithm is easily generalized to more than two normally distributed random variables.

Gibbs sampling for homogeneous normal regression
The homogeneous regression model is:

Y_i = x_i'β + ε_i for i = 1, …, n   (20.17)
where x_i is a vector of predictor variables; β is a vector of regression coefficients; and the random terms {ε_i} are a random sample from a normal distribution with mean 0 and standard deviation σ. The prior distributions are:
● β ~ N(b_0, B_0), the multivariate normal distribution with prior mean b_0 and prior covariance B_0. The multivariate normal density is:

g(β) ∝ |B_0|^{−1/2} exp[−(1/2)(β − b_0)'B_0^{−1}(β − b_0)]   (20.18)

● σ² ~ IG(r_0/2, s_0/2), the inverse Gamma distribution. The prior mean is s_0/(r_0 − 2), and the prior variance is 2[E(σ²)]²/(r_0 − 4). As the name implies, an inverse Gamma random variable is 1 divided by a Gamma random variable. The inverse Gamma density is:

g(σ²) ∝ (σ²)^{−(r_0/2 + 1)} exp(−s_0/(2σ²)) for σ² > 0   (20.19)
The joint distribution of the data and parameters is:

[∏_{i=1}^{n} f(y_i | β, σ²)] g(β) g(σ²)   (20.20)

where the bracketed product is the likelihood and g(β)g(σ²) are the priors.
The full conditionals are the following.

Full conditional of β. The factors in the joint distribution that depend on β are:

∏_{i=1}^{n} f(y_i | β, σ²) g(β) ∝ exp[−(1/(2σ²)) Σ_{i=1}^{n} (y_i − x_i'β)² − (1/2)(β − b_0)'B_0^{−1}(β − b_0)]   (20.21)

Expanding the quadratic forms in β, combining terms, and completing the square gives a normal distribution with updated mean b_n and covariance B_n:

β | · ~ N(b_n, B_n)
B_n = [B_0^{−1} + (1/σ²) Σ_{i=1}^{n} x_i x_i']^{−1}
b_n = B_n[B_0^{−1}b_0 + (1/σ²) Σ_{i=1}^{n} x_i y_i]   (20.22)
Full conditional of σ². The factors in the joint distribution that depend on σ² are:

∏_{i=1}^{n} f(y_i | β, σ²) g(σ²) ∝ (σ²)^{−((n + r_0)/2 + 1)} exp[−s_0/(2σ²) − (1/(2σ²)) Σ_{i=1}^{n} (y_i − x_i'β)²]   (20.23)

Identifying terms results in an inverse Gamma with parameters r_n and s_n:

σ² | · ~ IG(r_n/2, s_n/2)
r_n = r_0 + n
s_n = s_0 + Σ_{i=1}^{n} (y_i − x_i'β)²   (20.24)
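Combining the two full conditionals gives a complete Gibbs sampler for this model. A sketch in Python (function and variable names are illustrative; the inverse Gamma draw uses the reciprocal of a Gamma draw, as the text notes):

```python
import numpy as np

# Gibbs sampler for the homogeneous normal regression model, cycling through
# the full conditionals (20.22) and (20.24).
def gibbs_regression(y, X, b0, B0, r0, s0, T=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    n, p = X.shape
    B0inv = np.linalg.inv(B0)
    beta, sig2 = np.zeros(p), 1.0
    out = np.zeros((T, p + 1))
    for t in range(T):
        # beta | sig2, y ~ N(b_n, B_n), equation (20.22)
        Bn = np.linalg.inv(B0inv + X.T @ X / sig2)
        bn = Bn @ (B0inv @ b0 + X.T @ y / sig2)
        beta = rng.multivariate_normal(bn, Bn)
        # sig2 | beta, y ~ IG(r_n/2, s_n/2), equation (20.24)
        sn = s0 + np.sum((y - X @ beta) ** 2)
        sig2 = (sn / 2.0) / rng.gamma((r0 + n) / 2.0)   # inverse Gamma via 1/Gamma
        out[t] = np.append(beta, sig2)
    return out
```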
Figure 20.3 shows the results from MCMC with simulated data where Y_i = 10 + x_i + ε_i for i = 1, …, 30, and ε_i ~ N(0, 4). X was generated from a standard normal distribution. The MCMC ran for 1000 iterations, and the analysis uses the last 500 iterations. The posterior means are 9.73 for the intercept, 1.06 for the slope and 3.81 for the error variance. Their posterior standard deviations are 0.378, 0.356 and 1.024, respectively. Panel A graphs the MCMC draws versus the iterations. The chain was initialized at 0 for the intercept and slope and 1 for the variance. In this simple problem, MCMC rapidly converges to the posterior distribution, and the series are stationary well before iteration 500. Panel B plots the posterior densities (rotated 90 degrees) based on the last 500 iterations in Panel A. The posterior distributions for the regression coefficients are nearly normal, while the posterior for the variance is skewed because of the occasional large draws in Panel A. Panel C displays the autocorrelation function and indicates that the draws have very little serial correlation.
Figure 20.3 Bayesian analysis of linear regression model; A: MCMC draws versus iteration; B: posterior densities from last 500 draws; C: autocorrelation functions
Metropolis–Hastings Sampling

The full conditional distributions in Gibbs sampling may not have a convenient random number generator. Metropolis–Hastings (MH) is similar to importance sampling in that it generates random numbers from the wrong distribution and then modifies them by keeping the ones that 'work'. MH generates draws from a proposal distribution h, which can depend on the last draw Ω_{t−1}. The candidate distribution h should have the same support as g.

Metropolis–Hastings algorithm
1. Initialize Ω_0 to a value in the support of the posterior distribution. (We revert to subscripts for the iteration number.)
2. At iteration t, generate a candidate random variable Y from the proposal distribution h(Y | Ω_{t−1}).
3. Compute the Metropolis jump probability:

F(Ω_{t−1}, Y) = min{1, [g(Y | Y) h(Ω_{t−1} | Y)] / [g(Ω_{t−1} | Y) h(Y | Ω_{t−1})]}   (20.25)

(in the ratio, Y in the first argument of g denotes the candidate, while the conditioning Y denotes the data).
The minimum ensures that the jump probability is between 0 and 1. One does not need to compute the normalizing constants for g and h.
4. Test U < F(Ω_{t−1}, Y) where U ~ U(0,1), the uniform distribution. (Use logarithms to improve numerical stability: ln(U) < ln[F(Ω_{t−1}, Y)].) If the condition is true, then accept the candidate and set Ω_t = Y. If the condition is false, reject the candidate and retain the previous draw: Ω_t = Ω_{t−1}.
5. Repeat steps 2 to 4 until T random numbers are generated, where T is selected so that (a) the Markov process has converged to the stationary distribution g(Ω|Y) after B draws, and (b) T−B is sufficiently large that the MCMC approximation (20.12) is accurate.

The Metropolis algorithm always returns a random number on each loop; however, that random number may be the same as the last draw. If the proposal distribution h is close to the target g, then F will be close to one, and the candidate Y will be frequently accepted. Sometimes users interpret the ratio in equation (20.25) as a likelihood ratio test, which is not quite correct. The denominator gives the transition from Ω_{t−1} to Y if the candidate is accepted, and the numerator gives the reverse transition from Y to Ω_{t−1}. Together with the test condition in step 4, the 'detailed balance equation' for Markov chains holds:

g(Ω_{t−1} | Y) h(Ω_t | Ω_{t−1}) F(Ω_{t−1}, Ω_t) = g(Ω_t | Y) h(Ω_{t−1} | Ω_t) F(Ω_t, Ω_{t−1})   (20.26)

and g is the stationary distribution. The left-hand side is the transition probability from Ω_{t−1} to Ω_t, and the right-hand side is the reverse transition probability. If h = g, then Metropolis simplifies to Gibbs sampling. A popular choice that is easy to implement is the symmetric random walk: Y | Ω_{t−1} ~ N(Ω_{t−1}, C), where the covariance C is specified by the user.
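A generic random-walk Metropolis step might be sketched as follows (illustrative names; log_g is the log posterior up to a constant, and L_C is the lower Cholesky factor of C):

```python
import numpy as np

# One random-walk Metropolis step. With a symmetric proposal the h terms in
# equation (20.25) cancel, so only the log posterior ratio is needed.
def rw_metropolis_step(w, log_g, L_C, rng):
    y = w + L_C @ rng.standard_normal(w.size)        # candidate ~ N(w, C)
    if np.log(rng.uniform()) < log_g(y) - log_g(w):  # step 4, on the log scale
        return y                                     # accept the candidate
    return w                                         # reject: repeat the last draw
```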
If the variances are too large, then the jumps in the random walk will tend to be large; the probability F in equation (20.25) will tend to be small; and candidates will be frequently rejected. The resulting chain will be stuck at one value for multiple iterations, which increases the autocorrelation in the chain. Conversely, if the variances are too small, the jumps in the random walk will be small and the candidate will be frequently accepted. However, with small jumps the Markov chain will take a long time to traverse the support of the posterior distribution, and the small steps increase the autocorrelation. The goal is to find proposal distributions that balance acceptance rates and jump sizes to minimize the chain's autocorrelation. Acceptance rates around 50 percent are optimal for a univariate Gaussian parameter. Optimal acceptance rates fall to around 20 percent for multivariate Gaussian parameters in higher dimensions (Roberts et al., 1997). Adaptive methods (Andrieu and Thoms, 2008; Girolami and Calderhead, 2011) attempt to achieve this goal. When the last draw is in a low probability region, adaptive methods tend to have large jumps, and when it is in a high probability region, they tend to have smaller jumps.

Homogeneous multinomial logistic regression
Logistic regression models require Metropolis sampling because the full conditional distribution of the parameters does not correspond to a known random number generator. Logistic regression is used for discrete-choice conjoint. McFadden (1974) derived logistic probabilities from random utility theory by assuming that the error terms have extreme value distributions. The extreme value density with scale parameter equal to one is:

f(ε) = exp(−ε) exp[−exp(−ε)]   (20.27)
Figure 20.4 graphs the densities for the standard normal and extreme value distributions. The extreme value distribution is right-skewed with mode 0, mean 0.57 and standard deviation 1.27.
Figure 20.4 Extreme value and standard normal densities
Its left tail decreases faster than that of the normal distribution, while its right tail is longer.

In a discrete-choice conjoint study, subject i evaluates m_i choice sets. The number of options in each choice set is K. K could vary across choice occasions without changing the MCMC algorithm. The attributes for option k, choice set j, and subject i are x_ijk. The unobserved, random utility for subject i, choice set j, and option k is:

Y_ijk = x_ijk'β + ε_ijk for k = 1, …, K; j = 1, …, m_i; and i = 1, …, n   (20.28)
The data-generating mechanism assumes that subject i selects option k if his or her utility for option k exceeds that of the other options in the choice set. The probability that subject i selects option k in choice set j is:

P_ij(k | β) = exp(x_ijk'β) / Σ_{v=1}^{K} exp(x_ijv'β) for k = 1, …, K; j = 1, …, m_i; and i = 1, …, n   (20.29)
Define V_ij to be the option selected by subject i from choice set j. The joint distribution is:

[∏_{i=1}^{n} ∏_{j=1}^{m_i} ∏_{k=1}^{K} P_ij(k | β)^{1(V_ij = k)}] g(β)   (20.30)

where the product term is the likelihood and g(β) is the prior.
where 1(V_ij = k) = 1 if option k is selected and 0 otherwise. The log of the joint distribution is:

L(β) = Σ_{i=1}^{n} Σ_{j=1}^{m_i} [Σ_{k=1}^{K} 1(V_ij = k) x_ijk'β − ln(Σ_{k=1}^{K} exp(x_ijk'β))] + ln[g(β)]   (20.31)
The random-walk Metropolis algorithm follows:
1. Initialize the MCMC.
   (a) Initialize the regression coefficients to β_0. A common choice is a vector of zeros, or the MLE estimates if available.
   (b) Compute the log of the joint distribution at β_0: L(β_0) from equation (20.31).
   (c) Specify a covariance matrix C for the random-walk Metropolis procedure. Usually, C is a diagonal matrix. Compute the Cholesky decomposition D of C: D'D = C. If C is a diagonal matrix, D is the square root of the diagonal of C.
2. Do the following for t = 1, …, T:
   (a) Generate a candidate draw y = β_{t−1} + D'z where z is a vector of standard normal random variables.
   (b) Compute the log of the joint distribution at the candidate: L(y) from equation (20.31).
   (c) Compute the log of the Metropolis jump probability: ln[F(β_{t−1}, y)] = L(y) − L(β_{t−1}). Because the random walk is symmetric, g(y | β_{t−1}) = g(β_{t−1} | y), these factors cancel in the jump probability.
   (d) Generate U ~ U(0,1), the uniform distribution. If ln(U) < ln[F(β_{t−1}, y)], set β_t = y and L(β_t) = L(y). If ln(U) > ln[F(β_{t−1}, y)], set β_t = β_{t−1} and L(β_t) = L(β_{t−1}).
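A compact sketch of this sampler in Python (the data layout, with X stacking the K option-attribute rows per choice task, and the N(0, 10²) prior are illustrative assumptions matching the example below):

```python
import numpy as np

# Random-walk Metropolis for the homogeneous MNL model, following the steps above.
# X has shape (n_obs, K, p); choice[i] is the index of the selected option.
def log_joint(beta, X, choice, prior_sd=10.0):
    v = X @ beta                                      # utilities, shape (n_obs, K)
    vmax = v.max(axis=1, keepdims=True)               # stabilize the log-sum-exp
    lse = vmax[:, 0] + np.log(np.exp(v - vmax).sum(axis=1))
    ll = v[np.arange(len(choice)), choice] - lse      # equation (20.31)
    return ll.sum() - 0.5 * np.sum((beta / prior_sd) ** 2)

def mh_mnl(X, choice, T=2000, step_sd=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    beta = np.zeros(X.shape[2])
    lp = log_joint(beta, X, choice)
    draws = np.zeros((T, beta.size))
    for t in range(T):
        cand = beta + step_sd * rng.standard_normal(beta.size)  # C = step_sd^2 * I
        lp_cand = log_joint(cand, X, choice)
        if np.log(rng.uniform()) < lp_cand - lp:                # step (d)
            beta, lp = cand, lp_cand
        draws[t] = beta
    return draws
```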
The following example simulates a brand conjoint study where 100 subjects evaluate five choice sets. Each choice set consists of four options: Brand A, Brand B, Brand C and None. Two attributes are Quality and Price, which are generated from uniform distributions. Quality and Price are zero for the option None. To identify the model, the intercept for None is set to zero, and the scale factor for the extreme value distribution is one. The coefficients β consist of intercepts for Brands A, B and C and coefficients for Quality and Price. The design matrix and the exponents of the choice probabilities in equation (20.29) for a choice set are:

Brand A: [1 0 0 Q_A P_A]           b_A + Q_A·b_Q + P_A·b_P
Brand B: [0 1 0 Q_B P_B]  × β  =   b_B + Q_B·b_Q + P_B·b_P
Brand C: [0 0 1 Q_C P_C]           b_C + Q_C·b_Q + P_C·b_P
None:    [0 0 0 0   0  ]           0

with β = (b_A, b_B, b_C, b_Q, b_P)'   (20.32)
'Q' is Quality, and 'P' is Price. The prior distributions for the coefficients are normal with mean 0 and standard deviation 10. The covariance C for the random walk proposal distribution is 0.01 times the identity matrix. Panel A of Figure 20.5 plots 2000 MCMC iterations. The last 1000 iterations are used for estimation. Table 20.1 gives the true regression parameters and their posterior means and standard deviations. The proportion of times that the candidate draw is accepted in the estimation sample was 0.438. Compared with the MCMC draws in Figure 20.3 from the linear regression model, the chain takes longer to become stationary. Panel B of Figure 20.5 graphs the autocorrelation functions for the parameters. Because the Metropolis algorithm often rejects the candidate or takes small jumps, there is more serial correlation than in the linear regression example of Figure 20.3, Panel C.

Table 20.1 Estimates of the homogeneous logistic regression model

            TRUE     Posterior Mean    Std Dev
Brand A     0.50     0.624             0.207
Brand B     0.00     0.206             0.203
Brand C    −0.50    −0.337             0.215
Quality     3.00     3.002             0.214
Price      −2.00    −1.955             0.258
Figure 20.5 MCMC draws for the homogeneous multinomial logistic regression; A: MCMC draws versus iterations; B: autocorrelation function

Convergence Diagnostics

In most iterative algorithms, such as MLE or iteratively weighted least squares, the estimates converge to a point. Changes in the parameter estimates from iteration to iteration provide information about convergence. In contrast, the draws from MCMC converge to the posterior distribution, which makes detecting convergence more difficult. Variation in the draws is expected after convergence. The MCMC convergence diagnostics test for stationarity in the draws, and there are a number of proposals in the literature. Geweke (1992) breaks the MCMC chain into two blocks and tests that the parameter estimates of the two blocks are equal. A different approach uses more than one chain and tests that the parameter estimates from the multiple chains are equal. Gelman and Rubin (1992) compare the within-chain variation to the between-chain variation. If the chains are stationary, the two sources of variation should be nearly equal. Brooks and Roberts (1998) and Cowles and Carlin (1996) survey the literature on convergence diagnostics. Be warned that there is not a foolproof test. If the posterior density is multi-modal, the MCMC algorithm can become stuck in one region of the posterior support.
Then all of the convergence diagnostics will indicate stationary chains, but the numerical approximations will be biased.
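As an illustration of the multiple-chain idea, a minimal sketch of a Gelman–Rubin style statistic (a simplified version of the published diagnostic, for illustration only):

```python
import numpy as np

# Potential scale reduction factor for M chains of length N, comparing the
# within-chain and between-chain variation as described above.
def rhat(chains):                                  # chains: array of shape (M, N)
    M, N = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
    B = N * means.var(ddof=1)                      # between-chain variance
    var_hat = (N - 1) / N * W + B / N
    return np.sqrt(var_hat / W)                    # values near 1 suggest stationarity
```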
4 HIERARCHICAL BAYES MODELS FOR CONJOINT DATA
This section builds on the homogeneous linear and logistic regression examples of the last section by introducing random coefficients and heterogeneity distributions. It presents a series of increasingly difficult hierarchical Bayes models in order to illustrate the flexibility of MCMC in breaking complex models into simpler parts. The sequence of models starts with HB regression, where subjects directly give their 'utiles' (amount of utility) for products on a continuous scale. Next, the chapter considers ordinal probit models for elicitation of utiles on ordinal scales. Finally, choice-based conjoint forces subjects to select their preferred options. The information content in the data decreases from continuous to ordinal to choice. One can view the ordinal and choice models as passing the continuous observations through a filter that removes some of the signal in the data. However, choice-based conjoint is currently most popular because choice tasks align better with consumer behaviour and avoid the notorious failure of procedural invariance (Lichtenstein and Slovic, 1971; Grether and Plott, 1979). Even though conjoint analysis seldom uses HB regression, the other models incorporate its algorithms.

Subject-level Model for Random Utility

Subject i's random utility for option or product j is:

Y_ij = w_ij'α + x_ij'β_i + ε_ij for i = 1, …, n and j = 1, …, m_i   (20.33)
where w_ij and x_ij are vectors of observed variables; α is a vector of fixed coefficients that is common to all subjects; β_i is a vector of random coefficients that is particular to subject i; and ε_ij is the random component. In HB regression, HB ordinal probit and HB probit models, the random component has a normal distribution with mean 0 and variance σ² (or covariance Σ in HB probit). In HB logit models, the random component has the extreme value distribution. In conjoint analysis, x_ij usually is a vector of product attributes, and β_i is subject i's preferences for the attributes. The data w_ij could include background or context information that has a common effect α for all subjects. One can easily generalize beyond linear (in parameters) utility by using Metropolis sampling. This chapter uses linear utility to simplify the presentation of algorithms.

Heterogeneity Distribution

Without too much effort, Bayesians can use different heterogeneity distributions for the subject-level parameters {β_i} and adapt their algorithms. However, displaying the computations for each of these heterogeneity distributions goes beyond the scope of this chapter. We focus on multivariate regression heterogeneity, which allows the researcher to relate individual-level parameters to subject-level, observed variables. For instance, price sensitivity may be related to income; brand preference may be related to
expertise; and colour preference may be related to gender. The multivariate regression model is:

β_i = Z_iθ + δ_i   (20.34)
where β_i is a vector of parameters; Z_i is a design matrix with subject-level covariates; θ is a vector of regression coefficients; and δ_i ~ N(0, Λ) has a multivariate normal distribution with covariance matrix Λ. This specification simplifies to a multivariate normal distribution when Z_i = 1, and θ is the mean of the individual-level parameters. Other choices of heterogeneity distribution are latent class (Kamakura, 1988; DeSarbo et al., 1992; Vriens et al., 1996), mixtures of normal distributions (Lenk and DeSarbo, 2000; Frühwirth-Schnatter et al., 2004) and mixtures of Dirichlet processes (Green and Richardson, 2001; Ansari and Mela, 2003; Burda et al., 2008). Train and Sonnier (2005) recommend a random effects distribution that limits extreme values.

The reduced-form model substitutes equation (20.34) into the utility function of equation (20.33) and obtains Y_ij = w_ij'α + x_ij'Z_iθ + ε*_ij where ε*_ij = ε_ij + x_ij'δ_i. The reduced-form utility has interactions between x_ij and Z_i. The reduced-form error term is heteroscedastic (non-constant variance) and induces correlations among the m_i observations for subject i.

The posterior means of {β_i} are convex functions of individual-level estimators, when they exist, and the population-level model. Without subject-level covariates Z_i in equation (20.34), all {β_i} shrink towards their common mean. With subject-level covariates, a subject's coefficient shrinks towards the conditional mean of other subjects with similar covariates. The β_i for subjects in the same income group tend to shrink towards each other; men tend to shrink towards other men; experienced subjects tend to shrink towards other experienced subjects, and so on, depending on Z_i. As the number of observations at the subject level increases, the amount of shrinkage reduces, and individual-level estimates 'stand on their own' data.

Researchers need to decide which variables are included in w_ij, x_ij and Z_i, and the choice is not always clear. Product attributes that change across options belong in x_ij. Subject-level covariates could appear in w_ij or Z_i. Z_i modifies the subjects' preferences β_i, while w_ij directly affects the utility. It seems reasonable that subject-level covariates are moderating variables that affect subjects' perceptions of the attributes and appropriately belong in Z_i. Including subject-level covariates in w_ij results in a reduced-form utility specification. The fixed effects could include contextual variables about the experiment, such as presentation order, design fraction or experimental media. These factors could uniformly shift subjects' utilities in different experimental conditions.

Prior Distributions

The remaining parameters α, σ², θ and Λ have prior distributions. 'Standard' choices that are flexible, easy to implement and can be made vague are normal distributions (equation 20.18) for α and θ, the inverse Gamma distribution (equation 20.19) for σ², and the inverse Wishart distribution for Λ:

● α ~ N(a_0, A_0) with prior mean a_0 and prior covariance A_0.
● θ ~ N(q_0, Q_0) with prior mean q_0 and prior covariance Q_0.
● σ² ~ IG(r_0/2, s_0/2) with prior mean s_0/(r_0 − 2).
● Λ ~ IW_p(d_0, D_0), the p-dimensional inverse Wishart distribution with d_0 prior degrees of freedom and scale parameter D_0 (Zellner, 1971). The prior density is:

g(Λ) ∝ |Λ|^{−(d_0 + p + 1)/2} exp[−(1/2) tr(Λ^{−1}D_0)]   (20.35)
where 'tr(X)' is the trace of a matrix X and sums its diagonal elements. Λ and D_0 are positive definite. The prior mean of Λ is D_0^{−1}/(d_0 − p − 1).

The specification of the parameters for the prior distributions is not without dispute; however, 'standard' settings have evolved over many years of application. For the normal priors, setting the prior means a_0 and q_0 to zero biases the analysis towards 'no effect' for w_ij and Z_i, and setting the prior covariance matrices A_0 and Q_0 to diagonal matrices with large values on the diagonal gives a 'non-informative' or relatively flat prior; however, users need to know how 'large' is large. A vague prior setting for one variable may be more informative for another if their scales differ by orders of magnitude. A pro-tip is to standardize continuous variables and set the prior variances to 100 or more.

The prior distributions for variances and covariances are more problematic. Variances do not have a natural reference for 'no effect', unlike regression coefficients. Also, the inverse Gamma and inverse Wishart distributions have a 'dead zone' to the right of zero where the density becomes essentially zero (Lenk and Orme, 2009). The size of the dead zone depends on the prior parameters. If the true variance is in the dead zone, then accurate estimation of the variance requires very large sample sizes. Standard practice is to set the degrees of freedom (r_0 or d_0) to a small integer plus the dimension of the parameter. If the researcher has prior information about the prior mean of the variance, then he or she can back out the scale parameter from the prior mean and degrees of freedom. Hopefully, the prior mean is selected so that the true variance is not in the dead zone: it is better to make the prior mean too small than too large.

MCMC for HB Regression

Subjects evaluate products on a continuous scale that measures desirability, attractiveness or likelihood of purchase. We assume that Y in equation (20.33) is a direct evaluation of the random utilities. HB regression extends the homogeneous linear regression model (equations 20.17 and 20.20) by adding a heterogeneity distribution. The joint distribution for HB regression is:
mi
i51 j51
Likelihood
n
(20.36)
i51
Heterogeneity
Priors
Full conditionals for HB regression
1. Full conditional of α. Eliminate all factors in the joint distribution that do not depend on α:

g(α | Rest) ∝ [∏_{i=1}^{n} ∏_{j=1}^{m_i} f(y_ij | α, β_i, σ²)] g(α)

The full conditional distribution is normal with mean a_n and covariance A_n:

α | · ~ N(a_n, A_n)
A_n = [A_0^{−1} + (1/σ²) Σ_{i=1}^{n} Σ_{j=1}^{m_i} w_ij w_ij']^{−1}
a_n = A_n[A_0^{−1}a_0 + (1/σ²) Σ_{i=1}^{n} Σ_{j=1}^{m_i} w_ij(y_ij − x_ij'β_i)]   (20.37)
2. Full conditional of β_i. Eliminate all factors of the joint distribution that do not depend on β_i:

h(β_i | Rest) ∝ [∏_{j=1}^{m_i} f(y_ij | α, β_i, σ²)] h(β_i | θ, Λ)

The heterogeneity distribution becomes the 'prior' distribution. The full conditional distribution is normal:

β_i | · ~ N(b_i, B_i)
B_i = [Λ^{−1} + (1/σ²) Σ_{j=1}^{m_i} x_ij x_ij']^{−1}
b_i = B_i[Λ^{−1}Z_iθ + (1/σ²) Σ_{j=1}^{m_i} x_ij(y_ij − w_ij'α)]   (20.38)
3. Full conditional of σ². Eliminate factors in the joint distribution that do not depend on σ²:

g(σ² | Rest) ∝ [∏_{i=1}^{n} ∏_{j=1}^{m_i} f(y_ij | α, β_i, σ²)] g(σ²)

The full conditional distribution is:

σ² | · ~ IG(r_n/2, s_n/2)
r_n = r_0 + Σ_{i=1}^{n} m_i
s_n = s_0 + Σ_{i=1}^{n} Σ_{j=1}^{m_i} (y_ij − w_ij'α − x_ij'β_i)²   (20.39)
4. Full conditional of θ. Eliminate factors in the joint distribution that do not depend on θ:

g(θ | Rest) ∝ [∏_{i=1}^{n} h(β_i | θ, Λ)] g(θ)

The full conditional distribution is:

θ | · ~ N(q_n, Q_n)
Q_n = [Q_0^{−1} + Σ_{i=1}^{n} Z_i'Λ^{−1}Z_i]^{−1}
q_n = Q_n[Q_0^{−1}q_0 + Σ_{i=1}^{n} Z_i'Λ^{−1}β_i]   (20.40)
5. Full conditional of Λ. Eliminate factors from the joint distribution that do not depend on Λ:

g(Λ | Rest) ∝ [∏_{i=1}^{n} h(β_i | θ, Λ)] g(Λ)

Then the full conditional distribution is:

Λ | · ~ IW_p(d_n, D_n)
d_n = d_0 + n
D_n = D_0 + Σ_{i=1}^{n} (β_i − Z_iθ)(β_i − Z_iθ)'   (20.41)

A fast method for generating inverse Wishart distributions uses the Bartlett decomposition (Smith and Hocking, 1972).
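Collecting steps 1 to 5, one sweep of the HB regression Gibbs sampler might be sketched as follows (illustrative names; y, W, X and Z are stored per subject, and hyperparameters follow the text's notation):

```python
import numpy as np
from scipy.stats import invwishart

# One sweep through the full conditionals (20.37)-(20.41).
def hb_regression_sweep(y, W, X, Z, alpha, beta, sig2, theta, Lam,
                        a0, A0inv, r0, s0, q0, Q0inv, d0, D0, rng):
    n = len(y)
    Lam_inv = np.linalg.inv(Lam)
    # 1. alpha | Rest ~ N(a_n, A_n), equation (20.37)
    An = np.linalg.inv(A0inv + sum(W[i].T @ W[i] for i in range(n)) / sig2)
    an = An @ (A0inv @ a0 +
               sum(W[i].T @ (y[i] - X[i] @ beta[i]) for i in range(n)) / sig2)
    alpha = rng.multivariate_normal(an, An)
    # 2. beta_i | Rest ~ N(b_i, B_i), equation (20.38)
    for i in range(n):
        Bi = np.linalg.inv(Lam_inv + X[i].T @ X[i] / sig2)
        bi = Bi @ (Lam_inv @ (Z[i] @ theta) + X[i].T @ (y[i] - W[i] @ alpha) / sig2)
        beta[i] = rng.multivariate_normal(bi, Bi)
    # 3. sigma^2 | Rest ~ IG(r_n/2, s_n/2), equation (20.39)
    resid = np.concatenate([y[i] - W[i] @ alpha - X[i] @ beta[i] for i in range(n)])
    sig2 = ((s0 + resid @ resid) / 2.0) / rng.gamma((r0 + resid.size) / 2.0)
    # 4. theta | Rest ~ N(q_n, Q_n), equation (20.40)
    Qn = np.linalg.inv(Q0inv + sum(Z[i].T @ Lam_inv @ Z[i] for i in range(n)))
    qn = Qn @ (Q0inv @ q0 + sum(Z[i].T @ Lam_inv @ beta[i] for i in range(n)))
    theta = rng.multivariate_normal(qn, Qn)
    # 5. Lambda | Rest ~ IW_p(d_n, D_n), equation (20.41)
    Dn = D0 + sum(np.outer(beta[i] - Z[i] @ theta, beta[i] - Z[i] @ theta)
                  for i in range(n))
    Lam = invwishart.rvs(df=d0 + n, scale=Dn)
    return alpha, beta, sig2, theta, Lam
```

Repeating this sweep T times and discarding the first B draws yields the MCMC sample used in the approximations of Section 3.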
HB Ordinal Probit Regression

Subjects rate products on an ordinal scale. The ordinal probit model assumes that the observed ratings are derived from the unobserved or latent utilities by comparing them to a set of thresholds {h_k} (Aitchison and Silvey, 1957; Gelfand et al., 1992; Albert and Chib, 1993). The threshold model adds a 'link' function that relates the latent utilities Y_ij to the observed V_ij, and prior distributions for the thresholds.

HB ordinal probit model
1. Threshold likelihood: P(V_ij = k) = P(h_{k−1} < Y_ij ≤ h_k) for k = 1, …, K, where h_0 < … < h_K with h_0 = −∞ and h_K = ∞.
2. The thresholds are uniform on h_1 < … < h_{K−1}, where h_1 and h_{K−1} are fixed constants.
This formulation identifies the model by fixing the first and last thresholds to constants. Other parameterizations of the model can improve the mixing of the MCMC algorithm (Nandram and Chen, 1996). The joint distribution for the model is:

[∏_{i=1}^{n} ∏_{j=1}^{m_i} P(v_ij | y_ij, h)] [∏_{i=1}^{n} ∏_{j=1}^{m_i} f(y_ij | α, β_i, σ²)] × [∏_{i=1}^{n} h(β_i | θ, Λ)] g(h) g(α) g(σ²) g(θ) g(Λ)   (20.42)

where the factors are, in order, the likelihood, the latent utilities, the heterogeneity distribution and the priors.
Full conditional distributions for HB ordinal probit
1. Full conditional of Y_ij. If V_ij = k, then

g(y_ij | Rest) ∝ f(y_ij | α, β_i, σ²) 1[h_{k−1} < y_ij ≤ h_k]   (20.43)

where 1[·] is the indicator (0/1) function of the set. The full conditional distribution of Y_ij is a truncated normal distribution. Use the inverse cdf transform from equation (20.16) to accommodate the constraint h_{k−1} < y_ij ≤ h_k: y_ij = F^{−1}[(1 − u)F(h_{k−1}) + uF(h_k)], where u is a draw from U(0,1), and F is the normal cdf with mean w_ij'α + x_ij'β_i and standard deviation σ.
2. The full conditional distribution of h_k for k = 2, …, K−1 is uniform on the interval given by:

max_{i,j: V_ij = k}{y_ij, h_{k−1}} < h_k ≤ min_{i,j: V_ij = k+1}{y_ij, h_{k+1}}   (20.44)

h_k is bounded below by the largest y_ij such that V_ij = k. If none of the observations is equal to k, then it is bounded below by h_{k−1}. h_k is bounded above by the smallest y_ij such that V_ij = k+1. If none of the observations is equal to k+1, then it is bounded above by h_{k+1}.
3. Given the latent utilities Y, the full conditional distributions for the remaining parameters are the same as in the analysis for HB regression.
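Step 1 reuses the inverse cdf transform from the bivariate normal example; for instance (illustrative names; h is the full threshold vector with sentinel endpoints):

```python
import numpy as np
from scipy.stats import norm

# Latent utility draw for the ordinal probit model, equation (20.43):
# given V_ij = k, draw y_ij ~ N(mu, sd^2) truncated to (h[k-1], h[k]],
# where h[0] = -inf and h[K] = +inf.
def draw_latent_utility(mu, sd, h, k, rng=np.random.default_rng()):
    u = rng.uniform()
    lo, hi = norm.cdf(h[k - 1], mu, sd), norm.cdf(h[k], mu, sd)
    return norm.ppf((1 - u) * lo + u * hi, mu, sd)
```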
Discrete-choice Conjoint

The subject only reports the option that has maximum utility in discrete-choice conjoint. Subject i is presented with m_i choice tasks. Each choice task consists of K options or products. We need to expand the indexing for the subject-level model of equation (20.33). Subject i's utility for product k in choice task j is Y_ijk = w_ijk'α + x_ijk'β_i + ε_ijk. The K-vector of utilities for subject i and choice set j is Y_ij, and the model can be written as a multivariate regression: Y_ij = W_ijα + X_ijβ_i + ε_ij, where W_ij and X_ij are matrices, and the random component ε_ij is a K-vector.
In discrete choice, only the maximal indicant is observed: V_ij = v if Y_ijv ≥ Y_ijk for k = 1, …, K. This constraint defines the link function between the observed data and the latent utilities. The distribution of the observed choices is P(V_ij = v) = P(Y_ijv = max_k{Y_ijk}). The utilities are only identified up to a linear transformation because Y*_ijk = aY_ijk + b for constants a > 0 and b does not change the distribution of the choices V_ij. Common identification constraints set one of the intercepts to 0 and a scale parameter of the error terms to one. If the outside good or 'None' is included in the model, standard practice is to set w = 0 and x = 0 for the outside good.

The distribution of the random components {ε_ijk} gives the likelihood function. Normally distributed random components lead to the probit model (Aitchison and Bennett, 1970; Albert and Chib, 1993; McCulloch and Rossi, 1994), and extreme-value random components (equation 20.27) result in the logit model (McFadden, 1974). Although the normal and extreme value distributions are not greatly dissimilar (Figure 20.4), the probit model differs substantively from the logit model when the normally distributed errors are correlated. Logit models with independent errors exhibit the property of independence of irrelevant alternatives (IIA) at the individual level.

Correlated random utilities can impact choice probabilities. To see this, consider the situation of three brands, A, B and C. The expected utilities of all three brands are equal to zero, and the error variances are one. The utility for A is uncorrelated with B and C: A is isolated from the competitive effects of B and C. Figure 20.6 graphs the brand shares as a function of the correlation between B and C. If only brands A and B are in the market, they have equal choice shares of 50 percent by symmetry. The impact of C entering the market depends on its correlation with B. If the correlation is zero, then the three brands split the market, and each has a choice share of 33.3 percent. As the correlation between B and C approaches one, they become perfect substitutes. Then the choice share for A returns to 50 percent, and the choice shares for B and C split the remainder for shares of 25 percent each. B and C become highly differentiated as their correlation goes to minus one. The choice share of A decreases to 25 percent, and the choice shares of B and C split the difference, receiving shares of 37.5 percent each. In contrast, under the logit model the brands always have equal probabilities.

Who gains or loses in a new product introduction depends on who does the introduction and the correlations. In this simple example, if a firm new to the market introduces C, then it should try to differentiate itself from the current offering B. If the manufacturer of A introduces C, then it is better off by introducing a perfect substitute for B: its total market share increases from 50 percent to 75 percent. If the manufacturer of B introduces C, then it is better off by introducing a differentiated product: its total market share increases from 50 percent to 75 percent.

HB Probit

The HB probit model adds a link function to the HB regression model and modifies the scalar error variance in the likelihood to a matrix covariance. These additional features are:
Figure 20.6 Probit choice shares for three brands where A is uncorrelated with B and C, and B and C are correlated; P(A|{A,B}) is the probability of A given that the choice set is {A, B}

HB probit model
1. Discrete-choice likelihood: P(V_ij = v) = P(Y_ijv = max_k{Y_ijk}) for i = 1, …, n; j = 1, …, m_i; and v = 1, …, K.
2. Σ ~ IW_K(r_0, S_0) 1{s_KK = 1}, the constrained inverse Wishart distribution where the last diagonal element is one. If 'None' is an option, then it is usually the last element.

The joint distribution is:

[∏_{i=1}^{n} ∏_{j=1}^{m_i} P(V_ij | Y_ij)] [∏_{i=1}^{n} ∏_{j=1}^{m_i} f(Y_ij | α, β_i, Σ)] × [∏_{i=1}^{n} h(β_i | θ, Λ)] g(α) g(Σ) g(θ) g(Λ)   (20.45)

where the factors are, in order, the likelihood, the random utilities, the heterogeneity distribution and the priors.
Full conditionals for HB probit
1. Full conditional of Y_ij:

f(Y_ij | Rest) ∝ P(V_ij = v | Y_ij) f(Y_ij | α, β_i, Σ) ∝ 1{Y_ijv > Y_ijk for all k ≠ v} f(Y_ij | α, β_i, Σ)   (20.46)

where 1{·} is the indicator function. Because the latent utilities are correlated, we sequentially generate the components of Y_ij from truncated, conditional normal distributions. The vector Y_ij(k) is the vector Y_ij without the kth component y_ijk. For k = 1, …, K the conditional normal distribution of y_ijk given Y_ij(k) has:
(a) conditional mean μ_ijk|(k) = μ_ijk + Σ_{k,(k)} Σ_{(k,k)}^{−1}[Y_ij(k) − μ_ij(k)], and
(b) conditional covariance Σ_{k|(k)} = Σ_{k,k} − Σ_{k,(k)} Σ_{(k,k)}^{−1} Σ_{(k),k}.
The factors in the conditional mean and covariance are: the mean vector is μ_ij = W_ijα + X_ijβ_i; μ_ijk is the kth element of the mean vector; μ_ij(k) is the mean vector without the kth element; Σ_{k,k} is the (k,k) element of the covariance matrix Σ; Σ_{(k,k)} is Σ without the kth row and column; Σ_{k,(k)} is the kth row of Σ without the kth column; and Σ_{(k),k} is the kth column of Σ without the kth row. We apply the inverse cdf transform (equation 20.16) to generate the truncated normal random variables.
(a) If V_ij = k (option k was selected), y_ijk > max{Y_ij(k)}. Then y_ijk = F^{−1}[(1 − u)F(max{Y_ij(k)}) + u], where u is a U(0,1) random number.
(b) If V_ij = v ≠ k (option k was not selected), y_ijk < y_ijv. Then y_ijk = F^{−1}[uF(y_ijv)], where u is a U(0,1) random number.
2. Full conditional of Σ. The full conditional distribution is the constrained inverse Wishart:

Σ | · ~ IW_K(r_n, S_n) 1{s_K,K = 1}
r_n = r_0 + Σ_{i=1}^{n} m_i
S_n = S_0 + Σ_{i=1}^{n} Σ_{j=1}^{m_i} (Y_ij − W_ijα − X_ijβ_i)(Y_ij − W_ijα − X_ijβ_i)'   (20.47)
Nobile (2000) describes a method of generating the constrained inverse Wishart distribution by modifying Bartlett's decomposition for standard Wishart distributions; it does not require additional computations. McCulloch et al. (2000) use a more complex scheme that modifies the likelihood and prior for the covariance matrix.
3. Full conditional of α:

α | · ~ N(a_n, A_n)
A_n = [A_0^{−1} + Σ_{i=1}^{n} Σ_{j=1}^{m_i} W_ij'Σ^{−1}W_ij]^{−1}
a_n = A_n[A_0^{−1}a_0 + Σ_{i=1}^{n} Σ_{j=1}^{m_i} W_ij'Σ^{−1}(Y_ij − X_ijβ_i)]   (20.48)
4. The full conditional distribution of β_i is:

β_i | · ~ N(b_i, B_i)
B_i = [Λ^{−1} + Σ_{j=1}^{m_i} X_ij'Σ^{−1}X_ij]^{−1}
b_i = B_i[Λ^{−1}Z_iθ + Σ_{j=1}^{m_i} X_ij'Σ^{−1}(Y_ij − W_ijα)]   (20.49)
5. The full conditional distributions of the population-level parameters θ and Λ are the same as in HB regression, equations (20.40) and (20.41).
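The sequential draws in step 1 rest on the conditional normal moments given in (a) and (b). A small sketch of those moments (illustrative names):

```python
import numpy as np

# Conditional mean and variance of component k of a multivariate normal given
# the remaining components, as used to draw the correlated latent utilities.
def cond_moments(mu, Sigma, k, y_rest):
    idx = np.arange(len(mu)) != k
    S12 = Sigma[k, idx]                            # Sigma_{k,(k)}
    S22inv = np.linalg.inv(Sigma[np.ix_(idx, idx)])  # Sigma_{(k,k)}^{-1}
    m = mu[k] + S12 @ S22inv @ (y_rest - mu[idx])  # conditional mean
    v = Sigma[k, k] - S12 @ S22inv @ S12           # conditional variance
    return m, v
```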
The MCMC for the HB probit model avoids computation of the choice probabilities P(Y_ijv = max_k{Y_ijk}) by drawing the latent utilities. However, researchers often need these probabilities to compute fit statistics, to make predictions and to compute expected values. If there are only two choice options, then it is simple to use the normal cdf. Larger choice sets require simulation methods (Geweke et al., 1994).

HB Logit

The random component of the random utility has an extreme value distribution with scale parameter set to one to identify the model. The choice probabilities are a logistic regression function (McFadden, 1974):

P_ij(k | α, β_i) = exp(w_ijk'α + x_ijk'β_i) / Σ_{v=1}^{K} exp(w_ijv'α + x_ijv'β_i) for k = 1, …, K   (20.50)
One of the intercepts is set to 0 to identify the model. The joint distribution is:

[∏_{i=1}^{n} ∏_{j=1}^{m_i} ∏_{k=1}^{K} P_ij(k | α, β_i)^{1(V_ij = k)}] [∏_{i=1}^{n} h(β_i | θ, Λ)] g(α) g(θ) g(Λ)   (20.51)

where the first product is the likelihood, the second is the heterogeneity distribution, and the remaining factors are the priors.
The HB logit does not generate latent utilities. The MCMC algorithm draws from the following full conditional distributions.
1. Full conditional of α. Modify Metropolis sampling for the homogeneous multinomial logistic regression by replacing equation (20.31) with:

L(α) = Σ_{i=1}^{n} Σ_{j=1}^{m_i} Σ_{k=1}^{K} 1(V_ij = k) ln[P_ij(k | α, β_i)] + ln[g(α)]   (20.52)
2. Full conditional of β_i. Modify Metropolis sampling for the homogeneous multinomial logistic regression by replacing equation (20.31) with:

L(β_i) = Σ_{j=1}^{m_i} Σ_{k=1}^{K} 1(V_ij = k) ln[P_ij(k | α, β_i)] + ln[h(β_i | θ, Λ)]   (20.53)
3. The full conditional distributions of the population-level parameters θ and Λ are the same as in HB regression, equations (20.40) and (20.41).
5 BAYESIAN HYPOTHESIS TESTING AND MODEL SELECTION

Bayesian hypothesis testing and model selection depart dramatically from classical tests, which derive reference distributions for test statistics by assuming that the null hypothesis is true. Classical tests then compare the observed test statistic to this reference distribution. Bayesian procedures integrate over the unknown parameters to obtain the posterior probability of the hypothesis given the data. Classical hypothesis testing cannot compute the probabilities of the hypotheses, even though many practitioners erroneously treat the p-value as the probability of the null hypothesis. Bayesians do not ask the question, 'How likely is our test statistic if the data were generated from the hypothesized model?' Rather, they ask, 'Given the data, what is the probability of the hypothesis?'

Bayesian hypothesis testing is very simple to implement with MCMC when the hypothesis restricts parameters, or functions thereof, to regions. The fraction of times that the MCMC draws are in the hypothesized region estimates the posterior probability. For instance, the hypothesis could be that business travellers prefer business centres more than swimming pools. If θ_B and θ_S are the relevant preference parameters for business centres and swimming pools, then the MCMC estimate of P(θ_B > θ_S | Data) is the fraction of MCMC draws where θ_B is larger than θ_S. If this fraction were large, say more than 90 percent or 95 percent, one would conclude that the hypothesis is true. The actual decision depends on the revenue and cost implications of offering hotels for business travellers without swimming pools.

Bayesian hypothesis testing is simple to apply even when the hypothesis concerns a complex function of the parameters. For example, will market share for a product be higher with a given attribute given various competitive offerings? Each iteration from MCMC is used to simulate a potential market result, and these outcomes are averaged over the MCMC draws. It avoids the complexity of classical methods: test statistics, reference distributions, degrees of freedom, uniformly most powerful tests, standard errors, and so on. In addition, the Bayesian tests are exact and do not rely on large sample asymptotics. Bayesian methods accurately portray the uncertainty in the problem, while asymptotic computations tend to understate it.

One area of difficulty for Bayesian analysis is testing 'sharp hypotheses', for example, that a parameter is 0. If the parameter space is continuous, the posterior probability of a sharp hypothesis is 0, regardless of the data. An easy approach to test a sharp hypothesis is to compute the percentiles of the posterior distribution. If the hypothesized value were between, say, the 2.5 and 97.5 posterior percentiles, then one would accept that the parameter has the hypothesized value.
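Both region probabilities and posterior percentiles are one-liners once the MCMC draws are in hand. A sketch with simulated stand-in draws (the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for post burn-in MCMC draws of the two preference parameters
# (theta_B: business centre, theta_S: swimming pool).
theta_B = rng.normal(1.0, 0.3, 5000)
theta_S = rng.normal(0.6, 0.3, 5000)
print(np.mean(theta_B > theta_S))                     # estimate of P(theta_B > theta_S | Data)
print(np.percentile(theta_B - theta_S, [2.5, 97.5]))  # posterior interval for the contrast
```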
Alternatively, one could replace the sharp hypothesis with a more reasonable 'indifference region' or 'tolerance interval' around the hypothesized value, and compute the posterior probability of that region.

A theoretical approach to testing sharp hypotheses is model selection. Bayesian model selection uses decision theory to select the best model from a predefined set of models (Kass and Raftery, 1995). It balances model fit with model complexity. Model fit typically increases for more complex models, but if the increase does not justify the additional complexity, Bayesian model selection will point to the simpler model. The fit and complexity measures are explicit in the Bayesian information criterion (BIC), which is a large sample approximation of the log-integrated likelihood (Schwarz, 1978). In regression analysis, fit is proportional to the log-likelihood evaluated at the posterior mode, and complexity is proportional to p·ln(n), where p is the number of parameters and n is the sample size. Unlike classical methods, the models can be non-nested, and there can be more than two models. The data Y are constant across models, but the likelihood function, parameter space, heterogeneity distributions, prior distributions and predictor variables can depend on the model. The specification of model m is f_m(Y, Ω_m) = f_m(Y | Ω_m) g_m(Ω_m) for m = 1, …, M, where there are M models under consideration. Because the best model is unknown, the prior probability of model m is p(m). The posterior probability of model m is:

p(m | Y) = f_m(Y) p(m) / Σ_{j=1}^{M} f_j(Y) p(j)   (20.54)
f_m(Y) = ∫ f_m(Y | Ω_m) g_m(Ω_m) dΩ_m

where f_m(Y) is the marginal distribution of the data under model m, or the integrated likelihood. The prior distribution g_m should be a proper distribution (one that integrates to one), or else the integral could be infinite. The 0/1 loss function for choosing model d when the true model is m* is:

L(d, m*) = c_{d,m*}, with c_{d,m*} > 0 when d ≠ m* and c_{d,m*} = 0 when d = m*   (20.55)

The Bayes rule D_B selects models to minimize the posterior expected loss. If the costs for selecting the wrong model are equal to c, then the expected posterior loss for selecting model d is c[1 − p(d | Y)]. Then the Bayes rule D_B selects the model with maximum posterior probability. If the prior probabilities of the models are equally likely, then D_B selects the model with maximal integrated likelihood f_m(Y). Bayes factors, which are the ratios of integrated likelihoods, are often used when comparing two models. The Bayes factor of model 1 versus model 2 is B_12 = f_1(Y)/f_2(Y). If B_12 > 1, then model 1 is preferred, assuming symmetric costs and model priors.

One abuse of Bayesian model selection is to specify model m based on the results from the previous analyses of the models. For instance, a variable in model m−1 has the wrong sign, and the researcher suspects there are missing variables. The researcher then includes a new variable in model m, and the suspect coefficient has the correct sign. Even worse, the researcher neglects to report model m−1 with the wrong sign. Though such practices seem innocuous and most applied researchers indulge in them, they negate
the optimality properties of Bayes rules. Strict Bayesians should not propose new models to fix the deficits in previous models when analyzing the same data. Either they should specify all models for testing before looking at the data, or the specification of a new model should be blind to results from previous models. These rules are difficult to follow. Similarly, classical methods that use sequential hypothesis testing should not base one hypothesis on the results of previous tests.

The theory of Bayesian model selection is simple and elegant. Unfortunately, applying the theory can be demanding and hinges on computing the integrated likelihood f_m(Y) at the data Y under model m. A naive approach is to sample {Ω_{m,t}} from the prior distribution g_m and average {f_m(Y | Ω_{m,t})}. This approach is usually inefficient because most of the draws from the prior distribution miss areas where the likelihood is non-zero, especially if the likelihood is much sharper than the prior. Newton and Raftery (1994) proposed a harmonic mean estimator:

f̂_m(Y)_HM = [(1/(T−B)) Σ_{t=B+1}^{T} f_m(Y | Ω_{m,t})^{−1}]^{−1}   (20.56)
where h is a density with support that is contained within the simulation support of the posterior distribution. The strong law of large number holds if: 3
h (W) 2 dW , ` fm (Y 0 W) gm (W)
(20.58)
The Gelfand and Dey method is general and usually requires fewer computations than competing methods. If h is the posterior distribution of Wm, then the computation is exact. The closer that h is to the posterior distribution, the better the approximation. The analyst can fit h to the MCMC draws to improve the accuracy of the approximation. A common choice of h when parameters are not restricted is a multivariate normal distribution with mean equal to the mean of the MCMC draws and variance smaller than the variance of the draws. Balcombe et al. (2009) and Lenk and DeSarbo (2000) apply this method to conjoint data. When models are nested, the Bayes factor for a sharp hypothesis is the Savage–Dickey density ratio (Dickey, 1971). Suppose that the parameter is W 5 (q,y), and we are inter-
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:56AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Bayesian estimation of random utility models
491
ested in testing the sharp hypothesis q 5 q0 where q0 is a specified number under model 1 (restricted model) versus unrestricted q under model 2 (full model). Further suppose that the distributions of the data are f(Y|q0,y) under model 1 and f(Y|q,y) under model 2. The priors are g(q,y) under model 2 and g(y|q0) 5 g(q0,y)/g(q0) under model 1. Then the Bayes factor B12 5 f1(Y)/f2(Y) is equal to the Savage–Dickey density ratio B12 5 g(q0|Y)/ g(q0), which is the ratio of the posterior density to the prior density of q0. The numerator is the marginal posterior density under model 2 and evaluated at q 5 q0. Verdinelli and Wasserman (1995) discuss various approximations of the numerator g(q0|Y). Its MCMC approximation from equation (20.12) is: g (q0 0 Y) <MCMC
T f (Y 0 q0,yt) g (q0,yt) 1 S T 2 B t5B 11 3 f (Y 0 q,yt) g (q,yt) dq
(20.59)
where {qt, yt} are draws from model 2, the full model. The MCMC approximation requires computing the integral in the denominator of equation (20.59) for each draw {yt}, which can be numerically intensive. If q is one-dimensional, grid integration methods, such as the trapezoidal rule, can approximate the integral in the denominator of equation (20.59). If q is high-dimension, then sophisticated methods, such as equation (20.57), are needed to approximate the integral, and it may not be efficient to use this approach to test a sharp hypothesis. One caution about Bayesian model selection is that the technique can be sensitive to the specification of the prior distributions gm. The Bayes rule will pick the correct model in probability as the sample size increases. However, for fixed data, different prior specifications can result in different model choices. This observation does not bother dogmatic Bayesians who take their priors seriously. ‘Convenience’ Bayesian, who use whatever prior is convenient or pre-specified in software, should be circumspect about the results from Bayesian model selection, especially if the researcher reversed engineered the priors to obtain desired results. Researchers need caution when comparing or testing HB models. Does your theory require that the individual-level parameters for all subjects be significantly different from zero, or will it pass muster with 90 percent or 50 percent or 10 percent of the subjects having significant parameters? It could be that the expected value of a parameter is zero across the population with some subjects loving the attribute and others hating it. Public policy makers can use social welfare calculations but still need to address minority rights. Marketing researchers have the luxury of making more nuanced decisions by targeting different consumer segments with different products. Finding segments that love different attributes can be pure gold. It may be that only 10 percent of luxury hotel guests visit the spa, but that 10 percent may be high-margin guests and generate the majority of profits from ancillary services. Testing parameters at the population-level may not be conclusive for theory. For example, the posterior mean of the heterogeneity distribution being zero does not imply that the effect is unimportant unless the error variance is also close to zero. On the positive side, Bayesian are less concerned about dropping insignificant variables from models because multicolinearity has less deleterious effects on Bayesian analysis because the prior distribution ‘regularize’ the likelihood.
6 CONCLUSION
Conjoint analysis and Bayesian inference share common foundations in utility theory, and Bayesian inference is particularly effective in estimating individual-level parameters and their heterogeneity distributions. Bayesian methods have large payoffs with broad and shallow data (many subjects and few observations per subject), which is typical of conjoint studies. After introducing the fundamentals of Bayesian analysis, the chapter reviewed numerical methods to approximate integrals, and then applied these methods to hierarchical Bayes models for conjoint analysis. These methods have a wide swath of applications in economics, marketing, public policy, agriculture, environmental science, education and medicine, to name a few substantive fields.

The power of a methodology is determined not only by how well it solves the problems for which it was initially designed, but also by its extensibility to address issues beyond the motivating problem. The 'basic' random utility model and its Bayesian estimation are easily adapted to elaborations and modifications. The chapter ends with a brief survey of these extensions. The survey is not complete, but it should give the reader a sense of the breadth of Bayesian applications. There is some attempt to group the topics, but the list was selected to reflect the diversity of applications. The topics also have a large non-Bayesian literature that the reader can explore outside of this chapter: the focus is on the intersection of Bayesian analysis and random utility models.

The linear, compensatory, random utility model in equation (20.33) may not adequately describe the choice process. Compensatory models assume that it is possible to switch a subject's choice by trading off one attribute for another. For example, MacBook users could be persuaded to switch to a Dell PC if Dell's prices were sufficiently low. Gilbride and Allenby (2004) consider conjunctive and disjunctive screening rules where some attributes are 'must have' and others are 'no way'. Compensatory models can mimic conjunctive and disjunctive rules by moving coefficients towards plus or minus infinity, but doing so can distort the heterogeneity distribution. Terui and Dahana (2006) analyse choices with kinked utility functions to capture loss aversion. Shively et al. (2000) and Kim et al. (2007b) use Bayesian splines to estimate nonlinear effects in the utility function. Kim et al. (2007a), in quantity conjoint, consider interior solutions to the utility maximization problem with nonlinear utilities that include satiation in quantity. Liechty et al. (2005) allow for dynamic coefficients that can vary over the duration of conjoint experiments. Aribarg et al. (2002), Yang and Allenby (2003) and Yang et al. (2006) consider utility functions that are dependent across agents. Ter Hofstede et al. (2002) use spatial information. Bacon and Lenk (2012) supplement discrete choice data with continuous ratings of options or attributes to identify a common origin and scale for the latent utilities, thus allowing for between-subject comparisons of preferences.

Choice probabilities depend on the alternatives in the choice set. Chiang et al. (1999) infer consideration sets from discrete choices, and Mehta et al. (2003) propose a structural model for consideration set formation based on price uncertainty and information search. Related work by Bradlow and Rao (2000) considers assortment choice, where multiple products are selected on each choice occasion.
Ainslie and Rossi (1998) examine choices across categories. Liechty et al. (2001) study menu choices for mass customization. If the number of options varies across choice tasks in HB probit, Zeithammer and Lenk (2006) provide a simple and fast method to estimate the error covariance matrix.
Marketing practitioners often design market share simulators from the results of conjoint experiments. Belloni et al. (2008) evaluate different optimization methods given individual-level estimates. Gilbride et al. (2008) introduce market share constraints in conjoint estimation through the loss function for estimating the partworths. Partial profile conjoint describes the options with a partial list of attributes. Bradlow and Ho (2004) infer subjects’ preferences for the missing attributes with a preference model. Often, researchers treat prior distributions as a nuisance and select those that are not informative. Allenby et al. (1995) incorporate information into priors, and Lenk and Orme (2009) discuss the impact of prior specifications with sparse data. Sandor and Wedel (2005) use priors for optimal Bayesian design of discrete-choice experiments. Bayesian estimation of random utility models is an expanding area of research. It is possible to apply our Bayesian toolkit to ever-richer classes of models.
NOTES

* I wish to thank the editor and reviewer for their support and many useful comments.
1. In the following, I use 'product' for products, services, public goods or other stimuli in the conjoint study.
2. The density function $f$ has the properties: $f(y) \ge 0$ for all $y$, and $\int f(y)\,dy = 1$. The probability mass function $f$ has the properties: $f(y) \ge 0$ for all $y$ and $\sum f(y) = 1$. The chapter uses integral notation for both continuous and discrete random variables. If $g$ is a probability mass function, then the integral is with respect to the counting measure.
3. The loss is 0 if $D$ is equal to $R(W)$, and the loss is 1 if $D$ is not equal to $R(W)$. 0/1 loss is particularly useful in model selection.
4. Priors are conjugate for a likelihood function if the posterior distribution is in the same family of distributions as the prior distribution.
5. $\{X_t\}$ is a Markov process if the conditional distribution of the future $X_{t+1}, X_{t+2}, \ldots$ given the past $X_t, X_{t-1}, \ldots$ only depends on the present $X_t$.
6. The one-step transition distribution is the probability of moving from state $w$ to state $x$: $F_t(w, x) = P(X_{t+1} = x \mid X_t = w)$. The transition distribution for a time-homogeneous process does not depend on $t$.
7. A distribution $\pi$ is the stationary distribution for a time-homogeneous process if $\sum_w \pi(w) F(w, x) = \pi(x)$. The left-hand side is the marginal distribution of moving to state $x$ in one step where the starting value is randomly selected by $\pi$.
8. Reversible Markov chains satisfy the 'detailed balance equation': $\pi(w) F(w, x) = \pi(x) F(x, w)$ for all $w$ and $x$. Any such $\pi$ is also the stationary distribution. Two states communicate if there is a positive probability of reaching one state from the other.
9. Independence of irrelevant alternatives: the log odds ratio of two alternatives does not depend on the other options in the choice set.
10. I plead guilty to performing 'exploratory' analyses of the data before undertaking the 'real' analysis.
REFERENCES

Ainslie, A. and P.E. Rossi (1998), 'Similarities in choice behaviour across product categories', Marketing Science, 17 (2), 91–106.
Aitchison, J. and J.A. Bennett (1970), 'Polychotomous quantal response by maximum indicant', Biometrika, 57 (2), 253–62.
Aitchison, J. and S.D. Silvey (1957), 'The generalization of probit analysis to the case of multiple responses', Biometrika, 44 (June), 131–50.
Albert, J.H. and S. Chib (1993), 'Bayesian analysis of binary and polychotomous response data', Journal of the American Statistical Association, 88 (422), 669–79.
Allenby, G.M. and P.J. Lenk (1994), 'Modeling household purchase behaviour with logistic normal regression', Journal of the American Statistical Association, 89 (428), 1218–31.
Allenby, G.M. and P.J. Lenk (1995), 'Reassessing brand loyalty, price sensitivity, and merchandizing effects on consumer brand choice', Journal of Business and Economic Statistics, 13 (3), 281–90.
Allenby, G.M. and P.E. Rossi (1998), 'Marketing models of consumer heterogeneity', Journal of Econometrics, 89 (1–2), 57–78.
Allenby, G.M., N. Arora and J.L. Ginter (1995), 'Incorporating prior knowledge into the analysis of conjoint studies', Journal of Marketing Research, 32 (2), 152–62.
Andrieu, C. and J. Thoms (2008), 'A tutorial on adaptive MCMC', Statistics and Computing, 18 (4), 343–73.
Ansari, A. and C.F. Mela (2003), 'E-customization', Journal of Marketing Research, 40 (2), 131–43.
Aribarg, A., N. Arora and H.O. Bodur (2002), 'Understanding the role of preference revision and concession in group decisions', Journal of Marketing Research, 39 (3), 336–49.
Bacon, L. and P. Lenk (2012), 'Augmenting discrete-choice data to identify common preference scales for inter-subject analyses', Quantitative Marketing and Economics, 10 (4), 453–74.
Balcombe, K., A. Chalak and I. Fraser (2009), 'Model selection for the mixed logit with Bayesian estimation', Journal of Environmental Economics and Management, 57 (2), 226–37.
Belloni, A., R. Freund, M. Selove and D. Simester (2008), 'Optimizing product line designs: efficient methods and comparisons', Management Science, 54 (9), 1544–52.
Berger, J. (1985), Statistical Decision Theory and Bayesian Analysis, New York: Springer-Verlag.
Bernardo, J. and A.F.M. Smith (1994), Bayesian Theory, New York: John Wiley and Sons.
Bradlow, E.T. and T.H. Ho (2004), 'A learning-based model for imputing missing levels in partial conjoint profiles', Journal of Marketing Research, 41 (4), 369–81.
Bradlow, E.T. and V.R. Rao (2000), 'A hierarchical Bayes model for assortment choice', Journal of Marketing Research, 37 (2), 259–68.
Brooks, S.P. and G.O. Roberts (1998), 'Assessing convergence of Markov chain Monte Carlo algorithms', Statistics and Computing, 8 (4), 319–35.
Burda, M., M. Harding and J. Hausman (2008), 'A Bayesian mixed logit-probit model for multinomial choice', Journal of Econometrics, 147 (2), 232–46.
Carlin, B.P. and S. Chib (1995), 'Bayesian model choice via Markov chain Monte Carlo methods', Journal of the Royal Statistical Society, Series B, 57 (3), 473–84.
Chiang, J., S. Chib and C. Narasimhan (1999), 'Markov chain Monte Carlo and models of consideration set and parameter heterogeneity', Journal of Econometrics, 89 (1–2), 223–48.
Chib, S. (1995), 'Marginal likelihood from the Gibbs output', Journal of the American Statistical Association, 90 (432), 1312–21.
Chib, S. and E. Greenberg (1995), 'Understanding the Metropolis–Hastings algorithm', The American Statistician, 49 (4), 327–35.
Chib, S. and I. Jeliazkov (2001), 'Marginal likelihood from the Metropolis–Hastings output', Journal of the American Statistical Association, 96 (453), 270–81.
Cowles, M.K. and B.P. Carlin (1996), 'Markov chain Monte Carlo convergence diagnostics: a comparative review', Journal of the American Statistical Association, 91 (434), 883–904.
De Finetti, B. (1937), 'La Prévision: Ses Lois Logiques, Ses Sources Subjectives', Annales de l'Institut Henri Poincaré, 7, 1–68; translated in 1980, 'Foresight: its logical laws, its subjective sources', in H.E. Kyburg Jr and H.E. Smokler (eds), Studies in Subjective Probability, New York: Robert E. Krieger.
DeGroot, M. (1970), Optimal Statistical Decisions, Hoboken, NJ: John Wiley & Sons.
DeSarbo, W.S., M. Wedel, M. Vriens and V. Ramaswamy (1992), 'Latent class metric conjoint analysis', Marketing Letters, 3 (3), 273–88.
Dickey, J. (1971), 'The weighted likelihood ratio, linear hypotheses on normal location parameters', Annals of Mathematical Statistics, 42 (1), 204–23.
Doob, J.L. (1949), 'Application of the theory of martingales', Actes du Colloque International Le Calcul des Probabilités et ses applications (Lyon, 28 juin–3 juillet 1948), Paris: CNRS, pp. 23–7.
Fruhwirth-Schnatter, S., R. Tuchler and T. Otter (2004), 'Bayesian analysis of the heterogeneity model', Journal of Business and Economic Statistics, 22 (1), 2–15.
Gelfand, A.E. and D.K. Dey (1994), 'Bayesian model choice: asymptotics and exact calculations', Journal of the Royal Statistical Society, Series B, 56 (3), 501–14.
Gelfand, A.E. and A.F.M. Smith (1990), 'Sampling based approaches to calculating marginal densities', Journal of the American Statistical Association, 85 (410), 398–409.
Gelfand, A.E., A.F.M. Smith and T.M. Lee (1992), 'Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling', Journal of the American Statistical Association, 87 (418), 523–32.
Gelman, A. and D.B. Rubin (1992), 'Inference from iterative simulation using multiple sequences', Statistical Science, 7 (4), 457–511.
Geman, S. and D. Geman (1984), 'Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images', IEEE Transactions on Pattern Analysis and Machine Intelligence, 6 (6), 721–41.
Geweke, J. (1992), 'Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments', in J.M. Bernardo, J. Berger, A.P. Dawid and A.F.M. Smith (eds), Bayesian Statistics 4, Oxford: Oxford University Press, pp. 169–93.
Geweke, J., M. Keane and D. Runkle (1994), 'Alternative computational approaches to inference in the multinomial probit model', Review of Economics and Statistics, 76 (4), 609–32.
Gilbride, T.J. and G.M. Allenby (2004), 'A choice model with conjunctive, disjunctive, and compensatory screening rules', Marketing Science, 23 (3), 391–406.
Gilbride, T.J., P.J. Lenk and J.D. Brazell (2008), 'Market share constraints and the loss function in choice-based conjoint analysis', Marketing Science, 27 (5), 995–1011.
Gilks, W.R. and C. Berzuini (2001), 'Following a moving target – Monte Carlo inference for dynamic Bayesian models', Journal of the Royal Statistical Society, Series B, 63 (1), 127–46.
Girolami, M. and B. Calderhead (2011), 'Riemann manifold Langevin and Hamiltonian Monte Carlo methods', Journal of the Royal Statistical Society, Series B, 73 (2), 1–37.
Good, I.J. (1971), 'Non-parametric roughness penalty for probability densities', Nature, 229 (1), 29–30.
Gordon, N.J., D.J. Salmond and A.F.M. Smith (1993), 'Novel approach to nonlinear/non-Gaussian Bayesian state estimation', IEE Proceedings-F, 140 (2), 107–13.
Green, P.E. and V.R. Rao (1971), 'Conjoint measurement for quantifying judgmental data', Journal of Marketing Research, 8 (3), 355–63.
Green, P.J. (1995), 'Reversible jump Markov chain Monte Carlo computation and Bayesian model determination', Biometrika, 82 (5), 711–32.
Green, P.J. and S. Richardson (2001), 'Modelling heterogeneity with and without the Dirichlet process', Scandinavian Journal of Statistics, 28 (2), 244–75.
Grether, D.M. and C. Plott (1979), 'Economic theory of choice and the preference reversal phenomenon', American Economic Review, 69 (4), 623–38.
Hammersley, J. and D. Handscomb (1964), Monte Carlo Methods, London: Methuen.
Hastings, W.K. (1970), 'Monte Carlo sampling methods using Markov chains and their applications', Biometrika, 57 (1), 97–109.
Hesterberg, T. (1995), 'Weighted average importance sampling and defensive mixture distributions', Technometrics, 37 (2), 185–94.
Hill, B.M. (1965), 'Inference about variance components in the one-way model', Journal of the American Statistical Association, 60 (311), 806–25.
Hoerl, A.E. and R.W. Kennard (1970), 'Ridge regression: biased estimation for nonorthogonal problems', Technometrics, 12 (1), 55–67.
James, W. and C. Stein (1961), 'Estimation with quadratic loss', Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Berkeley, CA: University of California Press, pp. 361–79.
Kamakura, W.A. (1988), 'A least squares procedure for benefit segmentation with conjoint experiments', Journal of Marketing Research, 25 (2), 157–67.
Kass, R.E. and A.E. Raftery (1995), 'Bayes factors', Journal of the American Statistical Association, 90 (430), 773–95.
Kim, J., G.M. Allenby and P.E. Rossi (2007a), 'Product attributes and models of multiple discreteness', Journal of Econometrics, 138 (1), 208–30.
Kim, J.G., U. Menzefricke and F.M. Feinberg (2007b), 'Capturing flexible heterogeneous utility curves: a Bayesian spline approach', Management Science, 53 (2), 340–54.
Koop, G., D.J. Poirier and J.L. Tobias (2007), Bayesian Econometric Methods, Cambridge: Cambridge University Press.
Lancaster, K.J. (1966), 'A new approach to consumer theory', Journal of Political Economy, 74 (2), 132–57.
Lancaster, T. (2004), An Introduction to Modern Bayesian Econometrics, Oxford: Blackwell.
Lenk, P.J. (2009), 'Simulation pseudo-bias correction to the harmonic mean estimator of integrated likelihoods', Journal of Computational and Graphical Statistics, 18 (4), 941–60.
Lenk, P.J. and W. DeSarbo (2000), 'Bayesian inference for finite mixtures of generalized linear models with random effects', Psychometrika, 65 (1), 93–119.
Lenk, P.J. and B. Orme (2009), 'The value of informative priors in Bayesian inference with sparse data', Journal of Marketing Research, 46 (6), 832–45.
Lenk, P.J., W. DeSarbo, P. Green and M. Young (1996), 'Hierarchical Bayes conjoint analysis: recovery of partworth heterogeneity from reduced experimental designs', Marketing Science, 15 (2), 173–91.
Lichtenstein, S. and P. Slovic (1971), 'Reversal of preferences between bids and choices in gambling decisions', Journal of Experimental Psychology, 89 (1), 46–55.
Liechty, J., D.K.H. Fong and W.S. DeSarbo (2005), 'Dynamic models incorporating individual heterogeneity: utility evolution in conjoint analysis', Marketing Science, 24 (2), 285–93.
Liechty, J., V. Ramaswamy and S.H. Cohen (2001), 'Choice menus for mass customization: an experimental approach for analyzing customer demand with an application to a web-based information service', Journal of Marketing Research, 38 (2), 183–96.
Lindley, D.V. and A.F.M. Smith (1972), 'Bayes estimates for the linear model', Journal of the Royal Statistical Society, Series B, 34 (1), 1–41.
Louviere, J.L., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Applications, Cambridge: Cambridge University Press.
Luce, R.D. and J.W. Tukey (1964), 'Simultaneous conjoint measurement: a new type of fundamental measurement', Journal of Mathematical Psychology, 1 (1), 1–27.
McCulloch, R.E. and P.E. Rossi (1994), 'An exact likelihood analysis of the multinomial probit model', Journal of Econometrics, 64 (1–2), 207–40.
McCulloch, R.E., N.G. Polson and P.E. Rossi (2000), 'A Bayesian analysis of the multinomial probit model with fully identified parameters', Journal of Econometrics, 99 (1), 173–93.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
Mehta, N., S. Rajiv and K. Srinivasan (2003), 'Price uncertainty and consumer search: a structural model for consideration set formation', Marketing Science, 22 (1), 58–84.
Meng, X.L. and W.H. Wong (1996), 'Simulating ratios of normalizing constants via a simple identity: a theoretical exploration', Statistica Sinica, 6 (4), 831–60.
Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953), 'Equation of state calculations by fast computing machines', Journal of Chemical Physics, 21 (6), 1087–91.
Montricher, G.F. de, R.A. Tapia and J.R. Thompson (1975), 'Nonparametric maximum likelihood estimation of probability densities by penalty function methods', The Annals of Statistics, 3 (6), 1329–48.
Nandram, B. and M.H. Chen (1996), 'Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence', Journal of Statistical Computation and Simulation, 54 (1–3), 129–44.
Newton, M. and A. Raftery (1994), 'Approximate Bayesian inference with the weighted likelihood bootstrap', Journal of the Royal Statistical Society, Series B, 56 (1), 3–48.
Nobile, A. (2000), 'Comment: Bayesian multinomial probit models with a normalization constraint', Journal of Econometrics, 99 (2), 335–45.
Orme, B. (2006), Getting Started with Conjoint Analysis, Chicago, IL: Research Publishers.
Roberts, G.O. and N.G. Polson (1994), 'On the geometric convergence of the Gibbs sampler', Journal of the Royal Statistical Society, Series B, 56 (2), 377–84.
Roberts, G.O., A. Gelman and W.R. Gilks (1997), 'Weak convergence and optimal scaling of random walk Metropolis algorithms', Annals of Applied Probability, 7 (1), 110–20.
Rossi, P.E., G.M. Allenby and R. McCulloch (2005), Bayesian Statistics and Marketing, Hove: John Wiley and Sons.
Sandor, Z. and M. Wedel (2005), 'Heterogeneous conjoint choice designs', Journal of Marketing Research, 42 (2), 210–18.
Savage, L.J. (1954), The Foundations of Statistics, New York: John Wiley and Sons.
Schwarz, G. (1978), 'Estimating the dimension of a model', Annals of Statistics, 6 (2), 461–4.
Shively, T.S., G.M. Allenby and R. Kohn (2000), 'A nonparametric approach to identifying latent relationships in hierarchical models', Marketing Science, 19 (2), 149–62.
Smith, A.F.M. (1973), 'A general Bayesian linear model', Journal of the Royal Statistical Society, Series B, 35 (1), 67–75.
Smith, W.B. and R.R. Hocking (1972), 'Wishart variate generator', Journal of the Royal Statistical Society, Series C (Applied Statistics), 21 (3), 341–5.
Stein, C. (1956), 'Inadmissibility of the usual estimator for the mean of a multivariate distribution', Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Berkeley, CA: University of California Press, pp. 197–206.
Tanner, M.A. and W.H. Wong (1987), 'The calculation of posterior distributions by data augmentation', Journal of the American Statistical Association, 82 (398), 528–40.
Ter Hofstede, F., M. Wedel and J.B. Steenkamp (2002), 'Identifying spatial segments in international markets', Marketing Science, 21 (2), 160–77.
Terui, N. and W.D. Dahana (2006), 'Estimating heterogeneous price thresholds', Marketing Science, 25 (4), 384–91.
Train, K. (2009), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Train, K. and G. Sonnier (2005), 'Mixed logit with bounded distributions of correlated partworths', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 117–34.
Verdinelli, I. and L. Wasserman (1995), 'Computing Bayes factors using a generalization of the Savage–Dickey density ratio', Journal of the American Statistical Association, 90 (430), 614–18.
Von Neumann, J. and O. Morgenstern (1944), Theory of Games and Economic Behavior, Princeton, NJ: Princeton University Press and New York: John Wiley and Sons.
Vriens, M., M. Wedel and T. Wilms (1996), 'Metric conjoint segmentation methods: a Monte Carlo comparison', Journal of Marketing Research, 33 (1), 73–85.
Wind, J., P.E. Green, D. Shifflet and M. Scarbrough (1989), 'Courtyard by Marriott: designing a hotel facility with consumer-based marketing models', Interfaces, 19 (1), 25–47.
Yang, S. and G.M. Allenby (2003), 'Modeling interdependent consumer preferences', Journal of Marketing Research, 40 (3), 282–94.
Yang, S., V. Narayan and H. Assael (2006), 'Estimating the interdependence of television program viewership between spouses: a Bayesian, simultaneous equation model', Marketing Science, 24 (3), 336–49.
Zeithammer, R. and P.J. Lenk (2006), 'Bayesian estimation of multivariate normal models when dimensions are absent', Quantitative Marketing and Economics, 4 (3), 241–65.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, New York: John Wiley and Sons.
21 Simple ways to estimate choice models for single consumers
Bart Frischknecht, Christine Eckert, Jordan Louviere and Tiago Ribeiro
1 INTRODUCTION
Choice models were first proposed by Thurstone (1927) for pairs of options. Models for multiple choice options are due to Luce (1959) and McFadden (1974). Except for laboratory choice experiments in psychology, it is rare to see discrete choice models estimated for single people. Following Chapman's (1984) discussion of the difficulties of using rank-order expansion, there was little work on ways to measure and model single person choices in survey applications until recently. A notable exception, Finn and Louviere (1992), demonstrated that the best-worst scaling method, where a respondent makes best and worst choices from among sets of objects according to an underlying construct such as preference, importance, or concern, could provide individual-level estimates of preferences. The best-worst scaling approach was extended by Louviere et al. (2008) to allow for the estimation of choice models for single persons. The purpose of this chapter is to show that one can use simple methods to model single person choices by extending the Louviere et al. (2008) approach to estimation methods familiar to most academics and practitioners, such as ordinary least squares (OLS) regression and weighted least squares (WLS) regression. Past work modeling individuals is described in section 2. The two new methods described in section 3 yield biased estimates of the choice probabilities, but we demonstrate that one can improve these estimates rather simply. More specifically, in sections 4 and 5, we show with Monte Carlo simulations and data from online discrete choice experiments that we can refine simple parameter estimates to produce well-behaved in-sample and out-of-sample predictions of choices. Section 6 summarizes the contributions of the chapter, which are to describe and discuss these methods for estimating choice models for single individuals using linear regression techniques rather than the more complex mixed logit- or probit-style choice models that require numerical integration. The possibility of linear regression methods for choice models may encourage many more researchers to consider choice topics in their research, whereas previously choice models have been the domain of choice model specialists.
2 BACKGROUND
In what follows, we discuss the motivation for modeling individuals. We start with a brief history of modeling individual choices and present some background on data expansion methods that provide one basis for estimating individual-level models (ILMs).
2.1 Motivating the Study of Individuals
A primary motivation for studying individuals, even when aggregate behavior, such as the prediction of market share, is the object of interest, is that individuals behave differently from one another both in terms of their preferences and in terms of their decision processes. Therefore a model that assumes individuals are similar may lead to inaccurate conclusions at the individual or aggregate level. Borsboom et al. (2003) argue that a model that restricts either preferences or processes to be the same across individuals will not be able to infer causality for a particular individual even when the model predicts choices well. Current choice modeling approaches that aggregate data across individuals do not have the capability to test or articulate differences between individuals owing to process differences, such as what information they consider, what rules they apply for evaluating attribute information and how consistently they choose.

A practical reason for modeling individuals is that there are numerous applications where the population of interest, or the sampling pool, is small enough (for example, 10–50 people) that the assumptions of the aggregate models are questionable even after we accept the assumption that individuals use the same decision process. For example, there has been much less application of choice modeling in business-to-business settings than in business-to-consumer settings.

2.2 Modeling Individuals

Early efforts to model individuals outside of psychology laboratories can be found in Beggs et al. (1981) and Chapman (1984), who show how individual-level preference rank-order information on options in choice sets can be expanded to estimate ILMs. However, apart from those early attempts, there has been little research on estimating individual-level choice models since then. Following McFadden (1974), almost all choice models were conceived of and specified as 'top-down' models, that is, they estimate an aggregate population choice model that makes assumptions regarding the distribution of population preferences and then derives individual-level parameters based on this distribution as well as individual-level observations. Historically, research on top-down models evolved from simple, fixed effects models to more complex random parameter forms and, more recently, model forms that allow for various types of more complex latent heterogeneity, such as estimating terms that can be identified in the variance-covariance matrix of the parameter estimates associated with an assumed distribution. Thus, marketing, transport and economics academics began to focus more on incorporating latent heterogeneity into choice models. A seminal paper by Kamakura and Russell (1989) showed how to estimate segment-specific parameters for a finite number of latent segments. If one weights these parameter estimates by an individual's segment membership probability, the segment-specific parameter estimates can be used to derive individual-specific parameters. The continuous distribution, or random parameters model form, such as the mixed logit model, assumes that consumer preferences follow a specific (continuous) distribution. Like the Kamakura and Russell (1989) latent class model, individual choice observations can be used to infer an individual's position on this preference distribution (see Train, 2003; Rossi et al., 2006).
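As a small illustration of the weighting idea, suppose a latent class model has produced hypothetical segment-level parameter estimates and posterior segment-membership probabilities for one individual; the individual-specific parameters are then the membership-weighted average:

```python
import numpy as np

# Hypothetical output of a latent class model: S = 3 segments, P = 2 parameters.
segment_betas = np.array([[ 1.2, -0.8],
                          [ 0.3,  0.5],
                          [-0.6,  1.1]])      # one row per segment

# One individual's posterior segment-membership probabilities (sum to 1).
membership = np.array([0.7, 0.2, 0.1])

individual_beta = membership @ segment_betas  # weighted average, shape (2,)
```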
Recently, there has been renewed interest in 'bottom-up' approaches, which directly model individual preferences, independently of assumptions about the population behavior. For example, Louviere et al. (2008) show how to use observations from best-worst response tasks in choice experiments to estimate individual preference parameters with WLS regression. Marshall et al. (2011) describe and discuss a comparison of the WLS approach with standard hierarchical Bayes (HB) estimation of a mixed logit model, and report that both predict aggregate choice shares similarly well in holdout choice tasks with out-of-sample individuals.

2.3 Data Expansion Models
If we assume that individuals are compensatory decision makers, then we would conclude that estimation of ILMs usually fails due to an insufficient number of observations to estimate parameters reliably. One way to overcome this limitation is to expand the available full or partial ranking information to create implied choice sets and choice observations. That is, data expansion methods use full or partial preference rankings of choice options to infer chosen options in implied choice sets. Rank-order expansion of full or partial rankings of choice options in choice sets to implied choices in implied choice sets was first discussed by Luce and Suppes (1965), and first used in a practical application by Beggs et al. (1981) and Chapman and Staelin (1982). These two prior references relied on a full ranking of choice options in a choice set, but more recent papers (for example, Louviere et al., 2008; Ebling et al., 2010) use choice responses from best-worst questions asked about each choice set in discrete choice experiments (Louviere et al., 2000; Marley and Louviere, 2005) to give partial rankings that can be expanded to implied new choice sets. An example of best-worst questions being used in a discrete choice experiment (DCE) is shown in Figure 21.1.

In Figure 21.1, product B is chosen as best and product D is chosen as worst. This gives the partial ordering B ≻ A, C ≻ D ('≻' denotes 'is preferred to'). This partial ordering allows one to create 15 non-empty implied choice sets associated with the four alternatives in each original choice set (see also Louviere et al., 2008). In these 15 sets the most preferred option B appears eight times, and it (implicitly) should be chosen each of these times, for eight expected choices. The least preferred option, D, also appears eight times, but it should only be chosen once, namely when it appears as a singleton option with no other competing options. The remaining two options A and C also appear in eight choice sets each, but they can be chosen only if the most preferred option B is not present, and we cannot tell from the partial order whether A is better or worse than C. Thus, we assign both the average of the remaining expected choice counts, that is, three choices each. In this way, one can associate expected choice frequencies of 8, 3, 3 and 1 with alternatives B, A, C and D, respectively (see Appendix A to this chapter for a derivation of expected choice frequencies in the general case of K options per choice set).
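The expected choice frequencies of 8, 3, 3 and 1 can be verified by brute-force enumeration. The short sketch below (option labels are illustrative) walks through all 15 non-empty implied choice sets and credits tied options with equal shares of each implied choice:

```python
from itertools import combinations

options = ['A', 'B', 'C', 'D']
# Partial order from one best-worst task: B is best, D is worst,
# and A, C are tied in between (B > {A, C} > D).
rank = {'B': 0, 'A': 1, 'C': 1, 'D': 2}

expected = {o: 0.0 for o in options}
for size in range(1, len(options) + 1):
    for subset in combinations(options, size):
        best_rank = min(rank[o] for o in subset)
        winners = [o for o in subset if rank[o] == best_rank]
        for w in winners:                 # tied options share the choice
            expected[w] += 1.0 / len(winners)

print(expected)  # {'A': 3.0, 'B': 8.0, 'C': 3.0, 'D': 1.0}
```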
3 PROPOSED MODELING APPROACHES
In the following we propose two ways to estimate individual-level choice models. The first approach is motivated by the first step of the Newton–Raphson algorithm to maximize the individual's likelihood function with respect to the model parameters, whereas the second approach is motivated by differences in consumers' choice consistency, which need to be taken into consideration when making out-of-sample predictions.
Features                        Product A   Product B   Product C   Product D
1. Capacity                     3 KW        3 KW        1 KW        4 KW
2. Roofspace required           60 m2       15 m2       15 m2       40 m2
3. Production output warranty   25 years    10 years    20 years    20 years
4. Net purchase price           $5100       $5100       $1700       $12400
5. Payback time                 4.2 years   8.2 years   4.3 years   7.9 years

If you want to review the meaning of any features, please click here.
Choose ONE of the 4 products that you prefer most: Product A / Product B / Product C / Product D
Choose ONE of the 4 products that you prefer least: Product A / Product B / Product C / Product D

Figure 21.1 Example of best-worst choice experiment

3.1 Newton–Raphson Approximation
In this section, we mathematically derive how results from ordinary least squares regression models estimated directly from the 1/0 choice indicators or the expected choice frequencies associated with the partial rankings of the choice options in each set can be used to model each individual's choices using the first step of the Newton–Raphson algorithm to maximize the individual's likelihood function with respect to the model parameters. In the following let i represent the individual, r represent the choice set, j represent the alternative in the choice set and k represent other alternatives in the choice set, and let the utility of alternative j for respondent i be the sum of a deterministic part depending on the attributes X of the alternatives, and a stochastic part $\varepsilon_{irj}$:

$$u_{irj} = \beta_{i1} x_{1rj} + \beta_{i2} x_{2rj} + \cdots + \varepsilon_{irj} = X_{rj}\,\beta_i + \varepsilon_{irj} \qquad (21.1)$$
3.1.1 Ordinary least squares estimation for 1/0 choice indicators

In the following, we drop the subscript for the respondent, but remind the reader that parameters and likelihood functions are specified at an individual level. We assume sufficient data from a single individual (typically in the form of repeated choices) to estimate the model. Further, $y_{rj}$ is an indicator variable taking on the value 1 if the respondent chose alternative j in the r-th choice set and 0 otherwise. Assuming that the stochastic part $\varepsilon_{rj}$ is independently and identically Gumbel distributed with scale 1 (EV(1)) leads to the well-known closed form for the choice probabilities (multinomial logit, MNL):
$$P_{rj} = \frac{e^{X_{rj}\beta}}{\sum_k e^{X_{rk}\beta}}. \qquad (21.2)$$

The associated log-likelihood function can be written as

$$L(\beta) = \sum_r \sum_j y_{rj} \log(P_{rj}). \qquad (21.3)$$
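Equations (21.2) and (21.3) translate directly into code. The sketch below is illustrative and assumes the attributes are stored in an array X of shape (R, K, P), with R choice sets, K alternatives and P parameters, and the 1/0 choice indicators in an array y of shape (R, K):

```python
import numpy as np

def mnl_probs(X, beta):
    """Equation (21.2): logit choice probabilities for each alternative."""
    v = X @ beta                                   # (R, K) systematic utilities
    ev = np.exp(v - v.max(axis=1, keepdims=True))  # subtract max for stability
    return ev / ev.sum(axis=1, keepdims=True)

def log_likelihood(beta, X, y):
    """Equation (21.3): individual-level MNL log-likelihood."""
    return np.sum(y * np.log(mnl_probs(X, beta)))
```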
The Newton–Raphson algorithm to find the parameter vector β that maximizes the log-likelihood function L in iteration number t + 1 is defined as:

$$\beta^{(t+1)} = \beta^{(t)} + \Delta\beta^{(t)}, \qquad (21.4)$$

with

$$\Delta\beta^{(t)} = -\left[\nabla^2_{\beta} L(\beta^{(t)})\right]^{-1} \nabla_{\beta} L(\beta^{(t)}) = -\left[\frac{\partial^2 L(\beta^{(t)})}{\partial\beta\,\partial\beta'}\right]^{-1} \frac{\partial L(\beta^{(t)})}{\partial\beta}. \qquad (21.5)$$

Inserting the log-likelihood function as defined in (21.3), we obtain

$$\nabla_{\beta} L(\beta) = \sum_r \sum_j (y_{rj} - P_{rj})\, X'_{rj} = \sum_r \sum_j y_{rj}\,(X_{rj} - \bar{X}_r)'$$
$$\nabla^2_{\beta} L(\beta) = -\sum_r \sum_j P_{rj}\,(X_{rj} - \bar{X}_r)'(X_{rj} - \bar{X}_r), \qquad (21.6)$$

with the choice-set-specific

$$\bar{X}_r = \sum_j P_{rj}\, X_{rj}. \qquad (21.7)$$

Defining

$$X^{(t)} = \left[(X_{rj} - \bar{X}_r^{(t)})\sqrt{P_{rj}(\beta^{(t)})}\right], \qquad Y^{(t)} = \left[y_{rj}\big/\sqrt{P_{rj}(\beta^{(t)})}\right],$$

we thus obtain

$$\beta^{(t+1)} = \beta^{(t)} + \left(X^{(t)\prime} X^{(t)}\right)^{-1} X^{(t)\prime}\, Y^{(t)}. \qquad (21.8)$$
If we set $\beta^{(0)} = 0$ in the first iteration (t = 0), we have $P_{rj}(\beta^{(0)}) = 1/K$, where K is the size of the choice set, and thus

$$\beta^{(1)} = \beta^{(0)} + K\,(X'X)^{-1} X'Y, \qquad (21.9)$$

with

$$X = \left[X_{rj} - \bar{X}_r^{(0)}\right], \qquad Y = \left[y_{rj}\right].$$

Note that $(X'X)^{-1}X'Y$ can be obtained as the coefficients of a regression of Y on X and choice set dummy variables, and so the estimates associated with the first iteration are these coefficients multiplied by the constant K.
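A minimal sketch of this one-step estimator, under the same array conventions as above; within-set centering of the attributes stands in for the explicit choice set dummy variables and yields the same coefficients:

```python
import numpy as np

def one_step_ols(X, y, scale):
    """One Newton-Raphson step from beta = 0, computed via OLS.

    X     -- (R, K, P) array of attributes
    y     -- (R, K) array with the dependent variable (1/0 indicators here)
    scale -- constant multiplying the OLS coefficients (K in equation (21.9))
    """
    R, K, P = X.shape
    # Centering within each choice set is equivalent to including
    # choice-set dummy variables in the regression.
    Xc = (X - X.mean(axis=1, keepdims=True)).reshape(-1, P)
    beta_ols, *_ = np.linalg.lstsq(Xc, y.reshape(-1), rcond=None)
    return scale * beta_ols

# beta_hat = one_step_ols(X, y, scale=K)  # implements equation (21.9)
```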
3.1.2 Ordinary least squares on expected choice frequencies for expanded choice data

Instead of using the 1/0 choice indicators to estimate individual-level choice models, one can also use the expected choice frequencies obtained from the data expansion of the partial rankings described previously. Let again r denote a choice set and s index a sub-choice set resulting from the expansion of the data as described in Appendix A. Let $a_{rsj}$ be an indicator variable for the presence of alternative j in sub-choice set s of the original choice set r. We always order the sub-choice sets that result from the rank expansion in the same way and thus $a_{rsj} = a_{sj}$ for all s. Assuming again that the stochastic part in equation (21.1) is independently and identically Gumbel distributed with scale 1, we obtain the following choice probabilities for the expanded choice sets:

$$P_{rsj} = \frac{a_{rsj}\, e^{X_{rj}\beta}}{\sum_k a_{rsk}\, e^{X_{rk}\beta}} = \frac{a_{sj}\, e^{X_{rj}\beta}}{\sum_k a_{sk}\, e^{X_{rk}\beta}}. \qquad (21.10)$$
Let $y_{rsj}$ be an indicator variable taking on the value 1 if the respondent 'chose' alternative j in the r-th expanded choice set and 0 otherwise. The log-likelihood function for the expanded data can then be written as

$$L(\beta) = \sum_r \sum_s \sum_j y_{rsj} \log(P_{rsj}) = \sum_r \sum_s \sum_j y_{rsj}\, X_{rj}\beta - \sum_r \sum_s \log\!\Big(\sum_k a_{sk} \exp(X_{rk}\beta)\Big). \qquad (21.11)$$
The derivative of the log-likelihood function with respect to β is then equal to (see Appendix B)

$$\nabla_{\beta} L(\beta) = \sum_r \sum_j y_{r+j}\,(X'_{rj} - \bar{X}'_r), \qquad (21.12)$$

with

$$\bar{X}_r = \sum_k \Big( P_{r+k} \Big/ \sum_j P_{r+j} \Big) X_{rk}, \qquad y_{r+j} = \sum_s y_{rsj}, \qquad \text{and} \qquad P_{r+j} = \sum_s P_{rsj}.$$
Differentiating the log-likelihood function twice leads to

$$\nabla^2_{\beta} L(\beta) = -\sum_r \sum_s \sum_j P_{rsj}\, X'_{rj}\,(X_{rj} - \bar{X}_{rs}) = -\sum_r \sum_s \sum_j P_{rsj}\,(X_{rj} - \bar{X}_{rs})'(X_{rj} - \bar{X}_{rs}), \qquad (21.13)$$

with

$$\bar{X}_{rs} = \sum_k P_{rsk}\, X_{rk}.$$
Inserting the derivatives into the first step of the Newton–Raphson optimization algorithm, one obtains with $\beta^{(0)} = 0$ in the case of K = 4 options per choice set (see Appendix C)

$$\beta^{(1)} = \beta^{(0)} + \Big[\sum_r \sum_s \sum_j P_{rsj}\,(X_{rj} - \bar{X}_{rs})'(X_{rj} - \bar{X}_{rs})\Big]^{-1} \Big[\sum_r \sum_s \sum_j (X_{rj} - \bar{X}_{rs})'\, y_{rsj}\Big]$$
$$= \tfrac{36}{77}\,\Big[\sum_r \sum_j (X_{rj} - \bar{X}_r)'(X_{rj} - \bar{X}_r)\Big]^{-1} \Big[\sum_r \sum_j (X_{rj} - \bar{X}_r)'\, y_{r+j}\Big]$$
$$= \tfrac{36}{77}\,(X'X)^{-1} X'Y, \qquad (21.14)$$

with

$$X = \left[X_{rj} - \bar{X}_r^{(0)}\right], \qquad Y = \left[y_{r+j}\right].$$

Again, $(X'X)^{-1}X'Y$ can be obtained as the coefficients of a regression of Y on X and choice set dummy variables, and so the first iteration produces estimates that are multiplied by the constant 36/77.
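Under the same conventions, the expanded-data estimator can reuse the hypothetical one_step_ols() sketched earlier; only the dependent variable and the constant change:

```python
# y_expected holds the expected choice frequencies y_{r+j} per alternative
# (8, 3, 3, 1 for best-worst data with K = 4 alternatives per set), and the
# rescaling constant is 36/77 rather than K, as in equation (21.14).
beta_expanded = one_step_ols(X, y_expected, scale=36.0 / 77.0)
```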
3.2 Reweighting Based on Individual Choice Consistency Differences
In both preceding estimation approaches, estimates from the individual-level OLS regressions are reweighted by an appropriate scalar that is constant across individuals. An alternative to using a constant scalar across all respondents is to use an individual rescaling factor proportional to the respondent's choice consistency; our empirical and simulation experience with this approach indicates that it also works very well. One estimates an OLS regression on the 1/0 choice indicators (section 3.2.1) or a WLS regression with particular weights and a transformation of the dependent variable (section 3.2.2) to model a single person, and then one uses the model to predict the observed dependent variable for each choice option in each choice set of interest. Next, one calculates the associated residual mean squared error for each person, and multiplies all OLS or WLS model parameter estimates for an individual by 1/(mean squared error).
The revalued parameters are used to make choice predictions for each person using a logit link function as in equation (21.2). The intuition behind this approach is that people whose choices are less well predicted by the specified model, and who thus have a higher mean squared error (which we interpret as their being less consistent in their choices), should have predicted choice probabilities closer to 1/K. Predicted probabilities closer to 1/K are produced by smaller magnitude parameters, while predicted probabilities closer to 0 and 1 are produced by larger magnitude parameters. The following formulas describe the procedure for OLS and WLS estimation in more detail.

3.2.1 OLS estimation and rescaling

Let again r represent the choice set, j represent the alternative in the choice set, and k represent other alternatives in the choice set. To improve readability, we again drop the subscript for the individual. For each individual we estimate a model where the dependent variable is

$$y_{rj} = \begin{cases} 1, & \text{if alternative } j \text{ is chosen in choice set } r \\ 0, & \text{if alternative } j \text{ is not chosen in choice set } r. \end{cases}$$

That is, for each individual we estimate preference parameters β and choice-set-specific parameters δ by minimizing

$$\min_{\beta,\delta}\, S = \sum_r \sum_j (y_{rj} - \hat{y}_{rj})^2, \qquad (21.15)$$

where

$$\hat{y}_{rj} = X_{rj}\,\hat{\beta}^* + \delta_r. \qquad (21.16)$$

We rescale the parameters from the first-stage OLS estimation as described above, that is, we define the mean square error as a function of the regression residuals:

$$\text{MSE} = \frac{1}{RK} \sum_r \sum_j (y_{rj} - \hat{y}_{rj})^2, \qquad (21.17)$$

where R is the total number of choice sets and K is the number of alternatives per choice set. We define the individual rescaling coefficient as $\alpha = 1/\text{MSE}$. We then rescale the model parameters as $\hat{\beta} = \alpha\,\hat{\beta}^*$. To predict choice probabilities, one uses a logit link function as shown below:

$$\hat{P}_{rj} = \frac{e^{X_{rj}\hat{\beta}}}{\sum_k e^{X_{rk}\hat{\beta}}}. \qquad (21.18)$$
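A minimal sketch of the OLS-with-rescaling procedure, again assuming X of shape (R, K, P) and 1/0 indicators y of shape (R, K); demeaning both y and X within each choice set absorbs the choice-set-specific intercepts:

```python
import numpy as np

def ols_mse_rescaled(X, y):
    """OLS on 1/0 choice indicators with 1/MSE rescaling (section 3.2.1)."""
    R, K, P = X.shape
    Xc = (X - X.mean(axis=1, keepdims=True)).reshape(-1, P)
    yc = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    beta_star, *_ = np.linalg.lstsq(Xc, yc, rcond=None)  # first-stage OLS
    resid = yc - Xc @ beta_star
    mse = np.sum(resid ** 2) / (R * K)                   # equation (21.17)
    return beta_star / mse                               # beta = beta*/MSE
```

Predicted choice probabilities then follow by pushing the rescaled parameters through the logit link of equation (21.18), for example with the mnl_probs() sketch given earlier.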
3.2.2 WLS estimation and rescaling

Using the same notation as above, for each individual we estimate a model where the dependent variable is as shown below for the case of choice sets containing four alternatives, with only a partial ranking of best and worst available:

$$y_{rj} = \begin{cases} \ln(8), & \text{if alternative } j \text{ is chosen best in choice set } r \\ \ln(3), & \text{if alternative } j \text{ is chosen neither best nor worst in choice set } r \\ \ln(1), & \text{if alternative } j \text{ is chosen worst in choice set } r. \end{cases}$$

The '8', '3' and '1' are the implied choice frequencies for the case of four alternatives per choice set according to the expansion method described in Louviere et al. (2008) (see Appendix A for a derivation of expected choice frequencies in the general case of K options per choice set). For each individual we estimate preference parameters β and choice-set-specific parameters δ by minimizing

$$\min_{\beta,\delta}\, S = \sum_r \sum_j W_{rj}\,(y_{rj} - \hat{y}_{rj})^2, \qquad (21.19)$$

where $\hat{y}_{rj} = X_{rj}\,\hat{\beta}^* + \delta_r$ and the weights

$$W_{rj} = \begin{cases} 8, & \text{if alternative } j \text{ is chosen best in choice set } r \\ 3, & \text{if alternative } j \text{ is chosen neither best nor worst in choice set } r \\ 1, & \text{if alternative } j \text{ is chosen worst in choice set } r \end{cases}$$

correct for heteroscedastic errors. We again rescale the parameters obtained from the procedure above. In particular, we define the mean square error to be a function of the regression residuals,

$$\text{MSE} = \frac{1}{RK} \sum_r \sum_j (y_{rj} - \hat{y}_{rj})^2,$$

and define an individual's rescaling coefficient as $\alpha = 1/\text{MSE}$. We then rescale the model parameters as $\hat{\beta} = \alpha\,\hat{\beta}^*$, and again apply a logit link function to predict choice probabilities:

$$\hat{P}_{rj} = \frac{e^{X_{rj}\hat{\beta}}}{\sum_k e^{X_{rk}\hat{\beta}}}.$$
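A corresponding sketch for the WLS variant on best-worst data, with explicit choice set dummy columns so that the weights enter the estimation cleanly; best and worst are hypothetical arrays holding the index of the best- and worst-chosen alternative in each choice set:

```python
import numpy as np

def wls_mse_rescaled(X, best, worst):
    """WLS on ln(8)/ln(3)/ln(1) targets with 1/MSE rescaling (section 3.2.2)."""
    R, K, P = X.shape
    y = np.full((R, K), np.log(3.0))                  # neither best nor worst
    w = np.full((R, K), 3.0)
    rows = np.arange(R)
    y[rows, best], w[rows, best] = np.log(8.0), 8.0
    y[rows, worst], w[rows, worst] = np.log(1.0), 1.0

    D = np.kron(np.eye(R), np.ones((K, 1)))           # choice-set dummy columns
    Z = np.hstack([X.reshape(-1, P), D])
    sw = np.sqrt(w).reshape(-1)                       # WLS via sqrt-weighting
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y.reshape(-1) * sw, rcond=None)
    beta_star = coef[:P]
    resid = y.reshape(-1) - Z @ coef
    mse = np.sum(resid ** 2) / (R * K)                # unweighted, as in the text
    return beta_star / mse
```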
4 SIMULATION STUDY

4.1 Setup of the Simulation Study
We now describe and discuss a Monte Carlo simulation study that demonstrates the performance of the four proposed ‘bottom-up’ methods. To provide a benchmark we also report the performance of a more commonly used ‘top-down’ approach to obtain individual-level parameters, namely the means of the individual level posteriors of a
mixed logit model estimated with hierarchical Bayes using Kenneth Train's code for Matlab (Train, 2009). We focus on respondents making choices in line with a linear-in-attributes utility function and logit choice model as outlined in equation (21.1). Choice options are described by seven attributes, two of which have four levels while the remaining five have two levels; all attribute levels are effects coded. We again assume that the stochastic term in equation (21.1) is Gumbel distributed with scale specified below; this again yields logit-type choice probabilities used to simulate individuals' choices. We assume that individual preferences (parameter estimates) follow a normal distribution with diagonal covariance matrix, and assume the priors in the HB estimation to be normal.1 In the simulation study, we varied two factors (a sketch of the choice-generation step follows the list):

1. Number of choice sets per individual: 4, 8, 16, 32 or 64 choice sets. All simulated individuals see identical choice tasks, and we hold choice set size constant at four alternatives. This factor was chosen to determine the impact of the information available per individual on the precision of the model estimates.
2. Scale of the error distribution: we use error terms with Gumbel distributions EV(0.5), EV(1) and EV(2), respectively, to compare model performance across different levels of choice consistency.
We simulate choices for 300 individuals and estimate individual level models for a random selection of half of these simulated individuals. The other half of the simulated individuals is excluded from the estimation to assess the holdout performance of the proposed models. We repeat this procedure 100 times for each combination of the two simulation study factors to ensure that our results are not due to specific sampling idiosyncrasies. We compare the performance of the five methods (that is, OLS with Newton–Raphson weights on non-expanded and expanded choice sets, OLS with mean square error (MSE) weight, WLS with MSE weight, and HB posteriors) based on three types of comparison measures (a sketch of how they can be computed follows the list):

1. In order to assess a model's ability to recover the true parameters, we calculate the root mean squared error (RMSE) between true and estimated parameters; a smaller RMSE indicates better parameter recovery. By definition, the RMSE is only available for respondents used in the estimation, and only for simulated data.
2. In order to assess the precision of population-level predictions, we calculate the mean squared error for differences between aggregate observed choice shares and aggregate estimated choice shares across alternatives and choice sets (MSE). A lower MSE indicates that a model fits the observed aggregate choices better. This measure is available in simulations or in experimental data with holdout tasks or holdout samples.
3. Finally, to capture the accuracy of individual predictions, we compute the root likelihood (RLH) as the geometric mean of the estimated choice probabilities for the observed choices. The RLH takes on values between 0 and 1, with a value closer to 1 implying that a model fits the observed choices better. The RLH can be calculated
for respondents for tasks used in estimation and for holdout tasks. For holdout samples we use an analogous measure to the RLH, namely the root predictive score (RPS). The RPS is the geometric mean of the predicted holdout choice probabilities based on the models for the in-sample respondents.2 Again, the RPS takes on values between 0 and 1, with a value closer to 1 implying that a model fits the observed choices better. Both RLH and RPS are available in simulated or experimental data.
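For concreteness, the three measures might be computed as follows, given predicted probabilities of shape (N, R, K) and observed choice indices of shape (N, R); the RPS is obtained by applying the same geometric-mean calculation to holdout respondents:

```python
import numpy as np

def rmse(beta_hat, beta_true):
    """Parameter recovery: root mean squared error across parameters."""
    return np.sqrt(np.mean((beta_hat - beta_true) ** 2))

def rlh(probs, choices):
    """Root likelihood: geometric mean probability of the observed choices."""
    n, r = choices.shape
    p_chosen = probs[np.arange(n)[:, None], np.arange(r)[None, :], choices]
    return np.exp(np.mean(np.log(p_chosen)))

def aggregate_mse(probs, choices):
    """MSE between observed and predicted shares per set and alternative."""
    n, r, k = probs.shape
    observed = np.stack([np.bincount(choices[:, s], minlength=k) / n
                         for s in range(r)])
    return np.mean((observed - probs.mean(axis=0)) ** 2)
```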
4.2 Results of the Simulation Study
In the following, we first discuss the ability of the different models to recover the true individual-level parameters. Then we discuss individual and aggregate predictive abilities of the different models for in-sample and holdout respondents, for both in-sample and holdout choice tasks.

4.2.1 Parameter recovery

Figure 21.2 graphs the root mean squared error between estimated and true parameters for the five different methods against the number of choice sets, with the scale of the Gumbel distribution equal to 1 (medium consistency). Naturally, parameter recovery of all models improves when the number of choice sets, and thus the information available per individual, increases. This improvement is largest for the HB estimates and smallest for the reweighting using the Newton–Raphson first-step approximation. It is interesting to note that, except for the WLS approach with reweighting equal to 1/MSE, all individual models perform better than the HB mixed logit approach for a small number (that is, four or eight) of choice sets.

4.2.2 In-sample respondents/in-sample choice tasks

Figure 21.3 shows the root likelihoods for the respondents and choice sets used in the estimation (left-hand column) and the MSE between predicted aggregate choice shares and observed aggregate choice shares (right-hand column). For both measures and all models, the best fits (that is, highest RLH and lowest aggregate MSE) are achieved for the smallest number of choice sets.
[Line chart: RMSE against the number of choice sets (4, 8, 16, 32, 64) for NR OLS, NR Exp OLS, MSE OLS, MSE WLS and HB; medium consistency]

Figure 21.2 Parameter recovery measured by RMSE
[Line charts: RLH (left column) and aggregate MSE (right column) against the number of choice sets for NR OLS, NR Exp OLS, MSE OLS, MSE WLS and HB; panels for high, medium and low consistency]
Figure 21.3 In-sample respondents/in-sample choice tasks RLH and aggregate MSE

Comparing these results with the ones in the previous section, where we discussed individual parameter recovery, we conclude that the high fits are due to over-fitting. The decrease in fit in the WLS case for low consistency choices occurs because revaluing the estimated parameters by dividing them by the MSE in WLS results in large parameter magnitudes relative to the magnitudes required to correctly predict choices in the low consistency case, resulting in worse aggregate performance. A similar consequence is not observed in the plots of RLH. However, this is not surprising given that the RLH for the WLS model in the low consistency case does not reach the null hypothesis level of 1/J = 0.25 for each alternative.

4.2.3 In-sample respondents/holdout choice tasks

Figure 21.4 mirrors Figure 21.3, but the fit measures now are based on how well the model for each individual predicts choices in choice tasks that were not used for estimation. This confirms the observation in the previous section that the very high fit for the lowest number of choice sets (that is, four choice sets) for in-sample individuals and their in-sample choice sets is due to over-fitting. Indeed, Figure 21.4 shows that if only four choice sets are used for estimation, all models perform badly, but performance improves with additional information. When comparing performance across models, the individual posterior means of the HB mixed logit estimation provide the best RLH and the best aggregate MSE, but it is noteworthy that the individual level models follow the same trends and are only slightly worse.
[Line charts: RLH (left column) and aggregate MSE (right column) against the number of choice sets for NR OLS, NR Exp OLS, MSE OLS, MSE WLS and HB; panels for high, medium and low consistency]
Figure 21.4 In-sample respondents/holdout choice tasks RLH and aggregate MSE

The WLS reweighted estimates in the low choice consistency case are the exception, as they decrease in their aggregate predictive ability with an increase in information for estimation.

4.2.4 Holdout respondents/in-sample choice tasks

Figure 21.5 shows how the five estimation techniques perform when the individual level models are used to predict the choices of a different set of individuals completing the same tasks that were used in estimation. All models perform fairly similarly for individual level predictions measured by RPS; models that use reweighting based on MSE seem to perform worse only in the case of low choice consistency. It is also worth noting that all individual model estimation techniques result in models that predict the aggregate choices of holdout individuals well, despite over-fitting. The latter result is due to those individuals being from a similar population and so making similar choices to the in-sample individuals for the same choice sets.

4.2.5 Holdout respondents/holdout choice tasks

Figure 21.6 graphically displays the performance of the five estimation techniques when the individual level models are used to predict the choices of a different set of individuals completing choice tasks that were not used in estimation. As before, the simple estimation techniques follow the behavior of the more sophisticated HB estimation and produce numerically similar results for individual level choice prediction as measured by RPS and aggregate MSE.
Figure 21.5 Holdout respondents/in-sample choice tasks RPS and aggregate MSE (panels for high, medium and low consistency; RPS and aggregate MSE plotted against the number of choice sets for the NR OLS, NR Exp OLS, MSE OLS, MSE WLS and HB estimates)
The exceptions again are the WLS reweighted estimates in the low choice consistency case, which show declining aggregate predictive ability with an increase in the information available for estimation.
5 DISCRETE CHOICE EXPERIMENT EXAMPLES
We use the four individual model ‘bottom-up’ estimation methods proposed above, as well as a mixed logit ‘top-down’ model estimated with HB (normal population priors and a diagonal covariance matrix), to estimate model parameters and predict observed choices in four datasets collected from online discrete choice experiments: (1) choices of car insurance options, (2) choices of cross-country airline flights, (3) delivered pizzas and (4) digital cameras. In the airline and car insurance data we have observations from 200 individuals who each evaluated 12 choice sets with four options that we used for estimation, and four choice sets that we used for holdout assessment. Similar to the simulation data, choice options are described by seven attributes: two have four levels and the rest have two levels (all attribute levels were effects coded). Choice sets were constructed using the Street and Burgess (2007) optimal design approach. For the pizza dataset there are 600 respondents, each of whom completed 20 five-option choice sets (including a no-choice option) used for estimation, with another five choice sets used for holdout assessment.
Figure 21.6 Holdout respondents/holdout choice tasks RPS and aggregate MSE (panels for high, medium and low consistency; RPS and aggregate MSE plotted against the number of choice sets for the NR OLS, NR Exp OLS, MSE OLS, MSE WLS and HB estimates)
Choice options are described by a no-choice constant or six attributes: four have four levels and two have two levels (all attribute levels were effects coded). For the camera dataset there are 600 respondents, each of whom completed 24 five-option choice sets used for estimation, with another five choice sets used for holdout assessment. Choice options are described by a no-choice constant or nine attributes: one six-level attribute, one five-level attribute, three four-level attributes, two three-level attributes and two two-level attributes (all attribute levels were effects coded). As in the simulations, we randomly split each dataset to estimate models for half the respondents, predicting for the other half. We repeat the splits 10 times and average the results to minimize sampling bias.

Figure 21.7 graphically displays the performance of the five estimation techniques in the four datasets. The left-hand column reports fits associated with individual choice predictions (that is, RLH and RPS) and the right-hand column gives the aggregate mean squared error of choice share prediction. We use the measures of prediction performance discussed earlier for (a) in-sample respondents and in-sample choice sets (isr/isct), (b) in-sample respondents and holdout choice sets (isr/hoct), (c) holdout respondents and in-sample choice sets (hor/isct) and (d) holdout respondents and holdout choice sets (hor/hoct). Figure 21.7 shows that individual level prediction performance measured by RLH and RPS is fairly similar across methods and the four data conditions. Except for the reweighted WLS estimates, it appears that all individual models perform very similarly to the individual-level HB mixed logit estimates. However, as previously observed,
the reweighted WLS estimates seem to over-fit the data: their RLH for in-sample respondents and in-sample choice sets is the highest, whereas this method has the lowest RLH for in-sample respondents and holdout choice sets, as well as the lowest RPS measures. Regarding aggregate choice share predictions, the individual level HB mixed logit estimates best reproduced the true choice shares for all four datasets and all four aggregate choice share prediction measures. The worst performance is associated with estimates from the Newton–Raphson first-step optimization approach applied to the expanded data (although all the values of the aggregate MSE of choice share prediction are small), whereas the MSE reweightings yield very good aggregate predictions for both the original (that is, OLS) and the expanded data (that is, WLS).

Figure 21.7 Individual and aggregate fit measures for empirical datasets (left-hand column: RLH and RPS, by estimation method, for the airline, camera, pizza and car insurance datasets; right-hand column: aggregate MSE of choice share prediction for the isr/isct, isr/hoct, hor/isct and hor/hoct conditions)
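The repeated split-half protocol described above can be summarized in a short sketch. This is our illustration, not the authors' code: `estimate` and `evaluate` are hypothetical stand-ins for whichever estimation routine and fit measure the analyst uses.

```python
import numpy as np

def split_half_performance(respondent_ids, estimate, evaluate,
                           n_repeats=10, seed=0):
    """Repeatedly split respondents in half, estimate on one half,
    predict for the other half, and average the fit measure."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        ids = rng.permutation(respondent_ids)
        estimation_half, holdout_half = np.array_split(ids, 2)
        model = estimate(estimation_half)
        scores.append(evaluate(model, holdout_half))
    return float(np.mean(scores))

# dummy stand-ins, just to show the calling convention
ids = np.arange(600)  # e.g. the 600 pizza respondents
print(split_half_performance(ids, estimate=lambda e: None,
                             evaluate=lambda m, h: 0.5))
```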
6 CONCLUSIONS
We proposed and investigated four simple ways to model individual choices, based on OLS and WLS estimation. We tested the performance of all estimation approaches, using both simulations and four empirical datasets, against the popular HB approach for estimating a mixed logit model, comparing in-sample and out-of-sample performance measures. Our results suggest that OLS and WLS models perform well in both real and simulated data once the parameter estimates are revalued by appropriate constant scale factors. Based on the results from the simulations as well as the online discrete choice experiments, we conclude that individual models are particularly appropriate for low-information conditions (that is, a low number of choice sets per respondent). Because of their simplicity and ease of use, all estimation approaches are likely to be attractive to researchers who want to better understand and model consumer choices, but find complex statistical models or black-box commercial software challenging. In turn, this should allow both academics and practitioners to do a reasonable job of predicting choices while minimizing mistakes.

Our discussion of different performance measures should also stimulate future research: whereas usually only individual prediction performance measures such as RLH are reported, we show that these individual prediction measures can be misleading if one is interested in aggregate choice share predictions. Similarly, our results demonstrate that good aggregate choice share predictions do not necessarily imply good individual choice predictions. Thus, researchers should use a measure that suits their specific application to determine the fit of a model, rather than relying on standard performance measures. We also advocate the use of both holdout respondents and holdout choice sets, to avoid models that over-fit the data used for estimation but predict poorly for new samples in new choice situations. An interesting extension of our work, which we leave to future research, would be to compare the ability of the individual models versus the individual level posterior HB mixed logit estimates to recover population distributions of preferences, or of other metrics of interest such as willingness-to-pay estimates.

Correctly analyzing choice data is only one part of a larger whole associated with academic or applied work in choice modeling. One must first conceptualize academic and practical problems involving consumer and other choices, and understand how to design and implement choice experiments. This chapter did not address these precursors to choice data analysis, which require education, training and experience; but given that academics and practitioners can acquire these skills, the approaches proposed in this chapter should allow many researchers to estimate choice models for single people and predict choice outcomes in many circumstances of academic and practical interest.
NOTES

1. Note that the fact that the prior in the HB estimation matches the simulated population distribution provides an advantage for the HB estimation. It is likely that, in cases where the prior distribution does not match the true underlying population distribution, the performance of the individual level models, which refrain from making any population distributional assumptions, would improve relative to the HB estimation. We leave such a comparison for future research.
2. Details on the root predictive score can be found in Frischknecht et al. (2013).
REFERENCES

Beggs, S., S. Cardell and J. Hausman (1981), 'Assessing the potential demand for electric cars', Journal of Econometrics, 16 (1), 1–19.
Borsboom, D., G.J. Mellenbergh and J. van Heerden (2003), 'The theoretical status of latent variables', Psychological Review, 110 (2), 203–19.
Chapman, R.G. (1984), 'An approach to estimating logit models of a single decision maker's choice behavior', Advances in Consumer Research, 11 (1), 656–61.
Chapman, R.G. and R. Staelin (1982), 'Exploiting rank order choice set data within the stochastic utility model', Journal of Marketing Research, 19 (3), 288–301.
Ebling, C., B. Frischknecht and J. Louviere (2010), 'Keep it simple: easy ways to estimate choice models for single consumers', paper presented at the Australian and New Zealand Marketing Academy Conference, Christchurch, New Zealand, 29 November–1 December.
Finn, A. and J.J. Louviere (1992), 'Determining the appropriate response to evidence of public concern: the case of food safety', Journal of Public Policy and Marketing, 11 (1), 12–25.
Frischknecht, B., C. Eckert, J. Geweke and J.J. Louviere (2013), 'A simple method for estimating preference parameters for individuals', International Journal of Research in Marketing, doi: 10.1016/j.ijresmar.2013.07.005.
Kamakura, W.A. and G.J. Russell (1989), 'A probabilistic choice model for market segmentation and elasticity structure', Journal of Marketing Research, 26 (4), 379–90.
Louviere, J.J., D.A. Hensher and J.D. Swait (eds) (2000), Stated Choice Methods: Analysis and Applications, Cambridge: Cambridge University Press.
Louviere, J.J., D. Street, L. Burgess, N. Wasi, T. Islam and A.A.J. Marley (2008), 'Modeling the choices of individual decision-makers by combining efficient choice experiment designs with extra preference information', Journal of Choice Modelling, 1 (1), 128–63.
Luce, R.D. (1959), Individual Choice Behavior: A Theoretical Analysis, New York: John Wiley and Sons.
Luce, R.D. and P. Suppes (1965), 'Preference, utility, and subjective probability', in R.R. Bush and E. Galanter (eds), Handbook of Mathematical Psychology, New York: John Wiley and Sons, pp. 249–410.
Marley, A.A.J. and J.J. Louviere (2005), 'Some probabilistic models of best, worst, and best-worst choices', Journal of Mathematical Psychology, 49 (6), 464–80.
Marshall, D., S. Chan and J. Curry (2011), 'A head-to-head comparison of the traditional (top-down) approach to choice modeling with a proposed bottom-up approach', paper presented at the 2010 Sawtooth Software Conference, Newport Beach, CA, 6–8 October, accessed 15 January 2014 at www.sawtoothsoftware.com/support/technical-papers/conference-proceedings/proceedings2010, pp. 309–19.
McFadden, D. (1974), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
Rossi, P., G. Allenby and R. McCulloch (eds) (2006), Bayesian Statistics and Marketing, New York: Wiley.
Street, D. and L. Burgess (eds) (2007), The Construction of Optimal Stated Choice Experiments: Theory and Methods, New York: Wiley.
Thurstone, L.L. (1927), 'A law of comparative judgement', Psychological Review, 34 (4), 273–86.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Train, K. (2009), Kenneth Train's Home Page, University of California, Berkeley, accessed 15 January 2014 at http://elsa.berkeley.edu/~train/software.html.
APPENDIX A: DATA EXPANSION FOR K OPTIONS PER CHOICE SET

For a choice set of size $K$, and if choices are ordered, one can expand each choice set to

$$K_e = \sum_{k=1}^{K} \binom{K}{k}$$

non-empty choice sets. Of these $K_e$ choice sets, $\binom{K}{k}$ are of size $k$, with $0 < k \le K$. The most preferred option will be chosen whenever it is available in the $K_e$ choice sets, which means that it will be chosen $1 + \sum_{k=1}^{K-1}\left[\binom{K}{k} - \binom{K-1}{k}\right]$ times. The second most preferred option will be chosen whenever it is available and the most preferred option is not, which means that it will be chosen $\sum_{k=1}^{K-1}\left[\binom{K-1}{k} - \binom{K-2}{k}\right]$ times. In general, the number of times the $m$th preferred option ($0 < m \le K$) will be chosen equals

$$\sum_{k=1}^{K-m+1}\left[\binom{K-m+1}{k} - \binom{K-m}{k}\right].$$
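These counts can be verified by brute force. The sketch below (our illustration, not part of the original appendix) enumerates all non-empty subsets of a ranked choice set, records how often each ranked option is chosen (the highest-ranked option available always wins), and compares the tally with the closed-form expression, which simplifies to $2^{K-m}$ for the $m$th-ranked option.

```python
from itertools import combinations
from math import comb

K = 4  # ranked options 1 (best) .. K (worst)
counts = {m: 0 for m in range(1, K + 1)}
for k in range(1, K + 1):
    for subset in combinations(range(1, K + 1), k):
        counts[min(subset)] += 1  # highest-ranked available option is chosen

for m in range(1, K + 1):
    closed_form = sum(comb(K - m + 1, k) - comb(K - m, k)
                      for k in range(1, K - m + 2))
    print(m, counts[m], closed_form, 2 ** (K - m))  # all three agree
```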
APPENDIX B: DERIVATION OF EQUATION (21.12)

$$\nabla_b L(b) = \sum_r \sum_s \sum_j (y_{rsj} - P_{rsj}) X'_{rj}$$
$$= \sum_r \sum_s \sum_j y_{rsj} \left( X'_{rj} - \sum_k P_{rsk} X'_{rk} \right)$$
$$= \sum_r \sum_s \sum_j y_{rsj} \left( X'_{rj} - \bar{X}'_{rs} \right)$$
$$= \sum_r \sum_j y_{r+j} X'_{rj} - \sum_r \sum_k P_{r+k} X'_{rk}$$
$$= \sum_r \sum_j y_{r+j} X'_{rj} - \sum_r \sum_k P_{r+k} \bar{X}'_{r}$$
$$= \sum_r \sum_j y_{r+j} X'_{rj} - \sum_r \sum_k y_{r+k} \bar{X}'_{r}$$
$$= \sum_r \sum_j y_{r+j} \left( X'_{rj} - \bar{X}'_{r} \right),$$

where the subscript $+$ denotes summation over choice sets $s$ (so that $y_{r+j} = \sum_s y_{rsj}$ and $P_{r+k} = \sum_s P_{rsk}$), $\bar{X}_{rs} = \sum_k P_{rsk} X_{rk}$, and $\bar{X}_r$ is the corresponding average over the options faced by respondent $r$.
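The first line of this chain is the familiar multinomial logit score, $\sum_r \sum_s \sum_j (y_{rsj} - P_{rsj}) X'_{rj}$, and it can be checked numerically. The sketch below is our illustrative verification under hypothetical dimensions, comparing the analytic gradient with a finite-difference gradient of the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
R, S, J, L = 3, 5, 4, 2                 # respondents, choice sets, alternatives, parameters
X = rng.normal(size=(R, S, J, L))       # attributes X_rj in each choice set
y = np.eye(J)[rng.integers(0, J, size=(R, S))]   # one-hot chosen alternatives
b = rng.normal(size=L)

def loglik(b):
    v = X @ b                           # utilities, shape (R, S, J)
    p = np.exp(v) / np.exp(v).sum(-1, keepdims=True)
    return np.sum(y * np.log(p))

v = X @ b
p = np.exp(v) / np.exp(v).sum(-1, keepdims=True)
analytic = np.einsum('rsj,rsjl->l', y - p, X)    # sum over r, s, j of (y - P) X

eps = 1e-6
numeric = np.array([(loglik(b + eps * np.eye(L)[l]) -
                     loglik(b - eps * np.eye(L)[l])) / (2 * eps)
                    for l in range(L)])
print(np.allclose(analytic, numeric, atol=1e-4))  # True
```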
APPENDIX C: DERIVATION OF MULTIPLYING CONSTANT FOR DATA EXPANSION MODEL

$$\nabla^2_b L(b) = -\sum_r \sum_s \sum_j P_{rsj}\, X'_{rj} (X_{rj} - \bar{X}_{rs})$$
$$= -\sum_r \sum_s \sum_j P_{rsj} X'_{rj} X_{rj} + \sum_r \sum_s \sum_j P_{rsj} X'_{rj} \sum_k P_{rsk} X_{rk}$$
$$= -\sum_r \sum_s \sum_j P_{rsj} X'_{rj} X_{rj} + \sum_r \sum_s \sum_j \sum_k P_{rsj} P_{rsk} X'_{rj} X_{rk}$$
$$= -\sum_r \sum_s \sum_j P_{rsj} X'_{rj} X_{rj} + \sum_r \sum_s \sum_j P^2_{rsj} X'_{rj} X_{rj} + \sum_r \sum_s \sum_j \sum_{k \ne j} P_{rsj} P_{rsk} X'_{rj} X_{rk}.$$

Noting that

$$\sum_s P_{rsj} = \frac{1}{K} \sum_{m=1}^{K} \binom{K}{m} = \frac{1}{K} A$$
$$\sum_s P^2_{rsj} = \frac{1}{K} \sum_{m=1}^{K} \frac{1}{m} \binom{K}{m} = \frac{1}{K} B$$

and

$$\sum_s P_{rsj} P_{rsk} = \frac{1}{K-1} \left( \frac{1}{K} \sum_{m=1}^{K} \binom{K}{m} - \frac{1}{K} \sum_{m=1}^{K} \frac{1}{m} \binom{K}{m} \right) = \frac{1}{K} \frac{1}{K-1} (A - B),$$

where

$$A = \sum_{m=1}^{K} \binom{K}{m}, \qquad B = \sum_{m=1}^{K} \frac{1}{m} \binom{K}{m},$$

the expression above becomes

$$\nabla^2_b L(b) = -\sum_r \sum_j \frac{1}{K} A\, X'_{rj} X_{rj} + \sum_r \sum_j \frac{1}{K} B\, X'_{rj} X_{rj} + \sum_r \sum_j \sum_{k \ne j} \frac{1}{K} \frac{1}{K-1} (A - B)\, X'_{rj} X_{rk}$$
$$= -\sum_r \left[ (A - B) \frac{1}{K} \frac{K}{K-1} \sum_j X'_{rj} X_{rj} - (A - B) \frac{1}{K} \frac{1}{K-1} \sum_j \sum_{k} X'_{rj} X_{rk} \right]$$
$$= -(A - B) \frac{K}{K-1} \sum_r \frac{1}{K} \left[ \sum_j X'_{rj} X_{rj} - \frac{1}{K} \sum_j \sum_k X'_{rj} X_{rk} \right]$$
$$= -(A - B) \frac{K}{K-1} \sum_r \frac{1}{K} \sum_j (X_{rj} - \bar{X}_r)' (X_{rj} - \bar{X}_r)$$
$$= -(A - B) \frac{1}{K-1} \sum_r \sum_j (X_{rj} - \bar{X}_r)' (X_{rj} - \bar{X}_r),$$

where $\bar{X}_r = \frac{1}{K} \sum_k X_{rk}$.
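The identities for $\sum_s P_{rsj}$ and $\sum_s P^2_{rsj}$ can also be verified by brute force over the expanded choice sets, with $P_{rsj} = 1/|s|$ for every subset $s$ containing option $j$ (that is, at $b = 0$). This is our illustrative check, not part of the original appendix.

```python
from itertools import combinations
from math import comb

K = 4
A = sum(comb(K, m) for m in range(1, K + 1))        # equals 2**K - 1
B = sum(comb(K, m) / m for m in range(1, K + 1))

# sum of P_rsj over all non-empty subsets containing option 0
total = sum(1 / len(s)
            for k in range(1, K + 1)
            for s in combinations(range(K), k) if 0 in s)
total_sq = sum((1 / len(s)) ** 2
               for k in range(1, K + 1)
               for s in combinations(range(K), k) if 0 in s)
print(total, A / K)        # both 3.75 for K = 4
print(total_sq, B / K)     # both ~2.1458 for K = 4
```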
22 Hybrid choice models: the identification problem
Akshay Vij and Joan L. Walker
1 INTRODUCTION
Conventional microeconomic theory has tended to regard individual consumers as rational, self-interested actors engaged in a constant process of evaluating the costs and benefits associated with any decision in the marketplace as they strive to maximize their personal well-being. The random utility maximization model has been the model of choice for studies on consumer behavior over the last several decades. Random utility maximization, or discrete choice, models examine potential outcomes from among a set of mutually exclusive alternatives, and have found wide application in fields as diverse as travel demand analysis, marketing, education and labor force participation. Early applications almost exclusively used some model form belonging to the generalized extreme value (GEV) family of models, owing largely to the computational tractability offered by these models. The multinomial logit and nested logit models proved by far the most popular (Carrasco and Ortúzar, 2002), earning their colloquial appellation of the workhorses of discrete choice analysis.

Numerous studies have since devoted attention to improving the specification of the logit model. Extensions include the incorporation of flexible error structures and random taste heterogeneity through the use of either the mixed logit or the multinomial probit model; the inclusion of latent variables representing biological, psychological and sociological constructs underlying the formation of individual preferences, such as attitudes, values, norms and affects; the introduction of latent classes to capture latent segments that differ from each other with regard to, for example, the taste parameters; the combination of stated and revealed preference data to capitalize on the benefits offered by either type of data; and the representation of individual decision-making behavior in a dynamic context to capture interdependencies between decisions made at different stages in time. The hybrid choice model (HCM) combines these and other more recent developments in the choice modeling literature under a single unified framework, leading to a statistically more robust and behaviorally richer model of decision-making that obviates many of the limitations of simpler representations.

However, the increased complexity afforded by the HCM raises important questions of identification that remain inadequately addressed in the literature. Any HCM will in general be specified according to some theory of individual behavior. Observable data may then be used to verify the hypothesized theory underlying the model specification. There are two facets to the identification problem: theoretical and empirical. A model specification is said to be theoretically identifiable if no two distinct sets of parameter values generate the same probability distribution of observable data. In most cases, unless restrictions are imposed, multiple sets of parameter estimates may generate the same probability distribution for the data. Therefore, the identification problem consists
of determining the set of restrictions required to obtain a unique vector of consistent parameter estimates. The theoretical identification problem as it applies to the family of discrete choice models without latent variables, such as the multinomial logit model, the multinomial probit model and the mixed logit model, has received widespread attention in the literature (see, for example, Ben-Akiva and Lerman, 1985; Walker et al., 2007; Train, 2009). The theoretical identification problem as it applies to the family of structural equation models with latent variables, such as the confirmatory factor analytic model and the path analytic model, is equally well understood (see, for example, Bollen, 1989). In this chapter, we break apart the HCM into a discrete choice model in which the latent variables are treated as observable variables, and a structural equation model with latent variables. We assemble the rules of identification that have been developed independently for each of these two constituent pieces elsewhere in the literature, deriving some of our own for specific cases that have not been addressed before, into a set of sufficient but not necessary conditions of theoretical identification for the HCM as a whole. In so doing, however, our framework overlooks correlation between the two constituent pieces due to the presence of latent variables in both (Daziano and Bolduc, 2012). It is precisely this correlation that results in situations where the rules of theoretical identification presented in this chapter are sufficient but not necessary. Nevertheless, the findings from this chapter represent an important first step in addressing the identification problem as it applies to HCMs, setting the stage for future breakthroughs in this important area of research.

Theoretical identifiability is predicated on the availability of an infinite number of observations. In reality, the analyst will have a finite sample of observations at her disposal that may or may not contain enough variability to support the estimation of a particular model specification. A model is said to be empirically unidentified or underidentified if the model is theoretically identified but cannot be estimated using a sample dataset. The term empirical underidentification was originally introduced by Kenny (1979) in the context of structural equation models. Reasons for empirical underidentification may include small sample size, multicollinearity between observable variables, model misspecification, and so on.

The objective of this chapter is to provide a general framework for the theoretical and empirical identification of HCMs. We focus on two extensions of the GEV model that have found immense popularity in recent studies employing discrete choice analysis: the mixed logit model and the integrated choice and latent variable model. That being said, the framework presented in this chapter can readily be broadened to help identify HCMs that employ multinomial probit, latent classes, multiple datasets, and so on.

The chapter is organized as follows. Section 2 introduces a specific version of the HCM that combines the mixed logit model with the integrated choice and latent variable model. Section 3 examines theoretical identification for each component of the HCM introduced in section 2, developing a set of sufficient but not necessary conditions for the identification of the model as a whole.
Section 3 is accompanied by three appendices at the end of the chapter that illustrate how these conditions may be applied to different model specifications. Section 4 elaborates on the common sources of empirical underidentification. Section 5 discusses estimation tools that may be used to verify theoretical and empirical identification. Section 6 uses a case study to demonstrate the kinds of identification issues that might arise in practice. Section 7 concludes with a discussion of the limitations of the framework presented in this chapter and potential directions for future research.
2 THE HYBRID CHOICE MODEL
The HCM takes as its kernel the random utility maximization model, adding extensions wherever necessary to relax some of the more limiting assumptions of the kernel. In introducing the different components of the HCM, we begin by summarizing a special case of the HCM as outlined by Walker and Ben-Akiva (2002). Section 2.1 presents the framework of the random utility maximization model. Section 2.2 builds on this framework through the inclusion of more flexible error structures, which allow for unrestricted substitution patterns and serial correlation, and of random taste heterogeneity, which captures unobservable variation in sensitivity to alternative attributes and individual characteristics. Section 2.3 incorporates the influence of latent variables and psychometric data that capture the effects of more abstract psychological constructs on observable behavior. Walker and Ben-Akiva (2002) take the HCM further through the inclusion of latent classes and the combination of stated and revealed preference data. We refrain from including these extensions in our analysis because the set of sufficient conditions for theoretical identification developed in section 3 can easily be broadened to cover these more general cases. Instead, section 2.4 extends the framework in a different direction by allowing for causal relationships between multiple explanatory variables. Causal relationships are commonplace in studies using structural equation models, and are steadily gaining in popularity in studies employing HCMs as well (see, for example, Temme et al., 2008; Zhao, 2009; Tudela et al., 2011). Furthermore, the identification conditions for models with causal relationships between explanatory variables are distinct and deserve explicit treatment. For these reasons, we include them in our representation of the HCM.
2.1 The Random Utility Maximization Kernel
Consider a decision maker $n$ ($n = 1, \dots, N$), faced with a set of mutually exclusive alternatives $j$ ($j = 1, \dots, J$), where we have assumed, for the sake of notational convenience, that the number of alternatives faced by all individuals is the same. The random utility maximization model states that the chosen alternative is that which provides the greatest utility, and the model is mathematically formulated as:

$$y_{nj} = \begin{cases} 1 & \text{if } u_{nj} \ge u_{nj'} \text{ for } j' = 1, \dots, J \\ 0 & \text{otherwise} \end{cases} \qquad (22.1)$$

$$u_{nj} = x'_{nj} \beta + \varepsilon_{nj}, \qquad (22.2)$$
where $y_{nj}$ is an indicator of the observed choice, $u_{nj}$ is the perceived utility of alternative $j$ for individual $n$, $x_{nj}$ is an $(L \times 1)$ vector of all explanatory variables, $\beta$ is an $(L \times 1)$ vector of parameters, and $\varepsilon_{nj}$ is the stochastic component. The set of explanatory variables $x_{nj}$ may represent both characteristics of the decision maker and attributes of the alternative. Equation (22.1) is the measurement equation of the choice model, and it links the unobservable utilities $u_{nj}$ to the observed choice indicators $y_{nj}$.
Figure 22.1 Random utility maximization model framework (the observable explanatory variables $X$ enter the latent utility $u$ through a structural relationship, and $u$ links to the observed choice $y$ through a measurement relationship, with associated disturbances; source: Walker and Ben-Akiva, 2002)
Equation (22.2) is the structural equation of the choice model, and it links the explanatory variables $x_{nj}$ to the unobservable utilities $u_{nj}$. For random utility maximization models (Figure 22.1) the specification of the absolute levels of the utilities is irrelevant; only their differences matter. Nonetheless, we specify the model in level form (that is, $u_{nj}$, $j = 1, \dots, J$) rather than in difference form (for example, $(u_{nj} - u_{nJ})$, $j = 1, \dots, (J-1)$). While working with the difference form would greatly simplify the identification and normalization problem, behaviorally it is often more meaningful to specify and estimate the models in the level, or structural, form. Employing a more compact vector form, we get:

$$y_n = [y_{n1}, \dots, y_{nJ}]' \qquad (22.3)$$

$$u_n = X'_n \beta + \varepsilon_n, \qquad (22.4)$$
where $y_n$, $u_n$ and $\varepsilon_n$ are $(J \times 1)$ vectors and $X_n$ is an $(L \times J)$ matrix. The random utility maximization model forms the kernel of the HCM. In selecting a particular form for the kernel, one of the things to be kept in mind is that it should be computationally tenable. The GEV model has a closed-form solution that renders it a natural choice for the kernel. Within the GEV family of models the analyst could choose multinomial logit, the most basic of model forms, and introduce further complexity through the addition of mixture distributions. Alternatively, the analyst could start with a more complex form, such as the nested or cross-nested logit model, as the kernel. From the standpoint of estimation, it is not always clear which is preferable (Walker and Ben-Akiva, 2002). The procedure for establishing identification is independent of the kernel model form, and may be used to identify kernel model forms belonging to the GEV family, such as multinomial logit and nested logit, and kernel model forms outside the GEV family, such as multinomial probit. We shall assume throughout the chapter that multinomial logit forms the kernel of the HCM, and the identification conditions will be derived for this special case. It is left to the reader to derive the analogous conditions of identification for other kernel model forms.
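A minimal sketch of the kernel, ours rather than the chapter's, helps fix ideas: with i.i.d. extreme value disturbances, equations (22.1)–(22.2) generate choices whose shares match the multinomial logit probabilities. All dimensions and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, L = 1000, 3, 2
X = rng.normal(size=(N, J, L))       # attributes of each alternative
beta = np.array([1.0, -0.5])

# u_nj = x'_nj beta + eps_nj with i.i.d. extreme value disturbances
u = X @ beta + rng.gumbel(size=(N, J))
y = u.argmax(axis=1)                 # chosen alternative per decision maker

# the implied multinomial logit choice probabilities
v = X @ beta
p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
print(np.bincount(y, minlength=J) / N)  # simulated choice shares
print(p.mean(axis=0))                   # model shares; the two should be close
```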
2.2 Flexible Disturbances
The GEV family of models, though computationally tractable, is deficient in other ways, due largely to the rigidity of the error structure. On the other hand, random utility maximization models, such as the multinomial probit, that offer more flexible error structures can be computationally burdensome. The specific case of the HCM considered in this chapter combines the advantages of both model forms. The random utility term $\varepsilon_n$ is made up of two components: a probit-like variable with a multivariate distribution, and an i.i.d. extreme value random variable corresponding to the logit kernel. The probit-like term allows for a rich covariance structure and the extreme value term aids computation (Walker and Ben-Akiva, 2002). We specify the random utility term $\varepsilon_n$ using the factor-analytic structure shown below:

$$\varepsilon_n = F_n \xi_n + \nu_n, \qquad (22.5)$$

where $\xi_n$ is an $(R \times 1)$ vector of $R$ multivariate latent factors, $F_n$ is a $(J \times R)$ matrix of factor loadings that map the factors to the error structure, and $\nu_n$ is a $(J \times 1)$ vector of i.i.d. extreme value random variables with mean zero and variance $g/\mu^2$, where $\mu$ is the scale and $g$ is the variance of a standard extreme value random variable ($g$ equals $\pi^2/6$). For estimation purposes, it is desirable to specify the factors as independent, leading us to decompose $\xi_n$ as follows:

$$\xi_n = U \eta^R_n, \qquad (22.6)$$

where $\eta^R_n$ is an $(R \times 1)$ vector of independent factors with mean zero and variance one, $UU'$ is the covariance matrix of $\xi_n$, and $U$ is an $(R \times R)$ lower triangular matrix that is the Cholesky factorization of the covariance matrix. Equations (22.5) and (22.6) may be combined to obtain the following factor-analytic form for the error term:

$$\varepsilon_n = F_n U \eta^R_n + \nu_n. \qquad (22.7)$$

In principle, the distribution of $\varepsilon_n$ is associated with a $(J \times J)$ covariance matrix, for which level and scale corrections eventually result in a $(J-1) \times (J-1)$ covariance matrix requiring an additional normalization in terms of scale. The factor-analytic structure was first proposed by McFadden (1984) in the context of multinomial probit models to ease estimation. In the case of the multinomial probit model, the error term $\nu_n$ does not enter equation (22.7) and the error structure is wholly captured by the factor-analytic term $F_n U \eta^R_n$. The elements of $F_n$ are specified by the analyst according to some prior hypotheses about the covariance structure of the sample, whereas the elements of $U$ are parameters to be estimated. The only limitation of the factor-analytic structure is that the utility specification should be additively separable into the systematic and the stochastic component, such that the systematic component comprises the expectation of the random utility and the elements of $\eta^R_n$ are distributed with mean zero. These restrictions rule out the use of one-sided distributions, such as the lognormal or the triangular, for elements of $\eta^R_n$. For heteroskedastic, nested and cross-nested covariance structures this is rarely a problem, and the factor-analytic specification can suitably capture these
covariance structures with relatively few parameters. Bunch (1991) presents a set of comprehensive guidelines for the identification of multinomial probit models, and Walker et al. (2007) adapt these guidelines to the case of the mixed logit model with factor-analytic specifications. However, one-sided distributions are commonly used when imposing random parameters on alternative attributes and individual characteristics that are expected to have either a positive or a negative effect on the decision-making process, but not both. For models that do employ one-sided distributions, the conditions of identification developed in this chapter cannot be used directly. In such cases the reader may apply the conditions to an analogous model with two-sided distributions that can be represented using a factor-analytic specification, as a means of gathering information regarding the identification status of the original model specification. Additionally, the reader may use the estimation methods described in greater detail in section 5. Substituting equation (22.7) into equation (22.4), we get:

$$u_n = X'_n \beta + F_n U \eta^R_n + \nu_n. \qquad (22.8)$$
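As an illustration of equation (22.7) — ours, under a hypothetical two-nest structure with $J = 4$ alternatives and $R = 2$ factors — the sketch below constructs one draw of the composite error and the implied covariance $F U U' F'$ of its probit-like component.

```python
import numpy as np

rng = np.random.default_rng(0)
J, R = 4, 2
# F assigns alternatives 1-2 to nest 1 and alternatives 3-4 to nest 2
F = np.array([[1., 0.],
              [1., 0.],
              [0., 1.],
              [0., 1.]])
U = np.diag([0.8, 1.2])          # lower triangular Cholesky factor (here diagonal)
mu = 1.0                          # extreme value scale

eta = rng.standard_normal(R)                # independent factors, mean 0, variance 1
nu = rng.gumbel(scale=1.0 / mu, size=J)     # i.i.d. extreme value terms with scale mu
eps = F @ U @ eta + nu                      # one draw of equation (22.7)

# implied covariance of the probit-like component: F U U' F'
print(F @ U @ U.T @ F.T)
```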
The variables $\nu_n$ and $\eta^R_n$ are unknown, whereas the variable $X_n$ is known. The unknown parameters are $\mu$ (the scale of $\nu_n$) and the elements of $\beta$ and $U$. Though the matrix $F_n$ may include unknown parameters, such cases are rare in the literature, and we will assume for the remainder of this chapter that $F_n$ is known. Furthermore, we will retain the extreme value scale term $\mu$ instead of normalizing it to 1. When one arbitrarily sets the scale of one of the elements of $U$, the scale of the model (that is, $\mu$) changes, and this change is reflected in the scale of the estimated parameters in $\beta$. Therefore, it is necessary to retain $\mu$ in order to interpret the impact of the normalization of $U$ on the remaining parameter estimates.

2.3 Latent Variables

Analysts are often interested in the influence exerted on observable individual behavior by biological, psychological and sociological factors, such as attitudes, norms, perceptions, affects and beliefs. Unfortunately, most of these constructs are not well defined and cannot be directly measured, and they are therefore referred to as latent variables. Just as utility as a latent construct is operationalized with the help of observable choices, the latent variables are operationalized with the help of indicators, which most often consist of individual responses to survey questions regarding, for example, the level of agreement with attitudinal statements or satisfaction with alternative attributes. Though the latent variable itself is not observed, its effect on observable variables, or indicators, can be measured, and the nature of the relationship can provide information about the underlying latent variable (Ben-Akiva et al., 2002). Figure 22.2 extends the framework of the random utility maximization model to include the additional effect of latent variables. We introduce the superscripts $o$ and $l$ to denote observable and latent variables, respectively. As labeled in the figure, the model comprises two components: the discrete choice sub-model and the latent variable sub-model. The discrete choice sub-model comprises the logit kernel with the flexible disturbances built on top of it. The latent variable sub-model maps the indicators onto the latent variables.
Figure 22.2 Integrated choice and latent variable model framework (observed variables and latent variables enter the utility $u$ and hence the choice $y$ in the discrete choice sub-model; the latent variables are measured by indicators $I$ in the latent variable sub-model; source: adapted from Ben-Akiva et al., 2002)
Indicator responses vary across individuals and, depending upon the latent variable of interest, could also vary across alternatives (and across observations, when working with panel data). For the sake of notational convenience we assume that the indicators vary across individuals and alternatives, as in the case of perceptions regarding alternative attributes. In the simplest case, a linear model is appropriate for describing the mapping of the indicators onto the latent variables, leading to the following equation for the measurement model:

$$i_{nj} = \Lambda x^{l}_{nj} + \delta_{nj}, \qquad (22.9)$$
where $i_{nj}$ is a $(Q \times 1)$ vector of observed indicators, $x^{l}_{nj}$ is the $(L^{l} \times 1)$ vector of latent variables, $\delta_{nj}$ is a $(Q \times 1)$ vector of measurement errors, and $\Lambda$ is a $(Q \times L^{l})$ matrix of coefficients relating the indicators to the latent variables. HCM studies in the literature usually assume that indicator responses are uncorrelated with each other. However, this need not always be the case. For instance, when measuring individual perceptions regarding a specific attribute of each of the alternatives in the choice set, indicator responses for different alternatives for the same individual might be correlated. Similarly, when working with panel data, individual responses to the same indicator might be serially correlated across time. To capture these and other potential sources of correlation, we employ the following factor-analytic representation for the measurement errors:

$$\delta_{nj} = D \Theta \eta^C_{nj}, \qquad (22.10)$$
where $\eta^C_{nj}$ is a $(C \times 1)$ vector of $C$ independent random factors with mean zero and variance one, $\Theta\Theta'$ is the covariance matrix of $\Theta\eta^C_{nj}$ such that $\Theta$ is the $(C \times C)$ lower triangular matrix that is its Cholesky factorization, and $D$ is a $(Q \times C)$ matrix of factor loadings that map the factors to the error structure. Owing to the central limit theorem, the elements of $\eta^C_{nj}$ are usually assumed to be standard normal. However, if the analyst has strong a priori reasons to believe that some other distributional form, such as the extreme value or the Laplace, might be a better fit, then that too can be employed.
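A short sketch, ours and under hypothetical dimensions and loadings, shows equations (22.9)–(22.10) generating one vector of indicator responses from the latent variables; the names `Lam`, `D` and `Theta` stand in for $\Lambda$, $D$ and $\Theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, L_lat, C = 5, 2, 5            # indicators, latent variables, error factors
Lam = np.array([[1.0, 0.0],       # each indicator loads on one latent variable
                [0.8, 0.0],
                [0.6, 0.0],
                [0.0, 1.0],
                [0.0, 0.7]])
D = np.eye(Q, C)                  # loadings mapping the factors to the errors
Theta = 0.5 * np.eye(C)           # lower triangular Cholesky factor

x_lat = rng.standard_normal(L_lat)            # latent variables for one (n, j)
eta_c = rng.standard_normal(C)                # independent factors, mean 0, var 1
i_nj = Lam @ x_lat + D @ Theta @ eta_c        # equations (22.9)-(22.10)
print(i_nj)
```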
2.4 Causal Relationships

The framework shown in Figure 22.2 can be generalized further to include causal relationships within and across observable and latent variables. The literature on HCMs is dominated by studies employing some variation of the multiple indicator multiple cause (MIMIC) model, where each latent variable is measured by multiple indicator responses and explained by multiple observable variables. The use of more complex causal relationships between explanatory variables in HCMs has been limited thus far, with only a handful of studies looking beyond the MIMIC model for a representation of individual behavior (see, for example, Temme et al., 2008; Zhao, 2009; Tudela et al., 2011). However, these methods are employed widely in psychology and the social sciences to test theoretical relationships between multiple variables, observable or latent. They are powerful tools for analyzing the mediating influence of intervening constructs on observable individual behavior, and are a natural extension to the MIMIC models currently being used by most studies. Figure 22.3 shows the generalized framework of the HCM with causal relationships between the variables. It is helpful to differentiate between exogenous and endogenous explanatory variables. A variable whose value is determined by the states of other variables in the model is an endogenous variable, superscripted $d$, whereas a variable whose value is independent of the states of other variables is an exogenous variable, superscripted $x$. Both latent and observable variables can be either exogenous or endogenous, resulting in the four-way stratification of explanatory variables shown in Figure 22.3. As can further be seen from Figure 22.3, two components of the HCM can be distinguished: a discrete choice sub-model and a structural equations sub-model. The discrete choice sub-model is the same as before: equations (22.1) and (22.2) are still the measurement and structural components of the sub-model, respectively. For the structural equations sub-model, equations (22.9) and (22.10), which correspond to the latent variable sub-model from Figure 22.2, form the measurement component of the structural equations sub-model.
Figure 22.3 HCM framework (latent endogenous ($X^{ld}$), observed endogenous ($X^{od}$), observed exogenous ($X^{ox}$) and latent exogenous ($X^{lx}$) explanatory variables; the structural equations sub-model links these variables to the indicators $I$, and the discrete choice sub-model links them to the utility $u$ and the choice $y$; the bi-directional arrow in the original figure denotes correlation)
The structural component of the sub-model corresponds to the causal relationships between the explanatory variables themselves, the equations for which may be written as follows:

$$x^d_{nj} = B x^d_{nj} + \Gamma x^x_{nj} + \zeta_{nj}, \qquad (22.11)$$
where $x^d_{nj}$ is the $(L^d \times 1)$ vector of endogenous variables, $x^x_{nj}$ is the $(L^x \times 1)$ vector of exogenous variables, $\zeta_{nj}$ is an $(L^d \times 1)$ vector of random errors, $B$ is an $(L^d \times L^d)$ matrix of coefficients for the endogenous variables, and $\Gamma$ is an $(L^d \times L^x)$ matrix of coefficients for the exogenous variables. Among the studies mentioned above that have incorporated causal relationships within the HCM framework, all have assumed that the endogenous variables are uncorrelated. However, studies in psychology and the social sciences that use structural equation models routinely introduce correlation between endogenous variables to help distinguish correlation from causation. We propose to do the same within the HCM framework through the use of the following factor-analytic representation for the covariance structure of the errors:

$$\zeta_{nj} = G \Psi \eta^S_{nj}, \qquad (22.12)$$
where $\eta^S_{nj}$ is an $(S \times 1)$ vector of $S$ independent random factors with mean zero and variance one, $\Psi\Psi'$ is the covariance matrix of $\Psi\eta^S_{nj}$ such that $\Psi$ is the $(S \times S)$ lower triangular matrix that is its Cholesky factorization, and $G$ is an $(L^d \times S)$ matrix of factor loadings that map the factors to the error structure. As was the case in section 2.3, the elements of $\eta^S_{nj}$ are usually assumed to be standard normal, but if the analyst so desires, other distributional forms may also be used.
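Because equation (22.11) places the endogenous variables on both sides, it is typically solved in reduced form as $x^d = (I - B)^{-1}(\Gamma x^x + \zeta)$. The sketch below is our illustration under hypothetical dimensions and coefficient values, with `Gamma`, `G` and `Psi` standing in for $\Gamma$, $G$ and $\Psi$.

```python
import numpy as np

rng = np.random.default_rng(0)
Ld, Lx, S = 2, 3, 2
B = np.array([[0.0, 0.0],         # x_2^d depends on x_1^d; no feedback loop
              [0.4, 0.0]])
Gamma = rng.normal(size=(Ld, Lx))
G = np.eye(Ld, S)
Psi = 0.3 * np.eye(S)

x_x = rng.standard_normal(Lx)                     # exogenous variables
zeta = G @ Psi @ rng.standard_normal(S)           # equation (22.12)
# solve x^d = B x^d + Gamma x^x + zeta for the endogenous variables
x_d = np.linalg.solve(np.eye(Ld) - B, Gamma @ x_x + zeta)
print(x_d)
```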
3 THEORETICAL IDENTIFICATION
A model is theoretically identifiable if it is possible to infer the true underlying parameter values given an infinitely large number of observations. Identifiability precludes observational equivalence: if a model is identifiable, then no two sets of parameter values result in the same probability distribution of observable variables. As with any complex econometric model, identification is an issue with the HCM. Although the identification problem has been explored in substantial detail for special cases, a more general framework remains lacking, largely owing to the complexity of HCMs, which renders infeasible any monolithic examination of the model form. A more tractable approach is to break the model apart into smaller sub-models that can then be examined independently and more fruitfully. The approach adopted in this chapter consists of breaking the HCM into a discrete choice model where the latent variables are treated as observable variables, and a structural equation model with latent variables. The normalizations and restrictions that apply to a discrete choice model without latent variables also apply here, as do the identification rules that apply to a traditional structural equations model with latent variables. Therefore, a sufficient but not necessary condition for identification can be obtained by extending the two-step rule used for structural equation models
with latent variables (Bollen, 1989) to a three-step rule for HCMs (Ben-Akiva et al., 2002):

1. Confirm that the measurement component of the structural equations sub-model is identified, reformulating the equations as a confirmatory factor analysis.
2. Confirm that the structural component of the structural equations sub-model is identified, reformulating the equations as a structural equations model with observable variables and treating each latent variable like an observed variable that is perfectly measured.
3. Confirm that the structural component of the discrete choice sub-model is identified, treating each explanatory variable like an exogenous observed variable that is perfectly measured.

For example, consider the HCM shown in Figure 22.4A. It may be broken apart into three constituent sub-models. The confirmatory factor analytic model (Figure 22.4B) comprises the two correlated latent variables $X^{ld}_1$ and $X^{lx}_2$ and the five indicators used to measure them. In pulling the latent variables apart from the HCM and reformulating the model as a confirmatory factor analytic model, the structural relationships between the latent variables need to be ignored, and additional relationships that capture correlation between each pair of latent variables must be introduced (even if the latent variables are not structurally related to each other). The structural equations model (Figure 22.4C) is similar to the corresponding sub-model in Figure 22.4A, except that the two latent variables $X^{ld}_1$ and $X^{lx}_2$ may now be treated as observable variables, and additional relationships need to be introduced that capture correlation between each pair of exogenous variables, as would be the case with a structural equations model with observable variables. Lastly, the discrete choice model (Figure 22.4D) is relatively straightforward in that all explanatory variables may be treated as correlated exogenous observed variables. If identification can be established for each of the three sub-models, then the three-step rule states that the HCM as a whole is identifiable as well.

A limitation of the three-step rule is that it provides a set of sufficient but not necessary conditions for theoretical identification. Conditional on the latent variables, the discrete choice sub-model and the measurement component of the structural equations sub-model can be examined independently of one another. The challenge to the researcher then is to ensure that the structural component of the structural equations sub-model can be identified from either or both components. In the three-step rule, the structural component of the structural equations sub-model relies solely upon the measurement component of the structural equations sub-model, reformulated as a confirmatory factor analytic model, for identification, and ignores the information available to the analyst about the latent variables through the discrete choice sub-model (Daziano and Bolduc (2012) provide an excellent discussion of the additional insights into the identification problem that can be gained from a joint examination of the discrete choice sub-model and the structural equations sub-model). As a consequence, if one or more components of the sub-models resulting from the application of the three-step rule fail identification, the HCM may still be identified. In such cases, unless the analyst can verify that the HCM is indeed identified, it might be better to impose appropriate constraints to ensure that each of the three sub-models is identifiable and that the HCM satisfies the three-step rule.
Figure 22.4 An example of an HCM and how it might be broken apart for the purpose of theoretical identification: (A) the HCM; (B) confirmatory factor analytic sub-model; (C) structural equations sub-model with observable variables; and (D) discrete choice sub-model
Figure 22.5 Roadmap to section 3 (3.1 General rules of identification; 3.2 Confirmatory factor analysis, with 3.2.1 Analysis of the covariance matrix and 3.2.2 Alternative rules of identification; 3.3 SEM with observable variables, with 3.3.1 Analysis of the covariance matrix and 3.3.2 Alternative rules of identification; 3.4 Discrete choice models, with 3.4.1 Systematic parameters and 3.4.2 Error structure)
The objective of this section is to provide a repository of information regarding the theoretical identification of the different pieces that comprise an HCM. In the spirit of structural equation models, Figure 22.5 shows a roadmap to section 3 in the form of a path diagram, to help the reader negotiate the section. Section 3.1 derives rules of identification for a general econometric model form through the analysis of its covariance structure, and sections 3.2, 3.3 and 3.4 apply these rules to confirmatory factor analytic models (as in Figure 22.4B), structural equation models with observable variables (as in Figure 22.4C) and discrete choice models (as in Figure 22.4D), respectively. Sections 3.2 and 3.3 also present a selection of alternative rules of identification for confirmatory factor analytic models and structural equation models with observable variables, respectively. Section 3.4 addresses discrete choice models with flexible covariance structures. Section 3 need not be read in its entirety, and the reader is encouraged to use Figure 22.5 to take whichever path appears best suited to the problem at hand. For example, for a discrete choice model with random parameters and cross-sectional data the reader need only read sections 3.1, 3.4, 3.4.1 and 3.4.2. Similarly, for a relatively simple confirmatory factor analytic model the reader might be better served by one of the alternative rules of identification, and should refer first to sections 3.2 and 3.2.2. If none of these rules applies, the reader may switch to the other branch and refer to sections 3.1, 3.2 and 3.2.1.
3.1 General Rules of Identification
In this section, we present a framework for identification, based on the analysis presented in Bollen (1989) and Walker et al. (2007), that can be applied to any general econometric model of the form:

$$y_n = X y_n + P x_n + R T \eta^W_n, \qquad (22.13)$$
where $y_n$ is a $(G \times 1)$ vector of endogenous variables, $x_n$ is an $(H \times 1)$ vector of exogenous variables, $X$ is a $(G \times G)$ matrix of coefficients of the endogenous variables, $P$ is a $(G \times H)$ matrix of coefficients of the exogenous variables, $\eta^W_n$ is a $(W \times 1)$ vector of $W$ independent random factors with mean zero and variance one, $TT'$ is the covariance matrix of $T\eta^W_n$ such that $T$ is the $(W \times W)$ lower triangular matrix that is its Cholesky factorization, and $R$ is a $(G \times W)$ matrix of factor loadings that map the factors to the error structure. The variables $y_n$ and $x_n$ are observed without any measurement error; the elements of $R$ are specified by the analyst based on some prior hypothesis about the covariance structure of the dependent variable $y_n$; and the elements of the matrices $X$, $P$ and $T$ are parameters to be estimated. The procedure to check that equation (22.13) is identifiable involves the following steps:

1. Hypothesize the model of interest: select the endogenous variables $y_n$ that are relevant to the study, the exogenous variables $x_n$ that are expected to influence the outcome of $y_n$, the structure of the coefficient matrices $X$ and $P$, and an a priori specification of the covariance structure as denoted by $R$.
2. Formulate the covariance matrix of the observed variables $y_n$ and $x_n$ as a function of the unknown parameters in $X$, $P$ and $T$. The identification conditions are based on the hypothesis that the covariance matrix of the observed variables is a function of a set of parameters. If the hypothesized model is correct and the analyst knows the parameters in $X$, $P$ and $T$, then the population covariance matrix can be exactly reproduced, such that:

$$W = W(q), \qquad (22.14)$$

where $W$ is the $(G+H) \times (G+H)$ sample covariance matrix of the observed variables $y_n$ and $x_n$, such that the element $w_{ij}$ represents the covariance between the $i$th and $j$th observable variables in the sample population; the vector $q$ is the set of all unknown parameters in $X$, $P$ and $T$; and $W(q)$ is the covariance matrix as predicted by the model specification, and is a function of $q$. Equation (22.14) results in a system of simultaneous equations where the left-hand side contains the observables (as calculated from the data) and the right-hand side contains the unknowns (as predicted by the model specification).
3. Apply the order condition, which states that for a covariance matrix $W(q)$ of dimension $(G+H) \times (G+H)$, the number of estimable parameters $S$ should satisfy:

$$S \le \frac{(G+H)(G+H+1)}{2}, \qquad (22.15)$$
where the upper bound is equal to the number of unique elements in the covariance matrix $W(q)$, or the maximum potential number of independent equations available from equation (22.14). The order condition is a necessary but insufficient condition of identification, and depending on the hypothesized model structure the number of parameters that can be estimated is often smaller than that suggested by the order condition. Nonetheless, the order condition does provide a quick check to avoid major blunders, and there are models that have been published that do not pass this test.
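The order condition is mechanical enough to automate. The following one-function sketch (ours, purely illustrative) checks a hypothesized specification against the bound of equation (22.15).

```python
def order_condition(G, H, n_params):
    """Necessary (but not sufficient) order condition of equation (22.15):
    the number of estimable parameters cannot exceed the number of unique
    elements in the (G+H) x (G+H) covariance matrix of the observed variables."""
    bound = (G + H) * (G + H + 1) // 2
    return n_params <= bound, bound

# e.g. 3 endogenous and 2 exogenous variables allow at most 15 parameters
print(order_condition(G=3, H=2, n_params=12))   # (True, 15)
```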
4. Determine whether the system of equations represented by (22.14) can be solved for all of the unknown parameters in $q$. If a parameter in $q$ can be written as a function solely of one or more elements of $W$ and none of the elements of $q$, that parameter is identified. If all unknown parameters in $q$ are identified, then the model as a whole is identified; if not, the necessary constraints need to be imposed to ensure identifiability. A model specification is said to be just-identified if all the parameters are identified and the system of equations represented by (22.14) results in an equal number of independent equations and unknown parameters. A model specification is said to be overidentified if all the parameters are identified and the system of equations represented by (22.14) results in more independent equations than unknown parameters. Just-identified models yield a trivially perfect fit and are uninteresting from the standpoint of analysis. Since overidentified models do not always fit the observed data very well, when one does the analyst may take that to mean that the model is a reasonable representation of the behavior under study. A model is said to be underidentified if at least one of the model parameters cannot be identified. In general, the identified parameters in an underidentified model can still be consistently estimated.
5. When the conclusion from steps 3 and 4 is that further identifying restrictions are required, the equality condition is used to determine the set of acceptable normalizations. The equality condition states that any normalization must satisfy:

$$W(q_N) = W(q), \qquad (22.16)$$
where $\theta_N$ is the vector of parameters from the normalized model. It is necessary to verify that the imposed normalization does not otherwise restrict the model; that is, the covariance matrix must remain the same as before the restriction is imposed. The equality condition assumes particular importance for HCMs with the logit kernel, for reasons that will become clear in Section 3.4 and that are discussed in greater detail in Walker et al. (2007).
The identification steps described above apply to any general econometric model of the form shown in equation (22.13). Over the course of the following sections, we use these steps to ascertain the identifiability of the different components of the HCM, as outlined by the three-step rule.
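To make the equality condition concrete, the following sketch (our own illustration, assuming a one-factor, two-indicator model; `implied_cov` and all numeric values are hypothetical) verifies that a candidate normalization reproduces the same implied covariance matrix.

```python
import numpy as np

# Illustrative check of the equality condition (22.16): a candidate
# normalization theta_N is acceptable only if it reproduces the same
# implied covariance matrix as the unnormalized parameters theta.

def equality_condition(implied_cov, theta, theta_N, tol=1e-10):
    return np.allclose(implied_cov(theta), implied_cov(theta_N), atol=tol)

# Example with a one-factor, two-indicator model: scaling the factor
# variance down while scaling the loadings up leaves Sigma unchanged.
def implied_cov(t):
    lam = np.array([[t[0]], [t[1]]])   # factor loadings
    phi = np.array([[t[2]]])           # factor variance
    theta_eps = np.diag(t[3:5])        # measurement error variances
    return lam @ phi @ lam.T + theta_eps

theta   = np.array([0.8, 1.2, 4.0, 0.3, 0.5])
theta_N = np.array([1.6, 2.4, 1.0, 0.3, 0.5])  # phi normalized to 1
print(equality_condition(implied_cov, theta, theta_N))  # True
```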
3.2 Confirmatory Factor Analytic Model
Step 1 of the three-step rule requires that the measurement component of the structural equations sub-model, when reformulated as a confirmatory factor analytic model, be identifiable. Prior to model estimation, the indicators are usually processed: for each indicator response, the analyst calculates the deviation from the respective mean, and it is these deviations that are used directly during model estimation. Combining equations (22.9) and (22.10) from section 2.3, the measurement component of the structural equations sub-model may be restated as:

$i_{nj} = \Lambda x^*_{nj} + D \Theta \eta^C_{nj}$,  (22.17)
where $i_{nj}$ is a $(Q \times 1)$ vector of observed indicators representing deviations from the sample mean, $x^*_{nj}$ is an $(L^* \times 1)$ vector of latent variables, $\Lambda$ is a $(Q \times L^*)$ matrix of coefficients, $\eta^C_{nj}$ is a $(C \times 1)$ vector of independent random factors with mean zero and variance one, $\Theta$ is a $(C \times C)$ lower triangular matrix that is the Cholesky factorization of the covariance structure between the indicators, and $D$ is a $(Q \times C)$ matrix of factor loadings that map the random factors $\eta^C_{nj}$ to the covariance structure. In reformulating the measurement model as a confirmatory factor analytic model, we introduce the additional term $\Phi = E(x^*_{nj} x^{*\prime}_{nj})$, the $(L^* \times L^*)$ covariance matrix of the latent factors, which captures the correlation between each pair of latent variables. The parameters to be identified are the elements of $\Lambda$, $\Phi$ and $\Theta$.
Section 3.2.1 applies the general rules of identification presented in section 3.1 to the confirmatory factor analytic model of equation (22.17), and works through an example to demonstrate how the rules might be applied in practice. Unfortunately, with growing model complexity the general rules of identification can often prove unwieldy; in such cases, alternative rules can be more useful. Section 3.2.2 reviews some of the rules commonly employed in the literature to determine identifiability. These rules cover most confirmatory factor analytic models found in the literature, and readers uninterested in the mathematical details pertaining to the general case may skip ahead to section 3.2.2.
3.2.1 Analysis of the covariance matrix
This section is based on Bollen (1989), and for more details the reader is referred to the original text. For confirmatory factor analytic models, the indicators $i_{nj}$ are the only observable variables. Assuming that the vector of observable variables $i_{nj}$ in equation (22.17) denotes not absolute values but deviations from the mean, the covariance matrix may be parameterized as follows:

$\Sigma(\theta) = E(i_{nj} i'_{nj}) = E[(\Lambda x^*_{nj} + D\Theta\eta^C_{nj})(x^{*\prime}_{nj}\Lambda' + \eta^{C\prime}_{nj}\Theta' D')]$
$\Rightarrow \Sigma(\theta) = \Lambda E(x^*_{nj} x^{*\prime}_{nj})\Lambda' + D\Theta E(\eta^C_{nj}\eta^{C\prime}_{nj})\Theta' D'$
$\Rightarrow \Sigma(\theta) = \Lambda\Phi\Lambda' + D\Theta\Theta' D'$,  (22.18)
where $\Sigma(\theta)$ is the $(Q \times Q)$ parameterized covariance matrix (and is independent of the observed data). The covariance matrix of the observable variables $i_{nj}$ may therefore be parameterized in terms of the elements of $\Lambda$, $\Phi$, $D$ and $\Theta$. The elements of $D$ comprise zeros and ones, and must be set by the analyst based on prior hypotheses. The elements of $\Lambda$, $\Phi$ and $\Theta$ are unknown parameters. Combining equations (22.14) and (22.18), the identification problem may be restated as finding constraints that ensure a solution to the following system of nonlinear equations:

$\Sigma = \Lambda\Phi\Lambda' + D\Theta\Theta' D'$,  (22.19)
where $\Sigma$ here is the $(Q \times Q)$ sample covariance matrix of observable indicator responses $i_{nj}$ to each of the $Q$ indicator constructs. The left-hand side of equation (22.19) is a function of the observed data and the right-hand side is a function of the hypothesized model specification.
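Equation (22.19) also admits a numerical check of local identification: the model is locally identified at a point only if the Jacobian of the unique elements of $\Sigma(\theta)$ with respect to the free parameters has full column rank (Bollen, 1989). The sketch below is our own Python illustration under stated assumptions (one factor, three indicators, factor variance normalized to one, uncorrelated errors); it is a numerical aid, not a substitute for the algebraic analysis.

```python
import numpy as np

# A minimal sketch: local identification of the CFA model of (22.19)
# via the rank of the Jacobian of vecu(Sigma(theta)) with respect to
# theta (Bollen, 1989). Model assumed: one factor, three indicators,
# factor variance fixed to 1, uncorrelated measurement errors, so
# theta = (lambda_1, lambda_2, lambda_3, var_1, var_2, var_3).

def implied_cov(theta):
    lam = theta[:3].reshape(3, 1)
    err = np.diag(theta[3:])
    return lam @ lam.T + err           # Lambda Phi Lambda' + errors, Phi = 1

def vecu(m):
    return m[np.triu_indices_from(m)]  # unique elements of a symmetric matrix

def jacobian_rank(theta, eps=1e-6):
    base = vecu(implied_cov(theta))
    jac = np.empty((base.size, theta.size))
    for k in range(theta.size):
        t = theta.copy()
        t[k] += eps
        jac[:, k] = (vecu(implied_cov(t)) - base) / eps
    return np.linalg.matrix_rank(jac, tol=1e-8)

theta0 = np.array([0.9, 1.1, 0.7, 0.4, 0.3, 0.5])
print(jacobian_rank(theta0))  # 6 = number of free parameters -> locally identified
```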
The rules of identification presented in section 3.1 may now be applied to equation (22.19) to verify identifiability. The general approach for any confirmatory factor analytic model requires the analyst to be able to express all of the unknown parameters in $\Lambda$, $\Phi$ and $\Theta$ as some function of the elements of the sample covariance matrix $\Sigma$. Appendix A uses this approach to demonstrate why the analyst needs to impose constraints to set the scale of the latent variables, and how this might be accomplished, and subsequently applies equation (22.19) to evaluate identifiability of the confirmatory factor analytic model of Figure 22.4B.
3.2.2 Alternative rules of identification
Ensuring an algebraic solution to equation (22.19) can often be tedious and error-prone (Bollen, 1989). For these reasons, researchers have developed alternative rules for some of the more popularly employed model forms. Over the following paragraphs, we review some of the rules commonly cited in the literature and useful to the identification of HCMs. There are perhaps as many rules of identification as there are model forms. The rules presented here are by no means an exhaustive set and apply only to models with a factor complexity of one, that is, models where each indicator loads on a single latent variable. Notable among the rules not included here are the set of necessary and sufficient conditions for identification of models with factor complexity one developed by Reilly (1995), and the set of sufficient but not necessary conditions for identification of models with arbitrary factor complexity developed by Reilly and O'Brien (1996). Lastly, each of the following rules assumes that the scale of the latent factors has already been set through the imposition of appropriate constraints on $\Phi$, the covariance matrix between the latent factors (readers interested in more details on how or why to set the scale of the latent factors should refer to Appendix A).
1. The three-indicator rule (Bollen, 1989). A model with one or more factors is identified when it has (1) three or more indicators per latent factor; (2) a factor complexity of one; and (3) uncorrelated measurement errors between the indicators. The three-indicator rule places no additional restrictions on $\Phi$, the covariance matrix between the latent factors, other than those required to set the scale of the latent factors. It is a sufficient but not necessary condition for identification.
2. The two-indicator rule (Bollen, 1989). This is an alternative sufficient condition, which states that a model with one or more latent factors is identified when it has (1) two or more indicators per latent factor; (2) a factor complexity of one; (3) uncorrelated measurement errors between the indicators; and (4) each latent factor correlated with at least one other latent factor.
3. Single factor one indicator rule (O'Brien, 1994). For a single factor model with two uncorrelated indicators, if the factor loading of one of the indicators is identified (based on some other rule), then the factor loading of the other indicator is also identified.
4. Single factor error variance rule (O'Brien, 1994). For a single factor model with any number of indicators, if the factor loading of a particular indicator is identified, then the variance of the measurement error of the same indicator is also identified.
5. Single factor error covariance rule (O'Brien, 1994). For a single factor model with two correlated indicators, if the factor loadings of both indicators are identified, then the covariance between the measurement errors of the two indicators is also identified.
6. Multifactor two indicator one indicator rule (O'Brien, 1994). If a model has two latent factors and three uncorrelated indicators such that two indicators load on one factor and one indicator on the other factor, then the loadings of the two indicators loading on the same factor are identified.
7. Latent variable covariance rule (O'Brien, 1994). For a model with two correlated latent factors and two uncorrelated indicators such that each indicator loads on one of the two latent factors, if the loadings of both indicators are identified, then the covariance between the corresponding latent factors is also identified.
8. Multifactor one-indicator rule (O'Brien, 1994). For a model with two correlated latent factors and two uncorrelated indicators such that each indicator loads on one of the two latent factors, if the covariance between the two latent factors and one of the factor loadings are both identified, then the other factor loading is also identified.
9. Multifactor error covariance rule (O'Brien, 1994). For a model with two correlated latent factors and two correlated indicators such that each indicator loads on one of the two latent factors, if the covariance between the two latent factors and both factor loadings are identified, then the covariance between the measurement errors of the two corresponding indicators is also identified.
The different rules together may read much like a cookbook, but the proof of any of these rules can be derived fairly straightforwardly through the analysis of the covariance matrix of the analogous model using the methods described in section 3.2.1. Furthermore, through repeated application of the rules in the right sequential order, the reader can address the identification problem for fairly complex confirmatory factor analytic models. For more details on how these rules may be used in practice, the reader is referred to O'Brien (1994).
3.3 Structural Equation Models with Observable Variables
Step 2 of the three-step rule requires that the structural component of the structural equations sub-model, when reformulated as a structural equations model with observable variables, be identifiable. For the purposes of identification, we assume throughout this section that the endogenous and exogenous variables have been processed such that they represent deviations from the sample mean. Note that this is purely for notational convenience, and does not change the identification problem for structural equation models with observable variables. Combining equations (22.11) and (22.12) from section 2.4, the structural component of the structural equations sub-model may be stated as:

$x^\delta_{nj} = B x^\delta_{nj} + \Gamma x^x_{nj} + G \Psi \eta^S_{nj}$,  (22.20)
where $x^\delta_{nj}$ is the $(L_\delta \times 1)$ vector of endogenous variables representing deviations from the sample mean, $x^x_{nj}$ is the $(L_x \times 1)$ vector of exogenous variables representing deviations from the sample mean, $B$ is an $(L_\delta \times L_\delta)$ matrix of coefficients, $\Gamma$ is an $(L_\delta \times L_x)$ matrix of coefficients, $\eta^S_{nj}$ is an $(S \times 1)$ vector of independent random factors with mean zero
and variance one, $\Psi$ is an $(S \times S)$ lower triangular matrix that is the Cholesky factorization of the covariance structure between the endogenous variables, and $G$ is an $(L_\delta \times S)$ matrix of factor loadings that map the random factors $\eta^S_{nj}$ to the covariance structure. In reformulating the structural component as a structural equations model with observed variables, we assume that both $x^\delta_{nj}$ and $x^x_{nj}$ are observed, and reintroduce the symbol $\Phi$, on this occasion to represent the $(L_x \times L_x)$ covariance matrix of the exogenous variables $x^x_{nj}$, that is, $\Phi = E(x^x_{nj} x^{x\prime}_{nj})$. The unknown parameters are the elements of $B$, $\Gamma$, $\Psi$ and $\Phi$.
Section 3.3.1 applies the general rules of identification presented in section 3.1 to the structural equations model with observable variables of equation (22.20). Section 3.3.2 presents a set of alternative rules that cover the two cases most commonly employed in the literature on HCMs: linear regression and recursive models. Readers uninterested in the general conditions may skip ahead to section 3.3.2.
3.3.1 Analysis of the covariance matrix
This section is based on findings presented in Fisher (1976) and Bollen (1989), and readers interested in more details are referred to the original texts. For structural equation models with observed variables, both the vector of endogenous variables $x^\delta_{nj}$ and the vector of exogenous variables $x^x_{nj}$ are observed, and the $(L_\delta + L_x) \times (L_\delta + L_x)$ covariance matrix of observed variables may be given by:

$\Sigma(\theta) = \begin{bmatrix} \mathrm{Var}(x^\delta_{nj}) & \\ \mathrm{Cov}(x^x_{nj}, x^\delta_{nj}) & \mathrm{Var}(x^x_{nj}) \end{bmatrix}$.  (22.21)
Rearranging equation (22.20), we get $x^\delta_{nj} = (J_{L_\delta} - B)^{-1}(\Gamma x^x_{nj} + G\Psi\eta^S_{nj})$, where $J_{L_\delta}$ denotes the $(L_\delta \times L_\delta)$ identity matrix. Then, the variance of the endogenous variables $x^\delta_{nj}$ can be calculated as follows:

$\mathrm{Var}(x^\delta_{nj}) = E(x^\delta_{nj} x^{\delta\prime}_{nj}) = E[(J_{L_\delta} - B)^{-1}(\Gamma x^x_{nj} + G\Psi\eta^S_{nj})(x^{x\prime}_{nj}\Gamma' + \eta^{S\prime}_{nj}\Psi'G')(J_{L_\delta} - B)^{-1\prime}]$
$= (J_{L_\delta} - B)^{-1}(\Gamma E[x^x_{nj} x^{x\prime}_{nj}]\Gamma' + G\Psi E[\eta^S_{nj}\eta^{S\prime}_{nj}]\Psi'G')(J_{L_\delta} - B)^{-1\prime}$
$= (J_{L_\delta} - B)^{-1}(\Gamma\Phi\Gamma' + G\Psi\Psi'G')(J_{L_\delta} - B)^{-1\prime}$,  (22.22)
where it is assumed that the exogenous variables $x^x_{nj}$ are uncorrelated with the measurement errors $\eta^S_{nj}$. Next, by definition, $\Phi = \mathrm{Var}(x^x_{nj})$. And last, the covariance term $\mathrm{Cov}(x^x_{nj}, x^\delta_{nj})$ is given by:

$\mathrm{Cov}(x^x_{nj}, x^\delta_{nj}) = E[x^x_{nj} x^{\delta\prime}_{nj}] = E[x^x_{nj}(x^{x\prime}_{nj}\Gamma' + \eta^{S\prime}_{nj}\Psi'G')(J_{L_\delta} - B)^{-1\prime}]$
$= E[x^x_{nj} x^{x\prime}_{nj}]\Gamma'(J_{L_\delta} - B)^{-1\prime}$
$= \Phi\Gamma'(J_{L_\delta} - B)^{-1\prime}$.  (22.23)
Combining equations (22.22) and (22.23) with equations (22.14) and (22.21), the identification problem for a structural equations model with observed variables may be stated as finding solutions to the following system of equations:

$\begin{bmatrix} \Sigma_{X^\delta X^\delta} & \\ \Sigma_{X^x X^\delta} & \Sigma_{X^x X^x} \end{bmatrix} = \begin{bmatrix} (J_{L_\delta} - B)^{-1}(\Gamma\Phi\Gamma' + G\Psi\Psi'G')(J_{L_\delta} - B)^{-1\prime} & \\ \Phi\Gamma'(J_{L_\delta} - B)^{-1\prime} & \Phi \end{bmatrix}$,  (22.24)
where $\Sigma_{X^\delta X^\delta}$ is the $(L_\delta \times L_\delta)$ sample covariance matrix of the endogenous variables $x^\delta_{nj}$; $\Sigma_{X^x X^x}$ is the $(L_x \times L_x)$ sample covariance matrix of the exogenous variables $x^x_{nj}$; and $\Sigma_{X^x X^\delta}$ is the $(L_x \times L_\delta)$ sample covariance matrix between the exogenous and endogenous variables. The parameters to be identified are $B$, $\Gamma$, $\Psi$ and $\Phi$. First, the reader should observe that $\Phi$, the covariance matrix of the exogenous variables, is fully identified from the equation $\Phi = \Sigma_{X^x X^x}$. Substituting this expression for $\Phi$ in equation (22.23) and combining with equation (22.24), we get:

$(J_{L_\delta} - B)^{-1}\Gamma = \Sigma_{X^\delta X^x}\Sigma^{-1}_{X^x X^x}$  (22.25)
$\Rightarrow \Gamma + B\,\Sigma_{X^\delta X^x}\Sigma^{-1}_{X^x X^x} = \Sigma_{X^\delta X^x}\Sigma^{-1}_{X^x X^x}$  (22.26)
With regard to the set of equations in the endogenous variables $x^\delta_{nj}$ given by (22.20), we examine the identification of the elements of $B$ and $\Gamma$ one equation at a time or, with regard to the matrices $B$ and $\Gamma$ themselves, one row at a time. Any row of the matrix on the left-hand side of equation (22.26) may comprise a maximum of $(L_x + L_\delta - 1)$ unknown parameters: $L_x$ from the corresponding row in $\Gamma$ and $(L_\delta - 1)$ from the corresponding row in $B$ (minus one because the diagonal elements of $B$ are constrained to zero). However, the dimension of the matrix on the right-hand side is $(L_\delta \times L_x)$; that is, the number of elements in any row is $L_x$. Therefore, for each of the $L_\delta$ equations corresponding to the endogenous variables $x^\delta_{nj}$, we have a maximum of $(L_x + L_\delta - 1)$ unknown parameters and $L_x$ equations. The order condition may be restated as the requirement that, for each of the $L_\delta$ equations corresponding to the endogenous variables $x^\delta_{nj}$, at least $(L_\delta - 1)$ of the endogenous and exogenous variables $x^\delta_{nj}$ and $x^x_{nj}$ must be excluded from the equation.
When $B$ is large, $(J_{L_\delta} - B)^{-1}$ is tedious to compute and equation (22.25) can be hard to solve for the unknown parameters in $B$ and $\Gamma$. An alternative approach, proposed by Fisher (1976), begins by constructing the following matrices:
$A = [\,(J_{L_\delta} - B) \;\; -\Gamma\,]$, and $W = \begin{bmatrix} (J_{L_\delta} - B)^{-1}\Gamma \\ J_{L_x} \end{bmatrix}$
$\Rightarrow AW = 0 \Rightarrow a_t W = 0$,  (22.27)

where $A$ is an $L_\delta \times (L_\delta + L_x)$ matrix and $a_t$ is the $t$th row of the matrix $A$; and $W$ is an $(L_\delta + L_x) \times L_x$ matrix. Let $J_t$ be the number of parameters constrained to zero in the $t$th equation (not including the corresponding diagonal element of $B$), and $\phi_t$ denote the
$(L_\delta + L_x) \times J_t$ matrix whose element $\phi_{tij}$ equals one if the $j$th constraint on equation $t$ states that the $i$th parameter in $a_t$ should be zero; in other words, $a_t \phi_t = 0$. Combining with equation (22.27), we get:

$a_t [\,\phi_t \;\; W\,] = 0 \Rightarrow a_t C_t = 0$,  (22.28)
where $C_t = [\,\phi_t \;\; W\,]$ is the $(L_\delta + L_x) \times (J_t + L_x)$ matrix formed by adjoining the matrices $\phi_t$ and $W$. Since equation (22.28) captures all that is known about the vector of parameters $a_t$, the equation is identifiable if and only if any vector that satisfies the equation is a scalar multiple of the true underlying $a_t$. But $a'_t$ belongs to the null space of $C'_t$. Hence, the equation is identifiable if and only if the dimension of the null space of $C'_t$ is equal to one. Applying the rank-nullity theorem, the dimension of the null space of $C'_t$ is equal to one if and only if the rank of $C'_t$, and hence the rank of $C_t$, is equal to $(L_\delta + L_x) - 1$. It can further be shown that the rank of $C_t$ is equal to $(L_\delta + L_x) - 1$ if and only if the rank of $A\phi_t$ equals $L_\delta - 1$. Hence the parameters in the $t$th equation in $x^\delta_{nj}$, or the $t$th row of $B$ and $\Gamma$, are identified if and only if the rank of the matrix $A\phi_t$ is $L_\delta - 1$. This is known as the rank condition. Note that the matrix $A\phi_t$ can be obtained by removing the columns of $A$ that do not have a zero in the $t$th row of $A$. Once the parameters in $B$ and $\Gamma$ have been identified, equations (22.22) and (22.24) can be used to show that the full covariance matrix between the measurement errors of the endogenous variables, as represented by the parameters $\Psi$ and the factor loadings $G$, is also identified.
The order condition is a necessary but not sufficient condition of identification, whereas the rank condition is a necessary and sufficient condition of identification. Appendix B illustrates how the order and rank conditions might be applied to establish identification of the nonrecursive model shown in Figure 22.6C, taken from Hanneman (2000).
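The rank condition can be verified numerically, as in the sketch below (our own Python illustration, not from the chapter); the matrices $B$ and $\Gamma$ carry hypothetical values consistent with an assumed pattern of zero restrictions, and the helper simply checks whether $\mathrm{rank}(A\phi_t) = L_\delta - 1$ for each equation.

```python
import numpy as np

# Illustrative numerical check of the rank condition: the parameters of
# the t-th structural equation are identified iff rank(A phi_t) = L_d - 1,
# where A = [(J - B)  -Gamma] and phi_t selects the parameters restricted
# to zero in equation t. B and Gamma below are hypothetical values
# consistent with an assumed restriction pattern.

L_d, L_x = 2, 2
B = np.array([[0.0, 0.4],
              [0.7, 0.0]])          # each endogenous variable affects the other
Gamma = np.array([[0.5, 0.0],       # x2 excluded from equation 1
                  [0.0, 0.3]])      # x1 excluded from equation 2
A = np.hstack([np.eye(L_d) - B, -Gamma])   # L_d x (L_d + L_x)

def rank_condition(A, t, tol=1e-10):
    # A phi_t: keep the columns of A that have a zero in row t
    # (excluding the diagonal element of B, i.e. column t itself)
    zero_cols = [j for j in range(A.shape[1])
                 if j != t and abs(A[t, j]) < tol]
    A_phi = A[:, zero_cols]
    return np.linalg.matrix_rank(A_phi) == A.shape[0] - 1

print([rank_condition(A, t) for t in range(L_d)])  # [True, True]
```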
Figure 22.6 Examples of structural equation models with observable variables: (A) seemingly unrelated regression, (B) recursive model and (C) nonrecursive model
3.3.2 Alternative rules of identification
Most studies on HCMs employ fairly simple representations of the structural relationships between explanatory variables. In the following paragraphs, we briefly review two rules that cover most model forms found in the literature; a small structural check for both rules is sketched after the list. For a more thorough treatment of the rules, the reader is referred to Bollen (1989).
1. The null B rule. The rule is a sufficient condition of identification which states that a model with endogenous variables that do not affect one another, that is, a model with a null $B$ matrix, is identified. The rule places no restriction on the covariance structure between the vector of endogenous variables $x^\delta_{nj}$. If the endogenous variables are uncorrelated, then each variable may be treated separately as a regression equation. If the analyst has reason to hypothesize correlation between the endogenous variables, then the model specification reduces to a system of seemingly unrelated regression equations. The model drawn in Figure 22.6A is an example where the null B rule may be used to determine identification.
2. The recursive rule. A structural equations model with observed variables is said to be recursive if the system of equations given by (22.20) contains no reciprocal effects or causal loops, so that it is possible to write the matrix $B$ denoting the influence of the endogenous variables on each other as a lower triangular matrix. The recursive rule is a sufficient condition of identification which states that a recursive model with multiple endogenous and exogenous variables is identified if the vector of endogenous variables $x^\delta_{nj}$ is uncorrelated. Figure 22.6B shows a model that can be identified using the recursive rule.
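Both rules reduce to simple structural checks on the hypothesized $B$ matrix. The sketch below is our own Python illustration (the brute-force permutation search is only sensible for small systems).

```python
import numpy as np
from itertools import permutations

# Illustrative checks for the null-B rule and the recursive rule.
# B is the hypothesized matrix of effects among endogenous variables.

def satisfies_null_b(B, tol=1e-12):
    return np.all(np.abs(B) < tol)

def satisfies_recursive(B, tol=1e-12):
    # Recursive if some reordering of the variables makes B lower
    # triangular (no reciprocal effects or causal loops).
    n = B.shape[0]
    for p in permutations(range(n)):
        Bp = B[np.ix_(p, p)]
        if np.all(np.abs(Bp[np.triu_indices(n, k=0)]) < tol):
            return True
    return False

B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.2, 0.5, 0.0]])   # x1 -> x2 -> x3: recursive
print(satisfies_null_b(B), satisfies_recursive(B))  # False True
```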
If the structural component of the structural equations sub-model can be written as either a regression model or a recursive model, then the null B rule or the recursive rule provides a sufficient condition for identification. For other model forms, such as a recursive model with correlated endogenous variables, the rank and order conditions presented in section 3.3.1 provide a set of necessary and sufficient conditions for identification.
3.4 Discrete Choice Models
Step 3 of the three-step rule requires that the structural component of the discrete choice model be identifiable, treating each explanatory variable as an exogenous observed variable with no measurement error. We restate the structural component of the discrete choice model:

$u_n = v_n + \varepsilon_n$  (22.29)
$\Rightarrow u_n = X'_n \beta + F_n T \eta^R_n + \nu_n$,  (22.30)
where $u_n$ is a $(J \times 1)$ vector of random utilities; $v_n$ and $\varepsilon_n$ are $(J \times 1)$ vectors that comprise the systematic and stochastic components of $u_n$, respectively; $X_n$ is an $(L \times J)$ matrix of explanatory variables; $\beta$ is an $(L \times 1)$ vector of parameters; $\eta^R_n$ is an $(R \times 1)$ vector of independent random factors with mean zero and variance one; $T$ is an $(R \times R)$ lower triangular matrix that is the Cholesky factorization of the covariance structure of the
utilities; $F_n$ is a $(J \times R)$ matrix of factor loadings that map the random factors to the covariance structure; and $\nu_n$ is a $(J \times 1)$ vector of i.i.d. extreme value random variables with mean zero and variance $\gamma/\mu^2$, where $\mu$ is the scale and $\gamma$ is the variance of a standard extreme value random variable.
To address the identification problem for discrete choice models, we reframe equation (22.30) so that it resembles the form of the structural equation models with observable variables discussed in section 3.3. As was the case in section 3.3, we assume that the explanatory variables have been processed such that they represent deviations from the sample mean. To reiterate, this is purely for notational convenience, and does not change the identification problem. Let $x_n$ be the $(JL \times 1)$ vector of explanatory variables constructed as shown below:

$x_n = \begin{bmatrix} x_{n1} \\ \vdots \\ x_{nJ} \end{bmatrix}$,

where $x_{nj}$ is the $(L \times 1)$ vector of explanatory variables corresponding to the $j$th alternative, that is, $x_{nj}$ is the $j$th column of $X_n$. We reintroduce the symbol $\Phi$ to represent the $(JL \times JL)$ covariance matrix of the explanatory variables $x_n$, that is, $\Phi = E(x_n x'_n)$. Similarly, let $B$ be the $(J \times JL)$ block diagonal matrix of parameters constructed as follows:

$B = \begin{bmatrix} \beta' & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \beta' \end{bmatrix}$

Then, equation (22.30) may be restated as:

$u_n = B x_n + F_n T \eta^R_n + \nu_n$  (22.31)
There are two sets of relevant parameters to be considered for identification: the matrix $B$ that enters the systematic component of the utility specification, and the unrestricted parameters of the distribution of the stochastic component $\varepsilon_n$. Utility as a construct is a latent variable whose location and scale need to be fixed. With regard to location, only the difference in utilities is observable from the measurement component of the discrete choice sub-model, and the parameters $B$ and $\varepsilon_n$ must be normalized accordingly. The scale can be set either by constraining the variance of the error (for example, $\mu = 1$) or by constraining one of the systematic parameters (for example, a particular $\beta = 1$). The assumption made throughout this section is that the scale is normalized through the error variance. Since only the differences in utilities are observable, and not the absolute levels themselves, we rewrite the structural component of the discrete choice model as follows:

$\Delta u_n = \Delta v_n + \Delta \varepsilon_n$  (22.32)
$\Rightarrow \Delta u_n = \Delta B x_n + \Delta F_n T \eta^R_n + \Delta \nu_n$  (22.33)
where $\Delta$ is the linear operator that transforms the $J$ utilities into $(J - 1)$ utility differences taken with respect to the $J$th alternative. We have assumed here, for the sake of simplicity, that all individuals face the same choice set. For a discussion on mapping from the deviation with respect to the last alternative available to decision maker $n$ with heterogeneous choice sets, see Bolduc (1999). $\Delta$ is a $(J - 1) \times J$ matrix that consists of a $(J - 1) \times (J - 1)$ identity matrix with a column vector of $-1$'s appended as the $J$th column. Though $\Delta$ takes differences with respect to the last alternative for each choice situation, the identification problem is invariant with respect to which alternative is used as the base.
Equation (22.14) can now be used to examine identification of equation (22.33), where the observable variables are $\Delta u_n$ and $x_n$, and the unknown parameters are $B$, $\Phi$, $T$ and $\mu$ (the scale of $\nu_n$). The $(J - 1 + JL) \times (J - 1 + JL)$ covariance matrix of observed variables may be given by:

$\Sigma(\theta) = \begin{bmatrix} \mathrm{Var}(\Delta u_n) & \\ \mathrm{Cov}(x_n, \Delta u_n) & \mathrm{Var}(x_n) \end{bmatrix}$  (22.34)
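The objects entering equations (22.31)–(22.34) are mechanical to construct. The sketch below (our own Python illustration, with hypothetical dimensions and parameter values) builds the block diagonal matrix $B$ and the differencing operator $\Delta$, and confirms that $\Delta$ returns utility differences with respect to the $J$th alternative.

```python
import numpy as np

# Illustrative construction of the block diagonal matrix B of (22.31)
# and the (J-1) x J differencing operator Delta of (22.33).
J, L = 3, 2
beta = np.array([-0.05, -0.10])               # hypothetical taste parameters

B = np.kron(np.eye(J), beta.reshape(1, -1))   # J x JL, beta' on the diagonal blocks
Delta = np.hstack([np.eye(J - 1), -np.ones((J - 1, 1))])  # differences w.r.t. alt J

x_n = np.arange(J * L, dtype=float)           # stacked x_n = (x_n1', ..., x_nJ')'
v_n = B @ x_n                                 # systematic utilities
print(Delta @ v_n)                            # [v_1 - v_3, v_2 - v_3]
print(np.allclose(Delta @ v_n, v_n[:-1] - v_n[-1]))  # True
```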
The variance of the differences in utilities, $\mathrm{Var}(\Delta u_n)$, can be calculated as follows:

$\mathrm{Var}(\Delta u_n) = E(\Delta u_n u'_n \Delta') = \Delta E[(B x_n + F_n T \eta^R_n + \nu_n)(x'_n B' + \eta^{R\prime}_n T' F'_n + \nu'_n)]\Delta'$
$= \Delta B E[x_n x'_n] B' \Delta' + \Delta F_n T E[\eta^R_n \eta^{R\prime}_n] T' F'_n \Delta' + \Delta E[\nu_n \nu'_n]\Delta'$
$= \Delta B \Phi B' \Delta' + \Delta F_n T T' F'_n \Delta' + \left(\dfrac{\gamma}{\mu^2}\right)\Delta\Delta'$,  (22.35)
where it is assumed that the explanatory variables $x_n$ are uncorrelated with any of the measurement errors. Next, by definition, $\mathrm{Var}(x_n) = E(x_n x'_n) = \Phi$. Lastly, the covariance term $\mathrm{Cov}(x_n, \Delta u_n)$ is calculated as follows:

$\mathrm{Cov}(x_n, \Delta u_n) = E(x_n u'_n \Delta') = E[x_n(x'_n B' + \eta^{R\prime}_n T' F'_n + \nu'_n)\Delta'] = E[x_n x'_n] B' \Delta' = \Phi B' \Delta'$  (22.36)
Combining equations (22.35) and (22.36) with equations (22.14) and (22.34), the identification problem for discrete choice models may be stated as finding solutions to the following system of equations:

$\begin{bmatrix} \Sigma_{\Delta u_n, \Delta u_n} & \\ \Sigma_{x_n, \Delta u_n} & \Sigma_{x_n, x_n} \end{bmatrix} = \begin{bmatrix} \Delta B \Phi B' \Delta' + \Delta F_n T T' F'_n \Delta' + \left(\dfrac{\gamma}{\mu^2}\right)\Delta\Delta' & \\ \Phi B' \Delta' & \Phi \end{bmatrix}$,  (22.37)
n
n
n
n
n
n
n
n
n
3.4.1 The systematic parameters
The matrix of systematic parameters $B$ is identified if the following equation can be solved for each element of $B$:

$\Phi B' \Delta' = \Sigma_{x_n, \Delta u_n}$
$\Rightarrow \Delta B = \Sigma_{\Delta u_n, x_n}\Sigma^{-1}_{x_n, x_n}$,  (22.38)
n
n
n
n
n
1.
2.
n
n
Continuous attributes, such as travel cost and travel time in a travel mode choice model, can enter the utility specification for each alternative, as long as there is some heterogeneity in the values taken by the attributes across different alternatives and choice situations. Categorical attributes, such as vehicle-make in a vehicle ownership model, require that a reference level be selected. For example, for a categorical attribute with C
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:48AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Hybrid choice models: the identification problem
3. 4.
543
levels, a binary variable might be introduced for C 2 1 levels, excluding the reference level, in the utility for all Jn alternatives. Continuous characteristics, such as age and income, may be included in the utilities of J 2 1 alternatives, one alternative being used as a reference. Categorical characteristics, such as gender or education, require that both a reference level and a reference alternative be selected. A binary variable might then be introduced for each level of each characteristic, except the reference level for that characteristic, in the utilities of the J 2 1 alternatives, excluding the reference alternative.
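As a concrete illustration of rules 2 and 4 above (our own sketch, not from the chapter; all names and values are hypothetical), the snippet below dummy-codes a categorical attribute against a reference level, and a categorical characteristic against both a reference level and a reference alternative.

```python
import numpy as np

# Illustrative dummy coding for a choice model with J = 3 alternatives.
J = 3

# Rule 2 - categorical attribute with 3 levels, reference level = 0:
# introduce C - 1 = 2 binary variables in the utility of every alternative.
attr_level = np.array([0, 2, 1])              # level of the attribute per alternative
attr_dummies = np.column_stack([(attr_level == c).astype(float)
                                for c in (1, 2)])   # drop reference level 0

# Rule 4 - categorical characteristic with 2 levels, reference level = 0,
# reference alternative = J: one binary variable enters the utilities of
# alternatives 1..J-1 only.
person_level = 1                               # characteristic of the decision maker
char_dummies = np.zeros((J, 1))
char_dummies[:J - 1, 0] = float(person_level == 1)  # excluded from alternative J

print(attr_dummies)   # one row per alternative, C - 1 columns
print(char_dummies)   # non-zero only for the non-reference alternatives
```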
The selection of the reference level and reference alternative has no effect on the model other than to shift the values of the parameters, preserving their differences, and this property holds even when the choice set varies across observations. Characteristics interacted with attributes result in variables that must also be treated as attributes, and depending on whether the resulting variable is continuous or categorical, the appropriate attribute-specific rule may be used to verify identification. For more details on the specification of the systematic component, the reader is referred to Ben-Akiva and Lerman (1985).
3.4.2 The error structure
Once the analyst has identified $\Phi$, the covariance matrix of the explanatory variables, and $B$, the matrix of the unknown parameters in the systematic component of the utility specification, equation (22.35) may be rearranged as follows to help identify the unknown parameters $T$ and $\mu$ that define the error structure:

$\Delta F_n T T' F'_n \Delta' + \left(\dfrac{\gamma}{\mu^2}\right)\Delta\Delta' = \Sigma_{\Delta u_n, \Delta u_n} - \Delta B \Phi B' \Delta'$
$= \Sigma_{\Delta u_n, \Delta u_n} - \Sigma_{\Delta u_n, x_n}\Sigma^{-1}_{x_n, x_n}\Sigma_{x_n, \Delta u_n}$  (22.39)
$\Rightarrow \Sigma(\theta) = \Sigma$,  (22.40)
J (J 2 1) 21 2
(22.41)
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:48AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
544
Handbook of choice modelling
If U is diagonal, as is often the case, the system of equations given by (22.39) is linear in the unknown parameters g/m2 and s2i, where si is the ith diagonal element of U. Then, the number of estimable parameters S must also satisfy the following equality: S 5 Rank (Jacobian (vecu (W (q) ) ) ) 2 1,
(22.42)
where vecu (.) is a function that vectorizes the unique elements of W (q) into a column vector, and the Jacobian is equal to the derivatives of the elements of W (q) with respect to the unknown parameters g/m2 and s2i contained in q, where we redefine q 5 { g/m2,s2i4i 5 1,. . .,R } . Since (22.39) results in a system of simultaneous linear equations, the rank of the Jacobian equals the number of independent equations in (22.39), minus one to set the scale of the utilities. The rank condition is more restrictive than the order condition, and is sufficient to ensure that there is a solution to (22.39). The order condition simply counts cells and ignores the internal structure of W (q) . The rank condition, however, counts the number of linearly independent equations available in W (q) that can be used to estimate the parameters of the model. Together, (22.41) and (22.42) form a set of necessary and sufficient conditions for identification of the error structure. The objective of the procedure outlined above is to find conditions for a discrete choice sub-model specified in levels under which the error structure can be properly identified and normalized. It is important to emphasize the implications of imposing restrictions on the covariance matrix at the levels (U) rather than on differences in utilities (W (q) ), because this is the root cause of the complexity. Technically, only utility differences are estimable from the data. Once an arbitrary constraint has been selected for W (q) , one is done with identification. However, restrictions in the discrete choice sub-model are typically imposed in levels instead of on differences of utilities. This is because the structural parameters (that is, the elements of U) are interpretable, whereas the parameters in the difference model (that is, the elements of W (q) ) are not. Therefore, our aim is to impose and possibly test restrictions on U and verify that the model is identified. If the model is unidentified, some restriction will need to be imposed. Since we are working with the levels form, we want to impose the constraints on U. The choice of constraint on U isn’t always arbitrary (as it is on W (q) ), and one has to make sure that it does not impose additional restrictions on W (q) . The equality condition described in section 3.1 is necessary to ensure that the constraints do not change W (q) , and this is necessary due to the mixing with an Extreme Value error term that has already been normalized. Walker et al. (2007) provides a comprehensive discussion of how the rules of identification may be applied to mixed logit models with heteroskedastic, nested and cross-nested error structures. Appendix C extends these findings to include models with random parameters on explanatory variables and demonstrates how the identification conditions may be established when working with panel data sets and agent effects. We summarize the important results below, and the reader may refer to the citations for more details: 1.
1. The heteroskedastic model. In the heteroskedastic model, the random error of each alternative has a different variance. The model allows for situations in which the systematic portion of the utility represents the utility of some alternatives better than others. For $J = 2$, neither of the alternative-specific variances can be identified. For $J \geq 3$, $J - 1$ of the alternative-specific variances can be identified, and the normalization must be imposed on the parameter corresponding to the minimum variance alternative. In practice, however, there is no prior knowledge of the minimum variance alternative. For the general case with $J$ alternatives, a brute force solution is to estimate $J$ versions of the model, each with a different variance term normalized; the model with the best fit is the one with the correct normalization. This is both cumbersome and time consuming. Alternatively, one can estimate the unidentified model with all $J$ variance terms. Although this model is not identified, a software estimation program will produce maximum likelihood parameter estimates (but not standard errors) that reflect the true covariance structure of the model. Therefore, the variance term with the minimum estimated variance in the unidentified model identifies the minimum variance alternative, thus eliminating the need to estimate $J$ different models.
2. Nested and cross-nested models. In nested and cross-nested models, the stochastic component of the utility specification can be correlated across alternatives to allow for more flexible substitution patterns. The alternatives are partitioned into nests such that alternatives within a nest are correlated, and alternatives that do not share a nest are uncorrelated. Nested models refer to cases where the nests are mutually exclusive, that is, an alternative can only belong to one nest; cross-nested models relax this assumption and allow for overlapping nests. There are no general rules for the identification of nested and cross-nested models, and the analyst has to check the rank and order conditions on a case-by-case basis.
3. The random parameters model. If the random parameter is imposed on a continuous attribute, there are no identification issues per se; data permitting, the full covariance structure can be estimated. For a categorical attribute with two levels, independently distributed generic random parameters can be imposed on only one of the two binary variables corresponding to the two levels. For a categorical attribute with three or more levels, the variance term can be identified for independently distributed generic random parameters on the binary variables corresponding to each of the levels. In the case of independently distributed alternative-specific random parameters and a categorical attribute with $C$ levels, where $C$ can be two or more, a reference level must be chosen for the disturbances, and only $J(C - 1)$ of the variance terms corresponding to the random parameters are estimable. If a random parameter is placed on a characteristic of the decision maker that is continuous, it must necessarily be interacted with an alternative-specific variable (otherwise it will cancel out when the differences in utility are taken). The normalization of such parameters then depends on the type of variable with which it interacts. In general, if the characteristic interacts with alternative-specific or nest-specific binary variables, then at most one additional disturbance might be identified over the analogous model form without the interaction with the characteristic variable. For example, if the characteristic interacts with alternative-specific dummy variables, then the model is similar to the heteroskedastic case, except that for $J \geq 3$ a variance term can be identified for all $J$ alternatives. For characteristics that are categorical variables, irrespective of the interaction structure, a reference level must be chosen for the disturbances, and only $(C - 1)$ of the random parameters can be identified per interaction, where $C$ denotes the number of levels of the categorical variable.
4. Extensions to panel data. For heteroskedastic, nested and cross-nested models, the use of panel data and a model with agent effects can result in the identification of at most one additional parameter over an equivalent model with cross-sectional data and alternative-specific effects. For the random parameters model, the use of panel data and agent effects does not change the identification problem: continuous attributes are theoretically always identifiable, and the same conditions hold for categorical attributes and for continuous and categorical characteristics as with cross-sectional data.
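To connect these results back to equations (22.41) and (22.42), the sketch below (our own Python illustration, not from the chapter) evaluates the Jacobian rank for a heteroskedastic model with $J = 3$: the rank is three, so $S = 3 - 1 = 2 = J - 1$ variance parameters are estimable, in line with result 1 above.

```python
import numpy as np

# Rank condition (22.42) for a heteroskedastic model with J = 3.
# theta = (g, s1^2, s2^2, s3^2), where g = gamma / mu^2 and si^2 are the
# alternative-specific variances; F_n = I, T = diag(s1, s2, s3).
J = 3
Delta = np.hstack([np.eye(J - 1), -np.ones((J - 1, 1))])

def vecu(m):
    return m[np.triu_indices_from(m)]

def implied(theta):
    g, s2 = theta[0], theta[1:]
    return Delta @ np.diag(s2) @ Delta.T + g * (Delta @ Delta.T)

theta0 = np.array([1.0, 0.5, 0.7, 0.9])
base = vecu(implied(theta0))
jac = np.empty((base.size, theta0.size))
for k in range(theta0.size):          # the system is linear, so forward
    t = theta0.copy(); t[k] += 1.0    # differences recover the Jacobian exactly
    jac[:, k] = vecu(implied(t)) - base

S = np.linalg.matrix_rank(jac) - 1    # minus one to set the scale of utility
print(S)                              # 2 = J - 1 identifiable variances
```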
For multinomial probit models where the error structure is specified using the factor analytic form, the identification problem can be reduced to finding a solution for each of the unknown parameters contained in the following equation:

$\Delta F_n T T' F'_n \Delta' = \Sigma_{\Delta u_n, \Delta u_n} - \Sigma_{\Delta u_n, x_n}\Sigma^{-1}_{x_n, x_n}\Sigma_{x_n, \Delta u_n}$  (22.43)
$\Rightarrow \Sigma(\theta) = \Sigma$,  (22.44)
4
EMPIRICAL IDENTIFICATION
Since the definition of theoretical identification rests on the availability of an infinite number of observations, it has its limitations. A model that is theoretically identified may often be empirically unidentified due to insufficient variability in the observed data. The flexibility offered by HCMs should be used with caution. If the dataset is not rich enough to support models with a high degree of complexity, multiple model specifications can result in the same improvement in fit (McFadden and Train, 2000), and in some cases this can even result in empirically unidentified models (Walker, 2001). When working with HCMs, it is helpful to have a prior idea of the sample size required to support models of a particular degree of complexity. One of the ways in which a reasonable sample size can be determined for any hypothesized model form is through a Monte Carlo experiment. For more details, the reader is referred to Muthén and Muthén (2002). A second source of empirical unidentification is multicollinearity. Multicollinearity occurs when two or more explanatory variables in the model are strongly correlated and provide redundant information about the behavior of interest, for example, travel times and travel costs in travel demand models. Any data sample will always contain some
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:48AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
Hybrid choice models: the identification problem
547
degree of multicollinearity, and it is up to the analyst to decide a tolerable limit. Though a high degree of multicollinearity can lessen the reliability of parameter estimates and the accompanying statistical inference, the exclusion of partially redundant variables from the analysis can also compromise the objectives of the study, and finding a balance between the two isn’t always straightforward. Lastly, the models that we have so far examined have made strong assumptions about linearity, additivity and, in the case of the structural equations sub-model, normality. Violations of these assumptions, or omission and/or incorrect inclusion of important factors, variables or causal paths, may result in empirically unidentified model forms. One of the ways in which any hypothesized model form can be checked for misspecifications is through an outlier analysis. Outliers are data points that deviate markedly from other data points in an observed sample (Grubbs, 1969). To detect outliers, the hypothesized model specification is estimated on the complete sample. The probability of observing each data point in the observed sample is subsequently calculated assuming that the estimated model is the true underlying model. Data points for which the predicted probabilities lie below some predetermined threshold are labeled outliers. Outliers can often occur randomly due to chance deviations in natural populations. In some circumstances though, the outliers may exhibit a systematic trend, and the analyst should check that the theoretical assumptions underlying the model specification are credible. The distinction between systematic and random errors is not always clear, and the analyst should have valid reasons for excluding any data points. For a recent discussion on how to deal with outliers in discrete choice models, the reader is referred to Campbell and Hess (2009).
5
ESTIMATION METHODS
Unfortunately, a general framework of theoretical and empirical identification that is readily practicable remains elusive. Throughout this chapter, we have addressed the identification problem for a select subset of model forms that are linear and additive, and conform to certain distributional assumptions. Establishing identifiability even under these restrictive assumptions can be a challenge. As analysts start to relax some of these restrictions, model specifications can grow increasingly complex, to the point where it is virtually impossible to analyze the covariance structure to determine whether the parameters are identifiable, or to predict a priori whether a particular dataset will be able to support such complexity. Estimation methods can provide insights into the identification status of a model that would otherwise be unavailable from more theoretical procedures. For instance, if the estimation routine for a given model specification fails to converge to a solution, and the Hessian matrix at the optimum is singular or ill conditioned (resulting in absurdly large standard errors), the model may be theoretically or empirically unidentified. If the model is estimated but the parameters lie outside the range of reasonable values, the model may again be unidentified. One of the ways in which an analyst can check for identification is to estimate the model multiple times, employing different starting values for the parameters for each estimation run. If the estimation routine consistently converges to the same solution, the analyst can be reasonably confident that the model is identified and, just as importantly, that the solution is a global maxima.1 Alternatively, if the analyst
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:48AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
548
Handbook of choice modelling
is interested in the identification status of a specific parameter, it might be helpful to fix the parameter value to some arbitrary, often unreasonable, value (Hayduk, 1988). If the log-likelihood at convergence does not change with the addition of the constraint, then the parameter is probably unidentified and the log-likelihood is flat along the direction of that parameter. In most cases, the likelihood function for HCMs comprises a multidimensional integral that does not have a closed form solution and cannot be approximated using Gaussian quadrature methods, and estimation routines usually rely on Monte Carlo simulation to numerically approximate the integral. Though simulation allows for the estimation of more flexible model forms, simulation noise leads to biased parameter estimates (Walker, 2001) and may mask identification problems inherent in the model (Chiou and Walker, 2007). This is particularly a problem in the case of HCMs because these models usually require additional or more extensive simulation routines. With regard to simulation bias, the number of draws must rise with sample size at a sufficiently fast rate for the parameters to asymptotically converge to their true values (Train, 2009). Since the appropriate number of draws is a function of the model structure and observed data, there is no way to know a priori what an appropriate number might be. It has been suggested that the analyst estimate the model multiple times, using different starting values and increasing the number of draws with each subsequent run (Hensher and Greene, 2003). If the parameters remain relatively stable, then the analyst can be fairly confident that the estimation routine has converged to the true solution. The definition of what constitutes stable is of course subjective, but an often used rule of thumb requires that parameter estimates lie within one standard error of each other over subsequent runs with increasing number of draws. For a recent discussion on the issue of simulation bias in discrete choice models, the reader is referred to Bastin and Cirillo (2010). The number of draws also plays an important role in masking identification. Often, unidentified models estimated with a small number of draws appear to be identified in that the Hessian is non-singular and well conditioned. As the dimension of the problem increases, the number of draws required to adequately cover the dimension space also increases. Consequently, for unidentified models the estimation routine may break down only when the number of draws is high enough, where high enough could be any number between 100 and 10 000, and maybe even higher. For more details on the masking effect of simulation noise on identification, the reader is referred to Chiou and Walker (2007). To summarize, estimation methods can provide an additional source of information on the identifiability of HCMs. Run-time symptoms of unidentified models include high standard errors, unstable and/or unreasonable parameter estimates with increasing number of draws, singular or ill-conditioned Hessian, and so on. However, due to the confounding effect of simulation noise, they can also be misleading at times. In general, it is good practice to establish identification using one of the techniques presented in the previous sections, and estimation methods should only be used as supplementary tools.
6 CASE STUDY
In this section, we present estimation results for a stated preference dataset of travel mode choice to illustrate some of the issues that can arise in practice. The dataset for our
analysis was collected as part of a series of experiments conducted at the Experimental Social Sciences Laboratory (XLAB) in the Haas Business School at the University of California, Berkeley. The experiments sought to assess the impact of information provision on various aspects of travel behavior. The kinds of information offered ranged from service reliability and greenhouse gas emissions to health benefits and peer behavior, and the dimensions of travel behavior studied included vehicle ownership, route choice and travel mode choice. More details on the experiments can be found in Gaker et al. (2011).
The particular dataset that we use here corresponds to the travel mode choice experiment. Survey respondents were asked to choose a travel mode for some hypothesized trip given the travel times and travel costs of the different modal alternatives, and the greenhouse gas emissions associated with each mode. The original dataset comprised 1670 observations made by 334 undergraduate students from the university, such that each respondent was presented with five different scenarios and the alternatives for any single scenario included three of seven pre-defined travel modes. We restrict attention to a subsample of 501 observations made by 306 respondents, where the number of observations for any single respondent in the subsample may vary between one and three, and the alternatives for any single scenario may include any three of the following four travel modes: auto, bus, train and bike. We excluded choice situations that featured any one of the three other alternatives in the original dataset to keep the model specification deliberately sparse and to emphasize potential identification issues more clearly.
The application concerns an HCM with a multinomial logit kernel and a latent characteristic denoting pro-environmental attitudes. The latent characteristic is operationalized via a single response asking for agreement on a scale of 1 to 7 with the attitudinal statement, 'We should raise the price of gasoline to reduce congestion and air pollution', where a higher response indicates stronger agreement. The latent variable enters the choice model through an interaction with the greenhouse gas emissions of each travel mode. The structural component of the structural equations sub-model comprises a normally distributed random factor with mean zero and a standard deviation that needs to be estimated.
Table 22.1 enumerates the estimation results for models with different sets of constraints. Model 1-1 is the unconstrained, partially identified model. The scale of the latent variable has not been fixed, which results in a singular Hessian and unreasonable standard errors for the factor loading ($\lambda$), the standard deviation of the latent variable ($\phi$) and the coefficient on the latent variable in the utility specification ($\beta$). The literature on HCMs prescribes two general methods for setting the scale of the latent variable, covered in Appendix A: either constraining the standard deviation of the latent variable or constraining one of the factor loadings on the indicators (Raveau et al., 2012). Model 1-2 sets the scale of the latent variable by constraining the standard deviation of the latent variable $\phi$ to 1, and Model 1-3 by constraining the factor loading $\lambda$ to 1.
There is in fact a third way of setting the scale of the latent variable that is not mentioned in the literature or covered by Appendix A: constraining the coefficient on the latent variable $\beta$, as demonstrated by Model 1-4, which fixes it to −1. The framework of identification presented in this chapter breaks the HCM into three sub-models, and analyzes the covariance matrix for each of these sub-models in isolation. While such an approach is algebraically convenient, it is oblivious to the additional information that would be available from an analysis of the covariance matrix of observable variables from different sub-models. For instance, for the example discussed here
Table 22.1 Discrete choice model of travel mode choice with a latent variable denoting attitudes towards the environment

Model:                                  1-1            1-2            1-3            1-4
Identification status:                  Unidentified   Identified     Identified     Identified
Parameter                               Est. (SE)      Est. (SE)      Est. (SE)      Est. (SE)

Utility specification
Alt. specific constants
  Auto                                  0.00 (−)       0.00 (−)       0.00 (−)       0.00 (−)
  Bus                                   0.71 (0.20)    0.71 (0.20)    0.71 (0.20)    0.71 (0.20)
  Train                                 0.64 (0.18)    0.64 (0.18)    0.64 (0.18)    0.64 (0.18)
  Bike                                  0.40 (0.30)*   0.40 (0.30)*   0.40 (0.30)    0.40 (0.30)
Emissions (lbs of CO2)                  −0.02 (0.05)*  −0.02 (0.05)*  −0.02 (0.05)*  −0.02 (0.05)*
Environmental attitudes ×
  emissions (lbs of CO2) (β)            −0.07 (na)     −0.07 (0.03)   −0.04 (0.02)   −1.00 (−)

Latent variable – Pro-environmental attitudes
  Standard deviation (φ)                1.00 (na)      1.00 (−)       1.66 (0.10)    0.06 (0.03)

Indicator – 'We should raise the price of gasoline to reduce congestion and air pollution'
  Factor loading (λ)                    1.66 (na)      1.66 (0.10)    1.00 (−)       25.10 (11.3)
  Standard deviation                    0.56 (0.27)    0.56 (0.28)    0.56 (0.28)    0.56 (0.28)

λφ                                      1.66           1.66           1.66           1.66
βφ                                      −0.07          −0.07          −0.07          −0.07

Simulated log-likelihood                −1522.40       −1522.40       −1522.40       −1522.40
(1000 pseudo-random draws)

Note: * Insignificant (5 percent level of significance).
the measurement component of the structural equations sub-model, when reformulated as a confirmatory factor analytic model, comprises a single equation and two unknowns, and thereby fails the order condition. And yet the model is identifiable. This is because the covariance between the indicator and the differences in utilities provides an additional independent equation in the term βφ, which allows identification of Models 1-2 through 1-4 (and yields the third way of setting the scale of the latent variable). In fact, only the products λφ and βφ are identifiable, and not the three parameters λ, β and φ individually. Had the additional equation not been available, Models 1-2 through 1-4 would not have been identified and the analyst would have been required to impose an additional constraint. This example serves to demonstrate the limitations of the set of sufficient but not necessary conditions of identification developed in this chapter, and offers an exciting direction for future research on the subject.
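To see where the additional equation comes from, consider a stylized one-indicator version of the model above (a sketch of our own in simplified notation; the emissions attribute $E_{jn}$ and the error terms are written out purely for illustration):

$$
\begin{aligned}
U_{jn} &= \mathrm{ASC}_j + \beta_E E_{jn} + \beta\, x^*_n E_{jn} + \varepsilon_{jn}, \qquad x^*_n = \phi\,\eta_n,\; \eta_n \sim N(0,1),\\
I_n &= \lambda\, x^*_n + \upsilon_n, \qquad \upsilon_n \sim N(0, \sigma_\upsilon^2).
\end{aligned}
$$

The observable moments involving the latent variable are then $\mathrm{Var}(I_n) = (\lambda\phi)^2 + \sigma_\upsilon^2$, $\mathrm{Cov}(I_n, U_{jn} - U_{kn}) = (\lambda\phi)(\beta\phi)(E_{jn} - E_{kn})$, and a latent contribution of $(\beta\phi)^2(E_{jn} - E_{kn})^2$ to $\mathrm{Var}(U_{jn} - U_{kn})$. Every moment depends on λ, β and φ only through the products λφ and βφ, which is why Models 1-2 through 1-4 in Table 22.1 report identical values λφ = 1.66 and βφ = −0.07 and the same simulated log-likelihood.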
7 CONCLUSIONS
The HCM has gained currency over the last decade with empirical studies examining different aspects of individual behavior. The HCM combines the simplicity of random utility maximization, or discrete choice, models that belong to the GEV family, such
as the multinomial logit and nested logit models, with the flexibility offered by mixed logit models and the behavioral richness of latent variable models. Notwithstanding the popularity of the HCM, questions concerning its identification remain outstanding in the literature. In particular, the identification problem has been explored in detail for many special cases, but a general framework of identification has been found wanting. In this chapter, we combined literature from the fields of discrete choice analysis and structural equation models to develop a set of sufficient conditions for theoretical identification of HCMs. The procedure for establishing identification began by decomposing the HCM into three constituent sub-models: a confirmatory factor analytic model, a structural equations model with observable variables, and a discrete choice model with exogenous observable variables. We employed a general framework of identification based on the analysis of the covariance matrix of observable data, and applied this framework to each of the three components. Wherever applicable, alternative rules that can provide quicker checks on identification were also presented. Though we focused our attention on HCMs that combine mixed logit models with choice and latent variable models, the framework of theoretical identification can be extended to incorporate multinomial probit, latent classes, multiple datasets and dynamic choice models. Next, we looked at the issue of empirical under-identification, highlighting problems with the model and/or the dataset that may result in empirically unidentified models. In some cases, estimation methods can provide a useful supplement to more rigorous theoretical procedures. We discussed some of the more popular estimation techniques for determining model identifiability, and their limitations. Finally, we looked at a case study on travel mode choice to demonstrate how identification issues may manifest themselves in practice, and how they might be suitably addressed. One of the limitations of the framework of theoretical identification developed in this chapter is that it provides a set of sufficient but not necessary conditions of identification that are based on separating the model into smaller components. While such an approach is mathematically more practicable, it ignores additional information offered by the covariances between observable variables from different components of the model. Consequently, models that fail these conditions may still be identified, as demonstrated in section 6. Future research should attempt to develop a set of sufficient and necessary conditions for identification based on an analysis of the covariance matrix of the HCM as a whole. The exponential growth in computational power has engendered a commensurate explosion in the complexity of models being employed by studies on discrete choice analysis. The models examined explicitly in this chapter comprise but a small subset of the full range of choice models at the analyst's disposal, but the methods described in the chapter can be used to verify identification of any general model form. However, as models grow increasingly complex, so does the identification problem, and establishing identification need not always be straightforward. Nevertheless, it is imperative that the analyst verify that a model is identified before proceeding with estimation and inference. If used appropriately, HCMs can be powerful tools for studying individual behavior.
ACKNOWLEDGEMENTS

This research was funded by the National Science Foundation, the University of California Transportation Center Dissertation Grant and the Multi-Campus Research Programs and Initiatives. We wish to thank David Gaker and Dave Vautin for allowing us access to the data that we used as part of our case study. We would also like to express our thanks to an anonymous referee whose comments and suggestions helped us improve the quality of our presentation.
NOTE

1. This latter result is particularly useful for HCMs, which often exhibit irregularly shaped likelihood functions and multiple local maxima. Of course, the analyst can never be absolutely certain that the solution is a global maximum, but the probability that it is a global maximum is certainly higher if repeated runs converge to the same set of values.
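As an illustration of this repeated-runs heuristic, the following minimal sketch (our own; neg_loglik is a hypothetical stand-in for a model's simulated negative log-likelihood) re-estimates from several dispersed starting points and examines the spread of the returned solutions:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta):
    # Hypothetical placeholder for the (simulated) negative log-likelihood
    # of the model being estimated; replace with the actual objective.
    return np.sum((theta - np.array([0.5, -1.0, 2.0])) ** 2) + 1522.40

rng = np.random.default_rng(0)
solutions = []
for run in range(10):
    theta0 = rng.normal(scale=2.0, size=3)      # dispersed random starting values
    res = minimize(neg_loglik, theta0, method="BFGS")
    solutions.append(res.x)

# If repeated runs from dispersed starting points converge to the same
# parameter vector, the solution is more plausibly a global maximum.
print(np.ptp(np.vstack(solutions), axis=0))     # spread of estimates across runs
```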
REFERENCES

Bastin, F. and C. Cirillo (2010), 'Reducing simulation bias in mixed logit model estimation', Journal of Choice Modelling, 3 (2), 71–88.
Ben-Akiva, M. and S. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Ben-Akiva, M., J.L. Walker, A.T. Bernardino, D.A. Gopinath, T. Morikawa and A. Polydoropoulou (2002), 'Integration of choice and latent variable models', in H.S. Mahmassani (ed.), In Perpetual Motion: Travel Behavior Research Opportunities and Application Challenges, Amsterdam: Elsevier, pp. 431–70.
Bolduc, D. (1999), 'A practical technique to estimate multinomial probit models in transportation', Transportation Research Part B: Methodological, 33 (1), 63–79.
Bollen, K.A. (1989), Structural Equations with Latent Variables, New York: John Wiley and Sons.
Bunch, D.S. (1991), 'Estimability in the multinomial probit model', Transportation Research Part B: Methodological, 25 (1), 1–12.
Campbell, D. and S. Hess (2009), 'Outlying sensitivities in discrete choice data: cause, consequences and remedies', paper presented at the European Transport Conference, Amsterdam, 5–7 October.
Carrasco, J.A. and J. de D. Ortúzar (2002), 'Review and assessment of the nested logit model', Transport Reviews: A Transnational Transdisciplinary Journal, 22 (2), 197–218.
Chiou, L. and J.L. Walker (2007), 'Masking identification of discrete choice models under simulation methods', Journal of Econometrics, 141 (2), 683–703.
Daziano, R.A. and D. Bolduc (2012), 'Covariance, identification, and finite sample performance of the MSL and Bayes estimators of a logit model with latent variables', Transportation, forthcoming.
Fisher, F.M. (1976), The Identification Problem in Econometrics, New York: McGraw-Hill.
Gaker, D., D. Vautin, A. Vij and J.L. Walker (2011), 'The power and value of green in promoting sustainable transport behavior', Environmental Research Letters, 6 (6), 034010.
Grubbs, F.E. (1969), 'Procedures for detecting outlying observations in samples', Technometrics, 11 (1), 1–21.
Hanneman, R.A. (2000), 'Structural equation models: identification issues', Sociology 203B, accessed 1 July 2012 at http://faculty.ucr.edu/~hanneman/soc203b/lectures/identify.html.
Hayduk, L.A. (1988), Structural Equation Modeling with LISREL: Essentials and Advances, Baltimore, MD: Johns Hopkins University Press.
Hensher, D. and W.H. Greene (2003), 'The mixed logit model: the state of practice', Transportation, 30 (2), 133–76.
Kenny, D.A. (1979), Correlation and Causality, New York: Wiley.
McFadden, D. (1984), 'Econometric analysis of qualitative response models', in Z. Griliches and M.D. Intriligator (eds), Handbook of Econometrics II, Amsterdam: Elsevier, pp. 1395–457.
McFadden, D.L. (1986), 'The choice theory approach to marketing research', Marketing Science, 5 (4), 275–97.
McFadden, D. and K. Train (2000), 'Mixed MNL models for discrete response', Journal of Applied Econometrics, 15 (5), 447–70.
Muthén, L.K. and B. Muthén (2002), 'How to use a Monte Carlo study to decide on sample size and determine power', Structural Equation Modeling: A Multidisciplinary Journal, 9 (4), 599–620.
O'Brien, R.M. (1994), 'Identification of simple measurement models with multiple latent variables and correlated errors', Sociological Methodology, 24, 137–70.
Raveau, S., M.F. Yáñez and J. de D. Ortúzar (2012), 'Practical and empirical identifiability of hybrid discrete choice models', Transportation Research Part B: Methodological, 46 (10), 1374–83.
Reilly, T. (1995), 'A necessary and sufficient condition for identification of confirmatory factor analysis models of factor complexity one', Sociological Methods and Research, 23 (4), 421–41.
Reilly, T. and R.M. O'Brien (1996), 'Identification of confirmatory factor analysis models of arbitrary complexity: the side-by-side rule', Sociological Methods and Research, 24 (4), 473–91.
Temme, D., M. Paulssen and T. Dannewald (2008), 'Incorporating latent variables into discrete choice models – a simultaneous estimation approach using SEM software', BuR – Business Research, 1 (2), 220–37.
Train, K.E. (2009), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Tudela, A., K.M.N. Habib, J.A. Carrasco and A.O. Osman (2011), 'Incorporating the explicit role of psychological factors on mode choice: a hybrid mode choice model by using data from an innovative psychometric survey', paper presented at the Second International Choice Modelling Conference, Leeds, July.
Walker, J.L. (2001), 'Extended discrete choice models: integrated framework, flexible error structures, and latent variables', PhD dissertation, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology.
Walker, J. and M. Ben-Akiva (2002), 'Generalized random utility model', Mathematical Social Sciences, 43 (3), 303–43.
Walker, J.L., M. Ben-Akiva and D. Bolduc (2007), 'Identification of parameters in normal error component logit-mixture (NECLM) models', Journal of Applied Econometrics, 22 (6), 1095–125.
Zhao, J. (2009), 'Preference accommodating and preference shaping: incorporating traveler preferences into transportation planning', PhD dissertation, Massachusetts Institute of Technology.
APPENDIX A: APPLYING THE GENERAL RULES OF IDENTIFICATION TO A CONFIRMATORY FACTOR ANALYTIC MODEL

In section 3.2.1, we stated that the identification problem for a confirmatory factor analytic model can be reduced to finding constraints that ensure a solution to the following system of nonlinear equations:

$$\Omega = \Lambda \Phi \Lambda' + \Delta \Theta \Theta' \Delta'$$

The rules of identification presented in section 3.1 may now be applied to the above equation to verify identifiability. The general approach requires the analyst to be able to express all of the unknown parameters in Λ, Φ and Θ as some function of the elements of the sample covariance matrix Ω. Section A.1 uses this approach to demonstrate why the analyst needs to impose constraints to set the scale of the latent variables, and how this might be accomplished. Section A.2 applies equation (22.19) to evaluate identifiability of the confirmatory factor analytic model shown in Figure 22A.1B.

A.1 The Location and Scale of the Latent Variable

One of the first steps to ensuring identifiability of any confirmatory factor analytic model is to establish the location and scale of each latent variable included in the model specification. Since the indicator responses are usually normalized around the mean, the location of the latent variables is implicitly set to zero. This still leaves the analyst the task of imposing constraints that set the scale of the latent variables. To illustrate why this is necessary, we consider the confirmatory factor analytic model shown in Figure 22A.1A. The model consists of a single latent variable $x^*_1$ loaded on by four uncorrelated indicators $I_1$, $I_2$, $I_3$ and $I_4$. The parameters for the model are given as follows:
Figure 22A.1 Examples of confirmatory factor-analytic models: (A) a model with a single latent variable loaded on by four uncorrelated indicators; (B) a model with two latent variables, five partially correlated indicators and factor complexity two
$$\Lambda = \begin{bmatrix} \lambda_{11} \\ \lambda_{21} \\ \lambda_{31} \\ \lambda_{41} \end{bmatrix}, \quad \Phi = [\phi_{11}], \quad \Delta = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}, \quad \Theta = \begin{bmatrix} \theta_1&0&0&0 \\ 0&\theta_2&0&0 \\ 0&0&\theta_3&0 \\ 0&0&0&\theta_4 \end{bmatrix},$$
where the parameter $\lambda_{i1}$ denotes the factor loading of indicator $I_i$ on the latent variable $x^*_1$; the parameter $\phi_{11}$ denotes the variance of the latent variable $x^*_1$; and the parameters along the diagonal of Θ denote the standard deviations of the measurement errors corresponding to each of the four indicators. Substituting the expressions for Λ, Φ, Δ and Θ in equation (22.18), we get:

$$\Omega(\theta) = \begin{bmatrix}
\lambda_{11}^2\phi_{11}+\theta_1^2 & & & \\
\lambda_{11}\lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11}+\theta_2^2 & & \\
\lambda_{11}\lambda_{31}\phi_{11} & \lambda_{21}\lambda_{31}\phi_{11} & \lambda_{31}^2\phi_{11}+\theta_3^2 & \\
\lambda_{11}\lambda_{41}\phi_{11} & \lambda_{21}\lambda_{41}\phi_{11} & \lambda_{31}\lambda_{41}\phi_{11} & \lambda_{41}^2\phi_{11}+\theta_4^2
\end{bmatrix}$$

(only the lower triangle of the symmetric matrix is shown).
Holding Θ constant, for any Λ and Φ that result in a particular outcome for Ω(θ), Λ/2 and 4Φ result in the same outcome. Therefore, the model is theoretically unidentified and some constraint needs to be imposed to set the scale of the latent variable. The nature of the identification problem is such that the scale can be set in one of multiple ways: by setting the variance of the latent variable to a constant such as one (by constraining the appropriate diagonal element of Φ, the covariance matrix of the latent variables), or by scaling it to any one of the observed indicators by constraining some $\lambda_{ij}$ coefficient, usually to one. In most cases, the choice of constraint is trivial, and the analyst is free to choose whichever constraint is most convenient from the standpoint of estimation. Usually, the scale is set by constraining the factor loading of an indicator that is strongly related to the latent variable. An advantage of this approach is that the latent variable has the same units as the indicator, and is easier to interpret. Once the scale for each latent variable has been set, the analyst should verify that the other model parameters are also identifiable.
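This rescaling argument can be verified numerically. The following minimal sketch (our own illustration, with arbitrary made-up parameter values) constructs Ω twice, once from (Λ, Φ) and once from (Λ/2, 4Φ), and confirms that the two coincide:

```python
import numpy as np

lam = np.array([[0.8], [1.1], [0.6], [0.9]])   # factor loadings (4 x 1), arbitrary values
phi = np.array([[1.5]])                        # variance of the latent variable
theta = np.diag([0.4, 0.3, 0.5, 0.2])          # measurement-error standard deviations

def omega(lam, phi, theta):
    # Covariance matrix of the indicators: Omega = Lam Phi Lam' + Theta Theta'
    # (Delta is the identity here, so it is omitted).
    return lam @ phi @ lam.T + theta @ theta.T

# Lam/2 together with 4*Phi reproduce exactly the same covariance matrix,
# so the scale of the latent variable is not identified without a constraint.
assert np.allclose(omega(lam, phi, theta), omega(lam / 2, 4 * phi, theta))
```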
A.2 A More Complicated Example

Consider, for the sake of illustration, the confirmatory factor analytic model shown in Figure 22A.1B (the same as the model from Figure 22.4B). As we shall show in this section, the model is just identifiable. The model consists of two correlated latent factors, $x^*_1$ and $x^*_2$, and five partially correlated indicator measures, $I_1$, $I_2$, $I_3$, $I_4$ and $I_5$, and has a factor complexity of two, where factor complexity is defined as the maximum number of latent variables loaded on by a single indicator. The parameters for the model are given as follows:

$$\Lambda = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & \lambda_{32} \\ 0 & \lambda_{42} \\ 0 & 1 \end{bmatrix}, \quad \Phi = \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix},$$

$$\Delta = \begin{bmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&1&0&1 \\ 0&0&1&0&0&0&1&0 \\ 0&0&0&1&0&1&1&0 \\ 0&0&0&0&1&0&0&1 \end{bmatrix}, \quad \Theta = \mathrm{diag}(\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8),$$

where the parameter $\lambda_{ij}$ belonging to Λ denotes the factor loading of indicator i on latent variable j; the parameter $\phi_{ij}$ belonging to Φ denotes the covariance between latent variables i and j; the parameters along the diagonal of Θ denote the standard deviations of the independent factors in η; the first five columns of Δ, and the corresponding parameters $\theta_1, \ldots, \theta_5$ belonging to Θ, allow for heteroscedastic error components across the five indicator constructs; and the last three columns of Δ, along with the corresponding parameters $\theta_6$, $\theta_7$ and $\theta_8$ belonging to Θ, capture correlation between the three pairs of indicators $(I_2, I_4)$, $(I_3, I_4)$ and $(I_2, I_5)$, respectively. The scale of the latent variables $x^*_1$ and $x^*_2$ is set by constraining $\lambda_{11} = 1$ and $\lambda_{52} = 1$, respectively. In all, there are 15 unknown parameters. Substituting the expressions for Λ, Φ, Δ and Θ in equation (22.19), we get:
$$\Omega(\theta) = \begin{bmatrix}
\phi_{11}+\theta_1^2 & & & & \\
\lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11}+\theta_2^2+\theta_6^2+\theta_8^2 & & & \\
\lambda_{31}\phi_{11}+\lambda_{32}\phi_{21} & \lambda_{21}\lambda_{31}\phi_{11}+\lambda_{21}\lambda_{32}\phi_{21} & \lambda_{31}^2\phi_{11}+2\lambda_{31}\lambda_{32}\phi_{21}+\lambda_{32}^2\phi_{22}+\theta_3^2+\theta_7^2 & & \\
\lambda_{42}\phi_{21} & \lambda_{21}\lambda_{42}\phi_{21}+\theta_6^2 & \lambda_{31}\lambda_{42}\phi_{21}+\lambda_{32}\lambda_{42}\phi_{22}+\theta_7^2 & \lambda_{42}^2\phi_{22}+\theta_4^2+\theta_6^2+\theta_7^2 & \\
\phi_{21} & \lambda_{21}\phi_{21}+\theta_8^2 & \lambda_{31}\phi_{21}+\lambda_{32}\phi_{22} & \lambda_{42}\phi_{22} & \phi_{22}+\theta_5^2+\theta_8^2
\end{bmatrix}$$

(lower triangle shown; the matrix is symmetric).
Let $\omega_{ij}$ denote the element of the population covariance matrix Ω that lies in row i and column j, that is, $\omega_{ij} = \mathrm{Cov}(I_i, I_j)$. If equation (22.19) allows for each of the 15 parameters to be expressed as some function of the $\omega_{ij}$'s, then the model as a whole is identifiable. There are 15 unknown parameters and at most 15 independent equations. Therefore, the model satisfies the order condition. To start, note that $\phi_{21} = \omega_{51}$, and so $\phi_{21}$ is identifiable (it equals $\omega_{51} = \mathrm{Cov}(I_5, I_1)$). Then $\lambda_{42}$ and $\phi_{22}$ may be calculated as follows:

$$\lambda_{42}\phi_{21} = \omega_{41} \;\Rightarrow\; \lambda_{42} = \omega_{41}/\omega_{51}$$
$$\lambda_{42}\phi_{22} = \omega_{54} \;\Rightarrow\; \phi_{22} = \omega_{54}\,\omega_{51}/\omega_{41}$$

Similarly, $\lambda_{21}$ can be identified from the following pair of equations:
$$\lambda_{21}\lambda_{31}\phi_{11} + \lambda_{21}\lambda_{32}\phi_{21} = \omega_{32} \quad \text{and} \quad \lambda_{31}\phi_{11} + \lambda_{32}\phi_{21} = \omega_{31} \;\Rightarrow\; \lambda_{21} = \omega_{32}/\omega_{31} \;\Rightarrow\; \phi_{11} = \omega_{21}\,\omega_{31}/\omega_{32}$$

Lastly, the parameters $\lambda_{31}$ and $\lambda_{32}$ may be identified using the following pair of equations:

$$\lambda_{31}\phi_{11} + \lambda_{32}\phi_{21} = \omega_{31}$$
$$\lambda_{31}\phi_{21} + \lambda_{32}\phi_{22} = \omega_{53}$$

Once all of the elements of Φ and Λ have been identified, we can turn our attention to Θ. The covariance parameters $\theta_6$, $\theta_7$ and $\theta_8$ can be solved for using the three equations:

$$\lambda_{21}\lambda_{42}\phi_{21} + \theta_6^2 = \omega_{42}$$
$$\lambda_{31}\lambda_{42}\phi_{21} + \lambda_{32}\lambda_{42}\phi_{22} + \theta_7^2 = \omega_{43}$$
$$\lambda_{21}\phi_{21} + \theta_8^2 = \omega_{52}$$

We skip enumerating the equations, but the remaining θ's can also be identified from the five elements of Ω(θ) along the diagonal. Therefore, all of the parameters are identifiable. In fact, the model is 'just-identified' in that the number of unknown parameters exactly equals the number of equations, and the model has zero degrees of freedom.
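The chain of substitutions above is easy to verify symbolically. A minimal SymPy sketch (our own illustration; the symbol names are ours) checks the first two closed forms:

```python
import sympy as sp

l42, f21, f22 = sp.symbols('lambda_42 phi_21 phi_22', positive=True)

# Model-implied moments taken from the covariance matrix Omega in the text
w51 = f21              # omega_51 = Cov(I5, I1) = phi_21
w41 = l42 * f21        # omega_41 = lambda_42 * phi_21
w54 = l42 * f22        # omega_54 = lambda_42 * phi_22

# The closed-form solutions derived above, written in terms of the moments
assert sp.simplify(w41 / w51 - l42) == 0          # lambda_42 = omega_41 / omega_51
assert sp.simplify(w54 * w51 / w41 - f22) == 0    # phi_22 = omega_54 * omega_51 / omega_41
```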
APPENDIX B: APPLYING THE RANK AND ORDER CONDITIONS OF IDENTIFICATION TO A STRUCTURAL EQUATIONS MODEL WITH OBSERVABLE VARIABLES

To illustrate how the order and rank conditions might be applied, we examine the nonrecursive model of Figure 22A.2, taken from Hanneman (2000). The reader should recognize that the model is the same as the one shown in Figure 22.4C. The model parameters are as follows:

$$B = \begin{bmatrix} 0 & \beta_{12} & 0 \\ \beta_{21} & 0 & 0 \\ \beta_{31} & \beta_{32} & 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} \gamma_{11} & 0 \\ 0 & \gamma_{22} \\ 0 & 0 \end{bmatrix}, \quad \Phi = \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix},$$

$$G = \begin{bmatrix} 1&0&0&1&0&1 \\ 0&1&0&1&1&0 \\ 0&0&1&0&1&1 \end{bmatrix}, \quad \Psi = \mathrm{diag}(\psi_1, \psi_2, \psi_3, \psi_4, \psi_5, \psi_6),$$
Figure 22A.2 A nonrecursive structural equations model with observable variables

where the parameter $\beta_{ij}$ belonging to B denotes the coefficient of the jth endogenous variable in the equation for the ith endogenous variable; the parameter $\gamma_{ij}$ belonging to Γ denotes the coefficient of the jth exogenous variable in the equation for the ith endogenous variable; the parameter $\phi_{ij}$ belonging to Φ denotes the covariance between exogenous variables i and j; the parameters along the diagonal of Ψ denote the standard deviations of the independent factors in η; the first three columns of G, and the corresponding parameters $\psi_1$, $\psi_2$ and $\psi_3$ belonging to Ψ, allow for heteroscedastic error components across the three endogenous variables; and the last three columns of G, along with the corresponding parameters $\psi_4$, $\psi_5$ and $\psi_6$ belonging to Ψ, capture correlation between the three pairs of endogenous variables $(x_{d1}, x_{d2})$, $(x_{d2}, x_{d3})$ and $(x_{d1}, x_{d3})$, respectively.

To check the order condition: $L_d - 1 = 2$, and the order condition is satisfied if each of the equations corresponding to the three endogenous variables excludes at least two of the remaining four exogenous and endogenous variables. It can be seen from the path diagram itself that the equation for $x_{d1}$ excludes $x_{d3}$ and $x_{x2}$; the equation for $x_{d2}$ excludes $x_{d3}$ and $x_{x1}$; and the equation for $x_{d3}$ excludes $x_{x1}$ and $x_{x2}$. Therefore, the order condition is satisfied. To check the rank condition, we first construct the matrix A for the model:

$$A = (I - B,\; -\Gamma) = \begin{bmatrix} 1 & -\beta_{12} & 0 & -\gamma_{11} & 0 \\ -\beta_{21} & 1 & 0 & 0 & -\gamma_{22} \\ -\beta_{31} & -\beta_{32} & 1 & 0 & 0 \end{bmatrix}$$
Then, the matrices \ t and A\ t corresponding to each of the three equations in the endogenous variables may be written as: 0 0 \ 1 5 E1 0 0
0 0 0 0 0U, \ 2 5 E1 0 0 1 0
0 0 0 0 0U, \ 3 5 E0 1 1 0 0
0 0 0U 0 1
$$A\Xi_1 = \begin{bmatrix} 0 & 0 \\ 0 & -\gamma_{22} \\ 1 & 0 \end{bmatrix}, \quad A\Xi_2 = \begin{bmatrix} 0 & -\gamma_{11} \\ 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad A\Xi_3 = \begin{bmatrix} -\gamma_{11} & 0 \\ 0 & -\gamma_{22} \\ 0 & 0 \end{bmatrix}$$

The rank of each of the three matrices $A\Xi_1$, $A\Xi_2$ and $A\Xi_3$ is two, and the rank condition is also satisfied. Therefore, the model is identified.
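These rank computations can be reproduced mechanically. A minimal SymPy sketch (our own illustration; the matrix names follow the text, while Xi1 through Xi3 are our labels for the selection matrices) is given below:

```python
import sympy as sp

b12, b21, b31, b32, g11, g22 = sp.symbols(
    'beta12 beta21 beta31 beta32 gamma11 gamma22', nonzero=True)

# A = (I - B, -Gamma) for the nonrecursive model of Figure 22A.2
A = sp.Matrix([[1,    -b12, 0, -g11,  0],
               [-b21,  1,   0,  0,   -g22],
               [-b31, -b32, 1,  0,    0]])

# Selection matrices picking out the columns of the variables excluded
# from each of the three structural equations
Xi1 = sp.Matrix([[0, 0], [0, 0], [1, 0], [0, 0], [0, 1]])  # eq. 1 excludes x_d3, x_x2
Xi2 = sp.Matrix([[0, 0], [0, 0], [1, 0], [0, 1], [0, 0]])  # eq. 2 excludes x_d3, x_x1
Xi3 = sp.Matrix([[0, 0], [0, 0], [0, 0], [1, 0], [0, 1]])  # eq. 3 excludes x_x1, x_x2

# Each product must have rank L_d - 1 = 2 for the rank condition to hold
print([(A * Xi).rank() for Xi in (Xi1, Xi2, Xi3)])   # -> [2, 2, 2]
```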
APPENDIX C: APPLYING THE GENERAL RULES OF IDENTIFICATION TO MIXED LOGIT MODELS

In the following subsections, we apply the rules of identification presented in section 3.4.2 to two special cases of the mixed logit model that have not been addressed previously in the literature. Section C.1 examines the conditions of identification as they apply to models with random parameters on explanatory variables. Section C.2 demonstrates how the identification conditions may be established when working with panel datasets and agent effects.

C.1 The Random Parameters Model
The random parameters model allows the vector of coefficients β to be randomly distributed across decision makers in the sample, and is used when the analyst has reason to believe that tastes in the sample population vary with unobservable variables or purely randomly. The model formulation with normally distributed random taste parameters can be written as:

$$u_n = X_n \beta_n + \nu_n, \quad \text{where } \beta_n \sim N(\beta, UU')$$

and $\beta_n$ is an (L × 1) random normal vector with mean β and covariance matrix $UU'$. Substituting $\beta_n = \beta + U\eta_n$, where U is the lower triangular Cholesky decomposition of the covariance matrix of $\beta_n$ and $\eta_n$ is an (L × 1) vector of independent standard normal terms, leads to a general factor-analytic specification with $F_n = X_n$:

$$u_n = X_n \beta + X_n U \eta_n + \nu_n$$

The parameters to be identified are μ and the elements of U. Though the specification $F_n = X_n$ does slightly change the form of $\mathrm{Var}(\Delta u_n)$, the identification of equation (22.39) continues to be a sufficient condition for identification of the unknown parameters μ and the elements of U. The matrix U is usually specified as diagonal, but it does not have to be. Also, the random distribution need not always be normal. Alternative distributions popularly employed in the literature include the lognormal, triangular, uniform, truncated normal, and so on. In analyzing the covariance matrix of utility differences, we have so far assumed that the systematic portion of the utility is linearly separable from the error structure. However, with distributions such as the lognormal, the mean and the variance of the random parameter are both a function of the two disturbance parameters, and
linear separability does not exist. In such a case, the covariance matrix of utility differences is no longer given by equation (22.39), and must be derived on a case-by-case basis. In the two special cases analyzed so far, the matrix $F_n$ was held constant across decision makers, allowing us to restrict our attention to the covariance matrix of utility differences for a single decision maker. However, for the random parameters model $F_n$ varies across observations, and the number of independent rows in Ω(θ) can be as large as NJ. For these same reasons, the order condition is rarely restrictive, and in applying the rank condition one need only look at the column rank of the Jacobian. Through the following paragraphs, we work through the rules of identification for models where a random normal distribution is imposed on continuous and categorical alternative attributes and individual characteristics.

1. Continuous attributes: There are no identification issues per se. Data permitting, the full covariance structure (that is, variances for each parameter as well as covariances across parameters) can be estimated.
2. Categorical attributes: An interesting and unintuitive identification issue arises when a categorical attribute is specified with independently distributed generic random parameters. Say there are C categories for the variable. Assuming no correlation, there is theoretically a $\beta_c$ and $\sigma_c$ for each category c = 1, ..., C. However, as was mentioned in section 3.3.1, for the systematic component $\beta_c$ a reference level needs arbitrarily to be selected and only (C − 1) $\beta_c$'s can be identified. However, this is not necessarily true for the disturbances. To illustrate this, we shall consider a two-alternative example (since the number of alternatives for a random parameters model does not matter) and a categorical variable with two levels first, and then with three levels. Let $x_{njp}$ be the pth binary variable for alternative j and individual n such that $x_{njp}$ equals 1 if the categorical variable equals p, and zero otherwise. For the two-level case, adopting scalar notation, we specify the utility of alternative j and individual n as follows:

   $$u_{nj} = \beta_{n1} x_{nj1} + \beta_{n2}(1 - x_{nj1}) + \nu_{nj}$$

   $$\Rightarrow\; F_n = \begin{bmatrix} x_{n11} & 1 - x_{n11} \\ x_{n21} & 1 - x_{n21} \end{bmatrix}, \quad U = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}$$

   $$\Rightarrow\; \Omega_n(\theta) = (x_{n11} - x_{n21})^2(\sigma_1^2 + \sigma_2^2) + 2g/\mu^2$$
   From the above, it should be apparent that only the sum $\sigma_1^2 + \sigma_2^2$ is estimable, and not the independent parameters themselves. Either parameter can be normalized to zero, or the parameters can be constrained to be equal. For the three-level case:

   $$u_{nj} = \beta_{n1} x_{nj1} + \beta_{n2} x_{nj2} + \beta_{n3}(1 - x_{nj1} - x_{nj2}) + \nu_{nj}$$

   $$\Rightarrow\; F_n = \begin{bmatrix} x_{n11} & x_{n12} & 1 - x_{n11} - x_{n12} \\ x_{n21} & x_{n22} & 1 - x_{n21} - x_{n22} \end{bmatrix}, \quad U = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix}$$

   $$\Rightarrow\; \Omega_n(\theta) = (x_{n11} - x_{n21})^2\sigma_1^2 + (x_{n12} - x_{n22})^2\sigma_2^2 + (x_{n11} + x_{n12} - x_{n21} - x_{n22})^2\sigma_3^2 + 2g/\mu^2$$

   Therefore, there is one linearly independent equation for each σ in $\Omega_n(\theta)$; that is, all three σ parameters are identified. The reader should verify that the condition holds for all J ≥ 2 and C ≥ 3; that is, a random parameter for each of the categories is theoretically identified for all C ≥ 3. When the categorical attribute is specified with independently distributed alternative-specific random parameters, a reference level must be chosen for the disturbances, and only J(C − 1) $\sigma_{cj}$'s are estimable. For example, for a model with two alternatives, and alternative-specific random parameters on a categorical attribute with three levels:

   $$u_{nj} = \beta_{nj1} x_{nj1} + \beta_{nj2} x_{nj2} + \beta_{nj3}(1 - x_{nj1} - x_{nj2}) + \nu_{nj}$$

   $$F_n = \begin{bmatrix} x_{n11} & x_{n12} & 1 - x_{n11} - x_{n12} & 0 & 0 & 0 \\ 0 & 0 & 0 & x_{n21} & x_{n22} & 1 - x_{n21} - x_{n22} \end{bmatrix}$$

   $$\Rightarrow\; \Omega_n(\theta) = x_{n11}^2\sigma_1^2 + x_{n12}^2\sigma_2^2 + (1 - x_{n11} - x_{n12})^2\sigma_3^2 + x_{n21}^2\sigma_4^2 + x_{n22}^2\sigma_5^2 + (1 - x_{n21} - x_{n22})^2\sigma_6^2 + 2g/\mu^2$$
   $$\Rightarrow\; \Omega_n(\theta) = x_{n11}\sigma_1^2 + x_{n12}\sigma_2^2 + (1 - x_{n11} - x_{n12})\sigma_3^2 + x_{n21}\sigma_4^2 + x_{n22}\sigma_5^2 + (1 - x_{n21} - x_{n22})\sigma_6^2 + 2g/\mu^2,$$
   where the last equality holds because the variables $x_{njp}$ are binary. Therefore, there are four linearly independent equations in the six σ's, and a normalization needs to be imposed on one of $(\sigma_1, \sigma_2, \sigma_3)$ and one of $(\sigma_4, \sigma_5, \sigma_6)$; a numeric illustration follows this list. The reader should apply the equality condition to see that the normalization can be arbitrary. The results derived here were for a model with a single categorical variable. However, all of the results hold true for models with multiple categorical variables as well, as long as the variables themselves and the random parameters imposed on them are both independent. The extrapolation to multiple categorical variables follows from the independence assumption, which allows the analyst to break the covariance matrix into smaller pieces such that each piece corresponds to a different categorical variable. The analyst can then apply the rules of identification to the covariance matrix for each categorical variable separately.
3. Continuous characteristics: If a random parameter is placed on a characteristic of the decision maker (for example, years employed), it necessarily must be interacted with an alternative-specific variable (otherwise it will cancel out when the differences are taken). The normalization of such parameters then depends on the type of variable with which it interacts. In general, if the characteristic interacts with alternative-specific or nest-specific binary variables, then at most one additional disturbance might be identified over the analogous model form without the interaction with the characteristic variable. This is because $F_n = x_n F$, where $x_n$ is a characteristic variable of individual n, and F is the matrix containing alternative-specific and nest-specific binary variables. Then, the covariance matrix of utility differences can be given by:

   $$\Omega_n(\theta) = \Delta F_n U U' F_n' \Delta' + \left(\frac{g}{\mu^2}\right)\Delta\Delta' = x_n^2\, \Delta F U U' F' \Delta' + \left(\frac{g}{\mu^2}\right)\Delta\Delta'$$

   $$\Rightarrow\; \Omega_n(\theta) = x_n^2\, \Omega_{ec}(\theta) + (1 - x_n^2)\left(\frac{g}{\mu^2}\right)\Delta\Delta',$$

   where $\Omega_{ec}(\theta) = \Delta F U U' F' \Delta' + (g/\mu^2)\Delta\Delta'$ is the covariance matrix of the error structure for the analogous error components model. Therefore, the column rank of the Jacobian for random parameters on individual characteristics can be at most one more than the column rank of the Jacobian for the analogous error components case. Since Δ is a function only of the number of alternatives J and is independent of the parameters, the only additional linearly independent equation can be with regard to the unknown parameter $(g/\mu^2)$. For example, if the characteristic interacts with alternative-specific dummy variables, then the model is similar to the heteroskedastic case (see Walker et al., 2007), except that for J ≥ 3 a variance term can be identified for all J alternatives.
4. Categorical characteristics: For characteristics that are categorical variables (for example, low income, medium income, high income), irrespective of the interaction structure a reference level must be chosen for the disturbances, and only (C − 1) σ's can be identified per interaction. We omit the details of the proof, but it is nearly identical to the example where a categorical attribute was specified with independently distributed alternative-specific random parameters.
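The claim that only four of the six σ-equations are linearly independent can be illustrated numerically. The sketch below is our own construction: it stacks the coefficient vectors multiplying $(\sigma_1^2, \ldots, \sigma_6^2, 2g/\mu^2)$ across randomly drawn choice situations and computes the rank of the resulting matrix:

```python
import numpy as np

# Each row collects the coefficients multiplying (s1^2, ..., s6^2, 2g/mu^2)
# in Omega_n for one observation of the two-alternative, three-level example.
rng = np.random.default_rng(1)
rows = []
for _ in range(200):
    # draw one of the three category levels for each alternative (one-hot)
    x11, x12 = np.eye(3)[rng.integers(3)][:2]
    x21, x22 = np.eye(3)[rng.integers(3)][:2]
    rows.append([x11, x12, 1 - x11 - x12, x21, x22, 1 - x21 - x22, 1.0])

# Rank 5 despite 7 unknowns: four independent equations in the six sigma^2
# terms plus one in 2g/mu^2, so one normalization per alternative is required.
print(np.linalg.matrix_rank(np.array(rows)))   # -> 5
```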
C.2 Extensions to Panel Data

Panel data refers to multidimensional data that contain observations over different time periods for each decision maker in the sample. Identification for panel data is different from the cross-sectional case because the error components from the mixing distribution take the same value for all choice situations for a given decision maker, whereas the extreme value terms are i.i.d. across decision makers and choice situations. The mixed errors create correlation over choice situations for a given individual, which can be used for identification. Typically, equation (22.1) would be modified for a panel context by adding subscripts k to denote the time period of the choice and the explanatory variables for that choice. Since identification is determined via the covariance structure, we will focus on this aspect of the formulation and modify equation (22.1) such that the covariance structure is a function of all utilities faced by an individual over all time periods. For simplicity, we assume the same number of time periods (k = 1, ..., K) observed for each person and a universal choice set across individuals and time periods. The covariance structure of interest for a given individual is then a function of all JK utilities that the individual faces:

$$u_{n,pl} = X_{n,pl}\,\beta + F_{n,pl}\, U \eta_n + \nu_{n,pl}$$

$$\Omega_{n,pl}(\theta) = \Delta_{pl}\, F_{n,pl}\, U U'\, F_{n,pl}'\, \Delta_{pl}' + \Delta_{pl}\left(\frac{g}{\mu^2}\right) I_{JK}\, \Delta_{pl}',$$
where pl denotes panel data, $u_{n,pl}$ and $\nu_{n,pl}$ are (JK × 1) vectors, $X_{n,pl}$ is a (JK × L) matrix of observed and latent explanatory variables, β is (L × 1), $F_{n,pl}$ is (JK × R), U is (R × R), $\eta_n$ is (R × 1), $I_{JK}$ is a (JK × JK) identity matrix, and $\Delta_{pl}$ is (J − 1)K × JK. The key in terms of identification is that the covariance matrix of utility differences is now of dimension (J − 1)K × (J − 1)K, which incorporates the added correlation over choice situations for a given individual, referred to as the agent effect. The idea of an agent effect is that what is unobserved for one individual in one time period is likely the same as what is unobserved for the same individual in another time period. This is implemented by having alternative- and individual-specific covariances that are repeated in all time periods for any given individual. For heteroskedastic, nested and cross-nested models, the use of panel data and a model with agent effects can result in the identification of at most one additional parameter over an equivalent model with cross-sectional data and alternative-specific effects. The proof is very similar to that in section C.1 for a model with a random parameter on a continuous characteristic such that the characteristic interacts with alternative-specific or nest-specific binary variables. To illustrate this, we consider a dataset containing two observations for each individual (K = 2), and the same number of alternatives J across all observations and individuals. Dropping the subscript n, the matrices $\Delta_{pl}$ and $F_{pl}$ can then be expressed in terms of their analogs $\Delta_{cs}$ and $F_{cs}$ from the equivalent cross-sectional model as follows:

$$\Delta_{pl} = \begin{bmatrix} \Delta_{cs} & 0_{J-1,J} \\ 0_{J-1,J} & \Delta_{cs} \end{bmatrix}, \quad F_{pl} = \begin{bmatrix} F_{cs} \\ F_{cs} \end{bmatrix},$$

where $0_{M,N}$ is an (M × N) matrix of zeros, $\Delta_{pl}$ is a (J − 1)K × JK block diagonal matrix formed by stacking the matrix $\Delta_{cs}$ along the diagonal K times, and $F_{pl}$ is a JK × R matrix formed by stacking $F_{cs}$ vertically K times, where K = 2 in this case. Then, the covariance matrix of utility differences can be calculated as shown below:

$$\Omega_{pl}(\theta) = \begin{bmatrix} \Delta_{cs} F_{cs} U \\ \Delta_{cs} F_{cs} U \end{bmatrix} \begin{bmatrix} U' F_{cs}' \Delta_{cs}' & U' F_{cs}' \Delta_{cs}' \end{bmatrix} + \left(\frac{g}{\mu^2}\right) \begin{bmatrix} \Delta_{cs}\Delta_{cs}' & 0_{J-1,J-1} \\ 0_{J-1,J-1} & \Delta_{cs}\Delta_{cs}' \end{bmatrix}$$

$$\Rightarrow\; \Omega_{pl}(\theta) = \begin{bmatrix} \Omega_{cs}(\theta) & \Omega_{cs}(\theta) - (g/\mu^2)\Delta_{cs}\Delta_{cs}' \\ \Omega_{cs}(\theta) - (g/\mu^2)\Delta_{cs}\Delta_{cs}' & \Omega_{cs}(\theta) \end{bmatrix}$$

$$\Rightarrow\; \mathrm{vecu}(\Omega_{pl}(\theta)) = \begin{bmatrix} \mathrm{vecu}(\Omega_{cs}(\theta)) \\ \mathrm{vecu}(\Omega_{cs}(\theta) - (g/\mu^2)\Delta_{cs}\Delta_{cs}') \end{bmatrix}$$
Note that the same expression for $\mathrm{vecu}(\Omega_{pl}(\theta))$ holds for all K ≥ 2. Therefore, the row rank of the Jacobian for panel data can be at most J(J − 1)/2 more than the row rank for the cross-sectional data (the maximum number of unique elements in $\Omega_{cs}(\theta) - (g/\mu^2)\Delta_{cs}\Delta_{cs}'$). It should further be apparent that the column rank of the Jacobian for panel data can be at most one more than the column rank for cross-sectional data. Thus, the use of panel data and agent effects can result in at most one additional linearly independent equation (in $g/\mu^2$) available in Ω(θ). This may only be true in the case of
heteroskedastic, nested and cross-nested specifications, and even then not always. For the random parameters model, the use of panel data and agent effects does not change the identification problem: continuous attributes are theoretically always identifiable, and the same conditions hold for categorical attributes, and continuous and categorical characteristics as with cross-sectional data.
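The block structure just derived can be confirmed numerically. The sketch below is our own illustration (J = 3 with one error component per alternative; the parameter values, and the value used for g/μ², are arbitrary assumptions):

```python
import numpy as np

J, K = 3, 2
g_over_mu2 = np.pi**2 / 6.0           # assumed value for the Gumbel term g/mu^2

D_cs = np.hstack([np.eye(J - 1), -np.ones((J - 1, 1))])  # differencing matrix
F_cs = np.eye(J)                      # one error component per alternative
U = np.diag([0.7, 1.2, 0.4])          # Cholesky factor (diagonal here)

# Panel analogs: Delta_pl block diagonal, F_pl stacked vertically (K = 2)
D_pl = np.block([[D_cs, np.zeros((J - 1, J))],
                 [np.zeros((J - 1, J)), D_cs]])
F_pl = np.vstack([F_cs, F_cs])

Omega_cs = D_cs @ F_cs @ U @ U.T @ F_cs.T @ D_cs.T + g_over_mu2 * D_cs @ D_cs.T
Omega_pl = (D_pl @ F_pl @ U @ U.T @ F_pl.T @ D_pl.T
            + g_over_mu2 * D_pl @ np.eye(J * K) @ D_pl.T)

# Diagonal blocks equal Omega_cs; off-diagonal blocks lack the Gumbel term.
assert np.allclose(Omega_pl[:J-1, :J-1], Omega_cs)
assert np.allclose(Omega_pl[:J-1, J-1:], Omega_cs - g_over_mu2 * D_cs @ D_cs.T)
```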
23 Numerical methods for optimization-based model estimation and inference

David S. Bunch
1 INTRODUCTION¹
Researchers frequently use quantitative models to study and analyze choice behavior. They may obtain data on observed choice behavior through a variety of sources and methods (see Chapters 6 and 7) and then use the data to estimate or calibrate models that can be used for a variety of purposes, such as to develop and test theories, or to support the needs of decision makers. This chapter assumes a classical modeling framework where a dataset is viewed as a collection of observed outcomes from a data-generating process (DGP). Based on theory or other related considerations, a researcher may assume that the DGP is a member of a parametric family of models, that is, where a model is defined by θ, a K-vector of parameters with domain (parameter space) Θ. If this assumption is correct, then the DGP is defined by a specific (unknown) true parameter θ₀. Because observed choice data are stochastic from the perspective of an analyst, the models take the form of probability distributions. Analyzing an observed dataset requires the researcher to compute model estimates (θ̂'s) for various candidate model specifications, as well as related statistics for performing hypothesis tests, model selection, and other forms of inference. These activities require selection and use of computational methods. In some cases they may have been implemented within software packages developed by others, and in other cases researchers must develop and code the methods themselves. Either case requires knowledge and understanding to assure credible results. The purpose of this chapter is to provide information that contributes to this understanding. The context here will be discrete choice modeling, so some specialized notation will be helpful.² Choices are made from a set C of discrete alternatives (indexed by j = 1, ..., J) associated with data z (perhaps in the form of a vector or array) that may be used to explain the choice behavior. Let $P(y_c \mid z, \theta)$ be the probability that alternative $y_c$ is chosen from C, conditional on explanatory data z and parameter θ. Model estimation is based on a dataset of N observations $(y_{ci}, z_i)$, for i = 1, ..., N, where $y_{ci}$ and $z_i$ are the discrete choice and explanatory data, respectively, for the ith observation, and the N observations are typically assumed to be independent. The entire dataset can be represented in vector-matrix form as (y, z).³ To be consistent with references to be cited later, we use y in various forms to denote the dependent variable (discrete choice). We use z rather than x for explanatory variables to be consistent with much of the choice modeling literature, which frequently reserves x to represent the attributes of choice alternatives. Explanatory variables can then be a general function of both attributes and decision-maker characteristics, and the average (or strict) utility for a choice alternative ($V_j$) can then be written as a linear-in-parameters
form. Although it would be possible to allow for observation-specific choice sets ($C_i$) and sizes ($J_i$), we assume the same choice set for all observations because it is simpler and does not materially affect the discussion. Given the observed data, the well-known maximum likelihood estimator (MLE) can be defined as the value θ̂ that maximizes the log-likelihood function

$$LL(\theta) = \sum_{i=1}^{N} \log P(y_{ci} \mid z_i, \theta).$$   (23.1)

The MLE is just one example of an 'extremum estimator', that is, an estimator that maximizes (or minimizes) some criterion function. For models of the type discussed in this handbook, computing such estimators requires the numerical solution of a nonlinear optimization problem. In practical terms, this entails performing an iterative search in K-dimensional space. These problems can be difficult to solve, for reasons to be discussed. Moreover, many recently developed choice models are expressed as multidimensional integrals that cannot be computed using simple analytical formulas, so computing their probabilities requires expensive procedures such as simulation or numerical quadrature. In such cases computing a set of N probabilities for a dataset (even one time) can be time-consuming, rendering an iterative search in high-dimensional space that much more challenging. Examples include multinomial probit and mixed logit models (see Chapters 3 and 22 in this volume), particularly in cases with panel data or discrete choice experiments where latent error terms are correlated (see, for example, Chapter 14 in this volume).⁴ This chapter discusses parameter estimation problems and numerical methods that can be used to solve them, with an emphasis on understanding how and why they work. The next section reviews mathematical concepts and notation required in later sections. Section 3 discusses statistical estimators for choice models and provides a framework for addressing both statistical and computational issues. Section 4 gives details on a variety of numerical methods and procedures required for computing estimators. Section 5 concludes with a brief summary.
2 MATHEMATICAL PRELIMINARIES AND NOTATION
The subject matter of this chapter uses concepts from both econometrics/statistics and numerical analysis, and many researchers may not necessarily have equal levels of training in both areas.⁵ Although the mathematical concepts employed by these two areas do overlap, the notation and conventions are frequently different. We review key concepts and notation to provide a consistent basis for discussion, as well as an entrée to each literature. A few key references are heavily cited throughout, allowing ready access to more detailed and rigorous treatments of each topic. Two main econometric references are Amemiya (1985) and Cameron and Trivedi (2005).⁶ They use very similar notation, so we adopt it here. Estimators are defined by statistical criterion functions written as $Q_N(\theta) = Q_N(y, z, \theta)$. The form $Q_N(\theta)$ allows everything but N and θ to be suppressed when focusing on the statistical behavior of estimators as a function of sample size, a primary emphasis of econometrics. Estimators
are defined based on the nature of the criterion, for example, maximizing a likelihood function, or minimizing the sum of squared errors. Numerical analysts use a relatively consistent notation for defining optimization problems, so the main focus is on how to find the solutions. Any objective function is usually expressed as f(x), and the convention is for f(x) to be minimized with respect to x. For example, the unconstrained minimization problem is defined by:

Given $f: \mathbb{R}^n \to \mathbb{R}$, find $x^* \in \mathbb{R}^n$ for which $f(x^*) \le f(x)$ for every $x \in \mathbb{R}^n$,

and is abbreviated as $\min_{x \in \mathbb{R}^n} f(x)$. With respect to terminology, numerical analysts are careful to refer to the optimal value of x (typically denoted x*, as above) as the minimizer of f(x), as distinct from the minimum of f(x), that is, the value f(x*). Because optimization is a central issue here, we adopt the convention of locating minimizers to be consistent with this literature. The main reference on numerical methods is Dennis and Schnabel (1996). Both econometrics and optimization frequently rely on multivariable calculus to prove results and develop methods. For example, econometricians are concerned about whether an estimator is consistent and/or efficient, if its distribution behaves like a normal distribution when N gets large, and if so, how to best estimate a variance-covariance matrix for hypothesis testing – see section 3. Numerical analysts are concerned with whether an algorithm reliably converges to a solution, and if so, how quickly – see section 4. Theoretical results that address these issues typically require assumptions on the behavior of the first- and second-order derivatives of $Q_N(\theta)$. Discussing these in an understandable way requires clear notation. An econometrician might denote the K × 1 vector of partial derivatives of $Q_N(\theta)$ evaluated at θ* by

$$\frac{\partial Q_N}{\partial \theta}\bigg|_{\theta^*} = \left[\frac{\partial Q_N}{\partial \theta_1}(\theta^*), \frac{\partial Q_N}{\partial \theta_2}(\theta^*), \ldots, \frac{\partial Q_N}{\partial \theta_K}(\theta^*)\right]'$$

where the prime symbol (′) means transpose. In contrast, numerical analysts frequently use the symbol ∇ to denote the gradient of f at x*, defined by

$$\nabla f(x^*) = \left[\frac{\partial f}{\partial x_1}(x^*), \frac{\partial f}{\partial x_2}(x^*), \ldots, \frac{\partial f}{\partial x_n}(x^*)\right]^T,$$

where T denotes transpose. Econometricians might represent the K × K matrix of second partial derivatives of $Q_N(\theta)$ evaluated at θ* by
$$\frac{\partial^2 Q_N(\theta)}{\partial \theta\, \partial \theta'}\bigg|_{\theta^*}, \quad \text{where } \left[\frac{\partial^2 Q_N(\theta)}{\partial \theta\, \partial \theta'}\bigg|_{\theta^*}\right]_{ij} = \frac{\partial^2 Q_N(\theta^*)}{\partial \theta_i\, \partial \theta_j} \text{ for } 1 \le i, j \le K,$$

whereas numerical analysts use ∇² to denote the Hessian of f at x*, defined as the n × n matrix whose i, j element is given by

$$\left[\nabla^2 f(x^*)\right]_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j}(x^*) \text{ for } 1 \le i, j \le n.$$
Having established this notation, we may choose to use any combination of its features as needed. Proofs of key results in both econometrics and optimization typically assume existence and continuity of the derivatives, as well as non-singularity of the matrix of second partial derivatives at or near a solution. These assumptions support the use of Taylor series-based approximations to $Q_N(\theta)$. To illustrate, recall that the primary goal of estimation is to locate a minimizer of $Q_N(\theta)$. For this problem to be well posed, a minimizer of $Q_N(\theta)$ must actually exist, and it is also desirable for it to be unique within a local neighborhood (as will be seen shortly). Suppose an iterative search algorithm returns with a value θ̂ that it reports is a solution. Assume that the gradient and Hessian of $Q_N(\theta)$ are continuous in an area D containing θ̂, and that $\nabla^2 Q_N(\hat{\theta})$ is positive definite. Let a vector s represent any step away from θ̂ to a new point θ̂ + s. For any $s \in \mathbb{R}^K$ such that $\hat{\theta} + s \in D$, it can be shown that there exists a vector θ̄ lying in the open interval (θ̂, θ̂ + s) for which the following quadratic equation is true:⁷

$$Q(\hat{\theta} + s) = Q(\hat{\theta}) + \nabla Q(\hat{\theta})^T s + \tfrac{1}{2}\, s^T \nabla^2 Q(\bar{\theta})\, s$$   (23.2)

Now, assume also that the usual first-order condition for optimality holds, that is,

$$\nabla Q_N(\hat{\theta}) = 0$$   (23.3)

Then

$$Q(\hat{\theta} + s) - Q(\hat{\theta}) = \tfrac{1}{2}\, s^T \nabla^2 Q(\bar{\theta})\, s > 0$$   (23.4)

for all s because $\nabla^2 Q_N(\bar{\theta})$ is positive definite. Therefore $Q(\hat{\theta}) < Q(\hat{\theta} + s)$ for all $\hat{\theta} + s \in D$, that is, θ̂ is a unique local minimizer. This idea of using a model to approximate a nonlinear function in a local neighborhood is helpful for understanding fundamental concepts in both econometrics and numerical optimization, as will be seen in later sections.
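These conditions can be checked numerically at a reported solution θ̂: the gradient should be (approximately) zero and the Hessian should be positive definite. The following minimal sketch is our own illustration on a stand-in quadratic criterion; in applications Q would be the actual estimation criterion, with derivatives obtained analytically or by finite differences:

```python
import numpy as np
from scipy.optimize import minimize, approx_fprime

def Q(theta):
    # Illustrative smooth criterion (a stand-in for Q_N); unique minimizer exists
    return (theta[0] - 1.0)**2 + 2.0 * (theta[1] + 2.0)**2 + 0.1 * theta[0] * theta[1]

res = minimize(Q, x0=np.zeros(2), method="BFGS")
theta_hat = res.x

# First-order condition (23.3): the gradient should vanish at theta_hat
print("gradient:", approx_fprime(theta_hat, Q, 1e-7))

# Second-order check: the Hessian (here built by finite-differencing the
# gradient) should have all eigenvalues positive -> unique local minimizer
eps, K = 1e-5, len(theta_hat)
H = np.zeros((K, K))
for i in range(K):
    e = np.zeros(K); e[i] = eps
    H[i] = (approx_fprime(theta_hat + e, Q, 1e-7) -
            approx_fprime(theta_hat - e, Q, 1e-7)) / (2 * eps)
print("Hessian eigenvalues:", np.linalg.eigvalsh((H + H.T) / 2))
```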
3 ESTIMATORS FOR DISCRETE CHOICE MODELS
Various estimators for discrete choice models have been proposed in the literature, and frequently they can be considered extremum estimators, which many authors also call
m-estimators. The distinction is that m-estimators are written as a sum of N independent observations, and may be considered a special case. Following Cameron and Trivedi (2005, eq. 5.4) we define m-estimators using the following form for $Q_N(\theta)$:

$$Q_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} q_i(\theta) = \frac{1}{N} \sum_{i=1}^{N} q(y_i, z_i, \theta)$$   (23.5)
where the sub-function q(·) yields a criterion measure for the ith observation. Amemiya (1985, ch. 4) defines extremum estimators using a general Q, and provides results on their properties. Cameron and Trivedi (2005) focus more on m-estimators, and provide a more accessible version of Amemiya's (1985) results. The 1/N factor in equation (23.5) does not materially affect the results (Amemiya, 1985, omits it) but they include it to simplify the statement and proof of theoretical results on statistical properties.⁸ The term m-estimator is generally attributed to Huber, and has been interpreted to be an abbreviation for maximum likelihood-like estimator. In contrast to much of the econometrics literature, Huber (1981, eq. 2.1) defines the m-estimator in terms of minimization (rather than maximization), which is consistent with the convention adopted here. Theory suggests that the most general extremum estimator be defined as:

$$\hat{\theta} = \arg\min Q_N(\theta) \text{ subject to } \theta \in \Theta$$   (23.6)
However, features of this definition have practical implications from both an optimization and econometric perspective. As written, equation (23.6) formally defines θ̂ as a constrained global minimizer. First, although this definition may be a logical choice in theory, constraints can greatly complicate the solution of nonlinear optimization problems. Amemiya (1985, p. 108) notes that for this reason estimators are typically obtained by unconstrained optimization.⁹ Second, and perhaps more important, it can be difficult to prove that a computed estimate yields a global optimum rather than a local optimum. For this reason, estimators are frequently defined as solutions to the equation

$$\nabla Q_N(\theta) = 0$$   (23.7)
that is, the first-order conditions for an unconstrained local optimum discussed in the previous section. With regard to statistical behavior, Amemiya (1985, ch. 4) proves consistency of extremum estimators for both global and local optima.¹⁰ Although there are potential issues with multiple local optima, it is also true that asymptotic normality (another important property) can generally be proved only for estimators defined as local optima – see Amemiya (1985, p. 110). However, there are other types of estimators, particularly in econometrics. Cameron and Trivedi (2005, pp. 119, 134) suggest a framework for defining an estimator as θ̂ that solves a system of K estimating equations

$$h_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} h(y_i, z_i, \theta) = 0.$$   (23.8)
One type of estimator (called an 'analogue estimator') uses population moment conditions to suggest sample moment conditions for defining h. More generally, Cameron and Trivedi (2005, p. 135) indicate that the theory for these estimators 'can be subsumed within that for generalized method of moments' (emphasis in the original). Because h in equation (23.8) is a general function, equation (23.7) could be considered a special case. For this reason Cameron and Trivedi (2005) suggest that estimators based on (23.7) could be called 'estimating equations estimators' rather than 'm-estimators'. On the other hand, solutions to equation (23.8) are also solutions to the problem of minimizing the criterion function defined by $h_N(\theta)' h_N(\theta)$, so in some ways this is a matter of perspective. When developing numerical methods to solve these problems the primary concern is effectiveness and practicality, and aspects of both viewpoints can be incorporated. The remainder of this section reviews more details on estimation and analysis of discrete choice models. It begins with introductory material, followed by presentation of a general estimation framework that can be used to address both statistical and computational issues. Statistical properties and computation of variance estimates are reviewed, and then computational implications for estimation and inference are explored in more detail using MLE for discrete choice models as an example.

3.1 M-estimators for Discrete Choice Models
The MLE definition in equation (23.1) is frequently used in the choice-modeling literature, and is familiar to many. However, a more complete understanding requires additional detail. Maximum likelihood estimation is formally defined as finding θ̂ that maximizes the joint likelihood (or joint probability) of having observed the dataset (y, z). Using general notation – see Cameron and Trivedi (2005, p. 139) – the probability of observing $(y_i, z_i)$ conditional on θ is given by density function $f(y_i, z_i \mid \theta)$, so in vector notation the likelihood function can be defined as $L_N(\theta) = L_N(\theta \mid y, z) \equiv f(y, z \mid \theta)$. For a given θ, $L_N(\theta)$ is the probability of having observed (y, z), and the idea is to find θ̂ that maximizes this probability. In this definition both y and z can depend on θ; however, many applications assume that z depends on parameters other than θ (for example, γ), and also that these are not of interest. This allows us to write $L_N(\theta, \gamma) = f(y, z \mid \theta, \gamma) = f(y \mid z, \theta)\, f(z \mid \gamma)$. Because the marginal density of z does not affect the estimation of θ, it can be dropped and the MLE can be defined using the conditional likelihood function $L_N(\theta) = f(y \mid z, \theta)$. Technically, this should be called the conditional maximum likelihood estimator, but it is common for the word 'conditional' to be dropped. Although most analyses are conditional, there are cases in discrete choice modeling that require a full likelihood approach. An important example is choice-based sampling, where data are collected from respondents based on an observed choice. For example, if only a small percentage of the population rides a train to work, in a mode choice study it is more efficient to recruit and survey them during their commute on the train. See Amemiya (1985, pp. 319–38, with key references on p. 321) and Cameron and Trivedi (2005, pp. 822–27). Although we do not consider specific examples of this type, it is important to recognize the distinction. The most general results on statistical properties in section 3.3 are still applicable, and computational methods in section 4 would still be
used (in some form) to obtain estimates. Having said this, we proceed with the conditional likelihood case for clarity of presentation. Statistical independence among observations implies that $L_N(\theta)$ is given by

$$L_N(\theta) = \prod_{i=1}^{N} f(y_i \mid z_i, \theta).$$   (23.9)
It is common practice to take the log of $L_N(\theta)$, yielding

$$LL_N(\theta) = \log[L_N(\theta)] = \sum_{i=1}^{N} \log[f(y_i \mid z_i, \theta)]$$   (23.10)
because doing so does not change the value of the estimate, and dividing this expression by N yields an expression that generally converges to a constant – see Amemiya (1985, p. 115). The gradient $\nabla LL_N(\theta)$ in the first-order condition for local optima (equation (23.7)) is called the 'score vector'. Estimation of discrete choice models with $J = 2$ has the same essential features as a broad range of other nonlinear models with scalar dependent variables. However, for $J > 2$ their estimation takes on additional structure. For example, although equation (23.1) looks almost exactly like equation (23.9), there is a key difference. The discrete choice for observation i represents a single random draw from a multinomial distribution with J-dimensional probability vector $P_i(\theta) = P(z_i, \theta) = [P_1(z_i, \theta), \ldots, P_J(z_i, \theta)]$. Although the index of the chosen alternative can be recorded as $y_{ci}$, an equivalent dependent variable can be expressed as a J-dimensional vector $y_i$ of indicators $(y_{i1}, y_{i2}, \ldots, y_{iJ})$, where $y_{ij} = 1$ for $y_{ci} = j$, and $y_{ij} = 0$ otherwise. Using this notation, the (conditional) likelihood function $L_N(\theta)$ is the product of N individual likelihoods,

$$L_N(\theta) = \prod_{i=1}^{N} l_i(\theta) = \prod_{i=1}^{N} l(y_i \mid z_i, \theta), \quad \text{where} \quad l(y_i \mid z_i, \theta) = \prod_{j=1}^{J} P_j(z_i, \theta)^{y_{ij}}, \qquad (23.11)$$

and taking logs yields

$$LL_N(\theta) = \sum_{i=1}^{N} \log l(y_i \mid z_i, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{J} y_{ij} \log P_j(z_i, \theta). \qquad (23.12)$$
The first part of equation (23.12) looks like equation (23.9). However, the multinomial structure yields a double sum in the second part of equation (23.12). Note that this still fits the definition of an m-estimator, since it is equivalent to minimizing equation (23.5) using

$$q(y_i, z_i, \theta) \equiv -\log[l(y_i \mid z_i, \theta)]. \qquad (23.13)$$
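To make equations (23.11)–(23.13) concrete, the following sketch evaluates the (negative, 1/N-normalized) log-likelihood for a multinomial logit model in Python. The linear-in-parameters utility, the array layout and the function names are illustrative assumptions, not part of the chapter's formal development.

```python
import numpy as np

def mnl_probs(theta, Z):
    """Multinomial logit choice probabilities P_j(z_i, theta).

    Z: (N, J, K) array of alternative attributes; a linear-in-parameters
    utility V_ij = z_ij' theta is assumed purely for illustration."""
    V = Z @ theta                         # (N, J) systematic utilities
    V -= V.max(axis=1, keepdims=True)     # guard against overflow
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)

def neg_loglik(theta, Z, Y):
    """Q_N(theta) = -(1/N) sum_i sum_j y_ij log P_j(z_i, theta),
    matching equations (23.12)-(23.15); Y is the (N, J) indicator array."""
    P = mnl_probs(theta, Z)
    return -(Y * np.log(P)).sum() / Y.shape[0]
```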
The motivation for the above example was a single discrete choice per observation, conditional on a stochastic $z_i$; however, the notation can be used more generally to represent other choice modeling situations. For example, consider the case of repeated measurements where $z_i$ can only take on S distinct possible combinations of values. This
could occur in an experimental setting where $z_i$ represents a specific form of treatment applied to multiple subjects. Or, in a more standard application involving survey data, $z_i$ might be based on household demographic variables that have been coded using a finite number of categories. In this situation we refer to S 'choice settings' characterized by $z_i$, $i = 1, \ldots, S$, and use $y_i$ to record the number of times each choice alternative is chosen in choice setting i. The number of observed choices in choice setting i ($n_i$), and the total number of choices (N) are given by

$$n_i = \sum_{j=1}^{J} y_{ij} \quad \text{and} \quad N = \sum_{i=1}^{S} n_i,$$

respectively. If we continue to assume that all choices within a setting are drawn from a multinomial distribution $P_i(\theta)$, the likelihood for choice setting i is now given by

$$l_i(\theta) = \frac{n_i!}{\prod_{j=1}^{J} y_{ij}!} \prod_{j=1}^{J} P_j(z_i, \theta)^{y_{ij}}. \qquad (23.14)$$
The MLE can be defined exactly as before, that is, by equations (23.11) and (23.12), because the only difference is a constant term that can be omitted. Equivalently, the MLE can be defined as the minimizer of $Q_N(\theta)$ given by equations (23.5) and (23.13):

$$Q_N(\theta) = -\frac{1}{N} \sum_{i=1}^{S} \log[l(y_i \mid z_i, \theta)] = -\frac{1}{N} \sum_{i=1}^{S} \sum_{j=1}^{J} y_{ij} \log P_j(z_i, \theta) \qquad (23.15)$$
where, again, the constant terms not affecting the estimation have been dropped. Note that the first equation in (23.15) defines the MLE even in cases where the observations within a choice setting are not treated as simple repeated measures, as, for example, in panel data or stated choice experiments. For historical reference, it is worth noting that the second equation in (23.15) is the formulation used in the seminal McFadden (1974) paper, and was also the subject of investigation by other researchers at the time – see Amemiya (1985, ch. 9 and references therein). It has been included here to illustrate a wider range of possible estimators, and the role that the DGP can play in their statistical behavior. For example, in the repeated measures case, alternative estimators that use relative frequencies $f_{ij} = y_{ij}/n_i$ as direct estimates of true choice probabilities (so-called sufficient statistics) can be considered. One such estimator is nonlinear least squares (NLLS) defined by

$$Q_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{1}{2} \left( \frac{y_{ij}}{n_i} - P_j(z_i, \theta) \right)^2. \qquad (23.16)$$
Note that there is nothing that precludes this estimator for the case of $n_i = 1$. This is based on the nonlinear regression model of the form

$$\frac{y_{ij}}{n_i} = P_j(z_i, \theta) + u_{ij}, \qquad (23.17)$$
where $u_{ij}$ is a stochastic error term. Although NLLS based on (23.16) is clearly consistent, it is also inefficient. Because $u_{ij}$ is heteroskedastic with known properties, (23.16) can be modified using appropriate weights to improve its statistical properties. In any case, all m-estimators share some common properties, as will be shown next.
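As an illustration of (23.16)–(23.17), here is a minimal sketch that evaluates the (unweighted) NLLS criterion from grouped choice counts; it reuses the hypothetical mnl_probs helper from the earlier sketch, and the weighting refinement just mentioned is deliberately omitted.

```python
import numpy as np

def nlls_objective(theta, Z, Y):
    """NLLS criterion (23.16): average squared gap between relative
    frequencies y_ij / n_i and model probabilities P_j(z_i, theta).

    Y: (N, J) array of choice counts per setting; Z as in mnl_probs."""
    P = mnl_probs(theta, Z)              # hypothetical model function
    n = Y.sum(axis=1, keepdims=True)     # n_i: number of choices per setting
    f = Y / n                            # relative frequencies f_ij
    return 0.5 * ((f - P) ** 2).sum() / Y.shape[0]
```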
3.2 A Generalized Estimation Framework
The structure of m-estimators fits neatly into a general parameter estimation framework used by Gay and Welsch (1988) and Bunch, Gay, and Welsch (1993). It can be viewed as adding structure to equation (23.5) by introducing a model and/or residual. The following is a slightly simplified version, modified to use this chapter's notation. Let

$$Q(\theta) = \sum_{i=1}^{N} \rho_i(h_i(\theta)) \qquad (23.18)$$
where $h_i$ is a model function and $\rho_i$ is a criterion function that measures the 'error' from using $h_i$ as an 'approximation' to the ith response variable. (Note that the 1/N factor is not included, as mentioned earlier.) The subscript i indicates (possible) dependence on observed data, typically explanatory data in $h_i$, and the dependent variable in $\rho_i$. For example, standard linear regression is defined by $h_i(\theta) = z_i^T \theta$ and $\rho_i(h) = (y_i - h)^2$. The m-estimators defined by equations (23.12), (23.15) and (23.16) can be rewritten in the form of (23.18) by a suitable re-indexing and revision in notation – a detailed example for (23.12) is given in section 3.4. Note for now that MLEs would use $\rho_i(h) = -y_i \log(h)$ and NLLS would use $\rho_i(h) = (f_i - h)^2$, where $f_i$ is an observed relative frequency. The first and second derivatives of $Q(\theta)$ are of interest for reasons discussed in section 2. The composite structure of (23.18) yields the following form for the gradient:

$$\nabla Q(\theta) = J^T \rho' \qquad (23.19)$$
where J is the $N \times K$ Jacobian matrix for the model h, defined by

$$J_{ik} = \frac{\partial h_i(\theta)}{\partial \theta_k} \quad \text{for } i = 1, \ldots, N \text{ and } k = 1, \ldots, K, \qquad (23.20)$$
and $\rho'$ is the $N \times 1$ vector defined by

$$(\rho')_i = \frac{\partial \rho_i}{\partial h}(h_i) \quad \text{for } i = 1, \ldots, N. \qquad (23.21)$$
The Hessian matrix has the form

$$\nabla^2 Q(\theta) = J^T \langle \rho'' \rangle J + \sum_{i=1}^{N} (\rho')_i\, \nabla^2 h_i(\theta) \qquad (23.22)$$
where $\langle \rho'' \rangle$ is the diagonal matrix defined by

$$\langle \rho'' \rangle = \mathrm{diag}\left( \frac{\partial^2 \rho_1}{\partial h^2}(h_1), \frac{\partial^2 \rho_2}{\partial h^2}(h_2), \ldots, \frac{\partial^2 \rho_N}{\partial h^2}(h_N) \right). \qquad (23.23)$$
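Equations (23.19)–(23.23) translate almost directly into code. The sketch below assembles the gradient and the first (cheap) term of the Hessian from a model Jacobian; the finite-difference Jacobian helper and all names are assumptions made for illustration, not part of the Gay–Welsch framework itself.

```python
import numpy as np

def jacobian_fd(h, theta, eps=1e-6):
    """Finite-difference Jacobian J_ik = dh_i/dtheta_k of a model
    function h: R^K -> R^N, as defined in equation (23.20)."""
    h0 = h(theta)
    J = np.empty((h0.size, theta.size))
    for k in range(theta.size):
        tp = theta.copy()
        tp[k] += eps
        J[:, k] = (h(tp) - h0) / eps
    return J

def gradient_and_first_hessian_term(h, rho_prime, rho_double_prime, theta):
    """Gradient J' rho' (23.19) and first Hessian term J' <rho''> J (23.22).
    rho_prime / rho_double_prime map model values h_i to the first and
    second derivatives of the criterion rho_i."""
    hv = h(theta)
    J = jacobian_fd(h, theta)
    grad = J.T @ rho_prime(hv)
    H1 = J.T @ (rho_double_prime(hv)[:, None] * J)   # J' diag(rho'') J
    return grad, H1
```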
These expressions have a number of implications for both estimation and inference. With regard to computing, the derivatives of $\rho$ are simple and easily coded. The Jacobian matrix is solely a function of the model, regardless of what type of estimator is employed. These derivatives will be nontrivial in general because discrete choice models are nonlinear. However, many choice models include terms of the form $V = z^T \beta$, so in these cases the chain rule will yield a term that consists simply of z. Once the Jacobian is available, the gradient is simple to compute and so too is the first term of the Hessian in equation (23.22). Moreover, it is clear that this term is positive definite (or, at the very least, positive semi-definite). For a unique local minimizer to exist, the Hessian must be positive definite at the computed estimate. Since the first term is assured to be positive (semi-) definite, attention is drawn to the second term and whether it might adversely affect the positive-definiteness of the Hessian at the solution. In addition, it requires a substantial amount of additional computation, that is, N $K \times K$ Hessian matrices. Because of these issues, the second term of (23.22) has been referred to as 'the mess matrix' among numerical analysts – see, for example, Gay and Welsch (1988). Before discussing this framework in more detail, we review statistical properties of estimators that will turn out to be related.

3.3 Statistical Properties of Extremum and M-estimators

Amemiya (1985, ch. 4) establishes conditions for consistency and asymptotic normality of extremum estimators, and applies them to a variety of specific cases and models. Cameron and Trivedi (2005) provide a more accessible version of Amemiya's (1985) results, as well as helpful examples. Both references remark that verifying the conditions is difficult in general, and needs to be done on a case-by-case basis. This is consistent with their presentation of examples, which are frequently m-estimators. Given the complexity of this topic, we provide a generic, non-rigorous discussion of asymptotic normality and variance estimates that can be used for inference. General results for extremum estimators are covered first, followed by maximum likelihood. To begin, we review necessary definitions and terminology. Consider an estimator $\hat\theta$ from a sample with N observations. To perform some types of inference, we would like for it to have the following known property: in large enough samples, $\hat\theta$ is asymptotically normally distributed with mean $\theta_0$ and asymptotic variance matrix11

$$V[\hat\theta] = N^{-1} \gamma_0 \qquad (23.24)$$
where $\gamma_0$ is a matrix to be determined that depends on $\theta_0$ and properties of the DGP. Because $\hat\theta$ is consistent, its value collapses exactly to $\theta_0$ as $N \to \infty$, that is, it has a degenerate distribution. However, the asymptotic distribution of the transformed quantity $b_N = \sqrt{N}(\hat\theta - \theta_0)$ yields the desired result for the estimators considered here, stated in the form $\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \gamma_0)$. This is called the limit distribution of $\hat\theta$. To proceed, we need two things: an expression for $\gamma_0$, and a consistent estimator $\hat\gamma$ for $\gamma_0$ that allows us to calculate $\hat V[\hat\theta] = N^{-1} \hat\gamma$.

For the general case of extremum estimators, the limit distribution is obtained by starting with a first-order Taylor approximation of $\nabla Q_N(\theta)$ around the true parameter $\theta_0$, evaluated at the estimator $\hat\theta$:

$$\nabla Q_N(\hat\theta) \approx \nabla Q_N(\theta_0) + \nabla^2 Q_N(\theta^*)(\hat\theta - \theta_0), \qquad (23.25)$$

where $\theta^*$ denotes an intermediate point between $\hat\theta$ and $\theta_0$. Because $\nabla Q_N(\hat\theta) = 0$, equation (23.25) can be solved for $(\hat\theta - \theta_0)$:

$$(\hat\theta - \theta_0) \approx -\nabla^2 Q_N(\theta^*)^{-1} \nabla Q_N(\theta_0). \qquad (23.26)$$

It is then rescaled by a factor of $\sqrt{N}$:

$$\sqrt{N}(\hat\theta - \theta_0) = -\nabla^2 Q_N(\theta^*)^{-1} \cdot \sqrt{N}\, \nabla Q_N(\theta_0). \qquad (23.27)$$

The asymptotic variance of the left-hand side depends on the behavior of both the Hessian and the rescaled gradient. The conditions are given by Cameron and Trivedi (2005, p. 128). The Hessian must converge in probability to the finite (nonstochastic) nonsingular matrix12

$$A_0 = \operatorname{plim} \nabla^2 Q_N(\theta^*) \text{ for any sequence } \theta^* \text{ for which } \operatorname{plim} \theta^* = \theta_0, \qquad (23.28)$$

and the rescaled gradient must converge in distribution as follows:

$$\sqrt{N}\, \nabla Q_N(\theta_0) \xrightarrow{d} N[0, B_0], \quad \text{where } B_0 = \operatorname{plim}[N\, \nabla Q_N(\theta_0)\, \nabla Q_N(\theta_0)^T]. \qquad (23.29)$$

If these (and other) conditions hold, then the limit distribution of the extremum estimator is given by

$$\sqrt{N}(\hat\theta - \theta_0) \xrightarrow{d} N[0, \gamma_0], \quad \text{where } \gamma_0 = A_0^{-1} B_0 A_0^{-1}. \qquad (23.30)$$
The asymptotic variance $\gamma_0$ takes on the so-called sandwich form, because $B_0$ is sandwiched between two $A_0^{-1}$ matrices. A sandwich estimate of the asymptotic variance of $\hat\theta$ is of the form

$$\hat V(\hat\theta) = \frac{1}{N} \hat A^{-1} \hat B \hat A^{-1} \qquad (23.31)$$

where $\hat A$ is a consistent estimate for $A_0$ and $\hat B$ is a consistent estimate for $B_0$. This general form has been playing an increasingly important role in practical research settings. When $\hat B$ is consistent under relatively weak assumptions, equation (23.31) is called a 'robust sandwich estimate', and it yields robust standard errors. It is sometimes called the Huber estimate (Huber, 1967), or the White estimate (White, 1982). We next consider the specific case of maximum likelihood. We are particularly interested in the relationship between these results and those of the more general case. There are many possible scenarios under which MLE properties may be derived, but there are
many similarities. Proofs typically involve so-called regularity conditions that can be used to establish a key result: the information matrix equality. For purposes of illustration we continue the previous conditional MLE example that uses equations (23.9) and (23.10) – for more complete details, see Cameron and Trivedi (2005, pp. 141–3). Another useful reference with an extended treatment of maximum likelihood is Davidson and MacKinnon (1993, ch. 8). The information matrix (or Fisher information) can be defined as the expectation of the outer product of the score vector:

$$\mathcal{I} = E[\nabla LL_N(\theta)\, \nabla LL_N(\theta)^T]. \qquad (23.32)$$
Appropriate regularity conditions yield the information matrix equality

$$-E_f[\nabla^2 LL_N(\theta_0)] = E_f[\nabla LL_N(\theta_0)\, \nabla LL_N(\theta_0)^T] = \mathcal{I}, \qquad (23.33)$$
where the expectation is taken with respect to the true conditional density defined by $\theta_0$. Because the observations in (23.10) are independent identically distributed (i.i.d.), it can be shown that

$$E_f[\nabla^2 LL_N(\theta_0)] = -A_0 \quad \text{and} \quad E_f[\nabla LL_N(\theta_0)\, \nabla LL_N(\theta_0)^T] = B_0 \qquad (23.34)$$
where $A_0$ and $B_0$ are defined in equations (23.28) and (23.29), respectively. Equations (23.33) and (23.34) imply that $A_0 = B_0$, so the asymptotic variance in equation (23.30) simplifies to $\gamma_0 = A_0^{-1}$. Finally, a well-known property of MLEs is that this $\gamma_0$ is the Cramér–Rao lower bound; that is, MLEs generally have the smallest asymptotic variance among the so-called 'root-N estimators', making them asymptotically efficient. It is on the basis of these results that MLE software frequently computes variance estimates for $\hat\theta$ using either $\hat A$ or $\hat B$. However, the validity of this approach rests heavily on the strong assumption that the model has been correctly specified. For this reason, researchers have been increasingly encouraged to compute robust variance estimates for MLEs by using equation (23.31). However, this is not a panacea. Assessing the potential benefits (or non-benefits) of using robust variance estimates requires understanding how the MLE behaves when the model is incorrectly specified. Under these conditions it is called a quasi-maximum likelihood estimator (QMLE) – see White (1982), and Cameron and Trivedi (2005, s. 5.7).13 Under appropriate conditions it converges to a well-defined limit, which may or may not include elements of $\theta_0$. Even in the fortunate case where the estimator remains consistent for certain parameters in $\theta_0$, the information matrix equality will almost certainly no longer hold. However, in this case equation (23.31) would give consistent variance estimates. Unfortunately, when the model is incorrectly specified it is likely that none of the parameters are consistent, so equation (23.31) may not be helpful. A related recent development is the use of so-called composite marginal likelihood methods, where for reasons of practicality a simpler likelihood function is specified as an approximation to the full likelihood (thereby ensuring a quasi-maximum likelihood estimator). This is used in cases where an m-dimensional dependent variable would be expected to have correlated components as in, for example, panel data or data with
spatial correlations. The idea is to approximate a joint density function by a product of simpler (marginal) densities that are much easier to compute – see, for example, Varin (2008). For a discrete choice application, see Bhat et al. (2010). Under the right conditions the QMLE is consistent for a set of parameters of interest, but because the density is incompletely specified, equation (23.31) is required to compute consistent variance estimates for the associated parameters. With regard to computing variance estimates, we briefly comment here and provide more discussion in section 3.4. For general m-estimators, common choices for $\hat A$ and $\hat B$ are based on sample averages of the relevant derivatives, which may be available as a by-product of performing estimation. Specifically, the Hessian (or empirical Hessian) estimate for $A_0$ is $\nabla^2 Q_N(\hat\theta)$. Similarly, an estimate for $B_0$ is the average of the outer products given by

$$\hat B_{OP} = \frac{1}{N} \sum_{i=1}^{N} \nabla q_i(\hat\theta)\, \nabla q_i(\hat\theta)^T, \qquad (23.35)$$

where $q_i(\theta) \equiv q(y_i, z_i, \theta)$ denotes the per-observation criterion.
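Given per-observation gradients, the estimates $\hat A$, $\hat B_{OP}$ and the sandwich (23.31) are mechanical to compute. A minimal sketch, assuming the caller supplies the $N \times K$ matrix of per-observation criterion gradients and the empirical Hessian:

```python
import numpy as np

def sandwich_variance(scores, hessian, N):
    """Robust sandwich estimate (23.31): V = (1/N) A^{-1} B A^{-1}.

    scores: (N, K) matrix whose rows are the per-observation gradients
    of q(y_i, z_i, theta) at theta_hat; hessian: empirical Hessian of
    Q_N at theta_hat, used as A_hat."""
    B_hat = scores.T @ scores / N        # outer-product estimate (23.35)
    A_inv = np.linalg.inv(hessian)       # K x K, so inversion is affordable here
    return A_inv @ B_hat @ A_inv / N
```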
In the specific case of conditional MLEs, direct computation of the expectations in equation (23.34) may be a practical option.

3.4 Computational Considerations
The practical implications of the previous two sections can be demonstrated using an example: conditional maximum likelihood for the original case of one choice per observation. To review, assume a DGP generates independent observations $(y_i, z_i)$, for $i = 1, \ldots, N$, where the $z_i$s are stochastic with some unknown (but well-behaved) distribution, and the MLE defined by (23.12) is to be solved using the general estimation framework of section 3.2. (Note: some of this discussion foreshadows topics to be addressed in section 4.) There are multiple ways to proceed, but one way is to re-index (i, j) in the double sum to l in a single sum as follows:

$$(i = 1, j = 1) \to l = 1$$
$$(i = 1, j = 2) \to l = 2$$
$$\vdots$$
$$(i = 1, j = J) \to l = J$$
$$(i = 2, j = 1) \to l = J + 1$$
$$\vdots$$
$$(i = N, j = J) \to l = NJ$$

Then the MLE problem is defined by $\rho_l(h) = -y_l \log(h)$ and $h_l(\theta) = P_l(\theta) = P_{j(l)}(z_{i(l)}, \theta)$.
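In code, this re-indexing is simply a row-major flattening of the $N \times J$ arrays, so that (i, j) maps to $l = iJ + j$ in zero-based indexing; a small illustrative sketch with simulated inputs:

```python
import numpy as np

# Y and P are (N, J) arrays of choice indicators and probabilities.
# Row-major flattening maps (i, j) -> l = i*J + j, matching the scheme above.
N, J = 4, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(J), size=N)        # each row sums to one
Y = np.eye(J)[rng.integers(0, J, size=N)]    # one indicator per observation

y_flat = Y.reshape(-1)                        # y_l, l = 1, ..., NJ
p_flat = P.reshape(-1)                        # P_l(theta)
nll = -(y_flat * np.log(p_flat)).sum()        # matches the double sum (23.12)
```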
The Jacobian matrix in equation (23.20) and the $\rho'$ vector in equation (23.21) are given by

$$J = \begin{bmatrix} \nabla P_1^T \\ \vdots \\ \nabla P_{NJ}^T \end{bmatrix} \quad \text{and} \quad \rho' = \begin{bmatrix} -y_1/P_1 \\ \vdots \\ -y_{NJ}/P_{NJ} \end{bmatrix}, \qquad (23.36)$$

respectively, and the diagonal matrix $\langle \rho'' \rangle$ in equation (23.23) is given by

$$\langle \rho'' \rangle = \mathrm{diag}\left[ \frac{y_1}{P_1^2}, \frac{y_2}{P_2^2}, \ldots, \frac{y_{NJ}}{P_{NJ}^2} \right]. \qquad (23.37)$$
Using these definitions, the gradient and Hessian of the negative log-likelihood $NLL_N(\theta)$ are given by

$$\nabla NLL_N(\theta) = J^T \rho' = -\sum_{l=1}^{NJ} \frac{y_l}{P_l(\theta)} \nabla P_l(\theta), \qquad (23.38)$$

and

$$\nabla^2 NLL_N(\theta) = \sum_{l=1}^{NJ} \frac{y_l}{P_l(\theta)^2} \nabla P_l(\theta)\, \nabla P_l(\theta)^T - \sum_{l=1}^{NJ} \frac{y_l}{P_l(\theta)} \nabla^2 P_l(\theta), \qquad (23.39)$$

respectively. Although this maintains consistency with the general framework, it is also possible to revert to the double sum version, which can be more convenient for seeing certain features. For example, the gradient can be written

$$\nabla NLL_N(\theta) = -\sum_{i=1}^{N} \sum_{j=1}^{J} \frac{y_{ij}}{P_{ij}(\theta)} \nabla P_{ij}(\theta). \qquad (23.40)$$
If the model is correctly specified, then $E[y_{ij}] = P_{ij}(\theta)$ and

$$E[\nabla NLL_N(\theta_0)] = -\sum_{i=1}^{N} \sum_{j=1}^{J} \frac{E[y_{ij}]}{P_{ij}(\theta_0)} \nabla P_{ij}(\theta_0) = -\sum_{i=1}^{N} \sum_{j=1}^{J} \nabla P_{ij}(\theta_0) = 0 \qquad (23.41)$$

because

$$\sum_{j=1}^{J} \nabla P_{ij}(\theta_0) = \nabla \sum_{j=1}^{J} P_{ij}(\theta_0) = \nabla[1] = 0, \qquad (23.42)$$
demonstrating the consistency of the estimator. To perform inference, one estimate of $\gamma_0 = A_0^{-1}$ based on equation (23.34) uses the empirical Hessian $N^{-1} \nabla^2 NLL_N(\hat\theta)$ as $\hat A$, so the asymptotic variance estimate for $\hat\theta$ is

$$\hat V(\hat\theta) = N^{-1} [N^{-1} \nabla^2 NLL_N(\hat\theta)]^{-1} = [\nabla^2 NLL_N(\hat\theta)]^{-1}, \qquad (23.43)$$
that is, it is just the inverse of the Hessian evaluated at the solution. This would be available as a by-product of the estimation if Newton's method were used to obtain $\hat\theta$. However, the Hessian in (23.39) could be very expensive to calculate due to the second term, making Newton's method impractical. Estimates are frequently obtained using other methods, but it would still be possible to compute (23.39) at the conclusion of the search, perhaps using finite differences – see section 4. An alternative asymptotic variance estimate is obtained by directly taking the expectation of the Hessian. The following double-sum version of the Hessian,

$$\nabla^2 NLL_N(\theta) = \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{y_{ij}}{P_{ij}(\theta)^2} \nabla P_{ij}(\theta)\, \nabla P_{ij}(\theta)^T - \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{y_{ij}}{P_{ij}(\theta)} \nabla^2 P_{ij}(\theta), \qquad (23.44)$$
is helpful for investigating this. The expectation is

$$\begin{aligned} E[\nabla^2 NLL_N(\theta_0)] &= \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{E[y_{ij}]}{P_{ij}(\theta_0)^2} \nabla P_{ij}(\theta_0)\, \nabla P_{ij}(\theta_0)^T - \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{E[y_{ij}]}{P_{ij}(\theta_0)} \nabla^2 P_{ij}(\theta_0) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{1}{P_{ij}(\theta_0)} \nabla P_{ij}(\theta_0)\, \nabla P_{ij}(\theta_0)^T - \sum_{i=1}^{N} \sum_{j=1}^{J} \nabla^2 P_{ij}(\theta_0) \\ &= \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{1}{P_{ij}(\theta_0)} \nabla P_{ij}(\theta_0)\, \nabla P_{ij}(\theta_0)^T. \end{aligned} \qquad (23.45)$$
This result has a number of implications. First, from equation (23.34) this expression should also equal the information matrix, which is readily confirmed. Second, as a formula for computing a variance estimate, (23.45) has multiple attractive features. It is easy to compute, only requires first derivatives, and would likely be positive definite. This is because the problematic 'mess matrix' disappears in expectation. This raises the prospect that (23.45) might be a reasonable and inexpensive Hessian approximation in an iterative search. In fact, this is the matrix that would be used in the method of scoring. A closely related idea is that, because

$$E[\nabla^2 NLL_N(\theta_0)] = E[\nabla NLL_N(\theta_0)\, \nabla NLL_N(\theta_0)^T] = E\left[ \sum_{i=1}^{N} \sum_{j=1}^{J} \frac{y_{ij}}{P_{ij}(\theta_0)^2} \nabla P_{ij}(\theta_0)\, \nabla P_{ij}(\theta_0)^T \right], \qquad (23.46)$$
the first term of equation (23.44) might be used as a Hessian approximation. This, in fact, is the motivation for the Berndt et al. (1974) approach – see section 4.3. For this example, the first term of (23.44) is equivalent to $N \hat B_{OP}$, where $\hat B_{OP}$ is from equation (23.35). Note also that simple inversion yields $N^{-1} [\hat B_{OP}]^{-1}$, the sample-based asymptotic variance estimate for $\hat\theta$. The next section explores all of these ideas in more detail.
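Before moving on, it may help to see how mechanically similar (23.45) and the first term of (23.44) are: they differ only in whether $y_{ij}$ or its expectation appears in the weights. A sketch in the double-sum notation, where dP is assumed to be an (N, J, K) array holding the probability gradients $\nabla P_{ij}$ (an illustrative data layout, not something the chapter itself defines):

```python
import numpy as np

def scoring_matrix(P, dP):
    """Expected Hessian (23.45): sum_ij (1 / P_ij) dP_ij dP_ij'."""
    return np.einsum('ij,ijk,ijl->kl', 1.0 / P, dP, dP)

def bhhh_matrix(Y, P, dP):
    """First term of (23.44): sum_ij (y_ij / P_ij^2) dP_ij dP_ij'.
    For this model it equals N * B_OP from equation (23.35)."""
    return np.einsum('ij,ijk,ijl->kl', Y / P**2, dP, dP)
```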
4 UNCONSTRAINED MINIMIZATION METHODS BASED ON NEWTON'S METHOD
Discussing estimation algorithms typically starts with Newton's method. When considering an iterative search, let $\theta_c$ denote the current iterate and let $s_c$ denote the step taken to reach the new iterate $\theta_+$ (that is, $\theta_+ = \theta_c + s_c$, where $s_c$ is determined during the current iteration). Newton's method (or, the 'Newton–Raphson method'), in its simple, unmodified form, is:

(0) Choose a starting value $\theta_0$, and set $\theta_+ = \theta_0$.
(1) Set $\theta_c = \theta_+$.
(2) Find the Newton step $s_c^N$ by solving the $K \times K$ system of linear equations

$$\nabla^2 Q(\theta_c)\, s_c^N = -\nabla Q(\theta_c). \qquad (23.47)$$

(3) Set $\theta_+ = \theta_c + s_c^N$. Decide to stop, or go to Step (1).

For our purposes Newton's method can be viewed as a prototype algorithm with both positive and negative features. Briefly, a positive feature is that (if it converges) it converges very quickly when it gets close enough to a solution. Two negative features are (1) it requires derivatives that can be expensive to compute, and (2) it is not guaranteed to converge. Today's estimation methods are the result of efforts to mitigate the negative aspects of Newton's method while preserving its positive features, and so they are frequently referred to as quasi-Newton methods. Twin goals for an optimization algorithm are to (1) reliably converge to a local minimizer from any starting point, and (2) do so as quickly and cheaply as possible. Algorithms meeting the first goal are called 'globally convergent' (not to be confused with 'finding a global optimum') and are said to employ a global strategy. The second goal requires a local strategy that attempts to replicate the behavior of Newton's method near a solution. Understanding how this is done requires a closer look at Newton's method. We provide this first, and then discuss global and local strategies, respectively. Two additional topics are: computing finite difference gradients and Hessians, and stopping rules.
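Before examining these features, here is a minimal sketch of the unmodified method in steps (0)–(3), assuming callables for the gradient and Hessian; everything that follows in this section can be read as hardening this bare loop:

```python
import numpy as np

def newton_minimize(grad, hess, theta0, tol=1e-8, max_iter=100):
    """Unmodified Newton's method, steps (0)-(3) above. No global
    strategy: convergence is not guaranteed from a poor starting value."""
    theta = theta0.copy()
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:      # crude stopping rule, for illustration
            break
        # Solve the K x K linear system (23.47); do not form an inverse.
        theta = theta + np.linalg.solve(hess(theta), -g)
    return theta
```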
4.1 Features of Newton's Method
The idea behind Newton's method is that, given a current iterate $\theta_c$, the derivatives of an objective function Q at $\theta_c$ can be used to build a quadratic model $M_c$ that approximates Q near $\theta_c$:

$$Q(\theta) \approx M_c(\theta) = Q(\theta_c) + \nabla Q(\theta_c)^T (\theta - \theta_c) + \frac{1}{2} (\theta - \theta_c)^T \nabla^2 Q(\theta_c)\, (\theta - \theta_c). \qquad (23.48)$$

Substituting $\theta = \theta_c + s$ yields

$$Q(\theta_c + s) \approx M_c(\theta_c + s) = Q(\theta_c) + \nabla Q(\theta_c)^T s + \frac{1}{2} s^T \nabla^2 Q(\theta_c)\, s, \qquad (23.49)$$
and the Newton step $s_c^N$ is determined by finding the s that minimizes $M_c(\theta_c + s)$. This requires the following first-order condition to hold:

$$\nabla M_c(\theta_c + s_c^N) = \nabla Q(\theta_c) + \nabla^2 Q(\theta_c)\, s_c^N = 0. \qquad (23.50)$$

Solving (23.50) corresponds to step (2) of Newton's method. Note that the solution to (23.50) can be represented in closed form by the expression

$$s_c^N = -\nabla^2 Q(\theta_c)^{-1} \nabla Q(\theta_c). \qquad (23.51)$$
This is frequently seen in statistics or econometrics books; however, it is important to understand that in practice this expression should never be computed as written. Using a linear equation solver for equation (23.47) is more computationally efficient and numerically stable than computing $s_c^N$ via a matrix inversion and multiplication. Recall that the goal is to find a minimizer of Q. Why would this approach work, and if so, how well? To begin, assume that a local minimizer of Q exists and that $\nabla^2 Q(\theta)$ is positive definite in the region near $\theta_c$. If $\theta_c$ is close enough to the minimizer $\hat\theta$, it can be shown that (23.48) is a good approximation to Q. The minimizer of $M_c$ should therefore be a good 'guess' for the minimizer of Q. In fact, note that if Q itself were a quadratic function, then $M_c = Q$ and this procedure would give the exact minimizer of Q in a single step. More generally, it can be shown that if an iterate $\theta_c$ enters a region close enough to a local optimum, then this method will proceed to converge, and it will do so very quickly.
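The 'never compute the inverse' advice is easy to demonstrate: both lines below produce the same step, but the solve-based version factorizes the Hessian once and is cheaper and better conditioned (the matrix and vector contents are illustrative stand-ins):

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # stand-in Hessian
g = np.array([1.0, 2.0])                 # stand-in gradient

step_solve = np.linalg.solve(H, -g)      # factorize and solve: preferred
step_inv = -np.linalg.inv(H) @ g         # explicit inverse: avoid in practice
assert np.allclose(step_solve, step_inv)
```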
(23.52)
7qi11 2 q^ 7 # ci 7qi 2 q^ 7
(23.53)
7qi11 2 q^ 7 # c7qi 2 q^ 7 p,
(23.54)
then { qi } is said to be q-linearly convergent to q^ . If for some sequence { ci } that converges to zero
then { qi } is said to converge q-superlinearly to q^ . If there exist constants p . 1, c $ 0 and ˆi $ 0 so that for all i $ ˆi
then { qi } is said to converge to q^ with q-order at least p. If p 5 2 or 3, the convergence is said to be q-quadratic or q-cubic, respectively. In practice, q-linear convergence is considered to be quite slow, whereas q-superlinear and q-quadratic are considered fast. Newton’s method is q-quadratic (assuming the required conditions hold).
To summarize, two positive features of Newton's method are (1) q-quadratic local convergence from a good starting point (if $\nabla^2 Q(\hat\theta)$ is nonsingular) and (2) an exact solution in one step if Q is quadratic. Next are the negative features. First, $\nabla^2 Q(\theta_c)$ must be computed for every iteration. This is very time consuming for most types of applications, including estimation problems as already discussed. Experience has shown that writing computer code for analytic Hessians can be problematic. Using finite differences as an alternative is discussed in section 4.4. One way to address these concerns is to replace equation (23.49) with

$$M_c(\theta_c + s) = Q(\theta_c) + \nabla Q(\theta_c)^T s + \frac{1}{2} s^T H_c\, s \qquad (23.55)$$
where $H_c$ is a less expensive Hessian approximation. However, the tradeoff is the loss of local q-quadratic convergence. The challenge is to develop versions of $H_c$ that are less expensive to compute, but also fast (for example, q-superlinear). Another major problem with Newton's method is that there is no guarantee that it will actually converge, and, if it does, that it will converge to a local minimizer. The idea behind repeated solution of (23.50) is to reach a point where the first-order condition $\nabla Q(\theta) = 0$ holds. However, this condition does not distinguish among minimizers, maximizers, and saddle points. Unless Q is known to be globally convex (that is, $\nabla^2 Q(\theta)$ is positive definite everywhere) the Newton iteration could easily converge to a maximizer, depending on the starting point. For example, if an iterate $\theta_c$ enters a region where $\nabla^2 Q(\theta_c)$ is negative definite, then computing the next step by solving (23.50) would yield the maximizer of $M_c$. To show this more explicitly, note that any step $s_c$ consists of two features: a direction ($d_c$) and a step length ($\lambda_c$). Each iteration of Newton's method can be viewed as taking a step of length one in the Newton direction

$$d_c^N = -\nabla^2 Q(\theta_c)^{-1} \nabla Q(\theta_c). \qquad (23.56)$$
When $\nabla^2 Q(\theta_c)$ is negative definite it means that $s^T \nabla^2 Q(\theta_c)\, s < 0$ for all $s \ne 0$. Therefore,

$$0 > (d_c^N)^T \nabla^2 Q(\theta_c)\, d_c^N = [\nabla^2 Q(\theta_c)\, d_c^N]^T d_c^N = -\nabla Q(\theta_c)^T d_c^N \implies \nabla Q(\theta_c)^T d_c^N > 0,$$

where $\nabla Q(\theta_c)^T d_c^N$ is the directional derivative of Q at $\theta_c$ in the direction $d_c^N$ – see Dennis and Schnabel (1996, pp. 70–71). A positive directional derivative means that the Newton direction is pointing in an 'uphill' direction, which would seem to be a bad choice when searching for a minimizer. This issue is typically addressed by adopting a procedure that modifies the quadratic model for Q so that it is positive definite for every iteration. A well-known example in the econometrics literature is called 'quadratic hill climbing' (Goldfeld et al., 1966). In this approach, $H_c$ in (23.55) takes the form

$$H_c = \nabla^2 Q(\theta_c) + \mu_c I \qquad (23.57)$$
where I is the $K \times K$ identity matrix, and $\mu_c$ is a non-negative constant chosen to ensure that $H_c$ is positive definite. If iterates get close enough to the solution, then $\mu_c$ will be zero and fast local convergence will occur.14 Note, however, that requiring $H_c$ to be positive definite for each iteration does not guarantee convergence. It only guarantees that the quasi-Newton direction $d_c$ given by

$$d_c = -H_c^{-1} \nabla Q(\theta_c) \qquad (23.58)$$
has a directional derivative less than zero, so that $d_c$ is a descent direction. In other words, a step length $\lambda_c$ in the direction $d_c$ must exist for which $Q(\theta_c + \lambda_c d_c) < Q(\theta_c)$. However, this may not be true for $\lambda_c = 1$, and in fact $\lambda_c$ may need to be quite small. Because Newton's method without any modification always uses a step size of one, there is no guarantee that (23.56) will produce a decrease in Q, even if $\nabla^2 Q$ is positive definite everywhere. Procedures for selecting a step length are discussed in the next section. It is worth re-emphasizing this point, because many researchers have misunderstood it: a globally convex objective function is no guarantee that Newton's method (unmodified) will converge. It is now clear why quasi-Newton methods usually use positive-definite Hessian approximations: (1) exact Hessians are expensive, and (2) positive-definiteness guarantees a quasi-Newton step in a descent direction. However, completely specifying an optimization algorithm requires more detail on both its global and local strategies. Two global strategies (line searches and trust regions) are discussed next, followed by a discussion of local strategies (essentially the choice of a Hessian approximation), where the interest is on determining the rates of convergence described previously.
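One common way to realize the modification (23.57) in code is to increase $\mu_c$ until a Cholesky factorization succeeds; the doubling rule below is an illustrative choice, not the specific rule of Goldfeld et al. (1966):

```python
import numpy as np

def make_positive_definite(H, mu0=1e-4):
    """Return H + mu*I for the smallest tried mu that makes the result
    positive definite, using Cholesky success as the test (cf. (23.57))."""
    K = H.shape[0]
    mu = 0.0
    while True:
        try:
            np.linalg.cholesky(H + mu * np.eye(K))
            return H + mu * np.eye(K)
        except np.linalg.LinAlgError:
            mu = mu0 if mu == 0.0 else 2.0 * mu   # simple doubling rule (assumption)
```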
4.2 Global Strategies
As noted previously, two types of global strategies are line searches and trust regions. Line searches were developed earlier and have probably been more widely used. Trust regions were developed more recently, and are arguably more complicated than line searches. We discuss line searches first, and then trust regions. As discussed in the previous section, a positive-definite Hessian approximation in equation (23.58) yields a descent direction $d_c$, which defines a one-dimensional subspace (or, a straight line). As the name implies, a line search takes a step in this direction, and the procedure for determining the step length $\lambda_c$ defines the method. One obvious approach is to find $\lambda_c$ that minimizes Q in the direction $d_c$. This is called an 'exact line search', and was explored early on in the literature. It is now considered to be inefficient compared to an inexact line search, because using a very large number of function evaluations just to find the best possible step in a single direction is misspent effort for an iterative search in K-dimensional space. The requirements for a good line search procedure are (1) global convergence, and (2) step sizes of one near the solution. It is worth emphasizing that simply requiring $Q(\theta_c + \lambda_c d_c) < Q(\theta_c)$ is insufficient for these purposes (although this requirement should always be satisfied). Because of the second requirement, it is common to try $\lambda = 1$ first, and then backtrack along the direction if the step fails to meet whatever acceptance
criteria are being used – see Dennis and Schnabel (1996, s. 6.3.2). Generally, conditions are imposed to keep steps from being either too large, or too small. One type of condition for a line search is expressed in the form

$$Q(\theta_c + \lambda d_c) \le Q(\theta_c) + \alpha\, \nabla Q(\theta_c)^T [\lambda d_c] = Q(\theta_c) + \alpha \lambda\, \nabla Q(\theta_c)^T d_c \qquad (23.59)$$

where $\alpha$ is between zero and one. Equivalently,

$$-\frac{Q(\theta_c + \lambda d_c) - Q(\theta_c)}{\lambda} \ge \alpha\, |\nabla Q(\theta_c)^T d_c|, \qquad (23.60)$$
which says that the average rate of decrease in the direction $d_c$ must be at least some prescribed fraction of the initial rate of decrease in that direction. If this is violated, it means that the amount of decrease is insufficient relative to the step length, that is, the step length is too long. Other conditions can be imposed to ensure that the step lengths are not too short; however, these sometimes require computing the gradient at the trial step to test the condition. One benefit of the backtracking strategy is that checking these conditions can frequently be avoided. Dennis and Schnabel (1996, ch. 6) provides details on these conditions, as well as theorems that establish conditions for global convergence.

The trust region approach is a bit different. It extends the earlier idea of using a quadratic model to approximate Q. As mentioned previously, there is no guarantee that the quasi-Newton step from minimizing $M_c$ in (23.55) will yield a decrease in Q, even if $H_c$ is the exact Hessian and is also positive definite. For example, assume $H_c$ is positive definite but the full quasi-Newton step produces an increase in Q. Then, it must be the case that $M_c$ is a poor approximation to Q in the area near $\theta_c$. However, it can be shown that there exists some region around the current iterate for which $M_c$ is a good approximation, so assume this region is a sphere with radius $\delta_c$ centered at $\theta_c$. Then it can be said that we trust $M_c$ to be a good model for Q within this region (that is, the sphere is the trust region, also called the model trust region). Because this is true, one way to find the next step would be to solve the following constrained optimization problem:

$$\text{Find } s \text{ that minimizes } \tilde{M}_c(s) = M_c(\theta_c + s) = Q(\theta_c) + \nabla Q(\theta_c)^T s + \frac{1}{2} s^T H_c\, s \quad \text{subject to } \|s\| \le \delta_c, \qquad (23.61)$$

where for this discussion we assume $\|\cdot\|$ denotes Euclidean distance.15 Dennis and Schnabel (1996, p. 131) shows that the solution to this problem has an interesting relationship to equation (23.57) considered earlier. Specifically, there is a unique solution of (23.61) given by

$$s(\mu) = -(H_c + \mu I)^{-1} \nabla Q(\theta_c) \qquad (23.62)$$

for some value of $\mu \ge 0$ such that $\|s(\mu)\| = \delta_c$, unless $\|s(0)\| \le \delta_c$, in which case the solution is $s(0) = d_c^N$, that is, the full quasi-Newton step. Note that for any $\mu \ge 0$, $s(\mu)$
defines a descent direction, and $s(\mu)$ approaches the negative gradient direction as $\mu$ gets large. In terms of practical details, there are a number of ways to solve this problem, and moreover (23.61) only needs to be solved approximately (within specified tolerances to ensure convergence) – see Dennis and Schnabel (1996, pp. 134–43) or Conn et al. (2000). The subscript on $\delta_c$ indicates that the radius can be adjusted during the course of the search based on new information about Q at each iterate. Because there are so many possible rules for adjusting the size of the trust region, we omit those details in the following stylized procedure:16

1. Solve (23.61) to get a trial step $s_t$ for the next iterate.
2. Compute $Q(\theta_t)$, where $\theta_t = \theta_c + s_t$. Is $\theta_t$ acceptable? If yes, set $\theta_+ = \theta_t$. If no, set $\theta_+ = \theta_c$.
3. Update the trust region radius. If $\theta_t$ was acceptable, decide whether to increase $\delta_c$ and, if so, by how much. If $\theta_t$ was not acceptable, decide how much to decrease $\delta_c$. Set $\delta_+$ based on the above. Go to Step 1.
Decisions in both steps 2 and 3 are based on comparing the actual function decrease $Q(\theta_c) - Q(\theta_t)$ to the function decrease predicted by the quadratic model, that is, $M_c(\theta_c) - M_c(\theta_t) = Q(\theta_c) - M_c(\theta_t)$, using the ratio

$$r_c = \frac{Q(\theta_c) - Q(\theta_t)}{Q(\theta_c) - M_c(\theta_t)}. \qquad (23.63)$$
If $r_c > 0$, then there is at least some decrease in Q, and in practice the trial step is frequently deemed acceptable even if $r_c$ is relatively small (for example, 0.01). There are a variety of rules for increasing or decreasing $\delta_c$ based on considerations similar to those for line searches. And, as for line searches, trust region procedures ensure that full quasi-Newton steps are taken close to the solution to preserve the convergence properties of the local strategy. The book by Conn et al. (2000) on trust region methods provides a detailed treatment using a more general framework; for example, they allow for adjusting the shape as well as the size of the trust region during the search.
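Of the two global strategies, the backtracking line search built on condition (23.59) is the simpler to sketch in code. The halving rule and the value of $\alpha$ below are common illustrative defaults rather than prescriptions from this chapter (Dennis and Schnabel discuss more refined backtracking):

```python
import numpy as np

def backtracking_line_search(Q, grad_c, theta_c, d_c, alpha=1e-4, lam_min=1e-10):
    """Try lambda = 1 first, then backtrack until the sufficient-decrease
    condition (23.59) holds: Q(theta + lam*d) <= Q(theta) + alpha*lam*g'd."""
    Q_c = Q(theta_c)
    slope = grad_c @ d_c          # directional derivative; negative for descent
    lam = 1.0
    while lam > lam_min:
        if Q(theta_c + lam * d_c) <= Q_c + alpha * lam * slope:
            return lam
        lam *= 0.5                # simple halving; real codes fit polynomials
    return lam_min                # no acceptable step found
```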
4.3 Local Strategies
Choosing a local strategy is essentially synonymous with choosing a method for determining $H_c$ for each iteration. The ideal choice close to the solution would be the exact Hessian, but, as noted previously, it is generally considered too costly.

Steepest descent
There are choices for $H_c$ that are convenient, but also slow. One frequently mentioned approach is steepest descent, where the search occurs in the direction where $Q(\theta_c + d)$ has its maximum decrease. It can be shown that this direction is the negative gradient divided by its length, and is equivalent to setting $H_c = I$, the $K \times K$ identity matrix. This method is q-linearly convergent, and frequently very slow. Assuming that the
method converges to a local minimizer $\hat\theta$ with positive definite $\nabla^2 Q(\hat\theta)$, the constant c in (23.52) depends on the smallest and largest eigenvalues of $\nabla^2 Q(\hat\theta)$. If the smallest and largest eigenvalues are almost identical, then c can be very small and convergence can be fast. However, Hessians rarely have identical eigenvalues, and the distance between eigenvalues does not need to be very large before c quickly approaches one and convergence is very slow – see Dennis and Schnabel (1996, p. 115). The primary reason for discussing this method is to provide a clear recommendation against using it.

Secant methods
One of the most widely used approaches for solving general unconstrained optimization problems is based on the secant method for solving the nonlinear equation f(x) = 0 in one dimension. Specifically, Newton's method for this problem can be derived using the following model:

$$m(x_+) = f(x_c) + f'(x_c)(x_+ - x_c). \qquad (23.64)$$

Setting $m(x_+) = 0$ yields the Newton iteration

$$x_+ = x_c - \frac{f(x_c)}{f'(x_c)}. \qquad (23.65)$$
In this example avoiding calculation of $f'$ is analogous to avoiding the calculation of the exact Hessian in unconstrained minimization. In one dimension, the derivative of f(x) at the point $x_c$ can be approximated by

$$a_c = \frac{f(x_c + h_c) - f(x_c)}{h_c} \qquad (23.66)$$
for some small value of $h_c$ near $x_c$. Replacing $f'$ in (23.64) with $a_c$ yields the approximation

$$\tilde{m}(x) = f(x_c) + a_c(x - x_c). \qquad (23.67)$$

Setting this to zero and solving yields

$$\tilde{x}_+ = x_c - \frac{f(x_c)}{a_c}. \qquad (23.68)$$
It can be shown that for an appropriately small $h_c$, this iteration will give results very similar to (23.65). However, this requires two evaluations of f(x) for each iteration. But it is possible to use a cruder approximation that uses only one evaluation of f(x) per iteration. Let $x_-$ denote the iterate immediately prior to $x_c$. Setting $h_c = x_- - x_c$ yields the following new approximation:

$$a_c = \frac{f(x_-) - f(x_c)}{x_- - x_c}. \qquad (23.69)$$
Using this version of $a_c$ in (23.65) yields the secant method. This may seem to be a potentially inaccurate approximation, and it will indeed converge less quickly than (23.65). However, the change in the speed of convergence is relatively small, and the savings in total computational effort makes the method much more efficient. The basic idea in the one-dimensional example is to use the readily available information from successive changes in function values as a substitute for more expensive derivative information. Although the one-dimensional example is instructive, things are not quite as simple in higher dimensions. First, the immediate generalization of f(x) = 0 is to the problem of solving the n-dimensional system of nonlinear equations F(x) = 0, where x is a vector in n dimensions. Secant methods for this problem have been developed, and it would seem that they could be used to compute m-estimates by solving $\nabla Q_N(\theta) = 0$. However, as discussed in section 4.1, the more relevant problem is finding a local minimizer of $Q_N(\theta)$ using algorithms designed specifically for this purpose. Let $H_c$ be the current Hessian approximation in an iterative search for a minimizer. What is needed is a procedure to create $H_+$ in a way that uses recently computed information to avoid directly computing $\nabla^2 Q_N(\theta_c)$. In this case, the analogue to (23.69) in $K \times K$ space is the secant equation

$$H_+ s_c = y_c \qquad (23.70)$$

where

$$s_c = \theta_+ - \theta_c \quad \text{and} \quad y_c = \nabla Q(\theta_+) - \nabla Q(\theta_c). \qquad (23.71)$$
Because of the dimension of the problem, $H_+$ cannot be uniquely determined from (23.70) alone; it merely represents a constraint that ensures $H_+$ is consistent with the recent change in the gradient. However, there are additional restrictions that help determine $H_+$. It must be symmetric, and also positive definite for reasons discussed in section 4.1. Given a current Hessian approximation $H_c$, the problem we seek to solve is: given $s_c$ and $y_c$, find a new matrix $H_+$ that satisfies the secant equation (23.70), is symmetric, is positive definite (if possible), and minimizes the distance between $H_+$ and $H_c$ (in some sense). The last requirement is the reason these methods are sometimes called least-change secant updates. A popular and widely used secant update that solves a version of the previously stated problem is the BFGS update (discovered independently by Broyden, Fletcher, Goldfarb and Shanno), given by

$$H_+ = H_c + \frac{y_c y_c^T}{y_c^T s_c} - \frac{H_c s_c s_c^T H_c}{s_c^T H_c s_c}. \qquad (23.72)$$
See Dennis and Schnabel (1996, p. 201) for references. They also refer to this as the positive definite secant update, and opine that it ‘is the best Hessian update currently known’. We are unaware of any new developments that would clearly invalidate this general claim.
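In code, the BFGS formula (23.72) is a rank-two correction of a few lines; the curvature check below is a standard practical safeguard added for illustration, not part of equation (23.72) itself:

```python
import numpy as np

def bfgs_update(H_c, s_c, y_c, tol=1e-10):
    """Positive definite (BFGS) secant update, equation (23.72)."""
    ys = y_c @ s_c
    if ys <= tol:                 # curvature condition fails: skip the update
        return H_c
    Hs = H_c @ s_c
    return (H_c
            + np.outer(y_c, y_c) / ys
            - np.outer(Hs, Hs) / (s_c @ Hs))
```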
However, another update is also important for historical and theoretical reasons and still enjoys use as an alternative to BFGS: the DFP update, owing to Davidon, and Fletcher and Powell – see Dennis and Schnabel (1996, p. 203) for references. This update was the first secant update to be discovered, can be viewed as directly updating $H_c^{-1}$ to obtain $H_+^{-1}$, and hence is sometimes referred to as the inverse positive definite secant update. An expression that implements DFP by updating the Hessian (rather than its inverse) is

$$H_+ = H_c + \frac{(y_c - H_c s_c)\, y_c^T + y_c\, (y_c - H_c s_c)^T}{y_c^T s_c} - \frac{[(y_c - H_c s_c)^T s_c]\, y_c y_c^T}{[y_c^T s_c]^2}. \qquad (23.73)$$
Dennis and Schnabel (1996, p. 203) reports a consensus in the literature that 'the DFP update sometimes produces numerically singular Hessian approximations', and that the BFGS performs better in conjunction with the global strategies discussed previously. One important aspect of these secant updates is their fast local convergence properties: they are proved to be q-superlinear under various conditions. However, it is also important to recognize that, in practice, secant updates can require many iterations to build up a good enough Hessian approximation for the fast local properties to become effective. This is particularly true if the identity matrix is used to initialize $H_c$ (as is frequently done in practice). An alternative is to initialize the search with a better Hessian estimate, perhaps using finite differences to compute the full Hessian at the starting value (assuming it is positive definite). Another alternative would be the Hessian approximation to be discussed next.

Berndt, Hall, Hall, Hausman (BHHH)
As mentioned previously, secant methods were developed to solve general optimization problems, that is, they do not use any information specific to the problem being solved. The exact opposite could be said of the Hessian approximation suggested by Berndt et al. (1974), which is based largely on a qualitative statistical argument already discussed in section 3.4. The BHHH method is popular in the econometrics literature due to features discussed previously (for example, the Hessian approximation has a statistical interpretation, and is both inexpensive and positive definite). However, there is an additional feature whose significance may have been overlooked: in their paper, BHHH provided a complete method that included specifications for a line search. In the past, econometricians frequently relied on iterative methods (for example, Newton's method, method of scoring and iteratively reweighted least squares) that did not incorporate a global strategy. Some of the success of BHHH as a method could be due to the salutary effects of including a global strategy. Additional discussion of BHHH appears next, and in section 5.

Gauss–Newton-like methods with model switching
Section 3.2 introduced a general parameter estimation framework for nonlinear regression models that accommodates all m-estimators, including MLE, QMLE, NLLS and other minimum distance estimators. The implications of its common statistical estimation structure were demonstrated in section 3.4 using MLE of a general discrete choice model as an example. Bunch et al. (1993) provide software (implemented in Fortran) for a slightly more general version.
Recall that the composite structure of equation (23.18) yields a Hessian expression of the form

$$\nabla^2 Q(\theta) = C(\theta) + A(\theta) \qquad (23.74)$$
where $C(\theta)$ is easily computed, but $A(\theta)$ requires a substantial amount of additional computation. Simply ignoring $A(\theta)$ and using $C(\theta)$ as a Hessian approximation generalizes a variety of methods proposed in the literature for specific problems, including BHHH for maximum likelihood and the Gauss–Newton method for NLLS (discussed below). An important factor in the performance of these methods is the relative 'size' of $A(\theta)$ versus $C(\theta)$. If $A(\hat\theta)$ turns out to be 'small' then local convergence could be fast, because $A(\hat\theta) \approx 0$ implies a close approximation to Newton's method. The question is: can $A(\hat\theta)$ ever be small in practice? The earlier result that $E[A(\hat\theta)] = 0$ for m-estimators with correctly specified models was considered a hopeful finding. Unfortunately, this provides no guarantee that $A(\hat\theta)$ will be small for any particular dataset. This is particularly the case for discrete choice models in cases like the prototype example of one choice for each stochastically produced value of $z_i$. For a better understanding, it is instructive to return to the repeated measures case introduced in section 3.1, and consider the NLLS estimator. Equation (23.16) is a special case of the general nonlinear regression problem

$$y_i = g(x_i, \theta) + \varepsilon_i, \quad i = 1, \ldots, N \qquad (23.75)$$
where $y_i$ is a real-valued scalar (corresponding to a relative frequency in (23.16)). Letting $\rho_i(h) = \frac{1}{2}(h - y_i)^2$ and $h_i(\theta) = g(x_i, \theta)$ in (23.18) defines the NLLS problem:

$$\text{Find } \theta \text{ that minimizes } Q(\theta) = \frac{1}{2} \sum_{i=1}^{N} [g(x_i, \theta) - y_i]^2. \qquad (23.76)$$
The Jacobian matrix J is defined as in (23.20), where g replaces h. Expressions for $\rho'$ and $\langle \rho'' \rangle$ are $(\rho')_i = g(x_i, \theta) - y_i$ and $\langle \rho'' \rangle = I$, respectively, where I here denotes the $N \times N$ identity matrix, and the Hessian is

$$\nabla^2 Q(\theta) = J(\theta)^T J(\theta) + \sum_{i=1}^{N} [g(x_i, \theta) - y_i]\, \nabla^2 g(x_i, \theta). \qquad (23.77)$$
Performing the Newton iteration using the first term of (23.77) to approximate the Hessian gives the Gauss–Newton method for NLLS. Improving its global convergence properties by adding a line search is called the damped Gauss–Newton method. However, problems can arise during the search if $J(\theta_c)$ is not of full column rank. Employing a version of (23.57) to ensure positive definiteness, so that the step is given by

$$s_c = -H_c^{-1} J(\theta_c)^T \rho' = -[J(\theta_c)^T J(\theta_c) + \mu I]^{-1} J(\theta_c)^T \rho', \qquad (23.78)$$

is the Levenberg–Marquardt method (the NLLS version of quadratic hill-climbing). Finally, it is clear that if (23.78) is used, $\mu$ could be determined using a trust region.
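A sketch of a single Levenberg–Marquardt step (23.78), written with a linear solve rather than an explicit inverse; the rule for choosing $\mu$ is left to the caller since, as just noted, it can come from a trust region:

```python
import numpy as np

def levenberg_marquardt_step(J, residuals, mu):
    """One LM step s = -(J'J + mu*I)^{-1} J'r for NLLS, cf. (23.78).
    residuals: the vector rho' with entries g(x_i, theta) - y_i."""
    K = J.shape[1]
    H = J.T @ J + mu * np.eye(K)
    return np.linalg.solve(H, -J.T @ residuals)
```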
However, the key point arises from considering the second term of (23.77). Suppose $y_i$ is a relative frequency as in a discrete choice model with repeated measures. As the number of replications gets larger, $y_i$ approaches the true probability $P_i$. For a model with any degree of flexibility, the second term of (23.77) can be close to zero at $\hat\theta$. This is called the zero-residual case for NLLS. Note, however, that this very same effect also occurs for MLE using the BHHH Hessian approximation. The MLE version of $A(\theta)$ can be written as

$$A(\theta) = -\sum_{i=1}^{N} \sum_{j=1}^{J} \frac{y_{ij}}{P_{ij}(\theta)} \nabla^2 P_{ij}(\theta), \qquad (23.79)$$

and clearly approaches zero in the repeated measures case as the number of replications gets large. The repeated measures case highlights the fact that Gauss–Newton/BHHH-like methods are 'asymptotically' (for lack of a better term) equal to Newton's method. However, the nature of the model, the DGP and the sample size N all play a role in determining what might happen in a real dataset. What can be done if $A(\hat\theta)$ is 'large' and convergence is slow? The algorithms in Bunch et al. (1993) take a hybrid approach, using the idea of model switching. As the search proceeds, a specialized secant update is used to build up an approximation to the $A(\theta)$ matrix. For every iteration there are two Hessian approximations available:

$$H_c^C = C(\theta_c) \quad \text{and} \quad H_c^A = C(\theta_c) + A_c. \qquad (23.80)$$
The algorithms are implemented using model trust regions, and in keeping with that approach both models can be evaluated on their ability to approximate Q using equation (23.63) as a measure. The algorithm can switch models, depending on this evaluation. Dennis et al. (1981) gives a detailed description for the NLLS case. In terms of actual behavior, $C(\theta_c)$ is used in the early part of the search. If the dataset happens to yield a zero-residual case, then using C alone may converge quickly with no model switching. However, in most cases (in our experience) $C(\theta_c)$ will be inferior to $C(\theta_c) + A_c$ after only a few iterations, and the algorithm switches to the $C(\theta_c) + A_c$ model until convergence. A key feature of this approach is that the secant update for A is designed to yield q-superlinear convergence. This approach ensures that local convergence is always fast, and can sometimes be very fast. Experience with MLE of discrete choice models suggests that this approach dominates the methods in wide use (for example, BFGS and BHHH, either with line searches or trust regions). Bunch and Kitamura (1991) performed a direct comparison of BHHH, BFGS and model switching (all using the same trust region method) for MLE of multinomial probit models of household car ownership level. The data are cross-sectional (one choice per household, N = 945). Estimates were obtained for six model specifications of varying complexity, using two different starting values for two of the more complex specifications. Model switching dominated both BHHH and BFGS, which had comparable results. In one case BHHH reached a 100-iteration limit without converging. The range and average of iterations (excluding the case where BHHH failed) were: BHHH (11–70, average 32), BFGS (23–52, average 36), and Model Switching (8–22, average 15). These results are what might be expected based on the earlier discussion, and the
model switching results are consistent with many other comparable datasets and models. The earlier discussion also suggested that BHHH might perform better in cases with repeated measurements. This effect was observed in a simulation study by Bunch (1987), and also with stated choice experiment data in Bunch (1988).

4.4 Computer Arithmetic and Finite Differences
There are occasions when knowledge of finite-precision arithmetic can play a role in using computational methods. One is when analytical derivatives are not available and the only option is to compute a numerical approximation using finite differences. Another relates to rules for stopping an iterative search, discussed in the next section. For the optimization methods discussed here, concerns are centered on computing the gradient and the Hessian. The situation for the Hessian has been assessed as follows: It is generally too expensive to use as part of a general iterative search (regardless of how it is computed), but it should be computed after estimation is complete for inference purposes. (There are obviously some models for which computing the Hessian is practical, and therefore should be used, for example, multinomial logit and possibly nested logit.) Moreover, performing this calculation provides helpful diagnostic information on Q at the solution that is not available from any other source. For example, if the model being estimated is structurally unidentified, the actual Hessian will be singular at the solution even if the Hessian approximations used in the search are not. However, because writing code to compute analytical Hessian expressions is frequently impractical, finite differences are typically used.17 On the issue of whether to use analytical or finite-difference gradients, there are frequently mixed signals from the literature. First, it is important to recognize that, given the role of first-order conditions, optimization methods require accurate gradient calculations (whereas this obviously does not apply to the Hessian). For this reason it is frequently stated that analytical gradients are preferred and encouraged. At the same time, it is frequently suggested that successfully writing correct code for gradients can be both difficult and time consuming, and, in addition, the accuracy of finite difference gradients is more than adequate. Our overall conclusion: it is reasonable and much more efficient to use finite difference gradients, particularly for complex, specialized models. If for some reason problems arise that can be traced to the accuracy of the gradient calculation, steps can then be taken to develop code using analytical expressions. In our experience this is rarely necessary, and numerical difficulties are more likely to arise from the model itself, for example, embedded special functions or routines related to computing integrals. The basic issue with computer arithmetic is that, although theory is developed using mathematical concepts in infinite precision, computers can only represent a finite set of numbers. As in scientific notation, computers use two items to represent a number (x): an exponent, and a mantissa with a specified number of digits. (In a mantissa the first non-zero digit is always immediately to the right of the decimal point). This is the computer’s floating-point representation of the number [denoted fl(x)], and its precision is a function of the number of digits in the mantissa. To illustrate one type of problem, suppose a computer is subtracting two numbers that are almost identical. Many of the left-most digits in the mantissa will cancel out so
Stephane Hess and Andrew Daly - 9781781003145 Downloaded from Elgar Online at 04/07/2015 06:52:41AM via VRIJE UNIVERSITEIT AMSTERDAM (FREE)
that the result is based on the difference of the remaining digits, thus limiting accuracy. Computer precision can vary due to hardware and software differences, and these differences can complicate comparisons. A useful concept for characterizing precision is machine epsilon (abbreviated macheps), defined as the smallest positive number $t$ for which $1 + t > 1$ in floating-point arithmetic. According to Dennis and Schnabel (1996, p. 12), the following can be shown:

1. The relative error in $fl(x)$ is at most macheps for $x \neq 0$.
2. The number $x$ will lie in the range $[x(1 - \text{macheps}),\, x(1 + \text{macheps})]$.
3. Two numbers $x$ and $y$ will agree in the leftmost half of their digits when

$$\frac{|x - y|}{|x|} \leq \sqrt{\text{macheps}} \tag{23.81}$$

The value of macheps plays an important role in implementing finite difference calculations. The formula for a finite difference derivative of $f(x)$ in one dimension was given by equation (23.66). The central issue is choosing the step size $h_c$. There are two conflicting effects that must be addressed. In infinite precision, theory suggests that $h_c$ should be chosen as small as possible. However, in finite precision the error in the numerator gets worse as $h_c$ gets smaller due to, for example, cancellation in the subtraction $f(x_c + h_c) - f(x_c)$. Dennis and Schnabel (1996, p. 97) argues that, if $f(x)$ has $t$ reliable digits, then we would like $f(x_c + h_c) - f(x_c)$ to be accurate in the last half of its digits. If $\eta$ is the relative noise in computing $f(x)$, the desired $h_c$ should be chosen so that

$$\frac{|f(x_c + h_c) - f(x_c)|}{|f(x_c)|} \geq \sqrt{\eta} \tag{23.82}$$

They indicate that, in the absence of better information, a reasonable way to accomplish this is to set $h_c = \sqrt{\eta}\, x_c$. In cases where $f(x)$ is a simple formula, $\eta \approx \text{macheps}$. However, if $f(x)$ is produced by another numerical routine with a known noise level (for example, numerical integration), that noise level should be used instead. This argument is based on theory, but in practice there are other issues. For example, what if $x_c$ gets close (or equal) to zero? In such cases it would be desirable to have an estimate of the basic scaling of $x$, denoted $\text{typ}x$. In such cases $h_c$ would be determined by the following:

$$h_c = \sqrt{\eta}\, \max\{|x|, \text{typ}x\} \cdot \text{sign}(x) \tag{23.83}$$
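To make these magnitudes concrete, the following minimal Python sketch (ours, not from the chapter; the test function and names are purely illustrative) queries macheps, reproduces the cancellation effect, and applies the step-size rule (23.83):

```python
import numpy as np

# Machine epsilon: the smallest t with fl(1 + t) > 1 (about 2.22e-16
# for IEEE double precision).
macheps = np.finfo(float).eps

# Catastrophic cancellation: subtracting nearly equal numbers leaves
# only the trailing digits, so most of the precision is lost.
print((1.0 + 1e-15) - 1.0)          # prints ~1.1102e-15, not 1e-15

def fd_step(x, eta=macheps, typx=1.0):
    """Step size h_c of equation (23.83): sqrt(eta)*max(|x|, typx)*sign(x)."""
    return np.sqrt(eta) * max(abs(x), typx) * (1.0 if x >= 0 else -1.0)

f, x = np.cos, 2.0
h = fd_step(x)
print((f(x + h) - f(x)) / h, -np.sin(x))   # agree to roughly half the digits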
This basic idea is extended to higher dimensions (along with some adjustments) when computing finite difference Hessians and gradients. There are two choices for computing Hessians. If an analytic gradient is available, the finite difference Hessian (H) is given by
$$A_{\cdot k} = \frac{\nabla Q(\theta + h_k e_k) - \nabla Q(\theta)}{h_k}, \quad k = 1, \ldots, K$$
$$H = \frac{A + A^T}{2} \tag{23.84}$$
where $e_k$ denotes a vector with one in the $k$th component and zeros elsewhere, and $A_{\cdot k}$ denotes the $k$th column of the $K \times K$ matrix $A$ – see Dennis and Schnabel (1996, pp. 103–4). The second line of (23.84) ensures that $H$ is symmetric. Note that $h_k$ should be determined separately for each component of $\theta$ using multiple applications of equation (23.83): many references use the same $h$ for all $K$ components. Because the gradient is analytic, it would be typical to choose $\eta = \text{macheps}$ unless there is a reason to do otherwise. If an analytic gradient is not available, finite difference Hessians should be computed using objective function values using the formula

$$H_{ij} = \frac{[Q(\theta + h_i e_i + h_j e_j) - Q(\theta + h_i e_i)] - [Q(\theta + h_j e_j) - Q(\theta)]}{h_i h_j} \tag{23.85}$$
where $1 \leq i \leq j \leq K$. Based on earlier discussion there could be some concern about, for example, the product of two $h$'s in the denominator. In fact, $h_i$ must be chosen differently for this case, using the expression

$$h_i = \eta^{1/3} \max\{|x_i|, \text{typ}x_i\} \cdot \text{sign}(x_i) \tag{23.86}$$
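As an illustration of (23.85)–(23.86), here is a hedged Python sketch (our construction; `fd_hessian` and its arguments are not from the chapter) that builds a finite difference Hessian from objective values alone and fills the matrix by symmetry:

```python
import numpy as np

def fd_hessian(Q, theta, typx=None):
    """Finite difference Hessian from function values, equation (23.85),
    with per-component steps h_i = eta**(1/3)*max(|x_i|, typx_i)*sign(x_i)
    from (23.86), assuming eta = macheps (Q computable to full precision)."""
    K = len(theta)
    typx = np.ones(K) if typx is None else typx
    eta = np.finfo(float).eps
    h = eta ** (1.0 / 3.0) * np.maximum(np.abs(theta), typx) \
        * np.where(theta < 0.0, -1.0, 1.0)
    e = np.eye(K)
    Q0 = Q(theta)
    H = np.empty((K, K))
    for i in range(K):
        for j in range(i, K):
            H[i, j] = ((Q(theta + h[i] * e[i] + h[j] * e[j])
                        - Q(theta + h[i] * e[i]))
                       - (Q(theta + h[j] * e[j]) - Q0)) / (h[i] * h[j])
            H[j, i] = H[i, j]            # enforce symmetry
    return H

# Quick check on a quadratic objective, whose Hessian is known exactly.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
print(fd_hessian(lambda th: 0.5 * th @ A @ th, np.array([0.3, -0.7])))
```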
A typical choice would be $\eta = \text{macheps}$ because $Q$ is analytic. However, if choice probabilities are computed using numerical integration or some other approach with a known level of accuracy, other values might be used. To compute a finite difference gradient $g_c$ in an iterative search, one approach is to use forward differences by extending equation (23.66) to higher dimensions, that is,

$$(g_c)_k = \frac{Q(\theta_c + h_k e_k) - Q(\theta_c)}{h_k}, \quad k = 1, \ldots, K \tag{23.87}$$
where each $h_k$ is determined by using equation (23.83). A more accurate approximation is given by central differences, that is,

$$(g_c)_k = \frac{Q(\theta_c + h_k e_k) - Q(\theta_c - h_k e_k)}{2 h_k}, \quad k = 1, \ldots, K \tag{23.88}$$
where $h_k$ is determined using equation (23.83). An obvious concern is that (23.88) requires twice as many function evaluations as (23.87). Dennis and Schnabel (1996, p. 106) reports that (23.87) is ‘usually quite sufficient’, but offers a reminder that the accuracy of $\hat{\theta}$ will be limited by the accuracy of the gradient approximation. They also mention the existence of ‘production codes’ that decide when to automatically switch from (23.87) to (23.88) to obtain more accuracy.
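A sketch of both gradient approximations in Python (again our own illustration, with assumed helper names) makes the cost difference explicit: forward differences reuse $Q(\theta_c)$ and need $K$ extra evaluations, while central differences need $2K$:

```python
import numpy as np

def fd_gradient(Q, theta, typx=None, central=False):
    """Forward (23.87) or central (23.88) difference gradient, with the
    per-component step sizes of equation (23.83) and eta = macheps."""
    K = len(theta)
    typx = np.ones(K) if typx is None else typx
    h = np.sqrt(np.finfo(float).eps) * np.maximum(np.abs(theta), typx) \
        * np.where(theta < 0.0, -1.0, 1.0)
    g = np.empty(K)
    Q0 = Q(theta)                        # reused by forward differences
    for k in range(K):
        e = np.zeros(K); e[k] = 1.0
        if central:                      # 2K evaluations in total
            g[k] = (Q(theta + h[k] * e) - Q(theta - h[k] * e)) / (2.0 * h[k])
        else:                            # K + 1 evaluations in total
            g[k] = (Q(theta + h[k] * e) - Q0) / h[k]
    return g
```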
4.5 Stopping Rules
Perhaps one of the most difficult aspects of numerical optimization is deciding when to stop an iterative search. Dennis and Schnabel (1996, p. 159) provides an operational definition of the stopping decision in very clear and plain language. The search stops when the answer to one of the following three questions is Yes:

1. ‘Have we solved the problem?’
2. ‘Have we ground to a halt?’
3. ‘Have we run out of money, time, or patience?’
The first question would seem to be the most important, and the answer might seem obvious: the problem is solved when the first-order condition has been satisfied. Unfortunately, finite-precision arithmetic complicates the situation. In infinite precision there is a ‘true solution’ $\theta^*$ with $\nabla Q(\theta^*)$ identically equal to zero; however, in finite precision we can only get ‘close’. What test should be used to decide when $\nabla Q(\theta_+)$ is close enough to zero to stop the search? An obvious option is

$$\|\nabla Q(\theta_+)\| \leq \varepsilon_g \tag{23.89}$$
for some small, positive constant $\varepsilon_g$, where $\|\cdot\|$ is a distance measure, such as Euclidean distance. However, (23.89) depends so heavily on the scaling of both $Q$ and $\theta$ that it cannot be used. To see this, suppose that $\theta_+$ were to satisfy (23.89). It is possible to multiply $Q(\theta)$ [and therefore $\nabla Q(\theta)$] by a constant large enough to cause (23.89) to be violated. However, the problem is mathematically the same because neither $\theta_+$ nor $\theta^*$ has changed location. Conversely, if (23.89) were not satisfied, it would be possible to multiply by a constant small enough for it to be satisfied. Similarly, suppose that $\theta$ has been rescaled by a $K \times K$ nonsingular matrix $S$, and define $\tilde{\theta} = S\theta$. Then estimation is based on $\tilde{Q}(\tilde{\theta}) = Q(S^{-1}\tilde{\theta})$, and it can be shown that $\nabla \tilde{Q}(\tilde{\theta}) = S^{-T} \nabla Q(\theta)$. As before, the left-hand side of (23.89) has been arbitrarily altered even though the problem is mathematically identical – for details see Dennis and Schnabel (1996, p. 156) or Conn et al. (2000, p. 795). Other stopping criteria can be defined by generalizing (23.89) to use a weighted norm defined by

$$\|\nabla Q(\theta_+)\|_M \equiv \|M^{1/2} \nabla Q(\theta_+)\| = \sqrt{\nabla Q(\theta_+)^T M\, \nabla Q(\theta_+)} \tag{23.90}$$

where $M$ is a symmetric positive-definite weighting matrix. As discussed previously, a quasi-Newton method takes full steps defined by equation (23.58) when converging to a nearby solution. For the current iterate $\theta_+$, the next quasi-Newton step $d_+$ (not yet taken) can be substituted into (23.55) to obtain the predicted decrease in $Q$ for the next iterate $\theta_{++}$:

$$M_+(\theta_+) - M_+(\theta_{++}) = Q(\theta_+) - M_+(\theta_{++}) = \frac{1}{2}\, \nabla Q(\theta_+)^T H_+^{-1}\, \nabla Q(\theta_+) \tag{23.91}$$
which equals $\frac{1}{2} \|\nabla Q(\theta_+)\|^2_{H_+^{-1}}$ using (23.90). Because $\theta_{++}$ would minimize the quadratic model (23.55) regardless of how $\theta_+$ is scaled, the predicted decrease in $Q$ can be viewed as a measure of $\nabla Q(\theta)$ that is invariant to the scaling of $\theta$. For this reason, stopping rules of the form

$$\nabla Q(\theta_+)^T H_+^{-1}\, \nabla Q(\theta_+) \leq \varepsilon \tag{23.92}$$

have been suggested in the literature – see, for example, Fletcher (1981, p. 19) or Dennis and Schnabel (1996, p. 159). However, this does not address the scaling of $Q$. An immediate remedy is to define convergence using the relative function decrease:

$$\frac{Q(\theta_+) - M_+(\theta_{++})}{Q(\theta_+)} = \frac{Q(\theta_+) - M_+(\theta_+ - H_+^{-1} \nabla Q(\theta_+))}{Q(\theta_+)} \leq \varepsilon_R \tag{23.93}$$
It is helpful to consider this condition in the context of trust regions. The relative function decrease in (23.93) is for the full quasi-Newton step, but this step would only be allowed if it also lies within the current trust region. So, both conditions are required for a favorable outcome. This will occur for a well-behaved model, but will typically fail for cases where the model is unidentified. Another criterion used to remove scaling issues is the relative gradient of $Q$ at $\theta$, given by

$$\text{relgrad}(\theta)_k = \frac{\text{relative rate of change in } Q}{\text{relative rate of change in } \theta_k} = \lim_{\delta \to 0} \frac{[Q(\theta + \delta e_k) - Q(\theta)]\,/\,Q(\theta)}{\delta\,/\,\theta_k} = \frac{\nabla Q(\theta)_k\, \theta_k}{Q(\theta)} \tag{23.94}$$
where $\theta_k$ in this expression denotes the $k$th component of $\theta$. This is used in the test

$$\max_{1 \leq k \leq K} |\text{relgrad}(\theta_+)_k| \leq \text{gradtol} \tag{23.95}$$
See Dennis and Schnabel (1996, p. 160). This test is independent of any change in units for either $Q$ or $\theta$. However, there are clearly problems if either $Q(\theta)$ or any $\theta_k$ gets too close to zero. As in the previous section, practical implementation requires estimates of ‘typical values’ for $Q$ and $\theta_k$, yielding the following version of the test that can be used in practice:

$$\max_{1 \leq k \leq K} \left| \frac{\nabla Q(\theta_+)_k\, \max\{|(\theta_+)_k|, \text{typ}\theta_k\}}{\max\{Q(\theta_+), \text{typ}Q\}} \right| \leq \text{gradtol} \tag{23.96}$$

Using finite difference gradients would place some limits on what a reasonable value for gradtol would be: any value smaller than $\sqrt{\text{macheps}}$ would be problematic. Dennis and Schnabel (1996, p. 278) suggests a default value of $\text{macheps}^{1/3}$.
If the search stops due to (23.96) then the outcome is likely to be favorable, and it is also likely that the step lengths have been getting shorter and shorter. However, in some searches the steps could get successively shorter without satisfying (23.96). It could be that the algorithm has converged, but that gradtol has been set too aggressively. Or, it could be that there is a problem. Either way, successively shorter steps mean that the algorithm has stalled, and it should be stopped. A stopping criterion based on step length is

$$\max_{1 \leq k \leq K} \text{rel}\theta_k \leq \text{steptol} \tag{23.97}$$
where, because of scaling issues, step length changes are defined using the relative change in the $k$th component of the iterate:

$$\text{rel}\theta_k = \frac{|(\theta_+)_k - (\theta_c)_k|}{\max\{|(\theta_+)_k|, \text{typ}\theta_k\}} \tag{23.98}$$
Dennis and Schnabel (1996, p. 160) suggests that, if $t$ significant digits are desired in the solution, then steptol should be set to at least $10^{-t}$, with a warning that setting steptol too large may cause premature convergence. They suggest a default value of $\text{macheps}^{2/3}$. This test can also be used in conjunction with (23.93) discussed above. Generally speaking, if tolerances are set properly for (23.96) and (23.97) (or (23.93) and (23.97)), they will frequently be satisfied simultaneously when optimizing a well-behaved objective function. However, if an algorithm fails to satisfy any of the tests, limits are required to prevent the algorithm from continuing. The most obvious test for the third question above is an iteration limit. This is a particularly good idea early in a study, when failure of the previous tests could indicate a programming or data error. Another scenario is when the objective function is unbounded below. The search could continue until one or more parameters generate an overflow condition. Dennis and Schnabel (1996, p. 161) suggests an additional stopping rule based on a maximum step length. If the algorithm takes five consecutive steps of this length, the problem could be unbounded. This is definitely a good idea when performing MLE on discrete choice models. For certain models and datasets, it is possible to have a situation where all of the data points can be completely explained by a so-called separating hyperplane. In this case the negative log-likelihood would not be strictly convex, but would have a lower asymptote of zero. In this situation only the maximum step length test would terminate the search. A similar idea would be to use an optimization method with simple upper and lower bounds on the parameters.
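The scaled tests (23.96)–(23.98) are straightforward to express in code. The sketch below is a hypothetical Python implementation (names such as `relgrad_test` are ours) using the Dennis and Schnabel default tolerances:

```python
import numpy as np

macheps = np.finfo(float).eps
gradtol = macheps ** (1.0 / 3.0)   # default suggested for (23.96)
steptol = macheps ** (2.0 / 3.0)   # default suggested for (23.97)

def relgrad_test(Q_val, grad, theta, typ_theta, typ_Q):
    """Scale-free relative gradient test of equation (23.96)."""
    r = np.abs(grad) * np.maximum(np.abs(theta), typ_theta) \
        / max(abs(Q_val), typ_Q)
    return r.max() <= gradtol

def relstep_test(theta_new, theta_old, typ_theta):
    """Relative step-length test of equations (23.97)-(23.98)."""
    rel = np.abs(theta_new - theta_old) \
          / np.maximum(np.abs(theta_new), typ_theta)
    return rel.max() <= steptol
```

In a well-behaved run both tests fire at (nearly) the same iteration; a search that satisfies only the step test deserves scrutiny.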
5 SUMMARY
Successful estimation and inference for choice models require the knowledgeable and correct application of a variety of numerical and computational methods. Although many software packages produce reliable results for standard models, state-of-the-art models can be much more challenging. This chapter reviewed mathematical concepts and developed a general framework for estimation and inference that simultaneously
addresses computational aspects of both econometrics and nonlinear optimization. Numerical expressions for computing variance estimates are provided, alternative optimization methods for computing estimates are defined and evaluated in terms of their speed and reliability, and relationships among these are demonstrated. The case for robust, adaptive estimation methods using model switching is made. Practical implications of computer arithmetic are reviewed, and methods for computing derivatives using finite differences as well as a detailed discussion of stopping rules are provided.
NOTES

1. Helpful comments were provided by Professors David Brownstone, Colin Cameron and Dale Poirier, and an anonymous reviewer.
2. Although the context is discrete choice modeling, the treatment provided here is general enough to be applicable in many other modeling situations.
3. The notation is initially used to represent a single, simple choice for each independent observation; however, later it will be extended in a way that can address more complex DGPs, for example, panel data or discrete choice experiments.
4. The literature refers to this as a ‘simulated maximum likelihood estimator’. This terminology should not be confused with the simulators used in Bayesian estimation, for example, the Markov chain Monte Carlo methods discussed in Chapter 20.
5. It will generally be assumed that the audience for this handbook is more likely to be familiar with econometric modeling than with numerical analysis.
6. Another important reference that rigorously addresses large sample properties of econometric estimators is Newey and McFadden (1994).
7. This result is possible because we are approximating a real-valued function using the mean value theorem. See, for example, Dennis and Schnabel (1996), Lemma 4.1.5 and discussion.
8. This factor is omitted when computing estimates, for reasons to be discussed later.
9. Although this statement was published in 1985, in our view it is still true today. Also, note that we are generally referring to non-trivial, complex forms of constraints. Many software packages implement very simple forms of constraints such as setting a specific parameter to a constant, or setting bounds on parameters. The first of these is innocuous and convenient for the user. The second can be important for preventing poor behavior by a search algorithm, and is only a potential concern if the algorithm returns with a solution on the boundary. This could occur due to a variety of complications, and the solution is suspect.
10. For reference, this very important statistical property says that the estimator converges in probability to the true parameter, or, more formally, that the limit as $N \to \infty$ of $\Pr(|\hat{\theta}_N - \theta_0| > \varepsilon)$ is 0 for any $\varepsilon > 0$.
11. We rely on asymptotic results because similar analytical results for a finite sample of size $N$ are difficult if not impossible to obtain.
12. The sign in equation (23.26) is opposite of what is found in Amemiya (1985) and Cameron and Trivedi (2005) because we are defining m-estimators as minimizers rather than maximizers.
13. It is also sometimes called a ‘pseudo-maximum likelihood estimator’.
14. Note, however, that although this approach addresses the positive definiteness issue, it still uses the full Hessian at each iteration.
15. Many trust region methods use more general distance measures that take into account the shape of the region as well as the size. In addition, the distance measures can be updated at each iteration.
16. Convergence tests (which are discussed in section 4.5) have been omitted.
17. For purposes of completeness, we make the following remark: we are aware of cases where researchers have attempted to use the final Hessian approximation from a secant update to compute variance estimates. This should never be done.
REFERENCES

Amemiya, T. (1985), Advanced Econometrics, Cambridge, MA: Harvard University Press.
Berndt, E.K., B.H. Hall, R.E. Hall and J.A. Hausman (1974), ‘Estimation and inference in nonlinear structural models’, Annals of Economic and Social Measurement, 3 (4), 653–65.
Bhat, C.R., I.N. Sener and N. Eluru (2010), ‘A flexible spatially dependent discrete choice model: formulation and application to teenagers’ weekday recreational activity participation’, Transportation Research Part B, 44 (8–9), 903–21.
Bunch, D.S. (1987), ‘Maximum likelihood estimation of probabilistic choice models’, SIAM Journal on Scientific and Statistical Computing, 8 (1), 56–70.
Bunch, D.S. (1988), ‘A comparison of algorithms for maximum likelihood estimation of choice models’, Journal of Econometrics, 38 (1–2), 145–67.
Bunch, D.S. and R. Kitamura (1991), ‘Probit model estimation revisited: trinomial models of household car ownership’, Working Paper No. 70, September, University of California Transportation Center.
Bunch, D.S., D.M. Gay and R.E. Welsch (1993), ‘Algorithm 717: subroutines for maximum likelihood and quasi-likelihood estimation of parameters in nonlinear regression models’, ACM Transactions on Mathematical Software, 19 (1), 109–30.
Cameron, A.C. and P.K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press.
Conn, A.R., N.I.M. Gould and P.L. Toint (2000), Trust-Region Methods, MOS-SIAM Series on Optimization, Philadelphia, PA: Society for Industrial and Applied Mathematics.
Davidson, R. and J.G. MacKinnon (1993), Estimation and Inference in Econometrics, New York and Oxford: Oxford University Press.
Dennis, J.E. and R.B. Schnabel (1996), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Philadelphia, PA: Society for Industrial and Applied Mathematics; originally published Englewood Cliffs, NJ: Prentice Hall, 1983.
Dennis, J.E., D.M. Gay and R.E. Welsch (1981), ‘An adaptive nonlinear least-squares algorithm’, ACM Transactions on Mathematical Software, 7 (3), 348–68.
Fletcher, R. (1981), Practical Methods of Optimization, Volume 1: Unconstrained Optimization, New York: John Wiley and Sons.
Gay, D.M. and R.E. Welsch (1988), ‘Maximum likelihood and quasi-likelihood for nonlinear exponential family regression models’, Journal of the American Statistical Association, 83 (404), 990–98.
Goldfeld, S.M., R.E. Quandt and H.F. Trotter (1966), ‘Maximization by quadratic hill-climbing’, Econometrica, 34 (3), 541–51.
Huber, P.J. (1967), ‘The behavior of maximum likelihood estimates under nonstandard conditions’, in J. Neyman (ed.), Proceedings of the Fifth Berkeley Symposium, vol. 1, Berkeley, CA: University of California Press, pp. 221–33.
Huber, P.J. (1981), Robust Statistics, New York: John Wiley and Sons.
McFadden, D. (1974), ‘Conditional logit analysis of qualitative choice behavior’, in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press, pp. 105–42.
Newey, W.K. and D. McFadden (1994), ‘Large sample estimation and hypothesis testing’, in R.F. Engle and D.L. McFadden (eds), Handbook of Econometrics, Volume IV, Amsterdam: Elsevier, pp. 2111–245.
Varin, C. (2008), ‘On composite marginal likelihoods’, Advances in Statistical Analysis, 92 (1), 1–28.
White, H. (1982), ‘Maximum likelihood estimation of misspecified models’, Econometrica, 50 (1), 1–25.
24 Appraisal

Anders Karlström
1 INTRODUCTION
Fundamental problems for human beings can be recognized by the way they appear in many different scientific fields, tackled from different perspectives and theories. One such problem is how we as a collective should take decisions, and choose appropriate norms, rules and regulations for society. These are thus topics both in moral philosophy and political philosophy, while social choice theory is a discipline that is devoted to the subject of collective decision-making. Its close relative, public choice theory, addresses the same issue in the interaction between political science and economics. The behaviour of individuals is also studied in psychology, which traditionally avoids focusing on normative aspects of behaviour. However, psychology is also interested in changing and influencing people's behaviour, and therefore the demarcation line is not as clear-cut as one may believe. In this chapter we will focus on what the field of economics has to say about how we can analyze and decide what policies or projects should be implemented in a society. More specifically, we will sketch the fundamentals of the neoclassical economic arguments for the welfare economics used in cost–benefit analysis (CBA). In particular, we will outline the basic theory and practice of welfare evaluation using random utility econometrics or discrete choice modelling, which are methods used in many different subdisciplines of economics. Although there exists a vital cross-fertilization between, for example, transportation economics, environmental and resource economics, and labour economics, different practices have been developed within each subdiscipline. The different practices are perhaps even more pronounced when it comes to welfare evaluation, the topic of this chapter, than for modelling, the topic of the handbook. To highlight this, we will make references to the relevant subdisciplines.

This chapter is organized as follows. We first outline the basic foundation of welfare evaluation within the neoclassical economic framework. This involves both intrapersonal welfare theory and welfare aggregation across individuals. In section 3 we outline the basics of welfare evaluation within the random utility framework, with the famous logsum formula as the centerpiece. In section 4 we briefly address the question of whether structural modelling is really necessary, and in section 5 we briefly discuss caveats and criticisms of the prevailing neoclassical approach and what road lies ahead of us, which is also part of the concluding section 6.
2 FOUNDATIONS

2.1 Neoclassical Microeconomics
In this section we will briefly outline the foundation of appraisal (which will be interpreted as cost–benefit analysis) within the realm of neoclassical economics. We start by arguing that the neoclassical paradigm makes choices the central focus of mainstream economics. To see this, it is important to realize that the departure point of neoclassical economics is not that individuals are trying to maximize their utilities. The axiomatic approach developed by Samuelson (1938a, 1938b) states that if people adhere to a few regularity axioms, then they will behave as if they were utility maximizers. In fact, preference as such is defined as a binary relation. As expressed by Samuelson (1938a, 1938b), if an individual chooses x when y also could have been chosen, then it is reasonable to construct the statement that x is preferred over y. Let R denote such a binary preference relation, where xRy will denote that x is revealed preferred to y. Then R can be said to rationalize a demand (choice) function; that is, given a budget set B, a demand function is defined by

$$c(B) = \{ x \in B : (\forall y \in B)\, [x\, R\, y] \} \tag{24.1}$$
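The definition in (24.1) is directly computable for finite budget sets. The following Python fragment is purely illustrative (the alternatives and the relation are invented for the example):

```python
def demand(B, R):
    """Choice function c(B) of equation (24.1): the alternatives in B that
    are revealed preferred to every alternative in B. R is a set of pairs
    (x, y) meaning 'x R y'."""
    return {x for x in B if all((x, y) in R for y in B)}

B = {"bus", "car", "train"}
R = {(x, x) for x in B} | {("car", "bus"), ("car", "train"), ("train", "bus")}
print(demand(B, R))   # {'car'}
```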
To add some regularity to the preference relations we consider, we define a regular preference as one whose ordering is total, transitive and reflexive, and the corresponding demand function is termed regular-rational. Furthermore, a demand function c(B) is said to be representable if it is consistent with maximizing some utility function u(x) over the budget set B. For preferences to be representable we need them to exhibit some regularity and coherence. This is the case if we, for instance, assume that they obey the strong axiom of revealed preference, which rules out intransitivity in preferences, such that if we have yRx and zRy, it is never the case that xRy or xRz. Thus, it can be shown that if we are willing to impose some regularity assumptions on preferences, then the choices they induce can be represented as if they were the results of utility maximization. This is known as the as-if assumption of neoclassical economics. In fact, the beauty of this approach is that we do not have to know anything about what the decision process actually is, or what mental deliberations go on inside the brains of individuals; as long as people have coherent and regular preferences, all we need to study is their choices. Neoclassical economics is therefore all about choices, and hence choice modelling holds a very central place in the field of mainstream economics as a whole, not only for appraisal. Regular and coherent preferences are also sometimes known as rational preferences, and therefore it is common to state that the underlying assumption of neoclassical economics is that people are rational. However, this may be somewhat confusing, since the term rational comes with many connotations, and some authors therefore prefer to use terms such as regular (and possibly coherent) instead; see Border (2012) for details and a discussion, cf. Mas-Colell et al. (1995). In the standard neoclassical approach, it is assumed that preferences are fixed and non-malleable, respecting consumer sovereignty. That is, it is not in the power of the
policy maker to affect the preferences of individuals. This is a serious and important assumption, and its relaxation has serious consequences for appraisal, which will be briefly discussed in section 5. Thus, proceeding with the assumption that individuals' preferences are non-malleable and exhibit the required regularity properties, today's dominating approach to CBA is the so-called willingness-to-pay approach. There are two components to this approach that need to be mastered. The first is preference elicitation, and the second is preference aggregation (to be discussed in the next subsection). Let us start with preference elicitation for a single individual: by observing the choices of individuals (either in a real-world market or in hypothetical settings), we need tools to elicit their preferences and valuations. The underlying welfare measures are surprisingly easy to understand and communicate conceptually. For instance, suppose we are in situation A and want to understand an individual's valuation of a proposed project that will change the situation to B. Then we can formulate the question: how much are you willing to pay to go from A to B? This question is well formulated and can be answered if (i) we are able to precisely communicate situations A and B, and (ii) the individual is willing to trade.

There are three well-known welfare measures. The first is the Marshallian consumer surplus. Let $p$ denote price, and consider a price change from $p^0$ to $p^1$. The loss in consumer surplus is given by $CS = \int_{p^0}^{p^1} X(p)\, dp$, where $X(p)$ is the market (Marshallian) demand. Second, compensating variation (CV) is the amount of money the individual requires after the change in order for his or her utility to be restored to its initial level. Its counterpart is equivalent variation (EV), the amount of money the individual is willing to pay, in the initial state, to be made as well off as after the change. EV and CV are known as Hicksian welfare measures (these will be defined more carefully in section 3). There are theoretical reasons why the Hicksian welfare measures are sound, but what can be said about the choice between equivalent and compensating variation? This has been the focus of research, in particular in the 1980s (see Chipman and Moore (1980, 1990)), rooted in index theory. From a pragmatic point of view there are two reasons that speak in favour of equivalent variation (EV). First, EV is defined at existing prices and attributes. The existing situation is presumably better known than any future scenarios of proposed policy changes. This is useful when comparing many projects, since they are then compared against the same reference point. Second, as the theory shows, equivalent and compensating variation (corresponding to willingness-to-pay, WTP, or willingness-to-accept, WTA, depending on the application) may differ in the presence of an income effect, but we expect the difference to be small, in particular for small projects (Horowitz and McConnell, 2003). In practice, the measured gap between WTP and WTA can be quite large, at least when measured using stated preference data. As a response, in a seminal contribution Hanemann (1991) showed that a gap between WTP and WTA may in many cases also arise due to a substitution effect. There are many other explanations, including strategic responses, gain–loss asymmetry, or hypothetical biases. Since there are reasons to believe that WTA may be more affected by some of these explanations than WTP, valuation elicitation is based mostly on WTP (McFadden, 1998). In reality, at best we only observe individuals making choices, and we need to be able to recover this willingness to pay (WTP) from observed data on choices. This is the topic of section 3.
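As a purely numerical illustration of the Marshallian measure (the demand function and the numbers below are invented, not taken from the chapter; scipy is assumed available), the surplus loss from a price rise is just the integral of demand over the price interval:

```python
from scipy.integrate import quad

# Hypothetical linear market demand X(p) = a - b*p.
a, b = 100.0, 2.0
X = lambda p: a - b * p

p0, p1 = 10.0, 15.0                  # price rises from 10 to 15
loss, _ = quad(X, p0, p1)            # CS loss = integral of X(p) dp
print(loss)                          # 375.0 = (X(p0) + X(p1)) / 2 * (p1 - p0)
```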
2.2 Preference Aggregation
Observing the demand function of one individual, we are able to derive appropriate welfare measures of the proposed project that would take us from A to B. As we have argued above, the welfare measures are well defined and rather easy to communicate to laymen, which are important features of the approach. Next, we can elicit the WTP of a sample from the population and hence arrive at a distribution of WTP in the underlying population, from which we can form a number of descriptive statistics, such as the total, the mean, the percentage below zero, and so on. In a minimalistic approach to appraisal (Sugden, 2003), such analysis may form the basis for an informed debate about the project. But clearly, if the total WTP in the population is positive, an obvious question demands an answer: does a positive sum of individual WTPs indicate that the project should be implemented? To take this further normative step, we will necessarily have to make interpersonal comparisons.

According to Hicks and Allen (1934), to earlier-generation economists such as Marshall, Walras and Edgeworth, utility was perceived, in principle, as a measurable quantity. If we just had enough facts, we could measure utility and also compare it across individuals. This was later disputed by Pareto, the father of the first of three important welfare criteria, namely Pareto optimality. The Pareto criterion implies that a policy change is good, from a normative point of view, if there are no losers (all are winners, and some may be indifferent). A Pareto-optimal (or Pareto-efficient) allocation is one where no one can be made better off without making someone else worse off. The second, and most common, welfare criterion is the potential Pareto improvement, also known as the Kaldor–Hicks compensation criterion (Hicks, 1939; Kaldor, 1939). In practice, the Pareto criterion is difficult to implement. First, the informational requirements are severe, as we need to identify winners and losers. We also have to measure how much they are willing to pay, or how much compensation is needed, in order for the project to be implemented. Second, individuals will have incentives to misreport their true valuations, a problem which could be exacerbated if compensation actually takes place. Therefore, the Kaldor–Hicks compensation criterion states that a project is good if it would be a Pareto improvement were the compensation carried out. That is, a potential Pareto improvement is enough, without the winners actually compensating the losers. The third criterion is known as the Scitovsky criterion (Scitovsky, 1941). This is motivated by the observation that the Kaldor–Hicks criterion is not necessarily a transitive decision rule. The argument is that if the utility feasibility curves for individuals intersect, it may be the case that situation B is preferred to A (according to the Kaldor–Hicks criterion), and C is preferred to B, while A is preferred to C. This can happen when the proposed projects are large and introduce significant income redistribution. These three welfare criteria form the basis of appraisal, and the Kaldor–Hicks compensation criterion is the most frequently used (for a textbook exposition, see, for example, Cullis and Jones (1992) or Boardman et al. (2006)). The three welfare criteria each use a different notion of efficiency.
One way to mathematically address efficiency is to introduce a social welfare function (SWF) – that is, a function that embodies the welfare of a (possibly virtual) social planner, and which has the utility of the individuals as arguments. Any Pareto optimal allocation can be found as the solution to a maximization of an SWF. The opposite is also (more trivially) true: any
allocation that maximizes an SWF will be Pareto optimal. Just as the utility functions may be viewed as a mathematical construct (and no individuals are really maximizing a utility), the SWF may also be viewed as a mathematical construct. However, it can also be rationalized on philosophical grounds, without apologies, for instance from the perspective of utilitarianism. The choice of SWF is not innocuous. A rather general parameterization of the SWF (Boadway and Bruce, 1984) is the following:

$$W(\nu_1, \ldots, \nu_n) = \frac{1}{1 - \sigma} \sum_h a_h \nu_h^{(1 - \sigma)} \tag{24.2}$$
where $W$ is social welfare, the $\nu_h$ are the achieved utilities of individuals $h = 1, \ldots, n$, $a_h$ is a weight parameter associated with individual $h$, and $\sigma$ may be interpreted as a constant elasticity of a social welfare indifference curve. Using this parameterization, many common SWFs can be seen as special cases. With $\sigma = 0$ and $a_h = 1$ we have the Benthamite or utilitarian social welfare function, which can be generalized if we allow weights $a_h \neq 1$. Two other interesting cases are (i) $\sigma \to 1$, which yields an SWF that maximizes the product of utilities, and (ii) $\sigma \to \infty$, which yields a Rawlsian social welfare function, such that $W = \min_h a_h \nu_h$. With equal individual weights $a_h$, the Rawlsian criterion is thus that we should maximize the utility of the least well off individual in society.

Returning to the situation where we have measured individual WTP for a project, it does matter for appraisal what assumptions are made about how these should be weighted together. Just adding them up and taking the total sum puts more weight on those with higher income, since it is known that the marginal utility of money decreases with income. For a discussion and motivation of other options in the transportation appraisal context, see Pearce and Nash (1981), Mackie et al. (2001), Bates (2006) and Jara-Diaz (2007).

Finally, we should note that, partly because of distributional concerns, the tax system used to fund a public project is typically not lump-sum taxation, making it difficult to actually achieve a Pareto improvement. In practice, public projects may have to be funded by distortive taxes. Ideally, one may want to analyze the public project and its funding as a whole, but this is difficult in practice. Instead, in the standard approach to cost–benefit analysis it is assumed that the marginal public project is funded marginally using the existing distortive tax system. Thus, the cost of tax-funded projects should be adjusted by the so-called marginal cost of public funds. In this standard approach, the benefits of the project and its funding are separated, but both take part in the appraisal of the project. In this sense, the distributional concerns implicitly embedded in the tax system come into play in the appraisal. Recently, this standard approach to CBA has been complemented with a new approach inspired by the theory of optimal taxation. Following the new approach, a marginal public project is evaluated together with a marginal adjustment in the existing non-linear tax system, which keeps everybody at the same utility level. The cost–benefit criterion then becomes whether government revenue increases or not. The new approach thus neutralizes distributional concerns by a distribution-neutral adjustment of the tax system. For an exposition of the standard and new approaches to CBA, see Kreiner and Verdelin (2011) and references therein.
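To make the special cases of (24.2) discussed above concrete, here is a small Python sketch (entirely illustrative; the utility numbers are invented). The $\sigma \to 1$ case is handled through its usual logarithmic transform, which represents the same ordering as the product of utilities:

```python
import numpy as np

def swf(nu, a, sigma):
    """Social welfare function of equation (24.2) for utilities nu,
    weights a, and inequality-aversion parameter sigma."""
    nu, a = np.asarray(nu, float), np.asarray(a, float)
    if np.isclose(sigma, 1.0):
        return np.sum(a * np.log(nu))    # ordinally equivalent to prod(nu**a)
    return np.sum(a * nu ** (1.0 - sigma)) / (1.0 - sigma)

nu, a = [2.0, 4.0, 8.0], [1.0, 1.0, 1.0]
print(swf(nu, a, 0.0))                   # utilitarian sum: 14.0
print(swf(nu, a, 1.0))                   # log of the product 2*4*8
print(min(np.multiply(a, nu)))           # Rawlsian limit: min utility = 2.0
```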
2.3 Valuing the Future
Often the most important quantity in a CBA is the discount factor, or discount rate. More generally, how should we evaluate benefits and costs in the future, compared with the present? The number of books and papers written on this subject is enormous, and much of it is also relevant in a choice modelling context. Intertemporal choices can, like other choices, be studied using choice modelling; see, for example, Albrecht et al. (2011), van Osselaer et al. (2004), and Bleichrodt and Johannesson (2001). However, there are other aspects of valuing the future, beyond individual intertemporal choice. Appraisal methods are frequently used in resource and environmental economics applications, and are also brought to bear on contemporary environmental challenges such as biodiversity and climate change. Interpersonal comparisons across time and space become strongly highlighted in this context and lie at the heart of any definition of sustainability: how should the well-being of future generations be compared with that of our generation? Additional concerns are whether uncertainty and irreversibility, for example of resource depletion, are adequately addressed by the standard CBA framework and whether decision making should be biased more towards precaution. The discounting that is used in standard applied CBA leads in practice to more or less zero weight on generations distant in the future. Yet some scholars argue that the standard CBA framework does employ an appropriate discounting. Without such discounting, every generation would be destined to a low level of consumption, and it is not self-evident that giving distant future generations a low weight is more of a moral problem than downweighting all generations as they become the present generation (cf. Groom et al. 2005). Other scholars have developed methods of valuing the future that modify the standard framework of CBA (Chichilnisky, 1996; Li and Löfgren, 2000). In particular, the criterion of Chichilnisky is operational and has been applied as a modification of the traditional discounting in CBA; see, for example, Figuières and Tidball (2010) and Minken and Samstad (2003). For a discussion of discounting in the context of CBA, see, for example, Boardman et al. (2006).

2.4 Externalities and the First Welfare Theorem

There are many pieces that need to be mastered in an appraisal, and we cannot address them all in this chapter. Here we will just discuss two related issues that are often debated in the context of applied CBA. First, it should be noted that CBA is traditionally rooted in partial equilibrium theory. The practice of CBA is well developed and hopefully well understood for the appraisal of small projects that induce small changes and, in particular, induce no income effects. In this context, estimating appropriate consumer and producer surplus is the central part of the analysis. Still, in practice, disputes often arise about what effects should be counted and what should not, and whether there are risks of double counting. One particular such area of controversy is related to the terms technological and pecuniary externalities. The categorization into technological and pecuniary externalities seems to have been introduced by Scitovsky (1954). Pecuniary externalities are effects of one decision maker's behaviour on another decision maker's utility (or production function) which
are induced by changes in prices on markets. These pecuniary externalities cause transfers between decision makers, but only through price mechanisms on markets. Technological externalities occur when activities of one decision maker directly affect the utility or production function of another decision maker, without interaction through price mechanisms on markets; air pollution and congestion, for instance, are caused by actions of decision makers that do not take into account the (negative) effects on others. Technological externalities do not involve any transfer payments through price changes. Perhaps unintuitively at first glance, pecuniary externalities are, for economists, not as interesting as technological externalities. The reason is that these transfer payments through the price mechanism are second-order effects if we are at a Pareto-efficient equilibrium in the first place: the envelope theorem tells us that the agents of the economy have optimized their utilities (profits), and any price change will be a second-order effect and therefore will be zero for infinitesimally small changes. This is a central argument in the first welfare theorem of economics. However, this does not mean that pecuniary effects should never enter a CBA. If the equilibrium was not Pareto efficient in the first place, then pecuniary externalities also become first order and may be significant. Therefore, caution is required before disregarding pecuniary externalities, depending on the application at hand. In the presence of imperfect competition or technological externalities in other markets, pecuniary externalities cannot be disregarded as transfer payments without careful consideration. Also, it is sometimes not quite intuitive what should be considered pecuniary and what technological spillovers; see Small and Steimetz (2012).

Second, the standard practice of CBA is not so well suited to studying large projects, such as large infrastructure investments. Envelope arguments are not in general applicable. Here, one would also be required to capture the general equilibrium effects. For instance, in a computable general equilibrium (CGE) model one can calculate proper welfare measures of a policy change. Typically, to be computationally tractable, CGE models are based on the assumption of representative individuals. Bröcker (2004) demonstrates how to calculate Hicksian welfare measures when analyzing, for example, large infrastructure investments, assuming one representative individual in one region. As we have argued above, it is rarely the case that market demand can be represented by a representative individual, and there is therefore a trade-off between disaggregate market analysis and capturing general equilibrium effects. CBA and CGE can be viewed as complementary approaches for large project evaluation (Small, 1999; Vickerman, 2007).
3 WELFARE ECONOMICS IN RANDOM UTILITY MODELLING

3.1 Random Utility Models (RUMs)

Let us first present the discrete choice framework. Consumers are assumed to face $i = 1, \ldots, M$ mutually exclusive and exhaustive discrete alternatives. If alternative $i$ is chosen there is an access charge $t_i$, a price per unit of consumption $p_i$, and a vector of attributes $q_i$ associated with alternative $i$. When restricted to choose alternative $i$, the utility maximization problem of the individual gives the conditional indirect utility function
$$U_i(t_i, p_i, q_i, y, \varepsilon_i) \equiv v_i(t_i, p_i, q_i, y) + \varepsilon_i \equiv \max_{z,\, x_i} u(z, x_i, q_i) + \varepsilon_i \tag{24.3}$$

$$\text{subject to } z + p_i x_i \leq y - t_i \tag{24.4}$$
where $z$ is a numeraire good. The solution to this utility maximization problem yields the conditional demand functions $(\tilde{x}_i, \tilde{z})$. It is common to assume that the conditional indirect utility function only takes the own price and qualities as variables.1 The choice model defined by (24.4) is referred to as a mixed discrete/continuous choice model, since it involves a discrete choice among a finite number of exhaustive and mutually exclusive alternatives (goods), while the quantity of the chosen good is continuous. One typical application of this setup may be the choice of recreational fishing trips (McFadden, 1999), where the alternatives are distinguished by fishing site (and duration of trip). In this setup, $t_i$ is the transportation cost and living expenses of the trip, $p_i$ is the fee associated with the utilization of the site, and $q_i$ contains the attributes of the site that are important to anglers, for example, catch rates. Another typical application would be residential choice, where the choice alternatives may be local jurisdictions (municipalities). In this case, $t_i$ is the local income tax, $p_i$ is the price per square meter of housing consumption in the jurisdiction, $x_i$ is the housing consumption conditional on the jurisdiction, and $q_i$ represents local amenities and attributes (such as local public goods) that follow from the choice of municipality.

In other cases the individual is assumed to be constrained to consume a fixed amount $x_i = \bar{x}_i$ of the discrete good, given the alternative $i$. This setting is called a pure discrete choice2 framework, as opposed to the mixed discrete/continuous choice setting described by (24.4). Hence, the pure discrete choice utility maximization problem is

$$U_i(p_i, q_i, y, \varepsilon_i) \equiv \max_{z} u(z, \bar{x}_i, q_i) + \varepsilon_i \tag{24.5}$$

$$\text{subject to } z \leq y - p_i \tag{24.6}$$

where we have subsumed the access fee $t_i$ into the price $p_i$ and, without loss of generality, set $\bar{x}_i = 1$. The conditional indirect utility function becomes

$$U_i(p_i, q_i, y, \varepsilon_i) = v_i(p_i, q_i, y) + \varepsilon_i \tag{24.7}$$
where the $v_i$ are conditional standard indirect utility functions. These terms are deterministic, and therefore they are also termed the deterministic components of the RUM. The choice probabilities are given by

$$P_i(v_1, \ldots, v_M) = \int_{-\infty}^{\infty} F_i(v_i + \varepsilon - v_1, \ldots, \varepsilon, \ldots, v_i + \varepsilon - v_M)\, d\varepsilon \tag{24.8}$$
where $F$ is the cumulative distribution function (CDF) of the random utility terms, and $F_i$ denotes its derivative with respect to the $i$:th argument. In welfare analysis, it is usually assumed that a finite amount of money is required to restore utilities for any finite change of prices or attributes. As established by Hanemann (1984a), substituting (24.6) into the objective function, in a pure discrete choice framework, the income and price of the alternative should enter the conditional indirect utility function as $(y - p_i)$. The marginal utility of money $\lambda$ can be found as the marginal disutility of price,

$$\lambda \equiv \frac{\partial v_i}{\partial y} = -\frac{\partial v_i}{\partial p_i} \tag{24.9}$$
The utility maximizing individual will choose the alternative that yields the highest level of utility, yielding the unconditional indirect utility function

$$U(p, q, y, \varepsilon) \equiv \max_i U_i(p_i, q_i, y, \varepsilon_i) \tag{24.10}$$
where $q = \{q_i\}$ and $\varepsilon = \{\varepsilon_i\}$ are sets containing vector members. Consider a policy that changes the attributes and/or prices of some alternatives. An application in the recreational fishing setting mentioned above is a damage remediation program that increases fish abundance. Let $(q^0, p^0)$ and $(q^1, p^1)$ denote the vectors of attributes and prices associated with the state before and after the proposed policy has been implemented. The Hicksian welfare measures are defined in analogy with the traditional deterministic microeconomic setting. Consider first the income $y$ that restores utility to its original level, which is implicitly defined by

$$U(p^0, q^0, y^0, \varepsilon^0) = U(p^1, q^1, y, \varepsilon^1) \tag{24.11}$$
where $y^0$ is the original income. To be discussed below, consumer sovereignty may imply that we should assume that the random utility terms are identical before and after the change, so we will here assume that $\varepsilon^0 = \varepsilon^1$. Having thus defined the expenditure needed to restore utility, the compensating variation is defined as

$$C = y - y^0 \tag{24.12}$$
Since utilities in a random utility model are random, the compensating variation will also be a random variable. In standard microeconomic textbooks, such as Varian (1992) or Mas-Colell et al. (1995), the Hicksian welfare measures are derived by using the dual concept of expenditure functions. Expenditure functions were used in a discrete choice framework by Small and Rosen (1981), and in a more standard random utility framework by Hanemann (1985).3 First we define the ex ante conditional expenditure function $m_k$ to satisfy

$$v_k(p_k, q_k, m_k(p_k, q_k, u - \varepsilon_k)) + \varepsilon_k = u \tag{24.13}$$
Then the unconditional random expenditure function (cf. Hanemann, 1985; Varian, 1992) is simply

$$m(p, q, u) = \min_k \{ m_k(p_k, q_k, u - \varepsilon_k) \} \tag{24.14}$$
Note that the ex ante random compensation functions cannot be applied as is, if we want to calculate the compensating variation of a proposed change. The utility of each individual before the change is random, which needs to be addressed when calculating the Hicksian welfare measures, to which we now turn.

3.1.1 Marshallian consumer surplus measures

In discrete choice settings, the compensating variation corresponds to the expected value of C, or EC. Unfortunately, in general there exists no closed form solution for EC. In most applications, a standard additive RUM is used as defined above. To be able to calculate EC there are two additional assumptions commonly made. First, it is very useful if choice probabilities can be written in closed form, which is the case for the well-known class of MEV models4 where the random utility terms are assumed to be multivariate extreme value (MEV) distributed:

$$F(\varepsilon_1, \ldots, \varepsilon_M) = \exp(-H(e^{-\varepsilon_1}, \ldots, e^{-\varepsilon_M})) \tag{24.15}$$
where the generator function $H$ is linear homogeneous on $\mathbb{R}^n_+$. In this case the choice probabilities are given by

$$P_i(v_1, \ldots, v_M) = \frac{e^{v_i}\, H_i(e^{v_1}, \ldots, e^{v_M})}{H(e^{v_1}, \ldots, e^{v_M})} \tag{24.16}$$
where $H_i$ is the first derivative of the generating function with respect to the $i$:th argument. Second, it is common to make the simplifying assumption of constant marginal utility of income (CMUI). Assuming constant marginal utility of money, the conditional indirect utility function can be written $U_i(p_i, q_i, y, \varepsilon_i) = \lambda y + v_i(q_i, p_i) + \varepsilon_i$, and then the expected unconditional indirect utility function is given by

$$EU(p, q, y, \varepsilon) = \lambda y + E \max[v_1(q_1, p_1) + \varepsilon_1, \ldots, v_n(q_n, p_n) + \varepsilon_n] \tag{24.17}$$
where $E$ is the expectation operator, and $p = \{p_i\}$. Given the assumptions we have made, a closed form is available in the case of MEV models, yielding $EU(p, q, y, \varepsilon) = \lambda y + E \max[v_1(q_1, p_1) + \varepsilon_1, \ldots, v_n(q_n, p_n) + \varepsilon_n] = \lambda y + \log H(e^{v_1}, \ldots, e^{v_n}) + c$, where $c$ is a constant. Combined with (24.11) we have

$$EC(y^0, p^0, q^0, p^1, q^1) = \frac{1}{\lambda} \left\{ \log H(e^{v_1^1}, \ldots, e^{v_n^1}) - \log H(e^{v_1^0}, \ldots, e^{v_n^0}) \right\} \tag{24.18}$$

where $y^0$ is the exogenous original income. This is the logsum formula5 developed by Ben-Akiva (1972), McFadden (1973) and Domencich and McFadden (1975) for the case of iid random utility terms (the MNL model), and McFadden (1978) for the MEV case.6 Karlström (1999) shows that the logsum formula is also valid if the disturbance vector for each individual is assumed to be given by different independent draws before and after the change. Recently, Delle Site and Salucci (2012) have shown that the formula also applies for any temporal correlation structure, to be discussed below.

The logsum formula is identical to the traditional (Marshallian) consumer surplus measure for a policy change from $(p^0, q^0)$ to $(p^1, q^1)$. This is most easily seen by forming the total differential (see McFadden, 1999):

$$d \log H(e^{v_1}, \ldots, e^{v_n}) = \sum_{j=1}^{n} \frac{e^{v_j}\, H_j(e^{v_1}, \ldots, e^{v_n})}{H(e^{v_1}, \ldots, e^{v_n})}\, dv_j = \sum_{j=1}^{n} P_j(v)\, dv_j \tag{24.19}$$

Integrating over any path $v(t)$, $0 \leq t \leq 1$, from $v^0$ to $v^1$, we have

$$EC(y^0, p^0, q^0, p^1, q^1) = \frac{1}{\lambda} \sum_{j=1}^{n} \int_0^1 P_j(v(t))\, \frac{dv_j(t)}{dt}\, dt \tag{24.20}$$
Hence, the logsum formula coincides with the traditional Marshallian consumer surplus given by the area under the traditional Marshallian demand curves for the alternatives, normalized by the marginal utility of money. Of course, as we have assumed constant marginal utility of income (CMUI), the Hicksian welfare measures coincide with the Marshallian consumer surplus.

The RUM framework described above leads us to the expected value of the compensating variation, EC, used in cost–benefit analysis. The compensation is a random variable that can be written $C(y^0, p^0, q^0, p^1, q^1, \varepsilon^0, \varepsilon^1)$. The expected compensating variation is thus conditional on $(y^0, p^0, q^0, p^1, q^1)$. To find the unconditional welfare measure to be used in a cost–benefit analysis, we need to find the expected compensating variation in the population. This is done by taking the expected value over the target population distribution of $(y^0, p^0, q^0, p^1, q^1)$. It is also sometimes argued that taking the median is to be preferred.7 Although the mean CV is more or less the only measure used in transportation applications, it could be argued that the median should rather be used in CBA, while the mean CV is more appropriate in natural resource damage assessment; see Hanemann and Kanninen (2001) and references cited there. The narrow focus on the mean CV has also been criticized in the transportation economics literature; see, for example, Johansson and Mattsson (1995).

Taking the expected compensating variation to be used in appraisal merits a short discussion. In any finite population, the mean of CV is also a random variable. The law of large numbers tells us that as the size of the population grows, the realized mean CV will approach its expected value, and the central limit theorem also tells us how fast. However, for project evaluation when only a small population is affected, one may want to assess the probability of the realized mean being quite far away from the expected value. This issue can be addressed by large deviation theory, or by numerical simulations (Zhao et al., 2012).

3.1.2 Mixed logit

The logsum formula (24.20) holds for any MEV model under the CMUI assumption. In other discrete choice models, such as probit or mixed logit models, there exists in general no closed form solution for the consumer surplus, and one needs to resort to simulation. The mixed logit model is a common model specification which is used for appraisal in applied work. To illustrate how the logsum formula can be useful also in a mixed logit model, consider an MNL model such that the deterministic utilities are given by $v_i = -\lambda p_i + \alpha q_i$, where $p_i$ is the price and $q_i$ is an attribute of alternative $i$, and $\lambda$ is interpreted as the marginal utility of income (hence, maintaining the CMUI assumption). In a mixed logit model, the parameter $\alpha$ need not be a fixed scalar, but may be allowed to be drawn from a distribution $f(\alpha; \theta)$ with support on $B$, where $\theta$ are parameters that define the distribution. Then the probability of choosing alternative $i$ can be calculated by the following one-dimensional integral:

$$P_i = \int_B \frac{\exp(v_i(\lambda, \alpha))}{\sum_j \exp(v_j(\lambda, \alpha))}\, f(\alpha; \theta)\, d\alpha \tag{24.21}$$
where \lambda and \theta are parameters to be estimated. The choice probabilities are no longer in closed form, but can be estimated using simulation (see the excellent work of Train, 2003). For welfare evaluation, no closed-form solution exists either, and instead one has to find the consumer surplus by integrating out the mixed parameter:

EC = \frac{1}{\lambda} \int_B \log \Big\{ \sum_j \exp(v_j(\lambda, \alpha)) \Big\}\, f(\alpha)\, d\alpha \qquad (24.22)
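As a minimal sketch of how (24.22) can be simulated — assuming, purely for illustration, a normal mixing distribution for \alpha and the two-alternative specification used above — one averages the logsum over draws of \alpha:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ec(v_fn, lam, alpha_draws):
    """Monte Carlo estimate of (24.22): average the logsum over draws of alpha,
    then divide by the (fixed) marginal utility of money lambda."""
    logsums = [np.log(np.sum(np.exp(v_fn(a)))) for a in alpha_draws]
    return np.mean(logsums) / lam

# Illustrative specification (assumed, not from the chapter): v_j = -lam*p_j + alpha*q_j
lam = 0.1
p, q = np.array([10.0, 11.0]), np.array([3.0, 5.0])
alpha_draws = rng.normal(loc=0.5, scale=0.1, size=10_000)  # draws from f(alpha; theta)
cs_level = simulate_ec(lambda a: -lam * p + a * q, lam, alpha_draws)
# A policy change would be evaluated as the difference of two such terms,
# computed with common draws of alpha for the before and after states.
print(cs_level)
```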
Crucially, in this example the marginal utility of money was assumed to be fixed. It was recognized early on that mixing the \lambda parameter causes the individual consumer surplus to be fat-tailed, with excessive valuations. Therefore, in applied work it is usually preferred to keep the \lambda parameter fixed. For early examples of using the mixed logit in welfare evaluation, see Train (1998), Revelt and Train (1998) and Breffle and Morey (2000); see also the review by Bockstael and McConnell (2007).

3.1.3 Generalizing the logsum formula
The famous logsum formula thus defines the Marshallian consumer surplus, but it does not give the distribution of consumer surplus. As explained earlier, for any individual, consumer surplus is a random variable in a random utility model. Furthermore, the logsum formula only holds under the assumption of constant marginal utility of money. In this section we will show how one can derive the cumulative distribution of the compensating variation and thus calculate its expected value. The literature is motivated by finding proper welfare measures when CMUI does not hold. This was first addressed in the transportation field in a number of papers, for example, Jara-Diaz and Videla (1990a, 1990b). Approximation schemes and simulation methods were developed by McFadden (1999) and applied in Herriges and Kling (1999). The interest in this subject also stems from the effort to establish a representative individual, see Anderson et al. (1987, 1988, 1992), Oppenheim (1995) and Verboven (1996). The exposition below draws on Karlström (1999, 2001) and Dagsvik and Karlström (2005), where proofs and technical assumptions are given.8
First, let us define M(y) to be the probability that the required income to restore utility to its initial level is at least y, that is,

M(y) = \Pr\{ m(p^1, q^1, U(p^0, q^0, y^0, P)) \ge y \} \qquad (24.23)
where the random compensation function m was defined in (24.14) and U in (24.10). Let J(p, q, y) denote the chosen alternative at prices p, attributes q and income y. We will first consider an individual that chose alternative i before the change, such that J(p^0, q^0, y^0) = i. For such an individual, we will consider a choice between the M alternatives where the deterministic utilities are given by

h_i(p^0_i, q^0_i, y^0, p^1_i, q^1_i, y) = \max\{ v_i(p^0, q^0, y^0),\; v_i(p^1, q^1, y) \}.

It turns out that M_i(y), defined as the joint probability that J(p^0, q^0, y^0) = i and that the required compensation is at least y, can be written

M_i(y) \equiv \Pr\{ m(p^1, q^1, U(p^0, q^0, y^0, P)) \ge y,\; J(p^0, q^0, y^0) = i \}
= P_i\big( h_1(p^0, q^0, y^0, p^1, q^1, y), \ldots, h_M(p^0, q^0, y^0, p^1, q^1, y) \big), \quad 0 < y < y_i \qquad (24.24)

where P_i is the standard choice probability given in (24.8), or (24.16) in the case of an MEV model, and y_i is implicitly given by v_i(p^0_i, q^0_i, y^0) = v_i(p^1_i, q^1_i, y_i). y_i is thus the deterministic income compensation needed to restore utility for an individual that chose alternative i both before and after the change. Note that M_i(y) is a choice probability that defines the probability that at least income y is required to restore utility to its initial value for any individual that initially chose alternative i, as long as y < y_i. Note also that nobody who had chosen alternative i before the change will require an income of more than y_i to be compensated, so M_i(y) = 0 for y > y_i. The maximum income required to compensate anyone who had chosen alternative i before the change is y_i, which may be below y^0 if alternative i was improved by the change. We can now easily obtain

M(y) \equiv \Pr\{ m(p^1, q^1, U(p^0, q^0, y^0, P)) \ge y \}
= \sum_i I_i(p^0_i, q^0_i, y^0, p^1, q^1, y)\, P_i\big( h_1(p^0, q^0, y^0, p^1, q^1, y), \ldots, h_M(p^0, q^0, y^0, p^1, q^1, y) \big)
= \sum_i I_i(p^0_i, q^0_i, y^0, p^1, q^1, y)\, M_i(y), \quad y > 0 \qquad (24.25)

where

I_i(p^0_i, q^0_i, y^0, p^1, q^1, y) = \begin{cases} 1, & \text{if } v_i(p^1_i, q^1_i, y) < v_i(p^0_i, q^0_i, y^0) \\ 0, & \text{otherwise} \end{cases} \qquad (24.26)
The cumulative distribution function of the random expenditure is

\Pr\{ m(p^1, q^1, U(p^0, q^0, y^0, P)) \le y \} = 1 - M(y) \qquad (24.27)
Having thus derived the CDF of the expenditure function, we are in a position to find its mean:9
\mathrm{E}\, m = \sum_i \int_0^{y_i(p^0_i, q^0_i, y^0, p^1_i, q^1_i)} M_i(y)\, dy \qquad (24.28)
and the mean of the compensating variation is given by

EC = \mathrm{E}\, m - y^0 \qquad (24.29)
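To indicate how the random compensation m and formula (24.29) can be operationalized, the sketch below (our own illustration; the utility specification v_i(p, q, y) = \alpha q_i + \lambda \log(y - p_i) with i.i.d. Gumbel errors held fixed before and after the change, and all numbers, are assumptions, not from the chapter) computes E[CV] by direct simulation; (24.28) replaces such simulation by one-dimensional integrals of choice probabilities when these are available in closed form:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, lam, y0 = 0.5, 2.0, 100.0
p0, q0 = np.array([10.0, 12.0]), np.array([3.0, 4.0])  # before the change
p1, q1 = np.array([10.0, 11.0]), np.array([3.0, 3.5])  # after the change

def v(p, q, y):
    """Deterministic utility with an income effect (assumed functional form)."""
    return alpha * q + lam * np.log(y - p)

# Draw the Gumbel errors once and hold them fixed across the change.
eps = rng.gumbel(size=(100_000, 2))
u0 = (v(p0, q0, y0) + eps).max(axis=1)           # realized initial utility U(p0,q0,y0,P)
# Per alternative, invert v_j(p1,q1,m) + eps_j = u0 for the income m_j that restores u0:
# m_j = p1_j + exp((u0 - alpha*q1_j - eps_j) / lam); the expenditure m is the cheapest.
m = (p1 + np.exp((u0[:, None] - alpha * q1 - eps) / lam)).min(axis=1)
print("E[CV] =", m.mean() - y0)                   # (24.29): EC = E m - y0
```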
A similar formula can be derived for the equivalent variation (Karlström, 1999; Dagsvik and Karlström, 2005). The formula (24.28) is equivalent to the logsum formula under the assumption of constant marginal utility of income. In general, the compensating variation cannot be calculated by a closed-form formula, even in the case of an MEV model. However, when closed-form choice probabilities exist, (24.28) shows that we can calculate the compensating variation as a sum of one-dimensional integrals of choice probabilities, which are computationally easy to solve numerically. The analytical expression in (24.28) also lends itself to analytical work (de Palma and Kilani, 2011). Note also the simple structure of the formula (24.28): the Hicksian welfare measures are given by one-dimensional integrals of choice probabilities, which is familiar from standard microeconomics, where Hicksian welfare measures are given by the integral of the Hicksian compensated demand functions. In this sense, formula (24.28) may be viewed as a (Hicksian) generalization of the (Marshallian) logsum formula, one that remains valid when the CMUI assumption does not hold. Application areas of the generalized logsum formula include resource and environmental economics (Chattopadhyay, 2009; Morey and Rossmann, 2008), labour economics (Dagsvik et al., 2009) and transportation (Wu et al., 2012).

3.1.4 Generalized cost and rule of half
Although in a standard textbook exposition Marshallian consumer surplus is defined in terms of price changes, it should be noted that (24.20) is valid for any composite change in prices and attributes. Another approach to dealing with changes in attributes (instead of changes in prices only) is to introduce what is known as a generalized cost (or composite cost), in which changes in attributes are translated into monetary terms, typically after having made some further assumptions. This approach is most clearly illustrated in what is known as the rule of half. First, let us decompose the changes in the deterministic utilities v_j into changes of prices and attributes:

dv_j = \frac{\partial v_j}{\partial p_j}\, dp_j + \sum_k \frac{\partial v_j}{\partial q_{jk}}\, dq_{jk} \qquad (24.30)
where q_{jk} is attribute k of alternative j. Then, substituting (24.30) and using \partial v_j / \partial p_j = -\lambda under CMUI, the consumer surplus in equation (24.20) can be written

EC = \sum_j \int_0^1 \Big[ -P_j(v)\,\frac{dp_j}{dt} + P_j(v) \sum_k \eta_{jk}\,\frac{dq_{jk}}{dt} \Big]\, dt \qquad (24.31)
where, as before, \lambda = \partial v_j / \partial y is the marginal utility of money and \eta_{jk} = (\partial v_j / \partial q_{jk}) / \lambda is the marginal subjective valuation of attribute q_{jk}, which we assume to be constant over the relevant domain. As the name indicates, the rule of half is an approximation based on a linearization of the demand functions, which makes the integrals easy to calculate using only the demand at the initial and final prices and attributes, respectively. Let \bar{n}_j = \frac{1}{2}\big(P_j(v^0) + P_j(v^1)\big). Then

EC \approx \sum_j \bar{n}_j \Big\{ -\Delta p_j + \sum_k \eta_{jk}\, \Delta q_{jk} \Big\} = -\sum_j \bar{n}_j\, \Delta GC_j \qquad (24.32)
where \Delta p_j = p^1_j - p^0_j and \Delta q_{jk} = q^1_{jk} - q^0_{jk} are the changes in prices and qualities. Hence, the change in consumer surplus can be calculated if we know the demand before and after the change and, in addition, the change in generalized cost \Delta GC_j = \Delta p_j - \sum_k \eta_{jk} \Delta q_{jk} (for attributes that are 'bads', such as travel time, \eta_{jk} is negative, so a deterioration raises the generalized cost). To arrive at the rule-of-half approximation in (24.32) we have made a number of assumptions. First, we have made the CMUI assumption, just as we did to derive the logsum formula in (24.20). Second, we made the additional simplifying assumption that the marginal subjective valuations of the attributes are constant over the relevant domain. This is inherent in the definition of generalized cost, in which the valuations of attributes are translated into a monetary unit. Third, as the name rule of half suggests, we applied a linear approximation of the demand functions.
The use of the rule of half has not been without problems. In particular in the transportation field, the term generalized cost is usually defined as including monetary costs and travel time attributes, such as in-vehicle travel time, waiting time, and so on. As we have seen, the rule of half can then be used to (approximately) calculate consumer surplus when we consider policy changes involving those monetary costs and those attributes. However, if one restricts the generalized cost to allow only for prices and travel time attributes, it will prove quite useless for assessing welfare changes due to changes in other attributes. Neuberger (1971) highlighted this fact by considering a destination choice model. Suppose we introduce a policy that increases the attractiveness of one destination. This will cause more traffic to that destination. Travel time may remain constant (or increase due to congestion). Defining generalized cost in a narrow sense to include only price and travel time, the rule of half would yield a zero (or even negative) consumer surplus, which clearly is wrong. As is spelled out by Bates (2006) and Jara-Diaz (2007, p. 99), used correctly there is no reason why changes in destination attraction, which are reflected in land-use changes, should not be captured by the rule of half. To do so, we need only extend the definition of generalized cost to include the relevant dimensions of attributes, as is evident from (24.32); a numerical sketch is given at the end of this subsection.
The extent to which the logsum formula has been used varies between disciplines. In resource and environmental economics, in particular the recreational demand literature (Bockstael and McConnell, 2007), the logsum measure has a stronger position than in the transportation field, where the rule of half has been dominant (Bates 2006; Geurs et al. 2010). However, also in the field of transportation, the logsum has been used, in particular in the US (Niemeier 1997; Srour et al., 2002), and it has in recent years gained renewed interest also in Europe (Geurs et al., 2010). In labour economics, the logsum is used when MEV models are used, but in life-cycle models simulation is most often used to recover proper welfare measures.10
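The promised sketch of the rule-of-half calculation in (24.32) follows; the demand figures, the valuation and the single attribute (travel time) are all assumptions made up for illustration:

```python
import numpy as np

def rule_of_half(P0, P1, dp, dq, eta):
    """Rule-of-half benefit (24.32): average demand times the fall in generalized cost.
    P0, P1: demand (choice probabilities or volumes) before/after, per alternative.
    dp: price changes per alternative; dq: attribute changes (alternatives x attributes);
    eta: monetized marginal valuations per attribute (negative for 'bads' such as time)."""
    n_bar = 0.5 * (P0 + P1)
    d_gc = dp - dq @ eta           # change in generalized cost per alternative
    return -np.sum(n_bar * d_gc)   # EC ~ - sum_j n_bar_j * dGC_j

# Two modes, one attribute (travel time in minutes), value of time 0.2 per minute.
P0, P1 = np.array([600.0, 400.0]), np.array([550.0, 450.0])
dp = np.array([0.0, -1.0])         # mode 2 becomes 1 money unit cheaper
dq = np.array([[0.0], [-5.0]])     # mode 2 becomes 5 minutes faster
eta = np.array([-0.2])             # monetized disutility of travel time
print(rule_of_half(P0, P1, dp, dq, eta))
```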
Let us conclude with a short discussion of the motivation for using the rule of half. After having estimated an MEV model, the logsum formula can be used to calculate the Marshallian consumer surplus under the CMUI assumption. As we have seen, given two more assumptions, one can instead approximate the consumer surplus using the rule of half. The rule-of-half approximation as defined here should work well as an approximation of the logsum formula if the additional assumptions above are reasonable, which is indeed likely to be the case for small changes. One puzzling question remains: why would one use the rule of half instead of the logsum formula directly? In fact, in many appraisal applications, the rule of half is used instead of the logsum formula. One can state two reasons for its popularity, in particular within the transportation field. First, admittedly, the rule of half is easier to communicate than the logsum formula. Second, it lends itself to a disentanglement between demand and valuation. The practice in many countries in the field of transportation is to use separate datasets and models to address the marginal subjective valuation of attributes (travel time), on the one hand, and the demand on the other hand. This practice has no theoretical justification, but has partly been justified on equity grounds. However, one may argue that it would be preferable to explicitly revise the weights of the social welfare function, rather than the values of the marginal subjective valuations (Sugden, 1999; Mackie et al., 2001; Bates, 2006).

3.1.5 Intertemporally correlated random utilities
In neoclassical economics, consumer sovereignty is a fundamental assumption. Preferences are taken as defined prior to market circumstances and independent of these circumstances, including policy changes (McFadden, 1998). Relaxing this assumption is problematic for welfare economics, as we discuss in section 5. Therefore, it is typically assumed that the random utility terms are identical before and after the policy change. On the other hand, a common interpretation of the random utility terms is that they reflect unobserved characteristics of the alternatives. If the policy change is rather complex and will also change these unobserved characteristics, then it may make sense that the random utility terms also change. Fortunately, this does not change welfare evaluations, as long as the marginal utility of income is assumed constant. In fact, as shown by Delle Site and Salucci (2012), under the CMUI assumption, in an MEV model the welfare measure given by the logsum formula remains unaffected for any correlation of the random utility terms before and after the change. Note that this applies to the mean of the compensating variation, while the variance of the CV of course changes considerably depending on the level of temporal correlation. When CMUI does not hold, we are not so fortunate, and it can be shown that the welfare evaluation depends on the level of temporal correlation. There exist no published simple formulas to calculate the expected compensating variation in the presence of income effects and arbitrarily correlated random utility terms, so Delle Site and Salucci (2012) use simulation to calculate the appropriate welfare measures.

3.1.6 Welfare measures and the representative individual problem
Every individual is unique, having his or her own preferences. The economist (investigator) should ideally find every individual's preferences to make predictions and welfare estimates.
There are two major obstacles to achieving this. First, we can in practice hardly ever hope to observe each individual's behaviour. At best, we can observe a
sample of individuals. Second, we may not observe the behaviour of single individuals at all, but only the behaviour of groups. In economics textbooks, the latter is often explicitly or implicitly recognized. We may in the real world observe only phenomena such as aggregate demand curves, but not the preferences of every individual from which the observed demand curve arises. A frequently used theoretical construct is to assume that the aggregate behaviour is generated by a fictitious representative individual, whose utility embodies the aggregate preferences. The problem of determining the assumptions under which this simplification is allowed is called the representative individual problem, or the integrability problem.
Consider Figure 24.1. Following the terminology of Mas-Colell et al. (1995), a positive representative individual's indirect utility function v_R(p, Y) should generate the observed market demand x_j(p, Y) through Roy's identity,11 see Figure 24.1(b). Given any demand system, Slutsky symmetry is a sufficient and necessary condition for the demand system to be derived from an individual preference ordering. There will exist no representative individual if Slutsky symmetry is not fulfilled.
The representative individual approach has been used in the product differentiation literature. Anderson et al. (1987, 1988, 1992) show that there exists a representative individual in a linear random utility framework with constant marginal utility of income, and that the indirect utility of the representative individual is a proper welfare measure. They also show that the common CES utility function of a representative individual can be derived from a log-linear random utility model. However, since the marginal utility of money is then no longer constant, it can be shown that a change that increases the utility of the representative individual is not necessarily a potential Pareto improvement.
In a discrete choice random utility framework, Slutsky symmetry will be fulfilled if the marginal utility of income is constant. In this case, the logsum formula is a proper indirect utility for the representative individual and can be used for welfare evaluation, Figure 24.1(c).
[Figure 24.1 The representative individual problem. The figure links four elements by arrows (a)–(e): market demand x_j(p, Y); the indirect utility of the representative individual v_R(p, Y); the individual indirect utilities {v_h(p, y_h)}; and welfare analysis W({v_h(p, y_h)}).]
With nonlinear random utility models, in which the marginal utility of money is not constant, one needs to take the direct approach and calculate the Hicksian welfare measures through the generalized logsum formula (24.28), which may be represented by Figure 24.1(d).
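For reference, Roy's identity, which the positive representative individual's indirect utility must satisfy in the step of Figure 24.1(b) above, can be stated as follows (a standard result; the notation follows the chapter's v_R(p, Y)):

x_j(p, Y) = -\frac{\partial v_R(p, Y) / \partial p_j}{\partial v_R(p, Y) / \partial Y}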
4 STRUCTURAL AND REDUCED FORM MODELLING, AND THE EXPERIMENTALIST APPROACH
Choice modelling is in many cases an exercise in structural modelling, in the sense that the approach is to formulate a behavioural model consistent with utility maximization, which is estimated from observed data. This approach is particularly useful for appraisal, since it is possible to derive theoretically sound welfare measures, as we have seen in the case of random utility econometrics in section 3. It should, however, be recognized that other approaches to welfare econometrics exist, as discussed in this section.
There are indeed two approaches to econometrics for appraisal. One strand assumes that there are random fluctuations of market demand around an assumed representative individual. That is, market demand is not deterministically given by a representative individual, but consists of perturbations around a representative individual (Varian, 1992; cf. Brown and Walker, 1989). Random utility econometrics, as one instance of the other strand, takes a more stringent approach and formulates a structural model in which the random fluctuations of market demand are embedded in a consistent random utility maximizing framework.
Any continuous demand can be generated by some utility-maximizing individuals with some income distribution (Debreu, 1974), so one may argue that it is a weak assumption to assume that market demand is generated by individuals with regular preferences. For appraisal, a model of market demand can be used for theoretically sound welfare analysis if market demand is sufficiently well behaved that Slutsky symmetry (the integrability condition) is satisfied. In this case, the Hicksian welfare measures can be found either by directly solving a system of partial differential equations (Hausman and Newey, 1995), or by using Slutsky compensated demand as an approximation (Irvine and Sims, 1998). In general, however, note that formulating a structural model of a utility-maximizing individual is dual to formulating demand functions only at the individual level, not at the aggregate level. Therefore, the structural approach of modelling utility-maximizing individuals is a useful way of arriving at theoretically sound Hicksian welfare measures, see arrow (d) in Figure 24.1.
Another, much more straightforward, approach is to use the Marshallian consumer surplus, which is also a proper welfare measure when there are no income effects (as defined by the Slutsky equation). This is an attractive approach in itself, and calculating the Marshallian consumer surplus will yield a good approximation of the more theoretically sound Hicksian welfare measures when we analyse small changes and/or in the absence of income effects (see also Willig, 1976). Following this approach, we should then focus our attention on market demand directly, without constructing models consistent with utility-maximizing behaviour to match the observed market demand. Thus, we can use arrow (e) in Figure 24.1.
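For completeness, the Slutsky symmetry (integrability) condition referred to here requires, for any pair of goods i and j, equality of the compensated cross-price effects (a standard statement; the Marshallian demands x_i(p, y) are our notation):

\frac{\partial x_i(p, y)}{\partial p_j} + x_j(p, y)\,\frac{\partial x_i(p, y)}{\partial y} = \frac{\partial x_j(p, y)}{\partial p_i} + x_i(p, y)\,\frac{\partial x_j(p, y)}{\partial y}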
If we settle for using Marshallian consumer surplus as a welfare measure, other tools become central. Note that market demand X(p) is given by the horizontal addition of individual demands, X(p) = \sum_h x_h(p), and (Marshallian) consumer surplus is simply the sum of the individual consumer surpluses:

CS \equiv \sum_h \int_{p^0}^{p^1} x_h(p)\, dp = \int_{p^0}^{p^1} \sum_h x_h(p)\, dp = \int_{p^0}^{p^1} X(p)\, dp \qquad (24.33)
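As a small illustration of (24.33) — the fitted demand function and the prices are assumptions made up for this sketch — the consumer surplus change can be computed by numerically integrating the market demand curve over the price change:

```python
import numpy as np
from scipy.integrate import quad

# Assumed fitted market demand curve (illustrative constant-elasticity form).
def X(p, a=1000.0, elasticity=-0.8):
    return a * p ** elasticity

p0, p1 = 2.0, 2.5  # price rises from 2.0 to 2.5
integral, _ = quad(X, p0, p1)
# By (24.33), the integral of demand over a price rise is the surplus lost,
# so the change in consumer surplus is its negative.
print("Consumer surplus change:", -integral)
```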
Thus, in a reduced-form modelling approach using Marshallian consumer surplus as a welfare measure, to evaluate a proposed project we should focus only on the best methods to forecast market demand. Tools for forecasting market demand thus become the central focus. Forecasting methodology is conceptually different from statistical inference, although they are related. Interestingly, the methodology of forecasting has evolved considerably in the last decade; in particular, forecast combination has been shown to be a promising tool for improving forecasting accuracy (Timmermann, 2006). This is a method in which different (simple and complex) models can be combined, resulting in improved accuracy (a minimal sketch follows below). These methods are becoming increasingly available also in the field of econometrics (see, for example, Clements and Harvey (2009)) and have been applied in recreational demand modelling (Song et al., 2009) and transportation (Garcia-Ferrer et al., 2006). Curiously, however, there appear to be no applications aimed at appraisal.
This rather atheoretic approach to appraisal, without formulating structural, utility-consistent models of individual choice behaviour, is not embraced by all modellers. Interestingly, and related to this distinction between structural modelling and reduced-form demand modelling, there exists a lively debate in econometrics between structural modellers and experimentalists. The experimentalists make strong arguments, criticizing the practice of structural modelling, which is the topic of most of this handbook, for making too many and too strong assumptions. In essence, structural modelling is said to involve too much economic theory, instead of letting the data speak for themselves (Keane, 2010). By exploiting natural experiments, experimentalists argue, one will be able to identify and quantify the effects of policies without having to make heroic theoretical assumptions. In response to this critique, Keane (2010) argues that we always need economic theory as a window through which we have to interpret data, and that the main difference between experimentalists and structural modellers is the degree to which assumptions are made explicit. The experimentalists' focus on searching for good data is equally useful for structural modellers, and Keane also recognizes that structural modellers should spend more effort on validating structural models (Keane and Wolpin, 2007).
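The promised sketch of forecast combination follows: component forecasts are weighted inversely to their historical mean squared errors, one standard scheme in the forecast combination literature (the data and the three component forecasts are made-up placeholders):

```python
import numpy as np

def combine_forecasts(forecasts, past_errors):
    """Combine model forecasts with weights inversely proportional to each
    model's historical mean squared error (one standard combination scheme).
    forecasts: (n_models,) forecasts for the target period.
    past_errors: (n_models, n_periods) historical forecast errors."""
    mse = np.mean(past_errors ** 2, axis=1)
    w = (1.0 / mse) / np.sum(1.0 / mse)   # normalized inverse-MSE weights
    return w @ forecasts, w

# Made-up example: three demand models forecasting next period's demand.
forecasts = np.array([1050.0, 980.0, 1120.0])
past_errors = np.array([[20.0, -15.0, 10.0],
                        [40.0, 35.0, -50.0],
                        [5.0, -8.0, 12.0]])
combined, weights = combine_forecasts(forecasts, past_errors)
print(weights, combined)
```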
5 WHAT IF 'AS IF' IS QUESTIONABLE?
As we have argued above, neoclassical microeconomics and welfare economics are founded on assumptions of regular and coherent preferences. This paradigm has been very successful and has dominated the field of economics for more than half a century. However, it has long been clear that the assumption of regular preferences and its associated rational behaviour is a strong one. As a response, the field of
behavioural economics has emerged and produced an extensive array of so-called choice anomalies. Choice anomalies are not rigorously defined, but in essence they are typically demonstrated and identified in choice experiments in which people seem not to be fully rational, as rationality would be defined in everyday language by reasonable people. A choice anomaly that has been consistently replicated in many situations and contexts is reference dependence, which captures the idea that we evaluate something in reference to something else. As we change the reference point, we also change the valuation. As a simple example, consider giving an individual a wage increase, and then removing the raise after a month or so. Is the individual just as well off as before the raise? Most people would say probably not. However, one can often come up with quite reasonable explanations why this is not so: perhaps the individual adjusted to the new wage level, perhaps (s)he got married as a result of the raise. After all, we cannot jump into the same river twice, and one can often find explanations to rationalize behaviour. But experiments have shown that such endowment effects are present even for small things and over a short time span, for instance giving someone a pencil before a lecture and then taking it away afterwards.
The field of behavioural economics has generated a list of such anomalies, and after years of evidence it seems reasonable to acknowledge that choice anomalies do exist. Few, if any, individuals behave fully rationally all the time. What remains to be answered is what we should do with this information. Provided that we recognize that we should not ignore the new knowledge, one can distinguish two different paths to take (Roe and Haab, 2007). The first path is to abandon the axiomatic neoclassical as-if assumption altogether and recognize that individuals experience utility, pain and well-being, and that it is this we should focus on. Utility as used in the traditional neoclassical approach was only a mathematical construct, but we should instead focus on real pleasure, utility and well-being. Thus shifting focus towards a normative theory of experienced utility is a rather radical view. Kahneman and Sugden (2005) write:
It must be said that the idea of using experienced utility as the standard of policy evaluation requires a major change in the foundations of normative economics, even if this is a return to an older tradition of economic thought. The change involved is so great that neither author is ready to advocate it unconditionally.
Another path is more incrementalist. On this path, we may recognize that there seems to be a domain where the traditional neoclassical approach is valid. We should learn to understand when we can use the traditional welfare economic toolbox, and when we cannot. The new behavioural welfare economic theory that needs to be developed should ideally have the traditional welfare economic theory as a limiting case. Since, as we have argued, traditional neoclassical economics is all about individuals' choices, it is reasonable to assume that choice will also remain a focus in the new theories that will be developed.12
One such candidate welfare theory is given by Bernheim and Rangel (2007, 2009) and Bernheim (2009). They start by defining ancillary conditions d with the property that, according to traditional neoclassical economics, they should be irrelevant for both choices and welfare analysis, for example a reference point or an anchoring effect. The theory then proceeds similarly to the standard theory, except that there now exists an ancillary condition d. For instance, an individual may have utilities U(x, z, d) = x + d\,v(z), where x
and z are goods to consume, and d \in [d_L, d_H] is an ancillary condition that may be interpreted as an anchor. In a traditional setting, d would be fixed and exogenous. Let p denote the price of good z. Consider a change from (p^0, d^0) to (p^1, d^1). Bernheim (2009) defines two notions of compensating variation:
● CV_A is defined such that all levels of compensation (after the change) greater than CV_A guarantee that everything selected in the new set is unambiguously chosen over everything selected from the initial set.
● CV_B is defined such that all compensation levels smaller than CV_B guarantee that everything selected in the initial set is unambiguously chosen over everything selected from the new set.
It can be shown that CV_A \le CV \le CV_B, where CV is the traditional compensating variation associated with ancillary condition d^0. If the ancillary condition does not change with the policy change, then the different notions of compensating variation all collapse to the traditional definition, such that CV_A = CV = CV_B. The traditional compensating variation can thus be seen as a limiting case, obtained when the ancillary condition plays no role. The extended welfare economic theory also encompasses the intuition that if the effects of ancillary conditions are small, they should have only small consequences for welfare analysis. However, the behavioural welfare economic theory of Bernheim and Rangel does not by itself explain when we should expect ancillary conditions d to be present and malleable. Furthermore, it is not clear what policy recommendations come out of the analysis when there is a large discrepancy between CV_A and CV_B.
The neoclassical approach has been shown to be useful and adequate in many situations. One possible direction is that a new theory encapsulates the neoclassical theory as a limiting case. To arrive at such a theory, there is a need to understand the domain of applicability of the neoclassical theory, and also to understand the nature of the proper limits in which this standard theory emerges. For instance, List (2004) shows that behavioural anomalies are less pronounced when individuals become more experienced with the choice context, suggesting that learning is one aspect of understanding the proper limits. Efforts to define traditional neoclassical preferences as a limiting case include, for example, Munro and Sugden (2003) and de Borger and Fosgerau (2008).
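Returning to the anchored utility introduced above, a small worked illustration may help (our own construction under an assumed functional form, not an example from Bernheim (2009)). Take U(x, z, d) = x + d\,v(z) with v(z) = \ln z and budget y = x + pz. Maximizing y - pz + d \ln z over z gives

z^*(p, d) = \frac{d}{p}, \qquad V(p, y, d) = y - d + d \ln\frac{d}{p}.

Holding the ancillary condition fixed at d^0, the compensating variation for a price change p^0 \to p^1 solves V(p^1, y + CV, d^0) = V(p^0, y, d^0), giving CV = d^0 \ln(p^1 / p^0). If instead the analyst knows only that d \in [d_L, d_H], the implied compensation for a price increase ranges from d_L \ln(p^1/p^0) to d_H \ln(p^1/p^0), illustrating how a malleable ancillary condition generates a range of compensation requirements in the spirit of the gap between CV_A and CV_B.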
6 CONCLUDING REMARKS
Traditional neoclassical welfare economics has a long and successful history. Applied welfare analysis is used for project evaluation in many countries and for various purposes, from assessing the Summer Olympic Games to water-treatment plants. The theory is well developed, but it should be recognized that it also becomes rather technical when the policy changes involve many aspects of the economy.13
Economics is likely to undergo a development towards better models and a better understanding of choices in situations where people are not fully rational. Such a development of theory and methods will also influence welfare economics. It is too early to tell where the ongoing and future work will take us over the next few years. It will in any case
be important to be able to understand and predict when the traditional theory can be applied, and what to do when it cannot. A paradigm shift does not necessarily sweep away the old theory completely. In fact, this is the story of classical mechanics. When quantum mechanics was discovered, it was also evident that classical mechanics still applied in some contexts, but it was shown experimentally not to be valid in certain circumstances (small distances, high velocities). The elegant correspondence principle of Niels Bohr reconciled the apparently conflicting theories, and it was shown that the old theory was just a limiting case of the new theory. As we have seen, there already exist a few approaches in which neoclassical economics is a limiting case of the new theory, but much remains to be done in understanding what the limits are.
Behavioural welfare economics is facing perhaps even stronger challenges than positive behavioural economics. At the heart of the debate is the view of consumer sovereignty and the presence of malleable preferences. It is to be expected that the debate will take place not only within the field of economics, but also in moral and political philosophy. After all, as was stated at the beginning of this chapter, this is only to be expected for issues that touch the heart of human societies.
NOTES
1. This is sometimes termed weak complementarity, see Mäler (1974).
2. The term pure discrete choice framework is due to Hanemann (1984b).
3. In retrospect, it was indeed unfortunate that the groundbreaking paper Hanemann (1985) was not published at the time. The random utility framework in Small and Rosen (1981) seems not to be identical to the standard random utility framework used here and as outlined in, for example, Hanemann (1984a, 1985) or McFadden (1999).
4. These models are also known as generalized extreme value (GEV) models, but this term is used for something else in mathematical statistics. The multivariate distribution is a multivariate extreme value distribution, and therefore the term MEV seems to be adopted more frequently in the literature.
5. Technically, it will be a log of a sum only in the case of independently and identically Gumbel distributed random utility terms, but the formula applies also for nested logit within the class of MEV.
6. This set of references is chosen with some care, and cited in McFadden (1999, 2001), but there are certainly other references that may be considered. The references cited here refer to the logsum formula in the random utility setting. It is clear that the logsum formula was known in the early 1970s to be the Marshallian consumer surplus in other contexts, foremost in the gravity model framework, cf. Cochrane (1975) and Williams (1977).
7. This was proposed by Hanemann (1989), who also argues that other quartiles may be given consideration as well.
8. Standard assumptions include that any finite change can be compensated by a finite amount of money, that where income is required to be positive there is zero probability of ties, etc.; see Dagsvik and Karlström (2005).
9. See Lemma 1 in Dagsvik and Karlström (2005).
10. Another common practice in labour economics is to calculate the required compensating variation that maintains the same level of the expected value function. Since, in general, marginal utility of money is not assumed to be constant, it is not clear that these are proper welfare measures as defined in this chapter, see French and Jones (2011).
11. Market demand depends on prices p and individual incomes y_h. Under certain conditions, market demand will depend only on prices and aggregate wealth Y = \sum_h y_h, that is, x_j(p, Y) = \sum_h x_{jh}(p, y_h). In particular, this holds true for any income distribution if the indirect utility function is of the so-called Gorman form, see Varian (1992) and Mas-Colell et al. (1995).
12. See Bernheim (2009) for a review of conflicting theories.
13. For a proficient use of CBA to analyze well-stated questions in a complex context, see, for example, Parry and Small (2009).
REFERENCES
Albrecht, K., K.G. Volz, M. Sutter, D.I. Laibson and D.Y. von Cramon (2011), 'What is for me is not for you: brain correlates of intertemporal choice for self and other', Social Cognitive and Affective Neuroscience, 6 (2), 218–25.
Anderson, S.P., A. de Palma and J.F. Thisse (1987), 'The CES is a discrete choice model?', Economics Letters, 24 (2), 139–40.
Anderson, S.P., A. de Palma and J.F. Thisse (1988), 'A representative consumer theory of the logit model', International Economic Review, 29 (3), 461–6.
Anderson, S.P., A. de Palma and J.F. Thisse (1992), Discrete Choice Theory of Product Differentiation, Cambridge, MA: MIT Press.
Bates, J. (2006), 'Economic evaluation and transport modelling: theory and practice', in K. Axhausen (ed.), Moving Through Nets: The Physical and Social Dimensions of Travel, Oxford: Pergamon.
Ben-Akiva, M. (1972), 'The structure of travel demand models', PhD dissertation, Massachusetts Institute of Technology.
Bernheim, B.D. (2009), 'Behavioral welfare economics', Journal of the European Economic Association, 7 (2–3), 267–319.
Bernheim, B.D. and A. Rangel (2007), 'Toward choice-theoretic foundations for behavioral welfare economics', American Economic Review, 97 (2), 464–70.
Bernheim, B.D. and A. Rangel (2009), 'Beyond revealed preference: choice-theoretic foundations for behavioral welfare economics', Quarterly Journal of Economics, 124 (1), 51–104.
Bleichrodt, H. and M. Johannesson (2001), 'Time preference for health: a test of stationarity versus decreasing timing aversion', Journal of Mathematical Psychology, 45 (2), 265–82.
Boadway, R.W. and N. Bruce (1984), Welfare Economics, Oxford: Basil Blackwell.
Boardman, A.E., D.H. Greenberg, A.R. Vining and D.L. Weimer (2006), Cost–benefit Analysis – Concepts and Practice, Upper Saddle River, NJ: Pearson.
Bockstael, N.E. and K.E. McConnell (2007), 'Measuring welfare in discrete choice models', in N.E. Bockstael and K.E. McConnell, Environmental and Resource Valuation with Revealed Preferences, Dordrecht: Springer, pp. 101–50.
Border, K.C. (2012), Introductory Notes on Preference and Rational Choice, Pasadena, CA: California Institute of Technology.
Breffle, W.S. and E.R. Morey (2000), 'Investigating preference heterogeneity in a repeated discrete-choice recreation demand model of Atlantic salmon fishing', Marine Resource Economics, 15 (1), 1–20.
Bröcker, J. (2004), 'Computable general equilibrium analysis in transportation economics', in D.A. Hensher, K.J. Button, K. Haynes and P. Stopher (eds), Handbook of Transport Geography and Spatial Systems: Handbooks in Transport, vol. 5, Amsterdam: Elsevier, pp. 269–92.
Brown, B.W. and M.B. Walker (1989), 'The random utility hypothesis and inference in demand systems', Econometrica, 57 (4), 815–29.
Chattopadhyay, S. (2009), 'The random expenditure function approach to welfare in RUM: the case of hazardous waste clean-up', Resource and Energy Economics, 31 (1), 58–74.
Chichilnisky, G. (1996), 'An axiomatic approach to sustainable development', Social Choice and Welfare, 13 (2), 231–57.
Chipman, J.S. and J.C. Moore (1980), 'Compensating variation, consumer's surplus, and welfare', American Economic Review, 70 (5), 933–49.
Chipman, J.S. and J.C. Moore (1990), 'Acceptable indicators of welfare change', in J. Chipman, D. McFadden and K. Richter (eds), Preferences, Uncertainty, and Optimality, Boulder, CO: Westview Press.
Clements, M.P. and D.I. Harvey (2009), 'Forecast combination and encompassing', in T.C. Mills and K. Patterson (eds), Handbook of Econometrics, vol. 2, Basingstoke: Palgrave Macmillan, pp. 3–67.
Cochrane, R.A. (1975), 'A possible economic basis for the gravity model', Journal of Transport Economics and Policy, 9 (1), 34–49.
Cullis, J. and P. Jones (1992), Public Finance and Public Choice: Analytical Perspectives, Maidenhead: McGraw-Hill.
Dagsvik, J.K. and A. Karlström (2005), 'Compensated variation in random utility models that are non-linear in income', Review of Economic Studies, 72 (1), 57–76.
Dagsvik, J.K., M. Locatelli and S. Strom (2009), 'Tax reform, sector-specific labor supply and welfare effects', Scandinavian Journal of Economics, 111 (2), 299–321.
De Borger, B. and M. Fosgerau (2008), 'The trade-off between money and travel time: a test of the theory of reference-dependent preferences', Journal of Urban Economics, 64 (1), 101–15.
De Palma, A. and K. Kilani (2011), 'Transition choice probabilities and welfare analysis in additive random utility models', Economic Theory, 46 (3), 427–54.
Debreu, G. (1974), 'Excess demand functions', Journal of Mathematical Economics, 1 (1), 15–22.
Delle Site, P. and M.V. Salucci (2012), 'The impact of the before-after error term correlation on welfare measurement in logit', CREI Working Paper No. 4/2012, Rome.
Domencich, T.A. and D. McFadden (1975), Urban Travel Demand: A Behavioral Analysis, Amsterdam: North-Holland.
Figuières, C. and M. Tidball (2012), 'Sustainable exploitation of a natural resource: a satisfying use of Chichilnisky's criterion', Economic Theory, 49 (2), 243–65.
French, E. and J.B. Jones (2011), 'The effects of health insurance and self-insurance on retirement behavior', Econometrica, 79 (3), 693–732.
Garcia-Ferrer, A., A. De Juan and P. Poncela (2006), 'Forecasting traffic accidents using disaggregated data', International Journal of Forecasting, 22 (2), 203–22.
Geurs, K., B. Zondag, G. de Jong and M. de Bok (2010), 'Accessibility appraisal of land-use/transport policy strategies: more than just adding up travel-time savings', Transportation Research Part D, 15 (7), 382–93.
Groom, B., C. Hepburn, P. Koundouri and D.W. Pearce (2005), 'Declining discount rates: the long and the short of it', Environmental and Resource Economics, 33 (4), 445–93.
Hanemann, W.M. (1984a), 'Welfare evaluations in contingent valuation experiment with discrete response', American Journal of Agricultural Economics, 66 (3), 332–41.
Hanemann, W.M. (1984b), 'Discrete/continuous models of consumer demand', Econometrica, 52 (3), 541–62.
Hanemann, W.M. (1985), 'Welfare analysis with discrete choice models', CUDARE Working Paper Series, reprinted in 1999 in J.A. Herriges and C.L. Kling (eds), Valuing Recreation and the Environment, Cheltenham, UK and Northampton, MA, USA: Edward Elgar.
Hanemann, W.M. (1989), 'Welfare evaluation in contingent valuation experiments with discrete response data: reply', American Journal of Agricultural Economics, 71 (4), 1057–61.
Hanemann, W.M. (1991), 'Willingness to pay and willingness to accept: how much can they differ?', American Economic Review, 81 (3), 635–47.
Hanemann, W.M. and B. Kanninen (2001), 'The statistical analysis of discrete-response CV data', in I.J. Bateman and K.G. Willis (eds), Valuing Environmental Preferences: Theory and Practice in Contingent Valuation Methods in the US, EC, and Developing Countries, Oxford: Oxford University Press, pp. 302–441.
Hausman, J. and W. Newey (1995), 'Nonparametric estimation of exact consumer surplus and deadweight loss', Econometrica, 63 (6), 1445–76.
Herriges, J.A. and C.L. Kling (1999), 'Nonlinear income effects in random utility models', Review of Economics and Statistics, 81 (1), 62–72.
Hicks, J.R. and R.G.D. Allen (1934), 'A reconsideration of the theory of value. Part I', Economica, new series, 1 (1), 52–76.
Hicks, J. (1939), 'The foundations of welfare economics', The Economic Journal, 49 (196), 696–712.
Horowitz, J.K. and K.E. McConnell (2003), 'Willingness to accept, willingness to pay and the income effect', Journal of Economic Behavior and Organization, 51 (4), 537–45.
Irvine, I.J. and W.A. Sims (1998), 'Measuring consumer surplus with unknown Hicksian demands', American Economic Review, 88 (1), 314–22.
Jara-Diaz, S. (2007), Transport Economic Theory, Bingley: Emerald Group.
Jara-Diaz, S.R. and J.I. Videla (1990a), 'On the role of income in the evaluation of users' benefits from mode choice models', in B. Gerardin (ed.), Travel Behaviour Research, London: Gower.
Jara-Diaz, S.R. and J.I. Videla (1990b), 'Welfare implications of the omission of income effect in mode choice models', Journal of Transport Economics and Policy, 24 (1), 83–93.
Johansson, B. and L.-G. Mattsson (1995), 'Principles of road pricing', in B. Johansson and L.-G. Mattsson (eds), Road Pricing: Theory, Empirical Assessment and Policy, Dordrecht: Kluwer.
Kahneman, D. and R. Sugden (2005), 'Experienced utility as a standard of policy evaluation', Environmental and Resource Economics, 32 (1), 161–81.
Kaldor, N. (1939), 'Welfare propositions in economics and interpersonal comparisons of utility', The Economic Journal, 49 (195), 549–52.
Karlström, A. (1999), 'Four essays on spatial modelling and welfare analysis', PhD thesis, KTH Royal Institute of Technology, Stockholm.
Karlström, A. (2001), 'Welfare evaluations in non-linear random utility models with income effects', in D.A. Hensher (ed.), Transportation Research: The Leading Edge, Oxford: Elsevier Science, pp. 361–74.
Keane, M.P. (2010), 'Structural vs. atheoretic approaches to econometrics', Journal of Econometrics, 156 (1), 3–20.
Keane, M.P. and K.I. Wolpin (2007), 'Exploring the usefulness of a nonrandom holdout sample for model validation: welfare effects on female behavior', International Economic Review, 48 (4), 1351–78.
Kreiner, C.T. and N. Verdelin (2011), 'Optimal provision of public goods: a synthesis', Scandinavian Journal of Economics, 114 (2), 384–408.
Li, C.Z. and K.-G. Löfgren (2000), 'Renewable resources and economic sustainability: a dynamic analysis with heterogeneous time preferences', Journal of Environmental Economics and Management, 40 (3), 236–50.
List, J. (2004), 'Neoclassical theory versus prospect theory: evidence from the marketplace', Econometrica, 72 (2), 615–25.
Mackie, P.J., S. Jara-Diaz and A.S. Fowkes (2001), 'The value of travel time savings in evaluation', Transportation Research Part E, 37 (2), 91–106.
Mäler, K. (1974), Environmental Economics, Baltimore, MD: Johns Hopkins University Press.
Mas-Colell, A., M.D. Whinston and J.R. Green (1995), Microeconomic Theory, Oxford: Oxford University Press.
McFadden, D. (1973), 'Conditional logit analysis of qualitative choice behavior', in P. Zarembka (ed.), Frontiers in Econometrics, New York: Academic Press.
McFadden, D. (1978), 'Modeling the choice of residential location', in A. Karlqvist, L. Lundqvist, F. Snickars and J.W. Weibull (eds), Spatial Interaction Theory and Planning Models, Amsterdam: North-Holland, pp. 75–96.
McFadden, D. (1998), 'Measuring willingness-to-pay for transportation improvements', in T. Garling, T. Laitila and K. Westin (eds), Theoretical Foundations of Travel Choice Modelling, Oxford: Elsevier, pp. 239–64.
McFadden, D. (2001), 'Economic choices', American Economic Review, 91 (3), 351–78.
McFadden, D. (2006), 'Free markets and fettered consumers', American Economic Review, 96 (1), 3–29.
McFadden, D.L. (1999), 'Computing willingness-to-pay in random utility models', in J. Moore, R. Riezman and J. Melvin (eds), Trade, Theory, and Econometrics: Essays in Honour of John S. Chipman, London: Routledge, pp. 253–74.
Minken, H. and H. Samstad (2003), 'Appraisal in integrated land use and transport planning with sustainability objectives', TI report 686/2003, Institute of Transport Economics, Oslo.
Morey, E. and K.G. Rossmann (2008), 'Calculating, with income effects, the compensating variation for a state change', Environmental and Resource Economics, 39 (2), 83–90.
Munro, A. and R. Sugden (2003), 'On the theory of reference-dependent preferences', Journal of Economic Behavior and Organization, 50 (4), 407–28.
Neuberger, H. (1971), 'User benefit in the evaluation of transport and land use plans', Journal of Transportation Economics and Policy, 5 (1), 52–75.
Niemeier, D.A. (1997), 'Accessibility: an evaluation using consumer welfare', Transportation, 24 (4), 377–96.
Oppenheim, N. (1995), 'The integrability problem', Regional Science and Urban Economics, 25 (1), 85–108.
Parry, I.W. and K. Small (2009), 'Should urban transit subsidies be reduced?', American Economic Review, 99 (3), 700–724.
Pearce, D.W. and C.A. Nash (1981), The Social Appraisal of Projects, London: Macmillan.
Revelt, D. and K. Train (1998), 'Mixed logit with repeated choices: households' choices of appliance efficiency level', Review of Economics and Statistics, 80 (4), 647–57.
Roe, B. and T. Haab (2007), 'Using biomedical technologies to inform economic modeling', Resources for the Future report FF DP 07-26.
Samuelson, P.A. (1938a), 'A note on the pure theory of consumer's behavior: an addendum', Economica, new series, 5 (19), 353–4.
Samuelson, P.A. (1938b), 'A note on the pure theory of consumer's behavior', Economica, new series, 5 (17), 61–71.
Scitovsky, T. (1941), 'A note on welfare proposition in economics', Review of Economic Studies, 9 (1), 77–88.
Scitovsky, T. (1954), 'Two concepts of external economies', Journal of Political Economy, 62 (2), 143–51.
Small, K.A. (1999), 'Project evaluation', in J.A. Gomez-Ibanez, W. Tye and C. Winston (eds), Transportation Policy and Economics: A Handbook in Honor of John R. Meyer, Washington, DC: The Brookings Institution.
Small, K.A. and H.S. Rosen (1981), 'Applied welfare economics with discrete choice models', Econometrica, 49 (1), 105–30.
Small, K.A. and S. Steimetz (2012), 'Spatial hedonics and the willingness to pay for residential amenities', Journal of Regional Science, 52 (4), 635–47.
Song, H., S.F. Witt, K.F. Wong and D.C. Wu (2009), 'An empirical study of forecast combination in tourism', Journal of Hospitality and Tourism Research, 33 (1), 3–29.
Srour, I.M., K.M. Kockelman and T. Dunn (2002), 'Accessibility indices: a connection to residential land prices and location choices', Transportation Research Record, 1805 (1), 25–34.
Sugden, R. (1999), 'Developing a consistent cost–benefit framework for multi-modal transport appraisal', Report to UK Department for Transport, University of East Anglia, Norwich.
Sugden, R. (2003), 'Conceptual foundations of cost benefit analysis – a minimalist account', in A. Pearman, P. Mackie and J. Nellthorp (eds), Transport Projects, Programmes and Policies, Aldershot: Ashgate, pp. 151–69.
Timmermann, A. (2006), 'Forecast combinations', in G. Elliott, C.W.J. Granger and A. Timmermann (eds), Handbook of Economic Forecasting, vol. 1, Amsterdam: North-Holland, pp. 135–96.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Train, K.E. (1998), 'Recreation demand models with taste difference over people', Land Economics, 74 (2), 230–39.
Van Osselaer, S.M., J.W. Alba and P. Manchanda (2004), 'Irrelevant information and mediated intertemporal choice', Journal of Consumer Psychology, 14 (3), 257–70.
Varian, H. (1992), Microeconomic Analysis, 3rd edn, New York: W.W. Norton.
Vartia, Y. (1983), 'Efficient methods of measuring welfare change and compensated income in terms of ordinary demand functions', Econometrica, 51 (1), 79–98.
Verboven, F. (1996), 'The nested logit model and representative consumer theory', Economics Letters, 50 (1), 57–63.
Vickerman, R. (2007), 'Cost benefit analysis and large-scale infrastructure projects: state of the art and challenges', Environment and Planning B: Planning and Design, 34 (4), 598–610.
Williams, H.C.W.L. (1977), 'On the formation of travel demand models and economic evaluation measures of user benefits', Environment and Planning A, 9 (3), 285–344.
Willig, R. (1976), 'Consumer's surplus without apology', American Economic Review, 66 (4), 589–97.
Wu, D., Y. Yin, S. Lawphongpanich and H. Yang (2012), 'Design of more equitable congestion pricing and tradable credit schemes for multimodal transportation networks', Transportation Research Part B, 46 (9), 1273–87.
Zhao, Y., K.M. Kockelman and A. Karlström (2012), 'Welfare calculations in discrete choice settings: an exploratory analysis of error term correlation with finite populations', Transport Policy, 19 (1), 76–84.
25 Forecasting choice
Andrew Daly
1 INTRODUCTION
Forecasting choice behaviour has always been a strong motivation for choice modelling (Gapper and Rolfe, 1968; Daly et al., 1973; McFadden, 1978; Cattin and Wittink, 1982). But while reports for government and commercial organisations on the likely effects of policy or marketing initiatives are numerous, the focus of methodological work in choice modelling has largely been on the development of models rather than on their use in forecasting, as can be seen in the chapters of this book. In this context, this chapter attempts to set out the major issues in choice forecasting methodology.
The chapter is limited in specific ways. Much of the reporting of forecasts is contained in client reports and other 'grey' literature, or in conference papers. Other important information is available only informally. While referencing where possible, I have felt it better to give as complete a coverage of the area as possible rather than restricting coverage to fully referenced points. Moreover, my experience, on which I am drawing quite heavily, is largely in the transport sector and there is therefore a preponderance of examples from that sector; it seems that long-term forecasting is more common in transport than in other sectors. Finally, there are quite a few separate aspects to forecasting with choice models and the chapter is therefore somewhat general in its treatment. Throughout, the focus is on forecasting aggregate choices, particularly demand for products or services, rather than on predicting the behaviour of individual consumers.
The next section of the chapter sets out the logical basis on which we might claim that forecasts of future behaviour have some credibility. The third section discusses the main methodological tools that can be used: aggregate and disaggregate methods, the use of observations of behaviour distinct from those used in developing the models, possibilities for modelling continuous as well as discrete choices, and the issues that arise in using mixed models for forecasting. The fourth section explains the background to temporal transfer, first considering the basis on which we can claim that models might be applicable in future circumstances and second looking at the ways in which model inputs can be forecast. Section 5 looks at the methods that can be used for forecasting population, in the detail that is needed to apply choice model forecasts. The sixth section deals with the important specific issue of forecasting in the presence of new alternatives and a final section briefly considers the problems arising from using forecasts that are unreliable, to a greater or lesser extent.
2 LOGICAL BASIS FOR FORECASTING
In order to make a forecast it is necessary to have some belief that a choice model we have developed has grasped at least some of the essence of the behaviour that is
modelled. Without this belief, there is no rational basis for thinking that the forecasts have any value. Specifically, without a belief in causality, correlations between 'dependent' and 'independent' variables observed in data may be due to any kind of effect, endogenous or exogenous.
In more detail, along the lines of the discussion by Keane and Wolpin (2007), one may distinguish an absolute perspective, that there exists a 'true' model describing behaviour, from a pragmatic perspective, that any model must be an approximation. Taking the former view, any significant discrepancy between our model and reality leads to rejection of our model, but the latter view would reject our model only when a better model is known, and possibly different models may be better for different applications. The context of modelling human behaviour and the experience of practical modelling suggest that if we assembled sufficient data we would reject any feasible model, so that we find ourselves working with pragmatic approximations. Nevertheless, we must believe that our model captures elements of the true influences on behaviour to think that the forecast has validity. At several points in the subsequent discussion the need to take a pragmatic view affects the assessment of which models should be used and how to use them.
For example, discriminant analysis was used in early work by Quarmby (1967) attempting to explain commuters' mode choice. Discriminant analysis, however, was developed to distinguish separate populations using their easily observed characteristics, for example, distinguishing species of plant by measuring the length of a leaf. One would not claim that a plant would change its species if the leaf were cut, but Quarmby's modelling implied that some commuters would change their travel mode if the fares were cut. Discriminant analysis is in principle not appropriate for modelling choice behaviour, as it begins by assuming that there is an underlying discrete variable (population membership) that conditions the continuous variables. Therefore, in applying discriminant analysis there would be no reason to suppose that behaviour would change in response to changes in the continuous variables.
The case of discriminant analysis can be seen as an extreme form of endogeneity, since the error and the continuous variables are strictly conditioned by the discrete variable and not vice versa. More difficult forms of endogeneity arise in many competitive market situations, in particular where price may be highly correlated with quality. In these cases, sophisticated correction techniques can be employed to reduce or eliminate bias, for example in residential choice (Guevara and Ben-Akiva, 2006).
In order to obtain credible forecasts, therefore, we must be clear that the 'independent' variables used in the model do actually influence behaviour. In this context, it is more important that each variable can be believed a priori to influence behaviour than that a statistically significant coefficient can be estimated, although the latter is useful in obtaining an accurate forecast. But we need also to be clear that significance has not been obtained by endogenous determination of the 'independent' and 'dependent' variables. To deal with issues of this kind, Keane and Wolpin (2007) recommend testing models against holdout samples that differ from the estimation sample in the dimensions that relate to the policy to be tested.
Keane and Wolpin cite McFadden's (1978) work on Bay Area Rapid Transit (BART) as an example, in which pre-BART data was used as an estimation sample and post-BART data as a validation sample; our own work on the Great Belt road/rail link in Denmark (Møller et al., 1999) is a similar type of study, investigating the success of forecasts in predicting actual outcomes, also on a major transport investment, and in this case indicating greater success in some aspects of the forecasting than in others, pointing the way to potential model improvements. 'Back-casting' studies are also used occasionally in transport work, applying the model to 'predict' a known past situation with which the model results can be compared. These studies also have echoes in the temporal transferability work discussed in section 4 of this chapter.

A different form of testing is to investigate the sensitivity of the model to changes in specific input variables. For example, what price elasticity does the model imply? In some cases we have information on a reasonable range of elasticities and these also form a test of validity. However, while there is no doubt that model tests of this nature are valuable in revealing deficiencies in the model and pointing towards improvements, it is not possible that such tests can be exhaustive: passing these tests is necessary rather than sufficient to assure model validity. In many cases it will not be possible to collect data on consumers who face the precise policy that is to be tested and in many cases several variants of the policy must be investigated and it is the specific differences between the policy variants that are crucial. Further, in many cases it is required that a model should be able to address a range of widely differing issues, some of them not known at the time of model development. An important example of this type of problem is the issue of reducing future road congestion. Different policies may be proposed for this and appraised with the help of a forecasting model, but the main difference between the base and the future, whatever the policy that is considered, is usually a large growth in traffic, which would not be present in any testing of the model against base-year conditions.

A wider range of tests of the response characteristics of the model can be validated by calculating elasticities, that is, the proportional change in demand indicated by the model brought about by a proportional change in a variable. In a number of important cases this model performance can be compared with published literature. For example, in the UK, the Department for Transport requires tests of this type to be made for models supporting publicly funded transport projects (Department for Transport, 2011). The limitation of elasticity testing is that published values are not available for many variables.

For major projects, such as major infrastructure, the size of the investment is such as to make careful validation essential. Nevertheless, experience has shown that very large forecasting errors occur (Flyvbjerg et al., 2005) in the forecasts on which such investments are made. These authors then recommend validation against broadly similar projects to determine whether the order of magnitude of the forecasts is plausible; such validations are being made, for example, in the proposed UK HS2 high-speed rail project. Figure 25.1, used as part of the discussion of HS2 (Segal, 2009), shows how the air-rail market share varies for city pairs as a function of rail journey time, believed to be the key variable influencing travellers' choices. While this curve shows that such a connection exists, there is clearly considerable variation; for example, the UK response appears to be steeper than in other countries, perhaps because the rail system in the UK is already relatively fast and expensive.
In interpreting this graph, therefore, judgement is required and this undermines the concept of external validation of forecasts as an objective measure of model quality. Finally, however, there is no escape from the fact that we have to believe that the model represents the true causality of choice behaviour in a reasonably accurate way. In this context, it is interesting that Keane and Wolpin (2007) recommend that reference is made to a behavioural theory in which we have faith (in their case economics) to support the acceptance of econometric models. Daly (1982) argues similarly that belief in a behavioural theory is essential, also noting that different theories can give rise to models, indistinguishable on base-year data, which nevertheless yield quite different forecasts, so that data cannot be the sole guide to model specification.

[Figure 25.1 Validating market shares. Rail market share (0–100%) plotted against rail journey time (0–8 hours) for UK city-pairs and other city-pairs, together with the expected share (SNCF). Source: Segal (2009).]
3 FORECASTING METHODS

3.1 Aggregate and Disaggregate Approaches
Most choice models predict choice probabilities. In using these probabilities for forecasting we have to deal with the stochastic nature of the model and there are two approaches that are commonly used for this. The first approach is to simulate, that is, to make random draws from the multinomial distribution indicated by the predicted probabilities and assign choice on the basis of those draws. If the population is large enough, or if the process is repeated several times and averaged, the ‘noise’ introduced by the random sampling can be reduced to a negligible, or at least tolerable, level. The second approach is to work with the expected value of the number of people choosing each alternative, that is, the expected demand for that alternative:
$E(Q_j) = \sum_k w_k \, p_j(x_k)$    (25.1)
where $Q_j$ is the number of people choosing alternative $j$; $w_k$ is the number of people of type $k$; $p_j(x)$ is the probability of choosing $j$, given explanatory variables $x$; and $x_k$ are the explanatory variables experienced by people of type $k$. It is clear that this expected demand and the expectation of the demand derived from the simulation forecasting approach are equal. In either case, around this expected value, we may distinguish four types of error.

1. Noise is generated when a finite population is modelled to make discrete stochastic choices; this is the variation that is introduced by repeated runs of the sampling procedure. In expected-demand calculations, this noise does not arise as the expected value is not a random variable. That is, the result of the expected-demand calculation is the mean to which repeated simulations would converge. It is important to note that the variation induced by the sampling procedure does not describe day-to-day variation. Describing that variation would require a different model, for example, the model would need to include a representation of day-to-day correlation.
2. Error is introduced because both $w_k$ and $p_j$ in (25.1) are estimated by models that contain error, in particular error that arises because the model parameters are usually estimated from finite data samples. When maximum likelihood methods are used for parameter estimation, a 'delta' method can be used to predict the consequent error in the model forecasts; under weak conditions, this forecast error has optimal properties (Daly et al., 2012). However, for large-scale models, such as often arise in travel demand forecasting, the calculation required for the delta method can be excessive and a calculation based on simulating parameter error may be necessary, although this is also time-consuming (de Jong et al., 2007). Choice between the approaches for estimating error in practice depends on the nature of the model and quite often the analytical delta method is best.
3. Error introduced by error in $x_k$ is also considered by de Jong et al. (2007), who again use a simulation method to calculate the impact of errors in forecast inputs, based on past variation of the relevant data items. In a typical transport forecasting context, they find, not surprisingly, that the impact of these data errors substantially exceeds the impact of model parameter errors. Moreover, the assumption that past fluctuation of $x$ around a steady trend will give a realistic guide to future error is dubious and broader confidence limits need to be considered. Again an analytical method could be used in simpler models for estimating this error. The assumption that this error is independent from the error in the model itself will usually be reasonable.
4. Finally, forecasting error is also caused by model specification error. An important component of such errors would be omitted variables, but it may also be the case that the form of the model is inappropriate, for example, essential non-linearities or correlations are omitted, or that the model is simply not well designed to describe the behaviour being considered. It seems that any quantitative assessment of these errors is impossible and that we must proceed with the possibility that the forecasts contain unknown additional errors.
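The two forecasting approaches can be contrasted directly. The sketch below (Python; the weights, utilities and number of runs are all invented) computes the expected demand of equation (25.1) for a three-alternative logit and compares it with repeated multinomial simulation, making the simulation noise of error type 1 visible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three alternatives, four person types k with weights w_k and attributes x_k.
w = np.array([1000.0, 2500.0, 1500.0, 3000.0])        # people of each type
V = rng.normal(size=(4, 3))                           # systematic utilities V_jk
P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)  # logit probabilities p_j(x_k)

# Expected demand, equation (25.1): E(Q_j) = sum_k w_k * p_j(x_k)
expected_Q = w @ P

def simulate_once():
    """Assign each of the w_k individuals a choice by multinomial draws."""
    counts = np.array([rng.multinomial(int(w[k]), P[k]) for k in range(len(w))])
    return counts.sum(axis=0)

sims = np.array([simulate_once() for _ in range(200)])
print("expected demand        :", np.round(expected_Q, 1))
print("mean of 200 simulations:", np.round(sims.mean(axis=0), 1))
print("std across simulations :", np.round(sims.std(axis=0), 1))  # the 'noise'
```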
Of these errors, the last three apply equally to the expected-demand and simulation approaches, so that when choosing between the approaches, only the first type of error, simulation noise, is relevant. However, other features of the simulation approach have led researchers to adopt it in a number of practical transportation studies. The other main differences between the approaches are the following:

● The output produced by the simulation approach identifies unique choices for each respondent, making it resemble a data file that might be collected in a survey of actual behaviour. This simple form facilitates further processing and analysis of the results. For example, a specific forecast can easily be given of the behaviour of specific population groups and these groups can be defined flexibly; while such analyses can also be made using the expected-demand approach, it is more difficult to vary the specification of the groups, as specific accumulations need to be made while executing the demand model.
● When working with a complicated model structure, as is common in travel demand forecasting (particularly when activities are being modelled as well as travel), the simplification offered by identifying specific choices for each individual is particularly helpful.
● The computer run-time can be quite different between the two approaches, but it is not clear which method will be quicker in any specific context. The problem here is that implementation, that is, the programming and testing, of large models to run using either method is very time-consuming, so that detailed comparisons are difficult to make. The only attempt known to us that has been made to make such a comparison (Matsson and Algers, 2006) was not definitive. In essence, the comparison is between the sample sizes processed in the simulation approach and the segmentations used in the expected-value approach.
In a given practical study, researchers will make a choice of forecasting approach based on the circumstances of the specific work they are undertaking. Until further, more definitive research has been done to compare the advantages of the forecasting approaches, it is not possible to make more detailed recommendations, but it is likely that each approach will be suitable in specific circumstances.

Forecasts are often characterised as being aggregate or disaggregate. However, referring to the forecasting equation (25.1) it is clear that $k$ may apply either to an aggregate population of size $w_k$ or to an individual to whom an expansion factor is applied. From a mathematical point of view, the forecast is the integral of a function over space and this can be carried out either by sampling (that is, in a Monte Carlo process) or by working with averages for an aggregated group (Daly, 1982). These two approaches to integration typify the simulation and expected-demand approaches to forecasting. When $k$ refers to a specific individual, the forecasting technique is called sample enumeration (Ben-Akiva and Atherton, 1977; but see also Daly and Zachary, 1977). Typically, this procedure is simple to apply to the sample that was used to estimate the model, though other samples are sometimes used.

It is usually good practice to calculate the elasticity values implied by a model, essentially to validate the model (as explained in section 2), but also to allow simple projections of changing demand to be made. In this context it is important to note that elasticity is an aggregate concept. Ben-Akiva and Lerman (1985) give explicit procedures for moving between disaggregate and aggregate (that is, correct) elasticities, from which it is clear that applying the model to an average individual will give results that are different from the application to the population. Daly (2008) gives a simple quantification of the magnitude of the difference for multinomial logit models, showing that the true elasticity is less than the 'elasticity' for an average individual, by a factor that depends on the variance of the choice probabilities in the population to which the model is applied. To obtain correct values of elasticity, the variation of the population must be considered and equation (25.1) must be applied.

Given a model to be used in forecasting, a simple calculation of elasticity can be made by changing one of the $x$ values in equation (25.1) and comparing the result with what is obtained for the base values. The most reliable assessment can be obtained by making very small changes in $x$, when the same result will be obtained for increases as for decreases. However, for practicality it is necessary to make a finite change in $x$; issues of non-linearity then become relevant, though in most cases they should remain minimal.
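As an illustration of the aggregate nature of elasticity, the following sketch (Python, with an invented binary logit and an invented cost distribution) computes an arc elasticity by perturbing $x$ in equation (25.1) over the whole population, and contrasts it with the 'elasticity' of an average individual.

```python
import numpy as np

rng = np.random.default_rng(2)

beta_cost = -0.05
cost = rng.lognormal(mean=2.0, sigma=0.6, size=10000)  # heterogeneous costs

def share(c):
    """Binary logit probability of choosing the costed alternative."""
    return 1.0 / (1.0 + np.exp(-(0.5 + beta_cost * c)))

def arc_elasticity(c, delta=0.01):
    """Proportional demand change over proportional cost change."""
    q0, q1 = share(c).mean(), share(c * (1 + delta)).mean()
    return (q1 - q0) / q0 / delta

# Correct aggregate elasticity: the model applied to every individual.
e_agg = arc_elasticity(cost)

# 'Average individual' elasticity: the model applied to the mean cost only.
e_avg = arc_elasticity(np.array([cost.mean()]))

print(f"aggregate elasticity    : {e_agg:.3f}")
print(f"average-individual value: {e_avg:.3f}")  # typically larger in magnitude (cf. Daly, 2008)
```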
3.2 Using Observed Behaviour

To improve the accuracy of forecasting, it is useful to ask how we can best exploit what we know about base year behaviour. Often, information will be available additional to the data used to estimate the model, most frequently in an aggregate form.

It is a well-known fact that a multinomial logit model estimated by maximum likelihood methods and with a full set of alternative-specific constants, used to make forecasts with base values of $x$, will exactly reproduce the base shares for the alternatives, as given in the estimation data. This follows from the first-order conditions of optimality of the likelihood. For more complicated models, however, this is not the case and minor discrepancies can be expected. If no further information is available, arguments can be made both for adjusting the model to be consistent with the estimation data and for leaving the maximum likelihood estimates unamended, with these minor discrepancies, but in many cases aggregate information is also available and the model will be expected to reproduce the shares given in the aggregate data. Adjustments to the constants are therefore often made to match aggregate market shares.

A consideration in making forecasts, particularly in the shorter term or when forecasting demand for a new alternative, is whether it is useful to forecast switching behaviour, that is, to base forecasts on the observed choice in the base situation and the probabilities of changing from that behaviour. It is clear that in some cases the knowledge of which choice was made previously can help in explaining future behaviour, because of the correlation of unobserved tastes, but this information is really useful only in short-term forecasting situations, where the concept of a currently-chosen alternative makes sense. For longer-term forecasting, or where there are also significant changes to several alternatives, it is more reasonable to omit the information on the current choice from the forecasting model.

Another way in which observed behaviour can be exploited is by making a 'pivot point' forecast (Manheim, 1979). Pivoting implies that the model is used only to predict changes from the current situation.
When base data is available that is of higher accuracy than the model, it is clear that this procedure can reduce the overall forecasting error.
Changes predicted by a model can be applied to base data either as ratios or as differences and it can be shown (Daly et al., 2011) that it is more effective to use ratio pivoting when the model error is proportional to the demand, while difference pivoting gives lower error when the model error is independent of demand. The analyst has considerable flexibility in determining the level of aggregation at which pivoting is carried out, so that specific applications can be tailored to the data that is available. However, pivoting cannot be applied to make forecasts for new alternatives. It appears that the flexible use of data concerning behaviour in the base situation can be further developed to benefit the accuracy of forecasts and further research can be expected in this direction.
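A minimal sketch of the two pivoting rules (Python; all demand figures invented): ratio pivoting scales the observed base by the modelled growth factor, while difference pivoting adds the modelled increment.

```python
import numpy as np

# Observed base-year demand (assumed more accurate than the model)...
base_observed = np.array([120.0, 45.0, 80.0])
# ...and the model's own base and future predictions for the same segments.
model_base = np.array([110.0, 50.0, 85.0])
model_future = np.array([130.0, 55.0, 70.0])

# Ratio pivot: appropriate when model error is roughly proportional to demand.
forecast_ratio = base_observed * model_future / model_base

# Difference pivot: appropriate when model error is independent of demand.
forecast_diff = base_observed + (model_future - model_base)

print("ratio pivot     :", np.round(forecast_ratio, 1))
print("difference pivot:", np.round(forecast_diff, 1))
```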
3.3 Forecasting Discrete-continuous Choice
In some cases, forecasting a discrete choice is not sufficient and it is also required to forecast a continuous demand. Such discrete-continuous cases arise in forecasting energy consumption by specific appliances (Dubin and McFadden, 1984), consumption of fuel by cars (Train, 1986; de Jong, 1991, 1997), marketing (Song and Chintagunta, 2007) and numerous other applications, many unpublished, in consumer demand forecasting. In these contexts, equation (25.1) has to be extended to include the quantitative choice component:

$E(Q_j) = \sum_k w_k \, p_j(x_k) \, q_{k|j}(x_k)$    (25.2)
where $q_{k|j}(x_k)$ is the quantity of goods demanded by person type $k$, given that goods type $j$ is chosen. The issue in setting up these models is to formulate sub-models for $p$ and $q$ and, most particularly, the conditionality of $q$ on $p$. These formulations also determine the way in which the forecasting model (25.2) relates to the estimation of the parameters of $p$ and $q$. The simplest models of this type allow each individual to choose only a single alternative $j$. Commonly-applied models in this case include the following:

● the Tobit model (Tobin, 1958), in which $p$ and $q$ are modelled as dependent on a single variable $u$, with $q(u_k) = u_k$ if $u_k > 0$ and $q(u_k) = 0$ otherwise; thus implicitly $p(u_k) = \Pr\{u > 0\}$, where $u_k = \beta x_k + \epsilon_k$, the $\epsilon_k$ are standard normal and $\beta$ are parameters to be estimated;
● the Heckman model (Heckman, 1979), which generalises the Tobit model to use separate arguments $u$ for $p$ and $q$; $p$ is described by a probit model and $q$ by a linear regression which includes a correction term to ensure unbiased estimation;
● the Dubin–McFadden model (Dubin and McFadden, 1984), in which the model for $p$ is logit, rather than probit as in the Heckman model, facilitating the choice of $j$ from a larger set of alternatives (although just a single alternative, or none, is chosen).
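To make equation (25.2) concrete for the simplest of these formulations, the sketch below (Python, invented parameters) computes expected consumption under a Tobit model using the standard censored-normal mean, $E(q) = \Phi(v/\sigma)\,(v + \sigma \lambda(v/\sigma))$, with $\lambda$ the inverse Mills ratio; the person types and weights are illustrative only.

```python
import numpy as np
from scipy.stats import norm

beta = np.array([0.8, -0.3])   # invented parameters
sigma = 1.0

def tobit_expected_demand(X):
    """E(q) for a Tobit model: Phi(v/s) * (v + s * mills(v/s)), v = beta'x."""
    v = X @ beta
    z = v / sigma
    mills = norm.pdf(z) / norm.cdf(z)   # inverse Mills ratio
    return norm.cdf(z) * (v + sigma * mills)

# Forecast for three person types with weights w_k, equation (25.2) style.
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 4.0]])
w = np.array([500.0, 800.0, 300.0])
print("P(choose) per type:", np.round(norm.cdf(X @ beta / sigma), 3))
print("expected total consumption:", round(float(w @ tobit_expected_demand(X)), 1))
```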
Variants of and extensions to these models exist (for example, Bolduc et al., 2001). It is not the purpose of this chapter to discuss model estimation in these cases, but reference to the papers cited indicates that unbiased estimates can indeed be made by including the correction $E(u \mid u > 0)$ in the equation for $q$; in the Heckman model, this correction is the inverse Mills ratio, while in the Dubin–McFadden model an analogous term is used. The aim here is to discuss the issues arising in the use of these models for forecasting.

Because separate random terms appear in the discrete and continuous components of the Heckman and Dubin–McFadden models, there is a positive probability that $p > 0$ and $q < 0$, even though the correction term allows for correlation: the correction ensures that the mean of $q$ is always positive and the modelled correlation between $p$ and $q$ keeps the probability of negative $q$ small. This cannot arise in the Tobit model, where there is only one random term. That is, in the last two models a consumer can apparently choose an alternative but consume a negative quantity of it. This apparently incorrect possibility arises because of the use of simple linear models for $q$.

In forecasting with these models, the situation can therefore arise that negative consumption is predicted. Practical experience (RAND Europe, 2013) suggests this may occur in only a small number of cases. In these models, it is important to realise that, as well as arising in forecasting, the estimation model also attributes positive probability to such cases, even though they do not occur in the data. Suppressing negative forecasts therefore leads to inconsistency between estimation and forecasting and the analyst may well wish to retain consistency by retaining a small fraction of negative forecasts, despite their lack of reality. The more complete solution would be to replace the simple linear function for $q$ by a function that cannot take negative values, consistent with the data, that is, to abandon the simple formulations of Heckman and of Dubin–McFadden.

Forecasting with the Train (1986) or de Jong (1991) models does not present difficulties of this type, since the form of the models prevents negative consumption being forecast. It may be noted that these models both apply Roy's identity to obtain consistency between $p$ and $q$, applying microeconomic theory which was not considered in the earlier models, based purely on econometric considerations.

The discrete-continuous models described above deal with a single choice, but in a number of important practical cases it is useful to model choice among a range of goods, choosing several of these and, for each good chosen, a quantity of that good. The first application of this type seems to be by de Jong (1997), but more recently the concept has been extended and developed by Bhat and his colleagues using first-order (Kuhn-Tucker) conditions for the optimality of the individual's utility and, in particular, based on the multiple discrete-continuous extreme value (MDCEV) model. In Chapter 19 in this volume, Bhat and Pinjari give an extensive discussion of multiple discrete-continuous models, going beyond the MDCEV concept to still more general forms, though they do not refer to the recent work of Song and Chintagunta (2007) and Chintagunta and Nair (2011), in the marketing field, which extends Bhat's framework to include multiple brands, at the expense of simpler functional forms.

However, forecasting with these models presents some specific challenges. An important feature of the MDCEV model is the presence of budget constraints.
This feature should be viewed positively, because the absence of an explicit budget other than total income (for example, in the models based on Roy’s identity, such as de Jong, 1997) omits an important feature of multi-commodity purchasing. However, estimating the level of the budget presents a problem even for model estimation, since it is a fundamentally latent concept, while forecasting how budgets may change in the future is clearly a difficult problem, which does not appear to have been addressed in the literature.
The second issue in forecasting with these models is their complexity and the consequent need to adopt special procedures for forecasting. Depending on the detail of the parametrisation, different procedures are required. Bhat and Pinjari (Chapter 19 in this volume) refer to Pinjari and Bhat (2011) for details of these procedures.

3.4 Models Involving Discrete or Continuous Mixing
As is clear from the other chapters of this book, many modern choice models involve the use of random 'mixing' of the probabilities, that is, the overall choice probabilities are calculated as a random mixture of choices defined by simpler sub-models. Often, the simpler models are of the logit form, leading to mixed logit models. The mixing functions can be of discrete or continuous form and these have different implications for forecasting, as follows.

An important set of models of this type are discrete mixing or 'latent class' models (Hess, Chapter 14 in this volume), where the choice probability for alternative $j$ is calculated by

$p_j(x, z) = \sum_k r_k(z) \, p_j(x, b_k)$    (25.3)
where $r_k(z)$ gives the probability of membership of latent class $k$, given characteristics (typically socio-economic) $z$, and $p_j(x, b_k)$ gives the choice probability for alternative $j$, given alternative attributes $x$ and the model parameters $b_k$ associated with class $k$. It is important to note that, provided $r$ and $p$ are of closed form, for example, are given by models of the logit family, then the calculation (25.3) does not introduce new issues of random sampling. Often, $r$ is simply a constant for each class. Forecasting with latent class models can therefore often be done quite quickly and without loss of accuracy.

In contrast, models in which $p$ is mixed using a continuous distribution, including most mixed logit models, do present substantial calculation issues. Such models can be formulated as

$p_j(x, z) = \int p_j(x, z, b) \, f(b) \, db$    (25.4)
where $f$ is the frequency distribution of $b$ and is implicitly continuous, distinguishing it from the discrete mixing of (25.3). Because of the need to calculate the integral over the distribution $f$, models of this type typically require random sampling, that is, the use of Monte Carlo methods, to calculate the probabilities. However, much less attention appears to have been paid to the issues arising in the use of these methods for forecasting than for model estimation. The classical results of Monte Carlo calculation (Hammersley and Handscomb, 1964) apply, of course, but their application to the specific issues of forecasting with choice models does not seem to have been worked out. Specifically, the trade-off of run-time against sampling noise has not been investigated, along with the techniques for reducing noise, such as the use of quasi-random sampling.

It is clear that models with discrete mixing (25.3) impose little additional run-time compared with non-mixed models, providing that the functions $r$ and $p$ are of closed form. In contrast, models with continuous mixing (25.4) do impose a run-time penalty and this has prevented widespread use, though some applications have been made (for example, Börjesson and Kristoffersson, 2012). It can be expected that further developments will take place in this direction and that information about appropriate sample sizes and noise reduction techniques will become available.

An interesting possibility when using models of this type for forecasting is that the sampling used for the evaluation of probabilities can be integrated with the sampling from the probabilities as explained in section 3.1. Thus a random draw to obtain utilities for the calculation of probabilities can be extended to assign choices. Then although there is a loss of speed in each stage compared to non-random calculations, the fact that the stages are integrated means that in some circumstances the time can be more than recovered.
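The computational contrast between (25.3) and (25.4) can be sketched as follows (Python; the parameters, class shares and mixing distribution are all invented): the latent class probability is an exact weighted sum, while the mixed logit probability must be simulated, here with quasi-random Halton draws as one of the noise-reduction techniques mentioned above.

```python
import numpy as np
from scipy.stats import norm, qmc

x = np.array([1.0, 2.5, 0.5])   # one attribute for each of 3 alternatives

def logit(b):
    """Multinomial logit probabilities with coefficient b on attribute x."""
    v = b * x
    e = np.exp(v - v.max())
    return e / e.sum()

# Discrete mixing (25.3): closed-form weighted sum over latent classes.
class_prob = np.array([0.6, 0.4])     # r_k, here simple constants
class_beta = np.array([-1.0, 0.2])    # b_k per class
p_latent = sum(r * logit(b) for r, b in zip(class_prob, class_beta))

# Continuous mixing (25.4): simulation over b ~ N(-0.4, 0.5^2),
# using scrambled Halton draws to reduce simulation noise.
draws = norm.ppf(qmc.Halton(d=1, seed=0).random(500)).ravel()
b_draws = -0.4 + 0.5 * draws
p_mixed = np.mean([logit(b) for b in b_draws], axis=0)

print("latent class probabilities:", np.round(p_latent, 3))
print("mixed logit probabilities :", np.round(p_mixed, 3))
```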
4 TEMPORAL TRANSFER

To support public policy decisions, particularly with respect to investment in infrastructure, forecasts are often needed over quite long periods. It is reasonable to expect major infrastructure to continue to perform satisfactorily at least 30 years after the date at which analysis is conducted. Two specific issues arise in this context: first, to what extent the model can be considered to be stable and, second, how the inputs for forecasting, that is, $x$ in the equations above, can be forecast.

4.1 Stability of the Model

To select and appraise policy over such periods, it is necessary to have some confidence that the model will represent appropriately the demand impact of policies at a date long removed from the date at which the model was developed. This is the property of temporal transferability, and it is particularly questionable when, as often happens, the model is estimated on data that has little or no temporal variation.

The issue of spatial transferability was a concern of early travel demand modellers (for example, Koppelman and Wilmot, 1982). However, that interest was not maintained and spatial transferability has only recently begun to be considered again. Temporal transfer appears to require similar approaches to those used in spatial transfer but seems to have been studied even less. The recent review by Fox and Hess (2010) considers the limited number of studies that have been conducted in both types of transfer, finding that the majority are of early date (1970s) with very few recent works, despite the relevance of temporal transfer to the validity of forecasts.

Recalling the discussion in section 2, an absolutist approach to model transfer will reject all model transfers, given a sufficient volume of data, since it is not conceivable that behaviour will not change over time, at least to a small extent. Therefore temporal transfer is constrained to the pragmatic approach, that is, the question (following Koppelman and Wilmot, 1982) is whether the base model is useful in discriminating between scenarios for the future year, in other words, whether the transferred model is the best readily available model.

In general terms and taking the pragmatic approach, the literature reviewed by Fox and Hess (2010) indicates that temporal transfer of the key travel demand models appears moderately successful, at least for periods of up to five years. An important issue is the level of modelling detail that would optimise transfer. Omission of an important variable means that, if that variable changes in the forecasting period, the model will produce incorrect results (as well as possibly biasing the parameter estimates). But inclusion of a redundant variable, on the basis of an accidentally high correlation with base year behaviour, may also cause forecasts to be inaccurate.

The issue of temporal transfer in fields other than travel demand forecasting does not appear to have been addressed to the same extent. A study by Brouwer and Bateman (2005) finds a limited stability of preference, but this is not a forecasting study and the period (five years) remains quite short, though the authors claim it is longer than in other studies in their field. Further work in this and other areas is clearly needed before extensive claims can be made for temporal stability.

In random utility models, the variance assumed for the random error determines the model scale and in recent discrete choice modelling there has been a focus on incorporating variations in scale, whether conditioned by measured variables or simply random. The question of whether the scale might change in the future needs to be considered and, of course, adjustments can be made for any measured variables that influence the scale and that are forecast to change; otherwise the scale and any scale heterogeneity might be considered to remain unchanged over time so that the model can be used without adjustment on this point. In travel demand forecasting, the assumption that the scale is constant has been based on the use of time to define the scale and the argument that in some sense the marginal utility of time might remain constant over time; as set out in the following section, the marginal utility of money would be strongly influenced by changes in income. A study by Börjesson (2013) reviews the transport evidence and presents some empirical results that support this hypothesis, although generally it could be expected that increasing unexplained heterogeneity in the population might reduce the scale. Evidence on the stability or otherwise of model scales appears to be lacking in other fields and it would be useful if such information could be obtained.

4.2 Forecasting Model Inputs
In order to operate a forecasting model it is necessary to provide forecasts of the $x$ and $z$ variables that drive the model. Forecasting of the $z$ (socio-economic) variables, however, is generally made by a population model as described in the following section; that is, forecasts are not made of how $z$ will change in the future, but of how many people will have specific values of $z$. This approach implies either a segmentation of the population, so that numbers in each $z$-group are forecast, or a reweighting of a sample enumeration, so that each consumer in the sample becomes representative for a different number of consumers, each with the same $z$.

Income

Income is a key variable that requires special consideration. In some studies income is handled solely by segmentation and a forecast change in income is represented as shifts between the income groups. However, this approach risks confounding the primary model function of income as indicating an ability or willingness to pay for differently-priced alternatives with the additional functions that income may be playing in the model as a proxy for other variables: tastes, employment, education, status, social class, and so on.

Forecasting the future average personal or household income (or the gross domestic product – GDP) is clearly very difficult; the recent recession was not widely forecast, for example. Forecasting in this context is then best conducted by considering a range of potential developments. In section 7 the issues of forecasting in uncertain futures are discussed; uncertainty in overall income can be expected to be one of the most important contributors to the uncertainty affecting choice forecasts.

Forecasting income distribution is even more difficult. Governments may attempt to reduce income inequality, but the forces influencing income distributions do not seem to be understood and forecasting would be difficult. In most cases, therefore, an assumption of a constant income distribution is maintained and adjustments are made only to the overall level. In some cases, for example, when considering premium products or services, it would be justifiable to investigate a range of assumptions on income distribution.

Uncertainty in overall income is correlated with uncertainty in other variables, most importantly employment. This issue is discussed by Daly and Fox (2012), who propose the 'welfare factor' approach which separates income change into change caused by changes in employment or other segmenting variables and a remaining change, the welfare factor, which affects all incomes equally. The income change is then used to adjust the impact of cost variables in the model, that is, to adjust willingness to pay. Daly and Fox also point out that cross-sectional and longitudinal changes in willingness to pay can be expected to be different, presumably because of the role of income in the model as proxy for unmeasured and unknown variables. The longitudinal values, which have to be obtained from time-series analysis, should clearly be used for forecasting.

Other explanatory variables

In forecasting the $x$ variables, key issues are to maintain consistency between forecast and model estimation and to ensure that the forecasts are defensible. In practice this often means that an objective forecasting procedure, open to rigorous challenge, is set up to generate values for the $x$ variables. By applying this procedure to the base case it is usually possible to calibrate the model so that changes in $x$ indicated by the forecasting procedure can be applied in the model to obtain forecast changes in behaviour. Alternatively, the procedure may be applied in the base year to generate the $x$ values used for model estimation, so that consistency is guaranteed; this is most often done when revealed preference (RP) data is being used for estimation.

In modelling travel demand, for example, $x$ variables are usually forecast by setting up explicit networks for the travel modes. These networks can then be used in both base and forecast contexts to obtain consistent measures of $x$. Because the networks contain representations of actual infrastructure, they can also help in obtaining forecast flows on specific parts of the network; this process is usually called assignment.

An issue that is of great importance in travel demand forecasting and may be of importance for other application fields also is that of 'equilibration'.
Congestion on transport networks and demand for travel are determined mutually, so that it is natural to think of finding an equilibrium solution in which congestion and demand are consistent. Conditions for the existence and uniqueness of equilibrium solutions, when demand is forecast by choice models, are given (for example) by Cantarella (1997). However, solution procedures to find the equilibrium are not yet as efficient as would be desirable and further research on this issue is needed. More fundamentally, it is not clear that equilibrium is a good description of the real-world behaviour of travellers, but the equilibrium does give a clear definition of a demand forecast that allows 'fair' comparison under different policies.

In other fields also it should be expected that analogous equilibria would exist, under the assumption of market clearing, which is essentially the assumption required to achieve equilibrium in the transport case. One would expect to find existence and uniqueness theorems in quite general cases, although, as in the transportation case, algorithms for finding the equilibrium point may not be obvious. Moreover, again as in the transportation case, the assumption of market clearing is debatable, but may be the only way of developing a base for comparing scenarios.

Further, in some cases it may be necessary to take into account the responses of suppliers in the market. For example, forecasting a response to a price change by one supplier may need to consider the possibility of competitive response. It is obvious that this can become very complicated in markets with multiple suppliers.
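A stylised sketch of such equilibration (Python; the congestion and demand functions are invented, not Cantarella's formulation): demand depends on travel time through a binary logit, time rises with flow, and the method of successive averages searches for the fixed point.

```python
import numpy as np

CAPACITY, FREE_FLOW_TIME, POPULATION = 4000.0, 20.0, 6000.0

def travel_time(flow):
    """Invented BPR-style congestion function: time rises with flow."""
    return FREE_FLOW_TIME * (1.0 + 0.15 * (flow / CAPACITY) ** 4)

def demand(time):
    """Binary logit share of the population choosing to travel, given time."""
    v = 2.0 - 0.08 * time
    return POPULATION / (1.0 + np.exp(-v))

# Method of successive averages: average the auxiliary demand into the flow.
flow = POPULATION / 2.0
for it in range(1, 201):
    aux = demand(travel_time(flow))
    flow += (aux - flow) / it   # decreasing step size 1/it
print(f"equilibrium flow: {flow:.0f}, travel time: {travel_time(flow):.1f} min")
```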
5 POPULATION FORECASTING

The key forecasting equations (25.1) and (25.2) illustrate the equal importance of forecasting the future population alongside the choice probabilities. As mentioned in section 4, except for the income variable, this is usually done by segmentation, that is, predicting the numbers of people with each specific value of $z$, rather than predicting how $z$ will change.

The most widely used procedure for forecasting population segments, certainly in the transportation context, is iterative proportional fitting (IPF), described for example by Beckman et al. (1996). In US applications, the technique is often adapted to make use of the Public Use Microdata Samples (PUMS) that are made publicly available for US research. Essentially, the IPF procedure involves repeated factoring of a 'seed' matrix to match marginal totals for a series of dimensions. Several methods are available for defining the seed matrix, while several dimensions can be used for the factoring. There are proofs that the procedure converges to give exact matches to the marginal totals. The procedure can be repeated for a number of smaller areas to cover a study area. It seems that data of this type is less readily available in other countries and that this may restrict the forecasting applications that can be made.

An alternative idea is given by Daly (1998), who describes an approach developed over the previous decade with his colleagues. This is based on the recognition that input data may contain error, even in the base year, so that the appropriate approach is the minimisation of the deviation of forecasts from all the sources, rather than matching specific sources exactly at the expense of others. The procedure of quadratic minimisation (QUAD) is therefore applied, making adjustments to the weighting of the various deviations to find an appropriate balance.

Analysts have begun to explore alternatives to IPF more intensively. Zhu et al. (2013) are critical of the IPF approach and propose an alternative based on logistic regression, in which the population shares for specific segments are predicted using logit formulae. Similarly, Farooq et al. (2013) are also critical of IPF and offer a simulation-based approach that appears to perform well in two cases of differing data availability. These recent studies indicate dissatisfaction with the results obtained from IPF and serious attempts to formulate alternative procedures.

An important aspect to this work is to incorporate both the best available forecasts and the appropriate base-year information. For example, behaviour that is cohort-specific needs to be projected forward, incorporating the effect that the behaviour of (say) 60-year-olds in 20 years' time will not necessarily be the same as the behaviour of 60-year-olds now. Simulation of the development of the population, that is, births, household formation and separation, and deaths, can be undertaken but it is not clear that this is stable over an extended period.

In summary, it seems that more attention may now be being given to this area. It is desirable that this work should be extended and the relative strengths and weaknesses of the approaches should be understood, so that forecasts can be made using the right approach in each data context. It is quite possible that different approaches may be needed as a function of data quality and availability.
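A minimal sketch of the IPF procedure described above (Python; the seed matrix and marginal targets are invented): the seed cross-tabulation is alternately scaled to match forecast row and column totals until convergence.

```python
import numpy as np

# Seed matrix: base-year cross-tabulation of two segmentation dimensions,
# e.g. household size (rows) by car ownership (columns). Numbers invented.
seed = np.array([[30.0, 20.0, 5.0],
                 [25.0, 40.0, 15.0],
                 [10.0, 25.0, 30.0]])

row_targets = np.array([80.0, 100.0, 70.0])   # forecast marginal totals
col_targets = np.array([60.0, 110.0, 80.0])   # (must share the same grand total)

M = seed.copy()
for _ in range(100):
    M *= (row_targets / M.sum(axis=1))[:, None]   # match row margins
    M *= col_targets / M.sum(axis=0)              # match column margins
    if np.allclose(M.sum(axis=1), row_targets, rtol=1e-10):
        break

print(np.round(M, 2))
print("row sums:", np.round(M.sum(axis=1), 2),
      "col sums:", np.round(M.sum(axis=0), 2))
```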
6 NEW ALTERNATIVES

A frequent application of choice modelling to forecasting is to meet the need to forecast the demand for new choice alternatives. It is natural that this requirement should be met by the use of stated preference (SP), frequently stated choice (SC), methods. This approach explicitly adopts the pragmatic approach to the forecast model, that is, it would not be claimed that an SP model gave a full explanation of behaviour in the presence of new alternatives, simply that no better approach was available, given that RP data cannot be collected for forecasting the use of new alternatives.

However, the classical application of choice modelling to predict the use of a new alternative, McFadden's (1978) study of the BART system, was based on RP data. A model was estimated for the choice of mode for 771 commuters, with four alternatives: drive-alone, walk-bus, car-bus and car pool. This model was then applied in a context with two additional alternatives: bus-BART and car-BART, with remarkable success as shown in Table 25.1. Apparently, good fortune played a role here, as the calculated standard errors for these predictions were more than ten times the actual forecast errors, although the paper does not explain how those errors were estimated.

In applying the model, McFadden and his team did not take account of any new-alternative property of BART, but assumed that its unmeasured attributes would be the same as for the corresponding bus alternatives in the pre-BART situation; modern transport analysts would generally expect a rail alternative to offer considerably better comfort and reliability than a bus alternative, and this would usually be reflected in alternative-specific constants. Moreover, the model they used was a simple multinomial logit and the team themselves recognised that this would not represent fully the competitive situation in the market of a new public transport alternative competing with existing public transport and car alternatives. They also pointed to some specific data issues that would impede the accuracy of their forecast.
Table 25.1 BART corridor forecasts (%)

                  Auto alone  Bus/walk  Bus/auto  BART/bus  BART/auto  Carpool
Predicted share   55.8        12.5      2.4       1.00      5.3        22.9
Standard error    11.4        3.4       1.4       0.50      2.4        10.7
Observed share    59.9        10.8      1.4       0.95      5.2        21.7
Actual error      −4.1        +1.7      +1.0      +0.05     +0.1       +1.2

Source: McFadden (1978).
Developing beyond the BART study, it would seem natural to model the new-alternative aspects explicitly and to take account of the competitive position of the new alternative relative to existing alternatives. To take account of these issues inevitably requires the use of SP data. In applying SP data, such as SC, it is also necessary to take account of the various issues in this data, in particular hypothetical bias, which may mean that the response scale is distorted relative to revealed preferences (Ben-Akiva and Morikawa, 1990; Bradley and Daly, 1997). Moreover, cost-efficient sampling procedures to collect SP data are often related to the actual choices made, raising further potential biases.

An approach to address these issues is given by Daly and Rohr (1998). They set out a procedure involving two-stage estimation which allows for simultaneous estimation using all the disaggregate data sources, taking account of error variance differences, determining the context in which the new alternative will compete and dealing with sampling biases. However, this approach is not the last word in forecasting procedure, although more advanced methods are apparently not often used, and it may be expected that improved methods using aggregate and disaggregate data can be developed.

A deficiency in these models that does not seem to have been investigated sufficiently to date is that the SC responses are likely to be correlated with the RP choices made by the consumer (though see Morikawa, 1994). For example, the users of a high-price, high-quality new product are likely to be those who are already buying the better products available in the market. While methods exist (Lerman and Manski, 1977; Bierlaire et al., 2008) to estimate models where the sampling is based on the choice made in the specific responses being modelled, methods do not appear to exist for estimating unbiased models where the sampling is somewhat, but not entirely, correlated with the choices made. This feature may well bias models estimated on data of this type.
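The mechanics of the BART-style forecast can be sketched as follows (Python; the utility parameters and attributes are invented, not McFadden's estimates): a logit model estimated on existing modes is applied to an enlarged choice set, with the new alternatives' constants borrowed from the corresponding existing modes, precisely the assumption discussed above.

```python
import numpy as np

# Invented base-year model: V = asc + b_time * time + b_cost * cost.
b_time, b_cost = -0.04, -0.10
base = {  # alternative: (asc, time, cost)
    "drive_alone": (0.0, 40.0, 6.0),
    "walk_bus":    (-0.6, 55.0, 2.0),
    "car_bus":     (-0.9, 50.0, 3.0),
}

def shares(alts):
    """Multinomial logit shares for a dict of alternatives."""
    v = np.array([a + b_time * t + b_cost * c for a, t, c in alts.values()])
    e = np.exp(v - v.max())
    return dict(zip(alts, e / e.sum()))

print("base shares:", {k: round(p, 3) for k, p in shares(base).items()})

# Add the new alternatives, borrowing the bus constants for their unmeasured
# attributes (the BART-study assumption criticised in the text).
future = dict(base)
future["new_rail_bus"] = (-0.6, 35.0, 2.5)   # walk_bus asc, better times
future["new_rail_car"] = (-0.9, 30.0, 3.5)   # car_bus asc
print("with new modes:", {k: round(p, 3) for k, p in shares(future).items()})
```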
7 WORKING WITH UNCERTAIN FORECASTS

In section 3.1 we outlined the sources of error in forecasting arising from the model or from uncertainty about the future and how model error could be reduced by different forecasting approaches, for example, by more intensive use of base-year data as discussed in section 3.2. The quantification of the errors implied by improved procedures is not yet complete and this should be the subject of future work.
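One way of attaching confidence limits to a forecast is to propagate estimated parameter error, the simulation alternative to the delta method mentioned in section 3.1. The sketch below (Python; the estimates and covariance matrix are invented) draws parameters from their asymptotic distribution and reports the resulting forecast interval, which of course captures only the 'internal' component of the error discussed in this section.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented ML estimates and their (asymptotic) covariance matrix.
beta_hat = np.array([-0.05, 0.9])
cov_hat = np.array([[4e-4, -1e-4],
                    [-1e-4, 9e-3]])

x_future = np.array([[30.0, 1.0], [45.0, 1.0]])  # two alternatives' attributes
population = 10000.0                              # expansion to the population

def demand(beta):
    """Expected demand for alternative 1 under a binary logit."""
    v = x_future @ beta
    p = np.exp(v - v.max())
    return population * (p / p.sum())[0]

draws = rng.multivariate_normal(beta_hat, cov_hat, size=2000)
forecasts = np.array([demand(b) for b in draws])
lo, hi = np.percentile(forecasts, [2.5, 97.5])
print(f"central forecast: {demand(beta_hat):.0f}")
print(f"95% interval from parameter error alone: [{lo:.0f}, {hi:.0f}]")
```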
In many cases, those commissioning choice model forecasts and using their output are interested in having a single 'answer' from the work. It is often difficult to communicate an appropriate level of confidence, where clients can appreciate that forecasts come with error margins, but without undermining confidence in the modelling work. In principle forecasts should not be delivered without confidence limits being stated; and it is a matter of professional ethics that these confidence limits should be appropriately wide. Most model estimation procedures yield estimates of error along with the coefficient values. However, it also needs to be considered that 'internal' error of this type represents only a lower bound on the uncertainty associated with forecasts. Additionally, major uncertainty arises from issues such as specification error or general unsuitability of the model and there are striking examples of such error causing major failures of planning decisions. Clients also need to be made aware of these issues.

For example, Flyvbjerg et al. (2003) draw attention to the very large pitfalls that can occur when appraising large infrastructure projects. They find that the performance of such projects can often disappoint, also noting that many participants have vested interests, so that all aspects of the appraisal process can be biased, including demand forecasting. They propose a realignment of the burden of risk so that those taking decisions perceive more fully the potential downsides of their forecasts. While these issues are not unique to forecasting with choice models, they do need to be taken into account in our work also; choice models were applied in some of the forecasts cited by Flyvbjerg et al. (2003).

Practical methods for working with uncertainty in both the model and the exogenous description of the future have been developed by Lempert and his colleagues at the RAND Corporation (Lempert and Collins, 2007; Lempert et al., 2013) under the title of 'robust decision making'. The objective is to support decision makers dealing with 'deeply' uncertain futures, where the possibilities are unknown and/or not agreed by the participants, to develop robust decisions that will be acceptable for a range of possible futures. Applications have included energy strategy, water management and technology policy development, all areas where decisions made now under great uncertainty have long-term impacts. This type of decision support is very different from the single forecasts with cost–benefit analyses that are used in many infrastructure project appraisals at present.

The issues and methods arising because of the uncertainty of the future apply much more widely than just to forecasts made with choice models. What choice models offer is insight into an important part of the future and how people may respond to changed and changing circumstances. Choice models can also quantify part of the uncertainty but there are other uncertainties that cannot be quantified. Future work can be expected to improve both the quantification of estimable error and decision-making procedures for dealing with that and with unquantifiable uncertainty.

Finally, an approach that is well worth considering for important studies is to develop alternative models, perhaps using different research teams to do this. This approach can give a completely different view of the issues, although of course it is an expensive idea.
In conjunction with the intention to find robust decisions, however, it can be very productive.
ACKNOWLEDGEMENTS

I am grateful to Charlene Rohr, Jordan Louviere and William Greene for comments on a previous version of this chapter which have helped me to make significant improvements. I remain responsible for any errors or confusions that remain.
REFERENCES

Beckman, R.J., K.A. Baggerly and M.D. McKay (1996), 'Creating synthetic baseline populations', Transportation Research Part A, 30 (6), 415–29.
Ben-Akiva, M. and T. Atherton (1977), 'Methodology for short-range travel demand predictions', Journal of Transport Economics and Policy, 11 (3), 244–61.
Ben-Akiva, M. and T. Morikawa (1990), 'Estimation of switching models from revealed preferences and stated intentions', Transportation Research A, 24 (6), 485–95.
Ben-Akiva, M.E. and S.R. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.
Bierlaire, M., D. Bolduc and D. McFadden (2008), 'The estimation of generalized extreme value models from choice-based samples', Transportation Research Part B: Methodological, 42 (4), 381–94.
Bolduc, D., L. Khalaf and E. Moyneur (2001), 'Joint discrete/continuous models with possibly weak identification', paper presented at Choice Modelling Conference, Asilomar, June.
Börjesson, M. (2013), 'Inter-temporal variation in the travel time and travel cost parameters of transport models', Transportation, accessed 30 July 2013 at http://dx.doi.org/10.1007/s11116-013-9493-2.
Börjesson, M. and I. Kristoffersson (2012), 'Estimating welfare effects of congestion charges in real world settings', Centre for Transport Studies Working Paper 2012:13, Stockholm.
Bradley, M.A. and A.J. Daly (1997), 'Estimation of logit choice models using mixed stated preference and revealed preference information', in P. Stopher and M. Lee-Gosselin (eds), Understanding Travel Behaviour in an Era of Change, Oxford: Pergamon, pp. 209–31.
Brouwer, R. and I.J. Bateman (2005), 'Temporal stability and transferability of models of willingness to pay for flood control and wetland conservation', Water Resources Research, online journal, 41 (3).
Cantarella, G. (1997), 'A general fixed-point approach to multimode multi-user equilibrium assignment with elastic demand', Transportation Science, 31 (2), 107–28.
Cattin, P. and D.R. Wittink (1982), 'Commercial use of conjoint analysis: a survey', Journal of Marketing, 46 (Summer), 44–53.
Chintagunta, P. and H. Nair (2011), 'Discrete choice models of consumer demand in marketing', Marketing Science, 30 (6), 977–96.
Daly, A. (2008), 'Elasticity, model scale and error', paper presented to the European Transport Conference, Noordwijkerhout, Netherlands, October.
Daly, A. and J. Fox (2012), 'Forecasting mode and destination choice responses to income change', presented at the International Association for Travel Behaviour Research Conference, Toronto, July.
Daly, A. and C. Rohr (1998), 'Forecasting demand for new travel alternatives', in T. Gärling, T. Laitila and K. Westin (eds), Theoretical Foundation for Travel Choice Modelling, Oxford: Pergamon, pp. 451–71.
Daly, A. and S. Zachary (1977), 'The effect of free public transport on the journey to work', Transport and Road Research Laboratory Report SR388, Crowthorne.
Daly, A., J. Fox, B. Patruni and F. Milthorpe (2011), 'Pivoting in travel demand models', paper presented at the European Transport Conference and at Australasian Transport Research Forum, Adelaide, September.
Daly, A., S. Hess and G. de Jong (2012), 'Calculating errors for measures derived from choice modelling estimates', Transportation Research B, 46 (2), 333–41.
Daly, A.J. (1998), 'Prototypical sample enumeration as a basis for forecasting with disaggregate models', paper presented at PTRC/AET Conference, Loughborough, October.
Daly, A.J. (1982), 'Applicability of disaggregate behavioural modelling: a question of methodology', Transportation Research, 16A (5–6), 363–70.
Daly, A.J., G.W. Phillips, K.G. Rogers and P.J. Smith (1973), 'Planning urban bus routes: a study for Coventry City Council', Report C149, Local Government OR Unit, Reading.
Department for Transport (2011), 'Variable demand modelling – convergence realism and testing', TAG unit 3.10.4, section 1.6, accessed 18 September 2013 at www.dft.gov.uk/webtag/documents/expert/unit3.10.4.php#018.
Dubin, J.A. and D. McFadden (1984), 'An econometric analysis of residential electric appliance holdings and consumption', Econometrica, 52 (2), 345–62.
Farooq, B., M. Bierlaire, F. Hurtubia and G. Flötteröd (2013), 'Simulation based population analysis', Transportation Research B, accepted for publication.
Flyvbjerg, B., N. Bruzelius and W. Rothengatter (2003), Megaprojects and Risk: An Anatomy of Ambition, Cambridge: Cambridge University Press.
Flyvbjerg, B., M. Holm and S. Buhl (2005), 'How (in)accurate are demand forecasts in public works projects? The case of transportation', Journal of the American Planning Association, 71 (2), 131–46.
Fox, J. and S. Hess (2010), 'Review of evidence for temporal transferability of mode-destination models', Transportation Research Record, 2175, Transportation Research Board of the National Academies, Washington, DC.
Gapper, J. and C. Rolfe (1968), 'Modal split: factors determining the choice of transport for the journey to work', Report C32, Local Government OR Unit, Reading.
Guevara, C. and M. Ben-Akiva (2006), 'Endogeneity in residential location choice models', Transportation Research Record, 1977, 60–66.
Hammersley, J. and D. Handscomb (1964), Monte Carlo Methods, London: Chapman and Hall.
Heckman, J. (1979), 'Sample selection bias as a specification error', Econometrica, 47 (1), 153–61.
Jong, G. de, A. Daly, M. Pieters, S. Miller, R. Plasmeijer and F. Hofman (2007), 'Uncertainty in traffic forecasts: literature review and new results for the Netherlands', Transportation, 34 (4), 375–95.
Jong, G.C. de (1991), 'An indirect utility model of car ownership and car use', European Economic Review, 34 (5), 971–85.
Jong, G.C. de (1997), 'A micro-economic model of the joint decision on car ownership and car use', in P. Stopher and M. Lee-Gosselin (eds), Understanding Travel Behaviour in an Era of Change, Oxford: Pergamon.
Keane, M.P. and K.I. Wolpin (2007), 'Exploring the usefulness of a nonrandom holdout sample for model validation: welfare effects on female behaviour', International Economic Review, 48 (4), 1351–78.
Koppelman, F. and C. Wilmot (1982), 'Transferability analysis of disaggregate choice models', Transportation Research Record, 895, 18–24, National Research Council, Washington, DC.
Lempert, R. and M. Collins (2007), 'Managing the risk of uncertain threshold response: comparison of robust, optimum, and precautionary approaches', Risk Analysis, 27 (4), 1009–26.
Lempert, R., D. Groves and J. Fischbach (2013), 'Is it ethical to use a single probability density function?', RAND Working Paper, accessed 26 January 2014 at www.rand.org/pubs/working_papers/WR992.html.
Lerman, S. and C. Manski (1977), 'The estimation of choice probabilities from choice based samples', Econometrica, 45 (8), 1977–88.
Manheim, M.L. (1979), Fundamentals of Transportation System Analysis, Cambridge, MA: MIT Press.
Matsson, L.-G. and S. Algers (2006), 'Development of activity-based models for Stockholm', project with diverse publications, accessed 16 October 2013 at http://pocket.kth.se/index.php/kb_1/io_8708/io.html?add_to_infobox=1&&add_io=8708.
McFadden, D. (1978), 'The theory and practice of disaggregate demand forecasting for various modes of urban transportation', in Emerging Transportation Planning Methods, DOT-RSPA-DPB-50-78-2, US Department of Transportation, Washington, DC.
Møller, L., W. Wätjen, K.S. Pedersen and A.J. Daly (1999), 'Traffiken på Storebælt' ['Traffic across the Great Belt'], Dansk Vejtidsskrift, presented in English translation to International Road Federation Regional Conference, Lahti, Finland, 14–16 June.
Morikawa, T. (1994), 'Correcting state dependence and serial correlation in the RP/SP combined estimation method', Transportation, 21 (2), 153–65.
Pinjari, A.R. and C.R. Bhat (2011), 'An efficient forecasting procedure for Kuhn-Tucker consumer demand model systems: application to residential energy consumption analysis', working paper, University of South Florida.
Quarmby, D.A. (1967), 'Choice of travel mode for the journey to work', Journal of Transport Economics and Policy, 1 (3), 273–314.
RAND Europe (2013), 'Consumers' responsiveness to alcohol multi-buy sales promotions', report for HM Revenue and Customs, accessed 26 January 2014 at www.hmrc.gov.uk/research/report263.pdf.
Segal, J. (2009), 'A strategy for a high speed rail network in Britain – why do we need one?', paper presented at the European Transport Conference, 5 October; reprinted 2010 in The Transport Economist, 37 (1).
Song, I. and P.K. Chintagunta (2007), 'A discrete-continuous model for multicategory purchase behavior of households', Journal of Marketing Research, 44 (4), 595–612.
Tobin, J. (1958), 'Estimation of relationships for limited dependent variables', Econometrica, 26 (1), 24–36.
Train, K. (1986), Qualitative Choice Analysis: Theory, Econometrics and an Application to Automobile Demand, Cambridge, MA: MIT Press.
Zhu, X., S. Mishra, T. Welch, B. Pandey and C. Baber (2013), 'A framework for modeling and forecasting population age distribution in metropolitan areas at transportation analysis zone level', paper presented to Transportation Research Board meeting, Washington, DC, January.
26 Deciding how to decide: an agenda for multi-stage choice modeling research in marketing
Joffre Swait and Fred Feinberg
1 INTRODUCTION
The need to focus on consumer and organizational choice is at the very core of marketing practice and academia. Ultimately, both the tactical and strategic spheres of marketing revolve around the question of what product and service choices decision makers will exercise: product design, pricing, promotion and distribution decisions by the individual firm must necessarily be guided by market demand, itself the aggregation of myriad choices made in the midst of a specific market context/structure. As the discipline has matured, it is natural that our scope has broadened widely, perhaps to the detriment of maintaining a crisp focus on the issue of consumer (writ large) choice modeling. One of the consequences of this broader focus has been the somewhat less than critical adoption of the microeconomic consumer model¹ as an overarching framework for modeling choice behavior. The external affirmation granted by the ubiquitous use of the homo economicus construct in other disciplines (for example, transportation planning, geography, applied economics – health, environmental, labor, urban, transport, inter alia), the simplicity (even elegance) of its representation, and the ease of its implementation in measurement frameworks, have all contributed to legitimizing the widespread use of this account of decision-making. Here, we expound on a specific and, in our view, underexamined aspect of this model of consumer behavior: that it postulates an essentially single-stage view of consumer decision-making.

Taking a step back, however, the marketing discipline has contributed substantially to extant knowledge about consumer simplification of decision processes, particularly through the accumulated work of consumer behavior researchers. One of the overarching results from that research is that context influences the form of adaptations that consumers make to minimize decision effort, yet manage high-quality outcomes (see, among others, Simon, 1955; Wright, 1975; Payne, 1976; Dawes, 1979; Payne et al., 1988; Swait and Adamowicz, 2001). One interpretation of this body of knowledge is that consumers actively decide how to proceed to make a decision in a given context. That is, representations of choice behavior should not simply be about evaluation and choice; they should be extended to specify how decision makers prepare themselves and the context for the exercise of evaluation and choice.

We therefore argue that there is a pressing need to extend choice models to include different pre-conditions to evaluation and choice. Specifically, choice set formation, decision strategy selection and goal pursuit are examples of how consumers choose 'how to decide' in a given context. The majority of choice modeling work, both in methodological and substantive research, glosses over such pre-conditioning exercised by consumers. We suggest several directions for extensions to choice models that incorporate this multi-stage
conceptualization of choice processes. We believe that this is an especially critical issue at this juncture of technological time. Whereas researchers in prior decades could maintain that sets of available items, the totality of accessible attribute information and, especially, processing power were all limited, the vast efficiencies afforded by cheap computation and the Internet have rendered such assumptions somewhere between quaint and perverse. The ubiquity of computing power, concrete information, peer networks, expert recommendations, and the sheer volume of options and specialized tools to help consumers e-navigate require a re-evaluation of the assumptions underlying how common decisions – what to purchase, where to live, whom to date – are made by ordinary consumers. Specifically, the view that consumers fail to engage in substantial pre-conditioning seems increasingly untenable, so it seems to us that the time is especially ripe for decision researchers to reexamine the behavioral assumptions underlying their most familiar modeling frameworks.

A note before proceeding: we have been quite selective in the citation of literature since this is not intended to be a review paper. Rather, we cite literature that aids us in characterizing past research and motivating future research directions.
2 SETTING THE STAGE FOR INDIVIDUAL CHOICE
As was noted before, the overwhelming majority of the literature in marketing (as with other applied disciplines) represents choice as the outcome of full evaluations (utilities or values) {U_j(X_j | β, Z, ε_j), ∀ j ∈ M}, where M is the set of market options/objects, which may be person-specific and is a proper subset of the universal set of alternatives; each object j has attributes X_j, and β is a corresponding vector of tastes of the decision maker with characteristics Z.² We also include a possible stochastic source of utility (ε) to characterize idiosyncratic valuations that are generally unknown to an external observer (and hence the only valuation component that is not attributable to the decision maker). The outcome of the decision process is dictated by the maximization operator, which selects as the chosen alternative the object with largest utility. To further establish nomenclature, let the set M be called the 'choice set' because all alternatives are assumed to be available for selection by the focal decision maker; the 'decision rule' is utility-maximization, because it is the transform that translates evaluations to choice.

This representation assumes that the decision maker (1) is exhaustive (that is, evaluates all objects in M, no matter its size), (2) has access to all necessary information (that is, X and Z contain all relevant information needed), (3) uses full information (no matter the dimensionality of X and Z) to make evaluations, (4) has no substantial sources of uncertainty and (5) selects the best alternative. Underlying this set of assumptions, of course, is the antecedent (and behaviorally questionable) assumption that the decision maker has the cognitive wherewithal to process all these quantities into a unidimensional utility value for each option.

When the assumptions underpinning the homo economicus decision maker are enunciated in this way, it should make a choice modeler ponder their individual and collective validity as a depiction of actual decision-making processes. To employ a marketing example, a cursory examination of aisles in a supermarket will quickly turn up a list of product categories in which there are literally hundreds of product stock-keeping units (SKUs), challenging the idea that decision makers can be exhaustive – if they were to
examine each of the hundreds of breakfast cereal SKUs available in many markets every time they chose which to purchase, they would certainly be exhaustive, but they would just as certainly be exhausted! Further observation might identify individuals who quickly walk up to a shelf and, with no obvious time for deliberation or examination of the package(s), put a box of cereal into their cart and blithely proceed further down the aisle to another product category. This would put into question whether they made use of any available product or retailer information at all, and suggests that the consumer may have been acting simply out of habit: perhaps they placed in their shopping cart the same SKU they purchased on the previous shopping occasion. Research in marketing and in other fields has, for a long time and repeatedly, shown that each of these hyper-rational assumptions is indeed questionable.

We wish to suggest here that the marketing discipline is in a unique position to lead the way in the development of extensions to choice models by conceptualizing operational means of relaxing these strong assumptions. First, driven by the large variety of products and services that the discipline deals with, marketing is arguably one of the most fertile disciplinary grounds for enhancements to choice modeling. Secondly, the unique blending of marketing science and consumer behavior orientations within the discipline represents a great opportunity for cross-fertilization that is unlikely to occur in economics, psychology, transportation, geography, health and other fields that use choice modeling. Thirdly, as mentioned previously, owing to e-commerce, marketing datasets are growing dramatically in the number of alternatives, categories and consumers, and in the information available on each of their purchases.

We suggest below several specific types of extensions to existing choice model conceptualizations that build on a simple idea: when faced with complex decision problems, decision makers manage the choice process in such a way as to reduce their decision-making costs (for example, literal funds, cognitive capacity, time, emotional conflict or any other valued quantity). Essentially, consumers decompose decision problems into parts that are more easily solved than tackling the whole thing at one time – that is, they use a 'divide-and-conquer' problem-solving strategy. The partitioning of the overall problem is of itself effortful, and so may need to have its costs taken into account. In the coming sections we address several ways in which choice models might be extended by suggesting different decomposition strategies decision makers might adopt.
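Before turning to those extensions, it may help to pin down the baseline in runnable form. The following minimal Python sketch simulates one single-stage homo economicus choice: every alternative in M is evaluated exhaustively and the maximum-utility option is selected. All names and numbers are invented for illustration; this is not code from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

J, K = 5, 3                        # alternatives in M, attributes per alternative
X = rng.normal(size=(J, K))        # attribute matrix, one row per alternative
beta = np.array([0.8, -1.2, 0.5])  # illustrative taste vector

# Random-utility evaluation U_j = X_j beta + eps_j, with a Gumbel draw
# standing in for the idiosyncratic component.
U = X @ beta + rng.gumbel(size=J)

# The single-stage decision rule: exhaustively compare all J utilities
# and select the largest.
choice = int(np.argmax(U))
print("chosen alternative:", choice)
```

Every extension discussed below amends some part of this loop: which rows of X get evaluated, which rule replaces the argmax, or which β applies.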
2.1 Choice Set Formation³
In the archetypal homo economicus representation, we assume that the individual chooses the best alternative from M, the set of market options. There is much evidence in marketing (and elsewhere) that consumers may (and often do) restrict their evaluations to a particular subset C ⊆ M (for example, see the discussion in Hauser, 2010). This might occur because of consumer requirements or constraints (for example, a maximum price, a required quality attribute), supply constraints (for example, stockouts), decision complexity versus time availability, desire for variety, and so forth. Thus, one feasible problem-solving strategy for the consumer is to make a final decision in two stages (a short sketch follows the list):
1. Choice set formation: confine attention to a subset C ⊆ M.
2. Evaluation and choice: evaluate only the alternatives in C, then select the best alternative in C.
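A minimal sketch of this two-stage process, assuming a hypothetical hard price screen as the set-formation rule; the cut-off, attribute values and tastes are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
J, K = 8, 3
X = rng.uniform(0.0, 1.0, size=(J, K))
beta = np.array([1.0, 0.6, -1.5])  # illustrative tastes; column 2 plays the role of price

# Stage 1 -- choice set formation: keep only alternatives whose price
# (column 2) does not exceed a hypothetical cut-off.
price_cutoff = 0.7
C = np.flatnonzero(X[:, 2] <= price_cutoff)
if C.size == 0:
    C = np.arange(J)               # fall back to the full set M if the screen empties it

# Stage 2 -- evaluation and choice: evaluate only the survivors in C.
U = X[C] @ beta + rng.gumbel(size=C.size)
choice = int(C[np.argmax(U)])
print(f"screened set C = {C.tolist()}, chosen alternative = {choice}")
```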
Manski (1977) formalized this decomposition by assuming that C is unobserved, which results in the need to characterize the choice set formation process through a probability distribution over the 2^J − 1 possible unobserved choice sets. Early work in choice set formation (for example, Swait and Ben-Akiva, 1985, 1987a, 1987b) showed that choice set formation is empirically verifiable and that its omission, when the data generation process does include it, leads to greatly biased inferences and very erroneous policy recommendations. Nonetheless, the dimensionality of the latent choice set formation process makes its incorporation in choice models a daunting challenge and no doubt explains a generalized reluctance to address the issue.

That is not to say there has been no progress in trying to address the underlying complexity of choice set formation modeling. Swait and Ben-Akiva (1987b), for instance, suggested the outright elimination of choice sets from the choice set formation space as a means of coping with its dimensionality. Swait (2001) proposed a simple extension to the standard microeconomic model of the consumer that incorporates the use of explicit 'soft' cut-offs to mimic (or approximate) choice set formation; see also Elrod et al. (2004) for a related idea that identifies cut-offs without the need for explicit cut-off elicitation from consumers. In transportation, the constrained multinomial logit (MNL) model (Martinez et al., 2009) builds on the idea of penalizing alternative evaluations to describe availability restrictions, leading to a relatively simple nonlinear variant of the MNL model. In marketing, Gilbride and Allenby (2004) proposed a formal two-stage model that captures choice set formation by explicit representation of screening rules, rather than the enumeration of latent choice sets.

Despite progress in making the inclusion of choice set formation in choice models more achievable, it seems to us that fresh thinking is needed to address a difficult problem. Existing approaches to choice set formation have used three general strategies to tackle the issue: they have employed brute force to model all choice sets, reduced the problem to manageable size by eliminating some (if not most) choice sets, or pushed the computational complexity into Bayesian algorithm design, as we discuss later. The brute force approach is exact, but limits choice models to situations with 12 or so alternatives (for example, Swait and Ben-Akiva, 1987a; Andrews and Srinivasan, 1995; Swait and Erdem, 2007), even with current computational resources; and because computational power grows exponentially over time while the number of choice sets grows exponentially in J, this practical limitation is unlikely to witness dramatic relaxation. The elimination approach is inexact, as (potentially) viable choice sets may be ignored, but much larger problems can be addressed (potentially, with many dozens of alternatives). Early examples of this are Gaudry and Dagenais's (1979) 'dogit' model and the Swait and Ben-Akiva (1987b) 'logit captivity model', wherein choice sets take two forms – individual alternatives (singletons) and the entire group of alternatives, M – greatly reducing the number of such sets to J + 1, thereby extending the ordinary MNL model and skirting its independence from irrelevant alternatives (IIA) property (that is, the dogit model 'dodges IIA').
However well such models may fit observed data, it is difficult to argue that they mimic actual consumer choice processes, which clearly can encompass choice sets of intermediate size. We suggest that there may be two other general approaches that future research should explore.
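To make the dimensionality problem concrete, here is a brute-force sketch of Manski's formulation, with the distribution Q(C) over latent sets built from independent availability probabilities (one common simplification, in the spirit of the captivity-style models above); the utilities and availability probabilities are invented for illustration.

```python
import itertools
import numpy as np

def mnl(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def manski_prob(V, q):
    """P(i) = sum over all non-empty latent choice sets C containing i of
    Q(C) * P_MNL(i | C). Q(C) is parameterized here by independent
    availability probabilities q, renormalized to exclude the empty set.
    All 2^J - 1 subsets are enumerated, hence the practical cap on J."""
    J = len(V)
    P = np.zeros(J)
    p_empty = np.prod(1.0 - q)
    for r in range(1, J + 1):
        for C in itertools.combinations(range(J), r):
            mask = np.zeros(J, dtype=bool)
            mask[list(C)] = True
            Q = np.prod(np.where(mask, q, 1.0 - q)) / (1.0 - p_empty)
            P[mask] += Q * mnl(V[mask])
    return P

V = np.array([0.5, 0.1, -0.3, 0.2])   # deterministic utilities (illustrative)
q = np.array([0.9, 0.6, 0.7, 0.5])    # availability probabilities (illustrative)
P = manski_prob(V, q)
print(P, P.sum())                     # probabilities sum to 1 across M
```

Each added alternative doubles the number of subsets the inner loop must visit, which is exactly the exponential wall described above.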
Approximations and dimensional reduction
In the spirit of the constrained MNL model (Martinez et al., 2009), which circumvents explicit enumeration of the latent choice set space by approximating the inclusion/exclusion of alternatives via a penalty function, we suggest that another approximation approach may be to develop models that make certain choice sets explicit while aggregating (but not totally ignoring) most choice sets. Such an approach might lead to the capability of handling many more alternatives than currently possible, but as with all approximations, its ability to depict actual behavior 'sufficiently well' will need to be examined.

Information augmentation
Given that the complexity of choice set formation arises from the latent nature of the choice set space (that is, we observe a choice, but nothing about a choice set, except that the chosen alternative is in it), another potential research direction is to collect information on some or all other members of the choice set. While this seems like an obvious idea, it is challenging to implement in practice because this information could be confounded with preferences (that is, evaluations; see Horowitz and Louviere, 1995). However, even partial information on choice set membership would constitute a substantial reduction in estimation complexity: each additional alternative that we can place unambiguously in or out of the choice set reduces the size of the decision problem representation by half. Note that van Nierop et al. (2010) collected data on self-stated choice sets and, although they were not used for dimensionality reduction, they aligned very well – a 62 percent hit rate with eight possible brands – with latent choice sets inferred from choice data alone.

2.2 Decision Strategy Selection

Psychologists and consumer behavior researchers have criticized economists' penchant for attributing optimal decision-making capabilities to consumers, and these critics have provided credible evidence that all of the aforementioned assumptions (for example, exhaustive consideration of all market alternatives, full information availability and use) are sometimes violated in choice behaviors. Alternative choice process representations have been proposed, suggesting changes to different components, for example, elimination-by-aspects (Tversky, 1972) or satisficing (Simon, 1955) as decision rule alternatives to utility maximization, or the use of heuristics (Payne et al., 1988) to simplify decision making. However, both the critics and the objects of their scrutiny have tended to maintain the basic assumption of a single-stage decision process.

Building on the premise that consumers are likely to use multi-stage choice processes, we believe it is profitable to consider that decision makers first make a decision about how to make the decision (which we term 'decision strategy selection'), and then use this strategy to arrive at the final choice. The pre-decision processing may itself have multiple identifiable stages, such as information search, categorization, inclusion and elimination decisions, and so on. This extended representation of choice processes opens up the possibility of recasting choice as a result of deliberate sequential sub-processes: 'how to decide' followed by 'what to choose given decision strategy'. For example, Swait (2009), later extended by Adamowicz and Swait (2012), uses a framework wherein decision strategies are finite in number, latent in nature and
identifiable through their impacts on choice. Adamowicz and Swait (2013) integrate three decision strategies (pure habit/repetition, pure variety seeking and full information utility maximization) into a description of choices over time of catsup and yogurt, based on scanner panel data; each decision strategy requires some level of investment and delivers differential rewards. In this particular paper Adamowicz and Swait suggest, in the context of panel choices, that the representation of an initial decision strategy selection stage may be a superior explanation of the occurrence of habitual/repetitive behavior to the concept of state dependence. Other archetypal decision rules, such as simple price minimization, value maximization, pure brand orientation, and so on, could be used to extend the basic framework.

Alternatively, formal mixtures of competing representations of behavior may be a rich direction to explore in model formulations. For instance, in travel demand modeling, Chorus (2011) reviews certain advances in choice behavior representation that involve regret minimization (for example, Chorus, 2010) and prospect theory applied to riskless choices (for example, van de Kaa, 2010). These competing theories of choice and/or evaluation can be combined with the traditional utility maximization representation as basic archetypes to explain observed choice behavior. Rather than viewing decision makers as 'belonging' to a particular archetypal representation, one can instead envision them as mixtures of pure archetypes. Hess et al. (2011) explore this idea with four different transport datasets, each analysis combining different decision rules (for example, utility maximization and a lexicographic rule, utility maximization and elimination-by-aspects, utility maximization and regret minimization). Whereas Hess et al. (2011) use choice replications as the basis for identifying the support for each decision rule, the mixture of archetypal decision processes can be explained by factors such as the effort required, the expected reward from applying each, congruence of outcomes, and so on.

This latter approach is appealing not only from the point of view of process representation, but also from the perspective of predictive accuracy. Geweke and Amisano (2011) have demonstrated that model pooling can lead to improved forecasts compared to single-model forecasts. The proposed approach of mixing archetypal decision rules in this manner may prove to yield better forecasts than using either single rules for all consumers or classifying decision makers as exclusively following one or another among multiple rules.
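As a toy illustration of such a mixture of archetypes (invented numbers throughout), the sketch below pools a utility-maximizing MNL rule with a deterministic lexicographic rule, weighting them by a latent share; neither the weighting nor the rules are taken from any specific paper cited here.

```python
import numpy as np

def mnl_prob(X, beta):
    v = X @ beta
    e = np.exp(v - v.max())
    return e / e.sum()

def lexicographic_prob(X, priority):
    """Degenerate probabilities for a lexicographic rule: keep the
    alternatives best on the highest-priority attribute, break ties with
    the next attribute in `priority`, and split remaining ties evenly."""
    idx = np.arange(len(X))
    for k in priority:
        idx = idx[X[idx, k] == X[idx, k].max()]
        if idx.size == 1:
            break
    p = np.zeros(len(X))
    p[idx] = 1.0 / idx.size
    return p

X = np.array([[3.0, 1.0],
              [2.0, 4.0],
              [3.0, 2.0]])
beta = np.array([0.5, 0.8])
pi_um = 0.7   # latent share following utility maximization (illustrative)
P = pi_um * mnl_prob(X, beta) + (1 - pi_um) * lexicographic_prob(X, priority=[0, 1])
print(P, P.sum())
```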
2.3 Multiple Goal Pursuit
Current theories of choice behavior suggest no explicit role for basic motivations (or goals) in process representations: to be somewhat facetious, goals are ‘just in the air’. There is clear indication that consumer behavior is largely goal-directed, and moreover that goals are key motivational constructs underlying decision making (Bettman et al., 1998; Higgins, 2002). Evidence also exists to support the contention that decision makers pursue goals when making choices; in fact, they may simultaneously pursue multiple goals during the same choice instance (Swait and Argo 2012). This literature recognizes that goal management is an active process in which decision makers monitor goals and behaviors. When making a decision, consumers can make concurrent tradeoffs across multiple goals, be subject to or impose several constraints on goal pursuit, and even consider an assortment of alternatives across several product
categories that enable goal achievement (see Swait and Argo, 2012).⁴ Weber and Johnson (2009) suggest that goals play multiple roles in choice: goals (a) motivate the entire decision, (b) help decision makers manage complexity through better resource allocation, and (c) define and measure success. In their review, Weber and Johnson suggest that the ubiquitous influence of goals in decision-making arises because individuals have limited cognitive resources, and goals are useful tools to reduce cognitive load. Consumer behavior researchers have already been exploring the impact of multiple goal pursuit in sequential decision-making (for example, Dhar and Simonson, 1999; Fishbach and Dhar, 2005, 2008; Fishbach et al., 2006), but not within the context of single choices. Swait and Argo (2012) are an exception: they report survey evidence from three different decision contexts (choices involved in preparation for a job interview, restaurant menu item choice and candy bar choice) showing that multiple goal pursuit in single choices is the self-reported norm of behavior rather than the exception. In turn, this suggests that the single-goal (utility) homo economicus paradigm is limited in its representation and limiting in its ability to guide future research in choice.

That the choice model literature has not recognized the existence and potential effects of multiple goals leads to several potential impacts: (1) if multiple goals are being pursued in a given context, and if 'tastes' (preferences or marginal utilities) are goal-specific, then the possibility arises that any one individual is employing multiple 'tastes' when making a choice – implying that single-dimension taste heterogeneity as a decision maker characteristic is too limiting a definition; (2) multiple goals imply a potential for conflict, hence a need for goal hierarchies – implying that the argument that a single-dimensional compensatory evaluation metric like utility is generally sufficient to represent alternatives' attractiveness is fundamentally flawed; (3) when multiple goals are pursued, success in each may be achieved in different ways, for example, one goal may need to be maximized while another may be considered achieved if a threshold is crossed – implying that goal-specific decision operators may be needed.

In short, it may be fruitful to conceptualize taste heterogeneity as arising at least in part from the different goals people hold, and those goals are context-dependent. Thus, what may appear to the analyst as inherent differences in tastes across individuals, or even within individuals over time, may in actuality reflect different goals they are attempting to achieve. Recognition that the pursuit of two or more goals can occur concurrently represents a major shift in the way that one conceptualizes choice models. Research addressing the role of goals will deal with a major gap in the human decision-making and choice-modeling literatures.
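One way to see impact (1) in miniature: if tastes are goal-specific and the effective taste vector is a goal-weighted blend, then shifting goal weights across contexts masquerades as taste heterogeneity. The sketch below is purely illustrative; the goal names, weights and coefficients are invented, not drawn from Swait and Argo (2012).

```python
import numpy as np

# Two illustrative goals, each with its own "tastes" over the same three
# attributes; the effective taste vector is a goal-weighted blend, so
# different goal weights w look to the analyst like different preferences
# even though the goal-specific betas never change.
beta_goal = {
    "impress": np.array([1.2, 0.1, -0.4]),
    "save":    np.array([0.2, 0.0, -2.0]),
}

def effective_utility(X, w):
    """Utility under concurrent goal pursuit: U = X (sum_g w_g beta_g)."""
    beta = sum(w[g] * b for g, b in beta_goal.items())
    return X @ beta

X = np.array([[0.9, 0.3, 0.8],    # flashy but expensive option
              [0.4, 0.5, 0.2]])   # modest and cheap option

for w in ({"impress": 0.8, "save": 0.2}, {"impress": 0.2, "save": 0.8}):
    print(w, "-> chooses alternative", int(np.argmax(effective_utility(X, w))))
```

With these numbers the choice flips as the goal weights shift, even though neither goal-specific taste vector changes.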
2.4 Hard Constraints, Screens and Cut-offs
The process of ‘deciding how to decide’ may entail knowingly ignoring information about some alternatives, or even certain alternatives entirely. This was noted as early as Stigler’s (1961) classic article introducing the so-called ‘economics of information’, although antecedents date back to the ‘secretary problem’ in the 1950s. Specifically, decision makers may forgo the opportunity to evaluate certain items if others appear suitable, or simply good enough. This can reflect costs of further information acquisition, low anticipated marginal benefits, fear of the currently best option becoming unavailable, and so on. In each of these cases, the decision maker will have needed
to arrive at a (non-compensatory) decision strategy that by its nature does not form summary evaluations of all available options, or even appeal to information on them all. Although space precludes anything like a review of non-compensatory choice models, one area ripe for exploration is the use of 'hard cut-offs' – on attributes, on anticipated gains, on choice items themselves or on some other element of the decision space. For example, as mentioned earlier, explicit information on items in the choice set helps rule out all choice sets failing to contain them; similarly, information on inviolable preferences or goals may help rule out evaluative processes, particular alternatives or specific decision strategies.

Several models in the literature have tried to encode such hard cut-offs. For example, Swait (2001) builds in attribute cut-offs in a flexible manner that allows violations, and so allows for arbitrary combinations of conjunctive, disjunctive and compensatory strategies, as well as pure versions of each. Feinberg and Huber (1996) extend Stigler's approach to accommodate two kinds of cut-offs: quotas, which specify a fixed number of alternatives to evaluate, and levels, which specify that all alternatives surpassing a minimal overall quality threshold be evaluated. Both cases allow that some alternatives are simply not worth further, costly evaluation, and in this way are reminiscent of choice set formation models, as well as the later perspective of Gilbride and Allenby (2004), who like Swait (2001) allow screening on one or more attributes. Although earlier work in marketing using scanner panel data (for example, Andrews and Srinivasan, 1995; Siddarth et al., 1995) did posit such cut-offs, it still required full information and a summary evaluative procedure for each alternative, which the model itself attempted to recover via estimated parameters. This is clearly impractical as a consumer evaluation strategy in product classes with many alternatives – the setting to which we turn next – and it suggests further research on pre-decision sub-processes involving which information or items to sidestep entirely.
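A minimal sketch of the cut-off-with-violations idea in the spirit of Swait (2001): violating a cut-off does not eliminate an alternative outright but incurs a linear utility penalty. The functional form is simplified and all numbers are invented.

```python
import numpy as np

def penalized_utility(X, beta, cutoffs, lam):
    """Soft attribute cut-offs: falling short of a stated minimum on an
    attribute does not eliminate an alternative but subtracts a linear
    penalty lam_k * shortfall_k from its utility. (For a maximum, such as
    a price ceiling, the sign of the comparison flips.)"""
    shortfall = np.clip(cutoffs - X, 0.0, None)
    return X @ beta - shortfall @ lam

X = np.array([[5.0, 2.0],
              [3.0, 4.0]])
beta = np.array([0.5, 0.5])
cutoffs = np.array([4.0, 0.0])   # require at least 4 on attribute 0
lam = np.array([2.0, 0.0])       # stiff penalty for violating that cut-off

print(penalized_utility(X, beta, cutoffs, lam))   # [3.5, 1.5]: alt 1 penalized
```

With lam = 0 this collapses to the ordinary compensatory model; letting lam grow large reproduces a hard conjunctive screen, with intermediate values allowing violations at a price.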
2.5 The Challenge of 'Big Data'
A sea change is taking place in consumer marketing, in terms of the vast array of data available to consumers when making decisions, as well as the various systems that help them exercise choice. This presents a dual challenge to decision theorists: to resist the temptation to simply plug all this additional data into extant model formulations, and to study how actual consumers make sense of greatly expanded data and decision resources. E-commerce consumer sales in North America and globally now account for, respectively, $300 billion and $1 trillion, yet the processes through which consumers evaluate available products, and the informational resources at their disposal, may differ markedly from those in a traditional environment. In online settings, algorithms tuned by e-tailers like Amazon and Expedia help present options to consumers in projected order of matching their preferences. The costs of seeking pricing, availability and shipping information across sellers are noticeably reduced. Purchase, browsing and shopping cart histories are stored. On social networks, the purchases and recommendations of acquaintances may be consulted. Each of these possibilities not only offers a new avenue for pre-decision processing, but calls into question how unidimensional evaluations for all available products – the cornerstone of choice modeling – might reasonably be carried out, or how choice sets might form.

A particularly novel form of choice is that of selecting friends on social networks and potential mates on dating sites, both of which require dyadic decisions, the latter in a
constrained, competitive setting. Standard choice models involve an active processing agent, the consumer, deciding from among inert, available alternatives. Social networks involve two-sided matching among agents who typically face very different sets of potential people with whom to engage. The dimensionality that so curses choice set formation modeling can be radically larger for real-world networks: even 100 000 users in one small city can generate billions of possible unique connection choices, not counting connections outside their locality. Braun and Bonfrer (2011) present a general approach to this quadratic scaling problem using Dirichlet processes, which are applied to infer latent customer similarities. Much further work is needed to extend such approaches to the full generality of modern choice models. Dyadic choice presents a host of other complexities of which choice modelers will need to take stock, including market (in)efficiencies, dynamic preferences, temporal expectations and lack of full observability, among others; see, for example, Hitsch et al. (2010) on the dating market, specifically.
2.6 Bayesian and Nonparametric Approaches
Bayesian estimation methods have become a cornerstone of modern econometrics, finding especial application in choice modeling. This owes, at least in part, to the foundations of random utility models positing a latent utility that is unobservable by the analyst. In practice, however, Bayesian estimation provides a particularly powerful platform for accounting for three other real-world features of choice modeling in a natural way: missing data, preference heterogeneity and model identification. Specifically, Bayesian econometric methods allow the analyst to formulate a model for missing data (including that it may be missing completely at random – MCAR) and integrate over its distribution; this can be a substantial stumbling block in the classical approach, involving subtle multiple imputation methods. Second, preference heterogeneity has a straightforward representation in a random coefficient setting, estimated via hierarchical Bayes (HB) methods; Rossi et al. (2005) provide numerous examples. Third, many choice models suffer from both location and scale indeterminacy, requiring a single constraint on each for model identification. Such constraints can interfere with conjugacy in generating posterior conditional densities, but it is typically possible to post-process an unidentified model via data augmentation (Tanner and Wong, 1987; Edwards and Allenby, 2003), greatly aiding in estimation.

These general properties can, in our view, be leveraged for understanding multi-stage decision processes. For example, because choice sets are latent, they may be viewed as 'parameters', which to the Bayesian comprise all uncertain (that is, non-observed) quantities. Instead of summing over all possible choice sets to calculate a likelihood to be maximized, one can sample from the discrete probability distribution over all choice sets containing the selected alternative; conditional on the set drawn, draws from the conditional parameter densities can be carried out at relatively low computational cost. Ergodic theory guarantees the correct joint posterior distribution, asymptotically, although in practice many draws may be required. Similarly, other latent processes – specifically, any taking place before those ordinarily concerning the analyst – may be viewed as unobservables, to be sampled from and 'integrated out'. Bayesian methods would ensure that only higher probability regions of the latent decision space are visited frequently, thereby helping to assign posterior weights to various decision rules.
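A compact sketch of the augmentation step just described, under the same illustrative independent-availability prior over sets used earlier; the utilities and probabilities are invented, and in a full sampler this draw would alternate with standard updates of the utility parameters.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def mnl_p(V):
    e = np.exp(V - V.max())
    return e / e.sum()

def draw_latent_choice_set(i, V, q):
    """One data-augmentation step: sample C from P(C | chosen = i, V),
    where the prior Q(C) over sets comes from illustrative independent
    availability probabilities q. Only sets containing the chosen
    alternative i carry posterior mass, as noted in the text."""
    J = len(V)
    others = [j for j in range(J) if j != i]
    masks, w = [], []
    for r in range(len(others) + 1):
        for rest in itertools.combinations(others, r):
            mask = np.zeros(J, dtype=bool)
            mask[[i, *rest]] = True
            members = np.flatnonzero(mask)
            Q = np.prod(np.where(mask, q, 1.0 - q))
            pos = int(np.searchsorted(members, i))
            w.append(Q * mnl_p(V[members])[pos])   # Q(C) * P(i | C)
            masks.append(mask)
    w = np.asarray(w)
    w /= w.sum()
    return masks[rng.choice(len(masks), p=w)]

V = np.array([0.4, 0.0, -0.2, 0.3])   # current deterministic utilities
q = np.array([0.9, 0.5, 0.6, 0.7])    # availability probabilities (illustrative)
C = draw_latent_choice_set(i=0, V=V, q=q)
print("sampled choice set:", np.flatnonzero(C))
# Conditional on this draw, a standard update of the utility parameters
# (for example, a Metropolis or Gibbs step) would follow, and the two
# steps would alternate to form the sampler.
```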
Such methods are slowly being developed, but several antecedents have appeared. For example, Kim et al. (2007) relaxed the linearity typically presumed for (random) utility functions, replacing them with splines, a highly flexible nonparametric alternative; such estimation would have been prohibitive in a classical setting. Van Nierop et al. (2010) estimated a two-stage, choice-set based model with full covariance matrices in both the choice-set and choice phases, which again has proved exceedingly difficult in classical applications. Similarly, Bayesian methods are allowing choice theorists to approach temporal and dynamic choice issues for large datasets that would have been prohibitive or impossible otherwise. For example, Zhao et al. (2009) formulate a dynamic model for credit cardholders' debt repayment behavior; notably, the model allows for an autoregressive account of (latent) parametric evolution, letting consumers' repayment tendencies (both whether and how much they repay) vary over time. Since consumers effecting credit card payments make dynamic allocation decisions, they must 'decide how to decide', and modern Bayesian techniques can help uncover the latent parametric processes underlying such decision strategies.
3 CONCLUDING THOUGHTS
In this chapter we have sketched a rough agenda for a rapidly developing, underexplored region of choice theory and modeling: how decision makers 'decide how to decide'. Just as the analytically convenient homo economicus assumption underlying much classical microeconomic reasoning is falling by the wayside in the face of experimental findings, choice models simply cannot maintain the fiction that decision makers drink in vast amounts of information and process it all into a huge set of utilities, from which they select the largest, no matter how well such models may appear to fit the data. Doing so prevents us, as analysts, from uncovering the ways in which decision makers actually go about navigating informational cascades and, unwittingly, leads us to pointless complexity, piling on the Ptolemaic epicycles. Instead, we need to embrace the view that, in 'deciding how to decide', individuals employ multiple decision stages and varied goals, and willfully ignore potentially diagnostic information (among other phenomena), and we need to revamp our model formulations to reflect this. The challenge and opportunity of 'Big Data' is before us, and our models of choice behavior need to rise to the occasion.
NOTES

1. By the term 'microeconomic consumer model' we refer to the utility maximizing, full information, fully rational decision maker underlying most economic analysis of consumer behavior. See, for example, Varian (1992) for a general presentation, and Hanemann (1984) for one targeted at the choice of discrete goods.
2. We suppress a subscript for the individual decision maker, but it should be understood to be present.
3. The marketing literature has generally adopted the term 'consideration set' (see, for example, Roberts and Lattin, 1991, or more recently, Hauser, 2010) for what other disciplines have termed the 'choice set'. We adopt here the latter term in an effort to create a bridge to the wider choice modeling literature. Substantively, we define the choice set to be composed of those alternatives on offer at the time of choice that have a structurally non-zero probability of being chosen. This implies that the choice set is context-specific and subject to both endogenous and exogenous constraints. We prefer to reserve the use of the
term 'consideration set' for those alternatives in the universal set of market alternatives that would be seriously considered for choice if a choice occasion were to arise. Thus, a consideration set may or may not be subject to endogenous constraints, might include aspirational goods but, most importantly, is definitely not tied to a particular choice instance.
4. Swait and Argo (2012, p. 5) use a job interview context to motivate the different roles of goals: 'In the context of preparing for a job interview a potential candidate might have several goals she wishes to accomplish simultaneously (for example, get the job and impress her parents). As a result of these various goals, she might consider using and/or purchasing multiple product categories (for example, buy a blouse, replace a mobile phone, get a haircut, hire a babysitter, buy an umbrella) and think about the effect of different constraints (that is, potential limits on goal achievement).'
REFERENCES

Adamowicz, V. and J. Swait (2013), 'Are food choices really habitual? Integrating habits, variety seeking and compensatory choice in a utility maximizing framework', American Journal of Agricultural Economics, 95 (1), 17–41.
Andrews, R. and T.C. Srinivasan (1995), 'Studying consideration effects in empirical choice models using scanner panel data', Journal of Marketing Research, 32 (February), 30–41.
Bettman, J., M. Luce and J. Payne (1998), 'Constructive consumer choice processes', Journal of Consumer Research, 25 (3), 187–217.
Braun, M. and A. Bonfrer (2011), 'Scalable inference of customer similarities from interactions data using Dirichlet processes', Marketing Science, 30 (3), 513–31.
Chorus, C. (2010), 'A new model of random regret minimization', European Journal of Transport and Infrastructure Research, 10 (2), 181–96.
Chorus, C. (2011), 'What about behaviour in travel demand modelling? An overview of recent progress', working paper, April, Delft University of Technology.
Dawes, R. (1979), 'The robust beauty of improper linear models in decision making', American Psychologist, July, 571–82.
Dhar, R. and I. Simonson (1999), 'Making complementary choices in consumption episodes: highlighting versus balancing', Journal of Marketing Research, 36 (1), 29–44.
Edwards, Y.D. and G. Allenby (2003), 'Multivariate analysis of multiple response data', Journal of Marketing Research, 40 (3), 321–34.
Elrod, T., R. Johnson and J. White (2004), 'A new integrated model of noncompensatory and compensatory decision strategies', Organizational Behavior and Human Decision Processes, 95 (1), 1–19.
Feinberg, F. and J. Huber (1996), 'A theory of cutoff formation under imperfect information', Management Science, 42 (1), 65–84.
Fishbach, A. and R. Dhar (2005), 'Goals as excuses or guides: the liberating effect of perceived goal progress on choice', Journal of Consumer Research, 32 (December), 370–77.
Fishbach, A. and R. Dhar (2008), 'Dynamics of goal-based choice: toward an understanding of how goals commit versus liberate choice', in C.P. Haugtvedt, P.M. Herr and F. Kardes (eds), Handbook of Consumer Psychology, New York: Psychology Press, pp. 611–37.
Fishbach, A., R. Dhar and Y. Zhang (2006), 'Subgoals as substitutes or complements: the role of goal accessibility', Journal of Personality and Social Psychology, 91 (August), 232–42.
Gaudry, M. and M. Dagenais (1979), 'The dogit model', Transportation Research Part B, 13B (2), 105–12.
Geweke, J. and G. Amisano (2011), 'Optimal prediction pools', Journal of Econometrics, 164 (1), 130–41.
Gilbride, T. and G. Allenby (2004), 'A choice model with conjunctive, disjunctive, and compensatory screening rules', Marketing Science, 23 (3), 391–406.
Hanemann, M. (1984), 'Discrete/continuous models of consumer demand', Econometrica, 52 (3), 541–62.
Hauser, J. (2010), 'Consideration-set heuristics', working paper, December, Sloan School of Management, Massachusetts Institute of Technology.
Hess, S., A. Stathopoulos and A. Daly (2011), 'Allowing for heterogeneous decision rules in discrete choice models: an approach and four case studies', working paper, July, Institute for Transport Studies, University of Leeds.
Higgins, E.T. (2002), 'How self-regulation creates distinct values: the case of promotion and prevention decision making', Journal of Consumer Psychology, 12 (3), 177–91.
Hitsch, G., A. Hortaçsu and D. Ariely (2010), 'Matching and sorting in online dating', American Economic Review, 100 (1), 130–63.
Horowitz, J. and J. Louviere (1995), 'What is the role of consideration sets in choice modeling?', International Journal of Research in Marketing, 12 (3), 39–54.
Kim, J.G., U. Menzefricke and F. Feinberg (2007), 'Capturing flexible heterogeneous utility curves: a Bayesian spline approach', Management Science, 53 (2), 340–54.
Manski, C. (1977), 'The structure of random utility models', Theory and Decision, 8 (3), 229–54.
Martinez, F., F. Aguila and R. Hurtubia (2009), 'The constrained multinomial logit model: a semi-compensatory choice model', Transportation Research Part B, 43 (3), 365–77.
Payne, J. (1976), 'Task complexity and contingent processing in decision making: an information search and protocol analysis', Organizational Behavior & Human Performance, 16 (August), 366–87.
Payne, J., J. Bettman and E. Johnson (1988), 'Adaptive strategy selection in decision making', Journal of Experimental Psychology: Learning, Memory, and Cognition, 14 (3), 534–52.
Roberts, J. and J. Lattin (1991), 'Development and testing of a model of consideration set composition', Journal of Marketing Research, 28 (November), 429–40.
Rossi, P., G. Allenby and R. McCulloch (2005), Bayesian Statistics and Marketing, Hoboken, NJ: John Wiley and Sons.
Siddarth, S., R. Bucklin and D. Morrison (1995), 'Making the cut: modeling and analyzing choice set restriction in scanner panel data', Journal of Marketing Research, 32 (3), 255–66.
Simon, H. (1955), 'A behavioral model of rational choice', Quarterly Journal of Economics, 69 (1), 99–118.
Stigler, G. (1961), 'The economics of information', Journal of Political Economy, 69 (3), 213–25.
Swait, J. (2001), 'A non-compensatory choice model incorporating attribute cutoffs', Transportation Research Part B, 35 (10), 903–28.
Swait, J. (2009), 'Choice models based on mixed discrete/continuous PDFs', Transportation Research Part B, 43 (7), 766–83.
Swait, J. and W. Adamowicz (2001), 'Choice complexity and decision strategy selection', Journal of Consumer Research, 28 (1), 135–48.
Swait, J. and J. Argo (2012), 'The pervasive role of goals in choice behavior', working paper, January, Alberta Business School, University of Alberta.
Swait, J. and M. Ben-Akiva (1985), 'An analysis of the effects of captivity on travel time and cost elasticities', Annals of the 1985 International Conference on Travel Behavior, 16–19 April, Noordwijk, the Netherlands, pp. 113–28.
Swait, J. and M. Ben-Akiva (1987a), 'Incorporating random constraints in discrete models of choice set generation', Transportation Research Part B, 21B (2), 91–102.
Swait, J. and M. Ben-Akiva (1987b), 'Empirical test of a constrained choice discrete model: mode choice in São Paulo', Transportation Research Part B, 21B (2), 103–15.
Swait, J. and T. Erdem (2007), 'Brand effects on choice and choice set formation under uncertainty', Marketing Science, 26 (5), 679–97.
Tanner, M.A. and W.H. Wong (1987), 'The calculation of posterior distributions by data augmentation', Journal of the American Statistical Association, 82 (398), 528–40.
Tversky, A. (1972), 'Elimination by aspects: a theory of choice', Psychological Review, 79 (4), 281–99.
Van de Kaa, E. (2010), 'Applicability of an extended prospect theory to travel behavior research: a meta-analysis', Transport Reviews, 30 (6), 771–804.
Van Nierop, E., B. Bronnenberg, R. Paap, M. Wedel and P.H. Franses (2010), 'Retrieving unobserved consideration sets from household panel data', Journal of Marketing Research, 47 (1), 63–74.
Varian, H. (1992), Microeconomic Analysis, 3rd edn, London: W.W. Norton.
Weber, E. and E. Johnson (2009), 'Mindful judgment and decision making', Annual Review of Psychology, 60 (1), 53–85.
Wright, P. (1975), 'Consumer choice strategies: simplifying vs. optimizing', Journal of Marketing Research, 11 (February), 60–67.
Zhao, Y., Y. Zhao and I. Song (2009), 'Predicting new customers' risk type in the credit card market', Journal of Marketing Research, 46 (4), 506–17.
27 Choice modelling research in environmental and resource economics
Wiktor L. Adamowicz, Klaus Glenk and Jürgen Meyerhoff
Choice modelling has been a fundamental component of environmental valuation within environmental economics. Stated preference approaches to valuation have traditionally used contingent valuation methods, but are increasingly based on multi-attribute experimental designs ('choice experiments') to identify the economic values of environmental quality changes. Similarly, revealed preferences over recreation choices or housing choices have been used to identify the value of environmental attributes. In contrast to choice modelling research in transportation and marketing, environmental economists rely on these methods to measure compensating or equivalent variation, or monetary measures of willingness to pay (WTP) or willingness to accept (WTA), for an environmental quality change.

In this chapter we review the state of practice and examine research frontiers in the application of choice modelling to environmental valuation, and provide some comment on applications of choice modelling to other areas of natural resource and environmental economics. We begin with an examination of discrete-choice experiments as the most common application of choice modelling in environmental economics. In the terminology outlined by Carson and Louviere (2011), contingent valuation is one form of 'discrete-choice experiment', as are 'choice experiments' in which many attributes, including price, are manipulated using an experimental design. Various challenging issues continue to arise in the use of discrete-choice experiment stated preference methods in environmental valuation.
1 EXPERIMENTAL DESIGN
In the early stages of applying choice experiments (CE) in environmental valuation, not much attention was paid to the underlying experimental design (ED), with orthogonality between the design attributes serving as the primary criterion in the generation of fractional factorial designs (Ferrini and Scarpa, 2007). However, researchers subsequently became more concerned about ED, realizing that the precision and efficiency of the parameter estimates depend crucially on the chosen design. In the course of this development, two separate paradigms for constructing designs emerged in the published literature. One seeks to maximize differences between the attribute levels of the stated preference alternatives, whereas the other seeks to minimize the variances of the parameter estimates obtained for each of the attribute coefficients included in the utility specification (Scarpa and Rose, 2008).

Overall, the literature indicates that efficient designs outperform utility-neutral designs (for example, Ferrini and Scarpa, 2007). Further research aims at improving statistical
efficiency. For example, efficient designs optimize the design such that the data are likely to provide more reliable parameter estimates for the particular model under estimation. Researchers, however, typically do not want to estimate only one type of model. A question therefore is whether efficient designs using non-zero priors that were originally constructed for multinomial logit (MNL) models are also appropriate for more complex models. So far, research indicates that designs generated for MNL models perform well when, for example, mixed logit models are estimated (Bliemer and Rose, 2010). Also, the sensitivity of the ED to the accuracy of prior information is a topic for further research. Starting from WTP-optimal designs, Vermeulen et al. (2011) mention the possibility of developing a design criterion that focuses on the accurate measurement of compensating variation.

Another group of research questions, which has so far received less attention, relates to behavioural efficiency (Louviere et al., 2008). A crucial question is how designs constructed according to different optimality criteria affect choice task complexity and thus respondents' processing strategies. Statistically more efficient designs may, for example, have unintended consequences by increasing the cognitive burden for respondents (see Ferrini and Scarpa, 2007; Louviere et al., 2011). Swait and Adamowicz (2001) showed conceptually that if respondents are sensitive to the complexity of the choice sets, as characterized by entropy, then more efficient designs may generate higher variance outcomes.
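For readers less familiar with efficiency criteria, the sketch below evaluates the D-error of candidate designs for an MNL model under assumed priors and keeps the best candidate; the attribute coding, priors and naive random search are invented for illustration rather than taken from the design literature cited above.

```python
import numpy as np

def d_error(design, beta):
    """Db-error of a design under prior parameters beta for the MNL model:
    the determinant of the asymptotic variance-covariance matrix,
    normalized by the number of parameters K. `design` is a list of
    (J x K) attribute arrays, one per choice task."""
    K = len(beta)
    info = np.zeros((K, K))
    for X in design:
        p = np.exp(X @ beta)
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    det = np.linalg.det(info)
    return np.inf if det <= 0 else det ** (-1.0 / K)

rng = np.random.default_rng(4)
beta_prior = np.array([0.6, -0.9])       # assumed non-zero priors
candidates = [
    [rng.choice([0.0, 1.0, 2.0], size=(3, 2)) for _ in range(6)]  # 6 tasks, 3 alternatives
    for _ in range(200)                   # naive random search over designs
]
best = min(candidates, key=lambda d: d_error(d, beta_prior))
print("best D-error found:", d_error(best, beta_prior))
```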
2 TASTE AND SCALE HETEROGENEITY
Analysing taste heterogeneity is now common in environmental valuation studies. The majority of studies apply the random parameter logit model to account for unobserved heterogeneity using a normal distribution; other distributions, such as the triangular, are rarely used. Moreover, the cost coefficient is still often held fixed, although there is clear evidence that heterogeneity is also present with respect to the costs of the choice alternatives. Estimating models in WTP space (Train and Weeks, 2005), one way to deal with these problems, is rarely done in applied studies. Latent class models have become more popular recently as they require fewer assumptions. At the same time, these models automatically account for differing cost sensitivities. However, how to determine the appropriate number of classes is still an open question (Swait, 2006). A recent innovation is the generalized multinomial logit model (Keane, 2006; implemented by Fiebig et al., 2010). The ability of this approach to effectively isolate scale from taste is currently under debate (Hess and Rose, 2012). Alternatively, differences in scale across subgroups can be estimated using heteroscedastic logit models, or can be accounted for in scale-adjusted latent class models (Magidson and Vermunt, 2007; for an application, see Thiene et al., 2012b).

More recently, integrated choice and latent variable models have been applied in environmental valuation (Hess and Beharry-Borg, 2012; Mariel et al., 2013). The objective is to inform choice models about heterogeneity among respondents using responses to attitudinal statements. Incorporating these responses directly may suffer from measurement error, as attitudes are latent, and responses may also be correlated with other unobserved factors, causing correlation between the modelled and random components of utility and potentially leading to endogeneity. Both applications resulted in substantially different WTP patterns. To what extent these models will shed more light on people's choice behaviour can be judged when more empirical evidence is available.
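To illustrate the WTP-space idea concretely, the sketch below simulates mixed logit probabilities with utility written in money-metric (WTP-space) form, so the random WTP for the non-price attribute is specified directly and cost sensitivity is itself heterogeneous rather than fixed; the distributions and values are invented, following the general logic of Train and Weeks (2005) rather than any specific application.

```python
import numpy as np

rng = np.random.default_rng(5)

def mixed_logit_prob(X, price, n_draws=2000):
    """Simulated mixed logit probabilities with utility in WTP space:
    U = -scale * (price - wtp * x) + eps. The random WTP for the single
    non-price attribute is normal, and the price/scale coefficient is
    lognormal, so cost sensitivity is heterogeneous rather than fixed.
    All distributions and values are illustrative."""
    wtp = rng.normal(2.0, 0.8, size=n_draws)             # money-metric taste draws
    scale = np.exp(rng.normal(-0.5, 0.3, size=n_draws))  # lognormal cost sensitivity
    V = -scale[:, None] * (price[None, :] - wtp[:, None] * X[None, :])
    e = np.exp(V - V.max(axis=1, keepdims=True))
    P = e / e.sum(axis=1, keepdims=True)
    return P.mean(axis=0)                                # average over draws

X = np.array([1.0, 0.4, 0.0])       # quality attribute of three alternatives
price = np.array([3.0, 1.5, 0.0])
print(mixed_logit_prob(X, price))
```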
3 CHOICE TASK DIMENSIONALITY AND COMPLEXITY (CHOICE FORMAT EFFECT)
Going beyond the typical format, with two designed alternatives and a status quo option, may increase statistical efficiency and would require smaller sample sizes. At the same time, the cognitive burden on respondents may increase. Louviere et al. (2008) pointed out that respondents are systematically less consistent in answering choice questions as statistical efficiency increases. Given the currently available literature, however, it is not possible to generalize this finding, as too few studies have investigated this relation. Bliemer and Rose (2011), for example, could not find evidence that more efficient designs tend to induce less choice consistency. More evidence is required to extend the findings of Boxall et al. (2009) and Zhang and Adamowicz (2011) linking choice-set dimensions to their impacts on response behaviour. Questions raised are whether a higher degree of complexity (higher dimensionality) results in less choice consistency and significantly more status quo choices.

Meyerhoff et al. (2013) applied a design-of-designs approach, systematically varying the dimensions across 16 split samples, and investigated the influence of five design dimensions on choice consistency. They found that all five design dimensions impact on the error variance. However, fatigue effects are rather small, as scale decreases only marginally toward the end of the sequence of choice sets (up to 24 sets). This finding is supported by, for example, Czajkowski et al. (forthcoming), who applied 26 choice sets without detecting a negative influence on error variance. Combining the design-of-designs approach with various complexity measures, such as the entropy measure presented by Swait and Adamowicz (2001), and with approaches to investigate heterogeneous choice heuristics could lead to new insights into the relation between dimensionality and choice behaviour.
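As a reminder of what the entropy measure captures, here is a minimal sketch: complexity is the entropy of the (prior) MNL choice probabilities of a task, so near-ties score high and tasks with a dominant option score low. The coefficients and attribute levels are invented.

```python
import numpy as np

def choice_set_entropy(X, beta):
    """Entropy of the (prior) MNL choice probabilities of a task,
    H = -sum_j P_j ln P_j: near-ties among alternatives give high entropy
    (a 'hard' task), a dominant alternative gives low entropy."""
    v = X @ beta
    p = np.exp(v - v.max())
    p /= p.sum()
    return float(-(p * np.log(p)).sum())

beta = np.array([0.7, -1.0])
easy = np.array([[3.0, 0.5], [0.5, 3.0]])   # one clearly dominant option
hard = np.array([[2.0, 1.0], [1.9, 0.9]])   # a near-tie
print(choice_set_entropy(easy, beta), choice_set_entropy(hard, beta))
```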
4
DECISION RULES AND PROCESSING STRATEGIES
A broad array of compensatory and non-compensatory decision rules and heuristics that apply to choice contexts has been identified, predominantly in behavioural research, psychology and marketing (for example, Tversky, 1969; Kahneman and Tversky, 1979; Loomes and Sugden, 1982). Modelling conjunctive and disjunctive decision rules implies that the cut-offs used by respondents are reflected in the utility functions (Swait, 2001). Cut-offs may be inferred in a hierarchical model of screening rules as suggested by Gilbride and Allenby (2004), or – computationally simpler – the researcher can make use of cut-off information stated by survey respondents (Ding et al., 2012; Colombo and Glenk, 2014). There are, however, potential endogeneity problems in using stated cut-offs to condition utility functions; developing instruments for predicting self-reported cut-offs is thus an area for further research (Ding et al., 2012). An alternative to conditioning utility functions on self-reported or predicted cut-offs may be to infer cut-offs up to a probability using a latent class structure. Such a structure may also be used to allow for heterogeneity in the extent of violations of (stated or predicted) thresholds.
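A compact way to see how cut-offs enter utility functions, in the spirit of Swait (2001), is through penalty terms for cut-off violations; the following is a stylized sketch in our own notation:

\[ U_{nj} = \beta' x_{nj} + \sum_k \lambda_k \kappa_{njk} + \varepsilon_{nj}, \qquad \kappa_{njk} = \max\bigl(0,\; c_{nk} - x_{njk}\bigr), \]

where \( c_{nk} \) is respondent n's minimum acceptable level of attribute k and \( \lambda_k \le 0 \) penalizes violations. Hard conjunctive screening corresponds to \( \lambda_k \to -\infty \), while finite penalties allow the 'soft' cut-offs that the data typically support.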
Regarding the status quo alternative included in environmental stated choice surveys, it is typically assumed that respondents use the attribute information of the status quo alternative as a reference base in the evaluation of choice tasks. However, respondents may employ reference points that are at least partly influenced by subjective beliefs, for example beliefs about the probability of provision. Further, the reference points used may shift depending on preceding choices, related to some form of strategic misrepresentation behaviour (McNair et al., 2012). More research could be dedicated to assessing beliefs regarding the presented status quo alternative, especially if it represents a likely future state of the environment in the absence of policy intervention. An interesting research area may be the investigation of perceptions of changes relative to a status quo in virtual reality environments. Bateman et al. (2009) found that using virtual reality visualizations to communicate environmental changes may reduce the WTA/WTP asymmetry. Visual stimuli can be powerful, but evaluations based on visual impressions will largely rest on perceptions that may differ from the 'objective' information about change underlying the visual representations, the econometric models and, ultimately, the WTP estimates.

Another behavioural decision rule that has entered the choice modelling literature is regret minimization (Chorus, 2010). In the context of outdoor recreation, Thiene et al. (2012a) applied a random regret minimization (RRM) model to the choice of trail characteristics in the Alps and compared it to its random utility model (RUM) counterpart. As they note, RRM may be more relevant to some subgroups of respondents. Thus, a better understanding of the environmental economics contexts in which RRM particularly matters, and of which individuals or subgroups of respondents are especially prone to RRM-type behaviour, may be useful. Given that model outcomes in terms of WTP are currently less than intuitive, more work is also needed to assess the policy implications of RRM model outputs in an environmental context.

Beyond further inquiries into individual decision rules and behaviours, how can heterogeneity in decision rules and heuristics be incorporated into choice models? Respondents across the sample are likely to apply different decision rules and heuristics, and individuals may adapt their behaviour over the sequence of choices. Hence, recent attempts have been made to model heterogeneity of decision rules and heuristics in behavioural mixing models relying on a latent class structure (Hess et al., 2012; McNair et al., 2012; Adamowicz and Swait, 2013; Colombo and Glenk, 2014). Further investigation into utility-theoretic models of strategy that provide an avenue for welfare evaluation is warranted.

At the level of information processing at the attribute evaluation stage, attribute non-attendance (AN-A) describes a reduced focus on only a subset of attributes, which conflicts with assumptions of fully compensatory behaviour. Inferring AN-A analytically has become increasingly popular. This reflects concerns about the endogeneity of using stated information to condition utility functions, and the difficulty of framing follow-up questions on AN-A in a manner that accurately captures situations in which zero utility can be assigned to an attribute in an individual choice task, or over a sequence of choices (Alemu et al., 2013).
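One way to make the analytical inference of AN-A concrete is an equality-constrained latent class structure of the kind used in much of this literature; the following is a stylized sketch, with class definitions and notation that are illustrative rather than drawn from a specific paper:

\[ P_n(y_n) = \sum_{c=1}^{C} \pi_c \prod_t P_{nt}\bigl(y_{nt} \mid \beta \odot \delta_c\bigr), \qquad \delta_c \in \{0,1\}^K, \]

where each class c corresponds to an attendance pattern \( \delta_c \) that switches individual attribute coefficients off or on, \( \pi_c \) is the probability of that pattern, and the taste vector \( \beta \) is common across classes. Non-attendance is thus inferred up to a probability rather than imposed from stated responses.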
A key concern of environmental choice modelling research is the implications that various decision rules, heuristics and choice behaviours have for the estimation of WTP and welfare impacts. While there seems to be a consensus that ignoring behavioural response strategies can result in biased WTP estimates, further research is required to move towards a more consistent treatment of heuristics and AN-A in WTP estimation.
For example, as noted by Thiene et al. (2012a), it is challenging to translate the regret associated with a choice among alternatives into monetary value. Regarding other decision heuristics and processing strategies, simply omitting respondents who do not behave according to the standard RUM model may be problematic (Alemu et al., 2013). Concerning AN-A, it is clearly unsatisfactory to exclude respondents whose WTP goes to infinity because they are identified as not having 'attended' to cost, or whose coefficient of variation for attribute parameters exceeds a certain threshold (Hess and Hensher, 2010). The findings of Cameron and DeShazo (2011) confirm that attention to attributes is related, in a fairly complex manner, to aspects of the experimental design and to the marginal utilities of these attributes.

It can be argued that decision rules and processing strategies at the attribute level reflect actual decision-making processes in the marketplace, and hence reflect 'true' preferences of respondents. However, this would require establishing a chain of evidence that allows distinguishing between 'acceptable' use of heuristics and processing strategies and 'unacceptable' use that is induced and driven by the instrument used for data collection. Thus, an improved understanding is needed of the factors that govern the use of decision rules and processing strategies, and of how these factors are related to respondent characteristics. Indeed, this key issue links several themes of investigation in the environmental choice modelling literature and is thus likely to form a future focal area of research. The use of heuristics and processing strategies is likely to be context dependent (Swait et al., 2002). It may be related to study design and experimental design (Burton and Rigby, 2012), and issues of complexity, familiarity with the decision context and the environmental good, cognitive capacity and the incentive structure of the choice tasks play an important role. Online surveys can provide novel insights, as shown by Campbell et al. (2012), who provide the first assessment of the impact of response latency on processing heterogeneity.
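For reference, the difficulty of translating regret into money noted above can be seen from the regret function of Chorus's (2010) RRM model, sketched here in stylized notation:

\[ RR_{ni} = \sum_{j \neq i} \sum_{k} \ln\Bigl(1 + \exp\bigl[\beta_k \bigl(x_{njk} - x_{nik}\bigr)\bigr]\Bigr), \qquad P_{ni} = \frac{\exp(-RR_{ni})}{\sum_{j} \exp(-RR_{nj})}. \]

Because the regret attached to an attribute of one alternative depends on the attribute levels of all competing alternatives, the trade-off between an attribute and cost is no longer a simple ratio of coefficients, which is one reason WTP measures from RRM models are less than intuitive.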
5
HOUSEHOLDS AND GROUPS
A topic that has so far received little attention in environmental valuation is whether the joint preferences of a group or household differ significantly from the preferences stated by a representative individual. Typically, in valuation surveys one member of a household is selected during the sampling process on the assumption that this person correctly represents the tastes of the entire household. The current literature, however, shows that preference estimates are sensitive to assumptions about whether decisions are made jointly or individually. For environmental valuation studies it might thus be crucial to understand when decisions concerning the use and conservation of the environment as a public good have a high likelihood of being taken jointly. Dosman and Adamowicz (2006) argue, for example, that vacation choices are likely to be subject to intra-household bargaining. A topic for future research is whether models estimating bargaining power are sufficient to capture the dynamics underlying joint decisions. Marcucci et al. (2011) point out that this especially applies to decisions taken by groups with more than two members,
that is, households with adolescents. Some studies have used rather small sample sizes (Beharry-Borg et al., 2009; Marcucci et al., 2011; Scarpa et al., 2012), emphasizing the need for larger samples, although sampling is more elaborate when both individual and joint decisions are recorded. Another topic for future research concerns understanding which types of household decisions can be expected to reveal larger differences among group members; decisions that involve health risks, as Dosman and Adamowicz (2006) point out, fall into this category. A particularly interesting emerging area of research extends beyond two-person households and explicitly models social networks (Neilson and Wichmann, 2014). This formal linkage between social network analysis and environmental valuation opens an avenue for considerable exploration of different types of group influence on choice and valuation.
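A simple way to formalize the joint-versus-individual comparison is a bargaining-weight specification in the spirit of Dosman and Adamowicz (2006); the following is a stylized sketch rather than their exact model:

\[ U^{H}_{j} = \theta\, U^{A}_{j} + (1-\theta)\, U^{B}_{j} + \varepsilon_j, \qquad 0 \le \theta \le 1, \]

where \( U^{A}_{j} \) and \( U^{B}_{j} \) are the partners' individual utilities for alternative j, recovered from the choices each makes alone, and the bargaining-power parameter \( \theta \) is estimated from the couple's joint choices. Whether a single parameter of this kind can capture the dynamics of groups with more than two members is precisely the open question raised by Marcucci et al. (2011).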
6
INCENTIVE COMPATIBILITY
A significant challenge in the stated preference literature, whether applied to environmental or other topics, has been the concern around incentive compatibility or, as some describe it, hypothetical bias. Whether respondents truthfully reveal their preferences in a discrete-choice experiment continues to be a difficult issue. Contingent valuation has received extensive scrutiny on this point and there is some agreement on the factors that generate truthful preference revelation (consequential questions). Carson and Groves (2007, 2011) outline the characteristics of a contingent valuation task that lead to truthful preference revelation. Vossler et al. (2012) outline how these conditions change for a discrete-choice experiment that, as is typical, involves additional attributes and replications. They show that multi-attribute discrete-choice experiments involving replications can elicit truthful preferences, especially where respondents believe that their responses will have an impact on policy. Vossler et al. (2012) caution that their results may not hold for contentious issues or for other payment vehicles and response formats.

The potential for strategic behaviour arising from the repetition of choices in multiple-response choice experiments has also become a topic of interest for economists. Day et al. (2012) show that attribute levels from one task influence responses over the set of tasks. This could arise because of strategic behaviour or learning. Indeed, one of the conditions for incentive compatibility derived in Vossler et al. (2012) is independence between tasks. The fact that repeated responses may generate strategic behaviour creates challenges for traditional choice experiment designs, as they are commonly implemented using such repeated choices. The literature awaits more insights into the best approach to deal with these issues.

A different perspective on incentive compatibility is raised by Carson and Groves (2011). They acknowledge that their approach (and that employed in Vossler et al., 2012) for assessing responses under the incentive compatibility requirement is a neoclassical one, and that behavioural economics may provide other insights into this issue. An approach that relies on a behavioural economics view is that of Lusk and Norwood (2009), which employs the conceptual framework provided by Levitt and List (2007). Lusk and Norwood suggest that responses in stated preference experiments deviate from what might be expected in the field because of moral/ethical aspects of the good and an individual's sensitivity to wealth versus the moral/ethical aspects of utility.
Therefore, it would be interesting to re-examine the issue of consequentiality and incentive compatibility through a behavioural economics lens.

A very common use of stated preference studies is to measure passive use values associated with a loss or change in environmental quality. These studies often form the basis for planning or investment analysis, such as benefit–cost analysis, or for the determination of compensation in a natural-resource damage setting. A number of vexing problems continue to exist in this literature, over and above incentive compatibility, experimental design and econometric issues. First, the 'extent of the market' – the geographic scope used for the inclusion of beneficiaries – is one of the most important determinants of aggregate value in such studies, yet we know relatively little about the choice of the relevant market either conceptually or empirically. Second, survey method, sampling issues and response bias (including the treatment of non-responses in welfare calculations) appear to significantly affect value estimates; improved insights into these issues are necessary. Finally, the elicitation of marginal versus total or average values, in a context that aligns with the costs of providing the public goods, remains an issue that must be resolved if such measures are to be used in policy analysis.

Significant strides have been made in examining strategic behaviour, and approaches to deal with such challenges, in choices framed as WTP questions (voting for public goods, and so on). However, WTA questions are increasingly being used in the analysis of schemes such as payments for ecosystem services programmes, many of which are implemented in developing countries (Whittington and Pagiola, 2012). There has been relatively little research on the performance of discrete-choice experiments in a WTA context and on the confounding feature of implementation in a developing country context (see Whittington, 2010). Given the important policy relevance of such issues, additional research on these topics seems critical.
7
CHOICE MODELS OF PROPERTY VALUE
The analysis of property values has traditionally been the domain of 'hedonic price models', in which the prices of properties are regressed against characteristics of the property and its surroundings to assess their contribution to price (Taylor, 2003). However, a number of theoretical and empirical challenges arise in the use of hedonic price models. As a result, some researchers have turned to the use of RUMs (choice models) to assess the impact of changes in characteristics on property values (Cropper et al., 1993).

The use of choice models to assess factors affecting property values, while conceptually and empirically appealing, raises a number of challenges that have yet to be adequately addressed in the literature. Some researchers examine individual property choice while others examine the choice of districts or neighbourhoods (Kuminoff et al., 2010). This choice of 'aggregation' raises a variety of issues, including how to characterize the choice set. Issues of choice-set formation have been examined (Banzhaf and Smith, 2007; Kuminoff, 2009) and have been found to influence model outcomes and welfare measures, but additional analysis of choice-set formation at various scales is probably necessary. An improved understanding of how incomes and wealth create budget constraints, and thus choice-set constraints, would be helpful.
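The contrast between the two approaches can be summarized in stylized notation as

\[ \text{hedonic:} \quad \ln P_i = f(x_i; \gamma) + u_i \qquad \text{versus} \qquad \text{RUM:} \quad U_{nj} = \beta' x_j - \alpha P_j + \varepsilon_{nj}. \]

In the hedonic regression the price \( P_i \) of property i is explained by its characteristics \( x_i \), with marginal implicit prices read off the derivatives of f; in the choice model, household n selects the property or neighbourhood j that maximizes utility, so the value of a characteristic follows from the ratio of its coefficient to the price coefficient, and the definition of the choice set over which j ranges becomes a central modelling decision.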
An issue that plagues much applied research employing choice models, including property value models, is endogeneity, which arises in many ways in studies of property choice and value. Endogeneity in price is probably common, as price often captures unobserved valuable attributes such as renovations (Guevara and Ben-Akiva, 2009). A number of options are available for treating this form of endogeneity, including control function approaches (Guevara and Ben-Akiva, 2009; Petrin and Train, 2010; Nevo, 2011). Another form of endogeneity arises if individual preferences influence the level of public goods provided in a neighbourhood or region. In this context the level of the public good cannot be viewed as exogenous and simply included as an attribute of choice; Kuminoff et al. (2010) outline how sorting models can address this issue.

A number of interesting and challenging issues arise for those interested in 'choice models' of property values and public goods. Kuminoff et al. (2010) provide a summary of 'frontier issues', including the inclusion of dynamics (forward-looking agents), the interaction between employment/labour supply and property choice, and the supply of properties. Embedded within these topics are other fundamental issues, such as choice-set formation, cognition/complexity and inertia.

An interesting linkage arises between property value models and stated preference approaches. Property value models have predominantly relied on revealed preference information and made assumptions about information, perceptions and other features of property choice. There are a few notable attempts to use stated preference (SP) methods in the context of property choice and public goods values (Braden et al., 2008; Chattopadhyay et al., 2005; Patunru et al., 2007; Phaneuf et al., 2013). The advantages of using SP methods in property choice are similar to those of using SP methods in recreation or commodity choice: to alleviate collinearity problems, to address new characteristics, and so on. There may be room for additional use of SP methods to address the complexities of property choice, particularly in the sorting framework.
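A control function correction for price endogeneity, in the spirit of Petrin and Train (2010), can be sketched in two stages; the notation is illustrative:

\[ \text{stage 1:} \quad P_j = \gamma' z_j + \mu_j, \qquad \text{stage 2:} \quad U_{nj} = -\alpha P_j + \beta' x_j + \lambda \hat{\mu}_j + \varepsilon_{nj}, \]

where \( z_j \) are instruments correlated with price but not with the unobserved attributes (cost shifters, for example), and the first-stage residual \( \hat{\mu}_j \) enters utility as an additional regressor to absorb the component of the error that is correlated with price. A significant estimate of \( \lambda \) signals that uncorrected estimates would have been biased.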
8
RECREATION DEMAND MODELS
Recreation demand is a well-developed literature that includes many model forms (RUMs, count models and joint discrete-continuous models, among others) and many data-generating mechanisms (revealed preference general survey data, revealed preference on-site data and SP data of various forms). However, many interesting problems remain in this literature. There appear to be a number of analytical thresholds that still need to be overcome. Some fall into the domain of econometric issues that are discussed in reviews such as Moeltner and von Haefen (2011). But several issues are broader and involve data collection and fundamental model structure.

The first of these thresholds is the reliance on static analysis (and data). Recreation and tourism are inherently dynamic activities in that consumers learn, develop habits and, at times, move their principal residences to the locations where they enjoy their recreation. Temporal dimensions of choice problems are increasingly being recognized as key components of recreation demand (for example, Phaneuf and Smith's 2003 review identified various dimensions of time in recreation demand models as critical research issues). A significant challenge in this area is the lack of temporal data on recreation demand: few systematic longitudinal data collection efforts in the area of recreation demand exist.
The development of such databases would enable a significant transformation of the literature. Topics such as the extent to which recreation is 'routine' behaviour could be addressed (for example, Adamowicz and Swait, 2013), or measures of learning and updating could be incorporated. While there is a small literature on dynamics and inter-temporal choice (for example, Provencher and Bishop, 1997; Swait et al., 2004), longitudinal data sets could provide a leap forward in analysis. The availability of Homescan and Information Resources, Inc. (IRI) data, for example, seems to have had a large impact on the analysis of dynamics in the food demand area (for example, Nevo, 2011). Stated preference data have been a useful substitute for and complement to revealed preference (RP) data on recreation, in particular through data fusion. A very different approach to data collection involves the use of field experiments. Surprisingly, attempts to conduct field experiments in environmental economics have proved quite controversial (see Malakoff, 2012). One could imagine a number of other experimental studies on critical aspects of recreational and tourism demand, including the fundamental topic of the value of travel time. Two papers outlining a research agenda to integrate programme evaluation with environmental valuation are Ferraro et al. (2012) and Pendleton (2010).

The second significant threshold in the analysis of recreation choices is a more careful assessment of endogeneity issues. Again there is some literature on this topic, including Moeltner and von Haefen (2011), Timmins and Murdock (2007) and Phaneuf et al. (2009), among others. A third threshold is a more complete analysis of choice-set formation. Choice-set formation has been investigated in the literature (for example, Haab and Hicks, 1997; von Haefen, 2008), but the interdependence between choice-set formation, dynamics and endogeneity seems to be an area open for inquiry. As individuals learn, their choice sets will likely change (for example, Truong et al., 2011). As non-price equilibria evolve (for example, congestion effects à la Phaneuf et al., 2009), choice sets will probably change. Furthermore, the interaction between models of choice-set formation and models that incorporate non-compensatory preferences (for example, cut-off models – Swait, 2001) needs investigation.

The recreation demand literature is dominated by single-purpose trip models. Increasingly, trips are multi-purpose and, in the case of tourism, it is often the 'package' that is in demand. There is a small literature on this, but advances in choice modelling, including models of multiple choices (Nevo et al., 2005), seem a promising avenue for future research.
9
SPATIAL ASPECTS
An area of research that has not received significant attention in the traditional ‘choice modelling’ literature is the analysis of spatial choices made by firms and the assessment of policies and programmes using such models. One example is the literature on spatial choice in fisheries – the integration of spatial information on fish stocks, marine reserves and other bio-physical information with information on costs, prices and expected behaviours of other fishing boats. Zhang and Smith (2011), for example, employ techniques from the sorting literature and research on endogenous attributes to examine fishing-fleet behaviour in response to marine reserves. The prediction of spatial choice
behaviour in fisheries is critical to understanding the biological and economic implications of policies such as marine reserves, catch limits, and so on (see Smith, 2005; Abbott and Wilen, 2010; Hicks and Schnier, 2010). The examples from fisheries are somewhat unusual in that they are built on data sets that are often very large (relative to recreation demand, for example) and are longitudinal, or at least include some temporal dimensions.

There would seem to be advantages from hybridization across the 'demand' literature (recreation, property values, and so on) and the 'production' or 'industrial' analysis literature. Some of these crossovers are occurring: the application of sorting models to spatial fishing location choice (Zhang and Smith, 2011) and the application of choice-set formation models to fishing site choice (Hicks and Schnier, 2010) are excellent examples. Models of fishery location choice, or models of land use (which also incorporate information on economic returns to explain spatial choice decisions), may benefit from the inclusion of social network effects (for example, Dugundji and Walker, 2005; Neilson and Wichmann, 2014), the integration of decision-making into a household production function framework (Bell et al., 2006) and behavioural economics findings regarding complexity impacts and inertia, so as to capture a richer set of elements of spatial and temporal choice.

Another recent area of research is the incorporation of space into the analysis of stated choices. The majority of studies report a decay of stated WTP with distance (for example, Concu, 2007; Garrod et al., 2012). Rolfe and Windle (2012) conclude that distance decay is likely to be limited in the case of iconic assets, as the population base supporting the preservation of those assets is probably larger. Choi (2013) adds to this, finding that protection of a resource might be less important to local populations than to those in distant regions, among them non-users. More evidence regarding distance decay is especially needed in respect of passive use values, as from a theoretical point of view these values are not expected to decrease with distance. Another interesting avenue for research is whether distance-decay effects cease to be uniform once further information, for example substitute availability, is taken into account (Schaafsma et al., 2012, 2013). Finally, Campbell et al. (2008) were the first to show that WTP estimates from a choice experiment are spatially clustered, and Johnston et al. (2011) presented results regarding local clustering of WTP values (see also Meyerhoff, 2013). An important topic for future studies is to what extent the definition of neighbourhood (nearest neighbour or distance bands) affects global or local autocorrelation.
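Distance-decay effects of the kind reported here are commonly summarized by letting WTP fall with the distance \( d_n \) between respondent n and the site; a stylized specification is

\[ \mathrm{WTP}_n = \mathrm{WTP}_0 \exp(-\gamma d_n), \qquad \gamma \ge 0, \]

where \( \gamma > 0 \) implies decay. The discussion above then amounts to asking when \( \gamma \) is close to zero (iconic assets, passive use values) and whether \( \gamma \) itself varies with factors such as substitute availability or direction.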
10
CONCLUSIONS
Economic analysis of choice behaviour involving environmental quality and/or natural resource use has grown steadily over the past few decades. Choice modelling, in various forms, has been an integral aspect of this area of research. Environmental and resource economists have both contributed to the broader choice modelling literature and benefited from the contributions of choice modelling researchers in other disciplines. A significant amount of research in choice modelling is investigating specifications that differ from standard economic theory and incorporate heuristics, non-compensatory preferences and other ‘non-standard’ forms. Future research in these areas should include the critical linkage to welfare economics that is required for use within environmental and
resource economics. Investigations linking behavioural economics with welfare theory along the lines of Bernheim and Rangel (2007) should be considered in future research in these areas of choice modelling. Advances in econometric analysis and experimental design have been the most rapid over the past decade and will undoubtedly continue, but advances in data collection and the integration of richer behavioural economics models into the understanding of choices in the natural environment are also likely to arise. In this chapter we have outlined a few areas where research is likely to push back the boundaries of knowledge and analysis.
REFERENCES

Abbott, J.K. and J.E. Wilen (2010), 'Voluntary cooperation in the commons? Evaluating the Sea State Program with reduced form and structural models', Land Economics, 86 (1), 131–54.
Adamowicz, W. and J. Swait (2013), 'Are food choices really habitual? Integrating habits, variety-seeking, and compensatory choice in a utility-maximizing framework', American Journal of Agricultural Economics, 95 (1), 17–41.
Alemu, M.H., M.R. Mørkbak, S.B. Olsen and C.L. Jensen (2013), 'Attending to the reasons for attribute non-attendance in choice experiments', Environmental and Resource Economics, 54 (3), 333–59.
Banzhaf, S. and V.K. Smith (2007), 'Meta analysis in model implementation: choice sets and the valuation of air quality improvements', Journal of Applied Econometrics, 22 (6), 1013–31.
Bateman, I.J., B.H. Day, A.P. Jones and S. Jude (2009), 'Reducing gain-loss asymmetry: a virtual reality choice experiment valuing land use change', Journal of Environmental Economics and Management, 58 (1), 106–18.
Beharry-Borg, N., D. Hensher and R. Scarpa (2009), 'An analytical framework for joint vs separate decisions by couples in choice experiments: the case of coastal water quality in Tobago', Environmental and Resource Economics, 43 (1), 95–117.
Bell, K.P., K.J. Boyle and J.R. Rubin (eds) (2006), Economics of Rural Land-Use Change, Aldershot: Ashgate.
Bernheim, B.D. and A. Rangel (2007), 'Toward choice theoretic foundations for behavioral welfare economics', American Economic Review, 97 (2), 464–70.
Bliemer, M.C.J. and J.M. Rose (2010), 'Construction of experimental designs for mixed logit models allowing for correlation across choice observations', Transportation Research Part B, 44 (3), 720–34.
Bliemer, M.C.J. and J.M. Rose (2011), 'Experimental design influences on stated choice outputs: an empirical study in air travel choice', Transportation Research Part A, 45 (1), 63–79.
Boxall, P., W.L. Adamowicz and A. Moon (2009), 'Complexity in choice experiments: choice of the status quo alternative and implications for welfare measurement', Australian Journal of Agricultural and Resource Economics, 53 (4), 503–19.
Braden, J.B., L.O. Taylor, D. Won, N. Mays, A. Cangelosi and A.A. Patunru (2008), 'Economic benefits of remediating the Buffalo River, New York Area of Concern', Journal of Great Lakes Research, 34 (4), 631–48.
Burton, M. and D. Rigby (2012), 'The self selection of complexity in choice experiments', American Journal of Agricultural Economics, 94 (3), 786–800.
Cameron, T.A. and J.R. DeShazo (2011), 'Differential attention to attributes in utility-theoretic choice models', Journal of Choice Modelling, 3 (3), 73–115.
Campbell, D., M.R. Mørkbak and S.B. Olsen (2012), 'Response latency in stated choice experiments: impact on preference, variance and processing heterogeneity', paper presented at the 19th Annual Conference of the European Association of Environmental and Resource Economists, Prague, 27–30 June.
Campbell, D., R. Scarpa and W.G. Hutchinson (2008), 'Assessing the spatial dependence of welfare estimates obtained from discrete choice experiments', Letters in Spatial and Resource Sciences, 1 (2), 117–26.
Carson, R.T. and T. Groves (2007), 'Incentive and information properties of preference questions', Environmental and Resource Economics, 37 (1), 181–210.
Carson, R.T. and T. Groves (2011), 'Incentive and information properties of preference questions: commentary and extensions', in J. Bennett (ed.), International Handbook of Non-Market Environmental Valuation, Cheltenham, UK and Northampton, MA, USA: Edward Elgar, pp. 300–321.
Carson, R.T. and J. Louviere (2011), 'A common nomenclature for stated preference elicitation approaches', Environmental and Resource Economics, 49 (4), 539–59.
Chattopadhyay, S., J.B. Braden and A. Patunru (2005), 'Benefits of hazardous waste cleanup: new evidence from survey- and market-based property value approaches', Contemporary Economic Policy, 23 (3), 357–75.
Choi, A.S. (2013), 'Nonmarket values of major resources in the Korean DMZ areas: a test of distance decay', Ecological Economics, 88 (1), 97–107.
Chorus, C.G. (2010), 'A new model of random regret minimization', European Journal of Transport and Infrastructure Research, 10 (2), 181–96.
Colombo, S. and K. Glenk (2014), 'Social preferences for agricultural policy instruments: joint consideration of non-attendance to attributes and to alternatives in modelling discrete choice data', Journal of Environmental Planning and Management, 57 (2), 215–32.
Concu, G.B. (2007), 'Investigating distance effects on environmental values: a choice modelling approach', Australian Journal of Agricultural and Resource Economics, 51 (2), 175–94.
Cropper, M.L., L. Deck, N. Kishor and K. McConnell (1993), 'Valuing product attributes using single market data: a comparison of hedonic and discrete choice approaches', Review of Economics and Statistics, 75 (2), 225–32.
Czajkowski, M., M. Giergiczny and W.H. Greene (forthcoming), 'Learning and fatigue effects revisited: investigating the effects of accounting for unobservable preference and scale heterogeneity', Land Economics.
Day, B., I. Bateman, R. Carson, D.P. Dupont, J. Louviere, S. Morimoto, R. Scarpa and P. Wang (2012), 'Ordering effects and choice set awareness in repeat-response stated preference studies', Journal of Environmental Economics and Management, 63 (1), 73–91.
Ding, Y., M.M. Veeman and W.L. Adamowicz (2012), 'The influence of attribute cutoffs on consumers' choices of a functional food', European Review of Agricultural Economics, 39 (5), 745–69.
Dosman, D. and W. Adamowicz (2006), 'Combining stated and revealed preference data to construct an empirical examination of intra-household bargaining', Review of Economics of the Household, 4 (1), 15–34.
Dugundji, E.R. and J.L. Walker (2005), 'Discrete choice with social and spatial network interdependencies', Transportation Research Record, 1921, 70–78.
Ferraro, P.J., K. Lawlor, K. Mullan and S.K. Pattanayak (2012), 'Forest figures: a review of ecosystem services valuation and policy evaluation in developing countries', Review of Environmental Economics and Policy, 6 (1), 20–44.
Ferrini, S. and R. Scarpa (2007), 'Designs with a priori information for nonmarket valuation with choice experiments: a Monte Carlo study', Journal of Environmental Economics and Management, 53 (3), 342–63.
Fiebig, D.G., M.P. Keane, J. Louviere and N. Wasi (2010), 'The generalized multinomial logit model: accounting for scale and coefficient heterogeneity', Marketing Science, 29 (3), 393–421.
Garrod, G., E. Ruto, K. Willis and N. Powe (2012), 'Heterogeneity of preferences for the benefits of environmental stewardship: a latent-class approach', Ecological Economics, 76 (April), 104–11.
Gilbride, T.J. and G.M. Allenby (2004), 'A choice model with conjunctive, disjunctive, and compensatory screening rules', Marketing Science, 23 (3), 391–406.
Guevara, C.A. and M. Ben-Akiva (2009), 'Addressing endogeneity in discrete choice models: assessing control-function and latent-variable methods', Working Paper #TSI-SOTUR-09-03, MIT Portugal Program.
Haab, T.C. and R.L. Hicks (1997), 'Accounting for choice set endogeneity in random utility models of recreation demand', Journal of Environmental Economics and Management, 34 (2), 127–47.
Hess, S. and N. Beharry-Borg (2012), 'Accounting for latent attitudes in willingness-to-pay studies: the case of coastal water quality improvements in Tobago', Environmental and Resource Economics, 52 (1), 109–31.
Hess, S. and D.A. Hensher (2010), 'Using conditioning on observed choices to retrieve individual-specific attribute processing strategies', Transportation Research Part B, 44 (6), 781–90.
Hess, S. and J.M. Rose (2012), 'Can scale and coefficient heterogeneity be separated in random coefficients models?', Transportation, 39 (6), 1225–39.
Hess, S., A. Stathopoulos and A. Daly (2012), 'Allowing for heterogeneous decision rules in discrete choice models: an approach and four case studies', Transportation, 39 (3), 565–91.
Hicks, R.L. and K.E. Schnier (2010), 'Spatial regulations and endogenous consideration sets in fisheries', Resource and Energy Economics, 32 (2), 117–34.
Johnston, R.J., M. Ramachandran, E.T. Schultz, K. Segerson and E.Y. Besedin (2011), 'Characterizing spatial pattern in ecosystem service values when distance decay doesn't apply: choice experiments and local indicators of spatial association', paper presented at the Annual Meeting of the Agricultural and Applied Economics Association, Pittsburgh, 24–26 July.
Kahneman, D. and A. Tversky (1979), 'Prospect theory: an analysis of decision under risk', Econometrica, 47 (2), 263–91.
Keane, M. (2006), 'The generalized logit model: preliminary ideas on a research program', paper presented at the Motorola-CenSoC Hong Kong Meeting, Motorola, Hung Hom, Kowloon, 22 October.
Kuminoff, N.V. (2009), 'Decomposing the structural identification of nonmarket values', Journal of Environmental Economics and Management, 57 (2), 123–39.
Kuminoff, N.V., V.K. Smith and C. Timmins (2010), 'The new economics of equilibrium sorting and its transformational role for policy evaluation', NBER Working Paper 16349, Cambridge, MA.
Levitt, S.D. and J.A. List (2007), 'What do laboratory experiments measuring social preferences reveal about the real world?', Journal of Economic Perspectives, 21 (2), 153–74.
Loomes, G. and R. Sugden (1982), 'Regret theory: an alternative theory of rational choice under uncertainty', The Economic Journal, 92 (December), 805–24.
Louviere, J.J., T. Islam, N. Wasi, D. Street and L. Burgess (2008), 'Designing discrete choice experiments: do optimal designs come at a price?', Journal of Consumer Research, 35 (2), 360–75.
Louviere, J.J., D. Pihlens and R. Carson (2011), 'Design of discrete choice experiments: a discussion of issues that matter in future applied research', Journal of Choice Modelling, 4 (1), 1–8.
Lusk, J.L. and F.B. Norwood (2009), 'An inferred valuation method', Land Economics, 85 (3), 500–514.
Magidson, J. and J.K. Vermunt (2007), 'Removing the scale factor confound in multinomial logit choice models to obtain better estimates of preference', in Proceedings of the Sawtooth Software Conference, Sawtooth Software, Inc., Santa Rosa, CA, 17–19 October.
Malakoff, D. (2012), 'Something fishy? NOAA economics study makes waves', Science, 335 (6074), 1292, doi: 10.1126/science.335.6074.1292.
Marcucci, E., A. Stathopoulos, L. Rotaris and R. Danielis (2011), 'Comparing single and joint preferences: a choice experiment on residential location in three-member households', Environment and Planning A, 43 (5), 1209–25.
Mariel, P., J. Meyerhoff and S. Hess (2013), 'Hybrid choice model for heterogeneous preferences towards the landscape externalities of wind turbines', paper presented at the International Choice Modelling Conference, Sydney, 3–5 July.
McNair, B., D. Hensher and J. Bennett (2012), 'Modelling heterogeneity in response behaviour towards a sequence of discrete choice questions: a probabilistic decision process model', Environmental and Resource Economics, 51 (4), 599–616.
Meyerhoff, J. (2013), 'Do turbines in the vicinity of respondents' residences influence choices among programmes for future wind power generation?', Journal of Choice Modelling, 7 (June), 58–71.
Meyerhoff, J., M. Oehlmann and P. Weller (2013), 'The influence of design dimensions on stated choices – an example from environmental valuation using a design-of-designs approach', Working Papers on Management in Environmental Planning 33, Chair in Environmental and Land Economics, Technical University, Berlin.
Moeltner, K. and R. von Haefen (2011), 'Microeconometric strategies for dealing with unobservables and endogenous variables in recreation demand models', Annual Review of Resource Economics, 3 (3), 375–96.
Neilson, W. and B. Wichmann (2014), 'Social networks and non-market valuations', Journal of Environmental Economics and Management, accessed 21 January 2014 at http://dx.doi.org/10.1016/j.jeem.2013.11.005i.
Nevo, A. (2011), 'Empirical models of consumer behavior', Annual Review of Economics, 3 (1), 51–75.
Nevo, A., D.L. Rubinfeld and M. McCabe (2005), 'Academic journal pricing and the demand of libraries', American Economic Review, 95 (2), 447–52.
Patunru, A.A., J.B. Braden and S. Chattopadhyay (2007), 'Who cares about environmental stigmas and does it matter? Latent segmentation evidence on property values', American Journal of Agricultural Economics, 89 (3), 712–26.
Pendleton, L. (2010), 'Measuring and monitoring the economic effects of habitat restoration: a summary of a NOAA Blue Ribbon Panel', report to the National Oceanic and Atmospheric Administration's Restoration Center and Assessment and Restoration Division, 13 May.
Petrin, A. and K. Train (2010), 'A control function approach to endogeneity in consumer choice models', Journal of Marketing Research, 47 (1), 3–13.
Phaneuf, D. and V.K. Smith (2003), 'Recreation demand models', in K.-G. Mäler and J.R. Vincent (eds), Handbook of Environmental Economics, Amsterdam: North Holland, pp. 671–761.
Phaneuf, D., J. Carbone and J. Herriges (2009), 'Non-price equilibria for non-marketed goods', Journal of Environmental Economics and Management, 57 (1), 45–64.
Phaneuf, D., L. Taylor and J. Braden (2013), 'Combining RP and SP data to estimate preferences for residential amenities: a GMM approach', Land Economics, 89 (1), 30–52.
Provencher, B. and R.C. Bishop (1997), 'An estimable dynamic model of recreation behavior with an application to Great Lakes angling', Journal of Environmental Economics and Management, 33 (2), 107–27.
Rolfe, J. and J. Windle (2012), 'Distance decay functions for iconic assets: assessing national values to protect the health of the Great Barrier Reef in Australia', Environmental and Resource Economics, 53 (3), 347–65.
Scarpa, R. and J.M. Rose (2008), 'Experimental designs for environmental valuation with choice experiments: Monte Carlo investigations', Australian Journal of Agricultural and Resource Economics, 52 (3), 253–82.
Scarpa, R., M. Thiene and D.A. Hensher (2012), 'Preferences for tap water attributes within couples: an exploration of alternative mixed logit parameterizations', Water Resources Research, 48 (1), W01520.
Schaafsma, M., R. Brouwer and J. Rose (2012), 'Directional heterogeneity in WTP models for environmental valuation', Ecological Economics, 79 (July), 21–31.
Schaafsma, M., R. Brouwer, A. Gilbert, J. van den Bergh and A. Wagtendonk (2013), 'Estimation of distance-decay functions to account for substitution and spatial heterogeneity in stated preference research', Land Economics, 89 (3), 514–37.
Smith, M.D. (2005), 'State dependence and heterogeneity in fishing location choice', Journal of Environmental Economics and Management, 50 (2), 319–40.
Swait, J. (2001), 'A non-compensatory choice model incorporating attribute cutoffs', Transportation Research Part B, 35 (10), 903–28.
Swait, J. (2006), 'Advanced choice models', in B. Kanninen (ed.), Valuing Environmental Amenities Using Stated Choice Studies: A Common Sense Approach to Theory and Practice, Dordrecht: Springer, pp. 229–93.
Swait, J. and W. Adamowicz (2001), 'Choice environment, market complexity and consumer behavior: a theoretical and empirical approach for incorporating decision complexity in models of consumer choice', Organizational Behavior and Human Decision Processes, 86 (2), 141–67.
Swait, J., W. Adamowicz and M. van Bueren (2004), 'Choice and temporal welfare impacts: incorporating history and expectations into discrete choice models', Journal of Environmental Economics and Management, 47 (1), 94–116.
Swait, J., W.L. Adamowicz, M. Hanemann, A. Diederich, J. Krosnick, D. Layton, W. Provencher, D. Schkade and R. Tourangeau (2002), 'Context dependence and aggregation in disaggregate choice analysis', Marketing Letters, 13 (3), 195–205.
Taylor, L. (2003), 'The hedonic method', in P. Champ, T. Brown and K. Boyle (eds), A Primer on the Economic Valuation of the Environment, Boston, MA: Kluwer Academic, pp. 331–94.
Thiene, M., M. Boeri and C.G. Chorus (2012a), 'Random regret minimization: exploration of a new choice model for environmental and resource economics', Environmental and Resource Economics, 51 (3), 413–29.
Thiene, M., J. Meyerhoff and M. De Salvo (2012b), 'Scale and taste heterogeneity for forest biodiversity: models of serial nonparticipation and their effects', Journal of Forest Economics, 18 (4), 355–69.
Timmins, C. and J. Murdock (2007), 'A revealed preference approach to the measurement of congestion in travel cost models', Journal of Environmental Economics and Management, 54 (2), 230–49.
Train, K. and M. Weeks (2005), 'Discrete choice models in preference space and willingness-to-pay space', in R. Scarpa and A. Alberini (eds), Applications of Simulation Methods in Environmental and Resource Economics, Dordrecht: Springer, pp. 1–16.
Truong, T.D., W.L. Adamowicz and P.C. Boxall (2011), 'Modelling the effect of risk perception on preferences and choice set formation over time: recreational hunting site choice and chronic wasting disease', paper presented at the AERE Annual Summer Conference, Seattle, 9–10 June.
Tversky, A. (1969), 'The intransitivity of preferences', Psychological Review, 76 (1), 31–48.
Vermeulen, B., P. Goos, R. Scarpa and M. Vandebroek (2011), 'Bayesian conjoint choice designs for measuring willingness to pay', Environmental and Resource Economics, 48 (1), 129–49.
von Haefen, R.H. (2008), 'Latent consideration sets and continuous demand systems', Environmental and Resource Economics, 41 (3), 363–79.
Vossler, C.A., M. Doyon and D. Rondeau (2012), 'Truth in consequentiality: theory and field evidence on discrete choice experiments', American Economic Journal: Microeconomics, 4 (4), 145–71.
Whittington, D. (2010), 'What have we learned from 20 years of stated preference research in less-developed countries?', Annual Review of Resource Economics, 2 (16), 209–36.
Whittington, D. and S. Pagiola (2012), 'Using contingent valuation in the design of payments for environmental services mechanisms: a review and assessment', World Bank Research Observer, 27 (2), 261–87.
Zhang, J. and W.L. Adamowicz (2011), 'Unraveling the choice format effect: a context-dependent random utility model', Land Economics, 87, 730–43.
Zhang, J. and M. Smith (2011), 'Heterogeneous response to marine reserve formation: a sorting model approach', Environmental and Resource Economics, 49 (3), 311–25.
28 Choice modelling research in health economics
Emily Lancsar and Peter Burge
1
INTRODUCTION
This chapter discusses the research needs for choice modelling research in health economics and the health sector more broadly. We first consider the evolution of the use of choice modelling, particularly discrete-choice experiments (DCEs), from their early, quite narrow use to the subsequent broadening in scope in relation to methods, contexts and the research questions addressed. DCEs undertaken in the health sector share many of the considerations faced in developing and applying DCEs in other areas. In addition, we discuss a number of specific considerations that arise in a health context and make it a particularly interesting area in which to apply DCE methods. We discuss opportunities to draw on experience and lessons from other fields in which DCEs are applied, and highlight research undertaken in health that is likely to be of interest to researchers in other areas of application. We conclude by proposing a new agenda for choice modelling research in health economics.
2
USE OF CHOICE MODELLING IN HEALTH
While much of the recent health economics literature has focused on using stated preference (SP) data, the first applications of discrete choice models within the health sector utilized revealed preference (RP) data: for example, to examine the demand for primary health services in the Third World (Akin et al., 1985); to model the choice of health-care provider in Peru and examine whether user fees were regressive (Gertler et al., 1987); and to estimate a model of demand for health care and health insurance in Australia (Cameron et al., 1988). In recent years, however, the focus has shifted to the estimation of choice models from SP data generated in DCEs rather than from RP sources.

The health economics discipline has a long tradition of using SP methods, particularly in outcome measurement for use in economic evaluation, including such methods as standard gamble, time trade-off and, to a lesser extent, contingent valuation. This is not surprising, given that RP data in the health sector are perceived to be relatively scarce or potentially deficient when compared with other parts of the economy. This perception arises in part from missing markets, but more commonly from market failure in the health sector. DCEs are another SP method that has been embraced in health economics. A key appeal of DCEs in this setting is that they provide rich sources of data which in turn allow investigation of a broad range of research questions to inform clinical, policy and commercial decision-making, some of which would otherwise be intractable analytically.

The first DCE was undertaken in the early 1990s (Propper, 1991) and the approach was popularized by Ryan and colleagues in the mid to late 1990s (Ryan and Gerard, 2003).1 These early
studies focused on applying DCEs in clinical settings to elicit patient preferences and quantify trade-offs, focusing on process and non-health attributes of the delivery of health care.2 The 2000s saw a broadening in the application of DCEs, both in the contexts in which they were applied and in the types of research questions they were used to address. The past decade also brought much more focus to bear on methods, including, for example, experimental design (orthogonal, D-optimal and efficient designs utilizing priors (Carlson and Martinsson, 2003)), more flexible econometric methods to better capture behavioural phenomena, account for various sources of heterogeneity and improve the statistical properties of the estimated choice models (for example, Hall et al., 2006; King et al., 2007; Hole, 2008; Regier et al., 2009), along with investigation of the consistency of such models with consumer theory and choice axioms more broadly (for example, Lloyd, 2003; San Miguel et al., 2005; Lancsar and Louviere, 2006). The health sector also has been quick to utilize, and indeed lead, extensions of DCE preference elicitation methods, including best-worst scaling (BWS), as discussed below.

DCEs are now routinely used in the health sector in a range of settings. They continue to be used to elicit patient, or in some cases carer, preferences (for example, Burge et al., 2004; van den Berg et al., 2005; Kjaer and Gyrd-Hansen, 2008; Watson et al., 2009) and also, increasingly, clinicians' or other health professionals' preferences for treatment for patients (for example, Mark and Swait, 2004), in recognition that such information can inform clinical decision-making and improve patient adherence to clinical/public health programmes. Applications to labour market decisions of the health workforce (for example, Scott, 2001; Sivey et al., 2012), health-care financing and insurance/regulation (for example, Bech, 2003; Zweifel et al., 2006) and developing country settings (for example, Hanson et al., 2005) are increasing. Interesting work also has been undertaken using DCEs to explore time preferences (for example, van der Pol and Cairns, 2001).

Within such settings DCEs continue to be used to quantify trade-offs between attributes. In addition, they are now used to predict uptake of, and demand for, new health-care treatments and services (for example, Hall et al., 2002), information crucial to planning appropriate levels of health-care provision. There also has been a shift from valuing only process and non-health attributes of health care to also valuing health and health outcomes (for example, King et al., 2007). Indeed, as in environmental valuation, DCEs are used in the health sector to elicit monetary values, including Hicksian compensating variation/willingness to pay (WTP), as a measure of strength of preference and for potential use in cost–benefit analysis (CBA) (Lancsar and Savage, 2004; McIntosh, 2006).
However, in contrast to other areas of applied economics, where it is commonplace to value outcomes in monetary terms for use in CBA, the dominant measure of benefit in the health sector is the quality adjusted life year (QALY) and the dominant form of economic evaluation is cost–utility analysis (CUA), which investigates the comparative cost per QALY.3 This dominance has in part been driven by the central role of QALYs and CUA in drug and health technology reimbursement decisions in a number of Organisation for Economic Co-operation and Development (OECD) countries and by the explicit preference for such analysis expressed by health technology assessment agencies, notably the National Institute for Health and Clinical Excellence (NICE) in the UK, among others. The QALY metric combines length of survival with the quality of life experienced during those years of survival.4 Within this context, the feasibility of using DCEs to derive health-related quality of life values (also referred to as health state utility weights)
to be used in the construction of QALYs is being explored (for example, Ryan et al., 2006; Ratcliffe et al., 2009; Bansback et al., 2012). Work also has been undertaken using DCEs to test the assumptions underlying the QALY model (Viney et al., 2005).

Given the resource-constrained environment of most health sectors around the world, the question of how best to set priorities in the allocation of scarce resources is increasingly important. In this context another growing application of DCEs is in the development of priority-setting frameworks (for example, Johnson and Backhouse, 2006; Ratcliffe et al., 2009): which types of treatment (or indeed patients) should be given priority in funding and, by implication, which should not; and which characteristics of the use of such resources drive those choices? Here DCEs are increasingly used to investigate equity or distributional issues alongside efficiency considerations, and indeed potential trade-offs between such objectives. For example, owing to the lack of a budget constraint built into CUA, it has recently been acknowledged that it is necessary to know what society, or government and other funders, are willing to pay per QALY and whether this value should differ depending on the characteristics of the beneficiaries of QALYs. DCEs have been used to help address these research and policy questions; in particular, Lancsar et al. (2011) used a DCE to elicit distributional weights for QALYs based on the age and severity of the recipients of the QALYs.

DCEs have been embraced not just by academics and funders of research – with the number of published papers applying DCEs in health growing considerably from 34 undertaken from 1990 to 2000 (Ryan and Gerard, 2003) to 114 undertaken from 2001 to 2008 (de Bekker-Grob et al., 2012)5 – but also in commercial settings and, more recently, by policy-makers. The confidential nature of commercial work means such studies are rarely published and are therefore less easy to quantify. The Lancsar et al. (2011) work deriving distributional weights for QALYs is an example of publicly commissioned DCE research for use in policy development, as is the work of Ryan et al. (2006) and Netten et al. (2012) on developing a measure of outcome for social care in the elderly, among others.
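For readers outside health economics, the QALY calculation that underlies CUA can be stated simply (a stylized statement of the standard metric):

\[ \text{QALYs} = \sum_t q_t\, \Delta t, \]

where \( \Delta t \) is the time spent in health state t and \( q_t \in [0,1] \) is the quality weight attached to that state (1 representing full health and 0 death); ten years lived at a weight of 0.8, for example, yields 8 QALYs. The DCE research programme described above concerns both how the weights \( q_t \) are derived and whether QALYs accruing to different beneficiaries should themselves be weighted differently.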
3 KEY CONSIDERATIONS IN APPLYING DCES IN HEALTH
DCEs undertaken in the health sector share many of the same challenges or issues faced in designing, applying and estimating choice models from DCEs in other contexts, such as marketing, transport and environmental and resource economics, which are well documented elsewhere (for example, Louviere et al., 2000; Hensher et al., 2005). In addition, their application in the health sector raises a number of specific considerations which, while not unheard of in other fields, perhaps receive more emphasis or attention in a health setting. Such considerations also make health an interesting area in which to apply such methods. A key example is the high possibility of implausible attribute-level combinations; that is, it may be impossible for some attribute levels to co-occur. This issue arises in particular when evaluating health states. For example, combining the level ‘I am confined to bed’ for a ‘mobility’ attribute with the level ‘I have no problems with performing my usual activities’ on a ‘usual activities’ attribute is implausible. Such combinations pose challenges for the development of experimental designs and can require a trade-off in terms of reduced statistical efficiency in order to increase the realism and credibility of the choice sets, or what has been referred to as respondent efficiency (Viney et al., 2005).
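As a minimal illustration of screening such combinations, the sketch below builds a full factorial candidate set and drops profiles in which the two levels quoted above co-occur. The attribute names, the third (waiting time) attribute and all levels other than the two quoted in the text are hypothetical, chosen only for illustration.

```python
from itertools import product

# Illustrative attributes and levels; the mobility/usual-activities levels
# are those cited in the text, the rest are hypothetical.
attributes = {
    'mobility': ['no problems walking', 'some problems walking', 'confined to bed'],
    'usual_activities': ['no problems', 'some problems', 'unable'],
    'waiting_time_weeks': [1, 4, 12],
}

def plausible(profile):
    """Flag profiles whose levels cannot logically co-occur."""
    # A bed-bound respondent cannot report no problems with usual activities.
    if (profile['mobility'] == 'confined to bed'
            and profile['usual_activities'] == 'no problems'):
        return False
    return True

# Build the full factorial, then drop implausible rows before the design
# is reduced (e.g. to an orthogonal or efficient fraction).
full_factorial = [dict(zip(attributes, levels))
                  for levels in product(*attributes.values())]
candidate_set = [p for p in full_factorial if plausible(p)]

print(f'{len(full_factorial)} profiles in the full factorial, '
      f'{len(candidate_set)} remain after screening')   # 27 -> 24 here
```

Dropping such rows before selecting the final design is one simple way of trading a little statistical efficiency for the respondent efficiency discussed above.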
The nature of decisions made in DCEs in a health context can be complex, often involving consideration of preferences over such attributes as time (for example, treatments now versus in the future) and risk (for example, of morbidity or mortality). The latter requires consideration of risk communication strategies. Depending on the topic of the DCE, respondents may be very unfamiliar with the context (compared with, say, choice of consumer products), owing to respondents’ lack of experience in, or infrequency of, ‘purchasing’ health and health care, particularly when the sample is drawn from the general population rather than a clinical or patient group. This can require considerable ‘front end’ education of respondents in relation to the context of the decisions they will be asked to make in the experiment. It also can raise issues in relation to respondents forming their preferences (with associated scale or random variability issues) during the experiment, rather than approaching the experiment with well-defined preferences as might be the case in other, more familiar choice settings, such as choice of mode of transport to work. Information acceleration, a technique developed in marketing that uses multimedia to accelerate consumers’ knowledge of and experience with a good or service, has been shown to be successful in educating respondents in unfamiliar choice contexts when tested against the choices of people in real markets (Urban et al., 1997). As noted below in relation to future research, such approaches are also likely to prove valuable in a health setting. A related issue is the importance of the choices respondents are asked to make in a health context. For example, choice of treatment in a life and death setting is arguably more important than, say, choice of grocery products. Cumulatively, such issues can increase the cognitive burden of the choice task. A related contextual factor posing potential challenges is the cognitive limitations of some respondent groups, such as the elderly (Ryan et al., 2006; Milte et al., 2013) or patients with specific diseases such as schizophrenia (Bridges et al., 2011). Here a balance is required between the statistical efficiency of the choice design and the ability of the respondents to engage meaningfully in the task. Another consideration is whose preferences to elicit in health applications. Candidates include patients as users of the product/programme of interest; the general public/taxpayers as potential users of health interventions or regulation and as the ultimate funders in publicly provided health-care systems; or private insurers in a more market-based health-care system. Similarly, shared decision-making is a growing research area in health (Thistlethwaite et al., 2006) owing to the agency role health-care providers often play in the health sector. This raises similar issues in relation to whose preferences to elicit out of the principal (patient) or agent (health-care provider), or indeed both (Fiebig et al., 2009). Another key consideration, given the uncertain nature of health and health care, is whether to elicit ex ante or ex post preferences. The answers to such questions will, of course, depend on the specific research and policy questions being addressed.
4 LESSONS FROM OTHER FIELDS
In this section we focus on three examples of areas in which health applications could learn from work undertaken in other fields, namely, demand forecasting and data enrichment/calibration, sample enumeration and attribute variation.
Since, as discussed in section 2, there has been less historic use of RP data and choice models for demand forecasting in health, many issues which have been identified and resolved in other sectors – particularly transport and environmental economics (for example, recreation site choice) – could prove useful to those utilizing models drawing on data from DCEs to predict demand in the health sector. Some recent studies published in health economics have sought to apply models from DCEs to undertake market simulations and forecast the demand for alternatives. For example, Vujicic et al. (2010) looked at the impact of different policy levers on the choice probabilities relating to different career choices of nurses in rural Liberia, while Klein (2011) modelled the choice of location for giving birth for those living in the slums of Hyderabad. In both cases revealed and stated preference data were collected; the observed historic choices of those participating in the survey were collected alongside their stated choices across a range of hypothetical DCE scenarios. However, there may have been a missed opportunity to also include the RP data within the model development, and the papers suggest that the predictions of market behaviour were based on uncalibrated models estimated from the DCE data alone. It is well known that the unit of utility scale in a multinomial logit (MNL) model is inversely related to the error variance (Swait and Louviere, 1993). This means that parameters representing identical preference structures but estimated from data-sets with differing variance will differ in magnitude. Choice probabilities are therefore influenced by model scale, as illustrated in Figure 28.1; the higher the scale factor, the more deterministic the choices being modelled. This becomes important when forecasting demand and market shares from choice experiments as we know from the literature, including from health applications, that the scales of models estimated from stated and revealed preference data-sets can differ (Mark and Swait, 2004). This suggests that caution should be applied when using models estimated purely from DCE data-sets for forecasting choice probabilities and demand for alternatives. In some cases there will be relatively little that can be done. However, frequently there is at least some aggregate RP information available which could be used to estimate calibration constants on alternatives to reproduce known existing market shares, or to calibrate the scale of the model to allow it to better reproduce known elasticities. This can be taken a step further where disaggregate RP data are available at the level of the decision-maker (for example, observations of existing career choice or historic treatment location choice). In such cases models can utilize both the SP DCE data and the RP data in a joint estimation. There are cases where this has been undertaken in health modelling studies (Mark and Swait, 2004; Burge et al., 2005); however, this issue seems to be appreciated more in the transport and environmental economics fields than in health, probably because predicting uptake is more commonly undertaken in those fields. Another consideration in forecasting demand and market shares of alternatives is the extent to which the sample being used for the forecasting procedure reflects the population making choices in the real-world market.
For example, in the study of the career choice of nurses (Vujicic et al., 2010), large facilities were over-sampled and male nurses under-sampled. In the subsequent choice model, observable differences were identified in preferences for those practising in rural areas (which were likely to be correlated with facility size) and also by gender. However, the survey sample was used as the forecasting base for calculating willingness to work in a rural area, and to make inferences about the impact different policies would have on the proportions of nurses choosing to accept jobs in rural areas. For this investigation of different policy scenarios it may have been more informative to weight the sample to better represent the known composition of the population, and to apply the choice model (after calibration) to this weighted sample in a sample enumeration procedure to calculate the demand for each alternative (Train, 2003); this would have led to forecasts more closely representing the impact that a given policy intervention might achieve.
[Figure 28.1 The effect of the scale parameter on choice probability: the probability Pi of choosing alternative i plotted against the utility difference Vi − Vj over the range −3 to 3, with the curve steepening (choices becoming more deterministic) as scale increases. Source: Adamowicz et al. (1998).]
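The following sketch illustrates both points just made: the scale effect of Figure 28.1 and the weighted sample enumeration procedure. All utilities, segments and weights are hypothetical numbers invented for illustration; this is not the specification of any of the studies cited.

```python
import numpy as np

def mnl_probabilities(V, scale=1.0):
    """MNL choice probabilities; the scale parameter is inversely
    related to the standard deviation of the error terms."""
    expV = np.exp(scale * V)
    return expV / expV.sum(axis=-1, keepdims=True)

# The pattern in Figure 28.1: for a fixed utility difference, a larger
# scale implies more deterministic choice probabilities.
V = np.array([0.5, 0.0])                  # hypothetical utilities, two alternatives
for scale in (0.5, 1.0, 2.0, 5.0):
    print(scale, mnl_probabilities(V, scale).round(3))

# Weighted sample enumeration (Train, 2003): aggregate demand is the
# weighted average of predicted probabilities, with weights correcting
# the sample towards the known population composition.
V_segments = np.array([[0.5, 0.0],        # hypothetical segment 1 utilities
                       [-0.3, 0.0]])      # hypothetical segment 2 utilities
weights = np.array([0.2, 0.8])            # population (not sample) shares
forecast = weights @ mnl_probabilities(V_segments)
print('forecast market shares:', forecast.round(3))
```

If a model estimated on DCE data alone were applied with the wrong scale, the first loop shows how far the predicted shares could drift, which is precisely why calibration against RP information matters before forecasting.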
The discussion above highlights that there are cases where RP data can usefully complement SP data in health studies. The scope for this may be less restricted than has previously been considered – partly as a result of changes in the policy environment, where more market contexts are being introduced within health care, and partly as a result of DCEs and choice models being applied to an increasingly broad range of policy issues. The challenge, of course, is to establish good RP data-sets that can support the development of choice models. In work undertaken in the UK on patients’ choices between hospitals (Burge et al., 2005), RP data were available on the alternatives chosen by the participants in the programme. However, the challenge for developing choice models was that the characteristics of the alternatives not chosen were not recorded, so had to be inferred.
This highlights the potential benefits that might be gained if researchers could engage with those setting up information technology systems to be used to support real-world decision-making. Relatively simple changes to the data being routinely retained on record could produce large data-sets which would be highly amenable to supporting the development of future choice models and providing important policy insights. A further area which has seen recent attention in other fields is the issue of making choices under circumstances of service variation. For example, in the transport field there has been increasing interest in presenting travel-time reliability within choice experiments to explore how travellers respond to variability and uncertainty. A range of different approaches have been tested to convey to individuals the concept that a journey may take different amounts of time on different days (Hollander, 2006; Tseng et al., 2009). This concept of variation in attributes might usefully be considered further in a health context, with applications to both the concepts of uncertainty and risk. For example, when facing choices of treatment there are likely to be variations both in surgery outcomes and in the probability of each outcome, some of which could have significant consequences. This is an interesting area where translation of ideas and approaches between researchers working in different sectors may prove fruitful. It should be noted that while the need to address this issue has been acknowledged in the broader literature, there is still a lack of consensus on how best to present such concepts to survey respondents, and additional research on this is required. However, taking this research agenda forward in the health field may usefully lead to methodological developments that could feed back to other areas where a range of competing approaches are currently being tested.
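As a purely illustrative sketch of one way such variability can be conveyed, the code below draws a handful of ‘possible days’ from a hypothetical journey-time distribution, a presentation style in the spirit of the reliability experiments cited above, and also computes the mean-dispersion summary an analyst might use when specifying a model; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical distribution of journey times (minutes) for one alternative;
# in a health DCE the 'outcomes' could instead be treatment results with
# associated probabilities.
times = np.array([20, 25, 30, 40, 60])
probs = np.array([0.30, 0.30, 0.20, 0.15, 0.05])

# One presentation format discussed in this literature shows respondents a
# few 'possible days' drawn from the distribution, rather than a mean plus
# an abstract variance measure.
five_days = rng.choice(times, size=5, p=probs)
print('Five possible journey times:', sorted(five_days))

# The analyst can then summarize the same distribution for model
# specification, e.g. a mean-dispersion representation of (un)reliability.
mean_time = times @ probs
std_time = np.sqrt(((times - mean_time) ** 2) @ probs)
print(f'mean = {mean_time:.1f} min, std dev = {std_time:.1f} min')
```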
5 LESSONS FROM HEALTH FOR THOSE WORKING IN OTHER FIELDS
Just as those applying DCEs in health can learn from their application in other fields, other fields also could potentially benefit from research undertaken in a health setting. Again we focus on three examples, namely, work on best-worst scaling and related estimation, a qualitative framework for development of DCEs and utilizing DCEs in priority setting. Best-worst scaling methods (which are the topic of Chapter 8 in this volume) are increasingly used and have been further developed in the health sector. All three cases have been applied in health (Flynn et al., 2007; Lancsar et al., 2007; Louviere and Flynn, 2010). Best-worst scaling case 2, which asks respondents to choose best and worst attribute levels within single profiles, was first used in a health application in the late 1990s (Szeinbach et al., 1997) with several more recent applications (for example, Flynn et al., 2007; Lancsar et al., 2007; Coast et al., 2008; Potoglou et al., 2011). Flynn et al. (2007) is the first case 2 user guide in the DCE literature. Two features of this method are appealing in a health setting. First, it facilitates measurement of attribute levels on common scales, permitting inter-dimensional utility comparisons without a numeraire (Flynn et al., 2007). Second, it also has been suggested that such experiments are cognitively easier than a standard DCE, which is potentially useful given the discussion above of the cognitive limitations of some relevant respondent groups in the health
sector. However, the researcher forgoes the ability to undertake more traditional policy analysis such as predicting uptake and welfare analysis. Methodological work comparing BWS case 2 to a standard DCE also has been undertaken in a health setting (Potoglou et al., 2011). Similarly, one of the first BWS case 3 studies, which embeds a best-worst task within a standard DCE, was undertaken in a health setting (Lancsar and Louviere, 2005) and the first case 3 user guide also was developed in health economics (Lancsar et al., 2013). This type of best-worst task produces much more choice and preference data compared with a standard DCE, which is highly relevant in settings that face sample size constraints; for example, owing to small populations from which to sample, as is the case for specialist health-care providers, policy-makers and patients with rare diseases. This also facilitates estimation of models for single individuals (Louviere et al., 2008; Lancsar and Louviere, 2009). Indeed, Lancsar and Louviere (2009) developed a new analytical model, the sequential best-worst multinomial logit (SBWMNL), which accounts for the best-worst nature of the data-generation process and which they harness to estimate models both for single individuals and at the sample level; the model has subsequently been used in other literatures (for example, Collins and Rose, 2011), and a stylized sketch of its likelihood structure is given at the end of this section. Lancsar et al. (2013) demonstrate generalization of the SBWMNL model to account for various sources of scale and preference heterogeneity in sample-level models. A further area of methods development likely to be useful beyond health settings is a qualitative framework specific to the development of attributes and levels and broader DCE study design, developed by Coast and colleagues (Coast and Horrocks, 2007; Coast et al., 2012). This framework provides greater rigour to the development of attributes and the way that they are described within the context of choice experiments, and can help ensure that the analyst is collecting data that more closely relate to the choice processes that they wish to understand. It also can reduce the risk of ambiguities in what is actually measured, and resources devoted at this early stage of experimental design have the potential to greatly increase the quality of the data subsequently available for the estimation of choice models. Finally, given increasing resource constraints in all sectors of the economy, the methods developed in health to derive distributional weights, and priority setting frameworks more generally, outlined in section 2, could prove useful beyond the health sector.
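The sketch promised above gives the flavour of a sequential best-worst likelihood as we read it: the best alternative is an MNL choice over the full set, and the worst an MNL choice over the remaining alternatives with utilities entering negatively. This is a stylized reading rather than a reproduction of Lancsar and Louviere’s (2009) exact specification, and the utilities are hypothetical.

```python
import numpy as np

def mnl(V):
    """MNL probabilities via a numerically stable softmax."""
    e = np.exp(V - V.max())
    return e / e.sum()

def sequential_best_worst_prob(V, best, worst):
    """Probability of observing the pair (best, worst) under a sequential
    process: best is an MNL choice over all alternatives; worst is an MNL
    choice over the remainder with utilities entering with a minus sign."""
    p_best = mnl(V)[best]
    remaining = [j for j in range(len(V)) if j != best]
    p_worst = mnl(-V[remaining])[remaining.index(worst)]
    return p_best * p_worst

# Hypothetical utilities for a four-alternative best-worst task.
V = np.array([1.2, 0.4, 0.0, -0.8])
print(sequential_best_worst_prob(V, best=0, worst=3))
```

The extra information relative to a single first choice is visible in the likelihood itself: each task contributes two choice probabilities rather than one, which is what makes individual-level estimation feasible with modest numbers of tasks.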
6 RESEARCH AGENDA
The complexity of the health sector and the importance of the decisions made in the sector make it a fascinating area in which to develop and use DCE methods. The above discussion naturally leads to areas that would benefit from further research. Such research needs stem both from the evolution and extension of DCE and choice modelling methods, and from changing policy and societal questions in the health sector which such methods can help to address. The potential complexity of decision-making relating to health and health care suggests the need for more research on cognitive burden and the best ways to minimize it while still generating choice experiments that elicit statistically useful information on which to model choice and forecast behaviour. Here research drawing on and investigating
psychological insights about human behaviour and decision-making would be beneficial (Adamowicz et al., 2008). As noted elsewhere (Lancsar and Louviere, 2008), information acceleration also could play a role in this area to help educate respondents regarding the choice context. Similarly, laboratory experiments could prove valuable for testing DCEs and for investigating decision-making in a controlled setting. Increasing the behavioural realism of the choices respondents are asked to make will likely lead to continued movement away from forced choice experiments. Research on, and application of, econometric models that potentially more accurately capture the behavioural choice process, as well as the choice outcome, is warranted. For example, work undertaken in marketing and other fields on choice-set formation and consideration sets (for example, Swait and Erdem, 2007) would be relevant in many health applications and, indeed, is starting to be explored (Fiebig et al., 2012). Applications of BWS are likely to continue to grow, and research into the optimal dimensions of such studies in relation to numbers of attributes, levels, choice sets and extra best-worst questions to include without compromising reliability and validity would be welcome. We expect capturing preference heterogeneity (both observed and unobserved) to remain an important research endeavour. Indeed, Fiebig et al. (2010) found that preference heterogeneity was roughly twice as important for medical decisions as for those involving choice of consumer products. More generally we expect expanding applications of recent methodological developments allowing estimation of models per individual, which offer new ways to capture preference heterogeneity along with other forms of heterogeneity (Louviere et al., 2008; Lancsar and Louviere, 2009). Such methods are likely to prove particularly useful in the relatively new area of personalized medicine. We also expect the recent interest in health applications in modelling scale heterogeneity (Flynn et al., 2010; Lancsar et al., 2013) to increase, particularly given the potential for complex decision applications noted earlier. Other forms of heterogeneity, such as the decision rules respondents use, and modelling of deviations from compensatory choice processes (for example, Araña et al., 2008), warrant further attention, as does attribute non-attendance (for example, Lagarde, 2013). While internal validity has received considerable research interest, external validity has been largely ignored in health economics (Mark and Swait, 2004, and Ryan and Watson, 2009, are notable exceptions), but work in this area would add further confidence in the DCE method. As noted elsewhere (Louviere and Lancsar, 2009), health applications afford more opportunities than previously thought in which to harness RP data as one way to investigate the external validity of SP results. Even within systems with public insurance where care is usually free at the point of use, there remain areas of health care paid for privately; for example, medications are often paid for, at least in part, privately. The use of RP data relates more broadly to further consideration of both calibration and data enrichment, as discussed in section 4. The former is particularly important for forecasting and the latter could potentially lead to more robust choice models, ideally utilizing the strengths of both SP and RP data while mitigating each other’s deficiencies.
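On the calibration point, the standard fixed-point adjustment of alternative-specific constants to reproduce known aggregate shares (described, for example, in Train, 2003) can be sketched as follows; the utilities and target shares are hypothetical, and in practice one constant would be normalized to zero.

```python
import numpy as np

def mnl_shares(V):
    """Sample-average MNL shares from a matrix of systematic utilities
    (rows are decision-makers, columns are alternatives)."""
    e = np.exp(V - V.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)).mean(axis=0)

# Hypothetical systematic utilities for three alternatives over a sample
# of 500 decision-makers, e.g. from a DCE-estimated model.
rng = np.random.default_rng(0)
V = rng.normal(size=(500, 3))

target_shares = np.array([0.5, 0.3, 0.2])   # known aggregate RP market shares
asc = np.zeros(3)                           # calibration constants

# Iteratively adjust the constants until predicted shares match the
# aggregate RP shares (the usual fixed-point update).
for _ in range(50):
    predicted = mnl_shares(V + asc)
    asc += np.log(target_shares / predicted)

print('calibrated constants:', asc.round(3))
print('predicted shares    :', mnl_shares(V + asc).round(3))
```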
Forecasting more generally across time, location and so on is of further research interest. Here benefit transfer could prove useful (Morrison et al., 2002). To date, elasticities are rarely calculated in health DCEs but could prove informative in some settings. From a policy perspective, resource constraints are being felt more keenly than ever,
implying the increasing importance of economic evaluation to help inform the optimal allocation of scarce resources in the health sector. Within this context, as highlighted in their DCE user’s guide, Lancsar and Louviere (2008) suggest that, perhaps uniquely, DCEs can be used to generate outcome measures for use in both CBA and CUA. However, while DCEs have been used to generate both monetary values and QALYs, to date these have not been used in a CBA or CUA, so this represents an obvious avenue for further research. A related issue is that even if a health-care intervention has a relatively low cost per QALY, it could still have a large budgetary impact if utilized by large numbers. DCEs can be used to predict uptake of such interventions and therefore better inform expected budgetary impacts. We expect DCEs will continue to play a role in priority setting frameworks, particularly as equity considerations become increasingly important in the sector. Lifestyle choices such as smoking, alcohol consumption and nutrition are key drivers of health risk factors and the burden of illness, and DCEs are likely to continue to prove useful in understanding such choices. Similarly, population ageing coupled with increases in life expectancy means the elderly will continue to increase as a proportion of the population. Such significant demographic transformations will continue to place upward pressure on expenditure in the health sector. In turn, the needs and preferences of older people will also become increasingly important. DCEs are ideally placed to help address key policy questions related to aged care, which includes health aspects but also non-health considerations associated with ageing (for example, Milte et al., 2013). Finally, as a number of developing countries consider introducing national health schemes, DCEs could help elicit preferences to inform the design of such systems and the content of basic packages of care. More generally, we expect the lines between fields in which DCEs are applied to become more blurred, given the current and potential overlap between fields such as health and environment (for example, environmental health applications such as water – Adamowicz et al., 2011 – and air pollution), marketing (for example, medications, devices and so on) and labour economics (for example, job decisions of the health workforce), among others. Indeed, we expect this handbook, and hope this chapter, will help in this cross-fertilization endeavour.
NOTES
1. Early applications referred to such studies as conjoint analysis but more recently most of the literature has settled on the terminology of DCE (Louviere et al., 2010).
2. Examples of process attributes include location of treatment, waiting time, staff attitudes, and so on; examples of non-health outcomes include information provision, reassurance, and so on.
3. Cost per QALY analysis is also sometimes referred to as cost-effectiveness analysis (CEA) where effectiveness is measured in QALYs; but of course CEA is more general in that it also encompasses analysis undertaken with other measures of effectiveness.
4. For example, 1 QALY represents one year in full health or could also be generated by various combinations of longer durations in less than full health.
5. This excludes DCE methods papers published during this period and some missed papers.
REFERENCES
Adamowicz, W., D. Bunch, T.A. Cameron, B.G.C. Dellaert, M. Hanneman, M. Keane, J. Louviere, R. Meyer, T. Steenburgh and J. Swait (2008), ‘Behavioral frontiers in choice modeling’, Marketing Letters, 19 (3–4), 215–28.
Adamowicz, W., D. Dupont, A. Krupnick and J. Zhang (2011), ‘Valuation of cancer and microbial disease risk reductions in municipal drinking water: an analysis of risk context using multiple valuation methods’, Journal of Environmental Economics and Management, 61 (2), 213–26.
Adamowicz, W., J. Louviere and J. Swait (1998), ‘Introduction to attribute-based stated choice methods’, final report to Resource Valuation Branch, Damage Assessment Center, NOAA, US Department of Commerce.
Akin, J., C. Griffin, D. Guilkey and B. Popkin (1985), The Demand for Primary Health Services in the Third World, Totowa, NJ: Rowman and Allanheld.
Araña, J., C. León and M. Hanemann (2008), ‘Emotions and decision rules in discrete choice experiments for valuing health care programmes for the elderly’, Journal of Health Economics, 27 (3), 753–69.
Bansback, N., J. Brazier, A. Tsuchiya and A. Anis (2012), ‘Using a discrete choice experiment to estimate health state utility values’, Journal of Health Economics, 31 (1), 306–18.
Bech, M. (2003), ‘County council politicians’ choice of hospital payment scheme: a discrete choice study’, Applied Health Economics and Health Policy, 2 (4), 225–32.
Bridges, J.F., E.T. Kinter, A. Schmeding, I. Rudolph and A. Mühlbacher (2011), ‘Can patients diagnosed with schizophrenia complete choice-based conjoint analysis tasks?’, The Patient, 4 (4), 267–75.
Burge, P., N. Devlin, J. Appleby, C. Rohr and J. Grant (2004), ‘Do patients always prefer quicker treatment? A discrete choice analysis of patients’ stated preferences in the London patient choice project’, Applied Health Economics and Health Policy, 3 (4), 183–94.
Burge, P., N. Devlin, J. Appleby, C. Rohr and J. Grant (2005), ‘London Patient Choice Project evaluation: a model of patients’ choices of hospital from stated and revealed preference choice data’, report prepared for the London Patient Choice Project, RAND Europe TR-230-DOH.
Cameron, A., P.K. Trivedi, F. Milne and J.R. Piggott (1988), ‘A microeconometric model of the demand for health care and health insurance in Australia’, Review of Economic Studies, 55 (1), 85–106.
Carlsson, F. and P. Martinsson (2003), ‘Design techniques for stated preference methods in health economics’, Health Economics, 12 (4), 281–94.
Coast, J. and S. Horrocks (2007), ‘Developing attributes and levels for discrete choice experiments using qualitative methods’, Journal of Health Services Research and Policy, 12 (1), 25–30.
Coast, J., T. Flynn, L. Natarajan, K. Sproston, J. Lewis, J. Louviere and T. Peters (2008), ‘Valuing the ICECAP capability index for older people’, Social Science & Medicine, 67 (5), 874–82.
Coast, J., H. Al-Janabi, E. Sutton, S. Horrocks, J. Vosper, D. Swancutt and T. Flynn (2012), ‘Using qualitative methods for attribute development for discrete choice experiments: issues and recommendations’, Health Economics, 21 (6), 730–41.
Collins, A.C. and J.M. Rose (2011), ‘Estimation of stochastic scale with best-worst data’, paper presented at the International Choice Modelling Conference, Leeds, 4–6 July.
De Bekker-Grob, E.W., M. Ryan and K. Gerard (2012), ‘Discrete choice experiments in health economics: a review of the literature’, Health Economics, 21 (2), 145–72.
Fiebig, D.G., M. Haas, I. Hossain, D.J. Street and R. Viney (2009), ‘Decisions about Pap tests: what influences women and providers?’, Social Science and Medicine, 68 (10), 1766–74.
Fiebig, D.G., M.P. Keane, J.J. Louviere and N. Wasi (2010), ‘The generalized multinomial logit model: accounting for scale and coefficient heterogeneity’, Marketing Science, 29 (3), 393–421.
Fiebig, D., R. Viney, S. Knox, M. Haas, D. Street, A. Risa Hole, E. Weisberg and D. Bateson (2012), ‘Consideration sets and their role in modelling doctor recommendations about contraceptives’, paper presented at the Econometric Society Australasian Meeting, Melbourne, 3–6 July.
Flynn, T., J. Louviere, T. Peters and J. Coast (2010), ‘Using discrete choice experiments to understand preferences for quality of life: variance-scale heterogeneity matters’, Social Science & Medicine, 70 (12), 1957–65.
Flynn, T.N., J. Louviere, T.J. Peters and J. Coast (2007), ‘Best-worst scaling: what it can do for health care research and how to do it’, Journal of Health Economics, 26 (1), 171–89.
Gertler, P., L. Locay and W. Sanderson (1987), ‘Are user fees regressive? The welfare implications of health care financing proposals in Peru’, Journal of Econometrics, 36 (1–2, supp.), 67–88.
Hall, J., D.G. Fiebig, M.T. King, I. Hossain and J.J. Louviere (2006), ‘What influences participation in genetic carrier testing? Results from a discrete choice experiment’, Journal of Health Economics, 25 (3), 520–37.
Hall, J., P. Kenny, M. King, J. Louviere, R. Viney and A. Yeoh (2002), ‘Using stated preference discrete choice modelling to evaluate the introduction of varicella vaccination’, Health Economics, 11 (2), 457–65.
Hanson, K., B. McPake, P. Nakamba and L. Archard (2005), ‘Preferences for hospital quality in Zambia: results from a discrete choice experiment’, Health Economics, 14 (7), 687–701.
Hensher, D.A., J.M. Rose and W.H. Greene (2005), Applied Choice Analysis: A Primer, Cambridge: Cambridge University Press.
Hole, A.R. (2008), ‘Modelling heterogeneity in patients’ preferences for the attributes of a general practitioner appointment’, Journal of Health Economics, 27 (4), 1078–94.
Hollander, Y. (2006), ‘Direct versus indirect models for the effects of unreliability’, Transportation Research Part A: Policy and Practice, 40 (9), 699–711.
Johnson, F.R. and M. Backhouse (2006), ‘Eliciting stated preferences for health-technology adoption criteria using paired comparisons and recommendation judgments’, Value in Health, 9 (5), 303–11.
King, M.T., J. Hall, E. Lancsar, D. Fiebig, I. Hossain, J. Louviere, H.K. Reddel and C.R. Jenkins (2007), ‘Patient preferences for managing asthma: results from a discrete choice experiment’, Health Economics, 16 (7), 703–17.
Kjaer, T. and D. Gyrd-Hansen (2008), ‘Preference heterogeneity and choice of cardiac rehabilitation program: results from a discrete choice experiment’, Health Policy, 85 (1), 124–32.
Klein, T. (2011), ‘Why do India’s urban poor choose to go private? Hospital choice experiments in slums of Hyderabad’, Proceedings of the Second European Research Conference on Microfinance, Groningen: European Microfinance Platform.
Lagarde, M. (2013), ‘Investigating attribute non-attendance and its consequences in choice experiments with latent class models’, Health Economics, 22 (5), 554–67.
Lancsar, E. and J. Louviere (2005), ‘Several methods for dealing with scale confound and efficiency in stated preference data with an empirical illustration’, Health Economics Study Group, Newcastle upon Tyne.
Lancsar, E. and J. Louviere (2006), ‘Deleting “irrational” responses from discrete choice experiments: a case of investigating or imposing preferences?’, Health Economics, 15 (8), 797–811.
Lancsar, E. and J. Louviere (2008), ‘Conducting discrete choice experiments to inform healthcare decision making: a user’s guide’, Pharmacoeconomics, 26 (8), 661–77.
Lancsar, E. and J. Louviere (2009), ‘Estimating individual level discrete choice models and welfare measures using best worst choice experiments and sequential best worst MNL’, CenSoC Working Paper No. 08-003, Sydney.
Lancsar, E. and E. Savage (2004), ‘Deriving welfare measures from discrete choice experiments: inconsistency between current methods and random utility and welfare theory’, Health Economics, 13 (9), 901–7.
Lancsar, E., J. Louviere and T. Flynn (2007), ‘Several methods to investigate relative attribute impact in stated preference experiments’, Social Science & Medicine, 64 (8), 1738–53.
Lancsar, E., J. Louviere, C. Donaldson, G. Currie and L. Burgess (2013), ‘Best worst discrete choice experiments in health: methods and an application’, Social Science and Medicine, 76 (1), 74–82.
Lancsar, E., J. Wildman, C. Donaldson, M. Ryan and R. Baker (2011), ‘Deriving distributional weights for QALYs through discrete choice experiments’, Journal of Health Economics, 30 (2), 466–78.
Louviere, J. and T.N. Flynn (2010), ‘Using best-worst scaling choice experiments to measure public perceptions and preferences for healthcare reform in Australia’, The Patient: Patient Centered Outcomes Research, 3 (4), 275–83.
Louviere, J. and E. Lancsar (2009), ‘Choice experiments in health: the good, the bad, and the ugly and toward a brighter future’, Health Economics, Policy, and Law, 4 (4), 527–46.
Louviere, J.J., T.N. Flynn and R.T. Carson (2010), ‘Discrete choice experiments are not conjoint analysis’, Journal of Choice Modelling, 3 (3), 57–72.
Louviere, J., D.A. Hensher and J.D. Swait (2000), Stated Choice Methods: Analysis and Application, Cambridge: Cambridge University Press.
Louviere, J., A. Street, L. Burgess, N. Wasi, T. Islam and A. Marley (2008), ‘Modeling the choices of individual decision-makers by combining efficient choice experiment designs with extra preference information’, Journal of Choice Modelling, 1 (1), 128–63.
Lloyd, A.J. (2003), ‘Threats to the estimation of benefit: are preference elicitation methods accurate?’, Health Economics, 12 (5), 393–402.
Mark, T.L. and J. Swait (2004), ‘Using stated preference and revealed preference modeling to evaluate prescribing decisions’, Health Economics, 13 (6), 563–73.
McIntosh, E. (2006), ‘Using discrete choice experiments within a cost benefit analysis framework: some considerations’, Pharmacoeconomics, 24 (9), 855–68.
Milte, R., J. Ratcliffe, M. Miller, C. Whitehead, I. Cameron and M. Crotty (2013), ‘What are frail older people prepared to endure to achieve improved mobility following hip fracture? A discrete choice experiment’, Journal of Rehabilitation Medicine, 45 (1), 81–6.
Morrison, M., J. Bennett, R. Blamey and J. Louviere (2002), ‘Choice modelling and tests of benefit transfer’, American Journal of Agricultural Economics, 84 (1), 161–70.
Netten, A., P. Burge, J. Malley, D. Potoglou, A.M. Towers, J. Brazier, T. Flynn, J. Forder and B. Wall (2012), ‘Outcomes of social care for adults: developing a preference-weighted measure’, Health Technology Assessment, 16 (16), 1–166.
Potoglou, D., P. Burge, T. Flynn, A. Netten, J. Malley, J. Forder and J. Brazier (2011), ‘Best worst scaling vs. discrete choice experiments: an empirical comparison using social care data’, Social Science and Medicine, 72 (10), 1717–27.
Propper, C. (1991), ‘Contingent valuation of time spent on NHS waiting lists’, Economic Journal, 100 (400), 193–9.
Ratcliffe, J., H.L. Bekker, P. Dolan and R. Edlin (2009), ‘Examining the attitudes and preferences of health care decision-makers in relation to access, equity and cost-effectiveness: a discrete choice experiment’, Health Policy, 90 (1), 45–57.
Ratcliffe, J., J. Brazier, A. Tsuchiya, T. Symmonds and M. Brown (2009), ‘Using DCE and ranking data to estimate cardinal values for health states for deriving a preference-based single index from the sexual quality of life questionnaire’, Health Economics, 18 (11), 1261–76.
Regier, D.A., M. Ryan, E. Phimister and C.A. Marra (2009), ‘Bayesian and classical estimation of mixed logit: an application to genetic testing’, Journal of Health Economics, 28 (3), 403–13.
Ryan, M. and K. Gerard (2003), ‘Using discrete choice experiments to value health care programmes: current practice and future research reflections’, Applied Health Economics and Health Policy, 2 (1), 55–64.
Ryan, M. and V. Watson (2009), ‘Comparing welfare estimates from payment card contingent valuation and discrete choice experiments’, Health Economics, 18 (4), 389–401.
Ryan, M., A. Netten, D. Skatun and P. Smith (2006), ‘Using discrete choice experiments to estimate a preference-based measure of outcome – an application to social care for older people’, Journal of Health Economics, 25 (5), 927–44.
San Miguel, F., M. Ryan and M. Amaya-Amaya (2005), ‘Irrational stated preferences: a quantitative and qualitative investigation’, Health Economics, 14 (3), 307–22.
Scott, A. (2001), ‘Eliciting GPs’ preferences for pecuniary and non-pecuniary job characteristics’, Journal of Health Economics, 20 (3), 329–47.
Sivey, P., A. Scott, J. Witt, C. Joyce and J. Humphreys (2012), ‘Junior doctors’ preferences for specialty choice’, Journal of Health Economics, 31 (6), 813–23.
Swait, J. and T. Erdem (2007), ‘Brand effects on choice and choice set formation under uncertainty’, Marketing Science, 26 (5), 679–97.
Swait, J. and J. Louviere (1993), ‘The role of the scale parameter in the estimation and comparison of multinomial logit models’, Journal of Marketing Research, 30 (3), 305–14.
Szeinbach, S.L., J.H. Barnes and D. Garner (1997), ‘Use of pharmaceutical manufacturers value added services to build customer loyalty’, Journal of Business Research, 40 (3), 229–36.
Thistlethwaite, J., R. Evans, R.N. Tie and C. Heal (2006), ‘Shared decision making and decision aids – a literature review’, Australian Family Physician, 35 (7), 537–40.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge: Cambridge University Press.
Tseng, Y.Y., E. Verhoef, G. de Jong, M. Kouwenhoven and T. van der Hoorn (2009), ‘A pilot study into the perception of unreliability of travel times using in-depth interviews’, Journal of Choice Modelling, 2 (1), 8–28.
Urban, G.L., J.R. Hauser, W.J. Qualls, B.D. Weinberg, J.D. Bohlmann and R.A. Chicos (1997), ‘Information acceleration: validation and lessons from the field’, Journal of Marketing Research, 34 (1), 143–53.
Van den Berg, B., M. Al, W. Brouwer, J. van Exel and M.A. Koopmanschap (2005), ‘Economic valuation of informal care: the conjoint measurement method applied to informal caregiving’, Social Science & Medicine, 61 (6), 1342–55.
Van der Pol, M. and J. Cairns (2001), ‘Estimating time preferences for health using discrete choice experiments’, Social Science and Medicine, 52 (9), 1459–70.
Viney, R., E. Savage and J. Louviere (2005), ‘Empirical investigation of experimental design properties of discrete choice experiments in health care’, Health Economics, 14 (4), 349–62.
Vujicic, M., M. Alfano, M. Ryan, C. Sanford Wesseh and J. Brown-Annan (2010), ‘Policy options to attract nurses to rural Liberia: evidence from a discrete choice experiment’, World Bank HNP Discussion Paper.
Watson, V., M. Ryan and E. Watson (2009), ‘Valuing experience factors in the provision of chlamydia screening: an application to women attending the family planning clinic’, Value in Health, 12 (4), 621–3.
Zweifel, P., H. Telser and S. Vaterlaus (2006), ‘Consumer resistance against regulation: the case of health care’, Journal of Regulatory Economics, 29 (3), 319–32.
29 Transport research needs
Juan de Dios Ortúzar, Elisabetta Cherchi and Luis Ignacio Rizzi
1 INTRODUCTION
Transport is a large, multidisciplinary and fascinating field, encompassing vastly different areas of research. In fact, transport interests span from survey method issues that are still not well understood in fieldwork to highly complex questions associated with the dynamic equilibration of supply and demand in strategic planning contexts; the latter involving large zoning systems, huge multimodal networks and highly complex dynamic modelling approaches (Mahmassani, 2001). But questions also arise at a more macro level (and in a different time span) regarding the interaction of transport and land use, and also at the more micro level with the dynamics of road traffic and public transport modelling, an area which is particularly interesting due to its high complexity in less developed nations (de Cea et al., 2005). We do not have the expertise or the space to dwell on all these issues. For these reasons, in this chapter we will concentrate on issues related to modelling the demand for travel in the relatively short term. In particular, we will refer to modelling discrete short-term choices, such as mode, route and/or trip timing; although in our analysis we will pay attention to research and policies oriented to eco-sustainable transport, we will not cover broader issues of recent interest such as ‘behave green’ (which may span from choosing green holidays to choosing eco-food). Thus, we leave aside medium- and long-term choices such as destination, travel frequency, location and so on, and – of course – the highly complex issues associated with equilibrating supply and demand in the context of large-scale long-term strategic problems involving land use and transport interactions. A recent major conference workshop was dedicated to the future of travel behaviour and demand modelling; interested readers should consult Sivakumar et al. (2012) and some of the papers referred to therein for a complementary vision to ours. The rest of the chapter is organized as follows. In section 2 we will consider theoretical issues, mainly dealing with the need to keep advancing the integration of microeconomic and psychological modelling approaches. In section 3 we will look at policy issues that demand new research, including a brief overview of related modelling issues that go beyond the current state of practice. Finally, in section 4 we will let our prejudices for the future become apparent.
2 THEORETICAL ISSUES
Probably the major theoretical challenge that discrete choice models (DCM) need to face in the coming years is the integration between the theories of microeconomics and
psychology. The most popular models used to study the demand for transport are based on the neoclassical economics assumption of rational decision-makers (or, even more extreme, homo economicus), who always perform well-planned and consistent activities, aiming to maximize some subjective measure of value. On the other hand, research in cognitive psychology, marketing, sociology and, more recently, behavioural economics has systematically demonstrated that individuals often violate all the assumptions of the rational postulate (Hensher and Li, 2012). In fact, DCM only allow for non-rational effects by interpreting them as part of the random error in the modeller’s specification of utility. The last decade has seen major improvements in model fit obtained with complex decompositions of these error terms, which give a strong signal that there are inherent limitations in the capability of microeconomic theory to explain individual choices and that there is interesting research to be done in order to achieve a satisfactory representation (through known variables) of the real phenomenon. Transport research interests have significantly shifted lately in favour of psychological theory, and an increasing amount of evidence of departures from rationality has been produced (for example, Walker, 2001; Gärling and Axhausen, 2003; Bonsall et al., 2007). Most of this work has concentrated on demonstrating empirically that integrating psychological theory into the economic framework results in tangible improvements to model fit but, interestingly, most of it has been based on the DCM structure. This is a key point, as these new models incorporate strong non-rationality effects but mainly in the indirect utility function; hence they not only assume that the fundamental microeconomic assumptions hold, but are also affected by the same limitations as current DCM. The integration of psychological effects breaks the elegance and consistency of microeconomic theory and poses highly interesting research challenges for establishing consistent measures of individual welfare change to be used in social cost–benefit analysis. Thus, the integration of microeconomic and psychological theories points to many lines of research (Arentze and Timmermans, 2012; Cherchi, 2012). From a theoretical point of view, it might be important to think about different microeconomic formulations that integrate assumptions of non-perfect rationality at an earlier stage of the decision process, that is, not only in relation to the indirect utility function associated specifically with transport choices. Work is also needed to understand whether the major improvements in fit obtained with, for example, the complex decompositions of the error term now possible can be attributed to approximations in making the theory operational rather than to inherently non-rational behaviour. In terms of forecasting errors, a serious challenge is how to deal with confounding effects between error components and some important microeconomic effects. In this case the problem is that we can end up supporting the wrong policy. For example, Amador et al. (2008) found that randomness in the cost parameter was actually masking the presence of an income effect. But this effect can also appear in the opposite direction; for example, Amador and Cherchi (2011) refer to the possibility that not accounting for heterogeneity might erroneously suggest the presence of income and time effects.
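One stylized way to see this confound is to compare the two specifications below; the notation is ours, introduced for illustration, and is not that of Amador et al. (2008):

\[
U_{in} \;=\; \beta_t\, t_{in} + \beta_c\, \frac{c_{in}}{I_n^{\lambda}} + \varepsilon_{in}
\qquad \text{versus} \qquad
U_{in} \;=\; \beta_t\, t_{in} + (\beta_c + \sigma\,\xi_n)\, c_{in} + \varepsilon_{in},
\]

where t_in and c_in are the travel time and cost of alternative i for individual n, I_n is income, λ > 0 generates an income effect (cost matters less to richer individuals) and ξ_n is a standard random taste term. Both specifications make cost sensitivity vary across respondents, so estimating one when the other better describes behaviour can mask, or spuriously mimic, an income effect.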
From a more practical perspective, if both components of rationality (perfect and bounded) play a role in the decision process, it also seems important to investigate in more depth under what circumstances, and/or to what extent, individuals evaluate alternatives under a perfect rationality rather than a bounded rationality approach. This is crucial if we want to drive individuals towards a particular course of action (such
as choosing environmentally friendly modes), because it might imply totally different strategies. For example, if individuals evaluate alternatives under perfect rationality, a typical supply-side intervention (that is, improving level-of-service) could be sufficient to cause a quick shift in demand. On the other hand, if individuals evaluate alternatives under bounded rationality, it might be important to implement an intervention gradually, to allow people enough time to form new preferences (trying the alternative and experiencing positive feedback); or it might be important to mount a strong information campaign to render the new intervention appealing to peer groups. However, large samples and data of much higher quality are needed to attempt to disentangle these effects. The requirements for good data increase with the complexity of our models. The new formulations integrating psychological and economic theories extend the behavioural capabilities of the model, but are even more demanding in terms of good quality data (Denstadli et al., 2012). On the other hand, new technology (such as Global Positioning System locators, smart phones and so on) has proved to be extremely useful in gaining a deeper and more precise knowledge of individual movements, alleviating at the same time the task of respondents in large surveys. Moreover, as information is available in real time, it is now possible to study how individuals change behaviour based on real-time data. A good example of integration between rational and non-rational theory where high-quality data are crucial is the inertia effect. So far, the problem has been studied either using long panel data, where information is gathered at different points in time, hopefully around changes to the external environment such as transport supply or people’s socioeconomic characteristics (Srinivasan and Bhargavi, 2007; Yáñez et al., 2009a); using short – or continuous – panel data, where information is gathered over a continuous period of time (Ramadurai and Srinivasan, 2006); or using mixed revealed and stated preference data (Morikawa, 1994; Bradley and Daly, 1997; Cantillo et al., 2007; Cherchi and Manca, 2011). However, none of these data types allows taking proper account of the initial conditions problem, because we never observe the very first choice situation faced by individuals (Heckman and Singer, 1986). Thus, a critical piece of information for understanding inertia, the frequency of past behaviour, should also be considered (Thørgersen, 2006; Carrus et al., 2008) and treated as an indicator of habitual behaviour (Cherchi et al., 2013). Better information is required to better understand the process that causes or leads to inertia and to disentangle the different sources of inertia in our demand models. Thus, we believe that panel data on revealed preferences will become more important, definitively leaving the side-lines. Looking specifically at non-rational effects, research has so far concentrated on only one specific aspect of non-rational behaviour at a time. However, individual decision processes are highly complex, and several non-rational components are mutually interrelated and may jointly shape the final decision. The risk involved in studying one effect at a time is that confounding (such as that experienced in classical DCM) might also occur among non-rational behaviours. An important task for the future is to study non-rational aspects jointly, in an attempt to disentangle their relative effects. Mabit et al.
(2012) represents the first work in this direction, as they test the extent to which preferences are affected by attitudes when allowing the preferences to be reference dependent, that is, dependent on changes with respect to reference values rather than on the final outcome itself.
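A stylized reference-dependent specification, in the spirit of the work of De Borger and Fosgerau (2008) cited below and using our own notation, is:

\[
V_{in} \;=\; \sum_{k} \Big( \beta_k^{+}\,\max\!\big(x_{kin} - r_{kn},\, 0\big) \;+\; \beta_k^{-}\,\min\!\big(x_{kin} - r_{kn},\, 0\big) \Big),
\]

where r_kn is the reference value of attribute k for individual n (for example, the level experienced on the current or habitual alternative) and allowing β_k⁺ ≠ β_k⁻ means gains and losses around the reference can be valued asymmetrically. A lagged (inertia) effect, by contrast, would typically enter as an extra utility term attached to the previously chosen alternative; when the references r_kn are themselves built from past choices, this is exactly why the two effects can be hard to tell apart.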
However, operational problems in translating psychological theories into practice also occur and could make it very difficult to distinguish one effect from another. For example, it might be hard to distinguish lagged effects that account for inertia from some forms of reference dependence. The latter is indeed a measure of the experience gathered in the past, and it is accounted for by specifying utility as the difference between current and past (reference) values of a given characteristic (De Borger and Fosgerau, 2008; Masiero and Hensher, 2011). Although most work related to attempts to depart from microeconomic theory has made use of DCM theory, another avenue offering promise in this quest is artificial intelligence (AI), making use of fuzzy theory and neural networks to forecast individual choices under uncertainty and limited information (Avineri, 2012). The integration of economic and psychological theory has already proved to be extremely powerful in improving our model capabilities, and new research in this line should allow us to unlock the immense potential of this approach. However, in order to make these models operational, to provide answers for policy intervention, it is crucial to ‘conduct validation tests, including tests of forecasting ability and performance comparisons with models of simpler formulations’ (Walker, 2001, p. 117). So far there are very few examples of predictions of policy interventions incorporating the effects of bounded rationality. Yáñez et al. (2009b) and Raveau et al. (2010) are the only two examples of an analysis of latent variable models in forecasting, while Swait et al. (2004) and Yáñez et al. (2009a) are among the very few examples of predictions including the effects of inertia. This is a crucial and very stimulating area of research for policy issues. There is also a rich and well-developed field related to how telecommunications and information communication technologies (ICT) might affect travel behaviour, but the relationship between technology and travel behaviour has been controversial. Earlier studies showed that telecommunications led to a reduction in trip frequency and distance travelled (Pendyala et al., 1991). However, more recent studies found that ICT mainly modified the structure of activity/travel patterns but not necessarily the quantity of travel generated (Mokhtarian, 2004; Farag et al., 2007; Wang and Law, 2007). While this relationship is becoming better understood, there is still a need for models that appropriately describe the interrelationships among different forms of ICT and how these affect individual behaviour. In the following section, we present different research topics where DCM can make an impact. All the topics mentioned are relevant transport policy issues that need to be better understood by city authorities to provide transport services that cater for the needs of their inhabitants in a sustainable and environmentally friendly way.
3 POLICY ISSUES
We envision a few areas where DCM could make a real contribution to better policy-making. Urban congestion will keep on being one of the most relevant preoccupations of transport specialists in both the developed and developing world. Motor vehicle congestion contributes to the production of other negative externalities such as accidents, environmental pollution, noise, urban segregation and aesthetic impacts. Different
measures have been tried to address road externalities, most of them pertaining to the command-and-control type, and less often to the incentive-based type (Santos et al., 2010). Unfortunately, the most promising incentive-based type of measure, road pricing, has not yet achieved enough strength to be of widespread use. We believe that, as time goes by, congestion and the aforementioned traffic-related externalities cannot be tackled efficiently if road pricing is not a key element in the menu of options. Policy-makers, however, are reluctant to risk their political capital unless convinced that road pricing may garner political approval. Thus, one fruitful area of work would be examining road pricing acceptability, following the examples of Vrtic et al. (2007), Bascuñán and Ortúzar (2012) and Souche et al. (2012). Empirical research should address such issues as: (1) what type of road-pricing schemes could be more acceptable; and (2) what destination should be given to the funds collected by a road pricing scheme. With respect to the first issue, a key element is how road pricing should work. Should it be a flat price for accessing and driving within a zone? Or should it be a distance-related charge where drivers who use the priced area more should pay more? Should the price depend on the level of congestion and change dynamically (Lu and Mahmassani, 2008), or should it be a flat price across the whole day irrespective of how much congestion there is? These are issues whose resolution might depend mostly on practicalities based on the characteristics of the area under study. With regard to the destination of the funds generated by the scheme, they could be devoted to alleviating drivers’ payments such as petrol taxes; to new public transport investment, building cycle lanes or improving pedestrian facilities; or just considered as another source of funding that could alleviate the levels of general taxes. Preliminary work on these issues, using both qualitative and quantitative approaches (such as focus groups and stated choice methods), suggests that crucial to acceptability is the belief that the ‘new tax’ will not compound but replace previous taxation, and that at least part of the funds will not go to general fiscal coffers, but be specifically allocated to improving transport alternatives to the motor car (Bascuñán and Ortúzar, 2012). Road pricing should have the effect of altering transport users’ behaviour. There are many ‘margins’ along which people might modify their travel behaviour. Some road users may change their travel departure and arrival times, some may switch modes, while others might change their place of work and/or residence in the long term. At the same time, in studying the optimum toll (that is, the marginal external congestion cost), it is important to recognize that user preferences are not homogeneous and that congestion increases travel time variability (Fosgerau and Van Dender, 2013). Discrete choice models are an ideal tool to study transport users’ responses. There is already some evidence on the potential change of behaviour along the margins of departure/arrival time or mode switching (Arellana et al., 2012), but more research is still needed. Sound policy will require local studies to be conducted, as results may not be easily transferable from one metropolitan area to another.
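As a point of reference, the textbook first-best congestion toll for homogeneous users (a deliberate simplification of the heterogeneity and variability issues discussed next) is:

\[
msc(q) \;=\; c(q) + q\,c'(q), \qquad \tau^{*} \;=\; q\,c'(q),
\]

where q is the traffic flow, c(q) the average (private) travel cost per trip at that flow, msc(q) the marginal social cost, and the optimal toll τ* equals the congestion cost a marginal driver imposes on all other drivers. The discussion that follows concerns how this benchmark changes once users differ in their preferences and once travel time variability enters the generalized cost.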
Theoretical studies (Arnott et al., 1994; Arnott and Kraus, 1998) suggest that the optimal toll increases when travellers are more heterogeneous, but there is no empirical evidence on the potential size of this effect. Including travel time variability in the generalized travel cost is also likely to lead to an increased estimate of the optimal toll, but more empirical evidence is required to evaluate how important this effect is in comparison to the many other uncertainties and approximations that may be in play (Fosgerau and Van Dender, 2013).
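One common way of including travel time variability in the generalized travel cost, used here purely as an illustrative convention, is the mean-variance form, under which a traveller facing a random travel time T pays

```latex
\[
GC \;=\; c \;+\; \alpha\,\mathbb{E}(T) \;+\; \beta\,\sigma_{T},
\]
```

where c is the monetary cost, α the value of travel time savings and β the value of reliability. Since congestion raises σ_T as well as E(T), a toll that reduces congestion also removes a reliability cost, which is what pushes the estimate of the optimal toll upwards.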
Also, establishing adequate road tariffs requires determining the external costs of road motoring. There is already a good deal of work on the value of travel time savings, but much less knowledge is available about the welfare losses caused by other road motoring externalities. This is another area where DCM could be very useful: using revealed preference (RP) and/or stated preference (SP) data, models could be estimated that yield households’ willingness to pay for better air quality, noise mitigation or improved neighbourhood amenities (a minimal estimation sketch is given below). Some of these environmental goods have already been studied (Sillano and Ortúzar, 2005; Ortúzar and Rizzi, 2007; Strazzera et al., 2010), but we believe much more research is needed, and that many such studies should be performed across different metropolitan areas. New transport infrastructure (highways and at-grade railways) can disrupt and adversely affect established neighbourhoods; SP surveys could help in adopting better design strategies and/or in determining the compensation to be provided to affected households. Urban segregation is a transport externality that has not been adequately addressed in the valuation literature (Grisolía et al., 2014); it affects neighbourhoods’ accessibility and social interactions, and should be amenable to analysis with the current transport analyst’s tool kit.
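The willingness-to-pay logic just described can be made concrete with a small simulation; the attributes, coefficients and the plain maximum likelihood estimation via scipy below are illustrative assumptions, not the method of any study cited.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated SP data: respondents choose between two neighbourhood
# 'packages' differing in monthly cost and an air-quality index.
N = 2000
cost = rng.uniform(0, 100, (N, 2))   # monthly cost of each alternative
air = rng.uniform(0, 10, (N, 2))     # air-quality index of each alternative
TRUE_B = np.array([-0.05, 0.30])     # 'true' cost and air-quality coefficients
v = TRUE_B[0] * cost + TRUE_B[1] * air
p2 = 1 / (1 + np.exp(v[:, 0] - v[:, 1]))          # logit P(alternative 2)
choice = (rng.uniform(size=N) < p2).astype(int)   # 1 if alternative 2 chosen

def neg_loglik(b):
    """Negative log-likelihood of a binary logit with cost and air terms."""
    u = b[0] * cost + b[1] * air
    p = 1 / (1 + np.exp(u[:, 0] - u[:, 1]))
    return -np.sum(np.log(np.where(choice == 1, p, 1 - p)))

b_hat = minimize(neg_loglik, x0=np.zeros(2)).x
# Marginal WTP for air quality = -(air coefficient / cost coefficient).
print(f"estimated WTP: {-b_hat[1] / b_hat[0]:.2f} per unit of the index")
```

The ratio recovers roughly 6 currency units per unit of the index (0.30/0.05), showing how a marginal rate of substitution between an environmental attribute and money drops straight out of an estimated choice model.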
Related to sustainability, understanding the behaviour of pedestrians and cyclists is another area whose time has come for an expanded use of DCM. If city authorities wish to increase the share of bicycle users, we first need to understand what factors contribute positively to bicycle use. After the pioneering study of Ortúzar et al. (2000), other studies have been carried out, most of them in countries that already actively promote the use of bicycles (Halldórsdóttir et al., 2011), but these have not adopted a strategic planning framework. New studies are required, especially in metropolitan areas where bicycle use is starting to grow; these should incorporate the full complexity of issues associated with bicycle use as a sustainable mode, for example: ownership versus rental of bicycles, particularly for shopping and leisure; the influence of adequate bicycle infrastructure (not only bicycle lanes, but also appropriate and safe places to leave bicycles, both at attractive destinations and at interchange stations); integration with mass transit; and so on.

Another important policy related to sustainability is the adoption of car-sharing schemes. Research in North America (Martin et al., 2010) has shown that car-sharing reduces the average number of vehicles per household from 0.47 to 0.24 and increases fuel economy (the cars shared are often more efficient than the vehicles discarded). Although the car-sharing market has grown significantly, it still represents a fairly low proportion of the total car market, and more research is needed to understand and model what specifically affects and limits individual choices in this case. Specific policies, such as unbundling parking (Schure et al., 2012), may help to create extra demand for car-sharing, but other factors related to individual traits and attitudes might also play a role and would be worth investigating.

The adoption of new, more environmentally friendly vehicle technologies should also be an active area of research. For instance, electric vehicles still suffer from relatively low autonomy, as their batteries need recharging and there
are not many facilities where this can be done as promptly as in the case of petrol cars. Understanding how people chain their activities and how they travel, day by day, together with their individual constraints and acute transport needs (such as picking up a sick child from school), could help in designing a better network of recharging sites, which may accelerate the penetration of such new technologies (a toy feasibility check along these lines is sketched below). At the same time, individuals might hold misconceptions about the impact that the characteristics of this new technology will have on their daily lives, and preferences and attitudes might change as the market for electric vehicles expands and individuals acquire experience with the technology (Jensen et al., 2013). It is important, then, to study the dynamic evolution of individual preferences and attitudes in order to forecast correctly the market penetration of electric vehicles (Jensen et al., 2014). The analysis of car purchase decisions is another key input to better transport planning policy.

A further fruitful area of research is the way people shop and engage in leisure activities. As income increases, people tend to travel more for these purposes, and this travel takes place at many different hours of the day, generating congestion beyond the typical commuting peaks. Understanding how people plan their non-labour activities, and how this affects their travel decisions, will become increasingly necessary, especially in emerging economies with a rapidly growing middle-income urban class. Conversely, the leisure and shopping opportunities of members of households without a car and with limited access to public transport are severely curtailed, restricting their ability to participate in utility-enhancing activities. Understanding how the welfare of these people is affected (for example, via the logsum measure sketched below) would be relevant for socially profitable investments in new public transport infrastructure.
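To make concrete how activity chaining interacts with charging opportunities, the following is a deliberately naive sketch; the trip lengths, dwell times, battery range and charging rate are all invented, and a real application would work from observed multi-day activity chains.

```python
# Toy feasibility check for one day of an electric-vehicle activity chain.
# All numbers are invented for illustration.
TRIPS_KM = [18.0, 7.0, 7.0, 22.0]   # home -> work -> school -> work -> home
DWELL_H = [8.0, 0.3, 4.0]           # hours parked at each intermediate stop
HAS_CHARGER = [True, False, False]  # is a charger available at that stop?

RANGE_KM = 60.0         # usable range on a full battery
CHARGE_KM_PER_H = 15.0  # range recovered per hour of slow charging

def day_is_feasible(trips, dwell, charger, level=RANGE_KM):
    """Walk the chain, draining the battery and recharging where possible."""
    for i, dist in enumerate(trips):
        level -= dist
        if level < 0:
            return False  # the vehicle would be stranded mid-chain
        if i < len(charger) and charger[i]:
            level = min(RANGE_KM, level + CHARGE_KM_PER_H * dwell[i])
    return True

print(day_is_feasible(TRIPS_KM, DWELL_H, HAS_CHARGER))
```

Scanning candidate charger locations over many observed chains would reveal which sites turn the most infeasible days feasible, which is one way the understanding of activity chains argued for above could feed directly into the design of a recharging network.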
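On measuring how the welfare of such households is affected, a standard result for logit models (general to the literature rather than specific to this chapter) gives the expected change in consumer surplus for individual n between a ‘before’ state 0 and an ‘after’ state 1, for example once a new public transport alternative enters the choice set C:

```latex
\[
\Delta \mathbb{E}(CS_n) \;=\; \frac{1}{\alpha_n}
\left[\, \ln \sum_{j \in C^{1}} e^{V^{1}_{nj}}
\;-\; \ln \sum_{j \in C^{0}} e^{V^{0}_{nj}} \right],
\]
```

where the V_nj are systematic utilities and α_n is the individual’s marginal utility of income. Adding an alternative can only increase the logsum, and the gain is largest precisely for those whose existing alternatives are poor, which is what makes the measure relevant for appraising socially profitable public transport investments.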
4 CONCLUSIONS
This chapter has given a brief view of some of the challenges that the transport research community faces. First, we dealt with the issue of better blending microeconomic models of consumer behaviour in the area of discrete choice with insights from psychology and behavioural economics. Consumer behaviour shows several anomalies, both in laboratory experiments and in actual choices, casting doubt on the validity of traditional microeconomic models. Dealing with many of these anomalies in choice models is not an easy task, but there are many successful examples in the DCM literature. A major problem, though, is that when choice models are modified to allow for the inclusion of ‘non-rational’ effects, the sound microeconomic basis of the willingness-to-pay measures needed for social cost–benefit analysis is normally lost.

Second, we highlighted a few areas where discrete choice models could be particularly useful; these share sustainability as a common trait. In our opinion, road pricing will sooner or later become an unavoidable policy instrument if urban areas are committed to dealing adequately with sustainability, and measures to promote more efficient and healthy modes, such as bicycles, will become more urgent and important. In both cases, user behaviour and the external costs of urban road transport need to be modelled and estimated as accurately as possible: we envision a key role for DCM in this regard, but here, as with the previous issues, we foresee a complex
problem associated with the quality and amount of data required to do the job appropriately.
ACKNOWLEDGEMENTS

We are grateful to the Millennium Institute of Complex Engineering Systems (ICM: P05-004F; FONDECYT: FB016), the TEMPO Project (funded by the Research Council of Norway), the Alexander von Humboldt Foundation and the ALC-BRT Centre of Excellence (funded by the Volvo Research and Educational Foundations) for having supported our research. We are also grateful for the positive comments of Stephane Hess and an anonymous referee, which helped us to set a more optimistic tone for the work.
REFERENCES

Amador, F.J. and E. Cherchi (2011), ‘Econometric effects of utility order transformations in discrete choice models’, Networks and Spatial Economics, 11 (3), 419–38.
Amador, F.J., R.M. Gonzalez and J. de D. Ortúzar (2008), ‘On confounding preference heterogeneity and income effect in discrete choice models’, Networks and Spatial Economics, 8 (2), 97–108.
Arellana, J., A.J. Daly, S. Hess, J. de D. Ortúzar and L.I. Rizzi (2012), ‘Developing surveys for the study of departure time choice: a two-stage efficient design approach’, Transportation Research Record, 2303, 9–18.
Arentze, T.A. and H.J.P. Timmermans (2012), ‘Travel demand modelling: conceptual development and perspectives’, Transportation Letters, 4 (2), 79–92.
Arnott, R. and M. Kraus (1998), ‘When are anonymous congestion charges consistent with marginal cost pricing?’, Journal of Public Economics, 67 (1), 45–64.
Arnott, R.A., A. de Palma and R. Lindsey (1994), ‘The welfare effects of congestion tolls with heterogeneous commuters’, Journal of Transport Economics and Policy, 28 (2), 139–61.
Avineri, E. (2012), ‘Travel behaviour research’, in Artificial Intelligence Applications to Critical Transportation Issues, Transportation Research Circular E-C168, Transportation Research Board, Washington, DC.
Bascuñán, R. and J. de D. Ortúzar (2012), ‘Acceptability of road pricing in a developing metropolis: preliminary evidence from qualitative and quantitative data’, working paper, Department of Transport Engineering and Logistics, Pontificia Universidad Católica de Chile.
Bonsall, P., J. Shires, J. Maule, B. Matthews and J. Beale (2007), ‘Responses to complex pricing signals: theory, evidence and implications for road pricing’, Transportation Research, 41A (7), 672–83.
Bradley, M. and A.J. Daly (1997), ‘Estimation of logit models using mixed stated preference and revealed preference information’, in P. Stopher and M. Lee-Gosselin (eds), Understanding Travel Behaviour in an Era of Change, Oxford: Pergamon Press, pp. 209–31.
Cantillo, V., J. de D. Ortúzar and H.C.W.L. Williams (2007), ‘Modelling discrete choices in the presence of inertia and serial correlation’, Transportation Science, 41 (2), 195–205.
Carrus, G., P. Passafaro and M. Bonnes (2008), ‘Emotions, habits and rational choices in ecological behaviour: the case of recycling and use of public transportation’, Journal of Environmental Psychology, 28 (1), 51–62.
Cherchi, E. (2012), ‘Modelling individual preferences: state of the art, recent advances and future directions’, in R.M. Pendyala and C.R. Bhat (eds), Travel Behaviour Research in an Evolving World, Raleigh, NC: Lulu.com Publishers, pp. 207–59.
Cherchi, E. and F. Manca (2011), ‘Accounting for inertia in modal choices: some new evidence using a RP/SP dataset’, Transportation, 38 (4), 679–95.
Cherchi, E., I. Meloni and J. de D. Ortúzar (2013), ‘The latent effect of inertia in the modal choice’, in M. Roorda and E. Miller (eds), Travel Behaviour Research: Current Foundations, Future Prospects, Raleigh, NC: Lulu.com Publishers, pp. 517–34.
De Borger, B. and M. Fosgerau (2008), ‘The trade-off between money and travel time: a test of the theory of reference-dependent preferences’, Journal of Urban Economics, 64 (1), 101–15.
De Cea, J., J.E. Fernández, V. Dekock and A. Soto (2005), ‘Solving network equilibrium problems on multimodal urban transportation networks with multiple user classes’, Transport Reviews, 25 (3), 293–317.
Denstadli, J.M., R. Lines and J. de D. Ortúzar (2012), ‘Information processing in choice-based conjoint experiments: a process-tracing study’, European Journal of Marketing, 46 (3), 422–46.
Farag, S., T. Schwanen, M. Dijst and J. Faber (2007), ‘Shopping online and/or in store? A structural equations model of the relationships between e-shopping and in-store shopping’, Transportation Research, 41 (1), 125–41.
Fosgerau, M. and K. van Dender (2013), ‘Road pricing with complications’, Transportation, 40 (3), 479–503.
Gärling, T. and K. Axhausen (2003), ‘Habitual travel choice’, Transportation, 30 (1), 1–11.
Grisolía, J.M., F. López and J. de D. Ortúzar (2014), ‘Burying the highway: the social valuation of community severance and amenity’, International Journal of Sustainable Transportation (in press).
Halldórsdóttir, K., L. Christensen, T.C. Jensen and C.G. Prato (2011), ‘Modelling mode choice in short trips – shifting from car to bicycle’, Proceedings of the European Transport Conference 2011, Glasgow, 10–12 October, accessed 21 January 2014 at http://abstracts.aetransport.org/paper/index/id/3730/confid/17.
Heckman, J. and B. Singer (1986), ‘Econometric analysis of longitudinal data’, in Z. Griliches and M. Intriligator (eds), Handbook of Econometrics, Amsterdam: North-Holland, pp. 1689–763.
Hensher, D.A. and Z. Li (2012), ‘Valuing travel time variability within a rank-dependent utility framework and an investigation of unobserved taste heterogeneity’, Journal of Transport Economics and Policy, 46 (2), 293–312.
Jensen, A., E. Cherchi and S. Mabit (2013), ‘On the stability of preferences and attitudes before and after experiencing an electric vehicle’, Transportation Research, 25D (1), 24–32.
Jensen, A., E. Cherchi and J. de D. Ortúzar (2014), ‘A long panel survey to elicit preferences for electric vehicles and charging infrastructure’, Transportation (in press).
Lu, C.-C. and H.S. Mahmassani (2008), ‘Modelling user responses to pricing: simultaneous route and departure time network equilibrium with heterogeneous users’, Transportation Research Record, 2085, 124–35.
Mabit, S., E. Cherchi, A. Jensen and J. Jordal-Jørgensen (2012), ‘Hybrid choice modelling allowing for reference-dependent preferences: the case of alternative-fuel vehicles in Denmark’, 1st European Symposium on Quantitative Methods in Transportation, Lausanne, 4–8 September.
Mahmassani, H.S. (2001), ‘Dynamic network traffic assignment and simulation methodology for advanced systems management applications’, Networks and Spatial Economics, 1 (3), 267–92.
Martin, E., S.A. Shaheen and J. Lidicker (2010), ‘Impact of carsharing on household vehicle holdings: results from a North American shared-use vehicle survey’, Transportation Research Record, 2143, 150–58.
Masiero, L. and D.A. Hensher (2011), ‘Shift of reference point and implications on behavioural reaction to gains and losses’, Transportation, 38 (2), 249–71.
Mokhtarian, P.L. (2004), ‘A conceptual analysis of the transportation impacts of B2C e-commerce’, Transportation, 31 (3), 257–84.
Morikawa, T. (1994), ‘Correcting state dependence and serial correlation in the RP/SP combined estimation method’, Transportation, 21 (2), 153–65.
Ortúzar, J. de D. and L.I. Rizzi (2007), ‘Valuation of transport externalities by stated choice methods’, in P. Coto-Millán and V. Inglada (eds), Essays in Transport Economics, Heidelberg: Physica-Verlag, pp. 249–72.
Ortúzar, J. de D., A. Iacobelli and C. Valeze (2000), ‘Estimating demand for a cycle-way network’, Transportation Research, 34 (5), 353–73.
Pendyala, R.M., K.G. Goulias and R. Kitamura (1991), ‘Impact of telecommuting on spatial and temporal patterns of household travel’, Transportation, 18 (4), 383–409.
Ramadurai, G. and K. Srinivasan (2006), ‘Dynamics and variability in within-day mode choice decisions: role of state dependence, habit persistence, and unobserved heterogeneity’, Transportation Research Record, 1977, 43–52.
Raveau, S., R. Alvarez-Daziano, M.F. Yáñez, D. Bolduc and J. de D. Ortúzar (2010), ‘Sequential and simultaneous estimation of hybrid discrete choice models: some new findings’, Transportation Research Record, 2156, 131–9.
Santos, G., H. Behrendt, L. Maconi, T. Shirvani and A. Teytelboym (2010), ‘Part I: Externalities and economic policies in road transport’, Research in Transportation Economics, 28 (1), 2–45.
Schure, J., F. Napolitan and R. Hutchinson (2012), ‘Cumulative impacts of carsharing and unbundled parking on vehicle ownership and mode choice’, Transportation Research Record, 2319, 96–104.
Sillano, M. and J. de D. Ortúzar (2005), ‘Willingness-to-pay estimation with mixed logit models: some new evidence’, Environment and Planning, 37 (3), 525–50.
Sivakumar, A., R. Batley and J.W. Polak (2012), ‘Toward the future of travel behavior and demand modeling’, Transportation Letters, 4 (2), 75–7.
Souche, S., C. Raux and Y. Croissant (2012), ‘On the perceived justice of urban road pricing: an empirical study in Lyon’, Transportation Research, 46 (7), 1124–36.
Srinivasan, K. and P. Bhargavi (2007), ‘Longer-term changes in mode choice decisions in Chennai: a comparison between cross-sectional and dynamic models’, Transportation, 34 (3), 355–74.
Strazzera, E., E. Cherchi and S. Ferrini (2010), ‘Assessment of regeneration projects in urban areas of environmental interest: a stated choice approach to estimate use and quasi-option values’, Environment and Planning, 42A (2), 452–68.
Swait, J.D., W. Adamowicz and M. van Bueren (2004), ‘Choice and temporal welfare impacts: incorporating history into discrete choice models’, Journal of Environmental Economics and Management, 47 (1), 94–116.
Thøgersen, J. (2006), ‘Understanding repetitive travel mode choices in a stable context: a panel study approach’, Transportation Research, 40 (8), 621–38.
Vrtic, M., N. Schüssler, A. Erath and K.W. Axhausen (2007), ‘Design elements of road pricing schemes and their acceptability’, 86th Annual Meeting of the Transportation Research Board, Washington, DC, 21–25 January.
Walker, J.L. (2001), ‘Extended discrete choice models: integrated framework, flexible error structures and latent variables’, PhD dissertation, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology.
Wang, D. and Y.T.F. Law (2007), ‘Impacts of information and communication technologies (ICT) on time use and travel behaviour: a structural equations analysis’, Transportation, 34 (4), 513–27.
Yáñez, M.F., E. Cherchi and J. de D. Ortúzar (2009a), ‘Inertia and shock effects on mode choice panel data: implications of the Transantiago implementation’, 12th International Conference on Travel Behaviour Research, Jaipur, 13–18 December.
Yáñez, M.F., S. Raveau, M. Rojas and J. de D. Ortúzar (2009b), ‘Modelling and forecasting with latent variables in discrete choice panel models’, European Transport Conference, Noordwijkerhout, The Netherlands, 5–8 October.
Index
accessibility measurement 370–72 concept of 370 cumulative opportunities approach 369–70 dynamic transport model 372 gravity based 369 individual-specific 371–2 proximity based 369 utility-based approach 370
ACCRA Cost of Living indices 15
Akaike information criteria (AIC) 314, 392–3
allocation Pareto efficient 29 random 29
alternative specific constants (ASC) 208
altruism 9–10, 36, 40 genetic 35–6
analysis of variance (ANOVA) 157
appraisal Kaldor–Hicks compensation criterion (Pareto improvement) 604, 617–18 Pareto criterion 604–5 Scitovsky criterion 604
Archimedean copula 261–2
Ariely, Dan 2
artificial intelligence (AI) potential use in DCM theory
asymmetric value function 284
asymptotic variance-covariance (AVC) matrix 157, 161–2, 164–5, 167, 170 in DCM 168, 171
attribute non-attendance (AN-A) 271–2, 274–6, 278–81, 323, 664 calculation of VTTS in 277 definitions of 272 developments 280 self-reported 274 use in WTP estimation 665
attribute processing 269–70, 272–5, 279–80, 284–6 induced 272 heterogeneity in 272 rules of 271–2, 275 strategies (APS) 269, 274–5, 323–4, 326
attribute thresholds 281–2, 284–5
Auspitz, Rudolph 10
Australia 277, 355, 675
automatic number plate recognition (ANPR) 137–8, 144–5 concept of 139
average partial effects (APE) 344–5
balanced incomplete block designs (BIBDs) 196 concept of 182–3
Bayesian estimation/inference 457, 488–93, 657 hierarchical (HB) 458–61, 482–4, 486–8, 491, 500, 507–11 regression models 459, 478 of RUM 493 use in implementation of MCMC 488 use of grid methods in 462 use of HPDI in 461
Bayesian information criterion (BIC) 277, 314, 317, 489
behavioural economics 51, 291 anomalies in 620
behavioural realism 401–2
Bentham, Jeremy 10 Introduction to the Principles of Morals and Legislation (1789) 8–9 view of utility 24
Bernoulli random variable 336
best-worst choice (BWCHOICE) 209
Best-Worst Scaling (BWS) 180–81, 183, 186–8, 195–6, 681 concept of 178 development of 178–81, 184 maxdiff model 180, 188–92, 194–5 MNL model 187–9, 191–2 Multi-profile Case (Case 3) 184–6, 188–90, 192–3 Object Case (Case 1) 181–2, 186–90, 192, 194–5 Profile Case (Case 2) 183–4, 186–9, 192–3 use in health economics 681–3
bidding game (BGAME) 207–8, 210
binary choice data 265 cross-validation of 262–4
Bohr, Niels elegant correspondence principle of 622
Canada Environmental Resource Inventory (EVRI) 202–3
candidate welfare theory 620–21
choice alternative 565–6
choice behaviour 1, 51, 56–7, 268–70, 629–31, 654, 670 goal pursuit 654–5 independence 54 regularity 55 stationarity 55–6 transitivity 52
choice context 101, 108–9, 113–14, 120–21, 124 bioecological model 102–11 global influence 107 historical time and place 102–4, 107–8, 111–12 human agency 103 human development 104–7 linked lives 103, 107 person-process-context-time (PPCT) model 105–9, 124 timing of lives 103–4
choice experiments (CE) 661 WTP estimates 670
choice modelling (CM) 1–2, 67–8, 73, 117–18, 206, 269, 413–14, 423, 499, 618, 641, 643, 649, 658, 661, 667–71 big data 656–8 decision rules 663–5 development of 498 heterogeneity in 662 incentive compatibility 666–7 multiple goal pursuit 654–5 multi-stage 649–50, 653–4, 657–8 non-compensatory 656 nonparametric 656–8 spatial choice 669–70 use of latent class models in 314
Clark–Groves mechanism 28
Cobb–Douglas demands 14
Cobb–Douglas price index 15
cognitive psychology 26–7, 29
commodities 18 hedonic 22
common consequence effect (CCE) 78, 84
common ratio effect (CRE) 76–80, 82, 84 classical 94 patterns of 93
compensating variation (CV) 603
composite marginal likelihood 576
compromise effect 54
computable general equilibrium (CGE) 607
conjoint choice experiments hypothetical bias in 241–3
Constant Elasticity of Substitution (CES) 15, 617
consumer behaviour 1, 23–5, 34, 478, 519, 649, 651, 653–4, 694 money-metric utility view of 13 neoclassical modelling of 24 non-positivist analysis 101 studies of 7, 22 use of RUM in modelling 450
consumer demand 1
consumer measurement brain activity 36–40
consumer theory 10–11, 24, 212 economic 40 neoclassical 9–10, 18, 26, 207–8
consumption 25
contingent valuation (CV) 202–3, 205–6, 211–12, 217, 221, 224–5, 611, 661 application of 238 bequest value 218 criticisms of 222–3, 225, 666 development of 204, 218–19, 228 existence value 215–16 relationship with DCE 206–7 relationship with elicitation formats 205, 207, 209–10 random utility framework 213–14 stewardship value 215–16 studies/surveys 204–5, 213, 217–20, 223–4, 227–8, 249 use in measurement of WTA/WTP 220–21, 225–6 use value 215
continuous utility index 11
conventional demand theory 33
convergence 583 q-linear 581, 585–6 q-quadratic 581–2 q-superlinear 581 speed of 581
cost–benefit analysis (CBA) 601, 603, 605–7, 676, 684 consumer sovereignty 616 discount factor 606
cost–utility analysis (CUA) 676, 684
Court, Andrew 21
covariance matrix 437, 486, 523–4, 533–4, 536–7, 541–3 Cholesky factorization of 523, 525, 527, 536, 539–40 variance–covariance (VC) matrix 159–60, 384–5, 442–3
cultural constructs 116, 123
cumulative density function (CDF) 257, 260, 263, 336–7, 395, 487, 608, 613 approximation of 258 continuous 261 derivatives of 440 inverse/inverted 470, 483, 486 joint 440 multivariate 261 multivariate normal (MVNCDF) 442–4 of expenditure function 613 use in RUM 608
cumulative prospect theory (CPT) 81
data collection 131–2 activity purpose imputation 142 cleaning and smoothing 140–41 ground truth 132–3 processing 140 response burden 135 sources 138–9 spatial matching 142
data-generating process (DGP) 577, 590 concept of 565
Dawkins, Richard 35
Deaton, Angus Almost Ideal Demand System 15
decision by sampling 65–6
decision-making 28, 67, 364, 384 collective model 366–7 dynamic transport model 372–3 discrete choice 363 household model of 365–9 individual 364–5 unitary model 365–6
decision neuroscience 66
decision rules compensatory 663 non-compensatory 663
deflation 13
demand analysis neoclassical 13
demand function 14, 17 Gorman polar form 14 Hicksian 11 market 11
demand systems derivation of 11 neoclassical 15
Diewert, Erwin 15
discrete choice 22–3, 290, 539 binomial 257 decision-making 363 estimation of 568–9 m-estimators 569 method of sieves 258–61 mixtures of distributions approach (MOD) 260–61 hedonic 23 influence of random utility theory in 384 multinomial 257 use of MLE in 570
discrete choice experiment (DCE) 179–80, 182–3, 186, 194–5, 202–3, 205–6, 211, 215, 225, 228–9, 259, 397, 434, 443, 661, 675, 680–81, 683 additional complexity 297–8 attribute-based model 212 elicitation 207 multi-attribute 666 questions binary choice (BC-SEQ) 209, 211 multinomial choice (MC-SEQ) 209, 211 relationship with CV 206–7 SP 679 traditional 181
discrete choice models (DCM) 23, 27, 156–7, 160, 168, 171, 179–80, 259, 268, 271, 290, 294, 298, 340, 350, 368–9, 374, 383, 397, 409, 427, 434, 437, 498, 519–20, 527–8, 547–8, 550–51, 565, 570, 574, 589, 601, 611–12, 638, 675, 688–9, 691–2, 694 AVC matrix in 168, 171 binomial 257 classical 690 criticisms of 394, 401 data use in 693 decision rules in 295, 297 estimation 568–9, 571 identification problems in 540–41 limitations of 689 MLE of 570, 588, 590, 596 multinomial 257 of labour supply 363 potential use of AI in 691 single 114, 443–4 structure of 530, 539–40 error structure 543 use in health economics 675–9, 681–3 use of HCM in extension of 386, 410 use of RUT in 384
discrete-continuous choice models 3–4 mixed 608 multiple (MDC) 427–30, 444, 448–9, 451 KKT-based 430, 444, 446 random utility maximization (RUM) 428, 451 KKT-based 447–8 single (SDC) 427–9
Dupuit, Jules 9, 16 observations of changes in consumer wellbeing 18
dynamic psychological models 59–60, 65 ‘horse race’ choices model 60 independent Poisson 60 linear ballistic accumulator (LBA) type 60, 67 sequential sampling choice model 61–3, 65–6 associative accumulation model (AAM) 64–5, 67 attention modulated drift-diffusion model (AM-DDM) 64–5 decision field theory (DFT) 63–4, 67 leaky competing accumulator (LCA) model 63–4, 67
dynamic stochastic programming 24–5
econometric demand analysis 14–15
economic behaviour 36, 40
economic demand analysis 13
Edgeworth, Francis 8–9, 24
elicitation formats 3, 28, 73, 102, 110, 117, 203 dichotomous choice (DC) 239–41, 246 discrete choice 205 double-bounded binary choice (DBBC) 208, 213 relationship with CV 205, 207, 209–10 single binary choice (SBC) 208, 210, 228 single multinomial choice (SMC) 208–9, 211 stated preference 209–10
elimination by aspects (EBA) theory 292 concept of 58
endogeneity 114, 171, 280, 449, 628, 663 in pricing 668
equivalent variation (EV) 603
error theory 49
European Union (EU) 103
expected monetary values (EVs) 78
expected utility theory (EUT) 73, 76–8, 80–81, 89, 93–4, 249, 418–19 alternatives to 90 betweenness property of 76 concept of 75 departures from 84 use in risk management 415–16, 423 use on lottery pair response rates 249
experimental design (ED) 156, 661–2 concept of 157
extreme value (EV) 442 generalized (GEV) 171, 325, 370, 442, 439, 519, 522–3, 550–51 mixed (MMDCEV) 441 multiple discrete-continuous extreme value (MDCEV) model 114, 439, 441–2, 444, 450, 635 multiple discrete-continuous generalized extreme-value (MDCGEV) model 439–40 multiple discrete-continuous nested extreme-value (MDCNEV) model 439–40 multivariate (MEV) 610–11, 613–16 logsum issued in 615 random variable 523
extreme value density function 438
fairness 40
final prices 17
Fisher, Irving 10, 13
forecasting 627–8, 635–6, 682 aggregate 632–3 disaggregate 632–3 discriminant analysis 628 errors 631–3, 642–3 models error 642 inputs 637–9 temporal transfer 637–8 of discrete-continuous choice 634–5 pivot point 633–4 population 640–41 iterative proportional fitting (IPF) 640–41 quadratic minimisation (QUAD) 640
France Paris 13, 371
Frisch, Ragnar 13
Gaussian copula 262 concept of 261 use with method of sieves 261–2
generalized cost function 373
Generalized Leontief cost function 15
generalized method of moments (GMM) 353
Global Positioning System (GPS) 122–3, 131, 133, 137–9, 144 diaries 143 measurement 140
global system for mobile communications (GSM) 131, 133, 137–40, 143–4 measurements 140
goods 11, 447 durable 10 inside 434–6 market 27 non-durable 10 outside 434–5
Gorman, Terence 14–15
Gossen, Hermann 9
Greece 349
Hägerstrand, T. time-space prism (TSP) concept 372
Hall, Robert 15
Hamilton, William 35
health economics 678, 683–4 data use in 675, 679–80, 683 quality adjusted life year (QALY) 676–7, 684 use of BWS in 681–3 use of DCEs in 675–9, 681–3
hedonic products market equilibrium 22 of household production 21–2
hedonic taste 23
Hessian 341, 568, 574–5, 580, 584–6, 588, 591–2 analytic 582 approximation of 579, 583, 587–90 BHHH 590 empirical 577–8 finite difference 593 ill-conditioned 548 matrix 547, 573–4 singular 549
heterogeneity 3, 15, 20, 25, 116, 270, 272, 278, 280, 311–12, 317, 326, 346–9, 354, 393, 458 cross-individual 20 decision rule 324–6 deterministic 313, 325 in CM 662 in HCM 394–5 in induced attribute processing 272 in IP strategies 273 intra-individual 20 in VTTS 394–5 modelling parameter 350–51 multivariate regression 478–9 preference 18–19, 270, 277, 311 random 316, 318 taste 20, 311–12, 314, 316, 318, 394, 662 unobserved 409 within-class 278–9
heteroscedasticity 282, 326
Hicks, John 9–10, 16, 28, 219 Hicks–Samuelson formulation 10
highest posterior density intervals (HPDI) use in Bayesian estimation/inference 461
homogeneity intra-individual 315
Hotelling, Harold 10
household production concept of 21 hedonic products of 21–2
Houthakker indirect addilog system 15
human capital 416
hybrid choice model (HCM) 385–6, 389–90, 392–4, 397–8, 402, 404, 407–10, 520, 522–3, 526–7, 530, 532, 539, 546–51 aims of 386 concept of 386–7, 521 conditional choice probability in 387 development of 383, 410 efficiency 397 error structure 543–5 estimation of 392–3, 399, 406–7 heterogeneity in 394–5 identification of 527–8, 530–32, 534–5, 539, 548, 551 confirmatory factor analytic model 532–4 structural equation models 535–6 in logit models 519–20 kernel of 521–2 latent variables 388–91, 400–401, 404–5, 408–9, 521, 524–5, 528 multiple indicator response (MIMIC) model 526 normalization of 391–2 production of 393 use in extension of DCM 386, 410
hyperbolic discounting 32
hypothetical bias 239–42, 642 advisory referenda and realism 248 in conjoint choice experiments 241–3 in MPL 241 mitigation of 244 instrument calibration 244 statistical calibration 244–8 salient rewards 248–9
hypothetical choice 236–9, 242, 247–50 DC elicitation in 239–40 incentive compatibility 239 in choice behaviour 237–8
independence of irrelevant alternatives (IIA) condition 187, 211, 320 in MNL model 652 restrictive property 214
indirect utility function 11–12
individual level models (ILMs) estimation of 498–500 use of Newton–Raphson algorithm in 501
information communications technology (ICT) 113, 115 impact on travel behaviour 691
information processing (IP) 272, 280 strategies (IPS) 273, 323
International Choice Modelling Conference 2
intertemporal choice 39
Jorgenson, Dale 15
judgmental noise concept of 88
Kahneman, Daniel 2, 29
kernel regression concept of 263–4
labor force participation 101
labor relations 101
labor supply 367–8, 374 collective model of 367
Lagrangian function 435–6
Lancaster, Kevin 21
latency 116 constructs 117 response 665
latent class models 316–17, 323–5, 350–51, 388 class allocation 315, 317–18, 323–4 concept of 314–15 contrasts in 316 correlation between coefficients 318–19 disaggregate elasticities 319–21 posterior analysis 317–18 use in CM 314
latent regression 336, 338 model 337, 339–40, 349 variables of 341
Lau, Larry 15
Lewbel, Arthur 15
Lieben, Richard 10
likelihood function 312, 439
linear budgets 20 constraints 11, 18
linear expenditure system (LES) 430, 432
linear regression model 159, 498 homoscedastic 160
log-likelihood 349–52, 398, 504 function 502 negative 596 unconditional 350
logistic regression models 474–5
logit models 168, 173, 285, 346, 351, 405, 636 binary 353 conditional 194 ML estimates of 195 continuous mixed 311, 319–22 generalized rank ordered (GROL) 188 hybrid choice model (HCM) 519–20 latent class 311, 321 mixed multinomial (LC_MMNL) 278 mass point 313–14 mixed 259–60, 278, 611–12, 636, 662 HB 509, 512–13 modified 271 multinomial (MNL) 161–2, 165–8, 170–73, 186–9, 191, 196, 214, 277, 282, 312–13, 315–16, 320–21, 370–71, 439, 520, 549, 610, 612, 633, 662 constrained 653 generalized (GMNL) 188, 662 heteroscedastic Gumbel scale (HG-SMNL) 283 IIA property of 652 mixed (MMNL) 164, 170–72 sequential best-worst (SBWMNL) 682 use in BWS 187–9, 191–2 nested (NL) 170–71, 371 non-linear 168 regression 195
lotteries 30–32, 73, 84, 86–9, 93, 418–19, 422–3 alternative 31, 86 between-lottery interactions 79–83, 91, 93–4 choice in 29–30 hypotenuse 87–8 non-degenerate 92 options in choice set 91–2 payoffs 89–91 reduction of compound (ROCL) 420 risky 76, 84, 249, 413, 418 safe 413 state 354 two-outcome 92 within-lottery interactions 77–8, 85, 87, 91, 93
Louviere, Jordan role in development of BWS 178–81, 184
majority of confirming dimensions (MCD) 283–4
market behaviour revealed 26 role of decision utility in 23
market demand 19–20
market demand function 11
marketing 649–51, 658, 689 choice set formation 651–3, 667 decision strategy selection 653–4 stock-keeping units (SKUs) 650–51 use of big data in 656–7
Markov process 58
Marschak–Machina (M–M) triangle 76, 81–3, 85–6, 90–91
Marshall, Alfred 9, 16
Marshallian consumer surplus 614 as welfare measure 619
maximum likelihood (ML) 192, 194–5, 259, 265, 312, 314, 574, 577, 631, 633 estimates of conditional logit 195 estimation (MLE) 340–41, 352, 354, 476–7, 566, 569–70, 572–6, 588, 590 quasi- (QMLE) 576–7, 588 use in BHHH 590 use in discrete choice models 570
maximum simulated log-likelihood (MSL) 313, 392
McKenzie, Lionel 11
mean square error (MSE) weight 508–10 aggregate 511, 513 for HB 507 for WLS 507, 509
microeconomic theory 29, 519, 688–9 neoclassical 602–3, 619
microsimulation models 122, 325
Monte Carlo (MC) 352–3, 636 approximation 466 integration 392 Markov chain (MCMC) 459, 465–6, 472, 475–6, 483, 488, 491 Bayesian hypothesis testing 488–91, 493 Gelfand and Dey method 490 Gibbs sampling 466–70, 473 HB logit model 487 HB probit model 484–6, 492 HB regression 480–82, 484 Metropolis–Hastings (MH) sampling 466, 473–5, 477–80 ordinal probit model 483–4 Savage–Dickey density ratio 490 simulations 174, 462, 464, 506–7, 548
Muellbauer, John Almost Ideal Demand System 15
multicollinearity 159 concept of 546–7
multiple price list (MPL) 241 hypothetical bias in 241
multivariate analysis of variance (MANOVA) 158
multivariate normal (MVN) 443 distributed stochasticity 448 cumulative distribution function (MVNCDF) 442–4 errors 442, 444
Muth, John 21
National Oceanic and Atmospheric Administration (NOAA) 238
National Survey of Families and Households (NSFH) 373
Netflix 333
neurology 37–9 substance addiction observation 39
Newton–Raphson algorithm 500–502, 508 use in ILMs 501
Newton’s method 580, 582, 586 concept of 580–81 quasi- 583–4, 594–5
nonlinear budgets 18 constraint 22
nonlinear least squares (NLLS) 572–3, 588, 590 estimation of 589 Gauss–Newton method for 589
nonlinear regression 572–3
optimality Karush–Kuhn–Tucker (KKT) first-order conditions of 429–30, 434–7, 442, 444, 446–50
optimization 567–8, 584, 586, 588 algorithm 580, 583 Berndt, Hall, Hall, Hausman (BHHH) method 588–91 Hessian approximation 590 use of MLE in 590 Broyden, Fletcher, Goldfarb and Shanno (BFGS) method 587–8, 590 DFP update 588 nonlinear 566, 569, 597 stopping rules 594–5
ordered choice model 333, 338–42, 346–7, 352–4 examples of 333–4 ordered probit model 337, 342–4 Poisson model 338–9, 354
ordinary least squares (OLS) 503, 513 estimation of 501, 505 regression 498 individual level 504 with MSE weight 507
orthogonal design/orthogonality 23, 158–60, 162, 164–5, 167–8, 171, 173–4, 661, 676 polynomial coding 162–3 sequentially constructed 162, 165 unbalanced 159 use of choice tasks in 156
Pareto, Vilfredo 9
Parsonian functionalism 109
payment card (PCARD) 207–8, 211
potential path area (PPA) concept of 372
preferences 28 strict 52
preferential choice behaviour 49, 67
probabilistic choice set (PCS) model 399–400, 409
probit model 57, 180, 205, 246, 311, 337, 340, 611, 634 bivariate 213 multinomial 520, 546, 551, 566, 590 multiple discrete-continuous (MDCP) model 442–4, 448 ordered model 333, 337, 340, 342–5, 351–2 generalized 348 ordinal model 482–3 HB 482–4 regression 195
public transport operations data (PTOD) 137–8, 144–5 automatic fare collection (AFC) 139, 142 automatic passenger counting (APC) 139 automatic vehicle location (AVL) 139
qualitative 115, 117, 240, 458, 588, 681–2, 692 behavioural process-oriented 123–4 data 122
quantitative 8, 40, 64, 68, 135, 155, 224, 337, 343, 565, 581, 631, 634, 692 direct valuation 179
quantum choice theory 66
RAND Corporation 643
random effect assumption 257
random regret minimization (RRM) model 664 use in travel choice models 293–5, 297–303, 325
random utility class 52
random utility function 435, 658
random utility model (RUM) 49–58, 64, 66–7, 160, 184, 196, 213, 324, 334–8, 369, 383, 434, 459, 492, 607–12, 638, 657, 664, 667–8 Bayesian estimation of 493 classic 60 closed form 180 econometrics 601 log-linear 617 maximum/maximization 196, 300, 325, 442, 519, 521–4, 550–51 non-linear 444 minimum 196 non-linear 618 standard additive 610 use of CDF in 608
random utility theory (RUT) 179, 214, 384, 474, 478, 487, 616–17 concept of 457 development of 457 use in development of DCM 384
ranking models 194
rationality 26, 29–30, 34–5, 689–90 neoclassical 34 utilitarian 32
real choice 236, 242, 247, 249
reciprocity 8, 10, 33 asynchronous 34 concept of 34
recreation demand models data variants used in 668–9
reduction of compound lotteries (ROCL) 420, 422–3
regression hedonic 23
revealed preference (RP) 10, 202, 204, 217, 303, 395, 639, 641, 668–9, 679 application of RRM model 301 disaggregate 679 estimation of 227 use of data in DCM 693 use of data in health economics 675, 679–80, 683
risk management 413 development of 414 non-traded asset 416 preferences time 419 risk attitudes 419–20 risk aversion 417–18 multi-attribute 417–18 self-protection 415–16 insurance 414–15, 417–18 use of EUT in 415–16, 423
risky choice studies 73–4, 80–81, 84, 88, 93 between-lottery interactions 79–80 neurological aspect of 88 within-lottery interactions 77–8
root likelihood (RLH) 507–8, 512–13
root mean squared error (RMSE) concept of 507
root predictive score (RPS) 508, 510, 512
Rosen, Sherwin 22
Roy, Rene 10, 13
sample selectivity models 355 development of 355–6 Poisson regression in 356
Samuelson, Paul 9–11, 28, 209 theories of group selection 36
Second World War (1939–45) 219
seemingly unrelated regression (SUR) 369
Sen, Amartya 365
Shephard, Ronald 10–11
similarity effect 54
Simulator of Transport, Routes, Activities, Emissions, and Land (SimTRAVEL) 112
Slutsky, Eugen 9–10
Smith, Adam 9
social capital 110
social networks 1–2, 33, 109–10, 112, 114, 666 formation of 113–14 impact on behaviour 34 word-of-mouth 33–4
social welfare 19–20, 28 function (SWF) 604–5, 616 Rawlsian 605
sociality 33–5 concept of 33 evolution 35
sociobiology 35
Southern California Association of Governments (SCAG) 114
Soviet Union (USSR) collapse of (1991) 102
standard expenditure function 21
stated choice (SC) 158–9, 162, 164, 270, 301, 641–2, 669 data 152, 158 experiments 119–21, 152, 238–9, 277 controlled 120 hypothetical bias in 239 non-zero Bayesian priors 171–2 studies 152–4, 156–9, 161–5, 168, 170, 172, 174, 270, 273
stated preference (SP) 202, 204, 207, 217, 303, 402, 641–2, 666, 668, 679 elicitation formats 209–10 estimation of 226 studies 296 use in DCE 679 use of data in DCM 693 use of data in health economics 675, 679–80
static psychological models 57–60 context dependent preference (CDP) model 58–9 EBA model 58–9, 66 Thurstone 57–8
Stone, Sir Richard 7, 14
structural equation modelling (SEM) 369
study design 152–4, 158–64, 169–70 attribute levels 155 objectives 157–8 stated choice (SC) design 157, 159, 162–5, 168, 170, 273
subjective well being (SWB) 347
survey data sets 117–19 collection 119 longitudinal data 122
transitivity 53, 79 concept of 52 weak stochastic 53
travel behaviour 105, 109, 112, 115–16, 131, 137–8, 142–4, 146, 302–3, 374, 688–9 activity-travel choice behaviour 114–15, 117 activity-travel engagement 122 car-sharing schemes 693 children 144 impact of ICT on 691 impact of road pricing on 692 new vehicle technologies 693–4 self-reported 134–8 tracing technologies 137–45 travel diaries 134–7
travel choice models 290–91, 303–4, 548–9 additional complexity 297–8 alternative 297–8, 301–3 contextual concavity model 293 discrete 290 EBA rule 292, 295 identification issues in 294–5 linear-additive utility maximization rule 292, 299–300 random utility maximization (RUM) model 300 relative advantage model 293 use of RRM model in 293–5, 297–303, 325
trust 40 neurochemical elements 39–40
Tversky, Amos 29
uncertainty 3, 8, 18, 24, 73, 93, 131, 164, 202, 214, 220, 227, 414, 422–3, 488, 606, 639, 642, 650, 681, 691 in forecasting 643 measurement of 460–61 neutral 423 price 492 respondent 213 threshold of 244 unquantifiable 643
United Kingdom (UK) 365–6, 676, 680 Department of Transport 629 HS2 high-speed rail project 629–30 National Institute for Health and Clinical Excellence (NICE) 676
United States of America (USA) 107–8, 115, 118, 227, 373–4, 615 Bay Area Rapid Transit (BART) 628, 641–2 Census 373 Clean Water Act 203 Department of the Interior 238 District of Columbia Court of Appeals Ohio v. U.S. Department of Interior (1989) 214 Environmental Protection Agency (EPA) 204 Public Use Microdata Samples (PUMS) 640 taxation system of 365
University College London (UCL) 8
University of Alberta faculty of 178
University of California, Berkeley Haas Business School Experimental Social Sciences Laboratory (XLAB) 549
University of Leeds 160–61
US Consumer Expenditure Survey 15
utilitarianism 8, 297 classical 20
utility 9–10, 24, 28, 85, 290, 434, 540, 545, 606, 655 decision 23–5 fixed model 49–50 indirect 21, 23 individual 40 instant 23–4 inter-temporal 10 level 16–17 marginal 9, 14, 20, 431, 447 measurement of 13, 21 non-linear 430 of income constant marginal (CMUI) 610, 612, 614–16 recovery of 11 remembered 23–4 stochastic source of 650 subjective expected (SEU) 422–3
utility distribution function 51
utility form 430, 435, 445, 447 linear 428 Box-Cox 447 non-linear 428
utility function 212, 297–8, 384–5, 417, 430, 432, 445 direct 12 indirect 12, 14, 21–2, 216–17 indirect conditional 607 kinky 492 random 435, 616 reduced form 22 stochastic component of 213–14 sub- 216, 433
utility model fixed 50 neoclassical 20 random 50–51, 56–7, 196, 324, 338, 612 nonlinear 618
Uzawa, Hirofumi 11
values of travel time savings (VTTS) 395 calculation of 277 cumulative distribution function of 395 estimates of 408 heterogeneity in 394–5
weighted least squares (WLS) 511, 513 estimation of 504–6, 512 regression 498, 500, 504 with MSE weight 507, 509
welfare economics 202, 219, 303–4, 601, 616, 619–20 neoclassical 207, 621
well-being 18, 40 neoclassical measure of 16–17
Williams, George 35
willingness-to-accept (WTA) 16–18, 29, 212–13, 218, 220, 603, 661, 664, 667 compensation 203 marginal (MWTA) 17 use of CV in measurement of 220–21
willingness-to-pay (WTP) 16–18, 21, 29, 160, 167, 170, 172, 204–5, 208–10, 212–13, 218, 220, 223–5, 246, 248, 258, 301–2, 603–5, 661–4, 667, 670, 676 direct question (DQ) for 207 estimates 206, 210–11, 219, 226–7, 279–80, 301, 664–5, 670 income elasticity of 203, 220–23 marginal (MWTP) 17, 269 space model 662 use of AN-A in 665 use of CV in measurement of 220–21, 225–6
World Wildlife Fund (WWF) 242
Yugoslavia Yugoslav Wars (1991–99) 102