CHOICE MODELLING: THE STATE-OF-THE-ART AND THE STATE-OF-PRACTICE
Guest Speakers and Organisers. From left to right: Moshe Ben-Akiva, Stephane Hess, Andrew Daly, Daniel McFadden, Riccardo Scarpa, David Hensher, Chandra Bhat, Michel Bierlaire.
CHOICE MODELLING: THE STATE-OF-THE-ART AND THE STATE-OF-PRACTICE Proceedings from the Inaugural International Choice Modelling Conference
EDITED BY
STEPHANE HESS Institute for Transport Studies, Leeds, UK
ANDREW DALY Institute for Transport Studies, Leeds, UK
United Kingdom · North America · Japan · India · Malaysia · China
Emerald Group Publishing Limited
Howard House, Wagon Lane, Bingley BD16 1WA, UK

First edition 2010

Copyright © 2010 Emerald Group Publishing Limited

Reprints and permission service
Contact: [email protected]

No part of this book may be reproduced, stored in a retrieval system, transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without either the prior written permission of the publisher or a licence permitting restricted copying issued in the UK by The Copyright Licensing Agency and in the USA by The Copyright Clearance Center. No responsibility is accepted for the accuracy of information contained in the text, illustrations or advertisements. The opinions expressed in these chapters are not necessarily those of the Editor or the publisher.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-84950-772-1
Contents

Biography
Editorial

PART I: GUEST SPEAKER PAPERS

1. Sociality, Rationality, and the Ecology of Choice
   Daniel McFadden
2. Planning and Action in a Model of Choice
   Moshe Ben-Akiva
3. Attribute Processing, Heuristics and Preference Construction in Choice Analysis
   David A. Hensher
4. The Multiple Discrete-Continuous Extreme Value (MDCEV) Model: Formulation and Applications
   Chandra R. Bhat and Naveen Eluru
5. Capturing Human Perception of Facial Expressions by Discrete Choice Modelling
   Matteo Sorci, Thomas Robin, Javier Cruz, Michel Bierlaire, J.-P. Thiran and Gianluca Antonini

PART II: DATA COLLECTION

6. Serial Choice Conjoint Analysis for Estimating Discrete Choice Models
   Michiel C. J. Bliemer and John M. Rose
7. Observed Efficiency of a D-Optimal Design in an Interactive Agency Choice Experiment
   Sean M. Puckett and John M. Rose
8. Effects of Stated Choice Design Dimensions on Model Estimates
   Phani Kumar Chintakayala, Stephane Hess, John M. Rose and Mark Wardman
9. Stated Choice Experimental Designs for Scheduling Models
   Paul Koster and Yin-Yen Tseng

PART III: CONCEPTS AND METHODOLOGY

10. Systematically Heterogeneous Covariance in Network GEV Models
    Jeffrey P. Newman
11. On Estimation of Hybrid Choice Models
    Denis Bolduc and Ricardo Alvarez-Daziano
12. A Model of Travel Happiness and Mode Switching
    Maya Abou-Zeid and Moshe Ben-Akiva
13. On Path Generation Algorithms for Route Choice Models
    Emma Frejinger and Michel Bierlaire

PART IV: ENDOGENEITY AND HETEROGENEITY

14. Mode Choice Endogeneity in Value of Travel Time Estimation
    Stefan L. Mabit and Mogens Fosgerau
15. Accommodating Coefficient Outliers in Discrete Choice Modelling: A Comparison of Discrete and Continuous Mixing Approaches
    Danny Campbell, Stephane Hess, Riccardo Scarpa and John M. Rose
16. Addressing Endogeneity in Discrete Choice Models: Assessing Control-Function and Latent-Variable Methods
    Cristian Angelo Guevara and Moshe Ben-Akiva
17. Latent Class and Mixed Logit Models with Endogenous Choice Set Formation Based on Compensatory Screening Rules
    Matthieu de Lapparent

PART V: TRANSPORT MATTERS

18. Transport Welfare Benefits in the Presence of an Income Effect
    James Laird
19. Which Commuters Will Car Share? An Examination of Alternative Approaches to Identifying Market Segments
    Jon Crockett, Gerard Andrew Whelan, Caroline Louise Sinclair and Hugh Gillies
20. Modelling Choice in a Changing Environment: Assessing the Shock Effects of a New Transport System
    María Francisca Yáñez and Juan de Dios Ortúzar
21. What Do We Really Know About Travellers' Response to Unreliability?
    Yaron Hollander

PART VI: BEYOND TRANSPORT

22. Optimizing Product Portfolios Using Discrete Choice Modeling and TURF
    Thomas J. Adler, Colin Smith and Jeffrey Dumont
23. Preference Stability: Modeling how Consumer Preferences Shift after Receiving New Product Information
    Harmen Oppewal, Mark Morrison, Paul Wang and David Waller
24. Investigating Willingness to Pay–Willingness to Accept Asymmetry in Choice Experiments
    Bruno Lanz, Allan Provins, Ian J. Bateman, Riccardo Scarpa, Ken Willis and Ece Ozdemiroglu
25. Clustering Ranked Preference Data Using Sociodemographic Covariates
    Isobel Claire Gormley and Thomas Brendan Murphy
26. Continuous versus Discrete Representation of Investing Firm Heterogeneity in Modelling FDI Location Decisions
    Simona Rasciute and Eric J. Pentecost
27. Development of Integrated Choice and Latent Variable (ICLV) Models for the Residential Relocation Decision in Island Areas
    Eleni Kitrinou, Amalia Polydoropoulou and Denis Bolduc
Biography
Editors

Stephane Hess is a principal research fellow in the Institute for Transport Studies (ITS) at the University of Leeds and holds a Leverhulme Early Career Fellowship. He is also research group leader for Economics and Behavioural Modelling at ITS, and is a visiting research scholar in the Institute for Transport and Logistics Studies at the University of Sydney. Hess holds a Ph.D. in transport demand modelling from Imperial College London, and an M.Phil. in statistical science from Cambridge University. His main research interests lie in the use of advanced discrete choice models for the analysis of travel behaviour. His contributions have been recognised by the 2005 Eric Pas award for the best Ph.D. thesis in the area of travel behaviour modelling. He is also the founding editor-in-chief of the Journal of Choice Modelling, and is the chair of the Innovative Methods in Transport Analysis, Planning and Appraisal committee at the European Transport Conference, as well as being a member of the council of the Association for European Transport.

Andrew Daly is a research professor at the Institute for Transport Studies in Leeds, a senior adviser to RAND Europe and the author of the widely used ALOGIT software. His work has attempted to bridge the gap between research and practice in choice modelling: pioneering the introduction of random utility models, making advanced models operational for large areas so they can be used in practical planning, and improving the credibility of stated preference methods. He has published and presented well over 100 papers on these subjects, was chair of the recent International Choice Modelling Conference, has contributed to a number of books and regularly reviews papers for the leading transport journals. He has directed large-scale transport modelling projects in The Netherlands, France, Sweden, Denmark, Norway, Australia and the UK and contributed to projects in several other countries. He frequently advises local, national and international government agencies on transport modelling issues.
Guest Speakers

Daniel McFadden passed through the public school system of North Carolina, then graduated in physics but soon developed an interest in human behaviour and turned
to economics. Following the completion of his Ph.D. at the University of Minnesota in 1962, he has spent his career largely at Berkeley, CA, and from 1977 to 1991 at the economics faculty at MIT, where he became Director of the Statistics Research Center. He then returned to Berkeley to establish the Econometrics Laboratory, a facility devoted to improving statistical computation for economics applications, where he is now the holder of the E. Morris Cox chair. From 1964 onwards, seeking methods to analyse institutional decision-making behaviour, he developed an econometric model based on an axiomatic theory of choice behaviour, showing how this model linked to the economic theory of choice behaviour. These developments are now called the multinomial logit model and the random utility model, which have turned out to be widely useful in economics and other social sciences. Over the years he has written papers on a variety of topics in economics and choice theory, almost all having origins in applied problems, often with an emphasis on linking them to economic theory and on developing theoretical and statistical tools. In recent years, his research has concentrated on the deviations from the economic theory of choice and their implications for economic analysis. In 2000, he was awarded the Nobel Prize in Economic Sciences ‘for his development of theory and methods for analyzing discrete choice’. Characteristically, his lecture gave credit to a long list of inspirations and collaborators with whom he has worked. His own contribution to choice modelling is, however, unparalleled.

Moshe Ben-Akiva is the Edmund K. Turner professor of civil and environmental engineering and director of the Intelligent Transportation Systems Program at the Massachusetts Institute of Technology (MIT). He holds a Ph.D. degree in transportation systems from MIT and has received honorary degrees from the University of the Aegean, the Université Lumière Lyon and the Stockholm Royal Institute of Technology (KTH). His awards include a Transportation Science Dissertation Prize from the Operations Research Society of America (now INFORMS), the MIT Department of Civil and Environmental Engineering Effective Teaching Award, the MIT Samuel M. Seegal Prize awarded to professors who inspire students to pursue and achieve excellence, the Lifetime Achievement Award of the International Association for Travel Behavior Research and the Jules Dupuit Prize from the World Conference on Transport Research Society. He has co-authored two books, including the textbook Discrete Choice Analysis, published by MIT Press, and over 200 papers in refereed journals and conference proceedings. He recently co-edited the book Recent Developments in Transport Modelling: Lessons for the Freight Sector, published by Emerald. Two traffic simulators have been developed under his supervision: MITSIMLab, a microscopic simulator, and DynaMIT, a mesoscopic simulator which includes algorithms for dynamic traffic assignment, traffic predictions and route guidance. He has worked as a consultant in industries such as transportation, energy, telecommunications, financial services and marketing for a number of private and public organisations, including Hague Consulting Group, RAND Europe, ChoiceStream and Cambridge Systematics, where he is a senior principal and a member of the board of directors.
David A. Hensher is professor of management and founding director of the Institute of Transport and Logistics Studies (ITLS): The Australian Key Centre of Teaching and Research in Transport Management at The University of Sydney. David is a fellow of the Academy of Social Sciences in Australia (FASSA), recipient of the 2006 Engineers Australia Transport Medal for lifelong contribution to transportation, recipient of the 2009 Bus NSW (Bus and Coach Association) Outstanding Contribution to Industry Award, member of the Singapore Land Transport Authority International Advisory Panel (chaired by the Minister of Transport) and past president of the International Association of Travel Behaviour Research. David is the co-founder of The International Conference in Competition and Ownership of Land Passenger Transport (the Thredbo Series), now in its 20th year. David is on the editorial boards of 10 of the leading transport journals and area editor of Transport Reviews. He is also series and volume editor of the handbook series Handbooks in Transport. He has published extensively (over 425 papers) in the leading international transport journals and key journals in economics, as well as 11 books.

Chandra R. Bhat is the Adnan Abou-Ayyash Centennial Professor at The University of Texas at Austin, where he teaches courses in transportation systems analysis and transportation planning methods. Bhat received the 2004 Walter L. Huber Award and the 2005 James Laurie Prize from the American Society of Civil Engineers (ASCE) in recognition of his contributions to ‘innovative methods in transportation systems analysis and modeling.’ He also received the 2006 Lockheed Martin Aeronautics Company Award for Excellence in Engineering Teaching from the Cockrell School of Engineering at UT Austin, and the 2006–2007 Outstanding Graduate Teaching Award from the UT Graduate School. Bhat was also named as the recipient of the 2008 Wilbur S. Smith Distinguished Transportation Educator Award, and was selected as a 2008–2009 Jefferson Science Fellow by the United States Department of State and the National Academies. More details about Prof. Bhat are available at his website: http://www.caee.utexas.edu/prof/bhat/home.html

Michel Bierlaire, Belgian, born in 1967, holds an M.Sc. and a Ph.D. in mathematical sciences from the Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium (University of Namur). Between 1995 and 1998, he was a research associate and project manager at the Intelligent Transportation Systems Program of the Massachusetts Institute of Technology (Cambridge, MA). Between 1998 and 2006, he was a junior faculty member in the Operations Research group ROSO within the Institute of Mathematics at EPFL. In 2006, he was appointed associate professor in the School of Architecture, Civil and Environmental Engineering at EPFL, where he became the director of the Transport and Mobility Laboratory. Since 2009, he has been the director of TraCE, the Transportation Center at EPFL. His main expertise is in the design, development and applications of models and algorithms for the design, analysis and management of transportation systems. In particular, he has been active in demand modelling (discrete choice models, estimation of origin–destination matrices) and Dynamic Traffic Management Systems. As of October 2009, he has published 44 papers in international journals (including Transportation Research Part B, the transportation journal with the highest impact factor), 1 book, 20 book
chapters, 74 articles in conference proceedings and 84 technical reports, and has given 134 scientific seminars. His article ‘An efficient algorithm for real-time estimation and prediction of dynamic OD table’, co-authored with Frank Crittin, received the Best Paper Award of the Transportation Science & Logistics Society of INFORMS in 2006.
Other Authors

Maya Abou-Zeid is an assistant professor of Civil and Environmental Engineering at the American University of Beirut (AUB) and a research affiliate of the Massachusetts Institute of Technology (MIT). Maya received doctoral and master's degrees in transportation from MIT and a bachelor's degree in civil and environmental engineering from AUB.

Thomas J. Adler is president of Resource Systems Group, Inc., a US-based transportation, market research and environmental consulting firm. Prior to co-founding the firm, he was a professor at Dartmouth College for 10 years, where he taught graduate-level courses in discrete choice modelling, operations research, statistics and transportation modelling methods.

Ricardo Alvarez-Daziano has been a Ph.D. student at Laval University since 2005 and is a member of the GREEN and CDAT research groups of the Department of Economics. His thesis concerns the study of hybrid choice models (HCM) and aims to improve the representation of behavioural decision making.

Gianluca Antonini is a senior research scientist at the IBM Zurich Research Laboratory (ZRL), working with the Information Analytics group in the Mathematical & Computational Sciences Department. He holds a master's degree in telecommunication engineering from the University of Siena, and a Ph.D. from the Signal Processing Institute, École Polytechnique Fédérale de Lausanne (EPFL).

Ian J. Bateman is professor of environmental economics at the University of East Anglia, UK and associate professor at both the University of Western Australia and the University of Waikato, New Zealand.

Michiel C. J. Bliemer is an associate professor of transport modelling at Delft University of Technology, an adjunct professor at the Institute of Transport and Logistics Studies at The University of Sydney, and a senior consultant in transport innovation and modelling at Goudappel Coffeng BV. He holds an M.Sc. degree in econometrics (honours) and a Ph.D. in traffic engineering.

Denis Bolduc is a full professor at Laval University. His major fields of expertise include discrete choice analysis, applied econometrics and demand modelling. He has performed applied and theoretical research mostly in the transportation field. He was chair of the department during the 2002–2008 period, and is currently chair of the CDAT.
Danny Campbell is a lecturer in environmental economics at the Institute for a Sustainable World, Queen's University Belfast. He holds a Ph.D. in environmental economics, an M.Sc. in rural development and a B.Sc. in agricultural economics. Danny's research interests include environmental valuation and methodological issues associated with discrete choice modelling.

Phani Kumar Chintakayala is the Accent research fellow at the Institute for Transport Studies at the University of Leeds, UK. He also works as a stated preference analyst for Accent Marketing & Research Ltd., UK. His research interests are stated preference designs, behavioural efficiency analysis and the application of SP to transport and non-transport sectors.

Jon Crockett is a transport planner and researcher at MVA Consultancy. He is their technical group leader for behavioural research, modelling and analysis, and has specialised in understanding reactions to changes in the transport network(s) through the application of econometric theories.

Javier Cruz received the M.S. degree in mathematics and the M.S. degree in telecommunications engineering from the Technical University of Catalonia (UPC), in 2005 and 2006, respectively. In January 2007, he joined the Transport and Mobility Laboratory (Transp-OR) of EPFL, where he is now working towards his Ph.D. on image and signal processing under the supervision of Prof. M. Bierlaire and Prof. J.-P. Thiran.

Matthieu de Lapparent is a researcher at the French National Institute of Research on Transport and Safety (INRETS). He holds a Ph.D. in economics from the Université de Paris 1 Panthéon-Sorbonne. He specialises in discrete choice modelling accounting for bounded rationality of decision makers and/or for risky choice situations.

Jeffrey Dumont is an associate at Resource Systems Group with professional interests in market research and discrete choice modelling. He is a graduate of the mathematics program at Lafayette College in Easton, PA.

Naveen Eluru is currently a Ph.D. candidate in transportation engineering at The University of Texas at Austin. He received his M.S. degree in civil engineering from The University of Texas at Austin and a bachelor of technology from the Indian Institute of Technology, Chennai. Naveen grew up in Andhra Pradesh, India.

Mogens Fosgerau is a senior researcher at DTU Transport. His main research interests are transport economics, microeconometrics and valuation of non-market goods. His most recent publications relate to the value of travel time reliability and the use of non-parametric methods in transport modelling.

Emma Frejinger holds a Ph.D. in mathematical sciences from EPFL. She won the TSL INFORMS dissertation prize 2008 for her thesis on route choice modelling (Michel Bierlaire, advisor). Since 2008, she has been a researcher at the Centre for Transport Studies (Royal Institute of Technology, Stockholm).
Isobel Claire Gormley is a lecturer in statistics at University College Dublin, Ireland. The statistical modelling of rank data is among her research interests.

Cristian Angelo Guevara is the author of several scientific papers in transportation demand and economics. He is a research professor at Universidad de los Andes, Chile. He holds an M.Sc. from Universidad de Chile and another from MIT, where he now pursues his Ph.D. He has been awarded the Fulbright and the Martin-Family fellowships.

Hugh Gillies is the development management/strategic roads safety manager for Transport Scotland. During the time of this study he led Transport Scotland's research programme on Land use And Transport Integration in Scotland (LATIS), which provided support for the appraisal of policies in areas such as transport, planning and the environment.

Yaron Hollander works for Steer Davies Gleave in London, and is very active in applied research on public transport reliability. In his Ph.D. thesis (at the Institute for Transport Studies, Leeds) and subsequent projects for various clients he investigated travellers' attitudes to unreliability, techniques for forecasting the level of unreliability and the introduction of reliability benefits to scheme appraisal.

Eleni Kitrinou is a lecturer in statistics, computers and methodology of social research at the Department of Sociology, University of the Aegean, Greece.

Paul Koster received his master of science in spatial, transport and environmental economics at the Department of Spatial Economics, VU University Amsterdam in 2007, where he is currently working as a Ph.D. candidate. His research focuses on the estimation and valuation of travel time variability in private and public transport.

James Laird is a senior research fellow at the Institute for Transport Studies in the University of Leeds, UK. His main research area is the economic appraisal of transport projects. He has worked in both the private and the academic environments and has recently completed a Ph.D. in transport economics.

Bruno Lanz is a Ph.D. candidate in economics at ETH Zürich, a visiting scholar at the Massachusetts Institute of Technology's Joint Program on the Science & Policy of Global Change and an associate consultant with Economics for the Environment Consultancy (eftec) in London.

Stefan L. Mabit is an assistant professor at DTU Transport. He finished his Ph.D. at DTU Transport in 2008 on discrete choice issues within transport modelling. His main research interests are travel demand modelling and discrete choice modelling, especially related to willingness-to-pay measures and car demand.

Mark Morrison is associate director of the Institute for Land, Water and Society and professor in the School of Business at Charles Sturt University in Bathurst, Australia. He received his Ph.D. from the University of New South Wales. His research interests are in choice modelling, non-market valuation, market segmentation, marketing
education and environmental marketing. He has published in several major journals in environmental economics and marketing education.

Thomas Brendan Murphy is an associate professor of statistics at University College Dublin, Ireland. The statistical modelling of rank data is among his research interests.

Jeffrey P. Newman served as a researcher in the Transport and Mobility Lab at EPFL, holds a Ph.D. in civil engineering from Northwestern University, and a master of public administration and a B.S. in policy analysis from Cornell University. He has been awarded fellowships from the US Departments of Transportation and Homeland Security.

Harmen Oppewal is professor in the Department of Marketing at Monash University in Melbourne, Australia. He received his Ph.D. from the Technical University of Eindhoven. His research focuses on choice modelling and consumer analysis in retail, services, transport and tourism. He has published in several of the main journals in marketing, transport and planning.

Juan de Dios Ortúzar is professor of transport engineering at the Pontificia Universidad Católica de Chile. He has published widely in the areas of discrete choice modelling and advanced survey methods, with particular interest in the willingness-to-pay for reducing transport externalities. His book Modelling Transport (with L. G. Willumsen) is reaching its fourth edition and is also being translated into Chinese.

Ece Ozdemiroglu, the founding director of Economics for the Environment Consultancy (eftec), is an environmental economist specialising in economic valuation and using this evidence for environmental decision-making. She has co-edited or co-authored 11 books and contributed to several articles. Ece studied economics at Istanbul University and environmental and resource economics at University College London.

Eric J. Pentecost is professor of economics and head of the Department of Economics at Loughborough University, UK. Eric worked as an economics analyst in the Bank of England before joining the staff of Loughborough University in 1987. Eric's research interests are in macroeconomics and international finance, in which fields he has published over 50 academic papers and authored three books.

Amalia Polydoropoulou is associate professor of analysis of transport systems at the Department of Shipping, Trade and Transport, University of the Aegean, Greece.

Allan Provins is an environmental economist working for Economics for the Environment Consultancy (eftec) in London, UK. His main areas of work are the economic valuation of non-market goods and services and the application of cost–benefit analysis for public and private sector decision making.

Sean M. Puckett is postgraduate coordinator in transport management and a lecturer in transport and supply chain management at the Institute of Transport
and Logistics Studies at The University of Sydney. He previously worked at the US Department of Commerce.

Simona Rasciute is a research associate in the Department of Economics at Loughborough University, UK, where she also gained her Ph.D. (2008). Simona did her undergraduate studies in Kaunas, Lithuania, after which she completed two master's degrees, in Kaunas and in Antwerp, Belgium. Simona's research interests are in international and financial economics and discrete choice modelling.

Thomas Robin obtained an engineering degree from the École des Mines d'Alès in France in 2006. Since October 2006, he has been a Ph.D. student in the Transp-OR laboratory at EPFL, under the supervision of Prof. Michel Bierlaire. He works on the behavioural modelling of human experts for scene analysis, particularly on facial expression recognition in videos.

John M. Rose is a senior lecturer in transport and logistics management and program director at the Institute of Transport and Logistics Studies at The University of Sydney. He holds a bachelor of economics (honours) and a Ph.D. in transport economics.

Caroline Louise Sinclair is a transport modeller at MVA Consultancy. She has developed and applied highway, public transport and multi-modal models for a wide range of studies. Recently, she has managed the development of an aviation demand and economics forecasting model for the US Federal Aviation Administration.

Riccardo Scarpa researches choice modelling for non-market valuation methods. Since 2005 he has held a chair in environmental economics at the University of Waikato, in the Economics Department of the Waikato Management School, where he teaches econometrics and resource and environmental economics. He previously worked at York, Newcastle upon Tyne, Madison and Viterbo.

Colin Smith is a senior project consultant at Resource Systems Group, with professional interests in travel demand forecasting and discrete choice modelling. He is a graduate of the civil engineering program at the University of Nottingham and the intercollegiate transport program at Imperial College and University College, London.

Matteo Sorci received his B.S. and M.S. degrees from the Faculty of Telecommunication Engineering, University of Siena, Italy, in 2001, and his doctoral degree in 2009 from EPFL (Swiss Federal Institute of Technology), in the Signal Processing Laboratory under the supervision of Prof. Jean-Philippe Thiran.

Jean-Philippe Thiran received the Elect. Eng. and Ph.D. degrees from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 1993 and 1997, respectively. Since January 2004, he has been an assistant professor, responsible for the Image Analysis Group at the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. His current scientific interests include image segmentation, prior knowledge integration in image analysis, partial differential equations and variational methods in image analysis, multimodal signal processing, and medical image
analysis, including multimodal image registration, segmentation, computer-assisted surgery, and diffusion MRI. Dr. Thiran was co-editor-in-chief of Signal Processing Journal (published by Elsevier Science) from 2001 to 2005. He is currently an associate editor of the International Journal of Image and Video Processing (published by Hindawi), and a member of the editorial board of Signal, Image and Video Processing (published by Springer). He was the general chairman of the 2008 European Signal Processing Conference (EUSIPCO 2008). He is a senior member of the IEEE, and a member of the MLSP and IVMSP technical committees of the IEEE Signal Processing Society.

Yin-Yen Tseng received her Ph.D. in economics at VU University Amsterdam in 2008. Her research focuses on the reliability of travel time in passenger transportation, valuation methods and discrete choice analyses. She is now working as a postdoc at the Department of Spatial Economics, VU University Amsterdam.

David Waller is a senior lecturer in the School of Marketing at the University of Technology, Sydney. He obtained his Ph.D. from the University of Newcastle. His research focuses on advertising strategy and media planning. He has several publications in the main advertising journals.

Paul Wang is a senior lecturer in the School of Marketing at the University of Technology, Sydney. He obtained his Ph.D. from the University of Arkansas. His interests are in choice modelling, segmentation and market research. He has published in several of the main marketing journals.

Mark Wardman is professor of transport demand analysis at the Institute for Transport Studies (ITS) at the University of Leeds, and is the director of ITS. His main research interests are econometric analysis of travel demand, particularly rail and stated preference, with an emphasis on public transport and environmental factors.

Gerard Andrew Whelan is an economist and transport planner by profession. His work covers the road, rail and air sectors with a focus on competition and consumer choice. Gerard leads MVA Consultancy's transport planning and social market research activities in London.

Ken Willis is professor of environmental economics at Newcastle University. He has conducted numerous valuation projects using stated preference choice models, as well as contingent valuation, travel-cost and hedonic price models. Issues covered include biodiversity, cultural heritage, energy, forests, landscape, quarries, recreation, transport, waste disposal, and water quality and supply.

María Francisca Yáñez received her B.Sc. degree in civil engineering from the Pontificia Universidad Católica de Chile (PUC). She is currently a Ph.D. candidate at PUC, and is enjoying a one-year stay at the German Institute for Economic Research (DIW Berlin) financed by the Chilean Council for Scientific and Technological Research and the German Academic Exchange Service (DAAD). She has presented part of her thesis work at the most important conferences in Latin America and Europe.
Editorial
The inaugural International Choice Modelling Conference was organised by the Institute for Transport Studies at the University of Leeds and was held at Harrogate in Yorkshire in the North of England from 30 March to 1 April 2009. The conference brought together leading researchers and practitioners from across the many different areas in which choice modelling is a key technique for understanding behaviour and evaluating policy. The conference also came 29 years after another University of Leeds-organised conference on Research and Application of Disaggregate Travel Demand Models. Several of the delegates from the 1980 event were once again present at our conference, showing the longevity of the field. The presence of numerous new young delegates illustrates the growing international popularity of choice modelling as a topic of research.

The highlight of the conference was a presentation by Professor Daniel McFadden from the University of California at Berkeley, Nobel Prize laureate in Economics and chief architect of random utility modelling. We were similarly delighted to be able to secure the involvement of five other leading choice modellers as keynote speakers, namely Professor Moshe Ben-Akiva, Professor Chandra Bhat, Professor Michel Bierlaire, Professor David Hensher and Professor Riccardo Scarpa. The diversity of the field was reflected in presentations by both academics and practitioners, coming from six continents and a variety of fields, with a similar mix among the non-speaking delegates.

This book brings together a selection of the best theoretical and applied papers from the conference, as well as five of the guest speaker contributions. We specifically chose not to use this editorial for the purpose of summarising the various papers contained in this volume; we leave it to the readers to form their own opinions. We have, however, grouped the papers into a number of subsets. The first of these contains the five guest speaker papers referred to above, which indicate the achievements and current frontiers of the research area in a number of directions. This is followed by a set of four papers looking at data collection, primarily concerned with issues of experimental design, and another set of four papers dealing with behavioural concepts and methodology which introduce a number of new ideas. Next are four papers looking in different ways and contexts at issues to do with endogeneity and heterogeneity, illustrating the rich variety of types of behaviour that can be addressed. The field of transport has always been one of the most fertile
breeding grounds in choice modelling, and this is reflected in the four papers contained in the next section. However, highlighting the multi-disciplinary nature of the field and the conference, the final subset contains six papers with applications from beyond transport.

It remains for us to once again thank our five conference sponsors, MVA Consultancy, Peter Davidson Consultancy, RAND Europe, Resource Systems Group Inc. and Significance, and our additional supporters, Accent and PTV. Further thanks need to go to the members of the academic committee, as well as Julie Hipkin, who worked tirelessly behind the scenes. Finally, all delegates will remember the wonderful musical performance by Supertram, joined on stage by the Red Hot Chilean Professors and John Bates Piano Services, with Mark Wardman on additional vocals and air guitar.

In closing, we hope to see many of the authors and readers of this volume at the second International Choice Modelling Conference, planned for 2011.

Stephane Hess
Andrew Daly
University of Leeds, UK
PART I GUEST SPEAKER PAPERS
Chapter 1
Sociality, Rationality, and the Ecology of Choice

Daniel McFadden
Abstract

This paper discusses the influence of human sociality on choice behavior, through association with social networks and the influence of these networks on constraints, perceptions, preferences, and decision-making processes. The paper discusses ways to incorporate these factors into choice models, while retaining the aspects of the theory of individual rationality that are predictive. Finally, the paper outlines an econometric method for solving the "reflection problem" of determining whether social affiliations follow preferences, or preferences follow social affiliations, by distinguishing opportunity-based and preference-based motivations for association with social networks.
1.1. Introduction

A defining feature of the neoclassical economic theory of choice is that consumers meet in the marketplace, but are individualistic and egocentric in their tastes and beliefs. They are indifferent to the welfare of others, with sovereign preferences that are immune to the influence of their peers. Jostling in the market, haggling over prices, and influences that are not effectively priced, such as congestion, induce interdependence between consumers that may be intense. Consumers use these interactions to acquire facts and update their rational expectations, but in the classical theory, a consumer never lets social interactions get under her skin and
directly touch her perceptions or preferences. Rivalry in markets may be up-close, but if it becomes personal, then one has to take it outside (of classical economics).

This paper considers the alternative proposition that sociality, the influence of direct interpersonal interaction on human behavior, must be taken into account in modeling choice behavior. Humans are social animals, and even when self-interest is paramount, one's self is defined through the reaction and approval of others. Models that fail to account for sociality will not predict well in circumstances where consumers use their social networks to define their beliefs and preferences. This paper examines ways to model behavior that incorporate sociality while keeping the core aspects of rationality that are predictive for choice.

Sociality is usually defined as the tendency to associate in or form social groups and networks, such as families, friends, neighbors, co-workers, teams, religious sects, tribal and ethnic groups, political parties, and other affinity and interest groups. I will term all of these alliances social networks. Membership in some social networks is involuntary — you do not get to choose your family or your nationality. However, many social network affiliations are voluntary, and are often dynamic, involving recruitment, commitment in varying degrees, and the possibility of expulsions, quits, and break-aways. There are four (nonexclusive) primary motivations for voluntary affiliation with social networks: mutual support, protection, and status; efficiencies in collection and sharing of information; opportunity-based homophily (joint production, division of effort, and risk-sharing); and preference-based homophily (decision-making economies, and approval). All of these motivations stem from the evolution of humans as social animals who are comfortable operating in groups, but the last three are also economic, reducing the likelihood of regret from unsatisfactory choices.

Two significant features of the social behavior of humans are reciprocity, exchanges of benefits that are not necessarily synchronous or bilateral, and altruism, conveying of benefits to others without an expectation of a reciprocal benefit. Reciprocity is often explained by enlightened self-interest, the instrument for establishing reputation and trust that facilitates mutually advantageous commerce. Norms for reciprocity and fairness may evolve in social networks through accountability, approval, and sanctions of network members; e.g., a person who fails to conform to norms for sharing may be punished. Networks also establish reciprocity and fairness norms through recruitment and expulsion of members, recruiting those who appear to accept the network norms and expelling those who fail to conform. Finally, to the extent that these norms make social networks more rewarding for members, there is likely to be group selection, in which networks with more effective norms are more attractive, and consequently more likely to survive.

Altruism enters behavior when the preferences of individuals depend on the satisfactions achieved by other network members. One way to recast classical
consumer theory to encompass altruism is to postulate that consumers have individualistic felicities, and personal welfare functions that depend on their own felicity and the felicities of others. An immediate question with this formulation is how a consumer can know or judge how satisfied other network members are. One possibility is that consumers depend on the self-reported satisfactions of others, but such reports are not necessarily reliable; e.g., the aphorism "Learn to complain without suffering." Another, proposed by Browning and Chiappori (1998), is that satisfaction of others is judged by observed features of their consumption; e.g., the size of their house or the brand of their wristwatch.

There are several broad regularities of altruism that provide some insight into its nature. First, altruism is weaker toward people who, when placed in a situation similar to oneself, behave differently than oneself (see Luttmer, 2001). Second, altruism is stronger toward people who demonstrate altruism to other network members, and this reinforces norms for multilateral reciprocity (see Tagiuri & Kogan, 1957). Together, these patterns suggest that consumers may evaluate the felicities of others using their own tastes, and what they can observe or infer about others' circumstances. Such behavior would be consistent with the psychological observation that while people may be empathetic to others, they find it difficult to "put themselves in the other person's shoes," so that the other person's pleasure, pain, and emotion are attenuated and interpreted through one's own experiences. Behavior that does not seem sensible when viewed using one's own preferences and expectations will tend to be interpreted as destructive or improper.

A behavioral aspect of individualistic preferences is that people seem to be on a hedonic treadmill, with quick adjustment to homeostasis, and pleasure and pain experienced relative to the status quo, with losses looming larger than comparable gains. The hedonic treadmill may also operate interpersonally, with personal welfare adapting quickly to status quo privilege and status in networks, but highly sensitive to gains and losses in position in the network. Thus, "keeping up with the Joneses" is less about one's historical status and more about the "fairness" of changes in relative status, and individuals are likely to be particularly averse to unfavorable innovations that they view as unfair or undeserved. Both altruism and reciprocity can be interpreted as the result of evolution, mechanisms for satisfactory solution of repeated games or resource allocation within networks that have survival value, and have become internalized through the success of individuals and networks with higher propensities to conform to the associated behaviors.

This paper develops the implications of sociality for choice behavior, and discusses ways in which social network effects on choice can be modeled. Section 1.2 gives a very selective review of the large literature on sociality and behavior, and classifies the ways in which sociality may enter the decision-making process. Section 1.3 gives a more detailed discussion of the channels through which social network effects influence choice, and ways to build these channels into econometric models of choice. Section 1.4 examines more closely the econometric issue of simultaneity in the determination of individual choices and aggregate network behavior.
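One minimal formalization of the personal welfare functions sketched above (the notation and imputation rule here are illustrative assumptions, not the paper's own) is

$$ W_i \;=\; u_i(x_i) \;+\; \sum_{j \in N(i)} \alpha_{ij}\,\hat{u}_{ij}, \qquad \hat{u}_{ij} = u_i(\tilde{x}_j), $$

where $u_i$ is consumer $i$'s individualistic felicity, $N(i)$ is her social network, $\alpha_{ij} \ge 0$ are altruism weights, and $\hat{u}_{ij}$ is the felicity that $i$ imputes to member $j$ by applying her own tastes $u_i$ to the observable features $\tilde{x}_j$ of $j$'s circumstances. Setting all $\alpha_{ij} = 0$ recovers the egocentric neoclassical consumer, while the imputation rule $\hat{u}_{ij} = u_i(\tilde{x}_j)$ captures the regularity noted above that people judge others' satisfaction through their own preferences rather than through reliable interpersonal comparison.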
1.2. How Sociality Influences Economic Behavior

Economists and other social scientists have long been aware that behavior is influenced by the social networks in which individuals operate. The early economists emphasized the importance of family in economic decisions. Adam Smith (1759) said "Every man feels [after himself, the pleasures and pains] of the members of his own family." Edgeworth (1881) said "... efforts and sacrifices ... are often incurred for the sake of one's family rather than oneself." Veblen (1899) emphasized the competitive aspects of sociality, stating that the "human instincts of emulation, predation, workmanship, parental bent, and idle curiosity [dictate consumption behavior]." These aspects of consumer behavior were omitted from the neoclassical formalization of consumer theory by Hicks and Samuelson, but not from the concerns of economists. Duesenberry (1949) and Schelling (1969, 1971) emphasized the comparative aspects of consumption. Manski (1993), Banerjee (1992), and Samuelson (2004) emphasized the role of information obtained from social networks regarding the attributes and desirability of choice alternatives; e.g., Banerjee says "We often decide on what stores and restaurants to patronize or what schools to attend on the basis of how popular they seem to be."

There have been attempts to model network effects on choice. Brock and Durlauf (2001, 2002) construct a model in which "[t]he utility or payoff an individual receives from a given action depends directly on the choices of others in that individual's reference group [a field effect], as opposed to the sort of dependence which occurs through the intermediation of markets." Manski (1995, Chapter 7) shows that field effects cannot be identified from equilibrium observations except through functional restrictions. Dugundji and Walker (2005) estimate a model in which "interdependencies [across decision-makers] are captured [through] choices of others [and] correlation of disturbances," an informative analysis that does not, however, overcome Manski's identification problem.

An aspect of sociality that has received relatively little attention from econometricians is the endogeneity of network effects when network affiliation is voluntary. A simple and fundamental question is whether individuals join networks to gain information that will improve their choices, or to gain approval that will reinforce their choices. A few experimental studies indicate that network affiliation is at some level a "disposable" aspect of behavior that is acquired or dropped at the individual's convenience, and does not loom large in preferences (see Myagkov, Orbell, Johnson, & Menshikov, 2007). However, this seems at odds with the rather strong reciprocity and altruism effects that are observed even in relatively transient social networks.

Today, interest in sociality ranges across the scientific spectrum, from neurological and cognitive science, through evolutionary biology, to studies of cultural and social norms in anthropology and sociology, and to demand for positional goods in marketing. However, while sociality has been widely recognized as critical to understanding human behavior, the mechanisms underlying its influence on choice have been obscure, and there is no simple formalization of socially driven choice behavior that is comparable to that for neoclassical consumer theory. A useful
starting point is to recognize that sociality can influence the choice process at four stages: constraints that define available choice alternatives, information about and perceptions of alternatives, preferences, and the processes used to reach decisions. The following section discusses each of these channels in some detail, and cites selected evidence on their nature and importance.
1.3. Modeling the Effects of Sociality

1.3.1. Constraints
Constraints are the limits on choice imposed by the actions of others. These may be mediated by the market, as in the price of gasoline a commuter faces as a result of the driving behavior of others, or bridge tolls imposed by highway authorities. They may also come from outside of markets, as in the effect of travel decisions by other household members on the modes available to a commuter, or the effect of driving decisions of others on the traffic congestion a commuter faces. For example, the choice environment faced by a driver in congested conditions is largely determined by the constraints imposed by surrounding traffic.

The classical theory of rational behavior is designed to handle these sorts of constraints. Its primary focus is on the consumer's budget constraint operating through market prices and income, but nonmarket externalities (e.g., congestion, social network field effects) can be accounted for in neoclassical characterizations of alternatives, individualistic preferences can take these effects into account, and one does not have to introduce new features of consumer theory to accommodate them. In judging whether rational models are adequate for predicting choice behavior, it will be important that these models be fed realistic and fully articulated constraints. In particular, instrumental activities such as travel that are part of household and network production of personal benefits can be accommodated within the neoclassical model, although when the constraints on the individual are determined by bargaining in the social network, it may not be appropriate to treat them as exogenous. For example, two-stage budgeting at the family level, negotiation over who gets the family car, and allocation of household production tasks require a formulation of the consumer problem that goes beyond individualistic felicities. In some circumstances, individuals may subject themselves voluntarily to constraints as part of their resource allocation problem within their social networks, so these constraints are endogenous.
1.3.2. Perceptions
Perceptions are formed in the rational consumer model through observing events and processing facts in a systematic way consistent with the laws of probability. In general, the classical theory permits consumers to have innate subjective priors that are updated over time through experience and observation, with the market
reconciling the beliefs of different individuals through myriad wagers, a phenomenon we do not observe. There is much stronger empirical support for the behavioral alternative that memory is imperfect, with selective memory for first, last, and extreme occurrences, and coincidences, and the personal probability calculus is inconsistent, with small probabilities either overestimated or ignored. Analogies and exemplars, often drawn from associates in social networks, are used in place of carefully calculated risk probabilities. Risk perceptions are modulated by inattention, and heavily influenced by the status quo.

One major problem that individuals have in forming accurate perceptions is extension bias, a cognitive failure to consistently integrate experience along dimensions that are relevant for determining satisfactory choices. A classical experiment of Kahneman, Fredrickson, Schreiber, and Redelmeier (1993) shows that individuals remember an episode that extends over time primarily in terms of the extreme and final levels of experienced pleasure or pain, rather than an integral of the instantaneous sensations. This can lead to choices based on remembered felicity that are inconsistent with optimization of experience.

A convincing example of how extension bias can harm decision-makers is the common perception that when driving, one tends to frequently get stuck in a slow lane, compelling a lane change. Redelmeier and Tibshirani (1999) conducted an experiment in which they "videotaped traffic sequences by mounting a camera in a moving vehicle and filming the side-view perspective of the next lane on a congested road. When a section of videotape showing a slightly slower average speed in the next lane was screened to driving students (n = 120), 70% stated that the next lane was moving faster and 65% said they would change lane if possible."

What causes this common perception? Psychophysical effects may play a role. An overtaken vehicle is quickly out of sight (and out of mind), while an overtaking vehicle is a visible source of irritation until it disappears ahead. Losses from the "status quo," being overtaken, outweigh gains, and are more memorable. Slowly moving drivers may be more attentive to adjacent lane activity. Finally, humans (and other animals) are more stressed by objects moving toward them in their visual periphery than objects moving away from them in their central vision. However, an extension bias is the most likely explanation of the misperception. Suppose two adjacent lanes have the same average speed, and over a long distance are equally fast, but vary in relative speed. If you count the number of cars passing you, or that you pass, or the proportion of the distance traveled during which you are being passed, you will conclude correctly that there are no advantages to switching lanes. However, if you take a time average of the events of passing or being passed, then you will conclude that you are being passed a majority of the time. Figure 1.1 shows the transit times of drivers in adjacent lanes 1 and 2 when speeds vary sinusoidally, with a phase shift between the lanes. Lane 1 is initially faster, but eventually slower, and total transit time over a kilometer is the same for both lanes. In either lane, the event of being passed occurs 58% of the time, so that all drivers who time average will conclude that they are stuck in the slower lane.
Redelmeier and Tibshirani find a similar pattern in a more realistic simulation with random arrival times in the two lanes and various levels of congestion.
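The mechanism can be checked numerically. The following is a minimal simulation sketch, not the authors' code; the speed profile and parameter values are assumptions chosen to roughly reproduce the 58% figure. Each driver crawls through the stretches where the adjacent lane is faster and speeds through the stretches where it is slower, so the slow stretches receive disproportionate weight in a time average.

```python
import numpy as np

# Position along a 1 km stretch; speeds vary sinusoidally over space,
# with lane 2 half a cycle out of phase with lane 1 (illustrative values).
x = np.linspace(0.0, 1.0, 200_000, endpoint=False)
v0, a = 50.0, 12.0                               # mean speed, amplitude (km/h)
v1 = v0 + a * np.sin(2 * np.pi * x)              # lane 1 speed at position x
v2 = v0 - a * np.sin(2 * np.pi * x)              # lane 2 speed at position x

for own, other, name in ((v1, v2, "lane 1"), (v2, v1, "lane 2")):
    dt = 1.0 / own                               # time spent per unit distance
    passed = other > own                         # being overtaken at position x
    dist_share = passed.mean()                   # distance-weighted share
    time_share = (dt * passed).sum() / dt.sum()  # time-weighted share
    transit_min = 60.0 * dt.mean()               # total transit time (minutes)
    print(f"{name}: transit {transit_min:.2f} min; passed over "
          f"{dist_share:.0%} of the distance, {time_share:.0%} of the time")
```

Both lanes have identical transit times and are overtaken over exactly half the distance, yet each driver spends well over half the time being passed (about 58% with these parameter values): the count-based and distance-based measures are unbiased, while the time average is not.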
Figure 1.1: Travel times in adjacent lanes.

The point of this example is that perceptions, even those based on personal observation, are not necessarily accurate enough to exclude bad choices. Then, the reported perceptions and experiences of network members, and their observed behavior, may provide a stabilizing guide to the individual decision-maker. An important role of social networks is to collect and disseminate information on attributes of choice alternatives. Active information transfers occur when "early adopters" convey their experiences with new products to other network members; e.g., friends' ratings of new movies, books, and restaurants. Passive information transfers occur when individuals "learn by observing" behavior patterns of network members; e.g., perceiving that a brand of mobile phone is desirable because many network members have acquired one. The dynamics of either type of information flow are described by contagion models, in which the share of early adopters in a network, and the level of reliability and trust in the network, determine the speed of transmission and the ultimate penetration of the information. For example, transmission will be more complete in voluntary networks characterized by preference-based homophily, as the shared tastes in such networks raise the likelihood that alternative attributes that appeal to others in the network will also appeal to you.

A feature of contagion processes is that they can lead to distinct nonunique equilibria, perhaps triggered by systematic features of a network, but also perhaps generated by chance. Kohler (1997) gives an example of the dissemination of information on contraceptive techniques among women in rural villages in Korea, where different methods are dominant in different villages due to chance first adoptions and contagion within, but not between, villages. An important econometric consequence of equilibrium in contagion models is that it will often be impossible to identify from observations on equilibrium the structural parameters of the process, and separate the effect of initial propensities of a network from the effect of contagion (see Manski, 1993). To overcome this problem, it is necessary to observe the dynamics of the process. Further discussion of the field effects of information on perceptions and induced preferences can be found in Banerjee (1992), Aoki (1996), and Brock and Durlauf (2001, 2002).
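A minimal sketch of this chance-driven lock-in is the reinforcement process below, in the spirit of the Kohler example; the imitation probability and population sizes are assumptions for illustration, not estimates from that study.

```python
import random

def village_equilibrium(n_methods=2, n_adopters=500, p_imitate=0.95, seed=None):
    """Sequential adoption in one village: each adopter imitates a randomly
    observed earlier adopter with probability p_imitate, and otherwise makes
    an idiosyncratic choice (the chance 'first adoptions')."""
    rng = random.Random(seed)
    adoptions = []
    for _ in range(n_adopters):
        if adoptions and rng.random() < p_imitate:
            adoptions.append(rng.choice(adoptions))     # within-village contagion
        else:
            adoptions.append(rng.randrange(n_methods))  # idiosyncratic choice
    return [adoptions.count(m) / n_adopters for m in range(n_methods)]

# Identical villages with no contact between them: which method dominates
# differs across villages purely by chance.
for v in range(5):
    shares = village_equilibrium(seed=v)
    print(f"village {v}: method shares " + ", ".join(f"{s:.0%}" for s in shares))
```

Because only the equilibrium shares are observed, an analyst who sees, say, 85% use of one method in a village cannot tell whether that village had a strong initial propensity for the method or simply an early run of chance adoptions that contagion amplified, which is exactly the identification problem noted above.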
A strong behavioral effect is ambiguity aversion — people will choose an alternative with known risk characteristics over one with unknown risk characteristics, even in circumstances where there can be no objective difference in risk. Figure 1.2 shows two bowls, each containing 10 balls that are either white or black. You are asked to choose a bowl, then draw a ball. If you draw a black ball, you win a prize. Bowl A contains five black balls, so the probability of winning if you choose A is 50%. The number of black balls in bowl B is completely unknown, with no prior information that would lead you to believe that one configuration is more likely than another. Then, your probability of winning if you choose B is also 50%. However, experimental subjects overwhelmingly choose bowl A. More generally, when consumers are faced with choice among unfamiliar alternatives, ambiguity is a critical element. People will then be strongly motivated to turn to their social networks for information that removes ambiguity, strengthening field effects and herd behavior. Gilboa and Schmeidler (1989) and Fosgerau and De Borger (2008) show that this behavior is "rational" when consumers use least-favorable Bayesian priors to resolve ambiguity.

Ambiguity aversion has a number of behavioral consequences. First, it will induce strong "status quo" effects — "the devil you know is better than the devil you don't." Second, neglect of private information and adoption of protective heuristics (e.g., "Don't gamble." or "Never a borrower nor a lender be.") will often induce inefficient resource allocation, even if effective in avoiding major risks. Third, while emulating network behavior will induce stable response to small shocks, as decision-makers discount private information relative to observed field response of the network, it may induce chaotic response to large shocks when contagion flips the network to a new equilibrium.
Figure 1.2: Ambiguity aversion. [Two bowls of 10 balls each: bowl A has five black and five white balls; the composition of bowl B is unknown.]
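The claim that the win probability from bowl B is also 50% can be made explicit with a short derivation (an added illustration, formalizing "no prior information" as a prior on the number K of black balls in bowl B that is symmetric in black and white):

P(win | B) = Σ_{k=0}^{10} (k/10) Pr(K = k) = E[K]/10 = 5/10 = 1/2

since symmetry under the relabeling k ↔ 10 − k forces E[K] = 5. The two bowls are thus objectively identical lotteries; what differs is only how the winning probability is known.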
1.3.3. Preferences
The neoclassical consumer model is based on the proposition that consumers have predetermined, sovereign, individualistic preferences, and as a consequence are indifferent to the welfare of others, and immune to influence from their social networks. The existence of social networks belies this premise — the attraction of social networks and the glue that permits their operation must be rooted in interpersonal sources of satisfaction. Approval by others, and accountability to network social norms and the threat of sanctions, may shape preferences and be internalized. One model for these interpersonal connections is that individuals have personal welfare functions of the individualistic felicities of themselves and of others in their network, with altruism explaining the dependence. Other interpersonal traits may also enter this characterization of preferences, such as competition for status, predation, and altruistic punishment of network members who breach social norms. In practice, individuals will not know all details of the consumption of others, or their idiosyncratic felicities. This can be modeled by postulating that individuals form expectations regarding the felicities of others using their own experiences and tastes, and what they observe about the behavior and stated satisfactions of others. A strong but plausible hypothesis is that when judging the satisfaction of others, individuals use their own felicity tastes, and are substantially less willing to accept statements of satisfaction that are inconsistent with their own tastes.

Strong empirical evidence for altruistic preferences comes from experimental play of the trust game. This is a one-shot game with anonymous players who are not allowed to engage in side conversation. The experimenter gives 100 monetary units to the first player, called the investor. This player can then transfer a proportion X of this endowment to the second player, called the trustee. The experimenter augments this transfer with an additional 2X monetary units, so the trustee has 3X in total. The trustee then transfers an amount Y between zero and 3X back to the investor, and keeps the remainder. This is a dictator sub-game for the trustee, and a rational response for a selfish trustee is to return Y = 0. Then, it is rational for the first player to invest X = 0. It is found in many trust game experiments that the players are not neoclassically rational, and instead both the amounts invested and the amounts returned are positive. For example, Berg, Dickhaut, and McCabe (1995) find that for knowledgeable players, the average investment is 56.1 monetary units (56.1% of the endowment), and the average return by the trustee is 67.6 monetary units, which is 40.2% of the augmented investment the trustee receives. The question then is what the motivations of the players could be. One possible explanation for trust game behavior is that each player has an individualistic felicity for herself, and imputed felicity for the other player, and seeks to maximize a personal welfare function that gives a positive altruistic weight to the other player's imputed felicity. Another is that social norms for reciprocity are sufficiently strong, and sufficiently internalized, so that some trustees will override utility considerations and return to the investor a "fair" portion of the augmented investment received. For example, an investor may believe that there are three types of trustees: (1) selfish/rational players who will return nothing, (2) reciprocators who
will return half the augmented investment, 3X/2, and (3) utility maximizers who are somewhat altruistic and will maximize a personal welfare function that is symmetric with the investor's personal welfare function. The likelihood the investor attaches to these types will then determine the level of investment that maximizes expected personal welfare. A construction illustrates how this might work. Suppose each player has a CARA felicity function, u(c) = (1 − exp(−δc))/δ, where δ is a risk aversion parameter which will be assumed to be one in this example. Suppose that the investor anticipates, in this case correctly, that the trustee's felicity function is the same as her own, and assume that her personal welfare function is

u = E_y[ (1 − exp(−δ(1 − x + y)))/δ + λ (1 − exp(−δ(3x − y)))/δ ]

where λ parameterizes the extent of altruism, and the expectation is with respect to the investor's beliefs on trustee behavior, given by a probability α that the trustee is of the selfish type and a probability β that she is of the welfare-maximizing type.¹ Values of α, β, and λ that result in X and Y matching the McCabe data are α = 17.8%, β = 1.8%, and λ = 0.05. Note that these values are not unique; we are fitting three parameters to two data points, and furthermore, the characterization of reciprocators is unwarrantedly specific. However, the nonlinearity of maximized expected welfare in the parameters restricts solutions to a range where the proportion of reciprocators is relatively high and the proportion of welfare maximizers is relatively low. The fit illustrates that a mix of selfishly rational, norm-driven, and altruistic behaviors may explain outcomes in the trust game and similar games.

Fehr and Fischbacher (2004) and Baumgartner, Heinrichs, Vonlanthen, Fischbacher, and Fehr (2008) conduct trust game experiments in which players are subject to treatments in which they are administered the "trust" peptide oxytocin, or a placebo, prior to play. They find that investors who received the oxytocin choose substantially higher investments than those administered the placebo. On the other hand, trustee play is not affected by the treatment. These experiments show that perceptions and/or preferences can be altered chemically. This is an acid test that shows that preferences cannot be sovereign and immutable, and must be considered functions of the context in which the individual is placed.
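To make the trust-game calibration above concrete, the following sketch (illustrative code, not from the original analysis; the endowment is normalized to 1, δ = 1 as in the example, and the function names and grid search are assumptions) evaluates the investor's expected personal welfare under the three trustee types and locates the welfare-maximizing investment:

```python
import numpy as np

delta, lam = 1.0, 0.05            # risk aversion and altruism weight, as in the text
alpha, beta = 0.178, 0.018        # P(selfish type), P(welfare-maximizing type)

def u(c):
    """CARA felicity u(c) = (1 - exp(-delta * c)) / delta."""
    return (1.0 - np.exp(-delta * c)) / delta

def trustee_return(x, ttype):
    """Amount y returned by each trustee type given investment x."""
    if ttype == "selfish":
        return 0.0
    if ttype == "reciprocator":
        return 1.5 * x            # half of the augmented investment 3x
    # welfare maximizer: interior optimum from footnote 1
    return max(0.0, min(3.0 * x, 2.0 * x - 0.5 + np.log(lam) / (2.0 * delta)))

def expected_welfare(x):
    """Investor's expected personal welfare at investment x."""
    types = {"selfish": alpha, "reciprocator": 1.0 - alpha - beta,
             "welfare maximizer": beta}
    welfare = 0.0
    for ttype, prob in types.items():
        y = trustee_return(x, ttype)
        welfare += prob * (u(1.0 - x + y) + lam * u(3.0 * x - y))
    return welfare

# grid search for the optimal investment
grid = np.linspace(0.0, 1.0, 1001)
x_star = grid[np.argmax([expected_welfare(x) for x in grid])]
print(f"welfare-maximizing investment: {x_star:.3f}")
# compare with the average investment of 56.1% of the endowment cited above
```

Note that with λ = 0.05 the welfare-maximizing trustee's interior optimum, y = 2x − ½ + (ln λ)/2δ, is essentially at zero for any feasible investment, so that type behaves almost like the selfish type in this calibration; this is one way to see why the fitted proportion of reciprocators must be high.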
1.3.4. Process

The decision-making process is simple in the neoclassical model — maximize expected utility, with rational expectations. The behavioral alternative is that individualistic utility maximization may be supplemented or replaced by processes that use analogies,
1. A welfare-maximizing trustee will maximize u = λ(1 − exp(−δ(1 − x + y)))/δ + (1 − exp(−δ(3x − y)))/δ in y subject to 0 ≤ y ≤ 3x; this is achieved at y = max{0, min{3x, 2x − ½ + (ln λ)/2δ}}.
exemplars, and heuristics influenced by information obtained from social networks, internalized social norms, and approval of network members. A very broad characterization of social experiments and observations is that consumers behave consistently with individualistic rationality when stakes are high, but show influence of sociality when stakes are modest, particularly in situations where alternatives are unfamiliar and ambiguous. This decision-making pattern is consistent with a model of decision-making inside social networks in which the individual follows the herd in circumstances where the effort required to complete individualistic optimization is high, and the potential regret from follower behavior is less than the potential regret from a decision that does not rely on network support. If decision-making is costly, and regret painful, then it may be an efficient strategy to affiliate with a social network, accept the limits on choices imposed by network social norms, and follow the herd. This is, for example, the behavior of riders in a bicycle race, who form pelotons in which riders follow a rotating leader. By doing so, they save energy and avoid wasteful and unproductive excursions. However, "break-aways" also occur in this example, and in the broader context of decision-making, this would correspond to abandonment of an old social network affiliation, and formation of a new network that more closely matches the individual's perceptions and tastes. How heuristics and procedures other than welfare optimization develop and survive is an important question. It is possible that some procedures are initially motivated by welfare maximization, and live on as heuristics in analogous situations because they are available and have proved satisfactory in the past. It is also possible that trial heuristics arise more or less at random, and the successful ones are selected and reused. The observation that behavior is close to optimum (judged by neoclassical standards, such as revealed preference) when stakes are high suggests that individuals facing a choice may first engage in a meta-choice of decision-making protocol. When the stakes are low and the possibilities for regret are small, readily available, easily deployed heuristics are likely to be adopted, particularly easy rules that rely on network field effects. When the possibilities for regret are more substantial, then more attention is allocated to the decision problem, and solutions that test for consistency with self-interest are more likely to be employed. When attention is limited and decision-making effort costly, this may be a choice strategy that in an overarching sense is nearly optimal.
1.4. Econometric Analysis of the Effects of Sociality on Choice

This paper has outlined some of the ways in which sociality can influence decision-making. The channels of influence through constraints, perceptions, preferences, and process are not necessarily mutually exclusive and distinct, but there are differences in observable implications for the different channels, and with experiments it will be possible in most cases to identify the separate effects. However, the econometric task of identifying these effects in natural choice settings is difficult. To be specific, consider the question of the identification of field and group (homophily) effects,
the problem studied by Manski (1993), Aoki (1996), and Brock and Durlauf (2001, 2002), in which individual preferences and/or perceptions are influenced by the field effects from network choices. An empirical study by Dugundji and Walker (2005) considers this problem for mode choice in Amsterdam, and considers the influence of neighborhood and social strata networks. The authors consider homophily effects, which appear as unobserved factors common to all network members, and field effects, which appear as factors in individual utility functions. In their setup, there is no separation of opportunity-based and preference-based homophily effects, or distinction between field effects that arise from common supply-side or opportunity-based factors, and those that arise because of contagion in transmission of information on the available alternatives. The authors do not resolve Manski's reflection problem, or deal with possible endogeneity. Subject to these limitations, the authors find that network field effects are significant, and that unobserved group or homophily effects do not add significant explanation once field effects are taken into account. A natural question is whether further econometric analysis could untangle the homophily, field, and equilibrium effects, or whether, in the absence of observations on the dynamics of mode choice, point identification of these effects is impossible. Several features of the networks the authors consider may aid identification. Both neighborhood and social strata networks are arguably predetermined, so that network affiliation can be treated as exogenous.² Neighborhood network effects are likely to come from supply-side constraints and opportunity-based homophily, such as availability and convenience of public transit or bicycle lanes. Social stratum network effects are more likely to come from preference-based homophily and from economic constraints; e.g., high-income people are likely to have similarly high values of time, and to have budgets that permit consideration of all alternative modes. Field and supply-side effects act on individuals through market equilibrium, and this will tend to make them endogenous when they are entered as explanatory factors in choice models. Specifically, if there are unobserved group or homophily effects, then they will have a non-negligible impact on market equilibrium, inducing dependence between unobservables in the individual choice and market-level field effects. This is precisely the reflection problem considered by Manski in the context of linear models.

2. Of course, if individuals engage in life cycle optimization, then the choice of where to live, and the choice of job that determines income and social status, may be influenced by the same latent factors as later decisions such as mode choice, so that networks defined by neighborhood and socioeconomic status are not genuinely predetermined.

To analyze this problem further, consider a model of social network effects in which there are K networks, indexed k = 1, …, K, and j = 1, …, N_k individuals in network k. Suppose for simplicity that each individual belongs to exactly one network, and faces a binomial transportation mode choice. Suppose the relative utility of alternative 1 is u = X_jk β + Y_k γ + α_k + ε_jk, where X_jk is a vector of observed relative mode attributes, possibly interacted with observed individual characteristics, Y_k a network equilibrium (field) effect, α_k an unobserved random group (e.g., preference-based-homophily) effect, ε_jk an individual disturbance that will be assumed to be independent of α_k and Y_k and to have a logistic distribution, and β, γ parameters. Think of Y_k as a measure of relative congestion, crowding, or travel time that is determined as a result of the equilibrium of supply and demand for transportation modes. Then, observed choices satisfy

d_jk = 1(X_jk β + Y_k γ + α_k + ε_jk > 0)

where d_jk is an indicator that is one when alternative 1 is chosen. Define the network share

s_k = (1/N_k) Σ_j d_jk

The supply side of the market that determines network equilibrium is given by an equation

ζ_k = g(s_k, Z_k, Y_k)

where Z_k are observed exogenous factors that influence supply, but are excluded from the utility model that determines individual choice, and the ζ_k are disturbances that for the current discussion will be assumed independent of the demand-side disturbances. In this supply-side system, Z_k is assumed to include factors that influence market equilibrium; this is necessary to escape Manski's reflection problem. Examples in the mode choice application would be variations in public transit operating characteristics, such as schedule frequency and line density, and variations in out-of-network demand. With this setup, it may be possible to identify and estimate group and field effects using a simple version of the Berry–Levinsohn–Pakes (1995) estimator. The choice probability can be written

P_jk = E_{α|Y} Λ(X_jk β + Y_k γ + α_k)

where Λ denotes the logistic function; the problem with working directly with this model is the dependence of Y and α. However, one can estimate a model with a network fixed effect a_k that absorbs all the impact of Y and α,

P_jk = Λ(X_jk β + a_k)

This step provides consistent estimates of β, but note that the effect of any component of X_jk that does not vary within the network will be swept into the fixed effect and cannot be separately identified. Redefine α_k = μ + η_k to isolate an overall mean, and estimate the linear model

a_k = Y_k γ + μ + η_k
where the left-hand-side variable is the fitted network fixed effect, using instrumental variables with functions of the Z_k as instruments. If the conditions for instrumental variables are met, then this procedure will identify γ and the empirical distribution of the random group effects. The substantive question for the application is whether the field variable considered, s_k, is in fact determined in an equilibrium influenced by exogenous supply-side factors that do not appear in the individual choice criterion. If one has observations on the dynamics of choice, it is natural to consider models in which an individual's choice today depends on her and other network members' choices yesterday. This removes the reflection problem, but leaves an initial values problem due to the dependence between group effects and the first observed market equilibrium. This may in turn be resolved if the network is observed from the first time a choice becomes available, or handled in a sufficiently long panel by using semiparametric setups that estimate the initial distributions of group effects conditioned on the initial market state.
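To make the two-step logic concrete, a minimal sketch follows (assuming the statsmodels package; all data names are illustrative placeholders, Y is a single network-level column, and finite-sample issues such as the incidental parameters problem and second-step standard errors are ignored):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.gmm import IV2SLS

def two_step_field_effects(d, X, net, Y, Z, K):
    """d: 0/1 choices; X: within-network attributes (no constant);
    net: network index k per individual; Y, Z: network-level field
    variable and excluded supply-side instruments; K: number of networks."""
    # Step 1: logit with one fixed effect per network; the fixed effect a_k
    # absorbs Y_k * gamma + alpha_k, so beta is estimated consistently.
    dummies = np.eye(K)[net]                       # N x K network indicators
    fit1 = sm.Logit(d, np.hstack([X, dummies])).fit(disp=0)
    beta_hat = fit1.params[: X.shape[1]]
    a_hat = fit1.params[X.shape[1] :]              # fitted fixed effects, one per k

    # Step 2: regress a_k on Y_k by instrumental variables, using the
    # excluded supply shifters Z_k (and a constant) as instruments.
    exog = sm.add_constant(Y)                      # coefficients (mu, gamma)
    fit2 = IV2SLS(a_hat, exog, instrument=sm.add_constant(Z)).fit()
    mu_hat, gamma_hat = fit2.params
    eta_hat = a_hat - exog @ fit2.params           # empirical group effects
    return beta_hat, gamma_hat, eta_hat
```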
1.5. Conclusions

The purpose of this paper has been to show that sociality matters for understanding and predicting choice behavior. Social network effects influence economic choice behavior through constraints, perceptions, preferences, and the decision-making process, and their omission makes choice models incomplete and misleading. The rational choice model can be expanded to encompass field, opportunity-based homophily, and preference-based homophily effects, and altruism, but econometric analysis must then account for equilibrium, endogeneity of field effects, and, in a problem that does not so far appear to have been analyzed, the endogeneity of network affiliations. Identification of field and group effects will usually be aided by study of the dynamics of network field effects, with a remaining initial value problem in the specification of the distributions of unobserved group effects. Reconciliation of rational choice models with reciprocity and other social norms that make sense in the context of repeated games, reputation, and evolution will require deeper analysis of the dynamics of preferences. It may be possible in the future to explain the evolution of social norms as solutions to repeated games.
Acknowledgment

This research was supported by the E. Morris Cox endowment.
References

Aoki, M. (1996). New approaches to macroeconomic modeling: Evolutionary stochastic dynamics, multiple equilibria and externalities as field effects. Cambridge: Cambridge University Press.
Banerjee, A. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107, 797–817.
Baumgartner, T., Heinrichs, M., Vonlanthen, A., Fischbacher, U., & Fehr, E. (2008). Oxytocin shapes the neural circuitry of trust and trust adaptation in humans. Neuron, 58, 639–650.
Berg, J., Dickhaut, J., & McCabe, K. (1995). Trust, reciprocity, and social history. Games and Economic Behavior, 10, 122–142.
Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63, 841–890.
Brock, W., & Durlauf, S. (2001). Discrete choice with social interactions. The Review of Economic Studies, 68, 235–260.
Brock, W., & Durlauf, S. (2002). A multinomial-choice model of neighborhood effects. The American Economic Review, 92, 298–303.
Browning, M., & Chiappori, P. (1998). Efficient intra-household allocations: A general characterization and empirical tests. Econometrica, 66, 1241–1278.
Duesenberry, J. (1949). Income, saving, and the theory of consumer behavior. Cambridge, MA: Harvard University Press.
Edgeworth, F. Y. (1881). Mathematical psychics, and further papers on political economy. Reprinted 2003, P. Newman (Ed.), Oxford: Oxford University Press.
Fehr, E., & Fischbacher, U. (2004). Social norms and human cooperation. Trends in Cognitive Sciences, 4, 185–190.
Fosgerau, M., & De Borger, B. (2008). Hedonic preferences, symmetric loss aversion and the willingness to pay–willingness to accept gap. Working paper, Technical University of Denmark.
Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18, 141–153.
Kahneman, D., Fredrickson, B. L., Schreiber, C. A., & Redelmeier, D. A. (1993). When more pain is preferred to less: Adding a better end. Psychological Science, 4, 401–405.
Kohler, H. (1997). Learning in social networks and contraceptive choice. Demography, 34, 369–383.
Luttmer, E. (2001). Group loyalty and the taste for redistribution. The Journal of Political Economy, 109, 500–528.
Manski, C. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60, 531–542.
Manski, C. (1995). Identification problems in the social sciences. Cambridge, MA: Harvard University Press.
Myagkov, M., Orbell, J., Johnson, T., & Menshikov, I. (2007). Sociality as risky choice in economic theory and in its critic prospect theory. Working paper, University of Oregon.
Redelmeier, D., & Tibshirani, R. (1999). Why cars in the next lane seem to go faster. Nature, 401, 35.
Samuelson, L. (2004). Information-based relative consumption effects. Econometrica, 72, 93–118.
Schelling, T. (1969). Models of segregation. American Economic Review, 59, 488–493.
Schelling, T. (1971). On the ecology of micromotives. The Public Interest, 25, 61–98.
Smith, A. (1759). The theory of moral sentiments. Reprinted 2009, New York: Classic House Books.
Tagiuri, R., & Kogan, N. (1957). The visibility of interpersonal preferences. Human Relations, 10, 385–390.
Veblen, T. (1899). The theory of the leisure class. Reprinted 2007, M. Banta (Ed.), Oxford: Oxford University Press.
Chapter 2
Planning and Action in a Model of Choice

Moshe Ben-Akiva
Abstract

Purpose: This chapter introduces a choice modeling framework that explicitly represents the planning and action stages of the choice process.

Methodology: A discussion of evidence from behavioral research is followed by the development of a discrete choice modeling framework with explicit planning and action submodels. The plan/action choice model is formulated for both static and dynamic contexts, where the latter is based on the Hidden Markov Model. Plans are often unobservable and are treated as latent variables in model estimation using observed actions.

Implications: By modeling the interactions between the planning and action stages, we are able to incorporate richer specifications in choice models, with better predictive and policy analysis capabilities. The applications of this research in areas such as driving behavior, route choice, and mode choice demonstrate the advantages of the plan/action model in comparison to a "black box" choice model, in terms of improved microsimulations of behaviors that better represent real-life situations. As such, the outcomes of this chapter are relevant to researchers and policy analysts.
2.1. Introduction

This chapter contributes to the effort of introducing richer behavioral representation to discrete choice models, with a focus on the process leading up to a choice. This process can be divided into two stages: the planning stage and the implementation/action stage.
The basis behind this theory is that people plan before they act. Plans can be short term or tactical, such as a target lane that the driver wants to move to. They can be medium term, such as replacing an old car, or long term and strategic, such as a residential relocation. A plan can even be following a habit, or the choice not to plan, which shows up as impulsive behavior. A plan is essentially a strategy or a policy that is manifested by an action or a choice. Actions depend on the plan, for example, changing lanes, purchasing a new car, or moving to a new home. However, plans are often unobserved. Therefore, we combine explicit models of the plan and action stages, treating plans as latent variables. Although this idea of planning has previously been studied by behavioral scientists and applied to specific cases, it has not been formulated as a general approach to discrete choice modeling. This chapter introduces a description of the decision-making processes within both planning and action stages, using supporting evidence from behavioral research. Then a modeling framework is presented, followed by three applications (driving behavior, route choice, and mode choice) and, finally, a conclusion.
2.2. Evidence from Behavioral Research

To support the idea of planning, we present four examples from behavioral research. The first example is from Ajzen's (1991) theory of planned behavior, which describes behavior as a direct outcome of an individual's assessment of his/her own abilities (perceived behavioral control) and an individual's motivation or intentions. Ajzen argues that the past has the largest impact on an individual's perceived behavioral controls, and affects future behavior. The motivations are influenced by an individual's beliefs regarding norms, attitudes, and his/her abilities. In addition, intentions and behavioral controls and constraints affect behavior, as shown in Figure 2.1. Behavioral intention is a type of plan, and the planning stage describes the formation of behavioral intentions.

Figure 2.1: Theory of planned behavior (Ajzen, 1991). [Beliefs inform attitudes and norms; attitudes, norms, and behavior controls shape intention; intention and behavior controls determine behavior.]

A second example is the car replacement study by Marell, Davidsson, and Gärling (1997) that developed the process shown in Figure 2.2. Individuals who own a car evaluate the quality of the car that they own. They also have aspirations, based on other considerations, of a car that they would like to own. In forming a plan, they compare their aspirations to what they already have. The outcome could be a plan to either replace or keep their car. Subsequently, the plan is followed by the action of either purchasing a replacement car or keeping the car they already own. The authors found that discrepancies between the evaluation and the replacement intention are not random but systematic. This shows the link between the car assessment and replacement intentions, which represents the planning stage.

Figure 2.2: Car replacement process (Marell et al., 1997). [Economy, innovations, sociodemographics, and environmental concern feed the evaluation of owned automobiles and the aspiration level; their comparison yields a replacement intention, followed by goal setting, search, and replacement.]

Behavioral anomalies can also occur between the action and planning stages. That is, individuals do not always act according to the plans they make. For example, the purchase of a gym membership is often motivated by a plan to lose a few pounds or become healthier. A gym contract represents the commitment to the plan. One can sign a monthly or an annual contract, or make no commitment at all and pay per visit to the gym. A recent study of gym memberships by DellaVigna and Malmendier (2006) showed that, on average, monthly contract members actually paid more per visit than the per-visit cost without a contract; see Figure 2.3. In this case, the outcome is inconsistent with the planned commitment. One explanation for this phenomenon is overconfidence in future self-control. During the planning stage, individuals tend to overestimate the number of times they will go to the gym. Another possible reason for this phenomenon is that plans are designed to generate pressure to act in a certain way, such as going to the gym. The generated pressure creates stress when the commitment is unfulfilled. In this way, people create an additional stress-relieving incentive to fulfill their commitment. It is worth noting that the authors conducted another experiment with a small group, where the participants were told that, depending on the number of times each participant actually went to the gym, the subscription plan might be more expensive. The participants in this experiment correctly chose the cheaper pay-per-visit plan. This shows how additional information during the planning stage may eliminate the inconsistencies between plans and actions.

Figure 2.3: Gym commitment (DellaVigna & Malmendier, 2006). [Goal: lose a few pounds. Commitment and outcome: monthly contract, $17.3 per visit and 4.0 visits per month; annual contract, $15.2 per visit and 4.4 visits per month; no commitment, $10–$12 per visit.]

Andrade and Iyer (2007) also observed the same behavioral anomaly in a study of gambling behavior, as shown in Figure 2.4. In a lab experiment, participants were asked to go through a sequence of bets. Before they started the experiment, the participants were asked about their strategy/plan and what they would do if they won or lost. Most participants said that if they lost on the first bet, they would risk less on the subsequent bets. But in actuality, the same participants bet more after a loss despite their plan to bet less. This may be the result of the participants underestimating, during the planning stage, the negative feelings they would experience after a loss. Plans may not be perfect but, in general, they affect what people do.

Figure 2.4: Gambling behavior (Andrade & Iyer, 2007). [Plan (before gambling): bet less if losing. Action: losing leads to betting more; winning has no effect on bets.]
2.3. Why Represent Planning in Choice Models?

The main reasons to represent a two-stage decision process in a choice model are to enhance behavioral realism, to capture the dynamics of the decision-making process, and to enable effective intervention. The incorporation of richer behavioral representations, such as the inclusion of an explicit planning model, improves model performance. The inclusion of plans is essential in a dynamic context to explain the timing of actions, since execution takes place when plans can be implemented. In addition, plans change and evolve over time (situational constraints change, the contexts of the choices change, and experiences are gained), affecting the timing of the actions. Therefore, an explicit planning stage is
a mechanism to capture the temporal dependence between plans and actions, which explains the timing of actions. The dependence of actions over time is carried through the plans as plans evolve over time.

An example of a travel choice model with an explicit planning process is the model developed by Hirsh, Prashker, and Ben-Akiva (1986). They introduced a dynamic model of weekly activity pattern, based on data collected in the form of activity diaries, for a 1-week period. The week is divided into multiple time periods. At the beginning of a week, an individual plans the activity pattern for the week based on his/her initial expectations and needs. The plans are then updated at the beginning of each subsequent time period based on changes that occur during the previous period caused by the individual's actions and/or external factors, as well as new information. This model was applied to predict the effect of a shortened workweek on shopping activities. The model structure is shown in Figure 2.5.

Figure 2.5: Weekly activity planning (Hirsh et al., 1986). [Flow: plan the weekly activity pattern; while periods remain, execute the current period plan and update the weekly activity plan for the remaining periods.]

From a perspective of designing interventions that affect choice outcomes, these interventions need to occur within the planning stage in order to affect the outcomes. For example, route choice can be affected by information provided during the route planning stage. If the information is provided after a traveler is already en route and implementing his/her route plan, the effects of the information on the traveler's route choice may be more limited. Residential choice can also be affected by information provided during the planning stage. People go through a process of forming intentions, selecting criteria on how they are going to select among available housing options. Depending on criteria such as commuting mode/distance, information about fuel tax may greatly influence a buyer's decision and plans to buy a particular residence. But, if the information is provided after a buyer has already purchased a home, then the information will have little to no effect on the purchase. The above leads us to conclude that: (1) planning is a real behavioral phenomenon of a choice process; and (2) it is desirable and sometimes essential to include an explicit planning stage in a choice model.
2.4. Modeling Framework

The two-stage process of planning and action leads to a two-layer decision hierarchy, as shown in Figure 2.6 (Ben-Akiva, Choudhury, & Toledo, 2006). The framework represents a sequential choice, indicating that the planning and action do not occur simultaneously. The first layer is the choice of a plan and the second is the choice of an action given a plan. The plan could be the criterion that is used to determine the action. For example, in route choice, the planning stage could consist of the choice among criteria such as time, distance, speed limits, tolls, congestion, etc. The action stage consists of choosing the actual route based on the criteria selected in the planning stage. A plan may represent an intention, objective, target, or choice criterion. Examples from driving behavior include the target lane and the tactics of how to react to other drivers. The probability of selecting plan l is denoted as P(l). The probability of selecting action j given plan l is P(j|l). In the driving behavior examples, the action would be the actual driving maneuver. The two choices of plan and action can be based on different criteria and considerations, differentiating this model from a hierarchical choice model such as a nested logit, where all choice dimensions are modeled with the same utility function (see Ben-Akiva & Lerman, 1985). The probability of action j is

P(j|u) = Σ_l P(j|l, u) P(l|u)   (2.1)
Figure 2.6: Two-layer decision hierarchy. [The plan layer selects l ∈ {1, 2, …, L} with probability P(l); the action layer selects j ∈ {1, 2, …, J_l} with probability P(j|l).]
where the conditional probability P(j|l, u) of action j is multiplied by the probability of a plan P(l|u), and the product is summed over all the possible plans. u is included to represent the effects of other random factors that may operate, such as individual disposition that may affect both planning and action. This is not a usual latent class choice model, where the factors "generating" the heterogeneity among individuals can be conceptualized as discrete or categorical constructs (see Gopinath, 1995), because the marginal probability of selecting a plan is a choice, although it could be a choice that is unobserved or latent. If the plans are latent, then the plan and action submodels (i.e., the above marginal and conditional probabilities) cannot be estimated separately. The two submodels can be formulated as two distinct models, but the estimation has to be performed using the likelihood of observed actions. In other words, if the plan is unobserved, the estimation is executed using Eq. (2.1). It may also be noted that in latent class choice models, the class membership model is usually static and is based only on characteristics of the individuals. The latent plans, however, can vary dynamically, as will be shown later in this section.

If it is possible to collect data on plans, then it is advantageous to do so. However, this may be infeasible as plans are often impossible to observe. An example would be driving trajectory data where lane changes are observable but target lanes are not. Observable plans make the model estimation simpler. But if plans are latent, it is possible to make inferences about plan choices by analyzing observed actions. In the above driving behavior example, the plan and action submodels are jointly estimated using observed vehicle trajectories or sequences of lane changes.

The most intuitive application of this framework is for decisions in a dynamic context with sequences of plans and actions. Plans may depend on the inertia of previous plans and on experiences gained from past actions. In every time period, plans can evolve based on experience, new information, and inertia. Thus, there may be state dependency of plans over time. The upper panel of Figure 2.7 shows how the plans and actions evolve over time. The lower panel shows the two-stage hierarchy at a given point in time. In Figure 2.7, l_t denotes the plan at time t, and j_t denotes the action at time t. The causal sequence shows that at every time period, there is a plan in effect. (A plan of no action may be in effect when no planning has taken place.) The resulting action during that time period depends on the plan in effect. That plan depends on plans and actions during previous time periods. This forms a repetitive cycle of planning and actions that progress in time. The probability of selecting plan l at time t, conditional on previous plans and actions and on individual-specific factors, is denoted as P(l_t | l_{1:t−1}, u, j_{1:t−1}), where l_{1:t−1} denotes the sequence of plans from time 1 to t − 1 and j_{1:t−1} denotes the sequence of actions from time 1 to t − 1. The probability of action j at time t, conditional on the prevailing plan, previous plans and actions, and on individual-specific factors, is denoted as P(j_t | l_{1:t}, u, j_{1:t−1}). In this dynamic context, we model the joint probability of the chosen sequence of plans and actions.
With latent plans, calculating the probability of the sequence of actions may present a problem since it is necessary to sum the joint probability of the sequence of plans and actions over all possible sequences of plans.
Figure 2.7: Upper panel: plans and actions over time (l_0 → l_1 → … → l_T, with action j_t generated by the plan l_t in effect). Lower panel: two-stage hierarchy at a given point in time.

The number of possible sequences of plans is |L|^T, where |L| is the number of possible plans and T is the number of time periods. If T is very large, as it is in driving behavior data, the number of possible sequences makes it computationally infeasible. One way to simplify this is by using the Hidden Markov Model, which is based on the following two assumptions. The first assumption is that the action at time t depends only on the plan in effect:

P(j_t | l_{1:t}, u, j_{1:t−1}) = P(j_t | l_t, u)   (2.2)

The second assumption is that the plan at time t depends only on the plan at time t − 1. It may also depend on all historical actions. Thus, the probability of the plan at time t conditional on previous plans and actions is expressed as follows:

P(l_t | l_{1:t−1}, u, j_{1:t−1}) = P(l_t | l_{t−1}, u, j_{1:t−1})   (2.3)

The history of actions can affect the plan l at time t, but only the plan at time t − 1 affects the plan at time t. At any given time t, the joint probability of the plan and action, conditional on the previous plans and actions, is given as

P(j_t | l_t, u) P(l_t | l_{t−1}, u, j_{1:t−1})   (2.4)

Therefore, the probability of a sequence of plans and actions is the product of Eq. (2.4) over time, as follows:

∏_{t=1}^{T} P(j_t | l_t, u) P(l_t | l_{t−1}, u, j_{1:t−1})   (2.5)
The simplifying assumptions of the Hidden Markov Model make a difference when we calculate the probability of a sequence of actions. The summation of Eq. (2.5) over all possible sequences of plans is simplified using the Hidden Markov Model. This sum can be written as a sequence of sums over the different plan options. This sequence of sums starts from the first time period, and for every subsequent time period there is an additional sum over the possible plans. The number of elements inside this sequence of sums is reduced from |L|^T to |L|·T. The joint probability of a sequence of actions given the initial conditions is expressed as follows:

P(j_1, …, j_T | l_0, u) = Σ_{(l_1, …, l_T)} ∏_{t=1}^{T} P(j_t | l_t, u) P(l_t | l_{t−1}, u, j_{1:t−1})
  = Σ_{l_T} P(j_T | l_T, u) Σ_{l_{T−1}} P(l_T | l_{T−1}, u, j_{1:T−1}) P(j_{T−1} | l_{T−1}, u) ⋯ Σ_{l_1} P(l_2 | l_1, u, j_1) P(j_1 | l_1, u) P(l_1 | l_0, u)   (2.6)
This joint probability is computationally feasible, even for a large T, and can be used as the likelihood function for model estimation. There are cases where the current plan depends not only on the previous plan but on a sequence of previous plans. In this situation, the usual solution is to define an augmented state, which is a redefined measure of time where the state of the system at time t includes multiple periods preceding t. For example, the augmented plan at time t is a sequence of plans ending at time t. The Hidden Markov Model is then applied using plan and action probabilities that are specified for the augmented plans and actions.
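A minimal sketch of the forward recursion implied by Eq. (2.6) follows (illustrative code; it assumes the plan transition matrices P_plan[t] and action probability matrices P_action[t] have already been evaluated at the observed history, which is where the dependence on j_{1:t−1} enters):

```python
import numpy as np

def sequence_likelihood(P_plan, P_action, actions, l0):
    """P(j_1, ..., j_T | l_0, u) in O(|L|^2 T) operations instead of O(|L|^T).
    P_plan[t][l', l] = P(l_t = l | l_{t-1} = l', u, j_{1:t-1});
    P_action[t][l, j] = P(j_t = j | l_t = l, u)."""
    # forward variable: alpha[l] = P(j_1, ..., j_t, l_t = l | l_0, u)
    alpha = P_plan[0][l0, :] * P_action[0][:, actions[0]]
    for t in range(1, len(actions)):
        # propagate plans one step, then weight by the observed action
        alpha = (alpha @ P_plan[t]) * P_action[t][:, actions[t]]
    return alpha.sum()

# toy example: 2 plans, 3 possible actions, T = 3 periods
rng = np.random.default_rng(0)
P_plan = [rng.dirichlet(np.ones(2), size=2) for _ in range(3)]
P_action = [rng.dirichlet(np.ones(3), size=2) for _ in range(3)]
print(sequence_likelihood(P_plan, P_action, actions=[0, 2, 1], l0=0))
```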
2.5. Applications

The potential benefit from the plan/action model is demonstrated with the following three applications to driving behavior, route choice, and mode choice.
2.5.1. Driving Behavior
The planning process in driving behavior includes the selection of target lanes. For example, a driver on a four-lane highway observes the traffic conditions ahead and, with knowledge of the exit he/she needs to take, forms a plan, which includes a target lane. The execution of the plan requires an acceptable gap on the way to the target lane. If the driver is in lane 3, as shown in Figure 2.8, and decides that lane 1 is the target lane, he/she will first need to move into lane 2 before entering lane 1. The details of the plan/action model for this behavior are presented in Choudhury (2007) and Choudhury, Ben-Akiva, and Toledo (2007b).
Figure 2.8: Target lane choice. [Four-lane freeway, with driver A in lane 3: lane 1 (HOV), avg. speed 70 mph; lane 2, avg. speed 40 mph; lane 3, avg. speed 45 mph; lane 4, avg. speed 50 mph.]
Figure 2.9: Freeway lane-changing model. [For a driver currently in lane 3, the plan is the target lane and the action is gap acceptance: targets 1 and 2 imply change left or no change; target 3 implies no change; target 4 implies change right or no change.]
The model structure for the situation in Figure 2.8 is shown in Figure 2.9. If the current lane is perceived as the best lane, that lane is the target lane and there is no need to change lanes. However, if a chosen target lane is any other lane, then the next step is to make a lane change. The action choice set is either to change or not to change. Reasons for not changing lanes may be that the driver is already in the target lane, or that it is impossible to change because there are no acceptable gaps, in which case the driver will wait. While the driver waits for an acceptable gap, new information may also cause a change in plan. Supposing again that the driver is in lane 3 and lane 1 is the target lane, the driver can stay in lane 3 or switch to lane 2, which means changing left and subsequently repeating the same process.

This particular model was estimated based on vehicle trajectory data from I-395 in Virginia and validated with data from I-80 in California. The unique aspect of the freeway in California is that the leftmost lane is a high occupancy vehicle lane with unlimited access. The data were collected as part of the Next Generation Simulation (NGSIM) project. For more information see Choudhury, Ben-Akiva, Rao, Lee, and Toledo (2007a), Choudhury, Ramanujam, and Ben-Akiva (2008), and the project website (http://www.ngsim.fhwa.dot.gov). The data only included observations of the lane changes. This model was further validated with traffic data from New York and implemented into commercial traffic simulation software. The validation results indicated that the latent plan model performed better than a myopic lane-changing model. In the myopic model (reduced form), drivers only considered one lane at a time, instead of considering all four lanes at once. The results, as presented in Figure 2.10, show that the latent plan model has better predictions. The latent plan model captures the behavior of HOVs that enter the freeway in lane 4 and target lane 1 because of its higher speed. In traffic simulations, the timing of an action is critical because the levels and locations of traffic jams depend on when and where drivers change lanes. Without modeling plans, the traffic simulator would fail to replicate real traffic situations, often resulting in unrealistic overestimations of traffic congestion.

Figure 2.10: Vehicle lane distributions. [Fraction of vehicles in each lane (HOV lane, lanes 2–6): observed, compared with latent plan and reduced form model predictions.]
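For intuition only, a toy one-step microsimulation of this plan/action structure is sketched below; it is not the estimated NGSIM specification, and the target-lane utilities V_target and gap-acceptance probabilities p_gap are invented numbers:

```python
import numpy as np

rng = np.random.default_rng(42)

def step(current_lane, V_target, p_gap):
    """One simulation step: choose a target lane (plan), then attempt to
    move at most one lane toward it if a gap is accepted (action)."""
    # Plan: logit choice of target lane over all lanes
    p = np.exp(V_target) / np.exp(V_target).sum()
    target = rng.choice(len(V_target), p=p)
    if target == current_lane:
        return current_lane, target          # already in the target lane
    direction = 1 if target > current_lane else -1
    next_lane = current_lane + direction
    if rng.random() < p_gap[next_lane]:      # acceptable gap appears
        return next_lane, target
    return current_lane, target              # wait; plan may change next step

lane, target = step(current_lane=2,          # lane 3 of the figure, 0-indexed
                    V_target=np.array([1.2, 0.1, 0.4, 0.6]),
                    p_gap=np.array([0.5, 0.6, 1.0, 0.7]))
print(lane, target)
```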
2.5.2. Route Choice
In a recent study by Papinski, Scott, and Doherty (2009), planned and observed home-to-work route choices of 31 individuals are compared, where planned routes are obtained from pre-trip interviews and observed routes from GPS traces. One-fifth of the subjects deviated from their planned routes, citing reasons such as en route congestion, a driver change, or changes in schedule. This observation supports modeling route choice within the plan/action framework.

The idea behind applying planning to route choice was developed by Song Gao in her PhD dissertation (Gao, 2005) and further developed by Gao, Frejinger, and Ben-Akiva (2008, 2009). The model captures adaptive decisions that drivers make en route in response to traffic information. Prior to departure, drivers select a policy, not a particular route. A policy is a plan that specifies a course of action that the driver will take under certain circumstances. For example, when a driver on the freeway learns from the radio broadcast that there is an accident ahead, the driver will exit the freeway at the nearest exit onto an arterial road. This is a plan or a policy with an "if, then" rule: the policy is to change routes if there is an accident ahead; the route diversion is the action. The adaptive part means that the policy is updated as more information is acquired (i.e., as conditions are observed). It is a mapping from location, time, and traffic information to decisions about diversions from planned routes. It determines the next link the driver will take in the network. A similar concept is used for public transportation (PT) route choice, where multiple PT lines serve the same destination. The policy could be to board the train that arrives first.

The model framework is shown in Figure 2.11.

Figure 2.11: Routing policy model. [Routing policies 1, 2, …, L, each realized as one of the paths 1, …, J_l.]

Routing policy is latent and is denoted by l. We observe only the sequence of links followed by the traveler, denoted as path j. Information r depends on realizations of uncertain traffic conditions. It is known to the modeler through archived monitoring data but is unknown to the traveler before the trip. The probability of a particular path j, given policy l and information r, is denoted as P(j|l, r). It is a deterministic model such that P(j|l, r) equals 1 if policy l is realized as path j with information r, and 0 otherwise. The choice probability of a routing policy l is denoted as P(l); the policy choice model is stochastic. The unconditional path choice model requires the sum of the joint probability of policy and path over all possible routing policies to obtain the marginal path probability, as follows:

P(j|r) = Σ_{l∈G} P(l) P(j|l, r)   (2.7)
where P(j|r) is the probability of observing path j with information r and G is the choice set of routing policies. To simplify the generation of routing policies, it is convenient to represent network travel times as discrete random variables. The routing policies then take the above-mentioned form of if (travel times) then (link). The numerical experiments performed by Gao et al. (2008, 2009) demonstrate that this policy model produces qualitatively different results from both adaptive and nonadaptive path choice models. The plan/action model also provides a general approach to model travelers' response to fluctuations in travel conditions and real-time travel information.
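A minimal sketch of Eq. (2.7) follows, with routing policies represented as hypothetical functions from realized information r to a path:

```python
def path_probability(j, r, policies, P_policy):
    """P(j | r) = sum over policies l of P(l) * 1{policy l yields path j under r}."""
    return sum(P_policy[l] for l, policy in policies.items() if policy(r) == j)

# toy network: policy A always stays on the freeway; policy B diverts to
# the arterial if the realized information reports an incident ahead
policies = {
    "A": lambda r: "freeway",
    "B": lambda r: "arterial" if r == "incident" else "freeway",
}
P_policy = {"A": 0.7, "B": 0.3}

print(path_probability("freeway", "clear", policies, P_policy))     # 1.0
print(path_probability("arterial", "incident", policies, P_policy)) # 0.3
```

With r = "clear" both policies realize the freeway path, so its probability is 1; with r = "incident" the arterial path inherits the probability of the diverting policy.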
2.5.3. Mode Choice
Maya Abou-Zeid's (2009) dissertation is focused on the measurement and modeling of activity and travel well-being. She conducted experiments and surveys in Switzerland (at the École Polytechnique Fédérale de Lausanne, Université de Lausanne, and Geneva airport) and at the Massachusetts Institute of Technology (MIT). Her experiment was designed to get participants to reconsider their choice of travel mode to work. Habitual car commuters were required to switch to PT for a specified period of time. Immediately after the temporary switch, participants were asked about their experiences, happiness with commuting by car and PT, and whether or not they planned on changing from their routine travel mode (i.e., car). Data on participants' actual mode choice 1 month later were also available. This mode choice application (Abou-Zeid & Ben-Akiva, 2009) is similar to Ajzen's idea in that intention is followed by behavior. The plan is the anticipated frequency of using the different modes for commuting. The action is the actual mode choice (car or PT). The model framework is shown in Figure 2.12. Table 2.1 shows the distribution of MIT participants by their planned frequency of commuting by PT versus the subsequent mode choice decision. The results show a high degree of consistency between plans and actions. That is, most of the participants who planned to commute by PT two to three times per week or more ended up choosing PT as a commuting mode; and all those who planned to commute by PT once a week or less ended up choosing the car as a commuting mode. This mode choice example differs from the two previous applications in three aspects. First, while the plans were unobserved in the driving behavior and route choice applications, the plans, represented as the intended frequency of commuting by different modes, are observed in this application. Second, the plans were treated as discrete in the general framework and in the previous applications but are treated as continuous variables in this mode choice application. Third, in the previous applications, the choice was used as the only indicator of the utility. In this application, happiness measures (for car and PT) are used as additional indicators of the utility. This latter extension of standard discrete choice models falls within the framework of Hybrid Choice Models (Ben-Akiva et al., 2002; Walker & Ben-Akiva, 2002), which combine choice models with latent variables that may have their own indicators.
Figure 2.12: Mode choice with planning. [A latent plan l ∈ {1, …, L} (intended mode-use frequency) leads to the action: car or PT.]
Table 2.1: Number of MIT participants by their public transportation (PT) plan (intended frequency of commuting by PT) and action (mode choice).

PT plan                          Action: Car   Action: PT
Less than once a month                19            –
Once a month                          11            –
Two to three times per month          13            –
Once a week                            3            –
Two to three times per week            1            6
More than four times per week          –           14
Total                                 47           20
Two models were estimated. The following model did not contain planning:

ΔU = β_0 + β_1 (ln(Time_Car) − ln(Time_PT)) + β_2 (Cost_Car/Income − Cost_PT/Income) + ε,   ε ~ N(0, 1)   (2.8)

y = Car if ΔU + μη ≥ 0, and PT otherwise,   η ~ Logistic(0, 1)   (2.9)

h_B = α_1 + λ_1 ΔU + υ_1,   υ_1 ~ N(0, σ²_υ1)
h_A = α_2 + λ_2 ΔU + υ_2,   υ_2 ~ N(0, σ²_υ2)   (2.10)

where ΔU is the utility difference, time is travel time from home to work by mode, cost is monthly commuting cost, and income is annual personal income. y is the choice indicator, h_B and h_A are pre- and post-treatment happiness indicators, β_0, β_1, β_2, α_1, α_2, λ_1, and λ_2 are unknown coefficients, μ is a scale parameter, and σ²_υ1 and σ²_υ2 are unknown variances.

The plan/action model, shown in Figure 2.13, includes the same utility specification as Eq. (2.8), the same choice indicator as Eq. (2.9), and the same happiness indicators as Eq. (2.10). It adds the following equation:

l = α_3 + λ_3 ΔU + υ_3,   υ_3 ~ N(0, σ²_υ3)   (2.11)

where l is the reported plan, specified as the natural logarithm of the ratio of the intended monthly frequency of commuting by car to the intended monthly frequency of commuting by PT, α_3 and λ_3 are unknown coefficients, and σ²_υ3 is an unknown variance. This additional equation states that the utility that determines the choice is indicated by the reported plan. In other words, the underlying utility reflects the latent plan. In this application, the focus is on a choice model that depends on a utility function. The results show that the plan can also be explained by the same utility. Therefore, by jointly estimating the choice model and the plan equation, we obtained a better estimated utility and a more accurate choice model.

Figure 2.13: Mode choice model framework. [Intrinsic utility (ΔU) is indicated by the happiness measures (h_B, h_A), the reported plan (l), and the action (y).]
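As an illustration of how such a joint estimation could be set up, the sketch below (assuming scipy; the parameter ordering, the observation dictionary, and the plain Monte Carlo integration over the N(0, 1) disturbance ε in Eq. (2.8) are all assumptions) computes one observation's simulated likelihood contribution:

```python
import numpy as np
from scipy.stats import norm, logistic

def loglik_obs(theta, obs, n_draws=500, seed=0):
    """Simulated log-likelihood of one observation for Eqs. (2.8)-(2.11)."""
    b0, b1, b2, mu, a1, l1, s1, a2, l2, s2, a3, l3, s3 = theta
    eps = np.random.default_rng(seed).standard_normal(n_draws)
    # Eq. (2.8): utility difference, one value per draw of eps
    dU = (b0
          + b1 * (np.log(obs["time_car"]) - np.log(obs["time_pt"]))
          + b2 * (obs["cost_car"] - obs["cost_pt"]) / obs["income"]
          + eps)
    # Eq. (2.9): y = Car iff dU + mu * eta >= 0, eta ~ Logistic(0, 1),
    # so P(Car | dU) = F(dU / mu) with F the logistic cdf
    p_car = logistic.cdf(dU / mu)
    p_choice = p_car if obs["choice"] == "car" else 1.0 - p_car
    # Eqs. (2.10)-(2.11): normal densities of the happiness and plan indicators
    dens = (norm.pdf(obs["h_B"], a1 + l1 * dU, s1)
            * norm.pdf(obs["h_A"], a2 + l2 * dU, s2)
            * norm.pdf(obs["plan"], a3 + l3 * dU, s3))
    return np.log(np.mean(p_choice * dens))
```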
2.6. Conclusion

Planning is an essential aspect of choice. The plan represents the choice of criteria used to determine the action. The inclusion of planning in choice models enhances the model capabilities and improves model performance and goodness of fit. The dynamic microsimulation of plans and actions, which is based on the Hidden Markov Model, allows modelers to predict the timing of choices. This marks an important advance in choice modeling, with major implications for the testing and efficacy of policy interventions. The applications of planning in driving behavior, route choice, and mode choice demonstrate the benefits of incorporating the planning stage. The driving behavior application shows how the plan/action model is essential for realistic simulation of traffic conditions. The benefit of the latent plan in the route choice model is to realistically capture how travelers deal with uncertainty. The mode choice application demonstrates how data on plans can be used to improve the accuracy of choice models estimated with revealed preference data. Taken together, these applications show the possibility for further applications of discrete choice models using the plan/action framework.
Acknowledgments

I would like to acknowledge former and current students whose research has contributed to this chapter. These include Charisma Choudhury, Song Gao, Varun Ramanujam, Anita Rao, Sujith Rapolu, Kamil Sveda, and Tomer Toledo. In particular, I greatly benefited from the work of Maya Abou-Zeid and the editorial contributions of Tina Xue.
References

Abou-Zeid, M. (2009). Measuring and modeling activity and travel well-being. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Abou-Zeid, M., & Ben-Akiva, M. (2009). An experimental study of happiness and travel behavior modification. Presented at the 12th international conference on travel behaviour research, December, Jaipur, India.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50, 179–211.
Andrade, E. B., & Iyer, G. (2007). Dynamic inconsistencies in gambling and the role of feelings. UC Berkeley, Experimental Social Sciences Lab (Xlab), Berkeley, CA. Available at: http://escholarship.org/uc/item/0fx6729K
Ben-Akiva, M., Choudhury, C., & Toledo, T. (2006). Modeling latent choices: Application to driving behavior. In: 11th international conference on travel behaviour research, Kyoto, Japan.
Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: The MIT Press.
Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Boersch-Supan, A., Brownstone, D., Bunch, D. S., Daly, A., de Palma, A., Gopinath, D., Karlstrom, A., & Munizaga, M. A. (2002). Hybrid choice models: Progress and challenges. Marketing Letters, 13(3), 163–175.
Choudhury, C. (2007). Modeling driving decisions with latent plans. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Choudhury, C., Ben-Akiva, M., Rao, A., Lee, G., & Toledo, T. (2007a). State dependence in lane changing models. In: R. Allsop, M. Bell, & B. Heydecker (Eds), Transportation and traffic theory (pp. 711–733). Amsterdam, The Netherlands: Elsevier.
Choudhury, C., Ben-Akiva, M., & Toledo, T. (2007b). Modeling lane-changing behavior in presence of exclusive lanes. In: 11th world conference on transport research, Berkeley, CA.
Choudhury, C., Ramanujam, V., & Ben-Akiva, M. (2008). A lane changing model for urban arterials. In: 3rd international symposium of transport simulation, Gold Coast, Australia.
DellaVigna, S., & Malmendier, U. (2006). Paying not to go to the gym. American Economic Review, 96, 694–719.
Gao, S. (2005). Optimal adaptive routing and traffic assignment in stochastic time-dependent networks. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Gao, S., Frejinger, E., & Ben-Akiva, M. (2008). Adaptive route choice models in stochastic time-dependent networks. Transportation Research Record, 2085, 136–143.
Gao, S., Frejinger, E., & Ben-Akiva, M. (2009). Adaptive route choices in risky traffic networks: A prospect theory approach. Transportation Research Part C. Available online, doi: 10.1016/j.trc.2009.08.001.
Gopinath, D. A. (1995). Modeling heterogeneity in discrete choice processes: Application to travel demand. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Hirsh, M., Prashker, J., & Ben-Akiva, M. (1986). Dynamic model of weekly activity pattern. Transportation Science, 20(1), 24–36.
Marell, A., Davidsson, P., & Gärling, T. (1997). The role of aspiration level for decisions to replace automobiles. In: D. Hensher, J. King, & T. Oum (Eds), World transport research: Proceedings of the 7th world conference on transport research, Vol. 1: Travel behavior (pp. 181–187). Oxford: Pergamon.
Papinski, D., Scott, D., & Doherty, S. (2009). Exploring the route choice decision-making process: A comparison of planned and observed routes obtained using person-based GPS.
Transportation Research Part F: Traffic Psychology and Behavior, 12(4), 347–358. Walker, J., & Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.
Chapter 3
Attribute Processing, Heuristics and Preference Construction in Choice Analysis

David A. Hensher
Abstract

It has long been recognised that humans draw from a large pool of processing aids to help manage the everyday challenges of life. It is not uncommon to observe individuals adopting simplifying strategies when faced with ever-increasing amounts of information to process, especially for decisions where the chosen outcome will have a very marginal impact on their well-being. The transaction costs associated with processing all new information often exceed the benefits of such a comprehensive review. The accumulating life experiences of individuals are also often brought to bear as reference points to assist in selectively evaluating the information placed in front of them. These features of human processing and cognition are not new to the broad literature on judgment and decision-making, where heuristics are offered up as deliberative analytic procedures intentionally designed to simplify choice. What is surprising is the limited recognition of the heuristics that individuals use to process the attributes in stated choice experiments. In this paper we present a case for a utility-based framework within which some appealing processing strategies are embedded (without the aid of supplementary self-stated intentions), as well as models conditioned on self-stated intentions represented as single items of process advice, and we illustrate the implications for the willingness to pay for travel time savings of embedding each heuristic in the choice process. Given the controversy surrounding the reliability of self-stated intentions, we introduce a framework in which mixtures of process advice embedded within a belief function might be used in future empirical studies to condition choice, as a way of judging the strength of the evidence.
3.1. Introduction

Any economic decision or judgment has an associated, often subconscious, psychological process prodding it along, in ways that make the "neoclassical ambition of avoiding [this] necessity … unrealizable" (Simon, 1978, p. 507). The translation of this fundamental statement on human behaviour has become associated with the identification of heuristics that individuals use to simplify preference construction, and hence make choices, or to simplify the representation of relevant matters, regardless of the degree of complexity as perceived by the decision-maker and/or the analyst. Despite the recognition in behavioural research as far back as the 1950s (see Svenson, 1998; Gilovich, Griffin, & Kahneman, 2002) that cognitive processes have a key role in preference revelation, and the reminders throughout the literature (see McFadden, 1998; Yoon & Simonson, 2008) about rule-driven behaviour (e.g., Swait & Ben-Akiva, 1987; Gilbride & Allenby, 2004; Martinez, Aguila, & Hurtubia, 2009; Arana, Leon, & Hanemann, 2008; Gabaix & Laibson, 2000), we still see relatively little of this belief incorporated into stated choice (SC) modelling, which is increasingly becoming the mainstream empirical context for preference measurement and willingness to pay (WTP) derivatives.¹

There is an extensive literature on what might broadly be described as heuristics and biases, which is crystallised in the notion of process, in contrast to outcome. Choice has elements of both process and outcome, which in combination represent the endogeneity of choice in SC studies. The failure to recognise process, and the maintenance of a linear additive utility expression under full attribute and parameter preservation, is an admission, by default, that individuals faced with an SC experiment deem all attributes relevant, and that a compensatory decision rule is used to arrive at a choice. There should be no suggestion that compensatory rules are always invalid; indeed they may be, in aggregate, an acceptable representation or approximation of many process circumstances. Nonetheless, there is a strong belief that process heterogeneity exists, as a consequence of mixtures of genuine cognitive processing strategies that simplify decision-making in real markets, for all manner of reasons, and of new states introduced through the design of choice experiments, which are no more than new circumstances to process. Whether the processing rules adopted are natural to real choices, or are artefacts of the design of an experiment or some other survey instrument (including revealed preference surveys) placed in front of an individual, is in some senses irrelevant; what is relevant is the manner in which such choice assessments are processed, in respect of the role that each design attribute, and the mixture of attributes, plays in the outcome.
1. Consultants still adopt, almost without exception, a full compensatory approach in which all attributes are "relevant."
Yoon and Simonson (2008) and Park, Ding, and Rao (2008)² provide some interesting perspectives from marketing research on preference revelation. Recent research by Hensher (2006, 2008), Greene and Hensher (in press), Layton and Hensher (in press), Hensher and Rose (2009), Hensher and Layton (2008), Hess and Hensher (in press), Puckett and Hensher (2008), Swait (2001), Cantillo, Heydecker, and Ortúzar (2006), Cameron (2008), Scarpa, Gilbride, Campbell, and Hensher (2009a), Scarpa, Thiene, and Hensher (2009b), Beharry and Scarpa (2008), Cantillo and Ortúzar (2005), Carlsson, Kataria, and Lampi (2008), Caussade, Ortúzar, Rizzi, and Hensher (2005), Hensher, Rose, and Greene (2005), and Hensher, Scarpa, and Campbell (in press), among others, provides examples of a growing interest in the way that individuals evaluate a package of attributes associated with ordered or unordered alternatives in real or hypothetical markets, and make choices.³

The accumulating empirical evidence suggests that individuals use a number of strategies, derived from heuristics, to represent the way that information embedded within the attributes defining alternatives is used to process the context and arrive at a choice outcome. These include cancellation or attribute exclusion, varying degrees of attention paid to attributes in a package, referencing of new or hypothetical attribute packages around a recent or past experience, and attribute aggregation where attributes are in common units (see Gilovich et al., 2002 for a series of papers that synthesise the evidence under the theme of heuristics and biases). Importantly, as shown herein, the heuristics are likely to be context specific, such that the nature of the information shown in SC experiments, for example, conditions the choice of rules adopted. Hensher (2006, 2008) argues that individuals appear to adopt a range of "coping" or editing strategies in hypothetical choice settings that are consistent with how they normally process information in real markets. Choice experiments present varying amounts of information to process, but aligning "choice complexity" with the amount of information to process is potentially misleading. Relevancy is what matters (Hensher, 2006),⁴ and the heuristics adopted by individuals to evaluate a circumstance are what need to be captured through frameworks that can empirically identify the rules adopted. There are at least two ways in which information on processing might be identified. One involves direct questioning of respondents after each choice scenario (what we refer to as self-stated intentions); the other involves probabilistic conditions imposed on the model form through the specification of the utility expressions associated
2. Park et al. (2008) promote the idea of starting with a basic product profile and upgrading it, one attribute at a time, identifying the willingness to pay for each additional attribute, given that the budgets are available.
3. This paper does not consider other aspects of process in choice experiments, such as uncertainty in the choice response. See Lundhede, Olsen, Jacobsen, and Thorsen (2009).
4. The emphasis on cognitive load may well be misplaced. Arana et al. (2008) suggest that individuals' cognitive abilities can be interrelated with affective and emotional states, i.e., every aspect of human decision-making could be seen as simultaneously influencing the final outcome of the choice process. They show that complexity effects can be non-significant for particular emotional states.
with each alternative, enabling inference on the way that specific attributes are processed. The two approaches may be complementary. The focus of this paper draws on a stream of active research by Hensher, Rose, Puckett, Layton, Greene, Scarpa, and Hess, in which we are systematically investigating process rules to establish their behavioural implications for marginal WTP.⁵ The functional forms presented herein, as well as responses to self-stated intention questions, enable the analyst to infer, up to a probability, the presence of some very specific attribute processing strategies, such as (i) common-metric attribute aggregation, (ii) common-metric parameter transfer, and (iii) attribute non-attendance.

This paper is organised as follows. In the next section, a utility-based framework is set out in which special cases of attribute processing can be specified, including a latent class model that can accommodate all heuristics of current interest. The empirical context is then summarised, followed by the empirical analysis, with a focus on marginal WTP. A further section considers the influence that self-stated intention responses have on WTP when we accept the full validity of such evidence. Given concerns about the face validity of self-stated intentions, we then consider the role that belief functions might play in adjusting the self-stated intentions, so that the evidence is more plausible. We conclude with suggestions for ongoing research.
3.2. Incorporating Attribute Processing Heuristics through Non-Linear Processing

3.2.1. Process I: Common-Metric Attribute Aggregation
In this section, we present a utility specification that captures two heuristics⁶ in non-linear attribute processing of common-metric attributes over a continuum that accommodates both preservation of attribute partitioning and attribute aggregation.⁷ Importantly, the approach allows for mixtures of heuristics within a single model form associated with a sampled population, in contrast to the many studies that impose a single heuristic on the entire sample and compare separate models in which each is defined by a single heuristic. A recent study by Arana et al. (2008) also considers multiple heuristics. With more than one heuristic within a model form, we are able to capture an individual choosing heuristic h with expected payoff E[V_h], and define the decision rule h* as optimal if E[V_{h*}] = max_h E[V_h]. This specification is similar to the way in which a nested logit model is defined, with the new twist that the upper level of a two-level tree defines a choice set of heuristics.
5. The methods used to identify marginal WTP also have relevance to the calculation of total WTP.
6. Generalisation to more than two heuristics is feasible.
7. The functional form selected is one of many possible forms, but is useful in illustrating the way in which the utility expression can be defined to test for the application of specific heuristics across a sampled population.
Consider a utility function defined in terms of two attributes labelled x_1 and x_2 (in the empirical setting below, these might be route-specific free-flow time and slowed-down time, both in common units) and other attributes, such as running cost and toll cost, as x_3 and x_4:

U = f(x_1, x_2, x_3, x_4) + \varepsilon \quad (3.1)

where

f(x_1, x_2, x_3, x_4) = \begin{cases} \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 & \text{if } (x_1 - x_2)^2 > \alpha \\ \beta_{12}(x_1 + x_2) + \beta_3 x_3 + \beta_4 x_4 & \text{if } (x_1 - x_2)^2 < \alpha \end{cases} \quad (3.2)

β1, β2, β3, β4, and β12 are estimated parameters; β12 does not necessarily equal a combination of β1 and β2. We assume that the standard random utility alternative-specific error ε does not depend on which form of f(x_1, x_2) is operative. The term (x_1 − x_2)² is introduced as the basis of a heuristic, and represents the "distance" between x_1 and x_2. A squared form is computationally convenient, but another form could be used. Intuitively, the heuristic associated with this functional specification is as follows (Layton & Hensher, in press): when the difference between the common-metric attributes x_1 and x_2 is great enough, the agent's process preserves attribute partitioning, and thus treats each attribute as a separate entity and evaluates its contribution to utility in the standard random utility manner, with parameters β1 and β2. On the other hand, when the difference between x_1 and x_2 is relatively small, the agent's process aggregates the attributes, and thus treats the sum of x_1 and x_2 as a single attribute with utility weight β12.

We can enrich the model by allowing α_i for person i to be randomly distributed (with α_i > 0). A useful candidate distribution is that α_i is exponential with mean 1/λ and density f(α) = λe^{−λα} (Layton & Hensher, in press). This density generally has a large mass near zero, and so allows some fraction of the population to behave essentially as standard optimisers. Still others behave as standard optimisers when attributes are dissimilar, but aggregate when attributes are similar. Importantly, this density also allows for a tail of individuals who more frequently aggregate the two attributes. The probability conditions are given in (3.3). In this model, we assume that there is an exponentially distributed threshold parameter, IID across alternatives and respondents, which indicates how the respondent views the attribute components.⁸

P\big((x_1 - x_2)^2 > \alpha\big) = 1 - e^{-\lambda(x_1 - x_2)^2} \quad (3.3a)
8. At much greater computational cost, one might allow the α_i to be constant across alternatives for a given respondent. We leave refinements like this for future work.
and

P\big((x_1 - x_2)^2 < \alpha\big) = e^{-\lambda(x_1 - x_2)^2} \quad (3.3b)
Integrating over the α_i, we write U in conditional form (Layton & Hensher, in press):

U = f\big(x_1, x_2 \mid (x_1 - x_2)^2 > \alpha\big)\,P\big((x_1 - x_2)^2 > \alpha\big) + f\big(x_1, x_2 \mid (x_1 - x_2)^2 < \alpha\big)\,P\big((x_1 - x_2)^2 < \alpha\big) + \varepsilon \quad (3.4)

Eq. (3.4) implies that:

U = (\beta_1 x_1 + \beta_2 x_2)\big(1 - e^{-\lambda(x_1 - x_2)^2}\big) + \beta_{12}(x_1 + x_2)\,e^{-\lambda(x_1 - x_2)^2} + \varepsilon \quad (3.5)

Eq. (3.5), together with the equivalent treatment of x_3 and x_4, implies that:

U = (\beta_1 x_1 + \beta_2 x_2)\big(1 - e^{-\lambda_1(x_1 - x_2)^2}\big) + \beta_{12}(x_1 + x_2)\,e^{-\lambda_1(x_1 - x_2)^2} + (\beta_3 x_3 + \beta_4 x_4)\big(1 - e^{-\lambda_2(x_3 - x_4)^2}\big) + \beta_{34}(x_3 + x_4)\,e^{-\lambda_2(x_3 - x_4)^2} + \varepsilon \quad (3.5a)
Eq. (3.5a) is a non-linear form in x_1, x_2, x_3, x_4. As λ_i, i = 1, 2, tends towards ∞, the distribution of α becomes degenerate at zero. In this case, all individuals are always standard optimisers who partition the common-metric attributes, and we obtain the linear additive form (3.6):

U = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon \quad (3.6)
If λ tends towards 0, every individual becomes a common-metric aggregator, as they perceive no difference between the two attributes.⁹ Eq. (3.5a) is the estimable utility expression for each alternative in a stated or revealed choice model. The WTP function is non-linear. The derivative of the utility expression with respect to a specific attribute is given in Eq. (3.7), using free-flow time (defined as x_1), and in Eq. (3.8), using slowed-down time (x_2), as examples of the common form.
9. As an example, imagine an experimental design with x_1 and x_2 being dummy variables, where the only combinations considered are (1,0) and (0,1). In both cases (x_1 − x_2)² = 1, and so we have:

U = (\beta_1 x_1 + \beta_2 x_2)(1 - e^{-\lambda}) + \beta_{12}(x_1 + x_2)\,e^{-\lambda} + \varepsilon

If x_1 = 1 and x_2 = 0, we have condition (3.6a), equivalent to (3.6b):

U = \beta_1 x_1 (1 - e^{-\lambda}) + \beta_{12} x_1 e^{-\lambda} + \varepsilon \quad (3.6a)

U = \beta_1 x_1 + (\beta_{12} - \beta_1) x_1 e^{-\lambda} + \varepsilon = \{\beta_1 + (\beta_{12} - \beta_1)e^{-\lambda}\} x_1 + \varepsilon \quad (3.6b)

The same functional expression applies for x_2. In both cases we have a co-mingling of parameters. If we include the combinations (1,1) and (0,0), then we have Eq. (3.6c):

U = \beta_{12}(x_1 + x_2) + \varepsilon \quad (3.6c)
The exact same functional forms as in Eqs. (3.7) and (3.8) apply to running cost and toll cost, respectively.

\frac{\partial V}{\partial x_1} = \beta_1\big(1 - e^{-\lambda(x_1-x_2)^2}\big) + 2(\beta_1 x_1 + \beta_2 x_2)\lambda(x_1 - x_2)\,e^{-\lambda(x_1-x_2)^2} + \beta_{12}\,e^{-\lambda(x_1-x_2)^2} - 2\beta_{12}(x_1 + x_2)\lambda(x_1 - x_2)\,e^{-\lambda(x_1-x_2)^2} \quad (3.7)

\frac{\partial V}{\partial x_2} = \beta_2\big(1 - e^{-\lambda(x_1-x_2)^2}\big) - 2(\beta_1 x_1 + \beta_2 x_2)\lambda(x_1 - x_2)\,e^{-\lambda(x_1-x_2)^2} + \beta_{12}\,e^{-\lambda(x_1-x_2)^2} + 2\beta_{12}(x_1 + x_2)\lambda(x_1 - x_2)\,e^{-\lambda(x_1-x_2)^2} \quad (3.8)
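As a numerical illustration of Process I, a minimal sketch follows (all parameter values are hypothetical, not the estimates reported later in this chapter); it evaluates the deterministic part of Eq. (3.5a) and checks the analytical derivative in Eq. (3.7) against a finite-difference approximation.

```python
import math

def process1_utility(x1, x2, x3, x4, b1, b2, b12, b3, b4, b34, lam1, lam2):
    """Deterministic part of Eq. (3.5a): a probability-weighted mixture of
    attribute partitioning and common-metric aggregation for the time pair
    (x1, x2) and the cost pair (x3, x4)."""
    w_t = math.exp(-lam1 * (x1 - x2) ** 2)   # P(aggregate time pair), Eq. (3.3b)
    w_c = math.exp(-lam2 * (x3 - x4) ** 2)   # P(aggregate cost pair)
    v_time = (b1 * x1 + b2 * x2) * (1.0 - w_t) + b12 * (x1 + x2) * w_t
    v_cost = (b3 * x3 + b4 * x4) * (1.0 - w_c) + b34 * (x3 + x4) * w_c
    return v_time + v_cost

def dV_dx1(x1, x2, b1, b2, b12, lam1, **_):
    """Analytical marginal (dis)utility of x1, Eq. (3.7)."""
    w = math.exp(-lam1 * (x1 - x2) ** 2)
    return (b1 * (1.0 - w)
            + 2.0 * (b1 * x1 + b2 * x2) * lam1 * (x1 - x2) * w
            + b12 * w
            - 2.0 * b12 * (x1 + x2) * lam1 * (x1 - x2) * w)

# Hypothetical parameters and attribute levels (minutes and dollars).
pars = dict(b1=-0.10, b2=-0.11, b12=-0.105, b3=-0.42, b4=-0.19, b34=-0.30,
            lam1=0.05, lam2=0.5)
x1, x2, x3, x4 = 20.0, 15.0, 3.0, 2.5

# Check Eq. (3.7) against a central finite difference of Eq. (3.5a).
h = 1e-6
numeric = (process1_utility(x1 + h, x2, x3, x4, **pars)
           - process1_utility(x1 - h, x2, x3, x4, **pars)) / (2.0 * h)
print(dV_dx1(x1, x2, **pars), numeric)   # the two values agree closely
```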
3.2.2. Process II: Common-Metric Attribute Parameter Assignment
We now introduce a new heuristic on top of this general non-linear specification, to account for parameter transfer. Essentially, we replace the aggregation of the two attributes with a parameter transfer rule. The proposed attribute process model assumes that if a common-metric attribute (i.e., a time or cost component) is greater in magnitude than the other, then individuals transfer the parameter initially assigned to the former attribute to the latter attribute. We call this process "attribute marginal disutility referencing" (Hensher & Layton, 2008).

In this new model, the processing sets determined by α and the x's are more complicated. First, we note that for each pair of common-metric attributes, say x_1 and x_2, there are three regimes: x_1 > x_2, x_1 = x_2, and x_1 < x_2. Next, in the language of our model, when x_1 and x_2 are cognitively close, neither attribute is expected to dominate, and hence the α conditions are reversed from the previous heuristic. Allowing α to follow an exponential distribution, as above, results in the following utilities for a situation with two sets of common-metric attributes: two time attributes (x_{t1} and x_{t2}, associated with λ_t) and two cost attributes (x_{c1} and x_{c2}, associated with λ_c). We can write the overall utility in terms of the sub-utilities for time, V_t, and cost, V_c.

If x_{t1} > x_{t2}:

V_t = (\beta_{t1} x_{t1} + \beta_{t2} x_{t2})\,e^{-\lambda_t(x_{t1} - x_{t2})^2} + \beta_{t1}(x_{t1} + x_{t2})\big(1 - e^{-\lambda_t(x_{t1} - x_{t2})^2}\big) \quad (3.9)

If x_{t1} < x_{t2}:

V_t = (\beta_{t1} x_{t1} + \beta_{t2} x_{t2})\,e^{-\lambda_t(x_{t1} - x_{t2})^2} + \beta_{t2}(x_{t1} + x_{t2})\big(1 - e^{-\lambda_t(x_{t1} - x_{t2})^2}\big) \quad (3.10)

In the case of x_{t1} = x_{t2}, evaluating either (3.9) or (3.10) at x_{t1} = x_{t2} yields:

V_t = \beta_{t1} x_{t1} + \beta_{t2} x_{t2} \quad (3.11)
If x_{c1} > x_{c2}:

V_c = (\beta_{c1} x_{c1} + \beta_{c2} x_{c2})\,e^{-\lambda_c(x_{c1} - x_{c2})^2} + \beta_{c1}(x_{c1} + x_{c2})\big(1 - e^{-\lambda_c(x_{c1} - x_{c2})^2}\big) \quad (3.12)

If x_{c1} < x_{c2}:

V_c = (\beta_{c1} x_{c1} + \beta_{c2} x_{c2})\,e^{-\lambda_c(x_{c1} - x_{c2})^2} + \beta_{c2}(x_{c1} + x_{c2})\big(1 - e^{-\lambda_c(x_{c1} - x_{c2})^2}\big) \quad (3.13)

In the case of x_{c1} = x_{c2}, evaluating either (3.12) or (3.13) at x_{c1} = x_{c2} yields:

V_c = \beta_{c1} x_{c1} + \beta_{c2} x_{c2} \quad (3.14)

The overall utility for alternative k is given in Eq. (3.15):

U_k = V_{time,k} + V_{cost,k} + \varepsilon_k \quad (3.15)
In any single utility expression, the rules identify the pairs of time and cost situations given above. There are three time conditions and three cost conditions, giving nine possible combinations for each alternative in a choice set. If λ tends towards ∞, every individual becomes a common-metric repackager, using the parameter transfer rule for the marginal (dis)utility of each attribute. In all cases, as the common-metric attributes become equal, the standard compensatory model results.

The WTP expressions for Process II are summarised below; the exact same functional forms as in Eqs. (3.16) and (3.17) apply to running cost and toll cost, respectively.

If x_{t1} > x_{t2}:

\frac{\partial V_t}{\partial x_{t1}} = \beta_{t1}\,e^{-\lambda_t(x_{t1}-x_{t2})^2} - 2(\beta_{t1} x_{t1} + \beta_{t2} x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2} + \beta_{t1}\big(1 - e^{-\lambda_t(x_{t1}-x_{t2})^2}\big) + 2\beta_{t1}(x_{t1}+x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2}
= \beta_{t1} - 2(\beta_{t1} x_{t1} + \beta_{t2} x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2} + 2\beta_{t1}(x_{t1}+x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2} \quad (3.16)

If x_{t1} < x_{t2}:

\frac{\partial V_t}{\partial x_{t1}} = \beta_{t1}\,e^{-\lambda_t(x_{t1}-x_{t2})^2} - 2(\beta_{t1} x_{t1} + \beta_{t2} x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2} + \beta_{t2}\big(1 - e^{-\lambda_t(x_{t1}-x_{t2})^2}\big) + 2\beta_{t2}(x_{t1}+x_{t2})\lambda_t(x_{t1}-x_{t2})\,e^{-\lambda_t(x_{t1}-x_{t2})^2} \quad (3.17)

If x_{t1} = x_{t2}:

\partial V_t/\partial x_{t1} = \beta_{t1}, \qquad \partial V_t/\partial x_{t2} = \beta_{t2} \quad (3.18)
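A sketch of the Process II sub-utility, again under hypothetical parameter values, makes the regime switching in Eqs. (3.9)-(3.11) explicit.

```python
import math

def process2_subutility(xa, xb, ba, bb, lam):
    """Sub-utility for one common-metric pair under Process II (Eqs. 3.9-3.11).
    When the pair is cognitively close (weight w near 1) the partitioned form
    applies; otherwise the parameter of the larger attribute is transferred
    to the aggregated sum."""
    if xa == xb:                               # Eq. (3.11)
        return ba * xa + bb * xb
    w = math.exp(-lam * (xa - xb) ** 2)        # weight on the partitioned form
    b_transfer = ba if xa > xb else bb         # Eq. (3.9) vs. Eq. (3.10)
    return (ba * xa + bb * xb) * w + b_transfer * (xa + xb) * (1.0 - w)

# Hypothetical time and cost pairs; overall utility as in Eq. (3.15),
# excluding the random error term.
v_time = process2_subutility(xa=25.0, xb=10.0, ba=-0.10, bb=-0.12, lam=0.01)
v_cost = process2_subutility(xa=2.0, xb=3.5, ba=-0.40, bb=-0.20, lam=0.5)
print(v_time + v_cost)
```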
3.2.3. Process III: Attribute Non-Attendance
The general form above is not suitable for the attribute non-attendance heuristic, since it collapses to a simple linear model. Given four attributes, the proposed utility for alternative k is given in Eqs. (3.19) and (3.20):

U_k = V_{\text{free-flow time},k} + V_{\text{slowed-down time},k} + V_{\text{toll cost},k} + V_{\text{running cost},k} + \varepsilon_k \quad (3.19)

or (suppressing the subscript k):

U = \beta_{ff} x_{ff}\big(1 - e^{-\lambda_{ff} x_{ff}}\big) + \beta_{sd} x_{sd}\big(1 - e^{-\lambda_{sd} x_{sd}}\big) + \beta_{rc} x_{rc}\big(1 - e^{-\lambda_{rc} x_{rc}}\big) + \beta_{toll} x_{toll}\big(1 - e^{-\lambda_{toll} x_{toll}}\big) + \varepsilon \quad (3.20)

As λ tends towards 0, the probability that an attribute is not attended to increases; as λ tends towards ∞, the probability of full preservation increases. The WTP, as the derivative of the utility expression with respect to a specific attribute, is given in Eq. (3.21), using free-flow time (defined as x_1) as an example of the common form:

\frac{\partial V}{\partial x_{ff}} = \beta_{ff} - \beta_{ff}\,e^{-\lambda_{ff} x_{ff}} + \beta_{ff} x_{ff} \lambda_{ff}\,e^{-\lambda_{ff} x_{ff}} \quad (3.21)
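The non-attendance mechanics of Eqs. (3.20) and (3.21) can be illustrated in the same way; the parameter values below are hypothetical, with a very small λ placed on toll cost to mimic non-attendance of that attribute.

```python
import math

def process3_utility(x, betas, lams):
    """Process III utility, Eq. (3.20): each attribute's contribution is
    damped by (1 - exp(-lambda * x)); a small lambda mimics non-attendance,
    a large lambda full preservation."""
    return sum(b * xi * (1.0 - math.exp(-lam * xi))
               for xi, b, lam in zip(x, betas, lams))

def marginal_disutility(xi, b, lam):
    """Eq. (3.21): derivative of one attribute's contribution."""
    return b - b * math.exp(-lam * xi) + b * xi * lam * math.exp(-lam * xi)

# Hypothetical values: [free-flow, slowed-down, running cost, toll cost].
x = [20.0, 10.0, 3.0, 2.0]
betas = [-0.10, -0.12, -0.40, -0.20]
lams = [0.30, 0.25, 2.0, 0.01]   # tiny lambda on toll: close to non-attendance
print(process3_utility(x, betas, lams))
print(marginal_disutility(x[0], betas[0], lams[0]))  # free-flow example
```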
The focus above has been a (potentially) behaviourally richer specification of the utility expression in a simple multinomial logit (MNL) model that embeds a number of process heuristics adopted by choice makers. This offers new opportunities to extract greater behavioural richness from simpler econometric specifications, in contrast to preserving the full attribute and parameter preservation assumption and introducing random parameters through mixed logit models. In time, we expect the research evidence herein to be extended to more advanced econometric specifications, but a reappraisal of attribute processing within a simple MNL framework has merit in gaining a better understanding of the role of processing strategies in conditioning the parameters of specific attributes, and hence the WTP for such attributes.
3.2.4. Process IV: Latent Class Specification: Non-Attendance and Dual Processing of Common-Metric Attributes in Choice Analysis
The three classes of processing heuristics presented above can also be evaluated within a latent class model framework (see Hensher & Greene, 2008).10 The underlying theory of the latent class model posits that individual behaviour depends
10. In 2007, Stephane Hess gave a presentation in Norway where he considered a latent class model to accommodate attribute non-attendance and aggregation (see Hess & Rose, 2007). This was brought to my attention after completing the paper by Hensher and Greene (2008). Swait and Adamowicz (2001) have also used latent class modelling to accommodate decision complexity.
on observable attributes and on latent heterogeneity that varies with factors unobserved by the analyst. It is assumed that individuals are implicitly sorted into a set of Q processing classes, but which class contains any particular individual, whether known or not to that individual, is unknown to the analyst. The behavioural model is a logit model for discrete choice among J_i alternatives by individual i, observed in T_i choice situations, given in (3.22):

\text{Prob}[\text{choice } j \text{ by individual } i \text{ in choice situation } t \mid \text{class } q] = \frac{\exp(x'_{it,j}\beta_q)}{\sum_{j=1}^{J_i} \exp(x'_{it,j}\beta_q)} \quad (3.22)

The number of observations and the size of the choice set may vary by individual. In principle, the choice set could vary by choice situation as well. For convenience, we let y_{it} denote the specific choice made, so that the model provides:

P_{it|q}(j) = \text{Prob}(y_{it} = j \mid \text{class} = q) \quad (3.23)

For convenience, we simplify this further to P_{it|q}. For a given class assignment, the contribution of individual i to the likelihood is the joint probability of the sequence y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}], given in (3.24):

P_{i|q} = \prod_{t=1}^{T_i} P_{it|q} \quad (3.24)

The class assignment is unknown. Let H_{iq} denote the prior probability of class q for individual i. A convenient form is the MNL (Eq. (3.25)):

H_{iq} = \frac{\exp(z'_i\theta_q)}{\sum_{q=1}^{Q} \exp(z'_i\theta_q)}, \quad q = 1, \ldots, Q; \; \theta_Q = 0 \quad (3.25)
where zi denotes a set of observable characteristics which enter the model for class membership. To account for possible heuristics defined in the domains of attribute non-attendance, aggregation, and common-metric parameter transfer, we impose restrictions on parameters within each latent class, each class representing a particular process heuristic.11 For example, to impose the condition of nonattendance of a specific attribute, we set its parameter to zero; to impose commonmetric aggregation, we constrain two parameters to be equal; and to allow for parameter transfer, we define a single parameter based on the parameter associated with a specific attribute.12
11. Importantly, the number of classes selected is determined by the number of heuristics to investigate, and not by the usual BIC and AIC comparisons across a varying number of classes. 12. Unlike Process rule II, which is defined on the absolute levels of each attribute, the latent class model transfers the parameter to the entire sample within the class rule. Hence, the parameter transfer rule under Process IV is strictly different to that under Process rule II.
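A minimal sketch of how such within-class restrictions might be coded is given below; the attribute ordering, parameter values, rule set, and class shares are all hypothetical and purely illustrative.

```python
import numpy as np

def mnl_probs(X, beta):
    """Within-class MNL choice probabilities, Eq. (3.22).
    X is a (J, K) attribute matrix for one choice situation."""
    v = X @ beta
    e = np.exp(v - v.max())          # subtract max for numerical stability
    return e / e.sum()

def restricted_beta(theta, rule):
    """Map a common parameter vector to class-specific coefficients.
    Attribute order (hypothetical): [ff, sdt, rc, toll]."""
    ff, sdt, rc, toll = theta
    if rule == "full":                   # all attributes attended separately
        return np.array([ff, sdt, rc, toll])
    if rule == "ignore_ff":              # non-attendance: parameter set to zero
        return np.array([0.0, sdt, rc, toll])
    if rule == "aggregate_time":         # aggregation: equal time parameters
        return np.array([sdt, sdt, rc, toll])
    if rule == "transfer_toll_to_rc":    # parameter transfer between cost items
        return np.array([ff, sdt, toll, toll])
    raise ValueError(rule)

# One hypothetical choice situation with three route alternatives.
X = np.array([[20.0, 10.0, 3.0, 2.0],
              [15.0, 18.0, 2.5, 0.0],
              [25.0,  5.0, 3.5, 3.0]])
theta = np.array([-0.10, -0.12, -0.40, -0.20])
rules = ["full", "ignore_ff", "aggregate_time", "transfer_toll_to_rc"]
H = np.array([0.4, 0.2, 0.3, 0.1])     # hypothetical class shares, Eq. (3.25)

# Unconditional choice probabilities: a mixture over the latent classes.
P = sum(h * mnl_probs(X, restricted_beta(theta, r)) for h, r in zip(H, rules))
print(P, P.sum())                       # the mixture still sums to one
```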
3.3. Empirical Illustration

To illustrate the implications of each attribute processing strategy (APS), we use a data set drawn from a study undertaken in Sydney in 2004, in the context of car-driving commuters making choices from a range of level-of-service packages defined in terms of travel times and costs, including a toll where applicable. The SC questionnaire presented respondents with 16 choice situations, each giving a choice between their current (reference) route and two alternative routes with varying trip attributes. The sample of 243 effective interviews, each responding to 16 choice sets, resulted in 3888 observations for model estimation. To ensure that we captured a large number of travel circumstances and potential attribute processing rules, we sampled individuals who had recently undertaken trips of various travel times, in locations where toll roads currently exist.¹³ To ensure some variety in trip length, an individual was assigned to one of three trip-length segments based on a recent commuting trip: no more than 30 min, 31-60 min, and 61 min or more (capped at 2 h). A telephone call was used to establish eligible participants from households stratified geographically, and a time and location were agreed for a face-to-face computer-aided personal interview (CAPI).

A statistically efficient design (see Rose & Bliemer, 2007; Sandor & Wedel, 2002), pivoted around the knowledge base of travellers, is used to establish the attribute packages in each choice scenario, in recognition of supporting theories in behavioural and cognitive psychology and economics, such as prospect theory. A pivot design recognises the useful information contained in a revealed preference alternative, capturing the accumulated exposure to the studied context. Further details of the design of the choice experiment, and of the merits of pivot or referenced designs, are provided in Hensher and Layton (2008), Hensher (in press), and Rose, Bliemer, Hensher, and Collins (2008a).

The two SC alternatives are unlabelled routes. The trip attributes associated with each route are free-flow time, slowed-down time, trip time variability, running cost, and toll cost. All attributes of the SC alternatives are based on the values of the current trip. Variability in travel time for the current alternative was calculated as the difference between the longest and shortest trip times provided in non-SC questions; the SC alternative values for this attribute are variations around the total trip time. For all other attributes, the values for the SC alternatives are variations around the values for the current trip. The variations used for each attribute are given in Table 3.1. The experimental design has one version of 16 choice sets (games), and the design has no dominance.¹⁴ The distinction between free-flow and slowed-down time is designed to promote the differences in the quality of travel time between various routes — especially a tolled route and a non-tolled route, and is separate from the influence of total time. Free-flow time is interpreted with reference to a trip at 3 a.m., when there are no delays due to traffic.¹⁵
13. Sydney has a growing number of operating toll roads; hence, drivers have had a lot of exposure to paying tolls. 14. The survey designs are available from the author on request.
Table 3.1: Profile of the attribute range in the SC design.

           Free-flow   Slowed-down   Variability   Running     Toll
           time (%)    time (%)      (%)           costs (%)   costs (%)
Level 1    -50         -50           +5            -50         -100
Level 2    -20         -20           +10           -20         +20
Level 3    +10         +10           +15           +10         +40
Level 4    +40         +40           +20           +40         +60
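The pivoting in Table 3.1 is mechanical: each SC attribute level is the respondent's reference value scaled by the percentage for the assigned design level. A short sketch, using the free-flow column and a hypothetical 20-minute reference trip:

```python
def pivot(reference, pct_changes):
    """Apply a design column's percentage variations (Table 3.1) to a
    respondent's reference attribute level."""
    return [reference * (1.0 + pct / 100.0) for pct in pct_changes]

# Free-flow time levels (-50%, -20%, +10%, +40%) around a 20-minute reference.
print(pivot(20.0, [-50, -20, 10, 40]))   # [10.0, 16.0, 22.0, 28.0]
```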
An example of an SC screen is shown as Figure 3.1.

Figure 3.1: An example of a stated choice screen.
15. This distinction does not imply that there is a specific minute of a trip that is free flow per se, but it does tell respondents that a certain amount of the total time is slowed down due to traffic, etc., and hence that the balance is not slowed down (i.e., is free flow, like one typically observes at 3 a.m.).
3.4. Evidence on Marginal Willingness to Pay: Value of Travel Time Savings

3.4.1. Evidence from Processing Models I-IV
In this section, we bring together the evidence on the value of travel time savings (VTTS) when one or more processing strategies are accounted for in modelling choice outcomes. The estimated models are not presented herein, since they are given in Layton and Hensher (in press), Hensher and Layton (2008), Hensher and Rose (2009), and Hensher and Greene (2008). In all cases, we have accounted for the panel structure of the data. Our interest in this paper is in establishing the extent of under- or over-estimation of mean VTTS, relative to full relevancy and compensatory rules, when account is taken of the various process rules set out above. To obtain a VTTS distribution for each of free-flow and slowed-down time, we either have to simulate the distribution across values for the attribute(s) of interest, or apply the formula to a sample of observations. We chose the latter, using the same data used to estimate the models. Given that the denominator in the WTP expression is a weighted average of the roles of running cost and toll cost, where the weights reflect the incidence of running and toll cost, and that the numerator includes both attributes with a common metric, the WTP for a specific trip time component (i.e., free-flow or slowed-down time) depends on a mix of the levels of all four attributes.

We summarise the evidence in Table 3.2, including the reference source. The major finding is that all mean estimates of VTTS are higher when one or more processing rules are accounted for, in contrast to the traditional MNL model that assumes full attribute and parameter preservation. There is a clear trend here that, if reinforced by other data sets, sends a warning about the underestimation of VTTS when processing heuristics are not accounted for. The extent of underestimation appears significant; for the overall weighted average travel time, it ranges from a high of 34.7 per cent for the full set of process rules in the latent class model to a low of 7.3 per cent for attribute aggregation for both time and cost.¹⁶

We take a closer look at the findings from the latent class model, summarised in Table 3.3. There is a range of mean estimates of VTTS across the latent classes: $1.35-42.19, after dividing the marginal disutility of each time component by the weighted average cost parameter, where the weights are the levels
16. It is worth noting that the attribute aggregation model (Process I) allowed for aggregation of both the time and the cost components. By contrast, the latent class model (Process IV) only found time aggregation statistically significant, but did identify a significant effect from the heuristic that transferred the toll cost parameter to the running cost attribute. What this latter evidence suggests is that individuals do not tend to add up the cost components, but tend to reweight their influence by the parameter transfer rule.
Table 3.2: Summary of willingness to pay estimates ($AUD2004 per person hour).

Process rule                                   VTTS:          VTTS:          VTTS:          Reference
                                               free-flow      slowed-down    weighted
                                               time           time           average time
Full preservation of attributes and
  parameters, MNL                              11.76          15.72          14.07a         Hensher and Greene (2008)
Full preservation of attributes and
  parameters, mixed logit                      14.11          16.78          15.67a         Hensher and Greene (2008)
Process I: attribute aggregation               12.87          16.78          15.10          Layton and Hensher (in press)b
Process II: parameter transfer                 13.37          19.44          16.91          Hensher and Layton (2008)
Process III: attribute non-attendance          15.28 (1.91)   22.05 (2.74)   19.23          Hensher and Rose (2009)
Process IV: latent class mixture of all rules  –              –              19.62a         Hensher and Greene (2008)

a The standard errors have been obtained by bootstrapping. The mean standard deviations for MNL, mixed logit, and latent class are $1.42, 3.71, and 5.10, respectively. We can reject the null of no difference between LC and MNL and between LC and mixed logit, but not between MNL and mixed logit.
b In order to estimate the model as a panel, Layton and Hensher (in press) used a combination of many start values and simulated annealing (code written by Tsionas 9/4/95, available at the American University Gauss Archive: http://www.american.edu/academic.depts/cas/econ/gaussres/GAUSSIDX.HTM). Using the maximum from the simulated annealing approach, we then computed a Newton-Raphson iteration using 500 replications of the simulator, and computed the covariance from all terms except for λ_t and λ_c.
of running and toll cost. To obtain an overall sample average, we weight each mean estimate by the probability of class membership. The overall sample weighted average for total time is $19.62, which contrasts with $14.07 for the traditional MNL specification in Table 3.2 (Hensher & Greene, 2008, Table 3.3). The mean estimate of VTTS is thus 39.4 per cent higher when process heterogeneity is accounted for across the three classes of heuristics. A closer look at the contribution of each heuristic suggests that attribute addition for the two time components produces the highest mean contribution to VTTS, after controlling for class membership. Ignoring free-flow time is the next largest contributor, followed by full attendance to all attributes, and then by ignoring running cost and slowed-down time.
Table 3.3: Values of travel time savings from a latent class model ($AUD2004 per person hour). NAT = not attended to; ParT = parameter transfer.

                                                    Class        Free-flow   Slowed-down   Total
                                                    membership   time        time          time
                                                    probability
All attributes attended to                          0.2817       5.87        9.89          8.22
Free-flow NAT                                       0.1119       –           23.02         23.02
Toll cost NAT                                       0.0359       3.95        8.93          6.85
Slowed-down time NAT                                0.0643       1.35        –             1.35
Running cost and slowed-down time NAT               0.0497       42.19       –             42.19
Free-flow and slowed-down time added                0.2978           37.57                 37.57
Free-flow to slowed-down and vice versa ParT        0.0758            4.57                 4.57
Free-flow to slowed-down ParT and running cost
  to toll cost and vice versa ParT                  0.0829            9.26                 9.26
Class membership weighted VTTS                                                             19.62 (5.10)

Source: Hensher and Greene (2008). Note: Standard errors for each component VTTS are available on request.
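The class-membership weighting behind the $19.62 figure can be verified directly from Table 3.3, as the following short sketch shows.

```python
# Class membership probabilities and total-time VTTS taken from Table 3.3.
probs = [0.2817, 0.1119, 0.0359, 0.0643, 0.0497, 0.2978, 0.0758, 0.0829]
vtts = [8.22, 23.02, 6.85, 1.35, 42.19, 37.57, 4.57, 9.26]

weighted = sum(p * v for p, v in zip(probs, vtts))
print(round(weighted, 2))   # 19.62: the class-membership weighted VTTS
```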
3.4.2. Evidence from Self-Stated Processing Responses for Common-Metric Addition
The focus of the previous sections was on exploring a way in which we are able to allow for the possibility of heterogeneity in the way that individuals process common-metric attributes in making choices, focusing on a number of potential heuristics, without having to ask supplementary (deterministic) elicitation questions. In addition to the SC experiment, however, the survey did ask the supplementary elicitation questions shown in Figure 3.2.¹⁷ In this section, we investigate the implications of conditioning the preference function used to derive WTP estimates, using the response to question 2 to illustrate the empirical implications. A large percentage of the respondents stated, in supplementary questions (see Hensher, 2008), that they added the components: 88.06 per cent for time and 76.5 per cent for cost.
17. This question was asked after completion of all 16 choice tasks. An alternative approach is to ask these questions after each choice task as was the case in Puckett and Hensher (2008, 2009), and Scarpa, Thiene, and Hensher (2009b). Our preference is for choice-task-specific self-stated processing questions, especially where attribute level matters; however this comes at the risk of cognitive burden and the possibility that the number of choice tasks might have to be reduced. We also recognise the potential limitation of such questions, and devote a later section to future ways of investigating question structure, and the believability/plausibility of the evidence.
Figure 3.2: CAPI questions on attribute relevance.

We estimated five panel-specification models — two mixed logit (with and without error components) and three latent class models. One mixed logit model ignored the attribute processing rule; the other accommodated it through the specification of separate parameters to capture the following conditions: (i) added up times but not costs, (ii) added up costs but not times, (iii) added up both times and costs, and (iv) preserved all four attributes as separate components. One latent class model defined four class memberships as per (i)-(iv) above without recourse to information from the supplementary questions, whereas another latent class model conditioned class membership on conditions (i)-(iv). A base latent class model assumed all attributes are treated separately, with three classes identified with statistically significant latent class probabilities. The findings are summarised in Table 3.4.

Mixed logit and latent class models are well documented in the literature. For mixed logit, we have selected a quasi-constrained triangular distribution for each random parameter, in which the spread¹⁸ estimate is constrained to equal the mean estimate. If the scale equals 1.0, the range is 0 to 2β₁. This is an appealing way of capturing random taste heterogeneity, avoiding the search for such heterogeneity at the extremes of unconstrained distributions.¹⁹
18. The spread is the standard deviation times √6.
19. We acknowledge that this restriction is controversial, although we prefer to adopt it in contrast to unconstrained distributions, where sign changes are common, or to eliminating all negative VTTS, as some analysts do.
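For illustration, the following sketch (with a hypothetical mean) draws from a triangular distribution whose spread is constrained to equal its mean, so that all draws share the sign of the mean.

```python
import random

def constrained_triangular_draws(mean, n=5, seed=1):
    """Draw from a triangular distribution whose spread equals its mean, so
    the support runs from 0 to 2*mean (from 2*mean to 0 for a negative mean)."""
    rng = random.Random(seed)
    lo, hi = sorted((0.0, 2.0 * mean))
    return [rng.triangular(lo, hi, mean) for _ in range(n)]

print(constrained_triangular_draws(-0.10))   # every draw is negative
```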
Table 3.4: Influence of self-stated attribute processing strategy (APS) on VTTS.

(i) Mixed logit models (panel specification)
Mixed logit model (constrained triangular for random parameters); t-ratios in brackets, except for VTTS, where the bracketed value is the standard deviation.

Attributes                                    No allowance for      Allowance for
                                              self-stated APS       self-stated APS
Random parameters:
  Free-flow time (FF)                         -0.10023 (-17.33)     -0.0497 (-3.64)
  Slowed-down time (SDT)                      -0.1147 (-21.94)      -0.687 (-5.98)
  Aggregated FF and SDT                       –                     -0.1236 (-22.5)
  Running cost (RC)                           -0.4167 (-14.58)      -0.1945 (-4.11)
  Toll cost (TC)                              -0.188 (-22.99)       -0.2905 (-9.70)
  Aggregated RC and TC                        –                     -0.6103 (-21.62)
Fixed parameter:
  Non-reference alternative dummy             -0.1344 (-2.88)       -0.1669 (-3.61)
Log likelihood at convergence                 -2762.80              -2711.88
Log likelihood at zero                        -4271.41              -4271.41
Weighted average VTTS ($AUD2004/person hour)  $15.87 ($10.14)       $20.12 ($16.01)

(ii) Mixed logit models (panel specification) with error component

Attributes                                    No allowance for      Allowance for
                                              self-stated APS       self-stated APS
Random parameters:
  Free-flow time (FF)                         -0.11190 (-31.45)     -0.08113 (-5.50)
  Slowed-down time (SDT)                      -0.12746 (-34.25)     -0.07514 (-7.06)
  Aggregated FF and SDT                       –                     -0.13076 (-19.37)
  Running cost (RC)                           -0.49740 (-19.74)     -0.23583 (-3.96)
  Toll cost (TC)                              -0.55193 (-32.95)     -0.26234 (-7.489)
  Aggregated RC and TC                        –                     -0.65814 (-17.19)
Fixed parameter:
  Non-reference alternative dummy             0.18195 (1.95)        -0.27233 (-2.13)
Standard deviation of latent random effect    2.43423 (24.5)        2.3357 (28.21)
Log likelihood at convergence                 -2485.03              -2447.43
Log likelihood at zero                        -4271.41              -4271.41
Weighted average VTTS ($AUD2004/person hour)  $16.11 ($10.87)       $22.63 ($23.26)

(iii) Latent class models (panel specification)

Base model (all attributes treated separately):

                                    Class 1            Class 2            Class 3
Free-flow time                      -0.04006 (-4.7)    -0.2022 (-28.9)    -0.0338 (-7.5)
Slowed-down time                    -0.0603 (-9.6)     -0.2009 (-31.6)    -0.0749 (-22.0)
Running cost                        -0.3323 (-8.9)     -0.3399 (-10.7)    -0.4739 (-15.3)
Toll cost                           -0.2883 (-10.7)    -0.3417 (-24.2)    -0.6115 (-33.6)
Non-reference alternative           2.5043 (12.3)      -0.3947 (-7.2)     -1.0281 (-23.3)
Class membership probability        0.263 (6.92)       0.361 (10.45)      0.376 (11.14)
Log likelihood at convergence       -2542.74
Log likelihood at zero              -4271.41
Weighted average VTTS               $17.89

Models allowing for attribute processing (parameter estimates for FF, SDT, RC, TC, NONSQ; t-ratios in brackets):

No allowance for self-stated APS:
Latent class                          Class prob.   Parameter estimates
All attributes treated separately     0.379         -0.049, -0.090, -0.638, -0.743, 0.622 (5.5, 13.0, 11.3, 19.1, 6.9)
Time components aggregated            0.050         -0.057, -0.057, -0.29, -0.38, 3.9 (3.3, 3.3, 1.9, 9.2, 11.1)
Cost components aggregated            0.318         -0.217, -0.212, -0.319, -0.319, 0.428 (26.9, 29.2, 19.1, 19.1, 6.8)
Time and cost components aggregated   0.253         -0.052, -0.052, -0.282, -0.282, 2.58 (17.4, 17.4, 25.4, 25.4, 22.2)

Allowance for self-stated APS:
Latent class                          Class prob.   Parameter estimates
All attributes treated separately     0.381         -0.055, -0.092, -0.648, -0.748, 0.637 (5.0, 12.1, 10.1, 16.3, 6.7)
Time components aggregated            0.052         -0.054, -0.054, -0.332, -0.370, 3.82 (3.2, 3.2, 2.0, 8.4, 10.4)
Cost components aggregated            0.310         -0.221, -0.215, -0.317, -0.317, 0.410 (25.1, 27.8, 17.5, 17.5, 6.3)
Time and cost components aggregated   0.257         -0.050, -0.050, -0.277, -0.277, 2.49 (16.1, 16.1, 23.2, 23.2, 21.9)

Theta in class probability (model with allowance for self-stated APS); covariates: Constant, FF, SDT, FFSDT, RC, TC, RCTC (t-ratios in brackets; the fourth class is normalised to zero, as in Eq. (3.25)):
All attributes treated separately     1.35**, 0.006, 0.003, 0.005, 0.33, 0.079, 0.093 (2.4, 0.17, 0.14, 0.61, 1.1, 0.45, 1.4)
Time components aggregated            1.59, 0.18*, 0.45, 0.009, 0.52, 0.61, 0.13 (1.2, 1.9, 1.4, 0.44, 1.6, 1.1, 0.7)
Cost components aggregated            1.16*, 0.02, 0.03, 0.009, 0.35*, 0.15, 0.13* (1.9, 0.7, 1.1, 0.9, 1.7, 0.9, 1.7)

                                      No allowance for      Allowance for
                                      self-stated APS       self-stated APS
Log likelihood at convergence         -2427.57              -2399.64
Log likelihood at zero                -4271.41              -4271.41
Weighted average VTTS                 $18.02 ($15.02)       $18.05 ($15.28)

Note: all covariates are in minutes or dollars, except the constant. Statistically significant: * = 5% level; ** = 10% level.
The triangular distribution was first used for random coefficients by Train and Revelt (2000), later incorporated into Train (2003), and it is increasingly being used in empirical studies. The overall goodness-of-fit of the models with allowance for self-stated APS is statistically better than when self-stated APS is not accounted for. The mixed logit models differ in the way that the time and cost attributes are included in the utility expressions, but in both models all parameters have the expected negative signs and are statistically significant at the one per cent level. Given the different ways that free-flow and slowed-down time are handled, the most sensible representation of the VTTS is as a weighted average estimate, with weights associated with the contribution of each of the three specifications of cost and of time. The VTTS in Table 3.4 are based on conditional distributions (i.e., conditional on the alternative chosen). The VTTS in the mixed logit model is significantly higher when the self-stated APS is accounted for: $20.12 ($22.63 with error components) per person hour, compared to $15.87 ($16.11 with error components) per person hour.

The latent class model is based on the four attribute addition rules (i)-(iv). All time and cost parameters are statistically significant at the one per cent level, and of the expected sign, when class membership is conditioned on the self-stated APS; when the self-stated APS are not included, all but one parameter is statistically significant at the one per cent level, the exception being running cost in the second class, which is significant at the 10 per cent level. The overall log likelihood at convergence is greatly improved over the mixed logit model for both latent class models, suggesting that the discrete representation of heterogeneity captured through latent classes is a statistical improvement over the continuous representation of heterogeneity in the mixed logit model.

The weighted average VTTS are derived first across classes for each attribute, based on conditional distributions associated with the probability of class membership of each respondent within each class; a further weighting is then undertaken using weights that reflect the magnitudes of the components of time and cost. The weighted average VTTS in the two latent class models that account for attribute processing are virtually identical. What this suggests is that once we have captured the alternative processing rules through the definition of latent classes, the inclusion of the self-stated APS rules as conditions on class membership does not contribute additional statistically useful evidence to revise the findings, in the aggregate. This is consistent with the statistical non-significance of most of the self-stated APS variables: only three parameters have a 10 per cent significance level (excluding the constants), and none have one or five per cent significance levels. However, when we contrast this evidence with the base latent class model, which makes no allowance for attribute processing, the mean VTTS is only slightly lower ($17.89 per person hour compared to $18.02, and $14.07 for the MNL model). This may suggest that the latent class specification has done a good job of approximating the way in which attributes are processed. These findings support the hypothesis that allowance for attribute processing rules tends to result, on average, in a higher mean estimate of the WTP for travel time savings.
This is consistent, directionally, with other studies undertaken by Rose, Hensher, and Greene (2005) and Hensher and Layton (2008).
3.5. Other Perspectives: Belief and Support Functions to Establish Judgment of Evidence Strength

A growing number of studies ask supplementary questions, such as those illustrated in the previous section, to elicit how respondents processed specific attributes (e.g., Hensher, 2008; Hess & Hensher, in press). The reliability of responses to such questions (e.g., "which attributes did you ignore?" or "which attributes did you add up?") is not without controversy (see Bertrand & Mullainathan, 2001). Preliminary evidence suggests that the marginal WTP, when responses to supplementary intention questions are used to condition the treatment of an attribute in model estimation, is sometimes higher and sometimes lower than when processing is excluded. By contrast, the (limited but growing) evidence appears to move consistently in the upward direction when heuristics are tested through the functional specification of non-linear utility expressions. So which tendency is "correct"? The answer is far from clear. Furthermore, some studies have shown that the expectation of a parameter approaching zero, when a respondent claims to have ignored an attribute, is not borne out (Hess & Hensher, in press); in contrast, a recent study by Rose, Hensher, Greene, Washington, and Black (2008b), using Bayesian methods, found encouraging evidence that self-stated responses on attribute non-preservation were indeed consistent with a statistically non-significant difference around a zero parameter estimate.

One potentially fruitful way forward is to transform the self-stated processing responses to recognise the potential for error in response.²⁰ While there are a number of ways this might be undertaken, one worthy of investigation is known as the belief-function perspective. There is a large literature on believability, emanating from the works of Dempster and Shafer in the 1960s and 1970s (Shafer, 1976; Dubois & Prade, 1988). Although this literature is not focused on attribute processing in choice analysis per se, the sentiment is aligned.²¹ The focus is on the uncertainty that arises from a lack of knowledge of the true state of nature, where we lack not only knowledge of a stable frequency (how can we be sure that the heuristic adopted is stable over time? see also Yoon & Simonson (2008), as implied by the process heuristics in previous sections), but also the means to specify fully the fixed conditions under which repetitions can be performed (Shafer & Srivastava, 1990).
20. Another interesting approach is to recognise the role of heterogeneity, to identify the strongest parametric assumptions, and to see how they can be relaxed through modelling heterogeneity. King and Wand (2007) offer some interesting ideas in the context of ordered choices and anchoring vignettes identified from supplementary information designed to account for differences in perceptions across respondents. See Greene and Hensher (2009, chapter 7).
21. A linked literature in social psychology is focused on the meaning and moderators of attitude strength, where strong attitudes are characterised by high levels of confidence and stability. Attitude confidence is defined as the degree to which an individual is certain that his or her attitude is correct. See Krosnick and Schuman (1988).
The Dempster-Shafer theory of belief functions is used to assess the reliability of evidence, which provides support for the presence or absence of such a variable in situations where the event cannot be treated as a random variable. Dempster (1967) introduces belief functions from a statistical perspective, in terms of carrying a (frequentist) probability measure from a "space of observations" to a "space of interpretations of these observations" by a "point-to-set mapping" (Dubois & Prade, 1988). Many individuals are influenced by the views of others, suggesting that additional information on the believability of an individual's response may be aided by this extra evidence.²² We thus need to find ways to triangulate evidence from various sources, in order to establish a measure of belief in the evidence offered by an individual on how they processed specific attributes. The level of belief, on whether the person in question processed an attribute using a specific rule or not, depends on the items of evidence and their credibility. A belief-function treatment of such problems provides a possible framework. It involves three constructs: belief functions, plausibility functions, and a measure of ambiguity. When combined, especially the belief function and the plausibility function, we obtain Dempster's rule of what I term "rule reliability." We now explain this rule in more detail, and suggest the nature of the data required in future studies to embed the rule-reliability measure in the estimation of choice models.

The Dempster-Shafer theory of belief functions is similar to probability theory, with one difference. Under probability theory, uncertainty is assigned to the state of nature based on knowledge of the frequency of occurrence. Under belief functions, uncertainty is assigned to the state of nature, or assertion of interest, in an indirect way: based on probability knowledge in another frame, by mapping that knowledge onto the frame of interest. This mapping need not be one-to-one. To illustrate, suppose we have a variable, say A, with n possible mutually exclusive and exhaustive values: a_1, a_2, a_3, …, a_n. These values could be alternative ways that an attribute is processed (in isolation or in conjunction with other attributes), and/or processing responses to different question structures, including a simple binary statement of "ignored or did not ignore"²³ the attribute, or "added up or did not add up two attributes of a common metric." Define the frame of discernment²⁴ for the variable A as Θ = {a_1, a_2, a_3, …, a_n}. Under probability theory, for such a set, we assign a probability mass P(a_i) to each state a_i such that the sum of all these probabilities equals one, that is, \sum_{i=1}^{n} P(a_i) = 1. Under the Dempster-Shafer theory of belief functions, however, uncertainties are assigned in terms of belief masses, not only to singletons but also to all the subsets of the frame and to the entire frame Θ. These belief masses add to one, similar to probability masses.
22. Extra evidence may be obtained from other questions asked of the same individual, as well as from questions to other persons.
23. Including different wording, such as "attending to or not attending to."
24. That is, the quality of being able to grasp and comprehend what is obscure.
The entire frame Θ in our example might be a binary setting of "ignored" (a_1) and "not ignored" (a_2) for a specific attribute associated with an alternative and/or a choice task.²⁵ These belief masses define a function called the basic belief mass function (Shafer, 1976). We can write the belief mass assigned to a subset B as m(B), where B could be a single element, a subset of two, a subset of three (e.g., degrees of attribute preservation), and so on, or the entire frame Θ. The sum of such belief masses equals one, that is, \sum_{B \subseteq \Theta} m(B) = 1. When the non-zero belief masses are defined only on the singletons, the belief function reduces to probability theory. Thus, one can argue that probability theory is a special case of the Dempster-Shafer theory of belief functions.

To crystallise this distinction in an example, suppose we were able to determine, from a number of sources, that m(IG) = 0.3, m(NIG) = 0, and m({IG, NIG}) = 0.7.²⁶ IG stands for "the ignore response being a reasonable representation of reality," and NIG stands for "the ignore response being either materially misstated or not reflecting acceptable views of others."²⁷ The belief function interpretation of these belief masses is that the analyst has a 0.3 level of support for IG, no support for NIG, and a 0.7 level of support that remains uncommitted, which represents ignorance²⁸ (Dubois & Prade, 1988).²⁹ However, if we had to express the above judgment in terms of probabilities, we get into problems, because we would assign P(IG) = 0.3 and P(NIG) = 0.7, which implies a 70 per cent chance that the response to the question is materially misstated or does not reflect acceptable views of others. That is not the analyst's judgment; he has no information or evidence that the report of ignoring an attribute is materially misstated. Simply knowing that the response appears reasonable, compared to predicted values based on the average views of others, including additional information obtained from the specific individual, provides no evidence that the response to the question on whether
25. It could also be degrees of attribute relevance (a_1, a_2, …, a_n), from totally relevant (not ignored) to totally irrelevant (ignored).
26. Establishing these probabilities is the great challenge.
27. Information to gauge the reliability of stated self-intentions could be sought from the very same person, along similar lines to the supplementary questions used in reducing the hypothetical bias (HB) gap in willingness to pay. An example in the HB context is a supplementary certainty-scale question after each choice scenario, along the lines suggested by Johannesson et al. (1999): on a scale from 0 (very unsure) to 10 (very sure), respondents indicate how sure or certain they are that they would actually choose that route (or not at all) at the indicated price and travel time.
28. A "complete ignorance heuristic" (CI) reflects the case in which the individual is not aware of the influence of the attributes in their utility function. Arana et al. (2008, p. 757) suggest an interpretation as follows: "It collects individuals who do not care about the consequences of their responses, or who do not pay attention to the experiment. In other words, CI is utilized by those individuals who make choices using a completely random process."
29. Dubois and Prade (1988, p. 55) state, "Probability theory is not very good at modeling weak states of knowledge where the uncertainty about some event is but loosely related to the uncertainty about the contrary event. Especially, total ignorance … cannot be expressed by a single probability measure. Another way of putting it is that probability cannot distinguish between the absence of belief in not-A and the belief in A."
an attribute is ignored is materially misstated. It only provides some level of support that the subjective response is accurately stated.

The Belief Function is defined as follows: the belief in B, Bel(B), for a subset B of elements of a frame, Θ, represents the total belief in B, and is equal to the belief mass, m(B), assigned to B plus the sum of all the belief masses assigned to the sets of elements (C) that are contained in B. In symbols: Bel(B) = Σ_{C ⊆ B} m(C).30

The Plausibility Function is defined as follows: intuitively, the plausibility of B is the degree to which B is plausible given the evidence. In other words, Pl(B) represents the maximum belief that could be assigned to B if all the evidence collected in the future were to support B. In mathematical terms, one can define the plausibility of B as Pl(B) = Σ_{B ∩ C ≠ ∅} m(C), which can also be expressed as Pl(B) = 1 − Bel(¬B), that is, the degree to which we do not assign belief to its negation (¬B). The belief function measure of ambiguity in an assertion, say B, is the difference between the plausibility of B and the belief in B (Wong & Wang, 1993).

Dempster's rule (Shafer, 1976) combines two or more independent items of evidence, similar to Bayes' rule in probability theory. Dempster's rule reduces to Bayes' rule when all the belief masses defined on the frame are zero, except the ones for the singletons. For example, for two independent items of evidence31 pertaining to a frame of discernment, Θ, we can write the combined belief mass for a subset B in Θ using Dempster's rule of combination as:

m(B) = (1/K) Σ_{C1 ∩ C2 = B} m1(C1) m2(C2)   (3.26)

where

K = 1 − Σ_{C1 ∩ C2 = ∅} m1(C1) m2(C2)   (3.27)
The symbols m1(C1) and m2(C2) denote the belief masses of C1 and C2, respectively, from the two independent items of evidence represented by the subscripts. K is a renormalisation constant. The second term in K represents the conflict between the two items of evidence32; the two items of evidence are not combinable if the conflict term equals 1.
30. By definition, the belief mass assigned to an empty set is always zero, i.e., m(∅) = 0. 31. Such as the supplementary self-stated intention questions and the reliability question (see footnote 27). 32. A challenging problem in combining uncertain information is to decide what to do with conflicts. Generally, combining information issued from conflicting sources leads to un-normalised uncertainty measures. Shafer (1976), in advocating Dempster's rule, suggests that the resulting uncertainty measure should be renormalised. He motivates his choice by the Sherlock Holmes principle saying that "having discarded the impossible, what remains, however improbable, is the truth." But this principle assumes that the sources of information are both totally reliable, an over-optimistic assumption in some situations. The weight m(∅), and more generally, the amount of subnormalisation, assesses the extent to which both sources are indeed reliable. Moreover, the normalisation operation introduces discontinuities in the combination rule. See Dubois and Prade (1987) for a discussion of combination rules for belief functions and possibility measures, including the case of unequally reliable sources.
Let us consider an example to illustrate the details of Dempster's rule. Suppose we have the following sets of belief masses obtained from two independent items of evidence33 related to the accurate representation of whether an attribute is ignored (IG) or not ignored (NIG):

Evidence 1: m1(IG) = 0.3; m1(NIG) = 0.0; m1({IG, NIG}) = 0.7
Evidence 2: m2(IG) = 0.6; m2(NIG) = 0.1; m2({IG, NIG}) = 0.3

The renormalisation constant for the above case is:

K = 1 − [m1(IG)m2(NIG) + m1(NIG)m2(IG)] = 1 − [0.3 × 0.1 + 0.0 × 0.6] = 0.97

Using Dempster's rule (3.26), the combined belief masses for "IG," "NIG," and {IG, NIG} are given by34:

m(IG) = [m1(IG)m2(IG) + m1(IG)m2({IG, NIG}) + m1({IG, NIG})m2(IG)]/K
      = [0.3 × 0.6 + 0.3 × 0.3 + 0.7 × 0.6]/0.97 = 0.69/0.97 = 0.711

m(NIG) = [m1(NIG)m2(NIG) + m1(NIG)m2({IG, NIG}) + m1({IG, NIG})m2(NIG)]/K
       = [0.0 × 0.1 + 0.0 × 0.3 + 0.7 × 0.1]/0.97 = 0.07/0.97 = 0.0721

m({IG, NIG}) = m1({IG, NIG})m2({IG, NIG})/K = (0.7 × 0.3)/0.97 = 0.21/0.97 = 0.2165   (3.28)
33. It is straightforward to generalise to any number of evidence sources. 34. The term m1({IG, NIG})m2({IG, NIG}) represents the "unknown" or ambiguity factor. This is assigned to m({IG, NIG}). Dempster's rule for combining two items of evidence on a frame {x, ¬x} of a binary variable X dictates that the combined belief masses on the frame should be determined as follows: m(x) = [m1(x)m2(x) + m1(x)m2({x, ¬x}) + m1({x, ¬x})m2(x)]/K, where K = 1 − [m1(x)m2(¬x) + m1(¬x)m2(x)]. This equation comes from the simple logic that state "x" is true if both items of evidence suggest that "x" is true, i.e., m1(x)m2(x), or one item of evidence suggests that "x" is true while the other is not sure whether it is "x" or "¬x," i.e., m1(x)m2({x, ¬x}) and m1({x, ¬x})m2(x). K is a renormalisation constant to make sure that the combined m-values add to one. As can be seen, the second term in K, i.e., [m1(x)m2(¬x) + m1(¬x)m2(x)], represents conflict between the two items of evidence: one item of evidence suggests that "x" is true and the other suggests that "¬x" is true. Similarly, m(¬x) = [m1(¬x)m2(¬x) + m1(¬x)m2({x, ¬x}) + m1({x, ¬x})m2(¬x)]/K, on the logic that "¬x" is true if both items of evidence suggest that "¬x" is true, or one suggests "¬x" while the other is not sure whether it is "x" or "¬x." Finally, m({x, ¬x}) = [m1({x, ¬x})m2({x, ¬x})]/K: if both items of evidence are unsure whether it is "x" or "¬x," then the combined evidence is also unsure, so m1({x, ¬x})m2({x, ¬x}) maps to m({x, ¬x}). We thank Stewart Jones for this clarification.
The combined beliefs and plausibilities that attribute processing is not misstated are:

Bel(IG) = m(IG) = 0.711, and Bel(NIG) = m(NIG) = 0.0721   (3.29)

Pl(IG) = 1 − Bel(NIG) = 0.928, and Pl(NIG) = 1 − Bel(IG) = 0.289   (3.30)
The choice model, for each individual observation, can have each attribute discounted by the "plausibility factors": Pl(IG) (= 0.928) and Pl(NIG) (= 0.289). This might be done through a decomposition of a random parameter in a mixed logit model, through interaction terms in MNL and latent class models, or through conditioning of the scale parameters. These plausibility factors would be applied to all observations, based on evidence obtained from supplementary questions. The challenge for ongoing research is to identify a relevant set of questions, posed to the respondent and other agents, that can be used to quantify evidence suitable for deriving the belief and plausibility functions for each respondent.

The Dempster–Shafer theory of belief has links to support theory (Tversky & Koehler, 1994; Fox & Tversky, 1998; Idson, Krantz, Osherson, & Bonini, 2001; Hensher, 2009), a psychological model of degree of belief, which argues that different descriptions of the same event often give rise to systematically different responses, and hence that the judged probability of the union of disjoint events is generally smaller than the sum of the judged probabilities of these events. Support theory assumes that subjective probability is attached to descriptions of events (e.g., "which attribute(s) did you ignore?" or "did you ignore attribute x?"), and not to events per se, and hence different descriptions of the same event may be assigned different probabilities.

There is a key distinction, however, between the Dempster–Shafer theory of belief and support theory, linked to the extensionality principle, which states that events with the same extension are assigned the same probability. The extensionality principle is problematic in that alternative descriptions of the same event can produce systematically different judgments. For example, in the context of two states of the world, called events, such as "two attributes were added up" and "two attributes were treated separately through unpacking retention," we might ask an individual to consider two routes for a given trip. In the first experiment we might offer total times of 30 and 40 min (and associated costs of $4 and $2.50); in the second experiment we might offer a free-flow time of 20 min and a congested time of 10 min versus 25 min of free-flow and 15 min of congested time, keeping costs the same as in the first experiment. The responses are almost always different across a sample (see Layton & Hensher, in press), since some individuals will initially add up the travel time components and undertake the comparison, essentially treating experiments 1 and 2 as equivalent, whereas other individuals will evaluate the unpacked time components and make a judgement (i.e., the choice) that implicitly weights the components differently from the weight obtained for the aggregated time.

The previous paragraph essentially reiterates a point made by Krantz (1991) that the Dempster–Shafer model is more suitable for judgments of evidence strength
than for judgments of probability, the latter being what we focus on in the estimation of the choice model to explain the choice among alternatives, in contrast to how we enter a specific attribute into the choice model. Judgments of evidence strength are precisely the role that the plausibility function plays in specifying the way that a specific attribute is processed in the context of SC experiments. We are not using belief (or indeed support) theory to establish probabilities of outcomes, since that is accommodated through the choice model. The specific feature of the belief paradigm is the idea of superadditivity; namely, that multiple sources of evidence (obtained through more detail, or what is commonly referred to as unpacking in the psychology literature) result in a belief in the disjunction of disjoint events that is greater than or equal to the sum of the beliefs in each of the components.

For example, if we have four attributes (x1, x2, x3, x4), of which the first two have a common metric (e.g., travel time) and the last two have a common metric (e.g., cost), we might have a number of ways in which we can structure questions suitable for establishing how each specific attribute is processed (in the context of how the package of attributes is processed). There are a number of possible ways of evaluating an attribute in arriving at a decision on how it will be processed in the context of a choice task. These might, for example, be based on five items of evidence (or heuristics) in relation to the processing of x1 (the responses could include (i) ignored or not, (ii) added up with other common-metric attributes, and (iii) transferred the parameter to another common-metric attribute):

E_α = E(x1): "I evaluated only x1 in deciding what role x1 plays."35
E_β = E(x1, x2); E(x1, x3); E(x1, x4); …; E(x1, x2, x3, x4): "I evaluated x1 in the context of a subset of the attributes offered."
E_γ = E(x1, x2): "I evaluated x1 in the context of attributes that have a common metric with x1."
E_ε = E(x1 + x2, x3 + x4): "I evaluated x1 by adding up attributes that have a common metric (e.g., times and costs)."
E_η = E(x1, x2, x3, x4): "I evaluated every attribute in deciding what role x1 plays."

For each of these candidate heuristics, the analyst might ask, in the context of whether attribute x1 was ignored or not: Please allocate 100 points between the three possible ways you might (or did) respond to reflect your assessment of
35. The approach commonly adopted in supplementary questions. The precise wording would vary according to the nature of the empirical study.
how you believe you used each of the processing rules in determining the role of attribute x1:

                                                  E_α    E_β    E_γ    E_ε    E_η
I definitely ignored (IG) x1                       —      —      —      —      —
I did not fully ignore, nor fully not ignore
  {IG, NIG}, x1                                    —      —      —      —      —
I definitely did not ignore (NIG) x1               —      —      —      —      —
Total                                             100    100    100    100    100
These heuristics may be randomly assigned to each respondent, or all might be assigned to each respondent (in a randomised order). There are some clear (cognitive) disadvantages of assigning all heuristics to each respondent, yet this may be necessary in order to obtain the required data to calculate a plausibility expression. It might also be of interest to have each respondent rank the heuristics in order of applicability (in the example above, this is a rank from 1 to 5, where 1 = most applicable). If the focus is on whether an attribute x1 was ignored or not, then we might identify the following evidence:

E_α: E_α(IG) = 0.4; E_α(NIG) = 0.2; E_α({IG, NIG}) = 0.4; rank = 4
E_β: E_β(IG) = 0.4; E_β(NIG) = 0.3; E_β({IG, NIG}) = 0.3; rank = 3
E_γ: E_γ(IG) = 0.5; E_γ(NIG) = 0.3; E_γ({IG, NIG}) = 0.2; rank = 2
E_ε: E_ε(IG) = 0.3; E_ε(NIG) = 0.3; E_ε({IG, NIG}) = 0.6; rank = 5
E_η: E_η(IG) = 0.5; E_η(NIG) = 0.2; E_η({IG, NIG}) = 0.3; rank = 1   (3.31)

The responses to Eq. (3.31) can be fed into Eq. (3.28) to obtain the belief and plausibility values in Eqs. (3.29) and (3.30), which can then be interacted in a choice model (for ordered or unordered alternatives), with each attribute and/or the scale parameters to account for the APS of each respondent at an alternative and at a choice set level.36
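Since Dempster's rule is commutative and associative, the evidence sources of Eq. (3.31) can be folded together pairwise in any order (footnote 33). The sketch below (a hedged illustration of our own, reusing combine() and belief() and the frame constants from the earlier sketch) shows one way of doing so; note that the masses printed for E_ε sum to more than one, so that source is left out of the numerical demonstration.

```python
from functools import reduce

def masses(ig, nig, both):
    """Convenience constructor for a mass function on the {IG, NIG} frame."""
    return {frozenset({IG}): ig, frozenset({NIG}): nig,
            frozenset({IG, NIG}): both}

def combine_all(sources):
    """Fold Dempster's rule over any number of independent sources."""
    return reduce(combine, sources)

# Four of the five heuristic responses reported in Eq. (3.31).
sources = [masses(0.4, 0.2, 0.4),   # E_alpha (rank 4)
           masses(0.4, 0.3, 0.3),   # E_beta  (rank 3)
           masses(0.5, 0.3, 0.2),   # E_gamma (rank 2)
           masses(0.5, 0.2, 0.3)]   # E_eta   (rank 1)

m = combine_all(sources)
pl_ig = 1.0 - belief(m, frozenset({NIG}))   # plausibility that x1 was ignored
pl_nig = 1.0 - belief(m, frozenset({IG}))   # plausibility that x1 was attended to
print(round(pl_ig, 3), round(pl_nig, 3))
```

The resulting Pl(IG) and Pl(NIG) are the per-respondent discount factors that would enter the choice model in the manner described after Eqs. (3.29) and (3.30).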
3.6. Conclusions

This paper brings together an accumulating set of interesting processing rules that are hypothesised to be applied, to varying degrees, by respondents in assessing choice scenarios in choice experiments. The rules are ways of cognitively rationalising the information on offer in order to make a choice. The paper synthesises the empirical evidence presented in Layton and Hensher (in press), Hensher and Layton (2008), Hensher and Rose (2009), and Hensher and Greene (2008), and offers new evidence
36. A way of accounting for the rank order requires ongoing research.
to support the view that failure to identify and account for process heterogeneity tends to result in potentially significant differences in the marginal WTP for travel time savings. If this evidence accumulates, and is shown to be applicable to a wider set of marginal WTP attributes and contexts, then we should be concerned about the standard evidence, especially in an economic appraisal and demand forecasting context. The paper also draws on a literature not connected to discrete choice analysis that recognises the errors of response in qualitative questions typically used to establish the presence of a specific processing rule. Although we have no new empirical evidence to quantify the notion of believability and plausibility associated with the Dempster–Shafer belief function, the approach to discounting self-stated explication of how attributes are processed is appealing, and worthy of investigation.

In ongoing research, we are investigating additional heuristics, including alternative functional forms for the heuristics herein, and ways of combining more than two heuristics into a single choice model that enable each heuristic to evolve into a probability continuum across a sampled population and/or across observations obtained from the same individuals (as in a panel, such as a set of choice scenarios in an SC experiment). Another topic of especial interest is the relationship between WTP findings from self-stated intentions and specific functional forms of utility expressions. The evidence to date from each source does not empirically support the other in terms of the magnitude of marginal WTP relative to the simple fully compensatory linear model. The roles of both approaches are yet to be clarified.

Tangential to the current study is the growing literature on hypothetical bias in SC studies, which suggests that the marginal WTP (MWTP) for VTTS is underestimated in SC studies, compared to actual market-based evidence, possibly by as much as 50 per cent (see Brownstone & Small, 2005; Hensher, in press). Isacsson (2007), in the context of trading time with money, found that the MWTP based on a hypothetical experiment was almost 50 per cent lower at the mean than the real-experiment MWTP, supporting the conclusion of Brownstone and Small (2005), in a transport context, that "…the value of time saved on the morning commute is quite high (between $20 and $40 per hour) when based on revealed behavior, and less than half that amount when based on hypothetical behavior" (p. 279). It may be that the failure to accommodate process heterogeneity is a significant contributing influence.
Acknowledgements The ideas presented herein are an accumulation of research activity undertaken with a number of colleagues. I especially acknowledge the contributions made by William Greene, John Rose, David Layton, Sean Puckett, Ric Scarpa, Stephane Hess, and Joffre Swait. Discussions with Stewart Jones on belief functions were especially useful. This research is partially funded by the Australian Research Council Discovery Project Grant DP0770618.
References

Arana, J. E., Leon, C. J., & Hanemann, M. W. (2008). Emotions and decision rules in discrete choice experiments for valuing health care programmes for the elderly. Journal of Health Economics, 27(3), 753–769.
Beharry, N., & Scarpa, R. (2008). Who should select the attributes in choice-experiments for non-market valuation? An application to coastal water quality in Tobago. Leeds, UK: Sustainability Research Institute, The University of Leeds.
Bertrand, M., & Mullainathan, S. (2001). Do people mean what they say? Implications for subjective survey data. American Economic Review Papers and Proceedings, 91(2), 67–72.
Brownstone, D., & Small, K. (2005). Valuing time and reliability: Assessing the evidence from road pricing demonstrations. Transportation Research Part A, 39, 279–293.
Cameron, T. (2008). Differential attention to attributes in utility-theoretic choice models. Department of Economics, University of Oregon.
Cantillo, V., Heydecker, B., & Ortúzar, J. de D. (2006). A discrete choice model incorporating thresholds for perception in attribute values. Transportation Research B, 40(9), 807–825.
Cantillo, V., & Ortúzar, J. de D. (2005). A semi-compensatory discrete choice model with explicit attribute thresholds of perception. Transportation Research B, 39, 641–657.
Carlsson, F., Kataria, M., & Lampi, E. (2008). Ignoring attributes in choice experiments. Proceedings of the EAERE conference, 25–28 June 2008, Gothenburg, Sweden.
Caussade, S., Ortúzar, J. de D., Rizzi, L., & Hensher, D. A. (2005). Assessing the influence of design dimensions on stated choice experiment estimates. Transportation Research B, 39(7), 621–640.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multiple-valued mapping. Annals of Mathematical Statistics, 38, 325–339.
Dubois, D., & Prade, H. (1987). Representation and combination of uncertainty with belief functions and possibility measures. Computational Intelligence, 170(11), 909–924.
Dubois, D., & Prade, H. (1988). Modelling uncertainty and inductive inference: A survey of recent non-additive probability systems. Acta Psychologica, 68, 53–78.
Fox, C., & Tversky, A. (1998). A belief-based account of decision under uncertainty. Management Science, 44(7), 870–895.
Gabaix, X., & Laibson, D. (2000). A boundedly rational decision algorithm. American Economic Review Papers and Proceedings, 90(2), 433–438.
Gilbride, T., & Allenby, G. (2004). A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science, 23(3), 391–406.
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge: Cambridge University Press.
Greene, W. H., & Hensher, D. A. (in press). Ordered choice, heterogeneity, and attribute processing. Journal of Transport Economics and Policy.
Greene, W. H., & Hensher, D. A. (2009). Modeling ordered choices. Cambridge: Cambridge University Press.
Hensher, D., Scarpa, R., & Campbell, D. (in press). Non-attendance to attributes in environmental choice analysis: A latent class specification. Australian Journal of Agriculture and Resource Economics.
Hensher, D. A. (2006). How do respondents handle stated choice experiments? Attribute processing strategies under varying information load. Journal of Applied Econometrics, 21, 861–878.
Hensher, D. A. (2008). Joint estimation of process and outcome in choice experiments and implications for willingness to pay. Journal of Transport Economics and Policy, 42(2), 297–322.
Hensher, D. A. (in press). Hypothetical bias, stated choice studies and willingness to pay. Transportation Research B.
Hensher, D. A. (2009). Event description and support theory as a framework for representing process in choice analysis. Institute of Transport and Logistics Studies, University of Sydney, February. Mimeo, unpublished.
Hensher, D. A., & Greene, W. H. (2008). Non-attendance and dual processing of common-metric attributes in choice analysis: A latent class specification. Empirical Economics, December.
Hensher, D. A., & Layton, D. (2008). Common-metric attribute parameter transfer and cognitive rationalisation: Implications for willingness to pay (July). Institute of Transport and Logistics Studies, University of Sydney.
Hensher, D. A., & Rose, J. (2009). Simplifying choice through attribute preservation or non-attendance: Implications for willingness to pay. Transportation Research E, 45(4), 583–590.
Hensher, D. A., Rose, J., & Greene, W. (2005). The implications on willingness to pay of respondents ignoring specific attributes. Transportation, 32(3), 203–222.
Hess, S., & Hensher, D. (in press). Using conditioning on observed choices to retrieve individual-specific attribute processing strategies. Transportation Research B, Special Issue.
Hess, S., & Rose, J. M. (2007). A latent class approach to modelling heterogeneous information processing strategies in SP studies. Paper presented at the Oslo Workshop on Valuation Methods in Transport Planning, Oslo.
Idson, L. C., Krantz, D. H., Osherson, D., & Bonini, N. (2001). The relation between probability and evidence judgment: An extension of support theory. The Journal of Risk and Uncertainty, 22(3), 227–249.
Isacsson, G. (2007). The trade off between time and money: Is there a difference between real and hypothetical choices? Borlänge, Sweden: Swedish National Road and Transport Research Institute.
Johannesson, M., Blomquist, G., Blumenschein, K., Johansson, P., Liljas, B., & O'Connor, R. (1999). Calibrating hypothetical willingness to pay responses. Journal of Risk and Uncertainty, 8, 21–32.
King, G., & Wand, J. (2007). Comparing incomparable survey responses: New tools for anchoring vignettes. Political Analysis, 15, 46–66.
Krantz, D. H. (1991). From indices to mappings: The representational approach to measurement. In D. Brown & E. Smith (Eds.), Frontiers of mathematical psychology: Essays in honour of Clyde Coombs (pp. 1–52). New York: Springer Verlag.
Krosnick, J. A., & Schuman, H. (1988). Attitude intensity, importance, and certainty and susceptibility to response effects. Journal of Personality and Social Psychology, 54(6), 940–952.
Layton, D., & Hensher, D. A. (in press). Aggregation of common-metric attributes in preference revelation in choice experiments and implications for willingness to pay. Transportation Research D, Special Issue.
Lundhede, T. H., Olsen, S. B., Jacobsen, J. B., & Thorsen, B. J. (2009). Handling respondent uncertainty in choice experiments: Evaluating recoding approaches against explicit modelling of uncertainty. Faculty of Life Sciences, University of Copenhagen.
Martinez, F., Aguila, F., & Hurtubia, R. (2009). The constrained multinomial logit: A semi-compensatory choice model. Transportation Research B, 43, 365–377.
McFadden, D. (1998). Measuring willingness-to-pay for transportation improvements. In T. Gärling, T. Laitila & K. Westin (Eds.), Theoretical foundations of travel choice modelling (pp. 239–264). Oxford: Elsevier.
Park, Y.-H., Ding, M., & Rao, V. (2008). Eliciting preference for complex products: A web-based upgrading method. Journal of Marketing Research, XLV(October), 562–574.
Puckett, S., & Hensher, D. A. (2009). Revealing the extent of process heterogeneity in choice analysis: An empirical assessment. Transportation Research A, 43(1), 117–126.
Puckett, S. M., & Hensher, D. A. (2008). The role of attribute processing strategies in estimating the preferences of road freight stakeholders under variable road user charges. Transportation Research E, 44, 379–395.
Rose, J., Hensher, D. A., & Greene, W. (2005). Recovering costs through price and service differentiation: Accounting for exogenous information on attribute processing strategies in airline choice. Journal of Air Transport Management, 11, 400–407.
Rose, J. M., Bliemer, M. C., Hensher, D. A., & Collins, A. T. (2008a). Designing efficient stated choice experiments in the presence of reference alternatives. Transportation Research B, 42(4), 395–406.
Rose, J. M., & Bliemer, M. C. J. (2007). Stated preference experimental design strategies. In D. A. Hensher & K. Button (Eds.), Transport modelling (2nd ed., Chapter 8), Handbooks in Transport (Vol. 1, pp. 151–180). Oxford: Elsevier Science.
Rose, J. M., Hensher, D. A., Greene, W. H., Washington, S. P., & Black, I. (2008b). Accounting for exogenous information on decision maker processing strategies in models of discrete choice: Attribute exclusion strategies in airline choice (July). Institute of Transport and Logistics Studies, University of Sydney.
Sandor, Z., & Wedel, M. (2002). Profile construction in experimental choice designs for mixed logit models. Marketing Science, 21(4), 455–475.
Scarpa, R., Gilbride, T., Campbell, D., & Hensher, D. A. (2009a). Modelling attribute non-attendance in choice experiments: Does it matter? European Review of Agricultural Economics, 36(2), 151–174.
Scarpa, R., Thiene, M., & Hensher, D. A. (2009b). Monitoring choice task attribute attendance in non-market valuation of multiple park management services: Does it matter? Land Economics, in press.
Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.
Shafer, G., & Srivastava, R. P. (1990). The Bayesian and belief-function formalisms: A general perspective for auditing. Auditing: A Journal of Practice and Theory (Supplement), 110–148.
Simon, H. (1978). Rational decision making in organisations. The American Economic Review, 69(4), 493–513.
Svenson, O. (1998). The perspective from behavioral decision theory on modeling travel choice. In T. Gärling, T. Laitila & K. Westin (Eds.), Theoretical foundations of travel choice modeling (pp. 141–172). Oxford: Elsevier.
Swait, J. (2001). A non-compensatory choice model incorporating attribute cut-offs. Transportation Research B, 35(10), 903–928.
Swait, J. D., & Adamowicz, W. L. (2001). The influence of task complexity on consumer choice: A latent class model of decision strategy switching. Journal of Consumer Research, 28, 135–148.
Swait, J. D., & Ben-Akiva, M. (1987). Empirical test of a constrained discrete choice model: Mode choice in São Paulo, Brazil. Transportation Research B, 21(2), 103–115.
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Train, K., & Revelt, D. (2000). Customer-specific taste parameters and mixed logit. Working Paper. Department of Economics, University of California, Berkeley, CA. Available at: http://elsa.berkeley.edu/wp/train0999.pdf
Tversky, A., & Koehler, D. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 547–567.
Wong, S. K. M., & Wang, Z. W. (1993). Qualitative measures of ambiguity. In D. Heckerman & A. Mamdani (Eds.), Proceedings of the ninth conference on Uncertainty in Artificial Intelligence (pp. 443–450). San Mateo, CA: Morgan Kaufmann.
Yoon, S.-O., & Simonson, I. (2008). Choice set configuration as a determinant of preference attribution and strength. Journal of Consumer Research, 35(August), 324–336.
Chapter 4
The Multiple Discrete-Continuous Extreme Value (MDCEV) Model: Formulation and Applications Chandra R. Bhat and Naveen Eluru
Abstract

Many consumer choice situations are characterized by the simultaneous demand for multiple alternatives that are imperfect substitutes for one another. A simple and parsimonious multiple discrete-continuous extreme value (MDCEV) econometric approach to handle such multiple discreteness was formulated by Bhat (2005) within the broader Kuhn–Tucker (KT) multiple discrete-continuous economic consumer demand model of Wales and Woodland (1983). In this chapter, the focus is on presenting the basic MDCEV model structure, discussing its estimation and use in prediction, formulating extensions of the basic MDCEV structure, and presenting applications of the model. The paper examines several issues associated with the MDCEV model and other extant KT multiple discrete-continuous models. Specifically, the paper discusses the utility function form that enables clarity in the role of each parameter in the utility specification, presents identification considerations associated with both the utility functional form as well as the stochastic nature of the utility specification, extends the MDCEV model to the case of price variation across goods and to general error covariance structures, discusses the relationship between earlier KT-based multiple discrete-continuous models, and illustrates the many technical nuances and identification considerations of the multiple discrete-continuous model structure. Finally, we discuss the many applications of the MDCEV model and its extensions in various fields.
4.1. Introduction

Several consumer demand choices related to travel and other decisions are characterized by the choice of multiple alternatives simultaneously, along with a continuous quantity dimension associated with the consumed alternatives. Examples of such choice situations include vehicle type holdings and usage, and activity type choice and duration of time investment of participation. In the former case, a household may hold a mix of different kinds of vehicle types (e.g., a sedan, a minivan, and a pickup) and use the vehicles in different ways based on the preferences of individual members, considerations of maintenance/running costs, and the need to satisfy different functional needs (such as being able to travel on weekend getaways as a family or to transport goods). In the case of activity type choice and duration, an individual may decide to participate in multiple kinds of recreational and social activities within a given time period (such as a day) to satisfy variety-seeking desires. Of course, there are several other travel-related and other consumer demand situations characterized by the choice of multiple alternatives, including airline fleet mix and usage, carrier choice and transaction level, brand choice and purchase quantity for frequently purchased grocery items (such as cookies, ready-to-eat cereals, soft drinks, yoghurt, etc.), and stock selection and investment amounts.

There are many ways that multiple discrete situations, such as those discussed above, may be modeled. One approach is to use the traditional random utility-based (RUM) single discrete choice models by identifying all combinations or bundles of the "elemental" alternatives, and treating each bundle as a "composite" alternative (the term "single discrete choice" is used to refer to the case where a decision-maker chooses only one alternative from a set of alternatives). A problem with this approach, however, is that the number of composite alternatives explodes with the number of elemental alternatives. Specifically, if J is the number of elemental alternatives, the total number of composite alternatives is (2^J − 1); for example, 10 elemental alternatives generate 1023 composite alternatives.

A second approach to analyze multiple discrete situations is to use the multivariate probit (logit) methods of Manchanda, Ansari, and Gupta (1999), Baltas (2004), Edwards and Allenby (2003), and Bhat and Srinivasan (2005). In these multivariate methods, the multiple discreteness is handled through statistical methods that generate correlation between univariate utility-maximizing models for single discreteness. While interesting, this second approach is more of a statistical "stitching" of univariate models rather than being fundamentally derived from a rigorous underlying utility maximization model for multiple discreteness. The resulting multivariate models also do not collapse to the standard discrete choice models when all individuals choose one and only one alternative at each choice occasion.

A third approach is the one proposed by Hendel (1999) and Dube (2004). These researchers consider the case of "multiple discreteness" in the purchase of multiple varieties within a particular product category as the result of a stream of expected (but unobserved to the analyst) future consumption decisions between successive shopping purchase occasions (see also Walsh, 1995). During each consumption occasion, the standard discrete choice framework of perfectly substitutable alternatives is invoked, so that only one product is consumed.
Due to varying tastes across individual consumption occasions between
the current shopping purchase and the next, consumers are observed to purchase a variety of goods at the current shopping occasion.

In all three approaches discussed above to handle multiple discreteness, there is no recognition that individuals choose multiple alternatives to satisfy different functional or variety-seeking needs (such as wanting to relax at home as well as participate in out-of-home recreation). Thus, the approaches fail to incorporate the diminishing marginal returns (i.e., satiation) in participating in a single type of activity, which may be the fundamental driving force for individuals choosing to participate in multiple activity types.1 Finally, in the approaches above, it is very cumbersome, even if conceptually feasible, to include a continuous choice in the model (e.g., modeling the different activity purposes of participation as well as the duration of participation in each activity purpose).

Wales and Woodland (1983) proposed two alternative ways to handle situations of multiple discreteness based on satiation behavior within a behaviorally consistent utility-maximizing framework. Both approaches assume a direct utility function U(x) that is quasi-concave, increasing, and continuously differentiable with respect to the consumption quantity vector x.2 Consumers maximize the utility function subject to a linear budget constraint, which is binding in that all the available budget is invested in the consumption of the goods; that is, the budget constraint has an equality sign rather than a "≤" sign. This binding nature of the budget constraint is the result of assuming an increasing utility function, and also implies that at least one good will be consumed. The difference between the two alternative approaches proposed by Wales and Woodland (1983) lies in how stochasticity, non-negativity of consumption, and corner solutions (i.e., zero consumption of some goods) are accommodated, as briefly discussed below (see Wales & Woodland, 1983; Phaneuf, Kling, & Herriges, 2000 for additional details).

The first approach, which Wales and Woodland label the Amemiya–Tobin approach, is an extension of the classic microeconomic approach of adding normally distributed stochastic terms to the budget-constrained utility-maximizing share equations. In this approach, the direct utility function U(x) itself is assumed to be deterministic by the analyst, and stochasticity is introduced post-utility maximization. The justification for the addition of such normally distributed stochastic terms to the deterministic utility-maximizing allocations is based on the notion that consumers make errors in the utility-maximizing process, or that there are
1. The approach of Hendel and Dube can be viewed as a "vertical" variety-seeking model that may be appropriate for frequently consumed grocery items such as carbonated soft drinks, cereals, and cookies. However, in many other choice occasions, such as time allocation to different types of discretionary activities, the true decision process may be better characterized as "horizontal" variety-seeking, where the consumer selects an assortment of alternatives due to diminishing marginal returns for each alternative. That is, the alternatives represent inherently imperfect substitutes at the choice occasion. 2. The assumption of a quasi-concave utility function is simply a manifestation of requiring the indifference curves to be convex to the origin (see Deaton & Muellbauer, 1980, p. 30 for a rigorous definition of quasi-concavity). The assumption of an increasing utility function implies that U(x¹) > U(x⁰) if x¹ > x⁰.
measurement errors in the collection of share data, or that there are unknown factors (from the analyst's perspective) influencing actual consumed shares. However, the addition of normally distributed error terms to the share equations in no way restricts the shares to be positive and less than 1. The contribution of Wales and Woodland was to devise a stochastic formulation, based on the earlier work of Tobin (1958) and Amemiya (1974), that (a) respects the unit simplex range constraint for the shares, (b) accommodates the restriction that the shares sum to one, and (c) allows corner solutions in which one or more alternatives are not consumed. They achieve this by assuming that the observed shares for (K − 1) of the K alternatives follow a truncated multivariate normal distribution (note that since the shares across alternatives have to sum to one, there is a singularity generated in the K-variate covariance matrix of the K shares, which can be accommodated by dropping one alternative). However, an important limitation of the Amemiya–Tobin approach of Wales and Woodland is that it does not account for corner solutions in its underlying behavior structure. Rather, the constraint that the shares have to lie within the unit simplex is imposed by ad hoc statistical procedures of mapping the density outside the unit simplex to the boundary points of the unit simplex.

The second approach suggested by Wales and Woodland, which they label the Kuhn–Tucker approach, is based on the Kuhn–Tucker or KT (1951) first-order conditions for constrained random utility maximization (see Hanemann, 1978, who used such an approach even before Wales and Woodland). Unlike the Amemiya–Tobin approach, the KT approach employs a more direct stochastic specification by assuming the utility function U(x) to be random (from the analyst's perspective) over the population, and then derives the consumption vector for the random utility specification subject to the linear budget constraint by using the KT conditions for constrained optimization. Thus, the stochastic nature of the consumption vector in the KT approach is based fundamentally on the stochastic nature of the utility function. Consequently, the KT approach immediately satisfies all the restrictions of utility theory, and the stochastic KT first-order conditions provide the basis for deriving the probabilities for each possible combination of corner solutions (zero consumption) for some goods and interior solutions (strictly positive consumption) for other goods. The singularity imposed by the "adding-up" constraint is accommodated in the KT approach by employing the usual differencing approach with respect to one of the goods, so that there are only (K − 1) interdependent stochastic first-order conditions.

Among the two approaches discussed above, the KT approach constitutes a more theoretically unified and behaviorally consistent framework for dealing with multiple discreteness consumption patterns. However, the KT approach did not receive much attention until relatively recently because the random utility distribution assumptions used by Wales and Woodland led to a complicated likelihood function that entails multidimensional integration. Kim, Allenby, and Rossi (2002) addressed this issue by using the Geweke–Hajivassiliou–Keane (or GHK) simulator to evaluate the multivariate normal integral appearing in the likelihood function in the KT approach. Also, different from Wales and Woodland, Kim et al.
used a generalized variant of the well-known translated constant elasticity of substitution (CES) direct utility function (see Pollak & Wales, 1992, p. 28) rather than the quadratic direct
utility function used by Wales and Woodland. In any case, the Kim et al. approach, like the Wales and Woodland approach, is unnecessarily complicated because of the need to evaluate truncated multivariate normal integrals in the likelihood function. In contrast, Bhat (2005) introduced a simple and parsimonious econometric approach to handle multiple discreteness, also based on the generalized variant of the translated CES utility function but with a multiplicative log-extreme value error term. Bhat's model, labeled the multiple discrete-continuous extreme value (MDCEV) model, is analytically tractable in the probability expressions and is practical even for situations with a large number of discrete consumption alternatives. In fact, the MDCEV model represents the multinomial logit (MNL) form-equivalent for multiple discrete-continuous choice analysis and collapses exactly to the MNL in the case that each (and every) decision-maker chooses only one alternative.

Independent of the above works of Kim et al. and Bhat, there has been a stream of research in the environmental economics field (see Phaneuf et al., 2000; von Haefen, Phaneuf, & Parsons, 2004; von Haefen, 2003, 2004; von Haefen & Phaneuf, 2005; Phaneuf & Smith, 2005) that has also used the KT approach to multiple discreteness. These studies use variants of the linear expenditure system (LES) as proposed by Hanemann (1978) and the translated CES for the utility functions, and use multiplicative log-extreme value errors. However, the error specification in the utility function is different from that in Bhat's MDCEV model, resulting in a different form for the likelihood function.

In this chapter, the focus is on presenting the basic MDCEV model structure, discussing its estimation and use in prediction, formulating extensions of the basic MDCEV structure, and presenting applications of the model. Accordingly, the rest of the chapter is structured as follows. The next section formulates a functional form for the utility specification that enables the isolation of the role of different parameters in the specification. This section also identifies empirical identification considerations in estimating the parameters in the utility specification. Section 4.3 discusses the stochastic form of the utility specification, the resulting general structure for the probability expressions, and associated identification considerations. Section 4.4 derives the MDCEV structure for the utility functional form used in the current paper, and extends this structure to more general error structure specifications. For presentation ease, Sections 4.2–4.4 consider the case of the absence of an outside good. In Section 4.5, we extend the discussions of the earlier sections to the case when an outside good is present. Section 4.6 provides an overview of empirical applications using the model. The final section concludes the paper.
4.2. Functional Form of Utility Specification

We consider the following functional form for utility in this paper, based on a generalized variant of the translated CES utility function:

U(x) = Σ_{k=1}^{K} (γk/αk) ψk [(xk/γk + 1)^{αk} − 1]   (4.1)
where U(x) is a quasi-concave, increasing, and continuously differentiable function with respect to the consumption quantity (K × 1)-vector x (xk ≥ 0 for all k), and ψk, γk, and αk are parameters associated with good k. The function in Eq. (4.1) is a valid utility function if ψk > 0 and αk ≤ 1 for all k. Further, for presentation ease, we assume temporarily that there is no outside good, so that corner solutions (i.e., zero consumptions) are allowed for all the goods k (this assumption is made only to streamline the presentation and should not be construed as limiting in any way; the assumption is relaxed in a straightforward manner as discussed in Section 4.5). The possibility of corner solutions implies that the term γk, which is a translation parameter, should be greater than zero for all k.3 The reader will note that there is an assumption of additive separability of preferences in the utility form of Eq. (4.1), which immediately implies that none of the goods are a priori inferior and all the goods are strictly Hicksian substitutes (see Deaton & Muellbauer, 1980, p. 139). Additionally, additive separability implies that the marginal utility with respect to any good is independent of the levels of all other goods.4

The form of the utility function in Eq. (4.1) highlights the role of the various parameters ψk, γk, and αk, and explicitly indicates the interrelationships between these parameters that relate to theoretical and empirical identification issues. The form also assumes weak complementarity (see Mäler, 1974), which implies that the consumer receives no utility from a nonessential good's attributes if s/he does not consume it (i.e., a good and its quality attributes are weak complements, or Uk = 0 if xk = 0, where Uk is the subutility function for the kth good). The reader will also note that the functional form proposed by Bhat (2008) in Eq. (4.1) generalizes earlier forms used by Hanemann (1978), von Haefen et al. (2004), Herriges, Kling, and Phaneuf (2004), Phaneuf et al. (2000), and Mohn and Hanemann (2005). Specifically, it should be noted that the utility form of Eq. (4.1) collapses to the following LES form when αk → 0 for all k:

U(x) = Σ_{k=1}^{K} γk ψk ln(xk/γk + 1)   (4.2)
3. As illustrated in Kim et al. (2002) and Bhat (2005), the presence of the translation parameters makes the indifference curves strike the consumption axes at an angle (rather than being asymptotic to the consumption axes), thus allowing corner solutions. 4. Some other studies assume the overall utility to be derived from the characteristics embodied in the goods, rather than using the goods as separate entities in the utility function. The reader is referred to Chan (2006) for an example of such a characteristics approach to utility. Also, as we discuss later, recent work by Vasquez and Hanemann (2008) relaxes the assumption of additive separability, but at a computational and interpretation cost.
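To make the functional form concrete, the following is a minimal Python sketch (our own illustration, not from the chapter) that evaluates the sub-utility of Eq. (4.1) for one good and numerically confirms that it approaches the LES form of Eq. (4.2) as αk → 0.

```python
import math

def subutility(x, psi, gamma, alpha):
    """Sub-utility of one good under Eq. (4.1):
    (gamma/alpha) * psi * ((x/gamma + 1)**alpha - 1)."""
    return (gamma / alpha) * psi * ((x / gamma + 1.0) ** alpha - 1.0)

def total_utility(xs, psis, gammas, alphas):
    """U(x) of Eq. (4.1): sum of the additively separable sub-utilities."""
    return sum(subutility(x, p, g, a)
               for x, p, g, a in zip(xs, psis, gammas, alphas))

# As alpha -> 0 the sub-utility approaches the LES form of Eq. (4.2):
# gamma * psi * ln(x/gamma + 1).
x, psi, gamma = 4.0, 1.5, 2.0
les_value = gamma * psi * math.log(x / gamma + 1.0)
near_zero_alpha = subutility(x, psi, gamma, alpha=1e-8)
assert abs(les_value - near_zero_alpha) < 1e-6

# Arbitrary three-good evaluation; a good with zero consumption contributes
# zero utility, consistent with weak complementarity.
print(total_utility([1.0, 0.0, 3.0], [1.0, 0.8, 1.2],
                    [1.0, 1.0, 2.0], [0.5, 0.5, 0.5]))
```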
4.2.1. Role of Parameters in Utility Specification
4.2.1.1. Role of ψk. The role of ψk can be inferred by computing the marginal utility of consumption with respect to good k, which is:

∂U(x)/∂xk = ψk (xk/γk + 1)^{αk − 1}   (4.3)
It is obvious from above that ψk represents the baseline marginal utility, or the marginal utility at the point of zero consumption. Alternatively, the marginal rate of substitution between any two goods k and l at the point of zero consumption of both goods is ψk/ψl. This is the case regardless of the values of γk and αk. For two goods i and j with the same unit prices, a higher baseline marginal utility for good i relative to good j implies that an individual will increase overall utility more by consuming good i rather than j at the point of no consumption of any goods. That is, the consumer will be more likely to consume good i than good j. Thus, a higher baseline ψk implies less likelihood of a corner solution for good k.

4.2.1.2. Role of γk. An important role of the γk terms is to shift the position of the point at which the indifference curves are asymptotic to the axes from (0, 0, 0, …, 0) to (−γ1, −γ2, −γ3, …, −γK), so that the indifference curves strike the positive orthant with a finite slope. This, combined with the consumption point corresponding to the location where the budget line is tangential to the indifference curve, results in the possibility of zero consumption of good k. To see this, consider two goods 1 and 2 with ψ1 = ψ2 = 1, α1 = α2 = 0.5, and γ2 = 1. Figure 4.1 presents the profiles of the indifference curves in this two-dimensional space for various values of γ1 (γ1 > 0).

[Figure 4.1: Indifference curves corresponding to different values of γ1.]

To compare the profiles, the indifference curves are all drawn to go through the point (0, 8). The reader will also note that all the indifference curve profiles strike the y-axis with the same slope. As can be observed from the figure, the positive values of γ1 and γ2 lead to indifference curves that cross the axes of the positive orthant, allowing for corner solutions. The indifference curve profiles are asymptotic to the x-axis at y = −1 (corresponding to the constant value of γ2 = 1), while they are asymptotic to the y-axis at x = −γ1. Figure 4.1 also points to another role of the γk term as a satiation parameter. Specifically, the indifference curves get steeper in the positive orthant as the value of γ1 increases, which implies a stronger preference (or lower satiation) for good 1 as γ1 increases (with steeper indifference curve slopes, the consumer is willing to give up more of good 2 to obtain 1 unit of good 1). This point is particularly clear if we examine the profile of the subutility function for alternative k. Figure 4.2 plots the function for alternative k for αk → 0 and ψk = 1, and for different values of γk. All of the curves have the same slope ψk = 1 at the origin point, because of the functional form used in this paper. However, the marginal utilities vary for the different curves at xk > 0. Specifically, the higher the value of γk, the less is the satiation effect in the consumption of xk.

[Figure 4.2: Effect of γk value on good k's subutility function profile.]

4.2.1.3. Role of αk. The express role of αk is to reduce the marginal utility with increasing consumption of good k; that is, it represents a satiation parameter. When αk = 1 for all k, this represents the case of absence of satiation effects or, equivalently, the case of constant marginal utility. The utility function in Eq. (4.1) in such a situation collapses to Σk ψk xk, which represents the perfect substitutes case as proposed by Deaton and Muellbauer (1980) and applied in Hanemann (1984), Chiang (1991), Chintagunta (1993), and Arora, Allenby, and Ginter (1998), among others. Intuitively, when there is no satiation and the unit good prices are all the same, the consumer will invest all expenditure on the single good with the highest baseline (and constant) marginal utility (i.e., the highest ψk value). This is the case of
single discreteness.5 As αk moves downward from the value of 1, the satiation effect for good k increases. When αk → 0, the utility function collapses to the form in Eq. (4.2), as discussed earlier. αk can also take negative values and, when αk → −∞, this implies immediate and full satiation. Figure 4.3 plots the utility function for alternative k for γk = 1 and ψk = 1, and for different values of αk. Again, all of the curves have the same slope ψk = 1 at the origin point, and accommodate different levels of satiation through different values of αk for any given γk value.

[Figure 4.3: Effect of αk value on good k's subutility function profile.]
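The qualitative content of Figures 4.2 and 4.3 can be checked directly from Eq. (4.3). The short Python sketch below (our own illustration, with arbitrary parameter values) confirms that every profile shares the baseline slope ψk at zero consumption, that a higher γk weakens satiation, and that a lower αk strengthens it.

```python
def marginal_utility(x, psi, gamma, alpha):
    """Marginal utility of good k under Eq. (4.3): psi * (x/gamma + 1)**(alpha - 1)."""
    return psi * (x / gamma + 1.0) ** (alpha - 1.0)

# At x = 0 the marginal utility equals psi (the baseline marginal utility),
# whatever the values of gamma and alpha: all profiles share the slope at the origin.
for gamma, alpha in [(1.0, 0.5), (5.0, 0.5), (1.0, -2.0)]:
    assert abs(marginal_utility(0.0, 1.0, gamma, alpha) - 1.0) < 1e-12

# Higher gamma weakens satiation (Figure 4.2): at x = 10, marginal utility
# is larger for gamma = 20 than for gamma = 1 (alpha held fixed).
assert marginal_utility(10.0, 1.0, 20.0, 0.5) > marginal_utility(10.0, 1.0, 1.0, 0.5)

# Lower alpha strengthens satiation (Figure 4.3): at x = 10, marginal utility
# falls as alpha decreases (gamma held fixed at 1).
assert marginal_utility(10.0, 1.0, 1.0, 0.25) < marginal_utility(10.0, 1.0, 1.0, 0.5)
```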
4.2.2. Empirical Identification Issues Associated with Utility Form
The discussion in the previous section indicates that ψk reflects the baseline marginal utility, which controls whether or not a good is selected for positive consumption (or the extensive margin of choice). The role of γk is to enable corner solutions, though it also governs the level of satiation. The purpose of αk is solely to allow satiation. Thus, for a given extensive margin of choice of good k, γk and αk influence the quantity of good k consumed (or the intensive margin of choice) through their impact on satiation effects. The precise functional mechanisms through which γk and αk impact satiation are, however, different; γk controls satiation by translating consumption quantity, while αk controls satiation by exponentiating consumption quantity. Clearly, both these effects operate in different ways, and different
5. If there is price variation across goods, one needs to take the derivative of the utility function with respect to expenditures (ek) on the goods. In the case that αk = 1 for all k, U = Σk ψk(ek/pk), where pk is the unit price of good k. Then ∂U/∂ek = ψk/pk. In this situation, the consumer will invest all expenditures on the single good with the highest price-normalized (and constant) marginal utility ψk/pk.
combinations of their values lead to different satiation profiles. However, empirically speaking, it is very difficult to disentangle the two effects separately, which leads to serious empirical identification problems and estimation breakdowns when one attempts to estimate both γk and αk parameters for each good. In fact, for a given ψk value, it is possible to closely approximate a subutility function profile based on a combination of γk and αk values with a subutility function based solely on γk or αk values. In actual application, it would behoove the analyst to estimate models based on both the αk-profile and the γk-profile, and choose a specification that provides a better statistical fit.6
4.3. Stochastic Form of Utility Function

The KT approach employs a direct stochastic specification by assuming the utility function U(x) to be random over the population. In all recent applications of the KT approach for multiple discreteness, a multiplicative random element is introduced to the baseline marginal utility of each good as follows:

ψ(zk, εk) = ψ(zk) · exp(εk)   (4.4)
where zk is a set of attributes characterizing alternative k and the decision-maker, and εk captures idiosyncratic (unobserved) characteristics that impact the baseline utility for good k. The exponential form for the introduction of the random term guarantees the positivity of the baseline utility as long as ψ(zk) > 0. To ensure this latter condition, ψ(zk) is further parameterized as exp(β′zk), which then leads to the following form for the baseline random utility associated with good k:

ψ(zk, εk) = exp(β′zk + εk)   (4.5)
The zk vector in the above equation includes a constant term. The overall random utility function of Eq. (4.1) then takes the following form:

U(x) = Σ_{k=1}^{K} (γk/αk) exp(β′zk + εk) [(xk/γk + 1)^{αk} − 1]   (4.6)
From the analyst's perspective, the individual is maximizing random utility subject to the binding linear budget constraint Σ_{k=1}^{K} ek = E, where E is total expenditure or income (or some other appropriately defined total budget quantity), ek = pk xk, and pk is the unit price of good k.
6. Alternatively, the analyst can stick with one functional form a priori, but experiment with various fixed values of αk for the γk-profile and γk for the αk-profile.
4.3.1. Optimal Expenditure Allocations
The analyst can solve for the optimal expenditure allocations by forming the Lagrangian and applying the KT conditions.7 The Lagrangian function for the problem is:

L = Σ_{k=1}^{K} (γk/αk) exp(β′zk + εk) [(ek/(γk pk) + 1)^{αk} − 1] − λ [Σ_{k=1}^{K} ek − E]   (4.7)
where $\lambda$ is the Lagrangian multiplier associated with the expenditure constraint (i.e., it can be viewed as the marginal utility of total expenditure or income). The KT first-order conditions for the optimal expenditure allocations (the $e_k$ values) are given by:

$$\exp(\beta' z_k + \varepsilon_k) \left( \frac{e_k}{\gamma_k p_k} + 1 \right)^{\alpha_k - 1} \frac{1}{p_k} - \lambda = 0, \;\; \text{if } e_k > 0, \; k = 1, 2, \ldots, K$$
$$\exp(\beta' z_k + \varepsilon_k) \left( \frac{e_k}{\gamma_k p_k} + 1 \right)^{\alpha_k - 1} \frac{1}{p_k} - \lambda < 0, \;\; \text{if } e_k = 0, \; k = 1, 2, \ldots, K \qquad (4.8)$$
The optimal demand satisfies the conditions in Eq. (4.8) plus the budget constraint $\sum_{k=1}^{K} e_k = E$. The budget constraint implies that only $K - 1$ of the $e_k$ values need to be estimated, since the quantity consumed of any one good is automatically determined from the quantities consumed of all the other goods. To accommodate this constraint, designate alternative 1 as an alternative to which the individual allocates some nonzero amount of consumption (note that the individual must choose at least one of the $K$ alternatives, given that $E > 0$). For the first good, the KT condition may then be written as:

$$\lambda = \frac{\exp(\beta' z_1 + \varepsilon_1)}{p_1} \left( \frac{e_1}{\gamma_1 p_1} + 1 \right)^{\alpha_1 - 1} \qquad (4.9)$$
Substituting for $\lambda$ from above into Eq. (4.8) for the other alternatives ($k = 2, \ldots, K$), and taking logarithms, we can rewrite the KT conditions as:

$$V_k + \varepsilon_k = V_1 + \varepsilon_1 \;\; \text{if } e_k > 0 \; (k = 2, 3, \ldots, K)$$
$$V_k + \varepsilon_k < V_1 + \varepsilon_1 \;\; \text{if } e_k = 0 \; (k = 2, 3, \ldots, K), \text{ where}$$
$$V_k = \beta' z_k + (\alpha_k - 1) \ln\left( \frac{e_k}{\gamma_k p_k} + 1 \right) - \ln p_k \;\; (k = 1, 2, 3, \ldots, K) \qquad (4.10)$$
7. For reasons that will become clear later, we solve for the optimal expenditure allocations $e_k$ for each good, not the consumption amounts $x_k$ of each good. This is different from earlier studies that focus on the consumption of goods.
82
Chandra R. Bhat and Naveen Eluru
Also, note that, in Eq. (4.10), a constant cannot be identified in the $\beta' z_k$ term for one of the $K$ alternatives (because only the differences of the $V_k$ from $V_1$ matter). Similarly, individual-specific variables are introduced in the $V_k$'s for $K - 1$ alternatives, with the remaining alternative serving as the base.8

8. These identification conditions are similar to those in the standard discrete choice model, though the origin of the conditions is different between standard discrete choice models and multiple discrete-continuous models. In standard discrete choice models, individuals choose the alternative with the highest utility, so that all that matters is relative utility. In multiple discrete-continuous models, the origin of these conditions is the adding-up (or budget) constraint associated with the quantity of consumption of each good, which leads to the KT first-order conditions of Eq. (4.10).
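As an aside, the KT conditions in Eq. (4.10) can be solved directly in simple cases. The sketch below (not from the chapter; all parameter values are hypothetical, and good 1 is designated as consumed) finds the optimal split of a budget between two goods by locating the expenditure at which the two sides of Eq. (4.10) tie, falling back to a corner solution when no interior tie exists:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical two-good example (all parameter values illustrative only).
beta_z = np.array([0.0, 0.4])
alpha, gamma = np.array([0.5, 0.5]), np.array([1.0, 1.0])
p, eps, E = np.array([1.0, 2.0]), np.array([0.1, 0.6]), 10.0

def V(e, k):
    # V_k of Eq. (4.10), evaluated at expenditure e on good k
    return beta_z[k] + (alpha[k] - 1.0) * np.log(e / (gamma[k] * p[k]) + 1.0) - np.log(p[k])

# If both goods are consumed, Eq. (4.10) requires the tie
# V_2(e_2) + eps_2 = V_1(E - e_2) + eps_1; solve for the interior e_2.
g = lambda e2: (V(e2, 1) + eps[1]) - (V(E - e2, 0) + eps[0])
if g(1e-9) > 0:                      # good 2 still attractive at zero spending
    e2 = brentq(g, 1e-9, E - 1e-9)   # interior solution: both goods chosen
else:
    e2 = 0.0                         # corner solution: only good 1 is chosen
print(E - e2, e2)
```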
4.3.2. General Econometric Model Structure and Identification
To complete the model structure, the analyst needs to specify the error structure. In the general case, let the joint probability density function of the $\varepsilon_k$ terms be $f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$. Then, the probability that the individual allocates expenditure to the first $M$ of the $K$ goods is:

$$P(e_1, e_2, e_3, \ldots, e_M, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1 = -\infty}^{+\infty} \int_{\varepsilon_{M+1} = -\infty}^{V_1 - V_{M+1} + \varepsilon_1} \cdots \int_{\varepsilon_{K-1} = -\infty}^{V_1 - V_{K-1} + \varepsilon_1} \int_{\varepsilon_{K} = -\infty}^{V_1 - V_{K} + \varepsilon_1} f(\varepsilon_1, V_1 - V_2 + \varepsilon_1, V_1 - V_3 + \varepsilon_1, \ldots, V_1 - V_M + \varepsilon_1, \varepsilon_{M+1}, \varepsilon_{M+2}, \ldots, \varepsilon_{K-1}, \varepsilon_K) \, d\varepsilon_K \, d\varepsilon_{K-1} \cdots d\varepsilon_{M+2} \, d\varepsilon_{M+1} \, d\varepsilon_1 \qquad (4.11)$$

where $J$ is the Jacobian whose elements are given by (see Bhat, 2005):
$$J_{ih} = \frac{\partial [V_1 - V_{i+1} + \varepsilon_1]}{\partial e_{h+1}} = \frac{\partial [V_1 - V_{i+1}]}{\partial e_{h+1}}; \quad i, h = 1, 2, \ldots, M - 1 \qquad (4.12)$$
The probability expression in Eq. (4.11) is a $(K - M + 1)$-dimensional integral. The expression for the probability of all goods being consumed is one-dimensional, while the expression for the probability of only the first good being consumed is $K$-dimensional. The dimensionality of the integral can be reduced by one by noticing that the KT conditions can also be written in a differenced form. To do so, define $\tilde{\varepsilon}_{k1} = \varepsilon_k - \varepsilon_1$, and let the implied multivariate distribution of the error differences be $g(\tilde{\varepsilon}_{21}, \tilde{\varepsilon}_{31}, \ldots, \tilde{\varepsilon}_{K1})$. Then, Eq. (4.11) may be written in the equivalent $(K - M)$-dimensional integral
form shown below:

$$P(e_1, e_2, e_3, \ldots, e_M, 0, 0, \ldots, 0) = |J| \int_{\tilde{\varepsilon}_{M+1,1} = -\infty}^{V_1 - V_{M+1}} \int_{\tilde{\varepsilon}_{M+2,1} = -\infty}^{V_1 - V_{M+2}} \cdots \int_{\tilde{\varepsilon}_{K-1,1} = -\infty}^{V_1 - V_{K-1}} \int_{\tilde{\varepsilon}_{K,1} = -\infty}^{V_1 - V_{K}} g(V_1 - V_2, V_1 - V_3, \ldots, V_1 - V_M, \tilde{\varepsilon}_{M+1,1}, \tilde{\varepsilon}_{M+2,1}, \ldots, \tilde{\varepsilon}_{K,1}) \, d\tilde{\varepsilon}_{K,1} \, d\tilde{\varepsilon}_{K-1,1} \cdots d\tilde{\varepsilon}_{M+1,1} \qquad (4.13)$$

The equation above indicates that the probability expression for the observed optimal expenditure pattern of goods is completely characterized by the $K - 1$ error terms in difference form. Thus, all that is estimable is the $(K-1) \times (K-1)$ covariance matrix of the error differences. In other words, it is not possible to estimate a full covariance matrix for the original error terms $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$, because there are infinitely many possible densities for $f(\cdot)$ that map into the same $g(\cdot)$ density for the error differences (see Train, 2003, p. 27, for a similar situation in the context of standard discrete choice models). There are many possible ways to normalize $f(\cdot)$ to account for this situation. For example, one can assume an identity covariance matrix for $f(\cdot)$, which automatically accommodates the normalization that is needed. Alternatively, one can estimate $g(\cdot)$ without reference to $f(\cdot)$. In the general case when the unit prices $p_k$ vary across goods, it is possible to estimate $K(K-1)/2$ parameters of the full covariance matrix of the error differences, as just discussed (though the analyst might want to impose constraints on this full covariance matrix for ease of interpretation and stability in estimation). However, when the unit prices are not different among the goods, an additional scaling restriction needs to be imposed. To see this, consider the case of independent and identically distributed error terms for the $\varepsilon_k$ terms, which leads to a $(K-1) \times (K-1)$ covariance matrix for the $\tilde{\varepsilon}_{k1}$ ($k = 2, 3, \ldots, K$) with diagonal elements equal to twice the scale parameter of the $\varepsilon_k$ terms and off-diagonal elements equal to the scale parameter of the $\varepsilon_k$ terms. Let the unit prices of all goods be the same (see Bhat, 2005; Bhat & Sen, 2006; Bhat, Srinivasan, & Sen, 2006; Bhat, Sen, & Eluru, 2009 for examples where the weights or prices on the goods in the budget constraint are equal). Consider the utility function in Eq. (4.6) and another utility function as given below:

$$\tilde{U}(x) = \sum_{k} \frac{\gamma_k}{\alpha_k^*} \exp\{\sigma(\beta' z_k + \varepsilon_k)\} \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k^*} - 1 \right\} \qquad (4.14)$$
The scale of the error terms in the utility function in the above expression is $\sigma$ times the scale of the error terms in Eq. (4.6). Let $\alpha_k^* = \sigma(\alpha_k - 1) + 1$, where $\alpha_k$ is the satiation parameter in the original Eq. (4.6).9 The KT conditions for optimal expenditure for this modified utility function can be shown to be:

$$V_k^* + \sigma\varepsilon_k = V_1^* + \sigma\varepsilon_1 \;\; \text{if } e_k > 0 \; (k = 2, 3, \ldots, K)$$
$$V_k^* + \sigma\varepsilon_k < V_1^* + \sigma\varepsilon_1 \;\; \text{if } e_k = 0 \; (k = 2, 3, \ldots, K) \qquad (4.15)$$

9. Note that $\alpha_k^*$ is less than or equal to 1 by definition, because $\alpha_k$ is less than or equal to 1 and the scale $\sigma$ should be nonnegative.
where

$$V_k^* = \sigma \beta' z_k + (\alpha_k^* - 1) \ln\left( \frac{e_k}{\gamma_k p_k} + 1 \right) - \ln p_k = \sigma \beta' z_k + \sigma(\alpha_k - 1) \ln\left( \frac{e_k}{\gamma_k p_k} + 1 \right) - \ln p_k \;\; (k = 1, 2, 3, \ldots, K)$$
If the unit prices are not all the same (i.e., the unit prices of at least two of the $K$ goods are different), the KT conditions above are different from the KT conditions in Eq. (4.10), so the scale $\sigma$ is estimable. When all unit prices are equal, however, $V_k^* + \sigma\varepsilon_k = \sigma(V_k + \varepsilon_k) + (\sigma - 1)\ln p_k$ with a common $\ln p_k$ term across goods, so the conditions in Eq. (4.15) are observationally identical to those in Eq. (4.10); this is why the scale must then be normalized.
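A quick numerical check of this argument (not from the chapter; values are arbitrary) confirms that, with equal unit prices, the scaled specification reproduces exactly the same pairwise KT comparisons as the original one:

```python
import numpy as np

# With equal unit prices, scaling the errors by sigma and setting
# alpha*_k = sigma*(alpha_k - 1) + 1 leaves the KT conditions unchanged.
rng = np.random.default_rng(0)
K, sigma = 4, 0.6
beta_z = rng.normal(size=K)
alpha = rng.uniform(0.2, 0.8, size=K)
gamma = rng.uniform(0.5, 2.0, size=K)
p = np.ones(K)                       # equal unit prices
eps = rng.gumbel(size=K)
e = rng.uniform(0.5, 3.0, size=K)    # arbitrary expenditure vector

V = beta_z + (alpha - 1) * np.log(e / (gamma * p) + 1) - np.log(p)
a_star = sigma * (alpha - 1) + 1
Vs = sigma * beta_z + (a_star - 1) * np.log(e / (gamma * p) + 1) - np.log(p)

# With p_k identical, Vs + sigma*eps = sigma*(V + eps) + const, so the
# comparisons in Eq. (4.15) reproduce those in Eq. (4.10) exactly.
d1 = (V + eps) - (V + eps)[0]
d2 = (Vs + sigma * eps) - (Vs + sigma * eps)[0]
print(np.allclose(d2, sigma * d1))   # True
```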
4.4. Specific Model Structures

4.4.1. The MDCEV Model Structure
Following Bhat (2005, 2008), consider an extreme value distribution for $\varepsilon_k$ and assume that $\varepsilon_k$ is independent of $z_k$ ($k = 1, 2, \ldots, K$). The $\varepsilon_k$'s are also assumed to be independently distributed across alternatives with a scale parameter of $\sigma$ ($\sigma$ can be normalized to one if there is no variation in unit prices across goods). Let $V_k$ be defined as follows:

$$V_k = \beta' z_k + (\alpha_k - 1) \ln\left( \frac{e_k}{p_k} + 1 \right) - \ln p_k \;\; (k = 1, 2, 3, \ldots, K),$$

when the $\alpha$-profile is used; and

$$V_k = \beta' z_k - \ln\left( \frac{e_k}{\gamma_k p_k} + 1 \right) - \ln p_k \;\; (k = 1, 2, 3, \ldots, K), \qquad (4.16)$$
when the $\gamma$-profile is used. As discussed earlier, it is generally not possible to estimate the $V_k$ form in Eq. (4.10), because the $\alpha_k$ terms and $\gamma_k$ terms serve a similar satiation role. From Eq. (4.11), the probability that the individual allocates expenditure to the first $M$ of the $K$ goods ($M \geq 1$) is:

$$P(e_1, e_2, e_3, \ldots, e_M, 0, 0, \ldots, 0) = |J| \int_{\varepsilon_1 = -\infty}^{+\infty} \left\{ \prod_{i=2}^{M} \frac{1}{\sigma} \lambda\left( \frac{V_1 - V_i + \varepsilon_1}{\sigma} \right) \right\} \left\{ \prod_{s=M+1}^{K} \Lambda\left( \frac{V_1 - V_s + \varepsilon_1}{\sigma} \right) \right\} \frac{1}{\sigma} \lambda\left( \frac{\varepsilon_1}{\sigma} \right) d\varepsilon_1 \qquad (4.17)$$
where $\lambda$ is the standard extreme value density function and $\Lambda$ the standard extreme value cumulative distribution function. The expression in Eq. (4.17) simplifies to a remarkably simple and elegant closed-form expression. Bhat derived the form of the Jacobian for the case of equal unit prices across goods, which can be extended in a simple fashion to accommodate the more general case of different unit prices. The resulting form for the determinant of the Jacobian has a compact structure given by:10

$$|J| = \left( \prod_{i=1}^{M} c_i \right) \left( \sum_{i=1}^{M} \frac{1}{c_i} \right), \quad \text{where } c_i = \frac{1 - \alpha_i}{e_i + \gamma_i p_i} \qquad (4.18)$$
The integration in Eq. (4.17) also collapses to a closed-form expression, providing the following overall expression:

$$P(e_1, e_2, e_3, \ldots, e_M, 0, 0, \ldots, 0) = \frac{1}{\sigma^{M-1}} \left[ \prod_{i=1}^{M} c_i \right] \left[ \sum_{i=1}^{M} \frac{1}{c_i} \right] \left[ \frac{\prod_{i=1}^{M} e^{V_i/\sigma}}{\left( \sum_{k=1}^{K} e^{V_k/\sigma} \right)^{M}} \right] (M - 1)! \qquad (4.19)$$
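The closed form of Eq. (4.19) is straightforward to evaluate. The following sketch (not from the chapter; the function name and all parameter values are our own, purely for illustration) computes the log of Eq. (4.19) for a given expenditure vector, using the $c_i$ terms of Eq. (4.18):

```python
import numpy as np
from math import factorial

def mdcev_logprob(e, beta_z, alpha, gamma, p, sigma=1.0):
    # Log of Eq. (4.19): e is a length-K expenditure vector, zeros for
    # non-chosen goods; other inputs are length-K arrays of parameters.
    chosen = e > 0
    M = int(chosen.sum())
    V = beta_z + (alpha - 1.0) * np.log(e / (gamma * p) + 1.0) - np.log(p)
    c = (1.0 - alpha) / (e + gamma * p)            # c_i of Eq. (4.18)
    log_jac = np.log(c[chosen]).sum() + np.log((1.0 / c[chosen]).sum())
    log_core = (V[chosen] / sigma).sum() - M * np.log(np.exp(V / sigma).sum())
    return -(M - 1) * np.log(sigma) + log_jac + log_core + np.log(factorial(M - 1))

# Hypothetical 3-good example (parameter values illustrative only)
e = np.array([4.0, 6.0, 0.0])
prob = np.exp(mdcev_logprob(e,
                            beta_z=np.array([0.0, 0.3, -0.1]),
                            alpha=np.array([0.4, 0.4, 0.4]),
                            gamma=np.array([1.0, 1.0, 1.0]),
                            p=np.array([1.0, 1.5, 2.0])))
print(prob)
```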
In the case when $M = 1$ (i.e., only one alternative is chosen), there are no satiation effects ($\alpha_k = 1$ for all $k$) and the Jacobian term drops out (i.e., the continuous component drops out, because all expenditure is allocated to good 1). Then, the model in Eq. (4.19) collapses to the standard MNL model. Thus, the MDCEV model is a multiple discrete-continuous extension of the standard MNL model.11 The expression for the probability of the consumption pattern of the goods (rather than the expenditure pattern) can be derived to be:
$$P(x_1, x_2, x_3, \ldots, x_M, 0, 0, \ldots, 0) = \frac{1}{p_1} \frac{1}{\sigma^{M-1}} \left[ \prod_{i=1}^{M} f_i \right] \left[ \sum_{i=1}^{M} \frac{p_i}{f_i} \right] \left[ \frac{\prod_{i=1}^{M} e^{V_i/\sigma}}{\left( \sum_{k=1}^{K} e^{V_k/\sigma} \right)^{M}} \right] (M - 1)! \qquad (4.20)$$
where $V_k$ is as defined earlier (see Eq. (4.16)) and $f_i = (1 - \alpha_i)/(x_i + \gamma_i)$. The expression in Eq. (4.20) is, however, not independent of the good that is used as the first one (see the $1/p_1$ term in front). In particular, different probabilities of the same consumption pattern arise depending on the good that is labeled as the first good (note that any good that is consumed may be designated as the first good). In terms of the likelihood function, the $1/p_1$ term can be ignored, since it is simply a constant in each individual's likelihood function. Thus, the same parameter estimates will result regardless of the good designated as the first good for each individual, but it is still awkward to have different probability values for the same consumption pattern. This is particularly the case because different log-likelihood values at convergence will be obtained for different designations of the first good. Thus, the preferred approach is to use the probability expression for expenditure allocations, which provides the same probability for a given expenditure pattern regardless of the good labeled as the first good. However, in the case that the first good is an outside numeraire good that is always consumed (see Section 4.5), $p_1 = 1$ and one can use either the consumption pattern probability expression or the expenditure allocation probability expression.

10. It is important to note that this compact Jacobian form is independent of the assumptions regarding the density and correlation structure of the error terms.

11. Note that when $\alpha_k = 1$ for all $k$, $V_k = \beta' z_k - \ln p_k$. Even if $M = 1$, when Eq. (4.19) collapses to the MNL form, the scale $\sigma$ is estimable as long as the utility takes the functional form $V_k = \beta' z_k - \ln p_k$ and there is price variation across goods. This is because the scale is the inverse of the coefficient on the $\ln p_k$ term (see Hanemann, 1984).
4.4.2. The Multiple Discrete-Continuous Generalized Extreme Value (MDCGEV) Model Structure
Thus far, we have assumed that the $\varepsilon_k$ terms are independently and identically extreme value distributed across alternatives $k$. The analyst can extend the model to allow correlation across alternatives using a generalized extreme value (GEV) error structure. The remarkable advantage of the GEV structure is that it continues to result in closed-form probability expressions for any and all expenditure patterns. However, the derivation is tedious, and the expressions get unwieldy. Pinjari and Bhat (in press) formulate a special two-level nested case of the MDCGEV model with a nested extreme value distributed structure that has the following joint cumulative distribution:

$$F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K) = \exp\left[ -\sum_{s=1}^{S_K} \left\{ \sum_{i \in s\text{th nest}} \exp\left( -\frac{\varepsilon_i}{\theta_s} \right) \right\}^{\theta_s} \right] \qquad (4.21)$$
In the above expression, $s \,(= 1, 2, \ldots, S_K)$ is the index representing a nest of alternatives, $S_K$ the total number of nests the $K$ alternatives belong to, and $\theta_s \,(0 < \theta_s \leq 1; \, s = 1, 2, \ldots, S_K)$ the (dis)similarity parameter introduced to induce correlations among the stochastic components of the utilities of alternatives belonging to the $s$th nest.12 Without loss of generality, let $1, 2, \ldots, S_M$ be the nests the $M$ chosen alternatives belong to, and let $q_1, q_2, \ldots, q_{S_M}$ be the numbers of chosen alternatives in each of the $S_M$ nests (thus, $q_1 + q_2 + \cdots + q_{S_M} = M$). Using the nested extreme value error distribution assumption specified in Eq. (4.21) (and the notation identified above), Pinjari and Bhat (in press) derived the following expression for the multiple discrete-continuous nested extreme value (MDCNEV) model:

$$P(t_1, t_2, \ldots, t_M, 0, \ldots, 0) = |J| \left\{ \prod_{s=1}^{S_M} \frac{\prod_{i \in s\text{th nest}} e^{V_i/\theta_s}}{\left( \sum_{i \in s\text{th nest}} e^{V_i/\theta_s} \right)^{q_s}} \right\} \sum_{r_1 = 1}^{q_1} \sum_{r_2 = 1}^{q_2} \cdots \sum_{r_{S_M} = 1}^{q_{S_M}} \left[ \prod_{s=1}^{S_M} \left\{ \frac{\left( \sum_{i \in s\text{th nest}} e^{V_i/\theta_s} \right)^{\theta_s}}{\sum_{s=1}^{S_K} \left( \sum_{i \in s\text{th nest}} e^{V_i/\theta_s} \right)^{\theta_s}} \right\}^{q_s - r_s + 1} \mathrm{sum}(X_{r_s}) \right] \left( \sum_{s=1}^{S_M} (q_s - r_s + 1) - 1 \right)! \qquad (4.22)$$

12. This error structure assumes that the nests are mutually exclusive and exhaustive (i.e., each alternative can belong to only one nest and all alternatives are allocated to one of the $S_K$ nests).
In the above expression, $\mathrm{sum}(X_{r_s})$ is the sum of the elements of a row matrix $X_{r_s}$ (see Appendix 4.A for a description of the form of the matrix $X_{r_s}$). As indicated in Pinjari and Bhat (in press), the general expression above represents the MDCNEV consumption probability for any consumption pattern with a two-level nested extreme value error structure. It may be verified that the MDCNEV probability expression in Eq. (4.22) simplifies to Bhat's (2008) MDCEV probability expression when the utility functions are independent of one another (i.e., when $\theta_s = 1$ and $q_s = 1$ for all $s$, and $S_M = M$).
4.4.3. The Mixed MDCEV Model
The MDCGEV structure is able to accommodate flexible correlation patterns. However, it is unable to accommodate random taste variation, and it imposes the restriction of equal scale of the error terms. Incorporating a more general error structure is straightforward through the use of a mixing distribution, which leads to the mixed MDCEV (or MMDCEV) model. Specifically, the error term $\varepsilon_k$ may be partitioned into two components, $\zeta_k$ and $\eta_k$. The first component, $\zeta_k$, can be assumed to be independently and identically Gumbel distributed across alternatives with a scale parameter of $\sigma$. The second component, $\eta_k$, can be allowed to be correlated across alternatives and to have a heteroscedastic scale. Let $\eta = (\eta_1, \eta_2, \ldots, \eta_K)'$, and assume that $\eta$ is distributed multivariate normal, $\eta \sim N(0, \Omega)$.13 For given values of the vector $\eta$, one can follow the discussion of the earlier section and obtain the usual MDCEV probability that the first $M$ of the $K$ goods are consumed. The unconditional probability can then be computed as:

$$P(e_1, e_2, e_3, \ldots, e_M, 0, 0, \ldots, 0) = \int_{\eta} \frac{1}{\sigma^{M-1}} \left[ \prod_{i=1}^{M} c_i \right] \left[ \sum_{i=1}^{M} \frac{1}{c_i} \right] \left[ \frac{\prod_{i=1}^{M} e^{(V_i + \eta_i)/\sigma}}{\left( \sum_{k=1}^{K} e^{(V_k + \eta_k)/\sigma} \right)^{M}} \right] (M - 1)! \, dF(\eta) \qquad (4.23)$$

where $F$ is the multivariate cumulative normal distribution (see Bhat, 2005; Bhat & Sen, 2006; Bhat et al., 2006). The model in Eq. (4.23) can be extended in a conceptually straightforward manner to also include random coefficients on the independent variables $z_k$, and random effects (or even random coefficients) in the $\alpha_k$ satiation parameters (if the $\alpha$-profile is used) or the $\gamma_k$ parameters (if the $\gamma$-profile is used).

13. Other distributions may also be used for $\eta$. Note that the distribution of $\eta$ can arise from an error-components structure, a random coefficients structure, or a combination of the two, similar to the case of the usual mixed logit model (see Bhat, 2007).
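The integral in Eq. (4.23) has no closed form, but it can be approximated by simulation, averaging the closed-form kernel over draws of $\eta$. A minimal sketch follows (assuming the mdcev_logprob function from the earlier example; the draw count and seed are arbitrary):

```python
import numpy as np

def mmdcev_prob(e, beta_z, alpha, gamma, p, Omega, sigma=1.0, R=2000, seed=0):
    # Monte Carlo approximation of Eq. (4.23): average the closed-form
    # MDCEV kernel of Eq. (4.19) over draws eta ~ N(0, Omega). The shifted
    # utilities V_k + eta_k enter through the beta'z_k term.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(len(e)), Omega, size=R)
    probs = [np.exp(mdcev_logprob(e, beta_z + eta, alpha, gamma, p, sigma))
             for eta in draws]
    return np.mean(probs)
```

In estimation one would typically use quasi-random (e.g., Halton) draws rather than pseudo-random draws, but the averaging logic is the same.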
4.4.3.1. Heteroscedastic structure within the MMDCEV framework. Consider the case where there is price variation across the alternatives, and the overall errors $\varepsilon_k$ are heteroscedastic, but not correlated. Assuming a 4-alternative case for ease of presentation, the heteroscedastic structure may be specified in the form of the following covariance matrix for $\varepsilon = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4)$:

$$\mathrm{Cov}(\varepsilon) = \frac{\pi^2 \sigma^2}{6} \begin{bmatrix} 1 & 0 & 0 & 0 \\ & 1 & 0 & 0 \\ & & 1 & 0 \\ & & & 1 \end{bmatrix} + \begin{bmatrix} \omega_1^2 & 0 & 0 & 0 \\ & \omega_2^2 & 0 & 0 \\ & & \omega_3^2 & 0 \\ & & & \omega_4^2 \end{bmatrix} \qquad (4.24)$$
where the first component on the right side corresponds to the IID covariance matrix of $\zeta = (\zeta_1, \zeta_2, \zeta_3, \zeta_4)$ and the second component is the heteroscedastic covariance matrix of $\eta$. The covariance of the error differences with respect to the first alternative is:

$$\mathrm{Cov}(\tilde{\varepsilon}_1 = (\tilde{\varepsilon}_{21}, \tilde{\varepsilon}_{31}, \tilde{\varepsilon}_{41})) = \frac{\pi^2 \sigma^2}{6} \begin{bmatrix} 2 & 1 & 1 \\ & 2 & 1 \\ & & 2 \end{bmatrix} + \begin{bmatrix} \omega_1^2 + \omega_2^2 & \omega_1^2 & \omega_1^2 \\ & \omega_1^2 + \omega_3^2 & \omega_1^2 \\ & & \omega_1^2 + \omega_4^2 \end{bmatrix} \qquad (4.25)$$
An inspection of the matrix above shows that it yields only four independent equations (the rank condition), implying that at most four parameters are estimable.14 There are two ways to proceed with a normalization, as discussed below.
14. Strictly speaking, one can estimate all five parameters ($\sigma$, $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$) because of the difference between the extreme value distributions used for the $\zeta_k$ and the normal distributions used for the $\eta_k$ (see Walker, 2002). However, the model will be near singular, and it is important to place the order/rank constraint.
The first approach is to normalize $\sigma$ and estimate the heteroscedastic covariance matrix of $\eta$ (i.e., $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$). Assume that $\sigma$ is normalized to $\tilde{\sigma}$, and let the corresponding values of $\omega_k$ be $\tilde{\omega}_k$ ($k = 1, 2, 3, 4$). Then, the following equalities should hold, based on Eq. (4.25), for any normalization of $\sigma$ to $\tilde{\sigma}$ (with $q = \pi^2/6$ below):

$$\omega_1^2 + q\sigma^2 = \tilde{\omega}_1^2 + q\tilde{\sigma}^2$$
$$\omega_1^2 + \omega_k^2 + 2q\sigma^2 = \tilde{\omega}_1^2 + \tilde{\omega}_k^2 + 2q\tilde{\sigma}^2 \;\; (k = 2, 3, 4) \qquad (4.26)$$
The above equalities can be rewritten as:

$$\tilde{\omega}_k^2 = \omega_k^2 + q\sigma^2 - q\tilde{\sigma}^2 \;\; (k = 1, 2, 3, 4) \qquad (4.27)$$
The normalized variance terms $\tilde{\omega}_k^2$ must be greater than or equal to zero, which implies that the following conditions should hold:

$$\omega_k^2 + q\sigma^2 \geq q\tilde{\sigma}^2 \;\; (k = 1, 2, 3, 4) \qquad (4.28)$$
Intuitively, the above condition implies that the normalization on $\tilde{\sigma}$ must be set low enough so that the overall "true" variance of each error term ($= \omega_k^2 + q\sigma^2$) is larger than $q\tilde{\sigma}^2$. For example, setting $\tilde{\sigma}$ to 1 would be inappropriate if the "true" variance of one or more alternatives is less than $\pi^2/6$. Since the "true" variance is unknown, the best the analyst can do is to normalize $\sigma$ to progressively smaller values and statistically examine the results. The second approach is to normalize one of the $\omega_k$ terms instead of the $\sigma$ term. In this case, from Eq. (4.25), we can write:

$$q\tilde{\sigma}^2 = \omega_1^2 + q\sigma^2 - \tilde{\omega}_1^2 = \frac{1}{2}\left[ \omega_1^2 + \omega_k^2 + 2q\sigma^2 - \tilde{\omega}_1^2 - \tilde{\omega}_k^2 \right], \;\; k = 2, 3, 4 \qquad (4.29)$$
After some manipulation, the above equation may be rewritten as:

$$\tilde{\omega}_k^2 = \omega_k^2 + \tilde{\omega}_1^2 - \omega_1^2, \;\; k = 2, 3, 4 \qquad (4.30)$$
Next, imposing the condition that the normalized terms $\tilde{\omega}_k^2$ must be greater than or equal to zero implies the following:

$$\omega_1^2 - \tilde{\omega}_1^2 \leq \omega_k^2 \;\; (k = 2, 3, 4) \qquad (4.31)$$
The above condition is automatically satisfied as long as the first alternative is the minimum-variance alternative. An associated convenient normalization is $\tilde{\omega}_1^2 = 0$, since the resulting model nests the MDCEV model. The minimum-variance alternative can be determined by estimating an unidentified model with all $K$ $\omega_k$ terms, and identifying the alternative with the minimum variance (see Walker, Ben-Akiva, & Bolduc, 2004, for an equivalent procedure for a heteroscedastic specification within the mixed multinomial logit model). The above discussion assumes there is price variation across goods. In the case of no price variation, the scale $\sigma$ is not identifiable. In this case, the easiest procedure is to normalize $\sigma$ to 1 and the $\omega_k^2$ value for the minimum-variance alternative $k$ to zero.

4.4.3.2. The general error covariance structure within the MMDCEV framework. Appropriate identification normalizations will have to be placed on $\sigma$ and the covariance matrix of $\eta$ when the analyst is estimating an error-components structure to allow correlation in unobserved factors influencing the baseline utility of alternatives, since only a $(K-1) \times (K-1)$ covariance matrix of error differences is identified. This can be accomplished by imposing a structure based on a priori beliefs or intuitive considerations. However, the analyst must ensure that the elements of the assumed restricted covariance structure can be recovered from the $(K-1) \times (K-1)$ covariance matrix of error differences that is actually estimable. In the most general error covariance structure, and when there is price variation, one way to achieve identification is the following: (1) normalize the scale parameter $\sigma$ to a small value such that the variance of the minimum-variance alternative exceeds $\pi^2\sigma^2/6$ (since this variance is not known, the analyst will have to experiment with alternative fixed $\sigma$ values), (2) normalize $\omega_k$ for the minimum-variance alternative $k$ to zero, and (3) normalize all correlations of this minimum-variance alternative with the other alternatives to zero. Together, these normalizations leave only $K(K-1)/2$ parameters to be estimated, and are adequate for identification. In the case of no price variation, an additional restriction will have to be imposed. One approach would be to set $\sum_{k=2}^{K} \omega_k^2 = 1$ to set the scale in the covariance matrix of $\eta$.
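Before moving to extensions, the rescaling argument behind the first normalization approach (Eqs. (4.26)-(4.28)) can be verified numerically. The sketch below (not from the chapter; the "true" values are invented) shows that adjusting the $\omega_k$'s via Eq. (4.27) leaves the covariance of error differences in Eq. (4.25) unchanged:

```python
import numpy as np

q = np.pi ** 2 / 6.0
sigma_true, omega_true = 0.8, np.array([0.5, 0.9, 1.1, 0.7])

sigma_norm = 0.5                     # analyst's normalization of sigma
omega_norm_sq = omega_true ** 2 + q * sigma_true ** 2 - q * sigma_norm ** 2  # Eq. (4.27)
print((omega_norm_sq >= 0).all())    # Eq. (4.28): normalization is low enough

def diff_cov(sigma, omega_sq):
    # Covariance of (eps_21, eps_31, eps_41) as in Eq. (4.25)
    base = q * sigma ** 2 * (np.ones((3, 3)) + np.eye(3))
    het = omega_sq[0] + np.diag(omega_sq[1:])
    return base + het

print(np.allclose(diff_cov(sigma_true, omega_true ** 2),
                  diff_cov(sigma_norm, omega_norm_sq)))   # True
```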
4.4.4. The Joint MDCEV-Single Discrete Choice Model
The MDCEV model and its extensions discussed thus far are suited to the case where the alternatives are imperfect substitutes, as recognized by the use of a nonlinear utility function that accommodates diminishing marginal utility as the consumption of any alternative increases. However, there are many instances where the real choice situation is characterized by a combination of imperfect and perfect substitutes (perfect substitutes correspond to the case where consumers prefer to select only one discrete alternative at any choice occasion; see Hanemann, 1984). The MDCEV model then needs to be modified to handle a multiple discrete-continuous choice among alternatives combined with a single choice of one subalternative within one or more of the alternatives. We do not discuss this case here due to space constraints; the reader is referred to Bhat et al. (2009) and Bhat et al. (2006).
4.4.5. The Nonadditive MDCEV Model Structure
Vasquez Lavin and Hanemann (2008) have recently proposed an extension of Bhat's additively separable linear Box-Cox utility functional form (Eq. (4.1)) to a nonadditively separable quadratic Box-Cox functional form. Using more flexible nonadditive utility structures allows the analyst to handle complementarity as well as substitution among goods. To write this general nonadditive form, define $m_k$ as:

$$m_k = \frac{\gamma_k}{\alpha_k} \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\}$$

Then, a nonadditively separable functional form may be written as:

$$U(x) = \sum_{k=1}^{K} m_k \psi_k + \frac{1}{2} \sum_{k=1}^{K} \sum_{m=1}^{K} \theta_{km} m_k m_m$$

This form is very general, and collapses to Bhat's additively separable form when $\theta_{km} = 0$ for all $k$ and $m$. It collapses to the translog functional form when $\alpha_k \to 0$ for all $k$, and to Wales and Woodland's quadratic form when $\alpha_k = 1$ for all $k$. The interpretation of the parameters is not as straightforward as in Bhat's MDCEV, and the probability expressions for the consumption of the goods and the Jacobian do not have simple forms. But the gain is that the marginal utility of consumption of a good depends not only on the amount of that good consumed, but also on the amounts of the other goods consumed.
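A small sketch (not from the chapter; all parameter values are hypothetical) makes the last point concrete: under the quadratic Box-Cox form, the numerical marginal utility of one good shifts when the consumption of another good changes, which cannot happen under the additively separable form:

```python
import numpy as np

def box_cox(x, gamma, alpha):
    # m_k as defined above
    return (gamma / alpha) * (((x / gamma) + 1.0) ** alpha - 1.0)

def U_nonadditive(x, psi, theta, gamma, alpha):
    # sum_k psi_k m_k + 0.5 * sum_k sum_m theta_km m_k m_m
    m = box_cox(x, gamma, alpha)
    return psi @ m + 0.5 * m @ theta @ m

K = 3
psi = np.array([1.0, 0.8, 1.2])
gamma, alpha = np.ones(K), np.full(K, 0.5)
theta = np.array([[0.00, -0.05, 0.02],    # theta_km = 0 recovers the additive form;
                  [-0.05, 0.00, 0.00],    # negative entries push toward substitution,
                  [0.02, 0.00, 0.00]])    # positive entries toward complementarity

x = np.array([2.0, 1.0, 3.0])
h, e0 = 1e-6, np.eye(K)[0]
mu = lambda x: (U_nonadditive(x + h * e0, psi, theta, gamma, alpha)
                - U_nonadditive(x, psi, theta, gamma, alpha)) / h

# Marginal utility of good 0 changes when only good 1's consumption changes:
print(mu(x), mu(x + np.array([0.0, 2.0, 0.0])))
```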
4.5. The Model with an Outside Good

Thus far, the discussion has assumed that there is no outside numeraire good (i.e., no essential Hicksian composite good). If an outside good is present, label it as the first good, which now has a unit price of one. Also, for identification, let $\psi(x_1, \varepsilon_1) = e^{\varepsilon_1}$. Then, the utility functional form needs to be modified as follows:

$$U(x) = \frac{1}{\alpha_1} e^{\varepsilon_1} (x_1 + \gamma_1)^{\alpha_1} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha_k} - 1 \right\} \qquad (4.32)$$
In the above formula, we need $\gamma_1 \leq 0$, while $\gamma_k > 0$ for $k > 1$. Also, we need $x_1 + \gamma_1 > 0$. The magnitude of $\gamma_1$ may be interpreted as the required lower bound (or a "subsistence value") for consumption of the outside good. As in the "no-outside good" case, the analyst will generally not be able to estimate both $\alpha_k$ and $\gamma_k$ for the outside and inside goods. The analyst can estimate one of the following five utility forms:

$$U(x) = \frac{1}{\alpha_1} e^{\varepsilon_1} x_1^{\alpha_1} + \sum_{k=2}^{K} \frac{1}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ (x_k + 1)^{\alpha_k} - 1 \right\}$$

$$U(x) = \frac{1}{\alpha_1} e^{\varepsilon_1} x_1^{\alpha_1} + \sum_{k=2}^{K} \gamma_k \exp(\beta' z_k + \varepsilon_k) \ln\left( \frac{x_k}{\gamma_k} + 1 \right)$$

$$U(x) = \frac{1}{\alpha} e^{\varepsilon_1} x_1^{\alpha} + \sum_{k=2}^{K} \frac{\gamma_k}{\alpha} \exp(\beta' z_k + \varepsilon_k) \left\{ \left( \frac{x_k}{\gamma_k} + 1 \right)^{\alpha} - 1 \right\} \qquad (4.33)$$

$$U(x) = e^{\varepsilon_1} \ln(x_1 + \gamma_1) + \sum_{k=2}^{K} \frac{1}{\alpha_k} \exp(\beta' z_k + \varepsilon_k) \left\{ (x_k + 1)^{\alpha_k} - 1 \right\}$$

$$U(x) = e^{\varepsilon_1} \ln(x_1 + \gamma_1) + \sum_{k=2}^{K} \gamma_k \exp(\beta' z_k + \varepsilon_k) \ln\left( \frac{x_k}{\gamma_k} + 1 \right)$$
The third functional form above is estimable because the constant $\alpha$ parameter obtains a "pinning effect" from the satiation parameter for the outside good. The analyst can estimate all five possible functional forms and select the one that fits the data best based on statistical and intuitive considerations. The identification considerations discussed for the "no-outside good" case carry over to the "with outside good" case. The probability expression for the expenditure allocation on the various goods (with the first good being the outside good) is identical to Eq. (4.19), while the probability expression for consumption of the goods (with the first good being the outside good) is:

$$P(x_1, x_2, x_3, \ldots, x_M, 0, 0, \ldots, 0) = \frac{1}{\sigma^{M-1}} \left[ \prod_{i=1}^{M} f_i \right] \left[ \sum_{i=1}^{M} \frac{p_i}{f_i} \right] \left[ \frac{\prod_{i=1}^{M} e^{V_i/\sigma}}{\left( \sum_{k=1}^{K} e^{V_k/\sigma} \right)^{M}} \right] (M - 1)! \qquad (4.34)$$

where $f_i = (1 - \alpha_i)/(x_i + \gamma_i)$. The expressions for $V$ in Eqs. (4.19) and (4.34) are as follows for each of the five utility forms in Eq. (4.33):

First form: $V_k = \beta' z_k + (\alpha_k - 1)\ln(x_k + 1) - \ln p_k \; (k \geq 2)$; $V_1 = (\alpha_1 - 1)\ln(x_1)$
Second form: $V_k = \beta' z_k - \ln\left( \frac{x_k}{\gamma_k} + 1 \right) - \ln p_k \; (k \geq 2)$; $V_1 = (\alpha_1 - 1)\ln(x_1)$
Third form: $V_k = \beta' z_k + (\alpha - 1)\ln\left( \frac{x_k}{\gamma_k} + 1 \right) - \ln p_k \; (k \geq 2)$; $V_1 = (\alpha - 1)\ln(x_1)$ $\qquad (4.35)$
Fourth form: $V_k = \beta' z_k + (\alpha_k - 1)\ln(x_k + 1) - \ln p_k \; (k \geq 2)$; $V_1 = -\ln(x_1 + \gamma_1)$
Fifth form: $V_k = \beta' z_k - \ln\left( \frac{x_k}{\gamma_k} + 1 \right) - \ln p_k \; (k \geq 2)$; $V_1 = -\ln(x_1 + \gamma_1)$
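As a small illustration of Eq. (4.35), the sketch below (not from the chapter; function name and all values are hypothetical) evaluates the $V$'s for the fifth form, with the outside good (good 1) entering through the log term:

```python
import numpy as np

def V_fifth_form(x, beta_z, gamma, p):
    # V's for the fifth form in Eq. (4.35): log outside good (good 1, p[0] = 1,
    # gamma[0] <= 0 with x[0] + gamma[0] > 0) plus gamma-profile inside goods.
    V = np.empty_like(x)
    V[0] = -np.log(x[0] + gamma[0])                                   # outside good
    V[1:] = beta_z[1:] - np.log(x[1:] / gamma[1:] + 1.0) - np.log(p[1:])
    return V

x = np.array([5.0, 2.0, 0.0])            # outside good is always consumed
beta_z = np.array([0.0, 0.4, -0.2])
gamma = np.array([-0.5, 1.0, 1.5])       # gamma[0] is the subsistence translation
p = np.array([1.0, 2.0, 1.5])
print(V_fifth_form(x, beta_z, gamma, p))
```

These $V$'s then enter Eq. (4.34) exactly as in the earlier MDCEV probability example.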
4.6. Applications

The MDCEV model framework has been employed in modeling a number of choice situations that are characterized by multiple discreteness. These can be broadly categorized into the following research areas: (1) activity time-use analysis (adults and children), (2) household vehicle ownership, (3) household expenditures, and (4) anglers' site choice.15
4.6.1. Activity Time-Use Analysis
The MDCEV model, with its assumption of diminishing marginal utility of consumption, provides an ideal platform for modeling activity time-use decisions. The different studies on activity time-use are described chronologically below. Bhat (2005) demonstrated an application of the MDCEV model to individual time-use in different types of discretionary activity pursuits on weekend days. The modeling exercise included different kinds of variables, including household demographics, household location variables, individual demographics and employment characteristics, and day of week and season of year. Bhat et al. (2006) formulated a unified utility-maximizing framework for the analysis of a joint imperfect-perfect substitute goods case. This is achieved by using a satiation-based utility structure (MDCEV) across the imperfect substitutes, but a simple standard discrete choice-based linear utility structure (MNL) within perfect substitutes. The joint model is applied to analyze individual time-use in both maintenance and leisure activities on weekend days. Kapur and Bhat (2007) specifically modeled the social context of activity participation by examining the accompaniment arrangement (i.e., company type) in activity participation. Sener and Bhat (2007) also examined participation and time investment in in-home leisure as well as out-of-home discretionary activities, with a specific emphasis on the accompanying individuals in children's activity engagement. Copperman and Bhat (2007) formulated a comprehensive framework to consider participation, and levels of participation, in physically passive and physically active episodes among children on weekend days. LaMondia, Bhat, and Hensher (2008) focused their attention on vacation travel in the USA. Specifically, the paper examined how households decide what vacation travel activities to participate in on an annual basis, and to what extent, given the total annual vacation travel time at their disposal.
15. A summary of all the studies discussed in this chapter is compiled in the form of a table with information on the application focus, the data source used for the empirical analysis, the number and labels of discrete alternatives, the continuous component in the empirical context, and the MDCEV model type employed. The table is available at: http://www.caee.utexas.edu/prof/bhat/ABSTRACTS/MDCEV_BookChapter_Table1.pdf
The models presented in Sener, Copperman, Pendyala, and Bhat (2008) offer a rich framework for categorizing and representing the activity-travel patterns of children within larger travel demand model systems. The paper provides a taxonomy of child activities that explicitly considers the spatial and temporal constraints that may be associated with different types of activities. Pinjari, Bhat, and Hensher (2009) presented a joint model system of residential location and activity time-use choices. The model system takes the form of a joint mixed multinomial logit-multiple discrete-continuous extreme value (MNL-MDCEV) structure that (a) accommodates differential sensitivity to activity-travel environment attributes due to both observed and unobserved individual-related attributes, and (b) controls for the self-selection of individuals into neighborhoods due to both observed and unobserved individual-related factors. Spissu, Pinjari, Bhat, Pendyala, and Axhausen (2009) formulated a panel version of the mixed multiple discrete-continuous extreme value (MMDCEV) model that is capable of simultaneously accounting for repeated observations from the same individuals (panel), participation in multiple activities in a week, durations of activity engagement in various activity categories, and unobserved individual-specific factors affecting discretionary activity engagement, including those common across pairs of activity category utilities. Pinjari and Bhat (in press) proposed the MDCNEV model, which captures inter-alternative correlations among alternatives in mutually exclusive subsets (or nests) of the choice set, while maintaining the closed form of the probability expressions for any (and all) consumption pattern(s). The model estimation results provide several insights into the determinants of nonworkers' activity time-use and timing decisions. Rajagopalan, Pinjari, and Bhat (2009) predicted workers' activity participation and time allocation patterns in seven types of out-of-home nonwork activities at various time periods of the day. The activities (and the corresponding time allocations and timing decisions) predicted by this model can be used for subsequent detailed scheduling and sequencing of activities and related travel in an activity-based microsimulation framework.
4.6.2. Household Vehicle Ownership
The MDCEV framework, with its capability to handle multiple discreteness, lends itself very well to modeling household vehicle ownership by type. Bhat and Sen (2006) modeled the simultaneous holdings of multiple vehicle types (passenger car, SUV, pickup truck, minivan, and van), as well as the continuous miles of usage of each vehicle type. The model can be used to determine the change in vehicle type holdings and usage due to changes in independent variables over time. As a demonstration, the impact of an increase in vehicle operating costs on vehicle type ownership and usage is assessed. Ahn, Jeong, and Kim (2008) employed conjoint analysis and the MDCEV framework to understand consumer preferences for alternative fuel vehicles. The results indicate a clear preference for gasoline-powered cars among consumers, but suggest that alternative fuel vehicles offer a promising substitute. Bhat et al. (2009) formulated and estimated a nested model structure that includes an MDCEV component to analyze the choice of vehicle type/vintage and usage in the upper level and a multinomial logit (MNL) component to analyze the choice of vehicle make/model in the lower nest.
4.6.3. Household Expenditures
The MDCEV framework also provides a natural way to analyze household consumption patterns. Ferdous, Pinjari, Bhat, and Pendyala (2008) employed an MDCNEV structure to explicitly recognize that people choose to consume multiple goods and commodities. Model results show that a range of household socio-economic and demographic characteristics affect the percentage of income or budget allocated to various consumption categories and savings. Rajagopalan and Srinivasan (2008) explicitly investigated transportation-related household expenditures by mode. Specifically, they examined mode choice and modal expenditures at the household level. The model results indicate that mode choice and frequency decisions are influenced by prior mode choice decisions and by the user's perception of safety and congestion.
4.6.4. Anglers' Site Choice
Vasquez Lavin and Hanemann (2008) formulated the nonadditive MDCEV model structure to study anglers' site choice. In this study, they employ individual-level variables such as skill, leisure time available, and ownership status (of a cabin, boat, or RV). Further, they undertake the computation of welfare measures using a sequential quadratic programming method.
4.7. Conclusions

Classical discrete and discrete-continuous models deal with situations where only one alternative is chosen from a set of mutually exclusive alternatives. Such models assume that the alternatives are perfectly substitutable for each other. On the other hand, many consumer choice situations are characterized by the simultaneous demand for multiple alternatives that are imperfect substitutes or even complements for one another. This chapter discusses the MDCEV model and its many variants. Recent applications of the MDCEV type of models are presented and briefly discussed. This overview of applications indicates that the MDCEV model has been employed in many different empirical contexts in the transportation field, and also highlights the
potential for application of the model in several other fields. The overview also serves to highlight the fact that the field is at an exciting and ripe stage for further applications of multiple discrete-continuous models. At the same time, several challenges lie ahead, including (1) accommodating more than one constraint in the utility maximization problem (e.g., recognizing both time and money constraints in activity type choice and duration models; see Anas, 2006 for a recent theoretical effort to accommodate such multiple constraints), (2) incorporating latent consideration sets in a theoretically appropriate way within the MDCEV structure (the authors are currently addressing this issue in ongoing research), (3) using more flexible utility structures that can handle both complementarity and substitution among goods, and that do not impose the constraints of additive separability (Vasquez Lavin & Hanemann, 2008 provide some possible ways to accommodate this), and (4) developing easy-to-apply techniques for using the model in forecasting mode.
References

Ahn, J., Jeong, G., & Kim, Y. (2008). A forecast of household ownership and use of alternative fuel vehicles: A multiple discrete-continuous choice approach. Energy Economics, 30(5), 2091–2104.
Amemiya, T. (1974). Multivariate regression and simultaneous equation models when the dependent variables are truncated normal. Econometrica, 42, 999–1012.
Anas, A. (2006). A unified theory of consumption and travel. Presented at the Conference Honoring Kenneth A. Small, Department of Economics, University of California, Irvine, February 3–4.
Arora, N., Allenby, G. M., & Ginter, J. L. (1998). A hierarchical Bayes model of primary and secondary demand. Marketing Science, 17, 29–44.
Baltas, G. (2004). A model for multiple brand choice. European Journal of Operational Research, 154, 144–149.
Bhat, C. R. (2005). A multiple discrete-continuous extreme value model: Formulation and application to discretionary time-use decisions. Transportation Research Part B, 39(8), 679–707.
Bhat, C. R. (2007). Econometric choice formulations: Alternative model structures, estimation techniques, and emerging directions. In: K. W. Axhausen (Ed.), Moving through nets: The physical and social dimensions of travel – Selected papers from the 10th international conference on travel behaviour research (pp. 45–80). Amsterdam: Elsevier.
Bhat, C. R. (2008). The multiple discrete-continuous extreme value (MDCEV) model: Role of utility function parameters, identification considerations, and model extensions. Transportation Research Part B, 42(3), 274–303.
Bhat, C. R., & Sen, S. (2006). Household vehicle type holdings and usage: An application of the multiple discrete-continuous extreme value (MDCEV) model. Transportation Research Part B, 40(1), 35–53.
Bhat, C. R., Sen, S., & Eluru, N. (2009). The impact of demographics, built environment attributes, vehicle characteristics, and gasoline prices on household vehicle holdings and use. Transportation Research Part B, 43(1), 1–18.
Bhat, C. R., & Srinivasan, S. (2005). A multidimensional mixed ordered-response model for analyzing weekend activity participation. Transportation Research Part B, 39(3), 255–278.
Bhat, C. R., Srinivasan, S., & Sen, S. (2006). A joint model for the perfect and imperfect substitute goods case: Application to activity time-use decisions. Transportation Research Part B, 40(10), 827–850.
Chan, T. Y. (2006). Estimating a continuous hedonic-choice model with an application to demand for soft drinks. RAND Journal of Economics, 37(2), 466–482.
Chiang, J. (1991). A simultaneous approach to whether to buy, what to buy, and how much to buy. Marketing Science, 4, 297–314.
Chintagunta, P. K. (1993). Investigating purchase incidence, brand choice and purchase quantity decisions of households. Marketing Science, 12, 194–208.
Copperman, R., & Bhat, C. R. (2007). An analysis of the determinants of children's weekend physical activity participation. Transportation, 34(1), 67–87.
Deaton, A., & Muellbauer, J. (1980). Economics and consumer behavior. Cambridge: Cambridge University Press.
Dube, J. P. (2004). Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Science, 23(1), 66–81.
Edwards, Y. D., & Allenby, G. M. (2003). Multivariate analysis of multiple response data. Journal of Marketing Research, 40, 321–334.
Ferdous, N., Pinjari, A. R., Bhat, C. R., & Pendyala, R. M. (2008). A comprehensive analysis of household transportation expenditures relative to other goods and services: An application to United States consumer expenditure data. Technical paper, Department of Civil, Architectural & Environmental Engineering, The University of Texas at Austin, July 2008.
Hanemann, W. M. (1978). A methodological and empirical study of the recreation benefits from water quality improvement. Ph.D. dissertation, Department of Economics, Harvard University.
Hanemann, W. M. (1984). The discrete/continuous model of consumer demand. Econometrica, 52, 541–561.
Hendel, I. (1999). Estimating multiple-discrete choice models: An application to computerization returns. Review of Economic Studies, 66, 423–446.
Herriges, J. A., Kling, C. L., & Phaneuf, D. J. (2004). What's the use? Welfare estimates from revealed preference models when weak complementarity does not hold. Journal of Environmental Economics and Management, 47, 55–70.
Kapur, A., & Bhat, C. R. (2007). On modeling adults' daily time use by activity purpose and accompaniment arrangement. Transportation Research Record, 2021, 18–27.
Kim, J., Allenby, G. M., & Rossi, P. E. (2002). Modeling consumer demand for variety. Marketing Science, 21, 229–250.
Kuhn, H. W., & Tucker, A. W. (1951). Nonlinear programming. In: J. Neyman (Ed.), Proceedings of the second Berkeley symposium on mathematical statistics and probability (pp. 481–492). Berkeley, CA: University of California Press.
LaMondia, J., Bhat, C. R., & Hensher, D. A. (2008). An annual time use model for domestic vacation travel. Journal of Choice Modelling, 1(1), 70–97.
Mäler, K.-G. (1974). Environmental economics: A theoretical inquiry. Baltimore, MD: The Johns Hopkins University Press for Resources for the Future.
Manchanda, P., Ansari, A., & Gupta, S. (1999). The "shopping basket": A model for multicategory purchase incidence decisions. Marketing Science, 18, 95–114.
Mohn, C., & Hanemann, M. (2005). Caught in a corner: Using the Kuhn–Tucker conditions to value Montana sportfishing. In: R. Halvorsen & D. F. Layton (Eds), Explorations in environmental and natural resource economics: Essays in honor of Gardner M. Brown, Jr (pp. 188–207). Northampton, MA: Edward Elgar Publishing, Inc. Ch 10.
Phaneuf, D. J., Kling, C. L., & Herriges, J. A. (2000). Estimation and welfare calculations in a generalized corner solution model with an application to recreation demand. The Review of Economics and Statistics, 82(1), 83–92.
Phaneuf, D. J., & Smith, V. K. (2005). Recreation demand models. In: K.-G. Mäler & J. R. Vincent (Eds), Handbook of environmental economics (Vol. 2). North Holland.
Pinjari, A. R., & Bhat, C. R. (in press). A multiple discrete-continuous nested extreme value (MDCNEV) model: Formulation and application to non-worker activity time-use and timing behavior on weekdays. Transportation Research Part B.
Pinjari, A. R., Bhat, C. R., & Hensher, D. A. (2009). Residential self-selection effects in an activity time-use behavior model. Transportation Research Part B, 43(7), 729–748.
Pollak, R., & Wales, T. (1992). Demand system specification and estimation. New York: Oxford University Press.
Rajagopalan, B. S., Pinjari, A. R., & Bhat, C. R. (2009). A comprehensive model of workers' non-work activity time-use and timing behavior. Transportation Research Record, forthcoming.
Rajagopalan, B. S., & Srinivasan, K. S. (2008). Integrating household-level mode choice and modal expenditure decisions in a developing country: Multiple discrete-continuous extreme value model. Transportation Research Record, 2076, 41–51.
Sener, I. N., & Bhat, C. R. (2007). An analysis of the social context of children's weekend discretionary activity participation. Transportation, 34(6), 697–721.
Sener, I. N., Copperman, R. B., Pendyala, R. M., & Bhat, C. R. (2008). An analysis of children's leisure activity engagement: Examining the day of week, location, physical activity level, and fixity dimensions. Transportation, 35(5), 673–696.
Spissu, E., Pinjari, A. R., Bhat, C. R., Pendyala, R. M., & Axhausen, K. W. (2009). An analysis of weekly out-of-home discretionary activity participation and time-use behavior. Transportation, 36(5), 483–510.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24–36.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Vasquez Lavin, F., & Hanemann, W. M. (2008). Functional forms in discrete/continuous choice models with general corner solution (December 30, 2008). CUDARE Working Paper No. 1078, Department of Agricultural and Resource Economics, UCB.
von Haefen, R. H. (2003). Latent consideration sets and continuous demand system models. Working paper, Department of Agricultural and Resource Economics, University of Arizona.
von Haefen, R. H. (2004). Empirical strategies for incorporating weak complementarity into continuous demand system models. Working paper, Department of Agricultural and Resource Economics, University of Arizona.
von Haefen, R. H., & Phaneuf, D. J. (2005). Kuhn–Tucker demand system approaches to nonmarket valuation. In: R. Scarpa & A. A. Alberini (Eds), Applications of simulation methods in environmental and resource economics. Springer.
von Haefen, R. H., Phaneuf, D. J., & Parsons, G. R. (2004). Estimation and welfare analysis with large demand systems. Journal of Business and Economic Statistics, 22(2), 194–205.
Wales, T. J., & Woodland, A. D. (1983). Estimation of consumer demand systems with binding non-negativity constraints. Journal of Econometrics, 21(3), 263–285.
Walker, J. (2002). Mixed logit (or logit kernel) model: Dispelling misconceptions of identification. Transportation Research Record, 1805, 86–98.
Walker, J., Ben-Akiva, M., & Bolduc, D. (2004). Identification of the logit kernel (or mixed logit) model. Presented at the 10th International Conference on Travel Behavior Research, Lucerne, Switzerland, August.
Walsh, J. W. (1995). Flexibility in consumer purchasing for uncertain future tastes. Marketing Science, 14, 148–165.
Appendix 4.A

For $r_s = 1$, $X_{r_s} = \{1\}$.

For $r_s = 2$, $X_{r_s} = \left\{ \frac{(q_s - 1)(1 - \theta_s)}{\theta_s} + \frac{(q_s - 2)(1 - \theta_s)}{\theta_s} + \cdots + \frac{2(1 - \theta_s)}{\theta_s} + \frac{1(1 - \theta_s)}{\theta_s} \right\}$.

For $r_s = 3, 4, \ldots, q_s$, $X_{r_s}$ is a matrix of size $\binom{q_s - 2}{r_s - 2}$, which is formed as described below. Consider the following row matrices $A_{q_s}$ and $A_{r_s}$ (with the elements arranged in descending order, and of size $q_s - 1$ and $r_s - 2$, respectively):

$$A_{q_s} = \left\{ \frac{(q_s - 1)(1 - \theta_s)}{\theta_s}, \frac{(q_s - 2)(1 - \theta_s)}{\theta_s}, \frac{(q_s - 3)(1 - \theta_s)}{\theta_s}, \ldots, \frac{3(1 - \theta_s)}{\theta_s}, \frac{2(1 - \theta_s)}{\theta_s}, \frac{1(1 - \theta_s)}{\theta_s} \right\}$$

$$A_{r_s} = \{ r_s - 2, r_s - 3, r_s - 4, \ldots, 3, 2, 1 \}$$

Choose any $r_s - 2$ elements (other than the last element, $(1 - \theta_s)/\theta_s$) of the matrix $A_{q_s}$ and arrange them in descending order into another matrix $A_{q_s}^i$. Note that we can form $\binom{q_s - 2}{r_s - 2}$ such matrices. Subsequently, form another matrix $A_{rq_s}^i = A_{q_s}^i + A_{r_s}$. Of the remaining elements in the $A_{q_s}$ matrix, discard the elements that are larger than or equal to the smallest element of the $A_{q_s}^i$ matrix, and store the remaining elements in another matrix labeled $B_{rq_s}^i$. Now, an element of $X_{r_s}$ (i.e., $x_{rq_s}^i$) is formed by performing the following operation: $x_{rq_s}^i = \mathrm{Product}(A_{rq_s}^i) \times \mathrm{Sum}(B_{rq_s}^i)$; that is, by multiplying the product of all elements of the matrix $A_{rq_s}^i$ with the sum of all elements of the matrix $B_{rq_s}^i$. Note that the number of such elements of the matrix $X_{r_s}$ is equal to $\binom{q_s - 2}{r_s - 2}$.
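The construction above is algorithmic and can be transcribed directly into code. The following sketch (our own transcription of the appendix, assuming element-wise addition for $A_{q_s}^i + A_{r_s}$; not from the chapter) builds the elements of $X_{r_s}$:

```python
import numpy as np
from itertools import combinations

def X_rs(q_s, r_s, theta_s):
    # Elements x^i_rqs of the row matrix X_rs per Appendix 4.A.
    A_q = [(j * (1.0 - theta_s)) / theta_s for j in range(q_s - 1, 0, -1)]
    if r_s == 1:
        return [1.0]
    if r_s == 2:
        return [sum(A_q)]
    A_r = list(range(r_s - 2, 0, -1))                  # {r_s-2, ..., 2, 1}
    elements = []
    # choose any r_s - 2 elements of A_q other than its last element
    for combo in combinations(A_q[:-1], r_s - 2):
        A_i = sorted(combo, reverse=True)              # descending order
        A_irq = [a + b for a, b in zip(A_i, A_r)]      # A^i_rqs = A^i_qs + A_rs
        # keep remaining A_q elements smaller than the smallest of A^i_qs
        B = [a for a in A_q if a not in A_i and a < min(A_i)]
        elements.append(float(np.prod(A_irq)) * sum(B))
    return elements

print(X_rs(q_s=4, r_s=3, theta_s=0.5))   # C(2, 1) = 2 elements, as expected
```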
Chapter 5
Capturing Human Perception of Facial Expressions by Discrete Choice Modelling

Matteo Sorci, Thomas Robin, Javier Cruz, Michel Bierlaire, J.-P. Thiran and Gianluca Antonini
Abstract

Facial expression recognition by human observers is affected by subjective components. Indeed, there is no ground truth. We have developed Discrete Choice Models (DCM) to capture the human perception of facial expressions. In a first step, the static case is treated, that is, modelling the perception of facial images. Image information is extracted using a computer vision tool called the Active Appearance Model (AAM). The DCM attributes are based on the Facial Action Coding System (FACS), Expression Descriptive Units (EDUs) and outputs of the AAM. Behavioural data have been collected using an Internet survey, where respondents are asked to label facial images from the Cohn–Kanade database with expressions. Different models were estimated by likelihood maximization using the obtained data. In a second step, the proposed static discrete choice framework is extended to the dynamic case, which considers facial videos instead of images. The model theory is described, and another Internet survey is currently being conducted in order to obtain expression labels for videos. In this second Internet survey, the videos come from the Cohn–Kanade database and the Facial Expressions and Emotions Database (FEED).
5.1. Introduction

Facial expressions are one of the most visible means of conveying emotions and one of the most powerful means used by human beings to relate to each other. In order to
move towards truly interactive human–computer systems, where algorithms written by humans should be able to capture, mimic and reproduce human perceptions, facial expressions surely play a central role. One of the key issues to consider in building such systems is the definition of facial expression measurements to study and quantify facial behaviour. The two major approaches in psychological research are message judgement and sign judgement (Cohn, 2006). The task of message judgement is the inference of the displayed facial behaviour, in terms of inferred emotion. As indicated by Cohn (2006), among the different descriptors those of Ekman (1992) have been largely used in the recent past. Ekman proposed the use of the six basic emotions (happiness, surprise, fear, disgust, sadness and anger) that are universally displayed and recognized from facial expressions (Keltner & Ekman, 2000). In sign-judgement approaches, the displayed facial behaviour is described by facial movements. Among the various methods, the Facial Action Coding System (FACS) (Ekman & Friesen, 1978; Ekman, Friesen, & Hager, 2002) is the most comprehensive and widely used. The FACS is a human-observer-based system designed to detect subtle changes in facial features; it associates facial expression changes with the actions of the muscles that produce them. Thus, a nasolabial furrow, running down from the nostrils outward beyond the corners of the lips, can be judged as 'sadness' in a message-judgement approach and as a facial movement that raises the cheeks in a sign-judgement approach. In other terms, while message judgement is all about interpretation, sign judgement attempts to be objective.

In this work we focus on and propose an automatic approach belonging to the family of message judgement-based systems. The dominant challenge in building such an automatic system, even if narrowed down to the facial expression perception task of message judgement, arises from the fact that such a perception (performed by human beings in the real world) is subjective and strongly related to contextual information. A typical automatic facial expression recognition system (Tian, Kanade, & Cohn, 2003; Pantic & Bartlett, 2007; Fasel & Luettin, 2003) is based on a representation of each expression, learned from a training set of pre-selected meaningful features. In the learning process, an expert is asked to associate labels with training samples. An expert should be someone with a strong knowledge of the problem, in order to ensure the correctness of what we are trying to reproduce. Three important questions arise from this fundamental hypothesis of the 'learning by examples' technique: Can one expert be representative of humans' perception? How can the experts' strong knowledge be obtained and used? How should the visual information used by the experts be represented?

The outstanding human ability to identify individual human faces has long been of major interest to cognitive scientists, neuropsychologists and neuroscientists (Diamond & Carey, 1986; Carey, 1992; Moses, Ullman, & Edelman, 1996). Whereas the human mechanisms for face detection are very robust, the same is not the case for the interpretation of facial expressions. It is often very difficult to determine the exact nature of the expression on a person's face. According to Bassili (1978), a trained observer can correctly classify faces showing the six basic emotions with an average accuracy of 87%. This ratio varies depending on several factors: familiarity with the face, familiarity with the personality of the observed person, general experience with different types of expressions, the attention given to the face and non-visual cues (e.g. the context in which an expression appears). Whereas sign-judgement systems are completely insensitive to context and familiarity with the face, message-based ones are strongly influenced by them. This consideration leads to the answer to the first question: in a message-based framework, the judgement of one human is not enough to reproduce and capture the different behaviours of humans. In support of this last statement, and in order to answer the second question, the data collected by a web-based static facial expression evaluation survey, developed by the authors (Sorci, Antonini, Thiran, & Bierlaire, 2007) and described in Section 5.3, show the need for a model capable of taking into account the heterogeneity in humans' perception of facial expressions. Figure 5.1 shows two images from the survey and the histograms of the annotations of the 33 participants who labelled them. These are two typical examples of how heterogeneity (Figure 5.1(a)–(b)) and homogeneity (Figure 5.1(c)–(d)) can both be present in human judgement.

Concerning the last question, most recent attempts at representing visual information for facial expressions have focused on reproducing the set of rule descriptors suggested by the FACS system. Based on this system, a facial expression can be linguistically described in terms of measures that can be extracted from the face. These measures can be considered as mathematical representations of local facial features. In the last decade, work in psychophysics and cognitive psychology (Farah, Wilson, Drain, & Tanaka, 1998; Schwaninger, Lobmaier, & Collishaw, 2002; Cabeza & Kato, 2000; Meulders, Boeck, Mechelen, & Gelman, 2005) has shown that face recognition and the perception of emotions rely on featural1 and configural2 information. Human visual perception of a face involves the processing of both local facial measures and their holistic spatial layout. The implication of these findings is that an automatic system aiming at interpreting faces needs to extract and make use of these two sources of information as well.

The objective of this work is to propose novel models to describe and reproduce the evaluations of humans, considered as a heterogeneous population, facing the task of labelling static facial expressions. The labelling process is a decision-making process where individuals choose a categorized expression among a set of nine different options: happiness, surprise, fear, disgust, sadness, anger, neutral, other and I don't know. Discrete Choice Models (DCM) (Ben-Akiva & Lerman, 1985; Manski, 1977; Manski & McFadden, 1981) fit our needs well and represent a reasonable and theoretically grounded modelling framework.
1. Featural facial features represent local measures of facial components.
2. Configural facial features represent the holistic spatial layout of facial components.
Figure 5.1: Examples of heterogeneous and homogeneous judgements in the data collected by the survey. (a–b) Image of an ambiguous expression and histogram of the participants' annotations; (c–d) image of a happiness expression, unanimously perceived by the participants.
DCMs are econometric models designed to forecast the behaviour of individuals in choice situations where the set of available alternatives is finite and discrete. Our idea is to approach the decision-making process through the rational behaviour paradigm, which represents the logic behind DCMs and matches well the evaluation process of the human observer. Three main factors will guide us in the development of a good model: (1) a strong a priori knowledge of the problem; (2) realistic annotations from a heterogeneous population of humans; (3) a reliable set of features.

The contributions of this work can be summarized as follows: we propose the use of DCM for modelling the human perception of static facial expressions; we develop three models of increasing complexity; we show how measures derived from the FACS can be combined with two new sets of features to complete the characterization of each expression and improve the descriptiveness of the model; and we extend the discrete choice framework for static facial expression perception to a dynamic version, which considers videos instead of images.

The remainder of the paper is organized as follows: in the next section, we present an overview of the existing works and identify their limitations and differences with ours. In Section 5.3, we describe the facial expression survey we have developed to provide the data used in this work. Section 5.4 introduces the methodological framework, while Section 5.5 details the feature sets used in our model and the associated methods. In Sections 5.6, 5.7 and 5.8 we describe, respectively, the model specification, the estimation of the related parameters and the extension to dynamic facial expression recognition. We end in Section 5.9 with discussions and conclusions.
5.2. Previous Work

Current research on facial expression analysis is mostly oriented in two main directions: recognition of prototypic emotional expressions and recognition of facial action units. The first aims at a categorical representation of the six universal basic emotions. The second does not attempt to give an interpretation of the expression, but focuses on the detection of atomic facial signals; the interpretation can be delegated to higher-order decision making. The two approaches are strictly related to the two main streams in psychological research: message and sign judgement. Most of the available literature on both approaches proposes a three-step procedure to make the problem operational: face detection, facial feature extraction and facial change recognition (prototypic emotions or action units). Face detection is a problem that has been studied since the early days of computer vision. It consists of determining all the regions of the scene under analysis that
contain a face. To achieve this, a wide variety of works can be found in the literature (Pentland, Moghaddam, & Starner, 1994; Rowley, Baluja, & Kanade, 1998; Sung & Poggio, 1998; Schneiderman & Kanade, 2000), but probably the most commonly used nowadays is the face detector introduced by Viola and Jones (2004). This detector is based on a cascade of classifiers trained with the AdaBoost algorithm (Freund & Schapire, 1997) and on the use of the integral image, which enables the method to run in real time. A survey on the topic can be found in Yang, Kriegman, and Ahuja (2002) or in Chapter 8 of Medioni and Kang (2004). Once faces are detected, features need to be extracted from them. These features can be divided into geometric features and appearance features. Geometric features are featural descriptors of the face that represent it in terms of the shapes and locations of the main facial components (mouth, eyes, nose etc.). Some recent examples of geometric feature extraction can be found in Hu, Chang, Feris, and Turk (2004), Pantic and Patras (2006) or Valstar and Pantic (2007). Appearance features, in turn, are configural or featural descriptors of the face that represent it in terms of facial texture, including wrinkles, bulges and furrows. Some recent examples of these techniques can be found in Ye, Zhan, and Song (2004), Chang, Hu, and Turk (2004) or Bartlett et al. (2006). Hybrid techniques can also be found in the literature, such as the approach of Zhang and Ji (2005), which uses 26 landmarks around the main facial components as well as transient features, like wrinkles and furrows. Finally, in the third step, all the information extracted from the face has to be associated with a facial expression, or an action unit, by means of a decision or classification rule. A wide variety of approaches can be found in the literature, using a broad range of machine learning techniques: neural networks (NNs) (Zhang, Lyons, Schuster, & Akamatsu, 1998; Padgett & Cottrell, 1998; li Tian, Kanade, & Cohn, 2001, 2002), Bayesian classifiers (Cohen, Sebe, Chen, Garg, & Huang, 2003), Linear Discriminant Analysis (LDA) (Abboud & Davoine, 2004), Hidden Markov Models (HMMs) (Cohen et al., 2003) or Support Vector Machines (SVMs) (Valstar & Pantic, 2007), to mention a few. Recently, the authors introduced the use of DCMs for static facial expression classification in Antonini, Sorci, Bierlaire, and Thiran (2006).
5.2.1. Limitations of Previous Approaches
Current work on facial expression understanding, in our view, suffers from the following shortcomings:

1. The main paradigm of standard classification approaches, in the context of message-judgement frameworks, consists in assigning any two examples having the same features to the same class. One of the main assumptions is that the facial expression labels reported in the training set represent the true expressions. As underlined by the example in Figure 5.1, this assumption does not
hold when modelling humans' perception of static facial expressions. Indeed, facial expressions are ambiguous and different people might perceive the same expression differently. This effect is even more accentuated in a static context, where the lack of transitions between successive expressions deprives the observer of an important source of information. A probabilistic approach is more suitable in this case.

2. Another limitation of most previous approaches concerns the inability to interpret the knowledge acquired by the systems. In other words, their black-box nature prevents any interpretation of the relations between the inputs and outputs of the model. For the same reason, it is also impossible to gain any understanding of the problem at hand or to incorporate human expertise to simplify, accelerate and improve the modelling process.

3. The integration of featural and configural facial features provides crucial cues in the human interpretation of an expression. Apart from the work of Zhang and Ji (2005), more complex hybrid systems have not been rigorously investigated in existing work.

To overcome the above limitations, we propose the use of DCMs and the introduction of new sets of features. The proposed probabilistic approach allows us to: model the possible ambiguities in the human perception of static facial expressions; enable the analyst to exploit her knowledge of the problem; and improve the descriptiveness of a face by introducing a more complete set of featural and configural features.
5.3. Data Collection

Construction of a good database of facial expressions requires time and training of subjects. Only a few such databases are available, such as the Cohn–Kanade Database (Kanade, Cohn, & Tian, 2000), JAFFE (Lyons, Akamatsu, Kamachi, & Gyoba, 1998) and, most recently, the MMI database (Pantic, Valstar, Rademaker, & Maat, 2005). The images used in the survey come from the Cohn–Kanade Database (Figure 5.2).
5.3.1. Cohn–Kanade Database
The Cohn–Kanade Database consists of image sequences of expressions, starting from a neutral expression and ending, most of the time, at the peak of the facial expression. The 104 subjects of the database are university students enrolled in introductory psychology classes. They ranged in age from 18 to 30 years; 65% were female, 15% African-American and 3% Asian or Latino. Subjects were instructed by an experimenter to perform a series of 23 facial displays. Six of the
displays were based on descriptions of prototypic emotions (i.e. happiness, anger, fear, disgust, sadness and surprise). Before performing each display, an experimenter described and modelled the desired display.

Figure 5.2: Examples of faces in the Cohn–Kanade Database.

5.3.2. Facial Expressions Evaluation Survey
In August 2006, Sorci et al. (2007) published the Internet facial expressions evaluation survey in order to directly capture humans' perception of facial expressions (http://lts5www.epfl.ch/face). The aim of the survey is to collect a dataset created by a sample of real human observers from all around the world, doing different jobs, having different cultural backgrounds, ages and genders, belonging to different ethnic groups, and doing the survey from different places (work, home, on travel etc.) (Figure 5.3). The images used in the survey come from the Cohn–Kanade Database. Of the 104 subjects in the database, only 11 gave their consent for publication. The subset of the Cohn–Kanade Database used in this survey consists of the 1271 images of these 11 subjects (9 women and 2 men). The annotation process consists in associating an expression label (among a set of available human expressions) with each image presented to the survey participant (see Figure 5.3(b)). A simple and intuitive interface has been designed in order to facilitate the annotation process. For each image in the group, the participant has to choose one of the following options: happiness, surprise, fear, anger, disgust, sadness, neutral, ‘I don't know’ and ‘Other’. The last two options have been introduced in order to deal with images that are particularly ambiguous to the participant. In addition, these two options make the set exhaustive, in the sense that they cover the whole range of human expressions. We should note that in this work we deal with the static perception of human expressions and with frames randomly chosen from small video sequences displaying the whole dynamics of the performed expression. The lack of a temporal factor in the labelling process makes the annotation task difficult and, in some cases, subjective.
Figure 5.3: Online survey interface: (a) socio-economic form; (b) image annotation interface.
5.4. Discrete Choice Analysis: A Behavioural Modelling Framework

DCMs have been known in econometrics since the late 1950s. They are designed to describe the behaviour of people in choice situations where the set of available alternatives (the choice set) is finite and discrete. They are based on the concept of utility maximization in economics: the decision maker is assumed to be rational, making the choice that maximizes the utilities she perceives from the alternatives. The alternatives are supposed to be mutually exclusive and collectively exhaustive, while the rationality of the decision maker implies transitive and coherent preferences. The utility is a latent construct, which is not directly observed by the modeller, and is treated as a random variable. The discrete choice paradigm matches well the labelling process performed by the participants in the survey: this approach can be interpreted as an attempt to model the decision process of a hypothetical human observer during the labelling of facial expressions. Given a population of N individuals, the (random) utility $U_{in}$ perceived by individual n from alternative i, given a choice set $C_n$, is defined as follows:

\[
U_{in} = V_{in} + \varepsilon_{in} \tag{5.1}
\]
It is composed of the sum of a deterministic term $V_{in}$, capturing the systematic behaviour (here, features extracted from a face), and a random term $\varepsilon_{in}$, capturing the uncertainty. The random term captures unobserved attributes, unobserved individual characteristics, measurement errors and instrumental variables. We do not observe the actual values of the utilities as perceived by the participant. Under the utility maximization assumption, the output of the model is the probability that individual n chooses alternative i, given the choice set $C_n$:

\[
P_n(i|C_n) = P_n(U_{in} \geq U_{jn},\ \forall j \in C_n) = \int_{\varepsilon_n} I(\varepsilon_n < V_{in} - V_{jn},\ \forall j \in C_n,\ j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n \tag{5.2}
\]

where $\varepsilon_n = \varepsilon_{jn} - \varepsilon_{in}$ and $I(\cdot)$ is an indicator function equal to 1 when its argument is satisfied and 0 otherwise. In this paper we use the Multinomial Logit Model (MNL), by far the simplest and most widely used DCM in the literature. The MNL choice probability is given by the following expression:

\[
P_n(i|C_n) = \frac{e^{\mu V_{in}}}{\sum_{j \in C_n} e^{\mu V_{jn}}} \tag{5.3}
\]
In this work the choice set Cn is represented by the nine survey alternatives (‘happiness’, ‘surprise’, ‘fear’, ‘disgust’, ‘sadness’, ‘anger’, ‘neutral’, ‘other’ and ‘I don’t know’).
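To make Eq. (5.3) concrete, here is a minimal sketch computing MNL choice probabilities over the nine survey alternatives; the utility values and the scale μ = 1 are illustrative assumptions, not quantities estimated in this chapter.

```python
import numpy as np

ALTERNATIVES = ["happiness", "surprise", "fear", "disgust", "sadness",
                "anger", "neutral", "other", "I don't know"]

def mnl_probabilities(v, mu=1.0):
    """Multinomial logit (Eq. 5.3): P(i) = exp(mu*V_i) / sum_j exp(mu*V_j)."""
    v = np.asarray(v, dtype=float)
    # Subtracting the maximum utility gives numerical stability and leaves
    # the probabilities unchanged.
    z = mu * (v - v.max())
    expz = np.exp(z)
    return expz / expz.sum()

# Illustrative deterministic utilities (neutral normalized to 0, as in the chapter).
v = [2.1, 0.4, -1.0, -0.8, -0.5, -0.9, 0.0, -2.0, -3.0]
for alt, p in zip(ALTERNATIVES, mnl_probabilities(v)):
    print(f"{alt:>12s}: {p:.3f}")
```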
5.5. Explanatory Variables

The survey provides the raw data capturing the participants' perception of facial expressions. This raw data consists of a set of facial expression images (the Cohn–Kanade images) and the set of participants' choices. In order to exploit the information coming from both sources, we need to identify and represent the facial visual cues describing an expression. The FACS (Ekman & Friesen, 1978) represents the leading standard for measuring facial expressions in behavioural science. The main measures suggested by this human observer system are a valid starting point in the search for variables characterizing the different expressions. In the rest of this section we detail the set of explanatory variables induced by the FACS and we introduce two new, complementary sets of visual measures aimed at improving the descriptiveness of each expression. Figure 5.4 schematically shows the image preprocessing steps necessary to compute these three sets of explanatory variables. For that purpose, the AAM representation of the face, described in Section 5.5.1, is applied to the 1271 available images. The shape description of the face (Figure 5.5(a)) is used for computing both the measures coming from the FACS (detailed in Section 5.5.2) and the new set of configural measures (Section 5.5.3), called Expression Descriptive Units (EDUs), which complement Ekman's measures.
Figure 5.4: Schema of the image-processing steps that lead to the extraction of the three sets of explanatory variables.
Figure 5.5: (a) Facial landmarks (55 points); (b) the geometrical relationship of facial feature points, where the rectangles represent the regions of furrows and wrinkles; (c) featural descriptors used in the definition of the EDUs.
Since both holistic and local features are important from the human perceptual point of view (Schwaninger, 2003; Cabeza & Kato, 2000; Wallraven, Schwaninger, & Bülthoff, 2005; Bicego, Salah, Grosso, Tistarelli, & Akarun, 2007), a third set of measures, representing the appearance of the face, has been introduced (Section 5.5.4).
5.5.1. Active Appearance Model (AAM)
The AAM is a statistical method for matching a combined model of shape and texture to unseen faces. The combination of a model of shape variation with a model of texture variation generates a statistical appearance model. The model relies on a training set of images, annotated by placing a group of landmark points around the main facial features in each example. The shape is represented by a vector s brought into a common normalized frame — with respect to position, scale and rotation — to which all shapes are aligned. After computing the mean shape $\bar{s}$ and aligning all the shapes from the training set by means of a Procrustes transformation (Dryden & Mardia, 1998), it is possible to warp the textures from the training set onto the mean shape $\bar{s}$, in order to obtain all the shape-free patches. Similarly, after computing the mean shape-free texture $\bar{g}$, the textures in the training set can be normalized with respect to it by scaling and offset of the luminance values. PCA is applied to build the statistical shape and texture models:

\[
s_i = \bar{s} + \Phi_s b_{s_i} \qquad \text{and} \qquad g_i = \bar{g} + \Phi_t b_{t_i} \tag{5.4}
\]
where $s_i$ and $g_i$ are, respectively, the synthesized shape and shape-free texture, $\Phi_s$ and $\Phi_t$ are the matrices describing the modes of variation derived from the training set, and $b_{s_i}$ and $b_{t_i}$ the vectors controlling the synthesized shape and shape-free texture. The unification of the shape and texture models into one complete appearance model is obtained by concatenating the vectors $b_{s_i}$ and $b_{t_i}$, by means of a normalizing matrix $W_s$,

\[
b_i = \begin{pmatrix} W_s b_{s_i} \\ b_{t_i} \end{pmatrix} \tag{5.5}
\]

and learning the correlations between them by means of a further PCA:

\[
b_i = \Phi_c c_i \tag{5.6}
\]

where $\Phi_c$ contains the eigenvectors and $c_i$ is a vector of appearance parameters that simultaneously controls both shape and texture.
The statistical model is then given by:

\[
s_i = \bar{s} + Q_s c_i \qquad \text{and} \qquad g_i = \bar{g} + Q_t c_i \tag{5.7}
\]

where $Q_s$ and $Q_t$ are the matrices describing the principal modes of the combined variations in the training set. Fixing the parameters $c_i$, we derive the shape and shape-free texture vectors using Eq. (5.7); a full reconstruction is given by warping the generated texture onto the generated shape. In order to allow pose displacement of the model, further parameters must be added to the appearance parameters $c_i$: the pose parameters $p_i$. The matching of the appearance model to a target face can be treated as an optimization problem, minimizing the difference between the synthesized model image and the target face (Stegmann, 2000; Cootes, Edwards, & Taylor, 2001; Cootes & Taylor, 2004; Matthews & Baker, 2004).
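As a numerical illustration of Eq. (5.7), the sketch below synthesizes a shape and a shape-free texture from a mean and a set of principal modes. The dimensions and the random matrices are toy stand-ins: in a real AAM, $\bar{s}$, $\bar{g}$, $Q_s$ and $Q_t$ come from the PCA of the annotated training set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a trained AAM: 55 landmarks (110 coordinates), 5000
# texture samples, 5 combined appearance modes.
s_mean = rng.normal(size=110)     # mean shape, bar{s}
g_mean = rng.normal(size=5000)    # mean shape-free texture, bar{g}
Q_s = rng.normal(size=(110, 5))   # shape modes of the combined variation
Q_t = rng.normal(size=(5000, 5))  # texture modes of the combined variation

def synthesize(c):
    """Eq. (5.7): s = bar{s} + Q_s c and g = bar{g} + Q_t c for appearance vector c."""
    c = np.asarray(c, dtype=float)
    shape = s_mean + Q_s @ c
    texture = g_mean + Q_t @ c
    return shape, texture

# Varying a single appearance parameter (cf. Figure 5.8): -3, 0, +3 std of mode 1.
for c1 in (-3.0, 0.0, 3.0):
    shape, texture = synthesize([c1, 0, 0, 0, 0])
    print(c1, shape[:2], texture[:2])
```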
5.5.2. Measures from the FACS
Facial expressions are a visible consequence of facial muscle and autonomic nervous system actions. Ekman and Friesen (1978) proposed the FACS in order to measure all visible movements. Ideally, FACS would differentiate every change in muscular action, but it is limited to what a user can reliably discriminate. A comprehensive system was obtained by discovering how each muscle of the face acts to change visible appearance. With this knowledge it is possible to decompose any facial movement into anatomically based, minimal action units. FACS measurement units are called action units (AUs) and represent the muscular activity that produces momentary changes in facial appearance. A facial expression is indeed a combination of AUs. In particular, there are six basic emotions (happiness, anger, disgust, fear, surprise and sadness) that Keltner and Ekman (2000) postulated as having a distinctive content together with a unique facial expression. Based on the EMFACS (Friesen & Ekman, 1983), the six basic expressions can be described linguistically using Ekman's AUs. Likewise, we adapt the AU-coded descriptions of facial expressions in the EMFACS in order to describe these six expressions. Table 5.1, directly adapted from Friesen and Ekman (1983), illustrates the facial AUs pertaining to the different expressions. Drawing on the work of Zhang and Ji (2005), we group the AUs of facial expressions into primary AUs and auxiliary AUs. The primary AUs are those AUs, or combinations of AUs, that univocally describe one of the six expressions. The auxiliary AUs provide additional support to the expression characterization. This additional support can come from transient features, such as wrinkles and furrows, or from nontransient features, such as measures among facial components. In order to transform the AUs into a set of quantitative measures, Zhang and Ji translate these appearance-change descriptors into a set of geometrical relationships between facial feature points, shown in Figure 5.5(b) and linguistically reported by Zhang and Ji (2005). We use the shape mask provided by the AAM to measure the set of angles and distances detailed in Table 5.2.
Table 5.1: The association of six emotional expressions to AUs, AU combinations and transient features.

Emotional category | Primary visual cues (AUs) | Auxiliary visual cues (AUs) | Transient feature(s)
Happiness | 6, 12 | 16 | Wrinkles on outer eye canthi, presence of nasolabial furrow
Sadness | 1, 15, 17 | 4, 7, 25, 26 | —
Disgust | 9, 10 | 17, 25, 26 | Presence of nasolabial furrow
Surprise | 5, 26, 27, 1+2 | — | Furrows on the forehead
Anger | 2, 4, 7, 23, 24 | 17, 25, 26, 16 | Vertical furrows between brows
Fear | 20, 1+5, 5+7, 25, 26 | 4, 5, 7, 25, 26 | —
In the computation of these measures we need to take into account that there is a large variance in the morphology of human faces. In order to deal with these differences, a shape normalization is required. The AAM framework establishes a coordinate reference to which all the shapes are aligned by filtering out location, scale and rotational effects. The use of this alignment procedure on the detected masks ensures the computation of consistent measures. To complete the FACS visual cues, we describe here the transient features and the measures used to quantify them. Transient wrinkles and furrows are the result of facial muscle movements. These movements produce small ridges in certain face regions. The regions of facial wrinkles and furrows are indicated by rectangles in Figure 5.5(b), and by the curves starting from P and P′ for the nasolabial furrows. The change of wrinkles in region X is directly related to AU9 (Nose Wrinkler). The furrows in regions Z, Y, V and U provide diagnostic information for the identification of AU1 (Inner Brow Raiser) and AU2 (Outer Brow Raiser), AU4 (Brow Lowerer), AU6 (Cheek Raiser) and AU17 (Chin Raiser), respectively. In order to detect these features, the edge detection with embedded confidence proposed by Meer and Georgescu (2001) is used. The detection is subsequently refined by analysing the direction of the extracted edges: referring to Figure 5.5(b), wrinkles in regions Z and X should be mostly horizontal, while those in region Y mostly vertical. Figure 5.6 shows examples of transient feature detection. The ratio between edge pixels (wrinkles) and background pixels (skin) is used to measure and detect the presence of wrinkles in regions X, Y and Z. For the nasolabial furrows, the areas of interest are those reported in Figure 5.7(a). These regions, as well as all the other transient areas, are automatically detected using the AAM landmarks. Figure 5.7 shows the possible configurations for the nasolabial region: absence of the nasolabial furrow (Figure 5.7(a)), furrow due to cheek raising (Figure 5.7(b)–(c), AU6) and furrow due to nose wrinkling or upper lip raising (Figure 5.7(d), AU9, AU10). If the analysis of the longest connected edge in the two nasolabial regions (Figure 5.7(a)) reveals the presence of furrows,
Table 5.2: Correspondences between measures on masks 5(b) and 5(a).

FACS measure | Measure on mask 5(a) | Explanatory variable
JJ′ | Dist(P5, P6) | EVF1
JF | Dist(P6, P19) | EVF2
J′F′ | Dist(P5, P15) | EVF3
KG (l8) | Dist(P8, P25) | EVF4
K′G′ | Dist(P3, P17) | EVF5
GI (l6) | Dist(P25, P21) | EVF6
G′I′ | Dist(P13, P17) | EVF7
PF | Dist(P19, P42) | EVF8
P′F′ | Dist(P15, P37) | EVF9
FC | Dist(P19, P31) | EVF10
F′C′ | Dist(P15, P27) | EVF11
FD (l4) | Dist(P25, P29) | EVF12
F′D | Dist(P17, P29) | EVF13
OD | Dist((P39 + P40)/2, P29) | EVF14
OB | Dist((P39 + P40)/2, P33) | EVF15
DB | Dist(P29, P33) | EVF16
C′C | Dist(P27, P31) | EVF17
∠FHJ | ∠P19P23P6 | EVF18
∠F′H′J′ | ∠P15P11P5 | EVF19
∠HFI | ∠P23P19P21 | EVF20
∠H′F′I′ | ∠P11P15P13 | EVF21
∠HGF | ∠P23P25P19 | EVF22
∠H′G′F′ | ∠P15P17P11 | EVF23
Nose wrinkles 6(a) | Presence detection | EVF24
Eyes wrinkles 6(b) | Presence detection | EVF25
Forehead wrinkles 6(c) | Presence detection | EVF26
Nasolabial fold 6(d) | Presence detection | EVF27
then the extracted curve is approximated by a quadratic equation, y = ax² + bx + c. The approximated curve is obtained by fitting the set of nasolabial furrow pixels to y using the least-squares method, similarly to Zhang and Ji (2005). The coefficient a represents the curvature of the nasolabial fold. According to its value we can detect and encode the presence of the nasolabial furrows as follows: a > 0, as shown in Figure 5.7(b), contributes to AU6 and to happiness-like expressions; a < 0 with the vertex x = −b/2a being a pixel belonging to the detected furrow, as indicated by the red curve in Figure 5.7(c), is again connected to AU6; a < 0 with no vertex on the visible curve, as shown in Figure 5.7(d), supports AU9 and AU10 and hence disgust-like expressions.
Figure 5.6: Transient feature detection: (a) horizontal wrinkles between eyes, (b) vertical furrows between brows, (c) horizontal wrinkles on the forehead and (d) nasolabial fold.

The measures concerning regions V and U are discarded for two main reasons: (1) the related wrinkles are not always detectable in subjects; (2) they are redundant, since they are strictly linked to wrinkles and furrows in the retained regions.
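The sketch below illustrates the quadratic fit and the three-way encoding of the nasolabial furrow just described; the furrow pixel coordinates are invented for illustration, whereas in the chapter they come from the edge detector applied to the nasolabial regions.

```python
import numpy as np

def classify_nasolabial(xs, ys):
    """Fit y = a*x^2 + b*x + c to furrow pixels (least squares) and classify
    by the curvature a and the vertex x = -b/(2a), as in Section 5.5.2."""
    a, b, c = np.polyfit(xs, ys, deg=2)
    if a > 0:
        return "AU6 (happiness-like): upward curvature"
    vertex_x = -b / (2.0 * a)
    # Approximate "vertex belongs to the furrow" by checking that the vertex
    # falls within the x-range of the detected pixels.
    if xs.min() <= vertex_x <= xs.max():
        return "AU6 (happiness-like): downward curvature with visible vertex"
    return "AU9/AU10 (disgust-like): downward curvature, no visible vertex"

# Invented furrow pixels with upward curvature (a > 0).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 1.2, 1.0, 1.3, 2.1])
print(classify_nasolabial(xs, ys))
```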
5.5.3. Expression Descriptive Units (EDUs)
In the visual perception community there is general agreement on the fact that face recognition is the result of two main sources of information: featural, coming from individual facial features (mouth, nose etc.), and configural, related to the facial layout and configuration of the previous features (Farah et al., 1998; Cabeza & Kato, 2000). The measures derived from the FACS give information about isolated components of a face, providing a featural contribution to face representation. According to the hypothesis of configural encoding, the spatial relationships between facial components provide additional sources of information in the analysis of facial expressions. In order to exploit the combination of these two useful sources, we have decided to add a group of measures encoding the interactions among the featural descriptors shown in Figure 5.5(c). For that purpose we use the set of measures, called EDUs, reported in Table 5.3 and introduced by the authors in Antonini et al. (2006). The first five EDUs represent, respectively, the eccentricity of the eyes, the left and right eyebrows, the mouth and the nose. The EDUs from 6 to 9 represent the
Figure 5.7: Possible nasolabial furrow scenarios: (a) nasolabial furrow absence and the two monitored regions around landmarks 37 and 42; (b) nasolabial furrow curve parameterized by a > 0 and associated to AU6; (c) nasolabial furrow curve characterized by a < 0 and vertex x = −b/2a, associated to AU6; (d) nasolabial furrow curve characterized by a < 0 and no vertex on the visible curve, associated to AU9 and AU10.
eyes' interactions with the mouth and nose, while the 10th EDU is the nose–mouth relational unit. The last four EDUs relate the eyebrows to the mouth and nose. The EDUs can be intuitively interpreted: for example, in a face displaying a surprise expression, the eyes and the mouth are usually open, and this can be captured by EDU7 (eye height/mouth height).
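As an illustration of Table 5.3, the following sketch computes a few EDUs from component widths and heights; the input measurements are invented, while in the chapter they are derived from the 55 AAM landmarks.

```python
def compute_edus(lew, leh, rew, reh, lbw, lbh, rbw, rbh, mw, mh, nw, nh):
    """A subset of the Expression Descriptive Units of Table 5.3.
    Abbreviations: l/r = left/right, e = eye, b = brow, m = mouth, n = nose,
    w = width, h = height."""
    edu = {}
    edu[1] = (lew + rew) / (leh + reh)   # eye eccentricity
    edu[4] = mw / mh                     # mouth eccentricity
    edu[5] = nh / nw                     # nose eccentricity
    edu[7] = leh / mh                    # eye height vs. mouth height
    edu[10] = nw / mw                    # nose-mouth relational unit
    return edu

# Invented measurements (in normalized shape coordinates) for a surprise-like
# face: open eyes and open mouth, so EDU4 is low and EDU7 stays moderate.
print(compute_edus(lew=30, leh=14, rew=30, reh=14, lbw=35, lbh=8,
                   rbw=35, rbh=8, mw=40, mh=30, nw=25, nh=40))
```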
5.5.4. Appearance Vector (c)
The FACS and EDU measures describe local facial features or areas that are prone to change with facial expressions, but they do not provide a description of the face as a global entity. This information can be obtained by considering the appearance vector c matching the face in the processed image.
Table 5.3: Expression descriptive units.

EDU measure | Measure definition | Explanatory variable
EDU1 | (lew + rew)/(leh + reh) | EVE28
EDU2 | lbw/lbh | EVE29
EDU3 | rbw/rbh | EVE30
EDU4 | mw/mh | EVE31
EDU5 | nh/nw | EVE32
EDU6 | lew/mw | EVE33
EDU7 | leh/mh | EVE34
EDU8 | (leh + reh)/(lbh + rbh) | EVE35
EDU9 | lew/nw | EVE36
EDU10 | nw/mw | EVE37
EDU11 | EDU2/EDU4 | EVE38
EDU12 | EDU3/EDU4 | EVE39
EDU13 | EDU2/EDU10 | EVE40
EDU14 | EDU3/EDU10 | EVE41
Figure 5.8: Examples of synthesized faces obtained varying the first c parameter from the mean face (±3 std).
Figure 5.8 shows the effect of varying the first appearance model parameter, producing changes in both identity and expression.
5.6. Models Specification

In this section we focus on the deterministic part $V_i$ of the random utility function (see Eq. (5.1)). Any alternative i can be described by a combination of a certain number of attributes $EV_i$, reflecting reasonable hypotheses about the effects of these variables on the corresponding utility. We propose three models of
increasing complexity:

\[
V_j = \mathrm{ASC}_j + \underbrace{\sum_{k=1}^{K_F} I^F_{kj}\, \beta^F_{kj}\, EV^F_k}_{\text{FACS Model}} + \underbrace{\sum_{h=1}^{K_E} I^E_{hj}\, \beta^E_{hj}\, EV^E_h}_{\text{FACS + EDU Model}} + \underbrace{\sum_{l=1}^{K_C} I^C_{lj}\, \beta^C_{lj}\, EV^C_l}_{\text{FACS + EDU + C (FEC) Model}} \tag{5.8}
\]
where j ∈ {‘happiness’, ‘surprise’, ‘fear’, ‘disgust’, ‘sadness’, ‘anger’, ‘neutral’, ‘other’, ‘I don’t know’}; {F, E, C} refer, respectively, to the FACS measures, the EDUs and the appearance parameters c; $EV^{\{F,E,C\}}_{\{k,h,l\}}$ denotes the {k, h, l}-th explanatory variable of the corresponding set; $K_{\{F,E,C\}}$ is the total number of explanatory variables in each set; $I^{\{F,E,C\}}_{kj}$ is an indicator function equal to 1 if the k-th explanatory variable is included in the utility of alternative j and 0 otherwise; $\beta^{\{F,E,C\}}_{kj}$ is the weight of the k-th explanatory variable in alternative j; and $\mathrm{ASC}_j$ is an alternative-specific constant. The ASC coefficients represent the average value of the unobserved part of the corresponding utility and are added to each utility. For the model to be identified, one of the constants must be normalized to zero. In our case the neutral alternative is taken as the reference alternative and its ASC is set to zero. In addition, neutral is a ‘default’ expression, corresponding to a full relaxation of the facial muscles; the features of a neutral face are therefore supposed to be at their basic level. Consequently, in the developed DCMs the deterministic utility associated with the neutral expression is fixed to zero. The ‘I don’t know’ alternative was introduced in the survey in order to avoid collecting noise; in the models, its utility contains only an ASC, because no clear causal effect can be identified. This is not the case for the ‘Other’ alternative, which represents a set of expressions: principal features, corresponding to the principal AUs, are introduced in its deterministic utility. The utility specifications of the different models are presented in Table 5.A.1 in Appendix 5.A. The first version of the systematic utility functions (FACS Model in Eq. (5.8)) includes the explanatory variables associated with the local measures defined by the AUs. In the second step, the local interactions between facial features provided by the EDUs are also included (FACS + EDU Model in Eq. (5.8)). In the last model, the c appearance parameters, encoding global measures of the face, are added to the two previous sets of measures (FEC Model in Eq. (5.8)). The first five c parameters, which capture 75% of the total variance in the AAM training set, are introduced in the utility functions using alternative-specific parameters.
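A minimal sketch of the structure of Eq. (5.8): each alternative's deterministic utility is an ASC plus the weighted explanatory variables selected by its indicators. The variable names and coefficient values below are hypothetical, not the estimates of Table 5.4.

```python
# Hypothetical specification fragment in the spirit of Eq. (5.8): each utility
# lists only the explanatory variables whose indicator I_kj equals 1.
SPEC = {
    "happiness": {"ASC": 1.2, "EVF17_mouth_width": 0.9},
    "surprise":  {"ASC": 0.5, "EVE31_mouth_ecc": 0.7},
    "neutral":   {"ASC": 0.0},          # reference: utility fixed to zero
    "I don't know": {"ASC": -1.5},      # ASC only, no causal variables
}

def utility(alternative, features):
    """V_j = ASC_j + sum_k beta_kj * EV_k over the variables included for j."""
    coeffs = SPEC[alternative]
    v = coeffs["ASC"]
    for name, beta in coeffs.items():
        if name != "ASC":
            v += beta * features.get(name, 0.0)
    return v

features = {"EVF17_mouth_width": 1.4, "EVE31_mouth_ecc": 0.6}
for alt in SPEC:
    print(alt, round(utility(alt, features), 3))
```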
5.7. Model Estimation

The models introduced in the previous section have been estimated by maximum likelihood using the free Biogeme package (Bierlaire, 2003).
Table 5.4: Estimation results for the FACS, FACS + EDU and FACS + EDU + C models.

Parameter | F model: estimate (t-test 0) | FE model: estimate (t-test 0) | FEC model: estimate (t-test 0)
β17H (mouth width, happiness) | +103 (+56.81) | +34 (+4.98) | +105 (+37.67)
β31SU (EDU4, surprise) | — | +8.12 (+48.3) | +6.89 (+39.59)
β46A (c5, anger) | — | — | −9.67 (−11.13)

Sample size | 38,110 | 38,110 | 38,110
No. of estimated parameters | 93 | 120 | 139
Null log-likelihood | −83,736.229 | −83,736.229 | −83,736.229
Final log-likelihood | −57,072.872 | −55,027.381 | −53,474.271
Likelihood ratio test | 53,326.712 | 57,417.695 | 60,523.915
Adjusted rho-squared | 0.317 | 0.341 | 0.360

Note: β^F_17H = mouth width, happiness; β^FE_31SU = EDU4, surprise; β^FEC_46A = c5, anger.
In Table 5.4 we report the final coefficient estimates for a selection of the β parameters of the three models. In the first half of the table, each row relates a particular β, for a specific model, to its estimated coefficient and the associated t-statistic. The second half of the table shows summary statistics for the entire estimation run for each of the three models. The signs of the parameters are consistent with the common reading of facial expressions in terms of facial component modifications. A parameter is considered significant if the absolute value of its t-test against 0 exceeds 1.96, corresponding to the 95% significance level. β^F_17H is the coefficient of the mouth width measure for the happiness expression. It is a FACS parameter and is included in all the specifications. Its positive value shows a positive impact on the respective utility: an increase of the mouth width with respect to the neutral expression (the reference in our model) corresponds to a higher utility for the happiness alternative. The β^F_17H estimate is in line with the FACS expectations for the happiness expression. The first row in Table 5.1 describes the FACS happiness encoding in terms of the primary AUs 6 and 12. During an AU12, a stretching of the mouth's lip corners is expected; this indeed corresponds to an increase of the measure C′C associated with the estimated parameter β^F_17H and representing the mouth width. β^FE_31SU is the parameter related to EDU4 (Table 5.3), describing the mouth eccentricity, in the surprise alternative. Its positive sign reflects the expected behaviour of the mouth in subjects performing a surprise expression, where the mouth movement leads to a lower mouth height and a greater mouth width, with respect to the reference alternative. The third parameter, β^FEC_46A, is the coefficient of the fifth appearance parameter c for the anger utility. Since this coefficient is negative, larger values of c5 reduce the anger utility. We can visually interpret this result by looking at Figure 5.9: taking the neutral c5 value as the reference, increasing this parameter (leaving the others unchanged) moves towards a happiness-like expression, whereas an anger-like face corresponds to values of c5 smaller than the reference one. The statistics concerning the goodness of fit of the three models are reported in the second half of Table 5.4. The fit of the second model is better than that of the first (higher log-likelihood and adjusted ρ²), and the same holds for the third model with respect to the second. The proposed models have been built in a nested way: the first model is a restricted version of the second, and the second a restriction of the third. In this case, a likelihood ratio test (Ben-Akiva & Lerman, 1985) can be used to verify whether the additional variables of the unrestricted model add significant explanatory power and compensate for the degrees of freedom used by the fuller specification. The null hypothesis of this test states that the restricted and unrestricted models are equivalent. The test statistic is

\[
-2\left(L(\hat{\beta}_R) - L(\hat{\beta}_U)\right) \sim \chi^2_{K_U - K_R} \tag{5.9}
\]
Figure 5.9: Example of the effect of varying the c5 value. Increasing this parameter (leaving the others unchanged) moves towards a happiness-like expression, whereas an anger-like face corresponds to values of c5 smaller than the reference one.

Table 5.5: Summary of the performed likelihood ratio tests.

Performed test | Degrees of freedom | Test value | χ² threshold (95%)
F vs. FE | 27 | 4090.98 | 40.11
FE vs. FEC | 19 | 3106.22 | 30.14
where $K_i$ is the number of parameters of model i and $\chi^2_j$ denotes a chi-squared distribution with j degrees of freedom. Usually a significance level of 95% is taken, and the null hypothesis is rejected if the test value is above the threshold provided by the $\chi^2$ distribution with the corresponding degrees of freedom. The results of this test are reported in Table 5.5. The performed tests refer to the two possible (restricted, unrestricted) model pairs. The first test shows that the inclusion of the new parameters makes the unrestricted FE model significantly different from its restricted counterpart, the F model. This result justifies the second test, comparing the most complex model (FEC) with its restricted version (FE) and showing that the model using the whole set of three groups of explanatory variables can be retained as the final model that best fits our data.
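Using the final log-likelihoods and parameter counts of Table 5.4, the statistics of Table 5.5 can be recomputed directly; the sketch below performs the two likelihood ratio tests of Eq. (5.9), assuming scipy is available for the χ² quantiles.

```python
from scipy.stats import chi2

# Final log-likelihoods and parameter counts from Table 5.4.
models = {"F": (-57072.872, 93), "FE": (-55027.381, 120), "FEC": (-53474.271, 139)}

def lr_test(restricted, unrestricted, alpha=0.05):
    """-2*(L_R - L_U) ~ chi2 with K_U - K_R degrees of freedom (Eq. 5.9)."""
    ll_r, k_r = models[restricted]
    ll_u, k_u = models[unrestricted]
    stat = -2.0 * (ll_r - ll_u)
    df = k_u - k_r
    threshold = chi2.ppf(1.0 - alpha, df)
    return stat, df, threshold, stat > threshold

for pair in (("F", "FE"), ("FE", "FEC")):
    stat, df, thr, reject = lr_test(*pair)
    print(f"{pair[0]} vs {pair[1]}: stat={stat:.2f}, df={df}, "
          f"chi2 threshold={thr:.2f}, reject H0={reject}")
```

The output reproduces the values of Table 5.5 (4090.98 with threshold 40.11, and 3106.22 with threshold 30.14), so the null hypothesis is rejected in both tests.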
5.8. Extension to Dynamic Facial Expression Recognition The Discrete Choice framework used for static facial expression recognition is extended in order to consider face video sequences instead of images. An Internet
survey similar to the one described in Section 5.3.2 is currently being conducted to collect expression labels on face video sequences. It is available at http://transp-or2.epfl.ch/videosurvey/. Two video databases are used: the Cohn–Kanade database (Kanade et al., 2000), also used in the static case, and the Facial Expressions and Emotions Database (FEED) (Wallhoff, 2004). The dynamic model is inspired by car lane-changing models (Choudhury, 2007) and is a direct application of DCM with latent segmentation (Walker, 2001). We hypothesise that the respondent's expression perception evolves while watching the video. In addition, we consider that the influence of the video frames on the respondent's perception varies depending on their dynamics. Considering perception as evolving at each frame is not realistic: the transition between frames is too fast, as the frame rate is 25 frames/s. Consequently, a perception evolution time step equal to 1 s is defined. The sequence is therefore sampled by selecting the first frame of each group of 25 frames, and the features of each frame group are those of its first frame. By extension, in the following we refer to a group of frames as a frame. The dynamic facial expression recognition model consists of a combination of two DCMs. A perception state, corresponding to the respondent's facial expression perception, is associated with each time step. A first DCM is used to quantify this perception; its choice set is composed of the nine expressions used in the static case, so it is similar to the static model. The second DCM quantifies the influence of the frames on the respondent's observed facial expression choice. The choice set in this case is composed of the frames of the labelled video, so it varies from one video to another. Note that both models are based on latent concepts: neither the respondent's instantaneous perception nor the frame influences are observed; only the video expression choice is observed. The probability for respondent n to choose expression i when watching frame t of video sequence o is written $P_n(i|t, o)$ (first DCM). The probability for respondent n to make her expression choice when watching frame t of video sequence o is $P_n(t|o)$ (second DCM). The two DCMs are linked by the probability that respondent n labels video o with expression i, called $P_n(i|o)$. This relation can be expressed as:

\[
P_n(i|o) = \sum_{t=1}^{T_o} P_n(i|t,o)\, P_n(t|o) \tag{5.10}
\]
$T_o$ being the video duration in seconds. As shown for the static model, $P_n(i|t, o)$ is quite universal, in the sense that for the moment no clear socio-economic characteristic seems to interact with expression perception. We expect that this is not the case for $P_n(t|o)$, which should strongly depend on the respondent n: the perceived frame dynamics depend on the respondent's current attention. This leads us to take the panel data effect into account. $x_n$ is defined as a random term specific to respondent n, so Eq. (5.10) can be transformed as:

\[
P_n(i|o, x_n) = \sum_{t=1}^{T_o} P_n(i|t,o)\, P_n(t|o, x_n) \tag{5.11}
\]
In order to obtain a closed form of $P_n(i|o, x_n)$, we need to integrate over $x_n$. By default, $x_n$ is supposed to be normally distributed $N(0, \sigma)$; $f(x)$ is the probability density function of $x_n$, and $O_n$ the number of observations associated with respondent n. By integration we obtain $P_n(i|o)$:

\[
\prod_{o=1}^{O_n} P_n(i|o) = \int \prod_{o=1}^{O_n} \sum_{t=1}^{T_o} P_n(i|t,o)\, P_n(t|o, x_n)\, f(x)\, dx \tag{5.12}
\]
Theoretically, $P_n(i|t, o)$ can be of any DCM type, such as multivariate extreme value (MEV) or mixtures of logit models. But, as mentioned before, the model is designed for exactly the same purpose as the static model, so in a first step a simple logit model will be used and the utility specification will be close to the one proposed in the static version of the model. In a second step, the utilities will take into account the perception memory effect. Concerning $P_n(t|o, x_n)$, it is a mixture of logit models, due to the panel data effect term. We prefer a fairly simple model form, such as a mixture of logit models rather than a mixture of MEV models, because the correlation between frames is difficult to define: the number of frames varies from one video to another. The utility specification has to contain attributes that reflect the frame dynamics, such as derivatives of the attributes used in the first DCM. The choice of a simple correlation structure is also motivated by the fact that both models are estimated jointly by likelihood maximization, as for a classical DCM; the combination of such models can imply strong nonlinearities in the likelihood function, and the optimization algorithm has to deal with such difficulties. If we call β the vector of parameters we want to estimate, the likelihood l(β) has the following form:

\[
l(\beta) = \prod_{n=1}^{N} \prod_{o=1}^{O_n} P_n(i|o; \beta) \tag{5.13}
\]

By combining Eqs. (5.12) and (5.13) we obtain

\[
l(\beta) = \prod_{n=1}^{N} \int \prod_{o=1}^{O_n} \sum_{t=1}^{T_o} P_n(i|t,o;\beta)\, P_n(t|o,x_n;\beta)\, f(x)\, dx \tag{5.14}
\]
For numerical reasons, the logarithm of the likelihood function,

\[
L(\beta) = \sum_{n=1}^{N} \log \left( \int \prod_{o=1}^{O_n} \sum_{t=1}^{T_o} P_n(i|t,o;\beta)\, P_n(t|o,x_n;\beta)\, f(x)\, dx \right) \tag{5.15}
\]

is used instead of l(β) during the estimation process. An extension of the Biogeme software (Bierlaire, 2003) will be implemented to estimate such models, the optimization toolbox remaining the same. We conclude this section by underlining the fact that the model specification will depend on the number of observations provided by the Internet video survey.
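The integral in Eq. (5.15) has no closed form, so in practice it can be approximated by simulation, as is standard for mixtures of logit models. The sketch below illustrates such a simulated log-likelihood with normal draws of $x_n$; the two probability functions are hypothetical toy stand-ins for the expression and frame-influence models, not the chapter's actual specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_expr(i, t, o):
    """Toy stand-in for P_n(i|t,o): uniform over the nine expressions."""
    return 1.0 / 9.0

def p_frame(t, o, x, T_o=5):
    """Toy stand-in for P_n(t|o,x): logit over frames, shifted by the panel term x."""
    util = 0.2 * np.arange(1, T_o + 1) * (1.0 + x)
    w = np.exp(util - util.max())
    return w[t - 1] / w.sum()

def simulated_loglik(observations, sigma=1.0, n_draws=200):
    """Monte Carlo approximation of Eq. (5.15): draws of x_n ~ N(0, sigma)
    replace the integral; the sum over t and product over o follow Eq. (5.14)."""
    loglik = 0.0
    for obs_n in observations:                   # respondents n = 1..N
        draws = rng.normal(0.0, sigma, n_draws)
        sim = np.ones(n_draws)
        for i, o, T_o in obs_n:                  # videos o = 1..O_n
            per_draw = np.array([
                sum(p_expr(i, t, o) * p_frame(t, o, x, T_o)
                    for t in range(1, T_o + 1))
                for x in draws])
            sim *= per_draw                      # product over observations
        loglik += np.log(sim.mean())             # average over draws, then log
    return loglik

# Two hypothetical respondents labelling 5-second videos.
obs = [[(1, 0, 5), (3, 1, 5)], [(2, 0, 5)]]
print(simulated_loglik(obs))
```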
At present the database contains 500 observations. This small number requires the number of alternative-specific parameters in the perception model to be reduced compared with the static version of the model.
5.9. Conclusion and Discussion

We have proposed a new method for facial expression modelling, based on discrete choice analysis. The data from the facial expression evaluation survey suggested that a subjective component biases the labelling process, requiring a detailed statistical analysis of the collected data. The DCM paradigm matches the human observer's labelling procedure well, allowing us to capture and model the subjective perception of the decision makers. In the static case, we showed how to improve the descriptiveness of the model by sequentially introducing complementary sets of features. The estimation of the three proposed models confirmed the relevance of the chosen sets of features, revealing the best fitting behaviour of the third and most complex model. This work represents one of the first attempts to apply discrete choice analysis to the modelling of facial expressions. Several directions for improvement are possible. First, a deeper understanding of the choice process can be achieved by exploring the personal characteristics of the decision maker. The heterogeneity in the respondent population of the survey will allow the investigation and interpretation of these human factors. For that purpose, the socio-economic characteristics can be analysed and introduced in the utility functions as categorical variables. Such an analysis would overcome another shortcoming of previous approaches, where humans are usually modelled as invariants and not as individuals. While modelling invariants is fundamental for most machine learning or pattern recognition problems, in perception it is also important to ask how people differ. A further investigation of the parameters involved in the decision maker's choice process can be obtained by segmenting the population: instead of introducing a parameter for each socio-economic attribute, the population is divided with respect to that attribute. For example, the behaviour of men and women can be explored by analysing the two groups separately. Secondly, other families of DCM can be used. As described in Section 5.4, the utility of each alternative is a random variable containing a systematic part and a random part, and different assumptions about the random term give rise to different models. The MNL model assumes no correlation between alternatives; this hypothesis can be relaxed by considering Nested (Daly & Zachary, 1978) and Cross-Nested (Bierlaire, 2006) models. Finally, the static discrete choice framework has been extended to the dynamic case. A model composed of two discrete choice sub-models is proposed, one of them similar to the model used in the static version, the other measuring the influence of each video frame. The dynamic model is an adaptation of the DCM with latent segmentation proposed by Walker (2001).
References Abboud, D., & Davoine, F. (2004). Appearance factorization based facial expression recognition and synthesis. ICPR, 4, 163–166. Antonini, G., Sorci, M., Bierlaire, M., & Thiran, J. (2006). Discrete choice models for static facial expression recognition. In: J. Blanc-Talon, W. Philips, D. Popescu, & P. Scheunders (Eds), 8th International Conference on Advanced Concepts for Intelligent Vision Systems, Vol. 4179 of Lecture Notes in Computer Science. Berlin/Heidelberg, Berlin: Springer (pp. 710–721). ISBN: 978-3-540-44630-9. Bartlett, M. S., Littlewort, G., Frank, M., Lainscsekl, C., Fasel, I., & Movellan, J. (2006). Fully automatic facial action recognition in spontaneous behavior. FGR ‘06: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society, Washington, DC (pp. 223–230). Bassili, J. N. (1978). Facial motion in the perception of faces and of emotional expression. Journal of Experimental Psychology. Human Perception and Performance, 4(3), 373–379. Ben-Akiva, M. E., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press. Bicego, M., Salah, A. A., Grosso, E., Tistarelli, M., & Akarun, L. (2007). Generalization in holistic versus analytic processing of faces. ICIAP ‘07: Proceedings of the 14th International Conference on Image Analysis and Processing, IEEE Computer Society, Washington, DC (pp. 235–240). Bierlaire, M. (2003). BIOGEME: A free package for the estimation of discrete choice models. Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland. Bierlaire, M. (2006). A theoretical analysis of the cross-nested logit model. Annals of Operations Research, 144(1), 287–300. Cabeza, R., & Kato, T. (2000). Features are also important: Contributions of featural and configural processing to face recognition. Psychological Science, 11, 429–433. Carey, S. (1992). Becoming a face expert. Philosophical Transactions of the Royal Society of London, B, 335, 95–103. Chang, Y., Hu, C., & Turk, M. (2004). Probabilistic expression analysis on manifolds. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 2004 (Vol. 2, pp. II-520–II-527). Choudhury, C. F. (2007). Model driving decisions with latent plans. Ph.D. thesis, Massachusetts Institute of Technology. Cohen, I., Sebe, N., Chen, L., Garg, A., & Huang, T. S. (2003). Facial expression recognition from video sequences: Temporal and static modeling. Computer Vision and Image Understanding, 10, 160–187. Cohn, J. F. (2006). Foundations of human computing: Facial expression and emotion. ICMI ‘06: Proceedings of the 8th international conference on Multimodal interfaces, ACM, New York, NY (pp. 233–238). Cootes, T., & Taylor, C. (2004). Statistical models of appearance for computer vision. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.4759 Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 681–685. Daly, A. J., & Zachary, S. (1978). Improved multiple choice. In: D. A. Hensher & M. Q. Dalvi (Eds), Determinants of travel demand. Sussex: Saxon House. Diamond, R., & Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology: General, 115(2), 107–117.
Dryden, I. L., & Mardia, K. V. (1998). Statistical shape analysis. New York: Wiley. Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6(3), 169–200. Ekman, P., Friesen, W., & Hager, J. (2002). Facial action coding system. Salt Lake City, UT: Research Nexus, Network Research Information. Ekman, P., & Friesen, W. V. (1978). Facial action coding system investigator’s guide. Palo Alto, CA: Consulting Psychologist Press. Farah, M. J., Wilson, K. D., Drain, M., & Tanaka, J. N. (1998). What is ‘‘special’’ about face perception? Psychological Review, 105(3), 482–498. Fasel, B., & Luettin, J. (2003). Automatic facial expression analysis: A survey. Pattern Recognition, 36(1), 259–275. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 23–27. Friesen, W., & Ekman, P. (1983). Emfacs-7: Emotional facial action coding system. San Francisco, CA: University of California. Hu, C., Chang, Y., Feris, R., & Turk, M. (2004). Manifold based analysis of facial expression. CVPRW ‘04: Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’04) Volume 5, IEEE Computer Society, Washington, DC (p. 81). Kanade, T., Cohn, J., & Tian, Y. L. (2000). Comprehensive database for facial expression analysis. Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG’00) (pp. 46–53). Keltner, D., & Ekman, P. (2000). Facial expression of emotion. In: M. Lewis & J. M. Havilland (Eds), Handbooks of emotions (pp. 236–249). li Tian, Y., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 97–115. li Tian, Y., Kanade, T., & Cohn, J. F. (2002). Evaluation of gabor-wavelet-based facial action unit recognition in image sequences of increasing complexity. Proceedings of Fifth IEEE International Conference on Automatic Face and Gesture Recognition (pp. 229–234). Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with gabor wavelets. FG ‘98: Proceedings of the 3rd International Conference on Face & Gesture Recognition, IEEE Computer Society, Washington, DC (p. 200). Manski, C. (1977). The structure of random utility models. Theory and Decision, 8, 229–254. Manski, C. F., & McFadden, D. (1981). Econometric models of probabilistic choice. In: C. F. Manski & D. McFadden (Eds), Structural analysis of discrete data with econometric applications (pp. 198–272). Cambridge: MIT Press. Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(1), 135–164. Medioni, G., & Kang, S.-B. (2004). Emerging topics in computer vision. Upper Saddle River, NJ: Prentice Hall PTR. Meer, P., & Georgescu, B. (2001). Edge detection with embedded confidence. Transactions on Pattern Analysis and Machine Intelligence, 23(12), 1351–1365. Meulders, M., Boeck, P. D., Mechelen, I. V., & Gelman, A. (2005). Probabilistic feature analysis of facial perception of emotions. Journal of The Royal Statistical Society Series C, 54(4), 781–793. Moses, Y., Ullman, S., & Edelman, S. (1996). Generalization to novel images in upright and inverted faces. Perception, 25, 443–462. Padgett, C., & Cottrell, G. W. (1998). A simple neural network models categorical perception of facial expressions. Proceedings of the Twentieth Annual Cognitive Science Conference, Erlbaum (pp. 806–807).
Pantic, M., & Bartlett, M. (2007). Machine analysis of facial expressions. In: K. Delac & M. Grgic (Eds), Face recognition (Chapter 20, pp. 377–416). Vienna, Austria: I-Tech Education and Publishing. Pantic, M., & Patras, I. (2006). Dynamics of facial expression: Recognition of facial actions and their temporal segments from face profile image sequences. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 36(2), 433–449. Pantic, M., Valstar, M. F., Rademaker, R., & Maat, L. (2005). Web-based database for facial expression analysis. IEEE International Conference on Multimedia and Expo (ICME), IEEE (pp. 317–321). Pentland, A., Moghaddam, B., & Starner, T. (1994). View-based and modular eigenspaces for face recognition. Proceedings of the Fourth International Conference on Computer Vision (pp. 84–91). Rowley, H., Baluja, S., & Kanade, T. (1998). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 23–38. Schneiderman, H., & Kanade, T. (2000). A statistical model for 3d object detection applied to faces and cars. IEEE Conference on Computer Vision and Pattern Recognition. Schwaninger, A. (2003). Perception and representation of faces. Ph.D. thesis, Universität Zürich. Schwaninger, A., Lobmaier, J. S., & Collishaw, S. M. (2002). Role of featural and configural information in familiar and unfamiliar face recognition. BMCV '02: Proceedings of the Second International Workshop on Biologically Motivated Computer Vision, Springer-Verlag, London, UK (pp. 643–650). Sorci, M., Antonini, G., Thiran, J.-P., & Bierlaire, M. (2007). Facial expressions evaluation survey. Technical Report. ITS. Stegmann, M. B. (2000). Active appearance models: Theory, extensions and cases. Master's thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby. Sung, K. K., & Poggio, T. (1998). Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 39–51. Tian, Y.-L., Kanade, T., & Cohn, J. (2003). Facial expression analysis. In: S. Z. Li & A. K. Jain (Eds), Handbook of face recognition. New York, NY: Springer. Valstar, M. F., & Pantic, M. (2007). Combined support vector machines and hidden Markov models for modeling facial action temporal dynamics. ICCV-HCI (pp. 118–127). Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154. Walker, J. L. (2001). Extended discrete choice models: Integrated framework, flexible error structures, and latent variables. Ph.D. thesis, Massachusetts Institute of Technology. Wallhoff, F. (2004). FGnet — facial expression and emotion database. Technical Report, Technische Universität München. URL: http://www.mmk.ei.tum.de/waf/fgnet/feedtum.html Wallraven, C., Schwaninger, A., & Bülthoff, H. H. (2005). Learning from humans: Computational modeling of face recognition. Network, 16(4), 401–418. Yang, M.-H., Kriegman, D. J., & Ahuja, N. (2002). Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1), 34–58. Ye, J., Zhan, Y., & Song, S. (2004). Facial expression features extraction based on gabor wavelet transformation. IEEE International Conference on Systems, Man and Cybernetics (pp. 10–13). Zhang, Y., & Ji, Q. (2005). Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 699–714.
Zhang, Z., Lyons, M., Schuster, M., & Akamatsu, S. (1998). Comparison between geometrybased and gabor-wavelets-based facial expression recognition using multi-layer perceptron. FG ‘98: Proceedings of the 3rd. International Conference on Face & Gesture Recognition, IEEE Computer Society, Washington, DC (p. 454).
Appendix 5.A. Specification Table with Estimated Parameters

The values of the estimated parameters are presented in Table 5.A.1. The first and second columns give the parameter name and its associated feature. Columns three to eleven indicate the utilities in which each parameter appears. Finally, columns 12–14 show the estimated values and t-tests against zero for the three models. Note that if a parameter is not present in one of the models, the corresponding cell is empty.
Table 5.A.1: Details of model specifications. [Multi-page table listing parameters b1–b168 with their associated features (constants; clusters C1–C5; education terms EDU5–EDU10; facial measurements such as RAP brow/eye/mouth, brow and eye distances and angles, eye–mouth and eye–nose distances, forehead, eye heights, mouth height and width, and nose measurements), the utilities in which each parameter enters (H, SU, F, D, SA, A, N, O, DK), and the estimates with t-tests against zero for the F, FE and FEC models.]
PART II: DATA COLLECTION
Chapter 6
Serial Choice Conjoint Analysis for Estimating Discrete Choice Models Michiel C. J. Bliemer and John M. Rose
Abstract

Stated choice experiments can be used to estimate the parameters in discrete choice models by showing hypothetical choice situations to respondents. The attribute levels in each choice situation are determined by an underlying experimental design. Often an orthogonal design is used, although recent studies have shown that better experimental designs exist, such as efficient designs, which provide more reliable parameter estimates. However, they require prior information about the parameter values, which is often not readily available. In this paper we propose serial efficient designs, in which the design is updated during the survey. In contrast to adaptive conjoint, serial conjoint only changes the design across respondents, not within respondents, thereby avoiding endogeneity bias as much as possible. After each respondent, new parameters are estimated and used as priors for generating a new efficient design. Results using the multinomial logit model show that such a serial design, starting from zero initial prior values, provides the same reliability of the parameter estimates as the best efficient design (based on the true parameters). Any possible bias can be avoided by using an orthogonal design for the first few respondents. Serial designs do not suffer from misspecification of the priors, as these are continuously updated. The disadvantages are the extra implementation cost of an automated parameter estimation and design generation procedure in the survey, and the fact that respondents have to be surveyed in a mostly serial fashion rather than all in parallel.
Choice Modelling: The State-of-the-Art and the State-of-Practice
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISBN: 978-1-84950-772-1
6.1. Introduction

Discrete choice models need no introduction given the vast literature on the topic (e.g., Ben-Akiva & Lerman, 1985; Hensher, Rose, & Greene, 2005; Louviere, Hensher, & Swait, 2000; Train, 2003) and the numerous applications of such models in a wide range of fields, such as marketing, transportation and health economics. While the estimation of discrete choice models has received much attention (mainly in the transportation literature), only in the last decade have procedures for obtaining data specifically for estimating such models become an increasingly popular field of study (mainly in the marketing literature). Aside from revealed preference data, where choices are observed in real markets, there has been a growing interest in stated preference data, with constructed choice situations in a hypothetical market. Generating these choice situations in stated choice (SC) experiments requires knowledge in the field of experimental design. Typically, each respondent in an SC experiment is faced with multiple choice situations in which the respondent is asked to make a choice between two or more alternatives, described by attributes and their levels. The levels for all attributes and alternatives are typically drawn from an underlying experimental design that has to be generated beforehand by the analyst.

Several researchers have focused on generating experimental designs for SC experiments. Traditionally, orthogonal designs (also mathematically referred to as orthogonal arrays) have been used (see, e.g., Louviere et al., 2000), which aim to minimize the correlation between the levels of any two attributes. Historically, orthogonal designs were generated for linear models; since discrete choice models are not linear, researchers have shown that they are not the most suitable designs for SC studies. This fact has been acknowledged by, among others, Bunch, Louviere, and Anderson (1996), Kuhfeld, Tobias, and Garratt (1994) and Huber and Zwerina (1996), who focused on the efficiency of the design by linking it explicitly to the discrete choice model to be estimated. These researchers defined efficiency in terms of the reliability of the parameter estimates; roughly speaking, the aim is to minimize the standard errors of the parameter estimates. Since then, many researchers have focused on the generation of efficient designs, including Bliemer and Rose (2008), Bliemer, Rose, and Hensher (2009), Carlsson and Martinsson (2003), Ferrini and Scarpa (2007), Johnson, Kanninen, and Bingham (2006), Kanninen (2002, 2005), Kessels, Goos, and Vandebroek (2006, 2009), Rose and Bliemer (2008), Rose, Bliemer, Hensher, and Collins (2008), Sándor and Wedel (2001, 2002, 2005), Toner, Clark, Grant-Muller, and Fowkes (1999), and Yu, Goos, and Vandebroek (2009). Original research focused only on designs for the multinomial logit (MNL) model, but recent publications have seen a shift toward mixed multinomial logit (MMNL) models and, to a lesser extent, nested logit (NL) models. Other researchers focus not on optimal attribute levels but on the optimal choice probabilities for each alternative (e.g., Johnson et al., 2006; Kanninen, 2002, 2005; Toner et al., 1999). These optimal choice probabilities can typically only be achieved by assuming that at least one of the attributes has continuous levels (e.g., a cost attribute), such that only the attribute-level range is input for this attribute and not the (discrete) attribute levels themselves.
In order to generate efficient designs, prior parameter values (best guesses) are needed. Early work (e.g., Huber & Zwerina, 1996) assumed fixed or precisely known values for the parameters. Later, Sándor and Wedel (2001) introduced random prior distributions to indicate uncertainty about these priors, leading to so-called Bayesian efficient designs. Still, one has to decide which parameter distributions are to be used, with pilot studies typically needed to determine good prior distributions. All of these efficient designs let go of the principle of orthogonality. Nevertheless, a separate stream of research on experimental design has recently been initiated by Burgess and Street (2003), Street and Burgess (2004), Street, Bunch, and Moore (2001), and Street, Burgess, and Louviere (2005) that seeks to locate optimal orthogonal designs without requiring any prior information (basically assuming prior values equal to zero). While this approach has been adopted frequently in the marketing and health economics literature, it does have some drawbacks. For example, it is only suitable for unlabeled experiments with generic parameters, and the designs can be shown to lose efficiency (as defined by the wider literature on experimental designs for SC studies) if the priors are not equal to zero in practice (which is essentially always the case). Still, the property that no priors are required is appealing to many, as these priors are usually not readily available.

In order to overcome the problem of having to specify a priori parameter values with more or less uncertainty, while still retaining some level of statistical efficiency in the generated designs, adaptive designs (mostly referred to as adaptive conjoint analysis) have been introduced, for example, by Johnson (1987, 1991) and Green, Krieger, and Agarwal (1991). The choice situations in these designs are not fixed, but are adapted based on answers given by the respondent in earlier choice situations. However, such design methods are not without limitations, with Toubia, Simester, Hauser, and Dahan (2003) stating:

    To understand this result we first recognize that any adaptive question-design method is potentially subject to endogeneity bias. Specifically, the qth question depends upon the answers to the first q − 1 questions. This means that the qth question depends, in part, on any response errors in the first q − 1 questions. This is a classical problem, which often leads to bias (see, for example, Judge, Griffiths, Hill, Lütkepohl, & Lee, 1985, p. 571). Thus, adaptivity represents a tradeoff: We get better estimates more quickly, but with the risk of endogeneity bias.

Adaptive conjoint can be distinguished by changes in the questions asked both within and across respondents over the course of the study. In order to prevent endogeneity bias as much as possible, we propose the use of an adaptive process in which the design is only changed across respondents, which we term serial efficient designs (also called sequential efficient designs). Such a procedure has been suggested by others, but has not been explored in more detail. Unlike other typical designs such as orthogonal and regular (Bayesian) efficient designs, a serial efficient design is no longer the same for each respondent. At the beginning of the survey, no information is assumed. An orthogonal design, for example, can be used for the first respondent,
or alternatively an efficient design based on zero-valued priors. After completion of the S choice situations by this first respondent, the parameters are estimated based on his or her observed choices. Parameters that turn out to be statistically significant are then used as priors in determining the next design, while those that are not statistically significant are assumed to be zero. Based on these new priors, a new efficient design can be generated and given to the next respondent. The data from each additional respondent are then pooled with the data from previously surveyed respondents and new models estimated, after which a new design is generated and given to the next surveyed respondent. There may still be some risk of endogeneity bias in such a procedure; however, by requiring attribute-level balance (see footnote 1) for each new design, the choice situations will still cover a wide range of the utility space and will not concentrate (as in adaptive conjoint) on certain parts of it, thus keeping any possible bias to a minimum. Hauser and Toubia (2005) state that endogeneity bias, in general, exists in adaptive conjoint analysis, but also mention that adaptive utility-balanced choice questions do not appear to be biased in this way. A simple mechanical check of the balance property is sketched below.

The paper is outlined as follows. First, the MNL model will be discussed, mainly to introduce the required notation. The next section will briefly state the parameter estimation process, which is needed to update the priors when constructing the serial designs. Then, the generation of efficient designs is discussed and the algorithm for the serial efficient design procedure is stated in more detail. Two case studies are presented, illustrating the potential of serial designs. We will create SC data using a serial design, an (optimal) orthogonal design, and a regular fixed efficient design, and then compare the parameter estimates obtained from each of these data sets. Finally, the results are summarized and discussed, along with some advantages and disadvantages of the serial design approach.
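The attribute-level balance property referred to above (and defined in footnote 1) is easy to verify for any candidate design column. The following minimal Python sketch is our own illustration, not part of the original study:

```python
import numpy as np

def is_level_balanced(design_column):
    """True if every level of this attribute appears equally often
    across all design rows (attribute-level balance, footnote 1)."""
    _, counts = np.unique(np.asarray(design_column), return_counts=True)
    return counts.min() == counts.max()

print(is_level_balanced([6, 8, 10, 12, 6, 8, 10, 12]))  # True: each level twice
print(is_level_balanced([6, 6, 8, 10]))                 # False: level 6 appears twice
```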
6.2. The Multinomial Logit Model

Consider N respondents that have to make a choice between J alternatives in each of S different choice situations. Let $u_{nsj}$ denote the perceived utility of respondent n for alternative j in choice situation s. Furthermore, assume that this perceived utility consists of a systematic utility component, $v_{nsj}$, and a random component, $\varepsilon_{nsj}$:

$$u_{nsj} = v_{nsj} + \varepsilon_{nsj} \qquad (6.1)$$
1. Attribute-level balance means that each level for each attribute appears an equal number of times over all choice situations; hence, a respondent does not see, for example, mainly high or mainly low attribute levels.
The systematic utility is often assumed to be composed of a linear combination of attribute values and associated weights,

$$v_{nsj} = x'_{nsj}\beta \qquad (6.2)$$
where $x_{nsj} \in \mathbb{R}^K$ is a vector of K attribute values, $\beta \in \mathbb{R}^K$ is a vector of weights (parameters), and the prime indicates the transpose operator. Each respondent is assumed to choose the alternative that maximizes his or her perceived utility. The probability of a respondent choosing a certain alternative depends on the assumptions made on the random components $\varepsilon_{nsj}$. Under the strict assumption that these random components are independently and identically extreme value type I distributed, the probability $p_{nsj}$ that respondent n chooses alternative j in choice situation s can be written as (McFadden, 1974):

$$p_{nsj} = \frac{\exp(v_{nsj})}{\sum_{i=1}^{J} \exp(v_{nsi})} \qquad (6.3)$$
More sophisticated logit models exist, for example, generalized extreme value models or MMNL models; however, for simplicity the MNL model is used in this paper. In doing so, we ignore the repeated nature of successive choice observations made by a single respondent in an SC experiment. The panel MMNL model is suitable for taking the correlations between choice observations into account; however, this model is much more complex, and optimal experimental design for it is computationally much more expensive (Bliemer & Rose, 2008). Further, Bliemer and Rose (2008) have shown that designs generated for the MNL model offer similar efficiency levels to those generated specifically for MMNL models. For a real-life case study, the use of MMNL designs would therefore probably not represent a significant problem. However, in our analysis we adopt a Monte Carlo approach, repeating the process 100 times, which makes the computation times with the panel MMNL model prohibitively large. We therefore restrict ourselves to the simpler MNL model and note that there is no reason to believe that the results found in this paper are not transferrable to more sophisticated model types.
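To make the notation concrete, the choice probability in Eq. (6.3) can be computed in a few lines. The following Python sketch is our own and purely illustrative (the attribute values and parameters shown are invented):

```python
import numpy as np

def mnl_probabilities(x, beta):
    """Choice probabilities of Eq. (6.3) for one choice situation.

    x    : (J, K) array of attribute levels, one row per alternative.
    beta : (K,) array of taste parameters.
    """
    v = x @ beta                    # systematic utilities v_nsj = x'_nsj beta
    v = v - v.max()                 # guard against overflow in exp()
    expv = np.exp(v)
    return expv / expv.sum()

# Two alternatives described by three attributes (illustrative numbers only)
x = np.array([[1.0, 6.0, 4.0],
              [0.0, 8.0, 8.0]])
beta = np.array([1.2, -0.6, -0.4])
print(mnl_probabilities(x, beta))   # probabilities sum to one
```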
6.3. Model Estimation

In order to estimate the unknown parameters, $\beta$, of the MNL model formulated above, an SC experiment can be conducted. In such an experiment, the attribute levels $x_{nsj}$ are taken from an underlying experimental design and presented to the respondents in a survey. The respondents then state their preferred alternative in each choice situation. Let $y_n \in \mathbb{R}^{SJ}$ denote the vector of choices made by respondent n in a sequence of S choice situations, where $y_{nsj} = 1$ if the respondent chooses alternative j in choice situation s, and 0 otherwise. These choice outcomes can be used to estimate the parameters $\beta$. More specifically, the parameters can be estimated by
maximizing the following log-likelihood function:

$$\ell_N(\beta \mid X_N, Y_N) = Y'_N \log P_N(X_N \mid \beta) \qquad (6.4)$$

where² $X_N = \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}$, $Y_N = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$, and $P_N = \begin{pmatrix} p_1 \\ \vdots \\ p_N \end{pmatrix}$, with $p_n \in \mathbb{R}^{SJ}$ the MNL probabilities for respondent n from Eq. (6.3), and $x_n \in \mathbb{R}^{(SJ) \times K}$ the attribute levels for each respondent n,

$$x_n = \begin{pmatrix}
x_{n111} & x_{n112} & \cdots & x_{n11K} \\
\vdots & \vdots & & \vdots \\
x_{n1J1} & x_{n1J2} & \cdots & x_{n1JK} \\
x_{n211} & x_{n212} & \cdots & x_{n21K} \\
\vdots & \vdots & & \vdots \\
x_{n2J1} & x_{n2J2} & \cdots & x_{n2JK} \\
\vdots & \vdots & & \vdots \\
x_{nSJ1} & x_{nSJ2} & \cdots & x_{nSJK}
\end{pmatrix} \qquad (6.5)$$
This matrix of attribute levels can also be referred to as the experimental design for respondent n. Since the function in Eq. (6.4) is concave (see, e.g., Train, 2003), the Newton–Raphson iterative procedure can be applied to find the maximum, using the first and second derivatives. The vector of first derivatives, the gradient $g_N \in \mathbb{R}^K$, and the matrix of second derivatives, the Hessian $h_N \in \mathbb{R}^{K \times K}$, are given by:

$$g_N(\beta \mid X_N, Y_N) = \frac{\partial \ell_N}{\partial \beta} = X'_N (Y_N - P_N), \quad \text{and} \qquad (6.6)$$

$$h_N(\beta \mid X_N) = \frac{\partial^2 \ell_N}{\partial \beta \, \partial \beta'} = -Z'_N \operatorname{diag}(P_N) Z_N, \qquad (6.7)$$

where $Z_N = \begin{pmatrix} z_1 \\ \vdots \\ z_N \end{pmatrix}$, with $z_{nsjk} = x_{nsjk} - \sum_{i=1}^{J} p_{nsi} x_{nsik}$.
2. Upper-case variables in this paper denote accumulated data; for example, $Y_n$ represents all data from $y_1$ to $y_n$.
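A compact implementation of this estimator is sketched below. It is our own illustrative Python, written for a single pooled sample with the respondent index suppressed (pooling over respondents only means stacking more rows), and follows Eqs. (6.3), (6.6) and (6.7) directly:

```python
import numpy as np

def estimate_mnl(X, Y, S, J, tol=1e-6, max_iter=100):
    """Newton-Raphson for the MNL log-likelihood of Eqs. (6.4)-(6.7).

    X : (S*J, K) stacked design matrix ordered as in Eq. (6.5);
    Y : (S*J,) 0/1 choice indicators.
    Returns the estimate and its asymptotic variance-covariance matrix.
    """
    K = X.shape[1]
    beta = np.zeros(K)
    for _ in range(max_iter):
        V = (X @ beta).reshape(S, J)
        P = np.exp(V - V.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)           # Eq. (6.3), one row per s
        p = P.ravel()
        # z_sjk = x_sjk - sum_i p_si x_sik (attributes centred by choice set)
        xbar = (P[:, :, None] * X.reshape(S, J, K)).sum(axis=1)
        Z = (X.reshape(S, J, K) - xbar[:, None, :]).reshape(S * J, K)
        g = X.T @ (Y - p)                           # gradient, Eq. (6.6)
        H = -(Z * p[:, None]).T @ Z                 # Hessian, Eq. (6.7)
        omega = np.linalg.inv(-H)                   # AVC matrix
        beta = beta + omega @ g                     # Newton step
        if np.linalg.norm(g) < tol:
            break
    return beta, omega
```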
6.4. Experimental Design

So far, the experimental design was assumed given. The question now is, which designs $x_n$, n = 1, …, N, will yield the most accurate and/or reliable parameter estimates? A design that yields reliable parameter estimates (i.e., small standard errors) for a fixed sample size is called an efficient design. Traditionally, the same design is used for each respondent and therefore only a single design is constructed. The most widely used design type is the so-called orthogonal design, which aims to minimize the correlation between the levels of two distinct attributes. This type of design is optimal (most efficient) for estimating linear models; however, the MNL model is nonlinear and, therefore, this design will typically not be the most efficient design possible. More efficient designs can be found by taking the model type explicitly into account. These efficient designs are usually found by minimizing the so-called D-error, leading to a D-efficient design. The D-error can be computed by taking the determinant of the asymptotic variance–covariance (AVC) matrix for a single respondent n, $\Omega_n \in \mathbb{R}^{K \times K}$, and normalizing this value to the number of parameters, K:

$$d(x_n \mid \beta) = \left[\det(\Omega_n)\right]^{1/K} = \left[\frac{1}{\det(-h_n(\beta \mid x_n))}\right]^{1/K} \qquad (6.8)$$
As stated in the equation above, the AVC matrix is the inverse of the Fisher information matrix, which is the negative Hessian matrix. It is important to observe that $h_n$ does not depend on $y_n$; therefore, the D-error can be determined without conducting any surveys.³ In other words, the efficiency of the design can be evaluated beforehand. However, $h_n$ can only be determined if the parameter values, $\beta$, are known. Since the aim is to estimate these, they are clearly unknown. Hence, prior values $\tilde{\beta}$ have to be assumed as best guesses. These priors can be obtained from the literature or from pilot studies. Lack of knowledge about these priors is one of the aspects that limits the generation of efficient designs.

In Figure 6.1, the procedures for estimating parameters using the different types of experimental designs are depicted. When using an efficient or serial design, priors are needed, but for the serial design one can start with zero priors, while for the efficient design more effort is needed to determine good priors. Orthogonal designs do not require any priors (Step 1 is missing). The serial design is updated for each respondent (using updated priors), while this updating Step 4 is missing when using an efficient or orthogonal design. In the last step, the parameters are estimated, together with their standard errors. As mentioned before, in general the lower these standard errors, the more efficient the design. In the next section, the procedure for determining serial designs will be discussed in detail.
3. It has been shown in Bliemer et al. (2009) and Sándor and Wedel (2002) that both the NL model and the cross-sectional mixed logit model share this same property. However, as discussed in Bliemer and Rose (2008), in the panel mixed logit model the dependency on $y_n$ remains, such that sampling is required.
Figure 6.1: Procedure for estimating parameters using an efficient, orthogonal, or serial design.
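Because the information matrix does not depend on the observed choices, the D-error of Eq. (6.8) can be evaluated for any candidate design before any data are collected. The following Python sketch is our own illustration; it assumes the design is encoded as an (S·J) × K matrix with rows ordered as in Eq. (6.5):

```python
import numpy as np

def d_error(X, beta_prior, S, J):
    """D-error of Eq. (6.8) for a candidate design X ((S*J, K) matrix).

    Uses only the prior parameters: no choice data are needed, because
    the MNL information matrix does not depend on the observed choices.
    """
    K = X.shape[1]
    V = (X @ beta_prior).reshape(S, J)
    P = np.exp(V - V.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    xbar = (P[:, :, None] * X.reshape(S, J, K)).sum(axis=1)
    Z = (X.reshape(S, J, K) - xbar[:, None, :]).reshape(S * J, K)
    info = (Z * P.ravel()[:, None]).T @ Z          # Fisher information (= -Hessian)
    return np.linalg.det(np.linalg.inv(info)) ** (1.0 / K)
```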
6.5. Generating Serial Efficient Designs

The five steps in Figure 6.1 will now be described in detail.

Step 0. Initialization. Specify the utility function in Eq. (6.2) and determine the dimensions of the design, that is, the number of choice situations, S, and the possible levels of each attribute. Let L denote the set of feasible designs, constrained by the specified attribute levels and attribute-level balance. Set n = 1. Let N be the sample size. Set $X_0 = Y_0 = P_0 = Z_0 = \emptyset$.

Step 1. Initialize priors. Assuming no prior information on parameter values, set $\tilde{\beta} = 0$.

Step 2. Determine efficient design. Determine design $x_n$ such that the D-error in Eq. (6.8) is minimized for the given $\tilde{\beta}$. Finding an optimal design is very difficult; therefore, we settle for the most efficient design that we can find after a few thousand design evaluations. These designs are not chosen randomly from the set of feasible designs, L, but are optimized using local search techniques. Swapping algorithms that switch two attribute levels for the same attribute are easy to implement, maintain attribute-level balance, and find relatively efficient designs rather quickly. The design for the previous respondent, $x_{n-1}$, is included in the search space as a good starting point for swapping. Pool all design data, $X_n = \begin{pmatrix} X_{n-1} \\ x_n \end{pmatrix}$.

Step 3. Observe choices from respondent. Present the choice situations dictated by design $x_n$ to respondent n and collect his or her choice data, $y_n$. Pool all the observed choices, $Y_n = \begin{pmatrix} Y_{n-1} \\ y_n \end{pmatrix}$.

Step 4. Update priors. Estimate the parameters based on the design, $x_n$, and the choice data, $y_n$, using the Newton–Raphson iterative procedure:
(a) Set $\beta = \tilde{\beta}$ as an initial solution.
(b) Compute the probabilities corresponding to these parameter values using Eq. (6.3).
(c) Compute $z_{nsjk} = x_{nsjk} - \sum_{i=1}^{J} p_{nsi} x_{nsik}$.
(d) Pool the variables: $Z_n = \begin{pmatrix} Z_{n-1} \\ z_n \end{pmatrix}$; $P_n = \begin{pmatrix} P_{n-1} \\ p_n \end{pmatrix}$.
(e) Compute the gradient: $g_n = X'_n (Y_n - P_n)$.
(f) Compute the Hessian: $h_n = -Z'_n \operatorname{diag}(P_n) Z_n$.
(g) Determine the AVC matrix: $\Omega_n = -h_n^{-1}$.
(h) Update the parameter estimate: $\beta := \beta + \Omega_n g_n$.
(i) Convergence: if $\|g_n\| < \delta_1$, then set $\tilde{\beta} = \beta$ and continue to Step 4(j); otherwise return to Step 4(b).
(j) Let $se(\beta_k)$ be the standard error for parameter k, that is, the square root of the kth diagonal element of matrix $\Omega_n$. If $|\beta_k / se(\beta_k)| < 1.96$, then $\beta_k$ is not statistically significant, and we set the prior to zero, $\tilde{\beta}_k = 0$.
(k) If $n < N$, then set $n := n + 1$ and return to Step 2. Otherwise, continue to Step 5.
Step 5. Estimate parameters. The parameter estimates are readily available from the previous step; the estimates are $\hat{\beta} = \beta$.

The above procedure describes how to apply serial efficient designs in practice. For the results in this paper, we have not conducted any real surveys; instead, we rely on simulation. Step 3 in the above procedure is therefore replaced by a sample generator that simulates the choices of respondents, assuming they behave as described by the MNL model. Below is the description of this rather straightforward sample generator, where $\beta^*$ is assumed to be the true set of parameters for the population.

Step 6. Sample generator for respondent n.
(a) Compute systematic utilities: $v_{nsj} = x'_{nsj}\beta^*$.
(b) Compute error components: $\varepsilon_{nsj} = -\log(-\log(\omega_{nsj}))$, where $\omega_{nsj}$ is a (uniformly distributed) random number between 0 and 1.
(c) Compute the utilities perceived by respondent n: $u_{nsj} = v_{nsj} + \varepsilon_{nsj}$.
(d) For each s, determine the alternative with the highest utility: $j^*_{ns} = \arg\max_j (u_{nsj})$. In case of multiple alternatives with the highest utility, a random selection is made.
(e) Define $y_{nsj} = 1$ if $j = j^*_{ns}$, and 0 otherwise.
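To tie Steps 0–6 together, the sketch below simulates the whole serial procedure. It is our own illustration, not the authors' implementation: it reuses d_error and estimate_mnl from the earlier sketches, breaks utility ties by the first index (ties have probability zero with continuous error draws), and swaps levels within a single attribute column, which preserves attribute-level balance by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_choices(X, beta_true, S, J):
    """Step 6: simulate one respondent's choices under the true MNL model."""
    v = (X @ beta_true).reshape(S, J)
    eps = -np.log(-np.log(rng.random((S, J))))     # extreme value type I draws
    y = np.zeros((S, J))
    y[np.arange(S), (v + eps).argmax(axis=1)] = 1.0
    return y.ravel()

def swap_search(X, prior, S, J, attr, n_eval=2000):
    """Step 2: local search swapping two levels of one attribute between
    design rows, keeping attribute-level balance intact."""
    best, best_d = X.copy(), d_error(X, prior, S, J)
    for _ in range(n_eval):
        cand = best.copy()
        r1, r2 = rng.choice(S * J, size=2, replace=False)
        cand[[r1, r2], attr] = cand[[r2, r1], attr]
        d = d_error(cand, prior, S, J)
        if d < best_d:
            best, best_d = cand, d
    return best

def serial_design_study(X_init, beta_true, S, J, N, attr):
    """Steps 1-5 with simulated respondents and zero initial priors."""
    prior = np.zeros(X_init.shape[1])                       # Step 1
    X_prev, X_pool, Y_pool = X_init, [], []
    for n in range(1, N + 1):
        X_n = swap_search(X_prev, prior, S, J, attr)        # Step 2
        Y_pool.append(simulate_choices(X_n, beta_true, S, J))  # Step 3
        X_pool.append(X_n)
        beta_hat, omega = estimate_mnl(np.vstack(X_pool),   # Step 4
                                       np.hstack(Y_pool), S * n, J)
        t = np.abs(beta_hat) / np.sqrt(np.diag(omega))
        prior = np.where(t > 1.96, beta_hat, 0.0)           # Step 4(j)
        X_prev = X_n
    return beta_hat, omega                                  # Step 5
```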
6.6. Case Studies

In this section, two examples are shown. In the first example, a labeled experiment with some alternative-specific parameters is considered, while in the second example
an unlabeled experiment with all generic parameters is assumed. In both examples, an orthogonal design is constructed alongside a D-efficient design based on priors that correspond to the true parameters; this latter design is expected to offer the best performance. In each example, serial efficient designs are constructed assuming 80 respondents. Since the sample generator contains a random error component, the generated choices will be different each time we generate a sample, leading to different priors each time and also to different serial designs. Therefore, a Monte Carlo approach is adopted in which we repeat the whole process 100 times. The outcomes in terms of parameter accuracy (the actual values) and parameter reliability (in terms of t-values) will be discussed.

Example 1. Consider the following utility functions with eight attributes and seven (generic and alternative-specific) parameters to estimate:

$$v_{ns1} = \beta_1 + \beta_3 x_{ns1,3} + \beta_4 x_{ns1,4} + \beta_5 x_{ns1,5}$$
$$v_{ns2} = \beta_2 + \beta_3 x_{ns2,3} + \beta_4 x_{ns2,4} + \beta_6 x_{ns2,6}$$
$$v_{ns3} = \beta_3 x_{ns3,3} + \beta_7 x_{ns3,7}$$

The following attribute levels are assumed:

$$x_{ns1,3}, x_{ns2,3}, x_{ns3,3} \in \{6, 8, 10, 12\}; \quad x_{ns1,4}, x_{ns2,4} \in \{4, 8\}; \quad x_{ns1,5}, x_{ns2,6}, x_{ns3,7} \in \{0, 1\}$$

The following values for the (true) parameters are assumed:

$$\beta^* = (1.2, 0.8, -0.6, -0.4, 0.3, 0.8, -1.0)'$$

An attribute-level balanced orthogonal design and a D-efficient design (assuming $\beta^*$ as prior), each with 12 choice situations, have been generated using the Ngene software and are shown in Table 6.1 (in the same format as the matrix in Eq. (6.5)). The D-error of the efficient design is 0.5147, which is significantly lower than that of the orthogonal design, 0.8802. As usual, the orthogonal design is more efficient under the assumption of zero priors: in that case, its D-error would be 0.3572 compared to a D-error of 0.3965 for the efficient design. These designs are used for all respondents. Obviously, the serial efficient design is variable and, therefore, not listed here.

In each of the Monte Carlo runs, we simulate respondents, and after each respondent the parameters are estimated again. In this way, we can observe how the parameter estimates converge to the true parameter values $\beta^*$ as the sample size increases. As an example, the parameter estimates $\hat{\beta}_2$ for 100 runs of up to 80 respondents are depicted in Figure 6.2, in which the horizontal line indicates the true parameter value. In general, the more respondents, the more accurate the parameter estimates become, although there is quite some variability in the actual values obtained. Comparing the three different designs, it is clear that the orthogonal design has a larger spread of parameter estimates, while the efficient and serial designs seem to provide similar results.

In Figure 6.3, the average values of all seven parameter estimates are shown for each sample size and each design type. Particularly in the first four parameters,
Table 6.1: Attribute levels of the orthogonal and efficient designs in Example 1.

             Orthogonal design (k)        Efficient design (k)
 s   j    1   2    3   4   5   6   7    1   2    3   4   5   6   7
 1   1    1   0    6   8   0   0   0    1   0    8   4   1   0   0
 1   2    0   1    6   8   0   0   0    0   1    8   8   0   1   0
 1   3    0   0    6   0   0   0   1    0   0    6   0   0   0   1
 2   1    1   0   10   4   1   0   0    1   0   10   8   1   0   0
 2   2    0   1   10   8   0   1   0    0   1   10   4   0   0   0
 2   3    0   0    8   0   0   0   1    0   0   12   0   0   0   0
 3   1    1   0   10   8   1   0   0    1   0   12   4   1   0   0
 3   2    0   1    8   4   0   1   0    0   1    6   8   0   0   0
 3   3    0   0   10   0   0   0   1    0   0    8   0   0   0   1
 4   1    1   0   12   4   0   0   0    1   0    8   8   0   0   0
 4   2    0   1    6   4   0   0   0    0   1   10   4   0   1   0
 4   3    0   0   12   0   0   0   1    0   0   10   0   0   0   1
 5   1    1   0    6   4   0   0   0    1   0   10   4   0   0   0
 5   2    0   1    8   8   0   1   0    0   1    6   8   0   1   0
 5   3    0   0   12   0   0   0   0    0   0    8   0   0   0   0
 6   1    1   0    8   4   0   0   0    1   0    6   4   0   0   0
 6   2    0   1   12   4   0   1   0    0   1    6   8   0   1   0
 6   3    0   0    6   0   0   0   1    0   0    6   0   0   0   0
 7   1    1   0    8   8   1   0   0    1   0    8   8   1   0   0
 7   2    0   1    6   4   0   1   0    0   1   12   4   0   0   0
 7   3    0   0    8   0   0   0   0    0   0   12   0   0   0   0
 8   1    1   0   10   8   0   0   0    1   0   12   4   0   0   0
 8   2    0   1   12   4   0   0   0    0   1    8   8   0   0   0
 8   3    0   0    8   0   0   0   0    0   0   12   0   0   0   1
 9   1    1   0    8   8   1   0   0    1   0   12   8   1   0   0
 9   2    0   1   12   8   0   0   0    0   1   12   4   0   1   0
 9   3    0   0   12   0   0   0   1    0   0   10   0   0   0   1
10   1    1   0   12   4   1   0   0    1   0   10   4   1   0   0
10   2    0   1    8   8   0   0   0    0   1    8   8   0   1   0
10   3    0   0    6   0   0   0   0    0   0    8   0   0   0   0
11   1    1   0   12   8   0   0   0    1   0    6   8   0   0   0
11   2    0   1   10   8   0   1   0    0   1   12   4   0   0   0
11   3    0   0   10   0   0   0   0    0   0    6   0   0   0   1
12   1    1   0    6   4   1   0   0    1   0    6   8   0   0   0
12   2    0   1   10   4   0   0   0    0   1   10   4   0   0   0
12   3    0   0   10   0   0   0   0    0   0   10   0   0   0   0
Figure 6.2: Parameter estimates of $\hat{\beta}_2$ in Example 1 for different sample sizes (100 runs). [Three panels — efficient design, orthogonal design, serial design — each plotting the $\hat{\beta}_2$ estimates against the number of respondents n, with a horizontal line at the true value.]
there is some clear overestimation (in absolute value) in the serial design, which slowly disappears with larger sample sizes. With low sample sizes, the priors are not yet of high quality and, therefore, the serial design may not be of high quality either. The result is that the information collected from the first respondents is not optimal. This suggests using an orthogonal design for the first few respondents before estimating and updating any priors. Possible endogeneity bias may also play a role, although at higher sample sizes this bias appears to play no role. We further investigate this overestimation, or bias, later in this section.

The values in Figure 6.3 only show the average parameter estimates; however, the variance of these estimates plays an important role in assessing their reliability. Figure 6.4 shows the absolute t-ratios for each parameter for different sample sizes, defined as the ratio of the true parameter's absolute value to the standard error. The lower the standard error, the larger this ratio and, therefore, the higher the reliability. The horizontal lines in the figure indicate a t-ratio of 1.96, the 95% level of statistical significance. The first observation that can be made is that the orthogonal design in general provides much less reliable parameter estimates, in line with Figure 6.2. Only for the third parameter does it perform better, but this parameter already has a very low standard error (it is statistically significant with only a few respondents) and, therefore, is not of much concern. The second observation is that the performance of the efficient and serial designs is almost the same. The serial design, assuming no initial prior information, seems to catch up relatively quickly with the efficient design.

Examining the points where the t-value exceeds 1.96, an indication of the required sample size can be obtained. These sample sizes are indicated in Figure 6.5. The fifth parameter is the most difficult to estimate. The required sample sizes of the efficient and serial designs are almost identical, while the orthogonal design requires approximately twice as many respondents. The reason why this parameter requires the largest sample size lies in the fact that its attribute-level range is narrow (0, 1), combined with a small prior parameter value (0.3). Bliemer and Rose (2009) discuss how indications of required sample sizes can be obtained prior to conducting the survey.
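The reasoning behind such an indication can be sketched as follows: since standard errors under the MNL model shrink with the square root of the number of respondents, the single-respondent AVC matrix already implies how many respondents are needed before each t-ratio exceeds 1.96. A small Python illustration of this back-of-the-envelope rule (our own, under the usual asymptotic assumptions):

```python
import numpy as np

def required_sample_sizes(omega_1, beta, t_crit=1.96):
    """Per-parameter sample-size indication: with N identical respondents
    se_N = se_1 / sqrt(N), so the t-ratio exceeds t_crit once
    N > (t_crit * se_1_k / |beta_k|)**2.

    omega_1 : single-respondent AVC matrix (inverse information matrix);
    beta    : assumed true (or prior) parameter values.
    """
    se_1 = np.sqrt(np.diag(omega_1))
    return np.ceil((t_crit * se_1 / np.abs(beta)) ** 2).astype(int)
```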
Figure 6.3: Average parameter estimates in Example 1 for different sample sizes. [Seven panels, E($\hat{\beta}_1$) through E($\hat{\beta}_7$), comparing the efficient, orthogonal and serial designs across sample sizes n.]
Figure 6.4: t-Values in Example 1 for different sample sizes. [Seven panels, t₁ through t₇, comparing the efficient, orthogonal and serial designs; the horizontal lines mark the 1.96 significance level.]
Figure 6.5: Sample sizes needed for parameters in Example 1 to be statistically significant. [Bar chart of the required sample size N* for each of the seven parameters under the efficient, orthogonal and serial designs.]
Figure 6.6: Serial design using an orthogonal design for the first 20 respondents. [Two panels for the second parameter, E($\hat{\beta}_2$) and t₂ against sample size n, comparing the efficient, orthogonal and serial designs.]
Coming back to the observed overestimation of the parameters with a serial design, another simulation is performed in which an orthogonal design is used for the first 20 respondents, with the design updated with new priors from then on. This prevents bad and unreliable priors, which appear to cause the bias, from being used in the generation of the serial design. The results are shown in Figure 6.6 for the second parameter, noting that all other parameters show similar results. The bias for the serial design has clearly disappeared. The efficiency in terms of t-ratios is low for the first 20 respondents, but quickly increases once the design becomes
more efficient. This procedure thus seems to prevent bias while still providing reliable parameter estimates.

Example 2. Now consider the following two utility functions with four generic parameters:

$$v_{ns1} = \beta_1 x_{ns1,1} + \beta_2 x_{ns1,2} + \beta_3 x_{ns1,3} + \beta_4 x_{ns1,4}$$
$$v_{ns2} = \beta_1 x_{ns2,1} + \beta_2 x_{ns2,2} + \beta_3 x_{ns2,3} + \beta_4 x_{ns2,4}$$

All attributes are assumed to have three levels: 0, 1, or 2. The following values for the (true) parameters are assumed: $\beta^* = (0.6, 0.4, -0.5, -0.2)'$.

For unlabeled experiments with all generic parameters, Street et al. (2005) describe a procedure to determine optimal orthogonal designs, although their definition of optimality is somewhat different from what we assume here as D-optimality. They define optimality in terms of maximizing the attribute-level differences for the same attribute across alternatives, arguing that if an attribute level is the same across two or more alternatives, then no information is captured about trade-offs for that attribute. Such a definition does not require prior information when generating the design. Table 6.2 shows just such a design. Furthermore, a D-efficient design has also been generated and is likewise shown in Table 6.2.
Table 6.2: Attribute levels of the orthogonal and efficient designs in Example 2.

          Orthogonal design (k)    Efficient design (k)
 s   j     1   2   3   4            1   2   3   4
 1   1     0   0   0   0            2   1   1   0
 1   2     1   2   1   2            0   1   1   2
 2   1     0   1   1   2            0   1   1   0
 2   2     1   0   2   1            2   1   1   2
 3   1     0   2   2   1            1   1   2   2
 3   2     1   1   0   0            1   1   0   0
 4   1     1   0   1   1            2   0   0   2
 4   2     2   2   2   0            1   2   2   0
 5   1     1   1   2   0            2   0   2   1
 5   2     2   0   0   2            0   2   0   1
 6   1     1   2   0   2            1   2   1   1
 6   2     2   1   1   1            0   0   1   1
 7   1     2   0   2   2            1   2   2   1
 7   2     0   2   0   1            1   0   0   1
 8   1     2   1   0   1            0   2   0   2
 8   2     0   0   1   0            2   0   2   0
 9   1     2   2   1   0            0   0   0   0
 9   2     0   1   2   2            2   2   2   2
Figure 6.7: Average parameter estimates in Example 2 for different sample sizes. [Four panels, E($\hat{\beta}_1$) through E($\hat{\beta}_4$), comparing the efficient, orthogonal and serial designs.]
The (optimal) orthogonal design has a D-error of 0.3143 (0.2222 if all-zero priors are assumed) and the efficient design has a D-error of 0.2267 (0.1869 in case of all-zero priors). As in the previous example, the average parameter estimates and the t-ratios are shown in Figures 6.7 and 6.8. The differences in results between the designs are not as dramatic as in the previous example, most likely because the input was a ''good'' orthogonal design, but the results are in line with the previous ones. The orthogonal design performs worst, while the t-ratios of the serial design are again very close to those of the efficient design. The required sample sizes are depicted in Figure 6.9.
6.7. Summary and Discussion

Orthogonal designs are still the most widely used in practice, although these designs are clearly not the most efficient, as has been shown several times in recent literature and again in this paper. However, constructing efficient experimental designs requires prior parameter information. Often, these priors are not readily available, and not all
Figure 6.8: t-Values in Example 2 for different sample sizes. [Four panels, t₁ through t₄, comparing the efficient, orthogonal and serial designs; the horizontal lines mark the 1.96 significance level.]
Figure 6.9: Sample sizes needed for parameters in Example 2 to be statistically significant. [Bar chart of the required sample size N* for each of the four parameters under the efficient, orthogonal and serial designs.]
practitioners want to go through the process of first doing a pilot study in order to obtain some priors. Therefore, in this paper we have examined an automated updating procedure in which efficient designs are generated based on the current information about the priors, that is, parameters estimated from the choice observations of previous respondents. No initial priors need be known, while in the end an efficient design (and data set) will result.

The results show that a serial design is as good as an efficient design based on the true parameter values. It is very unlikely that the true parameter values are known in advance; at best they are known only with uncertainty. That the serial design is able to produce results similar to the efficient design based on true parameters is, therefore, very promising. Serial designs are also not sensitive to misspecification of the priors, since the priors are updated continuously, whereas efficient designs lose efficiency when the priors turn out to be incorrect. Adopting a Bayesian approach in which the parameter priors are assumed to follow random distributions, that is, explicitly assuming that the priors are uncertain, would yield a design that is more robust against misspecification, but would also lose some efficiency. A serial Bayesian design seems an interesting direction for further research, in which not only the parameter estimates themselves are used to generate an efficient design, but also the standard errors are used to indicate the level of uncertainty. For example, one can consider Bayesian priors following a normal distribution with means and standard deviations equal to the parameter estimates and the standard errors, respectively. However, generating a Bayesian efficient design is computationally intensive, as the Bayesian D-error has to be approximated by simulation (a small sketch follows below). For more about computing the Bayesian efficiency of a design, see Bliemer, Rose, and Hess (2008).

The case studies also showed that there is a potential risk of overestimating the parameters (in the absolute sense), particularly at low sample sizes. This may be avoided by using an orthogonal design for the first few respondents, so that the first update of the priors is based on sufficient data. In the case study, using an orthogonal design for the first 20 respondents before updating the design removed the overestimation problem completely, while still keeping most of the efficiency in parameter estimation.

The applicability is not restricted to the MNL model; NL and MMNL models could also be estimated using serial experimental designs. For generating efficient designs for NL models, see Bliemer et al. (2009); for efficient designs for cross-sectional mixed logit models, see Sándor and Wedel (2002); and for the panel MMNL model, see Bliemer and Rose (2008). MMNL models could, however, lead to computational problems, as in serial designs the parameters are estimated after each respondent. Estimation of MMNL models can be rather time consuming and may become impractical in serial design generation. Furthermore, generating efficient designs for MMNL models, particularly panel MMNL models where sample generation is necessary (see Bliemer & Rose, 2008), is also computationally intensive. Depending on the size of the estimation problem, application of MMNL may or may not be feasible in terms of computation time. If there is no time restriction, for example, if only a single respondent per day is surveyed, then clearly there is no problem.
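Returning to the Bayesian option mentioned above: the simulation-based approximation of the Bayesian D-error can be sketched in a few lines. This is our own illustration, reusing d_error from the earlier sketch, with priors drawn from independent normals around the current estimates:

```python
import numpy as np

def bayesian_d_error(X, beta_hat, se_hat, S, J, n_draws=200, seed=0):
    """Approximate the Bayesian D-error by simulation: average the
    D-error of Eq. (6.8) over draws beta ~ N(beta_hat, diag(se_hat^2))."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(beta_hat, se_hat, size=(n_draws, len(beta_hat)))
    return float(np.mean([d_error(X, b, S, J) for b in draws]))
```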
Several further extensions can be made using the same principle of serial design. Instead of assuming discrete attribute levels, continuous attribute levels could be used, as discussed in, for example, Kanninen (2002). Instead of using fixed attribute levels, relative attribute levels around reference levels could be used, so-called pivot designs; see Rose et al. (2008). This way, a single (relative) design could be used for different segments in which the reference level differs. One can also include covariates (e.g., gender, income) so as to segment the population and create an efficient design taking the influence of these covariates into account; see Rose and Bliemer (2006). Segments based on attribute levels or covariates can be handled by a single design (possibly evaluated using multiple sets of priors) for all segments, which is as efficient as possible over all segments simultaneously, as discussed in the above-mentioned references. Alternatively, it is also possible to create multiple designs, each specific and efficient for a certain segment. Both approaches could be adopted in a serial design fashion.

Since orthogonal and efficient designs are kept fixed, the survey can be sent out to all respondents at the same time; these surveys can be done in parallel. Serial designs are generated based on the choices of previous respondents, so the survey cannot be sent out to all respondents at once. With a computer-aided personal interviewing (CAPI) technique, where an interviewer brings a laptop computer and visits respondents, this does not pose a problem. Internet surveys need a smarter implementation to avoid problems. For example, rather than generating a new serial design after each respondent, a new design could be generated automatically after every 10 respondents, or every hour. The design then does not change for respondents already filling out the survey, but only for newly logged-in respondents.

Concluding, serial designs combine the benefits of orthogonal designs (no initial prior information required) and regular efficient designs (reliable parameter estimates). Moreover, a serial design is not sensitive to misspecification of the priors and will outperform Bayesian efficient designs, which are only suboptimal. A disadvantage is that more effort has to be put into the implementation of the survey (automated estimation and design generation) and that the serial character may prohibit large-scale parallel surveying techniques. However, some form of serial design can always be applied, even by estimating the model parameters and updating the design manually at certain time intervals.
References

Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.
Bliemer, M. C. J., & Rose, J. M. (2008). Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Proceedings of the 87th Annual Meeting of the Transportation Research Board, Washington, DC.
Bliemer, M. C. J., & Rose, J. M. (2009). Efficiency and sample size requirements for stated choice experiments. Proceedings of the 88th Annual Meeting of the Transportation Research Board, Washington, DC.
Bliemer, M. C. J., Rose, J. M., & Hensher, D. A. (2009). Efficient stated choice experiments for estimating nested logit models. Transportation Research Part B, 43(1), 19–35.
Bliemer, M. C. J., Rose, J. M., & Hess, S. (2008). Approximation of Bayesian efficiency in experimental choice designs. Journal of Choice Modelling, 1(1), 98–127.
Bunch, D. S., Louviere, J. J., & Anderson, D. A. (1996). A comparison of experimental design strategies for multinomial logit models: The case of generic attributes. Working Paper no. UCD-GSM-WP11-96, Graduate School of Management, University of California at Davis, CA.
Burgess, L., & Street, D. J. (2003). Optimal designs for 2^k choice experiments. Communications in Statistics, Theory and Methods, 32(11), 2185–2206.
Carlsson, F., & Martinsson, P. (2003). Design techniques for stated preference methods in health economics. Health Economics, 12(June), 281–294.
Ferrini, S., & Scarpa, R. (2007). Designs with a-priori information for nonmarket valuation with choice-experiments: A Monte Carlo study. Journal of Environmental Economics and Management, 53, 342–363.
Green, P. E., Krieger, A., & Agarwal, M. K. (1991). Adaptive conjoint analysis: Some caveats and suggestions. Journal of Marketing Research, 23(2), 215–222.
Hauser, J. R., & Toubia, O. (2005). The impact of utility balance and endogeneity in conjoint analysis. Marketing Science, 24(3), 498–507.
Hensher, D. A., Rose, J. M., & Greene, W. H. (2005). Applied choice analysis: A primer. Cambridge, UK: Cambridge University Press.
Huber, J., & Zwerina, K. (1996). The importance of utility balance in efficient choice designs. Journal of Marketing Research, 33(August), 307–317.
Johnson, F. R., Kanninen, B. J., & Bingham, M. (2006). Experimental design for stated choice studies. In: B. J. Kanninen (Ed.), Valuing environmental amenities using stated choice studies: A common sense approach to theory and practice (pp. 159–202). The Netherlands: Springer.
Johnson, R. (1987). Accuracy of utility estimation in ACA. Working Paper, Sawtooth Software, Research Paper Series, Sequim, WA.
Johnson, R. (1991). Comment on 'Adaptive conjoint analysis: Some caveats and suggestions'. Journal of Marketing Research, 28(May), 223–225.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., & Lee, T. C. (1985). The theory and practice of econometrics. New York: Wiley.
Kanninen, B. J. (2002). Optimal design for multinomial choice experiments. Journal of Marketing Research, 39, 214–217.
Kanninen, B. J. (2005). Optimal design for binary choice experiments with quadratic or interactive terms. Proceedings of the International Health Economics Association Conference, Barcelona, Spain.
Kessels, R., Goos, P., & Vandebroek, M. (2006). A comparison of criteria to design efficient choice experiments. Journal of Marketing Research, 43(August), 409–419.
Kessels, R., Jones, B., Goos, P., & Vandebroek, M. (2009). An efficient algorithm for constructing Bayesian optimal choice designs. Journal of Business and Economic Statistics, 27(2), 279–291.
Kuhfeld, W. F., Tobias, R. D., & Garratt, M. (1994). Efficient experimental design with marketing research applications. Journal of Marketing Research, 31, 545–557.
Louviere, J. J., Hensher, D. A., & Swait, J. D. (2000). Stated choice methods: Analysis and application. Cambridge, UK: Cambridge University Press.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behaviour. In: P. Zarembka (Ed.), Frontiers of econometrics. New York: Academic Press.
Rose, J. M., & Bliemer, M. C. J. (2006). Designing efficient stated choice data: Accounting for socio-demographic and contextual effects in designing stated choice experiments. Proceedings of the 11th International Conference on Travel Behaviour Research, Kyoto, Japan.
Rose, J. M., & Bliemer, M. C. J. (2008). Stated preference experimental design strategies. In: D. A. Hensher & K. J. Button (Eds), Handbook of transport modelling (pp. 151–179). Oxford, UK: Elsevier.
Rose, J. M., Bliemer, M. C. J., Hensher, D. A., & Collins, A. (2008). Designing efficient stated choice experiments in the presence of reference alternatives. Transportation Research Part B, 42, 395–406.
Sándor, Z., & Wedel, M. (2001). Designing conjoint choice experiments using managers' prior beliefs. Journal of Marketing Research, 38(November), 430–444.
Sándor, Z., & Wedel, M. (2002). Profile construction in experimental choice designs for mixed logit models. Marketing Science, 21(4), 455–475.
Sándor, Z., & Wedel, M. (2005). Heterogeneous conjoint choice designs. Journal of Marketing Research, 42(May), 210–218.
Street, D. J., Bunch, D. S., & Moore, B. J. (2001). Optimal designs for 2^k paired comparison experiments. Communications in Statistics, Theory and Methods, 30(10), 2149–2171.
Street, D. J., & Burgess, L. (2004). Optimal and near optimal pairs for the estimation of effects in 2-level choice experiments. Journal of Statistical Planning and Inference, 118, 185–199.
Street, D. J., Burgess, L., & Louviere, J. J. (2005). Quick and easy choice sets: Constructing optimal and nearly optimal stated choice experiments. International Journal of Research in Marketing, 22(4), 459–470.
Toner, J. P., Clark, S. D., Grant-Muller, S. M., & Fowkes, A. S. (1999). Anything you can do, we can do better: A provocative introduction to a new approach to stated preference design. WCTR Proceedings (Vol. 3, pp. 107–120), Antwerp, Belgium.
Toubia, O., Simester, D. L., Hauser, J. R., & Dahan, E. (2003). Fast polyhedral adaptive conjoint estimation. Marketing Science, 22(3), 273–303.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Yu, J., Goos, P., & Vandebroek, M. (2009). Efficient conjoint choice designs in the presence of respondent heterogeneity. Marketing Science, 28(1), 122–135.
Chapter 7
Observed Efficiency of a D-Optimal Design in an Interactive Agency Choice Experiment Sean M. Puckett and John M. Rose
Abstract

Currently, the state of practice in experimental design centres on orthogonal designs (Alpizar et al., 2003), which are suitable when applied to surveys with a large sample size. In a stated choice experiment involving interdependent freight stakeholders in Sydney (see Hensher & Puckett, 2007; Puckett et al., 2007; Puckett & Hensher, 2008), one significant empirical constraint was difficulty in recruiting unique decision-making groups to participate. The expected relatively small sample size led us to seek an alternative experimental design. That is, we decided to construct an optimal design that utilised extant information regarding the preferences and experiences of respondents, to achieve statistically significant parameter estimates under a relatively low sample size (see Bliemer & Rose, 2006). The D-efficient experimental design developed for the study is unique in that it centred on the choices of interdependent respondents. Hence, the generation of the design had to account for the preferences of two distinct classes of decision makers: buyers and sellers of road freight transport. This paper discusses the process by which these (non-coincident) preferences were used to seed the generation of the experimental design, and then examines the relative power of the design through an extensive bootstrap analysis of increasingly restricted sample sizes for both decision-making classes in the sample. We demonstrate the strong potential for efficient designs to achieve empirical goals under sampling constraints, whilst identifying limitations to their power as sample size decreases.
Choice Modelling: The State-of-the-Art and the State-of-Practice
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISBN: 978-1-84950-772-1
7.1. Introduction

The paramount motivation for choosing an optimal design over an orthogonal design in a stated choice (SC) experiment is to minimise the expected standard errors in choice models that utilise the data from the experiment. This appears to be a straightforward motivation in itself; after all, why would the analyst want to induce relatively large standard errors simply as an artefact of design specification? What may be overlooked in the discussion of the merits of optimal design is the dominant force governing the choice in the first place: sample size.

There is nothing inherently wrong with orthogonal designs. Indeed, a lack of correlation across attributes in choice sets (should one be able to preserve this empirically after removing observations) is a desirable feature. Rather, orthogonal designs can require relatively large sample sizes to yield statistically significant parameter estimates in choice models (see, e.g., Bliemer & Rose, 2006). This is due to the non-linear nature of discrete choice models, where the (co)variance matrices of such models, from which the standard errors are taken, are a function not only of the data (design), but also of the choice probabilities and hence of the parameter estimates derived from the model. Whilst orthogonality relates to the correlation structure of a design, it says nothing about the choice probabilities that one is likely to obtain from models estimated using such data. Several researchers have shown that non-orthogonal designs, typically termed efficient designs, may produce lower standard errors than orthogonal designs for a given sample size (see, e.g., Bunch, Louviere, & Anderson, 1994; Bliemer & Rose, 2006; Carlsson & Martinsson, 2003; Huber & Zwerina, 1996; Kanninen, 2002; Sándor & Wedel, 2001). Thus, in cases where the large samples expected to satisfy an orthogonal design may be difficult or impossible to source due to financial, temporal or population constraints, efficient designs offer a powerful alternative.

When a choice experiment utilising an efficient design yields statistically significant parameter estimates, it is natural to assign some of the empirical success of the study to the design. Technically, however, the true power of the design is not identified simply through achieving statistical significance; after all, any given design could potentially achieve statistically significant parameter estimates despite being relatively inefficient. Fortunately, there are empirical means of identifying the robustness of experimental designs. The approach discussed in this paper, repeated bootstrapping analysis of sub-samples, is an intuitive tool for identifying the degree to which the design helped to derive the empirical results. Bootstrapping achieves this by examining the degree to which the sample size could have been further limited whilst maintaining statistical significance (and stability in behavioural implications).

In an effort to demonstrate this concept, this paper investigates the observed efficiency of a particular type of efficient design, known as a D-efficient design, that was utilised in a choice study of interdependent road freight stakeholders in Sydney, Australia (see Hensher & Puckett, 2007; Puckett, Hensher, Collins, & Rose, 2007; Puckett & Hensher, 2008). The study centred on an interactive agency choice experiment involving buyers and sellers of road freight transport services under a hypothetical variable road-user charging system. An efficient design was sought due
to the empirical constraints governing the experiment: difficulty in sourcing eligible respondent dyads, relatively large amounts of time needed to recruit and administer the survey for each sampled group, and relatively high expenses in administering the survey. Given the complex modelling structure that was to be applied to the choice data, these constraints made a traditional orthogonal design unlikely to produce robust behavioural results. That is, the expected sample size was not large enough to have faith in the analysts' ability to derive the desired model outputs.

Ultimately, the choice data from the d-efficient design utilised in the study were sufficient to obtain robust model estimates. In this paper, we examine the specific contribution of the design to the empirical results by analysing the model estimates that would have resulted from subsets of the sample obtained in the original study. This enables us to identify the lower limit of sample size that would have been sufficient under the experimental design, given the sampled groups we were able to source.

We begin the discussion with an overview of efficient designs in Section 7.2, and then introduce the optimal design and empirical survey utilised in the choice study in Section 7.3. This is followed by our empirical exercise in Section 7.4, and a discussion of implications for future studies in Section 7.5.
7.2. Optimal Designs for Multinomial Logit Models

The state of practice in experimental design centres on orthogonal designs (Alpizar, Carlsson, & Martinsson, 2003), which are, in general, suitable when applied to surveys with a large sample size. When the expected sample size for a study is small, the analyst may have reason to doubt the effectiveness of an orthogonal design. As a safeguard against yielding an insufficient sample size, the analyst may opt to develop an optimal design to achieve statistically significant parameter estimates under a relatively low sample size (see Bliemer & Rose, 2006).

An optimal design utilises extant information regarding the preferences and experiences of respondents to specify attribute levels in choice sets that maximise the information captured when respondents select their preferred alternatives. That is, rather than setting attribute levels subject to a constraint that they are uncorrelated across alternatives and observations, as in an orthogonal design, efficient designs remove the implied assumption of equal preferences for all attributes (present in orthogonal designs) to develop alternatives that identify the preferences of respondents with greater efficiency.

Orthogonal designs ignore any extant information with respect to the preferences and experiences of respondents (i.e. marginal utility parameter estimates and attribute levels experienced in the market, respectively), yielding designs that do not achieve efficient asymptotics. That is, orthogonal designs, of which there may be any number for a given research application (with a corresponding range of efficiency that is unknown to the analyst), essentially assume that all parameters to be estimated are equal to zero, and that the attribute levels within the design are immaterial to the outcome.
Table 7.1: Benefits and constraints of D-efficient and orthogonal designs.

Parameters                        D-Efficient designs                  Orthogonal designs
Required sample size to achieve   Generally small                      Generally large
  desired standard errors
Statistical knowledge required    Relatively large                     Relatively small
Prevalence in the literature      Not utilised often                   Predominant design form
Ease of design generation         Designed through software or         Designed through software, first
                                  first principles                     principles, websites and
                                                                       published arrays
Evidence of priors required       Yes                                  No
Flexibility in generation         User defines the constraints         Orthogonality is constrained by
                                  of the design                        the number of alternatives,
                                                                       attributes and their levels
D-efficient designs, conversely, utilise extant information regarding the preferences and experiences of respondents, allowing for greater inferential accuracy for a given sample size or, of paramount interest to researchers facing sampling constraints, a relatively low sample size for a given desired significance level for parameter estimates (Carlsson & Martinsson, 2003). Other experimental design criteria can be utilised in this regard, although Kuhfeld, Tobias, and Garratt (1994) demonstrate that it is less difficult computationally to find a d-efficient design; given that the candidate efficiency criteria (e.g. a-efficiency and g-efficiency) are highly correlated with d-efficiency, a preference for a d-efficient design was justified in the freight study examined here (Carlsson & Martinsson, 2003).

A d-efficient design is one of the many candidate design profiles that satisfies a desired level of statistical efficiency by minimising the expected standard errors of the resulting marginal utility parameters. The d-efficient design utilised in the experiment was derived by specifying prior information gathered through a literature review, previous studies and a pilot study within an iterative optimisation technique.1 The pros and cons of d-efficient and orthogonal designs are compared in Table 7.1.
1. The optimisation technique calculated the expected d-error for randomly generated designs under the specified prior values for attribute levels and marginal utility coefficients. The design with the lowest expected d-error was stored within the program (a macro in Microsoft Excel); whenever a new design achieved an improved error measure, the new design was stored as the preferred design. This procedure was continued until no further improvements could be found over a period deemed by the analyst to be sufficiently long to end the optimisation.
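The footnote describes a simple accept-if-better random search. The following is a minimal sketch of that kind of procedure, not the authors' actual Excel macro; the generate_candidate and d_error callables are placeholders standing in for the design generator and the expected D-error calculation under the specified priors:

```python
def search_design(generate_candidate, d_error, max_stagnant=10000):
    """Random search: keep the candidate design with the lowest expected
    D-error, stopping after a long run with no further improvement."""
    best_design, best_error = None, float("inf")
    stagnant = 0
    while stagnant < max_stagnant:
        design = generate_candidate()    # randomly generated design
        error = d_error(design)          # expected D-error under the priors
        if error < best_error:
            best_design, best_error = design, error   # store improved design
            stagnant = 0
        else:
            stagnant += 1                # no improvement this iteration
    return best_design, best_error
```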
Table 7.1 highlights the primary reasons that orthogonal designs are the most common choice in experimental design applications: orthogonal designs have relatively low user-input requirements, can be generated by using many readily available sources, and their use is widely accepted in the literature. Indeed, if the expected sample size is reasonably large, it is fair to expect an orthogonal design to lead to sufficiently small standard errors for model outputs. However, if the expected sample size is not large, the additional knowledge and resources required to derive a d-efficient design need not be prohibitive. The flexibility gained in removing the constraints relating to orthogonality is powerful, and may offset (perhaps greatly) the burden associated with establishing prior information and generating the design.

It is worth noting that the use of d-efficient designs can raise concerns about yielding a design that achieves utility balance (i.e. where the expected utilities of each alternative are equal). Under true utility balance, respondents may have little information or motivation to optimise their choices, as there may be no significant trade-off across alternatives; this would lead to random choices that are not informative to the analyst. However, the generation of a d-efficient design centres on the process of minimising d-error, which is not the same as creating designs with equal expected marginal utilities across alternatives. Indeed, Kanninen (2002) and others (e.g. Toner, Clark, Grant-Muller, & Fowkes, 1999; Fowkes, 2000) have demonstrated that for certain contexts there exist ideal expected choice probabilities for achieving an efficient design, and these probabilities are far from equal to one another (e.g. in the case of a two-alternative design to be estimated within a multinomial logit (MNL) model, the ideal probabilities are approximately 70 and 30% for the two alternatives; see Johnson, Kanninen, & Bingham, 2006 for a discussion of these probabilities).

When working on the experimental design in our application, we expected the empirical framework to be relatively complex. Indeed, the empirical models were generalised mixed logit models. However, it should be noted that the experimental design was calibrated using a relatively simple MNL framework. At the time of the empirical study, the asymptotic variance-covariance matrix for the mixed logit model was unknown. The only available means for deriving a d-efficient design based upon a mixed logit model would have been simulation, which would have taken prohibitively long to carry out; a simulation of each iteration of the mixed logit model for each design to be tested would have been required, which was not a practical option. Fortunately, Bliemer and Rose (2008, 2009) have demonstrated that the MNL model offers a reasonable approximation of the mixed logit model when generating optimal designs.

Following the exposition of Carlsson and Martinsson (2003) within an MNL framework, when developing a d-efficient design for a choice experiment involving a revealed-preference-based reference alternative and SC alternatives, the analyst's task is to estimate, with the highest degree of precision feasible, the parameters of the utility functions for individual i for the reference alternative r and the SC alternatives s, respectively:

U_{ir} = α + β_k x_k + ε    (7.1)
U_{is} = β_k x_k + ε    (7.2)
where α represents an alternative-specific constant representing the real-market nature of the reference alternative, β_k represents the vector of desired marginal utility parameter estimates, x_k represents the levels of the corresponding vector of k attributes in the alternative and ε represents the unobserved effects, which are assumed to be independently and identically distributed type I extreme value. The fundamental difference between the two utility functions in the freight study is that the attribute level for variable road-user charges (VUCs) in the reference alternative is always equal to zero.

As demonstrated by McFadden (1974), the covariance matrix of the expected maximum likelihood estimators (i.e. those based upon prior information) is a function of the observed marginal utilities of respondents,

Ω = [ Σ_{n=1}^{N} Σ_{j=1}^{J} z′_{jn} P_{jn} z_{jn} ]^{-1}    (7.3)
where

z_{jn} = x_{jn} - Σ_{j=1}^{J_n} x_{jn} P_{jn}    (7.4)
and where x_{jn} represents the vector of attribute levels for an alternative j (numbered 1 to J) in choice set n (numbered 1 to N), and P_{jn} represents the choice probability for alternative j in choice set n (Carlsson & Martinsson, 2003). This is intuitive, as the choice probabilities that are observed are a direct function of the preferences underlying the choices made. Therefore, the covariance matrix Ω is a function of both the marginal utilities of respondents, which are invariant across alternatives, and the attribute level combinations corresponding to a set of alternatives on offer, which, in the case of the SC alternatives, are under the control of the analyst. A d-efficient design is found when maximising the inverse of the determinant of Ω (scaled by an exponent incorporating the number of parameters to be estimated, K), which Kanninen (2002) points out is the (scaled) expected value of the Hessian of the log-likelihood function, multiplied by -1:

max [ |Ω|^{1/K} ]^{-1}    (7.5)
Importantly, Kanninen clarifies that, due to the central role of the covariance matrix within the search for d-efficiency, by maximising d-efficiency (should the priors be correct), one minimises the magnitude of the asymptotic confidence region around the parameter estimates. Hence, the efficiency of the design is critically dependent upon the manner in which the attribute levels are specified for each alternative. Consequently, by utilising prior information about the likely preferences of respondents and the likely reference attribute levels they would specify, greater
efficiency can be achieved through minimisation of the estimated covariances by manipulating the combination of attribute levels on offer across alternatives for each choice set (Huber & Zwerina, 1996; Carlsson & Martinsson, 2003).
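As a concrete illustration, the sketch below computes the D-error of a candidate MNL design from Eqs. (7.3)-(7.5). The array shapes and function name are our own illustrative assumptions rather than part of the original study's implementation:

```python
import numpy as np

def mnl_d_error(X, beta):
    """Expected D-error of a design under assumed prior parameters.

    X    : (N, J, K) array of attribute levels for J alternatives in
           each of N choice sets
    beta : length-K vector of prior marginal utility parameters
    """
    N, J, K = X.shape
    info = np.zeros((K, K))
    for n in range(N):
        v = X[n] @ beta                  # utilities of the J alternatives
        p = np.exp(v - v.max())
        p /= p.sum()                     # MNL choice probabilities
        z = X[n] - p @ X[n]              # Eq. (7.4): probability-weighted centring
        info += z.T @ (p[:, None] * z)   # inner sums of Eq. (7.3)
    omega = np.linalg.inv(info)          # Eq. (7.3): asymptotic covariance matrix
    return np.linalg.det(omega) ** (1.0 / K)

# Minimising this D-error over candidate designs is equivalent to
# maximising [|Omega|^(1/K)]^(-1) in Eq. (7.5).
```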
7.3. Freight Survey and its D-Efficient Design

7.3.1. Freight Survey
The primary focus of the freight study was to establish how the implementation of a variable user charging (VUC) system may affect both the levels of service offered by the traffic infrastructure and the costs of transporting freight. The SC experiment was designed to capture the preferences of freight stakeholders under the range of levels of service and costs that may be present under a distance-based road-user charging system. Respondents from freight firms and their clients were asked to choose from among a set of alternative urban freight trip options for a particular consignment, each of which contained its unique mix of levels of service components and cost. These alternatives represent potential means of coping with a (hypothetical) VUC system. Each of these alternatives was framed in reference to recent experience that forms the basis for the SC alternatives; i.e. a recent goods movement, its corresponding performance measures, and indicators of the relationship between the two firms involved.

The primary desired estimation outputs were the sensitivities of buyers and sellers of urban freight services to trade-offs between elements of travel time and cost under VUCs. These include two measures of the value of time savings: one for travel time in free-flow traffic conditions (i.e. those where the truck can travel at the speed limit and manoeuvre without difficulty), and one for travel time in slowed-down conditions (i.e. those where the truck has difficulty in travelling at the speed limit, and where manoeuvring is impeded by the level of other vehicles present). The other two temporal measures are related to transactions time and reliability in arrival time. The former is measured as the value of waiting time savings (i.e. time spent waiting at delivery destinations whilst unable to unload goods due to queuing); the latter is measured as the value of reliability gains (i.e. percentage increases in the probability that a truck will reach its delivery destination without incurring a penalty due to missing a specified arrival time window).

Respondents were asked to assume that, for each of the choice sets given, the same cargo needs to be carried for the same client discussed earlier in the survey, subject to the same constraints faced when the reference trip was undertaken. Respondents were then informed that the choice sets involve three alternative methods of carrying out the trip: their experienced trip and two SC alternatives that involve VUCs. The choice tasks were described to respondents as involving two steps. The first step is to indicate which alternatives would be preferable if the two organisations had to reach agreement, whilst the second step is to indicate what information mattered when making each choice.
Respondents were faced with four choice sets if representing a freight firm and eight choice sets if representing a client of a freight firm. The difference is attributable to the relatively larger burden placed on respondents from freight firms, in that they must supply the trip- and relationship-specific details required to establish the choice setting and reference alternative. The exact four choice sets answered by a given respondent from a freight firm were given to the corresponding sampled client. The additional four choice sets faced by the sampled client use the same reference alternative as the other four choice sets.

The attributes within each choice set are: free-flow travel time, slowed-down travel time, total time waiting to unload goods, likelihood of on-time arrival, freight rate paid by the client, fuel cost, and variable charges, the latter of which always takes a value of zero for the reference alternative. Each of these attributes except for the freight rate (which is not a design attribute) is either an input into a road-user charging policy (i.e. changes in fuel taxes, road-user charges) or a direct function of such a policy. The likelihood of on-time arrival was chosen in preference to other measures of reliability or travel time variability, as in-depth interviews revealed that on-time arrival rates (defined within the experiment as the likelihood of reaching the delivery destination(s) close enough to the time window agreed upon to avoid being penalised) are a key measure of reliability.

The levels and ranges of the attributes were chosen to reflect a range of available routing and scheduling options under a hypothetical VUC system. The reference alternative was utilised to offer a base, around which the SC design levels were pivoted. The resulting mixes represent alternatives including: taking the same route at the same time as in the reference alternative under new traffic conditions, costs, or both; and taking alternative, previously less-favourable routes, departing at alternative, previously less-favourable times, or both, with corresponding levels of traffic conditions and costs.

In all cases except for the VUCs, referred to in the SC experiment as a distance-based charge, the attribute levels for each of the SC alternatives are expressed as deviations from the reference level, which is the exact value specified in the corresponding non-SC questions, unless noted:

(1) Free-flow time: -50%, -25%, 0, +25%, +50%.
(2) Slowed-down time: -50%, -25%, 0, +25%, +50%.
(3) Waiting time at destination: -50%, -25%, 0, +25%, +50%.
(4) Probability of on-time arrival: -50%, -25%, 0, +25%, +50%, with the resulting value rounded to the nearest 5% (e.g. a reference value of 75% reduced by 50% would yield a raw figure of 37.5%, which would be rounded to 40%). If the resulting value is 100%, the value is expressed as 99%. If the reference level is greater than 92%, the pivot base is set to 92%. If the pivot base is greater than 66% (i.e. if 1½ times the base would be greater than 100%), let the pivot base equal X, and let the difference between 99% and X equal Y. The range of attribute levels for on-time arrival when X > 66% is (in percentage terms): X - Y, X - 0.5Y, X, X + 0.5Y, X + Y. This yields five equally spaced attribute levels between X - Y and 99% (a sketch of this pivoting logic appears after the list).
(5) Fuel cost: -50%, -25%, 0, +25%, +50% (representing changes in fuel taxes of -100%, -50%, 0, +50%, +100%). Note: fuel taxes represented approximately half of fuel prices in Australia at the time of the study.
(6) Distance-based (or variable user) charges: the pivot base equals one-half of the reference fuel cost, to reflect the amount of fuel taxes paid in the reference alternative. Variations around the pivot base are: -50%, -25%, 0, +25%, +50%.

One potential complication that we identified is that changes in levels of service and operating costs could lead to upward or downward adjustments in the freight rate charged by the transport company. Incorporating an endogenous (at least to the freight transport provider) choice that could swamp the changes in costs into the experimental design is not a simple matter. To accommodate this, we developed a method to internalise this endogeneity and uncertainty, making it exogenous to the final choice. For each SC alternative involving a net change in direct operating costs (i.e. a decrease in fuel costs that is not equal in magnitude to the value of the new distance-based charges), respondents from freight firms were asked to indicate by how much of the net change in costs they would like to adjust their freight rate. Hence, the freight rate, which is not a design attribute, yet is clearly an important contextual effect, is allowed to vary across SC alternatives under changes in net operating costs. The specific range over which the freight rate may vary is bounded by the change in net operating cost for each alternative.
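The pivoting rule for the probability of on-time arrival combines several caps and adjustments. The sketch below gives one literal reading of rule (4); the function name and the handling of edge cases are our own assumptions:

```python
def on_time_arrival_levels(reference_pct):
    """Five pivoted levels for the probability of on-time arrival (rule 4):
    cap the pivot base at 92%, switch to the X/Y construction when the base
    exceeds 66%, otherwise pivot by +/-25% and +/-50%, rounding to the
    nearest 5% and expressing 100% as 99%."""
    base = min(reference_pct, 92.0)                  # pivot base capped at 92%
    if base > 66.0:                                  # 1.5 * base would exceed 100%
        x, y = base, 99.0 - base
        return [x - y, x - 0.5 * y, x, x + 0.5 * y, x + y]
    levels = [base * (1.0 + f) for f in (-0.50, -0.25, 0.0, 0.25, 0.50)]
    levels = [round(v / 5.0) * 5.0 for v in levels]  # round to the nearest 5%
    return [99.0 if v >= 100.0 else v for v in levels]

# Example: a capped reference of 92% gives levels 85, 88.5, 92, 95.5 and 99.
```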
7.3.2. D-Efficient Design
As mentioned in Section 7.3.1, each choice set in the study contained a respondent-specified reference alternative, along with two SC design alternatives. Complicating the design generation process was the fact that the full design is partitioned into two groups of choice sets. Respondents from freight firms, who were hypothesised to have one set of marginal utilities, were given four choice sets; respondents from shippers, who were hypothesised to have a set of marginal utilities that differed from those of freight firm respondents, were given the identical four choice sets that were given to a corresponding freight firm respondent, along with four unique choice sets.

When generating the experimental design, it was necessary to specify appropriate prior values for marginal utility parameters and attribute levels. The survey pre-testing phase and literature reviews identified a range of plausible prior specifications across the two respondent classes, to accommodate the likely divergent preferences of transporters and shippers. The prior parameter estimates for the design are shown in Table 7.2.
Table 7.2: Prior parameter values for the D-efficient design.

Parameters                        Transporters    Shippers
Free-flow time                    -0.047          -0.024
Slowed-down time                  -0.066          -0.024
Waiting time                      -0.057          -0.024
Probability of on-time arrival    0.038           0.038
Fuel cost                         -0.058          -0.029
Distance-based charges            -0.116          -0.058
Parameter estimates for free-flow travel time and fuel cost for freight transport operators from a previous study at the Institute of Transport and Logistics Studies (ITLS) were used as priors for the same attributes in the design for respondents from freight firms. The parameter value for variability in travel time from the aforementioned study was chosen as the prior for likelihood of on-time arrival (after being multiplied by negative one, due to the inverse behavioural relationship between the two concepts of reliability and variability). The resulting prior value of reliability gains using this measure was approximately A$0.65 per percentage point increase in reliability.

Priors for the remaining parameters for respondents from freight transport providers were developed using the following heuristics. Firstly, the value for slowed-down time was found by multiplying the prior for free-flow time by 1.4, which is a ratio supported by previous travel studies at ITLS. The prior for waiting time was set as a weighted average of the free-flow and slowed-down priors (two-thirds of the former and one-third of the latter). Lastly, the prior for distance-based charges was set as two times the prior for fuel cost, to account for scaling effects in attribute values; that is, the average attribute values for fuel cost are expected to be twice as high as those for distance-based charges, and hence a base assumption of equivalent aversion to both cost measures necessitates scaling the parameter for distance-based charges.

These priors were adjusted for respondents from clients of freight transport providers using the following heuristics. Firstly, the value for free-flow time was specified as one-half the value of the prior for free-flow time for freight transporters. A value of one-half of the corresponding prior was chosen as a parsimonious hedge against uncertainty, in which the plausible prior value ranged between zero and the value held by freight transporters. Secondly, we assumed no variation in preferences across types of travel time, and hence set the priors for slowed-down time and waiting time equal to this value. The priors for cost measures were set in line with the corresponding priors for freight transporters, due to the ability of freight transporters to pass along the new costs to shippers within the experiment. Lastly, the prior for reliability was set as equal to the corresponding prior for freight transporters, because shippers value reliability, and hence the established prior was the best value available for us to utilise.

The priors were selected bearing in mind that the scale of the parameters is fundamentally important. The efficiency of the design is a direct function of the resulting choice probabilities. Independent of the relative magnitudes of the priors (i.e. the scale of parameters relative to one another), the absolute magnitudes of priors impact the choice probabilities, and hence the efficiency of the design, directly. If the priors are of insufficient magnitude, the resulting utilities of the alternatives will be insufficiently large and the choice probabilities will tend towards 1/J, with
J representing the number of alternatives (i.e. respondents demonstrate random choices). Conversely, if the priors are too large, the resulting utilities of the alternatives will be large enough to tend to have one alternative dominate. Hence, we sought to use prior values that were of magnitudes consistent with (limited) extant knowledge of preferences in freight. The prior parameter values chosen were suitable in that they were estimated based upon attribute levels that were within our range of expected attribute levels in the empirical exercise, and therefore were of acceptable scale. Furthermore, in the case of prior parameter values where a value could not be imported directly, we specified these values in relation to the extant priors, maintaining consistent scale.

Once the prior parameter values were established, we needed to identify appropriate prior values for the corresponding attribute levels. Whilst it was known the attribute levels would be the same for transporters and shippers, we needed to establish whether one prior attribute level would be sufficient for each attribute in the design, or whether it was preferable to segment the design into multiple classes of trips. We chose to separate the design into two segments: those involving trips of less than 2 h, and those lasting 2–7 h. The motivation for this segmentation arose from the data source utilised to set these priors. Global positioning system (GPS) devices were placed in four freight vehicles operating for a major freight transport company in Sydney for one week. The data from the GPS devices were used to measure distances and times for freight delivery tours for the vehicles. Approximately half of the trips measured took 2 h or less to complete, whilst virtually all of the remainder fell within the 7-h limit established for the choice experiment. Table 7.3 shows the prior attribute levels established for the two broad trip-length segments, trips of 2 h or less and trips of 2–7 h.

The average travel time for each trip-length segment was divided into two, with one-half specified as the prior free-flow time and the other half specified as the prior slowed-down time. With no further information on the likely proportion of slowed-down time in total travel time, an even split was determined to be the most parsimonious decision. The minimum value of time spent unloading at a destination was used as the baseline for unloading time; this value was deducted from the average time spent at delivery destinations, with the difference multiplied by the average number of deliveries made in each segment to find the prior value for waiting time.

Table 7.3: Prior attribute levels for the D-efficient design.

Parameters                        Trips less than 2 h    Trips greater than 2 h
Free-flow time                    40 min                 140 min
Slowed-down time                  20 min                 45 min
Waiting time                      20 min                 45 min
Probability of on-time arrival    75%                    75%
Fuel cost                         A$11.00                A$30.00
Distance-based charges            A$5.50                 A$15.00
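The transporter priors in Table 7.2 follow directly from the heuristics above. As a small worked sketch (variable names are ours; the two imported values come from the earlier ITLS study):

```python
ff_prior   = -0.047              # free-flow time, imported from the ITLS study
fuel_prior = -0.058              # fuel cost, imported from the same study

slowed_prior  = 1.4 * ff_prior   # 1.4 ratio from earlier travel studies -> -0.066
waiting_prior = (2/3) * ff_prior + (1/3) * slowed_prior   # ~ -0.053; Table 7.2 lists -0.057
charges_prior = 2.0 * fuel_prior # scaling for expected attribute magnitudes -> -0.116

shipper_time_prior = ff_prior / 2   # one-half hedge for shipper time components -> -0.024
```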
The prior value for likelihood of on-time arrival was established as the arithmetic average of on-time arrival rates for primary and secondary retail freight, as revealed by the in-depth interviews. The prior value for fuel cost was established by multiplying the average fuel efficiency of the predominant vehicle type (rigid truck) by the average distance travelled within each trip-length segment, and multiplied by the current price of diesel fuel. This yielded a base value for distance-based charges in the design, equal to one-half of the prior for fuel cost. The pilot data did not reveal significant parameter estimates when analysed within a basic MNL model, and hence the prior parameter estimates were not amended as a result of the pilot. However, the pilot confirmed the presence of meaningful tradeoffs within choice sets, in that the reference alternative did not dominate the SC alternatives. Likewise, the pilot confirmed the merit of the prior attribute levels utilised within the experimental design. That is, the observed RP data offered by respondents were consistent with the prior attribute levels. The d-efficient design ultimately utilised in this research was found through the use of a search algorithm designed at ITLS, which was adjusted for the complex nature of the interactive agency survey. The search algorithm was designed to accommodate: (1) the presence of a mix of a reference alternative and two SC alternatives that are generic to one another; (2) the constraint that the four choice sets faced by a respondent from a freight firm must also be given to a respondent to a corresponding sampled client and (3) hypothesised preference heterogeneity across agent types. Whilst extensive iteration of the algorithm could not guarantee that d-efficiency was maximised globally, the design selected by the algorithm achieved superior d-efficiency to the other designs generated in the search process. The final d-efficient design utilised in this research is given in the Appendix. This design was selected as the design most capable of achieving precise parameter estimates, conditional on the prior values specified.
7.4. Empirical Analysis

The freight study yielded a sample of 145 transporters (for a total of 580 choice observations) and 138 shippers (for a total of 1106 choice observations). The analysis in this section centres on repeated bootstrapping procedures that were conducted within the software program Ngene. Utilising this software, we drew repeated sub-samples of transporters and shippers at increasingly restricted sample sizes. For each sub-sample, we re-estimated base empirical (mixed multinomial logit or MMNL) models of transporter and shipper preferences with respect to the choice sets faced within the freight study.

MMNL models were estimated in preference to MNL models due to the behavioural link between SC experiments and MMNL models. Specifically, SC experiments yield multiple observations per respondent, which can lead to biased standard errors under MNL models (see e.g. Bliemer & Rose, 2008). MMNL models can accommodate the longitudinal nature of SC data, mitigating biases in standard errors found under MNL models. This property
reinforced our preference to estimate MMNL models in the empirical exercise rather than MNL models.

The use of repeated sub-samples for each target sample size allows us to identify trends in the mean and standard deviation of model outputs as the sample size decreases. With this information we can gauge whether the experimental design used in the study would have been sufficient to achieve the desired statistical significance under a given restricted sample size. Likewise, trends in the mean and standard deviation of the model outputs across restricted sample sizes enable us to identify whether there is a potential for model outputs to be biased or unreliable at a given sample size.
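The sub-sampling step can be sketched as follows. This is a minimal illustration of drawing repeated respondent-level sub-samples; the estimate_mmnl callable is a placeholder for the model estimation step, not the software actually used:

```python
import random

def bootstrap_cohorts(respondents, cohort_sizes, estimate_mmnl, n_reps=100):
    """Re-estimate the base model on repeated random sub-samples.

    Respondents (not individual observations) are sampled, so that each
    drawn respondent keeps his or her full panel of choice sets."""
    results = {}
    for size in cohort_sizes:
        results[size] = []
        for _ in range(n_reps):
            cohort = random.sample(respondents, size)  # without replacement
            results[size].append(estimate_mmnl(cohort))
    return results

# e.g. cohort_sizes = range(140, 60, -10) for the transporter exercise
```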
7.4.1. Results for Transporters
We begin the analysis by estimating the most complete replica of the base empirical model for transporters in the original freight study, with three key differences. Firstly, we estimated a model that includes every transporter choice observation, rather than excluding potential outliers as in the original study; this gives us the largest possible sample size to use as a benchmark when comparing sub-sample sizes. Secondly, we only tested unconstrained normal and uniform distributions for possible effects of unobserved preference heterogeneity; this restriction is due to a limitation in the software utilised in the bootstrapping exercise. Thirdly, we isolated transporters for their own model rather than pooling transporters and shippers into one model; this was done to avoid sub-samples with implausible membership (e.g. shippers whose corresponding transporter partners are not included in the sample).

Given these constraints, we settled upon a base model for transporters with each of the attribute constructs in the original transporter model (travel time, probability of on-time arrival, fuel cost, freight rate, distance-based charges), plus one interaction term that was in the original model (free-flow travel time multiplied by distance travelled), which serves to isolate separate disutilities for free-flow travel time and slowed-down travel time. The probability of on-time arrival, freight rate and distance-based charges are modelled as normally distributed random parameters.

Table 7.4A highlights both the statistical significance of the explanatory variables and the degree of misspecification for each of the prior parameter values (expressed as marginal rates of substitution with respect to fuel cost). Each of the linear effects without a random specification shows a high level of statistical significance, with the appropriate sign. Transporters show a strong preference for the reference alternative (which has no distance-based charging component), and disutility for travel time and fuel cost. The mean and standard deviation for on-time arrival probability enter the model highly significantly, demonstrating both a general preference for reliable travel and heterogeneity in the degree to which this is preferred. The freight rate shows an insignificant positive marginal utility, which confirms a hypothesis of rational passing of distance-based charges and fuel costs along to shippers (i.e. once transporters decide how much of the changes in their costs to pass along to customers via the freight rate, they are indifferent to the amount they are compensated across alternatives).
Table 7.4A: Transporter base model versus priors (t-statistics in parentheses; all random parameters distributed normally).

Parameters                                        Full sample               MRS(a)                     Prior MRS(a)
Reference alternative                             0.861 (3.209)             —                          —
Free-flow time (min)                              -0.009 (-2.284, 1.669b)   0.720 [0.706, 0.734]       0.810
Slowed-down time (min)                            -0.015 (-2.284)           1.384 [1.367, 1.401]       1.138
Probability of on-time arrival (%) — mean         0.030 (1.985)             -2.868 [-3.049, -2.687]    -0.655
Probability of on-time arrival (%) — std. dev.    0.056 (2.07)
Fuel cost (A$)                                    -0.011 (-2.383)           1                          1
Freight rate (A$) — mean                          0.002 (0.354)             -0.145 [-0.169, -0.121]    —
Freight rate (A$) — std. dev.                     0.008 (2.213)
Distance-based charges (A$) — mean                -0.008 (-1.377)           0.765 [0.724, 0.806]       2
Distance-based charges (A$) — std. dev.           0.011 (2.344)
Waiting time (min)                                0 (not in model)          0                          0.983

(a) Marginal rates of substitution are for values with respect to the value for fuel cost. The values in brackets are the lower and upper bounds of the 95% confidence interval, respectively.
(b) t-statistic for an interaction term involving distance travelled; the mean effect of this interaction is included in the parameter for free-flow time for comparison with the prior.
However, there is a highly significant degree of preference heterogeneity with respect to the freight rate, suggesting that some transporters are much more sensitive to the freight rate than others; hence, the low mean utility could also be related to other effects. Lastly, distance-based charges have a mean negative effect on utility smaller in magnitude than the disutility of fuel cost, although this effect has only minor statistical significance. Transporters appear to have strongly varying sensitivities to distance-based charges, with a highly significant standard deviation of marginal disutilities for the charges.

Some relationships between these parameter estimates are close to those amongst the assumed prior parameter values, whilst others reveal relative sensitivities that differ strongly from the priors. Most notably, the mean estimated marginal rates of substitution for both free-flow and slowed-down travel time with respect to fuel cost are quite similar to their corresponding prior values, at 0.72 dollars per minute in free-flow conditions versus a prior value of 0.81, and 1.384 dollars per minute in slowed-down conditions versus a prior value of 1.14. Conversely, sampled transporters demonstrated mean sensitivities to on-time arrival reliability and distance-based charges with respect to fuel cost that were highly divergent from the assumed prior relationships, at 2.868 dollars per percentage point of reliability versus a prior value of 0.655, and 0.765 dollars spent on fuel per dollar spent on distance-based charges versus a prior value of 2.

We can also compare the assumed prior attribute levels for the reference alternative versus those observed in the study (which, unlike the preference estimates, are constant across decision-maker class). In contrast to the relatively large discrepancies with respect to prior and observed marginal utility parameters, Table 7.4B shows that the observed mean attribute levels were generally similar to the prior values. In the two-hour-or-less segment, each of the observed travel time and on-time arrival values is similar to the priors, with free-flow travel time, slowed-down travel time and waiting time only around 10, 6 and 14 min greater than the prior values, respectively. Likewise, the mean on-time arrival probability is only around 8% (in magnitude) greater than the prior value. The observed cost measures are much higher, however, with levels almost three times as large as the priors.

Table 7.4B: Observed mean attribute levels by segment versus priors.

                                      Trips less than 2 h               Trips greater than 2 h
Parameters                            Assumed priors  Observed mean     Assumed priors  Observed mean
Free-flow time (min)                  40              50.39             140             234.68
Slowed-down time (min)                20              26.49             45              51.02
Waiting time (min)                    20              34.07             45              64.58
Probability of on-time arrival (%)    75              82.54             75              81.54
Fuel cost                             A$11.00         A$31.69           A$30.00         A$237.99
In the over-two-hours segment, the priors reflect a mean trip involving considerably less free-flow time and distance travelled than observed in the sample. The observed free-flow time and costs are much higher than the prior values. Still, the observed values for slowed-down time, waiting time and on-time arrival probability are relatively close to the priors, with discrepancies of only around 6 min, 20 min and 6%, respectively. This is an encouraging result, in that the design avoided broad-scale misspecification of prior attribute levels. The efficiency of the design was dependent upon revealed preference attribute levels that we had to assume in the design generation process, and hence large discrepancies between the prior attribute levels and observed reference attribute levels would have impacted the efficiency of the design negatively.

We now turn to the bootstrapping exercise for transporters. We re-estimated the base model for 100 randomly selected sub-samples at restricted sample sizes (i.e. cohort sizes) of decreasing multiples of ten. Beginning with a cohort size of 140 (the closest multiple of 10 below the full sample size of 145), we estimated repeated sub-samples on increasingly smaller cohort sizes to gauge the strength of the experimental design. Our exercise focusses on comparisons of parameter estimates rather than the more intuitive marginal rates of substitution, because the optimisation of our design was limited to a calibration with respect to the standard errors of these parameters. More recent research (Scarpa & Rose, 2008) has looked at optimising with respect to marginal rates of substitution; this process was unavailable at the time of the original empirical exercise and as yet is limited to MNL models. We are currently exploring designs optimised with respect to marginal rates of substitution within MMNL models, but the research is in a preliminary stage.

Table 7.5 shows the relative stability in mean parameter estimates as the sample size is increasingly restricted. The mean parameter estimates appear fairly stable as the cohort size falls, implying no general tendency toward bias under small samples. That is, other than some small fluctuations in mean values as the sample size is reduced, a relatively small sample size would have given similar results to the full sample, on average. Indeed, the mean parameter values show no general misbehaviour until the cohort size falls below 80, which approaches only one-half of the sample.

The mean parameter estimates across cohort sizes do not tell the whole story, however. In practice, under any given restricted sample size the analyst would only have one sample with which to work. One question that follows directly from the first test for consistency is whether the parameter estimates that one would obtain under a restricted sample size would tend to be statistically significant. Table 7.6 compares the frequencies (out of 100) with which each parameter estimate meets the standard of a p-value below 0.05 as the sample size is reduced.

A preference for the reference alternative was the strongest effect statistically in the base model, and hence serves as a useful initial gauge of the relative strength of the experimental design at each cohort size. The reference alternative is statistically significant at the 95% confidence level in every sub-sample, save one outlier, through a sample size of 110.
Given that the reference alternative was the strongest explanatory variable in the base model, it would follow that other explanatory variables would require larger sample sizes to reach a desired significance level.
Table 7.5: Mean bootstrap parameter estimates by cohort size (transporters, 100 sub-samples per cohort size, 4 observations per respondent).

Parameters                                        All (145)  140     130     120     110     100     90      80      70
Reference alternative                             0.8614     0.8662  0.8560  0.8563  0.8458  0.8586  0.8473  0.8661  0.8385
Travel time (min)                                 0.0146     0.0145  0.0145  0.0144  0.0147  0.0146  0.0149  0.0147  0.0148
Probability of on-time arrival (%) — mean         0.0304     0.0305  0.0309  0.0312  0.0310  0.0310  0.0308  0.0307  0.0299
Probability of on-time arrival (%) — std. dev.    0.0560     0.0560  0.0557  0.0561  0.0547  0.0550  0.0542  0.0545  0.0534
Fuel cost (A$)                                    0.0105     0.0106  0.0106  0.0105  0.0105  0.0105  0.0105  0.0106  0.0107
Freight rate (A$) — mean                          0.0015     0.0017  0.0017  0.0015  0.0014  0.0013  0.0013  0.0014  0.0014
Freight rate (A$) — std. dev.                     0.0077     0.0077  0.0077  0.0078  0.0078  0.0077  0.0076  0.0075  0.0074
Free-flow time (min) × distance (in ’000 km)      0.0239     0.0238  0.0241  0.0237  0.0242  0.0241  0.0249  0.0240  0.0241
Distance-based charges (A$) — mean                0.0080     0.0080  0.0082  0.0082  0.0083  0.0081  0.0085  0.0083  0.0094
Distance-based charges (A$) — std. dev.           0.0111     0.0111  0.0112  0.0115  0.0115  0.0114  0.0117  0.0111  0.0121
Table 7.6: Frequency of parameter estimates significant at the 95% confidence level by cohort size (transporters).

Attribute                                         140   130   120   110   100   90    80    70
Reference alternative                             100   100   100   99    91    86    80    61
Travel time (min)                                 100   87    69    59    38    30    14    11
Probability of on-time arrival (%) — mean         52    41    25    21    15    12    6     2
Probability of on-time arrival (%) — std. dev.    72    51    35    16    13    12    5     3
Fuel cost (A$)                                    100   99    90    72    46    28    15    7
Freight rate (A$) — mean                          0     0     0     0     0     0     0     0
Freight rate (A$) — std. dev.                     96    77    60    45    37    25    13    8
Free-flow time (min) × distance (in ’000 km)      0     0     0     1     5     3     4     1
Distance-based charges (A$) — mean                0     0     0     0     0     1     0     0
Distance-based charges (A$) — std. dev.           99    90    81    60    47    32    10    11
Indeed, travel time has difficulty in achieving a p-value below 0.05 under mildly restricted sample sizes. On-time arrival probability appears to require more choice observations than travel time to achieve statistical significance, although its status as a random parameter may contribute to this. Fuel cost performs well under relatively large sample sizes, generally maintaining statistical significance at the 95% level at a sample size of 120. The interaction between free-flow travel time and trip distance loses significance quickly as the sample size is reduced, implying that at lower sample sizes a different modelling construct would be needed to identify separate effects for free-flow and slowed-down travel time. Lastly, the mean disutility of distance-based charges is insignificant at all restricted sample sizes, but heterogeneity around the mean is generally statistically significant at the 95% confidence level at a sample size of 130.

Now that we have looked into the mean values of each marginal utility estimate and the relative tendencies for each parameter estimate to be statistically significant at increasingly restricted sample sizes, it is important to consider the degree to which estimates from any given sub-sample would tend to differ from the full sample value. It is possible to observe repeated sub-samples that yield mean parameter estimates that are both close to the full sample values and have a high probability of being statistically significant, yet which include individual sub-samples with parameter estimates that are different from the mean value and are statistically significant, which is problematic. Table 7.7 shows the normalised standard deviation of each parameter estimate by cohort size. This gives us a scaled indicator of how closely grouped each parameter estimate is across repeated sub-samples.
Table 7.7: Standard deviation of parameter estimates as a percentage of the mean by cohort size (transporters).

Parameters                                        140    130    120    110    100    90     80     70
Reference alternative                             0.035  0.074  0.112  0.150  0.185  0.179  0.200  0.246
Travel time (min)                                 0.046  0.079  0.093  0.121  0.154  0.177  0.195  0.215
Probability of on-time arrival (%) — mean         0.057  0.110  0.148  0.187  0.209  0.232  0.265  0.317
Probability of on-time arrival (%) — std. dev.    0.057  0.096  0.134  0.175  0.197  0.246  0.277  0.307
Fuel cost (A$)                                    0.052  0.095  0.124  0.167  0.187  0.252  0.271  0.327
Freight rate (A$) — mean                          0.424  0.779  1.151  1.522  1.870  2.473  2.560  3.193
Freight rate (A$) — std. dev.                     0.098  0.183  0.239  0.305  0.367  0.403  0.484  0.544
Free-flow time (min) × distance (in ’000 km)      0.070  0.121  0.141  0.188  0.243  0.308  0.336  0.365
Distance-based charges (A$) — mean                0.100  0.193  0.221  0.325  0.406  0.466  0.526  0.600
Distance-based charges (A$) — std. dev.           0.054  0.108  0.154  0.202  0.298  0.289  0.359  0.428
Under an assumption of normality, we would expect a little more than 95% of the parameter estimates to lie within a range of two times the normalised standard deviation away from the mean. Beginning with the marginal utility of the reference alternative, a little more than 95% of the parameter estimates lie within 7% of the mean value at a restricted sample size of 140. This range grows quickly as the sample size is reduced, doubling when the sample size is restricted to 130 (indeed, each of the parameter estimate distribution ranges from the mean roughly doubles when the sample size is reduced from 140 to 130). This degree of uncertainty is stepped up incrementally again when restricting the sample size to 120 (the lowest sample size at which one may be reasonably confident of achieving strong statistical significance throughout the model), at which point about 95% of the marginal utility estimates lie within 25% of the mean.

Most of the parameter estimates behave similarly at a restricted sample size of 120, with around 95% of the marginal utility estimates for travel time, on-time arrival, fuel cost, free-flow time multiplied by trip distance, and distance-based charges between 20% and 30% of their respective means. The mean estimate for distance-based charges and the estimated standard deviation of the freight rate (not a design attribute) do show larger divergence from the mean, however, at a restricted sample size of 120 (whilst performing reasonably well at a sample size of 130). Considered together with Table 7.6, this evidence suggests that a sample size of 130 should have been sufficient to yield significant parameter estimates that are also reliable, but that further reductions in sample size could result in small sample size effects, with heterogeneity (i.e. unrepresentative samples) leading to either insignificant parameter estimates or significant parameter estimates that are considerably different from the values that one would expect under a larger sample.
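The summary statistics reported in Tables 7.5-7.7 can be computed from the bootstrap output in a few lines. Below is a sketch under the assumption that the replicates' estimates and t-statistics for one cohort size are stacked into arrays; the function and variable names are ours:

```python
import numpy as np

def summarise_cohort(estimates, t_stats, crit=1.96):
    """Mean estimate (Table 7.5), share of replicates significant at the
    95% level (Table 7.6) and standard deviation of estimates as a share
    of the mean (Table 7.7), per parameter."""
    estimates = np.asarray(estimates)   # shape: (replicates, parameters)
    t_stats = np.asarray(t_stats)
    mean_estimate = estimates.mean(axis=0)
    freq_significant = 100.0 * (np.abs(t_stats) > crit).mean(axis=0)
    normalised_sd = estimates.std(axis=0, ddof=1) / np.abs(mean_estimate)
    return mean_estimate, freq_significant, normalised_sd
```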
7.4.2. Results for Shippers
Turning to the shipper portion of the study, our analysis centres on a relatively complex base model. In this model, the marginal utility of each attribute in the design (other than waiting time, which is insignificant, just as in the transporter model), along with the reference alternative and freight rate, is represented as a random parameter. This flexibility in model selection is likely due in large part to the near doubling of choice observations relative to the transporter model (138 respondents with 8 choice observations each versus 145 transporters with 4 choice observations each). Such a hypothesis leads directly to the central theme of this section: the potential degree to which the sample could have been restricted whilst still yielding statistically significant parameter estimates for shippers.

We begin the discussion with a look into the discrepancies between the assumed prior parameter values and the observed parameter estimate values in the full sample. Table 7.8 highlights the differences between the priors and the observed marginal utility parameters.
Table 7.8: Shipper base model versus priors (t-statistics in parentheses; all random parameters distributed normally).

Parameter                                         Full sample         MRS(a)                       Prior MRS(a)
Reference alternative                             0.985 (4.302)       —                            —
Free-flow time (min) — mean                       -0.016 (-4.095)     1.749 [1.577, 1.921]         0.828
Free-flow time (min) — std. dev.                  0.020 (4.187)
Slowed-down time (min) — mean                     -0.030 (-3.392)     3.097 [2.945, 3.249]         0.828
Slowed-down time (min) — std. dev.                0.026 (1.840)
Probability of on-time arrival (%) — mean         0.179 (5.549)       -15.878 [-16.65, -15.106]    -2
Probability of on-time arrival (%) — std. dev.    0.167 (5.126)
Fuel cost (A$) — mean                             -0.010 (-2.866)     1                            1
Fuel cost (A$) — std. dev.                        0.006 (2.041)
Freight rate (A$) — mean                          -0.006 (-1.727)     0.66 [0.425, 0.894]          —
Freight rate (A$) — std. dev.                     0.020 (4.955)
Distance-based charges (A$) — mean                -0.013 (-3.149)     1.294 [1.25, 1.338]          2
Distance-based charges (A$) — std. dev.           0.010 (3.191)
Waiting time (min)                                0 (not in model)    0                            0.828

(a) Marginal rates of substitution are for values with respect to the value for fuel cost. The values in brackets are the lower and upper bounds of the 95% confidence interval, respectively.
The relationships amongst parameter estimates for shippers are generally, and in some cases drastically, different to the assumed prior relationships. The mean marginal rates of substitution with respect to fuel cost reveal much stronger sensitivities to travel time components than assumed, with values of 1.749 and 3.097 dollars per minute in free-flow and slowed-down conditions, respectively, versus prior values of 0.828 for both travel time components. That is, the base model implies that shippers, on average, are both more sensitive to travel time relative to fuel cost than assumed, and more sensitive to slowed-down time relative to free-flow time than assumed. Consistent with the transporter model, shippers appeared to be less sensitive to distance-based charges than assumed, on average, with a mean marginal rate of substitution between distance-based charges and fuel cost of 1.294 dollars spent on fuel per dollar spent on distance-based charges, compared to an assumed rate of 2. Most strikingly, the estimated mean marginal rate of substitution between the probability of on-time arrival and fuel cost is almost eight times higher than assumed, at 15.878 dollars per percentage point in reliability compared to 2. Hence, the assumed prior sensitivity to reliability was much lower than observed.

As with the transporter model, we estimated 100 sub-samples for decreasing multiples of 10 respondents, starting from the full sample size. The mean parameter estimates under each cohort size are shown in Table 7.9. Unlike the transporter model, which demonstrated fairly steady mean parameter estimates across cohort sizes down to around one-half of the full sample, we observe two separate tendencies in the shipper model.

The first tendency mirrors that observed for transporters, in which approximately half of the attributes' marginal utility estimates show no major trend as the sample size is reduced until it approaches one-half of the sample. That is, marginal utility estimates for the reference alternative, free-flow mean effect, slowed-down time mean effect, on-time arrival mean effect, freight rate and distance-based charge standard deviation are generally steady as the cohort size falls from 138 toward around 80. However, this is countered by the second tendency, in which the remaining explanatory variables tend to reveal a mean estimate that changes quickly from the full sample value as the cohort size falls. This indicates a strong statistical influence by a relatively small group of observations (i.e. outlier effects) over these variables. Indeed, the main empirical models arising from the original study removed some observations to control for outlier effects (chiefly unusually long trips and vehicles with unusual stated fuel economy). The new values taken by the mean parameter estimates at a sample size of 130 appear to represent a steady baseline for these variables, as the marginal utility estimates for these variables demonstrate stability from a sample size of 130 down to around the same threshold as the other attributes. This confirms the general trend, throughout both the transporter and shipper sides of the sample, for sub-samples to yield unbiased estimates relative to the full sample until the cohort size falls to around 80; in the case of this subset of attributes, the tendency is only apparent once controlling for outlier effects.

We now turn to Table 7.10 to examine the frequencies with which the marginal utility parameter estimates reached statistical significance at each restricted sample size. Many of the attributes in the design maintain significance up to and even beyond the 80-respondent level.
Table 7.9: Mean bootstrap parameter estimates by cohort size (shippers, 100 sub-samples per cohort size, 8 choice observations per respondent).

Parameters                                        All (138)  130     120     110     100     90      80      70      60      50      40
Reference alternative                             0.9847     0.9947  0.9959  0.9868  0.9853  0.9704  0.9526  0.9634  0.9392  0.9579  0.9249
Free-flow time (min) — mean                       0.0163     0.0151  0.0151  0.0155  0.0158  0.0157  0.0160  0.0160  0.0161  0.0163  0.0177
Free-flow time (min) — std. dev.                  0.0203     0.0204  0.0206  0.0209  0.0215  0.0213  0.0225  0.0226  0.0225  0.0226  0.0247
Slowed-down time (min) — mean                     0.0295     0.0287  0.0284  0.0287  0.0291  0.0294  0.0302  0.0305  0.0326  0.0325  0.0357
Slowed-down time (min) — std. dev.                0.0256     0.0242  0.0241  0.0246  0.0252  0.0248  0.0270  0.0274  0.0284  0.0288  0.0324
Probability of on-time arrival (%) — mean         0.1785     0.1699  0.1698  0.1739  0.1777  0.1772  0.1795  0.1800  0.1773  0.1802  0.1947
Probability of on-time arrival (%) — std. dev.    0.1670     0.1627  0.1648  0.1680  0.1714  0.1714  0.1744  0.1754  0.1781  0.1774  0.1935
Fuel cost (A$) — mean                             0.0100     0.0082  0.0085  0.0089  0.0096  0.0089  0.0097  0.0105  0.0113  0.0120  0.0119
Fuel cost (A$) — std. dev.                        0.0065     0.0038  0.0038  0.0043  0.0044  0.0047  0.0049  0.0054  0.0062  0.0066  0.0076
Freight rate (A$) — mean                          0.0058     0.0111  0.0111  0.0115  0.0105  0.0121  0.0125  0.0121  0.0113  0.0106  0.0144
Freight rate (A$) — std. dev.                     0.0199     0.0205  0.0201  0.0210  0.0205  0.0206  0.0209  0.0214  0.0188  0.0211  0.0221
Distance-based charges (A$) — mean                0.0132     0.0095  0.0097  0.0107  0.0116  0.0111  0.0121  0.0127  0.0149  0.0154  0.0182
Distance-based charges (A$) — std. dev.           0.0102     0.0096  0.0102  0.0099  0.0109  0.0102  0.0111  0.0118  0.0138  0.0147  0.0174
Table 7.10: Frequency of parameter estimates significant at the 95% confidence level by cohort size (shippers).

Attribute                                         130   120   110   100   90    80    70    60    50    40
Reference alternative                             100   100   100   99    99    96    89    75    67    63
Free-flow time (min) — mean                       100   100   100   100   99    98    99    86    79    62
Free-flow time (min) — std. dev.                  100   100   99    99    100   98    97    85    79    57
Slowed-down time (min) — mean                     100   100   100   100   95    88    81    72    53    41
Slowed-down time (min) — std. dev.                18    18    13    11    18    19    19    14    14    17
Probability of on-time arrival (%) — mean         100   100   99    99    100   100   100   100   100   100
Probability of on-time arrival (%) — std. dev.    100   100   99    99    100   100   100   99    100   91
Fuel cost (A$) — mean                             97    90    82    74    61    53    48    34    25    32
Fuel cost (A$) — std. dev.                        15    7     18    15    17    16    18    22    21    20
Freight rate (A$) — mean                          87    81    75    59    64    59    58    51    41    38
Freight rate (A$) — std. dev.                     100   100   99    97    98    99    94    86    78    84
Distance-based charges (A$) — mean                98    95    85    75    63    59    52    44    34    39
Distance-based charges (A$) — std. dev.           93    91    78    76    66    57    51    55    39    40
The mean effects for free-flow time, slowed-down time and on-time arrival probability, along with the standard deviations for free-flow time and on-time arrival probability, each maintain a strong tendency to reach statistical significance at restricted sample sizes. The reference alternative also enters the model significantly in almost all cases down to a sample size of 80. Two heterogeneous effects that are significant in the full model are seldom significant at any of the restricted sample sizes: preferences for slowed-down time and fuel cost appear, in general, to be homogeneous at small sample sizes. Indeed, the statistical significance of fuel cost is not assured once the sample size is restricted to around 100. This outcome is not difficult to justify (i.e. after controlling for the freight rate, the shipper may not be terribly sensitive to fuel cost). However, if the large-sample estimates showing a sensitivity to fuel cost are correct, this uncertainty in significance for fuel cost at moderate sample sizes is a concern. Some of this could feasibly be mitigated by re-specifying the parameter as non-random, which may increase the explanatory power of the mean effect. The freight rate (not a design attribute) demonstrates significant preference heterogeneity at sample sizes of 70 and even lower. Lastly, the mean marginal disutility of distance-based charges tends to enter the model with a p-value below 0.05 at sample sizes as low as 110, with preference heterogeneity apparent at a somewhat smaller frequency. This shares the implications found for fuel cost: whilst a low sensitivity to distance-based charges by shippers may be intuitive after accounting for the freight rate, at larger sample sizes shippers' sensitivities to distance-based charges are indeed statistically significant. Hence, if the model fails to show this under a restricted sample size, it may be a case of observing too few choices to identify shippers' true behaviour.

Turning to Table 7.11, we observe a rather large spread of most parameter estimates relative to their mean values at even moderately restricted sample sizes.
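The frequencies reported in Table 7.10 are simple tallies over those replications, and the spreads in Table 7.11 follow analogously. A sketch of the two computations, assuming each replication stores an estimate and a standard error per parameter under hypothetical column names beta_* and se_*:

```python
import pandas as pd

def significance_frequency(replications, crit=1.96):
    """Share of replications (in %) in which each parameter is significant at 95%."""
    params = [c[5:] for c in replications.columns if c.startswith("beta_")]
    freq = {}
    for k in params:
        t = replications[f"beta_{k}"] / replications[f"se_{k}"]
        freq[k] = 100.0 * (t.abs() > crit).mean()  # cf. Table 7.10
    return pd.Series(freq)

def spread_relative_to_mean(replications):
    """Standard deviation of estimates as a percentage of their mean (cf. Table 7.11)."""
    betas = replications.filter(like="beta_")
    return 100.0 * betas.std() / betas.mean().abs()
```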
Table 7.11: Standard deviation of parameter estimates as a percentage of the mean by cohort size (shippers). [The table reports this spread measure for the same 13 parameters as Table 7.9, at cohort sizes of 130 down to 40.]
This is important, in that the preceding tables show strong tendencies for the marginal utility estimates to reach statistical significance even when the sample size is reduced to around 80 respondents. Hence, under a restricted sample there would have been a risk of observing significant parameter estimates that were considerably different from the values found under a large sample. The lowest sample size that yielded a reasonably tight set of parameter value distributions was 110; at this sample size, around 95% of the sub-samples taken resulted in parameter estimates within 36% of the mean for the reference alternative, free-flow time (both mean and standard deviation), the standard deviation of slowed-down time, on-time arrival probability (both mean and standard deviation) and the standard deviation of the freight rate. The distributions for the standard deviations of free-flow time and the freight rate become wide at a sample size of 100, with the distributions of the mean estimates for free-flow and slowed-down time growing in spread at a sample size of 90. At a sample size of 80, the distributions for all but on-time arrival probability (mean and standard deviation) are wide enough that a spread of 40% of the mean is insufficient to account for 95% of estimates.

Ultimately, it appears that the design itself was strong enough to accommodate a sample size as small as 80 shippers; the choice observations that would have been captured under a sample of 80 shippers would likely have led to an econometric model of shipper choice behaviour yielding statistically significant parameter estimates. However, the strength of the design would not necessarily have been sufficient to obtain estimates reasonably close to the values that would be obtained under a larger sample. This may be less an issue of optimal design itself, and more an issue of heterogeneity dominating statistical efficiency concerns under small sample sizes.
7.5. Conclusions

The experimental design for the freight study was viewed as a strong success upon the completion of the survey, in that it led to the capture of sufficient preference information to estimate a series of complex econometric models despite the limited sample size of 145 transporters (with 580 choice observations) and 138 shippers (with 1106 choice observations). The bootstrapping exercise examined in this paper served to gauge just how far the design could have been pushed, had there been greater difficulty in sourcing respondents for the study. The initial analysis indicates that the design would likely have been sufficient to yield accurate and significant behavioural implications had the sample been restricted to around 130 transporters and 110 shippers. If the sample had been restricted to between 80 and 100 respondents from both decision-making classes, the sample size may have been small enough to yield an unrepresentative sample whilst still offering enough choice observations under the optimal design to achieve statistically significant parameter estimates. In such an unfortunate case, the significant parameter estimates could have been biased away from the values that would be found under a larger sample. Hence, it is important to acknowledge that, whilst optimal designs can be a powerful tool in achieving statistically significant parameter estimates under small
sample sizes, behavioural factors can outweigh statistical factors in determining an appropriate sample size. Ultimately, although statistical significance is a necessary condition for identifying preference information, it may not be sufficient. Rather, one must ensure that stability in parameter estimates has been reached before one can have confidence that the statistically significant parameter estimates obtained are also plausible estimates. We are confident that we have found such stability in the estimates obtained in the study at levels near the full sample, and even when restricted to as low as around 100 respondents in both classes of decision makers. Some instability found in interaction terms and random parameter distributions could be remedied through alternative model specifications, further solidifying the models under restricted sample sizes.

In addition to the implications relating to each half of the sample, the bootstrapping exercise revealed some interesting implications regarding experimental designs for studies involving multiple classes of decision makers in general. In the freight study, the restriction of capturing half as many choice observations per transporter as per shipper led the design algorithm to sacrifice some statistical efficiency with respect to the choices made by shippers over the choice sets faced jointly by transporters and shippers. This gave the experimental design a greater ability to ensure statistical significance for transporters under a smaller number of choice observations, relative to a design that weighted the prior information on transporters and shippers equally. The design appears to have successfully struck such a balance, in that both the transporter and shipper models demonstrated similar rates of decline in performance as the sample size was decreased. That is, despite the fact that the transporter model was calibrated against one-half the number of observations for a given sample size relative to the shipper model, statistical confidence in the behavioural implications for both sides of the sample decreased in a similar manner as the sample size was increasingly restricted.

This similarity across the two decision-making classes likely reflects one of two effects. The first would be the optimal one, in which the design accomplished what it was intended to accomplish. If this is true, we would recommend the same approach in similar studies involving multiple decision-making classes. A caveat should be issued, however, in that it may be appropriate to find an additional weighting mechanism in the experimental design process if one expects greater difficulty in sourcing respondents from a particular class of decision maker. In such a case, rather than using the number of choice observations per survey instrument for each class as a primary weighting criterion, the expected ratio of choice observations to be obtained in the study across classes may be important to consider. For example, in the freight study, if one had expected to be able to recruit only half as many shippers as transporters (and hence to obtain an equal number of choice observations for the two classes), the design would likely have been improved had it been calibrated to weight each class equally.

The other effect that may have resulted in similar behaviour across the two models as the sample size was increasingly restricted is that the shipper model is more
complex than the transporter model, and hence requires more data than the transporter model to reach a desired level of statistical significance. This is certainly a plausible explanation for at least some of the similarity in performance across the models, and it would be beneficial to re-examine the design under a simpler, common modelling structure for both transporters and shippers. We selected the more complex shipper model in this study because it takes the same form as in the original freight study, and because it allows us to examine a more complex model under restricted sample sizes. We will isolate this effect in ongoing research by establishing an appropriate common modelling structure and conducting repeated bootstrapping exercises as in this study. In other ongoing research, we will attempt to account for flexibility in model specification as the sample size is reduced, giving a more thorough picture of the potential for an optimal design to produce meaningful inference under small sample sizes. We will also test the performance of a range of orthogonal designs as a benchmark, to reveal the sample sizes that would be required to achieve the same quality of inference in the absence of an optimal design. It would also be beneficial to compare the performance of D0 designs (i.e. D-optimal designs in which all priors are assumed to be zero), as these represent a potentially valuable compromise between D-efficient and orthogonal designs. These exercises should demonstrate further the value and limitations of optimal designs in econometric studies of choice behaviour.
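For completeness, the D-error that D-efficient designs minimise (and that the D0 variant evaluates at zero priors) can be computed directly from the MNL information matrix. The sketch below is a generic illustration under assumed priors, not the design tool used in the study:

```python
import numpy as np

def mnl_d_error(X, beta):
    """D-error of a design for the MNL model.

    X    -- attribute array of shape (S, J, K): S choice sets, J alternatives, K attributes
    beta -- vector of prior parameters (a zero vector gives the D0 case)
    """
    S, J, K = X.shape
    info = np.zeros((K, K))
    for s in range(S):
        v = X[s] @ beta                  # utilities of the J alternatives
        p = np.exp(v - v.max())
        p /= p.sum()                     # MNL choice probabilities
        # Information contribution of this choice set: X'(diag(p) - p p')X
        info += X[s].T @ (np.diag(p) - np.outer(p, p)) @ X[s]
    # D-error: K-th root of the determinant of the AVC matrix (lower is better)
    return np.linalg.det(np.linalg.inv(info)) ** (1.0 / K)
```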
Acknowledgements

Support for this research has been provided by the Australian Research Council Discovery Program under Grant DP0208269 on Freight Transport and the Environment. We thank our referees for their comments, which have substantially improved the content of this paper.
Appendix

See Appendix Tables 7.A.1–7.A.4.

Table 7.A.1: Choice sets common across sampled group members — trips of 2 h or less (attribute levels relative to reference alternative).

Choice set  Alternative  Free-flow  Slowed-down  Waiting   Fuel      Likelihood of        Distance-based
                         time (%)   time (%)     time (%)  cost (%)  on-time arrival (%)  charges (%)
1           B            −25        +50          −25       −25       0                    −50
1           C            0          −50          +25       0         +50                  −25
2           B            −25        −25          0         −50       0                    +25
2           C            +50        +50          +25       +50       +25                  −50
3           B            0          −50          +25       +25       +50                  +50
3           C            −25        −25          0         0         0                    −25
4           B            +25        +50          +50       +25       0                    0
4           C            +50        −25          0         +25       +50                  −25
Table 7.A.2: Choice sets common across sampled group members — trips of more than 2 h (attribute levels relative to reference alternative).

Choice set  Alternative  Free-flow  Slowed-down  Waiting   Fuel      Likelihood of        Distance-based
                         time (%)   time (%)     time (%)  cost (%)  on-time arrival (%)  charges (%)
1           B            −25        −25          −25       +50       +50                  −50
1           C            −25        0            +50       −50       +50                  −50
2           B            0          −25          +25       0         0                    +50
2           C            −25        +50          −25       +50       +50                  −50
3           B            −50        0            +25       +50       −50                  −50
3           C            −50        +50          −50       −25       0                    +50
4           B            0          −50          −25       +50       +50                  +50
4           C            +25        +25          +25       +50       +25                  +25
Table 7.A.3: Choice sets for shippers only — trips of 2 h or less (attribute levels relative to reference alternative).

Choice set  Alternative  Free-flow  Slowed-down  Waiting   Fuel      Likelihood of        Distance-based
                         time (%)   time (%)     time (%)  cost (%)  on-time arrival (%)  charges (%)
5           B            +25        0            −50       −50       +25                  0
5           C            −50        +25          +50       +50       −50                  +50
6           B            +50        +25          −50       −25       +25                  −25
6           C            −50        −25          −25       +25       −50                  +25
7           B            −50        +50          +50       0         +25                  0
7           C            −25        −25          0         −25       −25                  −50
8           B            −50        +25          0         +25       −25                  0
8           C            +25        −50          −25       −50       0                    −25
Table 7.A.4: Choice sets for shippers only — trips of more than 2 h (attribute levels relative to reference alternative).

Choice set  Alternative  Free-flow  Slowed-down  Waiting   Fuel      Likelihood of        Distance-based
                         time (%)   time (%)     time (%)  cost (%)  on-time arrival (%)  charges (%)
5           B            +25        0            −50       +50       −50                  +50
5           C            +25        −25          −25       −25       +50                  +25
6           B            +50        +25          +25       0         0                    −25
6           C            −25        +50          −50       +25       +25                  0
7           B            −50        −50          +50       +25       +50                  +25
7           C            +25        0            +25       −50       −25                  0
8           B            0          −50          −50       +25       −50                  −25
8           C            −50        0            −50       +25       +25                  −25
Chapter 8
Effects of Stated Choice Design Dimensions on Model Estimates*

Phani Kumar Chintakayala, Stephane Hess, John M. Rose and Mark Wardman
Abstract

There have always been concerns about task complexity and respondent burden in the context of stated choice (SC) studies, with calls to limit the number of alternatives, attributes and choice sets. At the same time, some researchers have made the case that too simplistic a design might be counterproductive, given that such designs may omit important decision variables. This paper takes another look at the effects of design complexity on model results. Specifically, we make use of an approach devised by Hensher (2004)¹ in which different respondents in the study are presented with designs of different complexity, and look specifically at effects on model scale in a UK context, adding to previous Chilean evidence by Caussade et al. (2005). The results of our study indicate that the impact of design complexity may be somewhat lower than anticipated, and that more complex designs may not necessarily lead to poorer results. In fact, some of the more complex designs lead to higher scale in the models. Overall, our findings suggest that respondents can cope adequately with large numbers of attributes, alternatives and choice sets. The implications for practical research are
* This paper builds on earlier work by the authors presented at the 2008 European Transport Conference, but makes use of a data set from which a number of coding errors have been removed, significantly changing the results.
¹ See also Hensher (2006a, 2006b).
potentially significant, given the widespread use, especially in Europe, of stated choice designs with a limited number of alternatives and attributes.
8.1. Introduction

One of the main advantages of stated choice (SC) methods over the cruder stated intentions (SI) and willingness to pay (WTP) approaches is that the range of variables included makes the true purpose of the study less apparent, thereby potentially reducing the risk of response bias. From this perspective, it could be argued that, even if the analyst is only interested in, say, the value of time, and therefore only needs respondents to trade time against cost, there might be reasons for offering a broader range of attributes to conceal the purpose of the study. Indeed, this argument was commonly used in the early days of SC applications by advocates of the method to counter the scepticism of some transport planners towards hypothetical questioning methods in general and the SI approach in particular, with its reputation for yielding exaggerated behavioural responses. Moreover, since real-world decision making rarely involves choices between alternatives with only a limited number of attributes, it was argued both on the grounds of realism and of suitability for real-world forecasting that the set of key variables in an SC exercise should not be artificially restricted.

SC methods as first employed in the United Kingdom were imported from marketing research applications in the United States (Sheldon & Steer, 1982). The view in that field at that time was that a large number of attributes and/or alternatives did not have an adverse impact on model validity (Green & Wind, 1973; Malhotra, 1982; McCullough & Best, 1979; Scott & Wright, 1976). More recently, the view that simplicity is best is regarded as 'urban myth' by some and, according to Louviere (2001), there is no evidence to suggest that simpler SC exercises produce different valuations due to reduced task complexity.

Despite these various considerations, there is often a prevailing opinion, especially in an applied context, and even more so in the United Kingdom and in Europe by extension, that SC exercises must be kept relatively straightforward if reliable responses are to be obtained. This view has resulted in a trend towards simpler SC exercises with fewer variables. Hence, SC exercises with two alternatives and three or four attributes are common, with many examples of the most simplistic two-alternative two-attribute designs. To some extent, this is based on evidence in the transport context indicating that task complexity can influence valuations through simplifying choice rules, the ignoring of attributes, or an inability to process the information correctly (Arentze, Borgers, Timmermans, & DelMistro, 2003; Widlert, 1998).

There are a number of other explanations of the trend towards, and preference for, simpler designs. Work closely associated with Fowkes (e.g. Fowkes, Wardman, & Holden, 1993) has involved the use of 'boundary rays', which graphically represent the trade-off between attributes in terms of their implied valuations. This is largely a development
of the 'Beesleygraph' concept of the central importance of sensible trade-offs between attributes. This has intuitive appeal, although not necessarily statistical support. The method was fairly widely used in consultancy studies in the United Kingdom in the 1990s, and the need to represent the problem in two-dimensional space tended to limit the SC design to just three attributes. Another stimulus towards simple designs, recognised by Kroes and Sheldon (1988), is that self-completion questionnaires require more straightforward SC exercises than interviewer-led or computer-based exercises. The cost effectiveness of this means of data collection makes it very attractive in suitable contexts, thereby increasing the reliance on simple exercises.

Recently, analysts have moved away from SC exercises based around real-world choice contexts when the purpose of the study is valuation. Instead, choices are offered amongst unlabelled alternatives (A and B), whereupon there can be no confounding effects from a real-world choice context. This loosens the requirement to offer the principal variables influencing choice, with instead a focus on just those main attributes of interest.

There has been a tendency in the United Kingdom, reflected in mainland Europe, towards simpler exercises. Not only is this our impression from familiarity with a great many studies, but it can be at least partially evidenced. For example, in a review of a large amount of British empirical evidence, Wardman (1998) quantifies trends towards simpler forms of SC exercise, with choice experiments replacing ranking exercises and the number of choice scenarios evaluated reducing over time.

As far as what are often termed national value of time studies are concerned, two have been conducted in the United Kingdom. The first of these consisted of five SC exercises (MVA Consultancy, ITS University of Leeds, TSU Oxford University, 1987). For urban bus and inter-urban car users (both unlabelled SC exercises), only three variables were offered. This was increased to four in the case of urban car (route choice) and inter-urban public transport (unlabelled). The largest number was offered in a mode choice exercise involving rail and coach for commuting journeys into London (main in-vehicle time, other in-vehicle time, walk time, wait time and fare). The second British study (Hague Consulting Group, Accent Marketing and Research, Department for Transport, 1999) used even simpler designs. The main design had only two attributes, time and cost, and is similar in nature to designs used in major national studies in the Netherlands (Gunn, Tuinenga, Cheung, & Kleijn, 1999), Denmark (Fosgerau, Hjorth, & Lyk-Jensen, 2007) and Sweden (Alger, Dillen, & Widlert, 1996), but not, as far as we are aware, in studies outside Europe. However, it is probably fair to state that the reasoning here was not simplicity but the wish to offer a specific form of SC design with desired and clear trade-offs between time and cost. Four other SC exercises were covered in the study, one of which also had two attributes, two of which had three attributes and the remaining one had four attributes.

From the data set used in a meta-analysis of British empirical evidence relating to valuations of time and service quality covering the period 1963 to 2000 (Wardman, 2004), 155 studies used SC methods. Of these, 6% contained only two variables,
19% contained three variables, 59% contained four variables, 15% contained five variables and 1% contained six variables. Whilst designs with four attributes are the most common, and in some cases are sufficient to address the key variables in the choice context of interest, there remains a widespread concern to limit the number of attributes contained in an SC exercise to avoid overburdening respondents. In some cases, however, this may artificially limit the variables that are covered, ignore what for some are important influences on choice, and generally reduce the value of the results obtained.

Current viewpoints are set against a background of little direct empirical evidence demonstrating the effect of the dimensions of SC designs on the results obtained in the British context. We here report on a repeat of the survey that has previously been applied in Australia, Chile and Taiwan (Caussade, de Ortúzar, Rizzi, & Hensher, 2005; Hensher, 2004; Rose, Bliemer, Hensher, & Collins, 2008). The purpose here is to contribute to the evidence base as it relates to the United Kingdom rather than to undertake a comparative analysis.
8.2. Data

8.2.1. Description of Survey

For readers unfamiliar with the original work (Hensher, 2004), we here outline the approach to survey design. The survey is aimed at understanding the influence of design dimensionality on behavioural response. The SC experiment has 16 designs embedded in 1 master design. The different designs have different numbers of alternatives, choice sets and attributes, as well as varying numbers of attribute levels and varying attribute level ranges (i.e. narrow, medium and wide). The design characteristics are shown in Table 8.1.

Prior to undertaking the SC experiment, respondents are first asked to provide information on a recent trip that they have undertaken. The SC alternatives are then constructed with attribute values set as percentage variations around the values of the respondent's reported reference trip. Different percentages were used depending upon whether the respondent was assigned to a narrow-, medium- or wide-range design: narrow-range designs varied the attribute values by up to ±5%, medium-range designs by up to ±20%, and wide-range designs by up to ±60%. Each of the designs is computer generated, with the objective in constructing them being to minimise the determinant of the variance–covariance matrix of the parameters obtained from models estimated on data collected with the survey instrument (Rose, Hensher, Caussade, de Ortúzar, & Jou, 2008).

Six attributes were selected based on earlier studies (Hensher, 2001a, 2001b). These are (a) free-flow time (FFT), (b) slowed-down time (SDT), (c) stop/start time (SST), (d) uncertainty time (UT), that is, the buffer/extra time people need to cover uncertainties such as accidents and road works
Table 8.1: Design characteristics.

Design  No. of choice sets  No. of alternatives  No. of attributes  No. of levels  Range of attribute levels
1       15                  4                    4                  3              Medium
2       12                  4                    4                  4              Wide
3       15                  3                    5                  2              Wide
4       9                   3                    5                  4              Medium
5       6                   3                    3                  3              Wide
6       15                  3                    3                  4              Narrow
7       6                   4                    6                  2              Narrow
8       9                   5                    3                  4              Wide
9       15                  5                    6                  4              Medium
10      6                   5                    6                  3              Wide
11      6                   4                    5                  4              Narrow
12      9                   5                    4                  2              Narrow
13      12                  4                    6                  2              Medium
14      12                  3                    3                  3              Narrow
15      9                   3                    4                  2              Medium
16      12                  5                    5                  3              Narrow
so as to reach their destinations on time, (e) toll cost (TLC), and (f) running cost (RC). Given that the 'number of attributes' dimension has four levels, ranging from three to six, the following combinations of the six attributes were selected:

3 attributes: (a + b + c)², d, (e + f)³
4 attributes: a, (b + c)⁴, d, (e + f)
5 attributes: a, b, c, d, (e + f)
6 attributes: a, b, c, d, e, f
Attempts were made to prepare level-balanced designs. For attributes with an odd number of levels, the centre pivot level is 0%, with levels evenly spaced on either side, up to a maximum range of ±20% change from the reference alternative. For attributes with even numbers of levels, similar ranges are used but the 0% level is omitted. Balancing the levels in this way avoids potential biases that can arise when unbalanced designs entice respondents to focus too much on those attribute levels that are presented more frequently. The elements of the design plan are manipulated according to a master plan. As is apparent from Table 8.1, the master plan has 16 runs; that is, 16 different designs are constructed to test the impact of these 5 design elements.
² Referred to as total time.
³ Referred to as total costs.
⁴ Referred to as combined slowed-down and stop/start time (SSST).
In addition to the linear and quadratic effects for the five dimensions, some interactions between those elements can also be estimated. The master plan allows for the interaction between the number of choice sets and the number of alternatives, as well as between the number of alternatives and the number of attributes. The design of designs (DoD) SC experiment thus has 16 designs embedded in 1 master design; that is, there are 16 designs in the background, each with two versions. Since these designs do not have the same numbers of alternatives, choice sets and attributes, and since they do not use the same numbers of attribute levels or the same level ranges (narrow, medium and wide), all of these dimensions are varied jointly. Given the nature of the overall design process, not all of the 32 designs display attribute level balance; where balance was not possible, attempts were made to make the designs as balanced as possible, with the midpoint levels typically those displayed the least. Respondents are randomly allocated to 1 of the 16 subdesigns. As such, the levels applied to the choice tasks given to individual respondents differ depending on the range of attribute levels as well as on the number of levels for each attribute in the design to which they have been allocated. Once a respondent is allocated to a design, the number of attributes, the number of alternatives and the attribute level range are kept constant for that respondent. Variations in the design dimensions thus occur not within but between respondents. An example SC screen is illustrated in Figure 8.1. Respondents are initially asked to choose among alternatives that include their current trip. If a respondent chooses the current trip, he/she is then asked to make a secondary choice among the non-base alternatives available in that choice task. In the present paper, we focus on an analysis of the primary choices.
Figure 8.1: Sample SC screen.
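The pivoting of attribute levels around the reported reference trip can be illustrated with a short sketch. This is a simplified illustration, not the study's design generator; the range percentages follow the description above and the function name is ours:

```python
import numpy as np

RANGES = {"narrow": 0.05, "medium": 0.20, "wide": 0.60}  # maximum variation either side

def pivoted_levels(reference_value, n_levels, range_width):
    """Evenly spaced percentage deviations around a reference attribute value.

    np.linspace includes the 0% pivot for odd numbers of levels and omits it
    for even numbers, mirroring the level-balance rules described in the text.
    """
    r = RANGES[range_width]
    shares = np.linspace(-r, r, n_levels)
    return reference_value * (1.0 + shares)

# Example: a reported free-flow time of 30 minutes, 4 levels, medium range
print(pivoted_levels(30.0, 4, "medium"))  # [24. 28. 32. 36.]
```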
8.2.2. Description of Sample
The survey was conducted in the United Kingdom by Accent Marketing and Research using computer-aided personal interview (CAPI) technology. Face-to-face interviews with 300 commuters were undertaken in Bristol, Edinburgh and London in early May 2008. Fourteen of the 300 respondents were removed because of errors made during initial data entry. Stratified random sampling was employed, with screening questions used to establish eligibility with respect to commuting by car. Three trip length quotas were imposed: less than 30 min, 30–60 min and more than 60 min. Sample characteristics are given in Table 8.2. Given the emphasis on commuting, it is not surprising that the vast majority are in full-time employment, and this could contribute to the greater proportion of males in the sample, along with higher car ownership levels amongst male commuters. The distributions of travel times to work and age also seem reasonable.
Table 8.2: Sample characteristics.

Characteristic      Description         Number
Gender              Male                187 (62%)
                    Female              113 (38%)
Age                 <35 yrs             126 (42%)
                    36–55 yrs           130 (43%)
                    >56 yrs             44 (15%)
Employment          Full time           255 (85%)
                    Part time           35 (12%)
                    Casual              7 (2%)
                    No (in 6 months)    3 (1%)
Trip length (min)   <30                 180 (60%)
                    31–60               96 (32%)
                    >60                 24 (8%)
Table 8.3: Summary of observations information.

Dimension       Number of people    Number of observations
Choice sets
  6             79                  474
  9             70                  630
  12            76                  912
  15            75                  1125
Alternatives
  3             101                 1122
  4             92                  933
  5             107                 1086
Attributes
  3             63                  669
  4             77                  846
  5             72                  732
  6             88                  894
Levels
  2             95                  969
  3             96                  963
  4             109                 1200
Range
  Narrow        112                 1095
  Medium        99                  1218
  Wide          89                  828

Summary information on the distribution of the sample across choice sets, alternatives, attributes, levels of attributes and the range of levels is provided in Table 8.3. A good spread across the design dimensions has been achieved.
8.3. Base Model

The results of our base model are shown in Table 8.4. This model makes use of a linear-in-attributes specification of a multinomial logit model, with coefficients estimated for all attributes, along with alternative-specific constants (ASCs) for the first four alternatives. A jackknife procedure was used to account for the repeated-observations nature of the data. As is typically the case, this procedure had very little impact on the coefficient estimates themselves, but there are some noticeable reductions in the precision of those estimates, to the extent that five coefficients are not significant at the 5% level.
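A delete-one-respondent jackknife of the kind referred to here can be sketched as follows. This is a generic illustration, not necessarily the study's exact procedure (e.g. whether single respondents or blocks are deleted is not detailed), with estimate() a hypothetical routine returning the coefficient vector:

```python
import numpy as np

def jackknife_se(df, estimate, id_col="respondent_id"):
    """Delete-one-respondent jackknife standard errors.

    Dropping whole respondents rather than single rows respects the
    repeated-observations (panel) structure of the SC data.
    """
    ids = df[id_col].unique()
    n = len(ids)
    reps = np.array([estimate(df[df[id_col] != i]) for i in ids])
    theta_bar = reps.mean(axis=0)
    # Jackknife variance: (n - 1)/n times the sum of squared deviations
    var = (n - 1) / n * ((reps - theta_bar) ** 2).sum(axis=0)
    return np.sqrt(var)
```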
Table 8.4: Results for base model.

                                            Before jackknife       After jackknife
Parameter                                   Coefficient  t-stat    Coefficient  t-stat
Free Flow Time                              0.036        2.26      0.036        1.70
Stop/Start Time                             0.084        6.42      0.085        3.06
Combined Slowed-down and Stop/Start Time    0.096        5.24      0.096        2.52
Slowed-down Time                            0.066        4.64      0.066        2.87
Uncertain Time                              0.039        7.14      0.039        2.93
Running Cost                                0.499        4.49      0.500        1.90
Toll Cost                                   0.416        3.36      0.416        1.16
Total Cost                                  0.961        8.31      0.962        3.35
Total Time                                  0.172        12.32     0.172        6.90
ASC1                                        1.560        13.96     1.565        5.51
ASC2                                        0.179        1.94      0.179        2.15
ASC3                                        0.147        1.36      0.147        0.80
ASC4                                        0.259        1.69      0.261        1.38

Model fits: Observations 3018; Log likelihood −2221.44; Adjusted ρ² 0.456
Two of these are ASCs, and we would not necessarily expect these to be significant, representing as they do an ordering effect. Nonetheless, a positive and highly significant ASC was detected for the first alternative, reflecting inertia towards the current alternative. Comparison across coefficients at this stage is hampered by possible differences in scale across the different designs. For example, total cost and total time appear together but are not necessarily comparable with the coefficient estimates for their constituent parts that enter different designs. Nonetheless, it is encouraging that there is a monotonic relationship of the expected form between FFT, SDT and SST, although it is normally found that TLC is assigned a higher weight than RC, and this is not the case here.

Table 8.5 shows the t-ratios for the time parameter differences based on the model presented in Table 8.4. It can be seen that FFT is significantly different from SST and combined SSST, and not far from significantly different from SDT. However, none of the other differences are significant except for those relating to uncertain time.

The WTP values estimated using the RC and total cost coefficients, and the ratios of the time parameters from the model in Table 8.4, are shown in Table 8.6. None of the valuations of time in terms of RC are significant, and a key factor here is that the RC coefficient is itself not significant. The value of time ranges from £4.28 per hour (7.13 pence per minute) for FFT to £10.15 per hour (16.91 pence per minute) for SST.
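The quantities in Tables 8.5 and 8.6 follow directly from the estimates and their (robust) covariances. A minimal sketch, with the base-model figures quoted above used purely as an illustration:

```python
import numpy as np

def t_stat_difference(b1, b2, var1, var2, cov12):
    """t-ratio for the difference between two coefficients (cf. Table 8.5)."""
    return (b1 - b2) / np.sqrt(var1 + var2 - 2.0 * cov12)

def wtp_pounds_per_hour(beta_time_per_min, beta_cost_per_pound):
    """Value of time as a ratio of marginal utilities, scaled from minutes to hours."""
    return 60.0 * beta_time_per_min / beta_cost_per_pound

# Free-flow time (0.036 utils/min) against running cost (0.499 utils/GBP)
print(wtp_pounds_per_hour(0.036, 0.499))  # about 4.33 GBP/hour, cf. Table 8.6
```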
Table 8.5: t-Stats for parameter differences.

Parameter 1                     Parameter 2                     t-Stat
Free Flow Time                  Uncertain Time                  0.22
Free Flow Time                  Slowed-down Time                1.50
Free Flow Time                  Stop/Start Time                 2.41
Free Flow Time                  Slowed-down Stop/Start Time     2.40
Slowed-down Stop/Start Time     Slowed-down Time                1.26
Slowed-down Stop/Start Time     Uncertain Time                  2.93
Slowed-down Time                Uncertain Time                  1.78
Stop/Start Time                 Slowed-down Stop/Start Time     0.49
Stop/Start Time                 Slowed-down Time                0.95
Stop/Start Time                 Uncertain Time                  3.28
Table 8.6: WTP estimates for base model (after jackknifing).

Parameter                                   Value (£ per hour)  t-Stat  Confidence interval
Free Flow Time                              4.35                1.54    0 – 10.78
Stop/Start Time                             10.40               1.51    0 – 22.26
Combined Slowed-down and Stop/Start Time    11.80               1.47    0 – 26.32
Slowed-down Time                            8.13                1.55    0 – 17.56
Uncertain Time                              4.86                1.36    0 – 10.01
Total Time vs. Total Cost                   10.84               3.16    4.1 – 17.37
These numbers surround the official Department for Transport value for commuting (essentially for an unspecified type of time) of around 10.5 pence per minute.
8.4. Accounting for Design Dimensions

In this section, we investigate the impacts of design complexity by testing for differences in the relative weights of the observed and unobserved utility components as a function of the design. This is done by estimating scale parameters for each dimension. When estimating models on data from different sources, it is important to allow for differences in scale across those sources; indeed, the relative weight of the observed and unobserved parts of utility may vary significantly across data sets, for example as a result of more or less randomness in the choice processes. We can hypothesise that the amount of randomness in choices will be greater as task complexity increases. The models estimated in this section all need to be compared with the base model from the previous section.
8.4.1. Allowing for Differences in Scale
One can use the Bradley–Daly (1992) nested logit technique for estimating scale parameters. In this paper, however, we employ a more direct method of estimating the scale parameters, using BIOGEME, as explained below.

Let $U_{i,d}$ give the utility of alternative $i$ in sample $d$, where $d = 1, \ldots, D$. Then we have

$$U_{i,1} = V_{i,1} + \varepsilon_{i,1}$$
$$U_{i,d} = V_{i,d} + \varepsilon_{i,d} \qquad (8.1)$$
$$U_{i,D} = V_{i,D} + \varepsilon_{i,D}$$

where $\mathrm{var}(\varepsilon_{i,d}) = \pi^2 / 6\mu_d^2$. Using sample 1 as the arbitrary base, we multiply the utility functions in group $d$ by $\alpha_d$, where we set $\alpha_1 = 1$. In detail, we then have

$$U_{i,1} = V_{i,1} + \varepsilon_{i,1}$$
$$\alpha_d U_{i,d} = \alpha_d V_{i,d} + \alpha_d \varepsilon_{i,d} \qquad (8.2)$$
$$\alpha_D U_{i,D} = \alpha_D V_{i,D} + \alpha_D \varepsilon_{i,D}$$

For estimation as a homoscedastic model, we thus obtain that

$$\mathrm{var}(\varepsilon_{i,1}) = \alpha_d^2 \, \mathrm{var}(\varepsilon_{i,d}) \qquad (8.3)$$

This gives us that

$$\alpha_d^2 = \frac{\mathrm{var}(\varepsilon_{i,1})}{\mathrm{var}(\varepsilon_{i,d})} \qquad (8.4)$$

which in turn reduces to

$$\alpha_d = \frac{\mu_d}{\mu_1} \qquad (8.5)$$

This means that if the estimate for $\alpha_d$ is larger than 1, the variance of the unobserved utility components in sample $d$ is smaller than in the base sample, with the converse applying if $\alpha_d$ is smaller than 1.
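The direct approach amounts to maximising the likelihood of a logit model whose systematic utilities are multiplied by a group-specific scale, with one group's scale fixed at 1. The authors estimate this in BIOGEME; the following scipy sketch of the same idea on generic data is ours, with all names illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def scaled_logit_nll(theta, X, y, group, n_beta):
    """Negative log-likelihood of an MNL model with group-specific scales.

    X     -- (N, J, K) attributes; y -- (N,) chosen alternative indices
    group -- (N,) group index per observation; group 0 has its scale fixed at 1
    theta -- K taste parameters followed by the G-1 free scales
    """
    beta = theta[:n_beta]
    alpha = np.concatenate(([1.0], theta[n_beta:]))
    v = alpha[group][:, None] * (X @ beta)     # scaled systematic utilities
    v = v - v.max(axis=1, keepdims=True)       # numerical stabilisation
    logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].sum()

# Illustrative call on synthetic data: 2 groups, 3 alternatives, 2 attributes
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3, 2))
y = rng.integers(0, 3, size=500)
group = rng.integers(0, 2, size=500)
fit = minimize(scaled_logit_nll, x0=np.array([0.0, 0.0, 1.0]),
               args=(X, y, group, 2), method="BFGS")
print(fit.x)  # [beta_1, beta_2, alpha_1]
```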
8.4.2. Variations in Scale with Design Dimension
We now turn away from accounting for differences in scale across all 16 designs, focusing instead on differences as a function of the number of choice sets, alternatives, attributes and levels of attributes, and the range of levels (narrow, medium or wide). Here, it should be noted that this comparison is not perfect, as other factors might come into play, although the master design should have largely dealt with this. Table 8.7 presents the estimates of the scale parameters and Table 8.8 reports the estimates of the study parameters.

8.4.2.1. Model with scales for different designs
There is a significant improvement in model fit when the designs are treated separately: a chi-squared test suggests that the model with scales for the different designs is superior to the base model. All but three scale parameters are significantly different from, and less than, 1, indicating that the choices are less sensitive to the observed utilities when compared with the base design, which was one of the more complex designs, with 15 choice sets, 4 alternatives, 4 attributes and 3 levels. However, no clear pattern in the scales as a function of design can be observed.

8.4.2.2. Model with scales for number of choice sets
The scale parameter for the group with 6 choice sets is normalised to 1. The model fit improves marginally on the base model. However, none of the scale parameters is significantly different from 1, indicating that the number of choice sets has little influence on the relative weight of the observed and unobserved utility components.

8.4.2.3. Model with scales for number of alternatives
The scale parameter for the group with 3 alternatives is set to 1. The model fit improves marginally on the base model, and a chi-squared test suggests that this model is significantly better than the base model. Both remaining scale parameters are not significantly different from 1, indicating that the number of alternatives in this case has little or no impact.

8.4.2.4. Model with scales for number of attributes
The scale parameter for the group with 3 attributes is set to 1. The model fit improves marginally on the base model, and again a chi-squared test suggests that this model is significantly better than the base model. Only the scale parameter for the data with 6 attributes is significantly different from, and less than, 1, indicating more randomness with 6 attributes. This potentially suggests an upper limit to the acceptable number of attributes.

8.4.2.5. Model with scales for number of levels
The scale parameter for the group with 2 levels is set to 1. The model fit improves considerably on the base model. Of the remaining scale parameters, that representing 3 levels is significantly different from, and greater than, 1, indicating more weight on the observed utilities, while the 4-level scale is insignificant and thus uninformative. No clear pattern emerges.
Table 8.7: Scale differences models — scales. [The table reports, for each model, the estimated scale parameters with jackknife t-statistics against 1, with the base category's scale normalised to 1. Model fits:]

Model          Log likelihood   Adjusted ρ²
Base           −2221.44         0.456
Designs        −2107.56         0.480
Choice sets    −2213.22         0.457
Alternatives   −2211.96         0.458
Attributes     −2211.64         0.457
Levels         −2194.30         0.462
Range          −2209.50         0.458
Set size       −2162.44         0.468
Table 8.8: Scale differences models — study parameters. [The table reports, for the base model and for each of the seven scaled models (designs, choice sets, alternatives, attributes, levels, range, set size), the coefficient estimates with jackknife t-statistics for the study parameters — free flow time, slowed-down time, stop/start time, combined slowed-down and stop/start time, uncertain time, running cost, toll cost, total cost, travel/total time and ASC1–ASC4 — together with the model fits given in Table 8.7.]
8.4.2.6. Model with scales for range
The scale parameter for the group with data from designs using the medium range is set to 1. The model fit improves marginally on the base model. The remaining scale parameters are significantly different from, and less than, 1, suggesting that narrow and wide ranges are less desirable. Nonetheless, the magnitude of the effect is minor.

8.4.2.7. Model with scales for number of items in a set
The scale parameter for the group with data from the design using the minimum number of items (i.e. 9 items, resulting from 3 alternatives times 3 attributes) is set to 1. The model fit improves significantly on the base model. Two scale parameters are significantly different from, and less than, 1, suggesting that the choices are less sensitive to the observed utilities when compared with the base; the remaining scale parameters are insignificant. No clear pattern could be observed.

The above analysis suggests that accounting for scale differences across the various designs improves model performance, but that few design dimension effects are statistically significant and, where they are, the scale is not greatly different from 1. That increases in design complexity do not necessarily lead to a reduction in scale dispels some prevailing concerns.
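The chi-squared comparisons invoked throughout this section are standard likelihood-ratio tests between nested models. A minimal sketch, using as an example the log-likelihoods reported for the base and designs models (the 15 degrees of freedom correspond to the 15 freely estimated scales, with design 1 normalised):

```python
from scipy.stats import chi2

def lr_test(ll_restricted, ll_unrestricted, df):
    """Likelihood-ratio test statistic and p-value for nested models."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    return stat, chi2.sf(stat, df)

# Base model vs. model with a separate scale for each of the 16 designs
stat, p = lr_test(-2221.44, -2107.56, df=15)
print(round(stat, 2), p < 0.05)  # 227.76 True
```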
8.4.3. Variations in WTP Values

We now focus on the effect of accounting for scale differences across the design dimensions on the WTP values. Table 8.9 presents the WTP values estimated from the different models accounting for the different design dimensions. There is not much variation in the WTP values across models, with the exceptions of the models that account for differences across designs, attributes and set size. FFT is valued lowest, as expected. While combined SSST is valued highest in the models that account for scale differences by number of choice sets, alternatives, levels and width of levels, it is valued second highest to SST in the models that account for scale differences by designs, attributes and set size.
8.5. Separate Models by Group

Having established that the design dimensions have little bearing on the scale of a model, we turn to their impact on model fit and WTP valuations through the estimation of separate models for different subgroups of the data, that is, data collected from surveys with 3 alternatives, 4 alternatives, and so on. As a first result, the adjusted ρ² measure for the various models is given in Figure 8.2.
Table 8.9: WTP values (in £ per hour). [The table reports, for each of the models accounting for scale differences (designs, choice sets, alternatives, attributes, levels, width, set size), the WTP values with t-statistics and 95% confidence intervals for free flow time, stop/start time, combined slowed-down and stop/start time, slowed-down time, uncertain time and total time vs. total cost.]
Figure 8.2: Adjusted ρ² values.

The results show that the number of alternatives has little or no effect on model fit. The findings on the number of attributes and levels are consistent with those from the scale analysis, with 6 attributes and an uneven number of levels leading to lower model fit. However, unlike the results showing lower scale with narrower or wider ranges, we here observe higher model fit. This could be an indication that, with narrower or wider ranges, the choice becomes more deterministic without, however, increasing the weight of the attributes, for example as a result of more behaviour being explained through the constants.

Table 8.10 shows a summary of the WTP measures from the different models estimated in this section. Here, we can see that while there is consistency across surveys for some of the indicators, there are also some significant differences. The results show that the WTP for FFT is between £5 per hour and £9.3 per hour, for SDT between £6.5 per hour and £8.2 per hour, for SST between £6.2 per hour and £13.6 per hour, for combined SSST between £3.8 per hour and £12.9 per hour (with £46.2 per hour for the 5-alternatives data set), for UT between £2.6 per hour and £5.1 per hour, and for travel time between £2.8 per hour and £16.2 per hour (with a value of £22.9 per hour for the 9-choice-sets data set). To some extent, these differences are also a result of small sample sizes in some groups and lower parameter significance, which with hindsight clearly also potentially impacts our earlier findings in terms of scale. In any case, although the results show some impact of the design on the WTP indicators, there is no clear pattern suggesting that more complex designs yield less reliable results.
8.6. Conclusions

With a view to demonstrating the effect of the dimensions of SC designs on the results obtained in the British context, we have repeated the survey that has been previously applied in Australia, Chile and Taiwan.
Table 8.10: WTP values from different models. [The table reports WTP values (£ per hour, with t-statistics) for FFT, SDT, SST, SSST, UT and TT from the separate models estimated by subgroup: base; 3, 4 and 5 alternatives; 3–6 attributes; 2–4 levels; narrow, medium and wide ranges; 6, 9, 12 and 15 choice sets; and 9–30 items.]
Abbreviations: FFT, free flow time; SDT, slowed-down time; SST, stop/start time; SSST, slowed-down and stop/start time; UT, uncertainty time; TT, total time vs. total cost; n/s, not significant; N/A, either the parameter or running cost is not present in the data.
We have attempted to capture the effect of the design dimensions by allowing for differences in the scales associated with variations in those dimensions. The model with scale differences across designs gave no clear indication that more complex designs are difficult to cope with. The model with scale differences by number of choice sets indicates that the number of choice sets has little influence on the relative weight of the observed and unobserved utility components, and the model with scale differences by number of alternatives indicates the same for the number of alternatives. The model with scale differences by number of attributes provides evidence that randomness is greater when the number of attributes is 6 than when it is less than 6, although there is no clear pattern. The model with scale differences by number of levels indicates that, when the number of levels is 3, choices are more sensitive to the observed utilities; again, no clear pattern could be observed. The model with scale differences by level range shows significant effects for the wide and narrow ranges, but the size of these effects is minor.

Overall, our study shows that there is little or no impact on scale when the design dimensions are accounted for. Our results also suggest that respondents do not have problems coping with more complex designs. The estimates for the study parameters all have the expected signs, and the ASC for the base alternative (ASC1) suggests that there is strong inertia towards the base alternative. As for the WTP values, FFT is valued lowest, as expected, while combined SSST is valued highest in the models that account for scale differences by number of choice sets, alternatives, levels and width of levels, and second highest to SST in the models that account for scale differences by designs, attributes and set size.

The estimation of separate models for different subgroups of the data suggests that the number of alternatives has little or no effect on model fit. The findings on the number of attributes and levels are consistent with those from the scale analysis, with 6 attributes and an uneven number of levels leading to lower model fit. However, unlike the results showing lower scale with narrower or wider ranges, we here observe higher model fit. Although our results show some significant differences across designs in terms of model scale and substantive results such as WTP indicators, there is little or no suggestion that the results from the more complex designs are less reliable than those from the more simplistic designs. In fact, especially when looking at the results for scale, the opposite is regularly the case.
Acknowledgments

The authors wish to thank Accent Marketing and Research Ltd. for funding the data collection in the United Kingdom. The second author thanks the Leverhulme Trust for the financial support in the form of a Leverhulme Early Career Fellowship.
Chapter 9
Stated Choice Experimental Designs for Scheduling Models

Paul Koster and Yin-Yen Tseng
Abstract

In this paper, we analyze statistical properties of stated choice experimental designs when model attributes are functions of several design attributes. The scheduling model is taken as an example. This model is frequently used for estimating the willingness to pay (WTP) for a reduction in schedule delay early and schedule delay late. These WTP values can be used to calculate the costs of travel time variability. We apply the theoretical results to the scheduling model and design the choice experiment using measures of efficiency (S-efficiency and WTP-efficiency). In a simulation exercise, we show that the designs based on these efficiency criteria perform on average better than the designs used in the literature in terms of the WTP for the travel time, schedule delay early, and schedule delay late variables. However, the gains in efficiency decrease with the number of respondents. Surprisingly, the orthogonal design performs rather well in the example we demonstrate.
9.1. Introduction

This paper focuses on the design of a stated preference (SP) experiment and contributes to the literature in two ways. First, we show how to design an SP experiment when model attributes are different from design attributes. The existing approaches to creating experimental designs are hard to apply in this case. The main problem is that the relations between the model attributes and the design attributes are not one to one, and the model attributes depend on the values of
several design attributes. As a result, there can be substantial correlation between the model attributes even if the design attributes are designed orthogonally. Because of this mapping problem, it is also difficult to implement efficient designs, since an efficiency measure based on the design attributes is no longer appropriate. In this paper, we propose a design procedure to find a reasonably efficient design given the relations between the model and design attributes and some knowledge about the prior coefficients of the model attributes. These priors can usually be identified from the literature, or can be obtained from a pilot study. Second, we compare a number of different design techniques using simulation analysis. In the simulation we compare the traditional orthogonal design and the designs proposed by earlier studies (Small, Noland, Chu, & Lewis, 1999; Asensio & Matas, 2008) with the designs obtained by using efficiency considerations, such as S-efficiency and WTP-efficiency.

In our simulation exercise, we use the scheduling model as an example. The scheduling model is frequently used to estimate the willingness to pay (WTP) for reductions in travel time, schedule delay early (SDE) and schedule delay late (SDL). The resulting coefficients can be used to derive the costs of travel time variability (Noland & Small, 1995; Fosgerau & Karlström, 2007), which are relevant for policy evaluation in transport projects. Therefore, the precision of the estimated parameters and the resulting WTPs is of vital importance, and is linked to how efficiently individuals' preferences are revealed in the choice experiments. Several authors have estimated the scheduling model using SP experiments; an overview of the scheduling model and some earlier studies is given in Section 9.3.

The remainder of this paper is organized as follows. In Section 9.2, we derive design efficiency criteria for the case where model variables are different from design variables. Section 9.3 gives a brief introduction to the scheduling model and some empirical studies in this area. Section 9.4 illustrates the procedure of generating the experiment for the scheduling model. In Section 9.5, we compare the efficient designs proposed in the present study with the conventional orthogonal design and the design from an earlier study. Finally, Section 9.6 concludes and discusses the approach proposed in this paper.
9.2. Design Efficiency

9.2.1. Theoretical Derivations
The theoretical derivations in this section are carried out for the general case where the vector of model attributes, A, depends on the vector of design attributes, Z. The derivations are done for the binary logit model, but can also be extended to the multinomial logit model. We denote $V_{1n}(A(Z), \beta)$ as the utility of choice situation n for the first alternative and $V_{2n}(A(Z), \beta)$ as the utility for the second alternative, where A is a function of the variables in Z, and β is the vector of coefficients of the utility function. The binary logit probability of choosing the first alternative is given in Eq. (9.1a) and depends on the difference of the utilities.
Furthermore, the difference depends on the scale of utility λ, which is inversely related to the standard deviation of the Gumbel error term (Train, 2003; McFadden, 1974; Louviere, Hensher, & Swait, 2000).

$$P_{1n} = \frac{1}{1 + e^{\lambda \left[ V_{2n}(A(Z),\beta) - V_{1n}(A(Z),\beta) \right]}} \qquad (9.1a)$$

$$P_{2n} = \frac{1}{1 + e^{\lambda \left[ V_{1n}(A(Z),\beta) - V_{2n}(A(Z),\beta) \right]}} \qquad (9.1b)$$

The log-likelihood function is given by Eq. (9.2):

$$LL = \sum_{n=1}^{N} \left[ y_{1n} \log(P_{1n}) + y_{2n} \log(P_{2n}) \right] \qquad (9.2)$$

where $y_{1n}$ is 1 if alternative 1 is chosen and 0 otherwise. When estimating the MNL model we maximize the log-likelihood function. The second derivative of the log-likelihood function can be used to derive the asymptotic variance-covariance (AVC) matrix (Rose & Bliemer, 2005; McFadden, 1974). The elements of the AVC matrix do not depend on the choices y; therefore, the efficiency of the design can be evaluated without the use of simulation. So, given the design levels and the assumed priors, the utilities and the logit probabilities of Eqs. (9.1a) and (9.1b) can be calculated. The second derivative of the log-likelihood function is given by Eq. (9.3a) (Kanninen, 2002; McFadden, 1974; Rose & Bliemer, 2005). The matrix of second derivatives is sometimes called the information matrix, and designs that result in higher values of the information matrix are likely to perform better. For notational convenience we define $\Delta_{kn}(Z) = A_{k1n}(Z) - A_{k2n}(Z)$ as the difference between the kth attribute levels of the two alternatives for choice situation n.

$$\frac{\partial^2 LL}{\partial \beta_k \, \partial \beta_l} = -\lambda^2 \sum_{n=1}^{N} \frac{e^{V_{1n}(A(Z),\beta) + V_{2n}(A(Z),\beta)} \, \Delta_{kn}(Z) \, \Delta_{ln}(Z)}{\left( e^{V_{1n}(A(Z),\beta)} + e^{V_{2n}(A(Z),\beta)} \right)^2} = -\lambda^2 \sum_{n=1}^{N} P_{1n} P_{2n} \, \Delta_{kn}(Z) \, \Delta_{ln}(Z) \qquad (9.3a)$$

$$I(A(Z),\beta) = \lambda^2 \begin{pmatrix} \sum_{n=1}^{N} P_{1n} P_{2n} \Delta_{1n}(Z) \Delta_{1n}(Z) & \cdots & \sum_{n=1}^{N} P_{1n} P_{2n} \Delta_{1n}(Z) \Delta_{kn}(Z) \\ \vdots & \ddots & \vdots \\ \sum_{n=1}^{N} P_{1n} P_{2n} \Delta_{kn}(Z) \Delta_{1n}(Z) & \cdots & \sum_{n=1}^{N} P_{1n} P_{2n} \Delta_{kn}(Z) \Delta_{kn}(Z) \end{pmatrix} \qquad (9.3b)$$
If the model attributes correspond one-to-one to the design attributes, a change in a design attribute will result in a change of the values of the corresponding column and row of that attribute. So, for example, if we change the first attribute level of the first alternative, only the values in the first row and column of the information matrix will change. When a model attribute depends on more design attributes, a change in a design variable Z will affect the whole information matrix, since Z appears in every element of the matrix. In the case of zero priors, the probabilities $P_{1n}$ and $P_{2n}$ are equal to 0.5, and the second derivative reduces to $-\lambda^2 \sum_{n=1}^{N} \tfrac{1}{4} \Delta_{kn}(Z) \Delta_{ln}(Z)$. As pointed out by Rose and Bliemer (2005), the orthogonal design is then the most efficient design. Equation (9.4a) gives the AVC matrix Ω for R respondents, which can be derived by taking the inverse of Eq. (9.3b) (McFadden, 1974). Since a higher scale parameter λ results in higher values of the second derivative, it in turn results in lower values for the elements in the AVC matrix.

$$\Omega = \frac{1}{R} \frac{1}{\lambda^2} \left( I(A(Z),\beta) \right)^{-1} \qquad (9.4a)$$

$$\frac{\partial \Omega}{\partial Z} = -\frac{1}{R} \frac{1}{\lambda^2} \left( I(A(Z),\beta) \right)^{-1} \frac{\partial I}{\partial A} \frac{\partial A}{\partial Z} \left( I(A(Z),\beta) \right)^{-1} \qquad (9.4b)$$
From Eq. (9.4b) we can conclude that a change in Z affects all the elements of the AVC matrix. If the design attributes correspond to the model attributes, the first derivative of A with respect to Z equals 1 and the mapping of the design attributes on the model attributes is one to one. The results of Eqs. (9.4a) and (9.4b) are very general, and can be used to handle functions of design attributes, such as quadratic terms and interaction terms between attributes. If the levels of the design attributes are chosen on a continuous scale, Eq. (9.4b) can be used to optimize the design levels by using, for example, a BFGS algorithm. However, most researchers choose a certain number of fixed levels for the attributes. In that case, integer optimization techniques or simulation can be used.
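To make the evaluation step concrete, the following is a minimal sketch of computing the information matrix and AVC matrix of a candidate binary design at a set of priors, following Eqs. (9.1a)-(9.4a). It is written in Python/NumPy rather than the Matlab the authors report using (see footnote 4); the function name and interface are our own illustration, and the scale λ is normalized to one.

```python
import numpy as np

def binary_logit_avc(V1, V2, delta, n_respondents=1):
    """AVC matrix of a candidate binary design at prior coefficients.

    V1, V2 : (N,) systematic utilities of the two alternatives in each
             choice situation, computed from the model attributes A(Z)
             and the priors (scale lambda normalized to one).
    delta  : (N, K) attribute differences Delta_kn(Z) = A_k1n(Z) - A_k2n(Z).
    """
    p1 = 1.0 / (1.0 + np.exp(V2 - V1))           # Eq. (9.1a)
    p2 = 1.0 - p1                                # Eq. (9.1b)
    weights = p1 * p2                            # enters every element of (9.3b)
    info = delta.T @ (delta * weights[:, None])  # information matrix, Eq. (9.3b)
    return np.linalg.inv(info) / n_respondents   # AVC matrix, Eq. (9.4a)
```

Because the choices y drop out of Eqs. (9.3a)-(9.4a), this evaluation needs no simulated responses, which is what makes screening large numbers of candidate designs feasible.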
9.2.2. Design Criteria
The criteria for evaluating design performance are based on the information matrix of Eq. (9.3b) or the AVC matrix in Eq. (9.4a). For example, a frequently used criterion is the D-error:

$$\text{D-error} = \left[ \det\left( \frac{1}{\lambda^2} \left( I(\beta, A(Z)) \right)^{-1} \right) \right]^{1/K} = \frac{1}{\lambda^2} \left[ \det\left( \left( I(\beta, A(Z)) \right)^{-1} \right) \right]^{1/K} \qquad (9.5)$$
In Eq. (9.5) the determinant is raised to the power 1/K if there are K attributes in the model. Furthermore, the D-error increases if the scale parameter λ decreases. We will not use the D-error in our analysis because it summarizes the whole AVC
matrix. Since not all elements of Ω are of interest to us, we will use two other criteria in the present study. First, we use S-efficiency as a design criterion. The goal of the S-efficiency design is to minimize the sample size required for obtaining reasonable t-values given a set of prior coefficients and the attribute levels (Rose & Bliemer, 2005). The objective in the simulation is then to maximize the lowest t-value over all the coefficients; in other words, to minimize the theoretical sample size required for obtaining significant estimates for all parameters in the utility function. Another criterion, introduced by Kanninen (1993), summarizes the efficiency of the WTP estimates. The WTP is usually in monetary units, so for attribute k the WTP can be obtained by taking the ratio of the coefficients for k and cost (i.e., $\beta_k / \beta_c$, where $\beta_k$ and $\beta_c$ are the coefficients of k and cost, respectively) in the utility function. In Eq. (9.6), the variance of the WTP for attribute k is approximated using the delta method, as proposed by Scarpa and Rose (2008) and Vermeulen, Goos, Scarpa, and Vandebroek (2008):

$$\operatorname{Var}\left( \frac{\beta_k}{\beta_c} \right) \approx \frac{\operatorname{Var}(\beta_k) - 2 \left( \beta_k / \beta_c \right) \operatorname{Cov}(\beta_k, \beta_c) + \left( \beta_k / \beta_c \right)^2 \operatorname{Var}(\beta_c)}{\beta_c^2} \qquad (9.6)$$

The variance of the WTP depends on the (estimated) values of the coefficients, the variance of these coefficients, and the covariance. From Eq. (9.6), it is clear that a high positive covariance will result in a lower variance for the WTP. In Eq. (9.7) we derive the t-value of the WTP for attribute k as:

$$t_{WTP_k} = \frac{WTP_k \sqrt{R}}{\sqrt{\operatorname{Var}(WTP_k)}} \qquad (9.7)$$
A higher number of respondents, R, results in higher t-values, since the t-values increase with the square root of R. The design that maximizes the lowest t-value of these WTPs is therefore the best design. For a good overview of several other design criteria we refer to Kessels, Goos, and Vandebroek (2006) and Ibanez, Toner, and Daly (2007). Note that the approach can be extended to other models such as the mixed logit (using a simulated AVC matrix) and the nested logit model (e.g., Sándor & Wedel, 2002; Bliemer & Rose, 2008; Bliemer, Rose, & Hensher, 2008a, 2008b). In the analysis of this paper, we will use the simple MNL model for demonstration purposes.
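The two criteria can then be read off the AVC matrix. As an illustration of Eqs. (9.6) and (9.7), this sketch (again our own, with a hypothetical interface) computes the delta-method variances of the WTP ratios and the associated t-values:

```python
import numpy as np

def wtp_t_values(beta, avc, cost_index, n_respondents):
    """t-values of the WTP ratios beta_k / beta_c, Eqs. (9.6)-(9.7).

    beta : prior coefficients; avc : AVC matrix for one respondent;
    cost_index : position of the cost coefficient beta_c.
    """
    bc = beta[cost_index]
    t_values = {}
    for k in range(len(beta)):
        if k == cost_index:
            continue
        wtp = beta[k] / bc
        var_wtp = (avc[k, k]
                   - 2.0 * wtp * avc[k, cost_index]
                   + wtp**2 * avc[cost_index, cost_index]) / bc**2   # Eq. (9.6)
        t_values[k] = abs(wtp) * np.sqrt(n_respondents) / np.sqrt(var_wtp)  # Eq. (9.7)
    return t_values
```

A WTP-efficient design maximizes the smallest of these t-values; an S-efficient design applies the same max-min logic to the coefficient t-values themselves.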
9.3. Scheduling Models and Experimental Designs

9.3.1. Scheduling Model
Small (1982), inspired by the work of Vickrey (1969) on the dynamic equilibrium and optimum for queuing behind a bottleneck, introduced the schedule delay (SD)
variable to measure the difference between a traveler's actual arrival time and preferred arrival time (PAT). Since people may value early and late arrivals differently due to their different consequences, the SD variable can be evaluated as two separate terms, SDE and SDL. SDE is defined as the amount of time arriving earlier at the destination than the PAT, while SDL is the amount of time arriving later than PAT. This gives the relationship in the utility expression (U) as follows:

$$U = \beta_T T + \beta_C C + \beta_E \, SDE + \beta_L \, SDL + \theta \, D_L \qquad (9.8)$$
In Eq. (9.8), T denotes the travel time and C gives the travel cost. SDE is defined as Max(0, PAT minus actual arrival time), SDL is defined as Max(0, actual arrival time minus PAT), and $D_L$ is the lateness dummy, equal to 1 when SDL > 0 and 0 otherwise. The coefficients $\beta_E$ and $\beta_L$ measure the marginal disutilities of being early and late, while θ represents a fixed penalty of late arrival. Since C, T, SDE, and SDL are disutilities, the coefficients are assumed to be negative. Small's (1982) empirical finding is that $|\beta_E| < |\beta_T| < |\beta_L|$, which means that people prefer early arrival to additional travel time, and prefer additional travel time to late arrival. The model proposed by Noland and Small (1995) extended Small's (1982) trip scheduling model by considering the probability distribution of travel time and adding an additional random component depicting the variability effect that is distinct from the scheduling constraint. The result is presented as Eq. (9.9), where we assume maximum expected utility (MEU) theory:

$$E(U) = \beta_T E(T) + \beta_C C + \beta_E E(SDE) + \beta_L E(SDL) + \theta P_L \qquad (9.9)$$
In Eq. (9.9), E(T) is the expected travel time, E(SDE) is the expected schedule delay early, E(SDL) is the expected schedule delay late, and $P_L = E(D_L)$ is the expected lateness. Once the model is estimated, one can derive the marginal rate of substitution between any pair of the attributes in the bundle. Obtaining such measures is a common objective in the use of discrete choice models. For example, the monetary value of travel time (VOTT), an important economic indicator in transportation studies, is defined as the marginal rate of substitution between travel time and costs and hence as the ratio of the respective coefficients (see Eq. (9.10)):

$$VOTT = \frac{\partial U / \partial T}{\partial U / \partial C} = \frac{\beta_T}{\beta_C} \qquad (9.10)$$
Similarly, the value of schedule delay early (VSDE) and value of schedule delay late (VSDL) can be derived.
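As a purely hypothetical numerical illustration (these values are not the priors used later in this paper): with $\beta_T = -0.10$ per minute and $\beta_C = -0.40$ per euro, Eq. (9.10) gives VOTT $= (-0.10)/(-0.40) = 0.25$ euro per minute, or 15 euro per hour; VSDE $= \beta_E / \beta_C$ and VSDL $= \beta_L / \beta_C$ are computed in the same way.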
9.3.2. Empirical Studies and the Designs
The scheduling model of Noland and Small (1995) has become the workhorse model to model travelers’ departure time or route/mode choices when travel times are
variable. Several empirical studies have estimated the scheduling model based on data collected from stated choice experiments (e.g., Small et al., 1999; Hollander, 2006; Tseng, 2008; Asensio & Matas, 2008). While the stated choice experiment provides a valuable tool for gathering the choice responses needed to estimate the scheduling model, it remains unclear how to properly develop an experimental design that suits this purpose. The biggest problem here is that the attributes used in the experimental design are not the same attributes that enter the utility function of Eq. (9.9). Figure 9.1 gives an example of the choice question developed by Small et al. (1999).

[Figure 9.1 reproduces a sample choice card headed "PLEASE CIRCLE EITHER CHOICE A OR CHOICE B": each alternative shows an average travel time of 9 minutes and an equal chance of arriving at any of five listed times (between 7 minutes early and 9 minutes late across the two alternatives); Choice A costs $0.25 and Choice B costs $1.50.]

Figure 9.1: Example of the choice question in Small et al. (1999).

As stated in Small et al. (1999), the design attributes in this experiment are: mean travel time, travel cost, departure time shift, and standard deviation of travel time; whilst the attributes shown to the respondents are: mean travel time, travel cost, and possible arrival times with respect to the individual's PAT. Another example of a choice question, given by Hollander (2006), is presented in Figure 9.2. The design attributes in Hollander (2006) are the same as those in Small et al. (1999), but the two studies differ in the way the choice question is presented: Hollander shows the departure times explicitly to the respondents, while Small et al. keep the departure time information implicit.1

Table 9.1 summarizes the design attributes and the model attributes used in the literature for estimating the scheduling model. It is clear that the design attributes in this type of experiment are different from the attributes used in the utility function, which is E(T), C, E(SDE), and E(SDL) (and possibly $P_L$) in most cases.
1. Though different presentation formats may result in some systematic biases in the choice response, this is not within the scope of the present study. The idea here is to demonstrate the differences among the underlying design attributes, the attributes shown to respondents, and the variables estimated in the scheduling model. See Tseng, Verhoef, Jong, Kouwenhoven, and van der Hoorn (2009) for a study about the effect of different presentation formats.
Figure 9.2: Example of the choice question in Hollander (2006).
Table 9.1: Design and model attributes for the scheduling model.

Design attributes Z                                           Model attributes A(Z)
Travel time                                                   Expected travel time
Travel cost                                                   Travel cost
Departure time                                                Expected schedule delay early
Reliability (represented by a series of possible             Expected schedule delay late
  travel times)
More importantly, the relations between the design and model attributes are not one to one. To explicitly derive the relations between the model and design attributes, we first denote the travel time as T and the departure time as DT. Since travel time reliability is usually represented by a series of possible travel times, it can be seen as a distribution of n possible resulting delays2 $DL_i$, each occurring with probability $p_i$, where $\sum_{i=1}^{n} p_i = 1$. Suppose a respondent's PAT is the sum of his preferred departure time (PDT) and his usual (or reference) travel time ($T_R$), that is, PAT = PDT + $T_R$. Then the SDE and SDL for the i-th possible delay can be computed as in Eqs. (9.11a)-(9.11d).
2. These delays are in fact the time differences between the possible travel times and the designed travel time level.
If PAT ≥ the i-th possible arrival time, then

$$SDE_i = (PDT + T_R) - (DT + T + DL_i) \qquad (9.11a)$$

$$SDL_i = 0 \qquad (9.11b)$$

Otherwise,

$$SDE_i = 0 \qquad (9.11c)$$

$$SDL_i = (DT + T + DL_i) - (PDT + T_R) \qquad (9.11d)$$
The model attributes E(SDE), E(SDL), and E(T) are derived in Eqs. (9.12a)-(9.12c):

$$E(SDE) = \sum_{i=1}^{n} p_i \, SDE_i \qquad (9.12a)$$

$$E(SDL) = \sum_{i=1}^{n} p_i \, SDL_i \qquad (9.12b)$$

$$E(T) = \sum_{i=1}^{n} p_i \, (T + DL_i) \qquad (9.12c)$$
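The mapping from design attributes to model attributes in Eqs. (9.11a)-(9.12c) can be implemented directly; a minimal Python sketch (our own illustration, with hypothetical names) for one alternative is:

```python
import numpy as np

def scheduling_attributes(DT, T, delays, probs, PDT, TR):
    """Model attributes E(T), E(SDE), E(SDL) from the design attributes.

    DT : departure time; T : designed travel time; delays, probs : mass
    points DL_i and probabilities p_i of the travel time distribution;
    PDT : preferred departure time; TR : reference travel time.
    """
    PAT = PDT + TR                            # preferred arrival time
    delays = np.asarray(delays, dtype=float)
    probs = np.asarray(probs, dtype=float)
    arrivals = DT + T + delays                # possible arrival times
    sde = np.maximum(PAT - arrivals, 0.0)     # Eqs. (9.11a)/(9.11c)
    sdl = np.maximum(arrivals - PAT, 0.0)     # Eqs. (9.11b)/(9.11d)
    # E(T), E(SDE), E(SDL): Eqs. (9.12c), (9.12a), (9.12b)
    return probs @ (T + delays), probs @ sde, probs @ sdl
```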
As can be seen from Eqs. (9.12a)-(9.12c), the model attributes E(SDE) and E(SDL) depend on the levels of the travel time (T), departure time (DT), and reliability ($p_i$, $DL_i$) attributes, and E(T) depends on the travel time (T) and reliability ($p_i$, $DL_i$). This creates the problem of finding a good experimental design for estimating the scheduling model. As stated above, studies in this field have usually adopted different design procedures rather than orthogonal or efficient designs. For example, Small et al. (1999) developed a simulation approach to carry out the design work. The procedure of the Small et al. (1999) design is as follows:

1. Develop simulation procedure;
2. Determine ranges of coefficients for simulation;
3. Select SP design;
4. Develop values for attributes (cost, travel time, etc.);
5. Simulate experiment for each travel time segment;
6. Review designs and values;
7. If necessary, repeat from step 3.
The aim of simulating the SP experiments in step 5 is to ensure that the designs could recover a sensible range of coefficients and, consequently, a sensible range of WTPs. Step 7 could take the design work back to step 3 and/or step 4 until a satisfactory range of sensible ratios was successfully recovered. When selecting the SP designs (step 3), one strategy used by the authors is to avoid dominated choice sets. Such a design procedure is also used in Asensio and Matas (2008). Tseng (2008, chapter 6) follows the idea of simulation but with some modification in the experimental design. In this paper, we extend the design procedure of earlier studies by using designs based on the S-efficiency and WTP-efficiency criteria.
9.4. Designing the SP Experiment for the Scheduling Model

9.4.1. Introduction
In this section, we create designs for the experiment shown in Figure 9.1. The reliability presentation of a five-mass-point distribution with equal probabilities has been used by several studies, such as Small et al. (1999) and Asensio and Matas (2008). In principle, the analysis can be done with a two-mass-point distribution, but with five equiprobable mass points, smaller probabilities (0.2) can be offered to the respondent without the explicit provision of the probability. The utilities of choice situation n for alternative 1 and alternative 2 are given by Eqs. (9.13a) and (9.13b). The formulation we use is a standard scheduling model without a penalty for being late; therefore, θ in Eq. (9.9) is assumed to be 0. The standard deviation is also not included, since many empirical studies have shown that if scheduling attributes are included in the utility, the coefficient for the standard deviation becomes insignificant (Asensio & Matas, 2008; Noland, Small, Koskenoja, & Chu, 1998; Small, Winston, & Yan, 2005).

$$V_{1n} = \beta_T E(T)_{1n} + \beta_C C_{1n} + \beta_E E(SDE)_{1n} + \beta_L E(SDL)_{1n} \qquad (9.13a)$$

$$V_{2n} = \beta_T E(T)_{2n} + \beta_C C_{2n} + \beta_E E(SDE)_{2n} + \beta_L E(SDL)_{2n} \qquad (9.13b)$$
In the case of the scheduling model, we have some prior information about the parameters β: at the least, it is reasonable to assume that the coefficients of the utility functions in Eqs. (9.13a) and (9.13b) are all negative. When no information is available, a pilot can be used to collect information about the priors. Suppose we have an observed travel time distribution with $T_{min}$ as the minimal travel time and $T_{max}$ as the maximum travel time. $T_{min}$ is usually close to the travel time at free-flow speed. The difference between $T_{max}$ and $T_{min}$ is the range of the distribution. In Figure 9.3, the five points of the distribution are chosen between $T_{min}$ and $T_{max}$. In our analysis, the midpoint of the distribution, $T_3$, is always close to the reference travel time. The reference travel time can be the travel time of the respondent's latest trip and can be asked in the questionnaire.
Figure 9.3: Construction of the travel time distribution.
Table 9.2: Values used in the simulation.

Variable                                                        Levels
Costs (in euros)                                                [1.25, 2, 2.25, 2.75, 3.25, 4]
Departure level (in minutes before reference departure level)   [-10, -5, 0]
Travel time level (in minutes)                                  [-10, 0, 5]
Minimum travel time (in minutes)                                25
Maximum travel time (in minutes)                                75
Reference travel time (in minutes)                              50
$T_3$ is constructed by using three different levels for the travel time: [-10, 0, 5]. For example, if the reference travel time is 40 min and the first travel time level is used, $T_3$ will be equal to 30 min. With the second level, $T_3$ will be equal to the reference travel time, and with the third level $T_3$ will be equal to 45 min. Given $T_3$, the four other mass points can be determined: $T_1$ and $T_2$ are chosen randomly between $T_{min}$ and $T_3$, and $T_4$ and $T_5$ are chosen randomly between $T_3$ and $T_{max}$. By shifting $T_3$, different distributions can be obtained. The advantage of this approach is that the midpoint of the distribution is close to the reference travel time, and the maximum and minimum travel times can be based on observed travel time data. Therefore, the distributions are more realistic to the respondents. This method does not require that the midpoint correspond with the reference travel time, but we impose this to ensure that there is at least one mass point close to the RP situation of the traveler. For the design of a group of travelers with different reference travel times, we can generate designs for different segments with approximately similar reference travel times.3
3. However, Eq. (9.4a) will not hold if the respondents have different reference travel times. We would like to thank a referee for making this point.
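The construction of the five mass points described above can be sketched as follows (Python; the bounds default to the Table 9.2 values, and the uniform random draws are our own reading of "chosen randomly"):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def five_point_distribution(t_ref, shift, t_min=25.0, t_max=75.0):
    """Five equiprobable travel times as in Figure 9.3: the midpoint T3
    is the reference travel time plus a design shift (e.g. -10, 0 or 5
    minutes); T1, T2 lie between Tmin and T3, and T4, T5 between T3
    and Tmax."""
    t3 = t_ref + shift
    t1, t2 = np.sort(rng.uniform(t_min, t3, size=2))
    t4, t5 = np.sort(rng.uniform(t3, t_max, size=2))
    return np.array([t1, t2, t3, t4, t5])  # each occurs with probability 0.2
```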
We use the estimated coefficients of Asensio and Matas (2008) as prior coefficients in our simulation. The procedure is as follows:

1. Choose the attribute levels based on realistic observations of travel time (distributions);
2. Set the attribute level constraints and the prior coefficients;
3. Set the number of blocks, the number of choice sets, and the stopping criterion;
4. Set the maximum difference within a distribution that is shown to a respondent;
5. Randomly generate a candidate SP design;
6. Calculate the model data given the design attributes, the prior coefficients, and the reference point of the travelers;
7. Calculate the correlation of the differences and the other design criteria to evaluate the design performance;
8. Repeat steps 5-7 until the stopping criterion has been reached; if the procedure cannot find a better design for 30 minutes, the program stops.
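Steps 5-8 amount to a random search with a staleness-based stopping rule. A minimal sketch of that loop, with `generate_candidate` and `evaluate` standing in for the design generator and the chosen criterion (e.g., the lowest coefficient t-value for S-efficiency):

```python
import time

def search_design(generate_candidate, evaluate, max_stale_seconds=30 * 60):
    """Random search over candidate SP designs (steps 5-8 above).
    Stops once no improvement has been found for 30 minutes."""
    best_design, best_score = None, float("-inf")
    last_improvement = time.time()
    while time.time() - last_improvement < max_stale_seconds:
        design = generate_candidate()       # step 5
        score = evaluate(design)            # steps 6-7: model data + criteria
        if score > best_score:              # keep the best design so far
            best_design, best_score = design, score
            last_improvement = time.time()  # reset the 30-minute clock
    return best_design, best_score
```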
9.4.2. S-Efficiency
When S-efficiency is used as a design criterion, the goal is to minimize the sample size needed to obtain reasonable t-values given a set of prior coefficients and the attribute levels. The objective of the simulation is to maximize the lowest t-value over all the coefficients. The design we generated has 2 blocks, each with 9 choice situations, so in total there are 18 choice situations in a design. Figure 9.4 gives one of the simulation results as an example. In this figure, we plot the absolute t-values of the estimates against the number of respondents.4 In Figure 9.4, we see that to obtain a significant coefficient for the E(SDE) parameter, a minimum theoretical sample of 18 respondents is needed (i.e., 9 design replications or 162 choices), assuming that the priors reflect the respondents' preferences perfectly well.

9.4.3. WTP-Efficiency
In this section, we use WTP-efficiency as the criterion, so we try to maximize the lowest t-value of Eq. (9.7). In Figure 9.5, we plot the absolute t-values as a function of the number of respondents. From Figure 9.5 we see that 3 respondents (27 choices) are needed to obtain an estimate of the VSDE with an absolute t-value of 1.96, assuming that these respondents have the priors that we used in our simulation. For the VOTT and the VSDL, this threshold is already reached with two respondents.
4. All programming is done in Matlab 7.6.
[Figure 9.4 is a line plot of the absolute t-values of the C, E(T), E(SDE), and E(SDL) estimates against the number of respondents (0-100).]
Figure 9.4: Absolute t-values of the model parameters.
[Figure 9.5 is a line plot of the absolute t-values of the VOTT, VSDE, and VSDL estimates against the number of respondents (0-200), together with the threshold t-value for the WTP.]

Figure 9.5: WTP efficiency for the scheduling model.
9.4.4. Correcting for the Scale of the Utility
In the previous sections, the priors are assumed to be fixed. As pointed out by Scarpa and Rose (2008), a complication is that the scale of the utility, as used in Eqs. (9.1a) and (9.1b), also matters for the obtained values of the efficiency criteria. When estimating WTP values, the size of the coefficients does not matter, only their ratio does. When we multiply the utility by λ, the WTP will not change, but the resulting AVC matrix (Eq. (9.4a)) will be affected. It can be shown that the variance of the WTP changes with $1/\lambda^4$, and therefore the corresponding t-values can be multiplied by λ².5 An increasing scale parameter (i.e., a lower standard deviation of the Gumbel distribution) therefore results in higher t-values for the WTP. Given that a design performs well for a chosen scale parameter, it will also perform better for all other scale parameters. The scale of the utility only affects the number of respondents needed to obtain significant estimates, and therefore the size of the efficiency gain; it does not affect the generation of the optimal design.

5. This can be verified by multiplying the variances and the covariance in Eq. (9.6) by $1/\lambda^2$ and the coefficients by λ.
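A sketch of the scaling argument, mirroring footnote 5: multiplying the utility by λ multiplies the coefficients by λ and the variances and covariance in Eq. (9.6) by $1/\lambda^2$; since $\beta_c^2$ picks up a factor $\lambda^2$ while the WTP ratio is unchanged,

$$\operatorname{Var}(WTP_k) \propto \frac{1}{\lambda^4}, \qquad t_{WTP_k} = \frac{WTP_k \sqrt{R}}{\sqrt{\operatorname{Var}(WTP_k)}} \propto \lambda^2 .$$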
9.5. Comparison of Design Approaches in an Unlabeled Experiment

9.5.1. Introduction
In this section, we compare different designs in an unlabeled two-alternative choice experiment with regard to their performance on the WTP estimates, by conducting a simulation analysis. We evaluate the orthogonal design, the design in Asensio and Matas (2008), the design based on S-efficiency, and the design based on WTP-efficiency. The priors we use are based on the estimated parameters in Asensio and Matas (2008). The orthogonal design has three levels for the attributes of cost, time, departure time, and reliability, and is designed orthogonally within the alternatives using SPSS. The design of Asensio and Matas can be found in Appendix A, Table A.1, of their paper; the other designs are shown in Appendix 9.A, Tables 9.A.1-9.A.3.
9.5.2. Generating the Artificial Choice Responses
By using the attribute levels in Table 9.2 and the assumed priors, we can calculate the deterministic part of the indirect utility based on the scheduling model described in Eqs. (9.13a) and (9.13b). In the random utility framework, there is also an unobservable part ε. We assume that ε follows a standard Gumbel distribution. Thus, based on the deterministic part and random errors drawn from the Gumbel distribution, we can obtain the utilities of the alternatives. Alternative A is chosen if
the utility of alternative A is larger than the utility of alternative B, and vice versa. In this way, the choice responses can be generated. These artificial choice responses are used to estimate an MNL model and to obtain the estimated parameters. In the simulations, each individual is assumed to make nine choices. For each given sample size, we generate a dataset 200 times and estimate an MNL model for each artificial dataset. We then average the resulting WTP-biases over these 200 estimates. The WTP-bias is defined in Eq. (9.14), where $WTP_{est}$ is the estimated WTP and $WTP_{prior}$ is the WTP based on the prior coefficients:

$$WTP_{bias} = \left| \frac{WTP_{est} - WTP_{prior}}{WTP_{prior}} \right| \qquad (9.14)$$
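One replication of this procedure can be sketched as follows (Python; `estimate_mnl` is a placeholder for any MNL estimation routine and is not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_wtp_bias(V1, V2, X, beta_prior, cost_index, estimate_mnl):
    """Generate artificial choices with standard Gumbel errors and
    return the WTP-bias of Eq. (9.14).

    V1, V2 : deterministic utilities of Eqs. (9.13a)/(9.13b) at the
    priors; X : the model attributes handed to the estimator.
    """
    u1 = V1 + rng.gumbel(size=V1.shape)         # random utility, alternative A
    u2 = V2 + rng.gumbel(size=V2.shape)         # random utility, alternative B
    chose_first = u1 > u2                       # artificial choice responses
    beta_hat = estimate_mnl(X, chose_first)     # placeholder estimator
    wtp_est = beta_hat / beta_hat[cost_index]
    wtp_prior = beta_prior / beta_prior[cost_index]
    return np.abs((wtp_est - wtp_prior) / wtp_prior)  # Eq. (9.14)
```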
9.5.3. Simulation Results
The simulation results are summarized in Table 9.3. For all designs, the WTP-bias decreases with the sample size. However, for the VSDE the WTP-bias remains high even when the number of respondents is high. If we take the maximum bias of the WTP estimates as an indicator, there is a gain of approximately 3-10% in terms of efficiency compared to the orthogonal design. The gain is larger for a low number of respondents than for a high number of respondents; efficiency of the design is therefore more important when using a small sample size. The design of Asensio and Matas does not perform very well with a low number of respondents and is outperformed by the other design procedures. The reason that the orthogonal design performs better on the VOTT and the VSDL than the efficient designs is that we maximize the lowest t-value for the WTP: in our simulation, the VSDE is the WTP estimate with the lowest t-value, because its prior coefficient is closer to zero than the other coefficients. The gain in efficiency for the VSDE compared to the orthogonal design therefore comes with a loss in efficiency for the VOTT. Finally, the WTP-bias is in general substantial for all designs, even for a high number of respondents. A possible explanation is that the attribute levels are not optimally chosen; it would therefore be useful to experiment with the attribute values, since the analysis is conditional on the priors and a given set of attribute values.

Table 9.3: Simulation results (WTP-bias, %).

                         Number of respondents
                         50        150       500
VOTT
  Orthogonal             11.77     6.74      3.97
  Asensio and Matas      32.49     10.84     5.40
  S-efficiency           22.64     14.47     6.37
  WTP-efficiency         23.95     14.75     8.10
VSDE
  Orthogonal             36.58     21.19     10.18
  Asensio and Matas      100.96    43.44     24.25
  S-efficiency           35.35     19.55     10.14
  WTP-efficiency         25.78     15.22     8.20
VSDL
  Orthogonal             21.01     11.06     5.34
  Asensio and Matas      78.17     24.50     13.05
  S-efficiency           26.20     15.03     7.73
  WTP-efficiency         25.94     12.73     7.66
9.6. Conclusions and Discussion

In this paper, we propose a design procedure to implement efficient designs in choice experiments when model attributes are functions of several design attributes. The design process of the scheduling model is taken as an example. In the simulation study, we show that the designs based on S-efficiency and WTP-efficiency perform better on the criterion of maximizing the lowest t-value for the coefficients and/or the WTP, but the gains are small for a high number of respondents. Surprisingly, the orthogonal design also performs very well.

Besides the fact that the scale of utility determines the size of the efficiency gains, the approach used in this paper has some drawbacks. In principle, the model that will be estimated is not known in advance. In this paper we designed the experiment for only one basic model. In the estimation phase, researchers may specify all kinds of interactions and dummy variables and try to estimate many different models. Although one could try to enumerate all of these models and average the resulting efficiency measures, and furthermore assume distributions for the prior coefficients, the uncertainty involved there will likely not outweigh the gains in efficiency. We therefore recommend first selecting choice situations that are relevant and realistic (based on RP values) for the respondent, and then optimizing over this set of choices using statistical criteria of design efficiency. A second remark is that the efficiency gains for larger samples are not very high. The use of efficient designs is therefore mainly important when sample sizes are small, for example, in the case of personal face-to-face interviews or when a computer-assisted personal interviewing (CAPI) survey is used.
Acknowledgments

We would like to thank Eric Kroes and Erik Verhoef for discussions about this topic and for their comments on earlier versions of this paper. Furthermore, we would like
to thank the anonymous referees for their suggestions that helped to improve the paper. We retain responsibility for any remaining errors or misinterpretations. This research is part of the project ``Reliable accessibility of Schiphol Airport'' and has been funded by TRANSUMO (TRANsition SUstainable MObility).
References

Asensio, J., & Matas, A. (2008). Commuters' valuation of travel time variability. Transportation Research Part E, 44, 1074-1085.
Bliemer, M. C. J., & Rose, J. M. (2008). Construction of experimental designs for mixed logit models allowing for correlation across choice observations. Proceedings of the 87th annual meeting of the Transportation Research Board, Washington, DC.
Bliemer, M. C. J., Rose, J. M., & Hensher, D. A. (2008a). Efficient stated choice experiments for estimating nested logit models. Transportation Research Part B, 43, 19-35.
Bliemer, M. C. J., Rose, J. M., & Hess, S. (2008b). Approximation of Bayesian efficiency in experimental choice designs. Journal of Choice Modelling, 1(1), 98-127.
Fosgerau, M., & Karlström, A. (2007). The value of reliability. MPRA, Working Paper.
Hollander, Y. (2006). Direct versus indirect models for the effects of unreliability. Transportation Research Part A, 40, 699-711.
Ibanez, J. N., Toner, J., & Daly, A. (2007). Optimality and efficiency requirements for the design of stated choice experiments. Association for European Transport and Contributors, European Transport Conference, 2007.
Kanninen, B. J. (1993). Optimal experimental design for double-bounded dichotomous choice contingent valuation. Land Economics, 69, 138-196.
Kanninen, B. J. (2002). Optimal design for multinomial choice experiments. Journal of Marketing Research, 39, 214-217.
Kessels, R., Goos, P., & Vandebroek, M. (2006). A comparison of criteria to design efficient choice experiments. Journal of Marketing Research, 43, 409-419.
Louviere, J. J., Hensher, D. A., & Swait, J. D. (2000). Stated choice methods: Analysis and application. Cambridge: Cambridge University Press.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behaviour. In: P. Zarembka (Ed.), Frontiers of econometrics (pp. 105-142). New York: Academic Press.
Noland, R. B., & Small, K. A. (1995). Travel-time uncertainty, departure time choice, and the cost of morning commutes. Transportation Research Record, 1493, 150-158.
Noland, R. B., Small, K. A., Koskenoja, P., & Chu, X. (1998). Simulating travel time variability. Regional Science and Urban Economics, 28, 535-564.
Rose, J. M., & Bliemer, M. C. J. (2005). Sample optimality in the design of stated choice experiments. Working Paper ITLS-WP-05-09, Institute of Transport and Logistics Studies, University of Sydney.
Sándor, Z., & Wedel, M. (2002). Profile construction in experimental choice designs for mixed logit models. Marketing Science, 21(4), 455-475.
Scarpa, R., & Rose, J. M. (2008). Efficiency for non-market valuation with choice modelling: How to measure it, what to report and why. Australian Journal of Agricultural and Resource Economics, 52, 253-282.
Small, K. A. (1982). The scheduling of consumer activities: Work trips. American Economic Review, 72, 467-479.
Small, K. A., Noland, R. B., Chu, X., & Lewis, D. (1999). Socio-economic attributes and impacts of travel time reliability: A stated preference approach. University of California, Irvine, California PATH Research Report.
Small, K. A., Winston, C., & Yan, J. (2005). Uncovering the distribution of motorists' preferences for travel time and reliability: Implications for road pricing. Econometrica, 73(4), 1367-1382.
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Tseng, Y.-Y. (2008). Valuation of travel time reliability in passenger transport. Ph.D. dissertation, Department of Spatial Economics, Vrije Universiteit Amsterdam, the Netherlands.
Tseng, Y.-Y., Verhoef, E. T., de Jong, G., Kouwenhoven, M., & van der Hoorn, T. (2009). A pilot study into the perception of unreliability of travel times using in-depth interviews. Journal of Choice Modelling, 2(1), 8-28.
Vermeulen, B., Goos, P., Scarpa, R., & Vandebroek, M. (2008). Efficient and robust willingness-to-pay designs for choice experiments: Some evidence from simulations. KBI Working Paper no. 0809.
Vickrey, W. S. (1969). Congestion theory and transport investment. American Economic Review, 59, 251-261.
Appendix 9.A

See Tables 9.A.1-9.A.3.
Table 9.A.1: The 18 choice sets of the orthogonal design. Each row lists the values for choice sets 1-18 in order; the five travel times per alternative are the equiprobable mass points of the distribution of possible travel times.

Alternative 1
  Time:             50 55 40 55 50 55 40 40 50 50 40 55 40 50 40 55 55 50
  Cost:             4 4 2.25 2.25 2.25 1.25 1.25 4 1.25 1.25 1.25 2.25 2.25 2.25 4 4 1.25 4
  Early departure:  15 0 15 5 0 15 0 5 5 0 15 0 5 15 0 15 5 5
  Possible travel times:
    T1:             45 47 34 50 38 41 36 30 43 38 34 47 30 45 36 41 50 43
    T2:             48 51 37 53 43 47 38 34 46 43 37 51 34 48 38 47 53 46
    T3:             50 55 40 55 50 55 40 40 50 50 40 55 40 50 40 55 55 50
    T4:             52 59 43 57 56 62 42 45 54 56 43 59 45 52 42 62 57 54
    T5:             58 72 52 63 73 80 46 58 65 73 52 72 58 58 46 80 63 65

Alternative 2
  Time:             40 50 55 50 40 50 55 55 40 40 55 50 55 40 55 50 50 40
  Cost:             2.25 2.25 1.25 1.25 1.25 4 4 2.25 4 4 4 1.25 1.25 1.25 2.25 2.25 4 2.25
  Early departure:  5 15 5 0 15 5 15 0 0 15 5 15 0 5 15 5 0 0
  Possible travel times:
    T1:             30 45 50 38 34 43 41 47 36 34 50 45 47 30 41 43 38 36
    T2:             34 48 53 43 37 46 47 51 38 37 53 48 51 34 47 46 43 38
    T3:             40 50 55 50 40 50 55 55 40 40 55 50 55 40 55 50 50 40
    T4:             45 52 57 56 43 54 62 59 42 43 57 52 59 45 62 54 56 42
    T5:             58 58 63 73 52 65 80 72 46 52 63 58 72 58 80 65 73 46
Table 9.A.2: The 18 choice sets of the design using S-efficiency as a design criterion. Each row lists the values for choice sets 1-18 in order; the five travel times per alternative are the equiprobable mass points of the distribution of possible travel times.

Alternative 1
  Time:             55 40 55 40 50 40 50 50 50 40 50 55 50 50 50 40 50 40
  Cost:             3.25 4 1.25 2.25 2.75 2.75 3.25 2.75 1.25 2 4 1.25 2.75 2.75 1.25 3.25 1.25 4
  Early departure:  15 15 5 5 15 15 15 5 5 15 15 15 0 15 15 5 15 5
  Possible travel times:
    T1:             43 28 42 29 31 27 29 33 25 29 28 31 35 41 40 27 27 26
    T2:             43 32 46 31 48 33 43 36 29 31 30 42 45 44 49 36 47 33
    T3:             55 40 55 40 50 40 50 50 50 40 50 55 50 50 50 40 50 40
    T4:             55 67 55 69 51 45 59 57 56 53 54 55 54 51 55 50 55 58
    T5:             55 73 60 71 68 70 72 57 56 73 55 59 69 63 71 72 64 68

Alternative 2
  Time:             55 50 50 50 50 40 50 50 40 55 50 55 55 40 40 40 50 55
  Cost:             2.75 1.25 3.25 2.25 2.75 4 1.25 4 4 3.25 3.25 2 2 2 1.25 2.25 3.25 2.25
  Early departure:  0 15 15 0 15 5 5 15 0 15 15 15 5 5 0 15 5 15
  Possible travel times:
    T1:             37 31 28 26 31 32 46 25 35 35 30 47 29 35 28 32 33 37
    T2:             48 48 35 44 36 39 49 48 39 52 45 53 45 36 31 37 47 46
    T3:             55 50 50 50 50 40 50 50 40 55 50 55 55 40 40 40 50 55
    T4:             56 56 54 51 57 43 70 65 56 55 56 55 72 41 49 69 52 57
    T5:             58 64 72 53 62 65 71 71 64 70 66 69 72 57 62 72 60 67
Table 9.A.3: The 18 choice sets of the design using WTP-efficiency as a design criterion. Each row lists the values for choice sets 1-18 in order; the five travel times per alternative are the equiprobable mass points of the distribution of possible travel times.

Alternative 1
  Time:             40 55 55 55 40 55 50 55 55 40 50 40 40 50 55 50 55 40
  Cost:             4 2 1.25 1.25 2.75 4 4 3.25 2.25 4 1.25 4 2.75 3.25 2 2 1.25 4
  Early departure:  0 15 0 15 0 5 15 0 0 0 5 5 15 15 15 5 5 0
  Possible travel times:
    T1:             30 29 29 27 27 31 39 36 25 33 36 27 28 25 44 41 25 36
    T2:             35 32 35 51 37 46 49 37 35 37 47 27 39 34 44 45 42 39
    T3:             40 55 55 55 40 55 50 55 55 40 50 40 40 50 55 50 55 40
    T4:             41 66 55 58 54 68 50 55 55 47 55 68 52 61 59 56 57 42
    T5:             54 70 68 60 74 70 52 63 70 58 56 72 74 70 66 57 65 52

Alternative 2
  Time:             40 50 55 50 50 55 50 40 55 40 55 50 50 40 50 40 50 55
  Cost:             2.25 2.25 4 2.25 3.25 2.75 2.75 2.75 2.25 2.75 3.25 2.75 3.25 2.25 1.25 3.25 4 2
  Early departure:  5 15 0 0 15 5 15 5 0 15 15 0 0 15 0 15 5 15
  Possible travel times:
    T1:             26 27 47 33 35 35 34 26 32 31 28 35 28 32 43 25 43 28
    T2:             33 31 53 45 39 48 37 37 41 37 30 39 46 34 44 27 46 52
    T3:             40 50 55 50 50 55 50 40 55 40 55 50 50 40 50 40 50 55
    T4:             44 60 56 50 69 61 50 67 60 54 55 50 60 48 56 58 53 57
    T5:             52 62 57 59 71 64 71 71 64 64 64 56 63 48 70 69 58 59
PART III: CONCEPTS AND METHODOLOGY
Chapter 10
Systematically Heterogeneous Covariance in Network GEV Models

Jeffrey P. Newman
Abstract

Mixed logit models can represent heterogeneity across individuals, in both observed and unobserved preferences, but require computationally expensive calculations to compute probabilities. A few methods for including error covariance heterogeneity in closed form models have been proposed, and this paper adds to that collection, introducing a new form of the Network GEV model that sub-parameterizes the allocation values for the assignment of alternatives (and sub-nests) to nests. This change allows the incorporation of systematic (nonrandom) error covariance heterogeneity across individuals, while maintaining a closed form for the calculation of choice probabilities. Also explored is a latent class model of nested models, which can similarly express heterogeneity. The heterogeneous models are compared to a similar model with homogeneous covariance in a realistic scenario, and are shown to significantly outperform the homogeneous model; the level of improvement is especially large in certain market segments. The results also suggest that the two heterogeneous models introduced herein may be functionally equivalent.
10.1. Introduction

A great deal of work has been put into creating discrete choice models that represent heterogeneity across decision makers, but this effort has mainly been focused on heterogeneous parameters incorporated into a mixed logit framework. In some ways, mixed logit models are very good for expressing such heterogeneity. The two basic
formulations of such models, the random parameters logit and the error components logit, while formally equivalent, neatly express heterogeneity with respect to observed and unobserved attributes, respectively. Ultimately, a mixed logit model can approximate any random utility model arbitrarily closely (McFadden & Train, 2000), but it is not always desirable to use such a model. Mixed logit models can be difficult to specify correctly (Garrow, 2004), and when the mixing distributions are continuous, lack a closed form expression for choice probabilities, and thus require the use of simulation (Train, 2003). Simulation is becoming easier with increasing computational power, but still requires substantial extra computational and memory resources. For large models or large datasets, simulation can be burdensome. The few functional forms for incorporating covariance heterogeneity into logit models that have been explored include a nested logit model with covariance heterogeneity, proposed by Bhat (1997a), and the mixed covariance GEV, explored by Hess, Bolduc, and Polak (2005). This paper will introduce a new method of incorporating covariance heterogeneity, called the heterogeneous covariance network GEV (HeNGEV) model, and examine a potential practical application of such a model, using a synthetically generated dataset.
10.2. COVNL Model

The possibility of directly expressing covariance heterogeneity across individuals was first explored by Bhat (1997a), who developed a nested logit model with covariance heterogeneity (the COVNL model). The COVNL model expressed covariance heterogeneity as a deterministic function of some attributes of the decision maker. In a single-level nested model, this took a relatively simple form, replacing the nested logit model's usual logsum parameter μ:

$$\mu_t = F(a + \gamma' z_t) \qquad (10.1)$$

where $z_t$ is a vector of attributes of the decision maker t, a and γ are scalar and vector parameters, respectively, and F is a transformational function with the properties that $F(-\infty) = 0$, $F(+\infty) = 1$, and $f(x) = \partial F(x)/\partial x > 0$. This worked well for single-level nesting structures, as restricting μ to the range between 0 and 1 is sufficient for a nested logit model to remain a GEV model consistent with utility maximization, per McFadden's (1978) original requirements. When extending this structure to multiple levels, such that nest n is inside a higher-level nest m, it becomes more complicated: in addition to restricting the μ values to the (0,1] interval, they must also be restricted such that $\mu_n < \mu_m$. Bhat achieves this by defining the lower-level logsum parameter $\mu_{nt}$ as in (10.1), and the higher-level logsum parameter as

$$\mu_{mt} = F\left( a + \gamma' z_t + G(d + \eta' w_t) \right) \qquad (10.2)$$

with G a strictly positive function, and $w_t$, d, and η mirroring $z_t$, a, and γ. The positivity of G will ensure that the input value for F in (10.2) is greater than that
in (10.1), ensuring the correct ordering of the logsum parameter values, no matter the values of the parameters or the individual attributes. Since the logsum parameter is deterministic for each person, the COVNL does retain a simple closed form probability expression, making its implementation not computationally burdensome. Koppelman and Sethi (2005) incorporated the COVNL form into a broader model structure they dubbed the heterogeneous generalized nested logit (HGNL) model. This model brought the heterogeneity proposed by Bhat into a generalized nested logit structure, along with heteroscedasticity of error variance across individuals, as proposed by Swait and Adamowicz (2001).
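As an illustration of Eqs. (10.1) and (10.2), the sketch below takes F to be the logistic CDF and G the exponential function; both satisfy the stated requirements (F(-∞) = 0, F(+∞) = 1, F' > 0; G strictly positive) but are our own choices, not necessarily Bhat's:

```python
import numpy as np

def mu_lower(a, gamma, z_t):
    """Lower-level logsum parameter of Eq. (10.1), with F the logistic CDF."""
    return 1.0 / (1.0 + np.exp(-(a + gamma @ z_t)))

def mu_upper(a, gamma, z_t, d, eta, w_t):
    """Higher-level logsum parameter of Eq. (10.2), with G = exp.
    Because exp(.) > 0 and F is increasing, mu_upper exceeds mu_lower
    for any parameter values and any individual attributes."""
    return 1.0 / (1.0 + np.exp(-(a + gamma @ z_t + np.exp(d + eta @ w_t))))
```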
10.3. Mixed Covariance GEV

More recently, Hess et al. (2005) examined a more general form of covariance heterogeneity in the mixed covariance GEV (MCGEV) model. In this model, the logsum parameters μ assume a random distribution across individuals, instead of adopting the variable but deterministic form proposed by Bhat (1997a). This change provides the benefit of greater generality and flexibility, but at the cost of greatly increased conceptual complexity, in order to maintain the decreasing values of the logsum parameters in the network. Additionally, if the distribution of the logsum parameters is continuous, the model loses the closed form for the probability calculations. In the MCGEV model, the probability formulation for each alternative is conditioned on the distribution of the logsum parameter:

$$P_t(i) = \int_{\mu} P_t(i \mid \mu) \, f(\mu \mid \Omega) \, d\mu \qquad (10.3)$$

The attributes of the model thus depend on the specific form chosen for the distribution f(μ|Ω). In order to maintain the necessary conditions for μ, the selected distribution must be bounded by 0 and 1. For multilevel nesting structures, the parameters also need to obey the proper ordering, as the μ for a nest for any individual must be less than the μ for higher-level nests for that same individual. This requirement is necessary to ensure consistency with utility maximization. Hess et al. (2005) address this issue by redefining μ to be a function of the predecessor, so that

$$\mu_{lower} = \mu_{upper} \, \hat{\mu}_{lower} \qquad (10.4)$$
adjustment in (10.4) must be made across all higher nests: mlower ¼ min
u2upper
mu m^ lower
10.4. Incorporating Heterogeneity Through Allocations

Each of these methods for incorporating heterogeneity into logit models comes with substantial drawbacks. The COVNL model restricts the form of covariance to a strictly hierarchical form, as in a traditional nested logit model. The mixed covariance GEV relaxes that constraint, but can become difficult to operationalize, given the constraints required on the logsum parameters. The model forms introduced in this section attack the problem from a completely different angle.
10.4.1. Disaggregation of Allocation

As examined in the previous section, all known previous attempts to incorporate heterogeneity in error covariance have focused on sub-parameterizing the logsum scaling parameter. This direction of thinking is intuitive, as it is these parameters that induce covariance between error terms, and it is convenient to start with the simplest logit model containing covariance, which is the nested logit model. However, a more complex but flexible model, such as the network GEV model (NetGEV), introduced by Daly and Bierlaire (2006), provides alternative ways of generating error covariance heterogeneity. In particular, the NetGEV model includes allocation parameters on network links, which can be sub-parameterized instead of the logsum parameters, to achieve the same goal.

The NetGEV model is defined by a finite, directed, connected, circuit-free network. Such a network has a single source (root) node that represents the complete model, a sink node for each elemental alternative in the choice set, and some set of intermediate nesting nodes. Each node is associated with a G function, with the G function at elemental alternative node a being $G_a(y) = y_a$. Each other node i has a G function that is calculated from the G values of that node's direct successors:

$$G_i(y) = \left( \sum_{j \in i\#} \left( \alpha_{ij} \, G_j(y) \right)^{1/\mu_i} \right)^{\mu_i} \qquad (10.5)$$
where i↓ is the set of successor nodes of i, α_ij an allocation parameter associated with the edge from i to j in the network, and μ_i a scaling parameter associated with node i. This model thus requires a logsum parameter on every node other than the elemental
alternatives, as well as an allocation parameter on each edge in the network. For the ultimate model to be a valid GEV model, consistent with random utility maximization according to McFadden's (1978) criteria, the allocation parameters must be nonnegative, and the logsum parameters must obey sequential inequality constraints, so that the μ value of any node is smaller than or equal to the μ values of all of its network predecessors. It is this sequential inequality constraint on the μ values that makes them undesirable to sub-parameterize: since each μ must be less than the minimum of its predecessors' values, it is not easy to formulate a model that allows μ to vary across individuals. The allocation parameters, on the other hand, carry no such constraint.

10.4.1.1. Relaxing allocation parameter constraints

The normalization of the NetGEV model does require some type of constraint on the allocation parameters. As described in Newman (2008), the exact form of this constraint can help ensure an unbiased model, but it is often convenient to have the allocation parameters sum to one, such that \sum_{h \in i\uparrow} \alpha_{hi} = 1 (where i↑ is the set of predecessors of i). Imposing this restriction directly on estimated parameters is certainly possible. However, the restriction can be relaxed by transforming the parameters using the familiar logit structure:

  \alpha_{hi} = \frac{\exp(\phi_{hi})}{\sum_{k \in i\uparrow} \exp(\phi_{ki})}    (10.6)
Under this transformation, a new set of φ parameters replaces the α parameters throughout the network on a one-for-one basis. Instead of the α parameters' linear adding-up requirement within the set of parameters associated with each node having more than one predecessor, the φ parameters may vary unbounded across ℝ, so long as one φ in each such group is fixed to some constant value (typically zero). This is an advantage in parameter estimation, as nonlinear optimization algorithms are generally easier to implement when there are no (or fewer) constraints on the parameters.

10.4.1.2. Incorporating data in the allocation

Replacing the α parameters with a logit formulation not only simplifies the estimation of the allocation parameters, it also opens up the possibility of creating a much richer model. The logit structure for nest allocation allows data to be incorporated into the correlation structure of the error terms:

  \alpha_{thi} = \frac{\exp(\phi_{hi} + f_{hi} Z_t)}{\sum_{k \in i\uparrow} \exp(\phi_{ki} + f_{ki} Z_t)}    (10.7)
where φ_hi is the baseline parameter as in (10.6), Z_t a vector of data specific to decision maker t, and f_hi a vector of parameters of the model specific to the link
from predecessor node h to successor node i. If we assume that the first value in Z_t is 1 (defining a "link-specific" constant), we can simplify (10.7) to:¹

  \alpha_{thi} = \frac{\exp(f_{hi} Z_t)}{\sum_{k \in i\uparrow} \exp(f_{ki} Z_t)}    (10.8)
Thus the G function for nesting nodes becomes

  G_i(y) = \left[ \sum_{j \in i\downarrow} \left( \frac{\exp(f_{ij} Z_t)}{\sum_{k \in j\uparrow} \exp(f_{kj} Z_t)} \, G_j(y) \right)^{1/\mu_i} \right]^{\mu_i}
The f parameters are all link-specific parameters, analogous to alternative-specific parameters in an MNL model. As usual for "alternative-specific" constants and variables in logit models, one of the vectors f_ij must be constrained to some arbitrary value, usually zeros. The remaining f vectors can vary unconstrained in both the positive and negative regions of ℝ. This formulation allows decision maker attributes to be introduced as data not only in determining the systematic (observed) utility, but also in determining the correlation structure of the random (unobserved) utility. This heterogeneous covariance network GEV (HeNGEV) model thus allows both the amount and the form of covariance to vary across decision makers.

For example, consider an air itinerary and fare class choice model, built on a network model. The network is bifurcated into two substructures, one with itinerary nested inside fare class, and the other with fare class nested inside itinerary. Each particular potential ticket choice is partly allocated to both substructures. The allocation parameters could then vary based on ticket purchase timing, with travelers who purchase well in advance tending to choose according to one substructure, while travelers who purchase closer to the date of travel shift to the other structure.

Since the form of (10.8) is by construction strictly positive, the HeNGEV model automatically meets one of the conditions of the NetGEV formulation, that α be positive. As long as the nonincreasing μ parameters condition also holds, the HeNGEV model will be consistent with utility maximization.
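The allocation step in (10.8) is simply a softmax over the links entering a node. The Python sketch below illustrates it for the two-sided ticketing example; the Z_t layout (a leading 1, income in thousands of dollars, days of advance purchase) and all parameter values are invented for illustration.

```python
import numpy as np

def allocation_weights(f, Z_t):
    """Data-driven allocation of a node among its incoming links,
    following eq. (10.8): a logit over link-specific utilities f_hi.Z_t.
    `f` has one row of parameters per incoming link; one row is fixed
    at zero for identification."""
    v = f @ Z_t
    e = np.exp(v - v.max())          # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical split of a ticket alternative between the L and B substructures.
f = np.array([[1.0, 0.03, 0.2],      # L side: constant, income, advance purchase
              [0.0, 0.00, 0.0]])     # B side, fixed to zero
early = allocation_weights(f, np.array([1.0, 50.0, 21.0]))  # bought 3 weeks out
late  = allocation_weights(f, np.array([1.0, 50.0, 1.0]))   # bought the day before
print(early[0], late[0])             # the L-side share falls with advance purchase
```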
10.4.2. Discrete Mixtures of Nested Logit Models

Another approach to expressing heterogeneity in covariance is through a latent-class-type formulation. In the traditional use of this model form, each decision maker is
1. A similar mathematical form has been employed by Sener, Pendyala, and Bhat (2008) to model spatial correlation. Despite the superficial similarity of Sener's equation (4) and Eq. (10.8) here, they actually represent different relationships. In Sener, the data z is the spatially generated relationship between alternatives, indexed over alternatives i and j, while in (10.8) the data Z describes attributes of the decision maker, indexed over decision makers t, along with the appropriately matching f parameters in each model.
assumed to be a member of one of an unknown but assumed number of classes. Class membership is unobserved, but modeled in some way so that each decision maker t has a probability Q_t(k) of belonging to class k. Conditional on membership in a particular class, the choice probabilities are calculated normally from the class's kernel choice model. The unconditional choice probabilities must then be calculated from the class membership probabilities and the various class kernel models:

  P_t(i) = \sum_{k \in \mathrm{Classes}} P_t(i \mid k) \, Q_t(k)
Latent class discrete choice models in transportation research have generally employed multinomial logit models as the class kernel, with some or all of the MNL utility parameters allowed to vary across the classes, as in Bhat (1997b) and Greene and Hensher (2003). However, there is no particular reason why the kernel models could not be nested logit or NetGEV models, so as to mix together various covariance structures, as suggested by Walker and Ben-Akiva (2002). This allows a "latent covariance" subtype of the latent class model, where the various β parameters of the alternative utility functions are assumed consistent across the classes, with only the nesting structure varying. This formulation results in a mixture of covariance structures, controlled by the parameters of the class membership probability function; when that function takes attributes of the decision maker as input data, the resulting total model has a heterogeneous covariance structure. This discrete mixing is subtly different from that proposed by Hess (2005, Chapter 7): the mixing done by Hess varied the logsum parameter directly, while the mixing here envisions mixing entire nesting network structures as wholes, without directly manipulating the logsum parameters therein.

This latent class structure can be interpreted in two ways: as a latent but real separation of individuals into distinct choice classes, or as a variable (but deterministic) mixing of utility covariances within the population. The first interpretation is analogous to the traditional latent class interpretation, where the modeler cannot identify the specific structure to which any given individual ascribes her choice, and so represents this modeling uncertainty in a probabilistic mixing fashion. The second interpretation involves probabilistic mixing of choice structures within the choice process modeled at the individual decision maker level, so that the individual decision makers in the model have complex internal choice processes. In cross-sectional data this distinction is not important, but in panel data it can be relevant. Under the second interpretation, the relationship between decision makers and discrete classes breaks down: individual decision makers do not belong to a distinct class; rather, the probabilistic representation of their choice process is itself probabilistically determined. Under the first interpretation, they retain their class membership and the associated choice model structure over time. Thus, the probabilities for their choices need to be aggregated over all their choice occasions before being
averaged across possible classes:

  P_t(i_1, i_2, \ldots, i_{N_t}) = \sum_{k \in \mathrm{Classes}} \left[ Q_t(k) \prod_{n=1}^{N_t} P_t(i_n \mid k) \right]    (10.9)
Under the second interpretation, such intermediate accumulation of probabilities across an individual's choices for the panel would not be appropriate. The operational difference between these approaches, while interesting, is beyond the scope of this paper and is left for future research.
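The distinction between the two interpretations is easy to make concrete in code: under the first interpretation the product over a decision maker's choice occasions is taken inside the class sum, as in (10.9), while under the second the class mixture is applied afresh at each occasion. The Python sketch below uses invented numbers.

```python
import numpy as np

def panel_latent_class_prob(P_cond, Q):
    """Eq. (10.9), first interpretation: P_cond[k, n] is the probability
    of the chosen alternative at occasion n conditional on class k, and
    Q[k] the class membership probability.  The product over occasions
    sits inside the class sum, holding the class fixed over the panel."""
    return float(np.sum(Q * np.prod(P_cond, axis=1)))

def per_choice_mixing_prob(P_cond, Q):
    """Second interpretation: probabilities are averaged over classes at
    each occasion first, then multiplied across occasions."""
    return float(np.prod(Q @ P_cond))

# Two classes, three choice occasions (numbers invented for illustration)
P_cond = np.array([[0.7, 0.6, 0.8],
                   [0.2, 0.3, 0.1]])
Q = np.array([0.9, 0.1])
print(panel_latent_class_prob(P_cond, Q))   # 0.3030
print(per_choice_mixing_prob(P_cond, Q))    # 0.2705: the two treatments differ
```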
10.5. Application

The heterogeneous covariance models are, by their nature, most useful for analyzing complex decisions. Choices where decision makers have only a small handful of options offer little opportunity for complex correlation structures; in complex choices with large choice sets, the benefits of this flexible model become more apparent. One typical such decision occurs in air travel booking, where travelers must choose among a variety of itineraries when selecting an airline ticket. To demonstrate the model, we examine a hypothetical choice scenario and compare similar models with homogeneous and heterogeneous covariance forms.
10.5.1. Data Generation

This scenario involves synthetically created data that approximates what might be observed for a flight itinerary choice between two medium-sized airports in the United States. There is a variety of itinerary options (nonstop, single-connect, and double-connect flights on five different carriers). For each itinerary, various data attributes are provided, including departure time, level of service (nonstop, single connect, double connect), carrier, fare ratio (the comparative fare levels, on average, across the airlines serving this city pair), and distance ratio (the ratio of itinerary flight distance to straight-line distance). The data on the itineraries is shown in Table 10.1.

The advantage of the HeNGEV model described here is that it can incorporate attributes of the decision maker (or of the choice itself) into the correlation structure. To examine the usefulness of such enhanced tools, the dataset also includes data on the annual income level of each decision maker, as well as the number of days in advance that the ticket was purchased. These attributes were generated for 100,000 simulated travelers.

The structure of this model, based on one by Coldren and Koppelman (2005), is depicted in Figure 10.1. The network depicted has numerous nodes and arcs. If the associated parameters were each estimated independently, the parameter estimation process would become overwhelmed, and the resulting model would be virtually meaningless as a descriptive or predictive tool.
Table 10.1: Flight itinerary choices in synthetic data.

  Itinerary   Airline   Departure   Distance   Fare    Level of
  number                time        ratio      ratio   service
  1           BB        12:55       100        104     Nonstop
  2           BB        21:05       100        104     Nonstop
  3           AA        13:19       111        100     Single connect
  4           AA        16:47       111        100     Single connect
  5           AA        16:47       111        100     Single connect
  6           AA         8:20       111        100     Single connect
  7           AA        16:15       111        100     Single connect
  8           CC        18:20       127         55     Single connect
  9           CC         9:15       127         55     Single connect
  10          BB        16:45       132        104     Single connect
  11          BB        14:50       132        104     Single connect
  12          BB         7:20       132        104     Single connect
  13          BB        12:30       111        104     Single connect
  14          BB        17:05       111        104     Single connect
  15          BB        18:50       111        104     Single connect
  16          BB         7:45       111        104     Single connect
  17          DD         9:15       127         46     Single connect
  18          DD        18:20       127         46     Single connect
  19          CC         8:00       130         55     Single connect
  20          BB         9:00       132        104     Single connect
  21          AA        10:05       132        100     Double connect
  22          AA        16:15       132        100     Double connect
  23          AA        14:40       132        100     Double connect
  24          BB        11:00       153        104     Double connect
  25          DD         7:15       130         46     Double connect
  26          DD        14:40       130         46     Double connect
  27          EE         7:30       121         49     Double connect
  28          EE         7:30       121         49     Double connect
Instead, the nodes are grouped into four sections (upper and lower nests on each side) with common logsum parameters, and the allocations between the sides were grouped together so that all alternatives share common allocation parameters.

Since the data in this example is synthetic, the true model underlying the observations is known. In particular, the distribution of the covariance structure in the population is known and defined to be heterogeneous. This distribution is shown in Figure 10.2. A large share of the population is grouped near the right side, having a covariance structure almost entirely defined by the L submodel, while a much smaller share of the population sits on the B submodel side. This reflects a common scenario in air travel, where a few (generally high-revenue, business-related) travelers make decisions in a different way than most other travelers, and would thus exhibit different error covariance structures.
[Figure 10.1: Flight itinerary choice model for synthetic data. A network connecting the root to the elemental itinerary alternatives (e.g., BB Nonstop 12:55pm, AA 1-Connect 8:20am, CC 2-Connect 7:45am, BB Nonstop 9:05pm) through two substructures: an L side with time-of-day nests (L-ToD: AM, MD, PM) above carrier nests (L-Carrier: AA, BB, CC), and a B side with carrier nests (B-Carrier) above time-of-day nests (B-ToD). Adapted from Coldren and Koppelman (2005).]
[Figure 10.2: Distribution of allocation weights in unimodal synthetic data. Histogram of the fraction of simulated travelers (vertical axis, up to roughly 0.05) against the allocation to the "L" sub-model (horizontal axis, 0 to 1), with most of the mass concentrated near 1.]
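As a rough illustration of how such a population can be synthesized, the sketch below draws traveler attributes and passes them through a logit allocation of the kind in (10.8). The attribute distributions and parameter values are invented for this sketch and are not those used to build the chapter's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_allocation_shares(n=100_000, f=(1.0, 0.03, 0.2)):
    """Generate a heterogeneous population of L-side allocation weights,
    qualitatively like Figure 10.2: draw each traveler's attributes and
    map them through a two-link logit allocation as in eq. (10.8)."""
    income = rng.lognormal(mean=3.9, sigma=0.5, size=n)   # annual income, $000
    advance = rng.exponential(scale=14.0, size=n)         # days purchased in advance
    v = f[0] + f[1] * income + f[2] * advance             # L-side utility; B side fixed at 0
    return 1.0 / (1.0 + np.exp(-v))                       # share allocated to L

shares = simulate_allocation_shares()
print(shares.mean(), (shares > 0.9).mean())   # most of the mass sits near 1
```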
10.5.2. Estimated Models

The estimated parameters for the HeNGEV model are shown in Table 10.2. Most of the parameters in this model closely match the known true values, although three (marked with an asterisk in the table) show a statistically significant difference from the true values. That these three parameters do not correctly recover their true values might be explained in part by the high correlation among their estimators, highlighted in Table 10.3. Such high correlations would tend to inflate the standard errors of the parameter estimates (which should make the differences seem less significant, not more), so that these parameters having statistically significant errors is mildly troubling, especially in light of the fact that such a large simulated dataset was used for estimation. Nevertheless, this model performs better than the others on this dataset, so these issues may be more closely related to the nature of the data used here (which was simulated, but based on real-world observed models) than to the model itself.

Table 10.2: HeNGEV model.

                                  True value   Estimated   Standard error   t-statistic
                                               parameter   of estimate      vs. true
  Departure time
    Before 08:00                  0            0           n/a              n/a
    08:00–09:59                   0.15         0.1065      0.01796          2.42*
    10:00–12:59                   0.1          0.09257     0.09851          0.08
    13:00–15:59                   0.05         0.02468     0.02453          1.03
    16:00–18:59                   0.1          0.07013     0.01867          1.60
    19:00 or later                0.3          0.2975      0.09828          0.03
  Level of service
    Nonstop                       0            0           n/a              n/a
    Single connect                2.3          2.286       0.1019           0.14
    Double connect                5.8          5.864       0.1354           0.47
  Flight characteristics
    Distance ratio                0.01         0.007141    0.001107         2.58*
    Fare ratio                    0.004        0.003359    0.0005518        1.16
  Nesting parameters
    B time of day (upper) nest    0.8          0.7994      0.01509          0.04
    B carrier (lower) nest        0.2          0.1439      0.02585          2.17*
    L carrier (upper) nest        0.7          0.6746      0.01973          1.29
    L time of day (lower) nest    0.3          0.3075      0.006947         1.08
  Allocation parameters
    Phi constant L side           1            1.066       0.389            0.17
    Phi income (000) L side       0.03         0.02912     0.005029         0.17
    Phi advance purchase L side   0.2          0.1772      0.02686          0.85

  Note: the t-statistic vs. true is the magnitude of (estimate − true value)/(standard error); for example, |0.1065 − 0.15|/0.01796 ≈ 2.42 for the 08:00–09:59 coefficient. Asterisks mark differences significant at the 5% level.
[Table 10.3: Parameter estimator correlation, HeNGEV model. A 16 × 16 correlation matrix of the parameter estimators, with several pairs of estimators correlated at 0.9 or above in magnitude.]
Table 10.4: NetGEV model.

                                  True value   Estimated   Standard error   t-statistic
                                               parameter   of estimate      vs. true
  Departure time
    Before 08:00                  0            0           n/a              n/a
    08:00–09:59                   0.15         0.06687     0.03759          2.21
    10:00–12:59                   0.1          0.03704     0.1177           0.53
    13:00–15:59                   0.05         0.03495     0.07088          1.20
    16:00–18:59                   0.1          0.02141     0.05334          1.47
    19:00 or later                0.3          0.3445      0.1120           0.40
  Level of service
    Nonstop                       0            0           n/a              n/a
    Single connect                2.3          2.331       0.1407           0.22
    Double connect                5.8          5.956       0.2530           0.62
  Flight characteristics
    Distance ratio                0.01         0.004372    0.002449         2.30
    Fare ratio                    0.004        0.002202    0.001068         1.68
  Nesting parameters
    B time of day (upper) nest    0.8          0.8307      0.1022           0.30
    B carrier (lower) nest        0.2          0.07244     0.04395          2.90
    L carrier (upper) nest        0.7          0.6519      0.08702          0.55
    L time of day (lower) nest    0.3          0.3078      0.01321          0.59
  Allocation parameters
    Phi constant L side           1            0.5928      0.4722           0.86
The NetGEV model without heterogeneous covariance (shown in Table 10.4) performs relatively well, but clearly worse than the HeNGEV model: its log-likelihood at convergence is 240 lower, a highly significant deterioration given that only two degrees of freedom are saved. The performance of the individual parameter estimates in the NetGEV and HeNGEV models is compared in Table 10.5. For each parameter in the model, the HeNGEV estimate is closer to the known true value than the NetGEV estimate, generally by about half. Further, the standard errors of the estimates are all smaller for the HeNGEV model, also by about half.

A latent class model was also estimated on the same dataset, with two classes representing the two correlation structures. The class membership model also used a multinomial logit structure, with class membership based on the same input variables as the allocation data in the HeNGEV model. The resulting parameter estimates and model fit were identical to those estimated using the HeNGEV model.
Table 10.5: Comparison of HeNGEV and NetGEV models.

                                  HeNGEV model                   NetGEV model
                                  Actual error   Standard error  Actual error   Standard error
                                  of estimate    of estimate     of estimate    of estimate
  Departure time
    Before 08:00                  n/a            n/a             n/a            n/a
    08:00–09:59                   0.0435         0.01796         0.08313        0.03759
    10:00–12:59                   0.00743        0.09851         0.06296        0.1177
    13:00–15:59                   0.02532        0.02453         0.08495        0.07088
    16:00–18:59                   0.02987        0.01867         0.07859        0.05334
    19:00 or later                0.0025         0.09828         0.0445         0.1120
  Level of service
    Nonstop                       n/a            n/a             n/a            n/a
    Single connect                0.014          0.1019          0.031          0.1407
    Double connect                0.064          0.1354          0.156          0.2530
  Flight characteristics
    Distance ratio                0.002859       0.001107        0.005628       0.002449
    Fare ratio                    0.000641       0.0005518       0.001798       0.001068
  Nesting parameters
    B time of day (upper) nest    0.0006         0.01509         0.0307         0.1022
    B carrier (lower) nest        0.0561         0.02585         0.12756        0.04395
    L carrier (upper) nest        0.0254         0.01973         0.0481         0.08702
    L time of day (lower) nest    0.0075         0.006947        0.0078         0.01321
  Allocation parameters
    Phi constant L side           0.066          0.389           0.4072         0.4722
    Phi income (000) L side       0.00088        0.005029        n/a            n/a
    Phi advance purchase L side   0.0228         0.02686         n/a            n/a
For a more complete picture, regular nested logit models were estimated using each of the two submodels, as well as a multinomial logit model that ignored the error covariance entirely. As shown in Figure 10.3, the results of these models were (by definition) inferior even to the NetGEV model, although the L-only nested submodel performed nearly as well as the homogeneous NetGEV model. Not surprisingly, the MNL model with similarly defined utility functions performs relatively poorly, with log-likelihood benefits in the thousands for a change to either nested structure.

Of course, while all of the differences shown in Figure 10.3 are statistically significant, some differences are more significant than others. The L-only structure fits the data better than the B-only model, which is consistent with the construction of this dataset, heavily weighted as it is with decision makers exhibiting error correlation structures nearly the same as the L-only model. This heavy weight toward the L model is also reflected in the very small improvement (6.77) in log-likelihood when moving from the L-only model to the NetGEV model, which incorporates both the L and B submodels. While this change is still statistically significant (χ² = 13.54 with 3 degrees of freedom, p = 0.0036), it is tiny compared to the changes observed between the other models.
[Figure 10.3: Log-likelihoods and relationships between models estimated using the unimodal dataset (improving log-likelihood, not to scale). MNL: LL = -180,964; NL (B only): LL = -177,244; NL (L only): LL = -177,128; NetGEV: LL = -177,121; HeNGEV: LL = -176,881. Labeled transitions: MNL to NL (L), 2 restrictions, ΔLL = 3835; MNL to NL (B), 2 restrictions, ΔLL = 3719; NL (B) to NetGEV, 3 restrictions, ΔLL = 123; NL (L) to NetGEV, 3 restrictions, ΔLL = 6.77; NetGEV to HeNGEV, 2 restrictions, ΔLL = 241.]

Table 10.6: HeNGEV and NetGEV market-level predictions.

  Itinerary   Total        Predictions                  Differences
              observed     HeNGEV       NetGEV          HeNGEV    NetGEV
  1            45,067      44,806.47    44,824.55       -260.53   -242.45
  2            26,746      26,769.61    26,753.70         23.61      7.70
  3             2,633       2,649.82     2,650.90         16.82     17.90
  4             1,346       1,439.44     1,432.45         93.44     86.45
  5             1,415       1,439.44     1,432.45         24.44     17.45
  6             3,521       3,328.98     3,355.50       -192.02   -165.50
  7             1,452       1,439.44     1,432.45        -12.56    -19.55
  8             3,328       3,273.62     3,293.55        -54.38    -34.45
  9             2,374       2,485.81     2,466.85        111.81     92.85
  16            1,047       1,055.51     1,053.15          8.51      6.15
  17            3,983       4,014.62     4,001.65         31.62     18.65
  18            3,412       3,506.99     3,506.00         94.99     94.00
  19            2,221       2,257.96     2,264.90         36.96     43.90
  Others        1,455       1,532.29     1,531.95         77.29     76.95
  Total       100,000     100,000      100,000             0          0
In this instance, with most travelers exhibiting similar L choice patterns, it appears that upgrading to the NetGEV model alone does not provide much benefit. Far more improvement in the log-likelihood is made when the heterogeneous covariance is introduced, which allows the small portion of the population that exhibits "B" choice patterns to follow that model without adversely affecting the predictions for the larger L population.

When applied for prediction, the HeNGEV and NetGEV models produce roughly similar predictions across the entire market, as can be seen in Table 10.6: the two models over- or under-predict by roughly the same amounts for each itinerary.
[Table 10.7: HeNGEV and NetGEV predictions segmented by income. For itineraries 1–9, 16–20, and others, the table reports observed choices and the HeNGEV and NetGEV prediction errors within each income quintile (bottom fifth, lower middle, middle fifth, upper middle, and top fifth of income), together with each model's total absolute deviation in each segment.]
However, when the predictions are segmented by income as in Table 10.7, the HeNGEV model can be seen to outperform the NetGEV model in all income segments, especially at the extremes of the income range. The errors for the whole market, on the right side of Figure 10.4, are roughly similar for both models. However, within the extreme high- and low-income segments (especially the high-income segment), as shown in Figure 10.5, the prediction errors of the HeNGEV model are generally much smaller than those of the NetGEV model. The overall market predictions of the NetGEV model end up close to the HeNGEV predictions because the particularly large errors appearing in the extreme income segments have offsetting signs.

[Figure 10.4: Observations and market-level prediction errors.]

[Figure 10.5: Prediction errors, segmented by income.]
10.6. Conclusions

Overall, the HeNGEV models show a better fit for the synthetic data than the matching homogeneous NetGEV models. Individual parameter estimates were generally improved by adopting the heterogeneous model, often by half or more of the error in the estimate. This result is not especially surprising for this dataset, as it was constructed using a heterogeneous synthetic choice process. Nevertheless, the ability to recover the choice parameters, and to recover them more accurately
than in a homogeneous model, demonstrates that the heterogeneous structure can be used successfully where appropriate.

Better-fitting models are obviously a positive attribute of the HeNGEV structure, but they are not the only benefit. When used to predict the choices of subsections of the population, the responsiveness of the correlation structure to data makes the HeNGEV a superior predictive tool. Such benefits could be especially appealing in revenue management systems, which seek specifically to segment markets in order to capture these types of differences in pricing and availability decisions.

The identical results achieved for the HeNGEV and latent class models strongly suggest an equivalence between these two model forms. This may be a particular result of the covariance structure selected for this synthetic data experiment, although the topology of the chosen network (with all the allocation parameters at the bottom of the network, on edges connecting to the elemental alternatives) is reasonably general, as any NetGEV model can be transformed to have this bottom-only allocation (Newman, 2008). Additionally, the synthetic data was a cross-sectional dataset, not panel data with identifiable repeated choices by a single decision maker. Nevertheless, the author conjectures that the network GEV model and the discrete mixture of nested logit models are indeed generally equivalent, although the full exploration of this question is left for the future.
References

Bhat, C. R. (1997a). Covariance heterogeneity in nested logit models: Econometric structure and application to intercity travel. Transportation Research Part B: Methodological, 31(1), 11–21.
Bhat, C. R. (1997b). An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science, 31(1), 34–48.
Coldren, G. M., & Koppelman, F. S. (2005). Modeling the competition among air-travel itinerary shares: GEV model development. Transportation Research Part A, 39(4), 345–365.
Daly, A., & Bierlaire, M. (2006). A general and operational representation of generalised extreme value models. Transportation Research Part B: Methodological, 40, 285–305.
Garrow, L. A. (2004). Comparison of models representing correlation and random taste variation: An application to airline passengers' rescheduling behavior. Ph.D. thesis, Northwestern University, Evanston, IL.
Greene, W. H., & Hensher, D. A. (2003). A latent class model for discrete choice analysis: Contrasts with mixed logit. Transportation Research Part B: Methodological, 37, 681–698.
Hess, S. (2005, September). Advanced discrete choice models with applications to transport demand. Ph.D. thesis, Imperial College London.
Hess, S., Bolduc, D., & Polak, J. W. (2005, August). Random covariance heterogeneity in discrete choice models. ERSA conference paper ersa05p375, European Regional Science Association.
Koppelman, F. S., & Sethi, V. (2005). Incorporating variance and covariance heterogeneity in the generalized nested logit model: An application to modeling long distance travel choice behavior. Transportation Research Part B: Methodological, 39(9), 825–853.
McFadden, D. (1978). Modeling the choice of residential location. In: A. Karlqvist (Ed.), Spatial interaction theory and residential location (pp. 75–96). Amsterdam: North-Holland.
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.
Newman, J. P. (2008). Normalization of network generalized extreme value models. Transportation Research Part B: Methodological, 42(10), 958–969.
Sener, I. N., Pendyala, R. M., & Bhat, C. R. (2008, July). Accommodating spatial correlation across choice alternatives in discrete choice models: An application to modeling residential location choice behavior. Technical report, Department of Civil, Architectural & Environmental Engineering, The University of Texas at Austin. Available at: http://www.ce.utexas.edu/prof/bhat/ABSTRACTS/SpatialCorrelationPaper31July08.pdf
Swait, J., & Adamowicz, W. L. (2001). Choice environment, market complexity, and consumer behavior: A theoretical and empirical approach for incorporating decision complexity into models of consumer choice. Organizational Behavior and Human Decision Processes, 86(2), 141–167.
Train, K. (2003). Discrete choice methods with simulation. New York: Cambridge University Press.
Walker, J., & Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.
Chapter 11
On Estimation of Hybrid Choice Models Denis Bolduc and Ricardo Alvarez-Daziano
Abstract

The search for flexible models has led the simple multinomial logit model to evolve into the powerful but computationally very demanding mixed multinomial logit (MMNL) model. The same search has led to hybrid choice model (HCM) formulations that explicitly incorporate psychological factors affecting decision making in order to enhance the behavioral representation of the choice process. HCMs expand on standard choice models by including attitudes, opinions, and perceptions as psychometric latent variables. In this paper we describe the classical estimation technique for a simulated maximum likelihood (SML) solution of the HCM. To show its feasibility, we apply it to data on stated personal vehicle choices made by Canadian consumers when faced with technological innovations. We then go beyond classical methods and estimate the HCM using a hierarchical Bayesian approach that exploits HCM Gibbs sampling with both a probit and an MMNL discrete choice kernel, and we carry out a Monte Carlo experiment to test how the HCM Gibbs sampler works in practice. To our knowledge, this is the first practical application of HCM Bayesian estimation. We show that although HCM joint estimation requires the evaluation of complex multidimensional integrals, SML can be successfully implemented. The HCM framework not only proves capable of introducing latent variables, but also makes it possible to tackle the problem of measurement errors in variables in a very natural way. We also show that working with Bayesian methods has the potential to break down the complexity of classical estimation.
11.1. Introduction

Within the continuous search for flexible models capable of dealing with different practical and realistic situations, discrete choice modeling has developed especially quickly: the simple but restrictive multinomial logit (MNL) model has evolved into the powerful mixed multinomial logit (MMNL) model. In the last few years this flexibility search has been extended to the next level, and discrete choice modeling now aims to explicitly incorporate psychological factors affecting decision making, with the goal of enhancing the behavioral representation of the choice process. The hybrid choice model (HCM) is a new-generation discrete choice model with improved explanatory power that integrates discrete choice and latent variable models, taking into account the impact of attitudes and perceptions on the decision process. The use of HCMs permits us to adequately predict individual preferences and to assess the impact of unobserved factors involved in the behavioral representation of the choice process.

In this paper we describe HCM estimation techniques and analyze the practical implementation of HCMs to include perceptions and attitudes in a standard discrete choice setting. In Section 11.2, we discuss how the HCM integrates latent variables into a standard discrete choice setting. Section 11.3 describes the technical details of the maximum simulated likelihood (MSL) implementation of a general HCM with an MMNL kernel for the discrete choice submodel; we expand the method presented in Bolduc, Ben-Akiva, Walker, and Michaud (2005) to a more general case, allowing simultaneity among the latent variables and incorporating not only continuous but also discrete indicator variables. Section 11.4 presents an example of the feasibility of HCM classical estimation using empirical data on private vehicle choice, along with the results of each partial model that makes up the hybrid model setting. Section 11.5 introduces HCM Bayesian estimation and carries out a Monte Carlo experiment to test how the HCM Gibbs sampler works in practice; it connects to the rest of the paper by adding to the discussion on HCM estimation. Sections 11.3 and 11.4 provide the elements of classical HCM estimation, whereas Section 11.5 introduces the idea that a Bayesian approach could potentially break down the complexity of classical techniques. Section 11.6 presents the main conclusions of our work and identifies guidelines for future research.
11.2. Hybrid Choice Modeling

11.2.1. Standard Discrete Choice Modeling

In discrete choice modeling, the most common approach is based on random utility maximization (RUM) theory (McFadden, 1974), which treats individual choice behavior as intrinsically probabilistic. According to this theory, each individual has a utility function associated with each of the alternatives. This individual function can be divided into a systematic part, which captures the effect of the explanatory variables, and a random part, which takes into
account all the effects not included in the systematic part of the utility function. In other words, choices are modeled using a structural equation [Eq. (11.1)], the utility function, representing the individual preferences, where the explanatory variables are the alternative attributes and individual characteristics. The observed choice corresponds to the alternative that maximizes the individual utility function, a process represented by a measurement equation [Eq. (11.2)]. Because the utility function has a random nature, the output of the model actually corresponds to the probability of individual n choosing alternative i. The set of equations describing the standard discrete choice setting is given by:

  U_{in} = X_{in} \beta + u_{in}    (11.1)

  y_{in} = \begin{cases} 1 & \text{if } U_{in} \geq U_{jn} \ \forall j \in C_n, \ j \neq i \\ 0 & \text{otherwise} \end{cases}    (11.2)
where U_in corresponds to the utility of alternative i as perceived by individual n; X_in is a row vector of attributes of alternative i and socioeconomic characteristics of individual n; β a column vector of unknown parameters; u_in an error term; y_in an indicator of whether alternative i is chosen by individual n; and C_n the individual's set of available alternatives. Different choice models can be derived depending on the assumptions made about the distribution of the random error term (Ben-Akiva & Lerman, 1985). So far the workhorses in this area have been the MNL model (McFadden, 1974) and the nested logit model (Ben-Akiva, 1973); both offer closed forms for the choice probabilities, but rely on restrictive simplifying assumptions. To gain generality, more flexible models have been brought into practice. One powerful modeling alternative is the MMNL model (Bolduc & Ben-Akiva, 1991; Brownstone & Train, 1999), which can approximate any RUM model (McFadden & Train, 2000). The main idea of this kind of model is to consider more than one random component, allowing for a more flexible covariance structure. Estimation implies the evaluation of integrals without a closed-form solution, although computer-aided simulation techniques can be used to approximate them.

Discrete choice modeling has thus evolved quickly, and powerful models are available. However, under the standard random utility approach (McFadden, 1974), discrete choice models represent the decision process as an obscure black box in which attitudes, perceptions, and knowledge are neglected. According to 2002 Nobel Laureate Daniel Kahneman, there remains a significant difference between economist modelers, who develop practical models of decision making, and behavioral scientists, who focus on in-depth understanding of agent behavior. Both have fundamental interests in behavior, but each works with different assumptions and tools. McFadden (1986) points out the need to bridge these worlds by incorporating attitudes in choice models, and in his 2000 Nobel lecture he emphasized the need to incorporate attitudinal constructs in conventional economic models of decision making.
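For reference, the closed form of the MNL kernel mentioned above is straightforward to evaluate; the Python sketch below computes it for one decision maker, with invented attributes and parameters.

```python
import numpy as np

def mnl_probabilities(X, beta):
    """Multinomial logit choice probabilities for one decision maker.
    X is a (J x K) matrix of alternative attributes, beta a (K,) vector
    of taste parameters.  With i.i.d. extreme value type 1 errors in the
    utility, P(i) = exp(V_i) / sum_j exp(V_j)."""
    V = X @ beta                 # systematic utilities
    e = np.exp(V - V.max())      # stabilized exponentials
    return e / e.sum()

# Illustrative 3-alternative choice set; columns might be cost and time
X = np.array([[1.0, 30.0],
              [2.0, 20.0],
              [3.0, 10.0]])
beta = np.array([-0.8, -0.05])
print(mnl_probabilities(X, beta))   # probabilities sum to 1
```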
11.2.2. Latent Variables and Discrete Choice: The HCM

In the last few years, the flexibility search we described in the previous section has been extended to the next level, and discrete choice modeling now aims to explicitly incorporate psychological factors affecting decision making (Ben-Akiva et al., 2002), with the goal of enhancing the behavioral representation of the choice process. The hybrid choice modeling approach embraces this improved representation (see Figure 11.1).

[Figure 11.1: Hybrid choice model (general representation).]

In fact, HCMs expand on standard choice models by considering the following extensions (Walker & Ben-Akiva, 2002): heterogeneity through flexible error structures (such as the use of an MMNL kernel formulation), the combination of revealed preference (RP) and stated preference (SP) data, the presence of latent classes explaining market segments, and the integration of latent (unobserved) constructs according to an integrated choice and latent variable (ICLV) model. It is the ICLV model inside the HCM conceptual framework that permits the inclusion of attitudes, opinions, and perceptions as psychometric latent variables, in such a way that the understanding of consumer behavior is improved while the model gains predictive power. Since it is methodologically trivial to add a latent class model and to consider mixed RP/SP data, our HCM analysis will focus on the ICLV model and on flexible disturbances.

The ICLV setting is an improved representation of the choice process that involves a choice model formulation containing unobserved psychometric variables (perceptions and attitudes) among the explanatory variables, incorporated as latent variables (Bolduc et al., 2005). On the one hand, perception variables measure the cognitive individual capacity to represent and evaluate the attributes of different alternatives. Perceptions are relevant because the choice process depends on how attribute levels are perceived through the individual beliefs of a
specific consumer. On the other hand, attitude variables measure the individual evaluation of importance assigned to the features of different alternatives. Attitudes are related to individual heterogeneity (taste variations) and reflect individual tastes, needs, values, goals, and capabilities that develop over time and are affected by experience and external factors, such as the socioeconomic characteristics of the decision maker (Walker, 2002).

Econometrically, the latent variable model is composed of a set of structural equations, which describe the latent variables in terms of observable exogenous variables, and a group of measurement relationships (the measurement model) linking latent variables to indicators (Jöreskog & Sörbom, 1984). Since the latent variables are unobserved, they are normally linked to answers to survey questions: the indicators. These indicators can be continuous, binary, or categorical variables expressed by responses to attitudinal and perceptual survey questions. Note that under the RUM framework, the standard choice model is a latent variable model itself: the utility function is a latent construct that measures the individual level of satisfaction conditional on each alternative (the choice model structural equation), and while the utility function is unobservable, the revealed or stated choice serves as an indicator of the underlying choice process. The latent variable model can be integrated into the standard choice model setting, giving the HCM group of structural and measurement equations, which may be written as follows.

11.2.2.1. Structural equations

  z_n = \Pi z_n + B w_n + \zeta_n = (I_L - \Pi)^{-1} B w_n + (I_L - \Pi)^{-1} \zeta_n, \quad \zeta_n \sim N(0, \Psi)    (11.3)

  U_n = X_n \beta + \Gamma z_n + u_n    (11.4)

11.2.2.2. Measurement equations

  I_n = \alpha + \Lambda z_n + \varepsilon_n, \quad \varepsilon_n \sim N(0, \Theta)    (11.5)

  y_{in} = \begin{cases} 1 & \text{if } U_{in} \geq U_{jn} \ \forall j \in C_n, \ j \neq i \\ 0 & \text{otherwise} \end{cases}    (11.6)
where z_n is an (L × 1) vector of latent variables; the (L × L) matrix Π allows for the eventual presence of simultaneity or interactions among the latent variables, assuming that (I_L − Π) is invertible, where I_L represents the identity matrix of size L; w_n is an (M × 1) vector of explanatory variables affecting the latent variables; B is an (L × M) matrix of unknown parameters used to describe the global effect (I_L − Π)^{−1} B w_n on the latent variables; and Ψ is an (L × L) variance–covariance matrix
which describes the relationship among the latent variables through the error terms. The choice model in Eq. (11.4) is written in vector form, where we assume there are J alternatives. Therefore, U_n is a (J × 1) vector of utilities and u_n a (J × 1) vector of error terms associated with the utilities. X_n is a (J × K) matrix with X_in designating its ith row; β is a (K × 1) vector of unknown parameters; Γ is a (J × L) matrix of unknown parameters associated with the latent variables present in the utility function, with Γ_i designating the ith row of Γ. In the set of measurement equations, I_n corresponds to an (R × 1) vector of indicators of the latent variables associated with individual n; α is an (R × 1) vector of constants and Λ an (R × L) matrix of unknown parameters relating the latent variables to the indicators. The term ε_n is an (R × 1) vector of independent error terms, which implies that Θ is a diagonal matrix with variance terms on the diagonal. Finally, we stack the choice indicators y_in into a (J × 1) vector called y_n.

If the latent variables were not present, the choice probability of individual n selecting alternative i would correspond exactly to the standard choice probability P(y_{in} = 1 | X_n; β) ≡ P_n(i | X_n; β). In a setting with given values of the latent variables z_n, the choice probability would be represented by P_n(i | z_n, X_n; θ), where θ contains all the unknown parameters of the choice model in Eq. (11.4). Since the latent variables are not actually observed, the choice probability is obtained by integrating the latter expression over the whole space of z_n:

  P_n(i \mid X_n, w_n; \theta, B, \Pi, \Psi) = \int_{z_n} P_n(i \mid z_n, X_n; \theta) \, g(z_n \mid w_n; B, \Pi, \Psi) \, dz_n    (11.7)
which is an integral of dimension equal to the number of latent variables in z_n, and where g(z_n | w_n; B, Π, Ψ) is the density of z_n defined by Eq. (11.3). Indicators are introduced in order to characterize the unobserved latent variables, and econometrically they permit identification of the parameters attached to the latent variables. Indicators also provide efficiency in estimating the choice model with latent variables, because they add information content. The variables y_n and I_n are assumed to be correlated only via the presence of the latent variables z_n in Eqs. (11.4) and (11.5). Given our assumptions, the joint probability P(y_{in} = 1, I_n) ≡ P_n(i, I) of observing y_n and I_n may thus be written as:

  P_n(i, I \mid X_n, w_n; \delta) = \int_{z_n} P_n(i \mid z_n, X_n; \theta) \, f(I_n \mid z_n; \Lambda, \Theta) \, g(z_n \mid w_n; B, \Pi, \Psi) \, dz_n    (11.8)
where f(I_n | z_n; Λ, Θ) is the density of I_n defined by Eq. (11.5). The term δ designates the full set of parameters of the jointly estimated discrete choice and latent variable models (i.e., δ = {θ, B, Π, Ψ, Λ, Θ}).

Few applications of the HCM are found in the literature, usually focusing on one latent variable (allowing numerical integration) or using a consistent but not efficient two-stage estimation procedure. While Ben-Akiva and Boccara (1987) develop the idea of hybrid modeling in the context of a more comprehensive travel
behavior framework, Walker and Ben-Akiva (2002) and Morikawa, Ben-Akiva and McFadden (2002) extend this development, showing how the model can be estimated and implemented in practice. The contribution of Bolduc et al. (2005) is the first example of the analysis and implementation of a situation characterized by a large number of latent variables and a large number of choices. The work of Ben-Akiva, Bolduc, and Park (2008) is a recent application of the hybrid choice setting applied to the freight sector. For a personal-vehicle-technology choice context, Bolduc, Boucher, and Alvarez-Daziano (2008) analyze the practical use of a large number of indicators.
11.3. HCMs: Classical Estimation

This section provides the analytical details of the MSL implementation of a general HCM with an MMNL kernel for the discrete choice submodel. To gain generality and flexibility, we expand the method presented in Bolduc et al. (2005) in two relevant ways: we allow for the presence of simultaneity among the latent variables [through Eq. (11.3)], and we incorporate latent variables whose associated indicators can be not only continuous but also discrete (binary or multinomial).
11.3.1. Evaluating the Joint Choice Probability

For efficiency reasons, we focus only on a full information solution. Following Walker and Ben-Akiva (2002), HCM classical full information estimation requires the evaluation of the joint probability P_n(i, I | X_n, w_n; δ) defined in Eq. (11.8). This joint probability depends, first, on the discrete choice kernel P_n(i | z_n, X_n; θ), whose analytical form depends on the assumptions regarding the distribution of the random term u_n defined in Eq. (11.4). Indeed, if u_n is independent and identically distributed (i.i.d.) extreme value type 1, then conditional on z_n the probability of choosing alternative i has the MNL form, which leads to the following expression:

  P_n(i, I \mid X_n, w_n; \delta) = \int_{z_n} \frac{\exp(X_{in}\beta + \Gamma_i z_n)}{\sum_{j \in C_n} \exp(X_{jn}\beta + \Gamma_j z_n)} \, f(I_n \mid z_n; \Lambda, \Theta) \, g(z_n \mid w_n; B, \Pi, \Psi) \, dz_n    (11.9)
Assuming an MNL kernel makes the calculation of P_n(i, I | X_n, w_n; δ) easier because the choice probability P_n(i | z_n, X_n; θ) has a closed form. However, the same modeling disadvantages found in the standard case still hold: MNL assumes a restricted covariance structure, with no correlation and no heteroscedasticity.
We can derive a probit kernel if we assume that the error terms u_n are multivariate normal. The probit kernel solves the problem of the restrictive simplifying assumptions of MNL; however, in the probit case the choice probability no longer has a closed form, and probit classical estimation has proven burdensome in practice. For classical estimation, an MMNL kernel is the most convenient assumption for modeling flexible error structures. We decompose u_n assuming a normally distributed factor analytic structure:

  u_n = P T \xi_n + \nu_n    (11.10)

where P is a (J × F) matrix of factor loadings; T is an (F × F) diagonal matrix containing factor-specific standard deviations (the entries of T are part of θ); ξ_n is an (F × 1) vector of i.i.d. normally distributed factors; and ν_n is a (J × 1) vector of i.i.d. extreme value type 1 error terms. The MMNL kernel adds an additional F-dimensional integral to the joint probability P_n(i, I | X_n, w_n; δ), which now implies solving:

  P_n(i, I \mid X_n, w_n; \delta) = \int_{\xi_n} \int_{z_n} P_n(i \mid z_n, X_n; \theta, \xi_n) \, f(I_n \mid z_n; \Lambda, \Theta) \, g(z_n \mid w_n; B, \Pi, \Psi) \, N_{\xi}(0, I_F) \, dz_n \, d\xi_n    (11.11)
Since ν is i.i.d. extreme value type 1, note that P_n(i | z_n, X_n; θ, ξ_n) has the following MNL form:

  P_n(i \mid z_n, X_n; \theta, \xi_n) = \frac{\exp(X_{in}\beta + \Gamma_i z_n + P_i T \xi_n)}{\sum_{j \in C_n} \exp(X_{jn}\beta + \Gamma_j z_n + P_j T \xi_n)}    (11.12)
where P_i denotes row i of P. Assuming that z_n and ξ_n are mutually independent, Eq. (11.12) can be incorporated directly into Eq. (11.11).

Regarding the measurement model and its distribution f(I_n | z_n; Λ, Θ), we assume that each equation linking the indicators and the latent variables corresponds to a continuous, a binary, or a multinomial ordered response. A measurement equation r in the continuous case is given by I_{rn} = I*_{rn} with:

  I^*_{rn} = \alpha_r + \Lambda_r z_n + \varepsilon_{rn}, \quad \varepsilon_{rn} \sim N(0, \theta_r^2)    (11.13)
In the binary case, we instead have:

  I_{rn} = \begin{cases} 1 & \text{if } I^*_{rn} \geq 0 \\ 0 & \text{otherwise} \end{cases}    (11.14)
while in the multinomial ordered case with Q responses, we obtain:

  I_{rn} = q \quad \text{if } \gamma_{q-1} < I^*_{rn} \leq \gamma_q, \qquad q = 1, \ldots, Q    (11.15)
where I_{rn} and ε_{rn} are the rth elements of I_n and ε_n, respectively; θ_r² is the rth element on the diagonal of Θ, and Λ_r denotes row r of Λ. In the multinomial cases, the γ_q are estimated; by convention, γ_0 and γ_Q are fixed to values representing −∞ and +∞, respectively. We assume that Θ is diagonal, which implies that the indicators are not cross-correlated. Given our assumptions, the density f(I_n | z_n; Λ, Θ), which we denote f(I_n) for short, corresponds to:

  f(I_n) = \prod_{r=1}^{R} f(I_{rn})    (11.16)
According to the assumptions of Eq. (11.13), if measurement equation r is continuous, then

  f(I_{rn}) = \frac{1}{\theta_r} \, \varphi\!\left( \frac{I_{rn} - \alpha_r - \Lambda_r z_n}{\theta_r} \right)    (11.17)
where φ denotes the probability density function (pdf) of a standard normal. If measurement equation r corresponds to a binary response, then

  f(I_{rn}) = \Phi\!\left( \frac{\alpha_r + \Lambda_r z_n}{\theta_r} \right)^{I_{rn}} \left[ 1 - \Phi\!\left( \frac{\alpha_r + \Lambda_r z_n}{\theta_r} \right) \right]^{(1 - I_{rn})}    (11.18)
where Φ denotes the cumulative distribution function (cdf) of a standard normal. Finally, if measurement equation r corresponds to a multinomial ordered response, then

  f(I_{rn} = q) = \Phi\!\left( \frac{\gamma_q - \Lambda_r z_n}{\theta_r} \right) - \Phi\!\left( \frac{\gamma_{q-1} - \Lambda_r z_n}{\theta_r} \right)    (11.19)
Additionally, g(z_n | w_n; B, Π, Ψ) corresponds simply to the multivariate normal distribution MVN\!\left( (I_L - \Pi)^{-1} B w_n, \ (I_L - \Pi)^{-1} \Psi \left[ (I_L - \Pi)^{-1} \right]' \right).
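Putting eqs. (11.16)–(11.19) together, the measurement density is a product of per-indicator terms. The Python sketch below (using NumPy and SciPy) shows one way to evaluate it for a mix of indicator types; the function signature, the `kinds` encoding, the threshold layout, and all numerical values are assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import norm

def indicator_density(I, z, alpha, Lam, theta, kinds, cuts=None):
    """Evaluate f(I_n | z_n) as the product (11.16) of per-indicator
    densities, assuming a diagonal Theta (independent indicators).
    kinds[r] is 'cont', 'bin', or 'ord'; alpha, Lam (R x L) and theta
    follow eqs. (11.13)-(11.15); cuts[r] holds the ordered thresholds
    gamma, with cuts[r][0] = -inf and cuts[r][-1] = +inf."""
    f = 1.0
    for r, kind in enumerate(kinds):
        m = alpha[r] + Lam[r] @ z                # conditional mean of I*_rn
        if kind == 'cont':                       # eq. (11.17)
            f *= norm.pdf((I[r] - m) / theta[r]) / theta[r]
        elif kind == 'bin':                      # eq. (11.18)
            p = norm.cdf(m / theta[r])
            f *= p if I[r] == 1 else 1.0 - p
        else:                                    # eq. (11.19)
            g = cuts[r]
            q = int(I[r])
            f *= norm.cdf((g[q] - m) / theta[r]) - norm.cdf((g[q - 1] - m) / theta[r])
    return f

# Tiny example: one continuous and one binary indicator of a single latent variable
z = np.array([0.5])
alpha = np.array([0.0, 0.0])    # constant fixed to 0 for the binary indicator
Lam = np.array([[1.2], [0.8]])
theta = np.array([0.6, 1.0])    # variance fixed to 1 for the binary indicator
print(indicator_density(np.array([0.9, 1]), z, alpha, Lam, theta, ['cont', 'bin']))
```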
11.3.2. HCM Simulated Maximum Likelihood Solution

Now that we have described each component of the joint probability in Eq. (11.8), we can write the likelihood function as:

  \ell(\delta) = \prod_{n=1}^{N} \prod_{i \in C_n} P_n(i, I \mid X_n, w_n; \delta)^{y_{in}}    (11.20)
which leads to the following maximum log-likelihood problem:

  \max_{\delta} \ \ln \ell(\delta) = \sum_{n=1}^{N} \sum_{i \in C_n} y_{in} \ln P_n(i, I \mid X_n, w_n; \delta)    (11.21)
The evaluation of the joint probability P_n(i, I | X_n, w_n; δ) is required to find the solution \hat{\delta} = \arg\max \ln \ell(\delta) of problem (11.21). The number of latent variables has an impact on the computation of this probability: for the case of an MMNL kernel, Eq. (11.11) implies the computation of an integral of dimension F + L. In a moderate-size model with, say, F = 5 factors and L = 4 latent variables, this integral is of dimension 9. Clearly, simulation is required. In practice, with a large number of latent variables (more than 3), we replace the multidimensional integral with a smooth simulator that has good properties. Taking advantage of the expectation form of Eq. (11.11), we can replace the probability with the following empirical mean:

  \tilde{P}_n(i, I \mid X_n, w_n; \delta) = \frac{1}{S} \sum_{s=1}^{S} \frac{\exp(X_{in}\beta + \Gamma_i z_n^s + P_i T \xi_n^s)}{\sum_{j \in C_n} \exp(X_{jn}\beta + \Gamma_j z_n^s + P_j T \xi_n^s)} \, f(I_n \mid z_n^s; \Lambda, \Theta)    (11.22)
where z_n^s corresponds to a random draw s from the g(z_n | w_n; B, Π, Ψ) distribution, and ξ_n^s is a random draw s taken from the distribution of ξ. The sum is computed over S draws. This simulator is known to be unbiased, consistent (as S → ∞), and smooth with respect to the unknown parameters. Replacing P_n(i, I | X_n, w_n; δ) with \tilde{P}_n(i, I | X_n, w_n; δ) in the log-likelihood leads to an MSL solution. We therefore consider the following objective function (often called the sample average approximation): \sum_{n=1}^{N} \sum_{i \in C_n} y_{in} \ln \tilde{P}_n(i, I \mid X_n, w_n; \delta).

In the past few years, a lot of progress has been made regarding MSL estimation. Train (2003) gives an in-depth analysis of the properties of MSL estimators. Recent results, based mainly on the analysis of MMNL models and mostly attributable to Bhat (2001), suggest the use of Halton draws. Halton-type sequences are known to produce simulators with a given level of accuracy using fewer draws than conventional uniform random draws (Ben-Akiva et al., 2002; Munizaga & Alvarez-Daziano, 2005). Currently, our HCM estimation software (Bolduc & Giroux, 2005) makes use of both Halton sequences and standard pseudorandom numbers.
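As a rough illustration of the simulator in (11.22), the following Python sketch averages the MMNL kernel times the measurement density over simulated draws. The function names, the callable-based interface, and all numerical values are assumptions for this sketch; a production implementation would use Halton draws and vectorize over observations.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulated_joint_prob(i, X, beta, Gamma, P, T, draw_z, draw_xi,
                         meas_density, I_n, S=500):
    """Smooth simulator of eq. (11.22): average over S draws of the
    latent variables z and the factors xi of MNL(i | z, xi) * f(I_n | z).
    draw_z() and draw_xi() return one draw from g(z | w_n; ...) and
    N(0, I_F) respectively; meas_density(I_n, z) evaluates f(I_n | z)."""
    acc = 0.0
    for _ in range(S):
        z, xi = draw_z(), draw_xi()
        V = X @ beta + Gamma @ z + P @ (T @ xi)   # utilities as in eq. (11.12)
        e = np.exp(V - V.max())
        acc += (e[i] / e.sum()) * meas_density(I_n, z)
    return acc / S

# Illustrative usage: two alternatives, one latent variable, one factor
X = np.array([[1.0, 0.5], [0.8, 1.0]]); beta = np.array([-0.3, 0.4])
Gamma = np.array([[0.6], [0.0]]); P = np.array([[1.0], [0.0]]); T = np.array([[0.5]])
p = simulated_joint_prob(0, X, beta, Gamma, P, T,
                         draw_z=lambda: rng.normal(0.2, 1.0, size=1),
                         draw_xi=lambda: rng.normal(size=1),
                         meas_density=lambda I, z: 1.0,   # stand-in measurement model
                         I_n=None)
print(p)
```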
Simulated maximum likelihood (SML) is now well known and has been applied in numerous circumstances. The logit probability kernel present in Eq. (11.11) makes the simulated log-likelihood fairly well behaved. Asymptotically, meaning as S → ∞ and as N → ∞, the solution becomes identical to the solution arising from maximizing the actual log-likelihood function \sum_{n=1}^{N} \sum_{i \in C_n} y_{in} \ln P_n(i, I \mid X_n, w_n; \delta).
11.3.3. Identification Discussion

While identification issues are now well understood in the context of traditional discrete choice models (Ben-Akiva & Lerman, 1985; Train, 2003), general necessary and sufficient conditions for identification of HCMs have not yet been developed. A sufficient but not necessary technique for HCM identification is a two-step approach, in which we apply separate conditional identification rules for the choice model and the latent variable model (Walker & Ben-Akiva, 2002).

On the one hand, in discrete choice models what matters are differences between utilities, not their absolute levels. As a general framework, we therefore have an order condition that establishes a limit on the total number of nuisance parameters that can be estimated. This bound, a necessary condition for identification, equals the number of potentially different cells in the deviated covariance matrix. The next step is to examine the rank condition, which is more restrictive than the order condition and is a sufficient condition for identification. It states that the number of nuisance parameters that can be estimated is generally equal to the rank of the Jacobian matrix of the vector containing the distinct elements of the deviated covariance matrix, minus one term that sets the scale. Dansie (1985), Bunch (1991), and Bolduc (1992) study the identification conditions for multinomial probit models, which are equivalent to those for the MMNL model, discussed specifically in Walker (2002) and Ben-Akiva, Bolduc, and Walker (2001).

On the other hand, for identification of the parameters of the measurement equation of the latent variable model [Eq. (11.5)], the constant terms α_r must be set to 0 in the noncontinuous cases. Additionally, except in the continuous case, the variances θ_r cannot be estimated; they must be fixed to 1.
11.4. Classical Estimation: Real Data Application

In this section we study empirically the application of the HCM classical estimation method discussed in the previous section. This method allows us to estimate jointly the parameters of a discrete choice model and a latent variable model. Using real data on virtual personal vehicle choices made by Canadian consumers, we want to verify that hybrid choice is genuinely capable of adapting to practical situations.
11.4.1. Personal Vehicle Choice Data

We use data from a survey conducted by the EMRG (Energy and Materials Research Group, Simon Fraser University) of virtual personal vehicle choices made by Canadian consumers when faced with technological innovations. Horne (2003) provides all the details regarding the survey, including the design of the questionnaire, the process of conducting the survey, and analysis of the collected data. Survey participants were first contacted in a telephone interview for personalizing a detailed questionnaire that was then mailed to them. The mailed questionnaire had five parts:

Part 1: Transportation options, requirements, and habits;
Part 2: Virtual personal vehicle choices made when faced with technological innovations (stated preferences experiment);
Part 3: Transportation mode preferences;
Part 4: Views on transportation issues; and
Part 5: Additional information (gender, education, income).

The SP hypothetical personal vehicle choices in Part 2 of the questionnaire considered four vehicle types: standard gasoline vehicle (SGV, operating on gasoline or diesel), alternative fuel vehicle (AFV, natural-gas vehicle), hybrid vehicle (HEV, gasoline–electric vehicle), and hydrogen fuel cell vehicle (HFC). For each of these vehicle types, the attributes were defined as capital cost (purchase price), operating cost (fuel costs), fuel availability (percentage of stations selling the proper fuel), express lane access (whether or not the vehicle would be granted express lane access), emissions (emissions compared to an SGV), and power (power compared to the current personal vehicle). Each participant was asked to make up to four consecutive virtual choices, with the vehicle attribute values modified after each round according to the SP experimental design. The sample has 866 completed surveys (out of 1150 individuals contacted, a 75% response rate), where each respondent provided up to four personal vehicle choices. After a cleanup in which we kept the individuals who answered the whole attitudinal rating exercise, 1877 usable observations remain for HCM estimation. The SP design is described in Horne, Jaccard, and Tiedman (2005), including the various values assumed by the characteristics of the vehicles that were used as a basis for developing the experimental design.

The first step in building an HCM is to set the latent variables involved. We conduct our analysis focusing on three relevant questions of the survey:
Transport Policies Support (TPS): Evaluation of eight different policies (see Table 11.1 for the definition of each policy), or government actions that would influence the transportation system, according to degree of support: five levels from Strongly Opposed to Strongly Supportive.

Transport Problems Evaluation (TPE): Evaluation of six different factors related to transportation problems (see Table 11.1 for the definition of each problem) according to degree of seriousness: five levels from Not a Problem to Major Problem.
Table 11.1: Indicators and latent variables.

Indicator   Description (evaluated issue)          EC    ACF

Transport Policies Support (TPS)
TPS1        Expanding and Upgrading Roads          ✓
TPS2        Road Tolls and Gas Taxes               ✓     ✓
TPS3        Bike Lanes and Speed Controls          ✓
TPS4        Emissions Testing and Standards        ✓     ✓
TPS5        HOV and Transit Priorities             ✓
TPS6        Improving Transit Service              ✓
TPS7        Promoting Compact Communities          ✓
TPS8        Encouraging Short Work Weeks           ✓

Transport Problems Evaluation (TPE)
TPE1        Traffic Congestion                     ✓
TPE2        Traffic Noise                          ✓
TPE3        Poor Local Air Quality                 ✓     ✓
TPE4        Accidents Caused by Bad Drivers        ✓
TPE5        High Greenhouse Gas Emissions          ✓
TPE6        Speeding Drivers in Neighborhoods      ✓

Car Attributes Importance (CAI)
CAI1        Purchase Price                               ✓
CAI2        Fuel Economy                                 ✓
CAI3        Horsepower                                   ✓
CAI4        Safety                                       ✓
CAI5        Seating Capacity                             ✓
CAI6        Reliability                                  ✓
CAI7        Styling                                      ✓

EC, environmental concern; ACF, appreciation of new car features; HOV, high occupancy vehicle.
Car Attributes Importance (CAI): Evaluation of seven different factors or car attributes (see Table 11.1 for the definition of each attribute) that influenced the family's decision to purchase the current personal vehicle, according to degree of importance: five levels from Not at All Important to Very Important.

Considering the answers to these questions as indicators, we identify two latent variables:

Environmental Concern (EC): Related to transportation and its environmental impact.

Appreciation of New Car Features (ACF): Related to car purchase decisions and how important the characteristics of a new alternative are.

The full list of indicators for each question [Transport Policies Support (TPS), Transport Problems Evaluation (TPE), and Car Attributes Importance (CAI)] and
their relations with the transport-related latent variables [environmental concern (EC) and appreciation of new car features (ACF)] are provided in Table 11.1.

Our model also includes a third latent variable, a latent income variable (REV, revenue), to account for the measurement error problem in quantifying the income variable. Note that modeling measurement errors as latent variables, like other practical situations such as self-selection, is a case where the HCM framework naturally fits to solve the bias problems that would otherwise arise. The hybrid model setting that we consider is represented in Figure 11.2, where the complete set of structural and measurement equations is sketched, depicting the relationships between explanatory variables and each partial model. Indeed, we can distinguish the choice model, which is centered on the utility function specification; the latent variable structural model, which links the latent variables with the characteristics of the traveler; and the latent variable measurement model, which links each latent variable with the indicators.

Figure 11.2: Hybrid choice model (graphical representation of the application).

We will now present the results of the HCM classical estimation process. We implemented the SML solution presented in Section 11.3.2 in custom-coded software written in Fortran (Bolduc & Giroux, 2005). Although the estimation process implies that all the equations are calibrated simultaneously, we present the results separately for each HCM submodel, that is, the car choice model, the latent variable structural model, and the latent variable measurement model.

11.4.2. Car Choice Model

The set of equations for the car choice model alone is given by:

$$U_{SGV_n} = V_{SGV_n} + \Gamma_{1,2}\, ACF_n + u_{SGV_n} \quad (11.23)$$

$$U_{AFV_n} = V_{AFV_n} + \Gamma_{2,1}\, EC_n + \Gamma_{2,2}\, ACF_n + u_{AFV_n} \quad (11.24)$$

$$U_{HEV_n} = V_{HEV_n} + \Gamma_{3,1}\, EC_n + \Gamma_{3,2}\, ACF_n + u_{HEV_n} \quad (11.25)$$

$$U_{HFC_n} = V_{HFC_n} + \Gamma_{4,1}\, EC_n + \Gamma_{4,2}\, ACF_n + \Gamma_{4,3}\, REV_n + u_{HFC_n} \quad (11.26)$$
where $V_{in} = X_{in}\beta$ denotes the deterministic part of the utility expression for alternative $i$ and individual $n$. The deterministic utility contains the experimental attributes capital cost, operating cost, fuel availability, express lane access, and power, as well as alternative specific constants for the AFV, the HEV, and the HFC. The utility specification also contains the effect of the latent variables. The latent variable related to EC was not considered for the SGV. On the basis of several attempts, REV was included only in the HFC alternative (Table 11.2).

Common parameters with the standard MNL model have the same sign and magnitude, except for the alternative specific constants. Although the rho-square of the hybrid model is only slightly better than the MNL rho-square, the significance of the
Table 11.2: Car choice model results.

                                   Hybrid choice model      MNL
                                   Estimates   t-Stat       Estimates   t-Stat

ASC_AFV                            6.626       5.162        4.500       6.81
ASC_HEV                            4.383       5.688        1.380       2.18
ASC_HFC                            6.403       8.750        2.100       3.26
Capital Cost                       0.943       4.369        0.856       4.07
Operating Cost                     0.849       3.917        0.826       4.18
Fuel Available                     1.384       7.096        1.360       7.32
Express Lane Access                0.162       2.229        0.156       2.29
Power                              2.710       3.985        2.700       4.12

Latent Variables
ACF on SGV                         3.160       25.984       –           –
EC on AFV                          0.798       2.695        –           –
ACF on AFV                         2.810       30.708       –           –
EC on HEV                          0.770       3.965        –           –
ACF on HEV                         2.810       30.708       –           –
EC on HFC                          1.085       5.620        –           –
ACF on HFC                         3.054       31.048       –           –
REV on HFC                         0.456       3.373        –           –

Number of Individuals              1877                     1877
HCM Final Global Function          57624.26                 –
MNL Log Likelihood                 –                        1984.55
Choice Model Adjusted Rho-square   0.236                    0.234
Number of Halton Draws             500                      –

MNL, multinomial logit; ASC, alternative specific constant; AFV, alternative fuel vehicle; HEV, hybrid vehicle; HFC, hydrogen fuel cell vehicle; SGV, standard gasoline vehicle; REV, revenue (income); HCM, hybrid choice model.
latent variable parameters indicates that these variables have a relevant effect on the individual utilities. The latent variables all enter very significantly and positively into the choice model specification. EC encourages the choice of green technologies: EC has the highest effect for the HFC, followed by the AFV and then the HEV. Note that the HFC represents the cleanest engine technology. At the same time, all vehicles show a positive effect of ACF on the choice probabilities.
11.4.3. Structural Model

For each of the three latent variables (EC, ACF, and REV), we assume a linear structural regression equation whose estimation results are shown in Table 11.3. The structural equation links consumer characteristics with the latent variables.
Table 11.3: Structural model results.

                                  EC                  ACF                 REV
                                  Estimates  t-Stat   Estimates  t-Stat   Estimates  t-Stat

Intercept                         2.434      9.660    3.094      18.495   1.293      7.374
Driving Alone User                −0.020     0.428    0.118      2.244    –          –
Car Pool User                     0.097      1.135    0.100      0.938    –          –
Transit User                      0.204      2.572    –          –        –          –
Female Dummy                      0.258      7.392    0.283      6.474    –          –
High Income Dummy (>$80,000)      0.011      0.294    0.001      0.017    –          –
Education: University             0.064      1.712    0.008      0.170    0.598      9.522
Age level: 26–40 years            0.187      2.279    0.328      3.961    0.592      5.363
Age level: 41–55 years            0.262      3.105    0.621      6.657    0.889      7.952
Age level: 56 years and more      0.332      3.702    0.525      5.278    0.228      1.840
EC, environmental concern; ACF, appreciation of new car features; REV, revenue (income).
Table 11.4: Latent variables variances.

            Estimates   t-Stat
Var(EC)     0.553       20.934
Var(ACF)    0.708       23.410
Var(REV)    0.449       3.336
For example, we can conclude that EC is more important for public transportation users than for car pool users; we in fact observe a negative parameter for driving-alone users. We also find that women are more worried about environmental issues than men. A high education level has a significant positive effect on REV, while its effect is positive but not significant on both EC and ACF. In addition, the effect of age on EC increases as individuals get older. The effect of age on ACF is highest for people between 41 and 55 years old. Not surprisingly, the effect of age on REV is small and not significant for people older than 55 years of age (an effect of people being retired). Note that we also estimate the elements of the covariance matrix. As the results show (see Table 11.4), the elements on the diagonal of the C matrix are significantly different from 0 and show the presence of heteroscedasticity. In this version, we assumed that the latent variables are uncorrelated.

11.4.4. Measurement Model

Finally, the latent variable measurement model links the latent variables with the indicators. A typical equation for this model has the form:

$$TPS2_n = a_{TPS2} + \lambda_{EC,TPS2}\, EC_n + \lambda_{ACF,TPS2}\, ACF_n + \epsilon_{TPS2_n} \quad (11.27)$$
In this example, we can see that the effects on the second indicator related to the Transport Policies Support (TPS) question are measured using a constant and the latent variables EC and ACF. We have considered 21 indicators, so it is necessary to specify 21 equations. Their relation with the latent variables is depicted in Figure 11.2, and the results are displayed in Table 11.5. As explained before, this model measures the effect of the latent variables on each indicator. Some interesting conclusions can be drawn from the estimated parameters.

Table 11.5: Measurement model results.

                                                        Estimates   t-Stat

Transport Policies Support
EC on Expanding and Upgrading Roads                     −0.392      5.405
EC on Road Tolls and Gas Taxes                          0.581       6.600
ACF on Road Tolls and Gas Taxes                         −0.091      1.389
EC on Bike Lanes and Speed Controls                     0.532       8.507
EC on Reducing Car Emissions                            0.478       7.944
ACF on Reducing Car Emissions                           0.295       7.856
EC on High Occupancy Vehicles and Transit Priorities    0.606       8.143
EC on Improving Transit Service                         0.491       8.352
EC on Promoting Compact Communities                     0.206       2.994
EC on Encouraging Short Work Weeks                      0.396       7.159

Transport Problems Evaluation
EC on Traffic Congestion                                0.735       9.154
EC on Traffic Noise                                     0.901       9.495
EC on Poor Local Air Quality                            1.000       –
ACF on Poor Local Air Quality                           0.061       1.416
EC on Accidents Caused by Bad Drivers                   0.837       14.472
EC on Emissions and Global Warming                      1.113       16.200
EC on Speeding Drivers in Neighborhoods                 1.107       15.790

Car Attributes Importance
ACF on Purchase Price                                   0.004       0.087
ACF on Fuel Economy Importance                          0.259       5.008
ACF on Horsepower Importance                            0.433       7.220
ACF on Safety Importance                                1.000       –
ACF on Seating Capacity Importance                      0.684       11.563
ACF on Reliability Importance                           0.537       16.241
ACF on Styling                                          0.371       6.660

Income Class
REV on Income Class                                     1.00        –

EC, environmental concern; ACF, appreciation of new car features; REV, revenue (income).
For example, the effect of EC on the indicator related to the support of expanding and upgrading roads is negative. This sign reflects the idea that giving priority to the car by raising road capacity is perceived negatively because of its negative impact on the environment. By contrast, the effect of EC on the indicator related to the support of applying road tolls and gas taxes is positive, indicating a perceived positive environmental impact of measures encouraging a rational use of the car. Note also that the effect on the same indicator of ACF, the other transport-related latent variable considered, is negative although not significant. This sign can be explained by the perceived negative impact of this kind of car use restriction, especially for a user who is considering buying a new car. A similar analysis can be carried out for the other indicators. Consider, for example, the positive sign of the effect of both EC and ACF on the support for reducing vehicle emissions through regular testing and manufacturer emissions standards: such a measure is perceived as having a positive environmental impact, and it is also positively perceived by consumers as a good attribute of a potential new car. It is worth noting that such results permit us to establish a consumer profile in a way not possible with standard discrete choice models.

In sum, using real data on virtual personal vehicle choices, we showed that HCM is genuinely capable of adapting to practical situations. Although we obtained similar effects for the alternative attributes in the car choice model for both the hybrid and MNL models, and the fit of the HCM is only marginally better than that of the MNL, the HCM offers a substantial improvement in representing behavior. For hybrid models, the choice model is only a part of the whole behavioral process. HCM combines the direct effect of underlying latent variables on the choice probabilities with the sociodemographic characteristics that now enter the choice probabilities through the latent variables. HCM also takes into account opinions and attitudes through the consumer's responses to attitudinal rating exercises. These responses are taken as indicators of the latent variables. Using simple questions, we can enrich the model and obtain better knowledge about the user's characteristics and his or her behavioral attitudes and perceptions.
11.5. HCM Bayesian Estimation: Analysis of a Simple Case

11.5.1. Introducing HCM Bayesian Estimation

Including numerous attitudes and perceptions in HCMs with large sets of potentially interrelated choices directly entails the simulation of high-dimensional integrals. We can address this problem using classical methods, which rely on an efficient choice probability simulator through MSL estimation, the technical details of which are described in Section 11.3. Although feasible in practice (as shown in Section 11.4), the MSL approach necessary for classical HCM estimation is very demanding in situations with a large choice set of interdependent alternatives and a large number of latent variables.
For these reasons, we propose to go beyond classical methods by introducing hierarchical Bayesian techniques. Building on the rapid development of Markov chain Monte Carlo (MCMC) techniques, and on the idea that Bayesian tools (with appropriate priors) can be used to produce estimators that are asymptotically equivalent to those obtained using classical methods, we define the goal of both theoretically and empirically implementing a Bayesian approach to hybrid choice modeling. In a first attempt to achieve this goal, we develop and implement the HCM sequence of full conditional posterior distributions for a simple case: a trinomial probit for the choice model with one latent variable, one continuous indicator, and known variances. We justify the assumption of a probit kernel by showing that for this kernel the derivation of the HCM Bayesian estimator is straightforward.
11.5.2. Choice Model

Let us consider a trinomial choice for $n = 1, \ldots, N$ individuals, where three alternatives $i = 1, 2, 3$, $\forall n$, are described by two different attributes $X_1$ and $X_2$, plus two alternative specific constants and one latent variable $z^*$, which we discuss later in this chapter. The error terms $u$ are assumed to be independently normally distributed.

11.5.2.1. Structural equation

For each individual $n$, we thus have
$$U_{1n} = \beta_1 X_{11n} + \beta_2 X_{12n} + u_{1n} \quad (11.28)$$

$$U_{2n} = ASC_2 + \beta_1 X_{21n} + \beta_2 X_{22n} + \Gamma_2 z_n^* + u_{2n} \quad (11.29)$$

$$U_{3n} = ASC_3 + \beta_1 X_{31n} + \beta_2 X_{32n} + \Gamma_3 z_n^* + u_{3n} \quad (11.30)$$
In discrete choice models, decisions are based on utility differences, so we can consider a deviated model that, in this particular case, leads us to write the structural equation as two equations:

$$U_{2n} - U_{1n} = ASC_2 + \beta_1 (X_{21n} - X_{11n}) + \beta_2 (X_{22n} - X_{12n}) + \Gamma_2 z_n^* + u_{2n} - u_{1n} \quad (11.31)$$

$$U_{3n} - U_{1n} = ASC_3 + \beta_1 (X_{31n} - X_{11n}) + \beta_2 (X_{32n} - X_{12n}) + \Gamma_3 z_n^* + u_{3n} - u_{1n} \quad (11.32)$$

or, using the difference operator $\Delta_{21} U_n = U_{2n} - U_{1n}$:

$$\Delta_{21} U_n = ASC_2 + \beta_1 \Delta_{21} X_{1n} + \beta_2 \Delta_{21} X_{2n} + \Gamma_2 z_n^* + \Delta_{21} u_n = \Delta_{21} V_n + \Delta_{21} u_n \quad (11.33)$$

$$\Delta_{31} U_n = ASC_3 + \beta_1 \Delta_{31} X_{1n} + \beta_2 \Delta_{31} X_{2n} + \Gamma_3 z_n^* + \Delta_{31} u_n = \Delta_{31} V_n + \Delta_{31} u_n \quad (11.34)$$

which is equivalent to

$$\begin{pmatrix} \Delta_{21} U_n \\ \Delta_{31} U_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \Delta_{21} X_{1n} & \Delta_{21} X_{2n} & z_n^* & 0 \\ 0 & 1 & \Delta_{31} X_{1n} & \Delta_{31} X_{2n} & 0 & z_n^* \end{pmatrix} \begin{pmatrix} ASC_2 \\ ASC_3 \\ \beta_1 \\ \beta_2 \\ \Gamma_2 \\ \Gamma_3 \end{pmatrix} + \begin{pmatrix} \Delta_{21} u_n \\ \Delta_{31} u_n \end{pmatrix} \quad (11.35)$$
In this HCM independent probit kernel, we assume that $u_{in} \sim \text{i.i.d. } N(0, \sigma^2)$, which implies $\Delta u_n \sim N(0, \sigma^2_{\Delta})$. We normalize the model, setting the scale factor equal to 1 by assuming $\sigma^2_{\Delta} = 1$.

11.5.2.2. Measurement equation

$$y_n = \begin{cases} 1 & \text{if } U_{1n} = \max(U_{1n}, U_{2n}, U_{3n}) \Leftrightarrow (\Delta_{21} U_n < 0) \wedge (\Delta_{31} U_n < 0) \\ 2 & \text{if } U_{2n} = \max(U_{1n}, U_{2n}, U_{3n}) \Leftrightarrow (\Delta_{21} U_n \geq 0) \wedge (\Delta_{21} U_n > \Delta_{31} U_n) \\ 3 & \text{if } U_{3n} = \max(U_{1n}, U_{2n}, U_{3n}) \Leftrightarrow (\Delta_{31} U_n \geq 0) \wedge (\Delta_{31} U_n > \Delta_{21} U_n) \end{cases} \quad (11.36)$$
11.5.3. Latent Variable Model

Consider now one latent variable, with one observable continuous indicator.

11.5.3.1. Structural equation

$$z_n^* = b w_n + \zeta_n \quad (11.37)$$

where we consider $\zeta_n \sim N(0, 1)$, $\forall n$.

11.5.3.2. Measurement equation

$$I_n = \lambda z_n^* + \epsilon_n \quad (11.38)$$

with $\epsilon_n \sim N(0, 1)$, $\forall n$.
11.5.4. Gibbs Sampler Implementation

The parameters to estimate in the case we are analyzing are $\theta' = [ASC_2 \; ASC_3 \; \beta_1 \; \beta_2 \; \Gamma_2 \; \Gamma_3]$, $b$, and $\lambda$. Bayesian estimation of these parameters requires making draws from the joint posterior distribution:

$$P(\theta, b, \lambda \mid y, I) \quad (11.39)$$

or, using data augmentation, from:

$$P(\Delta U, z^*, \theta, b, \lambda \mid y, I) \quad (11.40)$$

where $\Delta U = [\Delta_{21} U, \Delta_{31} U]$, and $z^* = (z_1^*, \ldots, z_N^*)'$, $y = (y_1, \ldots, y_N)'$, and $I = (I_1, \ldots, I_N)'$ capture the information for the full group of individuals. Using Gibbs sampling, the estimators are obtained from draws inside an iterative process involving the set of full conditional distributions. Namely, at the $g$th iteration:

$$\Delta U_n^{(g)} \sim p(\Delta U_n \mid z_n^{*(g-1)}, \theta^{(g-1)}, b^{(g-1)}, \lambda^{(g-1)}, y_n, I_n), \;\forall n \quad (11.41)$$

$$z_n^{*(g)} \sim p(z_n^* \mid \Delta U^{(g-1)}, \theta^{(g-1)}, b^{(g-1)}, \lambda^{(g-1)}, y_n, I_n), \;\forall n \quad (11.42)$$

$$\theta^{(g)} \sim p(\theta \mid \Delta U^{(g-1)}, z^{*(g-1)}, b^{(g-1)}, \lambda^{(g-1)}, y, I) \quad (11.43)$$

$$b^{(g)} \sim p(b \mid \Delta U^{(g-1)}, z^{*(g-1)}, \theta^{(g-1)}, \lambda^{(g-1)}, y, I) \quad (11.44)$$

$$\lambda^{(g)} \sim p(\lambda \mid \Delta U^{(g-1)}, z^{*(g-1)}, \theta^{(g-1)}, b^{(g-1)}, y, I) \quad (11.45)$$
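To make the sequence (11.41)–(11.45) concrete, the following is a minimal runnable sketch (in Python rather than the R code used later in the chapter) for a simplified binary version of the model, where a single utility difference turns step (11.41) into a univariate truncated normal draw. The true parameter values, the loose $N(0, 100)$ priors, and all function and variable names are assumptions for illustration; the trinomial case replaces the first step with the bivariate truncation of Eq. (11.46).

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(7)
N, G = 1000, 2000

# Simulate a binary version of the model (one utility difference).
dX = rng.uniform(-1, 1, size=N)
w = rng.binomial(1, 0.5, size=N)
z_true = 0.5 * w + rng.normal(size=N)          # structural equation (11.37)
I = 0.8 * z_true + rng.normal(size=N)          # measurement equation (11.38)
dU_true = 0.2 + 0.5 * dX + 0.6 * z_true + rng.normal(size=N)
y = (dU_true >= 0).astype(int)                 # observed choices

# Gibbs sampler with data augmentation; steps mirror Eqs. (11.41)-(11.45).
theta = np.zeros(3)                            # (ASC, beta, Gamma)
b, lam, z = 0.0, 1.0, np.zeros(N)
keep = []
for g in range(G):
    # (11.41) dU_n | rest: truncated normal, sign fixed by the observed choice.
    mu = theta[0] + theta[1] * dX + theta[2] * z
    lo = np.where(y == 1, -mu, -np.inf)        # dU >= 0  <=>  std normal >= -mu
    hi = np.where(y == 1, np.inf, -mu)         # dU <  0  <=>  std normal <  -mu
    dU = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # (11.42) z*_n | rest: normal, combining the prior b*w with both likelihoods.
    prec = 1.0 + lam ** 2 + theta[2] ** 2
    mean = (b * w + lam * I + theta[2] * (dU - theta[0] - theta[1] * dX)) / prec
    z = mean + rng.normal(size=N) / np.sqrt(prec)
    # (11.43)-(11.45): conjugate normal regressions with unit error variance.
    Xmat = np.column_stack([np.ones(N), dX, z])
    V = np.linalg.inv(np.eye(3) / 100 + Xmat.T @ Xmat)     # N(0, 100 I) prior
    theta = rng.multivariate_normal(V @ (Xmat.T @ dU), V)
    vb = 1.0 / (1.0 / 100 + w @ w)
    b = vb * (w @ z) + np.sqrt(vb) * rng.normal()
    vl = 1.0 / (1.0 / 100 + z @ z)
    lam = vl * (z @ I) + np.sqrt(vl) * rng.normal()
    if g >= 500:                               # discard a burn-in period
        keep.append(np.concatenate([theta, [b, lam]]))

# Posterior means should lie roughly near (0.2, 0.5, 0.6, 0.5, 0.8).
print(np.mean(keep, axis=0))
```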
First, note that $\Delta_{21} U_n \sim N(\Delta_{21} V_n, 1)$ and that $\Delta_{31} U_n \sim N(\Delta_{31} V_n, 1)$. But we also know that $(\Delta_{21} U_n < 0) \wedge (\Delta_{31} U_n < 0)$ when $y_n$ equals 1, that $(\Delta_{21} U_n \geq 0) \wedge (\Delta_{21} U_n > \Delta_{31} U_n)$ when $y_n$ equals 2, and that $(\Delta_{31} U_n \geq 0) \wedge (\Delta_{31} U_n > \Delta_{21} U_n)$ when $y_n$ equals 3. This implies that, conditional on $y_n$, $\Delta U_n$ follows a truncated multivariate normal (TMVN) distribution, where the truncation region $\Re$ is defined by $y_n$:

$$p(\Delta U_n \mid z_n^*, \theta, b, \lambda, y_n, I_n) \sim \mathrm{TMVN}_{\Re \mid y_n}\left( \begin{pmatrix} \Delta_{21} V_n \\ \Delta_{31} V_n \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right), \;\forall n \quad (11.46)$$
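One simple way to draw from this truncated bivariate normal, given the identity covariance, is rejection sampling; the sketch below is an illustrative device under assumed values rather than the authors' implementation, and when the acceptance region has low probability a scheme of univariate conditional (Gibbs-within-Gibbs) truncated normal draws is usually preferred.

```python
import numpy as np

def draw_dU(dV, y, rng, max_tries=10_000):
    """Draw (dU21, dU31) ~ N(dV, I2) restricted to the region R implied by
    the observed choice y, as in Eq. (11.46):
      y = 1: dU21 <  0 and dU31 <  0
      y = 2: dU21 >= 0 and dU21 >  dU31
      y = 3: dU31 >= 0 and dU31 >  dU21
    """
    for _ in range(max_tries):
        d21, d31 = rng.normal(dV, 1.0)         # unconstrained bivariate draw
        if y == 1 and d21 < 0 and d31 < 0:
            return d21, d31
        if y == 2 and d21 >= 0 and d21 > d31:
            return d21, d31
        if y == 3 and d31 >= 0 and d31 > d21:
            return d21, d31
    raise RuntimeError("acceptance region too unlikely; use conditional draws")

rng = np.random.default_rng(0)
print(draw_dU(np.array([0.2, -0.1]), y=2, rng=rng))
```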
In addition, since the latent variable $z_n^*$ is not observable, we need to incorporate the information provided by the indicator $I_n$ on $z_n^*$. This information is explicitly given by the conditional probability $p(z_n^* \mid I_n)$, whose expression depends on the assumptions we make. If we assume an independent bivariate normal distribution for the error terms of the latent variable model, then we have

$$\begin{pmatrix} z_n^* \\ I_n \end{pmatrix} \sim N\left( \begin{pmatrix} b w_n \\ \lambda b w_n \end{pmatrix}, \begin{pmatrix} 1 & \lambda \\ \lambda & 1 + \lambda^2 \end{pmatrix} \right), \;\forall n \quad (11.47)$$

which implies

$$p(z_n^* \mid \Delta U, \theta, b, \lambda, y_n, I_n) \sim N(\mu_{z_n^* \mid I_n}, \sigma^2_{z_n^* \mid I_n}), \;\forall n \quad (11.48)$$
where

$$\mu_{z_n^* \mid I_n} = b w_n + \frac{\lambda}{1 + \lambda^2} (I_n - \lambda b w_n) \quad (11.49)$$

$$\sigma^2_{z_n^* \mid I_n} = \frac{1}{1 + \lambda^2} \quad (11.50)$$
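These expressions follow from the standard conditional-normal formula applied to the joint distribution (11.47):

$$\mu_{z_n^* \mid I_n} = E[z_n^*] + \frac{\mathrm{Cov}(z_n^*, I_n)}{\mathrm{Var}(I_n)}\left(I_n - E[I_n]\right) = b w_n + \frac{\lambda}{1 + \lambda^2}\,(I_n - \lambda b w_n)$$

$$\sigma^2_{z^* \mid I} = \mathrm{Var}(z_n^*) - \frac{\mathrm{Cov}(z_n^*, I_n)^2}{\mathrm{Var}(I_n)} = 1 - \frac{\lambda^2}{1 + \lambda^2} = \frac{1}{1 + \lambda^2}$$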
Note that the latter expression is independent of the individual $n$, so we can write $\sigma^2_{z^* \mid I}$. When using data augmentation, the latent variables become ''observable,'' as draws of both $\Delta U$ and $z^*$ are made. This implies that the rest of the conditional distributions simply correspond to ordinary Bayesian regressions:

$$p(\theta \mid \Delta U, z^*, b, \lambda, y, I) \sim N(\bar\theta, \bar V_\theta) \quad (11.51)$$

$$p(b \mid \Delta U, z^*, \theta, \lambda, y, I) \sim N(\bar b, \bar V_b) \quad (11.52)$$

$$p(\lambda \mid \Delta U, z^*, \theta, b, y, I) \sim N(\bar\lambda, \bar V_\lambda) \quad (11.53)$$
V y Þ, pðbÞ Nðb; V b Þ, If prior beliefs for y, b, and l are described by pðyÞ Nðy; V l Þ respectively, then we can show that and pðlÞ Nðl; 1 1 V b ¼ ðV b þ w0 wÞ1 ; b ¼ V b ðV b þ w0 z Þ
(11.54)
1 1 V l ¼ ðV l þ z0 zÞ1 ; l ¼ V l ðV l þ z0 I Þ
(11.55)
0 ~ 1 ; y ¼ V y ðV y 1 þ X~ 0 DUÞ V y ¼ ðV y1 þ X~ XÞ
(11.56)
where $\tilde X$ is a matrix built by stacking the matrices

$$\tilde X_n = \begin{pmatrix} 1 & 0 & \Delta_{21} X_{1n} & \Delta_{21} X_{2n} & z_n^* & 0 \\ 0 & 1 & \Delta_{31} X_{1n} & \Delta_{31} X_{2n} & 0 & z_n^* \end{pmatrix}, \;\forall n$$
Note that to have an unbiased estimator of $\theta$, the unconditional $z_n^*$ must enter $\tilde X_n$.
11.5.5. Simulation Experiment

11.5.5.1. Choice model

We considered a situation with three alternatives $i = \{1, 2, 3\}$ described by two attributes $X_1$ and $X_2$, for $N = 50{,}000$ individuals. The attributes were built by taking random draws from uniform distributions according to Table 11.6. The systematic utility functions were then built assuming a linear specification in the attributes and one alternative specific constant for each of alternatives 2 and 3. Table 11.7 shows the taste parameters that were assumed. These values were fixed to achieve a relatively balanced choice; in other words, they ensure that each alternative receives enough choices to correctly estimate the model. The error terms $u$ were assumed i.i.d. normal and such that $\Delta u_n \sim N(0, 1)$. The scale factor considered was then equal to 1.0 and, in this particular case, it also ensures that the choice process is neither completely deterministic nor completely random. In fact, we checked that, with the values shown in Table 11.7, around 25% of the simulated individuals changed their choices because of the random term. Results from previous experiments suggest this is a good trade-off for replicating a standard discrete choice model.

11.5.5.2. Latent variable model

To simulate the latent component of the model, we built one binary individual characteristic $w_n$ following a Bernoulli distribution with $p = 0.5$. The associated parameter, as we can see in Eq. (11.37), corresponds to $b$, whose assumed value is reported in Table 11.8. Using these values it was possible to build one latent variable (observed in this simulation setting) that was added into the utility functions of alternatives 2 and 3 using $\Gamma_2$ and $\Gamma_3$ as weights. Note also that we assumed $\zeta_n \sim N(0, 1)$ and $\epsilon_n \sim N(0, 1)$, $\forall n$.
Table 11.6: Attributes of experimental design.

        ~U[Min, Max]
        Min     Max
X1      2.0     5.0
X2      5.0     15.0
Table 11.7: Taste parameters for the choice model.

Taste parameters
ASC2    0.20
ASC3    0.40
β1      0.05
β2      0.10
Γ2      0.50
Γ3      0.60
Table 11.8: Latent variable model parameters.

Taste parameters
b       0.50
λ       0.80
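Under the settings of Tables 11.6–11.8, the simulated database can be generated along the following lines; this is a hedged sketch in Python (the chapter's estimation used R), and the seed, as well as the i.i.d. error scale of $\sqrt{0.5}$ chosen so that each $\Delta u_n \sim N(0, 1)$ marginally, are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 50_000

# Attributes (Table 11.6) and assumed true parameters (Tables 11.7 and 11.8).
X1 = rng.uniform(2.0, 5.0, size=(N, 3))
X2 = rng.uniform(5.0, 15.0, size=(N, 3))
ASC = np.array([0.0, 0.20, 0.40])
b1, b2 = 0.05, 0.10
Gamma = np.array([0.0, 0.50, 0.60])
b_lat, lam = 0.50, 0.80

# Latent variable model: z* = b*w + zeta and indicator I = lambda*z* + eps.
w = rng.binomial(1, 0.5, size=N)
z_star = b_lat * w + rng.normal(size=N)
I = lam * z_star + rng.normal(size=N)

# Utilities with i.i.d. N(0, 0.5) errors, so each utility difference is N(0, 1).
V = ASC + b1 * X1 + b2 * X2 + Gamma * z_star[:, None]
U = V + rng.normal(scale=np.sqrt(0.5), size=(N, 3))
y = U.argmax(axis=1) + 1                       # chosen alternative in {1, 2, 3}
print(np.bincount(y)[1:] / N)                  # roughly balanced market shares
```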
Table 11.9: Starting values.

δ           Value
ASC2        0.00
ASC3        0.00
β1          0.00
β2          0.00
Γ2          0.00
Γ3          0.00
b           1.00
λ           1.00
Δ21U        0.00
Δ31U        0.00
z*          0.00
Table 11.10: Estimation results using 1000 replications, dropping the first 200 draws as a burn-in period.

δ       Target      δ̂          SE         t Test     t Target
ASC2    0.20        0.227       0.064      3.54       0.42
ASC3    0.40        0.396       0.063      6.30       0.07
β1      0.05        0.052       0.031      1.68       0.07
β2      0.10        0.099       0.009      10.80      0.12
Γ2      0.50        0.532       0.080      6.62       0.39
Γ3      0.60        0.655       0.080      8.19       0.69
b       0.50        0.497       0.083      5.98       0.03
λ       0.80        0.794       0.052      15.32      0.12
11.5.5.3. Estimated parameters

We implemented the iterative Gibbs sampling routine presented earlier using the R software. Starting values are presented in Table 11.9. To construct the results reported in Table 11.10, we considered a randomly selected subsample of 1000 individuals from the original database of $N = 50{,}000$ individuals. The estimated parameters correspond to the means of the results over 15 repetitions of the subsampling process.
We observe not only that the estimates are significant (with the exception of $\beta_1$, whose t-statistic is 1.68), but also that they closely replicate the target values, which correspond to those used to build our simulated database. Specifically, the t-target value is calculated to test the null hypothesis that each parameter estimate is equal to its target value.
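In other words, $t_{\text{target}} = (\hat\delta - \delta_{\text{target}})/SE(\hat\delta)$; for instance, for $\Gamma_3$, $(0.655 - 0.60)/0.080 \approx 0.69$, the value reported in the last column of Table 11.10. Since none of these statistics exceeds conventional critical values, the hypothesis that the sampler recovers the data-generating parameters is never rejected.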
11.5.6. Probit Kernel vs. MMNL Kernel

When we want to incorporate flexibility into a standard choice model, we have two options: probit and MMNL. In the simple case studied in this section, we developed the HCM Gibbs sampler assuming a probit kernel. When using classical SML techniques, the intricate multinomial probit estimation process reduces the practicability of the standard probit model. In fact, MMNL SML estimation outperforms probit SML estimation because of the good statistical properties that can be derived for the MMNL estimator (Munizaga & Alvarez-Daziano, 2005). This explains the choice of an MMNL kernel in the SML solution for HCMs presented in Section 11.3. However, working with Bayesian methods breaks down the complexity of classical probit estimation (Bolduc, Fortin, & Gordon, 1997). As we show in our simulation experiment, HCM Bayesian estimation with a probit kernel is also straightforward. The properties of the normal distribution allow us to exploit data augmentation techniques, basically because $\Delta U$ follows a normal distribution (Albert & Chib, 1993; McCulloch, Polson, & Rossi, 2000). In fact, it is trivial to extend the Gibbs sampler we developed to a normal error component model, such as normally distributed random taste variations, with a probit kernel for the choice.

On the other hand, when modeling a normal error component model with an MNL kernel, which results in an MMNL model, we no longer have the advantageous properties that make implementation of the probit-kernel Gibbs sampling easy. The Bayesian procedure for standard MMNL estimation, without an associated structure of latent variables, is described in Train (2003). Since the MMNL distribution of $\Delta U$ is hard to describe, MMNL Bayesian estimation uses the Metropolis–Hastings method. Plugging this Metropolis–Hastings procedure into our iterative process of conditional draws, we are able to estimate the whole set of HCM parameters ($\delta$). The choice model parameters, that is, $\theta$, are obtained using the Metropolis–Hastings step. Note that data augmentation is then no longer available, which explains the loss of the modeling simplicity we found for the probit kernel case and exploited in our simulation experiment. As for $b$ and $\lambda$, note that Eqs. (11.54) and (11.55) are still valid.
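For illustration, a random-walk Metropolis–Hastings update for the choice-model parameters could look as follows, conditional on covariates into which the current draws of the latent variables have already been stacked; the proposal scale, the implicit flat prior, and the toy data are assumptions, not the authors' procedure:

```python
import numpy as np

def mnl_loglik(theta, X, y):
    """Log likelihood of the MNL kernel; X is (N, J, K), y is in {0,...,J-1}."""
    V = X @ theta                               # (N, J) systematic utilities
    V = V - V.max(axis=1, keepdims=True)        # numerical stability
    logp = V - np.log(np.exp(V).sum(axis=1, keepdims=True))
    return logp[np.arange(len(y)), y].sum()

def mh_step(theta, X, y, rng, step=0.05):
    """One random-walk Metropolis-Hastings update under a flat prior,
    replacing the conjugate draw that a probit kernel would allow."""
    proposal = theta + step * rng.normal(size=theta.shape)
    log_ratio = mnl_loglik(proposal, X, y) - mnl_loglik(theta, X, y)
    return proposal if np.log(rng.uniform()) < log_ratio else theta

# Toy usage with simulated MNL data.
rng = np.random.default_rng(3)
N, J, K = 2000, 3, 2
X = rng.normal(size=(N, J, K))
true = np.array([0.5, -0.8])
p = np.exp(X @ true)
p /= p.sum(axis=1, keepdims=True)
y = (p.cumsum(axis=1) > rng.uniform(size=(N, 1))).argmax(axis=1)
theta = np.zeros(K)
for _ in range(2000):
    theta = mh_step(theta, X, y, rng)
print(theta)                                    # wanders near the posterior mode
```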
11.6. Conclusions

In the last decade, discrete choice modeling has evolved toward an explicit incorporation of psychological factors affecting decision making. Traditionally,
discrete choice models represented the decision process as an obscure black box for which attributes of alternatives and characteristics of individuals were inputs and where the observed choice made by the individual corresponded to the output of the system. The new trend in discrete choice modeling is to enhance the behavioral representation of the choice process. As a direct result, the model specification is improved and the model gains in predictive power. Econometrically, the improved representation, called the hybrid choice model, involves dealing with a choice model formulation that contains latent psychometric variables among the set of explanatory variables. Since perceptions and attitudes are now incorporated, this leads to more realistic models.

In this paper we have described the HCM, composed of a group of structural equations describing the (potentially interrelated) latent variables in terms of observable exogenous variables, and a group of measurement relationships linking the latent variables to certain observable (continuous or discrete) indicators. We have shown that although HCM estimation requires the evaluation of complex multidimensional integrals, both SML and Bayesian methods can be successfully implemented and offer unbiased, consistent, and smooth estimators of the true probabilities. Additionally, by analyzing a simple case, we showed that working with Bayesian methods can break down the complexity of classical estimation.

Using real data about virtual personal vehicle choices made by Canadian consumers when faced with technological innovations, we showed that HCM is genuinely capable of adapting to practical situations. Indeed, the results provide a better description of the profile of consumers and their adoption of new private transportation technologies. We identified two travel-related dimensions with a significant impact: EC and ACF. We also successfully included in the choice model a latent income variable (REV) to account for the measurement errors associated with the reported income classes.

Further research is needed to generalize the HCM Gibbs sampler we developed in our study. By testing the general HCM Gibbs sampler, we expect to determine when Bayesian MCMC outperforms MSL, according to empirical results based on correct identification restrictions and accurate predictions.
References

Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.

Ben-Akiva, M. (1973). Structure of passenger travel demand models. Ph.D. dissertation, Department of Civil Engineering, MIT, Cambridge, Massachusetts.

Ben-Akiva, M., & Boccara, B. (1987). Integrated framework for travel behavior analysis. IATBR conference, Aix-en-Provence, France: Pergamon.

Ben-Akiva, M., Bolduc, D., & Park, J. Q. (2008). Discrete choice analysis of shippers' preferences. In: E. Van de Voorde (Ed.), Recent developments in transport modelling: Lessons for the freight sector. Amsterdam: Elsevier.
Ben-Akiva, M., Bolduc, D., & Walker, J. (2001). Specification, identification and estimation of the logit kernel (or continuous mixed logit) model. Working paper, MIT.

Ben-Akiva, M., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.

Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Boersch-Supan, A., Brownstone, D., Bunch, D., Daly, A., de Palma, A., Gopinath, D., Karlstrom, A., & Munizaga, M. A. (2002). Hybrid choice models: Progress and challenges. Marketing Letters, 13(3), 163–175.

Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35, 677–693.

Bolduc, D. (1992). Generalized autoregressive errors in the multinomial probit model. Transportation Research Part B, 26, 155–170.

Bolduc, D., & Ben-Akiva, M. (1991). A multinomial probit formulation for large choice sets. Proceedings of the 6th International Conference on Travel Behaviour, Vol. 2, pp. 243–258. Oxford: Pergamon.

Bolduc, D., Ben-Akiva, M., Walker, J., & Michaud, A. (2005). Hybrid choice models with logit kernel: Applicability to large scale models. In: M. Lee-Gosselin & S. Doherty (Eds), Integrated land-use and transportation models: Behavioral foundations (pp. 275–302). Oxford: Elsevier.

Bolduc, D., Boucher, N., & Alvarez-Daziano, R. (2008). Hybrid choice modeling of new technologies for car choice in Canada. Transportation Research Record (Journal of the Transportation Research Board), No. 2082, pp. 63–71. Transportation Research Board of the National Academies, Washington, DC.

Bolduc, D., Fortin, B., & Gordon, S. (1997). Multinomial probit estimation of spatially interdependent choices: An empirical comparison of two new techniques. International Regional Science Review, 20, 77–101.

Bolduc, D., & Giroux, A. (2005). The integrated choice and latent variable (ICLV) model: Handout to accompany the estimation software. Département d'économique, Université Laval.

Brownstone, D., & Train, K. E. (1999). Forecasting new product penetration with flexible substitution patterns. Journal of Econometrics, 89, 109–129.

Bunch, D. S. (1991). Estimability in the multinomial probit model. Transportation Research Part B, 25, 1–12.

Dansie, B. R. (1985). Parameter estimability in the multinomial probit model. Transportation Research Part B, 19, 526–528.

Horne, M. (2003). Incorporating preferences for personal urban transportation technologies into a hybrid energy-economy model. Master's thesis, School of Resource and Environmental Management, Simon Fraser University, Burnaby, BC, Canada.

Horne, M., Jaccard, M., & Tiedman, K. (2005). Improving behavioral realism in hybrid energy-economy models using discrete choice studies of personal transportation decisions. Energy Economics, 27(1), 59–77.

Jöreskog, K., & Sörbom, D. (1984). LISREL VI: Analysis of linear structural relations by maximum likelihood, instrumental variables and least squares methods. User's guide. Department of Statistics, University of Uppsala, Uppsala, Sweden.

McCulloch, R. E., Polson, N. G., & Rossi, P. E. (2000). A Bayesian analysis of the multinomial probit model with fully identified parameters. Journal of Econometrics, 99, 173–193.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In: P. Zarembka (Ed.), Frontiers in econometrics. New York: Academic Press.
McFadden, D. (1986). The choice theory approach to market research. Marketing Science, 5(4), 275–297.

McFadden, D., & Train, K. E. (2000). Mixed MNL models of discrete response. Journal of Applied Econometrics, 15, 447–470.

Morikawa, T., Ben-Akiva, M., & McFadden, D. (2002). Discrete choice models incorporating revealed preferences and psychometric data. Econometric Models in Marketing, 16, 27–53.

Munizaga, M. A., & Alvarez-Daziano, R. (2005). Testing mixed logit and probit models by simulation. Transportation Research Record, 1921, 53–62.

Train, K. (2003). Discrete choice methods with simulation. New York: Cambridge University Press.

Walker, J. (2002). The mixed logit (or logit kernel) model: Dispelling misconceptions of identification. Transportation Research Record, 1805, 86–98.

Walker, J., & Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.
Chapter 12
A Model of Travel Happiness and Mode Switching

Maya Abou-Zeid and Moshe Ben-Akiva
Abstract

In previous research (Abou-Zeid et al., 2008), we postulated that people report different levels of travel happiness under routine and nonroutine conditions and supported this hypothesis through an experiment requiring habitual car drivers to switch temporarily to public transportation. This chapter develops a general modeling framework that extends random utility models by using happiness measures as indicators of utility in addition to the standard choice indicators, and applies the framework to modeling happiness and travel mode switching using the data collected in the experiment. The model consists of structural equations for pretreatment (remembered) and posttreatment (decision) utilities and explicitly represents their correlations, and measurement equations expressing the choice and the pretreatment and posttreatment happiness measures as a function of the corresponding utilities. The results of the empirical model are preliminary but support the premise that the extended modeling framework, which includes happiness, will potentially enhance behavioral models based on random utility theory by making them more efficient.
12.1. Introduction

Subjective well-being or happiness has been studied by psychologists and economists (Bruni & Porta, 2007; Camerer, Loewenstein, & Rabin, 2003; Frey & Stutzer, 2002; Kahneman, Diener, & Schwarz, 1999; Kahneman & Krueger, 2006; Seligman, 2002;
Van Praag & Ferrer-i-Carbonell, 2004). Considering that economic measures are not sufficient on their own as indicators of quality of life, happiness researchers have called for the use of subjective well-being measures as additional indicators for informing or evaluating public policy (Diener & Seligman, 2004; Dolan & White, 2007; Loewenstein & Ubel, 2008). Many surveys have been conducted to measure life and domain satisfaction, resulting in additional measures of the well-being of society (e.g., DIW Berlin, German Institute for Economic Research, 2009; The ESRC United Kingdom Longitudinal Studies Centre, 2009; European Commission, 2009; National Opinion Research Center at the University of Chicago, 2009; World Values Survey, 2009).

In the transportation domain, the study of subjective well-being has also recently emerged as an important area of research (e.g., Abou-Zeid, 2009; Duarte, Garcia, Limão, & Polydoropoulou, 2008, 2009a, 2009b; Ettema, Gärling, Olsson, & Friman, 2009; Ory & Mokhtarian, 2005). It can be postulated that (1) travel choices, which are usually modeled using the concept of generalized cost, are more likely to be motivated by a broader goal of maintaining and enhancing travel well-being and (2) activity participation, which induces travel, is driven by a desire to maintain or enhance subjective well-being. Measuring and modeling activity and travel well-being can potentially enhance travel behavior models and better capture the travel benefits.

In previous research (Abou-Zeid & Ben-Akiva, 2007), we measured activity and travel well-being through a cross-sectional web-based survey of a sample of commuters. We found significant correlations between well-being and behavior. In subsequent research (Abou-Zeid, Ben-Akiva, & Bierlaire, 2008), we investigated the measurement of travel well-being in a way that accounts for the routine nature of travel. We postulated that when people are in a routine, they do not engage in a cognitive process of fully evaluating their travel happiness. Only when people evaluate their options and reconsider their decisions will they carefully think of their travel happiness. We tested this hypothesis through an experiment requiring habitual car drivers to switch temporarily to public transportation. After this intervention, participants reported significantly greater levels of happiness with their commute by car compared to what they reported when they were in a routine.

In this chapter, we develop and test a framework that models the relationship between travel happiness and mode switching using data collected from this experiment. Our framework extends random utility models by using happiness measures as indicators of utility in addition to the standard choice indicators. Different happiness measures (by mode and by time period) are used as indicators of the corresponding utilities. By using a simultaneous estimation technique, the empirical model extends a previous model we estimated sequentially (mode choice first, then happiness) in Abou-Zeid et al. (2008). The modeling framework is general and can be used to model behavior in other domains. It is anticipated that the use of the happiness measures as indicators of utility will result in more efficient parameter estimates and more accurate model forecasts.

The remainder of this chapter is organized as follows. The next section reviews the relationship between happiness and utility and proposes a framework for incorporating happiness measures in random utility models. Then the data collected
in the experiment we conducted and the main descriptive findings are described. This is followed by the model formulation and the estimation results. The last section concludes the chapter.
12.2. Happiness and Utility

In addition to their policy relevance, measures of well-being can potentially enhance behavioral models based on random utility theory. In this section, we first present a brief historical perspective on the relationship between happiness and utility, which shows that the concept of utility is not remote from the concept of happiness. Then we present a conceptual and modeling framework arguing that happiness measures, appropriately defined, can be used as indicators of utility.
12.2.1. Historical Perspective

In the classical era of economics, Bentham (1789) equated utility with happiness and defined it as the experiences of pleasure and pain (i.e., in a hedonic sense). There were several proponents of the Benthamite conceptualization of utility, but no attempt was made at the actual measurement of happiness in the classical era of consumer theory. Measurement was viewed as a task left to psychometricians. In the neoclassical era of economics, economists started to distance themselves from psychological introspections of utility, conceptualizing it instead as a "black box whose inner workings were not their concern" (McFadden, 2005). They associated utility with the weight of outcomes in making decisions, where preferences can only be inferred from choices.

In the modern behavioral revaluation of consumer theory, there has been a revived interest in the measurement of happiness, and the two concepts of happiness and utility have been brought close together once again. Kahneman, Wakker, and Sarin (1997) and Kahneman (2000) referred to Bentham's definition of utility as experienced utility and to the modern usage of utility by neoclassical economists as decision utility. They further made a distinction among three forms of experienced utility: remembered, moment, and predicted utility. Remembered utility is a retrospective global assessment of an experience. Moment utility is the real-time characterization of the affective/hedonic state of each moment of experience. Predicted utility is the anticipated feeling of pleasure or pain associated with an outcome that will be experienced in the future. Through a series of experiments, Kahneman and his colleagues (Kahneman, Fredrickson, Schreiber, & Redelmeier, 1993; Redelmeier, Katz, & Kahneman, 2003; Schreiber & Kahneman, 2000) demonstrated that remembered utility (of pain or pleasure) is determined by selected moments of the actual experience (such as the peak and the end). They also found that people tended to repeat choices which they remembered as less unpleasant or more pleasant; that is, remembered utility affects predicted utility, which is part of decision utility. See also Wirtz, Kruger, Scollon, and
292
Maya Abou-Zeid and Moshe Ben-Akiva
Diener (2003) for further evidence. However, these choices often failed to maximize experienced moment-to-moment utility. A number of economists have also discussed the relationship between happiness and utility or used happiness data in modeling efforts. Di Tella and MacCulloch (2006) suggested that happiness data capture ‘‘something meaningful about true utility’’ because happiness ratings tend to be correlated with a number of variables that are associated with utility. Kimball and Willis (2006) argued that happiness (which they defined as current affect) is different from utility; individuals have a preference for happiness, implying that happiness affects utility, and happiness also depends on utility through news about lifetime utility. Frey, Luechinger, and Stutzer (2004), Benz and Stutzer (2003), and Van Praag and Ferrer-i-Carbonell (2004) have used reported life satisfaction data empirically as a proxy for utility.
12.2.2. Conceptual and Modeling Framework

We adopt the following conceptualization of happiness and utility. If happiness is broadly defined as overall satisfaction or general happiness considering all aspects of a situation or an experience, then it is plausible to assume that happiness is the same as utility and to use measures of happiness as indicators of utility. If, however, happiness is more narrowly defined as satisfaction with particular aspects of the experience of interest, then happiness will just be part of the overall utility, and happiness measures can then be used as indicators of part of the utility. Since there are different notions of utility as described above, different happiness measures may reflect different utilities. If the research objective is to model behavior, then decision utility will be relevant and happiness measures that capture decision utility should be used. In a static context, the happiness measures will be imperfect indicators of decision utility since they are collected after people experience the outcomes of their decisions. In a dynamic context, happiness measures will capture decision utility better since they can be collected at the time of decision making. Moreover, in a dynamic context, one can further make a distinction among the different notions of utility, and happiness measures can then be used as indicators of utility accordingly. It will also be possible to model the causal relationships among different types of utility: decision utility in a given time period depends on remembered utility in the previous time period; moment utility depends on the attributes of the choice made; and remembered utility in a given time period depends on moment utility in that time period. Figure 12.1 shows a framework for modeling these causal relationships in a dynamic context where we also represent the effect of a choice made in one time period on decision utility in the next time period (inertia).

In this chapter, we formulate and apply a simplified version of this framework to a mode choice experiment we conducted in Switzerland (described in the next section). The framework is general and can be applied to modeling behavior in other domains. Quantitative methods for analyzing happiness data in the literature have mostly focused on the use of statistical tests and models that explain the causes of happiness. To the best of our knowledge, no attempt has yet been made at using happiness
measures within a random utility modeling framework. The ability to measure subjective well-being directly will provide more information about the utility beyond what is indicated through observed choices. It will potentially lead to the development of richer and more efficient behavioral models.

Figure 12.1: Dynamic framework for modeling happiness and behavior.
12.3. Data and Descriptive Findings

A total of 30 self-selected employees of Geneva airport, Ecole Polytechnique Fédérale de Lausanne (EPFL), and Université de Lausanne (UNIL) participated in an experiment we conducted in the spring/summer of 2008. They were habitual car drivers who had not commuted by public transportation for a long time or who used it only occasionally. The experiment consisted of three phases: pretreatment, treatment, and posttreatment. In the pretreatment phase, participants responded to questionnaires measuring their satisfaction with the commute by car, perceptions, attitudes, and plans, in addition to standard socioeconomic and demographic questions. The commute satisfaction question, with a five-point response scale anchored by "very dissatisfied" to "very satisfied," was phrased as follows:

Taking all things together, how satisfied are you with your commute by car between your residence and work?
In the treatment phase, which lasted for 1 week, participants were ‘‘required’’ (as a condition for participation) to commute by public transportation for 2–3 days in that week. As an incentive, they were given free public transportation passes that were valid for 2–4 weeks starting from the beginning of the treatment phase. Throughout the treatment week, and for 1 week before the treatment and 1 week after, participants filled out daily travel diaries. In the posttreatment phase, participants filled out the same questionnaire as in the pretreatment phase, measuring their satisfaction with the commute by car, perceptions, attitudes, and plans. They also indicated their satisfaction with their commute by public transportation and evaluated various aspects of their public transportation experience. After the treatment, none of the participants canceled his/her parking permit and switched completely to public transportation. However, 10 out of the 30 participants commuted by public transportation at least once in the week following the treatment (when the free pass was still valid). Of the 25 participants who were contacted a few months later, 5 indicated that after the expiration of the pass they commuted by public transportation at a higher rate than that before the intervention. The distribution of reported satisfaction with the commute by car changed significantly from pretreatment to posttreatment, with an overall trend of increase in reported car satisfaction. Out of 29 participants, 16 did not change their ratings, 11 gave higher ratings, and 2 gave lower ratings. This finding provides support for the hypothesis that the travel happiness measures collected under routine and nonroutine situations are different. In essence, the pretreatment measure captures remembered utility, while the posttreatment measure collected at the time of decision making captures decision utility. We now turn to incorporating these measures of happiness into a modeling framework that relates them to mode switching behavior.
12.4. Model Formulation

In this section, we formulate a dynamic random utility model of happiness and behavior that is an application of the dynamic framework proposed earlier. The model is formulated in the context of the experiment described in the previous section. Figure 12.2 shows the dynamic modeling framework applied to the mode choice experiment. We distinguish between two periods: pretreatment and posttreatment. In the pretreatment period, since participants commute habitually by car and are not making new decisions, they have a remembered utility of the car. This utility is reflected by the pretreatment car happiness measure. In the posttreatment period, the participants decide whether they will switch to public transportation or keep commuting by car. The utility concept that is relevant in this decision-making period is the decision utilities of car and public transportation. These utilities are reflected by the posttreatment car and public transportation happiness measures, respectively, and by the choice.
Figure 12.2: Framework for modeling happiness and travel mode switching in the Swiss experiment.

Participants select the mode that maximizes their utility. The choice here refers to the choice of mode on a given day in the posttreatment week. Remembered and decision utilities are affected by explanatory variables. The remembered utility of car is correlated with the decision utilities of car and of public transportation, and the decision utilities of car and public transportation are also correlated. The correlations among the three happiness measures are captured through the correlations among the utilities. Note that although the model framework does not explicitly represent a causal relationship from remembered to decision utility (as in the dynamic framework presented earlier) but rather shows a correlation structure, the correlation relationship can be viewed as a special case of the causal structure.
12.4.1. Structural Model

The structural model is a specification of the utility equations of car and public transportation. For car, we specify pretreatment (remembered) and posttreatment (decision) utilities. For public transportation, we specify only a posttreatment (decision) utility equation, as public transportation becomes relevant to the participants only after they try it and decide whether they want to switch to it. These utilities are given by Eqs. (12.1)–(12.3).

$$U_{Car}^0 = V_{Car}^0 + \epsilon_{Car}^0 = \beta_0 + \beta_1\, time_{Car} + \beta_2\, \text{high income dummy} + \epsilon_{Car}^0 \quad (12.1)$$
$$U_{Car} = V_{Car} + \epsilon_{Car} = \beta_3 + \beta_1\, time_{Car} + \beta_2\, \text{high income dummy} + \epsilon_{Car} \quad (12.2)$$

$$U_{PT} = V_{PT} + \epsilon_{PT} = \beta_1\, time_{PT} + \epsilon_{PT} \quad (12.3)$$
where $U_{Car}^0$ is the pretreatment car utility, $U_{Car}$ is the posttreatment car utility, and $U_{PT}$ is the posttreatment public transportation utility; $V_{Car}^0$, $V_{Car}$, and $V_{PT}$ are the corresponding systematic utilities, which are specified as a function of travel time for car and public transportation and also as a function of income for car (using a dummy variable equal to one if monthly personal pre-tax income is greater than 7500 Swiss Francs and equal to zero otherwise). The travel time variable is door-to-door home-to-work time in hours and is obtained from the travel diaries. The specification was kept simple due to the small sample size. Moreover, no cost variable was included: the mode choice data used in the model come from a posttreatment week when the free public transportation passes were still valid, and a distance variable used as a proxy for fuel costs in the car alternative had a positive but insignificant coefficient, so it was not included in the model. $\epsilon_{Car}^0$, $\epsilon_{Car}$, and $\epsilon_{PT}$ are error terms distributed as multivariate normal with zero means, unit variances, and nonzero covariances (or correlations, since the variances are one), as follows:

$$\begin{pmatrix} \epsilon_{Car}^0 \\ \epsilon_{Car} \\ \epsilon_{PT} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_c & \rho_b \\ \rho_c & 1 & \rho_a \\ \rho_b & \rho_a & 1 \end{pmatrix} \right)$$
where $\rho_c$ is the correlation between the pretreatment and posttreatment car utilities, $\rho_a$ is the correlation between the posttreatment car and public transportation utilities, and $\rho_b$ is the correlation between the pretreatment car utility and the posttreatment public transportation utility.

Let $\Delta U$ denote the difference between the posttreatment car and public transportation utilities:

$$\Delta U = U_{Car} - U_{PT} \quad (12.4)$$
12.4.2. Measurement Model

The measurement model consists of a specification of the mode choice and happiness indicators. The choice between car and public transportation is assumed to be based on posttreatment utility maximization (since the choice is made posttreatment), as follows:

$$y = \begin{cases} 1\; (Car) & \text{if } \Delta U + \eta \geq 0 \\ 0\; (PT) & \text{otherwise} \end{cases} \quad (12.5)$$
where y is a choice indicator and $\eta$ is a logistic error with a location of 0 and a scale parameter of 1 ($\eta \sim \mathrm{Logistic}(0, 1)$). $\eta$ can represent optimization errors on the part of the decision-maker. Since the total utility difference contains a normal error term ($\varepsilon_{Car} - \varepsilon_{PT}$) and a logistic error term, the choice model is an error-component logit mixture.

Three happiness measures are available: pretreatment car happiness, posttreatment car happiness, and posttreatment public transportation happiness. These measures are ordinal with a five-point scale (with 1 denoting very dissatisfied and 5 denoting very satisfied). Therefore, technically these measures should be related to the corresponding utilities through ordinal logit or probit regressions. However, due to the small number of observations, an ordinal treatment of these measures produced several counterintuitive parameter estimates. Therefore, continuous regressions were used instead, recognizing the limitations of doing so when the number of categories is small. These regressions are specified as follows:

$h^0_{Car} = \alpha_1 + \lambda_1 U^0_{Car} + u_1$   (12.6)

$h_{Car} = \alpha_2 + \lambda_2 U_{Car} + u_2$   (12.7)

$h_{PT} = \alpha_3 + \lambda_3 U_{PT} + u_3$   (12.8)
where $h^0_{Car}$, $h_{Car}$, and $h_{PT}$ are the observed measures of pretreatment car happiness, posttreatment car happiness, and posttreatment public transportation happiness, respectively; $\alpha_1$, $\alpha_2$, and $\alpha_3$ are intercepts; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are scaling parameters; and $u_1$, $u_2$, and $u_3$ are normal error terms with zero means and unknown variances: $u_1 \sim N(0, \sigma^2_{u_1})$, $u_2 \sim N(0, \sigma^2_{u_2})$, and $u_3 \sim N(0, \sigma^2_{u_3})$. Through this formulation, the correlation between the happiness measures is captured through the correlation between the error terms in the corresponding utility equations. For example, the correlation between the measures of pretreatment and posttreatment car satisfaction is captured through the correlation between $\varepsilon^0_{Car}$ and $\varepsilon_{Car}$, and similarly for the correlations between the other two pairs of happiness measures.

To ease the model estimation, we specify the happiness equations in deviations form, as in Eqs. (12.6')-(12.8'), and do not estimate the intercepts $\alpha_1$, $\alpha_2$, and $\alpha_3$. Also, note that with the model expressed in deviations form, we do not estimate the intercept parameter $\beta_0$ either, since $\beta_0$ cancels out of the pretreatment car utility when it is expressed in deviations form.

$h^{0*}_{Car} = h^0_{Car} - \bar{h}^0_{Car} = \lambda_1 (U^0_{Car} - \bar{V}^0_{Car}) + u_1 = \lambda_1 U^{0*}_{Car} + u_1$   (12.6')

$h^{*}_{Car} = h_{Car} - \bar{h}_{Car} = \lambda_2 (U_{Car} - \bar{V}_{Car}) + u_2 = \lambda_2 U^{*}_{Car} + u_2$   (12.7')

$h^{*}_{PT} = h_{PT} - \bar{h}_{PT} = \lambda_3 (U_{PT} - \bar{V}_{PT}) + u_3 = \lambda_3 U^{*}_{PT} + u_3$   (12.8')

where $h^{0*}_{Car}$, $h^{*}_{Car}$, and $h^{*}_{PT}$ are the pretreatment car happiness, posttreatment car happiness, and posttreatment public transportation happiness measures, respectively, all expressed in deviations form; $\bar{h}^0_{Car}$, $\bar{h}_{Car}$, and $\bar{h}_{PT}$ are the sample averages of the corresponding happiness measures; $U^{0*}_{Car}$, $U^{*}_{Car}$, and $U^{*}_{PT}$ are the pretreatment car utility, posttreatment car utility, and posttreatment public transportation utility, respectively, all expressed in deviations form; and $\bar{V}^0_{Car}$, $\bar{V}_{Car}$, and $\bar{V}_{PT}$ are the sample averages of the corresponding systematic utilities.
12.4.3. Likelihood Function

The maximum likelihood method is used for estimation. The likelihood of a given observation is the probability of observing the sequence of mode choices and the three happiness indicators. Ignoring first the correlation arising from the panel nature of the data (multiple mode choices for every individual, corresponding to multiple home-to-work trips), then, conditional on $\varepsilon^0_{Car}$, $\varepsilon_{Car}$, and $\varepsilon_{PT}$, the probabilities of the choice and of each of the happiness indicators are independent. The likelihood for observation n is computed by integrating the product of these conditional probabilities over the joint density of $\varepsilon^0_{Car}$, $\varepsilon_{Car}$, and $\varepsilon_{PT}$, as follows:

$P_n = \int_{\varepsilon_{PT}} \int_{\varepsilon_{Car}} \int_{\varepsilon^0_{Car}} L_1(y|\varepsilon_{Car}, \varepsilon_{PT})\, f_2(h^{0*}_{Car}|\varepsilon^0_{Car})\, f_3(h^{*}_{Car}|\varepsilon_{Car})\, f_4(h^{*}_{PT}|\varepsilon_{PT})\, f_5(\varepsilon^0_{Car}, \varepsilon_{Car}, \varepsilon_{PT})\, d\varepsilon^0_{Car}\, d\varepsilon_{Car}\, d\varepsilon_{PT}$   (12.9)
where $L_1(y|\varepsilon_{Car}, \varepsilon_{PT})$ is the choice probability, $f_2(h^{0*}_{Car}|\varepsilon^0_{Car})$ is the density function of the pretreatment car happiness measure in deviations form, $f_3(h^{*}_{Car}|\varepsilon_{Car})$ is the density function of the posttreatment car happiness measure in deviations form, and $f_4(h^{*}_{PT}|\varepsilon_{PT})$ is the density function of the posttreatment public transportation happiness measure in deviations form, all conditional on $\varepsilon^0_{Car}$, $\varepsilon_{Car}$, and $\varepsilon_{PT}$, which have a joint density function denoted as $f_5(\varepsilon^0_{Car}, \varepsilon_{Car}, \varepsilon_{PT})$. $L_1(y|\varepsilon_{Car}, \varepsilon_{PT})$ is a logit model given by the following equation:

$L_1(y|\varepsilon_{Car}, \varepsilon_{PT}) = \left( \dfrac{1}{1 + e^{-\Delta U}} \right)^{y} \left( \dfrac{e^{-\Delta U}}{1 + e^{-\Delta U}} \right)^{(1-y)}$   (12.10)
$f_2(h^{0*}_{Car}|\varepsilon^0_{Car})$, $f_3(h^{*}_{Car}|\varepsilon_{Car})$, and $f_4(h^{*}_{PT}|\varepsilon_{PT})$ are normal density functions specified as follows, where $\phi$ denotes a standard normal density function:

$f_2(h^{0*}_{Car}|\varepsilon^0_{Car}) = \dfrac{1}{\sigma_{u_1}}\, \phi\left( \dfrac{h^{0*}_{Car} - \lambda_1 U^{0*}_{Car}}{\sigma_{u_1}} \right)$   (12.11)
$f_3(h^{*}_{Car}|\varepsilon_{Car}) = \dfrac{1}{\sigma_{u_2}}\, \phi\left( \dfrac{h^{*}_{Car} - \lambda_2 U^{*}_{Car}}{\sigma_{u_2}} \right)$   (12.12)

$f_4(h^{*}_{PT}|\varepsilon_{PT}) = \dfrac{1}{\sigma_{u_3}}\, \phi\left( \dfrac{h^{*}_{PT} - \lambda_3 U^{*}_{PT}}{\sigma_{u_3}} \right)$   (12.13)
$f_5(\varepsilon^0_{Car}, \varepsilon_{Car}, \varepsilon_{PT})$ is a multivariate normal density function given by:

$f_5(\varepsilon^0_{Car}, \varepsilon_{Car}, \varepsilon_{PT}) = \dfrac{1}{(2\pi)^{3/2}\, |\Sigma|^{1/2}} \exp\left( -\dfrac{1}{2} \begin{bmatrix} \varepsilon^0_{Car} & \varepsilon_{Car} & \varepsilon_{PT} \end{bmatrix} \Sigma^{-1} \begin{bmatrix} \varepsilon^0_{Car} \\ \varepsilon_{Car} \\ \varepsilon_{PT} \end{bmatrix} \right)$   (12.14)
where $\Sigma$ is the variance-covariance matrix of $\varepsilon^0_{Car}$, $\varepsilon_{Car}$, and $\varepsilon_{PT}$; $|\Sigma|$ denotes the determinant of $\Sigma$; and $\Sigma^{-1}$ is the inverse of $\Sigma$.

Given the structure of the variance-covariance matrix (with unit variances), it can be shown using the Cholesky decomposition that, in order to guarantee a positive definite variance-covariance matrix, the following three constraints should be imposed on the correlations:

$-1 \leq \rho_c \leq 1$   (12.15)

$-1 \leq \rho_b \leq 1$   (12.16)

$\rho_a = \gamma \sqrt{1 - \rho_c^2} + \rho_b \rho_c, \quad 0 \leq \rho_b^2 + \gamma^2 \leq 1$   (12.17)

where $\gamma$ is a parameter to be estimated.
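The reparameterisation in Eqs. (12.15)-(12.17) is easy to check numerically. The short sketch below is a hypothetical illustration (not the authors' code): it builds the correlation matrix from admissible values of $\rho_c$, $\rho_b$, and $\gamma$ and confirms that it is positive definite.

```python
import numpy as np

def corr_matrix(rho_c, rho_b, gamma):
    """Build the error correlation matrix from the free parameters,
    computing rho_a from Eq. (12.17)."""
    rho_a = gamma * np.sqrt(1.0 - rho_c**2) + rho_b * rho_c
    return np.array([[1.0,   rho_c, rho_b],
                     [rho_c, 1.0,   rho_a],
                     [rho_b, rho_a, 1.0]])

# Any rho_c, rho_b in (-1, 1) with rho_b**2 + gamma**2 <= 1 should work
S = corr_matrix(rho_c=0.4, rho_b=0.3, gamma=0.5)
print(np.linalg.eigvalsh(S))  # all eigenvalues nonnegative: positive (semi)definite
```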
Since multiple mode choice observations are available for every participant, we account for the correlation among mode choice observations of the same participant by making the error terms $\varepsilon_{Car}$ and $\varepsilon_{PT}$ individual specific. For any given individual, conditional on $\varepsilon_{Car}$ and $\varepsilon_{PT}$, the probability of a sequence of mode choices is the product of the probabilities of the mode choices made on different home-to-work trips on several days. The likelihood for any given individual is modified as follows:

$P_n = \int_{\varepsilon_{PT}} \int_{\varepsilon_{Car}} \int_{\varepsilon^0_{Car}} \left( \prod_{t=1}^{T_n} L_{1t}(y_t|\varepsilon_{Car}, \varepsilon_{PT}) \right) f_2(h^{0*}_{Car}|\varepsilon^0_{Car})\, f_3(h^{*}_{Car}|\varepsilon_{Car})\, f_4(h^{*}_{PT}|\varepsilon_{PT})\, f_5(\varepsilon^0_{Car}, \varepsilon_{Car}, \varepsilon_{PT})\, d\varepsilon^0_{Car}\, d\varepsilon_{Car}\, d\varepsilon_{PT}$   (12.18)
where $T_n$ is the number of mode choice observations for individual n and $y_t$ is the mode choice for observation t. The log likelihood for the full sample is given by the following equation:

$L = \sum_{n=1}^{N} \ln P_n$   (12.19)
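The chapter evaluates the integrals in Eq. (12.18) by numerical integration in Gauss. Purely as a hypothetical illustration of the same quantity, the sketch below approximates $P_n$ for one individual by Monte Carlo simulation instead, averaging the conditional product over draws of the trivariate error vector; all names and input values are invented for the example.

```python
import numpy as np
from scipy.stats import norm

def sim_loglik_n(y, h0_dev, h_dev, hpt_dev, V0_dev, V_dev, Vpt_dev,
                 lam, s1, s2, s3, Sigma, R=20000, seed=0):
    """Monte Carlo version of Eq. (12.18) for one individual.

    y: 0/1 mode choices over the T_n posttreatment days (1 = car);
    *_dev: happiness indicators and systematic utilities in deviations
    form; lam: scaling parameter; s1-s3: standard deviations of the
    measurement errors; Sigma: 3x3 error covariance matrix.
    """
    rng = np.random.default_rng(seed)
    eps = rng.multivariate_normal(np.zeros(3), Sigma, size=R)
    U0 = V0_dev + eps[:, 0]
    Uc = V_dev + eps[:, 1]
    Upt = Vpt_dev + eps[:, 2]
    dU = Uc - Upt
    # Product of logit choice probabilities over the T_n days, Eq. (12.10)
    p1 = 1.0 / (1.0 + np.exp(-dU))
    Lc = np.prod(np.where(np.asarray(y)[:, None] == 1, p1, 1.0 - p1), axis=0)
    # Normal densities of the three happiness indicators, Eqs. (12.11)-(12.13)
    f2 = norm.pdf(h0_dev, loc=lam * U0, scale=s1)
    f3 = norm.pdf(h_dev, loc=lam * Uc, scale=s2)
    f4 = norm.pdf(hpt_dev, loc=lam * Upt, scale=s3)
    return np.log(np.mean(Lc * f2 * f3 * f4))  # simulated ln P_n

# Toy call with made-up inputs; rho_a fixed at -1, as in Section 12.5
Sigma = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]])
print(sim_loglik_n([1, 1, 0], 0.2, 0.1, -0.3, 0.5, 0.4, 0.1,
                   lam=0.15, s1=0.76, s2=0.73, s3=0.89, Sigma=Sigma))
```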
12.5. Model Estimation

The model specified in the previous section was programmed in Gauss (Aptech Systems, 1995) and estimated simultaneously by maximum likelihood, with multiple imputations for income used in the estimation. The results shown below are based on pooled estimation from five imputed datasets, with corresponding adjustments to the standard errors. Numerical integration was used to evaluate the integrals in the likelihood function.

Moreover, we tested for the presence of sample selection bias using a simplified version of the above model. In particular, we postulated that the decision to participate in the experiment may be correlated with the mode switching decision. That is, people who decide to participate may have already thought about switching to public transportation anyway or may be more inclined in general to switch. Using data on participants and nonparticipants, we estimated sample selection models, tested for the presence of sample selection bias, and found no evidence of it. The details of the income imputation procedure and the sample selection tests are described in Abou-Zeid (2009).

When the model was estimated, the correlation parameter $\rho_a$ between the posttreatment car and public transportation utilities approached $-1$ (a corner solution). Therefore, we set this correlation to $-1$ (which means that the error term $\varepsilon_{PT}$ is equal to $-1$ times $\varepsilon_{Car}$) and estimated only one correlation parameter, $\rho_c$, between the pretreatment and posttreatment car utilities. This correlation parameter turned out negative (contrary to expectations) but insignificant; therefore, it was set to zero. Moreover, a more parsimonious specification with only one $\lambda$ parameter in the happiness equations (i.e., $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$) produced better results. Table 12.1 shows the parameter estimates, standard errors, and t-statistics for the resulting model. Figure 12.3 shows the model structure and parameter estimates (except for the variances of the error terms in the happiness equations), with t-statistics in parentheses.

The travel time coefficient is negative and significant at the 90% level of confidence. The high-income dummy variable in the car utility equation has a positive coefficient, as expected, but is insignificant. The scaling parameter $\lambda$ in the happiness equations is positive and significant at the 90% level of confidence, which indicates that the happiness measures are valid indicators of utility. The standard deviations of the error terms of the happiness equations are highly significant.

The extended model (choice + happiness) was compared to a standard model that does not include happiness (i.e., choice only).
Table 12.1: Extended model (choice + happiness) estimation results (N = 28).

Parameter                                          Estimate        Standard error   t-Statistic
Structural model
  Car constant (posttreatment)                     0.234           1.27             0.18
  Time (h)                                         -4.09           2.13             -1.92
  High-income dummy                                0.433           0.878            0.49
  $\rho_c$                                         0.00 (fixed)    -                -
  $\rho_b$                                         0.00 (fixed)    -                -
  $\rho_a$                                         -1.00 (fixed)   -                -
Measurement model
  $\lambda_1 = \lambda_2 = \lambda_3$              0.153           0.0857           1.79
  Pretreatment car happiness $\sigma_{u_1}$        0.757           0.108            6.98
  Posttreatment car happiness $\sigma_{u_2}$       0.732           0.101            7.29
  Posttreatment PT happiness $\sigma_{u_3}$        0.891           0.123            7.25
Total log likelihood                               -143.1
Choice log likelihood                              -42.5
Figure 12.3: Swiss model structure and parameter estimates (the car constant and happiness equations error variances are not shown in the figure; t-statistics are shown in parentheses).
Table 12.2: Standard choice model estimation results (N = 28).

Parameter               Estimate        Standard error   t-Statistic
Car constant            0.241           1.52             0.16
Time (h)                -3.38           2.55             -1.33
High-income dummy       0.143           1.07             0.13
$\rho_a$                -1.00 (fixed)   -                -
Choice log likelihood   -42.4
Table 12.3: Measures of efficiency for the extended and standard models.

                                                                          Extended model          Standard model
                                                                          (choice + happiness)    (choice only)
$\mathrm{Var}(\hat\beta_{Standard}) - \mathrm{Var}(\hat\beta_{Extended})$   Positive definite
$\overline{\mathrm{Var}}(\hat V_{Car})$                                   4.36                    6.28
$\overline{\mathrm{Var}}(\hat V_{PT})$                                    4.69                    6.74
To set the two models at the same scale, the standard mode choice-only model was estimated with standard normal error components added to the utility equations of car and public transportation, such that the correlation between these two error components is $-1$ (as in the extended model). Table 12.2 shows the estimation results for the standard model. The parameter estimates of the extended and standard models are different but are generally of a similar order of magnitude. The standard errors of the parameter estimates in the standard model are larger than those in the extended model. The goodness-of-fit of the extended and standard models, measured by the choice log likelihood, is similar.

The efficiency of the extended model was assessed based on two criteria. First, we checked whether the variance-covariance matrix of the structural parameters (car constant, time, and high-income dummy) of the extended model was smaller than the corresponding variance-covariance matrix of the standard model, and found that it was; a Hausman specification test (Hausman, 1978) further indicated that the extended and standard structural model parameters are consistent. Second, we computed the average variance (across the sample) of the systematic utility of car and of public transportation using both models and found that the variances obtained with the extended model are smaller than those obtained with the standard model. These results are summarized in Table 12.3.
12.6. Conclusion

In this chapter, we argued that happiness, broadly defined as overall satisfaction with all aspects of an experience, can be interpreted as the same as utility, or as part of the utility if happiness is defined more narrowly. Happiness measures can then be used to provide
more information about the utility. We presented a modeling framework that extends random utility models by incorporating happiness measures as indicators of utility. We also explicitly distinguished in the framework among different notions of utility (decision, moment, and remembered utility) and represented their causal relationships.

We applied the framework to modeling happiness and travel mode switching using data collected in Switzerland from a sample of habitual car drivers who were asked to commute temporarily to work by public transportation. Their pretreatment satisfaction with the commute by car and their posttreatment satisfaction with the commute by car and by public transportation were recorded. Their mode choices for the 1-week posttreatment period were also recorded. In previous research (Abou-Zeid et al., 2008), we postulated that people do not fully think about their travel happiness unless they reconsider their decisions, so happiness measures collected under routine and nonroutine situations are expected to differ. We found that the pretreatment (routine) and posttreatment (nonroutine) measures of satisfaction with the commute by car were significantly different. The pretreatment measure captures remembered utility, while the posttreatment measure, collected at the time of decision making, captures decision utility.

The model consists of a structural specification of three utility equations with a general correlation structure among the utilities (pretreatment car utility, posttreatment car utility, and posttreatment public transportation utility) and a measurement model consisting of a choice model and three happiness equations corresponding to the pretreatment and posttreatment measures collected. The model was estimated simultaneously in Gauss using maximum likelihood. The happiness measures were valid indicators of the utilities. The extended model (choice + happiness) was also compared to a standard mode choice-only model. The goodness-of-fit was similar; the measures of efficiency suggest that the extended model may be more efficient than the standard model. The results are based on a parsimonious model specification due to the small sample size and point to the need for testing this framework using larger datasets. The results of applying the framework to a slightly larger dataset are reported in Abou-Zeid (2009).

The modeling framework will potentially improve behavioral models based on random utility theory. The usefulness of the happiness indicators lies mainly in producing more accurate parameter estimates. As with the use of indicators of latent variables in Hybrid Choice Models (Ben-Akiva et al., 2002; Walker & Ben-Akiva, 2002), the happiness indicators are typically not used in model application because they do not influence behavior. As such, the quantities of interest that can be derived from the extended model are the same as those obtained from a standard model (value of time, market shares, etc.). However, one can still use the happiness measurement equations to predict satisfaction levels if predictions of utility are available.

Two extensions to the empirical model may be considered. First, the happiness model was based on continuous regressions. With larger datasets, ordinal regressions can be used and the corresponding thresholds can be modeled as random parameters (with or without a behavioral model, for example specifying the thresholds as a function of socioeconomic variables). Other parameters in the model can also be modeled as random with larger datasets.
Second, the happiness model was developed at an absolute level rather than at a difference level. That is, every happiness measure
was used as an indicator of the corresponding utility. One can alternatively model differences between happiness measures (e.g., the difference between car and public transportation satisfaction ratings) as indicators of the differences between the corresponding utilities. This would allow testing the idea that people give relative rather than absolute satisfaction judgments.
References

Abou-Zeid, M. (2009). Measuring and modeling activity and travel well-being. Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA.
Abou-Zeid, M., & Ben-Akiva, M. (2007). The effect of social comparisons on commute well-being. Presented at the Frontiers in Transportation Workshop, Amsterdam, The Netherlands.
Abou-Zeid, M., Ben-Akiva, M., & Bierlaire, M. (2008). Happiness and travel behavior modification. Proceedings of the European Transport Conference, Noordwijkerhout, The Netherlands.
Aptech Systems. (1995). GAUSS applications: Maximum likelihood. Maple Valley, WA.
Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Boersch-Supan, A., Brownstone, D., Bunch, D. S., Daly, A., de Palma, A., Gopinath, D., Karlstrom, A., & Munizaga, M. A. (2002). Hybrid choice models: Progress and challenges. Marketing Letters, 13(3), 163–175.
Bentham, J. (1789). An introduction to the principles of morals and legislation. Oxford, United Kingdom: Blackwell (reprinted 1948).
Benz, M., & Stutzer, A. (2003). Do workers enjoy procedural utility? Applied Economics Quarterly, 49(2), 149–172.
Bruni, L., & Porta, P. L. (Eds.). (2007). Handbook on the economics of happiness. Northampton, MA: Edward Elgar Publishing.
Camerer, C., Loewenstein, G., & Rabin, M. (Eds.). (2003). Advances in behavioral economics. Princeton, NJ: Princeton University Press.
Diener, E., & Seligman, M. (2004). Beyond money: Toward an economy of well-being. Psychological Science in the Public Interest, 5(1), 1–31.
Di Tella, R., & MacCulloch, R. (2006). Some uses of happiness data in economics. The Journal of Economic Perspectives, 20(1), 25–46.
DIW Berlin, German Institute for Economic Research. (2009). The German Socio-Economic Panel Study. Available at: http://www.diw.de/english/sop/
Dolan, P., & White, M. (2007). How can measures of subjective well-being be used to inform public policy? Perspectives on Psychological Science, 2(1), 71–85.
Duarte, A., Garcia, C., Giannarakis, G., Limão, S., Polydoropoulou, A., & Litinas, N. (2009a). New approaches in transportation planning: Happiness and transport economics. NETNOMICS: Economic Research and Electronic Networking, Special Issue on 'Design and Evaluation of Transportation Networks', doi:10.1007/s11066-009-9037-2.
Duarte, A., Garcia, C., Limão, S., & Polydoropoulou, A. (2008). Happiness in transport decision making – the Swiss sample. Paper presented at the 8th Swiss Transport Research Conference, Monte Verità, Ascona, Switzerland.
Duarte, A., Garcia, C., Limão, S., & Polydoropoulou, A. (2009b). Experienced and expected happiness in transport mode decision making process. Paper presented at the 88th Annual Meeting of the Transportation Research Board, Washington, DC.
The ESRC United Kingdom Longitudinal Studies Centre. (2009). British Household Panel Survey (BHPS). Available at: http://www.iser.essex.ac.uk/ulsc/bhps/
Ettema, D., Gärling, T., Olsson, L. E., & Friman, M. (2009). Out-of-home activities, daily travel, and subjective well-being. Paper submitted for review to Transportation Research Part A.
European Commission. (2009). Recent standard Eurobarometer. Available at: http://ec.europa.eu/public_opinion/standard_en.htm
Frey, B., Luechinger, S., & Stutzer, A. (2004). Valuing public goods: The life satisfaction approach. Working Paper no. 184, Institute for Empirical Research in Economics, University of Zurich, Zurich, Switzerland.
Frey, B., & Stutzer, A. (2002). What can economists learn from happiness research? Journal of Economic Literature, 40(2), 402–435.
Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1272.
Kahneman, D. (2000). Experienced utility and objective happiness: A moment-based approach. In: D. Kahneman & A. Tversky (Eds.), Choices, values, and frames (pp. 673–692). New York: Cambridge University Press.
Kahneman, D., Diener, E., & Schwarz, N. (Eds.). (1999). Well-being: The foundations of hedonic psychology. New York: Russell Sage.
Kahneman, D., Fredrickson, B., Schreiber, C. M., & Redelmeier, D. (1993). When more pain is preferred to less: Adding a better end. Psychological Science, 4(6), 401–405.
Kahneman, D., & Krueger, A. (2006). Developments in the measurement of subjective well-being. Journal of Economic Perspectives, 20(1), 3–24.
Kahneman, D., Wakker, P. P., & Sarin, R. (1997). Back to Bentham? Explorations of experienced utility. The Quarterly Journal of Economics, 112(2), 375–405.
Kimball, M., & Willis, R. (2006). Utility and happiness. Working Paper, available at: http://www.econ.princeton.edu/seminars/SEMINARS/Fall07Seminar/Fall07pdfs/kimball.pdf
Loewenstein, G., & Ubel, P. A. (2008). Hedonic adaptation and the role of decision and experience utility in public policy. Journal of Public Economics, 92, 1795–1810.
McFadden, D. (2005). The new science of pleasure: Consumer behavior and the measurement of well-being. Frisch Lecture, Econometric Society World Congress, London. Available at: http://elsa.berkeley.edu/wp/mcfadden0105/ScienceofPleasure.pdf
National Opinion Research Center at the University of Chicago. (2009). General social survey. Available at: http://www.norc.org/projects/general+social+survey.htm
Ory, D., & Mokhtarian, P. (2005). When is getting there half the fun? Modeling the liking for travel. Transportation Research Part A, 39, 97–123.
Redelmeier, D. A., Katz, J., & Kahneman, D. (2003). Memories of colonoscopy: A randomized trial. Pain, 104(1–2), 187–194.
Schreiber, C. A., & Kahneman, D. (2000). Determinants of the remembered utility of aversive sounds. Journal of Experimental Psychology: General, 129(1), 27–42.
Seligman, M. (2002). Authentic happiness: Using the new positive psychology to realize your potential for lasting fulfillment. New York: Free Press/Simon and Schuster.
Van Praag, B., & Ferrer-i-Carbonell, A. (2004). Happiness quantified: A satisfaction calculus approach. Oxford, UK: Oxford University Press.
Walker, J., & Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.
Wirtz, D., Kruger, J., Scollon, C. N., & Diener, E. (2003). Research report: What to do on spring break? The role of predicted, on-line, and remembered experience in future choice. Psychological Science, 14(5), 520–524.
World Values Survey. (2009). The world's most comprehensive investigation of political and sociocultural change. Available at: http://www.worldvaluessurvey.org/
Chapter 13
On Path Generation Algorithms for Route Choice Models

Emma Frejinger and Michel Bierlaire
Abstract

This paper deals with choice set generation for the estimation of route choice models. Two different frameworks are presented in the literature: one aims at generating consideration sets, and one samples alternatives from the set of all paths. Most algorithms are designed to generate consideration sets but in general fail to do so, because some observed paths are not generated. In the sampling approach, the observed path as well as all considered paths are in the choice set by design. However, few algorithms can actually be used in the sampling context. In this paper, we present the two frameworks, with an emphasis on the sampling approach, and discuss the applicability of existing algorithms to each of the frameworks.
13.1. Introduction

A route choice model predicts which route, or path, a traveler would take when going from a given origin to a given destination in a transport network. Discrete choice models are appropriate for the analysis of this problem. However, the set of alternatives potentially contains all the paths connecting the origin–destination (OD) pair. It is typically very large and unknown (both to the analyst and the traveler). It is, therefore, meaningful to consider subsets of alternatives to obtain operational models.
We denote by U the set of all possible paths connecting an OD pair. We denote by $C_n$ the set of paths actually considered by traveler n. It is sometimes called the consideration set in the literature. As the set $C_n$ is in general not known by the analyst, Manski (1977) proposes a two-stage choice model:

$P_n(i) = \sum_{C \in G} P_n(i|C)\, P_n(C)$   (13.1)
where $P_n(C)$ captures the probability that C is the choice set actually considered by the traveler and G is the set of all nonempty subsets of U. The literature on consideration sets is vast (see, e.g., Shocker, Ben-Akiva, Boccara, & Nedungadi, 1991; Roberts & Lattin, 1991; Roberts & Lattin, 1997 for overviews) and there are applications in various fields (for a deeper discussion of route choice, see Bovy, 2009).

The number of possible choice sets in G is exponentially large. Consequently, any operational model must be based on a simplification of Eq. (13.1). A first approach is based on a behavioral motivation, where the analyst tries to identify only a small number of potential choice sets C in G that have a significant probability $P_n(C)$ of being considered by the traveler. Assuming that all other subsets of U have a zero probability of being considered, Eq. (13.1) simplifies considerably. In the literature, most proposed algorithms generate only one such choice set C, so that Eq. (13.1) involves only one term. To the best of our knowledge, all algorithms proposed in the literature so far have been shown to be incorrect, as the choice sets that they generate do not even contain the actually chosen path for a significant part of the observations. More specifically, we refer to the coverage results reported by, for example, Ramming (2002), Hoogendoorn-Lanser (2005), and Frejinger and Bierlaire (2007). The best result is reported by Prato and Bekhor (2006), whose branch-and-bound algorithm is able to generate 91% of the observed routes.

A second approach is motivated by mathematical convenience rather than behavioral assumptions. The idea is to simplify Eq. (13.1) by assuming that only one set is considered. It is actually consistent with Horowitz and Louviere (1995), who argue that choice need not be modeled as a two-step process in which a consideration step precedes choice. Frejinger, Bierlaire, and Ben-Akiva (2009) propose a choice model where the choice set of each individual is the universal choice set U, that is:

$P_n(i) = P_n(i|U)$   (13.2)
The universal choice set has the major advantage of containing the chosen alternative and all considered alternatives, so that the issue raised by the other algorithms is not relevant in this context. From a behavioral perspective, it is not reasonable to assume that travelers actually consider all alternatives in U. The focus of this approach is, however, on obtaining unbiased parameter estimates. It is assumed that the bias from not including a considered alternative in the choice set is larger than that from including many nonconsidered alternatives. Indeed, the route choice model should assign very small choice probabilities to unattractive paths.
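As a purely hypothetical toy illustration of the two formulations, consider a universal set of three paths: Eq. (13.1) mixes logit probabilities over candidate choice sets, while Eq. (13.2) corresponds to putting all the probability mass on U itself. All utilities and set probabilities below are invented.

```python
import numpy as np

V = {'p1': -1.0, 'p2': -1.5, 'p3': -3.0}  # systematic utilities (invented)

def logit(i, C):
    """P_n(i | C): multinomial logit probability over choice set C."""
    return np.exp(V[i]) / sum(np.exp(V[j]) for j in C)

# Manski's two-stage model, Eq. (13.1): weight P_n(i|C) by P_n(C)
P_C = {('p1', 'p2'): 0.6, ('p1', 'p2', 'p3'): 0.4}  # invented P_n(C)
P_manski = sum(p * logit('p1', C) for C, p in P_C.items() if 'p1' in C)

# Simplification of Eq. (13.2): a single choice set, the universal set U
P_universal = logit('p1', ('p1', 'p2', 'p3'))
print(P_manski, P_universal)
```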
U is large and unknown to the analyst. Therefore, the estimation of these models must rely on sampling of alternatives, as proposed by McFadden (1978) and applied to route choice by Frejinger et al. (2009). In this context, subsets must also be generated, but instead of capturing any behavioral reality, they must simply verify some mathematical properties to be consistent with the estimation procedure.

In summary, each of the two approaches presented above requires choice set generation algorithms, but for different reasons. The objective of this paper is to present some algorithms proposed in the literature and discuss their applicability to the sampling approach. Before doing so, we summarize some methodological aspects of the estimation of discrete choice models with samples of alternatives.
13.2. Sampling of Alternatives

Consider a sampling protocol where a subset $D_n$ of alternatives is drawn for each observation n from U according to the probability distribution $p(D_n|i)$, which verifies the positive conditioning property (PCP). That is, if $i, j \in D_n \subseteq U$ and $p(D_n|i) > 0$, then $p(D_n|j) > 0$. McFadden (1978) shows that the logit model can be consistently estimated on a subset of alternatives using classical exogenous sample maximum likelihood (ESML) estimation. The term $P(i|D_n)$ involved in the likelihood function is then conditional on the choice set $D_n$ drawn by the analyst:

$P(i|D_n) = \dfrac{e^{\mu V_{in} + \ln p(D_n|i)}}{\sum_{j \in D_n} e^{\mu V_{jn} + \ln p(D_n|j)}}$   (13.3)
where $\mu$ is a scale parameter and $V_{in}$ is the deterministic part of the utility. The alternative-specific terms $\ln p(D_n|j)$ correct for sampling bias. See also Ben-Akiva and Lerman (1985) and Train (2003) for a more detailed discussion of sampling of alternatives.

If $p(D_n|j)$ verifies the uniform conditioning property, that is, if $i, j \in D_n \subseteq U$ implies $p(D_n|i) = p(D_n|j)$, then the estimation can be done without any correction for bias. Indeed, the terms involving $p(D_n|i)$ cancel out in Eq. (13.3). A simple random sampling protocol verifies that last property. It is, however, not necessarily interesting when the full set of alternatives is particularly large, as in route choice applications. The sample should include attractive alternatives, since comparing a chosen alternative to a set of highly unattractive alternatives would not provide much information on the choice and, consequently, the estimator would be inefficient. In order to increase the efficiency of the estimator, attractive alternatives should be included to have information about the trade-offs made by the traveler. If a simple random strategy is used, the sample would have to be prohibitively large to guarantee that.

A sampling protocol called importance sampling can be considered instead, where attractive alternatives are selected with higher probability than unattractive alternatives. Then, the correction terms in Eq. (13.3) do not cancel out.
Note that if alternative-specific constants (ASCs) are estimated, all parameter estimates except the constants would be unbiased even if the correction is not included in the utilities. However, route choice models do not include ASCs, and the correction must be explicitly included. In the context of route choice, a typical procedure consists in drawing paths with replacement and adding the chosen path with certainty. If $k_{jn}$ is the number of times path j is drawn and $q(j)$ is the probability of it being drawn, Frejinger et al. (2009) show that the correction term in Eq. (13.3) is:

$p(D_n|j) = K_{D_n}\, \dfrac{k_{jn}}{q(j)} \quad \forall j \in D_n$   (13.4)
Note that $p(D_n|j)$ is only proportional to $k_{jn}/q(j)$, but the constant $K_{D_n}$ cancels out in Eq. (13.3). Several studies can be found in the literature where importance sampling of alternatives has been used, including Quigley (1985) and Ben-Akiva and Bowman (1998) for residential location choice, and Train, McFadden, and Ben-Akiva (1987) for the choice of telephone service.

For route choice applications, the two most commonly used models, path size logit (Ben-Akiva & Ramming, 1998; Ben-Akiva & Bierlaire, 1999) and C-logit (Cascetta, Nuzzolo, Russo, & Vitetta, 1996), are MNL based and can thus be estimated on samples of alternatives including a correction as in Eq. (13.3). Frejinger et al. (2009) use the path size logit model in their study and propose a corrected version of the path size attribute consistent with the sampling approach. Two other categories of route choice models are present in the literature: (i) multivariate extreme value (MEV) models (link-nested logit, Vovsha & Bekhor, 1998) and (ii) mixtures of logit and probit models (see, e.g., Frejinger & Bierlaire, 2007; Yai, Iwakura, & Morichi, 1997). Bierlaire, Bolduc, and McFadden (2008) show that MEV models can be consistently estimated based on samples of alternatives; a correction similar to Eq. (13.4) should be derived in this case as well. It is, however, worth mentioning that the link-nested logit model was rejected in favor of the path size logit model by Ramming (2002), probably due to strong assumptions on the nesting parameter values (see Abbé, Bierlaire, & Toledo, 2007, for a discussion of MEV models). For the second category of models (mixtures of logit and probit), there are no known sampling results.
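The following sketch shows how the correction of Eqs. (13.3)-(13.4) enters the likelihood on a hypothetical, artificially small network where q(j) can be written down directly: paths are drawn with replacement with probabilities q(j), the chosen path is added with certainty, and ln(k_jn/q(j)) is added to the utility of each sampled alternative. The constant K_Dn is simply omitted, since it cancels; the scale μ is normalized to 1.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

paths = list(range(10))           # a tiny universal set U (synthetic)
V = -0.5 * np.arange(10.0)        # systematic utilities V_jn (synthetic, mu = 1)
q = np.exp(V) / np.exp(V).sum()   # importance sampling probabilities q(j)

chosen = 2                        # observed path for observation n
draws = rng.choice(paths, size=8, replace=True, p=q)
k = Counter(draws)
k[chosen] += 1                    # add the chosen path with certainty

D_n = sorted(k)                   # the sampled choice set D_n
# Corrected utilities of Eq. (13.3): V_jn + ln(k_jn / q(j))
corr = np.array([V[j] + np.log(k[j] / q[j]) for j in D_n])
P = np.exp(corr) / np.exp(corr).sum()
log_lik_n = np.log(P[D_n.index(chosen)])  # this observation's contribution
print(dict(zip(D_n, np.round(P, 3))), log_lik_n)
```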
13.3. Path Sampling

In this section, we present some possible algorithms for path generation. In order to be applicable in the sampling approach, they must be such that

1. the generated set for each observation contains the observed path,
2. the PCP is verified, and
3. the probability $p(D_n|j)$ can be computed.

From Eq. (13.4), it is actually sufficient to be able to compute the probability $q(j)$ of generating path j.
First, we discuss procedures actually designed for sampling and, second, analyze some methods proposed in the literature for the generation of consideration sets. We show that, in general, the latter are not appropriate for sampling of alternatives.

Uniform Sampling

As presented in the previous section, a uniform sampling scheme ($q(j) = 1/|U|\ \forall j \in U$) would be convenient because the correction terms in Eq. (13.3) cancel out. It is, however, not appropriate for path generation: the sample would need to be very large and, most importantly, it is in general not possible to draw paths with equal probability, since doing so would require an enumeration of the paths in U.

Partitioning

McFadden (1978, p. 14) proposes a deterministic sampling scheme where U is partitioned into mutually exclusive sets. For each observation n, the set containing the chosen alternative is considered as the reduced choice set. In the route choice context, it is infeasible to characterize a partition of U.

Random Walk

Random walk algorithms construct a path link by link from the origin to the destination, randomly drawing the next link at each node. The link selection probabilities can be defined in various ways, and the probability of selecting a path is the product of the probabilities of selecting each link in the path. This type of sampling verifies the PCP since any path in U may be sampled. The method proposed by Dial (1971) for traffic assignment can be applied in this context, although only "reasonable" paths will have nonzero probabilities of being selected, which may violate the PCP. Frejinger et al. (2009) propose another variant of the random walk based on similar ideas. Contrary to Dial (1971), they do not use a logit model for link selection and they do not exclude any path a priori, so that the PCP is verified. (A small illustration of a random walk on a toy network is given at the end of this section.)

The three approaches above are designed from a sampling perspective, and only the last one can be used in practice for route choice modeling. Note, however, that it is most probably not appropriate for generating consideration sets. Indeed, a random walk, even if it is biased toward the shortest path, may generate unrealistic routes.

We now turn our attention to algorithms specifically designed for generating consideration sets. Their performance from that perspective is analyzed in the literature; see, for example, the review by Bovy (2009). Here, we focus on the most commonly used algorithms and discuss for each of them the three conditions required by the sampling approach.

Deterministic Algorithms

Several deterministic approaches are proposed in the literature, including link elimination (Azevedo, Costa, Madeira, & Martins, 1993), link penalty (de la Barra, Pérez, & Añez, 1993), labeled paths (Ben-Akiva, Bergman, Daly, & Ramaswamy, 1984) and branch-and-bound (Prato & Bekhor, 2006), but none of them guarantees that the chosen path belongs to the generated set (see Section 13.1), violating the first condition above. Moreover, they do not satisfy the PCP. Indeed, for a given observation n, if the chosen path $i_n$ does belong to $D_n$, $D_n$ is the set used for this observation. If it does not belong to $D_n$, then the set $D_n^+ = D_n \cup \{i_n\}$ is used as the subset for this observation. It means that if
$i_n \notin D_n$, then $p(D_n^+|i_n) = 1$ and $p(D_n|i_n) = 0$. Consider now a path $j \in D_n$ and, therefore, also in $D_n^+$. The PCP would require that $p(D_n^+|j) > 0$. However, as j is already in $D_n$, we have $p(D_n|j) = 1$ and $p(D_n^+|j) = 0$.

Monte-Carlo Simulation

Simulation-based algorithms have been proposed to generate consideration sets (Ramming, 2002; Bovy & Fiorenzo-Catalano, 2006). They use link-based Monte-Carlo simulations similar to the approaches proposed by Burrell (1968) and Sheffi and Powell (1982). The procedure consists in repeatedly drawing realizations of random generalized cost variables associated with each link and computing the shortest path. If the generalized costs on each link are drawn from a distribution with support $[0, +\infty[$, every path has a nonzero probability of being generated. If the observed path is added to the generated set, the PCP holds. The probability of drawing a path P is the probability that it is the shortest, that is:

$\Pr\left( \sum_{l \in P} (c_l + \nu_l) \leq \sum_{m \in Q} (c_m + \nu_m)\ \forall Q \in U \right) = \Pr\left( c_P + \sum_{l \in P} \nu_l \leq c_Q + \sum_{m \in Q} \nu_m\ \forall Q \in U \right)$   (13.5)
where $c_l$ is the generalized cost of link l, $c_P$ the generalized cost of path P, and $\nu_l$ the random variable used to generate the draws on link l. Denoting $x_P = \sum_{l \in P} \nu_l$, we obtain:

$\Pr(c_P + x_P \leq c_Q + x_Q\ \forall Q \in U)$   (13.6)
which has exactly the same form as a choice model. Therefore, obtaining the probability for a given path to be selected is as difficult as the original choice problem.

Gateway Algorithm

Gateway algorithms (e.g., Lombard & Church, 1993) force a generated path to traverse a given link or node. A stochastic version of such an algorithm is used by Bierlaire and Frejinger (2008), where a gateway link is drawn randomly and the shortest path is computed, based on deterministic generalized cost, from the origin to the source node of the gateway link and from its sink node to the destination. The probability of drawing a path is then the probability of drawing the gateway link. Unfortunately, not all paths in U can be generated by this procedure, as the generated paths are limited to the shortest path tree. The PCP is not satisfied.
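To make the random walk idea concrete, here is a toy sketch on a three-node acyclic network; the link-selection probabilities are simply uniform over outgoing links (an assumption made for the example, not the scheme of Frejinger et al., 2009). The path probability q(j) needed in Eq. (13.4) is the product of the link-selection probabilities along the walk.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy acyclic network: successor nodes of each node; 'D' is the destination
succ = {'O': ['A', 'B'], 'A': ['B', 'D'], 'B': ['D']}

def random_walk(origin='O', dest='D'):
    """Walk link by link from origin to destination, returning the path
    and its generation probability q(j) (product of link probabilities)."""
    node, path, q = origin, [origin], 1.0
    while node != dest:
        out = succ[node]
        p = np.full(len(out), 1.0 / len(out))  # uniform link selection (assumed)
        i = rng.choice(len(out), p=p)
        q *= p[i]
        node = out[i]
        path.append(node)
    return tuple(path), q

print([random_walk() for _ in range(3)])
```

Because every path from O to D has a strictly positive probability of being generated, the PCP holds for this toy scheme.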
13.4. Summary and Future Directions

For the estimation of route choice models, choice sets of routes need to be generated. Most algorithms proposed in the literature aim at generating consideration sets,
that is, the alternatives that a traveler actually considered for a given trip. They have proved to be incorrect, since not all observed paths are generated. An alternative approach, proposed by Frejinger et al. (2009), is to use sampling of alternatives instead, assuming that all paths belong to the choice set considered by the traveler. Although behaviorally questionable, this approach has the advantage that the chosen alternative as well as all considered alternatives are in the choice set. The aim is to obtain unbiased parameter estimates of the logit model with an operational procedure. It is assumed that the bias from not including a considered alternative in the choice set is larger than that from including many nonconsidered alternatives. It is important to note that, as for any result based on samples, the approach is asymptotically valid. Hence, the more paths in the choice sets, the better the estimation results.

In this paper, we present the underlying assumptions of the sampling approach and discuss some existing path generation algorithms. We show that few of the existing algorithms meet the requirements of the sampling approach. The main difficulty lies in the fact that the set of all paths U is very large and unknown in practice. Clearly, the list is not exhaustive; the purpose is to illustrate the difficulty of designing an algorithm that is consistent with the requirements of sampling of alternatives in the route choice context.

So far, sampling of paths has only been discussed for the estimation of route choice models. Future research should be dedicated to the sampling of alternatives for prediction.
References

Abbé, E., Bierlaire, M., & Toledo, T. (2007). Normalization and correlation of cross-nested logit models. Transportation Research Part B, 41(7), 795–808.
Azevedo, J., Costa, M. S., Madeira, J. S., & Martins, E. V. (1993). An algorithm for the ranking of shortest paths. European Journal of Operational Research, 69(1), 97–106.
Ben-Akiva, M., Bergman, M., Daly, A., & Ramaswamy, R. (1984). Modeling inter urban route choice behaviour. In: J. Vollmuller & R. Hamerslag (Eds.), Proceedings of the 9th International Symposium on Transportation and Traffic Theory (pp. 299–330). Utrecht, Netherlands: VNU Science Press.
Ben-Akiva, M., & Bierlaire, M. (1999). Discrete choice methods and their applications to short-term travel decisions. In: R. Hall (Ed.), Handbook of transportation science. Kluwer.
Ben-Akiva, M., & Bowman, J. L. (1998). Integration of an activity-based model system and a residential location model. Urban Studies, 35(7), 1131–1153.
Ben-Akiva, M., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.
Ben-Akiva, M., & Ramming, S. (1998). Lecture notes: Discrete choice models of traveler behavior in networks. Prepared for Advanced Methods for Planning and Management of Transportation Networks, Capri, Italy.
Bierlaire, M., Bolduc, D., & McFadden, D. (2008). The estimation of generalized extreme value models from choice-based samples. Transportation Research Part B, 42(4), 381–394.
Bierlaire, M., & Frejinger, E. (2008). Route choice modeling with network-free data. Transportation Research Part C, 16(2), 187–198.
Bovy, P. H. L. (2009). On modelling route choice sets in transportation networks: A synthesis. Transport Reviews, 29(1), 43–68.
Bovy, P. H. L., & Fiorenzo-Catalano, S. (2006). Stochastic route choice set generation: Behavioral and probabilistic foundations. Proceedings of the 11th International Conference on Travel Behaviour Research, Kyoto, Japan.
Burrell, J. (1968). Multipath route assignment and its application to capacity restraint. Proceedings of the 4th International Symposium on the Theory of Road and Traffic Flow.
Cascetta, E., Nuzzolo, A., Russo, F., & Vitetta, A. (1996). A modified logit route choice model overcoming path overlapping problems: Specification and some calibration results for interurban networks. In: J. B. Lesort (Ed.), Proceedings of the 13th International Symposium on Transportation and Traffic Theory, Lyon, France.
de la Barra, T., Pérez, B., & Añez, J. (1993). Multidimensional path search and assignment. Proceedings of the 21st PTRC Summer Meeting (pp. 307–319).
Dial, R. (1971). A probabilistic multipath traffic assignment algorithm which obviates path enumeration. Transportation Research, 5(2), 83–111.
Frejinger, E., & Bierlaire, M. (2007). Capturing correlation with subnetworks in route choice models. Transportation Research Part B, 41(3), 363–378.
Frejinger, E., Bierlaire, M., & Ben-Akiva, M. (2009). Sampling of alternatives for route choice modeling. Transportation Research Part B, 43(10), 984–994.
Hoogendoorn-Lanser, S. (2005). Modelling travel behaviour in multi-modal networks. Ph.D. thesis, Delft University of Technology, The Netherlands.
Horowitz, J., & Louviere, J. (1995). What is the role of consideration sets in choice modelling? International Journal of Research in Marketing, 12, 39–54.
Lombard, K., & Church, R. L. (1993). The gateway shortest path problem: Generating alternative routes for a corridor location problem. Geographical Systems, 1, 25–45.
Manski, C. F. (1977). The structure of random utility models. Theory and Decision, 8(3), 229–254.
McFadden, D. (1978). Modelling the choice of residential location. In: A. Karlqvist, L. Lundqvist, F. Snickars & J. Weibull (Eds.), Spatial interaction theory and residential location (pp. 75–96). Amsterdam: North-Holland.
Prato, C. G., & Bekhor, S. (2006). Path enumeration by using branch and bound technique. Transportation Research Record, 1985, 19–28.
Quigley, J. N. (1985). Consumer choice of dwelling neighborhood and public services. Regional Science and Urban Economics, 15, 41–63.
Ramming, M. (2002). Network knowledge and route choice. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Roberts, J. H., & Lattin, J. M. (1991). Development and testing of a model of consideration set composition. Journal of Marketing Research, 28(4), 429–440.
Roberts, J. H., & Lattin, J. M. (1997). Consideration: Review of research and prospects for future insights. Journal of Marketing Research, 34(3), 406–410.
Sheffi, Y., & Powell, W. (1982). An algorithm for the traffic assignment problem with random link costs. Networks, 12(2), 191–207.
Shocker, A. D., Ben-Akiva, M., Boccara, B., & Nedungadi, P. (1991). Consideration set influences on consumer decision-making and choice: Issues, models, and suggestions. Marketing Letters, 2(3), 181–197.
Train, K. (2003). Discrete choice methods with simulation. Cambridge University Press.
Train, K., McFadden, D., & Ben-Akiva, M. (1987). The demand for local telephone service: A fully discrete model for residential calling patterns and service choice. The RAND Journal of Economics, 18(1), 109–123.
Vovsha, P., & Bekhor, S. (1998). Link-nested logit model of route choice: Overcoming route overlapping problem. Transportation Research Record, 1645, 133–142.
Yai, T., Iwakura, S., & Morichi, S. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B, 31(3), 195–207.
PART IV ENDOGENEITY AND HETEROGENEITY
Chapter 14
Mode Choice Endogeneity in Value of Travel Time Estimation

Stefan L. Mabit and Mogens Fosgerau
Abstract

It is often found that the value of travel time (VTT) is higher for car drivers than for public transport passengers. This paper examines the possible explanation that the difference is due to a selection effect, which would imply an inability to measure the effect of a mode difference, e.g., comfort, among transport modes. We specify a model that captures the mode difference through a mode dummy and use econometric techniques that allow treatment of the mode dummy as the result of an individual choice and hence as endogenous. Using first a standard logit model, we find a large and significant difference between the VTT for bus and car. When we control for endogeneity using instruments, the mode dummy becomes smaller and only just significant. Our investigation is novel in that it allows for endogeneity in the estimation of VTT but, as in other applications using instruments, the results indicate the difficulty of finding good instrumental variables.
14.1. Introduction

Numerous studies are concerned with the estimation of the value of travel time (VTT), as it plays an important role in demand models and in the appraisal of transport projects. It is essential to recognise that the VTT is heterogeneous among the population. Some variation in the VTT may be explained by observed variables such as income and transport mode. Nevertheless, a large share of the heterogeneity in VTT is not explained by observables. In the stylised case of car versus bus, we would
expect the mode effect to cause the VTT to be higher in bus, since riding a bus is mostly less comfortable than driving a car. On the other hand, we expect individuals with a high VTT, perhaps explainable by higher income, to be more likely to choose car. For this reason we may observe a higher VTT among car drivers than among bus passengers even though the opposite may be the case for each individual.

As mentioned by Wardman (2004), it is difficult to separate such mode and user-type effects. In many empirical studies the combined impact of the two effects is that individuals in car are seen to have higher VTT than, for example, in bus. This may be an adequate representation of the actual VTT in car and bus. But for many purposes this state of affairs is not satisfactory. For example, a traffic model that includes a higher VTT in car than in bus would predict a shift in the mode share towards bus if travel times were increased equally in the two modes. Such a result would not make sense if the VTT is higher in bus than in car for each individual.

The typical way to address this issue is to specify separate models for each mode. These models are by their nature able to capture heterogeneity within a specific mode. But they suffer from a fundamental problem since they are estimated conditional on the chosen mode. This makes them good for the assessment of specific projects where the mode-specific population may reasonably be assumed to represent the population of interest. On the other hand, strong assumptions are necessary to apply mode-specific models to a population of mixed travellers. In that case, an improvement of the transport system may motivate some travellers to change modes.

Many applications have shown a gap between the VTT in car and the VTT in public transport. Discussions of how self-selection may explain the differences found in VTT among modes are found in Mabit and Fosgerau (2009) and Fosgerau, Hjorth, and Lyk-Jensen (2007). Mabit and Fosgerau (2009) use a Heckman-type selection approach to model self-selection into modes. In Fosgerau et al. (2007), VTT is estimated both in a reference mode and in an alternative mode to investigate whether differences among modes may be explained by strategic choices or self-selection. Their results favour the self-selection explanation.

This is the motivation behind the model presented in this paper. We try to get a less biased estimate of the mode effects on the VTT by using an estimator that controls for endogeneity of mode choice. We specify and estimate a model for the VTT in a population where a mode dummy is included in the specification of the VTT to capture mode effects. In our estimation we then allow for the possible endogeneity of the mode dummy. Endogeneity means that the mode dummy may be correlated with the errors in the model, since the errors reflect unobserved characteristics of travellers that may also affect mode choice. Such correlation violates the assumptions underlying the standard estimation procedures and may cause bias. Our approach for the estimation of models with endogenous explanatory variables is taken from Lewbel (2004). We compare this to models that do not take endogeneity into account. The principle behind Lewbel's approach is to transform the discrete choice into a linear model using a special regressor and then utilise methods dealing with endogenous variables in linear models.

Our model is simple, e.g., it does not take the panel dimension of the data into account.
With the present econometric state of the art this is the price we have to pay for being able to take endogeneity into account.
Future research may be able to address the shortcomings of the methodology that we apply. We believe it is of interest to start working in this direction, taking endogeneity into account, to make discrete choice models more robust.

The paper is organised as follows. In Section 14.2 we present our model for VTT together with an approach from Lewbel (2004) for the estimation of models with endogenous explanatory variables. Section 14.3.1 presents the data and Section 14.3.2 discusses the model specification. Section 14.3.3 presents the estimation results. The final Section 14.4 contains some concluding remarks.
14.2. Model Formulation

14.2.1. VTT Model

To model VTT we use the framework of discrete choice models based on random utility maximisation (see Train, 2003). Here we restrict ourselves to binary choices. Suppose that individual n faces a choice between two alternatives and that each alternative has just two attributes: cost and time. We assume that the fast alternative is chosen if and only if

$VTT_n > \dfrac{\Delta C_n}{\Delta T_n}$   (14.1)
where $VTT_n$ is the unknown VTT of individual n, while $\Delta C_n$ and $\Delta T_n$ refer to the differences in the cost attributes and time attributes between the two alternatives. If we assume a positive VTT, we may use a log transformation. The equation above then becomes

$\ln(VTT_n) - \ln\left( \dfrac{\Delta C_n}{\Delta T_n} \right) > 0$   (14.2)
This modelling approach is described in more detail in Fosgerau (2007). To make the model empirically tractable, we assume a linear parameterisation of ln(VTT) and add a random error representing measurement and specification errors. As discussed by Wardman (2004), we may split heterogeneity in VTT into mode and user-type effects. Here we include mode effects through a mode dummy, while the remaining variables, $x_1$, describe the individual and the choice context, i.e.,

$\ln(VTT_n) = \beta' x_n = \beta_1' x_{1,n} + \gamma' y_n + \varepsilon_n$   (14.3)
where $y_n$ is the mode dummy and $\varepsilon_n$ a random error term. The final model describing whether an individual chooses the fast alternative, $D_n = 1$, becomes

$D_n = I\left( \beta_1' x_{1,n} + \gamma' y_n - \ln\left( \dfrac{\Delta C_n}{\Delta T_n} \right) + \varepsilon_n > 0 \right)$   (14.4)
where $I(\cdot)$ is an indicator function equal to one if the expression in parentheses is true and zero if it is false. Our interest lies in the estimation of $\gamma$.
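As a quick illustration of the choice rule, the sketch below simulates Eqs. (14.3)-(14.4) with invented coefficients and an exogenously drawn mode dummy: the fast alternative is chosen whenever ln VTT_n exceeds the log cost-time trade-off offered by the design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

x1 = rng.normal(0.0, 1.0, n)     # an individual characteristic (synthetic)
y = rng.integers(0, 2, n)        # mode dummy (drawn exogenously here)
bid = rng.uniform(-2.0, 1.5, n)  # ln(dC/dT) offered by the SP design (synthetic)
eps = rng.logistic(0.0, 1.0, n)  # iid logistic error: a binary logit model

beta1, gamma = 0.5, 0.4                # invented parameters
ln_vtt = beta1 * x1 + gamma * y + eps  # Eq. (14.3)
D = (ln_vtt - bid > 0).astype(int)     # Eq. (14.4): 1 = fast alternative chosen
print(D.mean())                        # share choosing the fast alternative
```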
14.2.2. Estimation

Suppose first that we are willing to make the standard strong assumption underlying the logit model. That is, we assume the model in Eq. (14.4) together with the assumption that $\varepsilon$ is independent of x and iid logistically distributed over individuals. The model may then be estimated using maximum likelihood estimation. A less restrictive estimation procedure is given by the Klein–Spady estimator (see Klein & Spady, 1993). This procedure does not assume any specific distribution for $\varepsilon$. The standard maximum likelihood estimation for binary choice is based on maximisation of the likelihood function using the distribution function of $\varepsilon$. In the Klein–Spady approach this distribution is replaced by a non-parametric estimate that depends on the coefficients $\beta$.

Included in both the logit and Klein–Spady estimators is the standard but strong assumption that x is exogenous. To allow for the possibility that y and $\varepsilon$ are dependent, we use an estimation procedure presented in Lewbel (2004). Suppose the model is $D = I(\beta' x_1 + \gamma' y + v + \varepsilon > 0)$, where v and $x_1$ are exogenous variables, y is an endogenous variable, and we have instruments $z_1$ for y, i.e., additional variables that are correlated with y but not with $\varepsilon$.¹ Let $s = (y', x_1', z_1')$ denote all variables except v, and let $x = (y', x_1')$. Define v to be a very exogenous regressor (VER) if

Assumption 1. $v = g(\nu, s)$ for some function g, where g is differentiable and strictly monotonically increasing in its first element, $\nu \perp (s, \varepsilon)$, and $\nu$ is continuously distributed.

Assumption 2. $\mathrm{supp}(\beta' x + \varepsilon) \subseteq \mathrm{supp}(v|s)$.

The first condition tells us that v depends on s and an independent error. This error term $\nu$ is assumed independent of the other explanatory variables s and of $\varepsilon$, which is slightly stronger than ordinary exogeneity. The second condition tells us that the support of v is large. This is used to identify the $\beta$ parameter. We will discuss both of these conditions later when we discuss our specific VER and the assumed distribution of $\nu$. Lewbel (2004) proves the following theorem using the notation $z = (z_1', x_1')$ to denote the exogenous variables.

Theorem 1. Suppose we have the model $D = I(\beta' x + v + \varepsilon > 0)$ and an instrument z such that $E(z\varepsilon) = 0$. Assume that v is a VER, i.e., $\mathrm{supp}(\beta' x + \varepsilon) \subseteq \mathrm{supp}(v|s)$,
1. The normalisation of v's coefficient to unity is not a restriction as long as we do not assume anything about the variance of $\varepsilon$.
v = g(ν, s), where g is differentiable and strictly monotonically increasing in its first element, ν ⊥ (s, ε), and ν is continuously distributed with density function f(ν). Define T and e by

$$T = \frac{D - I(v \ge 0)}{f(\nu)} \cdot \frac{\partial g(\nu, s)}{\partial \nu} \qquad \text{and} \qquad e = T - \beta' x$$

Then E(ze) = 0.

The theorem tells us that we may estimate the parameters β by two-stage least squares, regressing T on x using z as instruments.² In this article we use the special case in which the VER is normally distributed conditional on the other exogenous variables, i.e., we apply the assumption v = b's + ν, with ν ⊥ (s, ε) and ν ~ N(0, σ²) with density function f(ν). Given this, both of the conditions for a VER are fulfilled. As an example of a suitable v, Lewbel mentions a bid determined by experimental design.

Neither of the procedures discussed above takes random effects into account. This could be a problem if the data come as a panel with several choices for each individual. Our approach may seem simplistic in that we do not take any panel dimension into account; in this respect, we have chosen to make one restriction in order to be able to relax another.
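Under the normal special case just described, v = b's + ν implies ∂g(ν, s)/∂ν = 1 and f(ν) is a normal density that can be estimated from first-stage residuals, so the estimator reduces to a simple two-step procedure. The sketch below illustrates this; it is a simplified reading of Lewbel's procedure, not the authors' implementation, and it omits the bootstrap used later for confidence intervals.

```python
import numpy as np

def ver_2sls(D, v, x1, y, z1):
    """Sketch of the VER estimator under the normal special case.

    D: 0/1 choices; v: the VER (the bid); x1: exogenous regressor;
    y: endogenous mode dummy; z1: instrument for y. All 1-d arrays.
    """
    n = len(D)
    # Step 1: linear projection of v on s = (1, y, x1, z1); residuals estimate nu.
    s = np.column_stack([np.ones(n), y, x1, z1])
    nu = v - s @ np.linalg.lstsq(s, v, rcond=None)[0]
    sig = nu.std(ddof=s.shape[1])
    f = np.exp(-0.5 * (nu / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))
    # Step 2: transform the binary outcome into the linear-model variable T.
    T = (D - (v >= 0)) / f
    # Step 3: 2SLS of T on x = (1, y, x1) with instruments z = (1, z1, x1).
    X = np.column_stack([np.ones(n), y, x1])
    Z = np.column_stack([np.ones(n), z1, x1])
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # projection of X on Z
    return np.linalg.lstsq(Xhat, T, rcond=None)[0]    # (const, gamma, beta1)
```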
14.3. Data and Estimation

14.3.1. Data

The data are from the 2004 Danish VTT study known as DATIV (see Fosgerau, Hjorth, & Lyk-Jensen, 2006). Each individual made eight stated preference (SP) choices in an unlabelled experiment referring to a current trip, i.e., car users only made car choices. Every choice was binary, with the alternatives described only by travel time and cost. There were 1050 individuals, of which 416 used bus and 634 car, giving 3109 bus observations and 4867 car observations. Table 14.1 summarises the 0–1 dummies used as explanatory variables. The attributes are summarised in Table 14.2 together with the continuous variables. Here v denotes ln(ΔC/ΔT), where ΔT is the time attribute of the first alternative minus that of the second, and likewise ΔC the cost difference. The alternatives have been arranged in such a way that alternative 1 is the fastest.
2. Two-stage least squares regression is generally applied in situations where (1) there are one or more endogenous regressors (i.e. regressors that are correlated with the error term) and (2) there are variables available, called instruments, which are correlated with the endogenous variables but not with the residuals. The instruments can then be used to extract the variation in the endogenous variables that is uncorrelated with the error term, so that the endogeneity problem is removed. For more details, see Wooldridge (2002).
Table 14.1: Descriptive statistics, 0–1 indicators, share = 1 in percent.

Variable    All    Bus    Car    Description
xarea12    42.5   50.2   37.6    Residence in larger (Danish) city
xcars      17.6   10.3   22.2    More than one car in household
xcarsin    12.0    5.3   16.3    Single with car
xgender    46.4   53.3   42.0    Female
xgrp2      27.2   13.2   36.1    Travel with household member
xwage0      7.9    4.2   10.2    No wage earner in household
Table 14.2: More descriptive statistics.

Variable         Mean    SD     Min.    Max.   Mean (bus)   Mean (car)
xage (years)     3.72    0.37    2.77    4.43      3.59         3.80
xdist (km)       3.08    1.16    0       6.18      2.49         3.45
xinc (scale)     1.15    0.56    0       2.20      1.05         1.20
xjcost (DKK)     3.04    1.15    0       6.62      2.68         3.27
xjtime (min)     3.35    0.79    1.39    6.17      3.18         3.46
v (DKK/min)      0.47    0.86   −3.00    1.72      0.54         0.43
xΔTmax (min)     8.52    4.25    0      14         8.87         8.30
The bid v is chosen as our VER. The variable was generated within the design of the SP experiment, which makes it plausible that the independence assumption holds with respect to the remaining variables and the error term in the model. Furthermore, the experimental design allowed for a wide range of bids. The variable xage is the log of age, and xdist the log of the distance in kilometres between origin and destination; if the distance was zero, xdist is set to zero. The variable xinc is the log of gross personal income.³ The variable xjtime is the log of reported travel time and xjcost the log of reported travel cost. The last variable, xΔTmax, is defined as xΔTmax = max(15 − ΔT, 0). This allows us to see how the VTT varies with the time difference up to a difference of 15 min.⁴
3. Income is a discrete variable with 9 levels: level 1 is income below 100,000 DKK, level 2 is income between 100,000 DKK and 200,000 DKK, and so on, up to level 9, which is income above 800,000 DKK. One Euro is 7.5 DKK (05/09/08).
4. Beyond 15 min, the VTT was tested to be constant with respect to the time difference.

14.3.2. Model Specification

In the specification we have to decide which variables to use as x in Eq. (14.3). Since the VTT arises from the trade-off between cost and time, we use variables related to cost and time. In addition, we use other design variables, as they describe the choice context, together with age and gender, for which the causal relation with the VTT is clear. Based on preliminary testing, we ended up with the specification below; gender was excluded as it turned out to be insignificant. The testing considered both which variables to include and whether to use a linear or logarithmic parameterisation:

$$\ln(VTT) = \beta_0 + \beta_{age} x_{age} + \beta_{jtime} x_{jtime} + \beta_{jcost} x_{jcost} + \beta_{inc} x_{inc} + \eta_t S(t) + \eta_c S(c) + \beta_{\Delta T \max} x_{\Delta T \max} + \beta_{mode} x_{mode}$$

The remaining variables S(t) and S(c) indicate how the cost and time attributes in the SP choice relate to the reference values. We define S(t) as the sign of (T₁ + T₂)/2 minus the reference travel time, and likewise for S(c). A positive coefficient on either corresponds to loss aversion concerning that attribute. We may also use these two variables to divide the choice situations into four types, depending on the reference situation: equivalent gain, equivalent loss, willingness to pay, and willingness to accept. These concepts are discussed in detail in De Borger and Fosgerau (2008), who show that it is relevant to include them in the estimation of the VTT, since the VTT differs significantly among the choice situations.

In the specification of the VER approach, we suspect that the mode dummy is endogenous while retaining the assumption that the other variables are exogenous. We then have to find suitable instruments for the mode dummy: variables that affect the mode choice but are redundant in the VTT equation conditional on the other variables. An ideal instrument would have a large influence on the selection of mode while being uncorrelated with the VTT conditional on mode. As instruments we use xarea12, xcars, xcarsin, xgender, xgrp2, xwage0, and xdist. Their explanatory power concerning mode choice is supported by visual inspection of Tables 14.1 and 14.2, and was also supported by a binary logit estimation (not reported) of mode choice on the socioeconomic variables.
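For concreteness, the sketch below constructs the design-related variables of this specification from the raw attributes. The use of absolute time and cost differences and the exact coding conventions are assumptions for illustration; the original study's coding may differ in detail.

```python
import numpy as np

def design_variables(t1, t2, c1, c2, ref_time, ref_cost):
    """Construct v, S(t), S(c) and x_dTmax from the two alternatives'
    times (t1, t2, in min), costs (c1, c2, in DKK) and the reference trip."""
    dT = np.abs(t1 - t2)                          # time difference (min)
    v = np.log(np.abs(c1 - c2) / dT)              # log bid (DKK/min)
    S_t = np.sign((t1 + t2) / 2.0 - ref_time)     # time framing: loss vs. gain
    S_c = np.sign((c1 + c2) / 2.0 - ref_cost)     # cost framing: loss vs. gain
    x_dT_max = np.maximum(15.0 - dT, 0.0)         # kink below 15 min
    return v, S_t, S_c, x_dT_max
```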
14.3.3. Model Estimation

The estimations were performed with a program written in Ox (see Doornik, 2001). The results of the logit estimation are given in Table 14.3. The estimates were found by maximum likelihood, while the confidence limits were found by bootstrapping,⁵ i.e., the model was estimated 400 times. The lower and upper bounds correspond to the boundaries of the 95% confidence interval for each coefficient.

5. Other procedures, e.g., the jackknife, were not considered. We chose bootstrapping since this was suggested by Lewbel.

Table 14.3: Logit estimation.

Explanatory variable                 Estimate   Lower bound   Upper bound
Constant                               0.61       −0.02          1.22
LN (age)                              −0.63       −0.79         −0.45
LN (journey time)                     −0.35       −0.45         −0.25
LN (journey cost)                      0.44        0.37          0.52
LN (income)                            0.60        0.51          0.71
S(t)                                   0.19        0.15          0.25
S(c)                                   0.13        0.08          0.18
Time difference less than 15 min      −0.04       −0.06         −0.03
Car dummy                              0.59        0.47          0.72
Scale                                  1.02        0.95          1.09

Number of parameters: 10. Log-likelihood at convergence: −4541. Adjusted ρ²: 0.18.

The estimation shows that all the parameters except the constant are significantly different from zero. The VTT decreases with age and increases with
income, which is reasonable; the income parameter is a direct estimate of the income elasticity, and 0.60 is a reasonable estimate. The VTT decreases with journey time and increases with journey cost, in contrast to some former studies. Many studies show the VTT increasing with journey time, which has also been supported theoretically in Jara-Diaz (1998). In our case we obtained a positive sign on the journey time parameter in preliminary estimations without journey cost, but as soon as journey cost was entered, the sign changed. This does not contradict the empirical findings supporting the positive sign, as these do not include journey cost. The two sign variables relating to the framing of the choice experiment show the expected pattern (see De Borger & Fosgerau, 2008): we have loss aversion concerning both time and cost, with the aversion being largest for time. The time difference parameter shows a rising VTT with the time difference up to a difference of 15 min (when ΔT increases, xΔTmax decreases). The mode dummy parameter is the parameter of interest. The logit model gives a positive and significant dummy, indicating that individuals who chose car on their reference trip have a higher VTT than those who chose bus, even when controlling for both socioeconomic characteristics and design variables. The final parameter is the scale of the logistic error. It is possible to estimate the scale because we restrict the coefficient on the bid v to unity. The scale is not significantly different from 1, but we keep it in the model to make the model more comparable to the Klein–Spady estimation and the VER approach.

Next we consider the estimation using the Klein–Spady estimator, for which we use a Nadaraya–Watson estimator with a normal density kernel. The results, given in Table 14.4, resemble the logit results. We may therefore conclude that even though we see a large improvement in log-likelihood, this does not affect the parameter estimates much. Since the Klein–Spady estimator cannot estimate an intercept, we only have eight estimated coefficients.

Table 14.4: Klein–Spady estimation.

Explanatory variable                 Estimate   Std. error   t-value
LN (age)                              −0.62       0.07         9.14
LN (journey time)                     −0.28       0.05         5.20
LN (journey cost)                      0.48       0.04        11.03
LN (income)                            0.54       0.04        12.23
S(t)                                   0.21       0.03         8.43
S(c)                                   0.14       0.02         6.58
Time difference less than 15 min      −0.03       0.01         4.06
Car dummy                              0.65       0.06        11.03

Number of parameters: 8. Log-likelihood at convergence: −4516.

Finally, we consider the estimation using the VER approach. The first step is to regress the bid v on all other variables. This is simply a way of obtaining a linear projection of v on the other variables, and the regression coefficients have no interpretation; the only concern is whether the normality assumption is appropriate. We investigated the normality assumption using a QQ plot of the residuals, which did not indicate problems. The estimation results are shown in Table 14.5, where the lower and upper bounds refer to the 95% confidence interval found by bootstrapping.

Table 14.5: Estimation using a VER.

Explanatory variable                 Estimate   Lower bound   Upper bound
Constant                              −0.02       −1.74          1.31
LN (age)                              −0.44       −0.70         −0.13
LN (journey time)                     −0.52       −1.04         −0.07
LN (journey cost)                      0.43        0.11          0.86
LN (income)                            0.54        0.34          0.79
S(t)                                   0.16       −0.02          0.34
S(c)                                   0.22        0.07          0.42
Time difference less than 15 min       0.03       −0.03          0.13
Car dummy                              0.41        0.06          0.77

The parameters for age, journey time, journey cost, and income are significant and similar to the logit estimates. Concerning the design-related variables, the parameter for S(t) is now insignificant, i.e., loss aversion has become insignificant for time. The car dummy becomes smaller but is still positive and significant. The estimate of 0.41 is clearly outside the 95% confidence interval of the logit estimate, which has a lower bound of 0.47. This indicates a possible endogeneity problem, with the logit estimate being biased. On the other hand, the confidence intervals of the VER approach are so large that we cannot reject the logit estimate. Finally, it should be noted that this procedure does not estimate a scale coefficient; there are therefore nine coefficients in Table 14.5.

As the Lewbel estimation does not reject the logit model, the results may indicate that we do not have sufficiently good instruments. As a test of this, we estimated the VER approach without instruments. This gave the results in Table 14.6, which are almost identical to those from the estimation with instruments, leading us to conclude that our instruments are weak.

Table 14.6: Estimation using a VER without instruments.

Explanatory variable                 Estimate   Lower bound   Upper bound
Constant                              −0.18       −1.92          1.26
LN (age)                              −0.43       −0.64         −0.19
LN (journey time)                     −0.50       −1.03         −0.04
LN (journey cost)                      0.42        0.11          0.83
LN (income)                            0.56        0.35          0.83
S(t)                                   0.17       −0.01          0.36
S(c)                                   0.23        0.06          0.44
Time difference less than 15 min       0.04       −0.03          0.14
Car dummy                              0.39        0.11          0.65

At the same time, this estimation indicates that we have a problem with our VER, as the only difference between the VER approach without instruments and the Klein–Spady estimator is the stronger assumptions concerning the VER. This suggests that even though the normality assumption for our VER was reasonable, the variable does not have a wide enough range to satisfy Assumption 2 in Section 14.2.2. Using the estimates, we evaluated the VTT under the different estimation procedures. The resulting values reflect the fact that the car dummies do not change significantly under the more general procedures; they therefore add nothing to the discussion above and are not reported.
14.4. Summary and Conclusions

We have formulated a model where the VTT depends on the chosen mode, car or bus, as well as on unobserved heterogeneity. Self-selection into modes is influenced by the VTT, which may cause the mode dummy to be correlated with the unobserved heterogeneity. This is an endogeneity problem, which may cause the parameters of the model, in particular the parameter corresponding to the mode dummy, to be biased.

Three models are estimated. First, we estimate an ordinary logit model. Second, we estimate a model using the Klein–Spady estimator, where the error term is not
assumed to follow a particular distribution. The third model uses a VER to transform the discrete choice model into a linear model, so that instrumental variable methods (two-stage least squares) may be applied to control for endogeneity of the mode dummy.

Comparison of the logit estimates with the Klein–Spady estimates does not indicate that the distribution of the error term is not logistic. The VER approach does, however, indicate that there is an endogeneity problem. The mode dummy estimate using the VER approach falls outside the confidence interval from the logit estimation, an indication that there is in fact some dependence between the mode dummy and the error term. With the logit model, the mode dummy is large and very significant. With the VER approach, the mode dummy decreases, but it is still positive and (just) significant. So it would seem that the VER approach reduces the difference between the VTTs in the different modes, as would be expected if mode choice is influenced by the VTT. However, the VER model is not able to reject the logit parameter estimates. We suspect that our main difficulty is the lack of good instruments. Good instruments are correlated with the mode dummy but not with the residual. Variables that influence mode choice but do not have a direct causal effect on the VTT satisfy this criterion; such variables are hard to find. To further check the validity of the model assumptions, we compared the Klein–Spady estimates to VER estimates obtained without using the instruments. This comparison indicates that our VER may not satisfy the support condition.

Generally, our results are consistent with an endogeneity bias in the mode choice dummy in the VTT. We may therefore retain the hypothesis that self-selection causes the VTT to be higher for car drivers than for bus passengers, while it remains possible that any given individual has the opposite relation between the VTTs. The kind of endogeneity that we are considering is not the one typically considered in the literature, since here it affects the distribution of a parameter. An important contribution of the present paper is therefore that we are now able to deal with this situation. We have demonstrated and tested a promising method for dealing with this kind of endogeneity in binary discrete choice models. However, it seems that results could be improved if better instruments could be found. A wider range of trade-off VTTs would reduce the problems that seem to be present regarding the support condition for the VER approach. An interesting and relevant extension would be to investigate whether the VER approach could be developed for the mixed logit model and whether it could account for panel data.⁶ This would make the VER approach applicable to more of the models used in practice.
6. In Lewbel (2004) a fixed effects approach is presented which is suitable for panel data. Unfortunately this does not allow for the estimation of a mode dummy.
Acknowledgement

The authors would like to thank two anonymous referees and Katrine Hjort for their valuable comments.
References

De Borger, B., & Fosgerau, M. (2008). The trade-off between money and travel time: A test of the theory of reference-dependent preferences. Journal of Urban Economics, 64(1), 101–115.
Doornik, J. (2001). Ox: An object-oriented matrix language. London: Timberlake Consultants Press.
Fosgerau, M. (2007). Using nonparametrics to specify a model to measure the value of travel time. Transportation Research Part A, 41, 842–856.
Fosgerau, M., Hjorth, K., & Lyk-Jensen, S. V. (2006). An integrated approach to the estimation of the value of travel time. European Transport Conference, Strasbourg, 2006.
Fosgerau, M., Hjorth, K., & Lyk-Jensen, S. V. (2007). Between mode differences in the value of travel time: Self-selection or strategic behaviour? European Transport Conference, Leiden, 2007.
Jara-Diaz, S. R. (1998). Time and income in travel choice: Towards a microeconomic activity-based theoretical framework. In: T. Gärling, T. Laitia & K. Westin (Eds), Theoretical foundations of travel choice modelling (pp. 51–73). Oxford: Pergamon.
Klein, R. W., & Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica, 61(2), 387–421.
Lewbel, A. (2004). Simple estimators for hard problems: Endogeneity in discrete choice related models. Working Paper, available at: http://www2.bc.edu/lewbel/
Mabit, S. L., & Fosgerau, M. (2009). Controlling for sample selection in the estimation of the value of travel time. In: R. Kitamura, T. Yoshii & T. Yamamoto (Eds), The expanding sphere of travel behaviour research (Chapter 32, pp. 703–723). Bingley, UK: Emerald Group Publishing Limited.
Train, K. (2003). Discrete choice methods with simulation. New York, NY: Cambridge University Press.
Wardman, M. (2004). Public transport values of time. Transport Policy, 11, 363–377.
Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.
Chapter 15
Accommodating Coefficient Outliers in Discrete Choice Modelling: A Comparison of Discrete and Continuous Mixing Approaches Danny Campbell, Stephane Hess, Riccardo Scarpa and John M. Rose
Abstract

The presence of respondents with apparently extreme sensitivities in choice data may have an important influence on model results, yet their role is rarely assessed or even explored. Irrespective of whether such outliers are due to genuine preference expressions, their presence suggests that specifications relying on preference heterogeneity may be more appropriate. In this paper, we compare the potential of discrete and continuous mixture distributions in identifying and accommodating extreme coefficient values. To test our methodology, we use five stated preference datasets (four simulated and one real). The real data were collected to estimate the existence value of rare and endangered fish species in Ireland.
15.1. Introduction

The analysis of discrete choice data is nowadays routinely conducted by means of specifications which accommodate random taste variation (e.g. Revelt & Train, 1998; Train, 1998; McFadden & Train, 2000; Hensher & Greene, 2003). This growing interest in understanding and explaining preference heterogeneity has yet to produce a systematic investigation of the role of extreme heterogeneity, where the term 'extreme' refers to the values of taste coefficient outliers. In the present paper, we distinguish this type of outlier from choice prediction outliers (e.g. see Ben-Akiva
& Lerman, 1985) and from work by Daly and Zachary (1975) in the context of choice prediction for binary logit and probit models.¹

1. For an analysis and comparison of choice prediction and coefficient outliers, see Campbell and Hess (2009).

While the role of extreme heterogeneity is rarely assessed or even explored, it is well known that observations in the tails of the distributions of attribute coefficients may exert undue influence on estimates (Kanninen, 1995; Lewbel, 1997) and ultimately on the estimator's numerical performance. They may also affect the estimated asymptotic standard errors, with consequences for inference, the interpretation of statistical significance and hypothesis testing (Louviere, Hensher, & Swait, 2003). Finally, observations of outlying coefficient values are behaviourally relevant because they may provide evidence that not all individuals conform to established conceptions of economic behaviour.

The above concerns motivate the present study on how to appropriately identify and accommodate outlying taste intensities in discrete choice analysis. In fact, when dealing with outliers, their identification represents only the first step. After ruling out the possibility of outliers being due to coding errors, the second step relates to how to adequately handle their presence in the sample. Assessing the legitimacy of outliers requires a potentially controversial judgement and needs to be evaluated on a case-by-case basis. The option of removing them from the active sample used in estimation is prone to criticism. Therefore, whenever possible, improvements to the model specification should first be sought so that outliers can be accommodated (Ben-Akiva & Lerman, 1985; Sælensminde, 2006).

Irrespective of whether or not extreme values are genuine preference statements, their presence suggests that model specifications relying on preference heterogeneity are more appropriate. Interacting observed respondent-specific variables with the attributes provides a first step towards uncovering the 'observed' component of heterogeneity. 'Unobserved' preference heterogeneity, on the other hand, can be accommodated by treating preferences as varying according to some distributional assumption. However, the presence of extreme coefficient values raises potential problems when deciding on an appropriate mixing distribution (see Hess & Axhausen, 2005).

We illustrate the impact of a small number of respondents with extreme sensitivities on sample-level estimates of heterogeneity with the following example. Consider a population in which 95% of respondents have a cost coefficient (β) distributed N(−0.5, 0.1). Assume further that the remaining 5% of respondents form two equally sized outlying groups, one with very low and the other with very high cost sensitivity. The low-cost-sensitivity group has the cost coefficient distributed N(−0.1, 0.05), while for the high-cost-sensitivity group it is distributed N(−1.5, 0.1). This leads to a true distribution with three separate modes. However, erroneously fitting a single normal distribution to this population implies N(−0.516, 0.196), which exaggerates the density of respondents in the intervals between the highest mode and the two smaller modes of the outlying groups, as shown in Figure 15.1.

Figure 15.1: Example distribution affected by extreme sensitivities. [Density of β: true distribution vs. fitted normal distribution.]

These problems are also likely to arise with other typically used continuous distributions, which, with a few exceptions, are unimodal and make strict shape assumptions. It is important to note that the degree of heterogeneity (as defined by the standard deviation) in the two distributions in the example is identical; the difference lies in the shape of the distribution and in the fact that the fitted distribution incorrectly assigns a large amount of probability between the middle mode and the two outlying groups.

For the above reasons, in datasets where coefficient outliers are expected, the use of discrete mixing distributions may provide a more realistic and flexible representation of the underlying form of unobserved preference heterogeneity. Indeed, in a discrete mixture environment, each coefficient takes on a finite number of different values, with no a priori assumptions on the shape of the resulting distribution.

In this paper, we assess the role of extreme coefficient values in discrete choice analysis and the relative aptness of continuous and discrete mixture distributions in identifying and accommodating them. Within our discrete mixture approach, we give particular emphasis to explaining the extreme lower and upper elements of the distribution (i.e. the 'outliers'); to do so, we specify three mass points for each coefficient in estimation. We compare this modelling approach with a standard MNL model and a mixed MNL (MMNL) model, where continuous normal distributions are assumed. In addition, we use the means of the conditional distributions (from the continuous MMNL) of each respondent in the sample to assign them to either a lower, middle or upper class, and estimate models with separate coefficients for the three classes. To test our methodology, we use four simulated datasets and a stated choice dataset
designed to determine the existence value of a number of rare and endangered fish species in Ireland. The remainder of the paper is structured as follows: Section 15.2 describes our econometric approach, Section 15.3 outlines the simulated data and its results, our empirical case study is presented in Section 15.4 along with the results and Section 15.5 provides a discussion and conclusion.
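The numerical example given in the introduction is easy to reproduce. The sketch below simulates the three-group population (normal parameters in mean/standard deviation form, as quoted above) and shows that a single fitted normal recovers roughly N(−0.516, 0.196).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# 95% of respondents: N(-0.5, 0.1); two 2.5% outlying groups:
# N(-0.1, 0.05) and N(-1.5, 0.1).
group = rng.choice(3, size=n, p=[0.95, 0.025, 0.025])
means = np.array([-0.5, -0.1, -1.5])
sds = np.array([0.1, 0.05, 0.1])
beta = rng.normal(means[group], sds[group])

# Erroneously fitting a single normal spreads density between the three
# true modes; the mean and sd come out near -0.516 and 0.196.
print(beta.mean(), beta.std())
```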
15.2. Methodology

Starting with the conventional specification of utility, where respondents are indexed by n, alternatives by i, choice occasions by t, and the vector of attributes is represented by x, we have:

$$U_{nit} = \beta_n' x_{nit} + \varepsilon_{nit} \qquad (15.1)$$

where β are parameters to be estimated and ε is an iid Gumbel distributed error term with constant variance π²/6; we assume a linear-in-parameters specification of the indirect utility. Since we are interested in identifying respondents with outlying values of taste intensity, we treat the vector β_n as random across respondents. The vector β_n is fixed over the sequence of t choices by each respondent. Denoting the respondent's chosen alternative in choice occasion t as y_nt and the sequence of choices over the T_n choice occasions as y_n = ⟨y_n1, y_n2, …, y_nT_n⟩, then, conditional on β_n, the probability of respondent n's sequence of choices is a product of logit formulas:

$$\mathrm{Prob}(y_n \mid \beta_n, x_n) = \prod_{t=1}^{T_n} \frac{\exp(V_{n y_{nt} t}(\beta_n))}{\sum_{j=1}^{J} \exp(V_{njt}(\beta_n))} \qquad (15.2)$$

where V_nit = β_n' x_nit. Further, we denote attributes by k and seek to identify and accommodate the extreme lower and upper (i.e. outlying) values of β_nk. To assess the aptness of the mixing distributions used to uncover these outliers, we compare discrete and continuous distributions.
15.2.1. Discrete Mixture Approach

In a discrete mixture (henceforth abbreviated as DM) context, the number of possible values for the parameter coefficients is finite (see, e.g., Hess, Bierlaire, & Polak, 2007; Train, 2008). To address coefficient outliers, we place particular emphasis on explaining the extreme lower and upper elements of the distribution. In estimation, we impose each of the K random parameters to take three mass points, the first (β_nk,L) and third (β_nk,H) of which are respectively associated with lower and upper outlying parameters, whilst the middle support point (β_nk,M) represents the
intermediate and most common preference intensities (i.e. different from the outlying minorities). The mass points for the lower, middle and upper classes are associated with the probabilities p_k,L, p_k,M and p_k,H, respectively. The overall choice probability is given by:

$$\mathrm{Prob}(y_n \mid \beta_n, x_n) = \sum_{s=1}^{S} \omega_s \left[ \prod_{t=1}^{T_n} \frac{\exp(V_{n y_{nt} t}(\beta_{n,s}))}{\sum_{j=1}^{J} \exp(V_{njt}(\beta_{n,s}))} \right] \qquad (15.3)$$

where s = 1, …, S indexes all possible combinations of values for the K taste coefficients given their three values each (i.e. S = 3^K). As an example, with two coefficients, say α and γ, we would have S = 9; with s = 1 relating to the case where the lower values are used for both coefficients, we would have ω₁ = p_αL p_γL and β₁ = (α_L, γ_L). Unlike the approach used in Train (2008), where the densities are estimated for fixed parameters, we fix the values of p_k,L, p_k,M and p_k,H and estimate the coefficients. Imposing an ordering restriction (i.e. β_nk,L ≤ β_nk,M ≤ β_nk,H) to ensure monotonicity provides a means of identifying coefficients at different points in the distribution. Given our interest in the extreme lower and upper ends of the distribution, we set the values of p_k,L and p_k,H to be relatively small.
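A minimal sketch of the panel probability in Eq. (15.3) is given below, enumerating the S = 3^K support-point combinations for one respondent. The function and the toy inputs (which reuse the Table 15.1 simulation settings) are illustrative, not the estimation code used in the paper.

```python
import numpy as np
from itertools import product

def dm_sequence_prob(X, chosen, points, probs):
    """Eq. (15.3): X is (T, J, K) attributes, chosen is (T,) chosen
    alternatives, points and probs are (K, 3) mass points and class
    probabilities for each coefficient."""
    T, J, K = X.shape
    total = 0.0
    for combo in product(range(3), repeat=K):     # all S = 3^K combinations
        beta = np.array([points[k, c] for k, c in enumerate(combo)])
        omega = np.prod([probs[k, c] for k, c in enumerate(combo)])
        V = X @ beta                              # (T, J) systematic utilities
        P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)
        total += omega * np.prod(P[np.arange(T), chosen])
    return total

# Toy usage: K = 2 coefficients, T = 2 binary tasks.
X = np.random.default_rng(2).normal(size=(2, 2, 2))
points = np.array([[-0.8, -0.15, -0.1], [-4.0, -0.75, -0.5]])
probs = np.array([[0.025, 0.95, 0.025], [0.025, 0.95, 0.025]])
print(dm_sequence_prob(X, np.array([0, 1]), points, probs))
```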
15.2.2. Continuous Mixture Approach

When using continuous mixing distributions, we assume that the K random parameters β_nk are normally distributed (i.e. β_nk ~ N(μ_k, σ_k²)). Dropping the subscript k to unclutter notation, we note that as β_n is random, the unconditional choice probability is obtained by integrating the logit probability L(y_n | β_n, x_n) over the range of values of β_n, using as weights the draws from the assumed distribution f(β | μ, Ω), where μ and Ω are the vectors of population hyper-parameters:

$$\mathrm{Prob}(y_n \mid \mu, \Omega, x_n) = \int_{\beta_n} L(y_n \mid \beta_n, x_n)\, f(\beta_n \mid \mu, \Omega)\, d\beta_n \qquad (15.4)$$
In this paper, we use a two-stage estimation approach. Firstly, we estimate the population vector of means μ̂ and the associated variance–covariance matrix Ω̂. Secondly, for each of the K random coefficients we derive μ̂_n conditional on each respondent's pattern of observed choices y_n1, y_n2, …, y_nT_n and choice-task attributes x_n1, x_n2, …, x_nT_n. We then use the means of these respondent-specific distributions to assign each respondent either to an outlier class (lower or upper) or to the main middle class. While this disregards the fact that the conditional parameters themselves follow a random distribution around this mean, the approach nevertheless gives us some information about the likely position of a respondent on the distribution. It further allows a rudimentary assessment of the presence of extreme sensitivities and of their range. The work in this paper is
exploratory and is only a first step towards an appropriate treatment of coefficient outliers. An important area for future work is to use the entire conditional distribution in assigning respondents to outlier groups, that is, incorporating the uncertainty in the conditional distribution into the allocation process.

On the basis of this class partition, we then proceed with a second-stage estimation to obtain separate parameters for the three classes (i.e. β_nk,L, β_nk,M and β_nk,H). This is achieved by specifying each attribute parameter as a function of respondent-specific dummy variables:

$$U_{nit} = (d_{n,L} \beta_{n,L})' x_{nit} + (d_{n,M} \beta_{n,M})' x_{nit} + (d_{n,H} \beta_{n,H})' x_{nit} + \varepsilon_{nit} \qquad (15.5)$$

where d_n,L, d_n,M and d_n,H are dummy variables, specified independently for each coefficient as follows: d_n,L = 1 if the respondent's conditional mean is a lower outlier and 0 otherwise; d_n,M = 1 if the respondent's conditional mean is not an outlier and 0 otherwise; and d_n,H = 1 if the respondent's conditional mean is an upper outlier and 0 otherwise.
In estimation, the choice probabilities for the MMNL models are approximated by simulating the sample log-likelihood with pseudo-random draws. We recognise that the goal of MMNL estimation is to obtain population estimates of the distribution parameters, yet we propose a method to assess their sensitivity to a differential treatment of those few sample respondents whose panels of choices reveal anomalous conditional distributions of taste. The method is useful for several reasons. Firstly, it should help us identify respondents who are more likely to have extreme sensitivities. Secondly, the second-stage estimation allows us to evaluate model performance under a specific treatment of these outlying sensitivities. Thirdly, it allows us to judge whether a special treatment for these respondents leads to a significant change (and, arguably, correction) in the shape of the retrieved distribution compared to the base specification. Finally, as we will see later in the paper, it helps us assess issues of confounding under the DM approach.
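As a sketch of the first stage of this two-stage approach, the code below computes respondent-level conditional means for a single normally distributed coefficient with a single attribute, then assigns respondents in the tails to the outlier classes. The array shapes, number of draws and percentile cut-offs are illustrative assumptions.

```python
import numpy as np

def conditional_means(X, chosen, mu, sigma, R=2000, seed=3):
    """X: (N, T, J) attribute levels; chosen: (N, T) chosen alternatives;
    mu, sigma: estimated population mean and standard deviation."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(mu, sigma, R)                   # draws of beta
    V = X[..., None] * draws                           # (N, T, J, R) utilities
    P = np.exp(V) / np.exp(V).sum(axis=2, keepdims=True)
    N, T = chosen.shape
    L = np.prod(P[np.arange(N)[:, None], np.arange(T)[None, :], chosen, :],
                axis=1)                                # (N, R) sequence likelihoods
    w = L / L.sum(axis=1, keepdims=True)               # posterior weights
    mu_n = w @ draws                                   # conditional means
    lo, hi = np.quantile(mu_n, [0.025, 0.975])         # 2.5% / 95% / 2.5% split
    klass = np.where(mu_n < lo, 0, np.where(mu_n > hi, 2, 1))
    return mu_n, klass
```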
15.3. Simulated Data Experiments

This section discusses the results of four simulated data experiments. Each experiment makes use of the same underlying data, but with different assumptions on the choice data generating processes (DGPs). The underlying data are based on an orthogonal design with two alternatives, each described by travel time (TT) and travel cost (TC). A sample of 720 respondents was used, with 10 choice sets per respondent. In each case study, six models were estimated. The first four models make use of the data in its original form, with the following specifications:

MNL: standard multinomial logit (MNL) model, with two marginal utility parameters, β_TT,M and β_TC,M;
DM1: DM model, with three support points each for the TT and TC coefficients, identified as β_T,L, β_T,M and β_T,H, with associated probabilities p_T,L, p_T,M and p_T,H fixed to 0.025, 0.95 and 0.025, respectively, and an order constraint imposed on β_T,L, β_T,M and β_T,H;
DM2: like DM1, but with estimated values for p_T,L, p_T,M and p_T,H, to show the effect of restricting the probabilities for the three support points in DM1; and
MMNL: standard MMNL model, with univariate normal distributions used for the two coefficients, leading to the estimation of mean coefficients and standard deviations.²

The two DM models and the MMNL model were estimated with consideration of the repeated choice nature of the data, with variation in tastes across respondents but not across choices for the same respondent. After estimation of the MMNL model, individual-specific parameter distributions were generated for the two marginal utility coefficients, and the means of these distributions were used to assign each individual to one of three classes for each of the two coefficients, namely a middle class with a weight of at least 95% and lower and upper classes with maximum weights of 2.5%.³ On the basis of this, two additional models were estimated:

MNLC: MNL model making use of separate coefficients in the lower (L), middle (M) and upper (U) groups; and
MMNLC: MMNL model, with normal distributions used for the separate coefficients in the L, M and U groups.
2. The use of univariate normal distributions is justified given the assumption of independence between coefficients in the DGPs. 3. The use of strict 2.5%, 95% and 2.5% splits was not always possible due to some respondents having the same mean values for the conditional distributions.
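As a hedged sketch, the DGP of the first case study (three discrete coefficient groups, no within-group heterogeneity) can be generated as follows. The group probabilities and coefficient values follow the simulation settings reported in Table 15.1; the attribute ranges are assumptions for illustration.

```python
import numpy as np

def simulate_case_study_one(n_resp=720, n_tasks=10, seed=4):
    rng = np.random.default_rng(seed)
    # Assign each respondent to the lower, middle or upper group.
    g = rng.choice(3, n_resp, p=[0.025, 0.95, 0.025])
    b_tt = np.array([-0.80, -0.15, -0.10])[g]
    b_tc = np.array([-4.00, -0.75, -0.50])[g]
    # Two alternatives described by travel time and cost (assumed ranges).
    tt = rng.uniform(10, 60, (n_resp, n_tasks, 2))
    tc = rng.uniform(1, 10, (n_resp, n_tasks, 2))
    V = b_tt[:, None, None] * tt + b_tc[:, None, None] * tc
    p1 = 1.0 / (1.0 + np.exp(V[..., 1] - V[..., 0]))   # logit prob of alt 1
    choice = (rng.uniform(size=p1.shape) > p1).astype(int)
    return tt, tc, choice
```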
15.3.1. Simulated Data Case Study I

Table 15.1 reports the details of the DGP and the estimation results for the first simulated dataset. The DGP for this experiment splits respondents into three groups, with no additional heterogeneity within groups. For each attribute, all respondents are simulated as having one of three distinct values, representing, in increasing order: (a) the lower tail outlier, (b) a non-outlier parameter value in the middle class and (c) the upper tail outlier.

Looking at the output from DM1, we find that, with the possible exception of the lower tail outlier for TC, the means are sufficiently close to those used to simulate the data. DM2 also retrieves the correct means and, importantly, the estimated values of p are not dissimilar to those used to generate the data, with no major change in model fit. Interestingly, while the means of the random parameters in the MMNL model are close to the support points specified for the middle group, the standard deviations are significantly different from zero. This indicates that the MMNL model retrieves random taste heterogeneity as a result of the presence of extreme sensitivities. With the use of standard continuous distributions, this would lead to incorrect patterns of heterogeneity in between the three distinct groups used in the DGP. Furthermore, in terms of model fit, the MMNL does not perform as well as any of the DM models.

The two models with conditional segmentation provide superior fits. Both retrieve correct mean values, but the problems associated with the lower outlier for the TC attribute remain. We speculate this might be due to the small size of this group and the fact that the value used in the data generation was quite extreme. While preference heterogeneity is retrieved for this parameter by the MMNLC specification, it is reassuring that the coefficient of variation is very small. Notwithstanding this idiosyncrasy (and the data construct), the fact that the estimated standard deviations are insignificant when outliers are accommodated in estimation provides further evidence that the presence of just a handful of respondents with lower (N = 15 and N = 6 for TT and TC, respectively) and upper (N = 18 for both TT and TC) outliers can result in retrieving significant patterns of preference heterogeneity which may be misinterpreted depending on the distributional assumptions.
Table 15.1: Results for first study on simulated data.

Simulation settings (p, b, s): TT: lower (0.025, −0.800, 0.000); middle (0.950, −0.150, 0.000); upper (0.025, −0.100, 0.000). TC: lower (0.025, −4.000, 0.000); middle (0.950, −0.750, 0.000); upper (0.025, −0.500, 0.000).

Respondents assigned to the lower/middle/upper classes in the conditional segmentation: TT 15/687/18; TC 6/696/18.

Model     Final LL    Par.   Adj. ρ²
MNL       −3259.04     2     0.347
DM1       −3158.32     6     0.366
DM2       −3156.68    10     0.366
MMNL      −3188.65     4     0.360
MNLC      −2912.88     6     0.415
MMNLC     −2912.87    12     0.414

15.3.2. Simulated Data Case Study II

The DGP and results for the second study on simulated data are given in Table 15.2. These data were simulated using the same mean values and weights for outliers as in the first case study, but with heterogeneity around the means in the middle and most numerous (95%) group of respondents.

Table 15.2: Results for second study on simulated data.

Simulation settings (p, b, s): TT: lower (0.025, −0.800, 0.000); middle (0.950, −0.150, 0.050); upper (0.025, −0.100, 0.000). TC: lower (0.025, −4.000, 0.000); middle (0.950, −0.750, 0.300); upper (0.025, −0.500, 0.000).

Respondents assigned to the lower/middle/upper classes in the conditional segmentation: TT 18/684/18; TC 7/694/19.

Model     Final LL    Par.   Adj. ρ²
MNL       −3477.02     2     0.303
DM1       −3350.06     6     0.328
DM2       −3327.20    10     0.331
MMNL      −3346.16     4     0.329
MNLC      −3153.36     6     0.370
MMNLC     −3128.85    12     0.371

Both DM models have superior fits to the standard MNL model and, with the exception of the lower outlier for TT, the retrieved mean values are relatively close to those of the true DGP. However, DM2 picks up more outliers, especially for the lower TC parameter, which is potential evidence of confounding with the heterogeneity used for the middle group in the DGP. Additionally, DM2 outperforms DM1 in terms of model fit. We find that the MMNL model has an inferior model fit vis-à-vis DM2. It retrieves a significant amount of random taste heterogeneity, but the confounding between the heterogeneity in the middle group and the two outlying groups is evident in the inflated degree of heterogeneity. Again, the models with conditional segmentation provide superior model fits. While some problems remain with the lower outlier for TC, both models adequately retrieve the mean values. Furthermore, the MMNLC specification correctly uncovers random variation for the middle group only, with the derived standard deviations being very close to those used in the DGP.
15.3.3. Simulated Data Case Study III

In Table 15.3, we report the DGP and results for the third case study. This dataset was simulated on a similar basis to the second case study, but with less extreme values for the lower outliers and with heterogeneity in all three groups (i.e. essentially a mixture of normals). Aside from recurring issues associated with the lower outliers, the DM models recover support points that are not dissimilar to those used in the true DGP. Both DM models fit better than the standard MNL model, but DM2 does marginally better, a result of its more flexible specification, which allows it to deal with the heterogeneity within groups (again illustrating the confounding between heterogeneity within groups and the very presence of the extreme groups). In terms of model fit, while the MMNL is found to outperform DM1, it is inferior to DM2. The means retrieved under the MMNL model are of a comparable magnitude to those specified for the middle group. While the MMNL specification correctly retrieves random variation, it is clearly once again affected by confounding between the two types of heterogeneity. The performance of the models based on conditional segmentation is again superior. The means recovered from both models are relatively close to those of the true DGP. However, we note some inconsistencies in the standard deviations for the outlier groups, although the heterogeneity in the middle group is retrieved very accurately.
Table 15.3: Results for third study on simulated data.

Simulation settings (p, b, s): TT: lower (0.025, −0.400, 0.200); middle (0.950, −0.150, 0.050); upper (0.025, −0.100, 0.050). TC: lower (0.025, −2.000, 0.500); middle (0.950, −0.750, 0.300); upper (0.025, −0.500, 0.200).

Respondents assigned to the lower/middle/upper classes in the conditional segmentation: TT 15/687/18; TC 6/696/18.

Model     Final LL    Par.   Adj. ρ²
MNL       −3429.81     2     0.312
DM1       −3334.14     6     0.331
DM2       −3315.82    10     0.334
MMNL      −3330.08     4     0.332
MNLC      −3221.64     6     0.353
MMNLC     −3191.68    12     0.358

15.3.4. Simulated Data Case Study IV

Table 15.4 reports the DGP and results for the fourth case study. Here, the same means and spreads are used for the middle group as in the second and third case studies, but no explicit outliers are defined. Our motivation is to examine the risk of confounding between outliers and heterogeneity under the DM approach.

Table 15.4: Results for fourth study on simulated data.

Simulation settings (p, b, s): TT: single class (1.000, −0.150, 0.050). TC: single class (1.000, −0.750, 0.300). No outlying groups in the DGP.

Respondents assigned to the lower/middle/upper classes in the conditional segmentation: TT 18/684/18; TC 11/690/19.

Model     Final LL    Par.   Adj. ρ²
MNL       −3254.90     2     0.347
DM1       −3222.44     6     0.353
DM2       −3220.30    10     0.353
MMNL      −3219.07     4     0.354
MNLC      −3013.12     6     0.395
MMNLC     −3012.82    12     0.394

Both DM models have superior fits to the MNL model and retrieve some of the heterogeneity used in data generation, with fits almost as high as the MMNL model. We note that the values found for the outliers in DM1 are quite extreme, especially the lower outliers, given the values used for the normal distribution, showing some risk of confounding.⁴ Here, the estimation of the DM2 model is an important check for analysts to carry out: this model produces a visibly less extreme distribution of sensitivities. The MMNL model recovers means and standard deviations that are very similar to those used in the true DGP. Superior and essentially equivalent fits are achieved under MNLC and MMNLC, but these models are also found to produce relatively extreme values given the true DGP, along with no random variation in the MMNLC model. These findings further highlight the potential issues with confounding and the difficulty of disentangling extreme sensitivities from the more commonly assumed type of heterogeneity. Indeed, we acknowledge that the MMNLC model in this case suggests the presence of extreme outliers rather than standard random taste heterogeneity. In this context, using the two-stage approach with different sizes for the outlying groups may be a helpful way of assessing this type of heterogeneity.
15.4. Empirical Case Study

To illustrate the proposed methodology on an empirical case study, we use stated preference data collected to estimate the existence value of a number of rare and endangered fish species in the Lough Melvin Catchment in Ireland. Lough Melvin is a freshwater lake in the north-west of Ireland which straddles the border between the Republic of Ireland and Northern Ireland. With a unique population of native fish species, the Lough Melvin Catchment has an internationally important conservation status. Lough Melvin and its associated river system support the only remaining population of Arctic char (AC) in Northern Ireland and contain Atlantic salmon (AS) and three genetically distinct populations of brown trout known as ferox (F), gillaroo (G) and sonaghan (S). Since the habitat of these fish populations is recognised as being vulnerable, there is a need to assess the extent to which the general public supports the prevention of their extinction.

The discrete choice experiment consisted of a panel of 16 repeated choice sets. Each choice set outlined three possible outcomes. The first two outcomes, labelled 'Option A' and 'Option B', described the conservation status of each of the fish species after the implementation of two experimentally designed conservation schemes. At the end of these schemes, the fish species would either be 'Conserved' or 'Extinct'. While a particular scheme described under 'Option A' or 'Option B' may have been unable to prevent some of the fish species from becoming extinct, both schemes ensured against the extinction of all fish species (i.e. at least one species was conserved under each scheme). The final outcome, labelled 'Do Nothing', showed the expected outcome if nothing was done to protect the fish species. In this case, respondents were informed that all five fish species would become extinct.
4. As an example, the true lower 2.5 percentile point for the TT coefficient is −0.248.
'Option A' and 'Option B' were both described to respondents as available at a positive cost (CST). The payment vehicle used was the amount that they would personally have to pay per year, through an increase in their income tax and/or value added tax contributions, to implement the scheme. The 'Do Nothing' (or status quo (SQ)) option had zero cost to the respondent.

The population of interest was the adult population of the Republic of Ireland and Northern Ireland. The study adopted a stratified random sample to reflect the geographic distribution of the adult population, the approximate rural/urban split, the approximate socio-economic status of the regional population, and the approximate gender and age profiles of the populations within both jurisdictions. A final sample of 624 usable responses was obtained which, with each respondent answering 16 choice tasks, resulted in 9984 observations for model estimation. The survey was administered via a 'paper-and-pencil' questionnaire by professional interviewers and the responses were manually entered into a data file.⁵
5. While we acknowledge that this method of data collection can increase the risk of incorrect coding, every attempt was made to ensure that there were no errors in recording and entering respondents' answers. We are therefore confident that any extreme heterogeneity uncovered in estimation is not due to coding errors.

15.4.1. Discrete Mixtures Modelling Results

Table 15.5 reports the standard MNL model, with marginal utility parameters for the six attributes and an alternative-specific constant for the SQ option. This model is compared against three DM models, where the probabilities of the support points for the lower and upper coefficients are fixed. As a test of sensitivity, a series of models with different predefined densities is estimated:

DM1%: lower and upper outliers fixed to 0.01;
DM2.5%: lower and upper outliers fixed to 0.025; and
DM5%: lower and upper outliers fixed to 0.05.

Table 15.5: MNL and discrete mixtures models for the empirical data.

                       MNL                DM1%               DM2.5%             DM5%
                   est.     |t|       est.      |t|      est.      |t|      est.      |t|
pL                   –                 0.010              0.025              0.050
bAC,L                –                −1.150    6.99     −0.999    7.24     −0.906    6.72
bAS,L                –                −1.370    3.60     −1.000    4.89     −0.792    6.20
bF,L                 –                −1.560    6.53     −1.210    6.25     −0.920    5.79
bG,L                 –                −1.020    6.25     −0.767    4.45     −0.553    3.59
bS,L                 –                −1.760    7.36     −1.460    5.96     −1.030    5.11
bCST,L               –                −0.205   14.30     −0.205   14.22     −0.209   15.52
bSQ,L                –               −28.400   37.98    −28.300   38.61    −28.200   41.03
pM                   –                 0.980              0.950              0.900
bAC,M              0.304   13.05       0.332    8.90      0.344    8.49      0.359    8.26
bAS,M              0.642   24.29       0.681   12.62      0.693   11.77      0.711   11.32
bF,M               0.297   12.57       0.382   11.31      0.400   10.59      0.428    9.88
bG,M               0.355   15.32       0.526   14.24      0.555   13.43      0.582   12.64
bS,M               0.508   20.11       0.484   10.15      0.504    9.49      0.529    9.01
bCST,M            −0.018   19.83      −0.013    5.46     −0.014    5.58     −0.015    5.89
bSQ,M             −0.974   20.59      −2.340   14.05     −2.470    9.25     −2.810    8.15
pU                   –                 0.010              0.025              0.050
bAC,U                –                 2.960   12.88      2.760   10.43      2.530    9.22
bAS,U                –                 4.370   12.83      4.100   16.06      3.920   19.13
bF,U                 –                 4.150    5.59      1.960    1.13      1.350    3.48
bG,U                 –                 1.860   16.24      1.810   14.80      1.720   13.28
bS,U                 –                 4.620   16.23      4.460   16.19      4.350   16.24
bCST,U               –                 0.063    8.03      0.062    8.14      0.059    8.21
bSQ,U                –                 1.890    5.36      1.500    3.54      1.030    3.45

Final LL        −8727.64            −6930.94           −6658.60           −6442.65
Par.                   7                  21                 21                 21
adj. ρ²            0.204               0.366              0.391              0.411

Abbreviations: est.: estimate; Par.: parameters.

The coefficient estimates for the middle group are close to those of the MNL model. We find that the DM models produce extreme values at the lower and upper support points, possibly providing evidence of coefficient outliers. We remark, however, in line with expectations, that as one moves from DM1% to DM5% the support points of the outliers become less extreme, although for some of the parameters the support points remain relatively close to the original values. Additionally, as we increase the size of the outlying groups, we observe an increase in model performance. This suggests that the models with larger groups offer a better representation of the heterogeneity, possibly caused by a large degree of heterogeneity in the data. The outlying values found for the lower and upper groups would, however, suggest that some of the retrieved heterogeneity is caused by
extreme sensitivities rather than the more standard type of heterogeneity. Indeed, the DM models could be seen to be picking up both coefficient outliers and more standard type of preference heterogeneity, and as the size of the outlying groups increases, more of the standard variation is explained, resulting in an improvement in model fit. A further point needs attention. All DM models suggest the presence of marginal estimates that are intuitively inconsistent with rational economic behaviour. Indeed, the marginal utility of increases in species conservation and cost would be expected to be positive and negative, respectively, but this is not always the case. We note that these counter intuitive preferences are directly derived from the data, which we argue,
Table 15.6: Base model for conditional distributions.

              Mean                   Std. dev.
              est.       |t-ratio|   est.       |t-ratio|
AC            0.525      10.68       0.763      12.53
AS            1.029      14.58       1.229      16.96
F             0.513      11.65       0.516      9.17
G             0.641      14.38       0.592      8.18
S             0.935      12.65       1.44       16.88
CST           0.042      12.41       0.076      17.38
SQ            3.588      12.68       2.948      14.12
Final LL      6180.64
Par.          35
adj. r²       0.436

Abbreviations: est.: estimate; Par.: parameters.
15.4.2. MMNL Modelling Results

In estimation, we replicate the approach used for the simulated data and estimate a panel MMNL model in which all parameters are assumed to be randomly distributed. After testing various distributional assumptions, we settled on multivariate normal distributions.6 The MMNL model (reported in Table 15.6) is estimated by simulating the log-likelihood with 500 pseudo-random draws. The estimates for the means of the random parameters are statistically significant and of the expected sign. The MMNL model provides evidence of heterogeneity in the taste intensities across the sample of respondents for the random parameters. We also observe that the MMNL model gives a better model fit than the DM models, even after taking into account the higher number of parameters. Individual-specific parameter distributions of the MMNL model were retrieved and the kernel densities for the conditional means were produced, where Figure 15.2 illustrates this process for the gillaroo (G) attribute.7 As observed earlier in this section, using a DM approach puts the analyst at risk of confounding between outliers and heterogeneity.
6. While we recognise that we did not allow for correlation amongst the coefficients in the DM models, our decision to allow for correlated coefficients in the MMNL specifications was based on the fact that they afforded much superior model fits, from which we could retrieve more reliable conditional distributions. In contrast, allowing for correlation amongst the coefficients in the DM models is not straightforward.
7. Distributions for the remaining attributes are available from the lead author.
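As a rough illustration of the simulation step described above, the sketch below computes a panel MMNL log-likelihood with independent normal coefficients in R; the model actually estimated uses correlated multivariate normals, which would require a Cholesky factor instead of the element-wise draws. All data names are hypothetical and carried over from the previous sketch.

```r
## Sketch of a simulated panel MMNL log-likelihood with independent normal
## coefficients; mu and sig are the means and spreads, R the number of draws.
mmnl_loglik <- function(mu, sig, X, chosen, task, resp, R = 500) {
  ll <- 0
  for (n in unique(resp)) {
    rows <- resp == n
    p_r  <- numeric(R)
    for (r in 1:R) {
      b  <- rnorm(length(mu), mean = mu, sd = abs(sig))  # respondent-level draw
      eu <- exp(as.vector(X[rows, , drop = FALSE] %*% b))
      pr <- (eu / ave(eu, task[rows], FUN = sum))[chosen[rows] == 1]
      p_r[r] <- prod(pr)                                 # sequence prob. for this draw
    }
    ll <- ll + log(mean(p_r))                            # simulated likelihood, respondent n
  }
  ll   # in practice the draws would be held fixed across optimisation iterations
}
```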
Figure 15.2: Conditional distributions of μ̂nG and the support points retrieved under the DM models. (The figure plots the kernel density of the conditional means μ̂n on the left axis, with the probability π of the DM support points on a secondary right axis; the horizontal axis spans roughly −1.0 to 2.0.)
For example, if the true distribution has a high variance, the mass points representing the lower and upper limits of the distribution may be more extreme than the true limits. It is, hence, of interest to compare the support points of the DM distributions against the continuous distribution. We therefore include a secondary y-axis in Figure 15.2 showing the probability of the support points of the three DM models. To illustrate, the rightmost vertical mark has a value of 1.02 and a probability of 0.01, representing the estimated lower outlier for the G attribute in DM1%. Examination of the location of the lower and upper support points highlights that they generally lie within the extreme tails of the distribution. This is an important finding, as it indicates that our DM modelling approach identifies extreme values that are consistent with those approximated using continuous normal distributions. We do note, however, that in a few cases the DM support points are more extreme than those produced under the continuous distribution.

The means of the individual-specific parameter distributions obtained from the MMNL model were used to assign respondents to either a lower, middle or upper class. On this basis, further MMNL models were estimated, with separate means and spreads for the lower, middle and upper classes. Again, to tease out the influence of coefficient outliers at various levels, we estimated three models, each with a different proportion of outliers. For consistency, the percentiles used for classifying outlying respondents corresponded to those used in the DM models in Table 15.5.
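The respondent-level conditional means used for this classification are posterior expectations of the coefficients given each person's observed choice sequence; a sketch under the same hypothetical data layout as in the previous snippets:

```r
## Sketch: conditional (posterior) mean of the coefficients for one respondent,
## E[beta | choices] = sum_r beta_r P(choices | beta_r) / sum_r P(choices | beta_r).
cond_mean <- function(mu, sig, X, chosen, task, rows, R = 1000) {
  num <- 0; den <- 0
  for (r in 1:R) {
    b  <- rnorm(length(mu), mean = mu, sd = abs(sig))
    eu <- exp(as.vector(X[rows, , drop = FALSE] %*% b))
    pr <- prod((eu / ave(eu, task[rows], FUN = sum))[chosen[rows] == 1])
    num <- num + b * pr                      # probability-weighted draws
    den <- den + pr
  }
  num / den
}
```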
In the conditioned MMNL models, all taste parameters were specified as independently normally distributed, and the choice probabilities were simulated in estimation using 500 pseudo-random draws.8 Results from these models are presented in Table 15.7. In line with a priori expectations, we find that the magnitudes of the estimated coefficients in the outlying classes generally become less extreme as class sizes increase, as the larger classes increasingly capture heterogeneity from the middle group. Nevertheless, we find further evidence that these are quite extreme and, in some cases, theoretically inconsistent. Parameter estimates for the middle class are relatively stable across the six models. There is little evidence of heterogeneity amongst outliers, especially in the lower group. But, as expected, the degree of heterogeneity in taste intensities amongst outliers does increase slightly as the outlying groups increase in size, again because these larger classes increasingly capture heterogeneity from the middle group. Correspondingly, there is a general decline in the coefficients of variation amongst the middle group as one moves from MMNLC,1% to MMNLC,5%, indicating relatively smaller heterogeneity.

We find that using the conditional means to assign separate parameters for the lower, middle and upper classes leads to better model fits, which is to be expected. Further improvements in model fit are achieved as the size of the lower and upper groups increases, where this is again a result of capturing more heterogeneity. An examination of the conditional means obtained from the estimates in Table 15.7 reveals that, as more respondents are assigned to outlier groups, the distributions become increasingly different from those produced by the base MMNL model. As expected, we observe that the density of the outliers increases as more respondents are assigned as having extreme values. Associated with this finding is a general move from unimodal distributions under MMNLC,1% to increasingly apparent tri-modal distributions under MMNLC,5%. Moreover, as the proportion of outliers increases, the position of the outliers becomes considerably less extreme, as they converge towards the median. While the location of the conditional means for the non-outlying group does not vary much, it is clearly evident that the peaks become much more pronounced, given that the two outlying groups now capture more of the heterogeneity. This is consistent with the earlier finding that the extent of heterogeneity declines for this group as the outlying groups get larger. Importantly, the proportion of the distribution in the intuitively inconsistent domain diminishes as one moves from the conditional means obtained from MMNLC,1% to those from MMNLC,5%. These findings further highlight the important role that a few extreme coefficient values can have on the degree of unobserved heterogeneity retrieved using MMNL models.
8. Multivariate distributions were not used in these models since three coefficients were estimated for each attribute and it was not possible to allow for correlation between these.
Table 15.7: Models taking conditional distributions into account.

              MMNLC,1%               MMNLC,2.5%             MMNLC,5%
              est.       |t-ratio|   est.       |t-ratio|   est.       |t-ratio|
βAC,L         1.513      3.86        1.43       2.57        1.203      7.22
σAC,L         0.205      0.38        0.087      0.07        0.034      0.15
βAS,L         1.429      3.37        1.444      3.91        1.245      7
σAS,L         0          <0.01       0.097      0.07        0.076      0.34
βF,L          0.598      1.86        0.252      0.78        0.794      4.77
σF,L          0.202      0.46        0.045      0.04        0.243      0.9
βG,L          0.53       1.26        0.018      0.07        0.64       3.22
σG,L          0.467      0.79        0.236      0.22        0.445      1.11
βS,L          2.105      4.12        1.985      4.69        1.611      8.3
σS,L          0.157      0.25        0.012      <0.01       0.012      0.04
βCST,L        0.593      5.09        0.433      6.04        0.404      11.38
σCST,L        0.183      1.7         0.119      0.97        0.067      1.55
βSQ,L         56.028     <0.01       27.059     <0.01       26.393     <0.01
σSQ,L         0          <0.01       0          <0.01       0          <0.01
βAC,M         0.467      10.33       0.397      8.87        0.486      11.32
σAC,M         0.611      10.54       0.581      11.05       0.463      7.69
βAS,M         1.073      15.9        1.43       19.75       1.032      18.1
σAS,M         1.179      16.8        1.363      18.79       0.808      13.22
βF,M          0.484      11.59       0.425      9.75        0.477      11.75
σF,M          0.42       6.32        0.48       7.22        0.328      4.73
βG,M          0.643      14.85       0.474      7.79        0.641      15.68
σG,M          0.45       6.8         0.82       11.09       0.289      3.42
βS,M          0.856      13.05       0.718      11.93       0.823      14.24
σS,M          1.249      18.15       1.043      18.85       0.925      15.5
βCST,M        0.04       11.27       0.049      15.33       0.041      12.75
σCST,M        0.081      20.17       0.075      23.87       0.062      20.19
βSQ,M         3.054      16.52       3.187      22.44       3.2        17.66
σSQ,M         2.176      16.13       1.533      16.85       1.667      12.67
βAC,U         2.259      4.78        2.285      6.07        2.213      9.31
σAC,U         0.092      0.18        0.06       0.02        0.584      2.05
βAS,U         8.723      2.54        5.452      9.24        7.935      6.34
σAS,U         0.904      0.22        0.001      <0.01       2.828      2.9
βF,U          1.736      4.18        1.85       4.27        1.763      8.99
σF,U          0.056      0.13        0.066      0.03        0.014      0.05
βG,U          2.357      5.01        2.496      5.27        2.195      10.64
σG,U          0.052      0.1         0.033      0.04        0.08       0.29
βS,U          69.631     <0.01       30.212     <0.01       8.076      5.1
σS,U          0          <0.01       0          <0.01       1.536      1.04
βCST,U        1.799      7.21        1.042      8.53        0.079      7.04
σCST,U        0.065      0.31        2.412      34.45       0.008      0.36
βSQ,U         3.569      7.32        3.172      6.05        2.217      10.6
σSQ,U         0.019      0.03        0.067      0.03        0.348      0.89
Final LL      6060.55                5926.44                5445.38
Par.          42                     42                     42
adj. r²       0.446                  0.459                  0.503

Abbreviations: est.: estimate; Par.: parameters.
15.5. Conclusion

In this paper, we build on a surprisingly sparse literature regarding the role of coefficient outliers in discrete choice data. We explore two separate ways of representing random taste heterogeneity. In our first approach, we assume that random taste heterogeneity for each coefficient can be adequately described by three discrete points: one for the majority of agents and the other two for a minority with extreme values. To implement this partitioning, we constrain the point estimates to be monotonic, so that we can derive lower, middle and upper estimates of the distribution. We further impose that the densities of the lower and upper classes are relatively small so that we capture the extreme lower and upper elements of the distribution. This approach is then compared with the common continuous mixed logit assumption where random parameters are continuously distributed. From these models, we retrieve the individual-specific distributions and use the conditional means to assign the values of three dummy variables denoting either lower, middle or upper class values. By specifying the attribute parameters as a function of these dummy variables, we estimate separate coefficients for the three classes.

To test our methodology, we first use four simulated datasets and then apply it to a stated choice dataset designed to estimate the existence value of a number of rare and endangered fish species in Ireland. In our simulated datasets we find that the DM models retrieve the correct mean values and densities of the outliers used in the DGPs for the simulations. We also find that in the presence of extreme coefficient values the MMNL models generally overstate the degree of random taste heterogeneity in the middle group. In other words, while the models may retrieve the correct overall degree of heterogeneity, they misrepresent the shape of the distribution and assign too much weight away from a central group. This finding suggests some caution when selecting an appropriate mixing distribution for datasets believed to include extreme coefficient values.

Throughout the paper, we acknowledge the fact that there may be issues of confounding between outliers and more standard heterogeneity, and that our DM approach is potentially more vulnerable to such confounding. Indeed, in our fourth simulated dataset, we demonstrate the difficulty of separating outliers from remaining heterogeneity. Nevertheless, we find that in our empirical application the outlying support points of the DM models are generally consistent with the outlying conditional means derived from the MMNL. This is reassuring, as it provides some evidence that the DM approach is relatively robust to this potential source of confounding in datasets where there are a handful of respondents with extreme sensitivities.

In applying the approach to our empirical dataset, we find further evidence that extreme coefficient outliers may lead to misleading findings in terms of the patterns of heterogeneity. We also find that accommodating these outliers in our empirical application reduces the proportion of extreme and intuitively inconsistent point estimates. Whilst deciding on the legitimacy of coefficient outliers is a difficult judgement and is ultimately an empirical issue to be evaluated case-by-case, the fact that some of them are found to be behaviourally inconsistent with a priori expectations suggests
some caution for analysts engaged in discrete choice modelling. Indeed, the findings indicate the importance of testing for the presence of extreme coefficient values, and their detection and evaluation should become part of the standard course of action in practice.
References

Ben-Akiva, M., & Lerman, S. R. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.
Campbell, D., & Hess, S. (2009). Outlying sensitivities in discrete choice data: Causes, consequences and remedies. European Transport Conference, Amsterdam.
Daly, A. J., & Zachary, S. (1975). Commuters' value of time. Report T55, Local Government Operational Research Unit, Reading, UK.
Hensher, D. A., & Greene, W. H. (2003). The mixed logit model: The state of practice. Transportation, 30, 133–176.
Hess, S., & Axhausen, K. W. (2005). Distributional assumptions in the representation of random taste heterogeneity. Proceedings of the 5th Swiss Transport Research Conference, Zurich, Switzerland.
Hess, S., Bierlaire, M., & Polak, J. W. (2007). A systematic comparison of continuous and discrete mixture models. European Transport, 37, 35–61.
Kanninen, B. J. (1995). Bias in discrete response contingent valuation. Journal of Environmental Economics and Management, 28, 114–125.
Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice models. Econometric Theory, 13, 32–51.
Louviere, J. J., Hensher, D. A., & Swait, J. D. (2003). Stated choice methods: Analysis and application. Cambridge, UK: Cambridge University Press.
McFadden, D., & Train, K. E. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.
Revelt, D., & Train, K. E. (2000). Mixed logit with repeated choices: Households' choices of appliance efficiency level. Review of Economics and Statistics, 80, 647–657.
Sælensminde, K. (2006). Causes and consequences of lexicographic choices in stated choice studies. Ecological Economics, 59, 331–340.
Train, K. E. (1998). Recreation demand models with taste differences over people. Land Economics, 74, 185–194.
Train, K. E. (2008). EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling, 1, 40–69.
Chapter 16
Addressing Endogeneity in Discrete Choice Models: Assessing Control-Function and Latent-Variable Methods Cristian Angelo Guevara and Moshe Ben-Akiva
Abstract

Endogeneity or nonorthogonality in discrete choice models occurs when the systematic part of the utility is correlated with the error term. Under this misspecification, the model's estimators are inconsistent. When endogeneity occurs at the level of each observation, the principal technique used to address it is the control-function method, where a function that accounts for the endogenous part of the error term is constructed and then included as an additional variable in the choice model. Alternatively, the latent-variable method can also address endogeneity. In this case, the omitted quality attribute is considered a latent variable and modeled as a function of observed variables and/or measured through indicators. The link between the control-function and the latent-variable methods in the correction for endogeneity has not been established in previous work. This paper analyzes the similarities and differences between a set of variations of both methods, establishes the formal link between them in the correction for endogeneity, and illustrates their properties using a Monte Carlo experiment. The paper concludes with suggestions for future lines of research in this area.
16.1. Introduction

Demand models allow us to understand and forecast the behavior of individuals (or agents). This requires a range of assumptions regarding the behavior of the
individuals as a function of available information and about the statistical properties of the information itself. In order to have consistent estimators of the model parameters, one critical assumption is that the observed model variables are uncorrelated with unobserved ones. The violation of this assumption is defined as endogeneity. The analysis of methods to correct for endogeneity in discrete choice models is an area of current development in econometrics (Louviere et al., 2005). One technique is the control-function method, which is particularly suitable when endogeneity occurs at the level of each observation. The motivation of this paper is to explore possible enhancements of the two-stage control-function method applied by Guevara and Ben-Akiva (2006), in light of the latent-variable approach.

The next section describes the problem of endogeneity in discrete choice models. Then the basics of the control-function and the latent-variable methods are surveyed. Afterwards, the properties of both methods are contrasted, the equivalences and dissimilarities are studied, and a formal link between the two is established. In Section 16.5, the proposed formulations are illustrated and compared using synthetic data. The final section summarizes the principal findings, draws conclusions and suggests future lines of research in this area.
16.2. The Problem: Endogeneity in Discrete Choice Models

When modeling the behavioral response to a certain choice, if all the attributes that are relevant for the individuals are observed, the model estimators will be consistent; that is, they will be as close to the true model parameters as needed, with probability equal to one, if the sample size is large enough. In turn, if some relevant attributes are not observed, the estimators will be consistent if and only if those unobserved attributes are not correlated with the observed ones.

Consider, for example, modeling the choice made by individual n among combinations i of car makes and models. Utility U_in perceived by the individual is linear in the following attributes: price, size, fuel efficiency, safety features and whether the car is red or not (color). There is also an additive error term e_in, which is independent across alternatives and individuals.

U_in = ASC_i + θ_p price_in + θ_s size_in + θ_e efficiency_in + θ_sf safety_in + θ_c color_in + e_in,  with ε_in ≡ θ_c color_in + e_in    (16.1)
If the price, size, efficiency and safety can be perfectly measured, but the car's color is omitted, the model's error will be ε_in instead of e_in, as shown in Eq. (16.1). The omission of this variable will not compromise the consistency of the estimators if and only if car color does not affect the observed attributes (price, size, efficiency and safety). Nonetheless, since the variance of ε_in will be larger than the variance of e_in, the scale of the estimated model will be smaller than the true scale. Instead, if car prices depend on color, and color is omitted from the model, the crucial exogeneity assumption will be broken. For instance, if red cars become
more popular, retailers will adjust the price of red cars upwards to maximize profit. An external analyst will then observe that, for seemingly equal cars, which only differ in price (and unobserved color), some buyers choose the more expensive alternative. The analyst will then erroneously conclude that θ_p is smaller in absolute value than it really is, or even that it is positive, which would make the modeling effort completely worthless.

The price variable is frequently at the core of the endogeneity problem in a demand function because of the omission of correlated quality attributes. Formally, if it is assumed that the error term has zero mean, price is said to be endogenous if E(p′ε) ≠ 0, where E(·) denotes the expected value, and p and ε are vectors in which the price variable and the error term are stacked correspondingly across alternatives i and individuals n. Beyond the omission of attributes, endogeneity in discrete choice models may also be caused by errors in variables (Walker, Li, Srinivansan, & Bolduc, 2008), simultaneous determination or sample selection bias (Vella, 1992; Eklöf & Karlsson, 1997; Mabit & Fosgerau, 2009).
16.3. The Methods Under Study

16.3.1. The Control-Function Method

The control-function method is a procedure used to address endogeneity in econometric models; for a complete description of this method in the case of discrete choice models, see Train (2009). This method is especially suitable when the endogeneity occurs at the level of each observation. This is the case, for example, of residential location choice models. The theoretical basis of the method is described in Heckman (1978), Hausman (1978), Petrin and Train (2005) and Blundell and Powell (2004).

The basic idea is to construct a variable or control function that accounts for the non-zero expected value of the error term, conditional on the observed attributes. The endogeneity problem is then solved by adding this control function as an explanatory variable in the choice utility. For example, consider that only one observed attribute p is correlated with the error term of the utility ε and that a proper set of instrumental variables Z is available. To be a proper instrument, the elements in Z must be correlated with p, but at the same time should not be correlated with ε. Consider now the ordinary least squares (OLS) regression of p as a function of Z, which will henceforth be labeled the 'price equation'. Since the fitted errors of this OLS regression are orthogonal to Z by construction (Greene, 2003), and since Z is not endogenous, it follows that the fitted errors of the price equation will capture all of p that is correlated with ε. Thus, if the fitted errors are included as auxiliary variables in the utility function, they will solve the endogeneity problem.

Finding appropriate instrumental variables is cumbersome. First, the instruments need to be sufficiently correlated with the endogenous variable, but the actual degree of
correlation required is difficult to determine. Second, the instruments need to be uncorrelated with the error term of the utility, which is unobserved. Formal tests for the validity of instruments exist for linear models. Guevara and Ben-Akiva (2008) developed and applied an adaptation of one of these tests for logit models.
16.3.2. The Latent-Variable Method

The latent-variable method is a technique used to account for unobserved or latent variables in econometric models. The basic idea of the method is to explicitly include the latent variables in the model specification, and then to integrate them out in the calculation of the likelihood of each observation. The problem is that the distribution of the latent variables is unknown. However, this distribution can be depicted under some conditions by accounting for the causality of each latent variable with other latent and measurable variables. For example, in the case of a choice model, the random utility of each alternative in the choice set is latent since it cannot be measured. Instead, we observe the choices made by the individuals, choices that depend on the utilities of the different alternatives available to them. In this case, the observed choices are said to be indicators explained by the utilities (latent variables) through a measurement equation (the choice behavior).

As an alternative to measurement equations or in addition to them, the distribution of the latent variables can also be depicted by structural equations. In a choice model, the specification of the utility function is an example of a structural equation, where the utility (a latent variable) is written as a function of measurable attributes of each alternative and, potentially, also other latent variables such as unobserved quality attributes. The latent-variable method can be estimated either sequentially or simultaneously. For a complete description of this method in discrete choice models see Walker and Ben-Akiva (2002).
16.4. Combining Control-Function and Latent-Variable Methods to Correct for Endogeneity

16.4.1. Issues to be Addressed

The latent-variable and the control-function methods were originally conceived for different purposes. Unlike the latent-variable method, the control-function method was specifically created to address endogeneity. The control-function method focuses on the statistical properties of the variables, while the latent-variable approach is primarily behaviorally based. The purpose of this section is to analyze similarities and differences between both methods that may be of relevance in the correction of endogeneity.
There are two issues to be addressed. First, the control-function method is estimated in two stages, whereas the latent-variable method is generally estimated simultaneously. This is important because the number of stages may impact the efficiency of the estimators. Section 16.4.2 proposes an approach to estimate the control-function method in one stage based on the full information maximum likelihood (FIML) framework (Greene, 2003). Second, it is not clear what the counterpart is, if any, in the control-function approach for the components of the latent-variable method (e.g., the structural equations, the measurement equations and the latent variables). To address this, Section 16.4.3 discusses the conditions under which the latent-variable approach may be used directly to correct for endogeneity. Then, Sections 16.4.4 and 16.4.5 propose two ways to combine the latent-variable and the control-function methods to correct for endogeneity in discrete choice models.
16.4.2. Simultaneous Estimation of the Control-Function Method

The simultaneous estimation of the control-function method can be solved using an FIML approach, where the likelihoods of both the choice model and the price equation are maximized simultaneously. Since data and parameters are shared by both models, this simultaneous procedure should increase efficiency. However, this potential increase in efficiency comes at the cost of making stronger assumptions about the joint distribution of the error terms in both models.

To explain the procedure proposed in this section, we first present the two-stage control-function method applied by Guevara and Ben-Akiva (2006). Consider that an individual n perceives a certain utility U_in for alternative i, which is a linear function of a set of attributes X_in and the price p_in; θ and θ_p are parameters and ε_in is an error term. In this model, utility U_in is a latent variable. Instead, we observe the choice y_in, a variable which is equal to one if alternative i is chosen and zero otherwise. Assuming that individuals choose the alternative with the largest utility within the choice set C_n, the choice model can be formulated as shown in Eq. (16.2).

U_in = θ_p p_in + X′_in θ + ε_in
y_in = 1[U_in = max_{j∈C_n} U_jn]    (16.2)
If the error term ε is distributed extreme value (0, μ), the resulting choice model is the logit (Ben-Akiva & Lerman, 1985). The likelihood of an observation (L_n^Choice) is equal to Eq. (16.3), where i corresponds to the chosen alternative.

L_n^Choice = exp{μ(θ_p p_in + X′_in θ)} / Σ_{j∈C_n} exp{μ(θ_p p_jn + X′_jn θ)}    (16.3)
The estimation of the model parameters by the maximization of Eq. (16.3) allows us to retrieve only μθ and μθ_p, but not μ, θ or θ_p separately. Therefore, some
normalization is required for identification. This is usually done by setting the scale coefficient μ equal to one. Under this normalization, the scale μ disappears from Eq. (16.3).

Consider now that price is endogenous because it is correlated with some unobserved variable that is relevant to the choice process. If the coefficients are estimated by maximizing the likelihood function shown in Eq. (16.3), the estimators will not be consistent. However, if, as in Eq. (16.4), p_in can be written as a linear function of exogenous instruments Z_in and an error term ν_in, the endogeneity problem can be solved if ν_in is uncorrelated with Z_in, X_in and ε_in.

p_in = Z′_in β + ν_in    (16.4)
The first stage of the control-function method consists of the estimation of the price equation using OLS. These estimators are then used to calculate the fitted errors of the price equation ν̂, which are then used as auxiliary variables that enter the utility function in the second stage.

p_in = Z′_in β + ν_in  →(OLS)  ν̂_in
U_in = θ_p p_in + X′_in θ + θ_ν ν̂_in + e_in
y_in = 1[U_in = max_{j∈C_n} U_jn]    (16.5)
One way to transform the two-stage control-function method into a one-stage procedure results from making an additional assumption on the distribution of the error term of the price equation. If the error ν_in in Eq. (16.4) is distributed normal (0, σ_ν²I), the likelihood of the price equation for individual n (L_n^peq) will correspond to Eq. (16.6).

L_n^peq = Π_{j∈C_n} (1/√(2πσ_ν²)) exp{−(1/(2σ_ν²))(p_jn − Z′_jn β)²}    (16.6)
This means that if the price equation were estimated using maximum likelihood, the result would be exactly the same as if it were estimated using OLS. This can be noted in the fact that, if we take the log of Eq. (16.6), the result is the negative of the sum of the squared errors plus multiplicative and additive scalars, which play no role in the likelihood maximization process. It follows directly that, if the errors in the price equation are normally distributed, the control-function method can be estimated simultaneously by considering the product of the likelihood of the price equation shown in Eq. (16.6) and the likelihood of the choice model shown in Eq. (16.3) as the objective function to be maximized. As shown in Eq. (16.7), in this case the error of the price equation should be included as an additional variable in the utility function.

L_n^FIML = [exp{θ_p p_in + X′_in θ + θ_ν(p_in − Z′_in β)} / Σ_{j∈C_n} exp{θ_p p_jn + X′_jn θ + θ_ν(p_jn − Z′_jn β)}] × Π_{j∈C_n} (1/√(2πσ_ν²)) exp{−(1/(2σ_ν²))(p_jn − Z′_jn β)²}    (16.7)
This one-stage procedure can be seen as what is called FIML in the econometrics literature since, instead of maximizing the likelihood of only one problem conditional on the estimated values of the previous stage, the likelihood to be maximized in this case simultaneously considers the information of both models.
16.4.3. Using the Latent-Variable Method to Correct for Endogeneity

In this section, we study how the latent-variable method can be used to correct for endogeneity in discrete choice models. For expositional purposes, we consider a concrete example: the choice of commuting mode. However, the statements can be straightforwardly extended to any type of discrete choice model. Comfort, safety and reliability are quality attributes which are difficult to measure, have an effect on mode choice, and are correlated with cost and travel time. If these omitted quality attributes are independent across individuals, the endogeneity problem caused by their omission can be solved by including alternative (modal) specific constants. However, comfort, safety and reliability might depend, in general, on the distance traveled or on the origin and destination pair of each trip, which would make the use of alternative (modal) specific constants insufficient.

The latent-variable approach can be used to correct for endogeneity in this case. This is achieved by explicitly including the omitted quality attribute as a latent variable q_in in the utility function. Then, the problem is how to depict the distribution of q_in. As discussed before, this can be done using structural and measurement equations. Consider for example the case of comfort. Even though it is the difference in comfort among modes that affects the choice, comfort cannot be measured by itself. However, we know that comfort can be explained as a mixture of observable variables O_in such as passenger density, the age and model of the vehicle, the travel time (which is in X_in), the price or even other variables such as safety or status, which may also be latent. We can therefore postulate a structural equation with comfort on the left-hand side and O_in on the right-hand side, which includes an additive error as well.

The distribution of the latent variable may also be depicted if individuals provide information or indicators I_in on, for example, their declared degree of satisfaction with each mode or their qualitative appreciation of the comfort, safety and reliability experienced in each mode. In this case, a measurement equation can be formulated in which the indicator I_in on the left-hand side is explained by the latent variable q_in and a set of variables M_in, which may include the individual's characteristics, some components of X_in, the price and/or other latent variables as well. The latent-variable model considering the structural and the measurement equations can be expressed as shown in Eq. (16.8), where φ, α and α_q are parameters and ω_in and γ_in are, respectively, the error terms of the structural and the
measurement equations.

U_in = θ_p p_in + X′_in θ + θ_q q_in + ε_in
y_in = 1[U_in = max_{j∈C_n} U_jn]
q_in = O′_in φ + ω_in
I_in = M′_in α + α_q q_in + γ_in    (16.8)

The likelihood of the choice model is calculated by integrating out the latent variable q_in, conditional on the structural and the measurement equations. The maximization of this likelihood will result in consistent estimation of the parameters under some conditions on the error terms. First, to avoid endogeneity in the measurement and structural equations, it is required that E(q′γ) = 0, E(M′_k γ) = 0 and E(O′_l ω) = 0, where k and l correspond to each component of M_in and O_in, respectively, stacked across alternatives and individuals. To avoid endogeneity due to simultaneous determination between the structural equation and the measurement equation, it is required that E(ω′γ) = 0. Finally, to avoid endogeneity due to simultaneous determination between the structural equation and the utility, it is required that E(ω′ε) = 0. Note that it is not necessary to impose the condition E(ε′γ) = 0; that is, the error term of the choice model may also explain the realization of the indicators without compromising the consistency of the whole model. This type of correlation will not generate endogeneity due to simultaneous determination because the latent variable in the measurement equation is on the right-hand side.

In the next section we study the correspondence between this latent-variable formulation and the control-function method. The crucial issue is the identification of the link between the elements q_in, I_in, M_in and O_in and the instrumental variables and other components of the control-function method. As explained in Sections 16.4.4 and 16.4.5, this correspondence can be stated either as a two-stage or as a one-stage procedure.
16.4.4. Two-Stage Latent-Variable/Control-Function Method

The link between the latent-variable and the control-function methods can be established as a two-stage procedure in the following way. Recall that in the control-function method the fitted errors of the price equation ν̂_in are used directly to replace the omitted quality attribute q_in, although we know that there is a discrepancy between q_in and ν̂_in. Calling this discrepancy ω_in, we can always write Eq. (16.9), where φ_0 is a constant that ensures that ω_in has zero mean.

q_in = φ_0 + φ_ν ν̂_in + ω_in    (16.9)
It can be shown (Train, 2009) that the assumptions required in the derivation of the control-function method imply that ω_in is not correlated with ν̂_in, X_in or ε_in. It follows that Eq. (16.9) can be used as a structural equation where q_in is latent.
The resulting two-stage latent-variable/control-function model can be formulated as follows.

U_in = θ_p p_in + X′_in θ + θ_q q_in + ε_in
y_in = 1[U_in = max_{j∈C_n} U_jn]
p_in = Z′_in β + ν_in  →(OLS)  ν̂_in
q_in = φ_0 + φ_ν ν̂_in + ω_in    (16.10)
The likelihood of each observation is calculated by integrating out the latent variable q_in. Making the change of variables implied by Eq. (16.9), and assuming that ω_in is distributed normal (0, σ_ω²I), the likelihood of each observation corresponds to Eq. (16.11).

L_n^LV2Stage = ∫…∫ [exp{θ_p p_in + X′_in θ + θ_q(φ_0 + φ_ν ν̂_in + ω_in)} / Σ_{j∈C_n} exp{θ_p p_jn + X′_jn θ + θ_q(φ_0 + φ_ν ν̂_jn + ω_jn)}] Π_{j∈C_n} (1/√(2πσ_ω²)) exp{−(1/2)(ω_j/σ_ω)²} dω    (16.11)

Note that Eq. (16.11) may also be interpreted independently of the latent-variable approach. If an error term ω_in is added to ν̂_in in model (16.5) and this error term is then integrated out, the resulting model will be a logit mixture model equal to expression (16.11). This formulation can be seen as a conceptual improvement over the two-stage control-function method described in Eq. (16.5) since it addresses the fact that the omitted attribute does not correspond exactly to ν̂_in. On the other hand, compared to Eq. (16.7), this formulation relies on a stronger assumption on the joint distribution of the error terms and involves the resolution of a multifold integral in which the number of dimensions is equal to the number of alternatives in the choice set. This may be impractical, for example, in models of residential choice where the number of alternatives is huge.
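Since the integral in Eq. (16.11) has no closed form, in practice it would be approximated numerically, e.g. by quadrature (as in Section 16.5) or by simulation. A simulation-based sketch in R, reusing the hypothetical data frame d and the first-stage residuals v_hat from the earlier snippet, with φ_0 and θ_q normalised as discussed in Section 16.5:

```r
## Simulated version of the two-stage LV/CF likelihood of Eq. (16.11).
loglik_lv2 <- function(par, d, R = 200) {
  th_p <- par[1]; th_x <- par[2]; phi_v <- par[3]
  sig_o <- exp(par[4])                       # sigma_omega > 0; phi_0 = 0, theta_q = 1
  pn <- 0
  for (r in 1:R) {
    omega <- rnorm(nrow(d), sd = sig_o)      # one draw of omega per alternative
    u  <- th_p * d$p + th_x * d$x + (phi_v * d$v_hat + omega)
    eu <- exp(u)
    pn <- pn + (eu / ave(eu, d$id, FUN = sum))[d$chosen == 1] / R
  }
  sum(log(pn))  # draws would be held fixed across optimisation iterations
}
```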
16.4.5. One-Stage Latent-Variable/Control-Function Method

To achieve the simultaneous estimation of the control-function method within the latent-variable framework, we propose a model that directly uses the information of the instrumental variables instead of the information of the fitted errors of the price equation. This can be achieved by substituting Eq. (16.4) into Eq. (16.9) to obtain Eq. (16.12).

q_in = φ_0 + φ_ν(p_in − Z′_in β) + ω_in    (16.12)
The calculation of the likelihood of the model considering this replacement is not straightforward. Note first that the likelihood in Eq. (16.11) is implicitly written conditional on ν̂_in. Therefore, if the actual ν_in = p_in − Z′_in β is used instead of ν̂_in, the
likelihood of each observation should also include the likelihood of ν_in. If it is assumed that ω_in is distributed normal (0, σ_ω²I), and making the change of variables implied by Eq. (16.9), the likelihood of each observation corresponds to the following expression.

L_n^LV1Stage = ∫…∫ Π_{j∈C_n} (1/√(2πσ_ν²)) exp{−(1/2)((p_jn − Z′_jn β)/σ_ν)²} × [exp{θ_p p_in + X′_in θ + θ_q(φ_0 + φ_ν(p_in − Z′_in β) + ω_in)} / Σ_{j∈C_n} exp{θ_p p_jn + X′_jn θ + θ_q(φ_0 + φ_ν(p_jn − Z′_jn β) + ω_jn)}] Π_{j∈C_n} (1/√(2πσ_ω²)) exp{−(1/2)(ω_j/σ_ω)²} dω    (16.13)

Note that Eq. (16.13) can also be interpreted as a logit mixture extension of Eq. (16.7). Viewed in this way, it can be shown that Eq. (16.13) is equivalent to the formulations used by Villas-Boas and Winer (1999) and Park and Gupta (2009) to perform a simultaneous estimation of the control-function method, or what Train (2009) denominates maximum likelihood methods. Like the one-stage estimator described in Section 16.4.2, this one-stage latent-variable method will be more efficient than the two-stage method if the assumptions about the joint distribution of ε_in, ω_in and ν_in are true. However, it may be less efficient if these assumptions fail. Additionally, unlike the two-stage method, the estimation of this one-stage procedure involves the calculation of a multifold integral, which may require numerical approximations that may end up overshadowing any potential improvement in efficiency.
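A one-stage counterpart of the previous sketch can be obtained by adding the price-equation density of ν to the simulated mixture, mirroring Eq. (16.13); σν and σω are log-parameterised, the same normalisations apply, and all names remain hypothetical.

```r
## Simulated one-stage LV/CF likelihood, Eq. (16.13), in base R.
loglik_lv1 <- function(par, d, R = 200) {
  th_p <- par[1]; th_x <- par[2]; phi_v <- par[3]
  beta  <- par[4:7]                          # price eq.: intercept, x, z1, z2
  sig_v <- exp(par[8]); sig_o <- exp(par[9])
  v  <- as.vector(d$p - cbind(1, d$x, d$z1, d$z2) %*% beta)
  pn <- 0
  for (r in 1:R) {
    omega <- rnorm(nrow(d), sd = sig_o)
    u  <- th_p * d$p + th_x * d$x + (phi_v * v + omega)   # theta_q = 1, phi_0 = 0
    eu <- exp(u)
    pn <- pn + (eu / ave(eu, d$id, FUN = sum))[d$chosen == 1] / R
  }
  sum(log(pn)) + sum(dnorm(v, sd = sig_v, log = TRUE))
}
```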
16.5. Monte Carlo Experiment

In this section we create synthetic data that exhibit endogeneity. We then implement a set of variations of the proposed methods to correct for endogeneity and analyze their results. The aim is to illustrate the properties of the different methods and to remark on some practical issues associated with their estimation and normalization. Results regarding, for example, relative efficiency are not at all conclusive, since this is just one experiment. The analysis of different specifications of synthetic and/or real data may provide a conclusive answer to those issues in future research.
16.5.1. Experimental Design

The experiment considers 2000 (N) synthetic individuals who choose between two alternatives. Each individual (n) maximizes his/her utility (U_in), which is assumed to be a linear function of the attributes a_in, b_in, c_in, a quality attribute q_in, the price p_in of each available alternative (i) and an error term (e_in). The coefficients of each attribute
in the utility are shown in Eq. (16.14).

U_in = 1a_in + 1b_in + 1c_in + 2q_in − 1p_in + e_in    (16.14)
The error term is distributed iid extreme value (0, 1), which implies a logit form for the probability that individual n chooses alternative i. Additionally, price is determined by the price equation shown in Eq. (16.15), which is linear in the attributes c_in, q_in, z1_in, z2_in, and an error term δ_in that is distributed normal (0, 1).

p_in = 1c_in + 1z1_in + 1z2_in + 1q_in + δ_in,  with ν_in ≡ 1q_in + δ_in    (16.15)
Variables a_in, b_in, c_in, z1_in, z2_in and q_in were generated iid uniform (1, 10) for each individual and alternative. Variable p_in was generated using Eq. (16.15) as a function of c_in, q_in and the exogenous instruments z1_in and z2_in. Table 16.1 summarizes the synthetic data considered in this experiment. Within this setting, variables c_in and q_in are correlated with price p_in but neither with a_in nor with b_in. Therefore, if the variable q_in is omitted, price will be correlated with the error term of the utility, which, in this case, would be equal to ε_in = 2q_in + e_in. Equivalently, the error of the price equation would become ν_in = 1q_in + δ_in, whose variance would therefore be equal to σ_ν² = 7.75. At the same time, variables z1_in and z2_in are, by construction, proper instruments for price since they are correlated with price, but not with the error terms ε_in and ν_in.
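The data-generating process of Eqs. (16.14) and (16.15) is straightforward to reproduce; the sketch below follows this design in R (variable names as in the text, everything else illustrative).

```r
## Synthetic data for the Monte Carlo experiment, Eqs. (16.14)-(16.15).
set.seed(1)
N <- 2000; J <- 2                          # individuals, alternatives per choice set
n <- N * J
a  <- runif(n, 1, 10); b  <- runif(n, 1, 10); cc <- runif(n, 1, 10)
q  <- runif(n, 1, 10)                      # quality attribute, later omitted
z1 <- runif(n, 1, 10); z2 <- runif(n, 1, 10)
delta <- rnorm(n)                          # N(0,1) error of the price equation
p  <- 1*cc + 1*z1 + 1*z2 + 1*q + delta     # Eq. (16.15): p endogenous through q
e  <- -log(-log(runif(n)))                 # iid extreme value (Gumbel) errors
U  <- 1*a + 1*b + 1*cc + 2*q - 1*p + e     # Eq. (16.14)
id <- rep(1:N, each = J)
chosen <- ave(U, id, FUN = function(u) as.integer(u == max(u)))
d <- data.frame(id, a, b, c = cc, q, p, z1, z2, chosen)
```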
Table 16.1: Summary statistics of synthetic data (N = 2000).

Variable   Mean   Std. error   Correlation
                               a         b         c         q         p         z1        z2
a          5.5    2.7          1.0       0.0046    0.012     0.017     0.025     0.017     0.034
b          5.6    2.6          0.0046    1.0       0.014     0.019     0.012     0.00011   0.00030
c          5.6    2.6          0.012     0.014     1.0       0.0060    0.48      0.0020    0.015
q          5.4    2.7          0.017     0.019     0.0060    1.0       0.49      0.031     0.033
p          22     5.3          0.025     0.012     0.48      0.49      1.0       0.52      0.48
z1         5.5    2.6          0.017     0.00011   0.0020    0.031     0.52      1.0       0.029
z2         5.4    2.6          0.034     0.00030   0.015     0.033     0.48      0.029     1.0

16.5.2. Models Estimated

Using the synthetic data, seven models were estimated using the open-source software R (R Development Core Team, 2008). The first model is a logit model that includes all variables and acts as a benchmark. In subsequent models, variable q_in is omitted, causing endogeneity. In Models III–VII, different variations of the proposed methods to correct for endogeneity are applied. All results are reported in Table 16.2.
Table 16.2: Monte Carlo experiment: different model estimators to address endogeneity.

Coeff.            True     Model I           Model II          Model III         Model IV          Model V           Model VI          Model VII
Choice model
ASC               0.00     0.0681 (0.103)    0.00307 (0.0541)  0.0469 (0.0781)   0.0453 (0.0782)   0.0431 (0.0782)   0.0468 (0.0781)   0.0457 (0.0788)
θa                1.00     0.953 (0.0618)    0.262 (0.0167)    0.549 (0.0322)    0.549 (0.0323)    0.550 (0.0323)    0.549 (0.0322)    0.553 (0.0448)
θb                1.00     0.979 (0.0612)    0.280 (0.0172)    0.575 (0.0325)    0.574 (0.0325)    0.574 (0.0325)    0.575 (0.0325)    0.578 (0.0454)
θc                1.00     0.929 (0.0641)    0.0787 (0.0170)   0.554 (0.0364)    0.554 (0.0418)    0.448 (0.0350)    0.554 (0.0364)    0.558 (0.0520)
θq                2.00     1.90 (0.116)      –                 –                 –                 –                 1.00 Fixed        1.00 Fixed
θp                1.00     0.954 (0.0574)    0.105 (0.00886)   0.560 (0.0289)    0.560 (0.0312)    0.417 (0.0278)    0.560 (0.0289)    0.564 (0.0439)
θν                –        –                 –                 0.977 (0.0509)    0.978 (0.0522)    –                 –                 –
φν                –        –                 –                 –                 –                 –                 0.977 (0.0509)    0.984 (0.0753)
σω                –        –                 –                 –                 –                 –                 0.0000105 (0.902) 0.200 (0.865)
θz2               –        –                 –                 –                 –                 0.979 (0.0538)    –                 –
θz1               –        –                 –                 –                 –                 0.990 (0.0531)    –                 –
Price equation
Intercept         0.00     –                 –                 5.26 (0.166)      5.26 (0.166)      –                 5.26 (0.166)      5.26 (0.166)
βz1               1.00     –                 –                 1.02 (0.0171)     1.01 (0.0153)     –                 1.02 (0.0171)     1.01 (0.0153)
βz2               1.00     –                 –                 0.997 (0.0168)    1.00 (0.0151)     –                 0.997 (0.0168)    1.00 (0.0151)
βc                1.00     –                 –                 1.03 (0.0171)     1.03 (0.0171)     –                 1.03 (0.0171)     1.03 (0.0171)
βq                1.00     –                 –                 –                 –                 –                 –                 –
σν                2.78     –                 –                 –                 2.80 (N/A)        –                 –                 2.80 (N/A)
N                          2000              2000              2000              2000              2000              2000              2000
L(0)                       1386.29           1386.29           1386.29           1386.29           1386.29           1386.29           1386.29
L(θ̂)                       307.67            1027.90           526.33            525.99            525.78            526.33            526.02
θ̂a/θ̂p             1.00     0.999             2.50              0.981             0.981             1.32              0.981             0.981
θ̂a/θ̂c             1.00     1.03              3.33              0.991             0.991             1.23              0.991             0.991

Model I: all variables included; Model II: q is excluded; Model III: two-stage control function; Model IV: simultaneous control function; Model V: price equation in utility; Model VI: two-stage control function/latent variable; Model VII: one-stage control function/latent variable.
Standard errors in parentheses. L(θ̂): likelihood of the choice model. (N/A): the standard error of σ̂ν² cannot be retrieved since σ̂ν² was estimated iteratively.
16.5.2.1. Model I: all variables included
The first model corresponds to a logit model in which all the variables that are present in the true model shown in Eq. (16.14) are included. The estimates of this model are shown in the third column of Table 16.2, where it can be seen that all estimated coefficients are statistically equal to the true ones.

16.5.2.2. Model II: q is excluded
The second model in Table 16.2 corresponds to the estimation of a logit model in which variable q was omitted from the utility specification. Since variable q is correlated with the price by construction, this model suffers from endogeneity. As expected, the results show that the estimator of the price coefficient is positively biased in this case. Since the scales of the different models differ, the correct way to check the bias of the price coefficient is by comparing it with the estimated coefficient of variable a or b, since those variables are independent, by construction, of all other variables and of the error term. These ratios are at the bottom of Table 16.2, where it can be noted that the coefficient of p is 2.5 times smaller (in absolute value) than the coefficient of a, instead of being equal, as they are in Eq. (16.14). The coefficient of c is also pushed down because it is correlated with price. Additionally, note that the log-likelihood of the choice model L(θ̂) is substantially more negative than that of Model I.

16.5.2.3. Model III: two-stage control function
The next model corresponds to the application of the two-stage control-function correction, as used in Guevara and Ben-Akiva (2006), over the model that excludes q. The estimators of this model are shown in the fifth column of Table 16.2. In this case, the variables of Eq. (16.15) used to build the control function were the two instruments z1_in and z2_in, variable c_in and an intercept to guarantee that ν̂_in has zero expected mean. Either z1_in or z2_in could be excluded from the calculation of ν̂_in, but not both at the same time. In turn, the inclusion of c_in in the calculation of ν̂_in is crucial in this case. Although c_in is not correlated with e_in, it is correlated with p_in and it is an attribute of U_in. Therefore, if c_in were excluded from the calculation of ν̂_in, and ν̂_in were used as an auxiliary variable in the choice model, the remaining error would be correlated with c_in, causing endogeneity again. In general, all model variables that are correlated with the endogenous one, but not with the model error, should be used as instruments in the price equation.

The estimators of the coefficients of the price equation are statistically equal to the true values. The only exception is the intercept, which is larger because of the omission of q_in. Regarding the choice model parameters, it can be noted that the two-stage control-function method satisfactorily corrected the endogeneity problem since the sign of the coefficients is correct and their ratios, relative to variables a and b, are near the true ones. Also, the log-likelihood of the choice model of Model III is substantially more positive than that of Model II.

16.5.2.4. Model IV: simultaneous control function
The next model corresponds to the FIML model described in expression (16.7), which is labeled here as the
simultaneous control-function model. The estimators of this model are shown in the sixth column of Table 16.2. The estimation of this model has one particularity. Since the variance of the price equation σ_ν² is a function of other parameters β_l in the model, the correspondence between β_l and σ_ν² must be guaranteed by the inclusion of a constraint. Instead of including this constraint as part of the maximum likelihood estimation procedure, we consider it iteratively in the following way: (i) for a given iteration k, the likelihood shown in Eq. (16.7) is maximized conditional on a given variance σ̂²ν,k; (ii) then, using Eq. (16.16), the resulting estimators are used to calculate the variance to be used in the next iteration; (iii) the process is repeated until convergence.

σ̂²ν,k+1 = (1/(JN)) Σ_{n, j∈C_n} ν̂_jn(β̂_k)²    (16.16)

In Eq. (16.16), N is the sample size and J is the size of the choice set, which is assumed to be equal across the sample; the extension to the case of different choice set sizes across individuals is obvious. Note that the use of this iterative procedure implies that the standard error of this estimator cannot be retrieved.
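This fixed-point iteration can be sketched as follows in R, assuming a hypothetical variant loglik_fiml_sig of the FIML log-likelihood from Section 16.4.2 that takes σν² as given rather than estimating it.

```r
## Iterative enforcement of the constraint of Eq. (16.16).
sig2  <- 1                                  # starting value for sigma_v^2
start <- rep(0, 7)                          # th_p, th_x, th_v and four price-eq. betas
repeat {
  fit   <- optim(start, loglik_fiml_sig, d = d, sig2 = sig2,  # hypothetical function
                 method = "BFGS", control = list(fnscale = -1))
  start <- fit$par
  beta  <- fit$par[4:7]                     # price-equation coefficients
  v     <- as.vector(d$p - cbind(1, d$x, d$z1, d$z2) %*% beta)
  sig2_new <- mean(v^2)                     # (1/JN) * sum of squared fitted errors
  if (abs(sig2_new - sig2) < 1e-6) break
  sig2 <- sig2_new
}
```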
The estimation results of this model show that the endogeneity problem was solved since the signs of the coefficients of p and c are correct and their sizes are statistically equal, in absolute value, to that of a. A slight improvement can also be noted in the choice model likelihood. Additionally, there is an increase in the standard errors of some coefficients in the choice model. This could be misinterpreted as a reduction in efficiency resulting from the simultaneous estimation. However, it should be noted that the estimators of the standard errors in the two-stage control function, as in all two-stage procedures, are biased. Therefore, the comparison is not really possible with a single experiment.

16.5.2.5. Model V: price equation in utility
An easy (but incorrect) way to achieve simultaneity in the estimation of the control-function method corresponds to the direct replacement of the price equation in the utility, as shown in Eq. (16.17).

ν_in = p_in − Z′_in β
U_in = θ_p p_in + X′_in θ + θ_ν ν_in + e_in
U_in = θ_p p_in + X′_in θ + θ_ν(p_in − Z′_in β) + e_in
U_in = (θ_p + θ_ν)p_in + X′_in θ − Z′_in βθ_ν + e_in
U_in = θ̃_p p_in + X′_in θ − Z′_in β̃ + e_in    (16.17)
This procedure is equivalent to including the instruments directly in the utility function. The estimation results of this model are shown in the seventh column of Table 16.2. It can be noted that, although the likelihood was substantially improved, the estimators of p and c have the wrong sign.
The reason for these misleading results is that the true coefficients cannot be identified in Model V. Note that the coefficient θ̃_p in Eq. (16.17) corresponds to the sum of the true coefficient of price (θ_p) and the coefficient of the control function (θ_ν). Indeed, if the correct estimators of θ_p and θ_ν (taken from the control-function estimations of Models III or IV) are summed, the result is almost equal to the estimator of θ̃_p that is obtained in Model V.

16.5.2.6. Model VI: two-stage control function/latent variable
This model corresponds to the two-stage latent-variable/control-function model described in Eq. (16.11). In this case, the choice model includes a latent variable, whose distribution is then depicted using a structural equation in which the fitted errors of the price equation are used as explanatory variables. The integration of the latent variable was performed using the Hermite procedure with 136 points (Judd, 1998).

Some coefficients of Eq. (16.11) need to be normalized to achieve identification. First, φ_0 is set equal to zero, since any deviation from the mean is already captured by the alternative specific constant of the choice model. Also, the coefficient of the latent variable in the utility is set equal to one to allow the identification of φ_ν and the variance of ω. The results of this model are reported in the eighth column of Table 16.2. This procedure successfully corrected the endogeneity problem since the parameters have the correct sign and the ratios between the absolute values of the parameters of a, p and c are near one.

The standard errors in this model tend to be smaller than those of Model IV for all the coefficients. In contrast, the likelihood is slightly worse. Concordantly, the estimator of σ_ω is statistically equal to zero. These results indicate that, for this particular experiment, the proposed method is outperformed by both the two-stage and the simultaneous control function. It can be speculated that the numerical approximations needed for the calculation of the integral required in this model may play a role in this outcome.

16.5.2.7. Model VII: one-stage control function/latent variable
The final method under study corresponds to the one-stage control-function/latent-variable method whose likelihood is described in Eq. (16.13). The estimation procedure in this case considers the same normalization used for Model VI and the iterative procedure described in Eq. (16.16). The results are shown in the ninth column of Table 16.2. It can be noted that this method succeeded in the correction of endogeneity. In this case, the likelihood of the choice model is slightly better than for Models VI and III, but worse than that of Model IV. Also, a small improvement can be noted in the efficiency of the estimators of the price equation, but a worse result in the case of the choice model. Additionally, the estimator of σ_ω is more significant than that of Model VI, but still statistically equal to zero for any reasonable level of confidence.

It should be remarked that a formal comparison of the relative efficiency of the different models is not possible with a single experiment since the standard errors of
the two-stage procedures are biased. Further analysis on this issue is left for further research.
16.6. Conclusion

This paper explores different alternatives to address endogeneity in discrete choice models by combining the control-function and the latent-variable methods. It was first shown that both methods can be used independently to treat endogeneity. In the case of the control-function method, the availability of instrumental variables is necessary, whereas in the case of the latent-variable method, the availability of indicators and/or other observed variables that may explain the omitted quality attributes is required. The preference for one method over the other depends on the particular case and on the data available. For example, in mode choice models it is easier to have proper indicators than instruments to correct for endogeneity in travel time, whereas the contrary occurs in residential location choice when correcting for endogeneity of price.

It was also shown how both methods may be seen as equivalent. The instruments used in the control-function method can be used to build the fitted errors of the price equation, and those fitted errors can then be considered as explanatory variables in a structural equation in the latent-variable method.

The specifications proposed were estimated and assessed, based on their ability to correct for endogeneity, in a Monte Carlo experiment. Normalization and other estimation procedures associated with each method were described. Of the five specifications analyzed, four succeeded in correcting the endogeneity. The only failure, as expected, corresponds to the case in which the instruments are directly used as additional variables in the choice model.

The specifications analyzed involved estimation procedures performed in one and two stages. Although simultaneous estimators should increase efficiency, they also imply stronger distributional assumptions and a significant computational burden associated with the calculation of a multifold integral that depends on the number of alternatives. The Monte Carlo experiment performed in this research is inconclusive regarding these issues. Therefore, future research in this area should include the analysis of the relative performance of the different methods using real data and different specifications and repetitions of the synthetic data. Further analysis should also be performed regarding methods to reduce the computational burden associated with the estimation of the integrals in the methods proposed.
Acknowledgments

This publication was made possible by the generous support of the Portuguese Government through the Portuguese Foundation for International Cooperation in Science, Technology and Higher Education, undertaken in the MIT-Portugal
Program. Research assistance by Kamil Sveda and editorial assistance by Tina Xue are greatly appreciated.
References

Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: The MIT Press.
Blundell, R., & Powell, J. (2004). Endogeneity in semi-parametric binary response models. Review of Economic Studies, 71, 655–679.
Eklöf, J., & Karlsson, S. (1997). Testing and correcting for sample selection bias in discrete choice contingent valuation studies. Working Paper no. 171, Stockholm School of Economics, Sweden.
Greene, W. (2003). Econometric analysis (5th ed.). New York: Prentice Hall.
Guevara, C. A., & Ben-Akiva, M. (2006). Endogeneity in residential location choice models. Transportation Research Record, 1977, 60–66.
Guevara, C. A., & Ben-Akiva, M. (2008). A Lagrange multiplier test for the validity of instruments in MNL models: An application to residential choice. European Transport Conference, Leeuwenhorst, The Netherlands.
Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1272.
Heckman, J. (1978). Dummy endogenous variables in a simultaneous equation system. Econometrica, 46, 931–959.
Judd, K. (1998). Numerical methods in economics. Cambridge, MA: MIT Press.
Louviere, J., Train, K., Ben-Akiva, M., Bhat, C., Brownstone, D., Cameron, T., Carson, C., Deshazo, J., Fiebig, D., Greene, W., Hensher, D., & Waldman, D. (2005). Recent progress on endogeneity in choice modeling. Marketing Letters, 16(3–4), 255–265.
Mabit, S., & Fosgerau, M. (2009). Mode choice endogeneity in value of travel time estimation. Proceedings of the International Choice Modelling Conference, Leeds.
Park, S., & Gupta, S. (2009). A simulated maximum likelihood estimator for the random coefficient logit model using aggregate data. Journal of Marketing Research, 46(4), 531–542.
Petrin, A., & Train, K. (2005). Omitted product attributes in discrete choice models. Working Paper, National Bureau of Economic Research.
R Development Core Team. (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, available at: http://www.R-project.org
Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge, UK: Cambridge University Press.
Vella, F. (1992). Simple tests for sample selection bias in censored and discrete choice models. Journal of Applied Econometrics, 7, 413–421.
Villas-Boas, M., & Winer, R. (1999). Endogeneity in brand choice models. Management Science, 45, 1324–1338.
Walker, J., & Ben-Akiva, M. (2002). Generalized random utility model. Mathematical Social Sciences, 43(3), 303–343.
Walker, J., Li, J., Srinivasan, S., & Bolduc, D. (2008). Mode choice endogeneity in value of travel time estimation. Proceedings of the Transportation Research Board Annual Meeting.
Chapter 17
Latent Class and Mixed Logit Models with Endogenous Choice Set Formation Based on Compensatory Screening Rules

Matthieu de Lapparent
Abstract

This article addresses simultaneously two important features in random utility maximisation (RUM) choice modelling: choice set generation and unobserved taste heterogeneity. It develops and compares the definitions and properties of econometric specifications based on mixed logit (MXL) and latent class logit (LCL) RUM models in the additional presence of prior compensatory screening decision rules. The latter allow for continuous latent bounds that determine which choice alternatives are, or are not, considered for decision making. The specifications are evaluated and tested against each other in an application to home-to-work mode choice in the Paris region of France using 2002 data.
17.1. Introduction

As observer of a discrete choice situation, one knows only the set of available choices, some of their attributes, some of the characteristics of the decision maker, and the final choice. Even if the information provided to the decision maker is perfect, various decision rules may be used to simplify decision tasks, especially if they are complicated. When individual preferences are moreover considered, normatively or positively, as stochastic, one is interested in modelling choice rationality in a probabilistic framework. One finds many decision theories that
derive individual choice probabilities from latent behavioural processes. Well-accepted ones are the theory of dominant preferences (Scott, 2002), the elimination-by-aspects approach proposed by Tversky (1972a, 1972b), the additive preference trees of Sattah and Tversky (1977), the preference trees of Tversky and Sattah (1979), the feature-based elimination process of Andrews and Manrai (1998), and random utility theory (random utility maximisation — RUM, e.g. Block & Marschak, 1959; McFadden, 1973; Manski, 1977; Falmagne, 1978; Fishburn, 1978; Ben-Akiva & Lerman, 1985; McFadden & Richter, 1990; McFadden, 2005) and its extended frameworks (hybrid choice models and generalised RUM choice models, e.g. Ben-Akiva et al. (2002) and Walker and Ben-Akiva (2002)). The model presented in this article is to some extent a generalised RUM choice model. It accounts for endogenous choice set generation based on behavioural processes, and it assumes that the final decision is made on the basis of RUM.

One problem addressed in the present article is that of choice set consideration. Actually, one does not know whether all the available choice alternatives are taken into account for the final decision. One may assume that a non-empty subset of alternatives is selected from the set of available choices before proceeding to the final decision; some of them may be screened out. Screening rules restrict the set of choice alternatives that are evaluated for final selection. A quick look at the existing literature shows that several approaches and strategies have been proposed to deal with this problem: the conjunctive and disjunctive rules (e.g. Grether & Wilde, 1984; Gensch, 1987; Roberts & Lattin, 1991; Gilbride & Allenby, 2004; Cantillo & Ortúzar, 2005), the penalised utility function approach of Swait (2001), the parametrised independent availability model (PIAM) of Swait and Ben-Akiva (1987a, 1987b), the multivariate (and possibly random) bounds approach of Boccara (1989) and Ben-Akiva and Boccara (1995), the parametrised consideration set (PCS) approach proposed by Basar and Bhat (2004) (an attractive feature of the PIAM approach), and the single latent threshold (SLT) approach suggested by Georgescu-Roegen (1958). All these approaches account for the fact that, as observer, it is impossible to know the set of alternatives that are 'really considered' by the decision maker.

Another important point in modelling choice behaviour is that of unobserved taste heterogeneity (e.g. Manski, 1977; Train, 2003). The present approach accounts for it. It proposes two ways of considering randomly distributed parameters, by means of either continuous distributions or discrete distributions. In the latter case, one proposes a parametric distribution that attempts to explain why a decision maker may belong to some latent classes on the basis of his/her socioeconomic characteristics.

The proposed modelling approaches have two layers. The top layer focuses on the choice of a non-empty subset of the full set of available choices. It deals with latent choice set generation. Latent consideration levels are introduced to model thresholds on which the selection of choice alternatives is based: the levels of utility of the latter must be larger for them to be considered for final decision making. It is also assumed that, whichever bundle of attributes of alternatives is chosen, the considered choice set is never empty. The decision maker cannot cancel the choice situation. The bottom
layer focuses on the choice of an alternative given the considered set. The alternatives that pass the screen are evaluated in a manner consistent with random utility theory. The alternatives that do not pass the screen have a zero probability of being chosen. Obviously, if a single alternative passes the screen, it has a probability equal to one of being chosen.

The outline of the article is as follows. The approaches and the empirical choice models are presented in Section 17.2. The latter are based on a specific mixture of an appropriate generalised extreme value (GEV) probability distribution (McFadden, 1977; Ben-Akiva & François, 1983; Ben-Akiva & Lerman, 1985; Bierlaire, 2006). The marginal and conditional choice probabilities of each step of the decision scheme are detailed. Identification and estimation of the parameters by maximum (simulated when necessary) likelihood techniques are also discussed. Data are discussed in Section 17.3. Sample formation and the way independent variables are collected to rebuild individual choice situations are detailed, and descriptive statistics of the dependent and independent variables are presented. The results are reported and discussed in Section 17.4. Conclusions are drawn in a last section, where directions for further research are also outlined.
17.2. The Model

17.2.1. Probabilistic Framework

Consider a population of decision makers, each indexed by $i$, where $i = 1, \ldots, n$, who are faced with a set $A$ of $M$ available choices.¹ Each choice is indexed by $m = 1, \ldots, M$. One observes only the outcome of a nominal variable $y_i$ that models the decision made by individual $i$ when faced with $A$. For the rest of the article, one writes $y_i = m$ to denote that the alternative chosen by $i$ is the $m$th one. As regards consideration of some but not necessarily all of the choice alternatives in $A$, the set of subsets one may build with the latter is finite and countable. It is furthermore assumed that the decision maker cannot cancel the choice situation: the empty subset cannot be selected.² There are therefore $K = 2^M - 1$ possible subsets, each indexed by $k = 1, \ldots, K$. Let also $d_i$ be the nominal variable that models subset selection. For the rest of the article, one writes $d_i = k$, or $C_k$, to denote that the subset of alternatives that is considered is the $k$th one. Without loss of generality, $k = 1$ corresponds to $A$. As statistical inference is based on observed data, one is interested in the expected choice probability with respect to the distribution of subset selection. It is assumed to be conditional on a set of independent variables $x$ that model the attributes of the choice alternatives and on a set of independent variables $z$ that model the characteristics of the decision maker. It depends also on a parameter $\theta$:

$$\Pr(y_i = m \mid x_i, z_i, \theta) = \sum_{k=1}^{K} \Pr(d_i = k \mid x_i, z_i, \theta)\, \Pr(y_i = m \mid d_i = k, x_i, z_i, \theta) \quad (17.1)$$

1. The set of available choices may actually differ between the decision makers. The invariance assumption is made to make the presentation and the application easier. One notes, however, for the latter that it implies a sample selection process that may limit the empirical analysis to a specific population of interest.
2. Such an assumption means that the decision maker is forced to participate in the choice situation. The empty subset is excluded from the set of possible subsets: its probability of being selected is equal to zero.
In the present approach, one would like to define the joint probability distribution of the two nominal variables as one based on functions of latent variables that model the behaviour of the decision maker.
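To fix ideas, the following minimal Python sketch (illustrative only; the function names are not from the chapter) enumerates the $K = 2^M - 1$ non-empty consideration sets and evaluates the finite mixture of Eq. (17.1):

```python
from itertools import combinations

def nonempty_subsets(M):
    """All K = 2**M - 1 non-empty consideration sets of A = {0, ..., M-1};
    the first entry is the full set A, matching k = 1 in the text."""
    alts = range(M)
    return [frozenset(c) for r in range(M, 0, -1) for c in combinations(alts, r)]

def marginal_choice_probability(m, subset_probs, cond_probs):
    """Eq. (17.1): Pr(y = m) as the sum over k of Pr(d = k) * Pr(y = m | d = k).
    subset_probs[k] and cond_probs[k][m] are supplied by the screening and
    RUM layers defined below."""
    return sum(p_k * p_cond[m] for p_k, p_cond in zip(subset_probs, cond_probs))
```

For the mode-choice application below, $M = 2$, so there are only $K = 3$ consideration sets: both modes, the private vehicle alone, and public transport alone.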
17.2.2. Econometric Specification

17.2.2.1. Conditional choice probabilities. No matter which non-empty subset of the available choice alternatives is selected, it is assumed that the choice among its elements is based on RUM. The utility of decision maker $i$ who chooses the $m$th alternative is defined as:

$$U_{i,m} = V_{i,m} + \epsilon_{i,m}, \qquad V_{i,m} = x'_{i,m}\gamma + z'_i\beta_m \quad (17.2)$$

where

$$\epsilon_{i,m} \overset{iid}{\sim} EV_1(0, \mu) \quad (17.3)$$

Given RUM and the above distributional assumptions,³ the probability that the $m$th alternative is chosen then writes as a multinomial logit (MNL) choice probability:

$$\Pr(y_i = m \mid d_i = k, x_i, z_i, \theta) = \frac{\exp(\mu V_{i,m})}{\sum_{j \in C_k} \exp(\mu V_{i,j})} \quad (17.4)$$

3. One reminds the reader that the conditional RUM assumption postulates that $\Pr(y_i = m \mid d_i = k, x_i, z_i, \theta) = \Pr(\forall m, j \in C_k, j \neq m: U_{i,m} > U_{i,j} \mid d_i = k, x_i, z_i, \theta)$ and that the $EV_1$ distribution has a cumulative distribution function defined $\forall x \in \mathbb{R}$ and $\forall \mu \in \mathbb{R}_+$ as $F(x) = \exp(-\exp(-\mu x))$; $\mu$ is the scale of the distribution.
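A corresponding sketch of the conditional probability in Eq. (17.4), with the logit restricted to the alternatives surviving in $C_k$ (again illustrative code, not the chapter's):

```python
import numpy as np

def mnl_probability_in_set(m, V, subset, mu=1.0):
    """Eq. (17.4): logit probability of alternative m given consideration set
    `subset`; zero if m was screened out, one if it is the only survivor."""
    if m not in subset:
        return 0.0
    V = np.asarray(V, dtype=float)
    idx = sorted(subset)
    expV = np.exp(mu * V[idx])
    return float(expV[idx.index(m)] / expV.sum())
```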
If $C_k$ is a singleton containing only $m$, then the probability of choosing $m$ is equal to one. If $C_k$ does not contain $m$, then the probability of choosing $m$ is equal to zero.

17.2.2.2. Screening rules. One turns now to the modelling of choice set generation. It is assumed that the decision maker selects the alternatives she considers for final choice on the basis of a specific behavioural screening rule. In the present approach, it is considered that there exist latent thresholds (or latent cutoffs) that determine whether a choice alternative is considered or not.

PIAM. The PIAM approach assumes that there exist as many latent consideration levels as there are choice alternatives; each alternative has one associated latent cutoff. An alternative is included in the final choice set if and only if its level of utility is larger than its cutoff. The selected choice set may then be defined as:

$$C_k := \{m \in A \mid U_{i,m} > W_{i,m}\} \quad (17.5)$$
The latent consideration levels are modelled as linear functions of the levels of observed choice attributes and individual characteristics. They are defined up to independently and identically distributed error terms:

$$\forall m \in A, \quad W_{i,m} = \bar{W}_{i,m} + \nu_{i,m}, \qquad \bar{W}_{i,m} = x'_{i,m}\psi + z'_i\phi_m \quad (17.6)$$

where

$$\forall m \in A, \quad \nu_{i,m} \overset{iid}{\sim} EV_1(0, \mu) \quad (17.7)$$

and

$$\forall j, m \in A, \quad E(\nu_{i,m}\,\epsilon_{i,j}) = E(\nu_{i,m})\,E(\epsilon_{i,j}) \quad (17.8)$$

As stated by Eq. (17.6), the definition of the vector of latent bounds is context dependent. However, one notes that context dependency of a latent bound is related only to the attributes of its corresponding choice alternative. Eq. (17.8) defines a conditional independence assumption: it states that the error terms of the utility functions and the error terms of the latent thresholds are independently distributed. After some algebraic manipulation, the probability of selecting a subset of choice alternatives may be written as:

$$\Pr(d_i = k \mid x_i, z_i, \theta) = \frac{\Pr(\forall m \in C_k: U_{i,m} > W_{i,m} \text{ and } \forall j \notin C_k: U_{i,j} \leq W_{i,j} \mid x_i, z_i, \theta)}{1 - \Pr(\forall l \in A: U_{i,l} \leq W_{i,l} \mid x_i, z_i, \theta)}$$
$$= \frac{\prod_{m \in C_k} \Pr(U_{i,m} > W_{i,m} \mid x_i, z_i, \theta) \prod_{j \notin C_k} \Pr(U_{i,j} \leq W_{i,j} \mid x_i, z_i, \theta)}{1 - \prod_{l=1}^{M} \Pr(U_{i,l} \leq W_{i,l} \mid x_i, z_i, \theta)}$$
$$= \frac{\prod_{m \in C_k} 1/(1 + \exp(\mu(\bar{W}_{i,m} - V_{i,m}))) \prod_{j \notin C_k} 1/(1 + \exp(\mu(V_{i,j} - \bar{W}_{i,j})))}{1 - \prod_{l=1}^{M} 1/(1 + \exp(\mu(V_{i,l} - \bar{W}_{i,l})))} \quad (17.9)$$
and the probability to choose $A$ simplifies to:

$$\Pr(d_i = 1 \mid x_i, z_i, \theta) = \frac{\prod_{m=1}^{M} \Pr(U_{i,m} > W_{i,m} \mid x_i, z_i, \theta)}{1 - \prod_{l=1}^{M} \Pr(U_{i,l} \leq W_{i,l} \mid x_i, z_i, \theta)} = \frac{\prod_{m=1}^{M} 1/(1 + \exp(\mu(\bar{W}_{i,m} - V_{i,m})))}{1 - \prod_{l=1}^{M} 1/(1 + \exp(\mu(V_{i,l} - \bar{W}_{i,l})))} \quad (17.10)$$
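Under the stated independence assumptions, Eqs. (17.9) and (17.10) are products of logistic 'pass' probabilities, truncated so that the empty set is excluded. A hedged Python sketch of this computation (the function name is hypothetical):

```python
import numpy as np

def piam_subset_probability(subset, V, W_bar, mu=1.0):
    """Eqs. (17.9)-(17.10): probability that exactly the alternatives in
    `subset` pass the PIAM screen U_m > W_m. The EV1 error differences give
    logistic pass probabilities; the denominator removes the empty set."""
    V, W_bar = np.asarray(V, float), np.asarray(W_bar, float)
    p_pass = 1.0 / (1.0 + np.exp(mu * (W_bar - V)))   # Pr(U_m > W_m)
    in_set = np.isin(np.arange(len(V)), list(subset))
    numerator = np.prod(np.where(in_set, p_pass, 1.0 - p_pass))
    denominator = 1.0 - np.prod(1.0 - p_pass)         # Pr(at least one passes)
    return float(numerator / denominator)
```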
One notes that the probability of considering a choice alternative $m$ is a decreasing function of the difference $\bar{W}_{i,m} - V_{i,m}$. Due to the linear functional forms of $\bar{W}_{i,m}$ and $V_{i,m}$, the sign of the effect of an independent variable on the probability of considering the $m$th choice alternative depends on the sign of the difference of the corresponding coefficients.

SLT. The SLT approach may appear as a special case of the PIAM approach. Only one latent threshold is defined, which determines the level of consideration to reach for a choice alternative to be included in the considered choice set. A choice alternative is selected for final choice if and only if its current level of utility is larger than this latent consideration threshold. The alternatives that pass the screen have larger levels of utility; the alternatives that do not pass the screen have lower levels.

$$C_k := \{m \in A \mid U_{i,m} > W_i\} \quad (17.11)$$
In the present approach, the latent threshold is modelled as an additive function of both the characteristics of the decision maker and the attributes that characterise the choice outcomes, plus an independent error term:

$$W_i = \bar{W}_i + \eta_i, \qquad \bar{W}_i = z'_i\alpha + \varphi' h(x_i) \quad (17.12)$$

where

$$\eta_i \overset{iid}{\sim} EV_1(0, \mu) \quad (17.13)$$

and

$$\forall m \in A, \quad E(\eta_i\,\epsilon_{i,m}) = E(\eta_i)\,E(\epsilon_{i,m}) \quad (17.14)$$

with $h: \mathbb{R}^P \mapsto \mathbb{R}^Q$, $Q \leq P$, being a deterministic mapping. Note that, as discussed already by Huber and Klein (1991), cutoffs are then adapted to the choice situation: they are context dependent. The latter may be defined as a vector of functions of choice attributes. Actually, $h$ defines and controls context dependency. Context independency is not a credible assumption: it would be surprising for the decision maker to set his/her cutoff level without considering the choice situation. On the other hand, context dependency needs to be defined with precision. There are, however, several ways of doing it, and one therefore needs to test each against the others to
shed light on the most appropriate ones for the performed analysis. One must also accept that it may vary with the latter: it is not rotationally invariant.⁴ Note also that Eq. (17.14) states that the error terms of the utility functions are not correlated among themselves or with the error term of the latent threshold. Here again, it is a conditional independence assumption made to simplify the approach. One must recognise that it is a rather strict assumption; it appears, however, as a convenient and practical one to a first approximation. How to overcome it is discussed later in the paper. As previously mentioned, it is also assumed that the decision maker cannot cancel the choice situation: she is obliged to select at least one alternative to form the choice set she considers. The probability that she selects a given subset of choices $C_k$ from $A$ is therefore conditional on the fact that it can never be the empty set. Adapting Eqs. (17.9) and (17.10) to the SLT screening rule, one obtains:

$$\Pr(d_i = k \mid x_i, z_i, \theta) = \frac{\Pr(\forall m \in C_k: U_{i,m} > W_i \text{ and } \forall j \notin C_k: U_{i,j} \leq W_i \mid x_i, z_i, \theta)}{1 - \Pr(\forall l \in A: U_{i,l} \leq W_i \mid x_i, z_i, \theta)}$$
$$= \frac{\prod_{m \in C_k} \Pr(U_{i,m} > W_i \mid x_i, z_i, \theta) \prod_{j \notin C_k} \Pr(U_{i,j} \leq W_i \mid x_i, z_i, \theta)}{1 - \prod_{l=1}^{M} \Pr(U_{i,l} \leq W_i \mid x_i, z_i, \theta)}$$
$$= \frac{\prod_{m \in C_k} 1/(1 + \exp(\mu(\bar{W}_i - V_{i,m}))) \prod_{j \notin C_k} 1/(1 + \exp(\mu(V_{i,j} - \bar{W}_i)))}{1 - \prod_{l=1}^{M} 1/(1 + \exp(\mu(V_{i,l} - \bar{W}_i)))} \quad (17.15)$$

4. It means that changing the data but not the purpose of the analysis may in turn induce a change of the most appropriate mapping that would model context dependency.
and the probability to choose $A$ simplifies to:

$$\Pr(d_i = 1 \mid x_i, z_i, \theta) = \frac{\prod_{m=1}^{M} \Pr(U_{i,m} > W_i \mid x_i, z_i, \theta)}{1 - \prod_{l=1}^{M} \Pr(U_{i,l} \leq W_i \mid x_i, z_i, \theta)} = \frac{\prod_{m=1}^{M} 1/(1 + \exp(\mu(\bar{W}_i - V_{i,m})))}{1 - \prod_{l=1}^{M} 1/(1 + \exp(\mu(V_{i,l} - \bar{W}_i)))} \quad (17.16)$$
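Since the SLT rule is the special case of PIAM with a common cutoff, the PIAM sketch above can be reused by setting every cutoff to the single threshold (illustrative only):

```python
def slt_subset_probability(subset, V, w_bar, mu=1.0):
    """Eqs. (17.15)-(17.16): single-latent-threshold screen, obtained by
    feeding the common cutoff w_bar to the PIAM sketch for every alternative."""
    return piam_subset_probability(subset, V, [w_bar] * len(V), mu=mu)
```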
One notes that the probability of considering a choice alternative $m$ is a decreasing function of the difference $\bar{W}_i - V_{i,m}$. Here again, due to the linear functional forms of $\bar{W}_i$ and $V_{i,m}$, the sign of the effect of an independent variable on the probability of considering the $m$th choice alternative depends on the sign of the difference of the corresponding coefficients.

17.2.2.3. Continuous vs. discrete unobserved taste heterogeneity. Rather strong assumptions made to implement the SLT and PIAM approaches are the identical distribution and independence of the error terms. There are many reasons to foresee the existence of unobserved correlation between the levels of utility and the levels of the latent bounds, and many reasons to consider unobserved heteroskedasticity. Such methodological drawbacks may (at least partly) be overcome by introducing error components and/or random parameters in the mathematical formulations of the different parts of the modelling approaches (e.g. Train, 2003). One must, however, define clearly their causal effects and their meanings. In the present approaches, it is assumed that, even though the decision makers adopt the same behavioural norm (utility maximisation), the impacts of the different factors that affect selection and choice may vary across them. One assumes the existence of what is commonly referred to as unobserved taste heterogeneity.

Continuous unobserved taste heterogeneity. Continuous unobserved taste heterogeneity assumes that the vector of unknown parameters has a continuous probability density function. In the present approach, a parametric distribution $f(\cdot \mid \pi)$ is assumed. For simplification purposes, it does not depend on any independent variable.⁵ The choice probability is defined as the expected choice probability in Eq. (17.1) with respect to the distribution of the vector of random parameters:

$$\Pr(y_i \mid x_i, z_i, \pi) = \prod_{m=1}^{M} \left[ \int_{D(\theta)} \Pr(y_i = m \mid x_i, z_i, \theta)\, f(\theta \mid \pi)\, d\theta \right]^{I(y_i = m)} \quad (17.17)$$
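The integral in Eq. (17.17) has no closed form; the chapter later approximates it by simulation with 100 Halton draws. A minimal sketch of that idea, assuming independent normal coefficients (the helper names are mine, not the chapter's):

```python
import numpy as np
from scipy.stats import norm, qmc

def halton_normal_draws(mean, sd, n_draws=100, seed=0):
    """Quasi-random draws of independent normal coefficients: Halton points
    in (0,1)^d pushed through the normal inverse cdf."""
    mean, sd = np.asarray(mean, float), np.asarray(sd, float)
    u = qmc.Halton(d=len(mean), seed=seed).random(n_draws)
    return mean + sd * norm.ppf(u)

def simulated_choice_probability(prob_fn, draws):
    """Eq. (17.17) by simulation: average the mixture probability of Eq. (17.1),
    evaluated at each parameter draw theta, over the draws."""
    return float(np.mean([prob_fn(theta) for theta in draws]))
```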
The choice of $f(\cdot \mid \pi)$ is itself a challenge, and it depends on the objectives of the modeller and the available data. The most used distributions are the normal, the log-normal, the truncated normal, the uniform, the triangular and the Johnson $S_B$ distributions. One refers the reader to Hensher, Rose, and Greene (2005), Hess, Polak, and Axhausen (2006) and Train (2003) for a discussion of their advantages and drawbacks.

5. One notes that independent variables could have been introduced in the specification of the distribution to explain variation of its parameters. Such an approach has not been tested in the presented application.

Discrete unobserved taste heterogeneity. Latent class choice models have recently drawn considerable attention among researchers and practitioners as a class of useful tools for capturing heterogeneity across different segments in a target market or population (see for instance Greene & Hensher, 2003). They allow for the fact that decision makers do not have the same tastes when they do not belong to the same segment, and that they have the same tastes when they belong to the same segment.
The choice probability is defined as the expected choice probability in Eq. (17.1) with respect to the discrete distribution of the vector of coefficients:

$$\Pr(y_i \mid x_i, z_i, \theta_1, \ldots, \theta_S, \pi) = \prod_{m=1}^{M} \left[ \sum_{s=1}^{S} \Pr(\theta_s \mid z_i, \pi)\, \Pr(y_i = m \mid x_i, z_i, \theta_s) \right]^{I(y_i = m)} \quad (17.18)$$
In the present approach, one does not set any strict behavioural assumptions on class assignment. It does not depend on specific latent processes but rather on the intersection of several fuzzy ones, some of which may not be naturally ordered. No attempt is made to explain in a meticulous way the distribution of unobserved heterogeneity across the population of decision makers: it exists, and it is modelled by MNL assignment probabilities:

$$\Pr(\theta_s \mid z_i, \pi) = \frac{\exp(z'_i \pi_s)}{\sum_{l=1}^{S} \exp(z'_i \pi_l)} \quad (17.19)$$
The fact that the distributions of unobserved heterogeneity do not depend on the attributes of the discrete alternatives in $A$ relies on the principle of consumer sovereignty (see for instance McFadden, 2001), which states that preferences, hence tastes, are established prior to any choice situation.
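A sketch of the discrete mixture of Eqs. (17.18)–(17.19) (illustrative; as the identification discussion below requires, the row of pi for the reference class would be fixed to zeros):

```python
import numpy as np

def class_assignment_probabilities(z, pi):
    """Eq. (17.19): MNL class-assignment probabilities. pi is an (S, dim(z))
    array of class parameters; the reference class's row is all zeros."""
    scores = np.asarray(pi, float) @ np.asarray(z, float)
    e = np.exp(scores - scores.max())  # numerically stabilised softmax
    return e / e.sum()

def lcl_choice_probability(m, z, pi, class_cond_probs):
    """Eq. (17.18): mix the class-conditional probabilities Pr(y = m | theta_s)
    with the assignment weights."""
    w = class_assignment_probabilities(z, pi)
    return float(np.dot(w, class_cond_probs))
```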
17.2.2.4. Identification

Log-likelihood functions. The log-likelihood functions are defined as the sum of the logarithms of the $n$ individual choice probabilities, defined unconditionally on the choice of a consideration set and unconditionally on unobserved taste heterogeneity:

$$\ln \ell_{MXL}(\pi \mid y_i, x_i, z_i) = \sum_{i=1}^{n} \ln\left(\Pr(y_i \mid x_i, z_i, \pi)\right) \quad (17.20)$$
The $\ln \ell_{MXL}$ function does not have any closed-form solution; Monte-Carlo integration as well as numerical integration techniques have to be used (e.g. Gouriéroux & Monfort, 1996). Monte-Carlo integration techniques are used for the application. As regards the approaches with discrete unobserved heterogeneity, the log-likelihood function may be written as:
$$\ln \ell_{LCL}(\theta_1, \ldots, \theta_S, \pi \mid y_i, x_i, z_i) = \sum_{i=1}^{n} \ln\left(\Pr(y_i \mid x_i, z_i, \theta_1, \ldots, \theta_S, \pi)\right) \quad (17.21)$$
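Either log-likelihood, Eq. (17.20) or Eq. (17.21), then sums the log of the relevant unconditional probability over the $n$ decision makers; a one-function sketch (illustrative):

```python
import numpy as np

def log_likelihood(prob_of_observed_choice, sample):
    """Eqs. (17.20)-(17.21): sum of log unconditional choice probabilities.
    prob_of_observed_choice maps one observation (y_i, x_i, z_i) to its
    simulated (MXL) or latent-class (LCL) probability."""
    return float(sum(np.log(prob_of_observed_choice(obs)) for obs in sample))
```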
Discussion. As regards identification of the parameters of the model, one has to apply the standard normalisation condition $\mu = 1$ whatever the chosen econometric specification. Looking at Eq. (17.4), as already explained in McFadden (1973), Ben-Akiva and Lerman (1985) and Train (2003), one must also choose a reference alternative, say $m^*$, for which the parameters that weigh the characteristics of the decision maker are fixed to 0 (because choice probabilities are functions of the differences of the levels of utility). It means that only $\beta_m - \beta_{m^*}$ may be estimated.

Additional identification conditions need to be adapted depending on the choice of a mixed logit (MXL) or a latent class logit (LCL) model, but not depending on the choice of a SLT or a PIAM screening rule. Looking at Eqs. (17.9), (17.10), (17.15) and (17.16), one observes that the probability of considering a given subset of choice alternatives is a function of the differences between the levels of utility and the levels of the latent cutoffs. But if one uses the above exclusion constraint, one finds that only $\phi_m - \beta_m$ and $\alpha - \beta_m$ may be estimated; $\gamma$, $\psi$ and $\varphi$ may be estimated without additional constraint.

Walker, Ben-Akiva, and Bolduc (2007) have proposed rigorous identification conditions for some standard MXL specifications, including those that assume alternative-specific variances. For cross-sectional data, using rank, order and equality conditions, they showed that the mode of reference must be the one with the largest alternative-specific variance. For an easier reading of the results, one may also choose the same mode of reference for identification of the coefficients associated with the characteristics of the decision maker. If the coefficients associated with the characteristics of the decision maker are random, then exclusion constraints must also apply to their higher moments. For instance, if their distribution is defined only by a mean and a variance, then exclusion constraints must apply to the variances of the parameters of the mode of reference. Finally, when specifying latent class models with MNL assignment probabilities as in Eq. (17.19), one must choose a latent class of reference for which the parameters are fixed to 0.
17.3. Data

17.3.1. Sample Formation

Data are drawn from the 2001–2002 travel survey conducted in the Paris region of France and from the outputs of the MODUS model of the regional direction of roads and infrastructure (Papinutti, 1998; DREIF/DIT/GEST, 1998). The MODUS model is one of the few regional transportation planning systems (Bhouri, 2002). One uses the interzonal and intrazonal equilibrium travel times, travel costs and travel distances it produces to build up the transport attributes of the individual choice situations.

The population of interest consists of workers who may use either a private vehicle or a public mode of transport for their home-to-work trips, that is, two generic modes
Table 17.1: Descriptive statistics, sample size n = 4601.

Label                             Mean    SD      Freq.
Observed choices
  Choice: PV mode                 –       –       3251
  Choice: PT mode                 –       –       1350
Independent variables
  Trip time (a) in hr, PV         0.63    0.50    –
  Trip time (b) in hr, PT         1.01    0.68    –
  Trip cost (c) in Euros, PV      2.45    4.41    –
  Trip cost (d) in Euros, PT      1.69    0.57    –
  Trip distance (e) in km         13.37   10.59   –
  Daily income (f) in Euros       58.58   34.48   –
  Age in yr                       40.82   22.55   –
  D.V.: live in Paris city        –       –       440
  D.V.: work in Paris city        –       –       1168
  D.V.: male gender               –       –       2714

Abbreviations: SD, standard deviation; Freq., frequency; PV, private motorised vehicle; PT, public transportation; hr, hour; km, kilometre; yr, year; D.V., dummy variable.
(a) Computed as the network equilibrium path time when PV is not chosen.
(b) Computed as the shortest path time when PT is not chosen.
(c) Computed as the 2002 STIF average kilometre cost (0.19 Euros) times distance at the network equilibrium state.
(d) Computed as the monthly cost of the travel card divided by 40 trips for a full-time job and 20 trips for a part-time job.
(e) Computed as the shortest path distance.
(f) Computed as the class centre value divided by consumption units in the household and 360 days per year.
of transport.⁶ Such a sample selection excludes some workers who live and/or work in locations that public transport does not serve. It also excludes workers who do not have a private mode of transport available, either because their households do not own cars or motorcycles, because they are not licensed to drive, or because they cannot practice carpooling.

Table 17.1 briefly reports descriptive statistics of the variables selected for the analysis. About 70.66% of the decision makers use a private vehicle; on average, it is observed to be the fastest but also the most expensive mode.⁷ The average daily income is about 58.58 Euros, but the empirical distribution has a large variance. Only 9.56% of the observed workers live in the city of Paris (i.e. the urban centre of the region) and 25.39% work in the city of Paris; 58.99% of the workers are men.
6. Genericity may not be the best approach to study individual choices, but more detailed transport supply data were not available.
7. One may note that if there were no such balance in travel attributes, one of the modes of transport might dominate the other.
Table 17.2: Summary of estimated models, sample size n = 4601.

Model (a)   ncl.(b)   ln l (c)    ln l0 (d)   npar.(e)   LR stat.(f)   r2 (g)   AIC (h)   BIC (i)
MNL         1         -1773.07    -3189.17    8          2832.20       0.44     3562.14   3613.61
LC-MNL      2         -1734.94    -3189.17    22         2908.46       0.45     3513.88   3655.43
MXL         C         -1762.69    -3189.17    16         2852.96       0.45     3557.38   3660.86
PIAM        1         -1665.67    -3189.17    24         3047.00       0.47     3379.34   3533.73
MXL-PIAM    C         -1642.94    -3189.17    48         3092.46       0.47     3381.88   3690.71
SLT (j)     1         -1710.32    -3189.17    15         2957.70       0.46     3450.64   3547.15
LC-SLT      2         -1664.55    -3189.17    36         3049.24       0.47     3401.10   3632.73
MXL-SLT     C         -1701.02    -3189.17    30         2976.30       0.46     3462.04   3655.06

Abbreviations: MNL, multinomial logit; LCL, latent class logit; MXL, mixed logit; PIAM, parametrised independent availability model; SLT, single latent threshold.
(a) Maximum likelihood or maximum simulated likelihood estimation. Optimisation algorithms involve a combination of BFGS and Newton–Raphson algorithms with line search based on trust-region or on Wolfe's-conditions approaches. Initial values: vector of zeros for the MNL model; MNL estimates for PIAM and SLT; PIAM and SLT estimates for their mixed versions; expectation-maximisation (EM) estimates for latent class PIAM and latent class SLT. Initial values of the EM are arbitrary, although they include PIAM or SLT estimates plus a vector of zeros as a possible initial guess.
(b) Number of latent classes. C: continuously distributed taste heterogeneity assuming independent normal probability distribution functions. Halton draws (100) are used to approximate the multivariate integrals in the log-likelihood function.
(c) Value of the log-likelihood function at the convergence point.
(d) Value of the log-likelihood function with all parameters equal to 0.
(e) Number of parameters, P.
(f) Likelihood ratio statistic: 2(ln l − ln l0).
(g) Pseudo-r2, adjusted: 1 − (ln l − P)/ln l0.
(h) Akaike information criterion (AIC): 2P − 2 ln l. The AIC attempts to find the model that best explains the data with a minimum of free parameters.
(i) Bayesian information criterion (BIC): P ln(n) − 2 ln l. The BIC also attempts to find the model that best explains the data with a minimum of free parameters; when ln(n) > 2, it penalises them more strongly than does the AIC.
(j) Context dependency: shortest distance.
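The fit statistics defined in the footnotes can be checked directly; for instance, for the MNL row (ln l = −1773.07, ln l0 = −3189.17, P = 8, n = 4601):

```python
import math

ll, ll0, P, n = -1773.07, -3189.17, 8, 4601   # MNL row of Table 17.2
lr_stat   = 2 * (ll - ll0)                    # 2832.20
pseudo_r2 = 1 - (ll - P) / ll0                # ~0.44 (adjusted)
aic       = 2 * P - 2 * ll                    # 3562.14
bic       = P * math.log(n) - 2 * ll          # ~3613.61
```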
17.3.2. Model Specifications

Several MNL, SLT and PIAM specifications have been estimated, but not all of them are reported in the present article. The models have been estimated with different functional forms of the utility function: linear utility functions, logarithmic utility functions, and combinations of linear and logarithmic transformations of the independent variables. It is found that the likelihood functions are maximised when all the quantitative variables are transformed into logarithms (Table 17.2).⁸
8. One is aware that more flexible functional forms should also have been tested, for example by using Box–Cox transformations of the quantitative variables (see, for instance, Mandel, Gaudry, & Rothengatter, 1997).
Models with random parameters and alternative-specific constants have been estimated by the maximum simulated likelihood method (e.g. Gouriéroux & Monfort, 1996) using 100 Halton draws⁹ (e.g. Train & Sandor, 2004) and assuming that the coefficients are normally distributed.¹⁰ Public transport is the mode of reference, for which the intercept and the coefficients of the individual-specific variables are fixed to 0; it appears to be the mode with the largest unobserved variance. For ease of reading and comparing the results, it is also chosen as the mode of reference when discrete latent classes are allowed.

After having tested various model structures,¹¹ context dependency is defined in the SLT approach as the shortest distance to travel from home to work. It defines the model specification that best replicates the observed choices and that gives the best statistical information; the choice between the different definitions is made on the basis of the BIC criterion. The estimation strategy has also led to estimating model specifications assuming different numbers of latent classes. As proposed by Greene and Hensher (2003), the 'optimal' number of latent classes is determined by selecting the models with the lowest BIC; the reported latent class discrete choice models are the latter.

Numerical evaluation of the PIAM approach with latent classes was, however, not stable, and the results did not seem reliable for several reasons: violation of theoretical assumptions (positive signs of the coefficients of travel attributes in the utility functions), huge inflation of the number of parameters to estimate, and numerical issues in computational aspects (optimisation of highly non-linear functions with a large number of parameters). This may be due to an inappropriate estimation strategy, especially as regards computation of a 'good' initial guess to start the algorithm, with respect to the large dimension of the optimisation problem. For instance, estimation of such a specification with only 2 latent classes requires estimating 54 parameters; if 3 latent classes are assumed, then the number of parameters to estimate grows to 84. While running the algorithm to estimate a LCL-PIAM model, it was observed that it converged almost systematically to saddle points or to boundary solutions. This does not mean, however, that the approach is not feasible; rather, an important topic of future research is to focus on
9. Halton draws are a way to speed up estimation for a given level of accuracy of the estimator. It is not the only method, e.g. Hess, Bierlaire, and Polak (2005).
10. Although somewhat conventional, such an assumption has been and is still criticised by many authors. Many alternative specifications have been proposed to keep its advantages and to circumvent some of its drawbacks (e.g. the textbooks of Train (2003) and Hensher et al. (2005)).
11. Firstly, the same independent variables have been used for every model specification. Secondly, SLT models have also been estimated without context dependency; with context dependency defined as the minimum travel time between the available modes of transport; with context dependency defined as the minimum travel cost between the available modes of transport; with context dependency defined as the minimum travel time and the minimum travel cost between the available modes of transport; and with context dependency defined as the shortest distance between modes of transport.
specific optimisation routines to deal with this issue, including automatic testing of multiple start values.

The results show a strong improvement in model fit under a latent class structure in both the base model and the SLT model as compared to a MXL structure. Assuming that the decision makers may be grouped into subpopulations gives better results than assuming that each decision maker has specific tastes. Under either a latent class structure or a base structure, the SLT approach performs statistically better than a MNL approach, thereby suggesting that the decision makers do not behave as pure random utility maximisers. But as compared to a MNL specification or a SLT specification, the PIAM approach seems to reproduce the observed choices best.

Regardless of the choice of model specification between MNL, PIAM or SLT, the MXL extension does not give satisfactory results. The improvements in the log-likelihood functions are not significant as compared to the additional number of parameters required to estimate the models. The choice of a model clearly depends on which criterion is selected. Within each class of model, and based on the BIC criterion, the LC-MNL, the PIAM and the LC-SLT are the specifications that best explain the data with a minimum of free parameters. Considering all the estimated models together, the PIAM approach should be preferred.
17.4. Results

Even though it is presented mainly for informational purposes, the results of the MNL model are reassuring about the expected signs and significance of the estimates (Table 17.3). The probability of choosing a private motorised mode is a decreasing function of its travel time and its travel cost, and an increasing function of income and age. It also decreases when the home or the workplace is located in the urban centre, that is, where the ratio of road capacity to population density (and job density) is the lowest and where the supply of public transport is the densest. It is also shown that the probability of choosing a private vehicle is larger for men than for women, denoting a significant gender effect.

When the model is a MXL (with alternative-specific variances) with independently and identically normally distributed random coefficients, it is found that the impacts of travel time, travel cost and home location are significantly dispersed across the population of decision makers (Table 17.4). When unobserved heterogeneity is discrete and explained by a MNL class assignment model, the results show that two latent classes are enough to cluster the population of travellers (Table 17.5). The results show that the decision makers with a higher willingness-to-pay for travel time savings have a larger probability of living in the suburbs and working in the urban centre. They also have a lower probability of choosing a private vehicle.
Table 17.3: MNL estimates, sample size n = 4601.

Choice probability: PV mode (a)
Label                             Est.(b)   t-stat (c)
Intercept                         1.63      2.75
ln (d) trip time in hr            2.00      20.65
ln trip cost in Euros             1.04      14.87
ln daily income in Euros          0.13      1.83
ln age in yr                      0.45      2.95
D.V.: live in Paris city          0.60      4.65
D.V.: work in Paris city          1.66      17.71
D.V.: male gender                 0.69      7.90

Fit statistics
ln l                              -1773.07
ln l0                             -3189.17
LR stat                           2832.20
Pseudo-r2, adjusted               0.44
AIC                               3562.14
BIC                               3613.61

Abbreviations: hr, hour; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
(a) For identification purposes, the coefficients of the intercept and the individual-specific variables are fixed to 0 in the specification of the PT utility function.
(b) Maximum likelihood estimate.
(c) Student t-statistic.
(d) Natural logarithm.
The SLT model assumes the existence of a latent threshold that determines whether a mode of transport is considered or not. The effect of a variable on the probability of considering a mode of transport is measured as the difference between the estimate in the corresponding utility function and the estimate in the latent threshold: if the difference is negative, the impact is negative; if the difference is positive, the impact is positive (Table 17.6).

Consideration of both modes of transport is an increasing function of the distance to travel and of age, and it decreases when the home is located in the urban centre (in the city of Paris). The decision maker is likely to enlarge the set of choice alternatives when the distance to travel increases: he/she is likely to look for a larger variety of travel alternatives and trade them off. Conversely, in the present application, this means that the decision maker becomes more set in his/her tastes as the distance to travel decreases, and therefore favours only one mode of transport. To this, one may add that living in the city of Paris increases the probability of considering only one mode of transport, especially public transport, as shown by the coefficients of the utility function of the private vehicle. Interestingly, the results also show that even though
Table 17.4: Mixed MNL estimates, sample size n = 4601.

Choice probability: PV mode
Label                             Mean    t-stat   SD      t-stat
Intercept                         3.10    1.68     0.70    0.65
ln trip time in hr                3.51    2.28     1.33    1.90
ln trip cost in Euros             1.67    2.36     0.59    2.13
ln daily income in Euros          0.27    1.43     0.25    1.26
ln age in yr                      0.75    1.73     0.32    1.15
D.V.: live in Paris city          0.77    1.86     2.65    2.01
D.V.: work in Paris city          2.71    2.26     0.12    0.26
D.V.: male gender                 1.26    2.09     1.30    1.59

Fit statistics
ln l                              -1762.69
ln l0                             -3189.17
LR stat                           2852.96
Pseudo-r2, adjusted               0.45
AIC                               3557.38
BIC                               3660.86
Number of simulations             100

Abbreviations: SD, standard deviation; hr, hour; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
the probability of choosing the private vehicle decreases when the workplace is located in the city of Paris, the latter variable does not affect the probability of considering it. One notes also that consideration of both modes of transport is an increasing function of age, which also leads to a larger probability of choosing a private mode of transport. Finally, given the definition of context dependency and the mathematical formulation of the model, there is an interesting interpretation of the probability of considering a mode of transport: it is actually defined as an increasing function of its speed.

The results also show that the utility of a mode of transport is a decreasing function of its travel time and its travel cost. The probability of choosing a private mode of transport increases with age and when the decision maker is a man. Although income does not affect the latter choice probability or the level of consideration (i.e. W), it significantly affects the probability of considering a private vehicle. The effect of income on it is actually defined as the difference between the two estimates associated with the income variable, with a corresponding t-stat equal to 3.49.

Considering normally distributed unobserved heterogeneity, it is found that only the travel time coefficient is significantly dispersed across the population of decision makers (Table 17.7).
Table 17.5: Latent class MNL estimates, sample size n = 4601.

Choice probability: PV mode
                                  Class 1             Class 2
Label                             Est.    t-stat      Est.    t-stat
Intercept                         1.89    1.37        2.50    0.85
ln trip time in hr                3.23    7.61        0.21    0.65
ln trip cost in Euros             1.41    8.58        0.43    1.74
ln daily income in Euros          0.16    0.87        1.26    2.14
ln age in yr                      0.61    1.85        0.06    0.09
D.V.: live in Paris city          1.83    4.45        2.31    1.37
D.V.: work in Paris city          0.90    3.45        4.54    3.14
D.V.: male gender                 0.57    2.53        2.52    1.67

Class assignment (a)
Intercept                         –       –           2.47    1.14
ln daily income in Euros          –       –           0.18    0.78
ln age in yr                      –       –           0.33    0.77
D.V.: live in Paris city          –       –           0.62    2.05
D.V.: work in Paris city          –       –           0.97    3.09
D.V.: male gender                 –       –           0.11    0.44

Fit statistics
ln l                              -1734.94
ln l0                             -3189.17
LR stat                           2908.46
Pseudo-r2, adjusted               0.45
AIC                               3513.88
BIC                               3655.43

Abbreviations: hr, hour; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
(a) For identification purposes, the intercept term and the coefficients of the individual-specific variables are fixed to 0 for the first latent class.
As compared to the model without unobserved heterogeneity, the signs, relative magnitudes and significance of the average effects do not change much; there is no clear difference.

The results of the SLT model with two latent classes allow for two groups of individuals in the population, one of which is found to be more sensitive to travel time, travel cost and travel speed. The decision makers who belong to it have a larger willingness-to-pay for travel time savings (Table 17.8). Class assignment characterises the probability that an individual belongs to a specific group. One observes that the probability of belonging to the second population segment, which is the most sensitive to the attributes of the modes of transport, is an
Table 17.6: SLT estimates, sample size n = 4601.

Choice probability: PV mode
Label                             Est.    t-stat
Intercept                         1.41    2.18
ln trip time in hr                2.65    22.27
ln trip cost in Euros             1.35    15.19
ln daily income in Euros          0.13    1.62
ln age in yr                      0.38    2.28
D.V.: live in Paris city          0.37    2.50
D.V.: work in Paris city          1.55    15.23
D.V.: male gender                 0.64    6.64

Single latent threshold
Intercept                         7.12    2.87
ln trip distance in km            3.21    11.68
ln daily income in Euros          0.11    0.34
ln age in yr                      1.23    1.75
D.V.: live in Paris city          2.07    3.12
D.V.: work in Paris city          0.19    0.30
D.V.: male gender                 0.21    0.52

Fit statistics
ln l                              -1710.32
ln l0                             -3189.17
LR stat                           2957.70
Pseudo-r2, adjusted               0.46
AIC                               3450.64
BIC                               3547.15

Abbreviations: hr, hour; km, kilometre; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
increasing function of income, a decreasing function of age, and that it also increases when the workplace is located in the city of Paris.

The results show that income plays a more significant¹² role when the decision maker is likely to belong to the first segment of the observed population. One notes, however, that this concerns, with a larger probability, the population of decision makers with lower levels of income. An increase in income affects the choice probability of a private vehicle more significantly. Even though there are, by definition, decreasing
12. This is to be understood as the coefficient having a larger t-stat. Comparing the two magnitudes does not deserve any attention, as the two latent segments have their own EV1 probability distributions (i.e. the levels of utility have different scales).
Table 17.7: Mixed SLT estimates, sample size n = 4601.

Choice probability: PV mode
Label                             Mean    t-stat   SD      t-stat
Intercept                         1.46    2.21     0.06    0.97
ln trip time in hr                2.72    20.44    0.34    2.54
ln trip cost in Euros             1.39    14.80    0.02    0.17
ln daily income in Euros          0.13    1.61     0.02    0.52
ln age in yr                      0.40    2.31     0.02    0.46
D.V.: live in Paris city          0.35    2.26     0.05    0.12
D.V.: work in Paris city          1.59    14.82    0.11    0.66
D.V.: male gender                 0.68    6.81     0.02    0.46

Single latent threshold
Intercept                         7.09    2.67     0.14    0.42
ln trip distance in km            3.30    11.34    0.22    0.93
ln daily income in Euros          0.09    0.26     0.01    0.07
ln age in yr                      1.24    1.67     0.12    0.49
D.V.: live in Paris city          2.71    3.86     0.26    0.39
D.V.: work in Paris city          0.03    0.04     0.05    0.06
D.V.: male gender                 0.29    0.66     0.18    0.41

Fit statistics
ln l                              -1701.02
ln l0                             -3189.17
LR stat                           2976.30
Pseudo-r2, adjusted               0.46
AIC                               3462.04
BIC                               3655.06
Number of simulations             100

Abbreviations: hr, hour; km, kilometre; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
marginal returns of income on the choice probability of a private mode of transport, this emphasises it. One understands that not only is utility an increasing concave function of income, but preferences, and therefore tastes, may also change with it. One notes also that the decision makers who may belong to the second latent class are the most sensitive to speed when evaluating the considerable modes of transport: their probabilities of considering a mode of transport are more sensitive to a variation in travel speed.

The PIAM model assumes different latent consideration levels for the modes of transport. Here again, the effect of a variable on the probability of considering a mode of transport is measured as the difference between the estimate in the corresponding utility function and the estimate in the corresponding latent threshold. If the
Table 17.8: Latent class SLT estimates, sample size n = 4601.

Choice probability: PV mode
                                  Class 1             Class 2
Label                             Est.    t-stat      Est.    t-stat
Intercept                         0.49    0.39        10.77   2.88
ln trip time in hr                2.57    11.79       7.64    5.53
ln trip cost in Euros             1.58    8.97        2.06    5.28
ln daily income in Euros          0.63    3.71        1.03    1.86
ln age in yr                      0.54    1.46        0.37    0.55
D.V.: live in Paris city          0.46    1.15        0.01    0.02
D.V.: work in Paris city          0.97    2.23        2.51    3.28
D.V.: male gender                 1.08    3.80        1.08    2.39

Single latent threshold
Intercept                         7.66    1.73        5.77    0.99
ln trip distance in km            3.75    7.27        6.58    5.97
ln daily income in Euros          0.10    0.17        1.13    1.21
ln age in yr                      1.34    0.96        2.10    1.68
D.V.: live in Paris city          2.88    3.02        1.08    1.19
D.V.: work in Paris city          0.57    0.42        7.03    3.99
D.V.: male gender                 0.91    0.93        1.18    1.39

Class assignment (a)
Intercept                         –       –           1.64    0.75
ln daily income in Euros          –       –           1.12    3.56
ln age in yr                      –       –           1.42    2.39
D.V.: live in Paris city          –       –           0.26    0.47
D.V.: work in Paris city          –       –           3.03    4.91
D.V.: male gender                 –       –           0.52    1.24

Fit statistics
ln l                              -1664.55
ln l0                             -3189.17
LR stat                           3049.24
Pseudo-r2, adjusted               0.47
AIC                               3401.10
BIC                               3632.73

Abbreviations: hr, hour; km, kilometre; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
(a) For identification purposes, the intercept term and the coefficients of the individual-specific variables are fixed to 0 for the first latent class.
difference is negative, then the impact is negative; if the difference is positive, then the impact is positive (Table 17.9).

The results show again a decreasing probability of considering a private mode of transport when the decision maker works in the city of Paris. They also show a decreasing probability of considering a private mode of transport when his/her home is located in the city of Paris. The effects of home and workplace location on the choice probabilities are not significant. The probability of considering a private mode of transport increases when the decision maker is a man. Gender has no effect on the probability of considering a public mode of transport and, once the impacts on the consideration processes are controlled for, gender also has no effect on the probability of choosing a mode of transport.

The results show that only travel time plays a significant role in both the probability of considering a mode of transport and the probability of choosing it when considered. Travel cost does not play any role in the probability of considering a mode of transport: even though the difference of the coefficients may appear to be strictly positive for the public transport mode, it is not statistically significant (t-stat = 1.53). Considering that both modes are selected for final decision making, it is found that travel cost does not play a significant role in the probability of choosing one of them. Age has no effect on the probability of considering a public mode of transport; it has, however, a significant and positive effect on the probability of considering a private vehicle.

When assuming continuously distributed unobserved heterogeneity, one observes that taste heterogeneity does not concern many independent variables (Table 17.10). One notes, however, that the impacts of home location and income on the probability of choosing a private vehicle are significantly dispersed across the population of travellers. The results show a significant unobserved heteroskedasticity, denoting that there may exist unobserved circumstances that affect this probability. The results also show that the effect of travel time on the probability of considering a private vehicle is significantly dispersed across the population of decision makers, that is, the difference of the variances is statistically significant (t-stat = 2.30). The same kind of result is found as regards the income, age and gender variables.

From a general point of view, the results of the models show that tastes are distributed in the population of travellers, especially as regards travel time. The results actually show a significant dispersion of the willingness-to-pay for saving travel time, whether one considers continuous or discrete distributions. When the latter is privileged, one finds that two latent segments are enough to cluster the observed population. Finally, if one has to choose a preferred model on the basis of the full set of independent variables, that is, without re-estimating the models with only their significant variables, one would use the MXL-PIAM approach.
Table 17.9: PIAM estimates, sample size n = 4601.

Choice probability: PV mode
Label                             Est.    t-stat
Intercept                         3.83    2.61
ln trip time in hr                4.60    8.80
ln trip cost in Euros             0.15    0.53
ln daily income in Euros          0.15    0.97
ln age in yr                      0.47    1.49
D.V.: live in Paris city          0.35    0.92
D.V.: work in Paris city          0.35    1.39
D.V.: male gender                 0.38    1.59

Latent threshold: PV mode
Intercept                         2.00    0.60
ln trip time in hr                4.10    5.42
ln trip cost in Euros             0.05    0.06
ln daily income in Euros          0.44    1.13
ln age in yr                      0.28    0.30
D.V.: live in Paris city          0.82    0.81
D.V.: work in Paris city          1.96    2.73
D.V.: male gender                 2.18    2.22

Latent threshold: PT mode
Intercept                         0.86    0.61
ln trip time in hr                2.34    2.23
ln trip cost in Euros             2.85    4.53
ln daily income in Euros          0.005   0.03
ln age in yr                      0.38    1.15
D.V.: live in Paris city          0.29    1.05
D.V.: work in Paris city          2.19    8.50
D.V.: male gender                 0.29    1.46

Fit statistics
ln l                              -1665.67
ln l0                             -3189.17
LR stat                           3047.00
Pseudo-r2, adjusted               0.47
AIC                               3379.34
BIC                               3533.73

Abbreviations: hr, hour; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
Table 17.10: Mixed PIAM estimates, sample size n = 4601.

Choice probability: PV mode
Label                             Mean    t-stat   SD      t-stat
Intercept                         3.02    1.54     0.22    2.27
ln trip time in hr                4.44    5.22     0.05    0.20
ln trip cost in Euros             0.46    1.11     0.25    0.85
ln daily income in Euros          0.07    0.50     0.08    1.65
ln age in yr                      0.44    1.39     0.04    0.89
D.V.: live in Paris city          0.51    1.48     1.85    3.11
D.V.: work in Paris city          0.63    1.75     0.31    1.29
D.V.: male gender                 0.51    2.33     0.25    1.22

Latent threshold: PV mode
Intercept                         0.87    0.20     0.40    1.08
ln trip time in hr                3.77    3.26     0.75    2.03
ln trip cost in Euros             0.19    0.24     0.38    0.61
ln daily income in Euros          1.01    2.05     0.29    1.85
ln age in yr                      0.26    -0.22    0.31    2.29
D.V.: live in Paris city          1.85    1.40     0.77    0.66
D.V.: work in Paris city          2.29    2.14     0.74    0.61
D.V.: male gender                 5.97    2.60     3.20    2.70

Latent threshold: PT mode
Intercept                         0.55    0.26     0.09    0.62
ln trip time in hr                0.88    0.31     0.82    1.23
ln trip cost in Euros             4.10    2.32     0.33    0.83
ln daily income in Euros          0.05    0.20     0.09    0.84
ln age in yr                      0.63    1.09     0.06    0.77
D.V.: live in Paris city          0.67    1.55     0.20    0.25
D.V.: work in Paris city          2.67    4.15     0.02    0.05
D.V.: male gender                 0.17    0.65     0.24    0.59

Fit statistics
ln l                              -1642.94
ln l0                             -3189.17
LR stat                           3092.46
Pseudo-r2, adjusted               0.47
AIC                               3381.88
BIC                               3690.71
Number of simulations             100

Abbreviations: hr, hour; yr, year; D.V., dummy variable; AIC, Akaike information criterion; BIC, Bayesian information criterion.
17.5. Conclusions

Even though the approaches are unfortunately not easily practicable for large M, in the sense that they require studying selection among 2^M − 1 subsets of alternatives, they remain worthwhile. They may virtually be set up with any MXL specification and/or any GEV probability distribution function as concerns the layer that models the choice within the considered set of discrete alternatives. One notes also that other distributional assumptions about the error terms may be used.

From a general point of view, the results of the application are satisfactory. Accounting for latent pre-selection of choice alternatives produces better results. Under a latent class structure, the SLT approach gives the best results, but on the basis of the entire set of estimated models, the PIAM approach should be privileged. Note, however, that estimation of the LCL-PIAM model failed; it remains an important topic for future research. Also, every time a latent class structure is assumed, it improves significantly the fit of the models and performs better than a MXL structure.

It is also derived from the application that two latent classes are enough to model taste heterogeneity. They correspond to two groups of travellers: those with a low value of travel time savings and those with a high value of travel time savings. When associated with the SLT approach, one deduces that the optimal segmentation of the population of interest is of six groups, that is, three possible consideration sets of alternatives times two latent classes for tastes.

Further research may, for instance, also concern the specification, estimation and comparison of models with larger full choice sets, models with flexible functional forms of the levels of utility and latent thresholds, models with GEV distributions varying along with the considerable choice sets, and models with other screening rules.
PART V: TRANSPORT MATTERS
Chapter 18
Transport Welfare Benefits in the Presence of an Income Effect
James Laird
Abstract

This paper presents new evidence that the error in estimating the economic welfare of a transport scheme can be very large, for two reasons. Firstly, when cost changes are large, the income effect can be significant: the change in consumer surplus is then no longer a good estimate of the compensating variation — the true measure of welfare benefit. Secondly, in the presence of large cost changes, estimating the change in consumer surplus using the Rule of Half can itself lead to large errors. The paper uses a novel approach based on stated choice and contingent valuation data to estimate the size of this error for the provision of fixed links to islands in the Outer Hebrides of Scotland.
18.1. Introduction

In the transportation field it is normal to calculate the welfare benefits of an intervention through the change in Marshallian consumer surplus in combination with the assumption of a linear demand curve. The latter assumption is usually referred to as the Rule of Half convention. A good transport appraisal therefore accurately measures the area GC0a′b′GC1 in Figure 18.1(b) when estimating the welfare benefit of a transport intervention. The 'exact' welfare benefit of a transport intervention is in fact given by either of the Hicksian measures of compensating or equivalent variation. With its reference point of pre-intervention utility levels, compensating variation is generally accepted as the appropriate Hicksian measure for a transport cost–benefit analysis of changes in transport quality (see e.g. Boardman, Greenberg, Vining, & Weimer, 2001, pp. 59–64).
[Figure 18.1: Compensating variation and the change in consumer surplus. (a) Indifference curve analysis of a change in generalised cost: demand for the composite good (Y) against demand for transport (X), with indifference curves U0 and U1, budget constraints GH, GI and JK, optimum bundles a, b and c, and demands X0, Xc and X1. (b) Demand curve analysis of a change in generalised cost: generalised cost (GC) against demand for transport (X), showing the Hicksian (utility-compensated) demand curve, the Marshallian demand curve, the linear approximation to the Marshallian demand curve, the welfare benefit GC0a′c′GC1, and Errors 1 and 2.]
The compensating variation represents the income equivalent of a price change. In a transport context it is the maximum a transport user will pay for a transport intervention. It is illustrated in Figure 18.1(a) as the difference between points G and J, and in Figure 18.1(b) by the area GC0a′c′GC1. With transport acting as a normal good and a reduction in generalised cost, as in Figure 18.1(b), the compensating variation is always less than the change in consumer surplus. The difference between the two measures stems from the need to exclude the income effect when calculating the willingness to pay for a transport intervention. With reference to Figure 18.1(a), the initial bundle of goods comprises a composite good (Y) and a transport good (X) and is given by point a on indifference curve U0. The pre-intervention budget constraint is given by line GH. A reduction in the generalised cost of transport reduces the slope of the budget constraint post intervention to that depicted by line GI. This results in an optimum post-intervention bundle of goods at point b on indifference curve U1. The resulting change in demand for transport (X1 − X0) can be decomposed into a substitution effect (XC − X0) and an income effect (X1 − XC). This is because at the post-intervention prices, but at the initial utility level (U0), the budget constraint would be JK and the optimum bundle of goods would be at point c with a demand for transport of XC. The maximum willingness to pay for the transport intervention is therefore given by the difference in the intercepts on the y-axis of the budget constraints GI and JK (i.e. the difference between G and J). This is equivalent to the area under the Hicksian utility-compensated demand curve between the pre- and post-intervention generalised costs and, as mentioned above, is illustrated in Figure 18.1(b) by the area GC0a′c′GC1. Willig (1976) argues that in almost all circumstances, and certainly where the change in consumer surplus is less than 5% of income, the change in consumer surplus is a good approximation to the 'exact' compensating variation measure; that is, 'Error 1' in Figure 18.1(b) is small. For example, with an income elasticity of 0.8 and a consumer surplus that is 5% of income, the compensating variation is within 2% of the change in consumer surplus. It is because of arguments such as Willig's that the change in consumer surplus has become de rigueur in transport cost–benefit analysis. A further argument espoused in support of consumer surplus is that the income effect can only be estimated imprecisely. By attempting to correct for the income effect an additional error is introduced into the welfare benefit calculation, and this can prove more costly in terms of precision than just using the change in consumer surplus estimate with no correction (Alston & Larson, 1993).
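Willig's rule of thumb can be written compactly; a sketch of his first-order approximation, with η the income elasticity and s the change in consumer surplus as a share of income (notation introduced here for illustration):

$$\frac{\Delta CS - CV}{\Delta CS} \approx \frac{\eta\, s}{2} = \frac{0.8 \times 0.05}{2} = 0.02,$$

which reproduces the 2% figure quoted above.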
1. Fourteen percent of household expenditure is on transport in the case of the UK (ONS, 2007, Table 18.1). Each household member spends approximately 1 h a day travelling (Metz, 2005).
The assumption of a linear demand curve also introduces an error into the appraisal. This is shown as 'Error 2' in Figure 18.1(b). For small cost changes this error is not an issue, but for large changes a significant overestimation of the change in consumer surplus can occur. As transport can form a large proportion of total household expenditure and of the household's use of time,1 a policy intervention that significantly affects the generalised cost of transport could potentially generate changes in consumer surplus that are large in relation to income. In such situations the change in consumer surplus may not be a good approximation to the compensating variation (Error 1), and simultaneously the Rule of Half may not be a good approximation to the change in consumer surplus (Error 2). Whilst acknowledging this limitation, there has been little development to date in the transport economics literature to address the problem. With respect to Error 1, the earliest work is that by Jara-Díaz and Videla (1990), who found that for medium-income Chilean households the change in Marshallian consumer surplus is 12% higher than the compensating variation. In more recent work Cherchi and Polak (2007) found for a low-income Italian sample (transport expenditure comprises 50% of total expenditure) that the change in consumer surplus is 30% larger than the compensating variation. For high-income people (transport expenditure comprises 13% of total expenditure) the error is much smaller (<5%), as it is for a mixed sample of high- and low-income people. Daly, de Jong, Ibanez, Batley, and de Bok (2008), using the Dutch national model system, estimate that the compensating variation of a policy to introduce national road pricing with a very high per-kilometre charge is 34% lower than the change in consumer surplus (i.e. the change in consumer surplus overestimates the 'exact' welfare benefit by just over 50%). With respect to the second error, Error 2, Nellthorp and Hyman (2001) argue that it exceeds 10% of the change in consumer surplus if the generalised cost change exceeds 33%. De Jong, Daly, Pieters, and van der Hoorn (2007) also identify a substantial error — as high as 32% — for a high-speed rail case study. Clearly the limited evidence that exists suggests the impact of including income effects in the calculation of the welfare benefit from a transport intervention is quite varied. In some instances the impact can be tolerated at a practical level, but in others it seems very large. Certainly, on this evidence, Willig's argument that the change in consumer surplus is applicable to almost all circumstances needs closer examination in the transport context. There is also very limited evidence on the overestimation of the change in consumer surplus. The potential for overestimation has long been noted in the literature, but very little effort has been expended in ensuring that the Rule of Half approximation is not used inappropriately. The primary concern of this paper, therefore, is the overestimation of economic welfare in the presence of an income effect when the welfare benefit is estimated using the Rule of Half. It is distinct from the existing literature in that it comes at the problem from a different angle. Instead of estimating different demand schedules and comparing the benefit measures, as Jara-Díaz and Videla, Cherchi and Polak, Daly et al. and Nellthorp and Hyman have done, the research presented here compares two empirical estimates. The first empirical estimate is derived using a stated choice design to give the marginal value of an additional operating hour of a ferry. This is used to estimate the change in generalised cost (GC0 − GC1) and from that the change in consumer surplus using the Rule of Half. A second design and a contingent valuation question are used to directly elicit the compensating variation (i.e. the area GC0a′c′GC1).
A comparison between the two benefit measures identifies the error associated with using the Rule of Half convention to approximate welfare benefits in
the presence of an income effect. The advantage of comparing the benefit measures empirically, rather than via separate estimations of the demand schedules, is that it avoids the risk that errors in the estimation of the Marshallian and Hicksian demand functions dominate any comparison between the two measures of welfare benefit. The remainder of the paper is structured as follows. The next section presents the empirical analysis: it provides background information on the stated choice survey and describes the econometric models estimated. The third section utilises the marginal valuations from the econometric models to derive estimates of the change in consumer surplus and the compensating variation. The fourth and final section presents the conclusions and suggests avenues for further research.
18.2. Empirical Analysis

18.2.1. Data

The data used in this analysis were collected in the Outer Hebrides in 2005 as part of a wider study to obtain estimates of the marginal values of ferry headways and operating hours, as well as the difference in risk premium between a ferry and a fixed link (a bridge or a causeway) (Laird, 2008, Chapters 4, 5 and 6). For this paper only the data from the household survey are utilised. Householders on the islands of Scalpay, Berneray, Eriskay and Vatersay were surveyed. These islands are small (populations are less than 300) and form part of the Outer Hebridean group. The whole of the Outer Hebrides has a population of about 26,000 and lies about 50 km off the north-west coast of Scotland. Each of the islands surveyed recently had its lifeline ferry service to the main islands in the group replaced by a fixed link. The ferry services that used to serve the islands had limited hours of operation and were relatively infrequent. From the perspective of this paper, what makes the sample interesting is that on average households are not wealthy and the benefits of replacing the ferry with an untolled fixed link are large. Errors 1 and 2 may therefore be substantial. As part of the survey, respondents faced three tasks of particular relevance to this paper: in the local ferry stated preference game they were asked to choose between two types of ferry service; in the fixed link stated preference game they chose between a ferry and a fixed link; and finally they were asked a contingent valuation question. Each of the tasks was framed within the context of the old ferry service to the island and/or the fixed link that replaced it. Example scenario cards and the questions that supported the cards are given in Figures 18.2 and 18.3. The contingent valuation scenario was presented in a format similar to that depicted in Figure 18.3. Householders were interviewed face-to-face, and completed questionnaires were returned from 149 households. This represents just under 50% of the population over the four islands. This gave a maximum number of cases of 596 for the local ferry SP game and 745 for the fixed link SP game. After data cleaning this
"I am now going to show you some more HYPOTHETICAL situations. This time I would like you to imagine that this island was still only connected to Harris by ferry. I will show you two types of ferry service, which differ in fares, frequency and hours of operation. In each situation I would be grateful if you could choose the ferry service that YOUR HOUSEHOLD would most prefer."

                               OPTION A (Vehicle Ferry)    OPTION B (Vehicle Ferry)
Vehicle size                   No restrictions             No restrictions
Fares (vehicle and passengers) £5.00 (return)              £6.50 (return)
Frequency (average)            Every 4 hours               Every hour
Opening hours                  First ferry: 9am            First ferry: 7am
                               Last ferry: 5pm             Last ferry: 7pm
Example sailing times          0900, 1300, 1700            0700, 0830, 0900, 1000,
                                                           ...etc..., 1700, 1800, 1900
Choose one only                A                           B

Figure 18.2: Local ferry stated preference game — scenario framing and presentation.
"I am now going to show you a number of HYPOTHETICAL situations. I would like you to imagine the situation before the bridge was constructed but YOUR HOUSEHOLD could choose whether this island was to be connected to Harris with a FREE ferry or with a bridge. However, as the bridge has to be constructed by the local council, council tax must go up to pay for it. In each situation I would be grateful if you could choose the transport and council tax option that YOUR HOUSEHOLD would most prefer."

                 OPTION A (Vehicle Ferry)                OPTION B (Bridge)
Funding          Fully subsidised through council tax    Constructed by the council
Vehicle size     No restrictions                         No restrictions
Journey time     ---                                     15 minutes quicker than ferry
Fares/tolls      No fares                                No tolls
Frequency        Half hourly                             ---
Opening hours    First ferry: 9am                        24 hrs
                 Last ferry: 5pm
Council Tax      As existing                             Existing + £5.80 per week
                                                         (£800 per year)
Choose one only  A                                       B

Figure 18.3: Fixed link stated preference game — scenario framing and presentation.
reduces to 404 and 517 cases, respectively. Further details of the survey design and data cleaning process are described in Laird (2008, Chapters 4 and 6). An important distinguishing feature between the two stated preference games, from the perspective of this paper, is that the payment vehicle for the local ferry game is a fare per trip, whilst that for the fixed link game is a premium on the household's Council Tax. Council Tax is a local tax paid by the household in monthly instalments; the revenue from this tax is used to fund local services, including local transport services. It is this difference that allows the comparison of the results from the two SP datasets to be used as a basis for comparing the change in consumer surplus and the compensating variation of a transport intervention. It is unusual in the transport economics literature to set willingness to pay questions within the context of a charge unrelated to use (i.e. a premium on Council Tax), except for the small literature on transport-related option values (e.g. Painter, Scott, Wandschneider, & Casavant, 2001; Geurs, Haaijer, & van Wee, 2006; Humphreys & Fowkes, 2006), but it is relatively common in the environmental economics literature on option and non-use values. There also exist a number of network access charges in transport that are unrelated to use, including vehicle excise duty (in the UK), the Eurovignette and, to a certain extent, the London congestion charge. The approach adopted within this study was therefore to present the Council Tax premium in weekly terms, with the annual implication of such a weekly premium available to ensure the respondent was fully aware of any budget implications. The weekly Council Tax premiums considered were between £0.40 and £19.00 per week. The upper end of this range is equivalent to the fare paid on the old ferries for 2.5 return trips per week. The Council Tax premiums are therefore quite reasonable, and it was found that the respondents were able to understand the scenarios presented without any difficulties.
18.2.2. Estimation Results

18.2.2.1. Local ferry stated preference game

In the first instance multinomial logit (MNL) models were estimated. The best performing MNL model acted as a starting point for the estimation of the mixed logit (MXL) models. The econometric specification used is:

$$U^{ferry}_{hA} = \beta^{ferry}_k f(H_A) + \chi^{ferry}_k g(OH_A) + \phi^{ferry}_k h(P_A) + \varepsilon_{hA}$$
$$U^{ferry}_{hB} = \beta^{ferry}_k f(H_B) + \chi^{ferry}_k g(OH_B) + \phi^{ferry}_k h(P_B) + \varepsilon_{hB} \qquad (18.1)$$

where $U^{ferry}_{hA}$ is the utility that household h (belonging to population segment k) receives from ferry service A; f(H) is a function of headway, g(OH) a function of operating hours and h(P) a function of price (fares). $\beta_k$, $\chi_k$ and $\phi_k$ are population segment parameters associated with headway, operating hours and price. $\varepsilon_{hA}$ and $\varepsilon_{hB}$ are unobserved terms independently and identically distributed (IID) over households and alternatives.
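As an aside for readers who wish to experiment, below is a minimal sketch of estimating a binary logit of this general form on simulated data. The attribute ranges, linear-in-attributes form and coefficient values are illustrative only; the paper's preferred models use step functions and segment-specific parameters.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n = 404  # cleaned number of cases in the local ferry game

    # Hypothetical attribute differences between options A and B:
    # headway (min), hours closed per day, fare (pence).
    dX = np.column_stack([rng.uniform(-180, 180, n),
                          rng.uniform(-6, 6, n),
                          rng.uniform(-300, 300, n)])
    beta_true = np.array([-0.01, -0.1, -0.005])      # illustrative values only
    p_A = 1.0 / (1.0 + np.exp(-dX @ beta_true))      # P(choose ferry A)
    y = (rng.uniform(size=n) < p_A).astype(float)    # simulated choices

    def neg_loglik(beta):
        v = dX @ beta                                # V_A - V_B
        return -np.sum(y * v - np.log1p(np.exp(v)))  # binary logit log-likelihood

    fit = minimize(neg_loglik, np.zeros(3), method="BFGS")
    print(fit.x)  # recovered headway, operating-hours and cost coefficients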
An unsegmented model linear in headway, operating hours and cost is presented in the first column of Table 18.1 (model LF-MNL1). Experimentation with segmentation by socio-economic characteristics and with different forms for the functions f(H), g(OH) and h(P) gave rise to model LF-MNL5 as the best performing MNL model (second column of Table 18.1). This model is linear in cost and is a step function in headway and operating hours. The model acts as the starting point for the estimation of the MXL models. Model LF-MXL3 uses a normal distribution function to capture taste variation, model LF-MXL4 uses a triangular distribution function with no constraints on the spread, and model LF-MXL5 uses a triangular distribution function with the spread constrained to equal the mean. The log-normal distribution is not used as it results in unrealistically high estimates of the mean willingness to pay (see e.g. Laird, 2008, Chapter 5). As can be seen from the log-likelihood values, introducing taste variation into the model improves the level of fit significantly for the models fitted with the normal (LF-MXL3) and the unconstrained triangular (LF-MXL4) distributions. The improvement in fit over the MNL model of the constrained triangular distribution (LF-MXL5) is marginal. In the models presented only some of the variables are treated as random, because some of the spread parameters are not significantly different from zero (including the spread parameter associated with cost). It can also be seen that the introduction of taste variation reduces the number of variables in the models; that is, the distribution functions capture some of the variation in marginal utility that previously required an additional variable. The advantage of using the constrained triangular distribution over the normal or unconstrained triangular distributions is that no part of the distribution of willingness to pay values has the wrong (negative) sign. The fit of the models with the unconstrained normal or triangular distributions is, however, better (LF-MXL3 and LF-MXL4). These models attribute a negative willingness to pay to a proportion of the population — between 0% and 14% depending on the model estimated, the attribute in question and the attribute's level. For a practical application such proportions, whilst undesirable, are considered acceptable. Of the two models LF-MXL3 and LF-MXL4, model LF-MXL4 (unconstrained triangular) has a marginally higher log-likelihood and its fitted distribution is also bounded. This is more appealing, as it is unrealistic to expect the marginal utility of headway and operating hours to tend to infinity for some members of the population. On these grounds LF-MXL4 is taken as the preferred model. The second column of Table 18.2 shows that the marginal values from the preferred model are, depending on headway length, 2.0 and 11.3 pence per headway minute and, depending on the length of the operating day, 19.7 and 34.7 pence per operating hour. The marginal values reflect willingness to pay by householders for changes in departure time. These marginal valuations are within the design range of the stated preference questions, which gives confidence in the validity of the results. They are also similar to the values derived from the MNL model. No confidence intervals are available for these results, which derive from the preferred MXL model.
The case for these results being robust therefore centres on how well the parameters of the distribution functions for the different marginal utilities have been estimated.
Table 18.1: Local ferry SP game estimation results.

[Coefficient matrix not legible in the source. The table reports, for models LF-MNL1, LF-MNL5, LF-MXL3, LF-MXL4 and LF-MXL5, the estimated coefficients b (and, for random parameters, spread/SD coefficients c) for: the cost of all trips; the headway coefficient (headways ≥ 30 min); increments on the headway coefficient for headways ≥ 60 min, headways ≥ 240 min and households making 5 or more trips per week over the fixed link; the operating hours coefficient (no. of hours closed); and increments on the operating hours coefficient for hours closed ≥ 12 h and for households making 3 or more and 17 or more trips per week over the fixed link. All models use 404 observations. Log-likelihoods: −232.34 (LF-MNL1; adjusted r2 0.161), −200.08 (LF-MNL5; adjusted r2 0.278), −182.34 (LF-MXL3), −182.26 (LF-MXL4) and −199.79 (LF-MXL5).]

Notes: F, N, TU and TC indicate the distribution function for parameters. F means fixed (i.e. no distribution function), N the normal, TU the triangular with the spread not constrained in estimation and TC the triangular with the spread constrained equal to the mean. For each random parameter two coefficients are estimated (b and c): for the normal distribution the mean = b and SD = c; for the triangular the mean = b and spread = c. t-statistics are given in parentheses. A parameter is significant at the 99% level if the t-statistic > 2.33, at the 95% level if the t-statistic > 1.96 and at the 90% level if the t-statistic > 1.65. t-statistics for MXL models are calculated using robust standard errors. MNL models were estimated with ALOGIT v4.2, MXL models with Train's Gauss code using 500 Halton draws (http://elsa.berkeley.edu/Software/abstracts/train0296.html).
Table 18.2: Properties of willingness to pay distributions for headway and operating hours (local ferry SP game MXL models).

                                      LF-MXL3   LF-MXL4   LF-MXL5
Marginal value of a headway minute (pence/headway min/household trip)
When 30 min ≤ headway < 60 min
  Mean                                11.1      11.3      (as for headways
  SD                                  0.7       0.7        between 60 and
  Proportion with negative WTP        0%        0%         240 min)
When 60 min ≤ headway ≤ 240 min
  Mean                                1.9       2.0       1.4
  SD                                  1.5       1.7       0.6
  Proportion with negative WTP        13%       14%       0%
Marginal value of an operating hour (pence/operating hour/household trip)
Between 7 p.m. and 7 a.m. (i.e. late evening and night)
  Mean                                20.0      19.7      22.3
  SD                                  17.2      16.9      16.2
  Proportion with negative WTP        0%        0%        0%
After 7 a.m. and before 7 p.m. (i.e. during the day)
  Mean                                34.4      34.7      30.7
  SD                                  25.3      28.5      16.5
  Proportion with negative WTP        11%       11%       0%

Notes: Mean and SD values estimated using a Monte-Carlo simulation based on a population of 10,000 (split into the statistically relevant socio-economic groups from the model using sample proportions). The lowest and highest 2.5% are excluded from the calculation of mean and SD for the unbounded normal distribution (LF-MXL3) (Hensher & Greene, 2003; Cirillo & Axhausen, 2006). Fixed cost coefficients and bounded distributions mean that all 10,000 simulations are used for LF-MXL4 and LF-MXL5.
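A minimal sketch of the Monte-Carlo procedure described in the notes, with purely illustrative coefficient values (a fixed cost coefficient and a triangular headway coefficient with mean b and spread c):

    import numpy as np

    rng = np.random.default_rng(1)
    N = 10_000                          # simulated population, as in the notes

    beta_cost = -0.012                  # fixed cost coefficient (illustrative)
    b, c = -0.024, 0.030                # mean and spread of headway coefficient
    beta_hdwy = rng.triangular(b - c, b, b + c, N)  # symmetric triangular draws

    wtp = beta_hdwy / beta_cost         # pence per headway minute if cost is in pence
    print(f"mean {wtp.mean():.2f}, SD {wtp.std():.2f}, "
          f"negative WTP {(wtp < 0).mean():.0%}")

    # For unbounded (normal) coefficient distributions the notes trim the
    # lowest and highest 2.5% of draws before computing the mean and SD:
    lo, hi = np.quantile(wtp, [0.025, 0.975])
    print(wtp[(wtp > lo) & (wtp < hi)].mean())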
As can be seen from Table 18.1, they are all significant at the 99% level. Some confidence in the robustness of the results can also be gained from the preliminary MNL model (LF-MNL1), where the 95% confidence intervals for the marginal values of headway and operating hours are ±25% or below. The validity of these results is important for the comparison that will be made in Section 18.3. The uniqueness of this research, however, makes it difficult to make comparisons with other empirical evidence. The only empirical evidence in the literature to which it is directly comparable is that by Bråthen and Hervik (1997), which has since been adapted into Norwegian appraisal guidance (Bråthen & Lyche, 2004). The latter gives values for inconvenience costs (i.e. the costs imposed by headway and operating hour restrictions). The implication of the Norwegian guidance is that the inconvenience cost avoided by replacing a 'lifeline' ferry (one that operated from 0530 to 2400 with peak headways of less than 30 min) with a fixed link is equivalent to a 21 min time saving. The estimation results presented here would give a marginal value per vehicle of 17 car-IVT minutes.2
The results are comparable and give confidence in the validity and believability of the marginal values per single household trip. Additionally, the results for the marginal value of headways (<60 min) are consistent with, though slightly lower than, those found by Wardman (2004, Table 12) for other forms of public transport.

18.2.2.2. Fixed link stated preference game

The same estimation strategy as that used for the local ferry stated preference game was used to estimate a model on the fixed link stated preference game data. The following econometric specification was used:

$$U^{Ferry}_{h} = \beta^{Ferry}_k f(H) + \chi^{Ferry}_k g(OH) + \phi_k h(P) + \varepsilon^{Ferry}_{h}$$
$$U^{FixedLink}_{h} = \alpha^{FixedLink-Ferry}_k + \phi_k h(P) + \varepsilon^{FixedLink}_{h} \qquad (18.2)$$

where $\alpha^{FixedLink-Ferry}_k$ is the population segment specific intercept for the fixed link over the ferry, arising from unobserved attributes including journey time, convenience, availability (reliability) and non-use value. An unsegmented model linear in operating hours and cost is presented in the first column of Table 18.3. Experimentation with segmentation by socio-economic characteristics and with different forms for the function g(OH) gave rise to model FL-MNL2 as the best performing MNL model (second column of Table 18.3). This model is linear in cost and is a step function in operating hours. It was not possible to estimate an MXL model with anything but a constrained triangular distribution assumed for the policy variables: the estimation process did not converge if a normal distribution function was used (with or without constraints on the standard deviation) or if an unconstrained triangular distribution was used. The best performing MXL model is FL-MXL4, presented in the third column of Table 18.3. This model is taken to be the preferred MXL model on the grounds of fit and its ability to reproduce the observed willingness to pay distribution function (see Laird, 2008, pp. 155–160). As can be seen from the third column of Table 18.4, the preferred model, FL-MXL4, has a value for the fixed link constant of £333.60 per household per annum and a marginal value for an operating hour of £26.40 (per household per annum). These values represent the willingness to pay of households, through the medium of Council Tax, for the improved quality that a fixed link offers. It should of course be noted that Council Tax in the UK is paid in instalments and that the aggregation to an annual premium here is made partly for presentational purposes. As before, no confidence intervals are available for these results, which derive from an MXL model.
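The annual willingness to pay measures in Table 18.4 follow from the usual ratio-of-coefficients logic, sketched here (a standard identity, not restated in the paper; signs depend on how h(P) enters the utility function):

$$\mathrm{WTP}_z = -\frac{\partial U/\partial z}{\partial U/\partial P}$$

so that, for example, the fixed link constant and the operating hours coefficient, each divided by the cost coefficient $\phi_k$, yield the £-per-annum values reported in Table 18.4.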
2. Average vehicle occupancy of 3.2 for an island residents' trip on a ferry — derived from analysis of the Berneray Causeway ex-ante and ex-post studies (Halcrow, 1996 and SQW, 2004, respectively). Average car occupant value of time is £5.07 per hour (i.e. 8.5 p/min) (2005 perceived prices and values), derived from the standard value of travel time savings per passenger (DfT, 2007) using the non-work 'other' value of time.
Table 18.3: Fixed link SP game estimation results.

[Coefficient matrix not legible in the source. The table reports, for models FL-MNL1, FL-MNL2 and FL-MXL4, the estimated coefficients b (and, for random parameters, spread coefficients c) for: the cost coefficient for all household incomes; the fixed link constant (all households); increments on the fixed link constant for households on Berneray, households on Vatersay, and households making 5 or more and 17 or more trips per week over the fixed link; the operating hours coefficient (no. of hours closed); and an increment on the operating hours coefficient for households making 3 or more trips per week over the fixed link. All models use 517 observations. Log-likelihoods: −281.66 (FL-MNL1; adjusted r2 0.130), −245.92 (FL-MNL2; adjusted r2 0.240) and −200.82 (FL-MXL4).]

Notes: F and TC indicate the distribution function for parameters. F means fixed (i.e. no distribution function) and TC the triangular with the spread constrained equal to the mean. For each random parameter two coefficients are estimated (b and c); for the triangular the mean = b and spread = c. t-statistics are given in parentheses. A parameter is significant at the 99% level if the t-statistic > 2.33, at the 95% level if the t-statistic > 1.96 and at the 90% level if the t-statistic > 1.65. t-statistics for MXL models are calculated using robust standard errors. MNL models were estimated with ALOGIT v4.2, MXL models with Train's Gauss code using 500 Halton draws (http://elsa.berkeley.edu/Software/abstracts/train0296.html).
The robustness of the results rests on how well the different parameters of the model have been estimated. As can be seen, all are significant at the 95% level. Confidence intervals can be calculated for the preliminary MNL model (FL-MNL1); these are ±27% for the fixed link constant and ±70% for the marginal value of operating hours.
Table 18.4: Properties of willingness to pay distributions for fixed link constant and operating hours (fixed link SP game).

                                      FL-MNL1   FL-MNL3   FL-MXL4
Value of fixed link constant (£ per household per annum)
  Mean                                524.1     514.8     333.6
  SD                                  N/A       N/A       284.2
  Proportion with negative WTP        N/A       N/A       0%
Marginal value of an operating hour (£ per household per annum)
  Mean                                17.5      19.6      26.4
  SD                                  N/A       N/A       23.0
  Proportion with negative WTP        N/A       N/A       0%

Notes: Mean and SD values estimated using a Monte-Carlo simulation based on a population of 10,000 (split into the statistically relevant socio-economic groups from the model using sample proportions). The lowest and highest 2.5% are excluded from the calculation of mean and SD for model FL-MXL4, where the cost parameter is treated as random.
Table 18.5: Non-parametric estimates of mean and median annual willingness to pay and confidence intervals for the contingent valuation question for 12 h ferry to 24 h ferry (£ per annum per household).

Mean     Median   SD       Lower bound   Upper bound   CI as          Min    Max       No. of
                           of 95% CI     of 95% CI     % of mean                       observations
269.50   104.00   385.80   184.40        354.60        ±32%           0.00   2000.00   79
18.2.2.3. Contingent valuation 12–24 h ferry question

The primary purpose of the contingent valuation questions is to provide an internal validation of the fixed link stated preference results, because no other empirical work exists against which the derived value can be validated. As such, the primary interest is in the average willingness to pay in the sample. Non-parametric methods are perfectly adequate for deriving such a value and there is therefore no need to fit a distribution to the contingent valuation data (Bateman et al., 2002, pp. 224–228, 237–242). Table 18.5 presents non-parametric summary statistics for the contingent valuation question. This table shows that the mean willingness to pay for a move from a 12-h operating day to a 24-h operating day is £269.50 per annum per household (equivalent to £5.18 per week or £22.46 per month), with a 95% confidence interval of ±32%. This premium is equivalent to an average £22.45 annual premium for each additional operating hour. It can therefore be seen that the fixed link stated preference data and the contingent valuation data give similar, corroborating results.
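A sketch of one way such non-parametric confidence intervals can be produced (a simple bootstrap of the mean; the paper does not state its exact procedure, and the responses below are randomly generated stand-ins for the 79 observations):

    import numpy as np

    rng = np.random.default_rng(2)
    # Hypothetical annual WTP responses (£); the real data are not reproduced here.
    wtp = rng.gamma(shape=0.6, scale=450.0, size=79)

    boot_means = np.array([rng.choice(wtp, size=wtp.size, replace=True).mean()
                           for _ in range(10_000)])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    half_width = 100 * (hi - lo) / (2 * wtp.mean())
    print(f"mean £{wtp.mean():.2f}, 95% CI £{lo:.2f}-£{hi:.2f} (±{half_width:.0f}%)")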
18.3. The Change in Consumer Surplus and the Compensating Variation

To compare the change in consumer surplus estimated using the Rule of Half (ΔCSRoH) with the compensating variation, the findings from the analysis of the stated preference data are applied to the context of an actual transport intervention: the construction of fixed links to the islands of Scalpay, Berneray, Eriskay and Vatersay. Unlike the ferries, the fixed links are free to use; that is, there is no toll. They also give, on average, about a 15-min journey time saving over the ferries. Furthermore, the fixed links are available at all times of the day, whereas previously the ferries only operated a 12 h day and at around a 75-min headway on average.
18.3.1. Change in Consumer Surplus

Prior to construction of the fixed links an average 3.7 return person trips per household per week were made on the ferries to the islands (Halcrow, 1996; SQW, 2004). This value is based on ferry ticket sales data to the islanders. With the construction of the fixed links this has increased to an average 10.9 return person trips per household per week; this is the average number of return trips reported in a week-long travel diary surveyed as part of this study. With reference to Figure 18.1, X0 is therefore 3.7 whilst X1 is 10.9; Xc is unknown. As discussed previously, in a transport appraisal it is normal to assume the Marshallian demand curve is linear, in which case the change in consumer surplus, area GC0a′b′GC1, can be estimated using Eq. (18.3):

$$\Delta CS_{RoH} = \frac{1}{2}(X_0 + X_1)(GC_0 - GC_1) \qquad (18.3)$$
With a linearly additive expression for generalised cost in headway, operating hours and price, as implied by Eqs. (18.1) and (18.2), this calculation can be disaggregated into a series of 'mini' consumer surplus calculations for each component of generalised cost. The formulation of the mini consumer surplus calculation for operating hours is given in Eq. (18.4):

$$\Delta CS^{OH}_{RoH} = \frac{1}{2}(X_0 + X_1)(C^{OH}_0 - C^{OH}_1) \qquad (18.4)$$

where $C^{OH}_0$ and $C^{OH}_1$ are the costs of operating hours in the before and after situations. If we now focus on the user benefit arising from a change in operating hours (i.e. $\Delta CS^{OH}_{RoH}$), we can derive an estimate of the error that arises from using the Rule of Half to approximate the compensating variation. As each of the ferries operated a 12 h day, the construction of the causeway increases availability from a 12 h day to a 24 h day. From Table 18.2 the marginal value of increasing operating hours from 12 h to 24 h is 19.7 p/h/single vehicle trip. With an average vehicle occupancy of 2.8 when travelling on the ferry
(Halcrow, 1996; SQW, 2004), this implies a marginal value of 7.04 p/h/single person trip (i.e. 14.07 p/h/return person trip). The change in consumer surplus arising from the increase in the operating day is therefore given by:

$$\Delta CS^{OH}_{RoH} = \frac{1}{2}(3.7 + 10.9)(14.07 \times 12 - 0) = 1231.7 \text{ p/household/week}$$

Over the course of a year this implies a change in consumer surplus from the increase in operating hours of £640.46 per household. Extrapolating this consumer surplus calculation to each of the components of generalised cost3 implies that the total change in consumer surplus represents 27% of household disposable income. Clearly this is large.
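The arithmetic above can be checked in a few lines; the values are from the text, and small differences are rounding:

    # Rule-of-Half user benefit from the longer operating day (values from text).
    X0, X1 = 3.7, 10.9          # return person trips per household per week
    mv_hour = 2 * 19.7 / 2.8    # pence per operating hour per return person trip
    dGC = mv_hour * 12          # 12 extra operating hours per day
    dCS_week = 0.5 * (X0 + X1) * dGC
    print(dCS_week)             # ~1232 p/household/week
    print(dCS_week * 52 / 100)  # ~£641 per household per annum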
18.3.2. Compensating Variation

With reference to Figure 18.1, the compensating variation is given by the maximum willingness to pay to move from indifference curve U0 to U1, that is, by the difference between G and J (equivalent to the area GC0a′c′GC1). In this study this value has been elicited indirectly through stated preference questions couched in terms of willingness to pay through an increment in Council Tax. The result was validated with a contingent valuation question — a direct elicitation of the willingness to pay. From Table 18.4 the maximum willingness to pay per additional operating hour is £26.40 per household per annum. This implies the compensating variation of increasing operating hours from a 12 h to a 24 h operating day is £316.80 per household per annum. An alternative estimate, £269.50, of the compensating variation can be obtained directly from the contingent valuation question.
18.4. Discussion

A comparison between the change in consumer surplus estimated using the Rule of Half convention (ΔCSRoH) and the compensating variation indicates a large discrepancy. This discrepancy is £323.66 if comparing the ΔCSRoH against the compensating variation derived from the fixed link SP game model, and £370.96 if comparing against the contingent valuation result. The ΔCSRoH is therefore more than double either of the estimates of the compensating variation.
3. User benefits from operating hours form 14.3% of the total change in consumer surplus of the Berneray Causeway (Laird, 2008, Chapter 9). This implies that the average change in total consumer surplus across the four islands is about £4500 per household per annum, which represents 27% of household disposable income. Mean net household income in the Outer Hebrides is £16,642 (source: 2003/4 Scottish Household Survey, variable annetinc; MORI Scotland et al., 2005).
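The footnote's arithmetic is internally consistent (assuming the stated 14.3% share):

$$\frac{\pounds 640.46}{0.143} \approx \pounds 4479 \approx \pounds 4500, \qquad \frac{\pounds 4500}{\pounds 16{,}642} \approx 27\%$$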
This error is very serious in an appraisal context. To be confident that there really is a large discrepancy between the ΔCSRoH and the compensating variation, it is necessary to examine other potential explanations that may have given rise to the finding. Firstly, the finding could be an outcome of comparing two imprecise estimates. The robustness argument therefore hinges on the statistical significance of the difference between the ΔCSRoH and the compensating variation. Whilst no confidence intervals for the marginal values derived from the MXL models are available, confidence intervals are available from the MNL models. Taking these as representative of the robustness of the change in consumer surplus results and the compensating variation results implies 95% confidence intervals for the difference between the two estimates4 as follows:
4. Standard errors and confidence intervals for the difference between the change in consumer surplus and the compensating variation were calculated following Hess and Daly (2008).
possible to separate the two exactly. However experimentation with negative exponential demand curves indicate that a linear approximation to the demand curve (i.e. the use of the Rule of Half convention) may result in an overestimate of the change in consumer surplus of up to 40% (Laird, 2008, pp. 160–162). Though for such an overestimate to occur, the demand curve must be very convex to the origin.5 This implies that with a 100% error between the DCSRoH and the compensating variation, 43% of it is due to the presence of an income effect (Error 1) and 57% is due to the use of the Rule of Half (Error 2). Willig’s (1976) rule of thumb indicates that a difference of 43% between the change in consumer surplus and the compensating variation (Error 1) could only be associated with a change in consumer surplus as a proportion of income far in excess of the 27% estimated here — unless a very high income elasticity exists. This suggests that the estimate of Error 1 is too large. However, for transport interventions such large differences may occur because the intervention impacts on both the money budget constraint and the time budget constraint. Willig’s analysis only considers the money budget constraint. In principle Willig’s calculation could be extended to include the time budget, but this is beyond the scope of this paper. It is also noted that an error of 43% due to the income effect (Error 1) is part way between the error estimated by Cherchi and Polak (2007) and Daly et al. (2008), and is therefore not out of line with other estimates of the influence of income effects in transport welfare analysis for extreme examples. A contrasting argument to the above is that the difference between DCSRoH and the compensating variation is in part due to the presence of respondent biases in the data — the most relevant of which are scoping effect bias and framing bias. Scoping effect bias occurs as respondents have difficulty valuing the size of the benefit. This can result in small benefits (per trip) being valued too highly and large benefits (per annum) being valued too low. It has already been argued that the marginal values (per trip) implied by the models estimated to the data are comparable with other literature including that derived from revealed preferences. The consistency with revealed preference data in particular would suggest that the marginal values per trip are of the correct order of magnitude. The interest in this bias therefore becomes one of understanding whether the annual values are too low. Respondents may also give biased responses because of the framing of the fixed link stated preference game as the payment vehicle is Council Tax. As with any form of taxation, there are mixed feelings in the population regarding how tax revenue is spent — it would be fair to say that not everyone agrees with a local authority’s spending programme. To a certain extent this can be seen in the zero willingness to pay responses to the contingent valuation questions. Respondents who indicated a zero willingness to pay were asked what their motivation was for such a response. From these data it was found that 13% of the responses could be viewed as
5. Demand curve D1 = D0·e^(−b·cost^0.3), where D1 is demand after the intervention, D0 demand before the intervention and cost the difference in generalised cost before and after the intervention.
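A numerical sketch of the overstatement implied by this functional form; the cost and demand figures below are hypothetical, and only the curve's shape is taken from the footnote:

    import numpy as np
    from scipy.integrate import quad

    # Fit footnote 5's convex demand curve X(c) = A * exp(-b * c**0.3) through
    # hypothetical before/after points, then compare the exact Marshallian
    # surplus change with the Rule-of-Half (linear chord) estimate.
    X0, X1 = 3.7, 10.9          # trips before and after the cost reduction
    GC0, GC1 = 200.0, 30.0      # hypothetical generalised costs (pence per trip)

    b = np.log(X1 / X0) / (GC0**0.3 - GC1**0.3)
    A = X0 * np.exp(b * GC0**0.3)

    exact, _ = quad(lambda c: A * np.exp(-b * c**0.3), GC1, GC0)
    roh = 0.5 * (X0 + X1) * (GC0 - GC1)
    print(f"Rule of Half overstates the exact change by {100 * (roh / exact - 1):.0f}%")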
protest votes. Whilst these protest votes were excluded from the data analysed and reported in this paper it is possible that other respondents, who recorded positive willingness to pay amounts, could have been influenced by the use of Council Tax as a payment vehicle. Whilst there is consistency between the responses to the scenarios posed in the fixed link stated preference game and the response to the contingent valuation question, it is possible that scoping effects and framing have had the same effect on both sets of data. To test for framing and scoping bias we would ideally like to examine the level of error associated with the using the Rule of Half by population segment. This is because we would expect the income effect to be small (possibly negligible) for those who make very few trips across the fixed link (e.g. 1 or less a week) and for those with high incomes, whilst we would expect the effect to be large for those with low incomes and high trip making characteristics. The small dataset combined with a lack of data on how travel demand for ferry/fixed link varies by socio-economic group before and after construction of the fixed link makes this additional analysis impossible. No definitive answer can therefore be given as to whether scoping effect bias and framing bias have had a significant impact on the willingness to pay results. In defence of the values derived, the fixed link stated preference game and the contingent valuation scenarios posed to households were in the context of weekly premiums of Council Tax. These premiums were set at levels comparable to up to an average of 2.5 return ferry trips per week on the old ferries. The sums presented were therefore not only familiar to householders, but were of the same order of magnitude as the return ferry fares.
18.5. Conclusion The main conclusion of this paper is that the income effect and the overestimate of benefit associated with assuming a linear demand curve, whilst ignored in almost all transport economic appraisals, can be important to the calculation of welfare benefits. For the example of the construction of fixed links in small remote island communities a standard transport appraisal can give an estimate of the change in consumer surplus that is over 25% of household disposable income. This is quite substantial. This derives from time savings, ferry fare savings and improvements in availability (frequency and operating hours). Focusing on the latter this study has estimated: marginal values per household trip of between 2.0 and 11.3 pence per headway minute, and between 19.7 and 34.7 pence per operating hour; as well as willingness to pay values of between d22.45 and d26.40 per operating hour per household per annum. Households value marginal changes in these attributes as they improve departure time choice. By estimating these values this study has been able to compare the change in consumer surplus, as estimated in a conventional transport appraisal, with the compensating variation. There is a large discrepancy between the two values. For this particular example the estimated change in consumer surplus is
more than double the compensating variation — the exact measure of welfare benefit of the transport intervention. The discrepancy arises due to the presence of two errors. Firstly, just over half of this difference is attributed to the practice of assuming a linear Marshallian demand curve when estimating the change in consumer surplus (i.e. the Rule of Half), whilst the remainder is attributed to the presence of an income effect. It is well known that the assumption of a linear demand curve in the presence of large cost changes can seriously overestimate the change in consumer surplus. Despite this the practice of making the assumption continues. This paper has therefore provided further evidence on the need to use alternative methods when estimating welfare benefits. Income effects become more important the larger the income elasticity of demand and the larger the benefit received is relative to income. Furthermore as transport schemes impact on both the money budget constraint and the time budget constraint income effects may be larger than would have been thought from an analysis of the money budget alone. Until we have a better understanding of the implications of using the change in consumer surplus as an approximation to the exact welfare benefit in a transport appraisal it is important that analysts examine the scale of benefit accruing to households/individuals in relation to their income as a standard part of any appraisal (e.g. as suggested by Willig). The nature of the empirical work means there is some uncertainty in the estimates, and this coupled with the potential that responses may be biased by either scoping effects or question framing (in the form of the payment vehicle) mean the scale of the income effect may be overestimated. Further research is needed to address these empirical issues, but also to understand the relevance of estimating the welfare benefit using the Rule of Half in general transport policy scenarios. A potentially fruitful avenue could be to extend the empirical analysis presented in this paper to other scenarios including road pricing, or any infrastructure intervention in which large cost changes are experienced by frequent transport users (e.g. commuters). Further added value could be obtained by also collecting sufficient data to estimate the Marshallian and Hicksian demand curves and disaggregate the results by socioeconomic group — something that was not possible with the existing dataset.
Acknowledgements

The research presented in this paper is drawn from the author's PhD thesis at the University of Leeds. I am grateful to my supervisors Peter Mackie and Richard Batley for comments, support and advice during the research and preparation of this paper. I am also grateful to the EPSRC studentship that funded my studies and to Transport Scotland for funding the surveys. I am further grateful to the three anonymous referees who reviewed and commented on the paper. None of the above, however, is in any way responsible for the results described and views expressed in this paper.
References

Alston, J. M., & Larson, D. M. (1993). Hicksian vs. Marshallian welfare measures: Why do we do what we do? American Journal of Agricultural Economics, 75(3), 764–769.
Bateman, I. J., Carson, R. T., Day, B., Hanemann, M., Hanley, N., Hett, T., Lee, M. J., Loomes, G., Mourato, S., Ozdemiroglu, E., Pearce, D. W., Sugden, R., & Swanson, J. (2002). Economic valuation with stated preference techniques: A manual. Cheltenham: Edward Elgar.
Boardman, A. E., Greenberg, D. H., Vining, A. R., & Weimer, D. L. (2001). Cost-benefit analysis: Concepts and practice (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Bråthen, S., & Hervik, A. (1997). Strait crossings and economic development. Developing economic impact assessment by means of ex post analyses. Transport Policy, 4(4), 193–201.
Bråthen, S., & Lyche, L. (2004). Economic impact assessment in Norwegian road ferry links. A discussion of a few cost components. Working Paper no. 5. Molde University College and Molde Research Institute (in Norwegian only), Molde.
Cherchi, E., & Polak, J. W. (2007). Income effects in the evaluation of user benefits using discrete choice models. Universities' Transport Study Group 39th Annual Conference, 3–5 January 2007. University of Leeds, Harrogate, Leeds.
Cirillo, C., & Axhausen, K. W. (2006). Evidence on the distribution of values of travel time savings from a six-week diary. Transportation Research Part A, 40(5), 444–457.
Daly, A., de Jong, G. C., Ibanez, N., Batley, R. P., & de Bok, M. (2008). Welfare measures from discrete choice models in the presence of income effect. Paper presented at the European Transport Conference, Noordwijkerhout, Netherlands, 6–8 October 2008. AET Transport, London.
De Jong, G. C., Daly, A., Pieters, M., & Van der Hoorn, T. (2007). The logsum as an evaluation measure: Review of the literature and new results. Transportation Research Part A, 41, 874–889.
Department for Transport. (2007). Values of time and operating costs. Transport Appraisal Guidance (TAG) Unit 3.5.6. Department for Transport, London. [Accessed 6 May 2008]. Available from World Wide Web: http://www.webtag.org.uk/webdocuments/3_Expert/5_Economy_Objective/3.5.6.htm
Geurs, K. T., Haaijer, R., & van Wee, B. (2006). The option value of public transport: Methodology for measurement and case study for regional rail links in The Netherlands. Transport Reviews, 26(5), 613–643.
Halcrow, F. (1996). Berneray Causeway scheme appraisal. Report to Western Isles Council. Report dated June 1996 (unpublished).
Hensher, D. A., & Greene, W. H. (2003). The mixed logit model: The state of practice. Transportation, 30(2), 133–176.
Hess, S., & Daly, A. (2008). Calculating errors for measures derived from choice modeling estimates. Paper presented at the European Transport Conference, Noordwijkerhout, Netherlands, 6–8 October 2008. AET Transport, London.
Humphreys, R. M., & Fowkes, A. S. (2006). The significance of indirect use and non-use values in transport appraisal. International Journal of Transport Economics, XXXIII(1), 17–35.
Jara-Diaz, S. R., & Videla, J. I. (1990). Welfare implications of the omission of income effect in mode choice models. Journal of Transport Economics and Policy, 24(1), 83–93.
Laird, J. J. (2008). Modelling the economic impact of transport projects in sparse networks and peripheral regions. Ph.D. thesis, University of Leeds, UK.
Metz, D. (2005). Journey quality as the focus of future transport policy. Transport Policy, 12, 353–359.
Mori Scotland, TNS Social Research and Scottish Executive. (2005). Scottish household survey, 2003-4 [computer file]. UK Data Archive [distributor], Colchester, Essex.
Nellthorp, J., & Hyman, G. (2001). Alternatives to the rule of a half in matrix based appraisal. Proceedings of the European transport conference, 10–12 September, Cambridge. AET Transport, London.
Office for National Statistics (ONS). (2007). Family spending: 2006 edition. Palgrave Macmillan, Basingstoke. [Accessed 8 April 2008]. Available from World Wide Web: http://www.statistics.gov.uk/downloads/theme_social/Family_Spending_2005-06/Familyspending2005-06.pdf
Painter, K. M., Scott, R. D., II, Wandschneider, P. R., & Casavant, K. L. (2001). Using contingent valuation to measure user and nonuser benefits: An application to public transit. Review of Agricultural Economics, 24(2), 394–409.
SQW. (2004). An evaluation of the social and economic impacts of fixed links to the islands of Scalpay and Berneray. A final report to Western Isles Enterprise. Report dated March 2004 (unpublished).
Wardman, M. (2004). Public transport values of time. Transport Policy, 11(4), 363–377.
Willig, R. D. (1976). Consumer's surplus without apology. The American Economic Review, 66(4), 589–597.
Chapter 19
Which Commuters Will Car Share? An Examination of Alternative Approaches to Identifying Market Segments

Jon Crockett, Gerard Andrew Whelan, Caroline Louise Sinclair and Hugh Gillies
Abstract

Interest in car-sharing initiatives, as a tool for improving transport network efficiency in urban areas and on interurban links, has grown in recent years. They have often been proposed as a more cost-effective alternative to other modal shift and congestion relief initiatives, such as public transport or highway improvement schemes; however, with little implementation in practice, practitioners have only limited evidence for assessing their likely impacts.

This study reports the findings of a Stated Preference (SP) study aimed at understanding the value that car drivers put on car sharing as opposed to single occupancy trips. Following an initial pilot period, 673 responses were received from a web-based survey conducted in June 2008 amongst a representative sample of car-driving commuters in Scotland. An important methodological aspect of this study was the need to account for differences in behaviour to identify those market segments with the greatest propensity to car share. To this end, we estimated a range of choice model forms and compared the ability of each to consistently identify individual behaviours. More specifically, this included a comparison of:

- standard market segmentation approaches based on multinomial logit, with attribute coefficients estimated by reported characteristics (e.g. age, income, etc.);
- a two-stage mixed logit approach involving the estimation of random parameters logit models, followed by an examination of individual respondents' choices to arrive at estimates of their parameters, conditional on known distributions across the population (following Revelt & Train, 1999); and
- a latent-class model involving the specification of C classes of respondent, each with their own coefficients, assigning each individual a probability that they belong to a given class based upon their observed choices, socio-economic characteristics and their reported attitudes.

As hypothesised, there are significant variations in tastes and preferences across market segments, particularly for household car ownership, gender, age group, interest in car pooling, current journey time and sharing with a stranger (as opposed to a family member/friend). Comparing the sensitivity of demand to a change from a single occupancy to a car-sharing trip, the latter imposes a 'penalty' equivalent to 29.85 IVT minutes using the mixed logit structure and 26.68 IVT minutes for the multinomial specification. Segmenting this latter value according to the number of cars owned per household results in 'penalties' equivalent to 46.51 and 26.42 IVT minutes for one and two-plus car owning households, respectively.
19.1. Introduction

High occupancy vehicle (HOV), or carpool, lanes are reserved for the use of vehicles with two or more occupants (i.e. the driver plus one or more passengers), including buses. Interest in their deployment has grown in recent years as Central and Local Governments seek initiatives for more efficient use of the highway network. By encouraging car sharing, as opposed to single occupant car trips, they have the potential to reduce the total volume of traffic on the network. Typically they are created through the widening of existing alignments or the use of the hard shoulder on the motorway network. In recent years, consideration has also been given to their inclusion from scheme opening.

Understanding the propensity for using HOV lanes amongst the existing travelling population requires an appreciation not just of traditional attributes such as time and cost, but also of the impact that sharing with an acquaintance or stranger has on these attributes. Whilst it can be assumed that journey costs can be distributed between the driver and passengers of the car, a decision to car share may lead to an increase in journey time, through diverting to pick up passengers. Additionally, there may be 'penalties' to the car driver that cannot be captured in the more readily quantifiable and measurable attributes such as monetary cost and time, including issues related to comfort, convenience, reliability, autonomy and security.

It is common for these more qualitative attributes of travel to be considered as part of the Alternative Specific Constant (ASC), or mode penalty. The ASC is considered to be uniform across all journeys, i.e. it does not vary by journey time, cost or distance, for a given alternative form of transportation, although it may be segmented by journey purpose and other market segments such as demographic and socio-economic characteristics. Hypothesised components of the ASC construct in the context of this study include:

- Convenience, in terms of the additional penalty associated with not being able to perform a 'door-to-door' journey or having to share personal space with other passengers;
- Reliability, in that additional journey time and detour is likely to add a real, or perceived, additional element of (un)reliability to journey time. Additionally, passengers and drivers may have concerns regarding the reliability of their 'pick-up';
- Security, particularly in relation to having to share confined space with other travellers with whom the traveller in question may not be familiar. Alternatively, there may be a positive utility from feeling more secure when travelling at night with another person;
- Comfort, whereby the ergonomics, cleanliness and privacy of the traveller's immediate environment is compromised in some way; and
- Autonomy: typically a greater feeling of autonomy is generated by the feeling of being in control over one's own life, and adherence to a fixed schedule, i.e. a pick-up time, may compromise or adversely affect such feelings. Conversely, some people may seek the social interaction with fellow passengers.

Another means of considering the less tangible factors is via the inclusion of alternative-specific in-vehicle time (IVT) factors, which quantify the relative (dis)utility of time spent on alternatives.

The objective of this study was to provide quantifiable evidence on the propensity of commuters in Scotland to use a dedicated HOV lane. A Stated Preference (SP) survey of car-borne commuters in Scotland was undertaken, to examine their sensitivity to changes in travel times and costs and to gain a better understanding of the modal penalty (ASC) or equivalent IVT factor, so that the implementation of HOV lanes could be robustly appraised. The primary objectives of the study were to:

- develop ASCs relative to single occupancy for car sharers;
- derive ASCs for certain key market segments, according to their relative importance in the model process;
- examine the potential for deriving IVT factors for car sharing as opposed to single occupancy (normalising the IVT factor of drive alone to one);
- examine the impacts of sharing with an acquaintance as opposed to a member of a car pool or work colleague; and
- develop a forecasting tool to assess the relative market share (probability) for car sharing under a number of hypothetical scenarios.
19.2. Modelling Approaches

The modelling of random taste distributions within a population under the random utility maximising (RUM) behavioural framework has relied on either continuous distributions or finite ones. Whilst continuous distributions have considerable merit, particularly in terms of their seeming analogy with real world choice data (Train, 2003), their use in modelling choices and estimation presents particular challenges (see Greene & Hensher, 2003). This is especially true when we attempt to specify a functional form (typically normal or lognormal) for what may be considered 'lumpy' preference data, where underlying segments, or classes, are the source of the distribution in attribute valuations. The alternative, finite, approach employing endogenous segmentation of the population offers considerable benefits in terms of ease of implementation, estimation and interpretation. Such segmentation has traditionally relied on easily collected data relating to the individual or household; however, a growing body of research has identified that preferences are not so easily grouped, and additional attitudinal data, non-measured characteristics and latent classes may be more reliable indicators of differences in taste and preference across the population.
19.2.1. Multinomial Logit

Standard market segmentation approaches have, for many years, been based on the multinomial logit form, with attribute coefficients estimated by reportable, easily measurable and comparable characteristics (e.g. age, income, journey purpose, etc.). Functional forms are well documented and traditionally involve separate valuations of the ASC component of the utility specification. The relative merits of the multinomial logit (MNL) model, and the benefits/disbenefits of prejudging the market segments of interest, do not require wider discussion at this point, except to say that their shortcomings have been a stimulus for research into alternatives for a number of years (see, e.g., Ortuzar and Willumsen (2006) for a wider ranging discussion on discrete choice model forms).
19.2.2. Mixed Logit

The mixed logit model (see Ben-Akiva & Bolduc, 1996; McFadden & Train, 2000) stands as the most significant extension of the multinomial logit approach. The utility function of an alternative j for an individual i is given by:

U_{ij} = \theta_i X_{ij} + \varepsilon_{ij}   (19.1)

where U_{ij} is the utility of alternative j for individual i; \theta_i is a vector of unknown coefficients that vary randomly according to individual tastes; X_{ij} is a vector of observable variables; and \varepsilon_{ij} is a random error term which is assumed to follow an IID Gumbel distribution, independent of \theta_i and X_{ij}.

Among the many attractive features of the mixed logit model is its ability to take account of taste variation among decision-makers by allowing coefficients (the \theta_i) to follow pre-specified distributions (usually normal or lognormal). Whilst accounting for heterogeneity in the population, simple applications of the technique fail to identify valuable information on differences in preference and behaviour between market segments. The 'standard' approach to overcome this problem when working with the mixed logit model is to identify segments prior to modelling and either specify a set of constant coefficients for each market segment together with an additional error term to 'mop-up' any residual variation, or allow separate distributions for each market segment.

A 'two-stage' mixed logit approach involves the estimation of random parameters logit models followed by an examination of individual respondents' choices to arrive at estimates of their parameters. These latter parameters are conditional on known distributions across the population (see Revelt & Train, 1999). Following on from Eq. (19.1), if k_{i1}, \ldots, k_{iT} denotes the series of choices made by an individual i in a choice experiment, then conditional on the individual's preferences (\beta_i), the probability that individual i chooses alternative k (from alternatives j, j+1, \ldots, k) in choice scenario t can be expressed in the logit form:

P_{it}(k_{it} \mid \beta_i) = \frac{\exp(\beta' X_{ik_{it}})}{\sum_j \exp(\beta' X_{ij_{it}})}   (19.2)

The unconditional probability is the integral of the conditional probability over all possible values of \beta:

Q_{it}(k_{it} \mid \theta) = \int P_{it}(k_{it} \mid \beta) f(\beta \mid \theta) \, d\beta   (19.3)
Assuming that the individual's tastes do not change over choice situations, the conditional probability of individual i's sequence of choices y_i is the product of the logits:

S(y_i \mid \beta) = \prod_t L_{it}(y_{it} \mid \beta)   (19.4)

The unconditional probability is:

P(y_i \mid \theta) = \int S(y_i \mid \beta) f(\beta \mid \theta) \, d\beta   (19.5)

The goal of the first stage of the estimation process is to estimate parameters that describe the distribution of tastes across individuals. Unlike the estimation of standard logit models, exact maximum-likelihood estimation is not possible, since the integral in Eq. (19.5) cannot be evaluated analytically. Instead, a simulated likelihood function is specified in which P(y_i \mid \theta) is approximated by summation over randomly chosen values of \beta. The process is repeated for R random draws of \beta (where \beta^r is the r-th draw from f(\beta \mid \theta)) and the simulated probability of the individual's sequence of choices is:

SP(y_i \mid \theta) = \frac{1}{R} \sum_r S(y_i \mid \beta^r)   (19.6)
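To make this concrete, the following minimal sketch (our own illustration in Python/NumPy, not the authors' estimation code; the variable names and the (N, T, J, K) data layout are assumptions, and plain pseudo-random normal draws stand in for the Halton-type draws discussed below) evaluates the simulated log-likelihood of Eqs. (19.2)–(19.6):

```python
import numpy as np

def simulated_loglik(mu, sigma, X, y, R=500, seed=0):
    """Simulated log-likelihood of Eqs. (19.2)-(19.6).

    X : (N, T, J, K) array of attributes; y : (N, T) chosen alternative indices;
    mu, sigma : (K,) mean and std. dev. of the normal taste distribution."""
    rng = np.random.default_rng(seed)
    N, T, J, K = X.shape
    sll = 0.0
    for i in range(N):
        draws = mu + sigma * rng.standard_normal((R, K))  # beta^r ~ f(beta | theta)
        v = np.einsum('rk,tjk->rtj', draws, X[i])         # utilities per draw
        p = np.exp(v - v.max(axis=2, keepdims=True))
        p /= p.sum(axis=2, keepdims=True)                 # logit probabilities, Eq. (19.2)
        s = p[:, np.arange(T), y[i]].prod(axis=1)         # sequence prob. S(y_i | beta^r), Eq. (19.4)
        sll += np.log(s.mean())                           # average over draws, Eq. (19.6)
    return sll
```

Maximising this function over (mu, sigma) with a generic numerical optimiser would give the first-stage estimates.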
So long as the number of random draws is sufficiently large, the simulated probability is an unbiased estimate of the true probability, and the simulated log-likelihood function is constructed as SLL = \sum_i \ln(SP(y_i \mid \theta)). In recent years, the exploitation of Halton sequences (Train, 1998; Bhat, 2000), and more recently techniques such as Modified Latin Hypercube Sampling (MLHS — see Hess, Train, & Polak, 2006), has improved both the accuracy and speed of estimation.

Although we can estimate the density f(\beta \mid \theta) in Eq. (19.5) describing the distribution of tastes in the population, it is also desirable to know where each decision-maker is in this distribution. Following Revelt and Train (1999), let g(\beta \mid y_i, \theta) denote the density of \beta conditional on decision-maker i's sequence of choices and the population parameters \theta. By Bayes' rule:

g(\beta \mid y_i, \theta) = \frac{P(y_i \mid \beta) f(\beta \mid \theta)}{P(y_i \mid \theta)}   (19.7)

Eq. (19.7) is then used to calculate the conditional expectation of \beta, the individual's expected tastes k(\beta):

E(k \mid y_i, \theta) = \int k(\beta) \, g(\beta \mid y_i, \theta) \, d\beta   (19.8)

Substituting the formula for g:

E(k \mid y_i, \theta) = \frac{\int k(\beta) P(y_i \mid \beta) f(\beta \mid \theta) \, d\beta}{P(y_i \mid \theta)} = \frac{\int k(\beta) P(y_i \mid \beta) f(\beta \mid \theta) \, d\beta}{\int P(y_i \mid \beta) f(\beta \mid \theta) \, d\beta}   (19.9)

As Eq. (19.9) does not have a closed form, the conditional expectation of \beta is approximated by simulation. This procedure involves taking random draws of \beta from the population density f(\beta \mid \theta) and estimating the weighted average of these draws, with the weight of draw \beta^r being proportional to P(y_i \mid \beta^r):

\tilde{E}(k \mid y_i, \theta) = \frac{\sum_r k(\beta^r) P(y_i \mid \beta^r)}{\sum_r P(y_i \mid \beta^r)}   (19.10)
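The second stage can be sketched in the same way (again a hypothetical illustration, reusing the data layout assumed above): each respondent's expected tastes are the weighted average of Eq. (19.10), with weights proportional to the probability of their observed sequence of choices.

```python
import numpy as np

def individual_tastes(mu, sigma, X, y, R=2000, seed=0):
    """Posterior mean tastes per respondent, Eq. (19.10): a weighted average of
    draws, with weights proportional to the sequence probability P(y_i | beta^r)."""
    rng = np.random.default_rng(seed)
    N, T, J, K = X.shape
    expected = np.empty((N, K))
    for i in range(N):
        draws = mu + sigma * rng.standard_normal((R, K))
        v = np.einsum('rk,tjk->rtj', draws, X[i])
        p = np.exp(v - v.max(axis=2, keepdims=True))
        p /= p.sum(axis=2, keepdims=True)
        w = p[:, np.arange(T), y[i]].prod(axis=1)              # P(y_i | beta^r)
        expected[i] = (w[:, None] * draws).sum(axis=0) / w.sum()
    return expected
```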
Whelan (2003) reports on the results of a two-stage mixed logit approach versus conventional mixed logit from an SP experiment on choice of vehicle type. As would be expected, an un-segmented mixed logit structure was found to have a significantly better fit than the standard multinomial logit. Using standard socio-economic characteristics, the process outlined above was found to be unsatisfactory, as preferences for different vehicle types were diverse and not easily grouped using conventional market segments. Latent-class (L-C) models were recommended as an alternative methodology for further analysis (Whelan, 2003).
19.2.3. Latent Class

L-C models are an alternative form of endogenous segmentation. Previous studies have argued their merits as a convenient and intuitive alternative to the mixed logit continuous approach, particularly in terms of computational time and cost (Provenchar, Baerenklau, & Bishop, 2002). They enable preferences amongst a sample to be modelled without making any prior assumptions regarding some observable deterministic explanation for that heterogeneity, i.e. through demographic, socio-economic and travel characteristics or via psychographics.¹ From a theoretical point of view the L-C structure has much to recommend it, not least its ability to incorporate respondent attitudes within the analysis.

Use of L-C models has been widespread in other sciences for a number of years but, with the exception of Greene and Hensher (2003), they have rarely been applied in transportation research and practice. Examples of L-C models based upon choice data include Provenchar et al. (2002) and Scarpa and Thiene (2005). In these studies choice data was used to estimate the number of (latent) classes, the probability of class membership, and the preference parameters in each class's conditional, indirect-utility function. No attitudinal data was used in these studies; Breffle, Morey and Thacher (2006) report on the combination of attitudinal and choice data to produce combined L-C(AC) models, where the subscripts A and C denote attitudinal and choice.

In each preference class it is assumed that individuals respond and behave in a similar manner to one another, i.e. the choice patterns of individuals from the same preference class are more correlated with each other than with individuals in other classes. Each individual is assigned a probability (P) of belonging to a given class (C_i) based upon their observed choices, socio-economic characteristics and their reported attitudes (if provided). Once preference class has been accounted for, it is assumed that all choices are independent across scenarios and individuals. Following Greene and Hensher (2003), for a given class of respondent c, the probability of the discrete choice j among J_i alternatives by individual i in choice scenario t can be specified as:

P_{ijt|c} = \frac{\exp(\beta_c X_{ijt})}{\sum_{j=1}^{J_i} \exp(\beta_c X_{ijt})}   (19.11)

where P_{ijt|c} is the probability of individual i of class c choosing alternative j in choice scenario t; \beta_c is a set of estimated coefficients for class c; and X_{ijt} is a set of attributes that describe the choice alternative. This standard logit choice model is specified with its own set of coefficients for each of the C classes of respondent. The probability that a respondent belongs to a given class can be based on their observed choices, their observed characteristics and their reported attitudes. This is achieved using another logit model that gives the probability that individual i belongs to class c as:

P_{ic} = \frac{\exp(\theta_c Z_i)}{\sum_{c=1}^{C} \exp(\theta_c Z_i)}   (19.12)

where P_{ic} is the probability that individual i belongs to class c; \theta_c is a set of C coefficient vectors; and Z_i is a set of observable characteristics and reported attitudes. The models detailed in Eqs. (19.11) and (19.12) are calibrated jointly using maximum-likelihood estimation. It can be seen that, summed across all classes and alternatives, the probabilities P_i are equal to one, where:

P_i = \sum_{c=1}^{C} P_{ijt|c} \, P_{ic}   (19.13)

¹ Psychographics are any attributes related to a person's, or group's, personality, values, attitudes, interests or lifestyles. Attitudinal data relating to these aspects can be incorporated in choice modelling alongside standard variables using techniques such as cluster analysis.
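As an illustration of Eqs. (19.11)–(19.13), the following sketch (ours, under the same assumed data layout as the earlier mixed logit sketches; not the GAUSS code used later in the paper) evaluates the latent-class log-likelihood:

```python
import numpy as np

def latent_class_loglik(betas, thetas, X, y, Z):
    """Latent-class logit log-likelihood, Eqs. (19.11)-(19.13).

    betas : (C, K) class-specific utility coefficients;
    thetas: (C, M) membership coefficients (one class normalised to zero);
    X : (N, T, J, K) attributes; y : (N, T) choices; Z : (N, M) characteristics/attitudes."""
    N, T, J, K = X.shape
    m = Z @ thetas.T                               # membership utilities
    pic = np.exp(m - m.max(axis=1, keepdims=True))
    pic /= pic.sum(axis=1, keepdims=True)          # P_ic, Eq. (19.12)
    ll = 0.0
    for i in range(N):
        v = np.einsum('ck,tjk->ctj', betas, X[i])
        p = np.exp(v - v.max(axis=2, keepdims=True))
        p /= p.sum(axis=2, keepdims=True)          # P_ijt|c, Eq. (19.11)
        s = p[:, np.arange(T), y[i]].prod(axis=1)  # choice-sequence prob. per class
        ll += np.log((pic[i] * s).sum())           # mix over classes, cf. Eq. (19.13)
    return ll
```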
Despite these apparent benefits, a significant drawback exists regarding the specification of the number of underlying classes within the data. Indeed, there is no well-established statistical test for discriminating between differing hypotheses regarding their number. C is not a parameter in the interior of a convex parameter space; consequently, comparison of the log-likelihoods of sequentially smaller models is not an appropriate approach (Greene & Hensher, 2003). Scarpa and Thiene (2005) report on the relative merits of different selection (information) criteria, including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the corrected AIC (Hurvich & Tsai, 1989). All of these criteria fail some of the regularity conditions for a valid test under the null (Leroux, 1992). The AIC is reported to be biased towards an over-estimate of the number of classes, whilst the BIC is not. Conversely, the BIC tends towards an underestimate when the sample size is small (McLachlan & Peel, 2000). As the number of classes increases, the significance of the parameter estimates in the utility function decreases. This is especially true in classes with a low probability of class membership. The choice of the number of classes must therefore account for the significance of parameter estimates and the meaningfulness of their sign and magnitude (Scarpa & Thiene, 2005).
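Once each candidate specification has been estimated, the criteria themselves are straightforward to compute; a small hypothetical helper:

```python
import numpy as np

def information_criteria(loglik, n_params, n_obs):
    """AIC and BIC for comparing specifications with different numbers of classes."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    return aic, bic
```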
Undoubtedly this potential shortcoming leaves L-C models vulnerable to 'attack' from other parties, who could speculate on the 'arbitrary' selection (by the practitioner) of the number of classes. Much is therefore dependent on the practitioner's prior hypothesis. For example, in the context of car sharing, it would appear intuitive to hypothesise a minimum of three latent classes:

- a class strongly opposed to sharing 'personal space' with others, i.e. those favouring autonomy;
- an indifferent class who, provided the incentive is strong enough, are willing to share; and
- a 'sociable' class who would derive utility from the sharing of the space with others.

In addition to the above, we could speculate that there will be classes of respondent primarily motivated by, e.g., time or money; however, these attributes are incorporated in the experiment and will not therefore influence L-C membership (under the assumption that the utility parameters for time and cost explicitly account for all variations in taste and preference associated with these attributes).

Greene and Hensher (2003) report on the comparative qualities of a mixed logit and an L-C approach on the same SP dataset, from an experiment pertaining to route choice. Both forms are preferred to the multinomial logit, with the L-C model offering clear benefits in terms of its lack of distributional assumptions about individual heterogeneity. However, it is noted that the range of functional forms which can be specified within the mixed logit formulation offsets this to a great extent. In conclusion, the differences in degree-of-fit are not sufficient to warrant the choice of one model over the other; instead, 'both models allow the analyst to harvest a rich variety of information about behaviour from a panel, or repeated measures, dataset'.
19.3. Data

In order to maximise the number of people reached within the target market, a web-based self-completion survey was employed. The survey form was designed to be generic, so that responses could be elicited from respondents all across Scotland. This sampling strategy ensured that the valuations and recommendations regarding car sharing are transferable between different contexts and localities. This assumption was tested through the incorporation of geography, in the form of the Scottish Executive's urban/rural classification, as a potential explanatory variable within the model construct.

The survey was designed in such a way as to ensure that the decision context related to an actual journey likely to be experienced by the respondent, and the 'response space' allowed the respondents to describe their hypothetical behaviour. Screening questions at the beginning of the surveys ensured that the respondents were able to provide a response relevant to the car-sharing context. Users of non-car modes were automatically routed past the SP choice scenarios, and solely provided information regarding their current travel patterns, demographics and socio-economic characteristics.

The study was concerned with the values car users place on car sharing as opposed to driving alone, expressed in terms of the attributes:

- monetary cost: parking charges; and fuel cost;
- journey time: as a result of HOV lanes; walk time; waiting time due to pick-up/drop-off of all car passengers; and having a designated car parking space at their workplace; and
- car-sharing context: sharing with a friend or family member; and a car pool member or work colleague.

The choice experiment asked the respondent to consider a journey similar to the one they undertook on the day they completed the survey, where the available options were a single occupancy car trip, car share (using a HOV lane) and do not travel/travel elsewhere/use another mode. To ensure that the SP design was both realistic and relevant to the respondent, the experimental design was expressed as differences from the respondent's current journey time and costs. The SP design contained five attributes in total. Journey cost and time were assigned three possible levels, with car sharing assigned two levels. The attributes are presented in Table 19.1. The full factorial design of the alternatives presented in Table 19.1 necessitates nine scenarios in the overall fractional factorial statistical design, summarised for one block of the final design in Table 19.2.

Table 19.1: Attributes and levels.

                      Level 1    Level 2    Level 3
  Car alone
    Journey time^a       15          5         25
    Cost^b              125         50          0
  Car share
    Journey time^a       20         10         10
    Cost^b               50        100         25
    Sharing^c             0          1          –

^a Time is shown in minutes difference from current journey.
^b Cost is shown in pence difference from current journey.
^c Sharing is shown as 0 = family member or friend, 1 = colleague or car pool member.
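Because the design is expressed as differences, the nine scenarios shown to a respondent (one block is summarised in Table 19.2 below) are obtained by pivoting these levels around the reported journey. A sketch of that pivoting, using a hypothetical helper and the level values as printed above:

```python
# Block 1 of the differences design (Table 19.2): journey-time differences in
# minutes and cost differences in pence, applied to the respondent's current trip.
DESIGN = [  # (alone_dt, alone_dc, share_dt, share_dc, sharing_with)
    (15, 125, 20,  50, 'family/friend'),
    ( 5,  50, 10, 100, 'family/friend'),
    (25, 125, 10,  50, 'family/friend'),
    (15,   0, 20,  25, 'stranger'),
    (25,  50, 10, 100, 'stranger'),
    ( 5,   0, 10,  25, 'family/friend'),
    ( 5, 125, 10,  50, 'stranger'),
    (15,  50, 20, 100, 'family/friend'),
    (25,   0, 10,  25, 'family/friend'),
]

def pivot_scenarios(current_time_min, current_cost_pence):
    """Yield the nine presented scenarios for one respondent."""
    for dt_a, dc_a, dt_s, dc_s, who in DESIGN:
        yield {'car_alone': (current_time_min + dt_a, current_cost_pence + dc_a),
               'car_share': (current_time_min + dt_s, current_cost_pence + dc_s),
               'sharing_with': who}
```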
19.3.1. Sample

A total of 673 surveys were received from respondents whose most common mode of travel to work was to drive a car or van. The survey collected data about the regular commute to work and the demographic and socio-economic characteristics of the respondents. Table 19.3 presents the gender and age profile of the survey respondents against corresponding data for the total Scottish population (General Register Office for Scotland), and shows that the survey respondents were generally typical of the general population. As the survey was targeted at commuters, the percentage of respondents over the age of 60 will be lower than in the general population.

Table 19.2: Stated preference differences design (block 1).

  Scenario   Car alone                 Car share
             Journey      Cost         Journey      Cost       Sharing
             time (min)   (pence)      time (min)   (pence)
  1             15          125           20           50      Family/friend
  2              5           50           10          100      Family/friend
  3             25          125           10           50      Family/friend
  4             15            0           20           25      Stranger
  5             25           50           10          100      Stranger
  6              5            0           10           25      Family/friend
  7              5          125           10           50      Stranger
  8             15           50           20          100      Family/friend
  9             25            0           10           25      Family/friend

Table 19.3: Profile of survey respondents and Scotland.

                       Survey (%)   Scotland population   Difference (%)
                                    estimate (%)
  Gender
    Male                   52            48                    +4
    Female                 48            52                    −4
  Age
    17–24 years             6            16                   −10
    25–34 years            27            19                    +8
    35–44 years            35            24                   +11
    45–54 years            25            22                    +3
    55–59 years             4            10                    −6
    60 years and over       3             8                    −5
Similarly, differences between the sample profile and the Scottish population in general can be accounted for by factors such as:

- the working population being likely to be positively skewed towards those aged between 24 and 54, i.e. post-higher education and pre-retirement;
- car ownership also being proportionately higher amongst these groups than the younger population (17–24), leading to an increased propensity to commute by car; and
- car availability, for the commute, also being expected to be higher for males as opposed to females.

When such differences are accounted for, the confidence that the survey has achieved a representative sample increases markedly.

Table 19.4 shows the car ownership of the respondents, who under the initial screening questions had indicated that their normal commute was as a car driver (and are therefore likely to own a car). The majority of respondents' households own two or more cars.

Table 19.4: Car ownership level amongst survey respondents.

  Car ownership    Percentage
  None                  0.6
  One                  38.7
  Two                  52.5
  Three plus            8.2

Table 19.5 shows the propensity of the survey respondents to travel with other people, by gender. It shows that the majority of people travel alone, but that females are more likely to travel with companions, and their companions are more likely to be children. Existing journeys with companions such as children could reduce the flexibility of respondents choosing to join car-pool schemes.

Table 19.5: Propensity to travel with companions by gender.

  Travelling companions      Percentage by gender
                             Male      Female
  Alone                        82        78
  Self + child                  4         8
  Self + adult                 10         8
  Self + two others plus        4         6

The flexibility of respondents with respect to the potential for car pooling was explored further by considering 'trip-chaining', or performing journeys with multiple purposes, e.g. shopping on the way home from work. Table 19.6 illustrates the other activities undertaken by survey respondents on their journey to work. The majority of respondents do not undertake additional activities, but shopping and performing the school run are popular activities. The survey also revealed that women and respondents aged 25–54 were more likely to undertake additional activities.

Table 19.6: Participation in additional activities on the journey to work.

  Additional activities                        Percentage
  Nothing                                         50.5
  Take spouse/partner to work                      7.7
  Take children to school                         14.9
  Take friends/family to work/education            5.9
  Go shopping/other errands                       21.8
  Buy things for work                             13.2
  Other                                            6.9

Respondents were also asked if they would consider joining a car-pool initiative. Thirty-six per cent were interested in car sharing, while 3% indicated that they were already a member of such a scheme. Of the 415 respondents who provided a response on the potential use of car pooling should there be HOV lanes, 20% indicated that they would be interested in joining a car share scheme.
19.4. Results

The data from the SP experiments was analysed under a discrete choice-modelling framework. This framework is based on the principle that a decision-maker (the traveller) chooses the choice alternative (mode) that yields greatest satisfaction or 'utility', where utility is taken to be related to the 'attributes' of the choice alternative (e.g. monetary cost, journey time and who the vehicle is shared with). The choice context is composed of a finite set of alternatives. The aim was to develop a model that shows the probability that a decision-maker will choose a choice alternative and to quantify how this choice probability is influenced by changes in the attributes of the alternatives. By comparing the relative influence of one attribute against another, it is possible to infer its relative value. For example, by comparing the influence that changes in journey time have on choice with the influence that changes in journey cost have on choice, it is possible to estimate the implied value of time (VoT). The objective of the study was to quantify the additional value that car drivers and passengers put on their time spent sharing their vehicle with others (friends/family or car-pool member/colleague). This can be expressed as an absolute value in money or equivalent journey time units (an ASC or modal penalty), or can be related to the length of the journey by expressing the value proportionate to journey time (an IVT factor).
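As a worked example of this ratio calculation, using the Model 1 coefficient magnitudes reported below in Table 19.7:

```python
beta_cost = 0.00569          # utility per pence (Model 1, Table 19.7)
beta_time = 0.0461           # utility per minute
vot_pence_per_min = beta_time / beta_cost       # ~= 8.1 pence/min
vot_pounds_per_hour = vot_pence_per_min * 60 / 100
print(round(vot_pounds_per_hour, 2))            # ~= 4.86, i.e. GBP 4.86/h as reported
```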
Following conventional modelling practice, we started the data analysis using an MNL model, followed by more complex model forms which take better account of correlation between choice alternatives and variations in tastes and preferences across travellers.
19.4.1. Multinomial Logit Models

The main models were estimated with a non-nested, simple linear-in-the-parameters form. The estimation models underwent a thorough set of econometric diagnostic tests to ensure that they were statistically robust. This included an examination of their overall level-of-fit, precision in parameter estimation and accommodation of correlation between attributes and alternatives. Two main models were developed as follows:

- Model 1 has a logit structure with the value of sharing (as opposed to being alone) expressed as an absolute value; and
- Model 2 has a logit structure with the VoT spent sharing (as opposed to alone) expressed as a factor of IVT.

In the first model the utility of each mode is specified as a function of monetary cost (fuel and parking), IVT and ASCs (\delta_1, \delta_2 and \delta_3). The ASCs capture the preferences for travel by each alternative after taking account of the other variables (attributes) in the model, and as such represent the value of car sharing as opposed to car alone. The resulting utility specifications take the form:

V_{CarAlone} = \beta_1 Cost_{CarAlone} + \beta_2 IVT_{CarAlone} + \delta_1
V_{CarShare} = \beta_1 Cost_{CarShare} + \beta_2 IVT_{CarShare} + \delta_2
V_{NT} = \delta_3   (19.14)

An alternative specification is in terms of a factor for IVT. Initial analysis was undertaken with no ASCs; however, this provided a positive coefficient on the IVT for the single occupancy car trip, which meant a (credible) IVT factor could not be computed.

V_{CarAlone} = \beta_1 Cost_{CarAlone} + \beta_2 IVT_{CarAlone} + \delta_1
V_{CarShare} = \beta_1 Cost_{CarShare} + \beta_4 IVT_{CarShare} + \delta_2
V_{NT} = \delta_3   (19.15)
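A sketch of the Model 1 specification of Eq. (19.14) as a choice-probability function (illustrative only: the coefficient magnitudes are those of Table 19.7 below, and the negative disutility signs are our assumption, following the usual convention for cost and time coefficients):

```python
import numpy as np

def model1_probabilities(cost, ivt,
                         d2=-1.23, d3=-2.30,        # ASCs (car alone normalised to 0)
                         b1=-0.00569, b2=-0.0461):  # cost (pence), time (min); signs assumed
    """Logit probabilities for car alone, car share and not travel, Eq. (19.14)."""
    v = np.array([
        b1 * cost['alone'] + b2 * ivt['alone'],        # V_CarAlone (delta_1 = 0)
        b1 * cost['share'] + b2 * ivt['share'] + d2,   # V_CarShare
        d3,                                            # V_NT
    ])
    e = np.exp(v - v.max())
    return e / e.sum()

# e.g. a 30 min, 150 pence drive versus a 40 min, 100 pence shared trip:
print(model1_probabilities({'alone': 150, 'share': 100}, {'alone': 30, 'share': 40}))
```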
In addition to the above, a dummy variable for sharing with a stranger (car-pool member/colleague) was also included in an earlier specification. However, within this model specification, the parameter was found to be insignificant and of the intuitively wrong sign, i.e. a stranger was perceived more positively than a family member/friend. This suggested that the sharing attribute per se was deemed to be accounting for all statistical variation associated with car sharing. It was therefore excluded from further analysis in the aggregate model.

Table 19.7 presents the results of the model estimations. The adjusted ρ² for Model 1 is relatively high (0.187), showing a good degree-of-fit to the data. t-statistics are highly significant whilst parameter estimates have intuitively correct signs. The VoT for commuters, at £4.86/h, is marginally lower than that recommended by the Scottish Government (£5.68/h for car drivers and £5.02/h for other occupants). The implied value for car sharing as opposed to driving alone is 26.68 min/trip, which compares to a mean current journey time of 31.1 min (the median is 30 min). On average, sharing as opposed to driving alone imposes a 'penalty' equivalent to 85% of current journey time.

Although the adjusted ρ² for Model 2 suggests a good degree-of-fit to the data at 0.189 and all t-statistics are significant at the 5% level, the parameter for sharing IVT is of a lower magnitude than that for single occupancy. This is counter-intuitive to the overall hypothesis that car sharing imposes, on average, a greater penalty upon the traveller than travelling by car alone, and contradicts the longitudinal trend of reducing car occupancy rates across Scotland. The utility construct under this form is also unnecessarily complicated by the consideration of both IVT factors and ASCs, with uncertainty regarding the 'true' penalty for car sharing for the practitioner. The combination of the two terms results in a distorted, and unrepresentative, distribution of sharing valuations when combined with the 'pure' IVT. The IVT factor results do not therefore form an appropriate specification technique for aggregate demand models such as the Transport Model for Scotland (TMfS).

Table 19.7: Car sharing valuation (Models 1 and 2).

  Attribute                Model 1                     Model 2
                           Coefficient  t-statistic   Coefficient  t-statistic
  ASC_CarAlone (δ1)          0          Fixed           0          Fixed
  ASC_CarShare (δ2)          1.23       23.45**         1.53       20.61**
  ASC_NT (δ3)                2.3        42.99**         2.68       30.66**
  Cost (β1)                  0.00569    15.64**         0.00585    15.91**
  Time (β2)                  0.0461     25.15**
  IVT_CarAlone (β2)                                     0.0708     14.93**
  IVT_CarShare (β4)                                     0.0251      6.14**
  Null log-likelihood     6555.42                    6555.42
  Log-likelihood          5327.797                   5311.329
  Adjusted ρ²                0.187                      0.189
  Respondents              673                        673
  Observations            5967                       5967

Notes: t-statistics are shown relative to zero. **Indicates significance at the 5% level. The models were estimated using BIOGEME v1.5 (Bierlaire, 2003).
Subsequent analysis focuses on the specification of models which consider additional terms, market segments and specifications.
19.4.2. Market Segment Analysis

Building on the base models, a series of additional models were estimated to examine how tastes and preferences vary across the individuals in the sample. This was achieved through a combination of approaches involving the estimation of separate utility parameters for key segments, including: income (Inc), age group, car ownership (1 or 2+), gender (Female), employment status (full-time (F-T) or part-time (P-T)), geography (Scottish Executive six-fold urban/rural classification), whether the person they were sharing with was a friend/family member or a stranger (ShareStranger), current journey time (JT), interest in partaking in a car pool (CarPoolInterest) and current propensity to car share.

In Model 3, analysis of differences and values by market segment was undertaken on the absolute values of the ASCs. This builds on Model 1 by providing a disaggregated valuation of δ2 according to key market segments. Piecewise estimation of the ASCs showed a strong relationship between choice, car ownership level, age group, gender, current journey time (continuous variable) and interest in car pooling. The preferred specification is shown in Table 19.8.

Table 19.8: Market segment analysis (Model 3).

  Attribute                              Coefficient   t-statistic   Implied value   Units
  ASC_CarAlone (δ1)                        0            Fixed
  ASC_NT (δ3)                              1.92         38.61**        58.01          min
  ASC_CarShare · CarOwn2+ (δ2^1)           0.396         7.65**        11.96          min
  ASC_CarShare · Female (δ2^2)             0.0951        1.76           2.87          min
  ASC_CarShare · Age 25–44 (δ2^3)          0.167         3.15**         5.05          min
  ASC_CarShare · CarPoolInterest (δ2^4)    1.85          8.14**        55.89          min
  ASC_CarShare · JT (δ2^5)                 0.00827       6.59**         0.25          min
  ASC_CarShare · Share (δ2^6)              0.241         4.27**         7.28          min
  Cost (β1)                                0.00391      31**              –           –
  Time (β2)                                0.0331        0.07**          8.47         p/min
                                                                       £5.08          £/h
  Null log-likelihood                   6456.544
  Log-likelihood                        5373.867
  Adjusted ρ²                              0.166
  Respondents                            663
  Observations                          5877

Notes: t-statistics are shown relative to zero. **Indicates significance at the 5% level.

Note that there are 90 fewer observations in this model, as 10 respondents did not report an estimate of their current journey time.
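The 'implied value' column of Table 19.8 is obtained by converting each ASC component into equivalent minutes through the time coefficient; for example:

```python
b_time = 0.0331                                   # Model 3 time coefficient (per min)
components = {'CarPoolInterest': 1.85, 'CarOwn2+': 0.396, 'JT': 0.00827}
for name, coef in components.items():
    print(name, round(coef / b_time, 2), 'min')   # 55.89, 11.96, 0.25 as in Table 19.8
```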
The model fit is good, the parameters are generally statistically significant and the implied VoT is just marginally lower than that recommended by the Scottish Government. In general, higher car ownership, being female, being aged between 25 and 44, lengthier journey times and having to share with a stranger mean that respondents are more likely to choose a single occupancy car trip. On average, car sharing showed an 18 min penalty over driving alone, with the distribution of values shown in Figure 19.1 below. Again, with reference to current journey time, the average penalty is worth 58% of the current journey time of 31.1 min.

Figure 19.1: MNL model values (Model 3). (Histogram of the disbenefit of car sharing, in minutes, against the proportion of the sample.)
19.4.3. Mixed Logit Specification — Panel Data Models

A random parameters or mixed logit specification offers two principal advantages over traditional multinomial logit specifications. Firstly, it accommodates taste variation across the sample by specifying the model coefficients as distributions (e.g. a normal distribution) rather than fixed values. Secondly, the model takes account of potential correlations in the data introduced because of the repeat observation nature of the data. In SP experiments, each respondent typically provides choice information for many scenarios (in this experiment nine) and, because these choices come from the same individual, they cannot be treated as independent. The random parameters model can be specified to account for such panel data effects by allowing the choices of an individual to be correlated.

In Model 4, analysis was undertaken using an absolute specification of the disbenefit of car sharing (as in Model 1), with all parameters specified to be normally distributed across individuals.
Table 19.9: Mixed logit results (Model 4).

  Attribute               Mean of random parameter    Std. deviation of random parameter
                          Coefficient  t-statistic    Coefficient  t-statistic
  ASC_CarAlone (δ1)         0          Fixed            0          Fixed
  ASC_CarShare (δ2)         4.2782     14.80**          4.7652     16.17**
  ASC_NT (δ3)               7.8466     18.88**          5.0093     12.97**
  Cost (β1)                 0.0176     14.87**          0.0171     11.09**
  Time (β2)                 0.1433     19.94**          0.0805      9.34**
  Share family/friend       0.0948      0.76            1.0574      3.63**
  Null log-likelihood    6555.42
  Log-likelihood         3153.61
  Adjusted ρ²               0.506
  Respondents             673
  Observations           5967

Notes: t-statistics are shown relative to zero. **Indicates significance at the 5% level. Mixed logit model estimated with a panel specification using 5000 Halton draws.
Table 19.9 shows estimates of both the mean and standard deviation of the parameters. As is typically experienced with a mixed logit specification, the overall level-of-fit is substantially improved compared with the non-random parameters specification — an adjusted ρ² of 0.506 compared with 0.187. The estimated parameters have the anticipated signs and magnitudes and are generally estimated with a high degree of precision. The standard deviations on the random parameters indicate a considerable degree of taste variation across the sample, although the mean estimates of the VoT are broadly similar to those from Model 1. Although the mean value of sharing with family/friends is still insignificant, there is statistically significant taste variation across the sample.

Given that the estimated parameters describe distributions rather than fixed points, the estimation of the distribution of the relative attribute value (e.g. the VoT) is not straightforward. Conventional practice is to estimate the properties of the relative attribute distribution by Monte-Carlo simulation. In this instance 10,000 draws were made from the ASC distribution and the time coefficient distribution to generate a Cauchy-distributed disbenefit of car sharing. Although the ratio of the mean coefficients is 29.85 min, the mean of the 10,000 simulated values is 56.39 min (median 27.51). Where the extreme values are excluded (i.e. values greater than +200 min or less than −200 min), the mean value estimated by Monte-Carlo simulation is 30.66 min. A histogram of this simulation is shown in Figure 19.2. From this analysis it can be seen that although most respondents see a disbenefit to car sharing, around 20% of respondents show a benefit from sharing.
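The simulation step can be sketched as follows (an illustration only; with generic pseudo-random draws the summary statistics will be close to, but not exactly, the reported values):

```python
import numpy as np

rng = np.random.default_rng(1)
R = 10_000
# Model 4 means and standard deviations (Table 19.9), taken as magnitudes
asc = rng.normal(4.2782, 4.7652, R)       # ASC_CarShare draws
time = rng.normal(0.1433, 0.0805, R)      # time-coefficient draws
disbenefit = asc / time                   # ratio of normals: heavy-tailed ('Cauchy-like')

print(np.mean(disbenefit), np.median(disbenefit))   # mean inflated by extreme draws
trimmed = disbenefit[np.abs(disbenefit) <= 200]     # exclude values beyond +/-200 min
print(np.mean(trimmed))                             # close to the ~30 min reported
```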
An alternative to simulating the distribution using Monte-Carlo methods is to employ the two-stage mixed logit approach outlined in Section 19.2.2, involving the estimation of random parameters logit models followed by an examination of individual respondents' choices to arrive at estimates of individual-specific values. The distribution of individual-specific values (Figure 19.3) has a mean value of 36.96 min (Model 5).
Figure 19.2: MMNL simulated distribution (Model 4). (Histogram of the disbenefit of car sharing, in minutes, against the proportion of the sample.)
Figure 19.3: Individual-specific values (Model 5). (Histogram of the disbenefit of car sharing, in minutes, against the proportion of the sample.)
Table 19.10: Latent-class analysis (Model 6).

  Attribute               Latent Class 1              Latent Class 2
                          Coefficient  t-statistic    Coefficient  t-statistic
  ASC_CarAlone              0          Fixed            0          Fixed
  ASC_CarShare              4.17310     6.29**          0.44277     2.64**
  ASC_NT                    2.84931     9.85**          3.02839    10.26**
  Cost                      0.00209     1.61            0.01694     9.34**
  Time                      0.08706     7.44**          0.09509     8.97**
  Share family/friend       0.20366     0.67            0.47712     2.44**
  ASC class                 0                           0.10308     0.58
  Null log-likelihood    6456.544
  Log-likelihood         5274.52
  Respondents             673
  Observations           5967

Notes: t-statistics are shown relative to zero. **Indicates significance at the 5% level. Model estimated using GAUSS. Please note that this model is directly comparable with Models 1, 2 and 4 in terms of null log-likelihood and number of observations. Model 3 has a reduced number of observations, due to some respondents not recording their journey time.
19.4.4. Latent-Class Results

The L-C models proved relatively difficult to estimate, with only the two-class specification able to be estimated. Although deterministic, the latent specification is a relatively difficult optimisation problem, with Greene and Hensher (2003) reporting that the choice of a good starting point is crucial. The estimated model is shown in Table 19.10, with the probability of an individual belonging to Class 1 being equal to 71%. The estimated coefficients show those in Class 1 having a low sensitivity to cost and a correspondingly high VoT, and a dislike of sharing. Individuals in Class 2 have a lower VoT (5.6 pence/min) and are relatively indifferent to car sharing. On average, respondents show a disbenefit of car sharing equal to 35.4 min.
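The reported 35.4 min average can be recovered from Table 19.10 by converting each class's car-sharing ASC into minutes and weighting by the class membership shares:

```python
p_class1 = 0.71                      # probability of belonging to Class 1
penalty1 = 4.17310 / 0.08706         # ASC_CarShare / time coeff, Class 1: ~47.9 min
penalty2 = 0.44277 / 0.09509         # Class 2: ~4.7 min
average = p_class1 * penalty1 + (1 - p_class1) * penalty2
print(round(average, 1))             # ~35.4 min, as reported
```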
19.5. Discussion

A variety of models have been produced based upon the SP data in order to derive valuations of car sharing in terms of adjustments to the ASCs and in equivalent alternative-specific IVT factors (relative to car alone). For each of these, further analysis has been undertaken according to key market segments, namely: income; car ownership; age group; gender; employment status; geography (Scottish Executive six-fold classification); sharing with a stranger; current journey time; interest in partaking in a car pool and current propensity to car share.
All of the base models produce significant results across the modelled parameters. The accompanying recommended values and factors therefore represent a solid foundation for the modelling and appraisal of HOV lanes in Scotland. The corresponding ASC is 26.68 min from the multinomial specification and approximately 30 min from the mixed logit specification with normal distributions on time, cost and the ASCs. Further analysis has shown that significant parameters can also be developed for market segments, including: car ownership (two-plus car households); gender; age group; current journey time; sharing with a stranger and interest in partaking in a car pool.

The alternative model specifications, however, generate different distributions of the value of the disbenefit of car sharing. The mean values include 26.7 min for the MNL (Model 1), 18 min for the MNL (Model 3), 30 min for the MMNL (Model 4), an average of 37 min across individual-specific values and 35.4 min for the L-C model. The L-C model shows the least variation across individuals and the MMNL the greatest, and there is little correlation between the individual-specific values derived from the MMNL and the MNL (Model 3).
19.6. Conclusions

This paper has examined respondents' willingness-to-pay for car-sharing initiatives in Scotland. A range of different functional forms and specifications were fitted to the data, each producing models with a good degree-of-fit to the dataset and significant parameters. It was hypothesised that underneath the aggregate results there would be significant variations in taste and preference, dependent upon the underlying characteristics, or the psychographic profile, of the respondent. Examination of market segments using three distinct techniques added invaluable insights into the data, and highlighted a number of issues associated with distributional assumptions, simulation and estimation. Perhaps most pertinently, the analysis emphasises that parameter estimates can differ significantly and are highly dependent upon model specification and any a priori assumptions made by the modeller.

Commencing with the MNL model, whilst its degree-of-fit was, as expected, the lowest, estimation and specification remain the most well-established. The resulting valuations are neither implausible nor difficult to apply in conventional transport models. This does not, though, deflect from two fundamental concerns, namely:

- that finite distributions do not accurately reflect taste and preference distributions amongst the population; and
- that the specification of endogenous segments may miss the true drivers of variations in taste and preference.

We also see (in Figure 19.1) that the finite segmented distribution method results in an unnaturally 'lumpy' distribution of the disbenefit of car sharing, which does not appear analogous to 'real world' tastes. The incorporation of psychographic profiling can begin to offset the reliance on well-established market segments, e.g. income, but we would conclude that the MNL structure should only represent the starting point in the model estimation process.

Whilst the two mixed logit approaches described here show clear benefits in terms of their resulting distributions of valuations amongst the population, their estimation is not without difficulty. The use of Monte-Carlo simulation techniques is shown to clearly skew the mean value of the parameter unless extreme values are excluded, and defining those boundaries represents an initial source of possible error in the model estimation process. In this respect we believe that the two-stage approach offers clear benefits. Whilst its distribution is not as attractive in terms of its analogy with established functional forms, it does benefit from being readily explainable (and justifiable) to non-technical audiences and requires no assumptions and interventions on the part of the modeller. In addition, its distribution is broadly in line with that produced using simulation techniques. It is noted that the individual-specific values of the disbenefit of car sharing derived from the MNL and the two-stage MMNL show very little correlation, and a particularly pertinent advancement of existing techniques would be to examine alternative functional forms of the MMNL in more detail.

Although we were only able to estimate the two-class L-C model, we believe that if the techniques and associated software can be developed further it offers a substantial incremental improvement on the MNL. As Greene and Hensher (2003) have previously discussed, the differential between the L-C and MMNL is less defined, and was hampered here by difficulties in estimation for c = 3+ classes. We believe that this is related to the requirement to commence estimation from a good starting point; two drawbacks then emerge, namely:

- the requirement for the 'good starting point' and the extent to which this prejudges valuations or affects model estimation; and
- difficulties when the attributes being investigated are more novel in nature, such that starting points may not be intuitively apparent or available (as in this study).

By contrast, a particular benefit is the lack of a requirement to prejudge the characteristics and attitudes of interest. Specification of a single normally distributed population would not appear analogous to our experience of variation in taste and preference across the population, and L-C models would appear well placed to capture this. In conclusion, we believe that L-C models could offer significant benefits and that, if the advancements we believe to be taking place can overcome the starting-point drawback, then they will be well placed to advance our understanding of variations in taste and preference.
Acknowledgements

The authors would like to acknowledge the funding of Transport Scotland for the research. Thanks are also due to the support of colleagues, including Sekai Seawell, David Connolly and Annabel Eccles. The authors also thank two anonymous referees for comments on earlier versions of the paper.
References

Ben-Akiva, M. E., & Bolduc, D. (1996). Multinomial probit with a logit kernel and a general parameter specification of the covariance structure. Working Paper, Département d'économique, Université Laval, Canada.
Bhat, C. (2000). Quasi random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Science, 34, 228–238.
Bierlaire, M. (2003). BIOGEME: A free package for the estimation of discrete choice models. Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland.
Breffle, B., Morey, E., & Thacher, J. (2006). Using angler characteristics and attitudinal data to identify environmental preference classes: A latent-class model. Environmental and Resource Economics, 34, 91–115.
Greene, W. H., & Hensher, D. A. (2003). A latent class model for discrete choice analysis: Contrasts with mixed logit. Transportation Research Part B: Methodological, 37, 681–698.
Hess, S., Train, K., & Polak, J. (2006). On the use of a modified Latin hypercube sampling (MLHS) method in the estimation of a mixed logit model for vehicle choice. Transportation Research Part B: Methodological, 40(2), 147–163.
Hurvich, M., & Tsai, C. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Annals of Statistics, 20, 1350–1360.
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Ortúzar, J. de D., & Willumsen, L. G. (2006). Modelling transport (3rd ed.). Chichester, UK: Wiley.
Provencher, B., Baerenklau, K., & Bishop, R. (2002). A finite mixture logit model of recreational angling with serially correlated random utility. American Journal of Agricultural Economics, 84, 1066–1075.
Revelt, D., & Train, K. E. (1999). Customer-specific taste parameters and mixed logit. Working Paper, University of California, Berkeley, USA. Available at: http://elsa.berkeley.edu/wp/train0999.pdf. Accessed December 12, 2008.
Scarpa, R., & Thiene, M. (2005). Destination choice models for rock climbing in the Northeastern Alps: A latent-class approach based on intensity of preferences. Land Economics, 83(1), 426–444.
Train, K. E. (1998). Recreation demand models with taste differences over people. Land Economics, 74(5), 230–239.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Whelan, G. A. (2003). Identifying taste variation from choice models. Paper presented at the 2003 European Transport Conference, PTRC, UK.
Chapter 20
Modelling Choice in a Changing Environment: Assessing the Shock Effects of a New Transport System
María Francisca Yáñez and Juan de Dios Ortúzar
Abstract

Discrete choice models based on cross-sectional data have the important limitation of not considering habit and inertia effects, and this may be especially significant in changing environments; notwithstanding, most demand models to date have been based on this type of data. To avoid this limitation, we started by building a mode choice panel around a drastically changing environment: the introduction of a radically new public transport system for the conurbation of Santiago de Chile. This paper presents the formulation and estimation of a family of discrete choice models that enables us to treat two main elements: (i) the relative values of the modal attributes, as usual, and (ii) the shock resulting from the introduction of this radical new policy. We also analyse the influence of socioeconomic variables on these two elements. We found that introducing this drastic new policy may even modify the perception of attribute values; in fact, the changes can differ among individuals, as socioeconomic characteristics act as either enhancers or softeners of the shock effects generated by the new policy.
20.1. Introduction

Mode choice models are usually based on cross-sectional data, due to both their higher availability and the better practical knowledge about their treatment.
However, cross-sectional data have a broadly known limitation: the models cannot consider temporal effects. Using panel data constitutes a good alternative to overcome this limitation, as panels have many advantages (Golob, Kitamura, & Supernak, 1997; Stopher & Greaves, 2004), which are especially significant when dealing with changing environments. Notwithstanding, apart from the before-and-after study carried out in Massachusetts around the introduction of a free bus service (Parody, 1977), the work developed in Amsterdam on an extension of the urban motorway system (Kroes, Daly, Gunn, & van der Hoorn, 1996) and our own work on the Santiago Panel (Yáñez, Mansilla, & Ortúzar, 2009), to our knowledge there are no panels built around a substantial change that enable modelling the mode choice process. The Santiago Panel is a four-wave five-day pseudo diary containing data just before and after the introduction of Transantiago (Muñoz, Ortúzar, & Gschwender, 2009), a radical but, sadly, very badly implemented new public transport system for Santiago de Chile.1 The aim of the plan was to improve public transport in the city in an effort to stop its gradual decline. It was implemented in 'big bang' fashion (i.e. neither pilot studies nor gradual changes were included) on 10th February 2007. The main feature of the new system is an integrated fare in a system of feeder and trunk buses in conjunction with the Santiago underground (Metro), which was established as its backbone. Additionally, the new system is characterized by higher quality buses (i.e. with almost no accidents, low noise and much reduced pollution) running shorter routes, no competition among buses to gain passengers (hence, more service-oriented drivers), and a new payment system featuring a contactless card (Bip!, similar to the Oyster card in London). However, it also requires many more transfers, lower frequency in the feeder services, longer walking times in the suburbs and, particularly at its start, severe overcrowding in bus and Metro services during peak hours. All in all, during its first two years of operation, the results of Transantiago were certainly not as expected, as the government gave an unrealistic vision of what the new system would be like; Transantiago was presented as a solution to almost every problem that the old public transport system had. Unfortunately, at implementation time several problems dominated the system, among others: the buses did not have the necessary technology to allow full use of the contactless card, not all segregated bus-only lanes were constructed (i.e. the speeds assumed when the system was designed were impossible), Transantiago had limited human resources, and the operators' contracts lacked appropriate incentives for them to transport more passengers. The aim of this paper is to both understand and model the shock effects associated with the introduction of a (radical) new policy.
1. Santiago is the fifth largest conurbation in South America. In December 2002, it had approximately 6 million inhabitants living in 1.5 million dwellings spread over 1400 km2, and on a typical working day 16.3 million trips were made, 10 million of which used motorised transport modes (DICTUC, 2003).
[Figure 20.1: Change in modal split. Bar chart of the mode shares in each of the four waves (0% to 40%) for car driver, car passenger, bus, Metro, walk, bicycle, shared taxi and the combined modes (car driver–Metro, car passenger–Metro, shared taxi–Metro, bus–Metro, shared taxi–bus and other).]

Panels allow considering inertia or habit effects, but here we are particularly interested in modelling the special consequences of a large change (i.e. a shock) in the choice process. The rest of the paper is organized as follows. In Section 20.2, we briefly describe the Santiago Panel. In Section 20.3, we present the theoretical formulation and discuss the empirical results of several estimated models. Finally, in Section 20.4, we present our main conclusions.
20.2. Data Analysis

The data used in this research come from the Santiago Panel, a five-day pseudo diary2 with four waves, one before and three after the implementation of Transantiago. The initial sample of the Santiago Panel consisted of 303 individuals who live in Santiago and work full-time at one of the four campuses and two hospitals of the Pontificia Universidad Católica de Chile. Thanks to the panel design and an effective maintenance methodology (Yáñez et al., 2009), the Santiago Panel has effectively controlled attrition,3 which was just 5, 3 and 7% in waves 2, 3 and 4, respectively. A simple statistical analysis of the data gathered shows significant changes after the implementation of Transantiago. About 55% of respondents changed their
2. For budget reasons, the panel considered information for the five working days, but only about work trips in the morning peak hour. 3. Attrition is one of the main problems of panels and refers to the loss of respondents between successive surveys (waves).
Table 20.1: Average income per mode.

Mode                    Average income (Ch$)
                        Wave 1       Wave 2       Wave 3
Car driver              1,217,664    1,238,152    1,177,379
Car passenger             483,733      451,722      473,822
Shared taxi               535,000      547,142      485,571
Metro                     619,738      426,191      442,318
Bus                       313,089      329,097      314,129
Walking                   742,105      721,697      699,167
Bicycle                   290,000      326,250      489,428
Park‘n’Ride               475,833      860,555      606,956
Kiss‘n’Ride               586,720      512,222      498,775
Shared taxi–Metro         367,794      331,012      474,687
Bus–Metro                 595,360      388,474      371,840
Shared taxi–Bus              n.a.      420,952      510,000
transport mode (between waves one and four), and of the rest a significant proportion changed either their route or the number of transfers in their trip (see the details of mode changes in Figure 20.1). Therefore, we can say that in this panel changes are not an exception, unlike in most panels (such as the Puget Sound Panel, Murakami & Watterson, 1990), where 85% of the workers chose the same mode in the first two waves. Due to these severe mode changes, the average socioeconomic (SE) conditions of the users of each mode also changed; Table 20.1 shows how users' income changed for each mode across waves. The largest changes (for Metro, bicycle, Park‘n’Ride, Kiss‘n’Ride and bus–Metro) were expected, as Transantiago introduced an integrated fare without an extra charge for transfers, and this promoted Metro use by lower-income individuals. Regarding SE characteristics, the sample average income (US$1180 per month) is higher than the Chilean average income.4 Consequently, we could expect higher subjective values of time (SVT). Moreover, given that all respondents are full-time workers, who also have a better education level than the Chilean mean, we could expect an even higher SVT.
20.3. Modelling A shock can be defined as a sudden or unexpected event. This may be an earthquake or strike, or the introduction of new infrastructure, among others. There are
4. The average income in Chile is currently US$582.
different kinds of shocks, but all are difficult to characterize due to their intrinsically sudden nature. In this case, we are considering the introduction of a radically new public transport system as a level-shift shock. Thus, we model the situation where an individual $q$ commonly uses the same option to travel to work, but at time $t_s$ the transport system changes radically. In other words, we intend to model a habitual choice situation in the presence of an important system change. Hence, apart from the influence of modal attributes, we can consider two opposing forces influencing the choice process: shock effects and inertia (habit) effects. However, in this paper we will analyse just the first, with the following objectives: (i) to find the best way to introduce the shock effects in a model and (ii) to analyse the relationship between the intensity of the shock effects and the SE characteristics of the individuals.
20.3.1. A Model with Shock Effects

We will assume that the shock effects can modify the parameters associated with the attributes of the various alternatives. This means that they may modify the relative importance of each attribute in the traveller's choice process. More precisely, we propose that the shock effects can be random, a function of the SE characteristics and choice situations, and different among both individuals and alternatives. This is different to what has been done to model inertia, another important temporal effect, where what matters is the difference in the utility of two or more alternatives (Cantillo, Ortúzar, & Williams, 2007). The main hypotheses supporting our formulation are:

- Individuals are utility maximizers.
- Individual responses present panel correlation.
- A radical intervention may generate a shock effect, which in turn may have the power to modify the entire choice process; consequently, individuals may modify their valuation process, altering their utility functions.
- The shock effect may be different for each alternative.
- The shock effect may vary among individuals; thus, SE characteristics may influence (either amplify or reduce) the shock effect.

We will use the following random utility structure to develop all our models:

$$U^w_{iq} = \begin{cases} \alpha^w_i + X^w_{iq}\beta^w_{iq} + \zeta^w_{iq} & \text{if } w < t_s \\ \alpha^w_i + X^w_{iq}\beta^w_{iq}\left[1 + \left(S^w_{iq} + SE^w_q\,\beta^w_{SE}\right)\right] + \zeta^w_{iq} & \text{otherwise} \end{cases} \qquad (20.1)$$

where $X^w_{iq}$ is a vector with the attributes of option $i$ for individual $q$ on wave $w$; $\beta^w_{iq}$ is a vector of parameters for individual $q$, for option $i$, on wave $w$;
$SE^w_q$ is a vector of SE characteristics for individual $q$ on wave $w$; $S^w_{iq}$ is a vector of parameters that varies randomly among individuals and represents the shock effects of the policy; consequently, shock effects will exist if and only if $S^w_i$ (the population mean) is different from zero; $\zeta^w_{iq}$ is a random error term formulated as $\zeta^w_{iq} = u_{iq} + \varepsilon^w_{iq}$, where $u_{iq}$ is a random effect specific to the individual and $\varepsilon^w_{iq}$ is the typical random error, distributed independently and identically Gumbel.
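To make the two-regime structure of Eq. (20.1) concrete, a minimal sketch in Python follows (systematic utility only, leaving out the error term); all function and variable names, and the numerical values, are illustrative assumptions rather than part of the original estimation code:

```python
import numpy as np

def utility_with_shock(alpha_i, beta_iq, x_iq, before_shock,
                       s_iq=0.0, se_q=None, beta_se=None):
    """Systematic utility of alternative i for individual q, following
    Eq. (20.1): the usual linear-in-attributes form before the
    intervention; afterwards the attribute part is scaled by
    (1 + shock), where the shock may be shifted by the individual's
    socioeconomic (SE) characteristics."""
    base = float(np.dot(x_iq, beta_iq))
    if before_shock:                       # wave observed before t_s
        return alpha_i + base
    shock = s_iq
    if se_q is not None and beta_se is not None:
        shock += float(np.dot(se_q, beta_se))
    return alpha_i + base * (1.0 + shock)

# Illustrative numbers only: two attributes (e.g. cost/wage and time).
beta = np.array([-2.0, -0.1])
x = np.array([0.5, 25.0])
v_before = utility_with_shock(-0.5, beta, x, before_shock=True)
v_after = utility_with_shock(-0.5, beta, x, before_shock=False,
                             s_iq=0.3, se_q=np.array([1.0]),
                             beta_se=np.array([-0.1]))
```

The design point is that the shock enters multiplicatively on the attribute part of the utility, so it rescales the relative importance of all attributes at once rather than shifting the utility additively, as wave-specific constants would.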
20.3.2. Model Estimation

To estimate the above model, we need choice data from a panel with at least two waves. Then, we have to build a joint likelihood function for the number of waves ($W$) considered. Given that a mixed logit (ML) model probability of choice is given by the integral of standard multinomial logit (MNL) probabilities over a density of parameters (Train, 2003), the ML panel probability can be expressed simply as the product of ML probabilities:

$$P_{iq}(\lambda) = \prod_{w=1}^{W} \int \left( \frac{\exp\left(V^w_{iq}(\lambda^w_i)\right)}{\sum_{l \in A^w_q} \exp\left(V^w_{lq}(\lambda^w_l)\right)} \right) f(\lambda^w_i \mid \Omega)\, d\lambda \qquad (20.2)$$
where $\lambda$ is a vector of parameters to be estimated, which is assumed to be distributed following $f$ in the population. To evaluate Eq. (20.2), even if we assume that $\varepsilon^w_{iq}$ has a Gumbel distribution, it is necessary to compute multidimensional integrals for each individual. Therefore, this calculation requires simulating values as in:

$$SP^N_{iq} = \frac{1}{N} \sum_{n=1}^{N} \left( \frac{\exp\left(V^w_{iq}(\lambda^w_{in})\right)}{\sum_{l \in A^w_q} \exp\left(V^w_{lq}(\lambda^w_{ln})\right)} \right) \qquad (20.3)$$
where $N$ is the number of random draws, as usual (Sillano & Ortúzar, 2005).
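A minimal sketch of this simulation follows, in the standard panel form (Train, 2003) where the product of wave-specific MNL probabilities is computed for each draw and then averaged over the N draws; v_fn, draws, chosen and alt_sets are assumed names for illustration:

```python
import numpy as np

def simulated_panel_probability(v_fn, draws, chosen, alt_sets):
    """Simulated probability of individual q's observed wave-by-wave
    choices, in the spirit of Eqs. (20.2)-(20.3): for each of the N
    draws of the random parameters, form the product over waves of
    MNL probabilities, then average over draws.
      v_fn(w, i, lam) -> systematic utility of alternative i in wave w
      draws           -> list of N parameter draws (lambda)
      chosen[w]       -> alternative chosen in wave w
      alt_sets[w]     -> available alternatives A_q^w in wave w"""
    total = 0.0
    for lam in draws:                          # n = 1, ..., N
        p_seq = 1.0
        for w, i_star in enumerate(chosen):    # product over waves
            utils = np.array([v_fn(w, i, lam) for i in alt_sets[w]])
            expu = np.exp(utils - utils.max()) # overflow guard
            p_seq *= expu[alt_sets[w].index(i_star)] / expu.sum()
        total += p_seq
    return total / len(draws)
```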
20.3.3. Empirical Analysis

Starting from Eq. (20.1), which presents a general model formulation, models of increasing complexity were estimated. In all cases, we assumed that individuals chose their mode among a finite set of alternatives and that choice sets could vary over working days, waves and individuals. For space reasons, we leave out several specifications tested, as they offered no improvement (i.e. using the likelihood ratio (LR) test) over their restricted versions.
Model 1. This is the simplest specification and assumes generic parameters over waves. It considers panel correlation (i.e. correlation among the answers from the same individual over time), so it could be valid for common and stable choice
environments, but it is at least questionable for changing environments. Here, we have that $\beta^w_{iq} = \beta_{iq}$; $\alpha^w_i = \alpha_i$; $S^w_{iq} = 0$.
Model 2. This also considers panel correlation and assumes generic parameters for the modal attributes, but it does include shock effects, assuming simply that the alternative specific constants (ASC) capture their influence. So here $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = 0$.
Model 3. This specification assumes that the shock effect is unique (i.e. it affects all alternatives in the same way) and that it may modify the value of the parameters. Here, we tested three different versions:
Model 3_Shock. It assumes that the shock effect is equal for all individuals: $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S$.
Model 3_Random_Shock. It assumes that the shock effect may be different among individuals, that is, it considers the shock as a random parameter: $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S_q$.
Model 3_Shock_SE_Variables. It assumes that the shock effect may be different for different groups of individuals, that is, it considers that the SE characteristics may affect the magnitude of the shock effect: $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S_q + SE^w_q\,\beta^w_{SE}$.
Model 4. This assumes that the shock effect may modify the value of the parameters, but it is not unique (i.e. it may vary among alternatives). We also tested three specifications:
Model 4_Shock. The shock effect is different for each group of alternatives (i.e. private modes, public modes, slow modes and combined modes) but equal for all individuals:5 $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S_i$.
Model 4_Random_Shock. The shock effect is different for each group6 of alternatives and varies randomly among individuals: $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S_{iq}$.
5. Identifiability problems precluded the estimation of shock parameters that varied for each mode. 6. Private modes (car driver and car passenger); public modes (bus, Metro, shared taxi, bus–Metro, shared taxi–Metro, bus–shared taxi); slow modes (bicycle and walking); combined modes (car driver–Metro or Park‘n’Ride and car passenger–Metro or Kiss‘n’Ride).
Model 4_Shock_SE_Variables. The shock effect is different for each group of alternatives and different among individuals, as it considers that SE variables may affect its magnitude: $\beta^w_{iq} = \beta_{iq}$; $S^w_{iq} = S_{iq} + SE^w_q\,\beta^w_{SE}$.
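The restrictions above can be summarised compactly; the following hypothetical sketch (all names are assumptions for illustration) expresses, for each specification, how the shock term entering Eq. (20.1) would be built:

```python
# Hypothetical encoding of the shock term per specification. Inputs:
# s (common scalar), s_q (random, per individual), s_i (per group of
# alternatives), s_iq (per group and individual), se_q and beta_se
# (socioeconomic shift). All names are assumed for illustration only.
shock_term = {
    "Model 1":          lambda **k: 0.0,  # no shock (generic alpha, beta)
    "Model 2":          lambda **k: 0.0,  # shock absorbed by wave ASCs
    "Model 3_Shock":    lambda s, **k: s,
    "Model 3_Random":   lambda s_q, **k: s_q,
    "Model 3_Shock_SE": lambda s_q, se_q, beta_se, **k:
                            s_q + sum(a * b for a, b in zip(se_q, beta_se)),
    "Model 4_Shock":    lambda s_i, **k: s_i,
    "Model 4_Random":   lambda s_iq, **k: s_iq,
    "Model 4_Shock_SE": lambda s_iq, se_q, beta_se, **k:
                            s_iq + sum(a * b for a, b in zip(se_q, beta_se)),
}

# Example: shock entering a post-intervention utility under Model 3_Shock_SE.
s = shock_term["Model 3_Shock_SE"](s_q=0.3, se_q=[1.0, 0.0], beta_se=[-0.1, 0.2])
```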
All specifications incorporate systematic heterogeneity around the ASC through the introduction of two variables related to the start of Transantiago.7 The first was needed to represent the effect of the new payment system, which featured an integrated fare for bus and Metro trips (which did not exist in the old system), allowing a time window of 2 h to make the transfer. For this we decided to introduce a dummy variable indicating whether the individual was a new Metro user (Transantiago relies on the Metro as its backbone), and we expected a positive effect in the utility function. The second variable was needed to represent the comfort8 perceived by public transport users; this became especially important after Transantiago started because the levels of overcrowding in buses and the Metro were extreme at the beginning of the new regime (and although overcrowding improved a lot after the first year of operations, it is still much higher than ever before in the Metro). Regrettably, the initial survey design did not consider the inclusion of latent variables, so we could not predict the importance of the above effects. Nevertheless, we obtained valuable information about the subjective perceptions of individuals, and we used this to create a dummy variable that took the value of one for respondents stating that comfort was the attribute that worsened most after the introduction of Transantiago. The expected sign of this variable is positive, as users chose the mode despite declaring a negative change in comfort. These two new variables obtained correct signs and high t-ratios. Moreover, in some cases when they were not considered, the models presented problems with the signs of key policy variables. We also asked if individuals needed to use a car due to additional activities performed prior to or after their journey (e.g. taking children to school, going to the gym). With this we defined the variable Captive, which took the value of one if this was the case. This variable, which was introduced interacting with the shock effect, had the correct sign and a high t-ratio, but a modest magnitude.9 Table 20.2 shows the model results obtained for specifications 1 and 2, while Tables 20.3–20.5 show model results for specifications 3 and 4, which include the shock effect explicitly.10 As can be seen, the mode-attribute parameters are all significant and have the expected sign; also, in every specification containing it except Model 3_Shock_SE_Variables, we found that the shock coefficient was significantly different from zero.
7. They were included from the second wave onwards. 8. We are aware that this variable is closely related to travel time, but its effect is not included in the models as the associated parameter was not significant. 9. We took out of the sample roughly 12% of cases in this situation and estimated models without them, finding no significant changes. 10. We present the best two models for specifications 3 and 4 because the simplest models (Model 3_Shock and Model 4_Shock) were significantly worse (according to the LR test).
[Table 20.2: Model parameters for specifications 1 and 2. For each model the table reports parameter estimates and t-tests for: number of cars; cost (mean and st. dev.); travel time; waiting time; walking time; interchanges; comfort (mean and st. dev.); the Transantiago dummy; and the mode-specific constants (car driver, car passenger, shared taxi, Metro, walk, bicycle, Park‘n’Ride, Kiss‘n’Ride, shared taxi–Metro, bus–Metro, bus–shared taxi), together with L(max) and the number of parameters. Model 1 has generic parameters over waves (L(max) = −2742.98, 22 parameters), while Model 2 has wave-specific parameters for waves 1–3 (L(max) = −1966.58, 43 parameters). All models were estimated on 4,117 observations.]

[Table 20.3: Model parameters for specifications 3 and 4 (Model 3_Random_Shock, Model 3_Shock_SE Variables, Model 4_Random_Shock and Model 4_Shock_SE Variables), reporting estimates and t-tests by wave for the same attributes and constants as Table 20.2, plus the Captive and income-level shock interaction terms.
Definitions: Cost: cost/wage rate; Transantiago: new Metro user as a consequence of Transantiago's implementation; Travel, Walking and Waiting Time: in minutes; Captive: an extra reason to choose the mode (i.e. carry children); Income level 1: income below Ch$350,000; Income level 2: income between Ch$350,000 and Ch$1,000,000.
Note: The parameters of waves 2 and 3 for model specifications 3 and 4 are $\beta^1_i(1 + S_i + SE_q\,\beta_{SE})$; as an example, the table presents the parameters for the average shock value without considering socioeconomic characteristics, $\beta^1_i(1 + S_i)$. In particular, for model specification 4 (for space reasons), only the parameters for the public modes, $\beta^1_{Public}(1 + S_{Public})$, are reported.]
Table 20.4: Shock parameters for model specification 3.

                          Model 3_Random_Shock    Model 3_Shock_SE Variables
Shock effect (mean)       2.31                    0.24
t-Test                    8.32                    1.55
Shock effect (st. dev.)   4.03                    –
t-Test                    11.7                    –
[Table 20.5: Shock parameters for model specification 4, reporting the mean and standard deviation of the shock effect (with t-tests) for the private, public, combined and slow mode groups under Model 4_Random_Shock and Model 4_Shock_SE Variables.
Private modes: car driver and car passenger. Public modes: shared taxi, shared taxi–Metro, bus–shared taxi, bus, bus–Metro, Metro. Combined modes: car driver–Metro and car passenger–Metro. Slow modes: bicycle and walk.]
We tested different distributions for the random parameters (i.e. cost and comfort) of the modal attributes, finding fairly similar results for the normal and log-normal distributions. With the former we were able to check that the expected proportion of individuals with 'incorrect signs' would be minimal (Sillano & Ortúzar, 2005), so we decided to base our analyses on the log-normal distribution as it gave a marginally superior fit. Considering first the three models belonging to specification 3, a LR test (Ortúzar & Willumsen, 2001) allows us to conclude that Model 3_Shock, which assumes a unique shock effect for every individual, is the least appropriate model (i.e. the null hypothesis of model similarity is rejected with respect to all other models), while Model 3_Random_Shock, assuming a random11 shock parameter, is superior. Second, considering the three models belonging to specification 4, the LR test allows us to conclude that Model 4_Shock is, again, the least appropriate model;
11. We assumed a normal distribution for the shock parameters.
however, this time Model 4_Shock_SE_Variables, which considers that SE characteristics affect the magnitude of the shock effect, is superior. If we compare all the estimated models, we can conclude that Model 4_Shock_SE_Variables is the best specification. Interestingly, this model exhibits a positive shock effect only for the private modes. Regarding the SE variables,12 being 'Captive' and not having a high income were found to act as enhancers of the shock effects, as expected. But contrary to our expectations, other important SE variables (e.g. sex, age and educational level) did not show a significant impact. Summarizing, given that Model 4_Shock_SE_Variables is the best overall model, we can say that SE variables have an influence on the effects of a shock, but their impact is clearer when we have an appropriate shock formulation. On the other hand, even though specifications 3 and 4 allow for different parameters per wave and alternative, the SVT for each model formulation remain the same over waves and also over alternatives, as the parameters $\beta^w_{iq}$ are multiplied by $(1 + S^w_{iq})$ from the second wave onwards. Apart from these specifications, based on the general formulation presented in Eq. (20.1), we tested a variety of models, such as: different shock effects for waves 2 and 3 ($S^2_{iq} \neq S^3_{iq}$) and different shock effects per individual considering the influence of both random parameters and SE variables. In all cases, the models were judged inferior to our best specification (Model 4_Shock_SE_Variables). To analyse the consequences of using shock models instead of the traditional ones, we applied the models to a validation sample. As all models were estimated with the first three waves of the Santiago Panel, we used the recently finished fourth wave as the validation sample. Thus, we were able to confirm the results of the original LR tests, as a direct LR test (Ortúzar & Willumsen, 2001) shows that the data under Model 4_Shock_SE_Variables (the best model) are 20 times more likely than under Model 1 and 16 times more likely than under Model 2. Finally, and considering that utility models are widely used to estimate willingness-to-pay, Table 20.6 shows the estimated SVT values for every model. We can see that the traditional models clearly appear to underestimate the SVT.
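For reference, SVT figures such as those in Table 20.6 follow from ratios of estimated marginal utilities; a minimal sketch under the assumption (see the table definitions) that cost enters the utility divided by the wage rate, using illustrative coefficient values rather than the paper's estimates:

```python
def svt_per_hour(beta_time_per_min, beta_cost, wage_per_min):
    """Subjective value of time in Ch$/h when the cost attribute is
    cost/wage: the marginal utility of money is beta_cost / wage, so
    SVT = beta_time / (beta_cost / wage), scaled from minutes to hours."""
    return 60.0 * beta_time_per_min / (beta_cost / wage_per_min)

# Illustrative values (assumed): beta_time = -0.101 per minute of travel,
# beta_cost = -0.497 on cost/wage, and a wage of 50 Ch$ per minute.
print(svt_per_hour(-0.101, -0.497, 50.0))   # about 610 Ch$/h
```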
20.4. Conclusions

Using data from the Santiago Panel, we studied the importance of the 'shock effects' caused by the introduction of a radical new policy. We assumed that these effects can modify the parameters associated with the attributes of the various alternatives in a mode choice situation. We evaluated different ways to tackle the shock effects, and our findings support our original hypotheses, that is, a shock can indeed modify the valuation process of alternatives, causing changes in individual utility functions.
12. SE variables that allow introducing systematic variation over shock effects.
Table 20.6: Subjective values of time.

                      Model 1    Model 2    Model 3_        Model 3_         Model 4_        Model 4_Shock_
                                            Random_Shock    Shock_SE Var.    Random_Shock    SE_Variables
SVT travel (Ch$/h)a    595.64     707.48    1361.32         1055.68          1327.25         1021.34
SVT wait (Ch$/h)      1321.01    1157.24    3792.24         3164.44          3414.89         3629.93
SVT walk (Ch$/h)       475.92     417.41    3917.26         1802.69          2806.57         2389.89

a During the survey 1 US$ varied between 700 and 500 Ch$ (Chilean pesos).
Our best models reveal that the shock effects vary among alternatives and individuals. Additionally, although in this case the shock was generated by the introduction of Transantiago, a policy that modifies only the public transport system, we found, as expected, not only a direct effect on the public transport modes but also indirect, and no less important, effects on the private and combined modes. We tested two ways to include the hypothesis that the shock effect could vary among individuals: allowing for random shock parameters and allowing for systematic variations of the shock parameters. The first implies a different effect per individual, while the second implies different effects for certain groups of individuals. Moreover, we allowed the shock parameter to have different signs among individuals. Our empirical findings show that the SE variables (systematic variations) act as enhancers of the shock effect, keeping the original sign, while the models with random parameters enabled different signs to be revealed (in 28.3% of cases). These findings reinforce our belief that changing systems should be modelled with data that can really capture the effect of new policies on individuals' choice processes. Moreover, when the new policy implies large and sudden changes, the formulations should be more flexible, considering that the shock (large change) effects may differ over alternatives and over individuals. Finally, this research presents one possible way to deal with shock effects. But inertia may also affect behaviour in an important way, especially for trips strongly influenced by routines (Cantillo et al., 2007). We are currently studying more complex model formulations to incorporate shock and inertia together. So far, we can only say that inertia and shock are opposite effects: the inertia effect increases the probability of remaining with the same alternative, while the shock effect increases the probability of changing the usual choice. Therefore, under this assumption, if we incorporate inertia into the models, the shock effects could be higher. Moreover, we could expect the inertia effect to disappear, or at least reduce in magnitude,
between the first two waves (when Transantiago was introduced). Afterwards, this effect might recover its power, decreasing the probability of changing choice. Thus, contrary to the models presented in this paper, we could also expect different shock effect values over waves. It is important to emphasize that the shock model can better explain a choice situation affected by a shock, but it cannot forecast changes, because shocks are defined precisely as sudden changes. Moreover, the effects on users' choices depend on the nature of the shock, which may differ greatly.
Acknowledgements

We wish to thank the Chilean Fund for Scientific and Technological Research, FONDECYT (Project 1050672), and the Millennium Institute on Complex Engineering Systems (Project P05-004F) for funding this research. The last version of this paper was written while the first author was a visiting Ph.D. student at the German Institute for Economic Research (DIW-Berlin), financed by the Chilean Council for Scientific and Technological Research (CONICYT) and the German Academic Exchange Service (DAAD), and the second author was a visiting scholar at the French National Institute for Transport and Safety Research (INRETS).
References

Cantillo, V., Ortúzar, J. de D., & Williams, H. C. W. L. (2007). Modelling discrete choices in the presence of inertia and serial correlation. Transportation Science, 41(2), 195–205.
DICTUC. (2003). Encuesta de Movilidad 2001 de Santiago. Final report to the Ministry of Planning, Department of Transport Engineering, Pontificia Universidad Católica de Chile, Santiago (in Spanish).
Golob, T. F., Kitamura, R., & Supernak, J. (1997). A panel-based evaluation of the San Diego I-15 carpool lanes project. In: T. F. Golob, R. Kitamura & L. Long (Eds), Panels for transportation planning: Methods and applications (pp. 97–128). Boston, MA: Kluwer Academic.
Kroes, E. P., Daly, A. J., Gunn, H. F., & van der Hoorn, A. I. J. M. (1996). The opening of the Amsterdam ring road: A case study on short-term effects of removing a bottleneck. Transportation, 23(1), 71–82.
Muñoz, J. C., Ortúzar, J. de D., & Gschwender, A. (2009). Transantiago: The fall and rise of a radical public transport intervention. In: W. Saleh & G. Sammer (Eds), Travel demand management and road user pricing: Success, failure and feasibility (pp. 151–172). Farnham, UK: Ashgate.
Murakami, E., & Watterson, W. T. (1990). Developing a household travel survey for the Puget Sound region. Transportation Research Record, 1285, 40–48.
Ortúzar, J. de D., & Willumsen, L. G. (2001). Modelling transport (3rd ed.). Chichester, UK: Wiley.
Parody, T. (1977). Analysis of predictive qualities of disaggregate modal-choice models. Transportation Research Record, 637, 51–57.
Sillano, M., & Ortúzar, J. de D. (2005). Willingness-to-pay estimation with mixed logit models: Some new evidence. Environment and Planning A, 37(3), 525–550.
Stopher, P. R., & Greaves, S. P. (2004). Sample size requirements for measuring a change in behaviour. 27th Australasian Transport Research Forum, Adelaide, Australia.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Yáñez, M. F., Mansilla, P., & Ortúzar, J. de D. (2009). The Santiago Panel: Measuring the effects of implementing Transantiago. Transportation, DOI: 10.1007/s11116-009-9223-y (online first).
Chapter 21
What Do We Really Know About Travellers’ Response to Unreliability? Yaron Hollander
Abstract

We review what is known and what is still unknown about the process of revealing the impact of unreliability on travel choices. We do this from the perspective of a demand-modelling practitioner who wishes to allow for the benefits from improved reliability in the assessment of a transport scheme. We discuss the travel responses affected by unreliability, the requirements from the data used to model these responses, the explanatory variables used in these models and the additional information required as input when applying them. One of our findings is that there is a conflict between existing studies in their conclusions about the aversion to early arrival. Another is that it is unclear whether the common simplified treatment of the distribution of preferred arrival times is acceptable. We also suggest that the dominance of departure time shifting as a primary response to unreliability might refute the common assumptions about travellers' choice hierarchy, which were established without considering the impact of unreliability; this raises questions about the robustness of assignment models that do not allow time shifting.
21.1. Introduction

Much attention has recently been paid to the fact that travellers respond to the unreliability of travel times, and to the existence of economic benefits or costs associated with this response. In the United Kingdom this has been acknowledged not only in the academic literature, but also in a series of formal policy documents,
including the Eddington Transport Study; Towards a Sustainable Transport System; some early work on the NATA Refresh; and some new guidance from the Department for Transport on the consideration of reliability in modelling and forecasting. Nevertheless, the study of how unreliability affects travellers' choices is still a relatively new area of transport research, and an even newer area for choice-modelling practitioners. This paper does not report the findings of new research, but reviews what is currently known about the process of revealing the impact of unreliability on travel choices, as well as what is still unknown. The requirements from travel demand models that account for the effects of unreliability are becoming more stringent. So long as the cost associated with unreliability was not recognised by transport economists and decision makers, any modelling work based on a sound methodology that revealed the behavioural effects of unreliability made a valuable contribution. Researchers in the past three decades have made a convincing case for the importance of unreliability, which led to the recognition that this should play a role in scheme assessment, as is evident in the policy documents mentioned above. Now that it is agreed that reliability matters, it is becoming increasingly important to agree how to deliver robust forecasts. Our perspective here is that of a demand-modelling practitioner who wishes to allow for the benefits from improved reliability in the assessment or the appraisal of a transport scheme. While until now it has not mattered much whether the importance of reliability was illustrated using standard or non-standard approaches, the chances of successful implementation of these new ideas by practitioners are higher if they are incorporated in standard modelling tools. Each of the four sections of this paper discusses a question that relates to the wish to consider the response to unreliability without abandoning the concepts that demand modellers are familiar with. The question discussed in the first section is which of the responses covered by standard demand models are affected by unreliability. The following section discusses the requirements from the data used to estimate these models. In the subsequent section we look into the choice of explanatory variables for models that consider the impact of unreliability. The section that follows discusses the additional information required as input for assessing the effects of unreliability in future scenarios. The last section summarises our conclusions about what is known and what is still to be explored in the context of the analysis of travellers' response to unreliability. As in most recent studies that focus on the unreliability of transport systems, the term unreliability here refers to unpredictable variability of travel times from day to day. Journey time variation that has known reasons but cannot be predicted by a traveller (e.g. variation resulting from a train cancellation or a road accident) is considered part of this unreliability; but variation that a rational traveller can anticipate (e.g. the difference between day periods) is not. Such a definition is chosen because the general understanding is that this uncertainty is what triggers a change of behaviour which is not accounted for otherwise.
It is important to highlight here the difference between the uncertainty about the travel time on a particular day, which is the main result of unreliability, and the term uncertainty as commonly used in the literature. The term uncertainty describes the idea that travellers do not know what travel conditions they are about to experience,
not necessarily because of the stochastic nature of traffic phenomena, but more generally because this information is not readily available for whatever reason. There is a large body of research about the effect of uncertainty, the willingness of individuals to take risks that result from uncertainty and the consequences of the assumption of full information that analysts often make. This is different from the effect of unreliability, because travel time variability is a feature of the transport system and not a result of a lack of information. Unreliability would exist even in a system with full information; and uncertainty exists not only about the level of unreliability. Indeed, it is somewhat simplistic to discuss the impact of unreliability here without considering the fact that in reality there are both unreliability and uncertainty. This is further complicated by the fact that some of the reasons for unreliability are also sources of uncertainty, and by the fact that not all researchers base their analysis on identical definitions. But a review of both strands of research would be too ambitious for the current scope and we leave it for future work. Nevertheless, we return to the definition of unreliability later in this paper.
21.2. Which Travel Responses Are Affected by Unreliability?

While practitioners in the transport industry have not yet incorporated the effects of unreliability into their forecasting tools, researchers in the past three decades have developed a range of models that reveal these effects. Variables representing the level of reliability have been used in several types of models, describing different travel responses, as we explain in the following paragraphs. Prashker (1979) was among the very first to point out the significance of travel time variability in explaining mode choice. The effect on mode choice was later also studied by Hendrickson and Plank (1984). Exploring the chance of switching between modes of transport due to the level of unreliability has not been a primary area of research, but work on this topic is ongoing and includes recent studies such as Bhat and Sardesai (2005). There has been more work on the effects of unreliability on route choice. Jackson and Jucker (1982), Abdel-Aty, Kitamura, and Jovanis (1995), Liu, Recker, and Chen (2004), Brownstone and Small (2005), Small, Winston, and Yan (2005), Bogers, Viti, Hoogendoorn, and van Zuylen (2005) and Tilahun and Levinson (2008) developed models for route choice that account for this effect. Lam (2000) proposed a model for the combined choice of route and departure time, and Lam and Small (2001) created models for different combinations of route, time and mode choice. It should be noted that in most cases, the route choice problem is a simplified one with a choice between a small number of routes. There has also been much work on the impact of unreliability on departure time choice. Note that this concerns the decision when to start the journey within a preset time window, as a response to the uncertainty about the exact travel time; namely, it is a micro time choice problem and not the choice of a travel period. The importance of reliability considerations in this choice was studied by Gaver (1968), Knight (1974),
Hall (1983), Hendrickson and Plank (1984), Pells (1987a, 1987b), Polak (1987a, 1987b), Mahmassani and Stephan (1988), Noland and Small (1995), Noland, Small, Koskenoja, and Chu (1998), Hollander (2006) and others. As mentioned in the introduction, it appears that research has been successful in revealing how unreliability influences a range of travel choices. At first glance it might seem that the insights into the impact of unreliability can now be employed to include reliability components in mode, route and time choice models built for forecasting and assessment purposes. However, it is important to pay attention to two related issues that have not been given sufficient attention. The first issue has to do with the overall hierarchy of a full demand model that includes the response to reliability. Traditional ‘four-stage’ models have usually included route choice at the lowest (i.e. most sensitive) level and mode choice at the level above it. More recently, with the increasing effect of congestion and peak spreading, it has become quite common to include a micro time choice element below mode choice and above route choice, as an addition to the original four stages. This is acknowledged, for example, in the formal guidance for transport assessment by the UK Department for Transport (WebTAG, unit 3.10.3). Another recent trend has been the shift of the mode choice to a higher position in the overall hierarchy, above model components such as macro time choice (which is in itself a new addition to the traditional structure) and trip distribution. This is a result of some evidence implying that the majority of travellers are more likely to change their choice of trip destination than to change their preferred mode. The structure of the overall model hierarchy is an issue in the current context because the hierarchy commonly adopted by practitioners is different from the hierarchy suggested by recent academic work on the effects of unreliability. Some of the studies mentioned earlier present evidence that when travel options differ in the level of reliability, micro time choice would be the first choice to change. Noland and Small (1995) and Bates, Polak, Jones, and Cook (2001) are quite explicit in discussing this, and it is also supported (though implicitly) by evidence provided by other authors about the relative flexibility of micro time choice compared to other potential responses. Namely, in studies that focus on the effect of unreliability, a hierarchy where route choice is above micro time choice is implied. In principle, it is perfectly feasible that a choice which is not at the bottom of the model hierarchy is the one that is most influenced by a certain factor. But it should be remembered that when the most common hierarchy was established, with route choice at its bottom, the influence of unreliability was not considered. Whether the justification for this common hierarchy can be re-established with reliability attributes in the model is a question that has so far not been thoroughly examined. Noland and Polak (2002) commented that route choice models need to consider scheduling effects if travel time variability is to be taken into account. Srinivasan and Guo (2003) and Hollander, Mackie, and Liu (2007) proposed methodologies that combine a micro time choice feedback loop within a broader modelling framework; but these are illustrations of principles which are not commonly applied. Changes in micro time choice are within the same-day period, and it is therefore possible
that their dependency on the levels of reliability does not significantly change the full-period (or full-hour) demand matrices which are fed into a route choice model. But this simply results from the fact that the route choice models we commonly use are of a static nature, and do not consider variations in the level of demand within the modelled period. If we accounted for both the dynamic nature of demand and the effects of unreliability, we might reveal a pattern of route choices which is very different to what is estimated otherwise. Put another way, the common notion that static and dynamic assignment outputs are sufficiently similar to each other, for the purposes of most applications, was probably tested without considering the effects of unreliability. The discussion above does not aim to suggest that dynamic assignment models are essential when we wish the demand model to consider the effects of unreliability. Rather, it suggests that within the framework of a full demand model, the existing evidence for the high sensitivity of time choice to unreliability has implications which have not yet been fully investigated. If a full travel demand model is built in an area where local data confirm that micro time choice is the most sensitive response, then there is a genuine problem with using the common traffic assignment packages. These packages generally do not have a feature that calculates the generalised cost of each route as a composite cost across several possible departure times (a sketch of such a composite cost is given at the end of this section); thus they are only suitable for the case where route choice is at the bottom of the model hierarchy. The other issue that deserves attention, before embarking on the estimation of mode, route and time choice models with reliability considerations, is the risk that information on general preferences is wrongly interpreted as insight into a specific type of response. Such a risk exists when a model is estimated from a stated preference (SP) survey. We discuss SP surveys in greater detail in the next section, but we mention here this particular problem, which in some cases might imply that the data is misused. The difficulties (discussed later) in presenting the level of reliability in an SP questionnaire in a concise way often lead us to refrain from displaying any detail which is not seen as vital. Most SP questionnaires used recently to learn about the attitudes to unreliability present two travel alternatives in each choice task, and the presentation of these alternatives is dominated by an attempt to explain the extent of travel time variability in a clear way. Whether the choice is between two routes, two modes or two departure times is not always stressed. For example, De Jong, Tseng, Kouwenhoven, Verhoef, and Bates (2007) present SP tasks which could equally apply to different types of models; Batley, Ibanez, Wardman, Shires, and Whelan (2007) deal with a route choice problem but their SP task is dominated by scheduling considerations and does not describe other features of the route. Indeed, all surveys include an introductory section which explains this, but given the limited ability of SP survey respondents to process large amounts of information, this may not be sufficient. In cases where such surveys are aimed at establishing a hypothesis that one type of response is more sensitive to reliability attributes than another, conclusions would be more robust if it can be shown that this sensitivity is not an artefact of the survey format.
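Returning to the composite-cost feature mentioned above: one common way to express the generalised cost of a route across several possible departure times is as a logsum over the departure-time alternatives. A minimal sketch follows, with the scale parameter theta an assumption for illustration:

```python
import math

def composite_route_cost(gen_costs, theta=0.5):
    """Composite generalised cost of a route across candidate departure
    times, using the standard logsum (expected minimum cost) form:
    C = -(1/theta) * ln(sum_t exp(-theta * c_t))."""
    return -math.log(sum(math.exp(-theta * c) for c in gen_costs)) / theta

# Example: one route evaluated at three candidate departure times.
print(composite_route_cost([30.0, 26.0, 28.0]))   # about 25.2
```

The composite cost lies below the cost of the best single departure time, which is exactly the benefit of being able to shift departure time that a purely static assignment, with route choice at the bottom of the hierarchy, ignores.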
21.3. What Inputs Are Appropriate for Studying the Attitudes to Unreliability?

It is widely agreed among transport analysts that revealed preference (RP) has an inherent advantage over SP as a source of data for modelling, since it is based on observed behaviour. Still, it is also agreed that relying on SP data is essential in many situations. SP allows us to analyse the effect of individual factors, which are difficult to isolate in RP data, and it is also often used when there is interest in the response to scenarios that cannot be observed. Various authors (e.g. Bates et al., 2001; Batley et al., 2007) have agreed that it is extremely challenging to analyse the attitudes to unreliability from RP data, as there are hardly any situations in which a choice truly exists between alternatives that differ from each other in the level of reliability. Even where such a choice exists, the difficulty in separating the effect of different potential explanatory variables remains. As Bates et al. (2001) point out, this leaves SP as the 'only practical option'. One type of setting in which several authors did study the impact of unreliability from RP data is where a toll road operates in parallel to a toll-free road, serving similar journeys but offering lower travel times and higher reliability. The typical RP challenge of separating the effect of reliability from the effect of reduced mean travel times still exists in these situations, but several studies (Lam, 2000; Lam & Small, 2001; Small et al., 2005; Brownstone & Small, 2005) did estimate route choice models with reliability attributes for these cases. The route choice problem in all cases is a simplified one, with the tolled and toll-free routes as the only two alternatives. It appears that the statement about the impracticality of RP applies to any route choice problem more complex than this. Bhat and Sardesai (2005) combine RP and SP data to estimate their mode choice model with reliability elements. We are not aware of other attempts to use RP data to model the effect of unreliability on public transport demand. A combination of RP and SP is also used in the estimation of the route choice models mentioned earlier by Brownstone and Small (2005) and Small et al. (2005). SP remains, therefore, the dominant approach for collecting data on the attitudes to unreliability. This makes the investigation of these attitudes particularly sensitive to the classical weaknesses of SP studies. The choices made by SP respondents are artificial by definition; the notion that the presented choice task replicates the way a real choice situation is perceived is merely an assumption. Evidence from some qualitative studies (e.g. the focus group study described in Steer Davies Gleave, 2008) does confirm that the attributes and terminology presented in SP tasks are often consistent with the perceived ones. But there is less evidence on whether travellers remember (or have access to) information on travel times on previous days or with alternative travel options. Only if this information is available can the real choice task be reproduced by the SP task. As mentioned in the introduction, this relates to a broader issue that exceeds the scope of this review.
effective. A key issue is that some of the attributes which we hope to include later in our choice model are not explicitly presented in the SP choice task. The attributes that capture the level of unreliability may include the standard deviation of travel times, the amount of earliness or lateness at the destination, the amount of earliness or lateness of a public transport service at the origin, and others (we discuss the choice of attributes in more detail later). These attributes are analytical statistics which are not well understood by many travellers, even if they are later used to explain their behaviour. Instead of showing the attributes of interest, we thus present in each SP task alternative information on the extent of travel time variability that exhibits these statistics implicitly. In other words, one set of attributes is used to design the SP questionnaire and another set of attributes is actually presented. Each set of levels of the design attributes is converted after the design stage to an equivalent set of levels of the presentation attributes. This raises three problems.

First, there is the problem of choosing the presentation attributes so that the idea that travel times are unpredictable is well understood. Early SP experiments (Jackson & Jucker, 1982; Pells, 1987a, 1987b; Bates, Dix, & May, 1987; Atkins Consultants Ltd., 1997) described the level of unreliability by noting the usual travel time and the extent and frequency of delay. A typical question in surveys of this type would ask respondents to choose between a journey that always takes 50 min and a journey that normally takes 40 min but has a 20-min delay once a week. In recent years, most experiments (Black & Towriss, 1993; Senna, 1994a, 1994b; Noland et al., 1998; Small, Noland, Chu, & Lewis, 1999; Cook et al., 1999; Bates et al., 2001; Hollander, 2006; De Jong et al., 2007; Batley et al., 2007) used a more explicit formulation, in which the level of travel time variability in each of the presented alternatives is expressed as a ‘typical’ sequence of travel times.

While the idea of presenting a sequence of travel times is now widely accepted, there is still some debate about the details of this sequence and its graphical presentation. Bates et al. (2001) used a ‘clock-face’ presentation to draw the respondent’s attention to the fact that travel times do not appear in any particular order. This was found useful, but it should be noted that the graphical display does not illustrate the actual extent of unreliability. Hollander (2006) introduced a ‘vertical bars’ presentation which illustrated the level of variation in travel times and was shown to be well understood by respondents who do not read all the textual information, as well as by respondents who are not proficient at reading graphs. A very similar approach was adopted later by Batley et al. (2007), who replaced the explicit display of departure and arrival times (which was more suitable for high-frequency bus travel) with a statement of the number of minutes of deviation from schedule (which is more suited to rail travel). De Jong et al. (2007) compared the performance of several approaches to presenting the idea of unreliability, one of which is a simplified version of the vertical bars approach. They found that presenting five typical travel times with no graphical presentation is the preferred approach; unfortunately, the original form of the vertical bars approach was not tested.
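The gap between design attributes and presentation attributes can be made concrete with a small sketch. The function below is a minimal illustration; the attribute set and the 10-min lateness threshold are assumptions made for the example, not taken from any of the cited studies. It recovers the design attributes implicit in a set of five presented travel times:

```python
import statistics

def implied_design_attributes(times, preferred_arrival, late_threshold=10):
    """Derive the design attributes implicit in five presented travel times.

    times: the travel times (minutes) shown as 'five typical days'
    preferred_arrival: desired arrival time, in minutes after departure
    """
    lateness = [max(0, t - preferred_arrival) for t in times]
    earliness = [max(0, preferred_arrival - t) for t in times]
    return {
        "mean_time": statistics.mean(times),
        "std_dev": statistics.stdev(times),
        "mean_lateness": statistics.mean(lateness),
        "mean_earliness": statistics.mean(earliness),
        "prob_very_late": sum(l > late_threshold for l in lateness) / len(times),
    }

# A journey 'normally' taking about 40 minutes, with one long delay in five days:
print(implied_design_attributes([38, 40, 40, 42, 60], preferred_arrival=45))
```

The design problem discussed below is the inverse of this calculation: finding presented times that reproduce predetermined levels of these statistics.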
In the qualitative focus group study that preceded the full SP study by Steer Davies Gleave (2008), participants were asked how clear they found the different ways of presenting
the distribution of travel times. It was found that more than half of the participants preferred the vertical bars approach as in Hollander (2006), while the remaining participants preferred a display of five times with no graphics. It was concluded that a vertical bars presentation which also clearly states the five times can accommodate the preferences of both types of respondents.

Most studies assumed that the effect of variability in travel times has to do with the deviation from the desired arrival time at the destination. Batley et al. (2007) introduced the hypothesis that fluctuations around train departure times from the origin station carry an additional disutility. Accordingly, when presenting vertical bars that represent travel times on five different days, they introduced additional variation in the position of the ‘departure’ ends of the bars, whereas previous studies only varied the ‘arrival’ end. A somewhat unclear aspect of this presentation approach is that in the real transport system, a delayed departure from the origin causes some anxiety to travellers because it is likely to lead to delayed arrival at the destination; but Batley et al. (2007) present the fluctuations in departure and arrival simultaneously, thus removing the uncertainty about the effect of delayed departure on arrival. To investigate the difference between the attitudes to unreliability at departure and at arrival, Steer Davies Gleave (2008) used two separate questionnaires, presenting only one of them (chosen randomly) to each respondent. One questionnaire deals with the effect of fluctuation of the departure time and the other examines the response to fluctuation of the arrival time. Although these are clearly dependent on each other, the choice task draws the respondent’s attention to the time at one end of the journey more than the other, by making its presentation more explicit. See the appendix for an example. We return to the issue of investigating unreliability at the origin in the next section.

The second problem related to the SP attributes is that the experimental design process needs to be carried out before we know which variables we will eventually include in the choice model. There is a long list of variables that are sometimes found significant in capturing the aversion to unreliability; different studies identify different variables as those that perform best. Choosing a small subset of these (e.g. the mean travel time, travel time variance and mean lateness) as the key attributes for the statistical procedure of experimental design might result in a failure to identify effects that do exist but are not part of this subset. Expanding the subset to include all the attributes that could potentially contribute to the model is impractical, since there are too many such attributes (e.g. each element expressed as a mean value or as a probability; ‘dummy’ attributes for different levels of extra lateness; etc.). The only practical approach we are aware of is based on using a relatively small list of attributes for the core design, but a much extended list when the efficiency of the design is simulated. Namely, out of a group of experimental designs that optimise efficiency using the core attributes, it is recommended to choose those that also perform well in identifying additional attributes.
This is only possible through an extensive set of simulation runs, in which imaginary choices are generated using synthetic sets of parameters for the broader set of attributes, and the ability of different designs to reproduce these synthetic parameters is examined.
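A minimal sketch of one such simulation run is given below, assuming a binary logit choice between two alternatives and a synthetic parameter vector that includes attributes beyond the core design; all attribute names, design points and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_parameter_recovery(X1, X2, beta_true, n_respondents=400):
    """Generate synthetic binary choices for a candidate design and re-estimate
    the broader parameter vector, as one diagnostic of design adequacy."""
    dX = np.tile(X1 - X2, (n_respondents, 1))        # utility differences, all tasks
    p1 = 1.0 / (1.0 + np.exp(-dX @ beta_true))       # logit probability of choosing alt 1
    y = (rng.random(p1.shape) < p1).astype(float)    # synthetic choices
    beta = np.zeros_like(beta_true, dtype=float)     # Newton-Raphson logit estimation
    for _ in range(25):
        p = 1.0 / (1.0 + np.exp(-dX @ beta))
        grad = dX.T @ (y - p)
        hess = -(dX * (p * (1 - p))[:, None]).T @ dX
        beta = beta - np.linalg.solve(hess, grad)
    return beta

# Candidate design: columns = mean travel time, mean lateness, P(>10 min late)
X1 = np.array([[40.0, 3.0, 0.2], [45.0, 1.0, 0.1], [50.0, 0.5, 0.0]])
X2 = np.array([[42.0, 1.5, 0.1], [40.0, 4.0, 0.3], [44.0, 2.0, 0.2]])
beta_true = np.array([-0.05, -0.10, -1.5])  # synthetic 'true' parameters
print(simulate_parameter_recovery(X1, X2, beta_true))  # compare with beta_true
```

Running this over many synthetic parameter vectors, and comparing the recovered estimates (and their standard errors) to the true values, indicates how well a candidate design identifies the broader attribute set.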
The third problem caused by the difference between the SP design attributes and the presentation attributes is that the conversion of the design attribute levels to levels of the presentation attributes is not trivial. If the choice tasks present the pattern of unreliability as a series of five travel times, as was done in most recent studies, the challenge is to create for each choice task five numbers that meet a predetermined set of constraints. The main difficulties that arise when generating these numbers are the following:

Too many constraints. Some models, for example, explained the disutility of travel using a set of attributes that included the mean travel time, mean lateness, mean earliness and the probability of arriving more than 10 min late. It is not always feasible to find a set of positive travel times that complies with given levels of all of these.

Too many degrees of freedom. In contrast, in some cases (e.g. when the only design attributes are the mean and variance of the travel time) the constraints are not sufficient to identify a set of suitable travel times. This can easily be solved either by fixing some of the times at a realistic level or by allowing some randomness in the process; but it should be borne in mind that the response to the SP choice tasks is not indifferent to this interference. The arbitrary rule we use to remove unnecessary degrees of freedom might act as an unwanted attribute and thus obscure the effect of the real attributes of interest.

Illogical range. The levels of the design attributes need to span a large enough range to ensure that a credible model can be estimated from the data. But when these are converted to the presentation attributes, their range of values may seem too wide. Since we usually prefer to limit the number of presented times to five or so per alternative, the times at the high end of the range are often irrationally high. A similar problem occurs when the times at the lower end of the range are too close to zero and do not seem realistic. We often need to add constraints that fix the lowest and highest possible individual travel times.

It should be noted that these typical problems do not normally prevent us from generating good experimental designs; but it is important to give them sufficient attention so that the logic of the choice tasks is not compromised. In some cases, the combination of design attribute levels which we find optimal cannot be converted into a rational combination of presentation attributes, and we need to choose a suboptimal design. Since this is the case, it is not uncommon to start the design by generating several rational candidate sets of the presentation attributes, converting them into the design attributes and then choosing the set that is found the most statistically efficient.

All in all, the almost-complete reliance of reliability analysis on SP-based evidence, coupled with the technical challenges described in this section, implies that even more caution should be taken during this process than is usual for SP studies. The usual warnings for SP practitioners, regarding robust design and simulation, questionnaire development, sample size, careful recruiting and segmentation, can only be highlighted here again.
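As an illustration of the conversion step, the sketch below searches for five presented travel times that match a target mean and standard deviation while respecting fixed bounds on any single time. The rejection-sampling strategy, the rounding to whole minutes and all numerical values are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def presented_times(target_mean, target_sd, t_min, t_max, n=5, draws=20000):
    """Search for n presented travel times whose mean and SD match the design
    levels while staying within a credible range; None if over-constrained."""
    for _ in range(draws):
        base = rng.normal(size=n)
        base = (base - base.mean()) / base.std()          # zero-mean, unit-SD pattern
        times = np.round(target_mean + target_sd * base)  # whole minutes (targets shift slightly)
        if times.min() >= t_min and times.max() <= t_max:
            return np.sort(times)
    return None

print(presented_times(target_mean=45, target_sd=8, t_min=30, t_max=70))
```

When no feasible set is found, the design point is effectively over-constrained, which corresponds to the first difficulty listed above.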
21.4. Which Variables Can Be Used to Explain the Attitudes to Unreliability?

In the previous section we have already presented some of the main variables used in the utility (or generalised cost) functions of choice models that exhibit sensitivity to the level of reliability. The discussion of these variables would not be complete without mentioning that the choice of explanatory variables divides existing models into two groups: models based on the ‘scheduling’ approach and models based on the ‘mean-variance’ approach.

The scheduling approach is based on the concept that the entire impact of unreliability is related to travellers’ willingness to arrive on time. The variability of travel times results in early or late arrival at the destination, and travellers allow a ‘safety margin’ on their individual departure time in an attempt to minimise the overall disutility caused by travel time itself, earliness and lateness. This concept was introduced by Gaver (1968) and Knight (1974), with further contributions by Hall (1983), Pells (1987a, 1987b), Polak (1987a, 1987b) and Noland and Small (1995). Note that apart from departure time choice models, route choice and mode choice models can also be based on the scheduling approach; a route choice model based on this approach will have the extent of early and late arrival as attributes of each alternative route, rather than attributes of each alternative departure time. Some of the authors who developed scheduling models found that the effect of late arrival justified some degree of double counting, normally by including the mean lateness as one attribute and adding other attributes such as the probability of late arrival, the probability of extra-late arrival (e.g. more than 10 min late), quadratic functions of any of these, and so on. However, since the exact list of attributes varies between case studies, it is necessary when estimating new models to establish which of these, if any, are significant in each case (and also whether the design was suitable to identify them, as explained earlier). In scheduling models with a linear-in-parameters utility function, the willingness to pay for an improved level of reliability is derived as the ratio of the parameter of any of these attributes to the cost parameter, similar to the conventional calculation of the value of time.

The alternative, ‘mean-variance’ approach asserts that modelling only scheduling decisions does not capture the full extent to which travellers dislike unreliability. Mean-variance models (Jackson & Jucker, 1982; Black & Towriss, 1993; Senna, 1994a, 1994b; Atkins Consultants Ltd., 1997) sometimes include some partial penalisation of late arrival, similar to scheduling models, but they primarily show explicit aversion to unreliability per se by including the variance (or standard deviation) of travel times as an element in the utility function. Note that, like scheduling models, mean-variance models can in principle describe any choice.

Noland et al. (1998), Small et al. (1999) and Hollander (2006) used their data to estimate both scheduling models and mean-variance models, for car and bus users. They found that the explicit attributes representing the variance of travel times were statistically significant when scheduling variables were not included, but no longer contributed to the explanatory power of their models once the scheduling variables were added. Small et al. (1999) concluded, ‘… in models with a fully
specified set of scheduling costs, it is unnecessary to add an additional cost for unreliability’. Other authors did not present both mean-variance and scheduling models but reported that scheduling models demonstrated higher statistical significance. The model estimated for rail users by Bates et al. (2001) is the only published work we are aware of that contradicted this finding; this study found that unreliability has a cost per se, over and above the scheduling cost. In general there is no particular reason to expect that one of the two approaches would always be found more suitable; it is standard practice in demand modelling that the contribution of each variable varies between locations and between segments of the population. Still, it should be highlighted that the mean-variance approach is used in various studies even though most attempts to compare it to the scheduling approach have found the latter superior. Since this is the case, it is important to understand the justification provided by different researchers for their choice of the mean-variance approach. We found three different reasons for developing mean-variance models to capture the impact of unreliability.

First, as mentioned earlier, Bates et al. (2001) find evidence in their data for the significance of the travel time variability attribute, which they ascribe to the anxiety or stress caused to travellers by the uncertainty of travel conditions. This is the only study we are aware of that finds such evidence.

Second, several authors reported that the sum of the earliness and lateness components in the utility function of a scheduling model can be empirically approximated by a single component expressing the standard deviation of travel times; or, more generally, that mean-variance models and scheduling models may be theoretically equivalent (most recently, Fosgerau and Karlström, 2007). Although this is only valid under certain conditions, it suggests that the costs captured in mean-variance formulations may in fact be implicit estimates of scheduling considerations.

Third, some studies acknowledge that a scheduling approach may perform better but choose a mean-variance formulation from the outset in order to avoid the higher analytical effort required to apply a scheduling model. The implementation of a mean-variance model is fairly straightforward and only requires aggregate estimates of the mean and variance of travel times, while applying a scheduling model requires information on the distribution of preferred arrival times and simulation of the mean lateness and earliness at a disaggregate level. Batley et al. (2007), for example, refrain from including scheduling attributes in their model for rail travellers since they rely on the type of supply data available in the British rail industry. Attributes representing the extent of late arrival are included in their model, but these are based on train delays with respect to timetables rather than individual preferred arrival times, and are therefore not consistent with the scheduling approach.

The difference between these three arguments is subtle but important. Bates et al. (2001) proposed a mean-variance model since this formulation had the best fit with survey data. In contrast, authors who developed mean-variance models without testing the potential contribution of scheduling attributes may have created tools that work empirically but do not have the same power to reproduce the way travellers behave.
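For concreteness, the two families of specifications can be written in canonical linear form. This is an illustrative rendering of the general structure, loosely following the cited studies rather than reproducing any one of their exact specifications:

Scheduling: $U = \alpha T + \beta\,\mathrm{E}[\mathrm{SDE}] + \gamma\,\mathrm{E}[\mathrm{SDL}] + \theta P_L + \lambda C$

Mean-variance: $U = \alpha\,\mathrm{E}[T] + \eta\,\sigma_T + \lambda C$

where $T$ is the travel time, $\mathrm{SDE}$ and $\mathrm{SDL}$ are schedule delay early and late, $P_L$ is the probability of late arrival, $\sigma_T$ is the standard deviation of travel time and $C$ is the monetary cost. The willingness to pay for reliability is then a parameter ratio, e.g. $\eta/\lambda$ in the mean-variance form, mirroring the conventional value-of-time calculation.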
All the scheduling models reviewed above show a strong preference for avoiding late arrival, using a range of alternative variables. In the academic literature, all scheduling models also place a penalty on early arrival. Practitioners sometimes wrongly state that in a time choice model one can expect to see a preference to increase the extent of early arrival rather than reduce it, or to find that travellers are simply indifferent to earliness. This is made formal by the modelling guidance from the UK Department for Transport (WebTAG, unit 3.5.7), which advises modellers not to consider early arrival as a source of additional cost. In the context of time choice modelling it must be stressed that this leads to erroneous results: a time choice model in which travellers have no aversion to early arrival will allocate all trips to the earliest possible departure time (e.g. at 4 AM), so that even the slightest risk of late arrival is avoided; the sketch after this passage demonstrates this numerically. Admittedly, the penalty on each minute of early arrival is consistently found to be lower than the respective cost of a minute of lateness; but a positive unit cost for both earliness and lateness is critical if the mechanism of choosing the departure time is to remain logical.

In the previous section, we mentioned two recent studies that pay attention to the unpredictability of the time when a public transport service leaves the origin, in addition to the fluctuation of the arrival time at the destination. Whether this is a significant part of the generalised cost of travel is a topic that deserves further exploration. Batley et al. (2007) presented, in some of the SP situations, possible delays at the origin simultaneously with delays at the destination, so that respondents to their survey did not have to see unreliability at the origin as a source of uncertainty about their expected arrival time. One would expect that in such a case the cost of unreliability at the origin should be minor if not negligible, but the actual result was that this cost was significant. This is an outcome we do not understand; it is hoped that future research can clarify whether variability is disliked so strongly even without uncertainty about the travel time, or whether respondents simply did not interpret the question correctly.

Steer Davies Gleave (2008) dealt with travel time variability at the origin and at the destination in two separate SP experiments. One experiment presented only variability of arrival times at the destination, stating that departure is always on time. In the other experiment, the departure times at the origin exhibited variability and were stated explicitly, and travel time itself was assumed to be the same on each day, but the effect on arrival time was somewhat blurred by not stating it explicitly. Admittedly, since the different possible departure times and the journey time were known, respondents could use them to calculate the arrival time at the destination; but it was assumed that many of the participants would only respond to information that does not require making such calculations. Namely, it was felt that presenting variability both at the origin and at the destination equally clearly in the same choice situation was too obvious, and that somewhat blurring one of them would give the respondents an opportunity to express aversion to unreliability at the origin if such aversion existed.
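A minimal numerical sketch of the corner solution follows; all parameter values are illustrative assumptions, and the lognormal travel time distribution is synthetic:

```python
import numpy as np

rng = np.random.default_rng(7)

def best_departure(beta_early, beta_late=0.4, beta_time=0.1, pat=540):
    """Expected scheduling cost by departure time; PAT = 9:00 (minute 540)."""
    departures = np.arange(240, 541, 15)       # 4:00 to 9:00, every 15 minutes
    travel = rng.lognormal(mean=np.log(40), sigma=0.25, size=5000)  # variable times
    costs = []
    for d in departures:
        arrival = d + travel
        cost = (beta_time * travel
                + beta_early * np.maximum(0, pat - arrival)
                + beta_late * np.maximum(0, arrival - pat))
        costs.append(cost.mean())
    return int(departures[int(np.argmin(costs))])

print(best_departure(beta_early=0.05))  # a sensible safety margin: departs around 8:00
print(best_departure(beta_early=0.0))   # corner solution: departs at 4:00 (minute 240)
```

With any positive earliness penalty the optimum is a sensible safety margin before the preferred arrival time; setting the penalty to zero pushes every trip to the start of the feasible range.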
The effect of unreliability at both ends of the journey was tested by using the responses from both experiments jointly to estimate a single choice model, in which the potential significance of four scheduling attributes was examined: mean earliness
at the origin, mean lateness at the origin, mean earliness at the destination and mean lateness at the destination. The result was a model in which only two of the four came out significant: the amount of earliness at the origin and the amount of lateness at the destination (but not the lateness at the origin or the earliness at the destination). Our interpretation of this outcome is that the model captures three concepts jointly:

The ‘earliness at origin’ attribute captures the idea that the service must not arrive at the origin station or stop before its expected time, as the passenger will not have arrived there by that time.

The ‘earliness at origin’ attribute also captures the preference to avoid early arrival at the destination, because on average a shift of the departure by 1 min earlier causes a similar shift of the arrival. If earliness at the origin were not included, this would have to be represented separately; but when earliness at the origin is already penalised, adding another term for earliness at the destination would be double counting.

The ‘lateness at destination’ attribute captures the desire not to arrive after the preferred time.

Therefore, this set of only two attributes (together with attributes representing the mean travel time and fare) seemed, in this study, to capture rationally all the aversion to unreliability. Nevertheless, more research in this area would help establish this view, which is currently supported by a single study only.
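Rendered as a generalised cost function, this reading of the estimated specification (our notation, not the published form) is:

$$ GC = \beta_T\,\mathrm{E}[T] + \beta_F F + \beta_{EO}\,\mathrm{E}[E_{\mathrm{orig}}] + \beta_{LD}\,\mathrm{E}[L_{\mathrm{dest}}] $$

where $F$ is the fare, $E_{\mathrm{orig}}$ is the earliness of the service at the origin and $L_{\mathrm{dest}}$ is the lateness at the destination.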
21.5. What Inputs Are Needed to Estimate the Effect of Unreliability in Future Scenarios?

To develop demand models with reliability variables, it is not essential to have the ability to predict changes in these variables in hypothetical settings. This independence between the two capabilities is somewhat encouraged by the fact that estimation of these models is SP-based, because the response to different levels of unreliability can be investigated without having to seek information about actual levels. But the main purpose of including unreliability in a demand model is to subsequently use the model for assessment or appraisal of investment scenarios. There is little gain in including an explanatory variable in the demand model if we are unable to estimate future values of this variable. Therefore, there needs to be a match between the specification of the demand model and the capabilities of available supply models.

Since the 1970s, dozens of models have been proposed in the literature for estimating future levels of variability of travel times. A detailed review of these models is not presented here but is available from the author. Brief reviews of supply models of unreliability were also included in some of the articles mentioned
earlier, such as Noland and Polak (2002). Tools for estimating the level of unreliability can generally be grouped into four categories:

Travel time variability as a function of other attributes. This category includes formulae (estimated by regression or similar techniques) that express the standard deviation or variance of travel times as a function of variables representing the mean travel time, the level of congestion and other attributes. Models of this type are empirical in nature and do not attempt to establish theoretical foundations for the mathematical relationship they present. Calibration of these formulae is of a very local nature, and the transferability of an expression developed in one location to another is generally limited. Recently, Hyder Consulting (2007) developed for the UK Department for Transport a model of inter-urban highway unreliability; a relatively high level of transferability between locations is reported, although this has not yet been widely tested.

Fitting a probability density function to travel times. Numerous studies have attempted to determine empirically which common families of statistical distributions match observed patterns of unreliability. A wide range of findings is reported, although most studies find that asymmetrical distributions (such as the lognormal or exponential) perform better than symmetrical ones; see Hollander (2007) for a review of 25 such studies. Knowing the shape of the distribution of travel times helps in understanding the consequences of a given level of unreliability, but in the current context its value is limited, since it does not help identify which points along the probability density curve correspond to our different scenarios of interest.

Repeated simulation. Some practitioners exploit the stochastic nature of traffic microsimulation models by running these models several times and using the distribution of travel times from a series of runs as an estimate of the distribution that determines the level of unreliability. Doing so without calibrating the microsimulation model appropriately, to make it suitable for this task, is simply a mistake; there is no reason to assume that the level of randomness between model runs is similar to the level of randomness between days in the real transport system. Hollander and Liu (2008) provide guidance on how a microsimulation model can be modified so that estimating the distribution of travel times by repeated simulation is made more robust. But the presented principles are quite ambitious, and we doubt whether they have been used in practice for large-scale network models.

Other methods. Some other methods, employing more innovative mathematical approaches, have recently been proposed. Van Lint and van Zuylen (2005) developed an artificial neural network model to predict the distribution of travel times. The mathematical complexity of this model makes it potentially more sensitive to the particular characteristics of the different settings in which it can be used, but the estimated levels of unreliability have no theoretical foundation, and their transferability remains unclear. An equally innovative approach was proposed by Robinson and Polak (2007), based on a K-nearest-neighbour method, to characterise variability in travel times.
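For the second category, a minimal sketch of fitting candidate distributions to a sample of day-to-day travel times and comparing them by log-likelihood; the data here are synthetic and scipy's generic fitting routine is used for brevity:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
times = rng.lognormal(mean=np.log(42), sigma=0.2, size=200)  # synthetic day-to-day times

for name, dist in [("normal", stats.norm),
                   ("lognormal", stats.lognorm),
                   ("exponential", stats.expon)]:
    params = dist.fit(times)                     # maximum likelihood fit
    loglik = np.sum(dist.logpdf(times, *params))
    print(f"{name:12s} log-likelihood: {loglik:.1f}")
# The asymmetric distributions typically fit day-to-day travel times better.
```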
All in all, none of these models has gained wide acceptance or been recognised as a first step towards a common forecasting approach. This is a critical point, because it is currently the supply-modelling challenges, rather than the demand-modelling challenges, which form the main hurdle to overcome before unreliability can start playing an active role in transport demand assessment by practitioners.

Nevertheless, since it is the demand side that we focus on here, let us assume for now that in our area of interest an acceptable technique for estimating future levels of unreliability does exist. Let us also assume that the estimates of unreliability have been restructured so that the typical route-based or link-based format of the supply-model outputs has been converted into an origin–destination format, as required by most demand models, while carefully considering the non-additive nature of measures of variation. Even then, we still need to convert the information on the general level of unreliability into a set of individual scheduling attributes, if our demand model is based on the scheduling approach. The calculation of the extent of earliness and lateness (or similar scheduling attributes) of individual travellers might be rather complex, although various levels of simplification are possible. What level of simplification can be considered acceptable has not yet been sufficiently investigated.

This calculation combines the distribution of travel times from the supply model with a distribution of travellers’ preferred arrival times. Note that obtaining the distribution of preferred arrival times is not a supply-modelling issue but a matter of market research. It is derived from information about when people need to be at work, to what extent they have to be there exactly on time (an issue we elaborate on further below) and so on. As discussed earlier, models based on the scheduling approach have strong behavioural foundations, but much of this behavioural power is lost when the analyst does not have a good source of information about the distribution of preferred arrival times, and hence can only roughly estimate the mean earliness and lateness.

When estimating the amount of earliness and lateness, it is not uncommon to assume that a single distribution of travel times from the supply model applies similarly to different travellers even if their preferred arrival times differ. This is a simplification, because different preferred arrival times are associated with different levels of traffic congestion, and therefore also with different distributions of travel times. It is hoped that future research will help us understand whether this is a tolerable approximation. It should also be noted that travellers vary not only in the time when they need to be at their destination, but also in the level of flexibility in their arrival time, that is, the degree of aversion to late arrival. Using a model that penalises every deviation from a desired arrival time is likely to overestimate the cost of unreliability, unless the analyst believes that the penalties used already incorporate the indifference of some travellers to unreliability. The level of flexibility in arrival time generally varies by journey purpose: low for business and education travel, high for leisure activities. Nevertheless, commuting trips are likely to exhibit a considerable range.
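A minimal Monte Carlo sketch of this calculation, adopting the simplification just described (one travel time distribution for all preferred arrival times) and a crude safety-margin rule for the departure time; all distributions and values are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)

def mean_scheduling_attributes(n=100_000):
    """Combine a supply-side travel time distribution with a market-research
    distribution of preferred arrival times (PATs) into mean earliness/lateness."""
    travel = rng.lognormal(mean=np.log(35), sigma=0.3, size=n)  # supply model output
    pat = rng.normal(loc=540, scale=20, size=n)                 # PATs around 9:00
    departure = pat - np.quantile(travel, 0.8)   # crude common safety-margin rule
    arrival = departure + travel
    early = np.maximum(0.0, pat - arrival)
    late = np.maximum(0.0, arrival - pat)
    return early.mean(), late.mean()

print(mean_scheduling_attributes())
```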
Steer Davies Gleave (2008) report, based on a survey of 1700 travellers in Leeds, that 60% of trips to work had to arrive ‘promptly on time’, while the remaining 40% were allowed various levels of flexibility.

Since we discuss the way information from a supply model is fed into the demand model, it is important to recall that this is in fact a two-way feedback. If travellers in
the model are allowed to exercise their preferences by improving the balance between earliness and lateness, then the result is a new distribution of departure times, and the supply model outputs calculated previously (i.e. travel times and their variability) may no longer be valid. Therefore, if departure time choice is modelled explicitly, then ideally the calculation of the utility or generalised cost using inputs from the supply model should be done iteratively. Admittedly, the shift in departure time may be as small as a few minutes, and many supply models do not possess this level of temporal detail; matters are further complicated when modelling the demand for public transport, due to the discrete nature of departures and the effect of transfers, as discussed in detail by Bates et al. (2001). Practical approaches to combining these insights in standard modelling tools have not yet been developed, and how flawed the analysis is when they are dealt with in a simplified manner (e.g. not iteratively) is still unknown.

Another important question to raise, in the discussion about the connection between the demand and supply models, is whether our definition of unreliability is consistent throughout the analysis. This definition, which excludes types of travel time variation that a rational traveller can expect, is not an obvious one. The authors of the studies reviewed here generally agree that it is important to stick to this definition in order to separate the response to uncertainty from behavioural patterns which are accounted for elsewhere. Nevertheless, not all studies explicitly present their definition of unreliability, and we are not confident that they have all been sufficiently cautious about this.

The following is an example of where one can easily go wrong. In the supply analysis (i.e. the analysis undertaken to estimate the level of travel time variability, earliness or lateness) we define analysis periods, and we generally treat any journey made during one analysis period as representing that period. It is common to specify analysis periods of two hours (for example 07:00 AM to 09:00 AM) or more. However, many travellers know that travel time variability at 08:50 AM differs from the variability at 07:20 AM. Therefore, if the supply analysis uses travel time measurements from 07:20 AM and from 08:50 AM for the same estimate of the level of unreliability, it contradicts the idea of focusing on unpredictable variation. In other words, when the supply analysis builds a distribution of travel times to calculate the measures required for the demand model, two different travel times can be used as parts of the same distribution only if they are measured at time points that travellers see as one. This means that the periods of supply analysis need to be much shorter than what is common. In several studies, the author has used periods of 30 min or less, assuming that most travellers can identify typical differences in travel conditions between periods longer than this. However, the 30-min threshold was not determined through a robust investigation, and it is worth exploring this issue more thoroughly in future studies.
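The overestimation caused by overly long analysis periods is easy to demonstrate. In the synthetic sketch below, mean travel times drift predictably across the peak; pooling a full two-hour window counts that drift as 'unreliability', whereas 30-min sub-periods recover something close to the true day-to-day noise:

```python
import numpy as np

rng = np.random.default_rng(5)

minutes = np.arange(120)                            # the 07:00 to 09:00 window
systematic = 40 + 20 * minutes / 120                # predictable build-up over the peak
observed = systematic + rng.normal(0, 3, size=120)  # true unpredictable noise, sd = 3

print("2-hour SD: ", observed.std().round(2))       # inflated by the systematic drift
print("30-min SDs:", [observed[i:i+30].std().round(2) for i in range(0, 120, 30)])
```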
21.6. Conclusion

We have discussed here various issues that arise when attempting to reveal the effect of unreliability on travel demand. We have reviewed the contribution of many studies
and found that much is already known about the models we can use to capture the response to unreliability. This includes the following.

Researchers in the past three decades have made a convincing case for the importance of unreliability. At least in the United Kingdom, there seems to be no further need to give evidence that unreliability has impacts with an associated cost.

Unreliability may have effects on mode, route and departure time (i.e. micro time) choices. Many studies state that departure time choice is the most sensitive response to unreliability, with route choice just above it in the choice hierarchy, although it should be observed that the common understanding is that the hierarchy of demand responses does vary from one place to another.

The difficulties in isolating the effects of individual choice attributes from RP data make SP surveys the only type of evidence available in practice for estimating models with reliability attributes. In several specific cases where a toll road runs parallel to a toll-free road, there have been successful attempts to combine SP and RP data.

In SP surveys that focus on unreliability, most studies today prefer to illustrate the level of unreliability using a set of five typical travel times. The ‘vertical bars’ approach delivers the idea successfully, but some respondents prefer the five times to be presented as simple text. In any case, a detailed introduction that clarifies these concepts is essential.

SP surveys where the choice tasks do not state whether the alternatives represent departure times, routes or modes (as is quite common) may not be suitable for establishing that one type of response is more sensitive than another. Namely, when the survey is meant to collect data for building forecasting tools, it should be made very clear what is being chosen.

SP questionnaires in reliability studies are unique in that the design attributes differ from those actually presented. The conversion of the design to a choice task for presentation should be done with caution; the choice of the travel times we present to illustrate the design attributes is prone to problems such as insufficient constraining or overconstraining, and might result in an illogical range of travel times.

Scheduling considerations were found in most studies to be sufficient in explaining travellers’ aversion to unreliability, but the study of rail users by Bates et al. (2001) found that there is a cost to unreliability per se. Many other studies develop mean-variance models without testing scheduling attributes at all, because of the difficulty of applying scheduling models. There is no reason to expect that one of these two approaches is always superior to the other; but in cases where a mean-variance model is used without testing scheduling variables, it should be remembered that its behavioural foundations may be compromised.

In a scheduling model for time choice there must be a penalty on early arrival; otherwise the model will allocate all trips to the earliest possible departure time.

The analysis of supply data to provide inputs to the demand model must be based on sufficiently short periods; otherwise the variability of travel times is overestimated and includes some variation which is not random, that is, variation which travellers can predict.
There is also still much to be explored before the study of unreliability can be entirely based on off-the-shelf techniques. Our recommended list of topics for further investigation includes the following.

We know that departure time choice is found to be the most sensitive response when the focus of the study is unreliability, but we do not know whether it will often remain the most sensitive response in a broader context. If in some applications time choice remains at the bottom of the choice hierarchy, we need to develop new practical solutions for modelling route choice, since the common assignment packages today do not allow time shifting for a given route.

We are unsure to what extent travellers correctly perceive the information on travel times on the day of travel and on previous days, as well as information on alternative travel options, before they process all these to choose their preferred travel option. The models we reviewed, which are based on SP surveys, do not consider the possible difference between the perceived values and the actual ones.

Experimental design of SP questionnaires with reliability attributes needs to be done before we know which of these attributes we will eventually include in the model. We cannot use all potential attributes in the formal design, but it is important to at least test how well the design performs in identifying the contribution of potential attributes which are not part of the core design. It is worth investigating practical ways of undertaking such tests.

Further exploration is required concerning the aversion to unreliability at the origin. Batley et al. (2007) reached an unclear conclusion, namely that there is a high penalty on variability of the departure time even when this does not imply uncertainty about arrival times. Steer Davies Gleave (2008) reached a different conclusion, that the overall aversion to unreliability can be captured by penalising the earliness at the origin and the lateness at the destination, but this is still neither supported nor refuted by other studies.

A key obstacle to the applicability of reliability modelling is the unavailability of widely accepted supply models to estimate the inputs for the demand model. There is little gain in establishing the unit values of journey time reliability, earliness or lateness if we have no way of estimating how unreliable, early or late a journey will be in a hypothetical scenario.

When a suitable supply model is available, and the demand model follows the scheduling approach, it is still quite complex to estimate the level of earliness and lateness of individual travellers with different preferred arrival times. What level of simplification in this process can be considered acceptable has not yet been sufficiently investigated. There is also a need for practical approaches to account for the discrete nature of travel by public transport, as discussed by Bates et al. (2001).

Further research could help reveal the appropriate duration of the analysis periods. This duration should be specified so that travellers do not identify systematic travel time variability within a period (hence randomness can be assumed within the period), but do recognise systematic variability between periods.
References

Abdel-Aty, M. A., Kitamura, R., & Jovanis, P. P. (1995). Investigating effect of travel time variability on route choice using repeated-measurement stated preference data. Transportation Research Record, No. 1493, pp. 39–45.
Atkins Consultants Ltd. (1997). Bus reliability study – Stated preference research. UK.
Bates, J., Dix, M., & May, T. (1987). Travel time variability and its effect on time of day choice for the journey to work. Transportation planning methods, Proceedings of Seminar C, The PTRC Summer Annual Meeting, University of Bath, Vol. P290 (pp. 293–311).
Bates, J., Polak, J., Jones, P., & Cook, A. (2001). The valuation of reliability for personal travel. Transportation Research E, 37, 191–229.
Batley, R., Ibanez, J. N., Wardman, M., Shires, J., & Whelan, G. (2007). A discrete choice study to assess the impact of reliability on passenger rail demand. Paper presented at the European Transport Conference in Noordwijkerhout, The Netherlands.
Bhat, C. R., & Sardesai, R. (2005). On examining the impact of stop-making and travel time reliability on commute mode choice: An application to predict commuter rail transit mode for Austin, TX. Proceedings of the 84th TRB annual meeting, Washington, DC.
Black, I. G., & Towriss, J. G. (1993). Demand effects of travel time reliability. Centre for Logistics and Transportation, Cranfield Institute of Technology.
Bogers, E. A. I., Viti, F., Hoogendoorn, E. P., & van Zuylen, H. J. (2005). Valuation of different types of travel time reliability in route choice – A large scale laboratory experiment. Proceedings of the 85th TRB annual meeting, Washington, DC.
Brownstone, D., & Small, K. A. (2005). Valuing time and reliability: Assessing the evidence from road pricing demonstrations. Transportation Research A, 39, 279–293.
Cook, A. J., Jones, P., Bates, J. J., Polak, J., & Haigh, M. (1999). Improved methods of representing travel time reliability in SP experiments. Transportation planning methods, Proceedings of Seminar F held at the European Transport Conference, Homerton College, Cambridge, P434 (pp. 37–49).
De Jong, G., Tseng, Y., Kouwenhoven, M., Verhoef, E., & Bates, J. (2007). The value of travel time and travel time reliability. The Netherlands Ministry of Transport, Public Works and Water Management.
Fosgerau, M., & Karlström, A. (2007). The value of reliability. Munich Personal RePEc Archive, No. 5733.
Gaver, D. P., Jr. (1968). Headstart strategies for combating congestion. Transportation Science, 2(2), 172–181.
Hall, R. W. (1983). Travel outcome and performance: The effect of uncertainty and accessibility. Transportation Research B, 17(4), 275–290.
Hendrickson, C., & Plank, E. (1984). The flexibility of departure times for work trips. Transportation Research A, 18(1), 25–36.
Hollander, Y. (2006). Direct versus indirect models for the effects of unreliability. Transportation Research A, 40(9), 699–711.
Hollander, Y. (2007). The cost of bus travel time variability. Research report. Institute for Transport Studies, University of Leeds.
Hollander, Y., & Liu, R. (2008). Estimation of the distribution of travel times by repeated simulation. Transportation Research C, 16(2), 212–231.
Hollander, Y., Mackie, P. J., & Liu, R. (2007). Estimating traveller costs with a departure time choice loop. Paper presented at the European Transport Conference in Noordwijkerhout, The Netherlands.
Hyder Consulting (2007). Forecasting travel time variability in urban areas. London, UK: Department for Transport.
Jackson, W. B., & Jucker, J. V. (1982). An empirical study of travel time variability and travel choice behavior. Transportation Science, 16(4), 460–475.
Knight, T. E. (1974). An approach to the evaluation of changes in travel unreliability: A ‘‘Safety Margin’’ hypothesis. Transportation, 3, 393–408.
Lam, T. (2000). Route and scheduling choice under travel time uncertainty. Transportation Research Record, No. 1725, pp. 71–78.
Lam, T. C., & Small, K. A. (2001). The value of time and reliability: Measurement from a value pricing experiment. Transportation Research E, 37, 231–251.
Liu, H. X., Recker, W., & Chen, A. (2004). Uncovering the contribution of travel time reliability to dynamic route choice using real-time loop data. Proceedings of the 83rd TRB annual meeting, Washington, DC.
Mahmassani, H. S., & Stephan, D. G. (1988). Experimental investigation of route and departure time choice dynamics of urban commuters. Transportation Research Record, No. 1203, pp. 69–84.
Noland, R. B., & Polak, J. W. (2002). Travel time variability: A review of theoretical and empirical issues. Transport Reviews, 22(1), 39–54.
Noland, R. B., & Small, K. A. (1995). Travel time uncertainty, departure time choice, and the cost of morning commutes. Transportation Research Record, No. 1493, pp. 150–158.
Noland, R. B., Small, K. A., Koskenoja, P. M., & Chu, X. (1998). Simulating travel reliability. Regional Science and Urban Economics, 28, 535–564.
Pells, S. (1987a). The evaluation of reductions in the variability of travel times on the journey to work. Transportation planning methods, Proceedings of Seminar C held at the PTRC Summer Annual Meeting, University of Bath, Vol. P290 (pp. 313–325).
Pells, S. (1987b). The evaluation of reductions in travel time variability. Ph.D. thesis in Economics, University of Leeds, UK.
Polak, J. (1987a). Travel time variability and departure time choice: A utility theoretic approach. Discussion Paper No. 15. Transport Studies Group, Polytechnic of Central London, UK.
Polak, J. (1987b). A more general model of individual departure time choice. Transportation planning methods, Proceedings of Seminar C, The PTRC Summer Annual Meeting, University of Bath, Vol. P290 (pp. 247–258).
Prashker, J. N. (1979). Direct analysis of the perceived importance of attributes of reliability of travel modes in urban travel. Transportation, 8, 329–346.
Robinson, S., & Polak, J. W. (2007). Characterizing components of urban travel time variability using k-nearest-neighbor method. Proceedings of the 86th TRB annual meeting, Washington, DC.
Senna, L. A. D. S. (1994a). User response to travel time variability. Ph.D. thesis in Civil Engineering, University of Leeds, UK.
Senna, L. A. D. S. (1994b). The influence of travel time variability on the value of time. Transportation, 21, 203–228.
Small, K. A., Noland, R., Chu, X., & Lewis, D. (1999). Valuation of travel-time savings and predictability in congested conditions for highway user-cost estimation. NCHRP Report No. 431, Transportation Research Board, US.
Small, K. A., Winston, C., & Yan, J. (2005). Uncovering the distribution of motorists’ preferences for travel time and reliability. Econometrica, 73(4), 1367–1382.
Srinivasan, K. K., & Guo, Z. (2003). Day-to-day evolution of network flows under departure time dynamics in commuter decisions. Transportation Research Record, No. 1831, pp. 47–56.
Steer Davies Gleave. (2008). New generation transport in Leeds: A stated preference study. Metro, West Yorkshire Public Transport Executive.
Tilahun, N. Y., & Levinson, D. M. (2008). A moment of time: Reliability in route choice using stated preference. Proceedings of the 87th TRB annual meeting, Washington, DC.
Van Lint, J. W. C., & van Zuylen, H. J. (2005). Monitoring and predicting freeway travel time reliability. Proceedings of the 84th TRB annual meeting, Washington, DC.
WebTAG. Transport Analysis Guidance. Department for Transport, United Kingdom. Available at: http://www.dft.gov.uk/webtag/
Appendix 21.A.1. SP Questionnaires to Investigate the Attitudes to Unreliability

1. A ‘vertical bars’ SP task from Hollander (2006)
2. A ‘reliability at origin’ SP task from Steer Davies Gleave (2008)
3. A ‘reliability at destination’ SP task from Steer Davies Gleave (2008)
PART VI: BEYOND TRANSPORT
Chapter 22
Optimizing Product Portfolios Using Discrete Choice Modeling and TURF Thomas J. Adler, Colin Smith and Jeffrey Dumont
Abstract

Discrete choice models are widely used for estimating the effects of changes in attributes on a given product’s likely market share. These models can be applied directly to situations in which the choice set is constant across the market of interest or in which the choice set varies systematically across the market. In both of these applications, the models are used to determine the effects of different attribute levels on market shares among the available alternatives, given predetermined choice sets, or of varying the choice set in a straightforward way. Discrete choice models can also be used to identify the ‘‘optimal’’ configuration of a product or service in a given market. This can be computationally challenging when preferences vary with respect to the ordering of levels within an attribute as well as the strengths of preferences across attributes. However, this type of optimization can be a relatively straightforward extension of the typical discrete choice model application. In this paper, we describe two applications that use discrete choice methods to provide a more robust metric for use in Total Unduplicated Reach and Frequency (TURF) applications: apparel and food products. Both applications involve products for which there is a high degree of heterogeneity in preferences among consumers. We further discuss a significant challenge in using TURF, namely that with multi-attributed products the method can become computationally intractable, and describe a heuristic approach to support food and apparel applications. We conclude with a summary of the challenges in these applications which are yet to be addressed.
22.1. Introduction and Background

Discrete choice models are widely used for estimating the effects of changes in attributes on a given product’s likely market share and can be applied directly to two situations. In the first, the choice set is constant across the market of interest (e.g., in a typical consumer product application, in which the brands or products of interest are available to all consumers). The model is used to estimate the effects of price or product features on a product’s market share. In the second situation, the choice set varies in a systematic way across the market. In transportation applications, for example, the choice set commonly varies across individuals in a systematic way, and transit is assumed not to be in the choice set for individuals who make trips that are not served by the transit system. In both of these applications, the models are used to determine the effects of different attribute levels on market shares among the available alternatives, given predetermined choice sets, or of varying the choice set in a straightforward way (e.g., expanding transit service or introducing a single new product).

Discrete choice models can also, in theory, be used to identify the ‘‘optimal’’ configuration of a product or service in a given market. This can be computationally challenging when preferences are heterogeneous with respect to the ordering of levels within an attribute as well as the strengths of preferences across attributes. However, this type of optimization is a relatively straightforward extension of the typical discrete choice model application. An analogous extension involves optimizing the total market share achieved by a portfolio of products or services when a given supplier can provide multiple items in the portfolio. This type of optimization involves trade-offs between market share and production costs. In a market with heterogeneous preferences, the market share will be largest for portfolios that have at least one item that appeals to each of the consumers in that market. However, design and production costs increase with the number of items in the product portfolio and, in most retail environments, producers are limited in the number of items (stock keeping units or ‘‘SKUs’’) that they can display.

Total Unduplicated Reach and Frequency (TURF) has been widely used for analogous problems in media placement planning and, more recently, for consumer product portfolio optimization (e.g., Cohen, 1993; Conklin & Lipovetsky, 2005). In the media placement application, survey data are used to determine consumers’ media consumption patterns, which are then used in a sample enumeration form to mechanically calculate reach (the number of individuals who are exposed to a given set of media) and frequency (the number of such exposures). The typical consumer product portfolio application uses survey data in an analogous way, asking consumers which of several products they would consider for purchase and which they would not. Again, the survey data can be used in a sample enumeration process to estimate the reach and frequency of a given product portfolio.

One of the significant challenges in using TURF is that the basic calculations have a computational dimension that is proportional to n choose m, where n is the set of all possible products and m is the allowable number in the portfolio. With products that have multiple attributes, n can be quite large. In the food products application
described below, it is in the range of 57,000, and enumerating options for any portfolio greater than two items becomes computationally intractable.

In a typical market research application of TURF, respondents are asked to indicate their purchase intention for each product of interest using a five-point scale. The reach for a given assortment of products is then calculated as the share of individuals in the survey sample who indicated a high purchase intent (e.g., top one or two boxes on the five-point scale) for one or more products in that assortment. This is an acceptable approach when the number of basic products is small and, in particular, when the search does not extend across multiple attributes of each product.

An alternative approach is to use discrete choice modeling to provide a TURF metric that can be synthetically generated for the more general cases that involve either larger numbers of basic products or products that are distinguished along several attributes. In the former case, where the attributes are fixed, maximum difference scaling conjoint (‘‘maxdiff’’), with its parsimonious survey element, provides a useful alternative that is scalable to larger numbers of products (Louviere, 1993). Posteriors describing the individual-level utilities for each product can be estimated using these data. For the latter case, choice-based conjoint (stated choice) survey data can be used to estimate individual-level utility functions across all of the tested attributes and levels. The individual-level utilities can be computed from models that are estimated using mixed logit, finite mixture models, or hierarchical Bayes. These posterior estimates can then be used to calculate the utility for any given combination of attribute levels, and this utility in turn can serve as a metric for TURF analyses.

The following sections of this paper describe two applications that use these two types of discrete choice modeling to provide a more robust metric for use in TURF applications. Both involve products for which there is inherently a high degree of heterogeneity in preferences among consumers: apparel and food. The apparel application uses maxdiff to provide the TURF metric, whereas the food product application uses conventional discrete choice methods. Both use individual-level posteriors to reflect the sample heterogeneity. To support these applications, a heuristic TURF algorithm was developed to efficiently sample from the large space of possible product assortments to create close-to-optimal portfolios.
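As a minimal sketch of a utility-based TURF metric and a simple heuristic of this general kind (the greedy rule, the synthetic utilities and the threshold definition of reach below are our illustrative assumptions, not the authors' algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

n_resp, n_items = 1000, 40
utils = rng.normal(size=(n_resp, n_items))       # individual-level posterior utilities
threshold = np.quantile(utils, 0.9, axis=1)      # per-respondent purchase threshold

def reach(portfolio):
    """Share of respondents for whom at least one item clears their threshold."""
    return (utils[:, portfolio].max(axis=1) >= threshold).mean()

def greedy_portfolio(m):
    """Add the item with the largest marginal reach at each step, avoiding the
    full n-choose-m enumeration."""
    chosen = []
    for _ in range(m):
        candidates = [i for i in range(n_items) if i not in chosen]
        best = max(candidates, key=lambda i: reach(chosen + [i]))
        chosen.append(best)
    return chosen, reach(chosen)

print(greedy_portfolio(5))
```

A greedy build-up like this evaluates on the order of n times m portfolios instead of the n-choose-m required by full enumeration; more elaborate heuristics add swap or exchange steps to escape poor greedy choices.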
22.2. Apparel Application

22.2.1. Study Overview

The objective of the study was to develop 15 different assortments of shirts from a larger test set of shirts (a total of 571 SKUs across 15 categories). The project's client was interested in identifying the optimal assortments of shirts to make available, that is, those that would maximize the number of consumers who would purchase a shirt from the assortment. The data sample of 4000 responses was collected using an online survey. Respondents
were recruited for the survey using a sample provider and were compensated for their time. Respondents were screened to ensure that they purchased the type of clothing under study at the retailers where the project's client would place its products.
22.2.2. The Survey

The survey consisted of several sections. The first section asked about current clothing purchases and demographics, to ensure that the respondent purchased casual clothes from the retail outlets where the shirts would be placed and was a member of the target demographic market for the shirts. Those passing the screening criteria were then assigned to evaluate either one or two of the 15 shirt categories, based on gender and (in the case of women) whether they bought clothes for a male significant other.

The next section of the survey was a style evaluation exercise. A "style" in this context is a silhouette or cut of shirt. In this survey, each style was available in multiple colors, and each combination of style and color constitutes an item, or SKU. Respondents were asked to indicate which styles of shirt they might buy in the next year and which ones they would not. The primary purpose of the style evaluation section was to allow the respondents to familiarize themselves with the clothing line in advance of the discrete choice exercises.

The discrete choice exercises comprised a set of maxdiff experiments (two sets in cases where the group of respondents was to evaluate two categories of clothing). In the maxdiff experiments, respondents were presented with four shirts and asked to select the one they were most likely to buy and the one they were least likely to buy. A set of balanced incomplete block designs was generated using commercial software (Sawtooth Software, 2007). Table 22.1 lists the clothing segments, the number of styles and items in each segment, their sample sizes, and the resulting number of choice observations.

Following the choice exercises for each shirt category, respondents were asked to rate their most-preferred and least-preferred items on a five-point purchase intention scale. The most-preferred and least-preferred items were selected within the survey software using a simple counting analysis of the respondent's choices (sketched below): a selection of an item as the most preferred resulted in a +1 score for that item, whereas a selection of an item as the least preferred resulted in a -1 score for that item. Scores were totaled for each item; the item with the highest score is the most preferred and the item with the lowest score is the least preferred. Ties were broken by random selection. This exercise gave purchase intentions for two items that were later used for scaling the estimated item utilities.

In the final section, respondents answered demographic questions.
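A minimal sketch of this counting analysis for a single respondent, with hypothetical item labels; the scoring and random tie-breaking follow the description above.

```python
import random
from collections import defaultdict

def most_and_least_preferred(maxdiff_choices, rng=random.Random(0)):
    """Counting analysis of one respondent's maxdiff choices.

    maxdiff_choices: list of (best_item, worst_item) pairs, one per task.
    Each 'best' pick scores +1 and each 'worst' pick scores -1; the items
    with the highest and lowest totals are returned, with ties broken at
    random, as in the survey software.
    """
    score = defaultdict(int)
    for best, worst in maxdiff_choices:
        score[best] += 1
        score[worst] -= 1
    hi, lo = max(score.values()), min(score.values())
    most = rng.choice([i for i, s in score.items() if s == hi])
    least = rng.choice([i for i, s in score.items() if s == lo])
    return most, least

print(most_and_least_preferred([("A", "C"), ("A", "D"), ("B", "C")]))  # ('A', 'C')
```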
22.2.3. Utility Estimation

The maxdiff data were coded for estimation using standard choice conventions (Louviere, 1993), and individual-level utilities were estimated using hierarchical Bayes
(a good overview discussion of these methods is given in Andrews, Ansari, & Currim, 2002). Separate models were estimated for each of the 15 shirt categories.

Table 22.1: Number of items per segment, sample sizes, and number of observations.

| Group | Sample: Females | Sample: Males |
|-------|-----------------|---------------|
| 1     | 120             | 180           |
| 2     | 120             | 180           |
| 3     | 120             | 180           |
| 4     | 120             | 180           |
| 5     | 120             | 180           |
| 6     | 120             | 180           |
| 7     | 120             | 180           |
| 8     | 120             | 180           |
| 9     | 400             | 0             |
| 10    | 400             | 0             |
| 11    | 400             | 0             |
| 12    | 400             | 0             |
| Total | 2560            | 1440          |

| Category | Styles | SKUs | Experiments per respondent | Observations |
|----------|--------|------|----------------------------|--------------|
| 1        | 6      | 45   | 23                         | 6,900        |
| 2        | 9      | 53   | 27                         | 8,100        |
| 3        | 11     | 64   | 32                         | 9,600        |
| 4        | 1      | 15   | 8                          | 2,400        |
| 5        | 1      | 17   | 9                          | 2,700        |
| 6        | 6      | 48   | 24                         | 7,200        |
| 7        | 9      | 61   | 31                         | 9,300        |
| 8        | 8      | 45   | 23                         | 6,900        |
| 9        | 1      | 15   | 8                          | 2,400        |
| 10       | 6      | 41   | 21                         | 8,400        |
| 11       | 8      | 56   | 28                         | 11,200       |
| 12       | 3      | 25   | 13                         | 5,200        |
| 13       | 8      | 52   | 26                         | 10,400       |
| 14       | 3      | 25   | 13                         | 5,200        |
| 15       | 1      | 9    | 5                          | 2,000        |
| Total    | 81     | 571  |                            | 97,900       |

22.2.4. Scaling to Purchase Intention

The purchase intention ratings for the two items rated immediately after the maxdiff exercises in the survey were used to scale the items' utilities to the purchase intention scale of 1-5, with 1 being very likely to purchase and 5 being not at all likely to purchase. This enables the results from the different shirt segments to be compared and distinguishes between consumers who are more or less likely to purchase. To accomplish the scaling, the two rated items were matched up with their estimated item utilities. A linear relationship between utilities and ratings was assumed (based on previous work in which additional scale points were obtained), and the purchase intention values from this linear relationship were applied to all of the item utilities in that shirt category. Figure 22.1 shows an example of rescaling item utilities to the purchase intention scale.

[Figure 22.1: Example of purchase intention scaling. The figure plots rated and scaled items (utilities 2, 2.5, 4, 4.5, and 5) against the 1-5 purchase intention scale. Note: A value of 1 on the purchase intention scale corresponds to very likely to purchase; a value of 5 corresponds to not at all likely to purchase.]
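A minimal sketch of this anchoring step; the two rated items supply (utility, rating) pairs, and the anchor values used in the example are illustrative, loosely echoing Figure 22.1.

```python
def rescale_to_purchase_intention(utilities, anchors):
    """Map estimated item utilities onto the 1-5 purchase intention scale
    using the two rated anchor items.

    anchors: two (utility, rating) pairs for the most- and least-preferred
    items rated in the survey. A straight line through the anchors is
    assumed, as in the paper, and applied to every item utility in the
    shirt category.
    """
    (u1, r1), (u2, r2) = anchors
    slope = (r2 - r1) / (u2 - u1)
    return [r1 + slope * (u - u1) for u in utilities]

# Illustrative anchors loosely echoing Figure 22.1.
print(rescale_to_purchase_intention([2.0, 2.5, 4.0, 4.5, 5.0],
                                    [(2.0, 4.5), (5.0, 1.5)]))
# [4.5, 4.0, 2.5, 2.0, 1.5]
```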
22.2.5. Size of TURF Problem

The study's client was interested in developing assortments of 8 shirts for each of the 15 shirt categories. Calculating the reach and frequency for each of the possible eight-shirt assortments is tractable for some of the smaller segments (up to 25 items). However, even with modern computational power, calculating the reach and frequency for all assortments in the larger segments (41 or more items) cannot be done exhaustively in a reasonable amount of time. Instead, a heuristic approach (described below) was developed and used for the larger segments. Table 22.2 shows the number of assortments for each shirt segment, ordered by number of assortments.

Table 22.2: The number of possible combinations of eight shirts for each shirt segment.

| Category | SKUs | Combinations    |
|----------|------|-----------------|
| 15       | 9    | 9               |
| 4        | 15   | 6,435           |
| 9        | 15   | 6,435           |
| 5        | 17   | 24,310          |
| 12       | 25   | 1,081,575       |
| 14       | 25   | 1,081,575       |
| 10       | 41   | 95,548,245      |
| 1        | 45   | 215,553,195     |
| 8        | 45   | 215,553,195     |
| 6        | 48   | 377,348,994     |
| 13       | 52   | 752,538,150     |
| 2        | 53   | 886,322,710     |
| 11       | 56   | 1,420,494,075   |
| 7        | 61   | 2,944,827,765   |
| 3        | 64   | 4,426,165,368   |

22.2.6. Heuristic Approach

Many different computationally tractable heuristic approaches can be used to identify close-to-optimal TURF assortments. A computationally trivial, and commonly used, approach begins with the single item that has the highest reach. It keeps that item fixed and then evaluates all of the remaining items to determine the one that, combined with the first, maximizes the two-item reach. A third item is added to the first two fixed items using the same process, and the procedure continues in this way until a full assortment is constructed (a sketch of this procedure follows at the end of this subsection). In many applications, however, this procedure creates only a single, and often far-from-optimal, solution. One significant problem is that it does not take into account the correlations in preferences that exist among items.

An alternative approach using Monte Carlo sampling from the space of assortments was developed for this application. This new approach uses a scoring metric that considers each item's individual reach and estimates its reach when combined with each of the other available items. The initial matrix calculations required to compute this metric are tractable even for problems much larger than this application. The metric provides a good approximation of each item's contribution to reach in a multi-item assortment and is used as a weight for a Monte Carlo sampling process. Tests were conducted against an unweighted Monte Carlo sampling process, and the metric was shown to provide considerable efficiency gains in identifying optimal (highest reach) assortments. For the categories in the apparel application where the heuristic approach was required, the TURF software was allowed to run for several hours to evaluate a large number of assortments.
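The sketch below implements the greedy baseline just described, assuming the same binary consideration-matrix representation as earlier; the weighted Monte Carlo alternative replaces the inner argmax with sampling in proportion to reach-based item weights.

```python
import numpy as np

def greedy_turf(consider, k):
    """Greedy forward selection: start from the item with the highest
    reach, then repeatedly add the item that covers the most newly
    reached respondents. Returns one k-item assortment; as noted above,
    this yields a single and often far-from-optimal solution because it
    ignores preference correlations among items."""
    n_resp, n_items = consider.shape
    chosen = []
    covered = np.zeros(n_resp, dtype=bool)
    for _ in range(k):
        # Gain of each remaining item = respondents it newly reaches.
        gains = [np.sum(~covered & consider[:, j]) if j not in chosen else -1
                 for j in range(n_items)]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= consider[:, best]
    return chosen
```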
22.2.7. Evaluation of TURF Results

The resulting assortments for all shirt categories were imported into a SQL database for evaluation. The assortments with the highest reaches in each shirt category were
reported to the study's client, and variations on these assortments were produced and sold in the client's retail channels. The same process has since been applied to several apparel lines over multiple seasons, with corresponding validation. The study team is currently working to collect additional data that can be used to further validate and tune the approach.
22.3. Food Product Application

22.3.1. Study Overview

The objectives of the study were to evaluate the acceptability of a new frozen vegetable side-dish product and to determine whether to market two versions of the product that differed by method of cooking preparation. The project's client was interested in identifying the optimal set of side-dish products to make available, that is, the set that would maximize the number of consumers who would purchase an item from it. A sample of just over 1800 responses was collected using an online survey. Respondents were recruited for the survey using a sample provider and were compensated for their time. Screening criteria were included in the survey to ensure that respondents were likely to be purchasers of frozen vegetable products.
22.3.2. The Survey

The survey consisted of four sections. The first section introduced the product concepts and, for screening purposes, asked about the current use of frozen vegetable side dishes. This section also included detailed information about the two preparation methods.

The next section of the survey featured two sets of discrete choice exercises, one for each method of cooking preparation. Each set contained 10 choice experiments. Respondents were asked to select from three unlabeled alternatives. Each alternative was constructed from four attributes: vegetable cut, vegetable type, seasoning/sauce, and extras (an extra vegetable, fruit, or nut). Table 22.3 contains the number of product elements used for each attribute.

Table 22.3: The number of levels for each of the attributes in the choice exercises.

| Attribute       | Number of levels |
|-----------------|------------------|
| Vegetable cut   | 7                |
| Vegetable type  | 10               |
| Seasoning/sauce | 42               |
| Extras          | 30               |

As some product elements were clearly
not matched with each other, some elements were shown only in combination with selected other levels.

The discrete choice exercises were followed by two product configuration exercises. For each preparation method, respondents were given the opportunity to construct their own ideal product from the list of attributes and elements used in the choice exercises. Respondents were then asked to rate their ideal product on a purchase intention scale. Respondents were also asked to rate one product randomly selected from those not chosen in each of the discrete choice exercises. This gave purchase intention ratings for four products that were later used for scaling the estimated utilities. Finally, survey respondents completed a section of demographic questions.
22.3.3. Utility Sums for Each Product

The discrete choice data were initially used for specification testing with simple multinomial logit models. Interaction effects between different product attributes were tested using the data from the configured items. No significant interactions (outside of obvious exclusions) were found, so interaction effects were excluded from any further modeling.

Preference utilities were then estimated for each of the product elements for each respondent using hierarchical Bayes. Models for each preparation method were estimated separately. The resulting preference utilities were then combined to form a total preference utility for every possible product configuration for each respondent (sketched below). Because of the predefined element pairing exclusions, a total of over 57,000 product utilities were calculated for each respondent. For the 1800 respondents, this resulted in over 104 million product utilities. Owing to the number of levels and the number of respondents, this calculation was done, and the results were stored, in a SQL database.
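A sketch of this additive enumeration under the stated assumptions (additive part-worths, predefined pairing exclusions); the attribute names and values below are hypothetical.

```python
import itertools

def configuration_utilities(partworths, exclusions):
    """Total preference utility for every feasible configuration of one
    respondent, as an additive sum of hierarchical Bayes part-worths
    (no interaction terms, per the specification tests above).

    partworths : dict attribute -> list of level part-worth utilities.
    exclusions : set of ((attr_a, level_a), (attr_b, level_b)) pairs
                 that may not co-occur (the element-pairing exclusions).
    """
    attrs = sorted(partworths)
    for combo in itertools.product(*(range(len(partworths[a])) for a in attrs)):
        levels = list(zip(attrs, combo))
        if any((p, q) in exclusions or (q, p) in exclusions
               for p in levels for q in levels if p < q):
            continue  # skip configurations with forbidden pairings
        yield combo, sum(partworths[a][lv] for a, lv in levels)

# Hypothetical two-attribute example with one exclusion.
pw = {"cut": [0.1, -0.2], "sauce": [0.4, 0.0, -0.3]}
print(dict(configuration_utilities(pw, {(("cut", 1), ("sauce", 2))})))
```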
22.3.4. Scaling to Purchase Intention

The purchase intention ratings for the four items rated in the product configuration section of the survey were used to scale the preference values to a purchase intention scale of 1-5, with 1 being very likely to purchase and 5 being not at all likely to purchase. This enables the two cooking preparation methods to be compared and distinguishes between consumers who are more or less likely to purchase. To accomplish the scaling, the two rated items from the configuration exercises for each preparation method were matched up with their total preference utilities. As with the apparel application, a linear relationship was assumed between the utilities and the purchase likelihood values. This linear relationship was applied to all of the total preference utilities.
22.3.5. Size of TURF Problem

The study's client was interested in product sets of 2-5 items. They were interested in the reach and frequency of sets of items for each preparation method independently, as well as sets of items from both preparation methods combined. Owing to the large number of possible product configurations, a simple TURF analysis would be computationally intractable even with these small item set sizes.

To make the TURF analysis tractable, a subset of the 57,000 products was first chosen for each preparation method: items given the highest scaled purchase likelihood rating by more than 50% of the sample were included in the TURF analysis (one reading of this screening step is sketched after Table 22.5). For the analysis of the combined preparation methods, the items most frequently occurring in the top 250 TURF solutions for each method were included. Table 22.4 contains the number of items used in the TURF analysis for each preparation method.

Table 22.4: The number of items included in the TURF analysis for each preparation method.

| Preparation method | Number of items | Percent of total (%) |
|--------------------|-----------------|----------------------|
| Method 1           | 4,094           | 7                    |
| Method 2           | 3,452           | 6                    |
| Combined           | 1,693           | 1                    |

Calculating the reach and frequency for each of the item sets of size two is tractable. However, even with modern computational power, calculating the reach and frequency for product sets of size three and larger on any of these product sets cannot be done exhaustively in a reasonable amount of time. Instead, a heuristic approach was developed and used for sets of size three and larger. Table 22.5 contains the number of solutions for each size of the optimal product set, for the full product set as well as for the smaller subsets.

Table 22.5: The number of possible solutions for each size of the optimal product set.

| Size of optimal product set | Full product set        | Subset for preparation method 1 | Subset for preparation method 2 |
|-----------------------------|-------------------------|---------------------------------|---------------------------------|
| 2 items                     | 1,649,360,895           | 8,378,371                       | 5,956,426                       |
| 3 items                     | 31,575,914,760,845      | 11,428,098,044                  | 6,849,889,900                   |
| 4 items                     | 453,366,984,136,212,000 | 11,688,087,274,501              | 5,906,317,566,275               |
| 5 items                     | 5.21 × 10^21            | 9.56 × 10^15                    | 4.07 × 10^15                    |
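The screening rule above can be read in more than one way; the sketch below implements one plausible reading, in which an item is retained when its scaled purchase intention falls in the top rating band for more than half of the sample. The cutoff and the band definition are assumptions of the sketch, not parameters reported in the paper.

```python
import numpy as np

def shortlist_items(scaled_pi, top_band=1.5, share=0.5):
    """Keep items whose scaled purchase intention (1-5 scale, 1 = very
    likely to buy) lies in the top band for more than `share` of the
    sample. `scaled_pi` is an (n_respondents, n_items) array."""
    top_share = (scaled_pi <= top_band).mean(axis=0)
    return np.flatnonzero(top_share > share)
```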
22.3.6. Heuristic Approach

Owing to the number of items being brought into the TURF analysis, the matrix calculations used in the previously described apparel method were computationally intractable. Instead, a simpler Monte Carlo sampling heuristic was developed (implemented in the sketch that follows the list):

(1) Calculate the single-item reach (R) for all items; eliminate any items with R = 0.
(2) Calculate a score for each item: W = R/N, where N is the sample size.
(3) Calculate a sampling proportion for each item: P_j = W_j / (Σ_{i=1}^{m} W_i) for all j.
(4) Loop k times, where k is the number of elements in the item set.
(5) Monte Carlo sample without replacement in proportion to P to create the set of items.
(6) End loop.
(7) Calculate the reach and frequency for this set of items and store the results.
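A direct implementation sketch of steps (1)-(7), reusing the binary consideration-matrix representation from earlier; keeping only the best draw is one simple way to use the stored results.

```python
import numpy as np

rng = np.random.default_rng(7)

def mc_turf(consider, k, n_draws=10_000):
    """Weighted Monte Carlo TURF heuristic following steps (1)-(7)."""
    n = consider.shape[0]
    r = consider.sum(axis=0)                    # (1) single-item reach R
    items = np.flatnonzero(r > 0)               #     drop items with R = 0
    w = r[items] / n                            # (2) score W = R / N
    p = w / w.sum()                             # (3) sampling proportions
    best_set, best_reach = None, -1.0
    for _ in range(n_draws):
        s = rng.choice(items, size=k, replace=False, p=p)  # (4)-(6)
        reach = consider[:, s].any(axis=1).mean()          # (7)
        if reach > best_reach:
            best_set, best_reach = sorted(s.tolist()), reach
    return best_set, best_reach
```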
22.3.7. Evaluation of TURF Results

The heuristic was run for 12 hours and the resulting solutions were imported into a SQL database for evaluation. The total number of results generated can be found in Table 22.6. Duplicate solutions were removed, and the results with the highest percent reach were reported to the study's client and used to inform the next stage of product development.
22.4. Conclusions

The use of discrete choice modeling and TURF analysis together, as described in the previous sections, has been implemented as part of the product planning process in the client organizations. This work has proved to be a useful guide for the development of product portfolios, not only in the identification of products that complement each other in the portfolios, but also in determining the appropriate number of products in the portfolios.

Table 22.6: The number of solutions found after running the heuristic for 12 hours.

| Size of optimal product set | Preparation method 1 | Preparation method 2 | Combined   |
|-----------------------------|----------------------|----------------------|------------|
| 2 items*                    | 8,378,371            | 5,956,426            | 975,106    |
| 3 items                     | 7,703,171            | 6,831,129            | 15,007,666 |
| 4 items                     | 4,800,280            | 5,920,974            | 17,304,383 |
| 5 items                     | 4,210,204            | 4,864,408            | 11,010,537 |

*The analysis was run exhaustively on item sets of size two.
22.4.1. Apparel Application

In the apparel application, maxdiff conjoint, combined with purchase intention data, was used in constructing the TURF metric. Owing to the large number of combinations in the optimization problem, a Monte Carlo sampling technique was developed that takes into account the correlation that exists among the items. Currently, significant amounts of validation data are being collected as the portfolios are placed into the markets. These data are useful for tuning the metrics used in the TURF application and, in particular, for determining the appropriate weighting of reach and frequency in determining the "optimal" portfolio.

However, in addition to the calculation of unconstrained optimal assortments, the client needed to be able to specify conditions on the assortments that were imposed by retailers or other practical considerations. For this set of applications, a web-based system has been developed that allows product managers to graphically specify rules to apply to the assortments and to determine the optimal solutions that satisfy those rules. This system has been used extensively in creating the many assortments that are developed for different retailers' shelves.

The TURF metrics provide useful guidance to apparel makers on both the number of items to include in their assortments and the composition of those assortments. In these applications, the maximum reach achieved by a given assortment generally increases significantly as assortment sizes increase from two to three to four items, but levels off appreciably for larger assortment sizes. The TURF metrics assist the apparel maker in determining exactly how many items to include in each product assortment and the mix of items that best matches the diverse preferences represented in consumer markets. For example, in an initial application, TURF identified a color/style combination that was not generally popular overall and thus would not normally have been considered. However, that style appealed to a specific segment, and the TURF analysis therefore suggested that it should be included in any assortment of four or more items. Had that style not been included, the consumer segment that was uniquely attracted to it would not have been reached. Sales data obtained after the assortment had been placed on retailers' shelves indicated that the style in fact sold at the levels suggested by the analysis, well above what the company's style consultants would have initially estimated.
22.4.2. Food Application

In the food application, choice-based conjoint, combined with preferred product configuration data, was used for the TURF metric. Preference utilities for each attribute were estimated and combined to form overall utilities for each product configuration. Owing to the significant number of items brought into the TURF analysis, a simpler heuristic was developed that does not take into account the correlation among the product configurations, but which nevertheless produced
solutions that were measurably better than alternatives developed using conventional methods.

One challenge, recognized by others, is that the reach calculations produce a large number of "solutions" whose reaches are statistically equal. The secondary TURF measure, frequency, is highly correlated with reach and, for assortments from which most consumers choose only one item, is generally not a very relevant measure (unlike in the media application, where frequency is an important measure of exposure). Alternative measures such as the Shapley Value have been proposed (e.g., Conklin & Lipovetsky, 2005), but the usefulness of that measure is debatable and its application requires a computational effort that is significantly larger than the reach calculation. We have recently developed, and are currently testing, an alternative, computationally tractable measure to distinguish among items with comparable reach, and we look forward to future work by others in this field.
References

Andrews, R., Ansari, A., & Currim, I. (2002). Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. Journal of Marketing Research, 39(1), 87-98.

Cohen, E. (1993). TURF analysis. Quirk's Marketing Research Review, July/August, 10-13.

Conklin, M., & Lipovetsky, S. (2005). Marketing decision analysis by TURF and Shapley Value. International Journal of Information Technology & Decision Making, 4(1), 5-19.

Louviere, J. (1993). The best-worst or maximum difference measurement model: Applications to behavioral research in marketing. Paper presented at the American Marketing Association's Behavioral Research Conference, Phoenix, AZ, USA.

Sawtooth Software. (2007). The MaxDiff/Web v6.0 technical paper. Available at: http://www.sawtoothsoftware.com
Chapter 23
Preference Stability: Modeling How Consumer Preferences Shift after Receiving New Product Information

Harmen Oppewal, Mark Morrison, Paul Wang and David Waller
Abstract

An assumption made in many applications of stated preference modeling is that preferences remain stable over time and over multiple exposures to information about choice alternatives. However, there are many domains where this assumption can be challenged. One of these is where individuals learn about new products. This paper aims to test how attribute preferences, as measured in an experimental choice task, shift when respondents are exposed to new product information. The paper presents results from a study investigating consumer preferences for a new consumer electronics product conducted among 400 respondents from a large consumer panel. All respondents received several choice tasks and were then able to read additional information about the new product. After this they completed an additional set of choice tasks. All choices were from pairs of new product alternatives that varied across eight attributes designed according to an orthogonal plan. Using heteroscedastic logit modeling, the paper analyses the shifts in attribute utilities and scale variances that result from the exposure to product information. Results show that as respondents become better informed about a new attribute, the attribute has a greater influence on their choices. In addition, a significant shift in scale variance is observed, suggesting an increase in preference heterogeneity after information exposure.
23.1. Introduction

An assumption typical of most stated preference applications is that respondents have well-defined preferences. It is assumed that participation in the stated preference task allows respondents to reveal their preferences for a range of alternatives and attributes, which in stated preference analyses may well vary beyond the current domain of experience or available alternatives. The ability to probe preferences over this wider domain is one of the key benefits of stated preference methods. The assumption that preferences are stable over this domain of attributes and attribute levels may be deemed valid for applications where respondents are at least somewhat familiar with the presented attributes and their range. However, when respondents are not so familiar with an attribute, it is less clear what their responses represent.

To overcome this problem, researchers may choose to provide explanatory information about the attributes to inform the respondent. For example, it is common practice in stated preference studies to include attribute glossaries that respondents either study prior to the task or can access during the stated preference task (Hensher, Rose, & Greene, 2005). In some studies more elaborate attempts to inform respondents were made, ranging from the provision of visual attribute information to better convey attribute meanings (e.g., Crouch, Devinney, Louviere, & Islam, 2009), to the use of full virtual reality techniques, as for example in the studies using "information acceleration" (Urban, Hauser, & Roberts, 1990; Urban et al., 1997).

How does accessing such additional information, which allows the respondent to learn about the attributes, influence attribute valuations and model performance? This paper addresses this issue by comparing choice models estimated from stated preference responses before and after respondents were exposed to product information that allowed them to better understand the nature of a new product feature. To control for differences due to mere order and fatigue effects, we also compare choice models estimated from stated preference responses from different stages between which no new information was provided. Hence, this paper investigates the effects of information provision on preference stability by testing how attribute parameters shift during the completion of a choice task, as well as after respondents are exposed to additional background information about a new product; in our project, this is a new consumer electronics product. We test for changes in utility as well as for variations in error variance in the model before and after the exposure to the additional information, and also across earlier and later stages of a stated preference task with the typical setup in which respondents complete a sequence of multiple choice tasks.

We report findings from an online survey among 400 participants which indicate that as respondents become better informed about the meaning of a new attribute, not only is the attribute more likely to have an influence on their choices, but it also diminishes the role of several other attributes. We also find evidence for a change in the utility scale of the parameters in the model; however, when we account for this difference in the analysis, the attribute shift effects above are no longer reliably detected.
To present these ideas and findings, our paper first briefly reviews the existing approaches to modeling preference variation and the most relevant points from the existing literature on choice modeling. We then present our two main hypotheses and explain our data collection procedure. A results section and a final conclusion and discussion section follow.
23.2. Literature Review and Hypothesis Formulation

In the context of experimental choice modeling, learning from new attribute information refers to improving one's understanding of the benefits associated with a new attribute. Based on the literature regarding learning and decision-making, we will propose that if an attribute concerns a new product feature that the respondent is not familiar with, background information provided for the attribute will influence not only the utility of the new product attribute itself but also that of certain other product attributes, as well as the utility scale. We therefore first review developments in modeling, then focus on experimental effects on the utility scale, and thereafter discuss the effects of new product information.
23.2.1. Modeling Variation in Preferences

Choice experiments measure preferences as being determined by a systematic and a random utility component, as proposed in random utility theory (e.g., Louviere, Hensher, & Swait, 2000). The random component is assumed to capture preference variations across respondents and/or across measurement occasions, as well as variation due to measurement error. In contrast, the systematic utility captures the more enduring and stable preference component of individual respondents.

Models have increasingly become available that allow capturing variation in systematic utility across respondents. Traditionally, a limited number of sociodemographic and other background variables were included to account for at least some respondent heterogeneity. Nowadays, however, random coefficients and latent class models are increasingly used to capture respondent heterogeneity (e.g., Boxall & Adamowicz, 2002; McFadden & Train, 2000; Revelt & Train, 1998). Models have also been developed that include attitudinal and other dispositional variables in addition to traditional sociodemographic variables (Ben-Akiva et al., 2002). These models, however, all still tend to assume that a single respondent's systematic utility remains stable across task conditions and across choice sets in a stated preference task.

Advances in modeling have also led to better incorporation of random utility variations across choice conditions. Accounting for different scale levels of random utility has been a major focus of research in choice modeling for more than a decade, as researchers realized that differences in error variance are directly related to the scale in which the systematic utility component is expressed (e.g., Swait & Louviere, 1993; Louviere, 2001; Sonnier, Ainslie, & Otter, 2007). Significant progress in this
area has resulted in models now being available that can account for differences in scale, such as the heteroscedastic logit model (DeShazo & Fermo, 2002; Hole, 2006).
23.2.2. Learning and Fatigue Effects in Choice Tasks

Much of the research on understanding the factors influencing error variance has focused on the role of choice set design and respondent fatigue, and their influence on task complexity. For example, Mazzotta and Opaluch (1995) found support for the notion that increasing task complexity increases the random error component of choices. Similarly, DeShazo and Fermo (2002) and Arentze, Borgers, Timmermans, and Del Mistro (2003) found that increasing the number of choice set attributes and alternatives, hence increasing complexity, leads to increased error variance. Bradley and Daly (1994) observed an increase in error variance with choice set rank order. Caussade, de Dios Ortúzar, Rizzi, and Hensher (2005) similarly found evidence that increasing complexity in terms of the number of attributes or attribute levels leads to increasing levels of error variance (although they find a U-shaped effect for the number of alternatives and the number of choice sets).

Various behavioral explanations have been given for this apparent relationship between task complexity and increasing error variance. There is evidence that as complexity changes, so does the approach that respondents use to make their choices. Tversky and Shafir (1992), for example, suggested that as task complexity increases, individuals will use several different strategies, including being more likely to choose the status quo alternative. Contrasting findings were, however, obtained by Swait and Adamowicz (2001a), who examined the effect of complexity on error variance across multiple datasets. They found for some data sets that error variance decreased with complexity. They suggest that this surprising result arises because at high levels of complexity respondents simplify their choice process, thereby reducing error. For other data sets, they identified an inverted U-shaped relationship, with variance being highest at moderate levels of complexity and lower at either low or high levels of complexity. They suggest the reduced error variance at high levels of complexity may be due to decision-makers adopting noncompensatory decision rules. This would be consistent with Payne (1976), who observed that individuals were more likely to use noncompensatory decision strategies as task complexity increased (see also Lussier & Olshavsky, 1979).

Learning effects from exposure to repeated choice tasks can be expected to work in a similar direction to these complexity effects: as respondents complete more choice tasks, they would be expected to begin to learn which attributes are most important to them and how to trade off between attributes, thus simplifying the choice task and hence reducing error variance. Some evidence of these learning effects has been identified (Bjornstad, Cummings, & Osborne, 1997; Huber, Wittink, Johnson, & Miller, 1992). Such effects would also be consistent with results by Swait and Adamowicz (2001b), who found that choice sets that appear later in a task are more likely to induce simplified decision strategies.
Such effects may, however, depend on the mode in which the choice experiment is administered. Savage and Waldman (2008) compared learning and fatigue effects between online and mail survey choice experiments and found that whereas for mail surveys the error variance was constant across different order positions, in online settings accuracy decreases over a series of choice sets. This is relevant as our study, reported below, was conducted online. Finally, several studies in the related area of conjoint analysis have looked into the effects of learning and fatigue. For example, Liechty, Fong, and DeSarbo (2005) provided a Bayesian approach to incorporating individual variation in learning and fatigue effects.
23.2.3. Learning from New Attribute Background Information

The above studies all focus on the effects of task complexity, learning, and fatigue as design artifacts. Far fewer studies seem to have looked at genuine learning effects, as occur when respondents are provided with additional product information. In stated preference studies, additional attribute information is often provided in the form of glossaries of attribute definitions. In addition, there is a rich tradition of concept testing for new product development, in which stated preference-related methods such as conjoint analysis have been used for decades to assess consumers' early responses to new product features and designs. This includes a range of attempts to use pictorial information to describe attributes that are difficult to represent verbally, extending to the early work by Urban et al. on "information acceleration" (Urban et al., 1990, 1997). Nowadays, with the advent of online surveys, it is increasingly possible, and cost effective, to provide larger groups of respondents with such rich descriptions (see, e.g., Savage & Waldman, 2008), creating ample opportunities to introduce respondents to alternatives and attributes with which they were hitherto unfamiliar.

Despite this tradition in the area of new product development, relatively little work in choice modeling seems to have focused on these more dynamic aspects of learning and preference change. Models have been developed that account for different stages of decision-making, such as decisions concerning the inclusion of alternatives in some prefinal or consideration set versus selection of the final alternative (e.g., Ben-Akiva & Boccara, 1995; Swait, 2001; Swait & Ben-Akiva, 1987), but these do not explicitly account for learning from exposure to new information.

So, previous studies have focused principally on understanding whether and how respondents' response accuracy improves with repeated choice tasks involving relatively familiar attributes, and the effect of this learning on error variance, preferences, and decision strategies. Other studies have attempted to enrich the typical choice stimuli by using additional sensory information in the form of pictures, or even sounds. Less research seems to have been conducted into the effects of providing new product background information during choice experimental tasks. Our study aims to investigate structural effects of exposure to new attribute background information and to separate these from the incremental learning and fatigue effects in choice tasks discussed in the previous section.
23.2.4. Hypothesis Formulation

We specify two hypotheses regarding how respondents' preferences, as measured in a choice model, will change after exposure to new product background information. The first is that while the target (new) attribute becomes more important, other attributes will become less important when respondents receive explanatory information about a new and unfamiliar product feature. This will especially apply to so-called extrinsic attributes, which are attributes that generally signal product quality, such as price, warranty, and brand name (Purohit & Srivastava, 2001; Rao & Monroe, 1988; Suri & Monroe, 2003). Because they find it difficult to evaluate the target attribute, respondents use these signals to infer whether the attribute represents value. For example, low-knowledge decision-makers have been found to be willing to pay more, as they are more reliant on price as an extrinsic cue for product quality (Cordell, 1997). However, once they have received more information and understand what the attribute means, respondents can judge the attribute on its own merits, and the extrinsic attributes lose their role as quality cues. Hence, we hypothesize that as a result of exposing respondents to new attribute information:

H1. The relative importance of attributes will shift, with attributes for which additional information was provided becoming more important and other attributes, especially extrinsic attributes such as brand names and warranties, becoming less important after respondents have received relevant new product information.

Our second hypothesis concerns the error variance of the estimated model. We expect that the error variance in the model will increase as respondents receive more information about the new product feature. This is because as they receive more information, there is more basis for respondents to interpret and evaluate the product features differently (Malhotra, 1982; Keller & Staelin, 1987, 1989), with the consequence that an aggregated model will fit the data less well, resulting in higher levels of error variance in the choice model.

Prior literature suggests that as respondents become more familiar with a choice task and the available range of attributes and attribute levels, they become more determined in their choice behavior, in the sense of using only a limited set of attributes as the main drivers of their choices (Huber, Ariely, & Fischer, 2002). This may be because as the task unfolds they become more familiar with and learn about their own preferences, and/or because they become less able and willing to consider the required trade-offs due to fatigue. Where this selection of a subset of determinant attributes varies across respondents, we predict that choice sets administered later in a series of choice tasks will display larger levels of preference heterogeneity. This increase in error variance will come in addition to the increase that has generally been observed in the literature reviewed above (with the exception of Swait & Adamowicz, 2001a): as respondents proceed through a series of choice sets, fatigue will eventually result in less consistent choices.
We thus expect a second type of change as a result of exposing respondents to new attribute information:

H2. After becoming familiar with the task and product features, respondents will be better able to determine which features they like; consequently, they will have better defined preferences. This will result in higher levels of error variance in an aggregate choice model.
23.3. Methodology

23.3.1. Sample and Procedure

Choice data were obtained in Australia from members of an online panel recruited to participate in a survey regarding new electronic goods. Panel members were sampled using a sampling frame that is representative of the national population on key demographic characteristics such as age, income, and residential location. A random sample of panel members was approached, of which 406 completed the survey (response rate 32%). After some introductory questions regarding product awareness and purchase intentions, participants were presented with a choice experiment measuring DVD recorder preferences, as outlined below. This was followed by additional questions about technology use, ownership, and awareness. In the final section of the survey instrument, background variables including sociodemographics were collected.
23.3.2. Choice Scenario Design

Respondents were asked to assume they had received a $1000 special gift voucher from a well-known and reputable loyalty program. The voucher would allow them to obtain a new DVD recorder, to be selected from a limited set of recorder options. They could also choose not to use their voucher for a recorder and instead use the voucher amount to purchase DVD or CD disks or books, or to have the money donated to a charity. Respondents were asked to make their selection of how to use the voucher by choosing from pairs of DVD recorders. For each pair they indicated which of the two alternatives they would choose if these were the only available DVD recorder options for voucher purchase, or whether in that case they wished to purchase neither option.

All new product alternatives varied across eight attributes designed according to an orthogonal plan. The set of attributes comprised brand name, price, warranty, and various functional product features. One of these, disk technology, was a new product feature at the time of surveying and concerned whether the recorder provided either or neither of two competing technologies that were about to be launched onto the market, called Blu-Ray and HD-DVD.
Each respondent received four choice tasks that were randomly selected, separately for each individual, from a master design comprising 128 DVD recorder profile pairs. This master design consisted of a mix of two subdesigns, each in 64 pairs. Design 1 consisted of 64 pairs of orthogonal recorder alternatives, drawn from a 2^10 × 4^5 orthogonal plan. Design 2 used a maximum difference plan in 64 treatments, based on a 2^5 × 4^3 orthogonal base plan (see Street & Burgess, 2007). Attributes were the same in both designs and included brand name (4 brands), number of disks the player can hold (1 or 5), disk technology (normal DVD, HD-DVD, Blu-Ray, or both Blu-Ray and HD-DVD), hard disk capacity (100 or 200 GB), presence of a combo player (yes or no), presence of a firewire connection (yes or no), warranty (1 or 3 years), and price. Price varied over four levels with intervals of $100, the lowest being Aus$600 and the highest being Aus$900. One of the two orthogonal alternatives in Design 1 was, however, designed to vary over a lower price range, starting at $300 up to $600; it had different levels for the hard drive (either no hard drive at all or a hard drive of 100 GB) and had the disk technology fixed at "normal DVD." The latter design variation was included to allow the creation of realistic price ranges for a more standard and a more advanced recorder alternative.
23.3.3. Providing Product Information

After completing the four choice sets, respondents were exposed to one of a mix of briefs with background information about the product, in the form of a product review such as might appear in newspapers or consumer magazines. One review, while also providing general information about recorders, specifically argued that Blu-Ray was the most likely candidate to win the "war" between the two formats, Blu-Ray and HD-DVD. Another review argued the opposite and predicted that HD-DVD would be the winner. The third review similarly provided general product information but without an indication of which technology would become the new standard. It was determined by random draw which of the three reviews a respondent would receive. We included these different briefings in order to test a wider range of information briefs than if we had used only one brief.

After receiving and reading the information brief, respondents answered eight more choice sets, randomly drawn (without replacement) from the same master set of 128 choice sets. Additional questions followed, measuring product knowledge and sociodemographics.
23.3.4. The Heteroscedastic Logit Model

We use a heteroscedastic logit model (DeShazo & Fermo, 2002; Hole, 2006) to analyze the choice responses and assess the shifts in attribute utilities and scale variances resulting from the exposure to product information. We use a
heteroscedastic model instead of a random parameters model because the heteroscedastic model allows one to explicitly model the error variance.

23.3.4.1. Utility specification

Let $U_{nsi}$ denote the utility of alternative $i$ perceived by respondent $n$ in choice situation $s$. For both the conditional and heteroscedastic logit models, $U_{nsi}$ is separated into two components: an observed component of utility, $V_{nsi}$, and an unobserved (and unmodeled) component, $\varepsilon_{nsi}$, such that:

$$U_{nsi} = V_{nsi} + \varepsilon_{nsi} \quad (23.1)$$

The observed component of utility is typically assumed to be a linear function of the observed vector of attribute levels of each alternative, $x_{nsi}$, and their corresponding weights (parameters), $\beta$. The unobserved component, $\varepsilon_{nsi}$, reflects the idiosyncrasies and tastes of each respondent. In both models, the parameter weights for (coded) attributes $k$ are invariant over respondents, so that the observed component of utility may be represented as:

$$V_{nsi} = \sum_{k=1}^{K} \beta_k x_{nsik} \quad (23.2)$$

The residuals, $\varepsilon_{nsi}$, are assumed to be random variables with mean zero, distributed IID Gumbel for both the conditional and heteroscedastic logit models.

23.3.4.2. Model probabilities

The probability $P_{nsi}$ of respondent $n$ choosing alternative $i$ in choice situation $s$ is:

$$P_{nsi} = \frac{\exp(\mu V_{nsi})}{\sum_{j=1}^{J} \exp(\mu V_{nsj})} \quad (23.3)$$

where $\mu$ is a scale parameter that is inversely related to the standard deviation of the error terms ($\mu = \pi / (\sqrt{6}\,\sigma_{nsi})$) and is assumed constant across individuals. Further, as it cannot be estimated separately from the model parameters, it is typically normalized to one. In contrast to the conditional logit model, the heteroscedastic logit model relaxes the assumption about the scale parameter and parameterizes it as a function

$$\mu_{ns} = \exp(Z_{ns}\gamma) \quad (23.4)$$

where $Z_{ns}$ represents a vector of characteristics of individuals, situations, or treatment variables. The heteroscedastic logit model then takes the following form:

$$P_{nsi} = \frac{\exp(\mu_{ns} V_{nsi})}{\sum_{j=1}^{J} \exp(\mu_{ns} V_{nsj})} \quad (23.5)$$
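A compact numerical sketch of Eqs. (23.3)-(23.5); the array shapes are assumptions of the sketch, and setting gamma to zero reproduces the conditional logit probabilities.

```python
import numpy as np

def hetero_logit_probs(V, Z, gamma):
    """Choice probabilities per Eqs. (23.3)-(23.5).

    V     : (n_obs, n_alts) systematic utilities V_nsi.
    Z     : (n_obs, n_covs) covariates entering the scale function.
    gamma : (n_covs,) scale parameters; gamma = 0 recovers the
            conditional logit (scale mu = 1).
    """
    mu = np.exp(Z @ gamma)[:, None]  # Eq. (23.4): mu_ns = exp(Z_ns gamma)
    eV = np.exp(mu * V)
    return eV / eV.sum(axis=1, keepdims=True)  # Eq. (23.5)
```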
When $\gamma = 0$, the heteroscedastic logit model collapses to the conditional logit model (Hole, 2006).

To obtain willingness-to-pay (WTP) values (or implicit prices) for the conditional logit model, the coefficient for a given attribute of interest $k$ ($\beta_k$) is divided by the negative of the coefficient for the cost variable ($-\beta_{cost}$):

$$\mathrm{WTP}_k = \frac{\beta_k}{-\beta_{cost}} \quad (23.6)$$

We apply a similar rationale to calculate WTP values for the heteroscedastic model estimates (cf. Caussade et al., 2005). Our models include dummy variables representing each attribute $k$'s shift in utility due to information exposure. We can therefore calculate WTP values for the postinformation conditions by adding the postinformation attribute-by-dummy interaction parameters as follows:

$$\mathrm{WTP}_{k,\ \mathrm{post\ info}} = \frac{\beta_k + \beta_{k'}}{-(\beta_{cost} + \beta_{cost'})} \quad (23.7)$$
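As a worked illustration of Eqs. (23.6) and (23.7), the helper below computes marginal WTP from a coefficient and its postinformation shift; the numbers used are the Blu-Ray and price estimates reported later in Table 23.1 (price is in hundreds of dollars, so results are scaled by 100).

```python
def wtp(beta_k, beta_cost, shift_k=0.0, shift_cost=0.0):
    """Marginal WTP per Eqs. (23.6)-(23.7); the shifts are the
    postinformation dummy interactions (zero before exposure)."""
    return (beta_k + shift_k) / -(beta_cost + shift_cost)

# Blu-Ray from Table 23.1: beta = 0.352, price beta = -0.097 (per $100).
print(100 * wtp(0.352, -0.097))                 # ~ $363 preinformation
print(100 * wtp(0.352, -0.097, 0.478, -0.021))  # ~ $703 postinformation
# Table 23.1 reports $362.60 and $701.92, computed from unrounded estimates.
```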
23.4. Results

23.4.1. Sample Descriptives

The sample consisted of an equal number of males and females and represented a wide range of respondent groups: 35.7% were aged 35 or under, 46.7% between 35 and 55, and 17.6% were over 55 years old. Regarding education, 35.5% had completed at least 1 year of tertiary education. Also relevant is the proportion of respondents who indicated, at the end of the survey, whether they had been aware of the two disk technologies before participating in the survey. For HD-DVD, 68.1% reported they already knew this technology existed; for Blu-Ray this percentage was only 33.2%. Hence we regard Blu-Ray as the product attribute with which respondents were least familiar.
23.4.2. Shifts in Attribute Importance

Our first hypothesis was that the new product attribute would become more important, while the other attributes would become less important, as respondents received background information about the new product. To test for differences in attribute preferences before and after receiving additional product information, we estimated logit models that included the attribute parameters and, in addition, "before-after-dummy"-by-attribute interactions to pick up differences in constants and attribute effects between the choice sets before and after the background information was provided. Price was included as a
single numerical price variable, relative to the base price of Aus$600, which was the bridging level across the two experimental designs incorporated in the one master design.

We first estimated a conditional model with all attribute effects and all their interactions with a generic "before-after" dummy (Table 23.1). The model fit is quite acceptable, with a pseudo-R² of 0.179 and a likelihood ratio (LR) χ² of 1890.24 (df = 27, p < 0.001). The "main" attribute effects in the model represent the effects across all choice sets after controlling for the effect of information provision in the later choice sets, and so in fact represent the effects before information exposure. All attributes except two of the brand dummies are significant in the model, with effects in the expected direction.

Table 23.1: Parameter estimates and WTP values for the conditional logit model with dummy interactions for postinformation-provision attribute effects.

| Attribute    | Main effect coeff. | z     | Postinfo dummy coeff. | z     | Marginal WTP(a), preinfo | Marginal WTP(a), postinfo |
|--------------|--------------------|-------|-----------------------|-------|--------------------------|---------------------------|
| constant     | 0.506              | 3.67  | 0.531                 | 5.11  | $520.42                  | $876.60                   |
| price(b)     | -0.097             | -4.78 | -0.021                | -1.04 |                          |                           |
| brand1       | 0.127              | 1.16  | -0.243                | -1.81 | $130.38                  | -$98.20                   |
| brand2       | 0.504              | 5.02  | -0.330                | -2.65 | $518.61                  | $147.17                   |
| brand3       | 0.050              | 0.50  | -0.016                | -0.13 | $51.17                   | $28.55                    |
| blu_ray      | 0.352              | 4.18  | 0.478                 | 5.61  | $362.60                  | $701.92                   |
| hd_dvd       | 0.461              | 4.90  | 0.017                 | 0.17  | $474.64                  | $404.23                   |
| blu_hd       | -0.285             | -2.93 | na                    |       | -$292.99                 | -$240.60                  |
| combo        | 0.320              | 4.96  | -0.031                | -0.39 | $329.36                  | $244.29                   |
| ndisks       | 0.253              | 3.90  | -0.147                | -1.84 | $260.88                  | $90.28                    |
| fwire        | 0.225              | 3.48  | -0.123                | -1.55 | $231.72                  | $86.25                    |
| no hard disk | -1.223             | -8.18 | -0.174                | -0.95 | -$1,259.40               | -$1,180.92                |
| 200 GB hd    | 0.424              | 5.54  | -0.015                | -0.16 | $436.45                  | $345.41                   |
| warranty     | 0.248              | 7.64  | -0.257                | -3.23 | $254.87                  | -$7.67                    |

Model stats: n = 14,457; LR χ² (df = 27) = 1890.24; Prob > χ² = 0.000; Log likelihood = -4349.09; Pseudo-R² = 0.1785.

Abbreviations: WTP, willingness-to-pay; LR, likelihood ratio. (a) Negative WTP values appear if the base level of an attribute (e.g., a 100 GB hard disk) is perceived as having more value than the coded attribute level (e.g., no hard disk). (b) Price in hundreds of dollars.
The last columns give the marginal WTP values for each attribute before and after information provision. The largest effect before information provision is observed for the presence of a hard disk (of 100 GB), with an estimated marginal WTP of $1259 compared to not having a hard disk. Having a 200 GB instead of a 100 GB hard disk adds another $436. The next largest effects are for brand2 ($518 relative to brand4) and the availability of HD-DVD disk technology ($475 relative to standard disk technology). Blu-Ray also has a large effect but, at $362, has a substantially smaller WTP than HD-DVD. The significant negative interaction between Blu-Ray and HD-DVD indicates a diminishing utility if a recorder is presented as being able to use both Blu-Ray and HD-DVD disks; the reduction in WTP is $293. The WTP for having a 3-year instead of a 1-year warranty is $255. The smallest effects are observed for the other two technical features, the number of disks and the availability of firewire. These effects are all roughly as expected, though it is interesting that they indicate that respondents were not willing to pay extra for the flexibility of having the equipment to use both competing new disk technologies in one device.

Looking next at the dummy indicators representing attribute utility shifts due to the information conditions, the table shows, first, a significant increase in overall willingness to pay (see the marginal WTP for the constant) after receiving the extra product information; respondents apparently felt more ready to accept or purchase the product after receiving this extra information. This increase, however, should not be judged in isolation, as there are several significant before-after-dummy interactions with attributes. As expected, apart from those for the Blu-Ray and HD-DVD attributes, all of these interactions are negative, although not all are statistically significant. The largest decreases in WTP are observed for two brands, with brand2 dropping from $518 to $147 and brand1 from $130 to -$98 (relative to brand4), and for warranty (from $255 to -$8). The main increase is for Blu-Ray: after receiving product information, respondents' WTP for this feature almost doubles, from $363 to $702. There was no increase for the HD-DVD technology. So the new information explaining the nature of DVD recorders and their new technology features was very influential in increasing interest in Blu-Ray but not in HD-DVD. This is consistent with our sample statistics, which showed that only one-third of our respondents had ever heard of Blu-Ray, while two-thirds had heard of HD-DVD.

These findings overall confirm our first hypothesis, with additional information about the new attribute making that attribute more important while decreasing the valuation of other attributes, in particular the valuation of extrinsic attributes such as brand names and warranty.
23.4.3. Testing for Scale Differences due to Information Provision and Fatigue

We next tested for scale differences between the choice sets before and after the provision of background information by estimating the same utility specification as
before, but now as a heteroscedastic logit model. The resulting model is shown in Table 23.2. The model shows a significant improvement in fit (LR χ² = 6.7, df = 1, p < 0.01) and a significant and negative scale parameter (γ = −0.797), indicating that the error variance after information provision has increased. However, compared with the conditional logit model there are now fewer significant attribute-by-information-provision interaction dummies. The Blu-Ray effect, however, has increased from $339 to $510. The information effect on the valuation of the availability of a hard disk has increased and is now estimated as a difference of $463 (p < 0.10). Neither the brand nor the warranty effects are any longer significant. It appears that after accommodating differences in model error variance the model no longer has the power to reliably detect these effects.

Table 23.2: Parameter estimates and WTP values for heteroscedastic logit model with dummy interactions for postinformation-provision attribute effect.

Attribute | Main effect: coeff. (z) | Postinformation dummy: coeff. (z) | Preinformation marginal WTP | Postinformation marginal WTP
constant | 0.839 (3.94) | 1.311 (2.36) | $606.87 | $1,010.99
price | 0.138 (5.33) | 0.074 (1.58) | – | –
brand1 | 0.129 (1.18) | −0.383 (1.73) | $93.06 | −$119.65
brand2 | 0.489 (4.87) | −0.088 (0.37) | $353.72 | $188.62
brand3 | 0.093 (0.92) | −0.069 (0.37) | $67.60 | $11.41
blu_ray | 0.433 (4.28) | 1.318 (2.47) | $313.33 | $823.34
hd_dvd | 0.538 (5.00) | 0.430 (1.44) | $389.00 | $454.94
blu_hd | 0.435 (2.96) | na | $314.32 | $204.36
combo | 0.301 (4.64) | 0.365 (1.35) | $217.72 | $313.10
ndisc | 0.059 (3.60) | 0.022 (0.14) | $42.56 | $38.11
fwire | 0.210 (3.25) | 0.041 (0.26) | $152.22 | $118.43
no hard disk | 1.309 (8.42) | 1.688 (1.74) | $946.47 | $1,409.29
200 Gb hd | 0.412 (5.37) | 0.528 (1.44) | $297.68 | $441.90
warranty | 0.237 (7.26) | 0.083 (0.35) | $171.37 | $150.42

Scale: γ [= ln(μ)] postinformation = −0.797 (z = 2.38).

Model stats: n = 14457; LR χ² (df = 1) = 6.7; Prob > χ² = 0.0096; Log-likelihood = −4345.74; Pseudo-R² = 0.1792.

Abbreviations: WTP, willingness-to-pay; LR, likelihood ratio.
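The role of the scale parameter can be made concrete with a short sketch (our own, not from the chapter): in a heteroscedastic logit, the systematic utilities of the post-information choice sets are multiplied by μ = exp(γ), so a negative γ flattens the choice probabilities, which is observationally equivalent to a larger error variance. The utility values below are hypothetical.

```python
import numpy as np

def choice_probs(v, gamma, post):
    """Heteroscedastic logit choice probabilities: systematic utilities are
    scaled by mu = exp(gamma * post); a negative gamma for post-information
    choice sets implies a smaller scale, i.e. larger error variance and
    flatter choice probabilities."""
    mu = np.exp(gamma * post)
    ev = np.exp(mu * v - np.max(mu * v))  # subtract max for numerical stability
    return ev / ev.sum()

v = np.array([0.8, 0.2, -0.3])                # illustrative utilities
print(choice_probs(v, gamma=-0.797, post=0))  # pre-information: sharper
print(choice_probs(v, gamma=-0.797, post=1))  # post-information: noisier
```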
Therefore, although this finding confirms our second hypothesis, it somewhat undermines our earlier conclusion about the first hypothesis. An additional factor influencing error variance is respondent fatigue. As discussed, fatigue may result in either larger or smaller levels of error variance in later choice sets. To assess to what extent the results of our study are affected by fatigue, we compared the four choice sets administered immediately after the information briefs with the four choice sets that followed these. We tested for scale differences between the fifth to eighth and the ninth to twelfth choice sets, but there was no statistically significant difference in scale between these positions in the sequence of choice sets (LR χ² = 1.43, df = 1, p = 0.232, n.s.). This provides further evidence that the scale effect can genuinely be attributed to the provision of new attribute information.
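The scale comparisons reported in this section are standard likelihood-ratio tests; a hedged sketch is given below. The log-likelihood values are illustrative placeholders, not the estimates from this study.

```python
# Sketch of the likelihood-ratio tests used above: twice the difference in
# log-likelihoods is compared against a chi-squared distribution with df
# equal to the number of extra parameters (here, one scale parameter).
from scipy.stats import chi2

def lr_test(ll_restricted, ll_unrestricted, df):
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, chi2.sf(stat, df)

stat, p = lr_test(-4346.5, -4345.8, df=1)     # illustrative values
print(f"LR chi2 = {stat:.2f}, p = {p:.3f}")   # n.s., as for the fatigue test
```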
23.4.4. Effects of Specific Information Sources

So far we have not accounted for the fact that different respondents received different information briefs. To accommodate these differences in the model we replace the ``generic'' information-by-attribute dummy interactions with interactions that capture the effects of each specific type of information source. We limit this extension to the new technology attributes and warranty (further testing showed no effects for any of the other attributes). The resulting model (Table 23.3) has a significantly improved fit compared to the conditional logit above (LR χ² = 45.11, df = 5, p < 0.0001). The model shows a significant information effect of the Blu-Ray-supportive brief and of the general ``neutral'' brief on the Blu-Ray attribute, and a marginally significant effect of the HD-DVD-supportive brief on the HD-DVD attribute. There are no effects for warranty. The scale parameter is still significant and is even slightly more negative than in the previous model (γ = −0.842). This means that taking into account the heterogeneity induced by the differences between the information briefs, in terms of whether they support Blu-Ray or HD-DVD, does not reduce the increased heterogeneity caused by the information exposure. It thus appears that information provision in general leads to higher levels of model error variance, caused either by true random error or by increased respondent heterogeneity.
23.5. Conclusion and Discussion

This paper has presented the results of a stated preference study in which respondents received additional background information about a product and its attributes between completing two equivalent choice tasks. The information consisted of product reviews that were positive about one of two competing new DVD disk technologies, or that were neutral and did not recommend either of the two disk technologies.
Table 23.3: Parameter estimates and WTP values for heteroscedastic logit model with dummy interactions including specific postinformation-provision attribute effects.

Attribute | Main effect: coeff. (z) | Postinformation dummy: coeff. (z) | Preinformation marginal WTP | Postinformation marginal WTP
constant | 0.843 (3.96) | 1.416 (2.36) | $609.10 | $1,000.56
price | 0.138 (5.33) | 0.087 (1.58) | $100.00 | –
brand1 | 0.128 (1.18) | −0.416 (1.73) | $92.80 | −$127.28
brand2 | 0.489 (4.87) | −0.096 (0.37) | $353.14 | $173.91
brand3 | 0.094 (0.93) | −0.098 (0.37) | $67.74 | −$1.96
blu_ray | 0.430 (4.22) | 0.00 (na) | $310.48 | $190.32
hd_dvd | 0.535 (4.95) | 0.00 (na) | $386.29 | $236.80
blu_hd | 0.428 (2.87) | 0.00 (na) | $308.93 | $189.37
blu_ray × HD brief | 0.000 (na) | 0.679 (1.42) | $0.00 | $300.45
blu_ray × BR brief | 0.000 (na) | 2.097 (2.05) | $0.00 | $928.54
blu_ray × Neutral brief | 0.000 (na) | 1.417 (1.85) | $0.00 | $627.52
hd_dvd × HD brief | 0.000 (na) | 1.052 (1.59) | $0.00 | $465.85
hd_dvd × BR brief | 0.000 (na) | 0.155 (0.14) | $0.00 | $68.41
hd_dvd × Neutral brief | 0.000 (na) | 0.559 (1.10) | $0.00 | $247.58
combo | 0.000 (4.64) | 0.414 (0.20) | $0.00 | $183.40
ndisc | 0.059 (3.60) | 0.037 (1.30) | $42.49 | $42.42
fwire | 0.210 (3.24) | 0.066 (0.30) | $151.89 | $122.15
no hard disk | 1.311 (8.43) | 1.847 (0.43) | $946.78 | $1,398.34
200 Gb hd | 0.412 (5.37) | 0.588 (1.59) | $297.37 | $442.62
warranty | 0.237 (7.25) | 0.119 (1.34) | $171.06 | $157.59

Scale: γ [= ln(μ)] postinformation = −0.842 (z = 2.41).

Model stats: n = 14457; LR χ² (df = 1) = 7.16; Prob > χ² = 0.0074; Log-likelihood = −4326.53; Pseudo-R² = 0.1828.

Abbreviations: WTP, willingness-to-pay; LR, likelihood ratio.
Our prediction was that as respondents become better informed about a new attribute, not only is the attribute more likely to influence their choices, but the information will also decrease the effect of other attributes. The latter was expected to occur especially for extrinsic attributes such as warranties and brand names. Our results confirmed this prediction in that the parameters, and WTP, for the most prominent brand name as well as for product warranties decreased after respondents received the extra information.
We also predicted that, with a better understanding of the meaning of a new attribute, respondents would be better able to act on their individual preferences and, consequently, that there would be greater variety in preferences, resulting in a greater level of error variance if an aggregate model is estimated. This prediction was also confirmed: we observed a significant shift in scale after the extra information had been provided. This shift was unlikely to be due to fatigue, as suggested in the literature, since we found no evidence of scale differences between choice sets at different stages after the extra attribute information had been received. Nor was it caused by the different information brief versions used in this study: even though the different briefs had different effects on technology preference, accounting for these effects did not help reduce the observed scale differences. The results therefore suggest that, in addition to the observed shift in attribute preferences, the information provision resulted in an increase in preference heterogeneity and a corresponding increase in the error variance of the estimated aggregate model.

Future research may avoid some of the limitations of the present study. Different media of information delivery could be tested. Our study used product reviews as they can appear in newspapers and consumer magazines; it will be interesting to test whether other information sources, such as word-of-mouth or sales people, have similar effects and, more generally, whether variations in the credibility of the information source affect the estimates. This study was only a first attempt to model the effects of providing information about new products on choice model estimates. In our ongoing work, we aim to assess more widely how choice models can be enriched if respondents receive extra information about product or context attributes, building on work conducted in the ``information acceleration'' paradigm of Urban et al. (1990, 1997). Our study also provided only verbal information. An ongoing issue in choice experimental research is whether, how and when to provide nonverbal (imagery) information and to what extent this will allow respondents to better articulate their preferences as they would operate in ``real'' choice situations. Finally, we hypothesized an increase in error variance caused by increasing preference heterogeneity but did not explicitly model this heterogeneity. Latent class analysis could be applied to investigate this issue (cf. Boxall & Adamowicz, 2002; Swait & Adamowicz, 2001a).

The main implication of our findings is that if respondents are unfamiliar with an attribute, providing explanatory information about the attribute will not only result in parameter shifts for that particular attribute but will also affect the estimates for the remaining attributes. In addition, there is a substantive effect on the error variance if an aggregate model is estimated.
Acknowledgment This research project was funded by the Australian Research Council (Discovery Grant no. DP0450910).
References

Arentze, T., Borgers, A., Timmermans, H., & Del Mistro, R. (2003). Transport stated choice responses: Effects of task complexity, presentation format and literacy. Transportation Research E, 39, 229–244.
Ben-Akiva, M., & Boccara, B. (1995). Discrete choice models with latent choice sets. International Journal of Research in Marketing, 12(1), 9–24.
Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Boersch-Supan, A., Brownstone, D., Bunch, D. S., Daly, A., De Palma, A., Gopinath, D., Karlstrom, A., & Munizaga, M. A. (2002). Hybrid choice models: Progress and challenges. Marketing Letters, 13(3), 163–175.
Bjornstad, D., Cummings, R., & Osborne, L. (1997). A learning design for reducing hypothetical bias in the contingent valuation method. Environmental and Resource Economics, 10(3), 207–221.
Boxall, P. C., & Adamowicz, W. L. (2002). Understanding heterogeneous preferences in random utility models: A latent class approach. Environmental and Resource Economics, 23(4), 421–466.
Bradley, M., & Daly, A. (1994). Use of the logit scaling approach to test for rank-order and fatigue effects in stated preference data. Transportation, 21, 167–184.
Caussade, S., de Dios Ortúzar, J., Rizzi, L. I., & Hensher, D. A. (2005). Assessing the influence of design dimensions on stated choice experiment estimates. Transportation Research B, 39(7), 621–640.
Cordell, V. (1997). Consumer knowledge measures as predictors in product evaluation. Psychology and Marketing, 14(3), 241–260.
Crouch, G. I., Devinney, T. M., Louviere, J., & Islam, T. (2009). Modelling consumer choice behaviour in space tourism. Tourism Management, 30, 441–454.
DeShazo, J. R., & Fermo, G. (2002). Designing choice sets for stated preference methods: The effects of complexity on choice consistency. Journal of Environmental Economics and Management, 44(1), 123–143.
Hensher, D. A., Rose, J., & Greene, W. H. (2005). Applied choice analysis: A primer. Cambridge: Cambridge University Press.
Hole, A. R. (2006). Small-sample properties of tests for heteroscedasticity in the conditional logit model. Economics Bulletin, 3(18), 1–14.
Huber, J., Ariely, D., & Fischer, G. (2002). Expressing preferences in a principal-agent task: A comparison of choice, rating and matching. Organizational Behavior and Human Decision Processes, 87(1), 66–90.
Huber, J., Wittink, D. R., Johnson, R. M., & Miller, R. (1992). Learning effects in preference tasks: Choice-based versus standard conjoint. In: 1992 Sawtooth Software Conference Proceedings, Sawtooth Software, Inc., Ketchum, ID.
Keller, K. L., & Staelin, R. (1987). Effects of quality and quantity of information on decision effectiveness. Journal of Consumer Research, 14(2), 200–213.
Keller, K. L., & Staelin, R. (1989). Assessing biases in measuring decision effectiveness and information overload. Journal of Consumer Research, 15(4), 504–508.
Liechty, J. C., Fong, D. K. H., & DeSarbo, W. (2005). Dynamic models with individual level heterogeneity: Utility evolution in conjoint analysis. Marketing Science, 24(2), 285–293.
Louviere, J. J. (2001). What if consumer experiments impact variances as well as means? Response variability as a behavioral phenomenon. Journal of Consumer Research, 28(3), 506–511.
Louviere, J. J., Hensher, D. A., & Swait, J. D. (2000). Stated choice methods: Analysis and application. Cambridge, UK: Cambridge University Press.
Lussier, D. A., & Olshavsky, R. W. (1979). Task complexity and contingent processing in brand choice. Journal of Consumer Research, 6(2), 154–165.
Malhotra, N. K. (1982). Information load and consumer decision making. Journal of Consumer Research, 8(4), 419–430.
Mazzotta, M. J., & Opaluch, J. J. (1995). Decision making when choices are complex: A test of Heiner's hypothesis. Land Economics, 71(4), 500–515.
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete responses. Journal of Applied Econometrics, 15(5), 447–470.
Payne, J. W. (1976). Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16(2), 366–387.
Purohit, D., & Srivastava, J. (2001). Effect of manufacturer reputation, retailer reputation, and product warranty on consumer judgments of product quality: A cue diagnosticity framework. Journal of Consumer Psychology, 10(3), 123–134.
Rao, A. R., & Monroe, K. B. (1988). The moderating effect of prior knowledge on cue utilization in product evaluations. Journal of Consumer Research, 15(2), 253–264.
Revelt, D., & Train, K. E. (1998). Mixed logit with repeated choices: Households' choices of appliance efficiency level. The Review of Economics and Statistics, 80(4), 647–657.
Savage, S., & Waldman, D. M. (2008). Learning and fatigue during choice experiments: A comparison of online and mail survey modes. Journal of Applied Econometrics, 23, 351–371.
Sonnier, G., Ainslie, A., & Otter, T. (2007). Heterogeneity distributions of willingness-to-pay in choice models. Quantitative Marketing and Economics, 5(3), 313–331.
Street, D., & Burgess, L. (2007). The construction of optimal stated choice experiments: Theory and methods. Hoboken, NJ: Wiley.
Suri, R., & Monroe, K. B. (2003). The effects of time constraints on consumers' judgements of prices and products. Journal of Consumer Research, 30(1), 92–104.
Swait, J. (2001). A non-compensatory choice model incorporating attribute cutoffs. Transportation Research B, 35(10), 903–928.
Swait, J., & Adamowicz, W. (2001a). Choice environment, market complexity and consumer behavior: A theoretical and empirical approach for incorporating decision complexity into models of consumer choice. Organizational Behavior and Human Decision Processes, 86, 141–167.
Swait, J., & Adamowicz, W. (2001b). The influence of task complexity on consumer choice: A latent class model of decision strategy switching. Journal of Consumer Research, 28, 135–148.
Swait, J., & Ben-Akiva, M. (1987). Incorporating random constraints in discrete choice models of choice set generation. Transportation Research B, 21(2), 91–102.
Swait, J., & Louviere, J. J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30(3), 305–314.
Tversky, A., & Shafir, E. (1992). Choice under conflict: The dynamics of deferred decision. Psychological Science, 3(6), 358–361.
Urban, G. L., Hauser, J. R., Qualls, W. J., Weinberg, B. D., Bohlmann, J. D., & Chicos, R. A. (1997). Information acceleration: Validation and lessons from the field. Journal of Marketing Research, 34(1), 143–153.
Urban, G. L., Hauser, J. R., & Roberts, J. H. (1990). Prelaunch forecasting of new automobiles. Management Science, 36(4), 401–421.
Chapter 24
Investigating Willingness to Pay–Willingness to Accept Asymmetry in Choice Experiments Bruno Lanz, Allan Provins, Ian J. Bateman, Riccardo Scarpa, Ken Willis and Ece Ozdemiroglu
Abstract

We investigate discrepancies between willingness to pay (WTP) and willingness to accept (WTA) in the context of a stated choice experiment. Using data on customer preferences for water services, where respondents were able both to 'sell' and to 'buy' the choice experiment attributes, we find evidence of non-linearity in the underlying utility function even though the range of attribute levels is relatively small. Our results reveal the presence of significant loss aversion in all the attributes, including price. We find the WTP–WTA schedule to be asymmetric around the current provision level, and the WTP–WTA ratio to vary according to the particular provision change under consideration. Such reference point findings are of direct importance for practitioners and decision-makers using choice experiments for economic appraisal such as cost–benefit analysis, where failure to account for non-linearity in welfare estimates may significantly over-state individuals' preferences for gains and under-state their preferences for avoiding losses.
24.1. Introduction

24.1.1. Willingness to Pay–Willingness to Accept Asymmetry

Asymmetry between willingness to pay (WTP) and willingness to accept (WTA) is one of the most documented phenomena in the empirical literature. The basic finding is that
while standard (Hicksian) economic theory allows for a small difference between WTP for a unit gain and WTA for a unit loss (Hicks, 1943; Randall & Stoll, 1980), numerous empirical investigations have observed discrepancies that appear significantly larger than predicted by theory, with losses being valued substantially more than gains.1 Indeed, as documented by Horowitz and McConnell (2002), the empirical evidence is pervasive and gain–loss effects have been observed for a wide variety of economic goods. This includes goods traded in formal markets as well as commodities that are non-market and/or public goods in nature, in real, hypothetical and experimental settings. In particular, the disparity in terms of the ratio of WTP to WTA is found to increase the further the good is from 'an ordinary private good' (Horowitz & McConnell, 2002).

A variety of explanations for WTP–WTA asymmetry have been proffered. Explanations from standard economic theory emphasise the role of substitution and income effects in exacerbating the WTP–WTA disparity (Hanemann, 1991, 1999). Indeed, disposable income constrains demand for a good in terms of WTP, but not the demand for compensation relevant for WTA. In addition, low substitutability of a good might imply that it is not possible to compensate an individual for its loss, possibly giving rise to extreme WTA values. Hence observations of WTP–WTA disparity for public goods, where private consumption goods are imperfect substitutes, may not be 'anomalous' even within the standard model. Sugden (1999), however, contends that the degree of asymmetry observed in empirical studies would have to be generated by implausible levels of income and substitution effects. In fact, the most commonly cited explanation for WTP–WTA asymmetry is 'loss aversion', an insight from the psychological literature. This is formalised in the theory of reference-dependent utility (Kahneman & Tversky, 1979; Tversky & Kahneman, 1991), which has been shown to be a better predictor of behaviour than the standard economic model (Bateman, Munro, Rhodes, Starmer, & Sugden, 1997). The central premise is that preferences are defined in relation to 'reference states' and will vary according to whether a gain or a loss is assessed. In this setting, loss-averse decision-makers will perceive losses as larger than gains of the same magnitude, with the WTP–WTA disparity arising because these valuations are formed relative to different reference states.
24.1.2. Stated Choice Experiments With regards to public and non-market goods, the bulk of WTP–WTA asymmetry evidence derives from contingent valuation (CV) studies (see e.g.
1. Relevant references are numerous. Some classics include Knetsch and Sinden (1984), Knetsch (1989) and Kahneman, Knetsch, and Thaler (1990). The literature uses a variety of terminology, including 'loss aversion', the 'endowment effect' and 'status quo bias', which all refer essentially to the same behavioural observation (Hanemann, 1999).
Brookshire & Coursey, 1987; Van Kooten & Schmitz, 1992; Bowker & MacDonald, 1993). In this survey-based economic valuation method, respondents are invited to participate in a hypothetical market for a given good, thereby revealing their valuation of the good. To some extent, however, the CV approach is limited. In particular, the analyst is only able to observe WTP–WTA asymmetry for the proposed good, i.e. asking respondents their WTP for a discrete change from 'A to B' and their WTA for the change from 'B to A' and then comparing the elicited values.

A potentially richer setting in which to assess WTP–WTA disparity for public and non-market goods is provided by stated choice experiments. Here, it is assumed that the utility an individual derives from a good is a function of the characteristics of the good. A stated choice experiment mirrors this process: survey respondents are presented with sets of alternative combinations of attributes (or characteristics) and are asked to choose their most preferred bundle. Repeated choices by respondents from sets of alternatives reveal the trade-offs individuals are willing to make between attributes and hence their utility mapping of independent changes in each attribute. If one of these characteristics is the price of the bundle, a monetary value can be inferred for unit changes in each attribute. Hence the stated choice experiment approach enables numerous discrete changes to be presented to respondents, from which a complete WTP–WTA schedule may be inferred.

In this paper we empirically investigate WTP–WTA asymmetry within a stated choice experiment frame, thereby contributing to the growing literature in this area (see e.g. Hess, Rose, & Hensher, 2008; Grutters et al., 2008; Bateman, Day, Jones, & Jude, 2009). The data for our investigation are taken from a survey of water company customers in North West England, eliciting preferences for changes to water, wastewater and environmental service levels. The survey instrument allowed respondents to trade both improvements and deteriorations in the levels of attributes against the status quo (SQ), entitling them both to 'buy' and to 'sell' the attributes. By estimating model specifications in which marginal utility is allowed to vary with the change in each attribute, we account for the possibility that deteriorations may be valued differently from improvements, and also that marginal changes near the reference point may be given a higher value than subsequent increments (see Tversky & Kahneman, 1991). This is particularly important where welfare estimates from choice experiment studies (WTP and WTA values) are used to inform decision-making via economic appraisal tools such as cost–benefit analysis. Failure to account for potential non-linearities could result in significant under- or over-estimates of the value of outcomes associated with policies and projects that are subject to appraisal.

Importantly, we consider the utility of the monetary attribute as potentially non-linear, which permits investigation of some related issues. First, Munro and Sugden (2003) conjecture that money is valued as any other good, so that money outlays are perceived as losses rather than as foregone gains (see also Bateman et al., 1997; Bateman, Kahneman, Munro, Starmer, & Sugden, 2005). If this effect is empirically significant, it follows that the WTP–WTA disparity will also hinge upon non-linearity in the utility of the price attribute.
Second, while the changes in wealth are typically small relative to disposable income, some evidence suggests that respondents
might not make choices with the entirety of their income in mind. In particular, 'mental accounting' theories contend that individuals compartmentalise their budgets into a series of headings or 'mental accounts' and tend not to transfer money across these compartments (see e.g. Thaler, 1999). In effect, individuals' budgets for different categories of discretionary purchases, such as those that are the subject of stated choice experiments, are much smaller than their total income, and hence income constraint effects are much more likely to occur. The remainder of this paper is structured as follows. Section 24.2 provides brief details of the study from which our data are derived. Section 24.3 establishes the basis for non-linear specifications of the utility function and the econometric estimation. Welfare estimates and WTP–WTA asymmetry results are reported in Section 24.4. Finally, discussion and conclusions are offered in Section 24.5.
24.2. Study Design

24.2.1. Background — Water Company Services and Customer Preferences

Water supply companies in most OECD countries provide integrated services in the sense that their operations relate to the whole cycle of water use: from interception, to storage, to residential and industrial supply, to the collection of wastewater, its treatment and its discharge back to the environment. Public regulators oversee their operations and negotiate with the companies the terms of service concession. In such a regulated market, customer preferences do not have the opportunity to be expressed through competitive market signals such as prices varying in response to quantitative or qualitative product differences. Choice experiments therefore find in this context an ideal setting for the exploration and quantitative analysis of water customer preferences.

In the UK, the Water Services Regulation Authority (Ofwat) is the public regulator responsible for the oversight of water companies and periodically revises water tariffs. Following the stated preference study by Yorkshire Water during the 2004 Price Review (see Willis, Scarpa, & Acutt, 2005), Ofwat has placed greater emphasis on justifying investment proposals through the application of cost–benefit analysis (Ofwat, 2007). The survey underlying the present analysis was undertaken for United Utilities Water (UUW) in North West England, and was designed to inform investment planning for the 2009 Price Review.
24.2.2. Choice Experiment Attributes

In conjunction with UUW, and following a series of focus groups to refine definitions and descriptions, 14 service attributes were identified for inclusion in the survey. Both improvements and deteriorations in services were considered (within the regulatory constraints). To generate manageable choices for the respondents, the service attributes were split into four blocks, which in effect gives four separate choice experiment exercises.
Table 24.1: Choice experiment attributes and levels.

Attribute (Label) | Definition | L−2 | L−1 | SQ | L1 | L2 | L3 | L4
External flooding from sewers — remote from properties (EX1) | Each year the expected number of sewage flooding incidents in public places | – | 1875 | 1500 | 750 | 375 | – | –
External flooding from sewers — adjacent to properties (EX2) | Each year the expected number of sewage flooding incidents in gardens or close to homes | – | 2500 | 2000 | 1000 | 500 | – | –
Internal flooding from sewers — below ground level (IN1) | Each year the expected number of properties flooded in basement/cellar areas only by untreated sewage | – | 800 | 400 | 200 | 100 | – | –
Internal flooding from sewers — at ground level (IN2) | Each year the expected number of properties flooded in ground floor rooms by untreated sewage | – | 875 | 700 | 350 | 175 | – | –
Water bill (BIL) | Change in annual water bill from 2008 | −£15 | −£5 | £0 | +£5 | +£10 | +£15 | +£25
Two blocks of attributes related mainly to water service attributes (the supply and quality of water to customers), while the other two blocks included wastewater services (sewer flooding on the wastewater network) as well as local and natural environment quality as impacted by wastewater services. Within each block a price attribute was also specified, defined in terms of the increase or decrease in the annual water and wastewater bill paid by customers. In this paper we report on the analysis of only one of the service blocks, which focused on the location and severity of sewer flooding incidents.2 The attributes and levels are reported in Table 24.1.

Sewer flooding can arise from inadequate hydraulic capacity in sewers linked to heavy or prolonged rainfall, or from equipment failure, sewer collapses or blockages. This results in untreated sewage escaping from sewers and flooding both external locations (such as footpaths and roads, and also gardens and areas near properties) and internal areas (this can be below ground level
2. Analysis of further blocks of service attributes showed that the findings reported here are not dependent on the choice of a particular block.
in cellars and basements, or at ground level in living areas of properties). Incidents of sewer flooding are relatively infrequent in the UUW customer area; for instance, out of a customer base of 2.7 million households and 0.2 million business premises, the number of properties affected by internal sewer flooding is around 1000 per annum. Overall, the sewer flooding service attributes are characteristic of public goods, and this is emphasised by the attribute definitions, which describe the service level achieved across the water and wastewater network rather than the level of service received by a given customer. Four levels were specified for each service attribute on the basis of UUW's investment planning process. These included the SQ level, which is the level of service currently experienced by customers; an improved level of service (L1) concordant with increased investment; a further 'aspirational' improved level of service (L2) that represents the greatest improvement that could be achieved within the current constraints of investment planning; and a reduced level of service resulting from lower investment (L−1) which still meets the minimum standards required by the water industry regulator. The price attribute was specified over seven levels (L−2 to L4), including both increases and decreases in the customers' annual water bill.
24.2.3. Experimental Design

Separate experimental designs were generated for each of the four blocks of attributes. The choice experiment of interest here is composed of 84 choice cards. The design was informed by the analysis of responses to a pilot survey with a sample size of approximately 80 respondents, which evidenced a propensity of respondents to select the SQ option and provided a range of values for the price coefficient. This information was used to generate a design focused on increasing the efficiency of the utility coefficient estimates, so-called 'D-efficiency' (Ferrini & Scarpa, 2007). To accommodate uncertainty about the utility coefficients at the design derivation stage, a Bayesian criterion was applied, assuming a uniform distribution of values for the SQ and price coefficients (Scarpa & Rose, 2008).
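As an illustration of the design criterion just described, the sketch below evaluates the Bayesian D-error of a candidate MNL design by averaging over draws from a prior on the coefficients. It is our own sketch, not the study's design code; the toy design array, the uniform prior bounds and all function names are assumptions.

```python
import numpy as np

def mnl_info_matrix(X, beta):
    """Fisher information of an MNL design. X has shape (S, J, K): S choice
    sets, J alternatives, K attributes; beta is the (K,) coefficient vector."""
    K = X.shape[2]
    info = np.zeros((K, K))
    for Xs in X:
        v = Xs @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        Z = Xs - p @ Xs                 # deviations from probability-weighted mean
        info += Z.T @ (p[:, None] * Z)  # sum_j p_j (x_j - xbar)(x_j - xbar)'
    return info

def bayesian_d_error(X, prior_draws):
    """Average D-error, det(I)^(-1/K), over prior draws (lower is better)."""
    K = X.shape[2]
    errs = [np.linalg.det(mnl_info_matrix(X, b)) ** (-1.0 / K) for b in prior_draws]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
design = rng.choice([-1.0, 0.0, 1.0], size=(8, 3, 4))  # toy design: 8 sets, 3 alts, 4 attrs
draws = rng.uniform(-1.0, 1.0, size=(200, 4))          # uniform prior, as described above
print(bayesian_d_error(design, draws))
```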
24.2.4. Sample

The full survey comprised a random sample of 2107 residential customers served by UUW, reflecting the population distribution of the customer base. The sample was largely representative of the socio-economic profile of North West England. The stated choice questionnaire was administered face-to-face via computer-aided personal interview (CAPI) in respondents' homes by a market research firm. Each respondent was presented with two of the four choice experiment blocks, which were randomly allocated and ordered. For each block, respondents were shown eight choice cards. For the choice experiment concerned with the sewer flooding service attributes, this resulted in approximately 1000 respondents and over 8000 choice task observations.
24.3. Model Specification

24.3.1. Representation of Preferences

We employ the standard approach to analysing choice experiment data, based on random utility modelling (RUM), where individuals are assumed to select the choice alternative that yields the greatest expected utility to them. As customary (but not indispensable) in the RUM framework, it is assumed that the utility function is additive and separable in its arguments, and that its random component (or error term) is also additive.3 In this framework, the traditional approach to modelling the preferences of decision-makers is through a linear utility function. In particular, given a set of $K$ attributes, and denoting the level of the $k$th attribute by $X_k$, the utility of a given bundle of attributes can be written as:

$$U(X_1, \ldots, X_K) = \beta_1 X_1 + \cdots + \beta_K X_K$$

where the $\beta$'s measure the (constant) marginal utility of each attribute. Following the classic contributions by Kahneman and Tversky (1979) and Tversky and Kahneman (1991), we extend this simple representation of preferences by considering the properties of a reference-dependent utility function: (i) it is non-decreasing in its arguments; (ii) changes are evaluated relative to a reference point; (iii) it displays diminishing sensitivity, so that it is concave for gains and convex for losses, i.e. 's-shaped'; and (iv) a change of the same magnitude has a larger impact on the respondent's welfare if it is a deterioration than if it is an improvement. Formally, we define the utility derived from attribute $X_k$ as a non-decreasing function:

$$v(X_k) = \begin{cases} v^+(X_k) & \text{for } X_k \geq 0 \\ v^-(X_k) & \text{for } X_k < 0 \end{cases}$$

where $X_k$ is defined as the difference between the final endowment and the SQ level of the good, with $v(0) = 0$. Assuming the first and second derivatives $v'(X_k)$ and $v''(X_k)$ exist, this function has the following properties:

Loss aversion: $v(X_k) < -v(-X_k)$ and $v'(X_k) < v'(-X_k)$, $\forall X_k > 0$.
Diminishing sensitivity: $v''(X_k) \leq 0$, $\forall X_k > 0$, and $v''(X_k) \geq 0$, $\forall X_k < 0$.
3. Although not investigated in this paper, an alternative approach would be to directly assess nonlinearities in willingness to pay space (see e.g. Train & Weeks, 2005; Scarpa, Thiene, & Train, 2008; Fosgerau & Bierlaire, 2009).
For the purposes of our empirical investigation, we use two alternative functional forms from the empirical literature: (i) a piecewise-linear function and (ii) an exponential function.4 The piecewise-linear function is a direct extension of the linear utility function commonly used in discrete choice models, and represents a 'basic' non-linear formulation:

$$v(X_k) = \begin{cases} \beta^+ X_k, & X_k \geq 0 \\ \beta^- X_k, & X_k < 0 \end{cases}$$

with $\beta^- \geq \beta^+$. In the context of discrete choice modelling, Hess et al. (2008) apply this functional form to investigate gains–loss asymmetry in attributes of travel trip time. The piecewise-linear functional form can capture gains–loss asymmetry, but it rules out potential non-linearities within the gain and loss domains. This limitation is relaxed by the exponential function, while still nesting the linear model as a special case. Formally, the exponential specification can be written as:

$$v(X_k) = \begin{cases} \beta^+ \dfrac{1 - \exp(-\mu X_k)}{\mu}, & X_k \geq 0 \\[4pt] \beta^- \dfrac{\exp(\gamma X_k) - 1}{\gamma}, & X_k < 0 \end{cases}$$

where $0 < \mu \leq \gamma$ and $\beta^- \geq \beta^+$. For $\mu = \gamma \to 0$, this expression reduces to the piecewise-linear function, and if in addition $\beta^- = \beta^+$, we have the standard linear model. The case for applying an exponential formulation in a gain–loss asymmetry setting is supported by both normative (see e.g. Köbberling & Wakker, 2005) and empirical considerations (Stott, 2006).

In this setting, our objective is to compare the standard linear function to the two non-linear functional forms presented above, which permit the assessment of WTP–WTA discrepancies. The non-linear specifications are also applied to the price attribute, which can provide further insight as to the source of WTP–WTA asymmetry. Note that an important consequence of such preferences is that, unless the valuation function is symmetrical around the SQ, the possibility arises that the ratio of WTP to WTA will in part be dictated by the size of the change under consideration.
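A minimal sketch of the two value functions follows, assuming the parameter restrictions just stated ($\beta^- \geq \beta^+$ and $0 < \mu \leq \gamma$); the parameter values used are purely illustrative.

```python
import numpy as np

def v_piecewise(x, beta_plus, beta_minus):
    """Piecewise-linear reference-dependent value of a change x from the SQ."""
    return np.where(x >= 0, beta_plus * x, beta_minus * x)

def v_exponential(x, beta_plus, beta_minus, mu, gamma):
    """Exponential value function: concave for gains, convex for losses."""
    gains = beta_plus * (1.0 - np.exp(-mu * x)) / mu
    losses = beta_minus * (np.exp(gamma * x) - 1.0) / gamma
    return np.where(x >= 0, gains, losses)

x = np.linspace(-3, 3, 7)
print(v_piecewise(x, beta_plus=1.0, beta_minus=2.0))   # loss aversion: beta- >= beta+
print(v_exponential(x, 1.0, 2.0, mu=0.5, gamma=0.8))   # 0 < mu <= gamma
```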
4. Other non-linear specifications were tested, including the logarithmic utility function and the power (Cobb–Douglas) utility function. However, the logarithmic function does not allow the curvature of the utility to be specified and is ill-behaved between zero and one, whereas the power function entails the possibility that $v(X_k) > -v(-X_k)$ (see Köbberling & Wakker, 2005) and yielded implausible welfare estimates. For a comprehensive review of candidate functional forms, see Stott (2006).
24.3.2. Econometric Estimation

In estimating the non-linear functions, we first note that gains–loss asymmetry is typically characterised as a discontinuity in the function at the SQ attribute level (the 'reference point'). In this model, the perception of the current situation is therefore a central aspect of the respondents' valuation, providing a potentially important source of unobserved heterogeneity (Scarpa, Willis, & Acutt, 2007; Hess & Rose, 2009). Second, the parameters describing the effects of an attribute change in terms of utility gains and losses are potentially different. Such differences can manifest themselves in terms of magnitude (the $\beta$'s in the expressions above) and/or in terms of the curvature parameters (the $\gamma$'s and $\mu$'s) for the exponential functional form.

From the data for the four service attributes (IN1, IN2, EX1 and EX2) we observe four points on the utility function for each respondent: the SQ, which is given a value of zero; a reduced level of the attribute (L−1); and two increased levels of the attribute (L1 and L2, where L1 < L2). In this setting, it is possible to identify the curvature parameter for improvements in the level of attributes, but not for reductions. Since diminishing sensitivity to losses has proved to be an important aspect of individuals' decision-making (Tversky & Kahneman, 1991), we specify the utility in the loss space to be convex and set the value of the curvature parameter to be the same for gains and losses. As shown below, this approach permits flexibility in the estimation and conveniently captures potential gains–loss asymmetry. It is also supported by findings in the literature (see e.g. Tversky & Kahneman, 1992) and is consistent with the piecewise-linear model, which has the same curvature for gains and losses. For the price attribute (BIL), which is specified over a greater range of attribute levels (L−2 to L4), it is possible to retrieve the exponent parameter for both gains and losses.

The specification of the utility function used in the modelling is obtained by defining:

$$X_k^- = \begin{cases} X_k - X_k^{SQ} & \text{if } X_k < X_k^{SQ} \\ 0 & \text{otherwise} \end{cases}, \qquad X_k^+ = \begin{cases} X_k - X_k^{SQ} & \text{if } X_k \geq X_k^{SQ} \\ 0 & \text{otherwise} \end{cases}$$

so that the utility of a bundle of $K$ attributes and the price attribute $P$ is given by:

$$U_i(X_1, \ldots, X_K, P) = \sum_{k=1}^{K} \left[ \beta^-_{ki} v^-(X_k^-) + \beta^+_{ki} v^+(X_k^+) \right] + \beta^-_{Pi} v^-(P^-) + \beta^+_{Pi} v^+(P^+) + \epsilon_i$$

where $v^+(\cdot)$ and $v^-(\cdot)$ are one of the functional forms reviewed above, based on the utility mapping for improvements and deteriorations respectively, and $\epsilon_i$ is an idiosyncratic (random) error term, assumed to follow a Type I extreme value distribution. Information about gain–loss asymmetry is conveniently summarised in the difference between the parameters of the econometric model. For the non-price coefficients, $\beta^-_{ki} > \beta^+_{ki}$ would provide evidence for individual $i$ being loss averse with
respect to attribute $k$, since it would imply $v(X_k) < -v(-X_k)$ and $v'(X_k) < v'(-X_k)$, $\forall X_k > 0$. For the price coefficients, we need to account for differences in both the curvature and scale parameters.

To retrieve the parameters of interest, we use a mixed logit (MXL) model, which we estimate via simulated maximum likelihood (Train, 2003) assuming normally distributed coefficients,5 using robust standard errors, and 500 Halton draws to approximate the integral of the unconditional likelihood of each panel of choices. We thereby control for unobserved heterogeneity in the respondents' tastes by exploiting the panel structure of the collected data. As mentioned above, this is a potentially crucial aspect of the specification, since the decision-making process is modelled as relative to the individually experienced SQ. Direct estimation of the curvature parameters proved to be numerically infeasible when unobserved heterogeneity is accounted for, mainly because of the differing amounts of information gathered for gains and losses. Hence, in order to estimate the models with the exponential functional form, we use an auxiliary regression in which each attribute level is entered as a dummy variable. This corresponds to a 'non-parametric' utility function, since it does not impose a specific functional form on the choices made by the respondents (Hey & Orme, 1994). Because of the relatively large number of parameters to estimate (19 in total), the auxiliary regression is based on a multinomial logit (MNL) model. The estimates from the auxiliary model are then used to compute an approximation of the curvature coefficient to be used in the main MXL estimation.6 This first step also provides a descriptive assessment of the shape of respondent preferences.
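The gain/loss coding underlying this specification can be sketched briefly. The function below is our illustration of the $X^+$/$X^-$ definitions above, applied to the IN2 levels of Table 24.1 (SQ = 700 properties per year); note that for the flooding attributes a positive deviation (more incidents) is the deterioration side.

```python
import numpy as np

def split_gain_loss(X, X_sq):
    """Split attribute levels into deviations above and below the status quo:
    X+ = max(X - X_SQ, 0) and X- = min(X - X_SQ, 0)."""
    d = X - X_sq
    return np.maximum(d, 0.0), np.minimum(d, 0.0)

levels = np.array([875.0, 700.0, 350.0, 175.0])   # IN2 levels from Table 24.1
x_plus, x_minus = split_gain_loss(levels, 700.0)
# x_plus  -> [175.,   0.,    0.,    0.]   (deterioration side: more flooding)
# x_minus -> [  0.,   0., -350., -525.]   (improvement side: fewer incidents)
```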
24.4. Results

24.4.1. Auxiliary Regression — Non-Parametric Utility Function

Results from the auxiliary MNL model with dummy coding of attribute levels are presented in Table 24.2. Estimates of the curvature parameter for the exponential functional forms are also reported. The model's estimates should be interpreted as the value or welfare of each attribute's level, so that for non-price attributes any improvement level should display a positive value, while deterioration levels will be perceived negatively. On the other hand, the shift to a higher bill level is expected to have a negative impact on welfare, while the shift to a lower bill level should have a positive welfare impact.
5. Models estimated with a log-normal distribution instead yielded implausible welfare estimates, mainly because of imprecisely estimated standard deviation parameters.
6. For the exponential utility function, the curvature parameter is implicitly given by $\hat{v}(\Delta X_{L2})/\hat{v}(\Delta X_{L1}) = [1 - \exp(-\mu\,\Delta X_{L2})]/[1 - \exp(-\mu\,\Delta X_{L1})]$ and can be retrieved through the Newton–Raphson method, for example.
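Footnote 6's curvature retrieval can be sketched as follows. We use a bracketing root-finder rather than Newton–Raphson, and the level utilities and attribute changes in the example are hypothetical; a root exists only when the utility ratio lies strictly between 1 and the ratio of attribute changes.

```python
import numpy as np
from scipy.optimize import brentq

def solve_curvature(v_l1, v_l2, dx1, dx2):
    """Solve v_l2/v_l1 = [1 - exp(-mu*dx2)] / [1 - exp(-mu*dx1)] for mu.
    Identified only when 1 < v_l2/v_l1 < dx2/dx1, i.e. utility rising at
    a diminishing rate between the two improvement levels."""
    r = v_l2 / v_l1
    f = lambda mu: (1.0 - np.exp(-mu * dx2)) / (1.0 - np.exp(-mu * dx1)) - r
    return brentq(f, 1e-9, 1.0)  # bracket chosen for illustration

# Hypothetical level utilities and changes (not the paper's estimates):
print(solve_curvature(v_l1=0.25, v_l2=0.33, dx1=350.0, dx2=525.0))
```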
Table 24.2: MNL model with dummy coded variables.

Attribute | L−2 | L−1 | SQ | L1 | L2 | L3 | L4 | Curvature parameter
EX1 | – | −0.245*** (0.074) | – | 0.005 (0.066) | 0.184*** (0.070) | – | – | 0.15
EX2 | – | −0.499*** (0.076) | – | 0.254*** (0.065) | 0.205*** (0.077) | – | – | 0.25
IN1 | – | −0.473*** (0.078) | – | 0.313*** (0.059) | 0.374*** (0.073) | – | – | 0.05
IN2 | – | −0.499*** (0.071) | – | 0.326*** (0.057) | 0.264*** (0.065) | – | – | 0.05
BIL | 0.061 (0.079) | 0.037 (0.080) | – | −0.427*** (0.075) | −0.667*** (0.082) | −1.097*** (0.097) | −1.292*** (0.100) | 0.200 (a); 0.060 (b)
SQ | – | – | 0.520*** (0.102) | – | – | – | – | –

No. observations: 8328; No. respondents: 1041; Log-likelihood: −7801.73; AIC: 15641.46.

Notes: Standard errors in parentheses; statistical significance: *p < 0.1; **p < 0.05; ***p < 0.01.
(a) Curvature parameter for reductions in the price attribute (BIL).
(b) Curvature parameter for increases in the price attribute (BIL).
All estimated coefficients that are found to be statistically different from zero have the expected sign, so that reductions in internal and external sewer flooding incidents (L1 and L2) increase utility while more sewer flooding incidents (L−1) decrease utility. It is also found that for the service attributes EX2 and IN2 the utility of the highest improvement (L2) is lower in magnitude than the utility of the lower improvement level (L1). This finding indicates a potential inconsistency in choice patterns: since the Level 1 improvement is embedded within the Level 2 improvement, giving a larger weight to Level 1 violates the weak axiom of revealed preference. However, we note that the confidence intervals for these estimates overlap; i.e. the welfare attributed to the improvement levels is not statistically different across levels. Hence for these attributes the results suggest relatively strong non-linearities, where initial improvements are given a relatively high value as compared to subsequent incremental improvements.

Three coefficient estimates are not statistically significant at conventional significance levels in the MNL model: EX1 (L1), BIL (L−1) and BIL (L−2). As a consequence, the curvature parameter for the attribute EX1 is set somewhat arbitrarily. The finding with respect to the price attribute (BIL) implies that the proposed reductions in the water bill (L−1 and L−2) were not large enough to have an impact on respondents' welfare.7 In contrast, the pattern of estimates for increases in the price attribute indicates the disutility of increases in water bills as well as evidence of diminishing marginal sensitivity.

Generally the results for the dummy-coded MNL model agree with our expectations. External flooding close to properties (EX2) generates greater disutility than flooding in locations that are remote from properties (EX1), and internal flooding in living areas (IN2) generates greater disutility than flooding below ground level (IN1). Moreover, more incidents/properties are affected by EX2 and IN2 type flooding than by EX1 and IN1 type flooding. Hence we would expect the coefficient for EX2 to be greater than that for EX1, and the coefficient for IN2 to be greater than that for IN1. This is mostly what is observed: EX2 > EX1 for L1 and L2; IN2 > IN1 for L1 but not for L2; and similarly for losses, where EX2 > EX1 and IN2 > IN1.
24.4.2. Main Model Estimation — Welfare Estimates

Turning to the main estimation results, estimates from the MXL model are reported in Table 24.3. The coefficients refer to the mean and standard deviation (SD) of the effect of a unit change in each attribute. Hence the coefficient on the bill is expected to be negative in both the gain and loss domains (a marginally higher bill is a bad whether above or below the SQ), and the coefficients on the other attributes will be positive (marginally better service is a good).
7. We note that the actual attribute levels (£5 and £15 reductions in the water bill) represent roughly 1% and 4% reductions in the total bill (approximately £350 per annum for water and wastewater services).
Reviewing the summary statistics, we first note the large improvement in the log-likelihood and AIC criterion of the MXL model as compared to the MNL dummy-coded model, confirming the importance of unobserved heterogeneity8 and of accounting for the panel nature of the observed choices. Secondly, Table 24.3 shows that the non-linear specifications fit the data significantly better than the linear specification in terms of both log-likelihood and the information criterion. Furthermore, the exponential specification, which imposes diminishing marginal sensitivity, fits the data better than the piecewise-linear utility function, even though the difference in goodness-of-fit measures is relatively small. The inferiority of the linear model is also confirmed by the interpretation of the magnitude of the coefficients, as the results for both external flooding (EX2 < EX1) and internal flooding (IN2 < IN1) are at odds with expectations. Findings for the piecewise-linear and exponential utility models are in line with expectations, as we find that EX2 > EX1 for both gains and losses and IN2 > IN1 for losses. For internal flooding gains, the results indicate that IN2 < IN1, although the difference is not statistically significant (p < 0.01). In general, the magnitude of the difference is greater for the model with the exponential utility function, whereas coefficients are generally not statistically significantly different from each other in the piecewise-linear model.

As previously mentioned, an important aspect of our estimation is the perception of the SQ. Interestingly, we find that the magnitude of the coefficient capturing the utility of the SQ declines significantly when gains and losses are estimated separately, whereas the SD of the SQ variable remains roughly constant across models. This pattern indicates that the constraint imposed by a linear utility specification artificially inflates the welfare attributed to the SQ alternative (for a similar line of argument, see Scarpa, Ferrini, & Willis, 2005, 2007; Hess & Rose, 2009).

Turning to gains–loss asymmetry, we find that in almost all instances the absolute value of the coefficient estimates associated with losses is of greater magnitude than that for gains. There is also a tendency for the SD coefficient to be more pronounced for losses than for gains, implying greater heterogeneity in preferences in this regard. The price attribute (BIL) is also subject to gains–loss asymmetry, with increases in the bill inducing significant disutility, whereas a reduction in the bill is not found to have a statistically significant effect (at the mean of the distribution). However, there is evidence of significant heterogeneity surrounding the effect of the price attribute, so that some respondents were sensitive to decreases in their bill, but always to a much smaller extent than to an increase in the bill of the same magnitude.
8. As noted above, it was not feasible to estimate an MXL model with dummy-coded variables. We have, however, compared the results of MNL models across all specifications. Interestingly, the dummy-coded model generally outperforms the other specifications in terms of log-likelihood, but not once it is penalised for the number of variables included in the model. In fact, we find that the AIC is worst for the 'non-parametric' utility function as compared to the alternative specifications.
Table 24.3: MXL model with linear and non-linear specifications.

Model stats (all models: 8328 observations, 1041 respondents):
Linear function: Log-likelihood −7101.23; AIC 14226.47.
Piecewise-linear function: Log-likelihood −7025.93; AIC 14095.86.
Exponential function: Log-likelihood −7010.95; AIC 14065.90.

Attribute | Linear | Piecewise-linear: Reduction | Piecewise-linear: Increase | Exponential: Reduction | Exponential: Increase
EX1 Mean | 0.00046*** (0.00006) | 0.00159*** (0.00028) | 0.00020*** (0.00008) | 0.21252*** (0.03785) | 0.04056** (0.01646)
EX1 SD | 0.00034** (0.00017) | 0.00196*** (0.00057) | 0.00029 (0.00018) | 0.24013** (0.10605) | 0.05410 (0.05209)
EX2 Mean | 0.00045*** (0.000049) | 0.00162*** (0.00022) | 0.00031*** (0.00006) | 0.28996*** (0.03941) | 0.11998*** (0.02187)
EX2 SD | 0.00053*** (0.00008) | 0.00177*** (0.00037) | 0.00034** (0.00016) | 0.35697*** (0.06140) | 0.16896*** (0.04946)
IN1 Mean | 0.00156*** (0.00010) | 0.00174*** (0.00028) | 0.00160*** (0.00028) | 0.03607*** (0.00649) | 0.03438*** (0.00506)
IN1 SD | 0.00138*** (0.00015) | 0.00222*** (0.00036) | 0.00177*** (0.00050) | 0.04982*** (0.00936) | 0.03286*** (0.00994)
IN2 Mean | 0.00121*** (0.00008) | 0.00385*** (0.00060) | 0.00074*** (0.00015) | 0.05851*** (0.00900) | 0.01977*** (0.00365)
IN2 SD | 0.00119*** (0.00013) | 0.00402*** (0.00073) | 0.00143*** (0.00017) | 0.06354*** (0.01413) | 0.03419*** (0.00448)
BIL Mean | −0.04753*** (0.00327) | −0.00739 (0.00684) | −0.09341*** (0.00740) | −0.00084 (0.02141) | −0.16582*** (0.01201)
BIL SD | 0.06053*** (0.00394) | 0.08415*** (0.01354) | 0.09019*** (0.00686) | 0.24852*** (0.03654) | 0.15845*** (0.01141)
SQ Mean | 1.62298*** (0.07299) | 0.61753*** (0.11992) | – | 0.58207*** (0.13093) | –
SQ SD | 1.64420*** (0.06629) | 1.50382*** (0.07447) | – | 1.47929*** (0.08400) | –

Notes: Standard errors in parentheses; statistical significance: *p < 0.1; **p < 0.05; ***p < 0.01.
The extent of gains–loss asymmetry is illustrated graphically in Figures 24.1 and 24.2 for the attributes IN2 and BIL respectively. Graphs for EX1, EX2 and IN1 are reported in Appendix 24.A. Generally we observe that the linear model overestimates positive welfare effects for reductions in internal and external sewer flooding incidents and reductions in bill, and underestimates negative welfare effects of increased sewer flooding incidents and increased bills, particularly in relation to the better fitting exponential model.
Figure 24.1: Gains–loss asymmetry for internal sewer flooding (IN2).
Figure 24.2: Gains–loss asymmetry for the price attribute (BIL).
24.4.3. Comparing WTP and WTA

In the linear model, the individual-level marginal WTP for a service attribute is given by the marginal utility of the attribute divided by the negative of the price coefficient. By construction, it is constant for any size of improvement, since the marginal utility estimate for each attribute (and for the price) is constant. It follows that WTP will equal WTA, a constraint that is at odds with the empirical evidence from the literature. Our non-linear specifications allow WTP and WTA to differ, but they complicate the computation of marginal WTP and WTA. Nevertheless, with the functional forms used here it is possible to derive closed-form expressions for the monetary valuation of a change at the individual level. Intuitively, the requirement is to find the change in the price attribute that would offset the utility or disutility brought about by the change in the attribute, so that we obtain the expressions for WTP/WTA by equating $v_i(\Delta X_k) = -v_i(\Delta \mathrm{Price}_i)$, where $i$ indexes respondents. For each of the functional forms, we have the following expressions:

Linear model:
$$\beta_{k,i}\,\Delta X_k = -\beta_{P,i}\,\Delta \mathrm{Price}_i \iff \Delta \mathrm{Price}_i = -\frac{\beta_{k,i}}{\beta_{P,i}}\,\Delta X_k$$

Piecewise-linear model:
$$\Delta X_k \geq 0: \quad \beta^+_{k,i}\,\Delta X_k = -\beta^+_{P,i}\,\Delta \mathrm{Price}_i \iff \Delta \mathrm{Price}_i = -\frac{\beta^+_{k,i}}{\beta^+_{P,i}}\,\Delta X_k$$
$$\Delta X_k < 0: \quad \beta^-_{k,i}\,\Delta X_k = -\beta^-_{P,i}\,\Delta \mathrm{Price}_i \iff \Delta \mathrm{Price}_i = -\frac{\beta^-_{k,i}}{\beta^-_{P,i}}\,\Delta X_k$$

Exponential function model:
$$\Delta X_k \geq 0: \quad \beta^+_{k,i}\,\frac{1 - \exp(-\mu \Delta X_k)}{\mu} = -\beta^+_{P,i}\,\frac{1 - \exp(-\delta \Delta \mathrm{Price}_i)}{\delta} \iff \Delta \mathrm{Price}_i = -\frac{1}{\delta} \ln\!\left\{ 1 + \frac{\beta^+_{k,i}\,\delta}{\beta^+_{P,i}\,\mu} \left[ 1 - \exp(-\mu \Delta X_k) \right] \right\}$$
$$\Delta X_k < 0: \quad \beta^-_{k,i}\,\frac{\exp(\mu \Delta X_k) - 1}{\mu} = -\beta^-_{P,i}\,\frac{\exp(\gamma \Delta \mathrm{Price}_i) - 1}{\gamma} \iff \Delta \mathrm{Price}_i = \frac{1}{\gamma} \ln\!\left\{ 1 + \frac{\beta^-_{k,i}\,\gamma}{\beta^-_{P,i}\,\mu} \left[ 1 - \exp(\mu \Delta X_k) \right] \right\}$$

where $\mu$, $\gamma$ and $\delta$ are the curvature coefficients. The value of the change is then the negative of the change in bills needed to offset the welfare change.
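These closed-form expressions translate directly into code. The sketch below is our own illustration: coefficient values are hypothetical (with negative price coefficients, as estimated), and the exponential case assumes the log argument stays positive, i.e. that a finite compensating bill change exists.

```python
import numpy as np

def dprice_linear(dx, b_k, b_p):
    """Offsetting bill change in the linear model."""
    return -b_k * dx / b_p

def dprice_piecewise(dx, b_k_plus, b_k_minus, b_p_plus, b_p_minus):
    """Offsetting bill change in the piecewise-linear model."""
    if dx >= 0:
        return -b_k_plus * dx / b_p_plus
    return -b_k_minus * dx / b_p_minus

def dprice_exponential(dx, b_k_plus, b_k_minus, b_p_plus, b_p_minus,
                       mu, gamma, delta):
    """Offsetting bill change in the exponential model (positive for a gain,
    negative for a loss). The log argument must stay positive; otherwise no
    finite compensation exists for the loss."""
    if dx >= 0:
        inner = 1.0 + (b_k_plus * delta) / (b_p_plus * mu) * (1.0 - np.exp(-mu * dx))
        return -np.log(inner) / delta
    inner = 1.0 + (b_k_minus * gamma) / (b_p_minus * mu) * (1.0 - np.exp(mu * dx))
    return np.log(inner) / gamma

# Illustrative parameters: loss coefficients exceed gain coefficients
wtp = dprice_exponential(+1.0, 0.2, 0.4, -0.5, -0.5, mu=0.5, gamma=0.8, delta=0.2)
wta = -dprice_exponential(-1.0, 0.2, 0.4, -0.5, -0.5, mu=0.5, gamma=0.8, delta=0.2)
print(wtp, wta, wta / wtp)  # WTA exceeds WTP: gain-loss asymmetry
```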
Table 24.4: Comparison of WTP to WTA for internal sewer flooding (IN2).

Change from SQ (a, b, c) | Linear function | Piecewise-linear function | Exponential function
175 fewer (WTP) | 4.46 | 1.40 | 1.34
175 more (WTA) | 4.46 | 91.18 | 25.49
Ratio (WTA:WTP) | 1.00 | 65.36 | 19.09
350 fewer (WTP) | 8.91 | 2.79 | 1.86
350 more (WTA) | 8.91 | 182.35 | 27.22
Ratio (WTA:WTP) | 1.00 | 65.36 | 14.62
525 fewer (WTP) | 13.37 | 4.19 | 2.08
525 more (WTA) | 13.37 | 273.53 | 27.80
Ratio (WTA:WTP) | 1.00 | 65.36 | 13.39

(a) £ per household per year.
(b) Units = expected number of properties flooded per year.
(c) Change from SQ and corresponding attribute levels from Table 24.1: SQ to L−1 = +175; SQ to L1 = −350; and SQ to L2 = −525.
For improvements in service attributes, we compute marginal WTP (per household per year) for a gain for a representative (average) household, whereas for decreases in service we compute marginal WTA (per household per year) for a loss. Table 24.4 presents WTP–WTA ratios for IN2 to provide an indication of the quantitative results, while Figure 24.3 illustrates graphically the WTP–WTA asymmetry. Graphs for EX1, EX2 and IN1 are reported in Appendix 24.A. From our data and as illustrated in the figures, we observe pronounced asymmetry in WTP and WTA across the range of levels for the four service attributes. In virtually all cases the linear model provides larger WTP estimates for an improvement in service. Moreover, the exponential functional form gives rise to a ‘plateau’ effect, where the valuation reaches a maximum relatively rapidly. For large increases in the attribute, therefore, the piecewise-linear function also tends to provide larger value estimates for improvements. Turning to WTA for service deterioration, the linear versus non-linear observation is reversed, with the linear model now providing a conservative estimate of WTA in comparison to the better fitting non-linear models. However, with respect to comparisons between non-linear models, we observe substantial differences in WTA estimates; a result that has to be contrasted with the relatively small differences across models in terms of measures of fit (as reported in Table 24.3). In particular the results for the piecewise-linear model show extremely high WTA, which may be attributed to the very small marginal utility associated with decreases in the price attribute. The exponential functional form yields more ‘plausible’ values and also shows the effect of diminishing sensitivity, as the value function flattens out relatively quickly. We note, however, that the latter effect is mainly a construct of our estimation procedure since we only observe one point of the utility function in the loss space. As the study design was primarily driven by the appraisal which the choice
Figure 24.3: WTP–WTA asymmetry for internal sewer flooding (IN2).
As the study design was primarily driven by the appraisal which the choice experiment was to inform, future research directed at assessing non-linearities in the loss space should aim to gather enough information to separately identify the scale and curvature parameters for the gain and loss domains.
24.5. Conclusion

The aim of this paper has been to assess WTP–WTA discrepancies in the context of a choice experiment, modelling the utility of respondents as a non-linear function. Our empirical investigation shows that preferences for changes in levels of service related to internal and external sewer flooding are subject to relatively strong gain–loss asymmetries. We also find evidence of diminishing marginal sensitivity for gains. Given our approach to the econometric estimation, this result also translates into diminishing marginal sensitivity to losses, but this proposition should be subject to further research given the limitations of the data within our study. Less commonly investigated in the literature to date is non-linearity in the price attribute, an issue that is found to be relatively important in our data. We suggest that the pronounced degree of gain–loss asymmetry observed may be taken as further evidence that monetary outlays are construed as losses rather than foregone gains. This provides further support for the theory of reference-dependent preferences proposed by Munro and Sugden (2003), which is empirically supported by
Bateman et al. (2005). The evidence from our analysis also provides support for the hypothesis that people behave according to the theory of 'mental accounting' (Thaler, 1999), since we observe patterns of declining marginal sensitivity to bill increases even though the amounts considered are relatively small compared to the income of most respondents. According to these results, loss aversion in the price attribute is a significant source of the WTP–WTA asymmetry observed in the analysis and should be given careful scrutiny both in survey design and for the purpose of deriving welfare estimates from stated choice experiments. In controlling for non-linearities in respondents' value functions, we establish that the WTP–WTA ratio varies according to the particular provision change under consideration. Accounting for this effect is a particularly appealing property of the choice experiment approach, which as noted also facilitates the investigation of non-linearity in both the physical attributes of the good and the price attribute over a range of levels of provision. Such flexibility is not evident from studies that focus on a single discrete change for a particular good and compare WTP for the change from A to B with WTA for the change from B to A. The approach also allows estimation of an individual WTP–WTA schedule, whereas most existing studies typically compare WTP and WTA elicited within different groups of respondents. In the light of the findings reported here, previous results from the literature comparing WTP and WTA should be reinterpreted by accounting for the magnitude of the change considered. The presence of reference point effects and loss aversion confirms findings from the experimental literature (e.g. Bateman et al., 1997) and raises a number of important issues for both analysts and decision-makers. From the analysts' perspective, our results suggest that the common use of linear utility specifications may be erroneous, particularly when considering scenarios involving both gains and losses. Our study makes a case for the adoption of more flexible functional forms allowing for the influence of behavioural factors upon stated preferences. Given the lack of theoretical guidance, the choice of econometric specification is ultimately an empirical issue. Accordingly, specification searches (e.g. through the use of dummy-coded, non-parametric utility functions) should be undertaken and reported. In the context of economic appraisal, an implication of non-linear preferences is the role of the current provision level of the good. Indeed, with a non-linear WTP–WTA schedule, a change in the reference point could generate a very different outcome for the appraisal of a particular project. First, we note that the implementation of a particular project could shift the reference point, thereby modifying the value of subsequent changes. However, in the context of ex-ante appraisal informed by stated preference surveys, it is the values expressed before the change that are relevant (as would be transactions made under a set of market prices observed before the policy is implemented). Second, for the assessment of projects generating heterogeneous impacts across the relevant population, economic appraisal should ideally control for variations in both the SQ and the change experienced. In sum, such reasoning highlights the need for careful contextualisation of each study, notably in terms of the definition of the reference situation as well as its perception among the population.
From a broader perspective, the empirical confirmation of reference point effects and loss aversion raises further issues and calls for continued research. Indeed, some commentators argue that such results should be taken at face value as indications of a genuine and persistent disparity of loss versus gain values (Knetsch, 2005). On this basis, economic appraisal should be based on a WTA–WTP schedule, with potentially varying marginal values across the change in outcome envisaged. However, a counter view is that such disparities may be a feature of the elicitation procedure itself, focussing on 'decision utility' (Sugden, 2005a, 2005b). In alternative settings, eliciting 'experienced utility', the observed disparities may themselves diminish (although not necessarily disappear) as the ex-post SQ is integrated into the decision-making process (Kahneman & Sugden, 2005). This is, however, a wider debate than can be covered in the present paper, and it is still far from resolved. For the present, therefore, we restrict ourselves to the parsimonious analysis of data derived from stated preference surveys, for which our results show that reference point effects and loss aversion are significant. Given the weight of experimental research and findings from other stated preference techniques such as CV, such a finding should not be a surprise and cannot be ignored in the design and analysis of choice experiments.
Acknowledgements

The authors would like to thank two anonymous referees for their constructive comments and suggestions, as well as participants at the Envecon 2009 Applied Environmental Economics Conference, the 2009 International Choice Modelling Conference and the 2009 Annual Conference of the European Association of Environmental and Resource Economists (EAERE) for their feedback. Support from United Utilities Water is gratefully acknowledged. The opinions expressed and any remaining errors are those of the authors only.
References

Bateman, I. J., Day, B. H., Jones, A. P., & Jude, S. (2009). Reducing gains–loss asymmetry: A virtual reality choice experiment valuing land use change. Journal of Environmental Economics and Management, 58(1), 106–118.
Bateman, I. J., Kahneman, D., Munro, A., Starmer, C., & Sugden, R. (2005). Testing competing models of loss aversion: An adversarial collaboration. Journal of Public Economics, 89(8), 1561–1580.
Bateman, I. J., Munro, A., Rhodes, B., Starmer, C., & Sugden, R. (1997). A test of the theory of reference-dependent preferences. Quarterly Journal of Economics, 112(2), 479–505.
Bowker, J. M., & MacDonald, H. F. (1993). An economic analysis of localized pollution: Rendering emissions in a residential setting. Canadian Journal of Agricultural Economics, 41(1), 45–59.
Brookshire, D. S., & Coursey, D. L. (1987). Measuring the value of a public good: An empirical comparison of elicitation procedures. American Economic Review, 77(4), 554–566.
Ferrini, S., & Scarpa, R. (2007). Designs with a-priori information for non-market valuation with choice-experiments: A Monte Carlo study. Journal of Environmental Economics and Management, 53(3), 342–363.
Fosgerau, M., & Bierlaire, M. (2009). Discrete choice models with multiplicative error terms. Transportation Research Part B: Methodological, 43(5), 494–505.
Grutters, J. P. C., Kessels, A. G. H., Dirksen, C. D., van Helvoort-Postulart, D., Anteunis, L. J. C., Joore, M. A., et al. (2008). Willingness to accept versus willingness to pay in a discrete choice experiment. Value in Health, 11(7), 1110–1119.
Hanemann, W. M. (1991). Willingness to pay and willingness to accept: How much can they differ? American Economic Review, 81(3), 635–647.
Hanemann, W. M. (1999). The economic theory of WTP and WTA. In: I. J. Bateman & K. G. Willis (Eds), Valuing environmental preferences: Theory and practice of the contingent valuation method in the US, EU and developing countries (pp. 42–97). Oxford, UK: Oxford University Press.
Hess, S., & Rose, J. M. (2009). Should reference alternatives in pivot design SC surveys be treated differently? Environmental and Resource Economics, 42(3), 297–317.
Hess, S., Rose, J. M., & Hensher, D. A. (2008). Asymmetric preference formation in willingness to pay estimates in discrete choice models. Transportation Research Part E: Logistics and Transportation Review, 44(5), 847–863.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Hicks, J. R. (1943). The four consumer surpluses. Review of Economic Studies, 8, 108–116.
Horowitz, J. K., & McConnell, K. E. (2002). A review of WTA/WTP studies. Journal of Environmental Economics and Management, 44(3), 426–447.
Kahneman, D., Knetsch, J. L., & Thaler, R. (1990). Experimental tests of the endowment effect and the Coase Theorem. Journal of Political Economy, 98(6), 1325–1348.
Kahneman, D., & Sugden, R. (2005). Experienced utility as a standard of policy evaluation. Environmental and Resource Economics, 32(1), 161–181.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Knetsch, J. L. (1989). The endowment effect and evidence of nonreversible indifference curves. American Economic Review, 79(5), 1277–1284.
Knetsch, J. L. (2005). Gains, losses, and the US-EPA economic analysis guidelines: A hazardous product? Environmental and Resource Economics, 32(1), 91–112.
Knetsch, J. L., & Sinden, J. A. (1984). Willingness to pay and compensation demanded: Experimental evidence of an unexpected disparity in measures of value. Quarterly Journal of Economics, 99(3), 507–521.
Köbberling, V., & Wakker, P. (2005). An index of loss aversion. Journal of Economic Theory, 122(1), 119–131.
Munro, A., & Sugden, R. (2003). On the theory of reference-dependent preferences. Journal of Economic Behavior and Organization, 50(4), 407–428.
Ofwat. (2007). Further Ofwat guidance on the use of cost-benefit analysis for PR09, Letter to all regulatory directors of water and sewerage companies and water only companies, Fiona Pethick, Ofwat Director of Corporate Affairs, 19th December 2007.
Randall, A., & Stoll, J. R. (1980). Consumer's surplus in commodity space. American Economic Review, 70(3), 449–455.
Scarpa, R., Ferrini, S., & Willis, K. G. (2005). Performance of error component models for status-quo effects in choice experiments. In: R. Scarpa & A. Alberini (Eds), Applications of simulation methods in environmental and resource economics (pp. 247–273). Dordrecht, The Netherlands: Springer Publisher.
Scarpa, R., & Rose, J. M. (2008). Design efficiency for non-market valuation with choice modelling: How to measure it, what to report and why. Australian Journal of Agricultural and Resource Economics, 52(3), 253–282.
Scarpa, R., Thiene, M., & Train, K. (2008). Utility in willingness to pay space: A tool to address confounding random scale effects in destination choice to the Alps. American Journal of Agricultural Economics, 90(4), 994–1010.
Scarpa, R., Willis, K. G., & Acutt, M. (2007). Valuing externalities from water supply: Status-quo, choice complexity and individual random effects in panel kernel logit analysis of choice experiments. Journal of Environmental Planning and Management, 50(4), 449–466.
Stott, H. P. (2006). Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty, 32(2), 101–130.
Sugden, R. (1999). Alternatives to the neo-classical theory of choice. In: I. J. Bateman & K. G. Willis (Eds), Valuing environmental preferences: Theory and practice of the contingent valuation method in the US, EU and developing countries (pp. 152–181). Oxford, UK: Oxford University Press.
Sugden, R. (2005a). Anomalies and stated preference techniques: A framework for a discussion of coping strategies. Environmental and Resource Economics, 32(1), 1–12.
Sugden, R. (2005b). Coping with preference anomalies in cost–benefit analysis: A market-simulation approach. Environmental and Resource Economics, 32(1), 129–160.
Thaler, R. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12(3), 183–206.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Train, K., & Weeks, M. (2005). Discrete choice models in preference space and willingness-to-pay space. In: R. Scarpa & A. Alberini (Eds), Applications of simulation methods in environmental and resource economics (pp. 1–16). Dordrecht, The Netherlands: Springer Publisher.
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. Quarterly Journal of Economics, 106(4), 1039–1061.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
Van Kooten, G. C., & Schmitz, A. (1992). Preserving waterfowl habitat on the Canadian prairies: Economic incentives versus moral suasion. American Journal of Agricultural Economics, 74(1), 79–89.
Willis, K. G., Scarpa, R., & Acutt, M. (2005). Assessing water company customer preferences and willingness to pay for service improvements: A stated choice analysis. Water Resources Research, 41(2), W02019, doi: 10.1029/2004WR003277.
Appendix 24.A

Figures 24.A.1–24.A.6.
Figure 24.A.1: Gains–loss asymmetry for external sewer flooding (EX1).
Figure 24.A.2: Gains–loss asymmetry for external sewer flooding (EX2).
Figure 24.A.3: Gains–loss asymmetry for internal sewer flooding (IN1).
Figure 24.A.4: WTP–WTA asymmetry for external sewer flooding (EX1).
Figure 24.A.5: WTP–WTA asymmetry for external sewer flooding (EX2).
Figure 24.A.6: WTP–WTA asymmetry for internal sewer flooding (IN1).
Chapter 25
Clustering Ranked Preference Data Using Sociodemographic Covariates Isobel Claire Gormley and Thomas Brendan Murphy
Abstract

Ranked preference data arise when a set of judges rank, in order of their preference, a set of objects. Such data arise in preferential voting systems and market research surveys. Covariate data associated with the judges are also often recorded. Such covariate data should be used in conjunction with preference data when drawing inferences about judges. To cluster a population of judges, the population is modeled as a collection of homogeneous groups. The Plackett-Luce model for ranked data is employed to model a judge's ranked preferences within a group. A mixture of Plackett-Luce models is employed to model the population of judges, where each component in the mixture represents a group of judges. Mixture of experts models provide a framework in which covariates are included in mixture models. Covariates are included through the mixing proportions and the component density parameters. A mixture of experts model for ranked preference data is developed by combining a mixture of experts model and a mixture of Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The mixing proportions and group specific parameters are potentially dependent on covariates. Model selection procedures are employed to choose optimal models. Model parameters are estimated via the 'EMM algorithm', a hybrid of the expectation–maximization and the minorization–maximization algorithms. Examples are provided through a menu survey and through Irish election data. Results indicate mixture modeling using covariates is insightful when examining a population of judges who express preferences.
25.1. Introduction

Ranked preference data arise when judges rank some or all of a set of objects in order of their preference. Such data arise in a wide range of contexts including preferential voting systems (e.g. Irish elections that use a preferential voting system (Coakley & Gallagher, 2004)), market research surveys (e.g. food preference surveys (Ralston, Brent, Starke, Riggins, & Lin, 2002)) and university application procedures (e.g. in Ireland, students rank up to 10 degree courses in their college application (Gormley & Murphy, 2006)). Modeling preference data in an appropriate manner is imperative when examining the behavior of the set of judges who gave rise to the data. Additionally, it is often the case that covariates associated with each judge are recorded when a survey of their preferences is taken. Such covariate data can be used in conjunction with preference data to provide a deeper understanding of the preferences and/or structure of the population of judges under investigation. Models for preference data, including those that incorporate covariates, are discussed in Section 25.2.1.

Clustering methods are used when it is believed that a heterogeneous population of judges consists of a collection of homogeneous subpopulations and these subpopulations are unknown and need to be characterized. Clustering methods tend to be either algorithmic (e.g. hierarchical or k-means clustering) or based on statistical models (e.g. Fraley & Raftery, 2002). In the model-based approach to clustering, the population is modeled as a finite collection of homogeneous groups that are modeled individually using appropriate statistical models; that is, a finite mixture model is used to model the data. In this work a model-based approach is taken, where the Plackett-Luce (or exploded logit) model for rank data is employed within each group to model the way in which group members rank preferences. Thus, a mixture of Plackett-Luce models is employed as an appropriate statistical model for the population of judges, where each component in the mixture represents a group of judges with a specific parameterization of the Plackett-Luce model. A more detailed outline of the mixture of Plackett-Luce models is given in Section 25.2.2.

Mixture of experts (MoE) models (Jacobs, Jordan, Nowlan, & Hinton, 1991) provide a framework in which covariates are included in mixture models. In these models, covariates are included through the mixing proportions and through the parameters of component densities using generalized linear model theory. In Section 25.2.3, an MoE model for ranked preference data is developed by combining an MoE model and a mixture of Plackett-Luce models. Particular attention is given to the manner in which covariates enter the model. The model parameters are estimated via the 'EMM' algorithm, a hybrid of the expectation–maximization (EM) and the minorization–maximization (MM) algorithms. Model selection procedures are employed to select both the manner in which covariates enter the model and the optimal number of groups within the population. This approach provides a framework where the manner in which covariates influence a clustering is selected in a unified, statistically principled manner. Details of model fitting and selection are given in Section 25.3.
In this paper, two applications are used to illustrate the proposed methodology for clustering ranked preference data in the presence of covariates: a marketing data set collected through the 1996 Menu Census Survey conducted by the Market Research Corporation of America and an Irish election data set where voters rank electoral candidates in order of their preference. In both examples, data have been collected from a heterogeneous set of judges who have expressed their preferences. Interest lies in establishing the existence of homogeneous subgroups of judges in the population who have similar preferences. Covariates associated with the judges are also available and there is interest in establishing if the covariates provide information about the clustering. The data sets are described in detail in Section 25.4. The results of applying the MoE model for ranked preference data to the illustrative examples are given in Section 25.5. The results indicate that mixture modeling using covariates can be insightful when examining a population of judges who express preferences. Section 25.6 concludes with a discussion of the proposed model and the results of its application in the illustrative examples.
25.2. A Mixture of Experts Model for Ranked Preference Data

The MoE model (Jacobs et al., 1991) accommodates clustering and covariate modeling in a single modeling framework. In this section, we develop an MoE model for ranked preference data, so that covariates can be used in conjunction with ranked preference data for clustering purposes.
25.2.1. Models for Ranked Preference Data

Many models have been proposed for ranked preference data. Examples of models include distance-based models (Critchlow, 1985; Mallows, 1957) where the probability of a ranking decreases as the distance from a central ranking increases, order statistic (random utility) models (Chapman & Staelin, 1982; Luce & Suppes, 1963; McFadden, 1974; Thurstone, 1927) where the ranking reflects the ordering of latent scores given to each object and multistage ranking models (Benter, 1994; Fligner & Verducci, 1988; Plackett, 1975) where the ranking is modeled as a sequential process of selecting the next most preferred object. Detailed reviews of models for ranking data are given by Critchlow, Fligner, and Verducci (1991), Fligner and Verducci (1993) and Marden (1995), and references therein.

In this paper, the Plackett-Luce model (or exploded logit model) (Chapman & Staelin, 1982; Plackett, 1975) for ranked preference data is used to model data within a homogeneous set of judges. Suppose that data are collected from $M$ judges who list their preference ordering for a set of $N$ objects. Let $c(i, j)$ denote the object ranked in $j$th position by judge $i$. Then $x_i = (c(i,1), c(i,2), \ldots, c(i,n_i))$ is an ordered list of the objects as recorded in the ranked preference of judge $i$, where $n_i$ is the number of
preferences expressed by this judge. The Plackett-Luce model with support parameter $p = (p_1, p_2, \ldots, p_N)$ is of the form,
$$P(x_i \mid p) = \frac{p_{c(i,1)}}{\sum_{s=1}^{N} p_{c(i,s)}} \cdot \frac{p_{c(i,2)}}{\sum_{s=2}^{N} p_{c(i,s)}} \cdots \frac{p_{c(i,n_i)}}{\sum_{s=n_i}^{N} p_{c(i,s)}} = \prod_{t=1}^{n_i} \frac{p_{c(i,t)}}{\sum_{s=t}^{N} p_{c(i,s)}} \qquad (25.1)$$
where $c(i, n_i+1), \ldots, c(i, N)$ is any permutation of the objects not listed in the judge's ranked preference; the choice of this ordering does not influence the probability. In order to make the parameter $p$ identifiable, it is usual to restrict $\sum_{j=1}^{N} p_j = 1$; under this restriction, the support parameter $p_j$ can be interpreted as the probability of selecting object $j$ in first place, out of the currently available choice set. Under the Plackett-Luce model the ranking of objects by a judge is modeled as a set of independent choices by the judge, conditional on the cardinality of the choice set being reduced by one after each choice is made.

The Plackett-Luce model can be interpreted as either an order statistic (random utility) model or a multistage model (cf. McFadden, 1974). In particular, let the utility that voter $i$ assigns to candidate $j$ be $U_{ij} = \log p_j + \epsilon_{ij}$, where the $\epsilon_{ij}$ are independent identically distributed according to an extreme value distribution. Then $P(x_i \mid p) = P(U_{i,c(i,1)} > U_{i,c(i,2)} > \cdots > U_{i,c(i,n_i)})$, and this can be written in the same form as Eq. (25.1) (e.g. Train, 2003, Section 7.3.1).

The Plackett-Luce model can accommodate covariates using a multinomial logit structure, as proposed in Chapman and Staelin (1982) and Train (2003, Chapter 7). Let $w_i = (w_{i1}, w_{i2}, \ldots, w_{iL})$ be the covariates for observation $i$. The support parameters are modeled as a logistic function of the covariates,
$$\log \frac{p_j(w_i)}{p_1(w_i)} = \gamma_{j0} + \gamma_{j1} w_{i1} + \gamma_{j2} w_{i2} + \cdots + \gamma_{jL} w_{iL} = \gamma_j^{T} w_i$$
Object 1 is the baseline category, with $\gamma_1 = (0, \ldots, 0)$, in order to assure identifiability of the model parameters. This leads to the exploded logit model of the form,
$$P(x_i \mid p(w_i)) = P(x_i \mid \gamma, w_i) = \frac{\exp(\gamma_{c(i,1)}^{T} w_i)}{\sum_{s=1}^{N} \exp(\gamma_{c(i,s)}^{T} w_i)} \cdot \frac{\exp(\gamma_{c(i,2)}^{T} w_i)}{\sum_{s=2}^{N} \exp(\gamma_{c(i,s)}^{T} w_i)} \cdots \frac{\exp(\gamma_{c(i,n_i)}^{T} w_i)}{\sum_{s=n_i}^{N} \exp(\gamma_{c(i,s)}^{T} w_i)} = \prod_{t=1}^{n_i} \frac{\exp(\gamma_{c(i,t)}^{T} w_i)}{\sum_{s=t}^{N} \exp(\gamma_{c(i,s)}^{T} w_i)} \qquad (25.2)$$
In this case, the model corresponds to a random utility model with $U_{ij} = \gamma_j^{T} w_i + \epsilon_{ij}$, where the $\epsilon_{ij}$ are iid according to an extreme value distribution.
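As a concrete illustration of the sequential-choice structure in Eq. (25.1), the following is a minimal Python sketch (not from the chapter) that evaluates the Plackett-Luce probability of a possibly partial ranking; the example ranking and support values are hypothetical.

```python
def plackett_luce_prob(ranking, support):
    """Probability of an ordered (possibly partial) ranking under the
    Plackett-Luce model of Eq. (25.1).

    ranking: list of object indices, most preferred first.
    support: support parameters p_1, ..., p_N (summing to 1).
    """
    remaining = set(range(len(support)))  # objects still in the choice set
    prob = 1.0
    for obj in ranking:
        denom = sum(support[s] for s in remaining)
        prob *= support[obj] / denom      # choose obj from what is left
        remaining.remove(obj)             # choice set shrinks by one
    return prob

# Hypothetical example: N = 4 objects, a judge ranks only their top two.
support = [0.4, 0.3, 0.2, 0.1]
print(plackett_luce_prob([2, 0], support))  # 0.2/1.0 * 0.4/0.8 = 0.1
```

The covariate-dependent (exploded logit) version of Eq. (25.2) has the same structure, with each support value replaced by $\exp(\gamma_j^T w_i)$.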
25.2.2. Mixture Models

The mixture model (also known as a latent class model (LCM)) assumes that a population can be modeled as a finite collection of subpopulations, where each subpopulation can be characterized by a suitable probability density. Mixture models have been used in a wide range of applications where data are collected from heterogeneous sources. Latent class analysis (LCA; Lazarsfeld & Henry, 1968) uses a mixture model to investigate the dependence between categorical variables, thus providing a discrete version of factor analysis for categorical data. General reviews of mixture modeling are given by Titterington, Smith, and Makov (1985), McLachlan and Basford (1988) and McLachlan and Peel (2000). In addition, Fraley and Raftery (2002) provide a review of the use of mixture models for clustering.

Suppose that a population consists of $K$ subpopulations and that the probability of belonging to subpopulation $k$ is $\pi_k$. The probability density for observation $x_i$ from subpopulation $k$ is $f(x_i \mid \theta_k)$, where $\theta_k$ are the parameters of the model for subpopulation $k$. Then the model for an observation of unknown subpopulation is of the form,
$$P(x_i) = \sum_{k=1}^{K} \pi_k f(x_i \mid \theta_k) \qquad (25.3)$$
that is, a $K$ component mixture model. The $\pi_k$ values are known as mixing proportions, the $f(x_i \mid \theta_k)$ are called component densities and the $\theta_k$ are the parameters of the component densities. Mixture models have been successfully applied to the analysis of ranked preference data in Stern (1993), Vigneau, Courcoux, and Semenou (1999), Murphy and Martin (2003), Gormley and Murphy (2006, 2008a), Busse, Orbanz, and Buhmann (2007) and Meilă and Bao (2008) among others. The mixture of Plackett-Luce models,
$$P(x_i) = \sum_{k=1}^{K} \pi_k f(x_i \mid p_k) = \sum_{k=1}^{K} \pi_k \prod_{t=1}^{n_i} \frac{p_{k,c(i,t)}}{\sum_{s=t}^{N} p_{k,c(i,s)}} \qquad (25.4)$$
where $p_k$ is the parameter of the Plackett-Luce model that characterizes subpopulation $k$, is applied to ranked preference data by Gormley and Murphy (2006) who analyze Irish college applications. Additionally, Gormley and Murphy (2008a) analyze Irish election data using both a mixture of Plackett-Luce models and a mixture of Benter models (Benter, 1994). In this article, the mixture of Plackett-Luce models is extended to facilitate the inclusion of covariates.

The mixture of Plackett-Luce models has some connections with LCA, in that a mixture model is being used to model the data, but the motivation is quite different. In this work, the mixture-modeling framework is being used to find clusters in the data, whereas in LCA the emphasis is to study dependence between variables. However, both approaches use a discrete latent variable to study structure in the population.
25.2.3. Mixture of Experts Models

Jacobs et al. (1991) introduce the MoE model as an extension of the standard mixture model to include covariates. Covariates are incorporated in the mixture model through the use of generalized linear models (GLMs) (Dobson, 2002; McCullagh & Nelder, 1983; Nelder & Wedderburn, 1972). GLMs are used to model both the relationship between the outcome variable and covariates and the relationship between the mixing proportions and covariates. Hence, the general MoE model is of the form,
$$P(x_i \mid w_i) = \sum_{k=1}^{K} \pi_k(w_i)\, f(x_i \mid \theta_k(w_i)) \qquad (25.5)$$
where the relationship between the mixing proportions and covariates, for example, is modeled as a multinomial logistic regression model of the form,
$$\log \frac{\pi_k(w_i)}{\pi_1(w_i)} = \beta_{k0} + \beta_{k1} w_{i1} + \cdots + \beta_{kL} w_{iL} = \beta_k^{T} w_i \qquad (25.6)$$
The MoE model originates from the machine learning literature and the terminology used for this model is different from that of mixture modeling. The mixing proportions $\pi_k(w_i)$ are called gating networks and the component densities $f(x_i \mid \theta_k(w_i))$ are called expert networks. The MoE model (Eq. 25.5) generalizes the mixture model (Eq. 25.3) by allowing both the mixing proportions and the component densities to be functions of the covariates.

In this paper, four possible models are proposed by either allowing or not allowing terms in the model to depend on the covariates. Specifically, the general MoE model in Eq. (25.5) models both the mixing proportions and the component density parameters as functions of covariates. The mixture model is a special case of the general MoE model in which neither the mixing proportions nor the component density parameters are influenced by covariates. The expert network MoE model allows the component density parameters to depend on the covariates, but not the mixing proportions, while in the gating network MoE model covariates influence the mixing proportions but not the component density parameters. In the context of modeling ranked data, the mixture model has been proposed previously by Gormley and Murphy (2006, 2008a) and the gating network MoE model was proposed by Gormley and Murphy (2008b). In other contexts, Hurn, Justel, and Robert (2003) used the expert network MoE model in the special case where the model reduces to a mixture of regression models. Thompson, Smith, and Boyle (1998) used the general MoE model to evaluate diagnostic criteria for diabetes. A unified framework in which the optimal model is chosen using model selection techniques has not been employed in any of these applications.

In the MoE model for ranked preference data, for example, the general MoE model allows both the component densities (or expert networks) and the mixing
proportions (or gating networks) to depend on the covariates. Explicitly,
$$P(x_i \mid w_i) = \sum_{k=1}^{K} \pi_k(w_i)\, f(x_i \mid \theta_k(w_i)) = \sum_{k=1}^{K} \left\{ \frac{\exp(\beta_k^{T} w_i)}{\sum_{r=1}^{K} \exp(\beta_r^{T} w_i)} \prod_{t=1}^{n_i} \frac{\exp(\gamma_{k,c(i,t)}^{T} w_i)}{\sum_{s=t}^{N} \exp(\gamma_{k,c(i,s)}^{T} w_i)} \right\} \qquad (25.7)$$
which arises from substituting Eqs. (25.2) and (25.6) into Eq. (25.5). The mixture model, the gating network MoE model and the expert network MoE model are special cases of Eq. (25.7) in which the mixing proportions and/or the component densities are treated as constant with respect to the covariates. This paper provides a unifying framework for MoE modeling of ranked preference data by including all four models and allowing the most appropriate model to be selected using model selection criteria. This framework addresses the question of how and where covariates can be used in the clustering of ranked preference data.
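The following minimal Python sketch (illustrative only; all variable names and values are hypothetical) evaluates the general MoE density of Eq. (25.7) for one judge, combining a softmax gating network over covariates with covariate-dependent Plackett-Luce expert networks.

```python
import math

def moe_rank_density(ranking, w, beta, gamma):
    """Eq. (25.7): P(x_i | w_i) for one judge.

    ranking: ordered list of object indices, most preferred first.
    w: covariate vector (including a leading 1 for the intercept).
    beta: K gating coefficient vectors (beta[0] fixed at zero for the baseline group).
    gamma: K x N expert coefficient vectors (gamma[k][0] fixed at zero for object 1).
    """
    K, N = len(beta), len(gamma[0])
    gate_scores = [math.exp(sum(b * x for b, x in zip(beta[k], w))) for k in range(K)]
    gate_total = sum(gate_scores)
    density = 0.0
    for k in range(K):
        # Covariate-dependent support scores for expert k.
        scores = [math.exp(sum(g * x for g, x in zip(gamma[k][j], w))) for j in range(N)]
        remaining = set(range(N))
        expert = 1.0
        for obj in ranking:
            expert *= scores[obj] / sum(scores[s] for s in remaining)
            remaining.remove(obj)
        density += (gate_scores[k] / gate_total) * expert
    return density

# Hypothetical example: K = 2 groups, N = 3 objects, L = 1 covariate plus intercept.
w = [1.0, 0.5]
beta = [[0.0, 0.0], [0.3, -1.2]]                      # gating coefficients
gamma = [[[0.0, 0.0], [0.4, 0.1], [-0.2, 0.6]],       # expert 1
         [[0.0, 0.0], [-0.5, 0.2], [0.8, -0.3]]]      # expert 2
print(moe_rank_density([1, 2, 0], w, beta, gamma))
```

Setting all non-intercept gating coefficients to zero recovers the expert network MoE model, and setting all non-intercept expert coefficients to zero recovers the gating network MoE model, mirroring the special cases described above.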
25.3. Model Fitting and Selection

25.3.1. Model Fitting

The MoE model for ranked preference data can be fitted in a maximum likelihood framework using an EM algorithm (Dempster, Laird, & Rubin, 1977; McLachlan & Krishnan, 1997). The methods for fitting the model closely follow the methods outlined in Gormley and Murphy (2006, 2008a, 2008b); model fitting details for the more general model are outlined in this paper.

Let $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_N)$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_K)$ be the unknown parameters in the general MoE model for ranked preference data (Eq. 25.7). The likelihood function is of the form,
$$L(\beta, \gamma \mid x, w) = \prod_{i=1}^{M} \left[ \sum_{k=1}^{K} \left\{ \frac{\exp(\beta_k^{T} w_i)}{\sum_{r=1}^{K} \exp(\beta_r^{T} w_i)} \prod_{t=1}^{n_i} \frac{\exp(\gamma_{k,c(i,t)}^{T} w_i)}{\sum_{s=t}^{N} \exp(\gamma_{k,c(i,s)}^{T} w_i)} \right\} \right] \qquad (25.8)$$
This likelihood function (25.8) is not easy to maximize directly, due to the conditional mixture form of the likelihood. As a result, an EM algorithm is used for model fitting. The EM algorithm is used to provide maximum likelihood parameter estimates when some of the data under study are (treated as) missing. In this case a latent
indicator variable $z_i = (z_{i1}, \ldots, z_{iK})$ is imputed, which records the unknown group membership of observation $i$, where $z_{ik} = 1$ if observation $i$ comes from group $k$ and $z_{ik} = 0$ otherwise. The complete data likelihood (i.e. the likelihood of both the missing and observed data) is then of the form,
$$L_c(\beta, \gamma \mid x, w, z) = \prod_{i=1}^{M} \prod_{k=1}^{K} \left[ \frac{\exp(\beta_k^{T} w_i)}{\sum_{r=1}^{K} \exp(\beta_r^{T} w_i)} \prod_{t=1}^{n_i} \frac{\exp(\gamma_{k,c(i,t)}^{T} w_i)}{\sum_{s=t}^{N} \exp(\gamma_{k,c(i,s)}^{T} w_i)} \right]^{z_{ik}} \qquad (25.9)$$
giving the complete data log likelihood
$$\ell_c = \sum_{i=1}^{M} \sum_{k=1}^{K} z_{ik} \log \left\{ \frac{\exp(\beta_k^{T} w_i)}{\sum_{r=1}^{K} \exp(\beta_r^{T} w_i)} \right\} + \sum_{i=1}^{M} \sum_{k=1}^{K} z_{ik} \sum_{t=1}^{n_i} \log \left\{ \frac{\exp(\gamma_{k,c(i,t)}^{T} w_i)}{\sum_{s=t}^{N} \exp(\gamma_{k,c(i,s)}^{T} w_i)} \right\} \qquad (25.10)$$
The EM algorithm is an iterative algorithm in which each iteration consists of two steps — an expectation step and a maximization step. At the expectation or E step, the expected value of the complete data log likelihood is estimated; in this case the E step reduces to estimating the expected value of the missing data z. At the maximization or M step, the expected complete data log likelihood is then maximized with respect to the model parameters, producing on convergence (at least local) maximum likelihood parameter estimates. At the M step of the EM algorithm, maximization of an equation of the form (25.10) can be achieved by noticing that the first term is of the same form as a multinomial logistic regression likelihood and the second term is of the same form as the likelihood for fitting an exploded logit mixture model. The two terms involve independent parameters, so they can be maximized independently at the M step. An MM algorithm (Hunter, 2004; Hunter & Lange, 2004; Lange, Hunter, & Yang, 2000) is used here as a more efficient way to implement the M step of the EM algorithm. An MM algorithm proceeds by iteratively maximizing a minorizing surrogate function, which approximates the original objective function to be maximized. Full details are provided in Appendix 25.A.3 for the expert network MoE model. Extra details for the
implementation of the M step for the other models are contained in Gormley and Murphy (2008a, 2008b). Approximate standard errors for the model parameters are computed from the empirical information matrix as outlined in McLachlan and Krishnan (1997) and McLachlan and Peel (2000), after the EM algorithm has converged. Given the definition of the latent variables $z_i$, the structure of the different forms of the MoE model for rank data can be clarified, as illustrated using a graphical model in Figure 25.1.
Figure 25.1: Graphical model representation of the four mixture of experts models: (a) in the mixture model, the ranking distribution depends on the latent variable z and the model is independent of the covariates w; (b) in the gating network MoE model, the ranking distribution depends on the latent variable z and the distribution of the latent variable depends on w; (c) in the expert network MoE model, the ranking distribution depends on the covariates w and the latent variable z; the distribution of the latent variable is independent of the covariates; (d) in the general MoE model, the ranking distribution depends on the covariates w and the latent variable z and the distribution of the latent variable also depends on the covariates.
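To make the E step concrete, here is a minimal Python sketch (not from the chapter; data and parameter values are hypothetical) of the responsibilities $\hat{z}_{ik} \propto \pi_k f(x_i \mid p_k)$ for the plain mixture of Plackett-Luce models; the covariate-dependent variants replace $\pi_k$ and $p_k$ with their gating and expert forms.

```python
def plackett_luce(ranking, support):
    # Plackett-Luce probability of an ordered (possibly partial) ranking.
    remaining = set(range(len(support)))
    prob = 1.0
    for obj in ranking:
        prob *= support[obj] / sum(support[s] for s in remaining)
        remaining.remove(obj)
    return prob

def e_step(rankings, mix_props, supports):
    # E step: posterior group membership probabilities z_hat[i][k].
    z_hat = []
    for x in rankings:
        weights = [pi * plackett_luce(x, p) for pi, p in zip(mix_props, supports)]
        total = sum(weights)
        z_hat.append([wgt / total for wgt in weights])
    return z_hat

# Hypothetical data: three judges ranking N = 3 objects, K = 2 groups.
rankings = [[0, 1, 2], [2, 1], [1, 0, 2]]
mix_props = [0.6, 0.4]
supports = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
for row in e_step(rankings, mix_props, supports):
    print([round(v, 3) for v in row])
```

At the M step, these weighted memberships enter the surrogate function that the MM updates iteratively maximize.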
25.3.2. Model Selection

In the unifying framework developed here, all of the MoE models for ranked preference data are fitted over a range of values for $K$. The Bayesian information criterion (BIC) (Kass & Raftery, 1995; Schwarz, 1978) is used for model comparison; this criterion is a penalized likelihood criterion which rewards model fit but penalizes unparsimonious models. The BIC value is defined to be:
$$\mathrm{BIC} = -2 \times (\text{maximized log likelihood}) + \log(M) \times (\text{number of parameters})$$
where the first term measures model fit and the second term penalizes for complexity. Small BIC values indicate a preferable model. The use of BIC for model selection in mixture models is supported by theoretical results concerning consistency (Keribin, 2000; Leroux, 1992) and by practical performance (e.g. Fraley & Raftery, 2002; Gormley & Murphy, 2008a; McNicholas & Murphy, 2008; Murphy & Martin, 2003). There are a number of other model selection methods available, including the Akaike information criterion (AIC) (Akaike, 1973), the integrated completed likelihood (ICL) (Biernacki, Celeux, & Govaert, 2000) and cross-validated likelihood (Smyth, 2000). Yang and Yang (2007) discuss the use of BIC and other information criteria in the separation of latent classes and conclude that care is advised when separating a large number of latent classes when the sample size is small. Additionally, Yang and Yang (2007) comment that the inclusion of informative covariates improves the performance of information criteria when separating latent classes. In the applications examined here, we found that BIC gave good clustering results that closely correspond to the findings in Gormley and Murphy (2008a).

The space of potential MoE models for ranked preference data is very large once variable selection for the covariates entering the mixing proportions and mixture components is considered. Here, only models where covariates enter all mixture components or all mixing proportions are considered, in order to restrict the size of the model search space. In fact, even for this reduced model space, there are $K \times 2^L \times 2^L$ possible models to consider. A forward covariate selection procedure was used to find the optimal model within each type of MoE model. Initially, all possible models incorporating a single covariate were fitted. The covariate in the optimal model, as determined by BIC, is then retained. The remaining covariates are then added in turn to the optimal model selected at the first stage, and the best model from this set is selected using BIC. This process continues until all covariates are included. All models can then be compared via the BIC.
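A minimal sketch of the BIC computation and the forward covariate search described above, in Python; `fit_moe` is a hypothetical stand-in for the actual fitting routine, and this is illustrative rather than the authors' implementation.

```python
import math

def bic(loglik, n_params, M):
    # BIC = -2 * (maximized log likelihood) + log(M) * (number of parameters).
    return -2.0 * loglik + math.log(M) * n_params

def forward_selection(covariates, fit_moe, M):
    """Greedy forward search: at each stage, add the covariate that most
    lowers the BIC. fit_moe(selected) is a hypothetical function assumed
    to return (loglik, n_params) for a model using those covariates."""
    selected, trail = [], []
    candidates = list(covariates)
    while candidates:
        scored = []
        for cov in candidates:
            loglik, n_params = fit_moe(selected + [cov])
            scored.append((bic(loglik, n_params, M), cov))
        best_bic, best_cov = min(scored)
        selected.append(best_cov)
        candidates.remove(best_cov)
        trail.append((list(selected), best_bic))
    return trail  # all stages can then be compared via their BIC values
```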
25.4. Illustrative Applications

Clustering preference data in the presence of covariates is illustrated through the use of two applications: the first involves clustering a set of respondents from a food preference survey, while the second involves clustering members of the Irish electorate.
25.4.1. The Hamburger Preparation Quiz

In 1996 the Market Research Corporation of America carried out an extensive national mail survey called the Menu Census Survey. The aim of the survey was to conduct an in-depth study of consumer food safety behavior. As a supplement to the Menu Census Survey, respondents were required to complete a 'Hamburger Preparation Quiz' (HPQ). In this supplement respondents detailed their preferences for hamburgers. Typical survey items involved respondents stating their preferred hamburger order in a restaurant and their taste preferences for hamburger styles. Demographic information such as the age, the population size of their residential area, ethnicity and the gender of each respondent was also recorded, as was the type of diet (if any) that the respondent was currently following. The HPQ was completed by 1133 individuals, of which 594 provided complete responses to the demographic questions. The adult with the most recent birthday in each household completed the HPQ. Appendix 25.A.1 provides full details of the source of the HPQ data.

Interest lies in determining if groups (or clusters) of people with similar preferences for hamburgers exist within the population. If such groups exist, interest lies in determining the types of preferences within each cluster. Moreover, given that demographic information is available, examining the influence demographic factors may have on the clustering structure of the population and/or on the preferences within clusters is of interest.

Question 12 from the HPQ asks respondents to rank hamburger patties in terms of their taste. Specifically, respondents were asked to rank rare (R), medium-rare (MR), medium (M), medium-well (MW) and well-done (WD) patties in order of their preference. A 'don't know/not sure' option was also offered to respondents but no respondents availed of this choice. In this application the response to question 12 is treated as the rank response observation from each respondent. These data were previously analyzed in Ralston et al. (2002) and Bao and Meilă (2008). Details of the demographic information or covariates recorded are provided in Table 25.1.
25.4.2. Irish Election Data

Both governmental and presidential elections in Ireland employ a preferential voting electoral system known as 'proportional representation by means of a single transferable vote' (PR-STV). Under this system voters rank, in order of their preference, some or all of the electoral candidates. The counting process that results in the election or elimination of candidates is an intricate procedure, which involves the transfer of votes between candidates as specified by the rank ballots of the voters. Full details of the mechanics of the PR-STV electoral system are given in Sinnott (2004). Further details on the Irish political system in general are given in Coakley and Gallagher (2004) and Sinnott (1995).

In this article, the electorate from the 1997 Irish presidential election is analyzed. In 1997, five candidates ran for the office of President of Ireland. Mary Banotti was
Table 25.1: Demographic information recorded in the Hamburger Preparation Quiz.

Covariate                          Levels
Age                                –
Residential area population size   Farm; <2,500; 2,500–9,999; 10,000–49,999; 50,000–99,999; 100,000–249,999; 250,000–499,999; 500,000–999,999; 1 million–2 million; >2 million
Diet                               Not on diet; Diet for medical reasons; Diet to gain weight; Diet to maintain weight; Diet to reduce weight; Diet for other reasons
Ethnicity                          Black; White; Other
Gender                             Female; Male

Note: The levels of each demographic covariate are detailed as appropriate.
endorsed by the political party Fine Gael, which was the main government opposition party in 1997. She was deemed to be a liberal candidate and was popular throughout the electoral campaign. Mary McAleese was backed by the then government party, Fianna Fáil, and was known as a conservative candidate. McAleese was widely believed to be the favorite for the presidency throughout the campaign and she was subsequently elected as President of Ireland on October 30, 1997. Derek Nally was a late contender for the post, only being nominated as a potential candidate one month prior to election day. He ran on an independent ticket and received the smallest number of first preference votes on polling day. Adi Roche's involvement in the 1997 Irish presidential campaign was the most turbulent. Roche was backed by another government opposition party, the Labour party, and began the campaign as joint favorite for the presidency along with McAleese. Her liberal campaign contrasted with McAleese's conservative campaign, but Roche's popularity began to decline dramatically after negative publicity regarding her work affairs emerged in the media. As the campaign developed her support ratings dropped. The fifth candidate, Rosemary Scallon, was an independent, conservative candidate. Scallon's support pattern during the campaign was the reverse of Roche's. Scallon began the campaign with an extremely small support base, but as the campaign wore on she emerged as a capable candidate and finished in a respectable third place, behind the favorites McAleese and Banotti. Table 25.2 summarizes the candidates' details. A full treatment of the vote counting process in the 1997 Irish presidential election, and further details, can be found in Gormley and Murphy (2008b). Additionally, a detailed account of the 1997 Irish presidential election campaign is provided by Marsh (1999).
Table 25.2: Information about the five candidates who ran for the office of President of Ireland in 1997.

Name               Endorsing party
Mary Banotti       Fine Gael
Mary McAleese      Fianna Fáil
Derek Nally        Independent
Adi Roche          Labour
Rosemary Scallon   Independent
Table 25.3: Covariates recorded for each voter sampled in the IMS poll.

Covariate                 Levels
Age                       –
Area                      City; Town; Rural
Gender                    Housewife; Non-housewife; Male
Government satisfaction   Satisfied; Dissatisfied; No opinion
Marital status            Married; Single; Widowed
Social class              AB; C1; C2; DE; F50+; F50−
Irish Marketing Surveys (IMS) completed a survey one month prior to the 1997 presidential election. At this time, Roche was still a major contender in the election and Nally had only started his campaign a few days earlier. In the IMS poll, a sample of 1100 potential voters was asked to list the candidates in order of preference (as if they were voting on that day); 17 people who were sampled said that they did not intend to vote, so they were excluded from this analysis. In addition to the voting preferences, a number of socioeconomic variables were recorded for each person sampled in the poll; these are listed in Table 25.3. Further details on this poll and the covariates are provided in Appendix 25.A.2 and in Gormley and Murphy (2008b).

In this application, interest lies in determining if groups (or 'voting blocs') of voters with similar preferences for the electoral candidates exist within the electorate. If the electorate is heterogeneous, interest lies in determining the preferences for the candidates within each voting bloc. Examining the influence the recorded socioeconomic variables may have on the clustering structure and/or on the preferences within voting blocs is also of interest.
25.5. Application Results

The proposed MoE model for ranked preference data was applied to the illustrative examples in order to determine if and how the covariates enter the model and how their inclusion affects the clustering results.
25.5.1. The Hamburger Preparation Quiz

All of the MoE models for ranked preference data were fitted to the Hamburger Preparation Quiz data (Section 25.4.1) with K = 1, 2, …, 10.
Table 25.4: The model with the smallest BIC within each type of the mixture of experts model for ranked preference data applied to the Hamburger Preparation Quiz data.

Model                          BIC    K   Covariates
The mixture model              3677   6   –
The gating network MoE model   3722   6   πk: Ethnicity
The expert network MoE model   3922   5   pk: Gender
The general MoE model          4682   2   πk: Ethnicity; pk: Area size
Table 25.5: Mixture model support parameter estimates (given as percentages) for the Hamburger Preparation Quiz data.

                    Group 1   Group 2   Group 3   Group 4   Group 5   Group 6
Rare                0 (<1)    0 (<1)    0 (<1)    30 (<1)   2 (<1)    0 (<1)
Medium rare         0 (<1)    25 (<1)   0 (<1)    45 (<1)   6 (<1)    100 (<1)
Medium              0 (<1)    61 (<1)   38 (<1)   24 (<1)   8 (<1)    0 (<1)
Medium well         25 (<1)   14 (<1)   62 (2)    1 (<1)    19 (<1)   0 (<1)
Well done           75 (<1)   0 (<1)    0 (<1)    0 (<1)    65 (4)    0 (<1)
Mixing proportion   39 (3)    22 (2)    17 (2)    17 (2)    3 (<1)    2 (<1)

Notes: Standard errors are given in parentheses. Figure 25.2 provides an illustration of the estimates.
The forward selection strategy for choosing covariates outlined in Section 25.3.2 was utilized and the model with the lowest BIC was found within each type of MoE model. The results of this analysis are shown in Table 25.4. Based on the BIC, the optimal model in this case (the mixture model) suggests that there are six groups of judges in the population and that none of the recorded covariates are informative in the modeling. The difference in BIC values indicates very strong support (Kass & Raftery, 1995, Section 3.2) for the fact that the covariates are noninformative. The mixing proportions and support parameters of the optimal model are detailed in Table 25.5 and shown using a mosaic plot (Emerson, 1998; Hartigan & Kleiner, 1981) in Figure 25.2. In the mosaic plot, the width of the bar shows the mixing proportion for each component and the division of the bar shows the support parameter values within each component. Intuition on the suitability of the modeling techniques employed can be provided through model diagnostics. Here a comparison is made between the expected number of first preferences for each choice category, given the estimated model parameters, and the observed number of first preferences for each choice category. The resulting χ²-test statistic (detailed in Table 25.6) demonstrates the suitability of the employed modeling techniques (p-value = 0.56). Interestingly, the six groups found in this analysis correspond closely to the six groups discovered by Bao and Meilă (2008) using a different modeling framework.
Figure 25.2: For the Hamburger Preparation Quiz data, a mixture model with six groups was deemed the optimal model according to the BIC (Table 25.4). A mosaic plot representation of the mixture model parameter estimates is given: the width of a bar illustrates the mixing proportion for each group and the division of a bar shows the support parameter values within each group. Parameter estimates and standard errors are detailed in Table 25.5.

Table 25.6: Observed and expected number of first preferences for each choice category in the Hamburger Preparation Quiz data.

                                       Rare   Medium rare   Medium   Medium well   Well done
Observed number of first preferences   36     75            148      149           186
Expected number of first preferences   31     87            146      144           186

χ²₄ = 2.99, p = 0.56.

Note: The χ² statistic with four degrees of freedom is not significant, suggesting a good model fit.
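The goodness-of-fit check in Table 25.6 can be reproduced with a few lines of Python; the counts below are those reported in the table, and the small discrepancy from the published statistic reflects rounding of the expected counts. This is an illustrative sketch, not the authors' code.

```python
from scipy.stats import chi2

observed = [36, 75, 148, 149, 186]
expected = [31, 87, 146, 144, 186]  # rounded model-based expected counts

# Pearson chi-squared statistic with (number of categories - 1) degrees of freedom.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2.sf(stat, df)
print(f"chi2({df}) = {stat:.2f}, p = {p_value:.2f}")
```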
It is also notable that within each group the support parameters take large values for a contiguous subset of the available choices. This is intuitive as there is a natural ordering to the tastes being ranked. Also, support only tends to be high for one, two or three tastes within each component suggesting that there are precise preferences within each group.
Ralston et al. (2002) examined the HPQ respondents’ cooking and food ordering habits and found that the covariate ‘area size’ had a significant effect on ordering, with respondents from large cities having a higher probability of ordering lightly cooked burgers; this analysis did not find this effect in the taste preference data. The analysis in Ralston et al. (2002) uses the taste preference data as a predictor for cooking and food ordering habits rather than as an outcome variable.
25.5.2. The Irish Presidential Election

All of the MoE models for ranked preference data were fitted for K = 1, 2, …, 5 and the forward covariate selection method was employed when selecting the optimal model using the BIC. The optimal models for each type of MoE model are reported in Table 25.7. Based on the BIC values, the optimal model overall is a gating network MoE model with four components where 'age' and 'government satisfaction' are important covariates for determining group or 'voting bloc' membership. Interestingly, the covariates are not informative within voting blocs, but only in determining voting bloc membership. This model corresponds to the model applied to these data in Gormley and Murphy (2008b). The support parameters for the optimal model are presented in mosaic plot form in Figure 25.3(a). For comparison purposes, a mosaic plot of the support parameters for the best mixture model is shown in Figure 25.3(b).

The support parameter estimates have an interpretation in terms of political party competition and in terms of a conservative–liberal competition. Voting bloc 1 is the 'conservative voting bloc', with larger support parameters for McAleese and Scallon. Voting bloc 2 has large support for the liberal candidate Adi Roche. This voting bloc indicates that the model has uncovered some of the observed characteristics of the presidential campaign at the time the poll was taken, in that Adi Roche has large support. Voting bloc 3 is the largest voting bloc in terms of marginal mixing proportions and intuitively has larger support parameters for the high profile candidates McAleese and Banotti.
Table 25.7: The model with the smallest BIC within each type of the mixture of experts model for ranked preference data applied to the 1997 Irish presidential election data.

Model                          BIC    K   Covariates
The gating network MoE model   8491   4   πk: Age, Government satisfaction
The general MoE model          8512   3   πk: Age, Government satisfaction; pk: Age
The mixture model              8513   3   –
The expert network MoE model   8528   1   pk: Government satisfaction
Figure 25.3: A mosaic plot representation of the support parameters (given as percentages) for (a) the gating network MoE model for ranked preference data and (b) the mixture model fitted to the Irish presidential election data. The width of each column denotes the marginal mixing proportions. Standard errors are given in parentheses.
Voters belonging to voting bloc 4 favor Banotti and have more uniform levels of support for the other candidates. A detailed discussion of this optimal model is also given in Gormley and Murphy (2008b).

The groups found in the MoE model and the mixture model show some correspondence. Voting blocs 2 and 4 in the MoE model (Figure 25.3(a)) appear to be a division of voting bloc 3 in the mixture model (Figure 25.3(b)). This suggests that the MoE model for ranked preference data was able to discover a finer division of the voters into voting blocs than the mixture model.

Table 25.8 details the odds ratios computed for the mixing proportion (or gating network) parameters $\beta = (\beta_1, \ldots, \beta_K)$. In the model, voting bloc 1 (which is the conservative voting bloc) is the baseline voting bloc. In addition, in the 'government satisfaction' covariate, the baseline was chosen to be 'no opinion'. The odds ratios for the mixing proportion parameters also provide intuitive results within the context of the Irish presidential election. For example, older (and generally more conservative) voters are much less likely to belong to the liberal voting bloc 2 than to the conservative voting bloc 1. Also, voters with some interest in government are more likely to belong to voting bloc 3, the bloc favoring candidates backed by large government parties, than to belong to the conservative voting bloc 1. Voting bloc 1 had high levels of support for the independent candidate Scallon. The mixing proportion parameter estimates further indicate that voters dissatisfied with the current government are more likely to belong to voting bloc 4 than to voting bloc 1. This is again intuitive, as voting bloc 4 favors Mary Banotti, who was backed by the main government opposition party, while voting bloc 1 favors the government-backed Mary McAleese. Further interpretation of the mixing proportion parameters is given in Gormley and Murphy (2008b).

To diagnose the suitability of the modeling techniques employed, a comparison is made between the expected number of first preferences for each electoral candidate, given the estimated model parameters, and the observed number of first preferences for each candidate. For the optimal (in terms of BIC) gating network MoE model, the resulting χ²-test statistic (detailed in Table 25.9(a)) suggests good model fit; this is not the case for the less preferable mixture model (Table 25.9(b)).
Table 25.8: Odds ratios for the mixing proportion parameters in the gating network MoE model for ranked preference data.

                 Age                        Satisfied                   Not satisfied
                 Odds ratio  95% CI         Odds ratio  95% CI          Odds ratio  95% CI
Voting bloc 2    0.01        [0.00, 0.05]   1.14        [0.42, 3.11]    2.80        [0.77, 10.15]
Voting bloc 3    0.95        [0.32, 2.81]   3.12        [0.94, 10.31]   3.81        [0.90, 16.13]
Voting bloc 4    1.56        [0.35, 6.91]   0.35        [0.12, 0.98]    3.50        [1.07, 11.43]
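Where the fitted gating network is a multinomial logistic regression, odds ratios and confidence intervals of the kind reported in Table 25.8 can be recovered from the estimated coefficients and their standard errors. The following is a minimal sketch of that conversion; the coefficient and standard-error values are illustrative placeholders, not the estimates behind Table 25.8.

```python
import math

def odds_ratio_ci(beta_hat, se, z=1.96):
    """Convert a gating-network (multinomial logit) coefficient and its
    standard error into an odds ratio with a 95% Wald confidence interval."""
    or_hat = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return or_hat, (lower, upper)

# Hypothetical coefficient for the 'age' covariate in voting bloc 2
# (baseline: voting bloc 1); values are for illustration only.
or_hat, (lo, hi) = odds_ratio_ci(beta_hat=-4.6, se=0.8)
print(f"OR = {or_hat:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```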
Table 25.9(a): Observed number of first preferences and the expected number for the gating network MoE model illustrated in Figure 25.3(a).

                                        Banotti  McAleese  Nally  Roche  Scallon
Observed number of first preferences      277      411       89    222      84
Expected number of first preferences      278      399       94    224      88

$\chi^2_4 = 0.85$, p = 0.93.

For the gating network MoE model, the $\chi^2$-statistic suggests a good model fit; this is not the case for the suboptimal (in terms of BIC) mixture model.
Table 25.9(b): Expected values for the mixture model illustrated in Figure 25.3(b).

                                        Banotti  McAleese  Nally  Roche  Scallon
Observed number of first preferences      277      411       89    222      84
Expected number of first preferences      218      424      129    157     156

$\chi^2_4 = 88.9$, p = 0.00.
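The goodness-of-fit statistics in Tables 25.9(a) and 25.9(b) can be reproduced directly from the tabulated counts as Pearson chi-squared statistics; a minimal sketch:

```python
# Pearson chi-squared goodness-of-fit for the observed versus expected
# first-preference counts in Tables 25.9(a) and 25.9(b).
observed = [277, 411, 89, 222, 84]        # Banotti, McAleese, Nally, Roche, Scallon
expected_moe = [278, 399, 94, 224, 88]    # gating network MoE model
expected_mix = [218, 424, 129, 157, 156]  # mixture model

def chi_sq(obs, exp):
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

print(round(chi_sq(observed, expected_moe), 2))  # ~0.83 with the rounded counts (table reports 0.85)
print(round(chi_sq(observed, expected_mix), 1))  # ~88.9: poor fit for the mixture model
```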
25.6. Discussion

A novel model has been developed to accommodate the use of covariates when clustering ranked preference data. The model offers the flexibility to allow covariates to influence the clustering by allowing covariate dependence to enter different parts of the model. Efficient model fitting procedures are developed using a hybrid of the EM and MM algorithms. Optimal models are selected in a statistically principled manner via a model selection criterion. In the application of the model to the HPQ data, six clusters were found and the covariates were found to be noninformative in the modeling. However, in the analysis of the Irish election data the covariates were found to be informative and their inclusion provided a deeper picture of the voting bloc structure in the electorate than a standard mixture model does. In other applications, the covariates may enter the model in different ways and the resulting models have the potential to provide a deeper understanding of the population than standard clustering methods that do not incorporate covariates. The proposed model formalizes the practice of trying to understand the cluster structure using covariates by including the covariates in the model directly. The Plackett-Luce model was employed as the rank data model within each homogeneous group in the MoE model for ranked preference data. Alternative rank data models could also be employed in the general MoE model framework developed here. Benter's model for rank data (Benter, 1994) is one obvious
alternative — Benter's model is similar to the Plackett-Luce model in that it is parameterized by the same support parameters, but it also has an additional dampening parameter. The dampening parameter models the way in which judges may make choices at different levels within their ranking with differing amounts of certainty. A mixture of Benter models was employed in Gormley and Murphy (2008a) to analyze Irish election data; this model could be extended to include covariates. Many other rank data models are available; see Marden (1995) for further details. The MoE models for rank data developed here are ideal for specifically modeling stated ranked preference data (e.g. from surveys). However, these models are essentially variations on standard discrete choice models, tailored for ranked data. For example, the expert network MoE model for rank data is simply a mixture of standard logit choice models which have been tailored to model rank data (Train, 2003); this can also be thought of as a mixed logit model (McFadden & Train, 2000) for rank data where the mixing density is discrete. Hence, the general framework detailed here can be applied to other forms of choice data by changing the component densities $f(x_i \mid \theta_k(w_i))$ to the appropriate form. The parameters of the appropriate density may then be modeled as a function of the covariates. The generalization of the MoE model for rank data to MoE models for any form of choice data highlights a link with the popular LCM for choice data. Greene and Hensher (2003) contrast the LCM with the mixed logit model using an illustrative study in which preferences for road environments are recorded. LCMs have also been extended to include covariates; for example, Dayton and Macready (1988) develop the concomitant-variable LCM where covariates enter both the mixing proportions and the class-specific probabilities using a logistic framework. Reboussin, Ip, and Wolfson (2008) also incorporate covariates in the LCM when modeling data from a large-scale survey of underage drinking. Preference ranking data arise in a wide range of contexts and the proposed model has potential applications in these contexts. For example, marketing surveys such as the HPQ examined in this paper are widespread. The modeling framework developed here can be employed not only to highlight clusters of consumers, but also to identify the covariates which influence, or perhaps significantly do not influence, the clustering structure. The model allows for a detailed analysis of clustering and the effect of covariates on rankings. However, a limitation of the proposed MoE model for rank data is its unsuitability for evaluating standard choice modeling outputs such as forecasts or 'willingness to pay' measures (Hensher, Rose, & Greene, 2005). Due to the inherent nature of ranked data, output measures are difficult to evaluate. Moreover, even diagnosing the suitability of ranked data models poses problems. The model could be extended to include object covariates as well as covariates for the judges. This would offer an even deeper understanding of the preference ranking procedure. In the Irish election context presented here, for example, including candidate covariates such as their area of residence may provide deeper insight into the structure of the electorate and/or the electorate's preferences.
More advanced methods for selecting the covariates could be considered and there is the possibility of expanding the model space so the covariates only enter some of the mixing proportions or some of the component densities rather than all; this approach was used in a different context by Gustafson and Lefebvre (2008).
Acknowledgments

This research was funded by a Science Foundation Ireland Research Frontiers Programme Grant (06/RFP/M040). The authors would like to thank Professor Adrian Raftery, the members of the Center for Statistics and the Social Sciences and the members of the Working Group on Model-based Clustering at the University of Washington for numerous suggestions that contributed enormously to this work. The authors would also like to thank the anonymous referees for helpful suggestions that have added to the overall quality of this work.
References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Bao, L., & Meilă, M. (2008). Clustering permutations by exponential blurring mean-shift algorithm. Technical Report no. 524. Department of Statistics, University of Washington, Seattle.
Benter, W. (1994). Computer-based horse race handicapping and wagering systems: A report. In: W. T. Ziemba, V. S. Lo & D. B. Hausch (Eds), Efficiency of racetrack betting markets (pp. 183–198). San Diego, CA: Academic Press.
Biernacki, C., Celeux, G., & Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis & Machine Intelligence, 22, 719–725.
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., & Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373–388.
Busse, L. M., Orbanz, P., & Buhmann, J. M. (2007). Cluster analysis of heterogeneous rank data. In: Z. Ghahramani (Ed.), Proceedings of the 24th international conference on machine learning, Vol. 227 of ACM international conference proceeding series, New York (pp. 113–120).
Chapman, R., & Staelin, R. (1982). Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research, 19, 288–301.
Coakley, J., & Gallagher, M. (2004). Politics in the Republic of Ireland (4th ed.). London: Routledge in association with PSAI Press.
Critchlow, D. E. (1985). Metric methods for analyzing partially ranked data (Lecture notes in statistics) (Vol. 34). Berlin: Springer-Verlag.
Critchlow, D. E., Fligner, M. A., & Verducci, J. (1991). Probability models on rankings. Journal of Mathematical Psychology, 35, 294–318.
Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent-class models. Journal of the American Statistical Association, 83, 173–178.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38, with discussion.
Dobson, A. J. (Ed.) (2002). An introduction to generalized linear models (2nd ed.). London: Chapman and Hall.
Emerson, J. W. (1998). Mosaic displays in S-PLUS: A general implementation and a case study. Statistical Computing and Statistical Graphics Newsletter, 9, 17–23.
Fligner, M. A., & Verducci, J. S. (1988). Multistage ranking models. Journal of the American Statistical Association, 83, 892–901.
Fligner, M. A., & Verducci, J. S. (Eds.) (1993). Probability models and statistical analyses for ranking data. New York: Springer-Verlag. Papers from the conference held at the University of Massachusetts, Amherst, MA, June 8–13, 1990.
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97, 611–631.
Gormley, I. C., & Murphy, T. B. (2006). Analysis of Irish third-level college applications data. Journal of the Royal Statistical Society, Series A, 169, 361–379.
Gormley, I. C., & Murphy, T. B. (2008a). Exploring voting blocs within the Irish electorate: A mixture modeling approach. Journal of the American Statistical Association, 103, 1014–1027.
Gormley, I. C., & Murphy, T. B. (2008b). A mixture of experts model for rank data with applications in election studies. The Annals of Applied Statistics, 2, 1452–1477.
Greene, W. H., & Hensher, D. A. (2003). A latent class model for discrete choice analysis: Contrasts with mixed logit. Transportation Research Part B: Methodological, 37, 681–698.
Gustafson, P., & Lefebvre, G. (2008). Bayesian multinomial regression with class-specific predictor selection. The Annals of Applied Statistics, 2(4), 1478–1502.
Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In: Computer science and statistics: Proceedings of the 13th symposium on the interface, Springer-Verlag, New York (pp. 268–273).
Hensher, D. A., Rose, J. M., & Greene, W. H. (2005). Applied choice analysis: A primer. Cambridge: Cambridge University Press.
Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32, 384–406.
Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58, 30–37.
Hurn, M., Justel, A., & Robert, C. P. (2003). Estimating mixtures of regressions. Journal of Computational and Graphical Statistics, 12, 55–79.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixture of local experts. Neural Computation, 3, 79–87.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhyā, Series A, 62, 49–66.
Lange, K., Hunter, D. R., & Yang, I. (2000). Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9, 1–59, with discussion, and a rejoinder by Hunter and Lange.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. The Annals of Statistics, 20, 1350–1360.
Lindsay, B. (1995). Mixture models: Theory, geometry and applications. Hayward, CA: Institute of Mathematical Statistics.
Luce, R. D., & Suppes, P. (1963). Preference, utility and subjective probability. In: R. Luce, R. Bush & E. Galanter (Eds), Handbook of mathematical psychology (Vol. 3). New York: Wiley.
Mallows, C. L. (1957). Non-null ranking models. I. Biometrika, 44, 114–130.
Marden, J. I. (1995). Analyzing and modeling rank data. London: Chapman & Hall.
Marsh, M. (1999). The making of the eighth president. In: M. Marsh & P. Mitchell (Eds), How Ireland voted 1997 (pp. 215–242). Boulder, CO: Westview.
McCullagh, P., & Nelder, J. (1983). Generalized linear models. London: Chapman and Hall.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In: P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
McFadden, D., & Train, K. (2000). Mixed MNL models of discrete responses. Journal of Applied Econometrics, 15, 447–470.
McLachlan, G. J., & Basford, K. E. (1988). Mixture models: Inference and applications to clustering. New York: Marcel Dekker Inc.
McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. New York: Wiley.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
McNicholas, P. D., & Murphy, T. B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18, 285–296.
Meilă, M., & Bao, L. (2008). Estimation and clustering with infinite rankings. In: D. A. McAllester & P. Myllymäki (Eds.), Proceedings of the 24th conference in uncertainty in artificial intelligence, AUAI Press, Corvallis, Oregon, USA (pp. 393–402).
Murphy, T. B., & Martin, D. (2003). Mixtures of distance-based models for ranking data. Computational Statistics and Data Analysis, 41, 645–655.
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.
Plackett, R. L. (1975). The analysis of permutations. Applied Statistics, 24, 193–202.
Ralston, K., Brent, C. P., Starke, Y., Riggins, T., & Lin, C. J. (2002). Consumer food safety behavior: A case study in hamburger cooking and ordering. No. AER804 in Agricultural Economic Report, Food and Rural Economics Division, Economic Research Service, US Department of Agriculture.
Reboussin, B. A., Ip, E. H., & Wolfson, M. (2008). Locally dependent latent class models with covariates: An application to under-age drinking in the USA. Journal of the Royal Statistical Society, Series A, 171, 877–897.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Sinnott, R. (1995). Irish voters decide: Voting behaviour in elections and referendums since 1918. Manchester, UK: Manchester University Press.
Sinnott, R. (2004). The rules of the electoral game. In: J. Coakley & M. Gallagher (Eds), Politics in the Republic of Ireland (4th ed., pp. 105–134). London: Routledge in association with PSAI Press.
Smyth, P. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing, 9, 63–72.
Stern, H. S. (1993). Probability models on rankings and the electoral process. In: M. A. Fligner & J. S. Verducci (Eds), Probability models and statistical analyses for ranking data (pp. 173–195). New York: Springer-Verlag.
Thompson, T. J., Smith, P. J., & Boyle, J. P. (1998). Finite mixture models with concomitant information: Assessing diagnostic criteria for diabetes. Journal of the Royal Statistical Society, Series C (Applied Statistics), 47, 393–404.
Thurstone, L. L. (1927). A law of comparative judgement. Psychological Review, 34, 273–286.
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. Chichester, UK: Wiley.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Vigneau, E., Courcoux, P., & Semenou, M. (1999). Analysis of ranked preference data using latent class models. Food Quality and Preference, 10, 201–207.
Yang, C.-C., & Yang, C.-C. (2007). Separating latent classes by information criteria. Journal of Classification, 24, 183–203.
25.A.1. The Hamburger Preparation Quiz Data Source

The Hamburger Preparation Quiz data set was collected in 1996 by the Market Research Corporation of America through the Menu Census Survey. The Hamburger Preparation Quiz form and the data set are freely available from http://www.ers.usda.gov/Data/Hamburger/.
25.A.2. Irish Election Data Source

The 1997 Irish presidential opinion poll data set was collected by Irish Marketing Surveys and is available through the Irish Elections Data Archive http://www.tcd.ie/PoliticalScience/elections/elections.html, which is maintained by Professor Michael Marsh in the Department of Political Science, Trinity College Dublin, Ireland.
25.A.3. Mathematical Details for the EMM Algorithm

In this section, the expert network MoE model for ranked preference data will be employed to illustrate parameter estimation via the EMM algorithm. In the expert network MoE model, the support parameters within each group are modeled as a function of the judges' covariates. Specifically, for j = 1, ..., N, k = 1, ..., K and the covariates of judge i, $w_i$,

$$p_{kj}(w_i) = \exp(\gamma_{kj}^T w_i)$$

where $\gamma_{kj} = (\gamma_{kj0}, \ldots, \gamma_{kjL})$ is a vector of parameters. To ensure identifiability in the expert network MoE model, $\gamma_{k1} = (0, \ldots, 0)$, meaning $p_{k1} = 1$ in all groups. Under this definition $p_{k1} + \cdots + p_{kN} \neq 1$, but the structure of the Plackett-Luce density ensures valid probabilities of the final preference orderings. The complete data log likelihood for the expert network MoE model is
$$\ell_C(\pi, \gamma) = \sum_{i=1}^{M} \sum_{k=1}^{K} z_{ik} \left\{ \log \pi_k + \sum_{t=1}^{n_i} \gamma_{kc(i,t)}^T w_i - \sum_{t=1}^{n_i} \log \sum_{s=t}^{N} \exp\{\gamma_{kc(i,s)}^T w_i\} \right\} \quad (25.A.1)$$
Maximizing the complete data log likelihood (Eq. 25.A.1) via the EMM algorithm provides maximum likelihood estimates (MLEs) for $\pi$ and $\gamma$.
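To make the structure of the expert network model concrete, the following is a minimal sketch of the covariate-dependent Plackett-Luce probability of a single judge's ranking; the covariate values and parameter matrix are illustrative placeholders, not estimates.

```python
import numpy as np

def plackett_luce_prob(ranking, gamma, w):
    """Probability of a full ranking under a Plackett-Luce model whose
    support parameters are p_j(w) = exp(gamma[j] @ w) (expert network form).
    `ranking` lists object indices in order of preference; `gamma` is an
    (N objects) x (L+1 covariates) matrix with gamma[0] fixed at zero for
    identifiability; `w` is the judge's covariate vector."""
    p = np.exp(gamma @ w)                      # support parameters p_j(w)
    prob = 1.0
    for t in range(len(ranking)):
        remaining = ranking[t:]                # objects still unranked at stage t
        prob *= p[ranking[t]] / p[remaining].sum()
    return prob

# Illustrative values: N = 3 objects, covariates (intercept, one judge covariate)
gamma = np.array([[0.0, 0.0],                  # baseline object (identifiability)
                  [0.5, -1.0],
                  [0.2, 0.8]])
w = np.array([1.0, 0.6])                       # judge's covariates, 0 <= w_l <= 1
print(plackett_luce_prob([2, 0, 1], gamma, w))
```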
The EMM Algorithm

The EM algorithm (Dempster et al., 1977) is an iterative algorithm consisting of two steps per iteration, an 'E' or expectation step and an 'M' or maximization step.
In the EMM algorithm, maximization at the M step is achieved by employing ideas from MM algorithms (Hunter & Lange, 2004). In the E step of the EMM algorithm the expected value of the complete data log likelihood is calculated; essentially this step involves calculating the expected value of the latent variables, z. The form of the E step is independent of the type of MoE model for ranked preference data; the appendix in Gormley and Murphy (2008a) provides full details of the E step. In the M step of the EMM algorithm, the expected complete data log likelihood derived in the E step is maximized with respect to the model parameters, $\pi$ and $\gamma$. In the expert network MoE model for ranked preference data the mixing proportions $\pi$ are treated as independent of the voter covariates. The form of the estimate $\hat{\pi}$ derived at the M step is therefore the same as that derived under a mixture of Plackett-Luce models; details can be found in Gormley and Murphy (2008a). In the expert network MoE model the Plackett-Luce support parameters p are treated as functions of the voter covariates with parameters $\gamma$. Maximization of the expected value of Eq. (25.A.1) with respect to $\gamma$ is complex due to the intricate form of the Plackett-Luce density. Ideas from optimization transfer algorithms or 'MM algorithms' are therefore employed to maximize the expected value of Eq. (25.A.1) with respect to $\gamma$.
Constructing Linear Surrogate Functions

Differentiating the expected value of Eq. (25.A.1) with respect to $\gamma_{kjl}$ for k = 1, ..., K, j = 2, ..., N and l = 0, ..., L is problematic due to the term $\log \sum_{s=t}^{N} \exp\{\gamma_{kc(i,s)}^T w_i\}$. Such maximization issues may be overcome by implementing an optimization transfer algorithm in which optimization is transferred from the problematic objective function to a suitable surrogate function (Lange et al., 2000). Surrogate functions are constructed by exploiting mathematical properties of (part of) the problematic objective function. One approach to constructing linear surrogate functions employs the supporting hyperplane property of a convex function. If f(y) is a convex function with differential f'(y) and $\bar{y}$ denotes a constant value of the generic parameter y, then

$$f(y) \geq f(\bar{y}) + f'(\bar{y})(y - \bar{y})$$

Since $-\log(\cdot)$ is a convex function,

$$\log \sum_{s=t}^{N} \exp\{\gamma_{kc(i,s)}^T w_i\} \leq \log \sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\} + \frac{\sum_{s=t}^{N} \exp\{\gamma_{kc(i,s)}^T w_i\}}{\sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\}} - 1$$
where $\bar{\gamma}$ denotes a constant value of $\gamma$. In practice, this value is the value of the parameter from the previous iteration of the EMM algorithm. Hence, the expected complete data log likelihood becomes, up to a constant,

$$E\{\ell_C(\pi, \gamma)\} \geq \sum_{i=1}^{M} \sum_{k=1}^{K} \hat{z}_{ik} \sum_{t=1}^{n_i} \left\{ \gamma_{kc(i,t)}^T w_i - \frac{\sum_{s=t}^{N} \exp\{\gamma_{kc(i,s)}^T w_i\}}{\sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\}} \right\} \quad (25.A.2)$$
This surrogate objective function still poses challenges (due to the term $\exp\{\gamma_{kc(i,s)}^T w_i\}$) when used to provide estimates of $\gamma$. Optimization transfer algorithms can be implemented again to simplify the maximization.
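The minorization underlying the linear surrogate can be checked numerically: the bound $\log x \leq \log \bar{x} + x/\bar{x} - 1$ holds for any positive x and is tight at $x = \bar{x}$. A small sketch of this check:

```python
import math

def log_surrogate(x, x_bar):
    """Linear-in-x upper bound on log(x), tight at x = x_bar,
    from the supporting hyperplane of the convex function -log."""
    return math.log(x_bar) + x / x_bar - 1.0

x_bar = 2.0
for x in [0.5, 1.0, 2.0, 5.0]:
    assert math.log(x) <= log_surrogate(x, x_bar) + 1e-12
print(math.log(x_bar), log_surrogate(x_bar, x_bar))  # equal at the expansion point
```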
Constructing Quadratic Surrogate Functions

The term $-\exp\{\gamma_{kc(i,s)}^T w_i\}$ is concave, and employing a quadratic surrogate function in this case improves the approximation of the surrogate function to the objective function. A concave function g(y) can be bounded around $\bar{y}$ using a quadratic function via

$$g(y) \geq g(\bar{y}) + \{g'(\bar{y})\}^T (y - \bar{y}) + 0.5 (y - \bar{y})^T B (y - \bar{y})$$

for negative definite B, where $B \leq d^2 g(\bar{y})/d\bar{y}^2$. Hence

$$\exp(\gamma_{kj}^T w_i) \leq \exp(\bar{\gamma}_{kj}^T w_i) + w_i^T \exp(\bar{\gamma}_{kj}^T w_i)(\gamma_{kj} - \bar{\gamma}_{kj}) - 0.5 (\gamma_{kj} - \bar{\gamma}_{kj})^T B (\gamma_{kj} - \bar{\gamma}_{kj})$$

where $B = -w_i^T w_i$. The covariates are constrained such that $0 \leq w_{il} \leq 1$ for computational ease. Hence Eq. (25.A.2) becomes, up to terms not involving $\gamma$,

$$E\{\ell_C(\pi, \gamma)\} \geq \sum_{i=1}^{M} \sum_{k=1}^{K} \hat{z}_{ik} \sum_{t=1}^{n_i} \left[ \gamma_{kc(i,t)}^T w_i - \left[ \sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\} \right]^{-1} \sum_{s=t}^{N} \left\{ w_i^T \exp(\bar{\gamma}_{kc(i,s)}^T w_i)\, \gamma_{kc(i,s)} + 0.5\, \gamma_{kc(i,s)}^T (w_i^T w_i)\, \gamma_{kc(i,s)} - \bar{\gamma}_{kc(i,s)}^T (w_i^T w_i)\, \gamma_{kc(i,s)} \right\} \right]$$
This surrogate to the expected complete data log likelihood is now simply a quadratic function in $\gamma_{kjl}$ and maximization is straightforward. Maximizing with respect to $\gamma_{kjl}$ for k = 1, ..., K, j = 2, ..., N and l = 0, ..., L provides the
estimate of the MLE $\hat{\gamma}_{kjl}$:

$$\hat{\gamma}_{kjl} = \frac{\displaystyle \sum_{i=1}^{M} \hat{z}_{ik} \sum_{t=1}^{n_i} \left[ w_{il}\, 1\{j = c(i,t)\} - \sum_{s=t}^{N} \left[ w_{il} \exp\{\bar{\gamma}_{kj}^T w_i\} - (w_i^T w_i) \bar{\gamma}_{kjl} \right] 1\{j = c(i,s)\} \Big/ \sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\} \right]}{\displaystyle \sum_{i=1}^{M} \hat{z}_{ik} \sum_{t=1}^{n_i} \sum_{s=t}^{N} \left[ -(w_i^T w_i)\, 1\{j = c(i,s)\} \Big/ \sum_{s=t}^{N} \exp\{\bar{\gamma}_{kc(i,s)}^T w_i\} \right]} \quad (25.A.3)$$
The EMM Algorithm for the MoE Model for Ranked Preference Data

In summary, the steps of the EMM algorithm to estimate the MLEs of the parameters of the expert network MoE model for ranked preference data are:

0. Let h = 0 and choose initial parameter estimates $\gamma^{(0)}$ and $\pi^{(0)}$.
1. E step: Compute the quantities $z_{ik}^{(h+1)}$ for i = 1, ..., M and k = 1, ..., K as detailed in Gormley and Murphy (2008a).
2. M step:
(a) Compute $\pi_k^{(h+1)}$ for k = 1, ..., K as detailed in Gormley and Murphy (2008a).
(b) Compute $\gamma_{kjl}^{(h+1)}$ for k = 1, ..., K, j = 2, ..., N and l = 0, ..., L as detailed in Eq. (25.A.3).
3. If converged, then stop. Otherwise, increment h and return to Step 1.

Convergence is assessed in this case using Aitken's acceleration (Böhning, Dietz, Schaub, Schlattmann, & Lindsay, 1994; Lindsay, 1995). The M step changes for the gating network MoE model and for the general MoE model, where the mixing proportions $\pi$ are treated as functions of the covariates; full details of the calculations required in these M steps are given in Gormley and Murphy (2008b). For the mixture model, an EMM algorithm is also required; details are provided in Gormley and Murphy (2006).
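As an illustration of how the E and M steps fit together, the following is a minimal sketch of the EMM iteration for the simpler no-covariate case, a mixture of Plackett-Luce models, assuming complete rankings; the MM update for the support parameters follows Hunter (2004), and all function and variable names are ours rather than the authors'.

```python
import numpy as np

def em_mixture_plackett_luce(rankings, K, n_iter=100, seed=0):
    """EMM for a K-component mixture of Plackett-Luce models on complete
    rankings (each ranking is a permutation of range(N))."""
    rng = np.random.default_rng(seed)
    M, N = len(rankings), len(rankings[0])
    pi = np.full(K, 1.0 / K)
    p = rng.uniform(0.5, 1.5, size=(K, N))            # support parameters

    def log_pl(r, pk):
        return sum(np.log(pk[r[t]]) - np.log(pk[r[t:]].sum())
                   for t in range(N - 1))

    for _ in range(n_iter):
        # E step: responsibilities z_ik proportional to pi_k * f(x_i | p_k)
        logz = np.array([[np.log(pi[k]) + log_pl(np.asarray(r), p[k])
                          for k in range(K)] for r in rankings])
        logz -= logz.max(axis=1, keepdims=True)
        z = np.exp(logz)
        z /= z.sum(axis=1, keepdims=True)
        # M step: closed-form pi update, MM (minorize-maximize) update of p_k
        pi = z.mean(axis=0)
        for k in range(K):
            num = np.zeros(N)                          # weighted selection counts
            den = np.zeros(N)
            for i, r in enumerate(rankings):
                r = np.asarray(r)
                for t in range(N - 1):
                    num[r[t]] += z[i, k]
                    den[r[t:]] += z[i, k] / p[k][r[t:]].sum()
            p[k] = np.maximum(num, 1e-12) / np.maximum(den, 1e-12)
            p[k] /= p[k].sum()                         # normalize for identifiability
    return pi, p

# Toy usage: two blocs of judges ranking N = 3 objects
data = [[0, 1, 2]] * 5 + [[2, 1, 0]] * 5
print(em_mixture_plackett_luce(data, K=2))
```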
Chapter 26
Continuous versus Discrete Representation of Investing Firm Heterogeneity in Modelling FDI Location Decisions

Simona Rasciute and Eric J. Pentecost
Abstract

This paper applies the mixed logit and the latent class models to analyse the heterogeneity in foreign investment location choices in Central and Eastern Europe. The empirical results show that the responsiveness of the probabilities of choices to invest in a particular location to country-level variables differs both across sectors and across firms of different characteristics. The paper highlights the superiority of the latent class model with regard to model fit and the interpretation of results.
26.1. Introduction

There has been a significant increase in foreign direct investment (FDI) to Central and Eastern European Countries (CEECs) following the transition period from planned to market economies, the integration of CEECs into the European Union (EU) and the corresponding elimination of barriers to trade and foreign investment. Multinational enterprises (MNEs) face three major choices while undertaking foreign activities. The first choice is whether to produce at home and export, or whether to produce abroad. Conditional on locating production abroad, the firm has a choice among alternative locations of production. Conditional on deciding where to locate foreign capital, the firm decides the scale of investment. The first two decisions are discrete. The existing empirical literature on FDI mainly focuses on the first and the
last choices (see, e.g. Barrell & Pain, 1999; Carstensen & Toubal, 2004; Bénassy-Quéré, Coupet, & Mayer, 2007). There is, therefore, a lack of detailed analyses of where firms choose to locate their capital, given the decision to locate capital abroad. The studies that analyse the investment location choices of MNEs (e.g. Becker, Ekholm, Jackle, & Muendler, 2005; Crozet, Mayer, & Mucchielli, 2004; Disdier & Mayer, 2004) rely either on the multinomial logit (MNL) or the nested logit (NL) models.1 The MNL model, however, is subject to restrictive assumptions regarding the substitution patterns across alternative investment locations, and while the NL model partially relaxes the independence from irrelevant alternatives (IIA) assumption in order to allow some substitution across alternative investment locations, neither of these models allows for heterogeneity among investing firms, which is potentially very important for the choice of investment location, as Nocke and Yeaple (2007) have demonstrated. The principal contribution of this paper is, therefore, to allow for investing firm heterogeneity,2 by applying the mixed logit (ML) and latent class (LC) models to investigate investment location choices by MNEs in CEECs. The specification of heterogeneity in the ML model involves the choice of specific distributions for the parameters that are treated as random. The LC model, on the other hand, allows for investing firms' heterogeneity by the segmentation of investing firms into a predetermined number of classes for which parameters are estimated separately. The investment location choices of MNEs will not only depend on observed attributes, but also on latent heterogeneity that varies with unobserved factors. The LC model is superior to the ML model in that it does not require alternative distributions to be chosen for the random parameters, but instead captures investing firm heterogeneity with a discrete distribution. Furthermore, the majority of the empirical literature that applies both models reports that the LC model performs better than the ML model (see, e.g. Greene & Hensher, 2003; Hole, 2008; Shen, Sakata, & Hashimoto, 2006). In addition, in order to enable the analysis of investing firm heterogeneity, a novel three-level data set has been constructed, which has investing firm characteristics, rather than just country-level and industry-level data. This multi-level data set, including firm, industry (or sector) and country effects, has been compiled from individual firm data supplied by the Bureau van Dijk, to simultaneously determine the firm-level FDI location decisions. The data set covers 1108 FDI location choices of 693 firms in the USA, the EU15, Norway, Switzerland, Russia and Japan into 13 CEECs (the 12 recent EU member states, excluding Cyprus and Malta, but including Croatia, Russia and Ukraine) over an 11-year period from 1997 to 2007. The estimation results show that the investing firms' characteristics have a significant
1. The exception to this is Basile, Castellani, and Zanfei (2008), who apply the mixed logit model in the context of EU cohesion policy but, unfortunately, the interpretation of results and policy implications are based on the estimated coefficients. However, neither the sign nor the magnitude of the coefficients is very informative and further post-estimation is needed. 2. It is referred to as 'taste' heterogeneity in the transport economics literature.
role to play in the specific choice of investment location, and therefore alternative locations tend to attract different types of FDI. It is therefore important to acknowledge that investing firm characteristics are possibly as important as, if not more important than, the host country or industry characteristics of the investment-receiving country. The rest of the paper is set out as follows. Section 26.2 presents and compares the ML and LC models, Section 26.3 discusses the data set and the construction of the explanatory variables used in the model and Section 26.4 presents the econometric results and policy implications. Finally, Section 26.5 concludes.
26.2. The Mixed Logit and Latent Class Models

The probability of investing firm i choosing to locate its capital in a foreign country c, conditional on a vector of coefficients $\beta_i$,3 can be expressed as:

$$L_{isc}(\beta_i) = \frac{e^{\beta_i' x_{isc}}}{\sum_{g} e^{\beta_i' x_{isg}}} \quad (26.1)$$
which are standard logit probabilities, where $x_{isc}$ is a vector of observed investment location-specific variables, which have different values for firms investing in sector, s, and for firms of different characteristics; and $\beta_i$ are individual firm-specific parameters. In the ML4 model, the vector $\beta_i$ is distributed randomly with parameters $\Psi$. In the most general form, the ML probabilities are the integrals of standard logit probabilities over a density of unobserved random parameters (Train, 2003) such that:

$$P_{isc} = \int L_{isc}(\beta_i) f(\beta_i \mid \Psi)\, d\beta_i \quad (26.2)$$
where $f(\beta_i)$ is the random parameter density function, which is specified to be continuous. The ML probability is a weighted average of the logit formula evaluated at different values of $\beta_i$, with the weights given by the density $f(\beta_i)$. Since the integral cannot be calculated analytically, it has to be approximated through simulation by maximising the simulated log-likelihood function. A value of $\beta$ is drawn from the distribution $f(\beta \mid \Psi)$ for a given value of $\Psi$ and labelled $\beta^r$, with the subscript r = 1 referring to the first draw. The logit formula $L_{isc}(\beta^r)$ is calculated with this draw. Finally, the two steps are repeated many times and the results are
3. Often referred to as 'taste' coefficients in transport economics. 4. In the literature also referred to as 'random parameter logit', 'mixed multinomial logit', 'kernel logit' and 'hybrid logit'.
averaged, which gives the simulated probability:

$$\bar{P}_{isc} = \frac{1}{R} \sum_{r=1}^{R} L_{isc}(\beta^r) \quad (26.3)$$

where R is the number of draws. $\bar{P}_{isc}$ is an unbiased estimator of $P_{isc}$, whose variance decreases as R increases. It is strictly positive ($\ln \bar{P}_{isc}$ is defined) and smooth (twice differentiable) in the parameters $\Psi$ and the variables x, which facilitates the numerical search for the maximum of the likelihood function and the calculation of elasticities. Furthermore, $\bar{P}_{isc}$ sums to one over all alternatives, which is helpful when interpreting the results. The simulated probabilities are substituted into the log-likelihood function to give the simulated log likelihood (SLL):
$$SLL = \sum_{c=1}^{C} \sum_{i=1}^{I} d_{isc} \ln \bar{P}_{isc} \quad (26.4)$$
where $d_{isc}$ = 1 if i chose c and zero otherwise. The maximum simulated likelihood estimator (MSLE) is the value of $\Psi$ that maximises SLL. The ML model is the most flexible discrete choice model and approximates any random profit model (McFadden & Train, 2000), but this flexibility comes at the cost of an increased computational burden, as the choice probabilities are given by integrals without a closed form solution. Furthermore, the interpretation of results estimated by the ML model is not straightforward. Knowing that a coefficient is distributed randomly across respondents is of little use to policy makers (Hess, Ben-Akiva, Gopinath, & Walker, 2009). The most challenging task in the specification of the ML model is the choice of an appropriate distribution for the random parameters, especially when a priori information about the true shape of the distribution is not available. A choice of an inappropriate distribution can lead to poor model performance and misleading policy implications. Distributions are arbitrary approximations to the real behavioural profile, and specific distributions selected as 'empirical truth' are believed to be somewhere in the domain (Train, 2003). The majority of applications use either the normal or the lognormal distribution; however, a normal distribution may be restrictive due to its symmetry and the length of its tails, while the lognormal distribution restricts the sign and has a long tail on one side. Alternatively, a continuous distribution may be replaced with a discrete distribution with a finite number of mass points, which gives a semi-parametric extension of the ML model — the LC model. The LC model does not require the researcher to make specific assumptions about the distribution of random parameters across investing firms, and simulation is not needed in the estimation process. Investing firms are implicitly divided into a number of classes Q, although it is not known which class contains a particular firm. Investing firm behaviour depends on observable attributes and on latent heterogeneity that varies with factors that are unobserved (Greene & Hensher, 2003).
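A minimal sketch of this simulation estimator (Eqs. 26.1 to 26.4), assuming normally distributed random coefficients and plain pseudo-random draws rather than the Halton draws noted in the paper's footnotes; all names and dimensions are illustrative:

```python
import numpy as np

def simulated_probs(X, mean, sd, R=1000, seed=0):
    """Simulated mixed logit choice probabilities, Eq. (26.3).
    X: (C alternatives) x (K attributes) matrix for one firm;
    beta^r ~ Normal(mean, sd) independently per coefficient."""
    rng = np.random.default_rng(seed)
    draws = mean + sd * rng.standard_normal((R, len(mean)))  # R x K draws
    v = draws @ X.T                                          # R x C utilities
    v -= v.max(axis=1, keepdims=True)                        # numerical stability
    logit = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True) # Eq. (26.1) per draw
    return logit.mean(axis=0)                                # average over draws

def simulated_loglik(X_list, chosen, mean, sd, R=1000):
    """Simulated log likelihood, Eq. (26.4): sum of log simulated
    probabilities of the chosen alternatives."""
    return sum(np.log(simulated_probs(X, mean, sd, R)[c])
               for X, c in zip(X_list, chosen))

# Illustrative data: 2 firms, 3 locations, 2 attributes
X_list = [np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 0.4]])] * 2
print(simulated_loglik(X_list, chosen=[0, 2], mean=np.array([0.5, -1.0]),
                       sd=np.array([0.3, 0.5])))
```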
The profit parameters become class specific, such that $\pi_{isc|q} = \beta_q' z_{isc} + \varepsilon_{isc|q}$, where $\pi$ is the potential profit of foreign investment in country c, $\beta_q$ are class-specific estimated profit parameters and $z_{isc}$ is a vector of country characteristics, which have different values for firms investing in sector, s, and for firms of different characteristics. The probability that an investing firm i in class q chooses among C alternatives to locate its investment is given as:

$$\Pr(Y_{isc} = c \mid \text{class} = q) = \frac{\exp(\beta_q' z_{isc})}{\sum_{c=1}^{C} \exp(\beta_q' z_{isc})} \quad (26.5)$$

Within each class, the choice probabilities are assumed to be generated by the MNL model. The class membership, however, is not observed and class probabilities are specified by the MNL form. Therefore, the probability of investing firm i belonging to class q can be expressed as:

$$H_{iq} = \frac{\exp(\theta_q' h_i)}{\sum_{q=1}^{Q} \exp(\theta_q' h_i)}, \quad q = 1, \ldots, Q, \quad \theta_Q = 0 \quad (26.6)$$
where $h_i$ denotes a set of observed investing firms' characteristics. However, the majority of applications use the simplest form, where the probabilities for the different classes, and hence the values of the 'taste' coefficients, are kept constant among respondents (one of the few exceptions is Walker & Li, 2007). More flexibility can be gained by linking the class allocation directly to investing firms' individual characteristics.5 Combining the conditional choice equation (26.5) and the membership classification equation (26.6), the joint probability that investing firm i belongs to class q and chooses alternative c can be written as:
$$P_{isc} = \sum_{q=1}^{Q} H_{iq} P_{isc|q} = \sum_{q=1}^{Q} \left[ \frac{\exp(\beta_q' z_{isc})}{\sum_{c=1}^{C} \exp(\beta_q' z_{isc})} \right] \left[ \frac{\exp(\theta_q' h_i)}{\sum_{q=1}^{Q} \exp(\theta_q' h_i)} \right] \quad (26.7)$$
The parameter vectors $\beta_q$ and $\theta_q$ are simultaneously estimated by the maximum likelihood method and the log likelihood (LL) for the sample is defined as:

$$LL = \sum_{i=1}^{I} \ln P_i = \sum_{i=1}^{I} \ln \left[ \sum_{q=1}^{Q} H_{iq} P_{isc|q} \right] \quad (26.8)$$
The LL is maximised with respect to the Q structural parameter vectors, $\beta_q$, and the Q − 1 LC parameter vectors, $\theta_q$.
5. Socio-demographic attributes in the case of individuals.
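A minimal sketch of the LC choice probabilities and sample log likelihood (Eqs. 26.5 to 26.8), with illustrative dimensions and parameter values rather than the paper's estimates:

```python
import numpy as np

def softmax(v, axis=-1):
    v = v - v.max(axis=axis, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=axis, keepdims=True)

def lc_loglik(Z_list, chosen, H_feats, beta, theta):
    """Latent class logit log likelihood, Eq. (26.8).
    Z_list: per firm, a (C alternatives) x (K attributes) matrix;
    H_feats: (I firms) x (L characteristics) matrix;
    beta: Q x K class-specific profit parameters;
    theta: Q x L class-membership parameters (last row fixed at zero)."""
    H = softmax(H_feats @ theta.T, axis=1)        # Eq. (26.6): I x Q memberships
    ll = 0.0
    for i, (Z, c) in enumerate(zip(Z_list, chosen)):
        P_cond = softmax(Z @ beta.T, axis=0)[c]   # Eq. (26.5): P(c | class q), length Q
        ll += np.log(H[i] @ P_cond)               # Eq. (26.7) summed over classes
    return ll

# Illustrative data: 2 firms, 3 locations, 2 attributes, Q = 2 classes
Z_list = [np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 0.4]])] * 2
H_feats = np.array([[1.0, 0.3], [1.0, 0.9]])      # intercept plus one firm characteristic
beta = np.array([[0.5, -1.0], [-0.2, 0.7]])
theta = np.array([[0.4, -0.6], [0.0, 0.0]])       # theta_Q = 0 for identification
print(lc_loglik(Z_list, chosen=[0, 2], H_feats=H_feats, beta=beta, theta=theta))
```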
The issue in the estimation process is the choice of the number of classes, Q, as the comparison of the LLs of models with a different number of classes is not appropriate. While increasing the number of classes increases the fit of the model, it may lead to some coefficients having very large standard errors. The trade-off between the goodness of fit and the precision of the parameter estimates can be found with the help of the information criteria summarised by Shen and Saijo (2007), which can help determine the optimal number of classes, Q. The four criteria are:

Akaike information criterion: $AIC = -2(LL_Q - K_Q)$
Akaike's $\rho^2$: $\rho_Q^2 = 1 - [AIC_Q / (-2LL_0)]$
Bozdogan Akaike information criterion: $AIC3 = -2LL_Q + 3K_Q$
Bayesian information criterion: $BIC = -LL_Q + (K_Q \log N)/2$

where $LL_Q$ is the log likelihood at convergence with Q classes, $K_Q$ is the number of parameters in the model with Q classes, $LL_0$ is the log likelihood of the sample with equal choice probabilities and N is the sample size. The 'optimal' number of LCs is indicated by the minimum values of AIC, AIC3 and BIC and the maximum value of $\rho_Q^2$.
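These criteria can be reproduced from the quantities reported in Table 26.4; a small sketch, using N = 1108 choices over 13 alternatives (so that $LL_0 = N \ln(1/13)$):

```python
import math

N, C = 1108, 13                      # sample size and number of alternatives
LL0 = N * math.log(1.0 / C)          # log likelihood with equal choice probabilities

def criteria(LLQ, KQ):
    aic = -2.0 * (LLQ - KQ)
    rho2 = 1.0 - aic / (-2.0 * LL0)
    aic3 = -2.0 * LLQ + 3.0 * KQ
    bic = -LLQ + KQ * math.log(N) / 2.0
    return aic, rho2, aic3, bic

# Q = 5 row of Table 26.4: LL = -2351.094 with 51 parameters
print(criteria(-2351.094, 51))       # ~ (4804.19, 0.155, 4855.19, 2529.86)
```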
26.3. The Data Set and Variable Specifications

The sample consists of 1108 firm-level data observations on FDI location choices by 693 firms of 20 market economies (EU15 countries, USA, Japan, Russia, Norway and Switzerland) to firms in 13 transition economies (12 new EU member states (except for Malta and Cyprus) plus Croatia, Russia and Ukraine) from 1997 to 2007.6 Most of the empirical literature on FDI focuses on Greenfield investment, excluding other entry modes such as mergers and acquisitions (M&A), joint ventures and institutional buyouts (Brownfield investment), which are the most important in the CEECs (Head & Ries, 2008). For example, in 2005 the share of cross-border M&As in FDI was about 96 per cent in the Czech Republic, 84 per cent in Estonia, 82 per cent in Ukraine and 68 per cent in Bulgaria (UNCTAD7 statistics).8 In order to test for the location determinants of Brownfield investment, the MNEs in the sample enter a foreign market via M&A, joint ventures or institutional buyouts. Of all 13 host CEECs in the sample, Poland has received the largest share of foreign capital allocations by MNEs (about 21 per cent), followed by Russia with approximately 17 per cent of foreign investment location choices. Slovenia and Latvia, on the other hand, have received the smallest shares of foreign
6. It is a cross-sectional analysis, as investment by an individual firm is a one-shot event rather than a continuous process. On the other hand, the correlation among multiple decisions of the same firm has been taken into account. 7. United Nations Conference on Trade and Development. 8. It is difficult to estimate precisely what share of FDI flows is accounted for by cross-border M&A because the values of cross-border M&As cannot be directly compared with FDI flows registered in the balance of payments.
Table 26.1: The number of foreign capital allocations from each source country to each host country (columns: destination countries, rows: source countries).

             BG  CZ  EE  CR  HU  LT  LV  PL  RO  RU  SI  SK  UA
Austria       5   5   0   7  18   0   0  10  10   1   5   5   1
Belgium       0   4   0   1   3   0   0   6   3   7   3   4   0
Denmark       2   3   1   0   1   8   4  15   2   0   1   0   0
Germany       4  20   3   6  12   1   1  11  14   7   3   6   0
Finland       0   5  32   5   0  23  10  21   2  24   0   2   7
France        2   6   1   2  11   1   2  40   9  13   4   2   8
Greece       19   0   0   1   7   0   0   0  19   1   0   0   4
Ireland       1   2   0   0   0   0   0   3   0   3   0   1   3
Italy         1   2   0   0   4   0   0   8   7   7   1   1   0
Japan         0   1   0   0   1   0   0   2   0   5   0   0   0
Luxembourg    3   4   0   4   0   0   0   9   0   1   0   0   0
Netherlands   3  13   0   3   3   0   0  18   5  17   0   6   3
Norway        0   5   4   0   1   3   2   2   1  10   0   1   2
Russia        0   7   1   0   2   4   5   1   3   0   0   0  30
Spain         2   7   2   1   1   0   0   7   6   6   0   0   0
Switzerland   0   4   0   1   4   0   0   2   3   9   3   0   0
Sweden        3   4  14   2   1  16   7  13   0  24   2   0   2
UK            5  15   2   2  19   0   1  23   8  37   1   6   3
US            3  15   0   0  13   1   0  32  12  20   3   3   3
BG, Bulgaria; CZ, Czech Republic; EE, Estonia; CR, Croatia; HU, Hungary; LT, Lithuania; LV, Latvia; PL, Poland; RO, Romania; RU, Russia; SI, Slovenia; SK, Slovakia; UA, Ukraine.
capital allocations (approximately 2 and 3 per cent, respectively). The two major investors in the CEECs in the sample are Finland and the UK, with shares of approximately 12 and 11 per cent, respectively. MNEs from Japan and Ireland were at the other end of the scale regarding investment location choices in the CEECs, with approximately 1 per cent each. The numbers of foreign capital allocations from each source country to each host country are summarised in Table 26.1. The largest number of foreign capital allocations in the CEECs took place in the traditional sectors (approximately 36 per cent), followed by scale-intensive industries (about 24 per cent) and service sectors (nearly 23 per cent).9 Science-based industries have received the smallest share of FDI (approximately 18 per cent) in the sample.
9. Scale-intensive sectors are typically oligopolistic, large firm industries, with high capital intensity, extensive economies of scale and high technical and managerial complexity. Science-based sectors are characterised by innovative activities directly linked to high R&D. Traditional sectors produce nondurable consumer goods but only innovate by acquiring supplies of equipment and materials. Typically, the output of the service sector is ‘intangible goods’, such as financial and information services and professional advice.
Table 26.2 gives a summary of variable definitions and sources. The traditional determinants of FDI are the market size of the host country, the cost of capital in the host country and the distance between the investing and investment-receiving countries. As Table 26.2 shows, market size is measured as the real GDP of the host country. The host country cost of capital is measured as the real discount (interest) rate. Both of these variables are expected to be positively associated with FDI inflows. Distance can be considered as a measure of the transaction costs of undertaking foreign activities, such as the costs of transport and communication, the costs of dealing with cultural and language differences, the costs of sending personnel abroad and the informational costs of institutional and legal factors, for example local property rights, regulations and tax systems. These kinds of costs are all assumed to increase with distance. In addition to the above-mentioned factors, three other country-specific factors are included in the empirical model: the national rate of unemployment and two dummy variables, one for EU membership and another for the presence of a common border between the investing and the investment-receiving country. A dummy variable for a common border between the source and the host country is included, as it is expected that the host country is more likely to be chosen to receive investment if it shares a border with the source country. Usually countries sharing the same border have similar cultures and languages and stronger historical ties. Countries that joined the EU by January 2007 had to satisfy the economic (market economy), political (democracy and human rights) and administrative (well-functioning institutions) criteria set at the Copenhagen European Council in 1993. Therefore, it is not the membership itself that matters but the integration process into the EU that started before a country became an actual member. The accession of a CEEC into the EU meant free trade with EU member states and the adoption of a Western-type business and legal environment, which provided foreign investors with confidence in the completion and success of each country's reforms. As a result, the parameter on the EU dummy variable is expected to have a positive sign. The rate of unemployment in the host country, on the other hand, can be used as an indicator of labour market flexibility and the availability of labour. Countries with high local demand for goods and services and high labour market flexibility are likely to face relatively low rates of unemployment, which may encourage firms to invest in a particular host country. On the other hand, a high unemployment rate may mean that although it is easy to recruit labour, there is low demand locally and likely labour market rigidities. The impact of unemployment on the investment location decision is therefore strictly ambiguous and it may have a different effect on firms investing in different industries. For example, firms investing in traditional sectors employ less skilled labour and may be more concerned about the availability of workers, while firms investing in science-based industries, which employ more skilled labour, may be discouraged by higher unemployment, as unemployed people lose their skills over time. Industry-level real wage rates are included as a proxy for the average variable costs of firms (it is implicitly assumed that workers are not fully mobile across sectors, at least in the short run).
Table 26.2: List of variables, definitions and sources.

Choice: A CEEC, in which firm i chooses to locate its investment over the period from 1997 to 2007 (it takes the value 1 if the country received the investment and 0 otherwise). Source: Bureau van Dijk Zephyr database.
Market size: Real GDP of the host country c in the year the investment took place. Source: International Financial Statistics (IFS).
Distance: Distance between investing country d and investment-receiving country c in kilometres. Source: http://www.indo.com/distance.
Unemployment: Unemployment rate of country c (percentage per annum) in the year the investment took place. Source: IFS.
Interest: The real discount (interest) rate. Source: IFS.
Border: Dummy variable that takes the value 1 if source country d and host country c share a border, and 0 otherwise. Source: constructed.
EU: Dummy variable that takes the value 1 if country c joined the EU before January 2007, and 0 otherwise. Source: constructed.
Traditional: Dummy variable that takes the value 1 if industry s is a traditional industry, and 0 otherwise. Source: constructed.
Scales: Dummy variable that takes the value 1 if industry s is a scale-intensive industry, and 0 otherwise. Source: constructed.
Sciences: Dummy variable that takes the value 1 if industry s is a science-based industry, and 0 otherwise. Source: constructed.
Services: Dummy variable that takes the value 1 if industry s is a service sector, and 0 otherwise. Source: constructed.
Wage: Hourly real wage rates in industry s in country c in the year the investment took place. Source: International Labour Organisation.
Size: Turnover of the investing firm i in euros in the year the investment took place. Source: Bureau van Dijk Zephyr database.
Profit: Earnings before interest and taxes of the investing firm i in euros in the year the investment took place. Source: Bureau van Dijk Zephyr database.
The profitability of the firm investing abroad is expected to be higher if the labour costs are lower in the chosen country than in the rest of the destination countries (Barrell & Pain, 1999). On the other hand, however, higher wages may reflect higher skills and, therefore, may have a positive effect for firms investing in science-based industries, where more skilled labour is employed as compared to other industries. As a result, the sign of the parameter on industry-level real wage rates is ambiguous. The industries that have received foreign capital can be divided into four groups: scale-intensive sectors, science-based industries, traditional sectors and service sectors. Scale-intensive sectors are typically oligopolistic, large-firm industries, with high capital intensity, extensive economies of scale and learning, and high technical and managerial complexity, for example, automobiles, aircraft, chemicals, petrol and coal products, shipbuilding, industrial chemicals, drugs and medicines, petrol refineries, non-ferrous metals and railroad equipment. Science-based sectors, on the other hand, are characterised by innovative activities directly linked to high R&D expenditures, for example, fine chemicals, electronic components, telecommunications and aerospace (Midelfart-Knarvik, Overman, Redding, & Venables, 2000). Traditional (supplier-dominated) sectors include such industries as textiles, clothing, furniture, leather and shoes, ceramics and the simplest metal products. Finally, banking, insurance and retail are examples of service sector industries. The country-level variables, however, may not only have a different value for firms investing in different sectors, but also for firms with different characteristics. Therefore, two firm-level variables are included in the model: firstly, the turnover of the investing firm, as a proxy for its size; secondly, earnings before interest and tax, as an indicator of the firm's financial position in the year in which the investment is made. Firms of different sizes and financial performance possess different resources and capabilities. Small firms are assumed to be characterised by speed, flexibility and niche-filling capabilities due to their structural simplicity, faster decision making, entrepreneurial orientation and lower risk aversion (Woo, 1987). As a result, smaller firms respond more quickly to the dynamics of the industry environment. Larger firms, which are usually more profitable, are able to acquire larger market shares by exploiting scale economies, bargaining power, patents and reputation, and they have more financial resources to deal with shocks and business downturns (Dean, Brown, & Bamford, 1998). Larger firms are expected to invest in countries with larger markets in order to exploit their economies of scale, while better performing firms are expected to be less discouraged from investing in remote countries, as more financial resources are available to cover transaction costs, such as the costs of transport and communication, the costs of dealing with cultural and linguistic differences and the information costs of institutional and legal factors.
26.4. Estimation and Results

This section presents the estimation results of the ML and LC models. For comparison purposes, the MNL model is also estimated. Investing firm heterogeneity
in the MNL model can be accommodated in a deterministic way by either estimating separate coefficients for mutually exclusive subgroups of the sample population or by continuous interaction between 'taste' coefficients and individual firm characteristics (Hess et al., 2009). As the segmentation of firms on the basis of observed characteristics is somewhat ambiguous, the heterogeneity in the MNL model is taken into account with interaction terms between country-level factors and industry dummies together with investing firm characteristics (Table 26.3, column 5). The following interaction terms are statistically significant: the interaction term between the dummy variable for traditional sectors and the unemployment rate in the host country, the interaction term between the dummy variable for traditional sectors and the wage rate in the host country, the interaction term between the investing firm's performance and the distance between the investing and investment-receiving countries, and finally, the interaction term between the investing firm's size and the host country's market size. The estimation results of the ML model that has the best model fit and random parameters with the most appropriate distributions are presented in Table 26.3,
Table 26.3: The results of the mixed logit model estimation.

Variables       Distribution         Mixed logit mean       Mixed logit SD      Multinomial logit
Wage            Triangular           -0.285* (-4.493)       0.7252* (3.021)     -0.197* (-4.709)
GDP             Triangular           0.575* (10.760)        1.4868* (5.511)     0.491* (13.959)
Distance        Triangular           -1.252* (-10.625)      1.9154* (3.223)     -0.924* (-11.89)
Unempl          Triangular           -3.5644* (-2.684)      20.779* (2.653)     -2.05*** (-1.893)
Border          Restricted uniform   0.5651* (4.205)        0.5651* (4.205)     0.563* (5.175)
EU              Restricted uniform   0.8028* (5.204)        0.8028* (5.204)     0.576* (5.606)
Interest        –                    4.5863* (3.945)        –                   0.941 (0.954)
Prof_Dist       –                    0.1471* (2.922)        –                   0.106* (2.688)
Size_GDP        –                    0.1334* (4.090)        –                   0.104* (4.428)
Tr_Unemp        –                    8.4893* (4.308)        –                   7.876* (4.689)
Tr_Wage         –                    -1.4718* (-5.065)      –                   -0.429* (-5.388)
Log likelihood                       -2498.132                                  -2527.725
Chi-squared                          687.664                                    121.8558
Pseudo-R2                            0.1210                                     0.0235

t-statistics in parentheses, where *significant at the 1 per cent level, **significant at the 5 per cent level and ***significant at the 10 per cent level. The triangular distribution is imposed on the wage, market size, distance and unemployment variables, while the restricted uniform distribution is imposed on the dummy variables for common border and EU membership. The third and fourth columns present the estimated mean and standard deviation parameters of the mixed logit model; for comparison, the results of the multinomial logit model estimation are presented in the last column.
columns 3 and 4.10 A triangular distribution is imposed on the variable for market size in the host country, the wage variable, the distance variable and the unemployment variable. A restricted uniform distribution is imposed on the two dummy variables: the dummy variable for a common border and the dummy variable for EU membership.11 The same interaction terms as for the MNL model are statistically significant. As neither the sign nor the magnitude of the interaction terms or separately included variables in the model is informative, the elasticities and marginal effects have to be estimated for the continuous and dummy variables. When two continuous variables are interacted, the direction of the interaction term is revealed with the help of simulation.12 The results of the MNL and ML models indicate that firms that choose to invest in traditional sectors are more sensitive to higher wage rates in the host country than firms which choose to invest in the non-traditional sectors. Higher unemployment in a host country, on the other hand, has a less negative or even positive effect on the probability of selecting the country as a location for foreign capital for firms that invest in traditional sectors, as compared to the firms that invest in non-traditional sectors. Typically, traditional sectors employ more unskilled labour, as compared to other sectors, for example, science-based industries, which employ more skilled labour and pay higher wages that reflect a skill premium. Better performing investing firms are less likely to be discouraged from investing in more remote countries as they usually have more resources to pay for the transaction costs associated with investment in more remote countries. The larger the host country, the more likely it is to be chosen by an investing firm to locate its capital, and the effect is stronger for larger investing firms, as compared to smaller firms. Larger firms are usually characterised by significant economies of scale; therefore, they search for larger foreign markets to exploit these economies. The results of the LC model estimation are presented in Table 26.5. Three out of four criteria presented in Section 26.2 support five LCs as the optimal solution for the data, as the calculated values of AIC and AIC3 are at their minimum and $\rho^2$ at its maximum (Table 26.4).13 The fit of the model indicated by the log-likelihood function, $\chi^2$ and pseudo-R2 is much better for the LC model as compared to the MNL and ML models, implying the superiority of the LC model. The different estimated parameters for each LC reveal five groups of investing firms in the sample. None of the groups has a clearly defined type of investor who is concerned about particular, exclusive
10. There are no rules that help choose appropriate distributions for random parameters in the specification of the ML model, apart from the requirement for the mean and standard deviation of the random parameter to be statistically significant. As a result, the analysis starts by treating each country-level variable separately as random by imposing various distributions and then combining the most appropriate distributions in the final specification. 11. 1000 Halton intelligent draws are used to estimate the model. 12. The results are not included due to the lack of space but they are available upon request. 13. Nlogit4 supports a maximum of five classes.
Table 26.4: The AIC, r²_Q, AIC3 and BIC measures for 2, 3, 4 and 5 classes.

            2 classes    3 classes    4 classes    5 classes
LogL_Q     −2467.643    −2426.882    −2380.229    −2351.094
K_Q            18           29           40           51
AIC         4971.286     4911.764     4840.458     4804.188
r²_Q          0.1254       0.1359       0.1484       0.1548
AIC3        4987.286     4940.764     4880.458     4855.188
BIC         2530.7357    2528.5314    2520.435     2529.8567

The optimal number of latent classes is indicated by the minimum value of AIC, AIC3 and BIC and by the maximum value of r²_Q.
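The AIC and AIC3 columns follow the standard definitions AIC = −2LogL + 2K and AIC3 = −2LogL + 3K. A minimal sketch reproducing the class-count comparison from the table's log-likelihoods (a hedged illustration: the exact BIC and r²_Q variants used by Nlogit are not spelled out in the chapter, so only the two unambiguous criteria are computed):

```python
import numpy as np

# Log-likelihoods and parameter counts for 2-5 classes, from Table 26.4
logl = np.array([-2467.643, -2426.882, -2380.229, -2351.094])
k = np.array([18, 29, 40, 51])
classes = np.array([2, 3, 4, 5])

aic = -2 * logl + 2 * k    # Akaike information criterion
aic3 = -2 * logl + 3 * k   # AIC variant with a penalty of 3 per parameter

for q, a, a3 in zip(classes, aic, aic3):
    print(f"{q} classes: AIC = {a:.3f}, AIC3 = {a3:.3f}")

print("AIC selects", classes[aic.argmin()], "classes")    # 5
print("AIC3 selects", classes[aic3.argmin()], "classes")  # 5
```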
host country characteristics. The LC model allows the preferences to overlap, as investing firms are heterogeneous and complex entities. The absolute values of the estimated parameters are not comparable across the LCs due to scale differences and are therefore not informative on their own. In order to compare the results across the different classes, and to analyse to what degree investors with different characteristics benefit from different country-level variables, marginal rates of substitution between two factors should be calculated for each class, with at least the attribute in the denominator of the ratio being measured in monetary units.14 Furthermore, both attributes used in the calculation of the ratio have to be statistically significant; otherwise, no meaningful measure can be established. The calculated ratios, with the parameter of the market size variable in the denominator, show how much investing firms are willing to ''pay'' in host-country GDP (in billions of EUR) for a decrease (increase, if the ratio is positive) in the distance between the investing and investment-receiving country by 1 km, for a decrease in the hourly wage rate by one EUR, for a decrease (increase, if the ratio is positive) in the unemployment rate by one percentage point, for an increase in the return on capital by one percentage point, for the host country being an EU member as compared to non-EU members, and for the host country having a common border with the source country (Table 26.6). However, it is not the absolute value of the ratio itself that is of interest, but its absolute value in relation to the equivalent ratios from the other LCs. The comparison of the ratios across the different groups reveals the extent of the sensitivity of foreign investors in one class to country-level factors, as compared to the foreign investors from the other LCs. So, for example, firms in Class 3 are highly sensitive to distance and those in Class 5 to the wage rate.
14. In transport economics, willingness-to-pay measures are used in the same way, for example, the value of travel time savings, defined as the amount of money an individual is willing to pay in order to save a unit of time spent travelling, ceteris paribus.
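Each entry of Table 26.6 is simply the ratio of two class-specific coefficients from Table 26.5, with the market size (GDP) coefficient in the denominator. A minimal sketch of the calculation using the Class 1 coefficients as an example (the dictionary layout is illustrative, not the authors' code):

```python
# Class 1 coefficients from Table 26.5 (GDP is the market size variable)
class1 = {
    "GDP": 4.4877, "Distance": 3.5988, "Unemployment": -17.2130,
    "Interest": 23.4552, "EU": 11.5249, "Border": -4.2322,
}

# Marginal rate of substitution of each attribute for market size: how much
# host-country GDP (in billions of EUR) investors are willing to "pay" for
# a one-unit change in the attribute.
for name, beta in class1.items():
    if name != "GDP":
        print(f"{name}/GDP = {beta / class1['GDP']:.4f}")
# |Distance/GDP| = 0.8019, |Unemployment/GDP| = 3.8356, and so on,
# matching the Class 1 column of Table 26.6.
```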
The direction and the magnitude of the effects can also be revealed with elasticities and marginal effects. However, elasticities and marginal effects cannot normally be estimated for each class separately, as it is not known which class contains which investing firm; therefore, in the previous empirical literature that has applied the LC model, only an average effect over all classes is given. In contrast to the previous empirical literature, in this paper posterior probabilities are used to identify which firm belongs to which class, and elasticities and marginal effects are therefore estimated for each class separately and given in Table 26.7.

Following the results presented in Tables 26.5, 26.6 and 26.7, the first class contains foreign investors who choose to locate their capital abroad to access foreign markets (the estimated coefficients in Table 26.5 and the estimated elasticities in Table 26.7 are positive for the market size variable for investors in the first class), preferably in the EU member states (the estimated marginal effects in Table 26.7 and the estimated coefficients in Table 26.5 are positive for the EU dummy variable), and they are not discouraged from investing in more remote countries in order to access those markets (the estimated elasticities in Table 26.7 and the estimated coefficients in Table 26.5 are positive for the distance variable). Regarding individual investment-receiving countries, market size appears to be especially important in attracting FDI to Russia, the Czech Republic and Hungary (Table 26.7). The estimated marginal effects for the EU dummy variable for foreign investors in the first class are very high, especially for Hungary and the Czech Republic (Table 26.7), indicating that EU membership is the key factor driving foreign capital allocations by investors in this class. Unemployment, on the other hand, has a negative effect on the foreign capital allocation of firms in the first class, and the effect is strongest for Croatia and Slovakia (Table 26.7).

The marginal rates of substitution cannot be calculated for the second class because the market size variable is not statistically significant for that class. However, based on the elasticities and marginal effects presented in Table 26.7 and the estimated coefficients in Table 26.5, it can be concluded that foreign investors in the second class are small firms who are concerned about the availability of a cheap labour force and prefer to invest in nearby neighbouring countries, preferably EU members, which have a similar culture and traditions. Investors in the second class have the highest negative elasticities for the labour cost variable (Table 26.7), indicating that the higher the labour costs in the host country, the less likely that country is to be chosen by foreign firms in the second class; this effect is strongest for Slovenia and Poland. The availability of the labour force appears to be of most importance for investors in the second class in Poland, while a common border with the investing country is especially beneficial for Romania, Bulgaria and Ukraine (Table 26.7). Ukraine, for example, is a neighbour of Russia, which also appears to be its major source of foreign investment.
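The class-specific elasticities in Table 26.7 rest on assigning each firm to the class with the highest posterior membership probability. A minimal sketch of that step under the standard latent class formulas (array names, shapes and the random stand-in data are assumptions for illustration, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative inputs: for N firms and Q classes,
#   prior[n, q] = prior class membership probability of firm n
#   lik[n, q]   = likelihood of firm n's observed choices within class q
N, Q = 5, 5
prior = rng.dirichlet(np.ones(Q), size=N)
lik = rng.uniform(0.01, 1.0, size=(N, Q))

# Posterior membership probabilities (Bayes' rule)
post = prior * lik
post /= post.sum(axis=1, keepdims=True)

# Assign each firm to its most likely class, so elasticities and marginal
# effects can then be averaged within each class separately
assigned = post.argmax(axis=1)

# Direct logit elasticity of choice probability p w.r.t. attribute x,
# as defined in the note to Table 26.7: E = beta * x * (1 - p)
def direct_elasticity(beta, x, p):
    return beta * x * (1.0 - p)
```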
The third class contains larger foreign investors with relatively weak financial performance and with the highest negative sensitivity to the remoteness of the host country (especially Bulgaria and Ukraine) from the source country, as compared to foreign investors from the other LCs.
Table 26.5: The estimated coefficients of the latent class model.

                    Class 1               Class 2               Class 3              Class 4              Class 5
Wage                0.0028 (0.042)       −3.8631** (−10.718)   −0.2516** (−2.220)   −0.0918** (−2.257)   −2.6237* (−3.571)
Distance            3.5988** (2.581)     −1.2877** (−4.486)    −4.6814* (−12.067)   −1.4069* (−13.323)    3.7456 (1.164)
GDP                 4.4877* (3.907)      −0.0888 (−0.790)       0.4268* (3.934)      0.2288* (3.165)      3.6915* (3.252)
Unemployment      −17.2130** (−2.154)     6.7638*** (1.643)    10.2603* (4.366)     −8.1199* (−5.895)   −75.7799* (−4.172)
Border             −4.2322* (−3.207)      1.3494* (4.111)      29.7836 (0.000)       0.5371* (3.068)     56.0962 (0.0001)
EU                 11.5249* (3.334)       1.0645* (5.193)       0.0432 (0.187)       1.9698* (7.515)     33.9335 (0.0001)
Interest           23.4552* (3.050)       4.1678* (2.149)       6.7264** (2.067)     3.7809** (2.217)    16.9247 (1.354)

Class membership model (rows: Constant, Size, Traditional and Profit parameters; columns: Classes 1–4, with all parameters fixed at zero for Class 5). Coefficients: 0.3198, 1.2814, 0.6919***, 1.8385*, 0.5609, 1.1180*, 1.7452**, 0.8625*, 0.3353, 0.7704**, 0.3518, 0.4209, 0.0498, 0.0236, 0.0699, 0.0525; t-statistics: (−1.293), (−1.025), (−1.960), (1.859), (1.106), (1.530), (1.593), (−2.147), (0.236), (−0.160), (0.263), (−0.379), (7.442), (−2.967), (−3.766), (2.733).

Log likelihood    −2351.094
χ²                 981.7395
Pseudo-R²          0.1727

t-Statistics in parentheses: *significant at the 1 per cent level, **significant at the 5 per cent level and ***significant at the 10 per cent level. The estimated coefficients in different latent classes are not comparable due to scale differences.
Table 26.6: Ratios of the estimated parameters, with the parameter of the GDP variable in the denominator.

Variable in the numerator   Class 1   Class 2   Class 3   Class 4   Class 5
Distance                     0.8019     nss     10.5000    6.1490     nss
Wage                          nss       nss      0.5895    0.4012    0.7107
Unemployment                 3.8356     nss     24.0401   35.4891   20.5282
Interest                     5.2266     nss     15.7601   16.5249     nss
EU                           2.5681     nss       nss      8.6093     nss
Border                       0.9431     nss       nss      2.3475     nss
Traditional                   nss       nss       nss      4.8864     nss
Size                          nss       nss      1.6211    8.0354     nss
Profitability                 nss       nss      4.0890    3.7710     nss

nss, not statistically significant. In order to compare the estimated coefficients across the latent classes presented in Table 26.5, marginal rates of substitution are derived: the ratios of two estimated parameters in each latent class are calculated, with the parameter of the market size variable in the denominator.
The MNEs in the third class are discouraged not only from investing in more remote countries, but also from investing in countries with higher wages (the effect is especially strong for Slovenia), as weaker financial performance means fewer resources available to cover higher labour costs and the transaction costs associated with investment in more remote countries. Furthermore, these MNEs prefer to invest in countries with an excess labour supply, which is probably low skilled, as they also prefer host countries with lower labour costs. The higher the unemployment rate and the lower the wages in the host country, the more likely the country is to be chosen by foreign investors in the third class as a location for their capital. Poland appears to be the country where foreign capital allocations have the highest positive sensitivity to unemployment for investors in the third class.

The fourth group of foreign investors contains the better performing but small investing firms in the sample, which choose to locate their capital in non-traditional sectors. Foreign firms in the fourth LC have negative elasticities (Table 26.7), the lowest overall marginal rate of substitution for the unemployment variable (Table 26.6) and the highest overall marginal rate of substitution for the labour cost variable. This can be explained by the fact that investors choosing to locate their capital in non-traditional sectors (science-based industries, service sectors and scale-intensive industries) usually employ more skilled labour and pay higher wages that reflect a skill premium, as compared to MNEs investing in traditional sectors. Therefore, a higher unemployment rate in the host country (especially in Croatia and Poland) has a negative effect on the probability of foreign capital allocations in that country by foreign firms in the fourth class, while these firms are less discouraged from investing in host countries with higher wage rates. The investment by foreign firms in the fourth class is also driven by larger and closer neighbouring foreign markets, preferably EU members. In Russia and Poland, market size has a stronger effect in attracting foreign capital, as compared to the other
Bulgaria Czech Rep Estonia
Bulgaria Czech Rep Estonia Croatia Hungary Lithuania Latvia Poland Romania Russia Slovenia Slovakia Ukraine
nss nss nss
Class 2
Class 1
0.7259 3.1211 0.3473
1.7466 7.4199 2.2509 3.6055 5.6090 4.9421 4.5295 7.5580 0.9985 1.3251 11.1518 6.4049 0.7735
Class 2
nss nss nss nss nss nss nss nss nss nss nss nss nss
Class 1
0.0767 0.2416 0.0195
Class 3
GDP
0.1806 0.3872 0.2103 0.3075 0.3838 0.3385 0.2769 0.3672 0.0464 0.1317 0.8514 0.3235 0.0912
Class 3
Wage
0.0360 0.1599 0.0154
Class 4
0.0688 0.2204 0.0919 0.1823 0.1932 0.1316 0.1207 0.2075 0.0580 0.0794 0.3801 0.1752 0.0386
Class 4
0.0572 0.8894 0.3873
Class 1
15.3842 12.2175 11.5179 33.4400 6.9953 18.5056 18.2603 1.1047 13.5268 6.7977 17.5192 21.4716 26.1940
Class 1
1.8513 1.2398 0.6209
0.1105 1.9466 2.0117 5.3300 0.4287 3.6396 3.1706 1.6877 1.3962 0.0086 5.6502 4.9010 0.0021
Class 5
Class 5
Table 26.7: Class-specific elasticities and marginal effects.
0.4772 0.6203 0.1536
Class 2
6.8806 4.6421 1.6950 4.0151 5.0637 4.8869 3.5747 4.8551 1.9980 4.7386 5.7238 4.1800 6.9155
Class 3
1.1861 0.5972 0.2089
Class 3
2.0044 1.7175 1.5904 2.3212 1.9582 2.1129 1.9055 1.6746 2.4457 2.3196 1.9076 1.9778 2.5869
Class 4
0.7643 0.5566 0.2071
Class 4
Unemployment
1.8489 1.4749 1.8100 1.4950 1.3988 2.6723 3.2019 1.8988 1.6934 2.6451 1.1995 1.0926 2.0313
Class 2
Distance
0.5444 1.8249 1.5859
Class 5
nss nss nss nss nss nss nss nss nss nss nss nss nss
Class 5
Bulgaria Czech Rep Estonia Croatia Hungary Lithuania Latvia Poland Romania Russia
Croatia Hungary Lithuania Latvia Poland Romania Russia Slovenia Slovakia Ukraine
GDP
0.2150 0.1893 0.0574 0.2816 0.2782 0.2666 0.0779 0.2120 0.0389 0.1706
0.0886 0.1329 0.0221 0.2198 0.1346 0.0712 0.0272 0.1456 0.0361 0.0097
nss nss nss nss nss nss nss nss nss nss
1.0579 1.3666 0.7248 0.0252 1.9356 0.9356 0.3886 1.1135 1.1687 1.3244
Class 1
2.1836 0.7815 1.2295 1.3607 0.4732 1.0647 0.6930 1.5642 2.0891 1.6326
Class 1
Class 5
0.9037 0.2337 0.6498 0.4188 1.8544 0.9519 0.0832 0.5568 1.1991 0.0033
Class 5
Class 4
0.0578 0.1290 0.0380 0.0205 0.4012 0.0672 0.8706 0.0450 0.0629 0.0980
Class 4
Class 3
Interest
0.0797 0.2520 0.0675 0.0337 0.5727 0.0633 1.5921 0.0841 0.0887 0.1804
Class 3
0.0554 0.1512 0.0131 0.1855 0.1969 0.0651 0.0076 0.1434 0.0274 0.0834
Class 2
Class 1
0.8556 0.7841 0.4100 1.3350 0.9592 0.6938 0.2144 0.1371 0.2246 0.2485
nss nss nss nss nss nss nss nss nss nss
Class 2
1.1984 2.0239 0.7365 0.4363 1.5063 1.1134 8.0646 0.9055 1.2590 1.7354
Class 1
Table 26.7: (Continued)
0.1886 0.0101 0.1470 0.1247 0.0390 0.0233 0.0501 0.0119 0.2408 0.1422
nss nss nss nss nss nss nss nss nss nss
Class 4
0.2215 0.2894 0.3441 0.0228 0.2228 0.1899 0.2175 0.2350 0.1423 0.0884
Class 4
1.0651 0.4792 0.4935 0.5614 1.0853 0.5304 0.6333 0.6518 0.9340 0.7314
Class 3
EU
1.0095 0.6287 0.8016 0.6574 1.1534 0.3654 0.6175 0.8955 1.0183 0.9335
Class 3
Class 2
0.6992 0.4462 0.4245 0.5043 1.1918 0.2806 0.4094 0.6173 0.8767 0.3937
Class 2
Unemployment
nss nss nss nss nss nss nss nss nss nss
Class 5
10.5648 0.5158 6.7402 5.4852 2.9713 5.4856 0.0407 4.9604 8.3755 0.0171
Class 5
1.0219 0.7704 3.9606
0.3885 0.5019 0.2662 0.0092 0.7108 0.3436 0.1427 0.4089 0.4292 0.4863 0.2621 0.2289 0.0023
Class 1
0.1890 0.1513 0.0356
0.2826 0.2020 0.6091
0.2391 0.0128 0.1864 0.1581 0.0494 0.0296 0.0635 0.0150 0.3053 0.1803 0.0004 0.0396 0.2404
Class 2
0.1329 0.1202 0.2879
nss nss nss
nss nss nss nss nss nss nss nss nss nss nss nss nss
Class 3
Border
0.7136 0.6234 0.0062
0.0003 0.0312 0.1896
0.0604 0.0789 0.0938 0.0062 0.0607 0.0518 0.0593 0.0641 0.0388 0.0241 0.0525 0.0442 0.0076
Class 4
nss nss nss
0.1926 0.1622 0.0278
nss nss nss nss nss nss nss nss nss nss nss nss nss
Class 5
nss nss nss
nss, not statistically significant. Elasticities are estimated for the continuous variables and marginal effects are estimated for the dummy variables. A direct elasticity measures the percentage change in the probability of choosing a particular alternative in the choice set with respect to a given percentage change in an attribute of that same alternative. Direct marginal effects represent the change in the choice probability for an alternative given a unit change in a variable related to that alternative.
Bulgaria Czech Rep Estonia Croatia Hungary Lithuania Latvia Poland Romania Russia Slovenia Slovakia Ukraine
Slovenia Slovakia Ukraine
CEECs. If the source country shares a common border with the host country, the host country is more likely to be chosen as an investment location by foreign firms in the fourth class, especially when the host country is Estonia or the Czech Republic (Table 26.7). Although a common border has a positive effect on the probability of a host country being selected for foreign capital allocations, the marginal effects (Table 26.7) are quite low. The decision to invest abroad by foreign firms in all four classes is positively affected by the cost of capital15 in the host country, with investing firms in the fourth LC having the highest overall marginal rate of substitution for the cost of capital (Table 26.6). In the fifth LC, the parameters of the investing-firm-specific variables are normalised and their values are set to zero (Table 26.5). Foreign investors in the fifth LC choose to locate their capital in countries with large foreign markets; however, they are discouraged by higher labour costs and unemployment in the host country. Slovakia and Poland appear to have the highest elasticities for the market size variable, while foreign capital allocations by investors in the fifth class appear to be most discouraged by higher labour costs and unemployment in Croatia and Slovakia, as compared to the other CEECs.
26.5. Conclusions

This paper applies the ML and LC models to investigate the heterogeneity of firms that choose countries in Central and Eastern Europe in which to locate their capital. It also makes use of a novel multi-level data set, allowing firm, industry (or sector) and country effects to simultaneously determine firm-level FDI location decisions. The paper shows the superiority of the LC model with regard to model fit and the interpretation of results. The empirical results support the presence of heterogeneity in investment location decisions, which is revealed by the statistically significant class-specific parameters. The model reveals five classes of foreign investors, depending on investment-receiving industry and country factors, investing firms' individual characteristics and latent heterogeneity that varies with factors that are unobserved by the analyst. In contrast to the majority of applications, class membership depends directly on investing firm characteristics rather than only on unobserved factors. The results show that firms investing in different sectors, and firms of different size and financial performance, benefit from host country factors to differing degrees. Firms investing in the non-traditional sectors are more likely to invest in countries with lower unemployment rates, but are less likely to be discouraged by higher wage rates, as compared to MNEs that invest in traditional sectors. Investors choosing to locate their capital in non-traditional sectors usually employ more skilled labour and pay higher wages that reflect a skill premium. The better performing firms, on
15. Foreign firms bring capital from their home countries.
Continuous versus Discrete Representation of Investing Firm Heterogeneity
591
the other hand, are less likely to be discouraged from investing in more remote countries and from paying higher wages, as compared to poorer performing firms, as they have more funds to cover the higher labour costs and the transaction costs that arise from investment in more distant countries. This more general approach to the FDI location decision shows that allowing for firm heterogeneity is important if robust estimates of its complex effects are to be obtained.
Acknowledgement

The authors are grateful to Professor William Greene for advice on LC modelling techniques.
Chapter 27
Development of Integrated Choice and Latent Variable (ICLV) Models for the Residential Relocation Decision in Island Areas

Eleni Kitrinou, Amalia Polydoropoulou and Denis Bolduc
Abstract

This paper introduces a behavioral framework to model the residential relocation decision in island areas, in which the decision in question is influenced by the characteristics of island regions, policy variables related to accessibility measures and housing prices in the proposed island area, as well as personal, household (HH), job and latent characteristics of the decision makers. The model framework corresponds to an integrated choice and latent variable (ICLV) setting, where the discrete choice model includes latent variables that capture attitudes and perceptions of the decision makers. The latent variable model is composed of a group of structural equations describing the latent variables as a function of observable exogenous variables, and a group of measurement equations linking the latent variables to observable indicators. An empirical study has been developed for the Greek Aegean island area. Data were collected from 900 HHs in Greece, contacted via telephone. The HHs were presented with hypothetical scenarios involving policy variables, with 2010 as the reference year. ICLV binary logit (BL) and mixed binary logit (MBL) relocation choice models were estimated sequentially. Findings suggest that the MBL models are superior to the BL models, while both the policy and the latent variables significantly affect the relocation decision and considerably improve the models' goodness of fit. The sample enumeration method is finally used to aggregate the results over the Greek population.
27.1. Introduction

Residential location choice has long been a multidisciplinary research topic. For regional and transportation planning, the interest in the effects of HHs' choice of residence arises from the fact that decision makers are attracted to certain types of land-use patterns, which ultimately shape transportation, land use and regional form. Additionally, the interaction between information and communication technologies (ICTs) and human activity-travel behavior has been a substantial research topic in recent years. A number of researchers have suggested that, with the concept of ''distance'' being redefined by ICTs, HHs will demonstrate a pronounced tendency to relocate to distant locations, a tendency that would in turn encourage further decentralization and thus produce changes in transportation demand (Nilles, 1991; Mokhtarian, Collantes, & Gertz, 2004).

The purpose of this paper is to provide insight into the factors affecting the residential relocation decision in island areas. The modeling challenge is to develop residential relocation models for island areas that are able not only to predict HH preferences, but also to capture the impact of policy variables referring to the accessibility of the island area (with regard to both the transport and the telecommunication systems of the area), together with the impact of psychological factors, such as the perceptions and attitudes of the decision makers.

The following section provides a comprehensive review of the state of the art in modeling residential location choice. Section 27.3 describes a behavioral framework for modeling the residential relocation decision in island areas, where preferences depend on both observed and latent variables. Section 27.4 features an application concerning the Aegean island region in Greece, which aims at the development of integrated choice and latent variable (ICLV) models for the choice in question, focusing on the role of the ICTs and the transport system of the proposed island area, in addition to psychological factors of Greek employees. The main conclusions are presented in Section 27.5.
27.2. State of the Art: Modeling Residential Location Choice

Over the last four decades, there has been considerable development in the mathematical modeling of the residential location decision. The relevant quantitative studies have mainly focused on modeling the factors affecting this decision. The models developed can be divided into four categories: hedonic pricing models, urban economic models, simulation-based models and discrete choice models. The hedonic pricing models examine the factors affecting housing prices via the development of linear regression models (e.g., Steiner, 1994). The urban economic models are based on Alonso (1964), whose study builds on the marginal utility theorem and focuses on economic factors in order to analyze the urban spatial structure. Such models have been developed by Muth (1969) and
Goldberg and Chinloy (1984). The relevant theory considers a monocentric city, with the housing units arranged around the center; this is the major restriction of the method (Anas, 1982). Other studies have used simulation-based approaches in order to better reach supply and demand equilibrium (e.g., Raju, Sikdar, & Dhingra, 1998). A detailed review of the alternative approaches to modeling residential location choice, together with the advantages and disadvantages identified for each approach, can be found in Mokhtarian and Cao (2008).

The constraints of the modeling approaches mentioned above led to the development of discrete choice models to analyze residential location decisions. McFadden (1978, 1997) considered the classical model of the rational, utility-maximizing consumer who faces a residential location decision with a choice set of communities within a wider area. He suggested that this problem invalidates the commonly used multinomial logit (MNL) model of choice, and that the utility of a consumer for a particular alternative is a function of the attributes of the alternative, including accessibility, quality of public services, etc., as well as of the decision maker's characteristics, such as age, family size, income, and so on. The probabilistic concept of the discrete choice modeling approach makes it possible to take into account the variations in taste and preference of individuals in choosing their residential location, as well as socioeconomic differences within each population group. The modeling results can thus help formulate regional policies that effectively target specific population groups. For a detailed discussion of the characteristics of this approach, see Sermons and Koppelman (2001) and Bhat and Guo (2004). The work that has been done in the past to model the residential location decision via the discrete choice modeling approach is presented in Section 27.2.1.
27.2.1. Discrete Choice Models to Analyze Residential Choice

Applications of the discrete choice formulation to model residential choice include the work of Lerman (1976), who developed a logit model of a HH's joint choice of residential location, housing type, auto ownership and mode to work. Further studies were made by Weisbrod, Lerman, and Ben-Akiva (1980) and Quigley (1976), while more recent studies include the work of Hunt, McMillan, and Abraham (1994), Waddell (1993), Sermons and Koppelman (2001) and Shiftan (2003). Most of the studies have used simple MNL models (Pollakowski, 1982; Friedman, 1981; Quigley, 1976). Other studies have considered joint logit models of residential location and mode to work (Lerman, 1976). Additionally, nested logit models proposed by Ben-Akiva and de Palma (1986) and Kim (1991) have been used to analyze the residential location decisions of HHs. Bhat and Guo (2004) proposed a mixed spatially correlated logit (MSCL) model for residential choice, capturing taste variations as well as spatial correlation in location choice analysis. Additionally, Bhat and Guo (2006) explored different conceptualizations to represent neighborhoods in residential location choice models and described three alternative ways of constructing operational units to represent neighborhoods.
Chen, Chen, and Timmermans (2008) developed a residential location model that accounts for the prior locations of HHs, using the Puget Sound Panel (1989–2002) dataset. They showed that prior commute distance, open space and recreation opportunity are all influential in people's residential location choices. Ettema (2008) estimated latent class discrete choice models of the relocation probability and the choice of residential environment to investigate the heterogeneity of commuters' and telecommuters' preferences. He suggested that, within the group of telecommuters, two different classes can be identified: those sensitive and those insensitive to commute distance. Latent class discrete choice models for HH location decisions were also developed by Walker and Li (2006), taking into account the lifestyle preferences of the HH.

Residential location choice is in principle a HH decision. However, usually for simplicity, the choice of residence location is based on the decision of an individual (the head of the HH). Regarding the main factors affecting the choice in question, the above studies found that these factors relate both to the socioeconomic characteristics of the HHs (usually the age, children, income and education of the HH members (Gabriel & Rosenthal, 1989; Ben-Akiva & Bowman, 1998b)) and to the characteristics of the alternative residential locations, often called ''neighborhood characteristics'', such as density, crime rate, safety, school quality, housing prices, transportation accessibility measures and indices of the level of public services (Ben-Akiva & Bowman, 1998b; Shiftan, 2003; Ben-Akiva & de Palma, 1986; Kim, 1991).

The role of accessibility in the residential location decision has been emphasized in the literature, mainly referring to the access from the residential area to the work location, shopping centers or leisure facilities (Molin, Oppewal, & Timmermans, 1999; Srour, Kockelman, & Dunn, 2002; Blijie, 2005). For a detailed review of accessibility measures and their use in evaluating land-use and transport strategies, see Geurs and van Wee (2004). Additionally, the commuting distance of employees has been highlighted as an important factor in residential choice models (Weisbrod et al., 1980; Quigley, 1985; Evers, 1990; Molin & Timmermans, 2003; Van Ommerman, Rietveld, & Nijkamp, 1998). Other significant accessibility variables for the residential location decision have been found to be the generalized commuting cost (Anas, 1985) and the commuting time (Bhat & Guo, 2004; Lerman, 1976).

Prashker, Shiftan, and Hershkovitch-Sarusi (2008) estimated a logit choice model of residential location choice in the Tel Aviv metropolitan area, in order to identify how residential decisions are influenced by socioeconomic variables and neighborhood characteristics, with emphasis on behavioral differences between the genders. They emphasized the role of the commute trip in the decision in question and found that the importance of commute distance in residential location choice decreases as one's level of income, level of education and number of cars in the HH increase. Ben-Akiva and Lerman (1977) proposed the use of the expected maximum utility of travel demand models as a measure of accessibility. In addition, Ben-Akiva and Bowman (1998b) developed an activity-based accessibility measure, adapting the expected maximum utility accessibility measures to an activity-based framework.
Within this framework, they estimated a residential choice model where the utility function included each HH member's activity-based accessibility as part of the explanatory variables.

It is noted that all of the above-mentioned studies in the literature on modeling residential location choice refer to urban areas. Residential location choice in island areas constitutes a special case, due mainly to the discontinuity of the regional space. The particularities of transport that come with the island character are attributable to the fact that access to and from the islands occurs by sea or air. For a detailed discussion of the characteristics of the maritime transport system in Greece, see Polydoropoulou and Litinas (2007). Kitrinou and Polydoropoulou (2007) proposed a theoretical framework to model the residential relocation decision in island areas. They developed a virtual case study for the wider Aegean island area in Greece, in order to identify the critical factors affecting the relevant decision. Via analysis of variance and regressions, they identified the profiles of the HHs that are more likely to relocate to the area. The main profiles are: HHs currently living in urban areas, with a small number of members and without children; and HHs in which the primary worker is employed in the private sector and is younger than 45 years.
27.3. Methodological Approach for Modeling the Residential Relocation Decision in Island Areas

27.3.1. Behavioral Framework

Based on the findings of our literature review, to model the residential relocation decision in island areas we propose a framework based on the methodology of Morikawa, Ben-Akiva, and McFadden (2002) and Ben-Akiva et al. (1999, 2002). This framework is depicted in Figure 27.1. In this diagram, ovals refer to unobservable or latent variables, while rectangular boxes represent observable variables. The observable explanatory variables are linked to the individual's utility and perceptions through a causal mapping, shown by solid arrows (structural relationships), while the latent variables are linked to their indicators (measurement relationships) by dashed arrows. The proposed framework combines discrete choice models that describe discrete responses of revealed choices with a linear structural equation model that identifies latent attributes from psychometric perceptual indicators. The key feature of the proposed modeling framework is that the latent attributes can be calculated from the observable variables once the model parameters are estimated; this implies that the models developed can be used for forecasting demand in conjunction with changes in the observable variables. For a detailed review of the methodology for incorporating latent variables into discrete choice models, see Walker (2001). Additionally, an example of the analysis and implementation of a situation
characterized by a large number of latent variables and choices is available in Bolduc, Ben-Akiva, Walker, and Michaud (2005).

[Figure 27.1: Integrated framework to model the residential relocation decision to island areas. The observed explanatory variables (HH characteristics; characteristics of the current location of residence; work characteristics, especially of the primary worker; characteristics of the travel patterns to/from work; and policy variables) enter, via structural equations, the utility of residential relocation to the proposed area, together with the latent variables (satisfaction of the commuting trip, geographical mobility, satisfaction from the current residential area, and perceptions about the quality of life in island areas), each linked by measurement equations to its attitudinal or perceptual indicators; the utility determines the stated preference for relocating to the proposed island area or not.]

As shown in Figure 27.1, the model system contains two types of latent variables and associated indicators. The first latent variable is the satisfaction of the current commuting trip of the workers, which is associated with the following indicators:

I11 = satisfaction from the travel cost,
I12 = satisfaction from the travel time,
I13 = satisfaction from the safety of the trip,
I14 = satisfaction from the flexibility to travel,
I15 = satisfaction from the travel conditions,
I16 = satisfaction from the consistency of arrival and departure times for bus, and
I17 = satisfaction from the trip frequency for bus.
The second latent variable is geographical mobility, meaning the ''flexibility'' someone has to relocate from one geographical area to another, indicated by the following attitudes:

I21 = I seriously think about relocating to another geographical area.
I22 = Frequent changes in the location of residence negatively affect my identity as a worker.
I23 = After a certain age, it is very difficult for a worker to work in another geographical area.
I24 = The decision to relocate to another geographical area involves a lot of risk.

The third latent variable considered is the satisfaction from the current residential area, indicated by the following indicators:

I31 = I am satisfied with my social life in my current residential area.
I32 = In my residential area, there is good infrastructure with regard to health and education.
I33 = In my residential area there are no cultural activities.
I34 = There is a lot of environmental pollution in my residential area.
I35 = The crime level is very low in my residential area.

The last latent variable considered is called perceptions about the quality of life in island areas, and its perceptual indicators are the following:

I41 = On an island, one feels remote from the mainland due to the discontinuity of the space.
I42 = In an island area, the level of criminality is lower compared to the mainland.
I43 = The transport network connecting the island areas with the mainland has a lot of problems, especially in winter.
I44 = Trips on an island are very short, and one saves a lot of time.

The explanatory variables affecting the residential location decision to island areas are as follows:

- HH characteristics, such as the number of HH members, the number of workers among the HH members, the number and age of children, HH income, etc.;
- the respondent's personal attributes, such as age, gender, educational background, etc.;
- characteristics of the current location of residence, such as the type of current residential area (urban, suburban, rural or island), etc.;
- the respondent's work characteristics, such as the type of employment (full-time worker, employed in the private or public sector, hierarchical position, etc.);
- the respondent's commuting profile, for example, commuting distance, delay time, etc.;
- characteristics of the suggested relocation area; and
- policy variables regarding the accessibility of the proposed island area, referring to both the transport and the telecommunication systems.
27.3.2. Specification of Integrated Choice and Latent Variable Models for Modeling the Residential Relocation Decision in Island Areas via the Sequential Estimation Method

The model presented in Figure 27.1 is the combination of a latent variable model and a discrete choice model. In order for the model system to be estimated, a sequential estimation procedure based on Morikawa et al. (2002) is used. First, a factor analysis of the indicators of each of the four latent variables presented above is performed, in order to identify correlations between these variables via their grouping into factors. Each of the factors is then defined by the variables via the factor loadings (the correlations of the variables with the factors), and the factor accounting for most of the variability of the latent variable is retained. A linear regression model of this factor is then developed, with characteristics of the respondent as explanatory variables, and the fitted values are calculated. These fitted values of the latent variables are finally used as explanatory variables in the discrete choice model. In this paper, binary logit (BL) and mixed binary logit (MBL) choice models, using panel data, are developed for the residential relocation decision to the proposed Aegean island area. The equations of the ICLV models follow, where the relocation choice is denoted by $i$ and the non-relocation choice by $j$.

27.3.2.1. Integrated choice and latent variable model

Structural equations:

Binary logit:
$$U_{in} = \beta' X_{in} + \gamma_1 \tilde{Z}_{1n} + \gamma_2 \tilde{Z}_{2n} + \gamma_3 \tilde{Z}_{3n} + \gamma_4 \tilde{Z}_{4n} + \varepsilon_{in}, \qquad U_{jn} = \beta' X_{jn} + \varepsilon_{jn} \qquad (27.1\text{a})$$

Mixed binary logit:
$$U_{in} = \beta' X_{in} + b_n + \gamma_1 \tilde{Z}_{1n} + \gamma_2 \tilde{Z}_{2n} + \gamma_3 \tilde{Z}_{3n} + \gamma_4 \tilde{Z}_{4n} + \varepsilon_{in}, \qquad U_{jn} = \beta' X_{jn} + \varepsilon_{jn} \qquad (27.1\text{b})$$

Measurement equation:
$$y_n = \begin{cases} 1, & \text{if } U_{in} > U_{jn} \\ 0, & \text{otherwise} \end{cases} \qquad (27.2)$$

27.3.2.2. Latent variable model

Structural equations:
$$Z_{kn} = \lambda_k' W_{kn} + \omega_{kn}, \qquad k = 1, 2, 3, 4 \qquad (27.3)$$

Measurement equations:
$$\begin{aligned} I_{1rn} &= \alpha_{1r} Z_{1n} + \upsilon_{1rn}, & r &= 1, 2, \ldots, 7 \\ I_{2kn} &= \alpha_{2k} Z_{2n} + \upsilon_{2kn}, & k &= 1, 2, 3, 4 \\ I_{3ln} &= \alpha_{3l} Z_{3n} + \upsilon_{3ln}, & l &= 1, 2, \ldots, 5 \\ I_{4mn} &= \alpha_{4m} Z_{4n} + \upsilon_{4mn}, & m &= 1, 2, 3, 4 \end{aligned} \qquad (27.4)$$

where $U_{in}$ and $U_{jn}$ denote the utility of the relocation and non-relocation decision, respectively, for individual $n$; $X_{in}$ and $X_{jn}$ are sets of observed variables; $Z_{1n}$ is the latent variable (in practice, the factor accounting for most of the variability of the latent variable) satisfaction of the commuting trip; $Z_{2n}$ the latent variable geographical mobility; $Z_{3n}$ the latent variable satisfaction of the current residential area; $Z_{4n}$ the latent variable perceptions about the quality of life in island areas; $I_{1n}, I_{2n}, I_{3n}, I_{4n}$ are the sets of indicators of the latent variables $Z_{1n}, Z_{2n}, Z_{3n}, Z_{4n}$, respectively; $\tilde{Z}_{1n}, \tilde{Z}_{2n}, \tilde{Z}_{3n}, \tilde{Z}_{4n}$ are the fitted values of the latent variables once they are estimated by Eq. (27.3); $W_{1n}, W_{2n}, W_{3n}, W_{4n}$ are sets of observed variables (characteristics of respondent $n$); $\omega_{1n}, \omega_{2n}, \omega_{3n}, \omega_{4n}$ and $\upsilon_{1rn}, \upsilon_{2kn}, \upsilon_{3ln}, \upsilon_{4mn}$ are sets of i.i.d. normally distributed errors; $\beta, \alpha_{1r}, \alpha_{2k}, \alpha_{3l}, \alpha_{4m}$ and $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \gamma_1, \gamma_2, \gamma_3, \gamma_4$ are unknown parameters; and $b_n \sim (0, \sigma^2)$ is a random variable with zero mean and variance $\sigma^2$ (only in the MBL model, to allow for repeated choices by each sampled decision maker).
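A compact numerical sketch of this sequential procedure (illustrative only: the variable names, random stand-in data, single-factor extraction and gradient-ascent logit fit are assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 900  # respondents, as in the survey described below

# Step 1: extract the dominant factor of a block of Likert indicators
# via principal components of the correlation matrix.
indicators = rng.normal(size=(n, 7))              # stand-in for I11..I17
z = (indicators - indicators.mean(0)) / indicators.std(0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
factor = z @ eigvecs[:, -1]                       # scores on the first factor

# Step 2: regress the factor on respondent characteristics (Eq. 27.3)
# and keep the fitted values Z~.
W = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # e.g. constant, female
lam, *_ = np.linalg.lstsq(W, factor, rcond=None)
z_fitted = W @ lam

# Step 3: include the fitted latent variable in a binary logit (Eq. 27.1a),
# here estimated by simple gradient ascent on the log-likelihood.
X = np.column_stack([np.ones(n), rng.normal(size=n), z_fitted])
y = rng.integers(0, 2, n)                         # stand-in for stated choices
beta = np.zeros(X.shape[1])
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.01 * X.T @ (y - p) / n              # score of the logit likelihood
```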
27.4. Application to the Aegean Island Area

In this section, we consider an application of the ICLV model described above. The empirical application concerns the residential relocation decision to the Greek Aegean
island area. The area consists of two regions: the region of the North Aegean Islands (which includes the prefectures of Chios, Lesvos and Samos) and the region of the South Aegean Islands (which includes the prefectures of Dodekanisa and Kyklades). A map of the area under consideration is presented in Map 27.1. The main characteristic of the area's transport system is that access to and from the islands is possible only by sea or air. When analyzing the re-establishment of territorial continuity with the islands, the following factors are considered: the long duration of maritime travel; high transport costs (especially for air travel); issues regarding the frequency, regularity and quality of transport services; problems regarding the infrastructure in ports and airports; and problems concerning the decentralization of services and regional development.

For the empirical application, the data used are based on a survey conducted during 2006. The survey sample includes Greek HHs of which at least one member is a worker, with a working member per HH chosen as the respondent. Both the worker and the HH
Map 27.1: The Aegean Islands area (copyright: http://www.in2greece.com).
are considered sampling units. A total of 900 HHs were contacted via telephone. The sampling method consisted of two stages: (1) a stratified sampling per Greek prefecture, proportional to the population, and (2) a systematic random sampling using telephone directories for each of the 52 Greek prefectures. In the survey sample, 54.1% of the respondents were male and the remaining 45.9% were female. The average number of HH members was 3.2, while the average number of workers in the HH was about 1.54. Children were present in 58.1% of the HHs, and the average number of children in a HH was 1.78. The average number of cars in a HH was about 1.38. The distributions of the respondents by age group, education level and HH income were about the same as in the population. Regarding work characteristics, 28% of the respondents worked in the public sector and 72% in the private sector. Ninety-one percent of the respondents were full-time workers, while 9% were part-time workers. Eleven percent had managerial positions. Additionally, only 5% of the respondents were currently teleworkers, teleworking on average 7.6 days per month. Another 9.7% of the respondents usually ''took work home'', on average about 9 days per month.

The case study places the HHs in hypothetical scenarios with 2010 as their reference year. According to the basic scenario assumption, as of the year 2010 the proposed area for relocation will have a highly developed telecommunication infrastructure. The basic scenario also includes the existence of the facilitators of teleworking adoption suggested by Bernardino and Ben-Akiva (1996), namely the possibility of teleworking from home, a flexible schedule, no salary reduction, and the cost of teleworking covered by the employer. Table 27.1 summarizes the stated preference (SP) scenarios obtained by incorporating into the basic scenario five policy variables affecting relocation choice, pertaining to accessibility and housing prices.
Table 27.1: Stated preference (SP) scenarios for relocation likelihood, related to accessibility and housing prices.

Scenarios developed                                                   Low level   Medium level   High level
Scenario 1: Basic scenario (highly developed ICT infrastructure
  and teleworking opportunities)                                          –            –             –
Scenario 2: Increase of mode frequency to/from/into the proposed
  area (by all modes of transport)                                      100%         200%          300%
Scenario 3: Reduction of travel time (by all modes of transport)         10%          20%           30%
Scenario 4: Reduction of travel cost (by all modes of transport)         20%          50%          100%
Scenario 5: Reduction of the cost of buying or renting a house in
  the proposed area                                                      10%          30%           50%
Scenario 6: Free internet connection in the proposed area                 –            –             –
The majority of the attributes incorporated in the SP scenarios have three levels: low, medium and high. Table 27.1 outlines the future scenarios developed. It should be pointed out that these levels of the policy variables are based on the characteristics of both the Aegean island area and the transport system of the wider region. After the presentation of each scenario, the respondents were asked to choose whether or not they would relocate to the proposed island area. From the 900 sampling units, 12,600 observations were thus collected.
27.4.1. Model Development and Estimation Results

In this section, the empirical results of the models estimated via the sequential estimation procedure are presented. As mentioned in Section 27.3.2, a factor analysis of the indicators of each of the four latent variables is first performed. In this empirical study, the indicators of each latent variable were factor analyzed using a principal components method for factor extraction, with varimax rotation of the factors. The factor accounting for most of the variability of the latent variable is then retained. Next, a linear regression model of this factor is estimated using the characteristics of the respondent as explanatory variables. The fitted values are then calculated and included as explanatory variables in the discrete choice model (BL and MBL models are developed in this study, as mentioned in Section 27.3.2).

27.4.1.1. Latent variable models

In this subsection, the latent variable models specified by Eqs. (27.3) and (27.4) are estimated.

Measurement equations. The indicators of the four latent variables considered to affect the decision in question were presented in Section 27.3. The indicators of the latent variable satisfaction of the commuting trip are described by a five-point Likert scale taking the values: (1) very unsatisfied, (2) unsatisfied, (3) neutral, (4) satisfied and (5) very satisfied, while the indicators of the remaining latent variables are described by five-point Likert scales of the level of agreement, taking the values: (1) strongly disagree, (2) disagree, (3) neutral, (4) agree and (5) strongly agree.

The seven perceptual indicators of the latent variable satisfaction of the commuting trip have a reliability coefficient, Cronbach's α, of 0.896 in the collected data. The factor analysis indicated one factor, which accounts for 62.1% of the total variability of the latent variable. This factor is called Travel time and safety. The four attitudinal indicators of the latent variable geographical mobility have a reliability coefficient, Cronbach's α, of 0.24. The factor analysis indicated two factors, which together account for 73.5% of the total variability of the latent variable. The first factor accounts for 48.5% of the total variability and could be called Fear of taking risk.
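For reference, both reported quantities can be computed directly from the indicator responses. A minimal sketch under standard definitions (the varimax rotation step is omitted for brevity, and the data here are random stand-ins, not the survey data):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of Likert scores."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def first_factor_share(items):
    """Share of total variability accounted for by the first principal component."""
    z = (items - items.mean(0)) / items.std(0)
    eigvals = np.linalg.eigvalsh(np.corrcoef(z, rowvar=False))
    return eigvals[-1] / eigvals.sum()

rng = np.random.default_rng(2)
likert = rng.integers(1, 6, size=(900, 7))   # stand-in for I11..I17 responses
print(cronbach_alpha(likert), first_factor_share(likert))
```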
The factor analysis of the five indicators of the latent variable satisfaction of the current residential area indicated two factors, which together account for 58.3% of the total variability of the latent variable. The first factor accounts for 30.8% of the total variability and is called Low criminality and low environmental pollution at the area; the reliability coefficient of the scale is Cronbach's α = 0.604. Similarly, for the latent variable perceptions about the quality of life in island areas, the factor analysis of the four perceptual indicators indicated two factors, which together account for 70.2% of the total variability; the first factor accounts for 37.1% and is called Improved quality of life in island areas (reliability coefficient Cronbach's α = 0.647). The corresponding factor loadings of the indicators on the latent variables are presented in Table 27.2.

Structural equations. The linear regression equations of the factors presented above (each accounting for most of the variability of the relevant latent variable) are specified as follows:

Satisfaction of the commuting trip:
$$Z_{1n} = b_0 + b_1\,\text{female} + b_2\,\text{urban} + b_3\,\text{suburban} + b_4\,\text{town} + b_5\,\text{commuting\_distance} + b_6\,\text{delay\_time}$$

Geographical mobility:
$$Z_{2n} = b_0 + b_1\,\text{female} + b_2\,\text{age} + b_3\,\text{single} + b_4\,\text{full\_time} + b_5\,\text{private} + b_6\,\text{internet}$$

Satisfaction of the current residential area:
$$Z_{3n} = b_0 + b_1\,\text{town} + b_2\,\text{rural} + b_3\,\text{island} + b_4\,\text{family}$$

Perceptions about the quality of life in island areas:
$$Z_{4n} = b_0 + b_1\,\text{urban} + b_2\,\text{suburban} + b_3\,\text{town} + b_4\,\text{rural} + b_5\,\text{teleworker} + b_6\,\text{private}$$

where: female = 1 if woman, 0 if man; urban = 1 if the current residential area is the city center, 0 otherwise; suburban = 1 if the current residential area is suburban, 0 otherwise; town = 1 if the current residential area is a small town, 0 otherwise; commuting_distance = the commuting distance (km); delay_time = the delay time of the commuting trip (min); age = 1 if 45 or more years old, 0 otherwise; single = 1 if the HH has one member, 0 otherwise; full_time = 1 if a full-time worker, 0 otherwise; private = 1 if employed in the private sector, 0 otherwise; internet = 1 if there is an Internet connection at home, 0 otherwise; rural = 1 if the current residential area is rural, 0 otherwise; island = 1 if the current residential area is an island, 0 otherwise; family = the number of HH members; teleworker = 1 if currently a teleworker, 0 otherwise.
Table 27.2: Measurement equations of the latent variable model, estimated via the sequential estimation method (factor loadings).

Indicator   Travel time   Fear of       Low criminality and      Improved quality of
            and safety    taking risk   low environmental        life in island areas
                                        pollution at the area
I11           0.752         0             0                        0
I12           0.859         0             0                        0
I13           0.756         0             0                        0
I14           0.830         0             0                        0
I15           0.805         0             0                        0
I16           0.782         0             0                        0
I17           0.722         0             0                        0
I21           0             0.014         0                        0
I22           0             0.783         0                        0
I23           0             0.835         0                        0
I24           0             0.791         0                        0
I31           0             0             0.485                    0
I32           0             0             0.174                    0
I33           0             0             0.224                    0
I34           0             0             0.779                    0
I35           0             0             0.740                    0
I41           0             0             0                        0.199
I42           0             0             0                        0.831
I43           0             0             0                        0.225
I44           0             0             0                        0.833

Estimates of the factor analysis models.
The estimation results of the regression models are presented in Table 27.3. These results can be summarized as follows:

- The respondents of the survey consider the time and the safety of the commuting trip to be very important for them. Women seem to be more satisfied with their commuting trip (mainly regarding travel time and safety). Least satisfied with their work trip are the respondents currently living in urban areas, followed by the residents of suburban areas and small towns. Additionally, the longer the commuting distance and the greater the delay time, the less satisfied the employees are.
- The majority of the employees believe that there is a lot of risk in relocating from one geographical area to another. Women and people over 45 years of age are more conservative than men and younger people, respectively. More ''flexible'' about moving between geographical areas are the employees of the private sector (relative to the employees of the public sector), as well as single-member HHs.
Table 27.3: Structural equations of the latent variable model, estimated via the sequential estimation method. Coefficient (t-test); '–' indicates that the variable does not enter the equation. Columns: (1) Satisfaction of the commuting trip (Travel time and safety); (2) Geographical mobility (Fear to take risk); (3) Satisfaction of the current residential area (Low criminality and low environmental pollution at the area); (4) Perceptions about the quality of life in island areas (Improved quality of life in island areas).

Variable                 (1)               (2)               (3)               (4)
Constant                 0.516 (7.56)      0.818 (5.01)      -0.427 (-4.78)    0.262 (2.46)
Female                   0.173 (2.91)      0.121 (1.94)      –                 –
Urban                    -0.346 (-4.22)    –                 –                 -0.416 (-3.68)
Suburban                 -0.315 (-3.84)    –                 –                 -0.393 (-3.54)
Town                     -0.244 (-2.79)    –                 0.507 (5.94)      -0.607 (-5.09)
Commuting_distance       -0.003 (-1.26)    –                 –                 –
Delay_time               -0.026 (-12.33)   –                 –                 –
Age                      –                 0.177 (5.53)      –                 –
Single                   –                 -0.615 (-5.88)    –                 –
Full_time                –                 -0.271 (-2.42)    –                 –
Private                  –                 -0.143 (-1.99)    –                 0.139 (1.90)
Internet                 –                 0.242 (3.62)      –                 –
Rural                    –                 –                 0.663 (6.75)      -0.475 (-3.68)
Island                   –                 –                 0.242 (2.39)      –
Family                   –                 –                 0.066 (2.56)      –
Teleworker               –                 –                 –                 0.400 (3.69)

Statistics
Number of observations   900               900               900               900
F-test (sig.)            45.749 (0.00)     13.020 (0.00)     20.990 (0.00)     17.725 (0.00)
R²                       0.230             0.074             0.104             0.103

Note: Estimates of the linear regression models.
The residents of rural areas were found to be more satisfied, followed by the residents of small towns and those of island areas. Additionally, the more members in the HH, the more satisfied the respondents, possibly due to other satisfaction indices. The respondents of the survey have an a priori positive perception of the quality of life in island areas, as the positive constant of the model indicates. Teleworkers seem to believe more strongly than nonteleworkers that the quality of life in island areas is improved, while the employees of the private sector are also more likely to support this perception than the employees of the public sector. In addition, the residents of small towns in the mainland are the least likely to believe that the quality of life in island areas is improved, followed by those currently living in rural areas, then urban, and finally, suburban areas.

27.4.1.1. Integrated choice and latent variable models

In this section, the discrete choice models of the residential relocation likelihood to the suggested Aegean Island area by the year 2010, in accordance with the aforementioned scenarios, are presented. As already mentioned above, the survey includes 14 scenarios (repeated observations) per respondent, which brings the total number of observations used for modeling purposes to 12,600. To deal with these repeated observations, a random variable β ~ (0, σ) was employed, with zero mean and standard deviation σ (a parameter to be estimated with the model). This random variable differentiates the extreme value 1 scale factor by individual; when it is included in the model, it does not vary across observations from the same individual (for a discussion of the characteristics of the logit mixture model, see Bolduc & Ben-Akiva, 1991).

The dependent variable is the likelihood of relocation to the Aegean Island area, and it takes the following values: 1, likely (38.1% of total observations), and 0, unlikely (51.9% of total observations). In addition to the fitted values of the latent variables, the following observed independent variables are included in the model:

(1) Dummy variables referring to the scenarios presented to the respondent (presented in Table 27.1).
(2) Kids = 1, if the HH includes children; 0 otherwise.
(3) Kids6_12 = number of children in the HH, 6–12 years old.
(4) Kids12_18 = number of children in the HH, 12–18 years old.
(5) Kids18plus = number of children in the HH, 18 plus years old.
(6) Full_time = 1, if the respondent has a full-time working contract; 0 otherwise.
(7) Teleworker = 1, if the respondent currently teleworks at least one day per month; 0 otherwise.
(8) Visit_area = 1, if the respondent has visited the area; 0 otherwise.
(9) Ten_plus = 1, if the respondent has been living in his/her current residential area for more than 10 years; 0 otherwise.
(10) Internet = 1, if an Internet connection is available at the HH; 0 otherwise.
(11) Distance = the commuting distance the respondent has to travel (km).
(12) Private = 1, if the respondent is employed in the private sector; 0 otherwise.
(13) Education = 1, if the respondent holds a graduate degree; 0 otherwise.
(14) Manager = 1, if the position occupied by the respondent in the professional hierarchy is managerial; 0 otherwise.
(15) Urban = 1, if the current residential area is urban; 0 otherwise.
(16) Suburban = 1, if the current residential area is suburban; 0 otherwise.
(17) Rural = 1, if the current residential area is rural; 0 otherwise.
(18) Age = 1, if the age bracket of the respondent is 45–65 years; 0 otherwise.
(19) Income = 1, if the annual HH income is higher than €10,000; 0 otherwise.

Table 27.4 presents the estimation results of the relocation decision models. The estimation was produced using the BIOGEME software developed by Bierlaire (2003). All of these variables were included in the utility function of the relocation choice. Consequently, the estimated parameters can be interpreted as follows: positive signs indicate a higher relocation likelihood, while negative signs suggest the opposite.

It must be noted here that the superiority of the MBL models over the simple BL models is made obvious by the fact that the scale parameter σ captures the unobserved heterogeneity within the sample population. R² increases significantly in the MBL models, indicating that the scale parameter improves the goodness of fit relative to the simple BL models. Also, the majority of the explanatory variables become statistically more significant in the MBL model, since the random variable captures many of the intangible factors specific to the relocation decision.

Regarding the comparison between the models with and without latent variables, we note the following: The ICLV models are superior to the classical choice models, because the goodness of fit of the classical models increases considerably with the inclusion of the four latent variables. The relevant likelihood ratio (LR) test between the two BL models (with and without latent variables) is:

LR_BL = -2 × (-6083.471 - (-5935.773)) = 295.396

This statistic asymptotically follows a χ² distribution with 4 degrees of freedom, for which the critical value (p = 0.05) is 9.49. This indicates that the BL model with latent variables is significantly superior to the BL model without latent variables. Similarly, the LR test between the two MBL models (with and without latent variables) is:

LR_MBL = -2 × (-3676.463 - (-3636.789)) = 79.348

which is also greater than 9.49, indicating the superiority of the MBL model with latent variables over the MBL model without latent variables. Also, R² increases from 0.300 to 0.316 in the BL model, and from 0.575 to 0.579 in the MBL model, indicating that the four latent variables significantly increase the goodness of fit of the models.
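These test statistics and fit measures follow directly from the log-likelihoods reported in Table 27.4; a minimal check (the exact definition of the reported R² is not stated in the text, so small discrepancies against the table are possible):

```python
# Likelihood ratio tests and rho-squared values, using the log-likelihoods
# reported in Table 27.4.
from scipy.stats import chi2

LL0 = -8733.654                                  # initial log-likelihood, all four models
LL = {"BL": -6083.471, "ICLV BL": -5935.773,
      "MBL": -3676.463, "ICLV MBL": -3636.789}   # final log-likelihoods

crit = chi2.ppf(0.95, df=4)                      # 9.49: four latent variables added
print(f"LR(BL)  = {-2 * (LL['BL'] - LL['ICLV BL']):.3f} vs {crit:.2f}")    # 295.396
print(f"LR(MBL) = {-2 * (LL['MBL'] - LL['ICLV MBL']):.3f} vs {crit:.2f}")  # 79.348

for name, ll in LL.items():
    print(name, round(1 - ll / LL0, 3))          # rho-squared = 1 - LL_final / LL_initial
```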
Table 27.4: Discrete choice models for relocation decision to the Aegean Island area. Coefficient (t-test); '–' indicates that the variable or statistic does not apply to the model. Columns: (1) binary logit (BL) without latent variables; (2) BL with latent variables (ICLV BL model); (3) mixed binary logit (MBL) without latent variables; (4) MBL with latent variables (ICLV MBL model).

Variable name                               (1)                (2)                (3)                (4)
Satisfaction of the commuting trip
  (Travel time and trip safety)             –                  -0.613 (-11.21)    –                  -1.080 (-8.48)
Geographical mobility
  (Fear to take the risk)                   –                  -0.506 (-3.59)     –                  -0.576 (-1.87)
Satisfaction from the current residential
  area (Low criminality, low air pollution) –                  -2.820 (-6.67)     –                  -3.630 (-3.90)
Quality of life in island areas
  (Improved quality of life)                –                  2.870 (12.42)      –                  3.600 (6.90)
Sigma                                       –                  –                  -4.040 (-39.52)    -3.970 (-41.37)
Constant                                    -4.710 (-18.92)    -4.570 (-17.43)    -6.260 (-14.12)    -7.010 (-16.20)
Scenario 2 (low)                            2.150 (11.16)      2.180 (11.27)      4.080 (14.80)      3.950 (14.64)
Scenario 2 (medium)                         3.420 (18.13)      3.500 (18.38)      6.870 (23.83)      6.700 (23.71)
Scenario 2 (high)                           3.980 (20.94)      4.070 (21.25)      8.270 (27.33)      8.090 (27.29)
Scenario 3 (low)                            2.270 (11.84)      2.310 (11.97)      4.340 (15.70)      4.200 (15.54)
Scenario 3 (medium)                         3.450 (18.26)      3.530 (18.52)      6.930 (23.99)      6.760 (23.88)
Scenario 3 (high)                           4.050 (0.19)       4.150 (21.63)      8.470 (27.78)      8.290 (27.75)
Scenario 4 (low)                            2.470 (12.94)      2.510 (13.09)      4.360 (17.13)      4.610 (16.97)
Scenario 4 (medium)                         3.550 (18.77)      3.630 (19.04)      7.170 (24.64)      7.000 (24.54)
Scenario 4 (high)                           4.090 (21.50)      4.190 (21.82)      8.570 (28.01)      8.400 (27.99)
Scenario 5 (low)                            2.500 (13.14)      2.550 (13.29)      4.830 (17.59)      4.680 (17.23)
Scenario 5 (medium)                         3.560 (18.84)      3.640 (19.10)      7.030 (24.62)      7.200 (24.72)
Scenario 5 (high)                           4.080 (21.44)      4.170 (21.76)      8.360 (27.91)      8.540 (27.93)
Scenario 6                                  3.210 (17.02)      3.280 (17.25)      6.220 (22.27)      6.380 (22.41)
Kids                                        -0.891 (-13.55)    -0.740 (-10.37)    -1.370 (-10.41)    -1.390 (-9.54)
Kids6_12                                    0.219 (3.91)       0.328 (5.52)       0.609 (5.48)       0.844 (6.47)
Kids12_18                                   0.336 (7.47)       0.463 (9.21)       0.333 (5.65)       0.608 (4.75)
Kids18plus                                  0.068 (0.79)       0.126 (2.50)       -0.536 (-5.83)     -0.267 (-2.49)
Full_time                                   -0.892 (-11.12)    -1.020 (-10.71)    -1.760 (-10.67)    -1.490 (-6.35)
Teleworker                                  0.513 (6.81)       1.840 (14.46)      0.662 (4.82)       2.640 (9.54)
Visit_area                                  1.740 (14.24)      1.730 (14.04)      1.090 (0.234)      0.822 (5.49)
Ten_plus                                    -0.151 (-4.55)     -0.141 (-4.11)     -0.314 (-4.07)     -0.407 (-5.90)
Internet                                    0.515 (10.66)      0.671 (11.23)      0.754 (7.73)       0.596 (1.87)
Commuting_distance                          0.004 (2.30)       -0.003 (-1.62)     0.011 (4.60)       -0.003 (-6.09)
Private                                     0.524 (9.17)       0.898 (13.44)      0.377 (0.116)      0.752 (5.25)
Education                                   -0.383 (-7.33)     -0.459 (-8.61)     -1.200 (-10.54)    1.630 (13.09)
Manager                                     0.252 (4.77)       0.245 (4.54)       0.418 (3.75)       0.396 (3.44)
Urban                                       1.500 (25.03)      0.226 (1.18)       2.560 (19.23)      0.679 (1.56)
Suburban                                    0.719 (12.66)      -0.552 (-2.92)     1.370 (11.26)      -0.658 (-1.55)
Rural                                       -1.190 (-11.94)    -0.675 (-5.02)     -2.190 (-10.53)    -1.630 (-5.67)
Age                                         -0.346 (-5.46)     -0.529 (-6.84)     -0.285 (-2.11)     -0.605 (-3.35)
Income                                      -0.068 (-0.94)     -0.055 (-0.75)     -0.606 (-4.06)     -0.428 (-2.94)

Statistics
Number of observations                      12,600             12,600             12,600             12,600
Sample size                                 –                  –                  900                900
Initial log-likelihood                      -8,733.654         -8,733.654         -8,733.654         -8,733.654
Final log-likelihood                        -6,083.471         -5,935.773         -3,676.463         -3,636.789
R²                                          0.300              0.316              0.575              0.579
Additionally, the latent variables are statistically significant for the relocation decision and have the expected signs. The results can be summarized as follows:

The negative sign of the latent variable satisfaction of the commuting trip (fitted values of the factor Travel time and safety representing the latent variable) indicates that the more satisfied the employees are with their commuting trip (mainly regarding its travel time and safety), the less likely they are to relocate to the proposed island area. The negative sign of the latent variable geographical mobility (fitted values of the factor Fear to take risk representing the latent variable) states that the more conservative the employees are about changing their residential and work location, the less likely they are to relocate to the proposed island area. The latent variable satisfaction from the current residential area (represented by the fitted values of the factor Low criminality and low air pollution) also takes a negative sign, indicating that the proposed island area is preferred more by employees who currently live in urban areas. The latent variable indicating perceptions about the quality of life in island areas (represented by the factor Improved quality of life) takes a positive sign, as expected,
because employees who perceive that the quality of life in island areas is improved are more likely to decide to relocate to such an area.

With regard to the observed explanatory variables of the models, the results are summarized in the following paragraphs. First, as far as accessibility and housing prices at the proposed area are concerned, as described in the future scenarios presented to the respondents, the policy variables involving them affect the decision makers positively and strongly in favor of relocation. Following the gradation specified in the scenarios, from low to medium to high level, the effect gathers strength proportionally. Policy variables for the residential relocation decision to an island region include, in order of significance, the reduction of housing prices and the decrease of travel cost and of travel time spent in trips to/from the island area and in local trips.

Second, with respect to the respondents' work attributes, our findings suggest that the groups more likely to relocate consist of part-time workers, respondents employed in the private sector, and those occupying managerial positions. Also, respondents who are active teleworkers naturally show a strong inclination to relocate to the proposed residential area. Being connected to the Internet at home is another characteristic that predisposes toward relocation, probably because it signals greater familiarity with ICT-use at home. HHs without children seem more flexible and prone to relocation, while for those with offspring the number of children and their respective ages are significant variables affecting the relocation decision. More precisely, it appears that the larger the number of offspring older than 18, the stronger the likelihood of the HH relocating to the suggested island area. Similar results are also found by Waddell (1993, 2001), Mokhtarian et al. (2004), and Prashker et al. (2008).

Finally, with regard to the characteristics of their actual residential area, the findings suggest that HHs currently residing in urban areas are more likely to relocate, while HHs currently residing in rural areas are the least likely to relocate to an island. At this point, we should note that the estimated coefficients of the dummies "urban," "suburban," and "rural" (representing the type of the current residential area of the respondents) differ between the models with and without latent variables, probably because those dummies are also statistically significant explanatory variables for the latent variables (see Table 27.3). This result has also been found by Polydoropoulou and Ben-Akiva (1998).
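To make the MBL specification described above concrete (an individual-specific error component held constant over each respondent's 14 scenario responses), the following is a minimal numpy sketch of the simulated log-likelihood; the normal mixing distribution is an assumption, and this is not the authors' BIOGEME code:

```python
# Minimal sketch of the simulated log-likelihood for the mixed binary logit:
# utility gets an individual-specific component sigma * xi_n, drawn once per
# individual and shared by that individual's repeated observations.
# The normal mixing distribution is an assumption; not the authors' code.
import numpy as np

def simulated_loglik(beta, sigma, X, y, n_draws=200, seed=0):
    """X: (N, T, K) attributes; y: (N, T) observed 0/1 relocation choices."""
    rng = np.random.default_rng(seed)
    N, T, _ = X.shape
    xi = rng.standard_normal((N, n_draws))          # draws per individual
    v = X @ beta                                    # systematic utilities, shape (N, T)
    ll = 0.0
    for n in range(N):
        u = v[n][None, :] + sigma * xi[n][:, None]  # (n_draws, T): component constant over T
        p = 1.0 / (1.0 + np.exp(-u))                # binary logit probability of relocating
        p_obs = np.where(y[n] == 1, p, 1.0 - p)     # probability of each observed choice
        ll += np.log(p_obs.prod(axis=1).mean())     # product over T, average over draws
    return ll                                       # maximize over (beta, sigma)
```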
27.4.1.2. Aggregation

To aggregate the developed model, the sample enumeration approach is used, according to which the choice probabilities predicted for each decision maker in the sample are summed over decision makers. A detailed presentation of this approach is available in Train (2003). As mentioned previously, the survey sample was 900 Greek HHs. The population of the survey (the number of HHs in Greece) is about 4,000,000 (see http://www.statistics.gr). By assuming that all the sample units have the same weight
(representing the number of decision makers similar to them in the population), this weight is estimated as 4,000,000/900.

[Graph 27.1 about here: Estimated market shares for the integrated MBL model. Probability of relocation (vertical axis, 0 to 0.6) for the total sample and for urban, suburban, town, rural, and island areas, under three cases: the integrated model; if all the houses are equipped with an Internet connection; if all the employees telework at least one day per month.]

Graph 27.1 presents the estimated market shares with regard to the probability of the relocation decision to the proposed Aegean Island region, under the integrated MBL model. The sample enumeration method is applied both for the overall sample and for the sample stratified by the type of the current residential location of the HHs (urban, suburban, town, rural, and island). Additionally, in order to further analyze the impact of ICT-use on the relevant choice probabilities, the sample enumeration method was also applied to the following virtual cases: (1) the variable Internet (= 1 if the house is equipped with an Internet connection, 0 otherwise) takes the value 1 for all the sample units, and (2) the variable teleworker (= 1 if the respondent currently teleworks at least one day per month, 0 otherwise) takes the value 1 for all the sample units. The estimated market shares for those cases are also presented in Graph 27.1.

It is noted that the probability of relocation to the proposed Aegean Island area increases significantly with the respondents' familiarity with ICT-use. Specifically, if all the Greek HHs have an Internet connection at home, the probability in question increases from 0.39 to 0.44 (for the totality of the sample), while if all the Greek employees are currently teleworkers, the relevant relocation probability rises to 0.47. This result indicates the critical role of ICT-use, and especially teleworking, in the residential relocation decision to an island area, which would cause decentralization of services and changes in transportation demand, contributing to the regional development of island areas. It is also noted that, in contrast to the previous result, the relevant market shares for the HHs currently living in rural or island areas decrease in those two virtual cases.
This result additionally indicates that ICT-use and teleworking support the decision of the residents of remote (rural or island) areas to keep their residential location in those areas. Consequently, regional plans aiming at the attractiveness and development of remote, and especially island, areas should include ICT and teleworking applications in their strategic projects.
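A minimal sketch of the sample enumeration used for Graph 27.1 (the function and column names are hypothetical, and model.predict stands in for the fitted ICLV MBL model's predicted probability of relocation); note that with identical weights of 4,000,000/900 the weighted share reduces to the sample mean:

```python
# Minimal sketch of sample enumeration with equal weights. Hypothetical names;
# not the authors' code.
import numpy as np
import pandas as pd

POP, SAMPLE = 4_000_000, 900
WEIGHT = POP / SAMPLE  # same weight for every sampled HH

def enumerated_share(df, predict_prob, policy=None):
    data = df.copy()
    if policy is not None:
        data[policy] = 1                        # virtual case: switch the policy dummy on for all
    weights = np.full(len(data), WEIGHT)
    return np.average(predict_prob(data), weights=weights)

# Usage, mirroring Graph 27.1 (df and model are assumed to be available):
# enumerated_share(df, model.predict)                        # about 0.39 overall
# enumerated_share(df, model.predict, policy="internet")     # about 0.44
# enumerated_share(df, model.predict, policy="teleworker")   # about 0.47
# {a: enumerated_share(df[df["area"] == a], model.predict) for a in df["area"].unique()}
```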
27.5. Conclusions and Further Research

This paper has proposed a behavioral framework for modeling the residential relocation decision-making process, where relocation is considered toward island areas and facilitated by ICT-use. The study took into account certain policy variables regarding both accessibility and housing prices at the proposed area, together with the socioeconomic and latent characteristics of the decision makers. The relocation decision to the suggested area was modeled by employing BL and MBL models, using panel data. Those models were estimated sequentially.

The latent variables included in the discrete choice models strongly and significantly affect the residential relocation decision, and they considerably improve the models' goodness of fit. Additionally, the policy variables pertaining to local accessibility and housing prices have a positive effect on the relocation decision in proportion to their level of development (low, medium, or high). In the same perspective, the most significant policy variables in the context of a decision of residential relocation to an island region appear to be reduced housing prices, in conjunction with a reduction of both the travel cost and the travel time spent in trips to/from the islands and from one island to another. The HHs most inclined to relocate to the proposed island area are the ones currently residing in urban areas, while the least inclined are the ones residing in rural areas. As far as the work-related and socioeconomic characteristics of the workers are concerned, our findings are consistent with those of other studies referring to urban areas. The model aggregation via the sample enumeration method additionally suggests the significant role of ICT-use, and especially teleworking, in improving the attractiveness of island areas as a residential location.

Further research includes a joint estimation of the models, which requires the use of a simulation-based approach. It will be produced using the ICLV software developed by Bolduc and Giroux (2005). The comparison of the estimation results of the two methodologies (sequential and joint estimation) is also to be considered.
References

Alonso, W. (1964). Location and land use: Toward a general theory of land rents. Cambridge, MA: Harvard University Press.
Anas, A. (1982). Residential location markets and urban transportation. New York: Academic Press.
Anas, A. (1985). The combined equilibrium of travel networks and residential location markets. Regional Science and Urban Economics, 15, 1–21.
Ben-Akiva, M., & Bowman, J. L. (1998b). Integration of an activity-based model system and a residential location model. Urban Studies, 35(7), 1131–1153.
Ben-Akiva, M., & de Palma, A. (1986). Analysis of a dynamic residential location choice model with transaction costs. Journal of Regional Science, 26, 321–341.
Ben-Akiva, M., & Lerman, S. R. (1977). Disaggregate travel and mobility choice models and measures of accessibility. Paper presented at the Third International Conference on Behavioural Travel Modelling, Tanunda, Australia.
Ben-Akiva, M., McFadden, D., Gärling, T., Gopinath, D., Bolduc, D., Börsch-Supan, A., Delquié, Ph., Larichev, O., Morikawa, T., Polydoropoulou, A., & Rao, V. (1999). Extended framework for modeling choice behavior. Marketing Letters, 10(3), 187–203.
Ben-Akiva, M., Walker, J., Bernardino, A., Gopinath, D., Morikawa, T., & Polydoropoulou, A. (2002). Integration of choice and latent variable models. In: H. Mahmassani (Ed.), Perpetual motion: Travel behaviour research opportunities and application challenges. Amsterdam.
Bernardino, A., & Ben-Akiva, M. (1996). Modeling the process of adoption of telecommuting: Comprehensive framework. Transportation Research Record, 1552, 161–170. TRB, National Research Council, Washington, DC.
Bhat, C., & Guo, J. (2004). A mixed spatially correlated logit model: Formulation and application to residential choice modelling. Transportation Research Part B, 38, 147–168.
Bhat, C. R., & Guo, J. Y. (2006). A comprehensive analysis of built environment characteristics on household residential choice and auto ownership levels. Technical Report No. SWUTC/06/167860-1. Southwest Region University Transportation Center, U.S. Department of Transportation.
Bierlaire, M. (2003). BIOGEME: A free package for the estimation of discrete choice models. Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland.
Blijie, B. (2005). The impact of accessibility on residential choice: Empirical results of a discrete choice model. 45th Congress of the European Regional Science Association, August, Vrije Universiteit, Amsterdam.
Bolduc, D., & Ben-Akiva, M. (1991). A multinomial probit formulation for large choice sets. Proceedings of the 6th International Conference on Travel Behaviour, Quebec, Canada (Vol. 2, pp. 243–258).
Bolduc, D., & Giroux, A. (2005). The integrated choice and latent variable (ICLV) model: Handout to accompany the estimation software. Département d'économique, Université Laval.
Bolduc, D., Ben-Akiva, M., Walker, J., & Michaud, A. (2005). Hybrid choice models with logit kernel: Applicability to large scale models. In: Lee-Gosselin & Doherty (Eds), Integrated land-use and transportation models: Behavioral foundations (pp. 275–302). Amsterdam.
Chen, J., Chen, C., & Timmermans, H. (2008). Accessibility tradeoffs in HHs' residential location decisions. Washington, DC.
Ettema, D. (2008). The impact of telecommuting on residential relocation and residential preferences: A latent class modeling approach. Washington, DC.
Evers, G. H. M. (1990). The residential location and workplace choice: A nested multinomial logit model. In: M. M. Fischer, P. Nijkamp & Y. Y. Papageorgiou (Eds), Spatial choices and processes (pp. 313–329). Amsterdam: North-Holland/Elsevier Science Publishers.
Friedman, J. (1981). A conditional logit model of the role of local public services in residential choice. Urban Studies, 18, 347–358.
Gabriel, S. A., & Rosenthal, S. S. (1989). Household location and race: Estimates of a multinomial logit model. The Review of Economics and Statistics, 71(2), 240–249.
Geurs, K. T., & van Wee, B. (2004). Accessibility evaluation of land-use and transport strategies: Review and research directions. Journal of Transport Geography, 12(2), 127–140.
Goldberg, M., & Chinloy, P. (1984). Urban land economics. New York: Wiley.
Hunt, J. D., McMillan, P., & Abraham, J. E. (1994). Stated preference investigation of influences on attractiveness of residential locations. Transportation Research Record, 1466, 79–87.
Kim, S. (1991). Choice of rental housing in Korean market: Alternative approach. Journal of Urban Planning and Development, 117, 41–58.
Kitrinou, E., & Polydoropoulou, A. (2007, 24–28 June). Analyzing the decisions of relocation and teleworking in island areas. Proceedings of the 11th World Conference on Transportation Research, Berkeley, USA.
Lerman, S. R. (1976). Location, housing, auto ownership, and mode to work: A joint choice model. Transportation Research Record, 610. TRB, National Research Council, Washington, DC.
McFadden, D. (1978). Modelling the choice of residential location. In: A. Karlqvist, L. Lundqvist, F. Snickars, & J. Weibull (Eds), Spatial interaction theory and planning models (pp. 75–96). Amsterdam: North-Holland. [Reprinted in: Quigley, J. (Ed.) (1997), The economics of housing (Vol. I, pp. 531–552). London: Edward Elgar.]
Mokhtarian, P. L., & Cao, X. (2008). Examining the impacts of residential self-selection on travel behavior: A focus on methodologies. Transportation Research Part B, 42, 204–228.
Mokhtarian, P. L., Collantes, G. O., & Gertz, C. (2004). Telecommuting, residential location, and commute distance traveled: Evidence from State of California employees. Environment and Planning A, 36, 1877–1897.
Molin, E., Oppewal, H., & Timmermans, H. J. P. (1999). Group-based versus individual-based conjoint preference models of residential preferences: A comparative test. Environment and Planning A, 31, 1935–1947.
Molin, E. J. E., & Timmermans, H. J. P. (2003). Transport considerations in residential choice decisions: Accumulated evidence from the Benelux. In: Proceedings of the 82nd Annual Meeting of the Transportation Research Board (TRB), Washington, DC.
Morikawa, T., Ben-Akiva, M., & McFadden, D. (2002). Discrete choice models incorporating revealed preferences and psychometric data. Econometric Models in Marketing, 16, 29–55.
Muth, R. F. (1969). Cities and housing: The spatial patterns of urban residential land use. Chicago, IL: The University of Chicago Press.
Nilles, J. M. (1991). Telecommuting and urban sprawl: Mitigator or inciter? Transportation, 18, 411–432.
Pollakowski, U. O. (1982). Urban housing markets and residential location. Lexington, MA: D.C. Heath and Company.
Polydoropoulou, A., & Ben-Akiva, M. (1998). The effect of advanced traveler information systems (ATIS) on travelers' behaviour. In: R. Emmerink & P. Nijkamp (Eds), Behavioural and network impacts of driver information systems. UK: Ashgate Publishing Group. ISBN 1 84014 506 4.
Polydoropoulou, A., & Litinas, N. (2007). Advanced demand models for Greek passenger shipping. In: T. Pallis (Ed.), Maritime transport: The Greek paradigm. Amsterdam.
Prashker, J., Shiftan, Y., & Hershkovitch-Sarusi, P. (2008). Residential choice location, gender and the commute trip to work in Tel Aviv. Journal of Transport Geography, 16(5), 332–341.
Quigley, J. M. (1976). Housing demand in the short run: An analysis of polytomous choice. Explorations in Economic Research, 3. New York: National Bureau of Economic Research.
Quigley, J. M. (1985). Consumer choice of dwelling, neighbourhood and public services. Regional Science and Urban Economics, 15, 41–63.
Raju, K., Sikdar, K., & Dhingra, S. L. (1998). Micro-simulation of residential location choice and its variation. Computers, Environment and Urban Systems, 22(3), 203–218.
Sermons, W., & Koppelman, S. (2001). Representing the differences between female and male commute behavior in residential location choice models. Journal of Transport Geography, 9, 101–110.
Shiftan, Y. (2003). Peoples' evaluation of urban and suburban characteristics: A residential stated preference model. International Journal of Transport Economics, 2, 151–166.
Srour, M., Kockelman, K., & Dunn, T. (2002). Accessibility indices: Connection to residential land prices and location choices. Transportation Research Record, 1805, 25–34.
Steiner, R. L. (1994). Residential density and travel patterns: Review of the literature. Transportation Research Record, 1466. TRB, National Research Council, Washington, DC.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Van Ommeren, J. N., Rietveld, P., & Nijkamp, P. (1998). Spatial moving behavior of two-earner HHs. Journal of Regional Science, 38(1), 23–41.
Waddell, P. (1993). Exogenous workplace choice in residential location models: Is the assumption valid? Geographical Analysis, 25, 65–82.
Waddell, P. (2001). UrbanSim: Modeling urban development for land use, transportation and environmental planning. Department of Urban Design and Planning and Daniel J. Evans School of Public Affairs, University of Washington, Seattle, WA.
Walker, J., & Li, J. (2006). Latent lifestyle preferences and HH location decisions. Paper presented at the IATBR Conference, Kyoto.
Walker, J. L. (2001). Extended discrete choice models: Integrated framework, flexible error structures, and latent variables. Ph.D. dissertation, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, USA.
Weisbrod, G. E., Lerman, S. R., & Ben-Akiva, M. (1980). Tradeoffs in residential location decisions: Transportation versus other factors. Transport Policy and Decision Making, 1, 13–26.